Remove HTML tags and JavaScript from pages
July 31
Sometimes, for whatever reason, you have the entire source of a page in a string and want to remove everything except the raw text content. Removing the HTML tags is easy, but doing only that leaves behind the JavaScript code that sat inside the <script> blocks.
Here is one solution, in VB.NET:
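Imports System.Text.RegularExpressions

Public Function RemoveHTML(ByVal sIn As String) As String
    ' Remove <script>...</script> blocks, including the code inside them.
    ' Flags are combined with Or; Singleline lets "." match newlines,
    ' so multi-line scripts are caught, and IgnoreCase matches <SCRIPT> too.
    sIn = Regex.Replace(sIn, "<script[^>]*>.*?</script>", "", RegexOptions.IgnoreCase Or RegexOptions.Singleline)
    ' Remove all remaining HTML tags; attributes such as onload go with the tag.
    sIn = Regex.Replace(sIn, "<[^>]*>", "")
    Return sIn
End Function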
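To try the function, here is a minimal, self-contained console sketch. It assumes VB.NET 2010 or later (for implicit line continuation after `&`); the module name `RemoveHtmlDemo` and the sample page are made up for illustration, and the function body is the same two-step regex approach as above.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module RemoveHtmlDemo

    ' Same technique as RemoveHTML above: drop <script> blocks first,
    ' then strip every remaining tag.
    Public Function RemoveHTML(ByVal sIn As String) As String
        sIn = Regex.Replace(sIn, "<script[^>]*>.*?</script>", "", RegexOptions.IgnoreCase Or RegexOptions.Singleline)
        Return Regex.Replace(sIn, "<[^>]*>", "")
    End Function

    Sub Main()
        ' Hypothetical sample page; the upper-case <SCRIPT> tag checks that
        ' matching is case-insensitive, and onload checks inline handlers.
        Dim page As String = "<html><head>" & vbLf &
            "<SCRIPT type=""text/javascript"">" & vbLf &
            "function testme() { var s; s='this is a test'; }" & vbLf &
            "</SCRIPT>" & vbLf &
            "</head><body onload=""testme();"">" & vbLf &
            "This is the content." & vbLf &
            "</body></html>"

        ' Tag removal leaves the line breaks behind, so trim them for display.
        Console.WriteLine(RemoveHTML(page).Trim())
        ' prints: This is the content.
    End Sub

End Module
```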

Mac Aug 6
So what happens if you have some text like this then
"i8"
I don't think your solution is complete.
Mac Aug 6
Ahh, I think the comment form just removed my tags... ho hum, never mind. Either way, your solution is not complete.
Gath Aug 6
Hi Mac,
I am not sure what you mean by:
So what happens if you have some text like this then
"i8"
The solution is meant to take text like:
<html>
<head>
<script>
function testme() {
var s;
s='this is a test'; }
</script>
</head>
<body>
This is the content.
</body>
</html>
and return:
This is the content.
Clint Rutkas Aug 11
You forgot about inline JavaScript in onload and onclick handlers. I could do:
…
I could even do something like this:
http://www.codehouse.com/javascript/articles/external/
Clint Rutkas Aug 11
Grrr….
OK, let's try this without complete HTML markup, then:
body onload="function(){alert("foo");bar();}"></body
Gath Aug 11
Hi Clint,
I just ran your test, and it works.
<body onload="function(){alert('foo');bar();}">
This is the content
</body>
Becomes:
This is the content
Or did you mean something else?
Image Map Aug 30
Thanks, this is just what I needed.