
Remove HTML tags and JavaScript from pages

Sometimes, for whatever reason, you have the entire code for a page in a string and you would like to remove everything but the raw content. It is easy to remove the HTML tags, but then you are left with all the JavaScript code that sat between the script tags.

Here is the solution:


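    'Requires "Imports System.Text.RegularExpressions" at the top of the file
    Public Function RemoveHTML(ByVal sIn As String) As String
        Dim R As Regex

        'Remove the javascript: every <script ...> block together with its contents.
        'Singleline lets "." match line breaks inside the opening tag as well.
        R = New Regex("<script(.*?)>((.|\n)*?)(<\/script>)", RegexOptions.IgnoreCase Or RegexOptions.Singleline)
        sIn = R.Replace(sIn, "")

        'Remove all the remaining HTML tags
        R = New Regex("<[^>]*>")
        sIn = R.Replace(sIn, "")

        Return sIn
    End Function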
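
As a quick check, here is a minimal sketch of the function running in a console program. The module name `RemoveHtmlDemo` and the `page` sample string are made up for illustration. The sketch uses the static `Regex.Replace` overloads and combines the options with `Or`, which is an assumption about what the flags were meant to be.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module RemoveHtmlDemo
    ' Same two-step idea as the function above: scripts first, then tags.
    Function RemoveHTML(ByVal sIn As String) As String
        ' Step 1: drop each <script>...</script> block, including its body.
        sIn = Regex.Replace(sIn, "<script(.*?)>((.|\n)*?)(<\/script>)", "",
                            RegexOptions.IgnoreCase Or RegexOptions.Singleline)
        ' Step 2: drop whatever tags are left.
        sIn = Regex.Replace(sIn, "<[^>]*>", "")
        Return sIn
    End Function

    Sub Main()
        ' Hypothetical sample page with one script block in the head.
        Dim page As String =
            "<html><head><script>function testme() { var s; s='this is a test'; }</script></head>" &
            vbLf & "<body>This is the content.</body></html>"

        ' Trim removes the line break left behind between head and body.
        Console.WriteLine(RemoveHTML(page).Trim())
    End Sub
End Module
```

Note that the order matters: if the tags were stripped first, the script tags would be gone and the JavaScript between them would stay behind as plain text.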



7 comments

  1. Mac Aug 6

    So what happens if you have some text like this then

    “i8″

    I don't think your solution is complete

  2. Mac Aug 6

    Ahh. I think the form just removed my tags… ho hum… never mind. Either way, your solution is not complete

  3. Gath Aug 6

    Hi Mac,

    I am not sure what you mean by:

    So what happens if you have some text like this then

    “i8″

    The solution is meant to take text like:

    <html>
    <head>
    <script>
    function testme() {
    var s;
    s='this is a test'; }
    </script>
    </head>
    <body>
    This is the content.
    </body>
    </html>

    and return:
    This is the content.

  4. Clint Rutkas Aug 11

    You forgot about inline JavaScript statements for onloads and onclicks. I could do:

    I could even do something like this:
    http://www.codehouse.com/javascript/articles/external/

  5. Clint Rutkas Aug 11

    Grrr….

    OK, let's try this without complete HTML markup then

    body onload="function(){alert('foo');bar();}"></body

  6. Gath Aug 11

    Hi Clint,

    I just did your test, and it works.


    <body onload="function(){alert('foo');bar();}">
    This is the content
    </body>

    Becomes:
    This is the content

    Or did you mean something else?

  7. Image Map Aug 30

    Thanks, this is just what I needed
