Remove HTML tags and JavaScript from pages

Sometimes, for whatever reason, you have the entire source of a page in a string and you would like to remove everything but the raw content. Removing the HTML tags is easy, but that still leaves you with all the JavaScript code that sat between the script tags.

The solution is to remove the script blocks first and then strip the remaining tags:


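    'Requires at the top of the file: Imports System.Text.RegularExpressions
    Public Function RemoveHTML(ByVal sIn As String) As String
        Dim R As Regex

        'Remove the javascript, including the code between the script tags.
        'The options are combined with Or: adding IgnoreCase to itself gives Multiline, so <SCRIPT> would be missed.
        R = New Regex("<script(.*?)>((.|\n)*?)(<\/script>)", RegexOptions.IgnoreCase Or RegexOptions.Singleline)
        sIn = R.Replace(sIn, "")

        'Remove all the HTML tags
        R = New Regex("<[^>]*>")
        sIn = R.Replace(sIn, "")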

        RemoveHTML = sIn
    End Function
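
For comparison, the same two-pass idea can be sketched in JavaScript. This is only an illustration and not part of the VB function above: the name `removeHtml` and the sample input are made up for the example, and like the original it is a quick regex cleanup rather than a real HTML parser.

```javascript
// Hypothetical JavaScript sketch of the same two-pass approach.
function removeHtml(input) {
  // Pass 1: drop each <script> block together with the code inside it.
  // [\s\S] matches any character, including line breaks.
  var withoutScripts = input.replace(/<script[^>]*>[\s\S]*?<\/script>/gi, "");
  // Pass 2: drop whatever tags remain.
  return withoutScripts.replace(/<[^>]*>/g, "");
}

var page = "<p>Hello <b>world</b></p><SCRIPT type='text/javascript'>\nalert('hi');\n</SCRIPT>";
console.log(removeHtml(page)); // prints "Hello world"
```

The order matters: if the tags were stripped first, the script tags would be gone and the code between them could no longer be found.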

Opening a new browser window with POST data

Every now and again I want to have a link that opens a new browser window, and passes some values in. It’s easy when you just need to pass in GET data:

http://dummydomain.com?value1=just&value2=a&value3=test
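
If the values come from variables, the query string can be built rather than typed by hand. A small sketch, assuming a browser or Node.js version that provides `URLSearchParams`; `buildGetUrl` is a made-up helper name and the domain is the dummy one from above:

```javascript
// Hypothetical helper: append values to a base URL as a GET query string.
function buildGetUrl(baseUrl, values) {
  // URLSearchParams URL-encodes each name and value.
  return baseUrl + "?" + new URLSearchParams(values).toString();
}

var url = buildGetUrl("http://dummydomain.com", { value1: "just", value2: "a", value3: "test" });
console.log(url); // prints "http://dummydomain.com?value1=just&value2=a&value3=test"
// In a browser, window.open(url, "newwin") would then open it in a new window.
```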

But it becomes harder when you want to pass in POST values.

Maybe not that hard... How about opening the new window, writing a form that contains the POST variables into it, and then submitting that form to the relevant domain? That would work :)

This code gives you a button which will open a new window and automatically log in to your WordPress blog.


<html>
<head>

    <script type='text/javascript'>
    function LoginToWordPressBlog()
    {
    var sData;
    var sUserName = "YourUserName";
    var sPassword = "YourPassword";
    var sDomain = "http://YourDomain.com";

    sData = "<form name='loginform' id='loginform' action='" + sDomain + "/wp-login.php' method='post'>";
    sData = sData + "<input type='text' name='log' id='user_login' class='input' value='" + sUserName + "' />";
    sData = sData + "<input type='password' name='pwd' id='user_pass' class='input' value='" + sPassword + "' />";
    sData = sData + "<input type='submit' name='wp-submit' id='wp-submit' value='Login »' />";
    sData = sData + "<input type='hidden' name='redirect_to' value='/wp-admin/' />";
    sData = sData + "<input type='hidden' id='wordpress_test_cookie' name='wordpress_test_cookie' value='WP Cookie check' />";
    sData = sData + "</form>";
    sData = sData + "<script type='text/javascript'>";
    // Session cookie: a fixed expiry date stops working as soon as that date has passed.
    sData = sData + "document.cookie='wordpress_test_cookie=home; path=/';";
    sData = sData + "document.loginform.submit();</sc" + "ript>";
    var OpenWindow = window.open("", "newwin");
    OpenWindow.document.write(sData);
    OpenWindow.document.close();
    }
    </script>
</head>
<body>
    <form id="form1">
         <input id="Button1" type="button" value="Login To Wordpress" onclick='LoginToWordPressBlog()' />
    </form>
</body>
</html>

The method above is plain JavaScript, so it should work in pretty much any browser. The steps are:

  1. Create a new window
  2. Use document.write to write a form containing the POST fields into the new window.
  3. Submit the form in the new window.
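
The steps above can be generalized into a small helper that builds the form for any set of POST fields. This is a sketch rather than a drop-in replacement: `escapeAttr`, `buildPostForm` and `openWithPost` are names invented here, the form name `postform` is arbitrary, and the field names you pass in still have to match what the target page expects.

```javascript
// Hypothetical helper: escape a value for use inside a quoted HTML attribute.
function escapeAttr(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Hypothetical helper: build an auto-submitting form with one hidden input per field.
function buildPostForm(action, fields) {
  var html = "<form name='postform' action='" + escapeAttr(action) + "' method='post'>";
  for (var name in fields) {
    if (Object.prototype.hasOwnProperty.call(fields, name)) {
      html += "<input type='hidden' name='" + escapeAttr(name) +
              "' value='" + escapeAttr(fields[name]) + "' />";
    }
  }
  html += "</form>";
  // Split the closing tag so it cannot end an enclosing script block early.
  html += "<script type='text/javascript'>document.postform.submit();</sc" + "ript>";
  return html;
}

// Browser only: the three steps from the list above.
function openWithPost(action, fields) {
  var win = window.open("", "newwin");                // 1. create the new window
  win.document.write(buildPostForm(action, fields));  // 2. write the form into it
  win.document.close();                               // 3. the written script submits the form
}
```

Escaping the values means a user name or password containing a quote character no longer breaks the generated HTML. A call such as `openWithPost(sDomain + "/wp-login.php", { log: sUserName, pwd: sPassword, redirect_to: "/wp-admin/" })` would roughly mirror the WordPress example, minus the test cookie.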

The only tough part now is finding out which POST fields you need, and the best tool for that is Fiddler. It even shows you the cookie values that you need.