I just started this blog in the hope that it will be a location to store various diddles, lessons learned and security-related stuff that I come across. I'm mostly interested in low-level packet/protocol stuff, vulnerability research and coding in Ruby, but I'm sure other random crap will leak into scope.
For this first post, I'll start off with a little lesson learned on how to quickly produce and test regular expressions in Ruby. I'm picking this topic because it's something I've seen others struggle with a lot over the years, and I've been fortunate enough to have a couple of good mentors on the topic.
Probably the most important part of building and testing really good regular expressions is having an environment that allows you to experiment using different techniques and data sets. If you're working in a production environment where this regex might be used to catch a specific security threat or fire off an email when "x" happens, you need to be sure that it's going to match.
I've always been sort of a scripting guy, so grep/awk/sed/perl were always at the top of my list when I needed to build out an ad-hoc regex. Then one day someone recommended I try out "irb" (Interactive Ruby Shell) as an alternative, and it's now the only tool I use for regex stuff because it's just dirt simple to build and validate with.
First, if you haven't used irb before, it can be kicked off at the command line by typing "irb" after you've installed Ruby. It looks something like this, with maybe some slight variations depending on what you have in your ".irbrc" file...
$ irb
>>
The main idea here is that irb provides direct access to the Ruby interpreter so you can just start typing native Ruby directly into the shell and execute it line by line while getting direct feedback as you go. Here is a quick "hello world" just so you get my drift...
>> puts "hello world"
hello world
=> nil
The nice thing about this is that we can directly interact with Ruby's internal regex classes and methods. That lets us validate not only that a regex matches our desired criteria, but also how the matched information can be carved out and stored if, say, we actually wanted to do something with the resultant set.
Let's say for a minute we have a web site that contains the version number of the application, and we want to be able to parse and store that data so that it can be productively used later by some other tool or process. In irb, the first step is to simply store a sample of what we're going to be matching so that we can actively test against it. We'll use SquirrelMail as our test bunny here. We can do this by grabbing the line from the source of the login page that contains the version string and assigning it to an arbitrary variable; in this case we'll use the variable "a" to store this blob of html data...
>> a = '<small>SquirrelMail version 1.4.2-1.2.1<br />'
=> "<small>SquirrelMail version 1.4.2-1.2.1<br />"
The next step is to use the match method, which is already part of the String object "a" after its creation. If we want to, we can simply match on the existence of "SquirrelMail" in our test data like so...
>> a.match(/SquirrelMail/)
=> #<MatchData "SquirrelMail">
As you can see, the response we get indicates that we got a positive match on SquirrelMail and shows us exactly what we matched on. We can check the opposite case by searching for something that does not exist in the data set we're looking at, like the word "Bananas"...
>> a.match(/Bananas/)
=> nil
The regex for bananas was not found in the string and it's clearly indicated by the nil response we're getting from our test. This just gives us some confidence that we know what to expect when we match and when we don't. Now matching on the name of SquirrelMail is nice and we will be able to confirm if we do see the presence of it on the login page, but what if we want to snarf out the version and the subversion numbers while we are at it so we can store them as variables for later use? Well, we would want to use something like this...
>> a.match(/SquirrelMail version ([\d\.]+)-([\d\.]+)/)
=> #<MatchData "SquirrelMail version 1.4.2-1.2.1" 1:"1.4.2" 2:"1.2.1">
What you'll notice here is that the output has changed quite considerably, even though it still gives us a sense that we did get a positive response due to the lack of a nil. This match attempt actually returns a MatchData object, which can be indexed like an array so that we can pick out precise pieces of our sample. Ruby also keeps the most recent match in the global variable $~, so we can simply grab the individual elements out of it and store them as independent entities...
>> entire_string = $~[0]
=> "SquirrelMail version 1.4.2-1.2.1"
>> major_version = $~[1]
=> "1.4.2"
>> minor_version = $~[2]
=> "1.2.1"
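Since $~ is replaced by the next match attempt, it can be safer outside of irb to hold on to the object that match returns. Here is a minimal sketch of that idea, assuming the same sample string as above. The method name parse_version and the group names "major" and "minor" are my own labels for illustration, not anything SquirrelMail or Ruby defines...

```ruby
# Sample line from the login page, same as the variable "a" above.
SAMPLE_LINE = '<small>SquirrelMail version 1.4.2-1.2.1<br />'

# Named groups (?<name>...) make the captures self-describing.
# "major" and "minor" are hypothetical labels chosen for this sketch.
NAMED_VERSION_RE = /SquirrelMail version (?<major>[\d.]+)-(?<minor>[\d.]+)/

# Returns a hash of the captured pieces, or nil when nothing matched.
def parse_version(text)
  m = text.match(NAMED_VERSION_RE)   # MatchData on success, nil on failure
  return nil unless m

  { whole: m[0], major: m[:major], minor: m[:minor] }
end

result = parse_version(SAMPLE_LINE)
puts result ? "#{result[:major]} / #{result[:minor]}" : "no match"
```

Checking for nil before indexing matters because calling [] on a failed match raises a NoMethodError.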
So, yeah, that's pretty much how you can snarf out this information with a regex, using irb and matching against a dummy blob of data. One of the things I find useful about this method is getting that immediate confirmation, in real time, that I'm parsing everything out properly, rather than having to run scripts or copy and paste things into a web tool. This generally just helps me save time and produce better, more reliable regex matchers for when it really matters.
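Once a regex looks right in irb, the same experiment can be repeated against more than one sample. The sketch below is one way to do that, not a prescribed method: the sample strings other than the original one are made up for illustration, and check_samples is a hypothetical helper name...

```ruby
# Pattern under test, equivalent to the one built in irb above.
CHECKED_RE = /SquirrelMail version ([\d.]+)-([\d.]+)/

# Hypothetical samples: strings that should match, mapped to the captures
# we expect, followed by strings that should not match at all.
SHOULD_MATCH = {
  '<small>SquirrelMail version 1.4.2-1.2.1<br />' => ['1.4.2', '1.2.1'],
  'SquirrelMail version 1.4.22-3<br />'           => ['1.4.22', '3']
}
SHOULD_NOT_MATCH = [
  '<title>SquirrelMail - Login</title>',
  'squirrelmail version 1.4.2-1.2.1' # lowercase, and the regex is case sensitive
]

# Returns the list of samples that did not behave as expected.
def check_samples(regex, positives, negatives)
  failures = []
  positives.each do |text, expected|
    m = regex.match(text)
    failures << text unless m && m.captures == expected
  end
  negatives.each do |text|
    failures << text if regex.match(text)
  end
  failures
end

failures = check_samples(CHECKED_RE, SHOULD_MATCH, SHOULD_NOT_MATCH)
puts failures.empty? ? 'all samples behave as expected' : "unexpected: #{failures.inspect}"
```

Keeping the negative samples around is what catches a regex that has quietly become too greedy.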
As a final note I wanted to include an end to end procedure to show how easy it is to now put what we've learned to practical use. We can now point this at a real application outside of our test environment knowing full well we're going to not only get a positive response, but also that we're going to be able to snarf all the data we want for later use...
Step 1 - Get Login Page and Store as variable
>> require 'net/http'
=> true
>> login = Net::HTTP.get('mail.example.com', '/webmail/src/login.php')
=> "<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">\n\n<html>\n<head>\n\n<title>SquirrelMail - Login</title><script language="JavaScript" type="text/javascript">\n<!--\n function squirrelmail_loginpage_onload() {\n document.forms[0].js_autodetect_results.value = '1';\n var textElements = 0;\n for (i = 0; i < document.forms[0].elements.length; i++) {\n if (document.forms[0].elements[i].type == "text" || document.forms[0].elements[i].type == "password") {\n textElements++;\n if (textElements == 1) {\n document.forms[0].elements[i].focus();\n break;\n }\n }\n }\n }\n// -->\n</script>\n\n<style type="text/css">\n<!--\n /* avoid stupid IE6 bug with frames and scrollbars */\n body { \n voice-family: "\\"}\\""; \n voice-family: inherit; \n width: expression(document.documentElement.clientWidth - 30);\n }\n-->\n</style>\n\n</head>\n\n<body text="#000000" bgcolor="#FFFFFF" link="#0000CC" vlink="#0000CC" alink="#0000CC" onload="squirrelmail_loginpage_onload();">\n<form action="redirect.php" method="post">\n<table bgcolor="#ffffff" border="0" cellspacing="0" cellpadding="0" width="100%"><tr><td align="center"><center><img src="../images/sm_logo.png" alt="SquirrelMail Logo" width="308" height="111" /><br />\n<small>SquirrelMail version 1.4.2-1.2.1<br />\n By the SquirrelMail Development Team<br /></small>\n<table bgcolor="#ffffff" border="0" width="350"><tr><td bgcolor="#DCDCDC" align="center"><b>SquirrelMail Login</b>\n</td></tr><tr><td bgcolor="#FFFFFF" align="left">\n<table bgcolor="#ffffff" align="center" border="0" width="100%"><tr><td align="right" width="30%">Name:</td><td align="left" width="*"><input type="text" name="login_username" value="" /></td></tr>\n<tr><td align="right" width="30%">Password:</td><td align="left" width="*"><input type="password" name="secretkey" />\n<input type="hidden" name="js_autodetect_results" value="SMPREF_JS_OFF" />\n<input type="hidden" name="just_logged_in" value="1" />\n</td></tr></table></td></tr><tr><td 
align="left"><center><input type="submit" value="Login" /></center></td></tr></table></center></td></tr></table></form>\n</body>\n</html>\n"
Step 2 - Snarf Out the Info we want
>> login.match(/SquirrelMail version ([\d\.]+)-([\d\.]+)/)
=> #<MatchData "SquirrelMail version 1.4.2-1.2.1" 1:"1.4.2" 2:"1.2.1">
>> $~[0]
=> "SquirrelMail version 1.4.2-1.2.1"
>> $~[1]
=> "1.4.2"
>> $~[2]
=> "1.2.1"
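The two steps above can also be rolled into a small script. This is only a sketch under a few assumptions: extract_version and fetch_version are names I made up, the host and path are passed on the command line rather than hard-coded, and the page is assumed to be reachable over plain HTTP...

```ruby
require 'net/http'

SQUIRREL_RE = /SquirrelMail version ([\d.]+)-([\d.]+)/

# Returns [version, subversion], or nil when the page does not match.
def extract_version(html)
  m = html.match(SQUIRREL_RE)
  m && m.captures
end

# Fetches the login page over HTTP and parses it.
# host and path are whatever the caller supplies, e.g. the values in Step 1.
def fetch_version(host, path)
  extract_version(Net::HTTP.get(host, path))
end

# Offline check against the sample line, so the parsing can be verified
# without touching the network.
p extract_version('<small>SquirrelMail version 1.4.2-1.2.1<br />')
# => ["1.4.2", "1.2.1"]

# Only go out to the network when a host and path are given as arguments.
if ARGV.size == 2
  version, subversion = fetch_version(ARGV[0], ARGV[1])
  puts version ? "#{version} / #{subversion}" : 'no version string found'
end
```

Keeping the parsing in its own method means the regex can still be exercised in irb against a saved copy of the page.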
