Last Tuesday, I went to the Verify conference to give a talk about Web application scanners evaluation: what we are actually doing at NIST. I'm gonna make a simple entry reviewing what I actually talked about. The slides are here.

First of all, the evaluation was made with a test suite I made. The choices for the test suite are kinda simple, I wanted something really close to a real website. So I decided to use a real website (not a couple of test cases). The website contains multiple seeded vulnerabilities from different kinds (XSS, SQLi, RFi, CSRF, etc.). The website is actually configurable in a sense of vulnerability: you can choose what vulnerabilities will be in the website or not (let's say, I only want to have XSS vulnerabilities). Moreover, in order to see the web apps scanners capabilities, we can select a type of defense for the current protection: the level of defense.

Level of defenses

Programmers are different. They have different background, knowledge and approach to solve security problems. The filters we can see in wild web applications are not equivalents, some are good, some are just bad and we have the full shade of effectiveness. So, in order to test web apps scanner with different difficulties (for them) we implemented different level of protection around the vulnerabilities: the level of defenses.

A simple example: SQL Injection

  • Level 0: No protection
  • Level 1: Typecasting (in order to convert integer, boolean, double, strings, dates etc.). This protection will limit the SQL Injection on SQL native number types (integers will be converted as integer: 1' OR 1=1-- will be converted into 1).
  • Level 2: Escaping the meta-characters. We are protecting about quote injection, etc.
  • Level 3: Hiding the MySQL errors, we will now have Blind SQL Injections.
  • Level 4: Restricted user management.
  • Level 5: Using prepared statements.

Since the level of defenses will be use in combination, the order is important. (combination: level 2 = level 2(level 1(level 0))). So, using these level of defenses we are able to select the difficulty that the tool will have to break the vulnerabilities. For the results, if you are looking at the slides, in the detection rate slide, you'll see that there is not result for the level 2 which means that no tools were able to find vulnerabilities in the level of defense 2.

Attack Surface Coverage

Another point I have been working on is the attack surface coverage. A webapps scanner is not a simple piece of software which launches attacks! The crawling/parsing step is actually really important maybe the most important since it will try to understand the application. The attack surface of the test application is the places where the user has a direct interaction, means no algorithms etc. just inputs handling, error messages etc.

Here is an example of attack surface coverage check points (with numbers) for a login function:

(1) Touch the file [login.php]
if ( all fields are set ) then
	(2) All fields are set [login.php]
	Boolean goodCredentials = checkThisUser(fields)
	if ( goodCredentials ) then
		(3) Credentials are correct; the User is now log in [login.php]
		registerCurrentUser()
	else
		if ( available login test > 0 ) then
			(4) Login information incorrect [login.php]
			displayErrorLogin()
			available login test -= 1
		else
			(5) Too much try with wrong credential [login.php]
			displayErrorLogin()
			askUserToSolveCAPTCHA()
		endif
	endif
endif

Basically, we would like the scanner to use the normal behavior paths and also the abnormals (errors etc.) in order to find vulnerabilities there such as Information Leakage etc. Just a note about the attack surface coverage rate: this number cannot be interpreted alone. You need to use this with the detection rate and the false positive rate. In the slides you can see that the tool A as a 25% attack surface coverage of the application, but this is also the tool with best findings and no false positive. This means that the tool were able to find 33% of vulnerabilities (best results from all the 4 scanner we tested) in 25% of the application which can be considered as accurate compared to the others.

The attack surface coverage may have an important impact, depending on what type of testing you are doing with your webapps scanner. If you want a tool to run at the end, doing a full assessement, then you will need a tool which as a very good coverage (since you only rely on that). But if you are looking for a tool which is fully integrated in your testing process (testing == quality and security) then, I think it's better to have an accurate tool which will cover a lower surface, but the tool will cover the important points.

Conclusions

This is actually hard to make a real strong conclusion about the results given in the slides. The test application is a real simple website (banking application) and is far from a real company website; this is a huge confounding factor. Another problem is that I did the evaluation one vulnerability at the time (and one level of defense at the time). This prevent a couple of real life behaviors...