I <3 Bots!

Keyword - Evaluation


Sunday, August 10 2008

Why the "line of code" is indeed a good metric

When I first learned about source code metrics, I was surprised to see people using lines of code to compare software. To me, it showed a lack of imagination.

At the beginning of the week, I started a small, quick experiment: extracting metrics from the SATE 2008 test cases. The experiment focuses on function-level properties, so for each function I extract a set of metrics:

  • McCabe's cyclomatic complexity, which measures the number of independent paths through the code; this is a good metric to estimate how difficult it will be for a human to understand a given piece of code (very important for security-related problems)
  • Lines of code
  • Lines of comments
  • Number of local variables
  • Number of parameters (which represents the coupling between the function and the rest of the program)
  • Number of function calls
  • Number of functions that are "sources"
  • Number of functions that are "sinks"
  • Number of C standard library functions (obviously, only for C test cases)
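To make the first three metrics concrete, here is a minimal sketch of how lines of code, comment lines and McCabe complexity could be computed for a single C function. It assumes the function source has already been extracted, and it approximates McCabe as "decision points + 1" by counting tokens with a regular expression; this is not the parser used for the experiment, and, like our parsers, it does no preprocessing.

```python
import re

# McCabe: for a single-entry, single-exit function, complexity = decisions + 1.
# Token-level approximation only: no preprocessing, and keywords or '?' inside
# string literals are counted too.
DECISIONS = re.compile(r"\b(?:if|for|while|case)\b|&&|\|\||\?")
COMMENTS = re.compile(r"/\*.*?\*/|//[^\n]*", re.S)


def function_metrics(source: str) -> dict:
    """Lines of code, comment lines and approximate cyclomatic complexity."""
    comments = COMMENTS.findall(source)
    # Replace each comment by its newlines so that line structure is preserved
    code = COMMENTS.sub(lambda m: "\n" * m.group().count("\n"), source)
    return {
        "loc": sum(1 for line in code.splitlines() if line.strip()),
        "comment_lines": sum(
            1 for c in comments for line in c.splitlines() if line.strip()
        ),
        "cyclomatic": len(DECISIONS.findall(code)) + 1,
    }


example = """int max3(int a, int b, int c) {
    /* return the largest value */
    int m = a;
    if (b > m) m = b;
    if (c > m && c != 0) m = c;  // second check
    return m;
}"""

print(function_metrics(example))  # {'loc': 6, 'comment_lines': 2, 'cyclomatic': 4}
```

The two `if` statements and the `&&` give three decision points, hence a complexity of 4.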

At first, lines of code were implemented because they are easy to compute and also give an important value if we want to normalize the other metrics. We also decided to introduce the number of "sources/sinks" to study input validation weaknesses later on...

Anyway, after running some statistics on the output, I was amazed to observe that the Pearson correlation coefficient between McCabe complexity and lines of code was never below 0.90, which is a very strong linear correlation. (I have to say that there are huge limitations in the parsers we use to extract the information; for instance, the C code is not preprocessed.) This result only holds for the C test cases: in the Java test cases, the average observed correlation is around 0.60...
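For reference, the Pearson coefficient is the covariance of the two series divided by the product of their standard deviations. A minimal sketch, assuming one (LOC, McCabe) pair per function; the numbers below are made up for illustration and are not SATE data:

```python
import math


def pearson(xs, ys):
    """Pearson correlation r = cov(x, y) / (sigma_x * sigma_y)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    # Raises ZeroDivisionError if one series is constant (r is undefined then)
    return cov / (sx * sy)


loc = [5, 12, 20, 33, 41]   # hypothetical lines of code per function
mccabe = [1, 3, 4, 8, 9]    # hypothetical McCabe complexity per function
print(round(pearson(loc, mccabe), 3))
```

Note that r is not a percentage: r = 0.90 means that a linear model explains r² = 0.81 of the variance of one metric given the other.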

Of course, further statistical analysis will be necessary before concluding anything on this subject: if we were unlucky in selecting the test cases, that could explain the result, but I don't think we were. Actually, it seems quite logical that these metrics are related: the longer the code, the more complex it can be in terms of tests, loops, etc. A longer piece of code simply has more chances to contain more branches and cycles :)

Oh well, I'll keep writing about this, especially since I expect to get results pretty soon...

Tuesday, June 10 2008

My talk at SAW: Automated Evaluation of source code analyzer output

It has been a while since my last post... well, I've been busy, especially with the end of SATE, and, oh well, I had a vacation :)

Anyway, at the next Static Analysis Workshop this Thursday, we're going to talk about the SATE experiment and the observations and results we got from it. I'm then going to talk about a tool I wrote to check whether a reported weakness is a false positive: the Automated Evaluation.

The main idea of the Automated Evaluation is to gather some information from the source code and, under some assumptions, try to draw a conclusion about the correctness of that piece of code. My approach had to be radically different from a classical SCA; otherwise it would have amounted to writing yet another SCA, which would obviously have been useless. The scope of this automated evaluation is limited to buffer overflows, and it can only be used to prove false positives!

So basically, I read the source code from the reported sink back to the possible sources and collect the actions that may affect the variables involved.

These actions are like:

  • Allocation of a destination buffer
  • Computing the size of the source buffer(s)
  • Test for NULL
  • Test that involves the size of the buffers...
  • ... and some others

Then, each time one of these actions is detected, the tool increments a global "false-positiveness" score for the reported weakness. We then only have to set a threshold according to the level of confidence we want; the right value is really tied to the source code and to how the program was developed.
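The scoring step can be sketched as follows. This is only an illustration of the principle, not the tool itself: the regular expressions, the weights and the threshold of 3 are hypothetical, and the lines between the sink and the sources are assumed to have been extracted already.

```python
import re

# Hypothetical "actions" and their weights in the false-positiveness score
ACTIONS = {
    "dest_allocation": (re.compile(r"\b(?:malloc|calloc|realloc)\s*\("), 1),
    "source_size": (re.compile(r"\b(?:strlen|sizeof)\s*\("), 1),
    "null_test": (re.compile(r"(?:==|!=)\s*NULL\b|\bNULL\s*(?:==|!=)"), 1),
    "size_test": (re.compile(r"\bif\s*\(.*\b(?:len|size)\w*\s*[<>]=?"), 2),
}


def false_positive_score(lines):
    """Add the weight of every action found on the lines from sink to sources."""
    score, found = 0, []
    for line in lines:
        for name, (pattern, weight) in ACTIONS.items():
            if pattern.search(line):
                score += weight
                found.append(name)
    return score, found


def is_probable_false_positive(lines, threshold=3):
    """The threshold is a tuning knob: it depends on the code base."""
    score, _ = false_positive_score(lines)
    return score >= threshold


snippet = [
    "len = strlen(src);",
    "dst = malloc(len + 1);",
    "if (dst == NULL) return -1;",
    "strcpy(dst, src);",  # the reported sink
]
print(false_positive_score(snippet))  # (3, ['source_size', 'dest_allocation', 'null_test'])
```

A plain sum of weights keeps the evaluation easy to explain; its weakness is the one mentioned below, namely that regular expressions only see what the code looks like, not what it means.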

Even though this evaluation method is not perfect, it was well suited to the C test cases we had in SATE 2008, since the overall code quality was good. We can even say that the programs were well written, so it was acceptable to make some assumptions about the code, such as:

  • If the size of the destination buffer is computed from the size of the source buffer, the size is correct (basically: no off-by-one)

Also, the tool itself needs some knowledge of the source code, since it uses regular expressions to match the "actions"...



That's it for the quick explanation; here are the slides: SAW: Automated Evaluation of SCA output

Tuesday, February 5 2008

NIST Static Analysis Tool Exposition: No, this is not a competition!

I was happy yesterday when I learned that Fortify will participate in the Static Analysis Tool Exposition (SATE) we are currently organizing, and even more so this morning when I saw Brian Chess blogging about SATE.

We've been working on SATE since our last Static Analysis Summit, helped by a couple of expositions already existing at NIST, such as TREC, for the guidelines, the rules and so on. But even with these examples, we had three difficult tasks:

  1. Make people agree that it is not a competition
  2. Get vendors to participate (if you are a vendor reading this, please sign up to participate in SATE)
  3. Choose the test cases

The last point is not solved yet; in fact, none of them can be considered solved, since not everybody is participating in the 2008 exposition (which has two tracks: C and Java). We've been looking for good test cases in C and Java. Good test cases means not too big, not too small, and with exploitable vulnerabilities. By the way, if any readers of this blog have ideas for good Java or C test cases, please send me links, ideas or whatever :)

Anyway, SATE is under way, and I hope more tool makers will sign up to participate in this experiment.

One more point: given my usual blogging on web security and web application security scanners, if SATE is as successful as we expect, we may open new tracks for... web application security scanners, and I would love to have special tracks for security metrics (I want to show up!! :p)

Thursday, November 1 2007

My talk at the Verify Conference

Last Tuesday, I went to the Verify conference to give a talk about evaluating web application scanners: what we are actually doing at NIST. This entry is a short review of what I talked about. The slides are here.

First of all, the evaluation was made with a test suite I wrote. The choices behind the test suite are simple: I wanted something really close to a real website, so I decided to use a real website (not a handful of test cases). The website contains multiple seeded vulnerabilities of different kinds (XSS, SQLi, RFI, CSRF, etc.). It is configurable in terms of vulnerabilities: you can choose which vulnerabilities will be present or not (say, only XSS vulnerabilities). Moreover, in order to probe the scanners' capabilities, we can select the type of defense protecting each vulnerability: the level of defense.

Level of defenses

Programmers are different. They have different backgrounds, knowledge and approaches to solving security problems. The filters we see in web applications in the wild are not equivalent: some are good, some are just bad, and we see the full spectrum of effectiveness in between. So, in order to test web application scanners at different difficulty levels (for them), we implemented different levels of protection around the vulnerabilities: the levels of defense.

A simple example: SQL Injection

  • Level 0: No protection
  • Level 1: Typecasting (converting to integer, boolean, double, string, date etc.). This protection limits SQL injection on native SQL number types (an integer parameter is cast to an integer: 1' OR 1=1-- is converted into 1).
  • Level 2: Escaping the meta-characters. This protects against quote injection, etc.
  • Level 3: Hiding the MySQL errors; we now have blind SQL injections.
  • Level 4: Restricted user management.
  • Level 5: Using prepared statements.

Since the levels of defense are used in combination, the order is important (combination: level 2 = level 2(level 1(level 0))). Using these levels of defense, we can select how difficult it will be for the tool to break the vulnerabilities. As for the results, the detection rate slide shows no result for level 2, which means that no tool was able to find vulnerabilities at level of defense 2.
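The cumulative composition can be sketched on a toy database. This is a minimal illustration, not the test suite's PHP code: it uses an in-memory SQLite table, the helper names are hypothetical, `typecast_int` mimics PHP's `(int)` cast by keeping the leading digits, and levels 3 and 4 (hiding errors, restricting the database user) are not input transformations, so here they reuse the level 2 filters.

```python
import re
import sqlite3
from functools import reduce


def typecast_int(value: str) -> str:
    """Level 1: mimic PHP's (int) cast by keeping only the leading integer."""
    match = re.match(r"\s*([+-]?\d+)", value)
    return str(int(match.group(1))) if match else "0"


def escape_quotes(value: str) -> str:
    """Level 2: escape the quote meta-character (SQL style: ' becomes '')."""
    return value.replace("'", "''")


# Levels are cumulative: level n applies the filters of levels 1..n in order
FILTERS = [typecast_int, escape_quotes]


def apply_level(value: str, level: int) -> str:
    return reduce(lambda acc, f: f(acc), FILTERS[:level], value)


def make_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users (id INTEGER, name TEXT)")
    db.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])
    return db


def lookup(db, user_input: str, level: int):
    if level >= 5:
        # Level 5: prepared statement, the input is never parsed as SQL
        return db.execute("SELECT name FROM users WHERE id = ?", (user_input,)).fetchall()
    value = apply_level(user_input, min(level, len(FILTERS)))
    # Levels 0-4: string concatenation (deliberately injectable at level 0)
    return db.execute(f"SELECT name FROM users WHERE id = '{value}'").fetchall()


db = make_db()
payload = "1' OR 1=1--"
for level in (0, 1, 2, 5):
    print(level, lookup(db, payload, level))
```

At level 0 the payload returns every row; from level 1 on it is reduced to `1`, and with the prepared statement it matches nothing.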

Attack Surface Coverage

Another point I have been working on is attack surface coverage. A web application scanner is not a simple piece of software that launches attacks! The crawling/parsing step is really important, maybe the most important one, since that is where the tool tries to understand the application. The attack surface of the test application is the set of places where the user interacts directly with it: no algorithms, just input handling, error messages etc.

Here is an example of attack surface coverage check points (numbered) for a login function:

(1) Touch the file [login.php]
if ( all fields are set ) then
	(2) All fields are set [login.php]
	Boolean goodCredentials = checkThisUser(fields)
	if ( goodCredentials ) then
		(3) Credentials are correct; the user is now logged in [login.php]
		registerCurrentUser()
	else
		if ( available login attempts > 0 ) then
			(4) Login information incorrect [login.php]
			displayErrorLogin()
			available login attempts -= 1
		else
			(5) Too many attempts with wrong credentials [login.php]
			displayErrorLogin()
			askUserToSolveCAPTCHA()
		endif
	endif
endif

Basically, we would like the scanner to follow both the normal behavior paths and the abnormal ones (errors etc.) in order to find vulnerabilities there, such as information leakage. A note about the attack surface coverage rate: this number cannot be interpreted alone. It has to be read together with the detection rate and the false positive rate. In the slides you can see that tool A has a 25% attack surface coverage of the application, but it is also the tool with the best findings and no false positives. This means the tool was able to find 33% of the vulnerabilities (the best result of the 4 scanners we tested) in 25% of the application, which makes it accurate compared to the others.
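To read the three numbers together, they can be computed side by side. A minimal sketch, assuming the check points reached and the vulnerabilities reported have been collected as sets of IDs; the false positive rate is taken here as the share of reports that are wrong, which is an assumption, and the sample data is hypothetical:

```python
import math
from dataclasses import dataclass, field


@dataclass
class ScanResult:
    reached: set = field(default_factory=set)   # check point IDs reached
    reported: set = field(default_factory=set)  # vulnerability IDs reported


def evaluate(result: ScanResult, checkpoints: set, seeded: set) -> dict:
    true_pos = result.reported & seeded
    false_pos = result.reported - seeded
    return {
        "coverage": len(result.reached & checkpoints) / len(checkpoints),
        "detection": len(true_pos) / len(seeded),
        # Assumed definition: fraction of the reports that are not real
        "false_positive": len(false_pos) / len(result.reported) if result.reported else 0.0,
    }


# Check points (1)-(5) of the login example, hypothetical scanner output
login_checkpoints = {1, 2, 3, 4, 5}
seeded = {"sqli-login", "xss-search", "info-leak-login"}
scan = ScanResult(reached={1, 2, 4}, reported={"sqli-login", "bogus"})
print(evaluate(scan, login_checkpoints, seeded))
```

A scanner that never submits wrong credentials would not reach check point (5), and would miss anything hiding behind the CAPTCHA path, whatever its detection engine is worth.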

The attack surface coverage may have an important impact depending on what type of testing you do with your web application scanner. If you want a tool to run at the end for a full assessment, then you need a tool with very good coverage (since you rely only on it). But if you are looking for a tool that is fully integrated in your testing process (testing == quality and security), then I think it's better to have an accurate tool that covers a smaller surface but covers the important points.

Conclusions

It is actually hard to draw a strong conclusion from the results given in the slides. The test application is a really simple website (a banking application) and is far from a real company website; this is a huge confounding factor. Another problem is that I ran the evaluation one vulnerability at a time (and one level of defense at a time), which rules out a couple of real-life behaviors...

Tuesday, October 16 2007

Stuck at data-flow? Do box-modeling!

Since yesterday, I've been working on a data-flow problem. I need to model a function, and in principle I should do the whole data-flow process. Well, that takes a long time if I have to do it for every function, especially since I will never use most of the information I would generate by analyzing the tree associated with the function (local variables etc.). So what's the point of doing that? None.

I was stuck at this point and couldn't find a good way to model a function (entry parameters, global calls etc.), so I thought of reasoning as if through a crystal ball: I can see what is inside, but it's kinda blurry :) I am now modeling a function as inputs and outputs, only in terms of its interactions with other functions and global variables. This way, I should be able to see the possible interactions of a given function with the rest of the system. Hope it's going to work well!
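Here is a minimal sketch of this box model, assuming the parser produces, for each function, the global variables it reads and writes and the functions it calls (written by hand below; the function and variable names are hypothetical). Effects of callees are folded into the caller, and recursion is handled by remembering which functions were already visited.

```python
from dataclasses import dataclass, field


@dataclass
class Box:
    """A function seen from outside: globals read/written, functions called."""
    reads: set = field(default_factory=set)
    writes: set = field(default_factory=set)
    calls: set = field(default_factory=set)


def effects(name, boxes, seen=None):
    """Globals a function may read and write, directly or through its callees."""
    seen = set() if seen is None else seen
    # Already visited (recursion) or unknown function (e.g. a library call)
    if name in seen or name not in boxes:
        return set(), set()
    seen.add(name)
    box = boxes[name]
    reads, writes = set(box.reads), set(box.writes)
    for callee in box.calls:
        r, w = effects(callee, boxes, seen)
        reads |= r
        writes |= w
    return reads, writes


boxes = {
    "main": Box(calls={"login", "printf"}),
    "login": Box(reads={"users"}, writes={"session"}, calls={"log", "login"}),
    "log": Box(writes={"logfile"}),
}
print(effects("main", boxes))
```

No local variable or statement inside the functions is ever looked at: only the edges of each box matter.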

Thursday, August 23 2007

Web App Security Scanner Evaluation Criteria

Here is a new interesting project: WASSEC. This WASC's project is run by Anurag Agarwal and is about the evaluation of web application scanners such as Watchfire's AppScan, SPI's WebInspect etc.

If you work in this field, don't hesitate to help us :). Here are Anurag's words:

Thank you all for your patience. We have received an overwhelming response from the WASSEC (Web Application Security Scanner Evaluation Criteria) project. To proceed with the project please

1. Please email wasc-wassec-subscribe(AT)webappsec(DOT)org and reply to confirmation email.

2. It is moderated subscription so every contributor has to be approved to send messages to the list.

3. Once you are subscribed to the list, then email wasc-wassec(AT)webappsec(DOT)org to post messages.

All further communication will be done through the mailing list. Please keep checking your junk mail folder in case some messages might go there. We are also in the process of setting up a wiki for the length of the project to post updates, etc. Until then I will be updating my blog with the project details.

Once again, thank you for your participation.

You can check out the project here: http://webappsec.org/projects/wassec

Tuesday, July 10 2007

Website functionalities coverage

Coverage is a tool written in Python that lets you track which functionalities/web pages of your website are reached. I use it in my web application scanner evaluation methodology to know whether the scanner was able to reach every page and every functionality of my test application.

Anyway, this tool is pretty easy to use, even though it requires a MySQL database to store the entry points of the application. Basically, you set up the database, insert the entry points into your code, and run the Python script, which generates an HTML report with SVG graphs showing the coverage of your application.

Here is an example report

Installation

1/ Database

The database design I used for storing the needed information is the following:

CREATE TABLE `coverage` (
`CoverageID` int(32) NOT NULL auto_increment,
`Apps` varchar(128) character set utf8 collate utf8_unicode_ci NOT NULL,
`Date` date NOT NULL,
`EntryPoint` varchar(255) character set utf8 collate utf8_unicode_ci NOT NULL,
`Origin` varchar(255) character set utf8 collate utf8_unicode_ci NOT NULL,
PRIMARY KEY  (`CoverageID`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=1;
  • Apps: name of the covered application
  • Date: date when the entry point is reached
  • EntryPoint: name of the entry point, with a special format:


** File Reached:
Touch_ + name of the file with its extension, for example Touch_Index.Php, Touch_Search.Php etc.

** Functionality Reached:
Name of the functionality + _ + name of the file with its extension; for example, this sequence of entry points for the page Login.php of a given application:

  1. Touch_Login.Php : Enter the page Login.Php
  2. Username_Password_Login.Php : The username and the password are provided
  3. Call_Function_Login.Php : Call the function login()
  4. Call_Function_Succeed_Login.Php : The function login() succeeded
  5. Call_Function_Error_Login.Php : The function login() reported an error
  • Origin: the concatenation of the MD5 of HTTP_USER_AGENT, a pipe and the date; this ID + date is used to make sure we are studying the same user.
<?php
// ...
$origin = md5($_SERVER['HTTP_USER_AGENT']). '|' . date("j-m-y H:i");
?>
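When you need to find a user's Origin outside of PHP (for instance to query the table), the same string can be rebuilt in Python. A small sketch, assuming the user agent string and the timestamp are known; PHP's `date("j-m-y H:i")` format is reproduced by hand:

```python
import hashlib
from datetime import datetime


def origin_id(user_agent: str, when: datetime) -> str:
    """md5(User-Agent) + '|' + date formatted like PHP's date("j-m-y H:i")."""
    digest = hashlib.md5(user_agent.encode("utf-8")).hexdigest()
    # PHP "j" = day without leading zero, "m" = 2-digit month, "y" = 2-digit year
    return f"{digest}|{when.day}-{when:%m-%y %H:%M}"


print(origin_id("", datetime(2007, 6, 5, 9, 3)))
# d41d8cd98f00b204e9800998ecf8427e|5-06-07 09:03
```

The part before the pipe is the MD5 of HTTP_USER_AGENT alone, which is what the report script takes as its Origin ID argument.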

2/ In the code

So, you will need to add lots of entry points to your application code. I wrote a small PHP class to make that easier:

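<?php
// Records which entry points of the application are reached.
// PDO replaces the old mysql_* functions, which were removed in PHP 7.
class Coverage {
    private $db = null;

    function __construct() {
        try {
            $this->db = new PDO('mysql:host=192.168.1.3;port=3306;dbname=test_collect',
                                'test', 'test');
        } catch (PDOException $e) {
            $this->db = null; // no database: coverage is silently disabled
        }
    }

    function send($entryPoint) {
        if ($this->db) {
            // Origin = md5(User-Agent) | date, as described above
            $agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
            $origin = md5($agent) . '|' . date("j-m-y H:i");
            // A prepared statement replaces manual escaping of the entry point
            $stmt = $this->db->prepare(
                "INSERT INTO coverage VALUES (NULL, 'BankApp', NOW(), ?, ?)");
            $stmt->execute(array($entryPoint, $origin));
        }
    }
}

// Coverage is on unless the application already set this flag
if (!isset($supportCodeCoverage)) {
    $supportCodeCoverage = true;
}
$coverage = new Coverage();

function register_EntryPoint($entryPoint) {
    global $coverage, $supportCodeCoverage;
    if ($supportCodeCoverage) {
        $coverage->send($entryPoint);
    }
}
?>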

Include this code in a header file (or similar) and call:

register_EntryPoint('Touch_MyFile.Php');

and so on, wherever your code has a functional difference.

Run the tool

To run the tool, you need to have:

  • Python + MySQLdb (the python MySQL API)
  • The date (in SQL format) you want to cover; for now, it's only one day
  • Optionally, the Origin ID of the user (the MD5 of HTTP_USER_AGENT); you can look it up in the database or compute it in your code.


Examples:

$ python coverage.py 2007-06-28 41942da0293d0b8afcfab4c2d10c2401
$ python coverage.py 2007-04-12

For now, the script must be in the same directory as your files... you can download the archive here: coverage.zip
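The source of coverage.py is in the archive and is not reproduced here; the following is only a guess at its core computation, as a sketch. It assumes the rows have already been read from the `coverage` table as (Date, EntryPoint, Origin) tuples with dates as 'YYYY-MM-DD' strings, that the list of expected entry points is known, and that file names contain no underscore (the file is taken to be whatever follows the last `_`).

```python
from collections import defaultdict


def file_of(entry_point: str) -> str:
    """'Call_Function_Login.Php' -> 'Login.Php' (file name after the last '_')."""
    return entry_point.rsplit("_", 1)[-1]


def coverage_report(rows, expected, date, user_md5=None):
    """Per-file ratio of expected entry points reached on a given day."""
    reached = {
        ep for (d, ep, origin) in rows
        if d == date and (user_md5 is None or origin.startswith(user_md5 + "|"))
    }
    per_file = defaultdict(lambda: [0, 0])  # file -> [reached, expected]
    for ep in expected:
        counts = per_file[file_of(ep)]
        counts[1] += 1
        if ep in reached:
            counts[0] += 1
    return {f: hit / total for f, (hit, total) in per_file.items()}


expected = ["Touch_Login.Php", "Username_Password_Login.Php",
            "Call_Function_Login.Php", "Call_Function_Succeed_Login.Php",
            "Call_Function_Error_Login.Php", "Touch_Index.Php"]
rows = [("2007-06-28", "Touch_Login.Php", "abc|28-06-07 10:00"),
        ("2007-06-28", "Call_Function_Login.Php", "abc|28-06-07 10:01"),
        ("2007-06-28", "Touch_Index.Php", "def|28-06-07 10:02"),
        ("2007-06-27", "Username_Password_Login.Php", "abc|27-06-07 10:00")]
print(coverage_report(rows, expected, "2007-06-28", "abc"))
# {'Login.Php': 0.4, 'Index.Php': 0.0}
```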

Wednesday, May 30 2007

Such a noisy thing with SWAAT

In one of my last posts, I compared two PHP source code security analyzers: SWAAT and PHP-SAT. The results almost suggested that SWAAT was much better than PHP-SAT.
I started working on the configuration of PHP-SAT and it looks quite powerful (after talking with Eric Bouwers, I'm waiting for the next release), and I think I will be able to get good results by combining a security-oriented configuration with some additional bug patterns.
On the other hand, SWAAT is really limited for now. For example, I wrote a simple PHP script containing only SQL queries: every line is flagged as flawed (with a MEDIUM level)!! This is simply stupid; it would be better to report nothing than to do that... just say that you don't support SQL injection for now... Anyway, SWAAT is still a tool to keep an eye on for me, and I will try to develop some features for it, especially for XSS detection and SQL injection findings...

Wednesday, February 7 2007

How you should design a test suite for Web Apps Scanners

If you have ever thought about using a web application scanner to test the security of your website, you certainly had to make a choice: which web application scanner should I buy/use?

In this post, I will not tell you which is the best black-box tester for any given kind of web application.

Web applications can be very different, the tools are different, and thus they can have different effectiveness... If you are reading this, you probably know that I am talking about scanners such as WebInspect, AppScan, Acunetix, Hailstorm, Pantera, Grabber... In the following sections, I will explain the main idea that should be used to test such a tool.

A test suite for these tools is a website, typically one with vulnerabilities; you can find this kind of website from Watchfire and SPI, but also WebGoat, SiteGenerator and others. But none of these websites is realistic, and none of them considers that a vulnerability may exist in different instances: variants.
If you don't know what a variant is, check out the XSS Cheat Sheet (RSnake) or the Attack Patterns (Sean Barnum). A variant is what the attacker uses to perform his exploit.
A simple example for XSS: let's say you protect your website against XSS by checking for the <script> tag; this is neither perfect nor good, because there are other ways to inject XSS strings (onmouseover, for instance).
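The weakness of such a filter is easy to demonstrate. A minimal sketch of a hypothetical filter that only strips `<script>` tags, and a variant that goes straight through it:

```python
import re


def naive_filter(user_input: str) -> str:
    """A weak defense: remove <script ...> and </script> tags, nothing else."""
    return re.sub(r"</?script[^>]*>", "", user_input, flags=re.IGNORECASE)


classic = "<script>alert(1)</script>"
variant = '<b onmouseover="alert(1)">hover me</b>'

print(naive_filter(classic))  # alert(1)  (harmless text)
print(naive_filter(variant))  # unchanged: the event handler still runs
```

A scanner that only tries the classic payload would conclude that this page is safe; one that knows the variants would not.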

Here comes the concept of Level Of Defense. If you are a developer, you think about filters; if you are an attacker, you think about variants of vulnerabilities and attack patterns. The level of defense of a website is the strength of its filters against a given vulnerability.

For SQL injection, you can have multiple types of filters... Here is a possible list of levels for SQL injection:

  • Level 0: Show SQL errors / No input filtering
  • Level 1: Hide SQL errors / No input filtering
  • Level 2: Typecasting (integer, string etc.)
  • Level 3: Escaping input strings
  • Level 4: Restricted accounts...
  • ...


In the concept of the level of defense, it's important to note that, depending on the type of vulnerability (weakness, failure...), either level n also performs level n-1, or level n is stronger than level n-1 for the same variants. For instance, for a weak hash function, stacking levels is not possible, but using SHA-1 instead of MD5 is a higher level of defense.

A key point: when you implement a level of defense for a vulnerability, you must make sure the implementation covers that whole type of filter. For example, if you escape HTML entities, you need to escape all of them, not only '<' and '>' at one level and then ' and " at the next LoD.
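Here is why a half-implemented level is misleading. A minimal sketch comparing a hypothetical filter that escapes only '<' and '>' with Python's `html.escape(..., quote=True)`, which also escapes &, " and ', on a payload injected into an HTML attribute:

```python
import html


def escape_partial(s: str) -> str:
    """Incomplete level: only < and > are escaped."""
    return s.replace("<", "&lt;").replace(">", "&gt;")


def escape_complete(s: str) -> str:
    """Complete level: & < > " and ' are all escaped."""
    return html.escape(s, quote=True)


payload = '" onmouseover="alert(1)'
template = '<input value="{}">'

print(template.format(escape_partial(payload)))
# <input value="" onmouseover="alert(1)">   (a new attribute is injected)
print(template.format(escape_complete(payload)))
# <input value="&quot; onmouseover=&quot;alert(1)">
```

The payload contains no angle bracket at all, so the partial filter does not touch it.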

Why is the level of defense better than a simple system with vulnerabilities?

With levels of defense, you can calibrate a website close to yours: you can build a test suite with your kind of level of defense and see how well the tool detects the vulnerabilities as the LoD increases. It is also a good way to learn the state of the art of the tools for detecting vulnerabilities...

The idea was developed to create a test suite for evaluating web application scanners; in this test suite we can select the current type of vulnerability and its level of defense (how hard it is to break).


Wednesday, January 10 2007

Test Suites for Web Application Scanners

For a while, I've been working on a test suite for evaluating web application scanners. Now I have a test suite (PHP/MySQL/AJAX) with a bunch of configurable vulnerabilities.


But there is a problem for a full evaluation. A web application is not just a simple set of scripts, databases and complex relations; there is also the server configuration, the infrastructure, different types of databases etc. Thus, I really have to create different test suites to cover well what web applications can be.
I plan to use:

  • Ruby On Rails framework
  • ASP.NET/MS SQL based application
  • JSP application


This should cover the different types of applications, but I still have to think about server types, architectures, multiple databases etc.

Friday, November 17 2006

Tools evaluation state...

For my work on the SAMATE project, on the evaluation of web application scanners, I made a website with a variable level of security, because I was not at all satisfied with the Watchfire or SPI Dynamics demo websites.

Then, I started to consider this website as a test suite... The problem is the gap that may exist between different types of tools: basic tools (Paros, Pantera, Wapiti etc.) and well-known commercial ones (WebInspect, AppScan, NTOSpider etc.), mainly because of AJAX.

Actually, I use AJAX in different parts of the website, such as the login system, registration and dynamic verification, and I'm sure that if a tool cannot interpret JavaScript, it cannot see the vulnerabilities in this code. Maybe the tools can parse some URLs... maybe I have to create another, "more classical" website with only {php,mysql,sessions,cookies}... Wait and see the first results.

Monday, October 30 2006

Wapiti! Piti piti

You: What a sense of humour!
Me: I know

By the way, this post is only here to share a URL: http://wapiti.sourceforge.net It is quite a simple web application scanner. I have to test it by Wednesday (when I'll give a presentation on web application scanners with demos).

And because I'm glad you're reading these lines, here is a new bit of 'stuff' on the OWASP website (meaning Cool Stuff For Web Developers/Security): Pantera
