I <3 Bots!
Keyword - testing

Sunday, August 10 2008

Why the "line of code" is indeed a good metric

When I first learned about source code metrics, I was amazed that people used the line of code to compare software. To me, it showed a lack of imagination.

At the beginning of the week, I started a small and fast experiment: extracting metrics from the SATE 2008 test cases. This experiment focuses on function-wise properties, so I have to extract a couple of metrics for each function:

  • McCabe's cyclomatic complexity, which measures code complexity; this is a good metric to estimate how difficult a given piece of code will be for a human to understand (very important for security-related problems)
  • Lines of code
  • Lines of comments
  • Number of local variables
  • Number of parameters (which represents the coupling between the function and the whole program)
  • Number of function calls
  • Number of functions that are ``sources''
  • Number of functions that are ``sinks''
  • Number of C standard functions (obviously, only for C test cases)
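
As a rough illustration of how a few of these metrics can be extracted without a full parser, here is a minimal Python sketch. It is not the tool used for the experiment: the name function_metrics and the keyword-counting approximation of McCabe (one plus the number of decision points) are assumptions made for the example. Like the real parsers, it does not pre-process the C, and it will be fooled by comment markers or question marks inside string literals.

```python
import re

# Decision points counted by the approximation: branching keywords,
# short-circuit operators and the ternary operator.
DECISION = re.compile(r"\b(?:if|for|while|case)\b|&&|\|\||\?")
BLOCK_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)
LINE_COMMENT = re.compile(r"//[^\n]*")


def function_metrics(body):
    """Lines of code, lines of comments and approximate McCabe for one C function."""
    block_comments = BLOCK_COMMENT.findall(body)
    without_blocks = BLOCK_COMMENT.sub("", body)
    line_comments = LINE_COMMENT.findall(without_blocks)
    code = LINE_COMMENT.sub("", without_blocks)

    comment_lines = sum(c.count("\n") + 1 for c in block_comments) + len(line_comments)
    loc = sum(1 for line in code.splitlines() if line.strip())
    mccabe = 1 + len(DECISION.findall(code))
    return {"loc": loc, "comments": comment_lines, "mccabe": mccabe}


# Made-up function, for illustration only.
SAMPLE = """
int copy(char *dst, const char *src, int n) {
    /* bounded copy */
    int i;
    for (i = 0; i < n && src[i]; i++) {
        dst[i] = src[i];   // one byte at a time
    }
    if (i == n) return -1;
    return i;
}
"""
print(function_metrics(SAMPLE))
```

The sample has three decision points (the for, the && and the if), so the approximation gives a complexity of 4.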

At first, the line of code was implemented because it's an easy one to compute and it also gives an important value if we want to normalize the other metrics. We also decided to introduce the number of ``sources/sinks'' for studying input validation weaknesses later on...

Anyway, after running some statistics on the output results, I was amazed to observe that the Pearson correlation coefficient between McCabe and Line of Code was never less than 0.90 (which could be compared to a 90% correlation rate). I have to say that there are huge limitations in the parsers we are using for extracting information: for instance, the C is not pre-processed. This result is only valid for the C test cases; the average observed correlation in the Java test cases is around 0.60...
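
For reference, here is a minimal sketch of the Pearson correlation coefficient in plain Python. The per-function values below are made up for illustration; they are not SATE 2008 data, and the function name pearson is just a name chosen for the example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (>= 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-function measurements (lines of code, McCabe).
loc = [12, 40, 7, 95, 23]
mccabe = [2, 9, 1, 21, 5]
r = pearson(loc, mccabe)
print(f"r = {r:.2f}, r^2 = {r * r:.2f}")
```

Note that the squared coefficient is the share of variance explained by a linear fit, so a coefficient of 0.90 corresponds to about 81%.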

Of course, further statistical analysis will be necessary to conclude anything on this subject. If we were unlucky with the test case selection, this may have biased the result, but I don't think we were. Actually, it seems quite logical to think that these metrics are related: the longer the code is, the more complex it can be in terms of tests, loops, etc.; there is indeed more chance that longer code contains more cycles :)

Oh well, I'll keep writing about this, especially since I expect to get results pretty soon...

Friday, May 16 2008

Yet another study on code quality: A Tale of Four Kernels

If, like me, you are interested in code quality and in the general conclusions that one can draw from code quality studies, I really recommend reading this paper: A Tale of Four Kernels by Diomidis Spinellis, ICSE '08: Proceedings of the 30th International Conference on Software Engineering.

I just want to quote a part of the author's conclusion:

Therefore, the most we can read from the overall balance of marks is that open source development approaches do not produce software of markedly higher quality than proprietary software development.

The only problem with this statement is that the metrics he used were not weighted by their importance for "Code Quality" (if this means something). Therefore, the comparison between the Windows Research Kernel and Linux seems a little bit awkward to me. Anyway, this is a very interesting paper about code quality, with lots of interesting ideas from the author of CScout.

Monday, February 25 2008

Code review: facilitate the SCA output analysis

This post is not exactly a follow-up to a previous post called Code review tools: the missing link (so far). But since I will have to perform a lot of code review and tool output analysis in the next couple of weeks, I was looking for a tool to make my job easier. I've been asking people for links, tips, etc. but nothing really convinced me. I am looking for a tool that can smartly index the source code I am reviewing, which means that I want to be able to look at the variables and see where they are declared, assigned and used... I also want to see the call graphs of functions, mostly to check the correctness of tool output.

After a couple of hours looking at specialized tools, I was not able to find something good and free (No, I don't call cscope good!). Yes, there are a couple of commercial ones, especially the ones shipped with the commercial source code analyzers and well, they're not perfect at all!

So, this morning, I was kinda frustrated when I thought of using a tool I have used a lot, but for a quite different purpose: Doxygen. You may know this documentation tool, but may not know all it is capable of.

As a documentation generation tool, it is really powerful and mostly based on specially formatted comments that developers put in the source code. But the tool also generates a bunch of structure-related information such as class relations, function call graphs, etc. As I don't want to generate documentation for the code I'm reviewing, I don't mind not having the well-formatted comments. I am asking this tool to generate the structural information and to facilitate the navigation from function to function.

I made a small example of the report generated by Doxygen using the configuration I made to get all the information I wanted (only one page since the documentation, the pictures, etc. are kinda big...). In order to generate the configuration I wanted, I made a tiny Python script, ozone.py, since the DoxyWizard is not really convenient for that. Also, I will add a step to pre-compile the JSP files, since Doxygen doesn't understand the JSP syntax, and the option to use the Doxygen search engine (a PHP script that uses a file with indexed tags).

This is the first step of that script. As you may see by looking at the source code, I am also generating the XML files, because the XML documentation generated by Doxygen contains a lot of interesting information that I may use later... Also, while looking at the Doxygen source code, I thought that it could be possible to integrate many more static analyses, such as computing metrics. Anyway, so many other things to do than thinking about that right now!

Thursday, January 31 2008

Talk: Problems and solutions for testing web application security scanners

I just came back from the DHS Forum on Software Assurance, where I gave a talk about testing web application security scanners, and especially about the problems and some solutions for testing the scanners.

The presentation is an introduction to a methodology I've been developing at NIST for a while now. It is the follow-up to the Verify Conference slides and to the talk I gave at HICSS (I will release the slides from that presentation when engadget.com has fixed the vulnerabilities that I used to show the different variations of attacks when introducing the levels of defense).

You can reach the DHS Forum slides as a Google presentation.

Wednesday, January 30 2008

Definition parsing: first step done

Since I started to work on my static analyzer using php-ast/oracle, I realized that looking for vulnerabilities needs a lot of hard-coded/database entries. This is really sad since, in order to get something correct, you would need a huge knowledge database. So I started thinking about generalizing vulnerabilities and about a way to express them. It's tough. Really.

The most realistic (if I can say so) idea I had is to handle vulnerability definitions using a given taxonomy. I still need a lot of knowledge, especially about the language (PHP) I'm analyzing: the output functions, global variables, filters, resources, etc. But the big advantage with rules is that you can generalize the definition.

Anyway, I started dealing with natural language and will try to make this fit into my model in order to communicate with the future static analyzer engine of php-oracle... and thanks to the AIMA project, I was able to get some fast results on the processing:

# source definition:
unvalidated input go to sink in html context
# parse tree:
2 possiblities
##
  02NP[('Adjective', 'unvalidated'), ('Noun', 'input')][]
      23VP[('Verb', 'go')][]
        45NP[('Noun', 'sink')][]
       ('Preposition', 'to')
      35PP[]
     
    25VP[]
      68NP[('Name', 'html'), ('Noun', 'context')][]
     ('Preposition', 'in')
    58PP[]
   
  28VP[]

08S[]
##
  02NP[('Adjective', 'unvalidated'), ('Noun', 'input')][]
    23VP[('Verb', 'go')][]
        45NP[('Noun', 'sink')][]
          68NP[('Name', 'html'), ('Noun', 'context')][]
         ('Preposition', 'in')
        58PP[]
       
      48NP[]
     ('Preposition', 'to')
    38PP[]
   
  28VP[]
 
08S[]

And the taxonomy I used is the following (which needs to be extended to handle more than "input validation"):

IV = Grammar('InputValidation',
	Rules(
		S = 'NP VP | S Conjunction S',
		NP = 'Pronoun | Noun | Article Noun | Adjective Noun | NP PP | NP RelClause | Name Noun',
		VP = 'Verb | VP NP | VP Adjective | VP PP',
		PP = 'Preposition NP',
		RelClause = 'That VP'
	),
	Lexicon(
		Noun = "input | output | privilege | context | header | user | sink | file",
		Verb = "is | go | write | print",
		Adjective = "validated | unvalidated | asynchronous",
		Pronoun = "me | you | i | it",
		Name = "html | database | http | sql | ldap",
		Article = "the | a | an",
		Preposition = "to | in | on",
		Conjunction = "and | or | but | not",
		That = "that"
	))
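
To see where the two parse trees come from, here is a small self-contained Python sketch that counts the derivations of a sentence under the grammar above. It is not the AIMA chart parser used for the output shown earlier: it is a plain memoized span counter, the names count_parses, derive and sequence are made up for the example, and it assumes (as is the case here) that no rule rewrites a non-terminal directly into another single non-terminal.

```python
from functools import lru_cache

# Rules and lexicon copied from the InputValidation grammar above.
RULES = {
    "S": "NP VP | S Conjunction S",
    "NP": "Pronoun | Noun | Article Noun | Adjective Noun | NP PP | NP RelClause | Name Noun",
    "VP": "Verb | VP NP | VP Adjective | VP PP",
    "PP": "Preposition NP",
    "RelClause": "That VP",
}
LEXICON = {
    "Noun": "input | output | privilege | context | header | user | sink | file",
    "Verb": "is | go | write | print",
    "Adjective": "validated | unvalidated | asynchronous",
    "Pronoun": "me | you | i | it",
    "Name": "html | database | http | sql | ldap",
    "Article": "the | a | an",
    "Preposition": "to | in | on",
    "Conjunction": "and | or | but | not",
    "That": "that",
}
GRAMMAR = {lhs: [tuple(alt.split()) for alt in rhs.split("|")]
           for lhs, rhs in RULES.items()}
WORDS = {cat: {w.strip() for w in spec.split("|")} for cat, spec in LEXICON.items()}


def count_parses(sentence, start="S"):
    """Number of distinct parse trees of `sentence` rooted at `start`."""
    words = tuple(sentence.lower().split())

    @lru_cache(maxsize=None)
    def derive(symbol, i, j):
        # Number of ways `symbol` can cover words[i:j].
        if symbol in WORDS:  # lexical category: exactly one known word
            return int(j == i + 1 and words[i] in WORDS[symbol])
        return sum(sequence(alt, i, j) for alt in GRAMMAR[symbol])

    @lru_cache(maxsize=None)
    def sequence(symbols, i, j):
        # Number of ways the symbol sequence can cover words[i:j].
        if len(symbols) == 1:
            return derive(symbols[0], i, j)
        rest = len(symbols) - 1  # every remaining symbol needs at least one word
        return sum(derive(symbols[0], i, k) * sequence(symbols[1:], k, j)
                   for k in range(i + 1, j - rest + 1))

    return derive(start, 0, len(words))


print(count_parses("unvalidated input go to sink in html context"))
```

The ambiguity is the classic prepositional phrase attachment: "in html context" can attach either to the verb phrase ("go to sink") or to the noun "sink", which is why the definition has two readings.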

Now, I only have to finish my model of a vulnerability (I am not thinking of building something really general, but a model that can handle injection flaws, privilege and communication problems would be awesome). Once this is finished, lots of things would be possible, such as generating attacks directly from the definition (this would be more like a generalized attack generator) and vulnerability checkers for the source code analyzer.

I know this is a kinda tough project and I really have lots of other things to do, but I really want to give this a try... just to see where it goes...

Monday, January 28 2008

How come I didn't know this resource!!

While surfing the web, I found this website: http://opensourcetesting.org/.

Just the perfect repository of testing tools: there are a bunch of them in different testing areas (security, functional, quality, unit testing and so on!).

Edit: Added to my security planet!

Wednesday, December 5 2007

Static Analysis Framework: PHP-Ast/Oracle

In my previous blog post, I talked briefly about PHP-Ast/Oracle, a static analysis framework for PHP source code. I am developing it in order to play with source code and security. The goal of the framework is to be able to perform different types of operations on PHP source code. I am releasing this tool as it is because I think people may be interested in it... Anyway, I learned a lot doing this.

PHP-Ast/Oracle is developed in C++ and the tool has been developed mainly for:

How it works

The source code repository is divided into 2 parts:

  • php-ast is the converter from PHP to XML
  • php-oracle is the actual engine

php-oracle takes as input an XML file, which is the output of php-ast. In the SVN repository there are some Python scripts I used to combine the 2 tools (they may be outdated, i.e. they may not work with the current php-oracle).

How I think you could use php-oracle

I do not intend to make a clean build with an executable, etc. I just provide the source code, because I don't want to spend too much time on creating clean software; it's only research-oriented stuff. Furthermore, there is not much documentation in the source code (the advantage of being alone in developing such a tool), so only really interested people will download this! I can then help them if they have questions about how it works.

Getting the source code

You can download the source here: php-ast-oracle.zip

And the trac repository has more documentation about what the framework actually does: http://trac2.assembla.com/php-ast

Development

The tool is in perpetual development. I don't want to create real software from it, but I think people can use it to perform security analysis, compute stuff, make code transformations and so on.

Sunday, December 2 2007

Yet another study oriented release

I've been working for a couple of months on a project named php-ast/oracle. I am opening the source of the project today because I think that people may be interested in such code. Roughly, php-ast/oracle is able to extract and transform information from PHP source code. I used it for: creating real obfuscations (control-flow, data-flow), implementing security metrics, writing a converter from PHP to C++ for static analysis purposes, and some other stuff such as variable flow. You can find more information here: http://trac2.assembla.com/php-ast. I may post about this project later; I don't have much time now...

But this news is only for releasing a script I have used a lot these last weeks: a PHP preprocessor. I've been using this preprocessor to clean up the crappy PHP code we can find in the wild, so that php-ast/oracle can be used correctly for calculating security metrics and so on.

The preprocessor is actually doing 3 things:

  • Simplifying the strings (keeping only the PHP variables in the strings; this is really important for keeping the AST small with SQL queries and so on, because strings can be evaluated in PHP and the AST would otherwise need to tokenize them)
  • Removing comments and HTML
  • Resolving the file inclusions (not for inclusions through dynamic variables of course, but it works with defined names and static names)

The preprocessor is available here: preproc.zip

Wednesday, November 21 2007

The new grabber

Grabber was a nice project. The main goal for me was to learn about web application security and scanners; I didn't really know much before I started this project. But now that I've been playing with web apps scanners for more than 10 months, I need to create a new one and go deeper into heuristics, browser integration and AI.

Grabber was in fact more a spider+fuzzer than anything else... Not a good web apps scanner at all. As for the analysis engine... it's kinda stupid: no JavaScript execution, just simple heuristics for parsing and Levenshtein distances ;)

Anyway, I decided to start this project over. It's not gonna be a bunch of Python scripts anymore; I am gonna use Qt/C++ extensively. The idea of this project is to be pen-tester oriented and open: I want to create a kind of wrapper around WebKit (using QtWebKit), with a spider as core utility, and then use plugins. The plugins should be either in C++ or JavaScript (QtScript actually). So far, we are 3 guys thinking about this project: we haven't started yet but we are open to every contribution; the project will of course be free and GPL'd.

I am just posting this in order to get some comments or suggestions about what a web apps scanner should do... Feel free to comment/mail...

Thursday, November 1 2007

My talk at the Verify Conference

Last Tuesday, I went to the Verify conference to give a talk about Web application scanners evaluation: what we are actually doing at NIST. I'm gonna make a simple entry reviewing what I actually talked about. The slides are here.

First of all, the evaluation was made with a test suite I made. The choices for the test suite are kinda simple: I wanted something really close to a real website, so I decided to use a real website (not a couple of test cases). The website contains multiple seeded vulnerabilities of different kinds (XSS, SQLi, RFI, CSRF, etc.). The website is configurable in terms of vulnerabilities: you can choose which vulnerabilities will be in the website (let's say, I only want to have XSS vulnerabilities). Moreover, in order to see the web apps scanners' capabilities, we can select a type of defense for the current protection: the level of defense.

Level of defenses

Programmers are different. They have different backgrounds, knowledge and approaches to solving security problems. The filters we can see in web applications in the wild are not equivalent: some are good, some are just bad, and we have the full range of effectiveness. So, in order to test web apps scanners with different difficulties (for them), we implemented different levels of protection around the vulnerabilities: the levels of defense.

A simple example: SQL Injection

  • Level 0: No protection
  • Level 1: Typecasting (in order to convert integers, booleans, doubles, strings, dates, etc.). This protection will limit SQL injection on native SQL number types (integers will be converted to integers: 1' OR 1=1-- will be converted into 1).
  • Level 2: Escaping the meta-characters. We are protecting against quote injection, etc.
  • Level 3: Hiding the MySQL errors; we will now have blind SQL injections.
  • Level 4: Restricted user management.
  • Level 5: Using prepared statements.

Since the levels of defense will be used in combination, the order is important (combination: level 2 = level 2(level 1(level 0))). So, using these levels of defense, we are able to select how difficult it will be for the tool to break the vulnerabilities. For the results, if you look at the detection rate slide, you'll see that there is no result for level 2, which means that no tool was able to find vulnerabilities at level of defense 2.
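
The combination can be pictured as function composition. The following is only a sketch of levels 0 to 2 in Python, not the code of the test application: the function names are made up, the typecast imitates a PHP-style integer cast (keep a leading integer, else 0), and levels 3 to 5 are left out because they change the server configuration or the query API rather than the input value.

```python
import re


def level1(value, numeric):
    """Level 1: typecasting. A numeric field keeps only its leading integer."""
    if not numeric:
        return value  # a string field stays a string
    match = re.match(r"\s*[+-]?\d+", value)
    return str(int(match.group())) if match else "0"


def level2(value):
    """Level 2: escape quote and backslash meta-characters."""
    return re.sub(r"([\\'\"])", r"\\\1", value)


def defend(value, level, numeric=True):
    """Apply the defenses in order: level 2 = level2(level1(level0(value)))."""
    if level >= 1:
        value = level1(value, numeric)
    if level >= 2:
        value = level2(value)
    return value  # level 0 returns the raw input


attack = "1' OR 1=1--"
for level in range(3):
    print(level, defend(attack, level))
```

With a string field (numeric=False) the typecast changes nothing, and only the escaping of level 2 stands between the quote and the query.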

Attack Surface Coverage

Another point I have been working on is the attack surface coverage. A web apps scanner is not a simple piece of software which launches attacks! The crawling/parsing step is really important, maybe the most important, since this is where the scanner tries to understand the application. The attack surface of the test application is the set of places where the user has a direct interaction: no algorithms, etc., just input handling, error messages, etc.

Here is an example of attack surface coverage check points (with numbers) for a login function:

(1) Touch the file [login.php]
if ( all fields are set ) then
	(2) All fields are set [login.php]
	Boolean goodCredentials = checkThisUser(fields)
	if ( goodCredentials ) then
		(3) Credentials are correct; the User is now logged in [login.php]
		registerCurrentUser()
	else
		if ( available login test > 0 ) then
			(4) Login information incorrect [login.php]
			displayErrorLogin()
			available login test -= 1
		else
			(5) Too many tries with wrong credentials [login.php]
			displayErrorLogin()
			askUserToSolveCAPTCHA()
		endif
	endif
endif

Basically, we would like the scanner to use the normal behavior paths and also the abnormal ones (errors, etc.) in order to find vulnerabilities there, such as information leakage. Just a note about the attack surface coverage rate: this number cannot be interpreted alone. You need to use it together with the detection rate and the false positive rate. In the slides you can see that tool A has a 25% attack surface coverage of the application, but this is also the tool with the best findings and no false positives. This means that the tool was able to find 33% of the vulnerabilities (the best result of the 4 scanners we tested) in 25% of the application, which can be considered accurate compared to the others.

The attack surface coverage may have an important impact, depending on what type of testing you are doing with your web apps scanner. If you want a tool to run at the end, doing a full assessment, then you will need a tool which has a very good coverage (since you only rely on that). But if you are looking for a tool which is fully integrated in your testing process (testing == quality and security) then, I think it's better to have an accurate tool which will cover a lower surface, as long as it covers the important points.
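
To make the check points of the login example concrete, here is a runnable Python transcription that records which of the five check points a series of requests reaches. It is a toy, not the test application: the class name LoginPage, the credentials and the number of allowed attempts are all hypothetical.

```python
class LoginPage:
    """Toy login page that records which check points the requests reach."""

    TOTAL_CHECK_POINTS = 5

    def __init__(self, valid_user="alice", valid_password="secret", attempts=3):
        self.valid = (valid_user, valid_password)  # hypothetical credentials
        self.attempts_left = attempts              # hypothetical retry budget
        self.reached = set()

    def request(self, fields):
        self.reached.add(1)              # (1) touch the file
        if not (fields.get("username") and fields.get("password")):
            return "form"
        self.reached.add(2)              # (2) all fields are set
        if (fields["username"], fields["password"]) == self.valid:
            self.reached.add(3)          # (3) credentials are correct
            return "logged in"
        if self.attempts_left > 0:
            self.reached.add(4)          # (4) login information incorrect
            self.attempts_left -= 1
            return "login error"
        self.reached.add(5)              # (5) too many tries
        return "login error + CAPTCHA"

    def coverage(self):
        """Share of the check points reached so far."""
        return len(self.reached) / self.TOTAL_CHECK_POINTS


# A scanner that only loads the page and submits one wrong password.
page = LoginPage()
page.request({})
page.request({"username": "alice", "password": "wrong"})
print(sorted(page.reached), page.coverage())
```

A scanner that never logs in successfully and never insists long enough to trigger the CAPTCHA leaves check points 3 and 5 untouched, which is exactly the kind of gap the coverage rate is meant to expose.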

Conclusions

It is actually hard to draw a really strong conclusion from the results given in the slides. The test application is a really simple website (a banking application) and is far from a real company website; this is a huge confounding factor. Another problem is that I did the evaluation one vulnerability at a time (and one level of defense at a time). This prevents a couple of real-life behaviors...

Tuesday, October 16 2007

Stuck at data-flow? Do box-modeling!

Since yesterday, I've been working on a data-flow problem. I need to model a function, and I should go through the whole data-flow process. Well, that's kinda long if I have to do it for all functions, especially since I will never use much of the information I would generate by analyzing the tree associated with the function (local variables, etc.). So what's the point of doing that? None.

I was stuck at this point and didn't find a good way to model a function (entry parameters, global calls, etc.), so I thought of reasoning as with a crystal ball: I can see what it is, but it's kinda blurry :) I am now modeling a function as inputs and outputs, only in terms of its interactions with functions and global variables. This way, I should be able to see the possible interactions of the given function with the system. Hope it's gonna work well!

Wednesday, October 10 2007

Working around security metrics...

I'm not gonna write a long entry about Security Metrics, but since I've been working on this for a couple of weeks now, I have some thoughts. Evaluating the security of source code is actually pretty hard. Even if I'm sure there are a lot of source code security metrics out there, they are often (I guess) hard to compute. Basically, you need to know lots of things about the source code, so you need an engine working on the AST, data-flow, etc.

This is what I've been building for a couple of months: an engine which works on an XML AST generated by yaxx (this is the same engine that I use to do source code modifications, obfuscations, etc.).

Vadim Okun and I had the idea of computing the "size" of the security in source code. The idea is pretty simple, and we are aware that it is limited to implementation flaws and not design flaws for now. The "size" of the security is the number of inputs going to sinks.

The inputs have to be taken in a broad sense: these are in fact all the variables that are derived from direct inputs. Here is a simple example of the variable diffusion:

$a = $_GET['foo'];
$b = htmlentities($a);
echo $b;

Here we are counting $a and $b, since $b is a modification of $a, which is a direct input. We use the same method for all possible modifications (concatenation, cast, etc.).

Once we know these variables, we count the ones that go to sinks. The sinks are a list of functions such as 'echo', 'mysql_query', 'fopen', and so on. Our list of sinks comes directly from the PHP-SAT project. In the previous example, the metric result is 1, since there is only one sink, 'echo', that a derived input goes to.
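
The counting can be sketched in a few lines of Python over a hand-written list of statements. This is a simplification and not the engine itself: the real analysis runs on the XML AST, whereas this sketch assumes straight-line code with no branches or function definitions, and the tuple encoding of statements, the name security_size and the exact contents of SOURCES are assumptions made for the example.

```python
SOURCES = {"_GET", "_POST", "_COOKIE", "_REQUEST"}  # direct inputs (assumed list)
SINKS = {"echo", "mysql_query", "fopen"}            # short excerpt of a sink list

# Each statement is ("assign", target, [variables read])
# or ("call", function, [variables passed]).
PROGRAM = [
    ("assign", "a", ["_GET"]),  # $a = $_GET['foo'];
    ("assign", "b", ["a"]),     # $b = htmlentities($a);
    ("call", "echo", ["b"]),    # echo $b;
]


def security_size(program):
    """Return (variables derived from inputs, number of sinks they reach)."""
    derived = set()
    sinks_reached = 0
    for kind, name, operands in program:
        from_input = any(v in SOURCES or v in derived for v in operands)
        if kind == "assign":
            if from_input:
                derived.add(name)      # any modification keeps the variable "derived"
            else:
                derived.discard(name)  # overwritten with a value not coming from inputs
        elif kind == "call" and name in SINKS and from_input:
            sinks_reached += 1
    return derived, sinks_reached


print(security_size(PROGRAM))
```

As in the post, the filter htmlentities does not remove $b from the count: the metric measures how much input reaches sinks, not whether the flow is exploitable.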

And here we are: this is a fairly simple (in the idea, not the implementation) way to evaluate the possible security problems that you can have in your source code. We are going to try to evaluate this metric on different open source projects (WordPress, Joomla, MediaWiki, etc.). I'm sure this is really incomplete: first because we are only counting the security problems that come from inputs, but also because it really depends on the programmer (his style of programming).

Another example is available here: smetric.pdf

Next Improvements

For the revised version, the first addition would be to count the output validation problems. But for that purpose, I need a stronger data-flow analysis which would analyze inside function definitions (not done yet). Then, I will be able to trace everything coming from supposedly secure sources (databases, resources, local files, etc.) to sinks. Maybe the weight of such flows would be different from that of the first one (input to sink)...

Tuesday, September 4 2007

Source Code Obfuscation

Source code obfuscation is actually a powerful tool for testers, whether you use it to obfuscate your bytecode (Java, .NET, etc.) or to increase the code complexity of your current source code.

Working at SAMATE, we are also playing with, tweaking, testing and stressing source code analyzers. And now you see the relation: I'm writing a source code obfuscator in order to increase the complexity of our test cases and see if the tools are still doing well.

Thus, I was able (with good documentation, and yaxx) to create one. It currently only adds control flow complexity (and of course renames classes, functions and variables).

Some words on obfuscation

You may have heard about obfuscation in the sense of making the code unreadable for users. This is not what I'm interested in. I want to modify the actual source code, adding some information to it, some tests... I need the outputs of the original program and of the obfuscated one to be the same; otherwise we cannot consider the two source codes equivalent.

So for example if I do:

if (var == 0) { 
  echo 0;
}

I will have the same behavior with this source code:

x = some_value;
if (var == 0 or x*x < 0) {
  echo 0;
}

Even though they have the same output (x*x is never negative, so the added condition is always false), the second one is more complicated since it adds one more test.
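
The equivalence is easy to check by brute force on a small range. Here is a minimal Python sketch of the two versions; the function names original and obfuscated are only chosen for the example.

```python
def original(var):
    """Original test: output "0" only when var == 0."""
    return "0" if var == 0 else ""


def obfuscated(var, x):
    """Same test plus an opaque predicate: x * x < 0 never holds for a real number."""
    return "0" if (var == 0 or x * x < 0) else ""


# The behavior is preserved for every var and every value of the extra variable x.
assert all(original(v) == obfuscated(v, x)
           for v in range(-3, 4) for x in range(-100, 101))
print("same output on all tested values")
```

Python integers never overflow, so the predicate is safe here. In a language with wrapping fixed-width integers, x*x can become negative for large x, so an opaque predicate like this one has to be chosen with the target language's arithmetic in mind.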

Test case wise example

To see exactly what it does, given this original source code

<?php
	$b = 0;
	$c = "Salut";
	$len = strlen($c);

	function fct($a) {
		return $a . "_1\n";
	}
	
	class T {
		function foo() {
			echo "test\n";
		}
	}
	echo fct(0);
	echo T::foo();
?>

...and by choosing one of my control flow obfuscating patterns

<?php
	class rand_class_name {
		function rand_func_name_2($rand_name_1) {
			return $rand_name_1 + 1;
		}
	}

	function rand_func_name_1($rand_name_2) {
		return $rand_name_2 + 1;
	}
	
	if (rand_func_name_1(0) > 0 && rand_class_name::rand_func_name_2(0)) {
		$enter_the_new_statement;
	}
?>

To have this result:

<?php
	function HXvE5Plwxp0RSoQM ( $ZMfP98Az96Rq67j6 ) {
		return $ZMfP98Az96Rq67j6 + 1 ;
	}
	class TF03COvMuzXRQcCK {
		function Ltghf3a0McCI8RaZ ( $V309os5vQo15ak9b ) {
			return $V309os5vQo15ak9b + 1 ;
		}
	}
	$b = 0 ;
	$c = "Salut" ;
	$len = strlen ( $c ) ;
	function fct ( $a ) {
		return $a . "_1\n" ;
	}
	class T {
		function foo ( ) {
			echo "test\n" ;
		}
	}
	if ( HXvE5Plwxp0RSoQM ( 0 ) > 0 && TF03COvMuzXRQcCK :: Ltghf3a0McCI8RaZ ( 0 ) ) {
		echo fct ( 0 ) ;
	}
	if ( HXvE5Plwxp0RSoQM ( 0 ) > 0 && TF03COvMuzXRQcCK :: Ltghf3a0McCI8RaZ ( 0 ) ) {
		echo T :: foo ( ) ;
	}

?>

How it actually works

First of all, the engine only works on the Abstract Syntax Tree (AST) in order to do powerful manipulation and code refactoring. The idea is to take a couple of transformation patterns (the second source code above is in fact a complicated one) and fit these patterns to the original source code.

The patterns are meta code. You can see that they are in PHP, using some names such as $rand_name_1; this means that the engine will generate one unique name for each of them and substitute it before the actual refactoring.

Selecting what I want to obfuscate is not a real problem, but for now I only select the top-level statements and apply the whole modification to each of them.

A small diagram explaining roughly how it works is available here: schema_obfuscation.png
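
The name substitution step can be sketched with a regular expression over the pattern text. This is only an illustration in Python: the real engine is written in C++ and works on the AST, not on text, and the function name instantiate, the placeholder regex and the 16-character name format (guessed from the result shown above) are assumptions.

```python
import random
import re
import string

# Placeholders used by the pattern: rand_name_N, rand_func_name_N, rand_class_name.
PLACEHOLDER = re.compile(r"\brand_(?:class_|func_)?name(?:_\d+)?\b")

PATTERN = r"""
class rand_class_name {
    function rand_func_name_2($rand_name_1) {
        return $rand_name_1 + 1;
    }
}
function rand_func_name_1($rand_name_2) {
    return $rand_name_2 + 1;
}
if (rand_func_name_1(0) > 0 && rand_class_name::rand_func_name_2(0)) {
    $enter_the_new_statement;
}
"""


def instantiate(pattern, rng=None):
    """Replace every placeholder by a fresh name, the same one for each occurrence."""
    rng = rng or random.Random()
    alphabet = string.ascii_letters + string.digits
    names = {}

    def fresh(match):
        key = match.group()
        while key not in names:
            # Start with a letter so the result is a valid PHP identifier.
            candidate = rng.choice(string.ascii_letters) + "".join(
                rng.choice(alphabet) for _ in range(15))
            if candidate not in names.values():
                names[key] = candidate
        return names[key]

    return PLACEHOLDER.sub(fresh, pattern), names


code, mapping = instantiate(PATTERN, random.Random(0))
print(code)
```

The marker $enter_the_new_statement is deliberately left alone here: it is where the original statement gets inserted, which is a tree operation rather than a renaming.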

What's next

The applied control flow obfuscating pattern is one of the many I have for now (many more to come), and I guess this is kinda promising; lots of interesting studies should come now.

Currently the tool is only for PHP, but I should make it general by using my own AST node names and then be able to do code transformations on C, C++, Java, etc.

There is no release of the tool (written in C++) right now; I will wait until it's more than correct and clean. I also need to do data obfuscation (using indirections, etc.). The program will of course be public and free for everybody when it's ready.

Monday, July 23 2007

Python script utility called wwwCall and Grabber news

wwwCall: HTTP(S) utilities

wwwCall is a very small module for Python (tested under Python 2.5 but should be okay for Python >= 2.3) which handles HTTP(S) connections with some special features like proxy, cookies and authentication (basic, digest). This morning, I was working on Grabber and I just realized how ugly the code was, mostly because of how I handled the web connections, so I decided to create a simple module to do the job easily. The idea is to have a single object handling some basic functions of Python's urllib2.

If you have ever used Python for doing web calls, you'll see that the usage is damn simple and, I think, pretty cool... Example:

# create the object
http = wwwCall('http://rgaucher.info')
# add the features you want (cookies,auth)
http.setCookieFile('./the_path/file.cookie')
# reach a login URL and save the cookie
http.post("http://rgaucher.info/login.php",{'username' : 'foo', 'password' : 'bar'})
# register the username/password for the basic authentication
http.setAuthBasic("romain","mypassword")
# print the content of the protected page
print http.get("http://rgaucher.info/401protected").read()
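
For readers on Python 3, where urllib2 became urllib.request, the same idea of one object holding the cookies and the credentials can be sketched as follows. This is not wwwCall: the class WebCall and its method names are hypothetical, only cookies and basic authentication are covered, and nothing is sent over the network until get or post is called.

```python
import http.cookiejar
import urllib.parse
import urllib.request


class WebCall:
    """Hypothetical stand-in for wwwCall, built on urllib.request."""

    def __init__(self, base_url):
        self.base_url = base_url
        self.cookies = http.cookiejar.MozillaCookieJar()
        self.passwords = urllib.request.HTTPPasswordMgrWithDefaultRealm()
        # One opener shared by every call, so cookies and credentials persist.
        self.opener = urllib.request.build_opener(
            urllib.request.HTTPCookieProcessor(self.cookies),
            urllib.request.HTTPBasicAuthHandler(self.passwords),
        )

    def set_auth_basic(self, user, password):
        # Realm None: use these credentials for any realm under base_url.
        self.passwords.add_password(None, self.base_url, user, password)

    def save_cookies(self, path):
        self.cookies.save(path, ignore_discard=True)

    def get(self, url):
        return self.opener.open(url)

    def post(self, url, fields):
        data = urllib.parse.urlencode(fields).encode("utf-8")
        return self.opener.open(url, data)
```

Keeping a single opener is the whole point of the design: a session cookie set by a login POST is sent back automatically on the next GET.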

Download: wwwCall.zip

The next Grabber

So, I've been working on Grabber for a couple of months without a release now; it's mainly because I don't have that much time to work on it, but also because I made lots of modifications. Today I added a couple of features:

  • Understanding some mod_rewrite rules for the spider
  • URL exclusion
  • Basic/Digest Authentication

This comes in addition to the previous features I added, mainly:

  • Multi Site
  • Multi threads
  • Cookie analyzer
  • XSS Locator in addition of the XSS Fuzzer which is definitely faster
  • Spider module, only to crawl the site and export it in XML
  • Login ability, keeping session state

I cannot give a d-day for the release of the 0.2 version because I really want to have a more stable product, and I will feed the tool some test suites I made at work to be sure it's reasonable (I will not give comparison results with commercial products :P). I also want to have a better spider...

Tuesday, July 10 2007

Website functionalities coverage

Coverage is a tool written in Python which allows you to track which functionalities/web pages are reached on your website. I use this tool in my web apps scanner evaluation methodology in order to know if the web apps scanner was able to scan every page and every functionality of my test apps.

Anyway, this tool is pretty easy to use even if it requires a MySQL database to store the EntryPoints of the application. Basically, you set up the database, you insert the entry points into your code and you run the Python script, which will generate an HTML report with SVG graphs, reporting the coverage of your application.

Here is a report example

Installation

1/ Database

The database design I used for storing the needed information is the following:

CREATE TABLE `coverage` (
`CoverageID` int(32) NOT NULL auto_increment,
`Apps` varchar(128) character set utf8 collate utf8_unicode_ci NOT NULL,
`Date` date NOT NULL,
`EntryPoint` varchar(255) character set utf8 collate utf8_unicode_ci NOT NULL,
`Origin` varchar(255) character set utf8 collate utf8_unicode_ci NOT NULL,
PRIMARY KEY  (`CoverageID`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=1;

  • Apps: name of the covered application
  • Date: date when the entry point is reached
  • EntryPoint: name of the entry point, with a special format:

** File Reached:
Touch_ + name of the file with its extension, for example Touch_Index.Php, Touch_Search.Php, etc.

** Functionality Reached:
Name of the functionality + _ + name of the file with its extension. For example, this is the sequence of entry points of the page Login.php of a given application:

  1. Touch_Login.Php : Enter the page Login.Php
  2. Username_Password_Login.Php : The username and the password are filled in
  3. Call_Function_Login.Php : Call the function login()
  4. Call_Function_Succeed_Login.Php : The function login succeeded
  5. Call_Function_Error_Login.Php : The function login reported an error

  • Origin: the origin string is the concatenation of the MD5 of the HTTP_USER_AGENT, a pipe and the date; this ID + date is used to be sure to study the same user.
<?php
// ...
$origin = md5($_SERVER['HTTP_USER_AGENT']). '|' . date("j-m-y H:i");
?>
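
Since the reporting script is written in Python, it can help to build and split the same Origin string on that side. This is only a minimal sketch, not code taken from coverage.py: the function names make_origin and split_origin are hypothetical, and it assumes the user agent is encoded as UTF-8 before hashing.

```python
import hashlib
from datetime import datetime
from typing import Tuple


def make_origin(user_agent: str, when: datetime) -> str:
    """Build 'md5(user agent)|date', mirroring the PHP snippet above."""
    digest = hashlib.md5(user_agent.encode("utf-8")).hexdigest()
    # PHP date("j-m-y H:i"): day without leading zero, then two-digit
    # month and year, then 24-hour time with minutes.
    stamp = f"{when.day}-{when:%m-%y %H:%M}"
    return f"{digest}|{stamp}"


def split_origin(origin: str) -> Tuple[str, str]:
    """Return (user_agent_md5, date_part) from an Origin string."""
    digest, _, stamp = origin.partition("|")
    return digest, stamp
```

The first part returned by split_origin is the user ID that identifies one browser across all of its recorded entry points.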

2/ In the code

So, you will need to add lots of entry points to the code of your application. I wrote a small PHP class to make that easier:

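<?php
// Records the entry points reached by a user in the `coverage` table.
class Coverage {
 private $coverage_id = false;
 private $coverage = null; // MySQL link, false if the connection failed

 function __construct() {
  $this->coverage_id = true;
  $this->coverage = mysql_connect('192.168.1.3:3306', 'test', 'test');
  if ($this->coverage) {
   mysql_select_db("test_collect", $this->coverage);
  }
 }

 // Insert one row for the given entry point (e.g. 'Touch_Login.Php').
 function send($entryPoint) {
  if ($this->coverage) {
   // Origin = md5(user agent) | date, used to follow a single user.
   $origin = md5($_SERVER['HTTP_USER_AGENT']) . '|' . date("j-m-y H:i");
   $entryPoint = mysql_real_escape_string($entryPoint, $this->coverage);
   // 'BankApp' is the value of the Apps column; change it for your application.
   mysql_query("INSERT INTO coverage VALUES(NULL,'BankApp',NOW(),'$entryPoint','$origin')", $this->coverage);
  }
 }
}

$coverage = new Coverage();

// Call this wherever an entry point is reached. Nothing is recorded unless
// the global $supportCodeCoverage is set to true somewhere in your application.
function register_EntryPoint($entryPoint) {
 global $coverage, $supportCodeCoverage;
 if ($supportCodeCoverage) {
  $coverage->send($entryPoint);
 }
}
?>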

Put this code in a common header file and call:

register_EntryPoint('Touch_MyFile.Php');

and so on, wherever your code takes a functionally different path.
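
With many entry points scattered through the code, it is easy to register a misspelled name. The naming convention from the database section can be checked mechanically, for instance with the sketch below. It is not part of the tool: parse_entry_point is a hypothetical helper, and it assumes that file names contain no underscore, so that the last underscore always separates the action from the file.

```python
def parse_entry_point(name):
    """Split 'Action_File.Ext' into (kind, action, filename).

    kind is 'file' for Touch_ entry points and 'functionality' otherwise.
    Assumes the file name itself contains no underscore.
    """
    action, sep, filename = name.rpartition("_")
    if not sep or not action or not filename:
        raise ValueError(f"not a valid entry point name: {name!r}")
    kind = "file" if action == "Touch" else "functionality"
    return kind, action, filename
```

For example, parse_entry_point('Call_Function_Succeed_Login.Php') yields the action Call_Function_Succeed and the file Login.Php, while a name without any underscore raises a ValueError.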

Run the tool

To run the tool, you need to have:

  • Python + MySQLdb (the Python MySQL API)
  • The date (in SQL format) you want to cover; for now, it's only one day
  • The Origin ID of the user (the MD5 of HTTP_USER_AGENT); basically, you look it up in the database or compute it in your code.


example:

$ python coverage.py 2007-06-28 41942da0293d0b8afcfab4c2d10c2401
$ python coverage.py 2007-04-12

For now, the script must be in the same directory as your files... You can download the archive here: coverage.zip
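
To give an idea of what the report computes, here is a rough sketch of the coverage calculation for one day and, optionally, one user. It is not the code of coverage.py: coverage_report is a hypothetical function, the rows are plain tuples standing in for what would be read from the coverage table, and the full list of expected entry points is assumed to be supplied by the caller.

```python
def coverage_report(rows, expected, day, user_md5=None):
    """Compute which expected entry points were reached.

    rows     -- iterable of (entry_point, date_str, origin) tuples,
                as they could be read from the coverage table
    expected -- collection of all entry points of the application
    day      -- date in SQL format, e.g. '2007-06-28'
    user_md5 -- optional MD5 of the user agent (first part of Origin)

    Returns (reached, missed, ratio).
    """
    expected = set(expected)
    reached = set()
    for entry_point, date_str, origin in rows:
        if date_str != day:
            continue  # only one day is covered
        if user_md5 is not None and not origin.startswith(user_md5 + "|"):
            continue  # row belongs to another user
        reached.add(entry_point)
    reached &= expected  # ignore entry points that are not expected
    missed = expected - reached
    ratio = len(reached) / len(expected) if expected else 0.0
    return reached, missed, ratio
```

The two optional filters mirror the two command lines above: a date alone covers every user of that day, and a date plus an MD5 covers a single user.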

Monday, June 25 2007

How not to waste 6 hours?

Make sure that your test case is correct!!!!!

Damn, I'm stupid. I was working on session state management in Grabber and, of course, I wrote a small test case with a couple of pages to be sure the spider could reach every page. But my test case was just wrong: requesting my index page twice kept the session alive, but the variables were set in such a crazy order that it had the same effect as destroying the session.

Anyway, now it works! At least, the next Grabber release will have:

  • Multi site support
  • Multi-threading
  • Better session state management: you can now add the login information in the configuration file
  • A new XSS detector based on a few vectors and some variations of them. The XSS disclosure based on RSnake's Cheat Sheet is still there, but I needed a new, faster one...
  • A module that lets Grabber be used as a simple spider and saves the information in an XML file

I don't know yet when I'm going to release this version. I need to make sure it works correctly and is stable, I also need to create something to generate nice reports (maybe simple XSLT sheets, developer/user side), and I want to work more on the hybrid mechanism using different tools (fortify, pixy, php-sat, swaat...)

Wednesday, May 30 2007

Such a noisy thing with SWAAT

In one of my last posts, I compared two PHP source code security analyzers: SWAAT and PHP-SAT. The results pretty much said that SWAAT was really better than PHP-SAT.
I started working on the configuration of PHP-SAT and it looks quite powerful (well, after talking with Eric Bouwers, I'm waiting for the next release). I think I will be able to get good results by combining a security-oriented configuration with some additional bug patterns.
On the other hand, SWAAT is really limited for now. For example, I made a simple PHP script containing only SQL queries: every line is highlighted as flawed (and with a MEDIUM level)!! This is simply stupid; it would be better to report nothing than to do that... just say that you don't support SQL Injection for now... Anyway, SWAAT is for me the tool to keep an eye on; I will try to develop some features for it, especially for XSS detection and SQL Injection findings...

Saturday, May 26 2007

RegFuzzer: Test your regular expression filter

Here we go, I'm releasing the first shot of a tool I started writing months ago... The goal of the tool is to find strings that are valid and that pass your regular expression filters. Basically, it was designed for testing IDS regexps.
The tool is not finished yet; I have lots of work to do on it, especially on the attack strings dictionary, which currently contains only some client-side string patterns.
You can download the tool here: RegFuzzer

To use the tool, enter the regular expression to test in the XML input file and launch the tool like this:

python regFuzzer.py -f input.xml

This will produce an HTML file as output. As I said, this is a first release; the goal is mostly to show the tool and see whether the idea is interesting to you. If so, I may work more on it. Don't hesitate to drop me a line if you have comments about the tool.
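
To illustrate the idea only (this is not how RegFuzzer is implemented, and it does not use its XML input format), here is a minimal sketch that checks a list of attack strings against a filter. It assumes a blacklist-style filter, meaning input is rejected when the pattern matches, so a string "passes" when the regular expression does not match it. The function name strings_passing_filter is hypothetical.

```python
import re


def strings_passing_filter(filter_pattern, candidates, flags=0):
    """Return the candidate strings that the filter does NOT match.

    Assumes a blacklist-style filter: a string is blocked as soon as the
    pattern matches anywhere in it, so unmatched strings slip through.
    """
    regex = re.compile(filter_pattern, flags)
    return [s for s in candidates if regex.search(s) is None]


attack_strings = [
    "<script>alert(1)</script>",
    "<SCRIPT>alert(1)</SCRIPT>",
    "<img src=x onerror=alert(1)>",
]

# A naive, case-sensitive filter only stops the first string.
bypasses = strings_passing_filter(r"<script", attack_strings)
```

With this naive filter, bypasses contains the upper-case variant and the img vector; passing re.IGNORECASE as flags stops the upper-case variant but still lets the img vector through.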
