Subscribe to the RSS feed

Yet another study on code quality: A Tale of Four Kernels

If like me you are interested in code quality and some general conclusion that one can draw based on code quality studies, I really recommend to read this paper: A Tale of Four Kernels by Diomidis Spinellis, ICSE '08: Proceedings of the 30th International Conference on Software Engineering

I just want to quote a part of the conclusion by the author

Therefore, the most we can read from the overall balance of marks is that open source development approaches do not produce software of markedly higher quality than proprietary software development.

The only problem with this statement is that it is based on the fact that the metrics he used were not weighted for their importance for the "Code Quality" (if this means something). Therefore, the comparison between the Windows research kernel and Linux seems a little bit awkward to me. Anyway, this is a very interesting paper about code quality, and lots of interesting ideas from the author of CScout.

Static Analysis Tool Exposition is over

Yeah, that's sad and also a relief: SATE is over. We actually released today the last stage of the evaluation (basically, the evaluation with some correction based on comments from the participants). Even though I would have prefer to have more feedback from participants on our evaluation, especially to increase its quality, I still think SATE is a good thing and will be an interesting resource for lost of researchers. This is, as far as I know, the only exhaustive resource on the subject (wild source code + weaknesses).

What do I want to do, see next? Since we have accumulated lots of data with the tool reports (raw weaknesses), the evaluations (I really want to thank MITRE's guys, especially Steve Christey and Bob Schmeichel for their help), I'm looking forward to do data analysis and trying to extract some limited results on it.

Anyway, this was overall a good experience, I actually did my first real code review mostly on lighttpd, dspace, mvnform and naim, I think I know way more on how detecting vulnerabilities, I also have been asking myself about how to rate vulnerabilities such as Cross-Site Scripting (hopefully, I will release the little document I wrote about it), I learned so much about how people are writing code trying to understand the design, the code etc. in the applications.

Also, hopefully, I will be able to release the website I developed to handle the weaknesses from different tools. It is, I think, interesting if you are working with more than one assessor. You can send evaluation, comments, merging the weaknesses etc. with a web interface. Even though it needs improvements (it has been done in less than 2 weeks) I think this would be an interesting piece of software for people who are dealing with tons of weaknesses. Another interesting point is that we (at NIST) may open that website for everybody in order to make new evaluation in order to increase the quality of the data we currently have.

Oh well, it seems like a journey is really close to its end, it was such a good time sometimes, and some other time such consuming work. We've been dealing with fifty thousands of weaknesses, dozen of tool reports, and almost tens of test cases... I will keep you posted about the next decision we are gonna make with SATE and hope that lots of people will find in this "exposition" the most they could get.

Oh please stop it with these ridiculous CAPTCHAs!

Marcin just told me about that stupid CAPTCHA from the rapidshare website. Even if I think this is made explicitly to annoy people (this CAPTCHA is used only for free accounts) this is just stupid.

Can you really tell which letter has cat or not? I'm sorry but I can't!

Accelerate the convergence to the bug: Running the test in 16-bit

Yesterday, I came across a case in a piece of software which was really hard for me to understand perfectly. Not only the code is well written (which is always worse for finding bugs :)) but the structure is also well thought (this is the implementation of an associated array in C in the lighttpd application).

The problem I had was to state whether a tool report was a true-positive/false-positive. So, as in many case I've seen in this software a problem may occur only in the limit cases. This one may occur after INT_MAX insertion in the structure. I don't know if one of you ever tried to do such a thing, but only INT_MAX (~2 billions on typical PC) allocations is a lot, so inserting elements in a structure that needs at least 5 (re)allocations is too much. But well, I did it. Also, I ran this test with valgrind using the memory leak check (full check and high definition).

I then ran a simple test program to fill this structure in a real condition: a typical x86/32-bit architecture. As I knew it was stupid and didn't even think this could end before 2 days I started looking in other direction in order to reduce the INT_MAX size for having a reasonable time execution of the test.


My first attempt is to shift all the types that are used, I knew this was not perfect because even if I can force my program to use unsigned short instead of size_t, I wouldn't change the size of the pointers, a char * would still b 32-bit (there may be some options in gcc to control the size of the pointers — which I doubt — but I didn't find any). Using this methodology, I was able to make the program crash in the way that would have been a real true-positive.


But as I knew it was not good since the size of the pointers are not modified and I had the feeling that in that particular structure, the case of the possible crash is handled by itself (due to pointer and type limits), I started looking in other direction for running that program in 16-bit, a pseudo-real-16-bit-mode. I then started looking into emulators and how to compile code for 16-bits and running it on my linux (x86/32-bit). After having issues compiling and running the test program with the gnu-m68hc11 ELF package, I found the bcc/elksemu stuff. After compiling and running with ELKS utilities, the test program didn't crash, it only failed in an assertion test after an allocation...


Different behavior, with different methods, okay... which is the correct one? Is it a problem of pointer size that made the test running differently than the real program on a 32-bit or maybe a limitation of the elksemu machine? As this morning I checked the state of the 32-bit run I launched yesterday, and this was finished... ended by a failed assertion.

As expected, pointer size matters when you wanna test on intrinsic limitations of a structure and its behavior using limit cases.

Scaling MySQL db

I've just came across this interesting blog entry; some numbers on how people (large websites companies) are actually using MySQL.

http://venublog.com/2008/04/16/notes-from-scaling-mysql-up-or-out/

MySQL table/field names

Sometimes I really don't understand developers.

Why the heck a table name such as a<script>foo(42)`cool could ever be allowed? What's the point of that? I know I am almost clueless with SQL but... what's the reason here? If someone has some idea, I would love to hear them!

Untrusted websites passwords

After using different password, it's really bothering to have lots of diversity; you need to remember them or well, store them in a password.txt

I just made a simple script for my own in order, from mostly the same password, to generate different ones for different websites... This is not that big deal, just a simple script to do that, but I thought it could have been useful for some of you...

You can reach the script here: Untrusted websites passwords creator

NIST SATE step 3 completed: test cases information release

This evening at work, with Vadim, we were exhausted after days of work but we were smiling. Smiling and happy because we knew that the step 3 of SATE was pretty much done. The step 3 is when all the participants are sending their output to us. Even if we know that we will have hard time to come up with the master reference list for each test cases what we selected for SATE 2008, we know that this is interesting data for the SwA community and especially SCA studies.

Today, we can finally tell which test cases were selected by us for SATE 2008. First of all, we have 2 different tracks: C language and Java language. For the java track, we decided to look more into web applications. We then have:

And for the C track we selected:

  • Nagios: host, service and network monitoring with web interface (using CGI)
  • Lighttpd: web server
  • Naim: console instant messenger

You may have lots of comments on why these and I am totally ready to answer your questions. Just to let you know, during the selection phase, we reviewed 50+ different applications. For each applications, we had to scan them using tools, doing some manual review and our main goal is to find at least one exploitable vulnerability. Concerning the type of test cases themselves, the constrain is to have real exploitable vulnerabilities and they must be real applications which means basically, not test cases that we have in our SRD.

Just as reminder, the next important dates for SATE 2008 are:

  • April 15, we are distributing to the participants our master reference list, the list of real weaknesses found by the participants
  • June, comparison of all the participants results, the participants get all the reports submitted at SATE 2008
  • December, all the data and reports are public

Code review: facilitate the SCA output analysis

This post is not exactly a follow up of a previous post called Code review tools: the missing link (so far), But since I will have to perform a lot of code review in the next couple of weeks and also tool output analysis, I was looking for some tool to help me, to facilitate my job. I've been asking people for links, tips etc. but nothing really convinced me. I am looking for a tool which is basically able to smartly index the source code I am reviewing, which means that I want to be able to look at the variables, where they are declared, affected and used... I also want to see the call graphs of functions and this, mostly to probe the correctness of tool output.

After a couple of hours looking at specialized tools, I was not able to find something good and free (No, I don't call cscope good!). Yes, there are a couple of commercial ones, especially the ones shipped with the commercial source code analyzers and well, they're not perfect at all!

So, this morning, I was like frustrated when I actually thought of using a tool I used a lot, but for a quite different utilization: Doxygen. You may know this documentation tool, but may not know all it is capable of.

As a documentation generation tool, it is really powerful and mostly based on specially formated comments that the developers seed in the source code. But the tool is also generating a bunch of structure related information such as classes relations, function calls graphs etc. As I don't want to generate a documentation of the code I'm reviewing, I don't mind not to have the well formated comments. I am asking this tool to generate me the structural information and facilitate the navigation from function to function.

I made a small example of the report generated by Doxygen using the configuration I made for getting all the information I wanted (only one page since the documentation and the pictures etc. are kinda big...). In order to generate the configuration I wanted, I made a tiny python script ozone.py since the DoxyWizard is not really convenient for that. Also, I will add a process to pre-compile the JSP files since Doxygen doesn't understand the JSP syntax and the option to use the Doxygen search engine (PHP script that use and file with indexed tags).

This is the first step of that script, as you may see by looking at the source code, I am also generating the XML files, this is because the XML generated Doxygen documentation contains a lot of interesting information that I may use later... Also, while looking at the Doxygen source code, I thought that it could be possible to integrate many more static analysis such as computing metrics, etc. Anyway, so many other things to do than thinking about that right now!

OWASP France Chapter & OWASP Top Ten 2007 French

Just to say that I am please to see the OWASP Chapter France starting again thanks to Sebastien Gioria! I hope that this is gonna last for good and that we will be able to spread the web security & tools in France. Even though I am not in France anymore, I am please to be part of the board. What I would like to do so far, is to talk with engineering school, universities, etc in order to make web security as part of classes when students are learning about web development for instance (or just development).

In the same time, we are releasing the translation of the OWASP Top Ten 2007 in French. The document by itself is a really good content! The French translation has been done while trying to keep the original ideas of the Top Ten.

You can download the OWASP Top Ten 2007 in French on the OWASP Chapter France web page. As usual, every comments, ideas etc, about the role of OWASP in France are more than welcome!

- page 1 of 12