deep inside: security and tools

Source Code Obfuscation

Source Code Obfuscation is actually a powerful tool for testers. Whether you use it to obfuscate your bytecode (Java, .NET etc.) or increasing the code complexity of your current source code.

Working at SAMATE we are also playing, tweaking, testing, stressing source code analyzers. And now you see the relation. I'm writing a source code obfuscater in order to increase the complexity of our [test cases][] and see if the tools are still doing well.

Thus, I was able (with good documentation, and yaxx to create one. It currently only add control flow complexity (and of course renaming classes, functions and variables).

Some words on obfuscation

You may have heard about obfuscation in a sense of making the code unreadable for users. This is not what I'm interested in. I want to modify the actually source code, adding some information in it, some tests... I need the outputs of the original program and the obfuscated one to be the same, otherwise we cannot consider the source code as being equivalent.

So for example if I do:

if (var == 0) {
  echo 0;
}

I will have the same behavior with this source code:

x = some_value;
if (var == 0 or x*x < 0) { 
  echo 0;
}

Even though they have the same output, the second one is more complicated since it adds one other test.

Test case wise example

To see what it exactly does given this original source code

<?php
$b = 0; 
$c = "Salut"; 
$len = strlen($c);  
function fct($a) {
  return $a . "_1"; 
}
class T {
  function foo() {
    echo "test";
  }
}
echo fct(0);
echo T::foo();
?>

...and by choosing one of my control flow obfuscating pattern

<?php  
class rand_class_name {
  function rand_func_name_2($rand_name_1) {
    return $rand_name_1 + 1;
  }
}
function rand_func_name_1($rand_name_2) {
  return $rand_name_2 + 1;
}
if (rand_func_name_1(0) > 0 && rand_class_name::rand_func_name_2(0)) {
  $enter_the_new_statement;
}
?>

To have this result:

<?php
function HXvE5Plwxp0RSoQM ( $ZMfP98Az96Rq67j6 ) {
  return $ZMfP98Az96Rq67j6 + 1 ;
}
class TF03COvMuzXRQcCK {
  function Ltghf3a0McCI8RaZ ( $V309os5vQo15ak9b ) {
    return $V309os5vQo15ak9b + 1 ;
  }
}
$b = 0 ;
$c = "Salut" ;
$len = strlen ( $c ) ;
function fct ( $a ) {
  return $a . "_1" ;
} 
class T {
  function foo ( ) {
    echo "test" ;
  }
}
if ( HXvE5Plwxp0RSoQM ( 0 ) > 0 && TF03COvMuzXRQcCK :: Ltghf3a0McCI8RaZ ( 0 ) ) {
  echo fct ( 0 ) ;
}
if ( HXvE5Plwxp0RSoQM ( 0 ) > 0 && TF03COvMuzXRQcCK :: Ltghf3a0McCI8RaZ ( 0 ) ) { 
  echo T :: foo ( ) ; 
}
?>

How it actually works

First of all, the engine only works on Abstract Syntax Tree (AST) in order to do powerful manipulation and code refactoring. The idea is to take a couple of transformation patterns (the second source code is in fact a complicated one), and fitting this patterns with the original source code.

The patterns are meta code. You can see that they are in PHP using some names such as $rand_name_1 etc. this means that the engine will generate one unique name for each of them and replace it before the actual refactoring.

Select what I want to obfuscate is not a real problem, but for now I only selected the top statements and will apply the whole modifications to each of them.

A little schema explaining a little how it works is available here: [schema_obfuscation.png]

What's next

The applied control flow obfuscating pattern is on of the many I do have for now (many more to come), and I guess this is kinda promising, lots of interesting studies should come now.

Currently the tools is only for PHP but I should make it general by using my own AST nodes names and then be able to do code transformation on C, C++, Java etc.

There is no release of the tool (written in C++) right now, I will wait until it's more than correct and clean. I also need to do data obfuscation (using indirections etc.). The program will of course be public and free for everybody when it's gonna be ready.

All entries

  1. February 2013 — RSA 2013 speaking session
  2. February 2013 — HTML5 tokenization visualization
  3. September 2011 — PHP, Variable variables, Oh my!
  4. July 2011 — Dissection of a SQL injection challenge
  5. January 2010 — Yes, we need a standard to evaluate SAST, but it ain't easy...
  6. November 2009 — Data driven factory: I give you data, you give me an object...
  7. June 2009 — NIST Static Analysis Tool Exposition special publication released
  8. December 2008 — Every-day's CSRF: Sorry, I turned off your christmas tree lights
  9. August 2008 — Why the "line of code" is indeed a good metric
  10. May 2008 — Accelerate the convergence to the bug: Running the test in 16-bit
  11. February 2008 — Code review tools: the missing link (so far)
  12. January 2008 — Talk: Problems and solutions for testing web application security scanners
  13. October 2007 — IE6 And IE7 don't have compatible CSS tricks
  14. September 2007 — Source Code Obfuscation
  15. February 2007 — The return of the SVG XSS
  16. February 2007 — How you should design a test suite for Web Apps Scanners
  17. January 2007 — Test Suites for Web Application Scanners
  18. December 2006 — SVG Files: XSS attacks