Source code obfuscation is actually a powerful tool for testers, whether you use it to obfuscate your bytecode (Java, .NET, etc.) or to increase the complexity of your existing source code.
At SAMATE we also play with, tweak, test, and stress source code analyzers, and that is where the connection lies. I'm writing a source code obfuscator to increase the complexity of our [test cases][] and see whether the tools still perform well.
With good documentation and yaxx, I was able to create one. It currently only adds control flow complexity (and, of course, renames classes, functions and variables).
Some words on obfuscation
You may have heard of obfuscation in the sense of making code unreadable for users. This is not what I'm interested in. I want to modify the actual source code, adding some information to it, some tests... I need the outputs of the original program and the obfuscated one to be the same; otherwise we cannot consider the two versions equivalent.
So for example if I do:
if (var == 0) { echo 0; }
I will have the same behavior with this source code:
x = some_value; if (var == 0 or x*x < 0) { echo 0; }
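Even though they have the same output, the second one is more complicated since it adds another test. A square is never negative, so x*x < 0 is always false and the extra test never changes which branch is taken.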
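The equivalence claimed above can be checked mechanically. What follows is only a minimal sketch, written in Python rather than PHP so that it is self-contained; the function names `original`, `obfuscated` and `equivalent` are made up for the illustration, and returning a string stands in for `echo`.

```python
def original(var):
    """Original branch: produce "0" only when var is zero."""
    return "0" if var == 0 else ""


def obfuscated(var, x):
    """Same branch with one extra test that can never succeed."""
    # x * x is never negative for integers, so the second operand is always False.
    return "0" if (var == 0 or x * x < 0) else ""


def equivalent(values):
    """Check that both versions agree for every (var, x) pair drawn from values."""
    values = list(values)
    return all(original(v) == obfuscated(v, x) for v in values for x in values)


if __name__ == "__main__":
    print(equivalent(range(-50, 51)))
```

A check like this only samples the input space. It cannot prove equivalence, but it is a cheap sanity test before handing the obfuscated version to an analyzer.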
A test case example
To see exactly what it does, take this original source code:
<?php
$b = 0;
$c = "Salut";
$len = strlen($c);
function fct($a) {
    return $a . "_1";
}
class T {
    function foo() {
        echo "test";
    }
}
echo fct(0);
echo T::foo();
?>
...and choose one of my control flow obfuscation patterns:
<?php
class rand_class_name {
    function rand_func_name_2($rand_name_1) {
        return $rand_name_1 + 1;
    }
}
function rand_func_name_1($rand_name_2) {
    return $rand_name_2 + 1;
}
if (rand_func_name_1(0) > 0 && rand_class_name::rand_func_name_2(0)) {
    $enter_the_new_statement;
}
?>
This gives the following result:
<?php
function HXvE5Plwxp0RSoQM ( $ZMfP98Az96Rq67j6 ) {
    return $ZMfP98Az96Rq67j6 + 1 ;
}
class TF03COvMuzXRQcCK {
    function Ltghf3a0McCI8RaZ ( $V309os5vQo15ak9b ) {
        return $V309os5vQo15ak9b + 1 ;
    }
}
$b = 0 ;
$c = "Salut" ;
$len = strlen ( $c ) ;
function fct ( $a ) {
    return $a . "_1" ;
}
class T {
    function foo ( ) {
        echo "test" ;
    }
}
if ( HXvE5Plwxp0RSoQM ( 0 ) > 0 && TF03COvMuzXRQcCK :: Ltghf3a0McCI8RaZ ( 0 ) ) {
    echo fct ( 0 ) ;
}
if ( HXvE5Plwxp0RSoQM ( 0 ) > 0 && TF03COvMuzXRQcCK :: Ltghf3a0McCI8RaZ ( 0 ) ) {
    echo T :: foo ( ) ;
}
?>
How it actually works
First of all, the engine works only on the Abstract Syntax Tree (AST), which allows powerful manipulation and code refactoring. The idea is to take a couple of transformation patterns (the second source code above is in fact a complicated one) and fit these patterns to the original source code.
The patterns are meta code. You can see that they are written in PHP and use names such as $rand_name_1. This means that the engine generates one unique name for each of them and substitutes it before the actual refactoring.
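The renaming step can be pictured roughly as follows. This is a text-based sketch in Python, not the engine's AST-based implementation. The regular expression, the helper names `fresh_name` and `instantiate`, and the use of Python's `secrets` module are all assumptions made for the illustration. The 16-character length and the leading letter simply mirror the names visible in the sample output.

```python
import re
import secrets
import string

# Placeholder spellings come from the sample pattern; the regex itself is an assumption.
PLACEHOLDER = re.compile(r"(?<!\w)\$?rand_(?:class_|func_)?name(?:_\d+)?\b")
ALPHABET = string.ascii_letters + string.digits


def fresh_name(used, length=16):
    """Return a random identifier that starts with a letter and is not in `used`."""
    while True:
        name = secrets.choice(string.ascii_letters) + "".join(
            secrets.choice(ALPHABET) for _ in range(length - 1)
        )
        if name not in used:
            used.add(name)
            return name


def instantiate(pattern):
    """Replace each distinct placeholder with one fresh name, consistently."""
    mapping, used = {}, set()

    def swap(match):
        token = match.group(0)
        if token not in mapping:
            # Keep the PHP variable sigil so variables stay variables.
            sigil = "$" if token.startswith("$") else ""
            mapping[token] = sigil + fresh_name(used)
        return mapping[token]

    return PLACEHOLDER.sub(swap, pattern)


if __name__ == "__main__":
    demo = "function rand_func_name_1($rand_name_2) { return $rand_name_2 + 1; }"
    print(instantiate(demo))
```

Keeping one mapping for the whole pattern is what guarantees that a function declared under a random name is later called under that same name.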
Selecting what to obfuscate is not a real problem, but for now I only select the top-level statements and apply the whole modification to each of them.
A small diagram explaining how it works is available here: [schema_obfuscation.png]
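To make the idea of wrapping top-level statements concrete, here is a minimal sketch that uses Python's standard `ast` module (Python 3.9+ for `ast.unparse`) as a stand-in for the PHP AST. It is not the tool's code. The helper `_opaque`, the guard expression, and the choice to wrap only expression statements (as happens to the `echo` lines in the sample output) are assumptions made for the illustration.

```python
import ast
import contextlib
import io

HELPER_SOURCE = "def _opaque(n):\n    return n + 1\n"  # hypothetical helper
GUARD_SOURCE = "_opaque(0) > 0"  # always true, since 0 + 1 > 0


def wrap_top_level(source):
    """Guard every top-level expression statement with an always-true test."""
    tree = ast.parse(source)
    new_body = list(ast.parse(HELPER_SOURCE).body)
    for node in tree.body:
        if isinstance(node, ast.Expr):
            # Build a fresh test node for each wrapped statement.
            test = ast.parse(GUARD_SOURCE, mode="eval").body
            node = ast.If(test=test, body=[node], orelse=[])
        new_body.append(node)
    tree.body = new_body
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


def run(source):
    """Execute source and return everything it printed."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(compile(source, "<sketch>", "exec"), {})
    return buffer.getvalue()


SAMPLE = '''
b = 0
def fct(a):
    return str(a) + "_1"
print(fct(0))
print("test")
'''

if __name__ == "__main__":
    obfuscated = wrap_top_level(SAMPLE)
    print(obfuscated)
    print(run(SAMPLE) == run(obfuscated))
```

Comparing the captured output of both versions is the same equivalence criterion the post relies on: the transformation is only acceptable if the program still prints the same thing.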
What's next
The control flow obfuscation pattern applied here is one of the many I have for now (many more to come), and I guess this is kinda promising; lots of interesting studies should follow.
Currently the tool only handles PHP, but I should make it general by using my own AST node names, and then be able to do code transformations on C, C++, Java, etc.
There is no release of the tool (written in C++) right now; I will wait until it's more than correct and clean. I also need to do data obfuscation (using indirections etc.). The program will of course be public and free for everybody when it's ready.