Definition parsing: first step done
By Romain Wednesday, January 30 2008 - 15:37 UTC - Tech - Permalink
Since I started working on my static analyzer using php-ast/oracle, I have realized that looking for vulnerabilities needs a lot of hard-coded rules or database entries. This is really sad, since to get anything correct you would need a huge knowledge base. So I started thinking about how to generalize vulnerabilities and how to express them. It's tough. Really.
The most realistic idea I have had (if I can say so) is to handle vulnerability definitions using a given taxonomy. I still need a lot of knowledge about the language I'm analyzing (PHP): its output functions, global variables, filters, resources, etc. But the big advantage of rules is that you can generalize the definition.
Anyway, I started dealing with natural language, and I will try to make it fit into my model so that definitions can be passed to the future static analyzer engine of php-oracle. Thanks to the AIMA project, I was able to get some quick results on the parsing:
# source definition:
unvalidated input go to sink in html context
# parse tree:
2 possibilities
##
02NP[('Adjective', 'unvalidated'), ('Noun', 'input')][]
23VP[('Verb', 'go')][]
45NP[('Noun', 'sink')][]
('Preposition', 'to')
35PP[]
25VP[]
68NP[('Name', 'html'), ('Noun', 'context')][]
('Preposition', 'in')
58PP[]
28VP[]
08S[]
##
02NP[('Adjective', 'unvalidated'), ('Noun', 'input')][]
23VP[('Verb', 'go')][]
45NP[('Noun', 'sink')][]
68NP[('Name', 'html'), ('Noun', 'context')][]
('Preposition', 'in')
58PP[]
48NP[]
('Preposition', 'to')
38PP[]
28VP[]
08S[]
And the taxonomy (written as a grammar) I used is the following, which needs to be extended to handle more than "input validation":
IV = Grammar('InputValidation',
    Rules(
        S = 'NP VP | S Conjunction S',
        NP = 'Pronoun | Noun | Article Noun | Adjective Noun | NP PP | NP RelClause | Name Noun',
        VP = 'Verb | VP NP | VP Adjective | VP PP',
        PP = 'Preposition NP',
        RelClause = 'That VP'
    ),
    Lexicon(
        Noun = "input | output | privilege | context | header | user | sink | file",
        Verb = "is | go | write | print",
        Adjective = "validated | unvalidated | asynchronous",
        Pronoun = "me | you | i | it",
        Name = "html | database | http | sql | ldap",
        Article = "the | a | an",
        Preposition = "to | in | on",
        Conjunction = "and | or | but | not",
        That = "that"
    ))
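To see why the definition above comes out with two parses, here is a minimal standalone sketch that counts the parse trees this grammar allows for a sentence. It does not use the AIMA chart parser; it is a plain memoized span counter, and the names RULES, LEXICON, split_alternatives and count_parses are mine, made up for the illustration. It assumes that no rule can derive an empty string and that single-symbol rules only point at lexical categories, which holds for the grammar above.

```python
from functools import lru_cache

# Same rules and lexicon as the grammar above, kept as plain dicts
# (hypothetical names, not part of the AIMA code).
RULES = {
    "S": "NP VP | S Conjunction S",
    "NP": "Pronoun | Noun | Article Noun | Adjective Noun | NP PP | NP RelClause | Name Noun",
    "VP": "Verb | VP NP | VP Adjective | VP PP",
    "PP": "Preposition NP",
    "RelClause": "That VP",
}
LEXICON = {
    "Noun": "input | output | privilege | context | header | user | sink | file",
    "Verb": "is | go | write | print",
    "Adjective": "validated | unvalidated | asynchronous",
    "Pronoun": "me | you | i | it",
    "Name": "html | database | http | sql | ldap",
    "Article": "the | a | an",
    "Preposition": "to | in | on",
    "Conjunction": "and | or | but | not",
    "That": "that",
}


def split_alternatives(table):
    """Turn 'A B | C' strings into lists of symbol tuples."""
    return {lhs: [tuple(alt.split()) for alt in rhs.split("|")]
            for lhs, rhs in table.items()}


def count_parses(sentence, start="S"):
    """Count the distinct parse trees of `sentence` rooted at `start`."""
    rules = split_alternatives(RULES)
    lexicon = {cat: {w.strip() for w in words.split("|")}
               for cat, words in LEXICON.items()}
    words = tuple(sentence.lower().split())

    @lru_cache(maxsize=None)
    def count(symbol, i, j):
        # Lexical categories cover exactly one word.
        if symbol in lexicon:
            return int(j == i + 1 and words[i] in lexicon[symbol])
        return sum(count_seq(rhs, i, j) for rhs in rules[symbol])

    @lru_cache(maxsize=None)
    def count_seq(symbols, i, j):
        # Ways to derive words[i:j] from a sequence of symbols.
        if len(symbols) == 1:
            return count(symbols[0], i, j)
        head, rest = symbols[0], symbols[1:]
        # Every remaining symbol needs at least one word, so the split
        # point k always leaves a strictly shorter span for `head`.
        return sum(count(head, i, k) * count_seq(rest, k, j)
                   for k in range(i + 1, j - len(rest) + 1))

    return count(start, 0, len(words))


if __name__ == "__main__":
    print(count_parses("unvalidated input go to sink in html context"))
```

The two parses are the two readings shown in the output earlier: "in html context" attaches either to the verb phrase ("go ... in html context") or to the noun "sink" ("sink in html context"). A shorter definition such as "unvalidated input go to sink" has only one reading.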
Now I only have to finish my model of a vulnerability (I am not trying to build something really general, but a model that can handle injection flaws, privileges, and communication would be awesome). Once this is finished, lots of things become possible, such as generating attacks directly from the definition (more like a generalized attack generator) and vulnerability checkers for the source code analyzer.
I know this is a kinda tough project and I really have lots of other things to do, but I really want to give this a try... just to see where it goes...