I'll be speaking next week at RSA. My session is on Friday morning (10:20am, Room 132) and is called:
Why haven't we stamped out XSS and SQLi yet?
RSA talk content
Since all the slides are apparently available to everyone on the RSA website, I can give more insight into what I will be talking about. We ran an experiment at Coverity in which we analyzed many Java web applications and looked at where developers insert dynamic data. The goal is to understand which contexts (both HTML contexts and SQL contexts) are frequently used.
The tone of the talk is fairly straightforward: security pros have been giving advice to developers for a long time, yet we still see these issues frequently, so we compare the common advice with what we see in the data.
What you can expect from this talk:
- Some information about the observed HTML contexts: about 26 distinct context stacks, 45% of which had 2 elements (e.g., HTML attribute -> CSS code); the longest had 3 elements.
- A list of SQL contexts, with notes on what developers usually do
- Advice for security pros on how to communicate with developers
Anyhow, this blog post is not only meant to announce the talk, but also to give some insight into how we extracted the data from these applications.
Analysis technique
We created and modified several Security Advisor checkers to extract every injection site involving dynamic data, regardless of its taintedness. For each injection site, we computed its context within the relevant sub-language (one of HTML, JavaScript, CSS, SQL, HQL, and JPQL). This represents our working dataset.
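To make the shape of such a dataset concrete, here is a minimal sketch, assuming each injection site is stored as one record holding its sub-language and its context stack (outermost context first). The record name, its fields, the context labels, and the sample rows are all hypothetical, and the sample data is made up rather than taken from the experiment. It shows how figures like the number of distinct stacks, the share of 2-element stacks, and the longest stack could be derived.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class InjectionSiteStats {
    // One row of the working dataset (hypothetical field names): where dynamic
    // data is inserted, the sub-language, and the context stack, outermost first.
    record InjectionSite(String file, int line, String subLanguage, List<String> context) {}

    // Number of distinct context stacks observed across all sites.
    static long distinctStacks(List<InjectionSite> sites) {
        return sites.stream().map(InjectionSite::context).distinct().count();
    }

    // Share of the distinct stacks that have exactly `depth` elements.
    static double fractionWithDepth(List<InjectionSite> sites, int depth) {
        Set<List<String>> stacks = sites.stream()
                .map(InjectionSite::context)
                .collect(Collectors.toSet());
        if (stacks.isEmpty()) {
            return 0.0;
        }
        long matching = stacks.stream().filter(s -> s.size() == depth).count();
        return (double) matching / stacks.size();
    }

    // Length of the longest context stack (0 for an empty dataset).
    static int maxDepth(List<InjectionSite> sites) {
        return sites.stream().mapToInt(s -> s.context().size()).max().orElse(0);
    }

    public static void main(String[] args) {
        // Made-up sample rows, for illustration only.
        List<InjectionSite> sites = List.of(
                new InjectionSite("index.jsp", 12, "HTML", List.of("HTML SCRIPT TAG", "JS STRING")),
                new InjectionSite("index.jsp", 30, "HTML", List.of("HTML ATTRIBUTE", "CSS CODE")),
                new InjectionSite("list.jsp", 7, "HTML", List.of("HTML TEXT")),
                new InjectionSite("list.jsp", 9, "HTML", List.of("HTML TEXT")));
        System.out.println(distinctStacks(sites));       // 3
        System.out.println(fractionWithDepth(sites, 2)); // 2 of the 3 distinct stacks
        System.out.println(maxDepth(sites));             // 2
    }
}
```

Counting over distinct stacks (rather than over individual sites) matches the way the talk describes "stacks"; counting per site would instead weight frequently used contexts more heavily.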
Here's an example of an injection site (in JSP):
```jsp
...
<script type="text/javascript">
  var content = '${dynamic_data}'; // context ::= {HTML SCRIPT TAG -> JS STRING}
</script>
...
```
We track the construction of this HTML snippet and record the injection site (here, ${dynamic_data}) along with its associated context. Since we do not care whether dynamic_data is tainted, we do not need to track every path that could lead to a defect (XSS, in this case); that is where this analysis differs most from our XSS checker.
Note that we still need to track the parts of the HTML page being constructed in order to compute the context correctly. This, however, is handled by our context-aware global data flow analysis...
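The idea of computing a context from the HTML built so far can be illustrated with a deliberately small, hypothetical state machine that scans the text emitted before an injection site. This is not Security Advisor's analysis: it only recognizes plain text, tags, quoted attribute values (treating style="..." as HTML attribute -> CSS), script bodies, and quoted JavaScript strings. The class name and every context label other than HTML SCRIPT TAG -> JS STRING are made up for the example.

```java
import java.util.List;
import java.util.regex.Pattern;

public class HtmlContextSketch {
    // Toy lexer states; a real HTML tokenizer has many more.
    enum State { HTML_TEXT, IN_TAG, ATTR_VALUE, SCRIPT_BODY, JS_STRING }

    // Matches "style=" (with optional whitespace) at the end of the tag text seen so far.
    private static final Pattern STYLE_ATTR =
            Pattern.compile("\\sstyle\\s*=\\s*$", Pattern.CASE_INSENSITIVE);

    // Returns the context stack (outermost first) in which dynamic data
    // appended right after `prefix` would land.
    static List<String> contextAt(String prefix) {
        State state = State.HTML_TEXT;
        boolean openingScript = false; // current tag is <script ...>
        boolean styleAttr = false;     // current attribute value is style="..."
        char quote = 0;                // quote char that closes the current value/string
        for (int i = 0; i < prefix.length(); i++) {
            char c = prefix.charAt(i);
            switch (state) {
                case HTML_TEXT -> {
                    if (c == '<') {
                        state = State.IN_TAG;
                        openingScript = prefix.regionMatches(true, i, "<script", 0, 7);
                    }
                }
                case IN_TAG -> {
                    if (c == '"' || c == '\'') {
                        state = State.ATTR_VALUE;
                        quote = c;
                        styleAttr = STYLE_ATTR.matcher(prefix.substring(0, i)).find();
                    } else if (c == '>') {
                        state = openingScript ? State.SCRIPT_BODY : State.HTML_TEXT;
                        openingScript = false;
                    }
                }
                case ATTR_VALUE -> {
                    if (c == quote) state = State.IN_TAG;
                }
                case SCRIPT_BODY -> {
                    if (prefix.regionMatches(true, i, "</script", 0, 8)) {
                        state = State.IN_TAG; // closing tag; its '>' returns to HTML_TEXT
                    } else if (c == '"' || c == '\'') {
                        state = State.JS_STRING;
                        quote = c;
                    }
                }
                case JS_STRING -> {
                    if (c == '\\') i++; // skip the escaped character
                    else if (c == quote) state = State.SCRIPT_BODY;
                }
            }
        }
        return switch (state) {
            case HTML_TEXT -> List.of("HTML TEXT");
            case IN_TAG -> List.of("HTML TAG");
            case ATTR_VALUE -> styleAttr
                    ? List.of("HTML ATTRIBUTE", "CSS CODE")
                    : List.of("HTML ATTRIBUTE");
            case SCRIPT_BODY -> List.of("HTML SCRIPT TAG", "JS CODE");
            case JS_STRING -> List.of("HTML SCRIPT TAG", "JS STRING");
        };
    }

    public static void main(String[] args) {
        String jsp = "<script type=\"text/javascript\">\n  var content = '";
        System.out.println(contextAt(jsp));                    // [HTML SCRIPT TAG, JS STRING]
        System.out.println(contextAt("<div style=\"color: ")); // [HTML ATTRIBUTE, CSS CODE]
        System.out.println(contextAt("<p>Hello, "));           // [HTML TEXT]
    }
}
```

In a real analysis the prefix is not a single known string: it is assembled along data flow paths, possibly across methods, which is why the context tracking has to live inside the global analysis rather than in a standalone scanner like this one.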
For SQL-related queries, we essentially do the same thing, but we also need to track the parameters inserted into a query through parameterized notation (? placeholders): remember, we need to find all dynamic data that can eventually go into a query. That's why the following code:
```java
String sql = "select foo, bar from table where 1=1";
if (cond1)
    sql += " and user='" + user_name + "'";   // context ::= {SQL_STRING}
if (cond2)
    sql += " and password=?";                 // context ::= {SQL_DATA_VALUE}
```
has 2 interesting injection sites for the experiment, and we do not need to compute the full abstract string (the set of 4 possible strings this code can produce).
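The two SQL sites above can be classified with a toy sketch, assuming only single-quoted literals matter: a value concatenated while a ' literal is still open lands in SQL_STRING, and every ? outside a literal is a placeholder whose later-bound value (for example via JDBC's PreparedStatement.setString) lands in SQL_DATA_VALUE. The class and method names, and the SQL_CODE label for "outside any literal", are hypothetical; a real analysis also has to deal with comments, dialect-specific quoting, and so on.

```java
public class SqlContextSketch {
    // True if `prefix` ends inside an unclosed '...' literal.
    // A doubled '' escape toggles twice, so it cancels out.
    static boolean insideStringLiteral(String prefix) {
        boolean inString = false;
        for (int i = 0; i < prefix.length(); i++) {
            if (prefix.charAt(i) == '\'') inString = !inString;
        }
        return inString;
    }

    // Context of a value concatenated right after `prefix`. "SQL_CODE" is a
    // hypothetical label for "outside any literal".
    static String concatContext(String prefix) {
        return insideStringLiteral(prefix) ? "SQL_STRING" : "SQL_CODE";
    }

    // Each '?' outside a string literal is a placeholder; the value bound to it
    // later is dynamic data in the SQL_DATA_VALUE context.
    static int countPlaceholders(String sql) {
        boolean inString = false;
        int count = 0;
        for (char c : sql.toCharArray()) {
            if (c == '\'') inString = !inString;
            else if (c == '?' && !inString) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String base = "select foo, bar from table where 1=1";
        // Site 1 (cond1 branch): user_name is appended right after an opening quote.
        System.out.println(concatContext(base + " and user='"));         // SQL_STRING
        // Site 2 (cond2 branch): one placeholder, so one bound parameter.
        System.out.println(countPlaceholders(base + " and password=?")); // 1
    }
}
```

Each site is classified from the text that precedes it on its own branch, so the checker never needs to enumerate the 4 combinations of cond1 and cond2.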
Note that even with this fairly common construct:
```java
String sql1 = "select foo, bar from table where ";
String and_beg = " and (";
String and_end = " ) ";
sql1 += and_beg + "user = '" + user_name + "'" + and_end;
sql1 += sql2; // `sql2` is another part of the query coming
              // from a different procedure or so
```
we still track the contexts properly, even when all the parts (sql1, and_beg, etc.) are created inter-procedurally.
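The inter-procedural case can be sketched, under heavy simplification, by summarizing each helper method by the constant string it returns and substituting those summaries while walking the concatenation left to right. The part types, the method names andBeg/andEnd, the summary map, and the SQL_CODE label are all hypothetical stand-ins for what a global data flow analysis would compute; the point is only that the context of user_name is decided by the text preceding it, wherever that text was built.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class InterprocSqlSketch {
    // Toy string-expression model (hypothetical): a query is a list of parts.
    interface Part {}
    record Const(String text) implements Part {}   // string literal in the code
    record Dynamic(String name) implements Part {} // dynamic data, e.g. user_name
    record Call(String method) implements Part {}  // value returned by another method

    // Walks the parts left to right, replacing each call by the constant its
    // method is summarized to return, and records the SQL context of every
    // dynamic value from the text accumulated before it.
    static List<String> contexts(List<Part> parts, Map<String, String> summaries) {
        StringBuilder prefix = new StringBuilder();
        List<String> sites = new ArrayList<>();
        for (Part p : parts) {
            if (p instanceof Const c) {
                prefix.append(c.text());
            } else if (p instanceof Call call) {
                prefix.append(summaries.getOrDefault(call.method(), ""));
            } else if (p instanceof Dynamic d) {
                String ctx = insideStringLiteral(prefix) ? "SQL_STRING" : "SQL_CODE";
                sites.add(d.name() + " -> " + ctx);
                prefix.append('x'); // stand-in for the unknown runtime value
            }
        }
        return sites;
    }

    // True if the text ends inside an unclosed '...' literal.
    static boolean insideStringLiteral(CharSequence text) {
        boolean inString = false;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == '\'') inString = !inString;
        }
        return inString;
    }

    public static void main(String[] args) {
        // and_beg / and_end modeled as values returned by other methods.
        Map<String, String> summaries = Map.of("andBeg", " and (", "andEnd", " ) ");
        List<Part> sql1 = List.of(
                new Const("select foo, bar from table where "),
                new Call("andBeg"),
                new Const("user = '"),
                new Dynamic("user_name"),
                new Const("'"),
                new Call("andEnd"));
        System.out.println(contexts(sql1, summaries)); // [user_name -> SQL_STRING]
    }
}
```

Replacing calls by constant summaries only works when helpers return fixed strings; when a helper's result itself depends on dynamic data or on branches, the analysis has to propagate richer facts than a single string.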
Limitations
I will briefly cover this during the talk, but in short, tracking HTML contexts in a global data flow analysis is far from trivial. Moreover, accounting for the impact of JavaScript code on the resulting web page (and therefore how the HTML contexts could be transformed at runtime) is an even more complex problem. We did not analyze JavaScript.