deep inside: security and tools

RSA 2013 speaking session

I'll be speaking next week at RSA. My session is on Friday morning (10:20am, Room 132) and is called:

Why haven't we stamped out XSS and SQLi yet?

RSA talk content

Since the slides are apparently already available to everyone on the RSA website, I can give more insight into what I will be talking about. We ran an experiment at Coverity in which we analyzed many Java web applications and looked at where developers insert dynamic data. The goal is to understand which contexts (both HTML contexts and SQL contexts) are most frequently used.

The tone of the talk is fairly straightforward: security professionals have been giving advice to developers for a long time, yet we still see these issues frequently, so we compare the common advice with what we see in the data.

What you can expect from this talk:

Anyhow, this blog post is not only meant to announce the talk, but also to give some insight into how we extracted the data from these applications.

Analysis technique

We created new checkers and modified existing ones from Security Advisor in order to extract all injection sites where dynamic data is inserted, regardless of whether that data is tainted. For each injection site, we computed its context within the sub-language it belongs to (one of HTML, JavaScript, CSS, SQL, HQL, and JPQL). This set of injection sites and contexts represents our working dataset.
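To make the shape of this dataset concrete, here is a minimal Java sketch (Java 16+ for records), assuming each injection site is stored as a hypothetical InjectionSite record holding a file, a line, and the chain of nested contexts, and that the question "which contexts are frequently used?" reduces to a frequency count over those chains. The record, its fields, and the sample entries are illustrative assumptions, not Security Advisor's actual data model and not results from the experiment.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class InjectionSiteDataset {

    // Hypothetical record: one entry per place where dynamic data is inserted,
    // whether or not that data is tainted. The context is the chain of nested
    // contexts, outermost first, e.g. [HTML SCRIPT TAG, JS STRING].
    record InjectionSite(String file, int line, List<String> context) {
        String contextKey() {
            return "{" + String.join(" -> ", context) + "}";
        }
    }

    // Tallies how often each context chain occurs across the whole dataset.
    static Map<String, Integer> contextFrequencies(List<InjectionSite> sites) {
        Map<String, Integer> counts = new TreeMap<>();
        for (InjectionSite site : sites) {
            counts.merge(site.contextKey(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sites for illustration only; these are not data from the study.
        List<InjectionSite> sites = List.of(
            new InjectionSite("view.jsp", 3, List.of("HTML SCRIPT TAG", "JS STRING")),
            new InjectionSite("UserDao.java", 12, List.of("SQL_STRING")),
            new InjectionSite("UserDao.java", 14, List.of("SQL_DATA_VALUE")),
            new InjectionSite("list.jsp", 8, List.of("HTML SCRIPT TAG", "JS STRING")));
        // Prints {{HTML SCRIPT TAG -> JS STRING}=2, {SQL_DATA_VALUE}=1, {SQL_STRING}=1}
        System.out.println(contextFrequencies(sites));
    }
}
```

Keying the count on the whole context chain, rather than only the innermost sub-language, keeps "JS string inside a script tag" separate from, say, a JS string inside an event handler attribute.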

Here's an example of injection sites (using JSP):

...
<script type="text/javascript">
var content = '${dynamic_data}';
// context ::= {HTML SCRIPT TAG -> JS STRING}
</script>
...

We track the construction of this HTML snippet and record each injection site, such as ${dynamic_data}, together with its associated context. Since we do not care whether dynamic_data is tainted, we do not need to track every path that could lead to a defect (XSS here); that is where this analysis differs significantly from our XSS checker. Note that we still need to track the parts of the HTML page being constructed in order to compute the context correctly. This, however, is handled by our context-aware global data flow analysis.
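As a rough illustration of what "computing the context" means, the following Java sketch scans the constructed page text that precedes an injection site and reports the nested context at that point. It is a toy lexer under simplifying assumptions (no HTML or JS comments, no elements like <textarea> or <style>, no event-handler or URL attributes), and every context label other than HTML SCRIPT TAG and JS STRING is a hypothetical name. It only works on a concrete string; it is not how the global data flow analysis itself operates.

```java
import java.util.List;

public class HtmlContextSketch {

    // Hypothetical, simplified lexer states; a real analysis tracks many more.
    enum State { TEXT, TAG, ATTR_DQ, ATTR_SQ, JS_CODE, JS_SQ, JS_DQ }

    // Computes the context at the end of `prefix`, i.e. at the point where
    // dynamic data would be inserted into the page being constructed.
    static List<String> contextAt(String prefix) {
        State state = State.TEXT;
        boolean inScriptTag = false; // true while scanning a <script ...> start tag
        for (int i = 0; i < prefix.length(); i++) {
            char c = prefix.charAt(i);
            switch (state) {
                case TEXT -> {
                    if (prefix.regionMatches(true, i, "<script", 0, 7)) {
                        state = State.TAG;
                        inScriptTag = true;
                        i += 6; // skip the rest of "<script"
                    } else if (c == '<') {
                        state = State.TAG;
                        inScriptTag = false;
                    }
                }
                case TAG -> {
                    if (c == '"') state = State.ATTR_DQ;
                    else if (c == '\'') state = State.ATTR_SQ;
                    else if (c == '>') state = inScriptTag ? State.JS_CODE : State.TEXT;
                }
                case ATTR_DQ -> { if (c == '"') state = State.TAG; }
                case ATTR_SQ -> { if (c == '\'') state = State.TAG; }
                case JS_CODE, JS_SQ, JS_DQ -> {
                    if (prefix.regionMatches(true, i, "</script", 0, 8)) {
                        // The HTML parser ends the script block here, even inside a JS string.
                        state = State.TAG;
                        inScriptTag = false;
                        i += 7; // skip the rest of "</script"
                    } else if (state == State.JS_CODE) {
                        if (c == '\'') state = State.JS_SQ;
                        else if (c == '"') state = State.JS_DQ;
                    } else if (c == '\\') {
                        i++; // skip the escaped character inside a JS string
                    } else if ((state == State.JS_SQ && c == '\'') || (state == State.JS_DQ && c == '"')) {
                        state = State.JS_CODE;
                    }
                }
            }
        }
        return switch (state) {
            case TEXT -> List.of("HTML TEXT");
            case TAG -> List.of("HTML TAG");
            case ATTR_DQ, ATTR_SQ -> List.of("HTML TAG", "HTML ATTRIBUTE VALUE");
            case JS_CODE -> List.of("HTML SCRIPT TAG", "JS CODE");
            case JS_SQ, JS_DQ -> List.of("HTML SCRIPT TAG", "JS STRING");
        };
    }

    public static void main(String[] args) {
        // The page text built up to ${dynamic_data} in the JSP example.
        String page = "<script type=\"text/javascript\">\nvar content = '";
        System.out.println(contextAt(page)); // [HTML SCRIPT TAG, JS STRING]
    }
}
```

The point of the sketch is that the context depends only on the text built so far, not on what dynamic_data contains, which is why taint does not need to be tracked for this experiment.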

For SQL-related queries, we essentially do the same thing, but we also need to track the parameters inserted into a query using parameterized notation: remember, we need to find all dynamic data that can eventually end up in a query. That's why the following code:

String sql = "select foo, bar from table where 1=1";
if (cond1)
  sql += " and user='" + user_name + "'"; // context ::= {SQL_STRING}
if (cond2)
  sql += " and password=?"; // context ::= {SQL_DATA_VALUE}

has two interesting injection sites for the experiment, and we do not need to compute the full abstract string (here, a set of 4 possible strings) for this piece of code.
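For the SQL side, a similar toy classifier can be sketched in Java, assuming we have the concrete query text that precedes each injection site. It marks concatenated data that lands inside an open single-quoted literal as SQL_STRING and each ? placeholder outside a literal as SQL_DATA_VALUE; the SQL_CODE label and the method names are hypothetical, and SQL comments, quoted identifiers, and backslash escapes are ignored. The last part of main enumerates the four possible strings from the example, which is exactly the work the experiment does not need to do.

```java
import java.util.ArrayList;
import java.util.List;

public class SqlContextSketch {

    // Context of data concatenated right after `sqlPrefix`: inside an open
    // single-quoted literal it is SQL_STRING, otherwise (hypothetical label) SQL_CODE.
    // A doubled quote ('') toggles the state twice, so escaped quotes are handled.
    static String concatContext(String sqlPrefix) {
        boolean inString = false;
        for (char c : sqlPrefix.toCharArray()) {
            if (c == '\'') inString = !inString;
        }
        return inString ? "SQL_STRING" : "SQL_CODE";
    }

    // One SQL_DATA_VALUE context per '?' placeholder found outside a string literal.
    static List<String> placeholderContexts(String sql) {
        List<String> contexts = new ArrayList<>();
        boolean inString = false;
        for (char c : sql.toCharArray()) {
            if (c == '\'') inString = !inString;
            else if (c == '?' && !inString) contexts.add("SQL_DATA_VALUE");
        }
        return contexts;
    }

    public static void main(String[] args) {
        String base = "select foo, bar from table where 1=1";

        // Injection site 1: user_name is appended right after " and user='".
        System.out.println(concatContext(base + " and user='"));           // SQL_STRING
        // Injection site 2: the parameter bound to " and password=?".
        System.out.println(placeholderContexts(base + " and password=?")); // [SQL_DATA_VALUE]

        // The full abstract string, by contrast, has 2 x 2 = 4 possible values.
        List<String> variants = new ArrayList<>();
        for (boolean cond1 : new boolean[] {false, true}) {
            for (boolean cond2 : new boolean[] {false, true}) {
                String sql = base;
                if (cond1) sql += " and user='<user_name>'";
                if (cond2) sql += " and password=?";
                variants.add(sql);
            }
        }
        System.out.println(variants.size()); // 4
    }
}
```

Recording contexts per injection site grows with the number of sites (two here), while enumerating the abstract string grows with the number of independent branches (2^n), which is one reason to avoid the latter when only contexts are needed.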

Note that even with this fairly common construct:

String sql1 = "select foo, bar from table where 1=1";
String and_beg = " and (";
String and_end = " ) ";
sql1 += and_beg + "user = '" + user_name + "'" + and_end;
sql1 += sql2; // `sql2` is another part of the query coming
              // from a different procedure or so

we still track the contexts correctly, even if all the parts (sql1, and_beg, etc.) are created inter-procedurally.
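One way to see why splitting the query across sql1, and_beg, and_end and sql2 does not matter: if a query is modeled as an ordered sequence of constant fragments and holes, concatenation just appends sequences, and each hole's context follows from the fragments that precede it. The Java sketch below assumes that hypothetical Part/Literal/Hole model, treats a hole's runtime content as opaque data that does not change the quoting state, and uses a made-up sql2FromAnotherProcedure() to stand in for the part of the query built elsewhere. It is an illustration of the idea, not Coverity's representation.

```java
import java.util.ArrayList;
import java.util.List;

public class ConcatContextSketch {

    // Hypothetical model: a query is a sequence of parts, each either a
    // constant string fragment or a hole where dynamic data is inserted.
    interface Part {}
    record Literal(String text) implements Part {}
    record Hole(String name) implements Part {}

    // Concatenation only appends part lists, so which variable or procedure
    // built each fragment has no effect on the resulting sequence.
    @SafeVarargs
    static List<Part> concat(List<Part>... pieces) {
        List<Part> out = new ArrayList<>();
        for (List<Part> piece : pieces) out.addAll(piece);
        return out;
    }

    // Walks the sequence once and reports the SQL context of every hole.
    static List<String> holeContexts(List<Part> query) {
        List<String> result = new ArrayList<>();
        boolean inString = false;
        for (Part part : query) {
            if (part instanceof Literal lit) {
                for (char c : lit.text().toCharArray()) {
                    if (c == '\'') inString = !inString;
                }
            } else if (part instanceof Hole hole) {
                // Assumption: the hole's content is opaque data and does not
                // change the quoting state for the parts that follow.
                result.add(hole.name() + " : " + (inString ? "SQL_STRING" : "SQL_CODE"));
            }
        }
        return result;
    }

    // Made-up stand-in for "another part of the query coming from a different procedure".
    static List<Part> sql2FromAnotherProcedure() {
        return List.of(new Literal(" and role = '"), new Hole("role"), new Literal("'"));
    }

    public static void main(String[] args) {
        List<Part> sql1 = List.of(new Literal("select foo, bar from table where 1=1"));
        List<Part> andBeg = List.of(new Literal(" and ("));
        List<Part> andEnd = List.of(new Literal(" ) "));
        List<Part> userClause = List.of(new Literal("user = '"), new Hole("user_name"), new Literal("'"));
        sql1 = concat(sql1, andBeg, userClause, andEnd);
        sql1 = concat(sql1, sql2FromAnotherProcedure());
        System.out.println(holeContexts(sql1)); // [user_name : SQL_STRING, role : SQL_STRING]
    }
}
```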

Limitations

I will explain this briefly during the talk, but essentially, tracking HTML contexts in a global data flow analysis is not trivial. Moreover, accounting for the impact of JavaScript code on the resulting web page (and therefore for how the HTML contexts could be transformed at runtime) is an even more complex problem. We did not analyze JavaScript.
