Preventing SQL injection: a Django author's perspective

by Jacob Kaplan-Moss and Grayson Hardaway on May 12, 2020

This is a guest post co-authored by Jacob Kaplan-Moss, co-creator of Django, and Grayson Hardaway.

What’s SQL Injection?

SQL Injection (SQLi) is one of the most dangerous classes of web vulnerabilities. Thankfully, it’s becoming increasingly rare — thanks mostly to increasing use of database abstraction layers like Django’s ORM — but where it occurs it can be devastating.

SQLi happens when code incorrectly constructs SQL queries that contain user input. For example, imagine writing a search function without knowing about SQLi:

def search(request):
    query = request.GET['q']
    sql = f"SELECT * FROM some_table WHERE title LIKE '%{query}%';"

    cursor = db.cursor()
    cursor.execute(sql)
    ...

Can you spot the problem? Notice that the query comes from the browser: request.GET['q']. Think about what might happen if that query contains a single quote. What happens when the SQL string is constructed?

Consider if an attacker searches for ' OR 'a'='a. In this case the constructed SQL would become:

SELECT * FROM some_table WHERE title LIKE '%%' OR 'a'='a';

So that’s bad; now we’re returning the entire contents of the table. This could be a data breach, or it could overwhelm your database server.

But it gets worse; imagine now that the attacker searches for '; DELETE FROM some_table. Now, the constructed SQL becomes:

SELECT * FROM some_table WHERE title LIKE '%%'; DELETE FROM some_table;

Uh oh.

General concepts for preventing SQLi

We’ll get to Django specifics shortly, but first it’s important to really understand the fundamental rules of preventing SQL injection:

  1. Never trust any data submitted by the user
  2. Always use "parameterized statements" when directly constructing SQL queries

Anything that comes from the user could be maliciously constructed. Even things that seem safe, like browser headers (e.g., things like the user agent, request.META['HTTP_USER_AGENT'] in Django) are trivial to tamper with either directly in the browser or with tools like Burp or Charles.

Practically, in Django this means nearly anything that hangs off the HttpRequest object, i.e., the request parameter that’s passed as the first argument to view functions. Though there are some exceptions, it’s probably best to consider anything on request as fundamentally untrustworthy.

However, just because some piece of data isn’t attached to request right now doesn't mean that you can trust it. For example, consider something like an image caption. You might access it through an API that doesn’t mention a request:

image = Image.objects.get(...)
sql = f"""SELECT * FROM images WHERE similarity(caption, '{image.caption}') > 0.5;
...

But if that image caption was previously entered by a user…it’s still dangerous. So this brings us around to the second rule: always use parameterized statements.

Parameterized statements are a mechanism to pass any dynamic parameters separate from the SQL query. They’re either interpreted directly by the database or safely escaped before being added to the query. Almost every database client on the planet supports parameterized statements — and if yours doesn’t, find a different one.

Here’s what the search function from above would look like with parameterized statements:

def search(request):
    cursor = db.cursor()
    cursor.execute(
        "SELECT * FROM some_table WHERE title LIKE '%?%'",
        [request.GET['q']]
    )

Notice the ? in the SQL string, and the second parameter to execute. This second argument is the parameter list; items in this list are safely injected into the query to replace the question marks.

PEP-249, the Python database API standard requires parameterized statements, though different libraries may use different syntax for the placeholders (%-style parameters, :named parameters, numeric parameters, etc.).

You can use code analysis tools to check for SQL injections. Bento is one such tool that has several checks for common SQL injection problems. This can catch many common errors; but it’s still a best practice to use parameterized statements and one of the techniques below to completely prevent this attack.

Preventing SQLi in Django

Django’s ORM uses parameterized statements everywhere, so it is highly resistant to SQLi. Thus, if you’re using the ORM to make database queries you can be fairly confident that your app is safe.

However, there are still a few cases where you need to be aware of injection attacks; a very small minority of APIs are not 100% safe. These are where you should focus your auditing, and where your automated code analysis should focus its checks.

Raw Queries

Occasionally, the ORM isn’t expressive enough, and you need raw SQL. Before you do, consider whether there are ways to avoid it -- for example, building a Django model on top of a database view, or calling a stored procedure can help prevent the need to embed raw SQL in your Python.

But, sometimes raw SQL is unavoidable. There are several APIs for doing this, but all are somewhat dangerous. In order of desirability, these are the APIs that Django provides:

  1. Raw queries, for example:

    sql = "... some complex SQL query here ..."
    qs = MyModel.objects.raw(sql, [param1, param2])
    # ^ note the parameterized statements in the line above
  2. The RawSQL annotation, for example:

    from django.db.models.expressions import RawSQL
    
    sql = "... some complex subquery here ..."
    qs = MyModel.objects.annotate(val=RawSQL(sql, [param1]))
    # ^ note the parameterized statement in the line above
  3. Use database cursors directly, for example:

    from django.db import connection
    sql = "... some complex query here ..."
    with connection.cursor() as cursor:
     cursor.execute(sql, [param1])
     # ^ again, note the parameterized statement in the line above
  4. AVOID: Queryset.extra() (no example: this is unsafe, so it's just included for completeness).

To use these APIs safely:

  1. Read the first part of this article and make sure you understand parameterized statements before proceeding.
  2. Don’t use extra(). It’s difficult (if not impossible) to use in a way that’s 100% safe, and should be considered deprecated.
  3. Always pass parameterized statements — even if your parameter list is empty. That is, you should write something like:

    sql = 'SELECT * FROM something;'
    qs = MyModel.objects.raw(sql, [])

This is to remind you to later add parameters to this list, and to make it easier for automated tools like Bento to find potentially incorrect API usage.

The query itself should always be a static string, rather than one formed from concatenation or any other string processing. Again, this is to make it easier for automated tools to find incorrect API usage.

Automatic Prevention

It is good practice to use code analysis tools to catch preventable mistakes — to err is human, as the saying goes. Bento will automatically check Django code for SQL injection patterns. The following will check your codebase all at once for SQL injections caused by something hanging off of the request object.

pip3 install bento-cli && \
  bento init && \
  BENTO_REGISTRY=r/r2c.python.django.security.injection.sql bento check -t semgrep --all .

Better than checking your current code, however, is checking your future code! Bento is designed to be run as a pre-commit hook or in continuous integration (CI) environments. Bento is diff-aware and will only check commits, ensuring a speedy workflow while keeping your code secure. When you init Bento on your project, it will automatically set itself up to check commits.

This commit-based workflow is especially powerful for ensuring certain patterns never enter your codebase. To practically eliminate SQL injection from your codebase, you should automatically detect that your code:

  1. Always uses parameterized queries.
  2. Never uses .extra().

Bento can detect these patterns by using a different registry:

BENTO_REGISTRY=r/r2c.python.django.security.audit bento check -t semgrep --all .

This set of rules will highlight many more findings even when there is not a vulnerability. It is much stricter and can be overwhelming if you check your code all at once. However, you can also archive your findings with Bento, which will suppress findings until you’re ready to deal with them. This lets you continuously check your code for these patterns without being overwhelmed by findings.

Under the hood, Bento is powered by Semgrep. Semgrep is a tool for easily detecting and preventing bugs and anti-patterns in your codebase. It combines the convenience of grep with the correctness of syntactical and semantic search. This has advantages over normal grep — the most obvious one being that Semgrep is not thwarted by line boundaries.

Let’s say you wanted to detect the following SQL injection:

def search(request):
    search_term = request.GET['search_term']
    cursor = db.cursor()
    cursor.execute("SELECT \* FROM table WHERE field=" + search_term)

This can be expressed in Semgrep like so:

$VAR = request.GET[...]
...
$CUR.execute("..." + \$VAR)

Detecting a pattern like this in a commit-based workflow is invaluable because it effectively eliminates this pattern of SQL injection from your codebase! You can check this out in action at https://sgrep.live/0X5.

Other ORMs

Finally, if you are continually finding that Django’s ORM isn't expressive enough, you may want to experiment with replacing Django’s ORM with SQLAlchemy, which is a more powerful and expressive ORM. You’ll lose out on many of Django’s conveniences like the admin, model forms, and model-based generic views, but will gain a more powerful and expressive API that’s still safe.

Custom ORM additions

Finally, there are a few potentially-dangerous areas that may be unsafe even though you’re not using raw SQL directly. Django allows for the creation of custom aggregates and custom expressions -- e.g. a third-party library could write APIs such that something like Document.objects.filter(title__similar_to=other_title) would work.

Django’s core ORM -- the core expressions, annotations, and aggregations -- are all mature and battle-hardened. The odds of a SQLi in the core parts of the ORM is very, very low. But ORM additions -- especially ones that you write yourself -- can still be a source of risk.

To mitigate the risk of injection from these advanced features, I suggest the following:

First, be cautious about including custom expressions/aggregates from third-party apps. You should audit those third-party apps carefully. Is the app mature, stable, and maintained? Are you confidant that any security issues would be promptly fixed and responsibly disclosed? And, of course, be sure to pin your dependencies to prevent newer and potentially less secure versions from being installed without your explicit direction.

Similarly, be cautious about writing your own custom aggregates. Carefully read the beginning of this article, and Django's documentation about avoiding SQL injection in custom expressions. As the documentation shows, if possible you should avoid doing any string interpolation in custom expressions. If you can’t, you'll need to escape any expression parameters yourself. This is tricky to do right, and will depend on the specifics of your database engine and Python wrapper API. Consult an expert before diving in here!

The django.security.audit registry in Bento will detect if a custom ORM addition is defined in your codebase; you could also quickly audit third-party apps with this. The exploitation conditions are very nuanced, so if you find this in your project, be sure to consult that expert!

Wrapping up

Django was designed to be resilient against SQL injection (and other common web vulnerabilities). Most common uses of Django will be automatically protected, so SQLi vulnerabilities in real-world Django apps are thankfully rare.

However, when they occur, SQLi vulnerabilities are devastating. It’s well worth your time to audit your codebase to ensure you’re safe. Bento can help by flagging several common vulnerabilities. Now that you understand the concepts, and why certain errors are flagged, you should be better equipped to write safe code.