Bento check: Catch catastrophic backtracking ReDoS bugs

Find severe regular expression denial-of-service bugs in Python using Bento
semgrep placeholder image

get this check now in Bento: $ pip3 install bento-cli && bento init

ReDoS (regular expression denial-of-service) bugs occur when a specially crafted string is passed to an inefficient regular expression, making it take a very long time to complete. We recently found a high severity vulnerability in several versions of Python 2 and 3, so we created a Bento check to find similar vulnerabilities in other projects. The check is included as of Bento version 0.9.

One type of inefficient regular expression occurs with nested quantifiers. A simple example is (a+)+. Another type of inefficient expression occurs with mutually inclusive alternation. These are either/or expressions with overlapping character space like ([a-z]|a)+. Both of these inefficient expressions must also contain a pattern after the inefficient part that can’t be matched against, like (a+)+b. This causes backtracking, and in the inefficient case it causes “catastrophic backtracking.” Catastrophic backtracking can impact the security and availability of an application, thus opening the door for denial-of-service attacks.

A catastrophic backtracking bug caused a large Cloudflare outage in July, 2019. In this case, “CPU exhaustion was caused by a single WAF rule that contained a poorly written regular expression that ended up creating excessive backtracking.” This led to Cloudflare’s first global outage in six years.

Here’s an example of catastrophic backtracking:

ReDoS backtracking

To create this check, we added heuristics for detecting inefficient regular expressions into tooling and then used our program analysis platform to run the check against several thousand open source Python projects. Our first tests of the check produced many findings:

Initial findings across thousands of Python projects

A particularly interesting finding came from the Python urllib module:

1rx = re.compile('(?:.*,)*[ \t]*([^ \t]+)[ \t]+'
2                'realm=(["\']?)([^"\']*)\\2', re.I)

This line was caught by the check due to nested quantifiers in the (?:.*,)* part of the regular expression. Since . and , overlap (dot means any character), by simply repeating a comma and then including a character we can cause catastrophic backtracking. This led to Python bpo-39503 and CVE-2020-8492.

In looking at the check’s other results, we found and reported bugs in several widely-used Python projects:

If you’re using regular expressions in your Python project, you can now run Bento and determine if this bug affects it. For more details on how we found the specific vulnerabilities noted above, see this post on the r2c engineering blog.

References

About

Semgrep Logo

Semgrep lets security teams partner with developers and shift left organically, without introducing friction. Semgrep gives security teams confidence that they are only surfacing true, actionable issues to developers, and makes it easy for developers to fix these issues in their existing environments.

Find and fix the issues that matter before build time

Semgrep helps organizations shift left without the developer productivity tax.

Get started in minutesBook a demo