Keep your rules simple with symbolic propagation

by Iago Abal on February 07, 2022

tl;dr: Symbolic propagation is a new experimental feature, a generalization of constant propagation, that enables Semgrep to perform matching modulo variable assignments. You can then write simple patterns like $OBJ.foo().bar() and they will match equivalent code even if there are intermediate variables. For example, you can do this!

Let's say that you want to find all places where a function returns with code 42. With Semgrep's pattern language, you would simply write the pattern return 42, isn't that simple? But, would such a simple pattern match the following code?

C = 42

def test():
  return C

Yes it does! Thanks to constant propagation our simple pattern can match code that is equivalent to return 42. Wonderful.

Now, let me propose another exercise, let’s write a rule that matches a chain of method calls, first foo() then bar(), on some arbitrary object. Most of us would start with a pattern such as $OBJ.foo().bar(). But, would such a simple rule also match the following code?

def test(obj):
    x = obj.foo()
    # ruleid: test
    x.bar()

Unfortunately, this time, the rule does not match. If you have some experience writing Semgrep rules, you may have found yourself in this situation before. Intermediate assignments can make writing Semgrep rules somewhat tricky. It is not really difficult, but it is a bit cumbersome:

pattern-either:
  - pattern: $OBJ.foo().bar()
  - pattern: |
      $VAR = $OBJ.foo();
      ...
      $VAR.bar()

We need to enumerate the different shapes of the target code depending on the use of intermediate variables. If our base rule were more complex, then doing this can result in rather complex rules like r/python.django.security.injection.open-redirect.open-redirect, with large pattern-eithers that look like this:

- pattern: |
    $DATA = request.$W.get(...)
    ...
    $INTERM = $DATA
    ...
    django.shortcuts.redirect(..., $INTERM, ...)

Wouldn't it be wonderful if our simple pattern $OBJ.foo().bar() could match equivalent code, regardless of any intermediate assignments, just the same way as we do with constants? Why cannot Semgrep "just know" that x is equals to obj.foo()? I have good news for you, since version 0.78.0, Semgrep can actually do this! Simply set symbolic_propagation: true using rule options: et voilà:

Isn't that beatiful?

This feature, that we coined symbolic propagation, is a generalization of constant propagation. It tracks simple assignments and enables Semgrep’s matching engine to automatically unfold a definition when needed.

Please give it a try and let us know what you think. We hope that you will find this new feature useful for writing simple yet powerful Semgrep rules. For example, what would it take to write a Semgrep rule that does this without symbolic propagation? See also how succinctly you can now write rule r/python.django.security.injection.open-redirect.open-redirect by combining metavariable-pattern and symbolic propagation: https://semgrep.dev/s/py8N!

For now, this is still an experimental feature, but we are very excited about the power it provides, and we want to promote it to be the default as soon as possible.

Enjoy!