Introducing Semgrep for GitLab

by Bence Nagy and Pablo Estrada on June 22, 2021

Semgrep now has first-class integration with GitLab through two paths: GitLab SAST and Semgrep CI. In GitLab SAST, Semgrep now powers analysis for JavaScript, Python, and TypeScript, with more languages coming. Through Semgrep CI, GitLab users can also add a Semgrep job to their CI/CD workflow to discuss findings in merge requests, access 1,000+ community-driven rules, scan code in 17+ languages, and write custom rules easily using Semgrep’s intuitive syntax.

Looking for the docs? Jump over here.

Semgrep in GitLab SAST

For the past several months, we’ve been working with the GitLab team on their integration of Semgrep into GitLab SAST for language-agnostic scanning and simplified custom rule development. As of the GitLab 14 release, Semgrep is the default SAST analyzer for JavaScript, Python, and TypeScript, replacing Bandit and ESLint as the analyzers for those languages.

Adding Semgrep to your GitLab CI/CD workflow is as simple as including the SAST.gitlab-ci.yml template in your .gitlab-ci.yml file. This default configuration will scan your code using a set of rules written by GitLab and r2c that are nearly identical to those previously used by Bandit and ESLint (see the rules here). If you’re curious to compare performance between tools, check out this deep-dive comparison between Semgrep and Bandit and this comparison between Semgrep and ESLint.
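As a minimal sketch, the include described above might look like the following. The `Security/` path prefix is an assumption based on GitLab's template library layout and is not stated in this post, so check the GitLab SAST docs for your GitLab version.

```yaml
# .gitlab-ci.yml
# Pulls in GitLab's SAST template, which selects analyzers (including Semgrep)
# based on the languages detected in the repository.
# Assumption: the template lives under the Security/ prefix.
include:
  - template: Security/SAST.gitlab-ci.yml
```

No Semgrep-specific job definition is needed for this default setup, since the template supplies the jobs.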

To help ensure a great experience for Semgrep users in GitLab, we ran large-scale benchmarks, shipped performance optimizations, fixed bugs, and interviewed users. We’ve also been continuously testing Semgrep in CI environments to ensure speed and reliability. Over the past 4 weeks, Semgrep CI has scanned over 780 GB of source code in 302,000+ scans from 8,000+ projects.

GitLab is in the process of transitioning more of its SAST analyzers to Semgrep and is also excited to contribute to the Semgrep open source project and rule registry.

Adding rules from the community registry

The Semgrep Registry includes 1,000+ community-driven rules covering security, correctness, and performance bugs. It also includes rules recommended for teams looking to set up guardrails or to flag troublesome spots for further review.

You can add any rules from the registry to your pipeline. For example, to additionally scan your code using the security-audit ruleset (a group of 245 rules to scan code for potential security issues that require additional review) and the secrets ruleset (a group of 44 rules for detecting secrets checked into version control), add the following to your .gitlab-ci.yml:

semgrep:
  image: returntocorp/semgrep-agent:v1
  script: semgrep-agent
  variables:
    SEMGREP_RULES: >-
      p/security-audit
      p/secrets

There’s no need to manually specify which rules apply to a particular language: Semgrep automatically skips rules and rulesets written for languages that don’t appear in your code.

Discuss findings in merge requests

Semgrep fits right into existing developer workflows. Scan results appear as merge request discussions, so they look just like regular code reviews. Developers can fix, discuss, or suppress these findings, all in the context of the proposed code changes.

Figure: Semgrep scan results as GitLab merge request comments

See the docs to enable Semgrep results in MR discussions.

Scan 10x faster using merge request scans

While you can scan all the files in any branch, Semgrep scans are significantly faster when they’re triggered on a merge request because unchanged files aren’t scanned. To enable this behavior, add the following rule:

semgrep:
  image: returntocorp/semgrep-agent:v1
  script: semgrep-agent
  variables:
    SEMGREP_RULES: >-
      p/security-audit
      p/secrets
  rules:
    - if: $CI_MERGE_REQUEST_IID

If you’d also like to scan any pushes to the default branch, add the following rule:

- if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
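Put together, the job with both rules might look like the sketch below. It assumes the second rule goes under the same `rules:` key as the first. GitLab evaluates `rules` entries in order and adds the job to the pipeline when the first one matches.

```yaml
semgrep:
  image: returntocorp/semgrep-agent:v1
  script: semgrep-agent
  variables:
    SEMGREP_RULES: >-
      p/security-audit
      p/secrets
  rules:
    # Run on merge request pipelines, where only changed files are scanned.
    - if: $CI_MERGE_REQUEST_IID
    # Also run on pushes to the default branch.
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
```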

Adding your own rules

Semgrep rules look like the code you already write: no abstract syntax trees, regex wrestling, or painful DSLs. To illustrate, here's a simple rule for finding Python print() statements:

Figure: Semgrep rule to find Python print() statements
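The rule shown in the original image is not reproduced here. The following is a minimal sketch of what such a rule could look like, not necessarily the exact rule from the image. The rule `id` and `message` text are illustrative assumptions. The `...` in the pattern is Semgrep's ellipsis operator, which matches any arguments.

```yaml
rules:
  - id: python-print-call        # hypothetical rule name
    pattern: print(...)          # matches print() with any arguments
    message: Found a print() call; consider using a logger instead.
    languages: [python]
    severity: WARNING
```

If this were saved as `print-rule.yaml` (a hypothetical filename), it could be tried locally with `semgrep --config print-rule.yaml` followed by the path to scan.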

You can prototype custom rules using the Semgrep Playground or using the Semgrep command line tool on your computer (try brew install semgrep or pip install semgrep).

The Semgrep Playground lets you save a custom rule and use it in CI. Copy the URL generated after saving the rule and paste it into your CI/CD config file. That’s it!

What’s next

We’ve really enjoyed working with the GitLab team on this partnership and thank them for their collaboration. GitLab plans to migrate more SAST analyzers to Semgrep, bringing the community-driven registry and rule customization to the GitLab CI/CD pipeline for projects written in more languages. And if you’re a GitLab user, we’d love to get your feedback on this integration!