Semgrep rules that catch insecure patterns AI code generators (Copilot, Cursor, ChatGPT) commonly produce.
AI assistants write functional code fast — but they also generate hardcoded secrets, prompt injection vulnerabilities, insecure deserialization, and unprotected LLM endpoints at an alarming rate. This ruleset catches those patterns before they hit production.
Part of the GRIMSEC DevSecOps Suite.
| Category | Rules | Examples |
|---|---|---|
| Hardcoded Secrets | 6 | API keys (AWS, OpenAI, GitHub), JWT secrets, Flask SECRET_KEY, DB connection strings with embedded creds |
| Prompt Injection | 5 | User input in f-string prompts, string concat in LLM calls, LangChain unsanitized input, system prompt leakage |
| Insecure Deserialization | 6 | pickle.load(), yaml.load() without SafeLoader, torch.load(), node-serialize, Java ObjectInputStream |
| LLM App Security | 5 | exec() on LLM output, LLM output in SQL, no rate limiting on LLM endpoints, OpenAI key in source |
22 rules across Python, JavaScript, TypeScript, Java, Go, and Ruby.
# Install Semgrep
pip install semgrep
# Clone this ruleset
git clone https://github.com/camgrimsec/ai-codegen-security-linter.git
# Scan your project
semgrep scan --config ai-codegen-security-linter/rules/ /path/to/your/project# .github/workflows/security.yaml
- name: AI Codegen Security Lint
run: semgrep scan --config "https://github.com/camgrimsec/ai-codegen-security-linter/rules/" --error# .pre-commit-config.yaml
repos:
- repo: https://github.com/semgrep/semgrep
rev: v1.96.0
hooks:
- id: semgrep
args: ['--config', 'https://github.com/camgrimsec/ai-codegen-security-linter/rules/', '--error']Built-in scoring engine maps all 21 rules to CVSS 3.1 base scores and generates prioritized risk reports grouped by Critical, High, and Medium.
# Step 1: Run Semgrep with JSON output
semgrep scan --config rules/ --json -o results.json /path/to/your/project
# Step 2: Generate prioritized risk report
python -m scorer results.json # Console (default)
python -m scorer results.json --format json # JSON for CI/CD
python -m scorer results.json --format markdown # Markdown for PRs
python -m scorer results.json --min-severity high # Filter by severity
python -m scorer results.json -f json -o report.json # Write to file========================================================================
AI CODEGEN SECURITY LINTER — PRIORITIZED RISK REPORT
========================================================================
Findings : 11
Max CVSS : 9.8
Avg CVSS : 9.3
Files Hit : 4
Rules Hit : 7
🔴 CRITICAL: 7 🟠 HIGH: 4 🟡 MEDIUM: 0 ⚪ LOW: 0
========================================================================
🔴 CRITICAL (7 findings)
────────────────────────────────────────────────────────────────────
[9.8] Pickle Deserialization of Untrusted Data
Rule : ai-codegen-pickle-load-untrusted
CWE : CWE-502
CVSS : CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H
File : app/ml/model_loader.py:42
Fix : Use json, msgpack, or safetensors for ML models.
| Severity | CVSS Range | Rule Count | Examples |
|---|---|---|---|
| 🔴 Critical | 9.0 - 9.8 | 9 | pickle.load, yaml.load, torch.load, exec(llm_output), LangChain code exec, node-serialize, Java deser |
| 🟠 High | 7.0 - 8.6 | 10 | Hardcoded API keys, JWT secrets, prompt injection (f-string/concat/LangChain), Flask SECRET_KEY, LLM→SQL |
| 🟡 Medium | 4.0 - 5.3 | 2 | System prompt leak, no rate limit on LLM endpoint |
- Console — Color-coded terminal output with severity grouping
- JSON — Structured output for CI/CD pipelines, dashboards, and downstream tools
- Markdown — Drop into GitHub Issues, PRs, or wiki pages
rules/
├── hardcoded-secrets/
│ ├── api-key-in-source.yaml # AWS, OpenAI, GitHub, Slack, GitLab keys
│ └── jwt-secret-hardcoded.yaml # JWT signing + Flask SECRET_KEY
├── prompt-injection/
│ ├── user-input-in-prompt.yaml # f-string, concat, template injection
│ └── langchain-unsafe-patterns.yaml # LangChain-specific unsafe patterns
├── insecure-deserialization/
│ ├── pickle-yaml-unsafe.yaml # Python pickle, yaml, shelve, torch
│ └── js-deserialization.yaml # node-serialize, eval-as-parse, Java deser
└── llm-app-security/
└── insecure-llm-patterns.yaml # exec(llm_output), SQL injection via LLM, rate limiting
scorer/ ├── init.py ├── main.py # python -m scorer entrypoint ├── cli.py # CLI argument parsing ├── cvss_map.py # CVSS 3.1 scores + vectors for all 21 rules └── engine.py # Scoring engine + console/JSON/markdown formatters
## 🧪 Testing
```bash
# Run rule tests against fixtures
semgrep scan --config rules/ --test tests/
Test files use Semgrep's ruleid: and ok: annotations to validate true/false positives.
Standard SAST rules catch generic issues. These rules target patterns specific to how AI assistants generate code:
- Copilot inlines API keys as "placeholder" values that look real enough to work
- Cursor generates complete Flask/FastAPI apps with hardcoded
SECRET_KEY - ChatGPT produces LangChain examples with
PythonREPLTool(arbitrary code exec) - All AI tools use
pickle.load()in ML examples without mentioning RCE risk - All AI tools build LLM prompts with f-string interpolation (prompt injection)
These aren't theoretical — they're the most common patterns seen in code reviews of AI-assisted PRs.
Every rule includes:
- CWE mapping (e.g., CWE-798, CWE-502, CWE-77)
- OWASP mapping where applicable
- Confidence rating (HIGH/MEDIUM/LOW)
- Severity (ERROR/WARNING)
- AI-codegen-specific context in the message explaining why AI tools produce this pattern
- Go-specific rules (hardcoded creds in
http.NewRequest, unsafe template exec) - Rust rules (unsafe blocks AI generators over-use)
- Terraform/IaC rules (AI-generated overly permissive IAM policies)
- VS Code extension for real-time linting
- Semgrep App integration for dashboard visibility
- SARIF output integration with GitHub Advanced Security
See CONTRIBUTING.md for guidelines. Key areas where help is needed:
- New rules — especially for Go, Rust, and C#
- False positive tuning — test against real AI-generated codebases
- Documentation — remediation guides for each rule category
MIT License. See LICENSE.
- GRIMSEC DevSecOps Suite — Full AI-powered DevSecOps agent platform
- Semgrep — The static analysis engine these rules run on
- OWASP Top 10 for LLM Applications — Framework for LLM security risks