Skip to content

fix(math): restrict sympy expression parsing#200

Open
Sebastion (sebastiondev) wants to merge 2 commits into
QWED-AI:mainfrom
sebastiondev:fix/cwe95-main-sympy-9383
Open

fix(math): restrict sympy expression parsing#200
Sebastion (sebastiondev) wants to merge 2 commits into
QWED-AI:mainfrom
sebastiondev:fix/cwe95-main-sympy-9383

Conversation

@sebastiondev

@sebastiondev Sebastion (sebastiondev) commented Jun 8, 2026

Copy link
Copy Markdown

QWED Enforcement Checklist

  • No fallback execution added
  • No new raw eval / exec usage introduced
  • Verification is enforced before execution
  • No silent error handling or bypass-oriented retries added
  • No trust placed in LLM-provided expected values, reasoning, or confidence
  • Failure paths remain fail-closed

Summary

This change fixes a CWE-95 code injection issue in math expression verification. User-controlled expression strings from the POST /verify/math endpoint, VerificationEngine math helpers, SemanticValidator, and math batch verification were passed directly to SymPy parse_expr(). SymPy's parser evaluates generated Python code internally, so calling it without restricted namespaces allows a tenant-supplied expression to reach Python execution primitives.

The affected public endpoint is authenticated with get_current_tenant, but that is not a sufficient mitigation for a multi-tenant API: any tenant with a valid API key can submit math verification input, while a valid API key should not grant arbitrary code execution on the service host.

The fix adds qwed_new.core.safe_parser.safe_parse_expr() and replaces the bare parser calls in the affected math paths. The wrapper rejects dangerous constructs before parsing, supplies an allow-listed local namespace containing expected SymPy math symbols/functions only, removes Python builtins from the parser globals, and enforces basic input validation including empty/non-string checks and a length limit. This preserves deterministic symbolic verification while narrowing the parser boundary to math expressions.

Vulnerability details

  • CWE: CWE-95, Improper Neutralization of Directives in Dynamically Evaluated Code
  • Severity: high, because a crafted math expression can execute host commands under the API process account
  • Affected files/functions before this patch:
    • src/qwed_new/api/main.py, verify_math, request expression -> parse_expr()
    • src/qwed_new/core/verifier.py, VerificationEngine math/identity/derivative/integral/limit helpers -> parse_expr()
    • src/qwed_new/core/batch.py, math batch item query -> parse_expr()
    • src/qwed_new/core/validator.py, semantic math validation -> parse_expr()

Proof of Concept

The underlying unsafe behavior can be reproduced against the previous implementation with the same sink used by the affected code paths:

from sympy.parsing.sympy_parser import parse_expr

# This executes the host command before returning a SymPy value.
parse_expr('__import__("os").system("id")')

For the API path, a tenant with a valid key could send the malicious expression to the existing math verification route:

import requests

response = requests.post(
    "http://localhost:8000/verify/math",
    headers={"x-api-key": "<valid tenant api key>"},
    json={"expression": "__import__(\"os\").system(\"id\")"},
    timeout=10,
)
print(response.status_code)
print(response.text)

After this patch, the same payload is rejected by safe_parse_expr() before SymPy evaluation. Normal expressions such as 2+2, x**2 + 2*x + 1, sin(x), sqrt(16), and factorial(5) still parse successfully.

Validation

I validated the changed parser boundary and the endpoint exception path with:

PYTHONPATH=src python3 -m pytest tests/security/test_safe_parser.py -q
PYTHONPATH=src python3 -m pytest tests/test_api_exceptions.py::test_verify_math_exception_handling -q

Results:

  • tests/security/test_safe_parser.py: 36 passed
  • tests/test_api_exceptions.py::test_verify_math_exception_handling: 1 passed

I also checked for remaining direct production uses of SymPy parse_expr() outside the new wrapper. The remaining sympy.parse_expr text is in documentation/example text, not an active parser call.

A broader local run of tests/security/test_safe_parser.py tests/test_api_exceptions.py passed the new security tests but exposed an unrelated Python 3.14 compatibility failure in tests/test_api_exceptions.py::test_verify_stats_exception_handling caused by ast.Num removal while importing stats_verifier.py. That failure is outside this parser change.

Security analysis

The exploit works because parse_expr() is not just a passive math grammar parser: it transforms input and evaluates Python code. Without a constrained global_dict and local_dict, identifiers such as __import__ can resolve to Python runtime capabilities and invoke OS commands. The request body field expression and the corresponding engine/batch inputs are controlled by the caller, so the attacker controls the string that reaches the evaluation sink.

This patch mitigates the issue in depth. The denylist blocks common Python execution, reflection, import, filesystem, and process-spawning constructs before parsing. The __builtins__ removal prevents fallback access to builtins during evaluation. The allow-listed local dictionary limits successful names to expected mathematical symbols, constants, and SymPy functions. The length/type checks also keep malformed or oversized inputs fail-closed instead of reaching the parser.

Before submitting, I attempted to disprove the issue by checking whether authentication or routing protections would remove exploitability. The route does require get_current_tenant, but the project exposes tenant API keys for verification workloads; that precondition does not already grant code execution. I also checked for existing input validation before the parser and did not find a sanitizer that would block the payload before it reached parse_expr().

Notes

This remains compliant with QWED's deterministic verification boundary: the patch does not add fallback execution, retries, or LLM trust. It tightens the symbolic math parser so verification continues to run through SymPy, but only with a constrained math namespace.


Submitted by Sebastion — autonomous open-source security research from Foundation Machines. Free for public repos via the Sebastion AI GitHub App.

Summary by CodeRabbit

  • New Features

    • Implemented stricter validation for mathematical expression parsing across the platform.
  • Tests

    • Added comprehensive test suite for expression validation and variable name handling.
  • Bug Fixes

    • Enhanced expression parsing to handle edge cases more robustly.

sympy parse_expr uses Python code execution internally and without
restrictions on local_dict/global_dict allows arbitrary code execution
through crafted math expression strings.

Add safe_parse_expr wrapper that:
- Validates input against a denylist of dangerous patterns (dunder
  attrs, import, exec, os, subprocess, etc.)
- Restricts the namespace to only known-safe sympy objects (math
  functions, constants, and common symbolic variables)
- Strips Python builtins from the global namespace
- Enforces a maximum expression length

Replace all bare parse_expr calls across the codebase:
- src/qwed_new/api/main.py (/verify/math endpoint)
- src/qwed_new/core/verifier.py (VerificationEngine)
- src/qwed_new/core/batch.py (batch math verification)
- src/qwed_new/core/validator.py (SemanticValidator)

Signed-off-by: Sebastion <sebastion@sebastion.dev>
@greptile-apps

greptile-apps Bot commented Jun 8, 2026

Copy link
Copy Markdown

Greptile Summary

This PR fixes a CWE-95 (eval injection) vulnerability where user-controlled math expression strings were passed directly to SymPy's parse_expr(), which uses Python eval() internally and could execute arbitrary host commands. A new safe_parse_expr() wrapper is introduced and substituted at every affected call site in main.py, verifier.py, batch.py, and validator.py.

  • safe_parser.py adds a denylist for dangerous patterns, strips __builtins__ from the global_dict, supplies an allow-listed local_dict of SymPy math symbols, copies the shared global dict per call to prevent cross-request namespace leakage, and enforces type/length guards before reaching parse_expr.
  • verifier.py also calls validate_variable_name() before passing the user-supplied variable argument to Symbol() in verify_derivative, verify_integral, and verify_limit, closing the one remaining user-controlled string path that bypassed the expression guard.
  • Tests cover 36 security scenarios (payload blocking, legitimate math, multi-letter symbols, global-dict isolation) plus the API exception-handling path, all passing.

Confidence Score: 5/5

This PR is safe to merge. It replaces every direct parse_expr() call site with the new sandboxed wrapper and adds no new execution paths or trust boundaries.

All four previously vulnerable call sites (main.py, verifier.py, batch.py, validator.py) are consistently patched. The wrapper's three-layer defence (denylist → stripped builtins → allow-listed namespace) is correctly implemented and independently tested with 36 targeted security cases. Previous concerns about multi-letter symbols and shared global-dict mutation are both resolved in this version. No regressions in the exception-handling path were introduced, and the test for API error sanitisation still passes with the updated patch target.

No files require special attention. The new safe_parser.py module is the core of the change and is well-tested.

Important Files Changed

Filename Overview
src/qwed_new/core/safe_parser.py New module implementing the sandboxed parser: denylist, empty builtins, allow-listed local_dict (including Greek letters), per-call shallow copy of global_dict, and validate_variable_name(). Defense-in-depth is solid; previous thread concerns about multi-letter symbols and shared global-dict mutation are both resolved here.
src/qwed_new/core/verifier.py All five direct parse_expr() call sites replaced with safe_parse_expr(); validate_variable_name() correctly gates the variable parameter in verify_derivative, verify_integral, and verify_limit before Symbol() is called.
src/qwed_new/api/main.py Three parse_expr() calls replaced with safe_parse_expr(); outer exception handler still returns the sanitised INTERNAL_VERIFICATION_ERROR constant so no error content leaks to clients.
src/qwed_new/core/batch.py All three parse_expr() call sites in MATH batch verification replaced with safe_parse_expr(); no other math paths affected.
src/qwed_new/core/validator.py Single parse_expr() call in SemanticValidator.validate() replaced with safe_parse_expr(); change is minimal and correct.
tests/security/test_safe_parser.py New test file with 36 cases covering payload blocking, legitimate math, Greek/multi-letter symbols, global-dict isolation, and variable-name validation — comprehensive coverage for the new module.
tests/test_api_exceptions.py Patch target updated from sympy.parsing.sympy_parser.parse_expr to qwed_new.core.safe_parser.parse_expr — correctly patches the reference used inside the wrapper module.

Sequence Diagram

sequenceDiagram
    participant Client
    participant API as POST /verify/math
    participant SP as safe_parse_expr()
    participant DL as Denylist check
    participant LD as _build_safe_local_dict()
    participant PE as parse_expr() (SymPy)

    Client->>API: expression (user input)
    API->>SP: safe_parse_expr(expression)
    SP->>SP: isinstance / empty / length checks
    SP->>DL: _DANGEROUS_PATTERNS.search(stripped)
    alt Dangerous pattern found
        DL-->>SP: match
        SP-->>API: ValueError("disallowed constructs")
        API-->>Client: "status=ERROR, sanitised message"
    else Safe input
        DL-->>SP: no match
        SP->>LD: build allow-listed local_dict
        LD-->>SP: "{x,y,sin,cos,...,Greek letters}"
        SP->>PE: "parse_expr(expr, local_dict=..., global_dict=dict(SAFE_GLOBAL))"
        PE-->>SP: sympy.Basic expression
        SP-->>API: sympy expression
        API-->>Client: verification result
    end
Loading

Reviews (2): Last reviewed commit: "fix: address review — add Greek symbols,..." | Re-trigger Greptile

Comment thread src/qwed_new/core/safe_parser.py
Comment thread src/qwed_new/core/safe_parser.py
Comment thread src/qwed_new/core/verifier.py
@rahuldass19 Rahul Dass (rahuldass19) added the bug Something isn't working label Jun 12, 2026
@rahuldass19

Copy link
Copy Markdown
Member

Thanks for the report, Sebastion (@sebastiondev). We independently audited the codebase against your claims and verified the vulnerability.

Vulnerability Status: Confirmed

We reproduced the exploit on our current SymPy 1.14.0 installation using a chr()-based bypass that works with and without SymPy transformations:

from sympy.parsing.sympy_parser import parse_expr
parse_expr('__import__(chr(111)+chr(115)).system(chr(105)+chr(100))')
# Executes os.system("id") on the host

All 17 production parse_expr calls across main.py, verifier.py, batch.py, and validator.py are reachable from user-controlled input with zero sanitization between the request body and the eval() sink. Authentication (get_current_tenant) is not a mitigation here — any tenant with a valid API key can exploit this.

The fix direction is correct and aligns with QWED's fail-closed philosophy. However, we need the following changes before we can merge:

Required Changes

1. Multi-letter symbolic variable support

_build_safe_local_dict only defines single-letter variables (x, y, z, etc.). Expressions using alpha, beta, theta, phi, omega, lambda, epsilon, tau, delta, sigma, gamma — which are common in our verification workloads — will raise ValueError where they previously parsed fine. These need to be added to the allow-list.

2. Shared mutable _SAFE_GLOBAL_DICT

The module-level dict is passed by reference to every parse_expr call. SymPy transformations can mutate global_dict in-place, which means entries from one tenant's parse call can leak into subsequent calls. Use dict(_SAFE_GLOBAL_DICT) (shallow copy) per invocation.

3. Validate variable parameter in calculus methods

verify_derivative, verify_integral, and verify_limit route expression through safe_parse_expr but pass the variable argument directly to Symbol(variable) without any validation. The hardened boundary should be consistent across all user-controlled string inputs.


If you're able to address these, we're happy to re-review. If not, we may implement the fix internally based on your approach — the vulnerability is real and we want to close it promptly. Either way, thank you for the responsible report.

@codspeed-hq

codspeed-hq Bot commented Jun 13, 2026

Copy link
Copy Markdown
Contributor

Merging this PR will improve performance by 65%

⚠️ Different runtime environments detected

Some benchmarks with significant performance changes were compared across different runtime environments,
which may affect the accuracy of the results.

Open the report in CodSpeed to investigate

⚡ 2 improved benchmarks
✅ 18 untouched benchmarks

Performance Changes

Benchmark BASE HEAD Efficiency
test_bench_math_algebraic_expression 1.8 ms 1.1 ms +67.3%
test_bench_math_simple_arithmetic 1.8 ms 1.1 ms +62.72%

Tip

Curious why this is faster? Comment @codspeedbot explain why this is faster on this PR, or directly use the CodSpeed MCP with your agent.


Comparing sebastiondev:fix/cwe95-main-sympy-9383 (bbf9ade) with main (06801f6)

Open in CodSpeed

@codecov

codecov Bot commented Jun 13, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 92.85714% with 5 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
src/qwed_new/core/safe_parser.py 93.02% 3 Missing ⚠️
src/qwed_new/api/main.py 50.00% 2 Missing ⚠️

📢 Thoughts on this report? Let us know!

@coderabbitai

coderabbitai Bot commented Jun 13, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

This PR hardens math expression parsing across the verification system by introducing a denylist-based safe parser that replaces SymPy's unrestricted parse_expr. The safe parser validates input length, rejects dangerous constructs, and restricts the parsing namespace to allow-listed symbols and functions, mitigating CWE-95 code injection risks.

Changes

Safe Expression Parser Security Hardening

Layer / File(s) Summary
Safe parser security implementation
src/qwed_new/core/safe_parser.py
Introduces safe_parse_expr() with denylist validation and allow-listed namespace; validate_variable_name() for identifier constraints; and SAFE_TRANSFORMATIONS constant for SymPy configuration.
Core verification engine integration
src/qwed_new/core/verifier.py
Updates VerificationEngine.TRANSFORMATIONS and all verification methods (verify_math, verify_identity, verify_derivative, verify_integral, verify_limit) to parse expressions and variable names via safe parser.
Semantic validator integration
src/qwed_new/core/validator.py
Updates SemanticValidator syntax validation to use safe_parse_expr for expression parsing.
Batch and API service integration
src/qwed_new/core/batch.py, src/qwed_new/api/main.py
Batch service and /verify/math endpoint both switch to safe_parse_expr for equation and non-equation expression verification.
Safe parser security test suite
tests/security/test_safe_parser.py
Comprehensive tests verifying rejection of code-execution payloads, I/O functions, oversized input; correct parsing of valid math expressions; and variable name validation.
API exception test infrastructure update
tests/test_api_exceptions.py
Updates exception injection point to patch the new safe parser function while maintaining sanitized error response validation.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Poem

🐰 A parser once wild, now tamed with care,
No more dangerous code shall lurk there!
With denylists strong and namespaces tight,
Safe math expressions shine ever so bright! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 15.38% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the main change: switching from SymPy's unrestricted parse_expr to a safer parser to prevent code injection vulnerabilities.
Description check ✅ Passed The description is comprehensive and well-structured, covering all template sections including the enforcement checklist, detailed vulnerability analysis, proof-of-concept, validation results, and security justification.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@sebastiondev

Copy link
Copy Markdown
Author

Thank you Rahul Dass (@rahuldass19) for the thorough review and for independently confirming the vulnerability — really appreciated. All three requested changes have been addressed in bbf9ade:

1. Multi-letter symbolic variable support ✅

_build_safe_local_dict now includes the full Greek alphabet (lowercase: alpha, beta, gamma, delta, epsilon, zeta, eta, theta, iota, kappa, mu, nu, xi, omicron, rho, sigma, tau, upsilon, phi, chi, psi, omega) plus commonly-used capitals (Alpha, Beta, Gamma, Delta, Theta, Lambda, Sigma, Phi, Psi, Omega). All are pre-defined as Symbol(...) in the allow-list, so expressions like alpha + beta or sin(theta) now parse correctly. Test coverage added in TestSafeParseExprMultiLetterSymbols.

2. Shared mutable _SAFE_GLOBAL_DICT → shallow copy per call ✅

safe_parse_expr now passes dict(_SAFE_GLOBAL_DICT) instead of the module-level dict directly. This prevents SymPy transformations from mutating the shared reference across invocations, eliminating the cross-tenant namespace leak risk. Test coverage added in TestSafeParseExprGlobalDictIsolation.

3. variable parameter validation in calculus methods ✅

Added validate_variable_name() to safe_parser.py — it applies a character-set whitelist (^[A-Za-z][A-Za-z0-9_]{0,49}$), the same denylist regex, and a 50-char length cap. verify_derivative, verify_integral, and verify_limit in verifier.py now call validate_variable_name(variable) before Symbol(variable), making the hardened boundary consistent across all user-controlled string inputs. Test coverage added in TestValidateVariableName.


All 61 security tests pass (the pre-existing test_verify_stats_exception_handling failure is an unrelated ast.Num deprecation in Python 3.14, not caused by this PR).

Happy to adjust if any of the Greek letter selections need tweaking for your workloads.

Comment on lines +284 to 286
expr = safe_parse_expr(expression)
validate_variable_name(variable)
var = Symbol(variable)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: When verifying calculus operations, using "n" as the variable causes a symbol mismatch, leading to incorrect results like 0 for derivatives.
Severity: MEDIUM

Suggested Fix

When creating the variable symbol in the verifier (e.g., var = Symbol(variable)), check if the variable string matches a key in the _build_safe_local_dict() from safe_parser. If it does, reuse the existing symbol object from that dictionary instead of creating a new one. This ensures the symbol used for the operation is identical to the one in the parsed expression.

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.

Location: src/qwed_new/core/verifier.py#L284-L286

Potential issue: In `verify_derivative`, `verify_integral`, and `verify_limit`, if the
variable of the operation is `"n"`, the verification will likely fail. This is because
`safe_parse_expr` parses expressions containing `"n"` into a special SymPy symbol with
assumptions (`Symbol("n", integer=True, positive=True)`). However, the verifier creates
a plain `Symbol("n")` for the operation variable. Since SymPy treats these two symbols
as distinct, calculus operations like `diff` incorrectly return 0, as the
differentiation variable is not found in the expression. This causes valid claims like
`diff(x**n, n) = x**n * log(x)` to be incorrectly rejected.

Did we get this right? 👍 / 👎 to inform future reviews.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (3)
tests/security/test_safe_parser.py (1)

126-138: ⚡ Quick win

Strengthen the global dict isolation test.

The current test only verifies that multiple calls succeed without exceptions, but it doesn't validate the actual isolation mechanism. It doesn't prove that copying global_dict per call prevents cross-contamination of SymPy transformations or symbols.

Consider testing a scenario where state leakage would actually occur if the dict were shared:

  • Parse an expression with extra_symbols containing a custom symbol
  • Parse a second expression referencing that symbol name without passing extra_symbols
  • Verify the second parse fails (proving the symbol didn't leak)

Alternatively, if such leakage is difficult to trigger in practice, document why the current test provides sufficient coverage.

♻️ Example strengthened test
 def test_global_dict_not_shared_between_calls(self) -> None:
-    """Parse two different expressions and confirm no cross-contamination."""
-    safe_parse_expr("x + 1")
-    safe_parse_expr("alpha + beta")
-    # If global_dict were shared mutably, SymPy transformations could
-    # leak symbols from one call into another. The shallow-copy fix
-    # prevents this. We just verify no exception is raised and both
-    # parse independently.
-    result = safe_parse_expr("y + 2")
-    assert str(result) == "y + 2"
+    """Verify that extra_symbols from one call don't leak into another."""
+    from sympy import Symbol
+    
+    # First call with a custom symbol
+    safe_parse_expr("custom_var + 1", extra_symbols={"custom_var": Symbol("custom_var")})
+    
+    # Second call should NOT have access to custom_var
+    with pytest.raises(ValueError):
+        safe_parse_expr("custom_var + 2")  # Should fail: custom_var not in default namespace
+    
+    # Verify normal expressions still work
+    result = safe_parse_expr("x + 2")
+    assert str(result) == "x + 2"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/security/test_safe_parser.py` around lines 126 - 138, Update the test
for global dict isolation to actively demonstrate leakage wouldn't occur: call
safe_parse_expr once with extra_symbols including a custom symbol (e.g.,
safe_parse_expr("leak + 1", extra_symbols={"leak": Symbol("leak")})) and assert
that parsing that expression succeeds and produces the expected result, then
call safe_parse_expr again for the same symbol name without providing
extra_symbols and assert that this second call raises the appropriate error
(e.g., NameError or SympifyError) or does not return a Symbol — this proves that
the per-call copy of _SAFE_GLOBAL_DICT prevents the first call's symbol from
leaking into subsequent calls; update
TestSafeParseExprGlobalDictIsolation.test_global_dict_not_shared_between_calls
accordingly, keeping references to safe_parse_expr and _SAFE_GLOBAL_DICT to
locate the implementation if needed.
src/qwed_new/core/safe_parser.py (2)

198-203: 💤 Low value

Including Symbol in local_dict is necessary but warrants documentation.

Symbol is needed because SymPy's transformations may emit code referencing Symbol(). However, this allows users to create symbols with arbitrary names via expressions like Symbol('anything'). The denylist protects against dangerous attribute accesses on the resulting symbol, but consider adding a comment documenting this security boundary.

         # Sympy internal types emitted by standard_transformations
         "Integer": Integer,
         "Float": Float,
         "Rational": Rational,
+        # Symbol is required by transformations; denylist prevents dangerous
+        # attribute access on created symbols.
         "Symbol": Symbol,
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/qwed_new/core/safe_parser.py` around lines 198 - 203, Add a short
in-source comment by the local_dict entry that maps "Symbol" (the dictionary
that includes Integer, Float, Rational, Symbol used with
standard_transformations) explaining why Symbol is required (SymPy
transformations may emit Symbol()), the security boundary it creates (users can
call Symbol('name') to create arbitrary symbol names) and that the existing
denylist/attribute-access protections are relied on to mitigate risks; keep the
comment concise and reference the local_dict and Symbol so future readers know
this is intentional and what to review if tightening is needed.

40-66: 💤 Low value

Denylist-based filtering is a reasonable mitigation but inherently weaker than allowlist.

The pattern set covers key attack vectors (dunders, code execution primitives, system modules). However, denylist approaches can be bypassed by novel attack vectors or encoding tricks. Since SymPy's parse_expr ultimately uses eval(), consider adding defense-in-depth measures in future iterations:

  • AST complexity/depth limits
  • Symbolic expression tree validation post-parse

For this PR, the combination of denylist + restricted namespace + stripped builtins is a substantial improvement and aligns with fail-closed philosophy.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/qwed_new/core/safe_parser.py` around lines 40 - 66, The denylist in
_DANGEROUS_PATTERNS is useful but brittle; add defense‑in‑depth by (1) enforcing
an AST complexity/depth limit on parsed expressions (e.g., run ast.parse on the
input and reject if node count/depth exceeds a safe threshold) before calling
SymPy's parse_expr, and (2) validating the resulting SymPy expression tree after
parse_expr to ensure it contains only allowed node/symbol types (reject
Function, Call, or unexpected Name nodes). Update the code paths that call
parse_expr/use _DANGEROUS_PATTERNS to perform the pre-parse AST checks and the
post-parse symbolic validation (leave the denylist in place as an additional
filter).

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@src/qwed_new/core/safe_parser.py`:
- Around line 198-203: Add a short in-source comment by the local_dict entry
that maps "Symbol" (the dictionary that includes Integer, Float, Rational,
Symbol used with standard_transformations) explaining why Symbol is required
(SymPy transformations may emit Symbol()), the security boundary it creates
(users can call Symbol('name') to create arbitrary symbol names) and that the
existing denylist/attribute-access protections are relied on to mitigate risks;
keep the comment concise and reference the local_dict and Symbol so future
readers know this is intentional and what to review if tightening is needed.
- Around line 40-66: The denylist in _DANGEROUS_PATTERNS is useful but brittle;
add defense‑in‑depth by (1) enforcing an AST complexity/depth limit on parsed
expressions (e.g., run ast.parse on the input and reject if node count/depth
exceeds a safe threshold) before calling SymPy's parse_expr, and (2) validating
the resulting SymPy expression tree after parse_expr to ensure it contains only
allowed node/symbol types (reject Function, Call, or unexpected Name nodes).
Update the code paths that call parse_expr/use _DANGEROUS_PATTERNS to perform
the pre-parse AST checks and the post-parse symbolic validation (leave the
denylist in place as an additional filter).

In `@tests/security/test_safe_parser.py`:
- Around line 126-138: Update the test for global dict isolation to actively
demonstrate leakage wouldn't occur: call safe_parse_expr once with extra_symbols
including a custom symbol (e.g., safe_parse_expr("leak + 1",
extra_symbols={"leak": Symbol("leak")})) and assert that parsing that expression
succeeds and produces the expected result, then call safe_parse_expr again for
the same symbol name without providing extra_symbols and assert that this second
call raises the appropriate error (e.g., NameError or SympifyError) or does not
return a Symbol — this proves that the per-call copy of _SAFE_GLOBAL_DICT
prevents the first call's symbol from leaking into subsequent calls; update
TestSafeParseExprGlobalDictIsolation.test_global_dict_not_shared_between_calls
accordingly, keeping references to safe_parse_expr and _SAFE_GLOBAL_DICT to
locate the implementation if needed.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: b93953c5-3fb4-47f0-93ea-c266f7c3684a

📥 Commits

Reviewing files that changed from the base of the PR and between 06801f6 and bbf9ade.

📒 Files selected for processing (7)
  • src/qwed_new/api/main.py
  • src/qwed_new/core/batch.py
  • src/qwed_new/core/safe_parser.py
  • src/qwed_new/core/validator.py
  • src/qwed_new/core/verifier.py
  • tests/security/test_safe_parser.py
  • tests/test_api_exceptions.py

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants