fix: reduce false positives from real-world repo analysis #142

Open: JBAhire wants to merge 2 commits into main from fix/fp-analysis-from-real-repos

JBAhire commented Apr 8, 2026

Summary

Scanned g0 against 14 real-world OSS repos and eliminated 89% of false positives (4,578 → 482 findings) across two rounds of fixes.

Repos tested

langchain, langchainjs, langgraph, crewAI-examples, MCP servers, MCP python-sdk, openai-agents-python, swarm, autogen, vercel ai-chatbot, langchain4j, langchaingo, spring-ai, open-interpreter

Final Results (default --min-confidence medium)

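| Repo | Before | After | Reduction | Score (before→after) |
|---|---|---|---|---|
| servers | 373 | 17 | 95% | 67→85 |
| crewAI-examples | 238 | 158 | 34% | 45→47 |
| langchain | 317 | 3 | 99% | 47→48 |
| langchainjs | 299 | 22 | 93% | 51→58 |
| langgraph | 110 | 39 | 65% | 66→67 |
| autogen | 695 | 89 | 87% | 22→24 |
| ai-chatbot | 100 | 29 | 71% | 60→60 |
| python-sdk | 26 | 10 | 62% | 92→82 |
| openai-agents-python | 106 | 26 | 75% | 68→67 |
| langchain4j | 50 | 24 | 52% | 81→88 |
| langchaingo | 126 | 33 | 74% | 79→86 |
| spring-ai | 281 | 32 | 89% | 80→89 |
| swarm | 377 | 0 | 100% | 69→100 |
| open-interpreter | 1480 | 0 | 100% | 17→100 |
| Total | 4,578 | 482 | 89% | |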
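
The Reduction column appears to be the relative drop in findings, rounded to the nearest whole percent. A minimal sketch that reproduces the figures above (the function name `reductionPercent` is hypothetical and not part of g0):

```typescript
// Hypothetical helper: relative reduction in findings, as a whole percent.
function reductionPercent(before: number, after: number): number {
  // Guard against division by zero for repos with no findings to begin with.
  if (before <= 0) return 0;
  return Math.round(((before - after) / before) * 100);
}

// Totals row of the table: 4,578 findings before, 482 after.
const totalReduction = reductionPercent(4578, 482);
```

Under this reading, `totalReduction` is 89, matching the headline number in the summary.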

Round 1 Fixes (commit 1)

  1. Test file filtering: expanded isTestFile() patterns, remove ALL test-file findings by default
  2. CVE version placeholders: reject ${project.version}, workspace:* as non-numeric
  3. Utility-code suppression: suppress when >80% of findings are utility-code
  4. Server endpoint detection: server.(py|ts|js) pattern for MCP servers
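
To make the test-file filtering in item 1 concrete, here is a minimal sketch. It uses the path patterns named in the first commit message, but the regexes, the `Finding` shape, and the `filterTestFindings` helper are assumptions for illustration, not g0's actual engine.ts code:

```typescript
// Hypothetical finding shape; g0's real type likely has more fields.
interface Finding {
  ruleId: string;
  file: string;
}

// Directory fragments and file suffixes listed in the commit message.
// The exact regexes are an assumption.
const TEST_PATH_PATTERNS: RegExp[] = [
  /(^|\/)\.github\//,
  /(^|\/)__mocks__\//,
  /(^|\/)_test_\//,
  /(^|\/)testing\//,
  /(^|\/)testutils\//,
  /(^|\/)testdata\//,
  /_tests\.py$/,
  /Tests\.java$/,
];

function isTestFile(path: string): boolean {
  // Normalize Windows separators so one pattern set covers both styles.
  const normalized = path.replace(/\\/g, "/");
  return TEST_PATH_PATTERNS.some((pattern) => pattern.test(normalized));
}

// Drop every test-file finding unless the caller opts in,
// mirroring the described --include-tests behaviour.
function filterTestFindings(findings: Finding[], includeTests = false): Finding[] {
  if (includeTests) return findings;
  return findings.filter((finding) => !isTestFile(finding.file));
}
```

Keeping the opt-in as a single boolean means the default path removes all test-file findings regardless of severity, which is the behaviour change this round describes.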

Round 2 Fixes (commit 2)

  1. CVE package matching: affectedPackages field so CVE-2026-28363 only matches openclaw (eliminated 430 false critical CVEs)
  2. Framework library detection: isFrameworkLibFile() downgrades confidence for langchain_core/, crewai/src/, autogen_agentchat/ (hidden by default)
  3. Blanket rule cleanup: AA-GI-059, AA-IA-096, AA-MP-086, AA-RA-068, AA-DL-102 downgraded to info/low
  4. AA-CF-056 cap: max 3 findings per scan, severity medium
  5. MCP tool server exclusion: file_not_matches context for AA-RA-041/AA-HO-075 (new yaml-compiler feature)
  6. AA-TS-142 scoped to Python: no longer fires on React .tsx setTimeout/setInterval
  7. Framework filter on agent_property: AA-TS-184 now respects frameworks: [mcp]
  8. Intelligence test file filter: IOC findings in test files now filtered
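
The package-scoped CVE matching in item 1 could look roughly like the sketch below. Only the `CVEEntry` and `affectedPackages` names come from this PR. The `Dependency` type, the `cveAppliesTo` helper, the case-insensitive comparison, and the choice to treat an entry without a package list as matching nothing are all assumptions:

```typescript
// CVEEntry and affectedPackages are named in the PR; the rest is hypothetical.
interface CVEEntry {
  id: string;
  affectedPackages?: string[];
}

interface Dependency {
  name: string;
  version: string;
}

function cveAppliesTo(entry: CVEEntry, dep: Dependency): boolean {
  const packages = entry.affectedPackages;
  // Assumption: with no package list we cannot attribute the CVE, so skip it.
  if (!packages || packages.length === 0) return false;
  const depName = dep.name.toLowerCase();
  return packages.some((pkg) => pkg.toLowerCase() === depName);
}

const openclawCve: CVEEntry = { id: "CVE-2026-28363", affectedPackages: ["openclaw"] };
const matchesLangchain = cveAppliesTo(openclawCve, { name: "langchain", version: "0.3.0" });
const matchesOpenclaw = cveAppliesTo(openclawCve, { name: "openclaw", version: "1.0.0" });
```

Here `matchesLangchain` is false and `matchesOpenclaw` is true. A version-range check would still need to run after the package check passes.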

Test plan

  • All 1,420 tests pass
  • 14 real-world repos scanned, 89% FP reduction
  • No regressions on repos with good agent detection
  • langchain: 317→3, langchainjs: 299→22, spring-ai: 281→32

JBAhire added 2 commits April 8, 2026 14:21
Tested g0 against 16 OSS repos (langchain, crewAI, AutoGPT, MCP servers,
spring-ai, etc.) and found 70% of 7,983 findings were false positives.

Root causes and fixes:

1. Test file filtering too lenient (engine.ts)
   - Expanded isTestFile() patterns: .github/, __mocks__/, _test_/,
     testing/, testutils/, testdata/, _tests.py, Tests.java
   - Changed default from "keep downgraded critical/high in test files"
     to "remove all test-file findings" (--include-tests still works)

2. CVE version comparison bug (cve-feed.ts)
   - Maven ${project.version}, pnpm workspace:*, workspace:^ were
     parsed as NaN, which compared as "less than" any version
   - Added early return for non-numeric version placeholders
   - Eliminated 430 false critical CVE findings across 7 repos

3. Utility-code suppression too conservative (pipeline.ts)
   - Previously only suppressed when agents/tools were detected
   - Now also suppresses when >80% of findings are utility-code
   - Prevents noise floods in large repos where parsers miss entry points

4. Server file endpoint detection (reachability.ts)
   - Added server.(py|ts|js) pattern to endpoint detection
   - MCP server files now classified as endpoint-reachable instead of
     utility-code, preserving real findings in server repos
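
The version-comparison bug in item 2 is easy to reproduce: `Number("workspace:*")` is NaN, and a comparator that does not check for NaN can end up treating the placeholder as older than every fixed version. A minimal sketch of the early-return guard follows. The function names and the handling of range prefixes are assumptions, not the actual cve-feed.ts code:

```typescript
// Hypothetical parser: returns numeric components, or null for placeholders
// such as "${project.version}", "workspace:*" and "workspace:^".
function parseNumericVersion(version: string): number[] | null {
  // Assumption: strip a leading range operator or "v" before checking.
  const cleaned = version.trim().replace(/^[\^~=v]+/, "");
  if (!/^\d+(\.\d+)*$/.test(cleaned)) return null;
  return cleaned.split(".").map(Number);
}

// True only when both versions are numeric and installed < fixed.
function isVersionBelow(installed: string, fixed: string): boolean {
  const a = parseNumericVersion(installed);
  const b = parseNumericVersion(fixed);
  // Early return: an unresolvable placeholder is never reported as vulnerable.
  if (a === null || b === null) return false;
  const length = Math.max(a.length, b.length);
  for (let i = 0; i < length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    if (x !== y) return x < y;
  }
  return false; // equal versions are not below
}
```

This sketch deliberately treats pre-release suffixes such as `1.2.0-beta` as non-numeric, which errs toward fewer findings. A real implementation would probably parse them properly.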
Round 2 of FP analysis against 14 OSS repos (4,578 -> 482 findings).

Fixes:

1. CVE matching: add affectedPackages to CVEEntry so CVE-2026-28363
   only matches openclaw, not langchain/ollama/google-generativeai.
   Also filter intelligence findings from test files (SSRF test IOCs).

2. Blanket agent_property rules: downgrade AA-GI-059, AA-IA-096,
   AA-MP-086, AA-RA-068, AA-DL-102 to info/low — these check for
   properties no parser ever populates, making them 100% FP.

3. Framework library detection: new isFrameworkLibFile() downgrades
   confidence to low for findings in langchain_core/, crewai/src/,
   autogen_agentchat/, etc. Hidden by default --min-confidence medium.

4. AA-CF-056 capped at 3 findings and downgraded to medium — was
   firing 101 times (once per CrewAI agent) for the same systemic issue.

5. AA-RA-041/AA-HO-075: add file_not_matches context for MCP fetch
   servers. Also add file_not_matches support to yaml-compiler for
   both ast_matches and code_matches check types.

6. AA-TS-142: scope to Python only — was firing on React .tsx files
   matching setTimeout/setInterval.

7. AA-TS-184: enforce frameworks filter on agent_property checks —
   rule declared frameworks: [mcp] but fired on langchain agents.
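
The per-rule cap in item 4 can be pictured as a filter applied after rules run. The sketch below is an assumption about the shape of that step. `capFindingsPerRule`, the `Finding` type, and the caps map are hypothetical names, and the severity downgrade is left out:

```typescript
// Hypothetical finding shape for illustration.
interface Finding {
  ruleId: string;
  file: string;
}

// Keep at most caps.get(ruleId) findings per rule, preserving input order.
// Rules without an entry in the map are not capped.
function capFindingsPerRule(
  findings: Finding[],
  caps: ReadonlyMap<string, number>,
): Finding[] {
  const kept = new Map<string, number>();
  return findings.filter((finding) => {
    const cap = caps.get(finding.ruleId);
    if (cap === undefined) return true;
    const count = kept.get(finding.ruleId) ?? 0;
    if (count >= cap) return false;
    kept.set(finding.ruleId, count + 1);
    return true;
  });
}

// 101 findings from the same rule, as in the CrewAI case described above.
const noisy: Finding[] = Array.from({ length: 101 }, (_, i) => ({
  ruleId: "AA-CF-056",
  file: `agents/agent_${i}.py`,
}));
const capped = capFindingsPerRule(noisy, new Map([["AA-CF-056", 3]]));
```

With a cap of 3, `capped` holds the first three findings, so one systemic issue no longer dominates the report.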