fix: scoring recalibration, FP reduction, enriched JSON, Zod schemas#132

Open
JBAhire wants to merge 9 commits into main from fix/scoring-recalibration

JBAhire (Contributor) commented Mar 27, 2026

Summary

Major quality overhaul: fixes the scoring inversion bug, reduces false positives, enriches the JSON API, adds platform security hardening, and establishes a shared schema contract.

Scoring fix: previously, a secure agent scored worse (B:85) than a vulnerable agent with shell injection (B:87). Now the vulnerable agent scores D:69 and the secure agent scores C:75.

CLI Changes (9 commits)

Scoring Recalibration

  • Severity deductions increased (critical 20→40, high 10→18, medium 4→6)
  • Critical-floor clamp: 1 critical vulnerability→max C, 2→max D, 3+→max F
  • Unknown reachability 0.6→0.85, not-assessed exploitability 0.7→0.85
  • FindingCategory type: vulnerability | hardening | informational
  • Absence-check rules (no kill switch, no RBAC) downgraded to medium
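
The critical-floor clamp above can be sketched as follows. This is a minimal illustration, not the code in this PR: the grade bands (A from 90, B from 80, C from 70, D from 60) are an assumption that happens to match the scores quoted here, and `gradeFromScore`, `gradeCap`, and `clampGrade` are hypothetical names.

```typescript
type Grade = 'A' | 'B' | 'C' | 'D' | 'F';

// Grades ordered from best to worst; a higher index means a worse grade.
const GRADE_ORDER: Grade[] = ['A', 'B', 'C', 'D', 'F'];

// ASSUMPTION: grade bands are not stated in the PR; these are illustrative.
function gradeFromScore(score: number): Grade {
  if (score >= 90) return 'A';
  if (score >= 80) return 'B';
  if (score >= 70) return 'C';
  if (score >= 60) return 'D';
  return 'F';
}

// Best grade still allowed for a given number of critical vulnerabilities:
// 1 -> max C, 2 -> max D, 3 or more -> max F.
function gradeCap(criticalVulns: number): Grade {
  if (criticalVulns >= 3) return 'F';
  if (criticalVulns === 2) return 'D';
  if (criticalVulns === 1) return 'C';
  return 'A';
}

// The final grade is the worse of the score-derived grade and the cap.
function clampGrade(score: number, criticalVulns: number): Grade {
  const raw = gradeFromScore(score);
  const cap = gradeCap(criticalVulns);
  return GRADE_ORDER.indexOf(raw) >= GRADE_ORDER.indexOf(cap) ? raw : cap;
}
```

With this shape, a numeric score that would otherwise earn an A cannot be reported above C once a single critical vulnerability is present, while a score that is already worse than the cap is left alone.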

False Positive Reduction

  • AA-DL-046: skip imports, expand memory isolation patterns
  • AA-GI-002: expanded instruction guarding detection
  • AA-TS-184: framework-scoped YAML agent_property rules
  • AA-TS-021: narrow network access check to tool scope
  • Non-agent guard: drop hardening findings when no agents/tools detected
  • Self-scan: exclude .claude/worktrees/, fixtures/, CVE docs

JSON API

  • metadata (frameworks, agentCount, toolCount, filesScanned)
  • score.securityScore / score.hardeningScore split
  • finding.category, finding.message, finding.fix aliases
  • graph.nodes[] and graph.edges[] populated
  • Banner suppressed for --cyclonedx and --output

Architecture

  • Zod schemas for all 8 upload payload types (shared contract)
  • Code splitting (SDK consumers get smaller bundles)
  • BaseFinding interface, Rule type safety (removed unsafe casts)
  • AI provider throws on --ai without API key
  • Structured logger (TTY-aware, JSON in CI)
  • --strict flag (exit 2 on critical findings)
  • guard0 npm wrapper (g0 command only, no guard0 command conflict)

Results

| Test case | Before | After |
| --- | --- | --- |
| Vulnerable agent (shell+SQL injection) | B (87) | D (69) |
| Secure agent (proper controls) | B (85) | C (75) |
| Non-agent code (Flask) | A (99) | A (100) |

Test plan

  • Vulnerable agent scores D, secure agent scores C, non-agent scores A
  • JSON output has metadata, categories, graph nodes, message/fix
  • Zod schemas validate correct payloads, reject malformed
  • --strict exits with code 2 on critical findings
  • 102/103 test files pass (1 pre-existing daemon failure)
  • Build succeeds with code splitting
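
The `--strict` check in the test plan can be pictured with a small sketch. The function name `exitCodeFor` and the finding shape are hypothetical, and the sketch assumes exit code 0 whenever `--strict` does not trigger; the real CLI may use other exit codes for other conditions.

```typescript
type Severity = 'critical' | 'high' | 'medium' | 'low' | 'info';

interface FindingLike {
  severity: Severity;
}

// Hypothetical helper: returns 2 only when strict mode is on and at least
// one critical finding exists; otherwise 0 (an assumption for this sketch).
function exitCodeFor(findings: FindingLike[], strict: boolean): number {
  const hasCritical = findings.some((f) => f.severity === 'critical');
  return strict && hasCritical ? 2 : 0;
}
```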

JBAhire added 9 commits March 26, 2026 16:40
Fixes the #1 trust-destroying bug: a secure agent scored worse (B:85)
than a vulnerable agent with shell injection (B:87).

Scoring changes:
- Severity deductions increased (critical 20→40, high 10→18, medium 4→6)
- Critical-floor clamp: 1 crit→max C, 2→max D, 3+ criticals→max F
- Unknown reachability 0.6→0.85 (assume reachable until proven safe)
- Not-assessed exploitability 0.7→0.85 (same principle)
- Correlation bonus capped at 50% of remaining domain score
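
A rough sketch of how the new deductions and default multipliers might combine. The commit message gives the numbers but not the formula, so the multiplicative combination, the flat 100-point starting score, and every name below are assumptions; the correlation bonus and lower severities are left out.

```typescript
type Severity = 'critical' | 'high' | 'medium';

// Deduction values taken from the commit message (after the change).
const DEDUCTION: Record<Severity, number> = { critical: 40, high: 18, medium: 6 };

// Defaults used when a factor could not be determined (after the change).
const UNKNOWN_REACHABILITY = 0.85;
const NOT_ASSESSED_EXPLOITABILITY = 0.85;

interface ScoredFinding {
  severity: Severity;
  reachability?: number;   // 0..1, undefined when unknown
  exploitability?: number; // 0..1, undefined when not assessed
}

// ASSUMPTION: factors multiply the base deduction; the real engine may differ.
function deduction(f: ScoredFinding): number {
  const reach = f.reachability ?? UNKNOWN_REACHABILITY;
  const exploit = f.exploitability ?? NOT_ASSESSED_EXPLOITABILITY;
  return DEDUCTION[f.severity] * reach * exploit;
}

// ASSUMPTION: score starts at 100 and is floored at 0.
function score(findings: ScoredFinding[]): number {
  const total = findings.reduce((sum, f) => sum + deduction(f), 0);
  return Math.max(0, Math.round(100 - total));
}
```

Under these assumptions, a critical finding with unknown reachability and unassessed exploitability costs about 28.9 points, compared with 20 × 0.6 × 0.7 = 8.4 under the old values, which illustrates why unassessed findings weigh more heavily after this change.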

Finding categorization:
- Added FindingCategory type: vulnerability | hardening | informational
- Improved isAbsenceBased() with title-pattern fallback detection

Rule severity downgrades (absence → hardening):
- AA-HO-005 "No emergency stop": critical → medium
- AA-IA-030 "No RBAC enforcement": high → medium
- AA-RA-011 "No kill switch": critical → medium

Results: Vulnerable agent B:87→D:69, Secure agent B:85→C:72

Also includes:
- guard0 CLI alias (guard0 + g0 both work)
- guard0 npm wrapper package for `npm install guard0`
- Version sync script (scripts/version.mjs)
- Updated release workflow to publish guard0 wrapper
- Updated banner to show GUARD0 branding
…mport skipping

AA-DL-046 (shared memory):
- Skip import statements (only flag actual instantiation)
- Expand isolation patterns: thread_id, config.*thread, configurable,
  memory_key=, chat_history
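
One way to picture the AA-DL-046 change, as a hedged sketch: import lines are dropped before matching, and a memory instantiation is only flagged when none of the isolation patterns listed above appears. The `MEMORY_INSTANTIATION` trigger and the function name are hypothetical; the rule's real matching logic is not shown in this PR text.

```typescript
// Python-style import lines, which should no longer trigger the rule.
const IMPORT_LINE = /^\s*(?:from\s+[\w.]+\s+import\b|import\s+\S)/;

// Isolation patterns listed in the commit message.
const ISOLATION_PATTERNS: RegExp[] = [
  /thread_id/,
  /config.*thread/,
  /configurable/,
  /memory_key=/,
  /chat_history/,
];

// HYPOTHETICAL trigger: what counts as "actual instantiation" is assumed here.
const MEMORY_INSTANTIATION = /\bConversationBufferMemory\s*\(/;

function flagsSharedMemory(source: string): boolean {
  // Skip import statements so only real usage is considered.
  const lines = source.split('\n').filter((line) => !IMPORT_LINE.test(line));
  const instantiates = lines.some((line) => MEMORY_INSTANTIATION.test(line));
  if (!instantiates) return false;
  // Any isolation pattern in the remaining code suppresses the finding.
  const isolated = lines.some((line) => ISOLATION_PATTERNS.some((p) => p.test(line)));
  return !isolated;
}
```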

checkInstructionGuarding (shared parser):
- Add broader deny patterns: MUST NOT, you can only, do not disclose,
  outside these boundaries, politely decline, refuse any requests
- Fixes FP where well-written system prompts weren't recognized as guarded
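
The broadened deny-pattern check might look roughly like this. The phrases come from the commit message; case-insensitive matching and the name `isInstructionGuarded` are assumptions for illustration.

```typescript
// Deny phrases named in the commit message.
// ASSUMPTION: matching is case-insensitive; the real parser may be stricter.
const DENY_PATTERNS: RegExp[] = [
  /\bmust not\b/i,
  /\byou can only\b/i,
  /\bdo not disclose\b/i,
  /\boutside these boundaries\b/i,
  /\bpolitely decline\b/i,
  /\brefuse any requests\b/i,
];

// A prompt counts as guarded if it contains at least one deny phrase.
function isInstructionGuarded(systemPrompt: string): boolean {
  return DENY_PATTERNS.some((pattern) => pattern.test(systemPrompt));
}
```

For example, a prompt such as "You MUST NOT reveal internal tools. Politely decline anything else." would be treated as guarded, while "You are a helpful assistant." would not.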

Results: Secure agent 0 criticals (was 1), score C:75 (was C:72)
…ge/fix aliases

JSON reporter:
- Added metadata: frameworks, agentCount, toolCount, promptCount, modelCount, filesScanned
- Added score.securityScore and score.hardeningScore split
- Added finding.category (vulnerability | hardening | informational)
- Added finding.message (alias for title) and finding.fix (alias for remediation)
- Added graph.nodes[] (agents, tools, models with file/line)
- Added graph.edges[] (typed edges between nodes)
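
As a rough picture of the enriched output, the shape below types the fields named above. Anything not listed in the commit message (`id`, `kind`, `from`, `to`, `type`, the top-level `findings` key) is a hypothetical name chosen for the sketch, as is the `withAliases` helper.

```typescript
type FindingCategory = 'vulnerability' | 'hardening' | 'informational';

interface ReportFinding {
  title: string;
  remediation: string;
  category: FindingCategory;
  message: string; // alias for title
  fix: string;     // alias for remediation
}

interface GraphNode {
  id: string;                       // hypothetical field
  kind: 'agent' | 'tool' | 'model'; // hypothetical field name
  file: string;
  line: number;
}

interface GraphEdge {
  from: string; // hypothetical field
  to: string;   // hypothetical field
  type: string; // "typed edges" per the commit message
}

interface JsonReport {
  metadata: {
    frameworks: string[];
    agentCount: number;
    toolCount: number;
    promptCount: number;
    modelCount: number;
    filesScanned: number;
  };
  score: { securityScore: number; hardeningScore: number };
  findings: ReportFinding[]; // hypothetical key name
  graph: { nodes: GraphNode[]; edges: GraphEdge[] };
}

// Hypothetical helper: adds the message/fix aliases to a finding.
function withAliases(
  f: Pick<ReportFinding, 'title' | 'remediation' | 'category'>,
): ReportFinding {
  return { ...f, message: f.title, fix: f.remediation };
}
```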

Analysis engine:
- Derives finding category from checkType and title patterns
- Absence-based rules (no X, missing Y, lacks Z) → hardening
- Code-pattern rules → vulnerability
- Info severity → informational
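
The derivation rules above could be sketched like this. The order of the checks, the `'absence'` value for `checkType`, and the title regex are assumptions; only the three outcomes come from the commit message.

```typescript
type FindingCategory = 'vulnerability' | 'hardening' | 'informational';

interface CategorizableFinding {
  severity: string;
  title: string;
  checkType?: string;
}

// Title fallback for absence-based rules ("No X", "Missing Y", "Lacks Z").
const ABSENCE_TITLE = /^(?:no|missing|lacks)\b/i;

function deriveCategory(f: CategorizableFinding): FindingCategory {
  // Info severity maps to informational.
  if (f.severity === 'info') return 'informational';
  // ASSUMPTION: 'absence' is the checkType value that marks absence rules.
  if (f.checkType === 'absence' || ABSENCE_TITLE.test(f.title)) return 'hardening';
  // Everything else is treated as a code-pattern rule.
  return 'vulnerability';
}
```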

CLI:
- Banner now suppressed for --cyclonedx and --output flags

New file: src/platform/schemas/upload.ts
- 8 upload payload schemas (scan, inventory, mcp, test, flows, endpoint,
  host-hardening, openclaw-audit) as a discriminated union on 'type'
- Endpoint register + heartbeat schemas
- Shared sub-schemas: Finding, ScanScore, ProjectMeta, MachineMeta, CIMeta
- Validates: severity enums, score ranges (0-100), grade enums, required fields

All schemas exported from @guard0/g0 package index so the platform can:
  import { UploadPayloadSchema } from '@guard0/g0'
  const result = UploadPayloadSchema.safeParse(payload)

This is the shared schema contract between CLI and platform — single
source of truth, runtime validation, no more silent schema drift.
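
To make the discriminated-union idea concrete, here is a small stand-in built directly with Zod. It is not the real `UploadPayloadSchema`: it models only two of the eight payload types, and the field names, the `low` severity value, and the grade letters are assumptions for illustration.

```typescript
import { z } from 'zod';

// ASSUMPTION: severity and grade values are illustrative, not the real enums.
const Severity = z.enum(['critical', 'high', 'medium', 'low', 'info']);
const Finding = z.object({ severity: Severity, title: z.string().min(1) });
const ScanScore = z.object({
  score: z.number().min(0).max(100), // score range 0-100, as in the PR
  grade: z.enum(['A', 'B', 'C', 'D', 'F']),
});

const ScanUpload = z.object({
  type: z.literal('scan'),
  score: ScanScore,
  findings: z.array(Finding),
});

const InventoryUpload = z.object({
  type: z.literal('inventory'),
  agentCount: z.number().int().nonnegative(), // hypothetical field
});

// Zod picks the member schema by looking at the 'type' field.
const LocalUploadSchema = z.discriminatedUnion('type', [ScanUpload, InventoryUpload]);

const ok = LocalUploadSchema.safeParse({
  type: 'scan',
  score: { score: 69, grade: 'D' },
  findings: [{ severity: 'critical', title: 'Shell injection' }],
});

const bad = LocalUploadSchema.safeParse({
  type: 'scan',
  score: { score: 140, grade: 'A' }, // out of range, so validation fails
  findings: [],
});
```

`safeParse` returns an object with `success: true` and typed `data`, or `success: false` and an `error`, so a receiver can reject a malformed payload without wrapping the call in try/catch.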

FP fixes:
- agent_property YAML rules now skip when framework filter doesn't match
  (fixes AA-TS-184 MCP rule firing on LangChain agents)
- Non-agent project guard: drop hardening findings when no agents/tools
  detected (Flask app: 5 findings → 2, score 99 → 100)
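
Both FP fixes amount to filtering by scan context. The sketch below applies them to findings after the fact for brevity, whereas the commit describes the framework check as skipping the rule itself; all names and shapes here are hypothetical, and "no agents/tools detected" is read as both counts being zero.

```typescript
type FindingCategory = 'vulnerability' | 'hardening' | 'informational';

interface GuardedFinding {
  ruleId: string;
  category: FindingCategory;
  frameworks?: string[]; // frameworks the originating rule is scoped to
}

interface ScanContext {
  agentCount: number;
  toolCount: number;
  detectedFrameworks: string[];
}

function filterFindings(findings: GuardedFinding[], ctx: ScanContext): GuardedFinding[] {
  // ASSUMPTION: a project is "non-agent" when no agents and no tools were found.
  const isAgentProject = ctx.agentCount > 0 || ctx.toolCount > 0;
  return findings.filter((f) => {
    // Non-agent guard: hardening findings are dropped for non-agent code.
    if (!isAgentProject && f.category === 'hardening') return false;
    // Framework scope: a scoped rule only applies to a matching framework.
    if (f.frameworks && !f.frameworks.some((fw) => ctx.detectedFrameworks.indexOf(fw) !== -1)) {
      return false;
    }
    return true;
  });
}
```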

Architecture fixes:
- AI provider throws explicit error when --ai set but no API key configured
  (was silent console.error, now fails loudly)
- Rule interface: added suppressedBy/requiresControl fields directly,
  removed unsafe `as Rule & Record<string, unknown>` type assertions
- ModelNode: added maxTokens/temperature/topP optional fields

Build:
- Split tsup config into CLI (single bundle + shebang) and SDK/daemon
  (code-split for smaller imports)
- SDK consumers importing { runScan } no longer pull in CLI code

Types:
- Added BaseFinding interface (severity, title, description, category)
- Finding now extends BaseFinding
- Exported BaseFinding and FindingCategory from package index

New: src/utils/logger.ts
- Human-readable colored output in TTY mode
- JSON lines on stderr in non-TTY/CI mode
- Log levels via G0_LOG_LEVEL env var (error/warn/info/debug)
- Replaced console.error in platform/upload.ts with logger.error
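
The logger behaviour described above can be approximated in a few lines. This is a sketch, not the contents of `src/utils/logger.ts`: the default level of `info`, the colour choices, the JSON field names, and writing TTY output to stderr as well are all assumptions.

```typescript
type Level = 'error' | 'warn' | 'info' | 'debug';

// Ordered from least to most verbose.
const LEVELS: Level[] = ['error', 'warn', 'info', 'debug'];

// ASSUMPTION: unknown or missing G0_LOG_LEVEL falls back to 'info'.
function parseLevel(raw: string | undefined): Level {
  return LEVELS.indexOf(raw as Level) !== -1 ? (raw as Level) : 'info';
}

// A message is emitted when its level is at or below the threshold's verbosity.
function shouldLog(level: Level, threshold: Level): boolean {
  return LEVELS.indexOf(level) <= LEVELS.indexOf(threshold);
}

const COLORS: Record<Level, string> = {
  error: '\x1b[31m',
  warn: '\x1b[33m',
  info: '\x1b[36m',
  debug: '\x1b[90m',
};

// Coloured text for terminals, one JSON object per line otherwise.
function formatLine(level: Level, msg: string, isTTY: boolean): string {
  if (isTTY) return `${COLORS[level]}${level}\x1b[0m ${msg}`;
  return JSON.stringify({ level, msg, time: new Date().toISOString() });
}

function log(level: Level, msg: string): void {
  const threshold = parseLevel(process.env.G0_LOG_LEVEL);
  if (!shouldLog(level, threshold)) return;
  process.stderr.write(formatLine(level, msg, Boolean(process.stderr.isTTY)) + '\n');
}
```

Keeping `formatLine` and `shouldLog` pure makes the TTY and CI branches easy to unit test without touching the real stderr stream.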

- Removed guard0 bin entry from main package (conflicts with enterprise pip)
- Removed guard0 bin entry from wrapper package
- Removed cliName() detection — always 'g0'
- guard0 npm package still exists for discoverability (npm install guard0)
  but only installs the g0 command

- AA-TS-021: narrow network access check to within 2000 chars of tool
  definition (was file-wide, caused FPs on non-tool network calls)
- Self-scan: exclude .claude/worktrees/, advisories/, CVE docs from analysis
- Added --strict flag: exit code 2 if any critical finding exists
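
The AA-TS-021 narrowing can be illustrated with a windowed search. The `@tool` marker, the network-call regex, and the choice to look only forward from the definition are assumptions; the commit says only that the check is limited to 2000 characters around the tool definition rather than the whole file.

```typescript
const WINDOW_CHARS = 2000;

// HYPOTHETICAL markers: the real rule's tool and network patterns are not shown.
const TOOL_MARKER = '@tool';
const NETWORK_CALL = /\b(?:requests\.(?:get|post)|urllib\.request|fetch)\b/;

// Flags network access only when it appears near a tool definition.
function toolHasNetworkAccess(source: string): boolean {
  let idx = source.indexOf(TOOL_MARKER);
  while (idx !== -1) {
    // ASSUMPTION: the window extends forward from the tool definition.
    const window = source.slice(idx, idx + WINDOW_CHARS);
    if (NETWORK_CALL.test(window)) return true;
    idx = source.indexOf(TOOL_MARKER, idx + 1);
  }
  return false;
}
```

A network call elsewhere in the file, outside every tool window, no longer triggers the rule under this approach, which is the false-positive case the commit describes.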