Detection parity audit — verify coverage of all known MCP/agent skill attack patterns #122

@JBAhire

Summary

Conduct a systematic audit of g0's detection capabilities against all known MCP server and agent skill attack patterns to ensure comprehensive coverage and identify any gaps.

Motivation

The MCP security scanning space is maturing rapidly, and standardized attack pattern taxonomies are emerging. g0 should verify it detects all known attack types and achieves low false positive rates on legitimate skills/servers.

Known Attack Patterns to Verify

MCP Server Attacks

  1. Prompt injection in tool descriptions
  2. Tool poisoning — malicious behavior hidden in tool implementations
  3. Tool shadowing — overriding legitimate tools with malicious versions
  4. Cross-origin tool confusion — tools that impersonate other servers
  5. Capability inflation — tools claiming broader access than needed
  6. Rug-pull attacks — tool descriptions changing after initial approval
  7. Exfiltration via tool output — embedding sensitive data in responses
  8. Parameter injection — malicious content in tool parameters
  9. Server instruction injection — malicious instructions in server metadata
  10. Toxic flows — chains of tools that create security risks when combined
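As one illustration of what a check for pattern 6 (rug-pull) could look like, here is a minimal sketch that snapshots each tool definition at approval time and reports tools whose definition later differs. This is not g0's implementation. The `ToolDefinition` shape and the `snapshot` and `findChangedTools` helpers are hypothetical names introduced for this example, and a real implementation would probably store a cryptographic digest rather than the full canonical string.

```typescript
// Hypothetical shape of a tool definition as presented to the client.
interface ToolDefinition {
  name: string;
  description: string;
  inputSchema: Record<string, unknown>;
}

// Serialize with sorted object keys so that key reordering alone
// is not reported as a change.
function canonicalize(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map(canonicalize).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const entries = Object.keys(obj)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${canonicalize(obj[key])}`);
    return `{${entries.join(",")}}`;
  }
  return JSON.stringify(value);
}

// Snapshot recorded when the user approves the tool (assumption: the
// full canonical string is stored; a digest would work the same way).
function snapshot(tool: ToolDefinition): string {
  return canonicalize(tool);
}

// Return the names of tools whose current definition no longer matches
// the approved snapshot. Tools that were never approved are ignored here.
function findChangedTools(
  approved: Map<string, string>,
  current: ToolDefinition[],
): string[] {
  return current
    .filter((tool) => {
      const pinned = approved.get(tool.name);
      return pinned !== undefined && pinned !== snapshot(tool);
    })
    .map((tool) => tool.name);
}
```

A synthetic rug-pull fixture for the malicious corpus could then be a pair of definitions for the same tool name: the approved one, and a later one with injected text in `description`.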

Agent Skill Attacks

  1. Malicious skill packages — skills that execute harmful operations
  2. Dependency confusion — skills that load malicious dependencies
  3. Typosquatting — skills with names similar to legitimate ones
  4. Credential harvesting — skills that capture and exfiltrate credentials
  5. Backdoor installation — skills that establish persistent access
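For pattern 3 (typosquatting), one common heuristic is to compare a skill name against a list of trusted names using edit distance. The sketch below uses Levenshtein distance; it is an assumption about how such a rule might work, not a description of an existing g0 rule. The `findLookalikes` helper and the default threshold of 2 are hypothetical.

```typescript
// Classic dynamic-programming Levenshtein distance, keeping only the
// previous row to stay at O(|b|) memory.
function levenshtein(a: string, b: string): number {
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1, // deletion
        curr[j - 1] + 1, // insertion
        prev[j - 1] + cost, // substitution (or match)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Return trusted names that the candidate is suspiciously close to.
// An exact match (distance 0) is not a lookalike and is excluded.
function findLookalikes(
  candidate: string,
  trustedNames: string[],
  maxDistance = 2, // hypothetical threshold; would need tuning on the corpus
): string[] {
  const name = candidate.toLowerCase();
  return trustedNames.filter((trusted) => {
    const distance = levenshtein(name, trusted.toLowerCase());
    return distance > 0 && distance <= maxDistance;
  });
}
```

A fixed threshold tends to over-flag short names, so the legitimate corpus is a good place to measure how many false positives a given `maxDistance` produces.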

Detection Quality Metrics

  • Recall — what percentage of known malicious patterns does g0 detect?
  • Precision and false positive rate — what fraction of g0's findings are true positives, and how often are legitimate skills/servers flagged?
  • Severity accuracy — are findings classified at the right severity level?
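To make the metrics concrete, here is a small sketch of how they could be computed from labeled scan outcomes. With TP, FP, FN, and TN counted per sample, recall is TP / (TP + FN), precision is TP / (TP + FP), and false positive rate is FP / (FP + TN). The `AuditOutcome` shape and the treatment of "flagged" as "at least one finding" are assumptions made for this example.

```typescript
type Severity = "low" | "medium" | "high" | "critical";

// Hypothetical per-sample record produced by the audit run.
interface AuditOutcome {
  malicious: boolean; // ground-truth label from the corpus
  flagged: boolean; // scanner reported at least one finding
  expectedSeverity?: Severity; // set for malicious samples
  reportedSeverity?: Severity; // highest severity reported, if flagged
}

interface AuditMetrics {
  recall: number;
  precision: number;
  falsePositiveRate: number;
  severityAccuracy: number;
}

// Avoid silent division by zero: an empty denominator yields NaN.
function ratio(numerator: number, denominator: number): number {
  return denominator === 0 ? NaN : numerator / denominator;
}

function computeMetrics(outcomes: AuditOutcome[]): AuditMetrics {
  let tp = 0;
  let fp = 0;
  let fn = 0;
  let tn = 0;
  let severityMatches = 0;

  for (const o of outcomes) {
    if (o.malicious && o.flagged) {
      tp++;
      if (o.expectedSeverity !== undefined && o.reportedSeverity === o.expectedSeverity) {
        severityMatches++;
      }
    } else if (o.malicious) {
      fn++;
    } else if (o.flagged) {
      fp++;
    } else {
      tn++;
    }
  }

  return {
    recall: ratio(tp, tp + fn),
    precision: ratio(tp, tp + fp),
    falsePositiveRate: ratio(fp, fp + tn),
    // Severity accuracy is measured over detected malicious samples only.
    severityAccuracy: ratio(severityMatches, tp),
  };
}
```

Note that precision depends on the ratio of malicious to legitimate samples in the corpus, while false positive rate does not, so it is worth reporting both.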

Proposed Work

  1. Build a test corpus of known malicious MCP tools/skills (synthetic, not real malware)
  2. Build a legitimate corpus of 100+ popular, trusted MCP tools/skills
  3. Run g0 scan against both corpora and measure recall, precision, and false positive rate
  4. Identify gaps — attack patterns that g0 misses
  5. Add missing rules for any undetected patterns
  6. Tune severities based on real-world impact
  7. Document coverage matrix showing which rules detect which attack patterns
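Steps 4 and 7 could be driven from the same data. The sketch below builds a pattern-to-rules matrix from per-fixture scan results, lists patterns with no detecting rule, and renders a Markdown table that could seed `docs/detection-coverage.md`. The `FixtureResult` shape and the rule IDs used with it are hypothetical; how g0 actually names rules and reports findings is not assumed here.

```typescript
// Hypothetical record: which rules fired on a given malicious fixture.
interface FixtureResult {
  fixture: string; // fixture file or directory name
  pattern: string; // attack pattern the fixture represents
  ruleIds: string[]; // rule IDs that produced findings on it
}

type CoverageMatrix = Map<string, Set<string>>;

// Start from the full pattern list so undetected patterns appear as empty rows.
function buildCoverageMatrix(
  patterns: string[],
  results: FixtureResult[],
): CoverageMatrix {
  const matrix: CoverageMatrix = new Map();
  for (const pattern of patterns) {
    matrix.set(pattern, new Set<string>());
  }
  for (const result of results) {
    const rules = matrix.get(result.pattern) ?? new Set<string>();
    result.ruleIds.forEach((id) => rules.add(id));
    matrix.set(result.pattern, rules);
  }
  return matrix;
}

// Patterns for which no rule fired on any fixture.
function findGaps(matrix: CoverageMatrix): string[] {
  return Array.from(matrix.entries())
    .filter(([, rules]) => rules.size === 0)
    .map(([pattern]) => pattern);
}

// Render the matrix as a Markdown table, marking uncovered patterns.
function renderMarkdown(matrix: CoverageMatrix): string {
  const rows = Array.from(matrix.entries()).map(([pattern, rules]) => {
    const cell = rules.size > 0 ? Array.from(rules).sort().join(", ") : "GAP";
    return `| ${pattern} | ${cell} |`;
  });
  return ["| Attack pattern | Detecting rules |", "| --- | --- |", ...rows].join("\n");
}
```

Keeping the pattern list as an explicit input means a pattern with no fixtures at all also shows up as a gap, rather than being silently absent from the table.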

Files to Create/Modify

  • tests/fixtures/malicious-mcp/ — synthetic malicious MCP tools
  • tests/fixtures/legitimate-mcp/ — legitimate MCP tool corpus
  • tests/integration/detection-audit.test.ts — automated coverage test
  • New rules in src/rules/builtin/ for any gaps found
  • docs/detection-coverage.md — coverage matrix documentation

Acceptance Criteria

  • Test corpus of 50+ synthetic malicious patterns
  • Test corpus of 100+ legitimate MCP tools
  • Recall ≥ 95% on known malicious patterns
  • False positive rate < 5% on legitimate tools
  • Coverage matrix documented
  • Any identified gaps closed with new rules
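The numeric criteria above lend themselves to an automated gate. As a rough sketch, the check below takes the corpus sizes and measured rates and returns a list of failed criteria, using the thresholds stated in this issue. The `AuditSummary` shape is hypothetical, and the two non-numeric criteria (documented matrix, gaps closed) are not covered.

```typescript
// Hypothetical summary produced at the end of the audit run.
interface AuditSummary {
  maliciousCorpusSize: number;
  legitimateCorpusSize: number;
  recall: number; // fraction in [0, 1]
  falsePositiveRate: number; // fraction in [0, 1]
}

// Thresholds taken from the acceptance criteria in this issue.
const MIN_MALICIOUS_SAMPLES = 50;
const MIN_LEGITIMATE_SAMPLES = 100;
const MIN_RECALL = 0.95;
const MAX_FALSE_POSITIVE_RATE = 0.05; // exclusive upper bound

// Return a human-readable message for each criterion that is not met.
// An empty array means all numeric criteria pass.
function checkAcceptance(summary: AuditSummary): string[] {
  const failures: string[] = [];
  if (summary.maliciousCorpusSize < MIN_MALICIOUS_SAMPLES) {
    failures.push(`malicious corpus has ${summary.maliciousCorpusSize} samples, need ${MIN_MALICIOUS_SAMPLES}+`);
  }
  if (summary.legitimateCorpusSize < MIN_LEGITIMATE_SAMPLES) {
    failures.push(`legitimate corpus has ${summary.legitimateCorpusSize} samples, need ${MIN_LEGITIMATE_SAMPLES}+`);
  }
  // Written as negated comparisons so that NaN also counts as a failure.
  if (!(summary.recall >= MIN_RECALL)) {
    failures.push(`recall ${summary.recall} is below ${MIN_RECALL}`);
  }
  if (!(summary.falsePositiveRate < MAX_FALSE_POSITIVE_RATE)) {
    failures.push(`false positive rate ${summary.falsePositiveRate} is not below ${MAX_FALSE_POSITIVE_RATE}`);
  }
  return failures;
}
```

In an integration test, asserting that `checkAcceptance(...)` returns an empty array would keep the thresholds enforced as rules and fixtures change.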

Labels: security (security hardening, vulnerability detection, threat mitigation), static-analysis (static scanning, rules, heuristics, detection patterns)
