Detection parity audit — verify coverage of all known MCP/agent skill attack patterns

## Summary

Conduct a systematic audit of g0's detection capabilities against all known MCP server and agent skill attack patterns to ensure comprehensive coverage and identify any gaps.

## Motivation

The MCP security scanning space is maturing rapidly, and standardized attack pattern taxonomies are emerging. g0 should verify it detects all known attack types and achieves low false positive rates on legitimate skills/servers.

## Known Attack Patterns to Verify

### MCP Server Attacks
1. **Prompt injection** in tool descriptions
2. **Tool poisoning** — malicious behavior hidden in tool implementations
3. **Tool shadowing** — overriding legitimate tools with malicious versions
4. **Cross-origin tool confusion** — tools that impersonate other servers
5. **Capability inflation** — tools claiming broader access than needed
6. **Rug-pull attacks** — tool descriptions changing after initial approval
7. **Exfiltration via tool output** — embedding sensitive data in responses
8. **Parameter injection** — malicious content in tool parameters
9. **Server instruction injection** — malicious instructions in server metadata
10. **Toxic flows** — chains of tools that create security risks when combined

### Agent Skill Attacks
11. **Malicious skill packages** — skills that execute harmful operations
12. **Dependency confusion** — skills that load malicious dependencies
13. **Typosquatting** — skills with names similar to legitimate ones
14. **Credential harvesting** — skills that capture and exfiltrate credentials
15. **Backdoor installation** — skills that establish persistent access

### Detection Quality Metrics
- **Recall** — what percentage of known malicious patterns does g0 detect?
- **Precision** — what's the false positive rate on legitimate skills/servers?
- **Severity accuracy** — are findings classified at the right severity level?

## Proposed Work

1. **Build a test corpus** of known malicious MCP tools/skills (synthetic, not real malware)
2. **Build a legitimate corpus** of 100+ popular, trusted MCP tools/skills
3. **Run g0 scan against both** and measure recall/precision
4. **Identify gaps** — attack patterns that g0 misses
5. **Add missing rules** for any undetected patterns
6. **Tune severities** based on real-world impact
7. **Document coverage matrix** showing which rules detect which attack patterns

## Files to Create/Modify
- `tests/fixtures/malicious-mcp/` — synthetic malicious MCP tools
- `tests/fixtures/legitimate-mcp/` — legitimate MCP tool corpus
- `tests/integration/detection-audit.test.ts` — automated coverage test
- New rules in `src/rules/builtin/` for any gaps found
- `docs/detection-coverage.md` — coverage matrix documentation

## Acceptance Criteria
- [ ] Test corpus of 50+ synthetic malicious patterns
- [ ] Test corpus of 100+ legitimate MCP tools
- [ ] Recall ≥ 95% on known malicious patterns
- [ ] False positive rate < 5% on legitimate tools
- [ ] Coverage matrix documented
- [ ] Any identified gaps closed with new rules

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Detection parity audit — verify coverage of all known MCP/agent skill attack patterns #122

Summary

Motivation

Known Attack Patterns to Verify

MCP Server Attacks

Agent Skill Attacks

Detection Quality Metrics

Proposed Work

Files to Create/Modify

Acceptance Criteria

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Detection parity audit — verify coverage of all known MCP/agent skill attack patterns #122

Description

Summary

Motivation

Known Attack Patterns to Verify

MCP Server Attacks

Agent Skill Attacks

Detection Quality Metrics

Proposed Work

Files to Create/Modify

Acceptance Criteria

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions