SEMMAP generates a compressed architectural map of your codebase. An AI that reads the map before working on a task can identify and request the right small set of source files instead of wandering, guessing, and burning tokens on the wrong files.
The map looks like documentation. That is intentional but not the point. The point is retrieval: an AI with the map should converge on the correct 3-8 files for any task in fewer round trips than without it.
AI coding tools explore unfamiliar codebases the wrong way.
Without orientation they tend to:
- read too much and still miss what matters
- patch the wrong file confidently
- ask for more context without narrowing down
- act on a weak mental model they can't identify as weak
A good developer doesn't start by reading random source files. They want the shape first - where the app starts, what the major boundaries are, which files are load-bearing, what to look at next. Then they read only what the task actually requires.
AI tools need the same orientation. SEMMAP produces it.
```
cargo install semmap
semmap generate
```

Commit the generated SEMMAP.md. Wire it into your workflow however makes sense - an agent.md instructions file, a system prompt, a context file your IDE plugin reads, or a manual paste. The map is plain markdown and works anywhere.
The workflow:
- Read the map - understand layers, hotspots, and boundaries
- Trace the likely path - follow execution from the relevant entry point
- Request only what the task needs - read that small file set deeply
- Edit with grounded context
SEMMAP analyzes the repo statically and emits:
Layers - architectural role of each file
- Layer 0: Config and build artifacts
- Layer 1: Domain logic and core engine
- Layer 2: Adapters, infra, and integration
- Layer 3: Entrypoints and app shell
- Layer 4: Tests
Hotspots - files with high fan-in that should be requested early for any task touching their domain. Hotspot detection uses weighted fan-in: call edges count 2x, import edges 1x.
Risk scores - composite metric combining weighted fan-in, cognitive complexity, error handling density, and concurrency primitives. Changes to high-risk files should come with smaller diffs and stronger tests.
Descriptions - what each file does, grounded in imports, exports, string literals, and graph position - not just the filename
Exports - the primary symbols a file exposes, ranked by likely importance
Dependency graph - bidirectional import and call edges grouped by architectural role, collapsed where homogeneous
Semantic summaries - concise behavior descriptions composed from AST analysis: "async side-effecting adapter with HTTP handler surface", "pure computation over domain types", "error-swallowing orchestration module"
Behavioral, surface, and quality tags - coupling type, runtime behavior, API surfaces, and code quality signals:
- Behavior: `[BEHAVIOR:owns-state]`, `[BEHAVIOR:async]`, `[BEHAVIOR:panics-on-error]`
- Surface: `[SURFACE:filesystem]`, `[SURFACE:http-handler]`, `[SURFACE:database]`
- Coupling: `[COUPLING:pure]`, `[COUPLING:mixed]`, `[COUPLING:ui-coupled]`
- Quality: `[QUALITY:undocumented]`, `[QUALITY:complex-flow]`, `[QUALITY:error-boundary]`
Topology tags - graph-derived roles for high fan-in files:
- `[GLOBAL-UTIL]` - imported from 3+ distinct domains
- `[DOMAIN-CONTRACT]` - shared contract imported mostly by one subsystem
Example:
```
## Layer 1 - Domain (Engine)

`src/compiler.rs`
Compiles timeline entries into optimized schedule blocks. [COUPLING:pure]
Exports: Compiler, compile_schedule
Semantic: pure computation

`src/types.rs` [TYPE] [HOTSPOT] [DOMAIN-CONTRACT]
Core data structures shared across the pipeline. [QUALITY:undocumented]
Exports: Schedule, TimeBlock, Constraint
```
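
To make the hotspot weighting and topology tags concrete, here is a minimal Python sketch. It is not SEMMAP's implementation. The 2x/1x edge weights and the "3+ distinct domains" rule come from the descriptions above; the edge tuple format, the `domain_of` rule (directory directly under `src/`), the "more than half" threshold for `[DOMAIN-CONTRACT]`, and the example paths are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Weights stated above: call edges count 2x, import edges 1x.
EDGE_WEIGHTS = {"call": 2, "import": 1}


def weighted_fan_in(edges):
    """Sum incoming edge weights per target file.

    `edges` is a list of (source, target, kind) tuples (assumed format).
    """
    scores = defaultdict(int)
    for _source, target, kind in edges:
        scores[target] += EDGE_WEIGHTS[kind]
    return dict(scores)


def domain_of(path):
    """Hypothetical domain rule: the directory directly under src/."""
    parts = path.split("/")
    return parts[1] if len(parts) > 2 else "root"


def topology_tag(target, edges):
    """Tag a file by the spread of domains that import it."""
    importers = [domain_of(src) for src, dst, kind in edges
                 if dst == target and kind == "import"]
    if not importers:
        return None
    counts = Counter(importers)
    if len(counts) >= 3:
        return "GLOBAL-UTIL"
    # Assumption: "mostly by one subsystem" means more than half of importers.
    top_share = counts.most_common(1)[0][1] / len(importers)
    return "DOMAIN-CONTRACT" if top_share > 0.5 else None


# Hypothetical edges, not taken from a real repo.
EXAMPLE_EDGES = [
    ("src/engine/compiler.rs", "src/types.rs", "import"),
    ("src/engine/compiler.rs", "src/types.rs", "call"),
    ("src/engine/planner.rs", "src/types.rs", "import"),
    ("src/cli/main.rs", "src/util.rs", "import"),
    ("src/engine/planner.rs", "src/util.rs", "import"),
    ("src/http/server.rs", "src/util.rs", "import"),
]

print(weighted_fan_in(EXAMPLE_EDGES))
print(topology_tag("src/types.rs", EXAMPLE_EDGES))
print(topology_tag("src/util.rs", EXAMPLE_EDGES))
```

In this toy graph `src/types.rs` scores 4 (two imports plus one call) and is imported only from the `engine` domain, so it would be tagged as a domain contract, while `src/util.rs` is imported from three domains and would be tagged as a global utility.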
Read the map before touching any source. Identify:
- which layer the task lives in
- which hotspots are relevant
- what the dep graph says about blast radius
- which files have quality warnings (complex flow, undocumented APIs, error boundaries)
```
semmap trace src/main.rs
```

```
Trace from src/main.rs
Layer 1  src/main.rs - entry point
Layer 2  src/deps.rs - imported by main.rs
         src/parser.rs - imported by main.rs
Layer 3  src/types.rs - imported by deps.rs, parser.rs
```
Trace prioritizes call edges over import edges and weights high-risk files higher, so the execution spine reflects runtime influence, not just static imports.
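
One way to picture that ordering is a breadth-first walk that groups files by distance from the entry point and sorts each level so call-reached and higher-risk files come first. The sketch below is an illustration under assumptions, not SEMMAP's algorithm: the adjacency format, the `risk` mapping, and the tie-breaking by name are hypothetical, and the `parser.rs` edge is made a call edge here only to show the prioritization.

```python
def trace(entry, edges, risk=None):
    """Group files reachable from `entry` by depth.

    `edges` maps a file to a list of (target, kind) pairs (assumed format).
    Within a level, files reached by a call edge sort first, then files
    with a higher risk score, then alphabetical order.
    """
    risk = risk or {}
    levels = [[entry]]
    seen = {entry}
    while True:
        candidates = {}  # target -> best rank (0 = call, 1 = import)
        for src in levels[-1]:
            for target, kind in edges.get(src, []):
                if target in seen:
                    continue
                rank = 0 if kind == "call" else 1
                candidates[target] = min(rank, candidates.get(target, 1))
        if not candidates:
            return levels
        ordered = sorted(
            candidates,
            key=lambda f: (candidates[f], -risk.get(f, 0), f),
        )
        seen.update(ordered)
        levels.append(ordered)


# Hypothetical graph shaped like the trace output above.
EXAMPLE_GRAPH = {
    "src/main.rs": [("src/deps.rs", "import"), ("src/parser.rs", "call")],
    "src/deps.rs": [("src/types.rs", "import")],
    "src/parser.rs": [("src/types.rs", "import")],
}

for depth, files in enumerate(trace("src/main.rs", EXAMPLE_GRAPH), start=1):
    print(f"Layer {depth}", files)
```

Because `src/parser.rs` is reached through a call edge in this example, it is listed ahead of `src/deps.rs` at the second level even though it sorts later alphabetically.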
Use the map, hotspot tags, and trace output to identify the smallest file set that covers the task. Request those files. Read them deeply. Edit with context.
The escape hatch: if the map doesn't cover something the task needs, request the missing file and continue. The map narrows the search - it doesn't have to be perfect to save round trips.
```
semmap generate --chat
semmap generate --chat-output /tmp/semmap-chat.md
```

Copies a ready-to-paste bundle to your clipboard by default. In headless or sandboxed sessions, use --chat-output or --chat-stdout to keep the bundle accessible without a working desktop clipboard.
SEMMAP resolves imports, extracts exports, infers architectural role, and produces descriptions across:
- Rust
- TypeScript & JavaScript (ES Modules and CommonJS)
- Go
- Python
- C and C++
- Swift
- HTML (script/link/img tags, inline ES module imports)
Semantic analysis (call graphs, complexity, error handling, concurrency, documentation coverage) works across all supported languages.
| Command | Description |
|---|---|
| `semmap generate` | Generate SEMMAP.md |
| `semmap generate --purpose "..."` | Generate with explicit purpose string |
| `semmap generate --chat` | Generate a chat-ready bundle; falls back to a sidecar file if clipboard access fails |
| `semmap generate --chat-output <path>` | Write the chat-ready bundle directly to a file |
| `semmap trace <file>` | Layer-annotated dependency trace from an entry point |
| `semmap cat <files...>` | Copy specific files to clipboard, or use --stdout / --output |
| `semmap override cat <file>` | Print raw file content to stdout and audit non-manifest reads in .semmap/session-audit.jsonl |
| `semmap inspect <file>` | Print persisted file analysis from .semmap/files.json and .semmap/quality.json |
| `semmap preview <files...>` | Generate AST previews, with --stdout / --output for non-clipboard delivery |
| `semmap analyze <file>` | Print intra-file architecture analysis and optionally skip clipboard with --no-clipboard |
| `semmap style` | Render persisted style samples from .semmap/style.json, with --stdout / --output for agent-safe delivery |
| `semmap deps` | Print structured dependency graph |
| `semmap deps --check` | Check for architectural layer violations |
| `semmap validate` | Validate map against repo |
```
semmap deps --check
```

Detects layer violations: a file in an inner layer importing from an outer layer. Useful in CI to catch architectural drift before it becomes load-bearing.
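
The rule itself is small enough to sketch. The Python below is a hedged illustration of the idea, not the code behind `semmap deps --check`: it assumes a lower layer number means "inner", that layer assignments arrive as a plain dict, and that imports are (importer, imported) pairs. The file names are hypothetical.

```python
def layer_violations(layers, imports):
    """Return (importer, imported) pairs where an inner layer imports an outer one.

    Assumption: a lower layer number means "inner". Files without a known
    layer are skipped rather than reported.
    """
    violations = []
    for src, dst in imports:
        if src in layers and dst in layers and layers[src] < layers[dst]:
            violations.append((src, dst))
    return violations


# Hypothetical layer assignments and import edges.
EXAMPLE_LAYERS = {
    "src/types.rs": 1,
    "src/compiler.rs": 1,
    "src/http.rs": 2,
    "src/main.rs": 3,
}
EXAMPLE_IMPORTS = [
    ("src/main.rs", "src/compiler.rs"),   # outer -> inner: fine
    ("src/compiler.rs", "src/types.rs"),  # same layer: fine
    ("src/compiler.rs", "src/http.rs"),   # inner -> outer: violation
]

for src, dst in layer_violations(EXAMPLE_LAYERS, EXAMPLE_IMPORTS):
    print(f"violation: {src} (layer {EXAMPLE_LAYERS[src]}) imports "
          f"{dst} (layer {EXAMPLE_LAYERS[dst]})")
```

Here only the domain file importing the adapter is flagged; the entrypoint importing domain code points inward and passes.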
Most AI coding mistakes are retrieval mistakes - the wrong files, read in the wrong order, producing a confident but wrong mental model.
SEMMAP treats this as a compression problem. A codebase of 200 files contains maybe 8 files that matter for any given task. The map's job is to make those 8 files identifiable without reading all 200.
The map is not documentation. Quality descriptions are necessary for the map to work, but the goal is not readable prose - it is discriminability. Two files with identical descriptions are indistinguishable when deciding what to request next. Every improvement to description quality is an improvement to retrieval accuracy.
MIT