SEMMAP - Minimal working set, first try

SEMMAP generates a compressed architectural map of your codebase. An AI that reads the map before working on a task can identify and request the right small set of source files instead of wandering, guessing, and burning tokens on the wrong files.

The map looks like documentation. That is intentional but not the point. The point is retrieval: an AI with the map should converge on the correct 3-8 files for any task in fewer round trips than without it.

The problem

AI coding tools explore unfamiliar codebases the wrong way.

Without orientation they tend to:

read too much and still miss what matters
patch the wrong file confidently
ask for more context without narrowing down
act on a weak mental model they can't identify as weak

A good developer doesn't start by reading random source files. They want the shape first - where the app starts, what the major boundaries are, which files are load-bearing, what to look at next. Then they read only what the task actually requires.

AI tools need the same orientation. SEMMAP produces it.

Quick start

cargo install semmap
semmap generate

Commit the generated SEMMAP.md. Wire it into your workflow however makes sense - an agent.md instructions file, a system prompt, a context file your IDE plugin reads, or a manual paste. The map is plain markdown and works anywhere.

The workflow:

Read the map - understand layers, hotspots, and boundaries
Trace the likely path - follow execution from the relevant entry point
Request only what the task needs - read that small file set deeply
Edit with grounded context

What the map contains

SEMMAP analyzes the repo statically and emits:

Layers - architectural role of each file

Layer 0  Config and build artifacts
Layer 1  Domain logic and core engine
Layer 2  Adapters, infra, and integration
Layer 3  Entrypoints and app shell
Layer 4  Tests

Hotspots - files with high fan-in that should be requested early for any task touching their domain. Hotspot detection uses weighted fan-in: call edges count 2x, import edges 1x.
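The weighting above can be illustrated with a minimal sketch. This is not SEMMAP's implementation: the `Edge` type, the `hotspots` helper, the self-edge rule, and the threshold parameter are all hypothetical. Only the 2x/1x weights come from the description above.

```rust
use std::collections::HashMap;

/// Kind of dependency edge between two files.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum EdgeKind {
    Import,
    Call,
}

/// A directed edge: `from` depends on `to`. (Hypothetical type for this sketch.)
struct Edge<'a> {
    from: &'a str,
    to: &'a str,
    kind: EdgeKind,
}

/// Weighted fan-in per target file: call edges count 2, import edges count 1.
fn weighted_fan_in<'a>(edges: &[Edge<'a>]) -> HashMap<&'a str, u32> {
    let mut scores: HashMap<&'a str, u32> = HashMap::new();
    for edge in edges {
        // Assumption: a file does not contribute to its own fan-in.
        if edge.from == edge.to {
            continue;
        }
        let weight = match edge.kind {
            EdgeKind::Call => 2,
            EdgeKind::Import => 1,
        };
        *scores.entry(edge.to).or_insert(0) += weight;
    }
    scores
}

/// Files whose weighted fan-in meets a caller-chosen threshold, highest first.
/// The threshold is an assumption; the real cutoff is not documented here.
fn hotspots<'a>(edges: &[Edge<'a>], threshold: u32) -> Vec<(&'a str, u32)> {
    let mut ranked: Vec<(&'a str, u32)> = weighted_fan_in(edges)
        .into_iter()
        .filter(|&(_, score)| score >= threshold)
        .collect();
    // Sort by score descending, then by path so the order is deterministic.
    ranked.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(b.0)));
    ranked
}
```

Under this weighting, a file reached by one call edge and one import edge scores 3 and outranks a file reached by two import edges.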

Risk scores - composite metric combining weighted fan-in, cognitive complexity, error handling density, and concurrency primitives. Treat a high score as a signal to keep diffs small and tests strong when editing that file.

Descriptions - what each file does, grounded in imports, exports, string literals, and graph position - not just the filename

Exports - the primary symbols a file exposes, ranked by likely importance

Dependency graph - bidirectional import and call edges grouped by architectural role, collapsed where homogeneous

Semantic summaries - concise behavior descriptions composed from AST analysis: "async side-effecting adapter with HTTP handler surface", "pure computation over domain types", "error-swallowing orchestration module"

Behavioral, surface, and quality tags - coupling type, runtime behavior, API surfaces, and code quality signals:

Behavior: [BEHAVIOR:owns-state], [BEHAVIOR:async], [BEHAVIOR:panics-on-error]
Surface: [SURFACE:filesystem], [SURFACE:http-handler], [SURFACE:database]
Coupling: [COUPLING:pure], [COUPLING:mixed], [COUPLING:ui-coupled]
Quality: [QUALITY:undocumented], [QUALITY:complex-flow], [QUALITY:error-boundary]

Topology tags - graph-derived roles for high fan-in files:

[GLOBAL-UTIL] - imported from 3+ distinct domains
[DOMAIN-CONTRACT] - shared contract imported mostly by one subsystem
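A rough sketch of how the two topology tags could be told apart, given the domain of each file that imports a hotspot. Two things here are assumptions rather than SEMMAP behavior: "mostly" is read as a strict majority of importers, and the 3+ domains rule is checked first when both could apply. The `TopologyTag` and `classify` names are hypothetical.

```rust
use std::collections::HashMap;

/// Topology tags from the list above. (Hypothetical enum for this sketch.)
#[derive(Debug, PartialEq, Eq)]
enum TopologyTag {
    GlobalUtil,
    DomainContract,
}

/// Classify a high fan-in file from the domains of the files that import it.
/// `importer_domains` holds one entry per importing file.
fn classify(importer_domains: &[&str]) -> Option<TopologyTag> {
    if importer_domains.is_empty() {
        return None;
    }

    // Count importers per domain.
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for domain in importer_domains {
        *counts.entry(*domain).or_insert(0) += 1;
    }

    // Rule from the text: imported from 3+ distinct domains.
    if counts.len() >= 3 {
        return Some(TopologyTag::GlobalUtil);
    }

    // Assumption: "mostly" means a strict majority of importers share one domain.
    let largest = counts.values().copied().max().unwrap_or(0);
    if largest * 2 > importer_domains.len() {
        return Some(TopologyTag::DomainContract);
    }

    None
}
```

A file imported evenly by two domains gets no tag in this sketch, since it is neither global nor dominated by one subsystem.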

Example:

## Layer 1 - Domain (Engine)

`src/compiler.rs`
Compiles timeline entries into optimized schedule blocks. [COUPLING:pure]
Exports: Compiler, compile_schedule
Semantic: pure computation

`src/types.rs` [TYPE] [HOTSPOT] [DOMAIN-CONTRACT]
Core data structures shared across the pipeline. [QUALITY:undocumented]
Exports: Schedule, TimeBlock, Constraint
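For tooling that consumes the map, the bracketed tags in the example are easy to pull out of a line. The sketch below is one way to do it, assuming tags always take the form `[NAME]` or `[CATEGORY:value]` as shown above. The `Tag` struct and `parse_tags` function are hypothetical names, not part of SEMMAP.

```rust
/// A tag as it appears in the map, e.g. `[COUPLING:pure]` or `[HOTSPOT]`.
#[derive(Debug, PartialEq, Eq)]
struct Tag<'a> {
    /// Category before the colon, if any (`COUPLING`, `QUALITY`, ...).
    category: Option<&'a str>,
    /// Value after the colon, or the whole tag when there is no colon.
    value: &'a str,
}

/// Extract every `[...]` tag from one line of map text, in order.
fn parse_tags(line: &str) -> Vec<Tag<'_>> {
    let mut tags = Vec::new();
    let mut rest = line;

    while let Some(open) = rest.find('[') {
        let after_open = &rest[open + 1..];
        let close = match after_open.find(']') {
            Some(index) => index,
            None => break, // unterminated bracket: stop scanning
        };
        let body = &after_open[..close];

        // Split `CATEGORY:value`; tags without a colon have no category.
        let tag = match body.split_once(':') {
            Some((category, value)) => Tag { category: Some(category), value },
            None => Tag { category: None, value: body },
        };
        tags.push(tag);

        // Continue after the closing bracket.
        rest = &after_open[close + 1..];
    }

    tags
}
```

Filtering the result by `category` gives, for instance, every file line carrying a `QUALITY` warning.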

Using the map

Orient first

Read the map before touching any source. Identify:

which layer the task lives in
which hotspots are relevant
what the dep graph says about blast radius
which files have quality warnings (complex flow, undocumented APIs, error boundaries)

Trace the execution path

semmap trace src/main.rs

Trace from src/main.rs

Layer 1  src/main.rs - entry point
Layer 2  src/deps.rs - imported by main.rs
         src/parser.rs - imported by main.rs
Layer 3  src/types.rs - imported by deps.rs, parser.rs

Trace prioritizes call edges over import edges and weights high-risk files higher, so the execution spine reflects runtime influence, not just static imports.
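The grouping in the sample output can be approximated by a breadth-first walk from the entry point. The sketch below covers only that part: it treats every edge as a plain import and ignores the call-edge priority and risk weighting described above, so it is a simplification and not the `semmap trace` algorithm. `trace_levels` is a hypothetical name.

```rust
use std::collections::{BTreeMap, HashSet, VecDeque};

/// Group files reachable from `entry` by breadth-first depth.
/// Index 0 of the result holds the entry point itself.
/// `deps` maps a file to the files it imports.
fn trace_levels<'a>(
    deps: &BTreeMap<&'a str, Vec<&'a str>>,
    entry: &'a str,
) -> Vec<Vec<&'a str>> {
    let mut levels: Vec<Vec<&'a str>> = Vec::new();
    let mut seen: HashSet<&'a str> = HashSet::new();
    let mut queue: VecDeque<(&'a str, usize)> = VecDeque::new();

    seen.insert(entry);
    queue.push_back((entry, 0));

    while let Some((file, depth)) = queue.pop_front() {
        if levels.len() <= depth {
            levels.push(Vec::new());
        }
        levels[depth].push(file);

        // Visit each dependency once, at the shallowest depth it is reached.
        for &dep in deps.get(file).into_iter().flatten() {
            if seen.insert(dep) {
                queue.push_back((dep, depth + 1));
            }
        }
    }

    levels
}
```

Fed the import edges from the sample (main.rs imports deps.rs and parser.rs, both of which import types.rs), this yields three groups in the same order as the trace output.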

Request a minimal working set

Use the map, hotspot tags, and trace output to identify the smallest file set that covers the task. Request those files. Read them deeply. Edit with context.

The escape hatch: if the map doesn't cover something the task needs, request the missing file and continue. The map narrows the search - it doesn't have to be perfect to save round trips.

One-shot context assembly

semmap generate --chat
semmap generate --chat-output /tmp/semmap-chat.md

By default, semmap generate --chat copies a ready-to-paste bundle to your clipboard. In headless or sandboxed sessions, use --chat-output or --chat-stdout to keep the bundle accessible without a working desktop clipboard.

Supported languages

SEMMAP resolves imports, extracts exports, infers architectural role, and produces descriptions across:

Rust
TypeScript & JavaScript (ES Modules and CommonJS)
Go
Python
C and C++
Swift
HTML (script/link/img tags, inline ES module imports)

Semantic analysis (call graphs, complexity, error handling, concurrency, documentation coverage) works across all supported languages.

Commands

Command	Description
`semmap generate`	Generate `SEMMAP.md`
`semmap generate --purpose "..."`	Generate with explicit purpose string
`semmap generate --chat`	Generate a chat-ready bundle; falls back to a sidecar file if clipboard access fails
`semmap generate --chat-output <path>`	Write the chat-ready bundle directly to a file
`semmap trace <file>`	Layer-annotated dependency trace from an entry point
`semmap cat <files...>`	Copy specific files to clipboard, or use `--stdout` / `--output`
`semmap override cat <file>`	Print raw file content to stdout and audit non-manifest reads in `.semmap/session-audit.jsonl`
`semmap inspect <file>`	Print persisted file analysis from `.semmap/files.json` and `.semmap/quality.json`
`semmap preview <files...>`	Generate AST previews, with `--stdout` / `--output` for non-clipboard delivery
`semmap analyze <file>`	Print intra-file architecture analysis and optionally skip clipboard with `--no-clipboard`
`semmap style`	Render persisted style samples from `.semmap/style.json`, with `--stdout` / `--output` for agent-safe delivery
`semmap deps`	Print structured dependency graph
`semmap deps --check`	Check for architectural layer violations
`semmap validate`	Validate map against repo

Architecture checks

semmap deps --check

Detects layer violations: a file in an inner layer importing from an outer layer. Useful in CI to catch architectural drift before it becomes load-bearing.
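A minimal sketch of the rule, assuming "inner" means a lower layer number in the Layer 0-4 scheme listed earlier, so a violation is any import whose source has a lower layer number than its target. The `Violation` struct, the `layer_violations` function, and the decision to skip files without a layer are hypothetical; the actual check may differ.

```rust
use std::collections::HashMap;

/// An import edge that points from an inner layer to an outer one.
#[derive(Debug, PartialEq, Eq)]
struct Violation<'a> {
    from: &'a str,
    to: &'a str,
    from_layer: u8,
    to_layer: u8,
}

/// Report every import where the importing file sits in a lower-numbered
/// (inner) layer than the file it imports.
fn layer_violations<'a>(
    layers: &HashMap<&'a str, u8>,
    imports: &[(&'a str, &'a str)],
) -> Vec<Violation<'a>> {
    let mut found = Vec::new();
    for &(from, to) in imports {
        // Assumption: edges touching files without a layer are skipped.
        if let (Some(&from_layer), Some(&to_layer)) = (layers.get(from), layers.get(to)) {
            if from_layer < to_layer {
                found.push(Violation { from, to, from_layer, to_layer });
            }
        }
    }
    found
}
```

With a Layer 1 domain file importing a Layer 2 adapter, the function returns one violation; the reverse direction, an entrypoint importing domain code, passes.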

Philosophy

Most AI coding mistakes are retrieval mistakes - the wrong files, read in the wrong order, producing a confident but wrong mental model.

SEMMAP treats this as a compression problem. A codebase of 200 files contains maybe 8 files that matter for any given task. The map's job is to make those 8 files identifiable without reading all 200.

The map is not documentation. Quality descriptions are necessary for the map to work, but the goal is not readable prose - it is discriminability. Two files with identical descriptions are indistinguishable when deciding what to request next. Every improvement to description quality is an improvement to retrieval accuracy.

License

MIT
