🇰🇷 To read this in Korean, see README.ko.md · 🇬🇧 English below.
Hand your AI agent the exact slice of code it needs: by coordinate, not by guessing. Coordinates only, kept precise; meaning is judged fresh by the LLM from the raw code, every time.
Your agent needs to read createMessage. So it greps, and gets 40 hits across
providers. It opens whole files to find the right one, burning tokens on code it won't
use. Next turn it "remembers" the function was at line 120, but your last edit pushed it
to 138, so it reads the wrong lines and never notices. 😱
With code-map riding along, the agent just calls:
read({ refs: ["anthropic.ts#createMessage", "openai.ts#createMessage"] })
and gets both exact slices in one call, still correct even after the file moved.
grep still does the finding (it's great at that). code-map does the reading:
small, exact, drift-proof. That's the whole idea.
| 🎯 Never silently wrong | After heavy edits with no re-index, read re-anchors on the signature line: 0 silently-wrong bytes (a naive "line number" cache is ~100% wrong). It returns the right code or tells you it can't, never the wrong bytes. |
| ⚡ Fewer tokens, fewer steps | Wired into a coding agent (codex, 150-task pass@30): -19% tokens, -67% shell commands, same success rate. On known-ref reads the cut is much larger: grok composer-2.5-fast, 30 passes: -53% to -60% tokens, -71% to -78% retrieval payload; codex with the routing skill: -34% to -54%. |
| Routing is the lever | Agents won't reach for read on their own (~17%). The bundled plugin/skill makes them: it flips discovery from a loss to -31% by killing the double-call, and turns vague usage (erratic, +61% worse on one task) into a steady win: 30/30 pass. |
| 🧩 Tiny & drop-in | Node + one dependency (oxc-parser), no build step. TS/JS and Python. MCP server + a one-line skill; install for Claude, Codex, grok, or Antigravity. |
Honest about the edges (this repo's whole point): code-map does not beat
grep at searching; it ties, so keep grepping. And it's not a universal token-saver: the win is large for reading known symbols with routing, ~0 on an already-lean read task, and a loss on raw discovery unless the skill routes it. Every number, every retraction, the model/metric caveats, and a one-command verifier live in code-map-bench.
TL;DR: grep finds, read reads.
# 1. install (until npm publish, straight from GitHub)
npm install -g github:annyeong844/map # gives you `map` + `map-mcp`
# 2. index the repo you want your agent to read
cd /path/to/your-repo && map index --root . # writes ./.map-index.json
# 3. wire it into your agent (examples)
codex mcp add code-map -- map-mcp # Codex
claude mcp add code-map --scope user -- map-mcp # Claude Code
That's it: your agent now has one tool, read. For the -19% / -67% efficiency win on
Codex you also tell it when to use code-map (one line); see Wiring it for real below.
The MCP server auto-detects the index (walks up for .map-index.json) and auto-reloads
when you re-index, with no reconnect needed.
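The "walks up for .map-index.json" behaviour can be pictured roughly as below. This is a minimal sketch, not code-map's actual implementation: `findIndexFile`, `parentDir`, and the injected `exists` predicate are hypothetical names, and paths are assumed to be absolute POSIX-style strings.

```typescript
// Sketch only: hypothetical helper names, absolute POSIX-style paths assumed.
const INDEX_NAME = ".map-index.json";

// Return the parent of a directory path, stopping at the filesystem root.
function parentDir(dir: string): string {
  const cut = dir.lastIndexOf("/");
  return cut <= 0 ? "/" : dir.slice(0, cut);
}

// Walk from startDir toward the root and return the first index file found.
// `exists` is injected so the sketch stays independent of any filesystem API.
function findIndexFile(
  startDir: string,
  exists: (path: string) => boolean,
): string | null {
  // Drop a trailing slash so candidate paths are built consistently.
  let dir =
    startDir.length > 1 && startDir.endsWith("/")
      ? startDir.slice(0, -1)
      : startDir;
  for (;;) {
    const candidate = dir === "/" ? `/${INDEX_NAME}` : `${dir}/${INDEX_NAME}`;
    if (exists(candidate)) return candidate;
    if (dir === "/") return null; // reached the root without finding an index
    dir = parentDir(dir);
  }
}

// Usage: a server started deep inside the repo still finds the repo-root index.
const files = new Set(["/work/repo/.map-index.json"]);
const found = findIndexFile("/work/repo/src/core", (p) => files.has(p));
console.log(found);
```

When a client starts servers outside the repo, the walk finds nothing, which is the case the MAP_INDEX setting covers.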
Use it if you:
- run an AI coding agent (Codex, Claude Code, …) that reads a lot of code
- have a repo big enough that grep returns noise and reads pull whole files
- work in TypeScript / JavaScript or Python
- want reads that stay correct as the code changes under the agent
Skip it if you:
- want a better search: grep/ripgrep already ties it; code-map is for reading
- have a 1-2 file project: the read savings won't show up
- want "where is auth handled?" concept search: that's embeddings (a measured non-goal here)
map read "alias-map.ts#buildAliasMap" # path-scoped name → exact slice
map read buildAliasMap # bare name (errors if ambiguous)
map read withRetry --snippet "req.copy()" # char range *inside* the symbol
map read --refs "getModel,createMessage,withRetry" # batch: many symbols, one call
map stats # index overview
Add --json for machine output. Search with your own grep, then feed the file:line or
symbol name you find to read. As an MCP tool it's the same read (single ref, a refs
array for batch, optional snippet).
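To make the two ref shapes concrete, here is a small sketch of how a `path#name` ref and a comma-separated batch could be split apart. `ParsedRef`, `parseRef`, and `parseRefList` are hypothetical names used for illustration; the real CLI's parsing rules may differ.

```typescript
// Sketch only: hypothetical types and helpers, not code-map's real parser.
interface ParsedRef {
  path: string | null; // null means a bare name with no path scope
  name: string;
}

// Split "alias-map.ts#buildAliasMap" into its path scope and symbol name.
function parseRef(ref: string): ParsedRef {
  const trimmed = ref.trim();
  const hash = trimmed.lastIndexOf("#");
  if (hash === -1) return { path: null, name: trimmed };
  return { path: trimmed.slice(0, hash), name: trimmed.slice(hash + 1) };
}

// Split a --refs style comma-separated string into individual refs.
function parseRefList(csv: string): ParsedRef[] {
  return csv
    .split(",")
    .map((part) => part.trim())
    .filter((part) => part.length > 0)
    .map(parseRef);
}

console.log(parseRef("alias-map.ts#buildAliasMap"));
console.log(parseRefList("getModel,createMessage,withRetry").length);
```

A bare name carries no path, which is why it can be ambiguous when several files define the same symbol.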
🧪 The honest scorecard: what we kept, and what the data made us cut
code-map started broad (locate, grep, graph, hotspots, semantic search) and was
benchmarked honestly against grep + a strong agent (Sonnet/Opus, headless, on real
repos: cline, django, requests). The measurements ate most of it; the surface was
cut to match. Keeping only what beat the baseline is the point.
| Capability | Measured vs grep + strong agent | Verdict |
|---|---|---|
| Drift-safe READ (read) | After heavy churn, no re-index: 0 silently-wrong bytes, 94.5% recovery vs naive line-caching at 100% silent. Reproduced. | kept |
| Drift-safe EDIT (read --snippet) | Quoted snippet → its current char range after churn: 0 silent mistargets vs naive 100%. | kept |
| refs batch tokens | Pass@30, 150 tasks, real plugin env (codex): -18.6% effective tokens, -67% shell commands, tied pass@30, 0 MCP fails. Biggest where it fully replaces grep (known-cross-file -25% tok / -44% time); a wash/slower where it only supplements (discovery, multi-symbol batch). A loss on Opus (native already lean). | kept |
| Read turns | -25% to -30% agent turns (K=30, both models, CI clear of 0). | kept |
| Caller precision (code-oracle, separate sibling) | 31% fewer files to read for blast-radius (40-75% on common names); the type checker disambiguates which class's method. LSP-warmup cost. | kept (sibling) |
| Single-read tokens | The early "-16% to -35%" was K=5 noise; single read at K=30 ~0. | retracted |
| Search / locate | Ties grep (100% recall). | removed |
| Semantic embeddings | Worse: rejected three independent ways; degraded a grep agent. | not built |
| Light call-graph | Loses to grep on recall (blind to dispatch/types). | removed |
Full numbers, the round-trip law, the adoption ladder, and every retraction live in
code-map-bench:
RESULTS.md (drift/edit/oracle)
and EFFICIENCY-CODEX.md
(batch/cross-model/adoption). Running node verify.mjs there re-derives every headline from raw data.
How read survives drift (the core trick)
read(symbol) resolves the name to one symbol, then:
1. file token matches index → exact char-offset slice [exact]
2. file changed → re-anchor on the signature, re-slice [relocated]
3. anchor matches many sites → return the candidate locations [ambiguous]
4. anchor is gone → say so; re-index to refresh [anchor-lost]
Line numbers drift; a signature line rarely does. When offsets go stale, read re-finds
the symbol on its signature and flags the result so you verify the boundary; nothing
is silently trusted. That's its edge over a blind Read(file, lineRange): a stale line
range returns the wrong bytes; read re-anchors or tells you it can't. --snippet gets a
char range inside the symbol, never escaping into a neighbour.
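The four outcomes above can be sketched as a small state machine. This is an illustrative sketch under stated assumptions, not the project's source: `SymbolEntry`, `ReadResult`, `findAll`, and `readSymbol` are hypothetical names, and the relocated slice simply reuses the indexed length, which is exactly why a relocated result is flagged for boundary verification.

```typescript
// Sketch only: hypothetical shapes that borrow the README's vocabulary.
interface SymbolEntry {
  charStart: number; // offsets recorded at index time
  charEnd: number;
  searchText: string; // signature line used as the drift anchor
  fileToken: string; // content token of the file at index time
}

type ReadResult =
  | { status: "exact" | "relocated"; text: string; charStart: number; charEnd: number }
  | { status: "ambiguous"; candidates: number[] }
  | { status: "anchor-lost" };

// Collect every offset at which `needle` occurs in `haystack`.
function findAll(haystack: string, needle: string): number[] {
  const hits: number[] = [];
  if (needle.length === 0) return hits;
  let from = 0;
  for (;;) {
    const at = haystack.indexOf(needle, from);
    if (at === -1) return hits;
    hits.push(at);
    from = at + 1;
  }
}

function readSymbol(source: string, currentToken: string, entry: SymbolEntry): ReadResult {
  // 1. Token unchanged: the stored offsets are still valid.
  if (currentToken === entry.fileToken) {
    const text = source.slice(entry.charStart, entry.charEnd);
    return { status: "exact", text, charStart: entry.charStart, charEnd: entry.charEnd };
  }
  // 2-4. File changed: look for the signature anchor instead of trusting offsets.
  const hits = findAll(source, entry.searchText);
  const [start, ...others] = hits;
  if (start === undefined) return { status: "anchor-lost" };
  if (others.length > 0) return { status: "ambiguous", candidates: hits };
  // Simplification: assume the symbol kept its indexed length.
  const end = Math.min(source.length, start + (entry.charEnd - entry.charStart));
  return { status: "relocated", text: source.slice(start, end), charStart: start, charEnd: end };
}

const body = "function b() { return 2; }";
const indexed = "function a() { return 1; }\n" + body + "\n";
const entry: SymbolEntry = {
  charStart: indexed.indexOf(body),
  charEnd: indexed.indexOf(body) + body.length,
  searchText: "function b()",
  fileToken: "v1",
};
const edited = "// a new header comment\n" + indexed; // pushes every offset down

console.log(readSymbol(indexed, "v1", entry).status); // exact
console.log(readSymbol(edited, "v2", entry).status); // relocated
console.log(readSymbol("function c() {}", "v3", entry).status); // anchor-lost
```

In the edited file, slicing with the stale offsets would return bytes from the wrong place, while the anchored read lands on `function b()` again.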
🧱 Why coordinates, never meaning
A map that stores meaning (summaries) must defend it against going stale: producers,
verifiers, regeneration. Store no interpretation and that machinery disappears; what's left
is only what a machine can verify: a coordinate index (path/line/charStart–charEnd)
plus one token per file that says whether those coordinates still hold. The only
question, "is this coordinate correct?", has an answer. "Is this description right?" is
never asked. The LLM reads the raw bytes and judges them fresh, every call.
Where the coordinates come from: code-map parses the source tree itself (no external
graph). It covers Git-tracked files (git ls-files, so .gitignore is respected), with TS/JS via
oxc-parser and Python via a stdlib-ast backend; both emit the same per-file
primitives (symbol coordinate + a searchText drift-anchor + a content token). fanIn
(cross-file reference count) only breaks ties when a bare name resolves to more than one
symbol. Honest scope: namespace / export * / alias imports aren't attributed.
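As a rough picture of what "coordinates plus one token per file" might look like as data, here is a minimal sketch. The field names and the FNV-1a hash are assumptions chosen for illustration; the README does not say how the real content token is computed or how `.map-index.json` is laid out.

```typescript
// Sketch only: hypothetical record shapes, not the real .map-index.json schema.
interface SymbolCoord {
  name: string;
  line: number;
  charStart: number;
  charEnd: number;
  searchText: string; // drift anchor (the signature line)
}

interface FileRecord {
  path: string;
  token: string; // content token captured at index time
  symbols: SymbolCoord[];
}

// Stand-in content token: 32-bit FNV-1a over UTF-16 code units.
// The real index may use a different hash or another freshness signal.
function contentToken(text: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return (h >>> 0).toString(16).padStart(8, "0");
}

// The only question the index has to answer: do the stored coordinates still hold?
function coordinatesStillHold(record: FileRecord, currentText: string): boolean {
  return record.token === contentToken(currentText);
}

const source = "export function getModel() {\n  return 1;\n}\n";
const record: FileRecord = {
  path: "models.ts",
  token: contentToken(source),
  symbols: [
    {
      name: "getModel",
      line: 1,
      charStart: 0,
      charEnd: source.length - 1,
      searchText: "export function getModel() {",
    },
  ],
};

console.log(coordinatesStillHold(record, source));
console.log(coordinatesStillHold(record, "// edited\n" + source));
```

Nothing in the record describes what `getModel` does, so there is nothing that can go stale except the coordinates themselves, and the token detects that.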
Wiring it for real (install options + the efficiency win)
Requirements: Node ≥ 23.6 (runs TypeScript directly, no build), one runtime dep
(oxc-parser); ripgrep used for the file walk when present; Python needs python3 on PATH.
Install: npm install -g @annyeong844/code-map (once published) ·
npm install -g github:annyeong844/map (now) · or clone + npm install && npm link.
All expose map and map-mcp.
MCP config (pin the index if your client starts servers outside the repo):
# ~/.codex/config.toml (or project .codex/config.toml)
[mcp_servers.code-map]
command = "map-mcp"
[mcp_servers.code-map.env]
MAP_INDEX = "/path/to/target-repo/.map-index.json"
The efficiency win needs adoption. Agents won't pick code-map over grep on their own (measured: ~17%). Wirings that reach 100% reliable use (pick one):
- The bundled plugin/skill (recommended): this repo ships the routing skill at skills/code-map-retrieval/ with a .claude-plugin/plugin.json manifest, so it installs as a plugin and self-routes everywhere.
  grok plugin install annyeong844/map # Grok (or a local path)
  claude plugin install annyeong844/map # Claude Code
  # any host: copy skills/code-map-retrieval/SKILL.md into ~/.codex/skills/ (or .grok/.claude)
  It carries the discovery double-call guard (for discovery, grep and stop; don't add a read on top), which a 3-arm benchmark showed flips discovery from a loss to a win.
- An AGENTS.md line (per-repo, zero load cost): see code-map-bench/integrations/AGENTS.code-map.md.
- Antigravity / Gemini: Antigravity reads rules from GEMINI.md (global ~/.gemini/GEMINI.md or workspace) and AGENTS.md. This repo ships a ready GEMINI.md; copy it into your global ~/.gemini/GEMINI.md (or a workspace) for the routing. Wire the MCP via the IDE's Manage MCP Servers → View raw config (~/.gemini/config/mcp_config.json):
  { "mcpServers": { "code-map": { "command": "map-mcp" } } }
  On Windows + WSL, install code-map with the Windows Node (≥23.6) so map-mcp is a native command ({ "command": "cmd", "args": ["/d","/c","map-mcp"] }); it reads the same repo files.
Each says, in effect: "read known symbols via code-map read (batch independent refs in
one call); use grep only to discover, and don't double-fetch." The MCP server also
self-advertises this at startup (raises the no-config baseline), but a plugin/skill/rule
directive is what makes it reliable.
References (optional, type-aware): wire code-oracle too. For who-calls / definition /
implementations the skill escalates to the sibling code-oracle (tsgo for TS/JS, ty for Python,
checker-grade). It's a separate MCP (kept out of the one-dep core); wire it where the skill can reach it:
codex mcp add code-oracle -- node /abs/path/to/map/code-oracle/server.ts
claude mcp add code-oracle --scope user -- node /abs/path/to/map/code-oracle/server.ts
It warms a tsgo session once (from a few seconds up to ~20s, depending on repo size), so the skill only calls it when it pays
(large repo / colliding name). Cross-platform: code-oracle normalizes /mnt/c/… ↔ C:\… paths,
so one server serves both a Windows IDE and WSL agents (over interop); e.g. a fast win32 build
can serve WSL clients too, dodging the /mnt/c drvfs penalty (~38s → ~4s on the same repo).
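The /mnt/c/… ↔ C:\… normalization can be illustrated with two small converters. This is a sketch of the general idea only; `wslToWindows` and `windowsToWsl` are hypothetical names, and code-oracle's real handling (UNC paths, case, non-drive mounts) is not shown in this README.

```typescript
// Sketch only: hypothetical helpers covering plain drive-letter paths.

// "/mnt/c/Users/me/repo" -> "C:\Users\me\repo"; null if not a drive mount path.
function wslToWindows(p: string): string | null {
  const m = /^\/mnt\/([a-zA-Z])(\/.*)?$/.exec(p);
  if (m === null) return null;
  const drive = (m[1] ?? "").toUpperCase();
  const rest = (m[2] ?? "/").replace(/\//g, "\\");
  return `${drive}:${rest}`;
}

// "C:\Users\me\repo" -> "/mnt/c/Users/me/repo"; null if not a drive-letter path.
function windowsToWsl(p: string): string | null {
  const m = /^([a-zA-Z]):[\\/](.*)$/.exec(p);
  if (m === null) return null;
  const drive = (m[1] ?? "").toLowerCase();
  const rest = (m[2] ?? "").replace(/\\/g, "/");
  return rest.length > 0 ? `/mnt/${drive}/${rest}` : `/mnt/${drive}`;
}

console.log(wslToWindows("/mnt/c/Users/me/repo/src/a.ts"));
console.log(windowsToWsl("C:\\Users\\me\\repo\\src\\a.ts"));
console.log(wslToWindows("/home/me/repo")); // not under /mnt/<drive>, so null
```

With both directions available, a server running on one side can accept paths from clients on the other and answer in the form each client expects.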
Benchmark it yourself
git clone https://github.com/annyeong844/code-map-bench && cd code-map-bench
codex login --device-auth
node harnesses/bench-codex-headless.mjs --run --passes 30 --auth chatgpt --strategies native,map-batch
The harness (in code-map-bench) runs pass@30 over a diverse task set, captures usage
from codex exec --json, and scores the route (native rows fail if they touch MCP;
map-batch rows fail if they don't complete a read({ refs: [...] })). It reports raw,
adjusted = input − cached, and cache-aware effective = uncached + cached×weight so you
can see the win under real prompt caching. The honest takeaway, by scenario:
| scenario | code-map vs grep | when |
|---|---|---|
| known-cross-file | -25% tokens, -44% time | reading named symbols spread across files; grep fully replaced |
| file-wide / known-single | -15% to -21% tokens, -28% to -35% time | known symbols, grep replaced |
| discovery-first | tokens ↓ but time ↑ | must grep to find first; code-map only supplements |
| multi-symbol batch | ~tie | native already batches there |
It is not a token-saver everywhere: it is strongest when the agent already knows the refs and code-map can replace (not augment) the search.
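The three token metrics the harness reports follow directly from the formulas quoted above. The sketch below only restates that arithmetic: `Usage`, `tokenMetrics`, and `percentChange` are hypothetical names, the sample numbers are made up for illustration, and the cache weight of 0.25 is an arbitrary placeholder rather than the value the benchmark uses.

```typescript
// Sketch only: hypothetical names and illustrative numbers, not benchmark data.
interface Usage {
  inputTokens: number; // total input tokens, cached portion included
  cachedTokens: number; // portion of the input served from the prompt cache
}

interface TokenMetrics {
  raw: number;
  adjusted: number; // input - cached
  effective: number; // uncached + cached * weight
}

function tokenMetrics(usage: Usage, cacheWeight: number): TokenMetrics {
  const uncached = usage.inputTokens - usage.cachedTokens;
  return {
    raw: usage.inputTokens,
    adjusted: uncached,
    effective: uncached + usage.cachedTokens * cacheWeight,
  };
}

// Relative change of `candidate` versus `baseline`, in percent (negative = saving).
function percentChange(baseline: number, candidate: number): number {
  return ((candidate - baseline) / baseline) * 100;
}

const weight = 0.25; // placeholder cache weight, an assumption for this example
const native = tokenMetrics({ inputTokens: 10000, cachedTokens: 8000 }, weight);
const mapBatch = tokenMetrics({ inputTokens: 8000, cachedTokens: 6000 }, weight);

console.log(native.effective); // 2000 + 8000 * 0.25 = 4000
console.log(mapBatch.effective); // 2000 + 6000 * 0.25 = 3500
console.log(percentChange(native.effective, mapBatch.effective)); // -12.5
```

The example shows why the cache-aware figure matters: here the raw input drops by 20%, while the effective cost drops by 12.5% under the assumed weight.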
🏗️ Architecture + the code-oracle sibling
src/
  core/ types · files · extract-symbols (oxc) · fan-in · build-index · locate · read · store
  py/ extract.py (Python: stdlib ast → the same per-file primitives)
  cli/ main.ts (index / read / stats)
  mcp/ server.ts (the single `read` tool, auto-reload)
test/ extract · exact-slice · methods · relocation · anchor-lost · incremental · fan-in
  · snippet-aim · batch · path-traversal refusal · Python
node --test "test/*.test.ts"
code-oracle/ is an optional, separate, heavier sibling. grep and a light index can't
resolve obj.method() dispatch; that needs types. code-oracle is a separate MCP that
answers who calls this / what implements this / where is this defined at type-checker grade
over a warm LSP session (tsgo for TS/JS, ty for Python). Kept separate on purpose:
a heavy preview dependency, seconds of warmup, stateful: the opposite of code-map's one-dep
lightness. Honest bounds (measured): tsgo is solid; ty (0.0.50) resolves definition
cross-file and accurately, but references is intra-file only, so Python callers are a
lower-bound intra-file screen (flagged incomplete: true). For complete Python callers, use
grep (100% recall); we deliberately don't add a heavier Python references backend. Truly
dynamic dispatch (token-only DI, Proxy, obj[k]()) is invisible to any checker. It has its own
package.json + tests (cd code-oracle && npm test).
Maintainer / publishing
npm test # 24 tests
npm run typecheck # tsc --noEmit, strict (0 errors; src/ is any-free)
npm run lint # npx oxlint (no devDep)
npm run check:package # dry-run package file-list safety check
npm publish --access public # runs check:package via prepublishOnly
check:package inspects the exact dry-run file list and fails if local env/config paths
(.env, .codex, auth.json, config.toml, …) or likely token values would ship.
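The path half of that safety check could look something like the sketch below. It is an assumption-laden illustration, not the repo's check:package script: `FORBIDDEN` and `findForbidden` are hypothetical names, only the four paths named above are covered, and the scan for likely token values is left out.

```typescript
// Sketch only: hypothetical names; covers just the four paths named in the text.
const FORBIDDEN: RegExp[] = [
  /(^|\/)\.env(\.|$)/, // .env, .env.local, nested/.env
  /(^|\/)\.codex(\/|$)/, // anything under a .codex directory
  /(^|\/)auth\.json$/,
  /(^|\/)config\.toml$/,
];

// Return every entry of a dry-run file list that should never be published.
function findForbidden(fileList: string[]): string[] {
  return fileList.filter((file) => FORBIDDEN.some((re) => re.test(file)));
}

const dryRunList = [
  "package.json",
  "src/core/read.ts",
  "src/environment.ts",
  ".env",
  ".codex/config.toml",
];
const offenders = findForbidden(dryRunList);
console.log(offenders);

// A real prepublish script would exit non-zero here so the publish is blocked.
if (offenders.length > 0) {
  console.log(`refusing to publish: ${offenders.length} forbidden path(s)`);
}
```

Checking the exact file list that would ship, instead of the working tree, is what makes the guard meaningful: ignore rules and the `files` field are already applied by the time the list exists.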