code-map

🇰🇷 To read this in Korean → README.ko.md  ·  🇬🇧 English below.

Hand your AI agent the exact slice of code it needs: by coordinate, not by guessing. Precise coordinates only; the meaning is judged afresh by the LLM from the raw code on every call.

Sound familiar? 😅

Your agent needs to read createMessage. So it greps and gets 40 hits across providers. It opens whole files to find the right one, burning tokens on code it won't use. Next turn it "remembers" the function was at line 120, but your last edit pushed it to 138, so it reads the wrong lines and never notices. 😱

With code-map riding along, the agent just:

read({ refs: ["anthropic.ts#createMessage", "openai.ts#createMessage"] }) → both exact slices, one call, still correct even after the file moved.

grep still does the finding; it's great at that. code-map does the reading: small, exact, drift-proof. That's the whole idea.


What you get: measured, not promised

🎯 Never silently wrong: After heavy edits with no re-index, read re-anchors on the signature line: 0 silently-wrong bytes (a naive "line number" cache is ~100% wrong). It returns the right code or tells you it can't, never the wrong bytes.
⚡ Fewer tokens, fewer steps: Wired into a coding agent (codex, 150-task pass@30): −19% tokens, −67% shell commands, same success rate. On known-ref reads the cut is much larger: grok composer-2.5-fast, 30 passes: −53% to −60% tokens, −71% to −78% retrieval payload; codex with the routing skill: −34% to −54%.
🧭 Routing is the lever: Agents won't reach for read on their own (~17%). The bundled plugin/skill makes them: it flips discovery from a loss to −31% by killing the double-call, and turns vague usage (erratic, +61% worse on one task) into a steady win, 30/30 pass.
🧩 Tiny & drop-in: Node + one dependency (oxc-parser), no build step. TS/JS and Python. MCP server + a one-line skill; install for Claude, Codex, grok, or Antigravity.

Honest about the edges (this repo's whole point): code-map does not beat grep at searching; it ties, so keep grepping. And it's not a universal token-saver: the win is large for reading known symbols with routing, ~0 on an already-lean read task, and a loss on raw discovery unless the skill routes it. Every number, every retraction, the model/metric caveats, and a one-command verifier live in code-map-bench.

TL;DR: grep finds, read reads.


Quick start

# 1. install (until npm publish, straight from GitHub)
npm install -g github:annyeong844/map        # gives you `map` + `map-mcp`

# 2. index the repo you want your agent to read
cd /path/to/your-repo && map index --root .  # writes ./.map-index.json

# 3. wire it into your agent (examples)
codex mcp add code-map -- map-mcp             # Codex
claude mcp add code-map --scope user -- map-mcp   # Claude Code

That's it: your agent now has one tool, read. For the −19% / −67% efficiency win on Codex you also tell it when to use code-map (one line); see Wiring it for real below. The MCP server auto-detects the index (walks up for .map-index.json) and auto-reloads when you re-index, so no reconnect is needed.
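The index auto-detection can be pictured as a walk from the working directory toward the filesystem root. The sketch below is only an illustration of that idea, not code-map's own code: findIndex and the injected exists callback are hypothetical names, and it assumes absolute POSIX-style paths.

```typescript
// Hypothetical helper: returns the nearest ancestor's index path, or null.
// `exists` is injected so the sketch does not depend on a filesystem API.
function findIndex(startDir: string, exists: (p: string) => boolean): string | null {
  // Drop trailing slashes; an empty result means we started at the root.
  let dir = startDir.replace(/\/+$/, "") || "/";
  while (true) {
    const candidate = (dir === "/" ? "" : dir) + "/.map-index.json";
    if (exists(candidate)) return candidate;
    if (dir === "/") return null; // reached the root without finding an index
    const cut = dir.lastIndexOf("/");
    dir = cut <= 0 ? "/" : dir.slice(0, cut); // step up one directory
  }
}

// Usage with an in-memory stand-in for the filesystem.
const files = new Set(["/work/repo/.map-index.json"]);
const found = findIndex("/work/repo/src/core", (p) => files.has(p));
const missing = findIndex("/tmp/elsewhere", (p) => files.has(p));
console.log(found, missing);
```

In a real server the callback would be a filesystem check, and a pinned MAP_INDEX value would presumably take precedence over the walk.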


Who is this for?

✅ Great fit if you

  • run an AI coding agent (Codex, Claude Code, …) that reads a lot of code
  • have a repo big enough that grep returns noise and reads pull whole files
  • work in TypeScript / JavaScript or Python
  • want reads that stay correct as the code changes under the agent

❌ Not the tool (yet) if you

  • want a better search: grep/ripgrep already ties it; code-map is for reading
  • have a 1–2 file project: the read savings won't show up
  • want "where is auth handled?" concept search: that's embeddings (a measured non-goal here)

The one tool: read

map read "alias-map.ts#buildAliasMap"        # path-scoped name → exact slice
map read buildAliasMap                       # bare name (errors if ambiguous)
map read withRetry --snippet "req.copy()"    # char range *inside* the symbol
map read --refs "getModel,createMessage,withRetry"   # batch: many symbols, one call
map stats                                    # index overview

Add --json for machine output. Search with your own grep, then feed the file:line or symbol name you find to read. As an MCP tool it's the same read (single ref, a refs array for batch, optional snippet).
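As a rough picture of what a read request carries, here is a minimal sketch of the argument shapes and of splitting a path#symbol ref. Only the refs array appears verbatim in this README; the ref and snippet field names, and the parseRef and refsOf helpers, are assumptions made for illustration.

```typescript
// Assumed argument shape: one ref (optionally with a snippet) or a batch.
// Only `refs` is taken from the README; the other field names are guesses.
type ReadArgs =
  | { ref: string; snippet?: string }
  | { refs: string[] };

interface ParsedRef {
  path: string | null; // null for a bare name such as "buildAliasMap"
  name: string;
}

// Split "alias-map.ts#buildAliasMap" into a path scope and a symbol name.
function parseRef(ref: string): ParsedRef {
  const hash = ref.lastIndexOf("#");
  if (hash === -1) return { path: null, name: ref.trim() };
  return { path: ref.slice(0, hash).trim(), name: ref.slice(hash + 1).trim() };
}

// Normalize either argument form into a list of parsed refs.
function refsOf(args: ReadArgs): ParsedRef[] {
  const raw = "refs" in args ? args.refs : [args.ref];
  return raw.map(parseRef);
}

const batch = refsOf({ refs: ["anthropic.ts#createMessage", "withRetry"] });
const single = refsOf({ ref: "withRetry", snippet: "req.copy()" });
console.log(batch, single);
```

Treating a bare name as a ref with no path scope mirrors the two CLI forms shown above.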


🧪 The honest scorecard: what we kept, and what the data made us cut

code-map started broad (locate, grep, graph, hotspots, semantic search) and was benchmarked honestly against grep + a strong agent (Sonnet/Opus, headless, on real repos: cline, django, requests). The measurements ate most of it; the surface was cut to match. Keeping only what beat the baseline is the point.

| Capability | Measured vs grep + strong agent | Verdict |
| --- | --- | --- |
| Drift-safe READ (read) | After heavy churn, no re-index: 0 silently-wrong bytes, 94.5% recovery vs naive line-caching at 100% silent. Reproduced. | kept |
| Drift-safe EDIT (read --snippet) | Quoted snippet → its current char range after churn: 0 silent mistargets vs naive 100%. | kept |
| refs batch tokens | Pass@30, 150 tasks, real plugin env (codex): −18.6% effective tokens, −67% shell commands, tied pass@30, 0 MCP fails. Biggest where it fully replaces grep (known-cross-file −25% tok / −44% time); a wash/slower where it only supplements (discovery, multi-symbol batch). A loss on Opus (native already lean). | kept |
| Read: turns | −25–30% agent turns (K=30, both models, CI clear of 0). | kept |
| Caller precision (code-oracle, separate sibling) | 31% fewer files to read for blast-radius (40–75% on common names); the type checker disambiguates which class's method. LSP-warmup cost. | kept (sibling) |
| Single-read tokens | The early "−16–35%" was K=5 noise; single read at K=30 ~0. | retracted |
| Search / locate | Ties grep (100% recall). | removed |
| Semantic embeddings | Worse: rejected three independent ways; degraded a grep agent. | not built |
| Light call-graph | Loses to grep on recall (blind to dispatch/types). | removed |

Full numbers, the round-trip law, the adoption ladder, and every retraction live in code-map-bench: RESULTS.md (drift/edit/oracle) and EFFICIENCY-CODEX.md (batch/cross-model/adoption). Running node verify.mjs there re-derives every headline from raw data.

🧭 How read survives drift (the core trick)

read(symbol) resolves the name to one symbol, then:

1. file token matches index   →  exact char-offset slice          [exact]
2. file changed               →  re-anchor on the signature, re-slice [relocated]
3. anchor matches many sites  →  return the candidate locations    [ambiguous]
4. anchor is gone             →  say so; re-index to refresh        [anchor-lost]

Line numbers drift; a signature line rarely does. When offsets go stale, read re-finds the symbol on its signature and flags the result so you verify the boundary; nothing is silently trusted. That's its edge over a blind Read(file, lineRange): a stale line range returns the wrong bytes; read re-anchors or tells you it can't. --snippet gets a char range inside the symbol, never escaping into a neighbour.
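The four outcomes can be sketched in a few lines. This is an illustration of the idea under stated assumptions, not code-map's implementation: the Entry fields, readSymbol, snippetRange, and the FNV-1a stand-in for the content token are all hypothetical, and the relocated branch simply reuses the indexed length as a boundary guess, which is why it is flagged for verification.

```typescript
// Hypothetical index entry; field names are illustrative, not code-map's schema.
interface Entry {
  charStart: number; // slice start recorded at index time
  charEnd: number;   // slice end (exclusive) recorded at index time
  anchor: string;    // signature line used as the drift anchor
  token: string;     // content token of the file at index time
}

type ReadResult =
  | { status: "exact" | "relocated"; charStart: number; charEnd: number; text: string }
  | { status: "ambiguous"; candidates: number[] }
  | { status: "anchor-lost" };

// Stand-in content token (FNV-1a over UTF-16 code units); any stable hash would do.
function contentToken(text: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16);
}

// Every offset at which `needle` occurs in `haystack` (none for an empty needle).
function allOffsets(haystack: string, needle: string): number[] {
  const hits: number[] = [];
  if (needle.length === 0) return hits;
  let i = haystack.indexOf(needle);
  while (i !== -1) {
    hits.push(i);
    i = haystack.indexOf(needle, i + 1);
  }
  return hits;
}

function readSymbol(entry: Entry, current: string): ReadResult {
  if (contentToken(current) === entry.token) {
    const text = current.slice(entry.charStart, entry.charEnd);
    return { status: "exact", charStart: entry.charStart, charEnd: entry.charEnd, text };
  }
  const hits = allOffsets(current, entry.anchor);
  if (hits.length === 0) return { status: "anchor-lost" };
  if (hits.length > 1) return { status: "ambiguous", candidates: hits };
  // Simplification: keep the indexed length as the boundary guess; the
  // "relocated" status tells the caller to verify it.
  const charStart = hits[0];
  const charEnd = Math.min(current.length, charStart + (entry.charEnd - entry.charStart));
  return { status: "relocated", charStart, charEnd, text: current.slice(charStart, charEnd) };
}

// Char range of a quoted snippet inside an already-resolved symbol. Searching
// only the symbol's own text keeps the range from escaping into a neighbour;
// a missing or repeated snippet yields null rather than a guess.
function snippetRange(symbolStart: number, symbolText: string, snippet: string): [number, number] | null {
  const hits = allOffsets(symbolText, snippet);
  if (hits.length !== 1) return null;
  return [symbolStart + hits[0], symbolStart + hits[0] + snippet.length];
}

const body = "function b() { return 2; }";
const original = `function a() { return 1; }\n${body}\n`;
const start = original.indexOf(body);
const entry: Entry = { charStart: start, charEnd: start + body.length, anchor: "function b()", token: contentToken(original) };

const fresh = readSymbol(entry, original);                      // token still matches
const moved = readSymbol(entry, "// new header\n" + original);  // offsets shifted
const gone = readSymbol(entry, "function a() { return 1; }\n"); // symbol deleted
const range = snippetRange(start, body, "return 2");
console.log(fresh.status, moved.status, gone.status, range);
```

The point of the design is visible in the return type: every path either yields bytes with a status attached or says why it cannot.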

🧱 Why coordinates, never meaning

A map that stores meaning (summaries) must defend it against going stale β€” producers, verifiers, regeneration. Store no interpretation and that machinery disappears; what's left is only what a machine can verify: a coordinate index (path/line/charStart–charEnd) plus one token per file that says whether those coordinates still hold. The only question β€” "is this coordinate correct?" β€” has an answer. "Is this description right?" is never asked. The LLM reads the raw bytes and judges them fresh, every call.

Where the coordinates come from: code-map parses the source tree itself (no external graph). It takes the Git-tracked files (git ls-files, so .gitignore is respected) and parses TS/JS via oxc-parser and Python via a stdlib-ast backend; both emit the same per-file primitives (symbol coordinate + a searchText drift-anchor + a content token). fanIn (cross-file reference count) only breaks ties when a bare name resolves to more than one symbol. Honest scope: namespace / export * / alias imports aren't attributed.
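One plausible reading of the fanIn tie-break, combined with the "errors if ambiguous" behaviour of bare names, is sketched below. The README does not spell out the exact rule, so treat Candidate, resolveBareName, and the "unique highest fanIn wins, otherwise error" policy as assumptions.

```typescript
// Hypothetical candidate record; only `fanIn` is a term used by the README.
interface Candidate {
  path: string;
  name: string;
  fanIn: number; // cross-file reference count
}

// Resolve a bare name: a single match wins outright; several matches are
// settled by the unique highest fanIn; a tie at the top is reported, not guessed.
function resolveBareName(name: string, index: Candidate[]): Candidate {
  const matches = index.filter((c) => c.name === name);
  if (matches.length === 0) throw new Error(`unknown symbol: ${name}`);
  if (matches.length === 1) return matches[0];
  const top = Math.max(...matches.map((c) => c.fanIn));
  const best = matches.filter((c) => c.fanIn === top);
  if (best.length > 1) {
    throw new Error(`ambiguous: ${name} (${best.map((c) => c.path).join(", ")})`);
  }
  return best[0];
}

const index: Candidate[] = [
  { path: "anthropic.ts", name: "createMessage", fanIn: 9 },
  { path: "openai.ts", name: "createMessage", fanIn: 3 },
  { path: "a.ts", name: "withRetry", fanIn: 2 },
  { path: "b.ts", name: "withRetry", fanIn: 2 },
];

const picked = resolveBareName("createMessage", index);
let tieError = "";
try {
  resolveBareName("withRetry", index);
} catch (e) {
  tieError = (e as Error).message;
}
console.log(picked.path, tieError);
```

A path-scoped ref such as anthropic.ts#createMessage sidesteps the question entirely, which is why the batch examples in this README use that form.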

🔌 Wiring it for real (install options + the efficiency win)

Requirements: Node ≥ 23.6 (runs TypeScript directly, no build), one runtime dep (oxc-parser); ripgrep used for the file walk when present; Python needs python3 on PATH.

Install: npm install -g @annyeong844/code-map (once published) · npm install -g github:annyeong844/map (now) · or clone + npm install && npm link. All expose map and map-mcp.

MCP config: pin the index if your client starts servers outside the repo:

# ~/.codex/config.toml  (or project .codex/config.toml)
[mcp_servers.code-map]
command = "map-mcp"
[mcp_servers.code-map.env]
MAP_INDEX = "/path/to/target-repo/.map-index.json"

// generic MCP client
{ "mcpServers": { "code-map": { "command": "map-mcp" } } }

The efficiency win needs adoption. Agents won't pick code-map over grep on their own (measured: ~17%). Wirings that reach 100% reliable use (pick one):

  • The bundled plugin/skill (recommended): this repo ships the routing skill at skills/code-map-retrieval/ with a .claude-plugin/plugin.json manifest, so it installs as a plugin and self-routes everywhere.
    grok plugin install annyeong844/map          # Grok (or a local path)
    claude plugin install annyeong844/map         # Claude Code
    # any host: copy skills/code-map-retrieval/SKILL.md into ~/.codex/skills/ (or .grok/.claude)
    It carries the discovery double-call guard (for discovery, grep and stop; don't add a read on top), which a 3-arm benchmark showed flips discovery from a loss to a win.
  • An AGENTS.md line (per-repo, zero load cost): see code-map-bench/integrations/AGENTS.code-map.md.
  • Antigravity / Gemini: Antigravity reads rules from GEMINI.md (global ~/.gemini/GEMINI.md or workspace) and AGENTS.md. This repo ships a ready GEMINI.md; copy it into your global ~/.gemini/GEMINI.md (or a workspace) for the routing. Wire the MCP via the IDE's Manage MCP Servers → View raw config (~/.gemini/config/mcp_config.json):
    { "mcpServers": { "code-map": { "command": "map-mcp" } } }
    On Windows + WSL, install code-map with the Windows Node (≥23.6) so map-mcp is a native command ({ "command": "cmd", "args": ["/d","/c","map-mcp"] }); it reads the same repo files.

Each of them says, in effect: "read known symbols via code-map read (batch independent refs in one call); use grep only to discover, and don't double-fetch." The MCP server also self-advertises this at startup (raises the no-config baseline), but a plugin/skill/rule directive is what makes it reliable.

References (optional, type-aware): wire code-oracle too. For who-calls / definition / implementations the skill escalates to the sibling code-oracle (tsgo for TS/JS, ty for Python, checker-grade). It's a separate MCP (kept out of the one-dep core); wire it where the skill can reach it:

codex  mcp add code-oracle -- node /abs/path/to/map/code-oracle/server.ts
claude mcp add code-oracle --scope user -- node /abs/path/to/map/code-oracle/server.ts

It warms a tsgo session once (from a few seconds up to ~20s, depending on repo size), so the skill only calls it when it pays (large repo / colliding name). Cross-platform: code-oracle normalizes /mnt/c/… ↔ C:\… paths, so one server serves both a Windows IDE and WSL agents (over interop); e.g. a fast win32 build can serve WSL clients too, dodging the /mnt/c drvfs penalty (~38s → ~4s on the same repo).
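The path normalization mentioned above amounts to a pair of string rewrites. The following is a minimal sketch under the assumption that only drive-letter mounts under /mnt/ need handling; wslToWindows and windowsToWsl are hypothetical names, not code-oracle's API.

```typescript
// "/mnt/c/Users/me/a.ts" -> "C:\Users\me\a.ts"; null if not a drive mount.
function wslToWindows(p: string): string | null {
  const m = /^\/mnt\/([a-zA-Z])(\/.*)?$/.exec(p);
  if (!m) return null;
  const rest = (m[2] ?? "/").replace(/\//g, "\\");
  return `${m[1].toUpperCase()}:${rest}`;
}

// "C:\Users\me\a.ts" (or "C:/Users/me/a.ts") -> "/mnt/c/Users/me/a.ts".
function windowsToWsl(p: string): string | null {
  const m = /^([a-zA-Z]):[\\/](.*)$/.exec(p);
  if (!m) return null;
  return `/mnt/${m[1].toLowerCase()}/${m[2].replace(/\\/g, "/")}`;
}

const win = wslToWindows("/mnt/c/Users/me/repo/src/a.ts");
const wsl = windowsToWsl("C:\\Users\\me\\repo\\src\\a.ts");
console.log(win, wsl);
```

Paths that live inside the WSL filesystem itself (for example under /home) return null here; how code-oracle treats those is not covered by this README.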

📊 Benchmark it yourself
git clone https://github.com/annyeong844/code-map-bench && cd code-map-bench
codex login --device-auth
node harnesses/bench-codex-headless.mjs --run --passes 30 --auth chatgpt --strategies native,map-batch

The harness (in code-map-bench) runs pass@30 over a diverse task set, captures usage from codex exec --json, and scores the route (native rows fail if they touch MCP; map-batch rows fail if they don't complete a read({ refs: [...] })). It reports raw, adjusted = input − cached, and cache-aware effective = uncached + cached×weight so you can see the win under real prompt caching. The honest takeaway, by scenario:

| scenario | code-map vs grep | when |
| --- | --- | --- |
| known-cross-file | −25% tokens, −44% time | reading named symbols spread across files; grep fully replaced |
| file-wide / known-single | −15–21% tokens, −28–35% time | known symbols, grep replaced |
| discovery-first | tokens ↓ but time ↑ | must grep to find first → code-map only supplements |
| multi-symbol batch | ~tie | native already batches there |

It is not a token-saver everywhere; it is strongest when the agent already knows the refs and code-map can replace (not augment) the search.
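The adjusted and effective figures the harness reports follow directly from the two formulas quoted above. Here is a small worked sketch; the Usage field names, the tokenMetrics helper, and the 0.25 cache weight are illustrative assumptions rather than the harness's actual values, and raw is taken to mean total input tokens.

```typescript
// Hypothetical usage record; field names are not the harness's schema.
interface Usage {
  inputTokens: number;  // all input tokens, cached or not
  cachedTokens: number; // the portion served from the prompt cache
}

// adjusted  = input - cached
// effective = uncached + cached * weight, where uncached is the adjusted figure
function tokenMetrics(u: Usage, cacheWeight: number) {
  const raw = u.inputTokens;
  const adjusted = u.inputTokens - u.cachedTokens;
  const effective = adjusted + u.cachedTokens * cacheWeight;
  return { raw, adjusted, effective };
}

// Example: 10,000 input tokens, 6,000 of them cached, cached tokens
// weighted at a quarter of an uncached token (an illustrative weight).
const metrics = tokenMetrics({ inputTokens: 10_000, cachedTokens: 6_000 }, 0.25);
console.log(metrics); // raw 10000, adjusted 4000, effective 5500
```

With a weight of 0 the effective figure collapses to adjusted, and with a weight of 1 it equals raw, so the weight decides how much credit prompt caching gets.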

🏛️ Architecture + the code-oracle sibling
src/
  core/    types · files · extract-symbols (oxc) · fan-in · build-index · locate · read · store
  py/      extract.py   (Python: stdlib ast → the same per-file primitives)
  cli/     main.ts      (index / read / stats)
  mcp/     server.ts    (the single `read` tool, auto-reload)
test/      extract · exact-slice · methods · relocation · anchor-lost · incremental · fan-in
           · snippet-aim · batch · path-traversal refusal · Python
node --test "test/*.test.ts"

code-oracle/ is an optional, separate, heavier sibling. grep and a light index can't resolve obj.method() dispatch; that needs types. code-oracle is a separate MCP that answers who calls this / what implements this / where is this defined at type-checker grade over a warm LSP session (tsgo for TS/JS, ty for Python). Kept separate on purpose: a heavy preview dependency, seconds of warmup, stateful, the opposite of code-map's one-dep lightness. Honest bounds (measured): tsgo is solid; ty (0.0.50) resolves definition cross-file and accurately, but references is intra-file only, so Python callers are a lower-bound intra-file screen (flagged incomplete: true). For complete Python callers, use grep (100% recall); we deliberately don't add a heavier Python references backend. Truly dynamic dispatch (token-only DI, Proxy, obj[k]()) is invisible to any checker. It has its own package.json + tests (cd code-oracle && npm test).

🔧 Maintainer / publishing
npm test                 # 24 tests
npm run typecheck        # tsc --noEmit, strict (0 errors; src/ is any-free)
npm run lint             # npx oxlint (no devDep)
npm run check:package    # dry-run package file-list safety check
npm publish --access public   # runs check:package via prepublishOnly

check:package inspects the exact dry-run file list and fails if local env/config paths (.env, .codex, auth.json, config.toml, …) or likely token values would ship.
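The path half of that check can be sketched as a filter over the dry-run file list. This is an illustration only: unsafeEntries is a hypothetical name, the list holds just the four paths named above (the real list is longer and is not reproduced here), and the token-value scan is left out.

```typescript
// Only the four names the README spells out; the real list is longer.
const FORBIDDEN = [".env", ".codex", "auth.json", "config.toml"];

// Return every packaged path that has a forbidden name as one of its segments.
function unsafeEntries(files: string[]): string[] {
  return files.filter((f) => f.split("/").some((segment) => FORBIDDEN.includes(segment)));
}

const shipped = ["package.json", "src/cli/main.ts", ".codex/config.toml", "README.md"];
const bad = unsafeEntries(shipped);
if (bad.length > 0) {
  console.error(`refusing to publish, unsafe paths: ${bad.join(", ")}`);
}
```

A real check would exit non-zero at this point so that prepublishOnly aborts the publish.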
