Improve context lookup for code-like source strings #451
|
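+1 on this: the "code-like source string" path (route keys, event names, config keys) is a recurring need for us, and I wanted to add a concrete cross-repo use case as signal.
Context: we run CodeGraph in workspace mode over a 23-repo / ~2.8k-file estate (TS / React Native / Directus / Unity). The questions that fall through today are all exact body string literals that cross a repo boundary:
- Route keys: `'/live-scoring/append-event'` is a Directus endpoint defined in one repo and called via `fetch('…/live-scoring/append-event')` string literals in another. "Who calls this route?" isn't answerable from the symbol graph (the caller is a string; the endpoint isn't a route node).
- Collection names: `'game_matches'` (a Directus collection) referenced as a string in `createItem('game_matches')` across services.
- Bridge event names: RN→Unity `postMessage('<eventName>')` string args.
On a default v1.0.1 install, `codegraph query` / `explore` do token-fuzzy symbol expansion for these (for example, `explore game_matches` returns ~200 symbols, including a token-adjacent `matchesQuery` from an unrelated repo) rather than returning the exact literal sites. (I also don't see the `config_refs` / `sql_refs` tables from #114/#115 in the v1.0.1 release, so there's no queryable string-literal surface for the general case yet.)
What would close it for us is a queryable, FTS-backed source-string index (exact match → file:line of the smallest enclosing symbol), essentially what this issue proposes for the context path. We're prototyping a local `source_strings` FTS5 side-table (node:sqlite + the bundled tree-sitter, mirroring the `config_refs`/`sql_refs` shape) to unblock ourselves meanwhile; happy to share findings or contribute upstream if useful.
Thanks for the great tool.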
|
Thanks for the concrete use cases. I added a targeted regression test to this PR for exact cross-repo contract strings like `/live-scoring/append-event` and `game_matches` resolving to their enclosing caller functions.

For the broader queryable, FTS-backed source-string surface, I opened #923 as a separate larger PR so this one can stay scoped: #923
|
Hi mythologi,
Thanks a lot for the detailed signal and the concrete cross-repo examples. This week has been unusually busy for me; I just got home, saw your note, and put together two follow-ups.
First, I updated #451 with a targeted regression test for the exact case you described: code-like cross-repo contract strings such as `/live-scoring/append-event` and `game_matches` should resolve to the smallest enclosing caller function in context lookup.
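As an illustration of what "smallest enclosing caller function" means here, the following is a minimal sketch, not the code in #451. The `SymbolSpan` shape, the file name, and the line numbers are all hypothetical; the idea is only that, among the symbols whose line range contains the literal, the narrowest one wins.

```typescript
// Hypothetical shape of an indexed symbol; not CodeGraph's actual schema.
interface SymbolSpan {
  name: string;
  file: string;
  startLine: number; // inclusive, 1-based
  endLine: number; // inclusive, 1-based
}

// Return the narrowest symbol whose span contains the given line,
// or undefined when no indexed symbol encloses it.
function smallestEnclosing(
  symbols: SymbolSpan[],
  file: string,
  line: number,
): SymbolSpan | undefined {
  let best: SymbolSpan | undefined;
  for (const s of symbols) {
    if (s.file !== file || line < s.startLine || line > s.endLine) continue;
    const size = s.endLine - s.startLine;
    if (best === undefined || size < best.endLine - best.startLine) best = s;
  }
  return best;
}

// Made-up spans: a class that contains two methods.
const symbols: SymbolSpan[] = [
  { name: "ScoringService", file: "scoring.ts", startLine: 1, endLine: 40 },
  { name: "appendEvent", file: "scoring.ts", startLine: 10, endLine: 18 },
  { name: "listMatches", file: "scoring.ts", startLine: 20, endLine: 30 },
];

// Suppose the '/live-scoring/append-event' literal sits on line 14.
const hit = smallestEnclosing(symbols, "scoring.ts", 14);
console.log(hit?.name); // appendEvent
```

Line 14 falls inside both `ScoringService` and `appendEvent`; the method is returned because its span is shorter.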
Second, I opened a separate larger PR for the broader design you suggested:
#923
That PR adds a queryable `source_strings` side-table with FTS5 support for compact code-like literals: route keys, collection names, bridge event names, config keys, etc. It indexes exact literal sites with file:line and enclosing symbol metadata, and wires them into API/search/context/CLI/MCP paths while keeping exact literal semantics for single string lookups.
I also added tests around exact lookup, FTS term lookup, MCP search/explore output, sync replacement, clear/delete lifecycle behavior, and natural-language ranking regression.
Really appreciate the concrete examples. They helped separate the small scoped fix in #451 from the larger `source_strings` surface that probably deserves its own review.
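To make the `source_strings` idea above concrete, here is a rough in-memory sketch of the two lookup modes: exact literal match and term match. It is only a model of the behavior, assuming a hypothetical row shape; it does not use SQLite or FTS5, and the actual schema and API in #923 may differ. The file paths and symbol names in the usage lines are invented.

```typescript
// Hypothetical row shape; the real `source_strings` schema may differ.
interface SourceStringSite {
  literal: string;
  file: string;
  line: number;
  enclosingSymbol: string | null;
}

// Split a literal into lowercase alphanumeric terms.
function tokenize(literal: string): string[] {
  return literal.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

class SourceStringIndex {
  private exact = new Map<string, SourceStringSite[]>();
  private byTerm = new Map<string, Set<string>>(); // term -> literals containing it

  add(site: SourceStringSite): void {
    const sites = this.exact.get(site.literal) ?? [];
    sites.push(site);
    this.exact.set(site.literal, sites);
    for (const term of tokenize(site.literal)) {
      const literals = this.byTerm.get(term) ?? new Set<string>();
      literals.add(site.literal);
      this.byTerm.set(term, literals);
    }
  }

  // Exact literal semantics: 'game_matches' never matches 'matchesQuery'.
  lookupExact(literal: string): SourceStringSite[] {
    return this.exact.get(literal) ?? [];
  }

  // Term lookup: an in-memory stand-in for a full-text query over literals.
  lookupTerm(term: string): SourceStringSite[] {
    const out: SourceStringSite[] = [];
    const literals = this.byTerm.get(term.toLowerCase());
    if (!literals) return out;
    literals.forEach((l) => out.push(...this.lookupExact(l)));
    return out;
  }
}

const index = new SourceStringIndex();
index.add({ literal: "/live-scoring/append-event", file: "app/src/scoring.ts", line: 42, enclosingSymbol: "submitScore" });
index.add({ literal: "game_matches", file: "services/src/matches.ts", line: 17, enclosingSymbol: "createMatch" });

console.log(index.lookupExact("game_matches").map((s) => `${s.file}:${s.line}`));
console.log(index.lookupTerm("scoring").map((s) => s.enclosingSymbol));
```

Keeping the exact map separate from the term index is what lets a single-string lookup stay strict while multi-word queries can still fan out by term.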
Original message
From: mythologi ***@***.***>
Sent: June 16, 2026, 11:38
To: colbymchenry/codegraph ***@***.***>
Cc: Lee-take ***@***.***>, Author ***@***.***>
Subject: Re: [colbymchenry/codegraph] Improve context lookup for code-like source strings (PR #451)
mythologi left a comment (colbymchenry/codegraph#451)
+1 on this — the "code-like source string" path (route keys, event names, config keys) is a recurring need for us, and I wanted to add a concrete cross-repo use-case as signal.
Context: we run CodeGraph in workspace mode over a 23-repo / ~2.8k-file estate (TS / React Native / Directus / Unity). The questions that fall through today are all exact body string literals that cross a repo boundary:
Route keys — '/live-scoring/append-event' is a Directus endpoint defined in one repo and called via fetch('…/live-scoring/append-event') string literals in another. "Who calls this route?" isn't answerable from the symbol graph (the caller is a string; the endpoint isn't a route node).
Collection names — 'game_matches' (a Directus collection) referenced as a string in createItem('game_matches') across services.
Bridge event names — RN→Unity postMessage('<eventName>') string args.
On a default v1.0.1 install, codegraph query / explore do token-fuzzy symbol expansion for these — e.g. explore game_matches returns ~200 symbols including a token-adjacent matchesQuery from an unrelated repo — rather than the exact literal sites. (I also don't see the config_refs / sql_refs tables from #114/#115 in the v1.0.1 release, so there's no queryable string-literal surface for the general case yet.)
What would close it for us is a queryable, FTS-backed source-string index (exact match → file:line of the smallest enclosing symbol) — essentially what this issue proposes for the context path. We're prototyping a local source_strings FTS5 side-table (node:sqlite + the bundled tree-sitter, mirroring the config_refs/sql_refs shape) to unblock ourselves meanwhile — happy to share findings or contribute upstream if useful.
Thanks for the great tool.
|
Summary
- […] `deepseek-r1` […] `env_var_for` now outrank broader source-text matches, while generic lowercase words such as `message` are not over-boosted
Why
`codegraph_context` could miss important implementation code when the user query named a code-like string that appears only inside function bodies, provider aliases, feature flags, event names, or route keys. In that case the existing node-name FTS path may return generic symbols or nothing useful, even though the indexed source contains an exact match.
This keeps the existing tool flow intact: agents still call `codegraph_context`, but the context builder can use exact code-like source text as an additional high-signal entry point.
Validation
- `npx vitest run __tests__/context.test.ts`: 22 passed
- `npm run build`: passed
- `dist/bin/codegraph.js`: 7 function-specific context queries all ranked the expected function first:
  - `normalize_model_for_provider`: `crates/config/src/lib.rs:1189`
  - `sanitize_thinking_mode_messages`: `crates/tui/src/client/chat.rs:1461`
  - `active_provider_has_env_api_key`: `crates/tui/src/config.rs:3464`
  - `env_for`: `crates/secrets/src/lib.rs:526`
  - `provider_env_vars`: `crates/cli/src/lib.rs:761`
  - `env_var_for`: `crates/tui/src/tui/provider_picker.rs:87`
  - `requires_reasoning_content`: `crates/tui/src/client/chat.rs:1591`
- `npm test`: local Windows run was blocked outside this change by MCP process tests failing to remove temp directories with `EPERM` plus a worker OOM after most tests had already passed.
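For readers wondering how a query term might be classified as "code-like" versus a generic word, one possible heuristic is sketched below. This is an assumption for illustration only: the rules the PR actually uses are not shown in this thread, and the function names `isCodeLike` and `codeLikeTerms` are hypothetical.

```typescript
// Hypothetical heuristic: a term counts as code-like when it carries a
// separator, a digit, or an inner capital, which plain lowercase words
// such as "message" do not.
function isCodeLike(term: string): boolean {
  if (term.length < 3 || /\s/.test(term)) return false;
  const hasSeparator = /[_\-\/.:]/.test(term);
  const hasDigit = /\d/.test(term);
  const hasInnerCapital = /[a-z][A-Z]/.test(term);
  return hasSeparator || hasDigit || hasInnerCapital;
}

// Pull the code-like terms out of a natural-language query, trimming
// surrounding quotes and trailing punctuation from each word first.
function codeLikeTerms(query: string): string[] {
  return query
    .split(/\s+/)
    .map((word) => word.replace(/^['"`(]+|['"`),.?!;:]+$/g, ""))
    .filter(isCodeLike);
}

console.log(codeLikeTerms("where does env_var_for handle the deepseek-r1 message?"));
// [ 'env_var_for', 'deepseek-r1' ]
```

A rule this crude will also flag hyphenated English words, so a real implementation would likely need more signal than this before boosting a match.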