Improve context lookup for code-like source strings by Lee-take · Pull Request #451 · colbymchenry/codegraph

Lee-take · 2026-05-26T11:40:35Z

Summary

- Add a source-text fallback inside context lookup for code-like string literals/config keys such as `deepseek-r1`.
- Map source-text hits back to the smallest enclosing indexed symbol so context still returns implementation nodes instead of raw grep output.
- Deprioritize likely test/spec nodes for non-test queries and add coverage for string-alias lookup.
- Preserve exact code-symbol intent: specific queried symbols such as `env_var_for` now outrank broader source-text matches, while generic lowercase words such as `message` are not over-boosted.
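
The "smallest enclosing indexed symbol" mapping from the summary can be sketched as follows. This is a minimal illustration, not the PR's actual code: the `IndexedSymbol` shape, its field names, and the `smallestEnclosingSymbol` helper are all assumptions made for the example.

```typescript
// Hypothetical shape of an indexed symbol; the field names are assumptions.
interface IndexedSymbol {
  name: string;
  file: string;
  startLine: number; // inclusive, 1-based
  endLine: number; // inclusive, 1-based
}

// Return the symbol with the smallest line span that contains the hit,
// or undefined when no indexed symbol in that file covers the line.
function smallestEnclosingSymbol(
  symbols: IndexedSymbol[],
  file: string,
  line: number,
): IndexedSymbol | undefined {
  let best: IndexedSymbol | undefined;
  for (const sym of symbols) {
    if (sym.file !== file || line < sym.startLine || line > sym.endLine) continue;
    const span = sym.endLine - sym.startLine;
    if (best === undefined || span < best.endLine - best.startLine) best = sym;
  }
  return best;
}

// Illustrative data only: a wide container symbol and a function nested in it.
const exampleSymbols: IndexedSymbol[] = [
  { name: "ProviderConfig", file: "config.ts", startLine: 1, endLine: 120 },
  { name: "normalizeModel", file: "config.ts", startLine: 40, endLine: 60 },
  { name: "otherFn", file: "other.ts", startLine: 40, endLine: 60 },
];

const enclosing = smallestEnclosingSymbol(exampleSymbols, "config.ts", 45);
console.log(enclosing?.name); // normalizeModel
```

Picking the narrowest span means a hit inside a function body resolves to that function rather than to the class or module that wraps it.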

Why

codegraph_context could miss important implementation code when the user query named a code-like string that appears only inside function bodies, provider aliases, feature flags, event names, or route keys. In that case the existing node-name FTS path may return generic symbols or nothing useful, even though the indexed source contains an exact match.

This keeps the existing tool flow intact: agents still call codegraph_context, but the context builder can use exact code-like source text as an additional high-signal entry point.
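
One way to decide whether a query counts as "code-like" before trying the source-text fallback is a small lexical check. The PR does not spell out its heuristic, so the rules below (no whitespace, plus a separator, digit, or inner capital) and the `looksCodeLike` name are assumptions for illustration only.

```typescript
// Hypothetical heuristic: a query is "code-like" when it is a single token
// that carries a separator, a digit, or a camelCase-style inner capital.
function looksCodeLike(query: string): boolean {
  const q = query.trim();
  if (q.length < 3 || /\s/.test(q)) return false; // natural-language phrases have spaces
  const hasSeparator = /[-_/.:]/.test(q); // deepseek-r1, env_var_for, /live-scoring/append-event
  const hasDigit = /\d/.test(q);
  const hasInnerCapital = /[a-z][A-Z]/.test(q); // matchesQuery
  return hasSeparator || hasDigit || hasInnerCapital;
}

console.log(looksCodeLike("deepseek-r1")); // true
console.log(looksCodeLike("message")); // false
```

Under this sketch a plain lowercase word such as `message` does not trigger the fallback, which matches the stated goal of not over-boosting generic words.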

Validation

npx vitest run __tests__/context.test.ts — 22 passed
npm run build — passed
DeepSeek-TUI real-source validation after rebuilding dist/bin/codegraph.js: 7 function-specific context queries all ranked the expected function first:
- normalize_model_for_provider — crates/config/src/lib.rs:1189
- sanitize_thinking_mode_messages — crates/tui/src/client/chat.rs:1461
- active_provider_has_env_api_key — crates/tui/src/config.rs:3464
- env_for — crates/secrets/src/lib.rs:526
- provider_env_vars — crates/cli/src/lib.rs:761
- env_var_for — crates/tui/src/tui/provider_picker.rs:87
- requires_reasoning_content — crates/tui/src/client/chat.rs:1591
Also attempted npm test; local Windows run was blocked outside this change by MCP process tests failing to remove temp directories with EPERM plus a worker OOM after most tests had already passed.

mythologi · 2026-06-16T03:38:00Z

+1 on this — the "code-like source string" path (route keys, event names, config keys) is a recurring need for us, and I wanted to add a concrete cross-repo use-case as signal.

Context: we run CodeGraph in workspace mode over a 23-repo / ~2.8k-file estate (TS / React Native / Directus / Unity). The questions that fall through today are all exact body string literals that cross a repo boundary:

Route keys — '/live-scoring/append-event' is a Directus endpoint defined in one repo and called via fetch('…/live-scoring/append-event') string literals in another. "Who calls this route?" isn't answerable from the symbol graph (the caller is a string; the endpoint isn't a route node).
Collection names — 'game_matches' (a Directus collection) referenced as a string in createItem('game_matches') across services.
Bridge event names — RN→Unity postMessage('<eventName>') string args.

On a default v1.0.1 install, codegraph query / explore do token-fuzzy symbol expansion for these — e.g. explore game_matches returns ~200 symbols including a token-adjacent matchesQuery from an unrelated repo — rather than the exact literal sites. (I also don't see the config_refs / sql_refs tables from #114/#115 in the v1.0.1 release, so there's no queryable string-literal surface for the general case yet.)

What would close it for us is a queryable, FTS-backed source-string index (exact match → file:line of the smallest enclosing symbol) — essentially what this issue proposes for the context path. We're prototyping a local source_strings FTS5 side-table (node:sqlite + the bundled tree-sitter, mirroring the config_refs/sql_refs shape) to unblock ourselves meanwhile — happy to share findings or contribute upstream if useful.
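
The "exact match → file:line of the smallest enclosing symbol" lookup described above can be illustrated with an in-memory stand-in. This sketch uses a plain `Map` rather than an FTS5 table, and it is neither the prototype mentioned here nor any upstream implementation; the `LiteralSite` and `SourceStringIndex` names, the file paths, and the symbol names are hypothetical.

```typescript
// Hypothetical record for one occurrence of a string literal in source.
interface LiteralSite {
  literal: string;
  file: string;
  line: number;
  enclosingSymbol: string;
}

// In-memory stand-in for a source-string side-table, keyed by exact literal.
class SourceStringIndex {
  private readonly sites = new Map<string, LiteralSite[]>();

  add(site: LiteralSite): void {
    const list = this.sites.get(site.literal) ?? [];
    list.push(site);
    this.sites.set(site.literal, list);
  }

  // Exact match only: looking up "game_matches" never returns "matchesQuery".
  lookup(literal: string): LiteralSite[] {
    return this.sites.get(literal) ?? [];
  }
}

// Illustrative data only; the paths and symbols are made up for the example.
const literalIndex = new SourceStringIndex();
literalIndex.add({ literal: "game_matches", file: "services/match.ts", line: 12, enclosingSymbol: "createMatch" });
literalIndex.add({ literal: "game_matches", file: "services/stats.ts", line: 48, enclosingSymbol: "loadStats" });
literalIndex.add({ literal: "matchesQuery", file: "web/search.ts", line: 7, enclosingSymbol: "runSearch" });

for (const site of literalIndex.lookup("game_matches")) {
  console.log(`${site.file}:${site.line} in ${site.enclosingSymbol}`);
}
```

Keying on the whole literal is what separates this surface from token-fuzzy symbol expansion: the query either hits the exact call sites or returns nothing.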

Thanks for the great tool.

Lee-take · 2026-06-18T14:54:00Z

Thanks for the concrete use cases. I added a targeted regression test to this PR for exact cross-repo contract strings like `/live-scoring/append-event` and `game_matches` resolving to their enclosing caller functions.

For the broader queryable, FTS-backed source-string surface, I opened #923 as a separate, larger PR so this one can stay scoped.

Lee-take · 2026-06-18T15:29:15Z

Hi mythologi,

Thanks a lot for the detailed signal and the concrete cross-repo examples. This week has been unusually busy for me; I just got home, saw your note, and put together two follow-ups.

First, I updated #451 with a targeted regression test for the exact case you described: code-like cross-repo contract strings such as `/live-scoring/append-event` and `game_matches` should resolve to the smallest enclosing caller function in context lookup.

Second, I opened a separate, larger PR for the broader design you suggested: #923. That PR adds a queryable `source_strings` side-table with FTS5 support for compact code-like literals: route keys, collection names, bridge event names, config keys, etc. It indexes exact literal sites with file:line and enclosing symbol metadata, and wires them into the API/search/context/CLI/MCP paths while keeping exact literal semantics for single string lookups.

I also added tests around exact lookup, FTS term lookup, MCP search/explore output, sync replacement, clear/delete lifecycle behavior, and natural-language ranking regression.

Really appreciate the concrete examples. They helped separate the small scoped fix in #451 from the larger `source_strings` surface that probably deserves its own review.

Lee-take added 3 commits May 26, 2026 19:40

Improve context lookup for code-like source strings

b60ca52

Add source-text context ranking coverage

39ef004

Prioritize exact code symbols in context ranking

8af9848

Lee-take marked this pull request as ready for review May 26, 2026 12:04

Add cross-repo source-string context coverage

767676b

Lee-take mentioned this pull request Jun 18, 2026

Add source string index for code-like literals #923

Draft

Lee-take wants to merge 4 commits into colbymchenry:main from Lee-take:codex/source-text-context-fallback
