Improve context lookup for code-like source strings #451

Open
Lee-take wants to merge 4 commits into colbymchenry:main from Lee-take:codex/source-text-context-fallback

@Lee-take

Lee-take commented May 26, 2026


Summary

  • add a source-text fallback inside context lookup for code-like string literals/config keys such as deepseek-r1
  • map source-text hits back to the smallest enclosing indexed symbol so context still returns implementation nodes instead of raw grep output
  • deprioritize likely test/spec nodes for non-test queries and add coverage for string-alias lookup
  • preserve exact code-symbol intent: specific queried symbols such as env_var_for now outrank broader source-text matches, while generic lowercase words such as message are not over-boosted
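
As an illustration of the second and third bullets, here is a minimal sketch of mapping a source-text hit to its smallest enclosing symbol and pushing likely test nodes to the back. It is not the code in this PR: the `IndexedSymbol` shape, the function names, and the test-path heuristic are assumptions made for the example.

```typescript
// Hypothetical shape of an indexed symbol; the real CodeGraph node schema may differ.
interface IndexedSymbol {
  name: string;
  file: string;
  startLine: number; // inclusive, 1-based
  endLine: number; // inclusive, 1-based
}

// A raw source-text match: the file and line where the queried string was found.
interface SourceHit {
  file: string;
  line: number;
}

// Return the narrowest symbol whose line range contains the hit,
// or undefined when the hit falls outside every indexed symbol.
function smallestEnclosingSymbol(
  hit: SourceHit,
  symbols: IndexedSymbol[],
): IndexedSymbol | undefined {
  let best: IndexedSymbol | undefined;
  for (const sym of symbols) {
    if (sym.file !== hit.file) continue;
    if (hit.line < sym.startLine || hit.line > sym.endLine) continue;
    const span = sym.endLine - sym.startLine;
    if (best === undefined || span < best.endLine - best.startLine) {
      best = sym;
    }
  }
  return best;
}

// Assumed heuristic for "likely test/spec" files; the PR may use different signals.
function isLikelyTestPath(file: string): boolean {
  return (
    /(^|\/)(__tests__|tests?)\//.test(file) ||
    /\.(test|spec)\.[cm]?[jt]sx?$/.test(file)
  );
}

// For non-test queries, move likely test symbols after implementation symbols.
// Array.prototype.sort is stable, so the relative order within each group is kept.
function rankSymbols(
  symbols: IndexedSymbol[],
  queryMentionsTests: boolean,
): IndexedSymbol[] {
  if (queryMentionsTests) return [...symbols];
  return [...symbols].sort(
    (a, b) => Number(isLikelyTestPath(a.file)) - Number(isLikelyTestPath(b.file)),
  );
}

// Made-up example data.
const exampleSymbols: IndexedSymbol[] = [
  { name: "normalizeModel handles aliases", file: "__tests__/config.test.ts", startLine: 5, endLine: 20 },
  { name: "ProviderConfig", file: "src/config.ts", startLine: 1, endLine: 120 },
  { name: "normalizeModel", file: "src/config.ts", startLine: 40, endLine: 58 },
];
const exampleHit: SourceHit = { file: "src/config.ts", line: 47 };

console.log(smallestEnclosingSymbol(exampleHit, exampleSymbols)?.name); // normalizeModel
console.log(rankSymbols(exampleSymbols, false).map((s) => s.name));
```

Picking the narrowest range is what lets a hit inside a function body resolve to the function itself rather than to the class or module that contains it.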

Why

codegraph_context could miss important implementation code when the user query named a code-like string that appears only inside function bodies, provider aliases, feature flags, event names, or route keys. In that case the existing node-name FTS path may return generic symbols or nothing useful, even though the indexed source contains an exact match.

This keeps the existing tool flow intact: agents still call codegraph_context, but the context builder can use exact code-like source text as an additional high-signal entry point.
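
For illustration only, one way to decide which query terms count as "code-like" could look like the sketch below. The heuristic and the function names are assumptions for this example, not the logic in this PR: a term qualifies if it has an internal separator, an inner capital, or a letter/digit mix, so deepseek-r1 and env_var_for qualify while a plain lowercase word such as message does not.

```typescript
// Assumed heuristic: a term is "code-like" when it carries structure that
// ordinary prose words lack (separators, camelCase, or letters mixed with digits).
function isCodeLikeTerm(term: string): boolean {
  if (term.length < 3) return false;
  const hasSeparator = /[a-z0-9][_\-./:][a-z0-9]/i.test(term);
  const hasInnerCapital = /[a-z][A-Z]/.test(term);
  const hasLetterDigitMix = /[a-z]/i.test(term) && /\d/.test(term);
  return hasSeparator || hasInnerCapital || hasLetterDigitMix;
}

// Split a natural-language query on whitespace, strip surrounding quotes and
// punctuation from each token, and keep only the code-like ones.
function extractCodeLikeTerms(query: string): string[] {
  return query
    .split(/\s+/)
    .map((token) => token.replace(/^[^\w/]+|[^\w]+$/g, ""))
    .filter(isCodeLikeTerm);
}

const codeLikeTerms = extractCodeLikeTerms(
  "where is deepseek-r1 handled in the message path?",
);
console.log(codeLikeTerms); // [ 'deepseek-r1' ]
```

Only terms that pass a filter like this would be sent to the source-text fallback; everything else would stay on the existing node-name FTS path.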

Validation

  • npx vitest run __tests__/context.test.ts — 22 passed
  • npm run build — passed
  • DeepSeek-TUI real-source validation after rebuilding dist/bin/codegraph.js: 7 function-specific context queries all ranked the expected function first:
    • normalize_model_for_provider (crates/config/src/lib.rs:1189)
    • sanitize_thinking_mode_messages (crates/tui/src/client/chat.rs:1461)
    • active_provider_has_env_api_key (crates/tui/src/config.rs:3464)
    • env_for (crates/secrets/src/lib.rs:526)
    • provider_env_vars (crates/cli/src/lib.rs:761)
    • env_var_for (crates/tui/src/tui/provider_picker.rs:87)
    • requires_reasoning_content (crates/tui/src/client/chat.rs:1591)
  • Also attempted npm test; the local Windows run failed for reasons unrelated to this change: MCP process tests could not remove temp directories (EPERM), and a worker ran out of memory after most tests had already passed.

@Lee-take Lee-take marked this pull request as ready for review May 26, 2026 12:04
@mythologi


+1 on this — the "code-like source string" path (route keys, event names, config keys) is a recurring need for us, and I wanted to add a concrete cross-repo use-case as signal.

Context: we run CodeGraph in workspace mode over a 23-repo / ~2.8k-file estate (TS / React Native / Directus / Unity). The questions that fall through today are all exact body string literals that cross a repo boundary:

  • Route keys: '/live-scoring/append-event' is a Directus endpoint defined in one repo and called via fetch('…/live-scoring/append-event') string literals in another. "Who calls this route?" isn't answerable from the symbol graph (the caller is a string; the endpoint isn't a route node).
  • Collection names: 'game_matches' (a Directus collection) referenced as a string in createItem('game_matches') across services.
  • Bridge event names: RN→Unity postMessage('<eventName>') string args.

On a default v1.0.1 install, codegraph query / explore do token-fuzzy symbol expansion for these — e.g. explore game_matches returns ~200 symbols including a token-adjacent matchesQuery from an unrelated repo — rather than the exact literal sites. (I also don't see the config_refs / sql_refs tables from #114/#115 in the v1.0.1 release, so there's no queryable string-literal surface for the general case yet.)

What would close it for us is a queryable, FTS-backed source-string index (exact match → file:line of the smallest enclosing symbol) — essentially what this issue proposes for the context path. We're prototyping a local source_strings FTS5 side-table (node:sqlite + the bundled tree-sitter, mirroring the config_refs/sql_refs shape) to unblock ourselves meanwhile — happy to share findings or contribute upstream if useful.
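
To make the "exact match → file:line" behavior concrete, here is a small in-memory sketch. It uses a plain linear scan rather than an FTS5 table, and the file paths, file contents, and function names are made up for the example. A match only counts when it is not embedded in a longer identifier, so game_matches does not pull in token-adjacent names like matchesQuery.

```typescript
interface SourceFile {
  path: string;
  content: string;
}

interface LiteralSite {
  file: string;
  line: number; // 1-based
  text: string; // trimmed source line
}

function isWordChar(ch: string | undefined): boolean {
  return ch !== undefined && /[A-Za-z0-9_]/.test(ch);
}

// Report every line where `literal` occurs as a whole token, i.e. the characters
// immediately before and after the match are not identifier characters.
function findLiteralSites(literal: string, files: SourceFile[]): LiteralSite[] {
  const sites: LiteralSite[] = [];
  if (literal.length === 0) return sites;
  for (const file of files) {
    file.content.split("\n").forEach((text, index) => {
      let from = 0;
      while (true) {
        const at = text.indexOf(literal, from);
        if (at === -1) break;
        const before = at > 0 ? text[at - 1] : undefined;
        const after = text[at + literal.length];
        if (!isWordChar(before) && !isWordChar(after)) {
          sites.push({ file: file.path, line: index + 1, text: text.trim() });
          break; // one site per line is enough
        }
        from = at + 1;
      }
    });
  }
  return sites;
}

// Made-up two-repo example.
const exampleFiles: SourceFile[] = [
  {
    path: "apps/mobile/src/api.ts",
    content: [
      "export function appendEvent(base: string) {",
      "  return fetch(base + '/live-scoring/append-event');",
      "}",
    ].join("\n"),
  },
  {
    path: "services/stats/src/query.ts",
    content: [
      "const matchesQuery = buildQuery();",
      "createItem('game_matches', row);",
    ].join("\n"),
  },
];

const collectionSites = findLiteralSites("game_matches", exampleFiles);
const routeSites = findLiteralSites("/live-scoring/append-event", exampleFiles);
console.log(collectionSites.map((s) => `${s.file}:${s.line}`));
console.log(routeSites.map((s) => `${s.file}:${s.line}`));
```

Each reported file:line could then be resolved to the smallest enclosing symbol to answer "who calls this route?" with a function rather than a raw line.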

Thanks for the great tool.

@Lee-take

Author

Thanks for the concrete use cases. I added a targeted regression test to this PR for exact cross-repo contract strings like '/live-scoring/append-event' and 'game_matches' resolving to their enclosing caller functions.

For the broader queryable, FTS-backed source-string surface, I opened #923 as a separate, larger PR so this one can stay scoped.

@Lee-take

Lee-take commented Jun 18, 2026 via email

Author
