
Add optional CodeGraph structural context tool #211

Draft
GenjiCy wants to merge 2 commits into alibaba:main from GenjiCy:codex/optional-codegraph-tool


@GenjiCy GenjiCy commented Jun 24, 2026


Summary

Closes #210.

This adds an optional code_graph_context review tool backed by the external CodeGraph CLI. The tool is meant for structural code-review context — symbol lookup, exploration, callers, callees, and impact checks — while keeping code_search as the literal text-search tool.

Behavior

The CodeGraph tool is hidden from the model unless all of the following are true:

  • .codegraph/codegraph.db exists in the repository
  • codegraph is available on PATH
  • the installed CodeGraph major version is supported (1.x)
  • codegraph status <repo> succeeds
  • for range/commit review modes, the current checkout HEAD matches the review target ref used for file reads

If any check fails, OCR filters code_graph_context out of the tool definitions and does not register the provider, so existing users keep the current behavior.
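
The gating above can be sketched as one availability check. This is a minimal illustration and not the PR's code: `probe`, `supportedMajor`, and `codeGraphAvailable` are hypothetical names, and the way the version string is obtained from the CLI is left to the caller because the description does not specify it.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// probe bundles the environment checks so they can be faked in tests.
// All field names are hypothetical; the PR does not show its internals.
type probe struct {
	indexExists func(repo string) bool // e.g. os.Stat on .codegraph/codegraph.db
	onPath      func() bool            // e.g. exec.LookPath("codegraph")
	version     func() string          // version string reported by the CLI
	statusOK    func(repo string) bool // e.g. `codegraph status <repo>` exits 0
	headMatches func() bool            // checkout HEAD equals the review target ref
}

// supportedMajor reports whether v (such as "1.4.2" or "v1.0.0") is a 1.x release.
func supportedMajor(v string) bool {
	v = strings.TrimPrefix(strings.TrimSpace(v), "v")
	major, _, _ := strings.Cut(v, ".")
	n, err := strconv.Atoi(major)
	return err == nil && n == 1
}

// codeGraphAvailable returns true only when every precondition holds.
// rangeMode marks range/commit reviews, the only modes that need the HEAD check.
func codeGraphAvailable(p probe, repo string, rangeMode bool) bool {
	if !p.indexExists(repo) || !p.onPath() {
		return false
	}
	if !supportedMajor(p.version()) || !p.statusOK(repo) {
		return false
	}
	return !rangeMode || p.headMatches()
}

func main() {
	p := probe{
		indexExists: func(string) bool { return true },
		onPath:      func() bool { return true },
		version:     func() string { return "1.3.0" },
		statusOK:    func(string) bool { return true },
		headMatches: func() bool { return false },
	}
	fmt.Println(codeGraphAvailable(p, ".", false))
	fmt.Println(codeGraphAvailable(p, ".", true))
}
```

Injecting the checks as functions keeps the decision logic testable without a real CodeGraph install. When the result is false, the caller would drop code_graph_context from the tool definitions and skip registering the provider.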

CI/CD note

The current GitHub Actions and GitLab CI examples install OpenCodeReview only; they do not install CodeGraph or build a .codegraph index. In those default CI/CD jobs, this tool will therefore be hidden automatically.

Teams that want structural context in CI need to opt in explicitly, for example by using a job image that includes CodeGraph and running codegraph init/codegraph sync after checkout and before ocr review.

Implementation notes

  • Uses the external CodeGraph CLI instead of linking to CodeGraph internals or its database schema.
  • Supports explore, search, callers, callees, and impact modes.
  • Trims ANSI output, applies timeouts, and caps large outputs before returning them to the model.
  • Describes intended usage in tools.json so the agent uses it for public API, signature, interface, model, route, auth/security, concurrency, lifecycle, or cross-file impact risks rather than simple text lookup.
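
As an illustration of the ANSI-trimming step, the sketch below removes CSI escape sequences (colors, cursor movement) with a regular expression. The PR's actual `stripANSI` is not shown in this thread, so the pattern and the `ansiCSI` name are assumptions.

```go
package main

import (
	"fmt"
	"regexp"
)

// ansiCSI matches CSI escape sequences such as "\x1b[31m" or "\x1b[2K":
// ESC, '[', parameter bytes, optional intermediate bytes, one final byte.
var ansiCSI = regexp.MustCompile(`\x1b\[[0-9;?]*[ -/]*[@-~]`)

// stripANSI returns s with all CSI escape sequences removed.
func stripANSI(s string) string {
	return ansiCSI.ReplaceAllString(s, "")
}

func main() {
	fmt.Printf("%q\n", stripANSI("\x1b[1;32mfunc\x1b[0m Run"))
}
```

This covers only CSI sequences; other escape families (OSC titles, for example) would need additional patterns if the CLI emits them.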

Tests

go test ./...

@github-actions github-actions Bot left a comment

Contributor

🔍 OpenCodeReview found 6 issue(s) in this PR.

  • ✅ 5 posted as inline comment(s)
  • 📝 1 posted as summary

📄 internal/tool/definitions.go

Truncating by byte index (out[:codeGraphMaxOutput]) can split a multi-byte UTF-8 character, producing invalid UTF-8 output. Consider truncating at a valid rune boundary instead, e.g.:

if len(out) > codeGraphMaxOutput {
    truncated := out[:codeGraphMaxOutput]
    // Trim to last valid rune boundary
    for len(truncated) > 0 && !utf8.ValidString(truncated) {
        truncated = truncated[:len(truncated)-1]
    }
    out = truncated + "\n\n[truncated: CodeGraph output exceeded tool limit]"
}

}

// FilterByName returns entries excluding any tool whose name appears in names.
func FilterByName(entries []ToolConfigEntry, names ...string) []ToolConfigEntry {

Contributor

The function name FilterByName is ambiguous — it could be interpreted as "keep only entries matching these names" (inclusive filter) rather than "exclude entries matching these names" (exclusive filter). Consider renaming to something like ExcludeByName or FilterOutByName to make the exclusion semantics clear from the name alone, reducing the chance of misuse by future callers.

Suggestion:

Suggested change
- func FilterByName(entries []ToolConfigEntry, names ...string) []ToolConfigEntry {
+ func ExcludeByName(entries []ToolConfigEntry, names ...string) []ToolConfigEntry {
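
For reference, a function with the exclusion semantics discussed here might look like the sketch below. `ToolConfigEntry` is reduced to a single `Name` field, which is an assumption; the real struct is not shown in this thread.

```go
package main

import "fmt"

// ToolConfigEntry is cut down to the one field this sketch needs.
// The Name field is assumed, not taken from the PR.
type ToolConfigEntry struct {
	Name string
}

// ExcludeByName returns the entries whose Name is not listed in names.
// The input slice is never modified.
func ExcludeByName(entries []ToolConfigEntry, names ...string) []ToolConfigEntry {
	if len(names) == 0 {
		return entries
	}
	drop := make(map[string]struct{}, len(names))
	for _, n := range names {
		drop[n] = struct{}{}
	}
	kept := make([]ToolConfigEntry, 0, len(entries))
	for _, e := range entries {
		if _, ok := drop[e.Name]; !ok {
			kept = append(kept, e)
		}
	}
	return kept
}

func main() {
	all := []ToolConfigEntry{{Name: "code_search"}, {Name: "code_graph_context"}}
	fmt.Println(ExcludeByName(all, "code_graph_context"))
}
```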

Comment thread internal/tool/definitions.go Outdated
FileFind = Tool{name: "file_find"}
FileReadDiff = Tool{name: "file_read_diff"}
CodeSearch = Tool{name: "code_search"}
CodeGraph = Tool{name: "code_graph_context"}

Contributor

Naming inconsistency: All other tool variables match their string names (e.g., CodeSearch → "code_search", FileRead → "file_read"). Here, the variable is CodeGraph but the tool name is "code_graph_context". Consider renaming the variable to CodeGraphContext to maintain consistency and avoid confusion during maintenance.

Suggestion:

Suggested change
- CodeGraph = Tool{name: "code_graph_context"}
+ CodeGraphContext = Tool{name: "code_graph_context"}

Comment thread internal/tool/code_graph.go Outdated
Comment on lines +130 to +132
if len(out) > codeGraphMaxOutput {
    out = out[:codeGraphMaxOutput] + "\n\n[truncated: CodeGraph output exceeded tool limit]"
}

Contributor

Bug: UTF-8 unsafe truncation. out[:codeGraphMaxOutput] slices by byte index, which can cut in the middle of a multi-byte UTF-8 character, producing invalid UTF-8 output. This could cause rendering issues or downstream parsing errors.

Use a UTF-8-safe truncation approach, for example:

if len(out) > codeGraphMaxOutput {
    truncated := out[:codeGraphMaxOutput]
    // Trim back to valid UTF-8 boundary
    for len(truncated) > 0 && !utf8.ValidString(truncated) {
        truncated = truncated[:len(truncated)-1]
    }
    out = truncated + "\n\n[truncated: CodeGraph output exceeded tool limit]"
}

Or use strings.ToValidUTF8() after slicing.

Comment thread internal/tool/code_graph.go Outdated
if kind := strings.TrimSpace(stringArg(args, "kind")); kind != "" {
    cmdArgs = append(cmdArgs, "-k", kind)
}
cmdArgs = append(cmdArgs, query)

Contributor

Potential flag injection via query/kind arguments. User-supplied query and kind values are passed directly as CLI arguments to the external codegraph binary. If query starts with -, it could be interpreted as a flag by the subprocess (e.g., --help, -p, etc.), leading to unexpected behavior or information disclosure.

Consider adding a -- separator before positional arguments to prevent flag injection, e.g.:

cmdArgs = append(cmdArgs, "--", query)

This should be applied consistently across all modes where user input is appended as a positional argument.

Comment on lines +111 to +114
err := cmd.Run()
if ctx.Err() != nil {
    return "code_graph_context timed out. Try using mode=search with a specific symbol, or reduce max_files/limit.", nil
}

Contributor

Timeout error handling discards partial output and original error. When a timeout occurs (ctx.Err() != nil), this check runs before examining err from cmd.Run(). This means:

  1. Any partial stdout/stderr that was captured before the timeout is silently discarded, making debugging difficult.
  2. If both a timeout and another error occur, only the timeout message is returned.

Consider including any partial output in the timeout response to aid debugging, e.g.:

if ctx.Err() != nil {
    msg := "code_graph_context timed out. Try using mode=search with a specific symbol, or reduce max_files/limit."
    if partial := strings.TrimSpace(stdout.String()); partial != "" {
        msg += "\n\nPartial output:\n" + stripANSI(partial)
    }
    return msg, nil
}

Author

Addressed in 3ece64b: timeout handling now builds the response through codeGraphTimeoutMessage, including any captured stdout and stderr as partial output while keeping the existing timeout guidance. I also added a focused unit test for that message construction.

@CLAassistant


CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


chenyangyang does not appear to be a GitHub user. You need a GitHub account to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.