Intelligent File Grouping: Progress and Challenges with LLM-Driven Approach #196

lizhengfeng101 · 2026-06-23T16:24:30Z

lizhengfeng101
Jun 23, 2026
Maintainer

Background

When processing large PRs, OCR needs to intelligently group changed files so that functionally related files can be reviewed together within a single LLM context window. Our internal version achieves this through an embedding model for semantic clustering, which delivers strong results.

To bring this capability to open-source users, we developed an alternative approach that does not depend on an embedding model. Instead, it uses the user's already-configured LLM to drive file grouping. The implementation is available on the feat/grouping branch.

How It Works

Uses the LLM to understand logical relationships between files (e.g., implementation + test, interface + implementation, API handler + model), grouping related files for co-review
Supports a configurable line-count threshold to prevent any single group from exceeding context limits
Automatically isolates oversized files into their own groups; skips grouping entirely for small-scale changes
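To make the three behaviors above concrete, here is a minimal sketch of the post-processing that could sit behind the LLM's proposal. It is not the code on feat/grouping: the names `ChangedFile`, `enforce_limits`, and `max_lines` are hypothetical, and "small-scale change" is assumed to mean that the whole PR fits under the line-count threshold.

```python
from dataclasses import dataclass


@dataclass
class ChangedFile:
    path: str
    changed_lines: int


def enforce_limits(files, proposed_groups, max_lines=800):
    """Apply size rules to groups proposed by the LLM (hypothetical helper).

    files: list of ChangedFile for the PR.
    proposed_groups: list of lists of paths, as returned by the LLM.
    """
    # Assumption: a change is "small-scale" when everything fits in one group.
    if sum(f.changed_lines for f in files) <= max_lines:
        return [[f.path for f in files]]

    by_path = {f.path: f for f in files}
    seen = set()
    result = []
    for group in proposed_groups:
        current, current_lines = [], 0
        for path in group:
            f = by_path.get(path)
            if f is None or path in seen:
                continue  # ignore unknown or repeated paths in the LLM output
            seen.add(path)
            if f.changed_lines > max_lines:
                result.append([path])  # oversized file gets its own group
                continue
            if current and current_lines + f.changed_lines > max_lines:
                result.append(current)  # split before exceeding the threshold
                current, current_lines = [], 0
            current.append(path)
            current_lines += f.changed_lines
        if current:
            result.append(current)

    # Files the LLM left out are still reviewed, each in its own group.
    for f in files:
        if f.path not in seen:
            result.append([f.path])
    return result
```

With a threshold of 200 lines, a proposal that puts a handler, its test, and a 5000-line vendored file in one group would come back with the vendored file isolated and the handler and test still together.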

Why We're Not Releasing It Yet

We evaluated this approach on 200 real PRs and found that LLM-driven grouping scores noticeably lower on F1 than our internal embedding-based approach. The main reason is that the LLM sees only file paths and change statistics when making grouping decisions, so it lacks the deeper semantic understanding of file contents that embeddings provide. As a result, grouping accuracy is insufficient in complex scenarios.
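For anyone who wants to compare groupings on their own PRs, one common way to score a grouping against a reference is pairwise F1: treat every pair of files placed in the same group as a positive. The sketch below assumes that metric; this post does not specify how the F1 score in our evaluation was computed, and the function names are hypothetical.

```python
from itertools import combinations


def co_grouped_pairs(groups):
    """Return the set of unordered file pairs that share a group."""
    pairs = set()
    for group in groups:
        for a, b in combinations(sorted(group), 2):
            pairs.add((a, b))
    return pairs


def pairwise_f1(predicted, reference):
    """Pairwise F1 between a predicted grouping and a reference grouping."""
    pred = co_grouped_pairs(predicted)
    ref = co_grouped_pairs(reference)
    if not pred and not ref:
        return 1.0  # both groupings put every file on its own
    true_positives = len(pred & ref)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

A grouping that pairs files the reference keeps apart loses precision, and one that separates files the reference keeps together loses recall, so both kinds of mistake lower the score.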

Next Steps

We will continue exploring better alternatives for the open-source version, with the goal of approaching the internal solution's grouping quality without requiring an additional embedding model dependency. If the community has ideas or suggestions, we'd love to hear them in this discussion!

Related branch: feat/grouping
