Skip to content

feat: add GitHub Action to score PRs and suggest splits#39

Merged
vitali87 merged 5 commits into
mainfrom
feat/split-score-action
Mar 28, 2026
Merged

feat: add GitHub Action to score PRs and suggest splits#39
vitali87 merged 5 commits into
mainfrom
feat/split-score-action

Conversation

@vitali87

Copy link
Copy Markdown
Owner

Summary

  • Add a composite GitHub Action (action.yml) that scores every PR and posts a split plan comment when the PR is too large
  • Uses the graph backend by default — no API key needed
  • Includes scripts/score_pr.py which runs pr-split in dry-run mode, computes metrics, and generates a markdown comment
  • Updates/creates comments (idempotent via hidden HTML marker) so re-runs update the existing comment
  • Add example workflow at .github/workflows/split-score.yml
  • Document the action in README with inputs/outputs tables

How it works

  1. Installs pr-split via uv tool install
  2. Computes diff stats — if LOC <= max-loc, skips with "no split needed"
  3. Runs pr-split split --dry-run with the configured backend
  4. Parses the saved plan JSON and computes metrics (overflow, scatter, group count)
  5. If groups >= threshold, posts a PR comment with the full split plan table

Action outputs

  • total-loc, total-groups, objective, should-split — usable in downstream workflow steps

Test plan

  • Script syntax verified, lint clean
  • Full test suite passes (410 tests)
  • Manual test on a real PR after merge

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a GitHub Action for pr-split, allowing users to automatically score PR complexity and receive split suggestions as CI comments. The changes include a new composite action definition, a Python script for scoring logic, and updated documentation. The review feedback highlights several improvement opportunities: implementing robust error handling for integer environment variables, documenting the missing python-version input, pinning action versions for better security, and replacing hardcoded file paths with dynamic temporary paths to ensure cross-platform compatibility.

Comment thread scripts/score_pr.py Outdated


def main() -> None:
max_loc = os.environ.get("MAX_LOC", "400")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The max_loc variable is read as a string here, but converted to an integer in multiple places later (e.g., line 53, 100). This can raise a ValueError if a non-integer string is provided by the user, causing the action to fail. It would be more robust to parse it to an integer here, handle a potential ValueError with a clear error message, and then use the integer variable throughout the script (e.g., if total_loc <= max_loc:).

Suggested change
max_loc = os.environ.get("MAX_LOC", "400")
try:
max_loc = int(os.environ.get("MAX_LOC", "400"))
except ValueError:
print("Error: MAX_LOC must be an integer.", file=sys.stderr)
sys.exit(1)

Comment thread scripts/score_pr.py Outdated
min_loc = os.environ.get("MIN_LOC", "")
strategy = os.environ.get("PARTITION_STRATEGY", "graph")
priority = os.environ.get("PRIORITY", "orthogonal")
threshold = int(os.environ.get("THRESHOLD_GROUPS", "2"))

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The conversion int(...) can raise a ValueError if the environment variable THRESHOLD_GROUPS is not a valid integer. This will crash the script. It's safer to wrap this in a try-except block to handle potential errors gracefully and provide a more informative error message.

Suggested change
threshold = int(os.environ.get("THRESHOLD_GROUPS", "2"))
try:
threshold = int(os.environ.get("THRESHOLD_GROUPS", "2"))
except ValueError:
print("Error: THRESHOLD_GROUPS must be an integer.", file=sys.stderr)
sys.exit(1)

Comment thread README.md
with:
fetch-depth: 0

- uses: vitali87/pr-split@main

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

It is a security and stability best practice to pin actions to a specific version (a tag or a commit SHA) instead of a branch like main. Using @main can introduce breaking changes into your workflow unexpectedly. Please consider updating this to use a specific version tag once you create a release.

For example:

      - uses: vitali87/pr-split@v1.0.0

Comment thread README.md
| `partition-strategy` | `graph` | Backend for partitioning (`graph` or `cp_sat`) |
| `priority` | `orthogonal` | Grouping priority (`orthogonal` or `logical`) |
| `threshold-groups` | `2` | Minimum suggested groups before posting the split plan |
| `post-comment` | `true` | Whether to post a PR comment with the results |

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The action.yml file defines a python-version input, but it's not documented here in the Action inputs table. It would be helpful to add it for completeness.

Suggested change
| `post-comment` | `true` | Whether to post a PR comment with the results |
| `post-comment` | `true` | Whether to post a PR comment with the results |
| `python-version` | `3.12` | Python version to use |

Comment thread action.yml Outdated
with:
script: |
const fs = require('fs');
const body = fs.readFileSync('/tmp/pr-split-comment.md', 'utf8');

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The path /tmp/pr-split-comment.md is hardcoded, which can be brittle and may not work on all runner operating systems. This path should be received as an output from the score step. I've left a corresponding comment on scripts/score_pr.py with the required changes for that file. Once the python script outputs a comment_path, this line would become:

const body = fs.readFileSync('${{ steps.score.outputs.comment_path }}', 'utf8');

Comment thread scripts/score_pr.py Outdated
lines.append("This PR is within acceptable size limits.")

comment = "\n".join(lines)
with open("/tmp/pr-split-comment.md", "w") as f:

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Using a hardcoded path /tmp/pr-split-comment.md can be brittle and might not work on all runner environments (e.g., Windows). It's better to use Python's tempfile module to create a temporary file and then pass its path as an output to the next step. This would also require a change in action.yml to read from this output.

For example, you could replace lines 158-159 with:

comment_path = os.path.join(tempfile.gettempdir(), "pr-split-comment.md")
with open(comment_path, "w") as f:
    f.write(comment)
_set_output("comment_path", comment_path)

Remember to import tempfile and import os at the top of the file. I've left a related comment on action.yml.

@greptile-apps

greptile-apps Bot commented Mar 28, 2026

Copy link
Copy Markdown

Greptile Summary

This PR introduces a composite GitHub Action (action.yml) and supporting Python script (scripts/score_pr.py) that automatically scores every PR for size and posts a markdown split-plan comment when it exceeds a configurable LOC threshold. It also adds an example workflow and documents the action in the README.

The implementation is well-structured — previous review concerns (shell injection via python-version, fork PR fetch via refs/pull/{n}/head, markdown table pipe escaping, and /tmp path collisions) have all been addressed. A few minor clean-up items remain:

  • depends_on group IDs in the markdown split-plan table are not passed through _md_escape, unlike every other dynamic field (title, id, file_path).
  • min_loc_raw is forwarded to pr-split without integer validation, inconsistent with the validated MAX_LOC and THRESHOLD_GROUPS inputs — a bad value produces a misleading "pr-split failed" message.
  • astral-sh/setup-uv and actions/github-script are pinned to floating version tags (@v4, @v7) rather than immutable commit SHAs, which is a supply-chain concern for a publicly-reusable action.

Confidence Score: 5/5

Safe to merge — all remaining findings are P2 style/hardening suggestions with no impact on correctness or runtime behaviour.

All P0/P1 issues raised in previous review threads have been addressed. The three remaining comments are P2: a minor markdown-escaping gap in depends_on, an inconsistent integer-validation path for min_loc_raw, and unpinned action SHAs. None of these affect the primary user path. Per the confidence guidance, a PR with only P2 findings scores 5/5.

scripts/score_pr.py — minor escaping and validation gaps; action.yml — unpinned third-party action tags.

Important Files Changed

Filename Overview
scripts/score_pr.py Core scoring script — fork-PR fetch handled correctly via refs/pull/{n}/head; minor gaps: depends_on values not escaped in markdown table, min_loc_raw not validated as integer
action.yml Composite action wiring is solid; python-version injection addressed via env indirection; unpinned @v4/@v7 action tags remain a supply-chain concern
.github/workflows/split-score.yml Example workflow triggering on pull_request — intentionally uses local ./ reference for self-testing; pull-requests: write permission declared correctly
README.md Documentation update adding GitHub Action section with inputs/outputs tables and usage example — accurate and consistent with action.yml

Sequence Diagram

sequenceDiagram
    participant GH as GitHub Event
    participant WF as split-score.yml
    participant AC as action.yml (composite)
    participant PY as score_pr.py
    participant GS as github-script

    GH->>WF: pull_request opened/synced
    WF->>AC: uses: ./
    AC->>AC: Install uv + Python
    AC->>AC: uv tool install pr-split
    AC->>PY: python score_pr.py
    PY->>PY: git fetch origin base_branch
    PY->>PY: git fetch refs/pull/{n}/head (fork-safe)
    PY->>PY: git diff --numstat → total_loc
    alt total_loc <= max_loc
        PY-->>AC: should_split=false (skip)
    else total_loc > max_loc
        PY->>PY: pr-split split --dry-run
        PY->>PY: Parse .pr-split/plan.json
        PY->>PY: Compute overflow / scatter / objective
        PY->>PY: Write comment markdown to RUNNER_TEMP
        PY-->>AC: outputs: should_split, total_loc, total_groups, objective, comment_path
    end
    AC->>GS: Post comment (if should_split==true)
    GS->>GS: listComments → find <!-- pr-split-score -->
    alt comment exists
        GS->>GH: updateComment (idempotent)
    else no comment
        GS->>GH: createComment
    end
Loading
Prompt To Fix All With AI
This is a comment left during a code review.
Path: scripts/score_pr.py
Line: 170

Comment:
**`depends_on` IDs not escaped like other table fields**

`deps` is the only column in the split-plan table whose values aren't passed through `_md_escape`. `depends_on` entries are group IDs — the same values that are escaped with `_md_escape(g["id"])` on line 175. If pr-split ever generates IDs containing `|` (or changes its ID format), this column would break the markdown table while all other columns are already protected.

```suggestion
    deps = ", ".join(_md_escape(d) for d in g.get("depends_on", [])) or "—"
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: scripts/score_pr.py
Line: 102-103

Comment:
**`min-loc` bypasses integer validation**

`MAX_LOC` and `THRESHOLD_GROUPS` are validated through `_parse_int_env` (which exits with a clear message if the value isn't an integer), but `min_loc_raw` is passed straight to `pr-split` without any validation. If someone passes a non-integer string for `min-loc`, pr-split will fail and the script will silently fall through to `_skip("pr-split failed to generate a plan.")` — a misleading error that hides the real cause.

```suggestion
    if min_loc_raw:
        try:
            int(min_loc_raw)
        except ValueError:
            print(f"Error: MIN_LOC must be an integer, got '{min_loc_raw}'.", file=sys.stderr)
            sys.exit(1)
        cmd.extend(["--min-loc", min_loc_raw])
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: action.yml
Line: 47-48

Comment:
**Unpinned action tags are a supply-chain risk**

Both `astral-sh/setup-uv@v4` (line 48) and `actions/github-script@v7` (line 76) reference floating version tags rather than immutable commit SHAs. Any repo that uses this action will silently pick up whatever those tags point to if a tag is force-pushed. For a reusable action published to the Marketplace, pinning to SHA is the recommended practice:

```yaml
- uses: astral-sh/setup-uv@f0ec1fc3b38f5e7cd731bb6ce540c5af426746bb  # v4.x
```
```yaml
- uses: actions/github-script@60a0d83039c74a4aee543508d2ffcb1c3799cdea  # v7.x
```

Pin both to the exact commit SHA that corresponds to the version you trust.

How can I resolve this? If you propose a fix, please make it concise.

Reviews (2): Last reviewed commit: "fix: prevent shell injection, support fo..." | Re-trigger Greptile

Comment thread scripts/score_pr.py Outdated
Comment thread action.yml Outdated
Comment thread action.yml
Comment thread scripts/score_pr.py Outdated
Comment thread scripts/score_pr.py Outdated
@vitali87 vitali87 force-pushed the feat/split-score-action branch from 12d823d to ee03398 Compare March 28, 2026 20:04
@vitali87 vitali87 merged commit 3f9577b into main Mar 28, 2026
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant