feat: add GitHub Action to score PRs and suggest splits#39
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces a GitHub Action for pr-split, allowing users to automatically score PR complexity and receive split suggestions as CI comments. The changes include a new composite action definition, a Python script for scoring logic, and updated documentation. The review feedback highlights several improvement opportunities: implementing robust error handling for integer environment variables, documenting the missing python-version input, pinning action versions for better security, and replacing hardcoded file paths with dynamic temporary paths to ensure cross-platform compatibility.
|
|
||
|
|
||
| def main() -> None: | ||
| max_loc = os.environ.get("MAX_LOC", "400") |
There was a problem hiding this comment.
The max_loc variable is read as a string here, but converted to an integer in multiple places later (e.g., line 53, 100). This can raise a ValueError if a non-integer string is provided by the user, causing the action to fail. It would be more robust to parse it to an integer here, handle a potential ValueError with a clear error message, and then use the integer variable throughout the script (e.g., if total_loc <= max_loc:).
| max_loc = os.environ.get("MAX_LOC", "400") | |
| try: | |
| max_loc = int(os.environ.get("MAX_LOC", "400")) | |
| except ValueError: | |
| print("Error: MAX_LOC must be an integer.", file=sys.stderr) | |
| sys.exit(1) |
| min_loc = os.environ.get("MIN_LOC", "") | ||
| strategy = os.environ.get("PARTITION_STRATEGY", "graph") | ||
| priority = os.environ.get("PRIORITY", "orthogonal") | ||
| threshold = int(os.environ.get("THRESHOLD_GROUPS", "2")) |
There was a problem hiding this comment.
The conversion int(...) can raise a ValueError if the environment variable THRESHOLD_GROUPS is not a valid integer. This will crash the script. It's safer to wrap this in a try-except block to handle potential errors gracefully and provide a more informative error message.
| threshold = int(os.environ.get("THRESHOLD_GROUPS", "2")) | |
| try: | |
| threshold = int(os.environ.get("THRESHOLD_GROUPS", "2")) | |
| except ValueError: | |
| print("Error: THRESHOLD_GROUPS must be an integer.", file=sys.stderr) | |
| sys.exit(1) |
| with: | ||
| fetch-depth: 0 | ||
|
|
||
| - uses: vitali87/pr-split@main |
There was a problem hiding this comment.
It is a security and stability best practice to pin actions to a specific version (a tag or a commit SHA) instead of a branch like main. Using @main can introduce breaking changes into your workflow unexpectedly. Please consider updating this to use a specific version tag once you create a release.
For example:
- uses: vitali87/pr-split@v1.0.0| | `partition-strategy` | `graph` | Backend for partitioning (`graph` or `cp_sat`) | | ||
| | `priority` | `orthogonal` | Grouping priority (`orthogonal` or `logical`) | | ||
| | `threshold-groups` | `2` | Minimum suggested groups before posting the split plan | | ||
| | `post-comment` | `true` | Whether to post a PR comment with the results | |
There was a problem hiding this comment.
The action.yml file defines a python-version input, but it's not documented here in the Action inputs table. It would be helpful to add it for completeness.
| | `post-comment` | `true` | Whether to post a PR comment with the results | | |
| | `post-comment` | `true` | Whether to post a PR comment with the results | | |
| | `python-version` | `3.12` | Python version to use | |
| with: | ||
| script: | | ||
| const fs = require('fs'); | ||
| const body = fs.readFileSync('/tmp/pr-split-comment.md', 'utf8'); |
There was a problem hiding this comment.
The path /tmp/pr-split-comment.md is hardcoded, which can be brittle and may not work on all runner operating systems. This path should be received as an output from the score step. I've left a corresponding comment on scripts/score_pr.py with the required changes for that file. Once the python script outputs a comment_path, this line would become:
const body = fs.readFileSync('${{ steps.score.outputs.comment_path }}', 'utf8');| lines.append("This PR is within acceptable size limits.") | ||
|
|
||
| comment = "\n".join(lines) | ||
| with open("/tmp/pr-split-comment.md", "w") as f: |
There was a problem hiding this comment.
Using a hardcoded path /tmp/pr-split-comment.md can be brittle and might not work on all runner environments (e.g., Windows). It's better to use Python's tempfile module to create a temporary file and then pass its path as an output to the next step. This would also require a change in action.yml to read from this output.
For example, you could replace lines 158-159 with:
comment_path = os.path.join(tempfile.gettempdir(), "pr-split-comment.md")
with open(comment_path, "w") as f:
f.write(comment)
_set_output("comment_path", comment_path)Remember to import tempfile and import os at the top of the file. I've left a related comment on action.yml.
Greptile SummaryThis PR introduces a composite GitHub Action ( The implementation is well-structured — previous review concerns (shell injection via
Confidence Score: 5/5Safe to merge — all remaining findings are P2 style/hardening suggestions with no impact on correctness or runtime behaviour. All P0/P1 issues raised in previous review threads have been addressed. The three remaining comments are P2: a minor markdown-escaping gap in scripts/score_pr.py — minor escaping and validation gaps; action.yml — unpinned third-party action tags. Important Files Changed
Sequence DiagramsequenceDiagram
participant GH as GitHub Event
participant WF as split-score.yml
participant AC as action.yml (composite)
participant PY as score_pr.py
participant GS as github-script
GH->>WF: pull_request opened/synced
WF->>AC: uses: ./
AC->>AC: Install uv + Python
AC->>AC: uv tool install pr-split
AC->>PY: python score_pr.py
PY->>PY: git fetch origin base_branch
PY->>PY: git fetch refs/pull/{n}/head (fork-safe)
PY->>PY: git diff --numstat → total_loc
alt total_loc <= max_loc
PY-->>AC: should_split=false (skip)
else total_loc > max_loc
PY->>PY: pr-split split --dry-run
PY->>PY: Parse .pr-split/plan.json
PY->>PY: Compute overflow / scatter / objective
PY->>PY: Write comment markdown to RUNNER_TEMP
PY-->>AC: outputs: should_split, total_loc, total_groups, objective, comment_path
end
AC->>GS: Post comment (if should_split==true)
GS->>GS: listComments → find <!-- pr-split-score -->
alt comment exists
GS->>GH: updateComment (idempotent)
else no comment
GS->>GH: createComment
end
Prompt To Fix All With AIThis is a comment left during a code review.
Path: scripts/score_pr.py
Line: 170
Comment:
**`depends_on` IDs not escaped like other table fields**
`deps` is the only column in the split-plan table whose values aren't passed through `_md_escape`. `depends_on` entries are group IDs — the same values that are escaped with `_md_escape(g["id"])` on line 175. If pr-split ever generates IDs containing `|` (or changes its ID format), this column would break the markdown table while all other columns are already protected.
```suggestion
deps = ", ".join(_md_escape(d) for d in g.get("depends_on", [])) or "—"
```
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: scripts/score_pr.py
Line: 102-103
Comment:
**`min-loc` bypasses integer validation**
`MAX_LOC` and `THRESHOLD_GROUPS` are validated through `_parse_int_env` (which exits with a clear message if the value isn't an integer), but `min_loc_raw` is passed straight to `pr-split` without any validation. If someone passes a non-integer string for `min-loc`, pr-split will fail and the script will silently fall through to `_skip("pr-split failed to generate a plan.")` — a misleading error that hides the real cause.
```suggestion
if min_loc_raw:
try:
int(min_loc_raw)
except ValueError:
print(f"Error: MIN_LOC must be an integer, got '{min_loc_raw}'.", file=sys.stderr)
sys.exit(1)
cmd.extend(["--min-loc", min_loc_raw])
```
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: action.yml
Line: 47-48
Comment:
**Unpinned action tags are a supply-chain risk**
Both `astral-sh/setup-uv@v4` (line 48) and `actions/github-script@v7` (line 76) reference floating version tags rather than immutable commit SHAs. Any repo that uses this action will silently pick up whatever those tags point to if a tag is force-pushed. For a reusable action published to the Marketplace, pinning to SHA is the recommended practice:
```yaml
- uses: astral-sh/setup-uv@f0ec1fc3b38f5e7cd731bb6ce540c5af426746bb # v4.x
```
```yaml
- uses: actions/github-script@60a0d83039c74a4aee543508d2ffcb1c3799cdea # v7.x
```
Pin both to the exact commit SHA that corresponds to the version you trust.
How can I resolve this? If you propose a fix, please make it concise.Reviews (2): Last reviewed commit: "fix: prevent shell injection, support fo..." | Re-trigger Greptile |
12d823d to
ee03398
Compare
Summary
action.yml) that scores every PR and posts a split plan comment when the PR is too largegraphbackend by default — no API key neededscripts/score_pr.pywhich runs pr-split in dry-run mode, computes metrics, and generates a markdown comment.github/workflows/split-score.ymlHow it works
uv tool installpr-split split --dry-runwith the configured backendAction outputs
total-loc,total-groups,objective,should-split— usable in downstream workflow stepsTest plan