approve: fix silent approval bypass when PR exceeds GitHub file list API limit#707
approve: fix silent approval bypass when PR exceeds GitHub file list API limit#707jmguzik wants to merge 1 commit into
Conversation
✅ Deploy Preview for k8s-prow ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
|
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: jmguzik The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
c3553c3 to
decefa0
Compare
| seen[f] = struct{}{} | ||
| } | ||
| for _, c := range commits { | ||
| commit, err := ghc.GetSingleCommit(org, repo, c.SHA) |
There was a problem hiding this comment.
Doesn't this API call have the same limitation?
There was a problem hiding this comment.
https://docs.github.com/en/rest/commits/commits?apiVersion=2026-03-10#get-a-commit
If there are more than 300 files in the commit diff and the default JSON media type is requested, the
response will include pagination link headers for the remaining files, up to a limit of 3000 files. Each
page contains the static commit information, and the only changes are to the file listing.
petr-muller
left a comment
There was a problem hiding this comment.
This is a really good find!
As @Prucek mentioned in #707 (comment) this is a bit more complicated and computing the appropriate set of approvers may be hard (use git to get affected paths) for large PRs.
A feasible solution is to require root-level OWNERS file approver for large PRs. There's a minor wrinkle that this basically overrides possible no_parent_owners somewhere in the tree, but I consider that setting to be more like a strong delegation power by higher-level approvers (who need to approve the no_parent_owners when it is first set). Root-level approvers have stronger permissions in other cases as well so treating them as repo-level fallback decision makers makes sense to me.
The trick is in efficiently detecting that the PR is too large. The issue here is that GetPullRequestChanges cannot be trusted when the PR is above certain size. Fortunately there is a good oracle - changed file count on the pull request object (GetPullRequest), which can be used to identify the situation, either by crosschecking against GetPullRequestChanges result (mismatch indicates the problem) or even before (a number above a threshold indicates calling GetPullRequestChanges may not even make sense).
Additionally, approve is not the only GetPullRequestChanges caller, so we may have similar bugs elsewhere.
So I think we should do the following:
- Make
GetPullRequestChangesdetect the case and return error (if it sees too many changes like >= 3000 do a GetPullRequest -- it's cacheable -- and crosscheck). That means any caller will need to handle the case appropriately. - Any caller is advised to do
GetPullRequestfirst if appropriate to avoid calling potentially paginatedGetPullRequestChangesthat's doomed to fail. - The
approveplugin would fall back to root-level approver instead of handling the actual paths as normal. I do not believe theno_parent_ownerscase is worth consideration.
We'd probably need to do something similar for GetSingleCommit but we lack the good crosscheck number there, so we'd probably need to treat any at-limit result as suspicious and untrustworthy.
…API limit GitHub's "List pull request files" API silently truncates results at 3000 files. The approve plugin previously assumed the returned file list was complete, which allowed unapproved changes to be merged if they appeared alphabetically after the 3000-file cutoff. Fix: detect truncation by comparing len(files) against the PullRequest.ChangedFiles field (the authoritative count from the API). When truncation is detected, skip automated approval entirely and post a comment informing the user that a repository administrator must manually apply the approved label after verifying OWNERS requirements. If a bot-applied approved label already exists from a prior evaluation, it is removed to prevent stale approvals from persisting. Signed-off-by: Jakub Guzik <jguzik@redhat.com> Co-authored-by: Cursor <cursoragent@cursor.com>
decefa0 to
f8fb3fd
Compare
GitHub's "List pull request files" REST API endpoint returns at most 3000 files across all pages. When a pull request exceeds this limit, GitHub silently stops returning further pages. Prow's GetPullRequestChanges follows pagination via Link headers, so it terminates normally without any indication that the result is incomplete.
This creates a security vulnerability: a contributor who has OWNERS approval rights over an alphabetically-early directory can craft a PR with more than 3000 file changes in directories they own, then append additional changes in directories they do not own. Because the approve plugin only evaluates the files returned by the API, the hidden files are never checked against OWNERS, and the PR can receive the approved label without proper review coverage. This was observed in practice against openshift/hypershift#8355.
Fix:
Add ChangedFiles to the PullRequest struct so that the value GitHub returns on the PR object itself (which is always accurate, unlike the paginated file list) can be compared against the number of files the API actually returned.
Introduce a getFilenames helper that implements a layered strategy:
When truncation cannot be resolved:
For PRs within the limit the behaviour is unchanged: the fast path returns immediately after the first API call.