
approve: fix silent approval bypass when PR exceeds GitHub file list API limit #707

Open
jmguzik wants to merge 1 commit into kubernetes-sigs:main from jmguzik:global-trusted-apps



@jmguzik jmguzik commented May 5, 2026

Contributor

GitHub's "List pull request files" REST API endpoint returns at most 3000 files across all pages. When a pull request exceeds this limit, GitHub silently stops returning further pages. Prow's GetPullRequestChanges follows pagination via Link headers, so it terminates normally without any indication that the result is incomplete.

This creates a security vulnerability: a contributor who has OWNERS approval rights over an alphabetically early directory can craft a PR with more than 3000 file changes in directories they own, then append additional changes in directories they do not own. Because the approve plugin only evaluates the files returned by the API, the hidden files are never checked against OWNERS, and the PR can receive the approved label without proper review coverage. This was observed in practice against openshift/hypershift#8355.
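The attack can be reduced to a few lines of Go. This sketch assumes, as the description implies, that the truncated list keeps the alphabetically earliest paths. `allCovered` is a hypothetical and drastically simplified stand-in for OWNERS resolution (it ignores parent directories, no_parent_owners, and reviewers), and `craftedPR` and the directory names are made up for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// allCovered is a hypothetical, simplified stand-in for the OWNERS check:
// it reports whether every file lives under one of the approver's directories.
func allCovered(files, ownedDirs []string) bool {
	for _, f := range files {
		covered := false
		for _, d := range ownedDirs {
			if strings.HasPrefix(f, d+"/") {
				covered = true
				break
			}
		}
		if !covered {
			return false
		}
	}
	return true
}

// craftedPR builds 3000 files under api/ followed by one file under pkg/,
// which sorts after them and therefore falls past the 3000-file cutoff.
func craftedPR() []string {
	files := make([]string, 0, 3001)
	for i := 0; i < 3000; i++ {
		files = append(files, fmt.Sprintf("api/gen%04d.go", i))
	}
	return append(files, "pkg/auth/auth.go")
}

func main() {
	full := craftedPR()
	returned := full[:3000] // what the truncated API response would contain
	owned := []string{"api"}
	fmt.Println(allCovered(returned, owned)) // true: approval granted
	fmt.Println(allCovered(full, owned))     // false: pkg/ was never checked
}
```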

Fix:

  1. Add ChangedFiles to the PullRequest struct so that the value GitHub returns on the PR object itself (which is always accurate, unlike the paginated file list) can be compared against the number of files the API actually returned.

  2. Introduce a getFilenames helper that implements a layered strategy:

    • Fast path: use GetPullRequestChanges (sufficient when the PR has at most 3000 files).
    • Fallback: if the file list is shorter than PR.ChangedFiles, call ListPullRequestCommits and GetSingleCommit for each commit (up to maxCommitsForFallback=10) and union the filenames. This handles the common case of large auto-generated PRs that touch many files across a small number of commits.
    • Give up: if the PR has more than 10 commits, or if the per-commit file lists are also truncated, declare truncation.
  3. When truncation cannot be resolved:

    • Post an [ApprovalNotifier] comment explaining why automated approval is blocked, with the specific reason (too many commits, or individual commits also truncated). The comment is deduplicated against the latest existing notifier comment to avoid spam.
    • Remove the approved label if it is present and was not added manually by a human (preserving manual administrator overrides). If the WasLabelAddedByHuman API call fails, the label is left in place rather than risking removal of a legitimate manual override.
    • Return nil so the hook handler does not log a spurious error.

For PRs within the limit the behaviour is unchanged: the fast path returns immediately after the first API call.


netlify Bot commented May 5, 2026


Deploy Preview for k8s-prow ready!

🔨 Latest commit: f8fb3fd
🔍 Latest deploy log: https://app.netlify.com/projects/k8s-prow/deploys/6a313c088f7eee0008bdcb11
😎 Deploy Preview: https://deploy-preview-707--k8s-prow.netlify.app


linux-foundation-easycla Bot commented May 5, 2026


CLA Not Signed

@k8s-ci-robot

Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: jmguzik
Once this PR has been reviewed and has the lgtm label, please assign stevekuznetsov for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the area/plugins Issues or PRs related to prow's plugins for the hook component label May 5, 2026
@k8s-ci-robot k8s-ci-robot added cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels May 5, 2026
@jmguzik jmguzik force-pushed the global-trusted-apps branch from c3553c3 to decefa0 May 5, 2026 10:23
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels May 5, 2026
Review thread on pkg/plugins/approve/approve.go (outdated)
seen[f] = struct{}{}
}
for _, c := range commits {
commit, err := ghc.GetSingleCommit(org, repo, c.SHA)

Member


Doesn't this API call have the same limitation?

Member


https://docs.github.com/en/rest/commits/commits?apiVersion=2026-03-10#get-a-commit

If there are more than 300 files in the commit diff and the default JSON media type is requested, the response will include pagination link headers for the remaining files, up to a limit of 3000 files. Each page contains the static commit information, and the only changes are to the file listing.

@petr-muller petr-muller left a comment

Contributor


This is a really good find!

As @Prucek mentioned in #707 (comment), this is a bit more complicated: for large PRs, computing the appropriate set of approvers may be hard (it would require using git to get the affected paths).

A feasible solution is to require a root-level OWNERS file approver for large PRs. There's a minor wrinkle: this effectively overrides any no_parent_owners setting deeper in the tree. However, I consider that setting to be more of a strong delegation power granted by higher-level approvers (who need to approve no_parent_owners when it is first set). Root-level approvers have stronger permissions in other cases as well, so treating them as repo-level fallback decision makers makes sense to me.

The trick is in efficiently detecting that the PR is too large. The issue here is that GetPullRequestChanges cannot be trusted when the PR is above a certain size. Fortunately there is a good oracle: the changed file count on the pull request object (GetPullRequest). It can be used to identify the situation either by cross-checking it against the GetPullRequestChanges result (a mismatch indicates the problem) or even beforehand (a count above the threshold indicates that calling GetPullRequestChanges may not even make sense).
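Both uses of the changed-file count as an oracle can be sketched in Go. `pullRequest` and `trustedFileList` are hypothetical names, and the sketch assumes the PR object has already been fetched; `fetch` stands in for the paginated file-list call.

```go
package main

import "fmt"

const fileListCap = 3000

// pullRequest holds the one field this sketch needs from GetPullRequest
// (GitHub's changed_files count); the type is hypothetical.
type pullRequest struct{ ChangedFiles int }

// trustedFileList applies both checks: a pre-check that skips pagination when
// changed_files already exceeds the cap, and a cross-check of the fetched
// length against changed_files.
func trustedFileList(pr pullRequest, fetch func() []string) ([]string, bool) {
	if pr.ChangedFiles > fileListCap {
		return nil, false // the list is certain to be cut off; don't fetch it
	}
	files := fetch()
	if len(files) < pr.ChangedFiles {
		return nil, false // mismatch: the list is incomplete
	}
	return files, true
}

func main() {
	calls := 0
	fetch := func() []string { calls++; return []string{"a", "b"} }
	_, ok := trustedFileList(pullRequest{ChangedFiles: 3500}, fetch)
	fmt.Println(ok, calls) // false 0
	_, ok = trustedFileList(pullRequest{ChangedFiles: 2}, fetch)
	fmt.Println(ok, calls) // true 1
}
```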

Additionally, approve is not the only GetPullRequestChanges caller, so we may have similar bugs elsewhere.

So I think we should do the following:

  1. Make GetPullRequestChanges detect the case and return an error (if it sees too many changes, e.g. >= 3000, do a GetPullRequest, which is cacheable, and cross-check). That means any caller will need to handle the case appropriately.
  2. Any caller is advised to do GetPullRequest first if appropriate to avoid calling potentially paginated GetPullRequestChanges that's doomed to fail.
  3. The approve plugin would fall back to requiring a root-level approver instead of handling the actual paths as normal. I do not believe the no_parent_owners case is worth considering.

We'd probably need to do something similar for GetSingleCommit but we lack the good crosscheck number there, so we'd probably need to treat any at-limit result as suspicious and untrustworthy.

…API limit

GitHub's "List pull request files" API silently truncates results at
3000 files. The approve plugin previously assumed the returned file
list was complete, which allowed unapproved changes to be merged if
they appeared alphabetically after the 3000-file cutoff.

Fix: detect truncation by comparing len(files) against the
PullRequest.ChangedFiles field (the authoritative count from the API).
When truncation is detected, skip automated approval entirely and post
a comment informing the user that a repository administrator must
manually apply the approved label after verifying OWNERS requirements.

If a bot-applied approved label already exists from a prior evaluation,
it is removed to prevent stale approvals from persisting.

Signed-off-by: Jakub Guzik <jguzik@redhat.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
@jmguzik jmguzik force-pushed the global-trusted-apps branch from decefa0 to f8fb3fd June 16, 2026 12:05
@k8s-ci-robot k8s-ci-robot added cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. and removed cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jun 16, 2026

Labels

area/plugins Issues or PRs related to prow's plugins for the hook component cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.


4 participants