Review agent: escalate PR-description-vs-diff discrepancies above Info severity

## What happened

On [PR #2179](https://github.com/fullsend-ai/fullsend/pull/2179), the review agent detected that the PR description claimed to fix a pre-existing broken link (`-inference.md` → `getting-inference.md`) in `docs/guides/README.md`, but this change was not present in the diff. The agent correctly identified this discrepancy but rated it as `Info` severity under the `broken-link-fix` tag. The human reviewer approved the PR without addressing this or any other finding.

The review agent ran as workflow [27355898627](https://github.com/fullsend-ai/.fullsend/actions/runs/27355898627) (2026-06-11 14:55–15:09 UTC). The human approved on 2026-06-12 with an empty review body.

## What could go better

When a PR description explicitly claims a change that isn't reflected in the diff, this indicates one of two problems: (1) the change was accidentally omitted and should be included, or (2) the PR description is inaccurate and misleading for future readers. Either case is more significant than Info severity.

I'm fairly confident this is a repeatable pattern. Agent-generated PRs include structured summaries, and human-authored PRs often describe intended changes that may not all land in the final diff. Catching these discrepancies at a higher severity would increase the chance of human engagement.

Uncertainty: I don't have data on how often this pattern occurs across other PRs. It's possible this is rare enough that escalating severity would add noise rather than signal. A sample of 10-20 PRs with Info-level findings would help validate frequency.

## Proposed change

In the review agent's finding classification logic (likely in the code-review skill or review sub-agent definitions), add a heuristic that detects when the PR description/body explicitly claims a change (e.g., 'Fixes a broken link in X', 'Adds Y to Z') that cannot be corroborated by the diff. When detected, classify this as Medium severity with a tag like `description-diff-mismatch` rather than Info.

The detection should look for action verbs in the PR body (fixes, adds, removes, updates, changes) paired with specific file or path references, then verify those files appear in the diff with corresponding changes. This is a targeted check, not a general NLP problem — PR descriptions tend to use predictable phrasing.
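A minimal sketch of the check described above, under stated assumptions. The names `Finding`, `changed_files`, and `find_mismatches` are hypothetical and are not taken from the review agent's code. The input is assumed to be a unified git diff with `diff --git a/... b/...` headers. This version only checks whether a referenced file appears in the diff at all; confirming that the file contains a *corresponding* change would need a further step.

```python
import re
from dataclasses import dataclass

# Action verbs named in the proposal, with common inflections.
CLAIM_RE = re.compile(
    r"\b(?:fix(?:es|ed)?|add(?:s|ed)?|remov(?:e|es|ed)|updat(?:e|es|ed)|chang(?:e|es|ed))\b",
    re.IGNORECASE,
)
# Path-like token: must contain at least one slash. The lookbehind skips
# tokens that are part of a URL (e.g. the host and path after "https://").
PATH_RE = re.compile(r"(?<![\w./:-])[\w.-]+(?:/[\w.-]+)+")


@dataclass
class Finding:  # hypothetical shape, for illustration only
    severity: str
    tag: str
    message: str


def changed_files(diff: str) -> set[str]:
    """Collect file paths from 'diff --git a/X b/Y' header lines."""
    files: set[str] = set()
    for line in diff.splitlines():
        match = re.match(r"diff --git a/(\S+) b/(\S+)", line)
        if match:
            files.update(match.groups())
    return files


def find_mismatches(pr_body: str, diff: str) -> list[Finding]:
    """Flag paths that the PR body claims to change but the diff never touches."""
    touched = changed_files(diff)
    findings: list[Finding] = []
    seen: set[str] = set()
    # Split on sentence-ending punctuation followed by whitespace, or on newlines.
    for sentence in re.split(r"(?<=[.!?])\s+|\n+", pr_body):
        if not CLAIM_RE.search(sentence):
            continue
        for raw in PATH_RE.findall(sentence):
            path = raw.rstrip(".")  # drop a sentence-final period
            if path in touched or path in seen:
                continue
            seen.add(path)
            findings.append(Finding(
                severity="Medium",
                tag="description-diff-mismatch",
                message=f"PR description claims a change to {path}, "
                        "but the file does not appear in the diff.",
            ))
    return findings
```

Requiring a slash in the path pattern keeps the check narrow: bare filenames and prose such as "e.g." are ignored, at the cost of missing claims about top-level files. Whether that trade-off is right depends on the false-positive data the validation step is meant to gather.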

This change belongs in the review agent's finding evaluation layer, not in per-repo configuration.

## Validation criteria

Over the next 20 review agent runs on PRs with non-trivial descriptions:
1. At least one PR should trigger the `description-diff-mismatch` finding when applicable.
2. False positive rate should be at most 20% (i.e., at most 1 in 5 triggered findings should be incorrect).
3. When triggered, the finding should be rated Medium, not Info.
4. Spot-check 5 triggered findings manually to confirm they identify genuine discrepancies between description and diff.
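The criteria above could be checked mechanically once the triggered findings have been reviewed by hand. The sketch below is illustrative only: `false_positive_rate` and `meets_validation_criteria` are hypothetical helpers, and the inputs (one severity string and one manual verdict per triggered finding) are an assumed data shape. A rate of exactly 1 in 5 is treated as passing, following the "at most 1 in 5" wording.

```python
def false_positive_rate(genuine: list[bool]) -> float:
    """Fraction of triggered findings that a reviewer judged incorrect.

    genuine[i] is True when finding i identified a real discrepancy.
    """
    if not genuine:
        raise ValueError("no triggered findings to evaluate")
    return genuine.count(False) / len(genuine)


def meets_validation_criteria(
    severities: list[str],
    genuine: list[bool],
    max_fp_rate: float = 0.20,  # assumption: exactly 20% still passes
) -> bool:
    """Check criteria 1-3 over the triggered findings from a batch of runs."""
    if len(severities) != len(genuine):
        raise ValueError("expected one verdict per triggered finding")
    if not severities:
        return False  # criterion 1: the finding never triggered
    all_medium = all(s == "Medium" for s in severities)  # criterion 3
    return all_medium and false_positive_rate(genuine) <= max_fp_rate  # criterion 2
```

For example, five triggered findings, all rated Medium, with one judged incorrect on manual review, would pass; two incorrect out of five would not.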

---
_Generated by retro agent from https://github.com/fullsend-ai/fullsend/pull/2179_
