Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
15 commits
Select commit Hold shift + click to select a range
7f63d4e
feat: add unified-diff parser for tokenless diff provider (#2457)
vectorkovacspeter Jun 22, 2026
5f0c079
feat: reconstruct base file by reverse-applying diff (#2457)
vectorkovacspeter Jun 22, 2026
2fe6773
refactor: harden diff base reconstruction (CRLF, newline, tests) (#2457)
vectorkovacspeter Jun 22, 2026
3a7f74e
feat: add DiffGitProvider with stdout/file output (#2457)
vectorkovacspeter Jun 22, 2026
eb2bf46
fix: guard malformed diff parsing in DiffGitProvider (#2457)
vectorkovacspeter Jun 22, 2026
e4868f8
feat: CLI --stdin/--diff-file/--output for tokenless diff mode (#2457)
vectorkovacspeter Jun 22, 2026
4416ce2
test: end-to-end coverage for tokenless diff provider (#2457)
vectorkovacspeter Jun 22, 2026
8306df7
test: strengthen diff provider e2e (base reconstruction + mocked-LLM)…
vectorkovacspeter Jun 22, 2026
5b83516
docs: document tokenless local diff mode (#2457)
vectorkovacspeter Jun 22, 2026
daeb714
fix: harden DiffGitProvider capability gating and file reads (#2457)
vectorkovacspeter Jun 22, 2026
566cbaf
test: make path-traversal guard test a real sentinel (#2457)
vectorkovacspeter Jun 22, 2026
304384d
fix(diff-provider): address Qodo review findings on tokenless diff mode
vectorkovacspeter Jun 22, 2026
dfb19b2
fix(diff-provider): resolve Qodo follow-up findings on tokenless diff…
vectorkovacspeter Jun 23, 2026
a05bce3
fix(diff-provider): resolve Qodo third-round findings on tokenless di…
vectorkovacspeter Jun 23, 2026
ae473c4
fix(diff-provider): hunk-only patches + isort/test-isolation (Qodo ro…
vectorkovacspeter Jun 23, 2026
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
77 changes: 77 additions & 0 deletions docs/docs/usage-guide/tokenless_diff_mode.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,77 @@
# Tokenless local diff mode

Run PR-Agent against a raw unified diff with no platform API token and no PR URL.
Results are printed to stdout (and optionally saved to a file). This suits security-first
or air-gapped environments where HTTP access tokens are avoided, and enables pre-push
hooks or CI pipelines that operate on a diff artifact rather than a live pull request.

## Usage

Pipe a diff directly from stdin:

```bash
git diff main...feature-branch | python -m pr_agent.cli --stdin review
```

Or pass a diff file and save the output alongside stdout:

```bash
git diff main...feature-branch > changes.diff
python -m pr_agent.cli --diff-file changes.diff --output review.md review
```

### Flags

| Flag | Description |
|---|---|
| `--stdin` | Read a unified diff from stdin |
| `--diff-file <path>` | Read a unified diff from a file |
| `--output <path>` | Write the result to a file in addition to stdout |

`--stdin` and `--diff-file` are mutually exclusive. At least one must be provided to
enter tokenless diff mode; omitting both falls back to the normal `--pr_url` flow.

### Supported commands

`review`, `improve`, `describe`, and `ask` are supported. Because there is no hosting
platform to push to, `improve` renders its code suggestions as a single markdown
document to stdout (and to `--output`, if given) instead of as committable inline
suggestions. Commands that require live platform interaction (such as
`update_changelog` or `similar_issue`) are not meaningful in this mode.

## How it works

1. **Diff parsing** — the unified diff is parsed locally into per-file patch objects.
Binary files are skipped automatically.

2. **Working-tree enrichment** — when PR-Agent is run inside the repository working tree,
it reads each changed file from disk and reverse-applies the diff to reconstruct the
base (pre-change) file content. Having both the base and head versions available gives
the LLM full file context, which produces higher-quality analysis.

3. **Patch-only fallback** — if a changed file cannot be found on disk (e.g. the diff was
generated elsewhere, or the file was deleted), PR-Agent falls back to patch-only mode
for that file. The review still runs; it simply has less context.

4. **Output** — the result is written to stdout. If `--output <path>` is given, it is
also written to that file (UTF-8, overwritten on each run).

No platform token, no PR URL, and no internet access are required for the diff processing
step itself. An LLM API key is still needed unless you configure a local model.

## Difference from the `local` git provider

The existing `git_provider = "local"` mode (invoked with `--pr_url`) computes a diff
by comparing branches in a local Git repository and requires a clean working tree.
The diff mode is different in the following ways:

| | `local` provider | `diff` provider (this page) |
|---|---|---|
| Input | Branch names in a local repo | A unified diff supplied via stdin or file |
| Working tree required | Yes (clean) | No |
| Platform token required | No | No |
| Output | GitHub-style comment published locally | stdout (+ optional file) |
| Inline comments | Not supported | Not supported |

Use the `diff` provider when you already have a diff artifact (e.g. from a CI step or
`git format-patch`) and want a zero-configuration, token-free review.
1 change: 1 addition & 0 deletions docs/mkdocs.yml
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,7 @@ nav:
- Managing Mail Notifications: 'usage-guide/mail_notifications.md'
- Changing a Model: 'usage-guide/changing_a_model.md'
- Additional Configurations: 'usage-guide/additional_configurations.md'
- Tokenless Diff Mode: 'usage-guide/tokenless_diff_mode.md'
- Frequently Asked Questions: 'faq/index.md'
- Tools:
- 'tools/index.md'
Expand Down
34 changes: 32 additions & 2 deletions pr_agent/cli.py
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
import argparse
import asyncio
import os
import sys

from pr_agent.agent.pr_agent import PRAgent, commands
from pr_agent.algo.utils import get_version
Expand Down Expand Up @@ -65,6 +66,12 @@ def set_parser():
"Repo-local .pr_agent.toml overrides values set here."
),
)
parser.add_argument("--diff-file", dest="diff_file", type=str, default=None,
help="Path to a unified diff file to review (tokenless local mode)")
parser.add_argument("--stdin", action="store_true", default=False,
help="Read a unified diff from stdin (tokenless local mode)")
parser.add_argument("--output", dest="output", type=str, default=None,
help="Write the result to this file (in addition to stdout)")
parser.add_argument('command', type=str, help='The', choices=commands, default='review')
parser.add_argument('rest', nargs=argparse.REMAINDER, default=[])
return parser
Expand All @@ -83,7 +90,29 @@ def run(inargs=None, args=None):
parser = set_parser()
if not args:
args = parser.parse_args(inargs)
if not args.pr_url and not args.issue_url:
diff_mode = getattr(args, "stdin", False) or getattr(args, "diff_file", None)
if diff_mode:
if args.stdin and args.diff_file:
parser.error("--stdin and --diff-file are mutually exclusive")
if args.diff_file:
try:
with open(args.diff_file, "r", encoding="utf-8") as fh:
diff_content = fh.read()
except OSError as e:
parser.error(f"Could not read --diff-file '{args.diff_file}': {e}")
except UnicodeDecodeError as e:
parser.error(f"--diff-file '{args.diff_file}' is not valid UTF-8 text: {e}")
else:
diff_content = sys.stdin.read()
if not diff_content.strip():
parser.error("No diff content received (empty stdin/file)")
Comment thread
qodo-free-for-open-source-projects[bot] marked this conversation as resolved.
get_settings().set("config.git_provider", "diff")
get_settings().set("diff.content", diff_content)
get_settings().set("diff.output_path", getattr(args, "output", None))
# Diff mode's whole purpose is to emit the result to stdout/--output, so
# force publishing on even if a config/env set publish_output=false.
get_settings().set("config.publish_output", True)
elif not args.pr_url and not args.issue_url:
Comment thread
qodo-free-for-open-source-projects[bot] marked this conversation as resolved.
parser.print_help()
return

Expand All @@ -106,7 +135,8 @@ async def inner():
if args.issue_url:
result = await asyncio.create_task(PRAgent().handle_request(args.issue_url, [command] + args.rest))
else:
result = await asyncio.create_task(PRAgent().handle_request(args.pr_url, [command] + args.rest))
target = args.pr_url if args.pr_url else "local_diff"
result = await asyncio.create_task(PRAgent().handle_request(target, [command] + args.rest))

if get_settings().litellm.get("enable_callbacks", False):
# There may be additional events on the event queue from the run above. If there are give them time to complete.
Expand Down
9 changes: 8 additions & 1 deletion pr_agent/git_providers/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@
from pr_agent.git_providers.gitlab_provider import GitLabProvider
from pr_agent.git_providers.local_git_provider import LocalGitProvider
from pr_agent.git_providers.gitea_provider import GiteaProvider
from pr_agent.git_providers.diff_provider import DiffGitProvider

_GIT_PROVIDERS = {
'github': GithubProvider,
Expand All @@ -23,7 +24,8 @@
'codecommit': CodeCommitProvider,
'local': LocalGitProvider,
'gerrit': GerritProvider,
'gitea': GiteaProvider
'gitea': GiteaProvider,
'diff': DiffGitProvider,
}


Expand Down Expand Up @@ -57,6 +59,11 @@ def get_git_provider_with_context(pr_url) -> GitProvider:
else:
try:
provider_id = get_settings().config.git_provider
# Tokenless diff mode is keyed on loaded diff content; it must not be
# overridden by an extra/repo config file that sets a different
# git_provider (apply_repo_settings merges those before this call).
if get_settings().get("diff.content", None):
provider_id = "diff"
if provider_id not in _GIT_PROVIDERS:
raise ValueError(f"Unknown git provider: {provider_id}")
git_provider = _GIT_PROVIDERS[provider_id](pr_url)
Expand Down
114 changes: 114 additions & 0 deletions pr_agent/git_providers/diff_parsing.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,114 @@
from unidiff import PatchSet
from unidiff.errors import UnidiffParseError

from pr_agent.algo.types import EDIT_TYPE, FilePatchInfo
from pr_agent.log import get_logger


def _strip_prefix(path: str | None) -> str | None:
if path is None:
return None
if path.startswith(("a/", "b/")):
return path[2:]
return path


def to_hunk_only_patch(patch_str: str) -> str:
"""Drop file-header lines ('diff --git', 'index', '---', '+++') that precede
the first '@@' hunk header.

Platform providers (GitHub, GitLab, ...) store hunk-only patches, and the
shared hunk/line-number converter treats any '+'/'-' line as content. Left
in, the '---'/'+++' headers would be emitted as a bogus leading hunk with
invalid line numbers. Returns "" when there is no hunk (e.g. rename-only)."""
lines = patch_str.splitlines(keepends=True)
for i, line in enumerate(lines):
if line.startswith("@@"):
return "".join(lines[i:])
return ""


def parse_unified_diff(diff_text: str) -> list[FilePatchInfo]:
"""Parse a unified diff into FilePatchInfo objects (patch + metadata only).

base_file / head_file are left empty here; the provider fills them from the
working tree and by reverse-applying the patch. Binary files are skipped.
"""
patch_set = PatchSet(diff_text)
files: list[FilePatchInfo] = []
for pf in patch_set:
if pf.is_binary_file:
get_logger().info(f"Skipping binary file in diff: {pf.path}")
continue
if pf.is_added_file:
edit_type = EDIT_TYPE.ADDED
elif pf.is_removed_file:
edit_type = EDIT_TYPE.DELETED
elif pf.is_rename:
edit_type = EDIT_TYPE.RENAMED
else:
edit_type = EDIT_TYPE.MODIFIED

if pf.is_removed_file: # target is /dev/null: use source path as the name
filename = _strip_prefix(pf.source_file)
else:
filename = _strip_prefix(pf.target_file)
old_filename = _strip_prefix(pf.source_file) if pf.is_rename else None

files.append(
FilePatchInfo(
base_file="",
head_file="",
patch=str(pf),
filename=filename,
edit_type=edit_type,
old_filename=old_filename,
)
Comment thread
qodo-free-for-open-source-projects[bot] marked this conversation as resolved.
)
return files


def reconstruct_base_file(head_file_str: str, patch_str: str) -> str:
"""Reverse-apply a single-file unified diff to head (new) content to recover
base (original) content. Returns "" if the patch does not cleanly apply."""
try:
patch_set = PatchSet(patch_str)
except UnidiffParseError as e:
get_logger().info(f"Could not parse patch for base reconstruction: {e}")
return ""
Comment thread
qodo-free-for-open-source-projects[bot] marked this conversation as resolved.
if len(patch_set) != 1:
return ""

head_lines = head_file_str.splitlines()
base_lines: list[str] = []
head_idx = 0 # 0-based cursor into head_lines

for hunk in patch_set[0]:
hunk_head_start = hunk.target_start - 1 # 1-based -> 0-based
if hunk_head_start < head_idx or hunk_head_start > len(head_lines):
return "" # out-of-order / out-of-bounds hunk
base_lines.extend(head_lines[head_idx:hunk_head_start])
head_idx = hunk_head_start

for line in hunk:
value = line.value.rstrip("\r\n")
if line.is_context:
if head_idx >= len(head_lines) or head_lines[head_idx] != value:
return ""
base_lines.append(head_lines[head_idx])
head_idx += 1
elif line.is_added: # present only in head: verify + skip
if head_idx >= len(head_lines) or head_lines[head_idx] != value:
return ""
head_idx += 1
elif line.is_removed: # present only in base: emit, don't consume head
base_lines.append(value)

base_lines.extend(head_lines[head_idx:])
result = "\n".join(base_lines)
# Preserve a trailing newline only when the base actually has content; an
# empty base (e.g. reversing an add-file patch) must stay "" so downstream
# extend_patch() correctly treats it as having no original file.
if base_lines and head_file_str.endswith("\n"):
result += "\n"
Comment thread
qodo-free-for-open-source-projects[bot] marked this conversation as resolved.
return result
Loading
Loading