fix(security): validate clone host exactly across providers and block /help_docs repo overrides (#2445)#2450
fix(security): validate clone host exactly across providers and block /help_docs repo overrides (#2445)#2450naorpeled wants to merge 1 commit into
Conversation
… /help_docs repo overrides (#2445) Providers validated the git clone target with a substring check (e.g. `github_com not in repo_url_to_clone`) before embedding an auth token into the clone URL/header. A host that merely *contained* the allowed host string, such as `github.com.attacker.tld/o/r` or `https://github.com@attacker.tld/o/r`, passed validation and the token was sent to the attacker host. The same weak pattern existed in the GitHub, GitLab, Gitea, Bitbucket Cloud and Bitbucket Server providers. Add a shared `GitProvider._validate_clone_url_and_extract_path` helper that: - requires the URL host to match the configured host exactly (case-insensitive) - rejects embedded credentials (userinfo) and non-http(s) schemes - rejects paths containing whitespace (argument-injection guard) - returns only the path, so each provider rebuilds the clone URL from its trusted authority (host[:port]) and the token can never reach another host Apply it in all five providers. For Bitbucket Server, also build the git argv as a list instead of shlex-splitting an f-string, so the repo URL can no longer be re-parsed into extra git arguments. Additionally, block untrusted comment commands from overriding the clone target by adding pr_help_docs.repo_url, pr_help_docs.repo_default_branch and pr_help_docs.docs_path to the forbidden CLI args list. Adds regression tests covering lookalike hosts, userinfo injection, scheme restrictions, whitespace/arg-injection, self-hosted hosts/ports, all five providers, and the new blocklist entries.
Code Review by Qodo
Context used 1. GitLab clone URL embeds token
|
PR Summary by QodoFix clone URL host validation across providers and block help_docs overrides WalkthroughsDescription• Enforce exact clone host validation to prevent auth token exfiltration via lookalike URLs. • Rebuild clone URLs from trusted configured authorities across GitHub/GitLab/Gitea/Bitbucket. • Block /help_docs runtime overrides of repo/branch/path from untrusted comment commands. Diagramgraph TD
PRC["Untrusted comment command (/help_docs)"] --> CLI["CliArgs.validate_user_args"] --> PRV["Provider _prepare_clone_url_with_token"] --> VAL["GitProvider._validate_clone_url_and_extract_path"] --> URL["Rebuilt clone URL (trusted host[:port])"] --> GIT["git clone (argv list / subprocess)"] --> HOST["Remote git host (configured)"]
High-Level AssessmentThe following are alternative approaches to this PR: 1. Use header/credential-helper auth (avoid tokens in URLs)
2. Centralize cloning into a single hardened clone helper
Recommendation: Merge this PR’s approach: exact hostname validation + rebuilding URLs from trusted configured authority is the correct minimal-risk remediation across providers. Consider a follow-up to migrate providers away from embedding tokens in clone URLs (prefer headers/credential helper), but that’s a larger compatibility change and is reasonable to do separately. File ChangesBug fix (6)
Refactor (1)
Tests (1)
|
| # Rebuild from the trusted authority (host[:port]) so the token cannot reach a different host. | ||
| scheme = (base_parsed.scheme or "https") + "://" | ||
| clone_url = f"{scheme}oauth2:{access_token}@{base_parsed.netloc}{repo_full_name}" |
There was a problem hiding this comment.
1. Gitlab clone url embeds token 📎 Requirement gap ⛨ Security
The code constructs clone URLs for multiple git providers (GitLab, GitHub, Gitea, and Bitbucket Cloud) by embedding provider access tokens directly in the URL (e.g., https://oauth2:<*******t;@... / https://<token>@... / https://x-token-auth:<*******t;@...). This violates the requirement to avoid placing secrets in clone URLs where they can leak via logs, process arguments, or stored git remote strings.
Agent Prompt
## Issue description
Several provider implementations build git clone URLs by embedding authentication tokens directly in the URL (GitLab `oauth2:<token>@...`, GitHub/Gitea `<token>@...`, Bitbucket Cloud `x-token-auth:<token>@...`), which violates compliance requirements.
## Issue Context
Compliance (PR Compliance ID 10) requires that tokens/secrets are not embedded in URLs used for git operations because they can be exposed via logs, process lists/args, or persisted git remote strings. Prefer authentication approaches that keep secrets out of URLs, such as git credential helpers or `http.extraHeader`-based auth.
## Fix Focus Areas
- pr_agent/git_providers/gitlab_provider.py[977-980]
- pr_agent/git_providers/github_provider.py[1220-1223]
- pr_agent/git_providers/gitea_provider.py[731-734]
- pr_agent/git_providers/bitbucket_provider.py[633-645]
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
| def _make_stub(token="ghp_SECRETTOKEN", base_url_html="https://github.com", deployment_type="user"): | ||
| """Build a real GithubProvider instance without running __init__, setting only the | ||
| attributes used by _prepare_clone_url_with_token (inherited base helpers stay available).""" | ||
| stub = object.__new__(GithubProvider) | ||
| stub.auth = SimpleNamespace(token=token) | ||
| stub.base_url_html = base_url_html | ||
| stub.deployment_type = deployment_type | ||
| return stub |
There was a problem hiding this comment.
2. Tests contain token-like literals 📘 Rule violation ⛨ Security
New unit tests include hardcoded token-like strings (e.g., ghp_SECRETTOKEN, glpat-TOKEN) which can trigger secret scanners or be mistaken for real credentials. This violates the requirement to avoid hardcoded secrets/tokens in the repository.
Agent Prompt
## Issue description
The new tests hardcode token-like values (e.g., GitHub `ghp_...` style), which violates the no-hardcoded-secrets policy and may trip secret scanning.
## Issue Context
Even in tests, committed token-like strings should be avoided; use clearly non-secret placeholders (e.g., `TEST_TOKEN`), fixtures, or environment-provided dummy values.
## Fix Focus Areas
- tests/unittest/test_github_clone_url_validation.py[11-18]
- tests/unittest/test_github_clone_url_validation.py[31-45]
- tests/unittest/test_github_clone_url_validation.py[97-134]
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
Summary
Fixes #2445.
Git providers validated the clone target with a substring check (e.g.
github_com not in repo_url_to_clone) before embedding an auth token into the clone URL/header. A host that merely contained the allowed host string — e.g.github.com.attacker.tld/o/rorhttps://github.com@attacker.tld/o/r— passed validation, so the token was sent to the attacker host. Via/help_docs, an untrusted PR commenter could set--pr_help_docs.repo_url=...and exfiltrateGITHUB_TOKEN.The same weak pattern existed in all five providers (GitHub, GitLab, Gitea, Bitbucket Cloud, Bitbucket Server), each leaking its own token, so this is fixed as a class rather than a single call site.
Changes
Sink — new shared validator
GitProvider._validate_clone_url_and_extract_path(repo_url, allowed_host):http(s)schemeshost[:port]), so the token can never reach a different host, and self-hosted ports/context paths are preservedApplied in GitHub, GitLab, Gitea, Bitbucket Cloud and Bitbucket Server. For Bitbucket Server, the git command is now built as an argv list instead of
shlex.splitof an f-string, so the repo URL can't be re-parsed into extragitarguments (-c ...injection against the bearer-token clone).Source — runtime override blocklist: added
pr_help_docs.repo_url,pr_help_docs.repo_default_branchandpr_help_docs.docs_pathto the forbidden CLI args, so untrusted comment commands can no longer override the clone target (these must come from maintainer-controlled config).Tests
tests/unittest/test_github_clone_url_validation.py— 20 regression tests covering lookalike hosts, userinfo injection, scheme restrictions, whitespace/arg-injection, missing path, self-hosted hosts and non-standard ports, all five providers, and the new blocklist entries. Full unit suite passes (536 tests).Notes / follow-up
/help_docsonauthor_associationis a property of the consumer's GitHub Actions workflow, not PR-Agent code, so it is out of scope here.