feat: sandboxed verification worker (closes #68) by tcconnally · Pull Request #73 · Perseus-Computing-LLC/pr-pilot

tcconnally · 2026-06-18T02:14:36Z

Implements sandboxed verification (#68).

The SandboxVerifier clones the PR branch into a temp directory, applies all generated patches, detects the test framework (pytest, unittest, jest, cargo, go, npm, make), runs real tests, and reports evidence-backed pass/fail results.

When SANDBOX_ENABLED=true, the Verifier agent uses real sandbox verification first; falls back to LLM-only otherwise. VERIFIED_AUTO_APPROVE now gates actual GitHub APPROVE on sandbox evidence passing.

Wired through webhook → pr_context → AgentChain → Verifier.

Add SandboxVerifier that actually clones the PR branch, applies generated patches, and runs the project's real test suite in an isolated temp directory. Supports 7 test frameworks: pytest, unittest, jest, cargo, go, npm, and make. Dual-mode in Verifier agent: sandbox first when SANDBOX_ENABLED=true (with github_token, repo_clone_url, branch_name in context), LLM-only fallback otherwise. The VERIFIED_AUTO_APPROVE flag now gates actual GitHub APPROVE on real sandbox evidence.

github-actions

🤖 PR Pilot Review

🔍 Reviewer (Agent 1)

This pull request introduces a sandboxed verification worker, a significant and valuable feature. However, it contains a critical security vulnerability that leaks repository access tokens in error logs and a high-severity bug in the patch application logic. These issues must be resolved before this can be merged.

📋 Issues Found

[CRITICAL] security — The _clone_repo method may leak the GitHub access token into the result details. If the git clone command fails, its stderr, which can contain the full clone URL with the embedded token, is captured and returned. This sensitive information could then be logged or exposed. (src/sandbox_verifier.py:210)
[HIGH] bug — The fallback logic for applying a patch when the patch command is not found is incorrect. It writes the patch content (a unified diff) directly to the target file, which will corrupt the file instead of patching it. This will lead to incorrect verification failures. (src/sandbox_verifier.py:307)
[MEDIUM] bug — The _run_tests method contains redundant logic for detecting test frameworks. It first calls _detect_test_framework, which already checks for package.json and Makefile, and then has a separate if/elif block to check for these files again as a fallback. This makes the code harder to maintain. (src/sandbox_verifier.py:353)
[LOW] style — The code uses asyncio.get_event_loop() to get the event loop for running the cleanup task. In modern Python (3.7+), asyncio.get_running_loop() is preferred as it's context-aware and safer. (src/agents/verifier.py:120)

🔧 Fixer (Agent 2)

Fixed a security vulnerability that could leak GitHub tokens, removed incorrect patch fallback logic, consolidated redundant test framework detection, and updated to modern asyncio usage.

🧪 Tester (Agent 3)

Tester failed: Unterminated string starting at: line 7 column 18 (char 141)

✅ Verifier (Agent 4)

Verification skipped.

👤 Decision: ESCALATE TO HUMAN

This PR requires human review — too complex or risky for autonomous handling.

Pipeline duration: ~356.3s

Review by PR Pilot — AI-native code quality service

tcconnally merged commit 17976fc into main Jun 18, 2026
2 of 4 checks passed

github-actions Bot reviewed Jun 18, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: sandboxed verification worker (closes #68)#73

feat: sandboxed verification worker (closes #68)#73
tcconnally merged 1 commit into
mainfrom
feat/68-sandboxed-verification

tcconnally commented Jun 18, 2026

Uh oh!

Uh oh!

github-actions Bot left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

tcconnally commented Jun 18, 2026

Uh oh!

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

🤖 PR Pilot Review

🔍 Reviewer (Agent 1)

📋 Issues Found

🔧 Fixer (Agent 2)

🧪 Tester (Agent 3)

✅ Verifier (Agent 4)

👤 Decision: ESCALATE TO HUMAN

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant