feat: sandboxed verification worker (closes #68) #73
Add SandboxVerifier that actually clones the PR branch, applies generated patches, and runs the project's real test suite in an isolated temp directory. Supports 7 test frameworks: pytest, unittest, jest, cargo, go, npm, and make. Dual-mode in Verifier agent: sandbox first when SANDBOX_ENABLED=true (with github_token, repo_clone_url, branch_name in context), LLM-only fallback otherwise. The VERIFIED_AUTO_APPROVE flag now gates actual GitHub APPROVE on real sandbox evidence.
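The dual-mode selection described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the function name `choose_verification_mode`, the `"sandbox"`/`"llm"` return values, and the case-insensitive reading of `SANDBOX_ENABLED` are all assumptions; only the flag name and the three context keys come from the description.

```python
import os
from typing import Mapping

# Context keys the description says must be present for sandbox mode.
REQUIRED_KEYS = ("github_token", "repo_clone_url", "branch_name")


def choose_verification_mode(
    context: Mapping[str, object],
    env: Mapping[str, str] = os.environ,
) -> str:
    """Return "sandbox" when the flag is on and the context is complete, else "llm".

    Hypothetical helper: the name and return values are illustrative only.
    """
    # Assumption: the flag is compared case-insensitively against "true".
    enabled = env.get("SANDBOX_ENABLED", "").strip().lower() == "true"
    # Every required key must be present and non-empty.
    has_context = all(context.get(key) for key in REQUIRED_KEYS)
    return "sandbox" if enabled and has_context else "llm"
```

Passing the environment in as a parameter keeps the decision testable without mutating `os.environ`.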
🤖 PR Pilot Review
🔍 Reviewer (Agent 1)
This pull request introduces a sandboxed verification worker, a significant and valuable feature. However, it contains a critical security vulnerability that leaks repository access tokens in error logs and a high-severity bug in the patch application logic. These issues must be resolved before this can be merged.
📋 Issues Found
- [CRITICAL] security — The `_clone_repo` method may leak the GitHub access token into the result details. If the `git clone` command fails, its stderr, which can contain the full clone URL with the embedded token, is captured and returned. This sensitive information could then be logged or exposed. (src/sandbox_verifier.py:210)
- [HIGH] bug — The fallback logic for applying a patch when the `patch` command is not found is incorrect. It writes the patch content (a unified diff) directly to the target file, which will corrupt the file instead of patching it. This will lead to incorrect verification failures. (src/sandbox_verifier.py:307)
- [MEDIUM] bug — The `_run_tests` method contains redundant logic for detecting test frameworks. It first calls `_detect_test_framework`, which already checks for `package.json` and `Makefile`, and then has a separate `if`/`elif` block to check for these files again as a fallback. This makes the code harder to maintain. (src/sandbox_verifier.py:353)
- [LOW] style — The code uses `asyncio.get_event_loop()` to get the event loop for running the cleanup task. In modern Python (3.7+), `asyncio.get_running_loop()` is preferred as it's context-aware and safer. (src/agents/verifier.py:120)
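For the critical token-leak finding, one common mitigation is to scrub captured stderr before it is stored or logged. The sketch below is an assumption about how that could be done, not the fix actually applied in this PR; `redact_secrets` and the `***` placeholder are hypothetical.

```python
import re

# Matches the userinfo part of an http(s) URL, e.g. https://user:token@host/...
_URL_CREDENTIALS = re.compile(r"(https?://)[^/\s@]+@")


def redact_secrets(text: str, token: str = "") -> str:
    """Mask URL-embedded credentials and any literal copy of the token.

    Hypothetical helper: intended to run on git stderr before it is
    returned in result details or written to logs.
    """
    # First mask credentials embedded in any clone URL.
    redacted = _URL_CREDENTIALS.sub(r"\1***@", text)
    # Then mask the raw token in case it appears outside a URL.
    if token:
        redacted = redacted.replace(token, "***")
    return redacted
```

Applying both passes covers the case named in the review (a clone URL echoed in the error) as well as a token that git prints in some other position.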
🔧 Fixer (Agent 2)
Fixed a security vulnerability that could leak GitHub tokens, removed incorrect patch fallback logic, consolidated redundant test framework detection, and updated to modern asyncio usage.
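The summary does not say what replaced the removed patch fallback. One safe shape for it, sketched here purely as an assumption, is to pick an available patch tool and fail loudly when none exists rather than writing diff text into the target file. `pick_patch_command` and its injectable `which` parameter are hypothetical; `git apply` and `patch -p1 -i` are the standard invocations of those tools.

```python
import shutil
from typing import Callable, List, Optional


def pick_patch_command(
    patch_file: str,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return an argv list that applies `patch_file` in the current directory.

    Hypothetical helper: raises instead of falling back to overwriting
    the target file with the raw unified diff.
    """
    if which("git"):
        # git apply understands unified diffs and defaults to -p1.
        return ["git", "apply", patch_file]
    if which("patch"):
        return ["patch", "-p1", "-i", patch_file]
    raise RuntimeError(
        "no patch tool available; refusing to overwrite the target file with diff text"
    )
```

Raising here turns a missing tool into an explicit verification error instead of a corrupted file and a misleading test failure.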
🧪 Tester (Agent 3)
Tester failed: Unterminated string starting at: line 7 column 18 (char 141)
✅ Verifier (Agent 4)
Verification skipped.
👤 Decision: ESCALATE TO HUMAN
This PR requires human review — too complex or risky for autonomous handling.
Pipeline duration: ~356.3s
Review by PR Pilot — AI-native code quality service
Implements sandboxed verification (#68).
The `SandboxVerifier` clones the PR branch into a temp directory, applies all generated patches, detects the test framework (pytest, unittest, jest, cargo, go, npm, make), runs real tests, and reports evidence-backed pass/fail results.

When `SANDBOX_ENABLED=true`, the Verifier agent uses real sandbox verification first; falls back to LLM-only otherwise.

`VERIFIED_AUTO_APPROVE` now gates actual GitHub APPROVE on sandbox evidence passing.

Wired through webhook → pr_context → AgentChain → Verifier.
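Framework detection of the kind described above is often done in a single pass over marker files, which also avoids the duplicated checks flagged in the review. The sketch below is illustrative only: the function name, the marker files other than `package.json` and `Makefile`, and the precedence order are assumptions, not the PR's `_detect_test_framework`.

```python
import json
from pathlib import Path
from typing import Optional


def detect_test_framework(repo: Path) -> Optional[str]:
    """Guess the test framework from marker files (assumed heuristics)."""
    if (repo / "Cargo.toml").is_file():
        return "cargo"
    if (repo / "go.mod").is_file():
        return "go"
    pkg = repo / "package.json"
    if pkg.is_file():
        try:
            data = json.loads(pkg.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            data = {}
        if not isinstance(data, dict):
            data = {}
        deps = {**(data.get("dependencies") or {}), **(data.get("devDependencies") or {})}
        # Prefer jest when it is declared; otherwise fall back to the npm test script.
        return "jest" if "jest" in deps else "npm"
    if any((repo / name).is_file() for name in ("pytest.ini", "conftest.py")):
        return "pytest"
    if (repo / "tests").is_dir() or any(repo.glob("test_*.py")):
        return "unittest"
    if (repo / "Makefile").is_file():
        return "make"
    return None
```

Keeping every check in one function means the test runner can dispatch on the returned name without re-inspecting the repository.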