test: e2e tests harness + simple_prompt scenario #173
Open
luca-iachini wants to merge 67 commits into
Add the full integration test infrastructure: harness, config, audit utilities, CI workflow, and supporting crate changes. Wire up one scenario (normal_llm_call) to validate the end-to-end flow before the remaining scenarios land in the follow-up PR.
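Each scenario runs against both agents, and the summary describes a `scenario_tests!` macro that generates one ignored test per `(agent, scenario)` pair. Below is a minimal std-only sketch of that fan-out. It assumes plain `#[test]` in place of `#[tokio::test]` and uses a stubbed `run_scenario`. The macro's input syntax, the per-agent modules and `run_scenario`'s signature are all hypothetical, not the harness's actual code.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum AgentKind {
    Claude,
    Codex,
}

/// Hypothetical stand-in for the harness's runner: it only reports what it would run.
pub fn run_scenario(agent: AgentKind, scenario: &str) -> String {
    format!("{agent:?}/{scenario}")
}

/// Sketch of a fan-out macro: one module per agent, and inside it one plain
/// function plus one ignored #[test] per scenario.
macro_rules! scenario_tests {
    (agents: { $($module:ident => $kind:expr),* $(,)? }, scenarios: $scenarios:tt) => {
        $( scenario_tests!(@agent $module, $kind, $scenarios); )*
    };
    (@agent $module:ident, $kind:expr, [$($scenario:ident),* $(,)?]) => {
        pub mod $module {
            use super::*;

            $(
                /// Direct entry point for this (agent, scenario) pair.
                pub fn $scenario() -> String {
                    run_scenario($kind, stringify!($scenario))
                }
            )*

            mod generated {
                $(
                    // Ignored by default; run with `--include-ignored`.
                    #[test]
                    #[ignore]
                    fn $scenario() {
                        // The real harness would assert on the scenario outcome here.
                        let _ = super::$scenario();
                    }
                )*
            }
        }
    };
}

scenario_tests!(
    agents: { claude => AgentKind::Claude, codex => AgentKind::Codex },
    scenarios: [simple_prompt, fs_read_deny]
);
```

Per-agent modules give test names such as `claude::generated::simple_prompt`, which makes filtering by agent straightforward. With libtest the ignored tests run via `cargo test -- --include-ignored`.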
Add 7 scenarios covering the key enforcement policies: block_paste_service, block_unlisted_host, tool_call_exfil, direct_tcp_bypass, fs_read_deny, fs_delete_deny, code_fibonacci.
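Scenarios such as block_unlisted_host depend on matching request hosts against policy entries. These entries include wildcard rules like the `*.chatgpt.com` subdomain rule listed under supporting changes. Below is a minimal sketch of such a matcher. It assumes that a wildcard covers subdomains but not the bare apex domain. `host_allowed` is a hypothetical helper, not firma's policy engine.

```rust
/// Hypothetical allow-list check. Exact entries must match the host exactly;
/// "*.suffix" entries match any subdomain of the suffix but (by assumption)
/// not the bare suffix itself. Comparison is case-insensitive and ignores a
/// trailing root dot.
fn host_allowed(allowlist: &[&str], host: &str) -> bool {
    let host = host.trim_end_matches('.').to_ascii_lowercase();
    allowlist.iter().any(|entry| {
        let entry = entry.to_ascii_lowercase();
        match entry.strip_prefix("*.") {
            // Wildcard: require at least one extra label before the suffix,
            // so "evilchatgpt.com" does not match "*.chatgpt.com".
            Some(suffix) => host
                .strip_suffix(suffix)
                .map_or(false, |prefix| prefix.len() > 1 && prefix.ends_with('.')),
            None => host == entry,
        }
    })
}
```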
The supervisor writes a flat AuthorityConfig TOML, but firma authority --config calls load_section(..., "authority"), which expects the settings wrapped in an [authority] section.
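Concretely, the fix means nesting the flat keys under `[authority]`. Below is a line-based sketch. It assumes every table header sits alone on its own line and that the file has no multi-line strings or arrays. `wrap_in_section` is a hypothetical helper; a real fix would more likely re-serialize through a TOML library.

```rust
/// Hypothetical helper: nest a flat TOML document under one top-level table so
/// a loader that only reads that section (e.g. `[authority]`) can find the keys.
/// Line-based sketch: assumes each table header sits alone on its own line and
/// that the input has no multi-line strings or arrays.
fn wrap_in_section(flat: &str, section: &str) -> String {
    let mut out = format!("[{section}]\n");
    for line in flat.lines() {
        let trimmed = line.trim();
        // Check "[[" first: an array-of-tables header also starts with '['.
        if let Some(inner) = trimmed
            .strip_prefix("[[")
            .and_then(|s| s.strip_suffix("]]"))
        {
            // Array-of-tables header: [[x]] becomes [[section.x]].
            out.push_str(&format!("[[{section}.{inner}]]\n"));
        } else if let Some(inner) = trimmed
            .strip_prefix('[')
            .and_then(|s| s.strip_suffix(']'))
        {
            // Table header: [x] becomes [section.x].
            out.push_str(&format!("[{section}.{inner}]\n"));
        } else {
            // Keys, comments and blank lines are copied unchanged.
            out.push_str(line);
            out.push('\n');
        }
    }
    out
}
```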
The per-run authority always runs plaintext on loopback. The user config may set TLS cert paths and a fixed listen_addr; carrying those into the spawned process causes FRAME_SIZE_ERROR, because an h2c client ends up talking to a TLS server. Clear the TLS config and select an ephemeral loopback port up front.
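The per-run override might look like the sketch below. `AuthorityNet` and its fields are hypothetical stand-ins for the real authority config. The port is picked by binding to port 0 and reading back the port the OS assigned. This leaves a small window in which another process could take the port before the spawned authority binds it.

```rust
use std::io;
use std::net::{SocketAddr, TcpListener};

/// Hypothetical, trimmed-down view of the authority's listener settings.
#[derive(Debug, Clone)]
struct AuthorityNet {
    listen_addr: SocketAddr,
    tls_cert: Option<String>,
    tls_key: Option<String>,
}

/// Ask the OS for a free loopback port by binding to port 0. The listener is
/// dropped when this function returns, freeing the port for the spawned process.
fn ephemeral_loopback_addr() -> io::Result<SocketAddr> {
    let listener = TcpListener::bind(("127.0.0.1", 0))?;
    listener.local_addr()
}

/// Per-run settings: never inherit TLS or a fixed address from the user config.
fn per_run_plaintext_loopback(mut net: AuthorityNet) -> io::Result<AuthorityNet> {
    // The spawned authority is reached over plaintext h2c, so drop any TLS paths.
    net.tls_cert = None;
    net.tls_key = None;
    // Replace any user-configured listen_addr with a fresh loopback port.
    net.listen_addr = ephemeral_loopback_addr()?;
    Ok(net)
}
```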
The default ca.dir is "./firma-ca/", relative to the sidecar's CWD (firma run's CWD), but sidecar_trust_env_overrides expects firma-ca.crt at <marker_dir>/firma-ca/firma-ca.crt. Because of this path mismatch, the cert was never found, the env vars were not injected into the agent, and the agent rejected the MITM CA with an x509 unknown-authority error.
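The sketch below shows the mismatch, and the pinning fix, in path terms. The helper names are hypothetical; only the `firma-ca/firma-ca.crt` layout and the relative default come from the description above.

```rust
use std::path::{Path, PathBuf};

/// Where sidecar_trust_env_overrides looks for the CA cert, per the description.
fn expected_ca_cert(marker_dir: &Path) -> PathBuf {
    marker_dir.join("firma-ca").join("firma-ca.crt")
}

/// Default behaviour: the relative "./firma-ca/" resolves against the sidecar's CWD.
fn default_ca_dir(cwd: &Path) -> PathBuf {
    cwd.join("firma-ca")
}

/// Hypothetical fix: pin ca.dir to an absolute directory under the marker dir,
/// so the sidecar writes the cert exactly where the lookup expects it.
fn pinned_ca_dir(marker_dir: &Path) -> PathBuf {
    marker_dir.join("firma-ca")
}
```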
Remaining scenarios land on fir-368-integration-tests.
firma run hard-errors when structural network enforcement is unavailable, unless this flag is set. This is needed on macOS and on Linux without bwrap.
On timeout, kill the process and collect its buffered stdout/stderr instead of returning empty strings, so that timeout failures are debuggable.
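Below is a std-only sketch of the kill-and-collect pattern. It uses reader threads rather than the async runtime the harness appears to use (its tests are `#[tokio::test]`). `run_with_timeout` and `RunOutput` are hypothetical names. Both pipes are drained on background threads from the start; otherwise a chatty child can block on a full pipe and look like a hang.

```rust
use std::io::{self, Read};
use std::process::{Command, Stdio};
use std::thread::{self, JoinHandle};
use std::time::{Duration, Instant};

/// Hypothetical result type: output is kept even when the run times out.
#[derive(Debug)]
struct RunOutput {
    timed_out: bool,
    stdout: String,
    stderr: String,
}

/// Read a pipe to the end on a background thread so the child never blocks
/// on a full pipe buffer while we wait for it.
fn drain<R: Read + Send + 'static>(mut pipe: R) -> JoinHandle<String> {
    thread::spawn(move || {
        let mut buf = Vec::new();
        let _ = pipe.read_to_end(&mut buf);
        String::from_utf8_lossy(&buf).into_owned()
    })
}

fn run_with_timeout(mut cmd: Command, timeout: Duration) -> io::Result<RunOutput> {
    let mut child = cmd.stdout(Stdio::piped()).stderr(Stdio::piped()).spawn()?;
    let stdout = drain(child.stdout.take().expect("stdout is piped"));
    let stderr = drain(child.stderr.take().expect("stderr is piped"));

    let deadline = Instant::now() + timeout;
    let mut timed_out = false;
    while child.try_wait()?.is_none() {
        if Instant::now() >= deadline {
            // Kill rather than abandon the child; ignore an "already exited" race.
            timed_out = true;
            let _ = child.kill();
            // Reap it so its pipes close and the reader threads finish.
            child.wait()?;
            break;
        }
        thread::sleep(Duration::from_millis(20));
    }

    Ok(RunOutput {
        timed_out,
        // Whatever the child wrote before being killed is still returned.
        stdout: stdout.join().unwrap_or_default(),
        stderr: stderr.join().unwrap_or_default(),
    })
}
```

If the child spawns grandchildren that inherit the pipes, killing the direct child alone does not close them. In that case the readers keep waiting until those processes exit.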
Snapshot the full audit event with its dynamic fields (ids, timestamps, latency) redacted, so failures show a structured diff of the whole event rather than a bare string.
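The sketch below shows the redaction idea without the snapshot library. Dynamic values are swapped for stable placeholders, and the event is rendered one field per line, so a failing comparison diffs field by field. `AuditEvent`, `redact`, `render` and the field names are assumptions. With `insta` itself, selector-based redactions (behind its `redactions` feature) serve the same purpose.

```rust
use std::collections::BTreeMap;

/// Hypothetical flattened audit event: field name -> rendered value.
type AuditEvent = BTreeMap<String, String>;

/// Fields that differ on every run; these names are assumptions.
const DYNAMIC_FIELDS: &[&str] = &["id", "timestamp", "latency_ms"];

/// Replace dynamic values with stable placeholders so the rest can be compared.
fn redact(event: &AuditEvent) -> AuditEvent {
    event
        .iter()
        .map(|(key, value)| {
            let stable = if DYNAMIC_FIELDS.contains(&key.as_str()) {
                format!("[{key}]")
            } else {
                value.clone()
            };
            (key.clone(), stable)
        })
        .collect()
}

/// Render one `key = value` line per field, sorted by key (BTreeMap order),
/// so a mismatch shows up as a line-level diff.
fn render(event: &AuditEvent) -> String {
    event.iter().map(|(k, v)| format!("{k} = {v}\n")).collect()
}

/// Convenience constructor for examples.
fn event(pairs: &[(&str, &str)]) -> AuditEvent {
    pairs.iter().map(|(k, v)| (k.to_string(), v.to_string())).collect()
}
```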
- `.config/nextest.toml`: e2e profile builds firma automatically unless `FIRMA_BIN` is set to a prebuilt binary
- Makefile: add `make e2e` target
- README: drop firma binary prereq (handled by nextest), update run commands to use nextest
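The `FIRMA_BIN` override reduces to a small resolution rule. The sketch below uses a hypothetical `resolve_firma_bin` and treats an empty value as unset, which is an assumption.

```rust
use std::ffi::OsString;
use std::path::PathBuf;

/// Hypothetical helper: prefer a prebuilt binary from FIRMA_BIN; otherwise fall
/// back to the binary the e2e profile built. An empty value counts as unset.
fn resolve_firma_bin(firma_bin: Option<OsString>, built: PathBuf) -> PathBuf {
    match firma_bin {
        Some(path) if !path.is_empty() => PathBuf::from(path),
        _ => built,
    }
}
```

In an integration test of the `firma` crate, one possible fallback is `env!("CARGO_BIN_EXE_firma")`. Cargo sets this for integration tests when the package has a binary target named `firma`. A call would then look like `resolve_firma_bin(std::env::var_os("FIRMA_BIN"), PathBuf::from(env!("CARGO_BIN_EXE_firma")))`. Whether the harness actually does this is an assumption.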
# Conflicts:
# Makefile
Codecov Report: ❌ Patch coverage is
Ahead of merging, let's make sure that it succeeds in CI via workflow dispatch.
# Conflicts:
# Cargo.lock
# Makefile
# crates/firma-sidecar/src/config/enforcement.rs
Summary
Black-box E2E harness validating the OpenFirma enforcement boundary against real coding agents (Claude Code + Codex CLI).
- `tests/e2e/`: new test target, hooked into the `firma` crate via `[[test]]` (no separate workspace member).
- `firma run`. Confirms the expected ALLOW/DENY outcome + correct audit events.
- `wiremock` `MockServer`; LLM provider calls are stubbed, so tests need no live API.
- `insta` snapshots (per agent + scenario).
- `scenario_tests!` macro generates one `#[tokio::test]` per `(agent, scenario)`; all are `#[ignore]`, so run with `--include-ignored`.
- `simple_prompt` (greeting → LLM provider → ALLOW). Remaining scenarios land in the follow-up PR.

Modules
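The split between `scenario.rs` (the `EnforcementScenario` trait and `PhaseOutput`) and `runner.rs` (`run_scenario`, which drives a baseline phase and an enforcement phase) can be sketched synchronously as follows. The trait methods, the `PhaseOutput` fields and the `Phase` enum are assumptions. So is reading "baseline" as running the agent without firma, and the prompt text is made up.

```rust
/// What one phase produced (fields are illustrative assumptions).
#[derive(Debug, Clone, PartialEq)]
pub struct PhaseOutput {
    pub exit_code: i32,
    pub stdout: String,
    /// Decisions read back from the audit log; expected empty without firma.
    pub audit_decisions: Vec<String>,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Phase {
    /// Assumed: agent run without firma, to show the scenario itself works.
    Baseline,
    /// Assumed: the same prompt, run under `firma run`.
    Enforcement,
}

pub trait EnforcementScenario {
    fn name(&self) -> &'static str;
    fn prompt(&self) -> &'static str;
    fn check_baseline(&self, out: &PhaseOutput) -> Result<(), String>;
    fn check_enforcement(&self, out: &PhaseOutput) -> Result<(), String>;
}

/// Drive both phases in order; the agent launcher is injected so this is testable.
pub fn run_scenario<S, F>(scenario: &S, mut run_agent: F) -> Result<(), String>
where
    S: EnforcementScenario,
    F: FnMut(Phase, &str) -> PhaseOutput,
{
    let baseline = run_agent(Phase::Baseline, scenario.prompt());
    scenario
        .check_baseline(&baseline)
        .map_err(|e| format!("{} baseline: {e}", scenario.name()))?;
    let enforced = run_agent(Phase::Enforcement, scenario.prompt());
    scenario
        .check_enforcement(&enforced)
        .map_err(|e| format!("{} enforcement: {e}", scenario.name()))
}

/// simple_prompt: a greeting should reach the LLM provider and be ALLOWed.
pub struct SimplePrompt;

impl EnforcementScenario for SimplePrompt {
    fn name(&self) -> &'static str {
        "simple_prompt"
    }
    fn prompt(&self) -> &'static str {
        "Say hello." // hypothetical prompt text
    }
    fn check_baseline(&self, out: &PhaseOutput) -> Result<(), String> {
        if out.exit_code == 0 && !out.stdout.is_empty() {
            Ok(())
        } else {
            Err(format!("agent failed (exit code {})", out.exit_code))
        }
    }
    fn check_enforcement(&self, out: &PhaseOutput) -> Result<(), String> {
        if out.audit_decisions.iter().any(|d| d == "ALLOW") {
            Ok(())
        } else {
            Err("no ALLOW decision in the audit log".to_string())
        }
    }
}
```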
| Module | Contents |
|---|---|
| `setup.rs` | `ScenarioSetup` builder, `FirmaConfigBuilder`, git workspace init, mock server |
| `scenario.rs` | `EnforcementScenario` trait + `PhaseOutput` |
| `runner.rs` | `run_scenario`: drives baseline + enforcement phases |
| `agent.rs` | `AgentKind` (Claude/Codex), spawn args |
| `audit.rs` | `FirmaAuditTrail`: parses the JSONL audit log, snapshot assertions |
| `config.rs` / `policy.rs` | |
| `scenarios/simple_prompt.rs` | `simple_prompt` scenario |

Supporting changes
- `e2e-tests.yml` CI: matrix `ubuntu-latest` (bwrap) × `macos-latest` (vz) × `claude` + `codex`; triggered on `v*.*.*` tags + `workflow_dispatch`; actions pinned to commit SHAs.
- `e2e` profile + `make e2e` entry point; builds `firma` (debug) unless `FIRMA_BIN` is set.
- `firma-run` fixes for the spawned sidecar config: pin `ca.dir` to the marker dir, strip TLS and assign an ephemeral loopback port in `resolve_persisted_paths`, wrap the authority config in `[authority]`.
- `*.chatgpt.com` subdomains as `communication.external.send`.

Run
Prerequisites
- `claude` or `codex`
- `bwrap` on Linux; `vz` sandbox on macOS (OS-provided)
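A quick preflight for these prerequisites might look like the following. `find_on_path` and `missing_prereqs` are hypothetical helpers. They only check that a file with the right name exists on `PATH`, not that it is executable.

```rust
use std::env;
use std::ffi::OsStr;
use std::path::PathBuf;

/// Return the first PATH entry containing a file called `program`.
/// Simplified: checks existence only, not the executable bit or Windows extensions.
fn find_on_path(program: &str, path_var: &OsStr) -> Option<PathBuf> {
    env::split_paths(path_var)
        .map(|dir| dir.join(program))
        .find(|candidate| candidate.is_file())
}

/// Hypothetical preflight: list the prerequisites missing on this platform.
fn missing_prereqs(agent: &str, path_var: &OsStr) -> Vec<String> {
    let mut needed = vec![agent];
    if cfg!(target_os = "linux") {
        // Linux uses bubblewrap; the macOS vz sandbox is OS-provided.
        needed.push("bwrap");
    }
    needed
        .into_iter()
        .filter(|p| find_on_path(p, path_var).is_none())
        .map(str::to_string)
        .collect()
}
```

A real check would pass `std::env::var_os("PATH")` here, with `claude` or `codex` as the agent name.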