feat: multi-backend reviewers (fan-out) + pre-edit block + native host-hook packaging#2
feat: multi-backend reviewers (fan-out) + pre-edit block + native host-hook packaging#2metaphorics wants to merge 37 commits into
Conversation
Generalize the single-Codex reviewer into a data-driven backend registry that can fan out to N reviewers per event and merge their verdicts. Default REFLECTOR_BACKENDS=["codex"] short-circuits to the inline single call, keeping the codex path byte-identical (INV-CODEX-PATH-STABLE). - U1: Backend spec table + invoke_backend(); shared _codex_argv() used by both the codex reviewer row and the summarizer invoke_codex (byte-identity incl. CODEX_REFLECTOR_MODEL override + LIGHTNING effort bump). Reviewer call sites routed through invoke_backend; summarizer stays pinned to codex+FAST_MODEL. - U2: four prompt-delivery mechanisms, stdout/file capture, per-spec timeout, read-only levers per backend (codex --sandbox read-only; claude/grok --permission-mode plan; cursor --mode plan; grok --sandbox read-only; agy --sandbox + stdin=DEVNULL). No write-enabling flag on any path. - U3: fan_out() (ThreadPoolExecutor, N=1 short-circuit) + pure merge_verdicts() (drop infra-empty, parse per-reviewer raw, asymmetric lattice, empty-set sentinel) + resolve_backends() + REVIEWER_LABEL(set) + Stop delivery budget (_compact_output_stop caps compaction to 1 layer). - U4: codex-only effort gating; non-codex force backend-native default_model; model override is codex-scoped only. - U5: self-test extended to 129 cases (byte-identity x3, levers, merge lattice, resolve precedence, label, codex-scoped override, summarizer pinning). Security fix (adversarial review): build_bash_failure_prompt now wraps its attacker-controllable inputs in _sandbox_content (was _redact only) — closing a prompt-injection hole; sandbox audit widened to all 8 builders. Fixed claude/cursor-agent default_model (were inheriting the codex gpt-5.5 id) to sonnet / sonnet-4; added a guard that no non-codex backend uses DEFAULT_MODEL. Op: extend
Add an opt-in PreToolUse pre-flight review that can DENY a high-severity edit
before it lands. Off by default (REFLECTOR_PREEDIT_BLOCK); respond_pretooluse
gates-first and returns None before any work, and no PreToolUse entry is added
to committed hooks/hooks.json — so the default Claude Code path is unchanged
(INV-CODEX-PATH-STABLE). Opt-in installers wire the event later.
- build_pretooluse_prompt: reviews the PROPOSED edit (tool_input), never the
on-disk file or tool_response; _redact + _sandbox_content; higher FAIL bar
(block only high-confidence/high-severity; when in doubt PASS).
- respond_pretooluse: single primary reviewer (resolve_backends[0], never
fan_out); FAIL -> exit-0 stdout hookSpecificOutput.permissionDecision="deny"
(INV-DENY-STDOUT, never exit-2/stderr — the only channel Grok honors);
PASS/UNCERTAIN/empty -> None quiet allow (fail-open, never permissionDecision
="allow"); never writes fail-state (INV-PREBLOCK-NOSTATE).
- Deny-loop breaker (M-B): per-(file,edit-hash) counter in a SEPARATE
/tmp/codex-reflector-denies-{sid}.json (flock); after N denies of the same
edit, allow + advisory; cwd-derived fallback key when session_id is absent so
the breaker can't silently disable (e.g. under Grok).
- _ME_PRE_EDIT preset bounds effort/latency on the synchronous path.
Adversarial-review fixes: deny reason and the >400K prompt-build path hard-
truncate instead of running unbounded compaction (INV-STOP-DELIVERY on the
synchronous path — a host timeout mid-compaction would drop the deny and let
the edit land). _sandbox_content neutralizes the closing fence via an explicit
escape so a crafted proposed edit can't break out and force PASS
(applies to all builders). Deny label attributes the one reviewer that ran.
Self-test 149/149.
Op: extend
Introduce the Axis-B host seam so future hosts plug in without touching the router core (KTD-7). Two symmetric seams + a canonical schema: - resolve_host(payload): REFLECTOR_HOST env wins (validated), else infer from distinguishing payload keys, ambiguous -> claude identity (safe default). - _normalize_input(host, hook_data): host payload -> canonical Claude-shaped hook_data; cursor -> existing _normalize_cursor_input, claude/codex identity, grok/antigravity clean identity extension points for U10/U11. - Canonical namedtuple (systemMessage/additionalContext/reason/blocking/ permission_decision/permission_decision_reason + raw) + _to_canonical lift. - _render_host_output / _render_identity_output: the claude/codex/cursor renderer re-emits the responder dict VERBATIM from canonical.raw using the exact pre-refactor exit-code expression, so default-host output is byte- identical by construction (INV-CODEX-PATH-STABLE; verified by a 48-case differential oracle). - probe_backends (m3): absent reviewer binary -> visible notice, treated as infra-empty (never UNCERTAIN) so it can't wedge Stop. Also fixes a regression where gating _normalize_cursor_input on workspace keys under-detected bare Cursor payloads; resolve_host now keys on Cursor's camelCase event name. Self-test 190/190. Op: extend
…y (KTD-11) A present-but-logged-out stdout-capture backend (grok/claude/cursor-agent/agy) can exit nonzero while printing auth-error text to stdout. Without a returncode check that text was returned as a "verdict", parsed to UNCERTAIN, and on Stop (fail-closed) blocked the agent — the exact wedge KTD-11 forbids. Guard stdout-capture backends: nonzero exit -> "" (infra-empty, excluded from the merge). codex (file-capture) is unchanged, so INV-CODEX-PATH-STABLE holds. Op: correct Restores: spec:KTD-11
install-cursor.sh gains a --pre-edit flag that adds an opt-in preToolUse hook (maps via Cursor's _CURSOR_EVENT_MAP) so the pre-edit deny gate can run under Cursor. Default install is unchanged (4 events, no PreToolUse); the hook also self-gates on REFLECTOR_PREEDIT_BLOCK at runtime, so it is inert unless enabled. Cursor is otherwise already a working host via the U7 seam; .cursor-plugin/ plugin.json manifest already present. Cloud-Stop degradation + settings.json drift docs are consolidated in U12. Op: extend
_normalize_grok_input maps Grok's Claude-compat camelCase stdin envelope (hookEventName, workspaceRoot/CLAUDE_PROJECT_DIR, sessionId/conversationId) to canonical Claude-shaped fields. _render_grok_output is asymmetric per KTD-9: PreToolUse emits permissionDecision=deny on stdout (Grok's only honored hard-block channel); post/Stop/PreCompact are advisory — logged to a /tmp side-channel + best-effort systemMessage, never additionalContext, never exit 2. install-grok.sh wires the 5 hooks into Claude-compat discovery and enables REFLECTOR_PREEDIT_BLOCK unconditionally (Grok's sole enforcement seam). Wired both seams (_normalize_input, _render_host_output); self-test +18 grok cases (209/209). Firing+deny wire-shape is KTD-10 smoke-test-gated. Op: extend
…e (U11) Op: extend
Package the reflector to run inside Codex's native hook system and make the
failure-diagnostic flow live on Codex.
- .codex-plugin/plugin.json: Codex plugin manifest mirroring .claude-plugin,
declaring a hooks capability (-> ./hooks/codex-hooks.json).
- hooks/codex-hooks.json: native wiring for PostToolUse/Stop/PreCompact via a
committed wrapper (codex-reflector-hook.sh) using ${CLAUDE_PLUGIN_ROOT}.
PreToolUse pre-edit hard-block is OPT-IN, split into codex-hooks-preedit.json.
- scripts/install-codex.sh: idempotent installer (POSIX sh, python3 read-modify-
write, no jq). Merges ~/.codex/hooks.json and PRE-TRUSTS hook commands BY HASH
([hooks.state] in config.toml, backup + parse-validate + rollback). Generates
abs-path wrappers. --preedit opt-in; --dangerously-bypass-hook-trust footnote.
- B4: _normalize_codex_input re-emits PostToolUseFailure from error/non-zero
PostToolUse payloads so classify() routes to bash_failure / code_change_failure.
Wired host=="codex" in _normalize_input. INV-CODEX-PATH-STABLE preserved: a
non-error codex PostToolUse passes through byte-identical (ambiguous code/status
spellings excluded from exit-key detection).
- run_self_test: B4 re-emit + invariant cases; manifest/hooks JSON schema-shape
validation. 218/218 passed; ruff + shellharden clean.
Op: extend
Holistic /ce-code-review of the merged host layer (0 blockers) surfaced two shipped-behavior enforcement bugs + hardening: - Cursor --pre-edit shipped INERT: the wired preToolUse hook omitted REFLECTOR_PREEDIT_BLOCK=1, so respond_pretooluse gated off and never denied. Now the --pre-edit command sets it inline (Cursor runs hooks via a shell, like grok/codex); default install still wires no preToolUse hook. - Antigravity host-pin relied on an inline `VAR=val cmd` env-prefix the codex unit deliberately distrusted. If agy execs without shell interpretation, the Stop payload (no workspacePaths) infers host=claude and reads the bare /tmp fail-state file — silently losing the FAIL (INV-STOP-DELIVERY). Now mirrors codex: a committed wrapper (hooks/antigravity-reflector-hook.sh) and an installer-generated absolute-path wrapper `export REFLECTOR_HOST=antigravity`, referenced as `sh <wrapper>` (plain two-arg exec, no shell-interp dependency). - Refreshed the stale _normalize_input docstring (grok/antigravity now dispatch, not identity passthrough). - Extended the INV-READONLY forbidden-flag self-test set with --no-plan, --dangerously-bypass-approvals-and-sandbox, workspace-write, danger-full-access. Self-test 270/270; ruff clean; installers sh -n clean. Op: correct Restores: spec:INV-STOP-DELIVERY
…2 simplify) /ce-simplify-code pass: moved the _IDENTITY_HOSTS frozenset from the host-seam section down to just before its sole consumer _state_path, so the definition precedes use, and deleted the apologetic forward-reference comment that only existed to excuse the out-of-order reference. Byte-identical value, single consumer (the identity renderer dispatches by string literal, not the symbol); strictly behavior-preserving (adversarially verified). The simplify pass deliberately left the per-host normalizers/renderers separate (merging would hide host-specific wire-shape structure — speculative DRY) and the rare-path double parse_verdict alone (factoring needs restructuring). Self-test 270/270. Op: compress
…g checklist (U12) - CLAUDE.md: fixed the stale model-override gotcha (now codex/reviewer-scoped; summarizer pinned to FAST_MODEL ignoring the env vars); added the two-axis Reviewer×Host model, fan-out + merge_verdicts, the opt-in pre-edit gate, the env-var table, the named-invariants table, reviewer×host capability matrix, and per-host gaps/known-limitations. - README.md: user-facing reviewer selection/fan-out, pre-edit gate, per-host install, the matrix, per-host gaps, auth prerequisites, version snapshot. - scripts/test-readonly.sh (NEW): behavioral INV-READONLY harness — drives each reviewer with a write-attempt prompt in a fresh scratch repo and asserts the repo is unchanged; SKIPs absent/unauthed CLIs (nonce liveness probe) and sandbox-unenforceable kernels. Carries the KTD-10 per-host firing checklist. Live results: claude PASS, grok PASS, codex FAIL (see below), cursor/agy SKIP. The harness surfaced a CRITICAL pre-existing finding: the codex reviewer is NOT read-only — `_codex_argv` passes `--sandbox read-only --full-auto`, but `--full-auto` (deprecated on codex 0.137.0) resolves the sandbox to workspace-write, overriding read-only. Documented honestly in CLAUDE.md/README; the argv fix is deferred to an explicit decision (it touches the byte-identity self-test + changes default behavior on Landlock-restricted kernels). Op: extend
…ad-only (INV-READONLY) The U12 behavioral harness proved the DEFAULT codex reviewer was NOT read-only: _codex_argv passed `--sandbox read-only --full-auto`, but on codex 0.137.0 `--full-auto` is deprecated and resolves the effective sandbox to workspace-write, OVERRIDING read-only — so prompt-injection from reviewed content could make codex write the repo. Pre-existing (the original reflector shipped --full-auto; the multi-backend feature preserved it byte-identically). Removed --full-auto from _codex_argv (single source for the codex reviewer AND the codex-pinned summarizer). codex now runs read-only on a sandbox-capable host, or REFUSES to run where the kernel cannot set up its sandbox (Landlock EPERM) — the review fails open (no review, no writes), never unsandboxed. Harness confirms: codex now SKIPs on the Landlock-restricted test kernel instead of writing. Deliberate, security-justified break of the codex argv byte-identity: INV-READONLY wins over INV-CODEX-PATH-STABLE byte-identity. Updated the two byte-identity self-test assertions and synced scripts/test-readonly.sh + CLAUDE.md + README.md. Self-test 270/270; ruff + shellcheck clean. Op: correct Restores: spec:INV-READONLY
The docstring said grok 'falls back to identity until its renderer lands' — stale since U10; grok has a dedicated asymmetric renderer. Documentation-only, zero behavioral impact. Closes the final code-review minor. Self-test 270/270. Op: correct
…arning (ce-compound) The --full-auto-defeats-read-only bug as a compounding solution doc: argv-presence checks pass while the effective sandbox is workspace-write; behavioral write-attempt testing is the arbiter. Bug track, security category. Op: extend
One-line pointer so future agents search the learnings store before debugging recurring issues (ce-compound discoverability check). Op: extend
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
There was a problem hiding this comment.
Code Review
This pull request expands the Codex Reflector plugin to support multiple reviewer backends (such as Claude, Grok, and Antigravity) and multiple agent hosts, adding corresponding installer scripts, native hook configurations, and a behavioral read-only test harness. Feedback on the changes highlights a potential failure in the newly added shell wrappers (antigravity-reflector-hook.sh, codex-reflector-hook.sh, and codex-reflector-preedit-hook.sh) if both CLAUDE_PLUGIN_ROOT and PLUGIN_ROOT are unset, and suggests robust relative path fallbacks to prevent execution errors.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
✅ Files skipped from review due to trivial changes (1)
📝 WalkthroughWalkthroughConverts Codex Reflector into a multi-host reflector: adds plugin manifest, per-host hook wiring and wrappers, host installers (idempotent/trusted), optional pre-edit hard-block wiring, read-only verification and installer idempotency tests, and extensive docs and security conventions. ChangesMulti-Backend Reflector Plugin
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Pull request overview
This PR expands the reflector from a single Codex-based reviewer running in Claude Code into a two-axis system: (1) selectable, stackable reviewer backends with concurrent fan-out and merged verdicts, and (2) host-specific adapters/installers so the reflector can run inside multiple agent hosts (Claude Code, Codex, Cursor, Grok, Antigravity), including an opt-in synchronous pre-edit deny gate.
Changes:
- Added a backend registry + fan-out execution in
scripts/codex-reflector.py, with deterministic merge semantics and host input/output seams (including Grok and Antigravity renderers). - Introduced an opt-in
PreToolUsepre-edit hard-block gate (REFLECTOR_PREEDIT_BLOCK=1) with a deny-loop breaker and strict fail-open behavior on uncertainty/infra failures. - Added behavioral read-only verification harness and new host installers/packaging artifacts (Codex native hooks + trust-by-hash, Grok/Cursor Claude-compat wiring, Antigravity JSON hook wiring + enablement).
Reviewed changes
Copilot reviewed 15 out of 16 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| scripts/test-readonly.sh | New behavioral harness to verify reviewer “read-only” behavior via write-attempts on a scratch repo. |
| scripts/install-grok.sh | New installer wiring Grok via Claude-compat hooks, always enabling pre-edit blocking. |
| scripts/install-cursor.sh | Adds --pre-edit option to wire the opt-in PreToolUse gate for Cursor. |
| scripts/install-codex.sh | New installer for Codex native hooks, with wrapper scripts and best-effort trust-by-hash. |
| scripts/install-antigravity.sh | New installer for Antigravity JSON hooks plus enable_json_hooks settings toggle. |
| scripts/codex-reflector.py | Major refactor: backend registry, fan-out + merge, pre-edit gate, host normalization/rendering seams, and read-only/security hardening. |
| README.md | Updated docs for multi-backend reviewers, pre-edit gate, per-host install, and limitations. |
| hooks/codex-reflector-preedit-hook.sh | Opt-in pre-edit wrapper for Codex plugin-dir packaging. |
| hooks/codex-reflector-hook.sh | Codex plugin-dir wrapper pinning REFLECTOR_HOST=codex. |
| hooks/codex-hooks.json | Codex native hook wiring (default events) for plugin-dir packaging. |
| hooks/codex-hooks-preedit.json | Separate opt-in Codex PreToolUse wiring fragment. |
| hooks/antigravity-reflector-hook.sh | Antigravity plugin-dir wrapper pinning REFLECTOR_HOST=antigravity. |
| docs/solutions/security/readonly-sandbox-levers-need-behavioral-verification.md | New security writeup documenting the --full-auto sandbox override issue and behavioral testing approach. |
| CLAUDE.md | Expanded architecture/invariants documentation for reviewer×host axes and new safety contracts. |
| antigravity/hooks.json | Antigravity JSON hook wiring (plugin-dir artifact) pinning REFLECTOR_HOST=antigravity. |
| .codex-plugin/plugin.json | Codex plugin manifest for native hook packaging. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
scripts/install-cursor.sh (1)
136-155:⚠️ Potential issue | 🟠 Major | ⚡ Quick winPre-edit hook entries are not deduplicated in merge mode.
Line 153 removes only hooks matching
codex_command, butPreToolUseusesREFLECTOR_PREEDIT_BLOCK=1 .... That leaves stale pre-edit hooks and duplicates on repeated--pre-editinstalls.Proposed fix
codex_command="python3 \"${reflector_script}\"" +codex_preedit_command="REFLECTOR_PREEDIT_BLOCK=1 ${codex_command}" @@ - jq --arg codex_command "$codex_command" -s ' - def without_codex($command): + jq --arg codex_command "$codex_command" --arg codex_preedit_command "$codex_preedit_command" -s ' + def without_codex($command; $preedit): map( if has("hooks") then - . + {hooks: (.hooks | map(select((.command // "") != $command)))} + . + {hooks: (.hooks | map(select(((.command // "") != $command) and ((.command // "") != $preedit))))} else . end ) @@ - (($existing.hooks[$key] // []) | without_codex($codex_command)) + (($existing.hooks[$key] // []) | without_codex($codex_command; $codex_preedit_command)) + ($codex.hooks[$key] // [])🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/install-cursor.sh` around lines 136 - 155, The merge leaves duplicate PreToolUse/pre-edit entries because without_codex($codex_command) only removes exact matches on $codex_command but PreToolUse adds entries that include REFLECTOR_PREEDIT_BLOCK=1; update the jq pipeline around without_codex and the final hooks merge to filter out any hook whose (.command // "") equals $codex_command OR contains "REFLECTOR_PREEDIT_BLOCK" (or otherwise normalize by applying unique_by(.command) / unique on command strings after merging), so modify the without_codex function and/or the reduce that builds hooks to exclude those patterns and/or deduplicate by .command to prevent repeated pre-edit hooks.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@CLAUDE.md`:
- Around line 102-106: The fenced code block containing the flow diagram
starting with "stdin JSON → resolve_host() → _normalize_input(host) → classify()
→ _gate_model_effort(codex-only) → ..." is missing a language tag; change the
opening fence from ``` to ```text so the block is typed (e.g., replace the
untyped fenced block with one that begins with ```text) to satisfy markdownlint
MD040.
In `@scripts/install-antigravity.sh`:
- Around line 90-117: Normalize settings_path and hooks_path to absolute paths
before computing settings_dir/hooks_dir and before creating wrapper_path and
hook_command; for example, resolve settings_path and the chosen hooks_path (from
AGY_HOOKS or default) with a realpath/readlink -f step so settings_dir,
hooks_dir and wrapper_path are all absolute, then write the wrapper and set
hook_command using that absolute wrapper_path (ensure hook_command uses the
resolved wrapper_path variable). Update references around settings_dir,
hooks_path, hooks_dir, wrapper_path and hook_command so all path variables are
the resolved absolute values.
In `@scripts/install-codex.sh`:
- Around line 161-162: The merge code is always including preedit_command in the
local list named "ours" (currently set as "ours = {hook_command,
preedit_command}"), which causes existing pre-edit hooks to be stripped when
reinstalling without the --preedit flag; change the construction of "ours" so
that preedit_command is only included when the pre-edit feature is enabled
(e.g., check the existing preedit flag/variable such as "preedit" or "--preedit"
and append preedit_command to ours conditionally), and keep the downstream merge
logic that compares "ours" and existing hooks unchanged so that a reinstall
without preedit does not remove an existing pre-edit hook.
---
Outside diff comments:
In `@scripts/install-cursor.sh`:
- Around line 136-155: The merge leaves duplicate PreToolUse/pre-edit entries
because without_codex($codex_command) only removes exact matches on
$codex_command but PreToolUse adds entries that include
REFLECTOR_PREEDIT_BLOCK=1; update the jq pipeline around without_codex and the
final hooks merge to filter out any hook whose (.command // "") equals
$codex_command OR contains "REFLECTOR_PREEDIT_BLOCK" (or otherwise normalize by
applying unique_by(.command) / unique on command strings after merging), so
modify the without_codex function and/or the reduce that builds hooks to exclude
those patterns and/or deduplicate by .command to prevent repeated pre-edit
hooks.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: d932802a-93a3-4378-bca8-bf6ebb94ef2d
📒 Files selected for processing (16)
.codex-plugin/plugin.jsonCLAUDE.mdREADME.mdantigravity/hooks.jsondocs/solutions/security/readonly-sandbox-levers-need-behavioral-verification.mdhooks/antigravity-reflector-hook.shhooks/codex-hooks-preedit.jsonhooks/codex-hooks.jsonhooks/codex-reflector-hook.shhooks/codex-reflector-preedit-hook.shscripts/codex-reflector.pyscripts/install-antigravity.shscripts/install-codex.shscripts/install-cursor.shscripts/install-grok.shscripts/test-readonly.sh
When both CLAUDE_PLUGIN_ROOT and PLUGIN_ROOT are unset/empty, plugin_root collapsed to "" and the wrapper exec'd /scripts/codex-reflector.py at the filesystem root (fails). Fall back to the script's own parent dir (hooks/.. = repo root) so direct/absolute invocation of the wrapper still resolves the reflector. Addresses gemini-code-assist PR #2 review. Note: the ${X:-$(...)} default only evaluates when the var is unset, so the normal (env-set) path is unchanged. This does NOT rescue `sh "$EMPTY/hooks/..."` — that resolves to /hooks/... and fails in sh before this script runs. Op: correct Restores: spec:wrapper-resolves-reflector-path
_normalize_input dispatches host=="antigravity" to _normalize_antigravity_input (codex-reflector.py:2882), so the comment claiming "antigravity has no normalizer yet" was stale. The test still holds because the normalizer is a no-op on already-Claude-shaped input; the comment now states that idempotency property. Addresses Copilot PR #2 review. Op: correct Restores: spec:comment-matches-code
A relative --hooks-path or AGY_HOOKS baked a relative path into the wrapper's hook_command (`sh "rel/.../hook.sh"`), which fails once agy runs the hook from its own cwd. Re-resolve settings_dir/hooks_dir via `cd && pwd -P` after mkdir (the idiom install-codex.sh already uses) and rebuild the *_path vars, so the generated command is always absolute. Verified: relative --hooks-path now emits an absolute hook_command. Addresses CodeRabbit PR #2 (Major). Op: correct Restores: spec:hook-command-absolute-path
`ours` (the set of commands the idempotent merge strips before re-adding) unconditionally included preedit_command, but `desired` only re-adds the PreToolUse hook when --preedit is passed. So a plain reinstall stripped a previously opted-in pre-edit hard-block without re-adding it. Only claim ownership of preedit_command when pre-edit is enabled this run. Verified: install --preedit then plain reinstall preserves PreToolUse; PostToolUse stays a single group. Addresses CodeRabbit PR #2 (Major). Op: correct Restores: spec:preedit-hook-survives-default-reinstall
The data-flow fenced block was untyped; markdownlint MD040 flags it. Addresses CodeRabbit PR #2 (Minor). Op: correct Restores: spec:fenced-blocks-have-language
Two compress-only, behavior-preserving simplifications surfaced by an axis-decomposed simplify pass and confirmed by adversarial audit: - parse_verdict computed raw.strip() twice; bind once as `stripped` (raw is a never-reassigned param, so both sites were provably equal). - _read_tail had two comments narrating one readline(); keep the richer inline one, drop the redundant standalone. Oracle: --test-parse 270/270, ruff clean. Op: compress
…rawl) spec_needs_outfile() was a pure single-caller forwarder for `spec.output_capture == "file"` — and its sibling _build_backend_argv already inlines that exact test. Inline it at the one call site and drop the wrapper, so the file-capture predicate has one consistent form. Oracle: --test-parse 270/270, ruff clean. grep confirms no residual refs. Op: compress
respond_code_review / respond_plan_review / respond_subagent_review each hand-inlined the identical FAIL-write / PASS-clear / UNCERTAIN-no-op block (differing only in tool_name + key). CLAUDE.md names this exact triple under the "UNCERTAIN preserves prior state" invariant — a shared contract enforced only by copy-paste, free to drift. Extract _apply_verdict_state() as the single owner; the load-bearing UNCERTAIN WHY comment moves into it once. It calls write_fail_state/clear_fail_state by global name so the self-test's module-scope monkeypatch still intercepts (verified). Oracle: --test-parse 270/270, ruff clean. Op: compress
install-cursor.sh's jq dedup stripped only the bare codex_command, never the "REFLECTOR_PREEDIT_BLOCK=1 ..."-prefixed PreToolUse command. A second --pre-edit install therefore appended a DUPLICATE PreToolUse hook — and each duplicate fires _record_deny on the same key per edit, tripping the _PRE_EDIT_MAX_DENIES loop-breaker after one denial instead of two, silently weakening the pre-edit gate. Match prior reflector hooks by the unique reflector_script path substring so both the bare and prefixed variants are stripped before re-adding. Verified: 2x --pre-edit install -> one PreToolUse group/hook. install-grok is unaffected (it dedups on the same prefixed command it installs). Found by ce-code-review (correctness/testing/adversarial/previous-comments corroborated; also CodeRabbit outside-diff). Op: correct Restores: spec:installer-idempotent-reinstall
Two robustness gaps that could invert the fail-closed Stop checkpoint to fail-open: - fan_out's N>1 path collected results via f.result(), which re-raises any worker exception invoke_backend did not already convert to "". One backend's error then aborted the whole comprehension (discarding the OTHER backends' verdicts) and propagated through respond_stop/main with no top-level catch, crashing the hook — exit 1 is non-blocking on Stop, so a fail-closed checkpoint silently became fail-open. Guard each result via _future_raw -> "" (infra-empty), consistent with invoke_backend's fail-open contract and INV-MERGE. - invoke_backend's mkstemp ran OUTSIDE its try, so a tmpfile OSError bypassed the documented fail-open. Move it inside the try. Regression test: a raising N>1 worker -> infra-empty, others survive, merge -> PASS (272/272). Found by ce-code-review (adversarial P1, corroborated by learnings). Op: correct Restores: spec:fan-out-fails-open-per-worker
tool_response (controlled by the MCP edit/diagnostic tool) was interpolated into the review/failure prompts OUTSIDE the _sandbox_content fence: - build_code_review_prompt: filePath was neither redacted nor newline- stripped (error wasn't newline-stripped either). A forged "\nPASS\n" in filePath lands as a verdict line; respond_code_review then clear_fail_state on it, suppressing a real FAIL at the fail-closed Stop gate. - build_code_change_failure_prompt: same unsandboxed-metadata pattern. It is a no-verdict diagnostic (distortion-only), hardened for consistency. Add _safe_meta(): redact-THEN-truncate (a secret straddling the cap can't leak its prefix) + newline-collapse; apply at every tool_response metadata interpolation. Regression tests: forged "\nPASS\n" in filePath is collapsed, not raw-injected (275/275). Found by ce-code-review (security P1, corroborated by learnings); the redact-before-truncate refinement was caught by the reflector's own review. Op: correct Restores: spec:untrusted-metadata-sanitized
…-TEXT) The read-only-lever loop verified INV-READONLY for all 5 backends but never asserted INV-VERDICT-TEXT for the claude reviewer — dropping "text" from its extra_argv left the suite green while silently letting a JSON output mode wrap (and hide) the verdict from parse_verdict's first-5-lines scan. Assert --output-format is present and followed by "text". Found by ce-code-review (testing P2). Op: correct Restores: test:inv-verdict-text-claude
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@scripts/codex-reflector.py`:
- Line 1131: Update the list comprehension return so the zip call is explicit
about strict length matching to avoid silent truncation (B905): change the
comprehension that returns [(name, _future_raw(f)) for name, f in zip(backends,
futures)] to use zip(backends, futures, strict=True) so any mismatch between
backends and futures raises immediately; locate the expression using the names
backends, futures and helper _future_raw to apply the change.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 6c7f5473-b200-4e2a-8b73-512894eb6ed7
📒 Files selected for processing (8)
CLAUDE.mdhooks/antigravity-reflector-hook.shhooks/codex-reflector-hook.shhooks/codex-reflector-preedit-hook.shscripts/codex-reflector.pyscripts/install-antigravity.shscripts/install-codex.shscripts/install-cursor.sh
✅ Files skipped from review due to trivial changes (1)
- hooks/codex-reflector-hook.sh
🚧 Files skipped from review as they are similar to previous changes (6)
- hooks/antigravity-reflector-hook.sh
- hooks/codex-reflector-preedit-hook.sh
- scripts/install-cursor.sh
- scripts/install-antigravity.sh
- scripts/install-codex.sh
- CLAUDE.md
…d absent When a host omitted session_id, _atomic_update_state/_read_state bailed on the empty id, so a PostToolUse FAIL was silently never recorded and Stop never blocked on it — while respond_pretooluse already synthesized a nosession-<hash(cwd)> id. Add _resolve_session_id() (same scheme) and route BOTH the PostToolUse write (main) and the Stop read (respond_stop) through it, so write/read agree on a stable per-cwd key. An id the host DID send is returned unchanged (INV-CODEX-PATH-STABLE). Regression: respond_stop blocks on a missing-session FAIL via the pending- FAIL fast path (read-side wiring guarded end-to-end); empty-id drop asserted (281/281). main()'s one-line write-side resolution is not unit-testable (sys.exit) and shares the same helper. Found by ce-code-review (adversarial P3, pre-existing); the read-both-paths refinement was caught by the reflector's own review. Op: correct Restores: spec:fail-state-recorded-without-session-id
The cursor --pre-edit duplicate-hook bug (afb4478) reached review only because nothing tested installer idempotency — --test-parse is hermetic and never shells out to the installers. Add scripts/test-install-idempotent.sh: run each merging installer twice against a scratch dir and assert the hook set is stable AND free of duplicate commands per event (a count-only check would miss a duplicate already present after run 1). Covers cursor/grok/ codex/antigravity; skips jq-less installers (skip != pass); cleans scratch dirs via a trap. shellcheck-clean; negative control confirms the dupe check fails on a planted duplicate. Document it in CLAUDE.md + README; bump the documented self-test count 270 -> 281 (tests added across the review fixes). Found by ce-code-review (testing P2). Op: extend
The cursor idempotency fix (afb4478) switched the jq dedup to $reflector_script, leaving codex_command with no remaining reference — the python heredoc derives its command from sys.argv[1] independently. shellcheck SC2034. Remove it. Op: compress
…n_id (Graft) The _resolve_session_id helper (fc9ab2b) unified the nosession-<hash(cwd)> fallback for the fail-state write/read paths, but its original mirror site — respond_pretooluse's deny-loop key — kept the inline copy, so the formula lived in two places (free to drift). Route it through the helper too; now all three session-key sites share one owner. Behavior-identical (the inline expression was byte-for-byte the helper body). Docstring updated: the helper is the canonical owner, no longer "mirroring" respond_pretooluse. Oracle: --test-parse 281/281, ruff clean. Found by ce-simplify-code (reuse + quality + efficiency all flagged it). Op: compress
The idempotency fix (afb4478) used contains($reflector_script) to strip prior hooks — but that ALSO strips a user's own hook that merely references the reflector path (e.g. `mylogger python3 ".../codex-reflector.py" --tail`), clobbering it on every reinstall. Match the two EXACT commands this installer emits (bare + REFLECTOR_PREEDIT_BLOCK=1 variant); this re-adds codex_command (and a pre_edit_command sibling) that 18d7fef removed once the substring approach orphaned it. Still idempotent for --pre-edit and plain reinstalls. Harness gains an over-strip regression: a user hook referencing the path survives reinstall (now 5 checks). Found by ce-code-review round 2 (adversarial P2, reproduced) — a regression the first round's substring fix introduced. Op: correct Restores: spec:dedup-strips-only-own-commands
The metadata-sanitization fix (9a51a39) routed tool_response.* through _safe_meta but left the sibling `File: {file_path}` / `Tool: {tool_name}` header lines raw — same outside-the-fence position, same verdict-bearing prompt (build_code_review_prompt), same forged-"\nPASS\n" -> clear_fail_state injection class under a poisoned-agent threat model. Route them (and build_code_change_failure_prompt's File/Tool/Error lines) through _safe_meta, matching build_pretooluse_prompt's existing safe_path/safe_tool. Regression: a forged newline in tool_input.file_path's header is collapsed (282/282). Found by ce-code-review round 2 (security P1, corroborated by correctness). Op: correct Restores: spec:untrusted-metadata-sanitized
The two lists are built in lockstep so this is behavior-neutral, but strict= turns any future length desync into a hard error instead of silently truncating — a dropped backend on a fail-closed Stop would invert it to fail-open. Satisfies ruff B905 (CodeRabbit flagged it; the repo has no ruff config so default ruff didn't). Found by ce-code-review round 2 (correctness/adversarial/previous-comments; CodeRabbit PR #2). Op: correct Restores: spec:fan-out-no-silent-truncation
The idempotency harness used `set -e`, so a broken installer aborted the whole run before the remaining installers were tested and without a clear per-installer FAIL line. Drop `set -e` (keep `set -u`): a failed installer leaves a missing/unparseable settings file, report() reads it as -1 -> FAIL, and the harness continues through the rest of the matrix. Verified: happy path still 5/5; a missing settings file reports FAIL and continues. Found by ce-code-review round 2 (testing P3). Op: correct Restores: spec:harness-reports-each-installer
Five recur-prone learnings surfaced by this session's two ce-code-review rounds + simplify, each grounded in the commits that fixed them and adversarially audited (accuracy / schema / no-overlap): - security/unsandboxed-prompt-metadata-injection — untrusted metadata outside the _sandbox_content fence is a forged-verdict vector (_safe_meta; 9a51a39/5494478) - security/fan-out-worker-exception-inverts-fail-closed — a worker exception crashes the hook; exit 1 is non-blocking on Stop -> fail-open (_future_raw; cf19cad) - security/redact-before-truncate-ordering — truncate-first leaks a secret's prefix; redact before slicing (_safe_meta; 9a51a39) - conventions/installer-dedup-must-match-all-command-variants — dedup on the exact command-set emitted, not a path substring (install-cursor.sh; afb4478/5372732) - conventions/key-normalizer-helper-must-route-all-call-sites — a shared-key helper must route every read+write site or write/read silently disagree (_resolve_session_id; fc9ab2b/c667abc) Authored via an ultracode workflow (5 drafters -> adversarial audit); CLAUDE.md already surfaces docs/solutions/ for discovery. Op: extend
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
scripts/install-cursor.sh (1)
141-166:⚠️ Potential issue | 🔴 Critical | ⚡ Quick winCritical: Unconditional dedup strips opted-in pre-edit hook on plain reinstall.
The jq filter strips BOTH
$cmdand$precmdunconditionally (line 148), but the Python heredoc only emits the PreToolUse hook whenpre_edit==1(lines 117-132). This means runninginstall-cursor.sh(without--pre-edit) after a priorinstall-cursor.sh --pre-editrun will silently remove the user's opted-in PreToolUse hook.The strip-set must match the emit-set: strip
$precmdonly when--pre-editis passed, matching the Codex installer's conditional pattern (see context snippet frominstall-codex.shlines 164-167 whereoursconditionally includespreedit_command).🛡️ Proposed fix: conditional strip-set
Pass the
pre_editflag to jq and conditionally strip the pre-edit command:- jq --arg cmd "$codex_command" --arg precmd "$pre_edit_command" -s ' + jq --arg cmd "$codex_command" --arg precmd "$pre_edit_command" --argjson pre_edit "$pre_edit" -s ' # Strip ONLY our two exact generated commands (bare + pre-edit variant), - # so reinstall is idempotent for both --pre-edit and plain runs WITHOUT - # clobbering a user-authored hook that merely references the reflector path. - def without_reflector($cmd; $precmd): + # when --pre-edit is passed (matching emit-set). Plain runs strip only the + # bare command so they do not clobber a prior --pre-edit opt-in. + def without_reflector($cmd; $precmd; $pre_edit): map( if has("hooks") then - . + {hooks: (.hooks | map(select((.command // "") as $c | $c != $cmd and $c != $precmd)))} + . + {hooks: (.hooks | map(select((.command // "") as $c | + if $pre_edit then ($c != $cmd and $c != $precmd) + else ($c != $cmd) + end + )))} else . end ) | map(select((has("hooks") | not) or ((.hooks // []) | length > 0))); .[0] as $existing | .[1] as $codex | (($existing.hooks // {}) + ($codex.hooks // {}) | keys_unsorted | unique) as $keys | $existing + { hooks: reduce $keys[] as $key ({}; .[$key] = ( - (($existing.hooks[$key] // []) | without_reflector($cmd; $precmd)) + (($existing.hooks[$key] // []) | without_reflector($cmd; $precmd; $pre_edit)) + ($codex.hooks[$key] // []) ) ) }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/install-cursor.sh` around lines 141 - 166, The jq filter's without_reflector currently always removes both $cmd and $precmd, which deletes an opted-in PreToolUse hook when reinstalling without --pre-edit; modify the invocation and filter so the pre-edit command is only stripped when pre_edit is true: pass a boolean pre_edit flag into jq from the shell, change without_reflector($cmd; $precmd) to conditionally include $precmd only when pre_edit is true (e.g., call without_reflector($cmd; ($precmd | select($pre_edit)))) and update the jq call site that reads "$settings_path" and "$tmp_new" and writes "$tmp_merged" so it supplies the pre_edit value derived from the script's pre_edit variable.
🧹 Nitpick comments (1)
scripts/test-install-idempotent.sh (1)
89-119: ⚡ Quick winConsider adding cross-run idempotency test case.
The current Cursor test runs
--pre-edittwice, catching duplicate accumulation and user-hook over-stripping. Consider adding a test case that runs--pre-editthen plain (without the flag) to verify that an opted-in PreToolUse hook is preserved, catching the conditional strip-set bug flagged ininstall-cursor.sh.This would match the Codex installer's convention (context snippet 3, lines 164-167) where a plain reinstall does not remove a previously opted-in
--preedithook.💡 Proposed additional test case
After line 95, add:
# Cross-run preservation: opt-in with --pre-edit, then plain reinstall should # preserve the PreToolUse hook (not strip it). d2=$(mkdir_scratch) sh "$repo/scripts/install-cursor.sh" --pre-edit "$d2" >/dev/null 2>&1 r1_prehook=$(grep -c 'REFLECTOR_PREEDIT_BLOCK=1' "$d2/.claude/settings.json" || echo 0) sh "$repo/scripts/install-cursor.sh" "$d2" >/dev/null 2>&1 r2_prehook=$(grep -c 'REFLECTOR_PREEDIT_BLOCK=1' "$d2/.claude/settings.json" || echo 0) if [ "$r1_prehook" -gt 0 ] && [ "$r1_prehook" -eq "$r2_prehook" ]; then printf 'PASS install-cursor.sh preserves --pre-edit opt-in across plain reinstall\n' pass=$((pass + 1)) else printf 'FAIL install-cursor.sh stripped --pre-edit hook on plain reinstall (%s -> %s)\n' "$r1_prehook" "$r2_prehook" fail=$((fail + 1)) fi🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/test-install-idempotent.sh` around lines 89 - 119, Add a cross-run idempotency test that verifies an opted-in PreToolUse hook survives a plain reinstall: create a second scratch dir via mkdir_scratch, run sh "install-cursor.sh" --pre-edit on it, record presence/count of the REFLECTOR_PREEDIT_BLOCK (or equivalent PreToolUse marker) in "$d2/.claude/settings.json", then run sh "install-cursor.sh" (plain) on the same dir and re-check the marker; assert that the marker count before and after are equal and non-zero and fail the test if the plain reinstall stripped the PreToolUse hook. Use the existing pattern for reporting/pass/fail and reference install-cursor.sh, PreToolUse, and REFLECTOR_PREEDIT_BLOCK to locate where to add the snippet.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@scripts/install-cursor.sh`:
- Around line 141-166: The jq filter's without_reflector currently always
removes both $cmd and $precmd, which deletes an opted-in PreToolUse hook when
reinstalling without --pre-edit; modify the invocation and filter so the
pre-edit command is only stripped when pre_edit is true: pass a boolean pre_edit
flag into jq from the shell, change without_reflector($cmd; $precmd) to
conditionally include $precmd only when pre_edit is true (e.g., call
without_reflector($cmd; ($precmd | select($pre_edit)))) and update the jq call
site that reads "$settings_path" and "$tmp_new" and writes "$tmp_merged" so it
supplies the pre_edit value derived from the script's pre_edit variable.
---
Nitpick comments:
In `@scripts/test-install-idempotent.sh`:
- Around line 89-119: Add a cross-run idempotency test that verifies an opted-in
PreToolUse hook survives a plain reinstall: create a second scratch dir via
mkdir_scratch, run sh "install-cursor.sh" --pre-edit on it, record
presence/count of the REFLECTOR_PREEDIT_BLOCK (or equivalent PreToolUse marker)
in "$d2/.claude/settings.json", then run sh "install-cursor.sh" (plain) on the
same dir and re-check the marker; assert that the marker count before and after
are equal and non-zero and fail the test if the plain reinstall stripped the
PreToolUse hook. Use the existing pattern for reporting/pass/fail and reference
install-cursor.sh, PreToolUse, and REFLECTOR_PREEDIT_BLOCK to locate where to
add the snippet.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 829c40a9-d13c-4c6b-b738-dfd60dd045c0
📒 Files selected for processing (10)
CLAUDE.mdREADME.mddocs/solutions/conventions/installer-dedup-must-match-all-command-variants.mddocs/solutions/conventions/key-normalizer-helper-must-route-all-call-sites.mddocs/solutions/security/fan-out-worker-exception-inverts-fail-closed.mddocs/solutions/security/redact-before-truncate-ordering.mddocs/solutions/security/unsandboxed-prompt-metadata-injection.mdscripts/codex-reflector.pyscripts/install-cursor.shscripts/test-install-idempotent.sh
✅ Files skipped from review due to trivial changes (5)
- docs/solutions/conventions/installer-dedup-must-match-all-command-variants.md
- docs/solutions/conventions/key-normalizer-helper-must-route-all-call-sites.md
- docs/solutions/security/fan-out-worker-exception-inverts-fail-closed.md
- docs/solutions/security/redact-before-truncate-ordering.md
- README.md
🚧 Files skipped from review as they are similar to previous changes (1)
- CLAUDE.md
…on learning A sixth recur-prone learning, drafted in the prior session but never persisted before compaction (confirmed via session-history search). The review cycle flagged `cp --` / `readlink --` as non-POSIX "GNUisms"; both claims are false, and stripping the guard would re-introduce a dash-prefixed-path misparse bug. - conventions/end-of-options-double-dash-is-portable-and-load-bearing — `cp` shall conform to POSIX XBD 12.2 (Guideline 10); `readlink` is not in POSIX but GNU+BSD both accept `--` via getopt; reject-with-spec-citation is the correct disposition, not the "fix" Authored via a Full ce-compound run (3 research drafters + session-history search), then adversarially fact-checked against primary sources (Open Group POSIX Issue 8, GNU coreutils, FreeBSD) — all five load-bearing claims confirmed. CLAUDE.md/AGENTS.md already surface docs/solutions/ for discovery. Op: extend
|
@coderabbitai review |
✅ Action performedReview finished.
|
| settings_path="${settings_dir}/settings.json" | ||
| tmp_new=$(mktemp) | ||
| tmp_merged=$(mktemp) | ||
| trap 'rm -f "$tmp_new" "$tmp_merged"' EXIT INT HUP TERM |
| tmp_hooks=$(mktemp) | ||
| tmp_merged=$(mktemp) | ||
| tmp_settings=$(mktemp) | ||
| trap 'rm -f "$tmp_hooks" "$tmp_merged" "$tmp_settings"' EXIT INT HUP TERM |
| settings_path="${settings_dir}/settings.json" | ||
| tmp_new=$(mktemp) | ||
| tmp_merged=$(mktemp) | ||
| trap 'rm -f "$tmp_new" "$tmp_merged"' EXIT INT HUP TERM |
| mkdir_scratch() { | ||
| _d=$(mktemp -d) | ||
| tmpdirs="$tmpdirs $_d" | ||
| printf '%s' "$_d" | ||
| } |
Summary
Extends the reflector from a single OpenAI Codex reviewer into a two-axis system: pick (and stack) who reviews and run it inside any of five host agents.
-p:codex(default),claude,cursor-agent,grok,agy(Antigravity), selectable and concurrently fanned out (REFLECTOR_BACKENDS=codex,claude,…). Verdicts merge with an asymmetric lattice (any-FAIL→FAIL, else any-UNCERTAIN→UNCERTAIN, else PASS; infra-empty results excluded, never coerced to UNCERTAIN). Default["codex"]short-circuits to the inline single call — byte-identical to before..codex-plugin/+hooks/codex-hooks.json+ trust-by-hash installer), Cursor (.cursor-plugin/+ Claude-compat installer), Grok (advisory on post/Stop, real hard-block onPreToolUse), Antigravity (enable_json_hooks-gated, Stop-centric), alongside the existing Claude Code native hooks. A thin per-host input-normalizer + output-renderer seam keeps the router host-agnostic.REFLECTOR_PREEDIT_BLOCK=1adds aPreToolUsedeny gate (deny on exit-0 stdoutpermissionDecision, fail-open on UNCERTAIN, deny-loop breaker, never writes fail-state). Off by default; absent from the committedhooks/hooks.jsonso the default Claude Code path spawns no extra process.Security fix (notable)
The behavioral read-only harness (
scripts/test-readonly.sh, new) caught that the default codex reviewer was not actually read-only:_codex_argvpassed--sandbox read-only --full-auto, and--full-auto(deprecated on codex 0.137.0) resolves the effective sandbox toworkspace-write, overriding read-only. Removed--full-auto— codex now runs read-only or refuses to run (never unsandboxed). Pre-existing bug, preserved byte-identically by the feature's stability invariant until the new harness exposed it. Documented indocs/solutions/security/.Invariants (named, tested)
INV-CODEX-PATH-STABLE(default path stable, with the deliberate documented--full-autoexception),INV-MERGE,INV-DENY-STDOUT,INV-PREBLOCK-NOSTATE,INV-STOP-DELIVERY,INV-READONLY,INV-VERDICT-TEXT.Testing
python3 scripts/codex-reflector.py --test-parse→ 270/270 passed (was 41);ruffclean;shellcheckclean across all installers + wrappers + harness.scripts/test-readonly.sh, this env): claude PASS, grok PASS (write-attempt blocked), codex SKIP (read-only sandbox unenforceable on this Landlock-restricted kernel → refuses to run, safe), cursor-agent/agy SKIP (not authenticated here).Known limitations (honest)
enable_json_hooks) is documented as a manual checklist inscripts/test-readonly.sh— the adapters/installers are built and offline-tested; live firing needs an authed host.Plan:
docs/plans/(multi-backend reflectors). stdlib-only Python + POSIX-sh installers; no new dependencies.Summary by CodeRabbit
New Features
Documentation
Tests
Chores