Skip to content

feat: multi-backend reviewers (fan-out) + pre-edit block + native host-hook packaging#2

Open
metaphorics wants to merge 37 commits into
mainfrom
v2
Open

feat: multi-backend reviewers (fan-out) + pre-edit block + native host-hook packaging#2
metaphorics wants to merge 37 commits into
mainfrom
v2

Conversation

@metaphorics

@metaphorics metaphorics commented Jun 8, 2026

Copy link
Copy Markdown
Collaborator

Summary

Extends the reflector from a single OpenAI Codex reviewer into a two-axis system: pick (and stack) who reviews and run it inside any of five host agents.

  • Axis A — Reviewer(s) via -p: codex (default), claude, cursor-agent, grok, agy (Antigravity), selectable and concurrently fanned out (REFLECTOR_BACKENDS=codex,claude,…). Verdicts merge with an asymmetric lattice (any-FAIL→FAIL, else any-UNCERTAIN→UNCERTAIN, else PASS; infra-empty results excluded, never coerced to UNCERTAIN). Default ["codex"] short-circuits to the inline single call — byte-identical to before.
  • Axis B — Hosts: native hook packaging for Codex (.codex-plugin/ + hooks/codex-hooks.json + trust-by-hash installer), Cursor (.cursor-plugin/ + Claude-compat installer), Grok (advisory on post/Stop, real hard-block on PreToolUse), Antigravity (enable_json_hooks-gated, Stop-centric), alongside the existing Claude Code native hooks. A thin per-host input-normalizer + output-renderer seam keeps the router host-agnostic.
  • Opt-in pre-edit hard-block: REFLECTOR_PREEDIT_BLOCK=1 adds a PreToolUse deny gate (deny on exit-0 stdout permissionDecision, fail-open on UNCERTAIN, deny-loop breaker, never writes fail-state). Off by default; absent from the committed hooks/hooks.json so the default Claude Code path spawns no extra process.

Security fix (notable)

The behavioral read-only harness (scripts/test-readonly.sh, new) caught that the default codex reviewer was not actually read-only: _codex_argv passed --sandbox read-only --full-auto, and --full-auto (deprecated on codex 0.137.0) resolves the effective sandbox to workspace-write, overriding read-only. Removed --full-auto — codex now runs read-only or refuses to run (never unsandboxed). Pre-existing bug, preserved byte-identically by the feature's stability invariant until the new harness exposed it. Documented in docs/solutions/security/.

Invariants (named, tested)

INV-CODEX-PATH-STABLE (default path stable, with the deliberate documented --full-auto exception), INV-MERGE, INV-DENY-STDOUT, INV-PREBLOCK-NOSTATE, INV-STOP-DELIVERY, INV-READONLY, INV-VERDICT-TEXT.

Testing

  • python3 scripts/codex-reflector.py --test-parse270/270 passed (was 41); ruff clean; shellcheck clean across all installers + wrappers + harness.
  • Behavioral read-only (scripts/test-readonly.sh, this env): claude PASS, grok PASS (write-attempt blocked), codex SKIP (read-only sandbox unenforceable on this Landlock-restricted kernel → refuses to run, safe), cursor-agent/agy SKIP (not authenticated here).
  • Reviewed across multiple adversarial passes (0 blockers); see commits.

Known limitations (honest)

  • Antigravity is advisory-only until firing gates (Stop re-injection, PreToolUse deny) are confirmed live (KTD-10).
  • Non-codex behavioral read-only verified for claude/grok in-env; cursor-agent/agy verified only by argv + vendor-doc (rerun the harness where authed).
  • Per-host live firing (Codex trust gate, Cursor cloud-Stop, Grok deny wire-shape, agy enable_json_hooks) is documented as a manual checklist in scripts/test-readonly.sh — the adapters/installers are built and offline-tested; live firing needs an authed host.
  • install-codex.sh trust-by-hash is best-effort (Codex canonicalization undocumented); fails safe (wrong hash → Codex skips the hook, never wrong-trusts).

Plan: docs/plans/ (multi-backend reflectors). stdlib-only Python + POSIX-sh installers; no new dependencies.

Summary by CodeRabbit

  • New Features

    • Multi-host reflector support (Codex, Claude Code, Cursor, Grok, Antigravity)
    • Optional opt-in pre-edit hard-block review gate
    • New host-specific hook manifests and installer flows
  • Documentation

    • Expanded README and CLAUDE.md with host/reviewer contracts, fan-out/merge rules, safety guarantees, and operational checklists
    • Multiple new security guides (sandbox verification, redact-before-truncate, prompt-metadata injection, fan-out failure modes)
  • Tests

    • Read-only sandbox validation harness and installer idempotency test suite
  • Chores

    • Added hook wrappers and per-host installers; improved install idempotency and smoke-test guidance

Generalize the single-Codex reviewer into a data-driven backend registry that
can fan out to N reviewers per event and merge their verdicts. Default
REFLECTOR_BACKENDS=["codex"] short-circuits to the inline single call, keeping
the codex path byte-identical (INV-CODEX-PATH-STABLE).

- U1: Backend spec table + invoke_backend(); shared _codex_argv() used by both
  the codex reviewer row and the summarizer invoke_codex (byte-identity incl.
  CODEX_REFLECTOR_MODEL override + LIGHTNING effort bump). Reviewer call sites
  routed through invoke_backend; summarizer stays pinned to codex+FAST_MODEL.
- U2: four prompt-delivery mechanisms, stdout/file capture, per-spec timeout,
  read-only levers per backend (codex --sandbox read-only; claude/grok
  --permission-mode plan; cursor --mode plan; grok --sandbox read-only; agy
  --sandbox + stdin=DEVNULL). No write-enabling flag on any path.
- U3: fan_out() (ThreadPoolExecutor, N=1 short-circuit) + pure merge_verdicts()
  (drop infra-empty, parse per-reviewer raw, asymmetric lattice, empty-set
  sentinel) + resolve_backends() + REVIEWER_LABEL(set) + Stop delivery budget
  (_compact_output_stop caps compaction to 1 layer).
- U4: codex-only effort gating; non-codex force backend-native default_model;
  model override is codex-scoped only.
- U5: self-test extended to 129 cases (byte-identity x3, levers, merge lattice,
  resolve precedence, label, codex-scoped override, summarizer pinning).

Security fix (adversarial review): build_bash_failure_prompt now wraps its
attacker-controllable inputs in _sandbox_content (was _redact only) — closing a
prompt-injection hole; sandbox audit widened to all 8 builders. Fixed
claude/cursor-agent default_model (were inheriting the codex gpt-5.5 id) to
sonnet / sonnet-4; added a guard that no non-codex backend uses DEFAULT_MODEL.

Op: extend
Add an opt-in PreToolUse pre-flight review that can DENY a high-severity edit
before it lands. Off by default (REFLECTOR_PREEDIT_BLOCK); respond_pretooluse
gates-first and returns None before any work, and no PreToolUse entry is added
to committed hooks/hooks.json — so the default Claude Code path is unchanged
(INV-CODEX-PATH-STABLE). Opt-in installers wire the event later.

- build_pretooluse_prompt: reviews the PROPOSED edit (tool_input), never the
  on-disk file or tool_response; _redact + _sandbox_content; higher FAIL bar
  (block only high-confidence/high-severity; when in doubt PASS).
- respond_pretooluse: single primary reviewer (resolve_backends[0], never
  fan_out); FAIL -> exit-0 stdout hookSpecificOutput.permissionDecision="deny"
  (INV-DENY-STDOUT, never exit-2/stderr — the only channel Grok honors);
  PASS/UNCERTAIN/empty -> None quiet allow (fail-open, never permissionDecision
  ="allow"); never writes fail-state (INV-PREBLOCK-NOSTATE).
- Deny-loop breaker (M-B): per-(file,edit-hash) counter in a SEPARATE
  /tmp/codex-reflector-denies-{sid}.json (flock); after N denies of the same
  edit, allow + advisory; cwd-derived fallback key when session_id is absent so
  the breaker can't silently disable (e.g. under Grok).
- _ME_PRE_EDIT preset bounds effort/latency on the synchronous path.

Adversarial-review fixes: deny reason and the >400K prompt-build path hard-
truncate instead of running unbounded compaction (INV-STOP-DELIVERY on the
synchronous path — a host timeout mid-compaction would drop the deny and let
the edit land). _sandbox_content neutralizes the closing fence via an explicit
​ escape so a crafted proposed edit can't break out and force PASS
(applies to all builders). Deny label attributes the one reviewer that ran.
Self-test 149/149.

Op: extend
Introduce the Axis-B host seam so future hosts plug in without touching the
router core (KTD-7). Two symmetric seams + a canonical schema:

- resolve_host(payload): REFLECTOR_HOST env wins (validated), else infer from
  distinguishing payload keys, ambiguous -> claude identity (safe default).
- _normalize_input(host, hook_data): host payload -> canonical Claude-shaped
  hook_data; cursor -> existing _normalize_cursor_input, claude/codex identity,
  grok/antigravity clean identity extension points for U10/U11.
- Canonical namedtuple (systemMessage/additionalContext/reason/blocking/
  permission_decision/permission_decision_reason + raw) + _to_canonical lift.
- _render_host_output / _render_identity_output: the claude/codex/cursor
  renderer re-emits the responder dict VERBATIM from canonical.raw using the
  exact pre-refactor exit-code expression, so default-host output is byte-
  identical by construction (INV-CODEX-PATH-STABLE; verified by a 48-case
  differential oracle).
- probe_backends (m3): absent reviewer binary -> visible notice, treated as
  infra-empty (never UNCERTAIN) so it can't wedge Stop.

Also fixes a regression where gating _normalize_cursor_input on workspace keys
under-detected bare Cursor payloads; resolve_host now keys on Cursor's camelCase
event name. Self-test 190/190.

Op: extend
…y (KTD-11)

A present-but-logged-out stdout-capture backend (grok/claude/cursor-agent/agy)
can exit nonzero while printing auth-error text to stdout. Without a returncode
check that text was returned as a "verdict", parsed to UNCERTAIN, and on Stop
(fail-closed) blocked the agent — the exact wedge KTD-11 forbids. Guard
stdout-capture backends: nonzero exit -> "" (infra-empty, excluded from the
merge). codex (file-capture) is unchanged, so INV-CODEX-PATH-STABLE holds.

Op: correct
Restores: spec:KTD-11
install-cursor.sh gains a --pre-edit flag that adds an opt-in preToolUse hook
(maps via Cursor's _CURSOR_EVENT_MAP) so the pre-edit deny gate can run under
Cursor. Default install is unchanged (4 events, no PreToolUse); the hook also
self-gates on REFLECTOR_PREEDIT_BLOCK at runtime, so it is inert unless enabled.
Cursor is otherwise already a working host via the U7 seam; .cursor-plugin/
plugin.json manifest already present. Cloud-Stop degradation + settings.json
drift docs are consolidated in U12.

Op: extend
_normalize_grok_input maps Grok's Claude-compat camelCase stdin envelope
(hookEventName, workspaceRoot/CLAUDE_PROJECT_DIR, sessionId/conversationId)
to canonical Claude-shaped fields. _render_grok_output is asymmetric per
KTD-9: PreToolUse emits permissionDecision=deny on stdout (Grok's only
honored hard-block channel); post/Stop/PreCompact are advisory — logged to
a /tmp side-channel + best-effort systemMessage, never additionalContext,
never exit 2. install-grok.sh wires the 5 hooks into Claude-compat discovery
and enables REFLECTOR_PREEDIT_BLOCK unconditionally (Grok's sole enforcement
seam). Wired both seams (_normalize_input, _render_host_output); self-test
+18 grok cases (209/209). Firing+deny wire-shape is KTD-10 smoke-test-gated.

Op: extend
Package the reflector to run inside Codex's native hook system and make the
failure-diagnostic flow live on Codex.

- .codex-plugin/plugin.json: Codex plugin manifest mirroring .claude-plugin,
  declaring a hooks capability (-> ./hooks/codex-hooks.json).
- hooks/codex-hooks.json: native wiring for PostToolUse/Stop/PreCompact via a
  committed wrapper (codex-reflector-hook.sh) using ${CLAUDE_PLUGIN_ROOT}.
  PreToolUse pre-edit hard-block is OPT-IN, split into codex-hooks-preedit.json.
- scripts/install-codex.sh: idempotent installer (POSIX sh, python3 read-modify-
  write, no jq). Merges ~/.codex/hooks.json and PRE-TRUSTS hook commands BY HASH
  ([hooks.state] in config.toml, backup + parse-validate + rollback). Generates
  abs-path wrappers. --preedit opt-in; --dangerously-bypass-hook-trust footnote.
- B4: _normalize_codex_input re-emits PostToolUseFailure from error/non-zero
  PostToolUse payloads so classify() routes to bash_failure / code_change_failure.
  Wired host=="codex" in _normalize_input. INV-CODEX-PATH-STABLE preserved: a
  non-error codex PostToolUse passes through byte-identical (ambiguous code/status
  spellings excluded from exit-key detection).
- run_self_test: B4 re-emit + invariant cases; manifest/hooks JSON schema-shape
  validation. 218/218 passed; ruff + shellharden clean.

Op: extend
Holistic /ce-code-review of the merged host layer (0 blockers) surfaced two
shipped-behavior enforcement bugs + hardening:

- Cursor --pre-edit shipped INERT: the wired preToolUse hook omitted
  REFLECTOR_PREEDIT_BLOCK=1, so respond_pretooluse gated off and never denied.
  Now the --pre-edit command sets it inline (Cursor runs hooks via a shell, like
  grok/codex); default install still wires no preToolUse hook.
- Antigravity host-pin relied on an inline `VAR=val cmd` env-prefix the codex
  unit deliberately distrusted. If agy execs without shell interpretation, the
  Stop payload (no workspacePaths) infers host=claude and reads the bare /tmp
  fail-state file — silently losing the FAIL (INV-STOP-DELIVERY). Now mirrors
  codex: a committed wrapper (hooks/antigravity-reflector-hook.sh) and an
  installer-generated absolute-path wrapper `export REFLECTOR_HOST=antigravity`,
  referenced as `sh <wrapper>` (plain two-arg exec, no shell-interp dependency).
- Refreshed the stale _normalize_input docstring (grok/antigravity now dispatch,
  not identity passthrough).
- Extended the INV-READONLY forbidden-flag self-test set with --no-plan,
  --dangerously-bypass-approvals-and-sandbox, workspace-write, danger-full-access.

Self-test 270/270; ruff clean; installers sh -n clean.

Op: correct
Restores: spec:INV-STOP-DELIVERY
…2 simplify)

/ce-simplify-code pass: moved the _IDENTITY_HOSTS frozenset from the host-seam
section down to just before its sole consumer _state_path, so the definition
precedes use, and deleted the apologetic forward-reference comment that only
existed to excuse the out-of-order reference. Byte-identical value, single
consumer (the identity renderer dispatches by string literal, not the symbol);
strictly behavior-preserving (adversarially verified). The simplify pass
deliberately left the per-host normalizers/renderers separate (merging would
hide host-specific wire-shape structure — speculative DRY) and the rare-path
double parse_verdict alone (factoring needs restructuring). Self-test 270/270.

Op: compress
…g checklist (U12)

- CLAUDE.md: fixed the stale model-override gotcha (now codex/reviewer-scoped;
  summarizer pinned to FAST_MODEL ignoring the env vars); added the two-axis
  Reviewer×Host model, fan-out + merge_verdicts, the opt-in pre-edit gate, the
  env-var table, the named-invariants table, reviewer×host capability matrix, and
  per-host gaps/known-limitations.
- README.md: user-facing reviewer selection/fan-out, pre-edit gate, per-host
  install, the matrix, per-host gaps, auth prerequisites, version snapshot.
- scripts/test-readonly.sh (NEW): behavioral INV-READONLY harness — drives each
  reviewer with a write-attempt prompt in a fresh scratch repo and asserts the
  repo is unchanged; SKIPs absent/unauthed CLIs (nonce liveness probe) and
  sandbox-unenforceable kernels. Carries the KTD-10 per-host firing checklist.
  Live results: claude PASS, grok PASS, codex FAIL (see below), cursor/agy SKIP.

The harness surfaced a CRITICAL pre-existing finding: the codex reviewer is NOT
read-only — `_codex_argv` passes `--sandbox read-only --full-auto`, but
`--full-auto` (deprecated on codex 0.137.0) resolves the sandbox to
workspace-write, overriding read-only. Documented honestly in CLAUDE.md/README;
the argv fix is deferred to an explicit decision (it touches the byte-identity
self-test + changes default behavior on Landlock-restricted kernels).

Op: extend
…ad-only (INV-READONLY)

The U12 behavioral harness proved the DEFAULT codex reviewer was NOT read-only:
_codex_argv passed `--sandbox read-only --full-auto`, but on codex 0.137.0
`--full-auto` is deprecated and resolves the effective sandbox to workspace-write,
OVERRIDING read-only — so prompt-injection from reviewed content could make codex
write the repo. Pre-existing (the original reflector shipped --full-auto; the
multi-backend feature preserved it byte-identically).

Removed --full-auto from _codex_argv (single source for the codex reviewer AND
the codex-pinned summarizer). codex now runs read-only on a sandbox-capable host,
or REFUSES to run where the kernel cannot set up its sandbox (Landlock EPERM) —
the review fails open (no review, no writes), never unsandboxed. Harness confirms:
codex now SKIPs on the Landlock-restricted test kernel instead of writing.

Deliberate, security-justified break of the codex argv byte-identity: INV-READONLY
wins over INV-CODEX-PATH-STABLE byte-identity. Updated the two byte-identity
self-test assertions and synced scripts/test-readonly.sh + CLAUDE.md + README.md.
Self-test 270/270; ruff + shellcheck clean.

Op: correct
Restores: spec:INV-READONLY
The docstring said grok 'falls back to identity until its renderer lands' —
stale since U10; grok has a dedicated asymmetric renderer. Documentation-only,
zero behavioral impact. Closes the final code-review minor. Self-test 270/270.

Op: correct
…arning (ce-compound)

The --full-auto-defeats-read-only bug as a compounding solution doc: argv-presence
checks pass while the effective sandbox is workspace-write; behavioral write-attempt
testing is the arbiter. Bug track, security category.

Op: extend
One-line pointer so future agents search the learnings store before debugging
recurring issues (ce-compound discoverability check).

Op: extend
Copilot AI review requested due to automatic review settings June 8, 2026 21:08
@chatgpt-codex-connector

Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
To continue using code reviews, add credits to your account and enable them for code reviews in your settings.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request expands the Codex Reflector plugin to support multiple reviewer backends (such as Claude, Grok, and Antigravity) and multiple agent hosts, adding corresponding installer scripts, native hook configurations, and a behavioral read-only test harness. Feedback on the changes highlights a potential failure in the newly added shell wrappers (antigravity-reflector-hook.sh, codex-reflector-hook.sh, and codex-reflector-preedit-hook.sh) if both CLAUDE_PLUGIN_ROOT and PLUGIN_ROOT are unset, and suggests robust relative path fallbacks to prevent execution errors.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread hooks/antigravity-reflector-hook.sh Outdated
Comment thread hooks/codex-reflector-hook.sh Outdated
Comment thread hooks/codex-reflector-preedit-hook.sh Outdated
@coderabbitai

coderabbitai Bot commented Jun 8, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4e1004b4-2dd2-4c47-8151-05e86e16cf58

📥 Commits

Reviewing files that changed from the base of the PR and between 1f25be2 and b574c78.

📒 Files selected for processing (1)
  • docs/solutions/conventions/end-of-options-double-dash-is-portable-and-load-bearing.md
✅ Files skipped from review due to trivial changes (1)
  • docs/solutions/conventions/end-of-options-double-dash-is-portable-and-load-bearing.md

📝 Walkthrough

Walkthrough

Converts Codex Reflector into a multi-host reflector: adds plugin manifest, per-host hook wiring and wrappers, host installers (idempotent/trusted), optional pre-edit hard-block wiring, read-only verification and installer idempotency tests, and extensive docs and security conventions.

Changes

Multi-Backend Reflector Plugin

Layer / File(s) Summary
Plugin manifest and Codex hook wiring
.codex-plugin/plugin.json, hooks/codex-hooks.json, hooks/codex-reflector-hook.sh, hooks/codex-hooks-preedit.json, hooks/codex-reflector-preedit-hook.sh
Plugin manifest and Codex hook configurations wire PostToolUse/Stop/PreCompact to the reflector and provide an opt-in PreToolUse pre-edit wrapper; wrappers export REFLECTOR_HOST/REFLECTOR_PREEDIT_BLOCK and exec the Python entrypoint.
Codex installer (idempotent, trust-by-hash)
scripts/install-codex.sh
Installer generates wrapper(s), merges hook entries into hooks.json idempotently, computes canonical SHA-256 digests for installed hooks, writes [hooks.state.*] trusted_hash into config.toml, and validates/backups on failure.
Antigravity hooks and installer
antigravity/hooks.json, hooks/antigravity-reflector-hook.sh, scripts/install-antigravity.sh
Antigravity JSON hook bundle wires PostToolUse/PostToolUseFailure/Stop/PreCompact to a host-pinned wrapper; installer idempotently merges hooks.json and updates settings.json to enable JSON hooks with safe-merge/backup behavior.
Grok installer and wiring
scripts/install-grok.sh
Grok installer emits Grok-compatible settings.json including PreToolUse and per-event matchers/timeouts with REFLECTOR_PREEDIT_BLOCK and REFLECTOR_HOST=grok; merges idempotently via jq or replaces with --force.
Cursor pre-edit extension
scripts/install-cursor.sh
Cursor installer adds --pre-edit to optionally inject PreToolUse with REFLECTOR_PREEDIT_BLOCK and updates dedupe logic to match both bare and pre-edit command variants exactly.
Installer idempotency tests
scripts/test-install-idempotent.sh
Test harness runs per-host installers twice, inspects hook JSON totals and exact-command duplicates, and asserts idempotent merges; includes regression checks against over-stripping.
Read-only sandbox behavioral verification
scripts/test-readonly.sh, docs/solutions/security/readonly-sandbox-levers-need-behavioral-verification.md
Test harness runs reviewer backends against a scratch git repo to assert INV-READONLY behavior (tracked-file fingerprinting and PASS/FAIL/SKIP) and docs describe flag-override failure modes and behavioral verification recommendations.
Documentation, security solutions, and conventions
README.md, CLAUDE.md, docs/solutions/*
README rewritten for multi-backend architecture, CLAUDE.md expanded with reviewer/host contracts and gotchas, and many security/convention write-ups added (fan-out exceptions, redact-before-truncate, unsandboxed metadata, installer deduping, key-normalizer convention).

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 I hopped through hooks and manifest light,
Wrappers set hosts and pre-edit might,
Installers merge, idempotent and neat,
A sandbox test proves the read-only seat,
Carrots, docs, and safety — small crunchy delight!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 60.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and concisely describes the three main changes: multi-backend reviewer fan-out, pre-edit block functionality, and native host-hook packaging.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch v2

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR expands the reflector from a single Codex-based reviewer running in Claude Code into a two-axis system: (1) selectable, stackable reviewer backends with concurrent fan-out and merged verdicts, and (2) host-specific adapters/installers so the reflector can run inside multiple agent hosts (Claude Code, Codex, Cursor, Grok, Antigravity), including an opt-in synchronous pre-edit deny gate.

Changes:

  • Added a backend registry + fan-out execution in scripts/codex-reflector.py, with deterministic merge semantics and host input/output seams (including Grok and Antigravity renderers).
  • Introduced an opt-in PreToolUse pre-edit hard-block gate (REFLECTOR_PREEDIT_BLOCK=1) with a deny-loop breaker and strict fail-open behavior on uncertainty/infra failures.
  • Added behavioral read-only verification harness and new host installers/packaging artifacts (Codex native hooks + trust-by-hash, Grok/Cursor Claude-compat wiring, Antigravity JSON hook wiring + enablement).

Reviewed changes

Copilot reviewed 15 out of 16 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
scripts/test-readonly.sh New behavioral harness to verify reviewer “read-only” behavior via write-attempts on a scratch repo.
scripts/install-grok.sh New installer wiring Grok via Claude-compat hooks, always enabling pre-edit blocking.
scripts/install-cursor.sh Adds --pre-edit option to wire the opt-in PreToolUse gate for Cursor.
scripts/install-codex.sh New installer for Codex native hooks, with wrapper scripts and best-effort trust-by-hash.
scripts/install-antigravity.sh New installer for Antigravity JSON hooks plus enable_json_hooks settings toggle.
scripts/codex-reflector.py Major refactor: backend registry, fan-out + merge, pre-edit gate, host normalization/rendering seams, and read-only/security hardening.
README.md Updated docs for multi-backend reviewers, pre-edit gate, per-host install, and limitations.
hooks/codex-reflector-preedit-hook.sh Opt-in pre-edit wrapper for Codex plugin-dir packaging.
hooks/codex-reflector-hook.sh Codex plugin-dir wrapper pinning REFLECTOR_HOST=codex.
hooks/codex-hooks.json Codex native hook wiring (default events) for plugin-dir packaging.
hooks/codex-hooks-preedit.json Separate opt-in Codex PreToolUse wiring fragment.
hooks/antigravity-reflector-hook.sh Antigravity plugin-dir wrapper pinning REFLECTOR_HOST=antigravity.
docs/solutions/security/readonly-sandbox-levers-need-behavioral-verification.md New security writeup documenting the --full-auto sandbox override issue and behavioral testing approach.
CLAUDE.md Expanded architecture/invariants documentation for reviewer×host axes and new safety contracts.
antigravity/hooks.json Antigravity JSON hook wiring (plugin-dir artifact) pinning REFLECTOR_HOST=antigravity.
.codex-plugin/plugin.json Codex plugin manifest for native hook packaging.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread scripts/codex-reflector.py Outdated
Comment thread scripts/install-antigravity.sh
Comment thread scripts/test-readonly.sh

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
scripts/install-cursor.sh (1)

136-155: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Pre-edit hook entries are not deduplicated in merge mode.

Line 153 removes only hooks matching codex_command, but PreToolUse uses REFLECTOR_PREEDIT_BLOCK=1 .... That leaves stale pre-edit hooks and duplicates on repeated --pre-edit installs.

Proposed fix
 codex_command="python3 \"${reflector_script}\""
+codex_preedit_command="REFLECTOR_PREEDIT_BLOCK=1 ${codex_command}"
@@
-    jq --arg codex_command "$codex_command" -s '
-      def without_codex($command):
+    jq --arg codex_command "$codex_command" --arg codex_preedit_command "$codex_preedit_command" -s '
+      def without_codex($command; $preedit):
         map(
           if has("hooks") then
-            . + {hooks: (.hooks | map(select((.command // "") != $command)))}
+            . + {hooks: (.hooks | map(select(((.command // "") != $command) and ((.command // "") != $preedit))))}
           else
             .
           end
         )
@@
-              (($existing.hooks[$key] // []) | without_codex($codex_command))
+              (($existing.hooks[$key] // []) | without_codex($codex_command; $codex_preedit_command))
               + ($codex.hooks[$key] // [])
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/install-cursor.sh` around lines 136 - 155, The merge leaves duplicate
PreToolUse/pre-edit entries because without_codex($codex_command) only removes
exact matches on $codex_command but PreToolUse adds entries that include
REFLECTOR_PREEDIT_BLOCK=1; update the jq pipeline around without_codex and the
final hooks merge to filter out any hook whose (.command // "") equals
$codex_command OR contains "REFLECTOR_PREEDIT_BLOCK" (or otherwise normalize by
applying unique_by(.command) / unique on command strings after merging), so
modify the without_codex function and/or the reduce that builds hooks to exclude
those patterns and/or deduplicate by .command to prevent repeated pre-edit
hooks.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@CLAUDE.md`:
- Around line 102-106: The fenced code block containing the flow diagram
starting with "stdin JSON → resolve_host() → _normalize_input(host) → classify()
→ _gate_model_effort(codex-only) → ..." is missing a language tag; change the
opening fence from ``` to ```text so the block is typed (e.g., replace the
untyped fenced block with one that begins with ```text) to satisfy markdownlint
MD040.

In `@scripts/install-antigravity.sh`:
- Around line 90-117: Normalize settings_path and hooks_path to absolute paths
before computing settings_dir/hooks_dir and before creating wrapper_path and
hook_command; for example, resolve settings_path and the chosen hooks_path (from
AGY_HOOKS or default) with a realpath/readlink -f step so settings_dir,
hooks_dir and wrapper_path are all absolute, then write the wrapper and set
hook_command using that absolute wrapper_path (ensure hook_command uses the
resolved wrapper_path variable). Update references around settings_dir,
hooks_path, hooks_dir, wrapper_path and hook_command so all path variables are
the resolved absolute values.

In `@scripts/install-codex.sh`:
- Around line 161-162: The merge code is always including preedit_command in the
local list named "ours" (currently set as "ours = {hook_command,
preedit_command}"), which causes existing pre-edit hooks to be stripped when
reinstalling without the --preedit flag; change the construction of "ours" so
that preedit_command is only included when the pre-edit feature is enabled
(e.g., check the existing preedit flag/variable such as "preedit" or "--preedit"
and append preedit_command to ours conditionally), and keep the downstream merge
logic that compares "ours" and existing hooks unchanged so that a reinstall
without preedit does not remove an existing pre-edit hook.

---

Outside diff comments:
In `@scripts/install-cursor.sh`:
- Around line 136-155: The merge leaves duplicate PreToolUse/pre-edit entries
because without_codex($codex_command) only removes exact matches on
$codex_command but PreToolUse adds entries that include
REFLECTOR_PREEDIT_BLOCK=1; update the jq pipeline around without_codex and the
final hooks merge to filter out any hook whose (.command // "") equals
$codex_command OR contains "REFLECTOR_PREEDIT_BLOCK" (or otherwise normalize by
applying unique_by(.command) / unique on command strings after merging), so
modify the without_codex function and/or the reduce that builds hooks to exclude
those patterns and/or deduplicate by .command to prevent repeated pre-edit
hooks.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d932802a-93a3-4378-bca8-bf6ebb94ef2d

📥 Commits

Reviewing files that changed from the base of the PR and between 301c7e4 and 72a0817.

📒 Files selected for processing (16)
  • .codex-plugin/plugin.json
  • CLAUDE.md
  • README.md
  • antigravity/hooks.json
  • docs/solutions/security/readonly-sandbox-levers-need-behavioral-verification.md
  • hooks/antigravity-reflector-hook.sh
  • hooks/codex-hooks-preedit.json
  • hooks/codex-hooks.json
  • hooks/codex-reflector-hook.sh
  • hooks/codex-reflector-preedit-hook.sh
  • scripts/codex-reflector.py
  • scripts/install-antigravity.sh
  • scripts/install-codex.sh
  • scripts/install-cursor.sh
  • scripts/install-grok.sh
  • scripts/test-readonly.sh

Comment thread CLAUDE.md Outdated
Comment thread scripts/install-antigravity.sh
Comment thread scripts/install-codex.sh Outdated
When both CLAUDE_PLUGIN_ROOT and PLUGIN_ROOT are unset/empty, plugin_root
collapsed to "" and the wrapper exec'd /scripts/codex-reflector.py at the
filesystem root (fails). Fall back to the script's own parent dir
(hooks/.. = repo root) so direct/absolute invocation of the wrapper still
resolves the reflector. Addresses gemini-code-assist PR #2 review.

Note: the ${X:-$(...)} default only evaluates when the var is unset, so
the normal (env-set) path is unchanged. This does NOT rescue
`sh "$EMPTY/hooks/..."` — that resolves to /hooks/... and fails in sh
before this script runs.

Op: correct
Restores: spec:wrapper-resolves-reflector-path
_normalize_input dispatches host=="antigravity" to
_normalize_antigravity_input (codex-reflector.py:2882), so the comment
claiming "antigravity has no normalizer yet" was stale. The test still
holds because the normalizer is a no-op on already-Claude-shaped input;
the comment now states that idempotency property. Addresses Copilot PR #2
review.

Op: correct
Restores: spec:comment-matches-code
A relative --hooks-path or AGY_HOOKS baked a relative path into the
wrapper's hook_command (`sh "rel/.../hook.sh"`), which fails once agy runs
the hook from its own cwd. Re-resolve settings_dir/hooks_dir via
`cd && pwd -P` after mkdir (the idiom install-codex.sh already uses) and
rebuild the *_path vars, so the generated command is always absolute.
Verified: relative --hooks-path now emits an absolute hook_command.
Addresses CodeRabbit PR #2 (Major).

Op: correct
Restores: spec:hook-command-absolute-path
`ours` (the set of commands the idempotent merge strips before re-adding)
unconditionally included preedit_command, but `desired` only re-adds the
PreToolUse hook when --preedit is passed. So a plain reinstall stripped a
previously opted-in pre-edit hard-block without re-adding it. Only claim
ownership of preedit_command when pre-edit is enabled this run. Verified:
install --preedit then plain reinstall preserves PreToolUse; PostToolUse
stays a single group. Addresses CodeRabbit PR #2 (Major).

Op: correct
Restores: spec:preedit-hook-survives-default-reinstall
The data-flow fenced block was untyped; markdownlint MD040 flags it.
Addresses CodeRabbit PR #2 (Minor).

Op: correct
Restores: spec:fenced-blocks-have-language
Two compress-only, behavior-preserving simplifications surfaced by an
axis-decomposed simplify pass and confirmed by adversarial audit:
- parse_verdict computed raw.strip() twice; bind once as `stripped`
  (raw is a never-reassigned param, so both sites were provably equal).
- _read_tail had two comments narrating one readline(); keep the richer
  inline one, drop the redundant standalone.

Oracle: --test-parse 270/270, ruff clean.

Op: compress
…rawl)

spec_needs_outfile() was a pure single-caller forwarder for
`spec.output_capture == "file"` — and its sibling _build_backend_argv
already inlines that exact test. Inline it at the one call site and drop
the wrapper, so the file-capture predicate has one consistent form.

Oracle: --test-parse 270/270, ruff clean. grep confirms no residual refs.

Op: compress
respond_code_review / respond_plan_review / respond_subagent_review each
hand-inlined the identical FAIL-write / PASS-clear / UNCERTAIN-no-op block
(differing only in tool_name + key). CLAUDE.md names this exact triple
under the "UNCERTAIN preserves prior state" invariant — a shared contract
enforced only by copy-paste, free to drift. Extract _apply_verdict_state()
as the single owner; the load-bearing UNCERTAIN WHY comment moves into it
once. It calls write_fail_state/clear_fail_state by global name so the
self-test's module-scope monkeypatch still intercepts (verified).

Oracle: --test-parse 270/270, ruff clean.

Op: compress
install-cursor.sh's jq dedup stripped only the bare codex_command, never
the "REFLECTOR_PREEDIT_BLOCK=1 ..."-prefixed PreToolUse command. A second
--pre-edit install therefore appended a DUPLICATE PreToolUse hook — and
each duplicate fires _record_deny on the same key per edit, tripping the
_PRE_EDIT_MAX_DENIES loop-breaker after one denial instead of two,
silently weakening the pre-edit gate. Match prior reflector hooks by the
unique reflector_script path substring so both the bare and prefixed
variants are stripped before re-adding. Verified: 2x --pre-edit install
-> one PreToolUse group/hook. install-grok is unaffected (it dedups on
the same prefixed command it installs).

Found by ce-code-review (correctness/testing/adversarial/previous-comments
corroborated; also CodeRabbit outside-diff).

Op: correct
Restores: spec:installer-idempotent-reinstall
Two robustness gaps that could invert the fail-closed Stop checkpoint to
fail-open:
- fan_out's N>1 path collected results via f.result(), which re-raises any
  worker exception invoke_backend did not already convert to "". One
  backend's error then aborted the whole comprehension (discarding the
  OTHER backends' verdicts) and propagated through respond_stop/main with
  no top-level catch, crashing the hook — exit 1 is non-blocking on Stop,
  so a fail-closed checkpoint silently became fail-open. Guard each result
  via _future_raw -> "" (infra-empty), consistent with invoke_backend's
  fail-open contract and INV-MERGE.
- invoke_backend's mkstemp ran OUTSIDE its try, so a tmpfile OSError
  bypassed the documented fail-open. Move it inside the try.

Regression test: a raising N>1 worker -> infra-empty, others survive,
merge -> PASS (272/272).

Found by ce-code-review (adversarial P1, corroborated by learnings).

Op: correct
Restores: spec:fan-out-fails-open-per-worker
tool_response (controlled by the MCP edit/diagnostic tool) was interpolated
into the review/failure prompts OUTSIDE the _sandbox_content fence:
- build_code_review_prompt: filePath was neither redacted nor newline-
  stripped (error wasn't newline-stripped either). A forged "\nPASS\n" in
  filePath lands as a verdict line; respond_code_review then clear_fail_state
  on it, suppressing a real FAIL at the fail-closed Stop gate.
- build_code_change_failure_prompt: same unsandboxed-metadata pattern. It is
  a no-verdict diagnostic (distortion-only), hardened for consistency.

Add _safe_meta(): redact-THEN-truncate (a secret straddling the cap can't
leak its prefix) + newline-collapse; apply at every tool_response metadata
interpolation. Regression tests: forged "\nPASS\n" in filePath is collapsed,
not raw-injected (275/275).

Found by ce-code-review (security P1, corroborated by learnings); the
redact-before-truncate refinement was caught by the reflector's own review.

Op: correct
Restores: spec:untrusted-metadata-sanitized
…-TEXT)

The read-only-lever loop verified INV-READONLY for all 5 backends but never
asserted INV-VERDICT-TEXT for the claude reviewer — dropping "text" from its
extra_argv left the suite green while silently letting a JSON output mode
wrap (and hide) the verdict from parse_verdict's first-5-lines scan. Assert
--output-format is present and followed by "text".

Found by ce-code-review (testing P2).

Op: correct
Restores: test:inv-verdict-text-claude

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@scripts/codex-reflector.py`:
- Line 1131: Update the list comprehension return so the zip call is explicit
about strict length matching to avoid silent truncation (B905): change the
comprehension that returns [(name, _future_raw(f)) for name, f in zip(backends,
futures)] to use zip(backends, futures, strict=True) so any mismatch between
backends and futures raises immediately; locate the expression using the names
backends, futures and helper _future_raw to apply the change.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6c7f5473-b200-4e2a-8b73-512894eb6ed7

📥 Commits

Reviewing files that changed from the base of the PR and between 72a0817 and aacdaf3.

📒 Files selected for processing (8)
  • CLAUDE.md
  • hooks/antigravity-reflector-hook.sh
  • hooks/codex-reflector-hook.sh
  • hooks/codex-reflector-preedit-hook.sh
  • scripts/codex-reflector.py
  • scripts/install-antigravity.sh
  • scripts/install-codex.sh
  • scripts/install-cursor.sh
✅ Files skipped from review due to trivial changes (1)
  • hooks/codex-reflector-hook.sh
🚧 Files skipped from review as they are similar to previous changes (6)
  • hooks/antigravity-reflector-hook.sh
  • hooks/codex-reflector-preedit-hook.sh
  • scripts/install-cursor.sh
  • scripts/install-antigravity.sh
  • scripts/install-codex.sh
  • CLAUDE.md

Comment thread scripts/codex-reflector.py Outdated
…d absent

When a host omitted session_id, _atomic_update_state/_read_state bailed on
the empty id, so a PostToolUse FAIL was silently never recorded and Stop
never blocked on it — while respond_pretooluse already synthesized a
nosession-<hash(cwd)> id. Add _resolve_session_id() (same scheme) and route
BOTH the PostToolUse write (main) and the Stop read (respond_stop) through
it, so write/read agree on a stable per-cwd key. An id the host DID send is
returned unchanged (INV-CODEX-PATH-STABLE).

Regression: respond_stop blocks on a missing-session FAIL via the pending-
FAIL fast path (read-side wiring guarded end-to-end); empty-id drop asserted
(281/281). main()'s one-line write-side resolution is not unit-testable
(sys.exit) and shares the same helper.

Found by ce-code-review (adversarial P3, pre-existing); the read-both-paths
refinement was caught by the reflector's own review.

Op: correct
Restores: spec:fail-state-recorded-without-session-id
The cursor --pre-edit duplicate-hook bug (afb4478) reached review only
because nothing tested installer idempotency — --test-parse is hermetic and
never shells out to the installers. Add scripts/test-install-idempotent.sh:
run each merging installer twice against a scratch dir and assert the hook
set is stable AND free of duplicate commands per event (a count-only check
would miss a duplicate already present after run 1). Covers cursor/grok/
codex/antigravity; skips jq-less installers (skip != pass); cleans scratch
dirs via a trap. shellcheck-clean; negative control confirms the dupe check
fails on a planted duplicate.

Document it in CLAUDE.md + README; bump the documented self-test count
270 -> 281 (tests added across the review fixes).

Found by ce-code-review (testing P2).

Op: extend
The cursor idempotency fix (afb4478) switched the jq dedup to
$reflector_script, leaving codex_command with no remaining reference — the
python heredoc derives its command from sys.argv[1] independently.
shellcheck SC2034. Remove it.

Op: compress
…n_id (Graft)

The _resolve_session_id helper (fc9ab2b) unified the nosession-<hash(cwd)>
fallback for the fail-state write/read paths, but its original mirror site —
respond_pretooluse's deny-loop key — kept the inline copy, so the formula
lived in two places (free to drift). Route it through the helper too; now
all three session-key sites share one owner. Behavior-identical (the inline
expression was byte-for-byte the helper body). Docstring updated: the helper
is the canonical owner, no longer "mirroring" respond_pretooluse.

Oracle: --test-parse 281/281, ruff clean.

Found by ce-simplify-code (reuse + quality + efficiency all flagged it).

Op: compress
The idempotency fix (afb4478) used contains($reflector_script) to strip prior
hooks — but that ALSO strips a user's own hook that merely references the
reflector path (e.g. `mylogger python3 ".../codex-reflector.py" --tail`),
clobbering it on every reinstall. Match the two EXACT commands this installer
emits (bare + REFLECTOR_PREEDIT_BLOCK=1 variant); this re-adds codex_command
(and a pre_edit_command sibling) that 18d7fef removed once the substring
approach orphaned it. Still idempotent for --pre-edit and plain reinstalls.

Harness gains an over-strip regression: a user hook referencing the path
survives reinstall (now 5 checks).

Found by ce-code-review round 2 (adversarial P2, reproduced) — a regression
the first round's substring fix introduced.

Op: correct
Restores: spec:dedup-strips-only-own-commands
The metadata-sanitization fix (9a51a39) routed tool_response.* through
_safe_meta but left the sibling `File: {file_path}` / `Tool: {tool_name}`
header lines raw — same outside-the-fence position, same verdict-bearing
prompt (build_code_review_prompt), same forged-"\nPASS\n" -> clear_fail_state
injection class under a poisoned-agent threat model. Route them (and
build_code_change_failure_prompt's File/Tool/Error lines) through _safe_meta,
matching build_pretooluse_prompt's existing safe_path/safe_tool. Regression:
a forged newline in tool_input.file_path's header is collapsed (282/282).

Found by ce-code-review round 2 (security P1, corroborated by correctness).

Op: correct
Restores: spec:untrusted-metadata-sanitized
The two lists are built in lockstep so this is behavior-neutral, but strict=
turns any future length desync into a hard error instead of silently
truncating — a dropped backend on a fail-closed Stop would invert it to
fail-open. Satisfies ruff B905 (CodeRabbit flagged it; the repo has no ruff
config so default ruff didn't).

Found by ce-code-review round 2 (correctness/adversarial/previous-comments;
CodeRabbit PR #2).

Op: correct
Restores: spec:fan-out-no-silent-truncation
The idempotency harness used `set -e`, so a broken installer aborted the
whole run before the remaining installers were tested and without a clear
per-installer FAIL line. Drop `set -e` (keep `set -u`): a failed installer
leaves a missing/unparseable settings file, report() reads it as -1 -> FAIL,
and the harness continues through the rest of the matrix. Verified: happy
path still 5/5; a missing settings file reports FAIL and continues.

Found by ce-code-review round 2 (testing P3).

Op: correct
Restores: spec:harness-reports-each-installer
Five recur-prone learnings surfaced by this session's two ce-code-review
rounds + simplify, each grounded in the commits that fixed them and
adversarially audited (accuracy / schema / no-overlap):

- security/unsandboxed-prompt-metadata-injection — untrusted metadata outside
  the _sandbox_content fence is a forged-verdict vector (_safe_meta; 9a51a39/5494478)
- security/fan-out-worker-exception-inverts-fail-closed — a worker exception
  crashes the hook; exit 1 is non-blocking on Stop -> fail-open (_future_raw; cf19cad)
- security/redact-before-truncate-ordering — truncate-first leaks a secret's
  prefix; redact before slicing (_safe_meta; 9a51a39)
- conventions/installer-dedup-must-match-all-command-variants — dedup on the
  exact command-set emitted, not a path substring (install-cursor.sh; afb4478/5372732)
- conventions/key-normalizer-helper-must-route-all-call-sites — a shared-key
  helper must route every read+write site or write/read silently disagree
  (_resolve_session_id; fc9ab2b/c667abc)

Authored via an ultracode workflow (5 drafters -> adversarial audit); CLAUDE.md
already surfaces docs/solutions/ for discovery.

Op: extend

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
scripts/install-cursor.sh (1)

141-166: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Critical: Unconditional dedup strips opted-in pre-edit hook on plain reinstall.

The jq filter strips BOTH $cmd and $precmd unconditionally (line 148), but the Python heredoc only emits the PreToolUse hook when pre_edit==1 (lines 117-132). This means running install-cursor.sh (without --pre-edit) after a prior install-cursor.sh --pre-edit run will silently remove the user's opted-in PreToolUse hook.

The strip-set must match the emit-set: strip $precmd only when --pre-edit is passed, matching the Codex installer's conditional pattern (see context snippet from install-codex.sh lines 164-167 where ours conditionally includes preedit_command).

🛡️ Proposed fix: conditional strip-set

Pass the pre_edit flag to jq and conditionally strip the pre-edit command:

-    jq --arg cmd "$codex_command" --arg precmd "$pre_edit_command" -s '
+    jq --arg cmd "$codex_command" --arg precmd "$pre_edit_command" --argjson pre_edit "$pre_edit" -s '
       # Strip ONLY our two exact generated commands (bare + pre-edit variant),
-      # so reinstall is idempotent for both --pre-edit and plain runs WITHOUT
-      # clobbering a user-authored hook that merely references the reflector path.
-      def without_reflector($cmd; $precmd):
+      # when --pre-edit is passed (matching emit-set). Plain runs strip only the
+      # bare command so they do not clobber a prior --pre-edit opt-in.
+      def without_reflector($cmd; $precmd; $pre_edit):
         map(
           if has("hooks") then
-            . + {hooks: (.hooks | map(select((.command // "") as $c | $c != $cmd and $c != $precmd)))}
+            . + {hooks: (.hooks | map(select((.command // "") as $c |
+              if $pre_edit then ($c != $cmd and $c != $precmd)
+              else ($c != $cmd)
+              end
+            )))}
           else
             .
           end
         )
         | map(select((has("hooks") | not) or ((.hooks // []) | length > 0)));
 
       .[0] as $existing
       | .[1] as $codex
       | (($existing.hooks // {}) + ($codex.hooks // {}) | keys_unsorted | unique) as $keys
       | $existing + {
           hooks: reduce $keys[] as $key ({};
             .[$key] = (
-              (($existing.hooks[$key] // []) | without_reflector($cmd; $precmd))
+              (($existing.hooks[$key] // []) | without_reflector($cmd; $precmd; $pre_edit))
               + ($codex.hooks[$key] // [])
             )
           )
         }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/install-cursor.sh` around lines 141 - 166, The jq filter's
without_reflector currently always removes both $cmd and $precmd, which deletes
an opted-in PreToolUse hook when reinstalling without --pre-edit; modify the
invocation and filter so the pre-edit command is only stripped when pre_edit is
true: pass a boolean pre_edit flag into jq from the shell, change
without_reflector($cmd; $precmd) to conditionally include $precmd only when
pre_edit is true (e.g., call without_reflector($cmd; ($precmd |
select($pre_edit)))) and update the jq call site that reads "$settings_path" and
"$tmp_new" and writes "$tmp_merged" so it supplies the pre_edit value derived
from the script's pre_edit variable.
🧹 Nitpick comments (1)
scripts/test-install-idempotent.sh (1)

89-119: ⚡ Quick win

Consider adding cross-run idempotency test case.

The current Cursor test runs --pre-edit twice, catching duplicate accumulation and user-hook over-stripping. Consider adding a test case that runs --pre-edit then plain (without the flag) to verify that an opted-in PreToolUse hook is preserved, catching the conditional strip-set bug flagged in install-cursor.sh.

This would match the Codex installer's convention (context snippet 3, lines 164-167) where a plain reinstall does not remove a previously opted-in --preedit hook.

💡 Proposed additional test case

After line 95, add:

  # Cross-run preservation: opt-in with --pre-edit, then plain reinstall should
  # preserve the PreToolUse hook (not strip it).
  d2=$(mkdir_scratch)
  sh "$repo/scripts/install-cursor.sh" --pre-edit "$d2" >/dev/null 2>&1
  r1_prehook=$(grep -c 'REFLECTOR_PREEDIT_BLOCK=1' "$d2/.claude/settings.json" || echo 0)
  sh "$repo/scripts/install-cursor.sh" "$d2" >/dev/null 2>&1
  r2_prehook=$(grep -c 'REFLECTOR_PREEDIT_BLOCK=1' "$d2/.claude/settings.json" || echo 0)
  if [ "$r1_prehook" -gt 0 ] && [ "$r1_prehook" -eq "$r2_prehook" ]; then
    printf 'PASS install-cursor.sh preserves --pre-edit opt-in across plain reinstall\n'
    pass=$((pass + 1))
  else
    printf 'FAIL install-cursor.sh stripped --pre-edit hook on plain reinstall (%s -> %s)\n' "$r1_prehook" "$r2_prehook"
    fail=$((fail + 1))
  fi
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/test-install-idempotent.sh` around lines 89 - 119, Add a cross-run
idempotency test that verifies an opted-in PreToolUse hook survives a plain
reinstall: create a second scratch dir via mkdir_scratch, run sh
"install-cursor.sh" --pre-edit on it, record presence/count of the
REFLECTOR_PREEDIT_BLOCK (or equivalent PreToolUse marker) in
"$d2/.claude/settings.json", then run sh "install-cursor.sh" (plain) on the same
dir and re-check the marker; assert that the marker count before and after are
equal and non-zero and fail the test if the plain reinstall stripped the
PreToolUse hook. Use the existing pattern for reporting/pass/fail and reference
install-cursor.sh, PreToolUse, and REFLECTOR_PREEDIT_BLOCK to locate where to
add the snippet.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@scripts/install-cursor.sh`:
- Around line 141-166: The jq filter's without_reflector currently always
removes both $cmd and $precmd, which deletes an opted-in PreToolUse hook when
reinstalling without --pre-edit; modify the invocation and filter so the
pre-edit command is only stripped when pre_edit is true: pass a boolean pre_edit
flag into jq from the shell, change without_reflector($cmd; $precmd) to
conditionally include $precmd only when pre_edit is true (e.g., call
without_reflector($cmd; ($precmd | select($pre_edit)))) and update the jq call
site that reads "$settings_path" and "$tmp_new" and writes "$tmp_merged" so it
supplies the pre_edit value derived from the script's pre_edit variable.

---

Nitpick comments:
In `@scripts/test-install-idempotent.sh`:
- Around line 89-119: Add a cross-run idempotency test that verifies an opted-in
PreToolUse hook survives a plain reinstall: create a second scratch dir via
mkdir_scratch, run sh "install-cursor.sh" --pre-edit on it, record
presence/count of the REFLECTOR_PREEDIT_BLOCK (or equivalent PreToolUse marker)
in "$d2/.claude/settings.json", then run sh "install-cursor.sh" (plain) on the
same dir and re-check the marker; assert that the marker count before and after
are equal and non-zero and fail the test if the plain reinstall stripped the
PreToolUse hook. Use the existing pattern for reporting/pass/fail and reference
install-cursor.sh, PreToolUse, and REFLECTOR_PREEDIT_BLOCK to locate where to
add the snippet.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 829c40a9-d13c-4c6b-b738-dfd60dd045c0

📥 Commits

Reviewing files that changed from the base of the PR and between aacdaf3 and 1f25be2.

📒 Files selected for processing (10)
  • CLAUDE.md
  • README.md
  • docs/solutions/conventions/installer-dedup-must-match-all-command-variants.md
  • docs/solutions/conventions/key-normalizer-helper-must-route-all-call-sites.md
  • docs/solutions/security/fan-out-worker-exception-inverts-fail-closed.md
  • docs/solutions/security/redact-before-truncate-ordering.md
  • docs/solutions/security/unsandboxed-prompt-metadata-injection.md
  • scripts/codex-reflector.py
  • scripts/install-cursor.sh
  • scripts/test-install-idempotent.sh
✅ Files skipped from review due to trivial changes (5)
  • docs/solutions/conventions/installer-dedup-must-match-all-command-variants.md
  • docs/solutions/conventions/key-normalizer-helper-must-route-all-call-sites.md
  • docs/solutions/security/fan-out-worker-exception-inverts-fail-closed.md
  • docs/solutions/security/redact-before-truncate-ordering.md
  • README.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • CLAUDE.md

…on learning

A sixth recur-prone learning, drafted in the prior session but never
persisted before compaction (confirmed via session-history search). The
review cycle flagged `cp --` / `readlink --` as non-POSIX "GNUisms"; both
claims are false, and stripping the guard would re-introduce a
dash-prefixed-path misparse bug.

- conventions/end-of-options-double-dash-is-portable-and-load-bearing —
  `cp` shall conform to POSIX XBD 12.2 (Guideline 10); `readlink` is not in
  POSIX but GNU+BSD both accept `--` via getopt; reject-with-spec-citation is
  the correct disposition, not the "fix"

Authored via a Full ce-compound run (3 research drafters + session-history
search), then adversarially fact-checked against primary sources (Open Group
POSIX Issue 8, GNU coreutils, FreeBSD) — all five load-bearing claims
confirmed. CLAUDE.md/AGENTS.md already surface docs/solutions/ for discovery.

Op: extend
@metaphorics metaphorics requested a review from Copilot June 9, 2026 13:04
@metaphorics

Copy link
Copy Markdown
Collaborator Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jun 9, 2026

Copy link
Copy Markdown
✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 22 out of 23 changed files in this pull request and generated 4 comments.

Comment thread scripts/install-grok.sh
Comment on lines +80 to +83
settings_path="${settings_dir}/settings.json"
tmp_new=$(mktemp)
tmp_merged=$(mktemp)
trap 'rm -f "$tmp_new" "$tmp_merged"' EXIT INT HUP TERM
Comment on lines +127 to +130
tmp_hooks=$(mktemp)
tmp_merged=$(mktemp)
tmp_settings=$(mktemp)
trap 'rm -f "$tmp_hooks" "$tmp_merged" "$tmp_settings"' EXIT INT HUP TERM
Comment thread scripts/install-cursor.sh
Comment on lines 72 to 75
settings_path="${settings_dir}/settings.json"
tmp_new=$(mktemp)
tmp_merged=$(mktemp)
trap 'rm -f "$tmp_new" "$tmp_merged"' EXIT INT HUP TERM
Comment on lines +37 to +41
mkdir_scratch() {
_d=$(mktemp -d)
tmpdirs="$tmpdirs $_d"
printf '%s' "$_d"
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants