diff --git a/daemon/CHANGELOG.md b/daemon/CHANGELOG.md index cae0c782..2139f237 100644 --- a/daemon/CHANGELOG.md +++ b/daemon/CHANGELOG.md @@ -11,12 +11,15 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 - **`--init` now sets up disclosure in one step.** Alongside the Ed25519 signing pair, `obsigna-daemon --init` also generates an X25519 forensic key pair (ADR-0012) and writes a starter `daemon.toml` with `parameter_disclosure = "true"`, so a fresh daemon records each action's parameters — encrypted to the forensic key and recoverable with `obsigna receipt disclose` — out of the box rather than hashes alone. Previously the forensic key was a separate `--init-forensic-key` step and disclosure defaulted off, which meant the common evaluation path showed only hashes. Writes stay fail-closed: `--init` refuses to overwrite an existing signing or forensic key and never clobbers an existing config (it reports leaving it untouched). The bare-daemon default is unchanged — run with no config and no forensic key, the daemon still hashes only; `--init` is what turns disclosure on. The forensic **private** key is written locally for immediate use; operators should move it off-host for production, where the daemon only needs the public key. `--init-forensic-key` remains for generating a forensic key independently. +### Added + +- **`obsigna-hook` captures failed tool calls** (#853) — the hook now handles Claude Code's `PostToolUseFailure` event in addition to `PostToolUse`. Claude Code fires `PostToolUse` only on success and `PostToolUseFailure` on failure, so a hook wired to `PostToolUse` alone left an errored or interrupted tool call (e.g. a lost concurrent write whose `Edit` no longer matched) with no receipt — only inferable retry activity. 
A `PostToolUseFailure` frame carries an `error` string (and no `tool_response`); the hook maps it to `decision="allowed"` with a non-empty error, which the daemon already records as `outcome.status=failure`. A blank error is replaced with `"tool call failed"` (or `"tool call interrupted"` when `is_interrupt` is set) so a failure frame is never silently downgraded to success. Add a `PostToolUseFailure` block to `~/.claude/settings.json` alongside the existing `PostToolUse` one to enable it (see `hook/README.md`). No schema change. + ## [0.27.0-alpha.1] - 2026-06-16 ### Added - **Out-of-band checkpoint anchor for tail-truncation resistance (ADR-0008 §2, spike #600).** The daemon can emit an additive, Ed25519-signed checkpoint of each chain HEAD — `{chain_id, sequence, receipt_hash, timestamp}`, canonicalised through the existing RFC 8785 path — to one or more append-only sinks the agent UID cannot rewrite. Enable with `--checkpoint-anchor`, a comma-separated fan-out list of `file:`, `git:`, or `syslog:` specs (env `AGENTRECEIPTS_CHECKPOINT_ANCHOR`, TOML `checkpoint_anchor`), and `--checkpoint-cadence` (receipts per checkpoint; default every receipt). A checkpoint is signed once and fanned out to every sink; the git sink commits each record so its commit chain is a tamper-evident log. Sink write failures are logged and metered but never block or undo receipt emission — receipts are the primary record. `obsigna receipt verify --against-anchor ` additionally verifies each checkpoint signature, asserts the log is strictly increasing in file order, and fails on tail truncation (a checkpoint ahead of the store HEAD). Receipts stay a linear verifiable-credential chain — this touches neither the receipt schema, hash chain, `@context`, nor the issuer DID (the anchoring freeze, ADR-0008). **Off by default:** with no `--checkpoint-anchor` configured, the daemon and `verify` are byte-identical to before. 
**Alpha — opt-in and experimental:** checkpoint emission is synchronous on the commit path, and durability/retry plus production-grade sinks (object-lock storage, TPM, transparency log) are not yet built. - ## [0.26.0] - 2026-06-14 ### Changed diff --git a/hook/AGENTS.md b/hook/AGENTS.md index c36d3e0e..0959f31f 100644 --- a/hook/AGENTS.md +++ b/hook/AGENTS.md @@ -1,6 +1,6 @@ # AGENTS.md -Short-lived PostToolUse hook binary. Currently supports Claude Code; designed to support additional runtimes via the `formats` map in `main.go`. Reads a JSON frame from stdin, maps it to an `emitter.Event`, and forwards it to `obsigna-daemon` over a Unix-domain socket. Exits 0 when the frame is unreadable or the runtime isn't recognised; once a runtime is identified, a failure to record the receipt exits 1 with a stderr message. Never pauses or modifies the tool call. Built on [sdk/go/emitter](../sdk/go/emitter/). +Short-lived PostToolUse / PostToolUseFailure / PreToolUse hook binary. Currently supports Claude Code; designed to support additional runtimes via the `formats` map in `main.go`. Reads a JSON frame from stdin, maps it to an `emitter.Event`, and forwards it to `obsigna-daemon` over a Unix-domain socket. A `PostToolUseFailure` frame carries an `error` string (and no `tool_response`); it maps to `decision="allowed"` with a non-empty error, so the daemon records the call as `outcome.status=failure` instead of leaving no receipt. Exits 0 when the frame is unreadable or the runtime isn't recognised; once a runtime is identified, a failure to record the receipt exits 1 with a stderr message. Never pauses or modifies the tool call. Built on [sdk/go/emitter](../sdk/go/emitter/). ## Getting started diff --git a/hook/README.md b/hook/README.md index f815c272..c4e5bffe 100644 --- a/hook/README.md +++ b/hook/README.md @@ -1,6 +1,6 @@ # obsigna-hook -Short-lived hook binary for [Agent Receipts](https://github.com/agent-receipts/obsigna). 
Invoked by agent runtimes on `PostToolUse` events — reads a JSON frame from stdin, maps it to an audit event, and forwards it to `obsigna-daemon` over a Unix-domain socket. It exits 0 silently when the frame is unreadable or the runtime isn't recognised; once the runtime is identified, a failure to record the receipt exits 1 with a stderr message (surfacing a broken audit pipeline rather than dropping receipts). It never pauses or modifies the tool call. +Short-lived hook binary for [Agent Receipts](https://github.com/agent-receipts/obsigna). Invoked by agent runtimes on `PostToolUse` and `PostToolUseFailure` events — reads a JSON frame from stdin, maps it to an audit event, and forwards it to `obsigna-daemon` over a Unix-domain socket. `PostToolUseFailure` is what records a failed (or interrupted) tool call as a `failure` row in the chain; without it a lost concurrent write or errored call leaves no receipt at all. It exits 0 silently when the frame is unreadable or the runtime isn't recognised; once the runtime is identified, a failure to record the receipt exits 1 with a stderr message (surfacing a broken audit pipeline rather than dropping receipts). It never pauses or modifies the tool call. > **Renamed from `agent-receipts-hook`** (ADR-0036). The old `agent-receipts-hook` binary still ships as a thin deprecation shim that forwards to `obsigna-hook`, so existing runtime hook configs keep working — point new configs at `obsigna-hook`. Homebrew now ships the hook in the umbrella `obsigna` formula (ADR-0034), which installs both `obsigna-hook` and the `agent-receipts-hook` shim. 
@@ -31,6 +31,17 @@ In your Claude Code settings (`~/.claude/settings.json`): } ] } + ], + "PostToolUseFailure": [ + { + "matcher": "", + "hooks": [ + { + "type": "command", + "command": "obsigna-hook" + } + ] + } ] } } diff --git a/hook/cmd/obsigna-hook/claude_code.go b/hook/cmd/obsigna-hook/claude_code.go index 2c7946fd..4441b4d1 100644 --- a/hook/cmd/obsigna-hook/claude_code.go +++ b/hook/cmd/obsigna-hook/claude_code.go @@ -1,6 +1,7 @@ package main import ( + "bytes" "encoding/json" "errors" "fmt" @@ -10,8 +11,39 @@ import ( "github.com/agent-receipts/ar/sdk/go/emitter" ) +// maxErrorTextLen bounds the failure message copied into the receipt. Claude +// Code error strings are short, but the field is host-supplied and otherwise +// uncapped before the emitter's whole-frame MaxFrameSize check. Truncating here +// degrades an oversized message to a truncated receipt rather than dropping the +// failure entirely (an oversized frame would fail Emit and exit the hook 1). +const maxErrorTextLen = 16 << 10 + +// failureErrorText extracts the human-readable failure message from a +// PostToolUseFailure frame's `error` field. Claude Code sends a JSON string +// today; the frame is treated as an external artifact, so a non-string value +// (object/array/number) is kept as its raw JSON text rather than aborting the +// whole-frame unmarshal — a schema variation degrades the message instead of +// dropping the failure receipt. The result is trimmed and length-capped. +func failureErrorText(raw json.RawMessage) string { + trimmed := bytes.TrimSpace(raw) + if len(trimmed) == 0 || bytes.Equal(trimmed, []byte("null")) { + return "" + } + var s string + if err := json.Unmarshal(trimmed, &s); err != nil { + // Not a JSON string — keep the raw JSON text as the message. + s = string(trimmed) + } + s = strings.TrimSpace(s) + if len(s) > maxErrorTextLen { + // ToValidUTF8 drops a trailing partial rune left by the byte-slice cut. 
+ s = strings.ToValidUTF8(s[:maxErrorTextLen], "") + "…(truncated)" + } + return s +} + // claudeCodeFrame is the JSON envelope Claude Code sends on stdin for -// PostToolUse and PreToolUse hooks. +// PostToolUse, PostToolUseFailure, and PreToolUse hooks. type claudeCodeFrame struct { HookEventName string `json:"hook_event_name"` SessionID string `json:"session_id"` @@ -22,13 +54,29 @@ type claudeCodeFrame struct { AgentID string `json:"agent_id"` AgentType string `json:"agent_type"` TranscriptPath string `json:"transcript_path"` + + // Error and IsInterrupt are carried only on PostToolUseFailure frames. + // Error is the human-readable failure message. Claude Code sends a JSON + // string, but it is kept as RawMessage and decoded leniently (see + // failureErrorText) so a non-string value cannot abort the whole-frame + // unmarshal and drop the failure receipt. IsInterrupt distinguishes a + // user/abort cancellation from a genuine execution error. A + // PostToolUseFailure frame carries no tool_response. + Error json.RawMessage `json:"error"` + IsInterrupt bool `json:"is_interrupt"` } -// readClaudeCode parses a Claude Code PostToolUse or PreToolUse stdin frame -// and maps it to an emitter.Event. The decision is derived from the -// hook_event_name field: -// - "PostToolUse" → "allowed" (tool ran successfully) -// - "PreToolUse" → "pending" (tool is about to run; outcome not yet known) +// readClaudeCode parses a Claude Code PostToolUse, PostToolUseFailure, or +// PreToolUse stdin frame and maps it to an emitter.Event. 
The decision is +// derived from the hook_event_name field: +// - "PostToolUse" → "allowed" (tool ran successfully) +// - "PostToolUseFailure" → "allowed" + non-empty Error (tool ran but failed; +// the daemon stamps outcome.status=failure from the error) +// - "PreToolUse" → "pending" (tool is about to run; outcome not yet known) +// +// PostToolUse fires only on success and PostToolUseFailure only on failure, so +// capturing both is what records a failed (or interrupted) tool call as a +// failure row in the chain rather than leaving no receipt at all. // // The returned sessionID is the host-supplied session identifier from the // frame; it is the empty string when absent. @@ -44,10 +92,26 @@ func readClaudeCode(stdin []byte, env func(string) string) (emitter.Event, strin return emitter.Event{}, "", errors.New("missing tool_name") } - var decision string + var decision, failureErr string switch f.HookEventName { case "PostToolUse": decision = "allowed" + case "PostToolUseFailure": + // The tool call was permitted and ran, but execution failed (or was + // interrupted). Record it as "allowed" with a non-empty error: the + // daemon maps decision="allowed" + a non-empty error to + // outcome.status=failure. Guarantee non-empty text so a failure frame + // is never silently downgraded to success by that rule, even on the + // rare frame that carries no message. 
+ decision = "allowed" + failureErr = failureErrorText(f.Error) + if failureErr == "" { + if f.IsInterrupt { + failureErr = "tool call interrupted" + } else { + failureErr = "tool call failed" + } + } case "PreToolUse": decision = "pending" default: @@ -61,6 +125,7 @@ func readClaudeCode(stdin []byte, env func(string) string) (emitter.Event, strin Channel: "claude-code", Tool: emitter.Tool{Name: f.ToolName}, Decision: decision, + Error: failureErr, CorrelationID: f.ToolUseID, AgentID: f.AgentID, AgentType: f.AgentType, diff --git a/hook/cmd/obsigna-hook/integration_test.go b/hook/cmd/obsigna-hook/integration_test.go index cdf379f3..0e954b26 100644 --- a/hook/cmd/obsigna-hook/integration_test.go +++ b/hook/cmd/obsigna-hook/integration_test.go @@ -131,6 +131,7 @@ type wireFrame struct { } `json:"tool"` Input json.RawMessage `json:"input,omitempty"` Output json.RawMessage `json:"output,omitempty"` + Error string `json:"error,omitempty"` Decision string `json:"decision"` TsEmit string `json:"ts_emit"` } @@ -205,6 +206,63 @@ func TestIntegration_ClaudeCodeFrame(t *testing.T) { } } +// TestIntegration_ClaudeCodeFailureFrame exercises a PostToolUseFailure frame +// end-to-end: the failure error must reach the listener on the wire so the +// daemon can stamp outcome.status=failure. This is the path that records a +// failed (e.g. lost concurrent write) tool call as a failure row in the chain. +func TestIntegration_ClaudeCodeFailureFrame(t *testing.T) { + dir := shortSocketDir(t) + rl := newRecordingListener(t, dir) + + const sessionID = "integ-fail-2026" + stdin := `{ + "hook_event_name": "PostToolUseFailure", + "session_id": "` + sessionID + `", + "tool_use_id": "tu-lost-write", + "tool_name": "Edit", + "tool_input": {"file_path":"/repo/shared.go","old_string":"a","new_string":"b"}, + "error": "String to replace not found in file." 
+ }` + + ev, sid, err := readClaudeCode([]byte(stdin), func(string) string { return "" }) + if err != nil { + t.Fatalf("readClaudeCode: %v", err) + } + + em, err := emitter.NewDaemon( + emitter.WithSocketPath(rl.path), + emitter.WithSessionID(sid), + emitter.WithLogger(slog.New(slog.NewTextHandler(io.Discard, nil))), + ) + if err != nil { + t.Fatalf("emitter.NewDaemon: %v", err) + } + defer em.Close() + + if err := em.Emit(context.Background(), ev); err != nil { + t.Fatalf("Emit: %v", err) + } + + frames := rl.waitForFrames(t, 1, 2*time.Second) + + var got wireFrame + if err := json.Unmarshal(frames[0], &got); err != nil { + t.Fatalf("unmarshal frame: %v (raw: %s)", err, frames[0]) + } + if got.Decision != "allowed" { + t.Errorf("decision = %q; want allowed", got.Decision) + } + if got.Error != "String to replace not found in file." { + t.Errorf("error = %q; want the failure message on the wire", got.Error) + } + if got.Output != nil { + t.Errorf("output = %s; want absent on a failure frame", got.Output) + } + if !json.Valid(got.Input) { + t.Errorf("input not valid JSON: %s", got.Input) + } +} + // TestIntegration_DaemonDown_SurfacesError verifies that by default (ADR-0025, // the mode the hook uses), Emit returns an error when the daemon socket is // unreachable. The hook then exits non-zero to surface the failure to the agent diff --git a/hook/cmd/obsigna-hook/main.go b/hook/cmd/obsigna-hook/main.go index f90bf8d6..acd98e0d 100644 --- a/hook/cmd/obsigna-hook/main.go +++ b/hook/cmd/obsigna-hook/main.go @@ -1,7 +1,7 @@ // Command obsigna-hook is a short-lived hook binary invoked by agent runtimes -// (Claude Code, Codex, …) on PostToolUse and PreToolUse events. It reads a JSON -// frame from stdin, maps it to an emitter.Event, and forwards it to the -// agent-receipts daemon over a Unix-domain socket. +// (Claude Code, Codex, …) on PostToolUse, PostToolUseFailure, and PreToolUse +// events. 
It reads a JSON frame from stdin, maps it to an emitter.Event, and +// forwards it to obsigna-daemon over a Unix-domain socket. // // It is the primary hook entrypoint (ADR-0036). The legacy agent-receipts-hook // binary is a thin deprecation shim that forwards here (see @@ -66,7 +66,7 @@ var formats = map[string]reader{ // Claude Code does not set CLAUDE_SESSION_ID as an environment variable; it // passes hook_event_name in the stdin JSON payload instead. We check both // signals so the binary works with runtimes that take either approach. -// Both "PostToolUse" and "PreToolUse" are accepted from stdin. +// "PostToolUse", "PostToolUseFailure", and "PreToolUse" are accepted from stdin. func detect(stdin []byte, env func(string) string) string { if env("CLAUDE_SESSION_ID") != "" { return "claude-code" @@ -76,7 +76,7 @@ func detect(stdin []byte, env func(string) string) string { } if json.Unmarshal(stdin, &probe) == nil { switch probe.HookEventName { - case "PostToolUse", "PreToolUse": + case "PostToolUse", "PostToolUseFailure", "PreToolUse": return "claude-code" } } diff --git a/hook/cmd/obsigna-hook/main_test.go b/hook/cmd/obsigna-hook/main_test.go index 823caaaf..b63d8a20 100644 --- a/hook/cmd/obsigna-hook/main_test.go +++ b/hook/cmd/obsigna-hook/main_test.go @@ -215,6 +215,148 @@ func TestReadClaudeCode_InputOutputAreValidJSON(t *testing.T) { } } +// TestReadClaudeCode_PostToolUseFailure verifies a PostToolUseFailure frame is +// mapped to an "allowed" decision carrying the failure error, so the daemon +// records it as outcome.status=failure rather than leaving no receipt. The +// frame still carries tool_input (so the target/parameters are captured) but no +// tool_response. 
+func TestReadClaudeCode_PostToolUseFailure(t *testing.T) { + noEnv := func(string) string { return "" } + + t.Run("error message passes through", func(t *testing.T) { + stdin := `{ + "hook_event_name": "PostToolUseFailure", + "session_id": "sess-fail", + "tool_use_id": "tu-fail", + "tool_name": "Edit", + "tool_input": {"file_path":"/repo/shared.go","old_string":"a","new_string":"b"}, + "error": "String to replace not found in file.", + "is_interrupt": false + }` + ev, sid, err := readClaudeCode([]byte(stdin), noEnv) + if err != nil { + t.Fatalf("readClaudeCode: %v", err) + } + if ev.Decision != "allowed" { + t.Errorf("Decision = %q; want allowed", ev.Decision) + } + if ev.Error != "String to replace not found in file." { + t.Errorf("Error = %q; want the frame error", ev.Error) + } + if sid != "sess-fail" { + t.Errorf("sessionID = %q; want sess-fail", sid) + } + if ev.Input == nil { + t.Error("Input is nil; want non-nil (tool_input present on failure frames)") + } + if ev.Output != nil { + t.Errorf("Output = %s; want nil (no tool_response on failure frames)", ev.Output) + } + if ev.Target.System != "filesystem" || ev.Target.Resource != "/repo/shared.go" { + t.Errorf("Target = %+v; want filesystem /repo/shared.go", ev.Target) + } + }) + + t.Run("empty error falls back to a non-empty message", func(t *testing.T) { + stdin := `{ + "hook_event_name": "PostToolUseFailure", + "session_id": "s", + "tool_name": "Bash", + "tool_input": {"command":"false"}, + "error": " " + }` + ev, _, err := readClaudeCode([]byte(stdin), noEnv) + if err != nil { + t.Fatalf("readClaudeCode: %v", err) + } + // A blank error must not be forwarded as-is: the daemon treats an empty + // error on an "allowed" decision as success, which would silently lose + // the failure. 
+ if ev.Error != "tool call failed" { + t.Errorf("Error = %q; want fallback %q", ev.Error, "tool call failed") + } + }) + + t.Run("interrupt with no message falls back to interrupted", func(t *testing.T) { + stdin := `{ + "hook_event_name": "PostToolUseFailure", + "session_id": "s", + "tool_name": "Bash", + "tool_input": {"command":"sleep 100"}, + "error": "", + "is_interrupt": true + }` + ev, _, err := readClaudeCode([]byte(stdin), noEnv) + if err != nil { + t.Fatalf("readClaudeCode: %v", err) + } + if ev.Error != "tool call interrupted" { + t.Errorf("Error = %q; want fallback %q", ev.Error, "tool call interrupted") + } + }) + + t.Run("non-string error does not abort the frame parse", func(t *testing.T) { + // Claude Code sends `error` as a string today; a schema variation must + // not break parsing and drop the failure receipt entirely. The raw JSON + // is kept as the message text. + stdin := `{ + "hook_event_name": "PostToolUseFailure", + "session_id": "s", + "tool_name": "Edit", + "tool_input": {"file_path":"/x.go"}, + "error": {"message":"nested"} + }` + ev, _, err := readClaudeCode([]byte(stdin), noEnv) + if err != nil { + t.Fatalf("readClaudeCode returned error on non-string error field: %v", err) + } + if ev.Decision != "allowed" { + t.Errorf("Decision = %q; want allowed", ev.Decision) + } + if ev.Error != `{"message":"nested"}` { + t.Errorf("Error = %q; want the raw JSON text of the error object", ev.Error) + } + }) + + t.Run("oversized error is truncated, not dropped", func(t *testing.T) { + big := strings.Repeat("x", maxErrorTextLen+500) + stdin, _ := json.Marshal(map[string]any{ + "hook_event_name": "PostToolUseFailure", + "session_id": "s", + "tool_name": "Bash", + "tool_input": map[string]string{"command": "false"}, + "error": big, + }) + ev, _, err := readClaudeCode(stdin, noEnv) + if err != nil { + t.Fatalf("readClaudeCode: %v", err) + } + if len(ev.Error) > maxErrorTextLen+len("…(truncated)") { + t.Errorf("Error length = %d; want capped near %d", 
len(ev.Error), maxErrorTextLen) + } + if !strings.HasSuffix(ev.Error, "…(truncated)") { + t.Errorf("Error = %.64q…; want a truncation marker suffix", ev.Error) + } + }) + + t.Run("success frame carries no error", func(t *testing.T) { + stdin := `{ + "hook_event_name": "PostToolUse", + "session_id": "s", + "tool_name": "Bash", + "tool_input": {"command":"true"}, + "tool_response": {"output":"ok"} + }` + ev, _, err := readClaudeCode([]byte(stdin), noEnv) + if err != nil { + t.Fatalf("readClaudeCode: %v", err) + } + if ev.Error != "" { + t.Errorf("Error = %q; want empty on a success frame", ev.Error) + } + }) +} + // --- detect unit tests --- func TestDetect(t *testing.T) { @@ -253,6 +395,12 @@ func TestDetect(t *testing.T) { env: map[string]string{}, want: "claude-code", }, + { + name: "PostToolUseFailure hook_event_name is detected as claude-code", + stdin: `{"hook_event_name":"PostToolUseFailure","session_id":"s","tool_name":"Edit","tool_input":{},"error":"old_string not found"}`, + env: map[string]string{}, + want: "claude-code", + }, { name: "unrecognised hook_event_name is not detected", stdin: `{"hook_event_name":"StopHook","session_id":"s"}`,