5 changes: 4 additions & 1 deletion daemon/CHANGELOG.md
@@ -11,12 +11,15 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

- **`--init` now sets up disclosure in one step.** Alongside the Ed25519 signing pair, `obsigna-daemon --init` also generates an X25519 forensic key pair (ADR-0012) and writes a starter `daemon.toml` with `parameter_disclosure = "true"`, so a fresh daemon records each action's parameters — encrypted to the forensic key and recoverable with `obsigna receipt disclose` — out of the box rather than hashes alone. Previously the forensic key was a separate `--init-forensic-key` step and disclosure defaulted off, which meant the common evaluation path showed only hashes. Writes stay fail-closed: `--init` refuses to overwrite an existing signing or forensic key and never clobbers an existing config (it reports leaving it untouched). The bare-daemon default is unchanged — run with no config and no forensic key, the daemon still hashes only; `--init` is what turns disclosure on. The forensic **private** key is written locally for immediate use; operators should move it off-host for production, where the daemon only needs the public key. `--init-forensic-key` remains for generating a forensic key independently.

### Added

- **`obsigna-hook` captures failed tool calls** (#853) — the hook now handles Claude Code's `PostToolUseFailure` event in addition to `PostToolUse`. Claude Code fires `PostToolUse` only on success and `PostToolUseFailure` on failure, so a hook wired to `PostToolUse` alone left an errored or interrupted tool call (e.g. a lost concurrent write whose `Edit` no longer matched) with no receipt — only inferable retry activity. A `PostToolUseFailure` frame carries an `error` string (and no `tool_response`); the hook maps it to `decision="allowed"` with a non-empty error, which the daemon already records as `outcome.status=failure`. A blank error is replaced with `"tool call failed"` (or `"tool call interrupted"` when `is_interrupt` is set) so a failure frame is never silently downgraded to success. Add a `PostToolUseFailure` block to `~/.claude/settings.json` alongside the existing `PostToolUse` one to enable it (see `hook/README.md`). No schema change.

## [0.27.0-alpha.1] - 2026-06-16

### Added

- **Out-of-band checkpoint anchor for tail-truncation resistance (ADR-0008 §2, spike #600).** The daemon can emit an additive, Ed25519-signed checkpoint of each chain HEAD — `{chain_id, sequence, receipt_hash, timestamp}`, canonicalised through the existing RFC 8785 path — to one or more append-only sinks the agent UID cannot rewrite. Enable with `--checkpoint-anchor`, a comma-separated fan-out list of `file:<path>`, `git:<dir>`, or `syslog:<tag>` specs (env `AGENTRECEIPTS_CHECKPOINT_ANCHOR`, TOML `checkpoint_anchor`), and `--checkpoint-cadence` (receipts per checkpoint; default every receipt). A checkpoint is signed once and fanned out to every sink; the git sink commits each record so its commit chain is a tamper-evident log. Sink write failures are logged and metered but never block or undo receipt emission — receipts are the primary record. `obsigna receipt verify --against-anchor <log>` additionally verifies each checkpoint signature, asserts the log is strictly increasing in file order, and fails on tail truncation (a checkpoint ahead of the store HEAD). Receipts stay a linear verifiable-credential chain — this touches neither the receipt schema, hash chain, `@context`, nor the issuer DID (the anchoring freeze, ADR-0008). **Off by default:** with no `--checkpoint-anchor` configured, the daemon and `verify` are byte-identical to before. **Alpha — opt-in and experimental:** checkpoint emission is synchronous on the commit path, and durability/retry plus production-grade sinks (object-lock storage, TPM, transparency log) are not yet built.

## [0.26.0] - 2026-06-14

### Changed
2 changes: 1 addition & 1 deletion hook/AGENTS.md
@@ -1,6 +1,6 @@
# AGENTS.md

Short-lived PostToolUse hook binary. Currently supports Claude Code; designed to support additional runtimes via the `formats` map in `main.go`. Reads a JSON frame from stdin, maps it to an `emitter.Event`, and forwards it to `obsigna-daemon` over a Unix-domain socket. Exits 0 when the frame is unreadable or the runtime isn't recognised; once a runtime is identified, a failure to record the receipt exits 1 with a stderr message. Never pauses or modifies the tool call. Built on [sdk/go/emitter](../sdk/go/emitter/).
Short-lived PostToolUse / PostToolUseFailure / PreToolUse hook binary. Currently supports Claude Code; designed to support additional runtimes via the `formats` map in `main.go`. Reads a JSON frame from stdin, maps it to an `emitter.Event`, and forwards it to `obsigna-daemon` over a Unix-domain socket. A `PostToolUseFailure` frame carries an `error` string (and no `tool_response`); it maps to `decision="allowed"` with a non-empty error, so the daemon records the call as `outcome.status=failure` instead of leaving no receipt. Exits 0 when the frame is unreadable or the runtime isn't recognised; once a runtime is identified, a failure to record the receipt exits 1 with a stderr message. Never pauses or modifies the tool call. Built on [sdk/go/emitter](../sdk/go/emitter/).

## Getting started

13 changes: 12 additions & 1 deletion hook/README.md
@@ -1,6 +1,6 @@
# obsigna-hook

Short-lived hook binary for [Agent Receipts](https://github.com/agent-receipts/obsigna). Invoked by agent runtimes on `PostToolUse` events — reads a JSON frame from stdin, maps it to an audit event, and forwards it to `obsigna-daemon` over a Unix-domain socket. It exits 0 silently when the frame is unreadable or the runtime isn't recognised; once the runtime is identified, a failure to record the receipt exits 1 with a stderr message (surfacing a broken audit pipeline rather than dropping receipts). It never pauses or modifies the tool call.
Short-lived hook binary for [Agent Receipts](https://github.com/agent-receipts/obsigna). Invoked by agent runtimes on `PostToolUse` and `PostToolUseFailure` events — reads a JSON frame from stdin, maps it to an audit event, and forwards it to `obsigna-daemon` over a Unix-domain socket. `PostToolUseFailure` is what records a failed (or interrupted) tool call as a `failure` row in the chain; without it a lost concurrent write or errored call leaves no receipt at all. It exits 0 silently when the frame is unreadable or the runtime isn't recognised; once the runtime is identified, a failure to record the receipt exits 1 with a stderr message (surfacing a broken audit pipeline rather than dropping receipts). It never pauses or modifies the tool call.

> **Renamed from `agent-receipts-hook`** (ADR-0036). The old `agent-receipts-hook` binary still ships as a thin deprecation shim that forwards to `obsigna-hook`, so existing runtime hook configs keep working — point new configs at `obsigna-hook`. Homebrew now ships the hook in the umbrella `obsigna` formula (ADR-0034), which installs both `obsigna-hook` and the `agent-receipts-hook` shim.

@@ -31,6 +31,17 @@ In your Claude Code settings (`~/.claude/settings.json`):
}
]
}
],
"PostToolUseFailure": [
{
"matcher": "",
"hooks": [
{
"type": "command",
"command": "obsigna-hook"
}
]
}
]
}
}
79 changes: 72 additions & 7 deletions hook/cmd/obsigna-hook/claude_code.go
@@ -1,6 +1,7 @@
package main

import (
"bytes"
"encoding/json"
"errors"
"fmt"
@@ -10,8 +11,39 @@ import (
"github.com/agent-receipts/ar/sdk/go/emitter"
)

// maxErrorTextLen bounds the failure message copied into the receipt. Claude
// Code error strings are short, but the field is host-supplied and otherwise
// uncapped before the emitter's whole-frame MaxFrameSize check. Truncating here
// degrades an oversized message to a truncated receipt rather than dropping the
// failure entirely (an oversized frame would fail Emit and exit the hook 1).
const maxErrorTextLen = 16 << 10

// failureErrorText extracts the human-readable failure message from a
// PostToolUseFailure frame's `error` field. Claude Code sends a JSON string
// today; the frame is treated as an external artifact, so a non-string value
// (object/array/number) is kept as its raw JSON text rather than aborting the
// whole-frame unmarshal — a schema variation degrades the message instead of
// dropping the failure receipt. The result is trimmed and length-capped.
func failureErrorText(raw json.RawMessage) string {
trimmed := bytes.TrimSpace(raw)
if len(trimmed) == 0 || bytes.Equal(trimmed, []byte("null")) {
return ""
}
var s string
if err := json.Unmarshal(trimmed, &s); err != nil {
// Not a JSON string — keep the raw JSON text as the message.
s = string(trimmed)
}
s = strings.TrimSpace(s)
if len(s) > maxErrorTextLen {
// ToValidUTF8 drops a trailing partial rune left by the byte-slice cut.
s = strings.ToValidUTF8(s[:maxErrorTextLen], "") + "…(truncated)"
}
return s
}

// claudeCodeFrame is the JSON envelope Claude Code sends on stdin for
// PostToolUse and PreToolUse hooks.
// PostToolUse, PostToolUseFailure, and PreToolUse hooks.
type claudeCodeFrame struct {
HookEventName string `json:"hook_event_name"`
SessionID string `json:"session_id"`
@@ -22,13 +54,29 @@ type claudeCodeFrame struct {
AgentID string `json:"agent_id"`
AgentType string `json:"agent_type"`
TranscriptPath string `json:"transcript_path"`

// Error and IsInterrupt are carried only on PostToolUseFailure frames.
// Error is the human-readable failure message. Claude Code sends a JSON
// string, but it is kept as RawMessage and decoded leniently (see
// failureErrorText) so a non-string value cannot abort the whole-frame
// unmarshal and drop the failure receipt. IsInterrupt distinguishes a
// user/abort cancellation from a genuine execution error. A
// PostToolUseFailure frame carries no tool_response.
Error json.RawMessage `json:"error"`
IsInterrupt bool `json:"is_interrupt"`
}

// readClaudeCode parses a Claude Code PostToolUse or PreToolUse stdin frame
// and maps it to an emitter.Event. The decision is derived from the
// hook_event_name field:
// - "PostToolUse" → "allowed" (tool ran successfully)
// - "PreToolUse" → "pending" (tool is about to run; outcome not yet known)
// readClaudeCode parses a Claude Code PostToolUse, PostToolUseFailure, or
// PreToolUse stdin frame and maps it to an emitter.Event. The decision is
// derived from the hook_event_name field:
// - "PostToolUse" → "allowed" (tool ran successfully)
// - "PostToolUseFailure" → "allowed" + non-empty Error (tool ran but failed;
// the daemon stamps outcome.status=failure from the error)
// - "PreToolUse" → "pending" (tool is about to run; outcome not yet known)
//
// PostToolUse fires only on success and PostToolUseFailure only on failure, so
// capturing both is what records a failed (or interrupted) tool call as a
// failure row in the chain rather than leaving no receipt at all.
//
// The returned sessionID is the host-supplied session identifier from the
// frame; it is the empty string when absent.
@@ -44,10 +92,26 @@ func readClaudeCode(stdin []byte, env func(string) string) (emitter.Event, strin
return emitter.Event{}, "", errors.New("missing tool_name")
}

var decision string
var decision, failureErr string
switch f.HookEventName {
case "PostToolUse":
decision = "allowed"
case "PostToolUseFailure":
// The tool call was permitted and ran, but execution failed (or was
// interrupted). Record it as "allowed" with a non-empty error: the
// daemon maps decision="allowed" + a non-empty error to
// outcome.status=failure. Guarantee non-empty text so a failure frame
// is never silently downgraded to success by that rule, even on the
// rare frame that carries no message.
decision = "allowed"
failureErr = failureErrorText(f.Error)
if failureErr == "" {
if f.IsInterrupt {
failureErr = "tool call interrupted"
} else {
failureErr = "tool call failed"
}
}
case "PreToolUse":
decision = "pending"
default:
@@ -61,6 +125,7 @@ func readClaudeCode(stdin []byte, env func(string) string) (emitter.Event, strin
Channel: "claude-code",
Tool: emitter.Tool{Name: f.ToolName},
Decision: decision,
Error: failureErr,
CorrelationID: f.ToolUseID,
AgentID: f.AgentID,
AgentType: f.AgentType,
58 changes: 58 additions & 0 deletions hook/cmd/obsigna-hook/integration_test.go
@@ -131,6 +131,7 @@ type wireFrame struct {
} `json:"tool"`
Input json.RawMessage `json:"input,omitempty"`
Output json.RawMessage `json:"output,omitempty"`
Error string `json:"error,omitempty"`
Decision string `json:"decision"`
TsEmit string `json:"ts_emit"`
}
@@ -205,6 +206,63 @@ func TestIntegration_ClaudeCodeFrame(t *testing.T) {
}
}

// TestIntegration_ClaudeCodeFailureFrame exercises a PostToolUseFailure frame
// end-to-end: the failure error must reach the listener on the wire so the
// daemon can stamp outcome.status=failure. This is the path that records a
// failed (e.g. lost concurrent write) tool call as a failure row in the chain.
func TestIntegration_ClaudeCodeFailureFrame(t *testing.T) {
dir := shortSocketDir(t)
rl := newRecordingListener(t, dir)

const sessionID = "integ-fail-2026"
stdin := `{
"hook_event_name": "PostToolUseFailure",
"session_id": "` + sessionID + `",
"tool_use_id": "tu-lost-write",
"tool_name": "Edit",
"tool_input": {"file_path":"/repo/shared.go","old_string":"a","new_string":"b"},
"error": "String to replace not found in file."
}`

ev, sid, err := readClaudeCode([]byte(stdin), func(string) string { return "" })
if err != nil {
t.Fatalf("readClaudeCode: %v", err)
}

em, err := emitter.NewDaemon(
emitter.WithSocketPath(rl.path),
emitter.WithSessionID(sid),
emitter.WithLogger(slog.New(slog.NewTextHandler(io.Discard, nil))),
)
if err != nil {
t.Fatalf("emitter.NewDaemon: %v", err)
}
defer em.Close()

if err := em.Emit(context.Background(), ev); err != nil {
t.Fatalf("Emit: %v", err)
}

frames := rl.waitForFrames(t, 1, 2*time.Second)

var got wireFrame
if err := json.Unmarshal(frames[0], &got); err != nil {
t.Fatalf("unmarshal frame: %v (raw: %s)", err, frames[0])
}
if got.Decision != "allowed" {
t.Errorf("decision = %q; want allowed", got.Decision)
}
if got.Error != "String to replace not found in file." {
t.Errorf("error = %q; want the failure message on the wire", got.Error)
}
if got.Output != nil {
t.Errorf("output = %s; want absent on a failure frame", got.Output)
}
if !json.Valid(got.Input) {
t.Errorf("input not valid JSON: %s", got.Input)
}
}
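
The `got.Output != nil` assertion above relies on how `encoding/json` treats a missing key: the target field is left at its zero value, which for `json.RawMessage` is `nil`. A minimal sketch of that behaviour, using a hypothetical reduced `frame` struct that keeps only three of the `wireFrame` fields:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// frame is a hypothetical, reduced stand-in for the test's wireFrame; only the
// fields needed to show the "absent output" check are kept.
type frame struct {
	Output   json.RawMessage `json:"output,omitempty"`
	Error    string          `json:"error,omitempty"`
	Decision string          `json:"decision"`
}

// decode unmarshals one wire frame from its JSON text.
func decode(raw string) (frame, error) {
	var f frame
	err := json.Unmarshal([]byte(raw), &f)
	return f, err
}

func main() {
	f, err := decode(`{"decision":"allowed","error":"String to replace not found in file."}`)
	if err != nil {
		panic(err)
	}
	// No "output" key was present, so Output stays nil.
	fmt.Println(f.Output == nil, f.Decision, f.Error)
}
```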

// TestIntegration_DaemonDown_SurfacesError verifies that by default (ADR-0025,
// the mode the hook uses), Emit returns an error when the daemon socket is
// unreachable. The hook then exits non-zero to surface the failure to the agent
10 changes: 5 additions & 5 deletions hook/cmd/obsigna-hook/main.go
@@ -1,7 +1,7 @@
// Command obsigna-hook is a short-lived hook binary invoked by agent runtimes
// (Claude Code, Codex, …) on PostToolUse and PreToolUse events. It reads a JSON
// frame from stdin, maps it to an emitter.Event, and forwards it to the
// agent-receipts daemon over a Unix-domain socket.
// (Claude Code, Codex, …) on PostToolUse, PostToolUseFailure, and PreToolUse
// events. It reads a JSON frame from stdin, maps it to an emitter.Event, and
// forwards it to obsigna-daemon over a Unix-domain socket.
//
// It is the primary hook entrypoint (ADR-0036). The legacy agent-receipts-hook
// binary is a thin deprecation shim that forwards here (see
@@ -66,7 +66,7 @@ var formats = map[string]reader{
// Claude Code does not set CLAUDE_SESSION_ID as an environment variable; it
// passes hook_event_name in the stdin JSON payload instead. We check both
// signals so the binary works with runtimes that take either approach.
// Both "PostToolUse" and "PreToolUse" are accepted from stdin.
// "PostToolUse", "PostToolUseFailure", and "PreToolUse" are accepted from stdin.
func detect(stdin []byte, env func(string) string) string {
if env("CLAUDE_SESSION_ID") != "" {
return "claude-code"
@@ -76,7 +76,7 @@ func detect(stdin []byte, env func(string) string) string {
}
if json.Unmarshal(stdin, &probe) == nil {
switch probe.HookEventName {
case "PostToolUse", "PreToolUse":
case "PostToolUse", "PostToolUseFailure", "PreToolUse":
return "claude-code"
}
}
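
The two detection signals touched by this hunk can be sketched as a standalone function. This is an illustration under assumptions rather than the file's code: `detectRuntime` is a hypothetical name, the probe struct is reconstructed from the lines visible in the diff, and returning an empty string for an unrecognised frame is assumed.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// detectRuntime is a hypothetical stand-in for detect: it checks the
// CLAUDE_SESSION_ID environment signal first, then probes the stdin payload
// for a known hook_event_name. The empty return for "unknown" is an assumption.
func detectRuntime(stdin []byte, env func(string) string) string {
	if env("CLAUDE_SESSION_ID") != "" {
		return "claude-code"
	}
	var probe struct {
		HookEventName string `json:"hook_event_name"`
	}
	if json.Unmarshal(stdin, &probe) == nil {
		switch probe.HookEventName {
		case "PostToolUse", "PostToolUseFailure", "PreToolUse":
			return "claude-code"
		}
	}
	return ""
}

func main() {
	noEnv := func(string) string { return "" }
	fmt.Printf("%q\n", detectRuntime([]byte(`{"hook_event_name":"PostToolUseFailure"}`), noEnv))
	fmt.Printf("%q\n", detectRuntime([]byte(`{"hook_event_name":"SomethingElse"}`), noEnv))
	fmt.Printf("%q\n", detectRuntime([]byte(`not json`), noEnv))
}
```

Without the new case, a `PostToolUseFailure` frame would fall through as an unrecognised runtime and the hook would exit 0 without recording anything, which is the gap this change closes.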