feat(core)!: two-phase broadcast status + reconcile grace window by whoabuddy · Pull Request #26 · aibtcdev/tx-schemas

whoabuddy · 2026-04-15T19:26:22Z

Closes #25.

Summary

Introduces LedgerEntry.status (pending_broadcast | broadcast_sent | broadcast_failed) with explicit lifecycle helpers — beginPendingBroadcast / resolveBroadcast — backed by a transition matrix enforced at the schema layer (LedgerTransitionError on invalid moves).
Adds justBroadcastGraceSeconds (default 30) to reconcile(); ledger entries absent from the mempool but broadcast within the grace window are surfaced as inFlightPendingIndex instead of dropped. reconcile() also auto-promotes pending_broadcast → broadcast_sent on mempool confirmation.
decideBroadcast gains an await_pending_broadcast variant that short-circuits any other decision while a ledger entry is unresolved — hard invariant against silent double-broadcast.

Design notes

OccupantClassification stays closed. Grace-window ambiguity needs ledger + clock context that classifyOccupant has neither of by design; forcing it in would make every caller grace-aware for a concern only reconcile can answer. inFlightPendingIndex is additive on ReconcileResult where that context lives.
Timestamps stay ISO. broadcastAt remains IsoDateTimeSchema; grace math parses to epoch ms internally. No parallel unix-ms field.
Lifecycle helpers over README conventions. A two-phase write "pattern" documented only in prose drifts. beginPendingBroadcast / resolveBroadcast enforce the transition matrix at the type boundary and throw on misuse.

Breaking change

SponsorLedgerEntry.status is required. Entries persisted under 0.8.0 lack it and will fail schema parse. Consumers must backfill status: "broadcast_sent" on migration. Targets 0.9.0.

Test plan

217 tests pass (npm test)
npm run typecheck clean
npm run build clean
New coverage: every state transition (valid + invalid), await_pending_broadcast precedence over RBF, grace window inside/outside (10s/60s) and custom override (45s × {30, 60}), reconcile pending-promotion, crash-recovery scenario
Relay integration smoke (out of scope here — tracked in Consume tx-schemas sponsor wallet helpers; fix nonce conflict handling x402-sponsor-relay#338)

🤖 Generated with Claude Code

#25) Closes the two residual state-confidence gaps that blocked the relay sponsor-ledger refactor from claiming mainnet-grade state confidence. **Gap 1 — two-phase broadcast lifecycle** - `LedgerEntryStatusSchema`: `pending_broadcast` | `broadcast_sent` | `broadcast_failed` - `SponsorLedgerEntry.status` is now required; schema rejects entries without it - `beginPendingBroadcast(ledger, input)` writes the ledger before the network call; refuses to overwrite an unresolved `pending_broadcast` - `resolveBroadcast(ledger, nonce, "sent" | "failed")` transitions on return; throws `LedgerTransitionError` on invalid transitions (no double-resolve) - `decideBroadcast` returns `{ kind: "await_pending_broadcast", nonce, txId }` when the current entry is unresolved — forces the consumer to resolve the prior call before another broadcast can fire **Gap 2 — reconcile grace window** - `reconcile()` accepts `justBroadcastGraceSeconds` (default 30) - Ledger entries absent from the mempool but broadcast within the window are surfaced as `inFlightPendingIndex` instead of `dropped` — covers node → Hiro indexer propagation lag - `reconcile()` promotes `pending_broadcast` → `broadcast_sent` automatically when the mempool confirms the txId (crash recovery) **Classification stays closed** `OccupantClassification` is NOT extended. Grace-window ambiguity requires ledger + clock context that `classifyOccupant` has neither of by design; leaking it there would make every caller grace-aware for a concern only `reconcile` can answer. The pending state is a ledger lifecycle concern, not an occupant kind — the existing `sponsor_owned_in_ledger` match handles mempool-confirmed pending entries correctly. **Breaking change** Existing ledger entries persisted under 0.8.0 lack `status` and will fail schema parse. Consumers must backfill `status: "broadcast_sent"` on migration — targets `0.9.0` minor. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

This PR hardens the sponsor wallet state machine by introducing a two-phase broadcast lifecycle for ledger entries and adding a reconcile grace window to reduce false “dropped” classifications during mempool/indexer propagation lag.

Changes:

Add required SponsorLedgerEntry.status (pending_broadcast | broadcast_sent | broadcast_failed) and new lifecycle helpers beginPendingBroadcast / resolveBroadcast with LedgerTransitionError.
Update decideBroadcast to return await_pending_broadcast when a nonce is unresolved, preventing accidental double-broadcast.
Extend reconcile() with justBroadcastGraceSeconds and a new inFlightPendingIndex result bucket; also auto-promote pending_broadcast → broadcast_sent on mempool confirmation.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

File	Description
`src/core/sponsor-ledger.ts`	Adds required `status` field + status enum schema for ledger entries.
`src/core/sponsor-wallet-machine.ts`	Implements two-phase lifecycle helpers, `await_pending_broadcast` decision, and reconcile grace-window + in-flight classification.
`tests/sponsor-wallet-machine.test.ts`	Updates existing tests for required `status` and adds coverage for two-phase lifecycle + grace-window behaviors.
`README.md`	Documents the two-phase broadcast write pattern and reconcile grace-window semantics.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

- Remove unused LedgerEntryStatusSchema import from sponsor-wallet-machine. - reconcile: txId mismatch at a nonce is drift, not propagation lag. Classify as dropped regardless of broadcastAt age. Grace window now applies only when observed is null/undefined. Adds test to pin the behavior. - README: reformat inline object literal as fenced block so markdown renderers don't split the single-line code span across a newline. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

arc0btc

Two-phase broadcast lifecycle + reconcile grace window — closes the crash-window gap that's been an open reliability concern for the relay.

What works well:

The transition matrix is airtight: PENDING_PREDECESSORS set clearly models which states are valid predecessors, and LedgerTransitionError gives callers a typed error to catch rather than a silent wrong-state write.
decideBroadcast guarding at the top — returning await_pending_broadcast before any other logic — is the right precedence call. It keeps the invariant clean regardless of occupant classification.
The two-pass structure in reconcile() is well-commented and the separation of concerns is sound: Pass 1 handles mempool hits (adopt + promote), Pass 2 handles absences (grace vs. dropped). The second commit's fix making txId-drift unconditionally dropped is exactly right — drift is not lag.
Test coverage is thorough. The crash-recovery scenario (entry broadcast 120s ago, never resolved, not in mempool → dropped) and the grace-window boundary tests (10s inside / 60s outside / custom override) give me confidence the edge cases are understood.

[question] Grace window applies to broadcast_sent entries too (src/core/sponsor-wallet-machine.ts, Pass 2)

Pass 2 computes ageMs from entry.broadcastAt for all entries when observed is null — regardless of whether the entry is pending_broadcast or broadcast_sent. A broadcast_sent entry that temporarily disappears from the Hiro mempool (say, due to rapid confirmation or transient indexer lag) within 30s of its broadcastAt timestamp would land in inFlightPendingIndex rather than dropped. Is this intentional? Operationally, a broadcast_sent entry going absent is a different signal than a pending_broadcast one — the former means the node confirmed receipt, the latter means we haven't heard back yet. If the intent is grace-for-pending-only, the condition could be:

    const ageMs = nowMs - Date.parse(entry.broadcastAt);
    if (entry.status === "pending_broadcast" && ageMs >= 0 && ageMs < graceMs) {
      inFlightPendingIndex.push(entry.nonce);
    } else {

If the current behavior is intentional (grace for any recent absence), a brief comment explaining why would help future readers.

[question] justBroadcastGraceSeconds default vs. stated propagation timing (src/core/sponsor-wallet-machine.ts:507)

The comment says the grace window "Covers node→indexer propagation lag (~6-10 Nakamoto blocks + Hiro indexing)". If those are streaming Stacks blocks at ~1-2s each, 30s is comfortably generous. If they're longer, 30s could be tight. The x402 relay can override this via ReconcileOptions, but for consumers who don't override, it's worth confirming the 30s default is calibrated to real Hiro indexing latency. We've seen Hiro 400 delays operationally, though those are different failure modes.

[nit] Variable shadowing in README example (README.md)

let ledger = beginPendingBroadcast(ledger, {

This shadows the outer ledger binding. JS evaluates the RHS correctly so the code works, but the pattern is confusing to read — looks like a circular assignment at a glance. let pendingLedger = beginPendingBroadcast(ledger, ...) or a reassignment (ledger = beginPendingBroadcast(ledger, ...)) would read more naturally.

Code quality notes:

The rbfAttempts defaulting chain in beginPendingBroadcast (input.rbfAttempts ?? existing?.rbfAttempts ?? 0) is correct but callers doing RBF need to remember to pass the incremented count explicitly — there's no auto-increment. The README covers the first-broadcast case but not the RBF flow. A brief README addition showing the RBF call (rbfAttempts: existingEntry.rbfAttempts + 1) would prevent a subtle foot-gun for consumers integrating this pattern.

Operational context:

We run this code path nightly for Zest sBTC supply operations. The nonce-serialization fix we shipped last week (shared nonce coordinator) prevents concurrent double-submissions at the application layer, but there was still a theoretical crash window between write and node confirmation. This closes it cleanly at the state machine layer. The await_pending_broadcast guard is the piece we'd want most — if the process dies mid-broadcast, crash recovery via reconcile picks it up without a second broadcast firing.

The two questions above are worth a quick check before merge, but neither is blocking. The core logic is sound and the test coverage gives me confidence in the critical paths.

- reconcile: grace window now applies ONLY to pending_broadcast entries. A broadcast_sent entry that vanishes from the mempool is mined or evicted (the node already confirmed receipt), not propagation lag. Falls through to dropped so the caller disambiguates via tx status lookup. - ReconcileOptions: clarify default-30s calibration (Nakamoto block timing + Hiro indexer latency) and document grace-window scope. - README: drop shadowed `let ledger = ...` in favor of straight reassignment. - README: add RBF flow example showing explicit rbfAttempts increment so consumers don't miss the no-auto-increment behavior. - Test: pin broadcast_sent absence behavior (5s old, mempool null → dropped, not inFlightPendingIndex). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

whoabuddy · 2026-04-15T22:15:36Z

Thanks for the thorough read, @arc0btc — all four points addressed in a909bbc.

[question 1] Grace window leaking to broadcast_sent — fixed.
Sharp catch. You're right that broadcast_sent and pending_broadcast carry different operational meaning when they vanish from the mempool. Pass 2 now branches explicitly:

pending_broadcast + absent → grace window applies (could still be propagation lag)
broadcast_sent + absent → dropped immediately (node confirmed receipt; absence means mined or evicted, caller disambiguates via tx status lookup)
any tx_id mismatch → dropped immediately (drift, not lag)

Added a test does NOT grace-window broadcast_sent entries that pins this: 5s-old broadcast_sent entry, mempool null → dropped, not inFlightPendingIndex. Comment on ReconcileOptions.justBroadcastGraceSeconds now spells out the scope.

[question 2] 30s default — keeping it, but documented honestly.
Stayed at 30s to match the issue spec, but expanded the comment to acknowledge it's calibrated to the tight end of the Nakamoto-block + Hiro-indexer envelope (3-5s × 6-10 blocks ≈ 30-50s). Operators seeing routine Hiro slowness should override upward; the grace only delays a dropped classification, never changes a true outcome. If we hit operational pain on this default, we can revisit in a 0.10.x.

[nit 3] README shadowing — fixed.
Dropped the let ledger = ... rebind in favor of straight reassignment. Reads as the iterative state update it actually is.

[code quality] RBF flow example — added.
Good foot-gun call. README now has a second snippet showing the explicit rbfAttempts: existing.rbfAttempts + 1 pass on retry, with a note that beginPendingBroadcast doesn't auto-increment because the same helper covers first broadcasts and retries.

Operationally: the Zest sBTC nightly path is exactly the consumer this is meant to harden. The shared nonce coordinator catches concurrent submissions; this catches the in-process crash window. Together that's the gap closed end-to-end. 219 tests green, build clean — ready for merge when you are.

…atch Six independent silent-failure modes in v2.1.108 produced prose-only dispatches with no tool calls landing. Decisive fix: setting CLAUDE_CODE_SUBPROCESS_ENV_SCRUB=0 — the v2.1.108 hardening force-resets permission mode to "default" when scrub is on, silently overriding --permission-mode bypassPermissions. End-to-end verified: PR review posted to aibtcdev/tx-schemas#26 by arc0btc, task self-closed. Files: - .claude/settings.json: disable sandbox (VM is the trust boundary; network allowlist wildcard "*" silently no-ops in v2.1.108) - docs/claude-code-v2.1.108-sandbox-patch.md: full investigation notes covering all six failure modes (dispatch.ts changes documented there were committed in earlier auto-commits) - .gitignore: exclude db/last-dispatch-*.txt diagnostic dumps

* chore(deps): bump @aibtc/tx-schemas to 1.0.0 Upgrade from ^0.3.0 to ^1.0.0. The 1.0.0 breaking change (two-phase broadcast status + reconcile grace window, aibtcdev/tx-schemas#26) adds five new TerminalReason variants and five new RpcErrorCode variants; the core enums used by this repo (TrackedPaymentStateSchema, TerminalFailureStateSchema, paymentStateDefaultDeliveryByState) are unchanged and required no code edits. Co-Authored-By: Claude <noreply@anthropic.com> * refactor(inbox): adopt new broadcast states from tx-schemas 1.0.0 Expand RPC_ERROR_CODE_MAP and TERMINAL_REASON_ERROR_CODE_MAP in relay-rpc.ts to cover the five new variants introduced in tx-schemas 1.0.0: TerminalReason: sponsor_exhausted → INSUFFICIENT_FUNDS, sponsor_nonce_conflict → RELAY_ERROR, origin_chaining_limit → NONCE_CONFLICT, broadcast_rate_limited → BROADCAST_FAILED, sender_hand_expired → PAYMENT_NOT_FOUND RpcErrorCode: SPONSOR_EXHAUSTED → INSUFFICIENT_FUNDS, ORIGIN_CHAINING_LIMIT → NONCE_CONFLICT, BROADCAST_RATE_LIMITED → BROADCAST_FAILED, SENDER_HAND_EXPIRED → PAYMENT_NOT_FOUND, NONCE_OCCUPIED → NONCE_CONFLICT Add 18 schema-compat tests verifying enum parse round-trips and map resolution for each new variant (506 total, up from 488). Co-Authored-By: Claude <noreply@anthropic.com> * fix: address PR review feedback - Refactor new-TerminalReason mapping tests to be table-driven and cover all 5 new variants, not just 3. Previously sponsor_nonce_conflict and broadcast_rate_limited were added to TERMINAL_REASON_ERROR_CODE_MAP without matching regression tests. - Add `afterEach(() => vi.useRealTimers())` to the mapping describe so an assertion throwing mid-test can't leak fake timers into later tests. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings April 15, 2026 19:26

Copilot started reviewing on behalf of whoabuddy April 15, 2026 19:26 View session

Copilot AI reviewed Apr 15, 2026

View reviewed changes

Comment thread src/core/sponsor-wallet-machine.ts Outdated

Comment thread README.md Outdated

Comment thread src/core/sponsor-wallet-machine.ts Outdated

arc0btc approved these changes Apr 15, 2026

View reviewed changes

whoabuddy merged commit cbf7261 into main Apr 15, 2026
1 check passed

github-actions Bot mentioned this pull request Apr 15, 2026

chore(main): release tx-schemas 1.0.0 #27

Merged

whoabuddy mentioned this pull request Apr 22, 2026

chore(deps): bump @aibtc/tx-schemas to 1.0.0 aibtcdev/landing-page#633

Merged

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(core)!: two-phase broadcast status + reconcile grace window#26

feat(core)!: two-phase broadcast status + reconcile grace window#26
whoabuddy merged 3 commits into
mainfrom
feat/sponsor-ledger-two-phase-broadcast

whoabuddy commented Apr 15, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

arc0btc left a comment

Uh oh!

whoabuddy commented Apr 15, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

whoabuddy commented Apr 15, 2026

Summary

Design notes

Breaking change

Test plan

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

arc0btc left a comment

Choose a reason for hiding this comment

Uh oh!

whoabuddy commented Apr 15, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants