ExecuteTransaction verification robustness follow-ups (head-of-line blocking, timeout misclassification, Tier-3 re-scan) #304

@jorgecuesta

Summary

Tracking issue for verification-robustness follow-ups surfaced during the PR #278 review. None are regressions introduced by that PR's core fix; they're pre-existing/edge weaknesses in the ExecuteTransaction verification flow worth a dedicated pass.

Items

  • Head-of-line blocking in ExecutePendingTransactions. It awaits each executeChild("ExecuteTransaction") sequentially in a for loop. With the new per-block verify retry policy (up to TX_EXPIRATION_BLOCKS × 60s ≈ 30 min on the not-found path), a single slow/stuck transaction now stalls every later pending transaction in that run. Consider bounded concurrency (e.g. Promise.all over batches, like SupplierStatus).

  • verifyTransaction startToCloseTimeout (30s) vs. Tier-3 scan duration. One attempt can run Tier-1 + Tier-2 (REST) + Tier-3 (up to 30 sequential comet.block round-trips). A slow attempt is killed with a START_TO_CLOSE TimeoutFailure, whose cause is not the TX_NOT_FOUND ApplicationFailure that isTxNotFoundFailure() checks for. On the final attempt this flips verifyErroredUnexpectedly = true, so a tx that simply has not landed yet is logged as "verification errored (not a clean not-found) ... marked failure for triage", which is a misleading triage signal. The timeout also consumes a maximumAttempts slot, shrinking the real verify budget below 30 blocks.

  • Tier-3 re-scans from scratch on every retry. For a never-landing tx, ~30 retries × up to 30 sequential comet.block(h) fetches ≈ 900 block RPC round-trips against one endpoint over ~30 min, plus a fresh client connection per attempt. Most of those round-trips re-read blocks that an earlier attempt already scanned. Parallelize the block fetches (Promise.all) and/or scan only the blocks produced since the previous attempt.
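
For the head-of-line item, a minimal sketch of the batching idea, not the actual ExecutePendingTransactions code. runInBatches is a hypothetical helper, and it swaps the Promise.all mentioned above for Promise.allSettled so that one rejected child does not abort the rest of its batch.

```typescript
// Hypothetical helper (not in the codebase): run `task` over `items`
// in fixed-size batches instead of awaiting each item sequentially.
async function runInBatches<T, R>(
  items: readonly T[],
  batchSize: number,
  task: (item: T) => Promise<R>,
): Promise<PromiseSettledResult<R>[]> {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const results: PromiseSettledResult<R>[] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    const batch = items.slice(i, i + batchSize);
    // All items of a batch start together; a rejection is recorded
    // in the result list instead of being thrown.
    const settled = await Promise.allSettled(batch.map((item) => task(item)));
    results.push(...settled);
  }
  return results;
}
```

In the workflow, task would wrap the executeChild("ExecuteTransaction") call for one pending transaction. A stuck transaction still delays the batches that follow its own, so this narrows the stall to one batch rather than removing it; a sliding-window pool would be the next step if that matters.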
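
For the Tier-3 item, a rough sketch that combines both suggestions: fetch only heights above a cursor, and fetch them in parallel. scanNewBlocks, FetchBlockTxHashes and ScanResult are hypothetical names; the fetcher stands in for whatever wraps comet.block(h), and how the cursor survives between attempts is left open.

```typescript
// Hypothetical fetcher: returns the tx hashes contained in one block.
// In the real flow this would wrap the comet.block(h) call.
type FetchBlockTxHashes = (height: number) => Promise<readonly string[]>;

interface ScanResult {
  foundAtHeight: number | null; // null when the tx is in none of the new blocks
  lastScannedHeight: number;    // cursor to hand to the next attempt
}

async function scanNewBlocks(
  txHash: string,
  lastScannedHeight: number,
  latestHeight: number,
  fetchBlockTxHashes: FetchBlockTxHashes,
): Promise<ScanResult> {
  // Only heights after the cursor are fetched; earlier ones were
  // already scanned by a previous attempt.
  const heights: number[] = [];
  for (let h = lastScannedHeight + 1; h <= latestHeight; h++) {
    heights.push(h);
  }
  // Fetch the new blocks in parallel. If any fetch rejects, the whole
  // call rejects and the caller keeps its old cursor.
  const blocks = await Promise.all(heights.map((h) => fetchBlockTxHashes(h)));
  const idx = blocks.findIndex((hashes) => hashes.includes(txHash));
  return {
    foundAtHeight: idx === -1 ? null : heights[idx] ?? null,
    lastScannedHeight: Math.max(lastScannedHeight, latestHeight),
  };
}
```

With a cursor, a never-landing tx costs roughly one block fetch per newly produced block instead of a full window per retry. An unbounded Promise.all could still burst against the endpoint on the first attempt, so a cap on in-flight fetches may be worth adding.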

Notes

The block-count↔retry-count coupling (maximumAttempts = TX_EXPIRATION_BLOCKS with a 60s interval, assuming ~60s block time) is a related fragility: if block time drifts, the verify window no longer matches the on-chain mempool-expiration window. Worth considering a height-based loop instead of a time-based proxy.
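
A sketch of what a height-based loop could look like, under stated assumptions: ChainClient, verifyUntilHeight and the injected sleep are hypothetical, and treating submitHeight + expirationBlocks as the expiration height is an assumption about how the mempool-expiration window is counted.

```typescript
// Hypothetical chain access; the real flow would back these with the
// existing Tier-1/2/3 lookups and a latest-height query.
interface ChainClient {
  getLatestHeight(): Promise<number>;
  findTx(txHash: string): Promise<boolean>;
}

type VerifyOutcome = "found" | "expired";

async function verifyUntilHeight(
  client: ChainClient,
  txHash: string,
  submitHeight: number,
  expirationBlocks: number,
  sleep: (ms: number) => Promise<void>,
  pollMs = 60_000,
): Promise<VerifyOutcome> {
  // Assumption: the tx can no longer land once the chain passes this height.
  const deadlineHeight = submitHeight + expirationBlocks;
  for (;;) {
    // Read the height before the lookup, so "expired" is only returned
    // after a lookup that ran at or past the deadline height.
    const height = await client.getLatestHeight();
    if (await client.findTx(txHash)) return "found";
    if (height >= deadlineHeight) return "expired";
    await sleep(pollMs);
  }
}
```

Here the poll interval only controls how often the chain is queried. The verify window ends on chain height, so block-time drift changes how long the loop runs but not how many blocks it covers.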
