fix(toolkit): emit successful txs from batch-single-tx on partial failure#1731
fix(toolkit): emit successful txs from batch-single-tx on partial failure#1731chrispalaskas wants to merge 1 commit into
Conversation
…lure batch-single-tx discarded every successfully-built tx whenever a single transfer failed, returning PartialFailure before serializing anything to --dest-file. Downstream load appliers (tx_load_applier.py) saw the non-zero exit, deleted the batch, and skipped submission entirely — so a batch of 250 transfers with even one failure sent zero transactions. Now emit all txs that built successfully and only error when the entire batch fails (AllFailed), which the wrapper already handles as "produced no transactions". Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 534280335c
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| if succeeded == 0 { | ||
| return Err(BatchSingleTxError::AllFailed { failed }); | ||
| } |
There was a problem hiding this comment.
Preserve state when accepting partial batches
When failed > 0 but at least one transfer succeeds, this now continues with the mutated shared builder context and emits the remaining txs. For failures that occur after tx_info.prove() starts (for example shielded transfers where OfferInfo::build_inputs has already spent wallet state before proof/validation returns ProvingFailed), a failed, un-emitted transfer can leave local coins/DUST marked spent or create local change; later successes built from the same wallet may then depend on state that will not exist on-chain once only the successful txs are submitted. Please isolate or roll back per-transfer context mutations before treating partial failures as success.
Useful? React with 👍 / 👎.
HarleysCodes
left a comment
There was a problem hiding this comment.
Thanks for the fix — the behavior change is correct and the rationale (don't discard 247 valid txs because 3 failed) is exactly right. A few concrete observations from reading the diff:
Behavior change is sound. The rename of PartialFailure → AllFailed is a clean refactor: the variant is only constructed in one place and only matched by the BuildTxs Error impl in this same file, so there's no other call site to update. Verified via gh search code 'PartialFailure' returning only the original construction site. The new succeeded > 0 short-circuit correctly turns a partial-failure error into a successful return with the partial batch.
Suggestion: add a test for the new behavior. The PR description only mentions cargo check -p midnight-node-toolkit passing — there's no test that exercises failed > 0 && succeeded > 0 and verifies the function returns Ok with the partial batch. Even a small unit test in batch_single_tx.rs (constructing a builder with 5 mock transfers where 2 fail) would prevent regressions and document the new contract. Given the perf-repo load applier in the description relies on the new Ok path, locking the behavior in a test is high-value.
Minor: warn-message arg order. In the new tracing::warn!:
tracing::warn!(
succeeded = succeeded,
failed = failed,
"Partial batch: {} of {} transfers failed; emitting {} successful txs",
failed,
succeeded + failed,
succeeded
);The format string lists arguments as failed / total / succeeded but the structured fields above are succeeded = succeeded, failed = failed. I'd swap the format placeholders to {}/{}, succeeded {} → failed/total/succeeded for consistency with the structured fields, OR move the structured fields below the format string with failed = {}-style placeholders. Cosmetic, but the dual layout makes it easy to misread at a glance.
Pre-existing edge case, not a blocker. If the batch is empty (0 transfers, succeeded == 0 && failed == 0), the new code returns Ok(SerializedTxBatches { batches: vec![] }). Pre-existing behavior was the same, so this PR doesn't regress anything — but worth filing a follow-up if the team considers an empty batch an error condition (e.g. missing --num-transfers flag). Out of scope for this fix.
Optional doc nit. The PR body mentions the perf-repo wrapper (tx_load_applier.py → generate_txs_round_robin.py) but doesn't link it. A maintainer reviewing this would have to grep for those script paths manually. A couple of <url> references would let the reviewer verify the claim that 'no wrapper change is needed.'
Overall the fix is small, focused, and addresses a real waste-of-work bug. The main ask is adding a test; the rest is polish. Happy to follow up on any of these.
HarleysCodes
left a comment
There was a problem hiding this comment.
Two small follow-up inline comments after the main PR-level review above.
| return Err(BatchSingleTxError::PartialFailure { succeeded, failed }); | ||
| if succeeded == 0 { | ||
| return Err(BatchSingleTxError::AllFailed { failed }); | ||
| } |
There was a problem hiding this comment.
Tiny nit on the warn-message: structured fields above are succeeded = succeeded, failed = failed but the format placeholders consume them as failed, succeeded + failed, succeeded — i.e. the format string reads {failed} of {total} transfers failed; emitting {succeeded} successful txs. Either swap the format placeholders to match the structured-field ordering, or reorder the args. Cosmetic, but the dual layout is easy to misread.
| succeeded = succeeded, | ||
| failed = failed, | ||
| "Partial batch: {} of {} transfers failed; emitting {} successful txs", | ||
| failed, |
There was a problem hiding this comment.
This is the new code path worth a test: failed > 0 && succeeded > 0 must return Ok with the partial batch, not Err. The perf-repo load applier in the PR description relies on this exact behavior. Even a small test that constructs a BatchSingleTxBuilder with 5 transfers where 2 deliberately fail and verifies the function returns Ok(SerializedTxBatches) with 3 entries would lock the contract and document the new semantics for future maintainers. Without it, a future refactor could revert to Err on partial failure and break the load-test pipeline silently.
| @@ -246,8 +249,21 @@ impl<C: BuilderContext<DefaultDB>> BuildTxs for BatchSingleTxBuilder<C> { | |||
|
|
|||
| progress.finish(format!("batch-single-tx: {} succeeded, {} failed", succeeded, failed)); | |||
|
|
|||
| // Emit every tx we managed to build, even on partial failure. Discarding the | |||
| // successful txs because a few transfers failed wastes the whole batch and starves | |||
| // downstream load appliers. Only treat a batch with zero successes as an error. | |||
| if failed > 0 { | |||
There was a problem hiding this comment.
I would make sure that process returns a valid JSON for downstream instead of playing with error codes here.
As pointed by another comment: 0 of 0 is success, 0 of 1 if failure, but in both cases downstream doesn't have much to work on. It seems that return code alone isn't sufficient here.
Also, is 1 okay out of 250 a success? It hardly seems so to me.
It seems simpler to me if downstream makes decision what is enough for it.
Problem
generate-txs batch-single-txbuilds N transfers concurrently, then discards every successfully-built tx if even one transfer fails. It returnsPartialFailurebefore serializing anything, so--dest-fileis never written:This wastes the whole batch. Downstream load appliers (the perf repo's
tx_load_applier.py→generate_txs_round_robin.py) see the non-zero exit, delete the batch directory, and skip submission entirely. In practice a 250-transfer batch with even one failed transfer sends zero transactions — so the load test produces no traffic at all:All 247 valid txs are thrown away because 3 failed.
Fix
Emit every tx that built successfully, even on partial failure. Only error when the entire batch fails (
AllFailed):succeeded > 0: log a warning, write the successful txs, exit 0.succeeded == 0: returnAllFailed, exit non-zero (the wrapper already handles this as "produced no transactions").No wrapper change is needed — the existing
valid_count == 0 && returncode != 0guard ingenerate_txs_round_robin.pynow only trips on a fully-empty batch, and the applier submits partial batches.Testing
cargo check -p midnight-node-toolkitpasses.🤖 Generated with Claude Code