fix(execution-verifier): persist verified block progress on exit #145

Open
piersy wants to merge 1 commit into main from piersy/execution-verifier-persist-progress-on-exit

piersy commented Mar 25, 2026

Contributor

An execution verifier instance was entering a CrashLoopBackOff in Kubernetes because it would complete a short block range faster than the 10-second persistence interval, then exit without saving progress. On restart it would read the same stale persisted block, re-process the same range, and exit again in an infinite loop.

This PR won't stop the verifier from getting stuck in CrashLoopBackOff, but at least the verifier will emit a log line that makes it clear it has finished processing its requested block range.

This was the alert: https://clabsco.slack.com/archives/C04NWTCC810/p1774433563332659

Changes:

  • Persist the verified block tracker to the state file on all exit paths (normal completion, task error, and task panic), not just on the background timer. The happy path propagates persist errors; the error paths use best-effort persistence to avoid masking the original error.
  • Clone state_file and tracker before they are moved into spawned task closures so they remain available at the exit points.
  • Improve the startup log to print the end block number when defined, or indicate that the verifier is following the head, making it easier to diagnose range vs head-following mode from pod logs.
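The exit-path handling described in the list above can be sketched roughly as follows. This is a minimal illustration using standard-library threads rather than the async tasks the PR actually touches. The names `persist_verified_block`, `tracker` and `state_file` are borrowed from the PR, but their signatures here are assumptions, and `Tracker`, `verify_range`, `run` and the `fail_at` parameter are hypothetical helpers invented for the example.

```rust
use std::fs;
use std::io;
use std::path::PathBuf;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for the verifier's tracker: the highest verified block.
type Tracker = Arc<AtomicU64>;

/// Writes the tracker's current value to the state file, if one is configured.
/// (Assumed signature; the real function in the PR is async.)
fn persist_verified_block(tracker: &Tracker, state_file: Option<&PathBuf>) -> io::Result<()> {
    match state_file {
        Some(path) => fs::write(path, tracker.load(Ordering::SeqCst).to_string()),
        None => Ok(()),
    }
}

/// Pretends to verify blocks `start..=end`, failing at `fail_at` if it is set.
fn verify_range(tracker: &Tracker, start: u64, end: u64, fail_at: Option<u64>) -> Result<(), String> {
    for block in start..=end {
        if fail_at == Some(block) {
            return Err(format!("verification failed at block {}", block));
        }
        tracker.store(block, Ordering::SeqCst);
    }
    Ok(())
}

/// Runs the verification task and persists progress on every exit path.
fn run(state_file: Option<PathBuf>, start: u64, end: u64, fail_at: Option<u64>) -> Result<(), String> {
    let tracker: Tracker = Arc::new(AtomicU64::new(0));
    // Clone before the move into the closure so the original handle
    // is still available at the exit points below.
    let tracker_clone = Arc::clone(&tracker);
    let handle = thread::spawn(move || verify_range(&tracker_clone, start, end, fail_at));

    match handle.join() {
        // Happy path: a failed persist is a real error, so propagate it.
        Ok(Ok(())) => persist_verified_block(&tracker, state_file.as_ref()).map_err(|e| e.to_string()),
        // Task error: persist on a best-effort basis, keep the original error.
        Ok(Err(task_err)) => {
            let _ = persist_verified_block(&tracker, state_file.as_ref());
            Err(task_err)
        }
        // Task panic: same best-effort persistence.
        Err(_panic) => {
            let _ = persist_verified_block(&tracker, state_file.as_ref());
            Err("verification task panicked".to_string())
        }
    }
}
```

Calling `run(Some(path), 1, 5, None)` leaves the last verified block in the state file even though the range completes long before any periodic timer would fire. With `fail_at` set, the file still records the last block that was verified before the failure, and the caller sees the task's own error rather than a persistence error.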

The execution verifier was entering a CrashLoopBackOff in Kubernetes
because it would complete a short block range faster than the 10-second
persistence interval, then exit without saving progress. On restart it
would read the same stale persisted block, re-process the same range,
and exit again in an infinite loop.

Changes:
- Persist the verified block tracker to the state file on all exit
  paths (normal completion, task error, and task panic), not just on
  the background timer. The happy path propagates persist errors; the
  error paths use best-effort persistence to avoid masking the original
  error.
- Clone state_file and tracker before they are moved into spawned task
  closures so they remain available at the exit points.
- Improve the startup log to print the end block number when defined,
  or indicate that the verifier is following the head, making it easier
  to diagnose range vs head-following mode from pod logs.
piersy force-pushed the piersy/execution-verifier-persist-progress-on-exit branch from d1f465d to f5fdac6 on March 25, 2026 21:28
piersy requested a review from ezdac on March 26, 2026 13:27
Some(end) => tracing::info!(
    start_block_number = start_block,
    end_block_number = end,
    "Using start-block with end-block"

Contributor

Sounds a bit weird. Maybe "using fixed block range"?

Comment on lines 307 to +308
verified_block_store_task.abort();
persist_verified_block(tracker, cli.state_file.as_ref()).await?;

Contributor

Will this give us a clean abort without joining verified_block_store_task? Should we rather do

Suggested change
- verified_block_store_task.abort();
- persist_verified_block(tracker, cli.state_file.as_ref()).await?;
+ cancel_token.cancel();
+ verified_block_store_task.await?;
+ persist_verified_block(tracker, cli.state_file.as_ref()).await?;
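The ordering question raised in this comment (signal the background store task, wait for it to finish, and only then do the final persist) can be illustrated with a small sketch. It uses standard-library threads and an `AtomicBool` flag as a stand-in for the async task and cancellation token in the PR, so it shows the shape of the idea rather than the actual code. `persist`, `run_and_shut_down` and the 10 ms interval are hypothetical.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Writes the tracker's current value to the state file (hypothetical helper).
fn persist(tracker: &AtomicU64, state_file: &Path) -> io::Result<()> {
    fs::write(state_file, tracker.load(Ordering::SeqCst).to_string())
}

/// Runs a periodic store task next to some simulated work, then shuts down
/// in the order: cancel, join, final persist. Returns the persisted value.
fn run_and_shut_down(state_file: PathBuf, last_block: u64) -> io::Result<u64> {
    let tracker = Arc::new(AtomicU64::new(0));
    let cancelled = Arc::new(AtomicBool::new(false));

    let tracker_clone = Arc::clone(&tracker);
    let cancelled_clone = Arc::clone(&cancelled);
    let state_file_clone = state_file.clone();
    // Background task: persist periodically until asked to stop.
    let store_task = thread::spawn(move || -> io::Result<()> {
        while !cancelled_clone.load(Ordering::SeqCst) {
            persist(&tracker_clone, &state_file_clone)?;
            thread::sleep(Duration::from_millis(10));
        }
        Ok(())
    });

    // Simulated verification work.
    for block in 1..=last_block {
        tracker.store(block, Ordering::SeqCst);
    }

    // 1. Ask the store task to stop.
    cancelled.store(true, Ordering::SeqCst);
    // 2. Wait for it, so no periodic write can still be in flight.
    store_task.join().expect("store task panicked")?;
    // 3. Only now write the final value; nothing can overwrite it afterwards.
    persist(&tracker, &state_file)?;

    let persisted = fs::read_to_string(&state_file)?;
    persisted
        .trim()
        .parse::<u64>()
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}
```

Because the join completes before the final write, the value left in the state file is always the one written in step 3. Whether an abort without a join gives the same guarantee in the async setting is the open question in the comment above, and this sketch does not settle it.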

let concurrency_handle = verify_new_heads_concurrency.clone();
handles.spawn({
    let cancel_token = cancel_token.clone();
    let cloned_cancel_token = cancel_token.clone();

Contributor

Using the name cancel_token_clone would be consistent with the naming in this file. We already had cancel_token_clone and state_file_clone above.
