chore(docker): retry package installs and network fetches in CI builds #21478
Merged
falcorocks approved these changes Jun 22, 2026
sebastianst reviewed Jun 22, 2026
… CI flakes

CI Docker builds regularly flake when a package registry/CDN drops a connection or returns a server error (apk.cgr.dev, dl-cdn.alpinelinux.org). Use each tool's own retry mechanism so there is no shared script to maintain:
- apt: `-o Acquire::Retries=8` on every apt-get/apt invocation.
- curl/wget: `--retry 5 --retry-all-errors --retry-delay 2`.
- apk (no built-in retry): a one-line `until` loop, ~15 attempts / 20s apart (~5 min), that exits non-zero if it never succeeds so genuine breakages still fail rather than being masked.
7044101 to 67ad18e
Add docs/ai/docker.md: the rule that every external network fetch in a Dockerfile must retry (apt Acquire::Retries, curl/wget --retry, apk until-loop) so registry/CDN blips don't flake CI image builds. Reference it from the AGENTS.md doc index and cross-link it from ci-ops.md.
What
CI Docker image builds regularly flake because package installs fail to download when a registry/CDN drops a connection or returns a server error, e.g. this run, and an `apk.cgr.dev` (Chainguard) server error observed on this PR's own first build.

This makes the network steps resilient using each tool's own retry mechanism, with no shared script to keep in sync across Dockerfiles:
- apt: `-o Acquire::Retries=8` on every `apt-get`/`apt` invocation (apt's built-in download retry, with backoff).
- curl/wget: `--retry 5 --retry-all-errors --retry-delay 2`.
- apk (no built-in retry): a one-line `until` loop, ~15 attempts × 20s apart (~5 min budget), that exits non-zero if it never succeeds so a genuine breakage (e.g. a renamed package) still fails rather than being masked.

`apk --no-cache` is kept intentionally so builds still pull the latest package versions each time. Reliability is favoured over speed: the apk registries (`dl-cdn.alpinelinux.org`, `apk.cgr.dev`) are the ones actually observed flaking, and they get the generous ~5 min window.

Covers `ops/docker/*`, `op-up`, `cannon`, `rust/op-reth`, and the `rust/kona/docker/*` images. The vendored `rust/op-rbuilder` and `rust/rollup-boost` Dockerfiles are intentionally excluded (slated for deprecation per `docs/ai/rust-dev.md`).

Also adds `docs/ai/docker.md` documenting the rule (every external fetch in a Dockerfile must retry), referenced from the `AGENTS.md` doc index and cross-linked from `docs/ai/ci-ops.md`.

Test plan
No Docker available locally to build the images. Verified:
- The `until` loop in POSIX `sh`: silent exit 0 on first-try success, exits 0 when a command recovers mid-sequence, and exits non-zero (not masked) after the full attempt budget is exhausted.
- `Acquire::Retries` is a real apt option; `-o Acquire::Retries=8` is valid in any position relative to the subcommand.
- The loop runs under the `/bin/sh` of every base image it's used in (alpine / golang-alpine / wolfi); no `SHELL ["/bin/bash"]` directive is in effect in those stages.
- `--retry-all-errors` is supported by the curl in every base image used (curl ≥ 7.71).

Real validation is the CI Docker builds on this PR.
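For readers who want to reproduce the three `until`-loop behaviours listed above without Docker, here is a minimal POSIX `sh` sketch. It is not the one-liner from the Dockerfiles, which this page does not show. The function name `retry_until` and the `ATTEMPTS`/`DELAY` variables are hypothetical, introduced only so the loop can be exercised in isolation; the PR inlines the loop in each `RUN` step rather than sharing a helper.

```shell
#!/bin/sh
# Hypothetical helper for illustration only; the PR inlines an equivalent
# loop per Dockerfile instead of sharing a function like this one.
# ATTEMPTS and DELAY are assumed names; the defaults mirror the budget
# described above (15 attempts, 20 seconds apart).
retry_until() {
    attempts="${ATTEMPTS:-15}"
    delay="${DELAY:-20}"
    n=1
    # Re-run the given command until it succeeds.
    until "$@"; do
        if [ "$n" -ge "$attempts" ]; then
            # Budget exhausted: report and fail so real breakages surface.
            echo "giving up after $n attempts: $*" >&2
            return 1
        fi
        n=$((n + 1))
        sleep "$delay"
    done
}
```

A call such as `retry_until apk add --no-cache curl` (the package name is only an example) prints nothing and returns 0 when the first attempt succeeds, returns 0 if a later attempt succeeds, and returns 1 once every attempt has failed.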
History
An earlier revision used a shared `retry` shell shim (heredoc / `retry-tool` stage). That was replaced with built-in retries to avoid duplicating/maintaining the script across the many Dockerfiles and heterogeneous build contexts.