Bake Mathlib build cache into CI image#307
Conversation
| continue-on-error: true | ||
| run: python3 scripts/ci/verify_ci_image.py ghcr.io/reilabs/lampe-ci:${{ steps.versions.outputs.image_tag }} | ||
| - name: Free Runner Disk Space | ||
| if: steps.check.outcome == 'failure' && inputs.allow_push |
There was a problem hiding this comment.
What about always removing it if don't need it at all? is the step very slow and it's better to do it only when in need to rebuild the image?
There was a problem hiding this comment.
Also all other steps share same condition that could be a condition for another job in the pipeline (like run build image job only if there is no image already build).
https://github.com/orgs/community/discussions/25941
There was a problem hiding this comment.
ok good suggestion: build-ci-image is now check/build/guard jobs with job-level conditions, and the disk-space step only exists inside the build job, so it only runs when an image actually gets built. did the same with ci.yaml, the per-step cache-hit guards are gone
| if: ${{ steps.lampe-build-cache.outputs.cache-hit != 'true' }} | ||
| working-directory: Lampe | ||
| run: lake exe cache get | ||
| run: lake exe cache get || echo "cache get failed; falling back to artifacts baked into the CI image" |
There was a problem hiding this comment.
Why not always use baked in docker image files?
There was a problem hiding this comment.
this command reads the local archive (via an env var pointing to it) and unpackages. it shouldn't hit the network
Ship only the ~0.4GB ltar store at /opt/mathlib-cache: bake_mathlib_cache.sh fills it from mathlib's cloud cache when it still serves the pinned rev, else from a full source build plus 'lake exe cache pack', and fails the image build on an incomplete store. The image tag gains the mathlib rev and a hash of the docker build context, so editing the recipe or any COPY'd file retags the image.
Container jobs clear the bind-mounted /__t; host runners drop /opt/hostedtoolcache, the Android SDK, and the .NET tree.
check probes GHCR for the tag with 'docker manifest inspect' (replacing verify_ci_image.py), build runs under a single job-level condition (image missing and pushes allowed), and guard fails fork PRs whose image is missing. ci-noir passes allow_push: true since it only triggers on schedule and dispatch.
scripts/ci/link_lake_packages.py points a workspace's .lake/packages at the shared $LAKE_PKG_DIR; test.xsh and the CI jobs invoke the same helper. The lakefile/manifest rewriters only write when something actually changes, since CI cache keys hash the rewritten files. 'lake exe cache get' is best-effort: the baked store restores offline and 'lake build' can always finish from source. Tests run in sorted order, and the single-purpose helpers in utils.xsh fold into the regex-based ones.
Checked-in lake-manifest.json files keep lake from re-resolving mathlib through proven-zk's .git-suffixed URL spelling and discarding existing clones. MerkleFromScratch drops its packagesDir override in favor of the shared packages symlink.
A keys job computes every cache key in one place and probes them with lookup-only restores; the rustfmt/clippy/test/lean/stdlib/e2e jobs skip at the job level on an exact hit, with !failure() && !cancelled() guards on jobs downstream of skippable ones. Lean jobs share dependency clones via $LAKE_PKG_DIR and restore builds through restore-keys fallbacks, so lake builds stay incremental on partial hits. The CLI travels between jobs only as an artifact (upload zips drop the exec bit, hence the chmod steps). The unused CI_IMAGE env var is gone.
93093e4 to
60542d4
Compare
Mathlib cache migrated to Azure and they stopped serving our version's build cache. That and a couple of other reasons (e.g. cache get having a few types of transient errors, github itself being an unreliable mess) makes our CI pipelines very finicky, especially when you consider how long a cacheless Mathlib build takes.
This PR bakes the Mathlib build cache on our CI image. Since we rebuild the image whenever Mathlib (among other core dependencies) changes version, this has basically no downside.
A couple of other changes here like: