Filing this issue to share a bit of transparency around the recent CI work.
The CI is now running with buildbuddy enabled. Local developers should be able to dramatically speed up local builds (contact me if you can't get it to work). However, there are still some slowness issues with the CI to work out.
To find out why a build is slow, look for a log line like the following in bazel's output (example from https://github.com/google/heir/actions/runs/28059439274/job/83069884813):
INFO: Streaming build results to: https://heir.buildbuddy.io/invocation/01c314c4-1586-47bd-90b2-1b0922e1088f
That link should be publicly visible. From there you can navigate to the timing page, which in this case shows that the action "Compiling `tests/Examples/openfhe/ckks/mnist/mnist_openfhe_lib.inc.cc`" takes 1271 seconds (about 21 minutes). As a result, the clearest path forward to making the CI faster is to enable rolled kernels for that test, and I am preparing a PR to do this tomorrow. Once that is resolved, the build step should take ~5 minutes (for an LLVM commit bump).
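For anyone who wants to pull the invocation link out of a saved CI log rather than scrolling for it, here is a minimal sketch. It assumes only the log line format quoted above; the function name `find_invocation_urls` is made up for illustration and is not part of bazel or buildbuddy.

```python
import re

# Matches the "Streaming build results to:" line quoted above and captures
# the invocation URL. The pattern is an assumption based on that one example.
INVOCATION_RE = re.compile(
    r"Streaming build results to:\s*(https://\S+/invocation/[0-9a-f-]+)"
)


def find_invocation_urls(log_text: str) -> list[str]:
    """Return every invocation URL found in a bazel log, in order of appearance."""
    return INVOCATION_RE.findall(log_text)


sample_log = (
    "INFO: Streaming build results to: "
    "https://heir.buildbuddy.io/invocation/01c314c4-1586-47bd-90b2-1b0922e1088f\n"
)
urls = find_invocation_urls(sample_log)
print(urls)
```

A log with several bazel invocations would yield one URL per invocation, so the last entry is usually the one for the final build step.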
The second thing of note is that we now install OpenFHE separately and run the Python frontend tests with `pip install -e .` followed by `pytest`. The OpenFHE installation and Python venv should be completely cacheable. Given that the bazel cache upload step takes ~5 minutes, in total this should save about 7 minutes of CI time, bringing the total CI time down to 14 minutes, with the last bottleneck being the frontend tests themselves, which take 6 minutes. I can configure pytest to run tests in parallel to reduce this, though the compilation substep (mainly linking) may still keep those tests at ~1 minute.
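One common way to parallelize pytest is the pytest-xdist plugin, which adds the `-n` option. Whether HEIR will use it is my assumption, not something decided here; the sketch below only shows how the command would be assembled, and the helper name `pytest_command` is hypothetical.

```python
import sys


def pytest_command(workers: str = "auto", extra_args: tuple[str, ...] = ()) -> list[str]:
    """Build a pytest command line that spreads tests across worker processes.

    The "-n" option comes from the pytest-xdist plugin (an assumption: it must
    be installed in the venv); "auto" asks it to pick a worker count from the
    available CPUs.
    """
    return [sys.executable, "-m", "pytest", "-n", workers, *extra_args]


cmd = pytest_command()
print(" ".join(cmd[1:]))
# To actually run it: subprocess.run(cmd, check=True)
```

Parallel workers would not help with a serial compile or link step shared by all tests, which is consistent with the ~1 minute floor mentioned above.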
A third long-tail build item I noticed in the buildbuddy timing report is "Compiling `mlir/lib/Dialect/LLVMIR/IR/NVVMDialect.cpp`", which takes 2 minutes total. Since we don't use the NVVM dialect, I can try to trace which dependency path pulls this into HEIR, and try to improve the upstream LLVM bazel build so that we no longer pull it in. Fixing that could get us down to a 3.5-minute clean build.
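Tracing the dependency path can be done with bazel's `somepath(S, E)` query function, which reports one dependency path from `S` to `E` if any exists. The sketch below only assembles the command; both target labels are assumptions chosen for illustration and should be replaced with whatever the timing report and the HEIR BUILD files actually name.

```python
def somepath_query(start: str, end: str) -> list[str]:
    """Build a `bazel query` command that asks for one dependency path start -> end."""
    return ["bazel", "query", f"somepath({start}, {end})"]


# Both labels below are hypothetical placeholders, not verified against the repo.
cmd = somepath_query("//tools:heir-opt", "@llvm-project//mlir:NVVMDialect")
print(" ".join(cmd))
# To actually run it from the workspace root:
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Swapping `somepath` for `allpaths` would list every path rather than one, which is more useful when several targets pull in the same library.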
Finally, GitHub's security isolation means that pull requests from forks cannot run their CI with access to HEIR's configured secrets, including the remote build executor API key for buildbuddy. This means that those GH workflows will continue to build locally, but I'm hoping they can be sped up by using the remote build cache (in a read-only fashion), thereby reusing the build artifacts from the builds that happen on main. I have yet to validate that this is working properly. If you are an active contributor, you should be able to write to branches within google/heir (rather than your fork) and open pull requests from within the repository. We could configure the CI to use HEIR secrets in that case (though with the current CI config it would not).
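For reference, read-only use of a remote cache in bazel generally comes down to two flags: `--remote_cache` to point at the cache and `--remote_upload_local_results=false` to stop the client from writing to it. The sketch below assembles them; the cache URL is a placeholder assumption, and whether the cache accepts unauthenticated reads from fork PRs is exactly the part that has not been validated yet.

```python
def read_only_cache_flags(cache_url: str) -> list[str]:
    """Return bazel flags that read from a remote cache without uploading to it."""
    return [
        f"--remote_cache={cache_url}",
        # Keep locally built results out of the shared cache.
        "--remote_upload_local_results=false",
    ]


# Placeholder endpoint (an assumption); HEIR's actual cache URL may differ.
flags = read_only_cache_flags("grpcs://remote.buildbuddy.io")
print(" ".join(["bazel", "build", "//...", *flags]))
```

Because no API key is passed, this configuration needs nothing from the repository secrets, which is what makes it a candidate for fork pull requests.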
CC @AlexanderViand @crockeea @mdgrs