feat: append builder info to detect builds made with uncommitted changes in the builder by sayakpaul · Pull Request #666 · huggingface/kernels

sayakpaul · 2026-06-19T07:32:04Z

Fixes #648

Considerations:

The Nix sandbox has no .git, so this information has to be passed in explicitly. So, builder-info related stuff become command-line arguments embedded in the derivation's build script at eval time. Is there a better way to handle it?
nix build / nix flake show writes an untracked flake.lock, which makes the kernel tree dirty aka kernel.dirty=true unless the lock is committed. I have made a note about it in the docs. Is there a better way to handle it?

Example metadata.json with builder-info:

{
    "name": "relu",
    "id": "_relu_cuda_9f8a0ff",
    "version": 1,
    "license": "Apache-2.0",
    "source": "https://github.com/myorg/relu",
    "python-depends": [],
    "backend": {
      "type": "cuda",
      "archs": ["7.0", "7.5", "8.0", "8.6", "8.9", "9.0+PTX"]
    },
    "digest": {
      "algorithm": "sha256",
      "files": {
        "__init__.py": "xLMbARTcTl8L/m1kJLc/h/QL4Kzt772F872a46pfRGI=",
        "_relu_cuda_9f8a0ff.abi3.so": "vtdzzToloH38HZkVs7sFEf69QFDxROuPsBAond3Jic0=",
        "_ops.py": "Hrp5aF4o0eHSttw4sQGsbBAXFqvLJ42Y9YJ2KkqvZhg=",
        "relu/__init__.py": "DFYPlrhXwYjEqCl/8n0SmWGZV8NFml5DPhMjKfv98GY="
      }
    },
    "build-info": {
      "kernel-builder": {
        "version": "0.16.0-dev0",
        "sha": "03795978d693a6e0872f6a36187de55888744cfd",
        "dirty": false
      },
      "kernel": {
        "sha": "9f8a0ff1dcb2e1d34d46950eadf0714050f9af04",
        "dirty": false
      }
    }
  }

We throw a warning during upload to the Hub and also show a warning on the Hub card that the corresponding builds were made with a "dirty" version. Kernel loading also logs similar warnings.

Since the real changes affect the above three surfaces, the total changes are a bit bigger than I expected (including all the docs and tests). I can break the PR down further:

Add builder info in the kernel-builder CLI
kernels related changes
Append card-related changes

HuggingFaceDocBuilderDev · 2026-06-19T07:34:00Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Adds the --kernel-sha/--kernel-dirty and --kernel-builder-sha/ --kernel-builder-dirty flags to the generated CLI reference, fixing the check-cli-docs CI job. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Fixes the `nix fmt --ci` formatting check. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

danieldk · 2026-06-19T09:42:05Z

+    git_info: Option<GitInfo>,
+    kernel_builder: Option<KernelBuilderInfo>,


We shouldn't overload this struct, it's purely for uniquely identifying a kernel (e.g. for ops name generation). Let's keep it small and atomic.

danieldk · 2026-06-19T09:44:53Z

+    // Prefer externally-provided `kernel-builder` provenance (e.g. from Nix),
+    // falling back to the provenance baked in at compile time.
+    let kernel_builder = kernel_id
+        .kernel_builder()
+        .cloned()
+        .unwrap_or_else(kernel_builder_info);


This seems the wrong place to do this. The builder's hash/version should be burned into the builder, so we should adjust the derivation to give the build access to it.

danieldk · 2026-06-19T09:53:36Z

+    kernel_sha: Option<String>,
+    kernel_dirty: bool,


This can lead to incorrect metadata in case someone generates pyproject.toml and then adds more commits. So doing this from CMake (and then giving the kernel build sandbox access to git files) seems like the way to do it. For the Nix build (since we probably do not want to copy all the git objects to the sandbox), we could pass the SHA to CMake. Not sure how important this is though.

This is currently also an issue for the sha hash we use in the ops name. We are generating it when making the project files, but it should probably be done inside CMake (but would better be a separate PR to avoid PRs becoming too big).

danieldk · 2026-06-19T09:58:25Z

        build_dir.display()
    );

+    let dirty_build = detect_dirty_build(&variants);


Can we split the dirty warnings into a separate PR? I think it's best to iterate on the metadata first as a smaller unit.

danieldk · 2026-06-19T11:05:12Z

+    emit_git_info();
+}
+
+fn emit_git_info() {


Might be worth using built over our own custom code:

https://crates.io/crates/built

Also uses git2 rather than shelling out, which is generally more reliable than shelling out (e.g. git may not be available in a build sandbox).

Also seems to be used by at least one very high-profile crate (rav1e).

danieldk · 2026-06-19T11:15:26Z


+  # Extra `create-pyproject` flags recording git provenance (commit SHA and
+  # dirty state) of the kernel source and `kernel-builder`.
+  provenanceArgs ? "",


I think it's nicer if the args are just the atomic data, not flags.

danieldk · 2026-06-19T11:16:15Z

      rev,
      doGetKernelCheck,
      stripRPath ? false,
+      provenanceArgs ? "",


Why set defaults? Makes it easy to miss.

danieldk · 2026-06-19T11:17:33Z


+  # Git provenance of the `kernel-builder` itself (the flake revision it was
+  # evaluated from), recorded in the build metadata. `null`/`false` when
+  # `kernel-builder` is used from a non-git source (e.g. a local `path:`).


A local path can also be a git source. I think in flakes it must even be.

danieldk · 2026-06-19T11:20:23Z

+  builderRev ? null,
+  builderDirty ? false,


This makes it kernel-overridable, I don't think we want to expose this in the public API. We should probably let the function take two attrsets. Also, I think we can pass through the builder flake and get it from there, rather than separate attributes. Finding these in flake.nix seems like the wrong place.

danieldk · 2026-06-19T11:23:11Z

+  provenanceArgs =
+    let
+      kernelSha =
+        if self == null then
+          null
+        else if self ? rev then
+          self.rev
+        else if self ? dirtyRev then
+          lib.removeSuffix "-dirty" self.dirtyRev
+        else
+          null;
+      kernelDirty = self != null && !(self ? rev);
+      builderSha = if builderRev == null then null else lib.removeSuffix "-dirty" builderRev;
+      parts =
+        lib.optional (kernelSha != null) "--kernel-sha ${kernelSha}"
+        ++ lib.optional (kernelSha != null && kernelDirty) "--kernel-dirty"
+        ++ lib.optional (builderSha != null) "--kernel-builder-sha ${builderSha}"
+        ++ lib.optional (builderSha != null && builderDirty) "--kernel-builder-dirty";
+    in
+    lib.concatStringsSep " " parts;
+


As mentioned above in arch.nix, this is a very local concern, doing this gen-flake-outputs.nix seems like the wrong level of abstraction.

sayakpaul · 2026-06-19T12:28:42Z

Will break the PR into multiple parts, starting with appending the builder info, as discussed.

Will wait for the metadata versioning PR, first before continuing. This so that we don't have to make the provenance data optional,

github-actions · 2026-06-19T12:30:09Z

Coverage report — `kernels/`

Measured on: Python 3.10 / Torch 2.12.0.
Other CI configurations are not included in this number.
Hardware-gated code paths (ROCm/XPU/NPU/Darwin/Windows) are excluded or unreachable on the Linux+CUDA runner.

Total coverage: 84.5% — threshold: 80% — ✅

Per-file breakdown

Name	Stmts	Miss	Cover	Missing
src/kernels/__init__.py	10	0	100%
src/kernels/_system.py	6	1	83%	10
src/kernels/_versions.py	63	7	89%	46, 49, 52-53, 56-57, 100
src/kernels/backends.py	194	55	72%	40, 44, 48-51, 68, 90, 108, 117, 121, 125-127, 148, 170, 181, 188-191, 201, 205-225, 233, 256-276
src/kernels/compat.py	8	1	88%	5
src/kernels/deps.py	54	4	93%	58-59, 95, 98
src/kernels/layer/__init__.py	6	0	100%
src/kernels/layer/_interval_tree.py	103	4	96%	23, 52, 147, 150
src/kernels/layer/device.py	48	14	71%	42, 47-49, 91, 96-98, 101, 149, 152, 155-157
src/kernels/layer/func.py	93	7	92%	72, 100, 154, 257, 263, 272, 290
src/kernels/layer/globals.py	5	0	100%
src/kernels/layer/kernelize.py	73	8	89%	255, 273, 281-282, 288, 292, 308-310
src/kernels/layer/layer.py	174	15	91%	166, 209, 215, 228, 320-321, 333, 342, 350, 361, 390, 394, 407, 460, 490
src/kernels/layer/mode.py	14	0	100%
src/kernels/layer/repos.py	130	34	74%	27, 33, 36-41, 61-62, 68, 71-74, 88, 92, 101-102, 108, 111-114, 121-122, 128, 131-134, 141-142, 148, 151-154, 235
src/kernels/lockfile.py	71	46	35%	37-104, 108-131
src/kernels/status.py	49	2	96%	23, 81
src/kernels/utils.py	315	55	83%	65, 77-81, 87-88, 242, 246, 249, 311, 319, 358-359, 397, 428, 433, 468, 697, 700, 702, 708, 721-722, 743-755, 759-766, 774, 778-788, 792-799, 837, 841, 860, 862
src/kernels/variants.py	262	19	93%	56, 87, 108, 138, 247-248, 289, 291, 371-378, 384-390, 421-427, 439-445, 534-536
src/kernels/verify.py	88	1	99%	32
TOTAL	1766	273	85%

Updated by the Test kernels workflow on commit 2f1855613fb3b7f93c9a95b08f568369622d6266.

sayakpaul added 3 commits June 19, 2026 12:05

feat: record git status.

0379597

up

a6fa638

Merge branch 'main' into dirty

b638689

sayakpaul temporarily deployed to testpypi June 19, 2026 07:32 — with GitHub Actions Inactive

sayakpaul marked this pull request as draft June 19, 2026 07:33

docs: regenerate builder-cli.md for create-pyproject provenance flags

cd94930

Adds the --kernel-sha/--kernel-dirty and --kernel-builder-sha/ --kernel-builder-dirty flags to the generated CLI reference, fixing the check-cli-docs CI job. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

sayakpaul temporarily deployed to testpypi June 19, 2026 08:34 — with GitHub Actions Inactive

nix: format gen-flake-outputs.nix with nixfmt

27a2009

Fixes the `nix fmt --ci` formatting check. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

sayakpaul temporarily deployed to testpypi June 19, 2026 08:48 — with GitHub Actions Inactive

formatting

b6e800e

sayakpaul marked this pull request as ready for review June 19, 2026 09:19

sayakpaul temporarily deployed to testpypi June 19, 2026 09:19 — with GitHub Actions Inactive

danieldk reviewed Jun 19, 2026

View reviewed changes

Merge branch 'main' into dirty

2f18556

sayakpaul temporarily deployed to testpypi June 19, 2026 12:25 — with GitHub Actions Inactive

sayakpaul marked this pull request as draft June 19, 2026 12:29

		git_info: Option<GitInfo>,
		kernel_builder: Option<KernelBuilderInfo>,

Conversation

sayakpaul commented Jun 19, 2026

Uh oh!

HuggingFaceDocBuilderDev commented Jun 19, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

sayakpaul commented Jun 19, 2026

Uh oh!

github-actions Bot commented Jun 19, 2026

Coverage report — kernels/

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Coverage report — `kernels/`