4 changes: 4 additions & 0 deletions docs/source/builder-cli.md
@@ -244,6 +244,10 @@ Generate CMake files for a kernel extension build

* `-f`, `--force` — Force-overwrite existing files
* `--unique-id <UNIQUE_ID>` — This is an optional unique identifier that is suffixed to the kernel name to avoid name collisions. (e.g. Git SHA)
* `--kernel-sha <KERNEL_SHA>` — Full commit SHA of the kernel source, recorded in the build metadata. When absent, it is detected from the kernel's git repository (used by Nix builds where the source has no `.git`)
* `--kernel-dirty` — Mark the kernel source as having uncommitted changes in the build metadata. Only meaningful together with `--kernel-sha`
* `--kernel-builder-sha <KERNEL_BUILDER_SHA>` — Full commit SHA of the `kernel-builder` source, recorded in the build metadata. When absent, the SHA baked in at compile time is used (Nix builds pass this since the sandbox has no `.git`)
* `--kernel-builder-dirty` — Mark `kernel-builder` as having uncommitted changes in the build metadata. Only meaningful together with `--kernel-builder-sha`



20 changes: 20 additions & 0 deletions docs/source/kernel-requirements.md
@@ -81,6 +81,26 @@ metadata. Currently the following top-level keys are supported:
- `digest` (`Digest`, required): hash digest of the kernel files.
- `python-depends` (`list[str]`, optional): list of Python dependencies
from a curated set of Python dependencies.
- `build-info` (`dict`, optional): provenance of the build, used to flag
  non-reproducible (dirty) builds. It contains two optional sub-objects:
  - `kernel-builder`: the `kernel-builder` that produced the build, with its
    `version` (`str`), the `sha` (`str`) of the `kernel-builder` source it was
    built from (when known), and a `dirty` (`bool`) flag that is `true` when
    `kernel-builder` was built from a source tree with uncommitted changes.
  - `kernel`: the kernel source that was built, with its commit `sha` (`str`)
    and a `dirty` (`bool`) flag that is `true` when the kernel source had
    uncommitted changes.

When either `dirty` flag is set, the kernel was built from uncommitted
sources and cannot be reliably reproduced. `kernels` warns when loading such
a build, `kernel-builder` warns when uploading it, and a warning banner is
added to the Hub repository's README.

> **Note:** For Nix builds, dirtiness follows Nix's flake tree status, which
> also counts **untracked** files (including an uncommitted `flake.lock`).
> Commit your `flake.lock` (and avoid stray untracked files) so that clean
> builds are not flagged as dirty. Local `create-pyproject` runs only
> consider changes to tracked files.
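
To make the "either `dirty` flag" rule concrete, here is a minimal sketch in Rust. The struct names echo the types this PR imports from `kernels_data::metadata`, but the field layout is inferred from how the diff constructs them, and `is_dirty` is a hypothetical helper that does not appear in the PR. Treating a missing sub-object as "not dirty" is an assumption as well.

```rust
/// Mirrors the `kernel-builder` sub-object of `build-info` (layout inferred from the diff).
#[derive(Debug, Clone, PartialEq)]
pub struct KernelBuilderInfo {
    pub version: String,
    /// Absent when the `kernel-builder` source SHA is not known.
    pub sha: Option<String>,
    pub dirty: bool,
}

/// Mirrors the `kernel` sub-object of `build-info` (layout inferred from the diff).
#[derive(Debug, Clone, PartialEq)]
pub struct GitInfo {
    pub sha: String,
    pub dirty: bool,
}

/// Mirrors `build-info`; both sub-objects are optional.
#[derive(Debug, Clone, PartialEq, Default)]
pub struct BuildInfo {
    pub kernel_builder: Option<KernelBuilderInfo>,
    pub kernel: Option<GitInfo>,
}

/// Hypothetical helper: returns true when either `dirty` flag is set.
/// Assumption: a missing sub-object counts as "not dirty".
pub fn is_dirty(info: &BuildInfo) -> bool {
    let builder_dirty = info.kernel_builder.as_ref().map_or(false, |b| b.dirty);
    let kernel_dirty = info.kernel.as_ref().map_or(false, |k| k.dirty);
    builder_dirty || kernel_dirty
}
```

A consumer such as a loader could call `is_dirty(&build_info)` and emit its warning when the result is `true`; how `kernels` actually implements the check is not shown in this diff.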

Example `metadata.json`:

10 changes: 10 additions & 0 deletions flake.nix
@@ -37,6 +37,14 @@
;
inherit (import ./nix-builder/lib/cache.nix) mkForCache;

# Git provenance of `kernel-builder` itself, recorded in the build
# metadata of kernels it builds. `self` here is the `kernel-builder`
# flake (as resolved in the consuming kernel's lock). It is captured
# here because the `self` argument of `genKernelFlakeOutputs` below
# shadows it with the *kernel's* `self`.
builderRev = self.rev or self.dirtyRev or null;
builderDirty = !(self ? rev);

systems = with flake-utils.lib.system; [
aarch64-darwin
aarch64-linux
@@ -109,6 +117,8 @@
path
rev
self
builderRev
builderDirty
doGetKernelCheck
pythonCheckInputs
pythonNativeCheckInputs
49 changes: 49 additions & 0 deletions kernel-builder/build.rs
@@ -1,3 +1,52 @@
use std::process::Command;

fn main() {
minijinja_embed::embed_templates!("src/pyproject/templates");
emit_git_info();
}

fn emit_git_info() {

Review comment (Member):

It might be worth using `built` instead of our own custom code:

https://crates.io/crates/built

It also uses `git2` rather than shelling out, which is generally more reliable (e.g. `git` may not be available in a build sandbox).

It also seems to be used by at least one very high-profile crate (rav1e).

let sha = env_var("KERNEL_BUILDER_GIT_SHA").or_else(|| git_output(&["rev-parse", "HEAD"]));
if let Some(sha) = sha {
println!("cargo:rustc-env=KERNEL_BUILDER_GIT_SHA={sha}");
}

let dirty = match env_var("KERNEL_BUILDER_GIT_DIRTY") {
Some(value) => is_truthy(&value),
// Only consider tracked files; untracked files (e.g. generated build
// artifacts) should not mark the build as dirty.
None => git_output(&["status", "--porcelain", "--untracked-files=no"])
.map(|out| !out.is_empty())
.unwrap_or(false),
};
println!(
"cargo:rustc-env=KERNEL_BUILDER_GIT_DIRTY={}",
if dirty { "1" } else { "0" }
);

println!("cargo:rerun-if-env-changed=KERNEL_BUILDER_GIT_SHA");
println!("cargo:rerun-if-env-changed=KERNEL_BUILDER_GIT_DIRTY");
println!("cargo:rerun-if-changed=../.git/HEAD");
println!("cargo:rerun-if-changed=../.git/index");
}

fn env_var(name: &str) -> Option<String> {
std::env::var(name).ok().filter(|s| !s.is_empty())
}

fn is_truthy(value: &str) -> bool {
value == "1" || value.eq_ignore_ascii_case("true")
}

fn git_output(args: &[&str]) -> Option<String> {
let output = Command::new("git").args(args).output().ok()?;
if !output.status.success() {
return None;
}
let s = String::from_utf8_lossy(&output.stdout).trim().to_string();
if s.is_empty() {
None
} else {
Some(s)
}
}
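
The precedence that `emit_git_info` applies to the dirty flag can be sketched as a pure function, which makes the edge cases easier to see. This is a hedged illustration rather than code from the PR: `resolve_dirty` and its parameters are hypothetical names, and only `is_truthy` is copied from the diff.

```rust
/// Same truthiness rule as `is_truthy` in the diff: "1" or any casing of "true".
fn is_truthy(value: &str) -> bool {
    value == "1" || value.eq_ignore_ascii_case("true")
}

/// Hypothetical pure version of the dirty-flag decision in `emit_git_info`.
///
/// * `env_override`: value of `KERNEL_BUILDER_GIT_DIRTY`, if set and non-empty.
/// * `tracked_status`: output of `git status --porcelain --untracked-files=no`,
///   or `None` when `git` could not be run.
fn resolve_dirty(env_override: Option<&str>, tracked_status: Option<&str>) -> bool {
    match env_override {
        // An explicit override always wins, even over a dirty working tree.
        Some(value) => is_truthy(value),
        // Otherwise any non-empty status output means tracked files changed.
        // If git is unavailable, the build is reported as clean.
        None => tracked_status.map_or(false, |out| !out.trim().is_empty()),
    }
}
```

Two consequences follow from this shape: an override of `0` reports a clean build even when tracked files are modified, and a sandbox without `git` and without the override also reports a clean build.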
37 changes: 36 additions & 1 deletion kernel-builder/src/main.rs
@@ -174,6 +174,28 @@ enum Commands {
/// kernel name to avoid name collisions. (e.g. Git SHA)
#[arg(long)]
unique_id: Option<String>,

/// Full commit SHA of the kernel source, recorded in the build
/// metadata. When absent, it is detected from the kernel's git
/// repository (used by Nix builds where the source has no `.git`).
#[arg(long)]
kernel_sha: Option<String>,

/// Mark the kernel source as having uncommitted changes in the build
/// metadata. Only meaningful together with `--kernel-sha`.
#[arg(long)]
kernel_dirty: bool,

/// Full commit SHA of the `kernel-builder` source, recorded in the
/// build metadata. When absent, the SHA baked in at compile time is
/// used (Nix builds pass this since the sandbox has no `.git`).
#[arg(long)]
kernel_builder_sha: Option<String>,

/// Mark `kernel-builder` as having uncommitted changes in the build
/// metadata. Only meaningful together with `--kernel-builder-sha`.
#[arg(long)]
kernel_builder_dirty: bool,
},

/// Spawn a kernel development shell.
@@ -365,7 +387,20 @@ fn main() -> Result<()> {
force,
target_dir,
unique_id,
-} => create_pyproject(kernel_dir, target_dir, force, unique_id),
kernel_sha,
kernel_dirty,
kernel_builder_sha,
kernel_builder_dirty,
} => create_pyproject(
kernel_dir,
target_dir,
force,
unique_id,
kernel_sha,
kernel_dirty,
kernel_builder_sha,
kernel_builder_dirty,
),
Commands::Devshell {
kernel_dir,
variant,
22 changes: 21 additions & 1 deletion kernel-builder/src/pyproject/common.rs
@@ -4,7 +4,7 @@ use eyre::Result;
use itertools::Itertools;

use kernels_data::config::{Backend, General};
-use kernels_data::metadata::{BackendInfo, Metadata};
use kernels_data::metadata::{BackendInfo, BuildInfo, KernelBuilderInfo, Metadata};

use crate::pyproject::ops_identifier::KernelIdentifier;
use crate::pyproject::FileSet;
@@ -20,11 +20,30 @@ pub fn write_compat_py(file_set: &mut FileSet) -> Result<()> {
Ok(())
}

fn kernel_builder_info() -> KernelBuilderInfo {
KernelBuilderInfo {
version: env!("CARGO_PKG_VERSION").to_owned(),
sha: option_env!("KERNEL_BUILDER_GIT_SHA").map(str::to_owned),
dirty: matches!(option_env!("KERNEL_BUILDER_GIT_DIRTY"), Some("1")),
}
}

pub fn write_metadata(
general: &General,
kernel_id: &KernelIdentifier,
file_set: &mut FileSet,
) -> Result<()> {
// Prefer externally-provided `kernel-builder` provenance (e.g. from Nix),
// falling back to the provenance baked in at compile time.
let kernel_builder = kernel_id
.kernel_builder()
.cloned()
.unwrap_or_else(kernel_builder_info);
Comment on lines +36 to +41

Review comment (Member):

This seems like the wrong place to do this. The builder's hash/version should be burned into the builder itself, so we should adjust the derivation to give the build access to it.

let build_info = BuildInfo {
kernel_builder: Some(kernel_builder),
kernel: kernel_id.git_info().cloned(),
};

for backend in &Backend::all() {
let writer = file_set.entry(format!("metadata-{backend}.json"));

@@ -51,6 +70,7 @@ pub fn write_metadata(
backend_type: *backend,
},
digest: None,
build_info: Some(build_info.clone()),
};

serde_json::to_writer_pretty(writer, &metadata)?;
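
The fallback in `write_metadata` (externally supplied provenance first, compile-time provenance otherwise) reduces to a single `Option` chain. The sketch below isolates it under stated assumptions: `baked_in` stands in for `kernel_builder_info()` and returns placeholder values rather than reading `CARGO_PKG_VERSION` or the `KERNEL_BUILDER_GIT_*` variables, and `resolve_builder` is a hypothetical name.

```rust
/// Field layout inferred from how the diff constructs `KernelBuilderInfo`.
#[derive(Debug, Clone, PartialEq)]
struct KernelBuilderInfo {
    version: String,
    sha: Option<String>,
    dirty: bool,
}

/// Stand-in for `kernel_builder_info()`. The real function reads values baked
/// in at compile time; the values here are placeholders for illustration.
fn baked_in() -> KernelBuilderInfo {
    KernelBuilderInfo {
        version: "0.0.0".to_owned(),
        sha: None,
        dirty: false,
    }
}

/// Prefer externally provided provenance (e.g. from Nix), else the baked-in one.
/// `unwrap_or_else` only calls `baked_in` when no override is present.
fn resolve_builder(external: Option<&KernelBuilderInfo>) -> KernelBuilderInfo {
    external.cloned().unwrap_or_else(baked_in)
}
```

Note that the override replaces the whole record, including `dirty`, so a baked-in dirty flag is discarded whenever an external SHA is supplied.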
34 changes: 32 additions & 2 deletions kernel-builder/src/pyproject/mod.rs
@@ -6,6 +6,7 @@ use std::{

use eyre::{bail, Result};
use kernels_data::config::{Build, Framework};
use kernels_data::metadata::{GitInfo, KernelBuilderInfo};
use minijinja::Environment;

use crate::{
@@ -39,16 +40,38 @@ pub fn create_pyproject_file_set(build: Build, kernel_id: &KernelIdentifier) ->
Ok(file_set)
}

#[allow(clippy::too_many_arguments)]
pub fn create_pyproject(
kernel_dir: Option<PathBuf>,
target_dir: Option<PathBuf>,
force: bool,
unique_id: Option<String>,
kernel_sha: Option<String>,
kernel_dirty: bool,
Comment on lines +49 to +50

Review comment (Member):

This can lead to incorrect metadata if someone generates `pyproject.toml` and then adds more commits. Doing this from CMake (and giving the kernel build sandbox access to the git files) seems like the way to go. For the Nix build, since we probably do not want to copy all the git objects into the sandbox, we could pass the SHA to CMake. Not sure how important this is, though.

This is currently also an issue for the SHA we use in the ops name. We generate it when making the project files, but it should probably be done inside CMake (though that would be better as a separate PR, to keep PRs from becoming too big).

kernel_builder_sha: Option<String>,
kernel_builder_dirty: bool,
) -> Result<()> {
let kernel_dir = check_or_infer_kernel_dir(kernel_dir)?;
let target_dir = check_or_infer_target_dir(&kernel_dir, target_dir)?;
let build = parse_build(&kernel_dir)?;
-let kernel_id = KernelIdentifier::new(&kernel_dir, build.general.name.python_name(), unique_id);
let git_override = kernel_sha.map(|sha| GitInfo {
sha,
dirty: kernel_dirty,
});
// The version is always that of the running `kernel-builder`; only the
// git provenance is supplied externally (e.g. by Nix).
let kernel_builder_override = kernel_builder_sha.map(|sha| KernelBuilderInfo {
version: env!("CARGO_PKG_VERSION").to_owned(),
sha: Some(sha),
dirty: kernel_builder_dirty,
});
let kernel_id = KernelIdentifier::new(
&kernel_dir,
build.general.name.python_name(),
unique_id,
git_override,
kernel_builder_override,
);
let file_set = create_pyproject_file_set(build, &kernel_id)?;
file_set.write(&target_dir, force)?;
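
The CLI help says `--kernel-dirty` is "only meaningful together with `--kernel-sha`". The `git_override` expression above is what enforces that, and a small sketch shows why. The struct layout is inferred from the diff, and wrapping the expression in a free function named `git_override` is done here only for illustration.

```rust
/// Field layout inferred from how the diff constructs `GitInfo`.
#[derive(Debug, Clone, PartialEq)]
struct GitInfo {
    sha: String,
    dirty: bool,
}

/// Same shape as the `git_override` expression in `create_pyproject`:
/// the dirty flag is only carried along when a SHA was supplied.
fn git_override(kernel_sha: Option<String>, kernel_dirty: bool) -> Option<GitInfo> {
    kernel_sha.map(|sha| GitInfo {
        sha,
        dirty: kernel_dirty,
    })
}
```

With no SHA, `git_override(None, true)` yields `None`, so the flag is silently dropped and provenance is detected from the kernel's git repository instead, as the `--kernel-sha` help text describes.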

@@ -65,7 +88,14 @@ pub fn clean_pyproject(
let kernel_dir = check_or_infer_kernel_dir(kernel_dir)?;
let target_dir = check_or_infer_target_dir(&kernel_dir, target_dir)?;
let build = parse_build(&kernel_dir)?;
-let kernel_id = KernelIdentifier::new(&kernel_dir, build.general.name.python_name(), unique_id);
// Provenance is irrelevant when computing the set of files to clean.
let kernel_id = KernelIdentifier::new(
&kernel_dir,
build.general.name.python_name(),
unique_id,
None,
None,
);

let generated_files = create_pyproject_file_set(build, &kernel_id)?.into_names();
