Skip to content

fix(scout): skip attestation on hosts with no TPM device#2900

Open
s3rj1k wants to merge 1 commit into
NVIDIA:mainfrom
s3rj1k:tpm_attestation
Open

fix(scout): skip attestation on hosts with no TPM device#2900
s3rj1k wants to merge 1 commit into
NVIDIA:mainfrom
s3rj1k:tpm_attestation

Conversation

@s3rj1k

@s3rj1k s3rj1k commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

Gate the attestation block on actual TPM presence so a host with no TPM skips AK/EK setup and sends discovery without AttestKeyInfo, which the API accepts when attestation_enabled is false.

Related issues

Type of Change

  • Add - New feature or capability
  • Change - Changes in existing functionality
  • Fix - Bug fixes
  • Remove - Removed features or deprecated functionality
  • Internal - Internal changes (refactoring, tests, docs, etc.)

Breaking Changes

  • This PR contains breaking changes

Testing

  • Unit tests added/updated
  • Integration tests added/updated
  • Manual testing performed
  • No testing required (docs, internal refactor, etc.)

Additional Notes

@s3rj1k s3rj1k requested a review from a team as a code owner June 25, 2026 21:33
@copy-pr-bot

copy-pr-bot Bot commented Jun 25, 2026

Copy link
Copy Markdown

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@s3rj1k

s3rj1k commented Jun 25, 2026

Copy link
Copy Markdown
Contributor Author

related: #2703

@coderabbitai

coderabbitai Bot commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Summary by CodeRabbit

  • Bug Fixes
    • Improved attestation startup so it only runs when a TPM is actually available.
    • If no TPM device is found, the app now logs a warning and continues without failing attestation setup.
    • Added smarter TPM device detection, including support for explicit device paths and common kernel TPM nodes.

Walkthrough

Registration now checks whether a TPM device is present before enabling attestation. TPM path handling now normalizes device-node inputs, probes standard kernel TPM nodes, and adds tests for both candidate selection and presence checks.

Changes

TPM attestation gating

Layer / File(s) Summary
TPM device probe
crates/scout/src/tpm.rs
Normalizes TPM path inputs into device candidates, probes candidate nodes with try_exists(), and adds unit tests for explicit device paths and fallback kernel nodes.
Attestation gate in register
crates/scout/src/register.rs
Derives do_attestation from host type and TPM presence, skips attestation setup without a TPM device, and reuses the gate for the post-registration attestation block.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • NVIDIA/infra-controller#2670: Also changes crates/scout/src/register.rs around attestation/TPM setup and the conditions under which that flow executes.
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the main change: skipping attestation when no TPM device is present.
Description check ✅ Passed The description accurately describes the TPM gate, the attestation behavior change, and the related testing.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands.

Gate the attestation block on actual TPM presence so a host with no TPM skips AK/EK setup and sends discovery without AttestKeyInfo, which the API accepts when attestation_enabled is false.

Signed-off-by: s3rj1k <evasive.gyron@gmail.com>
@s3rj1k s3rj1k force-pushed the tpm_attestation branch from 0eb4c79 to 09b1d85 Compare June 25, 2026 21:34

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
crates/scout/src/tpm.rs (1)

195-224: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Express these tables through carbide-test-support rather than hand-rolled loops.

Both additions are textbook total-operation tables, yet they reimplement iteration and labeling manually. The house style mandates the shared helpers: tpm_device_candidates_cases is a natural value_scenarios!/check_values table, and tpm_present_probes_explicit_device_path collapses into a check_values table over (input, expected_bool) instead of four discrete assert!s. This keeps failures labeled by case and removes bespoke harness code.

♻️ Illustrative shape for the presence probe
#[test]
fn tpm_present_probes_explicit_device_path() {
    check_values(tpm_present, &[
        ("device:/dev/null", true),
        ("/dev/null", true),
        ("device:/dev/forge_scout_nonexistent_tpm", false),
        ("/dev/forge_scout_nonexistent_tpm", false),
    ]);
}

As per coding guidelines: "Prefer table-driven tests using the carbide-test-support crate with scenarios! ... and value_scenarios! ... Use check_cases / check_values directly when a macro would obscure a table."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/scout/src/tpm.rs` around lines 195 - 224, The two TPM tests are
hand-rolled table-driven checks and should use the shared carbide-test-support
helpers instead of manual loops and asserts. Replace tpm_device_candidates_cases
with a value_scenarios! or equivalent check_values-style table over
tpm_device_candidates, and collapse tpm_present_probes_explicit_device_path into
a check_values table for tpm_present so each case is labeled consistently. Keep
the existing case coverage, but move the iteration and assertion logic into the
shared helpers to match the project’s test style.

Sources: Coding guidelines, Path instructions

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@crates/scout/src/tpm.rs`:
- Around line 195-224: The two TPM tests are hand-rolled table-driven checks and
should use the shared carbide-test-support helpers instead of manual loops and
asserts. Replace tpm_device_candidates_cases with a value_scenarios! or
equivalent check_values-style table over tpm_device_candidates, and collapse
tpm_present_probes_explicit_device_path into a check_values table for
tpm_present so each case is labeled consistently. Keep the existing case
coverage, but move the iteration and assertion logic into the shared helpers to
match the project’s test style.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 3ea14b92-1ec2-489b-915c-1d29f05ec3ca

📥 Commits

Reviewing files that changed from the base of the PR and between 47e304c and 09b1d85.

📒 Files selected for processing (2)
  • crates/scout/src/register.rs
  • crates/scout/src/tpm.rs

@prbinu-nvidia

Copy link
Copy Markdown
Contributor

NICo's current design expects TPM on the host to operate. Seeking input from @ajf on feasibility of removing host-TPM dependency ( i think this would trigger more work on api-server side).

@s3rj1k

s3rj1k commented Jun 25, 2026

Copy link
Copy Markdown
Contributor Author

i think this would trigger more work on api-server side

works fine if you disable attestation in config, this is exactly a case I see in env I have where TPM is not working correctly (HW thing) but with disabled attestation I can get machines to enroll into READY state

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants