fix(pxe): scout-loader honors static_pxe_url override instead of hardcoding carbide-static-pxe.forge #2944

Open
kirson-git wants to merge 1 commit into NVIDIA:main from kirson-git:fix/scout-loader-honor-static-pxe-url

@kirson-git (Contributor)

The bug

The scout-loader hardcodes the rootfs (scout.squashfs) download host to carbide-static-pxe.forge and ignores the configurable static-pxe URL that the rest of the PXE flow already honors:

  • crates/api-core/src/cfg/file.rs: external_static_pxe_url: Option<String>
  • crates/api-core/src/handlers/pxe.rs: resolves static_pxe_url_override
  • crates/pxe/src/routes/ipxe.rs: templates static_pxe_url into the iPXE wrapper
  • pxe/templates/pxe: set base-url {{ static_pxe_url }}/public/blobs/

The scout-loader is the most critical early-boot fetch, yet it was the one place that did not respect the override. Before this change, pxe/common_files/scout-loader-rclocal did:

newrootfsurl="http://carbide-static-pxe.forge/public/blobs/internal/${arch}/scout.squashfs"

Verified repro (dpu-mode / unresolvable DNS)

On a BlueField dpu-mode host, when the DPU's local resolver (dnsmasq/HBN) is down or the site uses IP-only / custom DNS, the scout-loader fails with:

LOADER: Failed to gather root filesystem information:
curl: (6) Could not resolve host: carbide-static-pxe.forge

so the host never finishes discovery — even though the static-pxe server is reachable by IP and a static_pxe_url override is configured.

The fix

  1. crates/api-core/src/ipxe.rs — append static_pxe_url=${base-url} to the scout.efi kernel command line (x86_64 and aarch64 host). ${base-url} is already set by the wrapping pxe template to the resolved {{ static_pxe_url }}/public/blobs/ (which honors external_static_pxe_url / static_pxe_url_override), and iPXE expands it before booting, so the scout kernel receives the resolved static-pxe base. This reuses the value the override system already computes; no new plumbing.

  2. pxe/common_files/scout-loader-rclocal — parse static_pxe_url= from /proc/cmdline and build the rootfs URL from it; fall back to the existing http://carbide-static-pxe.forge host when it is unset. This mirrors the existing idiom in pxe/common_files/check-scout-updates.sh, which already derives its base from a kernel cmdline arg with the same hostname fallback.

Resolution order in the loader:

  1. newrootfs=<full-url> on the cmdline (explicit override) — unchanged, still highest priority.
  2. static_pxe_url=<base> on the cmdline (new) — the configured static-pxe URL.
  3. Hardcoded carbide-static-pxe.forge (fallback).
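
The resolution order above can be sketched in POSIX shell roughly as follows. This is an illustration under stated assumptions, not the code in pxe/common_files/scout-loader-rclocal: the function names `cmdline_value` and `resolve_rootfs_url` are hypothetical, the kernel command line is passed in as a string rather than read from /proc/cmdline so the logic can be exercised off-target, and the `192.0.2.10` address is a documentation placeholder. It assumes, as described above, that `static_pxe_url` already ends in `/public/blobs/`, so only `internal/<arch>/scout.squashfs` is appended after trailing slashes are trimmed.

```shell
#!/bin/sh
# Hypothetical sketch of the loader's rootfs URL resolution order.
# Names and structure are assumptions, not the PR's implementation.

# Print the value of "<key>=<value>" found in a kernel command line string.
# Runs in a subshell so that "set -f" (no globbing while word-splitting)
# does not leak into the caller.
cmdline_value() (
    set -f
    for arg in $2; do
        case "$arg" in
            "$1"=*) printf '%s\n' "${arg#"$1"=}"; exit 0 ;;
        esac
    done
    exit 1
)

# resolve_rootfs_url <cmdline> <arch>
resolve_rootfs_url() {
    cmdline="$1"
    arch="$2"

    # 1. Explicit full-URL override: highest priority.
    if url=$(cmdline_value newrootfs "$cmdline") && [ -n "$url" ]; then
        printf '%s\n' "$url"
        return 0
    fi

    # 2. Configured static-pxe base (assumed to end in /public/blobs/).
    if base=$(cmdline_value static_pxe_url "$cmdline") && [ -n "$base" ]; then
        # Trim any trailing slashes so the join below yields a single "/".
        while [ "${base%/}" != "$base" ]; do
            base="${base%/}"
        done
        printf '%s/internal/%s/scout.squashfs\n' "$base" "$arch"
        return 0
    fi

    # 3. Fallback: the original hardcoded hostname.
    printf 'http://carbide-static-pxe.forge/public/blobs/internal/%s/scout.squashfs\n' "$arch"
}

# Example with a placeholder address:
resolve_rootfs_url "mac=aa static_pxe_url=http://192.0.2.10/public/blobs/" x86_64
# prints http://192.0.2.10/public/blobs/internal/x86_64/scout.squashfs
```

On a real host the call would presumably look like `resolve_rootfs_url "$(cat /proc/cmdline)" "$arch"`. An empty `static_pxe_url=` value is treated the same as an absent one, so the loader still reaches the hostname fallback.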

Note: in current main the per-arch etc/rc.local files are generated at build time by the stage-scout-loader-rclocal Make task from the single pxe/common_files/scout-loader-rclocal source, so editing the one source file covers both profiles.

Backward compatibility

When no override is configured the loader falls back to the original http://carbide-static-pxe.forge/public/blobs/internal/${arch}/scout.squashfs URL, so behavior is unchanged for existing deployments.

🤖 Generated with Claude Code

…coding carbide-static-pxe.forge

The scout-loader hardcoded the rootfs (scout.squashfs) download host to
`carbide-static-pxe.forge` and ignored the configurable static-pxe URL that the
rest of the PXE flow already honors. The loader is the most critical early-boot
fetch, yet it was the one place that did not respect the override.

Repro: on a BlueField dpu-mode host, when the DPU's local resolver
(dnsmasq/HBN) is down or the site uses IP-only/custom DNS, the scout-loader
fails with:

    LOADER: Failed to gather root filesystem information:
    curl: (6) Could not resolve host: carbide-static-pxe.forge

so the host never finishes discovery -- even though the static-pxe server is
reachable by IP and a `static_pxe_url` override is configured.

Fix:
- crates/api-core/src/ipxe.rs: append `static_pxe_url=${base-url}` to the
  scout.efi kernel command line (x86_64 and aarch64 host). `${base-url}` is
  already set by the wrapping `pxe` template to the resolved
  `{{ static_pxe_url }}/public/blobs/` (which honors
  external_static_pxe_url / static_pxe_url_override), and iPXE expands it
  before booting, so the scout kernel receives the resolved static-pxe base.
- pxe/common_files/scout-loader-rclocal: parse `static_pxe_url=` from
  /proc/cmdline and build the rootfs URL from it; fall back to the existing
  `http://carbide-static-pxe.forge` host when it is unset. This mirrors the
  existing idiom in check-scout-updates.sh, which already derives its base from
  a kernel cmdline arg with the same hostname fallback.

Backward-compatible: when no override is configured the loader falls back to
the original `carbide-static-pxe.forge` URL, so behavior is unchanged for
existing deployments. The explicit `newrootfs=<full-url>` override continues to
take precedence.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

kirson-git requested a review from a team as a code owner on June 27, 2026, 12:31

copy-pr-bot Bot commented Jun 27, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

coderabbitai Bot commented Jun 27, 2026

Summary by CodeRabbit

  • New Features
    • Added support for an additional boot-time URL parameter to help load the correct root filesystem location.
    • Improved loader behavior to use a provided static PXE base URL when available, with support for existing manual overrides.
  • Bug Fixes
    • Updated boot command generation so the loader receives the new URL setting consistently across architectures.
    • Reduced fallback reliance by using the configured PXE URL before defaulting to the previous hostname.

Walkthrough

The PXE boot command line now includes static_pxe_url=${base-url} for ARM and X86. The loader reads static_pxe_url= from the kernel cmdline, trims trailing slashes, constructs the internal/<arch>/scout.squashfs URL, and falls back to the previous hostname when absent.

Changes

PXE boot URL propagation

| Layer | File(s) | Summary |
| --- | --- | --- |
| iPXE command line update | crates/api-core/src/ipxe.rs | ARM and X86 instruction generators append static_pxe_url=${base-url} to the generated boot parameters, with updated inline comments. |
| scout-loader URL selection | pxe/common_files/scout-loader-rclocal | Rootfs URL selection checks newrootfs=, then static_pxe_url= with trailing-slash trimming, then the default carbide-static-pxe.forge hostname. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly summarizes the main PXE scout-loader fix and matches the implemented static_pxe_url override behavior. |
| Description check | ✅ Passed | The description is directly related to the changeset and accurately explains the bug, fix, and fallback behavior. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

coderabbitai Bot left a comment

🧹 Nitpick comments (1)
crates/api-core/src/ipxe.rs (1)

210-213: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Confirm the generated command_line is exercised by tests. The current iPXE tests cover boot-image selection, but not the serialized command line. Add a table-driven case for the ARM host and X86 paths that asserts static_pxe_url=${base-url}, plus a negative check that the DPU carbide.efi branch still omits it.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/api-core/src/ipxe.rs` around lines 210 - 213, The iPXE command-line
serialization in the iPXE path is not covered by tests, so add assertions around
the generated command_line for the relevant branches in ipxe.rs. Extend the
existing table-driven tests to verify both ARM host and X86 host cases include
static_pxe_url=${base-url}, and add a negative assertion for the DPU carbide.efi
path to confirm it still does not include that field. Use the existing iPXE
command-line construction and branch-selection symbols to place the new checks
alongside the current boot-image tests.

Source: Path instructions

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 9d57241b-0e07-47cd-8430-a1dbf15fa36c

📥 Commits

Reviewing files that changed from the base of the PR and between 125a7d0 and 41f679b.

📒 Files selected for processing (2)
  • crates/api-core/src/ipxe.rs
  • pxe/common_files/scout-loader-rclocal

// static_pxe_url passes the (already iPXE-expanded) static-pxe
// base to the scout-loader so it fetches scout.squashfs from the
// configured static-pxe server instead of hardcoding a hostname.
command_line: format!("mac={mac_address} console=tty0 console={console},115200 pci=realloc=off iommu=off cli_cmd=auto-detect machine_id={machine_interface_id} server_uri=[api_url] pxe_uri=[pxe_url] static_pxe_url=${{base-url}}"),

I don't think it's correct to pass the base URL here. Isn't this a template, i.e., shouldn't there be a static PXE URL variable in scope?
