NIC-mode (zero-DPU) hosts should sail through boot-order setup even when the BlueField NIC blips off Redfish across a reboot. Today SetBootOrder leans on HttpDev1 (the UEFI HTTP boot device) staying enabled from the BIOS-setup phase -- but on a Dell NIC-mode host a reboot can de-enumerate the NIC and revert HttpDev1 to the onboard default, and SetBootOrder never re-asserts it. So every reorder attempt misses the now-gone "HTTP Device 1" boot option, is_boot_order_setup never passes, and the host exhausts its retries and stays stuck.
What this involves
- In set_host_boot_order (crates/machine-controller/src/handler.rs, the SetBootOrderState::SetBootOrder arm), re-run machine_setup right before set_boot_order_dpu_first. That re-asserts HttpDev1=Enabled + the boot NIC by id; the existing RebootHost force-restart applies it in the same reboot as the reorder -- no extra reboot.
- set_boot_order_dpu_first only reorders existing boot options; it never re-enables HttpDev1. Re-asserting it is the same move the BIOS-setup phase already makes on recovery (HandleBiosJobFailure -> RetryPlatformConfiguration re-runs machine_setup) -- SetBootOrder just never got the same treatment.
- Idempotent on a healthy host (HttpDev1 already enabled), and the returned BIOS job id is discarded -- zero-DPU hosts swallow it as NoDpu, and the boot-order job is what CheckBootOrder already tracks.
- Tests: teach the RedfishSim harness to model the de-enum (a reorder only sticks while HttpDev1 is enabled; machine_setup re-enables it), then cover the retry re-assert and the unchanged healthy path.
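
A rough sketch of the idea, not the real handler or harness code. Everything here is a stand-in: the RedfishSim struct shape, the reboot_with_nic_deenum helper, the reassert_bios flag, the "Hard Drive" option and the placeholder job id are all made up for illustration, and the real machine_setup / set_boot_order_dpu_first signatures will differ. The sim also applies the BIOS change immediately, whereas a real BMC applies it on the next reboot.

```rust
/// Boot option that only exists while HttpDev1 is enabled.
const HTTP_BOOT_OPTION: &str = "HTTP Device 1";

/// Hypothetical stand-in for the RedfishSim harness (not the real type).
#[derive(Debug)]
pub struct RedfishSim {
    http_dev1_enabled: bool,
    boot_order: Vec<String>,
}

impl RedfishSim {
    /// Healthy host: HttpDev1 enabled, HTTP boot option present but not first.
    pub fn new() -> Self {
        Self {
            http_dev1_enabled: true,
            boot_order: vec!["Hard Drive".to_string(), HTTP_BOOT_OPTION.to_string()],
        }
    }

    /// Models a reboot where the NIC de-enumerates: HttpDev1 reverts to the
    /// onboard default and any earlier reorder is lost.
    pub fn reboot_with_nic_deenum(&mut self) {
        self.http_dev1_enabled = false;
        self.boot_order = vec!["Hard Drive".to_string(), HTTP_BOOT_OPTION.to_string()];
    }

    /// Re-asserts HttpDev1=Enabled and returns a (placeholder) BIOS job id.
    pub fn machine_setup(&mut self) -> String {
        self.http_dev1_enabled = true;
        "JID_PLACEHOLDER".to_string()
    }

    /// Only reorders existing options; the reorder sticks only while
    /// HttpDev1 is enabled. It never re-enables HttpDev1 itself.
    pub fn set_boot_order_dpu_first(&mut self) {
        if !self.http_dev1_enabled {
            return;
        }
        if let Some(pos) = self.boot_order.iter().position(|o| o == HTTP_BOOT_OPTION) {
            let opt = self.boot_order.remove(pos);
            self.boot_order.insert(0, opt);
        }
    }

    /// True once HttpDev1 is enabled and the HTTP boot option is first.
    pub fn is_boot_order_setup(&self) -> bool {
        self.http_dev1_enabled
            && self.boot_order.first().map(String::as_str) == Some(HTTP_BOOT_OPTION)
    }
}

/// Sketch of the SetBootOrder step. `reassert_bios` exists only so the old
/// and proposed behavior can be compared side by side.
pub fn set_host_boot_order(sim: &mut RedfishSim, reassert_bios: bool) {
    if reassert_bios {
        // The BIOS job id is intentionally discarded; the boot-order job is
        // the one that gets tracked afterwards.
        let _bios_job_id = sim.machine_setup();
    }
    sim.set_boot_order_dpu_first();
}
```

With this model, calling set_host_boot_order(&mut sim, false) after reboot_with_nic_deenum leaves is_boot_order_setup false no matter how often it is retried, while passing true lets the same retry succeed. On a fresh RedfishSim::new() both variants end in the same state, which is the "unchanged healthy path" case.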
Related follow-up: the same NIC de-enum can revert HttpDev1 right before lockdown is enabled, which locks in the bad config and stalls validation -- worth a small verify-before-lock guard, tracked separately.
Part of #870.