Skip to content

[sonic-installer] Add dedicated SONiC-BMC U-Boot bootloader#4602

Open
william8545 wants to merge 6 commits into
sonic-net:masterfrom
william8545:shadow-pr/pr-115
Open

[sonic-installer] Add dedicated SONiC-BMC U-Boot bootloader#4602
william8545 wants to merge 6 commits into
sonic-net:masterfrom
william8545:shadow-pr/pr-115

Conversation

@william8545

@william8545 william8545 commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

Address issue - #4548

What I did

Added a dedicated BmcUbootBootloader so sonic-installer manages SONiC-BMC boot state correctly. SONiC-BMC platforms (ASPEED AST2700) run U-Boot with a two-slot environment, which the generic UbootBootloader does not model — on a BMC it previously fell through to that generic handler and operated on the wrong environment variables.

Specifically:

  • New sonic_installer/bootloader/bmc_uboot.py (BmcUbootBootloader, NAME = 'bmc-uboot'), selected via utilities_common.chassis.is_bmc().
  • Registered first in BOOTLOADERS so a stray /host/grub/grub.cfg cannot preempt it, and guarded UbootBootloader.detect() to exclude BMC.
  • Full two-slot lifecycle: list, set-default, set-next-boot, install, remove, cleanup, verify-next-image, set-fips/get-fips, binary-version, plus a BMC-specific verify_image_platform (fails closed).
  • Unit tests in tests/installer_bootloader_bmc_uboot_test.py (12 tests).

No change to sonic_installer/main.py — the work is confined to the bootloader plugin and its test.

How I did it

BmcUbootBootloader subclasses OnieInstallerBootloader and models the ASPEED AST2700 two-slot U-Boot environment:

Concept Slot 1 Slot 2
version var sonic_version_1 sonic_version_2
boot command run sonic_image_1 run sonic_image_2
kernel args linuxargs linuxargs_old

Empty slots are written/read as the marker None (also accepts NONE/empty, case-insensitively).

  • Boot selection (Aboot-style contract): set-default writes the persistent boot_next and clears the one-shot boot_once; set-next-boot writes only boot_once. get_next_image() resolves boot_once first (it wins, matching U-Boot bootcmd), then boot_next, and surfaces an empty/unrecognized selector as a raw string so a broken boot state stays visible rather than being masked.
  • Install: install_image() runs the image's own installer (bash <image>, which performs the slot rotation and sets boot_next) and then clears any stale boot_once that would otherwise shadow the new default.
  • Remove: remove_image() repoints boot_once/boot_next to the surviving slot before clearing the removed slot's version + auxiliary vars and deleting its rootfs; it refuses to remove the only populated slot.
  • FIPS: set-fips/get-fips rewrite the sonic_fips= token in the target slot's linuxargs/linuxargs_old only (per-slot isolation).
  • Platform check: verify_image_platform() mirrors the GRUB approach — extracts installer/platforms_asic from the image and matches the running device_info.get_platform() with grep -Fxq (whole-line). It fails closed: compatible only when grep matches, so a malformed/manifest-less payload is rejected rather than silently accepted.

How to verify it

On a SONiC-BMC (ASPEED AST2700) board, the dedicated bootloader is auto-selected and the full command surface was exercised end-to-end, cross-checking the U-Boot environment (fw_printenv) and, across real reboots, /proc/cmdline:

  • sonic-installer list / verify-next-image reflect the two-slot state.
  • set-default <img> sets boot_next and clears boot_once; set-next-boot <img> sets a one-shot boot_once that U-Boot consumes on the next reboot and then reverts to boot_next.
  • install <bmc.bin> rotates the new image into a slot and boots it after reboot.
  • remove / cleanup clear the slot's vars, repoint the boot selectors, and delete the rootfs.
  • set-fips/get-fips toggle sonic_fips= in the correct per-slot args.
  • A wrong-platform (switch) image is rejected by verify_image_platform; --skip-platform-check overrides it.

Previous command output (if the output of a command-line utility has changed)

New command output (if the output of a command-line utility has changed)

SONiC-BMC runs U-Boot with a fixed two-slot ASPEED environment that the
shared UbootBootloader mishandles. Add BmcUbootBootloader: Aboot-style
boot contract (set-default wins, boot_once one-shot), exact slot match,
per-slot FIPS, and platforms_asic verification. Detect via is_bmc() and
guard UbootBootloader.detect() to exclude BMC.

Signed-off-by: William Tsai <willtsai@nvidia.com>
…te set-fips echo

Make BmcUbootBootloader the first entry in BOOTLOADERS so a BMC is never
preempted by an accidental/stale /host/grub/grub.cfg. This is safe because
BmcUbootBootloader.detect() is is_bmc(), which is False on every non-BMC
platform, so probing it first cannot mis-select on non-BMC systems.

Drop the duplicate 'Done' echo in set_fips(); the CLI already prints the
user-facing success message.

Add a regression test asserting BMC wins when both BMC and GRUB detectors
return true.

Signed-off-by: William Tsai <willtsai@nvidia.com>
…m; add tests

verify_image_platform() returned True whenever the tar extraction of
installer/platforms_asic failed, so a malformed/non-tar payload or one
missing the manifest silently passed platform validation. Key the result
on grep (p3) instead: compatible only when the running platform is
explicitly listed. This fails closed on malformed/missing manifests while
still accepting an early grep match where tar is then SIGPIPE'd (which
would wrongly fail if keyed on tar's return code).

Add unit tests for verify_image_platform (match / mismatch / fail-closed /
SIGPIPE-match / non-file), fw_printenv read-failure handling,
get_installed_images empty-slot skipping, and the only-populated-slot
remove guard.

Signed-off-by: William Tsai <willtsai@nvidia.com>
…mment

Signed-off-by: William Tsai <willtsai@nvidia.com>
…-boot

Signed-off-by: William Tsai <willtsai@nvidia.com>
@mssonicbld

Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines

Copy link
Copy Markdown
Azure Pipelines successfully started running 1 pipeline(s).

…fy flake8 E501

The single-line 'return ... or ... or ...' in get_next_image() was 128
characters, exceeding the 120-char limit enforced by the pre-commit
flake8 check (Pretest Static Analysis). Wrap it across lines; behavior
is unchanged.

Signed-off-by: William Tsai <willtsai@nvidia.com>
@mssonicbld

Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines

Copy link
Copy Markdown
Azure Pipelines successfully started running 1 pipeline(s).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants