U-Boot sonic-installer behavior requiring fixes #4548

@william8545

Description

This report analyzes the U-Boot bootloader behavior in sonic_installer/bootloader/uboot.py and related SONiC-BMC U-Boot scripts, as verified on the installed BMC image. All reproduction steps are based on SONiC-BMC (ASPEED AST2700), but the behavior should be similar across U-Boot devices.

Severity summary

Severity  Issue
Critical  Sparse slot state can make set-default, set-next-boot, and remove select the wrong U-Boot slot
Critical  boot_once is ignored by get_next_image(), so list, cleanup, verify-next-image, and reboot scripts can operate on the wrong image
Critical  Image names that are substrings of other image names can select the wrong slot
High      Stale boot_once can shadow set-default or a newly installed default image
High      Removing an image does not clear a boot_once that still points to the removed slot
High      cleanup can remove the user's pinned one-time boot image
High      Platform validation accepts any existing file on U-Boot platforms
High      FIPS commands ignore the requested image and modify/report the wrong slot
High      Image install can report success even if U-Boot env programming failed
High      soft-reboot can use the next image's kernel with the current image's bootargs on U-Boot
Medium    Broken or custom U-Boot selectors are hidden by get_next_image() fallback behavior
Medium    Removing an image leaves stale slot-specific boot variables behind
Medium    set-fips can corrupt linuxargs when the existing sonic_fips value is not a single trailing token
Medium    Current-image detection can crash on U-Boot platforms whose bootargs do not use loop=<image>/fs.squashfs
Low       get-fips uses a substring check and can report multi-character values beginning with 1 as enabled
Low       Empty slot marker is inconsistent (None vs NONE)
Low       U-Boot detection treats any non-GRUB/non-Aboot ARM system as U-Boot

Issue 1: Sparse slot state selects the wrong U-Boot slot

Severity: Critical

Description

Slot 1 is empty, slot 2 contains the only installed image:

sonic_version_1=NONE
sonic_version_2=SONiC-OS-A

This is reachable from a normal clean SONiC-BMC install. A clean ONIE/TFTP install starts as slot 1 populated and slot 2 empty. During a later upgrade, the ASPEED installer stores the newly installed image in slot 1 and moves the currently running image into slot 2. If the new slot 1 image is then removed before it is booted, slot 1 becomes empty while the current image remains in slot 2.

Steps to reproduce the issue

Start from clean image SONiC-OS-A:

fw_printenv -n sonic_version_1
fw_printenv -n sonic_version_2

Expected initial state:

sonic_version_1=SONiC-OS-A
sonic_version_2=None

Install a second image SONiC-OS-B, but do not reboot into it:

sonic-installer install -y /path/to/sonic-bmc-B.bin
fw_printenv -n sonic_version_1
fw_printenv -n sonic_version_2

Expected state after install:

sonic_version_1=SONiC-OS-B
sonic_version_2=SONiC-OS-A

Remove the new image while the running image is still SONiC-OS-A:

sonic-installer remove -y SONiC-OS-B
fw_printenv -n sonic_version_1
fw_printenv -n sonic_version_2

Expected reachable sparse state:

sonic_version_1=NONE
sonic_version_2=SONiC-OS-A

Now set the surviving slot 2 image as default:

sonic-installer set-default SONiC-OS-A
fw_printenv -n boot_next

Describe the results you expected

boot_next should point to slot 2:

boot_next=run sonic_image_2

Describe the results you received

The installed implementation writes slot 1:

boot_next=run sonic_image_1

Additional information

get_installed_images() in sonic_installer/bootloader/uboot.py reads sonic_version_1 and sonic_version_2, but filters out empty/non-SONiC slots into a compact list. With slot 1 empty and slot 2 populated, it returns:

["SONiC-OS-A"]

Then set_default_image() assumes images[0] is slot 1:

if image in images[0]:
    fw_setenv boot_next "run sonic_image_1"

The list index no longer matches the U-Boot slot number. This can point the default boot target at an empty slot. The same slot-index bug affects:

  • set_next_image(), which can write the wrong boot_once.
  • remove_image(), which can clear the wrong sonic_version_N and point boot_next at the wrong slot.
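
One way to avoid the index drift is to keep the slot number attached to each version string instead of compacting the list. The following is a minimal sketch, assuming the two sonic_version_N values have already been read from the environment; find_slot and the IMAGE_PREFIX value are illustrative names, not code taken from uboot.py.

```python
IMAGE_PREFIX = "SONiC-OS-"  # assumed to match the prefix used by sonic_installer


def find_slot(slots, image):
    """Return the U-Boot slot number holding `image`, or None.

    `slots` maps slot number -> raw sonic_version_N value, so an empty
    slot 1 cannot shift slot 2 into list position 0.
    """
    for slot, version in sorted(slots.items()):
        if version.startswith(IMAGE_PREFIX) and version == image:
            return slot
    return None


# Sparse state from this issue: slot 1 empty, slot 2 populated.
sparse = {1: "NONE", 2: "SONiC-OS-A"}
slot = find_slot(sparse, "SONiC-OS-A")
boot_next = f"run sonic_image_{slot}" if slot else None
print(boot_next)
```

With the sparse state, the lookup yields slot 2, so the selector written to boot_next would reference sonic_image_2 rather than the empty slot.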

Additional information: Impact

The next reboot can attempt run sonic_image_1 even though the only valid image is in slot 2. On SONiC-BMC, slot-specific boot variables such as fit_name, fit_name_old, linuxargs, and linuxargs_old control the actual FIT path and kernel command line, so selecting the wrong slot can boot the wrong image or fail to boot.

Issue 2: Similar image names can boot the wrong image

Severity: Critical

Description

Two installed image names have a prefix/substring relationship:

sonic_version_1=SONiC-OS-A-new
sonic_version_2=SONiC-OS-A

This is reachable with normal upgrade sequencing if the first image version string is a prefix of the second image version string. For example, clean-install SONiC-OS-A, then install SONiC-OS-A-new. The ASPEED upgrade flow stores the new image in slot 1 and the previous image in slot 2. The user then wants to boot SONiC-OS-A from slot 2.

Steps to reproduce the issue

Start with:

sonic_version_1=SONiC-OS-A-new
sonic_version_2=SONiC-OS-A

Then run:

sonic-installer set-next-boot SONiC-OS-A
fw_printenv -n boot_once

Describe the results you expected

boot_once should point to slot 2:

boot_once=run sonic_image_2

Describe the results you received

The installed implementation writes slot 1:

boot_once=run sonic_image_1

Additional information

The implementation uses substring checks:

if image in images[0]:

That is not an exact image-name comparison. If the requested image name is contained inside the other slot's image name, the implementation selects the first matching slot, not the exact slot.
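
The difference between the two comparisons can be shown in isolation. This is a small illustration using the slot contents from this issue; the variable names are made up for the example.

```python
images = ["SONiC-OS-A-new", "SONiC-OS-A"]  # slot 1, slot 2
requested = "SONiC-OS-A"

# Substring test, as in the snippet above: slot 1 matches first.
substring_hit = next(i + 1 for i, name in enumerate(images) if requested in name)

# Exact comparison: only slot 2 matches.
exact_hit = next(i + 1 for i, name in enumerate(images) if requested == name)

print(substring_hit, exact_hit)
```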

Additional information: Impact

The command succeeds, but the next reboot boots a different image than the one the user requested. This is worse than a validation failure because the CLI gives no warning and writes a valid-looking U-Boot variable.

Issue 3: boot_once is ignored, so the reported "Next" image can be wrong

Severity: Critical

Description

Default boot is slot 1, but a one-time boot is scheduled for slot 2:

sonic_version_1=SONiC-OS-B
sonic_version_2=SONiC-OS-A
boot_next=run sonic_image_1
boot_once=run sonic_image_2

This is reachable from clean install by installing a second image and scheduling the currently running image for one-time boot. No reboot is required to reach the failing state: the installer stores the new image in slot 1 and preserves the running image in slot 2.

sonic-installer install -y /path/to/sonic-bmc-B.bin
sonic-installer set-next-boot SONiC-OS-A
fw_printenv -n boot_next
fw_printenv -n boot_once

Steps to reproduce the issue

After the state above is reached, run:

sonic-installer list
sonic-installer verify-next-image

Describe the results you expected

Because ASPEED bootcmd executes boot_once before boot_next, the next boot target is SONiC-OS-A.

sonic-installer list should show:

Next: SONiC-OS-A

Describe the results you received

The installed implementation reports slot 1:

sonic-installer list: Next: SONiC-OS-B
fw_printenv -n boot_once: run sonic_image_2

Additional information

sonic_installer/bootloader/uboot.py:get_next_image() reads only boot_next:

fw_printenv -n boot_next

It never reads boot_once, even though bootcmd executes boot_once first. Therefore the CLI reports the persistent default image, not the actual next boot image.
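
A resolution order that mirrors bootcmd could look like the sketch below. It assumes the selectors have the form `run sonic_image_N` shown in this report and that the environment is already available as a dict; next_slot and SELECTOR_RE are hypothetical names. A non-standard boot_once simply falls through here, which is the separate concern described in Issue 11.

```python
import re

# Selector shape used throughout this report; only slots 1 and 2 exist.
SELECTOR_RE = re.compile(r"^run sonic_image_([12])$")


def next_slot(env):
    """Return the slot U-Boot would try first: boot_once, then boot_next."""
    for var in ("boot_once", "boot_next"):
        match = SELECTOR_RE.match(env.get(var, "").strip())
        if match:
            return int(match.group(1))
    return None


env = {
    "sonic_version_1": "SONiC-OS-B",
    "sonic_version_2": "SONiC-OS-A",
    "boot_next": "run sonic_image_1",
    "boot_once": "run sonic_image_2",
}
slot = next_slot(env)
print(env[f"sonic_version_{slot}"])
```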

Additional information: Impact

This breaks more than sonic-installer list:

  • sonic-installer verify-next-image uses Bootloader.verify_next_image(), which calls get_next_image().
  • sonic-installer cleanup keeps current and next, and removes other images. If boot_once points to slot 2 but get_next_image() reports slot 1, cleanup can remove the actual one-time boot target.
  • scripts/reboot checks sonic-installer verify-next-image.
  • scripts/fast-reboot, warm-reboot, express-reboot, and soft-reboot derive target image paths from sonic-installer list.

The result is a CLI that can say the next image is safe while U-Boot is about to boot a different image.

Issue 4: cleanup can remove the user's pinned one-time boot image

Severity: High

Description

The user pins a non-current image with set-next-boot, then runs cleanup -y to remove images that are neither current nor next. On ASPEED U-Boot, the pinned one-time target is the actual next boot target because bootcmd consumes boot_once before boot_next.

This is reachable from clean install:

  1. Start with clean image SONiC-OS-A.
  2. Install SONiC-OS-B.
  3. Keep SONiC-OS-A as the default/current image.
  4. Pin SONiC-OS-B for one-time boot.
  5. Run cleanup.

Steps to reproduce the issue

sonic-installer install -y /path/to/sonic-bmc-B.bin

PINNED_IMAGE=$(fw_printenv -n sonic_version_1)
CURRENT_IMAGE=$(fw_printenv -n sonic_version_2)

sonic-installer set-default "$CURRENT_IMAGE"
sonic-installer set-next-boot "$PINNED_IMAGE"

fw_printenv boot_once boot_next sonic_version_1 sonic_version_2
sonic-installer list
sonic-installer cleanup -y
fw_printenv boot_once boot_next sonic_version_1 sonic_version_2

Do not perform the final reboot during reproduction; the environment after cleanup is enough to prove the failure.

Describe the results you expected

cleanup should keep the image referenced by boot_once, because that is the image U-Boot will try first on the next reboot:

sonic_version_1=SONiC-OS-B
boot_once=run sonic_image_1

Describe the results you received

cleanup uses get_next_image() to decide which image to keep. Because get_next_image() ignores boot_once, it keeps only the current/default image and removes the one-time target:

sonic_version_1=NONE
boot_once=run sonic_image_1
boot_next=run sonic_image_2

Additional information

The cleanup() command in sonic_installer/main.py removes every image that is not equal to current or bootloader.get_next_image(). The U-Boot get_next_image() path reports boot_next, not the higher-priority boot_once. Then remove_image() marks the pinned slot empty but does not clear boot_once.
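
The keep-set logic can be sketched as follows, assuming the caller has already resolved boot_once and boot_next to image names. images_to_remove is a hypothetical helper, not the cleanup() implementation in main.py.

```python
def images_to_remove(installed, current, boot_once_image, boot_next_image):
    """Return installed images that are safe to remove.

    Keeps the running image, the persistent default, and the one-time
    boot target, so a pinned image survives cleanup.
    """
    keep = {current, boot_next_image, boot_once_image}
    return [image for image in installed if image not in keep]


installed = ["SONiC-OS-B", "SONiC-OS-A"]
# SONiC-OS-A is running and default; SONiC-OS-B is pinned via boot_once.
print(images_to_remove(installed, "SONiC-OS-A", "SONiC-OS-B", "SONiC-OS-A"))
# If the boot_once target is not known to the caller, SONiC-OS-B is removed.
print(images_to_remove(installed, "SONiC-OS-A", None, "SONiC-OS-A"))
```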

Additional information: Impact

The next cold reboot consumes boot_once=run sonic_image_1 and attempts to boot a slot that cleanup just removed. This turns a routine cleanup after set-next-boot into a possible boot failure.

Issue 5: set-fips can corrupt or duplicate the sonic_fips bootarg

Severity: Medium

Description

The existing linuxargs contains a sonic_fips token whose value has more than one character, or the token appears at the beginning of the command line:

linuxargs=console=ttyS12,115200n8 sonic_fips=10 loop=foo
linuxargs=sonic_fips=1 console=ttyS12,115200n8 loop=foo

The current production value is usually a single character, but the parser should not corrupt the kernel command line if a future path writes a multi-character value or moves the token to the front.

Steps to reproduce the issue

Variant A, multi-character value:

ORIG_LINUXARGS=$(fw_printenv -n linuxargs)
IMAGE=$(fw_printenv -n sonic_version_1)

fw_setenv linuxargs "console=ttyS12,115200n8 sonic_fips=10 loop=foo"
sonic-installer set-fips "$IMAGE" --enable-fips
fw_printenv -n linuxargs

fw_setenv linuxargs "$ORIG_LINUXARGS"

Variant B, token at the beginning:

ORIG_LINUXARGS=$(fw_printenv -n linuxargs)
IMAGE=$(fw_printenv -n sonic_version_1)

fw_setenv linuxargs "sonic_fips=1 console=ttyS12,115200n8 loop=foo"
sonic-installer set-fips "$IMAGE" --disable-fips
fw_printenv -n linuxargs
sonic-installer get-fips "$IMAGE"

fw_setenv linuxargs "$ORIG_LINUXARGS"

Describe the results you expected

The existing sonic_fips=<value> token should be removed as a whole token and replaced with exactly one new token.

Describe the results you received

For variant A, only sonic_fips=1 is removed, leaving the trailing 0 glued to the previous argument:

linuxargs=console=ttyS12,115200n80 loop=foo sonic_fips=1

For variant B, the leading token is not matched, so a duplicate is appended:

linuxargs=sonic_fips=1 console=ttyS12,115200n8 loop=foo sonic_fips=0

get-fips then reports enabled because the stale sonic_fips=1 token is still present.

Additional information

set_fips() uses this pattern:

re.sub(r' sonic_fips=[^\s]', '', cmdline)

[^\s] consumes exactly one non-whitespace character, not the entire token value, and the leading literal space means the token is ignored when it appears at the beginning of linuxargs.
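
The sketch below reproduces both variants and contrasts them with a token-based rewrite. It assumes the new value is appended after the substitution, which is inferred from the results shown above rather than quoted from set_fips(); current_sub and token_sub are illustrative names. Note that the token-based version also collapses repeated whitespace.

```python
import re


def current_sub(cmdline, enable):
    """Pattern quoted above, then an append of the new value (assumed)."""
    stripped = re.sub(r' sonic_fips=[^\s]', '', cmdline)
    return stripped + ' sonic_fips=' + ('1' if enable else '0')


def token_sub(cmdline, enable):
    """Drop every whole sonic_fips=<value> token, then append exactly one."""
    tokens = [t for t in cmdline.split() if not t.startswith('sonic_fips=')]
    tokens.append('sonic_fips=' + ('1' if enable else '0'))
    return ' '.join(tokens)


variant_a = "console=ttyS12,115200n8 sonic_fips=10 loop=foo"
variant_b = "sonic_fips=1 console=ttyS12,115200n8 loop=foo"

print(current_sub(variant_a, True))   # trailing 0 glued to the console token
print(token_sub(variant_a, True))
print(token_sub(variant_b, False))
```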

Additional information: Impact

The command can silently write a malformed kernel command line or leave contradictory FIPS flags. On serial-console BMC systems, corrupting the console token is especially risky because it can make boot-time debugging harder.

Issue 6: get-fips treats multi-character values beginning with 1 as enabled

Severity: Low

Description

linuxargs contains a sonic_fips value such as 10 or 12:

linuxargs=console=ttyS12,115200n8 sonic_fips=10 loop=foo

Steps to reproduce the issue

ORIG_LINUXARGS=$(fw_printenv -n linuxargs)
IMAGE=$(fw_printenv -n sonic_version_1)

fw_setenv linuxargs "console=ttyS12,115200n8 sonic_fips=10 loop=foo"
sonic-installer get-fips "$IMAGE"

fw_setenv linuxargs "$ORIG_LINUXARGS"

Describe the results you expected

Only an exact sonic_fips=1 token should be treated as enabled.

Describe the results you received

get-fips reports enabled because it tests for a substring:

'sonic_fips=1' in out

sonic_fips=10 and sonic_fips=12 both satisfy that substring check.
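
An exact-token check is a one-line change. A minimal sketch, with fips_enabled as a hypothetical helper name:

```python
def fips_enabled(cmdline):
    """True only when a whole token equals sonic_fips=1."""
    return "sonic_fips=1" in cmdline.split()


for args in ("console=ttyS12,115200n8 sonic_fips=10 loop=foo",
             "console=ttyS12,115200n8 sonic_fips=1 loop=foo"):
    # First column: substring check from above. Second: whole-token check.
    print("sonic_fips=1" in args, fips_enabled(args))
```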

Additional information: Impact

This is lower severity than Issue 5 because current values are normally 0 or 1, but it is still a correctness bug in the parser and makes future extensions of the FIPS value unsafe.

Issue 7: Stale boot_once can shadow set-default and image installation

Severity: High

Description

A previous set-next-boot left:

boot_once=run sonic_image_N

The user then tries to make a different slot the default, or installs a new image that should become the default.

Steps to reproduce the issue

Case A, set-default is shadowed. Start from clean image SONiC-OS-A, install a second image, but do not reboot into it:

sonic-installer install -y /path/to/sonic-bmc-B.bin
sonic-installer set-next-boot SONiC-OS-B
sonic-installer set-default SONiC-OS-A
fw_printenv -n boot_once
fw_printenv -n boot_next

Expected: set-default clears the one-time selector so the next reboot follows the new default and boots SONiC-OS-A. Actual: boot_once=run sonic_image_1 remains set, so U-Boot will consume it first and boot SONiC-OS-B once.

Case B, a newly installed image is shadowed. Start from the same two-image state before rebooting into SONiC-OS-B, then leave a one-time boot entry for the current image before installing a third image:

sonic-installer set-next-boot SONiC-OS-A
sonic-installer install -y /path/to/sonic-bmc-C.bin
fw_printenv -n boot_once
fw_printenv -n boot_next

Expected: the next reboot boots the newly installed image SONiC-OS-C. Actual: boot_once remains set and takes priority over the new boot_next; slot 2 is still the previous current image, so the reboot can boot that previous image instead of SONiC-OS-C.

Describe the results you expected

After set-default, the next boot should follow the selected default. After install, the first reboot should boot the newly installed image/default configured by the installer.

Describe the results you received

The installed implementation does not clear boot_once in set_default_image() or install_image(). After the commands above, boot_once remains non-empty:

boot_once=run sonic_image_N

Additional information

set_default_image() sets boot_next, but leaves boot_once untouched.

install_image() runs the image installer script, but also leaves boot_once untouched.

ASPEED install code in platform/aspeed/platform_arm64.conf programs boot_next and bootcmd as part of prepare_boot_menu, but it does not clear stale boot_once.

Because U-Boot evaluates boot_once before boot_next, a stale one-time boot selector shadows the newly selected default.
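
The shadowing in Case A can be modelled with a plain dict standing in for the U-Boot environment. This is a sketch under that assumption; first_selector and set_default are hypothetical helpers, and whether set-default should clear boot_once unconditionally is a design decision for the maintainers.

```python
def first_selector(env):
    """Selector bootcmd would run first: boot_once if non-empty, else boot_next."""
    return env.get("boot_once") or env.get("boot_next")


def set_default(env, slot, clear_boot_once):
    """Return a copy of env after set-default; optionally clear boot_once."""
    updated = dict(env, boot_next=f"run sonic_image_{slot}")
    if clear_boot_once:
        updated["boot_once"] = ""
    return updated


# Case A: SONiC-OS-B (slot 1) pinned once, then SONiC-OS-A (slot 2) made default.
env = {"boot_once": "run sonic_image_1", "boot_next": "run sonic_image_1"}
print(first_selector(set_default(env, 2, clear_boot_once=False)))
print(first_selector(set_default(env, 2, clear_boot_once=True)))
```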

Additional information: Impact

The next reboot can boot a stale one-time target instead of the image the user just made default or just installed. If the stale one-time target points to a removed or stale slot, this can become a boot failure.

Issue 8: Removing an image does not clear boot_once pointing to that image

Severity: High

Description

The running image is still SONiC-OS-A, a newly installed image SONiC-OS-B is in slot 1, and slot 1 is scheduled for one-time boot:

current image from /proc/cmdline: SONiC-OS-A
sonic_version_1=SONiC-OS-B
sonic_version_2=SONiC-OS-A
boot_once=run sonic_image_1

This is reachable from a clean install because installing SONiC-OS-B stores the new image in slot 1 while the currently running SONiC-OS-A is preserved in slot 2.

Steps to reproduce the issue

Start from clean image SONiC-OS-A, then install but do not reboot into SONiC-OS-B:

sonic-installer install -y /path/to/sonic-bmc-B.bin
sonic-installer set-next-boot SONiC-OS-B
fw_printenv -n boot_once

Expected intermediate state:

boot_once=run sonic_image_1

Now remove SONiC-OS-B, which is allowed because the running image is still SONiC-OS-A:

sonic-installer remove -y SONiC-OS-B
fw_printenv -n sonic_version_1
fw_printenv -n boot_once

Describe the results you expected

The remove operation should clear boot_once because it points to the removed slot:

boot_once=

Describe the results you received

The installed implementation leaves boot_once unchanged. The check after removal still shows:

sonic_version_1=NONE
boot_once=run sonic_image_1

No fw_setenv boot_once "" call is issued during the removal.

Additional information

remove_image() flips boot_next, marks one sonic_version_N as NONE, and deletes /host/image-*. It never checks whether boot_once still points at the slot being removed.
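
A removal step that also handles the one-time selector might compute its environment writes like this. The sketch assumes a two-slot layout and returns the writes as a dict instead of calling fw_setenv; removal_updates is a hypothetical name.

```python
def removal_updates(env, slot):
    """Env writes for removing `slot` in a two-slot layout.

    Mirrors the behaviour described above (flip boot_next, mark the slot
    NONE) and additionally clears boot_once when it targets the slot.
    """
    other = 2 if slot == 1 else 1
    updates = {
        f"sonic_version_{slot}": "NONE",
        "boot_next": f"run sonic_image_{other}",
    }
    if env.get("boot_once", "").strip() == f"run sonic_image_{slot}":
        updates["boot_once"] = ""
    return updates


env = {
    "sonic_version_1": "SONiC-OS-B",
    "sonic_version_2": "SONiC-OS-A",
    "boot_once": "run sonic_image_1",
}
print(removal_updates(env, 1))
```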

Additional information: Impact

On the next cold reboot, bootcmd still tries the removed slot first. Depending on U-Boot command behavior and platform scripts, it may fail before fallback, delay boot, or boot with stale slot-specific variables. Even if fallback eventually reaches boot_next, the environment is inconsistent and the user has no warning.

Issue 9: Platform validation is effectively disabled on U-Boot

Severity: High

Description

The user accidentally provides an image for a different platform/ASIC:

sonic-installer install /tmp/wrong-platform-sonic.bin

The user did not pass --skip-platform-check.

Steps to reproduce the issue

Use an image whose installer/platforms_asic does not include the running BMC platform. For example, use an x86_64 Mellanox/NVIDIA switch image on an ASPEED BMC:

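IMG=/tmp/switch-sonic.bin
BMC_PLATFORM=$(sed -n 's/^onie_platform=//p' /host/machine.conf)
echo "BMC platform: $BMC_PLATFORM"

if sed -e '1,/^exit_marker$/d' "$IMG" | tar xOf - installer/platforms_asic | grep -qx "$BMC_PLATFORM"; then
    echo "switch image contains BMC platform: yes"
else
    echo "switch image contains BMC platform: no"
fi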

python3 - <<'PY'
from sonic_installer.bootloader.uboot import UbootBootloader
img = "/tmp/switch-sonic.bin"
b = UbootBootloader()
print("verify_image_platform(switch image)=", b.verify_image_platform(img))
print("verify_image_platform(/etc/hostname)=", b.verify_image_platform("/etc/hostname"))
print("verify_image_platform(/nonexistent)=", b.verify_image_platform("/nonexistent"))
PY

Describe the results you expected

The installer/platforms_asic check should show that the BMC platform is absent, and verify_image_platform() should return False for the switch image, matching GRUB/other bootloaders.

Describe the results you received

The switch image's installer/platforms_asic does not contain the BMC platform, but the U-Boot platform check still accepts it because it accepts any existing regular file:

BMC platform: arm64-aspeed_nvidia_ast2700_bmc-r0
switch image contains BMC platform: no
verify_image_platform(switch image)= True
verify_image_platform(/etc/hostname)= True
verify_image_platform(/nonexistent)= False

In the install flow, that means the command proceeds past the default platform check instead of failing with the normal platform mismatch message.

Additional information

verify_image_platform() implements:

def verify_image_platform(self, image_path):
    return os.path.isfile(image_path)

The install command in sonic_installer/main.py relies on bootloader.verify_image_platform(image_path) to enforce the default platform check. Therefore the --skip-platform-check safety boundary is meaningless on U-Boot: the default behavior already skips real platform validation.
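
The core of a real check is a whole-line match of the running platform against the image's installer/platforms_asic list. The sketch below covers only that comparison and assumes the list has already been extracted from the image as text; platform_listed and the sample switch platform string are hypothetical.

```python
def platform_listed(platforms_asic_text, platform):
    """True when `platform` appears as a whole line (surrounding whitespace ignored)."""
    return platform in (line.strip() for line in platforms_asic_text.splitlines())


# Hypothetical contents of installer/platforms_asic from a switch image.
switch_list = "x86_64-example_switch-r0\nx86_64-example_switch2-r0\n"
bmc_platform = "arm64-aspeed_nvidia_ast2700_bmc-r0"

print(platform_listed(switch_list, bmc_platform))
print(platform_listed(switch_list + bmc_platform + "\n", bmc_platform))
```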

Additional information: Impact

SONiC-BMC can attempt to install an image that is not intended for the BMC platform. That can leave the device with an unbootable or unsupported image even though the user did not request a forced platform bypass.

Issue 10: FIPS commands modify/report the wrong image

Severity: High

Description

Slot 1 and slot 2 have separate kernel args:

sonic_version_1=SONiC-OS-B
sonic_version_2=SONiC-OS-A
linuxargs=... loop=image-B/fs.squashfs sonic_fips=0
linuxargs_old=... loop=image-A/fs.squashfs sonic_fips=1

This is reachable from clean install by installing a second image. Start with SONiC-OS-A, install SONiC-OS-B, and before reboot the running image SONiC-OS-A is preserved in slot 2. The user asks for FIPS status or changes FIPS for that slot 2 image:

sonic-installer get-fips SONiC-OS-A
sonic-installer set-fips SONiC-OS-A --disable-fips

Steps to reproduce the issue

From clean image SONiC-OS-A:

sonic-installer install -y /path/to/sonic-bmc-B.bin
fw_printenv -n sonic_version_1
fw_printenv -n sonic_version_2
fw_printenv -n linuxargs
fw_printenv -n linuxargs_old
sonic-installer set-fips SONiC-OS-A --disable-fips
fw_printenv -n linuxargs
fw_printenv -n linuxargs_old

The expected observable difference is that slot 2's linuxargs_old should change, while slot 1's linuxargs should not.

Describe the results you expected

The commands should read or write slot 2's bootargs (linuxargs_old on ASPEED).

Describe the results you received

The installed implementation always reads and writes linuxargs:

fw_printenv linuxargs
fw_setenv linuxargs ...

Additional information

The U-Boot set_fips() and get_fips() methods ignore the image argument. ASPEED U-Boot uses:

  • linuxargs for slot 1 (sonic_bootargs).
  • linuxargs_old for slot 2 (sonic_bootargs_old).

Those variables are programmed in:

  • sonic-program-uboot-env.sh, where the installer writes linuxargs and linuxargs_old
  • platform_arm64.conf, where ASPEED defines the slot 1 and slot 2 bootargs variables
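
Honoring the image argument comes down to resolving the image to its slot and then to the matching variable name. A minimal sketch, assuming the ASPEED suffix convention listed above; SLOT_SUFFIX and bootargs_var are illustrative names.

```python
SLOT_SUFFIX = {1: "", 2: "_old"}  # ASPEED convention described above


def bootargs_var(slots, image):
    """Return the linuxargs variable name for the slot holding `image`."""
    for slot, version in slots.items():
        if version == image:
            return "linuxargs" + SLOT_SUFFIX[slot]
    raise ValueError(f"image not installed: {image}")


slots = {1: "SONiC-OS-B", 2: "SONiC-OS-A"}
print(bootargs_var(slots, "SONiC-OS-A"))
print(bootargs_var(slots, "SONiC-OS-B"))
```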

Additional information: Impact

  • get-fips SONiC-OS-A reports slot 1's FIPS status.
  • set-fips SONiC-OS-A ... changes slot 1, not slot 2.
  • set-fips with no image defaults through get_next_image(), so it can combine with Issue 3 and modify an image that is not actually the next boot target.

This is a security configuration bug because the CLI can report success while the requested image's FIPS bootarg is unchanged.

Issue 11: Broken or custom U-Boot selectors are hidden

Severity: Medium

Scenario A

boot_next points at an empty slot:

sonic_version_1=NONE
sonic_version_2=SONiC-OS-A
boot_next=run sonic_image_1

Scenario B

boot_once contains a non-standard U-Boot command:

boot_once=run recovery_script
boot_next=run sonic_image_1

Steps to reproduce the issue

Scenario A is reachable by following Issue 1: create the sparse state, then run set-default on the surviving slot 2 image. That writes boot_next=run sonic_image_1 while slot 1 is empty.

Scenario B is a lab/debug reproduction: set a custom one-shot command and then ask the installer what will boot next:

fw_setenv boot_once "run recovery_script"
fw_setenv boot_next "run sonic_image_1"
sonic-installer list

Describe the results you expected

The CLI should surface the actual selected slot or command, or at least report that the selected slot is empty/invalid. It should not claim a safe next image that U-Boot will not actually execute first.

Describe the results you received

After Scenario A, sonic-installer list reports the surviving installed image even though boot_next points at the empty slot:

Next: SONiC-OS-A
boot_next=run sonic_image_1
sonic_version_1=NONE

After Scenario B, sonic-installer list reports the image selected by boot_next and does not reveal the custom boot_once command that U-Boot will execute first.

Additional information

get_next_image() only returns images[1] when boot_next contains sonic_image_2 and there are exactly two populated images. Otherwise it returns images[0]. It also ignores boot_once entirely.
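
Instead of falling back to the first installed image, the selector could be classified so that the CLI can report what it found. This is a sketch assuming the `run sonic_image_N` selector format and the SONiC-OS- image prefix; describe_selector and its status strings are hypothetical.

```python
import re


def describe_selector(env, var):
    """Classify env[var] as 'unset', 'image', 'empty-slot', or 'custom'."""
    value = env.get(var, "").strip()
    if not value:
        return ("unset", None)
    match = re.fullmatch(r"run sonic_image_(\d+)", value)
    if not match:
        return ("custom", value)
    version = env.get(f"sonic_version_{match.group(1)}", "")
    if version.startswith("SONiC-OS-"):
        return ("image", version)
    return ("empty-slot", value)


scenario_a = {"sonic_version_1": "NONE", "sonic_version_2": "SONiC-OS-A",
              "boot_next": "run sonic_image_1"}
scenario_b = {"boot_once": "run recovery_script", "boot_next": "run sonic_image_1"}

print(describe_selector(scenario_a, "boot_next"))
print(describe_selector(scenario_b, "boot_once"))
```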

Additional information: Impact

sonic-installer list and verify-next-image can hide a broken U-Boot environment by reporting the first installed image. That makes debugging and pre-reboot validation unreliable.

Issue 12: Removing an image leaves stale slot-specific boot variables

Severity: Medium

Description

Slot 1 contains a newly installed non-current image SONiC-OS-B:

image_dir=image-B
fit_name=image-B/boot/sonic_arm64.fit
linuxargs=... loop=image-B/fs.squashfs ...
sonic_version_1=SONiC-OS-B

This is reachable from clean install: install a second image, but do not reboot into it. The running image remains in slot 2, so the slot 1 image can be removed.

sonic-installer install -y /path/to/sonic-bmc-B.bin
NEW_IMAGE=$(fw_printenv -n sonic_version_1)
sonic-installer remove -y "$NEW_IMAGE"

The important point is to remove a non-current U-Boot slot image and then inspect that slot's payload variables.

Steps to reproduce the issue

Before removal, inspect the slot 1 variables:

fw_printenv image_dir fit_name linuxargs sonic_bootargs sonic_boot_load sonic_version_1

Then remove the slot 1 image and inspect the same variables again.

Describe the results you expected

The slot identity and slot-specific boot payload variables should be cleared together.
This must be done with platform awareness: variables should only be cleared when the platform actually uses paired slot-local variables. For example, a platform with both linuxargs and linuxargs_old can treat them as slot-local, but a platform that uses one shared linuxargs for both slots must not lose that global variable when slot 1 is removed.

Describe the results you received

The installed implementation clears only sonic_version_1 and deletes /host/image-B. It leaves variables such as image_dir, fit_name, linuxargs, sonic_bootargs, and sonic_boot_load.

Additional information

remove_image() only writes boot_next and sonic_version_N. ASPEED slot 1 booting depends on the non-_old variable set used by sonic_image_1 in platform_arm64.conf; slot 2 similarly depends on the _old variable set used by sonic_image_2.

Additional information: Impact

Manual U-Boot recovery commands or future scripts can still expand stale variables for a deleted image. The menu no longer lists a real image, but run sonic_image_1 or run sonic_image_2 can still attempt to load the removed FIT path for whichever slot was cleared only at the sonic_version_N level. This is confusing at best and risky during recovery.

Issue 13: Empty slot marker is inconsistent

Severity: Low

Description

Fresh ASPEED initialization sets:

sonic_version_2=None

The U-Boot remove_image() method sets:

sonic_version_N=NONE

Steps to reproduce the issue

Compare a fresh ASPEED initialized system with a system after removing a non-current image:

fw_printenv -n sonic_version_2
sonic-installer install -y /path/to/sonic-bmc-B.bin
NEW_IMAGE=$(fw_printenv -n sonic_version_1)
sonic-installer remove -y "$NEW_IMAGE"
fw_printenv -n sonic_version_1

Describe the results you expected

The empty slot marker should be consistent.

Describe the results you received

The U-Boot remove_image() method uses uppercase NONE, while ASPEED scripts use None.

Additional information

This does not break get_installed_images() because it filters by IMAGE_PREFIX, but U-Boot menu text prints the literal sonic_version_N.

Additional information: Impact

The boot menu and debugging output can alternate between None and NONE for the same empty-slot concept. This is low severity but avoidable confusion.

Issue 14: U-Boot detection is too broad

Severity: Low

Description

An ARM/aarch64 SONiC platform does not use the ASPEED-style U-Boot env, and GRUB/Aboot detection does not claim it.

Steps to reproduce the issue

On such a platform, run:

sonic-installer list

Describe the results you expected

The bootloader should either be detected accurately or fail with a clear "unsupported bootloader/env" message.

Describe the results you received

UbootBootloader.detect() detects U-Boot for any arm or aarch64 machine:

return ("arm" in arch) or ("aarch64" in arch)

Additional information

Bootloader detection runs in AbootBootloader, GrubBootloader, then UbootBootloader order. U-Boot is therefore a fallback, but it still does not verify that /usr/bin/fw_printenv can read the expected variables (sonic_version_1, sonic_version_2, boot_next, etc.).
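
A stricter detection could combine the architecture test with a probe of the expected variables. The sketch below assumes a read_var callable that returns None when a variable cannot be read; in practice it would wrap fw_printenv, but that wrapper, REQUIRED_VARS, and looks_like_sonic_uboot are all hypothetical.

```python
REQUIRED_VARS = ("sonic_version_1", "sonic_version_2", "boot_next")


def looks_like_sonic_uboot(arch, read_var):
    """Arch check plus a probe that the expected env variables are readable.

    `read_var(name)` is a hypothetical callable returning the value or None.
    """
    if "arm" not in arch and "aarch64" not in arch:
        return False
    return all(read_var(name) is not None for name in REQUIRED_VARS)


good_env = {"sonic_version_1": "SONiC-OS-A", "sonic_version_2": "None",
            "boot_next": "run sonic_image_1"}
print(looks_like_sonic_uboot("aarch64", good_env.get))
print(looks_like_sonic_uboot("aarch64", {}.get))
```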

Additional information: Impact

The CLI can select U-Boot and fail later with confusing env-read behavior. This is low severity for the current SONiC-BMC target, but it matters for portability to other ARM platforms.

Issue 15: Install can report success even when U-Boot env programming failed

Severity: High

Description

During image installation, the image's installer path attempts to program U-Boot env, but U-Boot env programming fails because fw_setenv is unavailable, /etc/fw_env.config is wrong, the env store is inaccessible, or a required variable is missing.

Steps to reproduce the issue

In a lab installer environment, force the ASPEED U-Boot env setup path to fail, for example by using an invalid fw_env.config or by making fw_setenv return nonzero, then run:

sonic-installer install -y /path/to/sonic-bmc.bin

Describe the results you expected

The install should fail before reporting success, because the installed image may not be bootable if the U-Boot env was not updated.

Describe the results you received

The ASPEED installer path has U-Boot env setup as a helper step inside prepare_boot_menu. In the image tested on the BMC, prepare_boot_menu calls configure_uboot_env, but the call is not guarded with an abort if the helper returns nonzero:

prepare_boot_menu() {
    configure_uboot_env
    ...
    fw_setenv boot_next 'run sonic_image_1'
    fw_setenv bootcmd '...'
}

The U-Boot bootloader install_image() method only runs:

run_command(["bash", image_path])

If the image installer exits zero despite a U-Boot env setup failure or unhandled fw_setenv failure, sonic-installer install continues to migration/sync and prints success.

Additional information

For a U-Boot SONiC-BMC image, programming boot_next, bootcmd, sonic_image_1/2, linuxargs, and the FIT paths is part of making the installed image bootable. Treating env programming failure as a warning breaks the installer's success contract.
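
Independently of fixing the shell helper, the Python side could verify the environment after the installer script returns. A minimal sketch, assuming the environment has been read back into a dict; the REQUIRED tuple is an illustrative subset of the variables named above, and missing_env is a hypothetical helper.

```python
REQUIRED = ("boot_next", "bootcmd", "sonic_image_1", "linuxargs", "fit_name")


def missing_env(env, required=REQUIRED):
    """Return required variable names that are absent or empty."""
    return [name for name in required if not env.get(name, "").strip()]


# Environment after a failed programming step: only part of it was written.
partial = {"boot_next": "run sonic_image_1", "bootcmd": "", "linuxargs": "loop=image-B/fs.squashfs"}
problems = missing_env(partial)
if problems:
    print("U-Boot env incomplete, failing install:", ", ".join(problems))
```

An install path using such a check would exit nonzero when the list is non-empty, rather than continuing to migration and sync.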

Additional information: Impact

The operator can see a successful install, but the next reboot may still boot the previous image, boot an empty slot, or fail because the U-Boot variables still point to stale paths. This is high severity because the failure is deferred until reboot.

Issue 16: Current-image detection can crash when U-Boot bootargs do not use the SONiC-BMC loop format

Severity: Medium

Description

The U-Boot bootloader class inherits current-image detection from the ONIE installer bootloader. That parser expects /proc/cmdline to contain:

loop=<image-dir>/fs.squashfs

The available SONiC-BMC board does use that exact shape, so this does not reproduce on the current AST2700 test setup. However, other U-Boot platforms can use different bootargs or variable names, such as a firmware= selector, a different rootfs filename, or no loop= parameter at all.

Steps to reproduce the issue

On an ARM/U-Boot lab platform, boot SONiC with U-Boot args that do not include loop=<image-dir>/fs.squashfs, then run a command that asks for the current image:

cat /proc/cmdline
sonic-installer list
sonic-installer remove -y <some-non-current-image>

For example, the issue is reachable on a lab U-Boot platform whose boot command line looks like this:

root=/dev/mmcblk0p1 rw firmware=SONiC-OS-A

Running sonic-installer list in that state asks the bootloader for the current image and hits the parser path.

Describe the results you expected

The command should either identify the current image through a U-Boot-aware fallback or fail with a clear message saying the current image could not be determined.

Describe the results you received

The inherited parser performs a regular-expression match and immediately calls .group(1). If the expected loop=.../fs.squashfs pattern is absent, the command raises an exception instead of returning a controlled error.

The parser lives in the shared ONIE installer bootloader base class, so a fix can either be a U-Boot-specific override or a guarded base-class parser. The U-Boot-specific risk is that UbootBootloader.detect() intentionally covers ARM platforms whose bootargs may not match the SONiC-BMC loop format.
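
A guarded parser returns None instead of raising when the pattern is absent. The regular expression below is written from the loop=<image-dir>/fs.squashfs shape described in this issue, not copied from the base class, and current_image_dir is a hypothetical name.

```python
import re

# Shape described above; the exact upstream regex is not reproduced here.
LOOP_RE = re.compile(r"loop=(\S+)/fs\.squashfs")


def current_image_dir(cmdline):
    """Return the image directory from loop=..., or None when it is absent."""
    match = LOOP_RE.search(cmdline)
    return match.group(1) if match else None


print(current_image_dir("console=ttyS12,115200n8 loop=image-A/fs.squashfs"))
print(current_image_dir("root=/dev/mmcblk0p1 rw firmware=SONiC-OS-A"))
```

The caller can then turn None into a clear "current image could not be determined" error.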

Additional information

U-Boot support should not assume every U-Boot platform uses the exact SONiC-BMC bootargs format. The AST2700 path is valid, but the detection should fail clearly or support a small set of known U-Boot selectors.

Additional information: Impact

sonic-installer list, remove, cleanup, and any code path that checks the current image can crash on a valid U-Boot platform with different bootargs. This matters for keeping the implementation flexible across BMC and non-BMC U-Boot targets.

Issue 17: soft-reboot can pair the next image's kernel with the current image's bootargs

Severity: High

Description

Two images are installed. The user selects a non-current image as the next/default boot target, then runs soft-reboot instead of a cold reboot.

On ASPEED/SONiC-BMC, each image's rootfs is selected through the kernel command line, especially:

loop=image-<version>/fs.squashfs

Steps to reproduce the issue

Start from clean image SONiC-OS-A, install SONiC-OS-B, but do not reboot into it. The installer selects SONiC-OS-B as the next/default image while the running kernel command line still points to SONiC-OS-A:

sonic-installer install -y /path/to/sonic-bmc-B.bin
sonic-installer list

Do not run the final soft-reboot just to prove this issue. Compare the running bootargs with the selected slot's U-Boot bootargs:

cat /proc/cmdline
fw_printenv -n linuxargs
SOFT_REBOOT=$(command -v soft-reboot)
grep -A20 'function setup_reboot_variables' "$SOFT_REBOOT"

The current /proc/cmdline points to the running image, while linuxargs points to the selected slot 1 image.

Describe the results you expected

soft-reboot should load the kernel/initrd and bootargs from the same selected next image.

Describe the results you received

The U-Boot/device-tree path in scripts/soft-reboot setup_reboot_variables() sets:

KERNEL_IMAGE="$(ls $IMAGE_PATH/boot/vmlinuz-*)"
BOOT_OPTIONS="$(cat /sys/firmware/devicetree/base/chosen/bootargs | sed 's/.$//') SONIC_BOOT_TYPE=${BOOT_TYPE_ARG}"

IMAGE_PATH is derived from sonic-installer list, but BOOT_OPTIONS comes from the current device-tree bootargs. This can kexec the next image's kernel with the current image's loop= rootfs path.

Additional information

The fast/warm/express reboot scripts have U-Boot-specific logic to fetch sonic_bootargs${SUFFIX} and linuxargs${SUFFIX} from U-Boot env, but soft-reboot uses current device-tree bootargs directly. That is safe only when the next image is the current image.
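
The mismatch can be detected by comparing the loop= target in the chosen bootargs with the next image's directory. The sketch below is a Python model of that check only, not a change to the shell script; loop_image_dir and bootargs_match_image are hypothetical names.

```python
import re


def loop_image_dir(bootargs):
    """Return the directory named by loop=<dir>/fs.squashfs, or None."""
    match = re.search(r"(?:^|\s)loop=(\S+)/fs\.squashfs", bootargs)
    return match.group(1) if match else None


def bootargs_match_image(bootargs, image_dir):
    """True when the bootargs select the rootfs of `image_dir`."""
    return loop_image_dir(bootargs) == image_dir


running = "console=ttyS12,115200n8 loop=image-A/fs.squashfs"   # current device-tree bootargs
slot1 = "console=ttyS12,115200n8 loop=image-B/fs.squashfs"     # linuxargs of the next image

print(bootargs_match_image(running, "image-B"))
print(bootargs_match_image(slot1, "image-B"))
```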
