Description
This report analyzes the U-Boot bootloader behavior in sonic_installer/bootloader/uboot.py and the related SONiC-BMC U-Boot scripts, as verified on the installed BMC image. All reproduction steps are based on SONiC-BMC (ASPEED AST2700), but the behavior should be similar on other U-Boot devices.
Severity summary
| Severity | Issue |
| --- | --- |
| Critical | Sparse slot state can make set-default, set-next-boot, and remove select the wrong U-Boot slot |
| Critical | boot_once is ignored by get_next_image(), so list, cleanup, verify-next-image, and reboot scripts can operate on the wrong image |
| Critical | Image names that are substrings of other image names can select the wrong slot |
| High | Stale boot_once can shadow set-default or a newly installed default image |
| High | Removing an image does not clear a boot_once that still points to the removed slot |
| High | cleanup can remove the user's pinned one-time boot image |
| High | Platform validation accepts any existing file on U-Boot platforms |
| High | FIPS commands ignore the requested image and modify/report the wrong slot |
| High | Image install can report success even if U-Boot env programming failed |
| High | soft-reboot can use the next image's kernel with the current image's bootargs on U-Boot |
| Medium | Broken or custom U-Boot selectors are hidden by get_next_image() fallback behavior |
| Medium | Removing an image leaves stale slot-specific boot variables behind |
| Medium | set-fips can corrupt linuxargs when the existing sonic_fips value is not a single trailing token |
| Medium | Current-image detection can crash on U-Boot platforms whose bootargs do not use loop=<image>/fs.squashfs |
| Low | get-fips uses a substring check and can report multi-character values beginning with 1 as enabled |
| Low | Empty slot marker is inconsistent (None vs NONE) |
| Low | U-Boot detection treats any non-GRUB/non-Aboot ARM system as U-Boot |
Issue 1: Sparse slot state selects the wrong U-Boot slot
Severity: Critical
Description
Slot 1 is empty, slot 2 contains the only installed image:
sonic_version_1=NONE
sonic_version_2=SONiC-OS-A
This is reachable from a normal clean SONiC-BMC install. A clean ONIE/TFTP install starts as slot 1 populated and slot 2 empty. During a later upgrade, the ASPEED installer stores the newly installed image in slot 1 and moves the currently running image into slot 2. If the new slot 1 image is then removed before it is booted, slot 1 becomes empty while the current image remains in slot 2.
Steps to reproduce the issue
Start from clean image SONiC-OS-A:
fw_printenv -n sonic_version_1
fw_printenv -n sonic_version_2
Expected initial state:
sonic_version_1=SONiC-OS-A
sonic_version_2=None
Install a second image SONiC-OS-B, but do not reboot into it:
sonic-installer install -y /path/to/sonic-bmc-B.bin
fw_printenv -n sonic_version_1
fw_printenv -n sonic_version_2
Expected state after install:
sonic_version_1=SONiC-OS-B
sonic_version_2=SONiC-OS-A
Remove the new image while the running image is still SONiC-OS-A:
sonic-installer remove -y SONiC-OS-B
fw_printenv -n sonic_version_1
fw_printenv -n sonic_version_2
Expected reachable sparse state:
sonic_version_1=NONE
sonic_version_2=SONiC-OS-A
Now set the surviving slot 2 image as default:
sonic-installer set-default SONiC-OS-A
fw_printenv -n boot_next
Describe the results you expected
boot_next should point to slot 2:
boot_next=run sonic_image_2
Describe the results you received
The installed implementation writes slot 1:
boot_next=run sonic_image_1
Additional information
get_installed_images() in sonic_installer/bootloader/uboot.py reads sonic_version_1 and sonic_version_2, but filters out empty/non-SONiC slots into a compact list. With slot 1 empty and slot 2 populated, it returns a one-element list containing only SONiC-OS-A, so images[0] is the slot 2 image.
Then set_default_image() assumes images[0] is slot 1:
if image in images[0]:
    fw_setenv boot_next "run sonic_image_1"
The list index no longer matches the U-Boot slot number. This can point the default boot target at an empty slot. The same slot-index bug affects:
set_next_image(), which can write the wrong boot_once.
remove_image(), which can clear the wrong sonic_version_N and point boot_next at the wrong slot.
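One way to keep the list index from drifting away from the slot number is to carry the slot number alongside each image name. The following is a minimal sketch, assuming the U-Boot environment has already been read into a dict. The helper names installed_slots and slot_for_image and the IMAGE_PREFIX value are illustrative assumptions, not taken from uboot.py.

```python
# Hypothetical helpers for illustration; this is not the actual uboot.py code.
IMAGE_PREFIX = "SONiC-OS-"  # assumed prefix, matching the image names used in this report


def installed_slots(env):
    """Map U-Boot slot number -> image name, skipping empty or non-SONiC slots."""
    slots = {}
    for slot in (1, 2):
        name = env.get(f"sonic_version_{slot}", "")
        if name.startswith(IMAGE_PREFIX):
            slots[slot] = name
    return slots


def slot_for_image(env, image):
    """Return the slot that holds exactly `image`, or None if it is not installed."""
    for slot, name in installed_slots(env).items():
        if name == image:
            return slot
    return None


# Sparse state from this issue: slot 1 empty, slot 2 populated.
sparse_env = {"sonic_version_1": "NONE", "sonic_version_2": "SONiC-OS-A"}
slot = slot_for_image(sparse_env, "SONiC-OS-A")
print(f"boot_next=run sonic_image_{slot}")  # prints boot_next=run sonic_image_2
```

Because the slot number travels with the name, the same lookup could serve set-default, set-next-boot, and remove without relying on list position.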
Additional information: Impact
The next reboot can attempt run sonic_image_1 even though the only valid image is in slot 2. On SONiC-BMC, slot-specific boot variables such as fit_name, fit_name_old, linuxargs, and linuxargs_old control the actual FIT path and kernel command line, so selecting the wrong slot can boot the wrong image or fail to boot.
Issue 2: Similar image names can boot the wrong image
Severity: Critical
Description
Two installed image names have a prefix/substring relationship:
sonic_version_1=SONiC-OS-A-new
sonic_version_2=SONiC-OS-A
This is reachable with normal upgrade sequencing if the first image version string is a prefix of the second image version string. For example, clean-install SONiC-OS-A, then install SONiC-OS-A-new. The ASPEED upgrade flow stores the new image in slot 1 and the previous image in slot 2. The user then wants to boot SONiC-OS-A from slot 2.
Steps to reproduce the issue
Start with:
sonic_version_1=SONiC-OS-A-new
sonic_version_2=SONiC-OS-A
Then run:
sonic-installer set-next-boot SONiC-OS-A
fw_printenv -n boot_once
Describe the results you expected
boot_once should point to slot 2:
boot_once=run sonic_image_2
Describe the results you received
The installed implementation writes slot 1:
boot_once=run sonic_image_1
Additional information
The implementation uses substring checks of the form "if image in images[0]", which on two Python strings tests for containment.
That is not an exact image-name comparison. If the requested image name is contained inside the other slot's image name, the implementation selects the first matching slot, not the exact slot.
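The difference between containment and equality can be seen in a few lines of standalone Python. This is only an illustration using the image names from this issue; the slot_names dict is a stand-in for the real environment.

```python
# Illustrative only: shows why `in` on two strings is not an equality test.
requested = "SONiC-OS-A"
slot_names = {1: "SONiC-OS-A-new", 2: "SONiC-OS-A"}

# Substring test: slot 1 already "matches" because its name contains the request.
substring_hits = [slot for slot, name in slot_names.items() if requested in name]

# Exact comparison: only slot 2 matches.
exact_hits = [slot for slot, name in slot_names.items() if requested == name]

print(substring_hits)  # [1, 2]
print(exact_hits)      # [2]
```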
Additional information: Impact
The command succeeds, but the next reboot boots a different image than the one the user requested. This is worse than a validation failure because the CLI gives no warning and writes a valid-looking U-Boot variable.
Issue 3: boot_once is ignored, so the reported "Next" image can be wrong
Severity: Critical
Description
Default boot is slot 1, but a one-time boot is scheduled for slot 2:
sonic_version_1=SONiC-OS-B
sonic_version_2=SONiC-OS-A
boot_next=run sonic_image_1
boot_once=run sonic_image_2
This is reachable from clean install by installing a second image and scheduling the currently running image for one-time boot. No reboot is required to reach the failing state: the installer stores the new image in slot 1 and preserves the running image in slot 2.
sonic-installer install -y /path/to/sonic-bmc-B.bin
sonic-installer set-next-boot SONiC-OS-A
fw_printenv -n boot_next
fw_printenv -n boot_once
Steps to reproduce the issue
After the state above is reached, run:
sonic-installer list
sonic-installer verify-next-image
Describe the results you expected
Because ASPEED bootcmd executes boot_once before boot_next, the next boot target is SONiC-OS-A.
sonic-installer list should show Next: SONiC-OS-A.
Describe the results you received
The installed implementation reports slot 1:
sonic-installer list: Next: SONiC-OS-B
fw_printenv -n boot_once: run sonic_image_2
Additional information
sonic_installer/bootloader/uboot.py:get_next_image() reads only boot_next. It never reads boot_once, even though bootcmd executes boot_once first. Therefore the CLI reports the persistent default image, not the actual next boot image.
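A resolver that follows the same priority as bootcmd could look like the sketch below. It assumes the selectors have the form "run sonic_image_N" shown in this report and that the environment is available as a dict; next_boot_slot is a hypothetical name, not an existing method.

```python
import re

# Selector format "run sonic_image_N" is taken from this report.
SELECTOR_RE = re.compile(r"^run sonic_image_([12])$")


def next_boot_slot(env):
    """Return the slot U-Boot would try first, checking boot_once before boot_next."""
    for var in ("boot_once", "boot_next"):
        value = env.get(var, "").strip()
        if not value:
            continue  # unset selector: fall through to the next one
        match = SELECTOR_RE.match(value)
        # A non-empty but unrecognized selector is reported as None rather than skipped.
        return int(match.group(1)) if match else None
    return None


env = {
    "sonic_version_1": "SONiC-OS-B",
    "sonic_version_2": "SONiC-OS-A",
    "boot_next": "run sonic_image_1",
    "boot_once": "run sonic_image_2",
}
slot = next_boot_slot(env)
print(env[f"sonic_version_{slot}"])  # SONiC-OS-A
```

Returning None for an unrecognized selector is a design choice in this sketch: it lets the caller report "unknown next boot target" instead of silently falling back to the default.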
Additional information: Impact
This breaks more than sonic-installer list:
sonic-installer verify-next-image uses Bootloader.verify_next_image(), which calls get_next_image().
sonic-installer cleanup keeps current and next, and removes other images. If boot_once points to slot 2 but get_next_image() reports slot 1, cleanup can remove the actual one-time boot target.
scripts/reboot checks sonic-installer verify-next-image.
scripts/fast-reboot, warm-reboot, express-reboot, and soft-reboot derive target image paths from sonic-installer list.
The result is a CLI that can say the next image is safe while U-Boot is about to boot a different image.
Issue 4: cleanup can remove the user's pinned one-time boot image
Severity: High
Description
The user pins a non-current image with set-next-boot, then runs cleanup -y to remove images that are neither current nor next. On ASPEED U-Boot, the pinned one-time target is the actual next boot target because bootcmd consumes boot_once before boot_next.
This is reachable from clean install:
- Start with clean image SONiC-OS-A.
- Install SONiC-OS-B.
- Keep SONiC-OS-A as the default/current image.
- Pin SONiC-OS-B for one-time boot.
- Run cleanup.
Steps to reproduce the issue
sonic-installer install -y /path/to/sonic-bmc-B.bin
PINNED_IMAGE=$(fw_printenv -n sonic_version_1)
CURRENT_IMAGE=$(fw_printenv -n sonic_version_2)
sonic-installer set-default "$CURRENT_IMAGE"
sonic-installer set-next-boot "$PINNED_IMAGE"
fw_printenv boot_once boot_next sonic_version_1 sonic_version_2
sonic-installer list
sonic-installer cleanup -y
fw_printenv boot_once boot_next sonic_version_1 sonic_version_2
Do not perform the final reboot during reproduction; the environment after cleanup is enough to prove the failure.
Describe the results you expected
cleanup should keep the image referenced by boot_once, because that is the image U-Boot will try first on the next reboot:
sonic_version_1=SONiC-OS-B
boot_once=run sonic_image_1
Describe the results you received
cleanup uses get_next_image() to decide which image to keep. Because get_next_image() ignores boot_once, it keeps only the current/default image and removes the one-time target:
sonic_version_1=NONE
boot_once=run sonic_image_1
boot_next=run sonic_image_2
Additional information
The cleanup() command in sonic_installer/main.py removes every image that is not equal to current or bootloader.get_next_image(). The U-Boot get_next_image() path reports boot_next, not the higher-priority boot_once. Then remove_image() marks the pinned slot empty but does not clear boot_once.
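The keep/remove decision can be sketched as a pure function that treats both selectors as protected. This is a hedged illustration, not the code in main.py: images_to_remove and selector_slot are hypothetical names, and the "SONiC-OS-" prefix is assumed from the image names in this report.

```python
import re


def selector_slot(value):
    """Parse 'run sonic_image_N' into N, or None if the value is empty or unrecognized."""
    match = re.fullmatch(r"run sonic_image_([12])", (value or "").strip())
    return int(match.group(1)) if match else None


def images_to_remove(env, current_image):
    """List installed images that are neither current nor referenced by a boot selector."""
    installed = {slot: env.get(f"sonic_version_{slot}", "") for slot in (1, 2)}
    keep = {current_image}
    for var in ("boot_once", "boot_next"):
        slot = selector_slot(env.get(var))
        if slot is not None:
            keep.add(installed[slot])
    return [
        name
        for name in installed.values()
        if name.startswith("SONiC-OS-") and name not in keep
    ]


env = {
    "sonic_version_1": "SONiC-OS-B",
    "sonic_version_2": "SONiC-OS-A",
    "boot_once": "run sonic_image_1",
    "boot_next": "run sonic_image_2",
}
print(images_to_remove(env, "SONiC-OS-A"))  # [] because boot_once pins SONiC-OS-B
```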
Additional information: Impact
The next cold reboot consumes boot_once=run sonic_image_1 and attempts to boot a slot that cleanup just removed. This turns a routine cleanup after set-next-boot into a possible boot failure.
Issue 5: set-fips can corrupt or duplicate the sonic_fips bootarg
Severity: Medium
Description
The existing linuxargs contains a sonic_fips token whose value has more than one character, or the token appears at the beginning of the command line:
linuxargs=console=ttyS12,115200n8 sonic_fips=10 loop=foo
linuxargs=sonic_fips=1 console=ttyS12,115200n8 loop=foo
The current production value is usually a single character, but the parser should not corrupt the kernel command line if a future path writes a multi-character value or moves the token to the front.
Steps to reproduce the issue
Variant A, multi-character value:
ORIG_LINUXARGS=$(fw_printenv -n linuxargs)
IMAGE=$(fw_printenv -n sonic_version_1)
fw_setenv linuxargs "console=ttyS12,115200n8 sonic_fips=10 loop=foo"
sonic-installer set-fips "$IMAGE" --enable-fips
fw_printenv -n linuxargs
fw_setenv linuxargs "$ORIG_LINUXARGS"
Variant B, token at the beginning:
ORIG_LINUXARGS=$(fw_printenv -n linuxargs)
IMAGE=$(fw_printenv -n sonic_version_1)
fw_setenv linuxargs "sonic_fips=1 console=ttyS12,115200n8 loop=foo"
sonic-installer set-fips "$IMAGE" --disable-fips
fw_printenv -n linuxargs
sonic-installer get-fips "$IMAGE"
fw_setenv linuxargs "$ORIG_LINUXARGS"
Describe the results you expected
The existing sonic_fips=<value> token should be removed as a whole token and replaced with exactly one new token.
Describe the results you received
For variant A, only sonic_fips=1 is removed, leaving the trailing 0 glued to the previous argument:
linuxargs=console=ttyS12,115200n80 loop=foo sonic_fips=1
For variant B, the leading token is not matched, so a duplicate is appended:
linuxargs=sonic_fips=1 console=ttyS12,115200n8 loop=foo sonic_fips=0
get-fips then reports enabled because the stale sonic_fips=1 token is still present.
Additional information
set_fips() uses this pattern:
re.sub(r' sonic_fips=[^\s]', '', cmdline)
[^\s] consumes exactly one non-whitespace character, not the entire token value, and the leading literal space means the token is ignored when it appears at the beginning of linuxargs.
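A token-based rewrite avoids both failure modes. The sketch below is an assumption about how a fix might look, not the current set_fips() code; set_fips_token is a hypothetical name. It splits on whitespace, so it would not preserve quoted arguments that contain spaces or runs of multiple spaces.

```python
def set_fips_token(cmdline, enable):
    """Drop every sonic_fips=<value> token and append exactly one new token."""
    tokens = [tok for tok in cmdline.split() if not tok.startswith("sonic_fips=")]
    tokens.append(f"sonic_fips={1 if enable else 0}")
    return " ".join(tokens)


# Variant A: multi-character value in the middle of the line.
print(set_fips_token("console=ttyS12,115200n8 sonic_fips=10 loop=foo", True))
# console=ttyS12,115200n8 loop=foo sonic_fips=1

# Variant B: token at the beginning of the line.
print(set_fips_token("sonic_fips=1 console=ttyS12,115200n8 loop=foo", False))
# console=ttyS12,115200n8 loop=foo sonic_fips=0
```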
Additional information: Impact
The command can silently write a malformed kernel command line or leave contradictory FIPS flags. On serial-console BMC systems, corrupting the console token is especially risky because it can make boot-time debugging harder.
Issue 6: get-fips treats multi-character values beginning with 1 as enabled
Severity: Low
Description
linuxargs contains a sonic_fips value such as 10 or 12:
linuxargs=console=ttyS12,115200n8 sonic_fips=10 loop=foo
Steps to reproduce the issue
ORIG_LINUXARGS=$(fw_printenv -n linuxargs)
IMAGE=$(fw_printenv -n sonic_version_1)
fw_setenv linuxargs "console=ttyS12,115200n8 sonic_fips=10 loop=foo"
sonic-installer get-fips "$IMAGE"
fw_setenv linuxargs "$ORIG_LINUXARGS"
Describe the results you expected
Only an exact sonic_fips=1 token should be treated as enabled.
Describe the results you received
get-fips reports enabled because it tests whether the substring sonic_fips=1 occurs anywhere in linuxargs.
sonic_fips=10 and sonic_fips=12 both satisfy that substring check.
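An exact-token check is a small change. The following sketch contrasts the two tests on the command line from this issue; fips_enabled is a hypothetical helper name, not the existing get_fips() method.

```python
def fips_enabled(cmdline):
    """True only if an exact sonic_fips=1 token is present on the command line."""
    return "sonic_fips=1" in cmdline.split()


cmdline = "console=ttyS12,115200n8 sonic_fips=10 loop=foo"
print("sonic_fips=1" in cmdline)  # True: substring test, matches sonic_fips=10
print(fips_enabled(cmdline))      # False: token test requires an exact match
```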
Additional information: Impact
This is lower severity than Issue 5 because current values are normally 0 or 1, but it is still a correctness bug in the parser and makes future extensions of the FIPS value unsafe.
Issue 7: Stale boot_once can shadow set-default and image installation
Severity: High
Description
A previous set-next-boot left:
boot_once=run sonic_image_N
The user then tries to make a different slot the default, or installs a new image that should become the default.
Steps to reproduce the issue
Case A, set-default is shadowed. Start from clean image SONiC-OS-A, install a second image, but do not reboot into it:
sonic-installer install -y /path/to/sonic-bmc-B.bin
sonic-installer set-next-boot SONiC-OS-B
sonic-installer set-default SONiC-OS-A
fw_printenv -n boot_once
fw_printenv -n boot_next
Expected: set-default clears the one-time selector so the next reboot follows the new default and boots SONiC-OS-A. Actual: boot_once=run sonic_image_1 remains set, so U-Boot will consume it first and boot SONiC-OS-B once.
Case B, a newly installed image is shadowed. Start from the same two-image state before rebooting into SONiC-OS-B, then leave a one-time boot entry for the current image before installing a third image:
sonic-installer set-next-boot SONiC-OS-A
sonic-installer install -y /path/to/sonic-bmc-C.bin
fw_printenv -n boot_once
fw_printenv -n boot_next
Expected: the next reboot boots the newly installed image SONiC-OS-C. Actual: boot_once remains set and takes priority over the new boot_next; slot 2 is still the previous current image, so the reboot can boot that previous image instead of SONiC-OS-C.
Describe the results you expected
After set-default, the next boot should follow the selected default. After install, the first reboot should boot the newly installed image/default configured by the installer.
Describe the results you received
The installed implementation does not clear boot_once in set_default_image() or install_image(). After the commands above, boot_once remains non-empty:
boot_once=run sonic_image_N
Additional information
set_default_image() sets boot_next, but leaves boot_once untouched.
install_image() runs the image installer script, but also leaves boot_once untouched.
ASPEED install code in platform/aspeed/platform_arm64.conf programs boot_next and bootcmd as part of prepare_boot_menu, but it does not clear stale boot_once.
Because U-Boot evaluates boot_once before boot_next, a stale one-time boot selector shadows the newly selected default.
Additional information: Impact
The next reboot can boot a stale one-time target instead of the image the user just made default or just installed. If the stale one-time target points to a removed or stale slot, this can become a boot failure.
Issue 8: Removing an image does not clear boot_once pointing to that image
Severity: High
Description
The running image is still SONiC-OS-A, a newly installed image SONiC-OS-B is in slot 1, and slot 1 is scheduled for one-time boot:
current image from /proc/cmdline: SONiC-OS-A
sonic_version_1=SONiC-OS-B
sonic_version_2=SONiC-OS-A
boot_once=run sonic_image_1
This is reachable from a clean install because installing SONiC-OS-B stores the new image in slot 1 while the currently running SONiC-OS-A is preserved in slot 2.
Steps to reproduce the issue
Start from clean image SONiC-OS-A, then install but do not reboot into SONiC-OS-B:
sonic-installer install -y /path/to/sonic-bmc-B.bin
sonic-installer set-next-boot SONiC-OS-B
fw_printenv -n boot_once
Expected intermediate state:
boot_once=run sonic_image_1
Now remove SONiC-OS-B, which is allowed because the running image is still SONiC-OS-A:
sonic-installer remove -y SONiC-OS-B
fw_printenv -n sonic_version_1
fw_printenv -n boot_once
Describe the results you expected
The remove operation should clear boot_once because it points to the removed slot. After removal, boot_once should no longer reference sonic_image_1.
Describe the results you received
The installed implementation leaves boot_once unchanged. The check after removal still shows:
sonic_version_1=NONE
boot_once=run sonic_image_1
No fw_setenv boot_once "" appears.
Additional information
remove_image() flips boot_next, marks one sonic_version_N as NONE, and deletes /host/image-*. It never checks whether boot_once still points at the slot being removed.
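The missing check can be sketched as a function that plans the environment updates for a removal. This is an illustration under the assumptions stated in this report (remove flips boot_next to the other slot and marks the slot NONE); removal_updates is a hypothetical name, and an empty string stands for "clear this variable".

```python
def removal_updates(env, slot):
    """Return the env updates for removing `slot` (1 or 2); '' means clear the variable."""
    other = 2 if slot == 1 else 1
    updates = {
        f"sonic_version_{slot}": "NONE",
        "boot_next": f"run sonic_image_{other}",
    }
    if env.get("boot_once", "").strip() == f"run sonic_image_{slot}":
        # The one-time selector points at the slot being removed, so it is stale.
        updates["boot_once"] = ""
    return updates


env = {
    "sonic_version_1": "SONiC-OS-B",
    "sonic_version_2": "SONiC-OS-A",
    "boot_once": "run sonic_image_1",
}
print(removal_updates(env, 1))
```

A boot_once that points at the surviving slot is left alone, so a legitimate one-time selection is not discarded.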
Additional information: Impact
On the next cold reboot, bootcmd still tries the removed slot first. Depending on U-Boot command behavior and platform scripts, it may fail before fallback, delay boot, or boot with stale slot-specific variables. Even if fallback eventually reaches boot_next, the environment is inconsistent and the user has no warning.
Issue 9: Platform validation is effectively disabled on U-Boot
Severity: High
Description
The user accidentally provides an image for a different platform/ASIC:
sonic-installer install /tmp/wrong-platform-sonic.bin
The user did not pass --skip-platform-check.
Steps to reproduce the issue
Use an image whose installer/platforms_asic does not include the running BMC platform. For example, use an x86_64 Mellanox/NVIDIA switch image on an ASPEED BMC:
IMG=/tmp/switch-sonic.bin
BMC_PLATFORM=$(sed -n 's/^onie_platform=//p' /host/machine.conf)
sed -e '1,/^exit_marker$/d' "$IMG" | tar xOf - installer/platforms_asic | grep -qx "$BMC_PLATFORM"
python3 - <<'PY'
from sonic_installer.bootloader.uboot import UbootBootloader
img = "/tmp/switch-sonic.bin"
b = UbootBootloader()
print("verify_image_platform(switch image)=", b.verify_image_platform(img))
print("verify_image_platform(/etc/hostname)=", b.verify_image_platform("/etc/hostname"))
print("verify_image_platform(/nonexistent)=", b.verify_image_platform("/nonexistent"))
PY
Describe the results you expected
The installer/platforms_asic check should show that the BMC platform is absent, and verify_image_platform() should return False for the switch image, matching GRUB/other bootloaders.
Describe the results you received
The switch image's installer/platforms_asic does not contain the BMC platform, but the U-Boot platform check still accepts it because it accepts any existing regular file:
BMC platform: arm64-aspeed_nvidia_ast2700_bmc-r0
switch image contains BMC platform: no
verify_image_platform(switch image)= True
verify_image_platform(/etc/hostname)= True
verify_image_platform(/nonexistent)= False
In the install flow, that means the command proceeds past the default platform check instead of failing with the normal platform mismatch message.
Additional information
verify_image_platform() implements:
def verify_image_platform(self, image_path):
    return os.path.isfile(image_path)
The install command in sonic_installer/main.py relies on bootloader.verify_image_platform(image_path) to enforce the default platform check. Therefore the --skip-platform-check safety boundary is meaningless on U-Boot: the default behavior already skips real platform validation.
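The core of a real check is an exact-line match against the image's installer/platforms_asic list, similar in spirit to the grep -qx step in the reproduction. The sketch below assumes that file's text has already been extracted from the image; platform_listed is a hypothetical helper, and the switch platform string in the example is made up for illustration.

```python
def platform_listed(platforms_asic_text, platform):
    """True only if `platform` appears as a complete line in the platform list."""
    return platform in (line.strip() for line in platforms_asic_text.splitlines())


bmc_platform = "arm64-aspeed_nvidia_ast2700_bmc-r0"  # value reported in this issue
switch_list = "x86_64-example_switch-r0\nx86_64-example_switch2-r0\n"  # made-up entries

print(platform_listed(switch_list, bmc_platform))         # False
print(platform_listed(bmc_platform + "\n", bmc_platform))  # True
```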
Additional information: Impact
SONiC-BMC can attempt to install an image that is not intended for the BMC platform. That can leave the device with an unbootable or unsupported image even though the user did not request a forced platform bypass.
Issue 10: FIPS commands modify/report the wrong image
Severity: High
Description
Slot 1 and slot 2 have separate kernel args:
sonic_version_1=SONiC-OS-B
sonic_version_2=SONiC-OS-A
linuxargs=... loop=image-B/fs.squashfs sonic_fips=0
linuxargs_old=... loop=image-A/fs.squashfs sonic_fips=1
This is reachable from clean install by installing a second image. Start with SONiC-OS-A, install SONiC-OS-B, and before reboot the running image SONiC-OS-A is preserved in slot 2. The user asks for FIPS status or changes FIPS for that slot 2 image:
sonic-installer get-fips SONiC-OS-A
sonic-installer set-fips SONiC-OS-A --disable-fips
Steps to reproduce the issue
From clean image SONiC-OS-A:
sonic-installer install -y /path/to/sonic-bmc-B.bin
fw_printenv -n sonic_version_1
fw_printenv -n sonic_version_2
fw_printenv -n linuxargs
fw_printenv -n linuxargs_old
sonic-installer set-fips SONiC-OS-A --disable-fips
fw_printenv -n linuxargs
fw_printenv -n linuxargs_old
The expected observable difference is that slot 2's linuxargs_old should change, while slot 1's linuxargs should not.
Describe the results you expected
The commands should read or write slot 2's bootargs (linuxargs_old on ASPEED).
Describe the results you received
The installed implementation always reads and writes linuxargs:
fw_printenv linuxargs
fw_setenv linuxargs ...
Additional information
The U-Boot set_fips() and get_fips() methods ignore the image argument. ASPEED U-Boot uses:
linuxargs for slot 1 (sonic_bootargs).
linuxargs_old for slot 2 (sonic_bootargs_old).
Those variables are programmed in:
sonic-program-uboot-env.sh, where the installer writes linuxargs and linuxargs_old
platform_arm64.conf, where ASPEED defines the slot 1 and slot 2 bootargs variables
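Honoring the image argument comes down to mapping the image to its slot and the slot to its bootargs variable. The sketch below uses the ASPEED mapping described above (linuxargs for slot 1, linuxargs_old for slot 2); bootargs_var_for_image is a hypothetical helper name.

```python
# Slot -> bootargs variable, as described for ASPEED in this report.
BOOTARGS_VAR = {1: "linuxargs", 2: "linuxargs_old"}


def bootargs_var_for_image(env, image):
    """Return the U-Boot variable holding the kernel args for exactly `image`."""
    for slot, var in BOOTARGS_VAR.items():
        if env.get(f"sonic_version_{slot}") == image:
            return var
    raise ValueError(f"image not installed: {image}")


env = {"sonic_version_1": "SONiC-OS-B", "sonic_version_2": "SONiC-OS-A"}
print(bootargs_var_for_image(env, "SONiC-OS-A"))  # linuxargs_old
```

Platforms that share a single linuxargs across both slots would need a different mapping, so the table should be platform-specific rather than hard-coded.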
Additional information: Impact
get-fips SONiC-OS-A reports slot 1's FIPS status.
set-fips SONiC-OS-A ... changes slot 1, not slot 2.
set-fips with no image defaults through get_next_image(), so it can combine with Issue 3 and modify an image that is not actually the next boot target.
This is a security configuration bug because the CLI can report success while the requested image's FIPS bootarg is unchanged.
Issue 11: Broken or custom U-Boot selectors are hidden
Severity: Medium
Scenario A
boot_next points at an empty slot:
sonic_version_1=NONE
sonic_version_2=SONiC-OS-A
boot_next=run sonic_image_1
Scenario B
boot_once contains a non-standard U-Boot command:
boot_once=run recovery_script
boot_next=run sonic_image_1
Steps to reproduce the issue
Scenario A is reachable by following Issue 1: create the sparse state, then run set-default on the surviving slot 2 image. That writes boot_next=run sonic_image_1 while slot 1 is empty.
Scenario B is a lab/debug reproduction: set a custom one-shot command and then ask the installer what will boot next:
fw_setenv boot_once "run recovery_script"
fw_setenv boot_next "run sonic_image_1"
sonic-installer list
Describe the results you expected
The CLI should surface the actual selected slot or command, or at least report that the selected slot is empty/invalid. It should not claim a safe next image that U-Boot will not actually execute first.
Describe the results you received
After Scenario A, sonic-installer list reports the surviving installed image even though boot_next points at the empty slot:
Next: SONiC-OS-A
boot_next=run sonic_image_1
sonic_version_1=NONE
After Scenario B, sonic-installer list reports the image selected by boot_next and does not reveal the custom boot_once command that U-Boot will execute first.
Additional information
get_next_image() only returns images[1] when boot_next contains sonic_image_2 and there are exactly two populated images. Otherwise it returns images[0]. It also ignores boot_once entirely.
Additional information: Impact
sonic-installer list and verify-next-image can hide a broken U-Boot environment by reporting the first installed image. That makes debugging and pre-reboot validation unreliable.
Issue 12: Removing an image leaves stale slot-specific boot variables
Severity: Medium
Description
Slot 1 contains a newly installed non-current image SONiC-OS-B:
image_dir=image-B
fit_name=image-B/boot/sonic_arm64.fit
linuxargs=... loop=image-B/fs.squashfs ...
sonic_version_1=SONiC-OS-B
This is reachable from clean install: install a second image, but do not reboot into it. The running image remains in slot 2, so the slot 1 image can be removed.
sonic-installer install -y /path/to/sonic-bmc-B.bin
NEW_IMAGE=$(fw_printenv -n sonic_version_1)
sonic-installer remove -y "$NEW_IMAGE"
The important point is to remove a non-current U-Boot slot image and then inspect that slot's payload variables.
Steps to reproduce the issue
Before removal, inspect the slot 1 variables:
fw_printenv image_dir fit_name linuxargs sonic_bootargs sonic_boot_load sonic_version_1
Then remove the slot 1 image and inspect the same variables again.
Describe the results you expected
The slot identity and slot-specific boot payload variables should be cleared together.
This must be done with platform awareness: variables should only be cleared when the platform actually uses paired slot-local variables. For example, a platform with both linuxargs and linuxargs_old can treat them as slot-local, but a platform that uses one shared linuxargs for both slots must not lose that global variable when slot 1 is removed.
Describe the results you received
The installed implementation clears only sonic_version_1 and deletes /host/image-B. It leaves variables such as image_dir, fit_name, linuxargs, sonic_bootargs, and sonic_boot_load.
Additional information
remove_image() only writes boot_next and sonic_version_N. ASPEED slot 1 booting depends on the non-_old variable set used by sonic_image_1 in platform_arm64.conf; slot 2 similarly depends on the _old variable set used by sonic_image_2.
Additional information: Impact
Manual U-Boot recovery commands or future scripts can still expand stale variables for a deleted image. The menu no longer lists a real image, but run sonic_image_1 or run sonic_image_2 can still attempt to load the removed FIT path for whichever slot was cleared only at the sonic_version_N level. This is confusing at best and risky during recovery.
Issue 13: Empty slot marker is inconsistent
Severity: Low
Description
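Fresh ASPEED initialization sets an empty slot to None (for example, sonic_version_2=None).
The U-Boot remove_image() method sets the removed slot to NONE (for example, sonic_version_1=NONE).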
Steps to reproduce the issue
Compare a fresh ASPEED initialized system with a system after removing a non-current image:
fw_printenv -n sonic_version_2
sonic-installer install -y /path/to/sonic-bmc-B.bin
NEW_IMAGE=$(fw_printenv -n sonic_version_1)
sonic-installer remove -y "$NEW_IMAGE"
fw_printenv -n sonic_version_1
Describe the results you expected
The empty slot marker should be consistent.
Describe the results you received
The U-Boot remove_image() method uses uppercase NONE, while ASPEED scripts use None.
Additional information
This does not break get_installed_images() because it filters by IMAGE_PREFIX, but U-Boot menu text prints the literal sonic_version_N.
Additional information: Impact
The boot menu and debugging output can alternate between None and NONE for the same empty-slot concept. This is low severity but avoidable confusion.
Issue 14: U-Boot detection is too broad
Severity: Low
Description
An ARM/aarch64 SONiC platform does not use the ASPEED-style U-Boot env, and GRUB/Aboot detection does not claim it.
Steps to reproduce the issue
On such a platform, run:
Describe the results you expected
The bootloader should either be detected accurately or fail with a clear "unsupported bootloader/env" message.
Describe the results you received
UbootBootloader.detect() detects U-Boot for any arm or aarch64 machine:
return ("arm" in arch) or ("aarch64" in arch)
Additional information
Bootloader detection runs in AbootBootloader, GrubBootloader, then UbootBootloader order. U-Boot is therefore a fallback, but it still does not verify that /usr/bin/fw_printenv can read the expected variables (sonic_version_1, sonic_version_2, boot_next, etc.).
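A stricter detection step could probe for the expected variables before claiming the platform. The sketch below is an assumption about one possible shape: read_var is a hypothetical callable that would wrap fw_printenv -n and return None when a variable is missing or unreadable, and env_looks_like_sonic_uboot is not an existing method.

```python
# Variables named in this report as expected by the U-Boot bootloader class.
REQUIRED_VARS = ("sonic_version_1", "sonic_version_2", "boot_next")


def env_looks_like_sonic_uboot(read_var):
    """True if every required variable can be read through `read_var(name)`."""
    return all(read_var(name) is not None for name in REQUIRED_VARS)


# Fake readers standing in for a real fw_printenv wrapper.
good_env = {"sonic_version_1": "SONiC-OS-A", "sonic_version_2": "None",
            "boot_next": "run sonic_image_1"}
print(env_looks_like_sonic_uboot(good_env.get))  # True
print(env_looks_like_sonic_uboot({}.get))        # False
```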
Additional information: Impact
The CLI can select U-Boot and fail later with confusing env-read behavior. This is low severity for the current SONiC-BMC target, but it matters for portability to other ARM platforms.
Issue 15: Install can report success even when U-Boot env programming failed
Severity: High
Description
During image installation, the image's installer path attempts to program U-Boot env, but U-Boot env programming fails because fw_setenv is unavailable, /etc/fw_env.config is wrong, the env store is inaccessible, or a required variable is missing.
Steps to reproduce the issue
In a lab installer environment, force the ASPEED U-Boot env setup path to fail, for example by using an invalid fw_env.config or by making fw_setenv return nonzero, then run:
sonic-installer install -y /path/to/sonic-bmc.bin
Describe the results you expected
The install should fail before reporting success, because the installed image may not be bootable if the U-Boot env was not updated.
Describe the results you received
The ASPEED installer path has U-Boot env setup as a helper step inside prepare_boot_menu. In the image tested on the BMC, prepare_boot_menu calls configure_uboot_env, but the call is not guarded with an abort if the helper returns nonzero:
prepare_boot_menu() {
    configure_uboot_env
    ...
    fw_setenv boot_next 'run sonic_image_1'
    fw_setenv bootcmd '...'
}
The U-Boot bootloader install_image() method only runs:
run_command(["bash", image_path])
If the image installer exits zero despite a U-Boot env setup failure or unhandled fw_setenv failure, sonic-installer install continues to migration/sync and prints success.
Additional information
For a U-Boot SONiC-BMC image, programming boot_next, bootcmd, sonic_image_1/2, linuxargs, and the FIT paths is part of making the installed image bootable. Treating env programming failure as a warning breaks the installer's success contract.
Additional information: Impact
The operator can see a successful install, but the next reboot may still boot the previous image, boot an empty slot, or fail because the U-Boot variables still point to stale paths. This is high severity because the failure is deferred until reboot.
Issue 16: Current-image detection can crash when U-Boot bootargs do not use the SONiC-BMC loop format
Severity: Medium
Description
The U-Boot bootloader class inherits current-image detection from the ONIE installer bootloader. That parser expects /proc/cmdline to contain:
loop=<image-dir>/fs.squashfs
The available SONiC-BMC board does use that exact shape, so this does not reproduce on the current AST2700 test setup. However, other U-Boot platforms can use different bootargs or variable names, such as a firmware= selector, a different rootfs filename, or no loop= parameter at all.
Steps to reproduce the issue
On an ARM/U-Boot lab platform, boot SONiC with U-Boot args that do not include loop=<image-dir>/fs.squashfs, then run a command that asks for the current image:
cat /proc/cmdline
sonic-installer list
sonic-installer remove -y <some-non-current-image>
For example, the issue is reachable on a lab U-Boot platform whose boot command line looks like this:
root=/dev/mmcblk0p1 rw firmware=SONiC-OS-A
Running sonic-installer list in that state asks the bootloader for the current image and hits the parser path.
Describe the results you expected
The command should either identify the current image through a U-Boot-aware fallback or fail with a clear message saying the current image could not be determined.
Describe the results you received
The inherited parser performs a regular-expression match and immediately calls .group(1). If the expected loop=.../fs.squashfs pattern is absent, the command raises an exception instead of returning a controlled error.
The parser lives in the shared ONIE installer bootloader base class, so a fix can either be a U-Boot-specific override or a guarded base-class parser. The U-Boot-specific risk is that UbootBootloader.detect() intentionally covers ARM platforms whose bootargs may not match the SONiC-BMC loop format.
Additional information
U-Boot support should not assume every U-Boot platform uses the exact SONiC-BMC bootargs format. The AST2700 path is valid, but the detection should fail clearly or support a small set of known U-Boot selectors.
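A guarded parser along these lines would turn the crash into a controlled error. This is a minimal sketch under stated assumptions: the regular expression only approximates the inherited pattern (the real one may differ), and CurrentImageError and current_image_dir are hypothetical names introduced for the example.

```python
import re

# Assumed approximation of the inherited pattern; the real regex may differ.
LOOP_RE = re.compile(r"loop=(\S+)/fs\.squashfs")


class CurrentImageError(Exception):
    """Hypothetical error raised when the running image cannot be identified."""


def current_image_dir(cmdline):
    """Return the image directory named by loop=<image-dir>/fs.squashfs."""
    match = LOOP_RE.search(cmdline)
    if match is None:
        # Fail with a clear message instead of calling .group(1) on None.
        raise CurrentImageError(
            "cannot determine current image: no loop=<image-dir>/fs.squashfs "
            "in kernel command line"
        )
    return match.group(1)
```

Callers such as list, remove, and cleanup could then catch the error and print the message, or a U-Boot-specific override could try other known selectors before giving up.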
Additional information: Impact
sonic-installer list, remove, cleanup, and any code path that checks the current image can crash on a valid U-Boot platform with different bootargs. This matters for keeping the implementation flexible across BMC and non-BMC U-Boot targets.
Issue 17: soft-reboot can pair the next image's kernel with the current image's bootargs
Severity: High
Description
Two images are installed. The user selects a non-current image as the next/default boot target, then runs soft-reboot instead of a cold reboot.
On ASPEED/SONiC-BMC, each image's rootfs is selected through the kernel command line, especially:
loop=image-<version>/fs.squashfs
Steps to reproduce the issue
Start from clean image SONiC-OS-A, install SONiC-OS-B, but do not reboot into it. The installer selects SONiC-OS-B as the next/default image while the running kernel command line still points to SONiC-OS-A:
sonic-installer install -y /path/to/sonic-bmc-B.bin
sonic-installer list
The final soft-reboot is not required to demonstrate the issue. It is enough to compare the running bootargs with the selected slot's U-Boot bootargs and to inspect how soft-reboot builds its boot options:
cat /proc/cmdline
fw_printenv -n linuxargs
SOFT_REBOOT=$(command -v soft-reboot)
grep -A20 'function setup_reboot_variables' "$SOFT_REBOOT"
/proc/cmdline still carries the running image's loop= path (SONiC-OS-A), while linuxargs carries the bootargs of the selected slot 1 image (SONiC-OS-B).
Describe the results you expected
soft-reboot should load the kernel/initrd and bootargs from the same selected next image.
Describe the results you received
The U-Boot/device-tree path in scripts/soft-reboot setup_reboot_variables() sets:
KERNEL_IMAGE="$(ls $IMAGE_PATH/boot/vmlinuz-*)"
BOOT_OPTIONS="$(cat /sys/firmware/devicetree/base/chosen/bootargs | sed 's/.$//') SONIC_BOOT_TYPE=${BOOT_TYPE_ARG}"
IMAGE_PATH is derived from sonic-installer list, but BOOT_OPTIONS comes from the current device-tree bootargs. This can kexec the next image's kernel with the current image's loop= rootfs path.
Additional information
The fast/warm/express reboot scripts have U-Boot-specific logic to fetch sonic_bootargs${SUFFIX} and linuxargs${SUFFIX} from U-Boot env, but soft-reboot uses current device-tree bootargs directly. That is safe only when the next image is the current image.
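The mismatch described above can be detected before kexec by checking that the loop= token in the chosen boot options names the same image directory as the kernel being loaded. The sketch below shows such a guard in Python for clarity, although soft-reboot itself is a shell script. The helper names loop_image_dir and bootargs_match_image are hypothetical, and the sample bootargs are illustrative rather than captured from a device.

```python
def loop_image_dir(bootargs):
    """Return <image-dir> from a loop=<image-dir>/fs.squashfs token, or None."""
    # Device-tree bootargs end with a NUL byte, so strip it before splitting.
    for token in bootargs.rstrip("\x00").split():
        if token.startswith("loop=") and token.endswith("/fs.squashfs"):
            return token[len("loop="):-len("/fs.squashfs")]
    return None


def bootargs_match_image(bootargs, image_dir):
    """True only when the loop= token in bootargs points at image_dir."""
    return loop_image_dir(bootargs) == image_dir


if __name__ == "__main__":
    # Illustrative values: running image A, next image B.
    running = "console=ttyS12,115200 loop=image-A/fs.squashfs rw"
    if not bootargs_match_image(running, "image-B"):
        print("refusing soft-reboot: bootargs do not belong to the next image")
```

A guard of this kind only detects the problem. The actual fix would still be for soft-reboot to take its bootargs from the selected slot, as the other reboot scripts do.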