
[sonic-installer] uboot: fix two-slot semantics, boot_once handling, and platform check#4489

Closed
william8545 wants to merge 2 commits into
sonic-net:masterfrom
william8545:fix_uboot_installer

@william8545 william8545 commented Apr 23, 2026

Contributor

Partially addresses sonic-net/sonic-utilities#4548.

What I did

Bring the U-Boot bootloader sonic_installer/bootloader/uboot.py in line with the contracts that grub.py and aboot.py already implement, and add the missing platform check that has effectively turned --skip-platform-check into a no-op on every U-Boot SONiC platform.

This PR is scoped to the Python uboot.py bootloader implementation and its tests. It does not touch FIPS code paths, reboot scripts, the shared OnieInstallerBootloader base class, the bootloader detection helper, or the buildimage installer shell scripts.

Issue coverage from #4548:

| # | Title (from #4548) | Status in this PR |
|---|---|---|
| 1 | Sparse slot state selects the wrong U-Boot slot | ✅ Fixed |
| 2 | Similar image names can boot the wrong image | ✅ Fixed |
| 3 | boot_once is ignored, so the reported "Next" image can be wrong | ✅ Fixed |
| 4 | cleanup can remove the user's pinned one-time boot image | ✅ Fixed (via issue 3) |
| 5 | set-fips can corrupt or duplicate the sonic_fips bootarg | ❌ Not in this PR |
| 6 | get-fips treats multi-character values beginning with 1 as enabled | ❌ Not in this PR |
| 7 | Stale boot_once can shadow set-default and image installation | ✅ Fixed |
| 8 | Removing an image does not clear boot_once pointing to that image | ✅ Fixed |
| 9 | Platform validation is effectively disabled on U-Boot | ✅ Fixed |
| 10 | FIPS commands modify/report the wrong image | ❌ Not in this PR |
| 11 | Broken or custom U-Boot selectors are hidden | ✅ Fixed |
| 12 | Removing an image leaves stale slot-specific boot variables | ✅ Fixed for ASPEED/BMC; portability caveat below |
| 13 | Empty slot marker is inconsistent (None vs NONE) | ❌ Not in this PR |
| 14 | U-Boot detection is too broad | ❌ Not in this PR |
| 15 | Selected empty U-Boot slot is hidden or can crash command consumers | ✅ Fixed |
| 16 | Install can report success even when U-Boot env programming failed | ❌ Not in this PR (buildimage) |
| 17 | Current-image detection can crash on non-SONiC-BMC loop bootargs | ❌ Not in this PR (base class) |
| 18 | soft-reboot can pair the next image's kernel with the current image's bootargs | ❌ Not in this PR (scripts) |

This PR fully fixes 9 of the 18 reported issues and fixes issue 12 for the tested ASPEED/BMC per-slot U-Boot environment. The remaining 8 are out of scope (FIPS, marker spelling, detection breadth, scripts, buildimage, shared base class) and are tracked for follow-up. Issue 12 still has a shared-linuxargs portability caveat described below.

Issues 5, 6, and 10 were reproduced on an AST2700 BMC during validation and are intentionally deferred to a FIPS-focused follow-up PR.

  • Issue 5 — set-fips regex [^\s] corrupts multi-char values and requires a leading space. The fix is a one-line regex change ([^\s] to \S+, and drop the leading-space requirement), but linuxargs mutation also needs to be routed per-slot to fix issue 10. Both should land together in a FIPS-focused follow-up PR.
  • Issue 6 — get-fips substring check returns enabled for any value starting with 1. Symmetric with issue 5; will land in the same FIPS follow-up.
  • Issue 10 — FIPS commands ignore the image argument. Requires routing read/write to linuxargs or linuxargs_old based on _get_image_slot(image). Held back to keep this PR scoped, and because the FIPS contract change deserves its own PR for review and test focus.
  • Issue 13 — Empty-slot marker "NONE" vs "None". Cosmetic only: get_installed_images' IMAGE_PREFIX filter treats both spellings the same way (neither is listed as an installed image). Aligning the markers requires touching the buildimage installer scripts as well as uboot.py, so it's better as a separate change that lands both halves together.
  • Issue 14 — Bootloader detection is too broad. The current detect() returns True for any ARM/aarch64 machine. The detection order in bootloader/__init__.py (Aboot → Grub → Uboot) prevents this from being load-bearing today, so the change is defensive rather than user-visible. Worth a separate small PR that also adds a /usr/bin/fw_printenv existence check.
  • Issue 16 — Install reports success even when U-Boot env programming failed. The failure path lives in the buildimage installer scripts (sonic-uboot-env-init.sh), not in sonic-utilities. Belongs in a sonic-buildimage PR.
  • Issue 17 — get_current_image regex assumes loop=<image>/fs.squashfs. The method is inherited from OnieInstallerBootloader and is shared with grub.py. Fixing it in the right place (probably the base class, with an opt-in override hook) requires touching multiple bootloaders and is a separate cross-cutting change. This PR only prevents get_next_image() from hiding or crashing on selected empty-slot U-Boot states; it does not fix the shared current-image parser.
  • Issue 18 — soft-reboot pairs the next image's kernel with the current image's bootargs. The defect is in scripts/soft-reboot, not in sonic_installer/bootloader/uboot.py. The fix mirrors the per-slot linuxargs / sonic_bootargs lookup that fast-reboot / warm-reboot / express-reboot already use in their device-tree branch. Will land in a separate reboot-scripts PR.
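For the deferred FIPS follow-up, the regex and substring problems behind issues 5 and 6 can be illustrated with a small sketch. This is hypothetical: the pattern constants and function shapes below are assumptions, not the actual set_fips()/get_fips() code in uboot.py; only the [^\s] to \S+ change and the leading-space point come from the bullets above. The (?<!\S) lookbehind is one way to drop the leading-space requirement while still anchoring on a whole kernel-argument token.

```python
import re

# Hypothetical reconstructions; the real set_fips()/get_fips() bodies are not shown here.
BUGGY_SET = re.compile(r" sonic_fips=[^\s]")      # one value char, needs a leading space
FIXED_SET = re.compile(r"(?<!\S)sonic_fips=\S+")   # whole value, also matches at start


def set_fips_buggy(cmdline: str, enable: bool) -> str:
    return BUGGY_SET.sub(f" sonic_fips={int(enable)}", cmdline)


def set_fips_fixed(cmdline: str, enable: bool) -> str:
    token = f"sonic_fips={int(enable)}"
    if FIXED_SET.search(cmdline):
        return FIXED_SET.sub(token, cmdline)   # replace every existing value in full
    return f"{cmdline} {token}".strip()        # absent: append exactly one token


def get_fips_buggy(cmdline: str) -> bool:
    return "sonic_fips=1" in cmdline           # substring: also true for sonic_fips=11


def get_fips_fixed(cmdline: str) -> bool:
    return "sonic_fips=1" in cmdline.split()   # exact whitespace-delimited token


if __name__ == "__main__":
    args = "console=ttyS0 sonic_fips=12 quiet"
    print(set_fips_buggy(args, False))   # console=ttyS0 sonic_fips=02 quiet
    print(set_fips_fixed(args, False))   # console=ttyS0 sonic_fips=0 quiet
    print(get_fips_buggy("sonic_fips=11"), get_fips_fixed("sonic_fips=11"))  # True False
```

The per-slot routing needed for issue 10 (choosing linuxargs or linuxargs_old) is deliberately left out of this sketch.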

Files this PR did not touch at all:

How I did it

Slot mapping (issues 1, 2)
  • Replaced the if image in images[N] substring branches in set_default_image / set_next_image / remove_image with an explicit _get_image_slot(image) helper that uses exact-equality matching against sonic_version_<N>. Substring-related collisions (release SONiC-OS-X.0 vs. dirty SONiC-OS-X.0-dirty-…) no longer silently pick the wrong slot.
  • Replaced the images[0] / images[1] list-index assumption with a _read_slots() helper returning {slot: version}. When slot 1 is empty and slot 2 holds the only image, the new code correctly writes boot_next='run sonic_image_2' instead of pointing at the empty slot 1.
  • set_default_image and set_next_image now return False for unknown images instead of always returning True. This restores the contract that grub.py and aboot.py honor and makes main.py's duplicate-install failure branch reachable again. The direct CLI handlers are intentionally unchanged in this PR and still rely on their installed-image precheck.
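The difference between the old lookup and the exact-equality lookup can be sketched as follows. The names read_slots, get_image_slot and boot_next_value are illustrative and are not claimed to match the real _read_slots / _get_image_slot signatures; legacy_slot is a simplified reconstruction of the `if image in images[0]` branch, and the SONiC-OS- prefix filter is an assumption.

```python
from typing import Dict, List, Optional

IMAGE_PREFIX = "SONiC-OS-"  # assumed prefix used to recognise installed images
SLOTS = (1, 2)


def read_slots(env: Dict[str, str]) -> Dict[int, str]:
    """Map each slot to its raw sonic_version_<N> value (empty markers included)."""
    return {slot: env.get(f"sonic_version_{slot}", "NONE") for slot in SLOTS}


def get_image_slot(env: Dict[str, str], image: str) -> Optional[int]:
    """Exact-equality lookup: return the slot holding `image`, or None."""
    for slot, version in read_slots(env).items():
        if version == image:
            return slot
    return None


def boot_next_value(env: Dict[str, str], image: str) -> Optional[str]:
    """Selector to write for `image`; None means 'not installed' (caller returns False)."""
    slot = get_image_slot(env, image)
    return None if slot is None else f"run sonic_image_{slot}"


def installed_images(env: Dict[str, str]) -> List[str]:
    """Filtered list of installed images; slot positions are lost here."""
    return [v for v in read_slots(env).values() if v.startswith(IMAGE_PREFIX)]


def legacy_slot(installed: List[str], image: str) -> int:
    """Simplified pre-PR shape: substring test against the filtered list."""
    return 1 if image in installed[0] else 2


if __name__ == "__main__":
    prefix_env = {"sonic_version_1": "SONiC-OS-A.0-dirty-20260513.053011",
                  "sonic_version_2": "SONiC-OS-A.0"}
    sparse_env = {"sonic_version_1": "NONE", "sonic_version_2": "SONiC-OS-B"}
    for env, image in ((prefix_env, "SONiC-OS-A.0"), (sparse_env, "SONiC-OS-B")):
        # Prints "1 2" for both cases: legacy picks slot 1, exact lookup picks slot 2.
        print(legacy_slot(installed_images(env), image), get_image_slot(env, image))
```

Keeping the raw {slot: version} map is what lets the sparse case resolve to slot 2: once the list is filtered, index 0 no longer tells you which slot the image lives in.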
boot_once handling (issues 3, 4, 7, 11, 15)
  • get_next_image now consults boot_once first, then falls back to boot_next — matching U-Boot bootcmd's actual execution order. As a side effect, sonic-installer list, verify-next-image, and cleanup (which all consume get_next_image) now agree with what U-Boot will actually do at the next reboot.
  • When boot_once / boot_next references an empty slot, return the raw sonic_version_<N> value (e.g. "NONE") rather than silently substituting another installed image. The broken state is visible to the user instead of being hidden behind a fake "Next:" line.
  • When boot_once contains an unrecognised selector (anything that doesn't name sonic_image_<1|2>), return the literal raw value so list doesn't lie about which command will run first.
  • set_default_image now writes fw_setenv boot_once '' after writing boot_next. This mirrors grub-set-default's implicit collapse of next_entry — a freshly chosen persistent default can no longer be shadowed by a stale one-shot.
  • install_image likewise clears boot_once after the installer script returns, so a one-shot set before install can't shadow the newly-installed image at next reboot.
  • Issue 15 is covered by the empty-slot handling above: get_next_image no longer substitutes another installed image when boot_once or boot_next selects an empty slot. If no selector is configured at all, it falls back to the current image or the surviving slot.
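The selector resolution described in this section can be sketched as a pure function over a dict of env vars. This is an illustration rather than the uboot.py code: the strict `run sonic_image_<N>` regex, the "NONE" default for a missing sonic_version_<N>, and the exact fallback order are assumptions based on the bullets above.

```python
import re
from typing import Dict

IMAGE_PREFIX = "SONiC-OS-"  # assumption, as in the slot-mapping sketch
SELECTOR_RE = re.compile(r"^run sonic_image_([12])$")  # strict match is an assumption


def get_next_image(env: Dict[str, str], current: str) -> str:
    """Mirror bootcmd order: boot_once is consumed before boot_next."""
    for var in ("boot_once", "boot_next"):
        selector = env.get(var, "").strip()
        if not selector:
            continue  # unset or cleared (''): try the next selector
        match = SELECTOR_RE.match(selector)
        if match is None:
            return selector  # unrecognised selector: report it verbatim
        # The selected slot may be empty; return its raw marker (e.g. 'NONE').
        return env.get(f"sonic_version_{match.group(1)}", "NONE")
    # No selector configured: the running image, else whichever slot survives.
    installed = [env.get(f"sonic_version_{n}", "") for n in (1, 2)]
    installed = [v for v in installed if v.startswith(IMAGE_PREFIX)]
    if current in installed:
        return current
    return installed[0] if installed else "NONE"


if __name__ == "__main__":
    env = {"boot_once": "run sonic_image_2", "boot_next": "run sonic_image_1",
           "sonic_version_1": "SONiC-OS-A", "sonic_version_2": "SONiC-OS-B"}
    print(get_next_image(env, "SONiC-OS-A"))  # SONiC-OS-B
```

Returning the raw selector or marker, instead of a plausible image name, is what keeps `list`, verify-next-image and cleanup honest about what U-Boot will actually try to run.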
remove_image (issues 8, 12)
  • Clears boot_once when it still references the slot being removed. Before this PR, bootcmd would consume boot_once='run sonic_image_N' on next reboot and try to bootm a .fit blob that had just been rm -rf'd.
  • Clears per-slot aux env vars using a conservative pair-check: a name is cleared only when both <var> and its sibling exist in the env. The pair table covers both naming conventions:
    • ASPEED's _old suffix (image_dir / image_dir_old, fit_name / fit_name_old, linuxargs / linuxargs_old, sonic_bootargs / sonic_bootargs_old, sonic_boot_load / sonic_boot_load_old, initrd_name / initrd_name_old, fdt_name / fdt_name_old, ubi_sonic_boot_bootargs / ubi_sonic_boot_bootargs_old, ubi_sonic_boot_load / ubi_sonic_boot_load_old, image_name / image_name_old).
    • Centec / Pensando's _1 / _2 suffix (sonic_dir_1 / sonic_dir_2).
  • The pair-check intentionally skips vars that are not paired in the running env. This reduces risk on platforms where a single globally-shared linuxargs is referenced by both sonic_image_1 and sonic_image_2. It is not a complete proof for every U-Boot vendor: if a shared-linuxargs platform also has a stale linuxargs_old variable, a future hardening patch should derive slot-local variables from the actual sonic_image_N command before clearing them.
  • The full 11-pair table is a class-level constant; adding a new vendor convention is a single line of code plus a unit test.
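The env-variable side of remove_image can be sketched like this. The pair table is an illustrative subset, not the full class constant; the ordering (slot 1 uses the base name, slot 2 the _old name) is inferred from the ASPEED example output further down; linuxargs is omitted as in the follow-up commit; and the real method also repoints boot_next and deletes the image directory, which this sketch does not model. The order of the returned calls is arbitrary here.

```python
from typing import Dict, List, Tuple

FW_SETENV = "/usr/bin/fw_setenv"

# (slot 1 name, slot 2 name). Illustrative subset; ordering is an assumption.
SLOT_AUX_VAR_PAIRS: Tuple[Tuple[str, str], ...] = (
    ("image_dir", "image_dir_old"),
    ("fit_name", "fit_name_old"),
    ("sonic_bootargs", "sonic_bootargs_old"),
    ("sonic_dir_1", "sonic_dir_2"),
)


def remove_slot_env_commands(env: Dict[str, str], slot: int) -> List[List[str]]:
    """fw_setenv argv lists for the env side of removing `slot` (1 or 2)."""
    cmds = [[FW_SETENV, f"sonic_version_{slot}", "NONE"]]
    # Only drop boot_once if it would boot the slot being removed.
    if env.get("boot_once", "").strip() == f"run sonic_image_{slot}":
        cmds.append([FW_SETENV, "boot_once", ""])
    for pair in SLOT_AUX_VAR_PAIRS:
        # Pair-check: clear only when both siblings exist in the running env.
        if all(name in env for name in pair):
            cmds.append([FW_SETENV, pair[slot - 1], ""])
    return cmds


if __name__ == "__main__":
    env = {"sonic_version_1": "SONiC-OS-A", "sonic_version_2": "SONiC-OS-B",
           "boot_once": "run sonic_image_1", "image_dir": "image-A",
           "image_dir_old": "image-B", "fit_name": "a.fit"}  # no fit_name_old
    for cmd in remove_slot_env_commands(env, 1):
        print(cmd)  # fit_name is skipped because its sibling is absent
```

Adding another vendor convention is then one more tuple in the table, which matches the "single line of code plus a unit test" claim above.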
verify_image_platform (issue 9)
  • Port the installer/platforms_asic check from grub.py: extract the manifest from the .bin via the standard sed -e '1,/^exit_marker$/d' | tar xf - installer/platforms_asic -O pipeline, then grep -Fxq for the running platform. Before this PR this method was return os.path.isfile(image_path), which meant the platform-check layer accepted any existing file. In the full install flow, the image still had to pass the earlier SONiC binary-version check, but valid foreign-platform SONiC images were not rejected unless the user explicitly requested --skip-platform-check.
  • Backward-compatible with images that don't ship platforms_asic: tar exits non-zero and the method returns True (matches grub.py).
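The sed | tar | grep pipeline above can be approximated in pure Python to show what each stage contributes. This is a sketch, not the grub.py or uboot.py code: platform_in_image and build_demo_image are hypothetical helpers, the "platform-a"/"platform-b" strings are placeholders, and the function takes the whole image as bytes, whereas a real implementation should stream the (large) .bin rather than load it into memory.

```python
import io
import tarfile

EXIT_MARKER = b"\nexit_marker\n"
MANIFEST = "installer/platforms_asic"


def platform_in_image(image: bytes, platform: str) -> bool:
    """True if `platform` is listed in the manifest, or if there is no manifest."""
    # sed -e '1,/^exit_marker$/d': drop the shell header through the marker line.
    idx = image.find(EXIT_MARKER)
    if idx < 0:
        return True  # sed would emit nothing and tar would fail: permissive
    payload = image[idx + len(EXIT_MARKER):]
    # tar xf - installer/platforms_asic -O: read that one member from the payload.
    try:
        with tarfile.open(fileobj=io.BytesIO(payload), mode="r:*") as tar:
            member = tar.extractfile(MANIFEST)  # KeyError if the member is missing
            if member is None:  # path exists but is not a regular file
                return True
            manifest = member.read().decode("utf-8", errors="replace")
    except (tarfile.TarError, KeyError):
        return True  # no manifest: keep the backward-compatible "allow" result
    # grep -Fxq: fixed string, whole-line match, exit status only.
    return platform in manifest.splitlines()


def build_demo_image(files: dict) -> bytes:
    """Hypothetical helper: shell header + exit_marker + uncompressed tar payload."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, text in files.items():
            data = text.encode()
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return b"#!/bin/sh\necho installing\nexit 0\nexit_marker\n" + buf.getvalue()


if __name__ == "__main__":
    image = build_demo_image({MANIFEST: "platform-a\nplatform-b\n"})
    print(platform_in_image(image, "platform-b"), platform_in_image(image, "platform-c"))  # True False
```

The whole-line comparison matters: a substring test would accept "platform-b" against a manifest line such as "platform-bb".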
Tests

22 regression tests in tests/installer_bootloader_uboot_test.py cover the new behaviour. The tests use unittest.mock to stub out subprocess.Popen / run_command, feed them crafted env states, and assert the exact fw_setenv call list, following the same pattern as the existing tests in this file. No live hardware is required to run the suite.
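A minimal sketch of that test pattern: mock the command runner, feed a crafted env, and assert the exact fw_setenv call list. ToyUboot and its _run method are hypothetical stand-ins, not the real UbootBootloader; they are patched with unittest.mock.patch.object instead of patching the sonic_installer module so that the example runs on its own.

```python
from typing import Dict, List, Optional
from unittest.mock import call, patch

FW_SETENV = "/usr/bin/fw_setenv"


class ToyUboot:
    """Hypothetical stand-in for the bootloader class; not the real UbootBootloader."""

    def __init__(self, env: Dict[str, str]):
        self.env = env  # crafted env state instead of live fw_printenv output

    def _run(self, cmd: List[str]) -> None:
        raise RuntimeError("unit tests must never program the real U-Boot env")

    def set_next_image(self, image: str) -> bool:
        slot: Optional[int] = next(
            (n for n in (1, 2) if self.env.get(f"sonic_version_{n}") == image), None)
        if slot is None:
            return False  # unknown image: report failure, write nothing
        self._run([FW_SETENV, "boot_once", f"run sonic_image_{slot}"])
        return True


def test_set_next_image_only_slot_2_populated():
    env = {"sonic_version_1": "NONE", "sonic_version_2": "SONiC-OS-B"}
    with patch.object(ToyUboot, "_run") as mock_run:
        assert ToyUboot(env).set_next_image("SONiC-OS-B") is True
    # Assert the exact fw_setenv call list, not just "something was called".
    assert mock_run.call_args_list == [
        call([FW_SETENV, "boot_once", "run sonic_image_2"])]


def test_set_next_image_unknown_returns_false():
    with patch.object(ToyUboot, "_run") as mock_run:
        assert ToyUboot({"sonic_version_1": "NONE"}).set_next_image("SONiC-OS-X") is False
    mock_run.assert_not_called()
```

Because MagicMock is not a descriptor, the patched class attribute records calls without the self argument, so the expected call list only contains the argv.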

Backward compatibility / risk
  • Other U-Boot SONiC platforms (Marvell-prestera arm64, Marvell armhf, Centec, Pensando): The pair-check pattern in remove_image was designed against a survey of the per-slot env vars used by all five vendors and was live-tested on ASPEED/BMC. Vars that are not paired in the running env are intentionally left alone. No vendor hardware tests were run. Platforms with globally-shared linuxargs plus stale sibling variables should get additional regression coverage or follow-up hardening that derives slot-local variables from the actual sonic_image_N command.
  • set-default clearing boot_once: Behaviour change with a clear rationale (cross-bootloader alignment; grub-set-default does the same). A user who explicitly sets boot_once and then runs set-default will lose the one-shot — which is the intended cross-bootloader semantic.
  • install_image clearing boot_once: Same rationale — freshly-installed images shouldn't be silently shadowed by stale one-shots.
  • verify_image_platform change: The new check is permissive on images that don't ship installer/platforms_asic (returns True), matching grub.py. Existing valid installs continue to work; only foreign-platform .bins that do ship a non-matching platforms_asic are newly rejected.

How to verify it

Unit tests
cd sonic-utilities
python3 -m pytest tests/installer_bootloader_uboot_test.py -v

22 tests should pass. The non-trivial ones:

  • test_set_default_image_only_slot_2_populated / test_set_next_image_only_slot_2_populated — regression for the list-index bug. Slot 1 empty + image in slot 2 must write run sonic_image_2 (not run sonic_image_1).
  • test_set_default_image_unknown_returns_false / test_set_next_image_unknown_returns_false — contract: False on unknown image, not silent True.
  • test_install_image — install clears boot_once.
  • test_remove_image_clears_boot_once_pointing_at_removed_slot / test_remove_image_preserves_boot_once_pointing_at_other_slot — the conditional boot_once clear in remove_image.
  • test_remove_image_clears_slot_aux_vars / test_remove_image_skips_absent_aux_vars — the pair-check (clear only when both sibling names exist; otherwise leave the var alone).
  • test_get_next_image_boot_once_overrides_boot_next: boot_once takes precedence over boot_next.
  • test_get_next_image_boot_once_empty_slot_returns_marker / test_get_next_image_unknown_boot_once_returns_raw — empty-slot and unrecognised-selector handling.
  • test_get_image_slot_substring_safe — exact-equality slot lookup with versions that share substrings.
  • test_verify_image_platform_matches / test_verify_image_platform_mismatch — the new platforms_asic check.
Live device validation (AST2700 BMC, two installed images)
# Precondition: two installed images.
# From a clean one-image BMC, reach this state by installing a valid BMC image first:
#   sudo sonic-installer install -y /path/to/valid-bmc-sonic.bin
#   sudo reboot
#   # wait for the BMC to come back, then confirm two installed images below.
sudo sonic-installer list
SLOT1=$(sudo fw_printenv -n sonic_version_1)
SLOT2=$(sudo fw_printenv -n sonic_version_2)
CURRENT=$(sudo sonic-installer list | awk -F': ' '/^Current:/ {print $2}')
ONE_SHOT="$SLOT2"
if [ "$CURRENT" = "$SLOT2" ]; then ONE_SHOT="$SLOT1"; fi

# Issue 3 + 4 (boot_once / cleanup): `list` reflects boot_once and cleanup preserves it.
sudo sonic-installer set-next-boot "$ONE_SHOT"
sudo sonic-installer list                       # Next: $ONE_SHOT
sudo sonic-installer verify-next-image          # validates the right slot
sudo sonic-installer cleanup -y                 # no image removed when current and one-shot next are the two slots

# Issue 1 + 2 (slot mapping): exact-equality match, no substring confusion.
sudo sonic-installer set-default "$SLOT1"
sudo fw_printenv boot_next                      # run sonic_image_<correct-slot>
sudo sonic-installer set-default "$SLOT2"
sudo fw_printenv boot_next                      # run sonic_image_<correct-slot>

# Issue 9 (platform check): unit-tested with mocked platforms_asic data.
# Manual validation requires a valid foreign-platform SONiC .bin whose installer/platforms_asic
# does not include the running platform. A dummy file is not sufficient because the
# earlier binary-version check rejects it before platform validation.
sudo sonic-installer install -y /path/to/valid-foreign-platform-sonic.bin       # expected non-zero

# Issue 15 (selected empty slot): `list` surfaces the selected empty slot.
ORIG1=$(sudo fw_printenv -n sonic_version_1)
ORIG2=$(sudo fw_printenv -n sonic_version_2)
ORIGNEXT=$(sudo fw_printenv -n boot_next)
sudo fw_setenv sonic_version_1 NONE
sudo fw_setenv sonic_version_2 "$ORIG2"
sudo fw_setenv boot_next 'run sonic_image_1'
sudo sonic-installer list                       # Next: NONE, no false fallback
sudo fw_setenv sonic_version_1 "$ORIG1"
sudo fw_setenv sonic_version_2 "$ORIG2"
sudo fw_setenv boot_next "$ORIGNEXT"

# Issue 8 + 12 (remove): boot_once and aux vars cleared.
# Run this after the non-destructive checks above; it intentionally removes one slot.
CURRENT=$(sudo sonic-installer list | awk -F': ' '/^Current:/ {print $2}')
NON_CURRENT="$SLOT1"
if [ "$CURRENT" = "$SLOT1" ]; then NON_CURRENT="$SLOT2"; fi
sudo sonic-installer set-next-boot "$NON_CURRENT"
sudo sonic-installer remove -y "$NON_CURRENT"
sudo fw_printenv sonic_version_1 sonic_version_2 image_dir image_dir_old fit_name fit_name_old linuxargs linuxargs_old boot_once

Previous command output (if the output of a command-line utility has changed)

sonic-installer list after set-next-boot before this PR:

$ sudo fw_printenv boot_once boot_next
boot_once=run sonic_image_2
boot_next=run sonic_image_1
$ sudo sonic-installer list
Current: SONiC-OS-A
Next: SONiC-OS-A         <-- WRONG, should be SONiC-OS-B (boot_once consumes first)
Available:
SONiC-OS-A
SONiC-OS-B

sonic-installer set-default with a version that is a prefix of another slot's version (substring bug):

$ sudo fw_printenv sonic_version_1 sonic_version_2
sonic_version_1=SONiC-OS-A.0-dirty-20260513.053011
sonic_version_2=SONiC-OS-A.0
$ sudo sonic-installer set-default SONiC-OS-A.0
$ sudo fw_printenv boot_next
boot_next=run sonic_image_1    <-- WRONG, target is in slot 2

sonic-installer remove leaving stale aux vars and boot_once before this PR:

$ NON_CURRENT=SONiC-OS-A     # non-current image in slot 1 for this example
$ sudo sonic-installer set-next-boot "$NON_CURRENT"
$ sudo sonic-installer remove -y "$NON_CURRENT"
$ sudo fw_printenv sonic_version_1 image_dir linuxargs boot_once
sonic_version_1=NONE
image_dir=image-A          <-- STALE
linuxargs=...loop=image-A/fs.squashfs...    <-- STALE
boot_once=run sonic_image_1                 <-- DANGLING, /host/image-A rm -rf'd

sonic-installer list when boot_next points at an empty slot:

$ sudo fw_setenv sonic_version_1 NONE
$ sudo fw_setenv sonic_version_2 SONiC-OS-A
$ sudo fw_setenv boot_next 'run sonic_image_1'
$ sudo sonic-installer list
Current: SONiC-OS-A
Next: SONiC-OS-A         <-- WRONG, selected slot 1 is empty
Available:
SONiC-OS-A

verify_image_platform accepting any existing file at the platform-check layer:

$ python3 - <<'PY'
from sonic_installer.bootloader.uboot import UbootBootloader
b = UbootBootloader()
print(b.verify_image_platform('/etc/hostname'))
PY
True

New command output (if the output of a command-line utility has changed)

sonic-installer list after set-next-boot with this PR:

$ sudo fw_printenv boot_once boot_next
boot_once=run sonic_image_2
boot_next=run sonic_image_1
$ sudo sonic-installer list
Current: SONiC-OS-A
Next: SONiC-OS-B         <-- CORRECT, boot_once wins
Available:
SONiC-OS-A
SONiC-OS-B

set-default with the same substring-prefix state:

$ sudo sonic-installer set-default SONiC-OS-A.0
$ sudo fw_printenv boot_next
boot_next=run sonic_image_2    <-- CORRECT, target lives in slot 2

remove now clears boot_once and the per-slot aux vars for the ASPEED/BMC paired-slot environment:

$ NON_CURRENT=SONiC-OS-A     # non-current image in slot 1 for this example
$ sudo sonic-installer set-next-boot "$NON_CURRENT"
$ sudo sonic-installer remove -y "$NON_CURRENT"
$ sudo fw_printenv sonic_version_1 image_dir linuxargs boot_once
sonic_version_1=NONE
image_dir=                 <-- cleared (pair-check passed)
linuxargs=                 <-- cleared (pair-check passed)
boot_once=                 <-- cleared (it pointed at the removed slot)

list when boot_next points at an empty slot:

$ sudo fw_setenv sonic_version_1 NONE
$ sudo fw_setenv sonic_version_2 SONiC-OS-A
$ sudo fw_setenv boot_next 'run sonic_image_1'
$ sudo sonic-installer list
Current: SONiC-OS-A
Next: NONE                     <-- selected empty-slot marker surfaced
Available:
SONiC-OS-A

install rejecting a valid foreign-platform SONiC image:

$ sudo sonic-installer install -y /path/to/valid-foreign-platform-sonic.bin
Image file '/path/to/valid-foreign-platform-sonic.bin' is of a different platform ASIC type than running platform's.
If you are sure you want to install this image, use --skip-platform-check.
Aborting...

@mssonicbld

Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@william8545 william8545 marked this pull request as draft May 12, 2026 15:11
@william8545 william8545 changed the title [sonic-installer] uboot: fix slot-mapping brick bugs in set-default / set-next-boot / remove [sonic-installer] uboot: fix two-slot semantics, boot_once handling, and platform check May 15, 2026
…tform check

Bring sonic-installer's U-Boot bootloader in line with the contracts that
grub.py and aboot.py already implement, and add the missing platform check
that has effectively turned `--skip-platform-check` into a no-op on every
U-Boot SONiC platform.

Slot mapping
  * Replace the `if image in images[N]` substring match with explicit
    `_get_image_slot(image)` (exact-equality lookup against
    `sonic_version_<N>`). The previous code wrote the wrong slot when one
    image's version string was a substring of another's (e.g. a release
    `SONiC-OS-bmc.0` vs a dirty build `SONiC-OS-bmc.0-dirty-...`).
  * Replace the `images[0]` / `images[1]` list-index assumption with a
    `{slot: version}` dict returned by `_read_slots()`. The list-index
    code broke whenever one slot was empty: with slot 1 = NONE and the
    only image in slot 2, `images[0]` was slot 2's image and the code
    still wrote `boot_next = run sonic_image_1` — pointing at the empty
    slot.
  * `set_default_image` and `set_next_image` now return False when the
    image is not installed (previously returned True for any input,
    making main.py's failure-handling branch dead code).

boot_once handling
  * `get_next_image` now reads `boot_once` first, then falls back to
    `boot_next` — matching the U-Boot bootcmd order. Before this change
    `sonic-installer list` and `verify-next-image` lied about the next
    image whenever `set-next-boot` had been used.
  * When `boot_once` or `boot_next` references an empty slot, return the
    raw `sonic_version_<N>` value rather than silently substituting
    another image. Surfaces the broken state to the user instead of
    hiding it.
  * `set_default_image` clears `boot_once` so a stale one-shot can no
    longer shadow the newly-set default (mirrors `grub-set-default`).
  * `install_image` clears `boot_once` so a stale one-shot can no
    longer shadow the freshly-installed image.

remove_image
  * Clears `boot_once` when it still references the slot being removed
    — previously the bootcmd would consume `boot_once = run sonic_image_N`
    on next reboot and try to bootm a `.fit` that had just been rm -rf'd.
  * Clears per-slot aux env vars using a vendor-portable pair-check:
    only writes `<var> = ""` when both `<var>` and its sibling exist in
    the env. This covers ASPEED's `_old` convention plus
    `sonic_dir_1`/`sonic_dir_2` (Centec/Pensando), and intentionally
    skips globally-shared vars (Centec uses a single `linuxargs` for
    both slots) so the surviving slot stays bootable.
  * Keeps the 11 per-slot pairs, drawn from a survey of the five
    U-Boot SONiC platforms (Aspeed, Marvell-prestera arm64, Marvell
    armhf, Centec, Pensando), in a single class-level table.

verify_image_platform
  * Port grub.py's `installer/platforms_asic` check (extract via
    `sed -e '1,/^exit_marker$/d' | tar xf - installer/platforms_asic -O`,
    then `grep -Fxq` for the running platform). Previously this method
    was `return os.path.isfile(image_path)`, which meant
    `sonic-installer install` happily accepted any existing file —
    making the documented `--skip-platform-check` flag a no-op in both
    directions on U-Boot platforms.
  * Backward-compatible with images that don't ship `platforms_asic`
    (tar exits non-zero → returns True, matching grub.py).

Signed-off-by: William Tsai <willtsai@nvidia.com>
@mssonicbld

Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@yxieca

yxieca commented May 21, 2026

Contributor

Observations:

| # | Issue | Severity | Confirmed? | Method |
|---|---|---|---|---|
| 1 | Sparse slot → wrong slot | Critical | ✅ YES | Simulated slot1=NONE; set-default wrote sonic_image_1 (empty slot!) instead of sonic_image_2 |
| 2 | Substring name → wrong slot | Critical | ✅ YES | Renamed slot2 to prefix of slot1; set-default picked slot1 via if image in images[0] |
| 3 | boot_once ignored | Critical | ✅ YES | Set boot_once=run sonic_image_2; sonic-installer list still shows Next=slot1 |
| 4 | Stale boot_once shadows set-default | High | ✅ YES | set-default to slot1 succeeded but boot_once still points slot2; next boot goes slot2 |
| 5 | remove doesn't clear boot_once | High | ✅ YES | After simulated remove of slot2, boot_once still says run sonic_image_2 |
| 6 | cleanup removes pinned image | High | ✅ YES | sonic-installer list shows Next=slot1 while boot_once=slot2; cleanup would delete slot2 |
| 7 | Validation accepts any file | High | ❌ NO | Empty file rejected with "not a valid SONiC image"; upper-layer validation catches it |
| 8 | FIPS modifies wrong slot | High | ✅ YES | set-fips on slot2 image modified linuxargs (slot1), not linuxargs_old (slot2) |
| 9 | Install ignores fw_setenv failure | High | ✅ YES | Code review: install_image() runs bash with no error checking |
| 10 | soft-reboot wrong kernel | High | ⚠️ Code confirmed | Not safe to test live |
| 12 | Remove leaves stale vars | Medium | ✅ YES | fit_name, fit_name_old, linuxargs_old all persist after remove; only sonic_version_N cleared |
| 13 | set-fips regex corruption | Medium | ✅ YES | Worse than reported! sonic_fips=12 → regex ate trailing char of previous arg: logs_inram=on became logs_inram=on2 |
| 15 | get-fips substring false positive | Low | ✅ YES | sonic_fips=11 → reports "FIPS is enabled" |
| 16 | None vs NONE inconsistent | Low | ✅ YES | remove_image() writes NONE, but fw_printenv could return None |
| 17 | ARM = U-Boot detection | Low | ✅ YES | UbootBootloader.detect() → True on aarch64 regardless of actual bootloader |

@william8545

Contributor Author

On row 7 (validation accepts any file): accepting any file is incorrect here; it needs to be a SONiC image.
sudo sonic-installer install sonic-*.bin -y only accepts SONiC images, but it will also accept a switch image.

Copilot AI left a comment

Contributor

Pull request overview

This PR corrects SONiC’s U-Boot bootloader implementation (sonic_installer/bootloader/uboot.py) to behave consistently with the existing Grub/Aboot contracts: robust two-slot selection, correct boot_once semantics, safer cleanup behavior, and a real platform verification check so --skip-platform-check is no longer effectively ignored on U-Boot systems.

Changes:

  • Fix two-slot mapping by switching from list-index/substrings to explicit slot reading and exact image-to-slot lookup.
  • Make get_next_image() honor U-Boot boot order (boot_once before boot_next) and surface empty-slot / unknown-selector states instead of masking them.
  • Implement platform validation by parsing installer/platforms_asic from the image payload (matching Grub’s behavior) and add/expand unit tests for the new semantics.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
|---|---|
| sonic_installer/bootloader/uboot.py | Correct slot semantics, boot_once precedence/clearing, slot-aux-var cleanup, and add a real platforms_asic-based platform check. |
| tests/installer_bootloader_uboot_test.py | Add regression/unit tests for slot mapping, boot_once behavior, remove-image cleanup, and platform-check behavior. |
Comments suppressed due to low confidence (1)

tests/installer_bootloader_uboot_test.py:22

  • MockProc.communicate is currently defined without a self parameter and returns an undefined name (commandline). If this stub is invoked, it will raise (TypeError/NameError) instead of behaving like subprocess.Popen().communicate().
class MockProc():
    commandline = "linuxargs="
    def communicate():
        return commandline, None
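One way to fix the stub flagged above is to give communicate() a self parameter and return the instance attribute, so the object behaves like what subprocess.Popen() returns. The constructor arguments and the returncode attribute are assumptions, not taken from the test file; the class and attribute names are kept as in the original stub.

```python
import subprocess
from unittest.mock import patch


class MockProc:
    """Minimal stand-in for the object returned by subprocess.Popen()."""

    def __init__(self, commandline: str = "linuxargs=", returncode: int = 0):
        self.commandline = commandline
        self.returncode = returncode

    def communicate(self, input=None, timeout=None):
        # Same shape as Popen.communicate(): a (stdout, stderr) tuple.
        return self.commandline, None


if __name__ == "__main__":
    with patch("subprocess.Popen", return_value=MockProc("linuxargs=quiet")):
        out, _ = subprocess.Popen(["/usr/bin/fw_printenv", "linuxargs"]).communicate()
    print(out)  # linuxargs=quiet
```

Accepting the optional input/timeout arguments keeps the stub compatible with callers that pass them to the real communicate().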

Comment on lines 108 to 117
 def test_install_image(mock_run_cmd):
     image_path = ['sonic_image']
-    expected_call = [call(['bash', image_path])]
+    expected_call = [
+        call(['bash', image_path]),
+        call(['/usr/bin/fw_setenv', 'boot_once', '']),
+    ]

     bootloader = uboot.UbootBootloader()
     bootloader.install_image(image_path)
     assert mock_run_cmd.call_args_list == expected_call

Contributor Author

fix in latest commit

Comment on lines +477 to +482
@patch("sonic_installer.bootloader.uboot.device_info")
@patch("sonic_installer.bootloader.uboot.subprocess.Popen")
def test_verify_image_platform_mismatch(popen_patch, device_info_patch,
                                        tmp_path):
    """tar rc=0 + grep rc=1 -> False (platform not in manifest)."""
    image = tmp_path / "sonic.bin"

Contributor Author

fix in latest commit

Follow-up addressing PR review feedback on the U-Boot two-slot work.

remove_image
  * Drop ("linuxargs", "linuxargs_old") from SLOT_AUX_VAR_PAIRS.
    set_fips()/get_fips() read and write `linuxargs` as a
    slot-agnostic kernel cmdline (they persist sonic_fips= into it
    regardless of slot), so clearing it when removing a slot could
    wipe the surviving image's boot args / FIPS setting. The slot is
    already made non-bootable via sonic_version_<N>=NONE + boot_next
    repoint, so the leftover var is inert.

get_next_image
  * Fix an inaccurate comment: the no-selector fallback returns the
    currently-running image, not images[0] -- it does NOT "mirror
    grub.py" (grub falls back to images[0]). No behavior change.

tests
  * test_install_image: pass image_path as a str, matching how
    sonic_installer.main.install() actually calls install_image()
    (avoids encoding a nested argv shape).
  * Add test_verify_image_platform_no_manifest: regression for the
    tar-nonzero branch (image ships no installer/platforms_asic ->
    allow install), preventing accidental tightening of the
    backward-compatible contract.
  * Add test_remove_image_preserves_linuxargs: regression guarding
    that linuxargs is never cleared on remove.

Signed-off-by: William Tsai <willtsai@nvidia.com>
@mssonicbld

Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@Sourabh-Kumar7

Member

@judyjoseph could you please help review the change? thanks.

@judyjoseph judyjoseph requested review from saiarcot895 and yxieca June 9, 2026 19:52
@william8545

Contributor Author

Sorry, I forgot to close this.
Based on the discussion and feedback from other vendors, I will write a new class for SONiC-BMC.

@william8545 william8545 closed this Jun 9, 2026
@william8545

Contributor Author

#4602

Labels

None yet

Projects

None yet

6 participants