
[reboot_test] Fix flaky time.time() assertions on second boundaries #391

Open
DavidZagury wants to merge 2 commits into sonic-net:master from DavidZagury:master_fix_reboot_test_time_flake


DavidZagury commented Jun 8, 2026


Why I did it

The execute_reboot pytest cases in tests/host_modules/reboot_test.py failed intermittently around second boundaries. The production code in host_modules/reboot.py records a failure timestamp via int(time.time()), and the tests compared that recorded value against a second, freshly sampled int(time.time()) inside the assertion. When the wall clock ticked over to the next second between the production call and the assertion, the two integers differed by 1 and the test failed.

This was observed in a SONiC build, where test_execute_reboot_fail_issue_reboot_command_warm failed with:

Expected: populate_reboot_status_flag(False, 1779879232, 'Failed to execute reboot command', 'WARM', STATUS_FAILURE)
  Actual: populate_reboot_status_flag(False, 1779879231, 'Failed to execute reboot command', 'WARM', STATUS_FAILURE)

Four other tests in the same file shared the identical race and could fail intermittently in the same way.
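The race can be reproduced deterministically instead of waiting for the clock. In the sketch below, `record_failure` is a hypothetical stand-in for the production timestamping in host_modules/reboot.py (it is not the real code), and the two `side_effect` values are chosen to straddle the same second boundary as the failure above: the first sample is taken by the "production" call, the second by the old-style assertion.

```python
import time
from unittest import mock


# Hypothetical stand-in for the production call: it stamps the failure
# with the integer wall-clock second, as reboot.py does via int(time.time()).
def record_failure(record):
    record(False, int(time.time()), "Failed to execute reboot command", "WARM")


def reproduce_race():
    recorder = mock.Mock()
    # Simulate the wall clock ticking over a second between the production
    # call (first sample) and the assertion (second sample).
    with mock.patch("time.time", side_effect=[1779879231.999, 1779879232.0]):
        record_failure(recorder)
        expected_ts = int(time.time())  # the old, racy assertion pattern
    actual_ts = recorder.call_args.args[1]
    return expected_ts, actual_ts


print(reproduce_race())  # (1779879232, 1779879231)
```

The returned pair mirrors the Expected/Actual mismatch in the log: truncating two samples that sit on either side of a boundary always yields integers that differ by 1.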

How I did it
In each of the five affected tests, added mock.patch("time.time", return_value=TIME) to the existing with block and replaced the in-assertion int(time.time()) with the literal TIME constant (already defined at the top of the file as 1617811205). This matches the pattern already in use by test_populate_reboot_status_flag and test_populate_reboot_status_flag_with_status in the same file.

Tests updated:

  • test_execute_reboot_success
  • test_execute_reboot_fail_issue_reboot_command_cold_boot
  • test_execute_reboot_fail_issue_reboot_command_halt
  • test_execute_reboot_fail_halt_timeout
  • test_execute_reboot_fail_issue_reboot_command_warm
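The fixed pattern can be illustrated with a minimal sketch. `FakeReboot`, its `execute_reboot` signature, and the `STATUS_FAILURE` value are hypothetical placeholders rather than the real `reboot.Reboot` API; only `TIME = 1617811205` and the `mock.patch("time.time", return_value=TIME)` call come from this PR.

```python
import time
from unittest import mock

TIME = 1617811205  # same fixed timestamp the test file defines
STATUS_FAILURE = "STATUS_FAILURE"  # hypothetical placeholder value


class FakeReboot:
    """Hypothetical stand-in for reboot.Reboot; not the real class."""

    def populate_reboot_status_flag(self, active, when, msg, method, status):
        pass

    def execute_reboot(self, method):
        # time.time is looked up through the time module at call time,
        # so mock.patch("time.time") intercepts it.
        self.populate_reboot_status_flag(
            False, int(time.time()), "Failed to execute reboot command",
            method, STATUS_FAILURE)


def test_execute_reboot_fail_warm_sketch():
    """Warm reboot failure records the patched, deterministic timestamp."""
    reboot = FakeReboot()
    patch_flag = mock.patch.object(reboot, "populate_reboot_status_flag")
    patch_time = mock.patch("time.time", return_value=TIME)
    with patch_flag as mock_flag, patch_time:
        reboot.execute_reboot("WARM")
    # Compare against the constant, never a fresh int(time.time()).
    mock_flag.assert_called_once_with(
        False, TIME, "Failed to execute reboot command", "WARM", STATUS_FAILURE)


test_execute_reboot_fail_warm_sketch()
```

Patching the `"time.time"` target works here because the code under test calls `time.time()` through the `time` module; code that did `from time import time` would bind its own reference at import time and would need the patch target changed to that module's name.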

Production code is untouched. The halt-timeout loop in reboot.py uses time.monotonic() rather than time.time(), so patching time.time does not affect any production control flow.
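The claim that the halt-timeout loop is unaffected can be checked with a small sketch. `wait_for_halt` and its 10 ms poll interval are assumptions for illustration, not the real loop in reboot.py; the point is that `time.monotonic()` is a separate function, so freezing `time.time` does not stop a monotonic deadline from expiring.

```python
import time
from unittest import mock

TIME = 1617811205


def wait_for_halt(is_running, timeout_s):
    """Hypothetical halt-wait loop keyed on time.monotonic(), as the PR
    describes for reboot.py; returns True if halting finished in time."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if not is_running():
            return True
        time.sleep(0.01)
    return False


with mock.patch("time.time", return_value=TIME):
    # time.time is frozen, but the monotonic deadline still expires.
    timed_out = wait_for_halt(lambda: True, timeout_s=0.05)
    frozen = time.time()

print(timed_out, frozen)  # False 1617811205
```

Had the loop used `time.time()` for its deadline, the frozen value would have made it spin forever, which is why the monotonic clock matters for this fix being safe.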

The execute_reboot tests compared a mocked populate_reboot_status_flag
call against a freshly-sampled int(time.time()) in the assertion. When
the wall clock crossed a second boundary between the production call
and the assertion, the timestamps differed by 1 and the test failed.

Patch time.time to the existing TIME constant in the five affected
tests, matching the pattern already used in
test_populate_reboot_status_flag*. Production halt-loop uses
time.monotonic() and is unaffected.

Signed-off-by: david.zagury <davidza@nvidia.com>
Address CodeRabbit docstring-coverage warning on the five tests updated
in the previous commit. One-line summary per test describing the
scenario being exercised.

Signed-off-by: david.zagury <davidza@nvidia.com>
@mssonicbld


/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@Sourabh-Kumar7

Member

@saiarcot895 could you please help review this? Thanks.

Copilot AI left a comment


Pull request overview

This PR improves test determinism for the host_modules/reboot.py failure-timestamp behavior by removing reliance on live time.time() sampling during assertions in tests/host_modules/reboot_test.py.

Changes:

  • Mock time.time() in five execute_reboot pytest cases to eliminate second-boundary races.
  • Replace int(time.time()) in assertions with the fixed TIME constant to ensure stable expected values.
  • Add short docstrings to the affected tests to clarify intent.

Comment on lines 272 to 277
with (
    mock.patch("reboot._run_command") as mock_run_command,
    mock.patch("time.sleep") as mock_sleep,
    mock.patch("time.time", return_value=TIME),
    mock.patch("reboot.Reboot.is_halt_command_running", return_value=True) as mock_is_halt_command_running,
    mock.patch("reboot.Reboot.is_container_running", return_value=True) as mock_is_container_running,

5 participants