Fix: Add validations to suspend_trigger (BugFix) #2500
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## main #2500 +/- ##
=======================================
Coverage 58.92% 58.93%
=======================================
Files 476 476
Lines 48031 48050 +19
Branches 8574 8579 +5
=======================================
+ Hits 28303 28317 +14
- Misses 18835 18840 +5
Partials 893 893
* Run "systemctl list-jobs suspend" to check if there are suspend jobs running before running one more.
* If "systemctl list-jobs suspend" detects suspend jobs running, wait for a while before proceeding.
* Set "set -o pipefail" on the Checkbox job definition for the job to be marked as "failed" when exiting on error.
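The check and wait described in this commit message could look roughly like the sketch below. It is not the code from this PR: the names `count_suspend_jobs`, `list_suspend_jobs` and `wait_for_suspend_jobs` are hypothetical, and the parsing assumes that `systemctl list-jobs` prints one row per job with a numeric job ID followed by at least three more columns.

```python
import subprocess
import time


def count_suspend_jobs(output: str) -> int:
    """Count job rows in `systemctl list-jobs` output.

    Assumption: a job row has at least four columns and starts with a
    numeric job ID. The header, the "N jobs listed." summary and the
    "No jobs running." message do not match that shape.
    """
    count = 0
    for line in output.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[0].isdigit():
            count += 1
    return count


def list_suspend_jobs() -> str:
    # The glob pattern is passed to systemctl unexpanded (no shell involved).
    return subprocess.check_output(
        ["systemctl", "list-jobs", "*suspend*"], universal_newlines=True
    )


def wait_for_suspend_jobs(attempts=10, delay=1.0, runner=list_suspend_jobs):
    """Return True once no suspend job is listed, False if they never clear."""
    for _ in range(attempts):
        if count_suspend_jobs(runner()) == 0:
            return True
        time.sleep(delay)
    return False
```

Passing the command runner as a parameter keeps the waiting logic testable on machines without systemd; a caller would only trigger a new suspend when `wait_for_suspend_jobs()` returns True.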
* Include fixes for the unit tests to succeed with the new systemctl callbacks.
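As an illustration of what testing around the new systemctl calls can involve, here is a minimal sketch that mocks `subprocess.check_output` with `unittest.mock.patch`. The helper `suspend_jobs_pending` is hypothetical and is not taken from `suspend_trigger.py`; it also assumes systemctl prints a "No jobs" message when nothing matches.

```python
import subprocess
import unittest
from unittest.mock import patch


def suspend_jobs_pending() -> bool:
    """Hypothetical helper: True if systemctl reports any suspend job."""
    output = subprocess.check_output(
        ["systemctl", "list-jobs", "*suspend*"], universal_newlines=True
    )
    # Assumption: an empty result is reported with a "No jobs ..." message.
    return "No jobs" not in output


class SuspendJobsPendingTests(unittest.TestCase):
    @patch("subprocess.check_output")
    def test_no_jobs(self, mock_check_output):
        mock_check_output.return_value = "No jobs running.\n"
        self.assertFalse(suspend_jobs_pending())

    @patch("subprocess.check_output")
    def test_job_running(self, mock_check_output):
        mock_check_output.return_value = (
            "JOB UNIT           TYPE  STATE\n"
            "123 suspend.target start running\n"
            "\n"
            "1 jobs listed.\n"
        )
        self.assertTrue(suspend_jobs_pending())
        # The mock also lets the test verify the exact command line used.
        mock_check_output.assert_called_once_with(
            ["systemctl", "list-jobs", "*suspend*"], universal_newlines=True
        )
```

Because the subprocess call is replaced by a mock, the tests run on any machine, including CI hosts without systemd.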
pieqq left a comment
I only skimmed through the PR, but I have a suggestion.
* Use SystemExit when the job needs to exit on error.
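A minimal sketch of the idea, assuming a hypothetical wrapper named `run_or_exit` (not a function from this PR): when `SystemExit` is raised with a string, Python prints the string to stderr and exits with status 1, so a failing command ends the script with a non-zero status.

```python
import subprocess


def run_or_exit(cmd):
    """Run cmd and raise SystemExit with a message if it fails."""
    try:
        subprocess.check_call(cmd)
    except subprocess.CalledProcessError as exc:
        # A string argument is printed to stderr and the exit status is 1.
        raise SystemExit(
            "Command {} failed with code {}".format(cmd, exc.returncode)
        )
```

If the exception is not caught, the interpreter terminates with a non-zero exit status, which the calling job can then report as a failure.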
* Call @Retry(max_attempts=10, delay=1) instead of manually defining a while loop.
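To show why a decorator can replace a hand-written loop, here is a small stand-in retry decorator. It is written for illustration only and is not the helper this commit refers to; the parameter names simply mirror the ones in the commit message.

```python
import functools
import time


def retry(max_attempts, delay):
    """Call the decorated function until it stops raising.

    Illustrative stand-in: waits `delay` seconds between attempts and
    re-raises the last exception once `max_attempts` calls have failed.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == max_attempts:
                        raise
                    time.sleep(delay)
        return wrapper
    return decorator
```

One design point in this sketch: `SystemExit` derives from `BaseException`, not `Exception`, so it is never retried here. Only ordinary exceptions, such as a check that reports suspend jobs still running, trigger another attempt.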
* Now that the error handling is done differently, the tests must be updated as well.
* Allow a little more time when checking whether the suspend jobs are still running.
Hello @pieqq, I've been digging more into this issue; please review my findings.
When running
The issue has been isolated through the attached scripts (scripts.zip). I recorded this video for illustrative purposes (suspend-tests-video.zip).
The helper
In Checkbox, this issue appears sporadically during stress tests due to the following execution timing.
Iterating between steps 1-3 sometimes gives the system enough time to suspend, but as reported in this PR, the issue is sometimes visible. Therefore, implementing a polling mechanism via
Description
The following jobs were executed as part of the NVIDIA Riverside Stress test plans.
As can be seen, the following error is commonly displayed.
Even though the suspend command failed, the job is marked as `Outcome: job passed`, and the error is printed cyclically in further iterations.

This PR proposes some validations for the `suspend_trigger.py` script to make the `stress-tests/suspend_cycles_{{suspend_id}}_reboot{{suspend_reboot_id}}` test cases more robust.
* Run `systemctl list-jobs *suspend*` to check if there are suspend jobs running before running one more.
* If `systemctl list-jobs *suspend*` detects suspend jobs running, the job will wait before proceeding.
* Set `set -o pipefail` on the Checkbox job definition for the job to be marked as "failed" when exiting on error.
* `systemctl list-jobs *suspend*` calls.

Resolved issues
https://warthogs.atlassian.net/browse/PERI-1367
Documentation
Tests
I ran the `com.canonical.certification::suspend-cycles-stress-test` test plan two times on an `nvidia-jetson-orin-nano` DUT.
* In `stress-tests/suspend_cycles_27_reboot1` there were suspend jobs still ongoing when trying to suspend the device once more, but the test case waited until it was able to send the corresponding commands.