
Voltage and Current sensors CLI tests#10736

Closed
bmridul wants to merge 5588 commits into
sonic-net:masterfrom
bmridul:sensormon

Conversation

@bmridul

@bmridul bmridul commented Nov 14, 2023

Contributor

Description of PR

Sonic-mgmt tests for CLI introduced as part of Sensormon. HLD - sonic-net/SONiC#1394

Summary:
Added tests for Sensormon supported CLIs for
show platform voltage
show platform current

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • [x] Test case(new/improvement)

Back port request

  • 201911
  • 202012
  • 202205
  • 202305

Approach

What is the motivation for this PR?

Added first set of sonic mgmt tests for Sensormon feature.

How did you verify/test it?

Ran the tests on the DUT.

Any platform specific information?

Supported testbed topology if it's a new test case?

Any. Should be applicable to all.

Documentation

HLD link provided above.

@bmridul bmridul requested a review from prgeor as a code owner November 14, 2023 17:50
@mssonicbld

Collaborator

The pre-commit check detected issues in the files touched by this pull request.
The pre-commit check is a mandatory check, please fix detected issues.

Detailed pre-commit check results:
trim trailing whitespace.................................................Passed
fix end of files.........................................................Passed
check yaml...........................................(no files to check)Skipped
check for added large files..............................................Passed
check python ast.........................................................Failed
- hook id: check-ast
- exit code: 1

tests/platform_tests/cli/test_show_platform.py: failed parsing with CPython 3.8.10:

Traceback (most recent call last):
File "/home/AzDevOps/.cache/pre-commit/repoc03rpkpp/py_env-python3/lib/python3.8/site-packages/pre_commit_hooks/check_ast.py", line 21, in main
ast.parse(f.read(), filename=filename)
File "/usr/lib/python3.8/ast.py", line 47, in parse
return compile(source, filename, mode, flags,
File "tests/platform_tests/cli/test_show_platform.py", line 368
check_show_platform_sensor_output(cmd, duthost):
^
SyntaxError: invalid syntax
...
[truncated extra lines, please run pre-commit locally to view full check results]
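
The `check-ast` hook boils down to asking whether each file parses as Python. Below is a minimal sketch of that check using the standard-library `ast.parse` call that appears in the traceback. The helper name `find_syntax_error` and both sample sources are hypothetical, and the "repaired" sample assumes the failing line was meant to be a function definition, which the log does not confirm.

```python
import ast
from typing import Optional


def find_syntax_error(source: str, filename: str = "<string>") -> Optional[str]:
    """Return a short description of the first syntax error, or None if the source parses."""
    try:
        ast.parse(source, filename=filename)
    except SyntaxError as exc:
        return f"{filename}:{exc.lineno}: {exc.msg}"
    return None


# Hypothetical reduction of the failing line: a call-like statement ending in a colon.
broken = "check_show_platform_sensor_output(cmd, duthost):\n    pass\n"
# One possible repair, assuming a function definition was intended.
repaired = "def check_show_platform_sensor_output(cmd, duthost):\n    pass\n"

print(find_syntax_error(broken, "example.py"))    # a description of the error
print(find_syntax_error(repaired, "example.py"))  # None when the source parses
```

Running a check like this on the touched file before pushing catches the same class of failure that the hook reports.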

To run the pre-commit checks locally, you can follow below steps:

  1. Ensure that default python is python3. In sonic-mgmt docker container, default python is python2. You can run
    the check by activating the python3 virtual environment in sonic-mgmt docker container or outside of sonic-mgmt
    docker container.
  2. Ensure that the pre-commit package is installed:
sudo pip install pre-commit
  3. Go to repository root folder
  4. Install the pre-commit hooks:
pre-commit install
  5. Use pre-commit to check staged file:
pre-commit
  6. Alternatively, you can check committed files using:
pre-commit run --from-ref <commit_id> --to-ref <commit_id>

@cyw233

cyw233 commented Aug 9, 2024

Contributor

Hey @bmridul, could you sync this PR with the latest master as it's been here for quite a while, please? It's also a good chance to rerun the PR checks. Thanks!

Comment thread tests/platform_tests/cli/test_show_platform.py Outdated
Comment thread tests/platform_tests/cli/test_show_platform.py Outdated
Comment thread tests/platform_tests/cli/test_show_platform.py Outdated
Comment thread tests/platform_tests/cli/test_show_platform.py Outdated
Comment thread tests/platform_tests/cli/test_show_platform.py Outdated
Comment thread tests/platform_tests/cli/test_show_platform.py Outdated
@rlhui

rlhui commented Apr 30, 2025


@bmridul please follow up thanks.

@rlhui rlhui requested a review from judyjoseph April 30, 2025 17:34
@abdosi

abdosi commented Jun 4, 2025

Contributor

@bmridul: this is a very old PR. If it is still applicable, let's update it so that we can merge this soon.

@rlhui

rlhui commented Jul 9, 2025


@judyjoseph please help review, thanks

@judyjoseph

Contributor

@bmridul @anamehra, please address the comments, which need to be fixed for a test run. Also attach the output of a test run in the description. Thanks!

@mssonicbld

Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@github-actions github-actions Bot requested review from gechiang, nhe-NV and rawal01 April 16, 2026 19:04
Comment thread tests/platform_tests/cli/test_show_platform.py Fixed
@mssonicbld

Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

echuawu and others added 6 commits April 16, 2026 16:09
…t#21713)

1. Limit SRv6 test script on 202412 master 202511 and later release
2. Skip SRv6 on Mellanox SPC1-3

Change-Id: I27519e05d55d93cf5a11ede30861d08048738210

Signed-off-by: echuawu <chuanw@nvidia.com>
…bility (sonic-net#22351)

With sonic-net#11457, in Python3 sonic-mgmt docker environments, we are forced to
use YAML modules to load files and this has repercussions if inventory
file is in INI format.

If we are using TestbedProcessing.py to generate veos (which has ansible
inventory entries for test server) and lab (which has inventory entries
for dut, ptf, and fanout switch), we would be forced to use it with yaml
flag as with sonic-net#11457, we can only open yaml files.

Comparing makeLab and makeLabYaml, many required fields are not handled
at all by the makeLabYaml function in its current state. This patch
attempts to fix these differences and make the transition to Python3
seamless when TestbedProcessing.py is used to generate inventory files.

Signed-off-by: Mohan Yelugoti <ymd@arista.com>
… tests (sonic-net#22199)

* Validate ACL rules and routes
* Fix formatting
* Fix wait for route add issue
* Fix v6 route validation issue in recycle port queue counters test
* Cleanup added route at the end
* Validate ACL rule VID to RID mapping exists
* Handle multi-ASIC systems

Signed-off-by: venu-nexthop <venu@nexthop.ai>

---------

Signed-off-by: venu-nexthop <venu@nexthop.ai>
…ride (sonic-net#23163)

This test performs multiple config reloads which restart all
containers. Memory check is not meaningful here because monit captures
a low baseline right after reload when containers are still initializing,
then flags normal steady-state memory as a false positive increase.

Change-Id: I8bda705a40078b9d2838cd5e97a3745b64a87784

Signed-off-by: weiguo-nvidia <weguo@nvidia.com>
…evices (sonic-net#23164)

On platforms with large numbers of PCIe devices, pcied needs
~60s after restart to scan and begin writing to STATE_DB. The previous
60s timeout left insufficient time for the actual DB population.

Also fix variable naming: pooling_interval -> polling_timeout to
accurately reflect the parameter semantics (wait_until timeout, not
polling interval).
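
To illustrate the difference between a timeout and a polling interval, here is a minimal sketch of a polling helper. It is not the sonic-mgmt `wait_until` implementation; the signature, the `FakeClock` class, and the 65s/120s figures are assumptions chosen only to show why a 60s timeout is too tight for a daemon that needs roughly 60s before it starts writing.

```python
import time
from typing import Callable


def wait_until(timeout: float, interval: float, condition: Callable[[], bool],
               clock: Callable[[], float] = time.monotonic,
               sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll `condition` every `interval` seconds until it is true or `timeout` seconds pass."""
    deadline = clock() + timeout
    while True:
        if condition():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


class FakeClock:
    """Simulated clock so the example runs instantly (hypothetical test helper)."""

    def __init__(self) -> None:
        self.now = 0.0

    def time(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.now += seconds


def populated_at(clock: FakeClock, ready_at: float) -> Callable[[], bool]:
    """Condition that becomes true once the simulated time reaches `ready_at`."""
    return lambda: clock.time() >= ready_at


clock = FakeClock()
tight = wait_until(60, 10, populated_at(clock, 65), clock.time, clock.sleep)
clock = FakeClock()
relaxed = wait_until(120, 10, populated_at(clock, 65), clock.time, clock.sleep)
print(tight, relaxed)  # False True
```

The first argument bounds the total wait and the second only controls how often the condition is checked, which is why naming the former `polling_timeout` is the less misleading choice.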

Signed-off-by: Xixue Jia <xixuej@nvidia.com>
rraghav-cisco and others added 15 commits April 16, 2026 16:09
…23956)

<!--
Please make sure you've read and understood our contributing guidelines;
https://github.com/sonic-net/SONiC/blob/gh-pages/CONTRIBUTING.md

Please provide following information to help code review process a bit
easier:
-->
### Description of PR
<!--
- Please include a summary of the change and which issue is fixed.
- Please also include relevant motivation and context. Where should
reviewer start? background context?
- List any dependencies that are required for this change.
-->

Summary:
The Cisco console server comes with SFP (10G), which is not yet covered
by test_sfp.py. This PR adds "sfp" to the list of modules supported by
this script.

### Type of change

<!--
- Fill x for your type of change.
- e.g.
- [x] Bug fix
-->

- [X] Bug fix
- [ ] Testbed and Framework(new/improvement)
- [ ] New Test case
    - [ ] Skipped for non-supported platforms
- [ ] Test case improvement


### Back port request
- [ ] 202205
- [ ] 202305
- [ ] 202311
- [ ] 202405
- [ ] 202411
- [ ] 202505
- [X] 202511

### Approach
#### What is the motivation for this PR?
Adding support for SFP in test_sfp.py.

#### How did you do it?
Added the "SFP" to the current list of modules.

#### How did you verify/test it?
Ran it in cisco console platform testbed:
```
===================================================================================================== PASSES =====================================================================================================
____________________________________________________________________________________ TestSfpApi.test_get_presence[sirc0-dut1] ____________________________________________________________________________________
_____________________________________________________________________________________ TestSfpApi.test_get_model[sirc0-dut1] ______________________________________________________________________________________
_____________________________________________________________________________________ TestSfpApi.test_get_serial[sirc0-dut1] _____________________________________________________________________________________
___________________________________________________________________________________ TestSfpApi.test_is_replaceable[sirc0-dut1] ___________________________________________________________________________________
________________________________________________________________________________ TestSfpApi.test_get_transceiver_info[sirc0-dut1] ________________________________________________________________________________
___________________________________________________________________________ TestSfpApi.test_get_transceiver_dom_real_value[sirc0-dut1] ___________________________________________________________________________
___________________________________________________________________________ TestSfpApi.test_get_transceiver_threshold_info[sirc0-dut1] ___________________________________________________________________________
__________________________________________________________________________________ TestSfpApi.test_get_reset_status[sirc0-dut1] __________________________________________________________________________________
_____________________________________________________________________________________ TestSfpApi.test_get_rx_los[sirc0-dut1] _____________________________________________________________________________________
____________________________________________________________________________________ TestSfpApi.test_get_tx_fault[sirc0-dut1] ____________________________________________________________________________________
__________________________________________________________________________________ TestSfpApi.test_get_temperature[sirc0-dut1] ___________________________________________________________________________________
____________________________________________________________________________________ TestSfpApi.test_get_voltage[sirc0-dut1] _____________________________________________________________________________________
____________________________________________________________________________________ TestSfpApi.test_get_tx_bias[sirc0-dut1] _____________________________________________________________________________________
____________________________________________________________________________________ TestSfpApi.test_get_rx_power[sirc0-dut1] ____________________________________________________________________________________
____________________________________________________________________________________ TestSfpApi.test_get_tx_power[sirc0-dut1] ____________________________________________________________________________________
_______________________________________________________________________________________ TestSfpApi.test_reset[sirc0-dut1] ________________________________________________________________________________________
_____________________________________________________________________________________ TestSfpApi.test_tx_disable[sirc0-dut1] _____________________________________________________________________________________
_________________________________________________________________________________ TestSfpApi.test_tx_disable_channel[sirc0-dut1] _________________________________________________________________________________
_______________________________________________________________________________________ TestSfpApi.test_lpmode[sirc0-dut1] _______________________________________________________________________________________
___________________________________________________________________________________ TestSfpApi.test_power_override[sirc0-dut1] ___________________________________________________________________________________
_______________________________________________________________________________ TestSfpApi.test_get_error_description[sirc0-dut1] ________________________________________________________________________________
______________________________________________________________________________________ TestSfpApi.test_thermals[sirc0-dut1] ______________________________________________________________________________________
--------------------------------------------------------- generated xml file: /run_logs/sir-c0/39632/2026-04-15-17-51-18/platform_tests/api/test_sfp.xml ---------------------------------------------------------
INFO:root:Can not get Allure report URL. Please check logs
============================================================================================ short test summary info =============================================================================================
PASSED platform_tests/api/test_sfp.py::TestSfpApi::test_get_presence[sirc0-dut1]
PASSED platform_tests/api/test_sfp.py::TestSfpApi::test_get_model[sirc0-dut1]
PASSED platform_tests/api/test_sfp.py::TestSfpApi::test_get_serial[sirc0-dut1]
PASSED platform_tests/api/test_sfp.py::TestSfpApi::test_is_replaceable[sirc0-dut1]
PASSED platform_tests/api/test_sfp.py::TestSfpApi::test_get_transceiver_info[sirc0-dut1]
PASSED platform_tests/api/test_sfp.py::TestSfpApi::test_get_transceiver_dom_real_value[sirc0-dut1]
PASSED platform_tests/api/test_sfp.py::TestSfpApi::test_get_transceiver_threshold_info[sirc0-dut1]
PASSED platform_tests/api/test_sfp.py::TestSfpApi::test_get_reset_status[sirc0-dut1]
PASSED platform_tests/api/test_sfp.py::TestSfpApi::test_get_rx_los[sirc0-dut1]
PASSED platform_tests/api/test_sfp.py::TestSfpApi::test_get_tx_fault[sirc0-dut1]
PASSED platform_tests/api/test_sfp.py::TestSfpApi::test_get_temperature[sirc0-dut1]
PASSED platform_tests/api/test_sfp.py::TestSfpApi::test_get_voltage[sirc0-dut1]
PASSED platform_tests/api/test_sfp.py::TestSfpApi::test_get_tx_bias[sirc0-dut1]
PASSED platform_tests/api/test_sfp.py::TestSfpApi::test_get_rx_power[sirc0-dut1]
PASSED platform_tests/api/test_sfp.py::TestSfpApi::test_get_tx_power[sirc0-dut1]
PASSED platform_tests/api/test_sfp.py::TestSfpApi::test_reset[sirc0-dut1]
PASSED platform_tests/api/test_sfp.py::TestSfpApi::test_tx_disable[sirc0-dut1]
PASSED platform_tests/api/test_sfp.py::TestSfpApi::test_tx_disable_channel[sirc0-dut1]
PASSED platform_tests/api/test_sfp.py::TestSfpApi::test_lpmode[sirc0-dut1]
PASSED platform_tests/api/test_sfp.py::TestSfpApi::test_power_override[sirc0-dut1]
PASSED platform_tests/api/test_sfp.py::TestSfpApi::test_get_error_description[sirc0-dut1]
PASSED platform_tests/api/test_sfp.py::TestSfpApi::test_thermals[sirc0-dut1]
FAILED platform_tests/api/test_sfp.py::TestSfpApi::test_get_name[sirc0-dut1] - Failed: Transceiver name 'QSFP_1' for PORT1 NOT found in platform.json, Transceiver name 'QSFP_2' for PORT2 NOT found in platform.json
============================================================================== 1 failed, 22 passed, 1 warning in 804.79s (0:13:24) ===============================================================================
```

#### Any platform specific information?
The list is not platform specific, but this is required to support the
cisco console server.

Signed-off-by: Raghavendran Ramanathan <rraghav@cisco.com>
The test creates VXLAN tunnel config via sonic-cfggen --write-to-db
(CONFIG_DB) then verifies it in APP_DB (redis-cli -n 0). On dualtor
topology, the VXLAN tunnel config does not propagate from CONFIG_DB to
APP_DB, causing the test to always fail with:
  'VXLAN tunnel tunnel_v4 not found in APP_DB after config reload'

This test was originally designed for t0 topology only (pytestmark has
topology('t0')), but gets scheduled on dualtor via the nightly test
scheduler. Add explicit skip for dualtor topologies.
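
A minimal sketch of what such a skip decision could look like. The helper names and the `tbinfo["topo"]["name"]` layout are assumptions for illustration and are not taken from the patch; in a real test the returned reason would be passed to `pytest.skip(...)`.

```python
from typing import Optional


def is_dualtor(tbinfo: dict) -> bool:
    """Return True when the testbed topology name indicates a dualtor variant."""
    topo_name = tbinfo.get("topo", {}).get("name", "")
    return "dualtor" in topo_name


def skip_reason(tbinfo: dict) -> Optional[str]:
    """Return a skip reason for dualtor testbeds, or None when the test should run."""
    if is_dualtor(tbinfo):
        return "VXLAN tunnel config does not propagate to APP_DB on dualtor"
    return None


# Hypothetical testbed descriptions.
print(skip_reason({"topo": {"name": "dualtor-120"}}))
print(skip_reason({"topo": {"name": "t0"}}))
```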

### Description of PR

Summary:
Fixes # (issue)

### Type of change


- [ ] Bug fix
- [ ] Testbed and Framework(new/improvement)
- [x] New Test case
    - [x] Skipped for non-supported platforms
- [ ] Test case improvement


### Back port request
- [ ] 202205
- [ ] 202305
- [ ] 202311
- [ ] 202405
- [ ] 202411
- [ ] 202505
- [x] 202511

### Approach
#### What is the motivation for this PR?

#### How did you do it?

#### How did you verify/test it?

#### Any platform specific information?

#### Supported testbed topology if it's a new test case?

### Documentation
<!--
(If it's a new feature, new test case)
Did you update documentation/Wiki relevant to your implementation?
Link to the wiki page?
-->

Signed-off-by: yawenni <yawenni@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Description of PR
Summary:
Add sonic_yang error pattern to loganalyzer ignore list for test_multiasic_addcluster test to prevent false test failures from expected transient YANG validation errors.

Fixes # N/A (test reliability improvement)

Type of change
 Bug fix
 Testbed and Framework(new/improvement)
 New Test case
 Skipped for non-supported platforms
 Test case improvement
Back port request
 202205
 202305
 202311
 202405
 202411
 202505
 202511
Approach
What is the motivation for this PR?
The test_multiasic_addcluster test was failing due to sonic_yang YANG validation errors being caught by the loganalyzer. These errors are expected transient failures that occur during GCU (Generic Config Updater) patch application.

When GCU applies JSON patches, it tries multiple orderings of patch operations to find one that passes YANG validation. During these intermediate attempts:

- PORT entries may temporarily be missing the required lanes field
- Tables like PORT_QOS_MAP, QUEUE, BUFFER_PG may reference ports that don't exist yet

Example errors:

ERR sonic_yang: Data Loading Failed:Missing required element "lanes" in "PORT_LIST"
ERR sonic_yang: Data Loading Failed:Invalid value "Ethernet0" in "ifname" element
ERR sonic_yang: Data Loading Failed:Leafref...points to a non-existing leaf
The conftest.py in tests/generic_config_updater/ has an ignore pattern ".*ERR sonic_yang.*" but it only applies to ONE host. The test's own LOGANALYZER_IGNORE_REGEX (which applies to ALL hosts via ignore_expected_loganalyzer_errors fixture) was missing this pattern.

How did you do it?
Added ".*ERR sonic_yang.*" to LOGANALYZER_IGNORE_REGEX in test_multiasic_addcluster.py, which is applied to all DUT hosts via the ignore_expected_loganalyzer_errors fixture.
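
As a quick sanity check of the pattern, the sketch below applies it to the example errors quoted above. How loganalyzer actually applies its ignore list is not shown here, so the `is_ignored` helper and the use of `re.search` are assumptions, and the last sample line is a hypothetical unrelated error.

```python
import re

# Pattern quoted in the description above.
LOGANALYZER_IGNORE_REGEX = [r".*ERR sonic_yang.*"]

# The first three lines are the example errors from the description; the last one is a
# hypothetical unrelated error that should still be reported.
SAMPLE_LINES = [
    'ERR sonic_yang: Data Loading Failed:Missing required element "lanes" in "PORT_LIST"',
    'ERR sonic_yang: Data Loading Failed:Invalid value "Ethernet0" in "ifname" element',
    "ERR sonic_yang: Data Loading Failed:Leafref...points to a non-existing leaf",
    "ERR swss#orchagent: some unrelated failure",
]


def is_ignored(line: str, patterns=LOGANALYZER_IGNORE_REGEX) -> bool:
    """Return True if any ignore pattern matches the log line."""
    return any(re.search(pattern, line) for pattern in patterns)


for line in SAMPLE_LINES:
    print(is_ignored(line), line)
```

Only the three `sonic_yang` lines are suppressed; the unrelated error is still left for the analyzer to flag.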

How did you verify/test it?
Verified the error pattern matches the observed syslog entries
Confirmed the fixture applies ignore patterns to all hosts in duthosts
Cross-referenced with existing ignore pattern in conftest.py (line 93)
Any platform specific information?
N/A - applies to all multi-ASIC platforms running GCU operations.

Supported testbed topology if it's a new test case?
N/A - this is an improvement to an existing t2 topology test.

Documentation
N/A - no new features or test cases added.

Signed-off-by: Dan Caugherty <dcaugher@cisco.com>
…mport in test_vrf.py (sonic-net#23721)

### Description of PR

Summary:
PR sonic-net#18347 refactored fixture handling but accidentally removed the
`skip_test_module_over_backend_topologies` import from
`tests/vrf/test_vrf.py`. This causes `FixtureLookupError` for any test
that depends on the `setup_vrf` fixture (e.g. `test_vrf2_fib`), because
pytest cannot resolve the `skip_test_module_over_backend_topologies`
fixture parameter.

This PR adds the missing import back.

Related ADO PBI:
[37430687](https://msazure.visualstudio.com/One/_workitems/edit/37430687)
Regression introduced by: sonic-net#18347


Fixed the error:
```
failed on setup with "file /var/src/sonic-mgmt_vms91-t0-7060x6-moby-512-3/tests/vrf/test_vrf.py, line 763
def test_vrf2_fib(self, partial_ptf_runner):
file /var/src/sonic-mgmt_vms91-t0-7060x6-moby-512-3/tests/vrf/test_vrf.py, line 498
@pytest.fixture(scope="module", autouse=True)
def setup_vrf(
E fixture 'skip_test_module_over_backend_topologies' not found
> available fixtures: __pytest_repeat_step_number, active_active_ports, active_active_ports_config, active_standby_ports, add_mgmt_test_mark, ansible_adhoc, ansible_facts, ansible_module, backup_and_restore_config_db, backup_and_restore_config_db_module, backup_and_restore_config_db_on_duts, backup_and_restore_config_db_package, backup_and_restore_config_db_session, build_gnmi_stubs, cable_type, cache, capfd, capfdbinary, caplog, capsys, capsysbinary, capteesys, cfg_facts, change_mac_addresses, check_bfd_up_count, check_bgp, check_dbmemory, check_dut_asic_type, check_interfaces, check_ipv4_mgmt, check_ipv6_mgmt, check_mac_entry_count,
```
### Type of change

- [x] Bug fix
- [ ] Testbed and Framework(new/improvement)
- [ ] New Test case
    - [ ] Skipped for non-supported platforms
- [ ] Test case improvement

### Back port request
- [ ] 202205
- [ ] 202305
- [ ] 202311
- [ ] 202405
- [ ] 202411
- [ ] 202505
- [x] 202511

### Approach
#### What is the motivation for this PR?
`test_vrf2_fib` (and other VRF tests) fail with `FixtureLookupError:
skip_test_module_over_backend_topologies` because the fixture is
referenced in the `setup_vrf` fixture's parameter list but never
imported in `test_vrf.py`. This regression was introduced by PR sonic-net#18347.

#### How did you do it?
Added the missing import line:
```python
from tests.common.fixtures.backend_topology import skip_test_module_over_backend_topologies  # noqa: F401
```

#### How did you verify/test it?
- Confirmed the fixture is used in `setup_vrf` (line 500) but has no
corresponding import
- Confirmed no `conftest.py` exists in `tests/vrf/` that could provide
the fixture
- Verified the import path `tests.common.fixtures.backend_topology`
matches the existing fixture module

#### Any platform specific information?
N/A

#### Supported testbed topology if it's a new test case?
N/A

### Documentation
N/A

Signed-off-by: Zhaohui Sun <zhaohuisun@microsoft.com>
### Description of PR

Summary:
1. Add a traffic test to standby switch
2. Distinguish the expected HA states for two DPUs when ha_owner is
switch
Fixes # (issue)

### Type of change


- [ ] Bug fix
- [ ] Testbed and Framework(new/improvement)
- [ ] New Test case
    - [ ] Skipped for non-supported platforms
- [x] Test case improvement


### Back port request
- [ ] 202205
- [ ] 202305
- [ ] 202311
- [ ] 202405
- [ ] 202411
- [ ] 202505
- [ ] 202511

### Approach
#### What is the motivation for this PR?

#### How did you do it?

#### How did you verify/test it?

#### Any platform specific information?

#### Supported testbed topology if it's a new test case?

### Documentation
<!--
(If it's a new feature, new test case)
Did you update documentation/Wiki relevant to your implementation?
Link to the wiki page?
-->

---------

Signed-off-by: BYGX-wcr <wcr@live.cn>
…c-net#23867)

### Description of PR
Fix file permission issues in topology converge by removing unnecessary
sudo commands.
When converge runs with sudo, both the topology file and its backup end
up with `root:root` ownership, preventing regular users from editing or
committing these files.

Two small improvements to the topology converge flow in testbed-cli.sh:
1. Remove unnecessary sudo from converge operations so topology and
backup files retain the current user's ownership.
2. Replace end-of-script restore with trap to guarantee the backup is
always restored on exit, regardless of whether the script completes
successfully or exits early.

Summary:
Fixes # (issue)

### Type of change


- [ ] Bug fix
- [x] Testbed and Framework(new/improvement)
- [ ] New Test case
    - [ ] Skipped for non-supported platforms
- [ ] Test case improvement


### Back port request
- [ ] 202205
- [ ] 202305
- [ ] 202311
- [ ] 202405
- [ ] 202411
- [ ] 202505
- [ ] 202511

### Approach
#### What is the motivation for this PR?
Improvement 1 – Remove unnecessary sudo: Converge operations (cp, python
-m ceos_topo_converger) were run with sudo, causing the topology file
and its backup to be owned by root:root. Since these files live in the
repo directory and belong to the current user, sudo is not needed.
Running without it keeps the original file ownership intact, so users
can still edit and git commit the files normally.

Improvement 2 – Guarantee restore on exit: The restore block was placed
at the end of the script. With set -e, any failure (e.g.,
ansible-playbook error) exits the script immediately, skipping the
restore and leaving a modified topo file behind. Moving the restore
logic into a `trap restore_topo_if_needed EXIT` handler ensures the
backup is always restored on exit, whether the script succeeds or fails.

#### How did you do it?
1. Removed sudo from cp and python -m ceos_topo_converger calls inside
converge_topo_if_needed.
2. Introduced a restore_topo_if_needed function and registered it via
trap restore_topo_if_needed EXIT at the top of the script, replacing the
end-of-script restore block.
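
The change itself is in shell (`trap restore_topo_if_needed EXIT`). As a language-neutral illustration of the same guarantee, here is a Python analogue using `try`/`finally`. It is not code from testbed-cli.sh; the names `restored_on_exit` and `demo` and the `.bak` suffix are hypothetical.

```python
import os
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def restored_on_exit(path: Path):
    """Back up `path`, yield it, and always put the backup back afterwards."""
    backup = path.with_name(path.name + ".bak")
    shutil.copy2(path, backup)
    try:
        yield path
    finally:
        # Runs on normal exit and on exceptions, like an EXIT trap in shell.
        os.replace(backup, path)


def demo() -> str:
    """Modify a file, fail mid-way, and return the file content after cleanup."""
    with tempfile.TemporaryDirectory() as tmp:
        topo = Path(tmp) / "topo.yml"
        topo.write_text("original\n")
        try:
            with restored_on_exit(topo):
                topo.write_text("converged\n")
                raise RuntimeError("simulated mid-way failure")
        except RuntimeError:
            pass
        return topo.read_text()


print(repr(demo()))  # 'original\n'
```

An end-of-script restore corresponds to putting the restore after the `with` body without `finally`: any early exit would skip it and leave the modified file behind.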

#### How did you verify/test it?
Ran add-topo on a multi-VRF testbed with use_converged_peers: True.
Verified that:
- Topology and backup files retain the original user's ownership after
converge
- Topo file is correctly restored on normal exit
- Topo file is correctly restored when the script exits due to a mid-way
failure

#### Any platform specific information?

#### Supported testbed topology if it's a new test case?

### Documentation
<!--
(If it's a new feature, new test case)
Did you update documentation/Wiki relevant to your implementation?
Link to the wiki page?
-->

---------

Signed-off-by: Yutong Zhang <yutongzhang@microsoft.com>
### Description of PR
This PR makes improvements and hardens `test_console_stress.py`

Summary:
Fixes # sonic-net#23818

### Type of change


- [ ] Bug fix
- [ ] Testbed and Framework(new/improvement)
- [ ] New Test case
    - [ ] Skipped for non-supported platforms
- [x] Test case improvement


### Back port request
- [ ] 202205
- [ ] 202305
- [ ] 202311
- [ ] 202405
- [ ] 202411
- [ ] 202505
- [ ] 202511

### Approach
#### What is the motivation for this PR?

`test_console_stress.py` is a new test case recently introduced and has
been failing consistently on Nexthop devices.

#### How did you do it?
The test was using `max_loops=300`, which, coupled with the default
`loop_delay` of `0.025`, only waits for 7.5 seconds before timing out in
`test_console_stress_output`. Changed it to `read_timeout=300` to
increase the timeout allowance; `max_loops` is also being deprecated
according to the Netmiko docs.
<img width="1329" height="734" alt="Screenshot 2026-04-09 at 8 43 31 PM"
src="https://github.com/user-attachments/assets/62205d97-b5fb-4ed2-9cda-92e8ec75c041"
/>

Replaced `send_command` with `send_command_timing` in
`test_console_stress_input` because `send_command` internally calls
`command_echo_read()` with a hardcoded 10-second timeout that cannot be
overridden. The 100,000-character echo command takes longer than 10
seconds to echo back over console, causing a `ReadTimeout` before the
actual output is ever read. `send_command_timing` skips echo matching
entirely, avoiding this limitation. Since `send_command_timing` returns
the raw channel output (command echo + output + prompt), the test now
strips the command echo and prompt before verifying the payload.
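
A minimal sketch of the echo and prompt stripping described above. It is not the code from the patch; the function name, the sample prompt, and the assumption that the echo comes back as one unbroken prefix (no line wrapping inserted by the console) are all illustrative.

```python
import re


def strip_echo_and_prompt(raw: str, command: str, prompt_pattern: str) -> str:
    """Remove a leading command echo and a trailing prompt from raw channel output."""
    text = raw.replace("\r\n", "\n")
    # Drop the echoed command if the output starts with it.
    if text.startswith(command):
        text = text[len(command):]
    # Drop the prompt (and any trailing whitespace) at the very end.
    text = re.sub(prompt_pattern + r"\s*$", "", text)
    return text.strip()


# Hypothetical raw output: command echo, payload, then the shell prompt.
raw_output = "echo AAAA\r\nAAAA\r\nadmin@sonic:~$ "
payload = strip_echo_and_prompt(raw_output, "echo AAAA", r"admin@sonic:~\$")
print(payload)  # AAAA
```

With a 100,000-character command the echo may be wrapped or split by the terminal, so a real implementation would likely need a more tolerant match than `startswith`.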

#### How did you verify/test it?
Previously, when the test was failing:
```
dut_console/test_console_stress.py::test_console_stress_output
-------------------------------- live log call ---------------------------------
09/04/2026 09:57:30 __init__.pytest_runtest_call             L0040 ERROR  | Traceback (most recent call last):
  File "/opt/venv/lib/python3.12/site-packages/_pytest/python.py", line 1720, in runtest
    self.ihook.pytest_pyfunc_call(pyfuncitem=self)
  File "/opt/venv/lib/python3.12/site-packages/pluggy/_hooks.py", line 512, in __call__
    return self._hookexec(self.name, self._hookimpls.copy(), kwargs, firstresult)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/pluggy/_manager.py", line 120, in _hookexec
    return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/pluggy/_callers.py", line 167, in _multicall
    raise exception
  File "/opt/venv/lib/python3.12/site-packages/pluggy/_callers.py", line 121, in _multicall
    res = hook_impl.function(*args)
          ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/_pytest/python.py", line 166, in pytest_pyfunc_call
    result = testfunction(**testargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/AzDevOps/sonic-mgmt/tests/dut_console/test_console_stress.py", line 31, in test_console_stress_output
    output = duthost_console.send_command(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/netmiko/base_connection.py", line 111, in wrapper_decorator
    return_val = func(self, *args, **kwargs)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/netmiko/utilities.py", line 667, in wrapper_decorator
    return func(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/netmiko/base_connection.py", line 1841, in send_command
    raise ReadTimeout(msg)
netmiko.exceptions.ReadTimeout: 
Pattern not detected: 'admin:\\~\\$' in output.

Things you might try to fix this:
1. Explicitly set your pattern using the expect_string argument.
2. Increase the read_timeout to a larger value.

You can also look at the Netmiko session_log or debug log for more information.



FAILED                                                                   [ 50%]
dut_console/test_console_stress.py::test_console_stress_input
-------------------------------- live log call ---------------------------------
09/04/2026 09:57:41 __init__.pytest_runtest_call             L0040 ERROR  | Traceback (most recent call last):
  File "/opt/venv/lib/python3.12/site-packages/_pytest/python.py", line 1720, in runtest
    self.ihook.pytest_pyfunc_call(pyfuncitem=self)
  File "/opt/venv/lib/python3.12/site-packages/pluggy/_hooks.py", line 512, in __call__
    return self._hookexec(self.name, self._hookimpls.copy(), kwargs, firstresult)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/pluggy/_manager.py", line 120, in _hookexec
    return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/pluggy/_callers.py", line 167, in _multicall
    raise exception
  File "/opt/venv/lib/python3.12/site-packages/pluggy/_callers.py", line 121, in _multicall
    res = hook_impl.function(*args)
          ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/_pytest/python.py", line 166, in pytest_pyfunc_call
    result = testfunction(**testargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/AzDevOps/sonic-mgmt/tests/dut_console/test_console_stress.py", line 79, in test_console_stress_input
    output = duthost_console.send_command(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/netmiko/base_connection.py", line 111, in wrapper_decorator
    return_val = func(self, *args, **kwargs)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/netmiko/utilities.py", line 667, in wrapper_decorator
    return func(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/netmiko/base_connection.py", line 1791, in send_command
    new_data = self.command_echo_read(cmd=cmd, read_timeout=10)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/netmiko/base_connection.py", line 1494, in command_echo_read
    new_data = self.read_until_pattern(
               ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/netmiko/base_connection.py", line 755, in read_until_pattern
    raise ReadTimeout(msg)
netmiko.exceptions.ReadTimeout: 

Pattern not detected: "echo\\ '012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789
...
```

On Nexthop devices, with the fix:

<img width="1160" height="322" alt="Screenshot 2026-04-09 at 8 57 18 PM"
src="https://github.com/user-attachments/assets/0da798b1-8dc0-4a7d-b3c7-f9ad0a1d9927"
/>


```
09/04/2026 21:57:57 base_connection.read_channel             L0652 DEBUG  | read_channel: 
09/04/2026 21:57:57 base_connection.find_prompt              L1462 DEBUG  | [find_prompt()]: prompt is admin:~$
09/04/2026 21:57:57 base_connection.wrapper_decorator        L0128 DEBUG  | write_channel: b'python3 -c "for i in range(1000): print(f\'LINE_{i:04d}: \' + \'0123456789\' * 10)"\n'
...
09/04/2026 21:57:58 base_connection.read_channel             L0652 DEBUG  | read_channel: python3 -c "for i in range(1000): print(f'LINE_{i:04d}: ' + '0123456789' * 10)"
...
09/04/2026 21:57:58 base_connection.read_until_pattern       L0743 DEBUG  | Pattern found: (python3\ \-c\ "for\ i\ in\ range\(1000\):\ print\(f'LINE_\{i:04d\}:\ '\ \+\ '0123456789'\ \*\ 10\)") admin:~$ python3 -c "for i in range(1000): print(f'LINE_{i:04d}: ' + '0123456789' * 10)"
...
09/04/2026 21:57:58 base_connection.read_channel             L0652 DEBUG  | read_channel: LINE_0000: 0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789
LINE_0001: 0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789
LINE_0002: 0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789
LINE_0003: 0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789
LINE_0004: 0123456789012345678901234567890123456789012345678
...
9/04/2026 21:59:56 base_connection.read_channel             L0652 DEBUG  | read_channel: 1234567890123456789
LINE_0997: 0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789
LINE_0998: 0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789
LINE_0999: 0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789
...
09/04/2026 21:59:56 base_connection.read_channel             L0652 DEBUG  | read_channel: echo test_responsive
test_responsive

09/04/2026 21:59:56 base_connection.read_until_pattern       L0743 DEBUG  | Pattern found: (echo\ test_responsive) echo test_responsive
09/04/2026 21:59:56 base_connection.read_channel             L0652 DEBUG  | read_channel: 
09/04/2026 21:59:56 base_connection.read_channel             L0652 DEBUG  | read_channel: 
09/04/2026 21:59:56 base_connection.read_channel             L0652 DEBUG  | read_channel: 
```

#### Any platform specific information?
N/A

#### Supported testbed topology if it's a new test case?

### Documentation
N/A

---------

Signed-off-by: antonio-nexthop <antonio@nexthop.ai>
Signed-off-by: Antonio Hui <antonio@nexthop.ai>
Add --buffer-size=102400 --immediate-mode -U flags to tcpdump for larger
kernel buffer and faster packet delivery. Replace fixed sleep(15) with
poll-until-idle that waits until the capture log stops growing for 5
seconds before killing tcpdump, ensuring all relayed packets are
captured.

### Description of PR

Summary:

### Type of change

- [ ] Bug fix
- [ ] Testbed and Framework(new/improvement)
- [ ] New Test case
    - [ ] Skipped for non-supported platforms
- [ ] Test case improvement


### Back port request
- [ ] 202205
- [ ] 202305
- [ ] 202311
- [ ] 202405
- [ ] 202411
- [ ] 202505
- [ ] 202511

### Approach
#### What is the motivation for this PR?

The DHCP relay stress test (test_dhcp_relay_stress) sends packets at
10,000 pps and compares DUT-side packet counts with PTF-side tcpdump
captures. The PTF tcpdump was dropping a significant percentage of
packets due to two issues:

1. No kernel buffer tuning: The default tcpdump buffer is too small for
high packet rates, causing kernel-level drops.
2. Fixed sleep before kill: The sleep(15) before killing tcpdump doesn't
account for variable relay processing times. If relayed packets are
still arriving after 15 seconds, they are missed.

These issues caused consistent count mismatches (DUT vs PTF) exceeding
the 10% tolerance, resulting in false test failures.

#### How did you do it?

1. Added --buffer-size=102400 to increase the kernel capture buffer
(matching the DUT-side tcpdump configuration).
2. Added --immediate-mode -U for faster packet delivery to userspace and
unbuffered pcap output.
3. Replaced time.sleep(15) with a poll-until-idle loop that monitors the
tcpdump log file size every second and only kills tcpdump after the file
has been stable (no new data) for 5 consecutive seconds.
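
The poll-until-idle loop in step 3 might look roughly like the sketch below. It is an illustration under stated assumptions, not the code from this change: the function name `wait_until_idle`, the `max_wait` safety cap, and the injectable `sleep`/`clock` arguments are all hypothetical additions made so the example is self-contained and testable.

```python
import os
import time


def wait_until_idle(path, idle_seconds=5.0, poll_interval=1.0, max_wait=120.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Wait until the file at `path` has not grown for `idle_seconds`.

    Returns True once the file is idle, or False if `max_wait` seconds pass
    first (the cap is an assumption, added so the loop cannot hang forever).
    """
    def current_size():
        try:
            return os.path.getsize(path)
        except OSError:
            # Treat a missing file as empty; it may not have been created yet.
            return 0

    start = clock()
    last_size = current_size()
    last_change = start

    while True:
        now = clock()
        if now - last_change >= idle_seconds:
            return True
        if now - start >= max_wait:
            return False
        sleep(poll_interval)
        size = current_size()
        if size != last_size:
            # New data arrived: restart the idle window.
            last_size = size
            last_change = clock()
```

A caller would start the capture, send traffic, call `wait_until_idle(capture_path)`, and only then stop the capture process.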
 
#### How did you verify/test it?

- Tested on Broadcom 7260 (t0-116 topology) at 10,000 pps for 120
seconds
  - Before the fix: ~3x count mismatch between DUT and PTF captures
  - After the fix: ~17-19% mismatch, down from the ~3x (200%+) mismatch
- At lower packet rates (100 pps), all stress tests pass with 0%
mismatch, confirming the issue is tcpdump performance under high load,
not test logic

#### Any platform specific information?

#### Supported testbed topology if it's a new test case?

### Documentation

---------

Signed-off-by: Xichen Lin <lukelin0907@gmail.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…onic-net#23814)

What: Added _derive_subports method in port_config_gen.py for fanout subport assignment, updated sonic_deploy_202505.j2 to render subport attribute, and added new lt2-o128-tor.j2 EOS ToR template.
Why: Fanout ports sharing the same physical index (split OSFP modules) need correct subport mapping. The lt2-o128-tor topology lacked an EOS configuration template.
How: _derive_subports assigns subport attributes based on alias suffix (e.g. etp1a/etp1b) for ports with matching physical index; sonic_deploy template conditionally includes subport; new Jinja2 template provides full EOS ToR config (mgmt, interfaces, BGP, cEOS agent shutdowns).
Testing: All CI checks passed including KVM tests (t0, t1-lag, t2, dpu, multi-asic-t1, t0-sonic, t0-2vlans, t1-lag-vpp).
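
As a rough illustration of the suffix-based assignment described in "How", the sketch below groups ports by physical index and derives a subport from the alias suffix. It is not the `_derive_subports` implementation: the function name `derive_subports`, the input shape (a dict of port name to `alias`/`index`), and the numbering (`a` maps to 1, `b` to 2) are all assumptions.

```python
import re
from collections import defaultdict


def derive_subports(ports):
    """Return a copy of `ports` with a 'subport' key on split ports.

    Assumed input shape: {port_name: {"alias": str, "index": int}}.
    Assumed numbering: alias suffix 'a' -> 1, 'b' -> 2, and so on.
    """
    # Group port names by their physical index.
    by_index = defaultdict(list)
    for name, attrs in ports.items():
        by_index[attrs["index"]].append(name)

    result = {name: dict(attrs) for name, attrs in ports.items()}
    for names in by_index.values():
        if len(names) < 2:
            # Only ports that share a physical index get a subport.
            continue
        for name in names:
            match = re.search(r"\d([a-z])$", result[name]["alias"])
            if match:
                result[name]["subport"] = ord(match.group(1)) - ord("a") + 1
    return result


example = derive_subports({
    "Ethernet0": {"alias": "etp1a", "index": 1},
    "Ethernet4": {"alias": "etp1b", "index": 1},
    "Ethernet8": {"alias": "etp2", "index": 2},
})
```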

Signed-off-by: Austin Pham <austinpham@microsoft.com>
…droom (sonic-net#23816)

What: Changed / to // for integer division in cable length binary search in test_exceeding_headroom.
Why: Python 3 / returns float, producing invalid cable lengths like "473.0m" that YANG validation rejects.
How: Changed cable_length_step /= 2 to //= 2 and (upper + lower) / 2 to // 2.
Testing: Regression pass on multi-ASIC platforms. All CI checks passed. Backport requested for 202511.
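
The difference between `/` and `//` can be seen in a small binary-search sketch. This is an illustration only, not the search used in `test_exceeding_headroom`: the function `largest_accepted_length` and the `accepts` predicate are hypothetical.

```python
def largest_accepted_length(lower, upper, accepts):
    """Binary-search the largest integer length in [lower, upper] that
    `accepts` returns True for.

    Assumes accepts(lower) is True and that `accepts` is monotone
    (once it rejects a length, it rejects every longer one).
    """
    while lower < upper:
        # Floor division keeps the midpoint an int; `/` would yield a float.
        mid = (lower + upper + 1) // 2
        if accepts(mid):
            lower = mid
        else:
            upper = mid - 1
    return lower


length = largest_accepted_length(1, 1000, lambda metres: metres <= 473)
good = f"{length}m"              # integer midpoint renders without ".0"
bad = f"{(300 + 646) / 2}m"      # true division renders as a float
```

With `//`, the formatted string is `473m`; with `/`, the same midpoint formats as `473.0m`, which is the form the commit message says YANG validation rejects.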

Signed-off-by: weiguo-nvidia <weguo@nvidia.com>
…ry check failure (sonic-net#23824)

What: Added wait_for_bgp=True to config_reload calls in test_config_reload_with_rendered_golden_config.py setup_env fixture.
Why: After config_reload, BGP convergence causes transient memory spikes that trigger false memory utilization ALARMs.
How: Added wait_for_bgp=True, safe_reload, and check_intf_up_ports to both config_reload calls in the fixture.
Testing: Regression pass. All CI checks passed. Backport requested for 202511.

Signed-off-by: weiguo-nvidia <weguo@nvidia.com>
Signed-off-by: Mridul Bajpai <mridul@cisco.com>
Signed-off-by: Mridul Bajpai <mridul@cisco.com>
Signed-off-by: Mridul Bajpai <mridul@cisco.com>
@mssonicbld

Collaborator

/azp run

@azure-pipelines

Azure Pipelines will not run the associated pipelines, because the pull request was updated after the run command was issued. Review the pull request again and issue a new run command.

@mssonicbld

Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld

Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@bmridul

bmridul commented Apr 18, 2026

Contributor Author

New PR opened due to Git history issues on the branch
#24036

Projects

Status: Done
