Fix numa parsing in job exporter by abuccts · Pull Request #143 · microsoft/ltp-platform

abuccts · 2026-01-27T23:38:59Z

Fix numa parsing in job exporter on GB200.

Copilot

Pull request overview

This PR fixes a NUMA parsing issue in the job exporter that occurs on GB200 hardware. The GB200's numactl --hardware output includes header lines in the format "node X cpus:" without CPU lists, which the previous code would incorrectly attempt to parse.

Changes:

Added a check to skip lines ending with "cpus:" to prevent parsing empty CPU lists on GB200 systems

Comments suppressed due to low confidence (2)

src/job-exporter/src/Moneo/src/worker/exporters/node_exporter.py:281

The fix correctly filters out header lines ending with "cpus:" to prevent parsing issues on GB200. However, there's a potential logic issue in the subsequent lines (278-281). When a valid line is found (passing all three conditions on line 276), the code checks for ':' again on line 278, which is redundant since line 276 already requires 'cpus' to be in the line. More importantly, if a line passes the check on line 276 but doesn't contain ':', the code continues without processing that line, potentially missing valid NUMA data.

Consider refactoring the inner condition to be clearer about what it's checking, or combine the checks on lines 276 and 278 into a single, more explicit condition.

        if 'node ' in line and 'cpus' in line and not line.strip().endswith('cpus:'):
            current_numa_domain = int(re.search(r'node (\d+)', line).group(1))
            if ':' in line:
                cpus_str = line.split(': ')[1].split()
                for cpu in cpus_str:
                    numa_mapping[int(cpu)] = current_numa_domain

src/job-exporter/src/Moneo/src/worker/exporters/node_exporter.py:281

This change fixes a parsing issue for GB200 hardware but lacks test coverage. The repository has comprehensive test coverage for other exporters (test_amd.py, test_nvidia.py, test_collector.py, etc.), but there are no tests for the get_core_numa_mapping function or the node_exporter module. Consider adding unit tests that verify the function correctly handles various numactl output formats, including:

Standard output with CPU lists
GB200-specific output with empty "node X cpus:" lines
The fallback to lscpu when core_count doesn't match

This would prevent regressions and document the expected behavior across different hardware platforms.

        if 'node ' in line and 'cpus' in line and not line.strip().endswith('cpus:'):
            current_numa_domain = int(re.search(r'node (\d+)', line).group(1))
            if ':' in line:
                cpus_str = line.split(': ')[1].split()
                for cpu in cpus_str:
                    numa_mapping[int(cpu)] = current_numa_domain

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

* Fix numa parsing in job exporter (#143) Fix numa parsing in job exporter on GB200. * Fix kubespray deployment on bare metal (#144) Fix kubespray deployment on bare metal when cffi package is installed by package manager. * fix the circular dependency (#142) Co-authored-by: Rui Gao <ruigao@microsoft.com> * fix more circular dependencies in rest server (#145) Co-authored-by: Rui Gao <ruigao@microsoft.com> Co-authored-by: zhogu <57975490+zhogu@users.noreply.github.com> * Update the workflow tigger (#146) * change trigger of github workflow * update * update * update * add release note for v1.5 (#148) Co-authored-by: Rui Gao <ruigao@microsoft.com> --------- Co-authored-by: Yifan Xiong <yifan.xiong@microsoft.com> Co-authored-by: Rui Gao <ruigao@microsoft.com> Co-authored-by: zhogu <57975490+zhogu@users.noreply.github.com>

Fix numa parsing in job exporter

f8b92a9

Fix numa parsing in job exporter on GB200.

abuccts requested review from Copilot and hippogr January 27, 2026 23:38

abuccts temporarily deployed to auto-test January 27, 2026 23:39 — with GitHub Actions Inactive

Copilot started reviewing on behalf of abuccts January 27, 2026 23:39 View session

Copilot AI reviewed Jan 27, 2026

View reviewed changes

Comment thread src/job-exporter/src/Moneo/src/worker/exporters/node_exporter.py

abuccts temporarily deployed to auto-test January 27, 2026 23:57 — with GitHub Actions Inactive

hippogr approved these changes Feb 4, 2026

View reviewed changes

abuccts merged commit 2664031 into release/1.5 Feb 5, 2026
10 checks passed

abuccts deleted the xiongyf/fix-job-exporter-numa branch February 5, 2026 00:51

hippogr mentioned this pull request Feb 9, 2026

Release - Lucia Training Platform v1.5 #149

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix numa parsing in job exporter#143

Fix numa parsing in job exporter#143
abuccts merged 1 commit into
release/1.5from
xiongyf/fix-job-exporter-numa

abuccts commented Jan 27, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

abuccts commented Jan 27, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants