Skip to content

QA fixups and convert Nvidia PMDA to extraunits#2634

Merged
kmcdonell merged 7 commits into
performancecopilot:mainfrom
kmcdonell:wip
Jun 24, 2026
Merged

QA fixups and convert Nvidia PMDA to extraunits#2634
kmcdonell merged 7 commits into
performancecopilot:mainfrom
kmcdonell:wip

Conversation

@kmcdonell

Copy link
Copy Markdown
Member

No description provided.

For power and temperature.

+ pmlogrewrite rules and qa for this
+ updated recipe for recreating qa/archives/nvidiagpu archive
Throughout the known (GNU) awk universe, this works:
    awk 'END { f = "/foo"; print >"/tmp" f }' </dev/null
but on macOS this produces a "awk: syntax error"

However, this works for all versions of awk:
    awk 'END { f = "/foo"; print >("/tmp" f) }' </dev/null

    modified:   qa/1633
    modified:   qa/1674
    modified:   qa/671
    modified:   qa/681
    modified:   qa/new
    modified:   qa/new-dup
    modified:   qa/new-grind
    modified:   qa/src/mv-me
    modified:   src/ctl-tools/ctl-tools.sh
@coderabbitai

coderabbitai Bot commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Repository UI (inherited), Organization UI (inherited)

Review profile: CHILL

Plan: Pro Plus

Run ID: d21efe4d-08df-4627-9ed4-56f2141a1446

📥 Commits

Reviewing files that changed from the base of the PR and between ad58a57 and bcd79e9.

📒 Files selected for processing (3)
  • qa/1675
  • qa/group
  • src/pmdas/nvidia/rewrite.conf
✅ Files skipped from review due to trivial changes (1)
  • src/pmdas/nvidia/rewrite.conf
🚧 Files skipped from review as they are similar to previous changes (2)
  • qa/group
  • qa/1675

📝 Walkthrough

Summary by CodeRabbit

  • New Features
    • Added NVIDIA log-rewrite configuration so temperature is reported in °C and power in mW.
  • Bug Fixes
    • Improved NVIDIA archive metric recreation when tools are available, with clearer warnings when not.
    • Refined several parsing/output and filtering behaviors used by QA scripts to produce more consistent diff/report output.
  • Tests
    • Added a new QA script to validate pmlogrewrite results against NVIDIA PMDA data/config using archive-level comparisons.

Walkthrough

Adds explicit temperature and power unit metadata to the NVIDIA PMDA, introduces pmlogrewrite support and QA coverage for NVIDIA archives, updates archive generation, and adjusts several AWK print-redirection expressions in QA and tool scripts.

Changes

NVIDIA PMDA unit metadata and pmlogrewrite support

Layer / File(s) Summary
Metric units and rewrite rules
src/pmdas/nvidia/nvidia.c, src/pmdas/nvidia/rewrite.conf
NVIDIA_TEMPERATURE and NVIDIA_POWER use explicit extra units, and rewrite.conf maps the NVIDIA metrics to temperature and power encodings.
Rewrite config install wiring
src/pmdas/nvidia/GNUmakefile
Adds rewrite config directories and installs the NVIDIA rewrite configuration into the pmlogrewrite variable config location.
QA coverage, archive regeneration, and group entry
qa/1675, qa/archives/mk.nvidiagpu, qa/group
Adds QA 1675 for pmlogrewrite on NVIDIA archives, regenerates nvidiagpu.smi.txt with nvidia-smi when available, and points the group entry at pmda.nvidia.
Amgpu QA adjustments
qa/1559, qa/1674
qa/1559 switches archive discovery to ls with stderr suppression, and qa/1674 changes unexpected-unit error reporting and the amd-smi pipeline line.

AWK print redirection parenthesization fixes

Layer / File(s) Summary
Parenthesized print targets
qa/new, qa/new-dup, qa/new-grind, qa/src/mv-me, src/ctl-tools/ctl-tools.sh, qa/671, qa/681, qa/1633
AWK redirections in these scripts now use parenthesized filename expressions for print > ... targets.

Possibly related PRs

Suggested reviewers

  • natoscott
  • tallpsmith

Poem

🐇 Hop hop, the units now align,
Celsius and milliwatts trace the line.
Rewrite rules twinkle, archives reply,
AWK gets parentheses tucked neatly by.
This bunny stamps the review with a grin ✨

🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name Status Explanation Resolution
Description check ❓ Inconclusive No meaningful author description was provided, so its relation to the changeset can't be evaluated. Add a brief description of the QA fixups and NVIDIA PMDA extraunits changes.
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main change: QA fixes and the NVIDIA PMDA extraunits conversion.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@qa/1675`:
- Line 57: The archive iteration currently uses a literal glob fallback, which
can cause a bogus loop entry when no matching files exist. Update the loop in
the archive-handling logic to use the same unmatched-safe listing approach used
in related QA fixes, replacing the echo-based expansion with the ls-based
pattern so the loop only runs for real nvidiagpu archive indexes. Locate the
change in the archive iteration expression and keep the surrounding sed-based
suffix stripping intact.

In `@qa/group`:
- Line 2225: The group association for test 1675 is missing the local tag, so
update the test entry in the qa/group definition to include local alongside the
existing pmlogrewrite and pmda.nvidia tags. Keep the change scoped to the test
1675 association so it participates in expected local-group runs.

In `@src/pmdas/nvidia/rewrite.conf`:
- Around line 9-11: The nvidia.power rewrite rule is using the wrong scale
metadata and no longer matches the PMDA descriptor. Update the metric stanza for
nvidia.power in rewrite.conf so its extraunits uses the same power scale as the
PMDA metadata (PM_POWER_mW) rather than POWER,W, keeping the rewrite contract
consistent with the live descriptor.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Repository UI (inherited), Organization UI (inherited)

Review profile: CHILL

Plan: Pro Plus

Run ID: baadc2b3-9421-4a2a-b639-bff4a59f1731

📥 Commits

Reviewing files that changed from the base of the PR and between aabd29c and ad58a57.

⛔ Files ignored due to path filters (3)
  • qa/1559.out is excluded by !**/*.out
  • qa/1670.out is excluded by !**/*.out
  • qa/1675.out is excluded by !**/*.out
📒 Files selected for processing (16)
  • qa/1559
  • qa/1633
  • qa/1674
  • qa/1675
  • qa/671
  • qa/681
  • qa/archives/mk.nvidiagpu
  • qa/group
  • qa/new
  • qa/new-dup
  • qa/new-grind
  • qa/src/mv-me
  • src/ctl-tools/ctl-tools.sh
  • src/pmdas/nvidia/GNUmakefile
  • src/pmdas/nvidia/nvidia.c
  • src/pmdas/nvidia/rewrite.conf

Comment thread qa/1675 Outdated
Comment thread qa/group Outdated
Comment thread src/pmdas/nvidia/rewrite.conf
@kmcdonell kmcdonell merged commit 93efeee into performancecopilot:main Jun 24, 2026
17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant