linux, collectl2pcp: add missing subsystem handlers and cpu frequency metric by natoscott · Pull Request #2628 · performancecopilot/pcp

natoscott · 2026-06-20T08:03:40Z

linux PMDA: add hinv.cpu.frequency_scaling.current (item 10 in CLUSTER_SYSFS_DEVICES) reading /sys/devices/system/cpu/*/cpufreq/ scaling_cur_freq per CPU. The existing freq.scale field already holds this value from refresh_sysfs_frequency_scaling_cur_freq(); we simply expose it as a new instant metric alongside the existing .min/.max.

collectl2pcp: add handlers for subsystems that had no coverage:
sock - network.sockstat.* from /proc/net/sockstat
fs-ds - vfs.dentry.count from /proc/sys/fs/dentry-state
fs-is - vfs.inodes.count from /proc/sys/fs/inode-nr
fs-fnr - vfs.files.count from /proc/sys/fs/file-nr
buddy - mem.buddyinfo.total from /proc/buddyinfo (per zone)
nfsc- - nfs.client.calls from /proc/net/rpc/nfs
nfss- - nfs.server.calls from /proc/net/rpc/nfsd
numai - mem.numa.util.* from /sys/devices/system/node/nodeN/meminfo

Test data: qa/collectl/subsys_test.raw.gz (1.9KB, 3 samples) collected using collectl -s sibm to exercise all new handlers.

… metric linux PMDA: add hinv.cpu.frequency_scaling.current (item 10 in CLUSTER_SYSFS_DEVICES) reading /sys/devices/system/cpu/*/cpufreq/ scaling_cur_freq per CPU. The existing freq.scale field already holds this value from refresh_sysfs_frequency_scaling_cur_freq(); we simply expose it as a new instant metric alongside the existing .min/.max. collectl2pcp: add handlers for subsystems that had no coverage: sock - network.sockstat.* from /proc/net/sockstat fs-ds - vfs.dentry.count from /proc/sys/fs/dentry-state fs-is - vfs.inodes.count from /proc/sys/fs/inode-nr fs-fnr - vfs.files.count from /proc/sys/fs/file-nr buddy - mem.buddyinfo.total from /proc/buddyinfo (per zone) nfsc- - nfs.client.calls from /proc/net/rpc/nfs nfss- - nfs.server.calls from /proc/net/rpc/nfsd numai - mem.numa.util.* from /sys/devices/system/node/nodeN/meminfo Test data: qa/collectl/subsys_test.raw.gz (1.9KB, 3 samples) collected using collectl -s sibm to exercise all new handlers. Signed-off-by: Nathan Scott <nathans@redhat.com>

coderabbitai · 2026-06-20T08:04:06Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Repository YAML (base), Repository UI (inherited), Organization UI (inherited)

Review profile: CHILL

Plan: Pro Plus

Run ID: 980a8c8c-9360-4e5c-af55-22812f01b194

📥 Commits

Reviewing files that changed from the base of the PR and between 8fcb59e and 862562f.

📒 Files selected for processing (7)

docs/HowTos/collectl/archives.rst
docs/HowTos/collectl/colmux.rst
docs/HowTos/collectl/getting-started.rst
docs/HowTos/collectl/index.rst
docs/HowTos/collectl/subsystems.rst
docs/index.rst
man/man1/pcpcompat.1

💤 Files with no reviewable changes (1)

man/man1/pcpcompat.1

✅ Files skipped from review due to trivial changes (5)

docs/index.rst
docs/HowTos/collectl/index.rst
docs/HowTos/collectl/subsystems.rst
docs/HowTos/collectl/getting-started.rst
docs/HowTos/collectl/archives.rst

🚧 Files skipped from review as they are similar to previous changes (1)

docs/HowTos/collectl/colmux.rst

📝 Walkthrough

Summary by CodeRabbit

New Features
- Added collection and export for additional kernel/process data sources: socket statistics, filesystem counters, buddy allocator info, NUMA memory utilization, and NFS client/server call and retransmission metrics.
- Added hinv.cpu.frequency_scaling.current to report current CPU frequency scaling.
Documentation
- Expanded the pcp-collectl user guide (getting started, subsystem reference, archives, and colmux).
- Updated pcpcompat manual page to correct the NAME section and remove the outdated pcp-collectl documentation block.

Walkthrough

This PR adds eight new collectl subsystem handlers to subsys.c (sockstat, VFS dentry/inode/file, buddyinfo, NFS client/server, NUMA meminfo) wired into the collectl2pcp dispatch table and build, a new hinv.cpu.frequency_scaling.current metric to the Linux PMDA with metrictab entry and fetch callback, comprehensive user documentation covering the pcp-collectl guide, getting-started tutorial, subsystem reference, archive/daemon management, and multi-host aggregation with pcp-colmux, and removes pcp-collectl content from the compatibility man page.

Changes

collectl2pcp subsystem handlers

Layer / File(s)	Summary
Handler prototypes, dispatch table, and build wiring `src/collectl2pcp/metrics.h`, `src/collectl2pcp/collectl2pcp.c`, `src/collectl2pcp/GNUmakefile`	`metrics.h` declares eight new `extern` handler prototypes; `collectl2pcp.c` inserts the tag-to-handler entries (`sock`, `fs-ds`, `fs-is`, `fs-fnr`, `buddy`, `nfsc-`, `nfss-`, `numai`) into `handlers[]`; `GNUmakefile` adds `subsys.c` to `CFILES`.
subsys.c handler implementations `src/collectl2pcp/subsys.c`	Implements all eight handlers: `sock_handler` parses `/proc/net/sockstat` by protocol variant; `dentry_handler`, `inode_handler`, and `filenr_handler` map sysfs fs entries to `vfs.` metrics; `buddy_handler` computes weighted per-zone free-page totals; `nfsc_handler` and `nfss_handler` dispatch on `rpc`/`proc3`/`proc4ops` lines for NFS client and server metrics; `numai_handler` maps NUMA node meminfo keys to `mem.numa.util.` metrics.

Linux PMDA: hinv.cpu.frequency_scaling.current

Layer / File(s)	Summary
CPU frequency scaling current metric registration and fetch `src/pmdas/linux/pmda.c`, `src/pmdas/linux/root_linux`, `src/pmdas/linux/help`	`pmda.c` adds a `metrictab` entry (`PM_TYPE_FLOAT`, `CPU_INDOM`, item 10) and a `case 10` fetch path in `linux_fetchCallBack()` calling `refresh_sysfs_frequency_scaling_cur_freq()`, gated on `CPUFREQ_SCALE` and a non-zero `cp->freq.scale`; `root_linux` maps the metric to `60:55:10`; `help` adds its one-line description.

pcp-collectl and pcp-colmux user documentation

Layer / File(s)	Summary
pcp-collectl guide landing page and overview `docs/HowTos/collectl/index.rst`	Introduces `pcp-collectl` as a PCP reimplementation of collectl, outlines archive semantics, includes quick-start examples for monitoring and archive replay, documents key differences from the original collectl (archive format, collection method, compression, playback), and provides a toctree linking to subsections.
Getting Started: subsystem selection and common usage patterns `docs/HowTos/collectl/getting-started.rst`	Documents subsystem codes with a full reference table, default subsystem sets, output modes (brief vs. verbose), understanding rate vs. percentage metrics, instance filtering via POSIX extended regex patterns, and multiple ready-to-run command examples.
Subsystem Reference: detailed metric mappings and output columns `docs/HowTos/collectl/subsystems.rst`	Comprehensive reference for each subsystem (CPU, Disk, Memory, NUMA, Network, Sockets, TCP, Inodes/Files, Interrupts, Slab, Buddy, NFS, Process) documenting output columns, filtering options, and PCP metric namespace mappings.
Archive Management: daemon mode, rotation, migration, and playback `docs/HowTos/collectl/archives.rst`	Documents PCP archive semantics and naming, daemon mode setup via systemd and `collectl.conf`, log rotation options with compression/purging behavior, manual rotation via `SIGUSR1`, migration from legacy collectl raw archives via `collectl2pcp`, and playback/replay workflows with time-window filtering.
pcp-colmux: multi-host aggregation, display modes, and archive playback `docs/HowTos/collectl/colmux.rst`	Introduces `pcp-colmux` for multi-host metric aggregation, documents host specification via `-address` (ranges, `@file`), sorted/columnar display modes with interactive terminal controls, command-line options for sorting/filtering/formatting, and multi-host archive playback with glob patterns.
Main documentation index navigation update `docs/index.rst`	Adds `HowTos/collectl/index` to both the main document list and the Guides toctree, placing the collectl guide in the top-level documentation navigation.
Compatibility man page: remove pcp-collectl section `man/man1/pcpcompat.1`	Corrects the NAME section to list `PCPCompat` and removes the `PCP-COLLECTL` documentation section, which is now documented in the standalone collectl user guide.

🐇 Eight handlers hop into subsys.c,
Parsing sockets, NFS, and memory with glee.
Dentry counts and buddy zones align,
With CPU frequencies in the Linux PMDA line.
And guides bloom to help all who see! 📚✨

🚥 Pre-merge checks | ✅ 4

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately summarizes the main changes: adding missing subsystem handlers in collectl2pcp and a CPU frequency metric to the Linux PMDA.
Description check	✅ Passed	The description is directly related to the changeset, providing specific details about the Linux PMDA metric addition and eight new collectl2pcp handlers with their metric mappings.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/collectl2pcp/subsys.c`:
- Around line 51-67: The field index boundary checks in the sock_handler
function have off-by-one errors that could cause out-of-bounds reads. In the
UDPLITE, RAW, and FRAG cases, the condition checking `f->nfields >= 3` before
accessing `f->fields[3]` should be `>= 4` since accessing index 3 requires at
least 4 fields. In the UDP case, change the condition `f->nfields >= 7` to `>=
6` before accessing `f->fields[5]` since index 5 only requires 6 fields minimum.
In the FRAG case, also change the secondary condition from `f->nfields >= 5` to
`>= 6` before accessing `f->fields[5]`. Update all these boundary conditions to
prevent potential out-of-bounds array access.
- Around line 149-152: The loop in the buddy_handler function has undefined
behavior and overflow issues. First, add a guard check to ensure that (i - 5) is
less than 64 before performing the left shift operation (1ULL << (i - 5)), as
shifting by 64 or more bits on a 64-bit type is undefined. Second, add overflow
checking for the multiplication of count and the shifted value, and also add
overflow checking for the addition to total. Consider using safe arithmetic
functions or manual checks to detect when these operations would exceed the
limits of unsigned long long, and handle such cases appropriately (either by
capping the value, breaking early, or logging an error).
- Around line 228-239: The mem.numa.util.* metrics (mem.numa.util.total,
mem.numa.util.free, mem.numa.util.used, mem.numa.util.active,
mem.numa.util.inactive, mem.numa.util.dirty) in numai_handler are being written
with PM_INDOM_NULL and no instance name, but these metrics are defined with
instance domain 0xf000013 (node domain) in metrics.c. This causes data loss
across multiple nodes. Either redefine these metrics in metrics.c to use
PM_INDOM_NULL as their instance domain if they should be aggregates, or modify
the put_str_value calls in lines 228-239 to use the correct instance domain
0xf000013 with appropriate node instance names. Choose one approach and apply it
consistently across all six mem.numa.util.* metrics.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository YAML (base), Repository UI (inherited), Organization UI (inherited)

Review profile: CHILL

Plan: Pro Plus

Run ID: eb518945-2746-42ef-a15c-fb4202f1de07

📥 Commits

Reviewing files that changed from the base of the PR and between 13767ba and a1608f3.

⛔ Files ignored due to path filters (1)

qa/collectl/subsys_test.raw.gz is excluded by !**/*.gz

📒 Files selected for processing (7)

src/collectl2pcp/GNUmakefile
src/collectl2pcp/collectl2pcp.c
src/collectl2pcp/metrics.h
src/collectl2pcp/subsys.c
src/pmdas/linux/help
src/pmdas/linux/pmda.c
src/pmdas/linux/root_linux

coderabbitai · 2026-06-20T08:11:31Z

+    else if (strcmp(f->fields[1], "UDP:") == 0 && f->nfields >= 5) {
+	/* UDP: inuse N mem N */
+	put_str_value("network.sockstat.udp.inuse", PM_INDOM_NULL, NULL, f->fields[3]);
+	if (f->nfields >= 7)
+	    put_str_value("network.sockstat.udp.mem", PM_INDOM_NULL, NULL, f->fields[5]);
+    }
+    else if (strcmp(f->fields[1], "UDPLITE:") == 0 && f->nfields >= 3) {
+	put_str_value("network.sockstat.udplite.inuse", PM_INDOM_NULL, NULL, f->fields[3]);
+    }
+    else if (strcmp(f->fields[1], "RAW:") == 0 && f->nfields >= 3) {
+	put_str_value("network.sockstat.raw.inuse", PM_INDOM_NULL, NULL, f->fields[3]);
+    }
+    else if (strcmp(f->fields[1], "FRAG:") == 0 && f->nfields >= 3) {
+	put_str_value("network.sockstat.frag.inuse",   PM_INDOM_NULL, NULL, f->fields[3]);
+	if (f->nfields >= 5)
+	    put_str_value("network.sockstat.frag.memory", PM_INDOM_NULL, NULL, f->fields[5]);
+    }


⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Fix off-by-one field guards in sock_handler to prevent OOB reads.

At Line 57/60/63, f->fields[3] is accessed with only f->nfields >= 3; this requires >= 4.
At Line 55 and Line 66, f->fields[5] is accessed with >= 7 (UDP, too strict) and >= 5 (FRAG, too loose); both should be >= 6 for index 5.

Suggested patch

- else if (strcmp(f->fields[1], "UDP:") == 0 && f->nfields >= 5) { + else if (strcmp(f->fields[1], "UDP:") == 0 && f->nfields >= 4) { /* UDP: inuse N mem N */ put_str_value("network.sockstat.udp.inuse", PM_INDOM_NULL, NULL, f->fields[3]); - if (f->nfields >= 7) + if (f->nfields >= 6) put_str_value("network.sockstat.udp.mem", PM_INDOM_NULL, NULL, f->fields[5]); } - else if (strcmp(f->fields[1], "UDPLITE:") == 0 && f->nfields >= 3) { + else if (strcmp(f->fields[1], "UDPLITE:") == 0 && f->nfields >= 4) { put_str_value("network.sockstat.udplite.inuse", PM_INDOM_NULL, NULL, f->fields[3]); } - else if (strcmp(f->fields[1], "RAW:") == 0 && f->nfields >= 3) { + else if (strcmp(f->fields[1], "RAW:") == 0 && f->nfields >= 4) { put_str_value("network.sockstat.raw.inuse", PM_INDOM_NULL, NULL, f->fields[3]); } - else if (strcmp(f->fields[1], "FRAG:") == 0 && f->nfields >= 3) { + else if (strcmp(f->fields[1], "FRAG:") == 0 && f->nfields >= 4) { put_str_value("network.sockstat.frag.inuse", PM_INDOM_NULL, NULL, f->fields[3]); - if (f->nfields >= 5) + if (f->nfields >= 6) put_str_value("network.sockstat.frag.memory", PM_INDOM_NULL, NULL, f->fields[5]); }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/collectl2pcp/subsys.c` around lines 51 - 67, The field index boundary checks in the sock_handler function have off-by-one errors that could cause out-of-bounds reads. In the UDPLITE, RAW, and FRAG cases, the condition checking `f->nfields >= 3` before accessing `f->fields[3]` should be `>= 4` since accessing index 3 requires at least 4 fields. In the UDP case, change the condition `f->nfields >= 7` to `>= 6` before accessing `f->fields[5]` since index 5 only requires 6 fields minimum. In the FRAG case, also change the secondary condition from `f->nfields >= 5` to `>= 6` before accessing `f->fields[5]`. Update all these boundary conditions to prevent potential out-of-bounds array access.

coderabbitai · 2026-06-20T08:11:31Z

+    for (i = 5; i < f->nfields; i++) {
+	unsigned long long count = strtoull(f->fields[i], NULL, 10);
+	total += count * (1ULL << (i - 5));
+    }


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard buddy_handler arithmetic against undefined shift and overflow.

Line 151 computes 1ULL << (i - 5) from parsed input width; once (i - 5) >= 64, this is undefined behavior. Also, count * (...) and total += ... can wrap silently.

Suggested patch

+#include <limits.h> @@ for (i = 5; i < f->nfields; i++) { + unsigned int order = (unsigned int)(i - 5); unsigned long long count = strtoull(f->fields[i], NULL, 10); - total += count * (1ULL << (i - 5)); + if (order >= 64) + break; /* avoid undefined shift */ + if (count > (ULLONG_MAX >> order)) + continue; /* skip unrepresentable term */ + { + unsigned long long pages = (count << order); + if (ULLONG_MAX - total < pages) { + total = ULLONG_MAX; + break; + } + total += pages; + } }

As per coding guidelines, "Integer overflow: check arithmetic on untrusted sizes."

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/collectl2pcp/subsys.c` around lines 149 - 152, The loop in the buddy_handler function has undefined behavior and overflow issues. First, add a guard check to ensure that (i - 5) is less than 64 before performing the left shift operation (1ULL << (i - 5)), as shifting by 64 or more bits on a 64-bit type is undefined. Second, add overflow checking for the multiplication of count and the shifted value, and also add overflow checking for the addition to total. Consider using safe arithmetic functions or manual checks to detect when these operations would exceed the limits of unsigned long long, and handle such cases appropriately (either by capping the value, breaking early, or logging an error).

Source: Coding guidelines

coderabbitai · 2026-06-20T08:11:31Z

+    if (strcmp(f->fields[1], "MemTotal:") == 0)
+	put_str_value("mem.numa.util.total", PM_INDOM_NULL, NULL, f->fields[2]);
+    else if (strcmp(f->fields[1], "MemFree:") == 0)
+	put_str_value("mem.numa.util.free", PM_INDOM_NULL, NULL, f->fields[2]);
+    else if (strcmp(f->fields[1], "MemUsed:") == 0)
+	put_str_value("mem.numa.util.used", PM_INDOM_NULL, NULL, f->fields[2]);
+    else if (strcmp(f->fields[1], "Active:") == 0)
+	put_str_value("mem.numa.util.active", PM_INDOM_NULL, NULL, f->fields[2]);
+    else if (strcmp(f->fields[1], "Inactive:") == 0)
+	put_str_value("mem.numa.util.inactive", PM_INDOM_NULL, NULL, f->fields[2]);
+    else if (strcmp(f->fields[1], "Dirty:") == 0)
+	put_str_value("mem.numa.util.dirty", PM_INDOM_NULL, NULL, f->fields[2]);


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash set -euo pipefail RAW="$(fd 'subsys_test\.raw\.gz$' qa | head -n1)" gzip -dc "$RAW" | rg -n '^numai' -C1 echo "---- first token frequencies ----" gzip -dc "$RAW" | rg '^numai' | awk '{print $1}' | sort | uniq -c

Repository: performancecopilot/pcp

Length of output: 48

🏁 Script executed:

# Find the test file or qa directory find . -type d -name "qa" 2>/dev/null | head -5

Repository: performancecopilot/pcp

Length of output: 86

🏁 Script executed:

# Locate the subsys.c file and verify it exists find . -name "subsys.c" -type f

Repository: performancecopilot/pcp

Length of output: 94

🏁 Script executed:

# Check the structure of the file and look at the numai_handler function cat -n src/collectl2pcp/subsys.c | head -50

Repository: performancecopilot/pcp

Length of output: 2456

🏁 Script executed:

# Find the numai_handler function and context around lines 228-239 sed -n '200,260p' src/collectl2pcp/subsys.c | cat -n

Repository: performancecopilot/pcp

Length of output: 2037

🏁 Script executed:

# Look for the full numai_handler function definition grep -n "numai_handler" src/collectl2pcp/subsys.c

Repository: performancecopilot/pcp

Length of output: 111

🏁 Script executed:

# Search for how numai data is input/structured - look at the input data processing grep -n "numai" src/collectl2pcp/subsys.c | head -20

Repository: performancecopilot/pcp

Length of output: 438

🏁 Script executed:

# Look at how fields are parsed - find where fields array is populated grep -n "fields\[" src/collectl2pcp/subsys.c | head -30

Repository: performancecopilot/pcp

Length of output: 2380

🏁 Script executed:

# Look at the complete numai handler including comments to understand intent sed -n '210,241p' src/collectl2pcp/subsys.c

Repository: performancecopilot/pcp

Length of output: 1424

🏁 Script executed:

# Search for test data files find qa -name "*numai*" -o -name "*subsys*test*" 2>/dev/null

Repository: performancecopilot/pcp

Length of output: 97

🏁 Script executed:

# Look for how collectl data is read/parsed - understand the handler invocation grep -n "handler_t" src/collectl2pcp/subsys.c | head -10

Repository: performancecopilot/pcp

Length of output: 422

🏁 Script executed:

# Check how the fields array is populated and if node info could be in header grep -n "fields_t\|nfields\|fields\[0\]" src/collectl2pcp/subsys.c | head -20

Repository: performancecopilot/pcp

Length of output: 990

🏁 Script executed:

# Examine the actual numai test data gzip -dc qa/collectl/subsys_test.raw.gz | grep -A5 "^numai" | head -40

Repository: performancecopilot/pcp

Length of output: 48

🏁 Script executed:

# Get a broader view of numai entries and context gzip -dc qa/collectl/subsys_test.raw.gz | grep "^numai" | wc -l gzip -dc qa/collectl/subsys_test.raw.gz | grep "^numai" | head -20

Repository: performancecopilot/pcp

Length of output: 68

🏁 Script executed:

# Check if there's any header or node identifier before numai lines gzip -dc qa/collectl/subsys_test.raw.gz | grep -B2 "^numai" | head -50

Repository: performancecopilot/pcp

Length of output: 48

🏁 Script executed:

# Look for how the handler_t structure is used - find the main processing loop grep -n "h->\|handler_t\|fields_t" src/collectl2pcp/*.c | grep -v "^Binary" | head -30

Repository: performancecopilot/pcp

Length of output: 2186

🏁 Script executed:

# Check the handler_t and fields_t structure definitions grep -n "typedef.*handler_t\|typedef.*fields_t\|struct.*handler_t\|struct.*fields_t" src/collectl2pcp/*.h

Repository: performancecopilot/pcp

Length of output: 118

🏁 Script executed:

# Look at buddy_handler which seems to extract node info - understand the pattern sed -n '124,168p' src/collectl2pcp/subsys.c

Repository: performancecopilot/pcp

Length of output: 1435

🏁 Script executed:

# Check the header.c to understand how handlers are invoked and where node info might come from cat src/collectl2pcp/header.c

Repository: performancecopilot/pcp

Length of output: 5120

🏁 Script executed:

# Look at how collectl data is parsed and structured in main processing sed -n '140,300p' src/collectl2pcp/collectl2pcp.c

Repository: performancecopilot/pcp

Length of output: 3972

🏁 Script executed:

# Look at the handler_t structure definition more completely grep -A 20 "typedef struct handler" src/collectl2pcp/*.h

Repository: performancecopilot/pcp

Length of output: 543

🏁 Script executed:

# Check if there's any mechanism to track node context or state across lines grep -n "static\|context\|node" src/collectl2pcp/subsys.c | head -20

Repository: performancecopilot/pcp

Length of output: 895

🏁 Script executed:

# Look at how the collectl data is formatted - check for any header or metadata before numai section cat qa/collectl/subsys_test.raw.gz | gzip -dc | head -100

Repository: performancecopilot/pcp

Length of output: 3345

🏁 Script executed:

# Search for actual numai data in test files or examples find qa -type f -name "*.gz" -o -name "*.raw" | xargs -I {} sh -c 'gzip -dc {} 2>/dev/null | grep -q "^numai" && echo "Found numai in: {}"' 2>/dev/null

Repository: performancecopilot/pcp

Length of output: 50

🏁 Script executed:

# Search the codebase for references to numai to understand its status grep -r "numai" src --include="*.c" --include="*.h" | grep -v "^Binary"

Repository: performancecopilot/pcp

Length of output: 885

🏁 Script executed:

# Check if there are any integration tests or documentation about numai functionality find . -type f $ -name "*.md" -o -name "*.txt" -o -name "*test*" $ -exec grep -l "numai" {} \;

Repository: performancecopilot/pcp

Length of output: 48

🏁 Script executed:

# Check the metrics definition for mem.numa.util.* to see if they have an instance domain grep -n "mem.numa.util" src/collectl2pcp/metrics.c

Repository: performancecopilot/pcp

Length of output: 4537

🏁 Script executed:

# Look at the full handler registration to confirm numai_handler is included grep -B5 -A15 "handlers\[\]" src/collectl2pcp/collectl2pcp.c

Repository: performancecopilot/pcp

Length of output: 529

🏁 Script executed:

# Check if there's any documentation or comments about the intended design grep -B5 -A5 "aggregate\|NUMA\|node" src/collectl2pcp/subsys.c | head -60

Repository: performancecopilot/pcp

Length of output: 1727

🏁 Script executed:

# Verify the exact indom value used for mem.numa.util metrics grep "0xf000013" src/collectl2pcp/metrics.c | head -3

Repository: performancecopilot/pcp

Length of output: 354

🏁 Script executed:

# Check what 0xf000013 resolves to grep -n "0xf000013\|PM_INDOM_NULL\|INDOM" src/collectl2pcp/metrics.h | head -20

Repository: performancecopilot/pcp

Length of output: 103

🏁 Script executed:

# Look at how put_str_value works with PM_INDOM_NULL grep -n "put_str_value\|put_str_instance" src/collectl2pcp/*.c | head -20

Repository: performancecopilot/pcp

Length of output: 2053

🏁 Script executed:

# Confirm what indom 0xf000013 is grep "0xf000013\|PM_INDOM_NULL" src/collectl2pcp/metrics.c | grep -E "hinv.*node|kernel.*node" | head -5

Repository: performancecopilot/pcp

Length of output: 555

🏁 Script executed:

# Check if there's documentation about what the numai metrics are intended to represent grep -B3 -A3 "mem.numa.util" src/collectl2pcp/metrics.c | head -20

Repository: performancecopilot/pcp

Length of output: 1640

🏁 Script executed:

# Look at how buddy_handler uses put_str_instance vs how numai_handler uses put_str_value grep -n "put_str_instance\|put_str_value" src/collectl2pcp/subsys.c

Repository: performancecopilot/pcp

Length of output: 2506

🏁 Script executed:

# Verify the definition and purpose of put_str_instance and put_str_value grep -n "^put_str" src/collectl2pcp/util.c | head -10

Repository: performancecopilot/pcp

Length of output: 188

🏁 Script executed:

# Check what the actual indom 0xf000013 is grep "define.*0xf000013\|`#define`.*0xf000013" src/collectl2pcp/*.h

Repository: performancecopilot/pcp

Length of output: 48

🏁 Script executed:

# Look at the INDOM definitions grep -n "INDOM\|define.*INDOM" src/collectl2pcp/metrics.h | head -20

Repository: performancecopilot/pcp

Length of output: 103

🏁 Script executed:

# Understand how put_str_value is implemented with PM_INDOM_NULL sed -n '52,100p' src/collectl2pcp/util.c

Repository: performancecopilot/pcp

Length of output: 1555

🏁 Script executed:

# Check the full indom definitions and values grep -n "INDOM\|define" src/collectl2pcp/metrics.h | grep -i "0xf000013\|node\|numa" | head -20

Repository: performancecopilot/pcp

Length of output: 48

🏁 Script executed:

# Look at how metrics are defined - see the pattern for metrics with vs without indom sed -n '390,410p' src/collectl2pcp/metrics.c

Repository: performancecopilot/pcp

Length of output: 1687

🏁 Script executed:

# Check if there are any other numai format samples or documentation find . -type f $ -name "*.md" -o -name "*.txt" -o -name "README*" $ -exec grep -l "numai\|/sys/devices/system/node" {} \;

Repository: performancecopilot/pcp

Length of output: 48

🏁 Script executed:

# Confirm the indom 0xf000013 definition - what is the serial number 0x13? python3 << 'PYTHON' indom_val = 0xf000013 domain = (indom_val >> 10) & 0xfffff serial = indom_val & 0x3ff print(f"Domain: {domain} (0x{domain:x}), Serial: {serial} (0x{serial:x})") PYTHON

Repository: performancecopilot/pcp

Length of output: 110

🏁 Script executed:

# Check the metric definition to see what indom is actually declared grep -A2 "mem.numa.util.total" src/collectl2pcp/metrics.c

Repository: performancecopilot/pcp

Length of output: 313

🏁 Script executed:

# Verify if metrics with 0xf000013 indom expect per-node instances grep "0xf000013" src/collectl2pcp/metrics.c | head -10

Repository: performancecopilot/pcp

Length of output: 1063

🏁 Script executed:

# Confirm: check how buddy_handler uses indom vs how it's defined grep "mem.buddyinfo" src/collectl2pcp/metrics.c | head -3

Repository: performancecopilot/pcp

Length of output: 260

🏁 Script executed:

# Look at the complete picture - what indom should numai use? grep "buddy\|BUDDYINFO" src/collectl2pcp/*.c src/collectl2pcp/*.h

Repository: performancecopilot/pcp

Length of output: 1583

The mem.numa.util.* metrics are defined with instance domain 0xf000013 (the node domain) but numai_handler writes them with PM_INDOM_NULL, causing data loss across multiple nodes.

Lines 228-239 call put_str_value() with PM_INDOM_NULL and no instance name, which means each new node's data overwrites the previous node's values instead of creating per-node instances. This contradicts the metric definitions in metrics.c, which declare these metrics with indom 0xf000013 (the same instance domain used by kernel.pernode.cpu.* and hinv.node.online).

The code comment acknowledges: "Since we cannot recover the node number from the line itself, we map to the aggregate mem.numa.util.* metrics." However, the metrics are not defined as singular aggregates—they are defined with a node instance domain, creating a design mismatch.

Either:

Redefine mem.numa.util.* metrics with PM_INDOM_NULL if aggregates are intended, or

Recover or infer the node identity and write per-instance values like buddy_handler does.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/collectl2pcp/subsys.c` around lines 228 - 239, The mem.numa.util.* metrics (mem.numa.util.total, mem.numa.util.free, mem.numa.util.used, mem.numa.util.active, mem.numa.util.inactive, mem.numa.util.dirty) in numai_handler are being written with PM_INDOM_NULL and no instance name, but these metrics are defined with instance domain 0xf000013 (node domain) in metrics.c. This causes data loss across multiple nodes. Either redefine these metrics in metrics.c to use PM_INDOM_NULL as their instance domain if they should be aggregates, or modify the put_str_value calls in lines 228-239 to use the correct instance domain 0xf000013 with appropriate node instance names. Choose one approach and apply it consistently across all six mem.numa.util.* metrics.

Four RST pages under docs/HowTos/collectl/: index.rst - overview, quick start, key differences from collectl getting-started - subsystem codes, output modes, rate calculation, filtering, common examples subsystems - per-subsystem metric reference with PCP metric names and output column layouts for all 18 subsystems archives - archive naming, daemon mode, log rotation with xz/.meta + zstd/data compression, culling, playback, collectl2pcp migration path colmux - pcp-colmux multi-host monitoring: PDSH expansion, sorted/columnar display, interactive controls, playback, legacy socket deprecation note Content adapted from the original collectl HTML documentation (http://collectl.sourceforge.net/) by Mark Seger, HP. Signed-off-by: Nathan Scott <nathans@redhat.com>

coderabbitai Bot reviewed Jun 20, 2026

View reviewed changes

natoscott force-pushed the linux-collectl2pcp-subsys branch from 8fcb59e to 862562f Compare June 21, 2026 02:39

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

linux, collectl2pcp: add missing subsystem handlers and cpu frequency metric#2628

linux, collectl2pcp: add missing subsystem handlers and cpu frequency metric#2628
natoscott wants to merge 2 commits into
performancecopilot:mainfrom
natoscott:linux-collectl2pcp-subsys

natoscott commented Jun 20, 2026

Uh oh!

coderabbitai Bot commented Jun 20, 2026 •

edited

Loading

Summary by CodeRabbit

Walkthrough

Changes

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Jun 20, 2026

Uh oh!

coderabbitai Bot Jun 20, 2026

Uh oh!

coderabbitai Bot Jun 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

natoscott commented Jun 20, 2026

Uh oh!

coderabbitai Bot commented Jun 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Walkthrough

Changes

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 20, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 20, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 20, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

coderabbitai Bot commented Jun 20, 2026 •

edited

Loading