linux, collectl2pcp: add missing subsystem handlers and cpu frequency metric #2628
natoscott wants to merge 2 commits into …
… metric

linux PMDA: add hinv.cpu.frequency_scaling.current (item 10 in CLUSTER_SYSFS_DEVICES) reading /sys/devices/system/cpu/*/cpufreq/scaling_cur_freq per CPU. The existing freq.scale field already holds this value from refresh_sysfs_frequency_scaling_cur_freq(); we simply expose it as a new instant metric alongside the existing .min/.max.

collectl2pcp: add handlers for subsystems that had no coverage:
sock - network.sockstat.* from /proc/net/sockstat
fs-ds - vfs.dentry.count from /proc/sys/fs/dentry-state
fs-is - vfs.inodes.count from /proc/sys/fs/inode-nr
fs-fnr - vfs.files.count from /proc/sys/fs/file-nr
buddy - mem.buddyinfo.total from /proc/buddyinfo (per zone)
nfsc- - nfs.client.calls from /proc/net/rpc/nfs
nfss- - nfs.server.calls from /proc/net/rpc/nfsd
numai - mem.numa.util.* from /sys/devices/system/node/nodeN/meminfo

Test data: qa/collectl/subsys_test.raw.gz (1.9KB, 3 samples) collected using collectl -s sibm to exercise all new handlers.

Signed-off-by: Nathan Scott <nathans@redhat.com>
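For readers unfamiliar with the sysfs side, here is a minimal C sketch of reading the per-CPU value that hinv.cpu.frequency_scaling.current exposes. It assumes the standard cpufreq sysfs layout, where each cpuN/cpufreq/scaling_cur_freq file holds one decimal value in kHz. The helper names (cur_freq_path, parse_khz, read_cur_freq) are hypothetical, and this is not the Linux PMDA's refresh_sysfs_frequency_scaling_cur_freq() code.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: build the sysfs path for one CPU.
 * Returns -1 if buf is too small to hold the full path. */
static int cur_freq_path(char *buf, size_t len, unsigned int cpu)
{
    int n = snprintf(buf, len,
                     "/sys/devices/system/cpu/cpu%u/cpufreq/scaling_cur_freq", cpu);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Hypothetical helper: parse the file contents, one unsigned decimal
 * value (kHz) optionally followed by a newline. Returns 0 on success. */
static int parse_khz(const char *text, unsigned long long *khz)
{
    char *end;
    unsigned long long v;

    if (!isdigit((unsigned char)text[0]))
        return -1;                      /* rejects "", "-1", "abc" */
    errno = 0;
    v = strtoull(text, &end, 10);
    if (errno == ERANGE)
        return -1;                      /* value does not fit */
    if (*end == '\n')
        end++;
    if (*end != '\0')
        return -1;                      /* trailing garbage */
    *khz = v;
    return 0;
}

/* Hypothetical helper: read one CPU's current frequency in kHz.
 * Returns -1 if the file is missing or unreadable. */
static int read_cur_freq(unsigned int cpu, unsigned long long *khz)
{
    char path[96], text[32];
    FILE *fp;

    if (cur_freq_path(path, sizeof(path), cpu) < 0)
        return -1;
    if ((fp = fopen(path, "r")) == NULL)
        return -1;
    if (fgets(text, sizeof(text), fp) == NULL) {
        fclose(fp);
        return -1;
    }
    fclose(fp);
    return parse_khz(text, khz);
}
```

A missing file (for example, no cpufreq driver loaded) simply yields -1, which a caller could treat as "no value" for that CPU instance rather than as an error.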
No actionable comments were generated in the recent review. 🎉
📝 Walkthrough
This PR adds eight new collectl subsystem handlers to …
Changes:
- collectl2pcp subsystem handlers
- Linux PMDA: hinv.cpu.frequency_scaling.current
- pcp-collectl and pcp-colmux user documentation
🚥 Pre-merge checks: ✅ 4 of 4 passed
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/collectl2pcp/subsys.c`:
- Around lines 51-67: The field index boundary checks in the sock_handler
function have off-by-one errors that could cause out-of-bounds reads. In the
UDPLITE, RAW, and FRAG cases, the condition checking `f->nfields >= 3` before
accessing `f->fields[3]` should be `>= 4` since accessing index 3 requires at
least 4 fields. In the UDP case, change the condition `f->nfields >= 7` to `>=
6` before accessing `f->fields[5]` since index 5 only requires 6 fields minimum.
In the FRAG case, also change the secondary condition from `f->nfields >= 5` to
`>= 6` before accessing `f->fields[5]`. Update all these boundary conditions to
prevent potential out-of-bounds array access.
- Around lines 149-152: The loop in the buddy_handler function has undefined
behavior and overflow issues. First, add a guard check to ensure that (i - 5) is
less than 64 before performing the left shift operation (1ULL << (i - 5)), as
shifting by 64 or more bits on a 64-bit type is undefined. Second, add overflow
checking for the multiplication of count and the shifted value, and also add
overflow checking for the addition to total. Consider using safe arithmetic
functions or manual checks to detect when these operations would exceed the
limits of unsigned long long, and handle such cases appropriately (either by
capping the value, breaking early, or logging an error).
- Around lines 228-239: The mem.numa.util.* metrics (mem.numa.util.total,
mem.numa.util.free, mem.numa.util.used, mem.numa.util.active,
mem.numa.util.inactive, mem.numa.util.dirty) in numai_handler are being written
with PM_INDOM_NULL and no instance name, but these metrics are defined with
instance domain 0xf000013 (node domain) in metrics.c. This causes data loss
across multiple nodes. Either redefine these metrics in metrics.c to use
PM_INDOM_NULL as their instance domain if they should be aggregates, or modify
the put_str_value calls in lines 228-239 to use the correct instance domain
0xf000013 with appropriate node instance names. Choose one approach and apply it
consistently across all six mem.numa.util.* metrics.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Repository UI (inherited), Organization UI (inherited)
Review profile: CHILL
Plan: Pro Plus
Run ID: eb518945-2746-42ef-a15c-fb4202f1de07
⛔ Files ignored due to path filters (1)
qa/collectl/subsys_test.raw.gz is excluded by !**/*.gz
📒 Files selected for processing (7)
- src/collectl2pcp/GNUmakefile
- src/collectl2pcp/collectl2pcp.c
- src/collectl2pcp/metrics.h
- src/collectl2pcp/subsys.c
- src/pmdas/linux/help
- src/pmdas/linux/pmda.c
- src/pmdas/linux/root_linux
else if (strcmp(f->fields[1], "UDP:") == 0 && f->nfields >= 5) {
    /* UDP: inuse N mem N */
    put_str_value("network.sockstat.udp.inuse", PM_INDOM_NULL, NULL, f->fields[3]);
    if (f->nfields >= 7)
        put_str_value("network.sockstat.udp.mem", PM_INDOM_NULL, NULL, f->fields[5]);
}
else if (strcmp(f->fields[1], "UDPLITE:") == 0 && f->nfields >= 3) {
    put_str_value("network.sockstat.udplite.inuse", PM_INDOM_NULL, NULL, f->fields[3]);
}
else if (strcmp(f->fields[1], "RAW:") == 0 && f->nfields >= 3) {
    put_str_value("network.sockstat.raw.inuse", PM_INDOM_NULL, NULL, f->fields[3]);
}
else if (strcmp(f->fields[1], "FRAG:") == 0 && f->nfields >= 3) {
    put_str_value("network.sockstat.frag.inuse", PM_INDOM_NULL, NULL, f->fields[3]);
    if (f->nfields >= 5)
        put_str_value("network.sockstat.frag.memory", PM_INDOM_NULL, NULL, f->fields[5]);
}
Fix off-by-one field guards in sock_handler to prevent OOB reads.
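At lines 57, 60 and 63, f->fields[3] is accessed with only f->nfields >= 3; reading index 3 requires >= 4.
At lines 55 and 66, f->fields[5] is guarded by >= 7 (UDP, too strict) and >= 5 (FRAG, too loose); both should be >= 6 for index 5. With >= 7, a UDP line whose mem value sits at index 5 (six fields in total) never records network.sockstat.udp.mem; with >= 5, a five-field FRAG line reads fields[5], one past the last parsed field.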
Suggested patch
- else if (strcmp(f->fields[1], "UDP:") == 0 && f->nfields >= 5) {
+ else if (strcmp(f->fields[1], "UDP:") == 0 && f->nfields >= 4) {
/* UDP: inuse N mem N */
put_str_value("network.sockstat.udp.inuse", PM_INDOM_NULL, NULL, f->fields[3]);
- if (f->nfields >= 7)
+ if (f->nfields >= 6)
put_str_value("network.sockstat.udp.mem", PM_INDOM_NULL, NULL, f->fields[5]);
}
- else if (strcmp(f->fields[1], "UDPLITE:") == 0 && f->nfields >= 3) {
+ else if (strcmp(f->fields[1], "UDPLITE:") == 0 && f->nfields >= 4) {
put_str_value("network.sockstat.udplite.inuse", PM_INDOM_NULL, NULL, f->fields[3]);
}
- else if (strcmp(f->fields[1], "RAW:") == 0 && f->nfields >= 3) {
+ else if (strcmp(f->fields[1], "RAW:") == 0 && f->nfields >= 4) {
put_str_value("network.sockstat.raw.inuse", PM_INDOM_NULL, NULL, f->fields[3]);
}
- else if (strcmp(f->fields[1], "FRAG:") == 0 && f->nfields >= 3) {
+ else if (strcmp(f->fields[1], "FRAG:") == 0 && f->nfields >= 4) {
put_str_value("network.sockstat.frag.inuse", PM_INDOM_NULL, NULL, f->fields[3]);
- if (f->nfields >= 5)
+ if (f->nfields >= 6)
put_str_value("network.sockstat.frag.memory", PM_INDOM_NULL, NULL, f->fields[5]);
     }
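The rule behind this comment is that index k is readable only when nfields >= k + 1. A minimal sketch of that rule as an accessor, using a hypothetical fields_sketch_t in place of collectl2pcp's real parsed-line type. The second helper, value_after(), is a swapped-in alternative to fixed indices, not what the patch does: it looks a value up by the label that precedes it in the sockstat line (inuse, mem, memory), so a short line yields NULL instead of an out-of-bounds read.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for collectl2pcp's parsed-line type. */
typedef struct {
    int    nfields;   /* number of whitespace-separated tokens */
    char **fields;    /* fields[1] holds the sockstat label, e.g. "UDP:" */
} fields_sketch_t;

/* Return fields[k] only if it exists: index k needs nfields >= k + 1. */
static const char *field_at(const fields_sketch_t *f, int k)
{
    return (k >= 0 && k < f->nfields) ? f->fields[k] : NULL;
}

/* Return the token following `label`, searching from index 2 (past the
 * sockstat label), or NULL if the label is absent or is the last token. */
static const char *value_after(const fields_sketch_t *f, const char *label)
{
    for (int i = 2; i + 1 < f->nfields; i++)
        if (strcmp(f->fields[i], label) == 0)
            return f->fields[i + 1];
    return NULL;
}
```

With this approach the UDP case could call put_str_value() for udp.mem only when value_after(f, "mem") is non-NULL, and the per-case nfields arithmetic goes away; whether that fits the rest of subsys.c is a judgment call for the author.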
for (i = 5; i < f->nfields; i++) {
    unsigned long long count = strtoull(f->fields[i], NULL, 10);
    total += count * (1ULL << (i - 5));
}
Guard buddy_handler arithmetic against undefined shift and overflow.
Line 151 computes 1ULL << (i - 5), where the shift count grows with the number of parsed input fields; once (i - 5) >= 64, the shift is undefined behavior. Also, count * (...) and total += ... can wrap silently.
Suggested patch
+#include <limits.h>
@@
for (i = 5; i < f->nfields; i++) {
+ unsigned int order = (unsigned int)(i - 5);
unsigned long long count = strtoull(f->fields[i], NULL, 10);
- total += count * (1ULL << (i - 5));
+ if (order >= 64)
+ break; /* avoid undefined shift */
+ if (count > (ULLONG_MAX >> order))
+ continue; /* skip unrepresentable term */
+ {
+ unsigned long long pages = (count << order);
+ if (ULLONG_MAX - total < pages) {
+ total = ULLONG_MAX;
+ break;
+ }
+ total += pages;
+ }
     }
As per coding guidelines, "Integer overflow: check arithmetic on untrusted sizes."
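A saturating variant of the suggested fix, shown as a standalone function over an array of per-order counts: counts[k] is the number of free blocks of 2^k pages, which in the handler would come from fields[5] onward (order = i - 5). The function name is hypothetical. Saturating the whole total is a deliberate swap for the patch's skip-the-term behavior, so an unrepresentable term shows up as ULLONG_MAX rather than silently lowering the sum; a zero count at a huge order is treated as harmless.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: total free pages = sum of counts[k] * 2^k.
 * Returns ULLONG_MAX instead of shifting out of range or wrapping. */
static unsigned long long buddy_total_pages(const unsigned long long *counts,
                                            size_t norders)
{
    const size_t width = sizeof(unsigned long long) * CHAR_BIT;
    unsigned long long total = 0;

    for (size_t order = 0; order < norders; order++) {
        unsigned long long count = counts[order], pages;

        if (count == 0)
            continue;                   /* zero term, whatever the order */
        if (order >= width || count > (ULLONG_MAX >> order))
            return ULLONG_MAX;          /* count << order would overflow */
        pages = count << order;
        if (ULLONG_MAX - total < pages)
            return ULLONG_MAX;          /* total + pages would overflow */
        total += pages;
    }
    return total;
}
```

On well-formed /proc/buddyinfo input only a handful of orders appear (typically 0 through 10), so neither guard fires; they only matter for malformed or hostile raw files.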
if (strcmp(f->fields[1], "MemTotal:") == 0)
    put_str_value("mem.numa.util.total", PM_INDOM_NULL, NULL, f->fields[2]);
else if (strcmp(f->fields[1], "MemFree:") == 0)
    put_str_value("mem.numa.util.free", PM_INDOM_NULL, NULL, f->fields[2]);
else if (strcmp(f->fields[1], "MemUsed:") == 0)
    put_str_value("mem.numa.util.used", PM_INDOM_NULL, NULL, f->fields[2]);
else if (strcmp(f->fields[1], "Active:") == 0)
    put_str_value("mem.numa.util.active", PM_INDOM_NULL, NULL, f->fields[2]);
else if (strcmp(f->fields[1], "Inactive:") == 0)
    put_str_value("mem.numa.util.inactive", PM_INDOM_NULL, NULL, f->fields[2]);
else if (strcmp(f->fields[1], "Dirty:") == 0)
    put_str_value("mem.numa.util.dirty", PM_INDOM_NULL, NULL, f->fields[2]);
The mem.numa.util.* metrics are defined with instance domain 0xf000013 (the node domain) but numai_handler writes them with PM_INDOM_NULL, causing data loss across multiple nodes.
Lines 228-239 call put_str_value() with PM_INDOM_NULL and no instance name, which means each new node's data overwrites the previous node's values instead of creating per-node instances. This contradicts the metric definitions in metrics.c, which declare these metrics with indom 0xf000013 (the same instance domain used by kernel.pernode.cpu.* and hinv.node.online).
The code comment acknowledges: "Since we cannot recover the node number from the line itself, we map to the aggregate mem.numa.util.* metrics." However, the metrics are not defined as singular aggregates—they are defined with a node instance domain, creating a design mismatch.
Either:
- Redefine the `mem.numa.util.*` metrics with `PM_INDOM_NULL` if aggregates are intended, or
- Recover or infer the node identity and write per-instance values, as `buddy_handler` does.
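The review's reading of 0xf000013 as the node instance domain can be checked by decoding the value by hand. A minimal sketch, assuming the standard PCP pmInDom layout (1-bit flag, 9-bit PMDA domain, 22-bit serial) that PCP's own pmInDom_domain() and pmInDom_serial() macros implement; the helper names below are hypothetical and avoid a libpcp dependency. The value decodes to domain 60 (the Linux PMDA) and serial 19, i.e. the indom PCP tools print as 60.19.

```c
#include <assert.h>

/* Hypothetical helpers mirroring PCP's pmInDom bit layout:
 * bit 31 = flag, bits 30..22 = PMDA domain, bits 21..0 = serial. */
static unsigned int indom_domain(unsigned int indom)
{
    return (indom >> 22) & 0x1ffu;
}

static unsigned int indom_serial(unsigned int indom)
{
    return indom & 0x3fffffu;
}

/* Pack a domain and serial back into an indom (flag bit left clear). */
static unsigned int indom_build(unsigned int domain, unsigned int serial)
{
    return ((domain & 0x1ffu) << 22) | (serial & 0x3fffffu);
}
```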
RST pages under docs/HowTos/collectl/ (index.rst plus four topic pages):
index.rst - overview, quick start, key differences from collectl
getting-started - subsystem codes, output modes, rate calculation,
filtering, common examples
subsystems - per-subsystem metric reference with PCP metric names
and output column layouts for all 18 subsystems
archives - archive naming, daemon mode, log rotation with
xz/.meta + zstd/data compression, culling, playback,
collectl2pcp migration path
colmux - pcp-colmux multi-host monitoring: PDSH expansion,
sorted/columnar display, interactive controls,
playback, legacy socket deprecation note
Content adapted from the original collectl HTML documentation
(http://collectl.sourceforge.net/) by Mark Seger, HP.
Signed-off-by: Nathan Scott <nathans@redhat.com>
Branch updated from 8fcb59e to 862562f.