promdb2pcp: new tool to import Prometheus node_exporter metrics#2623
promdb2pcp: new tool to import Prometheus node_exporter metrics#2623Aniruddh9 wants to merge 3 commits into
Conversation
Add promdb2pcp, a Python import tool that converts Prometheus node_exporter metrics (stored as JSON from the query_range API) into PCP archives for analysis with pmrep, pcp2csv, pmchart, etc. Supported metric groups: CPU (per-cpu/per-mode), memory, disk I/O, network interfaces, load averages, scheduler counters, vmstat counters, and PSI pressure stall information. Input is a directory of JSON files produced by the Prometheus HTTP API /api/v1/query_range endpoint. An optional metadata.json file provides hostname and timezone for the archive header.
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository YAML (base), Repository UI (inherited), Organization UI (inherited) Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughSummary by CodeRabbitRelease Notes
WalkthroughA new Changespromdb2pcp tool
Sequence Diagram(s)sequenceDiagram
participant CLI as promdb2pcp CLI
participant convert
participant metadata.json
participant RegisterHelpers as register_*_metrics
participant pmiLogImport as pmiLogImport handle
CLI->>convert: datadir, archive, hostname, verbose
convert->>pmiLogImport: pmiStart(archive)
convert->>metadata.json: load hostname/timezone (optional)
metadata.json-->>convert: hostname, timezone (or fallbacks)
convert->>pmiLogImport: pmiSetHostname(), pmiSetTimezone()
convert->>RegisterHelpers: log, data_dir, verbose
RegisterHelpers->>pmiLogImport: pmiAddMetric(), pmiAddInstance(), pmiPutText()
RegisterHelpers-->>convert: timestamps, metric_values
loop sorted timestamps
convert->>pmiLogImport: pmiPutValue(metric, instance, value)
convert->>pmiLogImport: pmiWrite(sec, usec)
end
convert->>pmiLogImport: del log
convert-->>CLI: archive written
Poem
🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/promdb2pcp/promdb2pcp.py`:
- Around line 307-313: The code at line 312 silently selects the first matching
time series with `matching[0]` without validating that exactly one series
matches the metric name. This causes multiple time series with different labels
(such as different instances or jobs) to be silently discarded. Add validation
after the `if not matching: continue` check to ensure exactly one series
matches, or add explicit filtering by label before accessing the matching list.
If multiple matches are found, either raise an error to alert the user about the
ambiguity, skip the metric, or implement label-based filtering to select the
correct series.
- Around line 316-322: The bare except blocks that catch pmi.pmiErr exceptions
and use pass are suppressing critical PCP API failures without any logging or
tracking, allowing partial archives to be created while still exiting
successfully. At each location where pmi.pmiErr is caught (in the
pmiAddMetric/pmiPutText block around lines 316-322, the instance registration
block around lines 393-399, the value write block around lines 451-457, and the
text registration block around lines 560-561), replace the empty pass statement
with proper error handling that logs the failure using the existing log facility
and tracks that a failure occurred. After processing all metrics, check if any
failures were recorded and return a non-zero exit code to signal that the import
failed, ensuring bad imports are visible and actionable rather than silently
producing partial archives.
- Around line 490-499: The json.load(f) call at line 492 can raise a
JSONDecodeError if metadata.json is malformed, causing the entire conversion to
crash even though metadata is optional. Wrap the file open and json.load
operations in a try-except block that catches JSON decode errors, and when
parsing fails, skip the metadata configuration (hostname and timezone setup) and
allow the conversion to continue with default values instead of propagating the
exception.
- Around line 259-267: The `load_json` function lacks exception handling around
the `json.load()` call and file operations, which means malformed JSON or I/O
errors will cause the entire import to fail. Wrap the `with open(filepath)`
block and `json.load(f)` call in a try-except block to catch JSONDecodeError and
other I/O exceptions. When an exception occurs, return None (or log a warning if
desired) so that the function degrades gracefully and allows other valid metric
groups to continue being processed instead of aborting the entire import.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Repository UI (inherited), Organization UI (inherited)
Review profile: CHILL
Plan: Pro Plus
Run ID: f6119efb-4df7-436e-8d7a-6ee36bd37f10
📒 Files selected for processing (4)
src/GNUmakefilesrc/promdb2pcp/GNUmakefilesrc/promdb2pcp/promdb2pcp.1src/promdb2pcp/promdb2pcp.py
| matching = [r for r in results | ||
| if r['metric'].get('__name__') == prom_name] | ||
| if not matching: | ||
| continue | ||
|
|
||
| r = matching[0] | ||
| units = pmapi.pmUnits(*units_args) |
There was a problem hiding this comment.
Avoid silently picking the first time series when multiple labelsets match.
At Line 312, matching[0] discards additional series for the same metric name. If input contains multiple instances/jobs, this imports arbitrary data and produces incorrect archives. Validate that exactly one series matches (or explicitly filter by label) before ingesting.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/promdb2pcp/promdb2pcp.py` around lines 307 - 313, The code at line 312
silently selects the first matching time series with `matching[0]` without
validating that exactly one series matches the metric name. This causes multiple
time series with different labels (such as different instances or jobs) to be
silently discarded. Add validation after the `if not matching: continue` check
to ensure exactly one series matches, or add explicit filtering by label before
accessing the matching list. If multiple matches are found, either raise an
error to alert the user about the ambiguity, skip the metric, or implement
label-based filtering to select the correct series.
Harden promdb2pcp against malformed input files and registration failures: - load_json(): catch OSError and JSONDecodeError instead of crashing - register_simple_metrics(): warn on multiple matching series and report pmiErr details instead of silently swallowing them - convert(): handle metadata.json parse failures gracefully with a warning
| if verbose: | ||
| print('%s (%d samples)' % (pcp_name, len(values))) | ||
|
|
||
| cluster += 1 |
There was a problem hiding this comment.
In PCP, when a new cluster is created the item ID number should reset to 0. It looks like when you create a new cluster the item ID keeps incrementing without being reset.
| (prom_name, pcp_name, sem, units_args, | ||
| pcp_type, divisor, helptext) = entry | ||
|
|
||
| matching = [r for r in results |
There was a problem hiding this comment.
Can we move this line above the entries loop and create a hash map for the names?
Then inside the loop we can do a quick O(1) lookup instead of iterating through the results variable each time?
| item += 1 | ||
| values = {} | ||
| for ts, val in r.get('values', []): | ||
| fts = float(ts) |
There was a problem hiding this comment.
Add try/except clause to catch malformed data resulting in TypeError or ValueError and skip it smoothly
| for mode in CPU_MODES: | ||
| pcp_name = 'kernel.percpu.cpu.' + mode | ||
| units = pmapi.pmUnits(0, 1, 0, 0, PM_TIME_MSEC, 0) | ||
| indom = log.pmiInDom(DOMAIN, serial) |
There was a problem hiding this comment.
These metrics should share the same instance domain. So we should move the indom line outside of the loop and the serial should be constant for these metrics.
|
|
||
| for cpu_num in cpu_numbers: | ||
| try: | ||
| log.pmiAddInstance(indom, 'cpu' + cpu_num, int(cpu_num)) |
There was a problem hiding this comment.
on some systems the 'cpu' label could have a value of 'cpu-0' instead of just '0' in this case the int(cpu_num) would fail
|
Hi Ani! Thank you so much for your contribution :) This looks like a great addition to PCP I left some comments in line on the files. Along with those we should have some QA to test the functionality of the tool added to our testsuite under pcp/qa. You can use the ./new script to generate a new test number along with a skeleton qa test script. Or you can search around and see if there is an existing QA test available where it would make sense to add to it. Let me know if you have any questions Lauren |
Add promdb2pcp, a Python import tool that converts Prometheus node_exporter metrics (stored as JSON from the query_range API) into PCP archives for analysis with pmrep, pcp2csv, pmchart, etc.
Supported metric groups: CPU (per-cpu/per-mode), memory, disk I/O, network interfaces, load averages, scheduler counters, vmstat counters, and PSI pressure stall information.
Input is a directory of JSON files produced by the Prometheus HTTP API /api/v1/query_range endpoint. An optional metadata.json file provides hostname and timezone for the archive header.
Pull Request Description
Related Issues :
New feature — no existing issue.
Fixes #
Checklist
Description :
Adds promdb2pcp, a new Python import tool that converts Prometheus node_exporter metrics into PCP archives.
Background: When analyzing OpenShift/Kubernetes node performance, Prometheus node_exporter metrics are often the primary data source. Currently there is no way to import this data into PCP for analysis with tools like pmrep, pcp2csv, and pmchart. This tool bridges that gap.
What it does: Reads a directory of JSON files produced by the Prometheus HTTP API /api/v1/query_range endpoint and creates a PCP archive with proper metric metadata, instance domains, and help text.
Supported metric groups: CPU (per-cpu/per-mode), memory, disk I/O, network interfaces, load averages, scheduler counters, vmstat counters, and PSI pressure stall information.
Conventions: Follows the same patterns as guidellm2pcp and vllmbench2pcp — #!/usr/bin/pmpython shebang, GPL header, pmiID/pmiInDom for proper PMIDs, help text via pmiPutText, domain 510.
Commits :
Single commit: promdb2pcp: new tool to import Prometheus node_exporter metrics
Files added/modified:
src/promdb2pcp/promdb2pcp.py — the import tool
src/promdb2pcp/GNUmakefile — build integration
src/promdb2pcp/promdb2pcp.1 — man page
src/GNUmakefile — register promdb2pcp in the build
Documentation updated
Man page promdb2pcp.1 included with synopsis, description, options, examples, and SEE ALSO references.
Tests added/updated
Manually tested with real Prometheus data (80 timestamps, 51 metrics, 8 CPUs, 2 disk devices, 3 network interfaces).