Skip to content

promdb2pcp: new tool to import Prometheus node_exporter metrics#2623

Open
Aniruddh9 wants to merge 3 commits into
performancecopilot:mainfrom
Aniruddh9:promdb2pcp
Open

promdb2pcp: new tool to import Prometheus node_exporter metrics#2623
Aniruddh9 wants to merge 3 commits into
performancecopilot:mainfrom
Aniruddh9:promdb2pcp

Conversation

@Aniruddh9

Copy link
Copy Markdown

Add promdb2pcp, a Python import tool that converts Prometheus node_exporter metrics (stored as JSON from the query_range API) into PCP archives for analysis with pmrep, pcp2csv, pmchart, etc.

Supported metric groups: CPU (per-cpu/per-mode), memory, disk I/O, network interfaces, load averages, scheduler counters, vmstat counters, and PSI pressure stall information.

Input is a directory of JSON files produced by the Prometheus HTTP API /api/v1/query_range endpoint. An optional metadata.json file provides hostname and timezone for the archive header.

Pull Request Description

Related Issues :

New feature — no existing issue.

Fixes #

Checklist

Description :
Adds promdb2pcp, a new Python import tool that converts Prometheus node_exporter metrics into PCP archives.

Background: When analyzing OpenShift/Kubernetes node performance, Prometheus node_exporter metrics are often the primary data source. Currently there is no way to import this data into PCP for analysis with tools like pmrep, pcp2csv, and pmchart. This tool bridges that gap.

What it does: Reads a directory of JSON files produced by the Prometheus HTTP API /api/v1/query_range endpoint and creates a PCP archive with proper metric metadata, instance domains, and help text.

Supported metric groups: CPU (per-cpu/per-mode), memory, disk I/O, network interfaces, load averages, scheduler counters, vmstat counters, and PSI pressure stall information.

Conventions: Follows the same patterns as guidellm2pcp and vllmbench2pcp — #!/usr/bin/pmpython shebang, GPL header, pmiID/pmiInDom for proper PMIDs, help text via pmiPutText, domain 510.

Commits :
Single commit: promdb2pcp: new tool to import Prometheus node_exporter metrics

Files added/modified:

src/promdb2pcp/promdb2pcp.py — the import tool

src/promdb2pcp/GNUmakefile — build integration

src/promdb2pcp/promdb2pcp.1 — man page

src/GNUmakefile — register promdb2pcp in the build

Documentation updated

Man page promdb2pcp.1 included with synopsis, description, options, examples, and SEE ALSO references.
Tests added/updated

Manually tested with real Prometheus data (80 timestamps, 51 metrics, 8 CPUs, 2 disk devices, 3 network interfaces).

Add promdb2pcp, a Python import tool that converts Prometheus
node_exporter metrics (stored as JSON from the query_range API)
into PCP archives for analysis with pmrep, pcp2csv, pmchart, etc.

Supported metric groups: CPU (per-cpu/per-mode), memory, disk I/O,
network interfaces, load averages, scheduler counters, vmstat
counters, and PSI pressure stall information.

Input is a directory of JSON files produced by the Prometheus HTTP
API /api/v1/query_range endpoint.  An optional metadata.json file
provides hostname and timezone for the archive header.
@coderabbitai

coderabbitai Bot commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Repository UI (inherited), Organization UI (inherited)

Review profile: CHILL

Plan: Pro Plus

Run ID: 61eeaf4a-aada-4240-b460-378260b377ff

📥 Commits

Reviewing files that changed from the base of the PR and between 6198b81 and d77dcc7.

📒 Files selected for processing (1)
  • src/promdb2pcp/promdb2pcp.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/promdb2pcp/promdb2pcp.py

📝 Walkthrough

Summary by CodeRabbit

Release Notes

  • New Features

    • Added promdb2pcp to convert Prometheus query_range JSON files into PCP archive files, including CPU, disk, and network device metric support and optional archive metadata.
  • Documentation

    • Added a new manual page covering the tool’s purpose, supported metric groups, command-line options, and an example workflow.
  • Chores

    • Updated the build to include the new tool for default builds, installation, and linting.

Walkthrough

A new promdb2pcp tool is added under src/promdb2pcp/. It converts Prometheus query_range JSON files (cpu, memory, disk, network, etc.) into PCP archives via pmiLogImport. The PR includes the Python implementation, a GNUmakefile, a man page, and wires the subdirectory into the top-level build.

Changes

promdb2pcp tool

Layer / File(s) Summary
Build system wiring
src/GNUmakefile, src/promdb2pcp/GNUmakefile
promdb2pcp is appended to OTHER_SUBDIRS in the top-level makefile. The new subdirectory makefile defines build, install (gated on HAVE_PYTHON), and lint (PYLINT/MANLINT) targets.
Imports, constants, and metric mapping tables
src/promdb2pcp/promdb2pcp.py
Module-level constants DOMAIN and CPU_MODES are defined, PCP bindings are imported, and SOURCE_FILES, DISK_METRICS, and NET_METRICS data structures map Prometheus metric names to PCP metric definitions with units, types, indom semantics, and scaling factors.
JSON helpers and metric registration functions
src/promdb2pcp/promdb2pcp.py
load_json(), discover_instances(), and build_indexed_data() handle Prometheus JSON ingestion. register_simple_metrics(), register_cpu_metrics(), and register_device_metrics() call pmiAddMetric(), pmiPutText(), and pmiAddInstance() and return aggregated timestamp sets and metric value payloads.
Archive writing orchestration, CLI, and man page
src/promdb2pcp/promdb2pcp.py, src/promdb2pcp/promdb2pcp.1
convert() initialises the pmiLogImport handle, reads optional metadata.json for hostname/timezone, calls all registration helpers, then iterates sorted timestamps to write values with pmiPutValue() and commit with pmiWrite(). The argparse CLI validates datadir, resolves a default archive path, and calls convert(). The man page documents all options, supported metric groups, and an example workflow.

Sequence Diagram(s)

sequenceDiagram
  participant CLI as promdb2pcp CLI
  participant convert
  participant metadata.json
  participant RegisterHelpers as register_*_metrics
  participant pmiLogImport as pmiLogImport handle

  CLI->>convert: datadir, archive, hostname, verbose
  convert->>pmiLogImport: pmiStart(archive)
  convert->>metadata.json: load hostname/timezone (optional)
  metadata.json-->>convert: hostname, timezone (or fallbacks)
  convert->>pmiLogImport: pmiSetHostname(), pmiSetTimezone()
  convert->>RegisterHelpers: log, data_dir, verbose
  RegisterHelpers->>pmiLogImport: pmiAddMetric(), pmiAddInstance(), pmiPutText()
  RegisterHelpers-->>convert: timestamps, metric_values
  loop sorted timestamps
    convert->>pmiLogImport: pmiPutValue(metric, instance, value)
    convert->>pmiLogImport: pmiWrite(sec, usec)
  end
  convert->>pmiLogImport: del log
  convert-->>CLI: archive written
Loading

Poem

A rabbit hops through metrics galore,
Prometheus JSON? I'll convert the store.
CPU modes and disk devices, all mapped with care,
PCP archives grow with data to share.
🐇 pmiWrite() away, no timestamp left bare!

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely summarizes the main change: introducing promdb2pcp, a new tool for importing Prometheus metrics into PCP archives.
Description check ✅ Passed The description is comprehensive and directly related to the changeset, explaining the purpose, functionality, and implementation details of the promdb2pcp tool.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/promdb2pcp/promdb2pcp.py`:
- Around line 307-313: The code at line 312 silently selects the first matching
time series with `matching[0]` without validating that exactly one series
matches the metric name. This causes multiple time series with different labels
(such as different instances or jobs) to be silently discarded. Add validation
after the `if not matching: continue` check to ensure exactly one series
matches, or add explicit filtering by label before accessing the matching list.
If multiple matches are found, either raise an error to alert the user about the
ambiguity, skip the metric, or implement label-based filtering to select the
correct series.
- Around line 316-322: The bare except blocks that catch pmi.pmiErr exceptions
and use pass are suppressing critical PCP API failures without any logging or
tracking, allowing partial archives to be created while still exiting
successfully. At each location where pmi.pmiErr is caught (in the
pmiAddMetric/pmiPutText block around lines 316-322, the instance registration
block around lines 393-399, the value write block around lines 451-457, and the
text registration block around lines 560-561), replace the empty pass statement
with proper error handling that logs the failure using the existing log facility
and tracks that a failure occurred. After processing all metrics, check if any
failures were recorded and return a non-zero exit code to signal that the import
failed, ensuring bad imports are visible and actionable rather than silently
producing partial archives.
- Around line 490-499: The json.load(f) call at line 492 can raise a
JSONDecodeError if metadata.json is malformed, causing the entire conversion to
crash even though metadata is optional. Wrap the file open and json.load
operations in a try-except block that catches JSON decode errors, and when
parsing fails, skip the metadata configuration (hostname and timezone setup) and
allow the conversion to continue with default values instead of propagating the
exception.
- Around line 259-267: The `load_json` function lacks exception handling around
the `json.load()` call and file operations, which means malformed JSON or I/O
errors will cause the entire import to fail. Wrap the `with open(filepath)`
block and `json.load(f)` call in a try-except block to catch JSONDecodeError and
other I/O exceptions. When an exception occurs, return None (or log a warning if
desired) so that the function degrades gracefully and allows other valid metric
groups to continue being processed instead of aborting the entire import.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Repository UI (inherited), Organization UI (inherited)

Review profile: CHILL

Plan: Pro Plus

Run ID: f6119efb-4df7-436e-8d7a-6ee36bd37f10

📥 Commits

Reviewing files that changed from the base of the PR and between 58eecb9 and 6198b81.

📒 Files selected for processing (4)
  • src/GNUmakefile
  • src/promdb2pcp/GNUmakefile
  • src/promdb2pcp/promdb2pcp.1
  • src/promdb2pcp/promdb2pcp.py

Comment thread src/promdb2pcp/promdb2pcp.py
Comment on lines +307 to +313
matching = [r for r in results
if r['metric'].get('__name__') == prom_name]
if not matching:
continue

r = matching[0]
units = pmapi.pmUnits(*units_args)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid silently picking the first time series when multiple labelsets match.

At Line 312, matching[0] discards additional series for the same metric name. If input contains multiple instances/jobs, this imports arbitrary data and produces incorrect archives. Validate that exactly one series matches (or explicitly filter by label) before ingesting.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/promdb2pcp/promdb2pcp.py` around lines 307 - 313, The code at line 312
silently selects the first matching time series with `matching[0]` without
validating that exactly one series matches the metric name. This causes multiple
time series with different labels (such as different instances or jobs) to be
silently discarded. Add validation after the `if not matching: continue` check
to ensure exactly one series matches, or add explicit filtering by label before
accessing the matching list. If multiple matches are found, either raise an
error to alert the user about the ambiguity, skip the metric, or implement
label-based filtering to select the correct series.

Comment thread src/promdb2pcp/promdb2pcp.py Outdated
Comment thread src/promdb2pcp/promdb2pcp.py
Aniruddh9 and others added 2 commits June 15, 2026 13:50
Harden promdb2pcp against malformed input files and registration failures:
- load_json(): catch OSError and JSONDecodeError instead of crashing
- register_simple_metrics(): warn on multiple matching series and report
  pmiErr details instead of silently swallowing them
- convert(): handle metadata.json parse failures gracefully with a warning
if verbose:
print('%s (%d samples)' % (pcp_name, len(values)))

cluster += 1

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In PCP, when a new cluster is created the item ID number should reset to 0. It looks like when you create a new cluster the item ID keeps incrementing without being reset.

(prom_name, pcp_name, sem, units_args,
pcp_type, divisor, helptext) = entry

matching = [r for r in results

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move this line above the entries loop and create a hash map for the names?
Then inside the loop we can do a quick O(1) lookup instead of iterating through the results variable each time?

item += 1
values = {}
for ts, val in r.get('values', []):
fts = float(ts)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add try/except clause to catch malformed data resulting in TypeError or ValueError and skip it smoothly

for mode in CPU_MODES:
pcp_name = 'kernel.percpu.cpu.' + mode
units = pmapi.pmUnits(0, 1, 0, 0, PM_TIME_MSEC, 0)
indom = log.pmiInDom(DOMAIN, serial)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These metrics should share the same instance domain. So we should move the indom line outside of the loop and the serial should be constant for these metrics.


for cpu_num in cpu_numbers:
try:
log.pmiAddInstance(indom, 'cpu' + cpu_num, int(cpu_num))

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on some systems the 'cpu' label could have a value of 'cpu-0' instead of just '0' in this case the int(cpu_num) would fail

@lmchilton

Copy link
Copy Markdown
Contributor

Hi Ani! Thank you so much for your contribution :) This looks like a great addition to PCP

I left some comments in line on the files.

Along with those we should have some QA to test the functionality of the tool added to our testsuite under pcp/qa. You can use the ./new script to generate a new test number along with a skeleton qa test script. Or you can search around and see if there is an existing QA test available where it would make sense to add to it.

Let me know if you have any questions

Lauren

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants