Skip to content

pmdas: add chronyd pmda#2613

Open
jeffpc wants to merge 1 commit into
performancecopilot:mainfrom
jeffpc:chrony
Open

pmdas: add chronyd pmda#2613
jeffpc wants to merge 1 commit into
performancecopilot:mainfrom
jeffpc:chrony

Conversation

@jeffpc

@jeffpc jeffpc commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

It retrieves chronyd tracking and source info, as well as source statistics by shelling out to chronyc and parsing the output. These are exposes as chrony.{tracking,sources,sourcestats}, respectively.

This is the first time I'm using the python pmda API, so there's a good chance I'm messing something up. You've been warned :D

It retrieves chronyd tracking and source info, as well as source
statistics by shelling out to chronyc and parsing the output.  These are
exposes as chrony.{tracking,sources,sourcestats}, respectively.
@coderabbitai

coderabbitai Bot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Added chrony PMDA to monitor NTP daemon metrics, including tracking status, source information, and source statistics.
  • Documentation

    • Added man page documentation for the chrony PMDA module.

Walkthrough

The PR adds a complete Python-based PMDA for the PCP performance monitoring system that collects chronyd synchronization metrics. The change registers domain 165, defines build and installation infrastructure, implements metric collection via chronyc command execution and CSV parsing, and documents the new agent.

Changes

Chrony PMDA Addition

Layer / File(s) Summary
Domain allocation and build registration
src/pmns/stdpmid.pcp, src/pmdas/GNUmakefile
CHRONY PMDA is assigned domain 165 in the PCP namespace; chrony is added to the PYPMDAS subdirectory list for build inclusion.
Build and installation infrastructure
src/pmdas/chrony/GNUmakefile, src/pmdas/chrony/Install, src/pmdas/chrony/Remove
GNUmakefile configures Python-conditional install, domain prerequisite, and check targets; Install and Remove scripts handle PMDA registration and lifecycle via PCP helper functions.
Python PMDA implementation
src/pmdas/chrony/pmdachrony.python
ChronyPMDA class executes chronyc commands (tracking, sources, sourcestats), parses CSV output using per-metric typed conversion functions, registers metrics with types/units/semantics, implements refresh to populate in-memory stores and update indom instances, and fetch callback to return metric values with error handling.
User documentation
src/pmdas/chrony/pmdachrony.1
Man page documents the chronyd PMDA, installation/removal procedures, relevant files, PCP environment variables, and related references.

Poem

🐰 A chrony PMDA hops into the fold,
with timing metrics, precise and bold.
From domain one-six-five, CSV streams flow,
chronyd's synchronization dance steals the show! ✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title 'pmdas: add chronyd pmda' clearly and concisely summarizes the main change: adding a new chronyd PMDA to the PCP PMDAs collection.
Description check ✅ Passed The description is directly related to the changeset, explaining what the new chronyd PMDA does, how it retrieves data via chronyc, and what metrics it exposes.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
⚔️ Resolve merge conflicts
  • Resolve merge conflict in branch chrony

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@jeffpc

jeffpc commented Jun 8, 2026

Copy link
Copy Markdown
Contributor Author

When I use pmval for the first time after pmcd restart on any of the chrony metrics that use the per-source instance domain, I get the following warning (e.g., if I try to fetch chrony.sources.m):

Warning: value="^", but instance=0 is unknown

IIUC, this is because the first time pmcd fetchs the values it gets (and forwards onto pmval) the instance domain references before it has learned about the instances. The second fetch by pmval (i.e., 1 second later by default) displays the values correctly.

Is this something that matters? Is there a good way to fix it? My best idea is to proactively populate the instances on pmda init (IOW, register the metrics, and immediately shell out to chronyc to populate the indom), but it seems like a bit of a hack.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/pmdas/chrony/pmdachrony.python`:
- Line 87: The subprocess.run calls in _oneline_cmd and _multiline_cmd lack
timeouts and can hang; add a sensible timeout argument (e.g. timeout=...
seconds) to those subprocess.run invocations, handle subprocess.TimeoutExpired
by logging an error via the existing logger/ processLogger and returning/raising
an appropriate failure (consistent with current error handling in those
functions), and ensure other parameters (stdout, check=True, universal_newlines)
remain unchanged; update both functions (_oneline_cmd and _multiline_cmd) so
chronyc hangs do not block refresh handling.
- Around line 86-99: The _oneline_cmd function (and the similar parsing block at
lines ~104-123) only catches subprocess.CalledProcessError but can raise
AssertionError, ValueError, or IndexError when parsing/converting chronyc
output; wrap the parsing and conversion steps (the assert and the loop that
applies cvt_fxns to values) in exception handling that catches AssertionError,
ValueError, and IndexError (or a single except (AssertionError, ValueError,
IndexError) as e) and handle it the same way as CalledProcessError: log a clear
message including the command and the exception and return an empty dict, so
malformed or unexpected chronyc output won’t crash the PMDA.
- Around line 89-93: The assert in _oneline_cmd and _multiline_cmd checks
len(values) >= len(cvt_fxns) but the loop iterates over values causing
IndexError when values is longer; change the loop to iterate over cvt_fxns
(e.g., for i in range(len(cvt_fxns))) and apply each converter to the
corresponding values[i], writing to out[i]; alternatively enforce exact length
by validating len(values) == len(cvt_fxns) before parsing. Update the loops that
reference values, cvt_fxns, and out so you never index cvt_fxns with an
out-of-range index.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Repository UI (inherited), Organization UI (inherited)

Review profile: CHILL

Plan: Pro Plus

Run ID: 7bdb5aa1-aa45-4170-92e1-1e86db0f375a

📥 Commits

Reviewing files that changed from the base of the PR and between 7343c27 and b57ee9c.

📒 Files selected for processing (7)
  • src/pmdas/GNUmakefile
  • src/pmdas/chrony/GNUmakefile
  • src/pmdas/chrony/Install
  • src/pmdas/chrony/Remove
  • src/pmdas/chrony/pmdachrony.1
  • src/pmdas/chrony/pmdachrony.python
  • src/pmns/stdpmid.pcp

Comment on lines +86 to +99
try:
res = subprocess.run(cmd, stdout=subprocess.PIPE, universal_newlines=True, check=True)
values = res.stdout.strip().split(",")
assert len(values) >= len(cvt_fxns), (values, cvt_fxns)

# parse each item
for i in range(len(values)):
out[i] = cvt_fxns[i](values[i])
except subprocess.CalledProcessError as e:
cmd = " ".join(cmd)
self.log(f"Error running '{cmd}': {e}")
return {}

return out

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Catch chronyc parsing/conversion failures to avoid crashing the PMDA

In src/pmdas/chrony/pmdachrony.python (_oneline_cmd and also the similar logic at 104-123), the code only catches subprocess.CalledProcessError; assert can raise AssertionError, and cvt_fxns conversions like int(..., 16/8) / float(...) can raise ValueError (with potential IndexError if field counts don’t match). These currently bubble up and can take the PMDA down.

Proposed fix
-        except subprocess.CalledProcessError as e:
+        except (subprocess.CalledProcessError, ValueError, IndexError, AssertionError) as e:
             cmd = " ".join(cmd)
             self.log(f"Error running '{cmd}': {e}")
             return {}
...
-        except subprocess.CalledProcessError as e:
+        except (subprocess.CalledProcessError, ValueError, IndexError, AssertionError) as e:
             cmd = " ".join(cmd)
             self.log(f"Error running '{cmd}': {e}")
             return {}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/pmdas/chrony/pmdachrony.python` around lines 86 - 99, The _oneline_cmd
function (and the similar parsing block at lines ~104-123) only catches
subprocess.CalledProcessError but can raise AssertionError, ValueError, or
IndexError when parsing/converting chronyc output; wrap the parsing and
conversion steps (the assert and the loop that applies cvt_fxns to values) in
exception handling that catches AssertionError, ValueError, and IndexError (or a
single except (AssertionError, ValueError, IndexError) as e) and handle it the
same way as CalledProcessError: log a clear message including the command and
the exception and return an empty dict, so malformed or unexpected chronyc
output won’t crash the PMDA.

out = {}

try:
res = subprocess.run(cmd, stdout=subprocess.PIPE, universal_newlines=True, check=True)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

FILE="src/pmdas/chrony/pmdachrony.python"

# Show relevant sections with line numbers
nl -ba "$FILE" | sed -n '60,130p'

# Find all subprocess.run occurrences in the file
echo "---- subprocess.run occurrences ----"
rg -n "subprocess\.run\(" "$FILE" || true

# Ensure file ends with a single trailing newline (basic check)
echo "---- trailing newline check ----"
python3 - <<'PY'
import pathlib
p = pathlib.Path("src/pmdas/chrony/pmdachrony.python")
b = p.read_bytes()
print("ends_with_newline:", b.endswith(b"\n"))
print("last_bytes:", b[-20:])
PY

Repository: performancecopilot/pcp

Length of output: 109


🏁 Script executed:

#!/bin/bash
set -euo pipefail

FILE="src/pmdas/chrony/pmdachrony.python"

echo "---- lines 60-130 (with line numbers via python) ----"
python3 - <<'PY'
from pathlib import Path
p = Path("src/pmdas/chrony/pmdachrony.python")
lines = p.read_text().splitlines(True)  # keepends
for i in range(59, 130):  # 1-based lines; slice starting at 60
    if 1 <= i <= len(lines):
        print(f"{i:6d}: {lines[i-1].rstrip()}")
PY

echo "---- subprocess.run occurrences (with line numbers) ----"
rg -n "subprocess\.run\(" "$FILE" || true

echo "---- trailing newline check ----"
python3 - <<'PY'
from pathlib import Path
b = Path("src/pmdas/chrony/pmdachrony.python").read_bytes()
print("ends_with_newline:", b.endswith(b"\n"))
PY

Repository: performancecopilot/pcp

Length of output: 3083


🏁 Script executed:

#!/bin/bash
set -euo pipefail

FILE="src/pmdas/chrony/pmdachrony.python"

echo "---- chronyc usages ----"
rg -n "chronyc" "$FILE" || true

echo "---- calls to _oneline_cmd ----"
rg -n "_oneline_cmd\(" "$FILE" || true

echo "---- calls to _multiline_cmd ----"
rg -n "_multiline_cmd\(" "$FILE" || true

echo "---- show around _oneline_cmd calls (context) ----"
python3 - <<'PY'
from pathlib import Path
p = Path("src/pmdas/chrony/pmdachrony.python")
txt = p.read_text().splitlines()
targets = []
for i,line in enumerate(txt, start=1):
    if "_oneline_cmd(" in line or "_multiline_cmd(" in line:
        targets.append(i)
for t in targets:
    start=max(1,t-5); end=min(len(txt), t+5)
    print(f"\n--- context {start}-{end} around line {t} ---")
    for j in range(start,end+1):
        print(f"{j:6d}: {txt[j-1]}")
PY

Repository: performancecopilot/pcp

Length of output: 3947


🏁 Script executed:

#!/bin/bash
set -euo pipefail

FILE="src/pmdas/chrony/pmdachrony.python"

echo "---- search for existing timeouts in this file ----"
rg -n "timeout\s*=" "$FILE" || true
rg -n "signal\." "$FILE" || true
rg -n "alarm\(" "$FILE" || true
rg -n "threading|Timer" "$FILE" || true

echo "---- subprocess import/usage ----"
rg -n "import subprocess|subprocess\." "$FILE" || true

echo "---- show chrony_fetch_callback for refresh path context ----"
python3 - <<'PY'
from pathlib import Path
p = Path("src/pmdas/chrony/pmdachrony.python")
txt = p.read_text().splitlines()
# print around chrony_fetch_callback definition
for i,line in enumerate(txt, start=1):
    if "def chrony_fetch_callback" in line:
        start=max(1,i-10); end=min(len(txt), i+120)
        for j in range(start,end+1):
            print(f"{j:6d}: {txt[j-1]}")
        break
PY

Repository: performancecopilot/pcp

Length of output: 8628


🏁 Script executed:

#!/bin/bash
set -euo pipefail

FILE="src/pmdas/chrony/pmdachrony.python"

python3 - <<'PY'
from pathlib import Path
p = Path("src/pmdas/chrony/pmdachrony.python")
lines = p.read_text().splitlines(True)
for i in range(110, 190):  # focus on refresh-related code
    if 1 <= i <= len(lines):
        print(f"{i:6d}: {lines[i-1].rstrip()}")
PY

Repository: performancecopilot/pcp

Length of output: 3918


🏁 Script executed:

#!/bin/bash
set -euo pipefail

FILE="src/pmdas/chrony/pmdachrony.python"

echo "---- chrony_refresh usages in file ----"
rg -n "chrony_refresh\(" "$FILE" || true

echo "---- any fetch/refresh callback wiring in this file ----"
rg -n "fetch|refresh|pmda|callback" "$FILE" | head -n 80

Repository: performancecopilot/pcp

Length of output: 3176


Add subprocess timeouts to chronyc calls in the refresh path.

src/pmdas/chrony/pmdachrony.python runs chronyc via subprocess.run(..., check=True) in both _oneline_cmd (line 87) and _multiline_cmd (line 105) with no timeout; a hung chronyc can stall PMDA refresh handling.

Proposed fix
-            res = subprocess.run(cmd, stdout=subprocess.PIPE, universal_newlines=True, check=True)
+            res = subprocess.run(
+                cmd, stdout=subprocess.PIPE, universal_newlines=True, check=True, timeout=5
+            )
...
-            res = subprocess.run(cmd, stdout=subprocess.PIPE, universal_newlines=True, check=True)
+            res = subprocess.run(
+                cmd, stdout=subprocess.PIPE, universal_newlines=True, check=True, timeout=5
+            )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/pmdas/chrony/pmdachrony.python` at line 87, The subprocess.run calls in
_oneline_cmd and _multiline_cmd lack timeouts and can hang; add a sensible
timeout argument (e.g. timeout=... seconds) to those subprocess.run invocations,
handle subprocess.TimeoutExpired by logging an error via the existing logger/
processLogger and returning/raising an appropriate failure (consistent with
current error handling in those functions), and ensure other parameters (stdout,
check=True, universal_newlines) remain unchanged; update both functions
(_oneline_cmd and _multiline_cmd) so chronyc hangs do not block refresh
handling.

Comment on lines +89 to +93
assert len(values) >= len(cvt_fxns), (values, cvt_fxns)

# parse each item
for i in range(len(values)):
out[i] = cvt_fxns[i](values[i])

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Locate and inspect the target file
ls -l src/pmdas/chrony/pmdachrony.python

# Show the relevant sections with line numbers (around 70-140)
sed -n '1,220p' src/pmdas/chrony/pmdachrony.python | nl -ba | sed -n '60,160p'

# Also search for the converter/CSV parsing loops
rg -n "def _oneline_cmd|def _multiline_cmd|cvt_fxns|assert len\(values\)|range\(len\(values\)\)|out\[values\[ip_idx\]\]|subprocess\.run|timeout" src/pmdas/chrony/pmdachrony.python

Repository: performancecopilot/pcp

Length of output: 212


🏁 Script executed:

#!/bin/bash
set -euo pipefail

FILE="src/pmdas/chrony/pmdachrony.python"

awk 'NR>=60 && NR<=140 {printf "%5d:%s\n", NR, $0}' "$FILE"

rg -n --no-heading "def _oneline_cmd|def _multiline_cmd|cvt_fxns|assert len\\(values\\)|range\\(len\\(values\\)\\)|out\\[|subprocess\\.run" "$FILE"

Repository: performancecopilot/pcp

Length of output: 3732


🏁 Script executed:

#!/bin/bash
set -euo pipefail
FILE="src/pmdas/chrony/pmdachrony.python"

python3 - <<'PY'
import os
p="src/pmdas/chrony/pmdachrony.python"
data=open(p,'rb').read()
trail=data[-50:]
print("last_bytes:", trail)
print("endswith_newline:", data.endswith(b'\n'))
# count how many consecutive newlines at end
cnt=0
for b in reversed(data):
    if b==10: # \n
        cnt+=1
    else:
        break
print("trailing_newline_count:", cnt)
PY

Repository: performancecopilot/pcp

Length of output: 183


🏁 Script executed:

#!/bin/bash
set -euo pipefail
FILE="src/pmdas/chrony/pmdachrony.python"

rg -n "_oneline_cmd\\(|_multiline_cmd\\(|SOURCESTATS_FXNS|TRACKING|SOURC" "$FILE" || true

# show portions around refresh / metric mapping usage
awk 'NR>=140 && NR<=320 {printf "%5d:%s\n", NR, $0}' "$FILE"

# show any helper that consumes out[i] dict layout
rg -n "out\\[|values\\[ip_idx\\]|cvt_fxns\\[" "$FILE"

Repository: performancecopilot/pcp

Length of output: 11875


🏁 Script executed:

#!/bin/bash
set -euo pipefail
FILE="src/pmdas/chrony/pmdachrony.python"

awk 'NR>=1 && NR<=90 {printf "%5d:%s\n", NR, $0}' "$FILE"
awk 'NR>=90 && NR<=170 {printf "%5d:%s\n", NR, $0}' "$FILE"

Repository: performancecopilot/pcp

Length of output: 6657


Prevent converter-table index overruns when parsing chronyc CSV output.

In src/pmdas/chrony/pmdachrony.python (_oneline_cmd lines 89-93 and _multiline_cmd lines 111-118), the code asserts len(values) >= len(cvt_fxns) but then iterates for i in range(len(values)) and indexes cvt_fxns[i]; if chronyc returns more fields than cvt_fxns, this raises IndexError and breaks the refresh path. Iterate over cvt_fxns (or enforce exact length) instead of iterating over values.

Proposed fix
-            assert len(values) >= len(cvt_fxns), (values, cvt_fxns)
-
-            # parse each item
-            for i in range(len(values)):
-                out[i] = cvt_fxns[i](values[i])
+            if len(values) < len(cvt_fxns):
+                raise ValueError(f"short chronyc output: {values}")
+            for i, cvt in enumerate(cvt_fxns):
+                out[i] = cvt(values[i])
...
-                assert len(values) >= len(cvt_fxns), (values, cvt_fxns)
+                if len(values) < len(cvt_fxns):
+                    raise ValueError(f"short chronyc output line: {values}")
...
-                # parse each item
-                for i in range(len(values)):
-                    out[values[ip_idx]][i] = cvt_fxns[i](values[i])
+                for i, cvt in enumerate(cvt_fxns):
+                    out[values[ip_idx]][i] = cvt(values[i])
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/pmdas/chrony/pmdachrony.python` around lines 89 - 93, The assert in
_oneline_cmd and _multiline_cmd checks len(values) >= len(cvt_fxns) but the loop
iterates over values causing IndexError when values is longer; change the loop
to iterate over cvt_fxns (e.g., for i in range(len(cvt_fxns))) and apply each
converter to the corresponding values[i], writing to out[i]; alternatively
enforce exact length by validating len(values) == len(cvt_fxns) before parsing.
Update the loops that reference values, cvt_fxns, and out so you never index
cvt_fxns with an out-of-range index.

@kmcdonell

Copy link
Copy Markdown
Member

@jeffpc I don't do Python, so I can't help there. But the PMDA question is a generic one ... if an indom is not static then the indom needs to be refreshed (possibly one-trip style it is unchanged for the life of the PMDA, else everytime) before handling a pmFetch or any of the pmInDom operations. If it needs doing only once then it is probably cleaner to make it part of the PMDA's init processing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants