pmdas: add chronyd pmda#2613
Conversation
It retrieves chronyd tracking and source info, as well as source
statistics by shelling out to chronyc and parsing the output. These are
exposes as chrony.{tracking,sources,sourcestats}, respectively.
📝 WalkthroughSummary by CodeRabbit
WalkthroughThe PR adds a complete Python-based PMDA for the PCP performance monitoring system that collects chronyd synchronization metrics. The change registers domain 165, defines build and installation infrastructure, implements metric collection via chronyc command execution and CSV parsing, and documents the new agent. ChangesChrony PMDA Addition
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches⚔️ Resolve merge conflicts
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
When I use pmval for the first time after pmcd restart on any of the chrony metrics that use the per-source instance domain, I get the following warning (e.g., if I try to fetch IIUC, this is because the first time pmcd fetchs the values it gets (and forwards onto pmval) the instance domain references before it has learned about the instances. The second fetch by pmval (i.e., 1 second later by default) displays the values correctly. Is this something that matters? Is there a good way to fix it? My best idea is to proactively populate the instances on pmda init (IOW, register the metrics, and immediately shell out to chronyc to populate the indom), but it seems like a bit of a hack. |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/pmdas/chrony/pmdachrony.python`:
- Line 87: The subprocess.run calls in _oneline_cmd and _multiline_cmd lack
timeouts and can hang; add a sensible timeout argument (e.g. timeout=...
seconds) to those subprocess.run invocations, handle subprocess.TimeoutExpired
by logging an error via the existing logger/ processLogger and returning/raising
an appropriate failure (consistent with current error handling in those
functions), and ensure other parameters (stdout, check=True, universal_newlines)
remain unchanged; update both functions (_oneline_cmd and _multiline_cmd) so
chronyc hangs do not block refresh handling.
- Around line 86-99: The _oneline_cmd function (and the similar parsing block at
lines ~104-123) only catches subprocess.CalledProcessError but can raise
AssertionError, ValueError, or IndexError when parsing/converting chronyc
output; wrap the parsing and conversion steps (the assert and the loop that
applies cvt_fxns to values) in exception handling that catches AssertionError,
ValueError, and IndexError (or a single except (AssertionError, ValueError,
IndexError) as e) and handle it the same way as CalledProcessError: log a clear
message including the command and the exception and return an empty dict, so
malformed or unexpected chronyc output won’t crash the PMDA.
- Around line 89-93: The assert in _oneline_cmd and _multiline_cmd checks
len(values) >= len(cvt_fxns) but the loop iterates over values causing
IndexError when values is longer; change the loop to iterate over cvt_fxns
(e.g., for i in range(len(cvt_fxns))) and apply each converter to the
corresponding values[i], writing to out[i]; alternatively enforce exact length
by validating len(values) == len(cvt_fxns) before parsing. Update the loops that
reference values, cvt_fxns, and out so you never index cvt_fxns with an
out-of-range index.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Repository UI (inherited), Organization UI (inherited)
Review profile: CHILL
Plan: Pro Plus
Run ID: 7bdb5aa1-aa45-4170-92e1-1e86db0f375a
📒 Files selected for processing (7)
src/pmdas/GNUmakefilesrc/pmdas/chrony/GNUmakefilesrc/pmdas/chrony/Installsrc/pmdas/chrony/Removesrc/pmdas/chrony/pmdachrony.1src/pmdas/chrony/pmdachrony.pythonsrc/pmns/stdpmid.pcp
| try: | ||
| res = subprocess.run(cmd, stdout=subprocess.PIPE, universal_newlines=True, check=True) | ||
| values = res.stdout.strip().split(",") | ||
| assert len(values) >= len(cvt_fxns), (values, cvt_fxns) | ||
|
|
||
| # parse each item | ||
| for i in range(len(values)): | ||
| out[i] = cvt_fxns[i](values[i]) | ||
| except subprocess.CalledProcessError as e: | ||
| cmd = " ".join(cmd) | ||
| self.log(f"Error running '{cmd}': {e}") | ||
| return {} | ||
|
|
||
| return out |
There was a problem hiding this comment.
Catch chronyc parsing/conversion failures to avoid crashing the PMDA
In src/pmdas/chrony/pmdachrony.python (_oneline_cmd and also the similar logic at 104-123), the code only catches subprocess.CalledProcessError; assert can raise AssertionError, and cvt_fxns conversions like int(..., 16/8) / float(...) can raise ValueError (with potential IndexError if field counts don’t match). These currently bubble up and can take the PMDA down.
Proposed fix
- except subprocess.CalledProcessError as e:
+ except (subprocess.CalledProcessError, ValueError, IndexError, AssertionError) as e:
cmd = " ".join(cmd)
self.log(f"Error running '{cmd}': {e}")
return {}
...
- except subprocess.CalledProcessError as e:
+ except (subprocess.CalledProcessError, ValueError, IndexError, AssertionError) as e:
cmd = " ".join(cmd)
self.log(f"Error running '{cmd}': {e}")
return {}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/pmdas/chrony/pmdachrony.python` around lines 86 - 99, The _oneline_cmd
function (and the similar parsing block at lines ~104-123) only catches
subprocess.CalledProcessError but can raise AssertionError, ValueError, or
IndexError when parsing/converting chronyc output; wrap the parsing and
conversion steps (the assert and the loop that applies cvt_fxns to values) in
exception handling that catches AssertionError, ValueError, and IndexError (or a
single except (AssertionError, ValueError, IndexError) as e) and handle it the
same way as CalledProcessError: log a clear message including the command and
the exception and return an empty dict, so malformed or unexpected chronyc
output won’t crash the PMDA.
| out = {} | ||
|
|
||
| try: | ||
| res = subprocess.run(cmd, stdout=subprocess.PIPE, universal_newlines=True, check=True) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="src/pmdas/chrony/pmdachrony.python"
# Show relevant sections with line numbers
nl -ba "$FILE" | sed -n '60,130p'
# Find all subprocess.run occurrences in the file
echo "---- subprocess.run occurrences ----"
rg -n "subprocess\.run\(" "$FILE" || true
# Ensure file ends with a single trailing newline (basic check)
echo "---- trailing newline check ----"
python3 - <<'PY'
import pathlib
p = pathlib.Path("src/pmdas/chrony/pmdachrony.python")
b = p.read_bytes()
print("ends_with_newline:", b.endswith(b"\n"))
print("last_bytes:", b[-20:])
PYRepository: performancecopilot/pcp
Length of output: 109
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="src/pmdas/chrony/pmdachrony.python"
echo "---- lines 60-130 (with line numbers via python) ----"
python3 - <<'PY'
from pathlib import Path
p = Path("src/pmdas/chrony/pmdachrony.python")
lines = p.read_text().splitlines(True) # keepends
for i in range(59, 130): # 1-based lines; slice starting at 60
if 1 <= i <= len(lines):
print(f"{i:6d}: {lines[i-1].rstrip()}")
PY
echo "---- subprocess.run occurrences (with line numbers) ----"
rg -n "subprocess\.run\(" "$FILE" || true
echo "---- trailing newline check ----"
python3 - <<'PY'
from pathlib import Path
b = Path("src/pmdas/chrony/pmdachrony.python").read_bytes()
print("ends_with_newline:", b.endswith(b"\n"))
PYRepository: performancecopilot/pcp
Length of output: 3083
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="src/pmdas/chrony/pmdachrony.python"
echo "---- chronyc usages ----"
rg -n "chronyc" "$FILE" || true
echo "---- calls to _oneline_cmd ----"
rg -n "_oneline_cmd\(" "$FILE" || true
echo "---- calls to _multiline_cmd ----"
rg -n "_multiline_cmd\(" "$FILE" || true
echo "---- show around _oneline_cmd calls (context) ----"
python3 - <<'PY'
from pathlib import Path
p = Path("src/pmdas/chrony/pmdachrony.python")
txt = p.read_text().splitlines()
targets = []
for i,line in enumerate(txt, start=1):
if "_oneline_cmd(" in line or "_multiline_cmd(" in line:
targets.append(i)
for t in targets:
start=max(1,t-5); end=min(len(txt), t+5)
print(f"\n--- context {start}-{end} around line {t} ---")
for j in range(start,end+1):
print(f"{j:6d}: {txt[j-1]}")
PYRepository: performancecopilot/pcp
Length of output: 3947
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="src/pmdas/chrony/pmdachrony.python"
echo "---- search for existing timeouts in this file ----"
rg -n "timeout\s*=" "$FILE" || true
rg -n "signal\." "$FILE" || true
rg -n "alarm\(" "$FILE" || true
rg -n "threading|Timer" "$FILE" || true
echo "---- subprocess import/usage ----"
rg -n "import subprocess|subprocess\." "$FILE" || true
echo "---- show chrony_fetch_callback for refresh path context ----"
python3 - <<'PY'
from pathlib import Path
p = Path("src/pmdas/chrony/pmdachrony.python")
txt = p.read_text().splitlines()
# print around chrony_fetch_callback definition
for i,line in enumerate(txt, start=1):
if "def chrony_fetch_callback" in line:
start=max(1,i-10); end=min(len(txt), i+120)
for j in range(start,end+1):
print(f"{j:6d}: {txt[j-1]}")
break
PYRepository: performancecopilot/pcp
Length of output: 8628
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="src/pmdas/chrony/pmdachrony.python"
python3 - <<'PY'
from pathlib import Path
p = Path("src/pmdas/chrony/pmdachrony.python")
lines = p.read_text().splitlines(True)
for i in range(110, 190): # focus on refresh-related code
if 1 <= i <= len(lines):
print(f"{i:6d}: {lines[i-1].rstrip()}")
PYRepository: performancecopilot/pcp
Length of output: 3918
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="src/pmdas/chrony/pmdachrony.python"
echo "---- chrony_refresh usages in file ----"
rg -n "chrony_refresh\(" "$FILE" || true
echo "---- any fetch/refresh callback wiring in this file ----"
rg -n "fetch|refresh|pmda|callback" "$FILE" | head -n 80Repository: performancecopilot/pcp
Length of output: 3176
Add subprocess timeouts to chronyc calls in the refresh path.
src/pmdas/chrony/pmdachrony.python runs chronyc via subprocess.run(..., check=True) in both _oneline_cmd (line 87) and _multiline_cmd (line 105) with no timeout; a hung chronyc can stall PMDA refresh handling.
Proposed fix
- res = subprocess.run(cmd, stdout=subprocess.PIPE, universal_newlines=True, check=True)
+ res = subprocess.run(
+ cmd, stdout=subprocess.PIPE, universal_newlines=True, check=True, timeout=5
+ )
...
- res = subprocess.run(cmd, stdout=subprocess.PIPE, universal_newlines=True, check=True)
+ res = subprocess.run(
+ cmd, stdout=subprocess.PIPE, universal_newlines=True, check=True, timeout=5
+ )🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/pmdas/chrony/pmdachrony.python` at line 87, The subprocess.run calls in
_oneline_cmd and _multiline_cmd lack timeouts and can hang; add a sensible
timeout argument (e.g. timeout=... seconds) to those subprocess.run invocations,
handle subprocess.TimeoutExpired by logging an error via the existing logger/
processLogger and returning/raising an appropriate failure (consistent with
current error handling in those functions), and ensure other parameters (stdout,
check=True, universal_newlines) remain unchanged; update both functions
(_oneline_cmd and _multiline_cmd) so chronyc hangs do not block refresh
handling.
| assert len(values) >= len(cvt_fxns), (values, cvt_fxns) | ||
|
|
||
| # parse each item | ||
| for i in range(len(values)): | ||
| out[i] = cvt_fxns[i](values[i]) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Locate and inspect the target file
ls -l src/pmdas/chrony/pmdachrony.python
# Show the relevant sections with line numbers (around 70-140)
sed -n '1,220p' src/pmdas/chrony/pmdachrony.python | nl -ba | sed -n '60,160p'
# Also search for the converter/CSV parsing loops
rg -n "def _oneline_cmd|def _multiline_cmd|cvt_fxns|assert len\(values\)|range\(len\(values\)\)|out\[values\[ip_idx\]\]|subprocess\.run|timeout" src/pmdas/chrony/pmdachrony.pythonRepository: performancecopilot/pcp
Length of output: 212
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="src/pmdas/chrony/pmdachrony.python"
awk 'NR>=60 && NR<=140 {printf "%5d:%s\n", NR, $0}' "$FILE"
rg -n --no-heading "def _oneline_cmd|def _multiline_cmd|cvt_fxns|assert len\\(values\\)|range\\(len\\(values\\)\\)|out\\[|subprocess\\.run" "$FILE"Repository: performancecopilot/pcp
Length of output: 3732
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="src/pmdas/chrony/pmdachrony.python"
python3 - <<'PY'
import os
p="src/pmdas/chrony/pmdachrony.python"
data=open(p,'rb').read()
trail=data[-50:]
print("last_bytes:", trail)
print("endswith_newline:", data.endswith(b'\n'))
# count how many consecutive newlines at end
cnt=0
for b in reversed(data):
if b==10: # \n
cnt+=1
else:
break
print("trailing_newline_count:", cnt)
PYRepository: performancecopilot/pcp
Length of output: 183
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="src/pmdas/chrony/pmdachrony.python"
rg -n "_oneline_cmd\\(|_multiline_cmd\\(|SOURCESTATS_FXNS|TRACKING|SOURC" "$FILE" || true
# show portions around refresh / metric mapping usage
awk 'NR>=140 && NR<=320 {printf "%5d:%s\n", NR, $0}' "$FILE"
# show any helper that consumes out[i] dict layout
rg -n "out\\[|values\\[ip_idx\\]|cvt_fxns\\[" "$FILE"Repository: performancecopilot/pcp
Length of output: 11875
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="src/pmdas/chrony/pmdachrony.python"
awk 'NR>=1 && NR<=90 {printf "%5d:%s\n", NR, $0}' "$FILE"
awk 'NR>=90 && NR<=170 {printf "%5d:%s\n", NR, $0}' "$FILE"Repository: performancecopilot/pcp
Length of output: 6657
Prevent converter-table index overruns when parsing chronyc CSV output.
In src/pmdas/chrony/pmdachrony.python (_oneline_cmd lines 89-93 and _multiline_cmd lines 111-118), the code asserts len(values) >= len(cvt_fxns) but then iterates for i in range(len(values)) and indexes cvt_fxns[i]; if chronyc returns more fields than cvt_fxns, this raises IndexError and breaks the refresh path. Iterate over cvt_fxns (or enforce exact length) instead of iterating over values.
Proposed fix
- assert len(values) >= len(cvt_fxns), (values, cvt_fxns)
-
- # parse each item
- for i in range(len(values)):
- out[i] = cvt_fxns[i](values[i])
+ if len(values) < len(cvt_fxns):
+ raise ValueError(f"short chronyc output: {values}")
+ for i, cvt in enumerate(cvt_fxns):
+ out[i] = cvt(values[i])
...
- assert len(values) >= len(cvt_fxns), (values, cvt_fxns)
+ if len(values) < len(cvt_fxns):
+ raise ValueError(f"short chronyc output line: {values}")
...
- # parse each item
- for i in range(len(values)):
- out[values[ip_idx]][i] = cvt_fxns[i](values[i])
+ for i, cvt in enumerate(cvt_fxns):
+ out[values[ip_idx]][i] = cvt(values[i])🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/pmdas/chrony/pmdachrony.python` around lines 89 - 93, The assert in
_oneline_cmd and _multiline_cmd checks len(values) >= len(cvt_fxns) but the loop
iterates over values causing IndexError when values is longer; change the loop
to iterate over cvt_fxns (e.g., for i in range(len(cvt_fxns))) and apply each
converter to the corresponding values[i], writing to out[i]; alternatively
enforce exact length by validating len(values) == len(cvt_fxns) before parsing.
Update the loops that reference values, cvt_fxns, and out so you never index
cvt_fxns with an out-of-range index.
|
@jeffpc I don't do Python, so I can't help there. But the PMDA question is a generic one ... if an indom is not static then the indom needs to be refreshed (possibly one-trip style it is unchanged for the life of the PMDA, else everytime) before handling a pmFetch or any of the pmInDom operations. If it needs doing only once then it is probably cleaner to make it part of the PMDA's init processing. |
It retrieves chronyd tracking and source info, as well as source statistics by shelling out to chronyc and parsing the output. These are exposes as chrony.{tracking,sources,sourcestats}, respectively.
This is the first time I'm using the python pmda API, so there's a good chance I'm messing something up. You've been warned :D