Skip to content

CORENET-7140: feat(cluster-diagnostics): add nodes_debug_exec tool for node-level debugging #333

Open
arkadeepsen wants to merge 3 commits into
openshift:mainfrom
arkadeepsen:node-debug
Open

CORENET-7140: feat(cluster-diagnostics): add nodes_debug_exec tool for node-level debugging #333
arkadeepsen wants to merge 3 commits into
openshift:mainfrom
arkadeepsen:node-debug

Conversation

@arkadeepsen

@arkadeepsen arkadeepsen commented Jun 9, 2026

Copy link
Copy Markdown
Member

This PR adds the add_debug_exec tool.
Cherry-picked the commit from #38 and refactored it. The refactor is added to a separate commit.

Summary by CodeRabbit

  • New Features
    • Added a new cluster-diagnostics toolset with node debug execution support for privileged diagnostics.
  • Documentation
    • Updated README and configuration tables to include cluster-diagnostics and openshift.
    • Added detailed cluster-diagnostics eval suite documentation and added OpenShift must-gather prompt guidance.
  • Tests
    • Added cluster-diagnostics eval task coverage plus new unit/integration-style tests and pod cleanup scripts for task runs.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jun 9, 2026
@openshift-ci-robot

openshift-ci-robot commented Jun 9, 2026

Copy link
Copy Markdown

@arkadeepsen: This pull request references CORENET-7140 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.

Details

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci Bot requested review from Kaustubh-pande and matzew June 9, 2026 11:01
@openshift-ci

openshift-ci Bot commented Jun 9, 2026

Copy link
Copy Markdown

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: arkadeepsen
Once this PR has been reviewed and has the lgtm label, please assign manusa for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai

coderabbitai Bot commented Jun 9, 2026

Copy link
Copy Markdown

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 75bee0e0-11c3-4f68-9628-fb7dfc9b4da1

📥 Commits

Reviewing files that changed from the base of the PR and between 86d6c16 and 5dbb207.

📒 Files selected for processing (20)
  • README.md
  • docs/configuration.md
  • evals/claude-code/eval.yaml
  • evals/gemini-agent/eval.yaml
  • evals/openai-agent/eval.yaml
  • evals/tasks/cluster-diagnostics/README.md
  • evals/tasks/cluster-diagnostics/check-kubelet-active/cleanup.sh
  • evals/tasks/cluster-diagnostics/check-kubelet-active/task.yaml
  • evals/tasks/cluster-diagnostics/list-host-etc/cleanup.sh
  • evals/tasks/cluster-diagnostics/list-host-etc/task.yaml
  • internal/tools/update-readme/main.go
  • pkg/cluster-diagnostics/nodesdebug/nodes_debug.go
  • pkg/cluster-diagnostics/nodesdebug/nodes_debug_test.go
  • pkg/cluster-diagnostics/nodesdebug/nodes_debug_testhelpers_test.go
  • pkg/mcp/modules.go
  • pkg/mcp/testdata/toolsets-cluster-diagnostics-tools.json
  • pkg/mcp/toolsets_test.go
  • pkg/toolsets/cluster-diagnostics/nodes/nodes.go
  • pkg/toolsets/cluster-diagnostics/nodes/nodes_test.go
  • pkg/toolsets/cluster-diagnostics/toolset.go
✅ Files skipped from review due to trivial changes (3)
  • pkg/mcp/modules.go
  • pkg/mcp/testdata/toolsets-cluster-diagnostics-tools.json
  • docs/configuration.md
🚧 Files skipped from review as they are similar to previous changes (14)
  • evals/gemini-agent/eval.yaml
  • evals/tasks/cluster-diagnostics/check-kubelet-active/cleanup.sh
  • evals/tasks/cluster-diagnostics/check-kubelet-active/task.yaml
  • evals/tasks/cluster-diagnostics/list-host-etc/task.yaml
  • evals/claude-code/eval.yaml
  • evals/tasks/cluster-diagnostics/list-host-etc/cleanup.sh
  • pkg/toolsets/cluster-diagnostics/nodes/nodes_test.go
  • internal/tools/update-readme/main.go
  • pkg/mcp/toolsets_test.go
  • README.md
  • pkg/toolsets/cluster-diagnostics/nodes/nodes.go
  • pkg/cluster-diagnostics/nodesdebug/nodes_debug_testhelpers_test.go
  • pkg/cluster-diagnostics/nodesdebug/nodes_debug_test.go
  • pkg/cluster-diagnostics/nodesdebug/nodes_debug.go

📝 Walkthrough

Walkthrough

Adds a new cluster-diagnostics MCP toolset with a nodes_debug_exec server tool for executing commands on Kubernetes nodes via privileged debug pods. Includes core pod orchestration implementation with comprehensive tests, tool handler with input validation, toolset integration, evaluation tasks, multiple eval config updates, and README documentation.

Changes

cluster-diagnostics Toolset

Layer / File(s) Summary
Node debug pod orchestration
pkg/cluster-diagnostics/nodesdebug/nodes_debug.go
Implements NodeDebugClient interface, NodeDebug wrapper, and core pod lifecycle: privileged pod creation with host filesystem mounting at configurable paths, pod readiness polling, command execution via pod exec with stdout/stderr capture, pod cleanup via deferred delete, log retrieval on exec failure, node name sanitization for DNS label safety, and validatePath rejecting relative paths, traversal segments, and unsafe characters.
Node debug tests and helpers
pkg/cluster-diagnostics/nodesdebug/nodes_debug_test.go, nodes_debug_testhelpers_test.go
Provides NodeDebugTestEnv with FakeNodeDebugClient and FakePodInterface test doubles for deterministic pod lifecycle and I/O; comprehensive test suite covering successful privileged pod creation with correct namespace/node targeting, security context configuration, host volume mounting, error handling with log enrichment, input validation (node/command required), custom namespace/image/paths, pod name sanitization with DNS length enforcement, readiness timeout, and path rejection for traversal/relative/unsafe inputs.
nodes_debug_exec tool handler and validation
pkg/toolsets/cluster-diagnostics/nodes/nodes.go, nodes_test.go
Registers nodes_debug_exec server tool with JSON schema requiring node and command (array), optional namespace/image/timeout_seconds; validates node presence, converts command to string array via toStringSlice, parses timeout across float64/int/int64 with integer enforcement and minimum 1 validation, invokes node debug client, and formats output by preferring stdout, concatenating stderr when both present, or returning no-output message for empty streams.
Toolset implementation and registration
pkg/toolsets/cluster-diagnostics/toolset.go, pkg/mcp/modules.go, pkg/mcp/testdata/toolsets-cluster-diagnostics-tools.json, pkg/mcp/toolsets_test.go
Implements Toolset type exposing tools from nodes.InitNodes(), registers via init(), adds blank import in modules.go to ensure MCP registration, includes JSON testdata snapshot defining nodes_debug_exec with destructive/open-world annotations and complete parameter schema, and extends toolsets test to validate cluster-diagnostics metadata snapshot.
Evaluation task suite documentation and tasks
evals/tasks/cluster-diagnostics/README.md, check-kubelet-active/*, list-host-etc/*
Documents prerequisites (cluster/tooling, MCP server, toolset enablement with read_only=false/disable_destructive=false), required permissions, rationale for nodes_debug_exec (standard API insufficient for host-level facts), prompt design rules (goal-focused discovery without naming tools), cleanup conventions (idempotent label-based deletion), task discovery via suite labels, and local/CI workflows. Adds two evaluation tasks: check-kubelet-active verifying kubelet service active on control-plane, and list-host-etc listing /etc directory on control-plane, each with cleanup script and nodes_debug_exec tool assertions.
Evaluation config updates and toolset wiring
evals/claude-code/eval.yaml, evals/gemini-agent/eval.yaml, evals/openai-agent/eval.yaml, internal/tools/update-readme/main.go
Updates three eval configs (claude-code, gemini-agent renamed to toolsets-eval, openai-agent) to add cluster-diagnostics taskSets with glob patterns, label selectors, and Kubernetes tool usage assertions (toolPattern: ".*", minToolCalls: 1, maxToolCalls: 20); adds blank imports to update-readme main for cluster-diagnostics and openshift toolsets to include them in README generation.
Documentation and README updates
README.md, docs/configuration.md
Regenerates "Available Toolsets" table including cluster-diagnostics and openshift entries with "Default" column showing config and core as enabled by default; documents nodes_debug_exec tool with UBI9 toolbox approach, /host mounting/chroot guidance, output truncation notes, and all parameters; adds OpenShift prompts block for plan_mustgather with node targeting, must-gather directory/command, namespace selection, timeout/since, and image/networking options.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 I hop in boots of UBI nine,
A debug pod trails the node's thin line,
Mount /host, run, then tuck away,
Tests and docs keep errors at bay. 🥕✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 36.36% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly describes the main change: adding the nodes_debug_exec tool for node-level debugging in the cluster-diagnostics toolset, which aligns with the substantial additions of the new toolset, MCP tool implementation, tests, and documentation.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🧹 Nitpick comments (8)
pkg/ocp/nodesdebug/nodes_debug.go (3)

176-176: ⚡ Quick win

Sleep command creates a long-lived container that wastes resources.

The debug pod runs sleep infinity (line 176) and relies solely on the defer cleanup. If cleanup fails (e.g., process crash, network partition), the pod runs indefinitely. Consider using a finite sleep duration as a safety net.

♻️ Proposed improvement
-	sleepCommand := []string{"sleep", "infinity"}
+	// Use a finite sleep as a safety net in case cleanup fails
+	sleepCommand := []string{"sleep", "3600"}  // 1 hour
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/ocp/nodesdebug/nodes_debug.go` at line 176, The debug pod currently uses
a perpetual sleep command stored in sleepCommand (sleepCommand :=
[]string{"sleep", "infinity"}) which can leave pods running forever if deferred
cleanup fails; change this to a finite, reasonable timeout (e.g., sleep for a
configurable duration like 1h or derived from a context/timeout value) and
ensure the value is documented/configurable where sleepCommand is defined (or
within the function that creates the debug pod) so pods automatically exit as a
safety net if cleanup doesn't run.

155-160: ⚡ Quick win

Retrieve logs with parent context instead of Background.

Line 155 retrieves logs using context.Background(), which means log retrieval continues even if the parent operation was canceled. This can waste resources and delay error reporting.

♻️ Proposed refactor
-	logs, _ := n.retrieveLogs(context.Background(), ns, created.Name)
+	logs, _ := n.retrieveLogs(timeoutCtx, ns, created.Name)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/ocp/nodesdebug/nodes_debug.go` around lines 155 - 160, The log retrieval
is using context.Background() which ignores cancellation of the parent
operation; change the call to n.retrieveLogs to pass the parent context (the ctx
used in the surrounding function) instead of context.Background() so log
retrieval is cancelled when the parent is cancelled; update the call in the
error handling branch that references retrieveLogs(ctx?, ns, created.Name) and
ensure any variable named execErr, created.Name and ns are preserved in the
returned error formatting.

215-219: ⚖️ Poor tradeoff

Excessive tolerations may schedule debug pods on incompatible nodes.

Lines 215-219 add three tolerations that collectively tolerate all taints (any key, any value, any effect). This means debug pods can land on nodes with critical taints like node.kubernetes.io/unschedulable or custom infrastructure taints. While broad scheduling is intentional for debugging, consider documenting this behavior or adding a constraint.

The first toleration (line 216, Operator: TolerationOpExists with no Effect) already tolerates everything, making lines 217-218 redundant. Consider simplifying:

♻️ Optional simplification
 Tolerations: []corev1.Toleration{
 	{Operator: corev1.TolerationOpExists},
-	{Operator: corev1.TolerationOpExists, Effect: corev1.TaintEffectNoSchedule},
-	{Operator: corev1.TolerationOpExists, Effect: corev1.TaintEffectNoExecute},
 },
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/ocp/nodesdebug/nodes_debug.go` around lines 215 - 219, The Tolerations
slice currently contains redundant and overly-broad entries (corev1.Toleration
with Operator: corev1.TolerationOpExists plus separate entries for
corev1.TaintEffectNoSchedule and corev1.TaintEffectNoExecute) which effectively
tolerates all taints and may place debug pods on incompatible nodes; update the
Tolerations on the Pod spec (the Tolerations []corev1.Toleration literal in
nodes_debug.go) by either keeping a single corev1.Toleration{Operator:
corev1.TolerationOpExists} and removing the two redundant entries, or narrow
them to explicit effects/keys you actually need (e.g., specify Effect or Key)
and add a clear code comment documenting the deliberate broad tolerance behavior
if you keep it.
pkg/toolsets/openshift/nodes/nodes.go (2)

62-109: 💤 Low value

Consider validating node name format to prevent obvious typos.

The handler accepts any non-empty string as a node name. While Kubernetes will reject invalid node names, catching obvious errors early (e.g., leading/trailing whitespace, invalid DNS characters) would improve user experience.

♻️ Optional improvement
 nodeArg := params.GetArguments()["node"]
 nodeName, ok := nodeArg.(string)
 if nodeArg == nil || !ok || nodeName == "" {
 	return api.NewToolCallResult("", errors.New("missing required argument: node")), nil
 }
+nodeName = strings.TrimSpace(nodeName)
+if nodeName == "" {
+	return api.NewToolCallResult("", errors.New("node name cannot be empty or whitespace")), nil
+}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/toolsets/openshift/nodes/nodes.go` around lines 62 - 109, In
nodesDebugExec, validate the parsed nodeName for obvious typos before
proceeding: trim leading/trailing whitespace (strings.TrimSpace) and reject
empty results, then enforce a Kubernetes DNS-1123-like pattern (e.g. regex
^[a-z0-9]([-a-z0-9]*[a-z0-9])?$ and maximum length check) to catch invalid
characters or lengths; if validation fails return api.NewToolCallResult("",
errors.New("invalid node name: <brief reason>")), nil. Update the existing
nodeArg handling in nodesDebugExec to perform these checks right after nodeName
is extracted and use clear error messages for whitespace/format/length failures.

100-100: ⚖️ Poor tradeoff

Hard-coded empty hostPath and mountPath limit debugging capabilities.

Line 100 passes empty strings for hostPath and mountPath, which means the tool always uses the default //host mapping. This prevents users from mounting specific host directories (e.g., /var/log/logs) which could be useful for targeted debugging. Consider exposing these as optional tool parameters.

♻️ Optional enhancement

Add host_path and mount_path fields to the InputSchema and parse them from arguments, similar to how namespace and image are handled.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/toolsets/openshift/nodes/nodes.go` at line 100, The NodesDebugExec call
currently passes empty strings for hostPath and mountPath (stdout, stderr,
execErr := client.NodesDebugExec(params.Context, namespace, nodeName, image,
command, "", "", timeout)), preventing custom host mounts; update the
InputSchema to include optional host_path and mount_path fields, parse them from
the CLI/args like namespace and image, and pass those variables instead of ""
into NodesDebugExec so callers can mount arbitrary host directories (e.g.,
hostPath -> "/var/log", mountPath -> "/logs"); ensure defaults remain "/" ->
"/host" when the new fields are omitted.
pkg/toolsets/openshift/nodes/nodes_test.go (1)

1-75: ⚡ Quick win

Enhance test coverage with success path and error propagation tests.

The current tests only validate input errors. Consider adding tests that:

  1. Verify successful command execution returns expected output
  2. Test error propagation from the underlying nodesdebug layer
  3. Verify timeout parameter is correctly passed through
  4. Test namespace and image parameter handling

Example test case structure:

🧪 Suggested additional tests
s.Run("successful execution returns stdout", func() {
	env := setupMockEnv() // Mock the underlying nodesdebug client
	params := api.ToolHandlerParams{
		ToolCallRequest: staticRequest{args: map[string]any{
			"node": "worker-0",
			"command": []interface{}{"echo", "test"},
		}},
		KubernetesClient: env.Client,
		Context: context.Background(),
	}
	result, err := nodesDebugExec(params)
	s.Require().NoError(err)
	s.Require().Nil(result.Error)
	s.Contains(result.Output, "test")
})
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/toolsets/openshift/nodes/nodes_test.go` around lines 1 - 75, Add unit
tests to nodes_test.go that cover the success and propagation paths: in
NodesHandlerSuite add runs that (1) mock the underlying nodesdebug behavior
(e.g., via a setupMockEnv or a fake client injected via
ToolHandlerParams.KubernetesClient) and assert nodesDebugExec(params) returns no
error, nil result.Error and expected stdout when command is ["echo","test"]; (2)
simulate the nodesdebug layer returning an execution error and assert that
nodesDebugExec returns a result with that error propagated; (3) pass a "timeout"
argument and assert the fake client received the timeout value; and (4) pass
"namespace" and "image" args and assert they are forwarded to the nodesdebug
call—use the existing staticRequest, NodesHandlerSuite and nodesDebugExec
identifiers to locate where to add these tests and wire the mock client via
ToolHandlerParams.Context/KubernetesClient.
evals/tasks/openshift/list-host-etc/cleanup.sh (1)

1-5: 💤 Low value

Consider extracting shared cleanup script to reduce duplication.

This cleanup script is identical to evals/tasks/openshift/check-node-kernel/cleanup.sh. Both tasks share the same cleanup logic (deleting node-debug pods). Consider one of these approaches:

  1. Create a shared cleanup script at evals/tasks/openshift/cleanup.sh and symlink to it from each task directory.
  2. Reference a shared cleanup script path in the task YAML if the framework supports it.
  3. Accept the duplication if task-specific cleanup divergence is expected in the future.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@evals/tasks/openshift/list-host-etc/cleanup.sh` around lines 1 - 5, Both
cleanup scripts (evals/tasks/openshift/list-host-etc/cleanup.sh and
evals/tasks/openshift/check-node-kernel/cleanup.sh) duplicate the same kubectl
deletion logic; extract that shared logic into a single script (e.g.,
evals/tasks/openshift/cleanup.sh) preserving the shebang and set -euo pipefail,
move the kubectl delete command into it, make the new file executable, then
replace the two originals with either (a) symlinks pointing to the new
evals/tasks/openshift/cleanup.sh or (b) update the task YAMLs to reference the
shared script path; ensure permissions and behavior remain identical
(ignore-not-found, wait=false) after the change.
evals/tasks/openshift/list-host-etc/verify.sh (1)

1-10: 💤 Low value

Consider extracting shared verification script to reduce duplication.

This verification script is identical to evals/tasks/openshift/check-node-kernel/verify.sh. Both tasks use the same verification logic (checking for leftover node-debug pods). Similar to the cleanup scripts, consider extracting to a shared location if the framework supports it, or document the intentional duplication if task-specific verification is expected to diverge.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@evals/tasks/openshift/list-host-etc/verify.sh` around lines 1 - 10, The
verify.sh script duplicates the same node-debug pod check (using the label
selector app.kubernetes.io/component=node-debug and the remaining variable
computed via kubectl) found in the other task's verify.sh; extract that logic
into a single shared verification script or library (e.g., a common script that
exports a function like check_node_debug_pods) and have both verify.sh files
source and call that shared function, or if the framework doesn't support shared
scripts, add a short comment documenting intentional duplication.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/ocp/nodesdebug/nodes_debug_test.go`:
- Around line 92-170: Add negative tests that call NodeDebug.NodesDebugExec to
ensure validatePath rejects unsafe paths: create cases for invalid hostPath
(e.g., "../etc"), invalid mountPath (e.g., "relative/path" or paths with ".." or
unsafe chars), instantiate env via NewNodeDebugTestEnv, set env.Pods.Running =
true, call nd := &NodeDebug{NodeDebugClient: env.Client} and invoke
nd.NodesDebugExec(...) and assert an error is returned and its message contains
the expected validation clue (e.g., "path traversal" or "invalid path"); add one
test per invalid scenario to the TestCreateDebugPod table or as separate s.Run
blocks so validatePath behavior is covered.
- Line 136: The tests currently call the unexported method createDebugPod
directly; change them to exercise that behavior via the public API
(NodesDebugExec) instead or, if the behavior must be part of the package
contract, export the function (rename createDebugPod -> CreateDebugPod) and
update callers/tests accordingly; specifically, update the test cases at the
sites calling createDebugPod to invoke NodesDebugExec with the same inputs and
assert the same outcomes (or promote the method to exported and adjust
imports/tests), ensuring you reference the createDebugPod implementation for
mapping inputs/results to the public NodesDebugExec parameters and expected
outputs.

In `@pkg/ocp/nodesdebug/nodes_debug.go`:
- Around line 304-311: Remove the unused helper function containerStatusByName
from the codebase: delete the function definition and any associated comments or
imports that only exist for it, then run build/static-analysis/tests to confirm
no remaining references; if this helper was intended for future use, either add
a TODO comment linking to the planned usage or move it to a clearly named
utilities file to avoid dead code.
- Line 147: The call to n.pollForRunning currently passes a hard-coded
time.Minute which ignores the user-provided timeout parameter; update the
n.pollForRunning invocation to use the function's timeout variable (convert the
timeout_seconds int/uint to a time.Duration if needed) instead of time.Minute so
pollForRunning(timeoutCtx, ns, created.Name, <converted timeout>) honors the
user's timeout setting; ensure the timeoutCtx and timeout variable remain
consistent when passed to pollForRunning.
- Line 329: The current traversal check uses slices.Contains(strings.Split(path,
string(filepath.Separator)), "..") which is OS-dependent; replace this with a
POSIX-safe check by normalizing and splitting using the Unix separator or the
path package: call path.Clean(path) (from the standard path, not filepath), then
split on "/" (e.g. strings.Split(cleanPath, "/")) and use slices.Contains on the
resulting segments to detect ".."; update the check that references the variable
path and the slices.Contains call (and remove or stop using filepath.Separator
here) so it correctly detects upward traversal regardless of host OS.
- Around line 140-144: The deferred pod-delete currently creates a fixed 30s
context and discards any error, which can leave orphaned pods; update the defer
in the defer func() around deleteCtx/delete cancel so it derives its timeout
from the parent context (or uses the parent context directly) and
increase/parameterize the timeout, call n.Pods(ns).Delete(deleteCtx,
created.Name, metav1.DeleteOptions{}) and capture the returned error, and log it
(using the existing logger or context logger) instead of ignoring it so deletion
failures are visible for debugging.

---

Nitpick comments:
In `@evals/tasks/openshift/list-host-etc/cleanup.sh`:
- Around line 1-5: Both cleanup scripts
(evals/tasks/openshift/list-host-etc/cleanup.sh and
evals/tasks/openshift/check-node-kernel/cleanup.sh) duplicate the same kubectl
deletion logic; extract that shared logic into a single script (e.g.,
evals/tasks/openshift/cleanup.sh) preserving the shebang and set -euo pipefail,
move the kubectl delete command into it, make the new file executable, then
replace the two originals with either (a) symlinks pointing to the new
evals/tasks/openshift/cleanup.sh or (b) update the task YAMLs to reference the
shared script path; ensure permissions and behavior remain identical
(ignore-not-found, wait=false) after the change.

In `@evals/tasks/openshift/list-host-etc/verify.sh`:
- Around line 1-10: The verify.sh script duplicates the same node-debug pod
check (using the label selector app.kubernetes.io/component=node-debug and the
remaining variable computed via kubectl) found in the other task's verify.sh;
extract that logic into a single shared verification script or library (e.g., a
common script that exports a function like check_node_debug_pods) and have both
verify.sh files source and call that shared function, or if the framework
doesn't support shared scripts, add a short comment documenting intentional
duplication.

In `@pkg/ocp/nodesdebug/nodes_debug.go`:
- Line 176: The debug pod currently uses a perpetual sleep command stored in
sleepCommand (sleepCommand := []string{"sleep", "infinity"}) which can leave
pods running forever if deferred cleanup fails; change this to a finite,
reasonable timeout (e.g., sleep for a configurable duration like 1h or derived
from a context/timeout value) and ensure the value is documented/configurable
where sleepCommand is defined (or within the function that creates the debug
pod) so pods automatically exit as a safety net if cleanup doesn't run.
- Around line 155-160: The log retrieval is using context.Background() which
ignores cancellation of the parent operation; change the call to n.retrieveLogs
to pass the parent context (the ctx used in the surrounding function) instead of
context.Background() so log retrieval is cancelled when the parent is cancelled;
update the call in the error handling branch that references retrieveLogs(ctx?,
ns, created.Name) and ensure any variable named execErr, created.Name and ns are
preserved in the returned error formatting.
- Around line 215-219: The Tolerations slice currently contains redundant and
overly-broad entries (corev1.Toleration with Operator: corev1.TolerationOpExists
plus separate entries for corev1.TaintEffectNoSchedule and
corev1.TaintEffectNoExecute) which effectively tolerates all taints and may
place debug pods on incompatible nodes; update the Tolerations on the Pod spec
(the Tolerations []corev1.Toleration literal in nodes_debug.go) by either
keeping a single corev1.Toleration{Operator: corev1.TolerationOpExists} and
removing the two redundant entries, or narrow them to explicit effects/keys you
actually need (e.g., specify Effect or Key) and add a clear code comment
documenting the deliberate broad tolerance behavior if you keep it.

In `@pkg/toolsets/openshift/nodes/nodes_test.go`:
- Around line 1-75: Add unit tests to nodes_test.go that cover the success and
propagation paths: in NodesHandlerSuite add runs that (1) mock the underlying
nodesdebug behavior (e.g., via a setupMockEnv or a fake client injected via
ToolHandlerParams.KubernetesClient) and assert nodesDebugExec(params) returns no
error, nil result.Error and expected stdout when command is ["echo","test"]; (2)
simulate the nodesdebug layer returning an execution error and assert that
nodesDebugExec returns a result with that error propagated; (3) pass a "timeout"
argument and assert the fake client received the timeout value; and (4) pass
"namespace" and "image" args and assert they are forwarded to the nodesdebug
call—use the existing staticRequest, NodesHandlerSuite and nodesDebugExec
identifiers to locate where to add these tests and wire the mock client via
ToolHandlerParams.Context/KubernetesClient.

In `@pkg/toolsets/openshift/nodes/nodes.go`:
- Around line 62-109: In nodesDebugExec, validate the parsed nodeName for
obvious typos before proceeding: trim leading/trailing whitespace
(strings.TrimSpace) and reject empty results, then enforce a Kubernetes
DNS-1123-like pattern (e.g. regex ^[a-z0-9]([-a-z0-9]*[a-z0-9])?$ and maximum
length check) to catch invalid characters or lengths; if validation fails return
api.NewToolCallResult("", errors.New("invalid node name: <brief reason>")), nil.
Update the existing nodeArg handling in nodesDebugExec to perform these checks
right after nodeName is extracted and use clear error messages for
whitespace/format/length failures.
- Line 100: The NodesDebugExec call currently passes empty strings for hostPath
and mountPath (stdout, stderr, execErr := client.NodesDebugExec(params.Context,
namespace, nodeName, image, command, "", "", timeout)), preventing custom host
mounts; update the InputSchema to include optional host_path and mount_path
fields, parse them from the CLI/args like namespace and image, and pass those
variables instead of "" into NodesDebugExec so callers can mount arbitrary host
directories (e.g., hostPath -> "/var/log", mountPath -> "/logs"); ensure
defaults remain "/" -> "/host" when the new fields are omitted.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: b26f1bea-6a61-4dc9-9705-6623d36ebb04

📥 Commits

Reviewing files that changed from the base of the PR and between a3d5bbc and 5c24461.

📒 Files selected for processing (21)
  • .github/workflows/mcpchecker.yaml
  • README.md
  • docs/configuration.md
  • evals/claude-code/eval.yaml
  • evals/gemini-agent/eval.yaml
  • evals/openai-agent/eval.yaml
  • evals/tasks/openshift/README.md
  • evals/tasks/openshift/check-node-kernel/cleanup.sh
  • evals/tasks/openshift/check-node-kernel/task.yaml
  • evals/tasks/openshift/check-node-kernel/verify.sh
  • evals/tasks/openshift/list-host-etc/cleanup.sh
  • evals/tasks/openshift/list-host-etc/task.yaml
  • evals/tasks/openshift/list-host-etc/verify.sh
  • internal/tools/update-readme/main.go
  • pkg/config/config_default_overrides.go
  • pkg/ocp/nodesdebug/nodes_debug.go
  • pkg/ocp/nodesdebug/nodes_debug_test.go
  • pkg/ocp/nodesdebug/nodes_debug_testhelpers_test.go
  • pkg/toolsets/openshift/nodes/nodes.go
  • pkg/toolsets/openshift/nodes/nodes_test.go
  • pkg/toolsets/openshift/toolset.go

Comment thread pkg/ocp/nodesdebug/nodes_debug_test.go Outdated
Comment thread pkg/ocp/nodesdebug/nodes_debug_test.go Outdated
Comment thread pkg/cluster-diagnostics/nodesdebug/nodes_debug.go
Comment thread pkg/cluster-diagnostics/nodesdebug/nodes_debug.go
Comment thread pkg/ocp/nodesdebug/nodes_debug.go Outdated
Comment thread pkg/ocp/nodesdebug/nodes_debug.go Outdated
@arkadeepsen arkadeepsen force-pushed the node-debug branch 2 times, most recently from 009980b to f24340d Compare June 9, 2026 13:11

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
pkg/ocp/nodesdebug/nodes_debug_test.go (1)

22-261: 🏗️ Heavy lift

Consider adopting envtest for integration test coverage.

The tests currently use a fake client (NodeDebugTestEnv) for all test cases. As per coding guidelines, tests should use real implementations and integration testing where possible, avoiding mocks, and specifically use the setup-envtest tooling from sigs.k8s.io/controller-runtime for test setup.

While envtest cannot test actual pod execution (no container runtime), it can provide a real Kubernetes API server to test:

  • Pod creation/deletion API calls and validation
  • Resource spec constraints and API-level errors
  • Namespace and node targeting behavior

Consider a hybrid approach:

  • Use envtest-based tests for API interactions (pod CRUD, input validation, spec generation)
  • Retain fake-based tests for execution behavior testing (since envtest cannot run containers)

This would provide stronger integration coverage and align with the repository's testing guidelines.

As per coding guidelines: "Use real implementations and integration testing where possible - avoid mocks" and "Use the setup-envtest tooling from sigs.k8s.io/controller-runtime for test setup and ensure network access is available for downloading the Kubernetes envtest environment"

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/ocp/nodesdebug/nodes_debug_test.go` around lines 22 - 261, Tests in
NodesDebugSuite currently rely entirely on the fake test harness
(NewNodeDebugTestEnv and its Pods behavior) for all cases; update test coverage
to use envtest (setup-envtest from sigs.k8s.io/controller-runtime) for API-level
validation and pod CRUD checks while keeping the fake env for runtime exec
behavior: create new test variants (e.g., TestNodesDebugExec_APIValidation or
TestNodesDebugExec_CreationWithEnvtest) that initialize an envtest control
plane, use the real client to call NodesDebugExec to assert pod creation,
namespace, volume and securityContext fields and node targeting, and retain
existing NewNodeDebugTestEnv-based tests only for execution/exec error scenarios
(where container runtime is required); ensure the new envtest tests wire up the
same NodeDebug implementation (the nodeDebug helper) but point it at the envtest
client and clean up namespaced resources after each run.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@pkg/ocp/nodesdebug/nodes_debug_test.go`:
- Around line 22-261: Tests in NodesDebugSuite currently rely entirely on the
fake test harness (NewNodeDebugTestEnv and its Pods behavior) for all cases;
update test coverage to use envtest (setup-envtest from
sigs.k8s.io/controller-runtime) for API-level validation and pod CRUD checks
while keeping the fake env for runtime exec behavior: create new test variants
(e.g., TestNodesDebugExec_APIValidation or
TestNodesDebugExec_CreationWithEnvtest) that initialize an envtest control
plane, use the real client to call NodesDebugExec to assert pod creation,
namespace, volume and securityContext fields and node targeting, and retain
existing NewNodeDebugTestEnv-based tests only for execution/exec error scenarios
(where container runtime is required); ensure the new envtest tests wire up the
same NodeDebug implementation (the nodeDebug helper) but point it at the envtest
client and clean up namespaced resources after each run.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 264958aa-1e2a-47f2-b269-fca1dd97322b

📥 Commits

Reviewing files that changed from the base of the PR and between 009980b and f24340d.

📒 Files selected for processing (17)
  • .github/workflows/mcpchecker.yaml
  • evals/claude-code/eval.yaml
  • evals/gemini-agent/eval.yaml
  • evals/openai-agent/eval.yaml
  • evals/tasks/openshift/README.md
  • evals/tasks/openshift/check-node-kernel/cleanup.sh
  • evals/tasks/openshift/check-node-kernel/task.yaml
  • evals/tasks/openshift/check-node-kernel/verify.sh
  • evals/tasks/openshift/list-host-etc/cleanup.sh
  • evals/tasks/openshift/list-host-etc/task.yaml
  • evals/tasks/openshift/list-host-etc/verify.sh
  • pkg/ocp/nodesdebug/nodes_debug.go
  • pkg/ocp/nodesdebug/nodes_debug_test.go
  • pkg/ocp/nodesdebug/nodes_debug_testhelpers_test.go
  • pkg/toolsets/openshift/nodes/nodes.go
  • pkg/toolsets/openshift/nodes/nodes_test.go
  • pkg/toolsets/openshift/toolset.go
✅ Files skipped from review due to trivial changes (4)
  • evals/tasks/openshift/list-host-etc/task.yaml
  • evals/tasks/openshift/list-host-etc/cleanup.sh
  • evals/claude-code/eval.yaml
  • .github/workflows/mcpchecker.yaml
🚧 Files skipped from review as they are similar to previous changes (12)
  • evals/tasks/openshift/check-node-kernel/task.yaml
  • evals/tasks/openshift/list-host-etc/verify.sh
  • evals/tasks/openshift/check-node-kernel/cleanup.sh
  • pkg/toolsets/openshift/toolset.go
  • evals/tasks/openshift/check-node-kernel/verify.sh
  • evals/openai-agent/eval.yaml
  • evals/gemini-agent/eval.yaml
  • pkg/toolsets/openshift/nodes/nodes_test.go
  • evals/tasks/openshift/README.md
  • pkg/toolsets/openshift/nodes/nodes.go
  • pkg/ocp/nodesdebug/nodes_debug_testhelpers_test.go
  • pkg/ocp/nodesdebug/nodes_debug.go

@openshift-ci openshift-ci Bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 10, 2026
Add a privileged debug pod tool that mimics `oc debug node/<name>` for
running commands on OpenShift nodes. The tool creates a temporary pod
with the UBI9 toolbox image, executes the command, collects output, and
cleans up automatically.

Key design decisions based on PR review feedback:
- All new code lives in pkg/ocp/ and pkg/toolsets/openshift/ only,
  no upstream packages (pkg/kubernetes/, pkg/mcp/) are modified
- Tool is in the "openshift" toolset (not "core") to avoid conflicts
- NodeDebugClient interface decouples from concrete k8s client for
  testability
- Error messages include command output/logs for better diagnostics
- Default timeout is 1 minute (within MCP client timeout limits)
- Gated behind ReadOnly config (read_only=false required to use)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@openshift-ci openshift-ci Bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 11, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@evals/tasks/openshift/check-node-kernel/cleanup.sh`:
- Line 4: The cleanup command currently uses "kubectl delete pods -A -l
app.kubernetes.io/component=node-debug --ignore-not-found --wait=false" which
returns immediately and can leave node-debug pods around; update the command to
wait for deletion (e.g. use "--wait=true" or remove the flag and add a sensible
"--timeout=<duration>" such as "--timeout=2m") so that the call blocks until
pods are deleted or the timeout elapses, ensuring deterministic cleanup for
subsequent tasks.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: bfdbdb07-b4f3-4b4c-af79-f0a09f5d2d08

📥 Commits

Reviewing files that changed from the base of the PR and between 15a931f and dd60bbd.

📒 Files selected for processing (22)
  • .github/workflows/mcpchecker.yaml
  • README.md
  • docs/configuration.md
  • evals/claude-code/eval.yaml
  • evals/gemini-agent/eval.yaml
  • evals/openai-agent/eval.yaml
  • evals/tasks/openshift/README.md
  • evals/tasks/openshift/check-node-kernel/cleanup.sh
  • evals/tasks/openshift/check-node-kernel/task.yaml
  • evals/tasks/openshift/check-node-kernel/verify.sh
  • evals/tasks/openshift/list-host-etc/cleanup.sh
  • evals/tasks/openshift/list-host-etc/task.yaml
  • evals/tasks/openshift/list-host-etc/verify.sh
  • internal/tools/update-readme/main.go
  • pkg/config/config_default_overrides.go
  • pkg/mcp/testdata/toolsets-openshift-tools.json
  • pkg/ocp/nodesdebug/nodes_debug.go
  • pkg/ocp/nodesdebug/nodes_debug_test.go
  • pkg/ocp/nodesdebug/nodes_debug_testhelpers_test.go
  • pkg/toolsets/openshift/nodes/nodes.go
  • pkg/toolsets/openshift/nodes/nodes_test.go
  • pkg/toolsets/openshift/toolset.go
✅ Files skipped from review due to trivial changes (2)
  • evals/tasks/openshift/list-host-etc/cleanup.sh
  • evals/openai-agent/eval.yaml
🚧 Files skipped from review as they are similar to previous changes (19)
  • docs/configuration.md
  • evals/tasks/openshift/list-host-etc/verify.sh
  • pkg/config/config_default_overrides.go
  • internal/tools/update-readme/main.go
  • evals/tasks/openshift/check-node-kernel/task.yaml
  • evals/claude-code/eval.yaml
  • README.md
  • evals/tasks/openshift/list-host-etc/task.yaml
  • pkg/toolsets/openshift/nodes/nodes_test.go
  • .github/workflows/mcpchecker.yaml
  • evals/tasks/openshift/check-node-kernel/verify.sh
  • pkg/mcp/testdata/toolsets-openshift-tools.json
  • evals/tasks/openshift/README.md
  • pkg/toolsets/openshift/toolset.go
  • pkg/ocp/nodesdebug/nodes_debug.go
  • pkg/ocp/nodesdebug/nodes_debug_test.go
  • evals/gemini-agent/eval.yaml
  • pkg/ocp/nodesdebug/nodes_debug_testhelpers_test.go
  • pkg/toolsets/openshift/nodes/nodes.go

Comment thread evals/tasks/cluster-diagnostics/check-node-kernel/cleanup.sh Outdated
@arkadeepsen arkadeepsen changed the title CORENET-7140: feat(openshift): add nodes_debug_exec tool for node-level debugging CORENET-7140: feat(cluster-diagnostics): add nodes_debug_exec tool for node-level debugging Jun 11, 2026
@arkadeepsen arkadeepsen force-pushed the node-debug branch 3 times, most recently from d9df4eb to b4e45d7 Compare June 12, 2026 06:55
@arkadeepsen

Copy link
Copy Markdown
Member Author

@coderabbitai resume
@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jun 12, 2026

Copy link
Copy Markdown

@arkadeepsen: Resuming reviews and triggering a review of this PR.

✅ Action performed

Reviews resumed.


Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 12

🧹 Nitpick comments (1)
pkg/cluster-diagnostics/nodesdebug/nodes_debug_testhelpers_test.go (1)

60-62: ⚡ Quick win

Preserve log request options in the fake to keep test behavior faithful.

Line 60 accepts tail but drops it, and Line 118 ignores name/opts. This makes log-path tests less reliable for catching regressions around log option wiring.

Proposed refactor
 func (f *FakeNodeDebugClient) PodsLog(ctx context.Context, namespace, name, container string, previous bool, tail int64) (string, error) {
-	req := f.pods.GetLogs(name, &corev1.PodLogOptions{Container: container, Previous: previous})
+	var tailLines *int64
+	if tail > 0 {
+		tailLines = &tail
+	}
+	req := f.pods.GetLogs(name, &corev1.PodLogOptions{
+		Container: container,
+		Previous:  previous,
+		TailLines: tailLines,
+	})
 	res := req.Do(ctx)
@@
 type FakePodInterface struct {
 	corev1client.PodInterface
@@
 	ExecError  error
+	LastLogName string
+	LastLogOpts *corev1.PodLogOptions
 }
@@
 func (f *FakePodInterface) GetLogs(name string, opts *corev1.PodLogOptions) *restclient.Request {
+	f.LastLogName = name
+	if opts != nil {
+		c := *opts
+		f.LastLogOpts = &c
+	}
 	body := io.NopCloser(strings.NewReader(f.Logs))

Also applies to: 118-129

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/cluster-diagnostics/nodesdebug/nodes_debug_testhelpers_test.go` around
lines 60 - 62, The fake log implementation in nodes_debug_testhelpers_test.go
currently accepts parameters like tail, name, and opts but drops them, which
breaks log-path tests; update the fake (the test helper function that handles
log requests) to preserve and store/forward the incoming log request options
(tail, name, opts) exactly as passed so test assertions can verify wiring, i.e.,
change the fake's handler to keep the tail value and the name/opts map/struct
instead of ignoring them and adjust any recorded request struct used by tests to
include those fields.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@evals/tasks/cluster-diagnostics/check-node-kernel/task.yaml`:
- Around line 10-12: The verification currently only checks for the substring
"Linux", which is too permissive; update the verifier (the inline check or
verify.sh referenced in the task) to require a version-bearing kernel string
instead of just "Linux". Replace the simple contains test with a regex check
that matches kernel-version patterns (for example, /Linux\s+\d+\.\d+(\.\d+)?/ or
similar) and fail when the output doesn't include digits/dots after "Linux";
ensure this change is applied to the verify logic that currently asserts
"Linux".

In `@evals/tasks/cluster-diagnostics/check-node-kernel/verify.sh`:
- Line 4: The kubectl pod-count invocation in verify.sh is currently silencing
errors with a 2>/dev/null while the script runs under set -euo pipefail, which
hides why kubectl may fail; remove the stderr redirection (or instead capture
stderr into a variable and print a clear error) so failures are visible in CI
logs—locate the kubectl command in verify.sh that uses 2>/dev/null and either
delete the redirection or wrap the call to print the captured error before
exiting.

In `@evals/tasks/cluster-diagnostics/list-host-etc/cleanup.sh`:
- Line 4: The current cleanup.sh deletes all "node-debug" pods across all
namespaces; change the kubectl delete invocation in cleanup.sh to target only
the task's namespace or a task-specific label instead of using --all-namespaces
— e.g., use -n "$TASK_NAMESPACE" to restrict to the current task namespace or
add -l "task=$TASK_ID" (or run/$RUN_ID) to select pods by a task/run-specific
label; update any environment variable names (TASK_NAMESPACE, TASK_ID)
referenced in the script and ensure the node-debug pod deletion uses that
namespace/label selector so other evals' debug pods are not removed.

In `@evals/tasks/cluster-diagnostics/list-host-etc/verify.sh`:
- Line 4: The verify.sh script currently suppresses kubectl stderr with
"2>/dev/null", which hides useful error details; remove that stderr redirection
from any kubectl invocations in verify.sh (or alternatively capture stderr into
a variable and echo it on failure) so that authentication/context/API errors are
visible—look for kubectl commands in verify.sh and either delete "2>/dev/null"
or wrap the call to capture >&2 into a variable and print the variable before
exiting.

In `@pkg/cluster-diagnostics/nodesdebug/nodes_debug_test.go`:
- Around line 1-20: The test currently lives in package nodesdebug and
constructs the internal NodeDebug type directly; change the test package to
nodesdebug_test and stop referencing NodeDebug internals — instead exercise only
exported entry points (e.g., call exported functions like RunNodeDebug or
NewPublicNodeDebug); if no exported constructor exists, add an exported
constructor (e.g., NewNodeDebug or NewPublicNodeDebug) to the package and use
that from the test; also update imports in the test to import the package as
nodesdebug and replace direct struct instantiation with calls to the exported
API.
- Around line 187-191: The test indexes VolumeMounts[0] after calling
s.Require().NotNil(...) which still allows an empty slice and can panic; before
any access add an explicit length guard (e.g.
s.Require().Len(<theObject>.VolumeMounts, 1) or
s.Require().NotEmpty(<theObject>.VolumeMounts)) immediately before the existing
access to VolumeMounts[0] so the test fails cleanly instead of panicking; keep
the existing NotNil check but insert the s.Require().Len(...) call (referencing
the VolumeMounts slice and the test's s.Require() helper) just prior to the
index usage.

In `@pkg/cluster-diagnostics/nodesdebug/nodes_debug_testhelpers_test.go`:
- Around line 99-116: The fake pod client's Get and Delete handlers in
nodes_debug_testhelpers_test.go currently ignore the requested pod name and
operate on the last-created pod, allowing false-positive tests; update the fake
methods (e.g., the fake Get and Delete implementations used in tests) to verify
that the requested pod name equals the stored pod's Name and if not, return a
NotFound error (or nil pod + error) instead of operating on the wrong pod,
ensuring callers using an incorrect pod name fail the test reliably.

In `@pkg/cluster-diagnostics/nodesdebug/nodes_debug.go`:
- Around line 156-164: retrieveLogs' returned error is currently discarded;
update the call in nodes_debug.go to log that error instead of ignoring it by
capturing the returned err from retrieveLogs and emitting a descriptive log
entry including node/pod identifiers and the error (use the file's existing
logger variable, e.g., d.logger or log, and include the error in the message) so
the diagnostic context is preserved for debugging.

In `@pkg/mcp/toolsets_test.go`:
- Around line 246-253: Replace the non-fatal assertion s.NoError(err) used when
creating the dependent test state with a fatal assertion so downstream checks
don't run on invalid state: change the call that checks the returned err (the
assertion currently using s.NoError(err) where err is produced on the setup at
line ~246) to s.Require().NoError(err) so the test stops on failure before the
dependent assertions around the later checks (currently at line ~252) execute.

In `@pkg/toolsets/cluster-diagnostics/nodes/nodes_test.go`:
- Around line 22-70: The test currently calls the unexported nodesDebugExec
directly; change it to exercise the public toolset API by registering the
handler via the toolset registration function and invoking it using the MCP tool
invocation path consumers use (instead of calling nodesDebugExec), then
replace/augment existing error-only cases with at least one positive successful
execution test that asserts the returned stdout/stderr and exit status; locate
usages of nodesDebugExec, the tool handler registration code (the public
Register/Handle function for this tool), and the MCP tool call helper in the
test suite (e.g., NodesHandlerSuite) to implement these changes and optionally
add an envtest-based integration test if you need end-to-end pod creation
validation.

In `@pkg/toolsets/cluster-diagnostics/nodes/nodes.go`:
- Around line 41-45: The handler that reads the "timeout_seconds" request
parameter must enforce the documented contract: if the parameter is missing set
a default of 60, reject non-integer/fractional values, and enforce a minimum of
1. Update the code in the request handler in nodes.go (the code that currently
parses "timeout_seconds" around the existing parsing block) to: 1) check for
empty/missing and set timeoutSecs = 60; 2) attempt to parse the value as an
integer (use integer parsing, not float) and if parsing fails return a 400
error; and 3) if parsed value < 1 clamp or reject per project convention (prefer
returning 400 for invalid input or set to 1 if you choose to clamp). Ensure the
variable used downstream is an int and that any timeouts use this validated
integer value.
- Around line 115-118: The code that parses the node `command` currently only
handles values of type []interface{} and thus rejects valid []string inputs;
update the parsing logic (the `command` handling block in nodes.go) to accept
both []interface{} and []string (and also a single string) by using a type
switch on the raw `command` value, converting []interface{} entries to strings,
using []string directly when present, and trimming/validating each entry before
storing into the existing command slice; ensure nil or empty inputs are handled
gracefully and keep the final type as []string used elsewhere.

---

Nitpick comments:
In `@pkg/cluster-diagnostics/nodesdebug/nodes_debug_testhelpers_test.go`:
- Around line 60-62: The fake log implementation in
nodes_debug_testhelpers_test.go currently accepts parameters like tail, name,
and opts but drops them, which breaks log-path tests; update the fake (the test
helper function that handles log requests) to preserve and store/forward the
incoming log request options (tail, name, opts) exactly as passed so test
assertions can verify wiring, i.e., change the fake's handler to keep the tail
value and the name/opts map/struct instead of ignoring them and adjust any
recorded request struct used by tests to include those fields.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 24fbdc2f-698b-486a-920f-5e27b8b5c106

📥 Commits

Reviewing files that changed from the base of the PR and between 15a931f and b4e45d7.

📒 Files selected for processing (22)
  • README.md
  • docs/configuration.md
  • evals/claude-code/eval.yaml
  • evals/gemini-agent/eval.yaml
  • evals/openai-agent/eval.yaml
  • evals/tasks/cluster-diagnostics/README.md
  • evals/tasks/cluster-diagnostics/check-node-kernel/cleanup.sh
  • evals/tasks/cluster-diagnostics/check-node-kernel/task.yaml
  • evals/tasks/cluster-diagnostics/check-node-kernel/verify.sh
  • evals/tasks/cluster-diagnostics/list-host-etc/cleanup.sh
  • evals/tasks/cluster-diagnostics/list-host-etc/task.yaml
  • evals/tasks/cluster-diagnostics/list-host-etc/verify.sh
  • internal/tools/update-readme/main.go
  • pkg/cluster-diagnostics/nodesdebug/nodes_debug.go
  • pkg/cluster-diagnostics/nodesdebug/nodes_debug_test.go
  • pkg/cluster-diagnostics/nodesdebug/nodes_debug_testhelpers_test.go
  • pkg/mcp/modules.go
  • pkg/mcp/testdata/toolsets-cluster-diagnostics-tools.json
  • pkg/mcp/toolsets_test.go
  • pkg/toolsets/cluster-diagnostics/nodes/nodes.go
  • pkg/toolsets/cluster-diagnostics/nodes/nodes_test.go
  • pkg/toolsets/cluster-diagnostics/toolset.go
✅ Files skipped from review due to trivial changes (4)
  • docs/configuration.md
  • evals/tasks/cluster-diagnostics/list-host-etc/task.yaml
  • evals/tasks/cluster-diagnostics/README.md
  • evals/openai-agent/eval.yaml
🚧 Files skipped from review as they are similar to previous changes (2)
  • internal/tools/update-readme/main.go
  • README.md

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Inline review comments failed to post. This is likely due to GitHub's internal server error or limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.

Actionable comments posted: 12

🧹 Nitpick comments (1)
pkg/cluster-diagnostics/nodesdebug/nodes_debug_testhelpers_test.go (1)

60-62: ⚡ Quick win

Preserve log request options in the fake to keep test behavior faithful.

Line 60 accepts tail but drops it, and Line 118 ignores name/opts. This makes log-path tests less reliable for catching regressions around log option wiring.

Proposed refactor
 func (f *FakeNodeDebugClient) PodsLog(ctx context.Context, namespace, name, container string, previous bool, tail int64) (string, error) {
-	req := f.pods.GetLogs(name, &corev1.PodLogOptions{Container: container, Previous: previous})
+	var tailLines *int64
+	if tail > 0 {
+		tailLines = &tail
+	}
+	req := f.pods.GetLogs(name, &corev1.PodLogOptions{
+		Container: container,
+		Previous:  previous,
+		TailLines: tailLines,
+	})
 	res := req.Do(ctx)
@@
 type FakePodInterface struct {
 	corev1client.PodInterface
@@
 	ExecError  error
+	LastLogName string
+	LastLogOpts *corev1.PodLogOptions
 }
@@
 func (f *FakePodInterface) GetLogs(name string, opts *corev1.PodLogOptions) *restclient.Request {
+	f.LastLogName = name
+	if opts != nil {
+		c := *opts
+		f.LastLogOpts = &c
+	}
 	body := io.NopCloser(strings.NewReader(f.Logs))

Also applies to: 118-129

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/cluster-diagnostics/nodesdebug/nodes_debug_testhelpers_test.go` around
lines 60 - 62, The fake log implementation in nodes_debug_testhelpers_test.go
currently accepts parameters like tail, name, and opts but drops them, which
breaks log-path tests; update the fake (the test helper function that handles
log requests) to preserve and store/forward the incoming log request options
(tail, name, opts) exactly as passed so test assertions can verify wiring, i.e.,
change the fake's handler to keep the tail value and the name/opts map/struct
instead of ignoring them and adjust any recorded request struct used by tests to
include those fields.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@evals/tasks/cluster-diagnostics/check-node-kernel/task.yaml`:
- Around line 10-12: The verification currently only checks for the substring
"Linux", which is too permissive; update the verifier (the inline check or
verify.sh referenced in the task) to require a version-bearing kernel string
instead of just "Linux". Replace the simple contains test with a regex check
that matches kernel-version patterns (for example, /Linux\s+\d+\.\d+(\.\d+)?/ or
similar) and fail when the output doesn't include digits/dots after "Linux";
ensure this change is applied to the verify logic that currently asserts
"Linux".

In `@evals/tasks/cluster-diagnostics/check-node-kernel/verify.sh`:
- Line 4: The kubectl pod-count invocation in verify.sh is currently silencing
errors with a 2>/dev/null while the script runs under set -euo pipefail, which
hides why kubectl may fail; remove the stderr redirection (or instead capture
stderr into a variable and print a clear error) so failures are visible in CI
logs—locate the kubectl command in verify.sh that uses 2>/dev/null and either
delete the redirection or wrap the call to print the captured error before
exiting.

In `@evals/tasks/cluster-diagnostics/list-host-etc/cleanup.sh`:
- Line 4: The current cleanup.sh deletes all "node-debug" pods across all
namespaces; change the kubectl delete invocation in cleanup.sh to target only
the task's namespace or a task-specific label instead of using --all-namespaces
— e.g., use -n "$TASK_NAMESPACE" to restrict to the current task namespace or
add -l "task=$TASK_ID" (or run/$RUN_ID) to select pods by a task/run-specific
label; update any environment variable names (TASK_NAMESPACE, TASK_ID)
referenced in the script and ensure the node-debug pod deletion uses that
namespace/label selector so other evals' debug pods are not removed.

In `@evals/tasks/cluster-diagnostics/list-host-etc/verify.sh`:
- Line 4: The verify.sh script currently suppresses kubectl stderr with
"2>/dev/null", which hides useful error details; remove that stderr redirection
from any kubectl invocations in verify.sh (or alternatively capture stderr into
a variable and echo it on failure) so that authentication/context/API errors are
visible—look for kubectl commands in verify.sh and either delete "2>/dev/null"
or wrap the call to capture >&2 into a variable and print the variable before
exiting.

In `@pkg/cluster-diagnostics/nodesdebug/nodes_debug_test.go`:
- Around line 1-20: The test currently lives in package nodesdebug and
constructs the internal NodeDebug type directly; change the test package to
nodesdebug_test and stop referencing NodeDebug internals — instead exercise only
exported entry points (e.g., call exported functions like RunNodeDebug or
NewPublicNodeDebug); if no exported constructor exists, add an exported
constructor (e.g., NewNodeDebug or NewPublicNodeDebug) to the package and use
that from the test; also update imports in the test to import the package as
nodesdebug and replace direct struct instantiation with calls to the exported
API.
- Around line 187-191: The test indexes VolumeMounts[0] after calling
s.Require().NotNil(...) which still allows an empty slice and can panic; before
any access add an explicit length guard (e.g.
s.Require().Len(<theObject>.VolumeMounts, 1) or
s.Require().NotEmpty(<theObject>.VolumeMounts)) immediately before the existing
access to VolumeMounts[0] so the test fails cleanly instead of panicking; keep
the existing NotNil check but insert the s.Require().Len(...) call (referencing
the VolumeMounts slice and the test's s.Require() helper) just prior to the
index usage.

In `@pkg/cluster-diagnostics/nodesdebug/nodes_debug_testhelpers_test.go`:
- Around line 99-116: The fake pod client's Get and Delete handlers in
nodes_debug_testhelpers_test.go currently ignore the requested pod name and
operate on the last-created pod, allowing false-positive tests; update the fake
methods (e.g., the fake Get and Delete implementations used in tests) to verify
that the requested pod name equals the stored pod's Name and if not, return a
NotFound error (or nil pod + error) instead of operating on the wrong pod,
ensuring callers using an incorrect pod name fail the test reliably.

In `@pkg/cluster-diagnostics/nodesdebug/nodes_debug.go`:
- Around line 156-164: retrieveLogs' returned error is currently discarded;
update the call in nodes_debug.go to log that error instead of ignoring it by
capturing the returned err from retrieveLogs and emitting a descriptive log
entry including node/pod identifiers and the error (use the file's existing
logger variable, e.g., d.logger or log, and include the error in the message) so
the diagnostic context is preserved for debugging.

In `@pkg/mcp/toolsets_test.go`:
- Around line 246-253: Replace the non-fatal assertion s.NoError(err) used when
creating the dependent test state with a fatal assertion so downstream checks
don't run on invalid state: change the call that checks the returned err (the
assertion currently using s.NoError(err) where err is produced on the setup at
line ~246) to s.Require().NoError(err) so the test stops on failure before the
dependent assertions around the later checks (currently at line ~252) execute.

In `@pkg/toolsets/cluster-diagnostics/nodes/nodes_test.go`:
- Around line 22-70: The test currently calls the unexported nodesDebugExec
directly; change it to exercise the public toolset API by registering the
handler via the toolset registration function and invoking it using the MCP tool
invocation path consumers use (instead of calling nodesDebugExec), then
replace/augment existing error-only cases with at least one positive successful
execution test that asserts the returned stdout/stderr and exit status; locate
usages of nodesDebugExec, the tool handler registration code (the public
Register/Handle function for this tool), and the MCP tool call helper in the
test suite (e.g., NodesHandlerSuite) to implement these changes and optionally
add an envtest-based integration test if you need end-to-end pod creation
validation.

In `@pkg/toolsets/cluster-diagnostics/nodes/nodes.go`:
- Around line 41-45: The handler that reads the "timeout_seconds" request
parameter must enforce the documented contract: if the parameter is missing set
a default of 60, reject non-integer/fractional values, and enforce a minimum of
1. Update the code in the request handler in nodes.go (the code that currently
parses "timeout_seconds" around the existing parsing block) to: 1) check for
empty/missing and set timeoutSecs = 60; 2) attempt to parse the value as an
integer (use integer parsing, not float) and if parsing fails return a 400
error; and 3) if parsed value < 1 clamp or reject per project convention (prefer
returning 400 for invalid input or set to 1 if you choose to clamp). Ensure the
variable used downstream is an int and that any timeouts use this validated
integer value.
- Around line 115-118: The code that parses the node `command` currently only
handles values of type []interface{} and thus rejects valid []string inputs;
update the parsing logic (the `command` handling block in nodes.go) to accept
both []interface{} and []string (and also a single string) by using a type
switch on the raw `command` value, converting []interface{} entries to strings,
using []string directly when present, and trimming/validating each entry before
storing into the existing command slice; ensure nil or empty inputs are handled
gracefully and keep the final type as []string used elsewhere.

---

Nitpick comments:
In `@pkg/cluster-diagnostics/nodesdebug/nodes_debug_testhelpers_test.go`:
- Around line 60-62: The fake log implementation in
nodes_debug_testhelpers_test.go currently accepts parameters like tail, name,
and opts but drops them, which breaks log-path tests; update the fake (the test
helper function that handles log requests) to preserve and store/forward the
incoming log request options (tail, name, opts) exactly as passed so test
assertions can verify wiring, i.e., change the fake's handler to keep the tail
value and the name/opts map/struct instead of ignoring them and adjust any
recorded request struct used by tests to include those fields.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 24fbdc2f-698b-486a-920f-5e27b8b5c106

📥 Commits

Reviewing files that changed from the base of the PR and between 15a931f and b4e45d7.

📒 Files selected for processing (22)
  • README.md
  • docs/configuration.md
  • evals/claude-code/eval.yaml
  • evals/gemini-agent/eval.yaml
  • evals/openai-agent/eval.yaml
  • evals/tasks/cluster-diagnostics/README.md
  • evals/tasks/cluster-diagnostics/check-node-kernel/cleanup.sh
  • evals/tasks/cluster-diagnostics/check-node-kernel/task.yaml
  • evals/tasks/cluster-diagnostics/check-node-kernel/verify.sh
  • evals/tasks/cluster-diagnostics/list-host-etc/cleanup.sh
  • evals/tasks/cluster-diagnostics/list-host-etc/task.yaml
  • evals/tasks/cluster-diagnostics/list-host-etc/verify.sh
  • internal/tools/update-readme/main.go
  • pkg/cluster-diagnostics/nodesdebug/nodes_debug.go
  • pkg/cluster-diagnostics/nodesdebug/nodes_debug_test.go
  • pkg/cluster-diagnostics/nodesdebug/nodes_debug_testhelpers_test.go
  • pkg/mcp/modules.go
  • pkg/mcp/testdata/toolsets-cluster-diagnostics-tools.json
  • pkg/mcp/toolsets_test.go
  • pkg/toolsets/cluster-diagnostics/nodes/nodes.go
  • pkg/toolsets/cluster-diagnostics/nodes/nodes_test.go
  • pkg/toolsets/cluster-diagnostics/toolset.go
✅ Files skipped from review due to trivial changes (4)
  • docs/configuration.md
  • evals/tasks/cluster-diagnostics/list-host-etc/task.yaml
  • evals/tasks/cluster-diagnostics/README.md
  • evals/openai-agent/eval.yaml
🚧 Files skipped from review as they are similar to previous changes (2)
  • internal/tools/update-readme/main.go
  • README.md
🛑 Comments failed to post (12)
evals/tasks/cluster-diagnostics/check-node-kernel/task.yaml (1)

10-12: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Strengthen verification to require an actual kernel version, not just "Linux"

On Line 11, contains: "Linux" is too permissive for the prompt on Line 9; many non-answer outputs will still pass. Tighten verify.sh (or the inline check) to validate a version-bearing kernel string so the task measures the intended behavior.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@evals/tasks/cluster-diagnostics/check-node-kernel/task.yaml` around lines 10
- 12, The verification currently only checks for the substring "Linux", which is
too permissive; update the verifier (the inline check or verify.sh referenced in
the task) to require a version-bearing kernel string instead of just "Linux".
Replace the simple contains test with a regex check that matches kernel-version
patterns (for example, /Linux\s+\d+\.\d+(\.\d+)?/ or similar) and fail when the
output doesn't include digits/dots after "Linux"; ensure this change is applied
to the verify logic that currently asserts "Linux".
evals/tasks/cluster-diagnostics/check-node-kernel/verify.sh (1)

4-4: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid silent failure in the pod count command.

With set -euo pipefail, a failing kubectl exits the script immediately; 2>/dev/null hides the root cause, making eval failures opaque in CI logs. Keep stderr visible (or handle and print a clear error) so verification failures are diagnosable.

Suggested patch
-remaining=$(kubectl get pods -A -l app.kubernetes.io/component=node-debug --no-headers 2>/dev/null | wc -l | tr -d ' ')
+remaining=$(kubectl get pods -A -l app.kubernetes.io/component=node-debug --no-headers | wc -l | tr -d ' ')
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@evals/tasks/cluster-diagnostics/check-node-kernel/verify.sh` at line 4, The
kubectl pod-count invocation in verify.sh is currently silencing errors with a
2>/dev/null while the script runs under set -euo pipefail, which hides why
kubectl may fail; remove the stderr redirection (or instead capture stderr into
a variable and print a clear error) so failures are visible in CI logs—locate
the kubectl command in verify.sh that uses 2>/dev/null and either delete the
redirection or wrap the call to print the captured error before exiting.
evals/tasks/cluster-diagnostics/list-host-etc/cleanup.sh (1)

4-4: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Cleanup scope is too broad for shared clusters.

Line 4 deletes all node-debug pods across all namespaces. That can terminate unrelated debug sessions and make concurrent evals flaky. Scope deletion to the task namespace and/or a task/run-specific label.

Suggested direction
-kubectl delete pods -A -l app.kubernetes.io/component=node-debug --ignore-not-found --wait=true --timeout=120s
+# Example: scope to a known namespace + task/run label populated by task setup
+kubectl delete pods -n "${EVAL_NAMESPACE:?missing EVAL_NAMESPACE}" \
+  -l 'app.kubernetes.io/component=node-debug,eval.task=list-host-etc,eval.run-id='"${EVAL_RUN_ID:?missing EVAL_RUN_ID}" \
+  --ignore-not-found --wait=true --timeout=120s
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@evals/tasks/cluster-diagnostics/list-host-etc/cleanup.sh` at line 4, The
current cleanup.sh deletes all "node-debug" pods across all namespaces; change
the kubectl delete invocation in cleanup.sh to target only the task's namespace
or a task-specific label instead of using --all-namespaces — e.g., use -n
"$TASK_NAMESPACE" to restrict to the current task namespace or add -l
"task=$TASK_ID" (or run/$RUN_ID) to select pods by a task/run-specific label;
update any environment variable names (TASK_NAMESPACE, TASK_ID) referenced in
the script and ensure the node-debug pod deletion uses that namespace/label
selector so other evals' debug pods are not removed.
evals/tasks/cluster-diagnostics/list-host-etc/verify.sh (1)

4-4: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Don’t suppress kubectl stderr in verification scripts.

2>/dev/null hides why verification failed (e.g., auth/context/API errors), which makes CI failures hard to triage. Keep stderr visible, or catch and print a clear error before exiting.

Suggested patch
-remaining=$(kubectl get pods -A -l app.kubernetes.io/component=node-debug --no-headers 2>/dev/null | wc -l | tr -d ' ')
+remaining=$(kubectl get pods -A -l app.kubernetes.io/component=node-debug --no-headers | wc -l | tr -d ' ')
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

remaining=$(kubectl get pods -A -l app.kubernetes.io/component=node-debug --no-headers | wc -l | tr -d ' ')
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@evals/tasks/cluster-diagnostics/list-host-etc/verify.sh` at line 4, The
verify.sh script currently suppresses kubectl stderr with "2>/dev/null", which
hides useful error details; remove that stderr redirection from any kubectl
invocations in verify.sh (or alternatively capture stderr into a variable and
echo it on failure) so that authentication/context/API errors are visible—look
for kubectl commands in verify.sh and either delete "2>/dev/null" or wrap the
call to capture >&2 into a variable and print the variable before exiting.
pkg/cluster-diagnostics/nodesdebug/nodes_debug_test.go (2)

1-20: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Switch this suite to black-box API testing.

At Line 1 and Lines 18-20, the test runs in package nodesdebug and directly constructs NodeDebug, which couples tests to internals. Move this to package nodesdebug_test and exercise exported entry points only (or expose a constructor if needed).

As per coding guidelines, tests should “Test the public API only - tests should be black-box and not access internal/private functions”.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/cluster-diagnostics/nodesdebug/nodes_debug_test.go` around lines 1 - 20,
The test currently lives in package nodesdebug and constructs the internal
NodeDebug type directly; change the test package to nodesdebug_test and stop
referencing NodeDebug internals — instead exercise only exported entry points
(e.g., call exported functions like RunNodeDebug or NewPublicNodeDebug); if no
exported constructor exists, add an exported constructor (e.g., NewNodeDebug or
NewPublicNodeDebug) to the package and use that from the test; also update
imports in the test to import the package as nodesdebug and replace direct
struct instantiation with calls to the exported API.

Source: Coding guidelines


187-191: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add a length guard before indexing VolumeMounts[0].

At Lines 189-190, NotNil does not guarantee non-empty; an empty slice can still panic on [0]. Use s.Require().Len(..., 1) first.

Suggested fix
-			s.Require().NotNil(created.Spec.Containers[0].VolumeMounts)
-			s.Equal(tt.expectedMountPath, created.Spec.Containers[0].VolumeMounts[0].MountPath)
+			s.Require().Len(created.Spec.Containers[0].VolumeMounts, 1)
+			s.Equal(tt.expectedMountPath, created.Spec.Containers[0].VolumeMounts[0].MountPath)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

			s.Require().Len(created.Spec.Containers[0].VolumeMounts, 1)
			s.Equal(tt.expectedMountPath, created.Spec.Containers[0].VolumeMounts[0].MountPath)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/cluster-diagnostics/nodesdebug/nodes_debug_test.go` around lines 187 -
191, The test indexes VolumeMounts[0] after calling s.Require().NotNil(...)
which still allows an empty slice and can panic; before any access add an
explicit length guard (e.g. s.Require().Len(<theObject>.VolumeMounts, 1) or
s.Require().NotEmpty(<theObject>.VolumeMounts)) immediately before the existing
access to VolumeMounts[0] so the test fails cleanly instead of panicking; keep
the existing NotNil check but insert the s.Require().Len(...) call (referencing
the VolumeMounts slice and the test's s.Require() helper) just prior to the
index usage.
pkg/cluster-diagnostics/nodesdebug/nodes_debug_testhelpers_test.go (1)

99-116: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate pod identity in fake Get/Delete to avoid false-positive tests.

Line 99 and Line 113 ignore the requested pod name and operate on whatever was created last. That can let tests pass even when caller code uses the wrong pod name.

Proposed fix
 func (f *FakePodInterface) Get(_ context.Context, _ string, _ metav1.GetOptions) (*corev1.Pod, error) {
 	if f.Created == nil {
 		return nil, fmt.Errorf("pod not created yet")
 	}
+	if f.Created.Name != name {
+		return nil, fmt.Errorf("unexpected pod name: got %q, want %q", name, f.Created.Name)
+	}
 	pod := f.Created.DeepCopy()
@@
 func (f *FakePodInterface) Delete(_ context.Context, _ string, _ metav1.DeleteOptions) error {
+	if f.Created != nil && f.Created.Name != name {
+		return fmt.Errorf("unexpected pod name: got %q, want %q", name, f.Created.Name)
+	}
 	f.Deleted = true
 	return nil
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

func (f *FakePodInterface) Get(_ context.Context, name string, _ metav1.GetOptions) (*corev1.Pod, error) {
	if f.Created == nil {
		return nil, fmt.Errorf("pod not created yet")
	}
	if f.Created.Name != name {
		return nil, fmt.Errorf("unexpected pod name: got %q, want %q", name, f.Created.Name)
	}
	pod := f.Created.DeepCopy()
	return pod, nil
}

func (f *FakePodInterface) Delete(_ context.Context, name string, _ metav1.DeleteOptions) error {
	if f.Created != nil && f.Created.Name != name {
		return fmt.Errorf("unexpected pod name: got %q, want %q", name, f.Created.Name)
	}
	f.Deleted = true
	return nil
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/cluster-diagnostics/nodesdebug/nodes_debug_testhelpers_test.go` around
lines 99 - 116, The fake pod client's Get and Delete handlers in
nodes_debug_testhelpers_test.go currently ignore the requested pod name and
operate on the last-created pod, allowing false-positive tests; update the fake
methods (e.g., the fake Get and Delete implementations used in tests) to verify
that the requested pod name equals the stored pod's Name and if not, return a
NotFound error (or nil pod + error) instead of operating on the wrong pod,
ensuring callers using an incorrect pod name fail the test reliably.
pkg/cluster-diagnostics/nodesdebug/nodes_debug.go (1)

156-164: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Log the error from retrieveLogs instead of discarding it.

Per coding guidelines, errors should not be ignored in production code. When log retrieval fails, the diagnostic context is lost silently, making it harder to debug why the exec failed.

Suggested fix
 	stdout, stderr, execErr := n.PodsExec(timeoutCtx, ns, created.Name, NodeDebugContainerName, command)
 	if execErr != nil {
 		// Retrieve logs on exec errors — the pod may have produced partial output.
-		logs, _ := n.retrieveLogs(context.Background(), ns, created.Name)
+		logs, logErr := n.retrieveLogs(context.Background(), ns, created.Name)
+		if logErr != nil {
+			klog.Warningf("failed to retrieve logs for debug pod %s: %v", created.Name, logErr)
+		}
 		if logs != "" {
 			return "", "", fmt.Errorf("failed to execute command in debug pod %s in namespace %s: %w\nlogs:\n%s", created.Name, ns, execErr, logs)
 		}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/cluster-diagnostics/nodesdebug/nodes_debug.go` around lines 156 - 164,
retrieveLogs' returned error is currently discarded; update the call in
nodes_debug.go to log that error instead of ignoring it by capturing the
returned err from retrieveLogs and emitting a descriptive log entry including
node/pod identifiers and the error (use the file's existing logger variable,
e.g., d.logger or log, and include the error in the message) so the diagnostic
context is preserved for debugging.

Source: Coding guidelines

pkg/mcp/toolsets_test.go (1)

246-253: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use s.Require().NoError(err) before dependent assertions.

Line 246 produces err that must be fatal for this test path; using s.NoError(err) allows Line 252 to continue with invalid state and can obscure the primary failure.

Suggested fix
 		s.InitMcpClient()
 		tools, err := s.ListTools()
+		s.Require().NoError(err, "Expected no error from ListTools")
+		s.Require().NotNil(tools, "Expected tools from ListTools")
+
 		s.Run("ListTools returns tools", func() {
-			s.NotNil(tools, "Expected tools from ListTools")
-			s.NoError(err, "Expected no error from ListTools")
+			s.NotNil(tools, "Expected tools from ListTools")
 		})
 		s.Run("ListTools returns correct Tool metadata", func() {
 			s.assertJsonSnapshot("toolsets-openshift-mustgather-tools.json", tools.Tools)
 		})

As per coding guidelines, “Use s.Require().NoError(err) for operations in tests that must succeed for the test to be valid”.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/mcp/toolsets_test.go` around lines 246 - 253, Replace the non-fatal
assertion s.NoError(err) used when creating the dependent test state with a
fatal assertion so downstream checks don't run on invalid state: change the call
that checks the returned err (the assertion currently using s.NoError(err) where
err is produced on the setup at line ~246) to s.Require().NoError(err) so the
test stops on failure before the dependent assertions around the later checks
(currently at line ~252) execute.

Source: Coding guidelines

pkg/toolsets/cluster-diagnostics/nodes/nodes_test.go (1)

22-70: 🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy lift

Test the public API instead of the private nodesDebugExec function.

This test directly invokes the unexported nodesDebugExec function. As per coding guidelines, tests should be black-box and only test the public API surface, not internal/private functions. The proper approach is to test through the public toolset API that consumers will use.

Additionally, this test suite only covers error validation paths but has no positive test cases for successful command execution. As per coding guidelines, aim for high test coverage including both error paths and successful scenarios.

♻️ Recommended approach
  1. Test through the public API: Register the tool handler through the toolset interface and invoke it via the MCP tool call mechanism that external consumers use
  2. Add positive test cases: Include at least one successful execution scenario that validates the tool returns expected output when given valid inputs
  3. Consider integration tests: For a tool that creates debug pods, consider using envtest (per guidelines) to verify end-to-end behavior with a real Kubernetes API

Example structure:

func (s *NodesHandlerSuite) TestSuccessfulExecution() {
    // Setup toolset and register handler through public API
    // Invoke tool via public interface
    // Assert successful result with expected stdout
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/toolsets/cluster-diagnostics/nodes/nodes_test.go` around lines 22 - 70,
The test currently calls the unexported nodesDebugExec directly; change it to
exercise the public toolset API by registering the handler via the toolset
registration function and invoking it using the MCP tool invocation path
consumers use (instead of calling nodesDebugExec), then replace/augment existing
error-only cases with at least one positive successful execution test that
asserts the returned stdout/stderr and exit status; locate usages of
nodesDebugExec, the tool handler registration code (the public Register/Handle
function for this tool), and the MCP tool call helper in the test suite (e.g.,
NodesHandlerSuite) to implement these changes and optionally add an
envtest-based integration test if you need end-to-end pod creation validation.

Source: Coding guidelines

pkg/toolsets/cluster-diagnostics/nodes/nodes.go (2)

41-45: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Enforce timeout_seconds contract in handler (default + integer/min validation).

Line 43 documents a 60s default and Line 44 sets minimum 1 in schema, but Lines 85-97 don’t enforce either at runtime. This can accept fractional/invalid values and leaves default behavior implicit in downstream code.

Suggested fix
 import (
 	"errors"
 	"fmt"
+	"math"
 	"time"
@@
-	var timeout time.Duration
+	timeout := 60 * time.Second
 	if timeoutRaw, exists := params.GetArguments()["timeout_seconds"]; exists && timeoutRaw != nil {
 		switch v := timeoutRaw.(type) {
 		case float64:
-			timeout = time.Duration(int64(v)) * time.Second
+			if v < 1 || math.Trunc(v) != v {
+				return api.NewToolCallResult("", errors.New("timeout_seconds must be an integer >= 1")), nil
+			}
+			timeout = time.Duration(int64(v)) * time.Second
 		case int:
+			if v < 1 {
+				return api.NewToolCallResult("", errors.New("timeout_seconds must be >= 1")), nil
+			}
 			timeout = time.Duration(v) * time.Second
 		case int64:
+			if v < 1 {
+				return api.NewToolCallResult("", errors.New("timeout_seconds must be >= 1")), nil
+			}
 			timeout = time.Duration(v) * time.Second
 		default:
 			return api.NewToolCallResult("", errors.New("timeout_seconds must be a numeric value")), nil
 		}
 	}

Also applies to: 85-97

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/toolsets/cluster-diagnostics/nodes/nodes.go` around lines 41 - 45, The
handler that reads the "timeout_seconds" request parameter must enforce the
documented contract: if the parameter is missing set a default of 60, reject
non-integer/fractional values, and enforce a minimum of 1. Update the code in
the request handler in nodes.go (the code that currently parses
"timeout_seconds" around the existing parsing block) to: 1) check for
empty/missing and set timeoutSecs = 60; 2) attempt to parse the value as an
integer (use integer parsing, not float) and if parsing fails return a 400
error; and 3) if parsed value < 1 clamp or reject per project convention (prefer
returning 400 for invalid input or set to 1 if you choose to clamp). Ensure the
variable used downstream is an int and that any timeouts use this validated
integer value.

115-118: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

command parsing is too strict for valid in-process inputs.

Lines 115-118 only accept []interface{}. If a caller passes []string (still valid per schema intent), this path incorrectly rejects it.

Suggested fix
 func toStringSlice(arg any) ([]string, error) {
 	if arg == nil {
 		return nil, errors.New("command is required")
 	}
-	raw, ok := arg.([]interface{})
-	if !ok {
-		return nil, errors.New("command must be an array of strings")
-	}
-	if len(raw) == 0 {
+	switch v := arg.(type) {
+	case []string:
+		if len(v) == 0 {
+			return nil, errors.New("command array cannot be empty")
+		}
+		return v, nil
+	case []interface{}:
+		if len(v) == 0 {
+			return nil, errors.New("command array cannot be empty")
+		}
+		command := make([]string, 0, len(v))
+		for _, item := range v {
+			str, ok := item.(string)
+			if !ok {
+				return nil, errors.New("command items must be strings")
+			}
+			command = append(command, str)
+		}
+		return command, nil
+	default:
+		return nil, errors.New("command must be an array of strings")
+	}
-		return nil, errors.New("command array cannot be empty")
-	}
-	command := make([]string, 0, len(raw))
-	for _, item := range raw {
-		str, ok := item.(string)
-		if !ok {
-			return nil, errors.New("command items must be strings")
-		}
-		command = append(command, str)
-	}
-	return command, nil
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/toolsets/cluster-diagnostics/nodes/nodes.go` around lines 115 - 118, The
code that parses the node `command` currently only handles values of type
[]interface{} and thus rejects valid []string inputs; update the parsing logic
(the `command` handling block in nodes.go) to accept both []interface{} and
[]string (and also a single string) by using a type switch on the raw `command`
value, converting []interface{} entries to strings, using []string directly when
present, and trimming/validating each entry before storing into the existing
command slice; ensure nil or empty inputs are handled gracefully and keep the
final type as []string used elsewhere.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (1)
pkg/toolsets/cluster-diagnostics/nodes/nodes_test.go (1)

22-69: ⚡ Quick win

Route these tests through the exported tool registration path.

These cases call nodesDebugExec directly, which couples the suite to a private helper instead of the toolset’s public surface. Exercising the handler via InitNodes() keeps the test black-box and also covers the registration contract that callers actually use.

As per coding guidelines, "**/*_test.go: Test the public API only - tests should be black-box and not access internal/private functions`."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/toolsets/cluster-diagnostics/nodes/nodes_test.go` around lines 22 - 69,
Tests currently call the private helper nodesDebugExec directly; change them to
exercise the public registration path by calling InitNodes() to obtain the
exported tool/handler and invoking it with the same
api.ToolHandlerParams/staticRequest inputs (instead of nodesDebugExec). Replace
each nodesDebugExec(params) invocation with the registered handler's execution
method (e.g., the returned Tool.Handler or Tool.Handle/Execute function from
InitNodes()) so the suite remains black-box and still asserts the same error
results for missing node, bad command type, missing command, and empty command
array.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@evals/tasks/cluster-diagnostics/list-host-etc/task.yaml`:
- Around line 3-4: The task's metadata label metadata.labels.suite is set to
"openshift" so the task is not matched by the new taskSets that filter on
"suite: cluster-diagnostics"; update metadata.labels.suite from "openshift" to
"cluster-diagnostics" (i.e., change the label value in task.yaml) so the task is
discoverable by the cluster-diagnostics runs.

In `@pkg/cluster-diagnostics/nodesdebug/nodes_debug.go`:
- Around line 156-163: The current error branch for PodsExec discards
stdout/stderr when execErr occurs; change the branch handling in function(s)
that call PodsExec (referencing PodsExec, retrieveLogs, NodeDebugContainerName,
created.Name and ns) so that you still return stdout and stderr alongside the
wrapped error (and include logs if available) instead of empty strings—i.e., on
execErr return stdout, stderr, and an error that wraps execErr and appends any
retrieved pod logs for context.

In `@pkg/toolsets/cluster-diagnostics/nodes/nodes.go`:
- Around line 85-96: The timeout parsing currently accepts fractional and
non-positive values from params.GetArguments()["timeout_seconds"]; update the
switch in the timeout parsing block so that: when timeoutRaw is float64 you
reject it unless it represents an exact integer (e.g., math.Mod(v,1)==0) and its
integer value is >= 1; when timeoutRaw is int or int64 ensure the value is >= 1;
for any fractional or non-positive values return api.NewToolCallResult("",
errors.New("timeout_seconds must be an integer >= 1")) (same for unsupported
types) so NodesDebugExec receives only valid positive integer seconds. Ensure
you reference the same variables (timeoutRaw, timeout, params.GetArguments())
and keep the existing return path shape.

---

Nitpick comments:
In `@pkg/toolsets/cluster-diagnostics/nodes/nodes_test.go`:
- Around line 22-69: Tests currently call the private helper nodesDebugExec
directly; change them to exercise the public registration path by calling
InitNodes() to obtain the exported tool/handler and invoking it with the same
api.ToolHandlerParams/staticRequest inputs (instead of nodesDebugExec). Replace
each nodesDebugExec(params) invocation with the registered handler's execution
method (e.g., the returned Tool.Handler or Tool.Handle/Execute function from
InitNodes()) so the suite remains black-box and still asserts the same error
results for missing node, bad command type, missing command, and empty command
array.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 24fbdc2f-698b-486a-920f-5e27b8b5c106

📥 Commits

Reviewing files that changed from the base of the PR and between 15a931f and b4e45d7.

📒 Files selected for processing (22)
  • README.md
  • docs/configuration.md
  • evals/claude-code/eval.yaml
  • evals/gemini-agent/eval.yaml
  • evals/openai-agent/eval.yaml
  • evals/tasks/cluster-diagnostics/README.md
  • evals/tasks/cluster-diagnostics/check-node-kernel/cleanup.sh
  • evals/tasks/cluster-diagnostics/check-node-kernel/task.yaml
  • evals/tasks/cluster-diagnostics/check-node-kernel/verify.sh
  • evals/tasks/cluster-diagnostics/list-host-etc/cleanup.sh
  • evals/tasks/cluster-diagnostics/list-host-etc/task.yaml
  • evals/tasks/cluster-diagnostics/list-host-etc/verify.sh
  • internal/tools/update-readme/main.go
  • pkg/cluster-diagnostics/nodesdebug/nodes_debug.go
  • pkg/cluster-diagnostics/nodesdebug/nodes_debug_test.go
  • pkg/cluster-diagnostics/nodesdebug/nodes_debug_testhelpers_test.go
  • pkg/mcp/modules.go
  • pkg/mcp/testdata/toolsets-cluster-diagnostics-tools.json
  • pkg/mcp/toolsets_test.go
  • pkg/toolsets/cluster-diagnostics/nodes/nodes.go
  • pkg/toolsets/cluster-diagnostics/nodes/nodes_test.go
  • pkg/toolsets/cluster-diagnostics/toolset.go
✅ Files skipped from review due to trivial changes (8)
  • evals/tasks/cluster-diagnostics/list-host-etc/cleanup.sh
  • evals/tasks/cluster-diagnostics/check-node-kernel/cleanup.sh
  • evals/tasks/cluster-diagnostics/list-host-etc/verify.sh
  • evals/tasks/cluster-diagnostics/check-node-kernel/verify.sh
  • pkg/mcp/modules.go
  • evals/tasks/cluster-diagnostics/README.md
  • docs/configuration.md
  • pkg/mcp/testdata/toolsets-cluster-diagnostics-tools.json
🚧 Files skipped from review as they are similar to previous changes (2)
  • README.md
  • internal/tools/update-readme/main.go

Comment thread evals/tasks/cluster-diagnostics/list-host-etc/task.yaml Outdated
Comment thread pkg/cluster-diagnostics/nodesdebug/nodes_debug.go Outdated
Comment thread pkg/toolsets/cluster-diagnostics/nodes/nodes.go

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
pkg/toolsets/cluster-diagnostics/nodes/nodes.go (2)

114-117: ⚡ Quick win

Consider including stderr when both stdout and stderr contain data.

The current logic prefers stdout and only falls back to stderr when stdout is empty. When a command writes diagnostic information to stderr while also producing normal output to stdout, the diagnostic information is lost. Consider combining both streams when both are non-empty, or at minimum documenting this design choice.

💡 Potential improvement
 output := stdout
 if stdout == "" {
   output = stderr
+} else if stderr != "" {
+  output = stdout + "\n\n--- stderr ---\n" + stderr
 }

Alternatively, if preferring stdout-only is intentional, add a comment explaining the rationale.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/toolsets/cluster-diagnostics/nodes/nodes.go` around lines 114 - 117, The
current selection logic in nodes.go (variables stdout, stderr and output)
discards stderr when stdout is non-empty; change it so diagnostic stderr isn't
lost by combining both streams when both are non-empty (e.g., set output to
stdout + "\n" + stderr or similar), or if you intentionally prefer stdout-only,
add a brief comment next to the output assignment explaining that choice and why
stderr is ignored; update any downstream consumers of output (where output is
used) to handle the combined content if you choose to concatenate.

110-110: Drop the concern about empty hostPath/mountPath; focus on stdout/stderr handling.

  • hostPath/mountPath empty strings are intentionally supported: validatePath("") returns nil, and createDebugPod defaults them to "/" and "/host" when empty.
  • pkg/toolsets/cluster-diagnostics/nodes/nodes.go returns only stdout when it’s non-empty (stderr is used only when stdout == ""), so commands that emit diagnostics to stderr while also producing stdout will lose that information—consider combining/appending stderr when both streams are present.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/toolsets/cluster-diagnostics/nodes/nodes.go` at line 110, The current
call to client.NodesDebugExec in NodesDebugExec handling returns stdout and
stderr separately but downstream code in nodes.go only returns stdout when
non-empty, discarding stderr; update the logic that consumes stdout, stderr,
execErr (from client.NodesDebugExec) to preserve both streams by appending or
combining stderr into the returned output when stdout is non-empty (e.g.,
"stdout\n[stderr]\n" or similar), ensuring execErr is still returned/propagated;
locate the handling around the stdout, stderr, execErr variables in nodes.go and
change the return/response logic to include stderr whenever it contains data
instead of dropping it.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@pkg/toolsets/cluster-diagnostics/nodes/nodes.go`:
- Around line 114-117: The current selection logic in nodes.go (variables
stdout, stderr and output) discards stderr when stdout is non-empty; change it
so diagnostic stderr isn't lost by combining both streams when both are
non-empty (e.g., set output to stdout + "\n" + stderr or similar), or if you
intentionally prefer stdout-only, add a brief comment next to the output
assignment explaining that choice and why stderr is ignored; update any
downstream consumers of output (where output is used) to handle the combined
content if you choose to concatenate.
- Line 110: The current call to client.NodesDebugExec in NodesDebugExec handling
returns stdout and stderr separately but downstream code in nodes.go only
returns stdout when non-empty, discarding stderr; update the logic that consumes
stdout, stderr, execErr (from client.NodesDebugExec) to preserve both streams by
appending or combining stderr into the returned output when stdout is non-empty
(e.g., "stdout\n[stderr]\n" or similar), ensuring execErr is still
returned/propagated; locate the handling around the stdout, stderr, execErr
variables in nodes.go and change the return/response logic to include stderr
whenever it contains data instead of dropping it.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 7d3ea014-549a-484b-9971-6df74508ce89

📥 Commits

Reviewing files that changed from the base of the PR and between b4e45d7 and a0d6fda.

📒 Files selected for processing (20)
  • README.md
  • docs/configuration.md
  • evals/claude-code/eval.yaml
  • evals/gemini-agent/eval.yaml
  • evals/openai-agent/eval.yaml
  • evals/tasks/cluster-diagnostics/README.md
  • evals/tasks/cluster-diagnostics/check-kubelet-active/cleanup.sh
  • evals/tasks/cluster-diagnostics/check-kubelet-active/task.yaml
  • evals/tasks/cluster-diagnostics/list-host-etc/cleanup.sh
  • evals/tasks/cluster-diagnostics/list-host-etc/task.yaml
  • internal/tools/update-readme/main.go
  • pkg/cluster-diagnostics/nodesdebug/nodes_debug.go
  • pkg/cluster-diagnostics/nodesdebug/nodes_debug_test.go
  • pkg/cluster-diagnostics/nodesdebug/nodes_debug_testhelpers_test.go
  • pkg/mcp/modules.go
  • pkg/mcp/testdata/toolsets-cluster-diagnostics-tools.json
  • pkg/mcp/toolsets_test.go
  • pkg/toolsets/cluster-diagnostics/nodes/nodes.go
  • pkg/toolsets/cluster-diagnostics/nodes/nodes_test.go
  • pkg/toolsets/cluster-diagnostics/toolset.go
✅ Files skipped from review due to trivial changes (4)
  • evals/tasks/cluster-diagnostics/list-host-etc/task.yaml
  • docs/configuration.md
  • evals/tasks/cluster-diagnostics/README.md
  • evals/tasks/cluster-diagnostics/check-kubelet-active/cleanup.sh
🚧 Files skipped from review as they are similar to previous changes (13)
  • pkg/mcp/toolsets_test.go
  • evals/claude-code/eval.yaml
  • evals/gemini-agent/eval.yaml
  • pkg/mcp/testdata/toolsets-cluster-diagnostics-tools.json
  • pkg/mcp/modules.go
  • evals/openai-agent/eval.yaml
  • pkg/toolsets/cluster-diagnostics/nodes/nodes_test.go
  • evals/tasks/cluster-diagnostics/list-host-etc/cleanup.sh
  • pkg/toolsets/cluster-diagnostics/toolset.go
  • README.md
  • pkg/cluster-diagnostics/nodesdebug/nodes_debug_testhelpers_test.go
  • pkg/cluster-diagnostics/nodesdebug/nodes_debug.go
  • pkg/cluster-diagnostics/nodesdebug/nodes_debug_test.go

{
Tool: api.Tool{
Name: "nodes_debug_exec",
Description: "Run commands on an OpenShift node using a privileged debug pod with comprehensive troubleshooting utilities. The debug pod uses the UBI9 toolbox image which includes: systemd tools (systemctl, journalctl), networking tools (ss, ip, ping, traceroute, nmap), process tools (ps, top, lsof, strace), file system tools (find, tar, rsync), and debugging tools (gdb). The host filesystem is mounted at /host, allowing commands to chroot /host if needed to access node-level resources. Output is truncated to the most recent 100 lines, so prefer filters like grep when expecting large logs.",

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@arkadeepsen maybe I am missing it, but I don't see the 100-line truncation logic? Can we either:

  1. add the truncation logic
  2. remove that from the tool description

My inclination would be to go with 2 as most harnesses are adding capabilities to better handle large tool outputs (e.g. claude code dumps it into a file which it can then read from selectively, rather than putting the full output into the context)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have removed the truncation part from the description.

// pollForRunning polls the debug pod until it becomes running or times out.
func (n *NodeDebug) pollForRunning(ctx context.Context, namespace string, podName string, timeout time.Duration) error {
// Poll until the pod is running or the timeout is reached.
err := wait.PollUntilContextTimeout(ctx, 2*time.Second, timeout, true, func(ctx context.Context) (bool, error) {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we exit early here if the pod is obviously failed?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/cluster-diagnostics/nodesdebug/nodes_debug.go`:
- Around line 157-160: The retrieveLogs method call is ignoring its error return
value using a blank identifier on line 159, which violates production code
guidelines. Instead of using the blank identifier to discard the error, capture
the error returned from n.retrieveLogs and handle it appropriately by logging it
or including it in your error context, since this code is already in an error
handling path and suppressing the error can hide important debugging
information.
- Around line 269-272: The error formatting in the PodFailed case at the
fmt.Errorf call uses the %w verb with pod.Status.Message, which is a string
type, not an error type. The %w verb is designed only for error wrapping and
should not be used with strings. Replace the %w verb with %s to properly format
the string value of pod.Status.Message in the error message.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: d7904a84-b070-4dc5-93b6-f2f8cbcfcfdf

📥 Commits

Reviewing files that changed from the base of the PR and between 636bec8 and 86d6c16.

📒 Files selected for processing (20)
  • README.md
  • docs/configuration.md
  • evals/claude-code/eval.yaml
  • evals/gemini-agent/eval.yaml
  • evals/openai-agent/eval.yaml
  • evals/tasks/cluster-diagnostics/README.md
  • evals/tasks/cluster-diagnostics/check-kubelet-active/cleanup.sh
  • evals/tasks/cluster-diagnostics/check-kubelet-active/task.yaml
  • evals/tasks/cluster-diagnostics/list-host-etc/cleanup.sh
  • evals/tasks/cluster-diagnostics/list-host-etc/task.yaml
  • internal/tools/update-readme/main.go
  • pkg/cluster-diagnostics/nodesdebug/nodes_debug.go
  • pkg/cluster-diagnostics/nodesdebug/nodes_debug_test.go
  • pkg/cluster-diagnostics/nodesdebug/nodes_debug_testhelpers_test.go
  • pkg/mcp/modules.go
  • pkg/mcp/testdata/toolsets-cluster-diagnostics-tools.json
  • pkg/mcp/toolsets_test.go
  • pkg/toolsets/cluster-diagnostics/nodes/nodes.go
  • pkg/toolsets/cluster-diagnostics/nodes/nodes_test.go
  • pkg/toolsets/cluster-diagnostics/toolset.go
✅ Files skipped from review due to trivial changes (5)
  • evals/tasks/cluster-diagnostics/check-kubelet-active/cleanup.sh
  • pkg/mcp/modules.go
  • pkg/mcp/testdata/toolsets-cluster-diagnostics-tools.json
  • docs/configuration.md
  • README.md
🚧 Files skipped from review as they are similar to previous changes (12)
  • evals/openai-agent/eval.yaml
  • evals/tasks/cluster-diagnostics/list-host-etc/cleanup.sh
  • evals/claude-code/eval.yaml
  • pkg/mcp/toolsets_test.go
  • evals/tasks/cluster-diagnostics/list-host-etc/task.yaml
  • evals/tasks/cluster-diagnostics/check-kubelet-active/task.yaml
  • pkg/cluster-diagnostics/nodesdebug/nodes_debug_testhelpers_test.go
  • internal/tools/update-readme/main.go
  • evals/gemini-agent/eval.yaml
  • pkg/toolsets/cluster-diagnostics/nodes/nodes_test.go
  • pkg/toolsets/cluster-diagnostics/nodes/nodes.go
  • pkg/cluster-diagnostics/nodesdebug/nodes_debug_test.go

Comment thread pkg/cluster-diagnostics/nodesdebug/nodes_debug.go Outdated
Comment thread pkg/cluster-diagnostics/nodesdebug/nodes_debug.go
A new toolset, cluster-diagnostics, is created and the node_debug_exec tool is now added as part of
it. The code is now organized under packages related to node (pkg/cluster-diagnostics/nodesdebug
and pkg/toolsets/cluster-diagnostics/nodes).

Instead of directly creating debug pod with the passed command, pod is created with sleep command.
Then, the passed command is executed inside the pod using pod_exec. Logs are now only fetched when
there's pod_exec error. This way the stdout and stderr data can be easily obtained instead of
processing the output from the pod logs.

Though, the node_debug_exec tool doesn't accept hostPath and mountPath as inputs, the internal
function now accepts these params which can be passed by other tools which need node debug.
@openshift-ci

openshift-ci Bot commented Jun 16, 2026

Copy link
Copy Markdown

@arkadeepsen: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants