
Rhobs implementation #767

Draft
AlexSmithGH wants to merge 6 commits into openshift:main from AlexSmithGH:rhobs-implementation

Conversation

AlexSmithGH commented Apr 13, 2026

Contributor

What type of PR is this?

(feature/bug/documentation/other)

What this PR does / Why we need it?

Special notes for your reviewer

Test Coverage

Guidelines for CAD investigations

  • New investigations should be accompanied by unit tests and/or step-by-step manual tests in the investigation README.
  • Actioning investigations should be locally tested in staging, and E2E testing is desired. See README for more info on investigation graduation process.

Test coverage checks

  • Added tests
  • Created Jira card to add unit tests
  • This PR may not need unit tests

Pre-checks (if applicable)

  • Ran unit tests locally
  • Validated the changes in a cluster
  • Included documentation changes with PR

Summary by CodeRabbit

  • New Features

    • Added RHOBS-based log fetching for investigation analysis, replacing the previous Dynatrace-based log sourcing
    • Investigation reports now include formatted logs and Grafana explore links from log queries
  • Configuration

    • New required environment variable CAD_GRAFANA_TOKEN must be set for Grafana/Loki API access during investigations

@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 13, 2026
openshift-ci Bot commented Apr 13, 2026

Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci openshift-ci Bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 13, 2026
openshift-ci Bot commented Apr 13, 2026

Contributor

PR needs rebase.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

openshift-ci Bot commented Apr 13, 2026

Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: AlexSmithGH

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 13, 2026
coderabbitai Bot commented Apr 13, 2026


Walkthrough

The changes introduce a new RHOBS (Red Hat Observability Stack) Loki client package and integrate it into the investigation framework. The controller now manages a Grafana token from environment variables, the OCM client retrieves RHOBS cell endpoints from cluster labels, and investigation resource builders construct RHOBS clients for log querying. The HCP etcd analysis investigation migrates from Dynatrace URL construction to fetching and displaying logs from Loki.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| RHOBS Core Package: pkg/rhobs/types.go, pkg/rhobs/rhobs.go, pkg/rhobs/rhobs_test.go, pkg/rhobs/mock/rhobsmock.go | New package implementing Loki HTTP client with query support, response parsing, log formatting, and GoMock fixtures; includes comprehensive unit tests covering client initialization, query execution, error handling, and result formatting. |
| Investigation RHOBS Integration: pkg/investigations/investigation/investigation.go | Extended NewResourceBuilder to accept grafanaToken, added WithRHOBSClient() builder method, added conditional buildRHOBSClientResource() logic to create RHOBS client after fetching cell endpoint from OCM, and removed Dynatrace URL handling; Resources struct gains RHOBSClient, RHOBSCell, GrafanaToken fields and loses DynatraceManagementClusterURL. |
| HCP Etcd Investigation: pkg/investigations/etcddatabasequotalowspace/etcddatabasequotalowspace.go, pkg/investigations/etcddatabasequotalowspace/etcddatabasequotalowspace_test.go | Changed log fetching from Dynatrace URL construction to querying RHOBS via RHOBSClient.QueryLogs, replaced Dynatrace helpers with buildRHOBSLogsURL and fetchRHOBSLogs, updated investigation labels to reflect RHOBS state, and added test coverage for successful log retrieval and query failure scenarios. |
| OCM Client RHOBS Support: pkg/ocm/ocm.go, pkg/ocm/mock/ocmmock.go | Added GetRHOBSCell(clusterID string) (string, error) interface method and implementation to retrieve RHOBS cell endpoint from management cluster external configuration labels, plus GoMock fixture methods. |
| Controller Grafana Token: pkg/controller/controller.go | Added GrafanaToken field to Dependencies struct, populated from required CAD_GRAFANA_TOKEN environment variable in initializeDependencies, and passed to investigation.NewResourceBuilder call. |
| Documentation and Environment: CLAUDE.md, test/set_stage_env.sh | Updated documentation to describe RHOBS integration and RHOBSCell exposure; added Vault secret path for Grafana credentials in staging environment setup. |

Sequence Diagram

sequenceDiagram
    participant Ctrl as Controller
    participant Dep as Dependencies
    participant ResB as ResourceBuilder
    participant OCM as OCM Client
    participant RHOBS as RHOBS Client
    participant Loki as Loki Server
    
    Ctrl->>Dep: initializeDependencies()
    Dep->>Dep: Read CAD_GRAFANA_TOKEN env
    Ctrl->>ResB: NewResourceBuilder(..., grafanaToken)
    ResB->>ResB: Store grafanaToken
    
    Ctrl->>ResB: WithRHOBSClient()
    Ctrl->>ResB: Build()
    
    ResB->>OCM: GetClusterInfo(managementClusterID)
    OCM-->>ResB: Cluster details
    
    ResB->>OCM: GetRHOBSCell(clusterID)
    OCM->>OCM: Query ext config labels<br/>for rhobs-cell
    OCM-->>ResB: RHOBSCell endpoint
    
    ResB->>RHOBS: NewClient(BaseURL=RHOBSCell,<br/>Token=grafanaToken)
    RHOBS-->>ResB: RHOBS Client
    ResB-->>Ctrl: Resources with RHOBSClient
    
    Ctrl->>RHOBS: QueryLogs(logQLQuery, timeRange, limit)
    RHOBS->>Loki: HTTP GET query_range<br/>(with auth headers)
    Loki-->>RHOBS: QueryRangeResponse JSON
    RHOBS->>RHOBS: Parse streams and entries
    RHOBS-->>Ctrl: LogQueryResult
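
The last steps of the diagram (parsing streams and entries into a result) can be sketched roughly as follows. This is an illustration only, not the PR's `pkg/rhobs` code: it assumes the standard Loki `query_range` response shape for log queries, and the names `queryRangeResponse`, `logEntry`, `parseStreams`, and `formatEntries` are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strconv"
	"strings"
	"time"
)

// queryRangeResponse mirrors the shape of a Loki query_range response for
// log queries (resultType "streams"). Hypothetical type, not the PR's.
type queryRangeResponse struct {
	Status string `json:"status"`
	Data   struct {
		ResultType string `json:"resultType"`
		Result     []struct {
			Stream map[string]string `json:"stream"`
			// Each value is [unix-nanosecond timestamp, log line].
			Values [][2]string `json:"values"`
		} `json:"result"`
	} `json:"data"`
}

// logEntry is a hypothetical flattened form of one log line.
type logEntry struct {
	Timestamp time.Time
	Line      string
}

// parseStreams decodes a query_range body and returns every entry from every
// stream, sorted by timestamp (oldest first).
func parseStreams(body []byte) ([]logEntry, error) {
	var resp queryRangeResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, fmt.Errorf("decoding loki response: %w", err)
	}
	if resp.Status != "success" {
		return nil, fmt.Errorf("loki query returned status %q", resp.Status)
	}
	var entries []logEntry
	for _, stream := range resp.Data.Result {
		for _, v := range stream.Values {
			ns, err := strconv.ParseInt(v[0], 10, 64)
			if err != nil {
				return nil, fmt.Errorf("parsing timestamp %q: %w", v[0], err)
			}
			entries = append(entries, logEntry{Timestamp: time.Unix(0, ns).UTC(), Line: v[1]})
		}
	}
	sort.SliceStable(entries, func(i, j int) bool {
		return entries[i].Timestamp.Before(entries[j].Timestamp)
	})
	return entries, nil
}

// formatEntries renders one "RFC3339-timestamp line" row per entry.
func formatEntries(entries []logEntry) string {
	var b strings.Builder
	for _, e := range entries {
		fmt.Fprintf(&b, "%s %s\n", e.Timestamp.Format(time.RFC3339), e.Line)
	}
	return b.String()
}
```

Sorting after flattening matters because Loki returns entries grouped per stream, so lines from different streams are not interleaved chronologically in the raw response.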

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

🚥 Pre-merge checks | ✅ 8 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description check | ⚠️ Warning | The PR description consists entirely of unfilled template placeholders; no actual content describes the changes, objectives, test coverage, or pre-checks performed. | Fill in all template sections with concrete details: specify PR type (feature), explain why RHOBS integration is needed, document test coverage and validation status, and confirm pre-checks completion. |
| Title check | ❓ Inconclusive | The title 'Rhobs implementation' is vague and generic, lacking specific detail about what the RHOBS integration accomplishes or which components are affected. | Provide a more descriptive title that clarifies the specific purpose, such as 'Add RHOBS log querying for HCP etcd analysis' or 'Integrate RHOBS client for log fetching in investigations'. |
✅ Passed checks (8 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Stable And Deterministic Test Names | ✅ Passed | PR uses standard Go testing package with static, deterministic test names and no dynamic values in test titles. |
| Test Structure And Quality | ✅ Passed | Tests follow established codebase patterns: while some files use assertion messages, the predominant pattern across the repository is assertions without custom messages, which the new tests align with. |
| Microshift Test Compatibility | ✅ Passed | The PR adds only standard Go unit tests using the testing package, not Ginkgo e2e tests, so the MicroShift compatibility check is not applicable. |
| Single Node Openshift (Sno) Test Compatibility | ✅ Passed | This pull request does not add any Ginkgo e2e tests. All new tests are standard Go unit tests. |
| Topology-Aware Scheduling Compatibility | ✅ Passed | PR implements RHOBS integration for HCP cluster log fetching without introducing new Kubernetes scheduling constraints. Changes are limited to RHOBS client package, investigation logic, and environment variable configuration. |
| Ote Binary Stdout Contract | ✅ Passed | Pull request contains no stdout writes in process-level code, only library code with fmt.Fprintf writing to strings.Builder buffers. |
| Ipv6 And Disconnected Network Test Compatibility | ✅ Passed | The PR does not add any new Ginkgo e2e tests. New test files are standard Go unit tests, not Ginkgo-based tests. The custom check for Ginkgo e2e tests with IPv4 assumptions or external connectivity requirements is not applicable. |


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (4)
pkg/investigations/etcddatabasequotalowspace/etcddatabasequotalowspace.go (1)

610-612: Consider making the time window configurable or dynamic.

The hardcoded 30-minute time window may be insufficient if the analysis job takes longer to complete or if there's significant delay before log ingestion. Consider:

  • Using the job creation timestamp as the start time
  • Making the window configurable

This is a minor concern since most jobs should complete well within 30 minutes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/investigations/etcddatabasequotalowspace/etcddatabasequotalowspace.go`
around lines 610 - 612, Replace the hardcoded 30-minute window (variables now
and start computed as now.Add(-30 * time.Minute)) with a configurable or dynamic
duration: add a parameter or config value (e.g., windowDuration or
etcdQueryWindow) and use that to compute start, and if a job or resource
creation timestamp is available (e.g., job.CreationTimestamp or the
investigation object’s CreatedAt), use max(job.CreationTimestamp,
now.Add(-windowDuration)) as the start time; update any callers to pass the new
config/parameter or default to 30 minutes when unspecified.
test/set_stage_env.sh (1)

9-9: Shellcheck warnings apply to this and all similar vault export loops.

The static analysis hints (SC2163, SC2086) are valid but apply equally to lines 6-8. The pattern is consistent with existing code, so this is acceptable for now. Consider a future refactor to handle values with spaces safely:

💡 Safer alternative pattern (optional)
while IFS='=' read -r key value; do
  export "$key=$value"
done < <(vault kv get -format=json osd-sre/configuration-anomaly-detection/grafana/stg | jq -r '.data.data | to_entries[] | "\(.key)=\(.value)"')
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/set_stage_env.sh` at line 9, The current for-loop that reads vault
output into variable v via `for v in $(vault kv get -format=json ... | jq -r
"...")` is unsafe for values with spaces and triggers shellcheck warnings
SC2163/SC2086; replace this loop in set_stage_env.sh with a while-read pattern
that uses a safe IFS and read -r to split key and value (e.g., `while IFS='='
read -r key value; do export "$key=$value"; done < <(vault kv get ... | jq -r
'...')`) so the `vault kv get ... | jq -r` pipeline is consumed safely and
keys/values with spaces are preserved, and ensure the loop uses the same
variable names (key, value) instead of `v`.
pkg/investigations/investigation/investigation.go (1)

212-216: WithRHOBSClient() creates an unnecessary dependency on management REST config.

The RHOBS client only needs the management cluster name (to look up the RHOBS cell label via OCM API), but WithManagementRestConfig() forces creation of a backplane REST config to the management cluster. If backplane connectivity to the management cluster fails, RHOBS client creation will fail even though it doesn't actually require that connection.

Consider creating a lighter dependency that only ensures the cluster is loaded and HCP info is fetched without requiring the management REST config.

♻️ Suggested approach
 func (r *ResourceBuilderT) WithRHOBSClient() ResourceBuilder {
-	r.WithManagementRestConfig()
+	r.WithCluster() // Only need cluster info to determine if HCP and get management cluster name
 	r.buildRHOBSClient = true
 	return r
 }

Then ensure buildRHOBSClientResource() fetches the management cluster name if not already available (similar logic to what's in buildManagementClusterResources but without creating REST configs).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/investigations/investigation/investigation.go` around lines 212 - 216,
WithRHOBSClient currently forces a management REST config via
WithManagementRestConfig(); remove that call so WithRHOBSClient() only sets
buildRHOBSClient = true. Update buildRHOBSClientResource() to lazily ensure the
management cluster name is available (reusing the name-fetching logic from
buildManagementClusterResources without creating a REST config or backplane
connection) and fail only if the cluster name/ HCP info cannot be resolved,
thereby avoiding unnecessary backplane REST config creation when building the
RHOBS client.
pkg/rhobs/rhobs.go (1)

69-71: Minor: Content-Type header is unnecessary for GET requests.

Content-Type describes the body of the request, but GET requests don't have a body. While this won't break anything (Loki will ignore it), it could be removed for clarity. Consider using Accept: application/json instead if you want to indicate the expected response format.

♻️ Optional cleanup
 	req.Header.Set("Authorization", "Bearer "+c.token)
-	req.Header.Set("Content-Type", "application/json")
+	req.Header.Set("Accept", "application/json")
 	req.Header.Set("User-Agent", "configuration-anomaly-detection")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/rhobs/rhobs.go` around lines 69 - 71, Remove the unnecessary Content-Type
header on GET requests by deleting the req.Header.Set("Content-Type",
"application/json") call and, if you want to declare the expected response
format, replace it with req.Header.Set("Accept", "application/json"); update the
header handling where req is prepared (the existing
req.Header.Set("Authorization", ...), req.Header.Set("Content-Type", ...),
req.Header.Set("User-Agent", ...) block) so GET requests no longer set
Content-Type and instead set Accept when appropriate.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 1bc8288f-7d58-4609-8a56-e846417515fe

📥 Commits

Reviewing files that changed from the base of the PR and between c83e5b9 and 5fbcfd1.

📒 Files selected for processing (12)
  • CLAUDE.md
  • pkg/controller/controller.go
  • pkg/investigations/etcddatabasequotalowspace/etcddatabasequotalowspace.go
  • pkg/investigations/etcddatabasequotalowspace/etcddatabasequotalowspace_test.go
  • pkg/investigations/investigation/investigation.go
  • pkg/ocm/mock/ocmmock.go
  • pkg/ocm/ocm.go
  • pkg/rhobs/mock/rhobsmock.go
  • pkg/rhobs/rhobs.go
  • pkg/rhobs/rhobs_test.go
  • pkg/rhobs/types.go
  • test/set_stage_env.sh

Comment on lines +141 to +145
// Load Grafana/RHOBS token for HCP log fetching
grafanaToken := os.Getenv("CAD_GRAFANA_TOKEN")
if grafanaToken == "" {
	return nil, fmt.Errorf("missing required environment variable CAD_GRAFANA_TOKEN")
}


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if CAD_GRAFANA_TOKEN is referenced in deployment manifests or documentation
echo "=== Searching for CAD_GRAFANA_TOKEN references in deployment configs ==="
rg -l "CAD_GRAFANA_TOKEN" --type yaml --type json -g '!*_test*' || echo "No YAML/JSON references found"

echo ""
echo "=== Checking for environment variable documentation ==="
rg -n "CAD_GRAFANA_TOKEN" -g '*.md' -g '!CLAUDE.md'

Repository: openshift/configuration-anomaly-detection

Length of output: 244


🏁 Script executed:

#!/bin/bash

echo "=== All references to GrafanaToken in codebase ==="
rg -n "GrafanaToken" --type go

echo ""
echo "=== All references to CAD_GRAFANA_TOKEN ==="
rg -n "CAD_GRAFANA_TOKEN"

echo ""
echo "=== Search for deployment manifests in common locations ==="
find . -type f \( -name "*.yaml" -o -name "*.yml" \) | head -20

echo ""
echo "=== Check for Dockerfile or helm configs ==="
find . -type f \( -name "Dockerfile*" -o -name "Chart.yaml" -o -name "values*.yaml" \) | head -20

echo ""
echo "=== Check if token is used conditionally anywhere ==="
rg -B3 -A3 "GrafanaToken" --type go | head -50

Repository: openshift/configuration-anomaly-detection

Length of output: 6638


🏁 Script executed:

#!/bin/bash

echo "=== Check openshift/template.yaml for CAD_GRAFANA_TOKEN ==="
rg -n "CAD_GRAFANA_TOKEN" openshift/template.yaml || echo "Not found in template.yaml"

echo ""
echo "=== Check if template.yaml defines environment variables ==="
rg -B2 -A2 "env:" openshift/template.yaml | head -40

echo ""
echo "=== Check full template for GRAFANA references ==="
rg -n "GRAFANA\|Grafana" openshift/template.yaml

echo ""
echo "=== Check build Dockerfile for env vars ==="
grep -n "ENV CAD_\|ARG CAD_" build/Dockerfile || echo "No CAD env vars in Dockerfile"

echo ""
echo "=== Full context of template.yaml around deployment spec ==="
head -100 openshift/template.yaml | tail -50

Repository: openshift/configuration-anomaly-detection

Length of output: 2500


🏁 Script executed:

#!/bin/bash

echo "=== Search for any secret definitions in template or related configs ==="
rg -n "name:.*token\|name:.*grafana\|secretRef" openshift/template.yaml | head -20

echo ""
echo "=== Check if there are parameter definitions for CAD_GRAFANA_TOKEN ==="
rg -n "CAD_GRAFANA_TOKEN\|parameters:" openshift/template.yaml | head -20

Repository: openshift/configuration-anomaly-detection

Length of output: 229


Critical: CAD_GRAFANA_TOKEN is required but not defined in deployment configuration.

The code now enforces CAD_GRAFANA_TOKEN as a required environment variable (lines 141-145). The controller will fail to start if this variable is missing. However, openshift/template.yaml does not define this environment variable or configure the corresponding Kubernetes secret, which will cause immediate deployment failures.

Required actions before merging:

  • Add CAD_GRAFANA_TOKEN to the deployment template (e.g., as a secretRef for cad-grafana-token secret, similar to existing cad-pd-token and cad-ocm-client-secret)
  • Ensure the secret is provisioned in all deployment environments
  • Update CI/CD pipelines to provide this token
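
On the code side, a startup check that reports every missing variable in one error can make this kind of deployment gap quicker to diagnose. The following is a hedged sketch, not the controller's actual code: `requireEnv` and `loadGrafanaToken` are hypothetical helpers, and only the variable name `CAD_GRAFANA_TOKEN` comes from the PR.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// requireEnv returns the values of the named variables, or a single error
// listing every variable that is unset or empty. lookup is injectable so the
// check can be tested without touching the process environment.
func requireEnv(lookup func(string) string, names ...string) (map[string]string, error) {
	values := make(map[string]string, len(names))
	var missing []string
	for _, name := range names {
		v := lookup(name)
		if v == "" {
			missing = append(missing, name)
			continue
		}
		values[name] = v
	}
	if len(missing) > 0 {
		return nil, fmt.Errorf("missing required environment variables: %s", strings.Join(missing, ", "))
	}
	return values, nil
}

// loadGrafanaToken shows a possible call site using the real environment.
func loadGrafanaToken() (string, error) {
	env, err := requireEnv(os.Getenv, "CAD_GRAFANA_TOKEN")
	if err != nil {
		return "", err
	}
	return env["CAD_GRAFANA_TOKEN"], nil
}
```

This does not replace the deployment changes listed above; the Secret and the template entry are still needed for the controller to start.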
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/controller/controller.go` around lines 141 - 145, The controller now
requires the environment variable CAD_GRAFANA_TOKEN (referenced as grafanaToken)
but deployments and secrets are not provisioned; add CAD_GRAFANA_TOKEN to the
deployment template as an env var sourced from a Kubernetes Secret (e.g.,
secretRef to cad-grafana-token) following the pattern used for cad-pd-token and
cad-ocm-client-secret, create/update the cad-grafana-token Secret in all
environments/CI pipelines, and ensure the CI/CD manifests and secret
provisioning steps are updated so the controller can read grafanaToken at
startup.

Comment on lines +417 to +426
+// buildRHOBSClientResource creates a RHOBS client for fetching logs from RHOBS/Loki
+func (r *ResourceBuilderT) buildRHOBSClientResource() error {
+	if r.builtResources.RHOBSClient != nil {
+		return nil
+	}
+
-	r.builtResources.DynatraceManagementClusterURL = dynatraceURL
+	if r.builtResources.RHOBSCell == "" {
+		if r.builtResources.ManagementClusterName == "" {
+			return fmt.Errorf("management cluster name not available - cannot determine RHOBS cell")
+		}


⚠️ Potential issue | 🟡 Minor

RHOBS client fails with unclear error for non-HCP clusters.

If WithRHOBSClient() is called for a non-HCP cluster, buildManagementClusterResources() returns early at lines 311-314 without setting ManagementClusterName. Then buildRHOBSClientResource() fails with "management cluster name not available - cannot determine RHOBS cell" which doesn't clearly indicate that RHOBS is only supported for HCP clusters.

🛠️ Proposed fix for clearer error messaging
 func (r *ResourceBuilderT) buildRHOBSClientResource() error {
 	if r.builtResources.RHOBSClient != nil {
 		return nil
 	}

+	if !r.builtResources.IsHCP {
+		return fmt.Errorf("RHOBS client is only available for HCP clusters")
+	}
+
 	if r.builtResources.RHOBSCell == "" {
 		if r.builtResources.ManagementClusterName == "" {
 			return fmt.Errorf("management cluster name not available - cannot determine RHOBS cell")
 		}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-// buildRHOBSClientResource creates a RHOBS client for fetching logs from RHOBS/Loki
-func (r *ResourceBuilderT) buildRHOBSClientResource() error {
-	if r.builtResources.RHOBSClient != nil {
-		return nil
-	}
-	r.builtResources.DynatraceManagementClusterURL = dynatraceURL
-	if r.builtResources.RHOBSCell == "" {
-		if r.builtResources.ManagementClusterName == "" {
-			return fmt.Errorf("management cluster name not available - cannot determine RHOBS cell")
-		}
+// buildRHOBSClientResource creates a RHOBS client for fetching logs from RHOBS/Loki
+func (r *ResourceBuilderT) buildRHOBSClientResource() error {
+	if r.builtResources.RHOBSClient != nil {
+		return nil
+	}
+	if !r.builtResources.IsHCP {
+		return fmt.Errorf("RHOBS client is only available for HCP clusters")
+	}
+	if r.builtResources.RHOBSCell == "" {
+		if r.builtResources.ManagementClusterName == "" {
+			return fmt.Errorf("management cluster name not available - cannot determine RHOBS cell")
+		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/investigations/investigation/investigation.go` around lines 417 - 426,
The error message in buildRHOBSClientResource() is ambiguous when
ManagementClusterName is empty because buildManagementClusterResources()
returned early for non-HCP clusters; update the check in
buildRHOBSClientResource (and any callers like WithRHOBSClient) to return a
clearer error indicating RHOBS/Loki is only supported for HCP clusters (e.g.,
mention non-HCP cluster and that management cluster name is unavailable), or
alternatively detect the non-HCP case before attempting RHOBS setup and return
that specific message; reference the buildRHOBSClientResource function and the
builtResources.ManagementClusterName field so you change the error text and/or
add a guard for non-HCP clusters.


Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD.


1 participant