
feat(helm): package Grafana dashboards #2946

Open

osu wants to merge 3 commits into NVIDIA:main from osu:issue-1696-grafana-dashboards


osu (Member) commented Jun 28, 2026

Description

Package three production Grafana dashboards for NICo: site overview, object-lifecycle diagnostics, and API performance. The Helm chart can optionally install them as a ConfigMap for discovery by a Grafana dashboard sidecar.

This change:

  • adds the site overview, object lifecycle, and API performance dashboard JSON assets;
  • adds an opt-in, release-scoped Helm ConfigMap installer with configurable namespace, discovery labels, annotations, folder, and folder-annotation key;
  • keeps the installer disabled by default because the chart does not install Grafana;
  • documents Prometheus/Grafana prerequisites, direct JSON import, metric-prefix selection, and cross-namespace behavior;
  • selects fresh state-controller snapshots to avoid stale object-state panels and supports both the default `carbide` and the `alt_metric_prefix` metric prefixes; and
  • synchronizes the generated Core metrics reference for the Site Explorer status metric (`carbide_site_explorer_last_run_status`).
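As a rough illustration of what the opt-in installer emits, the Python sketch below assembles an equivalent ConfigMap manifest from the dashboard JSON files. It is not the chart's Helm template: `load_dashboards`, `build_dashboards_configmap`, their parameters, and the `grafana_folder` default for the folder-annotation key are assumptions standing in for the configurable namespace, labels, annotations, folder, and folder-annotation key listed above.

```python
import json
from pathlib import Path


def load_dashboards(dashboard_dir):
    """Read every *.json file in dashboard_dir into {filename: raw JSON text}."""
    return {
        path.name: path.read_text(encoding="utf-8")
        for path in sorted(Path(dashboard_dir).glob("*.json"))
    }


def build_dashboards_configmap(
    dashboards,
    name,
    namespace,
    labels=None,
    annotations=None,
    folder=None,
    folder_annotation_key="grafana_folder",  # assumed default; configurable in the chart
):
    """Return a ConfigMap manifest (as a dict) embedding each dashboard as one data key."""
    data = {}
    for filename in sorted(dashboards):
        text = dashboards[filename]
        json.loads(text)  # raises ValueError (JSONDecodeError) on a malformed payload
        data[filename] = text

    # Standard sidecar discovery label; caller-supplied labels are layered on top.
    merged_labels = {"grafana_dashboard": "1"}
    merged_labels.update(labels or {})

    # The folder, if set, is expressed as an annotation under a configurable key.
    merged_annotations = dict(annotations or {})
    if folder:
        merged_annotations[folder_annotation_key] = folder

    metadata = {"name": name, "namespace": namespace, "labels": merged_labels}
    if merged_annotations:
        metadata["annotations"] = merged_annotations

    return {"apiVersion": "v1", "kind": "ConfigMap", "metadata": metadata, "data": data}
```

For example, `build_dashboards_configmap(load_dashboards("helm/dashboards"), "nico-grafana-dashboards", "monitoring", folder="NICo")` would produce one ConfigMap whose data keys are the sorted dashboard filenames; the name, namespace, and folder in that call are made up for illustration.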

Related issues

Closes #1696

Type of Change

  • Add - New feature or capability
  • Change - Changes in existing functionality
  • Fix - Bug fixes
  • Remove - Removed features or deprecated functionality
  • Internal - Internal changes (refactoring, tests, docs, etc.)

Breaking Changes

  • This PR contains breaking changes

The dashboard installer is disabled by default and does not install Grafana, so existing releases render no additional resources unless it is enabled.

Testing

  • Unit tests added/updated
  • Integration tests added/updated
  • Manual testing performed
  • No testing required (docs, internal refactor, etc.)

Passed locally:

  • `jq` structure, unique UID, unique panel-ID, and field-config checks for every dashboard
  • Prometheus `promtool` 3.5.4: all 59 dashboard PromQL expressions parsed successfully
  • `helm lint helm --set global.image.repository=example.invalid/nico --set global.image.tag=test`
  • `helm lint helm -f helm/tests/fixtures/grafana-custom-values.yaml --set global.image.repository=example.invalid/nico --set global.image.tag=test`
  • `helm template` renders with default and enabled values
  • rendered custom ConfigMap readback: exact sorted dashboard keys, custom namespace/metadata, unique UIDs, and all embedded JSON parsed
  • `kubectl apply --dry-run=client --validate=false` on the rendered ConfigMap
  • `helm unittest --strict -f 'tests/grafana_dashboards_test.yaml' helm` (4/4 pass)
  • `helm package helm`: all three dashboard JSON files present in the chart archive
  • Grafana 13.1.0 API import and search readback: all three dashboards imported successfully
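The `jq` checks in the first bullet can be approximated in Python. This sketch assumes the standard Grafana dashboard model: a top-level `uid` and a `panels` list, where collapsed rows nest their children under their own `panels` key. The function names are hypothetical, and the field-config check is not reproduced.

```python
import json


def iter_panels(panels):
    """Yield every panel, including panels nested inside collapsed rows."""
    for panel in panels:
        yield panel
        yield from iter_panels(panel.get("panels", []))


def check_dashboards(raw_dashboards):
    """Return problems found across {filename: JSON text}; an empty list means all checks passed."""
    problems = []
    seen_uids = {}
    for filename, text in sorted(raw_dashboards.items()):
        dashboard = json.loads(text)

        # A UID identifies a dashboard within a Grafana organization,
        # so the packaged set must not reuse one.
        uid = dashboard.get("uid")
        if not uid:
            problems.append(f"{filename}: missing uid")
        elif uid in seen_uids:
            problems.append(f"{filename}: uid {uid!r} duplicates {seen_uids[uid]}")
        else:
            seen_uids[uid] = filename

        # Panel IDs should be unique within a single dashboard.
        panel_ids = set()
        for panel in iter_panels(dashboard.get("panels", [])):
            panel_id = panel.get("id")
            if panel_id is None:
                problems.append(f"{filename}: panel without id")
            elif panel_id in panel_ids:
                problems.append(f"{filename}: duplicate panel id {panel_id}")
            else:
                panel_ids.add(panel_id)
    return problems
```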

GitHub CI passed: core-ci-pass and rest-ci-pass.

Additional Notes

The three JSON payloads total about 60 KiB, well under the 1 MiB Kubernetes ConfigMap size limit. When enabled, the installer emits one release-scoped ConfigMap with the standard `grafana_dashboard: "1"` sidecar label.
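For context on the 60 KiB figure: Kubernetes limits the data stored in a single ConfigMap to 1 MiB, so the three dashboards use under 6% of that budget. A minimal Python sketch of such a size check, approximating the stored size as the UTF-8 bytes of all data keys and values (the helper names are hypothetical):

```python
# Kubernetes documents that the data stored in a ConfigMap cannot exceed 1 MiB.
CONFIGMAP_DATA_LIMIT_BYTES = 1024 * 1024


def configmap_data_size(data):
    """Approximate a ConfigMap's data size as the UTF-8 bytes of all keys and values."""
    return sum(
        len(key.encode("utf-8")) + len(value.encode("utf-8"))
        for key, value in data.items()
    )


def fits_in_configmap(data):
    """True if the approximate data size stays within the 1 MiB ConfigMap limit."""
    return configmap_data_size(data) <= CONFIGMAP_DATA_LIMIT_BYTES
```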

The full existing Helm unit suite was also run. It shows the same nine failures on this branch as on an unmodified origin/main: stale nico-api certificate SAN expectations that do not yet include the recently added carbide-api.forge SAN. This change does not touch those templates or tests.

Signed-off-by: Hasan Khan <hasank@nvidia.com>
osu requested a review from a team as a code owner June 28, 2026 01:30

coderabbitai (Bot, Contributor) commented Jun 28, 2026


Walkthrough

The PR packages three Grafana dashboards into Helm, adds values, helpers, a conditional ConfigMap template, tests, and sidecar/setup documentation. It also adds one new core metric documentation row.

Changes

Grafana Dashboard Helm Packaging

| Layer | File(s) | Summary |
| --- | --- | --- |
| Values schema and Helm template helpers | helm/values.yaml, helm/templates/_helpers.tpl | Adds the grafanaDashboards configuration block and helpers for chart naming, dashboards ConfigMap naming, and merged dashboard labels. |
| Conditional ConfigMap template | helm/templates/grafana-dashboards.yaml | Renders a ConfigMap only when dashboard provisioning is enabled, with labels, optional folder annotation, and all packaged dashboard JSON files. |
| Site overview dashboard | helm/dashboards/nico-overview.json | Adds the site overview dashboard with service health, capacity, and health-alert panels plus template variables. |
| Lifecycle and API dashboards | helm/dashboards/nico-lifecycle.json, helm/dashboards/nico-api-performance.json | Adds the object lifecycle and API performance dashboards with Prometheus-backed panels and dashboard variables. |
| Helm tests and dashboard documentation | helm/tests/grafana_dashboards_test.yaml, helm/tests/fixtures/grafana-custom-values.yaml, helm/README.md, helm/PREREQUISITES.md | Adds Helm tests, a fixture, and documentation for enabling dashboards and configuring sidecar prerequisites. |
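The description mentions metric-prefix selection but does not say how it is wired into the queries. One common approach, assumed here only for illustration, is a Grafana template variable referenced as `${prefix}` in each PromQL expression. The sketch below renders such a template for a chosen prefix with Python's `string.Template.safe_substitute`, which leaves other Grafana variables such as `$__rate_interval` untouched; the variable name `prefix` and the function `render_query` are hypothetical.

```python
import re
from string import Template

# Prometheus metric names must match [a-zA-Z_:][a-zA-Z0-9_:]*.
METRIC_NAME_RE = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")


def render_query(query_template, prefix):
    """Fill ${prefix} in a PromQL template, leaving other $variables untouched."""
    if not METRIC_NAME_RE.fullmatch(prefix):
        raise ValueError(f"invalid metric prefix: {prefix!r}")
    # safe_substitute keeps unknown placeholders such as $__rate_interval as-is.
    return Template(query_template).safe_substitute(prefix=prefix)
```

For example, `render_query("max(${prefix}_site_explorer_last_run_status)", "carbide")` returns `"max(carbide_site_explorer_last_run_status)"`; the same template would serve whatever alternative prefix is configured.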

Core Metrics Documentation Update

| Layer | File(s) | Summary |
| --- | --- | --- |
| Core metric row | docs/observability/core_metrics.md | Adds the carbide_site_explorer_last_run_status entry to the core metrics table. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Out of Scope Changes check | ⚠️ Warning | The docs/observability/core_metrics.md update adds an unrelated metric entry that is not part of the dashboard packaging scope. | Split the metric documentation update into a separate PR, or remove it from this change unless it is required for the dashboards. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title is concise and accurately summarizes the main change: packaging Grafana dashboards in Helm. |
| Linked Issues check | ✅ Passed | The PR packages the requested dashboards and adds Helm support to install them into Grafana as required by #1696. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Description check | ✅ Passed | The description accurately matches the dashboard packaging, Helm installer, documentation, and metric updates in the changeset. |


coderabbitai (Bot, Contributor) left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@helm/templates/_helpers.tpl`:
- Around lines 11-13: The `nico.grafanaDashboardsName` helper uses only
`.Release.Name`, so releases with the same name in different namespaces can
produce the same ConfigMap name and collide when their ConfigMaps target the
same dashboard namespace. Update this helper to incorporate the release
namespace into the generated name, or add a values-driven override for the
ConfigMap name, and keep the result truncated/trimmed so existing Helm upgrade
behavior remains stable.
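To illustrate the first finding: two releases both named `nico` in namespaces `site-a` and `site-b`, with the configurable dashboard namespace pointed at one shared namespace, would render identically named ConfigMaps there. The Python sketch below mirrors one possible fix: mix the release namespace into the name, honor an explicit override, then apply Helm's usual `trunc 63 | trimSuffix "-"` convention. The function name, its parameters, and the `grafana-dashboards` suffix are hypothetical, not taken from the chart.

```python
def dashboards_configmap_name(release, namespace=None, override=None):
    """Build a length-limited ConfigMap name, mirroring Helm's trunc 63 | trimSuffix "-"."""
    if override:
        name = override
    elif namespace:
        # Including the release namespace keeps same-named releases distinct
        # when their ConfigMaps land in one shared dashboard namespace.
        name = f"{release}-{namespace}-grafana-dashboards"
    else:
        name = f"{release}-grafana-dashboards"
    name = name[:63]
    # Helm's trimSuffix removes a single trailing "-", not a run of them.
    if name.endswith("-"):
        name = name[:-1]
    return name
```

Changing the generated name renames the ConfigMap on upgrade, so keeping the old format as the default and offering the override may be preferable where stable names matter.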

In `@helm/tests/fixtures/grafana-custom-values.yaml`:
- Around lines 6-8: The Grafana dashboard label override is rendered as a
null value instead of being removed, which breaks the expected manifest output.
Update the label handling in the Grafana values/template flow so the default
label is explicitly omitted or filtered out before rendering metadata.labels,
in the label-merge helper that processes the `grafana_dashboard` and
`sidecar.example.com/dashboard` labels.
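For the second finding, the intended semantics can be sketched as a merge in which an override of `null` (Python `None`) deletes the default label instead of emitting it with a null value. This is a Python illustration with a hypothetical `merge_labels` function; the chart's actual helper would express the same rule in Go template syntax, for example by skipping keys whose value is nil.

```python
def merge_labels(defaults, overrides):
    """Merge label maps; an override of None removes the default label entirely."""
    merged = dict(defaults)
    for key, value in (overrides or {}).items():
        if value is None:
            # Drop the label rather than rendering `key: null` in metadata.labels.
            merged.pop(key, None)
        else:
            # Kubernetes label values are strings.
            merged[key] = str(value)
    return merged
```

For example, overriding `{"grafana_dashboard": "1"}` with `{"grafana_dashboard": None, "sidecar.example.com/dashboard": "true"}` leaves only the custom sidecar label.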

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 1572a944-6787-43ab-9555-219f1f1a5bbc

📥 Commits

Reviewing files that changed from the base of the PR and between 87a5337 and 4238db1.

📒 Files selected for processing (10)
  • helm/PREREQUISITES.md
  • helm/README.md
  • helm/dashboards/nico-api-performance.json
  • helm/dashboards/nico-lifecycle.json
  • helm/dashboards/nico-overview.json
  • helm/templates/_helpers.tpl
  • helm/templates/grafana-dashboards.yaml
  • helm/tests/fixtures/grafana-custom-values.yaml
  • helm/tests/grafana_dashboards_test.yaml
  • helm/values.yaml

osu added 2 commits June 27, 2026 18:36
Signed-off-by: Hasan Khan <hasank@nvidia.com>
Signed-off-by: Hasan Khan <hasank@nvidia.com>
osu requested a review from a team as a code owner June 28, 2026 02:08

github-actions

🔍 Container Scan Summary

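| Service | Total | Critical | High | Medium | Low | Other |
| --- | ---: | ---: | ---: | ---: | ---: | ---: |
| boot-artifacts-aarch64 | 3 | 0 | 0 | 3 | 0 | 0 |
| boot-artifacts-x86_64 | 3 | 0 | 0 | 3 | 0 | 0 |
| forge-admin-cli-x86_64 | 285 | 6 | 25 | 103 | 7 | 144 |
| machine-validation-runner | 748 | 32 | 187 | 272 | 36 | 221 |
| machine_validation | 748 | 32 | 187 | 272 | 36 | 221 |
| machine_validation-aarch64 | 748 | 32 | 187 | 272 | 36 | 221 |
| nvmetal-carbide | 748 | 32 | 187 | 272 | 36 | 221 |
| **TOTAL** | 3283 | 134 | 773 | 1197 | 151 | 1028 |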
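The summary rows can be cross-checked mechanically: each service's severity buckets should add up to its Total, and each column should add up to the TOTAL row. The Python sketch below encodes the table exactly as reported; the helper names are hypothetical.

```python
# Rows copied from the container scan summary: (total, critical, high, medium, low, other).
ROWS = {
    "boot-artifacts-aarch64": (3, 0, 0, 3, 0, 0),
    "boot-artifacts-x86_64": (3, 0, 0, 3, 0, 0),
    "forge-admin-cli-x86_64": (285, 6, 25, 103, 7, 144),
    "machine-validation-runner": (748, 32, 187, 272, 36, 221),
    "machine_validation": (748, 32, 187, 272, 36, 221),
    "machine_validation-aarch64": (748, 32, 187, 272, 36, 221),
    "nvmetal-carbide": (748, 32, 187, 272, 36, 221),
}
REPORTED_TOTAL = (3283, 134, 773, 1197, 151, 1028)


def row_is_consistent(row):
    """A row is consistent when its severity buckets add up to its Total."""
    return row[0] == sum(row[1:])


def column_totals(rows):
    """Sum each column across all service rows."""
    return tuple(sum(column) for column in zip(*rows))
```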

Per-CVE detail lives in the per-service grype-* artifacts (JSON + SARIF). Severity counts only — no CVE IDs published here.


Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Dashboards should be included in repository and installed via Helm charts

1 participant