
feat: Add skeleton for Operational Metrics Dashboard #95

Merged
0xaboomar merged 2 commits into main from feat/metrics-dashboard-skeleton on Jun 6, 2026


0xaboomar (Member) commented Jun 6, 2026

This PR adds a skeleton for the new operational metrics dashboard in Grafana, structuring its panels and sections according to the "DR: Operational Dashboard Structure for Watchdog Metrics" design. The dashboard provides basic queries for every available metric and supports filtering by server and agency; the final choice of chart visualizations and query optimizations is left to future contributors.

Summary by CodeRabbit

  • New Features
    • Introduced a new Watchdog Metrics Dashboard for comprehensive system monitoring across multiple domains including availability, GTFS feed health, realtime feed health, matching quality, vehicle anomalies, and deep diagnostics. The dashboard supports dynamic filtering by server and agency, allowing operators to analyze metrics across different scopes and efficiently identify issues.

coderabbitai Bot commented Jun 6, 2026


Warning

Review limit reached

@0xaboomar, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 8 minutes and 2 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may take longer to make the next review available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 666d6c7a-b1c6-4c36-9666-80400a7e23ef

📥 Commits

Reviewing files that changed from the base of the PR and between f8fc2ec and d95bf93.

📒 Files selected for processing (1)
  • grafana/dashboards/watchdog_metrics_dashboard.json
📝 Walkthrough


This PR introduces a new Grafana dashboard configuration file for monitoring Watchdog system metrics. The dashboard includes six metric sections (System Availability, Static GTFS Health, Realtime Feed Health, Matching Quality, Vehicle Anomalies, Deep Diagnostics) with Prometheus-backed time series panels, template variables for server and agency filtering, and default time range settings.

Changes

Watchdog Metrics Dashboard

| Layer / File(s) | Summary |
|---|---|
| Complete Dashboard Definition with Panels and Variables (`grafana/dashboards/watchdog_metrics_dashboard.json`) | Grafana dashboard JSON defining metadata, six row sections, Prometheus time series panels with PromQL targets, template variables for `server_id` and `agency_id` filtering, and a default time range (`now-6h` to `now`). |
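For readers new to Grafana's dashboard JSON model, the following Python sketch assembles a dashboard with the same overall shape the walkthrough describes: six row panels, a Prometheus-backed time series panel with a PromQL target, a `templating` section, and a `now-6h` to `now` default range. It is not the contents of `watchdog_metrics_dashboard.json`. The datasource UID `prometheus`, the panel title "API Status", placing `oba_api_status` under System Availability, and the grid layout are all hypothetical choices made for illustration.

```python
import json

# Hypothetical datasource reference; point "uid" at your own Prometheus data source.
PROM_DS = {"type": "prometheus", "uid": "prometheus"}

# The six sections listed in the walkthrough, in order.
SECTIONS = [
    "System Availability",
    "Static GTFS Health",
    "Realtime Feed Health",
    "Matching Quality",
    "Vehicle Anomalies",
    "Deep Diagnostics",
]


def row_panel(panel_id: int, title: str, y: int) -> dict:
    """A full-width, expanded row header one grid unit tall."""
    return {
        "id": panel_id,
        "type": "row",
        "title": title,
        "collapsed": False,
        "gridPos": {"h": 1, "w": 24, "x": 0, "y": y},
        "panels": [],  # expanded rows keep their panels at the top level
    }


def timeseries_panel(panel_id: int, title: str, expr: str, y: int) -> dict:
    """A half-width time series panel with a single PromQL target."""
    return {
        "id": panel_id,
        "type": "timeseries",
        "title": title,
        "datasource": PROM_DS,
        "gridPos": {"h": 8, "w": 12, "x": 0, "y": y},
        "targets": [{"datasource": PROM_DS, "expr": expr, "refId": "A"}],
    }


def build_skeleton() -> dict:
    panels, y = [], 0
    for section in SECTIONS:
        panels.append(row_panel(len(panels) + 1, section, y))
        y += 1
        if section == "System Availability":
            # Example panel only (hypothetical title); the metric is named in the review.
            panels.append(timeseries_panel(
                len(panels) + 1, "API Status",
                'oba_api_status{server_id=~"$server_id"}', y))
            y += 8
    return {
        "title": "Watchdog Metrics Dashboard",
        "panels": panels,
        "templating": {"list": []},  # server_id / agency_id variables would go here
        "time": {"from": "now-6h", "to": "now"},
    }


if __name__ == "__main__":
    print(json.dumps(build_skeleton(), indent=2))
```

Generating the JSON from code is only one option; a hand-edited JSON file, as in this PR, is equally valid and easier to review in a diff.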

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title 'feat: Add skeleton for Operational Metrics Dashboard' accurately reflects the main change: adding a new Grafana dashboard definition file with a structured skeleton for operational metrics monitoring. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


coveralls commented Jun 6, 2026

coverage: 50.185% (+0.6%) from 49.604%, feat/metrics-dashboard-skeleton into main

coderabbitai Bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
grafana/dashboards/watchdog_metrics_dashboard.json (1)

583-590: Scope Grafana template variable queries to reduce cardinality.

watchdog_metrics_dashboard.json defines server_id as label_values(server_id) and agency_id as label_values(agency_id). The panels filter on these variables (e.g., oba_api_status{server_id=~"$server_id"} and vehicle_count_api{server_id=~"$server_id", agency_id=~"$agency_id"}), but the unscoped queries return every value of each label from any series in the data source, not only from the metrics the panels use. Scoping the variable queries to those metrics, and agency_id additionally to the selected server, will remove irrelevant values and speed up refreshes.

Suggested query refinement
-        "definition": "label_values(server_id)",
+        "definition": "label_values(oba_api_status, server_id)",
...
-          "query": "label_values(server_id)",
+          "query": "label_values(oba_api_status, server_id)",
...
-        "definition": "label_values(agency_id)",
+        "definition": "label_values(vehicle_count_api{server_id=~\"$server_id\"}, agency_id)",
...
-          "query": "label_values(agency_id)",
+          "query": "label_values(vehicle_count_api{server_id=~\"$server_id\"}, agency_id)",
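The refinement above can be sketched in Python to show how the scoped queries are built and why the diff contains `\"` sequences: the PromQL selector's double quotes must be escaped once the query is stored as a JSON string. The helper names, the datasource UID, the nested `query` object (its keys vary across Grafana versions, and the `refId` here is hypothetical), and the `multi`/`includeAll`/`allValue` settings are assumptions, not taken from the PR.

```python
import json

# Hypothetical datasource reference; replace "uid" with your Prometheus data source UID.
PROM_DS = {"type": "prometheus", "uid": "prometheus"}


def label_values(label: str, metric: str, matcher: str = "") -> str:
    """Build a Grafana label_values() variable query restricted to one metric."""
    series = f"{metric}{{{matcher}}}" if matcher else metric
    return f"label_values({series}, {label})"


def query_variable(name: str, query: str) -> dict:
    """A multi-select query variable; the field choices here are assumptions."""
    return {
        "name": name,
        "type": "query",
        "datasource": PROM_DS,
        "definition": query,
        # Nested query object mirroring the diff's indentation; refId is hypothetical.
        "query": {"query": query, "refId": f"{name}-query"},
        "refresh": 2,       # re-run the query when the time range changes
        "multi": True,
        "includeAll": True,
        "allValue": ".*",   # "All" becomes a match-everything regex for =~ matchers
    }


server_q = label_values("server_id", "oba_api_status")
# agency_id is chained to server_id, so it lists only agencies for the selected server(s).
agency_q = label_values("agency_id", "vehicle_count_api", 'server_id=~"$server_id"')

templating = {"list": [query_variable("server_id", server_q),
                       query_variable("agency_id", agency_q)]}

if __name__ == "__main__":
    # json.dumps escapes the inner quotes, producing the \" sequences seen in the diff.
    print(json.dumps(templating, indent=2))
```

Setting `allValue` to `.*` keeps the "All" selection from expanding into a long alternation regex, but it also matches series that lack the label entirely; whether that trade-off suits this dashboard is a judgment call.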
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@grafana/dashboards/watchdog_metrics_dashboard.json` around lines 583 - 590,
The dashboard's template variables "server_id" and "agency_id" use unconstrained
label_values(...) which returns high-cardinality, irrelevant labels; change
their queries to scope to the metrics actually used by panels—e.g., set the
server_id variable's definition/query to label_values(oba_api_status, server_id)
or another primary metric that filters servers, and set agency_id to
label_values(vehicle_count_api{server_id=~"$server_id"}, agency_id) (or similar
metric-scoped queries) so the variable results are filtered by relevant metrics
and selected server, reducing cardinality and speeding refreshes; update the
"definition" and "query" fields for the variables in the JSON accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@grafana/dashboards/watchdog_metrics_dashboard.json`:
- Around line 234-241: The panel mixes scoped and unscoped series: the first
query uses vehicle_position_report_interval_seconds{server_id=~"$server_id"} but
the second uses oba_time_since_last_update_seconds with no server_id filter,
causing inconsistent Server selection; update the
oba_time_since_last_update_seconds expression to include the same server_id
selector (e.g., oba_time_since_last_update_seconds{server_id=~"$server_id"}) so
all series in the "3.2 Feed Freshness" panel are consistently filtered by the
Server variable.
- Line 77: The dashboard is plotting raw cumulative Prometheus series; replace
the direct uses of http_outgoing_request_duration_seconds_sum and
oba_realtime_records_total with proper rate-based expressions: for the histogram
use the averaged rate formula using both sum and count (e.g.
rate(http_outgoing_request_duration_seconds_sum[5m]) divided by
rate(http_outgoing_request_duration_seconds_count[5m]) and optionally wrap with
sum(...) if the panel should aggregate across labels) and for the counter use
rate(oba_realtime_records_total[5m]) (optionally wrapped with sum(...) for
aggregation); update the panel expressions that reference
http_outgoing_request_duration_seconds_sum and oba_realtime_records_total
accordingly.
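The two inline comments reduce to small string-level changes in the panel expressions plus one piece of counter arithmetic. The Python sketch below assumes a 5-minute window and `sum()` aggregation (both optional per the comments) and leaves the counter and histogram metrics unscoped, because the review does not say whether they carry a `server_id` label. The helper names `scoped`, `counter_rate`, `histogram_mean`, and `increase` are hypothetical, `increase` is a simplified stand-in for PromQL's `increase()` that skips boundary extrapolation, and the sample values are made up.

```python
# Dashboard-wide Server filter, as used by the other panels.
SERVER_MATCHER = 'server_id=~"$server_id"'


def scoped(metric: str, matcher: str = SERVER_MATCHER) -> str:
    """Selector with the Server filter applied, e.g. metric{server_id=~"$server_id"}."""
    return f"{metric}{{{matcher}}}"


def counter_rate(counter: str, window: str = "5m") -> str:
    """Per-second rate of a counter, summed across its series."""
    return f"sum(rate({counter}[{window}]))"


def histogram_mean(base: str, window: str = "5m") -> str:
    """Mean observed value over the window: rate of _sum divided by rate of _count."""
    return f"{counter_rate(base + '_sum', window)} / {counter_rate(base + '_count', window)}"


def increase(samples: list[float]) -> float:
    """Simplified increase(): sum of deltas, treating any drop as a counter reset.

    Unlike Prometheus, this does not extrapolate to the window boundaries.
    """
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        total += cur - prev if cur >= prev else cur
    return total


# Corrected expressions for the two comments.
freshness = scoped("oba_time_since_last_update_seconds")
records_per_second = counter_rate("oba_realtime_records_total")
mean_latency = histogram_mean("http_outgoing_request_duration_seconds")

# Hypothetical cumulative samples within one window. The raw _sum only ever grows,
# so plotting it says nothing about current latency; the ratio of increases does.
duration_sum = [120.0, 135.0, 150.0]   # seconds spent in requests
duration_count = [1000, 1100, 1200]    # requests completed
mean_seconds = increase(duration_sum) / increase(duration_count)  # 30 s / 200 = 0.15 s

records = [10000, 11000, 12000, 13000]  # cumulative records over a 300 s window
per_second = increase(records) / 300    # 3000 / 300 = 10.0 records/s
```

Dividing the summed rates, rather than averaging per-series ratios, weights each series by its request volume, which is usually what a dashboard-wide average latency should show.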


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: afde77a2-30ce-4801-a79b-160131e8101c

📥 Commits

Reviewing files that changed from the base of the PR and between beaad5c and f8fc2ec.

📒 Files selected for processing (1)
  • grafana/dashboards/watchdog_metrics_dashboard.json

Comment thread grafana/dashboards/watchdog_metrics_dashboard.json
Comment thread grafana/dashboards/watchdog_metrics_dashboard.json Outdated
0xaboomar merged commit c0e3251 into main on Jun 6, 2026
4 checks passed

Labels

None yet

Projects

None yet


2 participants