feat: Add skeleton for Operational Metrics Dashboard by 0xaboomar · Pull Request #95 · OneBusAway/watchdog

0xaboomar · 2026-06-06T02:26:31Z

This PR adds a skeleton for the new operational metrics dashboard in Grafana, with panels and sections structured according to the "DR: Operational Dashboard Structure for Watchdog Metrics" design. The dashboard provides basic queries mapped to all available metrics and supports filtering by server and agency. The final choice of chart visualizations and query optimizations is left to future contributors.

Summary by CodeRabbit

New Features
- Introduced a new Watchdog Metrics Dashboard for comprehensive system monitoring across multiple domains including availability, GTFS feed health, realtime feed health, matching quality, vehicle anomalies, and deep diagnostics. The dashboard supports dynamic filtering by server and agency, allowing operators to analyze metrics across different scopes and efficiently identify issues.

coderabbitai · 2026-06-06T02:26:42Z

Warning

Review limit reached

@0xaboomar, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 8 minutes and 2 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 666d6c7a-b1c6-4c36-9666-80400a7e23ef

📥 Commits

Reviewing files that changed from the base of the PR and between f8fc2ec and d95bf93.

📒 Files selected for processing (1)

grafana/dashboards/watchdog_metrics_dashboard.json

📝 Walkthrough

This PR introduces a new Grafana dashboard configuration file for monitoring Watchdog system metrics. The dashboard includes six metric sections (System Availability, Static GTFS Health, Realtime Feed Health, Matching Quality, Vehicle Anomalies, Deep Diagnostics) with Prometheus-backed time series panels, template variables for server and agency filtering, and default time range settings.

Changes

Watchdog Metrics Dashboard

| Layer / File(s) | Summary |
| --- | --- |
| Complete Dashboard Definition with Panels and Variables `grafana/dashboards/watchdog_metrics_dashboard.json` | Grafana dashboard JSON defining metadata, six row sections, Prometheus time series panels with PromQL targets, template variables for `server_id` and `agency_id` filtering, and default time range (`now-6h` to `now`). |
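
To make the shape described in the summary concrete, here is a minimal sketch of such a skeleton built in Python. It is not the file from this PR: the section titles, variable names, and time range come from the walkthrough, while the `build_skeleton` helper, the panel ids, and the `gridPos` values are assumptions chosen for illustration.

```python
import json

# Section titles are taken from the walkthrough; everything else is illustrative.
SECTIONS = [
    "System Availability",
    "Static GTFS Health",
    "Realtime Feed Health",
    "Matching Quality",
    "Vehicle Anomalies",
    "Deep Diagnostics",
]


def build_skeleton(title: str = "Watchdog Metrics Dashboard") -> dict:
    """Build a bare dashboard dict: one row per section, two template variables."""
    # Hypothetical layout: each row is a full-width, one-unit-high panel.
    panels = [
        {
            "id": index + 1,
            "type": "row",
            "title": section,
            "gridPos": {"h": 1, "w": 24, "x": 0, "y": index},
        }
        for index, section in enumerate(SECTIONS)
    ]
    # Unscoped label_values(...) queries, as the skeleton is described in the review.
    variables = [
        {
            "name": name,
            "type": "query",
            "query": f"label_values({name})",
            "includeAll": True,
            "multi": True,
        }
        for name in ("server_id", "agency_id")
    ]
    return {
        "title": title,
        "panels": panels,
        "templating": {"list": variables},
        "time": {"from": "now-6h", "to": "now"},
    }


dashboard = build_skeleton()
print(json.dumps(dashboard["time"]))
```

Time series panels with PromQL targets would then be appended under each row; they are omitted here because the actual panel list is not shown in this conversation.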

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title 'feat: Add skeleton for Operational Metrics Dashboard' accurately reflects the main change: adding a new Grafana dashboard definition file with a structured skeleton for operational metrics monitoring. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


coveralls · 2026-06-06T02:28:26Z

coverage: 50.185% (+0.6%) from 49.604% — feat/metrics-dashboard-skeleton into main

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (1)

grafana/dashboards/watchdog_metrics_dashboard.json (1)

583-590: Scope Grafana template variable queries to reduce cardinality.

watchdog_metrics_dashboard.json defines server_id as label_values(server_id) and agency_id as label_values(agency_id), while panels already filter using server_id=~"$server_id" (e.g., oba_api_status{server_id=~"$server_id"} and vehicle_count_api{server_id=~"$server_id", agency_id=~"$agency_id"}). Scoping the variable queries to those metrics/selected server will reduce irrelevant values and speed refreshes.

Suggested query refinement

-        "definition": "label_values(server_id)",
+        "definition": "label_values(oba_api_status, server_id)",
...
-          "query": "label_values(server_id)",
+          "query": "label_values(oba_api_status, server_id)",
...
-        "definition": "label_values(agency_id)",
+        "definition": "label_values(vehicle_count_api{server_id=~\"$server_id\"}, agency_id)",
...
-          "query": "label_values(agency_id)",
+          "query": "label_values(vehicle_count_api{server_id=~\"$server_id\"}, agency_id)",
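
The same refinement could be applied programmatically to a loaded dashboard. The following is a rough sketch, assuming the variables live under `templating.list` and that `query` may be either a plain string or a nested object with its own `query` key (the indentation in the diff suggests the latter). The `scope_template_variables` helper and the sample dashboard are hypothetical.

```python
import copy

# Scoped queries copied from the suggested refinement above.
SCOPED_QUERIES = {
    "server_id": "label_values(oba_api_status, server_id)",
    "agency_id": 'label_values(vehicle_count_api{server_id=~"$server_id"}, agency_id)',
}


def scope_template_variables(dashboard: dict) -> dict:
    """Return a copy of the dashboard with scoped server_id/agency_id queries."""
    updated = copy.deepcopy(dashboard)
    for variable in updated.get("templating", {}).get("list", []):
        scoped = SCOPED_QUERIES.get(variable.get("name"))
        if scoped is None:
            continue  # leave unrelated variables untouched
        variable["definition"] = scoped
        query = variable.get("query")
        if isinstance(query, dict):
            query["query"] = scoped  # nested form: {"query": "..."}
        else:
            variable["query"] = scoped  # plain string form
    return updated


# Hypothetical input that mirrors the unscoped definitions flagged in the review.
sample = {
    "templating": {
        "list": [
            {
                "name": "server_id",
                "definition": "label_values(server_id)",
                "query": {"query": "label_values(server_id)"},
            },
            {
                "name": "agency_id",
                "definition": "label_values(agency_id)",
                "query": "label_values(agency_id)",
            },
        ]
    }
}

scoped_dashboard = scope_template_variables(sample)
print(scoped_dashboard["templating"]["list"][1]["query"])
```

Working on a deep copy keeps the original dict intact, which makes it easy to diff the before and after states prior to writing the JSON back to disk.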

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@grafana/dashboards/watchdog_metrics_dashboard.json` around lines 583 - 590,
The dashboard's template variables "server_id" and "agency_id" use unconstrained
label_values(...) which returns high-cardinality, irrelevant labels; change
their queries to scope to the metrics actually used by panels—e.g., set the
server_id variable's definition/query to label_values(oba_api_status, server_id)
or another primary metric that filters servers, and set agency_id to
label_values(vehicle_count_api{server_id=~"$server_id"}, agency_id) (or similar
metric-scoped queries) so the variable results are filtered by relevant metrics
and selected server, reducing cardinality and speeding refreshes; update the
"definition" and "query" fields for the variables in the JSON accordingly.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@grafana/dashboards/watchdog_metrics_dashboard.json`:
- Around line 234-241: The panel mixes scoped and unscoped series: the first
query uses vehicle_position_report_interval_seconds{server_id=~"$server_id"} but
the second uses oba_time_since_last_update_seconds with no server_id filter,
causing inconsistent Server selection; update the
oba_time_since_last_update_seconds expression to include the same server_id
selector (e.g., oba_time_since_last_update_seconds{server_id=~"$server_id"}) so
all series in the "3.2 Feed Freshness" panel are consistently filtered by the
Server variable.
- Line 77: The dashboard is plotting raw cumulative Prometheus series; replace
the direct uses of http_outgoing_request_duration_seconds_sum and
oba_realtime_records_total with proper rate-based expressions: for the histogram
use the averaged rate formula using both sum and count (e.g.
rate(http_outgoing_request_duration_seconds_sum[5m]) divided by
rate(http_outgoing_request_duration_seconds_count[5m]) and optionally wrap with
sum(...) if the panel should aggregate across labels) and for the counter use
rate(oba_realtime_records_total[5m]) (optionally wrapped with sum(...) for
aggregation); update the panel expressions that reference
http_outgoing_request_duration_seconds_sum and oba_realtime_records_total
accordingly.
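
As an illustration of why the line 77 finding asks for rate-based expressions, the sketch below compares a cumulative average with a windowed one using two made-up samples of a sum/count pair. The `windowed_average` helper and the sample values are assumptions; the PromQL strings restate the expressions named in the comment. Prometheus `rate()` also handles counter resets and extrapolation, which this simplified arithmetic does not.

```python
from typing import Optional

# PromQL expressions as suggested in the review comment.
AVG_LATENCY_EXPR = (
    "rate(http_outgoing_request_duration_seconds_sum[5m])"
    " / rate(http_outgoing_request_duration_seconds_count[5m])"
)
RECORDS_RATE_EXPR = "rate(oba_realtime_records_total[5m])"


def windowed_average(
    sum_start: float, sum_end: float, count_start: int, count_end: int
) -> Optional[float]:
    """Average observation within a window, from two cumulative samples."""
    delta_count = count_end - count_start
    if delta_count <= 0:
        return None  # no new observations in the window, or a counter reset
    return (sum_end - sum_start) / delta_count


# Hypothetical samples: 1000 requests totalling 250 s before the window,
# 1100 requests totalling 280 s after it.
recent_avg = windowed_average(250.0, 280.0, 1000, 1100)
lifetime_avg = 280.0 / 1100

print(f"windowed average: {recent_avg:.3f} s")
print(f"lifetime average: {lifetime_avg:.3f} s")
```

The windowed figure reflects only the 100 requests made inside the window, whereas plotting the raw `_sum` series shows an ever-growing total that says little about current latency.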

---

Nitpick comments:
In `@grafana/dashboards/watchdog_metrics_dashboard.json`:
- Around line 583-590: The dashboard's template variables "server_id" and
"agency_id" use unconstrained label_values(...) which returns high-cardinality,
irrelevant labels; change their queries to scope to the metrics actually used by
panels—e.g., set the server_id variable's definition/query to
label_values(oba_api_status, server_id) or another primary metric that filters
servers, and set agency_id to
label_values(vehicle_count_api{server_id=~"$server_id"}, agency_id) (or similar
metric-scoped queries) so the variable results are filtered by relevant metrics
and selected server, reducing cardinality and speeding refreshes; update the
"definition" and "query" fields for the variables in the JSON accordingly.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: afde77a2-30ce-4801-a79b-160131e8101c

📥 Commits

Reviewing files that changed from the base of the PR and between beaad5c and f8fc2ec.

📒 Files selected for processing (1)

grafana/dashboards/watchdog_metrics_dashboard.json

feat(grafana): add operational metrics dashboard skeleton

f8fc2ec

coderabbitai Bot reviewed Jun 6, 2026

Comment thread grafana/dashboards/watchdog_metrics_dashboard.json

Comment thread grafana/dashboards/watchdog_metrics_dashboard.json Outdated

fix(grafana): correct server/agency label filtering on OBA metrics

d95bf93

0xaboomar merged commit c0e3251 into main Jun 6, 2026
4 checks passed

0xaboomar mentioned this pull request Jun 6, 2026

Expand Grafana Dashboards for Watchdog Metrics #91

Open

4 tasks

0xaboomar merged 2 commits into main from feat/metrics-dashboard-skeleton
