feat: Add skeleton for Operational Metrics Dashboard #95
📝 Walkthrough
This PR introduces a new Grafana dashboard configuration file for monitoring Watchdog system metrics. The dashboard includes six metric sections (System Availability, Static GTFS Health, Realtime Feed Health, Matching Quality, Vehicle Anomalies, Deep Diagnostics) with Prometheus-backed time series panels, template variables for server and agency filtering, and default time range settings.
Changes: Watchdog Metrics Dashboard
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
🚥 Pre-merge checks: ✅ 5 passed
Actionable comments posted: 2
🧹 Nitpick comments (1)
grafana/dashboards/watchdog_metrics_dashboard.json (1)
583-590: Scope Grafana template variable queries to reduce cardinality.
`watchdog_metrics_dashboard.json` defines `server_id` as `label_values(server_id)` and `agency_id` as `label_values(agency_id)`, while panels already filter using `server_id=~"$server_id"` (e.g., `oba_api_status{server_id=~"$server_id"}` and `vehicle_count_api{server_id=~"$server_id", agency_id=~"$agency_id"}`). Scoping the variable queries to those metrics and to the selected server will drop irrelevant values and speed up refreshes.

Suggested query refinement:

    - "definition": "label_values(server_id)",
    + "definition": "label_values(oba_api_status, server_id)",
    ...
    - "query": "label_values(server_id)",
    + "query": "label_values(oba_api_status, server_id)",
    ...
    - "definition": "label_values(agency_id)",
    + "definition": "label_values(vehicle_count_api{server_id=~\"$server_id\"}, agency_id)",
    ...
    - "query": "label_values(agency_id)",
    + "query": "label_values(vehicle_count_api{server_id=~\"$server_id\"}, agency_id)",
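To see why scoping helps, it may be useful to model the two `label_values` forms. The one-argument form returns every value of a label from any series, while the `label_values(metric, label)` form only looks at series of that metric (and its selector). The sketch below is a rough in-memory model under stated assumptions: the `SERIES` data and the `label_values` helper are hypothetical, and nothing here calls Grafana or Prometheus.

```python
import re
from typing import Dict, List, Optional

# Hypothetical label sets; "__name__" holds the metric name, as in Prometheus.
SERIES: List[Dict[str, str]] = [
    {"__name__": "oba_api_status", "server_id": "s1"},
    {"__name__": "oba_api_status", "server_id": "s2"},
    {"__name__": "vehicle_count_api", "server_id": "s1", "agency_id": "a1"},
    {"__name__": "vehicle_count_api", "server_id": "s2", "agency_id": "a2"},
    # A series from some unrelated job that happens to reuse the same label names.
    {"__name__": "unrelated_metric", "server_id": "legacy-x", "agency_id": "old-agency"},
]


def label_values(label: str, metric: Optional[str] = None,
                 matchers: Optional[Dict[str, str]] = None) -> List[str]:
    """Rough model of Grafana's label_values(): distinct values of `label`,
    optionally restricted to one metric and to regex matchers.

    Matchers are fully anchored, like PromQL's =~; a missing label is
    treated as the empty string, as Prometheus does.
    """
    matchers = matchers or {}
    found = set()
    for series in SERIES:
        if label not in series:
            continue
        if metric is not None and series["__name__"] != metric:
            continue
        if all(re.fullmatch(rx, series.get(name, "")) for name, rx in matchers.items()):
            found.add(series[label])
    return sorted(found)


# label_values(server_id): every server_id from any series.
print(label_values("server_id"))  # -> ['legacy-x', 's1', 's2']
# label_values(oba_api_status, server_id)
print(label_values("server_id", "oba_api_status"))  # -> ['s1', 's2']
# label_values(vehicle_count_api{server_id=~"$server_id"}, agency_id) with $server_id = s1
print(label_values("agency_id", "vehicle_count_api", {"server_id": "s1"}))  # -> ['a1']
```

The unscoped form picks up `legacy-x` from a series the dashboard never queries. The scoped forms only offer values that the panels can actually match.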
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@grafana/dashboards/watchdog_metrics_dashboard.json`:
- Around lines 234-241: The panel mixes scoped and unscoped series: the first
query uses vehicle_position_report_interval_seconds{server_id=~"$server_id"} but
the second uses oba_time_since_last_update_seconds with no server_id filter, so
that series ignores the Server selection. Update the
oba_time_since_last_update_seconds expression to include the same server_id
selector (e.g., oba_time_since_last_update_seconds{server_id=~"$server_id"}) so
all series in the "3.2 Feed Freshness" panel are consistently filtered by the
Server variable.
- Line 77: The dashboard plots raw cumulative Prometheus series; replace the
direct uses of http_outgoing_request_duration_seconds_sum and
oba_realtime_records_total with rate-based expressions. For the histogram,
compute the average duration from both sum and count, e.g.
rate(http_outgoing_request_duration_seconds_sum[5m]) /
rate(http_outgoing_request_duration_seconds_count[5m]); if the panel should
aggregate across labels, sum the numerator and denominator separately before
dividing, e.g. sum(rate(http_outgoing_request_duration_seconds_sum[5m])) /
sum(rate(http_outgoing_request_duration_seconds_count[5m])). For the counter,
use rate(oba_realtime_records_total[5m]) (optionally wrapped in sum(...) for
aggregation). Update the panel expressions that reference
http_outgoing_request_duration_seconds_sum and oba_realtime_records_total
accordingly.
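The arithmetic behind the rate-based expressions can be sketched with two samples per series. The sketch assumes a simplified `rate()`. PromQL's real `rate()` also handles counter resets and extrapolates to the window boundaries, and this version does neither. The `Sample`, `simple_rate`, and `mean_duration` names and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    t: float      # timestamp in seconds
    value: float  # cumulative counter value at time t


def simple_rate(first: Sample, last: Sample) -> float:
    """Per-second increase of a counter between two samples.

    Simplified: PromQL's rate() additionally handles counter resets and
    extrapolates to the window boundaries; this sketch does neither.
    """
    return (last.value - first.value) / (last.t - first.t)


def mean_duration(sum_first: Sample, sum_last: Sample,
                  count_first: Sample, count_last: Sample) -> float:
    """rate(x_sum) / rate(x_count): average seconds per request in the window.

    Both rates share the same window, so the window length cancels and the
    result is delta(sum) / delta(count).
    """
    return simple_rate(sum_first, sum_last) / simple_rate(count_first, count_last)


# Hypothetical samples taken 5 minutes (300 s) apart.
dur_sum = (Sample(0, 100.0), Sample(300, 112.0))  # 12 s of request time accrued
dur_count = (Sample(0, 1000), Sample(300, 1060))  # 60 requests completed
records = (Sample(0, 5000), Sample(300, 6500))    # 1500 records ingested

avg = mean_duration(*dur_sum, *dur_count)
print(f"mean request duration: {avg:.3f} s")                # -> mean request duration: 0.200 s
print(f"records per second: {simple_rate(*records):.1f}")  # -> records per second: 5.0
```

The raw `_sum` value (112.0 here) only ever grows. Plotting it directly therefore says nothing about current latency. Only its change over the window, divided by the change in `_count`, does.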
---
Nitpick comments:
In `@grafana/dashboards/watchdog_metrics_dashboard.json`:
- Around line 583-590: The dashboard's template variables "server_id" and
"agency_id" use unconstrained label_values(...) which returns high-cardinality,
irrelevant labels; change their queries to scope to the metrics actually used by
panels—e.g., set the server_id variable's definition/query to
label_values(oba_api_status, server_id) or another primary metric that filters
servers, and set agency_id to
label_values(vehicle_count_api{server_id=~"$server_id"}, agency_id) (or similar
metric-scoped queries) so the variable results are filtered by relevant metrics
and selected server, reducing cardinality and speeding refreshes; update the
"definition" and "query" fields for the variables in the JSON accordingly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: afde77a2-30ce-4801-a79b-160131e8101c
📒 Files selected for processing (1)
grafana/dashboards/watchdog_metrics_dashboard.json
This PR adds a skeleton for the new operational metrics dashboard in Grafana, structuring its panels and sections according to the "DR: Operational Dashboard Structure for Watchdog Metrics" design. The dashboard provides basic queries covering all available metrics and supports filtering by server and agency. The final choice of chart visualizations and query optimizations is left to future contributors.
Summary by CodeRabbit