Skip to content

fix(config): validate OBA server configs and drop invalid entries#112

Open
aaronbrethorst wants to merge 3 commits into
mainfrom
prod-errors
Open

fix(config): validate OBA server configs and drop invalid entries#112
aaronbrethorst wants to merge 3 commits into
mainfrom
prod-errors

Conversation

@aaronbrethorst

@aaronbrethorst aaronbrethorst commented Jun 15, 2026

Copy link
Copy Markdown
Member

Summary

  • Fixes the production failure where the service-discovery file had null feed URLs (gtfs_url, vehicle_position_url, agency_id). Go unmarshals JSON null into a string field as "", which produced cryptic Get "": unsupported protocol scheme "" errors on every fetch cycle for the affected servers (33, 41, 43, 36, 19).
  • Adds ValidateServer (required: id, name, oba_base_url, oba_api_key, gtfs_url, vehicle_position_url, agency_id; trip_update_url and the GTFS-RT auth pair are optional) and filterValidServers, which drops invalid servers and reports each to Sentry — one bad entry can't block monitoring of the rest of the fleet. Wired into both loadConfigFromFile and loadConfigFromURL, so it covers startup and the periodic remote-config refresh.
  • Initializes Sentry before loading config in main.go. Previously config load/validation ran ~30 lines before SetupSentry(), so startup-time error reports were captured by an uninitialized hub and silently lost — exactly the partial-misconfig case this fix targets.
  • Also includes the earlier CLAUDE.md commit already on this branch.

Test plan

  • go test ./... — full suite green
  • New unit tests: ValidateServer (valid, each required field, whitespace-only, all-missing production mirror), filterValidServers (interleaved valid/invalid ordering, all-invalid, empty), and filtering coverage for both the file and URL loaders
  • Ran the binary against a config mirroring the prod failure (one all-null server + one well-formed): the broken server is filtered out with no unsupported protocol scheme error; the well-formed server flows through normally
  • Ran with an all-invalid config: clean exit with No servers found in configuration.
  • go vet ./... clean, gofmt clean, binary builds with CGO_ENABLED=0

Review notes (non-blocking)

  • Drop severity / aggregate signal: each dropped server is reported at sentry.LevelError (consistent with the loaders' existing reports). A reviewer may prefer LevelWarning for routine single-entry drops, plus a single summary signal that conveys the ratio dropped (e.g. "45 of 50") so a near-total config collapse is distinguishable from one bad entry. Left as-is to match current package conventions.
  • Log visibility: the loaders have no *slog.Logger, so drops are reported only to Sentry, not the structured logs. Surfacing a "loaded N of M servers, dropped K" summary via the logger in main.go/refreshConfig (which do hold loggers) would require threading a drop count back out of the loaders — deferred to keep this PR scoped.
  • The real upstream fix is whatever generates the service-discovery file: it should emit real feed URLs instead of null. This PR makes watchdog resilient to that input but doesn't fix the generator.

Summary by CodeRabbit

  • Bug Fixes
    • Improved server configuration validation to filter out invalid server entries before they’re used, reducing runtime misconfiguration issues and improving monitoring of bad configs.
    • Enhanced startup monitoring by initializing error reporting earlier to better capture failures.
  • Documentation
    • Added repository guidance for running common development, testing, and CI workflows.
  • Tests
    • Added unit tests covering server validation, config loading/filtering, and refreshed configuration payload handling.

A production service-discovery file with null feed URLs (gtfs_url,
vehicle_position_url, agency_id) unmarshalled to empty strings, producing
endless cryptic `Get "": unsupported protocol scheme ""` errors on every
fetch cycle.

- Add ValidateServer to check required fields (id, name, oba_base_url,
  oba_api_key, gtfs_url, vehicle_position_url, agency_id); trip_update_url
  and the GTFS-RT auth pair stay optional.
- filterValidServers drops invalid servers and reports each to Sentry, so
  one bad entry can't block monitoring of the whole fleet. Wired into both
  loadConfigFromFile and loadConfigFromURL (covers startup + remote refresh).
- Initialize Sentry before loading config in main.go, so startup-time
  validation/load errors are actually delivered instead of captured by an
  uninitialized hub.
@coderabbitai

coderabbitai Bot commented Jun 15, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 7732e07e-ce6e-4eea-ac86-4ce0486b5810

📥 Commits

Reviewing files that changed from the base of the PR and between b823aaa and 47615e8.

📒 Files selected for processing (1)
  • internal/config/config_validation_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • internal/config/config_validation_test.go

📝 Walkthrough

Walkthrough

Adds ValidateServer and filterValidServers in a new config_validation.go module. Both config loaders (loadConfigFromFile, loadConfigFromURL) now pass unmarshalled server slices through filterValidServers before returning. Sentry initialization is moved earlier in main.go. A CLAUDE.md is added with architecture and usage guidance.

Changes

Config Server Validation and Sentry Init

Layer / File(s) Summary
ValidateServer and filterValidServers
internal/config/config_validation.go
New ValidateServer checks required ObaServer fields and returns an aggregated missing-fields error. filterValidServers applies it, reports failures to Sentry with server_id/server_name tags, and returns only valid entries.
Config loader wiring, Sentry timing, and docs
internal/config/config_loader.go, cmd/watchdog/main.go, CLAUDE.md
Both config loaders now return filterValidServers(servers) instead of raw servers. main.go moves SetupSentry, defer FlushSentry, and ConfigureScope before config loading and removes the prior later init block. CLAUDE.md documents commands, architecture, and testing conventions.
Validation tests and updated loader fixtures
internal/config/config_validation_test.go, internal/config/config_loader_test.go
New tests cover ValidateServer (valid, optional fields, missing/whitespace required fields), filterValidServers (order, all-invalid, nil), and end-to-end filtering via loadConfigFromFile/loadConfigFromURL. Existing loader test fixtures gain agency_id and fix JSON key names (oba_base_url, oba_api_key).

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 53.85% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'fix(config): validate OBA server configs and drop invalid entries' directly and clearly summarizes the main changes: implementing validation and filtering of OBA server configurations to handle invalid/null entries.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch prod-errors

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coveralls

coveralls commented Jun 15, 2026

Copy link
Copy Markdown

Coverage Status

coverage: 50.931% (+1.2%) from 49.71% — prod-errors into main

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
internal/config/config_validation_test.go (1)

164-168: 💤 Low value

Consider checking the w.Write error for linter compliance.

While unchecked writes in test handlers are common practice, the static analysis tool flags line 166 for an unchecked error return. For consistency with project linting standards, consider:

🔧 Proposed fix
 ts := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
 	w.Header().Set("Content-Type", "application/json")
-	w.Write([]byte(body))
+	if _, err := w.Write([]byte(body)); err != nil {
+		t.Errorf("Failed to write response: %v", err)
+	}
 }))
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/config/config_validation_test.go` around lines 164 - 168, The
w.Write call in the HTTP handler function returns an error that is currently
unchecked, which causes a linter compliance issue. Capture the error return
value from w.Write([]byte(body)) by assigning it to a variable or explicitly
ignoring it with the blank identifier (_) to satisfy the linter requirements
while maintaining the test handler logic.

Source: Linters/SAST tools

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@internal/config/config_validation_test.go`:
- Around line 164-168: The w.Write call in the HTTP handler function returns an
error that is currently unchecked, which causes a linter compliance issue.
Capture the error return value from w.Write([]byte(body)) by assigning it to a
variable or explicitly ignoring it with the blank identifier (_) to satisfy the
linter requirements while maintaining the test handler logic.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 32bdf4b5-cd10-4526-93eb-2848d1eae393

📥 Commits

Reviewing files that changed from the base of the PR and between e47c0ce and b823aaa.

📒 Files selected for processing (6)
  • CLAUDE.md
  • cmd/watchdog/main.go
  • internal/config/config_loader.go
  • internal/config/config_loader_test.go
  • internal/config/config_validation.go
  • internal/config/config_validation_test.go

Addresses CodeRabbit linter nitpick on PR #112.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants