fix: self-hosted checker and workflow routing#2083

Open
mxkaske wants to merge 8 commits into main from fix/self-hosted-checker-workflows

@mxkaske
Member

@mxkaske mxkaske commented Apr 17, 2026

Summary

Picks up the work from #1976 (by @aiokaizen) and applies review feedback from @thibaultleouay.

  • Decouples checker/workflow routing from OpenStatus cloud infrastructure (Fly.io, GCP Cloud Tasks) so self-hosted deployments use direct HTTP calls between containers
  • Adds Redis REST shim (serverless-redis-http) for self-hosted Redis compatibility
  • Updates Docker Compose and env examples for self-host wiring
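As a rough illustration of the direct container-to-container call described above, the sketch below builds a POST request bounded by a timeout. The payload shape, the `/checker/http` path, and the helper names are assumptions made for illustration; they are not taken from the PR diff.

```typescript
// Hypothetical payload shape; the real one is defined in the PR.
type CheckerPayload = { monitorId: string; url: string };

type DirectRequestInit = {
  method: string;
  headers: Record<string, string>;
  body: string;
  signal: AbortSignal;
};

// Build the request for a direct call to the checker container.
// `timeoutMs` bounds the whole request (the PR mentions 30s).
function buildDirectRequest(
  baseUrl: string,
  path: string,
  payload: CheckerPayload,
  timeoutMs: number = 30_000,
): { url: string; init: DirectRequestInit } {
  return {
    url: new URL(path, baseUrl).toString(),
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(payload),
      // Aborts the request once the timeout elapses.
      signal: AbortSignal.timeout(timeoutMs),
    },
  };
}

// Send the task and report success; failures (including timeouts) are logged.
async function dispatchDirect(
  baseUrl: string,
  path: string,
  payload: CheckerPayload,
): Promise<boolean> {
  const { url, init } = buildDirectRequest(baseUrl, path, payload);
  try {
    const res = await fetch(url, init);
    return res.ok;
  } catch (err) {
    console.error("direct checker dispatch failed", err);
    return false;
  }
}
```

Splitting request construction from the network call keeps the timeout wiring easy to unit-test without a running checker.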

Fixes applied on top of #1976:

  • Consolidated duplicated self-host helpers (isSelfHost, getCheckerBaseUrl, getCheckerRegion) into @openstatus/utils — removed copies from 3 files
  • Fixed N+1 query in sendCheckerTasksDirect — batch-fetch all monitor statuses in one query instead of per-monitor
  • Added fetch timeout (30s AbortSignal.timeout) to direct checker dispatch
  • Added Go HTTP client timeout (30s) — replaced http.DefaultClient with bounded client
  • Deduplicated payload construction — extracted buildCheckerPayload() shared by both direct and Cloud Tasks paths (-70 lines)
  • Applied Thibault's Go feedback — replaced SELF_HOST/OPENSTATUS_WORKFLOWS_URL branching with hasGCPConfig() check, matching the TS-side pattern
  • Docker compose — added Redis + redis-http services, self-host env var wiring
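The N+1 fix listed above can be pictured as one query for all monitor statuses followed by an in-memory grouping step. This is a minimal sketch under assumptions: `MonitorStatusRow`, `groupByMonitor`, and the injected `fetchStatuses` callback are hypothetical names standing in for the real query in `sendCheckerTasksDirect`.

```typescript
// Hypothetical row shape for a monitor status record.
type MonitorStatusRow = { monitorId: number; region: string; status: string };

// Group rows by monitor id so each monitor can look up its statuses
// without issuing its own query.
function groupByMonitor(rows: MonitorStatusRow[]): Map<number, MonitorStatusRow[]> {
  const byMonitor = new Map<number, MonitorStatusRow[]>();
  for (const row of rows) {
    const bucket = byMonitor.get(row.monitorId);
    if (bucket) {
      bucket.push(row);
    } else {
      byMonitor.set(row.monitorId, [row]);
    }
  }
  return byMonitor;
}

// `fetchStatuses` stands in for a single "WHERE monitor_id IN (...)" query.
async function loadStatuses(
  monitorIds: number[],
  fetchStatuses: (ids: number[]) => Promise<MonitorStatusRow[]>,
): Promise<Map<number, MonitorStatusRow[]>> {
  if (monitorIds.length === 0) return new Map();
  return groupByMonitor(await fetchStatuses(monitorIds));
}
```

With this shape the number of queries stays at one regardless of how many monitors are due in a cron tick.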

Test plan

  • docker compose up starts all services with .env.docker.example values
  • Checker receives tasks from workflows via direct HTTP in self-host mode
  • Checker posts status updates back to workflows via direct HTTP
  • Cloud-hosted path (with GCP credentials) still uses Cloud Tasks
  • Monitor test (HTTP/TCP/DNS) works from dashboard in self-host mode

🤖 Generated with Claude Code

@vercel

vercel Bot commented Apr 17, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project | Deployment | Actions | Updated (UTC)
openstatus-dashboard | Ready | Preview, Comment | Apr 17, 2026 10:36am
openstatus-status-page | Ready | Preview, Comment | Apr 17, 2026 10:36am

1 Skipped Deployment

Project | Deployment | Actions | Updated (UTC)
openstatus-web | Ignored | Preview | Apr 17, 2026 10:36am

@cloudflare-workers-and-pages

cloudflare-workers-and-pages Bot commented Apr 17, 2026

Deploying openstatus-docs with Cloudflare Pages

Latest commit: f003e30
Status: ✅  Deploy successful!
Preview URL: https://c0b020f7.openstatus-1sy.pages.dev
Branch Preview URL: https://fix-self-hosted-checker-work.openstatus-1sy.pages.dev

Adds it back as commented-out with a note that it's not needed for self-host.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Gate fly-prefer-region header on !isSelfHost() in triggerChecker
- Use isSelfHost() utility in monitor.ts instead of raw env check
- Fix followRedirects || true bug (always true) → ?? true

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
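To make the `followRedirects` fix in the commit above concrete: `||` falls back on any falsy value, so an explicit `false` is overwritten, while `??` only falls back on `null` or `undefined`. The function names below are illustrative and not taken from the codebase.

```typescript
// Buggy variant: `false || true` evaluates to true, so the caller's
// explicit opt-out is silently discarded.
function followRedirectsWithOr(value?: boolean | null): boolean {
  return value || true;
}

// Fixed variant: only null/undefined trigger the default.
function followRedirectsWithNullish(value?: boolean | null): boolean {
  return value ?? true;
}
```

Called with `false`, the first function returns `true` and the second returns `false`; both return `true` when the argument is omitted.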
Matches the TS-side pattern: isSelfHost() || !hasCloudTaskConfig().
Prevents accidental Cloud Tasks usage when GCP creds are present
alongside SELF_HOST=true.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
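A minimal sketch of the routing predicate named in the commit above, `isSelfHost() || !hasCloudTaskConfig()`. The helpers here take an explicit `env` argument for testability, and the GCP variable names are assumptions; the real helpers in `@openstatus/utils` may read different variables.

```typescript
type Env = Record<string, string | undefined>;

// SELF_HOST=true is the flag mentioned in the PR.
function isSelfHost(env: Env): boolean {
  return env.SELF_HOST === "true";
}

// Assumed variable names; used only to illustrate "GCP creds are present".
function hasCloudTaskConfig(env: Env): boolean {
  return Boolean(env.GCP_PROJECT_ID && env.GCP_CLIENT_EMAIL && env.GCP_PRIVATE_KEY);
}

// Direct HTTP dispatch wins whenever self-host is set, even with GCP creds present.
function useDirectDispatch(env: Env): boolean {
  return isSelfHost(env) || !hasCloudTaskConfig(env);
}
```

The only combination that selects Cloud Tasks is complete GCP configuration without `SELF_HOST=true`.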
@mxkaske
Member Author

mxkaske commented Apr 17, 2026

@claude review

@claude

claude Bot commented Apr 17, 2026

Claude finished @mxkaske's task in 2m 10s


Code Review

  • Gather context and understand the PR
  • Read changed files
  • Analyze code quality, bugs, and security issues
  • Post review feedback

Overall this is a well-structured PR that picks up #1976 cleanly and applies the stated improvements. The architecture is sound — isSelfHost() || !hasCloudTaskConfig() as the branching predicate is cleaner than the prior SELF_HOST env-var-only approach. Below are the issues I found, ordered by severity.


Bugs / Medium Severity

1. triggerChecker silently sends null body for unsupported jobType

packages/api/src/router/checker.ts lines 379–476

The triggerChecker function never uses the new buildCheckerPayload shared helper — it has its own if-else chain. If jobType is anything other than http, tcp, or dns, payload stays null and the fetch call fires with body: JSON.stringify(null) — a silent mis-dispatch. The cloud path's buildCheckerPayload throws explicitly (throw new Error("Unsupported jobType")), so the two paths have divergent error behaviour.

// packages/api/src/router/checker.ts ~L459
const allResult = [];
for (const region of input.regions) {
  const res = fetch(generateUrl({ row: input }), {
    ...
    body: JSON.stringify(payload), // payload can be null here

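One way to close this gap is to make payload construction exhaustive so an unknown `jobType` fails loudly before any request is sent. The sketch below is illustrative only: the input and payload shapes and the name `buildPayloadOrThrow` are assumptions, not the actual `buildCheckerPayload` signature.

```typescript
// Hypothetical input and payload shapes.
type CheckerInput = { id: number; url: string; jobType: string };
type CheckerPayload =
  | { kind: "http"; monitorId: number; url: string }
  | { kind: "tcp"; monitorId: number; uri: string }
  | { kind: "dns"; monitorId: number; uri: string };

// Returns a payload for supported job types and throws otherwise,
// so the caller can never serialize `null` as a request body.
function buildPayloadOrThrow(input: CheckerInput): CheckerPayload {
  switch (input.jobType) {
    case "http":
      return { kind: "http", monitorId: input.id, url: input.url };
    case "tcp":
      return { kind: "tcp", monitorId: input.id, uri: input.url };
    case "dns":
      return { kind: "dns", monitorId: input.id, uri: input.url };
    default:
      throw new Error("Unsupported jobType");
  }
}
```

Routing `triggerChecker` through a helper like this would give the direct and Cloud Tasks paths the same error behaviour.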

2. Go Cloud Tasks client created on every call

apps/checker/checker/update.go lines 83–145

updateStatusCloudTasks calls cloudtasks.NewClient(ctx, ...) on every invocation. This involves OAuth2 token exchange and a gRPC/HTTP connection setup — expensive for a path that runs after every monitor check. The TS side already uses a module-level lazy-init singleton (_client). The Go side should do the same.

// Called per monitor result, creates a new authenticated client each time
client, err := cloudtasks.NewClient(ctx, option.WithAuthCredentials(creds))

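For reference, the module-level lazy-init singleton pattern mentioned for the TS side looks roughly like the sketch below. `FakeTasksClient` is a stand-in for an expensive authenticated client and is not a real library class; in Go the equivalent is commonly built with `sync.Once` guarding a package-level client.

```typescript
// Stand-in for a client that is expensive to construct (auth, connection setup).
class FakeTasksClient {
  static created = 0;
  constructor() {
    FakeTasksClient.created += 1;
  }
}

// Module-level cache: constructed on first use, reused afterwards.
let _client: FakeTasksClient | null = null;

function getClient(): FakeTasksClient {
  if (_client === null) {
    _client = new FakeTasksClient();
  }
  return _client;
}
```

Every caller after the first receives the same instance, so the setup cost is paid once per process rather than once per monitor result.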


Low Severity / Nits

3. SELF_HOST still checked in Go despite PR description saying it was replaced

apps/checker/checker/update.go line 46

if os.Getenv("SELF_HOST") == "true" || !hasGCPConfig() {

The PR description says the Go side was updated to use hasGCPConfig() only, matching the TS-side pattern. But SELF_HOST is still checked here. This isn't wrong (belt-and-suspenders), but it's inconsistent with the stated intent. If hasGCPConfig() is the canonical gate, the SELF_HOST check is redundant.

4. Duplicated maintenance subquery between direct and cloud paths

apps/workflows/src/cron/checker.ts lines 101–121 and 275–294

Both sendCheckerTasks and sendCheckerTasksDirect build identical currentMaintenance/currentMaintenanceMonitors subqueries. Extracting to a shared helper would keep things DRY.

5. redis-http image is unpinned

docker-compose.yaml line 78

image: hiett/serverless-redis-http:latest

latest is fine for development but risky for self-hosted production deployments — a breaking change in the upstream image would affect users who re-pull. Consider pinning to a specific digest or version tag.

6. Missing Redis env vars in workflows env.ts schema

apps/workflows/src/env.ts

UPSTASH_REDIS_REST_URL and UPSTASH_REDIS_REST_TOKEN are used by Redis.fromEnv() in monitor.ts (line 29) but are not declared in the Zod schema. They're set correctly in docker-compose, but the schema is the contract — missing them means no validation/defaults and a confusing runtime error if they're absent.
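As a dependency-free stand-in for what declaring these variables in the schema would achieve, the sketch below fails fast with a clear message when either is absent. `parseRedisEnv` is a hypothetical helper; the actual fix would add the two keys to the existing Zod schema in `env.ts`.

```typescript
type RedisEnv = {
  UPSTASH_REDIS_REST_URL: string;
  UPSTASH_REDIS_REST_TOKEN: string;
};

const REQUIRED_KEYS = ["UPSTASH_REDIS_REST_URL", "UPSTASH_REDIS_REST_TOKEN"] as const;

// Validate at startup so a missing variable is reported by name
// instead of surfacing later as an opaque Redis client error.
function parseRedisEnv(env: Record<string, string | undefined>): RedisEnv {
  const missing = REQUIRED_KEYS.filter((key) => !env[key]);
  if (missing.length > 0) {
    throw new Error(`Missing required env vars: ${missing.join(", ")}`);
  }
  const url = env.UPSTASH_REDIS_REST_URL as string;
  // The URL constructor throws a TypeError if the value is not a valid URL.
  new URL(url);
  return {
    UPSTASH_REDIS_REST_URL: url,
    UPSTASH_REDIS_REST_TOKEN: env.UPSTASH_REDIS_REST_TOKEN as string,
  };
}
```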

7. Single-region limitation in self-host direct dispatch is implicit

apps/workflows/src/cron/checker.ts lines 351–372

In sendCheckerTasksDirect, for each monitor only one task is dispatched (using selfHostRegion), regardless of how many regions a monitor has configured. The regions array on the monitor is silently ignored. The 30s warning on line 375 is good, but a similar warning that multi-region monitors will only be checked from one region would help operators debugging unexpected gaps in coverage.
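A warning of the kind suggested here could be produced by a small helper that flags monitors whose configured regions exceed what self-host dispatch will actually cover. The `Monitor` shape and the function name are assumptions for illustration.

```typescript
// Hypothetical minimal monitor shape.
type Monitor = { id: number; regions: string[] };

// Return one warning per monitor that is configured for more than one region,
// since self-host direct dispatch checks from a single region only.
function multiRegionWarnings(monitors: Monitor[], selfHostRegion: string): string[] {
  return monitors
    .filter((m) => m.regions.length > 1)
    .map(
      (m) =>
        `monitor ${m.id} has ${m.regions.length} regions configured; ` +
        `self-host mode only checks from "${selfHostRegion}"`,
    );
}
```

Logging these once per cron run would make the single-region behaviour visible to operators without changing dispatch logic.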


Summary

The core routing logic and N+1 fix are solid. The two actionable items above are: (1) the null-payload silent mis-dispatch in triggerChecker and (2) the per-call Cloud Tasks client in Go. Everything else is polish.
