This guide is for AI coding assistants working in the STATBUS codebase. STATBUS is a statistical business registry with temporal tables for tracking business activity throughout history. Built by Statistics Norway (SSB) using PostgreSQL 18+, PostgREST 12+, and Next.js 15+.
Architecture: Database-centric progressive disclosure (NOT microservices). PostgreSQL IS the backend—PostgREST exposes the schema directly. Security happens entirely at the database layer via Row Level Security (RLS).
StatBus has two command-line tools:
./sb— Go CLI for ops/production commands (start, stop, psql, migrate, upgrade, etc.)./dev.sh— Bash script for development-only commands (test, create-db, etc.)
The legacy manage-statbus.sh has been deleted. All references should use ./sb or ./dev.sh instead.
# Service management (profiles: all, all_except_app, app)
./sb start all # Start all services (db, rest, worker, app)
./sb stop all # Stop all services
./sb restart all # Restart all services
./sb ps # List running containers
./sb logs [service...] # Follow service logs
./sb build [profile] # Build Docker images from source (development only)
# Database connection
./sb psql # Open psql shell
./sb psql < file.sql # Run SQL file (use < redirection!)
echo "SELECT ..." | ./sb psql # Single-line query
# Configuration
./sb config generate # Regenerate .env and Caddyfiles from .env.config
./sb config show # Show current config (verbose)
./sb config show --postgres # Print shell-evaluable PG vars: eval $(./sb config show --postgres)
# Environment files
./sb dotenv -f .env get KEY # Read a key from .env file
./sb dotenv -f .env set KEY VALUE # Set a key (also accepts KEY=VALUE)
./sb dotenv -f .env set KEY +DEFAULT # Set only if key doesn't exist
# Users and types
./sb users create # Create users from .users.yml
./sb types generate # Generate TypeScript types from schema
# Database operations
./sb db status # Check if database is running
./sb db dump # Dump local database to dbdumps/
./sb db download <code> # Download database dump from remote server
./sb db dumps list # List available database dumps
./sb db dumps purge [N] # Delete old dumps, keeping newest N per source
./sb db restore <file> # Restore dump to local database
./sb db restore <file> --to no # Restore dump to remote serverecho "\d tablename" | ./sb psql # Table structure
echo "\dt pattern*" | ./sb psql # Find tables by pattern
echo "\dv viewname" | ./sb psql # View definition
echo "SELECT ..." | ./sb psql # Single queriesOffline inspection via doc/db/: The doc/db/ directory contains dumps of all database functions, tables, and views in markdown format. Use grep / rg on these files when you need to search function bodies without a running database. Prefer this over grepping migrations/ — migrations are chronological, so the first match is the oldest and least relevant version.
./dev.sh test fast 2>&1 | tee tmp/test-fast.log # Run fast tests only (good for quick iteration)
./dev.sh test 015_my_test # Run single test (prefix number)
./dev.sh test 300_test 2>&1 | tee tmp/test-300_test.log # Save output to log file for later review
./dev.sh diff-fail-all pipe # Show diffs for all failed tests
./dev.sh test all # Run all pg_regress tests (EXTREMELY SLOW)
# IMPORTANT: Use tee to save output - prevents wasting time re-running slow tests
# Test results are in test/results/*.out and can be compared to test/expected/*.outInstall-recovery harness (./dev.sh test-install-recovery [selector]): end-to-end upgrade/recovery tests on paid ephemeral Hetzner Cloud VMs (CX23, hel1, ~€0.0072/hr, 1-hour minimum; HCLOUD_TOKEN in .env.credentials required). See test/install-recovery/README.md.
./dev.sh create-db # Create database with migrations
./dev.sh delete-db # ⚠️ DESTROYS ALL DATA
./dev.sh delete-db-structure # ⚠️ Drops schema, keeps container
./dev.sh recreate-database # ⚠️ Delete + Create (fresh start)./sb migrate new --description "my change" # Create migration
./sb migrate up # Apply migrations
./sb migrate down # ⚠️ Rollback (destructive)./sb install # Unified entrypoint — detects state and dispatches
./sb upgrade check # Check GitHub for new releases
./sb upgrade list # List discovered upgrades from database
./sb upgrade schedule <version> # Queue an upgrade (writes a public.upgrade row)
./sb upgrade apply <version> # Trigger immediate upgrade via NOTIFY (needs running service)
./sb upgrade service # Run upgrade service (usually via systemd)./sb install is the single operator-facing entrypoint for first-install, repair, and applying a pending upgrade without waiting for the service. On each run it probes install state and dispatches:
| Detected state | Action |
|---|---|
fresh (no .env.config) |
Run the step-table to set up a clean install |
| half-configured / DB unreachable | Run the step-table to repair / continue setup |
| nothing-scheduled | Run the step-table as an idempotent config refresh |
| scheduled-upgrade | Dispatch the pending public.upgrade row through the same pipeline the service uses |
| crashed-upgrade (stale flag + dead PID) | Reconcile the flag, re-detect, re-dispatch |
| live-upgrade (service running) | Refuse with diagnostic; do not touch state |
legacy pre-1.0 (no public.upgrade table) |
Refuse with pointer to manual upgrade path (doc/CLOUD.md) |
Canonical operator upgrade workflow: ./sb upgrade schedule <version> to queue, then either wait for the service's next tick (production norm) or run ./sb install to dispatch immediately. After a successful inline upgrade the systemd upgrade unit (if active) is restarted so it picks up the new binary + migrations. Full contract in doc/upgrade-timeline.md.
Migration Best Practice for Modifying Existing Functions/Procedures:
When modifying an existing database function or procedure, always dump the current definition first rather than rewriting from scratch:
# Dump function definition to use as base for both up and down migrations
echo "\sf schema.function_name" | ./sb psql > tmp/function_def.sql
# For procedures
echo "\sf schema.procedure_name" | ./sb psql > tmp/procedure_def.sqlThen:
- cat the dumped definition into the down migration (this restores the original)
- cat it into the up migration
- Add
CREATE OR REPLACEprefix and wrap inBEGIN;/END; - Stage the up/down migration.
- Make only the necessary modifications in the up migration - noe easy to review as unstaged changes.
This approach:
- Preserves exact current behavior in the down migration
- Ensures surgical changes rather than accidental rewrites
- Reduces risk of introducing bugs from manual recreation
- Maintains all edge cases, exception handling, and comments
CRITICAL: Never rewrite large functions/procedures from scratch. The \sf dump ensures you're modifying the actual current code, not an outdated or incomplete version. For large procedures (100+ lines), focus only on the specific lines that need changing.
cd app && pnpm run dev # Development server with Turbopack
cd app && pnpm run build # Production build
cd app && pnpm run lint # ESLint
cd app && pnpm run format # Check Prettier
cd app && pnpm run format:fix # Fix Prettier issues
cd app && pnpm run tsc # Type check without emit
cd app && pnpm run test # Run Jest testsCRITICAL: These are two SEPARATE concepts that control different aspects:
Controlled by CADDY_DEPLOYMENT_MODE - defines how Caddy operates:
-
development: Local development- HTTP only, self-signed internal CA certs
- Next.js runs separately on host (
pnpm run dev) at port 3000 (http://local.statbus.org:3000) - Slot-based ports (e.g., 3010) are for testing containerized production builds to catch advanced timing issues
- PostgreSQL available on two separate ports (slot-based):
- Port 3014 (offset 1): Plaintext (default, convenient for local psql and SSH tunnels)
- Port 3015 (offset 1): TLS+SNI (for testing production-like connections with
TLS=1)
-
standalone: Single-server production- HTTPS with automatic Let's Encrypt certificates
- All services run in Docker
- Standard ports: 443 (HTTPS), 5432 (PostgreSQL)
- Direct public access, no additional proxy needed
- Public domain required (e.g.,
statbus.example.com)
-
private: Behind host-level reverse proxy (multi-tenant cloud)- HTTP only (host-level proxy handles HTTPS)
- Trusts X-Forwarded-* headers from proxy
- Multiple instances on same host with unique ports
- PostgreSQL: plaintext (SSH tunnel provides encryption)
Controlled by DEPLOYMENT_SLOT_CODE and DEPLOYMENT_SLOT_PORT_OFFSET - enables multiple instances per host:
-
Each slot = separate instance (country/environment)
-
Port calculation:
3000 + (slot_offset × 10)- Offset 1 (local): 3010 (HTTP), 3011 (HTTPS), 3012 (app), 3013 (rest), 3014 (db), 3015 (db-tls)
- Offset 2 (ma): 3020 (HTTP), 3021 (HTTPS), 3022 (app), 3023 (rest), 3024 (db), 3025 (db-tls)
- Offset 3 (no): 3030 (HTTP), 3031 (HTTPS), 3032 (app), 3033 (rest), 3034 (db), 3035 (db-tls)
-
Slot code used in:
- Container names:
statbus-{code}-app,statbus-{code}-db - Database names:
statbus_{code} - Cookie names:
statbus-{code},statbus-{code}-refresh - Subdomains:
{code}.statbus.org
- Container names:
Note: ./sb migrate up and other server-internal database clients should use CADDY_DB_BIND_ADDRESS + CADDY_DB_PORT (loopback in private mode), NOT SITE_DOMAIN. SITE_DOMAIN is the external hostname used via SSH tunnel (private) or public DNS (standalone); Caddy does not expose the DB port on that address in private-mode deployments.
Local Development (mode=development):
# Default: plaintext on slot-based port (e.g., 3014 for local slot)
./sb psql -c "SELECT version();"
# Testing TLS: uses TLS port (e.g., 3015 for local slot) with SNI
TLS=1 ./sb psql -c "SELECT version();"Remote via SSH Tunnel (mode=private):
# SSH tunnel: local:3014 → remote:127.0.0.1:3014 → db:5432
# Plaintext through tunnel (SSH provides encryption)
ssh statbus_dev@statbus.org "cd statbus && ./sb psql"Production (mode=standalone):
# Direct public access with TLS via Caddy Layer4 proxy
export PGHOST=statbus.example.com
export PGPORT=5432
export PGSSLMODE=require
export PGSSLNEGOTIATION=direct
export PGSSLSNI=1
psqlcli/src/manage.cr: Configuration generation and port calculation logiccli/src/templates/*.caddyfile.ecr: Mode-specific Caddy templatessb: Go CLI binary (built from cli/, in .gitignore)dev.sh: Development-only commands (test, create-db, etc.)ops/: Operations scripts (maintenance, notifications, service files).env.config: Edit this for deployment settings.env: Generated file, do not edit directly.env.credentials: Secrets, generated once, keep secure
SSB runs production StatBus on two host shapes — multi-tenant cloud (niue.statbus.org) and dedicated standalone boxes (e.g. rune.statbus.org). Both use branches as pointers for CI/CD. See doc/CLOUD.md for full details on each.
-
Trigger deployment (GitHub Actions -> "Run workflow"):
master-to-Xworkflow force-pushesmaster-> the deployment branch for X.- Cloud (multi-tenant on niue): branch is
ops/cloud/deploy/<slot>. - Standalone (dedicated host): branch is
ops/standalone/deploy/<host>-<slot>. - Example: "Push master -> ops/standalone/deploy/rune-no" deploys Norway on the dedicated rune box; "Push master -> ops/cloud/deploy/dev" deploys Development on niue.
-
Automatic execution: Push to the deployment branch triggers the matching
deploy-to-Xworkflow, which SSHs to the server and triggers the upgrade service. -
On the server (upgrade service):
- CLI writes upgrade request to database and sends NOTIFY
- The upgrade service backs up the database
- Checks out the target version
- Runs pending migrations (or recreates if --recreate)
- Restarts services with health checks
- Rolls back automatically on failure
- Sends callback notification (Slack)
| Workflow | Branch | Server | Notes |
|---|---|---|---|
| master-to-rune-no | ops/standalone/deploy/rune-no | statbus@rune.statbus.org | Norway (standalone, dedicated Hetzner box) |
| master-to-demo | ops/cloud/deploy/demo | statbus_demo@niue | Demo (multi-tenant) |
| master-to-dev | ops/cloud/deploy/dev | statbus_dev@niue | Development (multi-tenant) |
| master-to-production | ops/cloud/deploy/production | — | Pointer only |
| production-to-all | — | all servers | Cascades to all |
git push origin master:ops/standalone/deploy/rune-no # Deploy master to Norway (standalone on rune)
git push origin master:ops/cloud/deploy/dev # Deploy master to Development (multi-tenant on niue)This directly updates the branch pointer, which triggers deploy-to-X.yaml. The master-to-X workflows in GitHub UI do the same thing but add an extra hop.
# Multi-tenant slot on niue (one user per slot)
ssh statbus_<slot> "cd statbus && <command>" # e.g. statbus_dev
scp local/file statbus_<slot>:statbus/path/to/file
# Standalone box (one user per host: devops)
ssh statbus@rune.statbus.org "cd statbus && <command>"
scp local/file statbus@rune.statbus.org:statbus/path/to/fileBefore deploying: Ensure remote working directory is clean (no uncommitted changes), or deploy will fail.
migrations/- Database migrations (YYYYMMDDHHmmSS_description.up.sql/.down.sql)app/- Next.js 15 application (App Router, TypeScript, Tailwind, shadcn/ui)cli/- Crystal CLI tool for migrations and database managementtest/sql/- pg_regress SQL tests with expected output intest/expected/ops/- Operations scripts (maintenance, notifications, service files)doc/- Architecture and design documentation
Function Definitions:
CREATE FUNCTION auth.jwt_verify(token_value text)
RETURNS auth.jwt_verify_result
LANGUAGE plpgsql
SECURITY DEFINER
SET search_path = public, auth, pg_temp
AS $jwt_verify$
DECLARE
_jwt_verify_result auth.jwt_verify_result; -- Variable matches type name
BEGIN
-- Function body
END;
$jwt_verify$;Key patterns:
- Use function name in dollar quotes:
AS $function_name$ - Variables match their type names:
_jwt_verify_result auth.jwt_verify_result - Related functions share prefixes:
jwt_verify(),jwt_switch_role(),jwt_secret() - Format strings use dollar quotes:
format($$SELECT %I$$, table_name) - Temporal logic: chronological order
start <= point AND point < end - Table aliases: explicit
AS, e.g.,FROM my_table AS mt
Naming:
x_id= foreign key to tablexx_ident= external identifier (not from DB)x_at= TIMESTAMPTZx_on= DATE
Imports:
import { NextRequest, NextResponse } from "next/server";
import { getServerRestClient } from "@/context/RestClientStore";
import { Pool } from 'pg';- Use
@/for absolute imports fromapp/src/ - Named exports preferred over default exports
API Architecture:
- CRITICAL: Prefer direct browser-to-
/restrequests over Next.js/apiroutes - Why: Easier debugging, enables user integration, transparency, better performance
- Direct
/restshows actual database requests/responses (PostgREST) - Users can learn API patterns from browser Network tab for their own integrations
- Security: JWT tokens map to database roles—NO server-side secrets typically needed
- Only use
/apiroutes for performance optimizations (e.g., bulk upload via COPY) or webhooks
State Management (Jotai):
- CRITICAL: Small, independent atoms prevent re-render loops
- If state can change independently, it MUST be in its own atom
- Use
atomEffectfor set-if-null patterns, NOTuseEffect - Variables match atom names:
const timeContext = useAtomValue(timeContextAtom)
Effects:
- ALL effects MUST use
useGuardedEffectinstead ofuseEffect - Format:
useGuardedEffect(callback, deps, 'FileName.tsx:purpose') - State machine effects: depend on primitives, not whole state object
Database Types:
- Generated via
./sb types generate - Located in
app/src/lib/database.types.ts - Use:
Tables<'my_table'>,Enums<'my_enum'>
When optimizing slow database queries:
- Enable DEBUG mode in
.env.configto get auto_explain logs for queries >100ms - Preserve test databases with
PERSIST=truefor investigation while tests run - Don't start long-running tests without first preparing hot-patches for iteration
- Analyze auto_explain logs for:
- "Rows Removed by Filter/Join Filter" - high numbers indicate inefficient joins
- Nested loops with large row counts - may need indexes or query restructuring
- Sequential scans on large tables - may need indexes
- Hot-patch for quick iteration: Use
CREATE OR REPLACE FUNCTIONdirectly in psql to test changes without full migration/test cycles - Common patterns:
- LATERAL JOIN → CTE + regular JOIN for batch operations (O(n²) → O(n))
IS NOT DISTINCT FROMwith NULLs → COALESCE with sentinels for hash joins- Missing indexes → check if query patterns match existing indexes
Key commands for performance analysis:
# View auto_explain logs (queries >100ms)
docker compose logs db 2>&1 | grep -E "duration: [0-9]{5,}" | head -30
# Find queries with high row removal (inefficient joins)
docker compose logs db 2>&1 | grep -E "Rows Removed by (Join )?Filter:" | sort | uniq -c | sort -rn | head -20
# Check task timings in worker
echo "SELECT command, COUNT(*), SUM(duration_ms)::numeric(10,0) as total_ms FROM worker.tasks WHERE state = 'completed' GROUP BY command ORDER BY total_ms DESC;" | ./sb psqlFollow this iterative cycle for ALL changes, especially bug fixes:
- Hypothesize: State hypothesis in
tmp/journal.md(use for thought process, not tracked by OpenCode) - Plan: Use TodoWrite tool to create tasks for multi-step work (OpenCode tracks these)
- Isolate: Create/identify reproducing test
- Prototype: Create non-destructive verification in
tmp/verify_fix.sql - Observe: Run prototype, gather evidence (MANDATORY)
- Analyze: If successful → proceed. If not → return to step 1
- Implement: Propose permanent changes only after prototype succeeds
- Validate: Run full test suite
- Complete: Mark todos as completed using TodoWrite
Never skip the prototype step. A hypothesis is not confirmed until supported by direct observation.
- Use TodoWrite for multi-step tasks (3+ steps or complex work)
- Mark tasks as
in_progresswhen starting,completedwhen done - Only ONE task should be
in_progressat a time - Example:
todowrite({ todos: [{ id: "1", content: "Fix JWT verification", status: "in_progress", priority: "high" }] })
rg(ripgrep) - Primary search tooltree- Directory structurehead- Inspect file beginningsls- Check file sizesruplacer- Large-scale find/replacerenamer- Batch renaming
- JWT secret stored in
auth.secretstable with RLS SECURITY DEFINERfunctions bypass RLS (can access secrets)SECURITY INVOKERfunctions inherit caller privileges (can useSET ROLE)- Pattern: verification (DEFINER) separated from role switching (INVOKER)
Fail Fast: Code expected to work should fail immediately with clear errors. Don't mask problems.
Example:
IF _jwt_verify_result.is_valid = FALSE THEN
RAISE EXCEPTION 'Invalid token: %', _jwt_verify_result.error_message;
END IF;Testing: Avoid DO Blocks for Error Verification
DO blocks that catch exceptions are opaque - they hide actual behavior. If an UPDATE affects 0 rows, no exception fires and the test silently "passes".
-- BAD: Hides that UPDATE matched 0 rows
DO $$ BEGIN
UPDATE t SET x = 1 WHERE id = currval('seq'); -- Wrong id!
EXCEPTION WHEN check_violation THEN RAISE NOTICE 'Working'; END $$;
-- GOOD: Transparent - see actual errors, verify results
SAVEPOINT sp;
\set ON_ERROR_STOP off
UPDATE t SET x = 1 WHERE id = 1; -- See actual ERROR
\set ON_ERROR_STOP on
ROLLBACK TO SAVEPOINT sp;
SELECT COUNT(*) FROM t WHERE x = 1; -- Verify: 0For deterministic tests: DELETE FROM t; ALTER SEQUENCE t_id_seq RESTART WITH 1;
Temporal Table Cleanup:
IF to_regclass('pg_temp.my_temp_table') IS NOT NULL THEN
DROP TABLE my_temp_table;
END IF;
CREATE TEMP TABLE my_temp_table (...) ON COMMIT DROP;Async Action Atom:
export const loginAtom = atom(null, async (get, set, credentials) => {
set(authStatusCoreAtom); // Trigger refresh
await get(authStatusCoreAtom); // Wait for stable state
// Now safe to navigate or react
});- Use
tmp/directory for diagnostic SQL, journals, debug scripts (gitignored) - Don't delete files from
tmp/- they serve as useful logs - Commit messages:
prefix: description(e.g.,auth: Fix JWT verification) - For nested
format(), use named dollar quotes:$SQL$,$jsonb_expr$
- SQL Conventions (detailed): See
.claude/rules/sql.md - Frontend Conventions (detailed): See
.claude/rules/frontend.md - Multi-Agent Methodology: See
doc/agentic-methodology.md(for complex multi-phase tasks) - Data Model: See
doc/data-model.md - Authentication: See
doc/auth-design.md - Cloud Deployment: See
doc/CLOUD.mdfor multi-tenant on niue.statbus.org - Single-Instance Deployment: See
doc/DEPLOYMENT.mdfor standalone server setup - Import System: See
doc/import-system.md - Integration (API/PostgreSQL): See
doc/INTEGRATE.md
Never defer known bugs. If a latent bug is found during investigation, fix it immediately or in the very next step. Silent data corruption (e.g. missing UNIQUE constraints allowing duplicates) wastes huge amounts of user/debugging/support time. No "skip for now" — deferring known bugs is considered a moral deficiency.
Always add constraints. If a table can have duplicates that shouldn't exist, add UNIQUE. If a function has a race condition, fix it. The codebase has dozens of constraints by design — intentional quality engineering, not over-engineering.
There are NO flaky tests. Never dismiss test failures as "flaky" or "transient". Every failure has a real root cause. Common actual causes: concurrent test runs colliding on shared DB resources (left a background test running); actual code bugs introduced by changes; environment issues needing fixing. "Flaky" is a lazy excuse that masks real problems.