Tests are organized into five tiers, each building on the previous:

| Tier | Command | What It Verifies |
|---|---|---|
| 0 - Unit | `make test-unit` | Pure logic unit tests and manifest security checks (no cluster required) |
| 1 - Smoke | `make test-smoke` | Cluster reachable, all 5 pods Running, health endpoints respond |
| 2 - Pipeline | `make test-pipeline` | OTLP gRPC/HTTP send completes without error (A1) |
| 3 - Forwarding | `make test-forwarding` | Data flows through each pipeline leg, including Splunk receipt |
| 4 - Sourcetypes | `make test-sourcetypes` | Per-sourcetype sentinel E2E and config validation |

Run all tiers: `make test-all` (or individually with the commands above).
CI enforcement is path-based: when the relevant files change, the corresponding tiers run in CI and block PR merges on failure:

| Tier | CI Workflow | Runner |
|---|---|---|
| 0 - Unit + manifest tests | `validate.yml` (`unit-tests` job) | ubuntu-latest |
| 1–4 - Smoke, Pipeline, Forwarding, Sourcetypes | `e2e-tests.yml` | self-hosted Linux/ARM64 (Docker on macOS) |

The `e2e-tests.yml` workflow runs for changes under `k8s/**`, `scripts/**`, `tests/**`, or the `Makefile`; PRs that do not touch those paths will only run Tier 0 in CI.
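The path rule above can be approximated in a few lines of Python. This is a hypothetical sketch for reasoning about which tiers a change would trigger: `E2E_PREFIXES`, `E2E_FILES`, and `tiers_for_change` are illustrative names, the real decision is made by the workflows' own path filters, and the sketch assumes the change already triggers Tier 0.

```python
# Hypothetical sketch: approximate which tiers CI would run for a set of changed files.
# Prefix checks stand in for the k8s/**, scripts/**, and tests/** path filters.
E2E_PREFIXES = ("k8s/", "scripts/", "tests/")
E2E_FILES = {"Makefile"}  # only the top-level Makefile


def touches_e2e_paths(changed_files):
    """Return True if any changed path falls under the e2e trigger paths."""
    return any(
        path in E2E_FILES or path.startswith(E2E_PREFIXES)
        for path in changed_files
    )


def tiers_for_change(changed_files):
    """Return the tier numbers expected to run (assumes Tier 0 is triggered)."""
    tiers = [0]
    if touches_e2e_paths(changed_files):
        tiers += [1, 2, 3, 4]
    return tiers
```

For example, a docs-only change yields `[0]`, while a change to any file under `k8s/` yields all five tiers.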
Manual make test-* commands are available for local development and debugging.
The Tier 1–4 jobs execute inside an ephemeral `myoung34/github-runner:ubuntu-jammy` container running on the macOS host that owns the OrbStack k3s cluster. The container reaches the cluster API via `k8s.orb.local` (OrbStack injects DNS resolution for `*.orb.local` inside Docker containers).
Architecture (zero custom code):
- Stock image with `EPHEMERAL=1`: each registration is single-use. The runner registers, runs ONE job, deregisters cleanly, and exits with code 0 (removing its `.runner` config file on exit). `restart: unless-stopped` in `docker/actions-runner/docker-compose.yml` then restarts the runner container/service so it can register again for the next job. The container is NOT brand-new each time: bind-mounted state persists across jobs, and the container's writable layer is reused. The ephemeral runner's own cleanup is what ensures fresh-looking registration; the restart policy just brings the service back. This eliminates the "Cannot configure: already configured" crash loop that affected the previous setup.
- Tools (kubectl, kustomize, sops, age, yq, doppler, python) come from the workflow's `setup-e2e-tools` composite action, NOT from a custom image.
- Boot persistence is provided by a macOS LaunchAgent at `~/Library/LaunchAgents/com.<owner>.orbstack-runner.plist`, installed by `make runner-install-launchagent`. The `<owner>` is the lowercase owner of `GITHUB_REPO`; run `make runner-print-label` to see the resolved label. Under normal operation compose stays up and Docker handles the per-job restart cycle; `KeepAlive=true` on the LaunchAgent only respawns `make runner-foreground` if compose itself crashes.
Operations:
```bash
make runner-pull                    # pull the pinned myoung34/github-runner:ubuntu-jammy image
make runner-start                   # boot in background (one-shot, manual)
make runner-foreground              # boot in foreground (used by LaunchAgent)
make runner-stop                    # stop and remove the runner
make runner-status                  # container + GitHub registration status
make runner-logs                    # tail container logs
make runner-doctor                  # deep health check (container, registration, mounts, k8s reach, LaunchAgent)
make runner-install-launchagent     # install LaunchAgent (re-run from main worktree after merge)
make runner-uninstall-launchagent   # remove LaunchAgent
```

Recovery:

- Container failed? `launchctl kickstart -k gui/$(id -u)/$(make -s runner-print-label)` forces the LaunchAgent to respawn the runner.
- Registration stuck? Delete the orphan via `GH_TOKEN=$(doppler secrets get GH_PAT_RUNNER_TOKEN --plain -p gh-workflow-tokens -c prd) gh api -X DELETE "repos/${GITHUB_REPO}/actions/runners/<id>"`, then `launchctl kickstart -k ...`. Export `GITHUB_REPO` first or substitute your repo slug directly (e.g., from `make -n runner-status`).
- LaunchAgent not loaded? Run `make runner-install-launchagent` from the worktree you want it to point at (usually the main worktree).
- Logs: `~/Library/Logs/orbstack-runner/{stdout,stderr}.log`.
The runner requires the macOS host to be powered on and OrbStack to be running. Sleep/wake and reboot are handled automatically by the LaunchAgent.
- Test venv installed: `make test-setup`
- For Tiers 1–4 (local only): OrbStack running with monitoring namespace deployed: `make deploy-doppler`
- Splunk HEC token configured: `SPLUNK_HEC_TOKEN` must be set in Doppler `iac-conf-mgmt/prd`
- Splunk management credentials in secret: `splunk-hec-config` must have `mgmt-url` and `admin-password` keys (populated automatically by `make deploy-doppler` when `SPLUNK_PASSWORD` is set)
Verifies infrastructure health — all pods running, all StatefulSets ready, service endpoints respond. Does not verify data flow, only that components are alive.
Verifies OTLP data can be ingested (A1: Client → OTEL Collector). Sends a trace via gRPC and HTTP, asserts no transport errors.
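To make the HTTP half of that check concrete, here is a minimal sketch that builds a one-span OTLP/HTTP JSON body and posts it using only the standard library. It is not the repository's test code: `build_trace_payload` and `send_trace` are hypothetical names, and the endpoint assumes the collector's OTLP/HTTP receiver is reachable on the conventional port 4318 at `localhost` (for example through a port-forward).

```python
import json
import os
import time
import urllib.request


def build_trace_payload(service_name, span_name):
    """Build a one-span OTLP/HTTP JSON body with random trace and span IDs."""
    now_ns = time.time_ns()
    return {
        "resourceSpans": [{
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": service_name}},
            ]},
            "scopeSpans": [{
                "scope": {"name": "pipeline-sketch"},
                "spans": [{
                    "traceId": os.urandom(16).hex(),  # 32 hex characters
                    "spanId": os.urandom(8).hex(),    # 16 hex characters
                    "name": span_name,
                    "kind": 1,  # SPAN_KIND_INTERNAL
                    "startTimeUnixNano": str(now_ns),
                    "endTimeUnixNano": str(now_ns + 1_000_000),
                }],
            }],
        }]
    }


def send_trace(payload, endpoint="http://localhost:4318/v1/traces", timeout=10):
    """POST the payload; urlopen raises on connection failures and HTTP error statuses."""
    request = urllib.request.Request(
        endpoint,  # assumed address, adjust to your port-forward or service
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

A call such as `send_trace(build_trace_payload("pipeline-test", "smoke-span"))` returning without an exception corresponds to the "no transport errors" assertion; it says nothing about whether the data was forwarded downstream.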
The main integration test suite. Three test classes:
TestCollectorToEdgeForwarding — OTEL Collector → Cribl Edge (A4)
- Verifies no OTEL exporter errors after sending a trace
- Verifies the Edge API is reachable after a send
TestEdgeToHomelabForwarding — Cribl Edge → homelab Stream → Splunk HEC (A5+A7)
- `test_cribl_edge_no_output_errors`: Edge logs show no non-retryable errors for the proxmox-stream output
- `test_cribl_edge_events_flowing`: Edge shows `outBytes > 0` in stats logs (bytes sent over S2S)
- `test_otlp_events_reach_splunk_realtime` ✓Splunk: Sends OTLP trace with unique ID, queries Splunk for `index=claude` within 120s
TestClaudeCodeLogPipeline — Host FS → Edge → homelab Stream → Splunk (A2+A5+A7)
- `test_claude_home_mount_accessible`: Edge pod can read `~/.claude/projects/`
- `test_sentinel_file_visible_in_edge_pod`: Host file appears in pod immediately
- `test_edge_file_monitor_config_path`: Pack input config has correct monitoring path
- `test_edge_file_monitor_picks_up_sentinel`: Edge FileMonitor logs "collector added" within 35s
- `test_edge_output_not_devnull`: Edge's default output routes to proxmox-stream (not devnull)
- `test_edge_file_input_active`: Edge FileMonitor is actively collecting files
- `test_file_events_reach_splunk_realtime` ✓Splunk: Writes sentinel.jsonl, queries Splunk for `index=claude sourcetype=claude:code:session` within 90s
Tests marked ✓Splunk verify data arrived in Splunk — not just that it was sent.
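The ✓Splunk tests follow a send-then-poll pattern: emit data tagged with a unique ID, then query until it shows up or a deadline passes. The sketch below illustrates that pattern under stated assumptions; `new_sentinel_id` and `wait_for_results` are hypothetical helpers, not functions from the test suite, and the search callable is left to the caller.

```python
import time
import uuid


def new_sentinel_id():
    """Return a unique marker to embed in the test event."""
    return f"sentinel-{uuid.uuid4().hex}"


def wait_for_results(search_fn, timeout_s, interval_s=5.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Call search_fn() until it returns a non-empty list or the timeout expires.

    Returns the results, or an empty list if nothing arrived in time.
    """
    deadline = clock() + timeout_s
    while True:
        results = search_fn()
        if results:
            return results
        if clock() >= deadline:
            return []
        sleep(interval_s)
```

In a test, `search_fn` would wrap a Splunk query for the sentinel (for instance a lambda around `query_splunk`), with `timeout_s=120` or `timeout_s=90` matching the windows listed above. Injecting `sleep` and `clock` keeps the helper itself testable without real delays.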
Pure logic tests (no cluster needed):
- `test_parse_otel_error_lines`: Validates OTEL log parser
- `test_find_flowing_stats`: Validates Cribl stats parser
Shared utility functions:
- `parse_otel_error_lines(log_text)`: Filters OTEL collector logs for error-level operational entries
- `find_flowing_stats(log_text)`: Finds Cribl logs where `outBytes > 0`
- `query_splunk(mgmt_url, admin_password, search, earliest)`: Queries Splunk REST API, returns list of result dicts
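As an illustration of what the two log parsers might look like, here is a sketch that reuses their names but is not the repository's implementation. It assumes two log formats that the source does not specify: tab-separated collector lines with the level in the second field, and Cribl logs as one JSON object per line with a numeric `outBytes` field.

```python
import json


def parse_otel_error_lines(log_text):
    """Return collector log lines whose level field is 'error'.

    Assumed format: timestamp<TAB>level<TAB>caller<TAB>message...
    """
    errors = []
    for line in log_text.splitlines():
        fields = line.split("\t")
        if len(fields) >= 2 and fields[1].strip().lower() == "error":
            errors.append(line)
    return errors


def find_flowing_stats(log_text):
    """Return parsed JSON log records that report outBytes > 0.

    Assumed format: one JSON object per line; other lines are skipped.
    """
    flowing = []
    for line in log_text.splitlines():
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(record, dict):
            continue
        out_bytes = record.get("outBytes", 0)
        if isinstance(out_bytes, (int, float)) and out_bytes > 0:
            flowing.append(record)
    return flowing
```

Keeping both functions pure (text in, list out) is what lets the Tier 0 unit tests exercise them without a cluster.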
Shared fixtures and helpers:
- `cluster_ready`: Session fixture, skips all tests if OrbStack unreachable
- `splunk_client`: Returns `(mgmt_url, admin_password)` from `splunk-hec-config` secret; skips if keys absent
- `kubectl_exec_no_fail(*args)`: Run `kubectl exec` returning `(stdout, return_code)` without raising on failure
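The "return instead of raise" behavior of `kubectl_exec_no_fail` can be sketched with `subprocess.run` and `check=False`. This is an illustrative version, not the repository's helper: `run_no_fail` and the `timeout` default are assumptions introduced here.

```python
import subprocess


def run_no_fail(command, timeout=30):
    """Run a command and return (stdout, return_code) without raising on a non-zero exit.

    A missing binary or a timeout still raises (FileNotFoundError, TimeoutExpired).
    """
    completed = subprocess.run(
        command,
        capture_output=True,
        text=True,
        timeout=timeout,
        check=False,  # a non-zero exit is reported, not raised
    )
    return completed.stdout, completed.returncode


def kubectl_exec_no_fail(*args):
    """Run `kubectl exec <args...>`; the caller inspects the return code."""
    return run_no_fail(["kubectl", "exec", *args])
```

A test can then assert on the return code directly, for example `stdout, rc = kubectl_exec_no_fail("-n", "monitoring", "cribl-edge-standalone-0", "--", "ls", "/home/claude/.claude/projects/")` followed by `assert rc == 0`.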
```bash
# Run a single test
make test-setup
.venv/bin/pytest tests/test_forwarding.py::TestClaudeCodeLogPipeline::test_file_events_reach_splunk_realtime -v

# Run all forwarding tests
make test-forwarding

# Run with extra verbosity for debugging
.venv/bin/pytest tests/test_forwarding.py -v -s
```

If the Splunk tests are skipped, the `splunk-hec-config` secret is missing the `mgmt-url` or `admin-password` keys. Run:

```bash
make deploy-doppler
```

Requires `SPLUNK_PASSWORD` and `SPLUNK_NETWORK` set in Doppler `iac-conf-mgmt/prd`.
Check each pipeline leg:
- Edge mount: `kubectl exec -n monitoring cribl-edge-standalone-0 -- ls /home/claude/.claude/projects/`
- Edge FileMonitor: `kubectl logs -n monitoring cribl-edge-standalone-0 --since=5m | grep FileMonitor`
- Edge→homelab sent: Check Edge outputs API `sentCount` for `proxmox-stream`
- Edge output errors: `kubectl logs -n monitoring cribl-edge-standalone-0 --since=5m | grep "output:proxmox-stream" | grep "level=warn\|level=error"`
- Edge S2S config: `kubectl exec -n monitoring cribl-edge-standalone-0 -- cat /opt/cribl/data/local/cribl/outputs.yml` (the homelab side is owned by its own repo)
The `force-splunk-meta` pipeline (attached to the Edge's `proxmox-stream` output) must set `index` (no underscore), NOT `_index`. If events appear with `sourcetype=httpevent` or `index=default`, the pipeline is not running or has wrong field names. Check:

```bash
kubectl exec -n monitoring cribl-edge-standalone-0 -- \
  cat /opt/cribl/data/local/cribl/pipelines/force-splunk-meta/conf.yml
```

- A3 (Host FS → Edge Managed): Not verified by forwarding tests; only pod health is checked
- A6 (Edge Managed → Cribl Cloud): Not locally testable (cloud-managed)
- A7 (homelab Stream → Splunk HEC): The homelab Stream is not reachable from CI; covered indirectly by the ✓Splunk E2E tests
- A9 (MCP Server → Cribl Cloud): Not locally testable
- VS Code sourcetypes: Covered by pack install validation (unit tests check `cc-edge-vscode-io` forbidden patterns); E2E sourcetype sentinels planned for a future PR
- Copilot sourcetypes (`copilot:chat:otel`): Routing rules validated by the `force-splunk-meta` pipeline config; E2E tests require active Copilot data sources
See ARCHITECTURE.md for the full test coverage map.