Development Guide

This guide is for engineers working on ATDM. End-user docs are in the top-level README.md.

Prerequisites

Required to run the stack and the demo

Tool	Version
Python	3.12
pdm	2.26+
Docker Engine	24+
Docker Compose	v2 (`docker compose ...`, not legacy `docker-compose`)
GNU Make	any

Verify with:

python3.12 --version
pdm --version
docker --version
docker compose version

Required only for the docs-generation workflow

Tool	Version	Used by
Node.js	20+	the example Playwright test under `automation/playwright/` and the `make audit-screenshot` PNG capture
asciinema	latest	`make demo-cast` — captures `docs/assets/demo.cast` for the README. Without it, the demo still runs; you just can't regenerate the recording.

Install on macOS:

brew install node asciinema

First-time setup

git clone git@github.com:NickBaynham/agentic-test-data-manager.git
cd agentic-test-data-manager
make setup       # pdm install
cp .env.example .env.local   # then edit if you want non-default secrets

Day-to-day workflow

# Bring up the local stack (Postgres, MinIO, Target SUT stub, ATDM agent stub)
make up

# See what's running
make ps

# Tail container logs
make logs

# Run unit tests (no Docker)
make test          # or: make test-unit

# Run integration + e2e tests (uses the running stack)
make test-integration

# Lint and type-check
make lint

# Tear down (volumes preserved)
make down

# Tear down AND DELETE VOLUMES (destructive — wipes Postgres + MinIO data)
make down-clean

Host port mapping

The stack uses non-default host ports to avoid colliding with other local services. Container-internal ports are conventional; only the host side is remapped. The full table lives in planning/PLAN.md Phase 1.

Service	Host port	URL
Postgres	`55432`	`postgres://atdm@localhost:55432/target_healthcare`
MinIO API	`19000`	`http://localhost:19000`
MinIO Console	`19001`	`http://localhost:19001` (browser; default creds in `.env.example`)
Target SUT	`18000`	`http://localhost:18000/health`, `http://localhost:18000/docs`
ATDM agent	`18001`	`http://localhost:18001/health`, `http://localhost:18001/metrics`, `http://localhost:18001/docs`

Inside the compose network, services address each other by service name + container port (e.g., postgres:5432, minio:9000). Host port mapping only matters for connections from the developer's laptop.

Project layout

apps/
  test-data-agent/
    app/              FastAPI agent — scenario requests, validators, seeders (Phase 3+)
    python/atdm/      Installable client library (Phase 7)
    tests/            Per-app unit tests
    requirements.txt  Container-side runtime deps
  target-healthcare-api/
    app/              FastAPI SUT — synthetic healthcare entities
    migrations/       SQL migrations (Phase 2)
    tests/            Per-app unit tests
    requirements.txt  Container-side runtime deps
automation/
  playwright/         Playwright tests (Phase 7)
  pytest-api/         pytest-API tests (Phase 7)
  fixtures/           Generated fixtures (gitignored)
data/
  catalogs/, parquet/, seed/   (Phase 3+; gitignored except .gitkeep)
docs/                 This guide and others
infra/
  docker-compose.yml  The local stack
planning/PLAN.md      Phase-by-phase build plan
requirements/         BRD, engineering handoff, original concept
scripts/              Operational scripts (e.g., cold-start benchmark)
tests/                Root-level tests; tests/integration is the stack suite

Testing

Tests are split into three layers and must be run separately per source root (see PLAN.md Phase 0 Known pitfall — the duplicate-app mypy/pytest trap).

Layer	How to run	What it covers
Unit (top-level)	`pdm run pytest tests -m "not integration and not e2e"`	Repo-level utilities and architecture tests.
Unit (per-app)	`PYTHONPATH=apps/<name> pdm run pytest apps/<name>/tests`	FastAPI stubs and (later) generators, validators, seeders.
Integration	`make test-integration`	Tests that hit the running compose stack. Tagged `@pytest.mark.integration`.
E2E	`make test-integration` (subset by `-m e2e`)	Full make-up → assert → make-down cycles. Tagged `@pytest.mark.e2e`.

make test runs only the unit layers — never Docker. CI keeps the same separation: lint and test jobs are Docker-free; the stack job runs make up && make test-integration && make down.

Adding a new endpoint

Add the route to apps/<name>/app/main.py (Phase 1) — later it'll move to apps/<name>/app/api/... (Phase 3+).
Write the unit test under apps/<name>/tests/test_<feature>.py using fastapi.testclient.TestClient.

Run that app's tests:

PYTHONPATH=apps/<name> pdm run pytest apps/<name>/tests

If the endpoint is observed from outside the stack, add an integration test under tests/integration/.
Run make lint && make test && make test-integration before commit.

Database schema and migrations

The Target SUT schema lives in apps/target-healthcare-api/migrations/. Files are NNNN_<description>.sql in sequence — 0001_init.sql lands the seven entities and the baseline reference data (procedure / diagnosis codes).

How migrations are applied

On a fresh Postgres volume: the compose file bind-mounts migrations/ into /docker-entrypoint-initdb.d/ on the postgres container. Files in that directory are executed once when the data directory is first initialized.
On an existing volume: make migrate runs every .sql file in the directory against the running Postgres via psql. The Phase 2 migration uses CREATE TABLE IF NOT EXISTS and INSERT ... ON CONFLICT DO NOTHING so it is idempotent — re-applying does nothing.
After schema changes (during dev): the simplest path is make down-clean && make up. This wipes the Postgres volume so the entrypoint re-runs from scratch. make down-clean is destructive — only run it when you intend to throw away local data.

Schema overview (Phase 2)

Seven entities per BRD §9:

plan, provider, member, eligibility, claim — mutable, every row carries test_run_id NOT NULL with an index (DR-001 — required for the reset_run strategy in Phase 5).
procedure_code, diagnosis_code — reference tables; test_run_id is nullable so baseline rows (shared across all runs) coexist with per-run "invalid" codes that drive denial scenarios.

NFR-010 markers are enforced at two levels:

Pydantic models (fast feedback, returns 422 before DB round-trip).
DB CHECK constraints (defense in depth, catches inserts that bypass the agent): member.first_name/last_name must start with FAKE_; member.address_state must equal ZZ.

Inspecting the schema

docker exec atdm_postgres psql -U atdm -d target_healthcare -c "\dt"
docker exec atdm_postgres psql -U atdm -d target_healthcare -c "\d+ member"

Cold-start benchmark (NFR-001)

To measure the stack cold-start budget (NFR-001: ≤ 60 seconds p95 over 5 runs):

./scripts/measure_cold_start.sh 5

Sample output on a reference Mac (M2, 16 GB):

  run 1: 18s
  run 2: 18s
  run 3: 18s
  run 4: 17s
  run 5: 17s
  p50 (median): 18s
  p95 (max):    18s
  NFR-001 budget: 60s p95
  status: PASS

If p95 exceeds 60s on your hardware, investigate before scaling up the stack.

Intent-to-data request (Phase 3 — headline endpoint)

The ATDM agent now serves the headline endpoint. End-to-end example against the local stack:

# Request a scenario. The Authorization header is required for all mutations.
RESP=$(curl -fsS -X POST http://localhost:18001/test-data/requests \
  -H 'authorization: Bearer dev-token-change-me' \
  -H 'content-type: application/json' \
  -d '{"domain":"healthcare","scenario":"active_member_clean","constraints":{},
       "delivery":{"seed_target":true}}')
echo "$RESP" | jq .

RUN=$(echo "$RESP" | jq -r .test_run_id)
TOKEN=$(echo "$RESP" | jq -r .cleanup.cleanup_token)

# Inspect the audit trail.
curl -fsS "http://localhost:18001/audit/runs/$RUN" | jq '.events[].action'

# Reset.
curl -fsS -X POST "http://localhost:18001/test-data/runs/$RUN/reset" \
  -H 'authorization: Bearer dev-token-change-me' \
  -H 'content-type: application/json' \
  -d "{\"cleanup_token\":\"$TOKEN\"}" | jq .

Adding a new scenario

Drop a YAML file in apps/test-data-agent/app/scenarios/. Required keys: scenario_id, generators (ordered list of generator function names), validators (list of dotted validator names), linked_requirement_ids (FR-044; may be empty but the field is mandatory). Optional: default_constraints (a dict merged BEFORE the caller's constraints, so callers can override).
If the scenario needs new generators, add them under apps/test-data-agent/app/generators/ and register them in the seeder's _GENERATORS map.
The scenario registry loads at agent startup — docker compose restart test-data-agent for changes to take effect.

Available scenarios (Phase 4)

scenario_id	What it builds
`active_member_clean`	Active member, in-network provider, paid claim within window.
`claim_denial_active_member`	Active member, out-of-network provider, denied claim with invalid procedure code. Exercises all 4 validators.
`expired_eligibility`	Member with eligibility window that ended last year; recent claim denied.
`out_of_network_pending_claim`	Active member, out-of-network provider, claim pending review.
`inactive_member_with_history`	Inactive member with historical paid claim (soft-delete semantics).

Audit UI (Phase 8)

Every scenario request produces a Parquet audit trail at s3://atdm-audit/runs/<run_id>.parquet. Two read surfaces:

JSON: GET /audit/runs/<run_id> — for programs.
HTML: GET /ui/audit/<run_id> — for humans. Server-rendered (no JavaScript build step), styled with Pico.css from a CDN. Page is ≤ 100 KB.

Open in a browser after the demo:

open http://localhost:18001/ui/audit/<run_id>

The page shows: a banner with run_id/invoker/status, the planner steps, each validator's decision, the records created, fixtures emitted (paths), the reset status badge, and a chronological event timeline.

A worked example of the underlying JSON (plus the rejected-plan variant and a Phase 2 LLM placeholder) lives in design-decisions.md.

Architecture fitness tests (Phase 8 — gates CI)

Three tests live under tests/architecture/:

Test	Asserts
`test_no_sql_from_agents.py`	AR-003 — agent code never imports psycopg / asyncpg / sqlalchemy / sqlmodel, and never contains SQL-shaped literals.
`test_audit_log_append_only.py`	NFR-011 — no DELETE/PUT/PATCH route under `/audit/*`, no `delete_event` / `update_event` style functions.
`test_no_emoji.py`	NFR-012 — no emoji code points in any committed source / Markdown / YAML / TOML / HTML / TypeScript file.

Run locally via make test (architecture suite is part of test-unit). The CI architecture job fails the build on any violation.

Audit metrics (Phase 8)

GET /metrics emits Prometheus text exposition including:

atdm_up{} — the heartbeat (Phase 1).
atdm_audit_events_total{action, status} — one increment per audit append.
atdm_audit_write_latency_seconds{} — histogram per audit append.
atdm_audit_dropped_events_total — must remain 0; non-zero means an audit write failed durability and the integration test fails.

CLI and pytest plugin (Phase 7)

The atdm CLI is installed automatically via the editable atdm-client package. The atdm.pytest plugin auto-loads via the pytest11 entry point.

# List loaded scenarios
atdm scenarios

# Request a scenario (constraints repeatable)
atdm request claim_denial_active_member \
  -c provider_network=out_of_network -c claim_status=denied

# Look up the audit trail by run_id
atdm audit 01KS3J3H4D4AFCQD98SEPJCPBP

# Reset a specific run
atdm reset 01KS3J3H4D4AFCQD98SEPJCPBP --token <cleanup_token>

# Baseline snapshot / restore / list
atdm baseline-snapshot --baseline-id golden-2026-05-20
atdm baseline-restore --baseline-id golden-2026-05-20
atdm baseline-list

# Destructive — clear every test_run_id-tagged row across all tables
atdm reset-all --confirm

# All commands take a global --output/-o (human | json)
atdm -o json scenarios | jq '.scenarios[].scenario_id'

The CLI reads ATDM_API_URL and ATDM_API_TOKEN from the environment.

Using `@atdm_scenario` in pytest

from atdm.pytest import atdm_scenario

@atdm_scenario("active_member_clean")
def test_member(atdm_data):
    # Data is seeded BEFORE this runs.
    assert atdm_data["data"]["member_id"].startswith("m-")
    # No cleanup needed — the fixture's teardown calls /reset for you.

@atdm_scenario("claim_denial_active_member", constraints={"provider_network": "out_of_network"})
def test_denied_claim(atdm_data):
    assert atdm_data["data"]["claim_id"].startswith("claim-")

Or use AtdmClient directly:

from atdm.client import AtdmClient

client = AtdmClient()
response = client.request_scenario("active_member_clean")
try:
    ...
finally:
    client.reset_run(response["test_run_id"], response["cleanup"]["cleanup_token"])

Example Playwright test

# One-time
make playwright-install

# Generate a fixture, then run Playwright against it
atdm request active_member_clean --playwright
# (note the file path in the response; assume active_member_clean_<RUN>.json)
FIXTURE_PATH=automation/fixtures/active_member_clean_<RUN>.json make playwright-test

Fixture delivery (Phase 6)

A scenario request can optionally emit a Playwright JSON fixture and/or a pytest Python module fixture for the test runner to consume.

curl -fsS -X POST http://localhost:18001/test-data/requests \
  -H 'authorization: Bearer dev-token-change-me' \
  -H 'content-type: application/json' \
  -d '{"scenario":"claim_denial_active_member",
       "delivery":{"return_playwright_fixture":true,"return_pytest_fixture":true}}'

The response's fixtures.playwright and fixtures.pytest carry the absolute container-side paths. On the host, the files appear under automation/fixtures/<scenario>_<run_id>.{json,py} via the bind-mount.

Playwright JSON shape: {scenario_id, test_run_id, data, cleanup}. A Playwright test can JSON.parse(fs.readFileSync(...)) and use the keys directly.
pytest module: exposes SCENARIO_ID, TEST_RUN_ID, and a scenario_data() -> dict function. Import via importlib.util from a pytest test, or (Phase 7) use the atdm.pytest decorator.

Fixture writes emit a fixtures_emitted audit event with the paths written.

Reset strategies (Phase 5)

Five strategies, each demoable from the ATDM agent's HTTP API.

Strategy	Endpoint	When to use
`reset_run`	`POST /test-data/runs/{run_id}/reset`	Per-scenario cleanup. Token-gated. Atomic FK-safe delete of just one run's records.
`reset_all`	`POST /test-data/reset/all`	Wipe every test_run_id-tagged row. Requires `X-Confirm: yes` header. Baseline reference rows preserved.
`baseline_snapshot`	`POST /test-data/baseline/snapshot`	Capture the current state of every mutable + reference table to Parquet in MinIO. Body optionally names the baseline.
`baseline_restore`	`POST /test-data/baseline/restore`	TRUNCATE everything and replay a named baseline (or the latest). Idempotent — re-running yields the same state.
`idempotent_seed`	—	Property of `baseline_restore`, not an endpoint.

# Snapshot the current state
curl -fsS -X POST http://localhost:18001/test-data/baseline/snapshot \
  -H 'authorization: Bearer dev-token-change-me' \
  -H 'content-type: application/json' \
  -d '{"baseline_id":"golden-2026-05-20"}'

# List baselines
curl -fsS http://localhost:18001/test-data/baseline/list | jq .

# Restore (latest if baseline_id omitted)
curl -fsS -X POST http://localhost:18001/test-data/baseline/restore \
  -H 'authorization: Bearer dev-token-change-me' \
  -H 'content-type: application/json' \
  -d '{"baseline_id":"golden-2026-05-20"}'

# Nuke every test run (preserves reference data)
curl -fsS -X POST http://localhost:18001/test-data/reset/all \
  -H 'authorization: Bearer dev-token-change-me' \
  -H 'X-Confirm: yes'

Each strategy invocation emits its own audit trail under a synthetic strategy-{name}-{ULID} run_id, with *_started, *_completed, and on failure *_failed events. Look it up via GET /audit/runs/strategy-....

Validators (Phase 4)

Name	Rule
`relational.eligibility_status_matches_member`	inactive member ⇒ eligibility can't be active.
`relational.claim_references_existing_member`	claim.member_id must match bundle's member.member_id.
`domain.denial_requires_invalid_code`	denied claim must reference an invalid code OR carry a denial_reason.
`temporal.eligibility_window_contains_claim`	denied/pending claims must fall within eligibility window unless eligibility is expired.

Planner modes

ATDM_PLANNER=rule (default in MVP) — deterministic lookup by scenario_id.
ATDM_PLANNER=llm — Phase 4+; returns 501 LLM_MODE_NOT_ENABLED in MVP.

Catalog and audit storage

The catalog (s3://atdm-catalog/runs/{run_id}.parquet) holds one row per ScenarioRequest with the sha256 hash of its cleanup token. The audit log (s3://atdm-audit/runs/{run_id}.parquet) holds the chronological event trail for that run. Both live in MinIO — browse them at http://localhost:19001 (default creds in .env.example).

Member entity (Phase 2 — internal use)

The Target SUT exposes Member CRUD under /internal/. These routes are consumed by the agent's seeder (Phase 3+), not by humans, but they're documented here for visibility.

# Create a plan first (Member.plan_id is a FK)
docker exec atdm_postgres psql -U atdm -d target_healthcare -c \
  "INSERT INTO plan VALUES ('plan-dev', 'Dev Plan', 'hmo', '2026-01-01', 'dev-run');"

# Create a member via the internal route
curl -fsS -X POST http://localhost:18000/internal/members \
  -H 'content-type: application/json' \
  -d '{"member_id":"m-dev","status":"active","first_name":"FAKE_Alice",
       "last_name":"FAKE_Smith","date_of_birth":"1990-04-12",
       "address":{"line1":"1 FAKE_Way","city":"FAKE_Town","state":"ZZ","zip":"00000"},
       "plan_id":"plan-dev","test_run_id":"dev-run"}'

# Count and delete
docker exec atdm_postgres psql -U atdm -d target_healthcare -tA -c \
  "SELECT COUNT(*) FROM member WHERE test_run_id='dev-run';"
curl -fsS -X DELETE "http://localhost:18000/internal/members?run_id=dev-run"

Browse the full OpenAPI doc at http://localhost:18000/docs.

Troubleshooting

`make up` fails with `bind: address already in use`

Another process is holding one of the host ports (55432, 19000, 19001, 18000, 18001). Find it:

lsof -nP -iTCP:55432 -iTCP:19000 -iTCP:19001 -iTCP:18000 -iTCP:18001 -sTCP:LISTEN

If it's an old ATDM stack still running, make down. If it's something unrelated, stop it or update the port mapping in infra/docker-compose.yml (and the same table in PLAN.md, .env.example, README, and this doc).

Containers exit immediately after `make up`

Check logs:

make logs
# or for a single service:
docker logs atdm_test_data_agent

Common causes:

requirements.txt install failed inside the container (network blip; retry make up).
Code change in app/main.py has a syntax error — check the uvicorn log.

Integration tests can't reach the stack

The session fixture (tests/integration/conftest.py) auto-detects whether the stack is already up. If you want to debug a test against a long-running stack, set:

export ATDM_KEEP_STACK_UP=1
make up
make test-integration

The fixture will skip tear-down, and the warm-start e2e test will skip.

MinIO console is not loading at `http://localhost:19001`

Verify the container is healthy: make ps. Default credentials are in .env.example (MINIO_ROOT_USER, MINIO_ROOT_PASSWORD).

Tables are missing or schema looks wrong after schema changes

Postgres's docker-entrypoint-initdb.d/ runs only when the data directory is empty. If you've already brought up the stack and then changed the migration files, the changes won't apply. Two ways forward:

make down-clean && make up — wipes the postgres volume and lets the entrypoint re-run from scratch. Destructive.
make migrate — runs every .sql file against the live Postgres via psql. Works if the migration is idempotent (the Phase 2 one is).

`make lint` fails with `Duplicate module named "app"`

You probably tried to lint or test across both apps/ packages in a single mypy/pytest invocation. The Makefile is set up to invoke per source root specifically to avoid this. See PLAN.md Phase 0 Known pitfall for the full story.

Conventions

No emojis in source files, logs, or UI (per project CLAUDE.md).
pdm only. No pip install outside container requirements.txt files.
All shared system tools (Postgres, MinIO) run in Docker locally.
Maintain CHANGELOG.md, FEATURES.md, TODO.md with every merged change.
When you encounter a surprise, write the resolution into the appropriate doc (PLAN.md "Known pitfall", CHANGELOG "Lessons learned", or here) so the next replay doesn't burn time on it again.

FilesExpand file tree

development.md

Latest commit

History