This guide is for engineers working on ATDM. End-user docs are in the top-level README.md.
| Tool | Version |
|---|---|
| Python | 3.12 |
| pdm | 2.26+ |
| Docker Engine | 24+ |
| Docker Compose | v2 (docker compose ..., not legacy docker-compose) |
| GNU Make | any |
Verify with:
python3.12 --version
pdm --version
docker --version
docker compose version| Tool | Version | Used by |
|---|---|---|
| Node.js | 20+ | the example Playwright test under automation/playwright/ and the make audit-screenshot PNG capture |
| asciinema | latest | make demo-cast — captures docs/assets/demo.cast for the README. Without it, the demo still runs; you just can't regenerate the recording. |
Install on macOS:
brew install node asciinemagit clone git@github.com:NickBaynham/agentic-test-data-manager.git
cd agentic-test-data-manager
make setup # pdm install
cp .env.example .env.local # then edit if you want non-default secrets# Bring up the local stack (Postgres, MinIO, Target SUT stub, ATDM agent stub)
make up
# See what's running
make ps
# Tail container logs
make logs
# Run unit tests (no Docker)
make test # or: make test-unit
# Run integration + e2e tests (uses the running stack)
make test-integration
# Lint and type-check
make lint
# Tear down (volumes preserved)
make down
# Tear down AND DELETE VOLUMES (destructive — wipes Postgres + MinIO data)
make down-cleanThe stack uses non-default host ports to avoid colliding with other local services. Container-internal ports are conventional; only the host side is remapped. The full table lives in planning/PLAN.md Phase 1.
| Service | Host port | URL |
|---|---|---|
| Postgres | 55432 |
postgres://atdm@localhost:55432/target_healthcare |
| MinIO API | 19000 |
http://localhost:19000 |
| MinIO Console | 19001 |
http://localhost:19001 (browser; default creds in .env.example) |
| Target SUT | 18000 |
http://localhost:18000/health, http://localhost:18000/docs |
| ATDM agent | 18001 |
http://localhost:18001/health, http://localhost:18001/metrics, http://localhost:18001/docs |
Inside the compose network, services address each other by service name +
container port (e.g., postgres:5432, minio:9000). Host port mapping
only matters for connections from the developer's laptop.
apps/
test-data-agent/
app/ FastAPI agent — scenario requests, validators, seeders (Phase 3+)
python/atdm/ Installable client library (Phase 7)
tests/ Per-app unit tests
requirements.txt Container-side runtime deps
target-healthcare-api/
app/ FastAPI SUT — synthetic healthcare entities
migrations/ SQL migrations (Phase 2)
tests/ Per-app unit tests
requirements.txt Container-side runtime deps
automation/
playwright/ Playwright tests (Phase 7)
pytest-api/ pytest-API tests (Phase 7)
fixtures/ Generated fixtures (gitignored)
data/
catalogs/, parquet/, seed/ (Phase 3+; gitignored except .gitkeep)
docs/ This guide and others
infra/
docker-compose.yml The local stack
planning/PLAN.md Phase-by-phase build plan
requirements/ BRD, engineering handoff, original concept
scripts/ Operational scripts (e.g., cold-start benchmark)
tests/ Root-level tests; tests/integration is the stack suite
Tests are split into three layers and must be run separately per source root
(see PLAN.md Phase 0 Known pitfall — the duplicate-app
mypy/pytest trap).
| Layer | How to run | What it covers |
|---|---|---|
| Unit (top-level) | pdm run pytest tests -m "not integration and not e2e" |
Repo-level utilities and architecture tests. |
| Unit (per-app) | PYTHONPATH=apps/<name> pdm run pytest apps/<name>/tests |
FastAPI stubs and (later) generators, validators, seeders. |
| Integration | make test-integration |
Tests that hit the running compose stack. Tagged @pytest.mark.integration. |
| E2E | make test-integration (subset by -m e2e) |
Full make-up → assert → make-down cycles. Tagged @pytest.mark.e2e. |
make test runs only the unit layers — never Docker. CI keeps the same
separation: lint and test jobs are Docker-free; the stack job runs
make up && make test-integration && make down.
- Add the route to
apps/<name>/app/main.py(Phase 1) — later it'll move toapps/<name>/app/api/...(Phase 3+). - Write the unit test under
apps/<name>/tests/test_<feature>.pyusingfastapi.testclient.TestClient. - Run that app's tests:
PYTHONPATH=apps/<name> pdm run pytest apps/<name>/tests
- If the endpoint is observed from outside the stack, add an integration
test under
tests/integration/. - Run
make lint && make test && make test-integrationbefore commit.
The Target SUT schema lives in apps/target-healthcare-api/migrations/. Files
are NNNN_<description>.sql in sequence — 0001_init.sql lands the seven
entities and the baseline reference data (procedure / diagnosis codes).
- On a fresh Postgres volume: the compose file bind-mounts
migrations/into/docker-entrypoint-initdb.d/on the postgres container. Files in that directory are executed once when the data directory is first initialized. - On an existing volume:
make migrateruns every.sqlfile in the directory against the running Postgres viapsql. The Phase 2 migration usesCREATE TABLE IF NOT EXISTSandINSERT ... ON CONFLICT DO NOTHINGso it is idempotent — re-applying does nothing. - After schema changes (during dev): the simplest path is
make down-clean && make up. This wipes the Postgres volume so the entrypoint re-runs from scratch.make down-cleanis destructive — only run it when you intend to throw away local data.
Seven entities per BRD §9:
plan,provider,member,eligibility,claim— mutable, every row carriestest_run_id NOT NULLwith an index (DR-001 — required for thereset_runstrategy in Phase 5).procedure_code,diagnosis_code— reference tables;test_run_idis nullable so baseline rows (shared across all runs) coexist with per-run "invalid" codes that drive denial scenarios.
NFR-010 markers are enforced at two levels:
- Pydantic models (fast feedback, returns 422 before DB round-trip).
- DB CHECK constraints (defense in depth, catches inserts that bypass
the agent):
member.first_name/last_namemust start withFAKE_;member.address_statemust equalZZ.
docker exec atdm_postgres psql -U atdm -d target_healthcare -c "\dt"
docker exec atdm_postgres psql -U atdm -d target_healthcare -c "\d+ member"To measure the stack cold-start budget (NFR-001: ≤ 60 seconds p95 over 5 runs):
./scripts/measure_cold_start.sh 5Sample output on a reference Mac (M2, 16 GB):
run 1: 18s
run 2: 18s
run 3: 18s
run 4: 17s
run 5: 17s
p50 (median): 18s
p95 (max): 18s
NFR-001 budget: 60s p95
status: PASS
If p95 exceeds 60s on your hardware, investigate before scaling up the stack.
The ATDM agent now serves the headline endpoint. End-to-end example against the local stack:
# Request a scenario. The Authorization header is required for all mutations.
RESP=$(curl -fsS -X POST http://localhost:18001/test-data/requests \
-H 'authorization: Bearer dev-token-change-me' \
-H 'content-type: application/json' \
-d '{"domain":"healthcare","scenario":"active_member_clean","constraints":{},
"delivery":{"seed_target":true}}')
echo "$RESP" | jq .
RUN=$(echo "$RESP" | jq -r .test_run_id)
TOKEN=$(echo "$RESP" | jq -r .cleanup.cleanup_token)
# Inspect the audit trail.
curl -fsS "http://localhost:18001/audit/runs/$RUN" | jq '.events[].action'
# Reset.
curl -fsS -X POST "http://localhost:18001/test-data/runs/$RUN/reset" \
-H 'authorization: Bearer dev-token-change-me' \
-H 'content-type: application/json' \
-d "{\"cleanup_token\":\"$TOKEN\"}" | jq .- Drop a YAML file in
apps/test-data-agent/app/scenarios/. Required keys:scenario_id,generators(ordered list of generator function names),validators(list of dotted validator names),linked_requirement_ids(FR-044; may be empty but the field is mandatory). Optional:default_constraints(a dict merged BEFORE the caller's constraints, so callers can override). - If the scenario needs new generators, add them under
apps/test-data-agent/app/generators/and register them in the seeder's_GENERATORSmap. - The scenario registry loads at agent startup —
docker compose restart test-data-agentfor changes to take effect.
| scenario_id | What it builds |
|---|---|
active_member_clean |
Active member, in-network provider, paid claim within window. |
claim_denial_active_member |
Active member, out-of-network provider, denied claim with invalid procedure code. Exercises all 4 validators. |
expired_eligibility |
Member with eligibility window that ended last year; recent claim denied. |
out_of_network_pending_claim |
Active member, out-of-network provider, claim pending review. |
inactive_member_with_history |
Inactive member with historical paid claim (soft-delete semantics). |
Every scenario request produces a Parquet audit trail at
s3://atdm-audit/runs/<run_id>.parquet. Two read surfaces:
- JSON:
GET /audit/runs/<run_id>— for programs. - HTML:
GET /ui/audit/<run_id>— for humans. Server-rendered (no JavaScript build step), styled with Pico.css from a CDN. Page is ≤ 100 KB.
Open in a browser after the demo:
open http://localhost:18001/ui/audit/<run_id>
The page shows: a banner with run_id/invoker/status, the planner steps, each validator's decision, the records created, fixtures emitted (paths), the reset status badge, and a chronological event timeline.
A worked example of the underlying JSON (plus the rejected-plan variant and a Phase 2 LLM placeholder) lives in design-decisions.md.
Three tests live under tests/architecture/:
| Test | Asserts |
|---|---|
test_no_sql_from_agents.py |
AR-003 — agent code never imports psycopg / asyncpg / sqlalchemy / sqlmodel, and never contains SQL-shaped literals. |
test_audit_log_append_only.py |
NFR-011 — no DELETE/PUT/PATCH route under /audit/*, no delete_event / update_event style functions. |
test_no_emoji.py |
NFR-012 — no emoji code points in any committed source / Markdown / YAML / TOML / HTML / TypeScript file. |
Run locally via make test (architecture suite is part of test-unit).
The CI architecture job fails the build on any violation.
GET /metrics emits Prometheus text exposition including:
atdm_up{}— the heartbeat (Phase 1).atdm_audit_events_total{action, status}— one increment per audit append.atdm_audit_write_latency_seconds{}— histogram per audit append.atdm_audit_dropped_events_total— must remain 0; non-zero means an audit write failed durability and the integration test fails.
The atdm CLI is installed automatically via the editable atdm-client
package. The atdm.pytest plugin auto-loads via the pytest11 entry point.
# List loaded scenarios
atdm scenarios
# Request a scenario (constraints repeatable)
atdm request claim_denial_active_member \
-c provider_network=out_of_network -c claim_status=denied
# Look up the audit trail by run_id
atdm audit 01KS3J3H4D4AFCQD98SEPJCPBP
# Reset a specific run
atdm reset 01KS3J3H4D4AFCQD98SEPJCPBP --token <cleanup_token>
# Baseline snapshot / restore / list
atdm baseline-snapshot --baseline-id golden-2026-05-20
atdm baseline-restore --baseline-id golden-2026-05-20
atdm baseline-list
# Destructive — clear every test_run_id-tagged row across all tables
atdm reset-all --confirm
# All commands take a global --output/-o (human | json)
atdm -o json scenarios | jq '.scenarios[].scenario_id'The CLI reads ATDM_API_URL and ATDM_API_TOKEN from the environment.
from atdm.pytest import atdm_scenario
@atdm_scenario("active_member_clean")
def test_member(atdm_data):
# Data is seeded BEFORE this runs.
assert atdm_data["data"]["member_id"].startswith("m-")
# No cleanup needed — the fixture's teardown calls /reset for you.
@atdm_scenario("claim_denial_active_member", constraints={"provider_network": "out_of_network"})
def test_denied_claim(atdm_data):
assert atdm_data["data"]["claim_id"].startswith("claim-")Or use AtdmClient directly:
from atdm.client import AtdmClient
client = AtdmClient()
response = client.request_scenario("active_member_clean")
try:
...
finally:
client.reset_run(response["test_run_id"], response["cleanup"]["cleanup_token"])# One-time
make playwright-install
# Generate a fixture, then run Playwright against it
atdm request active_member_clean --playwright
# (note the file path in the response; assume active_member_clean_<RUN>.json)
FIXTURE_PATH=automation/fixtures/active_member_clean_<RUN>.json make playwright-testA scenario request can optionally emit a Playwright JSON fixture and/or a pytest Python module fixture for the test runner to consume.
curl -fsS -X POST http://localhost:18001/test-data/requests \
-H 'authorization: Bearer dev-token-change-me' \
-H 'content-type: application/json' \
-d '{"scenario":"claim_denial_active_member",
"delivery":{"return_playwright_fixture":true,"return_pytest_fixture":true}}'The response's fixtures.playwright and fixtures.pytest carry the absolute
container-side paths. On the host, the files appear under
automation/fixtures/<scenario>_<run_id>.{json,py} via the bind-mount.
- Playwright JSON shape:
{scenario_id, test_run_id, data, cleanup}. A Playwright test canJSON.parse(fs.readFileSync(...))and use the keys directly. - pytest module: exposes
SCENARIO_ID,TEST_RUN_ID, and ascenario_data() -> dictfunction. Import viaimportlib.utilfrom a pytest test, or (Phase 7) use theatdm.pytestdecorator.
Fixture writes emit a fixtures_emitted audit event with the paths written.
Five strategies, each demoable from the ATDM agent's HTTP API.
| Strategy | Endpoint | When to use |
|---|---|---|
reset_run |
POST /test-data/runs/{run_id}/reset |
Per-scenario cleanup. Token-gated. Atomic FK-safe delete of just one run's records. |
reset_all |
POST /test-data/reset/all |
Wipe every test_run_id-tagged row. Requires X-Confirm: yes header. Baseline reference rows preserved. |
baseline_snapshot |
POST /test-data/baseline/snapshot |
Capture the current state of every mutable + reference table to Parquet in MinIO. Body optionally names the baseline. |
baseline_restore |
POST /test-data/baseline/restore |
TRUNCATE everything and replay a named baseline (or the latest). Idempotent — re-running yields the same state. |
idempotent_seed |
— | Property of baseline_restore, not an endpoint. |
# Snapshot the current state
curl -fsS -X POST http://localhost:18001/test-data/baseline/snapshot \
-H 'authorization: Bearer dev-token-change-me' \
-H 'content-type: application/json' \
-d '{"baseline_id":"golden-2026-05-20"}'
# List baselines
curl -fsS http://localhost:18001/test-data/baseline/list | jq .
# Restore (latest if baseline_id omitted)
curl -fsS -X POST http://localhost:18001/test-data/baseline/restore \
-H 'authorization: Bearer dev-token-change-me' \
-H 'content-type: application/json' \
-d '{"baseline_id":"golden-2026-05-20"}'
# Nuke every test run (preserves reference data)
curl -fsS -X POST http://localhost:18001/test-data/reset/all \
-H 'authorization: Bearer dev-token-change-me' \
-H 'X-Confirm: yes'Each strategy invocation emits its own audit trail under a synthetic
strategy-{name}-{ULID} run_id, with *_started, *_completed, and
on failure *_failed events. Look it up via
GET /audit/runs/strategy-....
| Name | Rule |
|---|---|
relational.eligibility_status_matches_member |
inactive member ⇒ eligibility can't be active. |
relational.claim_references_existing_member |
claim.member_id must match bundle's member.member_id. |
domain.denial_requires_invalid_code |
denied claim must reference an invalid code OR carry a denial_reason. |
temporal.eligibility_window_contains_claim |
denied/pending claims must fall within eligibility window unless eligibility is expired. |
ATDM_PLANNER=rule(default in MVP) — deterministic lookup by scenario_id.ATDM_PLANNER=llm— Phase 4+; returns 501LLM_MODE_NOT_ENABLEDin MVP.
The catalog (s3://atdm-catalog/runs/{run_id}.parquet) holds one row per
ScenarioRequest with the sha256 hash of its cleanup token. The audit log
(s3://atdm-audit/runs/{run_id}.parquet) holds the chronological event trail
for that run. Both live in MinIO — browse them at
http://localhost:19001 (default creds in .env.example).
The Target SUT exposes Member CRUD under /internal/. These routes are
consumed by the agent's seeder (Phase 3+), not by humans, but they're
documented here for visibility.
# Create a plan first (Member.plan_id is a FK)
docker exec atdm_postgres psql -U atdm -d target_healthcare -c \
"INSERT INTO plan VALUES ('plan-dev', 'Dev Plan', 'hmo', '2026-01-01', 'dev-run');"
# Create a member via the internal route
curl -fsS -X POST http://localhost:18000/internal/members \
-H 'content-type: application/json' \
-d '{"member_id":"m-dev","status":"active","first_name":"FAKE_Alice",
"last_name":"FAKE_Smith","date_of_birth":"1990-04-12",
"address":{"line1":"1 FAKE_Way","city":"FAKE_Town","state":"ZZ","zip":"00000"},
"plan_id":"plan-dev","test_run_id":"dev-run"}'
# Count and delete
docker exec atdm_postgres psql -U atdm -d target_healthcare -tA -c \
"SELECT COUNT(*) FROM member WHERE test_run_id='dev-run';"
curl -fsS -X DELETE "http://localhost:18000/internal/members?run_id=dev-run"Browse the full OpenAPI doc at http://localhost:18000/docs.
Another process is holding one of the host ports (55432, 19000, 19001,
18000, 18001). Find it:
lsof -nP -iTCP:55432 -iTCP:19000 -iTCP:19001 -iTCP:18000 -iTCP:18001 -sTCP:LISTENIf it's an old ATDM stack still running, make down. If it's something
unrelated, stop it or update the port mapping in
infra/docker-compose.yml (and the same table in PLAN.md, .env.example,
README, and this doc).
Check logs:
make logs
# or for a single service:
docker logs atdm_test_data_agentCommon causes:
requirements.txtinstall failed inside the container (network blip; retrymake up).- Code change in
app/main.pyhas a syntax error — check the uvicorn log.
The session fixture (tests/integration/conftest.py) auto-detects whether the
stack is already up. If you want to debug a test against a long-running stack,
set:
export ATDM_KEEP_STACK_UP=1
make up
make test-integrationThe fixture will skip tear-down, and the warm-start e2e test will skip.
Verify the container is healthy: make ps. Default credentials are in
.env.example (MINIO_ROOT_USER, MINIO_ROOT_PASSWORD).
Postgres's docker-entrypoint-initdb.d/ runs only when the data directory
is empty. If you've already brought up the stack and then changed the
migration files, the changes won't apply. Two ways forward:
make down-clean && make up— wipes the postgres volume and lets the entrypoint re-run from scratch. Destructive.make migrate— runs every.sqlfile against the live Postgres viapsql. Works if the migration is idempotent (the Phase 2 one is).
You probably tried to lint or test across both apps/ packages in a single
mypy/pytest invocation. The Makefile is set up to invoke per source root
specifically to avoid this. See
PLAN.md Phase 0 Known pitfall for the full story.
- No emojis in source files, logs, or UI (per project CLAUDE.md).
pdmonly. Nopip installoutside containerrequirements.txtfiles.- All shared system tools (Postgres, MinIO) run in Docker locally.
- Maintain
CHANGELOG.md,FEATURES.md,TODO.mdwith every merged change. - When you encounter a surprise, write the resolution into the appropriate doc (PLAN.md "Known pitfall", CHANGELOG "Lessons learned", or here) so the next replay doesn't burn time on it again.