137 changes: 66 additions & 71 deletions .planning/.continue-here.md
@@ -1,99 +1,94 @@
---
context: default
project: Pinakes
stage: Stage 3 Wave 3 COMPLETE (22 connectors); module renamed to pinakes.sh/pinakes; Wave 4 in flight
status: stage-3-wave-3-complete-module-renamed
head_commit: 9a76029
last_updated: 2026-06-13
stage: Stage 3 — 26 connectors (all codex-clean); module pinakes.sh/pinakes; standalone-release prep (Work Stream A) COMPLETE, audit CLEAN
status: stage-3-26-connectors-standalone-audit-clean
head_commit: 146c54d
last_updated: 2026-06-14
---

# Pinakes — Continue Here (Stage 3 Waves 1-3 + module rename done; Wave 4 in flight)
# Pinakes — Continue Here (26 connectors; standalone-release audit CLEAN)

You are the **core-build agent** for Pinakes (deterministic, verifiable data layer for AI agents;
biology beachhead, arXiv literature LAST). Repo `/Users/tristanfarmer/Projects/operon`, `origin/main`
= `github.com/001TMF/pinakes` (private, founder's PERSONAL account — NO org) @ `9a76029`.
Build env: `export PATH="$PATH:/opt/homebrew/bin"`. **Module path `pinakes.sh/pinakes`** (renamed
from the placeholder github.com/pinakes/pinakes in #47 — imports read `pinakes.sh/pinakes/...`).
ULTRACODE is ON. `make ci` == the gate. A PR touching `engine/`/`idl/`/`schema/` needs the
**`spine-change`** label (frozen-path CI guard). gh authed as 001TMF. codex CLI (gpt-5.5,
ChatGPT-authed) is the cross-model gate.
= `github.com/001TMF/pinakes` (private, founder's PERSONAL account — NO org) @ `146c54d`.
Build env: `export PATH="$PATH:/opt/homebrew/bin"`. **Module path `pinakes.sh/pinakes`** (vanity import
path, resolved via a go-import meta tag → github.com/001TMF/pinakes). ULTRACODE is ON. `make ci` == the
gate. A PR touching `engine/`/`idl/`/`schema/` needs the **`spine-change`** label (frozen-path CI guard).
gh authed as 001TMF. codex CLI (gpt-5.5, ChatGPT-authed) is the cross-model gate.
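
For context on the vanity import path: the `go` tool resolves `pinakes.sh/pinakes` by fetching `https://pinakes.sh/pinakes?go-get=1` and reading a `go-import` meta tag of the form `import-prefix vcs repo-root`. The sketch below only renders such a page; it is not the project's actual meta page, and the helper name `goImportPage` is hypothetical.

```go
package main

import (
	"fmt"
	"html"
)

// goImportPage is a hypothetical helper (not part of the repo). It renders the
// minimal HTML that the go tool looks for when resolving a vanity import path:
// a <meta name="go-import"> tag whose content is "import-prefix vcs repo-root".
func goImportPage(importPrefix, repoURL string) string {
	return fmt.Sprintf(
		"<!DOCTYPE html>\n<html><head>\n<meta name=\"go-import\" content=\"%s git %s\">\n</head><body></body></html>\n",
		html.EscapeString(importPrefix), html.EscapeString(repoURL),
	)
}

func main() {
	// Module path and repo URL are the two strings named in this handoff.
	fmt.Print(goImportPage("pinakes.sh/pinakes", "https://github.com/001TMF/pinakes"))
}
```

Until a page like this is served at the module path, `go install` against the vanity path cannot resolve, while a local `replace` directive is unaffected.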

## Required reading
1. `DECISIONS.md`, `CONTRACTS.md`, `BUILD_ROADMAP.md` (§3 = Stage-3 waves), `RESUME.md`.
2. **`.planning/HANDOFF.json`** — machine state (may be stale on connector count; trust this file + git).
2. **`.planning/HANDOFF.json`** — machine state (may lag on connector count; trust git: it's 26).
3. `engine/contract/doc.go` — the canonical determinism/hash-purity contract (the connector build bar).
4. `RELEASING.md` (repo root) — the founder-gate publish checklist (now incl. the go-import meta-page gate).
5. `.planning/analysis/BLOAT-REPORT.md` — bloat analysis (verdict: lean; cleanup DEFERRED, see below).
4. `RELEASING.md` — the founder-gate publish checklist (incl. the go-import meta-page gate).
5. **`.planning/analysis/STANDALONE-DECOUPLING.md`** — the A1 standalone-release audit (verdict CLEAN).
6. `.planning/analysis/BLOAT-REPORT.md` — bloat analysis (verdict lean; cleanup DEFERRED).

## Baseline
`make ci` GREEN at `9a76029`. **22 connectors** live: alphafold, cbioportal, chembl, clinicaltrials,
clinvar, complexportal*, ensembl, geneontology, gnomad, gtex, gwascatalog*, gtopdb*, hpo, interpro,
jaspar, monarch, ncbidatasets, ncbiprotein, ncbivirus, openfda, opentargets, pdb, pubchem, reactome,
rhea*, uniprot. (* = the 4 Wave-4 connectors are IN FLIGHT in worktree, not yet merged — see below.)
NOTE: pre-existing FLAKY test in engine/governor (TestGovernor_Stress / InteractiveBeatsBatch,
timing-sensitive under -race contention) — if `make ci`/CI fails ONLY on it, re-run; NOT connector-caused.
`make ci` GREEN at `146c54d`. **26 connectors** live, ALL codex-clean: alphafold, cbioportal, chembl,
clinicaltrials, clinvar, complexportal, ensembl, geneontology, gnomad, gtex, gtopdb, gwascatalog, hpo,
interpro, jaspar, monarch, ncbidatasets, ncbiprotein, ncbivirus, openfda, opentargets, pdb, pubchem,
reactome, rhea, uniprot. The engine/governor timing suite is DETERMINISTIC now (synctest, #49) — a
governor failure is a REAL bug, not the old flake.

<completed_work>
- **Stages 0-2b COMPLETE** (engine spine, 3 surfaces MCP/CLI/REST, CI + signed-release pipeline). **Audit #1-#6 CLOSED.**
- **Stage 3 Wave 1** (#33/#35/#39/#40 → 14): openfda, clinicaltrials, cbioportal, ncbidatasets. Bookend fix #44.
- **Stage 3 Wave 2** (#45 → 18): reactome, gene-ontology(QuickGO), hpo, jaspar. The WIRE step made
provider_test.go's source-count assertions STRUCTURAL (`len(specs)==len(registry)` + floor) — so new
connector batches need NO count bump and never conflict on it. codex caught a real false-Complete in
EACH (bracket-the-whole-walk, body-id verify, empty-get-422, 204→UNKNOWN) — all fixed + regression-tested.
- **Stage 3 Wave 3** (#46 → 22): opentargets(GraphQL), gtex, monarch(BioLink), interpro(EBI cursor). THREE
codex rounds (author≠verifier): round1 (trailing-empty-page total, search-filter verify, anchor-id,
empty-get, 204); round2 (gtex dataset-substitution, blank-filter-422, monarch Retrieve-layer empty-ids,
400→422); round3 (gtex+monarch honor plan-snapshot st.Reconcile not live c.reconcile). All fixed + non-vacuous
regression tests. Documented monarch's conservative acc!=id alias limitation (determinism-safe).
- **MODULE RENAME** (#47): github.com/pinakes/pinakes → **pinakes.sh/pinakes** (vanity import path). Repo-URL
refs (install.sh PINAKES_REPO, README badges, RELEASING cosign-identity) → github.com/001TMF/pinakes
(cosign identity follows where release.yml RUNS, independent of module path). Stale pinakes.ai → pinakes.sh
in docs. RELEASING gained the go-import meta-page publish gate.
- **Bloat analysis** (`.planning/analysis/BLOAT-REPORT.md`): codebase is LEAN (~700 removable Go lines +
~1-1.3K doc lines). User chose **DEFER ALL** cleanup (see remaining_work).
- **Stages 0-2b COMPLETE** (engine spine, 3 surfaces MCP/CLI/REST, CI + signed-release pipeline). Audit #1-#6 CLOSED.
- **Stage 3 Waves 1-4 MERGED** (14→18→22→26): wave1 openfda/clinicaltrials/cbioportal/ncbidatasets; wave2
reactome/geneontology/hpo/jaspar; wave3 opentargets/gtex/monarch/interpro; wave4 gwascatalog/complexportal/
gtopdb/rhea. Each connector survived multi-round codex cross-model determinism audits + non-vacuous regression tests.
- **MODULE RENAME** (#47): → `pinakes.sh/pinakes`. Repo-URL refs (install.sh, README, cosign identity) →
github.com/001TMF/pinakes (cosign identity follows where release.yml RUNS, independent of module path).
- **Governor de-flake** (#49, spine-change): synctest; production .go byte-for-byte unchanged; 0 flakes / ~2900 -race iters.
- **CODEX ROUND-4 DISCHARGED** (#51): the deferred Wave-4 final re-verify ran when quota reset — gtopdb NO DEFECTS;
complexportal HIGH found (two untrusted full-body decodes lacked an EOF check → false Complete / hash impurity) and
FIXED with `requireEOF` (mirrors gtopdb) + 2 non-vacuous regression tests (proven by revert).
- **STANDALONE-RELEASE PREP — Work Stream A COMPLETE** (the user's objective: ship the engine standalone, add cloud
"as it grows"):
- **A1 decoupling audit → CLEAN** (`.planning/analysis/STANDALONE-DECOUPLING.md`): engine has ZERO cloud dependency;
build closure = stdlib + x/text + yaml + internal only; snapshot.Backend defaults to local FSBackend (cloud seam
NewStoreWithBackend inert); literature/arXiv ABSENT; no telemetry/phone-home; serve loopback-enforced; jobs in-process;
cold-start with scrubbed env works. NO decoupling code change needed.
- **A3 docs** (#52): README "Local state" section (default `<UserCacheDir>/pinakes` + `PINAKES_STATE_ROOT`). The
PINAKES_API_KEY-is-future-tier nit was already covered (README.md:225). README is already standalone-first.
</completed_work>

<remaining_work>
- **GATE + MERGE Wave 4** (workflow `wrwkwkk18`, worktree `../pinakes-wt-wave4`, branch feat/wave4-batch):
building gwascatalog, complexportal, gtopdb (CC-BY-SA copyleft→served-live), rhea with a Wave-3-HARDENED
build bar (mandatory guards a-f baked in: trailing-empty-page total, search-filter verify, body-id/composite
verify, non-200-never-stored + 204→UNKNOWN, empty-get-422 at Plan AND Retrieve 3-way dispatch, plan-snapshot
reconcile). When done: review → per-connector codex gate (KEEP IT — it has caught a real bug in nearly every
connector) → fix any findings + regression tests → make ci → ONE PR (no spine-change label, connectors only) → merge.
- **KEEP BREADTH GOING** (user directive: "make it quite large in a few workflows"): after Wave 4 merges, launch
Wave 5 off the updated main (reuse the wave4 workflow script, swap connectors). Roadmap remainder: STRING, Human
Protein Atlas (CC-BY-SA), MSigDB, Expression Atlas, IntAct, ChEBI, Bgee, GWAS-adjacent, Pharos, etc. (DepMap has
no clean REST — skip). Non-pathogen → fine in workflows; pathogen-DATA capture → main loop/codex (AUP).
- **DEFERRED bloat cleanup** (user: DEFER ALL): do the 6-non-hash-helper extraction (readBody/backoff/jsonNumberOrNil/
toAnySlice/httpDoer/emptyToNil → connectors/internal/, ~700 lines) as ONE dedicated pass AFTER breadth settles
(it touches all connectors; spine-adjacent; gate behind golden-hash tests; do NOT touch the canonicalizer trio —
deliberately connector-local for hash isolation). Docs cleanup (vestigial Operon_*/RECONCILIATION) = pre-public prep (parked).
- **INGEST-CLOUD COORDINATION**: the other agent (../provenir-ingest, branch feat/ingest-cloud) imports the engine.
After the rename they must update go.mod require/replace LHS → pinakes.sh/pinakes + sweep imports (seam types
snapshot.Backend / literature.PaperRecord UNCHANGED). Awaiting their reply on local-replace vs pinned consumption.
- **MCP-prompt client-neutral key affordance** (tracked): add MCP `prompts` capability + a prompt that GUIDES the
user to run `pinakes config set-key` (returns instructions only). Spine-adjacent → spine-change label.
- **FOUNDER GATES (PARKED — public can wait per user)**: go-import meta page at pinakes.sh/pinakes (needed for
`go install`; local replace + curl|sh unaffected); make repo public; get.pinakes.sh; npm/PyPI reserve; Homebrew
tap; branch/tag protection; USPTO/EUIPO check. **Stage 6 arXiv/literature = LAST.** Stage 4/5 = separate ../pinakes-cloud.
- **A2 FOUNDER GATES (the user's — launch plumbing, cloud-independent; this is now the critical path to a PUBLIC release)**:
publish the go-import meta page at pinakes.sh/pinakes; make the repo public; stand up get.pinakes.sh; publish the
Homebrew tap + wire the goreleaser `brews:` block; tag **v0.1.0** (fires cosign-signed goreleaser); enable branch/tag
protection. (optional: npm/PyPI reserve, USPTO/EUIPO check.) Do NOT publish without the user.
- **WORK STREAM B — KEEP BREADTH GOING (standing user directive: "make it quite large in a few workflows")**: launch
Wave 5 off main, reusing the wave4 batch script (`.claude/.../workflows/scripts/wave4-connector-batch-wf_c6f2836d-b00.js`
— swap the connector list; it bakes in the HARDENED build bar incl. the requireEOF/EOF rule). Candidates (non-pathogen,
clean public APIs): STRING, Human Protein Atlas (CC-BY-SA → served-live), MSigDB, Expression Atlas, IntAct, ChEBI, Bgee,
Pharos (DepMap has no clean REST — skip). Structural provider count test → parallel batches don't conflict. Non-pathogen
→ fine in workflows; pathogen-DATA capture → main loop/codex (AUP). KEEP the per-connector codex gate; pace codex (quota).
- **DEFERRED bloat cleanup** (user: DEFER ALL): the 6-non-hash-helper extraction (readBody/backoff/jsonNumberOrNil/
toAnySlice/httpDoer/emptyToNil → connectors/internal/, ~700 lines) as ONE post-breadth pass; gate behind golden-hash
tests; do NOT touch the per-connector canonicalizer trio. (.planning/analysis/BLOAT-REPORT.md.)
- **INGEST-CLOUD COORDINATION**: ../provenir-ingest (branch feat/ingest-cloud) imports the engine; after the rename it must
update go.mod require/replace LHS → pinakes.sh/pinakes + sweep imports (seam types snapshot.Backend / literature.PaperRecord
UNCHANGED). Awaiting their reply on local-replace vs pinned consumption.
- **MCP-prompt client-neutral key affordance** (tracked): add MCP `prompts` capability + a prompt that GUIDES the user to
run `pinakes config set-key` (instructions only). Spine-adjacent → spine-change label.
- **Stage 6 arXiv/literature = LAST.** Stage 4/5 cloud = the separate ../pinakes-cloud repo (plugs in behind snapshot.Backend).
</remaining_work>
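
The structural provider count test mentioned above (`len(specs)==len(registry)` plus a floor) is what lets parallel connector batches merge without touching a shared constant. A rough sketch of that idea follows; the function name, the types, and the extra membership loop are assumptions for illustration and do not come from `provider_test.go`.

```go
package main

import "fmt"

// checkSourceCount is a hypothetical stand-in for a structural assertion:
// it compares specs against the registry and a lower bound, and never
// against a hard-coded connector count that each new batch would have to bump.
func checkSourceCount(specs []string, registry map[string]bool, floor int) error {
	if len(specs) != len(registry) {
		return fmt.Errorf("specs (%d) and registry (%d) differ in size", len(specs), len(registry))
	}
	if len(specs) < floor {
		return fmt.Errorf("only %d sources, floor is %d", len(specs), floor)
	}
	// Assumed extra check: every spec must also be registered by name.
	for _, name := range specs {
		if !registry[name] {
			return fmt.Errorf("spec %q missing from registry", name)
		}
	}
	return nil
}

func main() {
	registry := map[string]bool{"uniprot": true, "pdb": true, "rhea": true}
	specs := []string{"uniprot", "pdb", "rhea"}
	if err := checkSourceCount(specs, registry, 2); err != nil {
		fmt.Println("FAIL:", err)
		return
	}
	fmt.Println("ok")
}
```

Adding a connector changes both `specs` and `registry` together, so the check keeps passing without an edit to the test itself.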

## Rhythm (keep it)
design/research → implement via a Workflow (build → 3-lens adversarial Claude review [determinism/contract/test-rigor]
→ fix) → **codex gpt-5.5 cross-model gate** (`printf '%s' "$P" | codex exec -m gpt-5.5 -s read-only`; bounded with a
`( sleep ~300; kill $pid )&` watchdog; non-blocking; catches real determinism/credential bugs the Claude lenses miss —
INVALUABLE, keep on every connector) → fix findings + non-vacuous regression tests (prove by revert) → personal `make ci`
→ PR (`gh pr create --body-file`; +`--label spine-change` for engine/idl/schema) → merge `--merge --delete-branch` on
GREEN CI. Author≠verifier. AUTO-MERGE ON GREEN authorized. The recurring codex classes are now in the Wave-4 build bar.
research → implement via a Workflow (build → 3-lens adversarial Claude review [determinism/contract/test-rigor] → fix) →
**codex gpt-5.5 cross-model gate** (`printf '%s' "$P" | codex exec -m gpt-5.5 -s read-only`; bounded with a
`( sleep ~300; kill $pid )&` watchdog; catches real determinism bugs the Claude lenses miss — it found a HIGH in complexportal
at ROUND 4 that 3 Claude reviews missed; KEEP IT on every connector) → fix + non-vacuous regression tests (prove by revert) →
personal `make ci` → PR (`gh pr create --body-file`; +`--label spine-change` for engine/idl/schema) → merge `--merge
--delete-branch` on GREEN CI. Author≠verifier. AUTO-MERGE ON GREEN authorized.

## Environment gotchas
- EXPLICIT `git add <files>`, NEVER `git add -A`. `.planning/*` gitignored except `.continue-here.md` + `HANDOFF.json`.
- DIRECT PUSH TO main BLOCKED by the auto-mode classifier — even docs/handoff go via branch + PR.
- zsh: unquoted `$VAR` no word-split; `$pipestatus` not `$PIPESTATUS`; `gh --body-file` (backticks in `--body` run subst).
- zsh: `$pipestatus` not `$PIPESTATUS`; `gh --body-file` (backticks in `--body` run subst).
- macOS: no `timeout` (use `( sleep N; kill $pid )&`). `sed -i ''` (empty arg). The `cannot delete local branch … used by
worktree` on `gh pr merge --delete-branch` is benign — remote+local deleted; `git worktree remove <wt> --force` after.
- codex: pipe the prompt via stdin (positional-arg prompt hangs on stdin). Run the gate as a background bash script
looping the connectors sequentially; grep the output for severities when done.
worktree` on `gh pr merge --delete-branch` is benign — `git worktree remove <wt> --force` after.
- codex: pipe the prompt via stdin (a positional-arg prompt hangs on stdin); heavy use exhausts the ChatGPT quota (resets a few hours later).
- This repo = downloadable ENGINE only. cloud/ingest = the separate `../pinakes-cloud` (../provenir-ingest) repo.
- Two strings in play forever: MODULE/import path = `pinakes.sh/pinakes`; REPO URL (download/CI/cosign) =
`github.com/001TMF/pinakes`. Never conflate them in install.sh / goreleaser / README / docs.
- Two strings forever: MODULE/import path = `pinakes.sh/pinakes`; REPO URL (download/CI/cosign) = `github.com/001TMF/pinakes`.
12 changes: 6 additions & 6 deletions .planning/HANDOFF.json
@@ -1,12 +1,12 @@
{
"version": "1.3",
"timestamp": "2026-06-13",
"version": "1.4",
"timestamp": "2026-06-14",
"project": "Pinakes",
"stage": "Stage 3 Wave 1 COMPLETE (14 connectors); audit #1-#6 ALL closed; LICENSE/SECURITY/RELEASING + optional API keys landed; agent skill in flight. See .planning/.continue-here.md for the authoritative current narrative.",
"status": "stage-3-wave-1-complete",
"head_commit": "beff063",
"stage": "Stage 3 — 26 connectors live, ALL codex-clean (Waves 1-4 merged); module renamed to pinakes.sh/pinakes; governor de-flaked (synctest); standalone-release prep (Work Stream A) COMPLETE with the A1 decoupling audit CLEAN. See .planning/.continue-here.md for the authoritative current narrative.",
"status": "stage-3-26-connectors-standalone-audit-clean",
"head_commit": "146c54d",
"origin": "github.com/001TMF/pinakes (private)",
"module_path": "github.com/pinakes/pinakes",
"module_path": "pinakes.sh/pinakes",
"build_env": "export PATH=\"$PATH:/opt/homebrew/bin\"",
"baseline_green": "go build ./... && go test ./... -race -count=1 GREEN at 8a3e9f9; binary: go build -o /tmp/pinakes ./cmd/pinakes",
"completed": [