From 4ec7b3dd7aa37fa89a33540ae892d29108b104bb Mon Sep 17 00:00:00 2001
From: Tristan Farmer <159447266+001TMF@users.noreply.github.com>
Date: Sun, 14 Jun 2026 02:03:35 +0100
Subject: [PATCH] chore(handoff): refresh to 26 connectors + standalone-audit-clean state

The handoff files were stale at 22/Wave-1 (.continue-here.md said 22; HANDOFF.json said 14/Wave-1, head beff063, old module path). Refresh both to the current main (146c54d): 26 connectors all codex-clean, module pinakes.sh/pinakes, governor de-flaked, and Work Stream A (standalone-release prep) COMPLETE with the A1 decoupling audit CLEAN. Remaining work updated to: A2 founder gates, Wave 5 breadth, deferred bloat, ingest-cloud coordination.

Handoff docs only (the two non-gitignored .planning files). No engine/idl/schema change.

Co-Authored-By: Claude Opus 4.8 (1M context)
---
 .planning/.continue-here.md | 137 +++++++++++++++++-------------------
 .planning/HANDOFF.json      |  12 ++--
 2 files changed, 72 insertions(+), 77 deletions(-)

diff --git a/.planning/.continue-here.md b/.planning/.continue-here.md
index 04038d1..49eb9e4 100644
--- a/.planning/.continue-here.md
+++ b/.planning/.continue-here.md
@@ -1,99 +1,94 @@
 ---
 context: default
 project: Pinakes
-stage: Stage 3 Wave 3 COMPLETE (22 connectors); module renamed to pinakes.sh/pinakes; Wave 4 in flight
-status: stage-3-wave-3-complete-module-renamed
-head_commit: 9a76029
-last_updated: 2026-06-13
+stage: Stage 3 — 26 connectors (all codex-clean); module pinakes.sh/pinakes; standalone-release prep (Work Stream A) COMPLETE, audit CLEAN
+status: stage-3-26-connectors-standalone-audit-clean
+head_commit: 146c54d
+last_updated: 2026-06-14
 ---

-# Pinakes — Continue Here (Stage 3 Waves 1-3 + module rename done; Wave 4 in flight)
+# Pinakes — Continue Here (26 connectors; standalone-release audit CLEAN)

 You are the **core-build agent** for Pinakes (deterministic, verifiable data layer for AI agents; biology beachhead, arXiv literature LAST).
Repo `/Users/tristanfarmer/Projects/operon`, `origin/main`
-= `github.com/001TMF/pinakes` (private, founder's PERSONAL account — NO org) @ `9a76029`.
-Build env: `export PATH="$PATH:/opt/homebrew/bin"`. **Module path `pinakes.sh/pinakes`** (renamed
-from the placeholder github.com/pinakes/pinakes in #47 — imports read `pinakes.sh/pinakes/...`).
-ULTRACODE is ON. `make ci` == the gate. A PR touching `engine/`/`idl/`/`schema/` needs the
-**`spine-change`** label (frozen-path CI guard). gh authed as 001TMF. codex CLI (gpt-5.5,
-ChatGPT-authed) is the cross-model gate.
+= `github.com/001TMF/pinakes` (private, founder's PERSONAL account — NO org) @ `146c54d`.
+Build env: `export PATH="$PATH:/opt/homebrew/bin"`. **Module path `pinakes.sh/pinakes`** (vanity import
+path, resolved via a go-import meta tag → github.com/001TMF/pinakes). ULTRACODE is ON. `make ci` == the
+gate. A PR touching `engine/`/`idl/`/`schema/` needs the **`spine-change`** label (frozen-path CI guard).
+gh authed as 001TMF. codex CLI (gpt-5.5, ChatGPT-authed) is the cross-model gate.

 ## Required reading

 1. `DECISIONS.md`, `CONTRACTS.md`, `BUILD_ROADMAP.md` (§3 = Stage-3 waves), `RESUME.md`.
-2. **`.planning/HANDOFF.json`** — machine state (may be stale on connector count; trust this file + git).
+2. **`.planning/HANDOFF.json`** — machine state (may lag on connector count; trust git: it's 26).
 3. `engine/contract/doc.go` — the canonical determinism/hash-purity contract (the connector build bar).
-4. `RELEASING.md` (repo root) — the founder-gate publish checklist (now incl. the go-import meta-page gate).
-5. `.planning/analysis/BLOAT-REPORT.md` — bloat analysis (verdict: lean; cleanup DEFERRED, see below).
+4. `RELEASING.md` — the founder-gate publish checklist (incl. the go-import meta-page gate).
+5. **`.planning/analysis/STANDALONE-DECOUPLING.md`** — the A1 standalone-release audit (verdict CLEAN).
+6. `.planning/analysis/BLOAT-REPORT.md` — bloat analysis (verdict lean; cleanup DEFERRED).

 ## Baseline

-`make ci` GREEN at `9a76029`. **22 connectors** live: alphafold, cbioportal, chembl, clinicaltrials,
-clinvar, complexportal*, ensembl, geneontology, gnomad, gtex, gwascatalog*, gtopdb*, hpo, interpro,
-jaspar, monarch, ncbidatasets, ncbiprotein, ncbivirus, openfda, opentargets, pdb, pubchem, reactome,
-rhea*, uniprot. (* = the 4 Wave-4 connectors are IN FLIGHT in worktree, not yet merged — see below.)
-NOTE: pre-existing FLAKY test in engine/governor (TestGovernor_Stress / InteractiveBeatsBatch,
-timing-sensitive under -race contention) — if `make ci`/CI fails ONLY on it, re-run; NOT connector-caused.
+`make ci` GREEN at `146c54d`. **26 connectors** live, ALL codex-clean: alphafold, cbioportal, chembl,
+clinicaltrials, clinvar, complexportal, ensembl, geneontology, gnomad, gtex, gtopdb, gwascatalog, hpo,
+interpro, jaspar, monarch, ncbidatasets, ncbiprotein, ncbivirus, openfda, opentargets, pdb, pubchem,
+reactome, rhea, uniprot. The engine/governor timing suite is DETERMINISTIC now (synctest, #49) — a
+governor failure is a REAL bug, not the old flake.

-- **Stages 0-2b COMPLETE** (engine spine, 3 surfaces MCP/CLI/REST, CI + signed-release pipeline). **Audit #1-#6 CLOSED.**
-- **Stage 3 Wave 1** (#33/#35/#39/#40 → 14): openfda, clinicaltrials, cbioportal, ncbidatasets. Bookend fix #44.
-- **Stage 3 Wave 2** (#45 → 18): reactome, gene-ontology(QuickGO), hpo, jaspar. The WIRE step made
-  provider_test.go's source-count assertions STRUCTURAL (`len(specs)==len(registry)` + floor) — so new
-  connector batches need NO count bump and never conflict on it. codex caught a real false-Complete in
-  EACH (bracket-the-whole-walk, body-id verify, empty-get-422, 204→UNKNOWN) — all fixed + regression-tested.
-- **Stage 3 Wave 3** (#46 → 22): opentargets(GraphQL), gtex, monarch(BioLink), interpro(EBI cursor). THREE
-  codex rounds (author≠verifier): round1 (trailing-empty-page total, search-filter verify, anchor-id,
-  empty-get, 204); round2 (gtex dataset-substitution, blank-filter-422, monarch Retrieve-layer empty-ids,
-  400→422); round3 (gtex+monarch honor plan-snapshot st.Reconcile not live c.reconcile). All fixed + non-vacuous
-  regression tests. Documented monarch's conservative acc!=id alias limitation (determinism-safe).
-- **MODULE RENAME** (#47): github.com/pinakes/pinakes → **pinakes.sh/pinakes** (vanity import path). Repo-URL
-  refs (install.sh PINAKES_REPO, README badges, RELEASING cosign-identity) → github.com/001TMF/pinakes
-  (cosign identity follows where release.yml RUNS, independent of module path). Stale pinakes.ai → pinakes.sh
-  in docs. RELEASING gained the go-import meta-page publish gate.
-- **Bloat analysis** (`.planning/analysis/BLOAT-REPORT.md`): codebase is LEAN (~700 removable Go lines +
-  ~1-1.3K doc lines). User chose **DEFER ALL** cleanup (see remaining_work).
+- **Stages 0-2b COMPLETE** (engine spine, 3 surfaces MCP/CLI/REST, CI + signed-release pipeline). Audit #1-#6 CLOSED.
+- **Stage 3 Waves 1-4 MERGED** (14→18→22→26): wave1 openfda/clinicaltrials/cbioportal/ncbidatasets; wave2
+  reactome/geneontology/hpo/jaspar; wave3 opentargets/gtex/monarch/interpro; wave4 gwascatalog/complexportal/
+  gtopdb/rhea. Each connector survived multi-round codex cross-model determinism audits + non-vacuous regression tests.
+- **MODULE RENAME** (#47): → `pinakes.sh/pinakes`. Repo-URL refs (install.sh, README, cosign identity) →
+  github.com/001TMF/pinakes (cosign identity follows where release.yml RUNS, independent of module path).
+- **Governor de-flake** (#49, spine-change): synctest; production .go byte-for-byte unchanged; 0 flakes / ~2900 -race iters.
+- **CODEX ROUND-4 DISCHARGED** (#51): the deferred Wave-4 final re-verify ran when quota reset — gtopdb NO DEFECTS;
+  complexportal HIGH found (two untrusted full-body decodes lacked an EOF check → false Complete / hash impurity) and
+  FIXED with `requireEOF` (mirrors gtopdb) + 2 non-vacuous regression tests (proven by revert).
+- **STANDALONE-RELEASE PREP — Work Stream A COMPLETE** (the user's objective: ship the engine standalone, add cloud
+  "as it grows"):
+  - **A1 decoupling audit → CLEAN** (`.planning/analysis/STANDALONE-DECOUPLING.md`): engine has ZERO cloud dependency;
+    build closure = stdlib + x/text + yaml + internal only; snapshot.Backend defaults to local FSBackend (cloud seam
+    NewStoreWithBackend inert); literature/arXiv ABSENT; no telemetry/phone-home; serve loopback-enforced; jobs in-process;
+    cold-start with scrubbed env works. NO decoupling code change needed.
+  - **A3 docs** (#52): README "Local state" section (default `/pinakes` + `PINAKES_STATE_ROOT`). The
+    PINAKES_API_KEY-is-future-tier nit was already covered (README.md:225). README is already standalone-first.

-- **GATE + MERGE Wave 4** (workflow `wrwkwkk18`, worktree `../pinakes-wt-wave4`, branch feat/wave4-batch):
-  building gwascatalog, complexportal, gtopdb (CC-BY-SA copyleft→served-live), rhea with a Wave-3-HARDENED
-  build bar (mandatory guards a-f baked in: trailing-empty-page total, search-filter verify, body-id/composite
-  verify, non-200-never-stored + 204→UNKNOWN, empty-get-422 at Plan AND Retrieve 3-way dispatch, plan-snapshot
-  reconcile). When done: review → per-connector codex gate (KEEP IT — it has caught a real bug in nearly every
-  connector) → fix any findings + regression tests → make ci → ONE PR (no spine-change label, connectors only) → merge.
-- **KEEP BREADTH GOING** (user directive: "make it quite large in a few workflows"): after Wave 4 merges, launch
-  Wave 5 off the updated main (reuse the wave4 workflow script, swap connectors). Roadmap remainder: STRING, Human
-  Protein Atlas (CC-BY-SA), MSigDB, Expression Atlas, IntAct, ChEBI, Bgee, GWAS-adjacent, Pharos, etc. (DepMap has
-  no clean REST — skip). Non-pathogen → fine in workflows; pathogen-DATA capture → main loop/codex (AUP).
-- **DEFERRED bloat cleanup** (user: DEFER ALL): do the 6-non-hash-helper extraction (readBody/backoff/jsonNumberOrNil/
-  toAnySlice/httpDoer/emptyToNil → connectors/internal/, ~700 lines) as ONE dedicated pass AFTER breadth settles
-  (it touches all connectors; spine-adjacent; gate behind golden-hash tests; do NOT touch the canonicalizer trio —
-  deliberately connector-local for hash isolation). Docs cleanup (vestigial Operon_*/RECONCILIATION) = pre-public prep (parked).
-- **INGEST-CLOUD COORDINATION**: the other agent (../provenir-ingest, branch feat/ingest-cloud) imports the engine.
-  After the rename they must update go.mod require/replace LHS → pinakes.sh/pinakes + sweep imports (seam types
-  snapshot.Backend / literature.PaperRecord UNCHANGED). Awaiting their reply on local-replace vs pinned consumption.
-- **MCP-prompt client-neutral key affordance** (tracked): add MCP `prompts` capability + a prompt that GUIDES the
-  user to run `pinakes config set-key` (returns instructions only). Spine-adjacent → spine-change label.
-- **FOUNDER GATES (PARKED — public can wait per user)**: go-import meta page at pinakes.sh/pinakes (needed for
-  `go install`; local replace + curl|sh unaffected); make repo public; get.pinakes.sh; npm/PyPI reserve; Homebrew
-  tap; branch/tag protection; USPTO/EUIPO check. **Stage 6 arXiv/literature = LAST.** Stage 4/5 = separate ../pinakes-cloud.
+- **A2 FOUNDER GATES (the user's — launch plumbing, cloud-independent; this is now the critical path to a PUBLIC release)**:
+  publish the go-import meta page at pinakes.sh/pinakes; make the repo public; stand up get.pinakes.sh; publish the
+  Homebrew tap + wire the goreleaser `brews:` block; tag **v0.1.0** (fires cosign-signed goreleaser); enable branch/tag
+  protection. (optional: npm/PyPI reserve, USPTO/EUIPO check.) Do NOT publish without the user.
+- **WORK STREAM B — KEEP BREADTH GOING (standing user directive: "make it quite large in a few workflows")**: launch
+  Wave 5 off main, reusing the wave4 batch script (`.claude/.../workflows/scripts/wave4-connector-batch-wf_c6f2836d-b00.js`
+  — swap the connector list; it bakes in the HARDENED build bar incl. the requireEOF/EOF rule). Candidates (non-pathogen,
+  clean public APIs): STRING, Human Protein Atlas (CC-BY-SA → served-live), MSigDB, Expression Atlas, IntAct, ChEBI, Bgee,
+  Pharos (DepMap has no clean REST — skip). Structural provider count test → parallel batches don't conflict. Non-pathogen
+  → fine in workflows; pathogen-DATA capture → main loop/codex (AUP). KEEP the per-connector codex gate; pace codex (quota).
+- **DEFERRED bloat cleanup** (user: DEFER ALL): the 6-non-hash-helper extraction (readBody/backoff/jsonNumberOrNil/
+  toAnySlice/httpDoer/emptyToNil → connectors/internal/, ~700 lines) as ONE post-breadth pass; gate behind golden-hash
+  tests; do NOT touch the per-connector canonicalizer trio. (.planning/analysis/BLOAT-REPORT.md.)
+- **INGEST-CLOUD COORDINATION**: ../provenir-ingest (branch feat/ingest-cloud) imports the engine; after the rename it must
+  update go.mod require/replace LHS → pinakes.sh/pinakes + sweep imports (seam types snapshot.Backend / literature.PaperRecord
+  UNCHANGED). Awaiting their reply on local-replace vs pinned consumption.
+- **MCP-prompt client-neutral key affordance** (tracked): add MCP `prompts` capability + a prompt that GUIDES the user to
+  run `pinakes config set-key` (instructions only). Spine-adjacent → spine-change label.
+- **Stage 6 arXiv/literature = LAST.** Stage 4/5 cloud = the separate ../pinakes-cloud repo (plugs in behind snapshot.Backend).

 ## Rhythm (keep it)

-design/research → implement via a Workflow (build → 3-lens adversarial Claude review [determinism/contract/test-rigor]
-→ fix) → **codex gpt-5.5 cross-model gate** (`printf '%s' "$P" | codex exec -m gpt-5.5 -s read-only`; bounded with a
-`( sleep ~300; kill $pid )&` watchdog; non-blocking; catches real determinism/credential bugs the Claude lenses miss —
-INVALUABLE, keep on every connector) → fix findings + non-vacuous regression tests (prove by revert) → personal `make ci`
-→ PR (`gh pr create --body-file`; +`--label spine-change` for engine/idl/schema) → merge `--merge --delete-branch` on
-GREEN CI. Author≠verifier. AUTO-MERGE ON GREEN authorized. The recurring codex classes are now in the Wave-4 build bar.
+research → implement via a Workflow (build → 3-lens adversarial Claude review [determinism/contract/test-rigor] → fix) →
+**codex gpt-5.5 cross-model gate** (`printf '%s' "$P" | codex exec -m gpt-5.5 -s read-only`; bounded with a
+`( sleep ~300; kill $pid )&` watchdog; catches real determinism bugs the Claude lenses miss — it found a HIGH in complexportal
+at ROUND 4 that 3 Claude reviews missed; KEEP IT on every connector) → fix + non-vacuous regression tests (prove by revert) →
+personal `make ci` → PR (`gh pr create --body-file`; +`--label spine-change` for engine/idl/schema) → merge `--merge
+--delete-branch` on GREEN CI. Author≠verifier. AUTO-MERGE ON GREEN authorized.

 ## Environment gotchas

 - EXPLICIT `git add `, NEVER `git add -A`. `.planning/*` gitignored except `.continue-here.md` + `HANDOFF.json`.
 - DIRECT PUSH TO main BLOCKED by the auto-mode classifier — even docs/handoff go via branch + PR.
-- zsh: unquoted `$VAR` no word-split; `$pipestatus` not `$PIPESTATUS`; `gh --body-file` (backticks in `--body` run subst).
+- zsh: `$pipestatus` not `$PIPESTATUS`; `gh --body-file` (backticks in `--body` run subst).
 - macOS: no `timeout` (use `( sleep N; kill $pid )&`). `sed -i ''` (empty arg). The `cannot delete local branch … used by
-  worktree` on `gh pr merge --delete-branch` is benign — remote+local deleted; `git worktree remove --force` after.
-- codex: pipe the prompt via stdin (positional-arg prompt hangs on stdin). Run the gate as a background bash script
-  looping the connectors sequentially; grep the output for severities when done.
+  worktree` on `gh pr merge --delete-branch` is benign — `git worktree remove --force` after.
+- codex: pipe the prompt via stdin (positional hangs); heavy use exhausts the ChatGPT quota (resets a few hrs later).
 - This repo = downloadable ENGINE only. cloud/ingest = the separate `../pinakes-cloud` (../provenir-ingest) repo.
-- Two strings in play forever: MODULE/import path = `pinakes.sh/pinakes`; REPO URL (download/CI/cosign) =
-  `github.com/001TMF/pinakes`. Never conflate them in install.sh / goreleaser / README / docs.
+- Two strings forever: MODULE/import path = `pinakes.sh/pinakes`; REPO URL (download/CI/cosign) = `github.com/001TMF/pinakes`.
diff --git a/.planning/HANDOFF.json b/.planning/HANDOFF.json
index 605dde2..b68db9f 100644
--- a/.planning/HANDOFF.json
+++ b/.planning/HANDOFF.json
@@ -1,12 +1,12 @@
 {
-  "version": "1.3",
-  "timestamp": "2026-06-13",
+  "version": "1.4",
+  "timestamp": "2026-06-14",
   "project": "Pinakes",
-  "stage": "Stage 3 Wave 1 COMPLETE (14 connectors); audit #1-#6 ALL closed; LICENSE/SECURITY/RELEASING + optional API keys landed; agent skill in flight. See .planning/.continue-here.md for the authoritative current narrative.",
-  "status": "stage-3-wave-1-complete",
-  "head_commit": "beff063",
+  "stage": "Stage 3 — 26 connectors live, ALL codex-clean (Waves 1-4 merged); module renamed to pinakes.sh/pinakes; governor de-flaked (synctest); standalone-release prep (Work Stream A) COMPLETE with the A1 decoupling audit CLEAN. See .planning/.continue-here.md for the authoritative current narrative.",
+  "status": "stage-3-26-connectors-standalone-audit-clean",
+  "head_commit": "146c54d",
   "origin": "github.com/001TMF/pinakes (private)",
-  "module_path": "github.com/pinakes/pinakes",
+  "module_path": "pinakes.sh/pinakes",
   "build_env": "export PATH=\"$PATH:/opt/homebrew/bin\"",
   "baseline_green": "go build ./... && go test ./... -race -count=1 GREEN at 8a3e9f9; binary: go build -o /tmp/pinakes ./cmd/pinakes",
   "completed": [