feat: matrix classification audit, Taxonomy, and Papers Explorer upgrades #33
Add Phase 4 of the Zotero⇄CAAIL lifecycle: a methods-grounded re-audit of the Papers.md matrix itself.
- caail-classification-reviewer agent: read-only, full-text-grounded reviewer of (method × area) placement, distinct from the bibliographic citation reviewer. Verdicts DEFENSIBLE / MISPLACED / UNSUPPORTED per cell, plus MISSING-CELL recommendations and a NOT-PRIMARY flag.
- matrix-classification-audit skill + extract_matrix_corpus.py: parses the matrix and references out of Papers.md, indexes both Zotero groups by DOI and URL, and pulls each matrix paper's methods section from the PDF full-text cache into a per-ref corpus for adversarial review.
- Register the new skill (Phase 4) and reviewer in CLAUDE.md; cross-reference it from zotero-to-caail-sync; gitignore the corpus build artifact.
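The matrix-parsing step described above can be sketched roughly as follows. This is a minimal illustration, not the code in extract_matrix_corpus.py: it assumes the matrix is a pipe-delimited markdown table whose first column holds the method label and whose cells hold `[N](#N)` anchors, and the names `parse_matrix` and `strip_label` are hypothetical.

```python
import re
from collections import defaultdict

CELL_REF = re.compile(r"\[(\d+)\]\(#\1\)")    # a cell anchor such as [12](#12)
LINK = re.compile(r"\[([^\]]+)\]\([^)]*\)")   # [label](target) -> label


def strip_label(cell: str) -> str:
    """Reduce a linked axis label to its visible text."""
    return LINK.sub(r"\1", cell).strip()


def split_row(row: str) -> list[str]:
    """Split one table row into trimmed cell strings."""
    return [c.strip() for c in row.strip().strip("|").split("|")]


def parse_matrix(markdown: str) -> dict[tuple[str, str], list[int]]:
    """Map (method, area) -> ref ids found in that cell (assumed table layout)."""
    rows = [ln for ln in markdown.splitlines() if ln.strip().startswith("|")]
    if len(rows) < 3:
        return {}
    areas = [strip_label(c) for c in split_row(rows[0])[1:]]
    cells: dict[tuple[str, str], list[int]] = defaultdict(list)
    for row in rows[2:]:  # skip the header row and the |---| separator
        cols = split_row(row)
        method = strip_label(cols[0])
        for area, cell in zip(areas, cols[1:]):
            for match in CELL_REF.finditer(cell):
                cells[(method, area)].append(int(match.group(1)))
    return dict(cells)
```

Keying the result on label text rather than position keeps a lookup stable if rows are reordered, at the cost of breaking when a label is renamed.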
Held audit proposals: for human triage
The methods-grounded audit behind this PR also surfaced 9 moves, 27 removes, and 14 “leave the matrix” proposals. They were deliberately not applied here because they challenge CAAIL's curatorial choice to catalogue general/foundational methods (the strict reviewer wanted papers like scGPT, GEARS, UCE, SWE-bench, GPQA out of cell-ag columns). Each cleared a majority of 3 independent adversarial skeptics, but each is a curatorial call, so tick the ones to action in a follow-up. They split into two natures: (A) method-accuracy fixes (the cell names the wrong technique) and (B) scope/philosophy calls (the method is right, but the application isn't cell-ag-specific). A-type fixes are the safer subset.
Moves — reclassify an existing cell (9)
Removes — drop an existing cell (27)
Not-primary — proposed to leave the matrix entirely (14)
Add 18 cross-listings where a paper substantively applies more than one AI/ML method (verified against each paper's methods section via the matrix-classification-audit workflow; each survived independent adversarial review). No reference text changed; matrix cells only.
- ref 11 Shen 2024 → Ensemble Learning (InfoGAN + dynamically-weighted base classifiers)
- ref 20 Rafieyan 2024 → Ensemble Learning (XGBoost/GBM/RF/LightGBM)
- ref 26 Sun 2023, ref 28 Sun 2026 → SVM / Ensemble (LS-SVM, RF/GBDT/SVC)
- ref 32 Roell 2022 → Deep Learning / Ensemble / K-Nearest Neighbors (seven model families benchmarked for bioprocess prediction)
- ref 61 Wang 2025b, ref 93 Tang 2026 → Agent Infrastructure (LangGraph / hybrid knowledge frameworks)
- ref 68 Li 2024 → GNN (GEM-as-graph submodule)
- ref 117 Cui 2024, ref 120 Rizvi 2026 → Cell-State & Perturbation Prediction; ref 120 also Reinforcement Learning (GRPO)
- ref 161 Narayanan 2025 → Reinforcement Learning (RL-trained chemistry model)
- ref 169 Hashizume & Ying 2025 → Ensemble Learning / Genetic Algorithms
- ref 182 King 2004 → Active Learning (experiment-selection strategy)
ref 72 trains 18 models including SVM and MLP/Bayesian neural networks for sensory (flavor) prediction under 10-fold CV — confirmed by full-text re-verification as the paper's own applied methods, not a background enumeration. Adds (SVM × Sensory) and (Deep Learning × Sensory) alongside its existing Ensemble Learning placement.
…movals
The audit's #33 false positive (it proposed deleting a CNN-surrogate-CFD paper from Bioprocess control, which ResearchAreas/Bioprocess.md already cites) traced to two gaps: the reviewer read only the paper (never CAAIL's own curation context), and a destructive removal carried no more burden than an additive placement. This bakes an asymmetric, context-aware burden on scope removals into the durable tooling.
- extract_matrix_corpus.py: add per-ref cited_in_research_areas (scan ResearchAreas/*.md by surname+year / DOI), an intentional-placement KEEP prior (correctly flags #33 -> Bioprocess control).
- caail-classification-reviewer: read the ResearchAreas/<Area>.md scope and honor that prior before any scope call; tag every verdict nature=method-accuracy|scope; default a general-method scope concern to a MOVE to AI Tooling / Methodology, not a removal; method-absent papers stay a firm method-accuracy flag; never hedge a non-fitting paper into a destination-less move.
- .claude/workflows/matrix-classification-audit.js: durable named workflow, propose -> skeptics -> (scope only) steelman defender -> gated domain-relevance web grounding. method-accuracy + additive changes bypass the heavy layers. Self-bootstraps inputs from matrix-corpus.json (args is not reliably delivered); fan-out pinned to Sonnet.
- SKILL.md / CLAUDE.md: document the asymmetric burden, the layers, and the named-workflow invocation.
Behavioral mini-eval (#33,#151,#152,#155): #33 now kept; SWE-bench (#155) correctly flagged NOT-PRIMARY by the defender; #152 scope-removal overturned via the curator-citation prior. No Papers.md content change.
…racy
Closes the residual gap from 7fe068c: the method-accuracy path bypassed the defender, so a method-accuracy verdict on a paper the curators cite in a ResearchAreas page could apply a removal of its only cell, orphaning it and severing the live cross-reference (the exact risk the defender flagged for #152). A wrong method row on a cited paper is now a re-row, not a deletion:
- workflow: proposer reports cited_by_curators; adjudicate() routes any removal (unsupported / not_primary) of a cited paper through the steelman defender regardless of nature. A re-row MOVE or an uncited method-accuracy fix still needs only skeptics; scope removals still reach the defender (gated, not blanket).
- reviewer agent: a curator-cited paper is never UNSUPPORTED/NOT-PRIMARY; a wrong method row is a MISPLACED re-row.
Verified by a deterministic truth-table check of the routing guard (7/7) plus the behavioral mini-eval (#33 kept; cited #151/#152 received no applied removal). No Papers.md change.
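One reading of the routing rule above, sketched as a pure function with a small truth-table check. This is not the workflow's adjudicate() (which lives in a JavaScript file); the `Proposal` type, the field names, and the `needs_defender` helper are hypothetical, and the rows below are illustrative rather than the 7-row table the commit mentions.

```python
from dataclasses import dataclass

# Verdicts that delete a cell (or the paper) rather than move or add one.
REMOVAL_VERDICTS = {"unsupported", "not_primary"}


@dataclass(frozen=True)
class Proposal:
    verdict: str             # "misplaced" (re-row), "missing_cell", "unsupported", "not_primary"
    nature: str              # "method-accuracy" or "scope"
    cited_by_curators: bool  # assumed flag reported by the proposer


def needs_defender(p: Proposal) -> bool:
    """Removals face the steelman defender when they are scope calls or hit a cited paper."""
    if p.verdict not in REMOVAL_VERDICTS:
        return False  # re-row moves and additive placements need only skeptics
    return p.nature == "scope" or p.cited_by_curators


# Illustrative truth table: (verdict, nature, cited) -> defender required?
TABLE = [
    ("misplaced", "method-accuracy", True, False),
    ("missing_cell", "method-accuracy", False, False),
    ("unsupported", "method-accuracy", False, False),
    ("unsupported", "method-accuracy", True, True),
    ("unsupported", "scope", False, True),
    ("not_primary", "scope", True, True),
]
for verdict, nature, cited, expected in TABLE:
    assert needs_defender(Proposal(verdict, nature, cited)) is expected
```

Keeping the guard a pure function of three fields is what makes a deterministic truth-table check possible without running any agents.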
Hardened re-scrutiny of the held proposals
Re-ran all 32 held-proposal papers through the hardened pipeline (propose → skeptics → steelman defender for scope/cited removals → gated domain grounding). The over-strict scope/philosophy deletions are gone:
✅ Apply — method-accuracy fixes, orphan-safe (5)
Adds a non-destructive taxonomy_gap verdict so the classification audit can keep a paper that applies a real AI/ML method whose matrix row/column does not yet exist, and surface a proposed new row/column for curator decision instead of forcing a wrong cell or orphaning the paper.
- reviewer: taxonomy_gap verdict + precedence ladder (gap is the last resort, after re-row into an existing label); method-family precision notes (Bayesian Optimization vs Bayesian inference; GNN vs classical network propagation) so a step-2 re-row does not grab a superficially-similar row and bury the real gap.
- workflow: taxonomy_gaps schema; per-ref collection that never enters the adjudicated change set (so a gap can never become an applied removal); a Taxonomy phase that clusters pooled gaps and adversarially verifies clusters of >=2 papers into proposed new rows/columns (singletons are parked).
- verify_routing.mjs: deterministic guard covering the asymmetric-burden routing truth-table, the non-orphan structural invariant, and the >=2 cluster gate.
- docs: SKILL.md (verdict, non-orphan guarantee, run-full-corpus note, human-applied rows/columns) and the CLAUDE.md lifecycle entry.
Verified: verify_routing 20/20; behavioral mini-eval over refs 59, 60, 34, 121 (taxonomy gaps for 59 and 60, re-row for 34, drop-redundant-cell for 121) kept all four papers (no orphaning removal).
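The >=2 cluster gate can be illustrated with a short sketch. Grouping by a normalized label string is an assumption standing in for however the Taxonomy phase actually clusters pooled gaps, and `gate_clusters` is a hypothetical name; the sample pairs are illustrative input, not audit output.

```python
from collections import defaultdict


def gate_clusters(gaps, min_papers=2):
    """Split pooled (ref_id, proposed_label) gaps into proposed and parked clusters."""
    by_label = defaultdict(set)
    for ref_id, label in gaps:
        key = " ".join(label.lower().split())  # case- and whitespace-insensitive key
        by_label[key].add(ref_id)              # a set, so a repeated ref counts once
    proposed = {k: sorted(v) for k, v in by_label.items() if len(v) >= min_papers}
    parked = {k: sorted(v) for k, v in by_label.items() if len(v) < min_papers}
    return proposed, parked


# Illustrative input only: two refs agree on one label, one ref stands alone.
proposed, parked = gate_clusters(
    [(7, "Chemometrics"), (198, "chemometrics"), (59, "Bayesian Inference")]
)
print(proposed)  # clusters with at least two distinct papers
print(parked)    # singletons held for a future second paper
```

Counting distinct ref ids rather than raw gap records keeps one paper flagged twice from clearing the gate on its own.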
Replace the full-corpus adversarial Workflow (one proposer agent per paper
×136, which exhausted usage mid-run) with a cost cascade: a cheap batched
triage skim reads every ref and selects only the ambiguous ones; the existing
verified gate then judges that flagged subset.
- skim_to_audit_ids.py: stdlib glue that validates the pinned skim output
schema, dedupes flagged ids (flag sticky), verifies every ref was skimmed
(loud MISSING, never a silent keep), and writes _audit_ids.json as
{"ids":[...]} — ids only, so the skim's suggested_dest never reaches the
gate proposer (preserving proposer independence). Prints a flag count + floor
agent estimate and warns past an abort ceiling.
- SKILL.md: documents the default skim→gate run path, a recall-biased skim
rubric (known method-family traps, curatorial-cell guard, method-name
mismatch), and the pinned output schema; reconciles the taxonomy note so a
single un-chunked gate invocation over liberally-flagged gap suspects still
clusters corpus-wide; marks the full Workflow fan-out as the last resort.
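The glue step described above might look roughly like this. It is a sketch under stated assumptions, not the contents of skim_to_audit_ids.py: the record keys `id` and `flag` are guesses at the pinned schema (only `suggested_dest` and the `{"ids":[...]}` output shape appear in the commit message), and the function name is reused purely for illustration.

```python
import json


def skim_to_audit_ids(records, corpus_ids):
    """Validate skim records, keep flags sticky, and return an ids-only payload."""
    flagged = {}
    for rec in records:
        # Assumed schema: {"id": int, "flag": bool, ...optional extras}.
        if (
            not isinstance(rec, dict)
            or type(rec.get("id")) is not int
            or type(rec.get("flag")) is not bool
        ):
            raise ValueError(f"skim record violates schema: {rec!r}")
        # Sticky flag: once any record flags a ref, a later False cannot clear it.
        flagged[rec["id"]] = flagged.get(rec["id"], False) or rec["flag"]
    missing = sorted(set(corpus_ids) - set(flagged))
    if missing:
        # Fail loudly instead of treating an unskimmed ref as a silent keep.
        raise ValueError(f"MISSING: refs never skimmed: {missing}")
    # Only ids are emitted, so suggested_dest never reaches the gate proposer.
    return {"ids": sorted(i for i, is_flagged in flagged.items() if is_flagged)}


records = [
    {"id": 1, "flag": False},
    {"id": 2, "flag": True, "suggested_dest": "GNN"},
    {"id": 2, "flag": False},
    {"id": 3, "flag": False},
]
print(json.dumps(skim_to_audit_ids(records, [1, 2, 3])))
```

Dropping everything except the ids is the design point: the gate proposer then forms its own view of each flagged paper instead of anchoring on the skim's guess.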
…process & Scale-Up
Add Taxonomy.md, a canonical, text-anchored definition of every matrix row (23 methods) and column (7 research areas): what each covers, what is out of scope, and the discriminators for confusable categories (a Foundation-Model row requires a pretrained/transferable model not a task-specific one; GNN excludes classical network propagation; Deep Learning is a catch-all yielding to more specific rows). It serves both readers and the AI classification audit, which grounds placements in the paper's own methods text. This becomes the trusted meaning-source, replacing the stale/AI-drafted ResearchAreas pages.
Rename the column 'Bioprocess control' -> 'Bioprocess & Scale-Up' so reactor design, CFD/mixing, mass transfer, and scale-up engineering clearly belong (e.g. reactor-physics methods that transfer to bioreactor scale-up). Update the matrix header (repointed to the Taxonomy.md definition), add a Category definitions section to Papers.md, and update the area-label maps in extract_matrix_corpus.py and the workflow. The corpus (gitignored) still carries the old label until regenerated.
…ted_in signal
The cited_in_research_areas signal was derived from the ResearchAreas pages, which are AI-assisted and stale; a 'paper cited in area X' hit can be a hallucination, so it is not trustworthy. Remove it everywhere; the paper's own methods text (measured against the Taxonomy.md definitions) is now the sole source of truth.
- extract_matrix_corpus.py: stop computing/emitting cited_in_research_areas and reading ResearchAreas/*.md; the corpus is text-only.
- workflow.js + caail-classification-reviewer.md: drop cited_by_curators from the schemas and proposer/skeptic/defender prompts; the Defend steelman now reads the paper + the Taxonomy.md column definition (not the area page); the anti-over-removal burden is anchored on paper-text evidence (scope removals face the defender; method-accuracy UNSUPPORTED stays firm and never orphans via the precedence ladder); ties go to KEEP.
- prefilter_corpus.py: new stdlib, zero-token pre-filter that auto-clears the lexically-obvious classical-ML/benchmark placements from the paper's own text and emits the residual for the LLM skim. Validated zero correction-leak against the prior skim oracle (28% of the skim removed); fails toward the LLM on any uncertainty. Includes the FM-row trap (pretrained-vs-task-specific) and trap fixes that recover over-blocked keeps.
- SKILL.md / CLAUDE.md: re-anchor the asymmetric-burden docs on the paper + Taxonomy.md; document the pre-filter.
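The "fails toward the LLM" behaviour of the pre-filter can be sketched as below. Everything here is an assumption for illustration: the term lists, the `methods` and `methods_text` keys, and the `prefilter` function are hypothetical and are not taken from prefilter_corpus.py.

```python
# Hypothetical term lists; a real pre-filter would cover far more rows.
CLEAR_TERMS = {
    "Ensemble Learning": ("random forest", "xgboost", "gradient boosting"),
}
# Words hinting at the pretrained-vs-task-specific trap: always defer to the LLM.
TRAP_TERMS = ("pretrained", "pre-trained", "fine-tun", "foundation model")


def prefilter(ref: dict) -> str:
    """Return "cleared" only when every placement is lexically obvious, else "residual"."""
    text = ref.get("methods_text", "").lower()
    methods = ref.get("methods", [])
    if not text or not methods:
        return "residual"  # nothing to ground on: let the LLM skim decide
    if any(term in text for term in TRAP_TERMS):
        return "residual"  # possible foundation-model trap
    for method in methods:
        terms = CLEAR_TERMS.get(method)
        if not terms or not any(term in text for term in terms):
            return "residual"  # unknown row or no supporting keyword
    return "cleared"


print(prefilter({"methods": ["Ensemble Learning"],
                 "methods_text": "We trained XGBoost and random forest models."}))
```

Every uncertain branch returns "residual", so a gap in the term lists costs tokens in the skim rather than letting a misplacement slip through.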
Point all 23 method-row and 7 area-column labels in the matrix to their Taxonomy.md definition (the canonical, CAAIL-specific scope of each), removing every Wikipedia, ResearchAreas, and paper-ref link from the axis labels. Taxonomy outlines each method's nuance more precisely than a generic Wikipedia article. Acronyms are spelled out at first use in the Taxonomy definitions and the Category-definitions blurbs (GNN, CNN, GAN/VAE, SVM). Move the 'new rows link to Wikipedia' convention to Taxonomy.md across CLAUDE.md, SKILL.md, the reviewer contract, and the workflow: the taxonomy-gap schema's wikipedia_url becomes proposed_definition (a Taxonomy.md-style definition of the proposed new row/area). Cell anchors [N](#N) are untouched; label text is unchanged so the corpus/tooling that key on label text are unaffected.
The matrix column was renamed 'Bioprocess control' -> 'Bioprocess & Scale-Up'; update the area-column registry label so the parser/lint recognize it (the key 'bioprocess' is unchanged). Without this, lint:papers reports the column's papers as uncited. The research-area page registry is separate and untouched; the explorer's full-name display + Taxonomy routing remain a separate site pass.
…mometrics row
Apply the 62 human-reviewed audit decisions to the matrix (curator-confirmed each):
- ~15 method/area moves (e.g. shallow-RBF/MLP papers out of Deep Learning; GEARS/SATURN/UCE/PRESAGE out of Foundation-Model rows since consuming FM embeddings ≠ being an FM; TxAgent general->domain-specific; ARIEL ->Benchmarks).
- 8 cell removals where a method was a baseline/secondary (e.g. shallow MLP cells).
- 9 additive multi-category placements (e.g. robot-scientist + active-learning cells).
- 4 out-of-scope papers removed from the matrix, IDs retired (E. coli genetics #22, two HCI user studies #151/#152, a representation-scheme paper #163).
- 3 perspective/correspondence papers moved to Reviews & Perspectives (#60/#113/#114).
- New **Chemometrics** row (PLS/PLS-DA spectral methods for PAT bioprocess monitoring + sensory), seeded from existing CAAIL content: #7 plus two new references, ropls (Thevenot et al. 2015, #198) and mixOmics (Rohart et al. 2017, #199).
Parked for a future second paper (not added): Bayesian Inference (#59), Neural/Graph Embedding (#197), Extreme Learning Machine (#26). Deferred pending full text: #52, #103, #104. Rejected as non-method axes: Network Propagation, Multi-Omics, Molecular Representation. Taxonomy.md gains the Chemometrics definition.
The audit changed the canonical Papers.md content; update the hardcoded ground-truth assertions to match: references 197->195 (−4 removed, +2 new), method rows 23->24 (Chemometrics), code-URL refs 70->72 (ropls/mixOmics), and the Deep Learning x Cellular Engineering cell composition. lint:papers and the full parser suite pass.
Full-text/identifier research on the three deferred items:
- #52 BioMedReasoner builds on Neural Bellman-Ford Networks (a GNN) -> GNN x AI Tooling is correct (keep). (NeurIPS 2025 / OpenReview FmDuKzM8f7.)
- #103 BitterIntense uses XGBoost (a decision-tree ensemble) -> Ensemble x Sensory is correct (keep).
- #104 BitterMatch is a similarity / collaborative-filtering recommender (Tanimoto + sequence-similarity matrices, no ensemble) -> move Ensemble -> K-Nearest Neighbors x Sensory Prediction (closest similarity-based row).
Add MeatScan (Gyening et al. 2025, Data in Brief) to Datasets/Cow.md — an 11,000-image RGB dataset (5,627 fresh / 5,373 spoiled) of cow meat from Ghanaian markets, for fresh/spoiled CV classification. New 'Meat-quality imaging' thematic cluster + inventory row, linked to the Zenodo deposit and cross-linked to Papers.md #196 (CNN x Sensory Prediction). Relabel #196's companion blockquote `> **Code**` -> `> **Data**` since the Zenodo record is the dataset, not code.
…label datasets 129->130 (MeatScan inventory row), code-URL 72->71 and data-URL 9->10 (#196's companion blockquote relabelled Code->Data).
Route the canonical Taxonomy.md as a site page (/taxonomy/, in the top-level nav) via the existing caail-docs loader + a CAAIL_PAGES entry. In the Papers Explorer, the method-row and area-column labels are now links to their Taxonomy definition (heading anchors verified to match the GitHub-slug anchors used in Papers.md), and the acronym rows (SVM/CNN/GNN/GAN-VAE) carry a title tooltip spelling out the full name — so newcomers can see what each axis means and click through to the full definition.
Add a taxonomy.ts parser that reads each ### heading in Taxonomy.md and flattens its prose to a label -> definition map, emitted as the build-time taxonomy.json (gitignored like the other parser outputs, regenerated by `pnpm parse`). The Papers Explorer consumes this to show a row/column definition without a hardcoded copy that could drift. generate-data.ts gains a coverage guard: every matrix method and area label must have a non-empty Taxonomy.md definition or the build fails, so a renamed row can't silently lose its popup text. Schema + type added to types.ts; fixture + real-file tests added.
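The heading-to-definition flattening and the coverage guard can be sketched as follows. The real parser is taxonomy.ts and the real guard sits in generate-data.ts; this Python version only illustrates the idea, the function names are hypothetical, and "flattening" is simplified here to collapsing whitespace.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.+?)\s*$")


def parse_taxonomy(markdown: str) -> dict[str, str]:
    """Map each ### heading to the prose beneath it, collapsed to one line."""
    definitions: dict[str, str] = {}
    label, buf = None, []
    # A sentinel heading flushes the final section without a special case.
    for line in markdown.splitlines() + ["# end"]:
        match = HEADING.match(line)
        if match:
            if label is not None:
                definitions[label] = " ".join(" ".join(buf).split())
            # Only level-3 headings open a definition; any other level closes one.
            label = match.group(2) if len(match.group(1)) == 3 else None
            buf = []
        elif label is not None:
            buf.append(line)
    return definitions


def check_coverage(labels, definitions) -> None:
    """Fail the build if any matrix label lacks a non-empty definition."""
    missing = [lab for lab in labels if not definitions.get(lab, "").strip()]
    if missing:
        raise SystemExit(f"Taxonomy.md has no definition for: {missing}")
```

Failing the build instead of rendering an empty popup means a renamed row is caught at `pnpm parse` time rather than by a reader.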
…ve search
Three Papers Explorer improvements:
- Axis labels are now buttons that open a definition popup (hover, focus, or click/tap) showing the Taxonomy.md entry, with a "View full definition" link out to /taxonomy/. The popup is fixed-positioned at the component root so it escapes the matrix pane's overflow clipping; dismissed on Escape, scroll, or outside click.
- Selecting a research area ranks the method rows by paper count (descending) for that area, surfacing the most-studied methods first.
- The search box now filters the whole matrix: matching ref ids drive the cell counts (non-matching cells dim out), and a global results list appears in the side panel when a query is active with no cell selected. Previously the box was a no-op until a cell was selected.
Also fix the BASE_URL join for the taxonomy links (was producing "/caailtaxonomy/"; normalise like the other components) and refresh the e2e ground-truth counts that lagged the matrix audit (195 papers; the Deep Learning x Cellular Engineering cell is now 6, with Ji 2021 moved to the masked-LM foundation-model row). New e2e tests cover the popup, reorder, and search.
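The BASE_URL join bug is the usual missing-separator problem, shown here in Python for illustration (the site code itself is TypeScript). The base value "/caail" is inferred from the "/caailtaxonomy/" symptom, and `join_base` is a hypothetical helper, not the site's actual normaliser.

```python
def join_base(base_url: str, path: str, fragment: str = "") -> str:
    """Join a site base and a page path with exactly one slash between segments."""
    parts = [segment.strip("/") for segment in (base_url, path)]
    joined = "/".join(segment for segment in parts if segment)
    url = f"/{joined}/" if joined else "/"
    return f"{url}#{fragment}" if fragment else url


# Naive concatenation reproduces the symptom when the base lacks a trailing slash.
assert "/caail" + "taxonomy/" == "/caailtaxonomy/"

# The normalised join gives the same result however the slashes arrive.
assert join_base("/caail", "taxonomy/") == "/caail/taxonomy/"
assert join_base("/caail/", "/taxonomy/") == "/caail/taxonomy/"
assert join_base("/", "taxonomy", "chemometrics") == "/taxonomy/#chemometrics"
```

Stripping slashes from both sides before joining makes the helper indifferent to whether the configured base ends in a slash, which is the case that bit here.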
… the page
A dense cell (e.g. Benchmarks & Evaluation Frameworks × AI Evaluation & Benchmarking, 19 papers) made the reference panel tall enough to stretch the grid row and push the whole page taller. Cap .px-panel to the viewport (max-height + overflow-y:auto) with align-self:start so it sizes to its own content, and make it sticky below Starlight's top nav so it stays in view as a side dialogue. On the stacked narrow layout the panel is static and scrolls within 70vh. Verified: with the 19-paper cell the panel scrolls internally (839px box, 4006px content) and no longer drives the row height.
The matrix is taller than the viewport, so scrolling to lower method rows
lost the area-column labels. Bound .px-mxpane to the viewport height so it
becomes the vertical scroll container (keeping overflow:auto for the
existing horizontal scroll), then make the header row (.px-corner + .px-hd)
sticky at top:0 with a solid background that masks the rows passing under.
Two layout fixes were needed for a clean freeze:
- Drop the pane's top padding so the header pins flush at the pane edge;
otherwise a ~13px padding strip above the pinned header showed scrolling
cells.
- Give .px-hd height:100% so every tab fills the row track. The track is
sized by the tallest header ("AI Evaluation & Benchmarking"), so shorter
tabs left a band below them where a data row bled through; filling the
track aligns all tab bottoms with the corner into one solid masking band.
Verified: headers pin uniformly (71px), bottoms aligned with the corner,
no content bleeds above or beside the band across the full scroll range.
… field-gap papers
Reconciles the eight ResearchAreas pages against main after the #32 field-gap additions and the #33 matrix-classification audit, which rewrote Papers.md (new rows incl. Reinforcement Learning and Chemometrics, Foundation Models split into five sub-rows, a 7th AI Evaluation & Benchmarking column, the Bioprocess column renamed to "Bioprocess & Scale-Up", and several references removed/reclassified).
Correctness:
- Remove four dangling anchors whose refs the audit deleted: #22 (Lao), #151/#152 (Gu CHI/HCI), #163 (MoleCode).
- Rename the Bioprocess page H1/lede to "Bioprocess & Scale-Up" and relabel every cross-link to match the renamed column.
- Reframe reclassified refs: #60 (Mathieu) is now a cultivated-meat perspective rather than "the one column study"; #90/#126/#62 corrected from Cellular-Engineering/Bioprocess cells to their AI Tooling cells; #68 recharacterized (LLM literature-extraction + hybrid GEM/DL predictor, not a RAG design workflow).
Completeness: integrate ~39 newly-added column papers, each grounded in the paper's full text via the caail Zotero library and gated by the caail-claim-reviewer (writer != reviewer):
- Media: GA/evolutionary + one-shot DoE media work (#210, #211, #212, #213).
- Cellular Engineering: CellFM (#235), porcine-adipocyte readout (#218), and the active-learning/strain-design multi-listings (#63, #66, #68).
- Bioprocess & Scale-Up: a reinforcement-learning control cluster (#200-#203), hybrid mechanistic+ML control (#204, #205), ML soft sensors (#206-#208), GA+ANN viral production (#209), and microbial volatile prediction (#27).
- Scaffolding: ML scaffold/print-quality prediction (#214, #215, #216) and a nondestructive-characterization section (#217).
- Sensory Prediction: MeatScan freshness data (#196) and the Tac generative burger study (#236).
- AI Tooling: ProCyon (#224), discovery/agent-infra refs (#219-#223), the chemometrics R packages (#198, #199), and D-GEX (#4).
- AI Evaluation: BioML-bench (#225), ARIEL (#53), and State/Cell-Eval (#57).
Verified: 0 dangling anchors, every matrix-column paper represented in its page, and `pnpm --dir site build` succeeds.
Summary
This branch grew from the matrix-classification audit into a three-part body of work. It re-audits
the `Papers.md` matrix against each paper's methods section, introduces a trusted Taxonomy as the
single source of truth for every row/column, and upgrades the Papers Explorer on the docs site to
surface that taxonomy and make the matrix genuinely browsable.
1 · Matrix audit + multi-category reclassification (`Papers.md`)
Re-audits each matrix paper from its methods text (pulled from the caail Zotero full-text cache) and
applies multi-category placements where a paper substantively uses more than one method. Matrix
cells only: no reference text changed, no IDs renumbered. Each placement was proposed by a
methods-reading agent, survived 3 independent adversarial skeptics, and the low-confidence ones were
re-checked by a fresh agent. Also adds the Chemometrics row, resolves deferred full-text items
(#52/#103/#104), and catalogues the MeatScan cow-meat image dataset (`Datasets/Cow.md`,
companion to #196).
2 · Trusted Taxonomy (`Taxonomy.md`, new)
Canonical, CAAIL-specific definition of every method row and research-area column: what each covers,
what's out of scope, and how to tell confusable categories apart. Every matrix axis label now links
to its `Taxonomy.md` definition (Wikipedia removed entirely), and the "Bioprocess control" column
is renamed "Bioprocess & Scale-Up" to make the reactor/scale-up scope explicit. Taxonomy doubles
as the grounding source for the audit, replacing the untrusted (AI-drafted, stale) `cited_in` signal.
3 · Papers Explorer upgrades (docs site)
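- Axis labels open a definition popup showing the `Taxonomy.md` entry on hover, focus, or
  click/tap, with a "View full definition →" link to `/taxonomy/`. Fixed-positioned so it escapes the
  matrix pane's overflow; dismissed on Escape / scroll / outside click.
- Selecting a research area ranks the method rows by paper count for that area, surfacing
  the most-studied methods first.
- The search box filters the whole matrix and, with a query active and no cell selected,
  shows a global results list; previously it was a no-op until a cell was selected.
- `taxonomy.ts` parser emits `taxonomy.json` (label → definition) with a build-time guard that
  fails the build if any matrix row/column lacks a definition; renders `Taxonomy.md` at
  `/taxonomy/`; fixes a `BASE_URL` join bug that produced `/caailtaxonomy/…` links.
Reusable tooling: Phase 4 of the Zotero⇄CAAIL lifecycle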
- `.claude/agents/caail-classification-reviewer.md`: read-only, full-text-grounded reviewer of `(method × area)` placement.
- `.claude/skills/matrix-classification-audit/`: `SKILL.md`, `extract_matrix_corpus.py`, a cost-efficient skim→gate run path, and a deterministic `prefilter_corpus.py`. Registered in `CLAUDE.md` (now a 4-skill lifecycle). Corpus is gitignored.
The audit also surfaced moves / removes / "leave the matrix" proposals that challenge CAAIL's
choice to catalogue general/foundational methods; these were triaged interactively with the
maintainer rather than auto-applied.
Test Plan
- `pnpm --dir site lint:papers`: 0 errors (no dangling anchors, no orphaned primary refs)
- `pnpm --dir site test`: 296/296 vitest pass (incl. new taxonomy parser tests + multi-cell)
- `pnpm --dir site test:e2e`: 34/34 Playwright pass incl. axe a11y + new popup/reorder/search tests
- `pnpm --dir site build`: 38 pages; `generate-data.ts` cross-tally + taxonomy coverage guard pass
🤖 Generated with Claude Code