feat: matrix classification audit, Taxonomy, and Papers Explorer upgrades #33

Merged
benjibromberg merged 21 commits into main from worktree-feat+matrix-classification-audit
Jun 10, 2026

@benjibromberg benjibromberg commented Jun 2, 2026

Member

Summary

This branch grew from the matrix-classification audit into a three-part body of work. It re-audits
the Papers.md matrix against each paper's methods section, introduces a trusted Taxonomy
as the single source of truth for every row/column, and upgrades the Papers Explorer on the
docs site to surface that taxonomy and make the matrix genuinely browsable.

1 · Matrix audit + multi-category reclassification (Papers.md)

Re-audits each matrix paper from its methods text (pulled from the caail Zotero full-text cache) and
applies multi-category placements where a paper substantively uses more than one method. Matrix
cells only — no reference text changed, no IDs renumbered. Each placement was proposed by a
methods-reading agent, survived 3 independent adversarial skeptics, and the low-confidence ones were
re-checked by a fresh agent. Also adds the Chemometrics row, resolves deferred full-text items
(#52/#103/#104), and catalogues the MeatScan cow-meat image dataset (Datasets/Cow.md,
companion to #196).

2 · Trusted Taxonomy (Taxonomy.md, new)

Canonical, CAAIL-specific definition of every method row and research-area column — what each covers,
what's out of scope, and how to tell confusable categories apart. Every matrix axis label now links
to its Taxonomy.md definition (Wikipedia removed entirely), and the "Bioprocess control" column
is renamed "Bioprocess & Scale-Up" to make the reactor/scale-up scope explicit. Taxonomy doubles
as the grounding source for the audit, replacing the untrusted (AI-drafted, stale) cited_in signal.

3 · Papers Explorer upgrades (docs site)

  • Definition popups — axis labels open a popup with the Taxonomy definition on hover, focus, or
    click/tap, with a "View full definition →" link to /taxonomy/. Fixed-positioned so it escapes the
    matrix pane's overflow; dismissed on Escape / scroll / outside click.
  • Frequency reorder — selecting a research area ranks the method rows by paper count, surfacing
    the most-studied methods first.
  • Working search — the search box now filters the whole matrix (non-matching cells dim out) and
    shows a global results list; previously it was a no-op until a cell was selected.
  • Plumbing: a new taxonomy.ts parser emits taxonomy.json (label → definition) with a build-time
    guard that fails the build if any matrix row/column lacks a definition; renders Taxonomy.md
    at /taxonomy/; fixes a BASE_URL join bug that produced /caailtaxonomy/… links.
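The frequency reorder and whole-matrix search described in the list above can be sketched roughly as follows. This is an illustrative Python sketch, not the Explorer's TypeScript: the matrix shape (a mapping from `(method, area)` to paper titles) and the names `rank_methods` and `search` are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical matrix shape: (method, area) -> list of paper titles.
Matrix = dict[tuple[str, str], list[str]]


def rank_methods(matrix: Matrix, area: str) -> list[str]:
    """Order method rows by paper count in the selected area, most first."""
    counts: dict[str, int] = defaultdict(int)
    for (method, cell_area), papers in matrix.items():
        # Every method gets an entry, so rows with zero papers still appear.
        counts[method] += len(papers) if cell_area == area else 0
    # Ties fall back to alphabetical order so the ranking is deterministic.
    return sorted(counts, key=lambda m: (-counts[m], m))


def search(matrix: Matrix, query: str) -> tuple[set[tuple[str, str]], list[str]]:
    """Return the cells that stay lit plus a global, de-duplicated results list."""
    needle = query.casefold()
    lit: set[tuple[str, str]] = set()
    results: list[str] = []
    for cell, papers in matrix.items():
        hits = [p for p in papers if needle in p.casefold()]
        if hits:
            lit.add(cell)
        for hit in hits:
            if hit not in results:
                results.append(hit)
    return lit, results
```

Cells absent from the returned `lit` set are the ones a UI would dim; the search works without any cell being selected first, which is the behavior change the list describes.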

Reusable tooling — Phase 4 of the Zotero⇄CAAIL lifecycle

  • .claude/agents/caail-classification-reviewer.md — read-only, full-text-grounded reviewer of
    (method × area) placement.
  • .claude/skills/matrix-classification-audit/SKILL.md, together with extract_matrix_corpus.py,
    a cost-efficient skim→gate run path, and a deterministic prefilter_corpus.py. Registered in
    CLAUDE.md (now a 4-skill lifecycle). Corpus is gitignored.

The audit also surfaced moves / removes / "leave the matrix" proposals that challenge CAAIL's
choice to catalogue general/foundational methods — these were triaged interactively with the
maintainer rather than auto-applied.

Test Plan

  • pnpm --dir site lint:papers → 0 errors (no dangling anchors, no orphaned primary refs)
  • pnpm --dir site test → 296/296 vitest pass (incl. new taxonomy parser tests + multi-cell)
  • pnpm --dir site test:e2e → 34/34 Playwright pass incl. axe a11y + new popup/reorder/search tests
  • pnpm --dir site build → 38 pages; generate-data.ts cross-tally + taxonomy coverage guard pass
  • Fresh-agent adversarial re-check on the lowest-confidence adds (caught + reverted one over-eager placement)
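For readers unfamiliar with the taxonomy coverage guard exercised by the build step, the idea can be sketched as below. This is a hedged Python illustration only: the real guard lives in the site's TypeScript build, and the function name, argument shapes, and error message here are assumptions.

```python
def check_taxonomy_coverage(axis_labels: list[str], definitions: dict[str, str]) -> None:
    """Abort the build if any matrix row/column label has no definition.

    axis_labels: every method row and research-area column in the matrix.
    definitions: label -> definition text (the shape taxonomy.json is
                 described as having; the exact schema is assumed here).
    """
    missing = sorted(
        label for label in axis_labels
        if not definitions.get(label, "").strip()  # absent or blank both count
    )
    if missing:
        # A non-zero exit is what makes a build script fail.
        raise SystemExit(
            "taxonomy coverage guard: no definition for " + ", ".join(missing)
        )
```

Treating a blank definition the same as a missing one is a design choice in this sketch; it keeps a placeholder heading from silently satisfying the guard.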

🤖 Generated with Claude Code

Add Phase 4 of the Zotero⇄CAAIL lifecycle: a methods-grounded re-audit of
the Papers.md matrix itself.

- caail-classification-reviewer agent: read-only, full-text-grounded reviewer
  of (method × area) placement, distinct from the bibliographic citation
  reviewer. Verdicts DEFENSIBLE / MISPLACED / UNSUPPORTED per cell, plus
  MISSING-CELL recommendations and a NOT-PRIMARY flag.
- matrix-classification-audit skill + extract_matrix_corpus.py: parses the
  matrix and references out of Papers.md, indexes both Zotero groups by DOI
  and URL, and pulls each matrix paper's methods section from the PDF
  full-text cache into a per-ref corpus for adversarial review.
- Register the new skill (Phase 4) and reviewer in CLAUDE.md; cross-reference
  it from zotero-to-caail-sync; gitignore the corpus build artifact.

benjibromberg commented Jun 2, 2026

Member Author

Held audit proposals — for human triage

The methods-grounded audit behind this PR also surfaced 9 moves, 27 removes, and 14 “leave the matrix” proposals. They were deliberately not applied here because they challenge CAAIL's curatorial choice to catalogue general/foundational methods (the strict reviewer wanted papers like scGPT, GEARS, UCE, SWE-bench, GPQA out of cell-ag columns). Each cleared a majority of 3 independent adversarial skeptics, but each is a curatorial call — tick the ones to action in a follow-up.

They split into two natures: (A) method-accuracy fixes (the cell names the wrong technique) and (B) scope/philosophy calls (method is right, but the application isn't cell-ag-specific). A-type fixes are the safer subset.

Update — #33 overturned. A full-text + domain-literature re-review (stirred-tank mixing CFD is core to cultivated-meat bioreactor scale-up; ResearchAreas/Bioprocess.md already cites this paper) reversed both #33 proposals — it stays in CNN × Bioprocess control. Its struck entries below are kept for the record. The over-strictness that flagged #33 likely affects other (B) scope/philosophy removals too — re-check before actioning.

Moves — reclassify an existing cell (9)

  • #6 · Ji et al. 2021 · DNABERT: pre-trained Bidirectional Encoder Representations from Transformers model for DNA-language in genome (method+scope)
    • Deep Learning × Cellular Engineering → Foundation Models: Masked Language Modeling × AI Tooling / Methodology
    • DNABERT follows the same training process as BERT... we significantly modified the pretraining process from the original BERT implementation by removing next sentence prediction, adjusting the sequence length and forcing the model to predict contiguous k tokens adapting to DNA scenario. During pre-t…
  • #13 · Wang et al. 2021 · scGNN is a novel graph neural network framework for single-cell RNA-Seq analyses (scope/philosophy)
    • GNN × Cellular Engineering → GNN × AI Tooling / Methodology
    • scGNN integrates three iterative multi-modal autoencoders and outperforms existing tools for gene imputation and cell clustering on four benchmark scRNA-Seq datasets. In an Alzheimer's disease study with 13,214 single nuclei from postmortem brain tissues, scGNN successfully illustrated disease-relat…
  • #17 · Cosenza & Block 2021 · A generalizable hybrid search framework for optimizing expensive design problems using surrogate models (method+scope)
    • Genetic Algorithms × Media Optimization → Genetic Algorithms × AI Tooling / Methodology
    • The NNGA algorithm is based on an RBF-assisted genetic algorithm. The NNGA uses an RBF model to suggest points that are close to but not directly on top of optima, using a truncated genetic algorithm (TGA).
  • #34 · Andrews et al. 2025 · Designing cultured tissue moulds using evolutionary strategies (method-accuracy)
    • SVM × Scaffolding → Genetic Algorithms × Scaffolding
    • Genetic algorithms (GA) are used here to find optimal mould designs. They are a form of optimisation algorithm that can be used to find solutions of complex or abstract problems. Used as a design tool, they constitute a form of artificial or computational intelligence.
  • #35 · Andrews et al. 2023 · Rapid prediction of lab-grown tissue properties using deep learning (method-accuracy)
    • Deep Learning × Scaffolding → GAN / VAE × Scaffolding
    • We use the TensorFlow framework for machine learning to implement the pix2pix conditional GAN (cGAN) described in [25].
  • #40 · Gao et al. 2025 · TxAgent: An AI Agent for Therapeutic Reasoning Across a Universe of Tools (method-accuracy)
    • General-Purpose Biomedical Agents × AI Tooling / Methodology → Domain-Specific Biomedical Agents × AI Tooling / Methodology
    • TOOLUNIVERSE has 211 biomedical tools, covering the following categories: adverse events, risks, safety; addiction and abuse; drug patient populations; drug administration and handling; pharmacology; drug use, mechanism, composition; ID and labeling tools; general clinical annotations; clinical labo…
  • #53 · Liu et al. 2026 · Advancing AI Research Assistants with Expert-Involved Learning (method-accuracy)
    • Scientific Literature & Discovery Agents × AI Tooling / Methodology → Benchmarks & Evaluation Frameworks × AI Tooling / Methodology
    • we propose a new dataset designed for evaluating the ability of FMs for long document summarization and scientific figure understanding... we collected the ground truth information paired with model outputs and performed a quantitative assessment with various metrics
  • #118 · Rosen et al. 2024 · Toward universal cell embeddings: integrating single-cell RNA-seq datasets across species with SATURN (scope/philosophy)
    • Foundation Models: LM + Biological Priors × Cellular Engineering → Foundation Models: LM + Biological Priors × AI Tooling / Methodology
    • Applying SATURN to three species whole-organism atlases and frog and zebrafish embryogenesis datasets, we show that SATURN can effectively transfer annotations across species, even when they are evolutionarily remote. We also demonstrate that SATURN can be used to find potentially divergent gene fun…
  • #126 · Youngblut et al. 2025 · scBaseCount: an AI agent-curated, uniformly processed, and autonomously updated single cell data repository (scope/philosophy)
    • Domain-Specific Biomedical Agents × Cellular Engineering → Domain-Specific Biomedical Agents × AI Tooling / Methodology
    • SRAgent is a Python package that utilizes LangGraph for constructing the agentic workflows... To comply with NCBI API rate limits, jobs are triggered every 1-5 minutes, processing 3-5 datasets per run... All extracted metadata is stored in a GCP SQL database for downstream processing.

Removes — drop an existing cell (27)

  • #1 · Nikkhah et al. 2023 · Toward sustainable culture media: Using artificial intelligence to optimize reduced-serum formulations for cultivated meat (method-accuracy)
    • Deep Learning × Media Optimization
    • The paper uses a Radial Basis Function (RBF) neural network, which is explicitly a shallow, single-hidden-layer architecture. The methods state: 'RBF has fewer parameters requiring optimization compared to the widely used multilayer perceptron (MLP) neural networks, as it has only one hidden layer a…
  • #5 · Li et al. 2020 · Deep learning enables accurate clustering with batch effect removal in single-cell RNA-seq analysis (scope/philosophy)
    • Deep Learning × Cellular Engineering
    • The paper presents DESC, a general-purpose deep autoencoder-based scRNA-seq clustering tool applied exclusively to macaque retina bipolar cells, human pancreatic islet cells, and human PBMCs from lupus patients. None of these are cellular agriculture contexts. The methods text contains no mention of…
  • #7 · Tamburini et al. 2014 · Monitoring Key Parameters in Bioprocesses Using Near-Infrared Technology (method-accuracy)
    • Deep Learning × Bioprocess control
    • No deep learning is used anywhere in this paper. The methods section describes exclusively classical chemometric and statistical techniques: MLR (Multiple Linear Regression) for on-line monitoring ('an MLR analysis was carried out'), PCA ('PCA was carried out on the acquired spectra in order to sele…
  • #11 · Shen et al. 2024 · Chemometrics methods, sensory evaluation and intelligent sensory technologies combined with GAN-based integrated deep-learning framework to discriminate salted goose breeds (method-accuracy)
    • CNN × Sensory Prediction
    • No CNN is described anywhere in the methods text. Section 2.8 and the abstract describe an InfoGAN for data augmentation and 'several base classifiers' fused via dynamic weighting, but no convolutional neural network architecture is mentioned or used. The full methods excerpt (84,909 chars available…
  • #17 · Cosenza & Block 2021 · A generalizable hybrid search framework for optimizing expensive design problems using surrogate models (method+scope)
    • Deep Learning × Media Optimization
    • The paper contains no neural network or deep learning architecture. Section 2.2 explicitly states: 'The NNGA algorithm is based on an RBF-assisted genetic algorithm.' The RBF surrogate model (Section 2.1) is a classical radial basis function interpolation (cubic RBF with a linear tail), not a deep l…
  • #17 · Cosenza & Block 2021 · A generalizable hybrid search framework for optimizing expensive design problems using surrogate models (method+scope)
    • Genetic Algorithms × Media Optimization
    • The paper does apply a genetic algorithm (the TGA in NNGA, Section 2.2: 'ranking, pairing, crossover and mutation steps'), but all experiments are on mathematical benchmark test functions (Ackley, Rastrigin, Griewank, Levy, Michalewicz, Rosenbrock, etc. — Table 1, Section 2.5). No media optimization…
  • #18 · Cosenza 2022 · Sequential Learning Methods for the Experimental Optimization of Cell Culture Media for Cellular Agriculture (method-accuracy)
    • Deep Learning × Media Optimization
    • No deep learning method is applied by the dissertation author. The only reference to neural networks appears in the background survey: 'neural networks have been used to optimize bioreactor cultures [46] and multi-objective protein storage conditions [68]' — attributing this work to other cited auth…
  • #22 · Lao et al. 2022 · Global coordination of the mutation and growth rates across the genetic and nutritional variety in Escherichia coli (scope/philosophy)
    • SVM × Cellular Engineering
    • The SVM in this paper is applied to classify E. coli genotype categories (wild-type vs. reduced-genome vs. mutator strains) and media types, and to predict mutation/growth rates in a basic microbiology study of E. coli mutation-growth-rate trade-offs. There is no cellular engineering application in …
  • #33 · Rojek et al. 2021 · AI-Accelerated CFD Simulation Based on OpenFOAM and CPU/GPU Computing. In M. Paszynski, D. Kranzlmüller, V. V. Krzhizhanovskaya, J. J. Dongarra, & P. M. A. Sloot (Eds.), (scope/philosophy)
    • Overturned → KEEP CNN × Bioprocess control. Full-text re-review + domain analysis: stirred-tank mixing CFD is the core engineering challenge of cultivated-meat bioreactor scale-up (STRs dominate; impeller shear is the central animal-cell constraint), the method is general OpenFOAM stirred-tank CFD, and CAAIL's own ResearchAreas/Bioprocess.md already cites this paper. The “out of scope” call was over-strict.
  • #34 · Andrews et al. 2025 · Designing cultured tissue moulds using evolutionary strategies (method-accuracy)
    • SVM × Scaffolding
    • There is no mention of SVM anywhere in the paper. The methods are a genetic algorithm combined with the RAPTOR deep-learning tissue-organisation model and CONDOR biophysical simulations. SVM does not appear as a model, baseline, or comparator.
  • #59 · Antonakoudis & Richelle 2026 · Systematic data-driven genome-scale metabolic model reduction for bioprocess modeling: CHO culture case study (method-accuracy)
    • Bayesian Optimization × Bioprocess control
    • The paper applies Bayesian flux estimation (the MetRaC framework) to derive uncertainty-aware uptake/secretion rate bounds for metabolic model reduction — this is Bayesian statistical inference, not Bayesian Optimization. Bayesian Optimization requires a surrogate model (e.g., Gaussian Process) and …
  • #60 · Mathieu et al. 2025 · Integrative multi-omics modeling for cultivated meat production, quality, and safety (scope/philosophy)
    • Deep Learning × Cellular Engineering
    • No deep learning method is used anywhere in the paper. The methods describe a directed-graph interactome model with random walk network propagation and z-score statistical testing for causal analysis: 'The causal analysis algorithm employed in this paper scores and ranks interactome nodes based on r…
  • #68 · Li et al. 2024 · Leveraging large language models for metabolic engineering design (method-accuracy)
    • Domain-Specific Biomedical Agents × AI Tooling / Methodology
    • The paper's LLM usage is for supervised NER and relation extraction (fine-tuned Qwen1.5 with LoRA), not an autonomous agent system with tool use or agentic reasoning. Methods state: 'we used the Qwen Lora to extract strain ID and gene entities from segments of research papers along with correspondin…
  • #90 · Yu et al. 2026 · GenCellAgent: Generalizable, Training-Free Cellular Image Segmentation via Large Language Model Agents (scope/philosophy)
    • Domain-Specific Biomedical Agents × Cellular Engineering
    • The paper concerns generalizable microscopy image segmentation for general cell biology (mitochondria, ER, Golgi, diverse cell types from mouse brain, human pancreas, plant roots). The HTML full text confirms 'zero references to cellular agriculture, cultivated meat, food science, or bioengineering …
  • #103 · Margulis et al. 2021 · Intense bitterness of molecules: Machine learning for expediting drug discovery (scope/philosophy)
    • Ensemble Learning × Sensory Prediction
    • The paper develops BitterIntense, an XGBoost classifier that predicts intense bitterness of pharmaceutical molecules to aid drug discovery. The application domain is entirely pharmaceutical compliance (pediatric/geriatric drug palatability) with no connection to cellular agriculture. The CAAIL 'Sens…
  • #110 · Sze & Hassoun 2024 · Evaluation of search-enabled pretrained Large Language Models on retrieval tasks for the PubChem database (scope/philosophy)
    • Benchmarks & Evaluation Frameworks × AI Evaluation & Benchmarking
    • The paper does not create a benchmark dataset or evaluation framework as a reusable artifact. It is an evaluation study of GPT-4o on eight pre-existing PubChem retrieval protocols. The methods describe adapting existing protocols into prompts and prompt-engineering them ('we develop a methodology fo…
  • #114 · Yang et al. 2024 · Reply to: Deeper evaluation of a single-cell foundation model (scope/philosophy)
    • Benchmarks & Evaluation Frameworks × AI Evaluation & Benchmarking
    • This paper is a 'Matters arising' reply letter defending the original scBERT paper against the Boiarsky et al. critique. It runs a limited set of defensive comparison experiments (scBERT vs. L1 logistic regression on cross-organ cell-type annotation) but does not create, propose, or release any benc…
  • #119 · Rosen, Y., Roohani, Y., Agrawal, A., Samotorčan, L., Tabula Sapiens Consortium, Quake, S. R., & Leskovec, J. 2026 · Universal Cell Embeddings: A Foundation Model for Cell Biology (method-accuracy)
    • Foundation Models: Masked Language Modeling × Cellular Engineering
    • No span in the methods_text or abstract identifies UCE's pre-training objective as masked language modeling. The methods excerpt compares UCE to Geneformer and scGPT (implicitly distinguishing UCE's approach from theirs) but never describes UCE as using a masked LM objective. UCE is described as 'co…
  • #121 · Roohani et al. 2024 · Predicting transcriptional outcomes of novel multigene perturbations with GEARS (method-accuracy)
    • Foundation Models: Cell-State & Perturbation Prediction × Cellular Engineering
    • GEARS is a task-specific GNN architecture trained end-to-end from scratch on perturbation datasets, not a pre-trained foundation model. The methods describe two GNN encoders (fpert and fgene) plus MLP components trained with an autofocus direction-aware loss — there is no large pre-trained model, no…
  • #122 · Magnusson, J. P., Roohani, Y., Stauber, D., Situ, Y., Teba, P. R. de C., Sandberg, R., Leskovec, J., & Qi, L. S. 2024 · PreciCE: Precision engineering of cell fates via data-driven multi-gene control of transcriptional networks (method-accuracy)
    • Deep Learning × Cellular Engineering
    • The abstract states 'a machine learning-based computational algorithm that uses single-cell RNA sequencing data to predict multi-gene perturbation sets' but does not specify deep learning. The methods_text excerpt covers only wet-lab CRISPR/scRNA-seq protocols and contains no description of a neural…
  • #122 · Magnusson, J. P., Roohani, Y., Stauber, D., Situ, Y., Teba, P. R. de C., Sandberg, R., Leskovec, J., & Qi, L. S. 2024 · PreciCE: Precision engineering of cell fates via data-driven multi-gene control of transcriptional networks (method-accuracy)
    • Foundation Models: Cell-State & Perturbation Prediction × Cellular Engineering
    • The abstract describes 'a machine learning-based computational algorithm' but never mentions a foundation model, pre-trained model, transformer, or any architecture associated with cell-state or perturbation prediction foundation models (e.g., scGPT, Geneformer, GEARS). The methods_text excerpt cove…
  • #151 · Gu et al. 2024 · How Do Analysts Understand and Verify AI-Assisted Data Analyses? (scope/philosophy)
    • Domain-Specific Biomedical Agents × AI Tooling / Methodology
    • The paper is a CHI 2024 HCI user study examining how human data analysts verify AI-generated analyses using a purpose-built design probe. It does not develop or apply any biomedical agent — the AI system studied is a generic code-interpreter assistant, and the datasets used are retail, movie, and fl…
  • #152 · Gu et al. 2024 · How Do Data Analysts Respond to AI Assistance? A Wizard-of-Oz Study (scope/philosophy)
    • Domain-Specific Biomedical Agents × AI Tooling / Methodology
    • The paper is a CHI Wizard-of-Oz user study about how data analysts respond to AI planning assistance. It is neither biomedical nor domain-specific to biomedicine — the running example throughout the methods is a soccer referee/skin-tone dataset. No actual AI agent is built or deployed; the 'wizard' …
  • #155 · Jimenez et al. 2024 · SWE-bench: Can Language Models Resolve Real-World GitHub Issues? (scope/philosophy)
    • Benchmarks & Evaluation Frameworks × AI Evaluation & Benchmarking
    • SWE-bench is an evaluation framework for software engineering tasks (resolving GitHub issues in Python OSS repos: astropy, django, matplotlib, etc.). The methods section covers BM25 retrieval for code files, context window limits, patch generation, and model performance on software debugging — with …
  • #156 · Rein et al. 2023 · GPQA: A Graduate-Level Google-Proof Q&A Benchmark (scope/philosophy)
    • Benchmarks & Evaluation Frameworks × AI Evaluation & Benchmarking
    • GPQA is a general scientific reasoning benchmark covering biology (Molecular Biology, Genetics), physics (Astrophysics, Quantum Mechanics), and chemistry (Organic Chemistry, General Chemistry). The methods text describes a question-writing protocol, expert/non-expert validation stages, and domain br…
  • #157 · Wang et al. 2024 · MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark (scope/philosophy)
    • Benchmarks & Evaluation Frameworks × AI Evaluation & Benchmarking
    • MMLU-Pro is a general-purpose academic language understanding benchmark spanning 14 subjects (Math, Physics, Engineering, History, Law, Psychology, etc.) with zero cellular agriculture content. The methods text confirms the benchmark tests models on reasoning across generic academic disciplines. The…
  • #196 · Gyening et al. 2025 · MeatScan: An image dataset for machine learning-based classification of fresh and spoiled cow meat (scope/philosophy)
    • Benchmarks & Evaluation Frameworks × AI Evaluation & Benchmarking
    • The paper is a Data in Brief dataset descriptor for MeatScan, an image dataset of conventional (slaughtered) cow meat in Ghanaian markets. The benchmark framing is aspirational and secondary: the methods state the dataset 'could also serve as a benchmark dataset for evaluating the performance and ro…

Not-primary — proposed to leave the matrix entirely (14)

  • #5 · Li et al. 2020 · Deep learning enables accurate clustering with batch effect removal in single-cell RNA-seq analysis (scope/philosophy)
    • → Software.md
    • DESC is a general-purpose bioinformatics tool paper for scRNA-seq clustering. Its applications are entirely biomedical (macaque retina, human pancreas, human PBMCs/lupus). It does not apply an AI method to any cellular agriculture research problem. The methods text confirms: 'we analyzed a scRNA-seq…
  • #7 · Tamburini et al. 2014 · Monitoring Key Parameters in Bioprocesses Using Near-Infrared Technology (method-accuracy)
    • → Reviews & Perspectives
    • The paper applies classical chemometric and statistical methods (MLR, PCA, PLS) paired with NIR spectroscopy to bioprocess monitoring — not any of the AI/ML methods in the matrix's valid row vocabulary. The methods section states: 'an MLR analysis was carried out' (on-line set), 'PCA was carried out…
  • #22 · Lao et al. 2022 · Global coordination of the mutation and growth rates across the genetic and nutritional variety in Escherichia coli (scope/philosophy)
    • → Reviews & Perspectives
    • Although this paper applies SVM to a scientific problem, it has no cellular agriculture relevance. The study examines E. coli mutation and growth rates across genetic variants (reduced-genome and mutator strains) and nutritional media as a fundamental microbiology/evolutionary biology investigation.…
  • #33 · Rojek et al. 2021 · AI-Accelerated CFD Simulation Based on OpenFOAM and CPU/GPU Computing. In M. Paszynski, D. Kranzlmüller, V. V. Krzhizhanovskaya, J. J. Dongarra, & P. M. A. Sloot (Eds.), (scope/philosophy)
    • Overturned → KEEP CNN × Bioprocess control. Full-text re-review + domain analysis: stirred-tank mixing CFD is the core engineering challenge of cultivated-meat bioreactor scale-up (STRs dominate; impeller shear is the central animal-cell constraint), the method is general OpenFOAM stirred-tank CFD, and CAAIL's own ResearchAreas/Bioprocess.md already cites this paper. The “out of scope” call was over-strict.
  • #60 · Mathieu et al. 2025 · Integrative multi-omics modeling for cultivated meat production, quality, and safety (scope/philosophy)
    • → Reviews & Perspectives
    • This is a perspective/framework paper proposing an integrative multi-omics methodology rather than a primary research paper applying a specific AI/ML method. The abstract states 'we discuss the potential of an integrative multi-omics approach' — the language of a perspective, not an experimental app…
  • #103 · Margulis et al. 2021 · Intense bitterness of molecules: Machine learning for expediting drug discovery (scope/philosophy)
    • → Software.md (as a bitterness prediction tool for pharma/food science, if deemed relevant to cell-ag taste engineering at all) or removed from CAAIL entirely given the absence of any cellular agriculture application
    • This paper applies an AI method (XGBoost ensemble) to a concrete prediction problem, but the problem domain — predicting pharmaceutical drug bitterness for drug discovery compliance — has no connection to cellular agriculture. The paper does not apply AI to any cell-ag research area (Media Optimizat…
  • #110 · Sze & Hassoun 2024 · Evaluation of search-enabled pretrained Large Language Models on retrieval tasks for the PubChem database (scope/philosophy)
    • → Reviews & Perspectives
    • This paper evaluates GPT-4o on PubChem database retrieval tasks (pharmaceutical chemistry: compound similarity, bioactivity, gene-protein interactions). It does not apply any AI method to a cellular agriculture problem. The evaluation domain is general biomedical/cheminformatics LLM capability, with…
  • #114 · Yang et al. 2024 · Reply to: Deeper evaluation of a single-cell foundation model (scope/philosophy)
    • → Reviews & Perspectives
    • This is a 'Matters arising' reply/correspondence letter (the paper explicitly labels itself 'Matters arising') authored by the original scBERT team in response to a critique by Boiarsky et al. It does not apply any AI/ML method to a cellular agriculture research problem as primary work. The full tex…
  • #151 · Gu et al. 2024 · How Do Analysts Understand and Verify AI-Assisted Data Analyses? (scope/philosophy)
    • → OtherResources.md
    • This is a CHI 2024 human-computer interaction user study that investigates analyst verification workflows when using AI-assisted data analysis tools. It applies no AI/ML method to any research problem — it studies human behavior around AI outputs using a design probe methodology. It has no connectio…
  • #152 · Gu et al. 2024 · How Do Data Analysts Respond to AI Assistance? A Wizard-of-Oz Study (scope/philosophy)
    • → Reviews & Perspectives
    • This is a CHI 2024 HCI user study (Wizard-of-Oz methodology) that produces design guidelines for LLM-supported data analysis planning assistants. It does not apply any AI/ML method to a cellular-agriculture research problem. The core contribution is an empirical study of human responses to AI assist…
  • #155 · Jimenez et al. 2024 · SWE-bench: Can Language Models Resolve Real-World GitHub Issues? (scope/philosophy)
    • → Reviews & Perspectives
    • SWE-bench does not apply any AI method to a cellular agriculture problem. It is a general software engineering benchmark that evaluates whether LLMs can resolve GitHub issues in Python repositories (astropy, django, matplotlib, seaborn, flask, requests, xarray, pylint, pytest, scikit-learn, sphinx, …
  • #156 · Rein et al. 2023 · GPQA: A Graduate-Level Google-Proof Q&A Benchmark (scope/philosophy)
    • → Datasets/Benchmarks.md
    • GPQA does not apply any AI/ML method to a cellular agriculture research area. It is a benchmark dataset paper that constructs a graduate-level Q&A evaluation set across general biology, physics, and chemistry domains. The methods text confirms the paper is entirely about question construction, exper…
  • #157 · Wang et al. 2024 · MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark (scope/philosophy)
    • → Datasets/Benchmarks.md
    • MMLU-Pro creates and evaluates a general-purpose LLM benchmark with no cellular agriculture scope whatsoever. The abstract states it extends MMLU 'across diverse domains' (Math, Physics, Engineering, History, Law, Psychology). The methods section describes 5-shot CoT prompting and regex-based answer…
  • #196 · Gyening et al. 2025 · MeatScan: An image dataset for machine learning-based classification of fresh and spoiled cow meat (scope/philosophy)
    • → Remove from Papers.md entirely (out of CAAIL scope); if kept, data artifact only in Datasets/Cow.md
    • MeatScan is a Data in Brief dataset descriptor whose primary contribution is a curated image dataset (11,000 RGB images of fresh/spoiled slaughtered cow meat from Ghanaian markets). The only AI experiment is an explicitly labelled 'baseline experiment' using MobileNetV2 'to demonstrate that the Meat…

Generated by the matrix-classification-audit workflow (run wf_810da7cd-742); each item cleared ≥2/3 adversarial skeptics. Not auto-applied — these are curatorial calls.
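The "cleared ≥2/3 adversarial skeptics" gate and the A/B split used in this comment can be sketched as follows. This is an illustration only: the `Proposal` shape, the vote values, and the function names are hypothetical, and the workflow's real data model is not shown in this PR.

```python
from dataclasses import dataclass


@dataclass
class Proposal:
    ref: int                 # matrix reference number, e.g. 34
    kind: str                # "move", "remove", or "not-primary"
    nature: str              # "method-accuracy", "scope/philosophy", or "method+scope"
    votes: tuple[bool, ...]  # one entry per skeptic; True = proposal upheld


def clears_skeptics(p: Proposal, needed: int = 2) -> bool:
    """A proposal is held for triage once at least `needed` skeptics uphold it."""
    return sum(p.votes) >= needed


def split_for_triage(proposals: list[Proposal]) -> dict[str, list[int]]:
    """Group cleared proposals into the safer A-type fixes and B-type scope calls."""
    groups: dict[str, list[int]] = {"A": [], "B": []}
    for p in proposals:
        if not clears_skeptics(p):
            continue
        # Assumption: anything touching scope is treated as a B-type call.
        bucket = "A" if p.nature == "method-accuracy" else "B"
        groups[bucket].append(p.ref)
    return groups
```

Nothing in this sketch applies a change; it only sorts what a maintainer would tick off by hand, which matches the "not auto-applied" stance above.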

Add 18 cross-listings where a paper substantively applies more than one
AI/ML method (verified against each paper's methods section via the
matrix-classification-audit workflow; each survived independent adversarial
review). No reference text changed; matrix cells only.

- ref 11 Shen 2024 → Ensemble Learning (InfoGAN + dynamically-weighted base
  classifiers)
- ref 20 Rafieyan 2024 → Ensemble Learning (XGBoost/GBM/RF/LightGBM)
- ref 26 Sun 2023, ref 28 Sun 2026 → SVM / Ensemble (LS-SVM, RF/GBDT/SVC)
- ref 32 Roell 2022 → Deep Learning / Ensemble / K-Nearest Neighbors
  (seven model families benchmarked for bioprocess prediction)
- ref 61 Wang 2025b, ref 93 Tang 2026 → Agent Infrastructure (LangGraph /
  hybrid knowledge frameworks)
- ref 68 Li 2024 → GNN (GEM-as-graph submodule)
- ref 117 Cui 2024, ref 120 Rizvi 2026 → Cell-State & Perturbation Prediction;
  ref 120 also Reinforcement Learning (GRPO)
- ref 161 Narayanan 2025 → Reinforcement Learning (RL-trained chemistry model)
- ref 169 Hashizume & Ying 2025 → Ensemble Learning / Genetic Algorithms
- ref 182 King 2004 → Active Learning (experiment-selection strategy)

ref 72 trains 18 models including SVM and MLP/Bayesian neural networks for
sensory (flavor) prediction under 10-fold CV — confirmed by full-text
re-verification as the paper's own applied methods, not a background
enumeration. Adds (SVM × Sensory) and (Deep Learning × Sensory) alongside
its existing Ensemble Learning placement.
@benjibromberg benjibromberg force-pushed the worktree-feat+matrix-classification-audit branch from 568dfc5 to 2fe066d Compare June 2, 2026 18:20
…movals

The audit's #33 false positive (it proposed deleting a CNN-surrogate-CFD
paper from Bioprocess control, which ResearchAreas/Bioprocess.md already
cites) traced to two gaps: the reviewer read only the paper (never CAAIL's
own curation context), and a destructive removal carried no more burden than
an additive placement. This bakes an asymmetric, context-aware burden on
scope removals into the durable tooling.

- extract_matrix_corpus.py: add per-ref cited_in_research_areas (scan
  ResearchAreas/*.md by surname+year / DOI) — an intentional-placement KEEP
  prior (correctly flags #33 -> Bioprocess control).
- caail-classification-reviewer: read the ResearchAreas/<Area>.md scope and
  honor that prior before any scope call; tag every verdict
  nature=method-accuracy|scope; default a general-method scope concern to a
  MOVE to AI Tooling / Methodology, not a removal; method-absent papers stay a
  firm method-accuracy flag; never hedge a non-fitting paper into a
  destination-less move.
- .claude/workflows/matrix-classification-audit.js: durable named workflow —
  propose -> skeptics -> (scope only) steelman defender -> gated domain-relevance
  web grounding. method-accuracy + additive changes bypass the heavy layers.
  Self-bootstraps inputs from matrix-corpus.json (args is not reliably
  delivered); fan-out pinned to Sonnet.
- SKILL.md / CLAUDE.md: document the asymmetric burden, the layers, and the
  named-workflow invocation.

Behavioral mini-eval (#33,#151,#152,#155): #33 now kept; SWE-bench (#155)
correctly flagged NOT-PRIMARY by the defender; #152 scope-removal overturned
via the curator-citation prior. No Papers.md content change.
…racy

Closes the residual gap from 7fe068c: the method-accuracy path bypassed the
defender, so a method-accuracy verdict on a paper the curators cite in a
ResearchAreas page could apply a removal of its only cell — orphaning it and
severing the live cross-reference (the exact risk the defender flagged for #152).

A wrong method row on a cited paper is now a re-row, not a deletion:
- workflow: proposer reports cited_by_curators; adjudicate() routes any removal
  (unsupported / not_primary) of a cited paper through the steelman defender
  regardless of nature. A re-row MOVE or an uncited method-accuracy fix still
  needs only skeptics; scope removals still reach the defender (gated, not blanket).
- reviewer agent: a curator-cited paper is never UNSUPPORTED/NOT-PRIMARY — a wrong
  method row is a MISPLACED re-row.

Verified by a deterministic truth-table check of the routing guard (7/7) plus the
behavioral mini-eval (#33 kept; cited #151/#152 received no applied removal). No
Papers.md change.
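
The routing rule described in this commit can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `needs_defender` and the lowercase string values are invented for the example, and the real check lives in the workflow's `adjudicate()` and its truth-table script, whose seven cases are not reproduced here.

```python
from itertools import product

# Verdicts that delete a cell; "misplaced" is a re-row MOVE, not a removal.
REMOVAL_VERDICTS = {"unsupported", "not_primary"}


def needs_defender(verdict, nature, cited_by_curators):
    """Return True when a proposal must also pass the steelman defender.

    Hypothetical helper: removals of curator-cited papers and scope
    removals carry the heavier burden; everything else needs skeptics only.
    """
    if verdict not in REMOVAL_VERDICTS:
        return False  # re-row MOVE or additive change
    return nature == "scope" or cited_by_curators


def truth_table():
    """Enumerate every (verdict, nature, cited) combination with its routing."""
    rows = []
    for verdict, nature, cited in product(
        ("unsupported", "not_primary", "misplaced"),
        ("method-accuracy", "scope"),
        (False, True),
    ):
        rows.append((verdict, nature, cited, needs_defender(verdict, nature, cited)))
    return rows
```

Enumerating the whole table keeps the guard deterministic: any change to the routing shows up as a changed row rather than as a behavioural surprise in a live run.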
@benjibromberg

Hardened re-scrutiny of the held proposals

Re-ran all 32 held-proposal papers through the hardened pipeline (propose → skeptics → steelman defender for scope/cited removals → gated domain grounding). The over-strict scope/philosophy deletions are gone:

| | original audit | hardened re-scrutiny |
|---|---|---|
| scope removals / not-primary | 41 | 0 applied (6 overturned by the defender) |
| total changes | ~50 | 7 (6 method-accuracy + 1 additive) |

✅ Apply — method-accuracy fixes, orphan-safe (5)

  • #6 · Ji et al. 2021 — DNABERT: pre-trained Bidirectional Encoder Representations from Transformers model for DNA-language in genome — ADD Foundation Models: Masked Language Modeling × Cellular Engineering (skeptic-verified; paper keeps ≥1 cell)
  • #34 · Andrews et al. 2025 — Designing cultured tissue moulds using evolutionary strategies — re-row SVM × Scaffolding → Genetic Algorithms × Scaffolding (skeptic-verified; paper keeps ≥1 cell)
  • #35 · Andrews et al. 2023 — Rapid prediction of lab-grown tissue properties using deep learning — re-row Deep Learning × Scaffolding → GAN / VAE × Scaffolding (skeptic-verified; paper keeps ≥1 cell)
  • #119 · Rosen, Y., Roohani, Y., Agrawal, A., Samotorčan, L., Tabula Sapiens Consortium, Quake, S. R., & Leskovec, J. 2026 — Universal Cell Embeddings: A Foundation Model for Cell Biology — remove Foundation Models: Masked Language Modeling × Cellular Engineering (skeptic-verified; paper keeps ≥1 cell)
  • #121 · Roohani et al. 2024 — Predicting transcriptional outcomes of novel multigene perturbations with GEARS — remove Foundation Models: Cell-State & Perturbation Prediction × Cellular Engineering (skeptic-verified; paper keeps ≥1 cell)

⚠️ Needs your decision — correct method-fix, but would orphan the paper (2)

Each removes the paper's only cell, and its actual technique has no matrix row — a re-row / not-primary / keep-as-approximation call, not an auto-apply:

  • #59 · Antonakoudis & Richelle 2026 — Systematic data-driven genome-scale metabolic model reduction for bioprocess modeling: CHO culture case study — remove Bayesian Optimization × Bioprocess control (true method is not a matrix row). The paper applies Bayesian flux estimation (Bayesian statistical inference via the MetRaC framework) to derive uncertainty-aware rate bounds from exo-metabolomics data. This is Bayesian inference/prob…
  • #60 · Mathieu et al. 2025 — Integrative multi-omics modeling for cultivated meat production, quality, and safety — remove Deep Learning × Cellular Engineering (true method is not a matrix row). The methods text is explicit about the paper's computational approach: 'The causal analysis algorithm employed in this paper scores and ranks interactome nodes based on random walk network propagation…

↩︎ Overturned by the defender — KEEP (6)

Original scope/method removals rejected because the paper is curator-cited and/or the method label is actually correct:

  • #1 · Nikkhah et al. 2023 — Toward sustainable culture media: Using artificial intelligence to optimize reduced-serum formulations for cultivated meat — proposed remove Deep Learning × Media Optimization, kept (cited_by_curators=True)
  • #7 · Tamburini et al. 2014 — Monitoring Key Parameters in Bioprocesses Using Near-Infrared Technology — proposed remove Deep Learning × Bioprocess control, kept (cited_by_curators=True)
  • #11 · Shen et al. 2024 — Chemometrics methods, sensory evaluation and intelligent sensory technologies combined with GAN-based integrated deep-learning framework to discriminate salted goose breeds — proposed remove CNN × Sensory Prediction, kept (cited_by_curators=True) — note: kept on the cited-prior, but the original "no CNN" concern is unresolved; worth a human re-row check
  • #17 · Cosenza & Block 2021 — A generalizable hybrid search framework for optimizing expensive design problems using surrogate models — proposed remove Deep Learning × Media Optimization, kept (cited_by_curators=True)
  • #18 · Cosenza 2022 — Sequential Learning Methods for the Experimental Optimization of Cell Culture Media for Cellular Agriculture — proposed remove Deep Learning × Media Optimization, kept (cited_by_curators=True)
  • #126 · Youngblut et al. 2025 — scBaseCount: an AI agent-curated, uniformly processed, and autonomously updated single cell data repository — proposed re-row Domain-Specific Biomedical Agents × Cellular Engineering, kept (cited_by_curators=False)

▪︎ Kept, no change (19)

All resolved to KEEP at the propose/skeptic stage — including every general-CS / general-biomedical paper the original audit wanted to delete (DESC #5, scGNN #13, SWE-bench #155, GPQA #156, MMLU-Pro #157, #110, #114, …). Pruning general-domain benchmarks would be a separate explicit curatorial decision — the hardened review (correctly) won't propose it on scope grounds.

#5, #13, #22, #33, #40, #53, #68, #90, #103, #110, #114, #118, #122, #151, #152, #155, #156, #157, #196


Hardened run wf_e54f3ded-9ce. Supersedes the un-hardened held-proposals comment above.

Adds a non-destructive taxonomy_gap verdict so the classification audit can
keep a paper that applies a real AI/ML method whose matrix row/column does not
yet exist, and surface a proposed new row/column for curator decision instead
of forcing a wrong cell or orphaning the paper.

- reviewer: taxonomy_gap verdict + precedence ladder (gap is the last resort,
  after re-row into an existing label); method-family precision notes (Bayesian
  Optimization vs Bayesian inference; GNN vs classical network propagation) so a
  step-2 re-row does not grab a superficially-similar row and bury the real gap.
- workflow: taxonomy_gaps schema; per-ref collection that never enters the
  adjudicated change set (so a gap can never become an applied removal); a
  Taxonomy phase that clusters pooled gaps and adversarially verifies clusters
  of >=2 papers into proposed new rows/columns (singletons are parked).
- verify_routing.mjs: deterministic guard — the asymmetric-burden routing
  truth-table, the non-orphan structural invariant, and the >=2 cluster gate.
- docs: SKILL.md (verdict, non-orphan guarantee, run-full-corpus note,
  human-applied rows/columns) and the CLAUDE.md lifecycle entry.

Verified: verify_routing 20/20; behavioral mini-eval over refs 59, 60, 34, 121
(taxonomy gaps for 59 and 60, re-row for 34, drop-redundant-cell for 121) kept
all four papers (no orphaning removal).
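
The ">=2 cluster gate" can be pictured with a small Python sketch. It assumes gaps arrive as `(ref_id, proposed_label)` pairs and groups them by exact, case-insensitive label; that grouping is a stand-in for whatever clustering the Taxonomy phase actually performs, and `split_gaps` is a hypothetical name.

```python
from collections import defaultdict

MIN_CLUSTER_SIZE = 2  # a proposed row/column needs at least two distinct papers


def split_gaps(gaps):
    """Split pooled taxonomy gaps into candidate clusters and parked singletons.

    gaps: iterable of (ref_id, proposed_label) pairs.
    Returns (candidates, parked), each mapping a normalised label to sorted ref ids.
    """
    by_label = defaultdict(set)
    for ref_id, label in gaps:
        by_label[label.strip().lower()].add(ref_id)  # a set dedupes repeated refs

    candidates, parked = {}, {}
    for label, refs in by_label.items():
        target = candidates if len(refs) >= MIN_CLUSTER_SIZE else parked
        target[label] = sorted(refs)
    return candidates, parked
```

Because gaps are collected separately from the adjudicated change set, nothing returned here can turn into a removal; candidates would still go to adversarial verification and a curator decision.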

Replace the full-corpus adversarial Workflow (one proposer agent per paper
×136, which exhausted usage mid-run) with a cost cascade: a cheap batched
triage skim reads every ref and selects only the ambiguous ones; the existing
verified gate then judges that flagged subset.

- skim_to_audit_ids.py: stdlib glue that validates the pinned skim output
  schema, dedupes flagged ids (flag sticky), verifies every ref was skimmed
  (loud MISSING, never a silent keep), and writes _audit_ids.json as
  {"ids":[...]} — ids only, so the skim's suggested_dest never reaches the
  gate proposer (preserving proposer independence). Prints a flag count + floor
  agent estimate and warns past an abort ceiling.
- SKILL.md: documents the default skim→gate run path, a recall-biased skim
  rubric (known method-family traps, curatorial-cell guard, method-name
  mismatch), and the pinned output schema; reconciles the taxonomy note so a
  single un-chunked gate invocation over liberally-flagged gap suspects still
  clusters corpus-wide; marks the full Workflow fan-out as the last resort.
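
A minimal Python sketch of the glue step described above, assuming each skim row is a dict with an integer `id`, a boolean `flag`, and an optional `suggested_dest`. The `flag` field name and the function signature are assumptions for illustration, not the pinned schema itself.

```python
def skim_to_audit_ids(corpus_ids, skim_rows):
    """Reduce skim output to the ids-only payload handed to the gate.

    corpus_ids: every ref id that should have been skimmed.
    skim_rows:  list of dicts, e.g. {"id": 7, "flag": True, "suggested_dest": "..."}.
    """
    seen, flagged = set(), set()
    for row in skim_rows:
        ref_id, flag = row.get("id"), row.get("flag")
        if type(ref_id) is not int or type(flag) is not bool:
            raise ValueError(f"malformed skim row: {row!r}")
        seen.add(ref_id)
        if flag:
            flagged.add(ref_id)  # sticky: a later unflagged duplicate cannot clear it

    missing = sorted(set(corpus_ids) - seen)
    if missing:
        # Loud failure: an unskimmed ref must never be treated as a silent keep.
        raise ValueError(f"MISSING skim rows for refs {missing}")

    # Only ids leave this function, so suggested_dest never reaches the proposer.
    return {"ids": sorted(flagged)}
```

Returning only the ids is the design point: the gate proposer sees which papers to re-read but not where the skim thought they should go.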
…process & Scale-Up

Add Taxonomy.md — a canonical, text-anchored definition of every matrix row
(23 methods) and column (7 research areas): what each covers, what is out of
scope, and the discriminators for confusable categories (a Foundation-Model row
requires a pretrained/transferable model not a task-specific one; GNN excludes
classical network propagation; Deep Learning is a catch-all yielding to more
specific rows). It serves both readers and the AI classification audit, which
grounds placements in the paper's own methods text. This becomes the trusted
meaning-source, replacing the stale/AI-drafted ResearchAreas pages.

Rename the column 'Bioprocess control' -> 'Bioprocess & Scale-Up' so reactor
design, CFD/mixing, mass transfer, and scale-up engineering clearly belong
(e.g. reactor-physics methods that transfer to bioreactor scale-up). Update the
matrix header (repointed to the Taxonomy.md definition), add a Category
definitions section to Papers.md, and update the area-label maps in
extract_matrix_corpus.py and the workflow.

The corpus (gitignored) still carries the old label until regenerated.
…ted_in signal

The cited_in_research_areas signal was derived from the ResearchAreas pages,
which are AI-assisted and stale — a 'paper cited in area X' hit can be a
hallucination, so it is not trustworthy. Remove it everywhere; the paper's own
methods text (measured against the Taxonomy.md definitions) is now the sole
source of truth.

- extract_matrix_corpus.py: stop computing/emitting cited_in_research_areas and
  reading ResearchAreas/*.md; the corpus is text-only.
- workflow.js + caail-classification-reviewer.md: drop cited_by_curators from the
  schemas and proposer/skeptic/defender prompts; the Defend steelman now reads the
  paper + the Taxonomy.md column definition (not the area page); the anti-over-
  removal burden is anchored on paper-text evidence (scope removals face the
  defender; method-accuracy UNSUPPORTED stays firm and never orphans via the
  precedence ladder); ties go to KEEP.
- prefilter_corpus.py: new stdlib, zero-token pre-filter that auto-clears the
  lexically-obvious classical-ML/benchmark placements from the paper's own text
  and emits the residual for the LLM skim. Validated against the prior skim
  oracle with zero leaked corrections (28% of the skim workload removed); fails toward the LLM on any
  uncertainty. Includes the FM-row trap (pretrained-vs-task-specific) and trap
  fixes that recover over-blocked keeps.
- SKILL.md / CLAUDE.md: re-anchor the asymmetric-burden docs on the paper +
  Taxonomy.md; document the pre-filter.
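
To make the "fails toward the LLM" behaviour concrete, here is a hedged Python sketch of a lexical pre-filter. The keyword rules, the `prefilter` name, and the tuple layout are all hypothetical; the real `prefilter_corpus.py` rules are not shown in this PR.

```python
# Hypothetical rules: row label -> phrases that make a placement lexically obvious.
# Rows without a rule (for example the Foundation-Model rows) are never auto-cleared.
CLEAR_RULES = {
    "Ensemble Learning": ("random forest", "xgboost", "gradient boosting"),
    "SVM": ("support vector machine",),
}


def prefilter(placements):
    """Split placements into auto-cleared ref ids and the residual for the LLM skim.

    placements: iterable of (ref_id, row_label, methods_text).
    Any placement that is not an unambiguous keyword hit goes to the residual.
    """
    cleared, residual = [], []
    for ref_id, row_label, methods_text in placements:
        lowered = methods_text.lower()
        phrases = CLEAR_RULES.get(row_label, ())
        if any(phrase in lowered for phrase in phrases):
            cleared.append(ref_id)
        else:
            residual.append(ref_id)  # uncertainty always falls through to the LLM
    return cleared, residual
```

The asymmetry is deliberate in this sketch: a missed keyword only costs an extra LLM read, whereas a wrongly cleared placement would hide a correction.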

Point all 23 method-row and 7 area-column labels in the matrix to their
Taxonomy.md definition (the canonical, CAAIL-specific scope of each), removing
every Wikipedia, ResearchAreas, and paper-ref link from the axis labels. Taxonomy
outlines each method's nuance more precisely than a generic Wikipedia article.
Acronyms are spelled out at first use in the Taxonomy definitions and the
Category-definitions blurbs (GNN, CNN, GAN/VAE, SVM).

Move the 'new rows link to Wikipedia' convention to Taxonomy.md across CLAUDE.md,
SKILL.md, the reviewer contract, and the workflow: the taxonomy-gap schema's
wikipedia_url becomes proposed_definition (a Taxonomy.md-style definition of the
proposed new row/area). Cell anchors [N](#N) are untouched; label text is
unchanged so the corpus/tooling that key on label text are unaffected.

The matrix column was renamed 'Bioprocess control' -> 'Bioprocess & Scale-Up';
update the area-column registry label so the parser/lint recognize it (the key
'bioprocess' is unchanged). Without this, lint:papers reports the column's papers
as uncited. The research-area page registry is separate and untouched; the
explorer's full-name display + Taxonomy routing remain a separate site pass.
…mometrics row

Apply the 62 human-reviewed audit decisions to the matrix (each curator-confirmed):
- ~15 method/area moves (e.g. shallow-RBF/MLP papers out of Deep Learning; GEARS/
  SATURN/UCE/PRESAGE out of Foundation-Model rows since consuming FM embeddings ≠
  being an FM; TxAgent general->domain-specific; ARIEL ->Benchmarks).
- 8 cell removals where a method was a baseline/secondary (e.g. shallow MLP cells).
- 9 additive multi-category placements (e.g. robot-scientist + active-learning cells).
- 4 out-of-scope papers removed from the matrix, IDs retired (E. coli genetics #22,
  two HCI user studies #151/#152, a representation-scheme paper #163).
- 3 perspective/correspondence papers moved to Reviews & Perspectives (#60/#113/#114).
- New **Chemometrics** row (PLS/PLS-DA spectral methods for PAT bioprocess monitoring
  + sensory), seeded from existing CAAIL content: #7 plus two new references —
  ropls (Thevenot et al. 2015, #198) and mixOmics (Rohart et al. 2017, #199).

Parked for a future second paper (not added): Bayesian Inference (#59), Neural/Graph
Embedding (#197), Extreme Learning Machine (#26). Deferred pending full text: #52,
#103, #104. Rejected as non-method axes: Network Propagation, Multi-Omics, Molecular
Representation. Taxonomy.md gains the Chemometrics definition.

The audit changed the canonical Papers.md content; update the hardcoded ground-truth
assertions to match: references 197->195 (−4 removed, +2 new), method rows 23->24
(Chemometrics), code-URL refs 70->72 (ropls/mixOmics), and the Deep Learning x Cellular
Engineering cell composition. lint:papers and the full parser suite pass.

Full-text/identifier research on the three deferred items:
- #52 BioMedReasoner builds on Neural Bellman-Ford Networks (a GNN) -> GNN x AI
  Tooling is correct (keep). (NeurIPS 2025 / OpenReview FmDuKzM8f7.)
- #103 BitterIntense uses XGBoost (a decision-tree ensemble) -> Ensemble x Sensory
  is correct (keep).
- #104 BitterMatch is a similarity / collaborative-filtering recommender (Tanimoto +
  sequence-similarity matrices, no ensemble) -> move Ensemble -> K-Nearest Neighbors
  x Sensory Prediction (closest similarity-based row).

Add MeatScan (Gyening et al. 2025, Data in Brief) to Datasets/Cow.md — an
11,000-image RGB dataset (5,627 fresh / 5,373 spoiled) of cow meat from Ghanaian
markets, for fresh/spoiled CV classification. New 'Meat-quality imaging' thematic
cluster + inventory row, linked to the Zenodo deposit and cross-linked to
Papers.md #196 (CNN x Sensory Prediction). Relabel #196's companion blockquote
`> **Code**` -> `> **Data**` since the Zenodo record is the dataset, not code.
…label

datasets 129->130 (MeatScan inventory row), code-URL 72->71 and data-URL 9->10
(#196's companion blockquote relabelled Code->Data).

Route the canonical Taxonomy.md as a site page (/taxonomy/, in the top-level nav)
via the existing caail-docs loader + a CAAIL_PAGES entry. In the Papers Explorer,
the method-row and area-column labels are now links to their Taxonomy definition
(heading anchors verified to match the GitHub-slug anchors used in Papers.md), and
the acronym rows (SVM/CNN/GNN/GAN-VAE) carry a title tooltip spelling out the full
name — so newcomers can see what each axis means and click through to the full
definition.

Add a taxonomy.ts parser that reads each ### heading in Taxonomy.md and
flattens its prose to a label -> definition map, emitted as the
build-time taxonomy.json (gitignored like the other parser outputs,
regenerated by `pnpm parse`). The Papers Explorer consumes this to show a
row/column definition without a hardcoded copy that could drift.

generate-data.ts gains a coverage guard: every matrix method and area
label must have a non-empty Taxonomy.md definition or the build fails, so
a renamed row can't silently lose its popup text. Schema + type added to
types.ts; fixture + real-file tests added.
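
The parse-then-guard idea can be sketched in Python (the actual parser is `taxonomy.ts`; this is not a port of it). It assumes definitions are the prose under each `###` heading up to the next heading of any level, and the names `parse_taxonomy` and `check_coverage` are made up for the example.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.+?)\s*$")


def parse_taxonomy(markdown):
    """Map each '### Label' heading to its prose, flattened to a single line."""
    definitions = {}
    label, buffer = None, []
    # A sentinel heading at the end flushes the final entry.
    for line in markdown.splitlines() + ["# end"]:
        match = HEADING.match(line)
        if match:
            if label is not None:
                definitions[label] = " ".join(" ".join(buffer).split())
            # Only level-3 headings open a definition; other levels close one.
            label = match.group(2) if len(match.group(1)) == 3 else None
            buffer = []
        elif label is not None:
            buffer.append(line)
    return definitions


def check_coverage(labels, definitions):
    """Fail loudly if any matrix label lacks a non-empty definition."""
    missing = [name for name in labels if not definitions.get(name, "").strip()]
    if missing:
        raise ValueError(f"labels without a Taxonomy definition: {missing}")
```

Running the guard at build time is what prevents a renamed row from silently losing its definition text: the rename changes the label, the lookup misses, and the build stops.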
…ve search

Three Papers Explorer improvements:

- Axis labels are now buttons that open a definition popup (hover, focus,
  or click/tap) showing the Taxonomy.md entry, with a "View full
  definition" link out to /taxonomy/. The popup is fixed-positioned at the
  component root so it escapes the matrix pane's overflow clipping;
  dismissed on Escape, scroll, or outside click.
- Selecting a research area ranks the method rows by paper count
  (descending) for that area, surfacing the most-studied methods first.
- The search box now filters the whole matrix: matching ref ids drive the
  cell counts (non-matching cells dim out), and a global results list
  appears in the side panel when a query is active with no cell selected.
  Previously the box was a no-op until a cell was selected.
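
The reorder and search behaviours can be illustrated with a short Python sketch over a toy matrix. The explorer itself is a site component, so the data shape (`{method: {area: [ref ids]}}`) and both function names are assumptions for illustration only.

```python
def rank_methods(matrix, area):
    """Order method rows by paper count in `area`, highest first.

    sorted() is stable, so rows with equal counts keep their original order.
    """
    return sorted(
        matrix,
        key=lambda method: len(matrix[method].get(area, [])),
        reverse=True,
    )


def filtered_counts(matrix, matching_ids):
    """Recompute every cell count using only refs that match the search query.

    A cell whose count drops to zero is one the UI would dim out.
    """
    matching = set(matching_ids)
    return {
        method: {area: len(matching.intersection(refs)) for area, refs in cells.items()}
        for method, cells in matrix.items()
    }
```

Driving the cell counts from the set of matching ref ids is what lets the search box act on the whole matrix without any cell being selected first.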

Also fix the BASE_URL join for the taxonomy links (was producing
"/caailtaxonomy/"; normalise like the other components) and refresh the
e2e ground-truth counts that lagged the matrix audit (195 papers; the
Deep Learning x Cellular Engineering cell is now 6, with Ji 2021 moved to
the masked-LM foundation-model row). New e2e tests cover the popup,
reorder, and search.
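
The BASE_URL bug is a plain string-join problem, which a small Python sketch can show. The base value `/caail` is inferred from the broken `/caailtaxonomy/` output and `join_base` is a hypothetical helper; the site's own normalisation code is not reproduced here.

```python
def naive_join(base_url, path):
    """The failure mode: concatenation with no separator handling."""
    return base_url + path


def join_base(base_url, path):
    """Join a site base and a page path with exactly one slash between them.

    Only handles plain page paths; fragments and query strings are out of scope.
    """
    base = base_url.rstrip("/")
    page = path.strip("/")
    return f"{base}/{page}/" if page else f"{base}/"
```

Stripping slashes from both sides before joining makes the result independent of whether the configured base ends with a slash.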
@benjibromberg benjibromberg changed the title feat(papers): adversarial methods-grounded multi-category matrix audit feat: matrix classification audit, Taxonomy, and Papers Explorer upgrades Jun 10, 2026
… the page

A dense cell (e.g. Benchmarks & Evaluation Frameworks × AI Evaluation &
Benchmarking, 19 papers) made the reference panel tall enough to stretch
the grid row and push the whole page taller. Cap .px-panel to the
viewport (max-height + overflow-y:auto) with align-self:start so it sizes
to its own content, and make it sticky below Starlight's top nav so it
stays in view as a side dialogue. On the stacked narrow layout the panel
is static and scrolls within 70vh. Verified: with the 19-paper cell the
panel scrolls internally (839px box, 4006px content) and no longer drives
the row height.

The matrix is taller than the viewport, so scrolling to lower method rows
lost the area-column labels. Bound .px-mxpane to the viewport height so it
becomes the vertical scroll container (keeping overflow:auto for the
existing horizontal scroll), then make the header row (.px-corner + .px-hd)
sticky at top:0 with a solid background that masks the rows passing under.

Two layout fixes were needed for a clean freeze:
- Drop the pane's top padding so the header pins flush at the pane edge;
  otherwise a ~13px padding strip above the pinned header showed scrolling
  cells.
- Give .px-hd height:100% so every tab fills the row track. The track is
  sized by the tallest header ("AI Evaluation & Benchmarking"), so shorter
  tabs left a band below them where a data row bled through; filling the
  track aligns all tab bottoms with the corner into one solid masking band.

Verified: headers pin uniformly (71px), bottoms aligned with the corner,
no content bleeds above or beside the band across the full scroll range.
@benjibromberg benjibromberg merged commit 406f280 into main Jun 10, 2026
1 check passed
@benjibromberg benjibromberg deleted the worktree-feat+matrix-classification-audit branch June 10, 2026 21:14
benjibromberg added a commit that referenced this pull request Jun 12, 2026
… field-gap papers

Reconciles the eight ResearchAreas pages against main after the #32
field-gap additions and the #33 matrix-classification audit, which rewrote
Papers.md (new rows incl. Reinforcement Learning and Chemometrics, Foundation
Models split into five sub-rows, a 7th AI Evaluation & Benchmarking column,
the Bioprocess column renamed to "Bioprocess & Scale-Up", and several
references removed/reclassified).

Correctness:
- Remove four dangling anchors whose refs the audit deleted: #22 (Lao),
  #151/#152 (Gu CHI/HCI), #163 (MoleCode).
- Rename the Bioprocess page H1/lede to "Bioprocess & Scale-Up" and relabel
  every cross-link to match the renamed column.
- Reframe reclassified refs: #60 (Mathieu) is now a cultivated-meat
  perspective rather than "the one column study"; #90/#126/#62 corrected from
  Cellular-Engineering/Bioprocess cells to their AI Tooling cells; #68
  recharacterized (LLM literature-extraction + hybrid GEM/DL predictor, not a
  RAG design workflow).

Completeness — integrate ~39 newly-added column papers, each grounded in the
paper's full text via the caail Zotero library and gated by the
caail-claim-reviewer (writer != reviewer):
- Media: GA/evolutionary + one-shot DoE media work (#210, #211, #212, #213).
- Cellular Engineering: CellFM (#235), porcine-adipocyte readout (#218), and
  the active-learning/strain-design multi-listings (#63, #66, #68).
- Bioprocess & Scale-Up: a reinforcement-learning control cluster (#200-#203),
  hybrid mechanistic+ML control (#204, #205), ML soft sensors (#206-#208),
  GA+ANN viral production (#209), and microbial volatile prediction (#27).
- Scaffolding: ML scaffold/print-quality prediction (#214, #215, #216) and a
  nondestructive-characterization section (#217).
- Sensory Prediction: MeatScan freshness data (#196) and the Tac generative
  burger study (#236).
- AI Tooling: ProCyon (#224), discovery/agent-infra refs (#219-#223), the
  chemometrics R packages (#198, #199), and D-GEX (#4).
- AI Evaluation: BioML-bench (#225), ARIEL (#53), and State/Cell-Eval (#57).

Verified: 0 dangling anchors, every matrix-column paper represented in its
page, and `pnpm --dir site build` succeeds.