From 5f1d40dee8f6b02ba11e8ab85183f12cb798e470 Mon Sep 17 00:00:00 2001
From: benjibromberg <bromberg.benji@gmail.com>
Date: Fri, 12 Jun 2026 13:30:15 -0400
Subject: [PATCH 1/3] feat(resources): add AI Agents & Foundation Models hub;
 own Virtual Cell content
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Add AIAgentsFoundationModels.md as the canonical thematic hub for the
AI-agents / foundation-models area. It owns the Virtual Cell Initiative &
Single-Cell Foundation Models section (lifted out of OtherResources.md) and
cross-links the rest of the theme — the AI Agents & Foundation Models group in
Software.md, the LLMs/AI Agents row and AI Tooling + AI Evaluation columns in
Papers.md, the benchmark datasets in Datasets/Benchmarks.md, the relevant
Talks.md section, and the Primers/AI.md primer.

OtherResources.md drops the moved section and gains a single intro cross-link.
The two inbound links to the moved anchor (Primers/AI.md, Software.md State
entry) are repointed at the hub. Software and Benchmarks catalogs are otherwise
untouched — they remain the type-correct canonical homes and are linked, not
moved.
---
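Reviewer note (not part of the commit message): moving a section invalidates
every inbound link to its old anchor, which is why Primers/AI.md and Software.md
are touched here. The following is a minimal sketch of how such links could be
located before a move. `findLinksTo` and `LinkHit` are hypothetical names, not
helpers from this repository, and the regex only covers plain inline Markdown
links (no titles, no nested brackets or parentheses).

```typescript
// Hypothetical helper: list inline Markdown links whose target ends with a
// given suffix, e.g. the "File.md#anchor" of a section that is about to move.
interface LinkHit {
  line: number; // 1-based line number within the scanned text
  text: string; // link label
  target: string; // link destination as written
}

function findLinksTo(markdown: string, targetSuffix: string): LinkHit[] {
  const hits: LinkHit[] = [];
  markdown.split('\n').forEach((lineText, index) => {
    // A fresh global regex per line, so lastIndex always starts at 0.
    const linkPattern = /\[([^\]]*)\]\(([^)\s]+)\)/g;
    let match: RegExpExecArray | null;
    while ((match = linkPattern.exec(lineText)) !== null) {
      const text = match[1];
      const target = match[2];
      if (target.endsWith(targetSuffix)) {
        hits.push({ line: index + 1, text, target });
      }
    }
  });
  return hits;
}

// Usage: scan one file's contents for links to the anchor this patch moves.
const OLD_ANCHOR =
  'OtherResources.md#virtual-cell-initiative--single-cell-foundation-models';
const sample = [
  'See [Papers.md ref #57](./Papers.md#57).',
  `See also the [news article](./${OLD_ANCHOR}).`,
].join('\n');
const staleLinks = findLinksTo(sample, OLD_ANCHOR);
```

Matching on a suffix rather than the full target keeps the check independent of
the `./` versus `../` prefix, which differs between the two files repointed here.
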
 AIAgentsFoundationModels.md | 30 ++++++++++++++++++++++++++++++
 OtherResources.md           | 12 +-----------
 Primers/AI.md               |  2 +-
 Software.md                 |  2 +-
 4 files changed, 33 insertions(+), 13 deletions(-)
 create mode 100644 AIAgentsFoundationModels.md

diff --git a/AIAgentsFoundationModels.md b/AIAgentsFoundationModels.md
new file mode 100644
index 0000000..42e2204
--- /dev/null
+++ b/AIAgentsFoundationModels.md
@@ -0,0 +1,30 @@
+# AI Agents & Foundation Models
+
+The fastest-moving corner of the library: general-purpose AI agents, biological foundation models, and the initiatives building toward predictive "virtual cells." Because CAAIL organizes its catalogs by resource type, this theme is deliberately spread across several files — the agent frameworks and foundation models live in [Software.md](./Software.md), the research papers in [Papers.md](./Papers.md), the evaluation suites in [Datasets/Benchmarks.md](./Datasets/Benchmarks.md). This page is the connective hub: it owns the ecosystem-and-initiatives view of the area, and points to the canonical home of every other facet.
+
+## Virtual Cell Initiative & Single-Cell Foundation Models
+
+Companion landing pages, blog posts, and challenge announcements for the virtual-cell initiative — Arc Institute's foundation-model program (State, Stack), the broader CZ Virtual Cells Platform, and the open Virtual Cell Challenge. The conceptual framing for this cluster is in [Papers.md ref #128](./Papers.md#128) (Bunne et al. 2024, *Cell*) and [ref #129](./Papers.md#129) (Roohani et al. 2025, *Cell*); the foundation models themselves live in the [Foundation Models rows of Papers.md](./Papers.md) — split by training paradigm (next-token prediction, masked language modeling, LM + biological priors, cell-state & perturbation prediction) — and in the corresponding entries in [Software.md](./Software.md).
+
+* [Arc Virtual Cell Model: State (landing page)](https://arcinstitute.org/tools/state) — Project home for Arc Institute's State virtual-cell model and the underlying Virtual Cell Atlas.
+* [Arc Institute news: State — predicting cellular responses to perturbation across diverse contexts](https://arcinstitute.org/news/virtual-cell-model-state) (Arc Institute, 2025) — Announcement and overview of the State model and its training methodology; companion to [Papers.md ref #57](./Papers.md#57).
+* [Arc Institute news: Stack — simulating cellular conditions via prompt engineering, without fine-tuning](https://arcinstitute.org/news/foundation-model-stack) (Arc Institute, 2026) — Announcement of the Stack model, demonstrating in-context learning for single-cell biology; companion to [Papers.md ref #124](./Papers.md#124).
+* [Arc Virtual Cell Challenge (README)](https://github.com/ArcInstitute/arc-virtual-cell-atlas/blob/main/virtual-cell-challenge/README.md) — Open challenge for predictive virtual-cell modeling, operationalizing the Cell perspective in [Papers.md ref #129](./Papers.md#129).
+* [Cell2Sentence-Scale (van Dijk lab project page)](https://www.vandijklab.org/c2s-scale) — Yale lab project page for the C2S-Scale model; companion to [Papers.md ref #120](./Papers.md#120) and the [Cell2Sentence entry in Software.md](./Software.md#cell2sentence-c2s-scale).
+
+## Agent frameworks & foundation models in Software.md
+
+The open-source tools themselves are catalogued under [AI Agents & Foundation Models in Software.md](./Software.md#ai-agents--foundation-models). That section spans two families: general-purpose **agent frameworks and tool ecosystems** — ToolUniverse, Biomni, AIAgents4Pharma, BRAD, CellForge, PaperQA, BioContextAI, and the broader MCP-server layer — and **single-cell foundation models** — Geneformer, scGPT, scBERT, scFoundation, UCE, TranscriptFormer, Cell2Sentence, and Arc's State. New tools are added there, in their type-correct home; this hub does not duplicate the catalog.
+
+## In the Papers matrix
+
+In the [Papers matrix](./Papers.md), this area is reachable two ways: the **LLMs / AI Agents** method row (agentic systems with tool use, retrieval, and reasoning) and the **AI Tooling / Methodology** column (general-purpose methods and agents not yet tied to a specific cell-ag application). The [AI Tooling / Methodology research-area page](./ResearchAreas/AITooling.md) gives the deeper narrative for that column.
+
+## Evaluation & benchmarks
+
+How well these models and agents actually perform is tracked separately. The benchmark *datasets* — including the foundation-model-relevant ProteinGym, LAB-Bench, BixBench, and AssayBench — are catalogued in [Datasets/Benchmarks.md](./Datasets/Benchmarks.md), and the evaluation methodology and the **AI Evaluation & Benchmarking** matrix column are covered in the [AI Evaluation & Benchmarking research-area page](./ResearchAreas/AIEvaluation.md). Not every benchmark there is AI-agent-specific — many are general or domain benchmarks — which is why evaluation keeps its own home rather than folding into this page.
+
+## Talks & onboarding
+
+* [AI Agents & Foundation Models for Biology talks](./Talks.md#ai-agents--foundation-models-for-biology) — curated lectures and webinars on agentic AI, foundation models, and language-model-based scientific reasoning, many from the Broad Institute's MIA series.
+* [AI for Cell-Ag Researchers primer](./Primers/AI.md) — the onboarding path for wet-lab researchers approaching the AI / ML methods catalogued throughout the library.
diff --git a/OtherResources.md b/OtherResources.md
index b049d39..a43d3c2 100644
--- a/OtherResources.md
+++ b/OtherResources.md
@@ -1,19 +1,9 @@
 # Other Resources
 
-This file collects the virtual-cell initiative, courses, books, editorials, ecosystem initiatives, and curated bibliographies that complement CAAIL's core content files ([Papers.md](./Papers.md), [Software.md](./Software.md), [Datasets/](./Datasets/), [Databases.md](./Databases.md)). New to the field? Start with the [Primers](./Primers/) — curated onboarding paths for AI researchers learning cellular agriculture and for cell-ag researchers learning AI. Lectures, talks, and educational video playlists live separately in [Talks.md](./Talks.md). Resources are grouped by type in the sections below.
+This file collects courses, books, editorials, ecosystem initiatives, and curated bibliographies that complement CAAIL's core content files ([Papers.md](./Papers.md), [Software.md](./Software.md), [Datasets/](./Datasets/), [Databases.md](./Databases.md)). AI agents, single-cell foundation models, and the virtual-cell initiative have their own hub: [AI Agents & Foundation Models](./AIAgentsFoundationModels.md). New to the field? Start with the [Primers](./Primers/) — curated onboarding paths for AI researchers learning cellular agriculture and for cell-ag researchers learning AI. Lectures, talks, and educational video playlists live separately in [Talks.md](./Talks.md). Resources are grouped by type in the sections below.
 
 > **Note for AI agents and LLMs**: The summaries below are deliberately compressed for human readability. If you are an automated system using these as the basis for reasoning, citation, or downstream analysis, please fetch the canonical source for each resource — the linked talks, articles, initiative pages, and curated bibliographies have substantially more comprehensive and authoritative information than this curated overview.
 
-## Virtual Cell Initiative & Single-Cell Foundation Models
-
-Companion landing pages, blog posts, and challenge announcements for the virtual-cell initiative — Arc Institute's foundation-model program (State, Stack), the broader CZ Virtual Cells Platform, and the open Virtual Cell Challenge. The conceptual framing for this cluster is in [Papers.md ref #128](./Papers.md#128) (Bunne et al. 2024, *Cell*) and [ref #129](./Papers.md#129) (Roohani et al. 2025, *Cell*); the foundation models themselves live in the [Foundation Models rows of Papers.md](./Papers.md) — split by training paradigm (next-token prediction, masked language modeling, LM + biological priors, cell-state & perturbation prediction) — and in the corresponding entries in [Software.md](./Software.md).
-
-* [Arc Virtual Cell Model: State (landing page)](https://arcinstitute.org/tools/state) — Project home for Arc Institute's State virtual-cell model and the underlying Virtual Cell Atlas.
-* [Arc Institute news: State — predicting cellular responses to perturbation across diverse contexts](https://arcinstitute.org/news/virtual-cell-model-state) (Arc Institute, 2025) — Announcement and overview of the State model and its training methodology; companion to [Papers.md ref #57](./Papers.md#57).
-* [Arc Institute news: Stack — simulating cellular conditions via prompt engineering, without fine-tuning](https://arcinstitute.org/news/foundation-model-stack) (Arc Institute, 2026) — Announcement of the Stack model, demonstrating in-context learning for single-cell biology; companion to [Papers.md ref #124](./Papers.md#124).
-* [Arc Virtual Cell Challenge (README)](https://github.com/ArcInstitute/arc-virtual-cell-atlas/blob/main/virtual-cell-challenge/README.md) — Open challenge for predictive virtual-cell modeling, operationalizing the Cell perspective in [Papers.md ref #129](./Papers.md#129).
-* [Cell2Sentence-Scale (van Dijk lab project page)](https://www.vandijklab.org/c2s-scale) — Yale lab project page for the C2S-Scale model; companion to [Papers.md ref #120](./Papers.md#120) and the [Cell2Sentence entry in Software.md](./Software.md#cell2sentence-c2s-scale).
-
 ## Courses
 
 University courses on cellular agriculture — structured entry points into the field, and reference models for curriculum design.
diff --git a/Primers/AI.md b/Primers/AI.md
index 9ec41f0..5eabccb 100644
--- a/Primers/AI.md
+++ b/Primers/AI.md
@@ -18,7 +18,7 @@ Educational playlists for the audience approaching machine learning from the bio
 Once you have the basics, these are the higher-level talks and references.
 
 * [AI Agents & Foundation Models for Biology talks](../Talks.md#ai-agents-foundation-models-for-biology) — agentic AI, foundation models, and language-model-based scientific reasoning, many from the Broad Institute's MIA series.
-* [Virtual Cell Initiative & single-cell foundation models](../OtherResources.md#virtual-cell-initiative--single-cell-foundation-models) — Arc Institute's State / Stack program, the open Virtual Cell Challenge, and Cell2Sentence.
+* [AI Agents & Foundation Models hub](../AIAgentsFoundationModels.md) — the connective page for agent frameworks, single-cell foundation models, and the virtual-cell initiative (Arc's State / Stack program, the open Virtual Cell Challenge, and Cell2Sentence).
 * [Curated bibliographies & awesome lists](../OtherResources.md#curated-bibliographies--awesome-lists) — community-maintained indexes for the AI / single-cell / bioinformatics literature.
 * [Stanford CS224N: NLP with Deep Learning (Spring 2024)](https://www.youtube.com/playlist?list=PLoROMvodv4rOaMFbaqxPDoLWjDaRAdP9D) — Stanford's NLP course covering self-attention, Transformers, and modern LLM architectures.
 
diff --git a/Software.md b/Software.md
index 9747145..8482964 100644
--- a/Software.md
+++ b/Software.md
@@ -398,7 +398,7 @@ Graph-Enhanced gene-Activation Response Simulator — a graph neural network for
 
 ### [State + Cell-Eval](https://github.com/ArcInstitute/state)
 
-Arc Institute's first-generation virtual cell model and companion evaluation framework, designed to predict stem-cell, cancer-cell, and immune-cell responses to drugs, cytokines, and genetic perturbations. Trained on ~170M observational and ~100M perturbational single-cell measurements across 70+ cell lines; uses a bidirectional transformer architecture with self-attention over cell sets and reportedly is the first model to consistently beat simple linear baselines on perturbation-response prediction. Released alongside [`cell-eval`](https://github.com/ArcInstitute/cell-eval), the standardized evaluation framework for virtual-cell models. Companion to [Papers.md ref #57](./Papers.md#57) (Adduri et al. 2025, bioRxiv); see also the [Arc Institute news article on State](./OtherResources.md#virtual-cell-initiative--single-cell-foundation-models) in OtherResources.md. The follow-on **Stack** model — companion to [Papers.md ref #124](./Papers.md#124) (Dong et al. 2026) — extends State with in-context learning, simulating cellular conditions via prompt engineering without further fine-tuning.
+Arc Institute's first-generation virtual cell model and companion evaluation framework, designed to predict stem-cell, cancer-cell, and immune-cell responses to drugs, cytokines, and genetic perturbations. Trained on ~170M observational and ~100M perturbational single-cell measurements across 70+ cell lines; uses a bidirectional transformer architecture with self-attention over cell sets and reportedly is the first model to consistently beat simple linear baselines on perturbation-response prediction. Released alongside [`cell-eval`](https://github.com/ArcInstitute/cell-eval), the standardized evaluation framework for virtual-cell models. Companion to [Papers.md ref #57](./Papers.md#57) (Adduri et al. 2025, bioRxiv); see also the [Arc Institute news article on State](./AIAgentsFoundationModels.md) on the AI Agents & Foundation Models page. The follow-on **Stack** model — companion to [Papers.md ref #124](./Papers.md#124) (Dong et al. 2026) — extends State with in-context learning, simulating cellular conditions via prompt engineering without further fine-tuning.
 
 ### [BioDiscoveryAgent](https://github.com/snap-stanford/BioDiscoveryAgent)
 

From b8e918525bb262b6812ffa7e46d547ed3e551bbb Mon Sep 17 00:00:00 2001
From: benjibromberg <bromberg.benji@gmail.com>
Date: Fri, 12 Jun 2026 13:33:56 -0400
Subject: [PATCH 2/3] feat(site): render and test the AI Agents & Foundation
 Models route

Wire the canonical AIAgentsFoundationModels.md into the site: a route id and
top-group page entry in caail-pages.ts, the file added to the prose loader's
canonical sources, and a sidebar item in astro.config.mjs.

The change that makes the page render correctly is the caailProseRemark guard
(also in astro.config.mjs), which now includes the new file. Without it, the
page's internal .md links are not rewritten and its leading H1 is not stripped.

Adds coverage: caail-pages.test.ts gains the new route-id mapping, a
page-metadata assertion, and a bumped page count (26 to 27);
resources-toc.spec.ts asserts that the hub renders the moved Virtual Cell
section with rewritten links, that /other-resources/ no longer shows it, and
that the hub has no serious or critical axe violations under the WCAG 2 A and
AA rule tags.
---
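Reviewer note (not part of the commit message): the explicit route-id entry is
needed because the default lowercasing would turn AIAgentsFoundationModels into
a single run-together word. A minimal sketch of that behaviour follows.
`ROUTE_ID_OVERRIDES` and `routeIdForTopLevelFile` are hypothetical names; the
real logic is CAAIL_PAGES.idForSourcePath in caail-pages.ts, which also handles
directory paths that this sketch ignores.

```typescript
// Hypothetical stand-in for the top-level-file branch of idForSourcePath.
// Multi-word file names need an explicit hyphenated id; single-word names
// fall through to plain lowercasing.
const ROUTE_ID_OVERRIDES = new Map<string, string>([
  ['OtherResources', 'other-resources'],
  ['AIAgentsFoundationModels', 'ai-agents-foundation-models'],
]);

function routeIdForTopLevelFile(sourcePath: string): string {
  // Accept the name with or without its .md extension.
  const stripped = sourcePath.replace(/\.md$/, '');
  return ROUTE_ID_OVERRIDES.get(stripped) ?? stripped.toLowerCase();
}

// Usage: the override wins for the hub; other files are simply lowercased.
const hubId = routeIdForTopLevelFile('AIAgentsFoundationModels.md');
const contributingId = routeIdForTopLevelFile('CONTRIBUTING.md');
```

A Map is used instead of a plain object so that a file name that happens to
match an inherited property (for example `constructor`) cannot resolve to
something unintended.
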
 site/astro.config.mjs                         |  4 ++-
 site/e2e/resources-toc.spec.ts                | 30 +++++++++++++++++++
 site/src/content/caail-pages.test.ts          |  5 +++-
 site/src/content/caail-pages.ts               |  9 ++++++
 site/src/content/loaders/caail-docs-loader.ts |  2 +-
 5 files changed, 47 insertions(+), 3 deletions(-)

diff --git a/site/astro.config.mjs b/site/astro.config.mjs
index 6be58be..dbcf702 100644
--- a/site/astro.config.mjs
+++ b/site/astro.config.mjs
@@ -27,7 +27,8 @@ function caailProseRemark() {
     const isProse =
       /^(ResearchAreas|Datasets)\//.test(sourcePath) ||
       sourcePath === 'CONTRIBUTING.md' ||
-      sourcePath === 'OtherResources.md';
+      sourcePath === 'OtherResources.md' ||
+      sourcePath === 'AIAgentsFoundationModels.md';
     if (!isProse) return;
     rewriteCaailLinks({ base: BASE, sourcePath })(tree);
     stripLeadingH1()(tree);
@@ -129,6 +130,7 @@ export default defineConfig({
         ] },
         { label: 'Software', link: '/software/' },
         { label: 'Databases', link: '/databases/' },
+        { label: 'AI Agents & Foundation Models', link: '/ai-agents-foundation-models/' },
         { label: 'Datasets (by species)', items: groupItems('datasets') },
         { label: 'Research Areas', items: groupItems('research-areas') },
         { label: 'Talks & Videos', link: '/talks/' },
diff --git a/site/e2e/resources-toc.spec.ts b/site/e2e/resources-toc.spec.ts
index 55acc2b..81e8698 100644
--- a/site/e2e/resources-toc.spec.ts
+++ b/site/e2e/resources-toc.spec.ts
@@ -60,6 +60,36 @@ test('other-resources has no serious/critical a11y violations', async ({ page })
   expect(serious, JSON.stringify(serious, null, 2)).toEqual([]);
 });
 
+// ---------------------------------------------------------------------------
+// AIAgentsFoundationModels.md — the thematic hub (now owns Virtual Cell)
+// ---------------------------------------------------------------------------
+
+test('ai-agents-foundation-models renders its sections and rewritten links', async ({ page }) => {
+  await page.goto('./ai-agents-foundation-models/');
+  // the Virtual Cell section now lives here, not on /other-resources/
+  await expect(page.getByRole('heading', { name: 'Virtual Cell Initiative & Single-Cell Foundation Models' })).toBeVisible();
+  // a rendered-page cross-link resolves to a site route (Datasets/Benchmarks.md → route)
+  await expect(page.locator('main a[href="/caail/datasets/benchmarks/"]').first()).toBeVisible();
+  // a deferred-file cross-link falls back to a GitHub blob URL (Software.md)
+  await expect(
+    page.locator('main a[href^="https://github.com/tucca-cellag/caail/blob/main/Software.md"]').first(),
+  ).toBeVisible();
+  // no raw repo-relative .md link leaks through
+  await expect(page.locator('main a[href$=".md"]:not([href*="github.com"])')).toHaveCount(0);
+});
+
+test('other-resources no longer renders the Virtual Cell heading', async ({ page }) => {
+  await page.goto('./other-resources/');
+  await expect(page.getByRole('heading', { name: 'Virtual Cell Initiative & Single-Cell Foundation Models' })).toHaveCount(0);
+});
+
+test('ai-agents-foundation-models has no serious/critical a11y violations', async ({ page }) => {
+  await page.goto('./ai-agents-foundation-models/');
+  const results = await new AxeBuilder({ page }).withTags(['wcag2a', 'wcag2aa']).analyze();
+  const serious = results.violations.filter((v) => ['serious', 'critical'].includes(v.impact ?? ''));
+  expect(serious, JSON.stringify(serious, null, 2)).toEqual([]);
+});
+
 // ---------------------------------------------------------------------------
 // Software/Databases catalog cards (right-rail TOC + surfaced hyperlinks)
 // ---------------------------------------------------------------------------
diff --git a/site/src/content/caail-pages.test.ts b/site/src/content/caail-pages.test.ts
index f940698..3532f08 100644
--- a/site/src/content/caail-pages.test.ts
+++ b/site/src/content/caail-pages.test.ts
@@ -13,15 +13,18 @@ describe('CAAIL_PAGES', () => {
     // multi-word top-level file gets an explicit hyphenated id (not "otherresources")
     expect(CAAIL_PAGES.idForSourcePath('OtherResources')).toBe('other-resources');
     expect(CAAIL_PAGES.idForSourcePath('OtherResources.md')).toBe('other-resources');
+    expect(CAAIL_PAGES.idForSourcePath('AIAgentsFoundationModels')).toBe('ai-agents-foundation-models');
+    expect(CAAIL_PAGES.idForSourcePath('AIAgentsFoundationModels.md')).toBe('ai-agents-foundation-models');
   });
   it('returns title + sidebar metadata by id', () => {
     expect(CAAIL_PAGES.byId('research-areas/bioprocess')?.title).toBe('Bioprocess control');
     expect(CAAIL_PAGES.byId('datasets/cow')?.title).toContain('Cow');
     expect(CAAIL_PAGES.byId('other-resources')).toMatchObject({ group: 'top', title: 'Other Resources' });
+    expect(CAAIL_PAGES.byId('ai-agents-foundation-models')).toMatchObject({ group: 'top', title: 'AI Agents & Foundation Models' });
   });
   it('all() returns {id,...meta} objects', () => {
     const all = CAAIL_PAGES.all();
-    expect(all.length).toBe(26);
+    expect(all.length).toBe(27);
     const cow = all.find((p) => p.id === 'datasets/cow');
     expect(cow).toMatchObject({ id: 'datasets/cow', group: 'datasets' });
     expect(typeof cow?.sidebarLabel).toBe('string');
diff --git a/site/src/content/caail-pages.ts b/site/src/content/caail-pages.ts
index d6f57f7..21a883e 100644
--- a/site/src/content/caail-pages.ts
+++ b/site/src/content/caail-pages.ts
@@ -239,6 +239,14 @@ const PAGES: Record<string, PageMeta> = {
     description:
       'Definitions of every AI/ML method row and cellular-agriculture research-area column in the Papers matrix — what each covers, what is out of scope, and how to tell confusable categories apart.',
   },
+  'ai-agents-foundation-models': {
+    title: 'AI Agents & Foundation Models',
+    sidebarLabel: 'AI Agents & Foundation Models',
+    group: 'top',
+    order: 4,
+    description:
+      'The connective hub for AI agents and biological foundation models in cellular agriculture — agent frameworks, single-cell foundation models, the virtual-cell initiative, and where each is catalogued across CAAIL.',
+  },
 };
 
 // ---------------------------------------------------------------------------
@@ -273,6 +281,7 @@ export const CAAIL_PAGES = {
       // Top-level file (e.g. CONTRIBUTING). Multi-word names get an explicit
       // hyphenated route id (the default lowercasing would merge the words).
       if (stripped === 'OtherResources') return 'other-resources';
+      if (stripped === 'AIAgentsFoundationModels') return 'ai-agents-foundation-models';
       return stripped.toLowerCase();
     }
 
diff --git a/site/src/content/loaders/caail-docs-loader.ts b/site/src/content/loaders/caail-docs-loader.ts
index 470513d..f6f2131 100644
--- a/site/src/content/loaders/caail-docs-loader.ts
+++ b/site/src/content/loaders/caail-docs-loader.ts
@@ -73,7 +73,7 @@ const REPO_ROOT = new URL('../../../../', import.meta.url);
  */
 const CANONICAL_SOURCES = {
   dirs: ['ResearchAreas', 'Datasets'],
-  files: ['CONTRIBUTING.md', 'OtherResources.md', 'Taxonomy.md'],
+  files: ['CONTRIBUTING.md', 'OtherResources.md', 'Taxonomy.md', 'AIAgentsFoundationModels.md'],
 } as const;
 
 export function caailDocsLoader(): Loader {

From e6be5921bcdad97eedc2bb345dd1fd6203996d6c Mon Sep 17 00:00:00 2001
From: benjibromberg <bromberg.benji@gmail.com>
Date: Fri, 12 Jun 2026 13:59:29 -0400
Subject: [PATCH 3/3] fix(resources): correct the AssayBench reference on the
 AI hub

AssayBench is a Papers.md entry surfaced in ResearchAreas/AIEvaluation.md, not a
dataset in Datasets/Benchmarks.md. Drop it from the Benchmarks list (ProteinGym,
LAB-Bench, BixBench remain) and attribute it to the AI Evaluation page instead.
---
 AIAgentsFoundationModels.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/AIAgentsFoundationModels.md b/AIAgentsFoundationModels.md
index 42e2204..1e680bf 100644
--- a/AIAgentsFoundationModels.md
+++ b/AIAgentsFoundationModels.md
@@ -22,7 +22,7 @@ In the [Papers matrix](./Papers.md), this area is reachable two ways: the **LLMs
 
 ## Evaluation & benchmarks
 
-How well these models and agents actually perform is tracked separately. The benchmark *datasets* — including the foundation-model-relevant ProteinGym, LAB-Bench, BixBench, and AssayBench — are catalogued in [Datasets/Benchmarks.md](./Datasets/Benchmarks.md), and the evaluation methodology and the **AI Evaluation & Benchmarking** matrix column are covered in the [AI Evaluation & Benchmarking research-area page](./ResearchAreas/AIEvaluation.md). Not every benchmark there is AI-agent-specific — many are general or domain benchmarks — which is why evaluation keeps its own home rather than folding into this page.
+How well these models and agents actually perform is tracked separately. The benchmark *datasets* — including the foundation-model-relevant ProteinGym, LAB-Bench, and BixBench — are catalogued in [Datasets/Benchmarks.md](./Datasets/Benchmarks.md), and the evaluation methodology, assay-level work such as AssayBench, and the **AI Evaluation & Benchmarking** matrix column are covered in the [AI Evaluation & Benchmarking research-area page](./ResearchAreas/AIEvaluation.md). Not every benchmark there is AI-agent-specific — many are general or domain benchmarks — which is why evaluation keeps its own home rather than folding into this page.
 
 ## Talks & onboarding