GenCC-Link

MCP + FastAPI server that grounds gene-disease validity questions in the Gene Curation Coalition (GenCC) dataset — harmonized, aggregated, and served with consensus and conflict detection.

Research use only. Not for diagnosis, treatment, triage, patient management, or clinical decision support.

Features

GenCC gene-disease validity harmonized across member submitters (ClinGen, Genomics England PanelApp, Orphanet, Ambry, Invitae, Illumina, and others).
Strongest-classification + conflict detection — for each gene-disease pair, the strongest_classification (highest rank across submitters) and a has_conflict flag when supporting and against assertions coexist.
Local SQLite + FTS5 store built from the weekly GenCC bulk export — fast, deterministic, no upstream API at query time.
12 MCP tools with token-efficient response_mode shaping, typed outputSchema, plain-English headlines, and ready-to-call _meta.next_commands chains (one per resolved entity) — on success and error envelopes, so recovery is deterministic.
Validated enum filters — find_curations rejects out-of-vocabulary classification/submitter/moi with invalid_input and the accepted set (case-insensitive, with "did you mean"), instead of a misleading empty result. Each matched row carries a matched field naming the triggering submission.
Observability — every _meta carries request_id + elapsed_ms; get_gencc_diagnostics reports the daily download-quota headroom.
Three transports from one codebase: unified (REST + MCP), http, stdio.
Agent-discoverable — gencc:// capabilities (with inheritance_modes, data_notes), usage, reference, license, and citation resources; typed error envelopes; full recommended_citation in full mode, or a cacheable citation_ref + one-line citation_short otherwise.
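
The validated-enum behavior described above (case-insensitive matching, an `invalid_input` error carrying the accepted set, and a "did you mean" hint) can be sketched roughly as follows. This is an illustration only, not the project's implementation: the function name `validate_enum`, the envelope keys, and the vocabulary tuple are assumptions, and the real accepted set is whatever the server reports.

```python
from difflib import get_close_matches

# Assumed vocabulary for illustration; the authoritative list comes from the server.
ACCEPTED_CLASSIFICATIONS = (
    "Definitive",
    "Strong",
    "Moderate",
    "Supportive",
    "Limited",
    "Disputed Evidence",
    "Refuted Evidence",
    "No Known Disease Relationship",
)


def validate_enum(value: str, accepted: tuple, field: str) -> dict:
    """Return the canonical value, or a hypothetical invalid_input envelope."""
    by_lower = {item.lower(): item for item in accepted}
    needle = value.strip().lower()
    canonical = by_lower.get(needle)
    if canonical is not None:
        return {"ok": True, "value": canonical}
    # Closest accepted term, if any is similar enough to suggest.
    close = get_close_matches(needle, list(by_lower), n=1, cutoff=0.6)
    return {
        "ok": False,
        "error": "invalid_input",
        "field": field,
        "accepted": list(accepted),
        "did_you_mean": by_lower[close[0]] if close else None,
    }
```

Returning the accepted set with the error lets a caller retry without a second discovery call, which is the point of rejecting the value rather than returning an empty result.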

Data source & license

GenCC has no live API; data is distributed as a single bulk export.

Source: Gene Curation Coalition bulk submissions export (new format) from thegencc.org, ~24MB TSV, updated weekly.
Data license: CC0 1.0 (public domain). Attribution to GenCC and the contributing sources is requested.
OMIM restriction: OMIM disease text is restricted where licensing forbids, so the disease_original_* OMIM fields may be absent — this is expected.
Not clinical: GenCC data is not intended for direct diagnostic use or medical decision-making without review by a genetics professional.

Quick start

# Install uv if needed
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install project and dev dependencies
uv sync

# Build the local SQLite database from the GenCC export (~24MB download)
make data

# Start the unified REST + MCP server on http://127.0.0.1:8000
make dev

# Or start the local stdio MCP server (for Claude Desktop)
make mcp-serve

The database is built into <repo>/data/gencc.sqlite by default. With GENCC_LINK_DATA__AUTO_BOOTSTRAP=true (the default), the HTTP / unified server also builds the database on first use if it is absent, so make data is optional but recommended for a predictable first boot.

Database management commands:

make data          # gencc-link-data build   — force download + rebuild
make data-refresh  # gencc-link-data refresh — rebuild only if export changed
make data-info     # gencc-link-data info    — print build provenance

Connecting Claude Code & Claude Desktop

See docs/MCP_CONNECTION_GUIDE.md for the full guide. Streamable HTTP at /mcp is recommended; stdio is a local fallback.

Claude Code (HTTP)

make dev
claude mcp add --transport http gencc-link http://127.0.0.1:8000/mcp

Claude Desktop (HTTP)

{
  "mcpServers": {
    "gencc-link": {
      "type": "http",
      "url": "http://127.0.0.1:8000/mcp"
    }
  }
}

Claude Desktop (stdio)

{
  "mcpServers": {
    "gencc-link": {
      "command": "gencc-link-mcp",
      "env": {
        "PYTHONUNBUFFERED": "1",
        "GENCC_LINK_LOG_LEVEL": "WARNING"
      }
    }
  }
}

Or run stdio from a checkout with uv (no install step):

{
  "mcpServers": {
    "gencc-link": {
      "command": "uv",
      "args": ["run", "python", "mcp_server.py"],
      "cwd": "/absolute/path/to/gencc-link"
    }
  }
}

Available MCP tools

| Tool | Purpose |
| --- | --- |
| `get_server_capabilities` | Tool inventory, classification ranks, response modes, data freshness |
| `get_gencc_diagnostics` | Build provenance + row/gene/disease/submitter counts |
| `search_genes` | Resolve symbol / HGNC id / partial text to genes (FTS) |
| `search_diseases` | Resolve title / MONDO / OMIM id to diseases (FTS) |
| `get_gene_curations` | All gene-disease assertions for a gene, with strongest classification + conflict |
| `get_disease_curations` | All genes asserted for a disease, with strongest classification + conflict |
| `get_genes_curations` | Batch `get_gene_curations`: up to 20 genes in one call (misses in `unresolved`) |
| `get_diseases_curations` | Batch `get_disease_curations`: up to 20 diseases in one call (misses in `unresolved`) |
| `get_gene_disease_assertion` | One pair: per-submitter classifications, MOI, PMIDs, URLs + conflict analysis |
| `find_curations` | Filter assertions by classification/submitter/MOI/conflict (`ids_only` for cheap paging; `cursor` for refresh-safe autonomous page-forward) |
| `list_submitters` | Submitting organizations + counts |
| `resolve_identifier` | Map free text to canonical HGNC/MONDO ids |

Tools whose payloads vary accept response_mode: minimal | compact (default) | standard | full. See docs/usage.md for the canonical workflows and the citation contract.
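
As a rough illustration of how `response_mode` interacts with the citation fields mentioned under Features (full `recommended_citation` in `full` mode, `citation_ref` plus `citation_short` otherwise), consider the sketch below. The function `attach_citation`, the `gencc://citation` value, and the short citation string are assumptions made for this example; docs/usage.md remains the reference for the actual contract.

```python
from typing import Literal, get_args

ResponseMode = Literal["minimal", "compact", "standard", "full"]
DEFAULT_MODE = "compact"

RECOMMENDED_CITATION = (
    "DiStefano MT, et al. The Gene Curation Coalition. "
    "Genet Med. 2022;24(8):1732-1742. doi:10.1016/j.gim.2022.04.017"
)


def attach_citation(payload: dict, response_mode: str = DEFAULT_MODE) -> dict:
    """Return a copy of payload with citation fields chosen by response_mode."""
    if response_mode not in get_args(ResponseMode):
        raise ValueError(f"unknown response_mode: {response_mode!r}")
    shaped = dict(payload)
    if response_mode == "full":
        shaped["recommended_citation"] = RECOMMENDED_CITATION
    else:
        # Assumed values: a cacheable pointer plus a one-line short form.
        shaped["citation_ref"] = "gencc://citation"
        shaped["citation_short"] = "DiStefano et al., Genet Med 2022"
    return shaped
```

Keeping the long citation out of the smaller modes is what makes them token-efficient: a client can resolve the reference once and cache it.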

GeneFoundry federation

GenCC-Link is part of the GeneFoundry *-link MCP fleet, federated behind the genefoundry-router gateway. It follows Tool-Naming & Normalization Standard v1:

serverInfo.name: gencc-link (stable identity, set on the FastMCP instance).
Gateway namespace token: gencc. The router mounts this server with namespace="gencc", so its tools surface at the gateway as gencc_<tool> (e.g. gencc_search_genes). Standalone MCP clients namespace it as mcp__gencc-link__<tool>.
Unprefixed leaves: tool names are intentionally not server-prefixed: namespacing is the gateway's job (Rule 1), so a leaf prefix would double-prefix at the gateway. A CI guard (tests/test_tool_naming.py) enforces ^[a-z0-9_]{1,50}$ + a canonical verb (get/search/list/resolve/find/compare/compute) + a domain tag on every registered tool.
Canonical arguments: gene_symbol (approved symbol) / hgnc_id (HGNC CURIE) — pass exactly one to a single-gene tool; disease (MONDO/OMIM CURIE or title); response_mode; limit/offset. The batch get_genes_curations keeps a polymorphic genes list (symbols or HGNC CURIEs).
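
The naming rules above can be approximated in a few lines. This is a sketch, not the contents of tests/test_tool_naming.py: the helper names are hypothetical, and the "domain tag" check is reduced to requiring something after the verb, since the real rule is not spelled out here.

```python
import re

NAME_PATTERN = re.compile(r"[a-z0-9_]{1,50}")
CANONICAL_VERBS = ("get", "search", "list", "resolve", "find", "compare", "compute")


def is_valid_tool_name(name: str) -> bool:
    """Check the character/length rule and that the name leads with a canonical verb."""
    if NAME_PATTERN.fullmatch(name) is None:
        return False
    verb, _, rest = name.partition("_")
    # Assumption: a non-empty remainder stands in for the domain tag.
    return verb in CANONICAL_VERBS and bool(rest)


def gateway_name(tool: str, namespace: str = "gencc") -> str:
    """Name as surfaced by the router, which adds the namespace token."""
    return f"{namespace}_{tool}"


def standalone_name(tool: str, server: str = "gencc-link") -> str:
    """Name as surfaced by a standalone MCP client."""
    return f"mcp__{server}__{tool}"
```

Under these rules a leaf named `gencc_search_genes` fails validation, because `gencc` is not a canonical verb, which is how the guard keeps a server prefix from being doubled at the gateway.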

Architecture

GenCC is small, slow-changing bulk data with no live API, so GenCC-Link builds a local SQLite + FTS5 artifact once and queries it in-process — no upstream client, rate limiting, or caching against an external API at query time.

ingest (download -> parse -> aggregate -> build) -> SQLite + FTS5 store
  -> repository (read-only) -> service (search / curations / consensus)
  -> MCP tools  +  FastAPI (/health, /, /docs)
  -> transports: unified | http | stdio

Full details, the consensus/conflict model, and an ASCII diagram are in docs/architecture.md.
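
To make the consensus step concrete, here is a minimal sketch of deriving `strongest_classification` and `has_conflict` for one gene-disease pair. The numeric ranks and the split between supporting and against labels are assumptions for illustration; the ranks the server actually uses are reported by `get_server_capabilities`, and the model itself is described in docs/architecture.md.

```python
# Assumed ordering, higher is stronger. Not taken from the project source.
RANKS = {
    "Definitive": 8,
    "Strong": 7,
    "Moderate": 6,
    "Supportive": 5,
    "Limited": 4,
    "Disputed Evidence": 3,
    "Refuted Evidence": 2,
    "No Known Disease Relationship": 1,
}

# Assumed set of labels that count as an assertion against the relationship.
AGAINST = frozenset(
    {"Disputed Evidence", "Refuted Evidence", "No Known Disease Relationship"}
)


def summarize(classifications: list) -> dict:
    """Aggregate per-submitter labels for a single gene-disease pair."""
    if not classifications:
        raise ValueError("at least one classification is required")
    # Raises KeyError on a label outside RANKS, which surfaces bad input early.
    strongest = max(classifications, key=RANKS.__getitem__)
    has_support = any(label not in AGAINST for label in classifications)
    has_against = any(label in AGAINST for label in classifications)
    return {
        "strongest_classification": strongest,
        "has_conflict": has_support and has_against,
    }
```

For example, `summarize(["Definitive", "Refuted Evidence"])` reports `Definitive` as strongest and flags a conflict, while `summarize(["Strong", "Limited"])` does not.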

Configuration

Settings load from environment variables prefixed GENCC_LINK_ (nested data config uses a double underscore) and an optional .env file. Copy .env.example and adjust. Key variables:

| Variable | Default | Description |
| --- | --- | --- |
| `GENCC_LINK_HOST` | `127.0.0.1` | Server host |
| `GENCC_LINK_PORT` | `8000` | Server port |
| `GENCC_LINK_TRANSPORT` | `unified` | `unified` \| `http` \| `stdio` |
| `GENCC_LINK_MCP_PATH` | `/mcp` | MCP endpoint path |
| `GENCC_LINK_LOG_LEVEL` | `INFO` | Logging level |
| `GENCC_LINK_LOG_FORMAT` | `console` | `console` or `json` |
| `GENCC_LINK_DATA__SOURCE_FORMAT` | `new` | GenCC export format (`new` \| `legacy`) |
| `GENCC_LINK_DATA__DATA_DIR` | `<repo>/data` | Directory for the built database |
| `GENCC_LINK_DATA__DB_FILENAME` | `gencc.sqlite` | SQLite filename in the data dir |
| `GENCC_LINK_DATA__AUTO_BOOTSTRAP` | `true` (Docker image: `false`) | Build the database lazily on first use if absent |
| `GENCC_LINK_DATA__REFRESH_ENABLED` | `true` | Run the in-app conditional-refresh scheduler (unified/http only) |
| `GENCC_LINK_DATA__REFRESH_INTERVAL_HOURS` | `24` | Hours between conditional refresh checks |
| `GENCC_LINK_DATA__REFRESH_JITTER_SECONDS` | `300` | Random jitter (seconds) added to each refresh |
| `GENCC_LINK_DATA__BUILD_LOCK_TIMEOUT` | `600` | Seconds to wait for the cross-process build lock |
| `GENCC_LINK_DATA__DOWNLOAD_TIMEOUT` | `120` | Download timeout (seconds) |
| `GENCC_LINK_DATA__CACHE_SIZE` | `512` | Query cache entries (0 disables) |
| `GENCC_LINK_DATA__CACHE_TTL` | `3600` | Query cache TTL (seconds) |
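
The prefix and double-underscore convention maps flat environment variables onto a nested settings structure. The stdlib-only sketch below shows the mapping; it is not the project's loader (the function `load_settings` is hypothetical), and it leaves every value as a string rather than coercing types or applying defaults.

```python
import os
from collections.abc import Mapping

PREFIX = "GENCC_LINK_"
NESTED_DELIMITER = "__"


def load_settings(environ: Mapping = os.environ) -> dict:
    """Collect prefixed variables into a nested dict keyed by lowercase names."""
    settings: dict = {}
    for key, value in environ.items():
        if not key.startswith(PREFIX):
            continue
        # GENCC_LINK_DATA__CACHE_SIZE -> ["data", "cache_size"]
        path = key[len(PREFIX):].lower().split(NESTED_DELIMITER)
        node = settings
        for part in path[:-1]:
            node = node.setdefault(part, {})
        node[path[-1]] = value
    return settings
```

With `GENCC_LINK_PORT=9000` and `GENCC_LINK_DATA__CACHE_SIZE=0` set, this yields `{"port": "9000", "data": {"cache_size": "0"}}`.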

See docs/data-lifecycle.md for how the database is built on startup and refreshed on a schedule (in-app scheduler, cron sidecar, or Kubernetes CronJob).

Development

make install      # install project + dev dependencies (uv sync --group dev)
make ci-local     # format-check, lint, file-size budget, typecheck, fast tests
make test         # run tests (excludes integration)
make test-cov     # run tests with coverage (gate: 85%)
make lint         # ruff lint
make lint-loc     # enforce the per-file line budget (scripts/check_file_size.py)
make typecheck    # mypy strict

make ci-local is the gate to run before every commit. The project uses uv, Ruff (100 cols), mypy strict, and a per-file line budget enforced by scripts/check_file_size.py. Integration tests (-m integration) hit the live GenCC download endpoint and are excluded from the default runs. Agentic coding tools should follow AGENTS.md; Claude Code also loads the lean CLAUDE.md.

Docker deployment

make docker-build           # build the image
make docker-up              # start the unified server on host port 8000
curl http://localhost:8000/health
make docker-logs
make docker-down

The container's entrypoint builds the database once on startup (before the server accepts traffic), and an in-app scheduler conditionally refreshes it every 24h and hot-reloads the running server — so first-request latency is predictable and the daily download quota is respected. The built ~24MB database lives in the gencc-data named volume at /app/data and persists across restarts (a restart re-uses it; the conditional request returns 304).
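
The refresh behavior described above combines a fixed interval, random jitter, and a conditional HTTP request. The sketch below shows one way those pieces fit together; the function names are hypothetical, and whether the real downloader validates with `ETag`, `Last-Modified`, or both is an assumption, since only the 304 response is stated here.

```python
from __future__ import annotations

import random


def next_refresh_delay(
    interval_hours: float = 24,
    jitter_seconds: float = 300,
    rng: random.Random | None = None,
) -> float:
    """Seconds to sleep before the next check: the interval plus random jitter."""
    rng = rng or random.Random()
    return interval_hours * 3600 + rng.uniform(0, jitter_seconds)


def conditional_headers(etag: str | None, last_modified: str | None) -> dict:
    """Validators from the previous download, sent so the server can answer 304."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def should_rebuild(status_code: int) -> bool:
    """304 means the export is unchanged; 200 means a new body was returned."""
    if status_code == 304:
        return False
    if status_code == 200:
        return True
    raise RuntimeError(f"unexpected status from export endpoint: {status_code}")
```

Jitter spreads checks from multiple replicas so they do not hit the download endpoint at the same moment, and the 304 path means an unchanged export costs a request without a rebuild.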

For a dedicated scheduler instead of the in-app loop, use the cron sidecar overlay:

docker compose -f docker/docker-compose.yml -f docker/docker-compose.cron.yml up -d

Kubernetes manifests (initContainer + in-app scheduler, or an external CronJob) are in deploy/k8s/. The full strategy and all scheduling options are documented in docs/data-lifecycle.md. See docker/README.md for the production overlay.

License & citation

Code: MIT — see LICENSE.
Data: CC0 1.0 (public domain), from GenCC (thegencc.org); attribution requested.

Cite GenCC as:

DiStefano MT, et al. The Gene Curation Coalition. Genet Med. 2022;24(8):1732-1742. doi:10.1016/j.gim.2022.04.017

Acknowledgments

Gene Curation Coalition (GenCC) and its contributing member organizations.
Model Context Protocol, FastMCP, FastAPI, and Pydantic.

Research use only. GenCC-Link is a research tool and must not be used for diagnosis, treatment, triage, patient management, or clinical decision support. GenCC data is not intended for direct diagnostic use or medical decision-making without review by a genetics professional.
