Turn your raw genetic data into an actionable, evidence-based personal health vault.
Genome Toolkit is a set of Claude Code skills and Python scripts that import DTC genome data (23andMe, AncestryDNA, MyHeritage, Genotek, Nebula, or any VCF), build a structured Obsidian knowledge vault, and generate clinically useful outputs like drug safety cards and prescriber summaries.
- Import raw genome data from any major DTC provider (auto-detected)
- Ask about your health goals (medication safety, mental health, gut, liver, sleep...)
- Generate personalized gene notes, system maps, and clinical reports
- Validate claims using a multi-agent AI pipeline (Codex, NotebookLM, PubMed)
- Track biomarkers and compare lab results against genetic predictions
- Expand your data via guided imputation (600K -> 3-40M variants)
- Map your life-expectancy context across the countries you've lived in (migrant Life-Map)
v0.4.0 — Full-stack web app with 7 views (SNP Browser, Mental Health, PGx, Addiction, Risk Landscape, in-browser Import, and Life-Map), AI chat with collapsible sidebar, multi-provider TTS (Groq Orpheus/ElevenLabs/Deepgram), chat history, configurable nav, agent-friendly setup, and 6 supported genome providers. Hardened by a scientific-honesty audit — outputs are framed as evidence-tiered context, not clinical prediction (no fabricated risk scores; uncalibrated GWAS tallies labelled as such; harm-reduction content kept but never presented as genotype-backed dosing).
- Obsidian with Dataview plugin
- Python 3.10+
- Claude Code
# Clone the toolkit
git clone https://github.com/glebis/genome-toolkit.git
cd genome-toolkit
# Install Python dependencies
pip install -e ".[web]"
# Interactive setup (API keys, vault path, TTS, visible sections)
python scripts/setup.py
The setup script supports a fully non-interactive --auto mode that reads API keys from environment variables and accepts all options as CLI flags:
# Set API keys as env vars (or store in macOS Keychain)
export ANTHROPIC_API_KEY=sk-ant-...
export GROQ_API_KEY=gsk_...
# Run non-interactive setup
python scripts/setup.py --auto \
--vault ~/my-genome-vault \
--tts-provider orpheus \
--tts-voice leo \
--population EUR
# Optionally hide nav sections
python scripts/setup.py --auto --hide-views addiction risk
# Show them again
python scripts/setup.py --auto --show-views addiction
All flags are optional: omitted values use existing config or sensible defaults. This means an AI agent can run python scripts/setup.py --auto with zero arguments and get a working config.
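For an agent or script driving the setup, the command line can be assembled programmatically. The following is a minimal sketch, not part of the toolkit: `build_setup_command` is a hypothetical helper, and it uses only the flags shown above (`--auto`, `--vault`, `--tts-provider`, `--tts-voice`, `--population`, `--hide-views`, `--show-views`).

```python
from typing import Optional, Sequence


def build_setup_command(
    vault: Optional[str] = None,
    tts_provider: Optional[str] = None,
    tts_voice: Optional[str] = None,
    population: Optional[str] = None,
    hide_views: Sequence[str] = (),
    show_views: Sequence[str] = (),
) -> list[str]:
    """Build an argv list for the non-interactive setup (hypothetical helper)."""
    cmd = ["python", "scripts/setup.py", "--auto"]
    # Every flag is optional; omitted ones are simply left off the command line.
    single_value_flags = (
        ("--vault", vault),
        ("--tts-provider", tts_provider),
        ("--tts-voice", tts_voice),
        ("--population", population),
    )
    for flag, value in single_value_flags:
        if value is not None:
            cmd += [flag, value]
    if hide_views:
        cmd += ["--hide-views", *hide_views]
    if show_views:
        cmd += ["--show-views", *show_views]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root. Passing a list rather than a shell string avoids quoting problems with vault paths that contain spaces.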
# 1. Place your raw genome file in the vault
cp ~/Downloads/23andme_raw.txt $GENOME_VAULT_ROOT/data/raw/
# 2. Import your data (in Claude Code)
/genome-import
# 3. Set up your vault with health goals
/genome-onboard
# 4. Start the app
uvicorn backend.app.main:app --port 8000 &
cd frontend && npm run dev
# 5. Open http://localhost:5173
A full-stack web interface for exploring your genome data interactively.
| View | Description |
|---|---|
| SNP Browser | Paginated, filterable table of 3.4M+ variants with ClinVar significance, review stars, population frequency, and GWAS effect sizes |
| Mental Health | GWAS-powered psychiatric genetics dashboard with gene cards and evidence panels |
| PGx / Drugs | Pharmacogenomic profile — metabolizer status, drug cards, interaction warnings |
| Addiction | Harm-reduction-oriented substance sensitivity analysis |
| Risk Landscape | Top mortality causes overlaid with personal genetic risk factors |
| Import | Browser upload for raw genome files — drag-and-drop, format auto-detection preview, and import history (no CLI required) |
Add genome data without touching the command line. The Import tab (always available, and shown automatically on first run when the database is empty) lets you:
- Drag-and-drop or browse for a raw file: 23andMe / AncestryDNA .txt, MyHeritage / Genotek .csv, or VCF .vcf / .vcf.gz (including imputed).
- Preview the detected format (provider, version, assembly, confidence, and an estimated variant count) before anything is written.
- Set options — profile name, minimum imputation r² (VCF only), and a dry-run toggle to validate without importing.
- Import and review — imported / duplicate / low-r² counts, plus an import history table.
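The imported / duplicate / low-r² counts reported after an import can be illustrated with a small tally. This is a sketch under stated assumptions, not the logic in scripts/lib/importer.py: the `Variant` record, the `tally_import` function, and the default threshold of 0.3 are all hypothetical, and duplicates are assumed to be detected by rsID.

```python
from typing import Iterable, NamedTuple, Optional


class Variant(NamedTuple):
    """Hypothetical minimal variant record used only for this illustration."""
    rsid: str
    genotype: str
    r2: Optional[float] = None  # imputation quality; None for directly genotyped calls


def tally_import(variants: Iterable[Variant], min_r2: float = 0.3) -> dict[str, int]:
    """Count how many variants would be imported, skipped as duplicates, or dropped for low r²."""
    counts = {"imported": 0, "duplicate": 0, "low_r2": 0}
    seen: set[str] = set()
    for variant in variants:
        # The r² filter only applies to imputed calls (VCF input), which carry an r² value.
        if variant.r2 is not None and variant.r2 < min_r2:
            counts["low_r2"] += 1
        elif variant.rsid in seen:
            counts["duplicate"] += 1
        else:
            seen.add(variant.rsid)
            counts["imported"] += 1
    return counts
```

Running the same tally without writing anything is one way to think about the dry-run toggle: the counts are computed, but no rows reach the database.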
Files are streamed to a temporary file (200 MB cap, .zip rejected), processed in a
threadpool so the server stays responsive, then deleted. The browser flow and the
/genome-import CLI share one code path (scripts/lib/importer.py), so results are identical.
REST endpoints: POST /api/import/detect, POST /api/import/upload, GET /api/import/history.
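The upload handling described above (stream to a temporary file, enforce a size cap, reject .zip) can be sketched with the standard library alone. This is an illustration, not the backend's code: `stream_to_temp`, `UploadRejected`, and the interpretation of "200 MB" as 200 × 1024 × 1024 bytes are assumptions.

```python
import tempfile
from pathlib import Path
from typing import Iterable

MAX_UPLOAD_BYTES = 200 * 1024 * 1024  # assumed reading of the 200 MB cap


class UploadRejected(ValueError):
    """Raised when an upload is refused (hypothetical error type)."""


def stream_to_temp(filename: str, chunks: Iterable[bytes],
                   max_bytes: int = MAX_UPLOAD_BYTES) -> Path:
    """Write chunks to a temp file, aborting as soon as the size cap is exceeded."""
    if filename.lower().endswith(".zip"):
        raise UploadRejected("zip archives are not accepted")
    written = 0
    tmp = tempfile.NamedTemporaryFile(delete=False, suffix=Path(filename).suffix)
    try:
        with tmp:  # closes the file before any cleanup below
            for chunk in chunks:
                written += len(chunk)
                if written > max_bytes:
                    raise UploadRejected(f"upload exceeds {max_bytes} bytes")
                tmp.write(chunk)
    except BaseException:
        # Never leave a partial upload on disk.
        Path(tmp.name).unlink(missing_ok=True)
        raise
    return Path(tmp.name)
```

The caller is responsible for deleting the returned path once the import has finished. Checking the running total per chunk means an oversized upload is rejected without first being written in full.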
AI chat powered by Claude Agent SDK with 11 MCP tools for querying your genome:
- Personalized starter prompts — context-aware suggestions based on current view and your data (e.g., "Which drugs should I discuss with my doctor?" when on PGx tab with CYP2D6 poor metabolizer status)
- Vault integration — reads your Obsidian gene notes, systems, phenotypes, and protocols
- Interactive responses — clickable gene names, wikilinks, action buttons (add to checklist, show gene, show variant)
- Voice mode — dictation input + TTS output with gene name spelling
- Suggested actions — AI can add items to your checklist, filter the SNP table, or open relevant links
- Collapsible sidebar mode: when the AI filters the SNP table, the chat palette auto-collapses to a right sidebar so you can see the table updating in real time. Toggle with Cmd+\, expand back with Cmd+K
# Backend (FastAPI)
uvicorn backend.app.main:app --port 8000
# Frontend (Vite + React)
cd frontend && npm run dev
# Open http://localhost:5173
- Backend: FastAPI, aiosqlite, Claude Agent SDK, SOPS-encrypted secrets
- Frontend: React 18, TypeScript, Vite, Vitest
- TTS: Multi-provider (Groq Orpheus, ElevenLabs, Deepgram) with browser fallback
- Data: SQLite (genome.db), Obsidian vault, GWAS configs (PGC), PGx configs (CPIC)
| Skill | Trigger | Purpose |
|---|---|---|
| genome-import | /genome-import | Import raw data, prepare imputation, import imputed VCFs |
| genome-onboard | /genome-onboard | Goal-driven vault setup. --quick (4 questions) or --full (22 questions with GAD-7/PHQ-2/PSS-4) |
| genome-create | /new-gene X | Create gene/system/phenotype notes from SQLite data |
| genome-analytics | /genome-analytics | PRS, enrichment, vault audit, PubMed monitoring |
| genome-report | /biomarker, /wallet-card | Lab import, Wallet Card, PGx Card, Prescriber Summary |
| genome-query | /genome-query | SQL-like vault queries (filter, sort, group, stats, schema) |
| genome-validate | /genome-validate | Multi-agent fact-checking (Codex + NotebookLM + PubMed) |
| Provider | Format | Detected By |
|---|---|---|
| 23andMe (v2-v5) | TSV | "23andMe" in file header |
| Genotek (Генотек) | TSV | "Genotek" in file header |
| AncestryDNA | TSV (5 cols) | allele1/allele2 column pattern |
| MyHeritage | CSV | "RSID,CHROMOSOME,POSITION,RESULT" header |
| Nebula Genomics | VCF | "source=Nebula" in VCF header |
| Generic VCF | VCF | ##fileformat=VCF header |
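The detection signatures in the table can be expressed as a short function over the first few lines of a file. This is a sketch of the idea only: `detect_provider` is a hypothetical name, the real signatures live in config/provider_formats.yaml, and details such as checking VCF first and treating only the first non-comment line as a column header are assumptions.

```python
def detect_provider(head: str) -> str:
    """Guess the provider from the first lines of a raw genome file (illustrative only)."""
    lines = [line.strip() for line in head.splitlines()]
    # VCF is checked first so a Nebula VCF is not mistaken for a generic one.
    if any(line.startswith("##fileformat=VCF") for line in lines):
        return "Nebula Genomics" if "source=Nebula" in head else "Generic VCF"
    if "23andMe" in head:
        return "23andMe"
    if "Genotek" in head:
        return "Genotek"
    for line in lines:
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        columns = line.replace('"', "").lower()
        if columns == "rsid,chromosome,position,result":
            return "MyHeritage"
        if "allele1" in columns and "allele2" in columns:
            return "AncestryDNA"
        break  # only the first non-comment line can be a column header
    return "unknown"
```

Reading only the head of the file keeps detection cheap enough to run as a preview step before any import work starts.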
Every claim in the vault has an evidence tier:
- E1 (clinical-grade): CPIC/DPWG guidelines, multiple studies — act on these
- E2 (well-replicated): Multiple GWAS, OR > 1.5 — likely reliable
- E3 (supported): 2-5 studies, plausible mechanism — interpret cautiously
- E4 (suggestive): Single study — hypothesis, not diagnosis
- E5 (speculative): Preliminary or N=1 — for curiosity only
Genome Toolkit uses multiple AI agents to validate claims:
- Claude Code — primary agent for note creation and analysis
- Codex CLI (gpt-5-codex) — cross-model validation of evidence tiers and drug interactions
- NotebookLM — source-grounded fact-checking of prescriber documents
- PubMed subagents — literature verification and retraction monitoring
- Tavily/Firecrawl — web search for recent publications and safety alerts
Prescriber-facing reports require 2 agents to agree before publishing (configurable in config/agents.yaml).
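The agreement rule for prescriber-facing reports amounts to a vote count. A minimal sketch, assuming each agent returns a verdict string and that "pass" marks approval; the function name, the verdict vocabulary, and the agent keys are hypothetical, and the actual schema in config/agents.yaml is not shown here.

```python
def can_publish(verdicts: dict[str, str], required: int = 2) -> bool:
    """Return True when at least `required` agents approved the report.

    `verdicts` maps an agent name to its verdict; only the literal
    string "pass" counts as approval in this sketch.
    """
    approvals = sum(1 for verdict in verdicts.values() if verdict == "pass")
    return approvals >= required
```

For example, `can_publish({"codex": "pass", "notebooklm": "pass", "pubmed": "fail"})` is true, while a single approval is not enough at the default threshold.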
- Full audit (/genome-validate): multi-agent sweep of the entire vault
- Gene fact-check (/genome-validate gene COMT): verify genotypes vs SQLite, check claims via web, validate evidence tiers
- Protocol/report fact-check (/genome-validate protocol "Sertraline Optimization"): verify gene-recommendation links, supplement safety, evidence tiers
Query any frontmatter field with SQL-like syntax from the CLI:
python3 scripts/vault_query.py "type=gene AND evidence_tier=E1" --fields gene_symbol,full_name
python3 scripts/vault_query.py "type=system" --fields system_name,coverage --sort coverage --desc
python3 scripts/vault_query.py --stats
Supports: =, !=, ~ (contains), >, <, >=, <=, AND/OR/NOT logic, --sort, --group, --json, --count, --stats, --schema. See skills/genome-query/SKILL.md for full reference.
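To show how such a query maps onto frontmatter, here is a deliberately small matcher covering only `=`, `!=`, `~`, and `AND`. It is a sketch, not the parser in scripts/vault_query.py: the `matches` function is hypothetical, values are compared as strings, and the comparison operators, `OR`, and `NOT` are left out.

```python
import re

# One term looks like: field <op> value, where <op> is =, != or ~ (contains).
_TERM = re.compile(r"^\s*(\w+)\s*(!=|=|~)\s*(.+?)\s*$")


def matches(frontmatter: dict, query: str) -> bool:
    """Check a note's frontmatter against AND-joined terms (illustrative subset)."""
    for term in query.split(" AND "):
        parsed = _TERM.match(term)
        if parsed is None:
            raise ValueError(f"unsupported term: {term!r}")
        field, op, expected = parsed.groups()
        actual = str(frontmatter.get(field, ""))  # missing fields compare as empty
        if op == "=" and actual != expected:
            return False
        if op == "!=" and actual == expected:
            return False
        if op == "~" and expected not in actual:
            return False
    return True
```

Filtering a list of parsed notes is then a comprehension such as `[n for n in notes if matches(n, "type=gene AND evidence_tier=E1")]`.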
Interactive triage for vault action items with DDD architecture:
# CLI
cd genome-toolkit && PYTHONPATH=. python -m genome_toolkit.triage --vault ~/genome-vault --classify
# TUI (Textual terminal UI)
genome-triage
Features:
- Score and bucket action items (DO_NOW / SCHEDULE / DELEGATE / DROP)
- SVG renderings: dashboard, score cards, visit reports
- Session persistence (SQLite) with approval/deferral history
- Suggestion engine based on assessment scores + genetic signals
- Tests across domain, application, infrastructure, presentation layers
Quick (/genome-onboard): 4 questions → 8-12 gene notes + Wallet Card (2 min)
Full (/genome-onboard --full): 22-question interview across 4 phases (12 min):
- Phase 1: Demographics, medications, diagnoses, goals
- Phase 2: Sleep, exercise, caffeine, GI symptoms, pain, morning stiffness
- Phase 3: GAD-7, PHQ-2, PSS-4 (validated instruments)
- Phase 4: Family history, ancestry, concerns
Generates Profile Card + personalized Action Plan with assessment-weighted gene priorities.
- evidence-check — Modular scientific claim verification (PubMed, genomics, psychiatry). Used by genome-validate for structured fact-checking.
your-genome-vault/
  Dashboard.md # Decision-tree home page
  Question Index.md # Search by concern, not by gene
  Genetic Determinism...md # Epistemic guardrails
  Genes/ # One note per gene (BDNF.md, CYP2D6.md, ...)
  Systems/ # Biological systems (Dopamine, HPA Axis, ...)
  Phenotypes/ # Genetics -> lived experience
  Protocols/ # Actionable intervention protocols
  Reports/ # Prescriber-facing documents
  Biomarkers/ # Lab results with genetic comparison
  Research/ # Literature reviews and findings
  Meta/ # Dashboards and audit reports
  Guides/ # Getting Started, Imputation Guide
  data/
    raw/ # Your genome file (gitignored)
    genome.db # SQLite database (gitignored)
Expand from ~600K SNPs to ~3-40M variants:
- /genome-import: prepare for imputation (VCF export + QC)
- Upload to Michigan Imputation Server (free, 2-12 hours)
- /genome-import: import imputed data
See Guides/Imputation Guide.md for full walkthrough.
Requires: bcftools, bgzip (brew install bcftools htslib)
All config in config/:
- default.yaml: paths, rate limits, cache TTL
- goal_map.yaml: health goals -> systems -> genes mapping
- evidence_tiers.yaml: E1-E5 definitions
- provider_formats.yaml: file format detection signatures
- agents.yaml: multi-agent validation pipeline
Override paths via environment variables: GENOME_VAULT_ROOT, GENOME_DB_PATH.
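A script that needs the same paths can resolve them with the environment taking precedence. This is a minimal sketch, not the toolkit's own resolution logic: `resolve_paths` is hypothetical, the fallback vault location `~/genome-vault` is an assumed default, and the database fallback follows the `data/genome.db` location from the vault layout.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_paths(env: Optional[Mapping[str, str]] = None) -> tuple[Path, Path]:
    """Return (vault_root, db_path), letting environment variables override defaults."""
    env = os.environ if env is None else env
    # Assumed default; the real default comes from config/default.yaml.
    vault_root = Path(env.get("GENOME_VAULT_ROOT", "~/genome-vault")).expanduser()
    if "GENOME_DB_PATH" in env:
        db_path = Path(env["GENOME_DB_PATH"]).expanduser()
    else:
        db_path = vault_root / "data" / "genome.db"
    return vault_root, db_path
```

Accepting an explicit mapping makes the function easy to test without mutating the process environment.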
API keys are stored in macOS Keychain by default (via scripts/setup.py). For team use or version-controlled secrets, the project optionally supports SOPS encryption with age:
brew install sops age
# Load encrypted secrets into current shell
source scripts/load_secrets.sh
# Edit encrypted secrets (requires age key in ~/.config/sops/age/keys.txt)
sops config/secrets.yaml
config/secrets.yaml is encrypted at rest and safe to commit: only age key holders can decrypt.
- Raw genome data stays local (gitignored, never uploaded)
- SQLite database is gitignored
- No data leaves your machine unless you explicitly use imputation servers or API enrichment
- Imputation servers (Michigan/TOPMed) encrypt data and delete after 7 days
- Reports reference rsIDs, not bulk genotype dumps
"Normal willpower, different hardware. Fully rewirable."
- Genetics explains WHY, not what's wrong
- Every gene note ends with "What Changes This" (the exit ramp)
- E1 claims are reliable. E3-E5 claims are hypotheses, not diagnoses.
- 40-70% of outcomes are environment, behavior, and choice
Generate genomics-themed images via OpenAI's GPT Image 2 API with 10 curated style templates.
# List available styles
python3 scripts/generate_image.py --list-styles
# Generate with a style preset
python3 scripts/generate_image.py --style flat-kahn --size 1024x1536 "COMT enzyme pathway" out.png
# Draft mode (97% cheaper, for iteration)
python3 scripts/generate_image.py --style nordic-refined --draft "dopamine clearance diagram" out.png
# Combine styles
python3 scripts/generate_image.py --style "fritz-kahn+retro-terminal" "brain factory on VT100" out.png
# Preview composed prompt without calling API
python3 scripts/generate_image.py --style scientific --dry-run "Yerkes-Dodson curve" out.png
| Style | Description |
|---|---|
| arntz | Gerd Arntz ISOTYPE pictograms: bold geometric silhouettes on black |
| dark-infographic | Schemas, arrows, bar charts on black background |
| nordic-craft | Scandinavian indie: linen texture, linocut, botanical accents |
| nordic-refined | Cleaner Nordic: rounded grotesque sans-serif, editorial polish |
| scientific | Yerkes-Dodson curves, kinetics graphs, molecular diagrams |
| vintage-biological | 19th century Haeckel/Cajal engravings on aged parchment |
| vintage-modern | Vintage engravings + clean modern sans-serif typography |
| fritz-kahn | 1920s Industriepalast: body as factory with tiny workers |
| retro-terminal | Each slide on a different vintage computer (Mac, C64, VT100, iMac G3) |
| flat-kahn | Ultra flat vector Fritz Kahn: no gradients, Soviet-industrial palette |
Styles defined in styles.yaml. Add your own by following the existing format.
# Backend (Python)
pip install -e ".[dev]"
python -m pytest tests/ -v
# Frontend (TypeScript)
cd frontend && npx vitest run
# Coverage report
cd frontend && npx vitest run --coverage
genome-toolkit/
  backend/ # FastAPI backend
    app/
      routes/ # API endpoints (snps, chat, vault, gwas, tts, starter-prompts)
      agent/ # Claude Agent SDK orchestration + MCP tools
      tts/ # Multi-provider TTS (Groq Orpheus, ElevenLabs, Deepgram)
      db/ # Async SQLite wrappers (genome.db, users.db)
  frontend/ # React + TypeScript + Vite
    src/
      components/ # UI components (common/, mental-health/, pgx/, addiction/, risk/)
      hooks/ # Data hooks (useSNPs, useChat, usePGxData, useStarterPrompts, ...)
      __tests__/ # Vitest tests
  config/ # YAML/JSON configuration (goals, evidence tiers, GWAS, PGx drugs)
  scripts/ # Python pipeline (import, vault_query, migrations)
  skills/ # Claude Code skill definitions
  vault-template/ # Obsidian vault starter
  tests/ # Python test suite (pytest)
The toolkit follows a separation of concerns:
- Data layer: SQLite with versioned migrations, multi-profile support (scripts/lib/db.py)
- Import layer: Provider-agnostic parsing with auto-detection (scripts/lib/providers/)
- Knowledge layer: Obsidian markdown with Dataview queries (vault-template/)
- Validation layer: Multi-agent consensus pipeline (scripts/lib/multi_agent.py)
- Skill layer: Claude Code skills that orchestrate everything (skills/)
This toolkit is for research and educational purposes only. It is not a medical device. Genetic information should be interpreted by qualified healthcare professionals. Always consult your doctor before making medical decisions based on genetic data.
The evidence tier system (E1-E5) reflects the state of published research, not clinical recommendations. Drug interaction information is derived from CPIC/DPWG guidelines but may not reflect the most current updates.
MIT
