Genome Toolkit

Turn your raw genetic data into an actionable, evidence-based personal health vault.

Genome Toolkit is a set of Claude Code skills and Python scripts that import DTC genome data (23andMe, AncestryDNA, MyHeritage, Genotek, Nebula, or any VCF), build a structured Obsidian knowledge vault, and generate clinically useful outputs like drug safety cards and prescriber summaries.

What It Does

Import raw genome data from any major DTC provider (auto-detected)
Ask about your health goals (medication safety, mental health, gut, liver, sleep...)
Generate personalized gene notes, system maps, and clinical reports
Validate claims using a multi-agent AI pipeline (Codex, NotebookLM, PubMed)
Track biomarkers and compare lab results against genetic predictions
Expand your data via guided imputation (600K -> 3-40M variants)
Map your life-expectancy context across the countries you've lived in (migrant Life-Map)

v0.4.0: Full-stack web app with 7 views (SNP Browser, Mental Health, PGx, Addiction, Risk Landscape, in-browser Import, and Life-Map), AI chat with a collapsible sidebar, multi-provider TTS (Groq Orpheus/ElevenLabs/Deepgram), chat history, configurable nav, agent-friendly setup, and 6 supported genome providers. This release was hardened by a scientific-honesty audit: outputs are framed as evidence-tiered context, not clinical prediction. No risk scores are fabricated, uncalibrated GWAS tallies are labelled as such, and harm-reduction content is kept but never presented as genotype-backed dosing.

Quick Start

Prerequisites

Obsidian with Dataview plugin
Python 3.10+
Claude Code

Setup

# Clone the toolkit
git clone https://github.com/glebis/genome-toolkit.git
cd genome-toolkit

# Install Python dependencies
pip install -e ".[web]"

# Interactive setup (API keys, vault path, TTS, visible sections)
python scripts/setup.py

Setup with Claude Code (or any AI agent)

The setup script supports a fully non-interactive --auto mode that reads API keys from environment variables and accepts all options as CLI flags:

# Set API keys as env vars (or store in macOS Keychain)
export ANTHROPIC_API_KEY=sk-ant-...
export GROQ_API_KEY=gsk_...

# Run non-interactive setup
python scripts/setup.py --auto \
  --vault ~/my-genome-vault \
  --tts-provider orpheus \
  --tts-voice leo \
  --population EUR

# Optionally hide nav sections
python scripts/setup.py --auto --hide-views addiction risk

# Show them again
python scripts/setup.py --auto --show-views addiction

All flags are optional — omitted values use existing config or sensible defaults. This means an AI agent can run python scripts/setup.py --auto with zero arguments and get a working config.
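
The fallback order described above (CLI flag, then existing config, then default) can be sketched as follows. This is a minimal illustration, not the code in scripts/setup.py: the `resolve_config` function, the `DEFAULTS` values, and the subset of flags handled are assumptions made for the example.

```python
import argparse

# Hypothetical defaults for illustration; the real script's defaults may differ.
DEFAULTS = {"vault": "~/genome-vault", "tts_provider": "browser", "population": "EUR"}


def resolve_config(argv, existing):
    """Merge CLI flags over an existing config over built-in defaults."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--auto", action="store_true")
    parser.add_argument("--vault")
    parser.add_argument("--tts-provider", dest="tts_provider")
    parser.add_argument("--population")
    args = parser.parse_args(argv)

    config = dict(DEFAULTS)
    config.update(existing)  # previously saved values beat defaults
    # Only flags the caller actually passed (non-None) override saved values.
    for key in DEFAULTS:
        value = getattr(args, key)
        if value is not None:
            config[key] = value
    return config
```

Calling `resolve_config(["--auto"], {})` returns the defaults unchanged, which mirrors the zero-argument behaviour described above.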

First Run

# 1. Place your raw genome file in the vault
cp ~/Downloads/23andme_raw.txt "$GENOME_VAULT_ROOT/data/raw/"

# 2. Import your data (in Claude Code)
/genome-import

# 3. Set up your vault with health goals
/genome-onboard

# 4. Start the app
uvicorn backend.app.main:app --port 8000 &
cd frontend && npm run dev

# 5. Open http://localhost:5173

Web Application

A full-stack web interface for exploring your genome data interactively.

Views

View	Description
SNP Browser	Paginated, filterable table of 3.4M+ variants with ClinVar significance, review stars, population frequency, and GWAS effect sizes
Mental Health	GWAS-powered psychiatric genetics dashboard with gene cards and evidence panels
PGx / Drugs	Pharmacogenomic profile — metabolizer status, drug cards, interaction warnings
Addiction	Harm-reduction-oriented substance sensitivity analysis
Risk Landscape	Top mortality causes overlaid with personal genetic risk factors
Import	Browser upload for raw genome files — drag-and-drop, format auto-detection preview, and import history (no CLI required)
Life-Map	Life-expectancy context across the countries you've lived in (migrant Life-Map)

Import (browser upload)

Add genome data without touching the command line. The Import tab (always available, and shown automatically on first run when the database is empty) lets you:

Drag-and-drop or browse for a raw file — 23andMe / AncestryDNA .txt, MyHeritage / Genotek .csv, or VCF .vcf / .vcf.gz (including imputed).
Preview the detected format — provider, version, assembly, confidence, and an estimated variant count — before anything is written.
Set options — profile name, minimum imputation r² (VCF only), and a dry-run toggle to validate without importing.
Import and review — imported / duplicate / low-r² counts, plus an import history table.
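
The imported / duplicate / low-r² counts in the last step can be thought of as a single pass over the parsed variants. The sketch below is an assumption-laden illustration rather than the logic in scripts/lib/importer.py: `summarize_import`, its `(rsid, r2)` input shape, and the default threshold of 0.3 are all hypothetical.

```python
def summarize_import(variants, existing_rsids, min_r2=0.3):
    """Count imported / duplicate / low-r2 variants without touching a database.

    `variants` is an iterable of (rsid, r2) pairs; r2 is None for directly
    genotyped calls that carry no imputation quality score.
    The 0.3 default threshold is an assumption for this example.
    """
    seen = set(existing_rsids)
    counts = {"imported": 0, "duplicate": 0, "low_r2": 0}
    for rsid, r2 in variants:
        if r2 is not None and r2 < min_r2:
            counts["low_r2"] += 1      # imputed call below the quality cutoff
        elif rsid in seen:
            counts["duplicate"] += 1   # already stored or seen earlier in this file
        else:
            seen.add(rsid)
            counts["imported"] += 1
    return counts
```

Because the function only counts, the same pass could back a dry run: nothing is written unless the caller acts on the result.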

Files are streamed to a temporary file (200 MB cap, .zip rejected), processed in a threadpool so the server stays responsive, then deleted. The browser flow and the /genome-import CLI share one code path (scripts/lib/importer.py), so results are identical.
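
A minimal sketch of the streaming step, assuming the upload arrives as an iterable of byte chunks. The function name `stream_to_tempfile` is hypothetical, and reading the 200 MB cap as 200 × 1024 × 1024 bytes is an assumption; the backend may implement this differently.

```python
import os
import tempfile

MAX_BYTES = 200 * 1024 * 1024  # 200 MB cap; the exact byte count is an assumption


def stream_to_tempfile(filename, chunks, max_bytes=MAX_BYTES):
    """Write upload chunks to a temp file, enforcing the size cap as data arrives."""
    if filename.lower().endswith(".zip"):
        raise ValueError("zip archives are rejected")
    written = 0
    tmp = tempfile.NamedTemporaryFile(delete=False)
    try:
        for chunk in chunks:
            written += len(chunk)
            if written > max_bytes:
                raise ValueError("upload exceeds size cap")
            tmp.write(chunk)
    except Exception:
        # Clean up the partial file before re-raising.
        tmp.close()
        os.unlink(tmp.name)
        raise
    tmp.close()
    return tmp.name  # the caller deletes this path after processing
```

Checking the running total per chunk means an oversized upload is rejected before the whole file lands on disk.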

REST endpoints: POST /api/import/detect, POST /api/import/upload, GET /api/import/history.

Ask AI (Cmd+K)

AI chat powered by Claude Agent SDK with 11 MCP tools for querying your genome:

Personalized starter prompts — context-aware suggestions based on current view and your data (e.g., "Which drugs should I discuss with my doctor?" when on PGx tab with CYP2D6 poor metabolizer status)
Vault integration — reads your Obsidian gene notes, systems, phenotypes, and protocols
Interactive responses — clickable gene names, wikilinks, action buttons (add to checklist, show gene, show variant)
Voice mode — dictation input + TTS output with gene name spelling
Suggested actions — AI can add items to your checklist, filter the SNP table, or open relevant links
Collapsible sidebar mode — when the AI filters the SNP table, the chat palette auto-collapses to a right sidebar so you can see the table updating in real time. Toggle with Cmd+\, expand back with Cmd+K

Running

# Backend (FastAPI)
uvicorn backend.app.main:app --port 8000

# Frontend (Vite + React)
cd frontend && npm run dev

# Open http://localhost:5173

Tech Stack

Backend: FastAPI, aiosqlite, Claude Agent SDK, SOPS-encrypted secrets
Frontend: React 18, TypeScript, Vite, Vitest
TTS: Multi-provider (Groq Orpheus, ElevenLabs, Deepgram) with browser fallback
Data: SQLite (genome.db), Obsidian vault, GWAS configs (PGC), PGx configs (CPIC)

Skills

Skill	Trigger	Purpose
genome-import	`/genome-import`	Import raw data, prepare imputation, import imputed VCFs
genome-onboard	`/genome-onboard`	Goal-driven vault setup. `--quick` (4 questions) or `--full` (22 questions with GAD-7/PHQ-2/PSS-4)
genome-create	`/new-gene X`	Create gene/system/phenotype notes from SQLite data
genome-analytics	`/genome-analytics`	PRS, enrichment, vault audit, PubMed monitoring
genome-report	`/biomarker`, `/wallet-card`	Lab import, Wallet Card, PGx Card, Prescriber Summary
genome-query	`/genome-query`	SQL-like vault queries (filter, sort, group, stats, schema)
genome-validate	`/genome-validate`	Multi-agent fact-checking (Codex + NotebookLM + PubMed)

Supported Providers

Provider	Format	Detected By
23andMe (v2-v5)	TSV	"23andMe" in file header
Genotek (Генотек)	TSV	"Genotek" in file header
AncestryDNA	TSV (5 cols)	allele1/allele2 column pattern
MyHeritage	CSV	"RSID,CHROMOSOME,POSITION,RESULT" header
Nebula Genomics	VCF	"source=Nebula" in VCF header
Generic VCF	VCF	`##fileformat=VCF` header
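
The detection rules in the table can be sketched as a small header sniffer. This is an illustration under assumptions, not the code in scripts/lib/providers/: the function name `detect_provider`, the returned labels, and the order of checks are hypothetical.

```python
def detect_provider(header_lines):
    """Return a provider label from the first lines of a raw genome file."""
    lines = [line.strip() for line in header_lines]
    text = "\n".join(lines)
    # VCF first: the Nebula check must come before the generic VCF fallback.
    if any(line.startswith("##fileformat=VCF") for line in lines):
        return "nebula" if "source=Nebula" in text else "vcf"
    if "23andMe" in text:
        return "23andme"
    if "Genotek" in text:
        return "genotek"
    # MyHeritage CSV headers may be quoted, so strip quotes before comparing.
    if any(
        line.upper().replace('"', "").startswith("RSID,CHROMOSOME,POSITION,RESULT")
        for line in lines
    ):
        return "myheritage"
    if any("allele1" in line.lower() and "allele2" in line.lower() for line in lines):
        return "ancestrydna"
    return None  # unknown format
```

Passing only the first few dozen lines of a file is enough, since every signature in the table lives in the header.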

Evidence System

Every claim in the vault has an evidence tier:

E1 (clinical-grade): CPIC/DPWG guidelines, multiple studies — act on these
E2 (well-replicated): Multiple GWAS, OR > 1.5 — likely reliable
E3 (supported): 2-5 studies, plausible mechanism — interpret cautiously
E4 (suggestive): Single study — hypothesis, not diagnosis
E5 (speculative): Preliminary or N=1 — for curiosity only

Multi-Agent Validation

Genome Toolkit uses multiple AI agents to validate claims:

Claude Code — primary agent for note creation and analysis
Codex CLI (gpt-5-codex) — cross-model validation of evidence tiers and drug interactions
NotebookLM — source-grounded fact-checking of prescriber documents
PubMed subagents — literature verification and retraction monitoring
Tavily/Firecrawl — web search for recent publications and safety alerts

Prescriber-facing reports require 2 agents to agree before publishing (configurable in config/agents.yaml).
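
One way to picture the agreement rule is a simple vote over per-agent verdicts. The sketch below assumes each agent returns a single verdict string; `reach_consensus` and its tie handling are hypothetical and not taken from scripts/lib/multi_agent.py.

```python
from collections import Counter


def reach_consensus(verdicts, required=2):
    """Return the verdict at least `required` agents agree on, else None.

    `verdicts` maps an agent name to its verdict string. A tie between the
    two most common verdicts is treated as no consensus (an assumption).
    """
    if not verdicts:
        return None
    ranked = Counter(verdicts.values()).most_common(2)
    top, votes = ranked[0]
    tied = len(ranked) > 1 and ranked[1][1] == votes
    return top if votes >= required and not tied else None
```

With `required=2`, two agents agreeing on "E1" against one dissenting "E2" yields "E1", while a one-to-one split yields `None` and the report would stay unpublished.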

Validation Modes

Full audit (/genome-validate) — multi-agent sweep of the entire vault
Gene fact-check (/genome-validate gene COMT) — verify genotypes vs SQLite, check claims via web, validate evidence tiers
Protocol/report fact-check (/genome-validate protocol "Sertraline Optimization") — verify gene-recommendation links, supplement safety, evidence tiers

Vault Query

Query any frontmatter field with SQL-like syntax from the CLI:

python3 scripts/vault_query.py "type=gene AND evidence_tier=E1" --fields gene_symbol,full_name
python3 scripts/vault_query.py "type=system" --fields system_name,coverage --sort coverage --desc
python3 scripts/vault_query.py --stats

Supports: =, !=, ~ (contains), >, <, >=, <=, AND/OR/NOT logic, --sort, --group, --json, --count, --stats, --schema. See skills/genome-query/SKILL.md for full reference.
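
To make the query semantics concrete, here is a deliberately small evaluator for a subset of the syntax: AND-only clauses using =, != and ~. It is a sketch, not the parser in scripts/vault_query.py; the `matches` function and its treatment of missing fields as empty strings are assumptions.

```python
import re

# field, operator, value; "!=" is listed before "=" so it matches first.
_CONDITION = re.compile(r"^\s*(\w+)\s*(!=|=|~)\s*(.+?)\s*$")


def matches(frontmatter, query):
    """Evaluate an AND-only query such as 'type=gene AND evidence_tier=E1'."""
    for clause in re.split(r"\s+AND\s+", query):
        m = _CONDITION.match(clause)
        if m is None:
            raise ValueError(f"cannot parse clause: {clause!r}")
        field, op, expected = m.groups()
        actual = str(frontmatter.get(field, ""))  # missing field acts as ""
        if op == "=" and actual != expected:
            return False
        if op == "!=" and actual == expected:
            return False
        if op == "~" and expected not in actual:
            return False
    return True
```

Filtering a list of parsed frontmatter dicts then reduces to `[n for n in notes if matches(n, query)]`.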

Health Triage System

Interactive triage for vault action items, built with a domain-driven design (DDD) architecture:

# CLI
cd genome-toolkit && PYTHONPATH=. python -m genome_toolkit.triage --vault ~/genome-vault --classify

# TUI (Textual terminal UI)
genome-triage

Features:

Score and bucket action items (DO_NOW / SCHEDULE / DELEGATE / DROP)
SVG renderings: dashboard, score cards, visit reports
Session persistence (SQLite) with approval/deferral history
Suggestion engine based on assessment scores + genetic signals
Tests across domain, application, infrastructure, presentation layers
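
The four buckets resemble an urgency/importance matrix, so the scoring step can be illustrated that way. This is a swapped-in technique for the example, not the toolkit's actual scoring: `bucket_item`, the two input scores, and the 0.5 threshold are all hypothetical.

```python
from enum import Enum


class Bucket(Enum):
    DO_NOW = "DO_NOW"
    SCHEDULE = "SCHEDULE"
    DELEGATE = "DELEGATE"
    DROP = "DROP"


def bucket_item(importance, urgency, threshold=0.5):
    """Map importance/urgency scores in [0, 1] to a triage bucket.

    The inputs and threshold are assumptions made for illustration.
    """
    important = importance >= threshold
    urgent = urgency >= threshold
    if important and urgent:
        return Bucket.DO_NOW
    if important:
        return Bucket.SCHEDULE
    if urgent:
        return Bucket.DELEGATE
    return Bucket.DROP
```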

Onboarding Modes

Quick (/genome-onboard): 4 questions → 8-12 gene notes + Wallet Card (2 min)

Full (/genome-onboard --full): 22-question interview across 4 phases (12 min):

Phase 1: Demographics, medications, diagnoses, goals
Phase 2: Sleep, exercise, caffeine, GI symptoms, pain, morning stiffness
Phase 3: GAD-7, PHQ-2, PSS-4 (validated instruments)
Phase 4: Family history, ancestry, concerns

Generates Profile Card + personalized Action Plan with assessment-weighted gene priorities.

Related Projects

evidence-check — Modular scientific claim verification (PubMed, genomics, psychiatry). Used by genome-validate for structured fact-checking.

Vault Structure

your-genome-vault/
  Dashboard.md                 # Decision-tree home page
  Question Index.md            # Search by concern, not by gene
  Genetic Determinism...md     # Epistemic guardrails
  Genes/                       # One note per gene (BDNF.md, CYP2D6.md, ...)
  Systems/                     # Biological systems (Dopamine, HPA Axis, ...)
  Phenotypes/                  # Genetics -> lived experience
  Protocols/                   # Actionable intervention protocols
  Reports/                     # Prescriber-facing documents
  Biomarkers/                  # Lab results with genetic comparison
  Research/                    # Literature reviews and findings
  Meta/                        # Dashboards and audit reports
  Guides/                      # Getting Started, Imputation Guide
  data/
    raw/                       # Your genome file (gitignored)
    genome.db                  # SQLite database (gitignored)

Imputation

Expand from ~600K SNPs to ~3-40M variants:

1. Run /genome-import to prepare for imputation (VCF export + QC)
2. Upload to the Michigan Imputation Server (free, 2-12 hours)
3. Run /genome-import again to import the imputed data

See Guides/Imputation Guide.md for full walkthrough.

Requires: bcftools, bgzip (brew install bcftools htslib)

Configuration

All configuration lives in config/:

default.yaml — paths, rate limits, cache TTL
goal_map.yaml — health goals -> systems -> genes mapping
evidence_tiers.yaml — E1-E5 definitions
provider_formats.yaml — file format detection signatures
agents.yaml — multi-agent validation pipeline

Override paths via environment variables: GENOME_VAULT_ROOT, GENOME_DB_PATH.
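
A minimal sketch of how such overrides might be resolved. The environment variable names come from the text above and the data/genome.db location from the Vault Structure section; the `resolve_paths` function and the default vault root are assumptions.

```python
import os
from pathlib import Path


def resolve_paths(env=None, default_root="~/genome-vault"):
    """Resolve vault and database paths, letting env vars override defaults.

    `default_root` is a hypothetical fallback used only for this example.
    """
    env = os.environ if env is None else env
    root = Path(env.get("GENOME_VAULT_ROOT", default_root)).expanduser()
    # Without GENOME_DB_PATH, the database is assumed to live inside the vault.
    db = Path(env.get("GENOME_DB_PATH", root / "data" / "genome.db")).expanduser()
    return root, db
```

Accepting `env` as a parameter keeps the function easy to test without mutating the real process environment.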

Secrets (SOPS + age)

API keys are stored in macOS Keychain by default (via scripts/setup.py). For team use or version-controlled secrets, the project optionally supports SOPS encryption with age:

brew install sops age

# Load encrypted secrets into current shell
source scripts/load_secrets.sh

# Edit encrypted secrets (requires age key in ~/.config/sops/age/keys.txt)
sops config/secrets.yaml

config/secrets.yaml is encrypted at rest and safe to commit — only age key holders can decrypt.

Privacy

Raw genome data stays local (gitignored, never uploaded)
SQLite database is gitignored
No data leaves your machine unless you explicitly use imputation servers or API enrichment
Imputation servers (Michigan/TOPMed) encrypt data and delete after 7 days
Reports reference rsIDs, not bulk genotype dumps

Philosophy

"Normal willpower, different hardware. Fully rewirable."

Genetics explains WHY, not what's wrong
Every gene note ends with "What Changes This" (the exit ramp)
E1 claims are reliable. E3-E5 claims are hypotheses, not diagnoses.
40-70% of outcomes are environment, behavior, and choice

Image Generation

Generate genomics-themed images via OpenAI's GPT Image 2 API with 10 curated style templates.

# List available styles
python3 scripts/generate_image.py --list-styles

# Generate with a style preset
python3 scripts/generate_image.py --style flat-kahn --size 1024x1536 "COMT enzyme pathway" out.png

# Draft mode (97% cheaper, for iteration)
python3 scripts/generate_image.py --style nordic-refined --draft "dopamine clearance diagram" out.png

# Combine styles
python3 scripts/generate_image.py --style "fritz-kahn+retro-terminal" "brain factory on VT100" out.png

# Preview composed prompt without calling API
python3 scripts/generate_image.py --style scientific --dry-run "Yerkes-Dodson curve" out.png

Style Templates

Style	Description
`arntz`	Gerd Arntz ISOTYPE pictograms — bold geometric silhouettes on black
`dark-infographic`	Schemas, arrows, bar charts on black background
`nordic-craft`	Scandinavian indie — linen texture, linocut, botanical accents
`nordic-refined`	Cleaner Nordic — rounded grotesque sans-serif, editorial polish
`scientific`	Yerkes-Dodson curves, kinetics graphs, molecular diagrams
`vintage-biological`	19th century Haeckel/Cajal engravings on aged parchment
`vintage-modern`	Vintage engravings + clean modern sans-serif typography
`fritz-kahn`	1920s Industriepalast — body as factory with tiny workers
`retro-terminal`	Each slide on a different vintage computer (Mac, C64, VT100, iMac G3)
`flat-kahn`	Ultra flat vector Fritz Kahn — no gradients, Soviet-industrial palette

Styles defined in styles.yaml. Add your own by following the existing format.

Development

Running Tests

# Backend (Python)
pip install -e ".[dev]"
python -m pytest tests/ -v

# Frontend (TypeScript)
cd frontend && npx vitest run

# Coverage report
cd frontend && npx vitest run --coverage

Project Structure

genome-toolkit/
  backend/           # FastAPI backend
    app/
      routes/        # API endpoints (snps, chat, vault, gwas, tts, starter-prompts)
      agent/         # Claude Agent SDK orchestration + MCP tools
      tts/           # Multi-provider TTS (Groq Orpheus, ElevenLabs, Deepgram)
      db/            # Async SQLite wrappers (genome.db, users.db)
  frontend/          # React + TypeScript + Vite
    src/
      components/    # UI components (common/, mental-health/, pgx/, addiction/, risk/)
      hooks/         # Data hooks (useSNPs, useChat, usePGxData, useStarterPrompts, ...)
      __tests__/     # Vitest tests
  config/            # YAML/JSON configuration (goals, evidence tiers, GWAS, PGx drugs)
  scripts/           # Python pipeline (import, vault_query, migrations)
  skills/            # Claude Code skill definitions
  vault-template/    # Obsidian vault starter
  tests/             # Python test suite (pytest)

Architecture

The toolkit follows a separation of concerns:

Data layer: SQLite with versioned migrations, multi-profile support (scripts/lib/db.py)
Import layer: Provider-agnostic parsing with auto-detection (scripts/lib/providers/)
Knowledge layer: Obsidian markdown with Dataview queries (vault-template/)
Validation layer: Multi-agent consensus pipeline (scripts/lib/multi_agent.py)
Skill layer: Claude Code skills that orchestrate everything (skills/)

Disclaimer

This toolkit is for research and educational purposes only. It is not a medical device. Genetic information should be interpreted by qualified healthcare professionals. Always consult your doctor before making medical decisions based on genetic data.

The evidence tier system (E1-E5) reflects the state of published research, not clinical recommendations. Drug interaction information is derived from CPIC/DPWG guidelines but may not reflect the most current updates.

License

MIT
