Genome Toolkit

Turn your raw genetic data into an actionable, evidence-based personal health vault.

Genome Toolkit is a set of Claude Code skills and Python scripts that import DTC genome data (23andMe, AncestryDNA, MyHeritage, Genotek, Nebula, or any VCF), build a structured Obsidian knowledge vault, and generate clinically useful outputs like drug safety cards and prescriber summaries.

What It Does

  1. Import raw genome data from any major DTC provider (auto-detected)
  2. Ask about your health goals (medication safety, mental health, gut, liver, sleep...)
  3. Generate personalized gene notes, system maps, and clinical reports
  4. Validate claims using a multi-agent AI pipeline (Codex, NotebookLM, PubMed)
  5. Track biomarkers and compare lab results against genetic predictions
  6. Expand your data via guided imputation (600K -> 3-40M variants)
  7. Map your life-expectancy context across the countries you've lived in (migrant Life-Map)

v0.4.0 — Full-stack web app with 7 views (SNP Browser, Mental Health, PGx, Addiction, Risk Landscape, in-browser Import, and Life-Map), AI chat with collapsible sidebar, multi-provider TTS (Groq Orpheus/ElevenLabs/Deepgram), chat history, configurable nav, agent-friendly setup, and 6 supported genome providers. Hardened by a scientific-honesty audit — outputs are framed as evidence-tiered context, not clinical prediction (no fabricated risk scores; uncalibrated GWAS tallies labelled as such; harm-reduction content kept but never presented as genotype-backed dosing).

Quick Start

Prerequisites

Setup

# Clone the toolkit
git clone https://github.com/glebis/genome-toolkit.git
cd genome-toolkit

# Install Python dependencies
pip install -e ".[web]"

# Interactive setup (API keys, vault path, TTS, visible sections)
python scripts/setup.py

Setup with Claude Code (or any AI agent)

The setup script supports a fully non-interactive --auto mode that reads API keys from environment variables and accepts all options as CLI flags:

# Set API keys as env vars (or store in macOS Keychain)
export ANTHROPIC_API_KEY=sk-ant-...
export GROQ_API_KEY=gsk_...

# Run non-interactive setup
python scripts/setup.py --auto \
  --vault ~/my-genome-vault \
  --tts-provider orpheus \
  --tts-voice leo \
  --population EUR

# Optionally hide nav sections
python scripts/setup.py --auto --hide-views addiction risk

# Re-enable a hidden section
python scripts/setup.py --auto --show-views addiction

All flags are optional; omitted values fall back to the existing config or to sensible defaults. An AI agent can therefore run python scripts/setup.py --auto with no other arguments and get a working config.
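
The fallback order described above (CLI flag, then existing config, then default) can be sketched as follows. This is a minimal illustration, not the code in scripts/setup.py: the function name resolve_config and the values in DEFAULTS are assumptions, and only three of the flags are modelled.

```python
import argparse

# Hypothetical defaults for illustration; the real defaults live in the toolkit.
DEFAULTS = {"vault": "~/genome-vault", "tts_provider": "orpheus", "population": "EUR"}


def resolve_config(argv, existing):
    """Merge settings with precedence: CLI flag > existing config > default."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--auto", action="store_true")
    parser.add_argument("--vault")
    parser.add_argument("--tts-provider", dest="tts_provider")
    parser.add_argument("--population")
    args = parser.parse_args(argv)

    resolved = {}
    for key, default in DEFAULTS.items():
        cli_value = getattr(args, key)
        if cli_value is not None:
            resolved[key] = cli_value        # explicit flag wins
        elif key in existing:
            resolved[key] = existing[key]    # keep what was configured before
        else:
            resolved[key] = default          # nothing known yet: use the default
    return resolved


if __name__ == "__main__":
    # Zero extra arguments still yields a complete config.
    print(resolve_config(["--auto"], existing={}))
```

Because every key is resolved in the same loop, running with only --auto never leaves a setting undefined.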

First Run

# 1. Place your raw genome file in the vault
cp ~/Downloads/23andme_raw.txt $GENOME_VAULT_ROOT/data/raw/

# 2. Import your data (in Claude Code)
/genome-import

# 3. Set up your vault with health goals
/genome-onboard

# 4. Start the app
uvicorn backend.app.main:app --port 8000 &
cd frontend && npm run dev

# 5. Open http://localhost:5173

Web Application

A full-stack web interface for exploring your genome data interactively.

Views

| View | Description |
| --- | --- |
| SNP Browser | Paginated, filterable table of 3.4M+ variants with ClinVar significance, review stars, population frequency, and GWAS effect sizes |
| Mental Health | GWAS-powered psychiatric genetics dashboard with gene cards and evidence panels |
| PGx / Drugs | Pharmacogenomic profile: metabolizer status, drug cards, interaction warnings |
| Addiction | Harm-reduction-oriented substance sensitivity analysis |
| Risk Landscape | Top mortality causes overlaid with personal genetic risk factors |
| Import | Browser upload for raw genome files: drag-and-drop, format auto-detection preview, and import history (no CLI required) |

Import (browser upload)

Add genome data without touching the command line. The Import tab (always available, and shown automatically on first run when the database is empty) lets you:

  1. Drag-and-drop or browse for a raw file — 23andMe / AncestryDNA .txt, MyHeritage / Genotek .csv, or VCF .vcf / .vcf.gz (including imputed).
  2. Preview the detected format — provider, version, assembly, confidence, and an estimated variant count — before anything is written.
  3. Set options — profile name, minimum imputation r² (VCF only), and a dry-run toggle to validate without importing.
  4. Import and review — imported / duplicate / low-r² counts, plus an import history table.
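
The three counts reported in step 4 can be pictured with a small sketch. It is an assumption-laden illustration rather than the logic in scripts/lib/importer.py: the function name summarize_import, the (rsid, r2) record shape, and the 0.3 default threshold are all hypothetical.

```python
def summarize_import(records, min_r2=0.3, existing=()):
    """Classify (rsid, r2) records as imported, duplicate, or low-r2.

    r2 is None for directly genotyped variants, which are never filtered
    by the imputation-quality threshold.
    """
    seen = set(existing)  # rsIDs already present in the database
    counts = {"imported": 0, "duplicate": 0, "low_r2": 0}
    for rsid, r2 in records:
        if r2 is not None and r2 < min_r2:
            counts["low_r2"] += 1       # imputed with too little confidence
        elif rsid in seen:
            counts["duplicate"] += 1    # already stored, skip
        else:
            seen.add(rsid)
            counts["imported"] += 1
    return counts


if __name__ == "__main__":
    sample = [("rs1", 0.9), ("rs2", 0.1), ("rs1", 0.95), ("rs3", None)]
    print(summarize_import(sample))
```

A dry run would compute the same counts and simply skip the write.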

Files are streamed to a temporary file (200 MB cap, .zip rejected), processed in a threadpool so the server stays responsive, then deleted. The browser flow and the /genome-import CLI share one code path (scripts/lib/importer.py), so results are identical.
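
A size-capped streaming upload of the kind described above might look like the sketch below. The 200 MB cap and the .zip rejection come from the text; the names stream_to_temp and UploadRejected and the 1 MB chunk size are assumptions, and the real backend handler may be structured differently.

```python
import io
import os
import tempfile

MAX_BYTES = 200 * 1024 * 1024  # 200 MB cap, as stated above
CHUNK_SIZE = 1024 * 1024       # hypothetical chunk size


class UploadRejected(Exception):
    """Raised when an upload is refused before or during streaming."""


def stream_to_temp(stream, filename, max_bytes=MAX_BYTES):
    """Copy a binary stream to a temp file in chunks, enforcing a size cap."""
    if filename.lower().endswith(".zip"):
        raise UploadRejected(".zip archives are not accepted")
    written = 0
    tmp = tempfile.NamedTemporaryFile(delete=False)
    try:
        while True:
            chunk = stream.read(CHUNK_SIZE)
            if not chunk:
                break
            written += len(chunk)
            if written > max_bytes:
                raise UploadRejected("file exceeds the size cap")
            tmp.write(chunk)
    except Exception:
        # Never leave a partial file behind.
        tmp.close()
        os.unlink(tmp.name)
        raise
    tmp.close()
    return tmp.name  # caller processes the file, then deletes it


if __name__ == "__main__":
    path = stream_to_temp(io.BytesIO(b"rsid\tchromosome\n"), "sample.txt")
    print(path)
    os.unlink(path)
```

Checking the cap while streaming means an oversized file is rejected without ever being held fully in memory.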

REST endpoints: POST /api/import/detect, POST /api/import/upload, GET /api/import/history.

Ask AI (Cmd+K)

AI chat powered by the Claude Agent SDK with 11 MCP tools for querying your genome:

  • Personalized starter prompts — context-aware suggestions based on current view and your data (e.g., "Which drugs should I discuss with my doctor?" when on PGx tab with CYP2D6 poor metabolizer status)
  • Vault integration — reads your Obsidian gene notes, systems, phenotypes, and protocols
  • Interactive responses — clickable gene names, wikilinks, action buttons (add to checklist, show gene, show variant)
  • Voice mode — dictation input + TTS output with gene name spelling
  • Suggested actions — AI can add items to your checklist, filter the SNP table, or open relevant links
  • Collapsible sidebar mode — when the AI filters the SNP table, the chat palette auto-collapses to a right sidebar so you can see the table updating in real time. Toggle with Cmd+\, expand back with Cmd+K

Running

# Backend (FastAPI)
uvicorn backend.app.main:app --port 8000

# Frontend (Vite + React)
cd frontend && npm run dev

# Open http://localhost:5173

Tech Stack

  • Backend: FastAPI, aiosqlite, Claude Agent SDK, SOPS-encrypted secrets
  • Frontend: React 18, TypeScript, Vite, Vitest
  • TTS: Multi-provider (Groq Orpheus, ElevenLabs, Deepgram) with browser fallback
  • Data: SQLite (genome.db), Obsidian vault, GWAS configs (PGC), PGx configs (CPIC)

Skills

| Skill | Trigger | Purpose |
| --- | --- | --- |
| genome-import | /genome-import | Import raw data, prepare imputation, import imputed VCFs |
| genome-onboard | /genome-onboard | Goal-driven vault setup. --quick (4 questions) or --full (22 questions with GAD-7/PHQ-2/PSS-4) |
| genome-create | /new-gene X | Create gene/system/phenotype notes from SQLite data |
| genome-analytics | /genome-analytics | PRS, enrichment, vault audit, PubMed monitoring |
| genome-report | /biomarker, /wallet-card | Lab import, Wallet Card, PGx Card, Prescriber Summary |
| genome-query | /genome-query | SQL-like vault queries (filter, sort, group, stats, schema) |
| genome-validate | /genome-validate | Multi-agent fact-checking (Codex + NotebookLM + PubMed) |

Supported Providers

| Provider | Format | Detected By |
| --- | --- | --- |
| 23andMe (v2-v5) | TSV | "23andMe" in file header |
| Genotek (Генотек) | TSV | "Genotek" in file header |
| AncestryDNA | TSV (5 cols) | allele1/allele2 column pattern |
| MyHeritage | CSV | "RSID,CHROMOSOME,POSITION,RESULT" header |
| Nebula Genomics | VCF | "source=Nebula" in VCF header |
| Generic VCF | VCF | ##fileformat=VCF header |
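
The detection signatures listed above can be turned into a small header sniffer. This is a sketch under assumptions: the function name detect_provider and the order of checks are hypothetical, and the toolkit's actual signatures are defined in config/provider_formats.yaml.

```python
def detect_provider(header_lines):
    """Guess the provider from the first lines of a raw genome file.

    Returns a provider name, or None when no signature matches.
    """
    joined = "\n".join(header_lines)
    lowered = joined.lower()

    # VCF first: a Nebula file is also a valid generic VCF.
    if any(line.startswith("##fileformat=VCF") for line in header_lines):
        return "Nebula Genomics" if "source=Nebula" in joined else "Generic VCF"
    if "RSID,CHROMOSOME,POSITION,RESULT" in joined:
        return "MyHeritage"
    if "23andMe" in joined:
        return "23andMe"
    if "Genotek" in joined:
        return "Genotek"
    # AncestryDNA splits the genotype into two allele columns.
    if "allele1" in lowered and "allele2" in lowered:
        return "AncestryDNA"
    return None


if __name__ == "__main__":
    print(detect_provider(["##fileformat=VCFv4.2", "##source=Nebula"]))
    print(detect_provider(["rsid\tchromosome\tposition\tallele1\tallele2"]))
```

Checking the most specific signature before the generic one keeps a Nebula export from being reported as a plain VCF.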

Evidence System

Every claim in the vault has an evidence tier:

  • E1 (clinical-grade): CPIC/DPWG guidelines, multiple studies — act on these
  • E2 (well-replicated): Multiple GWAS, OR > 1.5 — likely reliable
  • E3 (supported): 2-5 studies, plausible mechanism — interpret cautiously
  • E4 (suggestive): Single study — hypothesis, not diagnosis
  • E5 (speculative): Preliminary or N=1 — for curiosity only

Multi-Agent Validation

Genome Toolkit uses multiple AI agents to validate claims:

  • Claude Code — primary agent for note creation and analysis
  • Codex CLI (gpt-5-codex) — cross-model validation of evidence tiers and drug interactions
  • NotebookLM — source-grounded fact-checking of prescriber documents
  • PubMed subagents — literature verification and retraction monitoring
  • Tavily/Firecrawl — web search for recent publications and safety alerts

Prescriber-facing reports require 2 agents to agree before publishing (configurable in config/agents.yaml).
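
The agreement rule can be expressed in a few lines. The sketch below assumes each agent returns a simple approve/reject verdict for a claim; the function name can_publish and the verdict shape are hypothetical, and the real pipeline in scripts/lib/multi_agent.py may weigh agents differently.

```python
def can_publish(verdicts, required=2):
    """Decide whether enough agents approved a claim.

    verdicts maps an agent name to True (approved) or False (rejected).
    Returns (ok, agreeing) where agreeing is the sorted list of approvers.
    """
    agreeing = sorted(agent for agent, approved in verdicts.items() if approved)
    return len(agreeing) >= required, agreeing


if __name__ == "__main__":
    ok, who = can_publish({"claude": True, "codex": True, "notebooklm": False})
    print(ok, who)
```

Returning the list of approvers alongside the decision makes it easy to record which agents backed a published claim.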

Validation Modes

  • Full audit (/genome-validate) — multi-agent sweep of the entire vault
  • Gene fact-check (/genome-validate gene COMT) — verify genotypes vs SQLite, check claims via web, validate evidence tiers
  • Protocol/report fact-check (/genome-validate protocol "Sertraline Optimization") — verify gene-recommendation links, supplement safety, evidence tiers

Vault Query

Query any frontmatter field with SQL-like syntax from the CLI:

python3 scripts/vault_query.py "type=gene AND evidence_tier=E1" --fields gene_symbol,full_name
python3 scripts/vault_query.py "type=system" --fields system_name,coverage --sort coverage --desc
python3 scripts/vault_query.py --stats

Supports: =, !=, ~ (contains), >, <, >=, <=, AND/OR/NOT logic, --sort, --group, --json, --count, --stats, --schema. See skills/genome-query/SKILL.md for full reference.
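
To make the operator semantics concrete, here is a minimal matcher for a subset of the syntax: the seven comparison operators joined by AND. It is an illustrative sketch, not scripts/vault_query.py; the function name matches, the dict-based note representation, and the treatment of missing fields as non-matching are assumptions, and OR/NOT are left out.

```python
import operator
import re

OPS = {
    "=": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
    "~": lambda a, b: str(b) in str(a),  # substring "contains"
}
# Two-character operators are listed first so ">=" is not read as ">".
TERM = re.compile(r"^\s*(\w+)\s*(>=|<=|!=|=|~|>|<)\s*(.+?)\s*$")


def _coerce(a, b):
    """Compare numerically when both sides are numbers, else as strings."""
    try:
        return float(a), float(b)
    except (TypeError, ValueError):
        return str(a), str(b)


def matches(note, query):
    """Return True when a note's frontmatter satisfies every AND-joined term."""
    for term in query.split(" AND "):
        m = TERM.match(term)
        if m is None:
            raise ValueError(f"cannot parse term: {term!r}")
        field, op, value = m.groups()
        if field not in note:
            return False  # assumption: a missing field never matches
        if op == "~":
            left, right = note[field], value
        else:
            left, right = _coerce(note[field], value)
        if not OPS[op](left, right):
            return False
    return True


if __name__ == "__main__":
    note = {"type": "gene", "evidence_tier": "E1", "gene_symbol": "CYP2D6"}
    print(matches(note, "type=gene AND evidence_tier=E1"))
```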

Health Triage System

Interactive triage for vault action items, built with a domain-driven design (DDD) architecture:

# CLI
cd genome-toolkit && PYTHONPATH=. python -m genome_toolkit.triage --vault ~/genome-vault --classify

# TUI (Textual terminal UI)
genome-triage

Features:

  • Score and bucket action items (DO_NOW / SCHEDULE / DELEGATE / DROP)
  • SVG renderings: dashboard, score cards, visit reports
  • Session persistence (SQLite) with approval/deferral history
  • Suggestion engine based on assessment scores + genetic signals
  • Tests across domain, application, infrastructure, presentation layers
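
One way to picture the four buckets is an urgency/importance grid. The sketch below is purely illustrative: the ActionItem fields, the 0 to 1 score ranges, and the 0.5 threshold are assumptions, and the toolkit's own scoring may use entirely different inputs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActionItem:
    title: str
    urgency: float     # hypothetical score in [0, 1]
    importance: float  # hypothetical score in [0, 1]


def bucket(item, threshold=0.5):
    """Map an action item to one of the four triage buckets."""
    urgent = item.urgency >= threshold
    important = item.importance >= threshold
    if urgent and important:
        return "DO_NOW"
    if important:
        return "SCHEDULE"
    if urgent:
        return "DELEGATE"
    return "DROP"


if __name__ == "__main__":
    item = ActionItem("Discuss CYP2D6 result with prescriber", 0.9, 0.8)
    print(item.title, "->", bucket(item))
```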

Onboarding Modes

Quick (/genome-onboard): 4 questions → 8-12 gene notes + Wallet Card (2 min)

Full (/genome-onboard --full): 22-question interview across 4 phases (12 min):

  • Phase 1: Demographics, medications, diagnoses, goals
  • Phase 2: Sleep, exercise, caffeine, GI symptoms, pain, morning stiffness
  • Phase 3: GAD-7, PHQ-2, PSS-4 (validated instruments)
  • Phase 4: Family history, ancestry, concerns

Generates a Profile Card and a personalized Action Plan with assessment-weighted gene priorities.

Related Projects

  • evidence-check — Modular scientific claim verification (PubMed, genomics, psychiatry). Used by genome-validate for structured fact-checking.

Vault Structure

your-genome-vault/
  Dashboard.md                 # Decision-tree home page
  Question Index.md            # Search by concern, not by gene
  Genetic Determinism...md     # Epistemic guardrails
  Genes/                       # One note per gene (BDNF.md, CYP2D6.md, ...)
  Systems/                     # Biological systems (Dopamine, HPA Axis, ...)
  Phenotypes/                  # Genetics -> lived experience
  Protocols/                   # Actionable intervention protocols
  Reports/                     # Prescriber-facing documents
  Biomarkers/                  # Lab results with genetic comparison
  Research/                    # Literature reviews and findings
  Meta/                        # Dashboards and audit reports
  Guides/                      # Getting Started, Imputation Guide
  data/
    raw/                       # Your genome file (gitignored)
    genome.db                  # SQLite database (gitignored)

Imputation

Expand from ~600K SNPs to ~3-40M variants:

  1. /genome-import prepare for imputation (VCF export + QC)
  2. Upload to Michigan Imputation Server (free, 2-12 hours)
  3. /genome-import import imputed data

See Guides/Imputation Guide.md for the full walkthrough.

Requires: bcftools, bgzip (brew install bcftools htslib)

Configuration

All configuration lives in config/:

  • default.yaml — paths, rate limits, cache TTL
  • goal_map.yaml — health goals -> systems -> genes mapping
  • evidence_tiers.yaml — E1-E5 definitions
  • provider_formats.yaml — file format detection signatures
  • agents.yaml — multi-agent validation pipeline

Override paths via environment variables: GENOME_VAULT_ROOT, GENOME_DB_PATH.
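
A script that honors these overrides could resolve its paths as sketched below. The function name resolve_paths and the fallback root ~/genome-vault are assumptions; the data/genome.db location follows the vault layout shown earlier.

```python
import os
from pathlib import Path


def resolve_paths(env=None, default_root="~/genome-vault"):
    """Return (vault_root, db_path), letting environment variables win."""
    env = os.environ if env is None else env
    root = Path(env.get("GENOME_VAULT_ROOT", default_root)).expanduser()
    # Without an explicit override, the database sits inside the vault.
    db_path = Path(env.get("GENOME_DB_PATH", root / "data" / "genome.db")).expanduser()
    return root, db_path


if __name__ == "__main__":
    vault_root, db_path = resolve_paths()
    print(vault_root)
    print(db_path)
```

Passing env explicitly keeps the function easy to test without touching the real environment.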

Secrets (SOPS + age)

API keys are stored in macOS Keychain by default (via scripts/setup.py). For team use or version-controlled secrets, the project optionally supports SOPS encryption with age:

brew install sops age

# Load encrypted secrets into current shell
source scripts/load_secrets.sh

# Edit encrypted secrets (requires age key in ~/.config/sops/age/keys.txt)
sops config/secrets.yaml

config/secrets.yaml is encrypted at rest and safe to commit — only age key holders can decrypt.

Privacy

  • Raw genome data stays local (gitignored, never uploaded)
  • SQLite database is gitignored
  • No data leaves your machine unless you explicitly use imputation servers or API enrichment
  • Imputation servers (Michigan/TOPMed) encrypt data and delete after 7 days
  • Reports reference rsIDs, not bulk genotype dumps

Philosophy

"Normal willpower, different hardware. Fully rewirable."

  • Genetics explains WHY, not what's wrong
  • Every gene note ends with "What Changes This" (the exit ramp)
  • E1 claims are reliable. E3-E5 claims are hypotheses, not diagnoses.
  • 40-70% of outcomes are environment, behavior, and choice

Image Generation

Generate genomics-themed images via OpenAI's GPT Image 2 API with 10 curated style templates.

# List available styles
python3 scripts/generate_image.py --list-styles

# Generate with a style preset
python3 scripts/generate_image.py --style flat-kahn --size 1024x1536 "COMT enzyme pathway" out.png

# Draft mode (97% cheaper, for iteration)
python3 scripts/generate_image.py --style nordic-refined --draft "dopamine clearance diagram" out.png

# Combine styles
python3 scripts/generate_image.py --style "fritz-kahn+retro-terminal" "brain factory on VT100" out.png

# Preview composed prompt without calling API
python3 scripts/generate_image.py --style scientific --dry-run "Yerkes-Dodson curve" out.png

Style Templates

| Style | Description |
| --- | --- |
| arntz | Gerd Arntz ISOTYPE pictograms: bold geometric silhouettes on black |
| dark-infographic | Schemas, arrows, bar charts on black background |
| nordic-craft | Scandinavian indie: linen texture, linocut, botanical accents |
| nordic-refined | Cleaner Nordic: rounded grotesque sans-serif, editorial polish |
| scientific | Yerkes-Dodson curves, kinetics graphs, molecular diagrams |
| vintage-biological | 19th century Haeckel/Cajal engravings on aged parchment |
| vintage-modern | Vintage engravings + clean modern sans-serif typography |
| fritz-kahn | 1920s Industriepalast: body as factory with tiny workers |
| retro-terminal | Each slide on a different vintage computer (Mac, C64, VT100, iMac G3) |
| flat-kahn | Ultra flat vector Fritz Kahn: no gradients, Soviet-industrial palette |

Styles defined in styles.yaml. Add your own by following the existing format.
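
Combining styles with "+" can be thought of as looking up each named template and joining the results. The sketch below uses an in-memory dict in place of styles.yaml; the function name compose_prompt, the template texts, and the output format are hypothetical and may not match what generate_image.py produces.

```python
# Hypothetical stand-in for styles.yaml (the real schema may differ).
STYLES = {
    "fritz-kahn": "1920s Industriepalast, body as factory with tiny workers",
    "retro-terminal": "rendered on a vintage computer screen",
}


def compose_prompt(style_spec, subject, styles=STYLES):
    """Build one prompt from a subject and a '+'-separated list of styles."""
    names = [name.strip() for name in style_spec.split("+") if name.strip()]
    unknown = [name for name in names if name not in styles]
    if unknown:
        raise KeyError(f"unknown style(s): {', '.join(unknown)}")
    style_text = "; ".join(styles[name] for name in names)
    return f"{subject}. Style: {style_text}"


if __name__ == "__main__":
    print(compose_prompt("fritz-kahn+retro-terminal", "brain factory on VT100"))
```

Validating every style name before composing means a typo fails fast instead of silently producing an unstyled prompt.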

Development

Running Tests

# Backend (Python)
pip install -e ".[dev]"
python -m pytest tests/ -v

# Frontend (TypeScript)
cd frontend && npx vitest run

# Coverage report
cd frontend && npx vitest run --coverage

Project Structure

genome-toolkit/
  backend/           # FastAPI backend
    app/
      routes/        # API endpoints (snps, chat, vault, gwas, tts, starter-prompts)
      agent/         # Claude Agent SDK orchestration + MCP tools
      tts/           # Multi-provider TTS (Groq Orpheus, ElevenLabs, Deepgram)
      db/            # Async SQLite wrappers (genome.db, users.db)
  frontend/          # React + TypeScript + Vite
    src/
      components/    # UI components (common/, mental-health/, pgx/, addiction/, risk/)
      hooks/         # Data hooks (useSNPs, useChat, usePGxData, useStarterPrompts, ...)
      __tests__/     # Vitest tests
  config/            # YAML/JSON configuration (goals, evidence tiers, GWAS, PGx drugs)
  scripts/           # Python pipeline (import, vault_query, migrations)
  skills/            # Claude Code skill definitions
  vault-template/    # Obsidian vault starter
  tests/             # Python test suite (pytest)

Architecture

The toolkit follows a separation of concerns:

  • Data layer: SQLite with versioned migrations, multi-profile support (scripts/lib/db.py)
  • Import layer: Provider-agnostic parsing with auto-detection (scripts/lib/providers/)
  • Knowledge layer: Obsidian markdown with Dataview queries (vault-template/)
  • Validation layer: Multi-agent consensus pipeline (scripts/lib/multi_agent.py)
  • Skill layer: Claude Code skills that orchestrate everything (skills/)

Disclaimer

This toolkit is for research and educational purposes only. It is not a medical device. Genetic information should be interpreted by qualified healthcare professionals. Always consult your doctor before making medical decisions based on genetic data.

The evidence tier system (E1-E5) reflects the state of published research, not clinical recommendations. Drug interaction information is derived from CPIC/DPWG guidelines but may not reflect the most current updates.

License

MIT
