Welcome! This repository contains the open-source implementation of proto-tools, a Python package that bundles a large suite of computational biology and biological AI tools behind a single, consistent Python interface. Language models, structure predictors, inverse folding, sequence analysis, gene annotation, conformational dynamics, genomic scoring, and more are available from one pip install command.
Every tool runs in its own automatically managed isolated environment, so all dependency wrangling is handled for you. In addition, proto-tools implements extensive infrastructure for features such as device management and GPU fan-out, making it easy to call tools in quick succession. You can use it as a standalone Python library, as part of the broader proto-language optimization system, or through the proto-client Python SDK for hosted access over the Proto Bio API.
Proto-tools is open source under an MIT license. Contributions are welcome!
Proto-tools requires Python 3.10+:
pip install git+https://github.com/evo-design/proto-tools.git
Note
A direct PyPI install (pip install proto-tools) will be available soon.
Note
If you are developing or contributing to this project, follow the setup instructions in CONTRIBUTING.md instead.
All persistent data (model weights and tool environments) is cached under the PROTO_HOME directory on first use (defaults to ~/.proto/).
To customize the storage location, you can specify a path via the following environment variable:
# Add to your shell profile:
export PROTO_HOME=/path/to/your/proto_home
For shared filesystems, model weights can be reused to avoid downloading duplicate copies. The PROTO_MODEL_CACHE environment variable lets you point just the weights at that shared location (sharing tool environments is not recommended):
export PROTO_MODEL_CACHE=/path/to/shared/weights
See notes/storage.md for all details and options.
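The lookup described above can be sketched in a few lines of Python. This is a minimal illustration, not the library's own resolution code: the helper names `resolve_proto_home` and `resolve_model_cache` are hypothetical, and treating an empty variable as unset is an assumption. Only the variable names and the ~/.proto/ default come from this README.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_proto_home(env: Mapping[str, str] = os.environ) -> Path:
    """Return PROTO_HOME if set, otherwise the documented default ~/.proto/."""
    # Hypothetical helper: an empty value is treated like an unset one (assumption).
    value = env.get("PROTO_HOME")
    return Path(value).expanduser() if value else Path.home() / ".proto"


def resolve_model_cache(env: Mapping[str, str] = os.environ) -> Optional[Path]:
    """Return the shared weights directory, or None if weights stay under PROTO_HOME."""
    # Hypothetical helper: None means no separate weights location was requested.
    value = env.get("PROTO_MODEL_CACHE")
    return Path(value).expanduser() if value else None


if __name__ == "__main__":
    print("PROTO_HOME:", resolve_proto_home())
    print("PROTO_MODEL_CACHE:", resolve_model_cache() or "(not set)")
```

Passing the environment in as a mapping keeps the sketch easy to check without touching your real shell configuration.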
A few tools use gated models or software that require accepting a license / terms-of-use first (e.g. ESM3, AlphaGenome, AlphaFold3, X3DNA). See notes/gated-models.md for the full list and per-model access steps.
Tip
You're all set up! To learn what features are available in the library, check out the guides — four short notebooks covering tool environments, persistent execution, device management, and parallel multi-GPU runs.
binder_design/            # De novo antibody / binder design pipelines
├── bindcraft/
├── freebindcraft/
└── germinal/
causal_models/            # Autoregressive sequence models
├── evo1/
├── evo2/
├── progen2/
└── progen3/
database_retrieval/       # Sequence and structure database access
├── alphafold_db/
├── alphamissense_db/
├── ccd_lookup/
├── ensembl/
├── interproscan/
├── ncbi/
├── pdb/
├── pubchem/
├── sequence_fetch/
└── uniprot/
gene_annotation/          # Sequence annotation
├── crispr_tracr_rna/
├── meme/
├── minced/
├── miranda/
├── promoter_calculator/
└── pyhmmer/
inverse_folding/          # Sequence design from structures
├── esm_if1/
├── fampnn/
├── ligandmpnn/
└── proteinmpnn/
masked_models/            # Masked language models
├── ablang/
├── esm2/
├── esm3/
└── esmc/
mutagenesis/              # Random sequence mutagenesis
├── random_nucleotide/
└── random_protein/
orf_prediction/           # Open reading frame detection
├── orfipy/
└── prodigal/
rna_splicing/             # RNA splice site prediction
├── pangolin/
├── splice_transformer/
└── spliceai/
sequence_alignment/       # Sequence search and multiple sequence alignment
├── blast/
├── mafft/
└── mmseqs2/
sequence_scoring/         # Genomic and regulatory scoring
├── alphagenome/
├── borzoi/
├── deeppbs_specificity/
├── enformer/
├── malinois/
├── na_mpnn_specificity/
├── puffin/
└── segmasker/
structure_alignment/      # Structure comparison
├── foldmason/
├── foldseek/
├── pymol_rmsd/
├── tmalign/
└── usalign/
structure_design/         # De novo structure generation
└── rfdiffusion3/
structure_dynamics/       # Conformational dynamics
└── bioemu/
structure_prediction/     # 3D structure prediction
├── alphafold2/
├── alphafold3/
├── boltz2/
├── chai1/
├── esmfold/
├── esmfold2/
├── protenix/
├── rf3/
├── viennarna/
└── x3dna/
structure_scoring/        # Structure quality scoring
├── dssp/
├── ipsae/
├── pdockq2/
├── pyrosetta/
└── structure_metrics/
Runnable walkthroughs of the core framework features live in guides/ and are also available on our docs page:
- Tool Environments: how isolated environments are built and cached on first call
- Tool Persistence: keep models warm across calls
- Device Management: GPU allocation, LRU eviction, CPU offload
- Parallel Execution: fan out work across every GPU with ToolPool
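For readers unfamiliar with the term, least-recently-used (LRU) eviction with CPU offload can be illustrated with a toy sketch. This is not proto-tools' device manager: the class `LruDeviceSlots`, its methods, and the fixed slot count are hypothetical, and models are represented by plain strings rather than real weights. See the Device Management guide for how the library actually behaves.

```python
from collections import OrderedDict
from typing import List


class LruDeviceSlots:
    """Toy model of a GPU with a fixed number of model slots (hypothetical)."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._on_gpu: "OrderedDict[str, None]" = OrderedDict()  # oldest use first
        self.offloaded: List[str] = []  # names moved to CPU, in eviction order

    def use(self, model: str) -> None:
        """Mark `model` as just used, evicting the least recently used if full."""
        if model in self._on_gpu:
            self._on_gpu.move_to_end(model)  # refresh recency
            return
        if len(self._on_gpu) >= self.capacity:
            evicted, _ = self._on_gpu.popitem(last=False)  # drop the oldest entry
            self.offloaded.append(evicted)
        if model in self.offloaded:
            self.offloaded.remove(model)  # model is coming back from CPU
        self._on_gpu[model] = None

    @property
    def resident(self) -> List[str]:
        """Models currently on the GPU, least recently used first."""
        return list(self._on_gpu)


slots = LruDeviceSlots(capacity=2)
for name in ["esm2", "esmfold", "esm2", "proteinmpnn"]:
    slots.use(name)
print(slots.resident)   # ['esm2', 'proteinmpnn']
print(slots.offloaded)  # ['esmfold']
```

Here esmfold is evicted rather than esm2 because esm2 was touched again before the third model arrived.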
Each tool also ships a minimal example notebook at proto_tools/tools/{category}/{tool}/examples/example.ipynb.
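If you want to locate a tool's notebook programmatically, the path template above can be filled in as follows. The helper `example_notebook_path` is hypothetical and only formats the repo-relative path; it does not check that the file exists.

```python
from pathlib import PurePosixPath


def example_notebook_path(category: str, tool: str) -> PurePosixPath:
    """Build the repo-relative path to a tool's example notebook."""
    # Hypothetical helper; only the path template comes from this README.
    return PurePosixPath("proto_tools", "tools", category, tool, "examples", "example.ipynb")


print(example_notebook_path("structure_prediction", "esmfold"))
# proto_tools/tools/structure_prediction/esmfold/examples/example.ipynb
```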
Run tools through natural language with any coding agent (Claude Code, Gemini CLI, OpenAI Codex CLI, etc.). Point the agent at proto-tools agent-context: it prints a primer covering the Input → Config → run_*() → Output pattern, the offline CLI discovery verbs, persistence and parallel execution, and links to the long-form notes on GitHub. The command ships in the wheel, so it works on a plain pip install with no repo checkout.
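If your agent harness is driven from Python, one way to hand it the primer is to capture the command's output. The sketch below assumes the primer is written to standard output (the text above only says it is printed); the function name `read_agent_context` is hypothetical. It returns None instead of failing when the CLI is not on PATH.

```python
import shutil
import subprocess
from typing import Optional


def read_agent_context() -> Optional[str]:
    """Return the text printed by `proto-tools agent-context`, or None if the CLI is missing."""
    if shutil.which("proto-tools") is None:
        return None  # proto-tools is not installed in this environment
    result = subprocess.run(
        ["proto-tools", "agent-context"],
        capture_output=True,
        text=True,
        check=True,  # raise if the command exits with a non-zero status
    )
    return result.stdout  # assumption: the primer goes to stdout


primer = read_agent_context()
print(primer if primer is not None else "proto-tools not found on PATH")
```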
If you've cloned the repo for contributing, agents also pick up CLAUDE.md (symlinked as AGENTS.md/GEMINI.md) and the task-specific guides in .claude/skills/ automatically.
See CONTRIBUTING.md for full developer setup, storage configuration, PR format, code style, and testing conventions.
If you use Proto in your research, please cite our preprint:
Merchant AT, Guo D, Viggiano B, Brennan-Almaraz LE, Hur E, Mai T, Yin P, King SH, Ashley E, Hie BL. A high-level programming language for generative biology with Proto. bioRxiv (2026). doi: 10.64898/2026.06.22.733870
@article{Merchant2026.06.22.733870,
author = {Merchant, Aditi T and Guo, Daniel and Viggiano, Ben and Brennan-Almaraz, Lucas Emmanuel and Hur, Evelyn and Mai, Tina and Yin, Peter and King, Samuel H and Ashley, Euan and Hie, Brian L},
title = {A high-level programming language for generative biology with Proto},
elocation-id = {2026.06.22.733870},
year = {2026},
doi = {10.64898/2026.06.22.733870},
publisher = {Cold Spring Harbor Laboratory},
URL = {https://www.biorxiv.org/content/10.64898/2026.06.22.733870},
journal = {bioRxiv}
}