Proto Tools

Welcome! This repository contains the open-source implementation of proto-tools, a Python package that exposes a large suite of computational biology and biological AI tools through a single, consistent Python interface. Language models, structure predictors, inverse folding, sequence analysis, gene annotation, conformational dynamics, genomic scoring, and more all come with a single pip install command.

Every tool runs in its own automatically managed isolated environment, so all dependency wrangling is handled for you. Proto-tools also handles device management and GPU fan-out, making it easy to call tools in quick succession. You can use it as a standalone Python library, as part of the broader proto-language optimization system, or through the proto-client Python SDK for hosted access over the Proto Bio API.

Proto-tools is open source under an MIT license. Contributions are welcome!

Setup

Step 1: Install the package

Proto-tools requires Python 3.10+:

pip install git+https://github.com/evo-design/proto-tools.git
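
Before installing, it can help to confirm that the interpreter meets the Python 3.10+ requirement. The following is a minimal sketch using only the standard library; the helper name meets_minimum is hypothetical and is not part of proto-tools:

```python
import sys

MIN_PYTHON = (3, 10)  # minimum version stated in this README


def meets_minimum(version=None, minimum=MIN_PYTHON):
    """Return True if `version` (default: the running interpreter) is at least `minimum`."""
    if version is None:
        version = sys.version_info
    # Compare only (major, minor) so micro releases are ignored
    return tuple(version[:2]) >= tuple(minimum)


if __name__ == "__main__":
    status = "OK" if meets_minimum() else "too old for proto-tools"
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}: {status}")
```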

Note

A direct PyPI install (pip install proto-tools) will be available soon.

Note

If you are developing or contributing to this project, follow the setup instructions in CONTRIBUTING.md instead.

Step 2: Configure storage (optional)

All persistent data (model weights and tool environments) is cached under the PROTO_HOME directory on first use (defaults to ~/.proto/).

To customize the storage location, you can specify a path via the following environment variable:

# Add to your shell profile:
export PROTO_HOME=/path/to/your/proto_home

For shared filesystems, model weights can be reused to avoid downloading duplicate copies. The PROTO_MODEL_CACHE environment variable lets you point just the weights at that shared location (sharing tool environments is not recommended):

export PROTO_MODEL_CACHE=/path/to/shared/weights

See notes/storage.md for all details and options.
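
The storage settings above can also be inspected from Python. The sketch below only mirrors the defaults described in this section (PROTO_HOME falling back to ~/.proto/, and PROTO_MODEL_CACHE as an optional override for the weights). The helper name resolve_storage is hypothetical, treating an empty variable as unset is an assumption, and the library's own resolution logic may differ, so notes/storage.md remains the authoritative reference.

```python
import os
from pathlib import Path


def resolve_storage(env=None):
    """Return (proto_home, model_cache) as described in this README.

    model_cache is None when PROTO_MODEL_CACHE is unset, meaning the weights
    stay under PROTO_HOME in whatever layout the library chooses.
    """
    if env is None:
        env = os.environ
    # PROTO_HOME defaults to ~/.proto/ (assumption: an empty value counts as unset)
    home = Path(env.get("PROTO_HOME") or "~/.proto").expanduser()
    # PROTO_MODEL_CACHE redirects only the model weights
    cache = env.get("PROTO_MODEL_CACHE")
    return home, (Path(cache).expanduser() if cache else None)


if __name__ == "__main__":
    home, cache = resolve_storage()
    print(f"PROTO_HOME        -> {home}")
    print(f"PROTO_MODEL_CACHE -> {cache if cache else '(not set)'}")
```

Passing a plain dictionary instead of os.environ makes it easy to check a planned configuration before editing a shell profile.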

Step 3: Gated model access (optional)

A few tools use gated models or software that require accepting a license / terms-of-use first (e.g. ESM3, AlphaGenome, AlphaFold3, X3DNA). See notes/gated-models.md for the full list and per-model access steps.

Tip

You're all set up! To learn what features are available in the library, check out the guides — four short notebooks covering tool environments, persistent execution, device management, and parallel multi-GPU runs.

Available Tools

binder_design/                  # De novo antibody / binder design pipelines
├── bindcraft/
├── freebindcraft/
└── germinal/
causal_models/                  # Autoregressive sequence models
├── evo1/
├── evo2/
├── progen2/
└── progen3/
database_retrieval/             # Sequence and structure database access
├── alphafold_db/
├── alphamissense_db/
├── ccd_lookup/
├── ensembl/
├── interproscan/
├── ncbi/
├── pdb/
├── pubchem/
├── sequence_fetch/
└── uniprot/
gene_annotation/                # Sequence annotation
├── crispr_tracr_rna/
├── meme/
├── minced/
├── miranda/
├── promoter_calculator/
└── pyhmmer/
inverse_folding/                # Sequence design from structures
├── esm_if1/
├── fampnn/
├── ligandmpnn/
└── proteinmpnn/
masked_models/                  # Masked language models
├── ablang/
├── esm2/
├── esm3/
└── esmc/
mutagenesis/                    # Random sequence mutagenesis
├── random_nucleotide/
└── random_protein/
orf_prediction/                 # Open reading frame detection
├── orfipy/
└── prodigal/
rna_splicing/                   # RNA splice site prediction
├── pangolin/
├── splice_transformer/
└── spliceai/
sequence_alignment/             # Sequence search and multiple sequence alignment
├── blast/
├── mafft/
└── mmseqs2/
sequence_scoring/               # Genomic and regulatory scoring
├── alphagenome/
├── borzoi/
├── deeppbs_specificity/
├── enformer/
├── malinois/
├── na_mpnn_specificity/
├── puffin/
└── segmasker/
structure_alignment/            # Structure comparison
├── foldmason/
├── foldseek/
├── pymol_rmsd/
├── tmalign/
└── usalign/
structure_design/               # De novo structure generation
└── rfdiffusion3/
structure_dynamics/             # Conformational dynamics
└── bioemu/
structure_prediction/           # 3D structure prediction
├── alphafold2/
├── alphafold3/
├── boltz2/
├── chai1/
├── esmfold/
├── esmfold2/
├── protenix/
├── rf3/
├── viennarna/
└── x3dna/
structure_scoring/              # Structure quality scoring
├── dssp/
├── ipsae/
├── pdockq2/
├── pyrosetta/
└── structure_metrics/
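
The layout above maps each category directory to its tools, which makes it straightforward to look up where a given tool lives. The sketch below hardcodes a small excerpt of that tree purely for illustration; the TOOL_TREE dictionary and the category_of helper are hypothetical and are not part of the proto-tools API.

```python
# Excerpt of the tree above; the dictionary itself is illustrative only
TOOL_TREE = {
    "causal_models": ["evo1", "evo2", "progen2", "progen3"],
    "inverse_folding": ["esm_if1", "fampnn", "ligandmpnn", "proteinmpnn"],
    "masked_models": ["ablang", "esm2", "esm3", "esmc"],
    "structure_dynamics": ["bioemu"],
}


def category_of(tool, tree=TOOL_TREE):
    """Return the category containing `tool`, or None if it is not listed."""
    for category, tools in tree.items():
        if tool in tools:
            return category
    return None


print(category_of("esm3"))    # masked_models
print(category_of("bioemu"))  # structure_dynamics
```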

Guides

Runnable walkthroughs of the core framework features live in guides/ and are also available on our docs page:

  1. Tool Environments: how isolated environments are built and cached on first call.
  2. Tool Persistence: keep models warm across calls.
  3. Device Management: GPU allocation, LRU eviction, and CPU offload.
  4. Parallel Execution: fan out work across every GPU with ToolPool.
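
As a rough intuition for the LRU eviction mentioned under Device Management, the sketch below shows a generic least-recently-used cache built on the standard library. It is not the proto-tools implementation; the ModelSlots class, its capacity parameter, and the use of plain strings as model names are all hypothetical.

```python
from collections import OrderedDict


class ModelSlots:
    """Generic LRU cache: keeps at most `capacity` entries, evicting the least recently used."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._slots = OrderedDict()

    def use(self, name):
        """Mark `name` as just used; return the evicted name, if any."""
        if name in self._slots:
            self._slots.move_to_end(name)  # refresh recency
            return None
        self._slots[name] = True
        if len(self._slots) > self.capacity:
            evicted, _ = self._slots.popitem(last=False)  # drop the oldest entry
            return evicted
        return None

    def resident(self):
        """Return names from least to most recently used."""
        return list(self._slots)


slots = ModelSlots(capacity=2)
slots.use("esm2")
slots.use("evo2")
slots.use("esm2")           # esm2 becomes the most recent entry
print(slots.use("boltz2"))  # evo2
print(slots.resident())     # ['esm2', 'boltz2']
```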

Each specific tool also ships a minimal examples/example.ipynb under proto_tools/tools/{category}/{tool}/examples/.
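
The path template above can be filled in programmatically. The following is a small sketch; the example_notebook helper is hypothetical, and only the path layout comes from this README:

```python
from pathlib import PurePosixPath


def example_notebook(category, tool):
    """Build the repo-relative path to a tool's example notebook."""
    return PurePosixPath("proto_tools", "tools", category, tool, "examples", "example.ipynb")


print(example_notebook("structure_prediction", "boltz2"))
# proto_tools/tools/structure_prediction/boltz2/examples/example.ipynb
```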

Using with a coding agent

Run tools through natural language with any coding agent (Claude Code, Gemini CLI, OpenAI Codex CLI, etc.). Point the agent at the proto-tools agent-context command, which prints a primer covering the Input → Config → run_*() → Output pattern, the offline CLI discovery verbs, persistence and parallel execution, and links to the long-form notes on GitHub. The command ships in the wheel, so it works on a plain pip install with no repo checkout.
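
If an agent framework needs the primer as a string rather than as terminal output, it can be captured with the standard library. The sketch below assumes the command writes the primer to standard output and exits with status 0 on success; the read_agent_context helper is hypothetical and is not part of proto-tools.

```python
import shutil
import subprocess


def read_agent_context(executable="proto-tools"):
    """Return the primer printed by `proto-tools agent-context`, or None if unavailable."""
    # The executable is only on PATH once proto-tools is installed
    if shutil.which(executable) is None:
        return None
    result = subprocess.run(
        [executable, "agent-context"],
        capture_output=True,
        text=True,
        check=False,
    )
    # Assumption: the primer goes to stdout and success is exit status 0
    return result.stdout if result.returncode == 0 else None


if __name__ == "__main__":
    primer = read_agent_context()
    print(primer if primer is not None else "proto-tools not found on PATH")
```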

If you've cloned the repo for contributing, agents also pick up CLAUDE.md (symlinked as AGENTS.md/GEMINI.md) and the task-specific guides in .claude/skills/ automatically.

Development & Contributing

See CONTRIBUTING.md for full developer setup, storage configuration, PR format, code style, and testing conventions.

Citation

If you use Proto in your research, please cite our preprint:

Merchant AT, Guo D, Viggiano B, Brennan-Almaraz LE, Hur E, Mai T, Yin P, King SH, Ashley E, Hie BL. A high-level programming language for generative biology with Proto. bioRxiv (2026). doi: 10.64898/2026.06.22.733870

@article{Merchant2026.06.22.733870,
  author = {Merchant, Aditi T and Guo, Daniel and Viggiano, Ben and Brennan-Almaraz, Lucas Emmanuel and Hur, Evelyn and Mai, Tina and Yin, Peter and King, Samuel H and Ashley, Euan and Hie, Brian L},
  title = {A high-level programming language for generative biology with Proto},
  elocation-id = {2026.06.22.733870},
  year = {2026},
  doi = {10.64898/2026.06.22.733870},
  publisher = {Cold Spring Harbor Laboratory},
  URL = {https://www.biorxiv.org/content/10.64898/2026.06.22.733870},
  journal = {bioRxiv}
}
