
ahmedanees-m/pen-compare


PEN-COMPARE

Programmable Editor Nominee - Comparative Outcome Metrics via Pre-registration and Axis-based Ranking Evaluation



PEN-COMPARE is the capstone of PEN-STACK, a five-package computational infrastructure for programmable genome integration design. It answers the framework's defining question:

Which genome editors are genuine "Molecular Pens" - writers that insert DNA without cutting it - and how robustly can we certify that distinction?

PEN-COMPARE integrates all four upstream PEN-STACK datasets into a 5-gate hierarchical certification system (TrueWriterScore v3.2), evaluates 1,058 entities (29 natural editors and 1,029 computational designs), and delivers its conclusions through a public Streamlit webserver with local-LLM RAG Q&A. All five pre-registered predictions passed.


The PEN-STACK Pipeline

PEN-COMPARE sits at the end of a five-package evidence chain. Each upstream package provides critical inputs:

┌─────────────────────────────────────────────────────────────────────┐
│                        PEN-STACK  (Papers 1–5)                      │
│                                                                     │
│  ┌──────────────┐   ┌──────────────┐   ┌──────────────┐             │
│  │ GENOME-ATLAS │   │  MECH-CLASS  │   │  PEN-SCORE   │             │
│  │  (Paper 1)   │   │  (Paper 2)   │   │  (Paper 3)   │             │
│  │              │   │              │   │              │             │
│  │ Knowledge    │   │ Mechanism    │   │ 8-axis multi-│             │
│  │ graph of 28  │   │ classifier:  │   │ criteria     │             │
│  │ editing sys- │   │ IS110 bridge │   │ scoring of   │             │
│  │ tems + PFAM  │   │ vs nuclease  │   │ 29 natural   │             │
│  │ domain atlas │   │ vs transpo-  │   │ editors      │             │
│  │              │   │ sase (F1=    │   │ (S_DSB,      │             │
│  │ → 28 curated │   │ 0.9862)      │   │ S_Prog, etc) │             │
│  │   systems    │   │              │   │              │             │
│  └──────┬───────┘   └──────┬───────┘   └──────┬───────┘             │
│         │                  │                  │                     │
│         └──────────────────┴──────────────────┘                     │
│                            │                                        │
│                    ┌───────▼────────┐                               │
│                    │  PEN-ASSEMBLE  │                               │
│                    │   (Paper 4)    │                               │
│                    │                │                               │
│                    │ 1,029 IS110    │                               │
│                    │ designs across │                               │
│                    │ 4 strategies   │                               │
│                    │ (deimmunized,  │                               │
│                    │ ortholog, etc) │                               │
│                    └───────┬────────┘                               │
│                            │                                        │
│          ┌─────────────────▼───────────────────────┐                │
│          │              PEN-COMPARE                │                │
│          │               (Paper 5)                 │                │
│          │                                         │                │
│          │  Unified Universe: 1,058 entities       │                │
│          │  (29 natural + 1,029 designs)           │                │
│          │                                         │                │
│          │  ┌─────────────────────────────────┐    │                │
│          │  │   5-Gate TrueWriterScore v3.2   │    │                │
│          │  │  G1 DSB Avoidance  (Necessary)  │    │                │
│          │  │  G2 Programmability             │    │                │
│          │  │  G3 Native Cargo                │    │                │
│          │  │  G4 Deliverability              │    │                │
│          │  │  G5 Experimental Evidence       │    │                │
│          │  └──────────────┬──────────────────┘    │                │
│          │                 │                       │                │
│          │    ┌────────────┴──────────────┐        │                │
│          │    ▼                           ▼        │                │
│          │  Sensitivity Analysis   Cross-Pipeline  │                │
│          │  18,000 combos/entity   Triangulation   │                │
│          │  ISCro4 robustness=1.0  30 discrepancies│                │
│          │    └────────────┬──────────────┘        │                │
│          │                 ▼                       │                │
│          │  RAG LLM Q&A (88% accuracy)             │                │
│          │  Streamlit Webserver (p95 = 0.01 s)     │                │
│          └─────────────────────────────────────────┘                │
└─────────────────────────────────────────────────────────────────────┘

Key Results

| Metric | Value |
|---|---|
| Universe size | 1,058 entities (29 natural + 1,029 designs) |
| TRUE_WRITER | 1 (ISCro4; IS110 bridge recombinase, UniProt D2TGM5) |
| PROBABLE_WRITER | 4 (IS621, Bxb1, phiC31, eePASSIGE_v2) |
| EMERGING_WRITER | 1,037 |
| NOT_WRITER | 16 |
| ISCro4 robustness | 1.000 across 18,000 threshold combinations |
| LLM RAG accuracy | 88% (44/50 questions, llama3.1:8b) |
| Webserver p95 latency | 0.01 s (pre-registered threshold: ≤ 3.0 s) |
| Pre-registered predictions | 5/5 PASS |

What Makes an Editor a "Molecular Pen"?

Genome editors fall into two fundamental categories:

  • Molecular Scissors (Cas9, Cas12a, meganucleases) make double-strand DNA breaks (DSBs) and rely on the cell's error-prone repair machinery: efficient, but imprecise and immunogenic.
  • Molecular Pens (IS110 bridge recombinases) insert large DNA payloads at specific sites without cutting both strands. Avoiding DSBs lowers mutagenesis risk and reduces immune activation.

PEN-COMPARE's 5-gate framework formalises this distinction with pre-registered, threshold-locked criteria applied to a universe of 1,058 entities (29 natural editors + 1,029 computationally designed IS110 variants).


5-Gate TrueWriterScore Framework (v3.2)

Certification flows through one necessary gate and four qualifying gates. Failing the necessary gate immediately assigns NOT_WRITER regardless of all other gates.

| Gate | Type | Criterion | Threshold |
|---|---|---|---|
| G1: DSB Avoidance | Necessary | S_DSB axis (pen-score) | ≥ 0.95 |
| G2: Programmability | Qualifying | S_Prog axis (pen-score) | ≥ 0.95 |
| G3: Native Cargo | Qualifying | S_Cargo axis AND intrinsic_cargo_mechanism | S_Cargo ≥ 0.85 AND intrinsic_cargo_mechanism = True |
| G4: Deliverability | Qualifying | Protein length (pen-score) | ≤ 900 aa OR split-AAV |
| G5: Evidence | Qualifying | Multi-source experimental support | ≥ 2 sources |
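The gate criteria above reduce to simple predicates on each entity's scores. The sketch below is illustrative, not the package's pen_compare/core/gates.py: the EditorProfile record, the split_aav flag standing in for the "OR split-AAV" route, and the rule that G5 counts distinct source labels are all assumptions.

```python
from dataclasses import dataclass, field

# Thresholds from the gate table; the package reads them from
# config/gates_v3.yaml (loading is not shown in this sketch).
G1_MIN_S_DSB = 0.95
G2_MIN_S_PROG = 0.95
G3_MIN_S_CARGO = 0.85
G4_MAX_LENGTH_AA = 900
G5_MIN_SOURCES = 2


@dataclass
class EditorProfile:
    """Hypothetical input record; field names mirror the certify() example."""

    s_dsb: float
    s_prog: float
    s_cargo: float
    length_aa: int
    intrinsic_cargo_mechanism: bool
    evidence_sources: list[str] = field(default_factory=list)
    split_aav: bool = False  # assumption: flag for the "OR split-AAV" route


def evaluate_gates(e: EditorProfile) -> dict[str, bool]:
    """Return the pass/fail result of each gate, G1 to G5."""
    return {
        "G1": e.s_dsb >= G1_MIN_S_DSB,  # the necessary gate
        "G2": e.s_prog >= G2_MIN_S_PROG,
        "G3": e.s_cargo >= G3_MIN_S_CARGO and e.intrinsic_cargo_mechanism,
        "G4": e.length_aa <= G4_MAX_LENGTH_AA or e.split_aav,
        # assumption: each distinct source label counts as one source
        "G5": len(set(e.evidence_sources)) >= G5_MIN_SOURCES,
    }


iscro4 = EditorProfile(
    s_dsb=1.0, s_prog=1.0, s_cargo=0.95, length_aa=326,
    intrinsic_cargo_mechanism=True,
    evidence_sources=["biochemical", "structural", "computational", "cell_based"],
)
print(evaluate_gates(iscro4))  # every gate is True for these inputs
```

With the Quick Start values for ISCro4, all five predicates pass; an otherwise identical 1,368 aa protein without the split-AAV flag would fail only G4.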

Tier Ladder

  G1 FAILS  ───────────────────────────────────────────►  NOT_WRITER
                                                          (auto-demote)
  G1 PASSES
      │
      ├── 4/4 qualifying + cell-based evidence  ───────►  TRUE_WRITER
      │
      ├── 4/4 qualifying, no cell-based  ──────────────►  PROBABLE_WRITER
      ├── 3/4 qualifying + cell-based  ────────────────►  PROBABLE_WRITER
      │
      ├── 1–2/4 qualifying  ───────────────────────────►  EMERGING_WRITER
      │
      └── 0/4 qualifying  ─────────────────────────────►  NOT_WRITER
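A minimal sketch of the tier ladder as a function of the G1 result, the number of qualifying gates passed (G2 to G5) and the presence of cell-based evidence. The ladder does not list the case of 3/4 qualifying gates without cell-based evidence; this sketch assumes it falls to EMERGING_WRITER, which may not match the package's certify.py.

```python
def assign_tier(g1_pass: bool, qualifying_passed: int, has_cell_based: bool) -> str:
    """Map gate outcomes onto the tier ladder (illustrative sketch)."""
    if not 0 <= qualifying_passed <= 4:
        raise ValueError("qualifying_passed must be between 0 and 4")
    if not g1_pass:
        return "NOT_WRITER"  # necessary gate failed: auto-demote
    if qualifying_passed == 4:
        return "TRUE_WRITER" if has_cell_based else "PROBABLE_WRITER"
    if qualifying_passed == 3 and has_cell_based:
        return "PROBABLE_WRITER"
    if qualifying_passed >= 1:
        # Covers 1-2/4, plus 3/4 without cell-based evidence
        # (assumption: that case is not listed on the ladder).
        return "EMERGING_WRITER"
    return "NOT_WRITER"


print(assign_tier(True, 4, True))   # TRUE_WRITER
print(assign_tier(True, 3, False))  # EMERGING_WRITER (assumed case)
```

Checking G1 first mirrors the ladder's auto-demote rule: no count of qualifying gates can rescue an entity that fails DSB avoidance.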

All thresholds were SHA-256 locked before analysis in SHA256_LOCK_v3.json and deposited at OSF/4kdvy.
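Threshold locking can be checked by re-hashing the locked files and comparing the result with the recorded digests. The sketch below uses hashlib and assumes a hypothetical lock layout of {"files": {relative_path: hex_digest}}; the actual schema of SHA256_LOCK_v3.json is not documented here.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex SHA-256 digest of a file, read in 64 KiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_lock(lock_path: Path, root: Path) -> dict[str, bool]:
    """Map each locked file to True if its current digest matches the record.

    Assumes a hypothetical lock layout {"files": {relative_path: hex_digest}};
    the real SHA256_LOCK_v3.json schema may differ.
    """
    recorded = json.loads(lock_path.read_text())["files"]
    return {rel: sha256_of(root / rel) == digest for rel, digest in recorded.items()}


# Usage from a repository checkout:
# verify_lock(Path("SHA256_LOCK_v3.json"), Path("."))
```

Any edit to a locked file, even a whitespace change, alters its digest, so a False entry signals that the file no longer matches what was registered.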


Pre-Registration Outcomes

All five predictions were registered at OSF/4kdvy on 2026-05-26, prior to any data analysis.

| ID | Pre-registered statement | Result |
|---|---|---|
| P1 | Among natural editors, exactly 1 will be TRUE_WRITER (ISCro4) | PASS |
| P2 | Zero pen-assemble designs will be TRUE_WRITER | PASS |
| P3 | Cross-pipeline triangulation flags ≥ 5 mechanism discrepancies | PASS (30 found) |
| P4 | Local-LLM RAG correctly answers ≥ 80% of a 50-question benchmark | PASS (88%) |
| P5 | Streamlit webserver p95 latency ≤ 3.0 s | PASS (0.01 s) |
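Each prediction reduces to comparing an observed value with a pre-registered criterion. The sketch below re-checks the reported numbers. It is a simplification: P1 is checked as a count only (not that the single TRUE_WRITER is ISCro4), and the p95_latency helper assumes statistics.quantiles with its default method, which may differ from how the webserver benchmark computed p95.

```python
import statistics

# Observed values as reported in the outcomes table.
observed = {
    "P1": 1,        # natural editors certified TRUE_WRITER
    "P2": 0,        # pen-assemble designs certified TRUE_WRITER
    "P3": 30,       # triangulation discrepancies flagged
    "P4": 44 / 50,  # RAG benchmark accuracy
    "P5": 0.01,     # webserver p95 latency in seconds
}

# Pre-registered criteria as predicates on the observed values.
criteria = {
    "P1": lambda v: v == 1,
    "P2": lambda v: v == 0,
    "P3": lambda v: v >= 5,
    "P4": lambda v: v >= 0.80,
    "P5": lambda v: v <= 3.0,
}

outcomes = {pid: "PASS" if check(observed[pid]) else "FAIL"
            for pid, check in criteria.items()}


def p95_latency(latencies_s: list[float]) -> float:
    """95th percentile via statistics.quantiles (default 'exclusive' method)."""
    return statistics.quantiles(latencies_s, n=100)[94]


print(outcomes)  # all five predictions map to "PASS"
```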

Publication target: NAR Webserver Issue (stretch)


Upstream Packages

PEN-COMPARE depends on all four upstream PEN-STACK packages. Each is independently installable and documented:

| Package | PyPI | GitHub | Role in PEN-COMPARE |
|---|---|---|---|
| GENOME-ATLAS | pip install genome-atlas | ahmedanees-m/genome-atlas | Provides the atlas_system_present PFAM evidence flag for Gate 5 and the SIZE_INCONSISTENCY triangulation rule |
| MECH-CLASS | pip install mech-class | ahmedanees-m/mech-class | Provides tier_a_gate (IS110 Tier-A classification), used to interpret Gate 1 and in the AXIS_VS_TIER and MECH_VS_PFAM triangulation rules |
| PEN-SCORE | pip install pen-score | ahmedanees-m/pen-score | Provides all 8 axis scores (S_DSB, S_Prog, S_Cargo …) for Gates 1–3, and get_editor_metadata() for cell-based evidence and intrinsic cargo flags |
| PEN-ASSEMBLE | pip install pen-assemble | ahmedanees-m/pen-assemble | Contributes 1,029 computational IS110 designs to the unified universe; the catalog inherits cargo/cell-based flags from pen-score |

Cross-Pipeline Triangulation — What Was Found

By comparing claims across all four packages, PEN-COMPARE identified 30 discrepancy records across 29 natural editors:

| Category | Severity | Count | Meaning |
|---|---|---|---|
| SIZE_INCONSISTENCY | Medium | 13 | Atlas entry exists, but the sequence length is unknown in pen-score |
| MECH_VS_PFAM | High | 11 | Atlas confirms DSB-free PFAM domains, but mech-class does not call IS110 Tier-A |
| EVIDENCE_GAP | Low | 5 | IS110 confirmed and S_DSB ≥ 0.95, but no mammalian cell-based evidence yet |
| CARGO_INCONSISTENCY | Medium | 1 | intrinsic_cargo = True but S_Cargo < 0.60 (evoCAST) |
| AXIS_VS_TIER | High | 0 | No editors found where pen-score and mech-class flatly contradict each other |
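The discrepancy rules can be written as vectorised pandas filters. This is a sketch, not the package's Triangulator: the column names (atlas_system_present, atlas_dsb_free_pfam, tier_a_gate, has_cell_based, intrinsic_cargo, length_aa, s_dsb, s_cargo, editor_id) are hypothetical, and AXIS_VS_TIER is omitted because its exact criterion is not stated above.

```python
import pandas as pd

# category -> (severity, function returning a boolean mask over the universe).
# Column names are hypothetical stand-ins for the upstream fields.
RULES = {
    "SIZE_INCONSISTENCY": (
        "Medium",
        lambda df: df["atlas_system_present"] & df["length_aa"].isna(),
    ),
    "MECH_VS_PFAM": (
        "High",
        lambda df: df["atlas_dsb_free_pfam"] & ~df["tier_a_gate"],
    ),
    "EVIDENCE_GAP": (
        "Low",
        lambda df: df["tier_a_gate"] & (df["s_dsb"] >= 0.95) & ~df["has_cell_based"],
    ),
    "CARGO_INCONSISTENCY": (
        "Medium",
        lambda df: df["intrinsic_cargo"] & (df["s_cargo"] < 0.60),
    ),
}


def triangulate(universe: pd.DataFrame) -> pd.DataFrame:
    """Return one row per (editor, rule) hit, in rule order."""
    records = []
    for category, (severity, mask) in RULES.items():
        for editor_id in universe.loc[mask(universe), "editor_id"]:
            records.append(
                {"editor_id": editor_id, "category": category, "severity": severity}
            )
    return pd.DataFrame(records, columns=["editor_id", "category", "severity"])
```

An editor can trigger several rules, which is why the output is one row per hit rather than one row per editor; triangulate(universe).groupby("category").size() then gives per-category counts, provided the universe has these columns.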

Sensitivity Analysis

To quantify how robust the tier assignments are to threshold choices, every entity was re-certified across 18,000 parameter combinations (15 × 15 × 16 × 5 grid):

| Parameter | Range | Number of values |
|---|---|---|
| G1 threshold (S_DSB) | 0.85–0.99 | 15 |
| G2 threshold (S_Prog) | 0.85–0.99 | 15 |
| G3 threshold (S_Cargo) | 0.80–0.95 | 16 |
| G4 maximum size | 600, 750, 900, 1050, 1200 aa | 5 |

Findings: ISCro4 is TRUE_WRITER in 100% of combinations (robustness = 1.000). No entity was a boundary case (robustness < 50%), and only four editors were near-boundary (50–80%): Bxb1, eePASSIGE, eePASSIGE_v2 and phiC31, all site-specific recombinase systems that pass G2 only under loose thresholds.
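The grid can be sketched with itertools.product, assuming robustness is the fraction of the 18,000 combinations in which an entity is assigned a given tier. The per-combination certifier below is a simplified stand-in, not pen_compare/core/sensitivity.py: G4 has no split-AAV route, G5 is held fixed at ≥ 2 distinct sources, and 3/4 qualifying without cell-based evidence is assumed to be EMERGING_WRITER.

```python
from itertools import product

# Grid from the sensitivity table: 15 x 15 x 16 x 5 = 18,000 combinations.
G1_GRID = [round(0.85 + 0.01 * i, 2) for i in range(15)]  # 0.85 .. 0.99
G2_GRID = [round(0.85 + 0.01 * i, 2) for i in range(15)]  # 0.85 .. 0.99
G3_GRID = [round(0.80 + 0.01 * i, 2) for i in range(16)]  # 0.80 .. 0.95
G4_GRID = [600, 750, 900, 1050, 1200]                      # max length, aa


def tier_under(e: dict, g1: float, g2: float, g3: float, g4: int) -> str:
    """Simplified certifier for one threshold combination (sketch)."""
    if e["s_dsb"] < g1:
        return "NOT_WRITER"
    qualifying = sum([
        e["s_prog"] >= g2,
        e["s_cargo"] >= g3 and e["intrinsic_cargo_mechanism"],
        e["length_aa"] <= g4,                  # no split-AAV route here
        len(set(e["evidence_sources"])) >= 2,  # G5 held fixed
    ])
    cell_based = "cell_based" in e["evidence_sources"]
    if qualifying == 4:
        return "TRUE_WRITER" if cell_based else "PROBABLE_WRITER"
    if qualifying == 3 and cell_based:
        return "PROBABLE_WRITER"
    # 3/4 without cell-based evidence assumed EMERGING_WRITER
    return "EMERGING_WRITER" if qualifying >= 1 else "NOT_WRITER"


def robustness(e: dict, tier: str) -> float:
    """Fraction of grid combinations in which `e` is assigned `tier`."""
    combos = list(product(G1_GRID, G2_GRID, G3_GRID, G4_GRID))
    hits = sum(tier_under(e, *combo) == tier for combo in combos)
    return hits / len(combos)


iscro4 = {"s_dsb": 1.0, "s_prog": 1.0, "s_cargo": 0.95, "length_aa": 326,
          "intrinsic_cargo_mechanism": True,
          "evidence_sources": ["biochemical", "structural", "computational", "cell_based"]}
print(robustness(iscro4, "TRUE_WRITER"))  # 1.0
```

With the Quick Start inputs, ISCro4 clears the strictest value on every axis (S_Cargo = 0.95 equals the top of the G3 range), so no combination can demote it. An entity with S_Prog = 0.92 would pass G2 in only 8 of the 15 G2 settings.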


Installation

# Minimal (certification core only)
pip install pen-compare

# With Streamlit webserver
pip install "pen-compare[webserver]"

# With local LLM RAG Q&A
pip install "pen-compare[rag]"

# Full (all extras)
pip install "pen-compare[webserver,rag,literature]"

Requires Python ≥ 3.10. All four upstream PEN-STACK packages are installed automatically.

Docker (recommended for full pipeline)

docker run --rm \
  -v ~/pen-assemble:/workspace/pen-assemble \
  -p 8501:8501 \
  pen-stack/compare:0.1.0

The Docker image bundles Ollama (llama3.1:8b + phi3.5:3.8b) and a pre-built ChromaDB vector index.


Quick Start

Certify a single editor

from pen_compare.core.certify import certify

result = certify(
    editor_id="ISCro4",
    s_dsb=1.0,
    s_prog=1.0,
    s_cargo=0.95,
    length_aa=326,
    evidence_sources=["biochemical", "structural", "computational", "cell_based"],
    intrinsic_cargo_mechanism=True,
)

print(result.tier)                    # TRUE_WRITER
print(result.qualifying_gates_passed) # 4
print(result.has_cell_based_evidence) # True

CLI

pen-compare --version
pen-compare compare ISCro4 IS621
pen-compare list-writers

Triangulate an editor

import pandas as pd
from pen_compare.triangulation import Triangulator

universe = pd.read_parquet("data/unified_editor_universe.parquet")
tri = Triangulator()
discrepancies = tri.run_full(universe)
print(discrepancies.groupby("category").size())

Ask the RAG system

from pen_compare.rag import PenStackQA

qa = PenStackQA()
print(qa.ask("Why is ISCro4 a TRUE_WRITER?"))

Streamlit Webserver

The interactive webserver provides five analysis tabs:

| Tab | Content |
|---|---|
| Comparator | Side-by-side radar chart of any two editors across 4 axes; gate pass/fail table |
| True Writers | Tier distribution bar chart; TRUE_WRITER summary; natural-editor scorecard |
| Triangulation | Discrepancy browser by category and severity |
| Q&A | Live local-LLM RAG Q&A (requires Ollama; degrades gracefully on Streamlit Community Cloud) |
| Designer Filter | Browse and filter the 1,029 computational designs by source, tier and PenScore |

Streamlit Community Cloud deployment uses pre-computed JSON caches (data/cache/) so no parquet files or Ollama instance is required.
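The cache pattern separates an offline build step from a cheap runtime read. A minimal standard-library sketch; the file name tier_counts.json and both helper functions are hypothetical, not the contents of pen_compare/server/cache.py.

```python
import json
from collections import Counter
from pathlib import Path


def build_tier_cache(tiers: list[str], cache_dir: Path) -> Path:
    """Offline step: count tiers once and write them as JSON (hypothetical file name)."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    out = cache_dir / "tier_counts.json"
    out.write_text(json.dumps(dict(Counter(tiers)), indent=2, sort_keys=True))
    return out


def load_tier_counts(cache_dir: Path) -> dict[str, int]:
    """Runtime step: read the cached counts; no parquet or Ollama access needed."""
    return json.loads((cache_dir / "tier_counts.json").read_text())
```

Because the runtime side only parses small JSON files, the hosted app needs neither the parquet inputs nor a model server.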


Repository Structure

pen-compare/
├── pen_compare/
│   ├── core/
│   │   ├── gates.py          # 5 gate functions (G1–G5), threshold-locked
│   │   ├── certify.py        # TrueWriterResult classifier
│   │   ├── sensitivity.py    # 18,000-combo sensitivity grid
│   │   └── universe.py       # Unified editor universe assembly (Step 5)
│   ├── triangulation/
│   │   └── triangulator.py   # 5 cross-pipeline discrepancy rules
│   ├── rag/
│   │   └── qa.py             # PenStackQA: ChromaDB + Ollama RAG pipeline
│   ├── server/
│   │   ├── app.py            # Streamlit 5-tab webserver
│   │   └── cache.py          # JSON cache builder for Cloud deployment
│   └── cli.py                # `pen-compare` CLI entry point
├── config/
│   ├── gates_v3.yaml         # Pre-registered gate thresholds (SHA-256 locked)
│   └── triangulation_rules_v3.yaml
├── prereg/
│   ├── predictions_v3.yaml   # 5 pre-registered predictions
│   ├── methodology_v3.md     # Analysis methodology
│   └── OSF_RECORD.txt        # OSF deposit confirmation
├── results/
│   ├── truewriter_scorecard_v3.2.parquet
│   ├── triangulation_discrepancies.parquet
│   ├── pred_P{1..5}.json     # Per-prediction outcomes
│   └── PREREG_OUTCOME.json   # 5/5 PASS summary
├── data/
│   ├── unified_editor_universe.parquet
│   └── cache/                # Pre-computed JSON caches for Streamlit Cloud
├── tests/
│   ├── unit/                 # 155 tests, 98.8% coverage
│   └── integration/          # Calibration anchors + smoke tests (require Docker)
├── docs/                     # Sphinx source → https://ahmedanees-m.github.io/pen-compare
├── scripts/                  # Numbered execution scripts (Steps 1–30)
├── .github/workflows/
│   ├── ci.yml                # Lint + unit tests + PyPI release
│   └── docs.yml              # Sphinx → GitHub Pages
├── SHA256_LOCK_v3.json       # Pre-registration integrity record
├── requirements.txt          # Streamlit Cloud minimal deps
└── pyproject.toml

Development

git clone https://github.com/ahmedanees-m/pen-compare.git
cd pen-compare
pip install -e ".[dev]"
pytest tests/unit/ -q

Coverage report:

pytest tests/unit/ --cov=pen_compare --cov-report=term-missing

Docs (local preview):

pip install -e ".[docs]"
sphinx-build -b html docs docs/_build/html
open docs/_build/html/index.html

Verified Biological Anchors

All key biological identifiers were independently verified on 2026-05-26:

| Entity | UniProt | Organism | Length | Status |
|---|---|---|---|---|
| ISCro4 | D2TGM5 | C. rodentium ICC168 | 326 aa | ✅ Confirmed TRUE_WRITER |
| IS621 | A0A2X3M8B0 | E. coli NCTC8009/8333 | 342 aa | ✅ Confirmed PROBABLE_WRITER |
| SpCas9 | Q99ZW2 | S. pyogenes serotype M1 | 1368 aa | ✅ Confirmed NOT_WRITER |

| Paper | DOI | Status |
|---|---|---|
| Durrant 2024, Nature | 10.1038/s41586-024-07552-4 | ✅ Confirmed |
| Hiraizumi 2024, Nature | 10.1038/s41586-024-07570-2 | ✅ Confirmed |
| Pelea 2026, Science | 10.1126/science.adz1884 | ✅ Confirmed |
| Perry 2025, Science | 10.1126/science.adz0276 | ✅ Confirmed |

Reproducibility

| Artefact | Location |
|---|---|
| Pre-registration | OSF/4kdvy (public 2026-05-26) |
| SHA-256 lock | SHA256_LOCK_v3.json |
| Pre-registration tag | prereg-v3.2 |
| v0.1.0 release | v0.1.0 |
| Sensitivity grid | pen_compare/core/sensitivity.py (SENSITIVITY_GRID constant) |
| Biological ID verification | memory/project_pen_compare.md (session log) |

All thresholds in config/gates_v3.yaml are SHA-256 locked prior to data analysis. Scores are computed using frozen upstream package versions (pen-score==0.1.3, pen-assemble==0.5.2, genome-atlas==0.7.2, mech-class==0.5.4).
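One way to confirm that a local environment matches the frozen upstream versions listed above is importlib.metadata from the standard library. A sketch; the lookup parameter exists only so the check can be exercised without the packages installed.

```python
from collections.abc import Callable
from importlib.metadata import PackageNotFoundError, version

# Frozen upstream versions as stated in this section.
FROZEN = {
    "pen-score": "0.1.3",
    "pen-assemble": "0.5.2",
    "genome-atlas": "0.7.2",
    "mech-class": "0.5.4",
}


def check_frozen(
    pins: dict[str, str] = FROZEN,
    lookup: Callable[[str], str] = version,
) -> dict[str, str]:
    """Return {package: problem} for every unmet pin; an empty dict means all match."""
    problems = {}
    for pkg, wanted in pins.items():
        try:
            found = lookup(pkg)
        except PackageNotFoundError:
            problems[pkg] = "not installed"
            continue
        if found != wanted:
            problems[pkg] = f"found {found}, expected {wanted}"
    return problems


print(check_frozen() or "all upstream pins match")
```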


Citation

@software{pen_compare_2026,
  author    = {Mahaboob Ali, Anees Ahmed},
  title     = {{PEN-COMPARE}: Hierarchical Certification Framework for
               Non-Destructive Genome Editors},
  version   = {0.1.0},
  year      = {2026},
  url       = {https://github.com/ahmedanees-m/pen-compare},
  note      = {Pre-registration: \url{https://osf.io/4kdvy}}
}

If PEN-COMPARE's results depend on an upstream package, please also cite it:


License

MIT — see LICENSE.


GENOME-ATLAS · MECH-CLASS · PEN-SCORE · PEN-ASSEMBLE · PEN-COMPARE