GenoLeWM

Action-conditioned latent world models for genomic edits.

GenoLeWM models a genomic edit as an action in a latent state space. A frozen DNA encoder (Carbon) maps a reference sequence window to a latent state s_t. A trainable action encoder turns the edit specification into an action a_t, and a trainable predictor estimates the post-edit state s_{t+1} from (s_t, a_t).

reference window --Carbon encoder--> s_t
edit spec --------action encoder----> a_t
(s_t, a_t) -------predictor---------> s_hat_{t+1}
edited window ----Carbon encoder----> s_{t+1}
loss = distance(s_hat_{t+1}, s_{t+1}) + collapse regularization
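A minimal NumPy sketch of the objective in the diagram above. It is not GenoLeWM's implementation. Random vectors stand in for frozen Carbon states, and two small linear maps stand in for the action encoder and the cross-attention predictor. The distance is assumed to be mean squared error. The collapse regularizer is assumed to be a VICReg-style variance hinge, because the README does not name the regularizer it uses. All function and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, ACTION_DIM, EDIT_DIM = 16, 8, 4

# Hypothetical trainable weights for the action encoder and predictor.
W_action = rng.normal(scale=0.1, size=(EDIT_DIM, ACTION_DIM))
W_pred = rng.normal(scale=0.1, size=(STATE_DIM + ACTION_DIM, STATE_DIM))


def action_encoder(edit_features: np.ndarray) -> np.ndarray:
    # Maps an edit feature vector to an action embedding a_t.
    return np.tanh(edit_features @ W_action)


def predictor(s_t: np.ndarray, a_t: np.ndarray) -> np.ndarray:
    # Residual update: s_hat_{t+1} = s_t + f([s_t, a_t]).
    return s_t + np.concatenate([s_t, a_t], axis=1) @ W_pred


def variance_penalty(z: np.ndarray, target_std: float = 1.0, eps: float = 1e-4) -> float:
    # Hinge on each dimension's batch std; grows as embeddings collapse to a constant.
    std = np.sqrt(z.var(axis=0) + eps)
    return float(np.mean(np.maximum(0.0, target_std - std)))


def world_model_loss(s_t, edit_features, s_next, reg_weight=1.0):
    s_hat = predictor(s_t, action_encoder(edit_features))
    distance = float(np.mean((s_hat - s_next) ** 2))  # MSE as the assumed distance term
    return distance + reg_weight * variance_penalty(s_hat)


# Random vectors stand in for frozen-encoder states of reference and edited windows.
s_t = rng.normal(size=(32, STATE_DIM))
s_next = s_t + 0.05 * rng.normal(size=s_t.shape)
edits = rng.normal(size=(32, EDIT_DIM))
loss = world_model_loss(s_t, edits, s_next)
print(f"loss = {loss:.4f}")
```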

The package is an alpha research system. It is not a diagnostic device or a clinical decision system, and it offers no privacy or runtime assurance beyond its local execution contracts and checksum provenance.

Install

python -m pip install geno-lewm

Install extras for the heavier code paths:

python -m pip install "geno-lewm[eval]"    # FASTA/VCF evaluation utilities
python -m pip install "geno-lewm[train]"   # torch/transformers training paths
python -m pip install "geno-lewm[dev]"     # tests, lint, typing, packaging
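To check whether the training extra is already available before running a training path, something like the following works. It assumes the `[train]` extra is what provides importable `torch` and `transformers` modules, as the comment above indicates; the helper name is hypothetical.

```python
import importlib.util

# Assumption: these are the top-level modules the [train] extra installs.
TRAIN_MODULES = ("torch", "transformers")


def missing_modules(modules=TRAIN_MODULES):
    """Return the subset of `modules` that cannot be imported in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


missing = missing_modules()
if missing:
    print('Install "geno-lewm[train]" first; missing:', ", ".join(missing))
```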

Source development:

git clone https://github.com/AbdelStark/GenoLeWM.git
cd GenoLeWM
uv pip install -e ".[dev,docs]"

What Ships

| Surface | Contents |
| --- | --- |
| Python package | `geno-lewm==0.2.1` on PyPI |
| Core data model | `EditSpec`, `RelEdit`, edit application, typed errors, metrics, redaction-safe logs |
| Model code | action encoder, cross-attention predictor, AR rollout wrapper, CEM planner |
| Data pipeline | local gnomAD/ClinVar VCF-to-Parquet builders, tuple builder, holdout filtering, dataset packaging |
| Training | fixture smoke trainer, Carbon preflight, single-process Carbon trainer, packaged run evidence |
| Evaluation | scorer, Carbon baseline scorer, binary metrics, Spearman metrics, rollout-fidelity metrics, eval aggregation |
| Demo | terminal scoring demo, checksum receipts, runtime preflight, batch receipt report, planning demo |
| Release tooling | dataset/model/training/paper package validators, sdist inventory gate, clean-machine replay helpers |

Public Artifacts

| Artifact | Link |
| --- | --- |
| PyPI package | https://pypi.org/project/geno-lewm/0.2.1/ |
| Source/wheel release | https://github.com/AbdelStark/GenoLeWM/releases/tag/v0.2.1 |
| Model package | https://huggingface.co/abdelstark/geno-lewm |
| Dataset package | https://huggingface.co/datasets/abdelstark/geno-lewm-data |
| Planning demo artifacts | https://huggingface.co/abdelstark/geno-lewm-runs/tree/main/geno-lewm-v021-strong-4f36eef-10k-r1/planning-demo |
| v0.2.1 benchmark/planning/paper tree | https://huggingface.co/abdelstark/geno-lewm-runs/tree/main/geno-lewm-v021-strong-4f36eef-10k-r1 |
| v0.2.1 benchmark readiness report | https://huggingface.co/abdelstark/geno-lewm-runs/resolve/main/geno-lewm-v021-strong-4f36eef-10k-r1/suite/model/v0.2_benchmark_readiness_report.json |
| v0.2.1 generated paper | https://huggingface.co/abdelstark/geno-lewm-runs/resolve/main/geno-lewm-v021-strong-4f36eef-10k-r1/paper/paper.serious-completion.md |
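The `resolve` links above follow the Hugging Face Hub's direct-download URL pattern, so other files in the run tree can be addressed the same way. The helper below is a hypothetical convenience, not part of `geno-lewm`; `hf_hub_download` from the `huggingface_hub` package is the usual way to fetch a file with local caching.

```python
def hf_resolve_url(repo_id: str, path: str, revision: str = "main", repo_type: str = "model") -> str:
    """Direct-download ("resolve") URL for one file in a Hugging Face Hub repo."""
    prefixes = {"model": "", "dataset": "datasets/"}
    if repo_type not in prefixes:
        raise ValueError(f"unsupported repo_type: {repo_type!r}")
    return f"https://huggingface.co/{prefixes[repo_type]}{repo_id}/resolve/{revision}/{path}"


RUN = "geno-lewm-v021-strong-4f36eef-10k-r1"
report_url = hf_resolve_url(
    "abdelstark/geno-lewm-runs",
    f"{RUN}/suite/model/v0.2_benchmark_readiness_report.json",
)
print(report_url)

# With huggingface_hub installed, the same file can be downloaded into the local cache:
#   from huggingface_hub import hf_hub_download
#   local_path = hf_hub_download(
#       repo_id="abdelstark/geno-lewm-runs",
#       filename=f"{RUN}/suite/model/v0.2_benchmark_readiness_report.json",
#   )
```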

Results

The current evidence shows that the system works end to end, but model-quality results are mixed or negative. Do not cite these results as evidence that GenoLeWM broadly outperforms Carbon.

| Track | GenoLeWM result | Baseline comparison (deltas are GenoLeWM minus baseline) |
| --- | --- | --- |
| ClinVar coding | AUROC `0.734375`, AP `0.8529761904761904`, balanced accuracy `0.75` | vs Carbon: AUROC `-0.1875`, AP `-0.09894688644688643`, balanced accuracy `+0.0625` |
| ClinVar non-coding | AUROC `0.5625`, AP `0.6054563492063492`, balanced accuracy `0.4375` | vs Carbon: AUROC `-0.3125`, AP `-0.30896672771672784`, balanced accuracy `-0.25` |
| BRCA2 saturation | Spearman rho `0.14919354838709678` | vs Carbon: `-0.32771260997067453` |
| TraitGym Mendelian | Spearman rho `-0.02796450759873114` | vs Carbon: `+0.05592901519746229` |
| Phased-haplotype rollout | cosine mean `0.28886058350550603`, L2 mean `33.319687258878126`, Recall@4 `1.0` | weak vs source-state baseline: cosine `-0.7089701215468133`, L2 `+31.19289051130368` |
| Synthetic edit-chain rollout | cosine mean `0.30160847029349436`, L2 mean `28.802888778495763`, Recall@4 `1.0` | weak vs source-state baseline: cosine `-0.6896310938123016`, L2 `+25.637059814259455` |
| AR rollout speed | K=5 speedup `2.413859489667916`; K=20 speedup `2.4732225135799566` | K=5 passes its local target; K=20 remains below the RFC-0004 `5x` target |
| Planning demo | `best_distance=23.656930390534644`, `384` evaluations, patience stop | released-artifact execution evidence, not useful-planning evidence |
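Reading each comparison number as GenoLeWM minus baseline, the Carbon values implied by the table can be recovered by subtraction. This is arithmetic on the reported numbers only, not an independent result, and the sign convention is the stated assumption.

```python
# Assumption: each comparison value is GenoLeWM minus Carbon.
results = {
    "ClinVar coding AUROC": (0.734375, -0.1875),
    "ClinVar non-coding AUROC": (0.5625, -0.3125),
    "BRCA2 saturation Spearman rho": (0.14919354838709678, -0.32771260997067453),
    "TraitGym Mendelian Spearman rho": (-0.02796450759873114, 0.05592901519746229),
}

# baseline = GenoLeWM - (GenoLeWM - baseline)
implied_carbon = {name: ours - delta for name, (ours, delta) in results.items()}
for name, value in implied_carbon.items():
    print(f"{name}: implied Carbon {value:.4f}")
```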

The generated paper and readiness report preserve these negative findings. The strongest claim the current evidence supports is that the repository can train, package, evaluate, benchmark, and replay a genomic-edit world-model pipeline with content-addressed evidence.

Quickstart

from geno_lewm import EditSpec, EditType, RelEdit, apply_edits

edit = EditSpec(chrom="chr17", pos=43_091_983, ref="A", alt="T")
assert edit.edit_type is EditType.SNV

rel = edit.relative_to(window_start_bp=43_091_900, window_end_bp=43_092_100)
print(rel.rel_pos)

window = "ACGT" * 64
edited = apply_edits(
    window,
    [
        RelEdit(rel_pos=0, edit_type=EditType.SNV, ref_bases="A", alt_bases="T"),
        RelEdit(rel_pos=4, edit_type=EditType.SNV, ref_bases="A", alt_bases="C"),
    ],
)
print(edited[:12])
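The SNV example above implies 0-based `rel_pos` offsets (`window[0]` and `window[4]` are both `A`). The sketch below is a hypothetical, dependency-free re-implementation of that behavior, extended to length-changing edits by applying them right to left so that offsets of not-yet-applied edits stay valid. `SimpleEdit` and `apply_simple_edits` are illustrative names; `apply_edits` may validate or order edits differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimpleEdit:
    """Hypothetical stand-in for RelEdit: a 0-based offset plus ref/alt bases."""
    rel_pos: int
    ref_bases: str
    alt_bases: str


def apply_simple_edits(window: str, edits: list[SimpleEdit]) -> str:
    """Apply non-overlapping substitutions or indels, rightmost edit first."""
    next_start = len(window)  # start of the edit applied just before (to the right)
    for edit in sorted(edits, key=lambda e: e.rel_pos, reverse=True):
        end = edit.rel_pos + len(edit.ref_bases)
        if edit.rel_pos < 0 or end > next_start:
            raise ValueError(f"edit at {edit.rel_pos} is out of range or overlaps another edit")
        if window[edit.rel_pos:end] != edit.ref_bases:
            raise ValueError(f"reference mismatch at {edit.rel_pos}")
        # Text left of rel_pos is untouched, so earlier offsets remain correct.
        window = window[:edit.rel_pos] + edit.alt_bases + window[end:]
        next_start = edit.rel_pos
    return window


edited = apply_simple_edits("ACGT" * 64, [SimpleEdit(0, "A", "T"), SimpleEdit(4, "A", "C")])
print(edited[:12])  # TCGTCCGTACGT
```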

Verify a checksum receipt:

geno-lewm-verify examples/data/verify_receipt/receipt.json \
  --manifest examples/data/verify_receipt/manifest.json
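Conceptually, a checksum receipt check recomputes digests of the listed files and compares them with the recorded values. The sketch below assumes a hypothetical manifest layout (`{"files": {path: sha256}}`); the real receipt and manifest schemas used by `geno-lewm-verify` may differ.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    # Stream the file so large artifacts do not need to fit in memory.
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(manifest_path: Path) -> dict[str, bool]:
    """Hypothetical schema: {"files": {"<path relative to manifest>": "<sha256 hex>"}}."""
    manifest = json.loads(manifest_path.read_text())
    root = manifest_path.parent
    return {
        rel: (root / rel).is_file() and sha256_file(root / rel) == expected
        for rel, expected in manifest["files"].items()
    }


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "scores.jsonl"
    out.write_text('{"score": 0.5}\n')
    manifest = Path(tmp) / "manifest.json"
    manifest.write_text(json.dumps({"files": {"scores.jsonl": sha256_file(out)}}))
    before = verify_manifest(manifest)  # {'scores.jsonl': True}
    out.write_text("tampered\n")
    after = verify_manifest(manifest)  # {'scores.jsonl': False}
print(before, after)
```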

Run a local fixture trainer smoke test:

geno-lewm-train --fixture-smoke --run-dir /tmp/geno-lewm-smoke --steps 50

Score a local VCF with a local model package:

geno-lewm-score \
  --model-dir /path/to/model \
  --backend auto \
  --vcf variants.vcf \
  --fasta reference.fa.gz \
  --output scores.jsonl \
  --receipt receipts.jsonl \
  --batch-size 64 \
  --no-progress
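To try the scorer on a single variant, a minimal VCF can be written by hand. This sketch reuses the Quickstart coordinates and assumes `geno-lewm-score` accepts a plain VCFv4.2 file with only the eight fixed columns; the REF allele must also match the reference FASTA you pass. `write_vcf` is a hypothetical helper.

```python
from pathlib import Path

# Minimal VCFv4.2 header: file-format line plus the eight fixed column names.
VCF_HEADER = "##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"


def write_vcf(path, variants):
    """Write (chrom, pos, ref, alt) tuples as VCF data lines; VCF POS is 1-based."""
    lines = [f"{chrom}\t{pos}\t.\t{ref}\t{alt}\t.\t.\t.\n" for chrom, pos, ref, alt in variants]
    Path(path).write_text(VCF_HEADER + "".join(lines))


write_vcf("variants.vcf", [("chr17", 43_091_983, "A", "T")])
```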

Run manifest-backed planning:

geno-lewm-plan \
  --model-dir /path/to/model \
  --window-fasta window.fa \
  --target-fasta target.fa \
  --output plan.json \
  --horizon 5 \
  --iterations 5 \
  --samples 1024 \
  --elite 64
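The planner is listed as a CEM (cross-entropy method) planner. A generic CEM loop looks like the sketch below, assuming `--horizon`, `--iterations`, `--samples`, and `--elite` map to the parameters of the same names. The toy additive dynamics and distance-to-target cost stand in for the learned predictor and the latent target state; this is not GenoLeWM code.

```python
import numpy as np


def cem_plan(cost_fn, horizon, action_dim, iterations=5, samples=1024, elite=64, seed=0):
    """Cross-entropy method over action sequences of shape (horizon, action_dim)."""
    if not 0 < elite <= samples:
        raise ValueError("elite must be in 1..samples")
    rng = np.random.default_rng(seed)
    mean = np.zeros((horizon, action_dim))
    std = np.ones((horizon, action_dim))
    best_seq, best_cost = None, np.inf
    for _ in range(iterations):
        # Sample candidate action sequences from the current Gaussian.
        candidates = mean + std * rng.standard_normal((samples, horizon, action_dim))
        costs = np.array([cost_fn(seq) for seq in candidates])
        elite_idx = np.argsort(costs)[:elite]
        elites = candidates[elite_idx]
        # Refit the sampling distribution to the lowest-cost sequences.
        mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-6
        if costs[elite_idx[0]] < best_cost:
            best_cost, best_seq = float(costs[elite_idx[0]]), candidates[elite_idx[0]].copy()
    return best_seq, best_cost


STATE_DIM = 4
start = np.zeros(STATE_DIM)
target = np.array([1.0, -2.0, 0.5, 3.0])


def rollout_distance(actions):
    state = start.copy()
    for a in actions:  # toy dynamics standing in for the learned predictor
        state = state + 0.5 * np.tanh(a)
    return float(np.linalg.norm(state - target))


plan, best = cem_plan(rollout_distance, horizon=5, action_dim=STATE_DIM)
print(f"best distance {best:.3f}")
```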

Training Pipeline

Unlike the fixture smoke trainer, the real training path explicitly tracks data identity, Carbon runtime readiness, config identity, and packaged run artifacts.

Validate the checked-in dataset rebuild spec:

python -m tools.release.dataset_snapshot \
  --spec-json configs/first_experiment/dataset-snapshot-snv.json \
  --check-spec

Check the staged upstream inputs before building a dataset snapshot:

python -m tools.release.dataset_snapshot \
  --spec-json configs/first_experiment/dataset-snapshot-snv.json \
  --check-inputs

Run the Carbon preflight for the packaged dataset and training config:

geno-lewm-train --carbon-preflight \
  --dataset-dir /path/to/dataset-package \
  --carbon-model-dir /path/to/carbon-model \
  --training-config configs/first_experiment/train-carbon-500m-snv.yaml \
  --run-dir /path/to/run \
  --preflight-output /path/to/run/training_preflight_report.json

Launch training and package the run evidence:

geno-lewm-train --carbon-train --package-release-run \
  --dataset-dir /path/to/dataset-package \
  --carbon-model-dir /path/to/carbon-model \
  --training-config configs/first_experiment/train-carbon-500m-snv.yaml \
  --run-dir /path/to/run

The run package contains the checkpoint, metrics, logs, resolved config, preflight report, manifest/card, and checksum inventory. Resume checks validate run id, dataset snapshot, seed split, and config identity.
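One common way to implement config-identity checks is to hash a canonical serialization of the resolved config and compare stored identity fields on resume. The sketch assumes that approach; the field names in `RESUME_KEYS` and the example values are hypothetical and may not match the run package's manifest.

```python
import hashlib
import json


def config_fingerprint(config: dict) -> str:
    """SHA-256 of a canonical JSON encoding (sorted keys, compact separators)."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical identity fields mirroring the checks described above.
RESUME_KEYS = ("run_id", "dataset_snapshot", "seed_split", "config_sha256")


def resume_mismatches(saved: dict, current: dict) -> list[str]:
    """Return the identity fields that differ between a saved run and the current launch."""
    return [key for key in RESUME_KEYS if saved.get(key) != current.get(key)]


saved = {
    "run_id": "run-001",
    "dataset_snapshot": "snapshot-a",
    "seed_split": 0,
    "config_sha256": config_fingerprint({"lr": 3e-4, "batch_size": 64}),
}
# Same config with a different key order produces the same fingerprint.
current = dict(saved, config_sha256=config_fingerprint({"batch_size": 64, "lr": 3e-4}))
print(resume_mismatches(saved, current))  # []
```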

Evaluation And Benchmarks

Compute binary metrics (AUROC, AP, balanced accuracy) or Spearman metrics from measured scores and labels, with a baseline comparison:

geno-lewm-eval \
  --scores-jsonl scores.jsonl \
  --labels-jsonl labels.jsonl \
  --baseline-scores-jsonl carbon_scores.jsonl \
  --baseline-score-field carbon_zero_shot_score \
  --baseline-name carbon_zero_shot \
  --output-metrics eval_metrics.json \
  --split clinvar_coding
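For reference, the standard definitions behind two of these metrics are sketched below: AUROC as the fraction of positive/negative pairs ranked correctly (ties count one half), and Spearman's rho as the Pearson correlation of tie-averaged ranks. The JSONL field names (`variant_id`, `score`, `label`) are hypothetical, and `geno-lewm-eval` may differ in tie handling or input validation.

```python
import json


def read_jsonl(path):
    with open(path) as handle:
        return [json.loads(line) for line in handle if line.strip()]


def join_on(score_rows, label_rows, key="variant_id", score_field="score", label_field="label"):
    """Pair each score with its label by a shared (hypothetical) key field."""
    labels = {row[key]: row[label_field] for row in label_rows}
    pairs = [(row[score_field], labels[row[key]]) for row in score_rows if row[key] in labels]
    return [s for s, _ in pairs], [y for _, y in pairs]


def auroc(scores, labels):
    """Probability that a random positive outranks a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_ranks(values):
    """1-based ranks with tied values assigned their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Pearson correlation of average ranks (undefined if either input is constant)."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5
```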

Aggregate metrics and render the release report:

geno-lewm-eval-all \
  --metrics-json eval_metrics.json \
  --output-metrics aggregate/eval_metrics.json \
  --output-report aggregate/eval_report.md \
  --require-v02-vep-metrics \
  --require-v02-rollout-metrics

Run the benchmark-suite command graph from a manifest:

python -m tools.release.v02_benchmark_suite \
  --manifest configs/first_experiment/v0.2_benchmark_suite.template.json \
  --output-report /path/to/v0.2_benchmark_suite_report.json \
  --execute
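A manifest-driven command graph is typically run in dependency order. The sketch below orders hypothetical steps with the standard library's `graphlib` (Python 3.9+); the `depends_on` field and the step names are assumptions, not the schema of `v0.2_benchmark_suite.template.json`.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical manifest: step name -> spec with the steps it depends on.
steps = {
    "score": {"depends_on": []},
    "baseline_score": {"depends_on": []},
    "eval": {"depends_on": ["score", "baseline_score"]},
    "aggregate": {"depends_on": ["eval"]},
}


def execution_order(steps: dict[str, dict]) -> list[str]:
    """Return step names so that every step follows its dependencies.

    Raises graphlib.CycleError if the dependencies form a cycle.
    """
    graph = {name: set(spec.get("depends_on", [])) for name, spec in steps.items()}
    return list(TopologicalSorter(graph).static_order())


order = execution_order(steps)
print(order)
```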

Measure release efficiency:

python -m bench.inference --release-efficiency \
  --model-dir /path/to/model \
  --variant chr17:43091983:A:T \
  --window /path/to/window.txt \
  --output /path/to/efficiency_report.json
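The `--variant` value uses a `CHROM:POS:REF:ALT` string. A hypothetical parser for that format might look like this; restricting alleles to uppercase `A/C/G/T/N` is an illustrative choice, not a documented rule of the tool.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


def parse_variant(text: str) -> Variant:
    """Parse a CHROM:POS:REF:ALT string such as 'chr17:43091983:A:T'."""
    parts = text.split(":")
    if len(parts) != 4:
        raise ValueError(f"expected CHROM:POS:REF:ALT, got {text!r}")
    chrom, pos, ref, alt = parts
    # Illustrative restriction: non-empty alleles made of uppercase A/C/G/T/N.
    if not (ref and alt) or set(ref + alt) - set("ACGTN"):
        raise ValueError(f"unexpected alleles in {text!r}")
    return Variant(chrom, int(pos), ref, alt)


print(parse_variant("chr17:43091983:A:T"))
```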

Time the planning solver on its own:

python -m bench.planning --output /path/to/planning.performance.json
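Speedup figures such as the AR rollout numbers in the results table are ratios of two timings. The README does not say how `bench.planning` or `bench.inference` measure time, so the sketch below is a generic median wall-clock comparison with hypothetical helper names.

```python
import statistics
import time


def median_seconds(fn, repeats=5):
    """Median wall-clock time of fn() over several runs."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def speedup(baseline_fn, candidate_fn, repeats=5):
    """How many times faster candidate_fn is than baseline_fn (> 1 means faster)."""
    return median_seconds(baseline_fn, repeats) / median_seconds(candidate_fn, repeats)


ratio = speedup(lambda: time.sleep(0.02), lambda: time.sleep(0.005))
print(f"speedup {ratio:.2f}x")
```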

Demo Pipeline

Generate a terminal demo transcript from released model/data artifacts:

python tools/demo/terminal_inference.py \
  --model-dir /path/to/model \
  --vcf demo.vcf \
  --fasta reference.fa.gz \
  --output-dir /path/to/demo \
  --backend auto \
  --batch-size 64

The demo writes terminal-demo-transcript.md, terminal_demo_manifest.json, runtime_preflight_report.json, batch_receipt_report.json, scores.jsonl, and receipts.jsonl. These files are release evidence only when generated from non-fixture artifacts and checked by the package validators.

Repository Layout

geno_lewm/      package code
bench/          inference, rollout, training, and planning benchmarks
configs/        checked dataset, training, eval, and benchmark configs
docs/           public documentation and generated API pages
examples/       small public examples and receipt fixtures
rfcs/           design records for stable contracts
tests/          unit, integration, eval, ML smoke, API, and lint gates
tools/          release, CI, dataset, demo, and documentation tooling

Quality Gates

Run the full local gate before publishing code:

uv lock --check
uv run ruff check .
uv run ruff format --check .
uv run mypy geno_lewm tools
uv run python tools/api/snapshot.py check
uv run python -m tools.lint.check_scope_language
uv run pytest
uv run mkdocs build --strict

Package gate:

rm -rf dist
uv run python -m build --outdir dist
uv run twine check --strict dist/*
uv run python -m tools.release.check_sdist_assets dist/*.tar.gz

CI runs the same core contracts plus fixture-backed eval and ML smoke jobs. Fixture outputs are test evidence, not model results.

Limitations

Current measured benchmarks are mixed or negative relative to Carbon.
The K=20 AR rollout speed target remains open.
The released planning demo exercises the model path but does not show useful planning behavior.
No clinical utility claim; the public model evidence is not clinical evidence.
Personal-genome workflows are local-first, but local execution alone is not a privacy guarantee.
Checksum receipts prove artifact and output identity; they do not certify runtime behavior.
