π Project page & videos β
A robot replanning through a world whose costs drift faces a question classical
planners never answer: how good is my current route, given that most of the
map is stale? CERT-FLOW answers it every round, with a proof: a
high-probability certificate LB β€ OPT β€ UB on the optimal route cost, built
from age-weighted non-exchangeable conformal prediction over drift-adjusted
observation residuals, and it spends paid sensing exactly where the
certificate says the gap shrinks fastest.
| property | classical replanning (D* Lite, AD*) | exchangeable conformal (CIA) | π CERT-FLOW (wins) |
|---|---|---|---|
| stale map | silently trusts it | coverage collapses (0.95 β 0.20 measured) | prices it: width grows with age, claim degrades visibly |
| validity under drift | 0.02β0.59 measured | gap-dependent | 0.95β1.00, every condition ever run |
| sensing | none / heuristic | none | certificate-directed (oracle-level regret) |
| static regime | fast | tight | proof-gated preprocessing: nsβΒ΅s queries that self-expire |
| metric | better is | CERT-FLOW | best alternative | winner |
|---|---|---|---|---|
| certificate coverage | higher β | 1.000 (every condition) | AD* 0.02β0.59 Β· CIA β 0.20 | π CERT |
| travel-regret, unknown terrain | lower β | β0.12 (β clairvoyant oracle) | VOI 0.47 Β· freshness/blind 4β7 | π CERT |
| fully-certified round @ 60Γ60 | lower β | 3.7 ms p50 / 12 ms p95 | β no certified planner reports one | π CERT |
| road cost-change absorption | lower β | 0.015β0.34 ms | CRP β 1 s | π CERT |
βbetter isβ shows the metric direction (β higher / β lower); bold = best value; π = winner. Every per-condition table in docs/ is likewise marked and ranked bestβworst.
- Coverage β₯ claimed confidence on every condition ever run: 17 synthetic regimes, off-model worlds, and two real cities (METR-LA, PEMS-BAY) at up to 49% drift-model violation rates.
- Route quality: exactly optimal on known maps (β‘ Dijkstra, plus the certificate); travel-regret β0.12 β a clairvoyant oracle in unknown drifting terrain; 2β3Γ lower regret than freshness/uncertainty/random sensing at equal budget.
- Speed: 3.7 ms p50 / 12 ms p95 per fully-certified round at 60Γ60 (one CPU core). Certificate-gated preprocessing answers static queries in 269β394 ns (cost) / 8.7 Β΅s (path), at or below published static-SOTA, and at road scale absorbs cost changes in 0.015β0.34 ms vs ~1 s for CRP-style recustomization, exact under Β±20% perturbation.
- Theory T1βT7: coverage (observable + latent), a certifiability threshold (gap Ξ΅ is sustainable iff sensing rate beats drift, both directions), a βL sum-aware upper certificate with a measured selection-bias hazard and its gate, an impossibility theorem (no uniform lower bound can beat Bonferroni by more than log factors, so the certificate's asymmetry is optimal), decision-uniform validity, and a churn-measured floor.
- Honest negatives, kept: the corridor-memory speed hypothesis failed (documented), a predictor's regime claim was downgraded after its test, and the maze negative-control shows exactly where route-critical sensing cannot help. Every known limitation and its disposition: docs/results/limitations.md.
pip install "certflow[fast]" # "fast" = numba (needed to reproduce the speed numbers)
python - <<'PY'
from certflow import CertPlanner, PlannerConfig
from certflow.drift import grid_world
world = grid_world(6, 6, seed=0, kind="bounded", rho=0.02, noise_scale=0.05)
planner = CertPlanner(world, (0, 0), (5, 5),
PlannerConfig(epsilon=5.0, alpha_prime=0.2))
for _ in range(150):
cert, sensed = planner.round()
print(f"[{cert.lb:.2f}, {cert.ub:.2f}] @ confidence {cert.confidence:.2f}, "
f"gap {cert.gap:.2f}")
PYTo develop or reproduce the paper numbers, work from a clone:
git clone https://github.com/Archerkattri/CERT-FLOW && cd CERT-FLOW
python -m venv cert_env && source cert_env/bin/activate
pip install -e ".[dev,fast,realworld]" h5py
pytest # full suite: 227 with datasets; data-dependent tests skip cleanly without data/Every quantitative claim traces to a script; the core sweep runs in ~100 s on
a multicore machine (CERTFLOW_WORKERS=N parallelizes seeds bit-identically).
| Result | Script | Documented in |
|---|---|---|
| Tier-0 coverage (17 conditions, provable + strict modes) | scripts/run_tier0.py |
docs/results/tier0-coverage.md |
| CERT vs Gaussian (path level) | scripts/run_tier0_baselines.py |
docs/results/tier0-coverage.md |
| Edge-level audit (Gaussian break) | scripts/run_gaussian_break.py |
docs/results/gaussian-break.md |
| Incremental repair latency (T3) | scripts/run_tier1_latency.py |
docs/results/tier1-latency.md |
| Ablations (ΞΊ churn, pre-widening) | scripts/run_ablations.py |
docs/results/ablations.md |
| Travel regret, unknown terrain | scripts/run_tier2.py |
docs/results/tier2-regret.md |
| Real traffic (METR-LA / PEMS-BAY) | scripts/run_metr_la.py [--pems-bay] |
docs/results/metr-la.md |
| MovingAI maps + maze negative control | scripts/run_movingai.py |
docs/results/movingai.md |
| External algorithms (AD*, VOI, TASP-degenerate) | scripts/run_extern_baselines.py |
docs/results/extern-baselines.md |
| CIA exchangeability collapse | scripts/run_cia_comparison.py |
docs/results/cia-comparison.md |
| E-Graphs + networkx anchors | scripts/run_repeated_queries.py |
docs/results/extern-baselines.md |
| Lifelong missions (memory vs memoryless) | scripts/run_lifelong.py |
docs/results/lifelong.md |
| Feature regimes (predictor, decision-uniform) | scripts/run_feature_regimes.py |
docs/results/feature-regimes.md |
| Scale + engine benchmarks | scripts/run_scale.py |
docs/results/scale.md |
| Road networks (DIMACS NY/FLA, ALT) | scripts/run_roadnet.py |
docs/results/published-speed-comparison.md |
| Certified Contraction Hierarchies | scripts/run_ch.py |
docs/results/published-speed-comparison.md |
| Extended validation (baselines, stress, scaling) | scripts/extval/*.py |
docs/results/extended-validation.md |
| FoMo off-road seasonal drift | scripts/extval/fomo_validation.py |
docs/results/extended-validation.md (Β§6) |
| Comparison videos | scripts/viz_compare.py, scripts/viz_gen/*.py |
site/ (project page) |
All scripts accept --quick. Real-data runs need data/ (sources and loaders
in data/README.md; ~230 MB + optional FoMo cost-signal ~150 MB, links inside).
Honest side-by-side comparisons β every clip replays a real run and the
coverage/regret numbers shown are measured, not staged (warm-up rounds are
drawn as "no claim", never counted as misses). Generators:
scripts/viz_compare.py + scripts/viz_gen/; MP4 + supplementary reel in
assets/videos/.
The certificate that holds vs. the one that breaks β CERT's band contains the true optimum every round it claims; AD*'s w-suboptimality band, trusting stale point estimates, drifts out of date.
Synthetic drifting grid β CERT coverage 100% vs AD* 43% (60 rounds).
Real MovingAI map (DAO arena) β CERT 100% vs AD* 42%.
Sensing that pays β gap-directed sensing converges near a clairvoyant oracle; random / max-age / drive-blind wander at equal budget.
Unknown drifting terrain β CERT travel-regret 1.96 vs random 7.84 / max-age 4.89 / blind 5.43 (15 seeds).
Real arena map β CERT 1.71, lowest of all policies.
Exchangeability collapse under staleness β exchangeable conformal (CIA, its own construction) covers on the static slice it assumes, then collapses; CERT widens to hold coverage.
METR-LA β CIA coverage 0.88 β 0.25 β 0.38 (width frozen) vs CERT ~1.0 (width grows).
- Score every paid observation with a drift-adjusted residual; weight by age (data-independent geometric weights; exchangeability is not assumed).
- Price each edge as
Δ Β± (Ξ»q + ΟΒ·age): the conformal quantile pays for noise, the drift term pays for staleness. - Bound the optimum from both sides with two incremental searches (optimistic β, conservative u) over a flat-array engine (numba kernels).
- Claim
LB β€ OPT β€ UBat an honestly-annealed confidence: weak claims during warm-up instead of silence; the claim visibly decays as the map ages. - Sense the edge that shrinks the certified gap fastest (route-critical, churn-aware); certification is a rate, not a state (T2β²).
- When the certificate proves the map tight, that proof licenses preprocessing: an all-pairs oracle or certified Contraction Hierarchy answering in nsβΒ΅s, revoked the instant drift exceeds tolerance.
src/certflow/
types.py contracts (World, EdgeBelief, Certificate)
conformal.py weighted non-exchangeable quantiles, Ξ_stale, ACI, blocks
cert.py the planner: certify β sense β repair loop, gates, annealing
sensing.py gap-shrink selection + baseline policies
fastgraph.py flat-array CSR engine (numba D* Lite, Dijkstra kernels)
snapshot.py certificate-gated all-pairs oracle (ns queries)
ch.py certified Contraction Hierarchies (231 Β΅s on 264k-node NY)
roadnet.py DIMACS road graphs + exact ALT on landmark lower-bounds
drift.py / realworld.py / movingai.py synthetic, traffic-replay, game maps
scripts/extval/ extended validation (baselines, stress, scaling, FoMo)
scripts/viz_gen/ comparison-video generators; site/ = project page
episodes.py / harness.py / baselines.py runners, seeds, parametric strawman
docs/results/ one markdown per experiment: numbers, anomalies, verdicts
docs/specs/ design spec; docs/theory/ working notes
Paper (engrXiv preprint): CERT: Certified Route Planning under Drifting Costs, doi:10.31224/7306.
@software{attri2026certflow,
author = {Attri, Krishi},
title = {{CERT-FLOW}: Certified Route Planning under Drifting Costs},
year = {2026},
doi = {10.5281/zenodo.20631475},
url = {https://github.com/Archerkattri/CERT-FLOW}
}The DOI above is the concept DOI (always resolves to the latest archived version); CITATION.cff carries the same metadata in machine-readable form.
Paper preprint (extended version): engrXiv, doi:10.31224/7306.
Author: Krishi Attri (ORCID Β· Google Scholar)
MIT, see LICENSE.





