A curated, citation-backed, tier-annotated dataset of tumor-growth-inhibition (TGI) models, exposure-response links, and TGI-metric → survival models — the machinery oncology drug development runs on — exported into the standard pharmacometric and systems-biology formats (NONMEM, SBML, PharmML, nlmixr2/rxode2, Pumas).
⚠️ NOT a clinical decision tool. NOT a prognostic calculator. NOT a treatment recommender. Population/trial-level forward simulation only, for drug-development methodology, simulation, and education. Every export carries onkos:clinicalUse = "PROHIBITED — research / drug-development / education only".
Onkos (Greek ὄγκος, "mass, swelling") is the literal root of onco-. It is the third in a family with Nidus (gestational physiology) and Hypnos (anesthetic PK/PD), sharing one thesis: a model is only as trustworthy as its weakest, least-validated input — so make that a first-class, machine-readable field.
v0.41 · Code: MIT · Data: CC-BY-4.0 · Python ≥ 3.9
Oncology has the highest drug attrition of any therapeutic area, and the field's response is model-informed drug development: link drug exposure to tumor-size dynamics, and tumor dynamics to overall survival (OS), so early data forecast late outcomes and gate go/no-go decisions. The workhorse models (Gompertz, Simeoni, Claret, Stein/Bruno growth-rate-constant) live in per-drug, per-trial papers, carry enormous and under-communicated uncertainty (resistance terms with ~90% CV are routine), and are derived in one context then silently transported to another, where their predictive validity is unknown.
Onkos is the missing curated layer: it says, honestly, which TGI model and which parameters, for which tumor type and line, derived from which trial, validated how far beyond it, with what confidence — and how much the survival prediction changes if you'd picked a different model.
Pick a tumor type, line, and drug-effect size. Onkos overlays the simulated
tumor-size and population OS curves across every eligible TGI model, greys
out the models whose transportability envelope the context violates (with the
reason), and quantifies the divergence in the survival prediction. This makes
model-selection risk in go/no-go decisions measurable — the exact risk that,
unquantified, sends drugs into doomed phase-3 trials.
In the figure above (NSCLC, first line, E = 1.0), two NSCLC-validated models that fit early tumor data comparably imply median OS anywhere from ~54 to ~94 weeks. Every model validated only on another tumor type is greyed out automatically because applying it to NSCLC leaves its validated envelope (tier → D + warning). That spread is the model-selection risk.
$ onkos simulate --compare --tumor-type NSCLC --line first --drug-effect 1.0
[C] drug_effect.norton_simon.nsclc median OS 58.0
[C] resistance.claret_2009.tgi median OS 90.8 PFS 35.1
[C] resistance.nsclc_first_line.two_population median OS 94.5
[C] tgi_metrics.wang_2009.biexponential median OS 53.7 PFS 20.9
[-] resistance.crc_first_line.claret EXCLUDED
(tumor_type 'NSCLC' is outside validated ['CRC'] -> tier_down_to_D and warn)
[-] ... 8 more excluded for out-of-context transport (breast, HCC, melanoma, 2L)
OS divergence 0.265 | median OS range (53.7, 94.5)
PFS divergence 0.242 | median PFS range (20.9, 36.5)
The divergence view quantifies model-selection uncertainty. The other axis is
parameter uncertainty: the dataset records inter-individual variability
(iiv_cv_percent) on its high-uncertainty kill/resistance terms specifically so
they cannot pose as point estimates — and onkos.simulate_ensemble makes that
stored variability flow into the prediction. Parameters with an IIV CV are
sampled lognormally (the standard pharmacometric convention; the median is
preserved) and the tumor-size, TGI-metric, and population-OS distributions are
returned as bands.
For the Claret NSCLC model, whose resistance and kill terms carry ~90% CV, this
turns a deceptively precise "80% week-8 shrinkage" into an honest
[-100%, -30%] band and a median-OS interval of roughly 58–103 weeks — the
uncertainty was always in the data; now it is in the answer.
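The lognormal sampling step can be sketched in a few lines. This is a minimal illustration, not the package's implementation: the helper name `sample_lognormal` is hypothetical, and it assumes the stored CV is the exact lognormal coefficient of variation, so that the log-scale SD is sqrt(ln(1 + CV²)). The central value 0.3 with CV 89% echoes the kD entry quoted elsewhere in this README.

```python
import math
import random
import statistics


def sample_lognormal(median, cv_percent, n, rng):
    """Draw n lognormal values with the given median and coefficient of variation.

    Assumption: cv_percent is the exact lognormal CV, so the log-scale
    standard deviation is sqrt(ln(1 + CV^2)).
    """
    cv = cv_percent / 100.0
    sigma = math.sqrt(math.log(1.0 + cv * cv))  # log-scale SD
    mu = math.log(median)                       # exp(mu) is the lognormal median
    return [rng.lognormvariate(mu, sigma) for _ in range(n)]


rng = random.Random(0)
draws = sample_lognormal(median=0.3, cv_percent=89.0, n=20000, rng=rng)
sample_median = statistics.median(draws)  # stays close to the central value
sample_mean = statistics.fmean(draws)     # pulled above the median by the right tail
```

The median is preserved while the mean drifts upward, which is why a band built from these draws is asymmetric around the point estimate.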
The divergence view shows that the eligible models disagree; onkos.combine is
its inferential completion. It splits the total uncertainty of a composed survival
forecast into the two axes above using the law of total variance:
Var(Q) = Σ wₘ·Var(Q|m) + Σ wₘ·(E[Q|m] − Q̄)²
= WITHIN (parameter) + BETWEEN (model-selection)
model_selection_fraction = BETWEEN / (WITHIN + BETWEEN) ∈ [0, 1]
That fraction answers the question a go/no-go committee actually has: of everything I am uncertain about in this forecast, how much would shrink if I ran a bigger trial and nailed the parameters (within), versus how much is structural disagreement between equally-published models that more data on any one of them will not resolve (between)? A high between-fraction is precisely the signal that sends programs into doomed phase-3 trials — and it has never had a number.
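The decomposition above is small enough to write out directly. The sketch below is an illustration under stated assumptions: the function name and the three per-model means and variances are invented for the example, not taken from the dataset.

```python
def decompose_forecast_variance(weights, means, variances):
    """Law of total variance over a weighted set of models.

    weights   : combination weights (normalized here to sum to 1)
    means     : E[Q | m] for each model m
    variances : Var(Q | m) for each model m (parameter uncertainty)
    """
    total_weight = sum(weights)
    w = [x / total_weight for x in weights]
    grand_mean = sum(wi * mi for wi, mi in zip(w, means))
    within = sum(wi * vi for wi, vi in zip(w, variances))
    between = sum(wi * (mi - grand_mean) ** 2 for wi, mi in zip(w, means))
    total = within + between
    fraction = between / total if total > 0 else 0.0
    return {
        "point": grand_mean,
        "within": within,
        "between": between,
        "model_selection_fraction": fraction,
    }


# Illustrative numbers only: three models, equal weights.
result = decompose_forecast_variance(
    weights=[1.0, 1.0, 1.0],
    means=[60.0, 80.0, 100.0],
    variances=[100.0, 200.0, 300.0],
)
# A single eligible model has no between-model term at all.
single = decompose_forecast_variance([1.0], [80.0], [200.0])
```

With one model the fraction is zero by arithmetic, not by evidence, which is the reason a single-model context deserves a warning rather than reassurance.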
The eligible models also combine into a single model-averaged OS/PFS curve S̄(t)
(a convex combination of monotone survival curves is itself a valid survival
curve) carrying its between-model band — the average never ships without the
disagreement that qualifies it. Per context, the between-fraction ranks where
adding a better-validated TGI model has the most value (model-level curation
triage), and onkos report prints exactly that ranking.
$ onkos compare --average --weights equal --decompose
Model-averaged median_os_weeks = 71.6 [OS, scheme=equal, tier=C]
Variance: within(parameter)=229.1 between(model-selection)=207.2
>> model_selection_fraction = 0.47 (irreducible model-choice risk)
scheme point within between frac
equal 71.6 229.1 207.2 0.47
tier 71.6 229.1 207.2 0.47
evidence 70.5 217.3 209.5 0.49
weight_sensitivity (point swing across schemes) = 1.08
ma = cmp.model_average(target="median_os_weeks", endpoint="OS", weights="equal")
ma.point, ma.tier # averaged median OS; worst included tier (cannot be raised)
ma.within_var, ma.between_var # the law-of-total-variance components
ma.model_selection_fraction # BETWEEN / TOTAL — the headline number
ma.curve, ma.between_band # S̄(t) and its pointwise between-model ±1σ
ma.weights # {record_id: wₘ} (combination weights, NOT posteriors)
ma.weight_sensitivity, ma.warnings

The honesty boundary. Classical Bayesian model averaging weights models by
P(model | data), and stacking optimizes predictive weights — both require the
candidates to share one dataset. Onkos models are fit to different trials,
drugs, and tumor types, so a posterior model probability is not identifiable
and would be a fabricated quantity. Onkos therefore frames its weights as
forecast-combination weights (Bates–Granger 1969), explicitly not posterior
probabilities, and prints that distinction wherever weights appear. Three declared
schemes ship — equal (the agnostic default), tier (A:B:C = 4:2:1, a declared
not fitted factor), and evidence (∝ external C-index − 0.5) — and the headline
target is always reported under all of them, with the cross-scheme swing
(weight_sensitivity) flagged when the central estimate is weight-fragile.
Averaging cannot raise a tier, never rehabilitates an excluded out-of-context
model, and a single-eligible-model context returns fraction = 0 with a warning
(a zero is an absence of cross-checks, not a clean bill of health). The method has
direct regulatory-science precedent in dose-finding — MCP-Mod (Bretz, Pinheiro &
Branson 2005) and NLME model averaging (Buatois et al. 2018); Onkos's TGI-survival
combiner is the same idea one layer up. The combination math is proven against a
landmark suite (tests/test_combine.py) the way the kernels are: the estimator
is the law of total variance and a convex forecast combination, not a curve fit.
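The three declared schemes reduce to a normalization step. The sketch below assumes a hypothetical record shape (`id`, `tier`, `c_index`, `median_os`), invented values, an equal-weight fallback when no model has a C-index above 0.5, and a cross-scheme swing defined as max minus min of the point estimate. None of these details are confirmed by the package.

```python
TIER_FACTOR = {"A": 4.0, "B": 2.0, "C": 1.0}  # the declared A:B:C = 4:2:1 factor


def combination_weights(models, scheme="equal"):
    """Return {id: weight} summing to 1. These are combination weights, not posteriors."""
    if scheme == "equal":
        raw = [1.0 for _ in models]
    elif scheme == "tier":
        raw = [TIER_FACTOR[m["tier"]] for m in models]
    elif scheme == "evidence":
        # Proportional to external C-index minus 0.5, floored at zero.
        raw = [max(m["c_index"] - 0.5, 0.0) for m in models]
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    total = sum(raw)
    if total == 0.0:  # assumption: fall back to equal weights
        raw = [1.0 for _ in models]
        total = float(len(models))
    return {m["id"]: r / total for m, r in zip(models, raw)}


# Hypothetical records, for illustration only.
models = [
    {"id": "model_a", "tier": "A", "c_index": 0.70, "median_os": 60.0},
    {"id": "model_b", "tier": "B", "c_index": 0.60, "median_os": 80.0},
    {"id": "model_c", "tier": "C", "c_index": 0.50, "median_os": 100.0},
]
points = {}
for scheme in ("equal", "tier", "evidence"):
    w = combination_weights(models, scheme)
    points[scheme] = sum(w[m["id"]] * m["median_os"] for m in models)
swing = max(points.values()) - min(points.values())
```

Reporting the point estimate under every scheme, plus the swing, is what makes weight fragility visible instead of hidden in a default.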
The model-averaging split is one structural axis (TGI-model choice) at a fixed
survival link. But v0.25 showed the survival link is a second, co-equal axis that
can invert the answer. onkos.budget is the capstone: it puts every structural
choice on one ledger via a balanced two-way variance-component decomposition (an
ANOVA / first-order Sobol over the structural factors — the structural analog of the
parameter tornado), splitting total forecast variance into parameter noise and the
variance contributed by each structural choice:
Var(Q) = WITHIN(parameter) + V_model(TGI choice) + V_link(survival choice) + V_inter
(reducible by a bigger trial) | (irreducible structural-choice risk)
Factor A = the in-context TGI models (compare().included); factor B = every eligible
survival link (week-8 Weibull, Cox, k_g). Each cell runs the parameter-IIV ensemble;
the components sum exactly to the total (the ANOVA identity), and collapsing factor B to
one level recovers the v0.21 within/between split exactly — the budget is a strict
generalization, landmark-proven (tests/test_budget.py).
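A balanced two-way decomposition of this kind can be sketched over a table of cell means (rows = TGI models, columns = survival links) and per-cell parameter variances. The function name, the equal cell weighting, and every number below are assumptions made for illustration; they are not the package's internals or results.

```python
def structural_budget(cell_means, cell_variances):
    """Balanced two-way variance components over a models x links table."""
    n_models = len(cell_means)
    n_links = len(cell_means[0])
    n_cells = n_models * n_links
    grand = sum(sum(row) for row in cell_means) / n_cells
    row_means = [sum(row) / n_links for row in cell_means]
    col_means = [
        sum(cell_means[i][j] for i in range(n_models)) / n_models
        for j in range(n_links)
    ]
    v_model = sum((r - grand) ** 2 for r in row_means) / n_models
    v_link = sum((c - grand) ** 2 for c in col_means) / n_links
    v_inter = sum(
        (cell_means[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n_models)
        for j in range(n_links)
    ) / n_cells
    within = sum(sum(row) for row in cell_variances) / n_cells
    total = within + v_model + v_link + v_inter
    return {
        "parameter": within,
        "tgi_model": v_model,
        "survival_link": v_link,
        "interaction": v_inter,
        "total": total,
        "structural_fraction": (v_model + v_link + v_inter) / total,
    }


# Illustrative table: 3 models x 2 links, with a deliberate ranking flip in row 3.
cell_means = [[90.0, 40.0], [95.0, 30.0], [60.0, 100.0]]
cell_variances = [[200.0, 150.0], [250.0, 100.0], [120.0, 300.0]]
budget = structural_budget(cell_means, cell_variances)
# Collapsing to one link leaves only the within/between split.
collapsed = structural_budget([[90.0], [95.0], [60.0]], [[200.0], [250.0], [120.0]])
```

The three structural components sum to the variance of the cell means, and with a single link the link and interaction terms vanish, leaving the simpler two-term split.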
The capstone finding is decision-grade and uncomfortable: for the NSCLC first-line OS forecast, ~68% of the variance is irreducible structural-choice risk — only ~32% is parameter noise a bigger trial would shrink — and the single largest component is the model×link interaction (the v0.25 inversion, that the survival metric flips which TGI model wins, is an interaction term, and it dominates). The survival-link axis (~24%) outweighs the tumor-growth-model axis (~12%) everyone argues about. Across contexts the budget ranks where standardization buys the most: 5 of 6 are structure-dominated, and it flags the contexts with only one survival link, where the survival-model axis is not even cross-checked.
$ onkos budget --tumor-type NSCLC --line first --endpoint OS
axis share
model × link interaction 35% ██████████
parameter noise (within-model IIV) 31% █████████
survival-link choice (metric / structure) 23% ███████
TGI-model choice 12% ████
>> structural-choice share = 69% (irreducible by a bigger trial); parameter share = 31%
>> dominant axis: model × link interaction — standardize / validate this first
b = onkos.model_selection_budget(ds, context=ctx, endpoint="OS")
b.fractions # {parameter, tgi_model, survival_link, interaction} -> share
b.structural_fraction # share that more patients cannot remove
b.dominant # the largest-share axis — standardize/validate this first
b.tier, b.warnings # worst included tier; single-level degeneracy notes

It attributes variance across assumptions: population/trial level only, no individual prediction, no model recommendation. The structural share is the honest opposite of false precision: it is the part of the forecast that a bigger trial will not fix.
Every simulated trajectory is summarized into the derived metrics oncology pharmacometrics actually reports (spec §3, §6): depth of response, nadir and time-to-growth, the tumor growth-rate constant k_g (the strongly prognostic Stein/Bruno quantity), the shrinkage-rate constant k_s, and the RECIST-style duration of response (partial response → progression). They feed both the survival link (via the week-8 change) and the reports.
The extractor is model-agnostic — it estimates k_g / k_s from the trajectory
the way the Stein method estimates them from RECIST data, so the metrics are
comparable across the Claret, biexponential, and Simeoni kernels. It is also
self-checking: run on the biexponential kernel it recovers that kernel's
generating k_g and k_s to within ~10%, and on the Claret model the extracted
k_g recovers the model's growth constant k_L. Metrics that don't apply (no
regrowth, no RECIST response) are returned as nan, never fabricated.
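A trajectory-based extractor of this kind can be sketched as follows. It assumes the Stein-form biexponential f(t) = exp(-k_s·t) + exp(k_g·t) - 1 as the generating kernel, a late-phase window covering the last 20% of the horizon, and a finite-difference initial slope; the function names are hypothetical and the real extractor may differ in all of these choices.

```python
import math


def stein_trajectory(k_s, k_g, horizon, step):
    """Relative tumor size under the assumed biexponential form."""
    n = int(round(horizon / step))
    t = [i * step for i in range(n + 1)]
    y = [math.exp(-k_s * ti) + math.exp(k_g * ti) - 1.0 for ti in t]
    return t, y


def ls_slope(xs, ys):
    """Ordinary least-squares slope of ys on xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def extract_rates(t, y, tail_fraction=0.2):
    """Estimate (k_g, k_s) from a trajectory; nan when there is no regrowth."""
    nadir_index = min(range(len(y)), key=y.__getitem__)
    if nadir_index == len(y) - 1:
        return float("nan"), float("nan")  # never regrows: not applicable
    start = max(nadir_index + 1, int(len(t) * (1.0 - tail_fraction)))
    if len(t) - start < 2:
        return float("nan"), float("nan")
    # k_g: log-linear slope of the late regrowth phase.
    k_g = ls_slope(t[start:], [math.log(v) for v in y[start:]])
    # s0: initial relative slope; the README's identity is k_s = k_g - s0.
    s0 = (y[1] - y[0]) / (t[1] - t[0]) / y[0]
    return k_g, k_g - s0


t, y = stein_trajectory(k_s=0.1, k_g=0.03, horizon=156.0, step=0.25)
kg_est, ks_est = extract_rates(t, y)
t_flat, y_flat = stein_trajectory(k_s=0.1, k_g=0.0, horizon=156.0, step=0.25)
kg_none, ks_none = extract_rates(t_flat, y_flat)
```

On this synthetic kernel both estimates land within about 10% of the generating values, and the no-regrowth trajectory returns nan instead of a made-up rate.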
m = onkos.simulate(ds, "tgi_metrics.wang_2009.biexponential", context=ctx).metrics
m["tumor_growth_rate_kg"] # late-phase log-linear regrowth rate (≈ generating kg)
m["tumor_shrinkage_rate_ks"] # initial shrink rate via k_s = k_g − s0
m["time_to_growth_weeks"] # nadir time when genuine regrowth follows
m["duration_of_response_weeks"] # RECIST PR (−30%) → PD (+20% from nadir); nan if no PR

Propagation gives a band; onkos.sensitivity attributes that band's variance to
individual parameters. Because each IIV-bearing parameter is sampled
independently, a parameter's standardized regression coefficient equals its
correlation with the target and the squared coefficients partition the explained
variance (a first-order Sobol decomposition). This is curation triage: the
spec's highest-leverage contribution is verifying records against the source PDF
(§9), and this says which parameter's uncertainty actually moves the survival
prediction.
A genuinely useful nuance falls out: for the Claret NSCLC model the kill rate
kD (CV 89%) drives ~90% of the median-OS variance — more than the resistance
term lambda (CV 96%), because influence is variability × effect-strength, not
CV alone. So kD is where verification pays off first.
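The "variability × effect-strength" point can be reproduced on a toy model. Everything below is illustrative: the linear log-scale response, the effect sizes 1.0 and 0.3, and the noise level are invented, and only the log-scale SDs are chosen to roughly match CVs of 89% and 96%. With independently sampled inputs, each squared correlation approximates that input's first-order share of the variance.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(1)
n = 5000
# Independent log-scale deviations (SD 0.76 ~ CV 89%, SD 0.81 ~ CV 96%).
log_kd = [rng.gauss(0.0, 0.76) for _ in range(n)]
log_lambda = [rng.gauss(0.0, 0.81) for _ in range(n)]
# Hypothetical target: strong dependence on kD, weak dependence on lambda.
target = [1.0 * a + 0.3 * b + rng.gauss(0.0, 0.1) for a, b in zip(log_kd, log_lambda)]

contribution = {
    "kD": pearson(log_kd, target) ** 2,
    "lambda": pearson(log_lambda, target) ** 2,
}
r_squared = sum(contribution.values())  # first-order variance explained
dominant = max(contribution, key=contribution.get)
```

The parameter with the larger CV ends up with the smaller share because its effect on the target is weaker, which is the nuance described above.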
res = onkos.sensitivity(ds, "resistance.claret_2009.tgi", context=ctx, target="median_os_weeks")
res.r_squared # first-order variance explained
res.dominant.symbol # "kD" — verify this parameter first
[(p.symbol, round(p.contribution, 2), p.src) for p in res.indices]

Sensitivity asks which uncertainty drives the forecast; onkos.identify asks the
prior question — was the uncertainty ever resolvable by the trial that reported
it? The dataset's defining honesty move is surfacing the ~90% CV on kill/resistance
terms, with the stated reason that resistance is poorly identifiable from short
trials. This module measures that claim instead of asserting it. Given a model and
a realistic RECIST scan schedule, it builds the Fisher information of the design
and returns the Cramér–Rao lower bound on each parameter's precision — the best
relative standard error (RSE) any estimator could achieve from data of that shape.
Sᵢⱼ = ∂f(tᵢ)/∂θⱼ
M = SᵀWS, W = diag(1/σᵢ²) (the design FIM)
RSEⱼ = √((M⁻¹)ⱼⱼ) / |θⱼ|
γ_K = 1 / √(λ_min of the column-normalized SᵀWS)
practically_identifiable = (maxⱼ RSEⱼ < 50%) AND (γ_K < 15)
The payload is the stored IIV CV next to the predicted RSE. For the Claret NSCLC
model under a realistic cadence, the kill rate kD is well identified (RSE ≈ 9%, its
early-shrinkage signal is strong), but the growth rate kL and the resistance decay
lambda are flat (RSE ≈ 229% and 53%) and confounded (collinearity index γ_K ≈ 22).
So lambda's 96% CV is, at least in part, a flat-likelihood artifact of the
originating design — not a clean estimate of biological spread — and Onkos says so
with a cv_is_identifiability_artifact flag. Identifiability is a property of the
design: lengthen the follow-up past resistance-driven regrowth (right panel) and
lambda finally drops below the ceiling, while kL (growth, masked by treatment)
stays the hardest to pin down. Across the dataset the verdict bifurcates cleanly —
the parsimonious 2-parameter biexponential (Stein/Bruno) models are identifiable;
every 3-parameter resistance-augmented model is not — and onkos report ranks the
models whose estimates a realistic trial cannot support (design-level curation
triage).
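The design-FIM calculation can be worked by hand for a two-parameter model, where the 2×2 algebra is closed form. The sketch below uses a plain exponential decay f(t) = A·exp(-k·t) with constant residual SD as a stand-in; it is not one of the dataset's TGI kernels, and the function names and parameter values are assumptions. The 50% and 15 thresholds are the ones stated above.

```python
import math


def design_fim(schedule, amplitude, rate, sigma):
    """Entries (m11, m12, m22) of M = S^T W S for f(t) = A * exp(-k t)."""
    m11 = m12 = m22 = 0.0
    w = 1.0 / sigma ** 2
    for t in schedule:
        decay = math.exp(-rate * t)
        s_amp = decay                    # df/dA
        s_rate = -amplitude * t * decay  # df/dk
        m11 += w * s_amp * s_amp
        m12 += w * s_amp * s_rate
        m22 += w * s_rate * s_rate
    return m11, m12, m22


def cramer_rao_rse(schedule, amplitude=1.0, rate=0.1, sigma=0.05):
    """Best achievable relative standard errors and the collinearity index."""
    m11, m12, m22 = design_fim(schedule, amplitude, rate, sigma)
    det = m11 * m22 - m12 * m12
    if det <= 1e-12 * m11 * m22:  # singular design: report inf, not a guess
        inf = float("inf")
        return {"amplitude": inf, "rate": inf, "collinearity": inf,
                "practically_identifiable": False}
    rse_amp = math.sqrt(m22 / det) / abs(amplitude)  # sqrt((M^-1)_11) / |A|
    rse_rate = math.sqrt(m11 / det) / abs(rate)      # sqrt((M^-1)_22) / |k|
    # Column-normalized 2x2 matrix has eigenvalues 1 +/- |c|.
    c = abs(m12) / math.sqrt(m11 * m22)
    collinearity = 1.0 / math.sqrt(1.0 - c)
    ok = max(rse_amp, rse_rate) < 0.5 and collinearity < 15.0
    return {"amplitude": rse_amp, "rate": rse_rate, "collinearity": collinearity,
            "practically_identifiable": ok}


schedule = [0, 6, 12, 18, 24, 36, 48]  # weeks
base = cramer_rao_rse(schedule)
doubled = cramer_rao_rse(schedule * 2)  # every scan replicated once
singular = cramer_rao_rse([12])         # one scan cannot pin two parameters
```

Replicating every scan doubles the information and shrinks each RSE by √2, while the one-scan design returns inf, mirroring the additivity and singular-design checks listed in the test suite description.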
res = onkos.identifiability(ds, "resistance.claret_2009.tgi", context=ctx,
schedule=[0, 6, 12, 18, 24, 36, 48]) # weeks
res.practically_identifiable # False — under this design
res.collinearity_index # γ_K ≈ 22 (a confounded parameter combination)
res.worst.symbol # "kL" — needs a richer design / external constraint first
[(p.symbol, round(p.rse_percent), p.iiv_cv_percent) for p in res.params]
res.tier # unchanged: identifiability cannot move a tier

A diagnostic about a model under a design, never about a patient and never a
tier-mover: a singular design reports inf (honest) rather than a fabricated bound,
and "unidentifiable" always means under this schedule and residual-error model. It
is the individual (fixed-effects) design FIM — the standard local practical-
identifiability tool of pharmacometric optimal design (PFIM/PopED; the structural-vs-
practical distinction of Raue et al. 2009; the collinearity index of Brun et al.
2001). The information algebra is proven against a landmark suite
(tests/test_identifiability.py): the analyzer is the Cramér–Rao bound and the
Brun collinearity index — exponential closed form, information additivity, monotonic
precision, residual-error scaling, and singular-design honesty — not a precision
guess.
$ onkos identify resistance.claret_2009.tgi
parameter central pred. RSE IIV CV identifiable?
kL 0.021 229% 38% NO
lambda 0.061 53% 96% NO
kD 0.3 9% 89% yes
collinearity index γ_K = 22.3 (ceiling 15) -> NOT identifiable under this design
! cv_is_identifiability_artifact: 'lambda' carries IIV CV 96% and is practically
unidentifiable (RSE 53%) — its variability is partly a flat-likelihood artifact
identify evaluates a given schedule. But a pharmacometrician chooses the sampling
times, so the real question is whether the best schedule under a fixed budget is enough.
onkos.design (v0.31) adds that choice: the D-optimal schedule of N measurements — the
times that maximize det(M), i.e. minimize the joint confidence-ellipsoid volume — scored
against a uniform schedule of the same budget. The design Fisher information is additive over
timepoints (M = Σᵢ sᵢsᵢᵀ), so the sensitivities are computed once on a dense grid and the
optimization is pure linear algebra; the reported optimal is the better of greedy/uniform, so
D-efficiency ≥ 1 by construction.
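A greedy D-optimal search of this shape can be sketched for the same stand-in model used in identifiability illustrations, f(t) = A·exp(-k·t). The weekly candidate grid, the baseline anchor at t = 0, selection without replacement, and D-efficiency defined as (det ratio)^(1/p) with p = 2 parameters are all assumptions of this sketch, not confirmed details of onkos.design. A constant residual SD is omitted because it scales every determinant equally.

```python
import math


def sensitivity(t, amplitude=1.0, rate=0.1):
    """Sensitivity vector (df/dA, df/dk) of f(t) = A * exp(-k t)."""
    decay = math.exp(-rate * t)
    return decay, -amplitude * t * decay


def information_det(schedule):
    """det(M) with M the sum of outer products s s^T over the schedule."""
    m11 = m12 = m22 = 0.0
    for t in schedule:
        a, b = sensitivity(t)
        m11 += a * a
        m12 += a * b
        m22 += b * b
    return m11 * m22 - m12 * m12


def greedy_d_optimal(grid, n_samples):
    """Anchor at the first grid point, then add the time that most increases det(M)."""
    chosen = [grid[0]]
    remaining = list(grid[1:])
    while len(chosen) < n_samples:
        best = max(remaining, key=lambda t: information_det(chosen + [t]))
        chosen.append(best)
        remaining.remove(best)
    return sorted(chosen)


n_samples, horizon = 7, 48.0
grid = [float(week) for week in range(int(horizon) + 1)]
uniform = [horizon * i / (n_samples - 1) for i in range(n_samples)]
greedy = greedy_d_optimal(grid, n_samples)
# Report the better of the two, so the efficiency cannot fall below 1.
best = max((greedy, uniform), key=information_det)
d_efficiency = (information_det(best) / information_det(uniform)) ** (1.0 / 2.0)
```

Because the information is additive over timepoints, each greedy step is only a 2×2 determinant per candidate, which is why the search is cheap once the sensitivities are known.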
The payload separates two kinds of flatness within one model. For Claret NSCLC (N=7 over 48 wk) the D-optimal design clusters samples at the kill phase (≈8–13 wk) and the regrowth onset (≈30 wk + tail), improving every parameter (D-efficiency ≈ 1.14):

| Parameter | uniform RSE | D-optimal RSE | verdict |
|---|---|---|---|
| kD (kill rate) | 9% | 9% | already pinned |
| λ (resistance) | 54% | 48% | rescued: a better trial crosses the 50% line |
| kL (growth rate) | 228% | 199% | structurally flat: best design still fails |

So the borderline resistance term is rescued by a better design (its flatness was
circumstantial), while the deeply flat growth rate stays unidentifiable under the optimal
schedule (its flatness is structural — no trial of this budget pins it down, so its huge CV
is a flat-likelihood artifact, not biological spread). The 2-parameter Wang biexponential is
the control: both parameters identifiable under the optimal design (D-efficiency ≈ 1.31),
proving the kL failure is the model's structure, not the optimizer. Optimal design is what
separates "badly designed trial" from "structurally unidentifiable parameter."
od = onkos.optimal_schedule(ds, "resistance.claret_2009.tgi", context=ctx, n_samples=7, horizon=48.0)
od.optimal.schedule # D-optimal sampling times (wk), baseline-anchored
od.d_efficiency # ~1.14 — how much more informative than uniform
od.rescues_any # True — the better design rescued the borderline λ
od.structurally_flat # ["kL"] — deeply flat even under the best schedule
# onkos design <id> -> the uniform-vs-D-optimal RSE table + the structural-flat verdict

Landmark-tested (tests/test_design.py): the closed-form D-optimal subset on a hand-built
matrix, information additivity, the D-efficiency ≥ 1 guarantee, that kL stays flat under
the best design, and that the biexponential is fully identifiable. Pure post-processing over
the v0.22 Fisher core — no new kernel, record, or export. Design/population level only; cannot
move a tier; no per-patient schedule, no dosing or therapy choice.
Parameter identifiability (above) asks whether a trial can pin a number within a model. The
model-level twin asks whether a trial can choose between models: given two models' population
OS curves, what trial would it take to distinguish them? onkos.discriminability answers it with
the logrank power calculation — Schoenfeld's required events
d = 4(z_{1-α/2}+z_{1-β})²/(ln HR)², where HR is the follow-up-horizon hazard ratio between the
curves. This is the rigorous close of the whole model-selection arc: when the answer is tens of
thousands of events, the model choice is practically unidentifiable from the trial — it can
only be assumed, not resolved by the data.
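Schoenfeld's formula is short enough to check by hand. The sketch below is a standalone illustration (the name `schoenfeld_events` is hypothetical and is not the package's `required_events`); it assumes a two-sided α and 1:1 allocation, which is what the factor 4 in the formula implies.

```python
import math
from statistics import NormalDist


def schoenfeld_events(hazard_ratio, alpha=0.05, power=0.8):
    """Events needed for a logrank test to detect `hazard_ratio`.

    Assumes two-sided alpha and equal allocation between the two arms.
    """
    if hazard_ratio <= 0.0:
        raise ValueError("hazard ratio must be positive")
    if hazard_ratio == 1.0:
        return float("inf")  # identical curves can never be distinguished
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(power)
    return 4.0 * (z_alpha + z_beta) ** 2 / math.log(hazard_ratio) ** 2


events_half = schoenfeld_events(0.5)       # the HR 0.5 benchmark, about 65 events
events_double = schoenfeld_events(2.0)     # HR and 1/HR need the same events
events_near_one = schoenfeld_events(0.98)  # nearly identical curves
```

The required events grow as 1/(ln HR)², so two curves whose hazard ratio sits within a few percent of 1 need tens of thousands of events, which is the sense in which a model choice becomes practically unidentifiable.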
The payload reframes the project's load-bearing idea. Under the week-8 OS surrogate (NSCLC, power 0.8, α 0.05), the model pairs that diverge only in the regrowth tail — the resistance mechanism (Claret vs two-population, v0.24) and origin (acquired vs pre-existing, v0.32) — need 10⁴–10⁵ events to distinguish (≈11,800, ≈27,000, ≈103,000); the pairs that differ in early shrinkage (vs the complete/minimal responder) need ~60–90. So:
- The silent model-selection risks are silent because they are unidentifiable. The v0.24/v0.32 observation ("the week-8 surrogate is nearly blind to the resistance-model choice") is now a number: distinguishing them would take an impossible trial. The choice can only be assumed — with its tier and transportability attached, which is exactly the uncertainty Onkos makes first-class.
- The risk lives in the trial's blind spot. The consequences a surrogate-driven trial can see (early shrinkage; the survival-metric swing, week-8 vs k_g, needs <500 events) are identifiable; the tail-mechanism choice the surrogate is blind to is not.
from onkos.discriminability import required_events, model_discriminability
required_events(0.75) # events to distinguish a HR=0.75 difference
md = onkos.model_discriminability(ds, context=ctx)
md.n_indistinguishable # model pairs a realistic trial cannot resolve

Landmark-tested (tests/test_discriminability.py): the Schoenfeld benchmark (HR 0.5 → ~65
events), HR↔1/HR symmetry, the horizon-HR proportional-hazards recovery, and the
resistance-models-indistinguishable result. The model-level member of the identifiability family
(identify v0.22, design v0.31). Pure post-processing over the OS curves; design/trial level
only; cannot move a tier; no real trial designed, no recommendation.
The spec (§2, §6) calls for both overall survival (OS) and progression-free
survival (PFS). Each tumor context carries an OS link and a PFS link (parametric
Weibull-PH on the week-8 TGI metric), so a simulation returns a curve per
endpoint and PFS is shorter than OS by construction. Every analysis — the
divergence view, uncertainty bands, and sensitivity (target="median_pfs_weeks")
— works on either endpoint.
tr = onkos.simulate(ds, "resistance.claret_2009.tgi", context=ctx, drug_effect=1.0)
tr.survival # {"OS": curve, "PFS": curve}
tr.median_os, tr.median_pfs # e.g. 90.8, 35.1 weeks (PFS < OS)

The spec (§2, §6) asks for Cox as well as parametric links. The Cox
proportional-hazards form uses a nonparametric tabulated baseline survival
S0(t) (from data, not a closed-form distribution): S(t | x) = S0(t)^exp(β·x).
It is marked non-default, so it never auto-collides with the Weibull link on
the same endpoint — you opt in with survival_link= to ask a different question:
how much does the choice of survival model itself move the answer?
For NSCLC OS, the same week-8 TGI metric fed through the parametric Weibull link versus the Cox link shifts median OS from ~91 to ~107 weeks — a third uncertainty axis (survival-model structure) alongside model-selection divergence and parameter variability.
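The Cox form S(t | x) = S0(t)^exp(β·x) with a tabulated baseline can be sketched directly. The baseline table, the coefficient β = 2.0, and the covariate values below are invented for illustration; the covariate is taken to be the week-8 relative change (negative means shrinkage), which is an assumption about the link's sign convention.

```python
import math

# Illustrative tabulated baseline survival S0(t): (weeks, probability).
BASELINE = [(0.0, 1.0), (26.0, 0.85), (52.0, 0.65), (78.0, 0.48),
            (104.0, 0.35), (156.0, 0.18)]


def cox_survival(baseline, beta, x):
    """Apply S(t | x) = S0(t) ** exp(beta * x) to every tabulated point."""
    hazard_ratio = math.exp(beta * x)
    return [(t, s0 ** hazard_ratio) for t, s0 in baseline]


def median_time(curve):
    """Linear interpolation of the time where survival crosses 0.5; nan if never."""
    for (t0, s0), (t1, s1) in zip(curve, curve[1:]):
        if s1 <= 0.5 <= s0 and s0 != s1:
            return t0 + (s0 - 0.5) / (s0 - s1) * (t1 - t0)
    return float("nan")


reference = median_time(cox_survival(BASELINE, beta=2.0, x=0.0))  # baseline itself
shrinker = median_time(cox_survival(BASELINE, beta=2.0, x=-0.3))  # 30% shrinkage
grower = median_time(cox_survival(BASELINE, beta=2.0, x=0.2))     # 20% growth
```

Because the baseline is a table, not a distribution, the curve exists only at the tabulated times, and a median beyond the last row is reported as nan instead of being extrapolated.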
cox = onkos.simulate(ds, "resistance.claret_2009.tgi", context=ctx,
survival_link="survival_link.nsclc_os_cox")

The survival link reads one on-treatment number to drive its hazard, and that
number has been a silent constant: the week-8 change, an early-shrinkage
surrogate blind to everything after week 8. v0.25 makes the bridge metric a declared,
swappable field (structure.link_metric, defaulting to the existing week-8 behavior)
and adds the tail-sensitive growth-rate-constant (k_g) link — the Stein/Bruno
quantity that out-discriminates early shrinkage for OS. This completes the v0.24
finding (a week-8 surrogate barely sees the resistance-model tail divergence) and
turns which metric predicts survival into an explicit model-selection axis.
The result is the sharpest in the project — the metric choice inverts the answer:
- The resistance-model ranking flips. Under week-8 the mechanistic two-population model looks better than the phenomenological Claret model (deeper early shrinkage: median OS 94 vs 91); under k_g it looks worse (faster regrowth: 32 vs 39). Which survival metric you assume decides which resistance model wins.
- The complete responder is undervalued by the surrogate. Norton-Simon eradicates the tumor (slow early shrinkage, no regrowth). Week-8 scores it mediocre (58, below both resistance models); k_g, seeing no regrowth (mapped to the baseline hazard), correctly makes it the longest survivor (102). An early-shrinkage gate systematically penalizes a slow-but-complete responder.
default = onkos.simulate(ds, "resistance.nsclc_first_line.two_population", context=ctx)
kg = onkos.simulate(ds, "resistance.nsclc_first_line.two_population", context=ctx,
survival_link="survival_link.nsclc_os_growth_rate")
default.median_os, kg.median_os # the bridge metric re-ranks the resistance models

The change is near-zero code (the metric becomes a dataset field, default unchanged so
no existing curve moves) plus one record, and is landmark-tested
(tests/test_survival_metric.py): backward compatibility, the two inversions, and that
an undefined k_g (no regrowth) maps to the baseline hazard — a finite best-case
survival, never a nan curve. Onkos ships both metrics and shows they disagree; it does
not declare a winner — making the field's surrogate-endpoint debate computable rather
than rhetorical.
The week-8 change reads one early point (depth-only, blind to the tail); k_g reads
one terminal slope (tail-only, blind to depth). Both throw away most of the trajectory.
v0.33 adds the natural third option — the integrated tumor burden log_burden_auc, the
time-averaged log relative tumor size over the horizon (the AUC of the log-size curve). It is
the one summary that integrates both: depth lowers it, a regrowth tail raises it. Eradication
is floored at the detection limit so the integral stays finite. Like k_g it is a non-default
link (default=false), so the default view and every export are byte-identical.
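The integrated-burden metric can be sketched as a trapezoid rule over the log of the relative tumor size. The function name, the floor value of 1e-3, and the toy trajectories are assumptions for illustration; the input here is already relative to baseline (v/y0).

```python
import math


def log_burden_auc_sketch(t, rel_size, floor=1e-3):
    """Time-averaged log relative tumor size over the horizon.

    rel_size is v(t) / y0; values below `floor` are clamped so that an
    eradicated tumor keeps the integral finite (the floor is an assumed value).
    """
    logs = [math.log(max(r, floor)) for r in rel_size]
    area = sum(
        0.5 * (logs[i] + logs[i + 1]) * (t[i + 1] - t[i])
        for i in range(len(t) - 1)
    )
    return area / (t[-1] - t[0])


t = [0.0, 8.0, 16.0, 24.0, 48.0]
at_baseline = log_burden_auc_sketch(t, [1.0] * 5)        # never changes size
constant_half = log_burden_auc_sketch(t, [0.5] * 5)      # constant at c = 0.5
eradicated = log_burden_auc_sketch(t, [0.0] * 5)         # floored, stays finite
deep_then_regrow = log_burden_auc_sketch(t, [1.0, 0.3, 0.4, 0.8, 2.0])
deep_and_durable = log_burden_auc_sketch(t, [1.0, 0.3, 0.3, 0.3, 0.3])
```

Two trajectories with the same week-8 depth get different scores once one of them regrows, which is the tail-sensitivity the week-8 change lacks.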
The point is that the "comprehensive" metric does not dissolve the choice — it produces a third, distinct ranking of the NSCLC models, and exposes a pathology of the pure-tail metric:
- Three different rankings. week-8: two-pop > acquired > Claret > Norton-Simon > Wang; k_g: Norton-Simon > Wang > Claret > two-pop > acquired; burden: Norton-Simon > Claret > two-pop > acquired > Wang. Which bridge metric you pick remains a live model-selection axis even when you reach for the one that uses the whole curve.
- Burden repairs k_g's depth-blindness. k_g ranks the minimal responder Wang (nadir ~75% of baseline, so it barely shrinks) 2nd, because its regrowth slope happens to be slow; the pure-tail metric cannot see that the tumor never got small. The integrated burden weighs depth and ranks Wang last, where a clinician would. Tail-sensitivity without depth-sensitivity is its own failure mode, and an integrated metric is the honest fix.
b = onkos.simulate(ds, "tgi_metrics.wang_2009.biexponential", context=ctx,
survival_link="survival_link.nsclc_os_burden_auc")
b.metrics["log_burden_auc"] # the covariate, now in every trajectory's metric panel
b.median_os # OS under the integrated-burden link; Wang ranks last here

The change is pure post-processing (one metric in the Stein/Bruno panel) plus one non-default
record; it is landmark-tested (tests/test_burden_auc.py): the closed-form identities
(baseline⇒0, constant⇒log c, floored eradication), tail-sensitivity where week-8 is blind,
depth-sensitivity where k_g is blind, and the third distinct ranking. NSCLC/first now exposes
four eligible OS links (week-8, Cox, k_g, burden-AUC) to the model-selection budget.
The three bridge metrics above are all two-stage: collapse the trajectory to one static
number, then apply a hazard. A static covariate means a proportional hazard — the hazard ratio
between two tumors is constant in time. That is itself a modeling assumption, and the rigorous
alternative the pharmacometric literature treats as gold-standard is the joint longitudinal–
survival model, whose current-value link makes the instantaneous hazard track the current tumor
size: λ(t) = λ₀(t)·exp(α·log(v(t)/y0)). onkos.joint adds it as pure post-processing (no record,
kernel, or export change — every default artifact is byte-identical). It is a strict generalization:
a constant hazard ratio recovers the two-stage Weibull-PH curve exactly (and the v0.33 burden link
is its constant-trajectory special case).
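The current-value link can be sketched by integrating the hazard numerically. The Weibull baseline (shape 1.3, scale 90 weeks), the midpoint rule, α = 1, and the Stein-form regrowing trajectory are all assumptions chosen for illustration; the function names are hypothetical and do not describe onkos.joint's internals.

```python
import math


def weibull_hazard(t, shape, scale):
    """Baseline hazard of a Weibull distribution."""
    return (shape / scale) * (t / scale) ** (shape - 1.0)


def joint_survival_curve(rel_size, alpha, shape, scale, horizon, step=0.05):
    """S(t) = exp(-integral of lambda0(u) * (v(u)/y0)**alpha du), midpoint rule.

    rel_size is a callable returning v(t) / y0.
    """
    cumulative = 0.0
    times, survival = [0.0], [1.0]
    for i in range(int(round(horizon / step))):
        mid = (i + 0.5) * step
        cumulative += weibull_hazard(mid, shape, scale) * rel_size(mid) ** alpha * step
        times.append((i + 1) * step)
        survival.append(math.exp(-cumulative))
    return times, survival


def ph_violation_sketch(rel_size, alpha, horizon, early_week=8.0):
    """HR(end) / HR(early_week); equals 1 when the hazard ratio is constant."""
    return (rel_size(horizon) / rel_size(early_week)) ** alpha


def regrowing(t):
    """Assumed Stein-form trajectory: deep early response, then regrowth."""
    return math.exp(-0.1 * t) + math.exp(0.03 * t) - 1.0


def constant_half(t):
    """A tumor held at half its baseline size."""
    return 0.5


_, s_const = joint_survival_curve(constant_half, 1.0, 1.3, 90.0, 100.0)
_, s_no_link = joint_survival_curve(regrowing, 0.0, 1.3, 90.0, 100.0)
two_stage = math.exp(-0.5 * (100.0 / 90.0) ** 1.3)  # Weibull-PH with HR = 0.5
baseline = math.exp(-((100.0 / 90.0) ** 1.3))       # HR = 1
violation = ph_violation_sketch(regrowing, 1.0, 100.0)
```

With a constant trajectory the joint curve collapses onto the proportional-hazards curve, and α = 0 removes the association entirely, which are the two recoveries the generalization claim rests on.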
The payload is a hazard the two-stage links structurally cannot encode:
- The hazard ratio is time-varying. It is suppressed during the deep early response (HR ≈ 0.13–0.18) then rises 10× to 255× as a resistant clone regrows (largest for the acquired-resistance and two-population models) while a complete responder's hazard keeps falling. A two-stage link's hazard ratio is flat (ph_violation == 1); the joint link's ph_violation = HR(end)/HR(8wk) measures the non-proportionality, and it is concentrated in exactly the resistance models whose tail the week-8 surrogate is blind to.
- The ranking inverts, structurally. Week-8 ranks the deep-early-shrinking two-population model above Claret (median OS 94 vs 91); the joint link, weighting the regrowth tail as a rising hazard, reverses it (Claret 199 vs two-population 144). The survival-link structure choice (two-stage PH vs joint current-value) is the structural counterpart to the link_metric choice.
from onkos.joint import joint_survival, compare_joint_vs_two_stage
j = joint_survival(ds, "resistance.nsclc_first_line.two_population", context=ctx, alpha=1.0)
j.median_os, j.two_stage_median_os # joint vs week-8 two-stage median
j.ph_violation # HR(end)/HR(8wk): 1 for PH, ≫1 as resistance regrows
compare_joint_vs_two_stage(ds, context=ctx, alpha=1.0).rank_discordant_pairs

Landmark-tested (tests/test_joint.py): the constant-HR⇒two-stage and unit-HR⇒baseline recoveries
to machine precision, α=0 removing the association, the non-proportional-hazard signature, the
ranking inversion, and the inherited tier/transport guardrails. α and the baseline are declared,
illustrative parameters — never fitted; a joint analysis never moves a tier and emits no individual
prediction.
The bridge-metric work asked which quantity predicts survival. The ctDNA era forces the
orthogonal question — when do you read it? The field pushes the readout ever earlier
(circulating-tumor-DNA "molecular response" at week 2–4, before a reliable RECIST size change
at week 8). onkos.early_surrogate models ctDNA molecular response as proportional to tumor
burden (the standard first-order shedding assumption), so the modeled distinction from a
size readout is purely the landmark time — which isolates the timing question. (Genomic
ctDNA content is out of scope, §2; the reduction is the honest scope, not a hidden limitation.)
landmark_response(t, v, week) generalizes the week-8 covariate to any landmark (recovering
week8_relative_change exactly at week 8), and surrogate_timing_fidelity ranks a context's
models by their early-surrogate response at each landmark and counts how many model pairs that
ranking orders oppositely to a tail-aware durable-benefit reference (median OS under the
k_g link). The finding is a clean monotone trade-off — earliness trades against fidelity:
- For NSCLC the discordance against durable benefit falls monotonically with the landmark — 9/10 pairs wrong at week 2 (the ctDNA-era readout, almost fully inverted), 8/10 at week 8, down to 3/10 at week 52. An earlier landmark sits at or before the nadir, before the resistant regrowth, so it cannot see the tail that decides durable benefit.
- The bias has a direction: early landmarks rank the mechanistic-resistance models (acquired, two-population — deep, fast-regrowing, doomed) on top, the exact models the durable-benefit reference ranks last. Shifting the readout earlier maximizes the depth-vs-durability surrogate failure, localized in time. It reproduces across breast, CRC, HCC, and melanoma.
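The earliness-versus-fidelity trade-off in the list above can be reproduced on two toy trajectories. The helper names, the Stein-form trajectories, and the reference median OS values are hypothetical stand-ins, not dataset records; ties between models are not handled in this sketch.

```python
import math


def relative_change_at(t, v, week):
    """Relative change from baseline at `week`, by linear interpolation."""
    if week <= t[0]:
        return 0.0
    for i in range(len(t) - 1):
        if t[i] <= week <= t[i + 1]:
            frac = (week - t[i]) / (t[i + 1] - t[i])
            size = v[i] + frac * (v[i + 1] - v[i])
            return size / v[0] - 1.0
    raise ValueError("week is beyond the simulated horizon")


def discordant_pairs(early_response, durable_benefit):
    """Count model pairs that the early readout orders opposite to the reference.

    early_response : {id: relative change}, lower (more shrinkage) is better
    durable_benefit: {id: reference median OS}, higher is better
    """
    ids = sorted(early_response)
    discordant = total = 0
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            total += 1
            early_prefers_a = early_response[a] < early_response[b]
            durable_prefers_a = durable_benefit[a] > durable_benefit[b]
            if early_prefers_a != durable_prefers_a:
                discordant += 1
    return discordant, total


weeks = [float(w) for w in range(53)]
trajectories = {
    # Deep, fast shrinkage followed by fast regrowth.
    "deep_regrower": [math.exp(-0.3 * w) + math.exp(0.05 * w) - 1.0 for w in weeks],
    # Shallow, slow shrinkage with almost no regrowth.
    "slow_durable": [math.exp(-0.05 * w) + math.exp(0.005 * w) - 1.0 for w in weeks],
}
reference_os = {"deep_regrower": 30.0, "slow_durable": 100.0}  # invented values

discordance = {}
for landmark in (2.0, 52.0):
    early = {k: relative_change_at(weeks, v, landmark) for k, v in trajectories.items()}
    discordance[landmark] = discordant_pairs(early, reference_os)
```

At week 2 the deep regrower looks best and the pair is ordered wrongly; by week 52 the regrowth is visible and the ordering agrees with the reference.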
from onkos.early_surrogate import landmark_response, surrogate_timing_fidelity
landmark_response(tr.t, tr.tumor_size, 4.0) # the ctDNA-era (week-4) molecular-response readout
st = onkos.surrogate_timing_fidelity(ds, context=ctx)
st.discordance_at(2.0), st.discordance_at(52.0) # earliest vs latest: the trade-off

Landmark-tested (tests/test_early_surrogate.py): the week-8 recovery, the monotone
earliness-fidelity trade-off, the over-rewards-deep-but-doomed direction, and cross-context
reproduction. Pure post-processing over the existing trajectories and the k_g link; population
level, no individual molecular-response prediction and no go/no-go recommendation.
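
The landmark readout and the pairwise discordance count described above can be sketched in a few lines. This is a minimal standalone illustration, not the onkos.early_surrogate implementation: the helper names (landmark_relative_change, discordant_pairs, discordance_at), the two toy trajectories, and the late-burden stand-in for durable benefit are all assumptions made for the example.

```python
from itertools import combinations

import numpy as np


def landmark_relative_change(t, v, week):
    """Relative tumor-size change from baseline at a landmark week (linear interpolation)."""
    t = np.asarray(t, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(np.interp(week, t, v) / v[0] - 1.0)


def discordant_pairs(surrogate, reference):
    """Count model pairs that the surrogate and the reference order oppositely.

    Both arguments map model name -> score, where a higher score means "better".
    """
    count = 0
    for a, b in combinations(sorted(surrogate), 2):
        if (surrogate[a] - surrogate[b]) * (reference[a] - reference[b]) < 0:
            count += 1
    return count


# Toy trajectories (hypothetical numbers, not curated parameters).
t = np.linspace(0.0, 104.0, 209)
curves = {
    "deep_then_regrow": 100.0 * (0.95 * np.exp(-0.30 * t) + 0.05 * np.exp(0.05 * t)),
    "shallow_durable": 100.0 * np.exp(-0.02 * t),
}
# Stand-in for durable benefit (an assumption): a smaller late burden scores higher.
reference = {name: -v[-1] for name, v in curves.items()}


def discordance_at(week):
    """Discordant pairs when the surrogate is read at `week` (more shrinkage = better)."""
    surrogate = {name: -landmark_relative_change(t, v, week) for name, v in curves.items()}
    return discordant_pairs(surrogate, reference)


early, late = discordance_at(4.0), discordance_at(104.0)
```

In this toy setup the week-4 readout prefers the deep-then-regrow curve while the late-burden reference prefers the durable one, so the single pair is discordant early and concordant late.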
The objective response rate (ORR) is the dominant phase-2 go/no-go endpoint, and a
famously contested OS surrogate: drugs with a high response rate routinely fail to extend
survival. onkos.response adds the response endpoint — RECIST 1.1 best overall response
(CR > PR > PD > SD) classified from the tumor-size trajectory, lifted to population
rates over the IIV ensemble (ORR = P(CR or PR), DCR = P(CR, PR, or SD)) — and, because a
model's ORR and OS are read off the same simulated trial, makes the ORR → OS
relationship a measured quantity.
The result is the sharpest surrogate statement in the project: whether ORR faithfully
ranks survival is conditional on the survival mechanism. ORR and the week-8 survival
surrogate are both shrinkage-based, so under the week-8 OS link ORR ranks OS perfectly
(0 / 6 discordant model pairs). But under the tail-sensitive k_g link (v0.25), a deep
early responder that regrows fast has a high ORR and a short OS — so ORR inverts
(4 / 6 discordant): the highest responder (ORR ≈ 1.0) has the shortest OS, while the
eradicating drug (lower ORR, no regrowth) has the longest. That is the computational core
of every ORR-surrogate failure — high response, no survival benefit — and Onkos makes the
unobservable assumption behind it (that survival tracks early shrinkage) explicit.
rr = onkos.objective_response_rate(ds, "resistance.claret_2009.tgi", context=ctx)
rr.orr, rr.dcr, rr.distribution # population RECIST rates (CR/PR/SD/PD sums to 1)
rr.median_os_weeks # OS off the SAME trial — for the surrogate question
rs = onkos.response_vs_survival(ds, context=ctx) # week-8 link
rs.discordant_fraction, rs.orr_predicts_os # 0.0, True — ORR ranks OS faithfully
rs_kg = onkos.response_vs_survival(ds, context=ctx,
survival_link="survival_link.nsclc_os_growth_rate")
rs_kg.discordant_fraction # 0.67: ORR mis-ranks tail-driven survival

Population/trial level only: no individual response probability, no therapy ranking. The
rates carry the chain's propagated tier (out-of-context → D), and the discordance is a
statement about models, not a treatment choice; Onkos reports it under each survival
assumption rather than certifying ORR valid or invalid. Landmark-tested
(tests/test_response.py): the RECIST classification boundaries, the rate simplex, ORR
monotone in drug effect, and the conditional-surrogacy result (concordant under week-8,
discordant under k_g).
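
To make the classification step concrete, here is a minimal sketch of a best-overall-response rule applied to simulated size trajectories and lifted to population rates. It is an illustration under stated assumptions, not the onkos.response code: the function names and the cr_threshold_mm detection limit are hypothetical, the 30% decrease, 20% increase, and 5 mm absolute-increase thresholds are the RECIST 1.1 criteria, and the precedence follows the CR > PR > PD > SD order given above.

```python
import numpy as np


def best_overall_response(v, cr_threshold_mm=1.0):
    """Classify one sum-of-longest-diameters (SLD) trajectory with precedence CR > PR > PD > SD."""
    v = np.asarray(v, dtype=float)
    baseline = v[0]
    if v.min() <= cr_threshold_mm:        # hypothetical detection limit standing in for "disappearance"
        return "CR"
    if v.min() <= 0.70 * baseline:        # at least a 30% decrease from baseline
        return "PR"
    nadir = np.minimum.accumulate(v)      # smallest SLD seen so far
    # at least +20% over the running nadir and at least 5 mm absolute increase
    if np.any((v >= 1.20 * nadir) & (v - nadir >= 5.0)):
        return "PD"
    return "SD"


def response_rates(trajectories):
    """Population rates over an ensemble of trajectories."""
    labels = [best_overall_response(v) for v in trajectories]
    dist = {k: labels.count(k) / len(labels) for k in ("CR", "PR", "SD", "PD")}
    orr = dist["CR"] + dist["PR"]
    return {"orr": orr, "dcr": orr + dist["SD"], "distribution": dist}


# Four hand-made trajectories (mm), one per category.
ensemble = [
    [50.0, 20.0, 0.5],   # CR
    [50.0, 40.0, 30.0],  # PR
    [50.0, 48.0, 50.0],  # SD
    [50.0, 60.0, 70.0],  # PD
]
rates = response_rates(ensemble)
```

The four category fractions form a simplex (they sum to 1), which is the property the rate-simplex test above refers to.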
ORR measures response breadth (how many patients respond); it cannot see
durability (how long each response lasts). v0.28 adds duration of response (DoR) —
read from the same RECIST episode as the response category (response_episode returns
both), as a population median over the IIV ensemble with explicit right-censoring (a
response that never progresses is a durable, censored responder, not a zero). This is the
dimension that supplies the mechanism of the v0.27 surrogate failure.
The headline is that breadth and durability dissociate: for NSCLC the model with the highest ORR (1.00) has the shortest DoR (~32 wk) — its responses are universal but brief (a deep early shrink, then a fast resistant regrowth). That durability deficit is exactly why it has the worst tail-driven OS: sorted by survival, the broadest responder is the worst survivor, while the longest-lived model's responses are the most durable. This is the immunotherapy lesson made computable — a modest-ORR/durable drug can beat a high-ORR/brief one on survival — and the quantitative core of "high response, no survival benefit." Onkos does not claim DoR is a better surrogate (it sees only responders and carries censoring); it shows ORR and DoR are orthogonal and the ORR-surrogate failures are durability failures.
rr = onkos.objective_response_rate(ds, "resistance.nsclc_first_line.two_population", context=ctx)
rr.orr # 1.00 — breadth
rr.median_dor_weeks # ~32 — durability (the highest ORR, the shortest DoR)
rr.dor_censored_fraction, rr.n_responders # honest censoring of durable responders
# onkos response --durability -> the breadth-vs-durability table across models

Landmark-tested (tests/test_duration.py): episode consistency (best_response == response_episode[0]), the closed-form DoR, censoring of durable responses, the depth ≠
durability dissociation, and that the k_g-discordant (highest-ORR) model is the
short-DoR one — durability tracks survival where breadth inverts it. Population/trial level
only, no individual duration, no therapy ranking.
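
The right-censoring rule is easy to get wrong, so a small sketch may help. It assumes response onset is the first time the size falls at least 30% below baseline and progression is the first later time it rises 20% above the running nadir. The function name and toy curves are hypothetical, and the package's response_episode may differ in detail.

```python
import numpy as np


def duration_of_response(t, v):
    """Return (duration_weeks, censored) for a responder, or None for a non-responder."""
    t = np.asarray(t, dtype=float)
    v = np.asarray(v, dtype=float)
    responding = np.flatnonzero(v <= 0.70 * v[0])
    if responding.size == 0:
        return None                                   # never responded: no DoR at all
    onset = responding[0]
    nadir = np.minimum.accumulate(v)
    after_onset = np.arange(v.size) > onset
    progressed = np.flatnonzero(after_onset & (v >= 1.20 * nadir))
    if progressed.size == 0:
        # Durable responder: censored at the end of follow-up, not counted as zero.
        return float(t[-1] - t[onset]), True
    return float(t[progressed[0]] - t[onset]), False


t = np.linspace(0.0, 104.0, 209)                      # weeks, 0.5-week grid
brief = duration_of_response(t, 100.0 * (0.95 * np.exp(-0.30 * t) + 0.05 * np.exp(0.05 * t)))
durable = duration_of_response(t, 100.0 * np.exp(-0.05 * t))
none = duration_of_response(t, np.full_like(t, 100.0))
```

A population median over such (duration, censored) pairs needs a censoring-aware estimator such as Kaplan-Meier; a plain median of the durations would understate the durable responders.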
Every result above was first demonstrated on NSCLC first line. The natural question is
whether they generalize or are artifacts of one tumor type's illustrative parameters. v0.29
answers it: each other curated solid-tumor context (breast, CRC, HCC, melanoma) is given the
same two pieces NSCLC had — a mechanistic two-population resistance model (universal
response, but a fast resistant regrowth → broad-but-brief) and a non-default tail-sensitive
k_g OS link — and every NSCLC-only finding reappears, unchanged in direction, across
all five contexts.
| Context (1L) | ORR→OS under week-8 | ORR→OS under k_g | budget survival-link share | dominant axis |
|---|---|---|---|---|
| NSCLC | concordant (0/6) | discordant (4/6) | 24% | model×link interaction |
| breast | concordant (0/3) | discordant (2/3) | 72% | survival-link |
| CRC | concordant (0/3) | discordant (2/3) | 52% | survival-link |
| HCC | concordant (0/3) | discordant (3/3) | 21% | parameter |
| melanoma | concordant (0/3) | discordant (2/3) | 54% | survival-link |
In every context the two-population model has the highest ORR but the shortest DoR
(depth ≠ durability), the worst OS under the tail-sensitive k_g link (so ORR — faithful
under the week-8 surrogate — mis-ranks OS), and the budget's survival-link axis is now
real (often the dominant structural axis). The model-selection budget report now flags
5 of 6 contexts as structure-dominated (was 4/6, with the survival-link axis empty for
the non-NSCLC contexts). CI enforces the generalization (tests/test_response.py,
tests/test_budget.py): the resistance-mechanism divergence, the conditional ORR→OS
surrogacy, the depth-vs-durability dissociation, and the budget's survival-link axis are
dataset-wide, not an NSCLC demo. The new records are illustrative and tier C — this
generalizes the structural findings, not validated per-tumor parameters.
PFS is the endpoint that gates accelerated and conditional approvals — the one a sponsor reaches for when OS is immature. Onkos computes it two legitimate ways that need not agree, and shows the route choice is itself a model-selection axis (v0.30):
- statistical — the parametric PFS survival link, a hazard model keyed on the week-8 tumor change (the standard route, fit because an early read is what a trial affords);
- mechanistic — the RECIST 1.1 time-to-progression read directly off the simulated tumor trajectory (first time the SLD rises +20% above its running nadir).
The week-8 link is blind to a regrowth tail it never sampled; the mechanism watches the SLD cross progression. For shrink-then-regrow resistance dynamics the two routes invert the model ranking. The two-population (mechanistic resistance) model is the consistent culprit: its resistant subclone regrows fast, so it has the shortest mechanistic PFS — yet at week 8 it is deeply shrunk, so the week-8 hazard link gives it among the longest statistical PFS. For NSCLC 1L the mechanistic route ranks Claret far above the two-population model (60 vs 34 wk) while the statistical route ranks them level or reversed (34 vs 35 wk) — 2 of 6 model pairs are route-discordant. Because every solid-tumor context already has both a PFS link (v0.12) and a two-population model (v0.29), this reproduces in all five contexts on day one — not an NSCLC artifact:
| Context (1L) | mech. TTP: Claret / two-pop | stat. PFS: Claret / two-pop | route-discordant pairs |
|---|---|---|---|
| NSCLC | 60 / 34 | 34 / 35 | 2/6 |
| breast | 78 / 39 | 65 / 67 | 1/3 |
| CRC | 67 / 32 | 53 / 55 | 1/3 |
| HCC | 48 / 27 | 30 / 31 | 1/3 |
| melanoma | 62 / 28 | 45 / 46 | 1/3 |
pf = onkos.progression_free_survival(ds, "resistance.nsclc_first_line.two_population", context=ctx)
pf.median_ttp_weeks # 34 — mechanistic: RECIST progression off the trajectory
pf.median_pfs_link_weeks # 35 — statistical: the week-8-keyed PFS hazard link, same trial
pf.mechanistic_pfs_rate # P(progression-free at 24 wk ≈ 6 mo), censoring-robust
pf.route_ratio # mechanistic / statistical median
div = onkos.pfs_route_divergence(ds, context=ctx) # route-discordance across the in-context models
# onkos pfs <id> / onkos pfs --routes -> the two-route table + the route-discordance count

Onkos privileges neither route: the statistical link carries trial-fit hazard
information the mechanism lacks; the mechanism assumes the tumor model is true. It reports the
disagreement: PFS is not one number but two that diverge for exactly the resistance dynamics
that matter. Landmark-tested (tests/test_pfs_routes.py): the closed-form TTP, the
running-nadir rule, censoring of durable non-progressors, the NSCLC route inversion, and
dataset-wide route discordance. Population/trial level only; tier floors to D out of context;
no therapy ranking.
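
The route inversion can be reproduced in miniature. The sketch below applies the running-nadir rule (+20% above the smallest size so far) to two toy curves, a Claret-style fading-effect curve and a two-population curve, and compares the result with the week-8 change that a week-8-keyed link would see. All parameter values are hypothetical, chosen only to show the effect, and the helper names are not part of the package API.

```python
import numpy as np


def time_to_progression(t, v, rise=0.20):
    """First grid time at which v >= (1 + rise) x its running nadir; None if it never happens."""
    t = np.asarray(t, dtype=float)
    v = np.asarray(v, dtype=float)
    nadir = np.minimum.accumulate(v)
    hits = np.flatnonzero(v >= (1.0 + rise) * nadir)
    return float(t[hits[0]]) if hits.size else None


def week8_change(t, v):
    """Relative size change from baseline at week 8."""
    return float(np.interp(8.0, t, v) / v[0] - 1.0)


t = np.linspace(0.0, 156.0, 313)                      # weeks
# Claret-style closed form: growth kL, kill kD*E fading at rate lam (hypothetical values).
kL, kDE, lam = 0.01, 0.08, 0.03
claret = 100.0 * np.exp(kL * t - (kDE / lam) * (1.0 - np.exp(-lam * t)))
# Two-population curve: sensitive clone declines, a small resistant clone grows (hypothetical values).
two_pop = 100.0 * np.exp(-0.30 * t) + 2.0 * np.exp(0.06 * t)

ttp = {"claret": time_to_progression(t, claret), "two_pop": time_to_progression(t, two_pop)}
wk8 = {"claret": week8_change(t, claret), "two_pop": week8_change(t, two_pop)}
```

With these numbers the two-population curve is far deeper at week 8 yet progresses much earlier, which is the pattern a week-8-keyed hazard link cannot see.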
The context library is indexed by tumor type and line of therapy. NSCLC now carries a first-line and a second-line context (baseline, OS + PFS links, eligible TGI models), and survival matching is line-aware: a second-line simulation never silently borrows a first-line survival model. Second-line prognosis is shorter, the first-line-only Claret model is correctly excluded from the 2L view, and a line with no curated link gets no survival curve rather than a wrong one.
first = onkos.simulate(ds, "tgi_metrics.wang_2009.biexponential", context=dict(tumor_type="NSCLC", line="first"))
second = onkos.simulate(ds, "tgi_metrics.wang_2009.biexponential", context=dict(tumor_type="NSCLC", line="second"))
second.median_os < first.median_os # True: same model, line-aware survival

Fix shipped here: survival-link discovery previously matched only on tumor type, so a second-line context silently reused first-line survival models. It now matches on
(tumor_type, line).
Onkos consumes exposure; it does not model PK (that is its sibling Hypnos).
The small onkos.pk bridge turns a dose/regimen — or an external Hypnos PK
profile — into the exposure metric the ER kernels expect, so the spec's headline
composability claim runs self-contained: dose → C_avg → exposure-response →
kill → tumor dynamics → OS/PFS, one open, tier-annotated chain. The PK generators
are illustrative (the cornerstone relation C_avg = F·Dose/(CL·τ)); for real PK,
fit/simulate in Hypnos and feed the profile via pk.from_profile.
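
As a worked instance of the relation C_avg = F·Dose/(CL·τ), the sketch below computes the steady-state average concentration for a one-compartment model, where clearance is CL = ke·V. The function name is hypothetical and units are left abstract; it illustrates the formula only and makes no claim about what pk.steady_state_metrics computes internally.

```python
def c_avg_steady_state(dose, tau, clearance, bioavailability=1.0):
    """Steady-state average concentration: C_avg = F * Dose / (CL * tau)."""
    return bioavailability * dose / (clearance * tau)


# One-compartment assumption: clearance is the elimination rate times the volume.
ke, v = 0.05, 5.0
clearance = ke * v                                        # 0.25 volume units per time unit
c_avg = c_avg_steady_state(dose=1200.0, tau=24.0, clearance=clearance)   # about 200
c_avg_double = c_avg_steady_state(dose=2400.0, tau=24.0, clearance=clearance)
```

Because the relation is linear in dose, doubling the dose doubles C_avg; any nonlinearity in the chain enters later, through the exposure-response curve.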
from onkos import pk
c_avg = pk.steady_state_metrics(dose=1200, tau=24, ka=0.5, ke=0.05, v=5)["c_avg"]
tr = onkos.simulate(ds, "resistance.claret_2009.tgi", context=ctx,
exposure=c_avg, exposure_response="exposure_response.emax_generic")
# higher dose → higher C_avg → deeper response → longer OS (the go/no-go chain)
# ...or ingest a Hypnos-style concentration–time profile directly:
t = np.linspace(0, 104, 209) # simulation grid in weeks (assumed here; same grid as the quickstart)
C = pk.from_profile(times=[0, 8, 52, 104], concentrations=[0, 300, 220, 140], t=t)
tr = onkos.simulate(ds, "resistance.claret_2009.tgi", context=ctx, exposure=C,
exposure_response="exposure_response.emax_generic", t=t)

The spec's drug_effect subsystem (§3) names Norton-Simon: a kill model where
drug-induced regression is proportional to the growth rate, so a smaller,
faster-growing (Gompertz) tumor is more chemo-sensitive: dV/dt = (g − k·E)·V·ln(Vmax/V).
That is mechanistically different from the log-kill assumption (kill ∝ tumor
size) the Claret model uses. Adding it lets the divergence view show that which
kill mechanism you assume — not just the parameters — moves the trajectory: with
no resistance term, Norton-Simon predicts eradication, while the Claret log-kill +
resistance model regrows.
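
For a constant effect E the Norton-Simon equation above has a closed form, which makes the eradication claim easy to check. Substituting u = ln(Vmax/V) gives du/dt = −(g − k·E)·u, so u(t) = u(0)·exp(−(g − k·E)·t) and V(t) = Vmax·exp(−u(t)). The sketch below is a standalone illustration with hypothetical parameter values, not the curated drug_effect.norton_simon.nsclc record.

```python
import numpy as np


def norton_simon_volume(t, v0, vmax, g, k, effect):
    """Closed form of dV/dt = (g - k*E) * V * ln(Vmax/V) for a constant effect E."""
    t = np.asarray(t, dtype=float)
    u0 = np.log(vmax / v0)                      # u = ln(Vmax / V) at t = 0
    u = u0 * np.exp(-(g - k * effect) * t)      # u obeys du/dt = -(g - k*E) * u
    return vmax * np.exp(-u)


def fractional_regression_rate(v, vmax, g, k, effect):
    """-(1/V) dV/dt = (k*E - g) * ln(Vmax/V): rises as the tumor shrinks."""
    return (k * effect - g) * np.log(vmax / np.asarray(v, dtype=float))


t = np.linspace(0.0, 52.0, 105)
untreated = norton_simon_volume(t, v0=100.0, vmax=1000.0, g=0.02, k=0.07, effect=0.0)
treated = norton_simon_volume(t, v0=100.0, vmax=1000.0, g=0.02, k=0.07, effect=1.0)
rate = fractional_regression_rate(treated, 1000.0, 0.02, 0.07, 1.0)
```

Untreated, the curve rises toward Vmax (Gompertz growth). With k·E > g and no resistance term, u grows without bound and the volume falls toward zero, and the fractional regression rate increases along the way.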
ns = onkos.simulate(ds, "drug_effect.norton_simon.nsclc", context=ctx, drug_effect=1.0)
# fractional kill rate rises as the tumor shrinks: the Norton-Simon signature

Resistance is the project's most load-bearing term (spec §1: the λ "Hydra"). Onkos
modeled it phenomenologically — the Claret model fades the drug effect as
e^{−λt}, a curve-fitting device whose λ has no cellular referent and is ~90%-CV
unidentifiable. v0.24 adds the mechanistic alternative: a tumor of a drug-sensitive
clone and a pre-existing drug-resistant clone (the Goldie-Coldman two-population
model), observed together as V = S + R:
dS/dt = (kg − kd·E)·S sensitive: net growth kg, killed at potency kd by effect E
dR/dt = kgr·R resistant: grows at kgr, NOT killed
S(0) = V0 , R(0) = R0 a small pre-existing resistant burden R0
The drug crushes the sensitive clone to a nadir; the untouched resistant clone then
outgrows — the mechanistic origin of the nadir-then-regrowth the phenomenological λ
approximates by hand. Crucially, the resistance is now a biologically interpretable
parameter (R0, the initial resistant burden) in place of an unidentifiable rate, and
the choice between the two resistance models becomes a model-selection axis (the
kill-mechanism move, applied to resistance itself).
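
Because the two compartments are uncoupled, the system above solves in closed form for a constant effect E: S(t) = V0·exp((kg − kd·E)·t) and R(t) = R0·exp(kgr·t). Setting dV/dt = 0 gives the nadir time t* = ln[(kd·E − kg)·V0 / (kgr·R0)] / (kd·E − kg + kgr). The sketch below illustrates that algebra with hypothetical parameter values; it is not the curated record or the package kernel.

```python
import numpy as np


def two_population(t, v0, r0, kg, kd, kgr, effect):
    """Observed size V = S + R with S(0) = v0 (sensitive) and R(0) = r0 (resistant)."""
    t = np.asarray(t, dtype=float)
    s = v0 * np.exp((kg - kd * effect) * t)     # sensitive clone: net rate kg - kd*E
    r = r0 * np.exp(kgr * t)                    # resistant clone: grows, not killed
    return s + r


def nadir_time(v0, r0, kg, kd, kgr, effect):
    """Time at which dV/dt = 0; None when there is no interior nadir."""
    decay = kd * effect - kg                    # net decline rate of the sensitive clone
    if decay <= 0.0 or r0 <= 0.0:
        return None
    return max(0.0, float(np.log(decay * v0 / (kgr * r0)) / (decay + kgr)))


params = dict(v0=100.0, kg=0.02, kd=0.32, kgr=0.06, effect=1.0)   # hypothetical values
t = np.linspace(0.0, 104.0, 209)
v = two_population(t, r0=2.0, **params)
t_star = nadir_time(r0=2.0, **params)                              # about 15 weeks here
v_no_resistance = two_population(t, r0=0.0, **params)              # eradication when R0 = 0
late_slope = float(np.log(v[-1] / v[-2]) / (t[-1] - t[-2]))        # tends to kgr
```

The three checks mirror the landmark tests named in the text: the analytic nadir matches the grid minimum, R0 = 0 gives monotone decline, and the late log-slope approaches kgr.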
The sharp, honest finding — and where it hides. The two models are tuned to share the
early kill (matched kd), so they agree at week 8 (≈−87% vs −82% change) and hence on
the week-8-driven OS (median ≈94 vs ≈91 wk) — yet they diverge ≈5× in the tumor tail
(≈74 vs ≈15 mm at 3 years), because one regrowth is a fading effect and the other a
compounding clone. This is the short-trial-indistinguishable, long-horizon-divergent failure
mode exactly — and it carries a second lesson: a week-8-based OS surrogate is nearly blind
to the resistance-model choice, which is how a short-trial-fit resistance model transports
silently into a late-phase prediction it cannot support. Adding the mechanistic model to the
NSCLC divergence view raises the measured model-selection fraction from 0.39 to 0.47 —
the resistance-model axis is real between-model risk. The model is round-trip-validated
(both compartments export to SBML/NONMEM), landmark-tested (tests/test_two_population.py:
the closed form, eradication at R0=0, the late-time slope → kgr, the nadir, resistant-
fraction monotonicity), and mechanistic does not mean measured — R0 is still tier C and
practically unidentifiable from a realistic trial (it composes with the v0.22 analyzer).
mech = onkos.simulate(ds, "resistance.nsclc_first_line.two_population", context=ctx)
mech.tumor_size # V = S + R: nadir, then resistant-clone regrowth
mech.tier # C; out-of-context transport still floors to D
# It joins the divergence view automatically: two resistance mechanisms, one context.

The two-population model encodes one resistance origin: a pre-existing resistant clone,
present at baseline (R0 > 0). The other canonical origin is acquired resistance — no
resistant cell exists at baseline, but under drug pressure sensitive cells convert to
resistant at a switching rate α, so resistance is generated by the treatment itself (the
acquired_resistance kernel, v0.32):
dS/dt = (kg − kd·E)·S − α·E·S sensitive: grow, killed by E, AND switch out under drug
dR/dt = kgr·R + α·E·S resistant: grow + the drug-induced acquired influx
S(0) = V0 , R(0) = R0 = 0 no resistance pre-exists; the pool is generated by switching
Setting α = 0, R0 > 0 recovers the pre-existing model exactly (strict generalization). With
kg/kd/kgr matched to the two-population model, the only difference is the resistance
origin — so any divergence is the origin's alone.
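
For a constant effect E the acquired-resistance system is linear and also has a closed form. With a = kg − (kd + α)·E, the sensitive clone is S(t) = V0·exp(a·t), and the resistant clone is R(t) = R0·exp(kgr·t) + α·E·V0·(exp(a·t) − exp(kgr·t))/(a − kgr), valid when a ≠ kgr. The sketch below uses hypothetical parameter values and is not the acquired_resistance kernel itself. It checks the α = 0 reduction and verifies the resistant-clone ODE numerically.

```python
import numpy as np


def acquired_resistance(t, v0, r0, kg, kd, kgr, alpha, effect):
    """Return (S, R) for the switching model; assumes kg - (kd + alpha)*E != kgr."""
    t = np.asarray(t, dtype=float)
    a = kg - (kd + alpha) * effect              # net sensitive rate, including switching loss
    s = v0 * np.exp(a * t)
    influx = alpha * effect * v0                # drug-induced flux into R at t = 0
    r = r0 * np.exp(kgr * t) + influx * (np.exp(a * t) - np.exp(kgr * t)) / (a - kgr)
    return s, r


t = np.linspace(0.0, 52.0, 5201)                # fine grid for the finite-difference check
kg, kd, kgr, alpha = 0.02, 0.32, 0.06, 0.01     # hypothetical values

# Acquired origin: nothing resistant at baseline, the pool is generated by switching.
s_acq, r_acq = acquired_resistance(t, 100.0, 0.0, kg, kd, kgr, alpha, effect=1.0)
# alpha = 0 with R0 > 0 reduces to the pre-existing two-population model.
s_pre, r_pre = acquired_resistance(t, 100.0, 2.0, kg, kd, kgr, 0.0, effect=1.0)
# No drug and no baseline resistance: the resistant pool stays empty.
_, r_nodrug = acquired_resistance(t, 100.0, 0.0, kg, kd, kgr, alpha, effect=0.0)

# Numerical check of dR/dt = kgr*R + alpha*E*S on the interior of the grid.
lhs = np.gradient(r_acq, t)[1:-1]
rhs = (kgr * r_acq + alpha * 1.0 * s_acq)[1:-1]
```

Summing the two equations cancels the α·E·S terms, so the switching flux only moves cells between compartments and never creates or destroys total burden.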
Same week-8, different tail. Matched on every shared parameter, the two origins agree at week 8 and on the week-8-driven OS surrogate (median OS 92 vs 94 wk) but diverge in the tail:
| Quantity | Acquired (switching) | Pre-existing (subclone) | sees the origin? |
|---|---|---|---|
| Median OS (week-8 link) | 92 wk | 94 wk | no — the surrogate is blind |
| Nadir depth | 8.0 mm | 2.8 mm | yes — acquired is shallower |
| Mechanistic PFS (RECIST TTP) | 26 wk | 32 wk | yes — acquired progresses earlier |
The acquired model has a markedly shallower nadir — the drug that kills the sensitive clone
simultaneously generates the resistant one, so the tumor never shrinks as deeply — and it
reaches RECIST progression earlier. The resistance origin is a real divergence the week-8
surrogate misses and the mechanistic PFS (v0.30) catches (it lifts NSCLC's PFS route-discordant
pairs to 4/10). No new module — the acquired model is an ordinary TGI record that joins the
divergence view, exports through every format (round-trip validated), and is read by the
existing machinery. Landmark-tested (tests/test_acquired.py): the α=0 recovery, switching-
flux conservation, no-drug-no-resistance, the shallower nadir, the earlier progression, and the
week-8 OS agreement. α is honestly low-identifiability (~110% CV, the acquired analog of R0);
tier C, transport still floors to D; never a resistance diagnosis or a therapy choice.
Oncology is overwhelmingly combination therapy, and a composed forecast for a
combination hides one unmeasured choice — how do the two drugs' effects combine?
onkos.interaction makes that choice a quantified model-selection axis, the same
"make the silent assumption visible" move as the kill mechanism above, one layer up.
Two single-agent effects E_A, E_B combine into one effective effect under each
declared interaction rule, which then drives the existing TGI → survival chain:
hsa E_AB = max(E_A, E_B) highest single agent (conservative null)
additive E_AB = E_A + E_B Bliss-independence / effect-additive null
greco E_AB = E_A + E_B + ψ·√(E_A·E_B) interaction index (ψ>0 synergy, ψ<0 antagonism)
For the same single-agent activity, the interaction assumption alone moves predicted
median OS across a wide range (≈77–100 wk for E_A=E_B=0.6 on the Claret NSCLC model)
— the over-optimism that sinks combination programs, now a measured divergence rather
than a buried assumption. Synergy is treated as an assumption, never a finding: ψ
is a declared input (default 0, the additive null), never fitted from the dataset, and a
non-zero value carries a warning — distinguishing synergy from additivity requires a
combination trial designed for it. A neat identity falls out and is landmark-tested:
for log-linear kill, Bliss independence is exactly additive kill rates
(e^{−E_A·Δt}·e^{−E_B·Δt} = e^{−(E_A+E_B)·Δt}), so Onkos states the equivalence rather
than hiding it. Population/regimen level only — it simulates one combination under
different interaction assumptions, never ranks regimens, and the underlying model's
tier governs and cannot be raised. An inactive partner reduces to monotherapy under
every model (no manufactured interaction). The combination math is proven against a
landmark suite (tests/test_interaction.py): the layer is the standard interaction
nulls (Bliss 1939, Loewe 1953, HSA/Berenbaum 1989, the Greco 1995 interaction index) and
a monotone interaction index, not an unconstrained synergy knob.
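
The three interaction rules are small enough to write out directly. The sketch below is a standalone illustration of the formulas above, with a hypothetical function name (combined_effect); it is not the package's combine_effects and it omits the warning that a non-zero ψ carries there.

```python
import math


def combined_effect(e_a, e_b, model="additive", psi=0.0):
    """Combine two single-agent effects under a declared interaction rule."""
    if model == "hsa":
        return max(e_a, e_b)                               # highest single agent
    if model == "additive":
        return e_a + e_b                                   # effect-additive null
    if model == "greco":
        return e_a + e_b + psi * math.sqrt(e_a * e_b)      # psi > 0 synergy, psi < 0 antagonism
    raise ValueError(f"unknown interaction model: {model!r}")


effects = {
    "hsa": combined_effect(0.6, 0.6, "hsa"),
    "additive": combined_effect(0.6, 0.6, "additive"),
    "greco": combined_effect(0.6, 0.6, "greco", psi=0.5),
}

# An inactive partner reduces to monotherapy under every rule.
mono = [combined_effect(0.6, 0.0, m, psi=0.5) for m in ("hsa", "additive", "greco")]

# Bliss independence on log-linear kill equals additive kill rates.
dt = 2.0
bliss_survival = math.exp(-0.6 * dt) * math.exp(-0.6 * dt)
additive_survival = math.exp(-(0.6 + 0.6) * dt)
```

For E_A = E_B = 0.6 and ψ = 0.5 the rules give 0.6, 1.2, and 1.5. That spread in the combined effect is what drives the spread in predicted median OS.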
from onkos.interaction import compare_interactions
cmp = onkos.compare_interactions(ds, "resistance.claret_2009.tgi", context=ctx,
effect_a=0.6, effect_b=0.6, psi=0.5)
cmp.combined_effects # {hsa, additive, greco±ψ} -> E_AB
cmp.median_os # predicted median OS per interaction model
cmp.os_divergence # how much OS depends on the interaction assumption alone
cmp.median_os_range, cmp.warnings # incl. the synergy-is-an-assumption note

The combination above works at the effect level (combine two effect magnitudes). But the effect a
dose produces is a point on a dose-response curve, and the classical gold-standard null — Loewe
dose-additivity — combines two doses through the curves, via the isobole d_A/D_A(E)+d_B/D_B(E)=1.
It is the only "no-interaction" reference that satisfies the sham-combination identity: a drug
combined with itself is exactly additive (Loewe(d_A,d_B)=f(d_A+d_B)), which Bliss/effect-additivity
fails for any saturating curve (f(d_A)+f(d_B) > f(d_A+d_B)). So which reference you call "no
interaction" is itself a model-selection axis.
For the same dose pair (Claret NSCLC, d_A=150, d_B=90) the three references give combined effects
HSA 0.90 / Loewe 1.07 / Bliss 1.60 and median OS 88 / 92 / 101 wk. The ordering is structural for
saturating curves (HSA ≤ Loewe ≤ Bliss): Bliss overstates — its combined effect even exceeds either
drug's maximum (1.60 above the shared ceiling 1.4, the classic effect-additivity artifact) — HSA
understates, and Loewe is the self-consistent middle. The disagreement is negligible at low dose and
grows into saturation, exactly where combination dose-finding lives.
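
The isobole equation has no general closed form, but for Emax curves it is easy to solve by bisection, because each single-agent equivalent dose D(E) = EC50·E/(Emax − E) increases with E. The sketch below is a minimal illustration under that assumption. The function names and the (Emax, EC50) values are hypothetical, not the curated ER records, so the numbers will not match the ones quoted above.

```python
import math


def emax_effect(dose, emax, ec50):
    """Emax exposure-response curve."""
    return emax * dose / (ec50 + dose)


def emax_inverse(effect, emax, ec50):
    """Dose that produces `effect` alone; infinite once the effect reaches the ceiling."""
    if effect >= emax:
        return math.inf
    return ec50 * effect / (emax - effect)


def loewe_combined_effect(d_a, d_b, curve_a, curve_b, tol=1e-12):
    """Solve d_a/D_a(E) + d_b/D_b(E) = 1 for E by bisection; curves are (emax, ec50)."""
    if d_a <= 0.0 and d_b <= 0.0:
        return 0.0
    lo, hi = 0.0, max(curve_a[0], curve_b[0])
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        total = d_a / emax_inverse(mid, *curve_a) + d_b / emax_inverse(mid, *curve_b)
        if total > 1.0:
            lo = mid          # the dose pair over-delivers this effect: the answer is higher
        else:
            hi = mid
    return 0.5 * (lo + hi)


curve_a, curve_b = (1.4, 100.0), (1.4, 60.0)      # hypothetical (Emax, EC50) pairs
# Sham combination: a drug combined with itself must equal the effect of the summed dose.
sham = loewe_combined_effect(150.0, 90.0, curve_a, curve_a)
single = emax_effect(240.0, *curve_a)
# The three "no-interaction" references for one dose pair.
hsa = max(emax_effect(150.0, *curve_a), emax_effect(90.0, *curve_b))
loewe = loewe_combined_effect(150.0, 90.0, curve_a, curve_b)
bliss = emax_effect(150.0, *curve_a) + emax_effect(90.0, *curve_b)
```

In this toy setup Loewe passes the sham identity, while the effect-additive sum lands above the shared ceiling of 1.4, the same artifact the text describes.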
from onkos.interaction import loewe_effect, er_curve, compare_additivity_references
ca = er_curve(ds, "exposure_response.emax_generic")
loewe_effect(150, 90, curve_a=ca, curve_b=er_curve(ds, "exposure_response.dacomitinib_egfr.emax"))
cmp = onkos.compare_additivity_references(ds, "resistance.claret_2009.tgi", context=ctx,
dose_a=150, dose_b=90,
er_a="exposure_response.emax_generic",
er_b="exposure_response.dacomitinib_egfr.emax")
cmp.median_os, cmp.os_divergence # the additivity reference as a survival spread

Landmark-tested (tests/test_loewe.py): the sham identity (exact), Bliss failing it, the analytic
ER-inverse round-trips, the effect-ceiling clamp, and the bliss > loewe > hsa OS ordering. Pure
post-processing over the curated ER curves; the reference is a declared choice, never an estimated
synergy; population/regimen level, no dose or therapy ranking.
Across the sections above, Onkos turned each silent modeling choice into its own quantified
model-selection axis. onkos.atlas is the synthesis layer: a declarative registry of the
axes (AXES — the source of truth this table mirrors) and a one-call survey,
model_selection_atlas(ds, context), that runs each applicable axis and returns its native
headline — a map of where the model-selection risk lies for a context.
| Axis | what it varies | headline finding | module · CLI | since |
|---|---|---|---|---|
| TGI model | which growth model | matched in-context models imply median OS ~54–94 wk | compare · simulate --compare | v0.13 |
| survival link / metric | week-8 vs k_g vs integrated burden | the metric choice inverts the resistance-model ranking | simulate(survival_link=) | v0.25/0.33 |
| survival structure | two-stage PH vs joint current-value | the joint HR rises 10×–255× as a clone regrows (non-PH) | joint · onkos joint | v0.34 |
| resistance mechanism / origin | phenomenological vs subclone; acquired vs pre-existing | matched on early kill, the choice is invisible at week-8 yet drives the tail | compare (kernels) | v0.24/0.32 |
| exposure-response shape | Emax vs power vs sigmoid | invisible at the studied dose, ~19 wk OS swing on de-escalation | dose_response · onkos dose-response | v0.36 |
| additivity reference | HSA vs Bliss vs Loewe | only Loewe passes the sham-combination test; the choice moves combined OS | interaction · onkos loewe | v0.35 |
| readout timing | when the surrogate is read (ctDNA wk 2–4 vs RECIST wk 8) | earlier readout trades fidelity to durable benefit (9/10 → 3/10) | early_surrogate · onkos early-surrogate | v0.37 |
| model discriminability (meta) | whether a trial can tell the models apart at all | the resistance choice needs 10⁴–10⁵ events: practically unidentifiable | discriminability · onkos discriminability | v0.38 |
For NSCLC first line the survey ranks the OS-swing: survival structure (~108 wk) > bridge metric
(~97) > TGI model (~41) > exposure-response (~22), while 4 of 10 model pairs are practically
indistinguishable. The atlas is a survey, not a decomposition — the axes are not orthogonal and
the headlines are in different units, so it flags comparable = False and routes the rigorous
orthogonal partition to onkos.model_selection_budget. One entry
point, the whole map; each axis's own command for the deep dive.
a = onkos.model_selection_atlas(ds, context=ctx)
a.os_swing_axes # the weeks-unit leaderboard (loosely comparable)
a.comparable, a.note # False; points to the budget for the rigorous partition
# onkos atlas --tumor-type NSCLC --line first

git clone https://github.com/clay-good/onkos
cd onkos
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]" # or: pip install -e . (runtime only)
onkos validate # JSON-Schema-validate the dataset
onkos info # counts by subsystem / tier / review status
onkos simulate --compare # the divergence view (NSCLC, 1L by default)

import numpy as np
import onkos
ds = onkos.load()
m = ds["resistance.claret_2009.tgi"]
m.tier # "C"
m.derivation_context.tumor_type # "NSCLC"
m.transportability.validated_tumor_types # ("NSCLC",)
m["lambda"].iiv_cv_percent # 96 -> uncertainty is first-class
m.review_status # "unverified"
m.primary_citation.doi # "10.1200/JCO.2008.21.0807"
# Population-level forward simulation (NO individual prognosis, NO therapy ranking)
ctx = dict(tumor_type="NSCLC", line="first")
traj = onkos.simulate(ds, "resistance.claret_2009.tgi",
context=ctx, drug_effect=1.0, t=np.linspace(0, 104, 209))
traj.tumor_size, traj.os_curve # tumor-size + population OS trajectory
traj.survival # {"OS": curve, "PFS": curve}
traj.median_os, traj.median_pfs # PFS < OS by construction
traj.tier, traj.warnings # propagated tier + transport warnings
traj.metrics["week8_relative_change"] # the TGI metric feeding the survival link
# Virtual-trial comparison — the headline feature
cmp = onkos.compare(ds, purpose="tgi", context=ctx, drug_effect=1.0)
cmp.os_divergence, cmp.pfs_divergence # model-choice dependence of OS / PFS
cmp.median_os_range # (lo, hi) median OS across models
cmp.excluded # models greyed out for out-of-context transport
cmp.to_json(include_curves=True) # serializable result for dashboards / external simulators
# Model averaging — split the forecast into parameter noise vs model-choice risk
ma = cmp.model_average(target="median_os_weeks", endpoint="OS", weights="equal")
ma.point, ma.tier # averaged median OS; worst included tier
ma.within_var, ma.between_var # law-of-total-variance components
ma.model_selection_fraction # BETWEEN / TOTAL — irreducible model-choice risk
ma.curve, ma.between_band # S̄(t) + pointwise between-model ±1σ
cmp.uncertainty_decomposition() # per-scheme (equal/tier/evidence) table
# Model-selection budget — variance split across ALL the structural choices (capstone)
b = onkos.model_selection_budget(ds, context=ctx, endpoint="OS")
b.fractions # parameter / tgi_model / survival_link / interaction
b.structural_fraction, b.dominant # irreducible share; the axis to standardize first
# RECIST response & ORR — the phase-2 endpoint, and its contested OS surrogacy
rr = onkos.objective_response_rate(ds, "resistance.claret_2009.tgi", context=ctx)
rr.orr, rr.dcr, rr.distribution # population ORR/DCR; CR/PR/SD/PD distribution
rs = onkos.response_vs_survival(ds, context=ctx) # ORR->OS discordance across models
rs.discordant_fraction, rs.orr_predicts_os # is ORR a faithful OS surrogate here?
# PFS two ways — the statistical week-8 link vs the mechanistic RECIST progression time
pf = onkos.progression_free_survival(ds, "resistance.nsclc_first_line.two_population", context=ctx)
pf.median_ttp_weeks, pf.median_pfs_link_weeks # mechanistic vs statistical PFS (same trial)
pf.mechanistic_pfs_rate, pf.route_ratio # landmark PFS rate; route disagreement
div = onkos.pfs_route_divergence(ds, context=ctx) # route-discordance across the in-context models
div.discordant_pairs, div.routes_agree # is PFS one number here, or two that disagree?
# Parameter uncertainty — propagate the stored IIV CVs (Monte-Carlo bands)
ens = onkos.simulate_ensemble(ds, "resistance.claret_2009.tgi", context=ctx, n=400, seed=0)
ens.tumor_size.median, ens.tumor_size.lo, ens.tumor_size.hi # 5–95% band arrays
ens.os_curve.lo, ens.os_curve.hi # population-OS band
ens.metrics["median_os_weeks"] # {"median", "lo", "hi"}
# Sensitivity — attribute the OS-prediction variance to parameters (verify first)
res = onkos.sensitivity(ds, "resistance.claret_2009.tgi", context=ctx, target="median_os_weeks")
res.dominant.symbol # "kD"
res.indices # ranked [ParamSensitivity(symbol, src, contribution), …]
# Practical identifiability — can a realistic trial design even estimate the params?
idn = onkos.identifiability(ds, "resistance.claret_2009.tgi", context=ctx,
schedule=[0, 6, 12, 18, 24, 36, 48]) # weeks
idn.practically_identifiable # False — under this design
idn.collinearity_index # γ_K ≈ 22 (confounded combination)
[(p.symbol, round(p.rse_percent), p.iiv_cv_percent) for p in idn.params] # RSE vs stored CV
idn.worst.symbol # least-identifiable parameter (curation triage)
# D-optimal trial design — the best sampling schedule a fixed budget allows
od = onkos.optimal_schedule(ds, "resistance.claret_2009.tgi", context=ctx, n_samples=7, horizon=48.0)
od.optimal.schedule, od.d_efficiency # D-optimal scan times (wk); ≥1 informativeness vs uniform
od.rescues_any, od.structurally_flat # rescued the borderline λ; kL stays structurally flat
# Combination therapy — the interaction model as a model-selection axis
cmp = onkos.compare_interactions(ds, "resistance.claret_2009.tgi", context=ctx,
effect_a=0.6, effect_b=0.6, psi=0.5)
cmp.combined_effects # {hsa, additive, greco±ψ} -> combined effect E_AB
cmp.median_os, cmp.os_divergence # predicted OS per model; interaction-model divergence
onkos.combine_effects(0.6, 0.6, model="greco", psi=0.5) # the pure interaction math

| Command | Does |
|---|---|
| onkos version | print version |
| onkos validate | JSON-Schema + referential-integrity check of the dataset |
| onkos info | counts by subsystem / tier / review status |
| onkos report [--output FILE] | dataset health & external-validation report (Markdown) |
| onkos audit | evidence-based tier audit: flags tier inflation (also run inside validate) |
| onkos simulate <id> [--tumor-type --line --drug-effect] | one model's trajectory + metrics |
| onkos simulate --compare [--json --include-curves] | virtual-trial divergence across eligible models (text or JSON) |
| onkos compare --average [--weights --decompose --json] | model-averaged forecast + within/between variance decomposition |
| onkos budget [--tumor-type --line --endpoint --json] | model-selection budget: variance split across the structural choices |
| onkos response <id> [--survival-link --surrogate --durability --json] | RECIST best response / ORR / DoR, the ORR → OS surrogate, and breadth-vs-durability |
| onkos pfs <id> [--routes --landmark --json] | PFS two ways: mechanistic RECIST TTP vs the statistical week-8 link; the route-discordance across models |
| onkos uncertainty <id> [--n --seed] | Monte-Carlo parameter-uncertainty bands (propagates IIV CV) |
| onkos sensitivity <id> [--target --n] | rank parameters by how much their IIV drives a target metric |
| onkos identify <id> [--schedule --sigma-prop] | predicted RSE vs stored CV: can a realistic trial design estimate the parameters? |
| onkos design <id> [--n-samples --horizon --json] | D-optimal sampling schedule: the best trial a fixed budget allows, vs uniform; the structural-flat verdict |
| onkos interactions <id> [--effect-a --effect-b --psi] | drug-combination divergence: the interaction model as a model-selection axis |
| onkos loewe <id> [--dose-a --dose-b --er-a --er-b] | dose-level additivity references (HSA / Bliss / Loewe) as a model-selection axis |
| onkos joint [--tumor-type --line --alpha] | joint (current-value) vs two-stage survival: the non-proportional-hazard axis |
| onkos dose-response <id> [--c-ref --e-ref] | exposure-response model choice as a dose-extrapolation model-selection axis |
| onkos early-surrogate [--tumor-type --line --reference-link] | early-surrogate readout timing: landmark week vs durable-benefit fidelity |
| onkos discriminability [--tumor-type --line --survival-link] | required trial events to distinguish the competing models (model identifiability) |
| onkos atlas [--tumor-type --line] | model-selection atlas: one survey of every axis's headline for a context |
| onkos export --format <fmt> --output <dir> | generate artifacts |

Export formats: nonmem, sbml, pharmml, so (PharmML Standard Output),
rxode2, pumas, vt-json, jsonld (linked data), omex, csv, bibtex. The
COMBINE .omex bundles SBML + PharmML + the SO + virtual-trial JSON + JSON-LD +
provenance into one citable archive.
```bash
pip install -e ".[dashboard]"
streamlit run dashboard/app.py
```

The Streamlit dashboard is a thin presentation layer over the tested package API
(compare, simulate_ensemble, sensitivity) — three tabs: the divergence
view (tumor / OS / PFS curves + divergences + the greyed-out excluded models),
an analyze-a-model tab (uncertainty bands + the sensitivity tornado for any
included model), and a dataset browser. Because all logic lives in the
package, the dashboard's data is unit-tested and CI keeps dashboard/app.py
linted and compiling against the current API.
A record is a structured object, not a scalar. Two kinds share one schema: a model record (e.g. the Claret 2009 TGI model) and a context-baseline record (e.g. NSCLC first-line baseline growth). The fields that carry the project:
- `derivation_context`: the exact drug, tumor type, line, trial, and measurement basis a parameter came from. Machine-readable, mandatory.
- `transportability`: how far beyond that origin it has actually been validated. Crossing this boundary forces a tier penalty.
- `iiv_cv_percent`: inter-individual variability on the high-uncertainty kill/resistance terms, so a 90%-CV term cannot masquerade as a point estimate.

Honesty note. v0.1 parameter values are illustrative and `unverified` by design (see the verification checklist). The infrastructure (schema, kernels, tier propagation, round-trip-validated exports) is real and tested; promoting records to `verified` from source PDFs is the highest-leverage contribution.
| Tier | Meaning |
|---|---|
| A | Model + parameters externally validated; TGI→survival link held in ≥1 independent trial; broad context. |
| B | One robust model from a well-powered trial with at least a partial external check. |
| C | Single trial, narrow tumor type/line; no external validation; high-CV kill/resistance terms. |
| D | Transported outside its validated context, or hypothesis-tier (e.g. immuno-oncology). Not predictive. |
Two rules are enforced in code (onkos/tiers.py, tested in tests/):
- Worst input wins. A composed simulation (`growth + drug_effect + resistance + exposure_response + survival_link`) inherits the worst component tier.
- Out-of-context transport forces a tier floor of D + a warning. You cannot get an A-looking forecast from a model validated only on a different tumor type. This is what greys models out in the divergence view.
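
The two rules can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the code in `onkos/tiers.py`: the `Component` class, the `compose_tier` function, the `example.*` identifiers, and the tiers and validated contexts assigned to them are all hypothetical.

```python
from dataclasses import dataclass

TIER_ORDER = "ABCD"  # best to worst


@dataclass(frozen=True)
class Component:
    record_id: str                    # hypothetical identifier
    tier: str                         # tier assigned to the record
    validated_tumor_types: frozenset  # contexts the record has been validated in


def compose_tier(components, tumor_type):
    """Return (tier, warnings) for a simulation composed of `components`."""
    worst, warnings = "A", []
    for comp in components:
        tier = comp.tier
        if tumor_type not in comp.validated_tumor_types:
            tier = "D"  # out-of-context transport floors the component to D
            warnings.append(f"{comp.record_id}: not validated for {tumor_type}")
        # Worst input wins: keep whichever tier sits later in TIER_ORDER.
        worst = max(worst, tier, key=TIER_ORDER.index)
    return worst, warnings


chain = [
    Component("example.growth", "B", frozenset({"NSCLC", "breast"})),
    Component("example.tgi", "C", frozenset({"NSCLC"})),
    Component("example.survival_link", "B", frozenset({"NSCLC"})),
]
print(compose_tier(chain, "NSCLC"))  # every component is in context
print(compose_tier(chain, "HCC"))    # every component is transported
```

In context the chain inherits its weakest member; out of context the floor applies regardless of how strong the individual records are.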
The spec (§5, §9) says a clinical model's tier is partly a numeric judgment:
A/B require an external check (a recorded external C-index), and a poorly-identified
kill/resistance term (IIV CV ≥ 70%) is a tier-C characteristic. onkos audit
derives the best tier the recorded evidence supports ("ceiling") for each
clinical TGI / survival record and flags any whose assigned tier is better than
that — tier inflation, the dangerous direction. The check runs inside onkos validate, so an over-claimed tier fails CI and cannot regress.
```console
$ onkos audit
record                          tier  ceiling  status
resistance.claret_2009.tgi      C     C        ok            # ~96% CV resistance term -> C
survival_link.nsclc_os_week8    C     B        conservative  # external check, well-identified -> B available
inflated (tier exceeds evidence): 0
```
The shipped dataset has zero inflations and is deliberately conservative (records with an external metric but high-CV terms sit at C); the audit surfaces the upgrade candidates without forcing them — the curator reconciles tier with evidence, exactly as §5 intends.
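A rough sketch of how such a ceiling could be derived from the two numeric rules stated above (an external check is required for A/B; a kill/resistance IIV CV of 70% or more is a tier-C characteristic). The function names and the three evidence records are hypothetical, and the sketch deliberately stops at B because the text does not give a numeric rule separating A from B.

```python
TIER_ORDER = "ABCD"  # best to worst


def tier_ceiling(has_external_check, max_kill_cv_percent, cv_threshold=70.0):
    """Best tier the recorded evidence supports, using the two stated rules only."""
    if not has_external_check:
        return "C"  # A/B require a recorded external check
    if max_kill_cv_percent >= cv_threshold:
        return "C"  # poorly identified kill/resistance term
    return "B"      # simplification: promotion to A is left to the curator


def audit_status(assigned, ceiling):
    a, c = TIER_ORDER.index(assigned), TIER_ORDER.index(ceiling)
    if a < c:
        return "inflated"      # assigned tier claims more than the evidence
    if a > c:
        return "conservative"  # an upgrade candidate, not an error
    return "ok"


# Hypothetical evidence: (assigned tier, external check recorded?, worst CV %)
records = {
    "high_cv_resistance_term": ("C", True, 96.0),
    "well_identified_link": ("C", True, 40.0),
    "over_claimed": ("A", False, 30.0),
}
for name, (assigned, external, cv) in records.items():
    ceiling = tier_ceiling(external, cv)
    print(f"{name:26s} {assigned} {ceiling} {audit_status(assigned, ceiling)}")
```

Only the `inflated` status would need to fail a build; `conservative` rows are the upgrade candidates a curator can review.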
Every model binds to a pure-NumPy/SciPy reference kernel in
onkos/export/reference.py, the single computational ground truth. E is the
drug-effect magnitude that scales the kill term — supplied directly or derived
from a PK exposure through an exposure-response kernel (below).
| Kernel | Kind | Dynamics | Records |
|---|---|---|---|
| `growth_exponential` | ODE | `dV/dt = kg·V` | `growth_laws.exponential` |
| `growth_logistic` | ODE | `dV/dt = kg·V·(1 − V/Vmax)` | `growth_laws.logistic` |
| `growth_gompertz` | ODE | `dV/dt = kg·V·ln(Vmax/V)` | `growth_laws.gompertz` |
| `claret_tgi` | ODE | `dy/dt = kL·y − kD·E·e^(−λt)·y` (log-kill + resistance) | `resistance.claret_2009.tgi` |
| `norton_simon` | ODE | `dV/dt = (g − k·E)·V·ln(Vmax/V)` (kill ∝ growth) | `drug_effect.norton_simon.nsclc` |
| `biexp_tgi` | ODE | `y = y0·(e^(−ks·E·t) + e^(kg·t) − 1)` (shrink + regrowth) | `tgi_metrics.wang_2009.*`, `tgi_metrics.bruno_2020.*` |
| `survival_weibull_ph` | survival | `S(t) = exp(−(t/scale)^shape · e^(β·x))`, x = week-8 change | `survival_link.*_os_week8`, `…_pfs_week8` |
| `survival_cox_ph` | survival | `S(t) = S0(t)^exp(β·x)`, S0 = tabulated baseline | `survival_link.nsclc_os_cox` |
| `er_emax` | exposure-response | `E = Emax·C/(EC50+C)` | `exposure_response.emax_generic`, `…dacomitinib_egfr.emax` |
| `er_sigmoid_emax` | exposure-response | `E = Emax·C^γ/(EC50^γ+C^γ)` | `exposure_response.sigmoid_emax_generic` |
| `er_power` | exposure-response | `E = slope·C^θ` | `exposure_response.power_generic` |
| `simeoni_exp_linear` | ODE | `dw/dt = λ0·w / (1+(λ0·w/λ1)^ψ)^(1/ψ)` (exp→linear) | `growth_laws.simeoni_exp_linear` |
| `simeoni_tgi` | ODE (4-state) | transit-chain TGI; observe `w = x1+x2+x3+x4` | `preclinical_translation.simeoni_2004.xenograft` |
| `ivive_power` | exposure-response | `potency = scale·IC50^power` | `preclinical_translation.ivive_potency` |
| `io_tumor_immune` | ODE (2-state) | Kuznetsov tumor–immune predator-prey (hypothesis-tier) | `immuno_oncology.kuznetsov_1994.tumor_immune`, `…poorly_immunogenic.hypothesis` |

The exposure-response (ER) layer maps a PK exposure metric C (C_avg, AUC,
C_max) to the drug-effect magnitude E that drives a TGI model's kill term. This
makes the potency of a regimen first-class (with its own tier and IIV) and
completes the chain PK → exposure → tumor dynamics → survival — the seam
where a Hypnos PK record composes with an Onkos TGI
model. A time-varying exposure (a full PK profile aligned to t) yields a
time-varying E(t), and the tumor ODE is integrated numerically; a scalar
exposure uses the fast closed form.
```python
import numpy as np
import onkos

ds = onkos.load()
ctx = dict(tumor_type="NSCLC", line="first")

# Scalar exposure -> Emax transform -> drug effect -> Claret TGI -> OS
traj = onkos.simulate(ds, "resistance.claret_2009.tgi", context=ctx,
                      exposure=200.0,  # C_avg in µg/L
                      exposure_response="exposure_response.dacomitinib_egfr.emax")

# Time-varying PK profile (e.g. piped from Hypnos) -> E(t) -> ODE integration
t = np.linspace(0, 104, 209)
C = 300.0 * np.exp(-0.02 * t)  # declining exposure
traj = onkos.simulate(ds, "resistance.claret_2009.tgi", context=ctx,
                      exposure=C, exposure_response="exposure_response.emax_generic", t=t)
```

```console
$ onkos simulate resistance.claret_2009.tgi \
    --exposure 200 --exposure-response exposure_response.dacomitinib_egfr.emax
resistance.claret_2009.tgi tier=C (exposure=200.0 via exposure_response.dacomitinib_egfr.emax)
```

The ER record's tier and transportability propagate like any other component:
an ER model validated only on NSCLC/EGFR-TKI floors an out-of-context simulation
to D with a warning, exactly as the TGI and survival components do.
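Why a scalar exposure can use a closed form while a PK profile needs numerical integration can be seen in a standalone sketch. It uses the Claret dynamics from the kernel table, whose constant-E solution is y(t) = y0·exp(kL·t − (kD·E/λ)·(1 − e^(−λt))). Everything here is illustrative: the function names and all parameter values are assumptions, not the package's kernels or dataset values.

```python
import math


def er_emax(conc, emax, ec50):
    """Emax exposure-response: E = Emax*C / (EC50 + C)."""
    return emax * conc / (ec50 + conc)


def claret_closed_form(t, y0, kl, kd, lam, effect):
    """Solution of dy/dt = kL*y - kD*E*exp(-lam*t)*y for a constant effect E."""
    return y0 * math.exp(kl * t - (kd * effect / lam) * (1.0 - math.exp(-lam * t)))


def claret_rk4(t_end, y0, kl, kd, lam, effect_at, n_steps=2000):
    """Integrate the same ODE with classical RK4; effect_at(t) may vary in time."""
    def rhs(t, y):
        return (kl - kd * effect_at(t) * math.exp(-lam * t)) * y

    h, t, y = t_end / n_steps, 0.0, y0
    for _ in range(n_steps):
        s1 = rhs(t, y)
        s2 = rhs(t + h / 2, y + h * s1 / 2)
        s3 = rhs(t + h / 2, y + h * s2 / 2)
        s4 = rhs(t + h, y + h * s3)
        y += h * (s1 + 2 * s2 + 2 * s3 + s4) / 6
        t += h
    return y


# Hypothetical parameters (weeks, mm); not values from the dataset.
P = dict(y0=80.0, kl=0.005, kd=0.03, lam=0.05)
EMAX, EC50 = 1.5, 100.0

# Scalar exposure: one Emax evaluation, then the closed form.
e_const = er_emax(200.0, EMAX, EC50)
y_closed = claret_closed_form(52.0, effect=e_const, **P)
y_numeric = claret_rk4(52.0, effect_at=lambda t: e_const, **P)


# Time-varying exposure: E(t) follows the PK profile, so the ODE is integrated.
def profile(t):
    return er_emax(300.0 * math.exp(-0.02 * t), EMAX, EC50)


y_varying = claret_rk4(52.0, effect_at=profile, **P)
y_frozen = claret_closed_form(52.0, effect=profile(0.0), **P)  # constant-E shortcut

print(abs(y_numeric - y_closed) / y_closed)  # the two routes agree for constant E
print(y_varying, y_frozen)                   # they differ for a declining profile
```

Freezing E at its initial value overstates the kill for a declining profile, which is the error the constant-E closed form would introduce if it were applied to a full PK curve.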
The chain's first modeling link — which ER shape maps exposure to effect — is itself a
silent choice. Emax (saturating), power (unbounded), and sigmoid-Emax (switch-like) all fit
the studied dose comparably and diverge when extrapolated to a dose the trial never
studied, which is exactly what dose selection asks of them. onkos.dose_response makes this
the project's transportability thesis one layer upstream, with the dose as the context: it
re-anchors each curated ER shape to agree at the studied dose (c_ref, e_ref), then reads off
how far the predicted effect — and the resulting OS — diverge off it.
The OS spread across ER shapes is 0 at the studied dose (the built-in control — the curves
are anchored there) and grows on extrapolation: ≈19 weeks at quarter-dose, 14 at half-dose, 5
at 2–4×. The risk is asymmetric, and sharpest on de-escalation: a lower dose lands the
effect on the steep part of the effect→survival relationship, so the ER-model choice moves OS
most there — the very question ("can we give less?") that dose-finding exists to answer.
Upward extrapolation gives a larger effect spread (the unbounded power curve runs to E≈3.25
at 4×) but a smaller OS spread, because survival saturates once the tumor is controlled.
```python
from onkos.dose_response import calibrated_er, compare_er_extrapolation

# ds and ctx as defined in the exposure-response example
f = calibrated_er(ds, "exposure_response.power_generic", c_ref=150.0, e_ref=1.0)  # f(150) == 1.0 exactly
cmp = compare_er_extrapolation(ds, "resistance.claret_2009.tgi", context=ctx,
                               c_ref=150.0, e_ref=1.0)
cmp.reference_os_divergence  # ~0 (anchored control)
cmp.max_os_divergence        # weeks of OS riding on the ER-shape choice on extrapolation
cmp.os_divergence_at(37.5)   # the de-escalation case (largest)
```

Landmark-tested (tests/test_dose_response.py): the exact anchor, the zero-divergence control
at the studied dose, the de-escalation-diverges-most asymmetry, and the inherited tier/transport
guardrails. This is pure post-processing: it re-anchors the curated ER shapes, never refits them, and
ranks no doses. It operates at population/trial level and makes no dose or therapy recommendation.
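The re-anchoring idea can be illustrated without the package. The sketch below assumes the anchor is imposed by a single multiplicative rescale of each shape, which is one simple way to make curves agree at (c_ref, e_ref) and is not necessarily how `calibrated_er` does it. The shape parameters are hypothetical, and only the effect spread is computed (no survival model).

```python
def emax(c, e_max=1.5, ec50=100.0):
    return e_max * c / (ec50 + c)


def sigmoid_emax(c, e_max=1.5, ec50=100.0, gamma=3.0):
    return e_max * c**gamma / (ec50**gamma + c**gamma)


def power(c, slope=0.02, theta=0.85):
    return slope * c**theta


def anchored(shape, c_ref, e_ref):
    """Rescale `shape` so that it returns e_ref at c_ref (assumed anchoring rule)."""
    scale = e_ref / shape(c_ref)
    return lambda c: scale * shape(c)


C_REF, E_REF = 150.0, 1.0
curves = {
    name: anchored(fn, C_REF, E_REF)
    for name, fn in [("emax", emax), ("sigmoid_emax", sigmoid_emax), ("power", power)]
}


def effect_spread(c):
    """Max minus min predicted effect across the anchored shapes at exposure c."""
    values = [f(c) for f in curves.values()]
    return max(values) - min(values)


for multiple in (0.25, 0.5, 1.0, 2.0, 4.0):
    c = multiple * C_REF
    effects = {name: round(f(c), 3) for name, f in curves.items()}
    print(multiple, effects, round(effect_spread(c), 3))
```

The spread is zero at the anchor by construction and opens up in both directions, with the unbounded power curve dominating the upward side.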
The divergence view is only broadly useful if it has a context to run in. Phase C
builds the tumor_type_baselines library and the matching per-context survival
links, so every supported tumor type carries:
- a baseline (`tumor_type_baselines.*`): baseline SLD `y0` and unperturbed growth, supplying the simulation's initial conditions;
- a survival link (`survival_link.*_os_week8`, plus a non-default tail-sensitive `*_os_growth_rate` `k_g` link since v0.29): a tumor-specific Weibull-PH OS model whose scale reflects that indication's baseline prognosis;
- 3 eligible TGI models (a Claret phenomenological-resistance form, a biexponential form, and a mechanistic two-population resistance form), so model-selection risk, including the resistance-mechanism axis, is measurable in every context, not just NSCLC.

| Context (1L) | baseline SLD | OS scale (wk) | eligible TGI models | OS divergence |
|---|---|---|---|---|
| NSCLC | 80 mm | 60 | Claret 2009 · Wang biexp · two-population · (+Norton-Simon) | 0.26 |
| breast | 69 mm | 130 | breast Claret · Bruno biexp · two-population | 0.13 |
| CRC | 143 mm | 95 | CRC Claret · CRC biexp · two-population | 0.25 |
| HCC | 79 mm | 48 | HCC Claret · HCC biexp · two-population | 0.35 |
| melanoma | 72 mm | 85 | melanoma Claret · melanoma biexp · two-population | 0.25 |
The non-NSCLC baseline SLDs above are now grounded in open-access trial/TGI sources
(breast: Krishnan 2021; CRC: Machida 2008; HCC: IMbrave150/Salem 2021; melanoma:
BRIM-3/Mistry 2018) and carry review_status: pending_human_review — real median
baseline burdens awaiting a human sign-off (run onkos review-queue). CRC's bulkier
metastatic burden (143 mm) is the real outlier; the others cluster near 70-80 mm.
NSCLC-1L is kept at a round 80 mm (within the published range) to hold the flagship
demo numbers stable while the NSCLC Claret/Wang model values remain paywalled; NSCLC-2L
is kept illustrative because its sourced figure (OAK ~67 mm) would break the
2L-more-advanced-than-1L modeling invariant — that conflict is flagged for human review.
Each context resolves its own baseline and survival link automatically; a model
from one tumor type applied to another is greyed out (floored to D) by the
same transportability rule. Values are illustrative and unverified by design.
```python
import onkos

ds = onkos.load()
for tt in ["NSCLC", "breast", "CRC", "HCC", "melanoma"]:
    cmp = onkos.compare(ds, purpose="tgi", context=dict(tumor_type=tt, line="first"))
    print(tt, len(cmp.included), round(cmp.os_divergence, 2))
```

The discovery-to-clinic bridge. Onkos implements the canonical Simeoni 2004
xenograft PK/PD model — the project's first multi-state ODE system. Unperturbed
growth is exponential then linear; drug at concentration E damages
proliferating cells (x1) at rate k2·E, and damaged cells traverse a
signal-distribution transit chain x2→x3→x4 (rate k1) before dying, which
produces the characteristic delayed cell death. The observed tumor weight is
w = x1+x2+x3+x4.
```text
dx1/dt = λ0·x1 / (1+(λ0·w/λ1)^ψ)^(1/ψ) − k2·E·x1    (proliferating)
dx2/dt = k2·E·x1 − k1·x2                             (damaged, transit)
dx3/dt = k1·x2 − k1·x3
dx4/dt = k1·x3 − k1·x4
w      = x1+x2+x3+x4                                 (observed weight)
```
Multi-state systems have no closed form, so the kernel framework integrates them
numerically and validates exports state-by-state: the SBML round-trip re-parses
each rate rule's MathML and checks it against the reference rhs, and the
NONMEM stream emits one $DES compartment per state. A concentration profile can
drive the kill term directly (exposure=...), so a Hypnos PK curve composes here
too.
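For readers who want to see the four-state system run, here is a standalone sketch that integrates the equations above with a hand-written RK4 step. It is not the package kernel: the function names, the initial weight, and every parameter value are hypothetical, and the concentration is held constant.

```python
def simeoni_rhs(x, conc, lam0, lam1, psi, k1, k2):
    """Right-hand side of the four-state Simeoni system at constant concentration."""
    x1, x2, x3, x4 = x
    w = x1 + x2 + x3 + x4  # observed tumor weight
    growth = lam0 * x1 / (1.0 + (lam0 * w / lam1) ** psi) ** (1.0 / psi)
    kill = k2 * conc * x1
    return (growth - kill, kill - k1 * x2, k1 * x2 - k1 * x3, k1 * x3 - k1 * x4)


def simulate_simeoni(conc, t_end=20.0, n_steps=2000, w0=0.05,
                     lam0=0.3, lam1=0.6, psi=20.0, k1=0.5, k2=0.01):
    """Return the final state (x1, x2, x3, x4) after t_end days (RK4)."""
    def f(x):
        return simeoni_rhs(x, conc, lam0, lam1, psi, k1, k2)

    h, x = t_end / n_steps, (w0, 0.0, 0.0, 0.0)
    for _ in range(n_steps):
        a = f(x)
        b = f(tuple(xi + 0.5 * h * ai for xi, ai in zip(x, a)))
        c = f(tuple(xi + 0.5 * h * bi for xi, bi in zip(x, b)))
        d = f(tuple(xi + h * ci for xi, ci in zip(x, c)))
        x = tuple(xi + h * (ai + 2 * bi + 2 * ci + di) / 6
                  for xi, ai, bi, ci, di in zip(x, a, b, c, d))
    return x


untreated = simulate_simeoni(conc=0.0)
treated = simulate_simeoni(conc=120.0)
static = simulate_simeoni(conc=0.3 / 0.01)  # c* = lam0 / k2 for these values

print(sum(untreated), sum(treated))  # total weight w with and without drug
print(static[0])                     # proliferating pool holds at its initial value
```

At c* the proliferating compartment is static while the tumor is still in its exponential phase, yet the observed weight keeps changing for a while because damaged cells in transit still count toward w.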
```python
# Dose-dependent xenograft TGI (concentration drives the kill term directly)
tr = onkos.simulate(ds, "preclinical_translation.simeoni_2004.xenograft",
                    context=dict(tumor_type="ovarian_xenograft"), drug_effect=120.0)
tr.tumor_size  # total tumor weight w(t)
tr.os_curve    # None: preclinical models carry no survival link
```

In-vitro → in-vivo translation. `preclinical_translation.ivive_potency` maps
an in-vitro potency (e.g. IC50) to an in-vivo potency parameter
(potency = scale·IC50^power). The assumption that in-vitro potency predicts
in-vivo activity is itself what must be validated (Rocchetti 2007), so the record
is tiered and annotated accordingly. Preclinical records are excluded from the
clinical divergence view and applying xenograft parameters to a human tumor
floors the result to D — the translation gap, made explicit.
🛑 HYPOTHESIS-TIER. NOT FOR PREDICTION. The immuno-oncology subsystem ships tier D by construction because the quantitative validation to do otherwise honestly does not yet exist (spec §2, §3, §5, §10).
Onkos includes the Kuznetsov 1994 tumor–immune QSP model — a 2-state predator-prey system (tumor + effector cells) that reproduces the field's qualitative regimes: immune control / dormancy, immune escape, and the bistable rescue when an immunotherapy effect (e.g. checkpoint blockade) pushes the system across its threshold.
```text
d tumor/dt    = α·tumor·(1 − β·tumor) − (1+E)·effector·tumor
d effector/dt = s + ρ·effector·tumor/(η+tumor) − μ·effector·tumor − δ·effector
```
The non-predictive stance is enforced in code, not just documented:
- the validator rejects any immuno-oncology record (or parameter) that is not tier D; `onkos validate` fails otherwise;
- every export carries `onkos:predictionStatus = "DO NOT USE FOR PREDICTION (hypothesis-tier)"` and the virtual-trial JSON sets `DO_NOT_USE_FOR_PREDICTION: true`;
- IO models are excluded from the clinical divergence view and never receive a survival link (no OS curve).

This is the Nidus "Phase-C" convention: the frontier is represented so it can be explored and exported, but it can never masquerade as validated.
onkos report turns the dataset's own honesty fields into a machine-generated
health report (docs/dataset-health.md), kept in sync
with the data by a CI gate. It surfaces tier and review-status coverage, the
external-validation backlog, and the hypothesis-tier records.
```console
$ onkos report | head
# Onkos dataset health report
- Records: 33 · Citations: 8
- Verified (PDF-checked): 0 / 33
- External-validation coverage (clinical TGI + survival models): 15 / 15 100%
```
Releasable, proven in CI. The dataset is the source of truth at the repo
root; scripts/sync_dataset_into_package.py copies it into the package as
_dataset/ for packaging. The CI release job builds the wheel, installs it
into a clean environment, and runs onkos validate / info / report and a
simulation from outside the repository — proving the bundled dataset ships
and resolves without a source checkout. Resolution order puts the source
dataset/ first so a stale synced copy can never shadow live edits during
development; the bundled _dataset/ is the wheel-only fallback.
Release metadata: CHANGELOG.md, .zenodo.json
(Zenodo concept DOI on first deposit), CITATION.cff, and a
py.typed marker so downstream type-checkers see Onkos's annotations.
The dataset is the single source of truth; everything else is a deterministic
projection. The system is layered — data → core → kernels → analyses → exports →
presentation — and the layering is pinned by tests/test_architecture.py (every
declared subsystem has records, every kernel is bound, the CLI export formats
match the builders and the CI sweep, the public API surface is stable).
```mermaid
flowchart TD
    subgraph data["① data: source of truth"]
        DS["dataset/records · schema · citations · JSON-LD context"]
    end
    subgraph core["② core"]
        LOAD["load · models · filter"]
        VAL["validate (schema + referential + tier audit)"]
        TIER["tiers (worst-input-wins + transport floor)"]
    end
    subgraph kern["③ kernels"]
        REG["registry: bind record → kernel"]
        REF["reference: NumPy/SciPy kernels<br/>(ODE · survival · transform)"]
    end
    subgraph ana["④ analyses: each a model-selection axis"]
        SIM["simulate (OS+PFS) · metrics · pk"]
        CMP["compare (virtual-trial divergence) · combine (model averaging)"]
        UNC["uncertainty (IIV) · sensitivity · budget (variance partition)"]
        IDN["identify · design · discriminability (parameter→model identifiability)"]
        SRV["joint (non-PH) · dose_response (ER) · early_surrogate (timing)"]
        CMB["interaction / loewe (combination) · response/pfs (endpoints)"]
        ATL["atlas (axis registry + survey) · audit (evidence-based tiers)"]
    end
    subgraph exp["⑤ exports: generated, never hand-edited"]
        EXP["NONMEM · SBML · PharmML · SO · rxode2 · Pumas<br/>vt-json · JSON-LD · COMBINE .omex · CSV · BibTeX"]
    end
    subgraph pres["⑥ presentation"]
        CLI["onkos CLI (21 subcommands)"]
        DASH["Streamlit dashboard"]
        REP["health report + tier audit"]
        NB["32 CI-executed notebooks"]
    end
    DS --> LOAD --> REG --> REF
    LOAD --> VAL & TIER
    REG --> SIM --> CMP
    SIM --> UNC & SRV
    CMP --> IDN & CMB
    CMP & UNC & IDN & SRV & CMB & ATL --> CLI & DASH & REP & NB
    REG --> EXP
    REF -. "round-trip validates" .-> EXP
```

Every model binds to one pure-NumPy/SciPy reference kernel (the single computational ground truth). Kernels come in three kinds:
| Kind | What it computes | Kernels |
|---|---|---|
| ODE | tumor-size dynamics dV/dt (closed form where one exists, else integrated) | `growth_exponential`/`logistic`/`gompertz`/`von_bertalanffy`/`power_law`, `claret_tgi`, `norton_simon`, `biexp_tgi`, `two_population_resistance` (2-clone), `acquired_resistance` (2-clone), `simeoni_exp_linear`, `simeoni_tgi` (4-state), `io_tumor_immune` (2-state) |
| survival | population survival S(t \| x) from a TGI metric | `survival_weibull_ph` (parametric), `survival_cox_ph` (nonparametric baseline) |
| transform | algebraic map (exposure → effect, or in-vitro → in-vivo) | `er_emax`, `er_sigmoid_emax`, `er_power`, `ivive_power` |

```mermaid
flowchart LR
    P["dataset/records/*.json"] --> REG["registry: bind record → kernel"]
    REG --> REF["reference: ODE / survival / transform kernels"]
    REG --> B["NONMEM · SBML · PharmML · SO · rxode2 · Pumas · JSON-LD"]
    ANN["annotate: clinicalUse=PROHIBITED · tier · DOI RDF · predictionStatus"] --> B
    REF -. "round-trip: analytic↔ODE (1e-4) · MathML↔rhs per state (1e-6) · cross-format" .-> B
```

The unperturbed growth-law family (spec §2) is complete: exponential, logistic, Gompertz,
Simeoni (exp→linear), von Bertalanffy (dV/dt = a·V^(2/3) − b·V — surface-limited
proliferation minus volume loss, sub-exponential to a carrying capacity V∞ = (a/b)³), and
the power-law (dV/dt = a·V^p, p<1 — sub-exponential but unbounded, the law Benzekry
2014 found best-fitting across many tumor datasets). The laws are distinguished by their
specific growth rate (1/V)dV/dt signature — the analytically-derivable landmark each kernel
must reproduce.
The growth-law assumption is itself a silent model-selection choice, and the field's convenient default, exponential growth, is the one that overestimates. Matched to a power-law at the same baseline and early rate, the exponential extrapolation explodes: ~93× the burden by two years. This is the growth-layer analog of the exposure-response dose-extrapolation axis: invisible at the studied timepoint, dominant on extrapolation.
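The matching construction can be written out with the closed forms of both laws. For dV/dt = a·V^p with p < 1 the solution is V(t) = (V0^(1−p) + (1−p)·a·t)^(1/(1−p)), and the exponential matched at baseline has rate k = a·V0^(p−1). The sketch below uses hypothetical parameter values, so its ratio is illustrative only and is not meant to reproduce the figure quoted above.

```python
import math


def power_law(t, v0, a, p):
    """Closed-form solution of dV/dt = a*V**p for p < 1."""
    return (v0 ** (1.0 - p) + (1.0 - p) * a * t) ** (1.0 / (1.0 - p))


def exponential(t, v0, k):
    return v0 * math.exp(k * t)


# Hypothetical values: baseline 80 mm, exponent 0.75, early rate 0.02 per week.
V0, P_EXP, K_EARLY = 80.0, 0.75, 0.02
A = K_EARLY * V0 ** (1.0 - P_EXP)  # chosen so that a*V0**(p-1) equals K_EARLY


def specific_rate_power(v):
    """(1/V) dV/dt for the power law: falls as the tumor grows."""
    return A * v ** (P_EXP - 1.0)


def overestimate(t):
    """Exponential burden divided by power-law burden at time t (weeks)."""
    return exponential(t, V0, K_EARLY) / power_law(t, V0, A, P_EXP)


for weeks in (0, 8, 26, 52, 104):
    print(weeks, round(overestimate(weeks), 3))
```

Both curves start at the same size with the same specific growth rate, so they are indistinguishable early on and separate only as the horizon lengthens.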
Each ODE kernel declares three independent expressions of the same dynamics: a
closed-form analytic solution, a hand-written rhs, and an rhs_infix string.
CI checks (tests/test_roundtrip.py):
- analytic vs. SciPy ODE integration → agreement to ~1e-4 (single-state closed forms; validates the `rhs`);
- SBML re-parsed: the generated MathML rate law is converted back to an expression and evaluated against `rhs` → ~1e-6, per state (so the multi-state Simeoni system is checked compartment-by-compartment);
- NONMEM re-parsed: `$THETA` initial estimates must equal the dataset values, and one `$DES` compartment is emitted per state;
- rxode2 / Pumas / SO re-parsed: the parameter vector is read back from each and must equal the dataset values;
- cross-format consistency: NONMEM, SBML, PharmML-SO, rxode2, and Pumas must all agree on the parameter values (one source of truth, five renderings).

Multi-state kernels (Simeoni) have no closed form, so the analytic check is skipped for them and the rhs is instead pinned by the per-state SBML round-trip plus behavioral tests (exp→linear growth, dose-dependent shrinkage, transit delay). An export bug therefore cannot ship silently.
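The three-expression idea can be sketched for a single-state kernel. Gompertz growth has the closed form V(t) = Vmax·exp(ln(V0/Vmax)·e^(−kg·t)), which can be compared with a numerical integration of the hand-written right-hand side and with a string form of the same rate law. This is an illustration under assumptions: the parameter values are hypothetical, a hand-written RK4 loop stands in for SciPy, and a restricted `eval` stands in for re-parsing exported MathML.

```python
import math

KG, VMAX, V0 = 0.05, 1000.0, 80.0  # hypothetical values


def analytic(t):
    """Closed-form Gompertz solution."""
    return VMAX * math.exp(math.log(V0 / VMAX) * math.exp(-KG * t))


def rhs(v):
    """Hand-written right-hand side dV/dt."""
    return KG * v * math.log(VMAX / v)


RHS_INFIX = "kg*V*log(Vmax/V)"  # string form of the same rate law


def eval_infix(expr, v):
    # Stand-in for re-parsing an exported rate law; builtins are disabled.
    names = {"kg": KG, "Vmax": VMAX, "V": v, "log": math.log}
    return eval(expr, {"__builtins__": {}}, names)


def integrate(t_end, n_steps=2000):
    """Classical RK4 integration of rhs from V0 over [0, t_end]."""
    h, v = t_end / n_steps, V0
    for _ in range(n_steps):
        s1 = rhs(v)
        s2 = rhs(v + h * s1 / 2)
        s3 = rhs(v + h * s2 / 2)
        s4 = rhs(v + h * s3)
        v += h * (s1 + 2 * s2 + 2 * s3 + s4) / 6
    return v


grid = [0.0, 10.0, 26.0, 52.0, 104.0]
ode_error = max(abs(integrate(t) - analytic(t)) / analytic(t) for t in grid)
infix_error = max(abs(eval_infix(RHS_INFIX, v) - rhs(v)) for v in (10.0, 80.0, 500.0, 999.0))

print(ode_error)    # analytic vs integrated rhs
print(infix_error)  # string form vs hand-written rhs
```

If either expression drifted from the other two (a sign flipped in the string form, say), one of the two errors would jump well past its tolerance.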
Round-trip validation proves the exports agree with the kernel. It does not
prove the kernel reproduces the published model it claims to implement — a kernel
can be internally self-consistent yet be the wrong dynamics. A second,
independent validation axis (tests/test_landmarks.py,
catalogued in docs/validation-landmarks.md)
closes that gap: every kernel is checked against the characteristic,
analytically-derivable landmark of its model.
| Model | Landmark the kernel must reproduce |
|---|---|
| Gompertz | growth rate peaks at V = Vmax/e (the published inflection) |
| Logistic | growth rate peaks at V = Vmax/2 |
| Norton-Simon | tumor is stationary at every size when E = g/k |
| Simeoni TGI | proliferating compartment is static at c* = λ0/k2 (tumor-static concentration) |
| Bi-exponential | nadir at t* = ln(ks·E/kg)/(kg+ks·E) |
| Weibull-PH | median at scale·(ln2)^(1/shape); S_x = S₀^exp(β·x) |
| Emax / sigmoid | half-maximal effect exactly at C = EC50 (any Hill γ) |
| IO tumor-immune | effectors relax to homeostasis s/δ with no tumor |
This is the honest reading of spec §9's "compare against published example simulations": the landmark is the published property, derived from the model's own equations, so no digitized data is fabricated. The two axes are complementary — round-trip catches a mis-encoded export; landmarks catch a mis-implemented model.
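Three of the landmarks in the table can be checked in a few lines, independently of the package. The sketch below is an assumption-labeled illustration (hypothetical function names and parameter values; it is not the contents of `tests/test_landmarks.py`): it finds the bi-exponential nadir numerically and compares it with the formula, evaluates the Weibull-PH survival at the stated median, and confirms the half-maximal effect at EC50 for several Hill exponents.

```python
import math


def biexp_slope(t, ks_e, kg):
    """d/dt of e^(-ks*E*t) + e^(kg*t) - 1 (the y0 factor does not move the nadir)."""
    return -ks_e * math.exp(-ks_e * t) + kg * math.exp(kg * t)


def bisect_root(f, lo, hi, iters=200):
    """Bisection for a sign change of f on [lo, hi]."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def weibull_ph(t, scale, shape, beta=0.0, x=0.0):
    """S(t) = exp(-(t/scale)^shape * e^(beta*x))."""
    return math.exp(-((t / scale) ** shape) * math.exp(beta * x))


def sigmoid_emax(c, e_max, ec50, gamma):
    return e_max * c**gamma / (ec50**gamma + c**gamma)


# Hypothetical parameter values throughout.
KS_E, KG = 0.10, 0.01
t_numeric = bisect_root(lambda t: biexp_slope(t, KS_E, KG), 0.0, 500.0)
t_formula = math.log(KS_E / KG) / (KG + KS_E)

SCALE, SHAPE, BETA, X = 60.0, 1.3, 0.8, -0.2
median = SCALE * math.log(2.0) ** (1.0 / SHAPE)
s_at_median = weibull_ph(median, SCALE, SHAPE)
ph_gap = abs(weibull_ph(40.0, SCALE, SHAPE, BETA, X)
             - weibull_ph(40.0, SCALE, SHAPE) ** math.exp(BETA * X))

half_effects = [sigmoid_emax(100.0, 1.5, 100.0, g) for g in (0.5, 1.0, 3.0)]

print(t_numeric, t_formula)  # nadir: numerical root vs closed-form landmark
print(s_at_median, ph_gap)   # survival at the median, and the PH identity gap
print(half_effects)          # effect at C = EC50 for each Hill exponent
```

Each check compares an implementation against a property derived from the model's own equations, so no digitized data is needed.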
Round-trip and landmark validation both check structure: that the exports match
the kernel and the kernel is the model it names. Neither checks whether a record's
parameter values match a published estimate — and by design, every value in the
dataset ships as review_status: unverified and labeled illustrative until a
human confirms it against the source PDF. docs/verification/workhorse-models.md
is the standing evidence dossier for that third axis: for each workhorse model it
records the confirmed structure, the real published values (with source, access
status, and confidence), the discrepancies the current illustrative numbers carry,
and a maintainer checklist to promote a record to verified. Per
CONTRIBUTING.md, automated review may assemble this evidence
but may never set verified on its own authority — the dossier turns that human
confirmation into a lookup rather than a re-derivation. (For example, it documents
that the Kuznetsov 1994 tumor-immune parameters already match the canonical
published nondimensional set, while flagging that the Claret record's circulating
"values" trace to a known-illustrative third-party default and must not be cited.)
Records filled from a real source but not yet PDF-confirmed by a human carry
review_status: pending_human_review — the honest bridge between an illustrative
placeholder and a human-verified value. onkos review-queue lists exactly
those records and the source behind each parameter, so a researcher can pick one
up, open the cited table, and promote it. The review_status lifecycle
(unverified → pending_human_review → verified, plus contested) is defined in
CONTRIBUTING.md, and the health report summarizes the queue.
The curation fields are exported as JSON-LD so they become real RDF triples,
not JSON that merely uses onkos: keys. The single @context
(dataset/schema/context.jsonld) maps the friendly terms to the onkos:,
bqbiol:, and dcterms: vocabularies; bqbiol:isDescribedBy is typed as @id
so each record's DOI/PMID become resolvable identifiers.org resources. onkos export --format jsonld writes per-record documents, and dataset_jsonld(ds)
emits the whole dataset as one @graph. CI validates this by expanding the
output with rdflib and asserting the expected triples (clinical-use prohibition,
confidence tier, DOI links) actually appear — the linked-data claim is tested, not
asserted.
```python
import onkos
from rdflib import Graph

ds = onkos.load()
g = Graph().parse(data=onkos.export.to_jsonld(ds["resistance.claret_2009.tgi"]), format="json-ld")
# -> (record, bqbiol:isDescribedBy, <https://identifiers.org/doi/10.1200/JCO.2008.21.0807>)
```

| Decision | Rationale |
|---|---|
| Pure Python (NumPy/SciPy); R/Julia only as export targets | Nothing is compute-bound; R/Julia models are generated artifacts, not runtime deps. |
| Dataset is the centerpiece; everything else is presentation | The durable contribution is the curated, tiered, context-annotated parameters. |
| `derivation_context` + `transportability` are first-class | Out-of-context transport is the dominant silent error; machine-enforcing it is the load-bearing idea. |
| IIV CV surfaced on kill/resistance terms | A ~90%-CV term must not present as a point estimate. |
| Tiers + transport warnings propagate; worst input wins | A forecast is only as trustworthy as its least-validated component. |
| Population-level forward simulation only | The line between research tool and clinical tool is exactly individual prediction. Onkos stays on the safe side by construction. |
| Exposure-response is a separate, tiered kernel (not baked into the TGI model) | Potency/uncertainty are drug-specific and reusable; decoupling them lets one ER record drive many TGI models and keeps the PK→effect seam explicit and tier-propagating. |
| Scalar exposure uses the closed form; time-varying PK integrates the ODE | Exactness and speed for the common case; correctness for a full PK profile, where the constant-E closed form would be wrong. |
| Multi-state kernels keep `analytic` optional; an `observable` maps states → the measured quantity | The Simeoni transit model has no closed form. Numerical integration + a per-state SBML round-trip preserve export-correctness guarantees without forcing a closed form; the observable (total weight = Σ compartments) decouples the measured signal from the latent states. |
| Hypothesis-tier (immuno-oncology) is enforced in code, not just documented | A "do not predict" note in prose is easy to ignore. The validator fails the build if an IO record is not tier D, exports carry a machine-readable predictionStatus, and IO is excluded from the clinical view — so the frontier can be explored but never masquerade as validated. |
| Parameter uncertainty is propagated, not just stored | Storing iiv_cv_percent but simulating on central values would let a ~90%-CV term pose as a point estimate. simulate_ensemble samples IIV lognormally (median-preserving) so the reported variability flows into tumor/OS bands — the second uncertainty axis alongside model-selection divergence. |
| TGI metrics are extracted model-agnostically (Stein method), not read from params | Reading k_g/k_s off a record only works for the biexponential; the Claret/Simeoni structures have no such params. Extracting them from the trajectory the way Stein extracts them from RECIST data makes the metric panel uniform across kernels — and recovers the generating rates as a built-in correctness check. |
| Sensitivity uses independent sampling so first-order indices are correlations | Sampling each IIV parameter independently makes the standardized regression coefficient equal the input-target correlation and the squared SRCs partition the variance — a first-order Sobol decomposition with no extra design. It also exposes that CV alone ≠ influence (influence is CV × effect-strength), pointing verification at the parameter that actually moves the prediction. |
| PharmML SO carries IIV as random-effect variance, never as estimate precision (RSE) | The SO's job is to report results, but the dataset curates inter-individual variability, not the precision of the population estimate. Encoding IIV as the variance omega = ln(1+CV²) (the log-scale variance of a lognormal with that CV) is faithful; fabricating an RSE we don't have would not be, so the precision block is deliberately omitted. |
| Linked data is validated by RDF expansion, not just emitted | A JSON file with onkos: keys is not automatically valid JSON-LD. Shipping a single @context, typing isDescribedBy as @id, and having CI expand the output with rdflib to check the triples means the machine-readability claim is enforced rather than assumed. |
| OS and PFS share one mechanism (a tagged survival link), not two code paths | Both endpoints are Weibull-PH links on the same week-8 TGI metric, distinguished only by a structure.endpoint tag and their scale. simulate returns a curve per endpoint found for the context, so adding PFS needed data, not new kernels — and every analysis (divergence, uncertainty, sensitivity) works on either endpoint for free. |
| The Cox link is non-default and opt-in, not an auto-selected competitor | Auto-discovery assumes one link per (context, endpoint). The Cox alternative carries structure.default: false, so it's reachable only via explicit survival_link= — turning "Weibull vs Cox" into a deliberate survival-model-choice comparison instead of a silent collision. Its tabulated baseline rides along in the vt-json / JSON-LD exports. |
| Duration of response is read from the same RECIST episode as the response category, so breadth and durability can't drift apart | response_episode(t, v) returns both the best-response category and the DoR from one observed-baseline trajectory, so ORR (breadth) and DoR (durability) are mutually consistent by construction. That consistency is what lets the dataset show the durability mechanism behind the v0.27 surrogate failure — the highest-ORR model is the shortest-DoR one — and durable responders are right-censored (a lower-bound median with the censored fraction surfaced), never silently dropped to zero. |
| ORR is a population rate read off the same trial as OS, so its surrogacy is measured not assumed | The response endpoint reads the tumor trajectory directly (RECIST best response over the IIV ensemble); pairing each model's ORR with the OS from the same simulated trial turns the contested ORR → OS surrogate into a discordant-pair count. The honest punchline — ORR is faithful under a shrinkage-based survival model and inverts under a tail-driven one — falls out only because both endpoints share one trial, and it explains why high-ORR drugs fail confirmatory OS trials without Onkos ever ranking a treatment. |
| PFS is computed two ways and Onkos privileges neither — the disagreement is the quantity | The mechanistic route (time_to_progression, RECIST progression off the trajectory) and the statistical route (the week-8-keyed PFS link) are both legitimate and calibrated from different data, so the route is a model-selection axis inside one endpoint. For shrink-then-regrow resistance the routes invert the model ranking in every context (the week-8 link can't see the regrowth tail the mechanism does). Reporting both with their ratio — rather than picking one — is the same honesty stance as the OS-metric choice, applied to the endpoint that gates accelerated approvals; the censoring-robust landmark rate keeps the durable non-progressors from biasing the mechanistic median. |
| The structural uncertainties are accounted on one ledger, not just scored in isolation | Each model-selection axis (TGI model, survival metric/structure) was first surfaced alone; onkos.budget is the capstone that decomposes total forecast variance across all of them at once via a balanced two-way ANOVA, so their relative weights — and their interaction — are visible. It names the dominant axis (where standardization has the most leverage) instead of leaving "it depends on your assumptions" as a slogan, and it strictly generalizes the v0.21 within/between split (collapse one factor to recover it). The headline distinction — parameter (reducible by data) vs structural (not) — is the honest opposite of false precision. |
| The metric that bridges tumor dynamics to survival is a declared field, not a hidden constant | Every survival link consumed the week-8 change by default; making it structure.link_metric (default unchanged) exposes the most consequential surrogate choice in early oncology. Shipping a tail-sensitive k_g link beside it shows the choice can invert which model looks better — so Onkos surfaces the surrogate-endpoint debate rather than silently picking a side. An undefined metric (e.g. k_g with no regrowth) maps to the no-effect covariate (baseline hazard), never a fabricated or nan curve. |
| The dashboard owns no logic — it renders a tested, serializable result | The virtual-trial result is a Comparison.to_dict()/to_json() the package builds and tests; the Streamlit file only draws it. That keeps the headline view honest (the same numbers everywhere), lets external simulators ingest the JSON, and means CI catches UI/API drift by lint + compile, not by screenshots. |
| Confidence tiers are audited against evidence, not just hand-set | Spec §5 says tiers are "partly numeric." onkos audit derives the tier that each clinical record's recorded external validation and IIV actually support, and makes validate fail on any inflation. A hand-set tier can't quietly over-claim — the honesty thesis applied to the honesty field itself. |
| The design Fisher information is additive over timepoints, so optimal design is row selection — and it can separate circumstantial from structural flatness | M = Σᵢ sᵢsᵢᵀ means a schedule's information is the sum of its timepoints' contributions, so onkos.design computes the sensitivities once on a dense grid and the D-optimal search is pure linear algebra over row subsets (no re-simulation), reusing the v0.22 Fisher core unchanged. That efficiency is what makes it feasible to compute the best schedule a budget allows — and the result is sharper than evaluating one schedule: the optimal design rescues the parameter whose flatness was a design problem (λ) and leaves the one whose flatness is a model problem (kL), with the biexponential as the control showing that the failure is structural. The reported optimum is the better of the greedy and uniform schedules, so the D-efficiency it claims is never an overstatement. |
| Survival matching is line-aware; an unsupported line yields no curve, not a borrowed one | The line of therapy is part of the context, so a second-line simulation must use second-line survival — matching only on tumor type would silently transport a 1L model. When no curated link exists for a line, the honest result is no survival curve, mirroring the no-fallback rule for tumor type. |
| The PK bridge is a thin illustrative adapter, not a PK toolkit (that's Hypnos) | Onkos's scope is exposure → tumor → survival. onkos.pk exposes only the standard dose↔exposure relations and a profile-ingestion adapter so the composability chain is runnable self-contained; modelling the PK itself stays in Hypnos, and the generators are clearly labelled illustrative. |
| Kill mechanism is a separate subsystem, so it can be a divergence axis | Bundling the kill model into each TGI record would hide that two trials might shrink tumors identically yet predict different outcomes because one assumed log-kill and the other Norton-Simon. The drug_effect subsystem makes the mechanism an explicit, comparable choice — the same "make the silent assumption visible" move as transportability. |
| Resistance carries three nested silent choices — model, mechanism, and origin — and Onkos makes each a separate, matched comparison | A resistance model hides a stack of assumptions: phenomenological-vs-mechanistic (the model, v0.24), and within the mechanistic family, pre-existing-vs-acquired (the origin, v0.32). Each is surfaced by adding a model that differs in exactly one assumption with everything else matched (Claret's kd matched to two-population; two-population's kg/kd/kgr matched to acquired), so the divergence is attributable to that one choice. The acquired kernel is a strict superset (α=0 recovers pre-existing), which is what proves the contrast is the origin and not a parameter artifact. The recurring punchline holds at every layer: matched on early kill, the choices agree at week-8 and on the week-8 OS surrogate yet diverge in the tail — the silent-transport risk, made measurable one assumption at a time. |
| Resistance ships in two forms — phenomenological and mechanistic — so the resistance model is a divergence axis | The Claret model fits resistance as a fading drug effect (an unidentifiable λ); the two-population model derives it from a sensitive/resistant clone split (an interpretable R0). Shipping both, tuned to the same early kill, makes the resistance-mechanism choice an explicit, comparable model-selection axis — and surfaces that a week-8 OS surrogate barely sees the resulting tumor-tail divergence (the silent-transport risk), the honest counterpoint to "we modeled resistance." |
| The drug-interaction model is a model-selection axis, and synergy is an assumption not a finding | A combination's predicted benefit depends on how the two effects are assumed to combine (HSA / additive-Bliss / synergy), so onkos.interaction combines at the effect level and reports the OS divergence across those assumptions rather than picking one. The interaction parameter ψ is a declared input (never fitted from the dataset, flagged when non-zero): distinguishing synergy from additivity needs a combination trial designed for it, and asserting it without one is the over-optimism the divergence exposes. Combination at the effect level (not dose-level Loewe over the ER curves) is the v0.x scope, named not hidden. |
| The architecture is a tested contract, not just a diagram | tests/test_architecture.py asserts every declared subsystem has records, every kernel is bound (no orphans/dead kernels), the CLI export formats match both the builders and the CI sweep, and the public API surface is stable. These checks have already caught real drift (an empty drug_effect subsystem; a CI export loop missing so/jsonld), so the diagrams above stay honest. |
| Model-averaging weights are combination weights, never model posteriors | Posterior model probabilities P(model|data) require the candidates to share one dataset; Onkos models are fit to different trials, so a posterior is not identifiable and would be invented. Framing the weights as Bates–Granger forecast-combination weights (and printing that everywhere) keeps the no-false-precision discipline; the headline output is a fraction of uncertainty no better estimation can remove, not a manufactured central probability. |
| The model average is structurally inseparable from its disagreement | A single combined curve looks like an answer, so ModelAverage cannot be serialized or drawn without its model_selection_fraction and worst tier; averaging cannot raise a tier, never rehabilitates an excluded model, and M=1 returns fraction 0 with a warning. The combiner is post-processing over compare, validated by a landmark suite proving it is the law of total variance and a convex combination. |
| Composable with Hypnos | A shared export/annotation convention lets a Hypnos PK record drive an Onkos TGI model end to end via an exposure-response record. |
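
The two model-averaging rows above rest on the law of total variance: the variance of a combined forecast splits into a weighted within-model (parameter) part and a between-model (model-selection) part. A minimal sketch of that split, assuming each model is summarized by a mean and a variance of one forecast quantity. `split_variance` and the numbers are hypothetical illustrations, not the onkos.combine API or dataset values.

```python
from typing import Sequence, Tuple


def split_variance(
    means: Sequence[float],
    variances: Sequence[float],
    weights: Sequence[float],
) -> Tuple[float, float, float]:
    """Return (within, between, model_selection_fraction)."""
    if not (len(means) == len(variances) == len(weights)) or not means:
        raise ValueError("need one mean, variance and weight per model")
    total_w = sum(weights)
    if total_w <= 0 or any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative and sum to > 0")
    w = [x / total_w for x in weights]  # normalize to a convex combination

    combined_mean = sum(wi * mi for wi, mi in zip(w, means))
    # Within: weighted average of each model's own (parameter) variance.
    within = sum(wi * vi for wi, vi in zip(w, variances))
    # Between: weighted spread of the model means around the combined mean.
    between = sum(wi * (mi - combined_mean) ** 2 for wi, mi in zip(w, means))
    total = within + between
    fraction = between / total if total > 0 else 0.0
    return within, between, fraction


# Hypothetical median-OS forecasts (weeks) from three models, equal weights.
within, between, fraction = split_variance(
    means=[80.0, 92.0, 101.0],
    variances=[36.0, 49.0, 25.0],
    weights=[1.0, 1.0, 1.0],
)
print(f"within={within:.1f} between={between:.1f} fraction={fraction:.2f}")
```

With a single model the between term is zero, so the fraction is 0; the sketch returns that silently, whereas the row above says the package also warns.
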
onkos/
├── dataset/ # SOURCE OF TRUTH
│ ├── schema/ # JSON Schema + JSON-LD context
│ ├── records/ # one JSON per model / context-baseline
│ └── citations/ # Crossref/PubMed citation records
├── python/onkos/
│ ├── load · filter · validate · tiers · simulate · metrics · pk · compare · uncertainty · sensitivity
│ ├── combine · identify · design · interaction · budget · response · audit · report · cli
│ ├── joint · dose_response · early_surrogate · discriminability · atlas # model-selection axes
│ ├── py.typed # PEP 561 typing marker
│ └── export/ # registry · reference · nonmem · sbml · pharmml · pharmml_so
│ · rxode2 · pumas · virtual_trial_json · jsonld · combine · annotate
├── dashboard/app.py # Streamlit: browse + divergence view
├── notebooks/ # executed in CI (nbmake)
├── scripts/ # sync_dataset_into_package · make_figures
├── tests/ # schema · simulate · round-trip · CLI · report · …
├── docs/ # essay · specs/v0.1/spec.md · dataset-health.md · images/
├── CHANGELOG.md · CITATION.cff · .zenodo.json # release metadata
└── .github/workflows/ci.yml # lint · test (3.9–3.12) · exports · releasable wheel
In scope: unperturbed growth laws; drug-effect/kill models; resistance/regrowth (the λ term); exposure-response links; TGI-derived metrics; TGI-metric → survival models; tumor-type/line baselines; a separated preclinical-translation subsystem; immuno-oncology only as a hypothesis-tier, non-predictive subsystem.
Out of scope (hard line, not a roadmap item): any per-patient prognosis, survival estimate for a real person, treatment recommendation, or therapy ranking. The tell that the project has crossed its line is any feature that takes a real patient's tumor measurement and returns a prognosis or a therapy choice. That feature does not get built. See spec §10.
| Phase | Content | Status |
|---|---|---|
| A — TGI spine | Growth laws + Claret TGI + NSCLC context + TGI→OS link + divergence view; NONMEM + SBML; round-trip validation. | ✅ v0.1 |
| B — Resistance + exposure-response | Emax / sigmoid-Emax / power ER kernels driving the kill term; scalar and time-varying PK-driven simulation (Hypnos composability); ER tier + transportability propagation; PharmML + rxode2/Pumas; IIV-CV surfaced. | ✅ v0.2 |
| C — Survival + baselines | tumor_type_baselines library + per-context Weibull-PH survival links across NSCLC, breast, CRC, HCC, melanoma; ≥2 eligible TGI models per context; cross-context divergence; orphan-record invariant enforced in CI. | ✅ v0.3 |
| D — Preclinical translation | Multi-state ODE framework; Simeoni 2004 xenograft model (exp→linear growth + signal-distribution transit chain); in-vitro→in-vivo potency translation; per-state SBML/NONMEM export + round-trip. | ✅ v0.4 |
| E — Immuno-oncology | Kuznetsov tumor–immune QSP, hypothesis-tier (tier D), non-predictive; tier-D enforced by the validator; DO-NOT-PREDICT annotation on every export; excluded from the clinical view. | ✅ v0.5 |
| F — Hardening | External-validation backfill (coverage 25/25); onkos report health report with CI sync gate; wheel-build releasability proven in CI; .omex, .zenodo.json, CHANGELOG.md, py.typed, CITATION.cff. | ✅ v0.6 |
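
Phase C's per-context Weibull-PH survival links can be read as one formula: a Weibull baseline whose hazard is scaled by exp(β·x), where x is the tumor-dynamics metric. A minimal sketch under that standard proportional-hazards parameterization. The shape, scale, and β values and the choice of covariate (relative tumor-size change at week 8) are hypothetical, not curated Onkos records.

```python
import math


def weibull_ph_survival(t, shape, scale, beta, x):
    """S(t | x) = exp(-(t / scale)**shape * exp(beta * x)) for t >= 0."""
    if t < 0:
        raise ValueError("t must be non-negative")
    return math.exp(-((t / scale) ** shape) * math.exp(beta * x))


def weibull_ph_median(shape, scale, beta, x):
    """Time at which S(t | x) = 0.5, obtained by inverting the formula above."""
    return scale * (math.log(2.0) / math.exp(beta * x)) ** (1.0 / shape)


# Hypothetical link: time in weeks, x = relative size change at week 8.
SHAPE, SCALE, BETA = 1.3, 70.0, 1.5

baseline_median = weibull_ph_median(SHAPE, SCALE, BETA, x=0.0)    # no change
shrinkage_median = weibull_ph_median(SHAPE, SCALE, BETA, x=-0.3)  # 30% shrinkage
print(f"median OS: {baseline_median:.1f} wk (x=0) vs {shrinkage_median:.1f} wk (x=-0.3)")
```

Setting x = 0 reproduces the baseline curve, which is the behaviour described for an undefined metric: it maps to the no-effect covariate instead of a fabricated value.
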
The phased roadmap (spec §11, Phases A–F) is fully implemented. Work since then
follows a research track (docs/specs/research/) that deepens the project's
own thesis rather than adding breadth. Every spec — the v0.1 design spec and all
21 research specs — is implemented, tested, and shipped;
docs/specs/STATUS.md is the traceability index, and
tests/test_specs_complete.py enforces the
spec ↔ implementation mapping so a written-but-unbuilt spec fails CI.
| Research track | Content | Status |
|---|---|---|
| Model-selection uncertainty | onkos.combine: law-of-total-variance split of a composed forecast into parameter (within) vs model-selection (between) variance; the model_selection_fraction; declared equal/tier/evidence combination weights with cross-scheme fragility; the model-averaged S̄(t) curve + between-model band; report ranks contexts by irreducible model-choice risk. | ✅ v0.21 |
| Practical identifiability | onkos.identify: the design Fisher information + Cramér–Rao RSE + Brun collinearity index over the existing kernels; measures whether a realistic trial could estimate each parameter, pairs predicted RSE with stored IIV CV (flagging flat-likelihood-artifact CVs), and ranks models a realistic design cannot support; landmark-proven; cannot move a tier. | ✅ v0.22 |
| Combination interaction | onkos.interaction: combines two single-agent effects under declared interaction nulls (HSA / additive-Bliss / Greco interaction index ψ) and propagates through the existing TGI → survival chain; the interaction model becomes a quantified model-selection axis with its own OS divergence; synergy is a declared assumption, never fitted; the Bliss≡additive identity is landmark-tested; cannot rank regimens or raise a tier. | ✅ v0.23 |
| Mechanistic resistance | two_population_resistance kernel + record: the Goldie-Coldman sensitive/resistant two-clone model replaces the phenomenological decay-of-effect λ with an interpretable resistant burden R0; the resistance mechanism becomes a model-selection axis (phenomenological vs mechanistic), raising the NSCLC model-selection fraction 0.39 → 0.47; both compartments round-trip to SBML/NONMEM; landmark-tested; reveals that a week-8 OS surrogate is nearly blind to the resistance-model choice. | ✅ v0.24 |
| Survival-metric choice | structure.link_metric makes the on-treatment metric that drives a survival link a declared, swappable field (default unchanged); a non-default growth-rate-constant (k_g) OS link is added. Completes v0.24: the metric choice inverts the resistance-model ranking (two-pop > Claret under week-8; Claret > two-pop under k_g) and re-ranks a complete responder from last to first — making the surrogate-endpoint debate computable. Near-zero code; landmark-tested; the default is sacred. | ✅ v0.25 |
| Model-selection budget (capstone) | onkos.budget: a balanced two-way variance-component decomposition (ANOVA / first-order Sobol over the structural factors) puts every structural choice on one ledger — Var(Q) = WITHIN(parameter) + V_model + V_link + V_inter — naming the dominant axis (where standardization buys the most). Strict generalization of the v0.21 split (collapse factor B to recover it); landmark-proven. Capstone finding: ~68% of the NSCLC OS forecast is irreducible structural risk and the model×link interaction dominates. Report ranks contexts by structure- vs parameter-dominance. | ✅ v0.26 |
| RECIST response & ORR surrogacy | onkos.response: RECIST 1.1 best response (CR/PR/SD/PD) → population ORR / DCR over the IIV ensemble — the dominant phase-2 endpoint, previously absent. response_vs_survival reads ORR and OS off the same trial and counts discordant model pairs, showing the ORR → OS surrogate is conditional on the survival mechanism: faithful under the week-8 link (0/6), inverted under the k_g link (4/6, the high-responder has the shortest OS). Pure post-processing; landmark-tested; population level, no therapy ranking. | ✅ v0.27 |
| Duration of response | response_episode returns best response and DoR from one trajectory; ORR (breadth) gains median DoR (durability) over the ensemble with honest right-censoring. Depth ≠ durability: the highest-ORR NSCLC model has the shortest DoR, the mechanism of the v0.27 surrogate failure (broad but brief responses → worst tail-driven OS). Pure post-processing; landmark-tested; population level. | ✅ v0.28 |
| Cross-context generalization (breadth) | A mechanistic two-population model + a tail-sensitive k_g OS link added to breast, CRC, HCC, melanoma (8 records). The resistance-mechanism divergence, the budget's survival-link axis, the conditional ORR→OS surrogacy, and depth≠durability all reproduce across five solid-tumor contexts — CI-enforced (tests/test_response.py, tests/test_budget.py). 5/6 contexts now structure-dominated. Turns four single-context demos into a dataset-wide claim. | ✅ v0.29 |
| PFS endpoint — two routes | progression_free_survival / pfs_route_divergence: PFS computed both ways — the statistical week-8-keyed survival link and the mechanistic RECIST time-to-progression off the trajectory (time_to_progression) — over the same trial. The route is a model-selection axis: the two-population model is shortest mechanically yet longest statistically (the week-8 link is blind to the resistant-clone regrowth), inverting the model ranking in all five contexts (NSCLC 2/6, others 1/3). Pure post-processing; landmark-tested; neither route privileged; population level, no therapy ranking. | ✅ v0.30 |
| D-optimal trial design | onkos.design: the best sampling schedule a fixed budget allows, maximizing det(M) over the v0.22 design Fisher information (additive over timepoints → pure linear algebra, no re-simulation). Separates circumstantial from structural unidentifiability: for Claret NSCLC the optimal design rescues the borderline λ across the 50% line but the deeply flat kL stays unidentifiable under the best schedule (D-efficiency ≈ 1.14); the 2-parameter biexponential is fully identifiable (the control). Pure post-processing over the existing Fisher core; landmark-tested; cannot move a tier; design level, no per-patient schedule. | ✅ v0.31 |
| Acquired resistance | acquired_resistance kernel + record: the resistance origin (drug-induced switching at rate α, resistance generated from R0=0) as a model-selection axis one layer below the v0.24 mechanism axis. Matched on kg/kd/kgr to the pre-existing two-population model, the origins agree at week-8 and on the week-8 OS surrogate (92 vs 94 wk) but diverge in the tail (nadir 8.0 vs 2.8 mm, RECIST TTP 26 vs 32 wk) — a silent assumption the surrogate misses, the mechanistic PFS catches. α=0 recovers the pre-existing model (strict generalization); round-trip validated; landmark-tested; tier C, no therapy ranking. | ✅ v0.32 |
| Integrated tumor burden (third bridge metric) | log_burden_auc (the time-averaged log relative tumor size — the AUC of the log-size curve) added to the Stein/Bruno panel + a non-default survival_link.nsclc_os_burden_auc. The one metric that integrates both depth and tail re-ranks the NSCLC models a third, distinct way (vs week-8 and k_g), so "which bridge metric" stays a live axis even for a "comprehensive" metric — and it repairs k_g's depth-blindness (the pure-tail metric ranks a never-shrinking tumor 2nd; the integrated burden ranks it last). NSCLC/first now has four eligible OS links for the budget. Pure post-processing + one record; default view byte-identical; landmark-tested; population level, no therapy ranking. | ✅ v0.33 |
| Joint longitudinal–survival | onkos.joint: the current-value link makes the instantaneous hazard track the current tumor size (λ(t)=λ₀(t)·exp(α·log(v/y0))) — the rigorous, two-stage-free survival model. A strict generalization of proportional hazards (a constant HR recovers the two-stage Weibull-PH curve exactly). The hazard ratio is time-varying (rises 10×–255× as a resistant clone regrows; falls for a complete responder) — a non-proportional hazard the two-stage links can't encode — and it inverts the week-8 resistance-model ranking (two-pop > Claret under week-8; Claret > two-pop under the joint link). So survival-link structure (two-stage PH vs joint) is a model-selection axis. Pure post-processing, no record/kernel/export change; α declared not fitted; landmark-tested; population level, no therapy ranking. | ✅ v0.34 |
| Dose-level Loewe additivity | onkos.interaction extension: combines two doses through the dose-response curves via the isobole d_A/D_A(E)+d_B/D_B(E)=1, beside the v0.23 effect-level nulls. The "no-interaction" reference is itself a model-selection axis — Loewe is the only one satisfying the sham-combination identity (a drug with itself is exactly additive), Bliss overstates (can exceed either drug's max effect), HSA understates. The same dose pair gives combined effect 0.90/1.07/1.60 and median OS 88/92/101 wk across HSA/Loewe/Bliss; the gap grows with dose. Pure post-processing over the curated ER curves (analytic inverses), no new record/kernel/export; reference declared not fitted; landmark-tested (sham identity exact); population level, no dose/therapy ranking. | ✅ v0.35 |
| ER-model dose-extrapolation | onkos.dose_response: the upstream exposure-response model choice (Emax / power / sigmoid-Emax) as a model-selection axis. Re-anchors the shapes to agree at the studied dose, then quantifies how their effect — and OS — diverge off it: 0 at the studied dose (control), ≈19 wk at quarter-dose, sharpest on de-escalation (the dose-finding question). The transportability thesis with the dose as the context. Pure post-processing over the curated ER shapes, no new record/kernel/export; shapes re-anchored not refit; landmark-tested; population level, no dose recommendation. | ✅ v0.36 |
| Early-surrogate timing | onkos.early_surrogate: when the surrogate is read (the ctDNA push to week 2–4 vs RECIST week 8) as a model-selection axis orthogonal to which metric. ctDNA modeled as burden-proportional, so the axis is readout time. Discordance against a tail-aware durable-benefit ranking falls monotonically with the landmark (NSCLC 9/10 at week 2 → 3/10 at week 52); early landmarks over-reward the deep-but-doomed resistance models the durable-benefit ranking puts last. Reproduces across 5 contexts. Pure post-processing, no new record/kernel/export; landmark grid declared; landmark-tested; population level, no go/no-go. | ✅ v0.37 |
| Model discriminability | onkos.discriminability: the rigorous close of the model-selection arc — given two models' OS curves, the required trial events to distinguish them (Schoenfeld logrank, d=4(z_α+z_β)²/(ln HR)²). Under week-8 OS the resistance mechanism/origin pairs need 10⁴–10⁵ events (Claret vs two-pop ~11.8k, vs acquired ~103k) — practically unidentifiable, so the silent model-selection risk can only be assumed, not resolved by data; early-shrinkage-distinct pairs need ~60–90. The model-level twin of identify/design. Pure post-processing over the OS curves, no new record/kernel/export; landmark-tested; design/trial level, no trial designed, no recommendation. | ✅ v0.38 |
| Model-selection atlas (synthesis) | onkos.atlas: a declarative registry (AXES) of every model-selection axis + a one-call per-context survey reporting each axis's native headline — the synthesis layer over eighteen versions. NSCLC OS-swing leaderboard: survival structure ~108 wk > metric ~97 > TGI model ~41 > ER shape ~22; plus the detectability axes (8/10 early-misranked, 4/10 indistinguishable). Deliberately a survey, not a decomposition (comparable=False, points to the budget). Pure orchestration, no new record/kernel/export; landmark-tested. Ships with a housekeeping doc-drift fix (architecture diagram + repo layout refreshed). | ✅ v0.39 |
| Von Bertalanffy growth | growth_von_bertalanffy kernel + record completes the spec §2 growth-law family. Surface-area-limited dV/dt = a·V^(2/3) − b·V (closed form via u=V^(1/3)), sub-exponential to V∞=(a/b)³. First kernel with a fractional-power rhs_infix, exercising the MathML power round-trip. Scientific-landmark-validated (carrying capacity, surface-limited inflection below V∞/2, monotone-falling specific rate). A different kind of work — a first-class reference kernel, not an analysis axis. | ✅ v0.40 |
| Power-law growth | growth_power_law kernel + record: the unbounded sub-exponential law dV/dt = a·V^p (p<1), Benzekry's empirically best-fitting unperturbed model. Closed form V(t)=(V0^(1−p)+a(1−p)t)^(1/(1−p)). The finding: matched early, assuming exponential overestimates extrapolated burden — ~93× by two years — the growth-layer analog of the ER dose-extrapolation axis. Scientific-landmark-validated (sub-exponential, strictly below the rate-matched exponential). Reference kernel + record; no new module/CLI. | ✅ v0.41 |
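
The power-law row quotes a closed form and a "strictly below the rate-matched exponential" landmark; both are easy to check numerically. A minimal sketch, assuming "matched early" means equal growth rate at t = 0. The function names and the values of V0, a, and p are hypothetical, so the ratio printed here is not the ~93× figure from the curated record.

```python
import math


def power_law_volume(t, v0, a, p):
    """Closed-form solution of dV/dt = a * V**p with V(0) = v0, for p < 1."""
    if not p < 1:
        raise ValueError("this closed form assumes p < 1")
    return (v0 ** (1 - p) + a * (1 - p) * t) ** (1 / (1 - p))


def matched_exponential_volume(t, v0, a, p):
    """Exponential growth sharing the power law's initial rate a * v0**p."""
    rate = a * v0 ** (p - 1)  # specific growth rate at t = 0
    return v0 * math.exp(rate * t)


# Hypothetical values: volume in mm^3, time in weeks.
V0, A, P = 100.0, 0.1, 0.75

for weeks in (0, 26, 52, 104):
    pl = power_law_volume(weeks, V0, A, P)
    ex = matched_exponential_volume(weeks, V0, A, P)
    print(f"week {weeks:3d}: power-law {pl:8.1f}  exponential {ex:8.1f}  ratio {ex / pl:.2f}")
```

The two curves start together and the ratio widens with the horizon, which is the extrapolation risk the row describes.
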
Remaining work is breadth and verification: promoting unverified records to
verified from source PDFs, adding more drugs / tumor types / lines, and the
further steps in the research specs (see CONTRIBUTING.md and docs/specs/).
- Code: MIT (LICENSE).
- Dataset: CC-BY-4.0 (LICENSE-DATASET).
- Citation: CITATION.cff. When you use a record, cite Onkos and the original source via record.primary_citation.doi.
Sibling projects: Nidus (gestational physiology, per-parameter tier) and Hypnos (anesthetic PK/PD, applicability envelope). Hypnos and Onkos compose: a Hypnos PK record can drive the exposure-response of an Onkos TGI model, giving an open, tier-annotated PK → exposure → tumor-dynamics → survival chain.
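
The front of that chain (dose → exposure → drug effect) can be sketched with two textbook relations: AUC = F·dose/CL for linear PK, and a sigmoid-Emax exposure-response curve. This is an illustration only; the function names and every numeric value are hypothetical, and it does not call the Hypnos or Onkos APIs.

```python
def auc_per_dose(dose_mg, clearance_l_per_h, bioavailability=1.0):
    """Exposure for linear PK: AUC = F * dose / CL, in mg*h/L."""
    if clearance_l_per_h <= 0:
        raise ValueError("clearance must be positive")
    return bioavailability * dose_mg / clearance_l_per_h


def emax_effect(exposure, emax, ec50, hill=1.0):
    """Sigmoid-Emax exposure-response; hill=1 reduces to the plain Emax curve."""
    if exposure < 0 or ec50 <= 0:
        raise ValueError("exposure must be >= 0 and ec50 > 0")
    scaled = exposure ** hill
    return emax * scaled / (ec50 ** hill + scaled)


# Hypothetical drug: CL = 10 L/h, full bioavailability, EC50 = 5 mg*h/L.
for dose in (25.0, 50.0, 100.0):
    auc = auc_per_dose(dose, clearance_l_per_h=10.0)
    effect = emax_effect(auc, emax=1.0, ec50=5.0)
    print(f"dose {dose:5.1f} mg -> AUC {auc:4.1f} mg*h/L -> effect {effect:.2f}")
```

The resulting effect is the quantity that would scale the kill term of a TGI model further down the chain.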

{
  "id": "resistance.claret_2009.tgi",
  "kind": "model",
  "purpose": "tgi",
  "subsystem": "resistance",
  "kernel": "claret_tgi",
  "structure": {
    "growth_law": "exponential",
    "kill_model": "first_order_exposure_driven",
    "resistance": "exponential_decay_of_kill"
  },
  "parameters": [
    { "symbol": "kL", "tier": "B", "value": {"central": 0.021, "units": "1/week"} },
    { "symbol": "kD", "tier": "C", "iiv_cv_percent": 89, "value": {"central": 0.30, "units": "1/week per effect-unit"} },
    { "symbol": "lambda", "tier": "C", "iiv_cv_percent": 96, "value": {"central": 0.061, "units": "1/week"} }
  ],
  "derivation_context": {
    "drug": "dacomitinib",
    "drug_class": "EGFR_TKI",
    "tumor_type": "NSCLC",
    "line_of_therapy": "first"
  },
  "transportability": {
    "validated_tumor_types": ["NSCLC"],
    "out_of_context_action": "tier_down_to_D and warn"
  },
  "tier": "C",
  "review_status": "unverified",
  "primary_citation": "claret-2009-tgi"
}
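
The record's structure block (exponential growth, first-order exposure-driven kill, exponentially decaying kill) matches the usual Claret form dy/dt = kL·y − kD·E·exp(−λt)·y, which has the closed form y(t) = y0·exp(kL·t − kD·E·(1 − exp(−λt))/λ). A minimal sketch using the record's central values, assuming a constant drug effect E and that form. Whether the claret_tgi kernel is implemented exactly this way is an assumption, and the effect value is hypothetical.

```python
import math

# Central values copied from the record above (all per week).
PARAMS = {"kL": 0.021, "kD": 0.30, "lambda": 0.061}


def claret_relative_size(t_weeks, effect, params=PARAMS):
    """Tumor size relative to baseline, y(t) / y0, under a constant effect."""
    kl, kd, lam = params["kL"], params["kD"], params["lambda"]
    # Integral of the decaying kill rate kD * E * exp(-lambda * s) from 0 to t.
    cumulative_kill = kd * effect * (1.0 - math.exp(-lam * t_weeks)) / lam
    return math.exp(kl * t_weeks - cumulative_kill)


def nadir_time(effect, params=PARAMS):
    """Week at which dy/dt = 0; None when the initial kill never exceeds growth."""
    kl, kd, lam = params["kL"], params["kD"], params["lambda"]
    if kd * effect <= kl:
        return None
    return math.log(kd * effect / kl) / lam


EFFECT = 0.3  # hypothetical constant effect, in the record's "effect-unit"
t_nadir = nadir_time(EFFECT)
print(f"relative size at week 8: {claret_relative_size(8.0, EFFECT):.3f}")
print(f"nadir at week {t_nadir:.1f}: {claret_relative_size(t_nadir, EFFECT):.3f}")
```

After the nadir the fading kill term loses to kL and the tumor regrows, the tail behaviour that a week-8 readout cannot see.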