Three production-style health-AI systems built around the engineering that makes clinical ML trustworthy - calibrated ranking, conformal alerting, drift monitoring, and a privacy release gate. ~11,500 lines of Python and a 396-test suite. The demos run on real public data where it exists (ClinicalTrials.gov, the open MIMIC-IV demo, Synthea); the learned models and their labels use synthetic stand-ins only where there is no public ground truth.
| # | Project | Domain | Core contribution |
|---|---|---|---|
| 1 | OncoBoard-MM | Precision oncology | Multimodal patient–trial matching with calibrated ranking and evidence traces |
| 2 | AcuteWatch-FM | Hospital deterioration | Real-time multi-horizon risk prediction with conformal alerting and drift monitoring |
| 3 | SynGuard-EHR | Synthetic data governance | Privacy-preserving synthetic EHR generation with attack-suite evaluation and release gating |
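As a rough illustration of what "conformal alerting" can mean for a binary risk score, here is a minimal split-conformal sketch. It is not taken from the AcuteWatch-FM code: the function names, the nonconformity score (1 minus the probability of the true class), and the "alert only when the prediction set is exactly {1}" rule are all assumptions made for the example.

```python
import math
import numpy as np


def conformal_quantile(cal_probs, cal_labels, alpha=0.1):
    """Split-conformal threshold on the score 1 - p(true class).

    cal_probs: predicted probability of the positive class on a held-out
    calibration set; cal_labels: the matching 0/1 outcomes.
    """
    cal_probs = np.asarray(cal_probs, dtype=float)
    cal_labels = np.asarray(cal_labels, dtype=int)
    p_true = np.where(cal_labels == 1, cal_probs, 1.0 - cal_probs)
    scores = np.sort(1.0 - p_true)
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # finite-sample corrected rank
    if k > n:
        # Too few calibration points for this alpha: keep every label plausible.
        return 1.0
    return float(scores[k - 1])


def prediction_set(prob_positive, qhat):
    """Labels whose nonconformity score does not exceed the threshold."""
    labels = set()
    if 1.0 - prob_positive <= qhat:  # score if the true label were 1
        labels.add(1)
    if prob_positive <= qhat:        # score if the true label were 0
        labels.add(0)
    return labels


def should_alert(prob_positive, qhat):
    """Hypothetical alert rule: fire only when 1 is the sole plausible label."""
    return prediction_set(prob_positive, qhat) == {1}


# Toy calibration data, invented for this example.
cal_probs = [0.9, 0.8, 0.2, 0.1, 0.7, 0.3, 0.95, 0.05, 0.6]
cal_labels = [1, 1, 0, 0, 1, 0, 1, 0, 0]
qhat = conformal_quantile(cal_probs, cal_labels, alpha=0.2)
print(qhat, should_alert(0.9, qhat), should_alert(0.5, qhat))
```

Under the usual exchangeability assumption, prediction sets built this way contain the true label with probability at least 1 - alpha. Requiring an unambiguous set before alerting is one design choice among several; a real system would also have to handle the ambiguous and empty sets explicitly.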
The three projects form a deliberate reuse chain:

    OncoBoard-MM                   AcuteWatch-FM                  SynGuard-EHR
    ─────────────                  ─────────────                  ────────────
    Modality masks ──────────────► Modality masks (5 modalities)
    Gated fusion ────────────────► Gated multimodal fusion
    Conformal prediction ────────► Conformal alerting ──────────► Release gate thresholds
    Synthetic oncology cohorts ──► ─────────────────────────────► Generator + privacy screening
    Calibration audit ───────────► Drift monitoring ────────────► Fairness auditing
    Sparse retrieval (evidence) ─► Guideline retrieval ─────────────────────────

OncoBoard-MM establishes the core patterns: modality-mask conventions, gated late fusion, conformal uncertainty, and calibration auditing.
AcuteWatch-FM extends these to longitudinal hospital data with irregular time-series, multi-horizon prediction, and operational governance (drift detection, alert budgets).
SynGuard-EHR inherits the privacy screening and calibration machinery to build a governed synthetic-data workbench where the evaluator and release gate - not just the generator - are the primary deliverables.
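To make the release-gate idea concrete, the sketch below compares each evaluation metric against a threshold and blocks the release if any check fails. The `Check` class, `release_decision` function, and fail-closed behavior on an empty check list are hypothetical and are not SynGuard-EHR's actual API. The PRIV-01 values (0.61 against a 0.05 threshold) come from the demo run reported in this README; the `FID-01` check and its numbers are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Check:
    """One gate check: a measured value compared against a threshold."""
    check_id: str
    value: float
    threshold: float
    higher_is_worse: bool = True  # e.g. privacy leakage; False for fidelity-style scores

    @property
    def passed(self) -> bool:
        if self.higher_is_worse:
            return self.value <= self.threshold
        return self.value >= self.threshold


def release_decision(checks):
    """Return ("RELEASE" | "BLOCK", list of failed check ids)."""
    checks = list(checks)
    if not checks:
        # Fail closed: no evidence means no release.
        return "BLOCK", ["NO-CHECKS-RUN"]
    failures = [c.check_id for c in checks if not c.passed]
    return ("BLOCK" if failures else "RELEASE"), failures


checks = [
    Check("PRIV-01", value=0.61, threshold=0.05),  # values from the demo run
    Check("FID-01", value=0.93, threshold=0.80, higher_is_worse=False),  # hypothetical
]
decision, failures = release_decision(checks)
print(decision, failures)
```

Keeping the decision a pure function of recorded check results makes it easy to audit why a dataset was blocked, which fits the emphasis on the evaluator and gate as deliverables.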
Each sub-project includes a demo notebook that runs end-to-end on real public data, with synthetic stand-ins for the learned models and labels where no public ground truth exists. Headline numbers from the most recent run:
| Project | What runs end-to-end | One headline number |
|---|---|---|
| OncoBoard-MM | profile build → biomarker normalization → ranker training → calibration audit | Brier 0.0185, ECE 0.0824 on 500 synthetic predictions |
| AcuteWatch-FM | event ingest → leakage-safe windows → 9-head multi-task training → conformal alerts → drift | All 9 outcome-horizon heads train to validation accuracy 0.976; drift monitor flags a 0.82-std heart-rate shift |
| SynGuard-EHR | generate → privacy attacks → fidelity → fairness → release gate → FHIR export | Release gate correctly BLOCKS a deliberately leaky generator against a real Synthea cohort (PRIV-01 distinguishability 0.61 vs 0.05 threshold) |
The data backbones are real (ClinicalTrials.gov trials, the open MIMIC-IV demo, a Synthea cohort); the learned rankers/models, their labels, and the calibration/conformal figures are synthetic stand-ins where no public match/outcome labels exist. These are pipeline demonstrations, not predictive benchmarks - see each sub-project's Demonstration and Status sections for exactly what is real versus synthetic.
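For readers unfamiliar with the two calibration metrics quoted above, the following is a minimal sketch of how a Brier score and an expected calibration error (ECE) can be computed for binary predictions. It assumes ten equal-width bins over the positive-class probability; the projects' own audit code may bin differently, and the function names and toy inputs here are made up for the example.

```python
import numpy as np


def brier_score(probs, labels):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=float)
    return float(np.mean((probs - labels) ** 2))


def expected_calibration_error(probs, labels, n_bins=10):
    """Weighted average gap between mean prediction and observed rate per bin."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=float)
    # Equal-width bins on [0, 1]; a probability of exactly 1.0 goes in the last bin.
    bin_ids = np.minimum((probs * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        in_bin = bin_ids == b
        if not in_bin.any():
            continue
        gap = abs(probs[in_bin].mean() - labels[in_bin].mean())
        ece += in_bin.mean() * gap  # in_bin.mean() is the share of samples in this bin
    return float(ece)


# Toy inputs, not the demo's 500 synthetic predictions.
probs = [0.1, 0.1, 0.9, 0.9]
labels = [0, 0, 1, 1]
print(brier_score(probs, labels), expected_calibration_error(probs, labels))
```

The Brier score mixes calibration and discrimination, while ECE isolates the calibration gap but depends on the binning scheme, so the two numbers are best read together.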
Each project is a self-contained Python package. To install any one of them and run its test suite:

    cd project_01_oncoboard_mm  # or project_02_acutewatch_fm or project_03_synguard_ehr
    pip install -e .
    PYTHONPATH=src pytest tests/ -v

Each project includes a demo notebook under notebooks/ that runs a worked example with actual metrics:

    cd project_01_oncoboard_mm/notebooks
    jupyter notebook demo_trial_matching.ipynb

See references.md for the full bibliography referenced in each project's README.
MIT - see LICENSE.