kbd0011/Health_Portfolio

Health AI Research Portfolio

Three production-style health-AI systems built around the engineering that makes clinical ML trustworthy - calibrated ranking, conformal alerting, drift monitoring, and a privacy release gate. ~11,500 lines of Python and a 396-test suite. The demos run on real public data where it exists (ClinicalTrials.gov, the open MIMIC-IV demo, Synthea); the learned models and their labels use synthetic stand-ins only where there is no public ground truth.

The Three Projects

| # | Project | Domain | Core contribution |
|---|---------|--------|-------------------|
| 1 | OncoBoard-MM | Precision oncology | Multimodal patient–trial matching with calibrated ranking and evidence traces |
| 2 | AcuteWatch-FM | Hospital deterioration | Real-time multi-horizon risk prediction with conformal alerting and drift monitoring |
| 3 | SynGuard-EHR | Synthetic data governance | Privacy-preserving synthetic EHR generation with attack-suite evaluation and release gating |

How They Connect

The three projects form a deliberate reuse chain:

OncoBoard-MM                    AcuteWatch-FM                    SynGuard-EHR
─────────────                   ─────────────                    ────────────
Modality masks ──────────────►  Modality masks (5 modalities)
Gated fusion ────────────────►  Gated multimodal fusion
Conformal prediction ────────►  Conformal alerting ──────────►   Release gate thresholds
Synthetic oncology cohorts ──►  ─────────────────────────────►   Generator + privacy screening
Calibration audit ───────────►  Drift monitoring ────────────►   Fairness auditing
Sparse retrieval (evidence) ─►  Guideline retrieval              ─────────────────────────

OncoBoard-MM establishes the core patterns: modality-mask conventions, gated late fusion, conformal uncertainty, and calibration auditing.
AcuteWatch-FM extends these to longitudinal hospital data with irregular time-series, multi-horizon prediction, and operational governance (drift detection, alert budgets).
SynGuard-EHR inherits the privacy screening and calibration machinery to build a governed synthetic-data workbench where the evaluator and release gate - not just the generator - are the primary deliverables.
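
The conformal step that runs through the reuse chain can be illustrated with a minimal split-conformal sketch. This is not the repository's implementation: the function names, the calibration pairs, the `alpha` value, and the "alert only when the prediction set is exactly {1}" rule are all assumptions made for illustration.

```python
import math

def conformal_quantile(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))  # 1-indexed rank into sorted scores
    if rank > n:
        # Too few calibration points for this alpha: every label is kept.
        return float("inf")
    return sorted(scores)[rank - 1]

def prediction_set(p_positive, qhat):
    """Labels whose nonconformity score 1 - p(label) is at most qhat."""
    probs = {0: 1.0 - p_positive, 1: p_positive}
    return {label for label, p in probs.items() if 1.0 - p <= qhat}

# Hypothetical calibration data: (predicted P(deterioration), true label).
calibration = [(0.9, 1), (0.8, 1), (0.3, 0), (0.2, 0), (0.6, 1),
               (0.1, 0), (0.7, 1), (0.4, 0), (0.35, 1), (0.15, 0)]

# Nonconformity score: one minus the probability assigned to the true label.
scores = [1.0 - (p if y == 1 else 1.0 - p) for p, y in calibration]
qhat = conformal_quantile(scores, alpha=0.2)

def should_alert(p_positive):
    """Assumed alert rule: fire only when the set contains the positive label alone."""
    return prediction_set(p_positive, qhat) == {1}

for p in (0.9, 0.5, 0.1):
    print(p, prediction_set(p, qhat), should_alert(p))
```

With this score, an ambiguous prediction can yield an empty set or a set with both labels; the assumed rule treats both cases as "no alert", which is one way to keep alert volume under a budget.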

Demonstrated capabilities (real public data, synthetic models)

Each sub-project includes a demo notebook that runs end-to-end on real public data, with synthetic stand-ins for the learned models and labels where no public ground truth exists. Headline numbers from the most recent run:

| Project | What runs end-to-end | One headline number |
|---------|----------------------|---------------------|
| OncoBoard-MM | profile build → biomarker normalization → ranker training → calibration audit | Brier 0.0185, ECE 0.0824 on 500 synthetic predictions |
| AcuteWatch-FM | event ingest → leakage-safe windows → 9-head multi-task training → conformal alerts → drift | All 9 outcome-horizon heads train to val_acc 0.976; drift monitor flags a 0.82-std heart-rate shift |
| SynGuard-EHR | generate → privacy attacks → fidelity → fairness → release gate → FHIR export | Release gate correctly BLOCKS a deliberately leaky generator against a real Synthea cohort (PRIV-01 distinguishability 0.61 vs 0.05 threshold) |
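
A drift figure expressed in standard deviations, like the heart-rate shift reported for AcuteWatch-FM, can be read as a standardized mean shift. The sketch below shows one plausible way to compute such a number; the statistic, the `threshold` value, and the heart-rate samples are assumptions for illustration, not the monitor the project actually ships.

```python
from statistics import mean, stdev

def standardized_shift(reference, current):
    """Difference in means, expressed in reference standard deviations."""
    return (mean(current) - mean(reference)) / stdev(reference)

def drift_flag(reference, current, threshold=0.5):
    """Flag drift when the absolute standardized shift exceeds the threshold.

    The default threshold is a hypothetical value chosen for this example.
    """
    return abs(standardized_shift(reference, current)) > threshold

# Hypothetical heart-rate samples (beats per minute).
reference_hr = [72, 80, 76, 68, 84, 75, 79, 70]
current_hr = [82, 88, 79, 91, 85, 77, 90, 84]

shift = standardized_shift(reference_hr, current_hr)
print(f"shift = {shift:.2f} reference std; flagged = {drift_flag(reference_hr, current_hr)}")
```

Scaling by the reference standard deviation keeps the threshold comparable across features with different units, which is why a single cutoff can cover heart rate, blood pressure, and lab values alike.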

The data backbones are real (ClinicalTrials.gov trials, the open MIMIC-IV demo, a Synthea cohort); the learned rankers/models, their labels, and the calibration/conformal figures are synthetic stand-ins where no public match/outcome labels exist. These are pipeline demonstrations, not predictive benchmarks - see each sub-project's Demonstration and Status sections for exactly what is real versus synthetic.
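
For readers unfamiliar with the two calibration metrics quoted for OncoBoard-MM, here is a minimal sketch of the Brier score and a binned expected calibration error (ECE) for binary predictions. The equal-width binning with `n_bins=10` and the toy inputs are assumptions; the project's audit may bin differently.

```python
def brier_score(probs, labels):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)

def expected_calibration_error(probs, labels, n_bins=10):
    """Weighted mean gap between average predicted probability and observed
    positive rate, computed over equal-width probability bins."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((p, y))

    total = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        avg_prob = sum(p for p, _ in members) / len(members)
        pos_rate = sum(y for _, y in members) / len(members)
        ece += (len(members) / total) * abs(avg_prob - pos_rate)
    return ece

# Toy inputs, not project data.
probs = [0.1, 0.1, 0.9, 0.9]
labels = [0, 0, 1, 1]
print(brier_score(probs, labels), expected_calibration_error(probs, labels))
```

The Brier score rewards both calibration and sharpness, while ECE isolates the calibration gap, so the two are usually reported together.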

Quickstart

Each project is a self-contained Python package. To run any one of them:

cd project_01_oncoboard_mm   # or project_02_acutewatch_fm or project_03_synguard_ehr
pip install -e .
PYTHONPATH=src pytest tests/ -v

Each project includes a demo notebook under notebooks/ that runs a worked example with actual metrics:

cd project_01_oncoboard_mm/notebooks
jupyter notebook demo_trial_matching.ipynb

References

See references.md for the full bibliography cited in each project's README.

License

MIT - see LICENSE.
