HIPAA-compliant inference routing and multi-cloud ML placement engine — policy-driven server selection across AWS, GCP, and on-prem with PHI gate enforcement and p99 SLA guarantees.
Healthcare ML workloads span a spectrum of data sensitivity. A de-identified risk score can run on any public cloud, but a request containing full PHI must never leave a HIPAA-compliant environment with a signed BAA. This engine makes placement automatic and policy-driven — compliance rules are enforced at the router before any cloud is contacted.
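
To make the gate idea concrete, here is a minimal sketch of a pre-dispatch filter. It is not the project's implementation: the `Destination` fields, the `allowed_destinations` helper, and the intermediate `DEIDENTIFIED` tier are all hypothetical names chosen for illustration. Only the `public` and `phi_strict` tier names come from this README.

```python
from dataclasses import dataclass
from enum import IntEnum


class Sensitivity(IntEnum):
    PUBLIC = 0
    DEIDENTIFIED = 1  # hypothetical intermediate tier, not taken from the project
    PHI_STRICT = 2


@dataclass(frozen=True)
class Destination:
    # All fields are illustrative assumptions about what a gate would need.
    name: str
    hipaa_compliant: bool
    baa_signed: bool
    max_sensitivity: Sensitivity


def allowed_destinations(sensitivity, destinations):
    """Return the destinations permitted for a tier, before any dispatch."""
    allowed = []
    for dest in destinations:
        # A destination never receives data above its approved tier.
        if sensitivity > dest.max_sensitivity:
            continue
        # Full PHI additionally requires HIPAA compliance and a signed BAA.
        if sensitivity == Sensitivity.PHI_STRICT and not (
            dest.hipaa_compliant and dest.baa_signed
        ):
            continue
        allowed.append(dest)
    return allowed


DESTINATIONS = [
    Destination("public-cloud", False, False, Sensitivity.DEIDENTIFIED),
    Destination("hipaa-cloud", True, True, Sensitivity.PHI_STRICT),
    Destination("on-prem", True, True, Sensitivity.PHI_STRICT),
]

strict = allowed_destinations(Sensitivity.PHI_STRICT, DESTINATIONS)
public = allowed_destinations(Sensitivity.PUBLIC, DESTINATIONS)
print([d.name for d in strict])  # ['hipaa-cloud', 'on-prem']
print([d.name for d in public])  # ['public-cloud', 'hipaa-cloud', 'on-prem']
```

Because the filter runs on destination metadata alone, a blocked destination is never contacted, which matches the "enforced at the router" behavior described above.
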
- PHI Gate: detects sensitivity tiers (`public→phi_strict`) and blocks non-compliant destinations before dispatch
- Three-Phase Routing: compliance filtering → SLA filtering → strategy scoring, in that order, every time
- P99 Latency Enforcement: rolling p99 per server; requests with `max_latency_ms` reject servers that exceed the ceiling
- Circuit Breakers: per-adapter `CLOSED`/`OPEN`/`HALF_OPEN` state prevents degraded servers from inflating latency metrics
- Live Dashboard: real-time view of routed requests, server health, and routing decisions at `/dashboard`
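
The three-phase order and the rolling p99 check can be sketched as follows. This is an illustrative sketch under stated assumptions, not the engine's code: the `Server` class, its `phi_allowed` and `cost` fields, the 1000-sample window, the nearest-rank percentile, and the lowest-cost scoring rule are all hypothetical. Only the phase order and the `max_latency_ms` parameter name come from this README.

```python
import math
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Server:
    # Hypothetical server record; field names are assumptions for this sketch.
    name: str
    phi_allowed: bool
    cost: float
    samples: deque = field(default_factory=lambda: deque(maxlen=1000))

    def record(self, latency_ms):
        # The bounded deque drops the oldest sample, giving a rolling window.
        self.samples.append(latency_ms)

    def p99(self):
        n = len(self.samples)
        if n == 0:
            return math.inf  # assumption: no data means the SLA cannot be proven
        ordered = sorted(self.samples)
        rank = (99 * n + 99) // 100  # nearest-rank: ceil(0.99 * n) in integers
        return ordered[rank - 1]


def route(servers, is_phi, max_latency_ms=None):
    # Phase 1: compliance filtering.
    candidates = [s for s in servers if s.phi_allowed or not is_phi]
    # Phase 2: SLA filtering, only when the request sets a ceiling.
    if max_latency_ms is not None:
        candidates = [s for s in candidates if s.p99() <= max_latency_ms]
    # Phase 3: strategy scoring (here simply the lowest cost).
    if not candidates:
        return None
    return min(candidates, key=lambda s: s.cost)


aws = Server("aws", phi_allowed=False, cost=1.0)
gcp = Server("gcp-hipaa", phi_allowed=True, cost=2.0)
onprem = Server("on-prem", phi_allowed=True, cost=3.0)
for _ in range(100):
    aws.record(20.0)
    onprem.record(80.0)
for _ in range(98):
    gcp.record(50.0)
for _ in range(2):
    gcp.record(900.0)  # two slow outliers push the p99 to 900 ms

print(route([aws, gcp, onprem], is_phi=True, max_latency_ms=200).name)   # on-prem
print(route([aws, gcp, onprem], is_phi=False, max_latency_ms=200).name)  # aws
print(route([aws, gcp, onprem], is_phi=True).name)                       # gcp-hipaa
print(route([aws, gcp, onprem], is_phi=True, max_latency_ms=10))         # None
```

Running compliance first means a cheaper or faster server can never win a PHI request it is not allowed to serve; scoring only ever sees candidates that already passed both filters.
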
Stack: FastAPI · Python 3.11 · Ollama · Redis (optional)

    git clone https://github.com/PreethiAndichamy342/inference-placement-engine.git
    cd inference-placement-engine
    pip install -r requirements.txt && ollama pull tinyllama
    bash scripts/start_demo.sh
    uvicorn src.api.main:app --reload --port 8000

Then open http://localhost:8000/dashboard or test the compliance-first healthcare ML routing API:

    curl -X POST http://localhost:8000/route \
      -H "Content-Type: application/json" \
      -d '{"model_id":"tinyllama:latest","payload":{"prompt":"Patient DOB 1980"},"tenant_id":"hospital_A","data_sensitivity":"phi_strict","strategy":"compliance_first"}'

| Document | Description |
|---|---|
| ARCHITECTURE.md | System components, 8-step data flow, three-phase routing model, PHI-aware placement design |
| SETUP.md | Prerequisites and installation for macOS, Linux, Windows, Chromebook |
| API_REFERENCE.md | All endpoints with curl examples for the compliance-first routing API |
| CODE_OF_CONDUCT.md | Contributor standards and enforcement |
- System Overview
- Placement Engine
- P99 Latency
- PHI Gate
- Kafka Failover
- Unified Compliance
- Design Retrospective
© 2026 Preethi Andichamy — Apache License 2.0
