PreethiAndichamy342/cloud-inference-engine

inference-placement-engine

HIPAA-compliant inference routing and multi-cloud ML placement engine — policy-driven server selection across AWS, GCP, and on-prem with PHI gate enforcement and p99 SLA guarantees.


End-to-End Architecture


Problem

Healthcare ML workloads span a spectrum of data sensitivity. A de-identified risk score can run on any public cloud, but a request containing full PHI must never leave a HIPAA-compliant environment with a signed BAA. This engine makes placement automatic and policy-driven — compliance rules are enforced at the router before any cloud is contacted.
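The rule above can be sketched as a small gate that runs before any destination is contacted. This is a minimal illustration, not the engine's actual code: the `Destination` class, its `hipaa_compliant` and `baa_signed` fields, and the `allowed_destinations` function are hypothetical names, and `"public"` is an assumed label for the least sensitive tier. Only `phi_strict` is taken from the request example in this README.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Destination:
    """Hypothetical description of a placement target."""
    name: str
    hipaa_compliant: bool  # assumption: environment meets HIPAA requirements
    baa_signed: bool       # assumption: a Business Associate Agreement is in place


def allowed_destinations(data_sensitivity: str,
                         destinations: list[Destination]) -> list[Destination]:
    """Return only the destinations a request of this sensitivity may reach."""
    if data_sensitivity == "phi_strict":
        # Full PHI: require both HIPAA compliance and a signed BAA.
        return [d for d in destinations if d.hipaa_compliant and d.baa_signed]
    if data_sensitivity == "public":
        # De-identified data may run anywhere.
        return list(destinations)
    # Unknown tiers fail closed: nothing is eligible.
    return []
```

Failing closed on unrecognized tiers is a design choice of this sketch: a typo in the sensitivity field then blocks the request instead of silently widening where it can go.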


Key Features

  • PHI Gate — detects sensitivity tiers (public → phi_strict) and blocks non-compliant destinations before dispatch
  • Three-Phase Routing — compliance filtering → SLA filtering → strategy scoring, in that order, every time
  • P99 Latency Enforcement — tracks a rolling p99 per server; a request that sets max_latency_ms rejects any server whose rolling p99 exceeds that ceiling
  • Circuit Breakers — per-adapter CLOSED/OPEN/HALF_OPEN state prevents degraded servers from inflating latency metrics
  • Live Dashboard — real-time view of routed requests, server health, and routing decisions at /dashboard
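The three-phase order and the p99 ceiling described in the list above could look roughly like the following. It is a sketch under stated assumptions, not the repository's implementation: `Server`, `phi_allowed`, `cost`, and `route` are hypothetical names, the p99 uses a simple nearest-rank rule over a list of recent samples, and "lowest cost wins" stands in for whatever strategy scoring the engine really applies.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Hypothetical server record used only for this illustration."""
    name: str
    phi_allowed: bool   # assumption: server may receive phi_strict requests
    cost: float         # assumption: relative cost used by the scoring phase
    latencies_ms: list[float] = field(default_factory=list)  # recent samples

    def p99(self) -> float:
        """Nearest-rank p99 of the recent samples; infinity if there are none."""
        if not self.latencies_ms:
            return float("inf")
        ordered = sorted(self.latencies_ms)
        rank = (99 * len(ordered) + 99) // 100  # integer ceil(0.99 * n)
        return ordered[rank - 1]


def route(servers, data_sensitivity, max_latency_ms=None):
    """Pick a server, or return None when no server passes every phase."""
    # Phase 1: compliance filtering always runs first.
    candidates = [s for s in servers
                  if data_sensitivity != "phi_strict" or s.phi_allowed]
    # Phase 2: SLA filtering, applied only when the request sets a ceiling.
    if max_latency_ms is not None:
        candidates = [s for s in candidates if s.p99() <= max_latency_ms]
    # Phase 3: strategy scoring (placeholder: lowest cost wins).
    if not candidates:
        return None
    return min(candidates, key=lambda s: s.cost)
```

Because compliance filtering runs first, a cheap or fast server that may not hold PHI can never win a `phi_strict` request, whatever the scoring phase prefers. In this sketch a server with no latency samples reports an infinite p99, so it is excluded whenever a ceiling is set.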

Stack: FastAPI · Python 3.11 · Ollama · Redis (optional)


Quick Start

git clone https://github.com/PreethiAndichamy342/inference-placement-engine.git
cd inference-placement-engine
pip install -r requirements.txt && ollama pull tinyllama
bash scripts/start_demo.sh
uvicorn src.api.main:app --reload --port 8000

Then open http://localhost:8000/dashboard or test the compliance-first healthcare ML routing API:

curl -X POST http://localhost:8000/route \
  -H "Content-Type: application/json" \
  -d '{"model_id":"tinyllama:latest","payload":{"prompt":"Patient DOB 1980"},"tenant_id":"hospital_A","data_sensitivity":"phi_strict","strategy":"compliance_first"}'
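The same request can be built from Python using only the standard library. The sketch below mirrors the curl call field for field; the helper names `build_route_request` and `send` are illustrative, and since the response format is not shown in this README, the body is returned as raw text rather than parsed.

```python
import json
import urllib.request


def build_route_request(base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build the POST /route request from the curl example (does not send it)."""
    body = {
        "model_id": "tinyllama:latest",
        "payload": {"prompt": "Patient DOB 1980"},
        "tenant_id": "hospital_A",
        "data_sensitivity": "phi_strict",
        "strategy": "compliance_first",
    }
    return urllib.request.Request(
        f"{base_url}/route",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> str:
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the server from Quick Start running, `print(send(build_route_request()))` should show the routing decision for this `phi_strict` request.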

Documentation

Document              Description
ARCHITECTURE.md       System components, 8-step data flow, three-phase routing model, PHI-aware placement design
SETUP.md              Prerequisites and installation for macOS, Linux, Windows, Chromebook
API_REFERENCE.md      All endpoints with curl examples for the compliance-first routing API
CODE_OF_CONDUCT.md    Contributor standards and enforcement

Medium Series

  1. System Overview
  2. Placement Engine
  3. P99 Latency
  4. PHI Gate
  5. Kafka Failover
  6. Unified Compliance
  7. Design Retrospective

© 2026 Preethi Andichamy — Apache License 2.0
