inference-placement-engine

HIPAA-compliant inference routing and multi-cloud ML placement engine — policy-driven server selection across AWS, GCP, and on-prem with PHI gate enforcement and p99 SLA guarantees.

Problem

Healthcare ML workloads span a spectrum of data sensitivity. A de-identified risk score can run on any public cloud, but a request containing full PHI must never leave a HIPAA-compliant environment with a signed BAA. This engine makes placement automatic and policy-driven — compliance rules are enforced at the router before any cloud is contacted.

Key Features

PHI Gate — detects sensitivity tiers (public → phi_strict) and blocks non-compliant destinations before dispatch
Three-Phase Routing — compliance filtering → SLA filtering → strategy scoring, in that order, every time
P99 Latency Enforcement — rolling p99 per server; requests with max_latency_ms reject servers that exceed the ceiling
Circuit Breakers — per-adapter CLOSED/OPEN/HALF_OPEN state prevents degraded servers from inflating latency metrics
Live Dashboard — real-time view of routed requests, server health, and routing decisions at /dashboard
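
As a rough illustration of how the three routing phases fit together, here is a minimal sketch. It is not the engine's actual code: the `Server` and `Sensitivity` types, the `route` function, the `cost` field, and the cheapest-wins scoring rule are all hypothetical. Only the two tier names, the `max_latency_ms` ceiling, and the phase order come from this README.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Optional


class Sensitivity(IntEnum):
    # Only the two end tiers named in this README; intermediate tiers are omitted.
    PUBLIC = 0
    PHI_STRICT = 1


@dataclass
class Server:
    # Hypothetical server record, used for illustration only.
    name: str
    max_sensitivity: Sensitivity  # highest tier this server is cleared to receive
    latencies_ms: list[float] = field(default_factory=list)  # recent latency samples
    cost: float = 1.0  # assumed input for the scoring phase

    def p99(self) -> float:
        """Nearest-rank p99 of the recorded samples.

        Assumption: a server with no samples reports infinity, so it fails
        any latency ceiling until it has history.
        """
        if not self.latencies_ms:
            return float("inf")
        ordered = sorted(self.latencies_ms)
        rank = (99 * len(ordered) + 99) // 100  # ceil(0.99 * n) in integer math
        return ordered[rank - 1]


def route(
    servers: list[Server],
    sensitivity: Sensitivity,
    max_latency_ms: Optional[float] = None,
) -> Optional[Server]:
    """Pick a server, or return None when no candidate survives the filters."""
    # Phase 1: compliance filtering. Drop servers not cleared for this tier.
    candidates = [s for s in servers if s.max_sensitivity >= sensitivity]

    # Phase 2: SLA filtering. Applied only when the request sets a ceiling.
    if max_latency_ms is not None:
        candidates = [s for s in candidates if s.p99() <= max_latency_ms]

    # Phase 3: strategy scoring. Here simply "cheapest wins" (an assumption).
    return min(candidates, key=lambda s: s.cost, default=None)
```

Because compliance filtering runs first, a cheaper or faster server that is not cleared for the request's tier never reaches the scoring phase. For example, with one on-prem server cleared for `PHI_STRICT` and one cheaper public-cloud server cleared only for `PUBLIC`, a `PHI_STRICT` request is routed on-prem regardless of cost.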

Stack: FastAPI · Python 3.11 · Ollama · Redis (optional)

Quick Start

git clone https://github.com/PreethiAndichamy342/inference-placement-engine.git
cd inference-placement-engine
pip install -r requirements.txt && ollama pull tinyllama
bash scripts/start_demo.sh
uvicorn src.api.main:app --reload --port 8000

Then open http://localhost:8000/dashboard, or send a test request to the routing API:

curl -X POST http://localhost:8000/route \
  -H "Content-Type: application/json" \
  -d '{"model_id":"tinyllama:latest","payload":{"prompt":"Patient DOB 1980"},"tenant_id":"hospital_A","data_sensitivity":"phi_strict","strategy":"compliance_first"}'
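
The same request can be sent from Python using only the standard library. This is a sketch: the helper names `build_route_request` and `send_route_request` are made up for illustration, and the response is printed as raw text because its shape is not described here.

```python
import json
import urllib.request


def build_route_request(base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build the POST /route request shown in the curl example above."""
    body = {
        "model_id": "tinyllama:latest",
        "payload": {"prompt": "Patient DOB 1980"},
        "tenant_id": "hospital_A",
        "data_sensitivity": "phi_strict",
        "strategy": "compliance_first",
    }
    return urllib.request.Request(
        f"{base_url}/route",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_route_request(request: urllib.request.Request, timeout: float = 10.0) -> str:
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the server from the Quick Start running, `print(send_route_request(build_route_request()))` issues the request and prints the response body.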

Documentation

Document	Description
ARCHITECTURE.md	System components, 8-step data flow, three-phase routing model, PHI-aware placement design
SETUP.md	Prerequisites and installation for macOS, Linux, Windows, Chromebook
API_REFERENCE.md	All endpoints with curl examples for the compliance-first routing API
CODE_OF_CONDUCT.md	Contributor standards and enforcement

