Human Pose Estimation Service

Production-oriented REST API service exposing ensam3d_inference as a managed backend for distributed 3D human pose estimation, with video ingestion, GPU worker orchestration, annotated visualization rendering, and persistent artifact storage.

Performance Benchmark

This benchmark demonstrates the full processing pipeline of the system running in parallel across multiple videos. Each video independently goes through three stages:

ingestion (upload + validation),
pose estimation (GPU inference),
visualization rendering (video overlay generation).

The key metric is the Parallelism Ratio: the sum of all individual stage latencies divided by the actual wall-clock time. In this benchmark, a ratio of 6.36x means the distributed architecture finished the workload about 6.36 times faster than a single pipeline running every stage of every video back to back would have, assuming the same per-stage latencies.

Benchmark Videos

All videos were sourced from Pexels and processed simultaneously:

#	Video	Resolution	Duration	Size
1	Person jogging at the beach	1920 × 1080	8.93 sec	5.69 MB
2	Man with prosthetic leg jogging	3840 × 2160	24.32 sec	72.10 MB
3	A man running on the beach shore	3840 × 2160	12.60 sec	28.55 MB
4	A man jogging by the lakeside	3840 × 2160	15.36 sec	22.63 MB
5	A man running at the beach	3840 × 2160	26.10 sec	77.51 MB
6	A woman running in the beach	3840 × 2160	16.48 sec	48.59 MB
7	A person running on the beach at sunset	1920 × 1080	12.96 sec	7.74 MB
8	Woman jogging by the seashore	3840 × 2160	14.84 sec	41.02 MB
9	A man running on the beach	1280 × 720	9.97 sec	3.27 MB
10	Man jogging outdoors	3840 × 2160	15.64 sec	46.78 MB

Configuration

Environment


CPU	AMD Ryzen 7 5800H with Radeon Graphics
GPU	NVIDIA GeForce RTX 3070 Laptop GPU
PyTorch Version	2.5.1.post306
CUDA Version	12.6
Number of Servers	1
Number of Default Workers	1 (prefork pool, CPU, concurrency 4, prefetch_multiplier 1)
Number of Inference Workers	2 (solo pool, CUDA, concurrency 1, prefetch_multiplier 1)
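
The pool, concurrency, and prefetch settings above read like Celery worker options, although this README does not name the task queue. Assuming Celery, the sketch below assembles the equivalent `celery worker` command lines. The app path `worker_app`, the queue names `default` and `inference`, and the helpers `WorkerSpec` and `worker_command` are hypothetical placeholders, not names taken from the project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkerSpec:
    """One row of the worker table above (field names are illustrative)."""
    queue: str                 # hypothetical queue name
    pool: str                  # "prefork" or "solo"
    concurrency: int
    prefetch_multiplier: int
    instances: int             # number of worker processes of this kind


def worker_command(spec: WorkerSpec, app: str = "worker_app") -> list[str]:
    """Build a `celery worker` argument list for one worker instance.

    Assumes Celery; `app` is a placeholder module path.
    """
    return [
        "celery", "-A", app, "worker",
        f"--pool={spec.pool}",
        f"--concurrency={spec.concurrency}",
        f"--prefetch-multiplier={spec.prefetch_multiplier}",
        "-Q", spec.queue,
    ]


# Values copied from the Environment table.
DEFAULT = WorkerSpec("default", "prefork", 4, 1, instances=1)
INFERENCE = WorkerSpec("inference", "solo", 1, 1, instances=2)

if __name__ == "__main__":
    for spec in (DEFAULT, INFERENCE):
        for _ in range(spec.instances):
            print(" ".join(worker_command(spec)))
```

Under this assumption, a prefetch multiplier of 1 means each worker process reserves only one task at a time, so a long GPU job is less likely to sit reserved behind a busy worker while the other one is idle.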

Estimation Parameters


Target Resolution	1920 × 1080
Target Duration	10.00 sec
Target FPS	20.00
Target Frames per Video	200
Inference Batch Size	30
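
The frame count follows from the target duration and FPS, and the batch size determines how many inference calls one video needs. The sketch below works through that arithmetic. It assumes frames are split sequentially into batches with a shorter final batch, which this README does not state; `frame_count` and `batch_sizes` are illustrative helper names.

```python
TARGET_DURATION_SEC = 10.00
TARGET_FPS = 20.00
BATCH_SIZE = 30


def frame_count(duration_sec: float, fps: float) -> int:
    """Number of frames sampled for one video."""
    return round(duration_sec * fps)


def batch_sizes(total_frames: int, batch_size: int) -> list[int]:
    """Sizes of consecutive batches; the last one may be shorter (assumed)."""
    full, rest = divmod(total_frames, batch_size)
    return [batch_size] * full + ([rest] if rest else [])


frames = frame_count(TARGET_DURATION_SEC, TARGET_FPS)  # 200
batches = batch_sizes(frames, BATCH_SIZE)              # six batches of 30, one of 20

if __name__ == "__main__":
    print(frames, len(batches), batches)
```

With these parameters, each video yields 200 frames, processed in 7 batches under the chunking assumption above.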

Visualization Parameters


Show Keypoints	True
Show Skeleton	True
Show Bounding Boxes	False
CRF (Quality)	20
Encoding Preset	medium
Visualization Batch Size	30

Performance


Total Videos Processed	10
Wall-Clock Time	135.68 sec
Summed Stage Times	863.15 sec
Parallelism Ratio	6.36x
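
The reported ratio can be reproduced from the two timing figures above. A minimal sketch, where `parallelism_ratio` is an illustrative helper name rather than a function from the benchmark suite:

```python
def parallelism_ratio(summed_stage_sec: float, wall_clock_sec: float) -> float:
    """Sum of all per-stage latencies divided by elapsed wall-clock time."""
    if wall_clock_sec <= 0:
        raise ValueError("wall-clock time must be positive")
    return summed_stage_sec / wall_clock_sec


# Figures taken from the Performance table.
ratio = parallelism_ratio(summed_stage_sec=863.15, wall_clock_sec=135.68)

if __name__ == "__main__":
    print(f"{ratio:.2f}x")  # prints 6.36x
```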

Want more details?
For the full technical documentation, including conceptual design, domain modeling, runtime concurrency, API contracts, deployment guides, and more, see the documentation in docs/.

Project Structure

pose-estimation-service/
├── packages/                     # Monorepo root containing all independently deployable Python modules and
│   │                             # shared libraries. Each package strictly follows the standard `src/` layout
│   │                             # (e.g., `packages/server/src/server/`) for clean imports and packaging.
│   │
│   ├── server/                   # FastAPI application handling HTTP routing, request validation,
│   │                             # database interactions, and decoupled task delegation to workers.
│   │
│   ├── worker-inference/         # Dedicated GPU worker process executing heavy ML inference
│   │                             # via the `ensam3d_inference` engine.
│   │
│   ├── worker-default/           # General-purpose CPU worker process executing background post-processing pipelines
│   │                             # (currently: rendering annotated video overlays and persisting media assets to S3).
│   │
│   ├── shared/                   # Cross-process shared infrastructure. Contains base configurations,
│   │                             # unified client abstractions for external services
│   │                             # (e.g., PostgreSQL, Redis, MinIO/S3), and more.
│   │
│   ├── scripts/                  # Standalone automation scripts for environment bootstrapping,
│   │                             # infrastructure initialization, and auxiliary utility tasks 
│   │                             # (e.g., OpenAPI schema export).
│   │
│   └── benchmarks/               # Performance testing suite for measuring endpoint latency,
│                                 # pipeline throughput, and parallel processing efficiency.
│
├── docker/                       # Docker Compose stacks for local (infrastructure-only) and 
│                                 # deploy (fully containerized) modes.
│
├── config/                       # Environment variable templates for local and deploy modes.
│
├── migrations/                   # Alembic migration environment and database schema change scripts.
│
├── docs/                         # Technical documentation covering conceptual overview,
│                                 # domain model, runtime architecture, engineering decisions,
│                                 # API design, codebase layout, and dependencies.
│
├── pixi.toml                     # Pixi environment configuration defining feature-based
│                                 # dependency groups for server, workers, and ML inference.
│
├── pixi.lock                     # Fully resolved and reproducible dependency lockfile.
│
└── justfile                      # Task runner: bootstrap commands, database migration targets,
                                  # runtime process launchers, and Docker Compose shortcuts.
                                  # Automatically manages pixi environment context per recipe.

Quick Start

I. Prerequisites

Pixi package manager.
Docker and Docker Compose.
GNU/Linux-based system on x86_64 architecture.
NVIDIA GPU with driver compatible with CUDA Toolkit >= 12.8.

Note:
These prerequisites are not strict requirements but describe the environment used for development. The service can be set up in alternative environments with different package managers, operating systems, or GPU configurations if needed.

II. Setup

Clone the repository

```
git clone git@github.com:Sierra-Arn/pose-estimation-service.git
cd pose-estimation-service
```

Install dependencies
```
pixi install
```
Activate environment
```
pixi shell
```

III. Model Weights Access

The service requires the sam-3d-body-vith model weights, which are hosted on Hugging Face under restricted access.

Request access

Navigate to the model repository and submit an access request:
```
https://huggingface.co/facebook/sam-3d-body-vith
```

Download weights

After access is granted, download the weights manually and place them in the project root:

# Expected structure after download:
pose-estimation-service/
├── packages/
├── docker/
├── config/
├── migrations/
├── docs/
├── pixi.toml
├── pixi.lock
├── justfile
└── sam-3d-body-vith/
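
Before launching, it can help to confirm that the weights directory is where the service expects it. The following is a small sketch of such a check, not part of the project's scripts; `find_weights` is a hypothetical helper, and the only name taken from this README is the `sam-3d-body-vith` directory.

```python
from pathlib import Path

WEIGHTS_DIR_NAME = "sam-3d-body-vith"


def find_weights(project_root: Path) -> Path:
    """Return the weights directory, or raise if it is missing or empty."""
    weights = project_root / WEIGHTS_DIR_NAME
    if not weights.is_dir():
        raise FileNotFoundError(f"expected weights directory at {weights}")
    if not any(weights.iterdir()):
        raise FileNotFoundError(f"weights directory {weights} is empty")
    return weights


if __name__ == "__main__":
    # Run from the project root.
    print(find_weights(Path.cwd()))
```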

IV. Launch

Once the environment is activated and model weights are downloaded, the service can be launched in one of two modes depending on the deployment context.

Local mode — infrastructure services run in Docker; the API server and workers run on the host machine directly.
```
just quick-start-local
```
Deploy mode — the full stack runs in Docker, including infrastructure, API server, and both worker processes.
```
just quick-start-deploy
```
Note:
Deploy mode requires NVIDIA Container Toolkit to expose the GPU to the worker-inference container.

Whichever mode you choose, the launch script will automatically execute all necessary setup steps, start the server, and open the Swagger UI in your default web browser once the API is ready.
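
The "open once the API is ready" step amounts to a readiness poll. The sketch below illustrates the idea using only the Python standard library; it is not the project's launch script. The URL `http://localhost:8000/docs` is an assumption (FastAPI serves Swagger UI at `/docs` by default, but the host and port here are guesses), and the function names are hypothetical.

```python
import time
import urllib.error
import urllib.request
import webbrowser
from typing import Callable


def http_ready(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET request to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_until_ready(probe: Callable[[], bool],
                     attempts: int = 30,
                     delay: float = 1.0) -> bool:
    """Call `probe` until it succeeds or the attempts run out."""
    for _ in range(attempts):
        if probe():
            return True
        time.sleep(delay)
    return False


def open_docs_when_ready(docs_url: str = "http://localhost:8000/docs") -> bool:
    """Open the Swagger UI in the default browser once the API responds."""
    if wait_until_ready(lambda: http_ready(docs_url)):
        webbrowser.open(docs_url)
        return True
    return False
```

Calling `open_docs_when_ready()` after starting the server would block for up to roughly 30 seconds with the defaults shown, then give up and return `False`.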

Want to see what happens under the hood?
The launch scripts are fully documented with step-by-step comments explaining each action. You can find them here:

Local mode script

Deploy mode script

V. Cleanup

When you are done, shut down the running services depending on the active launch mode.

Local mode — stop the API server and workers by terminating their terminal processes, then bring down the infrastructure containers.
```
just docker-local-down
```
Deploy mode — stop and remove all containers with a single command.
```
just docker-deploy-down
```

License

This project is licensed under the Apache License, Version 2.0.
