Production-oriented REST API service exposing `ensam3d_inference` as a managed backend for distributed 3D human pose estimation, with video ingestion, GPU worker orchestration, annotated visualization rendering, and persistent artifact storage.
This benchmark demonstrates the full processing pipeline of the system running in parallel across multiple videos. Each video independently goes through three stages:
- ingestion (upload + validation),
- pose estimation (GPU inference),
- visualization rendering (video overlay generation).
The key metric is the Parallelism Ratio: the sum of all individual stage latencies divided by the actual wall-clock time. In this benchmark, a ratio of 6.36x means the distributed architecture completed the entire workload 6.36 times faster than a single pipeline running every stage sequentially would have.
All videos were sourced from Pexels and processed simultaneously:
| # | Video | Resolution | Duration | Size |
|---|---|---|---|---|
| 1 | Person jogging at the beach | 1920 × 1080 | 8.93 sec | 5.69 MB |
| 2 | Man with prosthetic leg jogging | 3840 × 2160 | 24.32 sec | 72.10 MB |
| 3 | A man running on the beach shore | 3840 × 2160 | 12.60 sec | 28.55 MB |
| 4 | A man jogging by the lakeside | 3840 × 2160 | 15.36 sec | 22.63 MB |
| 5 | A man running at the beach | 3840 × 2160 | 26.10 sec | 77.51 MB |
| 6 | A woman running in the beach | 3840 × 2160 | 16.48 sec | 48.59 MB |
| 7 | A person running on the beach at sunset | 1920 × 1080 | 12.96 sec | 7.74 MB |
| 8 | Woman jogging by the seashore | 3840 × 2160 | 14.84 sec | 41.02 MB |
| 9 | A man running on the beach | 1280 × 720 | 9.97 sec | 3.27 MB |
| 10 | Man jogging outdoors | 3840 × 2160 | 15.64 sec | 46.78 MB |
Environment
| Parameter | Value |
|---|---|
| CPU | AMD Ryzen 7 5800H with Radeon Graphics |
| GPU | NVIDIA GeForce RTX 3070 Laptop GPU |
| PyTorch Version | 2.5.1.post306 |
| CUDA Version | 12.6 |
| Number of Servers | 1 |
| Number of Default Workers | 1 (prefork pool, CPU, concurrency 4, prefetch_multiplier 1) |
| Number of Inference Workers | 2 (solo pool, CUDA, concurrency 1, prefetch_multiplier 1) |
Estimation Parameters
| Parameter | Value |
|---|---|
| Target Resolution | 1920 × 1080 |
| Target Duration | 10.00 sec |
| Target FPS | 20.00 |
| Target Frames per Video | 200 |
| Inference Batch Size | 30 |
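
The target frame count follows from the target duration and FPS. The sketch below reproduces that arithmetic and shows how many inference batches a single video would need. It assumes frames are grouped by simple chunking with a shorter final batch; the function name is hypothetical and the service may batch differently.

```python
import math


def frames_and_batches(duration_sec: float, fps: float, batch_size: int) -> tuple[int, int]:
    """Return (frame count, number of batches) for one video.

    Assumption: frames are split into consecutive chunks of `batch_size`,
    with a shorter final chunk when the count is not an exact multiple.
    """
    frames = round(duration_sec * fps)
    batches = math.ceil(frames / batch_size)
    return frames, batches


# Values taken from the Estimation Parameters table.
frames, batches = frames_and_batches(duration_sec=10.00, fps=20.00, batch_size=30)
print(frames, batches)  # 200 7 (six batches of 30 frames and one of 20)
```
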
Visualization Parameters
| Parameter | Value |
|---|---|
| Show Keypoints | True |
| Show Skeleton | True |
| Show Bounding Boxes | False |
| CRF (Quality) | 20 |
| Encoding Preset | medium |
| Visualization Batch Size | 30 |
Results
| Metric | Value |
|---|---|
| Total Videos Processed | 10 |
| Wall-Clock Time | 135.68 sec |
| Summed Stage Times | 863.15 sec |
| Parallelism Ratio | 6.36x |
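
The Parallelism Ratio can be checked directly from the two timing figures reported for this benchmark. This is a minimal sketch of the calculation, not the benchmark suite's own code; the function name is hypothetical.

```python
def parallelism_ratio(summed_stage_sec: float, wall_clock_sec: float) -> float:
    """Sum of all individual stage latencies divided by wall-clock time."""
    if wall_clock_sec <= 0:
        raise ValueError("wall-clock time must be positive")
    return summed_stage_sec / wall_clock_sec


# Figures reported for this benchmark run.
ratio = parallelism_ratio(summed_stage_sec=863.15, wall_clock_sec=135.68)
print(f"{ratio:.2f}x")  # 6.36x
```

A ratio of 1.0x would correspond to fully sequential processing, so values above 1.0x indicate that stages from different videos overlapped in time.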
Want more details?
For the full technical documentation — including conceptual design, domain modeling, runtime concurrency, API contracts, deployment guides, and more — see the documentation.
pose-estimation-service/
├── packages/ # Monorepo root containing all independently deployable Python modules and
│ │ # shared libraries. Each package strictly follows the standard `src/` layout
│ │ # (e.g., `packages/server/src/server/`) for clean imports and packaging.
│ │
│ ├── server/ # FastAPI application handling HTTP routing, request validation,
│ │ # database interactions, and decoupled task delegation to workers.
│ │
│ ├── worker-inference/ # Dedicated GPU worker process executing heavy ML inference
│ │ # via the `ensam3d_inference` engine.
│ │
│ ├── worker-default/ # General-purpose CPU worker process executing background post-processing pipelines
│ │ # (currently: rendering annotated video overlays and persisting media assets to S3).
│ │
│ ├── shared/ # Cross-process shared infrastructure. Contains base configurations,
│ │ # unified client abstractions for external services
│ │ # (e.g., PostgreSQL, Redis, MinIO/S3), and more.
│ │
│ ├── scripts/ # Standalone automation scripts for environment bootstrapping,
│ │ # infrastructure initialization, and auxiliary utility tasks
│ │ # (e.g., OpenAPI schema export).
│ │
│ └── benchmarks/ # Performance testing suite for measuring endpoint latency,
│ # pipeline throughput, and parallel processing efficiency.
│
├── docker/ # Docker Compose stacks for local (infrastructure-only) and
│ # deploy (fully containerized) modes.
│
├── config/ # Environment variable templates for local and deploy modes.
│
├── migrations/ # Alembic migration environment and database schema change scripts.
│
├── docs/ # Technical documentation covering conceptual overview,
│ # domain model, runtime architecture, engineering decisions,
│ # API design, codebase layout, and dependencies.
│
├── pixi.toml # Pixi environment configuration defining feature-based
│ # dependency groups for server, workers, and ML inference.
│
├── pixi.lock # Fully resolved and reproducible dependency lockfile.
│
└── justfile # Task runner: bootstrap commands, database migration targets,
# runtime process launchers, and Docker Compose shortcuts.
# Automatically manages pixi environment context per recipe.

- Pixi package manager.
- Docker and Docker Compose.
- GNU/Linux-based system on `x86_64` architecture.
- NVIDIA GPU with a driver compatible with CUDA Toolkit `>= 12.8`.
Note:
These prerequisites are not strict requirements but describe the environment used for development. The service can be set up in alternative environments with different package managers, operating systems, or GPU configurations if needed.
- Clone the repository
  git clone git@github.com:Sierra-Arn/pose-estimation-service.git
  cd pose-estimation-service
- Install dependencies
  pixi install
- Activate environment
  pixi shell
The service requires the sam-3d-body-vith model weights, which are hosted on Hugging Face under restricted access.
- Request access
  Navigate to the model repository and submit an access request:
  https://huggingface.co/facebook/sam-3d-body-vith
- Download weights
  After access is granted, download the weights manually and place them in the project root:
  # Expected structure after download:
  pose-estimation-service/
  ├── app/
  ├── docker/
  ├── config/
  ├── migrations/
  ├── scripts/
  ├── pixi.toml
  ├── pixi.lock
  ├── justfile
  └── sam-3d-body-vith/
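
Before launching, it can help to confirm that the weights landed where the expected structure above places them. The following is a small sketch of such a check, not part of the repository; the function name is hypothetical, and it only verifies that the directory exists and is non-empty, not that the files inside are complete or valid.

```python
from pathlib import Path

# Directory name taken from the expected structure above.
WEIGHTS_DIR_NAME = "sam-3d-body-vith"


def weights_present(project_root: Path) -> bool:
    """Return True if the weights directory exists in the project root and is not empty."""
    weights_dir = project_root / WEIGHTS_DIR_NAME
    return weights_dir.is_dir() and any(weights_dir.iterdir())


# Run from the project root to check the current checkout.
status = "found" if weights_present(Path.cwd()) else "missing"
print(f"{WEIGHTS_DIR_NAME}: {status}")
```
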
Once the environment is activated and model weights are downloaded, the service can be launched in one of two modes depending on the deployment context.
- Local mode: infrastructure services run in Docker; the API server and workers run directly on the host machine.
  just quick-start-local
- Deploy mode: the full stack runs in Docker, including infrastructure, API server, and both worker processes.
  just quick-start-deploy
Note:
Deploy mode requires NVIDIA Container Toolkit to expose the GPU to the `worker-inference` container.
Whichever mode you choose, the launch script will automatically execute all necessary setup steps, start the server, and open the Swagger UI in your default web browser once the API is ready.
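
The "wait until the API is ready, then open Swagger UI" behavior described above can be sketched as a simple polling loop. This is an illustration only and not the project's launch script: the URL, port, timeouts, and function names are all assumptions (FastAPI serves Swagger UI at `/docs` by default, but the actual host and port depend on the configuration).

```python
import time
import urllib.error
import urllib.request
import webbrowser
from typing import Callable

# Assumed address; adjust to the host and port the server actually binds to.
DOCS_URL = "http://localhost:8000/docs"


def http_ok(url: str) -> bool:
    """Return True if the URL answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=2) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_until_ready(probe: Callable[[], bool],
                     timeout_sec: float = 60.0,
                     interval_sec: float = 1.0) -> bool:
    """Call `probe` repeatedly until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout_sec
    while time.monotonic() < deadline:
        if probe():
            return True
        time.sleep(interval_sec)
    return False


def open_docs_when_ready(url: str = DOCS_URL) -> bool:
    """Open the docs page in the default browser once the API responds."""
    ready = wait_until_ready(lambda: http_ok(url))
    if ready:
        webbrowser.open(url)
    return ready
```

Passing the readiness check in as a callable keeps the polling loop independent of HTTP, so the same loop could wait on a database or message broker instead.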
Want to see what happens under the hood?
The launch scripts are fully documented with step-by-step comments explaining each action. You can find them here:
When you are done, shut down the running services depending on the active launch mode.
- Local mode: stop the API server and workers by terminating their terminal processes, then bring down the infrastructure containers.
  just docker-local-down
- Deploy mode: stop and remove all containers with a single command.
  just docker-deploy-down
This project is licensed under the Apache License, Version 2.0.