🚀 High-Throughput Real-Time Edge AI Inference System

(TensorRT + Multi-thread + Asynchronous Pipeline)

🎬 Demo (Real-time Mode ~25 FPS)

⚡ High Throughput Mode (~60+ FPS)

In offline mode the system runs significantly faster, because pipeline overlap keeps the GPU fully utilized.

📌 Project Overview

This project implements a high-performance edge AI inference system using TensorRT in C++.

A multi-threaded pipeline decouples capture, preprocessing, inference, and postprocessing.
Cross-frame asynchronous execution overlaps CPU and GPU work, which significantly improves throughput.

The system supports:

Real-time mode (~25 FPS)
High-throughput mode (~60+ FPS)

🏗️ System Architecture

Capture → Preprocess → Inference (GPU) → Postprocess → Display

Multi-thread pipeline
Thread-safe queues with backpressure control
Frame dropping strategy for real-time stability
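
The queue behaviour listed above can be sketched as a single bounded queue with two push policies: a blocking push that applies backpressure, and a non-blocking push that evicts the oldest frame. This is a minimal sketch, not the code in main.cpp; the class name `BoundedQueue` and its methods are hypothetical, and it assumes C++17 for `std::optional`.

```cpp
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <optional>
#include <utility>

// Hypothetical bounded queue; name and interface are illustrative only.
template <typename T>
class BoundedQueue {
public:
    explicit BoundedQueue(std::size_t capacity) : capacity_(capacity) {}

    // Backpressure: block the producer until a slot is free.
    // Returns false if the queue was closed before the item could be stored.
    bool push_blocking(T item) {
        std::unique_lock<std::mutex> lock(mutex_);
        not_full_.wait(lock, [&] { return closed_ || queue_.size() < capacity_; });
        if (closed_) return false;
        queue_.push_back(std::move(item));
        not_empty_.notify_one();
        return true;
    }

    // Frame dropping: never block; evict the oldest item when the queue is full.
    // Returns true if an older item was evicted to make room.
    bool push_drop_oldest(T item) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (closed_) return false;
        bool dropped = false;
        if (queue_.size() >= capacity_) {
            queue_.pop_front();
            dropped = true;
        }
        queue_.push_back(std::move(item));
        not_empty_.notify_one();
        return dropped;
    }

    // Blocks until an item is available.
    // Returns std::nullopt once the queue is closed and drained.
    std::optional<T> pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        not_empty_.wait(lock, [&] { return closed_ || !queue_.empty(); });
        if (queue_.empty()) return std::nullopt;
        T item = std::move(queue_.front());
        queue_.pop_front();
        not_full_.notify_one();
        return item;
    }

    // Wakes all waiting threads so the pipeline can shut down.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        not_empty_.notify_all();
        not_full_.notify_all();
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return queue_.size();
    }

private:
    const std::size_t capacity_;
    mutable std::mutex mutex_;
    std::condition_variable not_full_;
    std::condition_variable not_empty_;
    std::deque<T> queue_;
    bool closed_ = false;
};
```

Under this sketch, a real-time configuration would call `push_drop_oldest` so the display never lags behind the camera, while a throughput configuration would call `push_blocking` so that no frame is lost.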

Pipeline Design

Producer: continuously captures frames
Preprocess: letterbox, normalization, CHW conversion
Inference: TensorRT FP16 execution on GPU
Postprocess: decode + NMS + rendering
Consumer: real-time display

Data moves between threads as Task structures. Frames are dropped when necessary to maintain real-time performance.
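
The preprocessing steps named above (letterbox, normalization, CHW conversion) can be illustrated without OpenCV. The sketch below computes the letterbox geometry and converts interleaved 8-bit pixels to planar floats. The helper names are hypothetical, and scaling to [0, 1] by dividing by 255 is an assumption; the source does not state which normalization the model expects.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper type; not taken from the project's main.cpp.
struct LetterboxParams {
    float scale;  // uniform resize factor that preserves aspect ratio
    int new_w;    // resized width before padding
    int new_h;    // resized height before padding
    int pad_x;    // left padding in pixels
    int pad_y;    // top padding in pixels
};

// Computes how a src_w x src_h image fits into a dst_w x dst_h canvas.
inline LetterboxParams compute_letterbox(int src_w, int src_h, int dst_w, int dst_h) {
    const float scale = std::min(static_cast<float>(dst_w) / src_w,
                                 static_cast<float>(dst_h) / src_h);
    const int new_w = static_cast<int>(std::round(src_w * scale));
    const int new_h = static_cast<int>(std::round(src_h * scale));
    return {scale, new_w, new_h, (dst_w - new_w) / 2, (dst_h - new_h) / 2};
}

// Converts interleaved 8-bit HWC pixels to planar float CHW.
// Assumption: values are scaled to [0, 1] by dividing by 255.
inline std::vector<float> hwc_to_chw_normalized(const std::vector<std::uint8_t>& hwc,
                                                int width, int height, int channels) {
    std::vector<float> chw(hwc.size());
    const std::size_t plane = static_cast<std::size_t>(width) * height;
    for (std::size_t p = 0; p < plane; ++p) {
        for (int c = 0; c < channels; ++c) {
            chw[c * plane + p] = hwc[p * channels + c] / 255.0f;
        }
    }
    return chw;
}
```

The padding offsets and scale are also what a postprocess stage would need to map detected boxes back to the original frame coordinates.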

Quantization Performance Comparison

Precision	Avg Inference Time	Improvement
FP16	4.316 ms	Baseline
INT8	3.940 ms	~8.7% Faster
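
For reference, the improvement figure follows directly from the two averages in the table, expressed as a reduction in latency relative to FP16:

```latex
\frac{4.316\ \text{ms} - 3.940\ \text{ms}}{4.316\ \text{ms}} \approx 0.087 = 8.7\%,
\qquad
\frac{4.316\ \text{ms}}{3.940\ \text{ms}} \approx 1.095\times \text{ speedup}
```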

Profiling Method

Performance was measured with NVIDIA Nsight Systems:

nsys profile --trace=cuda,nvtx,osrt -o report ./App

📊 Performance

Final Performance

Real-time mode: ~25 FPS
Throughput mode: ~60+ FPS
Inference (GPU kernel): ~2–3 ms
Preprocess: ~3 ms
Postprocess: ~9 ms

Optimization Steps

Stage	FPS
Single-thread baseline	~5 FPS
Multi-thread pipeline	~10 FPS
TensorRT FP16 inference	~20 FPS
Async pipeline overlap	~30+ FPS

💡 Technical Highlights

Multi-threaded pipeline with backpressure control
Cross-frame asynchronous execution (CPU-GPU overlap)
Double buffering for efficient memory reuse
TensorRT FP16 acceleration
Real-time vs high-throughput dual-mode design
End-to-end performance profiling (CPU + GPU)
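
The cross-frame overlap and double buffering listed above can be illustrated with a CPU-only sketch. It assumes two input buffers used alternately: while inference for frame i-1 runs in the background on one buffer, the CPU preprocesses frame i into the other. Here `std::async` stands in for an asynchronous GPU enqueue, and `preprocess`, `infer`, and `run_overlapped` are hypothetical placeholders, not the project's functions.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <future>
#include <vector>

// Placeholder CPU stage: fills the buffer with the frame id.
inline void preprocess(int frame_id, std::vector<float>& buf) {
    std::fill(buf.begin(), buf.end(), static_cast<float>(frame_id));
}

// Placeholder for inference: sums the buffer.
// In a real pipeline this would be an asynchronous GPU launch.
inline float infer(const std::vector<float>& buf) {
    float sum = 0.0f;
    for (float v : buf) sum += v;
    return sum;
}

// Runs num_frames frames with preprocess(i) overlapping infer(i - 1).
inline std::vector<float> run_overlapped(int num_frames, std::size_t buf_size) {
    std::array<std::vector<float>, 2> buffers{std::vector<float>(buf_size),
                                              std::vector<float>(buf_size)};
    std::vector<float> results;
    std::future<float> in_flight;  // inference of the previous frame

    for (int i = 0; i < num_frames; ++i) {
        // Buffer i % 2 is free: the frame that last used it (i - 2)
        // was already waited on in the previous iteration.
        std::vector<float>* input = &buffers[i % 2];
        preprocess(i, *input);  // overlaps with inference of frame i - 1

        // Collect frame i - 1 before launching frame i.
        if (in_flight.valid()) results.push_back(in_flight.get());

        in_flight = std::async(std::launch::async,
                               [input] { return infer(*input); });
    }
    if (in_flight.valid()) results.push_back(in_flight.get());
    return results;
}
```

With two buffers, at most one inference is in flight at a time, and the buffer being written is never the one being read. A deeper pipeline would need more buffers or explicit per-buffer synchronization.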

Technology Stack

C++
TensorRT
CUDA
OpenCV
Multi-threading (std::thread, mutex, condition_variable)

