A production-grade, open-source Gymnasium environment for optimizing high-frequency trading (HFT) matching engine latencies through reinforcement learning. Written in C++20 with zero Python overhead during simulation, bound to Python via Pybind11, and packaged with modern PEP 517/scikit-build-core standards.
Latency Gym simulates the critical performance bottlenecks of HFT order matching systems:
- Network queue dynamics with ring-buffer allocations
- Packet loss and buffer overflows under bursty traffic
- Nanosecond-precision latency tracking across orders
- Tail latency optimization via variance penalties (p99/p99.9)
In high-frequency trading, microseconds cost millions. A trader's competitive edge depends on tuning three critical parameters:
- Batch Size (1-64 orders/cycle): How many orders to match per polling cycle
- Polling Rate (1-10 divisor): How often to check the network for new orders
- Pre-allocation Pool (1-5 levels): Memory pre-allocation strategy for order buffers
Latency Gym allows RL agents to discover optimal configurations under varying market conditions, accounting for both mean latency and tail risk (p99/p99.9 latencies).
Above: 500-step training animation showing an RL agent learning to optimize P99 tail latency, cumulative reward, and queue management compared to random baseline. The green fill shows cumulative improvement across all metrics.
Discrete choice of three parameters across the following ranges:
- Batch Size: 1 to 64
- Polling Rate: 1 to 10
- Pre-allocation Pool: 1 to 5
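Encoded as `MultiDiscrete([64, 10, 5])` in Gymnasium. `MultiDiscrete` samples zero-based indices (0-63, 0-9, 0-4), so each index presumably maps onto the 1-based ranges above by adding 1.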
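The relationship between sampled indices and the parameter ranges can be sketched as follows. This is a minimal illustration, not code from the package: `decode_action` and `ACTION_SIZES` are hypothetical names, and the `+1` offset is an assumption based on the 1-based ranges listed above.

```python
# Assumption: the environment shifts zero-based MultiDiscrete indices
# onto the 1-based ranges listed in this README.
ACTION_SIZES = (64, 10, 5)  # batch size, polling divisor, pool level


def decode_action(indices):
    """Map zero-based indices to 1-based parameter values."""
    if len(indices) != len(ACTION_SIZES):
        raise ValueError("expected exactly three action components")
    params = []
    for index, size in zip(indices, ACTION_SIZES):
        if not 0 <= index < size:
            raise ValueError(f"index {index} outside [0, {size})")
        params.append(int(index) + 1)
    batch_size, polling_divisor, pool_level = params
    return {
        "batch_size": batch_size,
        "polling_divisor": polling_divisor,
        "pool_level": pool_level,
    }


print(decode_action([0, 0, 0]))    # smallest configuration
print(decode_action([63, 9, 4]))   # largest configuration
```

Validating the index range up front makes an off-by-one mistake fail loudly instead of silently selecting a neighbouring configuration.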
Four continuous metrics tracking system state:
- queue_depth: Current number of unmatched orders (0-4096)
- mean_latency_ns: Average latency in nanoseconds (0-1e9)
- latency_variance: Variance over 1000-order sliding window (0-1e18)
- packet_drops: Cumulative overflows (0-1e9)
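Because these four metrics span very different magnitudes (thousands for queue depth, up to 1e18 for variance), agents usually train on a rescaled view. The sketch below shows one possible log-scaling into [0, 1] using only the bounds listed above. `normalize_observation` and `OBS_HIGH` are hypothetical names, and this preprocessing is not claimed to be part of the package.

```python
import math

# Upper bounds as listed above; all lower bounds are zero.
OBS_HIGH = {
    "queue_depth": 4096.0,
    "mean_latency_ns": 1e9,
    "latency_variance": 1e18,
    "packet_drops": 1e9,
}


def normalize_observation(obs):
    """Clip each metric to its bounds and log-scale it into [0, 1]."""
    scaled = []
    for value, high in zip(obs, OBS_HIGH.values()):
        clipped = min(max(float(value), 0.0), high)
        scaled.append(math.log1p(clipped) / math.log1p(high))
    return scaled


raw = [128, 2_500, 4.0e6, 0]  # hypothetical observation
print([round(x, 3) for x in normalize_observation(raw)])
```

A wrapper such as Stable-Baselines3's `VecNormalize` achieves a similar goal with running statistics instead of fixed bounds.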
The core innovation: explicitly penalize tail latencies and variance, not just mean.
Reward = -(alpha × mean_latency + beta × variance + gamma × drops)
Hyperparameters (defaults):
- alpha = 1.0: Weight on mean latency
- beta = 0.5: Weight on variance (tail risk)
- gamma = 2.0: Weight on packet drops (catastrophic failures)
Why variance matters: Two systems with identical mean latencies differ drastically if one has p99 = 150 µs and the other has p99 = 5 ms.
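The effect of the variance term can be checked with a small worked example. The sketch below applies the formula and default weights stated above to two hypothetical latency samples (arbitrary units) that share the same mean. The function name `reward` and both samples are illustrative only.

```python
import statistics

ALPHA, BETA, GAMMA = 1.0, 0.5, 2.0  # defaults listed above


def reward(mean_latency, variance, drops,
           alpha=ALPHA, beta=BETA, gamma=GAMMA):
    """Negative weighted sum of mean latency, variance, and drops."""
    return -(alpha * mean_latency + beta * variance + gamma * drops)


# Two hypothetical latency samples with the same mean of 100.
steady = [100, 100, 100, 100]
spiky = [50, 50, 50, 250]

for name, sample in (("steady", steady), ("spiky", spiky)):
    mean = statistics.fmean(sample)
    var = statistics.pvariance(sample)
    print(name, mean, var, reward(mean, var, drops=0))
```

The steady sample has zero variance, so its reward is just the negated mean. The spiky sample has a population variance of 7500, which dominates its reward even though the mean is unchanged.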
High-performance components:
- TimeCounter: Nanosecond-precision timestamp arithmetic
- Order: Lightweight order struct (48 bytes)
- OrderRingBuffer: Fixed-capacity ring buffer with wraparound tracking
- LatencyStatsWindow: Rolling statistics with O(1) percentile tracking
- LatencySimulator: Deterministic discrete-event simulator
Key optimizations:
- No dynamic allocation in hot loop
- Vectorized percentile computation
- Nanosecond arithmetic with integer math
- Compiled with `-O3 -march=native` flags
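To give a feel for how batch size and polling rate interact in a deterministic discrete-event loop, here is a toy queue model in Python. It is not the C++ engine's logic: the arrival pattern, the tick length, and every name in it (`simulate`, `arrivals_per_tick`, `tick_ns`) are assumptions chosen for illustration, using integer nanosecond arithmetic throughout.

```python
from collections import deque


def simulate(batch_size, polling_divisor, arrivals_per_tick=4,
             ticks=1_000, capacity=4096, tick_ns=1_000):
    """Toy model: poll every `polling_divisor` ticks and match one batch."""
    queue = deque()          # arrival tick of each waiting order
    latencies_ns = []
    drops = 0
    for tick in range(ticks):
        for _ in range(arrivals_per_tick):
            if len(queue) < capacity:
                queue.append(tick)
            else:
                drops += 1   # buffer full: the order is lost
        if tick % polling_divisor == 0:
            for _ in range(min(batch_size, len(queue))):
                arrived = queue.popleft()
                latencies_ns.append((tick - arrived) * tick_ns)
    mean_ns = sum(latencies_ns) / len(latencies_ns) if latencies_ns else 0.0
    return {"mean_latency_ns": mean_ns,
            "queue_depth": len(queue),
            "drops": drops}


print(simulate(batch_size=8, polling_divisor=1))   # keeps up with arrivals
print(simulate(batch_size=8, polling_divisor=4))   # falls behind, queue grows
```

With the same batch size, polling four times less often leaves the matcher unable to drain the queue, so both queue depth and mean latency rise. This is the kind of trade-off the agent is meant to explore.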
Clean interface to C++ via Pybind11:
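```python
import gymnasium as gym

# If the id is not found, the package may need to be imported first to
# register it (for example `import latency_gym`); this is an assumption.
env = gym.make("hft-latency-v0")
obs, info = env.reset()

for step in range(1000):
    action = env.action_space.sample()  # random action
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()  # start a new episode before stepping again
```

Install from source:

```bash
git clone https://github.com/prakulhiremath/latency-gym.git
cd latency-gym
pip install -e .
```

Requirements: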
- Python 3.8+
- CMake 3.15+
- C++20 compiler (gcc-9+, clang-10+, MSVC 2019+)
Create the environment and inspect the observation:

```python
import gymnasium as gym

env = gym.make("hft-latency-v0")
obs, info = env.reset()
print("Observation shape:", obs.shape)
```

Take a single step with a fixed action:

```python
import gymnasium as gym
import numpy as np

env = gym.make("hft-latency-v0")
obs, info = env.reset(seed=42)  # seeded for reproducibility

# Components: [batch size, polling rate, pre-allocation pool]
action = np.array([3, 4, 1])
obs, reward, terminated, truncated, info = env.step(action)

print(f"Reward: {reward:.4f}")
print(f"Queue depth: {obs[0]:.1f}")
print(f"Mean latency (ns): {obs[1]:.0f}")
```

Run one episode with random actions:

```python
env = gym.make("hft-latency-v0")
obs, info = env.reset()

total_reward = 0
for step in range(1000):
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    if terminated or truncated:
        break

print(f"Episode return: {total_reward:.2f}")
```

Inspect detailed latency statistics, including tail percentiles:

```python
env = gym.make("hft-latency-v0")
env.reset()
for _ in range(100):
    env.step(env.action_space.sample())

# gym.make returns a wrapped environment; custom methods live on the
# underlying environment, reached via `unwrapped`.
state = env.unwrapped.get_state_dict()
print(f"Mean latency: {state['mean_latency_ns']:.0f} ns")
print(f"p99 latency: {state['p99_latency_ns']:.0f} ns")
print(f"p99.9 latency: {state['p999_latency_ns']:.0f} ns")
```

Train a PPO agent with Stable-Baselines3:

```python
import gymnasium as gym
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv, VecNormalize

# A single vectorized environment with normalized observations and rewards.
env = DummyVecEnv([lambda: gym.make("hft-latency-v0")])
env = VecNormalize(env, norm_obs=True, norm_reward=True)

model = PPO("MlpPolicy", env, learning_rate=1e-4, verbose=1)
model.learn(total_timesteps=100_000)
```

Project layout:

```
latency-gym/
├── CMakeLists.txt
├── pyproject.toml
├── README.md
├── assets/
│   └── latency_gym_training.gif
├── include/
│   └── latency_gym/
│       └── engine.hpp
├── src/
│   └── bindings.cpp
├── latency_gym/
│   ├── __init__.py
│   └── envs/
│       ├── __init__.py
│       └── hft_env.py
└── tests/
    ├── __init__.py
    └── test_env.py
```
- Single step: ~100 µs on a modern CPU
- 1000 steps: ~100 ms
- 1M steps: ~100 seconds
- Zero Python overhead during step (C++ compiled loop)
- Base environment: ~2 MB
- Per-step allocation: 0 bytes (pre-allocated ring buffer)
- Scales to: 1B+ order matches without reallocation
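Figures like these depend on hardware, so it is worth measuring them locally. The harness below is a minimal sketch that times any zero-argument callable with `time.perf_counter_ns`. `benchmark` and `fake_step` are hypothetical names, and `fake_step` is only a stand-in workload, not the environment.

```python
import time


def benchmark(step_fn, n_steps=10_000):
    """Return the average wall-clock cost of one call to step_fn, in ns."""
    start = time.perf_counter_ns()
    for _ in range(n_steps):
        step_fn()
    elapsed_ns = time.perf_counter_ns() - start
    return elapsed_ns / n_steps


def fake_step():
    """Stand-in workload; swap in a callable that invokes env.step(...)."""
    return sum(range(100))


per_step_ns = benchmark(fake_step)
print(f"{per_step_ns:.0f} ns/step, about {1e9 / per_step_ns:,.0f} steps/s")
```

To time the real environment, one option is to pass `lambda: env.step(env.action_space.sample())` after a reset, keeping in mind that the measurement then also includes the cost of sampling the action.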
Run the comprehensive test suite:
```bash
pip install -e ".[dev]"
pytest tests/test_env.py -v
```

Test Coverage:
- Environment initialization, reset, step
- Action/observation space compliance
- Reward computation and bounds
- Memory safety (no leaks/segfaults) over 1000+ steps
- Numerical stability (no NaN/Inf)
- Gymnasium integration
- Random agent baseline
- C++ simulator directly
Test count: 50+ tests, all deterministic
```cpp
// Weighted penalty terms; the reward is their negated sum.
double mean_penalty = alpha * state.mean_latency_ns;
double variance_penalty = beta * state.latency_variance;
double drop_penalty = gamma * state.packet_drops;
reward = -(mean_penalty + variance_penalty + drop_penalty);
```

Numerical stability:
- Latencies capped at 1 second max
- Variance computed over 1000-element window
- Percentiles via sorted array
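The three safeguards above can be sketched together in a few lines of Python. This is an illustrative stand-in for the C++ statistics window, not its implementation: `RollingLatencyStats` is a hypothetical name, the percentile uses the nearest-rank definition (an assumption), and sorting on every query costs O(n log n), which a production version would likely avoid.

```python
import math
from collections import deque

MAX_LATENCY_NS = 1_000_000_000  # 1 second cap, as listed above
WINDOW = 1_000                  # window length, as listed above


class RollingLatencyStats:
    """Sliding-window mean, variance, and percentiles over capped latencies."""

    def __init__(self, window=WINDOW):
        self.samples = deque(maxlen=window)  # oldest sample is evicted first

    def add(self, latency_ns):
        # Clamp into [0, 1 s] so one outlier cannot blow up the statistics.
        self.samples.append(min(max(int(latency_ns), 0), MAX_LATENCY_NS))

    def mean(self):
        return sum(self.samples) / len(self.samples) if self.samples else 0.0

    def variance(self):
        if not self.samples:
            return 0.0
        mu = self.mean()
        return sum((x - mu) ** 2 for x in self.samples) / len(self.samples)

    def percentile(self, q):
        """Nearest-rank percentile from a sorted copy, with q in (0, 100]."""
        if not self.samples:
            return 0
        ordered = sorted(self.samples)
        # Round before ceil to absorb floating-point noise in n * q / 100.
        rank = max(1, math.ceil(round(len(ordered) * q / 100, 9)))
        return ordered[rank - 1]


stats = RollingLatencyStats()
for latency in range(1, 1001):      # hypothetical samples: 1..1000 ns
    stats.add(latency)
stats.add(5 * MAX_LATENCY_NS)       # capped to 1 s; evicts the oldest sample
print(stats.percentile(99), stats.percentile(99.9), max(stats.samples))
```

The final sample shows why the cap matters: a 5-second outlier is stored as 1 second, so it still shows up in the maximum without dominating the variance by a further factor of 25.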
| Index | Min | Max | Meaning |
|---|---|---|---|
| 0 | 1 | 64 | Batch size |
| 1 | 1 | 10 | Polling rate divisor |
| 2 | 1 | 5 | Pre-allocation pool level |
When the buffer is full:
- New orders are dropped
- Packet drops counter increments
- Reward penalty applied via gamma term
- Agent learns to keep queue lower
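The overflow behaviour described above can be illustrated with a small fixed-capacity ring buffer. This Python sketch is a stand-in for the C++ buffer, not a port of it: `RingBufferSketch` and its method names are hypothetical, and it models only the drop-newest policy and the drop counter.

```python
class RingBufferSketch:
    """Fixed-capacity FIFO ring buffer that drops new items when full."""

    def __init__(self, capacity):
        self._slots = [None] * capacity   # allocated once, then reused
        self._capacity = capacity
        self._head = 0                    # index of the oldest item
        self._size = 0
        self.drops = 0

    def push(self, order):
        if self._size == self._capacity:
            self.drops += 1               # buffer full: reject the new order
            return False
        tail = (self._head + self._size) % self._capacity
        self._slots[tail] = order
        self._size += 1
        return True

    def pop(self):
        if self._size == 0:
            return None
        order = self._slots[self._head]
        self._slots[self._head] = None
        self._head = (self._head + 1) % self._capacity  # wrap around
        self._size -= 1
        return order

    def __len__(self):
        return self._size


buf = RingBufferSketch(capacity=4)
accepted = [buf.push(order_id) for order_id in range(6)]
print(accepted, len(buf), buf.drops)
```

Each rejected push increments `drops`, which is the quantity the gamma term of the reward penalizes.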
We welcome contributions. Please:
- Fork the repository
- Create a feature branch
- Write tests for new functionality
- Ensure all tests pass: `pytest tests/`
- Submit a pull request
MIT License. See the LICENSE file for full text.
If you use Latency Gym in research:
```bibtex
@software{latency_gym_2026,
  title={Latency Gym: High-Performance HFT Matching Engine Latency Optimizer},
  author={Prakul S. Hiremath},
  year={2026},
  url={https://github.com/prakulhiremath/latency-gym}
}
```

Built with precision for high-frequency trading simulation. ⚡
