High-Performance HFT Matching Engine Latency Optimizer
When microseconds cost millions
Production-grade Gymnasium environment for optimizing high-frequency trading matching engine latencies through reinforcement learning. C++20 simulator with zero Python overhead during simulation, bound via Pybind11.
100 µs per step. Zero dynamic allocation. Nanosecond precision.
"In high-frequency trading, every microsecond of latency is quantifiable profit loss."
A trader's competitive edge depends on tuning three critical parameters: batch size, polling rate, and memory pre-allocation strategy. These are not static: optimal configurations change with market conditions, so manual tuning cannot keep pace. Latency Gym lets RL agents discover optimal configurations automatically, accounting for both mean latency and tail risk (p99/p99.9).
HFT matching engines operate under extreme constraints. Three parameters control the entire system's latency profile:
| Parameter | Range | Meaning | Trade-off |
|---|---|---|---|
| Batch Size | 1–64 | Orders matched per polling cycle | Larger = lower latency but higher variance |
| Polling Rate | 1–10 | Divisor (1/x checks per cycle) | Faster = lower latency, higher CPU |
| Pre-alloc Pool | 1–5 | Memory pre-allocation levels | Higher = faster allocation, more memory |
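
As an illustration of how these three parameters might be handled on the Python side, the sketch below validates a three-element action against the ranges in the table and turns it into a named configuration. `PARAM_RANGES`, `EngineConfig`, and `decode_action` are hypothetical names, not part of the Latency Gym API, and the sketch assumes the action carries the parameter values directly.

```python
from dataclasses import dataclass

# Ranges copied from the parameter table; the key names are illustrative.
PARAM_RANGES = {
    "batch_size": (1, 64),
    "poll_divisor": (1, 10),
    "prealloc_level": (1, 5),
}


@dataclass(frozen=True)
class EngineConfig:
    """Named view of one action (hypothetical helper type)."""
    batch_size: int
    poll_divisor: int
    prealloc_level: int


def decode_action(action):
    """Validate a 3-element action and return it as an EngineConfig."""
    if len(action) != len(PARAM_RANGES):
        raise ValueError(f"expected {len(PARAM_RANGES)} values, got {len(action)}")
    values = {}
    for (name, (low, high)), raw in zip(PARAM_RANGES.items(), action):
        value = int(raw)
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside [{low}, {high}]")
        values[name] = value
    return EngineConfig(**values)


print(decode_action([3, 4, 1]))
```

Rejecting out-of-range values early keeps configuration errors out of the simulation loop; whether the real environment clips or rejects such actions is not stated here.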
The observation vector has four components:

- queue_depth: current unmatched orders (0–4096)
- mean_latency_ns: average latency in nanoseconds (0–1e9)
- variance: latency variance over a sliding 1000-order window (0–1e18)
- drops: cumulative buffer overflows (0–1e9)

The core innovation: the reward explicitly penalizes tail latencies and variance, not just the mean.
Why variance matters: Two systems with identical mean latencies differ drastically if one has p99 = 150 µs and the other p99 = 5 ms. The reward function explicitly captures this asymmetry.
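
The exact reward formula is defined by the simulator and is not reproduced here. As a minimal sketch of the idea, the function below combines mean, standard deviation, and p99 into a single negative cost. The function names and the weights `w_mean`, `w_std`, and `w_p99` are assumptions chosen for illustration only.

```python
import math
import statistics


def nearest_rank_percentile(samples, pct):
    """Return the pct-th percentile using the nearest-rank method."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def tail_aware_reward(latencies_ns, w_mean=1.0, w_std=0.5, w_p99=2.0):
    """Negative weighted cost in microseconds (hypothetical weights)."""
    mean = statistics.fmean(latencies_ns)
    std = statistics.pstdev(latencies_ns)
    p99 = nearest_rank_percentile(latencies_ns, 99)
    cost_ns = w_mean * mean + w_std * std + w_p99 * p99
    return -cost_ns / 1_000.0


# Two latency profiles with the same 100 µs mean but very different tails.
steady = [100_000] * 100
spiky = [50_000] * 98 + [2_550_000] * 2

print("steady:", tail_aware_reward(steady))
print("spiky: ", tail_aware_reward(spiky))
```

A mean-only reward would score both profiles identically; adding the spread and p99 terms makes the spiky profile strictly worse.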
Latency Gym Simulator (C++20)

| Component | Role |
|---|---|
| TimeCounter | Nanosecond-precision timestamps |
| Order (48 bytes) | Lightweight order struct |
| OrderRingBuffer | Fixed-capacity, zero-copy |
| LatencyStatsWindow | Rolling O(1) percentile tracking |
| LatencySimulator | Discrete-event loop |

- Compiled with: -O3 -march=native
- Per-step cost: ~100 µs on modern CPU
- Memory allocation: zero in hot loop
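
To make the fixed-capacity, constant-time design concrete, here is a small Python sketch of a sliding-window statistic kept over a preallocated ring buffer. It is not the C++ `LatencyStatsWindow`: the class name `SlidingWindowStats` is hypothetical, and it tracks only mean and variance with running sums, a simpler stand-in for the percentile tracking the simulator performs.

```python
class SlidingWindowStats:
    """Mean and variance over the last `capacity` samples, O(1) per update."""

    def __init__(self, capacity=1000):
        self._buf = [0.0] * capacity  # allocated once, reused forever
        self._capacity = capacity
        self._count = 0
        self._head = 0  # next slot to overwrite
        self._sum = 0.0
        self._sum_sq = 0.0

    def push(self, value):
        if self._count == self._capacity:
            # Window is full: evict the oldest sample from the running sums.
            old = self._buf[self._head]
            self._sum -= old
            self._sum_sq -= old * old
        else:
            self._count += 1
        self._buf[self._head] = value
        self._sum += value
        self._sum_sq += value * value
        self._head = (self._head + 1) % self._capacity

    @property
    def mean(self):
        return self._sum / self._count if self._count else 0.0

    @property
    def variance(self):
        if self._count == 0:
            return 0.0
        mean = self.mean
        # Clamp tiny negative values caused by floating-point cancellation.
        return max(0.0, self._sum_sq / self._count - mean * mean)


window = SlidingWindowStats(capacity=3)
for latency_ns in (1.0, 2.0, 3.0, 4.0):
    window.push(latency_ns)
print(window.mean, window.variance)  # statistics of the last three samples
```

Running sums are easy to follow but lose precision when values are large and close together; a production version would likely use a more numerically stable update.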
Clean Python interface via Pybind11:

    import gymnasium as gym

    env = gym.make("hft-latency-v0")
    obs, info = env.reset()
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)

Approximate cost:

| Metric | Value |
|---|---|
| Single step | ~100 µs |
| 1,000 steps | ~100 ms |
| 1M steps | ~100 s |
| Base memory | ~2 MB |
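
One way to check the per-step figure on your own hardware is a small timing harness. The sketch below is generic: `time_steps` is a hypothetical helper that times any zero-argument callable, so it runs without the environment installed.

```python
import time


def time_steps(step_fn, n_steps=1000):
    """Return the mean wall-clock cost of step_fn() in microseconds."""
    start = time.perf_counter_ns()
    for _ in range(n_steps):
        step_fn()
    elapsed_ns = time.perf_counter_ns() - start
    return elapsed_ns / n_steps / 1_000.0


# Stand-in workload; replace with a real environment step to benchmark it.
mean_us = time_steps(lambda: sum(range(100)), n_steps=1000)
print(f"mean cost per call: {mean_us:.2f} µs")
```

With the environment created and reset, passing `lambda: env.step(env.action_space.sample())` as `step_fn` measures a full step including action sampling, so the result will sit slightly above the simulator's own cost.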
Install from source:

    git clone https://github.com/prakulhiremath/latency-gym.git
    cd latency-gym
    pip install -e .

Create the environment and inspect its spaces:

    import gymnasium as gym

    env = gym.make("hft-latency-v0")
    obs, info = env.reset(seed=42)
    print("Observation shape:", obs.shape)
    print("Action space:", env.action_space)

Take a single step with an explicit action:

    import gymnasium as gym
    import numpy as np

    env = gym.make("hft-latency-v0")
    obs, info = env.reset(seed=42)
    action = np.array([3, 4, 1])  # batch_size=3, poll_divisor=4, prealloc=1
    obs, reward, terminated, truncated, info = env.step(action)
    print(f"Reward: {reward:.4f}")
    print(f"Queue depth: {obs[0]:.1f}")
    print(f"Mean latency (ns): {obs[1]:.0f}")

Run one episode with random actions:

    import gymnasium as gym

    env = gym.make("hft-latency-v0")
    obs, info = env.reset()
    total_reward = 0.0
    for step in range(1000):
        action = env.action_space.sample()
        obs, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        if terminated or truncated:
            break
    print(f"Episode return: {total_reward:.2f}")

Train a PPO agent with Stable-Baselines3:

    import gymnasium as gym
    from stable_baselines3 import PPO
    from stable_baselines3.common.vec_env import DummyVecEnv, VecNormalize

    # Wrap the environment for SB3, then normalize observations and rewards.
    env = DummyVecEnv([lambda: gym.make("hft-latency-v0")])
    env = VecNormalize(env, norm_obs=True, norm_reward=True)
    model = PPO("MlpPolicy", env, learning_rate=1e-4, verbose=1)
    model.learn(total_timesteps=100_000)

Run the test suite:

    pip install -e ".[dev]"
    pytest tests/test_env.py -v

50+ deterministic tests, all passing.
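
The contents of `tests/test_env.py` are not shown here. As a hedged sketch of what a determinism check can look like for any environment with the Gymnasium `reset`/`step` signature, the helpers below replay the same seed and actions twice and compare the traces. `rollout` and `is_deterministic` are hypothetical names, not functions from the test suite.

```python
def rollout(env, seed, actions):
    """Reset with `seed`, apply `actions`, and record observations and rewards."""
    obs, _info = env.reset(seed=seed)
    trace = [list(obs)]
    for action in actions:
        obs, reward, terminated, truncated, _info = env.step(action)
        trace.append((list(obs), float(reward)))
        if terminated or truncated:
            break
    return trace


def is_deterministic(make_env, seed, actions):
    """True if two fresh environments produce identical traces."""
    first = rollout(make_env(), seed, actions)
    second = rollout(make_env(), seed, actions)
    return first == second
```

With the package installed, `make_env` could be `lambda: gym.make("hft-latency-v0")` and `actions` a fixed list of valid actions, so that the seed is the only source of randomness.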
Project layout:

    latency-gym/
    ├── CMakeLists.txt
    ├── pyproject.toml
    ├── README.md
    ├── assets/
    │   └── latency_gym_training.gif
    ├── include/
    │   └── latency_gym/
    │       └── engine.hpp
    ├── src/
    │   └── bindings.cpp
    ├── latency_gym/
    │   ├── __init__.py
    │   └── envs/
    │       ├── __init__.py
    │       └── hft_env.py
    └── tests/
        ├── __init__.py
        └── test_env.py
Citation:

    @software{latency_gym_2026,
      title={Latency Gym: High-Performance HFT Matching Engine Latency Optimizer},
      author={Prakul S. Hiremath},
      year={2026},
      url={https://github.com/prakulhiremath/latency-gym}
    }
Latency Gym: High-Performance HFT Matching Engine Optimization
MIT License · Python 3.8+ · C++20 · Open Source
"Built with precision for high-frequency trading simulation."