
Latency Gym

High-Performance HFT Matching Engine Latency Optimizer

When microseconds cost millions

Production-grade Gymnasium environment for optimizing high-frequency trading (HFT) matching-engine latency with reinforcement learning. The simulator is written in C++20 and exposed to Python via pybind11, so the simulation loop itself runs in native code with no Python overhead.


~100 µs per step. Zero dynamic allocation in the hot loop. Nanosecond-precision timestamps.

"In high-frequency trading, every microsecond of latency is quantifiable profit loss."

A trading firm's competitive edge depends on tuning three critical engine parameters: batch size, polling rate, and memory pre-allocation strategy. These settings are not static: the optimal configuration shifts with market conditions, so manual tuning cannot keep up. Latency Gym lets RL agents discover optimal configurations automatically, accounting for both mean latency and tail risk (p99/p99.9).

The Optimization Problem

HFT matching engines operate under extreme constraints. In Latency Gym, three parameters control the engine's latency profile:

Decision Variables

Parameter	Range	Meaning	Trade-off
Batch Size	1–64	Orders matched per polling cycle	Larger = lower latency but higher variance
Polling Rate	1–10	Polling divisor x: the engine checks for new orders once every x cycles (1/x checks per cycle)	Smaller x (more frequent polling) = lower latency, higher CPU usage
Pre-alloc Pool	1–5	Memory pre-allocation levels	Higher = faster allocation, more memory
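The table fixes an inclusive range for each knob, so any agent action has to land inside those bounds before it reaches the engine. The sketch below shows one way to do that by clamping a raw three-element action (the usage example later passes raw values such as `[3, 4, 1]`). It is an assumption for illustration only: `EngineConfig` and `decode_action` are hypothetical names, and the environment's real action encoding may differ.

```python
from dataclasses import dataclass

# Inclusive parameter ranges from the Decision Variables table.
BATCH_RANGE = (1, 64)
POLL_RANGE = (1, 10)
PREALLOC_RANGE = (1, 5)


def _clamp(value, bounds: tuple) -> int:
    """Clamp `value` into the inclusive (lo, hi) range."""
    lo, hi = bounds
    return max(lo, min(hi, int(value)))


@dataclass(frozen=True)
class EngineConfig:
    """Hypothetical container for one engine configuration."""
    batch_size: int      # orders matched per polling cycle
    poll_divisor: int    # engine polls once every `poll_divisor` cycles
    prealloc_level: int  # memory pre-allocation level

    @property
    def polls_per_cycle(self) -> float:
        # A divisor of x means 1/x checks per cycle.
        return 1.0 / self.poll_divisor


def decode_action(action) -> EngineConfig:
    """Map a raw 3-element action onto valid parameter values by clamping."""
    batch, poll, prealloc = action
    return EngineConfig(
        batch_size=_clamp(batch, BATCH_RANGE),
        poll_divisor=_clamp(poll, POLL_RANGE),
        prealloc_level=_clamp(prealloc, PREALLOC_RANGE),
    )


cfg = decode_action([3, 4, 1])
print(cfg, cfg.polls_per_cycle)
```

Clamping keeps out-of-range actions valid instead of raising, which is convenient for continuous-output policies; a discrete action space with an index offset would be an equally reasonable design.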

Observation Space: System State

\text{Observation} = [\text{queue\_depth}, \text{mean\_latency\_ns}, \text{variance}, \text{drops}]

queue_depth: current number of unmatched orders (0–4096)
mean_latency_ns: average order latency in nanoseconds (0–1e9)
variance: latency variance over a sliding 1000-order window, in ns² (0–1e18)
drops: cumulative buffer overflows (0–1e9)
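The `mean_latency_ns` and `variance` entries are statistics over a sliding window of recent orders. A minimal Python sketch of how such a window can be updated in O(1) per order using running sums, and how the four-element observation could be assembled from it. The actual `LatencyStatsWindow` is C++ and may work differently; `SlidingLatencyStats` and `observation` are hypothetical names.

```python
from collections import deque


class SlidingLatencyStats:
    """Mean and population variance over the last `window` latencies (ns).

    Each update is O(1): when the window is full, the oldest sample is
    subtracted from the running sums before the new one is added.
    """

    def __init__(self, window: int = 1000):
        self.window = window
        self.samples = deque()
        self.total = 0.0
        self.total_sq = 0.0

    def add(self, latency_ns: float) -> None:
        if len(self.samples) == self.window:
            old = self.samples.popleft()
            self.total -= old
            self.total_sq -= old * old
        self.samples.append(latency_ns)
        self.total += latency_ns
        self.total_sq += latency_ns * latency_ns

    def mean(self) -> float:
        return self.total / len(self.samples) if self.samples else 0.0

    def variance(self) -> float:
        n = len(self.samples)
        if n == 0:
            return 0.0
        m = self.total / n
        # Clamp tiny negative values caused by floating-point cancellation.
        return max(0.0, self.total_sq / n - m * m)


def observation(queue_depth: int, stats: SlidingLatencyStats, drops: int) -> list:
    """Assemble [queue_depth, mean_latency_ns, variance, drops]."""
    return [float(queue_depth), stats.mean(), stats.variance(), float(drops)]
```

The sum-of-squares form is the simplest O(1) update, but it can lose precision when latencies are large relative to their spread; a Welford-style update is more robust if that matters.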

Reward Function: Tail Latency Penalty

The core innovation: penalize latency variance (a proxy for tail latency) and buffer drops, not just the mean.

R = -(\alpha \cdot \text{mean\_latency} + \beta \cdot \text{variance} + \gamma \cdot \text{drops})

Why variance matters: two systems with identical mean latency can differ drastically if one has p99 = 150 µs and the other p99 = 5 ms. A mean-only reward would score them the same; the β · variance term penalizes the second.
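To make this concrete, the sketch below evaluates the reward formula for two hypothetical latency traces that share a 100 µs mean but differ in spread. The weights α, β, γ are placeholders chosen for readability, not the environment's actual coefficients, and `reward` is a hypothetical helper.

```python
from statistics import mean, pvariance


def reward(latencies_ns, drops, alpha=1e-3, beta=1e-9, gamma=1.0):
    """R = -(alpha * mean_latency + beta * variance + gamma * drops).

    The default weights are illustrative placeholders only.
    """
    return -(alpha * mean(latencies_ns)
             + beta * pvariance(latencies_ns)
             + gamma * drops)


# Two hypothetical traces, both with a mean of 100,000 ns (100 µs).
steady = [100_000] * 100
spiky = [50_000] * 99 + [5_050_000]  # one ~5 ms outlier

for name, trace in [("steady", steady), ("spiky", spiky)]:
    print(name, mean(trace), pvariance(trace), reward(trace, drops=0))
```

Both traces contribute the same mean term (100 with these weights), but the spiky trace's variance of 2.475e11 ns² adds a further 247.5 to the penalty, so it scores about -347.5 versus -100 for the steady one.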

System Architecture

C++ High-Performance Simulator

┌────────────────────────────────────────────────────────────┐
│            Latency Gym Simulator (C++20)                   │
├────────────────────────────────────────────────────────────┤
│                                                            │
│ TimeCounter         Nanosecond-precision timestamps        │
│ Order (48 bytes)    Lightweight order struct               │
│ OrderRingBuffer     Fixed-capacity, zero-copy              │
│ LatencyStatsWindow  Rolling O(1) percentile tracking       │
│ LatencySimulator    Discrete-event loop                    │
│                                                            │
│ Compiled with:      -O3 -march=native                      │
│ Per-step cost:      ~100 µs on modern CPU                  │
│ Memory allocation:  Zero in hot loop                       │
│                                                            │
└────────────────────────────────────────────────────────────┘
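The fixed-capacity `OrderRingBuffer` and the `drops` observation fit together: a buffer that never grows must count an order that arrives while it is full as a drop rather than allocating more space. A Python sketch of that behavior, for illustration only. The `RingBuffer` name is hypothetical, and the default capacity of 4096 is an assumption borrowed from the `queue_depth` range, not a documented value.

```python
class RingBuffer:
    """Fixed-capacity FIFO that counts overflows instead of growing."""

    def __init__(self, capacity: int = 4096):
        # The slot list is allocated once and never resized.
        self._slots = [None] * capacity
        self._capacity = capacity
        self._head = 0   # index of the oldest element
        self._size = 0
        self.drops = 0   # cumulative overflows

    def push(self, order) -> bool:
        """Enqueue `order`; return False and count a drop if full."""
        if self._size == self._capacity:
            self.drops += 1
            return False
        tail = (self._head + self._size) % self._capacity
        self._slots[tail] = order
        self._size += 1
        return True

    def pop(self):
        """Dequeue the oldest order, or return None if empty."""
        if self._size == 0:
            return None
        order = self._slots[self._head]
        self._slots[self._head] = None
        self._head = (self._head + 1) % self._capacity
        self._size -= 1
        return order

    def __len__(self) -> int:
        return self._size
```

In this sketch `len(buffer)` plays the role of `queue_depth` and `buffer.drops` the role of the cumulative `drops` entry.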

Gymnasium Integration

Clean Python interface via Pybind11:

import gymnasium as gym

env = gym.make("hft-latency-v0")
obs, info = env.reset()

action = env.action_space.sample()
obs, reward, terminated, truncated, info = env.step(action)

Performance Characteristics

Metric	Typical value
Single step	~100 µs
1,000 steps	~100 ms
1M steps	~100 s
Base memory	~2 MB
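The multi-step figures are linear extrapolations of the per-step cost: 1,000 × 100 µs = 100 ms and 1,000,000 × 100 µs = 100 s. To check the per-step figure on your own hardware, a small timing helper like the one below can wrap any `step` callable. `time_steps` is a hypothetical helper; timings taken from Python include binding and wrapper overhead, so they may differ from the raw C++ cost.

```python
import time
from statistics import median


def time_steps(step, n_steps: int = 1000, warmup: int = 100) -> dict:
    """Time `step()` calls with a monotonic nanosecond clock."""
    for _ in range(warmup):  # let caches and branch predictors settle
        step()
    durations = []
    for _ in range(n_steps):
        t0 = time.perf_counter_ns()
        step()
        durations.append(time.perf_counter_ns() - t0)
    durations.sort()
    # Approximate p99: the sample at the 99th-percentile index.
    p99 = durations[min(len(durations) - 1, int(0.99 * len(durations)))]
    return {
        "median_us": median(durations) / 1_000,
        "p99_us": p99 / 1_000,
        "total_ms": sum(durations) / 1_000_000,
    }
```

For example, `time_steps(lambda: env.step(action))` with a fixed `action` and an environment created as in the usage examples. This sketch does not reset the environment if an episode ends during timing.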

Installation & Setup

Requirements

Python 3.8+ — Any recent version
CMake 3.15+ — For building C++ extension
C++20 compiler — gcc-9+, clang-10+, or MSVC 2019+

Quick Install

git clone https://github.com/prakulhiremath/latency-gym.git
cd latency-gym

pip install -e .

Verify Installation

import gymnasium as gym

env = gym.make("hft-latency-v0")
obs, info = env.reset(seed=42)

print("Observation shape:", obs.shape)
print("Action space:", env.action_space)

Usage Examples

Basic Environment Interaction

import gymnasium as gym
import numpy as np

env = gym.make("hft-latency-v0")
obs, info = env.reset(seed=42)

action = np.array([3, 4, 1])  # batch_size=3, poll_divisor=4, prealloc=1
obs, reward, terminated, truncated, info = env.step(action)

print(f"Reward: {reward:.4f}")
print(f"Queue depth: {obs[0]:.1f}")
print(f"Mean latency (ns): {obs[1]:.0f}")

Random Agent Baseline

import gymnasium as gym

env = gym.make("hft-latency-v0")
obs, info = env.reset()

total_reward = 0.0
for step in range(1000):
    action = env.action_space.sample()  # uniform random action
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward

    if terminated or truncated:
        break

env.close()
print(f"Episode return: {total_reward:.2f}")
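A single episode is a noisy baseline. A minimal sketch that averages returns over several seeds, assuming only the standard Gymnasium `reset(seed=...)` and five-value `step()` signatures. `evaluate` is a hypothetical helper, and `policy` is any callable taking `(env, obs)` and returning an action.

```python
from statistics import mean, pstdev


def evaluate(make_env, policy, seeds, max_steps: int = 1000):
    """Return (mean, std) of undiscounted episode returns over `seeds`."""
    returns = []
    for seed in seeds:
        env = make_env()
        obs, info = env.reset(seed=seed)
        total = 0.0
        for _ in range(max_steps):
            action = policy(env, obs)
            obs, reward, terminated, truncated, info = env.step(action)
            total += float(reward)
            if terminated or truncated:
                break
        env.close()
        returns.append(total)
    return mean(returns), pstdev(returns)
```

For the random baseline: `evaluate(lambda: gym.make("hft-latency-v0"), lambda env, obs: env.action_space.sample(), seeds=range(5))`. Note that `reset(seed=...)` does not seed the action space's own RNG; call `env.action_space.seed(...)` as well if the random actions must be reproducible.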

RL Training with Stable-Baselines3

import gymnasium as gym
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv, VecNormalize

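# Wrap the environment in a vectorized env. Observation values span many
# orders of magnitude (variance up to 1e18), so normalize obs and rewards.
env = DummyVecEnv([lambda: gym.make("hft-latency-v0")])
env = VecNormalize(env, norm_obs=True, norm_reward=True)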

model = PPO("MlpPolicy", env, learning_rate=1e-4, verbose=1)
model.learn(total_timesteps=100_000)

Testing & Development

Run Test Suite

pip install -e ".[dev]"
pytest tests/test_env.py -v

What's Tested

Environment initialization, reset, step
Action/observation space compliance
Reward computation and bounds
Memory safety over 1000+ steps
Numerical stability (no NaN/Inf)
Gymnasium integration
Random agent baseline

50+ deterministic tests, all passing.
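The numerical-stability item in the list above amounts to asserting that every observation and reward stays finite over a long rollout. A sketch of such a check as a plain helper, assuming a Gymnasium-style environment with an `action_space.sample()` method. `assert_finite_rollout` is hypothetical and is not taken from `tests/test_env.py`.

```python
import math


def assert_finite_rollout(env, n_steps: int = 1000, seed: int = 0) -> int:
    """Step `env` with random actions and fail on any NaN/Inf value.

    Returns the number of steps taken.
    """
    obs, info = env.reset(seed=seed)
    for step in range(n_steps):
        obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
        values = [float(x) for x in obs] + [float(reward)]
        bad = [v for v in values if not math.isfinite(v)]
        assert not bad, f"non-finite value(s) {bad} at step {step}"
        if terminated or truncated:
            obs, info = env.reset()
    return n_steps
```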

Project Structure

latency-gym/
├── CMakeLists.txt
├── pyproject.toml
├── README.md
├── assets/
│   └── latency_gym_training.gif
├── include/
│   └── latency_gym/
│       └── engine.hpp
├── src/
│   └── bindings.cpp
├── latency_gym/
│   ├── __init__.py
│   └── envs/
│       ├── __init__.py
│       └── hft_env.py
└── tests/
    ├── __init__.py
    └── test_env.py

Citation

@software{latency_gym_2026,
  title={Latency Gym: High-Performance HFT Matching Engine Latency Optimizer},
  author={Prakul S. Hiremath},
  year={2026},
  url={https://github.com/prakulhiremath/latency-gym}
}

License: MIT


"Built with precision for high-frequency trading simulation."