HFT Optimization · Gymnasium Environment · Zenodo · Medium · C++20 · Python 3.8+

Latency Gym

High-Performance HFT Matching Engine Latency Optimizer

When microseconds cost millions

Production-grade Gymnasium environment for optimizing high-frequency trading (HFT) matching engine latencies through reinforcement learning. A C++20 simulator, bound via pybind11, runs the simulation with zero Python overhead.

About 100 µs per step. Zero dynamic allocation in the hot loop. Nanosecond-precision timestamps.

"In high-frequency trading, every microsecond of latency is quantifiable profit loss."

A trader's competitive edge depends on tuning three critical parameters: batch size, polling rate, and memory pre-allocation strategy. These are not static: optimal configurations change with market conditions, so manual tuning is impractical. Latency Gym lets RL agents discover good configurations automatically, accounting for both mean latency and tail risk (p99/p99.9).

The Optimization Problem

HFT matching engines operate under extreme constraints. Three parameters control the entire system's latency profile:

Decision Variables

| Parameter | Range | Meaning | Trade-off |
|---|---|---|---|
| Batch Size | 1–64 | Orders matched per polling cycle | Larger = lower latency but higher variance |
| Polling Rate | 1–10 | Divisor (1/x checks per cycle) | Faster = lower latency, higher CPU |
| Pre-alloc Pool | 1–5 | Memory pre-allocation levels | Higher = faster allocation, more memory |
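
The three decision variables and their ranges can be captured in a small validated container. The sketch below is illustrative only: `EngineConfig`, its field names, and the `_check` helper are hypothetical and not part of Latency Gym's API. Only the numeric ranges come from the Decision Variables section.

```python
from dataclasses import dataclass

# Ranges taken from the Decision Variables section; all names are hypothetical.
BATCH_SIZE_RANGE = (1, 64)
POLL_DIVISOR_RANGE = (1, 10)
PREALLOC_LEVEL_RANGE = (1, 5)


def _check(name: str, value: int, bounds: tuple) -> None:
    """Raise ValueError if value falls outside the inclusive bounds."""
    low, high = bounds
    if not low <= value <= high:
        raise ValueError(f"{name}={value} is outside [{low}, {high}]")


@dataclass(frozen=True)
class EngineConfig:
    batch_size: int      # orders matched per polling cycle
    poll_divisor: int    # polling rate divisor
    prealloc_level: int  # memory pre-allocation level

    def __post_init__(self) -> None:
        _check("batch_size", self.batch_size, BATCH_SIZE_RANGE)
        _check("poll_divisor", self.poll_divisor, POLL_DIVISOR_RANGE)
        _check("prealloc_level", self.prealloc_level, PREALLOC_LEVEL_RANGE)


config = EngineConfig(batch_size=3, poll_divisor=4, prealloc_level=1)
print(config)
```

Validating at construction time keeps out-of-range values from ever reaching the simulator, which is useful when configurations come from a hand-written search rather than from the environment's own action space.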

Observation Space: System State

$$\text{Observation} = [\text{queue\_depth}, \text{mean\_latency\_ns}, \text{variance}, \text{drops}]$$
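
For readability, the four-element observation vector can be unpacked into named fields. This is a sketch that assumes the elements arrive in the order shown in the formula; `Observation` and `decode_observation` are hypothetical helpers, not part of the package, and the sample values are made up.

```python
from typing import NamedTuple, Sequence


class Observation(NamedTuple):
    queue_depth: float
    mean_latency_ns: float
    variance: float
    drops: float


def decode_observation(obs: Sequence[float]) -> Observation:
    """Attach field names to a length-4 observation vector."""
    if len(obs) != 4:
        raise ValueError(f"expected 4 values, got {len(obs)}")
    return Observation(*(float(x) for x in obs))


# Made-up values purely for illustration.
state = decode_observation([12, 850.0, 1.5e4, 0])
print(state.mean_latency_ns)
```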

Reward Function: Tail Latency Penalty

The core innovation: the reward explicitly penalizes variance (a proxy for tail latency) and dropped orders, not just the mean.

$$R = -(\alpha \cdot \text{mean\_latency} + \beta \cdot \text{variance} + \gamma \cdot \text{drops})$$

Why variance matters: two systems with identical mean latencies differ drastically if one has p99 = 150 µs and the other p99 = 5 ms. The variance term in the reward function captures this difference.
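
To make the effect of the variance term concrete, here is a minimal sketch of the reward formula applied to two latency samples with the same mean. The weights `ALPHA`, `BETA`, and `GAMMA` are placeholder values chosen for illustration (the actual coefficients are not stated here), and the latency samples are invented.

```python
from statistics import fmean, pvariance

# Hypothetical weights; the environment's real alpha, beta, gamma may differ.
ALPHA, BETA, GAMMA = 1.0, 0.5, 10.0


def reward(mean_latency: float, variance: float, drops: float) -> float:
    """R = -(alpha * mean_latency + beta * variance + gamma * drops)."""
    return -(ALPHA * mean_latency + BETA * variance + GAMMA * drops)


# Two made-up latency samples (microseconds) with the same mean of 100.
steady = [100.0, 100.0, 100.0, 100.0]
spiky = [50.0, 50.0, 50.0, 250.0]

for name, sample in (("steady", steady), ("spiky", spiky)):
    r = reward(fmean(sample), pvariance(sample), drops=0)
    print(f"{name}: mean={fmean(sample):.1f} reward={r:.1f}")
```

A mean-only objective would score both samples identically. With a nonzero `BETA`, the spiky sample receives a strictly lower reward.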

System Architecture

C++ High-Performance Simulator

┌────────────────────────────────────────────────────────────┐
│            Latency Gym Simulator (C++20)                   │
├────────────────────────────────────────────────────────────┤
│                                                            │
│ TimeCounter         Nanosecond-precision timestamps        │
│ Order (48 bytes)    Lightweight order struct               │
│ OrderRingBuffer     Fixed-capacity, zero-copy              │
│ LatencyStatsWindow  Rolling O(1) percentile tracking       │
│ LatencySimulator    Discrete-event loop                    │
│                                                            │
│ Compiled with:      -O3 -march=native                      │
│ Per-step cost:      ~100 µs on modern CPU                  │
│ Memory allocation:  Zero in hot loop                       │
│                                                            │
└────────────────────────────────────────────────────────────┘
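
To illustrate what "fixed-capacity" and "zero allocation in the hot loop" mean in practice, here is a Python sketch of a preallocated ring buffer that counts a drop whenever a push arrives while it is full. It is not the C++ `OrderRingBuffer` itself: the class name `FixedRingBuffer`, its methods, and the drop-on-full policy are assumptions made for illustration.

```python
from typing import List, Optional


class FixedRingBuffer:
    """Fixed-capacity FIFO; storage is allocated once in __init__."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._slots: List[Optional[int]] = [None] * capacity
        self._head = 0  # index of the oldest element
        self._size = 0  # number of stored elements
        self.drops = 0  # pushes rejected because the buffer was full

    def push(self, order_id: int) -> bool:
        """Store order_id, or count a drop and return False when full."""
        if self._size == len(self._slots):
            self.drops += 1
            return False
        tail = (self._head + self._size) % len(self._slots)
        self._slots[tail] = order_id
        self._size += 1
        return True

    def pop(self) -> Optional[int]:
        """Remove and return the oldest element, or None when empty."""
        if self._size == 0:
            return None
        order_id = self._slots[self._head]
        self._head = (self._head + 1) % len(self._slots)
        self._size -= 1
        return order_id

    def __len__(self) -> int:
        return self._size


buf = FixedRingBuffer(capacity=2)
accepted = [buf.push(i) for i in (1, 2, 3)]  # third push overflows
print(accepted, buf.drops, buf.pop(), len(buf))
```

Because `push` and `pop` only move indices over storage created up front, neither grows the buffer. That is the property the architecture summary describes for the hot loop.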

Gymnasium Integration

Clean Python interface via pybind11:

import gymnasium as gym

env = gym.make("hft-latency-v0")
obs, info = env.reset()

action = env.action_space.sample()
obs, reward, terminated, truncated, info = env.step(action)

Performance Characteristics

| Metric | Approximate value |
|---|---|
| Single step | ~100 µs |
| 1,000 steps | ~100 ms |
| 1M steps | ~100 s |
| Base memory | ~2 MB |
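
These figures depend on hardware, so it is worth measuring on your own machine. The helper below is a generic sketch: `time_steps` is a hypothetical function that times any zero-argument callable, and the stand-in workload exists only so the example runs without the package installed.

```python
import time
from typing import Callable


def time_steps(step_fn: Callable[[], object], n_steps: int) -> float:
    """Return the mean wall-clock time per call, in microseconds."""
    if n_steps <= 0:
        raise ValueError("n_steps must be positive")
    start = time.perf_counter_ns()
    for _ in range(n_steps):
        step_fn()
    elapsed_ns = time.perf_counter_ns() - start
    return elapsed_ns / n_steps / 1_000


# Stand-in workload so the sketch runs on its own.
per_call_us = time_steps(lambda: sum(range(100)), n_steps=1_000)
print(f"{per_call_us:.2f} µs per call")

# With the environment available, one might instead time:
#   time_steps(lambda: env.step(env.action_space.sample()), n_steps=1_000)
```

Timing `env.step` this way also includes the cost of sampling an action and crossing the Python binding, and it ignores episode termination, so treat the result as an upper bound on the simulator's own per-step cost.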

Installation & Setup

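Requirements

Python 3.8+ and a C++20-capable compiler. The extension is built with CMake and bound via pybind11.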

Quick Install

git clone https://github.com/prakulhiremath/latency-gym.git
cd latency-gym

pip install -e .

Verify Installation

import gymnasium as gym

env = gym.make("hft-latency-v0")
obs, info = env.reset(seed=42)

print("Observation shape:", obs.shape)
print("Action space:", env.action_space)

Usage Examples

Basic Environment Interaction

import gymnasium as gym
import numpy as np

env = gym.make("hft-latency-v0")
obs, info = env.reset(seed=42)

action = np.array([3, 4, 1])  # batch_size=3, poll_divisor=4, prealloc=1
obs, reward, terminated, truncated, info = env.step(action)

print(f"Reward: {reward:.4f}")
print(f"Queue depth: {obs[0]:.1f}")
print(f"Mean latency (ns): {obs[1]:.0f}")

Random Agent Baseline

import gymnasium as gym

env = gym.make("hft-latency-v0")
obs, info = env.reset()

total_reward = 0.0
for step in range(1000):
    action = env.action_space.sample()  # uniformly random configuration
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward

    if terminated or truncated:
        break

env.close()
print(f"Episode return: {total_reward:.2f}")

RL Training with Stable-Baselines3

import gymnasium as gym
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv, VecNormalize

# Build the environment inside the vectorized wrapper; a separate gym.make call is not needed.
env = DummyVecEnv([lambda: gym.make("hft-latency-v0")])
# Normalize observations and rewards, which helps when raw values such as latencies in ns are large.
env = VecNormalize(env, norm_obs=True, norm_reward=True)

model = PPO("MlpPolicy", env, learning_rate=1e-4, verbose=1)
model.learn(total_timesteps=100_000)

Testing & Development

Run Test Suite

pip install -e ".[dev]"
pytest tests/test_env.py -v

What's Tested

50+ deterministic tests, all passing.

Project Structure

latency-gym/
├── CMakeLists.txt
├── pyproject.toml
├── README.md
├── assets/
│   └── latency_gym_training.gif
├── include/
│   └── latency_gym/
│       └── engine.hpp
├── src/
│   └── bindings.cpp
├── latency_gym/
│   ├── __init__.py
│   └── envs/
│       ├── __init__.py
│       └── hft_env.py
└── tests/
    ├── __init__.py
    └── test_env.py

Citation

@software{latency_gym_2026,
  title={Latency Gym: High-Performance HFT 
         Matching Engine Latency Optimizer},
  author={Prakul S. Hiremath},
  year={2026},
  url={https://github.com/prakulhiremath/latency-gym}
}

Latency Gym: High-Performance HFT Matching Engine Optimization

MIT License · Python 3.8+ · C++20 · Open Source

"Built with precision for high-frequency trading simulation."