Reinforcement Learning in Agent-Based Market Simulation: Unveiling Realistic Stylized Facts and Behavior






Published 4/1/2024 by Zhiyuan Yao, Zheng Li, Matthew Thomas, Ionut Florescu



Investors and regulators can greatly benefit from a realistic market simulator that enables them to anticipate the consequences of their decisions in real markets. However, traditional rule-based market simulators often fall short in accurately capturing the dynamic behavior of market participants, particularly in response to external market impact events or changes in the behavior of other participants. In this study, we explore an agent-based simulation framework employing reinforcement learning (RL) agents. We present the implementation details of these RL agents and demonstrate that the simulated market exhibits realistic stylized facts observed in real-world markets. Furthermore, we investigate the behavior of RL agents when confronted with external market impacts, such as a flash crash. Our findings shed light on the effectiveness and adaptability of RL-based agents within the simulation, offering insights into their response to significant market events.

The paper discusses the importance of understanding how financial markets react to various events and proposes a simulation framework using reinforcement learning (RL) agents to model market behavior. Traditional agent-based market simulators use rule-based agents, which fail to capture realistic market dynamics due to their rigid, hard-coded nature. In contrast, RL agents can learn and adapt to changing market conditions, mirroring the behavior of real-world investors and enhancing the realism of the simulation.

The paper highlights successful applications of machine learning techniques to financial problems and presents RL as a suitable approach for simulating financial markets. While previous works have applied RL agents to simplified investment problems or dealer markets, the proposed framework aims to simulate a complete continuous double auction stock market populated by complex RL-based agents.

The study employs a small group of representative RL agents and compares the simulation results with both a system of rule-based zero-intelligence agents and real market data. The RL-agent system produces results comparable to real data and demonstrates the ability to adapt to changing market conditions.

Important Concepts

This section covers two key concepts:

Reinforcement Learning (RL) Agents:

  • RL agents solve problems modeled as Markov Decision Processes (MDPs)
  • An MDP consists of states, actions, rewards, transition probabilities, and a discount factor
  • The agent's goal is to find an optimal policy to maximize expected cumulative discounted rewards
  • The Proximal Policy Optimization (PPO) method is used to optimize the RL agents
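
To make the objective concrete, here is a minimal Python sketch of two standard pieces: the discounted return that an MDP agent maximizes, and the clipped surrogate objective from the original PPO formulation. The summary does not give the authors' PPO hyperparameters, so the clipping value `eps=0.2` is an assumed, commonly used default, and the function names are illustrative.

```python
import math


def discounted_returns(rewards, gamma):
    """Return G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def ppo_clip_objective(new_logp, old_logp, advantages, eps=0.2):
    """Mean PPO clipped surrogate objective (to be maximized).

    new_logp / old_logp are log-probabilities of the taken actions under the
    current and the data-collecting policy; eps=0.2 is an assumed default.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)                 # pi_new(a|s) / pi_old(a|s)
        clipped = max(min(ratio, 1.0 + eps), 1.0 - eps)   # keep ratio in [1-eps, 1+eps]
        total += min(ratio * adv, clipped * adv)          # pessimistic (lower) bound
    return total / len(advantages)
```

Clipping removes the incentive to push the probability ratio far from 1 in a single update, which is what keeps PPO's policy steps conservative.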

Limit Order Books (LOBs) in Continuous Double Auction (CDA) Markets:

  • Traditional financial exchanges use the CDA market model
  • The CDA maintains two LOBs, one for buy orders and one for sell orders
  • Traders place limit orders specifying the price at which they are willing to buy or sell an asset (the highest price they will pay or the lowest price they will accept)
  • Market orders execute immediately against available limit orders in the LOBs
  • Limit orders remain in the LOBs until they are matched against an incoming order, such as a market order, or cancelled
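
The matching rules above can be illustrated with a toy order book. This is a hypothetical sketch, not the paper's matching engine: it assumes price-time priority, lets only market orders act as aggressors (marketable limit orders and cancellations are omitted), and drops any unfilled remainder of a market order when the opposite side runs out.

```python
from collections import deque


class LimitOrderBook:
    """Toy CDA book: resting limit orders with price-time priority."""

    def __init__(self):
        self.bids = {}  # price -> deque of [order_id, remaining_qty], oldest first
        self.asks = {}

    def add_limit(self, order_id, side, price, qty):
        """Rest a limit order at its price level (assumes it does not cross the book)."""
        book = self.bids if side == "buy" else self.asks
        book.setdefault(price, deque()).append([order_id, qty])

    def best_bid(self):
        return max(self.bids) if self.bids else None

    def best_ask(self):
        return min(self.asks) if self.asks else None

    def market_order(self, side, qty):
        """Match a market order against the opposite side; return (price, qty) fills."""
        book = self.asks if side == "buy" else self.bids
        fills = []
        while qty > 0 and book:
            # Best opposite price: lowest ask for a buy, highest bid for a sell.
            price = min(book) if side == "buy" else max(book)
            queue = book[price]
            resting = queue[0]                 # time priority within the level
            traded = min(qty, resting[1])
            fills.append((price, traded))
            qty -= traded
            resting[1] -= traded
            if resting[1] == 0:
                queue.popleft()
            if not queue:
                del book[price]                # level exhausted
        return fills                           # any leftover qty is dropped
```

For example, with asks of 5 and 3 shares at 101 and 10 shares at 102, a market buy for 7 fills `[(101, 5), (101, 2)]` and leaves 1 share resting at 101.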

System and Agents

The system contains a matching engine that organizes the limit order books and settles trades, as well as a brokerage center that tracks each agent's account, including its buying power and assets. All agents submit market and limit orders to the matching engine, which runs a continuous double auction market model.

There are two types of agents: liquidity-taking (LT) agents and market-making (MM) agents. Each agent is formulated as a reinforcement learning agent, observing the system independently, selecting actions (orders), receiving feedback (rewards), and optimizing its strategy.

For MM agents, the observation space includes mid-prices, order book levels, liquidity provision percentage, inventory, and buying power. Their action space determines the sizes and prices of the limit orders placed on both sides of the order book. The reward function rewards increases in profit and loss (PnL), penalizes PnL fluctuations arising from inventory and price oscillations, and penalizes the difference between actual and target liquidity provision.
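
The MM reward can be pictured as a sum of the three terms described above. The summary gives no formula, so the following is a hypothetical sketch: it measures inventory-driven PnL as `inventory * delta_mid`, uses absolute-value penalties, and picks arbitrary weights `inv_weight` and `liq_weight`.

```python
def mm_reward(delta_pnl, inventory, delta_mid, liq_share, target_liq_share,
              inv_weight=0.5, liq_weight=1.0):
    """Hypothetical market-maker reward for one step.

    delta_pnl:        change in the agent's PnL over the step
    inventory:        signed position held during the step
    delta_mid:        change in mid-price over the step
    liq_share:        fraction of book liquidity the agent provided
    target_liq_share: the agent's target fraction (a per-agent parameter)
    """
    # PnL swing caused purely by holding inventory while the price moved.
    inventory_pnl = inventory * delta_mid
    # Penalize that swing and any deviation from the liquidity target.
    return (delta_pnl
            - inv_weight * abs(inventory_pnl)
            - liq_weight * abs(liq_share - target_liq_share))
```

With these weights, `mm_reward(10.0, 5, 0.2, 0.3, 0.25)` is about 9.45: the inventory term costs 0.5 and the liquidity shortfall costs 0.05.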

For LT agents, the observation space is similar but omits the liquidity provision percentage. Their action space allows either placing a market order of a fixed size or skipping. The reward function rewards increases in PnL while penalizing inventory risk and the difference between actual and target order frequencies.
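
Similarly, the LT agent's discrete action and reward can be sketched as follows. The three-way choice between buying, selling, and skipping is an assumption (the summary only says a fixed-size market order or a skip), and the quadratic inventory penalty, the weights, and all names are hypothetical.

```python
from enum import Enum


class LTAction(Enum):
    SKIP = 0
    BUY = 1
    SELL = 2


def lt_order(action, fixed_size):
    """Translate a discrete action into a signed market-order size (0 = no order)."""
    if action is LTAction.BUY:
        return fixed_size
    if action is LTAction.SELL:
        return -fixed_size
    return 0


def lt_reward(delta_pnl, inventory, order_rate, target_rate,
              inv_weight=0.01, freq_weight=1.0):
    """Hypothetical LT reward: PnL gain minus inventory risk and frequency mismatch."""
    inventory_risk = inventory ** 2              # assumed quadratic risk penalty
    freq_gap = abs(order_rate - target_rate)     # actual vs. target order frequency
    return delta_pnl - inv_weight * inventory_risk - freq_weight * freq_gap
```

The frequency term is what keeps each LT agent trading at roughly its configured pace rather than trading only when it sees a profitable signal.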

The simulation details include initializing agents with random parameters, running them in parallel threads, collecting experiences independently, and training using proximal policy optimization. The simulation is implemented within a real-time trading platform, introducing realistic network latency.
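
Below is a rough sketch of the parallel, independent experience collection described above, using Python threads. Everything about the environment is a placeholder: observations and rewards are random numbers, the parameter `target_rate` and its range are invented for illustration, and the PPO update each agent would run on its own buffer is not shown.

```python
import random
import threading


def run_agent(agent_id, steps, buffers, seed):
    """Collect one agent's experience independently of the other agents."""
    rng = random.Random(seed)                 # private RNG, no shared state
    target_rate = rng.uniform(0.1, 0.9)       # hypothetical random agent parameter
    buffer = []
    for _ in range(steps):
        obs = rng.random()                    # placeholder market observation
        action = rng.choice(("buy", "sell", "skip"))
        reward = -abs(obs - target_rate)      # placeholder reward
        buffer.append((obs, action, reward))
    buffers[agent_id] = buffer                # each thread writes only its own slot


def collect_parallel(n_agents, steps, base_seed=0):
    """Run every agent in its own thread and return the per-agent buffers."""
    buffers = [None] * n_agents
    threads = [
        threading.Thread(target=run_agent, args=(i, steps, buffers, base_seed + i))
        for i in range(n_agents)
    ]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return buffers  # buffers[i] would feed agent i's own PPO update
```

In CPython, threads mainly help when agents spend their time waiting on I/O, such as network round trips to a trading platform; CPU-heavy training is usually moved to separate processes.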

Experiment Design

The paper outlines a study to evaluate whether reinforcement learning (RL) agents can simulate realistic financial markets. Two key aspects are analyzed: statistical characteristics and market responsiveness. The study also examines the role of continual learning.

Statistical Characteristics: The study examines whether the RL agent simulations exhibit well-known stylized facts observed in real financial markets, such as heavy-tailed return distributions, absence of autocorrelation in returns, and volatility clustering. The agents' inventory evolution and profit sources are also examined.

Market Responsiveness: The study introduces external impacts such as large sell orders (flash sales) and periods of directional trading to observe the resulting price impact and changes in market makers' strategies, such as widening spreads. This tests whether the RL agents respond as participants in real markets do.

Continual Learning: Three agent groups are compared: one trained continually during the simulation, one without further training, and an untrained group. This evaluates whether continual learning improves adaptability to changing markets.

The RL agent simulations are compared to a baseline Zero-Intelligence agent model and real data to assess how realistic the generated markets are across different statistical properties and responsiveness scenarios.

Experiment Results

This section analyzes the statistical characteristics of simulated asset prices generated by reinforcement learning (RL) agents and compares them to real market data. Key findings include:

Heavy tails and kurtosis decay: The simulated prices from RL agents exhibit heavy-tailed return distributions similar to real data, with kurtosis decreasing as the sampling frequency decreases (that is, as returns are aggregated over longer intervals), matching empirical observations.
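
The kurtosis decay can be checked with a short computation: estimate the excess kurtosis of tick-level log returns, then of returns summed over non-overlapping windows of `k` ticks (a coarser sampling frequency). The sketch below uses synthetic i.i.d. heavy-tailed returns from a two-regime Gaussian mixture as a stand-in for simulator output; it is not the paper's data or exact estimator, and the helper names are illustrative.

```python
import random
import statistics


def excess_kurtosis(xs):
    """Sample excess kurtosis m4 / m2**2 - 3 (zero for a Gaussian)."""
    m = statistics.fmean(xs)
    m2 = statistics.fmean([(x - m) ** 2 for x in xs])
    m4 = statistics.fmean([(x - m) ** 4 for x in xs])
    return m4 / m2 ** 2 - 3.0


def aggregate(returns, k):
    """Sum log returns over non-overlapping windows of k ticks (coarser sampling)."""
    return [sum(returns[i:i + k]) for i in range(0, len(returns) - k + 1, k)]


def mixture_returns(n, seed=0, p_jump=0.05, jump_scale=5.0):
    """Synthetic heavy-tailed returns: mostly N(0, 1), occasionally N(0, jump_scale**2)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, jump_scale if rng.random() < p_jump else 1.0)
            for _ in range(n)]


if __name__ == "__main__":
    r = mixture_returns(100_000)
    for k in (1, 10, 100):
        print(k, round(excess_kurtosis(aggregate(r, k)), 2))
```

For i.i.d. returns the excess kurtosis of a k-period sum shrinks roughly in proportion to 1/k, so the printed values fall toward zero as `k` grows; real and simulated returns are not i.i.d., but the same measurement applies.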

Absence of autocorrelations: The simulated returns show a strong negative autocorrelation at the first lag that decays to zero at larger lags, consistent with the bid-ask bounce seen in real data.

Slow decay of autocorrelation for absolute returns: The autocorrelation of absolute returns decays slowly with time lags in both real data and simulations, indicating long-range time dependency in return magnitudes.

Volatility clustering: The simulations capture the volatility clustering effect observed in real markets through decaying autocorrelations of squared returns.
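
The three autocorrelation-based checks above all use the same sample autocorrelation function, applied to raw returns, absolute returns, and squared returns respectively. A minimal standard-library sketch (the function names are illustrative):

```python
import statistics


def autocorr(xs, lag):
    """Sample autocorrelation of xs at a positive lag (standard ACF estimator)."""
    m = statistics.fmean(xs)
    denom = sum((x - m) ** 2 for x in xs)
    if denom == 0:
        return 0.0  # constant series: autocorrelation undefined, report 0
    num = sum((xs[t] - m) * (xs[t + lag] - m) for t in range(len(xs) - lag))
    return num / denom


def acf(xs, max_lag):
    """Autocorrelations at lags 1..max_lag."""
    return [autocorr(xs, lag) for lag in range(1, max_lag + 1)]


def stylized_acfs(returns, max_lag=20):
    """ACFs of raw, absolute, and squared returns."""
    return {
        "raw": acf(returns, max_lag),                          # should vanish quickly
        "abs": acf([abs(x) for x in returns], max_lag),        # should decay slowly
        "squared": acf([x * x for x in returns], max_lag),     # volatility clustering
    }
```

A perfectly alternating series such as `[1, -1, 1, -1, ...]` of length 100 gives a lag-1 value of -0.99, an extreme version of the bid-ask bounce.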

Market making behavior: The RL market making agents control inventory well and profit from providing liquidity rather than holding positions, demonstrating realistic market making strategies.

The section also analyzes market responsiveness to external events like flash sales and informed trading through experiments. The continuously trained RL agents adapt order placement strategies according to market conditions like order book imbalance and price trends, exhibiting behavior consistent with financial literature.
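
Order book imbalance, one of the conditions the agents react to, is often measured as the normalized difference between resting bid and ask volume near the top of the book. The summary does not say how the paper computes it, so the following is a common definition written as a hypothetical helper; the depth `levels=5` is an assumption.

```python
def order_book_imbalance(bid_sizes, ask_sizes, levels=5):
    """(V_bid - V_ask) / (V_bid + V_ask) over the top `levels` price levels.

    bid_sizes / ask_sizes list resting volume per level, best price first.
    Returns a value in [-1, 1]; 0.0 for an empty book.
    """
    vb = sum(bid_sizes[:levels])
    va = sum(ask_sizes[:levels])
    if vb + va == 0:
        return 0.0
    return (vb - va) / (vb + va)
```

Values near +1 mean resting volume is concentrated on the bid side, values near -1 on the ask side.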

Conclusion

The paper modifies the formulation of reinforcement learning (RL) agents and implements a highly realistic simulation platform. The simulation results are compared against real data and against a market simulated with zero-intelligence traders. The results show realistic market characteristics and responsiveness to external factors. The authors find that continually trained RL agents produce the most realistic market simulation and can adapt to changing market conditions. However, calibrating an agent-based system remains challenging, because the system is non-stationary and runs in real time. The authors mention previous work on calibrating RL-based multi-agent systems but note that applying those algorithms to their system is difficult, and they plan to address this and other issues in future work.


The appendix presents additional simulation results and configuration details from the study.

7.1 Additional Simulation Results:

  • Figure 1 shows a quantile-quantile plot comparing the return distributions of simulations against real data for all agent groups.
  • Figures 2 and 3 display autocorrelation plots for price returns and absolute returns, respectively, comparing the testing and untrained agent groups.
  • Figure 4 presents volatility clustering analysis for the testing and untrained groups.
  • Table 1 provides the kurtosis and inventory risk values for the continual training, testing, and untrained agent groups.

7.2 Simulation Configuration:

  • Table 2 lists the agent configurations (parameter values) used for the training, testing, and untrained groups.
  • Table 3 describes special setups for the flash sale and informed liquidity-taking (LT) trader simulations.
  • Table 4 shows how market characteristics such as spread, depth, volume, and profit and loss (PnL) change when the agents' hyperparameters are manipulated.
  • Table 5 outlines the specific hyperparameter setups used to examine PnL and market share under different conditions.

