Harnessing FinRL's Environment Layer for Advanced Market Modeling

8 Jun 2024


(1) Xiao-Yang Liu, Hongyang Yang, Columbia University (xl2427,hy2500@columbia.edu);

(2) Jiechao Gao, University of Virginia (jg5ycn@virginia.edu);

(3) Christina Dan Wang (Corresponding Author), New York University Shanghai (christina.wang@nyu.edu).

Abstract and 1 Introduction

2 Related Works and 2.1 Deep Reinforcement Learning Algorithms

2.2 Deep Reinforcement Learning Libraries and 2.3 Deep Reinforcement Learning in Finance

3 The Proposed FinRL Framework and 3.1 Overview of FinRL Framework

3.2 Application Layer

3.3 Agent Layer

3.4 Environment Layer

3.5 Training-Testing-Trading Pipeline

4 Hands-on Tutorials and Benchmark Performance and 4.1 Backtesting Module

4.2 Baseline Strategies and Trading Metrics

4.3 Hands-on Tutorials

4.4 Use Case I: Stock Trading

4.5 Use Case II: Portfolio Allocation and 4.6 Use Case III: Cryptocurrencies Trading

5 Ecosystem of FinRL and Conclusions, and References

3.4 Environment Layer

Environment design is crucial in DRL because the agent learns by interacting with the environment through trial and error. An environment that faithfully simulates real-world markets helps the agent learn a better strategy. Given the stochastic and interactive nature of financial markets, a financial task is modeled as a Markov Decision Process (MDP), whose state transition is shown in Fig. 1.

The environment layer in FinRL is responsible for observing current market information and translating that information into states of the MDP problem. The state variables can be categorized into the state of the agent and the state of the market. For example, in the stock trading use case, the state of the market includes the open-high-low-close prices and volume (OHLCV) and technical indicators; the state of the agent includes the account balance and the number of shares held of each stock.
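As an illustration of this state layout, the following minimal sketch concatenates the agent state (balance and share holdings) with the market state (OHLCV and indicators) into one flat vector. The names `MarketSnapshot` and `build_state` are hypothetical and are not part of the FinRL API; the ordering of the entries is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class MarketSnapshot:
    """One OHLCV bar plus technical indicators for a single stock (hypothetical type)."""
    open: float
    high: float
    low: float
    close: float
    volume: float
    indicators: List[float]


def build_state(balance: float,
                shares: Dict[str, int],
                market: Dict[str, MarketSnapshot]) -> List[float]:
    """Flatten agent state and market state into one observation vector.

    Assumed layout: [balance, shares per ticker..., then per ticker: OHLCV + indicators].
    Tickers are sorted so the layout is stable across time steps.
    """
    tickers = sorted(market)
    # State of the agent: account balance and holdings.
    state = [balance]
    state += [float(shares.get(t, 0)) for t in tickers]
    # State of the market: prices, volume, and indicators for each ticker.
    for t in tickers:
        bar = market[t]
        state += [bar.open, bar.high, bar.low, bar.close, bar.volume]
        state += bar.indicators
    return state
```

With two tickers and two indicators each, the vector has 1 + 2 + 2 × (5 + 2) = 17 entries.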

The RL training process involves observing a price change, taking an action, and calculating a reward. By interacting with the environment, the agent updates iteratively and eventually obtains a trading strategy that maximizes the expected return. We reconfigure real market data into gym-style training environments according to the principle of time-driven simulation. Inspired by OpenAI Gym [5], FinRL provides strategy builders with a collection of universal training environments for various trading tasks.
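The observe-act-reward loop can be sketched as a small gym-style environment that replays a historical price series one time step per call. This is a simplified illustration, not FinRL's actual environment: the class name `SingleStockEnv` is hypothetical, it trades one stock only, it ignores transaction costs, and it uses the change in portfolio value as the reward. It follows the classic Gym convention in which `step` returns `(state, reward, done, info)`.

```python
from typing import Dict, List, Tuple


class SingleStockEnv:
    """Hypothetical gym-style environment replaying one stock's prices bar by bar."""

    def __init__(self, prices: List[float], initial_balance: float = 1000.0):
        if len(prices) < 2:
            raise ValueError("need at least two prices to simulate one step")
        self.prices = prices
        self.initial_balance = initial_balance
        self.reset()

    def reset(self) -> List[float]:
        """Return to the first time step with the initial balance and no shares."""
        self.t = 0
        self.balance = self.initial_balance
        self.shares = 0
        return self._state()

    def _state(self) -> List[float]:
        # Agent state (balance, shares) followed by market state (current price).
        return [self.balance, float(self.shares), self.prices[self.t]]

    def _portfolio_value(self) -> float:
        return self.balance + self.shares * self.prices[self.t]

    def step(self, action: int) -> Tuple[List[float], float, bool, Dict]:
        """action > 0 buys that many shares, action < 0 sells, 0 holds."""
        price = self.prices[self.t]
        value_before = self._portfolio_value()
        if action > 0:
            # Cap the order at what the balance can pay for.
            qty = min(action, int(self.balance // price))
        else:
            # Cap the sale at the shares currently held (no short selling).
            qty = -min(-action, self.shares)
        self.balance -= qty * price
        self.shares += qty
        # Time-driven simulation: the clock advances one bar per step.
        self.t += 1
        reward = self._portfolio_value() - value_before
        done = self.t == len(self.prices) - 1
        return self._state(), reward, done, {}
```

An agent would call `reset()` once per episode and then call `step(action)` until `done` is true, using the returned reward to update its policy.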

3.4.1 Standard Datasets and Live Trading APIs. DRL in finance differs from chess, card games, and robotics [44, 52], which may have physical engines or simulators. Different financial tasks may require different market simulators. Building such training environments is time-consuming, so FinRL provides a set of representative ones and also supports user-imported data, freeing users from this tedious work.

NASDAQ-100 index constituents are 100 stocks that are characterized by high technology and high growth.

Dow Jones Industrial Average (DJIA) index is made up of 30 representative constituent stocks. DJIA is the most cited market indicator for examining overall market performance.

Standard & Poor’s 500 (S&P 500) index constituents consist of the 500 largest U.S. publicly traded companies.

Hang Seng Index (HSI) constituents are grouped into Finance, Utilities, Properties and Commerce & Industry [19]. HSI is the most widely quoted indicator of the Hong Kong stock market.

SSE 50 Index constituents [12] include the most representative companies (in 10 industries) of A shares listed on the Shanghai Stock Exchange (SSE) with considerable size and liquidity.

CSI 300 Index constituents [8] consist of the 300 largest and most liquid A-share stocks listed on the Shenzhen Stock Exchange and SSE. This index reflects the performance of the China A-share market.

Bitcoin (BTC) Price Index consists of the quote and trade data on the Bitcoin market, available at https://public.bitmex.com/.

Figure 4: The training-testing-trading pipeline.

3.4.2 User-Imported Data. Users may want to train agents on their own datasets. FinRL provides convenient support for users to import data, adjust the time granularity, and perform the training-testing-trading data split. We specify the format for different trading tasks, and users preprocess and format the data according to our instructions. Stock statistics and indicators can be calculated using the support we provide, which adds more features to the state space. Furthermore, episodic total return and Sharpe ratio can also assist performance evaluation.
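The data split and the two evaluation metrics mentioned above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not FinRL's own implementation: the function names are hypothetical, rows are `(ISO date, close)` tuples, the Sharpe ratio is annualized with an assumed 252 trading periods per year, and the risk-free rate defaults to zero.

```python
import math
from typing import List, Tuple

Row = Tuple[str, float]  # (ISO date "YYYY-MM-DD", close price); assumed format


def split_by_date(rows: List[Row], train_end: str,
                  test_end: str) -> Tuple[List[Row], List[Row], List[Row]]:
    """Split chronologically into training, testing, and trading periods.

    ISO-formatted dates compare correctly as strings, so no parsing is needed.
    """
    train = [r for r in rows if r[0] < train_end]
    test = [r for r in rows if train_end <= r[0] < test_end]
    trade = [r for r in rows if r[0] >= test_end]
    return train, test, trade


def total_return(values: List[float]) -> float:
    """Episodic total return from a series of portfolio values."""
    return values[-1] / values[0] - 1.0


def sharpe_ratio(values: List[float], periods_per_year: int = 252,
                 risk_free: float = 0.0) -> float:
    """Annualized Sharpe ratio of per-period returns (sample standard deviation)."""
    if len(values) < 3:
        raise ValueError("need at least three values to estimate volatility")
    returns = [values[i] / values[i - 1] - 1.0 for i in range(1, len(values))]
    mean = sum(returns) / len(returns)
    var = sum((r - mean) ** 2 for r in returns) / (len(returns) - 1)
    std = math.sqrt(var)
    if std == 0.0:
        return 0.0  # convention chosen here for a flat return series
    excess = mean - risk_free / periods_per_year
    return math.sqrt(periods_per_year) * excess / std
```

Splitting by date rather than at random keeps the three periods in chronological order, so the agent is never evaluated on data that precedes its training data.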

This paper is available on arXiv under the CC BY 4.0 DEED license.