Dueling Double Deep Q learning in intraday trading

"Dueling Double Deep Q learning in intraday trading", Math 625 Portfolio Theory and Application Final Project

Keywords: Reinforcement Learning; Intraday trading; US stocks

Purpose

Train an agent using a dueling DQN network and a double Q-value function in the custom environment ‘intraday’. The trained agent uses its ‘skill’ to submit a trade action at each of the 390 one-minute steps in a trading day. The ‘intraday’ environment measures the agent's performance and returns it to the agent at the next step.

Approach

  • Dueling and double DQN model
    • The double Q-value function removes the overestimation bias: the online network selects the next action and the target network evaluates it. The redefined loss function: \[ L_i(\theta_i) = \mathbb{E}_{s_{t},a_{t},r_{t},s_{t+1} \sim D_i} \left[ \left( \mathbb{E}_{s_{t+1} \sim \mathcal{E}} \left[ r_t + \gamma Q(\phi_{t+1}, \underset{a'}{\operatorname{arg\,max}}\, Q(\phi_{t+1}, a'|\theta_i) \,|\, \theta^{-}) \right] - Q(\phi_t,a_t | \theta_i) \right)^2 \right] \]
    • The dueling network splits the Q network into two streams: an action-dependent advantage function A and an action-independent value function V: \[ Q(\phi, a | \theta, \alpha, \beta) = A(\phi, a | \theta, \alpha) + V(\phi|\theta, \beta)\]
    • Adjustment: subtract the average advantage over all actions, where \( |\mathcal{A}| \) is the number of actions: \[ Q(\phi, a | \theta, \alpha, \beta) = \left( A(\phi, a | \theta, \alpha) - \frac{1}{|\mathcal{A}|} \sum_{a'} A(\phi, a'| \theta, \alpha) \right) + V(\phi|\theta, \beta) \]
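
The two formulas above can be illustrated with a minimal sketch in plain Python. It is not the project's implementation: the function names are hypothetical, the Q-values are passed in as plain lists rather than produced by a network, and the handling of the final step (`done`) is an assumption that the text does not specify.

```python
def dueling_q(value, advantages):
    """Combine V(s) and A(s, a) into Q(s, a), subtracting the mean advantage."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + (adv - mean_adv) for adv in advantages]


def double_dqn_target(reward, gamma, q_online_next, q_target_next, done=False):
    """Double DQN target for one transition.

    q_online_next: Q(s_{t+1}, a' | theta_i) for every action a' (online network)
    q_target_next: Q(s_{t+1}, a' | theta^-) for every action a' (target network)
    """
    if done:
        # Assumption: no bootstrapping past the last step of the day.
        return reward
    # The online network selects the action ...
    best_action = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    # ... and the target network evaluates it.
    return reward + gamma * q_target_next[best_action]


def squared_td_error(target, q_online_current):
    """Per-sample loss: (target - Q(s_t, a_t | theta_i))^2."""
    return (target - q_online_current) ** 2


# Usage with made-up numbers for a three-action and a two-action case.
q_values = dueling_q(1.0, [1.0, 2.0, 3.0])
target = double_dqn_target(1.0, 0.5, [0.1, 0.9], [2.0, 4.0])
loss = squared_td_error(target, 2.0)
```

Note that the action is chosen from the online Q-values but scored with the target Q-values; using the same list for both would reduce the target to that of standard DQN.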

  • Environment ‘Intraday’ trading rules
    • The agent can execute at most 2 day-trades within the 390 steps. A day-trade is a buy order followed by a sell order.
    • Each order costs a 0.5% transaction fee.
    • A buy order is always executed at the high price of the current time-step bar. Similarly, a sell order is always executed at the low price of the current bar.
    • The order sequence for 2 day-trades must be buy, sell, buy, sell. A buy order can be submitted only when no earlier buy is still open. Likewise, a sell order cannot be repeated, since the agent holds no stock after selling.
    • There is no overnight holding: any remaining stock is cleared (sold) at the end of the day by the ‘intraday’ exchange.
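
The trading rules above can be sketched as a small state machine. This is an illustrative sketch, not the actual ‘intraday’ environment: the class name, the action encoding, the one-share position size, and the way the fee is applied to the traded price are all assumptions made for the example.

```python
from dataclasses import dataclass

FEE_RATE = 0.005            # 0.5% transaction fee per order
MAX_DAY_TRADES = 2          # at most two buy-then-sell round trips per day
HOLD, BUY, SELL = 0, 1, 2   # hypothetical action encoding


@dataclass
class IntradayRules:
    holding: bool = False
    entry_cost: float = 0.0
    day_trades: int = 0
    pnl: float = 0.0        # realised profit for one share, net of fees

    def step(self, action, high, low, last_bar=False):
        """Apply one action to one bar; return True if the order was executed."""
        executed = False
        if action == BUY and not self.holding and self.day_trades < MAX_DAY_TRADES:
            # Buys fill at the bar's high, plus the fee.
            self.entry_cost = high * (1 + FEE_RATE)
            self.holding = True
            executed = True
        elif action == SELL and self.holding:
            self._sell(low)
            executed = True
        if last_bar and self.holding:
            # No overnight holding: the exchange clears the position.
            self._sell(low)
        return executed

    def _sell(self, low):
        # Sells fill at the bar's low, minus the fee.
        self.pnl += low * (1 - FEE_RATE) - self.entry_cost
        self.holding = False
        self.day_trades += 1


# Usage: a repeated buy is rejected, and a sell closes the first day-trade.
rules = IntradayRules()
rules.step(BUY, high=100.0, low=99.0)
rules.step(BUY, high=101.0, low=100.0)   # ignored: a buy is already open
rules.step(SELL, high=111.0, low=110.0)
```

Filling buys at the high and sells at the low is a pessimistic execution model, so any profit in this sketch has to survive both the worst in-bar price and the fee on each side.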

Data

Symbols: 'AMP', 'INTC', 'AMZN', 'ANTM', 'T', 'AXP', 'ACN', 'ABC', 'UNH', 'WMT', 'APH', 'AMG', 'AEP', 'APD', 'ALLE', 'ARE', 'MA', 'PG', 'AIG', 'ALL', 'AEE', 'AWK', 'GE', 'MMM', 'PFE', 'WFC', 'ABT', 'ABBV', 'AES', 'AAP', 'AMT', 'CSCO', 'AME', 'AGN', 'APC', 'A', 'AAPL', 'VZ', 'ADS', 'BAC', 'CVX', 'AFL', 'JPM'

Fields: YYYY.MM.DD HH.MM.SS | OPEN | HIGH | LOW | CLOSE | VOLUME | NUMBER OF TRADES | WEIGHTED AV. PRICE

Available data from firstratedata:

Bar | Symbols | Max date range | Max time points | Time range
Hour | 486 | 2004.01-2019.03 | 7 hourly bars/day | 9:30 - 15:00
Min | 47 | 2004.01-2019.03 | 390 minute bars/day | 9:30 - 15:59

Slides