Optimal Order Execution using Stochastic Control and Reinforcement Learning; Optimal orderexekvering med stokastisk styrteori och reinforcement learning

Robert Hu

Institution:	KTH Royal Institute of Technology
Department:
Year:	2016
Keywords:	Natural Sciences; Mathematics; Computational Mathematics; Master of Science - Applied and Computational Mathematics; Mathematical Statistics
Posted:	02/05/2017
Record ID:	2123847
Full text PDF:	http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-192211

Abstract

This thesis attempts to find the optimal order execution policy that maximizes the reward from trading financial instruments. The optimal policies are found using a Markov Decision Process that is built from a state space model and the Bellman equation. Since there is no explicit formula for the state space dynamics, simulations on historical data are used instead to estimate the state transition probabilities and the rewards associated with each state and control. The optimal policy is then generated from the Bellman equation and tested against naive policies on out-of-sample data. The thesis also attempts to model market impact and to test whether the Markov Decision Process remains viable under the imposed assumptions. Lastly, it attempts to estimate the value function using various techniques from Reinforcement Learning. It turns out that naive strategies are superior when market impact is not present and when market impact is modeled as a direct penalty on reward. The Markov Decision Process is superior when market impact is modeled as affecting the simulations, although some results suggest that the market impact model is not consistent across all types of instruments. Further, approximating the value function yields results inferior to the Markov Decision Process, but interestingly the method performs better if the estimated value function is trained before it is tested.
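The pipeline the abstract describes (estimate transition probabilities and rewards from simulated transitions, then generate a policy from the Bellman equation) can be illustrated with a minimal sketch. This is not the thesis's implementation: the integer encoding of states and controls, the finite horizon with zero terminal value, the function names `estimate_mdp` and `backward_induction`, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def estimate_mdp(transitions, n_states, n_actions):
    """Estimate P[a, s, s'] and R[s, a] from (s, a, reward, s_next) samples.

    Assumption: states and controls are already discretized to integer indices.
    """
    counts = np.zeros((n_actions, n_states, n_states))
    reward_sum = np.zeros((n_states, n_actions))
    for s, a, r, s_next in transitions:
        counts[a, s, s_next] += 1
        reward_sum[s, a] += r
    visits = counts.sum(axis=2)  # shape (n_actions, n_states)
    # Assumption: unvisited (state, control) pairs get zero probability and reward.
    P = np.divide(counts, visits[:, :, None],
                  out=np.zeros_like(counts), where=visits[:, :, None] > 0)
    R = np.divide(reward_sum, visits.T,
                  out=np.zeros_like(reward_sum), where=visits.T > 0)
    return P, R

def backward_induction(P, R, horizon):
    """Solve the finite-horizon Bellman equation by stepping backward in time."""
    n_actions, n_states, _ = P.shape
    V = np.zeros(n_states)  # assumed terminal value of zero
    policy = np.zeros((horizon, n_states), dtype=int)
    for t in reversed(range(horizon)):
        # Q[s, a] = R[s, a] + sum over s' of P[a, s, s'] * V[s']
        Q = R + (P @ V).T
        policy[t] = Q.argmax(axis=1)
        V = Q.max(axis=1)
    return V, policy

# Hypothetical toy data: 2 states, 2 controls, tuples of (s, a, reward, s_next).
samples = [(0, 0, 1.0, 1), (0, 0, 3.0, 1), (0, 1, 0.0, 0),
           (1, 0, 0.0, 1), (1, 1, 5.0, 1)]
P, R = estimate_mdp(samples, n_states=2, n_actions=2)
V, policy = backward_induction(P, R, horizon=2)
```

Here `policy[t, s]` is the control chosen in state `s` at step `t`, and `V[s]` is the expected total reward from state `s` at the first step. In an execution setting the state would typically encode quantities such as remaining time and remaining inventory, but that encoding is outside this sketch.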
