Optimal choice of scenario tree using Reinforcement learning

Vondráček, Jakub

Czech title: Optimální volba scénářového stromu za použití zpětnovazebního učení (Optimal choice of a scenario tree using reinforcement learning)

diploma thesis (DEFENDED)

Files

Record of the defense proceedings (348.2 KB)

Permanent link

http://hdl.handle.net/20.500.11956/182211

Identifiers

Study Information System: 234642

Consultant

Kozmík, Karel

Referee

Branda, Martin

Faculty / Institute

Faculty of Mathematics and Physics

Discipline

Probability, Mathematical Statistics and Econometrics

Department

Department of Probability and Mathematical Statistics

Date of defense

15 June 2023

Publisher

Univerzita Karlova, Matematicko-fyzikální fakulta (Charles University, Faculty of Mathematics and Physics)

Language

English

Grade

Excellent

Keywords (English)

Stochastic optimization|Multistage problem|Reinforcement learning

Abstract (Czech, translated)

This thesis deals with multistage stochastic programs and examines how the value of the objective function depends on the structure of the chosen scenario tree. The scenario trees are generated by the moment matching method, a mean-CVaR model is formulated, and an agent is trained on historical financial data using deep reinforcement learning to choose the best possible scenario tree structure for the mean-CVaR model. For this purpose, we implemented our own environment for training the reinforcement learning agent. We further proposed adding a penalty to the agent's reward in order to penalize trees with an overly complex structure. We then compared the reinforcement learning agent with an agent that chooses the tree structure at random and showed that the reinforcement learning agent achieves better results. Finally, we analyzed the structure of the trees chosen by the reinforcement learning agent.
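To make the mean-CVaR criterion concrete, the sketch below evaluates it on a fixed set of discrete terminal scenarios. It is a minimal illustration, not the formulation used in the thesis: it assumes scenario returns with probabilities, a confidence level `alpha`, and a risk-aversion weight `lam`, and it only evaluates the criterion for given outcomes rather than optimizing decisions over a multistage tree. The names `cvar` and `mean_cvar` are hypothetical.

```python
from typing import Sequence


def cvar(losses: Sequence[float], probs: Sequence[float], alpha: float = 0.95) -> float:
    """CVaR_alpha of a discrete loss distribution (Rockafellar-Uryasev formula).

    CVaR_alpha(L) = min over eta of  eta + E[(L - eta)^+] / (1 - alpha).
    The objective is convex and piecewise linear in eta with kinks at the
    scenario losses, so minimising over those values gives the exact value.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")

    def ru_objective(eta: float) -> float:
        # Expected excess of the loss over the threshold eta.
        tail = sum(p * max(loss - eta, 0.0) for loss, p in zip(losses, probs))
        return eta + tail / (1.0 - alpha)

    return min(ru_objective(eta) for eta in losses)


def mean_cvar(returns: Sequence[float], probs: Sequence[float],
              alpha: float = 0.95, lam: float = 1.0) -> float:
    """Hypothetical mean-CVaR criterion: expected return minus lam * CVaR of the loss (-return)."""
    mean = sum(p * r for r, p in zip(returns, probs))
    return mean - lam * cvar([-r for r in returns], probs, alpha)
```

For four equiprobable losses 1, 2, 3, 4 and `alpha = 0.5`, `cvar` returns 3.5, the average of the worst half of the outcomes.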

Abstract (English)

This thesis deals with multistage stochastic programs and explores how the obtained objective value depends on the chosen structure of the scenario tree. In particular, the scenario trees are built using the moment matching method, a multistage mean-CVaR model is formulated, and a reinforcement learning agent is trained on a set of historical financial data to choose the best scenario tree structure for the mean-CVaR model. For this purpose, we implemented a custom reinforcement learning environment. Further, we propose including a penalty term in the agent's reward to avoid scenario trees that are too complex. The reinforcement learning agent is then evaluated against an agent that chooses the scenario tree structure at random, and it outperforms the random agent. Finally, the structure of the scenario trees chosen by the reinforcement learning agent is analyzed.
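The abstract does not state the form of the penalty term, so the following is only a sketch under stated assumptions. It assumes a scenario tree is described by one branching factor per stage, that the penalty is linear in the number of tree nodes with a hypothetical weight `penalty`, and that a hypothetical callable `solve` stands in for generating the moment-matching tree and solving the mean-CVaR model. The random baseline draws each branching factor uniformly. None of these names come from the thesis.

```python
import random
from itertools import accumulate
from operator import mul
from typing import Callable, Sequence


def tree_size(branching: Sequence[int]) -> int:
    """Number of nodes in a scenario tree with the given branching factor per stage (root included)."""
    # accumulate with mul yields the node count per stage, e.g. [5, 3, 2] -> [5, 15, 30].
    return 1 + sum(accumulate(branching, mul))


def penalized_reward(objective: float, branching: Sequence[int], penalty: float = 1e-3) -> float:
    """Hypothetical reward: objective value minus a linear penalty on tree size."""
    return objective - penalty * tree_size(branching)


def random_agent(n_stages: int, max_branching: int, rng: random.Random) -> list[int]:
    """Baseline agent that draws each stage's branching factor uniformly from 1..max_branching."""
    return [rng.randint(1, max_branching) for _ in range(n_stages)]


def step(branching: Sequence[int],
         solve: Callable[[Sequence[int]], float],
         penalty: float = 1e-3) -> float:
    """One-shot episode: solve the model for the chosen tree via `solve`, return the penalized reward."""
    return penalized_reward(solve(branching), branching, penalty)
```

For example, the structure `[5, 3, 2]` has 1 + 5 + 15 + 30 = 51 nodes, so with `penalty = 1e-3` its reward is the objective value minus 0.051. A linear node-count penalty is just one simple choice; penalizing the number of scenarios (leaves) or solver time would be equally plausible alternatives.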
