Prediction of Czech GDP using mixed-frequency machine learning models

Kotlan, Ivan

Predikce českého HDP pomocí strojového učení se smíšenou frekvencí

Bachelor's thesis (DEFENDED)

Permanent link

http://hdl.handle.net/20.500.11956/184964

Identifiers

SIS: 249443

Thesis opponent

Kukačka, Jiří

Faculty / unit

Fakulta sociálních věd (Faculty of Social Sciences)

Field of study

Economics and Finance

Department / institute / clinic

Institut ekonomických studií (Institute of Economic Studies)

Date of defense

11 September 2023

Publisher

Univerzita Karlova, Fakulta sociálních věd (Charles University, Faculty of Social Sciences)

Language

English

Grade

Excellent

Keywords (Czech)

Predikce HDP (GDP prediction), strojové učení (machine learning)

Keywords (English)

GDP nowcasting, machine learning

The aim of this thesis is, first, to provide a more accurate forecast of the Czech Republic's GDP growth than the official estimates of the Czech Statistical Office and the Czech National Bank. Second, it aims to extend the literature that examines time-series prediction using machine learning with data of different frequencies. Although the models used (Ridge and Random Forest) failed to outperform the estimates of the official institutions, the results of this thesis contribute to the still little-explored area of applying machine learning to data of arbitrary frequencies. Since no machine-learning model can work with data of different frequencies, the thesis shows how to transform variables into a form suitable for any model. Furthermore, the effect of using different types of datasets is examined. The datasets differed in the time of prediction, either the end of the current quarter (nowcast) or 40 days after the reference quarter (backcast), and in the type of dataset transformation, using standardized or non-standardized data. Finally, the influence of so-called high-frequency variables (on a weekly basis) is examined on the best model (Ridge). While the type of dataset did not play a significant role for Random Forest, in the case of the Ridge model a different dataset strongly affected the estimated values. The main difference was between the untransformed and the transformed dataset, where using...

Abstract (English)

The goal of this study is, first, to provide predictions of Czech GDP growth superior to the official estimates of the Czech Statistical Office and the proxy estimation of the Czech National Bank, and second, to expand the literature on machine-learning predictions that utilize data with various sampling frequencies. Although the thesis did not succeed in the first goal, as all models, namely Ridge and Random Forest, failed to beat the predictions of the official institutions, it contributes to the still scarce literature on mixed-frequency machine-learning prediction. Since no machine-learning model accounts for data with various frequencies, the thesis shows how to transform variables so that any machine-learning model can utilize them. Furthermore, different dataset modifications are explored, such as the prediction time, either the end of the reference quarter (nowcast) or 40 days after the reference quarter (backcast), and standardized versus non-standardized datasets. Finally, for the superior Ridge model, the effect of so-called high-frequency variables (sampled every week) is explored. While Random Forest showed little effect from using different versions of the dataset, in the case of the Ridge model the type of dataset had a significant effect. While the non-standardized Ridge produces better overall...
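
The abstract does not spell out how the variables are transformed, so the following is only a minimal sketch of one common approach, frequency alignment, in which each monthly observation within a quarter becomes its own column of the quarterly feature row. It is an assumption for illustration, not necessarily the method used in the thesis. The function names and the indicator values are hypothetical.

```python
from collections import defaultdict


def quarter_of(year, month):
    """Map a calendar month to its (year, quarter) label."""
    return (year, (month - 1) // 3 + 1)


def align_monthly_to_quarterly(monthly):
    """Turn {(year, month): value} into {(year, quarter): [m1, m2, m3]}.

    Months that have not been published yet stay as None, which mimics
    the incomplete last row of a nowcasting dataset.
    """
    rows = defaultdict(lambda: [None, None, None])
    for (year, month), value in monthly.items():
        position = (month - 1) % 3  # 0, 1 or 2 within the quarter
        rows[quarter_of(year, month)][position] = value
    return dict(rows)


# Hypothetical monthly indicator (made-up numbers, not thesis data).
monthly_indicator = {
    (2023, 1): 1.0, (2023, 2): 1.5, (2023, 3): 0.5,
    (2023, 4): 2.0, (2023, 5): 2.5,  # June not yet released
}

features = align_monthly_to_quarterly(monthly_indicator)
for quarter, row in sorted(features.items()):
    print(quarter, row)
```

Each quarterly row then has a fixed number of columns regardless of the original sampling frequency, so a standard regressor such as Ridge or Random Forest could in principle be fitted on it once missing entries are handled.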
