Umělé neuronové sítě a zpětnovazebné učení

Havránek, Vojtěch

Artificial neural networks and reinforcement learning

diploma thesis (DEFENDED)

View/Open

Record of the defense proceedings (284.5Kb)

Permanent link

http://hdl.handle.net/20.500.11956/14855

Identifiers

Study Information System: 46950

CU Catalogue: 990010994970106986

Referee

Božovský, Petr

Faculty / Institute

Faculty of Mathematics and Physics

Discipline

Theoretical computer science

Department

Department of Software Engineering

Date of defense

26. 5. 2008

Publisher

Univerzita Karlova, Matematicko-fyzikální fakulta

Language

Czech

Grade

Excellent

When solving complex machine learning tasks, it is often difficult to specify the exact procedure that leads to a correct solution. In practice, it may therefore be more advantageous to use, for example, reinforcement learning, which requires only a single piece of information about the task: a reward proportional to the suitability of the agent's actions. It turns out that reinforcement learning algorithms working with neural networks can achieve good results in many such tasks. The results can be better if the neural network is able to model the environment or is extended with recurrent connections. In this thesis we show that, for a given network that correctly predicts the reward, finding the optimal action is in general an NP-complete problem. We describe three neural network models. One of them is our adaptation of Sutton's TD(λ) algorithm to non-Markovian environments. We tested all three models thoroughly in a predator-prey simulator that we created. The most successful of the tested models, the modified TD(λ), was subsequently applied to the control of a real mobile robot. The thesis also addresses suitable reward schemes, the biological plausibility of the neural network models considered, the importance of the algorithm's exploratory capabilities, and the limits of applicability of reinforcement learning. The thesis includes a neural network library written in C++, in which the described...

Abstract (English)

When solving complex machine learning tasks, it is often more practical to let the agent find an adequate solution by itself, e.g. through reinforcement learning, than to specify a solution in detail. The only information reinforcement learning requires is a reward that tells the agent how desirable its actions are. Our experiments suggest that good results can be achieved by reinforcement learning with neural networks trained online. The capabilities of such a network can be extended further by allowing it to model the environment and/or by providing it with recurrent connections. In this thesis, we show that for a given network predicting the reward, it is NP-complete to find the agent action that maximizes this reward. We describe three neural network models, one of them being an original modification of Sutton's TD(λ) algorithm that extends its domain to non-Markovian environments. All three models were thoroughly tested with our predator-prey simulator. The most powerful of them, the modified TD(λ), was then applied to the control of a real mobile robot. We also discuss the principles of rewarding the agents, the biological plausibility of the algorithms, the importance of exploration capabilities, and the general bounds of reinforcement learning....
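
For readers unfamiliar with the TD(λ) algorithm named in the abstract, the following is a minimal sketch of the standard textbook form: tabular value prediction with accumulating eligibility traces. It is not the modified, neural-network-based variant for non-Markovian environments developed in the thesis. The five-state random walk task, the function name `td_lambda_random_walk`, and all parameter values are assumptions chosen purely for illustration.

```python
import random


def td_lambda_random_walk(episodes=20000, n_states=5, alpha=0.001,
                          gamma=1.0, lam=0.8, seed=0):
    """Textbook tabular TD(lambda) on a toy random walk (illustrative only).

    States 0..n_states-1 lie on a line. Stepping off the right end yields
    reward 1, stepping off the left end yields reward 0; both end the episode.
    """
    rng = random.Random(seed)
    values = [0.0] * n_states            # value estimate for each state
    for _ in range(episodes):
        traces = [0.0] * n_states        # eligibility traces, reset per episode
        state = n_states // 2            # every episode starts in the middle
        while True:
            next_state = state + rng.choice((-1, 1))
            done = next_state < 0 or next_state >= n_states
            reward = 1.0 if next_state >= n_states else 0.0
            next_value = 0.0 if done else values[next_state]
            # One-step temporal-difference error for the observed transition.
            td_error = reward + gamma * next_value - values[state]
            traces[state] += 1.0         # accumulating trace for the visited state
            for s in range(n_states):
                # Every recently visited state shares in the correction,
                # weighted by its decaying trace.
                values[s] += alpha * td_error * traces[s]
                traces[s] *= gamma * lam
            if done:
                break
            state = next_state
    return values


if __name__ == "__main__":
    for index, value in enumerate(td_lambda_random_walk()):
        print(f"state {index}: estimated value {value:.3f}")
```

In this toy task the true value of state i is (i + 1) / 6, so the estimates can be checked against a known answer. Setting `lam=0.0` reduces the update to one-step TD, while values closer to 1 spread each correction further back along the visited states.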
