Umělé neuronové sítě a zpětnovazebné učení

Havránek, Vojtěch

Artificial neural networks and reinforcement learning

dc.contributor.advisor	Mrázová, Iveta
dc.creator	Havránek, Vojtěch
dc.date.accessioned	2017-04-10T10:42:00Z
dc.date.available	2017-04-10T10:42:00Z
dc.date.issued	2008
dc.identifier.uri	http://hdl.handle.net/20.500.11956/14855
dc.description.abstract	When solving complex machine learning tasks, it is often more practical to let the agent find an adequate solution by itself, e.g. by reinforcement learning, than to try to specify the solution in detail. The only information reinforcement learning requires is a reward that tells the agent how desirable its actions are. Our experiments suggest that reinforcement learning with neural networks that learn online can achieve good results. The functionality of such a neural network may be further extended by allowing it to model the environment and/or by adding recurrent connections. In this thesis, we show that for a given network predicting the reward, finding the agent action that maximizes this reward is, in general, an NP-complete problem. We describe three neural network models, one of them an original modification of Sutton's TD(λ) algorithm that extends its domain to non-Markovian environments. All three models were thoroughly tested in our predator-prey simulator. The most powerful of them, the modified TD(λ), was then applied to controlling a real mobile robot. We also discuss the principles of rewarding agents, the biological plausibility of the algorithms, the importance of exploration capabilities, and the general limits of reinforcement learning....	en_US
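The TD(λ) algorithm named in the abstract can be illustrated with a minimal sketch. It shows only the standard textbook form with accumulating eligibility traces on a tabular (one-hot) value function, not the thesis's non-Markovian modification or its neural-network implementation; the random-walk task, the function name `td_lambda_random_walk`, and all parameter values are assumptions chosen for illustration.

```python
import random


def td_lambda_random_walk(n_states=5, episodes=20000, alpha=0.002,
                          gamma=1.0, lam=0.8, seed=0):
    """Tabular TD(lambda) with accumulating eligibility traces (illustrative sketch).

    Hypothetical task: states 0..n_states-1 form a random walk. Stepping left
    of state 0 ends the episode with reward 0; stepping right of the last
    state ends it with reward 1. Every other step gives reward 0.
    """
    rng = random.Random(seed)
    w = [0.0] * n_states  # one weight per state, i.e. V(s) = w[s]
    for _ in range(episodes):
        e = [0.0] * n_states  # eligibility traces are reset for each episode
        s = n_states // 2     # every episode starts in the middle state
        while True:
            s_next = s + rng.choice((-1, 1))
            if s_next < 0:
                r, v_next, done = 0.0, 0.0, True       # left terminal
            elif s_next >= n_states:
                r, v_next, done = 1.0, 0.0, True       # right terminal
            else:
                r, v_next, done = 0.0, w[s_next], False
            delta = r + gamma * v_next - w[s]          # TD error
            for i in range(n_states):
                e[i] *= gamma * lam                    # decay every trace
            e[s] += 1.0                                # accumulate trace of current state
            for i in range(n_states):
                w[i] += alpha * delta * e[i]           # credit recently visited states
            if done:
                break
            s = s_next
    return w


print([round(v, 2) for v in td_lambda_random_walk()])
```

For this five-state walk the true state values are 1/6, 2/6, ..., 5/6, so with a small step size the learned weights should settle near those values. Setting `lam=0` reduces the update to one-step TD(0); values closer to 1 propagate each TD error further back along the visited states.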
dc.description.abstract	When solving complex machine learning tasks, it is often difficult to specify the exact procedure that leads to a correct solution. In practice, it may therefore be more advantageous to use, e.g., reinforcement learning, which requires only a single piece of information about the task: a reward proportional to the appropriateness of the agent's actions. It turns out that reinforcement learning algorithms working with neural networks can achieve good results in many such tasks. The results can be better if the neural network is able to model the environment or is extended with recurrent connections. In the thesis, we show that for a given network that correctly predicts the reward, finding the optimal action is, in the general case, an NP-complete problem. We describe three neural network models. One of them is our adaptation of Sutton's TD(λ) algorithm for non-Markovian environments. We thoroughly tested all three models in a predator-prey simulator we developed. The most successful of the tested models, the modified TD(λ), was subsequently applied to controlling a real mobile robot. The thesis also addresses suitable ways of rewarding the agent, the biological justification of the neural network models considered, the importance of the algorithm's exploratory capabilities, and the limits of applicability of reinforcement learning. The thesis includes a neural network library written in C++, in which the described...	cs_CZ
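The NP-completeness result concerns choosing, for a fixed trained network, the action that maximizes the predicted reward. As a hypothetical illustration (the two-layer sigmoid network, the binary encoding of actions, and the names `predicted_reward` and `best_action` are assumptions, not taken from the thesis), the obvious exact method enumerates all 2^n action vectors, so its cost doubles with every additional action bit. If the thesis's result applies to a formulation like this one, no polynomial-time exact method is expected in general unless P = NP.

```python
import itertools
import math


def predicted_reward(action, hidden_weights, hidden_biases, out_weights, out_bias):
    """Hypothetical reward predictor: sigmoid hidden layer, linear output unit."""
    hidden = [
        1.0 / (1.0 + math.exp(-(sum(w * a for w, a in zip(ws, action)) + b)))
        for ws, b in zip(hidden_weights, hidden_biases)
    ]
    return sum(v * h for v, h in zip(out_weights, hidden)) + out_bias


def best_action(n_bits, net):
    """Exhaustively score all 2**n_bits binary action vectors.

    The loop runs 2**n_bits times, which is why this exact search
    becomes impractical as the number of action bits grows.
    """
    best, best_value = None, -math.inf
    for action in itertools.product((0, 1), repeat=n_bits):
        value = predicted_reward(action, *net)
        if value > best_value:  # keep the first action with the highest prediction
            best, best_value = action, value
    return best, best_value


# net = (hidden_weights, hidden_biases, out_weights, out_bias)
net = ([[5.0, -5.0, 5.0]], [-2.5], [1.0], 0.0)
print(best_action(3, net))
```

With the single hidden unit above (weights `[5, -5, 5]`, bias `-2.5`), the search returns the action `(1, 0, 1)`, the vector that maximizes the unit's pre-activation and therefore the predicted reward.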
dc.language	Czech	cs_CZ
dc.language.iso	cs_CZ
dc.publisher	Univerzita Karlova, Matematicko-fyzikální fakulta	cs_CZ
dc.title	Umělé neuronové sítě a zpětnovazebné učení	cs_CZ
dc.type	master's thesis	cs_CZ
dcterms.created	2008
dcterms.dateAccepted	2008-05-26
dc.description.department	Department of Software Engineering	en_US
dc.description.department	Katedra softwarového inženýrství	cs_CZ
dc.description.faculty	Matematicko-fyzikální fakulta	cs_CZ
dc.description.faculty	Faculty of Mathematics and Physics	en_US
dc.identifier.repId	46950
dc.title.translated	Artificial neural networks and reinforcement learning	en_US
dc.contributor.referee	Božovský, Petr
dc.identifier.aleph	001099497
thesis.degree.name	Mgr.
thesis.degree.level	master's	cs_CZ
thesis.degree.discipline	Theoretical computer science	en_US
thesis.degree.discipline	Teoretická informatika	cs_CZ
thesis.degree.program	Informatika	cs_CZ
thesis.degree.program	Informatics	en_US
uk.thesis.type	master's thesis	cs_CZ
uk.taxonomy.organization-cs	Matematicko-fyzikální fakulta::Katedra softwarového inženýrství	cs_CZ
uk.taxonomy.organization-en	Faculty of Mathematics and Physics::Department of Software Engineering	en_US
uk.faculty-name.cs	Matematicko-fyzikální fakulta	cs_CZ
uk.faculty-name.en	Faculty of Mathematics and Physics	en_US
uk.faculty-abbr.cs	MFF	cs_CZ
uk.degree-discipline.cs	Teoretická informatika	cs_CZ
uk.degree-discipline.en	Theoretical computer science	en_US
uk.degree-program.cs	Informatika	cs_CZ
uk.degree-program.en	Informatics	en_US
thesis.grade.cs	Výborně	cs_CZ
thesis.grade.en	Excellent	en_US
uk.file-availability	V
uk.publication.place	Praha	cs_CZ
uk.grantor	Univerzita Karlova, Matematicko-fyzikální fakulta, Katedra softwarového inženýrství	cs_CZ
dc.identifier.lisID	990010994970106986