Srovnání sekvenční a strukturních metod strojového učení pro predikci protein-ligand vazebných reziduí

Divín, Prokop

Comparison of sequence and structure-based machine learning approaches for protein-ligand binding residues

dc.contributor.advisor	Hoksza, David
dc.creator	Divín, Prokop
dc.date.accessioned	2023-11-06T15:42:48Z
dc.date.available	2023-11-06T15:42:48Z
dc.date.issued	2023
dc.identifier.uri	http://hdl.handle.net/20.500.11956/184372
dc.description.abstract	Predicting protein-ligand binding sites is an important task because it helps us understand protein-ligand interactions, which is essential in drug design and in the development of certain areas of biology. Machine learning tools for binding site prediction already exist, but the methods developed so far predict binding sites only from the 3D structure of the protein, and that structure is unknown for most proteins. In this work, we therefore focus on prediction from the residue sequence of the protein alone. We compare possible approaches to this problem. First, we compare representing residues by their chemical and physical properties with a representation that uses methods from natural language processing. Next, we compare the selected machine learning methods. Finally, we compare our results with P2Rank, a state-of-the-art method that uses the 3D structure to predict protein-ligand binding sites.	en_US
dc.description.abstract	Předpověď protein-ligand vazebných míst je důležitým úkolem, dovolujícím nám pochopit interakce mezi proteinem a ligandem, jejichž pochopení je nezbytné při návrhu léčiv a rozvoji některých oblastí biologie. Ačkoliv již byly vytvořeny nástroje strojového učení pro predikci vazebných míst, doposud se vytvořené metody zajímaly pouze o predikci ze 3D struktury proteinu, která ale není pro většinu proteinů známá. Proto se v naší práci zajímáme o předpověď ze znalosti pouhé sekvence reziduí představující protein. Srovnáváme zde možné přístupy k řešení tohoto problému. Srovnáváme reprezentaci reziduí pomocí jejich chemicko-fyzikálních vlastností s reprezentací používající metody ze zpracování přirozeného jazyka. Dále porovnáváme zvolené metody strojového učení. Na závěr porovnáme naše výsledky s metodou P2Rank jakožto s nejmodernější metodou používající k předpovědi protein-ligand vazebných míst 3D strukturu.	cs_CZ
dc.language	Čeština	cs_CZ
dc.language.iso	cs_CZ
dc.publisher	Univerzita Karlova, Matematicko-fyzikální fakulta	cs_CZ
dc.subject	bioinformatika|protein|strojové učení|molekulární interakce	cs_CZ
dc.subject	bioinformatics|protein|machine learning|molecular interactions	en_US
dc.title	Srovnání sekvenční a strukturních metod strojového učení pro predikci protein-ligand vazebných reziduí	cs_CZ
dc.type	bakalářská práce	cs_CZ
dcterms.created	2023
dcterms.dateAccepted	2023-09-07
dc.description.department	Katedra softwarového inženýrství	cs_CZ
dc.description.department	Department of Software Engineering	en_US
dc.description.faculty	Matematicko-fyzikální fakulta	cs_CZ
dc.description.faculty	Faculty of Mathematics and Physics	en_US
dc.identifier.repId	251977
dc.title.translated	Comparison of sequence and structure-based machine learning approaches for protein-ligand binding residues	en_US
dc.contributor.referee	Škoda, Petr
thesis.degree.name	Bc.
thesis.degree.level	bakalářské	cs_CZ
thesis.degree.discipline	Informatika se specializací Umělá inteligence	cs_CZ
thesis.degree.discipline	Computer Science with specialisation in Artificial Intelligence	en_US
thesis.degree.program	Informatika	cs_CZ
thesis.degree.program	Computer Science	en_US
uk.thesis.type	bakalářská práce	cs_CZ
uk.taxonomy.organization-cs	Matematicko-fyzikální fakulta::Katedra softwarového inženýrství	cs_CZ
uk.taxonomy.organization-en	Faculty of Mathematics and Physics::Department of Software Engineering	en_US
uk.faculty-name.cs	Matematicko-fyzikální fakulta	cs_CZ
uk.faculty-name.en	Faculty of Mathematics and Physics	en_US
uk.faculty-abbr.cs	MFF	cs_CZ
uk.degree-discipline.cs	Informatika se specializací Umělá inteligence	cs_CZ
uk.degree-discipline.en	Computer Science with specialisation in Artificial Intelligence	en_US
uk.degree-program.cs	Informatika	cs_CZ
uk.degree-program.en	Computer Science	en_US
thesis.grade.cs	Velmi dobře	cs_CZ
thesis.grade.en	Very good	en_US
uk.file-availability	V
uk.grantor	Univerzita Karlova, Matematicko-fyzikální fakulta, Katedra softwarového inženýrství	cs_CZ
thesis.grade.code	2
uk.publication-place	Praha	cs_CZ
uk.thesis.defenceStatus	O
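The abstract contrasts two ways of representing each residue: by its chemical and physical properties, or by a representation borrowed from natural language methods. This record does not list which properties the thesis actually used. The following is therefore only a minimal Python sketch of the first idea under an assumed feature set. Each residue is described by its Kyte-Doolittle hydropathy and by the mean hydropathy of its sequence neighbours. The function name `residue_features` and the `window` parameter are hypothetical, and so is the choice of hydropathy as the property.

```python
# Kyte-Doolittle hydropathy scale: one example of a physico-chemical residue
# property. ASSUMPTION: chosen for illustration only, not taken from the thesis.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def residue_features(sequence: str, window: int = 2) -> list[list[float]]:
    """Return one feature vector per residue (hypothetical helper).

    Each vector holds the residue's own hydropathy and the mean hydropathy
    of the residues at most `window` positions away (including itself).
    Letters missing from the scale (e.g. 'X') get a neutral value of 0.0.
    """
    values = [KYTE_DOOLITTLE.get(aa, 0.0) for aa in sequence.upper()]
    features = []
    for i, value in enumerate(values):
        # Clip the neighbourhood at the sequence ends.
        start = max(0, i - window)
        end = min(len(values), i + window + 1)
        context = values[start:end]
        features.append([value, sum(context) / len(context)])
    return features


if __name__ == "__main__":
    for aa, vector in zip("MKTAY", residue_features("MKTAY")):
        print(aa, vector)
```

A per-residue classifier could then be trained on vectors like these to label each residue as binding or non-binding. The sequence-only setting means no 3D coordinates enter the features, which is the distinction the abstract draws against P2Rank.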