Srovnání sekvenční a strukturních metod strojového učení pro predikci protein-ligand vazebných reziduí

Divín, Prokop

Comparison of sequence and structure-based machine learning approaches for protein-ligand binding residues

bachelor thesis (DEFENDED)

Permanent link

http://hdl.handle.net/20.500.11956/184372

Identifiers

Study Information System: 251977

Referee

Škoda, Petr

Faculty / Institute

Faculty of Mathematics and Physics

Discipline

Computer Science with specialisation in Artificial Intelligence

Department

Department of Software Engineering

Date of defense

7. 9. 2023

Publisher

Univerzita Karlova, Matematicko-fyzikální fakulta

Language

Czech

Grade

Very good

Keywords (Czech)

bioinformatika|protein|strojové učení|molekulární interakce

Keywords (English)

bioinformatics|protein|machine learning|molecular interactions

Abstract (Czech)

Předpověď protein-ligand vazebných míst je důležitým úkolem, dovolujícím nám pochopit interakce mezi proteinem a ligandem, jejichž pochopení je nezbytné při návrhu léčiv a rozvoji některých oblastí biologie. Ačkoliv již byly vytvořeny nástroje strojového učení pro predikci vazebných míst, tak doposud se vytvořené metody zajímaly pouze o predikci ze 3D struktury proteinu, která ale není pro většinu proteinů známá. Proto se v naší práci zajímáme o předpověď ze znalosti pouhé sekvence reziduí představující protein. Srovnáváme zde možné přístupy k řešení tohoto problému. Srovnáváme reprezentaci reziduí pomocí jejich chemicko-fyzikálních vlastností s reprezentací používající metody z rozpoznávání přirozeného jazyka. Dále porovnáváme zvolené metody strojového učení. Na závěr porovnáme naše výsledky s P2Rank metodou, jakožto s nejmodernější metodou používající k předpovědi protein-ligand vazebných míst 3D strukturu.

Abstract (English)

Predicting protein-ligand binding sites is an important task: it helps us understand protein-ligand interactions, which is essential for drug design and for progress in certain areas of biology. Although machine learning tools for binding site prediction already exist, the methods developed so far predict only from the 3D structure of the protein, which is unknown for most proteins. In this work we therefore focus on prediction from the residue sequence of the protein alone. We compare possible approaches to this problem. We compare a representation of residues based on their physicochemical properties with a representation based on methods from natural language processing. We also compare the selected machine learning methods. Finally, we compare our results with P2Rank, a state-of-the-art method that uses 3D structure to predict protein-ligand binding sites.
