Mining novel terpene synthases from large-scale repositories

Čalounová, Tereza

Mining nových terpen syntáz z rozsáhlých databází

dc.contributor.advisor	Pluskal, Tomáš
dc.creator	Čalounová, Tereza
dc.date.accessioned	2024-11-28T11:53:10Z
dc.date.available	2024-11-28T11:53:10Z
dc.date.issued	2024
dc.identifier.uri	http://hdl.handle.net/20.500.11956/190195
dc.description.abstract	Terpeny a terpenoidy představují největší a strukturně nejrozmanitější skupinu přírodních látek s využitím v mnoha oborech, včetně farmaceutického průmyslu. Tyto molekuly jsou v přírodě syntetizovány enzymy známými jako terpen syntázy. V této práci byla provedena bioinformatická analýza kurátorované databáze obsahující všech 1125 experimentálně charakterizovaných terpen syntáz se zaměřením na identifikaci vzorců v délkách sekvencí a doménových architekturách těchto enzymů napříč různými říšemi života. Na základě poznatků této analýzy byl proveden sekvenčně založený mining s cílem identifikovat možné nové terpen syntázy. S využitím téměř 5,5 miliard proteinových sekvencí z různých rozsáhlých sekvenčních databází vedl mining k identifikaci více než 600 tisíc potenciálních terpen syntáz. Tyto potenciální terpen syntázy pocházejí převážně z bakterií a metagenomů, tedy ze zdrojů, které byly historicky méně zkoumány. Výsledný dataset, doplněný fylogenetickým stromem, sítí sekvenční podobnosti a dvěma skóre prioritizace, nabízí cenný zdroj pro objevování nových terpenů. Klíčová slova: terpen syntáza, TPS, mining, Pfam, SUPERFAMILY, doména, terpen	cs_CZ
dc.description.abstract	Terpenes and terpenoids represent the largest and most structurally diverse group of natural products, with applications across many fields, including the pharmaceutical industry. These molecules are synthesized in nature by enzymes known as terpene synthases. In this thesis, a bioinformatic analysis was performed on a curated database containing all 1125 experimentally characterized terpene synthases, focusing on identifying patterns in the sequence lengths and domain architectures of these enzymes across different kingdoms of life. Based on the findings of this analysis, sequence-guided mining was conducted to identify possible new terpene synthases. Using nearly 5.5 billion protein sequences from various large-scale sequence repositories, the mining identified more than 600 thousand putative terpene synthases. These putative terpene synthases mainly originate from Bacteria and metagenomes, sources that have historically been less explored. The resulting dataset, accompanied by a phylogenetic tree, a sequence similarity network, and two prioritization scores, offers a valuable resource for the discovery of novel terpenes. Keywords: terpene synthase, TPS, mining, Pfam, SUPERFAMILY, domain, terpene	en_US
dc.language	English	cs_CZ
dc.language.iso	en_US
dc.publisher	Univerzita Karlova, Přírodovědecká fakulta	cs_CZ
dc.subject	terpene synthase	en_US
dc.subject	mining	en_US
dc.subject	database	en_US
dc.subject	Pfam	en_US
dc.subject	Supfam	en_US
dc.subject	domain	en_US
dc.subject	terpene	en_US
dc.subject	terpen syntáza	cs_CZ
dc.subject	mining	cs_CZ
dc.subject	databáze	cs_CZ
dc.subject	Pfam	cs_CZ
dc.subject	Supfam	cs_CZ
dc.subject	doména	cs_CZ
dc.subject	terpen	cs_CZ
dc.title	Mining novel terpene synthases from large-scale repositories	en_US
dc.type	diplomová práce	cs_CZ
dcterms.created	2024
dcterms.dateAccepted	2024-06-04
dc.description.department	Department of Cell Biology	en_US
dc.description.department	Katedra buněčné biologie	cs_CZ
dc.description.faculty	Přírodovědecká fakulta	cs_CZ
dc.description.faculty	Faculty of Science	en_US
dc.identifier.repId	253476
dc.title.translated	Mining nových terpen syntáz z rozsáhlých databází	cs_CZ
dc.contributor.referee	Štáfková, Jitka
thesis.degree.name	Mgr.
thesis.degree.level	navazující magisterské	cs_CZ
thesis.degree.discipline	Bioinformatics	en_US
thesis.degree.discipline	Bioinformatika	cs_CZ
thesis.degree.program	Bioinformatics	en_US
thesis.degree.program	Bioinformatika	cs_CZ
uk.thesis.type	diplomová práce	cs_CZ
uk.taxonomy.organization-cs	Přírodovědecká fakulta::Katedra buněčné biologie	cs_CZ
uk.taxonomy.organization-en	Faculty of Science::Department of Cell Biology	en_US
uk.faculty-name.cs	Přírodovědecká fakulta	cs_CZ
uk.faculty-name.en	Faculty of Science	en_US
uk.faculty-abbr.cs	PřF	cs_CZ
uk.degree-discipline.cs	Bioinformatika	cs_CZ
uk.degree-discipline.en	Bioinformatics	en_US
uk.degree-program.cs	Bioinformatika	cs_CZ
uk.degree-program.en	Bioinformatics	en_US
thesis.grade.cs	Výborně	cs_CZ
thesis.grade.en	Excellent	en_US
uk.abstract.cs	Terpeny a terpenoidy představují největší a strukturně nejrozmanitější skupinu přírodních látek s využitím v mnoha oborech, včetně farmaceutického průmyslu. Tyto molekuly jsou v přírodě syntetizovány enzymy známými jako terpen syntázy. V této práci byla provedena bioinformatická analýza kurátorované databáze obsahující všech 1125 experimentálně charakterizovaných terpen syntáz se zaměřením na identifikaci vzorců v délkách sekvencí a doménových architekturách těchto enzymů napříč různými říšemi života. Na základě poznatků této analýzy byl proveden sekvenčně založený mining s cílem identifikovat možné nové terpen syntázy. S využitím téměř 5,5 miliard proteinových sekvencí z různých rozsáhlých sekvenčních databází vedl mining k identifikaci více než 600 tisíc potenciálních terpen syntáz. Tyto potenciální terpen syntázy pocházejí převážně z bakterií a metagenomů, tedy ze zdrojů, které byly historicky méně zkoumány. Výsledný dataset, doplněný fylogenetickým stromem, sítí sekvenční podobnosti a dvěma skóre prioritizace, nabízí cenný zdroj pro objevování nových terpenů. Klíčová slova: terpen syntáza, TPS, mining, Pfam, SUPERFAMILY, doména, terpen	cs_CZ
uk.abstract.en	Terpenes and terpenoids represent the largest and most structurally diverse group of natural products, with applications across many fields, including the pharmaceutical industry. These molecules are synthesized in nature by enzymes known as terpene synthases. In this thesis, a bioinformatic analysis was performed on a curated database containing all 1125 experimentally characterized terpene synthases, focusing on identifying patterns in the sequence lengths and domain architectures of these enzymes across different kingdoms of life. Based on the findings of this analysis, sequence-guided mining was conducted to identify possible new terpene synthases. Using nearly 5.5 billion protein sequences from various large-scale sequence repositories, the mining identified more than 600 thousand putative terpene synthases. These putative terpene synthases mainly originate from Bacteria and metagenomes, sources that have historically been less explored. The resulting dataset, accompanied by a phylogenetic tree, a sequence similarity network, and two prioritization scores, offers a valuable resource for the discovery of novel terpenes. Keywords: terpene synthase, TPS, mining, Pfam, SUPERFAMILY, domain, terpene	en_US
uk.file-availability	V
uk.grantor	Univerzita Karlova, Přírodovědecká fakulta, Katedra buněčné biologie	cs_CZ
thesis.grade.code	1
uk.publication-place	Praha	cs_CZ
uk.thesis.defenceStatus	O