Generování textů českých coververzí anglických písní

Štěpánková, Barbora

Generation of Czech Lyrics to Cover Songs

dc.contributor.advisor	Rosa, Rudolf
dc.creator	Štěpánková, Barbora
dc.date.accessioned	2024-07-19T06:23:42Z
dc.date.available	2024-07-19T06:23:42Z
dc.date.issued	2024
dc.identifier.uri	http://hdl.handle.net/20.500.11956/192043
dc.description.abstract	This thesis explores the topic of generating Czech lyrics to English cover songs. Songs are often adapted to different languages to make them more available to people who do not necessarily speak the language of the original song. During the translation process, however, it is essential to preserve the singability of the text in relation to the melody of the original song, as well as the meaning of the song, so that the translated text fits the context of the original. Currently, such translations are done by hand. We analyze and present the first approaches to solve this problem for Czech through automatic generation using NLP methods. In our work, we create and provide a dataset consisting of pairs of English song lyrics and their official Czech translations. We also provide a dataset of pure Czech song lyrics. We compare the quality of several generative language models. To thoroughly evaluate and analyze their quality, we introduce several automatic metrics and take into account the results of manual evaluation. We find that smaller trained models perform better than larger untrained models. In addition, context is important for the generation of good covers. Finally, we show that our task can be approached from both the translation and generation point of view.	en_US
dc.description.abstract	Tato práce se zabývá tvorbou českých textů k anglickým originálním písním. Písně jsou často překládány do různých jazyků, aby byly přístupné i lidem, kteří nerozumějí původnímu jazyku. Během procesu překladu je však nezbytné zachovat zpěvnost textu vzhledem k melodii původní písně, stejně tak jako význam písně, aby i přeložený text seděl do kontextu originálu. V současné době se takové překlady provádějí ručně. Provádíme analýzu a představujeme první přístupy k řešení tohoto problému pro češtinu prostřednictvím automatického generování pomocí NLP metod. V naší práci vytváříme a poskytujeme dataset sestávající z dvojic anglických písňových textů a jejich oficiálních českých překladů. Také poskytujeme dataset z čistě českých písňových textů. Porovnáváme kvalitu několika generativních jazykových modelů. Pro důkladné zhodnocení a analýzu jejich kvality zavádíme několik automatických metrik a bereme v úvahu i výsledky od lidských hodnotitelů. Zjistili jsme, že menší natrénované modely mají lepší výsledky než větší nenatrénované modely. Kromě toho je pro kvalitní generování coververzí důležitý kontext. Nakonec ukazujeme, že k našemu úkolu lze přistupovat jak prostřednictvím přístupu založeného na překladu, tak prostřednictvím generativních modelů.	cs_CZ
dc.language	Čeština	cs_CZ
dc.language.iso	cs_CZ
dc.publisher	Univerzita Karlova, Matematicko-fyzikální fakulta	cs_CZ
dc.subject	natural language processing|text generation|literary NLP|machine translation	en_US
dc.subject	zpracování přirozeného jazyka|generování textu|literární NLP|strojový překlad	cs_CZ
dc.title	Generování textů českých coververzí anglických písní	cs_CZ
dc.type	bakalářská práce	cs_CZ
dcterms.created	2024
dcterms.dateAccepted	2024-06-28
dc.description.department	Institute of Formal and Applied Linguistics	en_US
dc.description.department	Ústav formální a aplikované lingvistiky	cs_CZ
dc.description.faculty	Matematicko-fyzikální fakulta	cs_CZ
dc.description.faculty	Faculty of Mathematics and Physics	en_US
dc.identifier.repId	269769
dc.title.translated	Generation of Czech Lyrics to Cover Songs	en_US
dc.contributor.referee	Mareček, David
thesis.degree.name	Bc.
thesis.degree.level	bakalářské	cs_CZ
thesis.degree.discipline	Informatika se specializací Umělá inteligence	cs_CZ
thesis.degree.discipline	Computer Science with specialisation in Artificial Intelligence	en_US
thesis.degree.program	Computer Science	en_US
thesis.degree.program	Informatika	cs_CZ
uk.thesis.type	bakalářská práce	cs_CZ
uk.taxonomy.organization-cs	Matematicko-fyzikální fakulta::Ústav formální a aplikované lingvistiky	cs_CZ
uk.taxonomy.organization-en	Faculty of Mathematics and Physics::Institute of Formal and Applied Linguistics	en_US
uk.faculty-name.cs	Matematicko-fyzikální fakulta	cs_CZ
uk.faculty-name.en	Faculty of Mathematics and Physics	en_US
uk.faculty-abbr.cs	MFF	cs_CZ
uk.degree-discipline.cs	Informatika se specializací Umělá inteligence	cs_CZ
uk.degree-discipline.en	Computer Science with specialisation in Artificial Intelligence	en_US
uk.degree-program.cs	Informatika	cs_CZ
uk.degree-program.en	Computer Science	en_US
thesis.grade.cs	Výborně	cs_CZ
thesis.grade.en	Excellent	en_US
uk.abstract.cs	Tato práce se zabývá tvorbou českých textů k anglickým originálním písním. Písně jsou často překládány do různých jazyků, aby byly přístupné i lidem, kteří nerozumějí původnímu jazyku. Během procesu překladu je však nezbytné zachovat zpěvnost textu vzhledem k melodii původní písně, stejně tak jako význam písně, aby i přeložený text seděl do kontextu originálu. V současné době se takové překlady provádějí ručně. Provádíme analýzu a představujeme první přístupy k řešení tohoto problému pro češtinu prostřednictvím automatického generování pomocí NLP metod. V naší práci vytváříme a poskytujeme dataset sestávající z dvojic anglických písňových textů a jejich oficiálních českých překladů. Také poskytujeme dataset z čistě českých písňových textů. Porovnáváme kvalitu několika generativních jazykových modelů. Pro důkladné zhodnocení a analýzu jejich kvality zavádíme několik automatických metrik a bereme v úvahu i výsledky od lidských hodnotitelů. Zjistili jsme, že menší natrénované modely mají lepší výsledky než větší nenatrénované modely. Kromě toho je pro kvalitní generování coververzí důležitý kontext. Nakonec ukazujeme, že k našemu úkolu lze přistupovat jak prostřednictvím přístupu založeného na překladu, tak prostřednictvím generativních modelů.	cs_CZ
uk.abstract.en	This thesis explores the topic of generating Czech lyrics to English cover songs. Songs are often adapted to different languages to make them more available to people who do not necessarily speak the language of the original song. During the translation process, however, it is essential to preserve the singability of the text in relation to the melody of the original song, as well as the meaning of the song, so that the translated text fits the context of the original. Currently, such translations are done by hand. We analyze and present the first approaches to solve this problem for Czech through automatic generation using NLP methods. In our work, we create and provide a dataset consisting of pairs of English song lyrics and their official Czech translations. We also provide a dataset of pure Czech song lyrics. We compare the quality of several generative language models. To thoroughly evaluate and analyze their quality, we introduce several automatic metrics and take into account the results of manual evaluation. We find that smaller trained models perform better than larger untrained models. In addition, context is important for the generation of good covers. Finally, we show that our task can be approached from both the translation and generation point of view.	en_US
uk.file-availability	V
uk.grantor	Univerzita Karlova, Matematicko-fyzikální fakulta, Ústav formální a aplikované lingvistiky	cs_CZ
thesis.grade.code	1
uk.publication-place	Praha	cs_CZ
uk.thesis.defenceStatus	O