dc.contributor.advisor | Hana, Jiří | |
dc.creator | Ustinova, Evgeniya | |
dc.date.accessioned | 2023-07-24T12:40:53Z | |
dc.date.available | 2023-07-24T12:40:53Z | |
dc.date.issued | 2023 | |
dc.identifier.uri | http://hdl.handle.net/20.500.11956/181574 | |
dc.description.abstract | Quotation extraction and attribution are important practical tasks for the media, but most existing solutions are monolingual. In this work, I present a complex machine learning-based system for extraction and attribution of direct and indirect quotations, which is trained on English and tested on Czech and Russian data. The Czech and Russian test datasets were manually annotated as part of this study. The system is compared against a rule-based baseline model. The baseline model achieves higher precision in extracting quotation elements but has low recall. The machine learning-based model performs better overall, both in extracting individual quotation elements and in extracting full quotations. | en_US |
dc.language | English | cs_CZ |
dc.language.iso | en_US | |
dc.publisher | Univerzita Karlova, Matematicko-fyzikální fakulta | cs_CZ |
dc.subject | NLP|quotation extraction|quotation attribution|CRFs|article|annotation | en_US |
dc.subject | NLP | cs_CZ |
dc.title | Automatic detection and attribution of quotes | en_US |
dc.type | diplomová práce | cs_CZ |
dcterms.created | 2023 | |
dcterms.dateAccepted | 2023-06-06 | |
dc.description.department | Ústav formální a aplikované lingvistiky | cs_CZ |
dc.description.department | Institute of Formal and Applied Linguistics | en_US |
dc.description.faculty | Faculty of Mathematics and Physics | en_US |
dc.description.faculty | Matematicko-fyzikální fakulta | cs_CZ |
dc.identifier.repId | 245126 | |
dc.title.translated | Automatická identifikace citátů | cs_CZ |
dc.contributor.referee | Vidová Hladká, Barbora | |
thesis.degree.name | Mgr. | |
thesis.degree.level | navazující magisterské | cs_CZ |
thesis.degree.discipline | Computer Science - Language Technologies and Computational Linguistics | cs_CZ |
thesis.degree.discipline | Computer Science - Language Technologies and Computational Linguistics | en_US |
thesis.degree.program | Computer Science - Language Technologies and Computational Linguistics | cs_CZ |
thesis.degree.program | Computer Science - Language Technologies and Computational Linguistics | en_US |
uk.thesis.type | diplomová práce | cs_CZ |
uk.taxonomy.organization-cs | Matematicko-fyzikální fakulta::Ústav formální a aplikované lingvistiky | cs_CZ |
uk.taxonomy.organization-en | Faculty of Mathematics and Physics::Institute of Formal and Applied Linguistics | en_US |
uk.faculty-name.cs | Matematicko-fyzikální fakulta | cs_CZ |
uk.faculty-name.en | Faculty of Mathematics and Physics | en_US |
uk.faculty-abbr.cs | MFF | cs_CZ |
uk.degree-discipline.cs | Computer Science - Language Technologies and Computational Linguistics | cs_CZ |
uk.degree-discipline.en | Computer Science - Language Technologies and Computational Linguistics | en_US |
uk.degree-program.cs | Computer Science - Language Technologies and Computational Linguistics | cs_CZ |
uk.degree-program.en | Computer Science - Language Technologies and Computational Linguistics | en_US |
thesis.grade.cs | Výborně | cs_CZ |
thesis.grade.en | Excellent | en_US |
uk.abstract.en | Quotation extraction and attribution are important practical tasks for the media, but most existing solutions are monolingual. In this work, I present a complex machine learning-based system for extraction and attribution of direct and indirect quotations, which is trained on English and tested on Czech and Russian data. The Czech and Russian test datasets were manually annotated as part of this study. The system is compared against a rule-based baseline model. The baseline model achieves higher precision in extracting quotation elements but has low recall. The machine learning-based model performs better overall, both in extracting individual quotation elements and in extracting full quotations. | en_US |
uk.file-availability | V | |
uk.grantor | Univerzita Karlova, Matematicko-fyzikální fakulta, Ústav formální a aplikované lingvistiky | cs_CZ |
thesis.grade.code | 1 | |
uk.publication-place | Praha | cs_CZ |
uk.thesis.defenceStatus | O | |