Automatic detection and attribution of quotes

Ustinova, Evgeniya

Automatická identifikace citátů

dc.contributor.advisor	Hana, Jiří
dc.creator	Ustinova, Evgeniya
dc.date.accessioned	2023-07-24T12:40:53Z
dc.date.available	2023-07-24T12:40:53Z
dc.date.issued	2023
dc.identifier.uri	http://hdl.handle.net/20.500.11956/181574
dc.description.abstract	Quotation extraction and attribution are important practical tasks for the media, but most existing solutions are monolingual. In this work, I present a complex machine learning-based system for extraction and attribution of direct and indirect quotations, which is trained on English and tested on Czech and Russian data. Czech and Russian test datasets were manually annotated as part of this study. This system is compared against a rule-based baseline model. The baseline model demonstrates better precision in extraction of quotation elements, but low recall. The machine learning-based model is better overall in extracting both separate elements of quotations and full quotations.	en_US
dc.language	English	cs_CZ
dc.language.iso	en_US
dc.publisher	Univerzita Karlova, Matematicko-fyzikální fakulta	cs_CZ
dc.subject	NLP|quotation extraction|quotation attribution|CRFs|article|annotation	en_US
dc.subject	NLP	cs_CZ
dc.title	Automatic detection and attribution of quotes	en_US
dc.type	diplomová práce	cs_CZ
dcterms.created	2023
dcterms.dateAccepted	2023-06-06
dc.description.department	Ústav formální a aplikované lingvistiky	cs_CZ
dc.description.department	Institute of Formal and Applied Linguistics	en_US
dc.description.faculty	Faculty of Mathematics and Physics	en_US
dc.description.faculty	Matematicko-fyzikální fakulta	cs_CZ
dc.identifier.repId	245126
dc.title.translated	Automatická identifikace citátů	cs_CZ
dc.contributor.referee	Vidová Hladká, Barbora
thesis.degree.name	Mgr.
thesis.degree.level	navazující magisterské	cs_CZ
thesis.degree.discipline	Computer Science - Language Technologies and Computational Linguistics	cs_CZ
thesis.degree.discipline	Computer Science - Language Technologies and Computational Linguistics	en_US
thesis.degree.program	Computer Science - Language Technologies and Computational Linguistics	cs_CZ
thesis.degree.program	Computer Science - Language Technologies and Computational Linguistics	en_US
uk.thesis.type	diplomová práce	cs_CZ
uk.taxonomy.organization-cs	Matematicko-fyzikální fakulta::Ústav formální a aplikované lingvistiky	cs_CZ
uk.taxonomy.organization-en	Faculty of Mathematics and Physics::Institute of Formal and Applied Linguistics	en_US
uk.faculty-name.cs	Matematicko-fyzikální fakulta	cs_CZ
uk.faculty-name.en	Faculty of Mathematics and Physics	en_US
uk.faculty-abbr.cs	MFF	cs_CZ
uk.degree-discipline.cs	Computer Science - Language Technologies and Computational Linguistics	cs_CZ
uk.degree-discipline.en	Computer Science - Language Technologies and Computational Linguistics	en_US
uk.degree-program.cs	Computer Science - Language Technologies and Computational Linguistics	cs_CZ
uk.degree-program.en	Computer Science - Language Technologies and Computational Linguistics	en_US
thesis.grade.cs	Výborně	cs_CZ
thesis.grade.en	Excellent	en_US
uk.abstract.en	Quotation extraction and attribution are important practical tasks for the media, but most existing solutions are monolingual. In this work, I present a complex machine learning-based system for extraction and attribution of direct and indirect quotations, which is trained on English and tested on Czech and Russian data. Czech and Russian test datasets were manually annotated as part of this study. This system is compared against a rule-based baseline model. The baseline model demonstrates better precision in extraction of quotation elements, but low recall. The machine learning-based model is better overall in extracting both separate elements of quotations and full quotations.	en_US
uk.file-availability	V
uk.grantor	Univerzita Karlova, Matematicko-fyzikální fakulta, Ústav formální a aplikované lingvistiky	cs_CZ
thesis.grade.code	1
uk.publication-place	Praha	cs_CZ
uk.thesis.defenceStatus	O