Výpočet gramatického statusu: kvantitativní analýza čínských textů

Konývková, Eliška

Calculation of grammatical status: quantitative analysis of Chinese texts

dc.contributor.advisor	Milička, Jiří
dc.creator	Konývková, Eliška
dc.date.accessioned	2024-11-29T07:11:30Z
dc.date.available	2024-11-29T07:11:30Z
dc.date.issued	2024
dc.identifier.uri	http://hdl.handle.net/20.500.11956/194710
dc.description.abstract	Tato studie opakuje výzkum Linlin Sun & Davida C. Saavedry zaměřený na určování gramatického statusu jednotek v čínštině pomocí kvantitativních metod. V návaznosti na jejich práci používáme binární logistický regresní model pro výpočet skóre gramatického statusu vybraných lexikálních jednotek. Jako zdroj dat nám slouží Lancasterský korpus mandarínské čínštiny (LCMC), který obsahuje současné standardní čínské texty. V práci podrobně popisujeme metriky a modelovací přístup použitý k odvození skóre gramatického statusu a porovnáváme naše výsledky s výsledky autorů původní studie. V naší analýze dále posuzujeme vhodnost vybraných metrik, hodnotíme přesnost predikce binárního logistického regresního modelu a analyzujeme, kde se jednotky po přiřazení gramatického statusu nacházejí na lexikálně-gramatické škále a zda tomu odpovídá jejich "tradiční" zařazení do slovních kategorií. Klíčová slova: gramatický status, korpus, čínština, kvantitativní lingvistika, replikační studie	cs_CZ
dc.description.abstract	This study replicates Linlin Sun & David C. Saavedra's research on determining the grammatical status of units in Chinese using quantitative methods. Following their work, we use a binary logistic regression model to calculate grammatical status scores for selected lexical units. The data source is the Lancaster Corpus of Mandarin Chinese (LCMC), which contains contemporary Standard Chinese texts. We describe in detail the metrics and the modeling approach used to derive the grammatical status scores and compare our results with those of the original study. Our analysis further assesses the appropriateness of the selected metrics, evaluates the predictive accuracy of the binary logistic regression model, and examines where the units fall on the lexico-grammatical scale once their grammatical status has been assigned, and whether this position matches their "traditional" assignment to word classes. Keywords: grammatical status, corpora, Chinese, quantitative linguistics, replication study	en_US
dc.language	Čeština	cs_CZ
dc.language.iso	cs_CZ
dc.publisher	Univerzita Karlova, Filozofická fakulta	cs_CZ
dc.subject	grammatical status|corpora|Chinese|quantitative linguistics|replication study	en_US
dc.subject	gramatický status|korpus|čínština|kvantitativní lingvistika|replikační studie	cs_CZ
dc.title	Výpočet gramatického statusu: kvantitativní analýza čínských textů	cs_CZ
dc.type	bakalářská práce	cs_CZ
dcterms.created	2024
dcterms.dateAccepted	2024-09-12
dc.description.department	Institute of Linguistics	en_US
dc.description.department	Ústav obecné lingvistiky	cs_CZ
dc.description.faculty	Filozofická fakulta	cs_CZ
dc.description.faculty	Faculty of Arts	en_US
dc.identifier.repId	270245
dc.title.translated	Calculation of grammatical status: quantitative analysis of Chinese texts	en_US
dc.contributor.referee	Kubát, Miroslav
thesis.degree.name	Bc.
thesis.degree.level	bakalářské	cs_CZ
thesis.degree.discipline	General Linguistics - Chinese Studies	en_US
thesis.degree.discipline	Obecná lingvistika - Sinologie	cs_CZ
thesis.degree.program	General Linguistics	en_US
thesis.degree.program	Obecná lingvistika	cs_CZ
uk.thesis.type	bakalářská práce	cs_CZ
uk.taxonomy.organization-cs	Filozofická fakulta::Ústav obecné lingvistiky	cs_CZ
uk.taxonomy.organization-en	Faculty of Arts::Institute of Linguistics	en_US
uk.faculty-name.cs	Filozofická fakulta	cs_CZ
uk.faculty-name.en	Faculty of Arts	en_US
uk.faculty-abbr.cs	FF	cs_CZ
uk.degree-discipline.cs	Obecná lingvistika - Sinologie	cs_CZ
uk.degree-discipline.en	General Linguistics - Chinese Studies	en_US
uk.degree-program.cs	Obecná lingvistika	cs_CZ
uk.degree-program.en	General Linguistics	en_US
thesis.grade.cs	Výborně	cs_CZ
thesis.grade.en	Excellent	en_US
uk.abstract.cs	Tato studie opakuje výzkum Linlin Sun & Davida C. Saavedry zaměřený na určování gramatického statusu jednotek v čínštině pomocí kvantitativních metod. V návaznosti na jejich práci používáme binární logistický regresní model pro výpočet skóre gramatického statusu vybraných lexikálních jednotek. Jako zdroj dat nám slouží Lancasterský korpus mandarínské čínštiny (LCMC), který obsahuje současné standardní čínské texty. V práci podrobně popisujeme metriky a modelovací přístup použitý k odvození skóre gramatického statusu a porovnáváme naše výsledky s výsledky autorů původní studie. V naší analýze dále posuzujeme vhodnost vybraných metrik, hodnotíme přesnost predikce binárního logistického regresního modelu a analyzujeme, kde se jednotky po přiřazení gramatického statusu nacházejí na lexikálně-gramatické škále a zda tomu odpovídá jejich "tradiční" zařazení do slovních kategorií. Klíčová slova: gramatický status, korpus, čínština, kvantitativní lingvistika, replikační studie	cs_CZ
uk.abstract.en	This study replicates Linlin Sun & David C. Saavedra's research on determining the grammatical status of units in Chinese using quantitative methods. Following their work, we use a binary logistic regression model to calculate grammatical status scores for selected lexical units. The data source is the Lancaster Corpus of Mandarin Chinese (LCMC), which contains contemporary Standard Chinese texts. We describe in detail the metrics and the modeling approach used to derive the grammatical status scores and compare our results with those of the original study. Our analysis further assesses the appropriateness of the selected metrics, evaluates the predictive accuracy of the binary logistic regression model, and examines where the units fall on the lexico-grammatical scale once their grammatical status has been assigned, and whether this position matches their "traditional" assignment to word classes. Keywords: grammatical status, corpora, Chinese, quantitative linguistics, replication study	en_US
uk.file-availability	V
uk.grantor	Univerzita Karlova, Filozofická fakulta, Ústav obecné lingvistiky	cs_CZ
thesis.grade.code	1
uk.publication-place	Praha	cs_CZ
uk.thesis.defenceStatus	O
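
The abstract outlines the modeling step only in broad terms: a binary logistic regression whose output serves as a grammatical status score. The following is a minimal sketch of that general idea. It assumes that units with a known lexical (0) or grammatical (1) classification are described by numeric metric values, and that the score is the model's predicted probability of the grammatical class. The two feature columns, the unit names and all numbers are invented placeholders. They are not the metrics or LCMC data used in the thesis or in Sun & Saavedra's study. The model is fitted with plain batch gradient descent using only the standard library, not with any particular statistics package.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lr: float = 0.5,
    epochs: int = 2000,
) -> Tuple[List[float], float]:
    """Fit a binary logistic regression by batch gradient descent on mean log-loss."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * d
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Residual of the predicted probability against the 0/1 label
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def status_score(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """Predicted probability of the grammatical class, read as a 0-1 status score."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def accuracy(X, y, w, b, threshold: float = 0.5) -> float:
    """Share of units whose thresholded score matches their 0/1 label."""
    hits = sum((status_score(xi, w, b) >= threshold) == bool(yi) for xi, yi in zip(X, y))
    return hits / len(y)


# Hypothetical toy data: two invented metrics per unit, scaled to [0, 1].
# Label 1 = unit treated as grammatical, 0 = unit treated as lexical.
X_train = [[0.90, 0.80], [0.80, 0.90], [0.85, 0.70],
           [0.10, 0.20], [0.20, 0.10], [0.15, 0.30]]
y_train = [1, 1, 1, 0, 0, 0]

w, b = fit_logistic(X_train, y_train)
train_acc = accuracy(X_train, y_train, w, b)

# Score hypothetical unlabelled units; each score places a unit on a lexical-grammatical scale
new_units = {"unit_a": [0.88, 0.85], "unit_b": [0.50, 0.50], "unit_c": [0.12, 0.15]}
scores = {name: status_score(x, w, b) for name, x in new_units.items()}

if __name__ == "__main__":
    print(f"training accuracy: {train_acc:.2f}")
    for name, s in scores.items():
        print(f"{name}: {s:.3f}")
```

Reading the score as a continuous value, rather than only as the thresholded class, is what lets units be ranked along a lexico-grammatical scale. In this sketch, accuracy is computed on the training units themselves, which tends to overstate performance. A held-out set or cross-validation would be the more careful way to evaluate predictive accuracy.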