Výpočet gramatického statusu: kvantitativní analýza čínských textů

Konývková, Eliška

Calculation of grammatical status: quantitative analysis of Chinese texts

bachelor thesis (DEFENDED)


Permanent link

http://hdl.handle.net/20.500.11956/194710

Identifiers

Study Information System: 270245

Referee

Kubát, Miroslav

Faculty / Institute

Faculty of Arts

Discipline

General Linguistics - Chinese Studies

Department

Institute of Linguistics

Date of defense

12. 9. 2024

Publisher

Univerzita Karlova, Filozofická fakulta

Language

Czech

Grade

Excellent

Keywords (Czech)

gramatický status|korpus|čínština|kvantitativní lingvistika|replikační studie

Keywords (English)

grammatical status|corpora|Chinese|quantitative linguistics|replication study

Abstract (Czech)

Tato studie opakuje výzkum Linlin Sun & Davida C. Saavedry zaměřený na určování gramatického statusu jednotek v čínštině pomocí kvantitativních metod. V návaznosti na jejich práci používáme binární logistický regresní model pro výpočet skóre gramatického statusu vybraných lexikálních jednotek. Jako zdroj dat nám slouží Lancasterský korpus čínštiny (LCMC), který obsahuje současné standardní čínské texty. V práci podrobně popisujeme metriky a modelovací přístup použitý k odvození skóre gramatického statusu a porovnáváme naše výsledky s výsledky autorů původní studie. V naší analýze dále posuzujeme vhodnost vybraných metrik, hodnotíme přesnost predikce binárního logistického regresního modelu a analyzujeme, kde se po přiřazení gramatického statusu jednotky nachází na lexikálně-gramatické škále, a zda tomu odpovídá jejich "tradiční" zařazení do slovních kategorií.

Abstract (English)

This study replicates research by Linlin Sun and David C. Saavedra on determining the grammatical status of units in Chinese using quantitative methods. Following their work, we use a binary logistic regression model to calculate grammatical status scores for selected lexical units. Our data come from the Lancaster Corpus of Mandarin Chinese (LCMC), which contains contemporary standard Chinese texts. We describe in detail the metrics and the modeling approach used to derive the scores, and we compare our results with those of the original study. We further assess the suitability of the selected metrics, evaluate the prediction accuracy of the binary logistic regression model, and analyze where the units fall on the lexical-grammatical scale once a grammatical status has been assigned, and whether that position corresponds to their "traditional" word-class assignment.
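
As a rough illustration of the approach described in the abstract, the sketch below fits a binary logistic regression and reads the predicted probability as a score between the lexical end (0) and the grammatical end (1) of the scale. The two features, their values, and the labels are invented for illustration only; they are not taken from LCMC or from either study, and the metrics and fitting procedure actually used in the thesis may differ.

```python
import math

def sigmoid(z):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=5000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def status_score(x, w, b):
    """Predicted probability that an item belongs to the grammatical class."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))

# Hypothetical, hand-made training items: two made-up metrics scaled to [0, 1].
# Label 1 = item treated as grammatical, 0 = item treated as lexical.
X_train = [[0.9, 0.8], [0.8, 0.9], [0.7, 0.7],
           [0.2, 0.1], [0.1, 0.3], [0.3, 0.2]]
y_train = [1, 1, 1, 0, 0, 0]

w, b = fit_logistic(X_train, y_train)

# Score two unseen, equally hypothetical items.
high = status_score([0.85, 0.85], w, b)
low = status_score([0.15, 0.15], w, b)
```

Under these assumptions, an item scoring close to 1 would be placed near the grammatical end of the scale and one scoring close to 0 near the lexical end; comparing such scores with an item's traditional word class is the kind of check the abstract describes.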
