Measuring readability of technical texts

Kriukova, Anna

Měření čitelnosti odborných textů

diploma thesis (DEFENDED)

Permanent link

http://hdl.handle.net/20.500.11956/175521

Identifiers

Study Information System: 245686

Referee

Vidová Hladká, Barbora

Faculty / Institute

Faculty of Mathematics and Physics

Discipline

Computer Science - Language Technologies and Computational Linguistics

Department

Institute of Formal and Applied Linguistics

Date of defense

2. 9. 2022

Publisher

Univerzita Karlova, Matematicko-fyzikální fakulta

Language

English

Grade

Excellent

Keywords (Czech)

srozumitelnost|čitelnost|datová analýza|korpusová lingvistika

Keywords (English)

readability|technical texts|data analytics|corpus linguistics|comprehensibility

Title: Measuring readability of technical texts
Author: Anna Kriukova
Faculty of Mathematics and Physics: Institute of Formal and Applied Linguistics
Supervisor: Mgr. Cinková Silvie, Ph.D., Institute of Formal and Applied Linguistics

Abstract: This research explores various approaches to measuring readability of technical texts. The data I work with is provided by Hyperskill, an online educational platform dedicated mostly to Computer Science, where I did my internship. In the first part of my research, I examine classical readability formulas and try to find correlations between their values and the user statistics available for the texts. The results show that there are no high correlations, thus, the standard formulas are not suitable for the task. The second part of the research is dedicated to experiments with machine learning algorithms. Firstly, I use four sets of features to predict the average rating, completion time, and completion rate of a step. Then, I introduce a rule-based algorithm to split the texts into well- and poorly-written ones, which relies on students' comments. However, binary classification trained on this division shows low results and is not used in the final pipeline. The system suggested as the outcome of my work employs the user statistics' prediction for new texts and...
