Measuring readability of technical texts
Měření čitelnosti odborných textů
diploma thesis (DEFENDED)
Permanent link
http://hdl.handle.net/20.500.11956/175521
Identifiers
Study Information System: 245686
Collections
- Kvalifikační práce [11242]
Author
Kriukova, Anna
Advisor
Cinková, Silvie
Referee
Vidová Hladká, Barbora
Faculty / Institute
Faculty of Mathematics and Physics
Discipline
Computer Science - Language Technologies and Computational Linguistics
Department
Institute of Formal and Applied Linguistics
Date of defense
2. 9. 2022
Publisher
Univerzita Karlova, Matematicko-fyzikální fakulta
Language
English
Grade
Excellent
Keywords (Czech)
srozumitelnost|čitelnost|datová analýza|korpusová lingvistika
Keywords (English)
readability|technical texts|data analytics|corpus linguistics|comprehensibility
Title: Measuring readability of technical texts
Author: Anna Kriukova
Faculty of Mathematics and Physics: Institute of Formal and Applied Linguistics
Supervisor: Mgr. Cinková Silvie, Ph.D., Institute of Formal and Applied Linguistics
Abstract: This research explores various approaches to measuring the readability of technical texts. The data I work with is provided by Hyperskill, an online educational platform dedicated mostly to Computer Science, where I did my internship. In the first part of my research, I examine classical readability formulas and try to find correlations between their values and the user statistics available for the texts. The results show no high correlations, so the standard formulas are not suitable for the task. The second part of the research is dedicated to experiments with machine learning algorithms. First, I use four sets of features to predict the average rating, completion time, and completion rate of a step. Then, I introduce a rule-based algorithm, which relies on students' comments, to split the texts into well-written and poorly written ones. However, a binary classifier trained on this division performs poorly and is not used in the final pipeline. The system suggested as the outcome of my work employs the user statistics' prediction for new texts and...
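The first experiment described in the abstract, correlating a classical readability score with per-text user statistics, could be sketched roughly as follows. This is only an illustration under stated assumptions: the record does not say which formulas the thesis uses, so Flesch Reading Ease stands in as one well-known classical formula; the vowel-group syllable counter is a crude heuristic; and the function names and sample data are hypothetical, not taken from the thesis or from Hyperskill.

```python
import math
import re


def count_syllables(word: str) -> int:
    """Rough estimate (assumption): one syllable per group of consecutive vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease:
    206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words).
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (
        206.835
        - 1.015 * len(words) / len(sentences)
        - 84.6 * syllables / len(words)
    )


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (norm_x * norm_y)


if __name__ == "__main__":
    # Hypothetical texts and average ratings, invented for illustration only.
    texts = [
        "A list stores items. You can add items to it.",
        "Polymorphic dispatch necessitates comprehensive understanding of inheritance hierarchies.",
        "Loops repeat code. They stop when a condition fails. Use them with care.",
    ]
    average_ratings = [4.6, 3.1, 4.2]
    scores = [flesch_reading_ease(t) for t in texts]
    print("readability scores:", scores)
    print("correlation with rating:", pearson(scores, average_ratings))
```

With real data, each text would be paired with a statistic such as its average rating, completion time, or completion rate, and a coefficient close to zero would match the abstract's finding that the standard formulas do not track those statistics well.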