Pokročilý korektor češtiny

Richter, Michal

diploma thesis (DEFENDED)

Permanent link

http://hdl.handle.net/20.500.11956/33978

Identifiers

Study Information System: 45334

Referee

Žabokrtský, Zdeněk

Faculty / Institute

Faculty of Mathematics and Physics

Discipline

Computational Linguistics

Department

Institute of Formal and Applied Linguistics

Date of defense

6. 9. 2010

Publisher

Univerzita Karlova, Matematicko-fyzikální fakulta

Language

English

Grade

Excellent

The goal of this thesis is to implement a Czech spell-checker that uses language models and lexical morphological analysis to offer the best possible list of correction suggestions for each typo and to detect typos that are themselves valid Czech words. The system should also provide diacritics restoration for Czech text. Mac OS X was chosen as the target platform. During implementation, emphasis was placed mainly on a memory-efficient representation of the statistical models. The thesis gives an overview of the methods used: HMMs, language models, and the Viterbi algorithm. It then describes the implementation of the system and the training of the statistical models. It concludes with a numerical evaluation of the system's accuracy and a discussion of the results.

Abstract (English)

The aim of this work is to implement a Czech spell-checker that uses several language models and a lexical morphological analyser to offer proper correction suggestions and to find real-word spelling errors (spelling errors that happen to be in the lexicon). The system should also be able to restore diacritics in Czech text. Mac OS X was chosen as the target platform for the application. During the implementation, emphasis was put especially on a memory-efficient representation of the above-mentioned statistical models. The work opens with a gentle introduction to Hidden Markov Models, language models and the Viterbi algorithm. It then discusses the actual system implementation and the training of the statistical models. In the final part of the work, the achieved results are evaluated and discussed in depth.
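As an illustration of how a language model and the Viterbi algorithm can combine for diacritics restoration, here is a minimal sketch. It is not the thesis's implementation: the candidate table `CANDIDATES`, the bigram scores in `BIGRAM_LOGP`, the flat `UNSEEN_LOGP` penalty, and the function names are all hypothetical toy values chosen for this example, and a real system would derive candidates from a morphological lexicon and probabilities from a trained model.

```python
import math

# Hypothetical toy data: each diacritic-free token maps to its possible restorations.
CANDIDATES = {
    "muj": ["můj"],
    "pes": ["pes"],
    "je": ["je"],
    "maly": ["malý", "malí"],
}

# Hypothetical bigram log-probabilities; a trained language model would supply these.
BIGRAM_LOGP = {
    ("<s>", "můj"): math.log(0.5),
    ("můj", "pes"): math.log(0.4),
    ("pes", "je"): math.log(0.3),
    ("je", "malý"): math.log(0.2),
    ("je", "malí"): math.log(0.01),
}
UNSEEN_LOGP = math.log(1e-6)  # assumed flat penalty for unseen word pairs


def bigram_logp(prev, word):
    """Log-probability of `word` following `prev` under the toy model."""
    return BIGRAM_LOGP.get((prev, word), UNSEEN_LOGP)


def restore_diacritics(tokens):
    """Viterbi search for the highest-scoring sequence of candidate words."""
    # `best` maps each candidate at the current position to (score, best path so far).
    best = {"<s>": (0.0, [])}
    for token in tokens:
        options = CANDIDATES.get(token, [token])  # unknown tokens pass through unchanged
        new_best = {}
        for word in options:
            # Keep only the best predecessor for this candidate.
            score, path = max(
                ((prev_score + bigram_logp(prev, word), prev_path)
                 for prev, (prev_score, prev_path) in best.items()),
                key=lambda item: item[0],
            )
            new_best[word] = (score, path + [word])
        best = new_best
    return max(best.values(), key=lambda item: item[0])[1]


print(restore_diacritics(["muj", "pes", "je", "maly"]))
```

Because each position keeps only the best path into every candidate, the search cost grows linearly with sentence length rather than with the number of possible candidate combinations.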
