Entity retrieval on Wikipedia in the scope of the gikiCLEF track
diploma thesis (DEFENDED)
Permanent link
http://hdl.handle.net/20.500.11956/23299
Identifiers
Study Information System: 62987
Collections
- Qualification theses [11264]
Author
Advisor
Referee
Žabokrtský, Zdeněk
Faculty / Institute
Faculty of Mathematics and Physics
Discipline
Computational Linguistics
Department
Institute of Formal and Applied Linguistics
Date of defense
14. 9. 2009
Publisher
Univerzita Karlova, Matematicko-fyzikální fakulta
Language
English
Grade
Very good
This thesis presents a system that retrieves entities specified by a question or description given in natural language; the description indicates the entity type and the properties that the entities must satisfy. This task is analogous to the one proposed in the GikiCLEF 2009 track. The system is fed with the 2008 Spanish Wikipedia collection, and every entity is represented by a Wikipedia page. We propose three novel query expansion methods for entity retrieval. We also introduce a novel method that employs the English YAGO and DBpedia semantic resources to determine the target named entity (NE) type; this method is used to improve on previous approaches in which the target NE type is based solely on Wikipedia categories. We show that our system obtains promising results when we evaluate its performance on the GikiCLEF 2009 topic list and compare the results with those of the other participants of the track.