Searching classes in the Wikidata ontology

Gora, Martin

Vyhledávání tříd v ontologii Wikidata

dc.contributor.advisor	Nečaský, Martin
dc.creator	Gora, Martin
dc.date.accessioned	2025-03-04T10:05:52Z
dc.date.available	2025-03-04T10:05:52Z
dc.date.issued	2025
dc.identifier.uri	http://hdl.handle.net/20.500.11956/197453
dc.description.abstract	Nástroj Dataspecer usnadňuje vytváření a správu abstraktních datových struktur pro reprezentaci a výměnu dat na webu pomocí integrace vstupních ontologií. Získat komplexní heterogenní ontologii však zůstává náročným úkolem. Tato studie odvodila ontologii s 830 tisíci třídami ze znalostního grafu Wikidata a následně analyzovala, navrhla, implementovala a vyhodnotila její integraci do nástroje Dataspecer, s hlavním zaměřením na vyhledávání tříd. Byly vyvinuty metody vyhledávání využívající kombinaci textových modelů, jejich interpolace a strategií řazení v rámci vícekrokového vyhledávacího procesu. Tyto přístupy byly vyhodnoceny na vytvořené testovací kolekci dat a dva optimální přístupy, upřednostňující interpolaci naučených hustých a řídkých vektorů, byly integrovány do nástroje. Integrace ontologie byla následně posouzena prostřednictvím dvou uživatelských studií. Výsledky potvrdily rychlost odezvy a relevanci vyhledávání, přičemž nedostatky v kritériích uživatelské přívětivosti naznačily oblasti pro budoucí zlepšení. Závěrem tato práce poskytuje poznatky pro budoucí výzkum vyhledávání tříd a opětovné využití rozsáhlých ontologií, zejména v kontextu Wikidat.	cs_CZ
dc.description.abstract	The Dataspecer tool facilitates the creation and management of abstract data structures to represent and exchange data on the Web by leveraging input ontologies. However, acquiring comprehensive heterogeneous ontologies remains challenging. This study derived an ontology of 830 thousand classes from Wikidata and analyzed, designed, implemented, and evaluated its integration into the Dataspecer tool, focusing particularly on class search. We devised retrieval methods leveraging a combination of text retrieval models, their interpolation, and re-ranker strategies in a multi-stage retrieval pipeline. The retrieval approaches were evaluated on a dataset developed for this work, and two optimal approaches, favouring interpolation of learned sparse and dense embeddings, were incorporated into the tool. The ontology integration was subsequently assessed through two user studies. The results confirmed the tool's responsiveness and retrieval performance, while deficiencies in ease-of-use criteria pointed to areas for future improvement. Lastly, this work offers insights for future research on class retrieval and the reuse of large-scale ontologies, particularly within the context of Wikidata.	en_US
dc.language	English	cs_CZ
dc.language.iso	en_US
dc.publisher	Univerzita Karlova, Matematicko-fyzikální fakulta	cs_CZ
dc.subject	wikidata|třídy|vyhledávání|ontologie|koncepty|vlastnosti|vektory	cs_CZ
dc.subject	wikidata|classes|search|ontologies|retrieval|reuse|concepts|properties|embeddings	en_US
dc.title	Searching classes in the Wikidata ontology	en_US
dc.type	diplomová práce	cs_CZ
dcterms.created	2025
dcterms.dateAccepted	2025-02-11
dc.description.department	Department of Software Engineering	en_US
dc.description.department	Katedra softwarového inženýrství	cs_CZ
dc.description.faculty	Faculty of Mathematics and Physics	en_US
dc.description.faculty	Matematicko-fyzikální fakulta	cs_CZ
dc.identifier.repId	268694
dc.title.translated	Vyhledávání tříd v ontologii Wikidata	cs_CZ
dc.contributor.referee	Kopecký, Michal
thesis.degree.name	Mgr.
thesis.degree.level	navazující magisterské	cs_CZ
thesis.degree.discipline	Computer Science - Software and Data Engineering	en_US
thesis.degree.discipline	Informatika - Softwarové a datové inženýrství	cs_CZ
thesis.degree.program	Computer Science - Software and Data Engineering	en_US
thesis.degree.program	Informatika - Softwarové a datové inženýrství	cs_CZ
uk.thesis.type	diplomová práce	cs_CZ
uk.taxonomy.organization-cs	Matematicko-fyzikální fakulta::Katedra softwarového inženýrství	cs_CZ
uk.taxonomy.organization-en	Faculty of Mathematics and Physics::Department of Software Engineering	en_US
uk.faculty-name.cs	Matematicko-fyzikální fakulta	cs_CZ
uk.faculty-name.en	Faculty of Mathematics and Physics	en_US
uk.faculty-abbr.cs	MFF	cs_CZ
uk.degree-discipline.cs	Informatika - Softwarové a datové inženýrství	cs_CZ
uk.degree-discipline.en	Computer Science - Software and Data Engineering	en_US
uk.degree-program.cs	Informatika - Softwarové a datové inženýrství	cs_CZ
uk.degree-program.en	Computer Science - Software and Data Engineering	en_US
thesis.grade.cs	Výborně	cs_CZ
thesis.grade.en	Excellent	en_US
uk.file-availability	V
uk.grantor	Univerzita Karlova, Matematicko-fyzikální fakulta, Katedra softwarového inženýrství	cs_CZ
thesis.grade.code	1
uk.publication-place	Praha	cs_CZ
uk.thesis.defenceStatus	O
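The abstract describes a multi-stage retrieval pipeline that interpolates learned sparse and dense scores and then applies a re-ranker. The sketch below illustrates that general idea only, under assumptions of our own that this record does not confirm. Toy hand-written vectors stand in for real model outputs. Scores are min-max normalized and mixed with a hypothetical weight `alpha`. A stub scoring function stands in for the re-ranker. The helper names (`hybrid_search`, `rerank`, and so on) are hypothetical, and the models, normalization, and weights actually used in the thesis are not stated here.

```python
import math
from typing import Callable, Dict, List, Tuple

SparseVec = Dict[str, float]          # learned sparse vector: term -> weight
Ranked = List[Tuple[str, float]]      # (class label, score), best first


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two dense vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def sparse_dot(q: SparseVec, d: SparseVec) -> float:
    """Dot product of two sparse vectors over their shared terms."""
    return sum(w * d.get(t, 0.0) for t, w in q.items())


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] so dense and sparse scores become comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {c: 0.0 for c in scores}
    return {c: (s - lo) / (hi - lo) for c, s in scores.items()}


def hybrid_search(q_dense: List[float], q_sparse: SparseVec,
                  classes: Dict[str, Tuple[List[float], SparseVec]],
                  alpha: float = 0.5, k: int = 10) -> Ranked:
    """Stage 1: linear interpolation of normalized dense and sparse scores."""
    dense = min_max({c: cosine(q_dense, v[0]) for c, v in classes.items()})
    sparse = min_max({c: sparse_dot(q_sparse, v[1]) for c, v in classes.items()})
    fused = {c: alpha * dense[c] + (1 - alpha) * sparse[c] for c in classes}
    return sorted(fused.items(), key=lambda cs: cs[1], reverse=True)[:k]


def rerank(candidates: Ranked, score_fn: Callable[[str], float]) -> Ranked:
    """Stage 2: re-score only the stage-1 candidates with a costlier scorer."""
    return sorted(((c, score_fn(c)) for c, _ in candidates),
                  key=lambda cs: cs[1], reverse=True)


# Toy index (hypothetical): hand-written vectors standing in for model outputs.
classes = {
    "human": ([0.9, 0.1, 0.0], {"human": 1.2, "person": 0.8}),
    "city": ([0.1, 0.9, 0.1], {"city": 1.5, "town": 0.6}),
    "university": ([0.2, 0.3, 0.9], {"university": 1.4, "school": 0.5}),
}
top = hybrid_search([0.8, 0.2, 0.1], {"person": 1.0}, classes, alpha=0.5, k=2)
print(top)  # "human" ranks first, "university" second

# Stub re-ranker (hypothetical): favours labels containing "univ".
reranked = rerank(top, lambda label: 1.0 if "univ" in label else 0.0)
print(reranked)
```

Min-max normalization puts both score types on a common [0, 1] scale before the weighted sum. In practice `alpha` would be tuned on an evaluation set. The re-ranker is applied only to the top-k candidates because it is assumed to be more expensive than first-stage scoring.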