Využití klastrovacích technik při monitorování inzerce

Dzetkulič, Tomáš

Clustering techniques for ads monitoring
Využití klastrovacích technik při monitorování inzerce

diplomová práce (OBHÁJENO)

Zobrazit/otevřít

Záznam o průběhu obhajoby (299.2Kb)

Trvalý odkaz

http://hdl.handle.net/20.500.11956/13261

Identifikátory

SIS: 47022

Oponent práce

Kára, Jan

Fakulta / součást

Matematicko-fyzikální fakulta

Obor

Diskrétní matematika a optimalizace

Katedra / ústav / klinika

Katedra aplikované matematiky

Datum obhajoby

11. 9. 2007

Nakladatel

Univerzita Karlova, Matematicko-fyzikální fakulta

Jazyk

Slovenština

Známka

Velmi dobře

Práca sa zaoberá možnosťami klastrovania inzercie so zameraním na realitnú inzerciu. V prvej časti práce definujeme čo to je klastrovanie, kde sa používa a aké sú typické požiadavky na klastrovacie algoritmy. Popíšeme existujúce klastrovacie metódy, ich vlastnosti a použitie. Posúdime ich vhodnosť pre oblasť inzercie a vyberieme najvhodnejší algoritmus pre klastrovanie rádovo miliónov inzerátov. V ďalšej časti detailne popíšeme interpretáciu inzerátu ako prvku vektorového priestoru s vysokou dimenziou a algoritmus klastrujúci prvky takéhoto vektorového priestoru založený na rodinách lokálnych hašovacích funkcií. Popíšeme jeho vlastnosti, časovú a pamäťovú zložitosť, jeho parametre a očakávané výsledky behu algoritmu. V implementačnej časti rozoberieme detaily implementácie v programovacom jazyku Java a navrhneme vhodné uloženie dát v relačnej databázi. V časti venovanej testom potom zhodnotíme výsledky behu algoritmu na reálnych dátach a porovnáme ich s očakávaným výstupom algoritmu. V závere práce posúdime možnosti ďalšieho rozšírenia použitej klastrovacej metódy.

Abstrakt (anglicky)

This thesis surveys possibilities of clustering of advertisements, especially those for real estates. It defines clustering itself, its usage and typical requirements for clustering algorithms. We provide list of existing clustering methods and approaches, their properties and suitable application. We consider possiblity of using them for clustering of milions of advertisements and based on that, we choose most suitable algorithm for this problem. We describe how to interpret advertisement as the point in multi dimensional vector space and this algorithm for clustering such points using locality of families of hash functions. We describe algorithm in detail, listing all of its parameters, estimating its complexity and expected results. In the following chapters we describe implementation of the algorithm in Java. We also describe database structure of underlying relational database. In the next chapter we present results of the algorithm based on real data and we compare the results with the expected results of the algorithm. In the end, we discuss possibilities for future extension of the clustering method.

Citace dokumentu

Metadata

Zobrazit celý záznam