Použití filtrovacích algoritmů ve shlukové analýze

Pacovský, Matěj

Use of filter algorithms in cluster analysis

bachelor thesis (DEFENDED)

Permanent link

http://hdl.handle.net/20.500.11956/45940

Identifiers

Study Information System: 107741

Referee

Novák, Petr

Faculty / Institute

Faculty of Mathematics and Physics

Discipline

Financial Mathematics

Department

Department of Probability and Mathematical Statistics

Date of defense

18. 6. 2012

Publisher

Univerzita Karlova, Matematicko-fyzikální fakulta

Language

Czech

Grade

Very good

Keywords (Czech)

Shluková analýza, algoritmus k-průměrů, filtrovací algoritmus, algoritmus x-průměrů

Keywords (English)

Cluster analysis, k-means algorithm, x-means algorithm, filtering algorithm

Abstract (Czech)

Práce je rozdělena do pěti kapitol. V prvních dvou kapitolách shrnuji sebrané poznatky o shlukové analýze dat, uvádím definice pojmů použitých v práci a popisuji algoritmus k-průměrů. Ve třetí kapitole se zabývám filtrovacím algoritmem, který využívá filtrovací heuristiku během průchodu MRKD-stromem a tím urychluje algoritmus k-průměrů. Ve čtvrté kapitole popisuji algoritmus x-průměrů, který využívá všechny dosud zmíněné poznatky. V páté kapitole testuji všechny algoritmy na uměle vytvořených datech a na reálných datech z fyziky, přitom se v některých případech odkazuji na program WEKA, v němž je algoritmus x-průměrů naimplementován. Algoritmy, o kterých pojednává tato práce, jsou určeny pro objekty popsané pouze kvantitativními proměnnými. Jsou také vhodné k použití na velké datové soubory. Na přiloženém CD uvádím implementaci algoritmů v jazyku Matlab.

Abstract (English)

The thesis is divided into five chapters. In the first two chapters I give an overview of cluster analysis, present definitions of the terms used in the thesis, and describe the k-means algorithm. The third chapter focuses on the filtering algorithm, which applies a filtering heuristic while traversing the MRKD-tree and thereby speeds up the k-means algorithm. The fourth chapter describes the x-means algorithm, which builds on all of the above-mentioned findings. In the fifth chapter I test all the algorithms on both artificial data and real data from physics; in some cases I refer to the WEKA program, in which the x-means algorithm is implemented. The algorithms discussed in this thesis are intended only for objects described by quantitative variables. They are also suitable for large datasets. The attached CD contains an implementation of the algorithms in Matlab.
