Dolování dat z příchozích zpráv elektronické pošty

Šebesta, Jan

Data mining from incoming e-mail messages

dc.contributor.advisor	Žemlička, Michal
dc.creator	Šebesta, Jan
dc.date.accessioned	2017-04-19T11:29:43Z
dc.date.available	2017-04-19T11:29:43Z
dc.date.issued	2009
dc.identifier.uri	http://hdl.handle.net/20.500.11956/21526
dc.description.abstract	V předložené práci studujeme možnosti automatického třídění příchozí emailové komunikace. Naším hlavním cílem je rozpoznání informací o nadcházejících workshopech a konferencích, nabídkách práce a vydávaných knihách. Snažíme se vyvinout nástroj, který informace vydoluje z dat získaných z oborových konferencí. Nabídky v konferencích přicházejí ve formě html, rtf, nebo prostého textu, ale informace v nich je zapsána v běžném jazyce. Text-miningovými metodami získáváme informace z běžného textu a ukládáme je ve strukturované formě, kterou je možné jednoduše strojově zpracovávat. Zkoumáme způsob zpracování pošty člověkem a následně tyto poznatky aplikujeme při tvorbě systému. V průběhu práce řešíme problémy se samotným získáním zpráv, rozpoznáním jazyka a kódování a rozpoznáním typu zprávy. Informace, kterou ze zprávy potřebujeme získat, se různí v závislosti na typu zprávy a události, které se týká. Teprve po rozpoznání nosné informace ve zprávě jsme schopni vydolovat data pro zjištěný typ události. Na závěr ukládáme získané znalosti do databáze, která umožňuje rychlou interakci s uživatelem.	cs_CZ
dc.description.abstract	In the present work we study the possibilities of automatic sorting of incoming email communication. Our primary goal is to recognize information about upcoming workshops and conferences, job offers and published books. We are trying to develop a tool that mines this information from data obtained from professional mailing lists. Offers in the mailing lists come in html, rtf or plain text format, but the information in them is written in ordinary natural language. We are developing the system so that it uses text-mining methods to extract the information and save it in structured form. Then we will be able to work with it. We examine how a user handles the mail and apply this knowledge in the development. We solve the problems of obtaining the messages, recognizing language and encoding, and estimating the type of message. Only after recognizing the key information in a message are we able to mine the data. In the end we save the mined information to a database, which allows us to display it in a well-arranged way and to sort and search it according to the user's needs.	en_US
dc.language	Čeština	cs_CZ
dc.language.iso	cs_CZ
dc.publisher	Univerzita Karlova, Matematicko-fyzikální fakulta	cs_CZ
dc.subject	e-mail	en_US
dc.subject	workshop	en_US
dc.subject	text-mining	en_US
dc.subject	automatization	en_US
dc.subject	extraction	en_US
dc.subject	e-mail	cs_CZ
dc.subject	workshop	cs_CZ
dc.subject	text-mining	cs_CZ
dc.subject	třídění	cs_CZ
dc.subject	automatizace	cs_CZ
dc.subject	parsování	cs_CZ
dc.title	Dolování dat z příchozích zpráv elektronické pošty	cs_CZ
dc.type	diplomová práce	cs_CZ
dcterms.created	2009
dcterms.dateAccepted	2009-09-07
dc.description.department	Department of Software Engineering	en_US
dc.description.department	Katedra softwarového inženýrství	cs_CZ
dc.description.faculty	Faculty of Mathematics and Physics	en_US
dc.description.faculty	Matematicko-fyzikální fakulta	cs_CZ
dc.identifier.repId	136490
dc.title.translated	Data mining from incoming e-mail messages	en_US
dc.contributor.referee	Hnětynka, Petr
dc.identifier.aleph	001578395
thesis.degree.name	Mgr.
thesis.degree.level	navazující magisterské	cs_CZ
thesis.degree.discipline	Softwarové systémy	cs_CZ
thesis.degree.discipline	Software Systems	en_US
thesis.degree.program	Informatika	cs_CZ
thesis.degree.program	Computer Science	en_US
uk.thesis.type	diplomová práce	cs_CZ
uk.taxonomy.organization-cs	Matematicko-fyzikální fakulta::Katedra softwarového inženýrství	cs_CZ
uk.taxonomy.organization-en	Faculty of Mathematics and Physics::Department of Software Engineering	en_US
uk.faculty-name.cs	Matematicko-fyzikální fakulta	cs_CZ
uk.faculty-name.en	Faculty of Mathematics and Physics	en_US
uk.faculty-abbr.cs	MFF	cs_CZ
uk.degree-discipline.cs	Softwarové systémy	cs_CZ
uk.degree-discipline.en	Software Systems	en_US
uk.degree-program.cs	Informatika	cs_CZ
uk.degree-program.en	Computer Science	en_US
thesis.grade.cs	Neprospěl	cs_CZ
thesis.grade.en	Fail	en_US
uk.abstract.cs	V předložené práci studujeme možnosti automatického třídění příchozí emailové komunikace. Naším hlavním cílem je rozpoznání informací o nadcházejících workshopech a konferencích, nabídkách práce a vydávaných knihách. Snažíme se vyvinout nástroj, který informace vydoluje z dat získaných z oborových konferencí. Nabídky v konferencích přicházejí ve formě html, rtf, nebo prostého textu, ale informace v nich je zapsána v běžném jazyce. Text-miningovými metodami získáváme informace z běžného textu a ukládáme je ve strukturované formě, kterou je možné jednoduše strojově zpracovávat. Zkoumáme způsob zpracování pošty člověkem a následně tyto poznatky aplikujeme při tvorbě systému. V průběhu práce řešíme problémy se samotným získáním zpráv, rozpoznáním jazyka a kódování a rozpoznáním typu zprávy. Informace, kterou ze zprávy potřebujeme získat, se různí v závislosti na typu zprávy a události, které se týká. Teprve po rozpoznání nosné informace ve zprávě jsme schopni vydolovat data pro zjištěný typ události. Na závěr ukládáme získané znalosti do databáze, která umožňuje rychlou interakci s uživatelem.	cs_CZ
uk.abstract.en	In the present work we study the possibilities of automatic sorting of incoming email communication. Our primary goal is to recognize information about upcoming workshops and conferences, job offers and published books. We are trying to develop a tool that mines this information from data obtained from professional mailing lists. Offers in the mailing lists come in html, rtf or plain text format, but the information in them is written in ordinary natural language. We are developing the system so that it uses text-mining methods to extract the information and save it in structured form. Then we will be able to work with it. We examine how a user handles the mail and apply this knowledge in the development. We solve the problems of obtaining the messages, recognizing language and encoding, and estimating the type of message. Only after recognizing the key information in a message are we able to mine the data. In the end we save the mined information to a database, which allows us to display it in a well-arranged way and to sort and search it according to the user's needs.	en_US
uk.file-availability	V
uk.publication.place	Praha	cs_CZ
uk.grantor	Univerzita Karlova, Matematicko-fyzikální fakulta, Katedra softwarového inženýrství	cs_CZ
dc.identifier.lisID	990015783950106986