Processing of Incorrect XML Data

Svoboda, Martin

Zpracování nekorektních XML dat

dc.contributor.advisor	Holubová, Irena
dc.creator	Svoboda, Martin
dc.date.accessioned	2017-04-27T03:23:10Z
dc.date.available	2017-04-27T03:23:10Z
dc.date.issued	2010
dc.identifier.uri	http://hdl.handle.net/20.500.11956/33985
dc.description.abstract	XML dokumenty a technologie reprezentují široce akceptovaný standard pro správu a výměnu semistrukturovaných dat. Překvapivě vysoký počet XML dokumentů však obsahuje chyby dobré formovanosti, strukturální validity nebo nekonzistence dat. Cílem této práce je analýza existujících přístupů vedoucí k návrhu nového korekčního systému. Představený model zahrnuje opravy elementů a atributů vůči jednotypovým stromovým gramatikám. Průchodem stavového prostoru automatu na rozpoznávání regulárních výrazů jsme vždy schopni nalézt všechny minimální opravy. Tyto opravy jsou kompaktně reprezentovány rekurzivními multigrafy, které se dají přeložit do konkrétních sekvencí editačních operací modifikujících datové stromy. Navrženy byly čtyři konkrétní algoritmy doplněné o prototypovou implementaci a experimentální výsledky. Nejvíce efektivní algoritmus heuristicky sleduje pouze perspektivní směry oprav a brání jakýmkoli opakovaným výpočtům.	cs_CZ
dc.description.abstract	XML documents and related technologies represent a widely accepted standard for managing and exchanging semi-structured data. However, a surprisingly high number of XML documents are affected by well-formedness errors, structural invalidity or data inconsistencies. The aim of this thesis is to analyse existing approaches and, building on that analysis, to propose a new correction framework. The introduced model involves repairs of elements and attributes with respect to single-type tree grammars. By inspecting the state space of an automaton recognising regular expressions, we are always able to find all minimal repairs. These repairs are compactly represented by recursively nested multigraphs, which can be translated into particular sequences of edit operations altering data trees. We propose four particular algorithms and provide a prototype implementation supplemented with experimental results. The most efficient algorithm heuristically follows only promising repair directions and avoids repeated computations by means of a caching mechanism.	en_US
dc.language	English	cs_CZ
dc.language.iso	en_US
dc.publisher	Univerzita Karlova, Matematicko-fyzikální fakulta	cs_CZ
dc.subject	XML	cs_CZ
dc.subject	validita	cs_CZ
dc.subject	opravy	cs_CZ
dc.subject	XML	en_US
dc.subject	validity	en_US
dc.subject	corrections	en_US
dc.title	Processing of Incorrect XML Data	en_US
dc.type	diplomová práce	cs_CZ
dcterms.created	2010
dcterms.dateAccepted	2010-09-06
dc.description.department	Department of Software Engineering	en_US
dc.description.department	Katedra softwarového inženýrství	cs_CZ
dc.description.faculty	Faculty of Mathematics and Physics	en_US
dc.description.faculty	Matematicko-fyzikální fakulta	cs_CZ
dc.identifier.repId	65247
dc.title.translated	Zpracování nekorektních XML dat	cs_CZ
dc.contributor.referee	Nečaský, Martin
dc.identifier.aleph	001389682
thesis.degree.name	Mgr.
thesis.degree.level	navazující magisterské	cs_CZ
thesis.degree.discipline	Software Systems	en_US
thesis.degree.discipline	Softwarové systémy	cs_CZ
thesis.degree.program	Computer Science	en_US
thesis.degree.program	Informatika	cs_CZ
uk.thesis.type	diplomová práce	cs_CZ
uk.taxonomy.organization-cs	Matematicko-fyzikální fakulta::Katedra softwarového inženýrství	cs_CZ
uk.taxonomy.organization-en	Faculty of Mathematics and Physics::Department of Software Engineering	en_US
uk.faculty-name.cs	Matematicko-fyzikální fakulta	cs_CZ
uk.faculty-name.en	Faculty of Mathematics and Physics	en_US
uk.faculty-abbr.cs	MFF	cs_CZ
uk.degree-discipline.cs	Softwarové systémy	cs_CZ
uk.degree-discipline.en	Software Systems	en_US
uk.degree-program.cs	Informatika	cs_CZ
uk.degree-program.en	Computer Science	en_US
thesis.grade.cs	Výborně	cs_CZ
thesis.grade.en	Excellent	en_US
uk.abstract.cs	XML dokumenty a technologie reprezentují široce akceptovaný standard pro správu a výměnu semistrukturovaných dat. Překvapivě vysoký počet XML dokumentů však obsahuje chyby dobré formovanosti, strukturální validity nebo nekonzistence dat. Cílem této práce je analýza existujících přístupů vedoucí k návrhu nového korekčního systému. Představený model zahrnuje opravy elementů a atributů vůči jednotypovým stromovým gramatikám. Průchodem stavového prostoru automatu na rozpoznávání regulárních výrazů jsme vždy schopni nalézt všechny minimální opravy. Tyto opravy jsou kompaktně reprezentovány rekurzivními multigrafy, které se dají přeložit do konkrétních sekvencí editačních operací modifikujících datové stromy. Navrženy byly čtyři konkrétní algoritmy doplněné o prototypovou implementaci a experimentální výsledky. Nejvíce efektivní algoritmus heuristicky sleduje pouze perspektivní směry oprav a brání jakýmkoli opakovaným výpočtům.	cs_CZ
uk.abstract.en	XML documents and related technologies represent a widely accepted standard for managing and exchanging semi-structured data. However, a surprisingly high number of XML documents are affected by well-formedness errors, structural invalidity or data inconsistencies. The aim of this thesis is to analyse existing approaches and, building on that analysis, to propose a new correction framework. The introduced model involves repairs of elements and attributes with respect to single-type tree grammars. By inspecting the state space of an automaton recognising regular expressions, we are always able to find all minimal repairs. These repairs are compactly represented by recursively nested multigraphs, which can be translated into particular sequences of edit operations altering data trees. We propose four particular algorithms and provide a prototype implementation supplemented with experimental results. The most efficient algorithm heuristically follows only promising repair directions and avoids repeated computations by means of a caching mechanism.	en_US
uk.file-availability	V
uk.publication.place	Praha	cs_CZ
uk.grantor	Univerzita Karlova, Matematicko-fyzikální fakulta, Katedra softwarového inženýrství	cs_CZ
dc.identifier.lisID	990013896820106986