XML-SIM-CHANGE: Structure and Content Semantic Similarity Detection among XML Document Versions
Abstract
XML documents from different sources may represent the same or similar information with respect to content and structure. Being able to integrate similar XML documents is important to query systems and search engines. However, information changes periodically; it is therefore important to detect the changes among different versions of an XML document and to use the changed information to discover semantic similarity among XML documents. In this paper, we introduce an approach that detects XML similarity by using a change detection mechanism to join XML document versions. In our approach, keys in subtrees play an important role in avoiding unnecessary comparisons of subtrees across different versions of the same XML document. We use a relational database to store XML versions and apply SQL to detect similarities. We show that our approach is highly scalable, is more efficient in terms of execution time, and provides comparable result quality. © 2010 Springer-Verlag.
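The key-based join mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the table layout, the function names (make_store, load_version, subtree_similarity), the sample documents, and the scoring rule (fraction of identical leaf values) are all assumptions made for illustration. It only shows the general idea of storing XML versions in a relational table and letting an SQL join on the subtree key restrict comparisons to subtrees that share a key.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample versions; <isbn> is assumed to be the subtree key.
OLD_XML = ("<db><book><isbn>1</isbn><title>XML Basics</title><year>2009</year></book>"
           "<book><isbn>2</isbn><title>SQL Notes</title><year>2008</year></book></db>")
NEW_XML = ("<db><book><isbn>1</isbn><title>XML Basics</title><year>2010</year></book>"
           "<book><isbn>2</isbn><title>SQL Notes</title><year>2008</year></book>"
           "<book><isbn>3</isbn><title>New Book</title><year>2010</year></book></db>")


def make_store():
    """Create an in-memory table with one row per leaf element."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE leaf (version INTEGER, key TEXT, path TEXT, value TEXT)")
    return conn


def load_version(conn, version, xml_text, subtree_tag, key_tag):
    """Shred one XML version into (version, key, path, value) rows."""
    root = ET.fromstring(xml_text)
    for subtree in root.iter(subtree_tag):
        key = subtree.findtext(key_tag)
        if key is None:
            continue  # this sketch skips subtrees that carry no key
        for leaf in subtree:
            conn.execute(
                "INSERT INTO leaf VALUES (?, ?, ?, ?)",
                (version, key, leaf.tag, (leaf.text or "").strip()),
            )


def subtree_similarity(conn, old, new):
    """Join two versions on the subtree key; score = share of equal leaves.

    Only subtrees whose key occurs in both versions are compared, so
    subtrees with non-matching keys are never paired. Assumes each leaf
    tag occurs at most once per subtree.
    """
    sql = """
        SELECT a.key, SUM(a.value = b.value) AS same, COUNT(*) AS total
        FROM leaf AS a
        JOIN leaf AS b ON a.key = b.key AND a.path = b.path
        WHERE a.version = ? AND b.version = ?
        GROUP BY a.key
        ORDER BY a.key
    """
    return {key: same / total for key, same, total in conn.execute(sql, (old, new))}


if __name__ == "__main__":
    conn = make_store()
    load_version(conn, 1, OLD_XML, "book", "isbn")
    load_version(conn, 2, NEW_XML, "book", "isbn")
    print(subtree_similarity(conn, 1, 2))
```

In this sketch the book with key 3 exists only in the newer version, so the join never pairs it with anything. A fuller treatment would report such subtrees separately as insertions or deletions.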
Recommended Citation
W. Viyanon and S. K. Madria, "XML-SIM-CHANGE: Structure and Content Semantic Similarity Detection among XML Document Versions," Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 6427 LNCS, no. PART 2, pp. 1061 - 1078, Springer, Dec 2010.
The definitive version is available at https://doi.org/10.1007/978-3-642-16949-6_29
Department(s)
Computer Science
Keywords and Phrases
Change Detection; Join; Keys; XML Similarity
International Standard Book Number (ISBN)
978-364216948-9
International Standard Serial Number (ISSN)
1611-3349; 0302-9743
Document Type
Article - Conference proceedings
Document Version
Citation
File Type
text
Language(s)
English
Rights
© 2024 Springer, All rights reserved.
Publication Date
16 Dec 2010