A Change Detection System for Unordered XML Data using a Relational Model
Abstract
The dramatic increase in evolving XML data available on the Internet requires a change detection system to keep track of important changes occurring during its lifetime. In this paper, we introduce a novel approach to detecting changes between two versions of unordered XML data stored in a traditional relational database using approaches like XRel. Most existing work in the area of XML change detection focuses on detecting changes between two versions of XML data by constructing their Document Object Model (DOM) trees and then comparing these two tree structures based on Longest Common Subsequence (LCS) using minimum edit distances. The basic tree comparison approach is not efficient in handling large XML files because (1) an equivalent XML DOM tree will be twice as large as the original document and (2) the entire trees of both versions have to be memory resident during the comparison process. Both issues are constrained by the available main memory. In addition, existing approaches fail to detect changes among versions of XML data stored in relational databases, as the reverse mapping is not lossless. We propose an efficient algorithm (XRel-Change-SQL) for detecting unordered changes between two XML data files stored in XRel as the underlying relational data model, using Structured Query Language (SQL). We compare the efficiency and quality of our change detection algorithm with existing XML change detection tools such as X-Diff, DeltaXML and XANDY. We provide an experimental evaluation of the results obtained from benchmark datasets as well as some synthetic datasets to show that our approach is highly scalable and achieves much better efficiency and delta quality than the aforementioned approaches and tools. © 2011 Elsevier B.V. All rights reserved.
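The general idea of comparing two relationally stored XML versions with SQL, rather than with in-memory trees, can be illustrated by a minimal sketch. This is not the XRel-Change-SQL algorithm from the paper: the single table node(doc, path, value), the functions shred and diff, and the use of SQLite are all assumptions made for illustration, and the real XRel schema is richer than this. The sketch only reports text values that appear under a path in one version and not in the other; it ignores attributes, treats duplicates as a set, and does not classify updates or moves.

```python
import sqlite3
import xml.etree.ElementTree as ET


def shred(conn, doc_id, xml_text):
    """Store each text-bearing element as a (doc, path, value) row.

    Hypothetical, simplified stand-in for an XRel-style path table.
    """
    root = ET.fromstring(xml_text)

    def walk(elem, prefix):
        path = f"{prefix}/{elem.tag}"
        text = (elem.text or "").strip()
        if text:
            conn.execute(
                "INSERT INTO node VALUES (?, ?, ?)", (doc_id, path, text)
            )
        for child in elem:
            walk(child, path)

    walk(root, "")


def diff(old_xml, new_xml):
    """Return (deleted, inserted) lists of (path, value) pairs.

    Sibling order never enters the comparison, so reordering
    children is not reported as a change.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE node (doc TEXT, path TEXT, value TEXT)")
    shred(conn, "old", old_xml)
    shred(conn, "new", new_xml)
    # Set difference on (path, value): rows in one version only.
    query = """
        SELECT path, value FROM node WHERE doc = ?
        EXCEPT
        SELECT path, value FROM node WHERE doc = ?
        ORDER BY path, value
    """
    deleted = conn.execute(query, ("old", "new")).fetchall()
    inserted = conn.execute(query, ("new", "old")).fetchall()
    conn.close()
    return deleted, inserted


if __name__ == "__main__":
    old = "<a><b>1</b><c>2</c></a>"
    new = "<a><c>2</c><b>3</b></a>"
    print(diff(old, new))
```

Because the comparison runs inside the database engine, neither document has to be held in memory as a full tree once it has been shredded, which is the motivation the abstract gives for a relational approach.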
Recommended Citation
S. Sundaram and S. K. Madria, "A Change Detection System for Unordered XML Data using a Relational Model," Data and Knowledge Engineering, vol. 72, pp. 257 - 284, Elsevier, Feb 2012.
The definitive version is available at https://doi.org/10.1016/j.datak.2011.11.003
Department(s)
Computer Science
Keywords and Phrases
Change detection; Edit distance; SQL; Tree comparison; XML
International Standard Serial Number (ISSN)
0169-023X
Document Type
Article - Journal
Document Version
Citation
File Type
text
Language(s)
English
Rights
© 2024 Elsevier, All rights reserved.
Publication Date
01 Feb 2012
Comments
National Science Foundation, Grant None