XML Data Integration based on Content and Structure Similarity using Keys

Abstract

This paper proposes a technique for approximately matching XML data based on content and structure by detecting the similarity of subtrees that are clustered semantically using leaf-node parents. Each leaf-node parent is treated as the root of a subtree, which is then recursively traversed bottom-up for matching. First, we take advantage of the "key" for matching subtrees, which dramatically reduces the number of comparisons. Second, we measure the degree of similarity based on the data and structures of the two XML documents. The results show that our approach finds much more accurate matches with or without the presence of keys in XML subtrees. Other approaches experience problems with similarity matching thresholds, as they either ignore the available semantic information or have difficulty handling complex XML data. © 2008 Springer Berlin Heidelberg.
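The following is a minimal illustrative sketch of the general idea outlined in the abstract, not the authors' algorithm. It assumes a single level of leaf-node parents (the recursive bottom-up traversal is omitted), an equal-weight average of content and structure scores, and a fixed threshold; the function names, the `key_tag` parameter, the threshold value, and the sample documents are all hypothetical.

```python
import xml.etree.ElementTree as ET
from difflib import SequenceMatcher


def leaf_parent_subtrees(root):
    """Return every element whose children are all leaves (a leaf-node parent)."""
    return [
        el for el in root.iter()
        if len(el) > 0 and all(len(child) == 0 for child in el)
    ]


def leaf_map(subtree):
    """Map each leaf tag of a subtree to its stripped text content."""
    return {child.tag: (child.text or "").strip() for child in subtree}


def similarity(a, b):
    """Assumed measure: mean of content similarity and structure similarity."""
    la, lb = leaf_map(a), leaf_map(b)
    shared = set(la) & set(lb)
    all_tags = set(la) | set(lb)
    # Structure: fraction of leaf tags that the two subtrees have in common.
    structure = len(shared) / len(all_tags) if all_tags else 0.0
    # Content: average string similarity over the shared leaf tags.
    content = (
        sum(SequenceMatcher(None, la[t], lb[t]).ratio() for t in shared) / len(shared)
        if shared else 0.0
    )
    return (content + structure) / 2


def match_subtrees(root_a, root_b, key_tag=None, threshold=0.7):
    """Pair each subtree of A with its best match in B, if above the threshold."""
    subs_a = leaf_parent_subtrees(root_a)
    subs_b = leaf_parent_subtrees(root_b)
    index = {}
    if key_tag is not None:
        # With a key, only subtrees sharing the same key value are compared.
        for sb in subs_b:
            value = leaf_map(sb).get(key_tag)
            if value:
                index.setdefault(value, []).append(sb)
    matches = []
    for sa in subs_a:
        if key_tag is not None:
            candidates = index.get(leaf_map(sa).get(key_tag), [])
        else:
            candidates = subs_b  # no key: fall back to all-pairs comparison
        scored = [(similarity(sa, sb), sb) for sb in candidates]
        if scored:
            score, best = max(scored, key=lambda pair: pair[0])
            if score >= threshold:
                matches.append((sa, best, score))
    return matches


# Hypothetical sample documents, used only to exercise the sketch.
DOC_A = """<bib>
  <book><isbn>111</isbn><title>XML Basics</title><year>2001</year></book>
  <book><isbn>222</isbn><title>Data Integration</title><year>2005</year></book>
</bib>"""
DOC_B = """<library>
  <item><isbn>222</isbn><title>Data Integration</title><price>30</price></item>
  <item><isbn>333</isbn><title>Graph Theory</title><price>25</price></item>
</library>"""

keyed = match_subtrees(ET.fromstring(DOC_A), ET.fromstring(DOC_B), key_tag="isbn")
unkeyed = match_subtrees(ET.fromstring(DOC_A), ET.fromstring(DOC_B))
```

In this sketch the keyed run compares a subtree only against candidates that share its key value, whereas the unkeyed run scores every pair, which is one way to picture how a key can cut down the number of comparisons.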

Recommended Citation

W. Viyanon et al., "XML Data Integration based on Content and Structure Similarity using Keys," Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 5331 LNCS, no. PART 1, pp. 484-493, Springer, Dec 2008.

The definitive version is available at https://doi.org/10.1007/978-3-540-88871-0_35

Department(s)

Computer Science

Keywords and Phrases

Clustering; Keys; Similarity; XML

International Standard Book Number (ISBN)

978-3-540-88870-3

International Standard Serial Number (ISSN)

1611-3349; 0302-9743

Document Type

Article - Conference proceedings

Document Version

Citation

File Type

text

Language(s)

English

Publication Date

31 Dec 2008

Computer Science Faculty Research & Creative Works
