Geosciences and Geological and Petroleum Engineering Faculty Research & Creative Works

A New Domain-Independent Field Matching Algorithm for Large Databases

Mingzhen Wei, Missouri University of Science and Technology
Andrew H. Sung
Martha E. Cather

Abstract

In large databases, string-valued attributes are very important because of their entity-identifying and descriptive roles. For various reasons, the name of an entity may be presented in several different ways, as in "New Mexico Tech" and "NMT" for the New Mexico Institute of Mining and Technology in Socorro, New Mexico, USA. The task of field matching is to determine whether two syntactically different values are alternative representations of the same semantic entity. The field matching problem is recognized as important, even though little research has been done on field matching algorithms. In this paper, a new domain-independent, token-based field matching algorithm is proposed and tested. The new algorithm achieves high string matching accuracy and efficiency by introducing the concept of string matching points and defining proper string matching patterns. A new general string matching framework enables practical algorithms to be developed easily according to the characteristics of the problem and the data.
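The abstract does not define string matching points or the authors' string matching patterns, so the following is not the paper's algorithm. It is a minimal sketch of a generic token-based field matcher, assuming three hypothetical rules chosen only to reproduce the "New Mexico Tech" / "NMT" example: stop-word removal, prefix matching between tokens, and acronym detection. The names `tokenize`, `tokens_match`, `is_acronym`, `field_match_score`, and `STOP_WORDS` are illustrative inventions.

```python
from __future__ import annotations

import re

# Hypothetical stop-word list; not taken from the paper.
STOP_WORDS = {"of", "and", "the", "in", "for", "at"}


def tokenize(value: str) -> list[str]:
    """Lower-case the value, split it into alphanumeric tokens, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", value.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def tokens_match(a: str, b: str) -> bool:
    """Hypothetical rule: tokens match if equal or if one is a prefix of the other
    (e.g. 'tech' / 'technology')."""
    return a == b or a.startswith(b) or b.startswith(a)


def is_acronym(short: list[str], long: list[str]) -> bool:
    """Hypothetical rule: a single token equal to the initials of a multi-token value."""
    return len(short) == 1 and len(long) > 1 and short[0] == "".join(t[0] for t in long)


def field_match_score(x: str, y: str) -> float:
    """Return the fraction of tokens in the shorter value that align with tokens
    in the longer value (1.0 for an acronym match, 0.0 if either value is empty)."""
    tx, ty = tokenize(x), tokenize(y)
    if not tx or not ty:
        return 0.0
    if is_acronym(tx, ty) or is_acronym(ty, tx):
        return 1.0
    short, long_ = (tx, ty) if len(tx) <= len(ty) else (ty, tx)
    unused = list(long_)
    matched = 0
    # Greedy one-to-one alignment: each token of the shorter value
    # consumes at most one token of the longer value.
    for tok in short:
        for i, cand in enumerate(unused):
            if tokens_match(tok, cand):
                matched += 1
                del unused[i]
                break
    return matched / len(short)


print(field_match_score("New Mexico Tech", "NMT"))
print(field_match_score("New Mexico Tech", "New Mexico Institute of Mining and Technology"))
print(field_match_score("Socorro", "New Mexico Tech"))
```

Under these assumed rules, the first two pairs score 1.0 and the third scores 0.0. Normalizing by the shorter value's token count is a deliberately lenient choice: it lets an abbreviated name match its full form, but it also lets a one-word value such as "New" fully match "New Mexico Tech", so a practical matcher would need additional patterns or a threshold tuned to the data.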

Recommended Citation

M. Wei et al., "A New Domain-Independent Field Matching Algorithm for Large Databases," Proceedings of the International Conference on Data Mining (2005, Las Vegas, NV), pp. 126 - 131, Jun 2005.

Meeting Name

International Conference on Data Mining (2005: Jun. 20-23, Las Vegas, NV)

Department(s)

Geosciences and Geological and Petroleum Engineering

Keywords and Phrases

Data Cleaning; Domain Independent; Field Matching Algorithm; String Matching Patterns; Information Management; Information Theory

International Standard Book Number (ISBN)

978-1932415797

Document Type

Article - Conference proceedings

Document Version

Citation

File Type

text

Language(s)

English

Publication Date

01 Jun 2005

This document is currently not available here.
