A Methodology to Discover Contaminated Data in Spatial Databases
In this information age, the problem of data quality deserves more attention than it receives. Contaminated, or corrupted, data are values in a data set that were misunderstood or misrecorded. The success of the petroleum industry relies on a combination of reliable multi-dimensional information and proper cross-disciplinary techniques. Contaminated data can mislead strategic-level decision making and cause operational failures. This paper presents a methodology for identifying contaminated data in large spatial data sets within the data quality control cycle. Both graphical and quantitative spatial outlier detection methods are discussed. These methods differ in both their mechanisms and their results when identifying spatially abnormal data, so the paper compares their advantages and disadvantages. Their effectiveness is evaluated on a real spatial database, the Produced Water Chemistry Database (PWCD) in New Mexico.
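The abstract does not say which quantitative tests the paper applies, so the following is only a rough illustration of the general idea of a quantitative spatial outlier test, not the authors' method. It sketches a neighborhood-difference z-score: each record's value is compared with the mean of its k nearest spatial neighbors, and records whose standardized difference exceeds a threshold are flagged. The function names (`knn_indices`, `spatial_outliers`), the parameters `k` and `threshold`, and the toy grid are all assumptions made for this example; none of the data comes from the PWCD.

```python
import math
import statistics


def knn_indices(points, i, k):
    """Return indices of the k points nearest to points[i], excluding i itself."""
    xi, yi = points[i]
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.hypot(points[j][0] - xi, points[j][1] - yi))
    return others[:k]


def spatial_outliers(points, values, k=4, threshold=2.0):
    """Flag records whose value differs sharply from that of their neighbors.

    points: list of (x, y) coordinates; values: one attribute value per point.
    k and threshold are illustrative defaults, not values from the paper.
    """
    if len(points) != len(values) or len(points) < 2:
        raise ValueError("need at least two points, one value per point")
    # Difference between each value and the mean of its k nearest neighbors.
    diffs = []
    for i in range(len(points)):
        nbr_values = [values[j] for j in knn_indices(points, i, k)]
        diffs.append(values[i] - statistics.fmean(nbr_values))
    # Standardize the differences over the whole data set.
    mu = statistics.fmean(diffs)
    sigma = statistics.pstdev(diffs)
    if sigma == 0:
        return []  # every record agrees with its neighborhood equally well
    return [i for i, d in enumerate(diffs) if abs((d - mu) / sigma) > threshold]


# Synthetic 5x5 grid with a smooth trend (value = x + y) ...
points = [(x, y) for x in range(5) for y in range(5)]
values = [float(x + y) for x, y in points]
# ... and one deliberately misrecorded value at the grid center.
values[12] = 100.0

flagged = spatial_outliers(points, values)
print(flagged)  # → [12]
```

A flagged record is a candidate for review rather than proof of contamination. Note also that a single extreme value shifts the neighborhood means of the records around it, so results from a test of this kind are sensitive to the choice of `k` and `threshold`.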
M. Wei et al., "A Methodology to Discover Contaminated Data in Spatial Databases," Proceedings of the SPE Annual Technical Conference and Exhibition (2004, Houston, TX), pp. 2063-2070, Society of Petroleum Engineers (SPE), Sep 2004.
The definitive version is available at http://dx.doi.org/10.2118/90267-MS
SPE Annual Technical Conference and Exhibition (2004: Sep. 26-29, Houston, TX)
Geosciences and Geological and Petroleum Engineering
Keywords and Phrases
Data Quality; Data Sets; Feedback-Control Systems (FCS); Produced Water Chemistry Database (PWCD); Algorithms; Computational Methods; Contamination; Control Systems; Data Reduction; Decision Making; Functions; Petroleum Industry; Quality Control; Database Systems
Article - Conference proceedings
© 2004 Society of Petroleum Engineers (SPE), All rights reserved.