A Methodology to Discover Contaminated Data in Spatial Databases


In this information age, the problem of data quality receives less attention than it deserves. Contaminated or corrupted data are data that were misunderstood or misrecorded in data sets. The success of the petroleum industry relies upon a combination of reliable multi-dimensional information and proper cross-disciplinary techniques. Contaminated data can mislead strategic-level decision making and cause operational failures. This paper presents a methodology for identifying contaminated data in large spatial data sets as part of the data quality control cycle. Graphical and quantitative spatial outlier detection methods are discussed. These methods differ in how they identify spatially abnormal data, both in mechanism and in results. This paper discusses these methods and compares their advantages and disadvantages. Their effectiveness is evaluated using a real spatial database, the Produced Water Chemistry Database (PWCD) of New Mexico.
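To illustrate the general idea behind quantitative spatial outlier detection (not necessarily the specific tests used in the paper), the sketch below flags points whose attribute value deviates strongly from the average of their nearest spatial neighbors; the function name, the k-nearest-neighbor scheme, and the z-score threshold are illustrative assumptions.

```python
import math

def spatial_outliers(points, k=3, threshold=2.0):
    """Flag suspected spatial outliers.

    A point is suspicious when the difference between its attribute value
    and the mean value of its k nearest spatial neighbours is more than
    `threshold` standard deviations from the mean of such differences.

    points: list of (x, y, value) tuples.
    Returns the indices of suspected outliers.
    """
    diffs = []
    for i, (xi, yi, vi) in enumerate(points):
        # k nearest neighbours by Euclidean distance (excluding the point itself)
        neigh = sorted(
            (math.hypot(xi - xj, yi - yj), vj)
            for j, (xj, yj, vj) in enumerate(points)
            if j != i
        )[:k]
        avg = sum(v for _, v in neigh) / len(neigh)
        diffs.append(vi - avg)

    mean = sum(diffs) / len(diffs)
    std = math.sqrt(sum((d - mean) ** 2 for d in diffs) / len(diffs))
    return [
        i for i, d in enumerate(diffs)
        if std > 0 and abs(d - mean) / std > threshold
    ]
```

For example, on a uniform grid of measurements, a single point whose value is far from its neighbours' values would be flagged, even if that value is not extreme for the data set as a whole; this is what distinguishes a spatial outlier from a global one.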

Meeting Name

SPE Annual Technical Conference and Exhibition (2004: Sep. 26-29, Houston, TX)


Department(s)

Geosciences and Geological and Petroleum Engineering

Keywords and Phrases

Data Quality; Data Sets; Feedback-Control Systems (FCS); Produced Water Chemistry Database (PWCD); Algorithms; Computational Methods; Contamination; Control Systems; Data Reduction; Decision Making; Functions; Petroleum Industry; Quality Control; Database Systems

Document Type

Article - Conference proceedings


© 2004 Society of Petroleum Engineers (SPE), All rights reserved.

Publication Date

01 Sep 2004