Dealing with Dangerous Data: Part-Whole Validation for Low Incident, High Risk Data
Abstract
In certain situations, syntactically valid, but incorrect, data entered into a database can result in near-immediate, catastrophic financial losses for an organization. Examples include: omitting zeros in prices of goods on e-commerce sites; and financial fraud where data is directly entered into databases, bypassing application-level financial checks. Such "dangerous data" can, and should, be detected, because it deviates substantially from the statistical properties of existing data. Detection of this kind of problem requires comparing individual data items to a large amount of existing data in the database at run-time. Furthermore, the identification of errors is probabilistic, rather than deterministic, in nature. This research proposes part-whole validation as an approach to addressing the dangerous data situation. Part-whole validation addresses fundamental issues in database management, for example, integrity maintenance. Illustrative and representative examples are first defined, and analyzed. Then, an architecture for part-whole validation is presented and implemented in a prototype to illustrate the feasibility of the research.
Recommended Citation
Chua, C. E., & Storey, V. C. (2016). Dealing with Dangerous Data: Part-Whole Validation for Low Incident, High Risk Data. Journal of Database Management, 27(1), pp. 29-57. IGI Global.
The definitive version is available at https://doi.org/10.4018/JDM.2016010102
Department(s)
Business and Information Technology
Keywords and Phrases
Audit; Boyce-Codd normal form; Business rules; Dangerous data; Data management; Data quality; Database design; Part-whole validation; Relational databases
International Standard Serial Number (ISSN)
1063-8016; 1533-8010
Document Type
Article - Journal
Document Version
Citation
File Type
text
Language(s)
English
Rights
© 2016 IGI Global, All rights reserved.
Publication Date
01 Mar 2016