Abstract
Electronic health records (EHR) provide a rich source of temporal data that present a unique opportunity to characterize disease patterns and risk of imminent disease. While many data-mining tools have been adopted for EHR-based disease early detection, linear discriminant analysis (LDA) is one of the most commonly used statistical methods. However, it is difficult to train an accurate LDA model for early disease diagnosis when too few patients are known to have the target disease. Furthermore, EHR data are heterogeneous with significant noise. In such cases, the covariance matrices used in LDA are usually singular and estimated with a large variance. This article presents Daehr, an extension of the LDA framework using electronic health record data to address these issues. Beyond existing LDA analyzers, we propose Daehr to (1) eliminate the data noise caused by the manual encoding of EHR data and (2) lower the variance of parameter (covariance matrices) estimation for LDA models when only a few patients' EHR data are available for training. To achieve these two goals, we designed an iterative algorithm to improve the covariance matrix estimation with embedded data-noise/parameter-variance reduction for LDA. We evaluated Daehr extensively using the College Health Surveillance Network, a large, real-world EHR dataset. Specifically, our experiments compared the performance of Daehr to three baselines (i.e., LDA and its derivatives) in identifying college students at high risk for mental health disorders from 23 U.S. universities. Experimental results demonstrate that Daehr significantly outperforms the three baselines by achieving 1.4%-19.4% higher accuracy and a 7.5%-43.5% higher F1-score.
Recommended Citation
H. Xiong et al., "Daehr: A Discriminant Analysis Framework for Electronic Health Record Data and an Application to Early Detection of Mental Health Disorders," ACM Transactions on Intelligent Systems and Technology, vol. 8, no. 3, article no. 47, Association for Computing Machinery (ACM), Feb 2017.
The definitive version is available at https://doi.org/10.1145/3007195
Department(s)
Computer Science
Keywords and Phrases
Anxiety/depression; Early detection; Electronic health data; Predictive models; Temporal order
International Standard Serial Number (ISSN)
2157-6912; 2157-6904
Document Type
Article - Journal
Document Version
Citation
File Type
text
Language(s)
English
Rights
© 2024 Association for Computing Machinery (ACM), All rights reserved.
Publication Date
01 Feb 2017