"DNA methylation is a widely studied epigenetic modification that can influence the expression and regulation of functional genes, especially those related to aging, cancer and other diseases. The common goal of methylation studies is to find differences in methylation levels between samples collected under different conditions. Differences can be detected at the site level, but regulated methylation targets are most commonly clustered into short regions. Thus, identifying differentially methylated regions (DMRs) between groups is of prime interest. Although current technology can measure methylation genome-wide, readings can be misinterpreted when single nucleotide polymorphisms (SNPs) are present in the target sequence. For this reason, one of the main pre-processing steps in DMR detection methods is filtering out probes potentially affected by SNPs. This work proposes to leverage the growing practice of collecting both SNP and methylation data on the same individuals, which makes it possible to integrate SNP data into the DNA methylation analysis framework. Probes originally filtered out as potential SNP sites can then be restored when no SNP is actually present. Furthermore, for cases where a SNP is present or other missing-data issues arise, imputation methods for methylation data are proposed. First, regularized linear regression imputation models (ridge, LASSO and elastic net) are proposed, along with a variable screening technique that restricts the number of variables in each model. Functional principal component regression imputation is also proposed as an alternative approach. The proposed imputation methods are compared with existing methods and evaluated on imputation accuracy and DMR detection ability using both real and simulated data.
One of the proposed methods (elastic net with variable screening) achieves good imputation accuracy without sacrificing computational efficiency across a variety of settings, while greatly increasing the number of true positive DMR detections."--Abstract, page iii.
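The abstract describes elastic net imputation with variable screening only at a high level. The following is a minimal sketch under stated assumptions, not the dissertation's implementation. It assumes a samples × CpG matrix of methylation beta values with NaN marking missing entries, screening by absolute Pearson correlation computed on samples where the target CpG is observed, a glmnet-style elastic net penalty fitted by cyclic coordinate descent, and clipping of predictions to [0, 1]. The function names `impute_target` and `elastic_net_cd` and the parameters `k`, `alpha` and `l1_ratio` are hypothetical. Under this parameterization, ridge (`l1_ratio=0`) and LASSO (`l1_ratio=1`) are special cases.

```python
import numpy as np


def soft_threshold(z, gamma):
    """Soft-thresholding operator used by the L1 part of the penalty."""
    return np.sign(z) * max(abs(z) - gamma, 0.0)


def elastic_net_cd(X, y, alpha=0.01, l1_ratio=0.5, n_iter=1000, tol=1e-8):
    """Fit an elastic net by cyclic coordinate descent (assumed glmnet-style objective).

    Minimizes (1/2n)||y - b0 - Xb||^2
              + alpha * (l1_ratio * ||b||_1 + (1 - l1_ratio)/2 * ||b||^2)
    on standardized predictors; returns (intercept, coef) on the original scale.
    """
    n, p = X.shape
    x_mean, x_std = X.mean(axis=0), X.std(axis=0)
    x_std[x_std == 0] = 1.0               # guard against constant columns
    Z = (X - x_mean) / x_std              # standardized predictors
    y_mean = y.mean()
    r = y - y_mean                        # residual with all coefficients at zero
    b = np.zeros(p)
    col_sq = (Z ** 2).sum(axis=0) / n     # 1 for ordinary columns, 0 if constant
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0:            # constant column carries no information
                continue
            old = b[j]
            # Correlation of column j with its partial residual.
            rho = Z[:, j] @ r / n + col_sq[j] * old
            new = soft_threshold(rho, alpha * l1_ratio) / (
                col_sq[j] + alpha * (1 - l1_ratio))
            if new != old:
                r -= Z[:, j] * (new - old)  # keep residual in sync with b
                b[j] = new
                max_change = max(max_change, abs(new - old))
        if max_change < tol:
            break
    coef = b / x_std
    intercept = y_mean - x_mean @ coef
    return intercept, coef


def impute_target(M, target, k=10, alpha=0.01, l1_ratio=0.5):
    """Return a copy of M with NaNs in column `target` imputed.

    Screening keeps the k fully observed columns most correlated (in absolute
    value) with the target; an elastic net on those columns predicts the gaps.
    """
    M = np.array(M, dtype=float)          # work on a copy
    y = M[:, target]
    miss = np.isnan(y)
    if not miss.any():
        return M
    obs = ~miss
    # Candidate predictors: other CpG columns with no missing values.
    others = [j for j in range(M.shape[1])
              if j != target and not np.isnan(M[:, j]).any()]
    # Variable screening by |Pearson correlation| on samples with observed target.
    with np.errstate(invalid="ignore", divide="ignore"):
        corr = np.array([abs(np.corrcoef(M[obs, j], y[obs])[0, 1])
                         for j in others])
    corr = np.nan_to_num(corr)            # constant columns get correlation 0
    keep = [others[i] for i in np.argsort(corr)[::-1][:k]]
    b0, coef = elastic_net_cd(M[obs][:, keep], y[obs], alpha, l1_ratio)
    pred = b0 + M[miss][:, keep] @ coef
    M[miss, target] = np.clip(pred, 0.0, 1.0)  # beta values lie in [0, 1]
    return M
```

For example, `impute_target(M, target=0, k=5, alpha=0.001)` fills the missing entries of the first CpG column from its five most correlated neighbours. Screening before fitting keeps each regression small, which is one plausible reading of why a screened elastic net could remain computationally cheap.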
Olbricht, Gayla R.
Samaranayake, V. A.
Wen, Xuerong Meggie
Frank, Ronald L.
Mathematics and Statistics
Ph. D. in Mathematics
Missouri University of Science and Technology
x, 86 pages
© 2021 Yuqing Su, All rights reserved.
Dissertation - Open Access
Su, Yuqing, "Integrating SNP data and imputation methods into the DNA methylation analysis framework" (2021). Doctoral Dissertations. 2987.