Abstract
We present a deep learning approach for analyzing DNase-seq datasets that shows promise for unraveling the biological underpinnings of transcriptional regulation. A better understanding of these mechanisms can lead to important advances in the life sciences in general, and in drug discovery, biomarker discovery, and cancer research in particular. Motivated by recent advances in deep learning, we developed a platform, Deep Semi-Supervised DNase-seq Analytics (DSSDA). DSSDA is powered primarily by deep generative Convolutional Networks (ConvNets), and its most notable feature is its capability for semi-supervised learning, which is highly beneficial in common biological settings where labeled data are scarce. In addition, we investigated a k-mer based continuous vector space representation, aiming to further improve learning power by accounting for the nature of biological sequences, in particular features associated with locality-based relationships between neighboring nucleotides. DSSDA employs a modified Ladder Network as its underlying generative model architecture, and we demonstrate its performance on a cell type classification task using sequences from large-scale DNase-seq experiments. We report the performance of DSSDA in both the fully supervised setting, in which DSSDA outperforms widely known ConvNet models (94.6% classification accuracy), and the semi-supervised setting, in which, even with less than 10% of the data labeled, DSSDA performs comparably to other ConvNets trained on the full data set. Our results underscore the need for better deep learning methods that learn latent features and representations in order to handle challenging genomic sequence datasets.
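As a rough illustration of the k-mer idea mentioned in the abstract, the sketch below splits a DNA sequence into overlapping k-mers and maps each one to an integer index, the usual input to a learned continuous vector (embedding) lookup. The abstract does not state the value of k, the stride, or how the vectors are trained, so `k=3`, the stride of 1, and the helper names `kmer_tokens`, `build_vocab`, and `encode` are assumptions made for this example and not the authors' implementation.

```python
from itertools import product


def kmer_tokens(seq, k=3):
    """Split a DNA sequence into overlapping k-mers (stride 1, an assumption)."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_vocab(k=3, alphabet="ACGT"):
    """Map every possible k-mer over the alphabet to an integer index."""
    return {"".join(p): i for i, p in enumerate(product(alphabet, repeat=k))}


def encode(seq, vocab, k=3):
    """Convert a sequence to k-mer indices.

    k-mers containing symbols outside the vocabulary (e.g. N) are skipped.
    """
    return [vocab[t] for t in kmer_tokens(seq, k) if t in vocab]


vocab = build_vocab(k=3)            # 4**3 = 64 possible 3-mers
ids = encode("ACGTAC", vocab, k=3)  # indices for ACG, CGT, GTA, TAC
```

Because neighboring k-mers overlap, each index carries local context about adjacent nucleotides; in a model, these indices would typically select rows of a trainable embedding matrix.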
Recommended Citation
S. Shams et al., "A Distributed Semi-Supervised Platform For DNase-Seq Data Analytics Using Deep Generative Convolutional Networks," ACM-BCB 2018 - Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics, pp. 244 - 253, Association for Computing Machinery, Aug 2018.
The definitive version is available at https://doi.org/10.1145/3233547.3233601
Department(s)
Computer Science
Publication Status
Public Access
Keywords and Phrases
Continuous vector representation; Convolutional networks; Deep learning; DNase-seq; Generative models; Semi-supervised learning
International Standard Book Number (ISBN)
978-1-4503-5794-4
Document Type
Article - Conference proceedings
Document Version
Citation
File Type
text
Language(s)
English
Rights
© 2024 Association for Computing Machinery, All rights reserved.
Publication Date
15 Aug 2018
Comments
National Science Foundation, Grant 1338051