Modeling Gene Regulation from Paired Expression and Chromatin Accessibility Data
PNAS PLUS

Zhana Duren(a,b,c), Xi Chen(b), Rui Jiang(d,1), Yong Wang(a,c,1), and Wing Hung Wong(b,1)

(a) Academy of Mathematics and Systems Science, National Center for Mathematics and Interdisciplinary Sciences, Chinese Academy of Sciences, Beijing 100080, China; (b) Department of Statistics, Department of Biomedical Data Science, Bio-X Program, Stanford University, Stanford, CA 94305; (c) School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China; and (d) Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division and Center for Synthetic & Systems Biology, Tsinghua National Laboratory for Information Science and Technology, Department of Automation, Tsinghua University, Beijing 100084, China

Contributed by Wing Hung Wong, May 8, 2017 (sent for review March 20, 2017; reviewed by Christina Kendziorski and Sheng Zhong)

The rapid increase of genome-wide datasets on gene expression, chromatin states, and transcription factor (TF) binding locations offers an exciting opportunity to interpret the information encoded in genomes and epigenomes. This task can be challenging as it requires joint modeling of context-specific activation of cis-regulatory elements (REs) and the effects on transcription of associated regulatory factors. To meet this challenge, we propose a statistical approach based on paired expression and chromatin accessibility (PECA) data across diverse cellular contexts.

... gene expression data, accessibility data are available for a diverse set of cellular contexts (Fig. 1, blue boxes). In fact, we expect the amount of matched expression and accessibility data (i.e., measured on the same sample) will increase very rapidly in the near future. The purpose of the present work is to show that, by using matched expression and accessibility data across diverse cellular contexts, it is possible to recover a significant portion of the information in the missing data on binding location and chromatin ...