Genome Architecture and Transcription Data Reveal Allelic Bias During the Cell Cycle
bioRxiv preprint doi: https://doi.org/10.1101/2020.03.15.992164; this version posted April 1, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Stephen Lindsly¹, Wenlong Jia², Haiming Chen¹, Sijia Liu³, Scott Ronquist¹, Can Chen⁴,⁷, Xingzhao Wen⁵, Gilbert Omenn¹, Shuai Cheng Li², Max Wicha¹,⁶, Alnawaz Rehemtulla⁶, Lindsey Muir¹, Indika Rajapakse¹,⁷,*

¹ Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor
² Department of Computer Science, City University of Hong Kong
³ MIT-IBM Watson AI Lab, IBM Research
⁴ Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor
⁵ Department of Bioengineering, University of California San Diego, La Jolla
⁶ Department of Hematology/Oncology, University of Michigan, Ann Arbor
⁷ Department of Mathematics, University of Michigan, Ann Arbor
* To whom correspondence should be addressed; E-mail: [email protected].

Highlights

• Structural and functional differences between the maternal and paternal genomes, including similar allelic bias in related subsets of genes.
• Coupling between the dynamics of gene expression and genome architecture for specific alleles illuminates an allele-specific 4D Nucleome.
• Introduction of a novel allele-specific phasing algorithm for genome architecture and a quantitative framework for integration of gene expression and genome architecture.
Abstract

Millions of genetic variants exist between the paternal and maternal genomes in human cells (1, 2), which result in unequal allelic contributions to gene transcription (3, 4). However, it remains poorly understood how allelic bias affects the interplay between transcription and the 3D organization of the genome. We sought to understand how transcription and genome architecture differ between the maternal and paternal genomes across the cell cycle. We collected and analyzed haplotype-resolved genome-wide data from B-lymphocytes (NA12878) in G1, S, and G2/M, using RNA sequencing (RNA-seq), bromouridine sequencing (Bru-seq), and genome-wide chromosome conformation capture (Hi-C). In the past, separation of allele-specific data was done only through heterozygous single nucleotide variations (SNVs), insertions, and deletions (InDels), as these unique variations allowed DNA sequencing reads to be mapped back to their parental origins (5). We introduce HaploHiC, a novel method of phasing Hi-C data using reads assigned through SNVs/InDels to predict the parental origin of nearby reads of unknown origin. This method allows for more structural data to be systematically assigned to a parental origin, and therefore reduces the sparsity of the allele-specific Hi-C contact matrices. By integrating allele-specific RNA-seq, Bru-seq, and Hi-C data through three phases of the cell cycle, along with publicly available protein binding data (ChIP-seq), we provide a more comprehensive understanding of architectural and transcriptional differences between the two genomes. These analyses reveal specific patterns in allelic bias, including similar bias characteristics in some groups of related genes.
The integration of these data enabled construction of an allele-specific 4D Nucleome.

Introduction

Maternal and paternal chromosomes typically each encode an allele of the same gene. Millions of single nucleotide variants (SNVs), insertions, and deletions (InDels) enable distinction between the maternal and paternal genome. Studying the maternal and paternal alleles of a gene may enhance our understanding of genome organization and mechanisms of bias in allele expression. Observed differences between alleles of a gene include expression, 3D chromatin organization, and protein binding (1–8). Approximately 10-25% of all genes have allelic biased expression across many cell types (9–11). One well studied example of allele-specific expression and architecture is X chromosome inactivation (XCI) in females, in which one of the two X chromosomes is tightly packed into two ‘superdomains’ with silencing of gene expression (5, 12). Another example of known allele-specific expression is imprinting, in which one allele’s expression is suppressed (13). Expression of one allele over another may be stochastic and independent of parental origin (4, 14). It has also been shown for single cells that monoallelic transcripts could make up over 75% of total transcripts (14).
The allele-specific nature of genome architecture and gene expression has been explored for each individually, but it remains poorly understood how differences in chromosome architecture between alleles contribute to differences in gene expression. To address this problem, we look to transcription factors and other regulatory proteins as key components that connect differences in genome conformation with differential expression of the maternal and paternal genomes. By integrating RNA sequencing (RNA-seq), bromouridine sequencing (Bru-seq), and genome-wide chromosome conformation capture (Hi-C), along with publicly available protein binding data (ChIP-seq), we contribute insight into how maternal and paternal genomes contribute to cell phenotype.

The redundancy inherent in diploid genomes makes them robust to mutations and genetic loss-of-function events. In cancers, this redundancy is often missing, which is referred to as loss of heterozygosity. For example, BRCA1 and BRCA2 exhibit loss of heterozygosity (15), and mutations in these genes lead to a significant increase in the likelihood of developing breast cancer (16). Additionally, monoallelically expressed genes, like X-inactivated and imprinted genes (13, 17–19), do not have a ‘backup’ gene to compensate for potential deleterious mutations, and it has been found that mutations in monoallelically expressed genes are responsible for disease phenotypes (20). These findings lend credence to the pursuit of allele-specific genome analysis.

The 4D Nucleome (4DN) describes the relationship between genome architecture, gene expression, and cell phenotype through time (21). Chen et al.
(22) introduced a mathematical framework for the 4DN, which can be applied in an allele-specific manner to a fully haplotyped cell line (e.g. NA12878). Despite extensive study of genome replication and compaction in cell division, few studies focus on differences between maternal and paternal genomes during these processes. For example, the two alleles may have different dynamics through the cell cycle that influence proliferation, or an allele-specific 4DN.

In this paper, we integrate allele-specific RNA-seq, Bru-seq, Hi-C, and protein binding data to provide a more comprehensive understanding of the maternal and paternal contributions to phenotype, and the relationship between these properties. We analyze differential expression for allele biases and allele-specific cell cycle biases, switching between chromatin compartments (also known as A/B compartments), and Topologically Associated Domains (TADs). In addition, we analyze our Hi-C data using our recently developed methods that help discern differences in genome architecture between the two genomes and through the cell cycle. Our methods of analysis reveal notable patterns of allelic bias across the cell cycle, including in subsets of genes involved in related cellular processes. This work supports the idea that separate analysis of the two parental genomes is imperative to future genomic studies.

Results

Chromosome level differences of the maternal and paternal genomes

To establish whether there are allele-specific differences in both chromosome structure and gene expression, we analyzed parentally phased whole-chromosome Hi-C matrices and RNA-seq data at a 1 Mb scale (phasing is discussed below and in Supplementary Notes 2.1.7 and 2.4).
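As a rough illustration of how two parentally phased contact matrices can be compared at the whole-chromosome level, the sketch below subtracts one matrix from the other entry by entry and summarizes the difference with a Frobenius norm. The function name `frobenius_difference` and the toy 3-bin matrices are hypothetical and chosen only for illustration; real 1 Mb matrices are much larger, and any normalization applied before comparison is not shown here.

```python
import math


def frobenius_difference(maternal, paternal):
    """Frobenius norm of the element-wise difference of two equal-shape matrices.

    Hypothetical helper for illustration; matrices are lists of rows.
    """
    if len(maternal) != len(paternal) or any(
        len(row_m) != len(row_p) for row_m, row_p in zip(maternal, paternal)
    ):
        raise ValueError("matrices must have the same shape")

    total = 0.0
    for row_m, row_p in zip(maternal, paternal):
        for m, p in zip(row_m, row_p):
            total += (m - p) ** 2  # squared entry of the difference matrix
    return math.sqrt(total)


# Toy symmetric contact matrices (3 bins each); values are made up.
maternal = [
    [10, 4, 1],
    [4, 12, 3],
    [1, 3, 9],
]
paternal = [
    [10, 2, 1],
    [2, 12, 5],
    [1, 5, 9],
]

diff_norm = frobenius_difference(maternal, paternal)
print(diff_norm)
```

A norm of zero means the two matrices are identical; larger values indicate a larger overall difference in contact counts, without saying where in the chromosome the difference lies.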
We performed a simple subtraction of the phased Hi-C matrices and found the Frobenius norm