Characterizing the Interplay Between Gene Nucleotide Composition Bias
Total Page:16
File Type:pdf, Size:1020Kb
Lemaire et al. Genome Biology (2019) 20:259 https://doi.org/10.1186/s13059-019-1869-y RESEARCH Open Access Characterizing the interplay between gene nucleotide composition bias and splicing Sébastien Lemaire1†, Nicolas Fontrodona1†, Fabien Aubé1†, Jean-Baptiste Claude1, Hélène Polvèche2, Laurent Modolo3, Cyril F. Bourgeois1, Franck Mortreux1 and Didier Auboeuf1* Abstract Background: Nucleotide composition bias plays an important role in the 1D and 3D organization of the human genome. Here, we investigate the potential interplay between nucleotide composition bias and the regulation of exon recognition during splicing. Results: By analyzing dozens of RNA-seq datasets, we identify two groups of splicing factors that activate either about 3200 GC-rich exons or about 4000 AT-rich exons. We show that splicing factor–dependent GC-rich exons have predicted RNA secondary structures at 5′ ss and are dependent on U1 snRNP–associated proteins. In contrast, splicing factor–dependent AT-rich exons have a large number of decoy branch points, SF1- or U2AF2-binding sites and are dependent on U2 snRNP–associated proteins. Nucleotide composition bias also influences local chromatin organization, with consequences for exon recognition during splicing. Interestingly, the GC content of exons correlates with that of their hosting genes, isochores, and topologically associated domains. Conclusions: We propose that regional nucleotide composition bias over several dozens of kilobase pairs leaves a local footprint at the exon level and induces constraints during splicing that can be alleviated by local chromatin organization at the DNA level and recruitment of specific splicing factors at the RNA level. Therefore, nucleotide composition bias establishes a direct link between genome organization and local regulatory processes, like alternative splicing. Keywords: Splicing, Genomic, Chromatin organization, Nucleotide composition bias Introduction [2]. For example, secondary structures at the 5′ ss can hin- Most eukaryotic genes comprise both exons and introns. der the interactions between the 5′ ss and U1 snRNA and Introns are defined at their 5′-end by the 5′ splicing site this structure-dependent constraint can be relaxed by RNA (ss), which interacts with the U1 snRNA, and at their 3′- helicases, such as DDX5 and DDX17 [3–5]. Meanwhile, end, by the branch point (BP; recognized by SF1), the poly- secondary structures at the 3′-end of short introns can re- pyrimidine (Py) tract (recognized by U2AF2 or U2AF65), place the need for U2AF2 [6]. Splicing signals are short and the 3′ ss (recognized by U2AF1 or U2AF35) [1]. SF1 degenerate sequences, and exons are much smaller than and U2AF2 allow the recruitment of the U2 snRNP, which introns. How then are exons precisely defined? How are contains the U2 snRNA that interacts with the BP [1]. Re- bona fide splicing signals distinguished from pseudo- cent large-scale experiments have demonstrated that RNA signals or decoy signals? These questions have been inten- secondary structures frequently occur in the vicinity of sively researched but still remain open. splicing sites and gene-by-gene analyses have demonstrated Many (if not all) exons require a variety of splicing fac- that RNA structures play a direct role in splicing regulation tors to be defined. Splicing factors that belong to differ- ent families of RNA binding proteins, such as the SR and hnRNP families, bind to short degenerate motifs * Correspondence: [email protected] †Sébastien Lemaire, Nicolas Fontrodona and Fabien Aubé contributed either in exons or introns of pre-mRNAs [7]. Splicing equally to this work. factor binding sites are low-complexity sequences com- 1Laboratory of Biology and Modelling of the Cell, Univ Lyon, ENS de Lyon, prising either the same nucleotide or dinucleotide [8–10]. Univ Claude Bernard, CNRS UMR 5239, INSERM U1210, 46 Allée d’Italie Site Jacques Monod, F-69007 Lyon, France Splicing factors modulate the recruitment of different Full list of author information is available at the end of the article spliceosome-associated components [7, 11]. © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. Lemaire et al. Genome Biology (2019) 20:259 Page 2 of 21 Spliceosome assembly and the splicing process occur Here, we have investigated the relationship between mostly during transcription [11, 12]. In this setting, the the splicing process, gene nucleotide composition bias, velocity of RNA polymerase II (RNAPII) influences exon and chromatin organization at both the local and global recognition in a complex manner, as speeding up tran- levels. We initially identified sets of exons activated by scription elongation can either enhance or repress exon different splicing factors and then demonstrated that inclusion [13]. RNAPII velocity is in turn influenced by analyzing the nucleotide composition bias provided a the local chromatin organization, such as the presence better understanding of the interplay between chromatin of nucleosomes [11, 12]. Nucleosomes are preferentially organization and splicing-related features, which collect- positioned on exons because exons have a higher GC ively affect exon recognition. We propose that nucleo- content than introns, which increases DNA bendability tide composition bias not only contributes to the 1D and [14–17]. In addition, Py tracts (mostly made of Ts) up- 3D genome organization, but has also local conse- stream of exons may form a nucleosome energetic bar- quences at the exon level during the splicing process. rier [14–17]. Nucleosomes influence splicing by slowing down RNAPII in the vicinity of exons and by modulating Results the local recruitment of splicing regulators [11, 12]. In- Splicing factor–dependent GC-rich and AT-rich exons deed, depending on their specific chemical modifications Publicly available RNA-seq datasets generated after (e.g., methylation), histone tails can interact directly or knocking down or overexpressing individual splicing fac- indirectly with splicing factors [18]. Therefore, exon tors across different cell lines were analyzed [28–43] recognition during the splicing process depends on a (Additional file 1: Table S1). Using our recently pub- complex interplay between signals at the DNA level lished FARLINE pipeline, which allows the inclusion rate (e.g., nucleosome positioning) and signals at the RNA of exons from RNA-seq datasets to be quantified [44], level (e.g., splicing factor binding sites). we defined the sets of exons whose inclusion is acti- Genes are not randomly organized across a genome, vated by each of the 33 splicing factors that were and nucleotide composition bias over genomic regions analyzed (Additional file 2: Table S2). We focused on of varying lengths plays an important role in genome splicing factor–activated exons to uncover the splicing- organization at multiple genomic scales. For example, related features characterizing exons whose recognition isochores are large genomic regions (≥ 30 kbps) with a depends on at least one splicing factor. We identified 10, uniform GC content that differs from adjacent regions 707 exons that were activated by at least one splicing [19–21]. Isochores can be classified into five families, factor from the 93,680 exons whose inclusion rate ranging from less than 37% of GC content to more than was quantified by FARLINE across all datasets (see 53% [19–21]. GC-rich isochores have a higher density of the “Materials and methods” section). genes than AT-rich isochores, and genes in GC-rich iso- As expected, splicing factor–activated exons had weaker chores contain smaller introns than genes in AT-rich 3′-and/or5′-splicing site (ss) scores as compared to the isochores [22–24]. In addition, to be associated with median score of human exons (Additional file 3:FigureS1). specific gene features (e.g., intron size), the density of We computed the nucleotide composition of each splicing GC nucleotides in genes has consequences on splicing factor–activated exon (Additional file 3:FigureS2).Note site sequences and on the splicing process [25]. For ex- that we systematically refer to both thymine and uracil as ample, it has been proposed that splicing of short in- “T” to simplify our goal of analyzing sequence-dependent trons in a GC-rich context may occur through the features at both the DNA and RNA levels. In addition, intron definition model, while the splicing of large in- values obtained from splicing factor–activated exons were trons in an AT-rich context may occur through the normalized by the median values measured for human cod- exon definition model [11, 26]. Collectively, these ing exons used as a set of control exons, in order to repre- observations support a model in which the gene archi- sent results in a consistent way. Sets of exons activated by tecture and gene nucleotide composition bias (e.g., GC different splicing factors had a different proportion of GCs or AT content) influence local processes at the exon as compared to the median values of control exons (Fig. 1a, level, such as nucleosome positioning and intron re- Additional file 3: Figure S3a). Interestingly, the GC content moval. As exon recognition also depends on the bind- of splicing factor–activated exons positively correlated with ing to the pre-mRNAs of splicing factors that interact the GC content of their flanking introns (Fig. 1b, upper − with compositionally biased sequences, one interesting panel, P value < 10 16 and Additional file 3:FigureS3d). possibility is that the nature of these splicing factors Accordingly, splicing factor–activated GC- and AT- depends at least in part on the gene nucleotide com- rich exons were flanked by GC- and AT-rich position bias.