The role of histone modifications in the regulation of alternative splicing during epithelial-to-mesenchymal transition Alexandre Segelle

To cite this version:

Alexandre Segelle. The role of histone modifications in the regulation of alternative splicing during epithelial-to-mesenchymal transition. Agricultural sciences. Université Montpellier, 2020. English. NNT: 2020MONTT017. tel-03137009

HAL Id: tel-03137009 https://tel.archives-ouvertes.fr/tel-03137009 Submitted on 10 Feb 2021

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

THESIS SUBMITTED FOR THE DEGREE OF DOCTOR

OF THE UNIVERSITY OF MONTPELLIER

In Molecular and Cellular Biology

Doctoral school Sciences Chimiques et Biologiques pour la Santé (ED CBS2 168)

Research unit UMR9002 CNRS-UM, Institut de Génétique Humaine (IGH)

The role of histone modifications in the regulation of alternative splicing during the epithelial-to-mesenchymal transition

Presented by Alexandre Segelle

On 28 September 2020

Under the supervision of Reini Fernandez de Luco

Before a jury composed of:

Anne-Marie MARTINEZ, Professor, Université de Montpellier, President of the jury
Christian MUCHARDT, Research Director (DR) and Team Leader, Institut Pasteur de Paris, Reviewer
Juan VALCARCEL, Research Professor and Team Leader, CRG Institute, Barcelona, Reviewer
Franck MORTREUX, Research Scientist (CR), LBMC Institute, Lyon, Examiner
Reini FERNANDEZ DE LUCO, Research Scientist (CR) and Team Leader, IGH Institute, Montpellier, Thesis supervisor

The Role of histone modifications in the regulation of alternative splicing during the EMT

With the support of:

IGH

INSTITUTE OF HUMAN GENETICS


Table of Contents

ACKNOWLEDGMENTS
LIST OF FIGURES
LIST OF TABLES
ABBREVIATIONS
SUMMARY – SUMMARY IN FRENCH
INTRODUCTION
Chapter 1: RNA splicing, alternative splicing (AS) and underlying mechanisms
1.1 History of splicing
1.1.1 Discovery of RNA splicing
1.1.2 Discovery of alternative splicing
1.2 RNA splicing
1.2.1 RNA splicing reaction
1.2.2 definition
a Donor site and acceptor site
b Branch point
1.2.3 Spliceosomal complex
a composition
b Spliceosome assembly and catalytic activity
1.3 Alternative splicing
1.3.1 Different alternative splicing events
a Cassette
b Mutually exclusive exons
c Alternative 5’ and 3’ splice sites
d Intron retention
e Alternative promoters and polyadenylation sites
1.3.2 Regulatory elements
a Cis regulatory elements
b Trans regulatory elements
1.4 Deregulation of alternative splicing in cancer
1.4.1 Alteration of alternative splicing programs in cancer
a Alternatively spliced in cancer
b Splicing factors in cancer
1.4.2 Oncogenic and tumor suppressor functions of AS variants
1.4.3 Therapeutic strategies based on splicing variants
Chapter 2: The relationship between chromatin and splicing
2.1 Chromatin
2.1.1 Discovery of chromatin
2.1.2 Chromatin structure
a Nucleosome
b Chromatin higher-order structure
c Euchromatin and heterochromatin
2.1.3 Chromatin: a dynamic structure
a Chromatin remodeling
b Histone chaperones
c Histone variants
d Histone post-translational modifications
e DNA methylation
2.2 Genome and epigenome editing: the CRISPR/(d)Cas9 system
2.2.1 Discovery and description of CRISPR/Cas
2.2.2 Genome editing with CRISPR/Cas9
2.2.3 CRISPR/dCas9: a powerful tool for chromatin regulation
a Transcriptional regulation
b Epigenome editing
2.3 Alternative splicing and chromatin signature
2.4 Relationship between chromatin, transcription and alternative splicing
2.4.1 Coupling between alternative splicing and transcription
2.4.2 Chromatin affects alternative splicing
a Kinetic model
b Recruitment model
Chapter 3: The epithelial-to-mesenchymal transition (EMT) and its regulation
3.1 The different types of EMT
3.1.1 Type 1 EMT: development
3.1.2 Type 2 EMT: wound healing
3.1.3 Type 3 EMT: cancer invasion
3.2 EMT regulatory programs
3.2.1 Transcriptional regulation
3.2.2 Post-transcriptional regulation
a microRNAs
b Alternative splicing
3.3 Link between alternative splicing and EMT
3.3.1 Global splicing reprogramming during EMT and examples
a Fibroblast growth factor receptor 2 (FGFR2)
b p120-catenin (CTNND1)
c CD44
3.3.2 Factors regulating epithelial and mesenchymal splicing
a Epithelial splicing factors
b Mesenchymal splicing factors
c Histone modifications and chromatin factors
Chapter 4: Thesis aims
4.1 Identification of histone modifications involved in the establishment and maintenance of a new EMT-specific splicing program
4.2 Direct effect on alternative splicing of modulating regulatory histone modifications by adapting the innovative CRISPR/dCas9 system
4.3 Physiological impact on EMT progression of modulating splicing-specific histone modifications
4.4 Mechanisms linking histone modifications to alternative splicing regulation during EMT
RESULTS
Article: Histone marks are drivers of splicing changes necessary for an epithelial-to-mesenchymal transition
Introduction
Results
Discussion
Materials and methods
Figures
DISCUSSION & PERSPECTIVES
1 Regulatory pathways responsible for H3K27 marking at alternatively spliced exons
2 Splicing-associated chromatin signatures
3 Not all histone marks are drivers of changes in splicing: H3K4me1, a late change that could link AS with RNAP II speed
4 CRISPR/dCas9 as a potential therapeutic tool to impair EMT
CONCLUSION
BIBLIOGRAPHY
ANNEXES
Article: Splicing-associated chromatin signatures: a combinatorial and position-dependent role for histone marks in splicing definition


ACKNOWLEDGMENTS

First, I would like to thank Dr. Christian Muchardt and Prof. Juan Valcarcel for agreeing to review my thesis manuscript, Dr. Franck Mortreux for arranging his schedule to attend my thesis defense, and Prof. Anne-Marie Martinez for kindly agreeing to be President of my thesis jury. Thank you all for agreeing to evaluate my thesis work.

I am grateful to Dr. Paola Scaffidi, Dr. Dominique Helmlinger and Dr. Edouard Bertrand for agreeing to serve on my thesis committee (comité de suivi de thèse) during these four years and for helping me make the most of my PhD.

I will continue the rest of these acknowledgments in French. I would first like to thank Reini for accepting me into her team and for supervising me during these four years of PhD. You were a wonderful supervisor with whom I loved working and discussing. Our exchanges helped me grow both scientifically and personally. You always treated me as a full-fledged researcher and pushed me forward while caring about my personal balance. For all of this, thank you.

I also thank the Director, Monsef Benkirane, for welcoming me into his research unit, the unique place that is the Institut de Génétique Humaine (IGH).

The IGH was a fantastic place to do my PhD, and I want to thank all its members, without whom this adventure would not have been the same. Of course, I thank the “basement” community. All the interactions and discussions I had with you were privileged moments that I will never forget, and even if a PhD is personal work, it is also thanks to all of you that I made it. Thanks to Sophie for her kindness and all her delicious tarts. Thanks to Amandine for the gossip she brought us and the “few” products she passed on to me. Thanks to Eric and Lenka, the colleagues “next door”, who, when they were around, livened up our meals and brought me enriching discussions. A big thank you to all of you for everything you have given me.


I know some of you are wondering why your names have not appeared yet, so let me fix that right away by saying a big thank you to all the members of the team that welcomed me: Andrew and Jean-Philippe, with whom I began this adventure, Yaiza, who quickly joined us, and finally Marie-Sarah, the newest member. You were like a family during these four years, sometimes annoying, sometimes comforting, but always there. I will never forget our “countless” beer evenings. Without you this adventure would certainly not have been the same, and I can only thank you for it.

I will now step away from the institute where it all happened to thank people who mattered just as much.

I deeply thank the bracelet club: Sandra, Claire and Delphine (aka Mimi, Zozo and Dédé), and also Kevin and Julien, who are now part of the family. The three of you know what a PhD is, from having done one, from being in the middle of one, or from knowing people going through this stage, and despite the difficulties we have always stayed close, kept seeing each other, traveling and laughing together. I will never forget any of it, and I know I can always count on you.

I particularly thank my parents, who have always been behind me to support and encourage me. You have always given me the means to succeed and offered me everything I could wish for; if I have come this far today, it is thanks to you, and this success belongs to you.

To my brother and my sister-in-law Aurore, and to my grandparents, a big thank you for always being by my side and supporting me. To bibou Théo, who will not read these words right away but who already makes me the happiest of uncles: know that you will always have a great godfather to back you up.

To my parents-in-law and my brother-in-law, who welcomed me into their family without the slightest hesitation and made me feel at home in their home: I will never forget these gestures.

I could only end these acknowledgments with one person, or rather two: the two most important people in my life, with whom I have gone through more than half of my PhD and without whom nothing would have been possible.


To you, “Chalou”, thank you my love, thank you for everything you bring me every day, for your support, your kindness, your generosity, for who you are. You gave me a new outlook on life, opened my eyes to what mattered, and made me discover things I had never known. My PhD will also always be associated with the three coins you threw into the Trevi Fountain, which sealed our destiny forever. My success today is also yours, and the end of these four years marks the beginning of the most beautiful adventure of all: building my life by your side. “I love you from here to there, all the way around the universe and beyond <3 <3”.

If I spoke of two people, it is because I could not forget to thank the one who was one of our best decisions. To you, “Chappi”, our adopted son, who spent the whole time I was writing asleep by my side. Your purring and cuddles helped calm me, and I know I would not have made it without you.


LIST OF FIGURES

INTRODUCTION
Figure 1: Electron microscopy (EM) of a DNA-RNA hybrid
Figure 2: Representation of alternative splicing of CALCA
Figure 3: Role of the three different types of RNA
Figure 4: Multi-step process of RNA maturation
Figure 5: Constitutive and alternative splicing
Figure 6: Trans-esterification reactions during intron splicing
Figure 7: Consensus sequences of major-class
Figure 8: snRNA secondary structures
Figure 9: snRNA biogenesis
Figure 10: Kinetics of spliceosome assembly during the splicing reaction
Figure 11: Different modes of alternative splicing
Figure 12: Regulatory elements of alternative splicing
Figure 13: SR and HNRNP families
Figure 14: Therapeutic strategies based on alternative splicing targeting
Figure 15: Double helix structure of DNA
Figure 16: Nucleosome composition
Figure 17: Different levels of DNA compaction
Figure 18: Chromatin remodelers
Figure 19: Histone post-translational modifications
Figure 20: The three stages of the CRISPR/Cas9 system
Figure 21: CRISPR/Cas9 interference systems
Figure 22: Different applications of the CRISPR/dCas9 system
Figure 23: Nucleosome and histone modification enrichments around exons
Figure 24: The kinetic model for AS regulation by transcriptional elongation
Figure 25: Two different models by which chromatin influences AS
Figure 26: The epithelial-to-mesenchymal transition (EMT)
Figure 27: EMT is associated with multiple cellular processes
Figure 28: Different types of EMT
Figure 29: Reversibility of Type 1 EMT during development
Figure 30: Wound healing in physio-pathological conditions
Figure 31: The metastatic cascade
Figure 32: Role of transcription factors in EMT regulation
Figure 33: Binding and expression of splicing factors in EMT
Figure 34: Cell type-specific splicing factors and associated splicing events
Figure 35: Different splicing factors involved in EMT

RESULTS
Figure 1: Specific histone modifications correlate in time with dynamic changes in splicing during EMT
Figure 2: Localised changes in H3K27me3 and H3K27ac drive alternative splicing
Figure 3: Chromatin-induced changes in splicing recapitulate the EMT
Figure 4: H3K27 marks regulate splicing by modulating the recruitment of specific splicing factors to the pre-mRNA
Supplementary Figure 1: Localised enrichment of specific histone marks at alternatively spliced exons during EMT
Supplementary Figure 2: Exon-specific epigenome editing of H3K27 marks is sufficient to induce a change in splicing
Supplementary Figure 3: Direct effect of dCas9 epigenomic editing on EMT
Supplementary Figure 4: H3K27 marks regulate splicing by modulating the recruitment of RNA-binding proteins, such as PTB

DISCUSSION, CONCLUSION & PERSPECTIVES
Figure 1: Role of antisense FGFR2 in alternative splicing regulation
Figure 2: Impact of H3K27ac levels on splicing and breast cancer
Figure 3: Mechanisms through which H3K36me3 affects alternative splicing
Figure 4: Splicing-associated chromatin signatures (SACS)
Figure 5: Kinetics of RNA Polymerase II and histone modifications
Figure 6: Different strategies used to modulate chromatin modifications
Figure 7: Strategy to impair EMT via dCas9-associated AS changes
Figure 8: Model linking chromatin marks with AS outcome during EMT


LIST OF TABLES

INTRODUCTION
Table 1: Composition and dynamics of the different spliceosomal complexes
Table 2: Examples of abnormal transcripts in various cancers
Table 3: Examples of splicing factors deregulated in various cancers
Table 4: Histone chaperones and their functions in nucleosome assembly
Table 5: Examples of alternative splicing changes induced during EMT

RESULTS
Supplementary List: List of reagents and resources
Supplementary Table S1: List of ChIP-qPCR primers
Supplementary Table S2: List of RT-qPCR primers
Supplementary Table S3: List of RNAP II elongation assay and RNA-IP primers
Supplementary Table S4: List of gRNAs
Supplementary Table S5: List of shRNAs
Supplementary Table S6: List of cloning primers


ABBREVIATIONS

A, T, C, G: Adenine, Thymine, Cytosine, Guanine
Ac: Acetyl
APA: Alternative Polyadenylation
AR:
AS: Alternative Splicing
ATP: Adenosine Triphosphate
bp: Base Pairs
BRCA1 and 2: Breast Cancer 1 and 2
CALCA: Calcitonin Related Polypeptide Alpha
Cas: CRISPR-associated
CASC: Cancer-Associated Splicing Changes
CGRP: Calcitonin Gene-Related Peptide
CHD: Chromodomain-Helicase-DNA Binding
ChIP-seq: Chromatin Immunoprecipitation Sequencing
CLIP-seq: Cross-Linking Immunoprecipitation Sequencing
CMML: Chronic Myelomonocytic Leukemia
CRISPR: Clustered Regularly Interspaced Short Palindromic Repeats
crRNA: CRISPR RNA
CTD: Carboxy-Terminal Domain
dCas9: Nuclease-null Cas9 (Dead Cas9)
DNA: Deoxyribonucleic Acid
DRB: 5,6-Dichloro-1-β-D-ribofuranosylbenzimidazole
DSB: Double-Strand Break
EM: Electron Microscopy
EMT: Epithelial-to-Mesenchymal Transition
ESE: Exonic Splicing Enhancer
ESRP1 and 2: Epithelial Splicing Regulatory Proteins 1 and 2
ESS: Exonic Splicing Silencer
EZH2: Enhancer of Zeste Homolog 2
FGF: Fibroblast Growth Factor
FGFR2: Fibroblast Growth Factor Receptor 2
FN1: Fibronectin 1
H: Histone
HAT: Histone Acetyltransferase
HCC: Hepatocellular Carcinoma
HDAC: Histone Deacetylase
HDM: Histone Demethylase
HDR: Homology-Directed Repair
HMT: Histone Methyltransferase
HNRNP: Heterogeneous Nuclear Ribonucleoprotein
ILS: Intron Lariat Spliceosome
INO80: Inositol Requiring 80-Switch Related Complex 1
ISE: Intronic Splicing Enhancer
ISS: Intronic Splicing Silencer
ISWI: Imitation Switch
K: Lysine
kDa: Kilodalton
KH: K-Homology
KRAB: Krüppel-Associated Box
Lsm: Like Sm
Me: Methyl
MET: Mesenchymal-to-Epithelial Transition
miRNA: MicroRNA
mRNA: Messenger Ribonucleic Acid
NHEJ: Non-Homologous End Joining
NMD: Nonsense-Mediated Decay
NSCLC: Non-Small Cell Lung Cancer
PAM: Protospacer Adjacent Motif
PCR: Polymerase Chain Reaction
Ph: Phosphorylation
Poly Y: Polypyrimidine Tract
PolyA: Polyadenylation
PTB: Polypyrimidine Tract Binding
PTM: Post-Translational Modification
qRT-PCR: Quantitative Reverse Transcription PCR
R: Purine
RBM47: RNA Binding Motif Protein 47
RNAP II: RNA Polymerase II
RRM: RNA Recognition Motif
rRNA: Ribosomal Ribonucleic Acid
scaRNA: Small Cajal Body RNA
sgRNA: Single Guide RNA
SMN: Survival of Motor Neurons
snoRNA: Small Nucleolar RNA
snRNA: Small Nuclear Ribonucleic Acid
snRNP: Small Nuclear Ribonucleoprotein
Sp: Streptococcus pyogenes
SR: Serine/Arginine-Rich
SRE: Splicing Regulatory Element
SWI/SNF: Switching Defective/Sucrose Non-Fermenting
TALE: Transcription Activator-Like Effector
tracrRNA: Trans-Activating CRISPR RNA
tRNA: Transfer Ribonucleic Acid
TSA: Trichostatin A
TSS: Transcription Start Site
TXF: Tamoxifen
U: Uracil
Ub: Ubiquitination
VEGF-A: Vascular Endothelial Growth Factor A
VPR: VP64-p65-Rta
Y: Pyrimidine
ZFP: Zinc Finger Protein


SUMMARY – SUMMARY IN FRENCH

ENGLISH

Alternative splicing is a key mechanism for cell identity that increases the protein diversity and plasticity of a limited coding genome. An increasing number of disease-specific splice variants are being identified, and splicing-targeting strategies are emerging as promising new therapies. Interestingly, emerging evidence suggests an important role for chromatin conformation and histone modifications in regulating this RNA processing step. However, whether histone modifications are sufficient to affect alternative splicing outcome in a meaningful biological context remains unknown. To address this question, we took advantage of the epithelial-to-mesenchymal transition (EMT), an inducible and highly dynamic physiological model system of cell reprogramming.

To identify histone modifications involved in EMT-dependent splicing, we used normal human mammary epithelial MCF10a cells as a cellular system. We first correlated, over the time course of EMT onset, changes in alternative splicing with changes in histone mark levels along genes essential for EMT, such as FGFR2 and CTNND1. Surprisingly, we observed that some marks, such as H3K27me3 and H3K27ac, changed very early and in opposite directions at the regulated exons, even before the first changes in splicing could be detected, whereas other marks, such as H3K4me1, changed late. To go beyond correlations and address the causal role of these histone marks in alternative splicing outcome, we adapted the CRISPR/dCas9 system to edit the levels of H3K27me3 and H3K27ac specifically at the alternatively spliced exons of the studied genes. For the first time, we could induce a change in splicing outcome simply by changing the level of a histone mark specifically and exclusively at the alternatively spliced exon of interest, showing that these histone marks are sufficient to trigger the highly dynamic changes observed during EMT. Importantly, these chromatin-induced changes in splicing were also sufficient to recapitulate an EMT, supporting a physiological role for these histone marks in alternative splicing regulation.

Altogether, our results support an important role for chromatin in orchestrating highly dynamic changes in alternative splicing relevant to a cell reprogramming process such as EMT. Moreover, we show for the first time that exon-specific changes in histone modifications are sufficient to induce a change in splicing outcome that has phenotypic consequences on cell identity.


FRENCH

Alternative splicing is an important mechanism linked to the complexity of our genome; it is involved in many biological processes and diseases. In recent years, chromatin has been shown to play a major role in the regulation of splicing. However, the extent to which histone modifications can affect this process remains unknown. To address this question, we use as a model the epithelial-to-mesenchymal transition (EMT), a dynamic and inducible system of cell reprogramming in which splicing is strongly involved.

We first correlated changes in alternative splicing with changes in histone mark enrichment along genes essential for EMT, such as FGFR2 and CTNND1. Surprisingly, we observed very early changes in certain marks (H3K27me3, H3K27ac) that precede the changes in alternative splicing, whereas other changes occur later (H3K4me1) and are associated with already established splicing changes. To address whether histone marks have a direct effect on splicing, we adapted the CRISPR/dCas9 system to modify the levels of H3K27me3 and H3K27ac specifically at the alternatively spliced exons and examined the effect on their inclusion. For the first time, we were able to induce a change in splicing simply by modifying histone marks specifically at the alternatively spliced exon, showing that these marks are sufficient to induce the dynamic splicing changes observed during EMT. These chromatin-induced splicing changes were also sufficient to induce a partial EMT, suggesting a physiological role for these histone marks in the regulation of alternative splicing.

Together, our results demonstrate a new role for chromatin in the regulation of alternative splicing during the cell reprogramming process of EMT, and show that a specific change in histone marks at the regulated exon is sufficient to induce a change in splicing which, in turn, has phenotypic consequences on cell identity.


INTRODUCTION


Chapter 1: RNA splicing, alternative splicing (AS) and underlying mechanisms

1.1 History of splicing

1.1.1 Discovery of RNA splicing

In 1977, the groups of Richard Roberts (Chow et al., 1977) and Phillip Sharp (Berget et al., 1977) carried out fundamental work on gene regulation that led to the discovery of RNA splicing, for which Roberts and Sharp received the 1993 Nobel Prize in Physiology or Medicine. In eukaryotic cells, DNA is transcribed into messenger RNA (mRNA) in the nucleus, and the mRNA is then transported into the cytoplasm, where it is translated into proteins. Using electron microscopy, they identified differences between nuclear and cytoplasmic mRNAs from adenovirus. In complexes formed between cytoplasmic RNA and double-stranded DNA, called hybrids or R-loops, the RNA did not pair along the entire DNA: some DNA regions corresponded to the RNA, whereas others were absent from the cytoplasmic RNA (Figure 1). They proposed a model in which, in addition to 5’ capping and 3’ polyadenylation, mRNA maturation involves the removal of fragments from the primary transcript, and introduced the term “splicing” for the first time.

Figure 1: Electron microscopy (EM) of a DNA-RNA hybrid. EM image and schematic of an RNA-DNA hybrid. The mRNA is shown in red and the DNA in black. Regions where RNA and DNA run in parallel are hybridized; A, B and C are introns (adapted from Berget et al., 1977).


A few months later, Pierre Chambon’s lab demonstrated the existence of split genes in chicken, showing that the ovalbumin gene contains two sequences not found in the mature mRNA (Breathnach et al., 1977). In 1978, Walter Gilbert named the sequences retained in the mRNA “exons” and the sequences removed from it “introns” (Gilbert, 1978). The same year, the splicing hypothesis was supported by the demonstration that the β-globin gene is transcribed into a precursor RNA containing an intron (Tilghman et al., 1978).

1.1.2 Discovery of alternative splicing

Shortly after the discovery of RNA splicing, studies showed that a single pre-mRNA can produce several mature mRNAs with different combinations of exons (Berk and Sharp, 1978), and in 1982 alternative splicing was demonstrated for the first time in an endogenous gene, CALCA (Amara et al., 1982). The CALCA pre-mRNA contains six exons and gives rise to two isoforms: one containing exons 1-4, which encodes calcitonin, and another in which exon 4 is skipped, containing exons 1-3, 5 and 6, which encodes CGRP (Figure 2). Other genes, such as the immunoglobulin genes (Maki et al., 1981), were also shown to be alternatively spliced, and the list of alternatively spliced genes kept growing in the following years.

Figure 2: Representation of alternative splicing of the CALCA gene. In the thyroid, exons 5 and 6 are skipped, giving a mature RNA composed of four exons that encodes calcitonin. In the brain, a different set of exons is retained: exon 4 is skipped, giving a mature RNA of five exons that encodes CGRP (adapted from https://sciencecases.lib.buffalo.edu/collection/detail.html/?case_id=849&id=849).
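To make the idea of one pre-mRNA yielding several mature mRNAs concrete, the Python sketch below models the two CALCA isoforms described above as ordered subsets of the six exons. It is an illustrative toy, not a tool used in this work: the names `CALCA_EXONS`, `TISSUE_PROGRAMS` and `assemble` are hypothetical, and each tissue program is reduced to a set of retained exons, which is a simplification of how CALCA is actually processed.

```python
# Hypothetical toy model of CALCA alternative splicing (illustration only).
CALCA_EXONS = [1, 2, 3, 4, 5, 6]  # the six exons, in genomic order

# Exons retained in each tissue, following the description of Figure 2.
TISSUE_PROGRAMS = {
    "thyroid": {1, 2, 3, 4},   # exons 5 and 6 skipped -> calcitonin mRNA
    "brain": {1, 2, 3, 5, 6},  # exon 4 skipped -> CGRP mRNA
}


def assemble(exons: list[int], retained: set[int]) -> list[int]:
    """Return the retained exons in genomic order, as they are joined in the mature mRNA."""
    unknown = retained - set(exons)
    if unknown:
        raise ValueError(f"exons not present in the gene: {sorted(unknown)}")
    return [exon for exon in exons if exon in retained]


if __name__ == "__main__":
    for tissue, retained in TISSUE_PROGRAMS.items():
        print(tissue, assemble(CALCA_EXONS, retained))
```

Running it prints `thyroid [1, 2, 3, 4]` and then `brain [1, 2, 3, 5, 6]`: the same exon list yields two different mature transcripts depending on which exons each tissue retains.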


1.2 RNA splicing

In eukaryotes, nuclear coding genes are composed of coding sequences (exons) interspersed with non-coding sequences (introns). Introns are also found in some organelle genes, such as those of mitochondria or chloroplasts, and in certain bacterial genes, but in these cases they are removed by a mechanism different from that used for nuclear genes. There are three main types of RNA: the classical messenger RNA (mRNA), and ribosomal RNA (rRNA) and transfer RNA (tRNA) (Figure 3).

Figure 3: Role of the three different types of RNA. mRNAs represent 1% of the total RNA content and carry the genetic information present in DNA to the protein synthesis machinery. rRNAs are the most abundant RNA fraction, around 95% of the total RNA content, and are divided into four types (5S, 5.8S, 18S and 28S); they are found in ribosomes, which carry out protein synthesis. The third type, tRNAs, represents 5% of all RNA and brings amino acids to ribosomes during protein synthesis (adapted from http://csls-text3.c.u-tokyo.ac.jp/inactive/08_02.html).

RNA polymerase II transcribes DNA into a pre-messenger RNA (pre-mRNA), which is then processed into a mature, functional messenger RNA. Only after this processing is the mRNA exported to the cytoplasm for translation and protein synthesis. Three major steps are necessary for proper mRNA maturation: 5’ capping of one end, 3’ polyadenylation of the other end, and splicing of introns (Figure 4). Cap formation consists of the addition of a 7-methylguanosine group to the 5’ end of the pre-mRNA. The cap protects the RNA from 5’ ribonucleases and is essential for binding of the RNA to ribosomes for protein synthesis (Furuichi, 2015). Polyadenylation occurs at the 3’ end of the pre-mRNA and involves an enzymatic cleavage followed by the addition of adenine residues. A poly(A) signal sequence (5’-AAUAAA-3’) near the 3’ end of the transcript, followed by a GU-rich sequence further downstream, directs cleavage of the 3’ end and the subsequent addition of around 200 adenine residues to form the poly(A) tail. This tail is bound by poly(A)-binding proteins, which protect the RNA from 3’ ribonucleases (Zhao et al., 1999). The third step of RNA processing is RNA splicing, a process in which the non-coding regions of the transcript, called introns, are excised and the remaining exons are joined to produce an mRNA. Intron excision and exon ligation are performed by a large complex called the spliceosome, composed of proteins and small nuclear RNAs (snRNAs) (De Conti et al., 2013). Splicing can take place after complete synthesis and 5’ and 3’ maturation of the transcript, but many introns are removed co-transcriptionally, while the pre-mRNA is still being synthesized (Bentley, 2014).

Figure 4: Multi-step process of RNA maturation. Once transcribed, the pre-mRNA undergoes maturation steps that protect it from degradation and allow its export to the cytoplasm. The three processes involved are 5’ capping, 3’ polyadenylation and intron removal (adapted from http://csls-text3.c.u-tokyo.ac.jp/inactive/08_03.html).


There are two types of splicing: constitutive splicing and alternative splicing (Figure 5). Constitutive splicing concerns exons that are always included in the final mature transcript, whereas alternative splicing is a process in which specific exons can be either included in or skipped from the final transcript. Alternative splicing increases the number of mRNA isoforms produced from a single gene and, consequently, the number of proteins that can be translated (Black, 2003). The isoforms expressed depend on the cell type, differentiation state, physiological state or developmental stage (Modrek and Lee, 2002; Resch, 2004). Different splice sites in the pre-mRNA compete with one another, and the site that is selected determines the splicing outcome. Splice site selection depends on the strength of the site itself, the recruitment of specific splicing factors and the secondary structure of the RNA. Of the approximately 20,000 protein-coding genes in the human genome, around 95% of multi-exon genes are alternatively spliced, greatly expanding the number of proteins encoded by the genome (Pan et al., 2008).

Figure 5: Constitutive and alternative splicing. (A) Constitutive splicing corresponds to the removal of all introns and the inclusion of all exons. (B) In alternative splicing, by contrast, all introns are removed but only selected exons are included, resulting in different isoforms from the same gene (adapted from http://flax.nzdl.org/greenstone3/).

1.2.1 RNA splicing reaction

The splicing reaction is a two-step process based on two successive transesterification reactions (Saldanha et al., 1993). It involves three specific sequences: the 5’ splice site (donor site), the 3’ splice site (acceptor site) and the branch point (Figure 6). These sequences are highly conserved in yeast but degenerate in metazoans (Will and Luhrmann, 2011). The first step is a nucleophilic attack on the phosphate at the 5’ end of the intron, at the donor site. This attack is carried out by the 2’-OH of an adenosine residue in the branch point. It releases two intermediates: exon 1, and the intron lariat intermediate still attached to exon 2. In this lariat intermediate, the 5’ guanosine of the intron is linked to the branch point adenosine through a 2’-5’ phosphodiester bond. In the second step, the 3’-OH of exon 1 carries out a nucleophilic attack on the phosphate at the 3’ end of the intron, at the acceptor site. This ligates the two exons and releases the intron lariat, which is then debranched and degraded by cellular nucleases (Montemayor et al., 2014).
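The two transesterification steps can be followed as bookkeeping on strings. The Python sketch below is a deliberately simplified model. It tracks only which RNA pieces remain connected after each step, not the chemistry. The branch-point index is supplied by the caller rather than predicted, and all sequences are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LariatIntermediate:
    intron: str        # intron whose 5' G is joined (2'-5') to the branch A
    branch_index: int  # 0-based position of the branch-point A in the intron
    exon2: str = ""    # downstream exon, still attached after step 1


def step1(exon1: str, intron: str, exon2: str, branch_index: int):
    """First transesterification: the branch-point A 2'-OH attacks the 5' splice site.

    Returns the free exon 1 (ending in a 3'-OH) and the lariat-exon 2 intermediate.
    """
    if not intron.startswith("GU") or not intron.endswith("AG"):
        raise ValueError("expected a GU...AG intron")
    if intron[branch_index] != "A":
        raise ValueError("the branch-point nucleotide must be an adenosine")
    return exon1, LariatIntermediate(intron, branch_index, exon2)


def step2(exon1: str, intermediate: LariatIntermediate):
    """Second transesterification: the exon 1 3'-OH attacks the 3' splice site.

    Returns the ligated exons and the released intron lariat (exon 2 detached).
    """
    lariat = LariatIntermediate(intermediate.intron, intermediate.branch_index)
    return exon1 + intermediate.exon2, lariat


if __name__ == "__main__":
    # Hypothetical intron whose CUAACU motif places the branch A at index 9.
    free_exon1, intermediate = step1("CAG", "GUAAGUCUAACUUUCAG", "GCU", branch_index=9)
    mrna, lariat = step2(free_exon1, intermediate)
    print(mrna, lariat)
```

Representing the lariat as a separate object, rather than as a string, reflects the fact that it is no longer part of the linear RNA after step 2.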

Figure 6: Transesterification reactions during intron splicing. Schematic representation of the two-step splicing of pre-mRNAs. The 2’-OH of the branch point carries out a nucleophilic attack on the 5’ splice site, and in turn the 3’-OH at the 3’ end of the released exon carries out a nucleophilic attack on the 3’ splice site (adapted from Wongpalee and Sharma, 2014).


1.2.2 Intron definition

As mentioned above, intron splicing involves specific sequences that recruit the splicing machinery. These consensus sequences are found at the 5’ and 3’ ends of introns and are called the donor site and the acceptor site, respectively. The branch point lies upstream of the 3’ splice site and is important for the two transesterification steps (Figure 7).

Figure 7: Consensus sequences of major-class introns. Consensus sequences of the donor site, acceptor site and branch point of major-class introns. The size of each nucleotide letter represents its frequency at that position. Nucleotides in black are those involved in intron recognition (adapted from Patel and Steitz, 2003; and Will and Luhrmann, 2011).

a Donor site and acceptor site

Comparison of 5’ splice sites led to the identification of the consensus sequence AG|GURAGU (R: purine; |: exon/intron boundary). AG corresponds to the last two nucleotides of the upstream exon, and GU, the first two nucleotides of the intron, are the most conserved nucleotides of the 5’ splice site (Moore and Sharp, 1993). Mutations in either of these two nucleotides cause abnormal splicing, because the splicing machinery then recognizes a cryptic 5’ splice site, usually located close to the original donor site (Roca, 2003).

The acceptor site is located at the 3’ intron/exon junction and has the consensus sequence YAG|G (Y: pyrimidine; |: intron/exon boundary), in which the last G is the first nucleotide of the downstream exon. As for the 5’ splice site, mutations in the 3’ splice site strongly affect spliceosome recruitment and lead to splicing defects (Anna and Monika, 2018).
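The donor and acceptor consensus sequences can be turned into simple sequence searches by expanding the IUPAC codes R and Y. The Python sketch below makes one strong simplification: it requires an exact match to the consensus. Real splice sites are degenerate and are usually scored probabilistically, for example with position weight matrices, and this sketch does not attempt that. The test sequence is hypothetical.

```python
import re

# IUPAC codes used in the consensus sequences above (RNA alphabet).
IUPAC = {"A": "A", "C": "C", "G": "G", "U": "U", "R": "[AG]", "Y": "[CU]", "N": "[ACGU]"}

DONOR = ("AGGURAGU", 2)   # AG|GURAGU: boundary after the 2nd nucleotide
ACCEPTOR = ("YAGG", 3)    # YAG|G: boundary after the 3rd nucleotide


def consensus_to_regex(consensus: str) -> re.Pattern:
    """Translate an IUPAC consensus into a regex; the lookahead allows overlapping hits."""
    return re.compile("(?=" + "".join(IUPAC[base] for base in consensus) + ")")


def find_sites(seq: str, site: tuple[str, int]) -> list[int]:
    """Return the exon/intron boundary position of every exact consensus match."""
    consensus, boundary = site
    return [m.start() + boundary for m in consensus_to_regex(consensus).finditer(seq)]


if __name__ == "__main__":
    seq = "CAGGUAAGUCUAACUUUCAGGCU"  # hypothetical exon-intron-exon sequence
    print("donor boundaries:", find_sites(seq, DONOR))
    print("acceptor boundaries:", find_sites(seq, ACCEPTOR))
```

On this sequence the acceptor pattern matches twice. It finds the true intron/exon boundary at position 20, and it also finds the CAG|G at the start of the sequence. This illustrates why short, degenerate consensus sequences alone are not sufficient to define introns.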


b Branch point

The branch point has the consensus sequence CURACU (R: purine). It is typically found 20 to 40 nucleotides upstream of the 3’ splice site and is followed by a polypyrimidine tract (poly-Y). This sequence contains an adenine residue (in red, Figure 7), called the branching nucleotide, which becomes linked to the guanosine residue at the 5’ end of the intron (Gao et al., 2008).
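The positional constraint described above, a CURACU motif 20 to 40 nucleotides upstream of the 3’ splice site, can be expressed as a windowed search. In the Python sketch below, exact motif matching and the hard 20 to 40 nt bounds are simplifying assumptions. Real branch points often deviate from the consensus, and this sketch would miss them. The example intron is hypothetical.

```python
import re

# CURACU consensus (R = A or G); the branching adenosine is the 4th nucleotide.
BRANCH_MOTIF = re.compile(r"(?=CU[AG]ACU)")
BRANCH_A_OFFSET = 3


def candidate_branch_points(seq: str, acceptor: int,
                            min_dist: int = 20, max_dist: int = 40) -> list[int]:
    """Positions of branch-point A's lying min_dist..max_dist nt upstream of the
    3' splice site (`acceptor` = index of the first nucleotide of the downstream exon)."""
    window_start = max(0, acceptor - max_dist - BRANCH_A_OFFSET)
    hits = []
    # endpos=acceptor keeps the whole motif inside the intron.
    for m in BRANCH_MOTIF.finditer(seq, window_start, acceptor):
        a = m.start() + BRANCH_A_OFFSET
        if min_dist <= acceptor - a <= max_dist:
            hits.append(a)
    return hits


if __name__ == "__main__":
    # Hypothetical intron: donor, spacer, CUAACU, 20-nt poly-Y tract, CAG; then exon 2.
    intron = "GUAAGU" + "GGGGG" + "CUAACU" + "U" * 20 + "CAG"
    print(candidate_branch_points(intron + "GCU", acceptor=len(intron)))
```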

1.2.3 Spliceosomal complex

The spliceosome is a ribonucleoprotein complex that catalyzes the splicing reaction described above. It is composed of more than 200 proteins (Hegele et al., 2012) and several small nuclear ribonucleoproteins (snRNPs). Each snRNP consists of an snRNA (small nuclear RNA) combined with numerous associated proteins.

There are two types of spliceosomes. The major spliceosome is responsible for the removal of more than 99% of introns, and the minor spliceosome processes around 0.5% of introns (Turunen et al., 2013). The major spliceosome is composed of five snRNAs (U1, U2, U4, U5 and U6) in addition to non-snRNA factors. Introns processed by this spliceosome are called U2-type introns. The minor spliceosome has a distinct set of snRNAs with similar functions: U11, U12, U4atac and U6atac, which are analogues of U1, U2, U4 and U6, respectively. U5 snRNA is shared with the major spliceosome. Introns processed by the minor spliceosome are called U12-type introns.
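The correspondence between the snRNAs of the two spliceosomes can be captured in a small lookup table. This Python sketch only encodes the names listed in the text and says nothing about sequence, structure or mechanism.

```python
# snRNA composition of the major spliceosome, as listed above.
MAJOR = {"U1", "U2", "U4", "U5", "U6"}

# Each minor-spliceosome snRNA mapped to its functional analogue in the major one.
MINOR_TO_MAJOR_ANALOGUE = {
    "U11": "U1",
    "U12": "U2",
    "U4atac": "U4",
    "U5": "U5",      # shared by both spliceosomes
    "U6atac": "U6",
}
MINOR = set(MINOR_TO_MAJOR_ANALOGUE)

shared = MAJOR & MINOR                           # snRNAs used by both machineries
covered = set(MINOR_TO_MAJOR_ANALOGUE.values())  # major snRNAs that have an analogue
```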

Despite their different compositions, the major and minor spliceosomes rely on similar splicing mechanisms. We therefore only describe the major spliceosome in the following part.

a Spliceosome composition

The spliceosome is composed of two major components: snRNPs, which have their own biogenesis pathway, and non-snRNP proteins associated with the spliceosomal complex. snRNP biogenesis can be divided into four steps: transcription of an snRNA gene, co-transcriptional maturation, cytoplasmic export and maturation, and finally nuclear import and maturation (Gruss et al., 2017). snRNAs are small uridine-rich RNAs of around 100 to 300 nucleotides with a highly conserved secondary structure (Figure 8) (Branlant et al., 1982, 1983). With the exception of U6 snRNA, which is transcribed by RNA polymerase III, the snRNAs (U1, U2, U4 and U5) are transcribed by RNA polymerase II. A monomethylated cap is added at the 5’ end of the snRNA, and its 3’ end carries an extension of around 20 nucleotides. This first processing step leads to the cytoplasmic export of U1, U2, U4 and U5 snRNAs. In the cytoplasm, the SMN complex assembles each snRNA with seven Sm proteins (B/B’, D3, D2, D1, E, F and G) to form a pre-snRNP. A second maturation of the 5’ and 3’ ends of the pre-snRNP then leads to its nuclear import, after which snRNPs accumulate in Cajal bodies for further post-transcriptional modifications. These modifications are guided by small RNAs called snoRNAs/scaRNAs (Jady, 2001). Final snRNP assembly ends with the recruitment of more than 150 proteins specific to each snRNA.

Figure 8: snRNA secondary structures. Secondary structures of the snRNAs present in the major spliceosome (adapted from Will and Luhrmann, 2011).

During biogenesis, U6 snRNA is retained in the nucleus. Like the other snRNAs, it undergoes several maturation steps and associates with seven Lsm proteins (Vidal et al., 1999), but its post-transcriptional modifications occur in the nucleolus (Figure 9).

The spliceosome also requires many non-snRNP proteins for correct intron processing. These proteins are not all present in the spliceosome at the same time, and their recruitment depends on their function during spliceosome assembly. They are involved in fundamental processes such as recognition of the 5’ and 3’ splice sites or the structural organization of the different spliceosomal complexes. The regulated exchange of components between successive spliceosome states is crucial for spliceosome plasticity and for the splicing reaction itself (Table 1) (Wahl et al., 2009).


Figure 9: snRNAs biogenesis. Schematic representation of snRNAs U1, U2, U4, U5 and U6 biogenesis. U1 to U5 snRNAs are transcribed by RNA polymerase II and are processed the same way while U6 snRNA is transcribed by RNA polymerase III and has its own processing pathway (adapted from Kiss, 2004).

b Spliceosome assembly and catalytic activity

Spliceosome assembly is a sequential, multi-step process. It starts with recognition of the 5’ and 3’ splice sites and recruitment of the splicing machinery, which gives the pre-mRNA a specific spatial conformation for the subsequent splicing reaction. Studies have demonstrated the formation of five different complexes during pre-mRNA processing by the spliceosome (Figure 10) (Jurica et al., 2002).

These complexes, called E, A, B, C and P, correspond to the recruitment of different sets of snRNPs to the pre-mRNA. Once the transcript is synthesized by RNA polymerase II, it is bound by different proteins and U1 snRNP is recruited to the 5’ splice site. This first complex, called the E complex, initiates the dynamic assembly process. Because splice sites can be degenerate, their recognition is facilitated by the parallel recruitment of splicing factors such as SF1 and U2AF, which bind cis-regulatory sequences in the exon and/or intron (Rino et al., 2008). The A complex (pre-spliceosome) is then formed by the recruitment of U2 snRNP to the branch point, near the 3’ end of the intron.


Table 1: Composition and dynamics of the different spliceosomal complexes. Protein composition of the human A, B and C complexes determined by mass spectrometry. Proteins are grouped according to function, association with snRNPs or presence in a spliceosomal complex (adapted from Wahl et al., 2009).


Once the two ends of the intron are recognized by U1 and U2 snRNPs, the U4/U6.U5 tri-snRNP joins the pre-spliceosome to form the pre-B complex (Gottschalk et al., 1999). This complex undergoes conformational changes that initiate spliceosome activation (B complex, the pre-catalytic spliceosome). During activation, U1 and U4 snRNPs are released and U2 and U6 engage in a new interaction (Bact complex). An ATP-dependent conformational change and protein rearrangements then produce a catalytically active spliceosome (B* complex). The first splicing reaction gives rise to the C complex, which catalyzes the second splicing reaction after another ATP-dependent conformational change (C* complex). U5 snRNP interacts with the ends of both exons and holds them in close proximity for ligation, generating the mature mRNA, which is released from the post-splicing complex (P complex). The intron lariat remains bound to the intron lariat spliceosome (ILS complex). This complex is then disassembled, and the lariat is degraded by cellular nucleases (Yoshimoto et al., 2009).
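The succession of complexes described above can be summarized as a linear sequence of states annotated with their snRNP content. In this Python sketch, non-snRNP proteins (Table 1) are ignored. U1 release is placed at the pre-B to B transition and U4 release at the B to Bact transition, following the commonly described human pathway. These assignments should be read as a simplification rather than a precise model.

```python
# Simplified snRNP content of each spliceosomal complex (non-snRNP proteins omitted).
ASSEMBLY_PATHWAY = [
    ("E", {"U1"}),
    ("A", {"U1", "U2"}),
    ("pre-B", {"U1", "U2", "U4", "U5", "U6"}),
    ("B", {"U2", "U4", "U5", "U6"}),     # simplification: U1 leaves at pre-B -> B
    ("Bact", {"U2", "U5", "U6"}),        # U4 released; U2 and U6 now interact
    ("B*", {"U2", "U5", "U6"}),          # catalytically active
    ("C", {"U2", "U5", "U6"}),           # after the first transesterification
    ("C*", {"U2", "U5", "U6"}),          # after a further ATP-dependent rearrangement
    ("P", {"U2", "U5", "U6"}),           # after the second transesterification
    ("ILS", {"U2", "U5", "U6"}),         # intron lariat spliceosome, then disassembly
]


def transitions(pathway=ASSEMBLY_PATHWAY):
    """Yield (from_state, to_state, joined, released) for consecutive complexes."""
    for (s1, before), (s2, after) in zip(pathway, pathway[1:]):
        yield s1, s2, after - before, before - after


def snrnp_changes():
    """Keep only the transitions in which the snRNP content changes."""
    return [(s1, s2, sorted(joined), sorted(released))
            for s1, s2, joined, released in transitions() if joined or released]


if __name__ == "__main__":
    for change in snrnp_changes():
        print(change)
```

Listing only the transitions that change snRNP content highlights the point made in the text. Once the catalytic core is formed, later transitions are rearrangements within a fixed set of snRNPs.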

Figure 10: Kinetics of spliceosome assembly during the splicing reaction. Each splicing cycle involves spliceosomal complexes with different compositions (E, A, B, C and P). There are four stages: spliceosome assembly, activation, the splicing reaction, and spliceosome disassembly (adapted from Yan et al., 2019).


1.3 Alternative splicing

Alternative splicing was initially considered an exception, but it is now well established that it occurs in almost all genes. Forty years of research and the emergence of genome-wide sequencing have progressively shown that almost 95% of multi-exon genes are alternatively spliced (Barash et al., 2010; Pan et al., 2008). Alterations in alternative splicing can lead to serious genetic disorders.

1.3.1 Different alternative splicing events

Alternative splicing involves the use of different donor and acceptor sites, which gives rise to several types of alternative splicing events. The main types are cassette exons (skipped exons), mutually exclusive exons, intron retention, alternative 5’ or 3’ splice sites, and alternative promoters or termination sites (polyadenylation sites) (Figure 11).

a Cassette exons

Cassette exons represent around 35% of alternative splicing events and are the most common type in mammalian pre-mRNAs (Wang et al., 2008). These exons are either included in the mature mRNA or spliced out, and they often encode extra domains that give the isoforms different functions. For example, fibronectin (FN1) contains an extra domain A (EDA), encoded by the alternatively spliced EDA exon, which regulates cell adhesion and migration. In fibroblasts, the EDA exon is included, whereas in the liver it is skipped, generating a soluble FN1 isoform that is secreted into plasma (Baralle and Giudice, 2017). Expression of an isoform lacking the EDA exon in fibroblasts has been shown to decrease cell adhesion (Manabe et al., 1997).
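Inclusion of a cassette exon is commonly quantified from RNA-seq junction reads as a "percent spliced-in" (PSI) value. This metric is not used elsewhere in the text. The Python sketch below assumes one simple formulation: reads on the two inclusion junctions are averaged and compared with reads on the skipping junction. Read counts are hypothetical, and published tools apply further corrections (for example, for effective junction length) that are not reproduced here.

```python
def percent_spliced_in(upstream_inclusion: int, downstream_inclusion: int,
                       skipping: int) -> float:
    """Simple PSI for a cassette exon from junction read counts.

    upstream_inclusion:   reads spanning the upstream exon / cassette exon junction
    downstream_inclusion: reads spanning the cassette exon / downstream exon junction
    skipping:             reads spanning the upstream exon / downstream exon junction
    """
    # Averaging the two inclusion junctions puts both isoforms on a one-junction basis.
    inclusion = (upstream_inclusion + downstream_inclusion) / 2
    total = inclusion + skipping
    if total == 0:
        raise ValueError("no junction reads: PSI is undefined")
    return inclusion / total


if __name__ == "__main__":
    # Hypothetical counts for an exon that is mostly included.
    print(percent_spliced_in(30, 30, 10))
```

Under this formulation, an exon that is always included, as described for the EDA exon in fibroblasts, would be expected to approach a PSI of 1. An exon that is skipped, as in the liver, would approach 0.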

b Mutually exclusive exons

In this case, alternatively spliced exons are never present together in the mature mRNA, even though each has functional 5’ and 3’ splice sites. FGFR2, a transmembrane receptor tyrosine kinase of the fibroblast growth factor receptor family, has two mutually exclusive alternative exons, IIIb and IIIc. Both encode part of the third extracellular immunoglobulin-like domain of FGFR2. These two exons undergo tissue-specific alternative splicing (Carstens et al., 2000; Warzecha et al., 2009). Exon IIIb is predominantly included in epithelial cells, whereas exon IIIc is restricted to mesenchymal cells, resulting in different affinities for FGF ligands and different downstream effects on differentiation and mitogenesis.
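The difference between cassette exons and mutually exclusive exons lies in which exon combinations are allowed. The Python sketch below enumerates all combinations allowed by a simple gene model. The exon names "upstream" and "downstream" are placeholders, not real FGFR2 exon names, and the model ignores the tissue-specific regulation that determines which isoform is actually produced.

```python
from itertools import product


def enumerate_isoforms(structure):
    """List every exon combination allowed by a simple gene model.

    Each element of `structure` is one of:
      ("constitutive", name)        - always included
      ("cassette", name)            - included or skipped
      ("exclusive", (name1, name2)) - exactly one exon of the group is included
    """
    choices = []
    for kind, value in structure:
        if kind == "constitutive":
            choices.append([(value,)])
        elif kind == "cassette":
            choices.append([(value,), ()])
        elif kind == "exclusive":
            choices.append([(exon,) for exon in value])
        else:
            raise ValueError(f"unknown exon type: {kind}")
    # Cartesian product of the per-position choices, flattened into exon tuples.
    return [tuple(exon for part in combo for exon in part) for combo in product(*choices)]


if __name__ == "__main__":
    fgfr2_like = [("constitutive", "upstream"),
                  ("exclusive", ("IIIb", "IIIc")),
                  ("constitutive", "downstream")]
    for isoform in enumerate_isoforms(fgfr2_like):
        print(isoform)
```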

Figure 11: Different modes of alternative splicing. Representation of seven different modes of alternative splicing. Constitutive exons are represented in blue and alternative exons in orange and green (adapted from Kim et al., 2018).

c Alternative 5’ and 3’ splice sites

Exons can be shorter or longer depending on the use of alternative splice sites at their 5’ or 3’ ends. These sites can compete with each other, and the selection of one or the other leads to the expression of different isoforms in which part of the exon is either included in or excluded from the mature mRNA (Koren et al., 2007). For example, the E6 region of human papillomavirus type 16 (HPV16) contains one 5’ splice site and two competing 3’ splice sites, leading to the production of two mRNAs, E6*I and E6*II (Ajiro and Zheng, 2014).
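Competition between two 3’ splice sites paired with a single 5’ splice site can be illustrated by splicing the same pre-mRNA with each acceptor in turn. In the Python sketch below, the sequence and coordinates are hypothetical and do not correspond to the HPV16 genome. The only sanity checks applied are the GU and AG dinucleotides at the intron ends.

```python
def alternative_3ss_products(pre_mrna: str, donor: int, acceptors: list[int]) -> dict[int, str]:
    """Splice one donor with each competing acceptor.

    donor:     index of the first intron nucleotide (the intron must start with GU)
    acceptors: indices of the first downstream exon nucleotide (the intron must end in AG)
    """
    if pre_mrna[donor:donor + 2] != "GU":
        raise ValueError("donor site does not start with GU")
    products = {}
    for acceptor in acceptors:
        if pre_mrna[acceptor - 2:acceptor] != "AG":
            raise ValueError(f"acceptor at {acceptor} is not preceded by AG")
        products[acceptor] = pre_mrna[:donor] + pre_mrna[acceptor:]
    return products


if __name__ == "__main__":
    # Hypothetical pre-mRNA with one donor (index 4) and two acceptors (indices 15 and 23).
    pre = "CCCC" + "GUAAGU" + "UUCAG" + "AAA" + "UUCAG" + "GGGG"
    for acceptor, mrna in alternative_3ss_products(pre, 4, [15, 23]).items():
        print(acceptor, mrna)
```

The two products differ only by the segment between the two acceptors. Using the proximal acceptor produces the longer mRNA, and using the distal acceptor produces the shorter one.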

d Intron retention

In rare cases, introns are not spliced out in the nucleus and remain in the mature transcript, which is exported to the cytoplasm and translated. This phenomenon is mainly tissue-specific and results from defective recognition of the intron splice sites. Transcripts retaining an intron are often rapidly degraded via nonsense-mediated decay (NMD). Granulocytes use this mechanism to downregulate dozens of genes during their differentiation (Wong et al., 2013). In pluripotent stem cells, intron retention is an essential mechanism controlling self-renewal and differentiation (Tahmasebi et al., 2016).
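A common way in which a retained intron marks a transcript for NMD is by introducing an in-frame premature stop codon. The Python sketch below only shows that first step. It reads codons from the start codon and reports the first stop codon. It does not model the rules that decide whether NMD is actually triggered, and all sequences are hypothetical.

```python
from typing import Optional

STOP_CODONS = {"UAA", "UAG", "UGA"}


def first_in_frame_stop(mrna: str, start: int = 0) -> Optional[int]:
    """Index of the first in-frame stop codon read from `start`, or None."""
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            return i
    return None


# Hypothetical gene: the retained intron contains an in-frame UGA.
exon1, intron, exon2 = "AUGGCC", "GUUUGAGUCUAACUUUCAG", "GCUGCAUAA"
spliced = exon1 + exon2             # normally spliced isoform
retained = exon1 + intron + exon2   # intron-retention isoform

if __name__ == "__main__":
    print("spliced stop at:", first_in_frame_stop(spliced))
    print("retained stop at:", first_in_frame_stop(retained))
```

In the spliced isoform, the first stop codon is the normal one at the end of exon 2. In the intron-retention isoform, translation would stop early, inside the intron.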

e Alternative promoters and polyadenylation sites

Initial or terminal exons can be absent from the mature transcript. Transcription can initiate at alternative promoters, giving rise to different mRNAs. In the same way, cleavage and polyadenylation at alternative polyadenylation (APA) sites can generate shorter or longer transcripts. In Arabidopsis thaliana, the OXT6 gene produces two different proteins through alternative polyadenylation. An APA site downstream of exon 2 produces an mRNA encoding the AtCPSF30 protein, while the longer AtC30Y isoform is produced by use of the terminal APA site (Li et al., 2017).
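How the choice between a proximal and a distal polyadenylation site changes transcript length can be sketched as follows. This Python example assumes that every AAUAAA is a usable signal and that cleavage occurs a fixed, hypothetical distance (`offset`) downstream of it. Both are simplifications, since real site usage depends on additional sequence elements and on the cleavage and polyadenylation machinery. The test sequence is invented.

```python
import re

POLYA_SIGNAL = "AAUAAA"


def apa_isoforms(pre_mrna: str, offset: int = 15) -> list[str]:
    """One transcript per AAUAAA signal, cleaved `offset` nt downstream of it.

    `offset` is a hypothetical fixed distance used for illustration only.
    """
    # The lookahead finds overlapping occurrences of the signal.
    sites = [m.start() for m in re.finditer(f"(?={POLYA_SIGNAL})", pre_mrna)]
    return [pre_mrna[:site + len(POLYA_SIGNAL) + offset] for site in sites]


if __name__ == "__main__":
    pre = "AUG" + "C" * 10 + "AAUAAA" + "G" * 30 + "AAUAAA" + "U" * 20
    print([len(t) for t in apa_isoforms(pre)])  # proximal site first, then distal
```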

1.3.2 Regulatory elements

The mRNAs produced by the splicing machinery depend on the selection of 5’ and 3’ splice sites. In eukaryotes, this selection results from the combined effect of multiple factors. On the one hand, there are specific sequences in introns or exons, called cis-regulatory elements. On the other hand, there is the recruitment of several splicing regulators, called trans-regulatory elements. The combination of these two processes makes splicing a highly accurate yet complex mechanism (Ghigna et al., 2008).


a Cis regulatory elements

The use of alternative 5’ or 3’ splice sites requires the recruitment of specific splicing factors. These factors bind the exon itself or a flanking intron by associating with short, specific regulatory sequences called cis-acting splicing regulatory elements (SREs), and thereby promote or inhibit spliceosome recruitment. Like 5’ and 3’ splice sites, SRE sequences are degenerate.

There are different types of SREs depending on their location and function. Elements with splicing silencer activity are called ESSs (exonic splicing silencers) when located in exons, or ISSs (intronic splicing silencers) when located in introns. Conversely, elements with splicing enhancer activity are called ESEs (exonic splicing enhancers) when located in exons, or ISEs (intronic splicing enhancers) when located in introns (Blencowe, 2006; Wang and Burge, 2008). Splicing factors binding to these elements (trans-regulatory elements) can modulate the recruitment of the U1 and U2 snRNPs of the spliceosome (Figure 12).

Enhancer and silencer elements can be found near the same splice site, where they compete to regulate splicing. Cis-regulatory elements are bound by two major families of splicing factors: HNRNP proteins, which preferentially associate with silencer sequences, and SR proteins, which mainly bind enhancer elements. Other factors such as MBNL1, NOVA, RBFOX2 and PTBP1 are also involved in alternative splicing. The splicing outcome depends on the balance between the silencing and enhancing regulators involved (Chen, 2015; Merkin et al., 2012).
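The naming scheme for SREs and the idea of a "balance" between enhancers and silencers can be summarized in a few lines of Python. The additive scoring below is a hypothetical toy with arbitrary weights, not a model used in the literature cited here. Real splicing decisions are generally not additive, and the same factor can act as an enhancer or a repressor depending on where it binds.

```python
def sre_acronym(location: str, effect: str) -> str:
    """ESE, ESS, ISE or ISS from the element's location and activity."""
    prefix = {"exon": "E", "intron": "I"}[location]
    suffix = {"enhancer": "SE", "silencer": "SS"}[effect]
    return prefix + suffix


def toy_outcome(bound_elements: list[tuple[str, float]]) -> str:
    """Hypothetical additive 'balance' model: enhancers add, silencers subtract.

    bound_elements: (acronym, weight) pairs for elements occupied by a factor.
    The weights are arbitrary illustration values, not measured quantities.
    A tie is treated as skipping.
    """
    score = 0.0
    for acronym, weight in bound_elements:
        # ESE/ISE end in "SE" (enhancers); ESS/ISS end in "SS" (silencers).
        score += weight if acronym.endswith("SE") else -weight
    return "inclusion" if score > 0 else "skipping"


if __name__ == "__main__":
    print(sre_acronym("exon", "enhancer"), sre_acronym("intron", "silencer"))
    print(toy_outcome([("ESE", 1.0), ("ISS", 0.5)]))
```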

Figure 12: Regulatory elements of alternative splicing. Splicing outcome is regulated by cis-acting splicing regulatory elements (SREs) and trans-acting splicing regulatory elements. SREs can be present in exonic or intronic regions and have either enhancing or silencing properties (ESE, ESS: exonic enhancers and silencers; ISE, ISS: intronic enhancers and silencers). These sequences recruit specific splicing factors that inhibit or promote recognition of nearby splice sites (adapted from Matera and Wang, 2014).


b Trans regulatory elements

Splicing factors can be divided into two types according to their effect on alternative splicing. The first type enhances inclusion of alternatively spliced exons, whereas the second silences splicing events, leading to exon skipping. They are called activators and repressors, respectively. These factors can be ubiquitously expressed, cell-type specific or regulated during development.

Serine/arginine-rich proteins (SR proteins) are ubiquitously expressed. They include SF2/ASF (SRSF1), SC35 (SRSF2), SRp20 (SRSF3), 9G8 (SRSF7), SRp30c (SRSF9) and SRp35 (SRSF12) (Shepard and Hertel, 2009). SR proteins can interact with other SR proteins or directly with RNA to regulate alternative splicing. Their main function, however, relies on their interaction with spliceosomal components such as U1 and U2 snRNPs at donor and acceptor sites, and these interactions preferentially lead to exon inclusion. SR proteins contain one or two RNA-binding domains (RNA recognition motifs, RRMs) and a domain of repeated arginine/serine dipeptides (RS domain) that is important for their activity (Figure 13). In addition to their role in splicing, SR proteins are known to participate in other processes such as RNA degradation, transcription and translation, and they are deregulated in numerous cancers (da Silva et al., 2015).

Another family of ubiquitously expressed splicing factors is the HNRNPs (heterogeneous nuclear ribonucleoproteins). There are more than twenty HNRNP proteins, including HNRNPA, C, D, F, H, K and M, and they all carry RNA-binding motifs such as RRMs or KH (K-homology) motifs (Figure 13). Although HNRNPs can affect splicing positively (Expert-Bezancon et al., 2002; Hofmann and Wirth, 2002) or negatively, they preferentially act as repressors. These splicing factors bind specific RNA sites and, in doing so, mask nearby splicing enhancers. This blocks recruitment of an active spliceosome and thus exon inclusion (Martinez-Contreras et al., 2007). For instance, HNRNPI (also called PTB, for polypyrimidine tract-binding protein) has a high affinity for the polypyrimidine tract upstream of the 3’ splice site. This binding interferes with U2AF binding and thus affects alternative splicing (Singh et al., 1995).
Like SR proteins, HNRNPs also regulate other processes such as transcription, mRNA maturation, translation and telomere length maintenance (Naarmann-de Vries et al., 2016; Singh and Lakhotia, 2016).


Figure 13: SR and HNRNP proteins families. Different examples of SR proteins and HNRNP proteins. SR proteins mainly bind to splicing enhancer elements while HNRNP proteins mostly bind to splicing silencer elements (adapted from Mueller and Hertel, 2011; and Ustaoglu et al., BioRxiv).

SR and HNRNP proteins are mostly ubiquitously expressed, but many other splicing factors are cell-type or tissue specific. For example, the brain-specific NOVA-1, NOVA-2, RBFOX1 and nPTB play key roles in alternative splicing in the brain (Gehman et al., 2011; Irimia and Blencowe, 2012). Two other well-known examples are ESRP1 and ESRP2. These are essential splicing regulators in epithelial cells that are completely shut down in mesenchymal cells, and they can either inhibit or promote splicing events depending on where they bind on the RNA (Warzecha et al., 2009).


1.4 Deregulation of alternative splicing in cancer

Alterations in alternative splicing are strongly associated with numerous pathologies such as genetic disorders and cancer (Daguenet et al., 2015; Venables, 2004). Thanks to genome-wide studies, many abnormal splicing transcripts have been identified in cancers. They can result from mutations in splicing factors involved in the regulation of alternative splicing, called “trans anomalies”, or from direct mutations in splice sites or regulatory elements, called “cis anomalies” (El Marabti and Younis, 2018). The functions of many splicing variants in cancer are still poorly understood. However, some have been shown to have pro-tumoral or anti-tumoral properties in carcinogenesis and to play key roles in therapeutic resistance (Wang and Lee, 2018).

1.4.1 Alteration of alternative splicing programs in cancer

a Alternatively spliced genes in cancer

The emergence of high-throughput sequencing has made it possible to identify a large set of transcripts differentially expressed in various cancer types compared with healthy tissues (The Cancer Genome Atlas Research Network, 2014). Alternative splicing is an important process for cancer cells and is considered a hallmark of cancer, like metastasis formation or resistance to cell death. Interestingly, these hallmarks are themselves influenced by mis-splicing (Ladomery, 2013; Naro et al., 2015).

Alternative splicing can lead to the expression of tumor-associated variants involved in cancer hallmarks that promote tumorigenesis, and some examples are listed in Table 2. Moreover, it has recently been shown at the genome-wide level that alternative splicing alterations in cancer, called cancer-associated splicing changes (CASCs), have a functional impact. They modify functional protein domains and thereby affect important processes such as protein-protein interactions (Climente-González et al., 2017). These CASCs may be directly involved in the appearance of new oncogenic isoforms.

As mentioned above, two major mechanisms contribute to aberrant alternative splicing in cancer: trans anomalies and cis anomalies. Well-known examples of cis anomalies involve the tumor suppressor genes BRCA1 and BRCA2 (breast cancer 1 and 2), which are important markers of ovarian and breast cancers. A point mutation in an ESE element of BRCA1 exon 18 disrupts the binding site of SRSF1, impairing splicing and leading to exon skipping (Mazoyer et al., 1998).

Table 2: Examples of abnormal transcripts in various cancers. Representation of different tumor-associated isoforms related to cancer hallmarks with their isoform structures, expression patterns in different tumor types, experimental evidence and associated splicing regulators (adapted from Urbanski et al., 2018).


b Splicing factors in cancer

Splicing factors are central to the regulation of alternative splicing, and their alteration can be a major driver of tumorigenesis. Mutations or changes in their expression lead to a global transcriptomic reprogramming. This reprogramming can be responsible for the transformation of normal cells into cancer cells, or it can make cancer cells more aggressive (Anczuków and Krainer, 2016). Many splicing factors have been shown to be deregulated in numerous cancers (Table 3).

Table 3: Examples of splicing factors deregulated in various cancers. Representation of different classes of splicing factors, including the two predominant SR and HNRNP families, and the associated solid tumors in which they are deregulated (adapted from Urbanski et al., 2018).

HNRNP proteins are known to be involved in cancer. For example, HNRNPA1 is overexpressed in hepatocellular carcinoma (HCC) and is responsible for increased inclusion of variant exon v6 in CD44, leading to the expression of a new isoform involved in metastasis formation (Loh et al., 2015). Like HNRNPs, SR proteins are deregulated in some cancers. SRSF5 is overexpressed in breast cancer (Huang et al., 2007), and in chronic myelomonocytic leukemia (CMML), around 50% of patients carry a mutation in the splicing factor SRSF2 (Meggendorfer et al., 2012). This mutation (at the Pro95 residue) affects the ability of SRSF2 to bind ESE regulatory elements, changing the splicing outcome of hundreds of genes. One of these genes is EZH2 (Enhancer of Zeste Homolog 2), an H3K27 methyltransferase. The abnormal splicing of EZH2 in turn leads to a global loss of H3K27me3 (Kim et al., 2015), suggesting a link between epigenetics and alternative splicing. Splicing factors can also be downregulated in cancer: RBFOX2 is repressed in ovarian and breast cancers and is associated with many abnormal alternative splicing events (Venables et al., 2009). Finally, some deregulations affect spliceosomal components. In non-small cell lung cancer (NSCLC), the spliceosomal proteins U2AF1 and RBM10 are both mutated, and around 3% of lung adenocarcinomas carry mutations in U2AF1 (Imielinski et al., 2012).

1.4.2 Oncogenic and tumor suppressor functions of AS variants

Many functional studies of alternative splicing in cancer and tumorigenesis have identified oncogenic and tumor-suppressor splicing isoforms. These studies have highlighted genes that encode isoforms with different biological functions. As mentioned above, abnormal splicing can result from two types of events: cis anomalies, corresponding to mutations in regulatory sequences, and trans anomalies, resulting from aberrant expression or function of splicing factors. One of the first examples described is the BCL2L1 gene, which encodes proteins of the BCL2 family. More precisely, it encodes two BCL-X isoforms, BCL-XS and BCL-XL. These two variants have antagonistic functions and are generated by the use of alternative 5’ splice sites in exon 2. BCL-XS is pro-apoptotic, whereas BCL-XL is anti-apoptotic. Their expression depends on several splicing factors such as SRSF2 and HNRNPK (Kędzierska and Piekiełko-Witkowska, 2017; Merdzhanova et al., 2008). FAS is another example of an alternatively spliced gene giving rise to two antagonistic isoforms involved in apoptosis (Miura et al., 2012). The vascular endothelial growth factor A (VEGF-A) gene encodes proteins involved in angiogenesis and gives rise to two transcripts with opposite functions. The VEGF-A165b isoform is anti-angiogenic, while VEGF-A165 is pro-angiogenic and highly expressed in cancer cells (Bates et al., 2002; Bonnal et al., 2012).


Moreover, many other genes strongly involved in tumorigenesis, such as those encoding the androgen receptor (AR) and TP53, produce different alternative splicing variants with distinct effects in cancer. This makes alternative splicing a major regulator of cancer progression and a promising therapeutic target (Chen and Weiss, 2015).

1.4.3 Therapeutic strategies based on splicing variants

The identification of splicing variants specifically associated with cancer has highlighted their potential as biomarkers. If a splicing variant is expressed only in one tumor tissue, it could be a highly useful tool for predicting cancer progression and for diagnosis and prognosis, provided that detection methods are sufficiently specific and sensitive (Martinez-Montiel et al., 2018; Pajares et al., 2007). One of the most widely used and best-known biomarkers is CD44, a gene encoding a transmembrane glycoprotein involved in cellular processes such as cell survival, migration and proliferation (Zöller, 2011). CD44 has ten alternatively spliced exons in its coding sequence, giving rise to a plethora of isoforms. Overexpression of the CD44 v6 variant is associated with poor prognosis in gastric cancer (Fang et al., 2016). In pancreatic cancer, expression of the CD44 v10 isoform correlates with anti-metastatic properties (Navaglia et al., 2003).
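The "plethora of isoforms" produced by CD44 can be given an order of magnitude with a short calculation. Suppose each of the ten variable exons behaved as an independent cassette exon. Then the number of possible exon combinations would be the sum of the binomial coefficients, which equals 2 to the power of 10. Independence is an assumption made only for this calculation. The result is a theoretical upper bound, not the number of CD44 isoforms actually detected, which the text does not state.

```python
from math import comb

N_VARIABLE_EXONS = 10  # CD44 variable exons, as stated above

# Number of combinations containing exactly k variable exons
# (k = 0 corresponds to the standard isoform lacking all variable exons).
by_size = {k: comb(N_VARIABLE_EXONS, k) for k in range(N_VARIABLE_EXONS + 1)}
total = sum(by_size.values())  # equals 2 ** N_VARIABLE_EXONS under independence

if __name__ == "__main__":
    print(by_size)
    print("theoretical maximum:", total)
```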

In addition to their role as cancer biomarkers, a better understanding of alternative splicing mechanisms and the identification of new variants have led to the development of therapeutic strategies targeting these alternative transcripts (Figure 14). The first approach consists of using antibodies conjugated to tumor-cell toxins and directed against splicing variants specifically expressed in cancer cells (Figure 14.A). This strategy has been applied in head and neck cancers with antibodies that only recognize the CD44v6 variant (Colnot et al., 2003; Verel et al., 2002). A second strategy consists of modifying alternative splicing by targeting upstream regulators. For instance, drugs have been designed to modulate the activity of kinases that phosphorylate SR-family splicing factors, such as SRPK1, although these drugs have massive pleiotropic effects (Figure 14.B) (Batson et al., 2017). Cis-regulatory elements are key sequences in splicing regulation, and targeting them with synthetically modified oligonucleotides masks splice sites, preventing the recruitment of trans-acting splicing factors and decreasing the quantity of cancer-related transcripts (Figure 14.C) (Havens and Hastings, 2016). However, this strategy is very costly and difficult to establish for each splicing event. Finally, antisense RNAs, such as small interfering RNAs that trigger mRNA degradation, can be designed to specifically recognize unique sequences such as oncogenic mRNAs (Figure 14.D) (Gaur, 2006). In prostate cancer, antisense RNA targeting KLF6 SV1, an isoform of the KLF6 gene, reduces tumor growth by approximately 50% and decreases the expression of many growth- and angiogenesis-related proteins (Narla et al., 2005). The development of more straightforward therapeutic strategies that affect the splicing events of interest more efficiently and specifically will be of great importance to improve current methods and increase patient survival and prognosis.

Figure 14: Therapeutic strategies based on alternative splicing targeting. Different strategies based on alternative splicing are used for cancer treatment. (A) Monoclonal antibodies targeting specific epitopes of a cancer-associated protein. (B) Drugs inhibiting splicing factors. (C) Antisense oligonucleotides binding to regulatory elements and favoring expression of normal variants. (D) Degradation of a specific transcript by RNA interference (adapted from Ghigna et al., 2008).


Chapter 2: The relationship between chromatin and splicing

2.1 Chromatin

2.1.1 Discovery of chromatin

In 1880, Walther Flemming observed mitotic cells under the microscope and identified chromatin for the first time. Histones, the major protein components of chromatin, were identified four years later by Albrecht Kossel. In-depth functional and structural studies of chromatin only appeared in the 20th century. Deoxyribonucleic acid (DNA), the other major component of chromatin, was shown in bacteria to be the carrier of heredity. This discovery was initially controversial because the scientific community was convinced that genes were made of proteins and that heredity was transmitted by them. In 1950, DNA was described as a macromolecule composed of four different nucleotides (A, T, C, G), and its double helix structure was characterized in 1953 by James Watson and Francis Crick (Figure 15) (Watson and Crick, 1953), who received the 1962 Nobel Prize in Physiology or Medicine for this discovery. More recent studies of the molecular structure of chromatin have shown that DNA wraps around histone cores to form particles called nucleosomes, while other studies have focused on the dynamic architecture of chromatin and its functions (Olins and Olins, 2003).

2.1.2 Chromatin structure

Chromatin consists of about one-third nucleic acids (DNA and RNA) and two-thirds proteins. Histones account for about half of these proteins, the rest being “non-histone” proteins. The main characteristic of chromatin is its structural organization: its primary function is to compact DNA into the limited volume of the nucleus. Because of its composition and structure, chromatin is also involved in other processes such as protection against DNA damage, DNA replication and the control of gene expression.


Figure 15: Double helix structure of DNA. The four nucleotides pair through hydrogen bonds: A (adenine) with T (thymine), and C (cytosine) with G (guanine). The backbones (grey) are antiparallel, so that the 5’ end of each strand lies opposite the 3’ end of the other (adapted from Leslie A. Pray, 2008).
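The pairing rules and antiparallel orientation described in Figure 15 can be expressed as a short Python sketch: the partner strand is obtained by complementing each base and, because the strands run in opposite directions, reversing the result to read it 5’ to 3’. The function names are hypothetical helpers, not part of any library.

```python
# Watson-Crick pairing rules: A pairs with T, C pairs with G.
PAIRS = str.maketrans("ACGTacgt", "TGCAtgca")


def complement(seq: str) -> str:
    """Base-by-base complement, kept in the same left-to-right order."""
    return seq.translate(PAIRS)


def reverse_complement(seq: str) -> str:
    """Partner strand written 5'->3': complement, then reverse,
    because the two strands are antiparallel."""
    return complement(seq)[::-1]


top = "ATGCGT"                     # written 5'->3'
print(complement(top))             # TACGCA (partner strand read 3'->5')
print(reverse_complement(top))     # ACGCAT (partner strand read 5'->3')
```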

a Nucleosome

In eukaryotes, the basic structural unit of chromatin, the nucleosome, is composed of a nucleosome core particle and an inter-nucleosomal (linker) DNA region bound by the linker histone H1. The core particle consists of two elements: a histone octamer and about 147 bp of DNA wrapped around it in 1.7 turns (Luger et al., 1997). Four types of histones form the octamer, H2A (14 kDa), H2B (14 kDa), H3 (15 kDa) and H4 (11 kDa), and they assemble hierarchically: a central (H3-H4)2 tetramer is associated with two peripheral (H2A-H2B) dimers through interactions between H4 and H2B (Figure 16) (Arents et al., 1991). Finally, a fifth type of histone, the linker histone H1, binds about 20 bp of DNA where the linker DNA enters and exits the nucleosome, and is important for nucleosome stability and chromatin compaction (McGhee and Felsenfeld, 1980).
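The figures given above for the core particle can be combined in a short back-of-the-envelope calculation. This Python sketch only reuses the rounded values from the text (histone masses, two copies of each histone, 147 bp, 1.7 turns); it gives the approximate mass of the octamer and the length of DNA per superhelical turn.

```python
# Approximate masses (kDa) of the four core histones, as given in the text.
CORE_HISTONE_KDA = {"H2A": 14, "H2B": 14, "H3": 15, "H4": 11}
COPIES_PER_OCTAMER = 2        # two copies of each core histone
CORE_DNA_BP = 147             # DNA wrapped around the octamer
SUPERHELICAL_TURNS = 1.7

octamer_kda = COPIES_PER_OCTAMER * sum(CORE_HISTONE_KDA.values())
bp_per_turn = CORE_DNA_BP / SUPERHELICAL_TURNS

print(octamer_kda)              # 108
print(round(bp_per_turn, 1))    # 86.5
```

These are rounded estimates; they are meant to show the order of magnitude, not precise biochemical values.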


Figure 16: Nucleosome composition. The nucleosome contains two molecules of each core histone (H2A, H2B, H3 and H4), assembled into two (H2A-H2B) heterodimers and one (H3-H4)2 heterotetramer. DNA is then wrapped around the complete histone octamer (adapted from Henri Norman, 2018).

The C-terminal and N-terminal ends of histone proteins, which flank the “histone-fold” domain, protrude outside the nucleosome (Davey et al., 2002). The N-terminal tails do not contribute to the nucleosome structure, and their accessibility makes them prime targets for post-translational modifications (PTM) (Kouzarides, 2007). These modifications are essential for functions such as the regulation of chromatin structure and can be recognized by specific factors.

b Chromatin higher-order structure

The human cell nucleus has a diameter of 10-12 µm and contains about 6 billion base pairs (bp) of DNA, i.e. two copies of the ~3-billion-bp haploid genome. Laid end to end, all of these DNA molecules would form a 2 nm fiber about 2 meters long (Figure 17.A). To solve this physical problem, DNA is compacted as much as possible through several successive levels of chromatin organization.
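The "2 meters" figure can be checked with a quick calculation. The Python sketch below assumes a rise of 0.34 nm per base pair for B-form DNA and a diploid content of about 6 billion bp (two copies of the ~3-billion-bp haploid genome); it then compares this length with the lower bound of the nuclear diameter quoted in the text.

```python
RISE_PER_BP_NM = 0.34       # helical rise of B-form DNA per base pair (assumed)
DIPLOID_BP = 6e9            # ~2 x 3e9 bp, two copies of the haploid genome
NUCLEUS_DIAMETER_UM = 10    # lower end of the 10-12 um range in the text

length_m = DIPLOID_BP * RISE_PER_BP_NM * 1e-9              # nm -> m
length_over_diameter = length_m / (NUCLEUS_DIAMETER_UM * 1e-6)

print(f"{length_m:.2f} m")               # 2.04 m
print(f"{length_over_diameter:.1e}")     # 2.0e+05
```

The DNA is therefore roughly 200,000 times longer than the nucleus is wide, which is why several compaction levels are needed.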


The first level of compaction consists of nucleosomes linked to each other by inter-nucleosomal DNA regions of a few dozen base pairs. This concatenation of nucleosomes forms an 11 nm structure called the “beads on a string” form of chromatin (Figure 17.B). A second compaction step of this nucleofilament occurs through specific interactions between nucleosomes, giving rise to a 30 nm fiber (Figure 17.C). Two in vitro models have been proposed for this fiber, the solenoid and the zigzag, which differ in which neighboring nucleosomes interact with each other (Razin and Gavrilov, 2014). Both structures could exist in the nucleus and would correspond to different functional states of chromatin (Pachov et al., 2011). The 30 nm fiber is then extended and folded into loops (Figure 17.D) along a protein scaffold called the nuclear matrix, reaching a diameter of 300 nm. The last level of compaction consists of the folding of these loops into the metaphase chromosomes observed during mitosis (Figure 17.E and F). This 1400 nm-wide fiber is the most condensed form of chromatin.

Despite this extreme level of compaction, chromatin dynamics regulate a plethora of genomic functions, and DNA must remain accessible for processes such as DNA repair, replication, transcription and cell division to proceed properly.

c Euchromatin and heterochromatin

Chromatin was initially divided into two types, euchromatin and heterochromatin, a distinction originally based on their different staining intensities.

Euchromatin can be defined as a decondensed, low-density structure associated with transcriptionally active genes. Some histone modifications are preferentially associated with euchromatin, such as acetylation of H3 and H4 (H3ac and H4ac) (Grunstein, 1997) or methylation of histone H3 lysine 36 (H3K36me) (Wagner and Carpenter, 2012).


Figure 17: Different levels of DNA compaction. The basal level of compaction is the nucleosome. (A) DNA is initially represented as a 2 nm double helix. (B) Nucleosomes linked by DNA form an 11 nm structure called “beads on a string”. (C) This structure coils on itself to form a 30 nm chromatin fiber, which extends and folds to form loops (D). Finally, these loops are condensed to form mitotic chromosomes (E, F), the most condensed form of chromatin (adapted from Felsenfeld and Groudine, 2003; and Lia, 2005).

Conversely, heterochromatin is a denser, more compact structure whose compaction remains the same throughout the cell cycle. While euchromatin corresponds to transcriptionally active genes, heterochromatin is mainly associated with repressed genes. Different histone modifications correlate with this chromatin state, such as trimethylation of histone H3 lysine 27 (H3K27me3) and trimethylation of histone H3 lysine 9 (H3K9me3). Heterochromatin can be divided into two sub-types: constitutive heterochromatin and facultative heterochromatin. The first, usually poor in genes, is an irreversible and stable chromatin state conserved across cell divisions (Grewal and Elgin, 2007) and is made of repeated sequences near centromeres, pericentromeres and telomeres. Facultative heterochromatin contains gene-rich regions, is reversible, and can acquire the structural and functional characteristics of constitutive heterochromatin. These regions can become decondensed, leading to the transcriptional activation of previously repressed genes.

2.1.3 Chromatin: a dynamic structure

Chromatin is a highly dynamic structure, and its reorganization confers properties important for replication, transcription and DNA repair. Three major mechanisms underlie this dynamic behavior: nucleosome movements driven by remodeling factors, histone post-translational modifications (PTM) and DNA methylation, and the incorporation of histone variants into the nucleosome core particle.

a Chromatin remodeling

The high compaction level and organization of chromatin hinder access to DNA regulatory regions. Chromatin remodeling factors are necessary to make these regions more accessible and to allow cellular processes such as DNA repair, replication and transcription to proceed smoothly. These factors assemble into complexes that modulate chromatin structure by changing nucleosome positioning, evicting nucleosomes or modifying their composition. Remodeling complexes carry a catalytic ATPase subunit that hydrolyzes ATP to disrupt DNA-protein interactions and induce nucleosome conformational changes (Narlikar et al., 2013).

These remodeling factors can be divided into four families (Figure 18) according to the properties of their ATPase subunit. First, the SWI/SNF (Switching defective/Sucrose Non-Fermenting) family interacts with acetylated lysine residues of chromatin through its bromodomain (Singh et al., 2007). These factors make DNA accessible to transcription initiation factors. The ISWI (Imitation Switch) family carries SANT and SLIDE domains in its C-terminal region but has no bromodomain. ISWI factors bind histone N-terminal tails and are important for correct chromatin assembly during replication (Fyodorov and Kadonaga, 2002). The CHD (Chromodomain-Helicase-DNA binding) family contains chromodomains, usually N-terminal, that interact specifically with methylated lysine residues. CHD family members are involved in the regulation of gene expression and have important functions in development (von Zelewsky et al., 2000). Finally, the INO80 (Inositol requiring 80) family, which also includes the SWR1 (Swi2/Snf2-related 1) complex, has a split ATPase domain that favors interactions with the Rvb1 and Rvb2 proteins, adding helicase activity to the complex. Members of the INO80 family are involved in transcriptional regulation and are responsible for the incorporation of the H2A.Z variant into chromatin, including during double-strand break repair (Tsukuda et al., 2005).

Figure 18: Chromatin remodelers. Organization of chromatin remodeler families defined by their characteristics and catalytic domains. All remodeling factors possess an ATPase domain important for remodeling activity, and unique flanking domains (adapted from Manelyte and Längst, 2013).

b Histone chaperones

Histone chaperones participate in the incorporation or eviction of histones at specific sites in chromatin, guiding the reversible assembly and disassembly of nucleosomes in a sequential manner. First, chaperones deposit (H3-H4)2 tetramers on DNA to form tetrasomes; other chaperones then incorporate (H2A-H2B) dimers, forming complete nucleosomes.

Chaperone activity can be dependent on or independent of DNA synthesis, and each chaperone is specific for the histones it associates with: some chaperones are specific to (H3-H4) dimers, some to (H2A-H2B) dimers, and some only associate with histone variants. The specific functions of several chaperones during nucleosome assembly are detailed in the following non-exhaustive Table 4 (Burgess and Zhang, 2013).

Table 4: Histone chaperones and their functions in nucleosome assembly. Non-exhaustive list of histone chaperones, the associated histones, and their functions (adapted from Burgess and Zhang, 2013).

c Histone variants

Histone variants are non-allelic isoforms of conventional histones. In metazoans, all histones except H4 (H1, H2A, H2B and H3) have variants (Franklin and Zweidler, 1977). Their expression depends on the tissue or developmental stage. Genes coding for histone variants are present in one or two copies in the genome, and they all carry at least one intron and a polyadenylation signal. At the protein level, they share 40% to 99% homology with conventional histones (Szenker et al., 2011), and they can drastically modify chromatin structure and function by replacing the corresponding conventional histone in the nucleosome octamer.


The H2A family contains numerous variants, including H2A.Z and H2A.X, present in all eukaryotes, and H2A.Bbd and macroH2A, specific to vertebrates. MacroH2A is enriched on the inactive X chromosome and has important functions at promoters in regulating transcription and cell differentiation (Barrero et al., 2013). Conversely, H2A.Bbd (Barr body deficient) is excluded from the inactive X chromosome and is associated with active transcription (Tolstorukov et al., 2012). H2A.Z, one of the most highly expressed variants, is incorporated at the TSS (transcription start site) of transcriptionally active genes, while H2A.X has specific functions in DNA double-strand break repair, recruiting the DNA repair machinery and chromatin remodeling factors once phosphorylated (van Attikum et al., 2007).

H2B has few variants; two of them, TH2B and H2BFWT, are both expressed specifically in testis. TH2B is enriched in spermatocytes (Meistrich et al., 1985), while H2BFWT is present at telomeres during gametogenesis (Churikov et al., 2004).

Eight variants of histone H3 have been identified to date. H3.1 and H3.2 are the most abundant in the cell and are considered canonical histones; they are ubiquitously expressed. The others, H3.3, H3.4 (also called TH3, H3.t or H3.1t), H3.5, H3.X, H3.Y and CENP-A, are organism-, tissue- or developmental stage-specific. For example, H3.X and H3.Y are only expressed in primates (Wiedemann et al., 2010). CENP-A is specifically incorporated at centromeres during kinetochore formation (Régnier et al., 2005).

The H1 family has the largest number of variants: eleven are known and, although their functions are not well understood, they all appear to be involved in chromatin compaction.

d Histone post-translational modifications

Histones are targeted by numerous post-translational modifications (PTM) (Figure 19). This phenomenon was discovered in the 1960s (Allfrey and Mirsky, 1964), and most modifications are carried by the easily accessible N- and C-terminal histone tails. Some modifications are specifically associated with processes such as chromatin compaction, DNA repair or transcription, but most of the time a combination of several modifications is necessary, and some can even have opposite effects. In 2000, the “histone code” hypothesis was proposed, predicting that histone PTM, alone or in combination, direct distinct and specific DNA-templated programs (Strahl and Allis, 2000).

Figure 19: Histone post-translational modifications. Scheme of histones H2A, H2B, H3, H4 and H1 post-translational modifications (PTM) on N- and C- terminal tails. Ac: acetylation, Me: methylation, Ph: phosphorylation, and Ub: Ubiquitination (adapted from Zhao et al., 2013).

PTM recruit specific factors, but some of these modifications are also associated with structural changes of the nucleosome, facilitating its remodeling or decreasing its affinity for DNA (Tropberger and Schneider, 2013). There are dozens of PTM, all deposited or removed by specific enzymes called writers and erasers, respectively. The most studied are the histone methyltransferases/demethylases (HMT/HDM, respectively) for histone methylation, and the histone acetyltransferases/deacetylases (HAT/HDAC, respectively) for histone acetylation. In addition to acetylation and methylation, histones carry other modifications such as phosphorylation, ubiquitination, ADP-ribosylation and sumoylation (Rothbart and Strahl, 2014), but this list is not exhaustive, and only acetylation and methylation will be described in the following part.

Lysine residues can be mono-, di- or trimethylated, and the position of the modified residue is highly important for its function. Methylation of histone H3 lysines 4, 36 and 79 (H3K4, H3K36 and H3K79, respectively) is associated with transcriptionally active genes, while methylation of histone H3 lysines 9 and 27 and of histone H4 lysine 20 (H3K9, H3K27 and H4K20, respectively) correlates with transcriptional repression. These modifications lead to the recruitment of specific proteins called readers, such as HP1 proteins (heterochromatin protein 1), which recognize H3K9me2/3 and induce heterochromatin formation, while H3K27me3 recruits Polycomb proteins at transcriptionally inactive genes (Fischle et al., 2003).

Histone acetylation occurs on lysine residues. It neutralizes their positive charge, weakening the electrostatic interactions between histones and DNA, which opens chromatin and makes it more accessible (Zentner and Henikoff, 2013). Acetylated lysine residues are therefore associated with active transcription, whereas hypoacetylation is a characteristic of transcriptionally inactive chromatin (Eberharter and Becker, 2002). Acetylation of histone H3 lysines 9 and 27 (H3K9ac and H3K27ac, respectively) is usually found at promoters and in active chromatin, and correlates with open chromatin regions that facilitate the recruitment of transcription factors (Igolkina et al., 2019).
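The residue-level associations listed in the last two paragraphs can be captured in a small lookup. The Python sketch below parses the usual mark notation (histone, lysine position, modification) and applies only the simplified rules stated in the text: acetylation and methylation of H3K4, H3K36 and H3K79 count as active, methylation of H3K9, H3K27 and H4K20 as repressive. The function and constant names are hypothetical, and the sketch deliberately ignores the methylation degree, which matters biologically.

```python
import re

# Residues whose methylation the text links to active or repressed transcription.
ACTIVE_METHYL_SITES = {("H3", 4), ("H3", 36), ("H3", 79)}
REPRESSIVE_METHYL_SITES = {("H3", 9), ("H3", 27), ("H4", 20)}

# Mark notation: histone, 'K' + lysine position, then 'me', 'me1-3' or 'ac'.
MARK_PATTERN = re.compile(r"^(H2A|H2B|H3|H4)K(\d+)(me[123]?|ac)$")


def classify_mark(mark: str) -> str:
    """Return 'active', 'repressive' or 'unclassified' for a lysine mark."""
    match = MARK_PATTERN.match(mark)
    if match is None:
        raise ValueError(f"unrecognised histone mark: {mark!r}")
    histone, residue, modification = match.group(1), int(match.group(2)), match.group(3)
    if modification == "ac":
        # Acetylation neutralises the lysine charge and opens chromatin.
        return "active"
    site = (histone, residue)
    if site in ACTIVE_METHYL_SITES:
        return "active"
    if site in REPRESSIVE_METHYL_SITES:
        return "repressive"
    return "unclassified"


for mark in ["H3K4me3", "H3K27me3", "H3K27ac", "H3K9me3", "H3K36me3"]:
    print(mark, classify_mark(mark))
```

The same lysine can therefore fall on either side depending on the modification: H3K27ac is classified as active and H3K27me3 as repressive.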

e DNA methylation

DNA methylation is a key mechanism in eukaryotes. It is a highly important process involved in early embryonic development (Reik, 2007), X chromosome inactivation and genomic imprinting (Bird, 2002). In mammals, DNA methylation occurs mostly on cytosine residues in the context of CpG dinucleotides, while in prokaryotes both cytosines and adenines can be methylated. DNA methylation is conserved in most eukaryotes and consists of the addition of a methyl group (CH3) to carbon 5 of cytosine (5-methylcytosine, 5mC). This reaction is catalyzed by DNA methyltransferases (DNMT), which use S-adenosylmethionine (SAM) as the methyl donor. Several DNMT have been characterized and are essential for the establishment and maintenance of cytosine methylation. DNMT1 is mostly involved in the maintenance of the DNA methylation profile across cell divisions and can be recruited to specific loci to repress transcription (Avvakumov et al., 2008; Jair et al., 2006). DNMT3A and DNMT3B are responsible for de novo methylation during embryonic development and gametogenesis, and DNMT3L is important to stabilize the conformation of the active site of DNMT3A (Okano et al., 1999).
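Since mammalian DNA methylation occurs mostly at CpG dinucleotides, a first step in any sequence-level analysis is simply to count the CpG sites. The Python sketch below does this and also computes the observed/expected CpG ratio, a measure commonly used to define CpG islands; that concept and the thresholds in the helper are swapped in here for illustration and are not taken from the text. The sketch counts potential methylation sites only; it does not predict which are actually methylated.

```python
def cpg_stats(seq: str) -> dict:
    """Count CpG sites and compute GC fraction and observed/expected CpG ratio."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    # "CG" cannot overlap with itself, so str.count finds every CpG on this strand.
    cpg = seq.count("CG")
    gc_fraction = (c + g) / n if n else 0.0
    # Observed/expected ratio: CpG count divided by (C count * G count / length).
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return {"cpg": cpg, "gc": gc_fraction, "obs_exp": obs_exp}


def looks_like_cpg_island(seq: str, min_len: int = 200,
                          min_gc: float = 0.5, min_obs_exp: float = 0.6) -> bool:
    """Hypothetical helper using commonly quoted CpG-island thresholds."""
    stats = cpg_stats(seq)
    return len(seq) >= min_len and stats["gc"] > min_gc and stats["obs_exp"] > min_obs_exp


print(cpg_stats("ACGTTCGAACGG"))   # cpg = 3, obs_exp = 3.0, gc = 7/12
```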

Although DNA methylation is stable in somatic cells, global losses of methylation occur at different stages of development. This demethylation can be passive or active. Passive demethylation consists of the loss of methylation over successive rounds of replication, when newly synthesized strands are not remethylated, while active demethylation is performed by TET (ten-eleven translocation) enzymes, methylcytosine dioxygenases that oxidize the methyl group of 5mC, leading to its modification or removal from cytosine residues.
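Passive demethylation can be illustrated with a simplified strand-counting model. In the Python sketch below, each replication pairs every old strand with a new one; the old strand keeps its methyl mark, and the new strand is methylated with a probability `maintenance` only if its template was methylated (no de novo methylation). Without maintenance, the methylated fraction halves at every division. The model and its parameter names are assumptions for illustration.

```python
def methylated_fraction(initial: float, divisions: int, maintenance: float = 0.0) -> float:
    """Fraction of DNA strands still methylated at a CpG after `divisions` replications.

    Simplified model: old strands keep their mark; a new strand synthesised
    opposite a methylated strand is methylated with probability `maintenance`
    (0 = no maintenance, i.e. purely passive loss); no de novo methylation.
    """
    if not 0.0 <= maintenance <= 1.0:
        raise ValueError("maintenance must be between 0 and 1")
    fraction = initial
    for _ in range(divisions):
        # Half of the strands are new; only a proportion of them copy the mark.
        fraction = fraction * (1 + maintenance) / 2
    return fraction


for n in range(4):
    print(n, methylated_fraction(1.0, n))   # 1.0, 0.5, 0.25, 0.125
```

Setting `maintenance=1.0` keeps the fraction constant, which corresponds to fully efficient maintenance of the methylation profile across divisions.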

2.2 Genome and epigenome editing: the CRISPR/(d)Cas9 system

2.2.1 Discovery and description of CRISPR/Cas

In 1987, clustered regularly interspaced short palindromic repeats (CRISPR) were identified in the bacterium Escherichia coli. These repeats are separated by non-repeated DNA sequences of around 20 to 40 nucleotides called spacers (Ishino et al., 1987). CRISPR loci are associated with highly conserved genes called cas (CRISPR-associated) genes, which encode various proteins carrying nuclease, polymerase or helicase domains.

CRISPR/Cas systems are adaptive bacterial defenses against the foreign DNA of bacteriophages: during infection, they recognize and cleave this DNA (Horvath and Barrangou, 2010). These defense systems have been found in about 40% of bacteria and 90% of archaea (Mojica et al., 2000). The function of these repeated elements was elucidated in 2007: bacteria can resist bacteriophage infection by integrating a piece of the viral genome into the CRISPR locus, which protects them from subsequent re-infection (Barrangou et al., 2007). Following this discovery, many other labs demonstrated that the CRISPR locus is transcribed and processed into short RNA sequences (CRISPR RNAs, crRNAs), each containing a unique spacer that guides the Cas nuclease to recognize and cleave the exogenous DNA (Garneau et al., 2010). To date, six types of CRISPR loci (type I to type VI) have been discovered, separated into two classes (class 1 and 2) (Ishino et al., 2018). Because of its simplicity, the type II CRISPR/Cas system is the one used for genome editing applications, and only this system will be described in this manuscript.

The type II CRISPR locus contains the cas9 gene, which encodes the Cas9 endonuclease involved in spacer acquisition, crRNA biogenesis and DNA targeting (Sapranauskas et al., 2011). The size of Cas9 proteins varies between bacterial species. The CRISPR locus also contains a sequence encoding a short non-coding RNA called tracrRNA (trans-activating crRNA), which pairs with the crRNA and helps couple the CRISPR complex to the exogenous DNA (Deltcheva et al., 2011).

Figure 20: The three stages of the CRISPR/Cas9 system. After bacteriophage DNA is injected into the cell, Cas proteins cut the viral molecule into 20-nucleotide spacers that are then integrated into the CRISPR locus (Stage 1). The locus is transcribed, producing a pre-crRNA that is matured by RNase III and assembled with Cas9 proteins (Stage 2). Finally, this complex specifically targets bacteriophage DNA, which is cleaved and degraded (Stage 3) (adapted from https://doudnalab.org/research_areas/crispr-systems/).

The CRISPR/Cas9 system can be divided into three major steps (Figure 20). First, the foreign DNA is cut into small 20-nucleotide fragments that are then incorporated into the CRISPR locus between short repeated sequences. These fragments are not randomly selected: they depend on the presence of a PAM (protospacer adjacent motif) at their 3’ end, which is specific to each Cas9 ortholog (Mojica et al., 2009). The second step consists of the transcription of this locus, producing a small RNA (pre-crRNA) that guides the endonucleases involved in foreign DNA targeting and degradation. The precursor RNA and several tracrRNAs hybridize through complementarity with the CRISPR repeat sequences, leading to the recruitment of Cas9 proteins that stabilize the pre-crRNA-tracrRNA interactions. The pre-crRNA is then matured by RNase III into functional crRNAs, each containing a unique spacer that can recognize complementary exogenous DNA; for genome editing, the crRNA and tracrRNA have been fused into a single guide RNA (sgRNA) (Jinek et al., 2012). The last step consists of interference with the target sequences. The mature crRNA-tracrRNA duplex and the Cas9 protein form a complex that scans foreign DNA for PAM motifs. Interaction between the PAM and Cas9 destabilizes the adjacent double-stranded DNA, allowing the formation of an RNA-DNA heteroduplex (Jakočiūnas et al., 2016). This leads to a conformational change of Cas9 and to the cleavage of both DNA strands. This system has been well characterized in Streptococcus pyogenes (Sp) (Jinek et al., 2012). Perfect complementarity over the 12 base pairs next to the PAM sequence is essential for DNA cleavage, while mismatches are tolerated in the remaining 8 base pairs (Sternberg et al., 2014).
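The "12 + 8" tolerance rule above can be turned into a simple check. The Python sketch below compares a 20-nucleotide guide with a candidate protospacer, both written 5’ to 3’ with the PAM on the 3’ side (the protospacer strand carries the same sequence as the guide, with T instead of U). It splits mismatches into the 12 PAM-proximal "seed" positions and the 8 distal ones and predicts cleavage only when the seed is perfect. This is a deliberately simplified reading; in practice the number and position of distal mismatches also affect cleavage. All names are hypothetical.

```python
def mismatch_profile(spacer: str, protospacer: str, seed_len: int = 12) -> tuple:
    """Return (seed_mismatches, distal_mismatches) between a guide spacer and a
    candidate protospacer, both written 5'->3' with the PAM on the 3' side."""
    spacer = spacer.upper().replace("U", "T")   # RNA guide written as DNA
    protospacer = protospacer.upper()
    if len(spacer) != len(protospacer):
        raise ValueError("spacer and protospacer must have the same length")
    mismatches = [a != b for a, b in zip(spacer, protospacer)]
    split = len(spacer) - seed_len              # seed = PAM-proximal 3' bases
    return sum(mismatches[split:]), sum(mismatches[:split])


def predicted_cleavage(spacer: str, protospacer: str, seed_len: int = 12) -> bool:
    """Simplified rule: any seed mismatch blocks cleavage; distal ones are tolerated."""
    seed_mm, _distal_mm = mismatch_profile(spacer, protospacer, seed_len)
    return seed_mm == 0


guide = "ACGTACGTACGTACGTACGT"
print(predicted_cleavage(guide, "TCGTACGTACGTACGTACGT"))   # True: PAM-distal mismatch
print(predicted_cleavage(guide, "ACGTACGTACGTACGTACGA"))   # False: seed mismatch
```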

2.2.2 Genome editing with CRISPR/Cas9

CRISPR/Cas9 technology is an efficient system to target specific regions of the genome, generating DNA double-strand breaks and inducing modifications such as deletions or insertions. Before CRISPR/Cas9, TALEs (transcription activator-like effectors) and ZFPs (zinc finger proteins) were used to target the genome, but these techniques are more difficult to use and less efficient (Thakore et al., 2016). This system is composed of a Cas9 endonuclease isolated from Streptococcus pyogenes, which carries two catalytic domains (RuvC1 and HNH) involved in DNA cleavage (Figure 21.A) (Larson et al., 2013; Qi et al., 2013).


Figure 21: CRISPR/Cas9 interference systems. (A) The Cas9 endonuclease is made of a recognition (REC) lobe and a nuclease (NUC) lobe. It binds a guide RNA (sgRNA) that directs Cas9 to a specific DNA sequence through its protospacer sequence; the PAM motif, directly recognized by Cas9, allows cleavage of double-stranded DNA by its two nuclease domains, RuvC1 and HNH. (B) dCas9 carries mutations in the RuvC1 and HNH domains, D10A and H840A respectively, which inactivate its nuclease function. dCas9 still binds to specific sequences and can be fused to other domains or subunits such as p300 (adapted from Dominguez et al., 2016).

The Cas9 protein targets specific DNA regions through the sgRNA. This sgRNA is a hybrid RNA composed of a protospacer (crRNA) sequence of 18 to 24 nucleotides complementary to the target sequence, and a scaffold sequence (tracrRNA) of 42 nucleotides (Cong et al., 2013). Cas9 and the sgRNA form a complex that binds to the target sequence through the protospacer, which lies immediately upstream of a 5’-NGG-3’ PAM sequence directly recognized by Cas9 to cleave double-stranded DNA (Dominguez et al., 2016).
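Because a protospacer must sit immediately 5’ of an NGG PAM, candidate target sites can be listed by scanning both strands of a sequence for NGG. The Python sketch below does this for a 20-nucleotide protospacer, using only the standard `re` module; it ignores off-target scoring and other sgRNA design criteria, and the function names are hypothetical.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def find_spcas9_sites(seq: str, spacer_len: int = 20) -> list:
    """List (strand, start, protospacer, pam) for every 5'-NGG-3' PAM that has a
    full-length protospacer immediately 5' of it. `start` is 0-based on the
    scanned strand (the reverse complement for '-')."""
    seq = seq.upper()
    sites = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        # Zero-width lookahead so overlapping PAMs (e.g. in 'AGGG') are all found.
        for match in re.finditer(r"(?=([ACGT]GG))", s):
            pam_start = match.start()
            if pam_start >= spacer_len:
                protospacer = s[pam_start - spacer_len:pam_start]
                sites.append((strand, pam_start - spacer_len, protospacer, match.group(1)))
    return sites


example = "A" * 20 + "TGG"
print(find_spcas9_sites(example))   # one '+' strand site whose PAM is 'TGG'
```

Scanning the reverse complement matters because Cas9 can target either strand: a "CCN" on the given strand corresponds to an "NGG" PAM on the opposite one.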

These DNA breaks can lead to gene inactivation or to the introduction of heterologous sequences through two types of repair processes: non-homologous end joining (NHEJ) and homology-directed repair (HDR). During repair, NHEJ can introduce small insertions, deletions or substitutions, which can create a premature stop codon or change the amino acid sequence. HDR, using a recombination template, can insert, delete or substitute a longer DNA sequence (Yang et al., 2014).
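Whether an NHEJ indel inside a coding exon disrupts the protein depends largely on its length: an indel that is not a multiple of three shifts the reading frame, which often creates a premature stop codon downstream. The Python sketch below classifies indels by length and shows how a 1-bp deletion changes the codon reading. The example sequence and function names are hypothetical.

```python
def indel_consequence(indel_len: int) -> str:
    """Classify an NHEJ indel in a coding exon by its net length in bp
    (positive = insertion, negative = deletion)."""
    if indel_len == 0:
        return "no change"
    # Python's % returns a non-negative result, so -4 % 3 == 2.
    return "in-frame" if indel_len % 3 == 0 else "frameshift"


def codons(seq: str) -> list:
    """Split a coding sequence into complete codons from its first base."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


reference = "ATGGCTGACTGA"
edited = reference[:4] + reference[5:]   # 1-bp deletion at position 4
print(codons(reference))                 # ['ATG', 'GCT', 'GAC', 'TGA']
print(codons(edited))                    # ['ATG', 'GTG', 'ACT']
print(indel_consequence(-1), indel_consequence(3))   # frameshift in-frame
```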

Three types of Cas9 exist. The “wild-type” Cas9 described above cleaves double-stranded DNA at a specific locus. In the Cas9 “nickase”, either the RuvC1 or the HNH nuclease domain is mutated (D10A or H840A mutation, respectively), so that it cuts only one DNA strand (Cong et al., 2013). Finally, in dCas9 (nuclease-null or dead Cas9), both the RuvC1 and HNH nuclease domains are mutated (D10A and H840A mutations, respectively) (Figure 21.B). dCas9 cannot cleave DNA but can still target specific sequences to activate or repress specific genes (Qi et al., 2013).

2.2.3 CRISPR/dCas9: a powerful tool for chromatin regulation

a Transcriptional regulation

The dCas9-sgRNA complex can be recruited to gene promoters to disrupt transcription initiation and inhibit gene expression (Figure 22.A) (Qi et al., 2013). The repressive function of dCas9 can be enhanced by fusing it to a repressor domain, such as the KRAB (Krüppel-associated box) domain isolated from the Kox1 protein, or the HP1α chromoshadow domain (Gilbert et al., 2013). Conversely, dCas9 can be fused to an activation domain to enhance transcription (Figure 22.B), such as multiple repeats of the VP16 domain (VP64, VP160), the p65AD domain, or a combination of transcription activators called the VPR domain (VP64-p65-Rta). For example, the VP64 transactivator domain from herpes simplex virus fused to dCas9 (dCas9-VP64) has been shown to enhance the expression of the MYOD and OCT4 genes when used with sgRNAs targeting their promoters (Hilton et al., 2015). To obtain stronger effects on transcription, several sgRNAs can be used simultaneously to target a larger region of the same promoter (Maeder et al., 2013).

b Epigenome editing

Epigenetic modifications of histones or DNA are essential for proper transcriptional regulation and play key roles in various biological processes in mammals; fusing specific epigenetic modifiers to dCas9 therefore provides a powerful tool to edit the epigenome (Figure 22.C). Hilton and colleagues demonstrated that recruiting dCas9 fused to the catalytic core of the histone acetyltransferase p300 (dCas9-p300core) is sufficient to activate gene expression when targeted to promoters or to distal regulatory regions known to interact with these genes, called enhancers (Hilton et al., 2015). dCas9-p300core can specifically open chromatin by modifying the local chromatin environment and favor the recruitment of the transcriptional machinery. Another study showed that dCas9 fused to the histone demethylase LSD1 (dCas9-LSD1) can decrease methylation of histone H3 lysine 4 (H3K4), leading to the repression of specific genes (Kearns et al., 2015).


Figure 22: Different applications of the CRISPR/dCas9 system. (A) dCas9 can be fused to repressor domains such as MXI1, KRAB or SID4X to repress transcription. (B) Conversely, transcription can be activated by fusing activation domains to dCas9, including multiple repeats of the VP16 domain (VP64, VP160), the p65AD domain or a combination of transcriptional activators called the VPR domain (VP64-p65-Rta). (C) dCas9 can modify epigenetic states when fused to chromatin regulators. Fusion to the catalytic core of the acetyltransferase p300 acetylates H3K27 and activates transcription, while fusion to the demethylase LSD1 removes H3K4 methylation and represses transcription (adapted from Dominguez et al., 2016).

2.3 Alternative splicing and chromatin signature

Nucleosomes are not randomly distributed along the genome, and DNA composition directly affects their positioning (Segal et al., 2006). In humans, nucleosomes are more frequently found in guanine- and cytosine-rich regions, whereas adenine- and thymine-rich sequences are less associated with nucleosomes. Nucleosome distribution within genes is not uniform: exons are more enriched in nucleosomes than introns (Figure 23.A) (Schwartz et al., 2009; Spies et al., 2009; Tilgner et al., 2009). This can be explained by the higher guanine and cytosine content of exons compared to introns. These exonic nucleosomes could interfere with transcriptional elongation by slowing down RNA polymerase II (RNAP II), thereby favoring exon recognition.
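The GC-content argument can be made concrete with a small Python sketch that compares the GC fraction of an exon with that of its flanking intronic sequence. The function names, the 0-based half-open exon coordinates and the 100-nt flank size are assumptions chosen for illustration, and the toy sequence below is synthetic, not genomic data.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C nucleotides in a DNA sequence (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def exon_vs_flank_gc(gene_seq: str, exon_start: int, exon_end: int, flank: int = 100):
    """Return (exon GC, flanking-intron GC) for a 0-based, half-open exon interval."""
    exon = gene_seq[exon_start:exon_end]
    upstream = gene_seq[max(0, exon_start - flank):exon_start]
    downstream = gene_seq[exon_end:exon_end + flank]
    return gc_fraction(exon), gc_fraction(upstream + downstream)


if __name__ == "__main__":
    # Synthetic example: AT-rich "introns" around a GC-rich "exon"
    toy_gene = "AT" * 50 + "GC" * 25 + "AT" * 50
    print(exon_vs_flank_gc(toy_gene, 100, 150))  # (1.0, 0.0)
```

Applied genome-wide to annotated exons, a comparison of this kind is one way to check the exon/intron GC difference that is proposed to underlie exonic nucleosome enrichment.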


Figure 23: Nucleosome and histone modification enrichments around exons. (A) Nucleosome enrichment centered on 3’ splice sites, exon centers and 5’ splice sites. (B) Histone modification enrichments on exons compared to flanking intronic regions (adapted from Spies et al., 2009).

Nucleosomes can also influence splicing through their histone modifications. Several studies, at both single-exon and genome-wide levels, have shown that exons are enriched in specific histone modifications compared to introns (Figure 23.B) (Andersson et al., 2009; Saint-André et al., 2011; Spies et al., 2009) (Agirre et al., in review, Nature Communications, Annex 1). These studies showed that several marks, including H3K27me2 and H3K36me3, are enriched on exons, and some can even be specific to 5’ exons, such as H4K20me1 (Hon et al., 2009). The H3K9me3 mark can favor recruitment of HP1 proteins and facilitate inclusion of the alternatively spliced exons of CD44 (Saint-André et al., 2011), while H3K4me3 is thought to be involved in the specific recruitment of splicing factors and spliceosome assembly factors (Davie et al., 2016). Many correlations have been established with histone methylation, but histone acetylation has also been shown to play a role in alternative splicing control. For example, H3K27ac and H3K9ac are more enriched at alternative 5’ and 3’ splice sites (Zhou et al., 2012).

DNA methylation is another chromatin signature increasingly associated with splicing changes (Lev Maor et al., 2015) (Agirre et al., in review, Nature Communications, Annex 1), and it is positively correlated with the inclusion levels of alternative exons (Maunakea et al., 2013). A specific and direct modulation of DNA methylation levels on alternative exons was shown to be sufficient to induce splicing changes in an EDI minigene (Shayevitch et al., 2018). A major issue in biology is the cell-to-cell variability that exists within heterogeneous cell populations. The development of single-cell technologies has revealed new and more accurate correlations, such as the one between DNA methylation and alternative splicing. By performing parallel DNA methylation and transcriptome sequencing in single cells, Linker et al. showed that DNA methylation information accurately predicts splicing changes of individual cassette exons (Linker et al., 2019).
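Inclusion levels of cassette exons are commonly summarized as a "percent spliced in" (PSI). The Python sketch below uses a simplified junction-read formula, which is an assumption and not the method of the studies cited above: inclusion is supported by two junctions (upstream exon to cassette exon, cassette exon to downstream exon) and skipping by one, so inclusion reads are averaged over their two junctions. The function names and the read counts in the example are hypothetical.

```python
import math


def psi(inc_upstream: int, inc_downstream: int, skip: int) -> float:
    """Simplified PSI (as a fraction) of a cassette exon from junction read counts."""
    # Average the two inclusion junctions so they are comparable to the single skipping junction
    inclusion = (inc_upstream + inc_downstream) / 2
    total = inclusion + skip
    if total == 0:
        return math.nan  # no informative reads: PSI is undefined
    return inclusion / total


def delta_psi(condition_a, condition_b) -> float:
    """PSI change from condition A to B, each given as (inc_upstream, inc_downstream, skip)."""
    return psi(*condition_b) - psi(*condition_a)


if __name__ == "__main__":
    # Hypothetical junction counts in two cell states
    epithelial = (10, 10, 30)
    mesenchymal = (30, 10, 20)
    print(psi(*epithelial), psi(*mesenchymal), delta_psi(epithelial, mesenchymal))
```

Tools used in practice also correct for junction lengths and sequencing depth, so this formula should be read only as a way to make "inclusion level" concrete.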

In addition to histone and DNA modifications, splicing factors themselves can be post-translationally modified to differentially modulate alternative splicing. A recent study showed that the acetyltransferase p300 binds directly to the CD44 promoter and acetylates the splicing factor HNRNPM, which is in turn evicted from the CD44 pre-mRNA, leading to activation of SAM68 and inclusion of the CD44 variable exons (Siam et al., 2019).

2.4 Relationship between chromatin, transcription and alternative splicing

Alternative splicing occurs mostly co-transcriptionally (Oesterreich et al., 2016) and is coupled to other cellular processes, such as transcription; these connections are important for splicing efficiency. RNA polymerase II (RNAP II) is a major component of splicing regulation, but intron and exon recognition can also be controlled by other parameters, including chromatin, whose histone modifications can modulate the splicing efficiency of an exon (Luco et al., 2011).

Transcription can modulate alternative splicing according to two distinct but non-exclusive models. In the recruitment model, RNAP II interacts directly or indirectly with spliceosome components to modulate the splicing outcome. In the kinetic model, the transcriptional elongation rate affects spliceosome recruitment, thereby influencing alternative exon inclusion (Kornblihtt et al., 2013). Chromatin and transcription are tightly associated, and chromatin could modulate alternative splicing through two models similar to those linking alternative splicing and transcription.

2.4.1 Coupling between alternative splicing and transcription

The recruitment model linking alternative splicing and transcription relies on the capacity of the Carboxy-Terminal Domain (CTD) of the RNAP II large subunit to be phosphorylated and to recruit splicing factors. When this domain is mutated, the splicing machinery is not recruited and the pre-mRNA is not properly spliced (McCracken et al., 1997). More generally, the CTD allows the recruitment of SR-family splicing factors, promoting spliceosome assembly and activation (David and Manley, 2011). One example is fibronectin exon EDI (exon 33), whose alternative splicing is regulated by SRSF3 in a manner that depends on its tethering to the RNAP II CTD (de la Mata and Kornblihtt, 2006).

The second model, called the kinetic model, relies on the elongation rate and/or processivity of RNAP II during transcription to regulate alternative splicing. The first evidence came from an RNAP II mutant with a reduced elongation rate (Coulter and Greenleaf, 1985), which increased inclusion of fibronectin exon EDI (de la Mata et al., 2003). Similar observations have been made with drugs that inhibit or activate transcriptional elongation, such as DRB (5,6-dichloro-1-β-D-ribofuranosylbenzimidazole) and TSA (Trichostatin A), respectively, which affect the alternative splicing of CFTR exon 9 (Dujardin et al., 2014). According to the kinetic model, slow transcriptional elongation favors inclusion of alternatively spliced exons. Fast RNAP II elongation exposes several splice sites at once, both weak and strong, which compete for the splicing decision, and weak sites are ignored in favor of strong ones. Conversely, slow elongation allows the spliceosome to recognize weak splice sites, removing the flanking introns and leading to exon inclusion (Figure 24) (Kornblihtt, 2006).
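The kinetic model can be summarized by a simple "window of opportunity" calculation. In the toy Python sketch below, which is an illustrative simplification rather than a model from the cited studies, the weak 3' splice site upstream of the alternative exon is assumed to be recognized as a first-order process with rate `k_weak` (per second). Its window opens when it is transcribed and closes once RNAP II reaches the competing strong site `distance_nt` nucleotides downstream, after `distance_nt / elongation_nt_per_s` seconds, at which point the strong site is assumed to win. All parameter names and values are hypothetical.

```python
import math


def inclusion_probability(k_weak: float, distance_nt: float, elongation_nt_per_s: float) -> float:
    """Toy probability that the weak site is recognized before the strong site is transcribed."""
    if elongation_nt_per_s <= 0:
        raise ValueError("elongation rate must be positive")
    # Time during which only the weak site is available to the spliceosome
    window_s = distance_nt / elongation_nt_per_s
    # First-order recognition: P(recognized within the window) = 1 - exp(-k * t)
    return 1.0 - math.exp(-k_weak * window_s)


if __name__ == "__main__":
    # Arbitrary illustrative parameters, not measured values
    for rate in (40.0, 20.0, 10.0, 5.0):  # elongation rate in nt/s, from fast to slow
        print(rate, round(inclusion_probability(0.01, 1000, rate), 3))
```

Lowering the elongation rate lengthens the window and raises the predicted inclusion, which is the qualitative behavior described by the kinetic model.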

Figure 24: The kinetic model for AS regulation by transcriptional elongation. The 3’ splice site preceding the AS exon (blue) is weaker than the 3’ splice site in the following intron (red). Fast elongation of RNA polymerase II favors recognition of the strong 3’ splice site (red), leading to exclusion of the AS exon (left). Conversely, slow elongation favors exon inclusion by allowing recognition of the weak 3’ splice site (blue, right) (adapted from Kornblihtt, 2006).

2.4.2 Chromatin affects alternative splicing

a Kinetic model

As described above, the CTD and the elongation rate of RNAP II influence alternative splicing, but RNAP II processivity depends on many factors, including chromatin. Di- and tri-methylation of histone 3 lysine 9 (H3K9me2 and H3K9me3, respectively) can influence alternative splicing by modulating the RNAP II elongation rate. Indeed, the chromosomal region containing the variable exons of the CD44 gene is enriched in H3K9me3, which correlates with high inclusion levels of these alternatively spliced exons. H3K9me3 is recognized by the HP1γ protein (CBX3), which in turn interacts with the SWI/SNF chromatin remodeling complex, locally reducing the transcriptional elongation rate and facilitating exon inclusion (Batsché et al., 2006; Saint-André et al., 2011). Histone acetylation correlates with more accessible chromatin, and during neuronal depolarization, H3K9ac enrichment is inversely correlated with the inclusion level of NCAM exon 18. The same study also showed that slow transcriptional elongation favors inclusion of this exon, suggesting that exon 18 inclusion is repressed by H3K9ac enrichment at the exon 18 locus and the resulting acceleration of transcriptional elongation (Figure 25.A) (Schor et al., 2009).

b Recruitment model

Another mechanism by which histone modifications can impact splicing is the recruitment of specific proteins by these marks. Histone modifications can be recognized by proteins called chromatin adaptors, which in turn recruit splicing factors that influence the splicing decision (Luco et al., 2011). FGFR2 (Fibroblast Growth Factor Receptor 2) carries two mutually exclusive exons, IIIb and IIIc, which are included in epithelial and mesenchymal cells, respectively. Depending on the cell type, different histone marks are enriched at this locus. In mesenchymal cells, trimethylation of histone 3 lysine 36 (H3K36me3) is enriched and correlates with exon IIIb exclusion. H3K36me3 was shown to act as an anchoring platform for the chromatin adaptor MRG15, which in turn recruits the splicing factor PTB to exon IIIb, preventing its inclusion (Figure 25.B) (Luco et al., 2010). Trimethylation of histone 3 lysine 4 (H3K4me3) has also been associated with alternative splicing outcome. This mark is recognized by CHD1, a chromatin adaptor and component of the SAGA histone acetyltransferase complex; CHD1 then recruits U2 snRNP to spliced exons (Sims et al., 2007).

Interestingly, some chromatin factors also play a role in alternative splicing regulation that is independent of their chromatin functions. This is the case for the methyltransferase CARM1 and the remodeling factor BRM, both of which have been shown to regulate alternative splicing independently of their enzymatic activities. CARM1 associates with the U1C protein and affects 5’ splice site selection, while BRM interacts with several spliceosomal components and with SAM68, an enhancer of exon inclusion (Batsché et al., 2006; Ohkura et al., 2005).


Figure 25: Two different models by which chromatin influences AS. (A) Kinetic model, in which histone modifications affect transcriptional elongation speed and alternative splicing. Deposition of H3K9ac on the exon locus increases RNAP II elongation speed, favoring skipping of NCAM exon 18. The reverse process leads to exon inclusion. (B) Recruitment model, in which histone modifications affect alternative splicing through chromatin adaptors. Enrichment of H3K36me3 at the FGFR2 locus recruits the splicing factor PTB via the adaptor protein MRG15, resulting in exon skipping. Conversely, inclusion of the FGFR2 exon is increased when H3K4me3 levels are higher, which reduces MRG15 recruitment (adapted from Kornblihtt et al., 2013).


Chapter 3: The epithelial-to-mesenchymal transition (EMT) and its regulation

The epithelial-to-mesenchymal transition (EMT) is the reprogramming of an epithelial cell into a mesenchymal cell. Through the shutdown of genes essential for the epithelial phenotype (occludins, cytokeratins, …) (Thiery and Sleeman, 2006) and the activation of a mesenchymal-specific transcriptional program (vimentin, fibronectin, …) (Thiery et al., 2009), the cell loses its apico-basal polarity and becomes more elongated. The whole cytoskeleton is reorganized and cell-cell contacts are lost, allowing cells to detach from their neighbors and become more motile and invasive (Figure 26).

Figure 26: The epithelial-to-mesenchymal transition (EMT). During EMT, epithelial cells lose their epithelial characteristics and progressively acquire a mesenchymal phenotype. This transition is characterized by the loss of epithelial markers such as E-cadherin, claudin and cytokeratin, and the gain of mesenchymal markers such as N-cadherin, fibronectin and vimentin (adapted from Angadi and Kale, 2015).

EMT can be induced by environmental stimuli such as hypoxia, inflammation and developmental signals (Kao et al., 2016). It is not a two-state process but a progressive one, with intermediate stages that share epithelial and mesenchymal characteristics. During EMT, cell-cell junctions are lost first; together with cytoskeletal reorganization, this creates new mechanical forces that favor detachment of the cell from the rest of the epithelium and the appearance of a new elongated shape. Moreover, cellular protrusions, together with degradation of the extracellular matrix by secreted metalloproteases, increase cell motility and the capacity to invade other tissues and enter the bloodstream during cancer metastasis. The intermediate states coexist within a heterogeneous cell population. Single-cell studies have identified up to 6 transition states, called hybrids, with different tumorigenic and invasive capacities (Tam and Weinberg, 2013; Yu et al., 2013), raising the question of whether these hybrid cells should be targeted in cancer treatment.

This process is reversible: mesenchymal cells can switch back into epithelial cells through the mesenchymal-to-epithelial transition (MET). EMT and MET programs are physiological processes essential in early development and organogenesis, for example during gastrulation and the formation of the neural crest and other internal organs (Figure 27).

Figure 27: EMT is associated with multiple cellular processes. Physiologically, EMT plays major roles in embryonic development and in adult wound healing. Abnormal progression of the EMT process can lead to pathological disorders such as cancer and fibrosis (adapted from Thiery et al., 2009).


They are also major processes in tissue regeneration and wound healing, but excessive EMT, or induction of EMT in adult tissues when it is not needed, can lead to very serious diseases such as fibrosis and tumor metastasis, highlighting the ambivalence of this very important process.

EMT has been classified into three types involved in different biological contexts. Type 1 EMT is associated with implantation, embryo formation, neural crest cell formation and organogenesis. Type 2 EMT is associated with wound healing, inflammation and fibrosis. Type 3 EMT is associated with cancer, metastasis formation and cancer cells with stem cell-like properties (Figure 28).

Figure 28: Different types of EMT. EMT has been classified into three subtypes according to their physiological or pathological contexts. Type 1 EMT is associated with embryonic development, type 2 EMT with inflammation and fibrosis, and type 3 EMT with cancer (adapted from Kalluri and Weinberg, 2009).


3.1 The different types of EMT

3.1.1 Type 1 EMT: development

Type 1 EMT occurs early during embryonic development and is associated with implantation and embryonic gastrulation (Carver et al., 2001), which gives rise to the mesoderm and endoderm and to mobile neural crest cells (Sauka-Spengler and Bronner-Fraser, 2008). The primitive epithelium, specifically the epiblast, gives rise to primary mesenchyme via an EMT. This primary mesenchyme can be re-induced to form secondary epithelia via an MET process. It is speculated that such secondary epithelia may further differentiate to form other types of epithelial tissues and undergo subsequent secondary EMT to generate cells of connective tissues, including astrocytes, adipocytes, chondrocytes, osteoblasts and muscle cells (Figure 29 ).

Figure 29: Reversibility of Type 1 EMT during development. Endoderm and mesoderm precursors invade the primitive streak (1). Precursors migrate to occupy different positions on the medio-lateral axis of the embryo (2) and, after undergoing MET, will generate many tissues such as the notochord and somites (2). Most of these cells undergo a second round of EMT (3). In blue are represented epithelial cells and in yellow mesenchymal cells (adapted from Nieto, 2013).


3.1.2 Type 2 EMT: wound healing

Type 2 EMT is associated with wound healing and tissue regeneration. In type 2 EMT, the program begins as part of a repair-associated event that normally generates fibroblasts and other related cells in order to reconstruct tissues following trauma and inflammatory injuries. In contrast to type 1 EMT, type 2 EMT is associated with inflammation and ceases once inflammation is attenuated, as seen during wound healing and tissue regeneration. In the setting of organ fibrosis, however, type 2 EMT can continue in response to ongoing inflammation, eventually leading to organ destruction. Tissue fibrosis is thus an unabated form of wound healing due to persistent inflammation (Figure 30) (Boutet et al., 2006; Zeisberg et al., 2007).

Figure 30: Wound healing in physiological and pathological conditions. Representation of normal wound healing (A) and fibrosis (B), with different consequences for the role of fibroblasts. Fibrosis shows increased proliferation, migration and matrix synthesis; epithelial cells migrate to the wound area in order to close it (adapted from Rieder et al., 2007).

3.1.3 Type 3 EMT: cancer invasion

Type 3 EMT occurs in neoplastic cells that have previously undergone genetic and epigenetic changes, specifically in genes that favor clonal outgrowth and the development of localized tumors. These changes, notably affecting oncogenes and tumor suppressor genes, conspire with the EMT regulatory circuitry to produce outcomes far different from those observed in the other two types of EMT. Carcinoma cells undergoing a type 3 EMT may invade and metastasize, thereby generating the final, life-threatening manifestations of cancer progression (Labelle et al., 2011). Hypoxia and inflammation, often found in the tumor microenvironment, can activate EMT inducers such as Snail and Twist, which promote tumor dissemination and invasion of adjacent tissues and confer stem cell properties, favoring the self-renewal of a small population of cells that can colonize and differentiate into secondary carcinomas (Figure 31) (Scheel and Weinberg, 2012). In addition, Twist inactivates oncogene-induced senescence, a cellular safeguard mechanism, and Snail induces immunosuppression, immunoresistance and chemoresistance (Ansieau et al., 2008; Kudo-Saito et al., 2009).

Figure 31: The metastatic cascade. Tumor progression involves a complex series of steps, collectively called the invasion-metastasis cascade. Cells from the primary tumor leave their initial site and enter the circulation (local invasion, intravasation). They adapt to survive in blood vessels until they reach distant tissues (extravasation). Finally, the metastatic tumor grows in its new microenvironment (metastatic colonization) (adapted from Valastyan and Weinberg, 2011).


3.2 EMT regulatory programs

EMT is regulated at three levels: transcriptional regulation at the gene level, post-transcriptional regulation at the RNA level and translational regulation at the protein level. These three levels converge to repress key epithelial genes, such as E-cadherin and other cell-cell contact proteins like occludin, and to activate mesenchymal genes, such as vimentin and fibronectin, and transcription factors such as Snail and Twist, thereby inducing the mesenchymal phenotype.

3.2.1 Transcriptional regulation

Transcriptional regulation relies on the expression of key transcription factors, such as Snail, Twist and Zeb, in response to extracellular stimuli. These factors quickly shut down genes essential for the epithelial phenotype, such as those encoding cell-cell junction and cytoskeletal proteins like E-cadherin, occludin and cytokeratins, leading to cell individualization and loss of apico-basal polarity (Figure 32). In parallel, genes essential for reprogramming into mesenchymal cells are induced, such as genes encoding cytoskeletal proteins, integrins and metalloproteases, which allow cells to form protrusions and to degrade the basement membrane and extracellular matrix in order to migrate and invade other tissues (Barrallo-Gimeno, 2005; Sánchez-Tilló et al., 2010).

Figure 32: Role of transcription factors in EMT regulation. Multiple signaling pathways (inducers) can regulate EMT by modulating the Snail, Twist and Zeb transcription factors, promoting migration and invasion and facilitating metastasis formation. The reverse process (MET) is regulated by a different set of transcription factors, including ELF3, GATA3 and GRHL2 (adapted from https://www.bio-rad-antibodies.com/epithelial-to-mesenchymal-transition.html).


Conversely, just as some transcription factors induce EMT, others reverse it, such as ELF3, GATA3 or GRHL2, which induce the mesenchymal-to-epithelial transition. MET is essential for the final step of tumor metastasis, once the tumor cell has invaded another tissue and must return to an epithelial state to form a new carcinoma (Sengez et al., 2019; Somarelli et al., 2016; Takaku et al., 2016).

3.2.2 Post-transcriptional regulation

a microRNAs

Small non-coding RNAs of around 22 nucleotides, called microRNAs (miRNAs), have recently been shown to also play a key role in EMT regulation by repressing the translation of transcription factors essential to trigger EMT, such as Snail and Twist. Several miRNAs, including the miR-200 family, miR-34 and miR-101, have been identified, and their expression is sufficient to block EMT, making them promising novel therapeutic targets to reduce tumor metastasis and resistance to treatment (Guo et al., 2014; Mongroo and Rustgi, 2010; Siemens et al., 2011).

b Alternative splicing

The most recently identified regulatory layer is the alternative processing of transcribed pre-mRNAs into different mature RNAs that give rise to proteins with different functions. Almost every step of the reprogramming of epithelial cells into mesenchymal cells involves an alternatively spliced gene with an epithelial-specific and a mesenchymal-specific splicing variant, each with distinct cell-specific functions. For instance, the mesenchymal splicing variant of the transmembrane receptor FGFR2 can recognize FGF-2 as a ligand, whereas the epithelial isoform has lower affinity for it, which affects cell differentiation, growth and invasive capacity (Savagner et al., 1994; Warzecha and Carstens, 2012). In the case of CD44, the mesenchymal splice variant induces the formation of invadopodia, increasing cell migration, whereas the epithelial isoform does not (Chen et al., 2018). These splicing variants are so critical that switching from one variant to another can be sufficient to induce or impair EMT (Brown et al., 2011; Ranieri et al., 2016; Shapiro et al., 2011).


3.3 Link between alternative splicing and EMT

3.3.1 Global splicing reprogramming during EMT and examples

During EMT, splicing is globally reprogrammed, with more than 4,000 alternative splicing changes, some of which are sufficient to trigger EMT by themselves, suggesting that alternative splicing is a key regulatory mechanism of EMT and tumor progression (Grosso et al., 2008; Shapiro et al., 2011). Mis-splicing of genes including RON, ENAH, CD44, FGFR2 or CTNND1 can actively induce EMT (Table 5) (Brown et al., 2011; Carstens et al., 1997), and a recent genome-wide study found more than 100 alternative splicing events conserved across different cancer types, of which almost 50 were associated with EMT or its reverse process, the mesenchymal-to-epithelial transition (MET) (Danan-Gotthold et al., 2015). A growing number of therapeutic strategies, some of them currently in clinical trials, aim to restore normal splicing patterns in cancer cells by using splicing blockers or modified oligonucleotides to interfere with the splicing of transcripts such as BCL-X, FGFR1, RON or MDM2, which are known to promote metastasis in breast cancer (Bartel and Harris, 2004).

a Fibroblast growth factor receptor 2 (FGFR2)

FGFR2 is a transmembrane receptor tyrosine kinase of the fibroblast growth factor receptor family. It is activated by ligands of the fibroblast growth factor (FGF) family (Turner and Grose, 2010). Binding of these ligands triggers a cascade of downstream signals that ultimately influence several EMT-related processes, such as differentiation and invasion. FGFR2 contains two mutually exclusive alternative exons, IIIb and IIIc, which encode part of the third extracellular immunoglobulin-like domain of the receptor. These two exons undergo tissue-specific alternative splicing (Carstens et al., 2000; Warzecha et al., 2009), conferring different affinities for FGF ligands on the two receptor isoforms (Zhang et al., 2006). Exon IIIb is predominantly included in epithelial cells, whereas exon IIIc inclusion is limited to mesenchymal cells. The switch between FGFR2-IIIb and FGFR2-IIIc isoforms was one of the first examples demonstrating the role of alternative splicing during EMT (Medina et al., 1999; Savagner et al., 1994), and more recently, expression of the FGFR2-IIIc variant was shown to be sufficient to promote cell migration, invasiveness and proliferation in response to FGF-2 (Sanidas et al., 2014).


Table 5: Examples of alternative splicing changes induced during EMT. List of several genes, including FGFR2, CTNND1 and CD44, that have important functions in cellular processes involved in EMT, with their isoform specific functions and localizations (adapted from Warzecha and Carstens, 2012).

b p120-catenin (CTNND1)

The CTNND1 gene encodes p120-catenin, a member of the Armadillo (Arm) protein family. It regulates cell-cell adhesion by stabilizing E-cadherin and is involved in cytoskeletal rearrangement through regulation of Rho GTPase activity (Davis et al., 2003). In addition to stabilizing E-cadherin, CTNND1 can promote invasion and cell motility (Yanagisawa et al., 2008). These two functions seem contradictory but can be explained by the different functions of the protein isoforms expressed in epithelial or mesenchymal cells. The mesenchymal variant contains several alternative exons, including exon 2, which are predominantly skipped in epithelial cells, leading to expression of a shorter isoform in these cells (Keirsebilck et al., 1998). The lack of these exons results in the absence of a coiled-coil domain in the epithelial variant. In mesenchymal cells, this domain stabilizes RhoA binding and inhibits RhoA activity, resulting in increased migration and cell invasiveness (Epifano et al., 2014; Keirsebilck et al., 1998). During EMT, expression of the mesenchymal CTNND1 isoform is induced (Zhang et al., 2014).

c CD44

The CD44 gene encodes a transmembrane glycoprotein involved in many cellular processes, such as cell survival, migration and proliferation (Zöller, 2011). CD44 displays high structural heterogeneity due to the presence of ten alternatively spliced exons in its coding sequence, giving rise to a plethora of isoforms whose expression depends on cell type and differentiation stage. In humans, the CD44 transcript is composed of 20 exons, including 10 variable exons (v1-v10) and 10 constitutive exons (exons 1-5 and 16-20). The standard isoform, called CD44s, contains only the constitutive exons, while inclusion of the variable exons enlarges the extracellular region of CD44, providing new interaction sites for additional molecules (Bennett et al., 1995; Hertweck et al., 2011). The CD44s isoform is ubiquitously expressed, while spliced variants (CD44v) are restricted to specific cell types. In keratinocytes, the longest isoform is expressed. During monocyte differentiation, the v6 exon is progressively included, while it is skipped during granulocyte differentiation. During EMT, a drastic switch from the CD44E variant (exons v8-v10 included) to the CD44s variant occurs, and this switch has even been reported to be required for EMT to proceed, showing how important the relationship between alternative splicing and EMT is (Brown et al., 2011; Warzecha et al., 2009).

3.3.2 Factors regulating epithelial and mesenchymal splicing

Increasing evidence indicates that alternative splicing is regulated by tissue-specific splicing factors that are mostly directly involved in processes essential for EMT (Figure 33) (Bebee et al., 2014). The major EMT splicing regulators are the epithelial splicing regulatory proteins 1 and 2 (ESRP1/2). They were first identified in a screen for regulators of FGFR2 splicing, and their downregulation causes a switch from the exon IIIb to the exon IIIc isoform (Warzecha et al., 2009). Other splicing factors, such as RBM47 and MBNL1, have been identified as potential splicing regulators in epithelial cells (Yang et al., 2016). Interestingly, alternative splicing in mesenchymal cells is regulated by a different subset of regulators, including RBFOX2 and SRSF1 (Bebee et al., 2014), although some factors like MBNL1 can regulate splicing in both cell types. A combinatorial effect of different splicing factors, in this case MBNL1 and RBFOX2, is responsible for correct splicing regulation (Shapiro et al., 2011). Here we will focus on factors that regulate splicing in epithelial and mesenchymal cells in the context of cell reprogramming during EMT.

Figure 33: Binding and expression of splicing factors in EMT. (A) Analysis of RNA-binding motifs in exons skipped during EMT. Binding sites are annotated as described in the key. (B) Expression levels of splicing factors and RNA-binding proteins in epithelial and mesenchymal cells (adapted from Shapiro et al., 2011).


a Epithelial splicing factors

The emergence of high-throughput sequencing has made it possible to identify many new alternative splicing events in EMT and to better understand the EMT itself. These sequencing technologies have also favored the discovery and characterization of the regulators of these splicing events (Figure 34). As mentioned above, the first to be identified were the ESRP splicing factors. These regulators are among the most strongly downregulated genes in multiple models of EMT and are involved in FGFR2 splicing regulation (Hovhannisyan and Carstens, 2005). Consequently, upon ESRP downregulation, ESRP-target transcripts switch from epithelial to mesenchymal isoforms (Warzecha and Carstens, 2012). Interestingly, ESRP targets have been shown to overlap with RBFOX2-regulated alternative splicing events, even though RBFOX2 is mostly involved in mesenchymal splicing regulation (Braeutigam et al., 2014; Venables et al., 2013). During EMT, RBFOX2 expression is slightly increased. These observations point to a combinatorial regulation of alternative splicing during EMT involving both constitutively expressed and cell-type-specific factors.

Figure 34: Cell-type-specific splicing factors and associated splicing events. Splicing factors whose expression or activity shifts during the epithelial (blue) to mesenchymal (red) transition affect relevant splicing events such as those of FGFR2 and CD44 (adapted from Neumann et al., 2018).


Other factors have also been implicated in splicing regulation during EMT. The RNA binding motif protein 47 (RBM47) has recently been shown to regulate splicing in epithelial cells, either independently of the ESRPs and with opposite effects, or in collaboration with them to produce a combinatorial effect on splicing (Yang et al., 2016). In cancer cells, RBM47, through its ability to modulate splicing, has been linked to metastasis formation (Vanharanta et al., 2014). The splicing factor HNRNPA1 is another epithelial splicing regulator. It inhibits the expression of ΔRon, an active isoform of the Ron tyrosine kinase receptor that is important for EMT induction: HNRNPA1 can antagonize SRSF1 binding, preventing exon skipping (Bonomi et al., 2013). In cancer cells, expression of various ILF3 isoforms is regulated by the mesenchymal splicing factor SRSF3, leading to increased cell proliferation and transformation (Jia et al., 2019).

b Mesenchymal splicing factors

Just as specific splicing factors have been shown to regulate epithelial splicing events, several factors have been associated with the expression of specific mesenchymal variants. This is the case for SRSF1, RBFOX2 and MBNL1. RBFOX2 is one of the most important mesenchymal splicing factors. It was initially identified in breast cancer cells as a splicing enhancer (Lapuk et al., 2010). Knock-down of RBFOX2 in mesenchymal cells has been reported to decrease cell invasiveness and migration, reinforcing the idea that it is a major alternative splicing regulator in mesenchymal cells (Braeutigam et al., 2014; Venables et al., 2013). CLIP-seq data have shown that RBFOX2 binds introns near alternatively spliced exons, either upstream or downstream, leading to exon skipping or inclusion, respectively. RBFOX2 is also essential for embryonic stem cell survival (Yeo et al., 2009). MBNL1 is another RNA-binding protein regulating alternative splicing in mesenchymal cells (Shapiro et al., 2011). In 2013, MBNL1 and RBFOX2 were shown to cooperate in mesenchymal cells to generate new splicing variants, for instance of the ADD3 and LRRFIP2 genes. Conversely, MBNL1 can also compete with other factors, as happens between MBNL1 and PTBP1 in the regulation of PLOD2 and INF2 transcripts (Venables et al., 2013). SRSF1 (SF2/ASF), a member of the SR family, also regulates alternative splicing during EMT and is important for its progression (Gout et al., 2012). During EMT, the Ron proto-oncogene is involved in cell invasion and motility, and the ΔRon isoform expressed in mesenchymal cells is generated by the binding of SRSF1 to an exonic splicing enhancer (ESE) in exon 12 (Ghigna et al., 2005). These data suggest that deregulation of alternative splicing is a major factor in tumour progression, cancer aggressiveness and EMT induction.

Other splicing factors from the HNRNP and SR families, as well as tissue-specific regulators, are also known to be involved in alternative splicing regulation during EMT (Figure 35) (Pradella et al., 2017; Shapiro et al., 2011).

Figure 35: Different splicing factors involved in EMT. During EMT, many constitutively expressed splicing factors from the SR and HNRNP families are involved in splicing regulation. Other, tissue-specific factors regulate splicing in a cell-type-dependent manner, with expression or localization that varies with the stage of EMT (adapted from Pradella et al., 2017).


c Histone modifications and chromatin factors

Genome-wide studies have shown that in cancer, and more specifically during EMT, several histone modifications undergo global changes. More precisely, there is an increase in H3K36me3 and H3K4me3, both associated with active transcription, and a global decrease in the heterochromatin marks H3K9me2/3 (McDonald et al., 2011; Podlaha et al., 2014). As previously mentioned, some of these marks correlate with alternative splicing changes in genes essential for the EMT process, such as CD44 and FGFR2. FGFR2 harbors two mutually exclusive exons, IIIb and IIIc, specifically expressed in epithelial and mesenchymal cells, respectively, and it has been demonstrated in steady-state cell lines that these exons are differentially enriched in histone marks deposited by the chromatin factors SETD2 and EZH2 (H3K36me3 and H3K27me3, respectively) (Luco et al., 2010). Changes in splicing of the exons of the CD44 variable region, which are excluded during EMT, are associated with changes in enrichment of the H3K9me3 mark, which is recognized by the chromatin factor HP1γ (Saint-André et al., 2011). Even though links between specific histone modifications and different alternative splicing outcomes are known, it remains unclear how these two regulatory layers dynamically interact during EMT and what role these marks play in establishing the new cell-specific splicing program.


Chapter 4: Thesis aims

4.1 Identification of histone modifications involved in the establishment and maintenance of a new EMT-specific splicing program

Taking advantage of RNA sequencing and chromatin immunoprecipitation sequencing (ChIP-seq) data recently generated in the lab, I correlated in time, during the onset of EMT, changes in alternative splicing with changes in histone modification enrichment at key EMT time points. I used an inducible cell reprogramming system based on normal human mammary epithelial cells that stably express a tamoxifen (TXF)-inducible form of the EMT transcriptional regulator Snail (MCF10a-Snail-ER). Upon TXF treatment, Snail enters the nucleus and silences key epithelial markers 3h-6h after induction. This leads to rapid reprogramming into mesenchymal cells, with the first changes in splicing visible as soon as 12h-20h after treatment and expression of mesenchymal markers at 24h-48h. EMT is complete after 7 days. Based on these genome-wide data and on what was already known about a potential link between alternative splicing and histone modifications, I used ChIP-qPCR and RT-qPCR to study the dynamic interplay between alternative splicing and two groups of chromatin marks: marks already known to be associated with changes in splicing (H3K36me3, H3K27me3), and marks identified by ChIP-seq as differentially enriched on alternatively spliced exons (H3K27ac, H3K4me1). To do so, I performed detailed time courses on well-known model genes that are regulated by chromatin and whose splicing pattern changes early during EMT, such as FGFR2, and on new candidates identified from the genome-wide data, such as CTNND1 and TCF7L2.

4.2 Direct effect on alternative splicing of modulating regulatory histone modifications by adapting the innovative CRISPR/dCas9 system

Current approaches to modulate histone marks are based on changes in the global expression of known histone modifiers, which create many pleiotropic effects. The search for a more specific system to address the direct role of histone marks in splicing regulation led us to the CRISPR/dCas9 epigenome editing system. In this system, a DNA-targeting, nuclease-dead Cas9 mutant (dCas9) is fused to the catalytic core of the chromatin modifier of interest (rather than the full-length protein, to avoid potential non-specific effects), such as the H3K27 methyltransferase EZH2 or the acetyltransferase p300, to specifically modify the level of a histone mark at a given target sequence. Based on this strategy, I generated and established a dCas9-guided system to locally modify, at the gene locus of interest, the histone marks identified in Aim 1, in order to test their role in splicing and EMT. As a proof of concept, I tethered the H3K27 methyltransferase EZH2 to the FGFR2 alternatively spliced region to induce a change in FGFR2 splicing, since this chromatin modifier has previously been shown to regulate FGFR2 alternative splicing (Luco et al., 2010). An empty dCas9 fused to GFP was used as a negative control, together with catalytically dead mutants, to ensure that the effects were not due to an impact on chromatin structure, RNA polymerase II elongation rate or other indirect effects. I also tested the gene specificity of the system by monitoring the splicing of other alternatively spliced genes important for EMT, and used complementary gRNAs to assess potential off-target effects. The successful generation of these guidable histone modifiers allowed me to alter the histone mark landscape in an exon-specific manner and to study its direct impact on alternative splicing in greater detail, especially for the key EMT splicing events identified in Aim 1.

4.3 Physiological impact on EMT progression of modulating splicing-specific histone modifications

A growing body of evidence suggests a central role of EMT in metastasis and cancer recurrence. This clinical relevance, combined with increasing evidence for the importance of alternative splicing in EMT, may lead to novel therapeutic strategies. I therefore tested in vitro the effect on EMT progression of locally modulating histone marks important for key EMT-specific splicing events, using the newly established dCas9 system. Morphological and physiological changes were characterized with wound-healing and transwell assays, which I established for the first time in the lab and which allowed me to study the effects on cell migration and invasiveness.


4.4 Mechanisms linking histone modifications to alternative splicing regulation during EMT

To identify the mechanism underlying the causal role of histone marks in alternative splicing regulation, I first examined RNA polymerase II elongation kinetics, which are known to link chromatin and alternative splicing by modulating the recruitment of splicing factors to specific binding sites. To do so, I compared the RNA polymerase II elongation rate at alternatively spliced exons before and after EMT induction to determine whether RNAP II kinetics play a role in establishing the new splicing variant during EMT. I also tested whether these specific chromatin marks modulate the recruitment of splicing factors by analyzing the effect of knocking down several factors potentially involved in this splicing regulation and identified by motif search analysis, such as PTB. Finally, I quantified the differential recruitment of some of these factors to the alternatively spliced exons during EMT by UV-crosslinking RNA immunoprecipitation, and assessed how altering the levels of specific chromatin marks with the dCas9 system affects the recruitment of candidate factors to the spliced exons.


RESULTS


Article: Histone marks are drivers of splicing changes necessary for an epithelial-to-mesenchymal transition (Submitted)

Segelle Alexandre, Núñez-Álvarez Yaiza, Webb Kimberly M., Villemin Jean-Philippe, Voigt Philipp, Luco Reini F.


Histone marks are drivers of splicing changes necessary for an epithelial-to-mesenchymal transition

Segelle Alexandre 1, Núñez-Álvarez Yaiza 1, Webb Kimberly M. 2, Villemin Jean-Philippe 1, Voigt Philipp 2, Luco Reini F. 1*

1. Institute of Human Genetics, UMR9002 CNRS-University of Montpellier, 34000, France.

2. Wellcome Centre for , School of Biological Sciences, University of Edinburgh, Edinburgh EH9 3BF, United Kingdom

* Corresponding author and Lead contact: Reini F. Luco

Summary

Alternative splicing is essential in many biological processes, including cell differentiation and reprogramming. In addition to well-known RNA-centric regulatory layers, chromatin has also been implicated in the modulation of specific splicing patterns. However, proof of a causal relationship between epigenetic modifications and splicing outcomes in a biologically relevant context has remained elusive. We find that local changes in the H3K27me3 and H3K27ac epigenetic marks at alternatively spliced genes important for the epithelial-to-mesenchymal transition (EMT), such as FGFR2 and CTNND1, perfectly correlate in time with dynamic changes in alternative splicing during EMT. CRISPR epigenome editing of these exon-specific histone marks is sufficient to induce inclusion of the mesenchymal-specific splicing isoform, by impairing recruitment of the RNA-binding splicing regulator PTB, and this splicing switch triggers an EMT. These results reveal the direct capacity of histone marks to drive phenotypic changes via alternative splicing regulation, opening new perspectives for the development of novel chromatin-based therapeutic strategies that specifically target disease-related splicing events, such as those involved in EMT-related tumour metastasis.

Keywords

Alternative splicing; chromatin; histone modifications; H3K27; CRISPR; epigenome editing; epithelial-to-mesenchymal transition; EMT.


Introduction

Alternative splicing of pre-mRNA transcripts is an essential process that increases the proteome diversity of a cell. In the last twenty years, alternative splicing has been shown to be intimately linked to the transcriptional machinery (Herzel et al., 2017; Perales and Bentley, 2009). Early evidence pointed to a role for RNA polymerase II elongation rate in modulating the window of time available for splicing regulators to be recruited to competing RNA binding sites along the pre-mRNA during co-transcriptional splicing (de la Mata et al., 2003; Shukla et al., 2011). More recently, zinc-finger transcriptional regulators (Han et al., 2017), 3D chromatin organization (Curado et al., 2015; Mercer et al., 2013), non-coding RNAs (Allo et al., 2009; Ameyar-Zazoua et al., 2012; Gonzalez et al., 2015), and specific histone modifications such as histone acetylation (Gunderson and Johnson, 2009), H3K4me3 (Sims et al., 2007), H3K9me3 (Ameyar-Zazoua et al., 2012; Yearim et al., 2015), or H3K36me3 (Guo et al., 2014; Luco et al., 2010; Pradeepa et al., 2012) have also been implicated in the regulation of alternative splicing in a more direct way, by inducing recruitment of splicing regulators to the pre-mRNA. However, most of these studies were based on global alterations of chromatin, via misexpression or drug-based inhibition of the chromatin factors involved, making it difficult to separate direct from indirect pleiotropic effects. It has therefore remained elusive whether a specific histone mark is capable of driving a change in splicing that is sufficient to impact the cell’s biology.

We have previously shown that alternatively spliced genes involved in the maintenance of epithelial or mesenchymal cell identity, such as the Fibroblast Growth Factor Receptor 2 (FGFR2), are differentially enriched in two distinct chromatin signatures, H3K36me3/H3K4me1 or H3K4me3/H3K27me3, depending on the pattern of splicing. An increase in H3K36me3 levels, by overexpressing the histone methyltransferase SETD2, induced inclusion of the FGFR2 mesenchymal isoform in normal human prostate epithelial cells, while overexpression of the Trithorax H3K4 methyltransferase ASH2 had the opposite effect in mesenchymal stem cells (Luco et al., 2010). Additionally, we found that expression of an antisense long non-coding RNA, which overlaps this chromatin-dependent alternatively spliced region of the FGFR2 locus, was responsible for the recruitment of the chromatin modifiers EZH2 (H3K27 methyltransferase) and KDM2a (H3K36 demethylase), necessary to establish the chromatin signature that favours the epithelial-specific splicing isoform (Gonzalez et al., 2015). Here, we aimed to study the functional link between these histone modifications and changes in alternative splicing in a dynamic cellular system, to evaluate the causal effect of these epigenetic marks in establishing a new cell-specific alternative splicing pattern that directly impacts the cell phenotype. To do so, we took advantage of the well-established epithelial-to-mesenchymal transition (EMT), in which changes in the alternative splicing of specific genes are sufficient to induce cell reprogramming (Brown et al., 2011; Ranieri et al., 2016; Shapiro et al., 2011). We first correlated in time, during EMT, changes in the enrichment of H3K36me3, H3K4me1, H3K27ac and H3K27me3, previously shown to affect alternative splicing (Gonzalez et al., 2015; Luco et al., 2010), with changes in exon inclusion in key EMT genes, such as the aforementioned FGFR2 growth factor receptor and the cell-cell adhesion protein catenin delta-1 (CTNND1) (Sanidas et al., 2014; Shapiro et al., 2011; Yanagisawa et al., 2008). We identified two histone marks, H3K27me3 and H3K27ac, whose levels perfectly correlated in time with the highly dynamic changes in splicing observed during EMT. CRISPR/dCas9 epigenome editing of these H3K27-marked exons in human epithelial cells was sufficient to trigger the inclusion of the FGFR2 and CTNND1 mesenchymal-specific splicing isoforms, which could initiate an EMT. These findings provide direct evidence establishing chromatin modifications as drivers of alternative splicing patterns important for cell identity, and uncover a new mechanism through which dynamic changes in splicing are regulated by exon-specific dynamic changes in chromatin.

Results

Specific histone modifications correlate in time with dynamic changes in splicing during EMT.

The epithelial-to-mesenchymal transition is a cell reprogramming process involved in early development, wound healing, and tumour invasion in metastasis (Javaid et al., 2013; Shapiro et al., 2011). Human epithelial MCF10a cells stably expressing the EMT inducer SNAIL1 fused to the oestrogen receptor (MCF10a-Snail-ER) can be reprogrammed into mesenchymal-like cells in less than a week by addition of the ER ligand tamoxifen (Figure 1A) (Javaid et al., 2013). The first changes in splicing of classical EMT genes such as FGFR2, CTNND1, SLK or SCRIB are observed as early as 12h after induction (T0.5) (Figure 1C,H and S1F,J,N). Moreover, all changes in splicing can be reversed by removing tamoxifen from the culture medium for three weeks, a process we refer to as the mesenchymal-to-epithelial transition (MET), highlighting the dynamic nature of this cellular system (Figure 1C,H and S1F). When comparing changes in alternative splicing of key EMT genes (CTNND1, ENAH, FGFR2, SLK, SCRIB and TCF7L2, Figure 1A) with changes in the levels of histone modifications previously shown to mark alternatively spliced genes (Gonzalez et al., 2015; Luco et al., 2010), we found that changes in H3K27me3 and H3K27ac strongly correlated in time with splicing changes in 5 out of 6 genes studied (Figure 1B-E,G-J and S1I-P), whereas H3K4me1 only correlated at late phases of EMT (Figure 1F,K and S1I,K) and H3K36me3 exhibited poor correlation with changes in splicing (Figure S1D,H and data not shown).

These epigenetic changes were highly localised, occurring only at specific exons along the studied genes, such as CTNND1 exon 2, FGFR2 exon IIIc, SLK exon 13, TCF7L2 exon 4 and SCRIB exon 16 (Figure 1D-F,I-K and S1I,K,M). With the exception of the FGFR2 mutually exclusive exons, H3K27me3 levels positively correlated with inclusion of the alternatively spliced exon in all the genes tested (Figure 1D and S1I,K,M). H3K27me3 and H3K27ac levels were anti-correlated at most of the regulated exons, and H3K4me1 changed in the same direction as H3K27ac, suggesting distinct functions for these marks but coordinated effects on splicing regulation (Figure 1D-F,I-K and S1I,K). Of note, early changes in exon-specific histone marks did not correlate with changes in gene expression or nucleosome positioning during EMT (Figure S1A,C,E,G), suggesting a direct effect on splice site selection. Finally, the observed changes in chromatin modifications were not only as dynamic as the changes in splicing, but also reversible upon MET, implying epigenetic plasticity (Figure 1, MET panel).

In conclusion, we found a localised enrichment of specific histone marks, H3K27me3, H3K27ac and H3K4me1, whose changes correlate in time with the highly dynamic splicing changes observed as epithelial cells are reprogrammed into mesenchymal cells during EMT, suggesting a potential functional link.
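As a minimal sketch of how such a temporal correlation could be quantified (the statistic actually used for Figure 1 is not specified here), the Pearson correlation coefficient can be computed between the enrichment of a histone mark and the inclusion level of an exon measured at the same time points. The function name `pearson_r` and all example values below are hypothetical and do not correspond to the measured data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long series.

    Hypothetical helper: xs could hold histone-mark enrichment (e.g. ChIP-qPCR
    values) and ys exon inclusion levels, both sampled at the same time points.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance numerator and the two sums of squared deviations
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Made-up time course over six hypothetical EMT time points
mark = [1.0, 1.6, 2.1, 2.8, 3.0, 3.2]             # enrichment of a mark
inclusion = [0.05, 0.07, 0.15, 0.30, 0.45, 0.60]  # exon inclusion ratio
print(round(pearson_r(mark, inclusion), 2))
```

A value close to +1 or -1 indicates that mark and inclusion change together, positively or inversely (as expected for two anti-correlated marks such as H3K27me3 and H3K27ac). With only a handful of time points such a coefficient is indicative at best, and it does not capture which change comes first.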


Localised changes in H3K27me3 and H3K27ac drive alternative splicing.

In contrast with the late changes in H3K4me1, changes in H3K27me3 and H3K27ac were evident at 6h post-induction, before changes in splicing could be detected in CTNND1 and FGFR2 alternative exons (Figure 1, T0.25 panel), suggesting a causal role of these marks in alternative splicing. To directly test this hypothesis, we adapted the CRISPR/dCas9 system to edit the epigenome specifically at differentially marked alternatively spliced exons (Hilton et al., 2015; Maeder et al., 2013). Catalytic domains of well-known H3K27 modifiers were fused to a DNA-targeting-competent but nuclease-dead Cas9 mutant (dCas9) to induce site-specific changes in H3K27 methylation or acetylation levels. Using this system, the H3K27 methyltransferase EZH2 (Hwang et al., 2008; Margueron and Reinberg, 2011), the UTX1 demethylase (Hong et al., 2007), the p300 acetyltransferase (Hilton et al., 2015) and the Sid4x deacetylase (Pradeepa et al., 2016; Siam et al., 2019) were targeted to CTNND1 exon 2 or FGFR2 exon IIIc in untreated epithelial MCF10a-Snail-ER cells (Figure 2A,G). To verify the exon specificity of the system, alternatively spliced exons present in the same genes but not differentially enriched for these histone marks during EMT, namely CTNND1 exon 20 and FGFR2 exon IIIb, were also targeted using the same dCas9 modifiers (Figure S2C,L). As expected, dCas9-p300, but not its catalytic mutant dCas9-p300*, increased H3K27ac levels specifically at the targeted exons in both genes (Figure 2B,H and S2D,H,M,Q). On the other hand, dCas9-Sid4x slightly reduced H3K27ac levels, only at CTNND1 exon 2, and dCas9-UTX1 reduced H3K27me3 levels mostly at exon IIIc, which are the exons enriched in these marks in epithelial cells (Figure 2B,C,H,I). However, dCas9-EZH2 had only a minor effect on H3K27me3 levels. We therefore tested vSET, a viral SET domain protein that specifically methylates H3K27 without requiring Polycomb Repressive Complex 2 subunits for activity (Figure S2A-B) (Mujtaba et al., 2008).
As expected, a dimeric vSET construct fused to dCas9 (dCas9-vSETx2), but not the catalytic mutant dCas9-vSETx2*, strongly increased H3K27me3 levels precisely at the targeted exons (Figure 2C,I and S2E,I,N,R). Finally, H3K4me1, H3K9ac and H3K9me2 were not affected by H3K27 epigenome editing, confirming the specificity of the system (Figure S2U,V). Interestingly, the increase in H3K27ac levels mediated by dCas9-p300 also resulted in reduced H3K27me3 levels, while dCas9-UTX1-mediated H3K27 demethylation increased H3K27ac levels. These findings confirm the anti-correlative nature of these marks and establish the capacity of the CRISPR/dCas9 system to generate the chromatin signatures observed during EMT (Figure 2B,C,H,I).

As predicted from the changes in histone modifications observed during EMT (Figure 1D,E), only the dCas9-EZH2/vSETx2-mediated increase in H3K27me3 levels or the dCas9-Sid4x-mediated decrease in H3K27ac levels affected CTNND1 splicing, resulting in a ~3.5-fold increase in the inclusion of the mesenchymal-specific exon 2 (Figure 2D). In contrast, and consistent with the exon-specific H3K27 signature observed at FGFR2 exon IIIc in EMT cells (Figure 1I,J), exon IIIc splicing was induced by an increase in H3K27ac (dCas9-p300) and by a decrease in H3K27me3 (dCas9-UTX1) levels (Figure 2J). These results validate the hypothesis that H3K27me3 and H3K27ac levels play a direct role in the regulation of splicing. Importantly, H3K27 epigenome editing affected neither the total expression levels of these genes nor the splicing of other exons, such as CTNND1 exon 20 or FGFR2 exon IIIb, supporting a direct exon-specific splicing effect (Figure 2E,F,K,L). Furthermore, the catalytically dead mutants dCas9-vSETx2* and dCas9-p300* did not impact exon inclusion levels either, supporting an epigenetic-dependent effect on alternative splicing independent of alterations in chromatin structure and/or RNA polymerase II kinetics (Figure 2D,J). Finally, targeting CTNND1 exon 2 or FGFR2 exon IIIc with a second set of gRNAs (g2) also consistently induced inclusion of the mesenchymal-specific isoforms when the corresponding dCas9 modifier was used, confirming the robustness of the results and ruling out potential off-target effects (Figure S2C-G,L-P). It is important to note that epigenome editing of alternatively spliced exons not differentially marked by H3K27 modifications during EMT, such as CTNND1 exon 20, FGFR2 exon IIIb or ENAH exon 11, had no impact on their splicing (Figure S2H-K,Q-T and not shown), suggesting an exon-specific regulatory effect in which not all exons are sensitive to H3K27 modifications.
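To make the fold-change arithmetic explicit, here is a minimal sketch assuming that exon inclusion is expressed as the fraction of transcripts including the exon (inclusion isoform over inclusion plus skipping isoforms), computed from relative isoform abundances such as RT-qPCR quantities, and compared with a control condition such as a catalytic mutant. The exact quantification behind Figure 2 is not described here; the helper names and numbers are hypothetical.

```python
def inclusion_ratio(included, skipped):
    """Fraction of transcripts that include the alternative exon.

    `included` and `skipped` are relative abundances of the two isoforms
    (hypothetical inputs, e.g. derived from RT-qPCR quantification).
    """
    total = included + skipped
    if included < 0 or skipped < 0 or total == 0:
        raise ValueError("abundances must be non-negative and not both zero")
    return included / total


def fold_change(treated_ratio, control_ratio):
    """Inclusion in edited cells relative to control cells."""
    if control_ratio <= 0:
        raise ValueError("control inclusion must be positive")
    return treated_ratio / control_ratio


# Made-up example: control cells vs. epigenome-edited cells
control = inclusion_ratio(included=1.0, skipped=9.0)  # 0.10
edited = inclusion_ratio(included=3.5, skipped=6.5)   # 0.35
print(round(fold_change(edited, control), 2))         # 3.5
```

Expressing inclusion as a ratio of isoforms, rather than as the absolute amount of one isoform, keeps the measure insensitive to changes in total gene expression, which is consistent with checking separately that overall expression of the edited genes was unchanged.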

Taken together, our results demonstrate that local changes in specific histone modifications at epigenetically marked exons are sufficient to trigger a change in splicing, establishing a causal role of chromatin marks in alternative splicing and their capacity to drive cell-specific splicing changes.

Chromatin-induced changes in splicing recapitulate the EMT.

Since changes in CTNND1 and FGFR2 alternative splicing are important regulators of EMT, and as mesenchymal-specific isoforms have been associated with poor prognosis in several carcinomas, including breast and prostate cancer (Carstens et al., 1997; Sebestyén et al., 2015; Shapiro et al., 2011), we next tested the impact of these chromatin-mediated splicing changes on EMT. It has been previously shown that increased CTNND1 exon 2 inclusion affects the capacity of the protein to interact with E-cadherin, destabilizing cell-cell interactions and increasing cell motility and invasiveness (Yanagisawa et al., 2008). Furthermore, an H3K36me3-mediated decrease of the FGFR2 exon IIIc mesenchymal isoform, which impacts the ligand specificity of the receptor, was shown to significantly decrease the migratory and invasive phenotype of non-small cell lung cancer cells, without impacting proliferation or apoptosis (Sanidas et al., 2014). We thus tested whether H3K27-mediated epigenetic induction of the mesenchymal-specific isoforms of CTNND1 or FGFR2 could reproduce EMT-like phenotypes in epithelial MCF10a-Snail-ER cells. A dCas9-Sid4x- or dCas9-vSETx2-mediated increase in the CTNND1.ex2 mesenchymal isoform, or a dCas9-p300- or dCas9-UTX1-mediated increase in the FGFR2.IIIc mesenchymal isoform, significantly decreased the expression of classical epithelial markers such as E-cadherin and EPCAM, while increasing the expression of the mesenchymal markers ECM-1 and MCAM by ~2-fold, at both the transcript and protein levels, whereas none of the other dCas9 modifiers or catalytic mutants (dCas9-vSETx2*, dCas9-p300*) had an effect (Figure 3A-D and not shown). Moreover, the H3K27me3-mediated switch in CTNND1 splicing significantly increased the non-directional (wound-healing) and bi-directional (transwell assay) migration capacity of targeted MCF10a-Snail-ER cells (Figure 3E-F).
Importantly, catalytically dead dCas9-p300* and dCas9-vSETx2* had no effect on EMT (Figure 3A-F), and none of the splicing regulators known to play a role in EMT changed expression levels upon CRISPR epigenome editing (Figure S3A), supporting a direct chromatin-mediated effect on EMT progression. We conclude that highly localised changes in H3K27 marks at alternatively spliced exons of key EMT genes are sufficient to induce changes characteristic of cell reprogramming.

To assess the biological impact of chromatin-induced changes in CTNND1 splicing in more depth, we took advantage of an antibody that specifically recognises CTNND1 protein isoforms including the mesenchymal-specific exon 2 (mCTNND1(ex2)). dCas9-vSETx2-mediated, H3K27me3-driven induction of exon 2 inclusion in epithelial MCF10a-Snail-ER cells increased the proportion of cells expressing the CTNND1 mesenchymal isoform (31% positive cells) almost as much as tamoxifen-induced EMT (43% positive cells), while the dCas9-vSETx2* mutant had no effect (9% positive cells) (Figure 3G). Additionally, dCas9-vSETx2-induced mCTNND1(ex2)-positive cells, in which exon 2 inclusion levels were comparable to those of tamoxifen-induced EMT cells (Figure S3C), completely recapitulated the EMT induced by tamoxifen treatment, including bi-directional migration, invasion and expression of EMT markers (Figure 3H-L). Of note, an independent combination of gRNAs (g2) targeting exon 2 had comparable effects, supporting an exon-specific splicing effect of chromatin modifications in EMT (Figure S3B-G).

Collectively, these results support a model in which exon-specific regulation of alternative splicing by H3K27 modifications is sufficient to induce key features of EMT.

H3K27 marks do not regulate splicing by modulating RNA polymerase II elongation rate.

We next sought to understand the mechanistic basis of CTNND1 splicing regulation by H3K27 marks. Chromatin has long been proposed to impact splicing by modulating RNA polymerase II elongation rate, which alters the kinetics of splicing factor recruitment to competing alternative splice sites and/or regulatory RNA binding sites in nascent transcripts (Braunschweig et al., 2013; Dujardin et al., 2014; Luco et al., 2011). As H3K27me3 is known to mediate chromatin compaction, which can slow down transcription, and as histone deacetylase (HDAC) inhibitors, such as TSA, have opposite effects on chromatin and RNA polymerase II dynamics (Dujardin et al., 2014; Margueron and Reinberg, 2011; de la Mata et al., 2003), we compared RNA polymerase II elongation rate at CTNND1 exon 2 before and after EMT induction in MCF10a-Snail-ER cells. Using the RNA polymerase II inhibitor DRB to synchronously block and then release transcription in a cell population, we found, as expected, a delay in transcription of CTNND1 exon 2, but not of the constitutively spliced exon 15, in tamoxifen-induced EMT cells, correlating with enrichment of H3K27me3 and RNA polymerase II at exon 2 (Figure 4B,C,E,F). However, no slowing down of RNA polymerase II was observed 12h after induction of EMT (T0.5, Figure 4A,D), even though changes in H3K27 marks and exon 2 inclusion were already detected at an earlier time point (6h, T0.25; Figure 1). This suggests that the H3K27-induced changes in splicing observed very early during EMT are unlikely to be mediated by changes in RNA polymerase II kinetics. Finally, treatment with drugs increasing (TSA) or decreasing (DRB) RNA polymerase II elongation rate had no effect on CTNND1 splicing (Figure S4A-B), ruling out a kinetic effect on this splicing regulation. We thus conclude that changes in RNA polymerase II elongation rate do not play a role in establishing the new CTNND1 splicing variant during EMT. Instead, they may be a consequence of the new cell-specific splicing pattern and could help maintain it, acting as a feedback mechanism that reinforces the new splice site choice.
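A DRB release experiment is commonly read as a "front-arrival" measurement: after release, the pre-mRNA signal at an amplicon rises once the first polymerases reach it, and the apparent elongation rate is the distance between two amplicons divided by the difference in their arrival times. The Python sketch below illustrates that reasoning under stated assumptions; the 2-fold detection threshold, the time points and the amplicon distances are hypothetical and are not values from this work.

```python
def first_detection_time(times_min, signal, fold_threshold=2.0):
    """First time point (minutes after DRB release) at which the pre-mRNA signal
    exceeds fold_threshold times the signal at the first time point (still blocked)."""
    baseline = signal[0]
    for t, s in zip(times_min, signal):
        if s > fold_threshold * baseline:
            return t
    return None  # the polymerase front never reached this amplicon


def elongation_rate_kb_per_min(dist1_kb, t1_min, dist2_kb, t2_min):
    """Apparent elongation rate between two amplicons located dist1_kb and dist2_kb
    downstream of the transcription start site."""
    if t2_min == t1_min:
        raise ValueError("arrival times must differ")
    return (dist2_kb - dist1_kb) / (t2_min - t1_min)


times = [0, 5, 10, 15, 20]
near = first_detection_time(times, [1.0, 3.0, 5.0, 6.0, 6.0])  # hypothetical amplicon 1 kb from TSS
far = first_detection_time(times, [1.0, 1.1, 1.5, 3.0, 6.0])   # hypothetical amplicon 31 kb from TSS
print(near, far, elongation_rate_kb_per_min(1.0, near, 31.0, far))  # 5 15 3.0
```

In this framing, a "delay in transcription" at an exon corresponds to a later arrival time at that amplicon in one condition compared with another, with the time resolution limited by how densely time points are sampled.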

H3K27 marks modulate the recruitment of specific splicing factors to the pre-mRNA.

In parallel to the RNA polymerase II kinetic model, we and others have identified an alternative mechanism based on chromatin-mediated recruitment of splicing regulators to the pre-mRNA (Luco et al., 2010; Pradeepa et al., 2012; Sims et al., 2007; Yearim et al., 2015; Young et al., 2005). For instance, H3K36me3-mediated recruitment of the splicing repressor PTB to FGFR2 exon IIIb favors inclusion of the mutually exclusive exon IIIc in mesenchymal cells (Gonzalez et al., 2015; Luco et al., 2010). We thus tested a panel of splicing factors potentially involved in CTNND1 exon 2 splicing to determine their possible connection with H3K27 marks. shRNA-mediated knockdown of all the RNA binding proteins previously implicated in CTNND1 splicing regulation (Girardot et al., 2018; Tripathi et al., 2016; Warzecha et al., 2010), as well as of RNA binding proteins identified by motif search analysis, pointed to the splicing factor PTB as a major repressor of CTNND1 exon 2 (Figure 4G-H and S4C). UV-crosslinking RNA immunoprecipitation assays further revealed differential recruitment of PTB to exon 2 pre-mRNA during EMT, with preferential binding to the H3K27ac-marked exon in untreated epithelial MCF10a-Snail-ER cells, in which the exon is excluded (Figure 4I and S4D).
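The motif search retained only predictions supported by more than one tool (Figure 4G: at least two of RBPDB, RBPMAP, SFMAP and Spliceaid). A minimal Python sketch of such a consensus filter is shown here, assuming each tool's output has already been parsed into (RBP, position) pairs on a common coordinate system. Real tools report overlapping windows rather than identical positions, so exact matching is a simplification, and the example hits are invented.

```python
from collections import Counter


def consensus_sites(predictions, min_tools=2):
    """Keep (RBP, position) hits reported by at least min_tools prediction tools.
    predictions maps a tool name to the hits it reported."""
    counts = Counter()
    for hits in predictions.values():
        counts.update(set(hits))  # count each hit at most once per tool
    return {hit for hit, n in counts.items() if n >= min_tools}


# Invented hits on exon 2 coordinates, for illustration only.
predictions = {
    "RBPDB": {("PTB", 10), ("HuR", 40)},
    "RBPMAP": {("PTB", 10)},
    "SFMAP": {("SRSF1", 55)},
    "Spliceaid": {("PTB", 10), ("SRSF1", 55)},
}
print(sorted(consensus_sites(predictions)))  # [('PTB', 10), ('SRSF1', 55)]
```

Requiring agreement between tools trades sensitivity for specificity: a binding site predicted by a single tool (here the invented HuR hit) is discarded.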

We next assessed the impact of altering H3K27me3 levels with the dCas9-vSETx2 construct on PTB recruitment to CTNND1 exon 2. As expected, a local increase in H3K27me3 levels at CTNND1 exon 2, which increases exon inclusion, reduced PTB binding to the exon 2 pre-mRNA, while PTB binding to control regions, such as CTNND1 exon 6 and exon 20, was not affected, suggesting a direct impact of H3K27me3 on PTB binding to CTNND1 exon 2 pre-mRNA (Figure 4J and S4E).

We conclude that some alternatively spliced exons, probably sharing common regulators such as PTB, can be regulated directly by dynamic changes in H3K27ac and H3K27me3 marks. Moreover, an exon-specific change in just one of these histone marks is sufficient to drive the switch in splicing necessary to induce a change in the cell’s phenotype, adding a new mechanism to the cell’s toolkit for modulating its proteome in a dynamic and reversible way.

Discussion

Cell type-specific chromatin and alternative splicing patterns have been intimately linked to differentiation and lineage commitment (Gabut et al., 2011; Margueron and Reinberg, 2011; Soares and Zhou, 2018; Tapial et al., 2017). Increasing evidence has shown a functional cross-talk between these two regulatory layers, whose dysregulation can lead to disease (de Almeida et al., 2011; Braunschweig et al., 2013; Daguenet et al., 2015; Ellis et al., 2009; Sanidas et al., 2014; Singh and Cooper, 2012; Xu et al., 2018). However, it remained unclear to what extent chromatin modifications could directly cause splicing changes that switch cell fate. Using CRISPR/dCas9 epigenome editing tools (Hilton et al., 2015), we successfully altered local H3K27me3 or H3K27ac levels at alternatively spliced loci. This exon-specific chromatin editing directly affected recognition of these exons by the splicing machinery without affecting overall transcription or RNA polymerase II kinetics. As we targeted exons essential for the reprogramming of epithelial into mesenchymal cells (EMT), such as CTNND1 exon 2 and FGFR2 exon IIIc, we established that H3K27-mediated switches in alternative splicing of key EMT exons are sufficient to induce important features of cell reprogramming, demonstrating that chromatin can also regulate cellular identity by driving key changes in alternative splicing.

Changes in histone marks that switch splicing behaviour were very dynamic, starting as early as 6h after induction of EMT. They were also fully reversible, highlighting the plasticity and adaptability of the splicing machinery in responding to a new stimulus via changes in chromatin modifications. In fact, plants and flies already exploit these mechanisms in cellular responses to changes in light and temperature via changes in chromatin and alternative splicing (Martin Anduaga et al., 2019; Pajoro et al., 2017; Petrillo et al., 2014). We propose that both regulatory layers could also be interconnected in mammalian cells for a more efficient and rapid response to external stimuli. Furthermore, single-cell splicing analyses have revealed that transcripts of different splicing isoforms can coexist in the same cell and that the ratio of transcripts including or excluding a particular exon impacts the cell phenotype, as in the case of CTNND1, for which only 30% of transcripts include the mesenchymal-specific exon 2 in cells that have undergone EMT (Linker et al., 2019). However, these differentially spliced transcripts arise from the same locus and share the same genomic and nuclear environment, raising the question of how splicing is regulated at the level of individual transcripts. We propose that histone marks, which can be dynamically regulated by differential recruitment of writers and erasers to the gene locus, can induce rapid and transient changes in recruitment of the splicing machinery, thereby creating a transcript-specific regulation that can rapidly change between nascent pre-mRNAs.

Chromatin is known to impact splicing by modulating the recruitment of the splicing machinery to weaker RNA binding sites. Several direct physical interactions between splicing and chromatin regulators have been reported. For instance, H3K36me3 and H3K9me3 can regulate PTB- and SRSF3-dependent splicing via recruitment of the chromatin adaptor proteins MRG15 and HP1, respectively, which through physical interaction favour the binding of the splicing regulators to the pre-mRNA during co-transcriptional splicing (Luco et al., 2010; Yearim et al., 2015). Other splicing regulators have been shown to interact directly with chromatin in an RNA-independent way, such as hnRNPA2B1 and hnRNPL (Kfir et al., 2015), or with histone mark writers, such as the splicing repressor hnRNPK with the H3K9 methyltransferase SETDB1 and the splicing factor RBFOX2 with the Polycomb Repressive Complex 2 (Thompson et al., 2015; Wei et al., 2016). Finally, histone mark writers such as p300 have recently been shown to modulate alternative splicing by acetylating the splicing factors themselves, which can impact their RNA binding capacity and activity (Siam et al., 2019). However, direct modification of splicing factors is unlikely to explain our results, because modifiers of both H3K27 acetylation (dCas9-p300 and dCas9-Sid4x) and H3K27 methylation (dCas9-UTX1 and dCas9-vSETx2) affect splicing of the same genes, which would imply that acetylation and methylation both act on the same splicing factor(s). Of note, not all histone marks previously shown to be associated with splicing correlated with changes in splicing during EMT. For instance, H3K4me1 levels changed only after changes in splicing were already established, similar to changes in RNA polymerase II elongation rate. As H3K4me1 levels have been positively associated with RNA polymerase II kinetics (Jonkers et al., 2014), we suggest that, contrary to H3K27me3 and H3K27ac, H3K4me1 changes could be a consequence of altered splicing, setting up a regulatory feedback loop that reinforces or maintains novel splicing patterns by impacting RNA polymerase II elongation rates, which are also known to regulate splicing (Dujardin et al., 2014; de la Mata et al., 2003).

Finally, we expect this chromatin-mediated regulation of alternative splicing to be gene- and context-specific. Specific subsets of alternatively spliced exons, sharing common regulatory pathways or cellular functions, have already been shown to be co-enriched in specific histone marks, such as H3K79me2 at cancer-related genes important for malignancy in leukaemia cell lines, or H3K36me3, H3K27ac and H4K8ac at stem cell-related genes involved in DNA damage and cell cycle (Li et al., 2018; Xu et al., 2018). In the case of H3K27ac, preliminary analysis of recently obtained ChIP-seq data during EMT suggests that 16% of all exons changing splicing during EMT are differentially marked by H3K27ac (data not shown). Analysis of these H3K27ac-marked alternatively spliced genes identified classical hallmarks of EMT, such as lamellipodium formation, extracellular matrix organization or focal adhesion, which suggests a common and coordinated regulatory pathway. However, we could not find a clear correlation between H3K27ac and exon inclusion levels, suggesting that, even if all these exons carry the same histone mark, the mark might not recruit the same splicing regulators to each of them. In fact, H3K36me3 has been previously shown to modulate recruitment of the splicing repressor PTB and of the splicing enhancer SRSF1 at different subsets of alternatively spliced exons (Luco et al., 2010; Pradeepa et al., 2012). Histone acetyltransferases (HATs) and deacetylases (HDACs) have also been shown to impact splicing differently through a variety of mechanisms, from modulating RNA polymerase II elongation rates to directly interacting with splicing regulators such as SF3A1 and SMN1 (de la Mata et al., 2003; Rahhal and Seto, 2019; Siam et al., 2019). Finally, of relevance for this work, the aforementioned splicing factor RBFOX2 has recently been shown to recruit the Polycomb Repressive Complex 2, which catalyses H3K27 trimethylation, to bivalent gene promoters through protein-protein interactions (Wei et al., 2016). Since RBFOX2 and PTB are major splicing regulators of EMT (Shapiro et al., 2011), H3K27me3 enrichment at RBFOX2-dependent exons and H3K27ac enrichment at PTB-dependent sites could represent complementary mechanisms for regulating key splicing events during EMT.

We thus propose that exons sensitive to H3K27 marks might be co-regulated during EMT, enabling the rapid changes in splicing necessary for the dynamic functional changes observed during cellular reprogramming. This could inform the development of more specific therapeutic strategies to reduce cell invasion and tumour metastasis, which depend on EMT. Therapies targeting general chromatin and splicing factors are currently in use, but are often associated with pleiotropic and indirect effects (Daguenet et al., 2015; Ellis et al., 2009; Singh and Cooper, 2012). We propose to use epigenome editing tools to selectively change the splicing-associated chromatin marks responsible for pro-tumorigenic splicing isoforms, such as mCTNND1(ex2). Unlike DNA editing, epigenome editing does not create large deletions, translocations or inversions in the genome, and its off-target effects are likely much less hazardous than misregulating thousands of loci by inhibiting the corresponding chromatin modifier. Moreover, more than one exon can be targeted at the same time using sequence-specific gRNAs, increasing the impact on the cell. Future studies addressing the global impact of H3K27-mediated alternative splicing in EMT and cancer will be needed to confirm the importance of this new regulatory pathway, with the long-term goal of exploiting it to reduce cell invasion and metastasis. Beyond the H3K27-centred regulation of EMT-related alternative splicing identified here, other histone marks might also coordinate alternative splicing events important for other physiological processes. Further single-cell and single-locus approaches will provide the necessary insights into this highly dynamic layer of regulation.


Acknowledgments

We are thankful to Dr. Haber for the EMT cellular model, Dr. Bertrand for the modified gRNA plasmid, Dr. Salton for plasmid reagents, Dr. Duckett for the EZH2 plasmid, Dr. Pradeepa for the Sid4x plasmid and Dr. Ge for the UTX1 plasmid. We are also thankful to Paola Scaffidi, Jerome Dejardin, and Bernard de Massy for critical reading and discussion of the manuscript. This work was supported by the ANR program Labex EpiGenMed (to A.S. and Y.N.A.), La Ligue contre le Cancer (to A.S.), the ANR Young Investigator grant (ANR-16-CE12-0012-01 to R.L.), the ATIP-AVENIR program (to R.L.), the Wellcome Trust ([104175/Z/14/Z], Sir Henry Dale Fellowship to P.V.) and through funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (ERC-STG grant agreement No. 639253 to P.V.). The Institut de Génétique Humaine is supported by the Centre National de la Recherche Scientifique and the University of Montpellier. The Wellcome Centre for Cell Biology is supported by core funding from the Wellcome Trust [203149]. We are grateful to Montpellier’s MRI image facility and the Edinburgh Protein Production Facility (EPPF) for their support. The EPPF was supported by the Wellcome Trust through a Multi-User Equipment grant [101527/Z/13/Z].

Author Contributions

Conceptualization: A.S. and R.L.; Methodology & Investigation: A.S., Y.N.A. and K.W.; Resources: P.V.; Writing & editing: A.S., Y.N.A., P.V. and R.L.; Funding Acquisition: P.V. and R.L.

Declaration of Interests

The authors declare no competing interests.


Figure Legends

Figure 1: Specific histone modifications correlate in time with dynamic changes in splicing during EMT. (A) Schematic representation of the epithelial-to-mesenchymal transition (EMT) and the reverse MET. The function of the alternatively spliced genes most relevant to the EMT is shown. Normal human epithelial MCF10a-Snail-ER cells are fully reprogrammed into mesenchymal-like cells in 7 days (T7). First changes in EMT markers are observed 6h (T0.25) after treatment with tamoxifen (TXF). Until the EMT is complete, there are several intermediate states in which heterogeneous populations of cells coexist. (B,G) Representation of the CTNND1 and FGFR2 gene loci with the position of the primers used for ChIP-qPCR experiments. The alternatively spliced exon regulated during EMT is highlighted in red. The direction of transcription is indicated by a black arrow. (C,H) Inclusion levels of CTNND1 exon 2 and FGFR2 exon IIIc relative to total expression levels of CTNND1 and FGFR2, respectively, in MCF10a-Snail-ER cells at different time points during induction of the EMT (0 to 7 days in the presence of tamoxifen, orange) and the reverse MET (21 days after removal of tamoxifen at T7, red). RT-qPCR results are shown as the mean +/- SEM of n=4 biological replicates. (D-F and I-K) Enrichment levels of H3K27me3 (D,I), H3K27ac (E,J) and H3K4me1 (F,K) along the CTNND1 or FGFR2 locus, focusing on the alternatively spliced exons of interest (CTNND1.e2 and FGFR2.IIIc) and flanking intronic and exonic control regions, in MCF10a-Snail-ER cells treated with tamoxifen for 6h (T0.25), 24h (T1) or 7 days (T7), and in MET-reversed cells from which tamoxifen was removed for 21 days (MET). Chromatin immunoprecipitation results are shown as the mean +/- SEM of n=4 biological replicates. The percentage of input was normalized by three control regions across the different conditions. *P <0.05, **P <0.01, ***P <0.001 in two-tailed unpaired Student’s t-test relative to untreated cells (grey).
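The ChIP-qPCR quantification used in these legends combines two steps: converting Ct values into percent of input, then dividing by control regions measured in the same condition. The Python sketch below assumes 100% PCR efficiency (one cycle equals a two-fold difference), a 1% input aliquot, and normalization to the arithmetic mean of the control regions. The legends do not specify these details, so they are illustrative choices, and the Ct values are hypothetical.

```python
import math


def percent_input(ct_ip, ct_input, input_fraction=0.01):
    """Percent of input recovered by the IP, assuming 100% PCR efficiency and an
    input aliquot that represents input_fraction of the chromatin used per IP."""
    ct_input_adjusted = ct_input - math.log2(1.0 / input_fraction)
    return 100.0 * 2.0 ** (ct_input_adjusted - ct_ip)


def normalize_to_controls(target_pct, control_pcts):
    """Divide a region's %input by the mean %input of the control regions
    measured in the same condition."""
    return target_pct / (sum(control_pcts) / len(control_pcts))


# Hypothetical Ct values for one condition.
exon = percent_input(ct_ip=24.0, ct_input=25.0)
controls = [percent_input(26.0, 25.0), percent_input(25.0, 25.0), percent_input(24.0, 25.0)]
print(round(exon, 3), round(normalize_to_controls(exon, controls), 3))  # 2.0 1.714
```

Dividing by control regions in each condition compensates for differences in overall immunoprecipitation efficiency between samples, so that the remaining differences reflect local changes at the region of interest.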

Figure 2: Localised changes in H3K27me3 and H3K27ac drive alternative splicing. (A) Schematic representation of the CTNND1 gene locus and the alternatively spliced exon 2 (yellow) and exon 20 (grey) with the position of the gRNAs used to target the different dCas9-fused proteins to a specific exon. Nucleosome positioning at exon 2, according to MNase-qPCR assay (data not shown), is represented. (B,C) Enrichment levels of H3K27ac (B) and H3K27me3 (C) at CTNND1 exon 2 (yellow) and control exon 20 (grey) in MCF10a-Snail-ER cells infected with dCas9 fused to the catalytic domain of an H3K27 epigenetic modifier (summary table on the right) in the presence of exon-specific gRNAs targeting exon 2, by chromatin immunoprecipitation (mean +/- SEM, n=4). The percentage of input was normalized by three control regions across the different conditions. Mutated p300* and vSETx2* were used as negative controls together with empty dCas9. (D-F) Inclusion levels of CTNND1 exon 2 (D) and exon 20 (E), and total expression levels of CTNND1 (F) relative to total CTNND1 or TBP in MCF10a-Snail-ER cells infected with dCas9 H3K27 epigenome editors and exon 2-specific gRNAs, determined by RT-qPCR (mean +/- SEM, n=4). (G) Schematic representation of the FGFR2 gene locus, as in (A), with gRNA positions and nucleosome positioning (data not shown) at the targeted exon IIIc (green). Exon IIIb is used as a control (grey). (H,I) H3K27ac and H3K27me3 enrichment levels at the gRNA-targeted exon IIIc and control exon IIIb in MCF10a-Snail-ER cells infected with the dCas9 H3K27 modifiers, by ChIP-qPCR as described in (B,C) (mean +/- SEM, n=4). (J-L) Inclusion levels of FGFR2 exon IIIc and control exon IIIb, and total expression levels of FGFR2, relative to total FGFR2 or TBP expression levels by RT-qPCR as described in (D-F) (mean +/- SEM, n=4). *P <0.05, **P <0.01, ***P <0.001 in two-tailed unpaired Student’s t-test relative to empty dCas9 plasmid (empty).
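Exon inclusion and expression levels in these panels are reported relative to a reference amplicon (total CTNND1, total FGFR2 or TBP). Assuming equal, near-perfect amplification efficiency for both amplicons, this corresponds to the standard ΔCt calculation, and comparing a condition with its control gives the ΔΔCt fold change. The Python sketch below illustrates this with hypothetical Ct values; it is not the analysis script used for these figures.

```python
def relative_level(ct_target, ct_reference, efficiency=2.0):
    """Level of a target amplicon (e.g. an exon 2 inclusion junction) relative to a
    reference amplicon from the same cDNA (e.g. total CTNND1), assuming both amplify
    with the same efficiency (2.0 = perfect doubling per cycle)."""
    return efficiency ** (ct_reference - ct_target)


def fold_change_vs_control(ct_t_sample, ct_r_sample, ct_t_ctrl, ct_r_ctrl):
    """Delta-delta-Ct: relative level in a sample divided by that in a control condition."""
    return relative_level(ct_t_sample, ct_r_sample) / relative_level(ct_t_ctrl, ct_r_ctrl)


# Hypothetical Ct values: inclusion amplicon vs total, edited vs empty-dCas9 cells.
print(relative_level(26.0, 24.0))                      # 0.25
print(fold_change_vs_control(25.0, 24.0, 26.0, 24.0))  # 2.0
```

If the two amplicons differ in efficiency, a per-amplicon efficiency (for example from a standard curve) should replace the shared value of 2.0.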

Figure 3: Chromatin-induced changes in splicing recapitulate the EMT. (A-D) Expression levels of epithelial (E-Cadherin, EPCAM) and mesenchymal (ECM1, MCAM) markers at the mRNA (A,B: RT-qPCR, mean +/- SEM, n=4) and protein (C,D: flow cytometry, mean +/- SEM, n=4) levels in MCF10a-Snail-ER cells infected with the dCas9-fused proteins changing splicing and the corresponding exon-specific gRNAs targeting FGFR2 exon IIIc (g1) or CTNND1 exon 2 (g1). mRNA levels are normalized by TBP, and protein levels are quantified above the background signal obtained without primary antibody (summary scheme on the left). (E,F) Functional EMT assays testing non-directional (E) and bi-directional (F) migration in MCF10a-Snail-ER cells infected with exon 2- or exon IIIc-specific gRNAs and dCas9-fused proteins, together with their corresponding catalytic mutants. Scratch assays (E) were carried out on confluent cell monolayers to evaluate the percentage of the gap remaining 24h after wounding (mean +/- SEM, n=3). Transwell assays (F) evaluate the number of cells migrating towards FGF-2 in 12h (mean +/- SEM, n=3). (G) MCF10a-Snail-ER cells infected with gRNAs targeting CTNND1 exon 2 and either dCas9-vSETx2 or mutant dCas9-vSETx2* were cell-sorted using a splicing-specific antibody detecting only the CTNND1 mesenchymal protein variant (mCTNND1(ex2)) for directional migration and invasion transwell assays. Negative cells not expressing mCTNND1(ex2) and tamoxifen-induced T7 EMT cells were used as controls. The percentage of mCTNND1(ex2)-positive cells is shown on the right. (H) The number of sorted cells migrating or invading through a Matrigel matrix for 24h was normalized to untreated cells for comparison with tamoxifen-induced T7 EMT cells (mean +/- SEM, n=3). (I-L) Expression levels of epithelial (E-Cadherin, EPCAM) and mesenchymal (MCAM, ECM1) markers in cell-sorted MCF10a-Snail-ER cells. RT-qPCR levels were normalized by TBP expression levels (mean +/- SEM, n=3). *P <0.05, **P <0.01, ***P <0.001 in two-tailed unpaired Student’s t-test relative to the corresponding control (empty, dCas9-vSETx2* or negative cells, all in grey).
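The scratch-assay readout in panel (E) is a ratio of wound areas, summarised across replicates as mean +/- SEM. The Python sketch below shows that arithmetic with invented gap areas; how the gap area is measured from images (software, thresholding) is not described here and is deliberately left out.

```python
import math
import statistics


def percent_gap_remaining(gap_area_0h, gap_area_24h):
    """Percentage of the initial wound area still open after 24 h; lower values
    mean faster wound closure, i.e. higher migratory capacity."""
    if gap_area_0h <= 0:
        raise ValueError("initial gap area must be positive")
    return 100.0 * gap_area_24h / gap_area_0h


def mean_sem(values):
    """Mean and standard error of the mean (sample SD / sqrt(n)) across replicates."""
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(len(values))


# Invented gap areas (arbitrary units) for n=3 biological replicates.
replicates = [percent_gap_remaining(a0, a24) for a0, a24 in [(200, 40), (180, 45), (210, 63)]]
print(replicates)  # [20.0, 25.0, 30.0]
print(mean_sem(replicates))
```

Expressing each wound relative to its own 0h area removes differences in initial scratch width between wells before replicates are averaged.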

Figure 4: H3K27 marks regulate splicing by modulating the recruitment of specific splicing factors to the pre-mRNA. (A,B) Time course of the appearance of CTNND1 exon 2 pre-mRNA in synchronized untreated (grey) and tamoxifen-induced (orange) MCF10a-Snail-ER cells upon release of the transcriptional inhibitor DRB after 0.5 days (A, T0.5) or 7 days (B, T7) of EMT induction. RT-qPCR results are normalized by tRNA expression levels (mean +/- SEM, n=3). (C) Total RNA polymerase II levels at CTNND1 exon 2 in untreated (grey) and tamoxifen-induced EMT T7 (orange) MCF10a-Snail-ER cells by ChIP-qPCR (mean +/- SEM, n=3). The percentage of input was normalized by three control regions across the different conditions and represented relative to untreated cells (grey). (D-F) Same as (A-C) for CTNND1 control exon 15 (mean +/- SEM, n=3). (G) RNA-binding motifs along CTNND1 exon 2 predicted by at least two of the four tools used (RBPDB, RBPMAP, SFMAP and Spliceaid; details in Methods). (H) Inclusion levels of CTNND1 exon 2 upon lentiviral shRNA knockdown of candidate splicing regulators in untreated (grey) and tamoxifen-induced (T7, orange) MCF10a-Snail-ER cells. RT-qPCR results are normalized by total CTNND1 expression levels (mean ± SEM, n=3). (I) Enrichment levels of PTB at CTNND1 exon 2 pre-mRNA in untreated (UNT) and tamoxifen-induced (T7) MCF10a-Snail-ER cells. Constitutively included CTNND1 exon 6 and excluded CTNND1 exon 20 were used as negative and positive controls of PTB binding, respectively. PTB levels were quantified in the indicated regions by qPCR and normalized by IgG and CTNND1 exon 7 control levels (mean ± SEM, n=5). (J) Enrichment levels of PTB at CTNND1 exon 2 and control exon 6 and exon 20 pre-mRNA in cell-sorted cells expressing (positive) or not (negative) the mesenchymal-specific splicing isoform mCTNND1(ex2) in MCF10a-Snail-ER cells infected with dCas9-vSETx2, or mutant dCas9-vSETx2*, and the exon-specific gRNAs (g1) targeting CTNND1 exon 2. PTB levels were quantified in the indicated regions by qPCR and normalized by IgG and CTNND1 exon 7 control levels (mean ± SEM, n=6). *P <0.05, **P <0.01, ***P <0.001 in two-tailed unpaired Student’s t-test relative to control cells (untreated cells, scramble shRNA or negative cells).
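The PTB RNA immunoprecipitation values are normalized twice: to IgG, to remove non-specific background, and to CTNND1 exon 7, used as a reference region. One plausible reading of that normalization, assuming qPCR signals have been converted to linear quantities, is the ratio of fold-over-IgG values sketched below in Python. The function names and Ct values are hypothetical, and the exact calculation used in this work may differ.

```python
def to_linear(ct):
    """Convert a Ct value to a relative linear quantity, assuming perfect doubling."""
    return 2.0 ** -ct


def rip_enrichment(region_ip, region_igg, ref_ip, ref_igg):
    """PTB enrichment at a region: fold over IgG at that region, divided by the
    fold over IgG at the reference region (CTNND1 exon 7 in this work).
    Inputs are linear qPCR quantities."""
    return (region_ip / region_igg) / (ref_ip / ref_igg)


# Hypothetical Ct values for exon 2 and exon 7 in PTB and IgG immunoprecipitates.
print(rip_enrichment(to_linear(27.0), to_linear(31.0), to_linear(28.0), to_linear(30.0)))  # 4.0
```

In this example the exon 2 signal is 16-fold over IgG and the exon 7 reference is 4-fold over IgG, giving a 4-fold relative enrichment.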

Supplemental Information

Supplementary Figure 1: Localised enrichment of specific histone marks at alternatively spliced exons during EMT. (A,B) CTNND1 expression and exon 20 inclusion levels relative to total TBP and CTNND1, respectively, during EMT and MET in induced MCF10a-Snail-ER cells by RT-qPCR (mean +/- SEM, n=4). (C,D) Enrichment levels of H3 and H3K36me3 along the CTNND1 locus in MCF10a-Snail-ER cells induced with tamoxifen for 24h, by ChIP-qPCR (mean +/- SEM, n=4). For H3K36me3 only, the percentage of input was normalized by three control regions across the different conditions. (E,F) FGFR2 expression and exon IIIb inclusion levels relative to total TBP and FGFR2, respectively, during EMT and MET in induced MCF10a-Snail-ER cells by RT-qPCR (mean +/- SEM, n=4). (G,H) Enrichment levels of total H3 and H3K36me3 along the FGFR2 locus in MCF10a-Snail-ER cells induced with tamoxifen for 24h, by ChIP-qPCR (mean +/- SEM, n=4). For H3K36me3 only, the percentage of input was normalized by three control regions across the different conditions. (I,K,M,O) Enrichment levels of H3K27me3, H3K27ac and H3K4me1 along the SLK (I), TCF7L2 (K), SCRIB (M) and ENAH (O) loci in tamoxifen-induced MCF10a-Snail-ER cells by ChIP-qPCR (mean +/- SEM, n=4). The percentage of input was normalized by three control regions across the different conditions. (J,L,N,P) Inclusion levels of alternatively spliced exons essential for the EMT: SLK exon 13 (J), TCF7L2 exon 4 (L), SCRIB exon 16 (N) and ENAH exon 11 (P) in MCF10a-Snail-ER cells during induction of EMT. RT-qPCR values were normalized by total expression levels of SLK, TCF7L2, SCRIB and ENAH, respectively (mean +/- SEM, n=4). *P <0.05, **P <0.01, ***P <0.001 in two-tailed unpaired Student’s t-test relative to untreated cells (grey).


Supplementary Figure 2: Exon-specific epigenome editing of H3K27 marks is sufficient to induce a change in splicing. (A) Schematic of the vSETx2 construct with the optimal linker sequence between the two monomers. (B) Representative histone methyltransferase assay showing the specificity of native vSET and vSETx2 activity for the H3K27 residue on wild-type and H3K27A-mutated recombinant chromatin templates. (C) Schematic representation of the CTNND1 gene locus and the alternatively spliced exon 2 (yellow) and control exon 20 (grey) with the position of the gRNAs used to target the different dCas9-fused proteins to specific exons. (D,E,H,I) Enrichment levels of H3K27ac (D,H) and H3K27me3 (E,I) at CTNND1 exon 2 (yellow) and control exon 20 (grey) in MCF10a-Snail-ER cells infected with dCas9-fused proteins and two different combinations of exon-specific gRNAs targeting exon 2 (g2) or exon 20 (g1) (mean +/- SEM, n=3). The percentage of input was normalized by three control regions across the different conditions. (F,G,J,K) Expression levels of CTNND1 exon 2 (F,J), total CTNND1 (G) and exon 20 (K) in untreated MCF10a-Snail-ER cells infected with dCas9-fused proteins and the exon-specific gRNAs targeting exon 2 (g2) or exon 20 (g1). RT-qPCR values were normalized by total CTNND1 or TBP as indicated in the graph (mean +/- SEM, n=4). (L-T) Same as (C-K) for the FGFR2 gene locus, with (L) a schematic representation of the FGFR2 locus showing gRNA positions at the alternatively spliced exon IIIc (green) and control exon IIIb (grey), and (M-T) H3K27ac, H3K27me3 and expression levels as represented in (D-K) (mean +/- SEM, n=4). (U,V) Enrichment levels of H3K4me1, H3K9ac and H3K9me2 at FGFR2 exon IIIc (green, U) or CTNND1 exon 2 (yellow, V) in untreated MCF10a-Snail-ER cells infected with dCas9-fused proteins and exon-specific gRNAs targeting exon IIIc (g1, U) or exon 2 (g1, V), by ChIP-qPCR (mean +/- SEM, n=3). The percentage of input was normalized by three control regions across the different conditions. *P <0.05, **P <0.01, ***P <0.001 in two-tailed unpaired Student’s t-test compared to the empty-dCas9 control.
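All panels report significance with a two-tailed unpaired Student’s t-test. For reference, the pooled-variance t statistic can be computed with the Python standard library as sketched below; the two-sided P value then comes from the t distribution with n1 + n2 - 2 degrees of freedom, which `scipy.stats.ttest_ind(a, b)` (default `equal_var=True`) returns together with the same statistic. The replicate values in the example are invented.

```python
import math
import statistics


def student_t(a, b):
    """Unpaired Student's t statistic with pooled variance (equal-variance
    assumption) and its degrees of freedom, n1 + n2 - 2."""
    n1, n2 = len(a), len(b)
    pooled = ((n1 - 1) * statistics.variance(a) + (n2 - 1) * statistics.variance(b)) / (n1 + n2 - 2)
    se = math.sqrt(pooled * (1.0 / n1 + 1.0 / n2))
    return (statistics.mean(a) - statistics.mean(b)) / se, n1 + n2 - 2


# Invented values for n=4 biological replicates per condition.
edited = [2.1, 2.4, 1.9, 2.2]
empty = [1.0, 1.2, 0.9, 1.1]
t, df = student_t(edited, empty)
print(round(t, 2), df)
```

The pooled-variance form assumes similar variances in both groups; with very unequal variances, Welch's test (`equal_var=False` in SciPy) is the usual alternative.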

Supplementary Figure 3: Direct effect of dCas9 epigenome editing on EMT. (A) Expression levels of the splicing factors most important for EMT (PTB, ESRP1, ESRP2, RbFOX2, MBNL1 and RBM47) in MCF10a-Snail-ER cells infected with different dCas9-fused proteins and exon-specific gRNAs targeting FGFR2 exon IIIc (g1, dark grey) or CTNND1 exon 2 (g1, light grey). RT-qPCR levels were normalized by TBP (mean +/- SEM, n=3). (B) MCF10a-Snail-ER cells infected with dCas9-vSETx2 or mutant dCas9-vSETx2* and two different combinations of exon-specific gRNAs (g1 and g2) targeting CTNND1 exon 2 were cell-sorted according to expression of the mesenchymal CTNND1 protein variant, which includes exon 2 (mCTNND1(ex2)), using a splicing-specific antibody. Negative cells not expressing mCTNND1(ex2) and tamoxifen-induced T7 EMT cells were used as controls. The percentage of mCTNND1(ex2)-positive cells per condition is shown. (C) CTNND1 exon 2 inclusion levels in cells expressing (positive) or not (negative) the splicing variant mCTNND1(ex2) in the conditions described in (B). RT-qPCR levels were normalized by total CTNND1 expression levels (mean +/- SEM, n=3). (D-G) Expression levels of epithelial (E-Cadherin, EPCAM) and mesenchymal (MCAM, ECM1) markers in cell-sorted MCF10a-Snail-ER cells infected with dCas9-vSETx2 or the mutant dCas9-vSETx2* and the second combination (g2) of gRNAs targeting CTNND1 exon 2. RT-qPCR levels were normalized by TBP expression levels (mean +/- SEM, n=3). *P <0.05, **P <0.01, ***P <0.001 in two-tailed unpaired Student’s t-test relative to negative cells.

Supplementary Figure 4. H3K27 marks regulate splicing by modulating the recruitment of RNA-binding proteins, such as PTB. (A,B) CTNND1 expression and exon 2 inclusion levels, normalized by total TBP or CTNND1 expression levels, respectively, in untreated (grey) and tamoxifen-induced (orange) MCF10a-Snail-ER cells treated with DMSO (control), 1 µg/mL TSA (HDAC inhibitor) or 40 µM DRB (RNA Polymerase II inhibitor) for 24 h. RT-qPCR results are shown as the mean ± SEM of n=4 biological replicates. (C) Total expression levels of the candidate splicing factors involved in CTNND1 exon 2 regulation upon shRNA knockdown in untreated (UNT, grey) and tamoxifen-induced (T7, orange) MCF10a-Snail-ER cells. RT-qPCR levels are shown relative to cells infected with scramble shRNA (mean ± SEM, n=3). (D) PTB enrichment levels at the negative control CD44 v10 and the positive control PKM2 intron 8 in untreated (UNT) and tamoxifen-induced (T7) MCF10a-Snail-ER cells. PTB levels were quantified in the indicated regions by qPCR and normalized by IgG and CTNND1 exon 7 control levels as in Figure 4 (mean ± SEM, n=5). (E) PTB enrichment levels at the same positive and negative control regions as in (D) in cell-sorted MCF10a-Snail-ER cells expressing (positive) or not expressing (negative) the mesenchymal-specific splicing isoform mCTNND1(ex2) upon infection with dCas9-vSETx2, or mutant dCas9-vSETx2*, and the exon-specific gRNAs (g1) targeting CTNND1 exon 2. PTB levels were quantified in the indicated regions by qPCR and normalized by IgG and CTNND1 exon 7 control levels (mean ± SEM, n=6). *P < 0.05, **P < 0.01, ***P < 0.001 in two-tailed unpaired Student's t-test with respect to controls (untreated cells, scramble shRNA or negative control CD44 v10).
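The legends above report significance from two-tailed unpaired Student's t-tests on biological replicates. As an illustration only, the dependency-free Python sketch below computes that test (pooled variance, df = n1 + n2 − 2) and maps the p-value to the star code used in the legends; the two-tailed p-value is obtained from the regularized incomplete beta function evaluated by the standard continued fraction. Function names and example values are hypothetical, not the script used to produce the figures.

```python
import math
from statistics import mean, variance


def _beta_cf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the regularized incomplete beta (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd terms of the continued fraction for step m.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def _reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def two_tailed_p(t, df):
    """Two-tailed p-value of Student's t: I_{df/(df+t^2)}(df/2, 1/2)."""
    return _reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))


def ttest_unpaired(a, b):
    """Student's t-test for two independent samples with equal variances."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    pooled = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1.0 / n1 + 1.0 / n2))
    return t, two_tailed_p(t, df)


def significance_stars(p):
    """Star code used in the figure legends."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return "ns"


# Hypothetical triplicates, e.g. exon inclusion in negative vs positive sorted cells.
t, p = ttest_unpaired([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(f"t = {t:.3f}, p = {p:.4f} {significance_stars(p)}")
```

With n = 3 per group there are only 4 degrees of freedom, so fairly large mean differences are needed to reach the *** threshold.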

References

Allo, M., Buggiano, V., Fededa, J.P., Petrillo, E., Schor, I., de la Mata, M., Agirre, E., Plass, M., Eyras, E., Elela, S.A., et al. (2009). Control of alternative splicing through siRNA-mediated transcriptional gene silencing. Nat Struct Mol Biol 16, 717–724.

de Almeida, S.F., Grosso, A.R., Koch, F., Fenouil, R., Carvalho, S., Andrade, J., Levezinho, H., Gut, M., Eick, D., Gut, I., et al. (2011). Splicing enhances recruitment of methyltransferase HYPB/Setd2 and methylation of histone H3 Lys36. Nat Struct Mol Biol 18, 977–983.

Ameyar-Zazoua, M., Rachez, C., Souidi, M., Robin, P., Fritsch, L., Young, R., Morozova, N., Fenouil, R., Descostes, N., Andrau, J.C., et al. (2012). Argonaute proteins couple chromatin silencing to alternative splicing. Nat Struct Mol Biol 19, 998–1004.

Braunschweig, U., Gueroussov, S., Plocik, A.M., Graveley, B.R., and Blencowe, B.J. (2013). Dynamic integration of splicing within gene regulatory pathways. Cell 152, 1252–1269.

Brown, R.L., Reinke, L.M., Damerow, M.S., Perez, D., Chodosh, L.A., Yang, J., and Cheng, C. (2011). CD44 splice isoform switching in human and mouse epithelium is essential for epithelial-mesenchymal transition and breast cancer progression. J Clin Invest 121, 1064–1074.

Carstens, R.P., Eaton, J.V., Krigman, H.R., Walther, P.J., and Garcia-Blanco, M.A. (1997). Alternative splicing of fibroblast growth factor receptor 2 (FGF-R2) in human prostate cancer. Oncogene 15, 3059–3065.

Curado, J., Iannone, C., Tilgner, H., Valcarcel, J., and Guigo, R. (2015). Promoter-like epigenetic signatures in exons displaying cell type-specific splicing. Genome Biol 16, 236.

Daguenet, E., Dujardin, G., and Valcarcel, J. (2015). The pathogenicity of splicing defects: mechanistic insights into pre-mRNA processing inform novel therapeutic approaches. EMBO Rep 16, 1640–1655.

Dujardin, G., Lafaille, C., de la Mata, M., Marasco, L.E., Munoz, M.J., Le Jossic-Corcos, C., Corcos, L., and Kornblihtt, A.R. (2014). How slow RNA polymerase II elongation favors alternative exon skipping. Mol Cell 54, 683–690.

Ellis, L., Atadja, P.W., and Johnstone, R.W. (2009). Epigenetics in cancer: targeting chromatin modifications. Mol Cancer Ther 8, 1409–1420.

Gabut, M., Samavarchi-Tehrani, P., Wang, X., Slobodeniuc, V., O’Hanlon, D., Sung, H.K., Alvarez, M., Talukder, S., Pan, Q., Mazzoni, E.O., et al. (2011). An alternative splicing switch regulates embryonic stem cell pluripotency and reprogramming. Cell 147, 132–146.

Girardot, M., Bayet, E., Maurin, J., Fort, P., Roux, P., and Raynaud, P. (2018). SOX9 has distinct regulatory roles in alternative splicing and transcription. Nucleic Acids Res 46, 9106–9118.


Gonzalez, I., Munita, R., Agirre, E., Dittmer, T.A., Gysling, K., Misteli, T., and Luco, R.F. (2015). A lncRNA regulates alternative splicing via establishment of a splicing-specific chromatin signature. Nat Struct Mol Biol 22, 370–376.

Gunderson, F.Q., and Johnson, T.L. (2009). Acetylation by the transcriptional coactivator Gcn5 plays a novel role in co-transcriptional spliceosome assembly. PLoS Genet 5, e1000682.

Guo, R., Zheng, L., Park, J.W., Lv, R., Chen, H., Jiao, F., Xu, W., Mu, S., Wen, H., Qiu, J., et al. (2014). BS69/ZMYND11 reads and connects histone H3.3 lysine 36 trimethylation-decorated chromatin to regulated pre-mRNA processing. Mol Cell 56, 298–310.

Han, H., Braunschweig, U., Gonatopoulos-Pournatzis, T., Weatheritt, R.J., Hirsch, C.L., Ha, K.C., Radovani, E., Nabeel-Shah, S., Sterne-Weiler, T., Wang, J., et al. (2017). Multilayered control of alternative splicing regulatory networks by transcription factors. Mol Cell 65, 539–553.e7.

Herzel, L., Ottoz, D.S.M., Alpert, T., and Neugebauer, K.M. (2017). Splicing and transcription touch base: co-transcriptional spliceosome assembly and function. Nat Rev Mol Cell Biol 18, 637–650.

Hilton, I.B., D’Ippolito, A.M., Vockley, C.M., Thakore, P.I., Crawford, G.E., Reddy, T.E., and Gersbach, C.A. (2015). Epigenome editing by a CRISPR-Cas9-based acetyltransferase activates genes from promoters and enhancers. Nat Biotechnol 33, 510–517.

Hong, S., Cho, Y.-W., Yu, L.-R., Yu, H., Veenstra, T.D., and Ge, K. (2007). Identification of JmjC domain-containing UTX and JMJD3 as histone H3 lysine 27 demethylases. Proc Natl Acad Sci U S A 104, 18439–18444.

Hwang, C., Giri, V.N., Wilkinson, J.C., Wright, C.W., Wilkinson, A.S., Cooney, K.A., and Duckett, C.S. (2008). EZH2 regulates the transcription of estrogen-responsive genes through association with REA, an estrogen receptor corepressor. Breast Cancer Res Treat 107, 235–242.

Javaid, S., Zhang, J., Anderssen, E., Black, J.C., Wittner, B.S., Tajima, K., Ting, D.T., Smolen, G.A., Zubrowski, M., Desai, R., et al. (2013). Dynamic chromatin modification sustains epithelial-mesenchymal transition following inducible expression of Snail-1. Cell Rep 5, 1679–1689.

Jonkers, I., Kwak, H., and Lis, J.T. (2014). Genome-wide dynamics of Pol II elongation and its interplay with promoter proximal pausing, chromatin, and exons. eLife 3, e02407.

Kfir, N., Lev-Maor, G., Glaich, O., Alajem, A., Datta, A., Sze, S.K., Meshorer, E., and Ast, G. (2015). SF3B1 association with chromatin determines splicing outcomes. Cell Rep 11, 618–629.

Li, T., Liu, Q., Garza, N., Kornblau, S., and Jin, V.X. (2018). Integrative analysis reveals functional and regulatory roles of H3K79me2 in mediating alternative splicing. Genome Med 10, 30.

Linker, S.M., Urban, L., Clark, S.J., Chhatriwala, M., Amatya, S., McCarthy, D.J., Ebersberger, I., Vallier, L., Reik, W., Stegle, O., et al. (2019). Combined single-cell profiling of expression and DNA methylation reveals splicing regulation and heterogeneity. Genome Biol 20, 30.

Luco, R.F., Pan, Q., Tominaga, K., Blencowe, B.J., Pereira-Smith, O.M., and Misteli, T. (2010). Regulation of alternative splicing by histone modifications. Science 327, 996–1000.


Luco, R.F., Allo, M., Schor, I.E., Kornblihtt, A.R., and Misteli, T. (2011). Epigenetics in alternative pre-mRNA splicing. Cell 144, 16–26.

Maeder, M.L., Linder, S.J., Cascio, V.M., Fu, Y., Ho, Q.H., and Joung, J.K. (2013). CRISPR RNA-guided activation of endogenous human genes. Nat Methods 10, 977–979.

Manzur, K.L., Farooq, A., Zeng, L., Plotnikova, O., Koch, A.W., Sachchidanand, and Zhou, M.-M. (2003). A dimeric viral SET domain methyltransferase specific to Lys27 of histone H3. Nat Struct Biol 10, 187–196.

Margueron, R., and Reinberg, D. (2011). The Polycomb complex PRC2 and its mark in life. Nature 469, 343–349.

Martin Anduaga, A., Evantal, N., Patop, I.L., Bartok, O., Weiss, R., and Kadener, S. (2019). Thermosensitive alternative splicing senses and mediates temperature adaptation in Drosophila. eLife 8, e44642.

de la Mata, M., Alonso, C.R., Kadener, S., Fededa, J.P., Blaustein, M., Pelisch, F., Cramer, P., Bentley, D., and Kornblihtt, A.R. (2003). A slow RNA polymerase II affects alternative splicing in vivo. Mol Cell 12, 525–532.

Mercer, T.R., Edwards, S.L., Clark, M.B., Neph, S.J., Wang, H., Stergachis, A.B., John, S., Sandstrom, R., Li, G., Sandhu, K.S., et al. (2013). DNase I-hypersensitive exons colocalize with promoters and distal regulatory elements. Nat Genet 45, 852–859.

Mujtaba, S., Manzur, K.L., Gurnon, J.R., Kang, M., Van Etten, J.L., and Zhou, M.-M. (2008). Epigenetic transcriptional repression of cellular genes by a viral SET protein. Nat Cell Biol 10, 1114–1122.

Pajoro, A., Severing, E., Angenent, G.C., and Immink, R.G.H. (2017). Histone H3 lysine 36 methylation affects temperature-induced alternative splicing and flowering in plants. Genome Biol 18, 102.

Perales, R., and Bentley, D. (2009). “Cotranscriptionality”: the transcription elongation complex as a nexus for nuclear transactions. Mol Cell 36, 178–191.

Petrillo, E., Godoy Herz, M.A., Fuchs, A., Reifer, D., Fuller, J., Yanovsky, M.J., Simpson, C., Brown, J.W.S., Barta, A., Kalyna, M., et al. (2014). A chloroplast retrograde signal regulates nuclear alternative splicing. Science 344, 427–430.

Pradeepa, M.M., Sutherland, H.G., Ule, J., Grimes, G.R., and Bickmore, W.A. (2012). Psip1/Ledgf p52 binds methylated histone H3K36 and splicing factors and contributes to the regulation of alternative splicing. PLoS Genet 8, e1002717.

Pradeepa, M.M., Grimes, G.R., Kumar, Y., Olley, G., Taylor, G.C.A., Schneider, R., and Bickmore, W.A. (2016). Histone H3 globular domain acetylation identifies a new class of enhancers. Nat Genet 48, 681–686.

Rahhal, R., and Seto, E. (2019). Emerging roles of histone modifications and HDACs in RNA splicing. Nucleic Acids Res 47, 4911–4926.

Ranieri, D., Rosato, B., Nanni, M., Magenta, A., Belleudi, F., and Torrisi, M.R. (2016). Expression of the FGFR2 mesenchymal splicing variant in epithelial cells drives epithelial-mesenchymal transition. Oncotarget 7, 5440–5460.


Sanidas, I., Polytarchou, C., Hatziapostolou, M., Ezell, S.A., Kottakis, F., Hu, L., Guo, A., Xie, J., Comb, M.J., Iliopoulos, D., et al. (2014). Phosphoproteomics screen reveals Akt isoform-specific signals linking RNA processing to lung cancer. Mol Cell 53, 577–590.

Sebestyén, E., Zawisza, M., and Eyras, E. (2015). Detection of recurrent alternative splicing switches in tumor samples reveals novel signatures of cancer. Nucleic Acids Res 43, 1345–1356.

Shapiro, I.M., Cheng, A.W., Flytzanis, N.C., Balsamo, M., Condeelis, J.S., Oktay, M.H., Burge, C.B., and Gertler, F.B. (2011). An EMT-driven alternative splicing program occurs in human breast cancer and modulates cellular phenotype. PLoS Genet 7, e1002218.

Shukla, S., Kavak, E., Gregory, M., Imashimizu, M., Shutinoski, B., Kashlev, M., Oberdoerffer, P., Sandberg, R., and Oberdoerffer, S. (2011). CTCF-promoted RNA polymerase II pausing links DNA methylation to splicing. Nature 479, 74–79.

Siam, A., Baker, M., Amit, L., Regev, G., Rabner, A., Najar, R.A., Bentata, M., Dahan, S., Cohen, K., Araten, S., et al. (2019). Regulation of alternative splicing by p300-mediated acetylation of splicing factors. RNA 25, 813–824.

Sims, R.J., Millhouse, S., Chen, C.F., Lewis, B.A., Erdjument-Bromage, H., Tempst, P., Manley, J.L., and Reinberg, D. (2007). Recognition of trimethylated histone H3 lysine 4 facilitates the recruitment of transcription postinitiation factors and pre-mRNA splicing. Mol Cell 28, 665–676.

Singh, R.K., and Cooper, T.A. (2012). Pre-mRNA splicing in disease and therapeutics. Trends Mol Med 18, 472–482.

Soares, E., and Zhou, H. (2018). Master regulatory role of p63 in epidermal development and disease. Cell Mol Life Sci 75, 1179–1190.

Tapial, J., Ha, K.C.H., Sterne-Weiler, T., Gohr, A., Braunschweig, U., Hermoso-Pulido, A., Quesnel-Vallieres, M., Permanyer, J., Sodaei, R., Marquez, Y., et al. (2017). An atlas of alternative splicing profiles and functional associations reveals new regulatory programs and genes that simultaneously express multiple major isoforms. Genome Res 27, 1759–1768.

Thompson, P.J., Dulberg, V., Moon, K.M., Foster, L.J., Chen, C., Karimi, M.M., and Lorincz, M.C. (2015). hnRNP K coordinates transcriptional silencing by SETDB1 in embryonic stem cells. PLoS Genet 11, e1004933.

Tripathi, V., Sixt, K.M., Gao, S., Xu, X., Huang, J., Weigert, R., Zhou, M., and Zhang, Y.E. (2016). Direct regulation of alternative splicing by SMAD3 through PCBP1 is essential to the tumor-promoting role of TGF-β. Mol Cell 64, 549–564.

Voigt, P., LeRoy, G., Drury, W.J., Zee, B.M., Son, J., Beck, D.B., Young, N.L., Garcia, B.A., and Reinberg, D. (2012). Asymmetrically modified nucleosomes. Cell 151, 181–193.

Warzecha, C.C., Jiang, P., Amirikian, K., Dittmar, K.A., Lu, H., Shen, S., Guo, W., Xing, Y., and Carstens, R.P. (2010). An ESRP-regulated splicing programme is abrogated during the epithelial-mesenchymal transition. EMBO J 29, 3286–3300.

Wei, C., Xiao, R., Chen, L., Cui, H., Zhou, Y., Xue, Y., Hu, J., Zhou, B., Tsutsui, T., Qiu, J., et al. (2016). RBFox2 binds nascent RNA to globally regulate Polycomb complex 2 targeting in mammalian genomes. Mol Cell 62, 875–889.


Xu, Y., Zhao, W., Olson, S.D., Prabhakara, K.S., and Zhou, X. (2018). Alternative splicing links histone modifications to stem cell fate decision. Genome Biol 19, 133.

Yanagisawa, M., Huveldt, D., Kreinest, P., Lohse, C.M., Cheville, J.C., Parker, A.S., Copland, J.A., and Anastasiadis, P.Z. (2008). A p120 catenin isoform switch affects Rho activity, induces tumor cell invasion, and predicts metastatic disease. J Biol Chem 283, 18344–18354.

Yearim, A., Gelfman, S., Shayevitch, R., Melcer, S., Glaich, O., Mallm, J.P., Nissim-Rafinia, M., Cohen, A.H., Rippe, K., Meshorer, E., et al. (2015). HP1 is involved in regulating the global impact of DNA methylation on alternative splicing. Cell Rep 10, 1122–1134.

Young, J.I., Hong, E.P., Castle, J.C., Crespo-Barreto, J., Bowman, A.B., Rose, M.F., Kang, D., Richman, R., Johnson, J.M., Berget, S., et al. (2005). Regulation of RNA splicing by the methylation-dependent transcriptional repressor methyl-CpG binding protein 2. Proc Natl Acad Sci U S A 102, 17551–17558.


Materials and methods

Detailed methods are provided in the online version of this paper and include the following:

KEY RESOURCES TABLE

CONTACT FOR REAGENT AND RESOURCE SHARING

Reini Luco, [email protected]

EXPERIMENTAL MODEL AND SUBJECT DETAILS

Cell Lines and Cell Culture

METHOD DETAILS

Cloning and Plasmids
Expression and Purification of vSET constructs
Methyltransferase assays
Epigenome Editing
Recombinant Lentivirus Production
Chromatin Immunoprecipitation
RNA Extraction and RT-qPCR
Migration and Invasion Assays
Flow Cytometry Experiments
FACS of CTNND1 Exon 2 Expressing Cells
Polymerase II Elongation Measurement
shRNA Knockdown
RNA-Immunoprecipitation
Motif Search Analysis

QUANTIFICATION AND STATISTICAL ANALYSIS

P values and Statistical Analysis

DATA AND SOFTWARE AVAILABILITY

ADDITIONAL RESOURCES


STAR METHODS

KEY RESOURCES TABLE

Supplementary List: List of Reagents and resources

Supplementary Table S1: List of ChIP-qPCR primers

Supplementary Table S2: List of RT-qPCR primers

Supplementary Table S3: List of Polymerase II elongation assay and RNA-IP primers

Supplementary Table S4: List of gRNAs

Supplementary Table S5: List of shRNAs

Supplementary Table S6: List of cloning primers

CONTACT FOR REAGENT AND RESOURCE SHARING

Further information and requests for reagents may be directed to and will be fulfilled by the Lead Contact, Reini Luco ([email protected]).

EXPERIMENTAL MODEL AND SUBJECT DETAILS

Cell Lines and Cell Culture

MCF10a cells: MCF10a cells are non-transformed human breast epithelial cells. The MCF10a-Snail-ER cell line was generated by introducing a retroviral construct expressing Snail-1 fused to the ligand-binding domain of the oestrogen receptor (ER), which makes Snail-1 activity inducible by exogenous 4-hydroxy-tamoxifen (4-OHT); it was obtained, together with its parental cell line, from the laboratory of Daniel A. Haber (Javaid et al., 2013). All MCF10a cell lines were maintained at 37°C with 5% CO2 in DMEM/F12 supplemented with 5% horse serum, 10 ng/mL EGF, 10 μg/mL insulin, 0.1 μg/mL cholera toxin, 0.5 μg/mL hydrocortisone, 1% penicillin/streptomycin and 1% L-glutamine (complete medium).

EMT induction: MCF10a cells were seeded at 7.5 × 10⁵ cells per 150-mm dish and, 24 h later, synchronized for 15 h in DMEM/F12 supplemented with 10 ng/mL EGF, 10 μg/mL insulin, 0.1 μg/mL cholera toxin, 0.5 μg/mL hydrocortisone and 1% w/v penicillin/streptomycin (serum-free medium). Cells were then treated with 100 nM 4-OHT or methanol (vehicle control) in complete medium.

HEK293T cells: HEK293T cells were maintained at 37°C with 5% CO2 in DMEM supplemented with 10% foetal bovine serum, 1% penicillin/streptomycin and 1% L-glutamine, and were transfected by calcium phosphate to generate recombinant lentiviruses.

METHOD DETAILS

Cloning and Plasmids

To generate plasmids encoding GFP/HA-tagged sid4x (a gift from the Salton lab), p300core (Addgene 61357), EZH2core (a gift from the Duckett lab), vSETx2 (a gift from the Voigt lab) and UTXcore (a gift from the Ge lab), the cDNAs were amplified using Q5 High-Fidelity DNA Polymerase (NEB) with primers carrying AscI/SbfI restriction sites (see Table S6 for the list of primers used) and cloned into the dCas9-empty-GFP vector using the Quick DNA Ligation Kit (NEB). The dCas9-empty-GFP vector was generated by digesting the dCas9-VP64-GFP plasmid (Addgene 61422) with BamHI and NheI to remove the VP64 sequence, followed by insertion of a linker containing AscI and SbfI restriction sites and an HA epitope tag. The Q5 Site-Directed Mutagenesis Kit (NEB) was used to generate dCas9 plasmids encoding the mutant p300core* (Y1467F) and vSETx2* (Y105F x2) proteins. Mutagenesis primer sequences and plasmids used in this study are listed in the Supplementary List and Table S6. To generate the pKLV2.3-Hygro gRNA lentiviral plasmid, the commercial pKLV2.2-PGKpuroBFP plasmid (Addgene 72666) was modified by removing the puromycin resistance gene and the BFP tag, adding an EcoRI site and inserting a hygromycin resistance gene into the XhoI/EcoRI sites. The different gRNAs were cloned using the SapI or BbsI restriction sites. Cloning primer sequences and gRNAs used in this study are listed in Tables S4 and S6. shRNA plasmids were gifts from different laboratories (see Key Resources Table) or were obtained by cloning shRNA sequences into the pLKO.1-Hygro (Addgene 24150) or pLKO.1-Blast (Addgene 26655) plasmids at the AgeI/EcoRI restriction sites. shRNA sequences used in this study are listed in Table S5.


Expression and Purification of vSET constructs

The coding sequence for vSET was ordered from IDT and cloned into a modified pET22b plasmid. Single-chain dimeric vSET constructs with GSGSG-(SSG)n-SGSGG linkers (n = 1–3) between two vSET monomers were generated by PCR and subcloning of a fragment encoding the C-terminal 8 residues of vSET, followed by the linker and a complete vSET monomer, into the XbaI and HindIII restriction sites of vSET in modified pET22b. vSET and dimeric sc-vSET (also called vSETx2 in this paper) constructs were expressed in E. coli BL21 and purified from inclusion bodies essentially as described for vSET by Manzur et al. (2003). In short, inclusion bodies were solubilized in unfolding buffer (20 mM Tris pH 7.5, 7 M guanidine hydrochloride, 10 mM DTT). To refold vSET and vSETx2 proteins, the solubilized protein was first dialyzed against urea dialysis buffer (10 mM Tris pH 7.5, 7 M urea, 100 mM NaCl, 1 mM EDTA, 5 mM β-mercaptoethanol), followed by repeated dilution with vSET refolding buffer (50 mM Tris pH 7.5, 300 mM NaCl, 10% glycerol, 0.1 mM EDTA, 5 mM β-mercaptoethanol), reducing the urea concentration from 7 M to 1 M step-wise in 1 M increments (1 h per step). Finally, refolded vSET and vSETx2 proteins were dialyzed once against vSET refolding buffer and then once against vSET HEPES refolding buffer (50 mM HEPES pH 7.5, 300 mM NaCl, 5% glycerol, 0.1 mM EDTA, 5 mM β-mercaptoethanol).
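The step-wise urea removal described above can be planned with C1·V1 = C2·V2. The sketch below assumes that each 1 M step is a dilution with urea-free vSET refolding buffer (the listed refolding buffer contains no urea) and uses a hypothetical 1 mL starting volume; it illustrates the arithmetic rather than reproducing the volumes actually used.

```python
from fractions import Fraction


def urea_step_dilution(start_volume_ml, start_urea_m=7, end_urea_m=1, step_m=1):
    """Return (target urea M, buffer to add in mL, total volume in mL) for each step.

    Assumes dilution with urea-free buffer, so C_old * V_old = C_new * V_new.
    Exact fractions avoid rounding drift across the successive steps.
    """
    volume = Fraction(start_volume_ml)
    conc = Fraction(start_urea_m)
    schedule = []
    while conc - step_m >= end_urea_m:
        target = conc - step_m
        new_volume = volume * conc / target
        schedule.append((target, new_volume - volume, new_volume))
        volume, conc = new_volume, target
    return schedule


# Hypothetical 1 mL of solubilized protein in 7 M urea after the urea dialysis step.
for target, added, total in urea_step_dilution(1):
    print(f"{float(target):.0f} M urea: add {float(added):.3f} mL -> {float(total):.3f} mL total (1 h)")
```

Because the amount of urea is conserved, the final volume is always start/end concentration times the starting volume (7-fold here), which is worth checking against the vessel size before starting.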

Size exclusion chromatography of refolded vSET and vSETx2 constructs was performed on a Superdex 75 column in vSET HEPES refolding buffer. The specificity of the vSETx2 construct towards H3K27 was tested by methyltransferase assays using substrate nucleosomes harbouring an H3K27A mutation.

Methyltransferase assays

In vitro histone methyltransferase (HMT) assays were carried out essentially as described (Voigt et al., 2012). Briefly, core histones were expressed in E. coli, purified from inclusion bodies and assembled into histone octamers by dialysis into refolding buffer (10 mM Tris pH 8, 2 M NaCl, 1 mM EDTA, 5 mM β-mercaptoethanol). Correctly assembled octamers were purified by size exclusion chromatography on a Superdex S200 column. Recombinant nucleosome arrays were reconstituted by salt dialysis assembly of histone octamers onto plasmid DNA containing 12 tandem 177-bp repeats of the 601 nucleosome positioning sequence. To determine methylation activity, 2–10 ng of vSET or vSETx2 constructs were incubated with 1 µg of recombinant nucleosome arrays in 50 mM Tris pH 8.5, 5 mM MgCl2, 4 mM DTT and ³H-labeled SAM for 1 h at 30°C. Reactions were stopped by addition of SDS loading buffer. After separation by SDS-PAGE and transfer to PVDF membranes, loading was assessed by Coomassie staining. Activity was detected as ³H incorporation by exposing Biomax MS film with Biomax TranScreen LE intensifying screens (both Kodak Carestream).

Epigenome Editing

Stable MCF10a-Snail-ER cell lines expressing the different dCas9 fusions were generated. Briefly, cells were infected with recombinant lentiviruses carrying dCas9-empty-GFP, dCas9-sid4x-GFP, dCas9-p300core-GFP, dCas9-EZH2core-GFP, dCas9-vSETx2-GFP or dCas9-UTXcore-GFP, following the Recombinant Lentivirus Production protocol. Infected cells were harvested and GFP-sorted on a BD FACSMelody (BD Biosciences, US). GFP was excited by a 488-nm laser line and its emission was collected through a 527/32 nm band-pass filter. The stable dCas9 cell lines were then infected with lentiviruses carrying pKLV2.3-Hygro + gRNAs, split, and the medium was supplemented with 100 µg/mL hygromycin. See Table S4 for the list of gRNAs used.

Recombinant Lentivirus Production

HEK293T cells were seeded at 2 × 10⁶ cells per 100-mm dish (Day 1). Cells were transfected with 1 µg psPAX2 (packaging plasmid encoding gag, pol and accessory proteins), 1 µg pMD2.G (VSV-G envelope plasmid) and 5 µg of the plasmid of interest (e.g. dCas9-empty) in 250 mM CaCl2, made up to 500 µL with sterile water. Samples were gently mixed, 2X HEPES-buffered saline (HBS) was added, and the mixture was incubated for 15 min at room temperature. Mixes were added dropwise to the HEK293T cells, which were maintained at 37°C with 5% CO2 (Day 2). 15 h after transfection, the medium was replaced by MCF10a complete medium, and MCF10a cells were seeded at 5 × 10⁵ cells per 100-mm dish for subsequent infections (Day 3). 48 h and 72 h after transfection, viruses were collected, filtered through 0.45-µm filters and added to the MCF10a cells (Days 4 and 5). 72 h later, cells were split and the medium was supplemented with 15 µg/mL blasticidin or 100 µg/mL hygromycin.
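The DNA/CaCl2 mix above combines fixed plasmid masses with CaCl2 at 250 mM in 500 µL. A small sketch of the pipetting arithmetic follows; it assumes a hypothetical 2.5 M CaCl2 stock, hypothetical plasmid stocks at 1 µg/µL, and an equal volume of 2X HBS (the usual ratio for a 2X buffer, not stated in the protocol).

```python
def calcium_phosphate_mix(plasmid_ug, stock_ug_per_ul, cacl2_final_mm=250.0,
                          cacl2_stock_mm=2500.0, total_ul=500.0):
    """Volumes (µL) for the DNA/CaCl2 mix, topped up with water to total_ul."""
    volumes = {name: ug / stock_ug_per_ul[name] for name, ug in plasmid_ug.items()}
    # C1 * V1 = C2 * V2 for the CaCl2 stock (hypothetical 2.5 M).
    volumes["CaCl2 stock"] = cacl2_final_mm * total_ul / cacl2_stock_mm
    water = total_ul - sum(volumes.values())
    if water < 0:
        raise ValueError("plasmid and CaCl2 volumes exceed the target volume")
    volumes["sterile water"] = water
    return volumes


# Hypothetical plasmid stocks at 1 µg/µL; "transfer plasmid" = plasmid of interest.
mix = calcium_phosphate_mix(
    {"psPAX2": 1.0, "pMD2.G": 1.0, "transfer plasmid": 5.0},
    {"psPAX2": 1.0, "pMD2.G": 1.0, "transfer plasmid": 1.0},
)
for reagent, ul in mix.items():
    print(f"{reagent}: {ul:.1f} µL")
print("then add 500.0 µL 2X HBS (assumed equal volume)")
```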


Chromatin Immunoprecipitation

We performed ChIP using antibodies against H3K27me3 (Cell Signaling C36B11), H3K27ac (Abcam 4729), H3K4me1 (Abcam 8895), H3K9ac (Abcam 4441), H3K9me2 (Abcam 1220), the HA tag (Abcam 9110) and total Pol II (Santa Cruz sc-55492). MCF10a cells (10 million per sample) were fixed in 1% formaldehyde in PBS at room temperature with agitation for 2 min (histone marks), 4 min (HA tag) or 10 min (total Pol II), then quenched with 1 M glycine for 5 min. Fixed cells were resuspended in 1 mL cold Lysis Buffer A (50 mM HEPES pH 7.5, 140 mM NaCl, 1 mM EDTA, 10% glycerol, 0.5% NP-40/Igepal, 0.25% Triton X-100) prepared fresh with protease inhibitors (Sigma 11836145001) and incubated at 4°C on a rotating wheel for 10 min. Nuclei were pelleted, resuspended in 1 mL Lysis Buffer B (10 mM Tris-HCl pH 8, 200 mM NaCl, 1 mM EDTA, 0.5 mM EGTA, prepared fresh with protease inhibitors) and incubated on a rotating wheel for 10 min. Samples were then diluted with 0.75 mL Dilution Buffer C (10 mM Tris-HCl pH 8, 100 mM NaCl, 1 mM EDTA, 0.5 mM EGTA, 0.1% sodium deoxycholate, 0.5% N-lauroylsarcosine, prepared fresh with protease inhibitors) and sonicated at 4°C for 12, 14 or 16 min (for 2, 4 or 10 min crosslinking, respectively) in 15-mL conical polystyrene tubes using a Bioruptor™ (Diagenode) sonicator with a 4°C water-bath circulation system, to generate fragments of 200 bp to 1 kb. After sonication, samples were spun at 20,000 × g for 30 min at 4°C to remove debris. 8 µg (histone marks, HA tag) or 25 µg (total Pol II) of chromatin were diluted in TSE 150 Buffer (0.1% SDS, 1% Triton X-100, 2 mM EDTA, 20 mM Tris-HCl pH 8, 150 mM NaCl, supplemented with protease inhibitors), precleared with 30 µL of pre-washed Dynabeads Protein G (Thermo Fisher 10009D) and incubated at 4°C on a rotating wheel for 1 h 30 min.
Prior to setting up immunoprecipitation (“IP”) reactions, 50 µL of precleared chromatin was removed as “Input”. 150 µL of TE/1% SDS Buffer (10 mM Tris-HCl pH 8, 1 mM EDTA pH 8, 1% SDS) was added to the Input and incubated overnight at 65°C. 3 µL of proteinase K (Thermo Fisher EO0491) was added and samples were incubated at 37°C for 2 h. Input DNA was then purified using the QIAquick PCR Purification kit (QIAGEN 28106) according to the manufacturer’s instructions. To set up IP reactions, precleared chromatin was mixed with antibody and rotated overnight at 4°C. IP reactions were added to 30 µL pre-washed Dynabeads Protein G and rotated for 1 h 30 min at 4°C. Beads were washed once with TSE 150 Buffer, once with TSE 500 Buffer (0.1% SDS, 1% Triton X-100, 2 mM EDTA, 20 mM Tris-HCl pH 8, 500 mM NaCl), once with Washing Buffer (10 mM Tris-HCl pH 8, 1 mM EDTA, 0.25 M LiCl, 0.5% NP-40/Igepal, 0.5% sodium deoxycholate) and twice with TE buffer (10 mM Tris-HCl pH 8, 1 mM EDTA pH 8). After the final wash, beads were eluted with 100 µL of Elution Buffer (50 mM Tris-HCl pH 8, 10 mM EDTA, 1% SDS) for 15 min at 65°C with vigorous shaking, and 100 µL of TE/1% SDS Buffer was added for a final eluate volume of 200 µL. Eluates were incubated overnight at 65°C, 3 µL of proteinase K was added, and samples were incubated at 37°C for 2 h. DNA was then purified using the QIAquick PCR Purification kit (QIAGEN 28106) according to the manufacturer’s instructions. Input and immunoprecipitated DNA were analysed by qPCR using iTaq Universal SYBR Green Supermix (Bio-Rad #1725121) on a Bio-Rad CFX96 Touch Real-Time PCR System. Results are represented as the mean ± S.E.M. of at least 3 independent experiments of immunoprecipitated chromatin (calculated as a percentage of input) with the indicated antibodies, after normalization by the mean of three control regions stably enriched across the different conditions. See Table S1 for the list of gene-specific primers used.
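The ChIP quantification above (percentage of input, then normalization by the mean of three stable control regions, then mean ± S.E.M. across experiments) can be sketched as follows. Two assumptions are made: the protocol keeps 50 µL as Input but does not state the volume per IP, so `input_fraction` (Input volume / IP volume) is a hypothetical parameter; and the 2^ΔCt form assumes about 100% amplification efficiency. All Ct values are hypothetical.

```python
import math
from statistics import mean, stdev


def percent_input(ct_ip, ct_input, input_fraction):
    """Percentage of input: 100 * f * 2^(Ct_input - Ct_IP), with f = Input volume / IP volume."""
    return 100.0 * input_fraction * 2.0 ** (ct_input - ct_ip)


def normalize_to_controls(target_pct, control_pcts):
    """Divide a region's % input by the mean % input of the stable control regions."""
    return target_pct / mean(control_pcts)


def mean_sem(values):
    """Mean and standard error of the mean across independent experiments."""
    return mean(values), stdev(values) / math.sqrt(len(values))


# Hypothetical Ct values for one test region and three control regions in one experiment.
target = percent_input(ct_ip=28.0, ct_input=25.0, input_fraction=0.1)
controls = [percent_input(ct, 25.0, 0.1) for ct in (27.0, 27.5, 26.5)]
print(f"target: {target:.3f}% input, normalized: {normalize_to_controls(target, controls):.3f}")
print(mean_sem([0.83, 0.91, 0.78]))   # hypothetical normalized values from 3 experiments
```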

RNA Extraction and RT-qPCR

RT-qPCR analyses were performed in biological triplicates or quadruplicates. Total RNA was prepared from cells with the GeneJET RNA Purification Kit (Thermo Scientific #K0732) and eluted into 30 µL RNase-free water. Genomic DNA was removed using RQ1 RNase-Free DNase (Promega #M6101): briefly, 1 µg of RNA was mixed with RQ1 DNase and RQ1 DNase 10X Reaction Buffer and incubated for 30 min at 37°C, and the enzyme was then inactivated by adding Stop Solution and incubating for 10 min at 65°C. cDNAs were generated using the Transcriptor First Strand cDNA Synthesis Kit (Roche 04 897 030 001) according to the manufacturer’s instructions. For each biological replicate, quantitative PCR reactions were performed in technical duplicates using iTaq Universal SYBR Green Supermix (Bio-Rad) on a Bio-Rad CFX96 Touch Real-Time PCR System, and the data were normalized to a control gene. Data from biological replicates are plotted as mean ± S.E.M. See Table S2 for the list of gene-specific primers used.
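Normalization to a control gene (TBP in the legends) or, for exon inclusion, to total CTNND1 can be sketched with the ΔCt method. This assumes roughly 100% PCR efficiency (a factor of 2 per cycle, adjustable through `efficiency`) and averages the technical duplicates before normalizing; the text does not name the quantification model, so this is an illustration with hypothetical Ct values.

```python
from statistics import mean


def relative_level(ct_target, ct_reference, efficiency=2.0):
    """Target level relative to a reference amplicon: E^-(mean Ct_target - mean Ct_reference)."""
    delta_ct = mean(ct_target) - mean(ct_reference)
    return efficiency ** (-delta_ct)


# Hypothetical technical duplicates from one biological replicate.
exon2_inclusion = relative_level([26.1, 25.9], [23.0, 23.0])   # exon 2 amplicon vs total CTNND1
ecad_vs_tbp = relative_level([21.4, 21.6], [24.5, 24.5])       # E-Cadherin vs TBP
print(f"CTNND1 exon 2 inclusion: {exon2_inclusion:.3f}")
print(f"E-Cadherin / TBP: {ecad_vs_tbp:.3f}")
```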


Migration and Invasion Assays

Non-directional migration – Wound Healing Assay: MCF10a cells were plated at 1 × 10⁶ cells per well in 6-well plates containing complete medium and grown to confluency. Confluent cultures were serum-starved for 12 h. Serum-starved, confluent cell monolayers were wounded with a plastic pipette tip and washed three times with 1X PBS to remove floating cells. After washing, the cells were cultured in complete medium. The wounded area was photographed at 0 h (control) and 24 h later using a Zeiss Axiovert 40 CFL microscope with a 10X objective (100X magnification). Cell migration into the scratch was quantified using the ImageJ plugin MRI Wound Healing Tool (Volker Baecker, Montpellier RIO Imaging).
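Migration into the scratch is commonly expressed as the fraction of the initial wound area that has been filled. A minimal sketch, assuming the MRI Wound Healing Tool exports one wound area per image (pixels or µm², the unit cancels) for matched fields at 0 h and 24 h; the field values are hypothetical.

```python
from statistics import mean


def percent_closure(area_0h, area_24h):
    """Percentage of the 0 h wound area covered by migrating cells after 24 h."""
    if area_0h <= 0:
        raise ValueError("initial wound area must be positive")
    return 100.0 * (area_0h - area_24h) / area_0h


# Hypothetical wound areas for three matched fields of one well.
fields = [(1000.0, 400.0), (950.0, 380.0), (1020.0, 459.0)]
per_field = [percent_closure(a0, a24) for a0, a24 in fields]
print([round(x, 1) for x in per_field], round(mean(per_field), 1))
```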

Directional migration – Transwell Filter Assay: Cell migration assays were performed using 24-well chambers (Sigma CLS3422-48EA) with uncoated polycarbonate membranes (pore size 8 µm). Briefly, 5 × 10⁴ cells resuspended in depleted medium (DMEM/F12 supplemented with 1% horse serum, 10 μg/mL insulin, 0.1 μg/mL cholera toxin, 0.5 μg/mL hydrocortisone, 1% penicillin/streptomycin, 1% L-glutamine) were placed in the upper chamber of the transwell unit. The bottom chamber was filled with 0.6 mL complete medium supplemented with 20 ng/mL FGF-2. The plates were incubated for 12 h at 37°C with 5% CO2, and the cells that had migrated from the upper to the lower chamber of the unit were fixed with 4% paraformaldehyde in PBS for 2 min, permeabilized with 0.1% Triton X-100 for 5 min and stained with 0.2% crystal violet for 1 h. Migrated cells were counted using a Zeiss Axiovert 40 CFL microscope with a 5X objective (50X magnification).

Invasion – Transwell Filter Assay: For cell invasion assays, 24-well chambers were coated with Matrigel (Sigma E6909) diluted in depleted medium for 1 h at 37°C, and assays were performed as described for directional migration, except that the assay was run for 24 h.

Flow Cytometry Experiments

MCF10a cells were fixed in 4% paraformaldehyde for 10 min at room temperature, followed by a 15-min permeabilization step in 0.5% Tween 20. Cells were resuspended in Blocking Buffer (PBS, 3% BSA, 0.1% Tween 20) for 30 min on a rotating wheel at room temperature and incubated with the conjugated antibodies EPCAM-PE (MACS Miltenyi 130-113-264) and MCAM-APC (MACS Miltenyi 130-120-771) for 1 h 30 min on a rotating wheel at room temperature, protected from light. Cells were harvested and analysed on a MACSQuant 10 (Miltenyi Biotec). PE was excited by a 488-nm laser line (DPSS laser) and its emission was collected through 655/605nm; APC was excited by a 640-nm laser line and its emission was collected through 655/730nm. The data were analysed using Flowing Software (Perttu Terho, Turku Centre for Biotechnology).

FACS of CTNND1 Exon 2 Expressing Cells

Cells were resuspended in Blocking Buffer (PBS, 3% BSA) for 30 min on a rotating wheel at room temperature and successively incubated with CTNND1 exon 2 primary antibody (Santa Cruz sc-23873) for 1 h 30 min and with PE-Cy7 secondary antibody (Thermo Fisher 25-4015-82) for 30 min, both on a rotating wheel at room temperature, the latter protected from light. Cells were harvested and analysed on a BD FACSMelody (BD Biosciences, US). PE-Cy7 was excited by a 561-nm laser line and its emission was collected through a 783/56 nm band-pass filter. The data were analysed using BD FACSChorus software (BD Biosciences, US).
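Reporting the percentage of mCTNND1(ex2)-positive cells requires a gate, and the text does not state how it was placed. A common choice, assumed here, is to set the threshold at the 99th percentile of the PE-Cy7 signal in negative-control cells and count the sample events above it. The sketch uses Python's `statistics.quantiles`; the intensities are hypothetical.

```python
from statistics import quantiles


def positivity_threshold(negative_control, percentile=99):
    """Intensity below which `percentile`% of negative-control events fall."""
    return quantiles(negative_control, n=100)[percentile - 1]


def percent_positive(sample, threshold):
    """Percentage of sample events whose intensity exceeds the threshold."""
    return 100.0 * sum(1 for x in sample if x > threshold) / len(sample)


negative = list(range(1, 101))   # hypothetical PE-Cy7 intensities, negative control
threshold = positivity_threshold(negative)
print(threshold, percent_positive([50, 100, 150, 200], threshold))
```

In practice the gate is usually drawn on the cytometer software; this only shows the logic of a percentile-based gate.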

Polymerase II Elongation Measurement

A DRB treatment (Sigma D1916) of 100 µM for 6 h was necessary to fully block endogenous CFTR transcription. Cells were then washed and the time course (0, 5, 10, 15, 20, 30, 45, 60, 90 min) was started by adding complete medium. At each time point, cells were scraped and cell pellets were snap-frozen in liquid nitrogen. Total RNA was extracted as described above in RNA Extraction and RT-qPCR. Reverse transcription was primed with random hexamers. Pre-mRNAs were quantified by real-time PCR with amplicons spanning intron-exon junctions. For each biological replicate, quantitative PCR reactions were performed in technical duplicates using the iTaq Universal SYBR Green supermix (Bio-Rad) on the Bio-Rad CFX96 Touch Real-Time PCR System, and the data were normalized by tRNA. Data from biological replicates are plotted as mean +/− S.E.M. See Table S3 for the list of gene-specific primers used.
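The appearance of pre-mRNA after DRB release is read out as a tRNA-normalised RT-qPCR signal per time point, averaged over biological replicates. A minimal sketch of that bookkeeping follows, assuming ΔCt normalisation with roughly 100% primer efficiency and that each Ct passed in is already the mean of the technical duplicates; the data layout and function names are hypothetical.

```python
import math
import statistics


def relative_level(ct_target, ct_reference):
    """Delta-Ct abundance of a target relative to a reference (e.g. tRNA),
    assuming ~100% amplification efficiency for both amplicons."""
    return 2.0 ** -(ct_target - ct_reference)


def mean_sem(values):
    """Mean and standard error of the mean (sample SD / sqrt(n))."""
    mean = statistics.mean(values)
    if len(values) < 2:
        return mean, 0.0
    return mean, statistics.stdev(values) / math.sqrt(len(values))


def time_course(ct_by_replicate):
    """ct_by_replicate maps replicate -> {time_min: (ct_pre_mrna, ct_trna)}.
    Returns {time_min: (mean, sem)} of tRNA-normalised pre-mRNA levels."""
    times = sorted({t for rep in ct_by_replicate.values() for t in rep})
    summary = {}
    for t in times:
        levels = [relative_level(*rep[t]) for rep in ct_by_replicate.values() if t in rep]
        summary[t] = mean_sem(levels)
    return summary
```

Plotting the resulting means against time for untreated and tamoxifen-induced cells gives curves of the kind shown for CTNND1 exon 2 and control exon 15.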

TSA and DRB Treatments

A 24-hour treatment with 40 µM DRB (Sigma D1916), to impede the dynamics of transcribing RNA Polymerase II, or with 1 µg/mL TSA (Trichostatin A, Sigma T8552), an HDAC inhibitor, was applied to MCF10a-Snail-ER cells after 0 days (T0) or 7 days (T7) of EMT induction. Total RNA extraction and quantification were performed as described above in Polymerase II Elongation Measurement.

shRNA Knockdown

Knockdown of PTB, ELAV1, ESRP1, MBNL1, hnRNPH1, CELF1, hnRNPF, SRSF1, FUS, RBFOX2, SOX9, SMAD3 and PCBP1 was performed according to the Recombinant Lentivirus Production protocol. Briefly, HEK293T cells were transfected with the appropriate shRNA plasmid; 15 h after transfection, the medium was replaced by MCF10a complete medium and MCF10a cells were split for subsequent infection. At 48 h and 72 h after transfection, viruses were collected, filtered through a 0.45 µm filter and added to the MCF10a cells. 72 h later, cells were split and the medium was supplemented with 15 µg/mL blasticidin or 100 µg/mL hygromycin.

RNA-Immunoprecipitation

The day before collection, 10^7 cells were seeded per condition and IP reaction (5×10^6 for PTB and 5×10^6 for normal mouse IgG1 IPs) in a 15 cm plate. The next day, the medium was discarded and each plate was washed with 12 ml of cold 1X PBS (D8537, Sigma-Aldrich). Cells were UV-crosslinked at 254 nm with 2000 J/m² on ice and scraped. After centrifugation at 2500 rpm for 5 min, the PBS was discarded and the pellets were stored at -80 °C until processing. Cells were lysed in 617.5 μl of cell lysis buffer (1% v/v NP-40, 400 U/ml RNase inhibitor in 1X PBS) for 10 min on ice. Sodium deoxycholate was added to a final concentration of 0.5% v/v and samples were incubated with rotation for 15 min at 4°C. Samples were incubated at 37 °C with 30 U of DNase with shaking at 300 rpm, vortexed briefly and sonicated for 10 cycles (30 s on/30 s off, high setting) in 15 ml conical polystyrene tubes using a Bioruptor™ sonicator (Diagenode) with a 4°C water-bath cooling system. The tubes were then spun briefly to recover all of the sample and centrifuged for 15 min at 15,000 rpm to remove insoluble debris. Each sample was divided into 2 × 300 μl aliquots, and 30 μl were saved as «Input» control and stored at -80 °C. Each aliquot was incubated with 6 μg of α-PTB antibody (Ref. 32-4800, Invitrogen) or 6 μg of normal mouse control IgG1 antibody (Ref. 14-4714-82, Invitrogen) overnight with rotation at 4°C. The next day, 40 μl of Dynabeads Protein G (Ref. 10009D, Invitrogen), pre-washed three times with 1 ml 1X PBS, 0.01% v/v Tween-20, were added per sample and incubated for 4 h at 4°C with rotation. The unbound supernatant was discarded and the beads were washed once with Cell Lysis Buffer (1% v/v NP-40, 0.5% sodium deoxycholate in 1X PBS), three times with Washing Buffer I (1% v/v NP-40, 0.5% sodium deoxycholate, 300 mM NaCl in 1X PBS), once with Washing Buffer II (0.5% v/v NP-40, 0.5% sodium deoxycholate, 0.125% v/v SDS in 1X PBS) and once with 1X PBS. All washes were done for 5 min with rotation at 4°C. Beads and Inputs were incubated with 100 μl of Proteinase K buffer (100 mM Tris-HCl pH 8, 50 mM NaCl, 10 mM EDTA, 0.5% v/v SDS, 100 U RNasin in DEPC water) and 10 μl of Proteinase K for 45 min at 25°C, shaking at 1,500 rpm. 1 ml of TRIzol® (Ambion) was added per sample (beads or Input) and RNA was purified according to the manufacturer's protocol, including 1 μl of GlycoBlue™ Coprecipitant (Ref. AM9515, Invitrogen). RNA pellets were resuspended in 8 μl of DEPC water and incubated with 1 μl DNase (1 U) and 1 μl of 10X DNase buffer (Ref. M6101, Promega) for 30 min at 37°C. The DNase was inactivated with 1 μl of Stop Solution at 65°C for 10 min, and RT was performed using the Transcriptor First Strand cDNA Synthesis Kit (Ref. 04 897 030 001, Roche) in a final volume of 20 μl. The RT product was diluted 1/5 and each sample was quantified in duplicate as described before. The enrichment of every IP was normalized to its Input as 2^-(Ct IP - Ct Input) and, for representation, the fold change was calculated relative to the IgG and CTNND1 exon 7 control enrichments.
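The RIP normalisation chains three ratios: IP over its Input, PTB over IgG, and the region of interest over the CTNND1 exon 7 control. The sketch below assumes that order of operations and a dictionary keyed by (antibody, region); both are illustrative choices, not a description of the spreadsheet actually used.

```python
def ip_over_input(ct_ip, ct_input):
    """Enrichment of an IP relative to its own Input: 2^-(Ct_IP - Ct_Input)."""
    return 2.0 ** -(ct_ip - ct_input)


def rip_fold_change(cts, region, control_region="CTNND1_ex7"):
    """cts: {(antibody, region): (ct_ip, ct_input)}, antibody in {"PTB", "IgG"}.

    Fold change = (PTB/IgG at `region`) / (PTB/IgG at `control_region`)."""
    def ptb_over_igg(r):
        ptb = ip_over_input(*cts[("PTB", r)])
        igg = ip_over_input(*cts[("IgG", r)])
        return ptb / igg

    return ptb_over_igg(region) / ptb_over_igg(control_region)
```

Because every value is a ratio between IPs that share the same Input fraction, the Input dilution factor cancels out and does not need to be entered explicitly.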

Motif Search Analysis

RNA-binding motif search analysis was performed on the CTNND1 exon 2 sequence with four publicly available tools: RBPDB v1.3 (http://rbpdb.ccbr.utoronto.ca), RBPMAP v1.1 (http://rbpmap.technion.ac.il), SFMAP v1.8 (http://sfmap.technion.ac.il/) and Spliceaid (http://www.introni.it/splicing.html). All tools were used with default parameter settings, with the following exceptions. For RBPDB, a threshold of 0.8 was applied. For RBPMAP, the “High stringency” level was used with all available Human/mouse motifs. For SFMAP, both “Perfect match” and “High stringency” levels were used. We prioritized RNA motifs predicted by more than one database and whose corresponding RNA-binding proteins are expressed in MCF10a cells.
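The prioritisation rule above (support from more than one tool, plus expression in MCF10a cells) amounts to a vote count followed by a filter. A minimal sketch, assuming each tool's output has already been reduced to a set of RBP names; the RBP labels in the usage example are placeholders, not actual predictions.

```python
from collections import defaultdict


def prioritise_rbps(predictions, expressed, min_tools=2):
    """predictions: {tool_name: set of RBP names predicted on the exon}.
    expressed: set of RBPs considered expressed in MCF10a cells.

    Returns {rbp: sorted list of supporting tools} for RBPs predicted by at
    least `min_tools` tools and present in `expressed`."""
    support = defaultdict(set)
    for tool, rbps in predictions.items():
        for rbp in rbps:
            support[rbp].add(tool)
    return {
        rbp: sorted(tools)
        for rbp, tools in sorted(support.items())
        if len(tools) >= min_tools and rbp in expressed
    }


# Toy usage with placeholder RBP names
toy = {
    "RBPDB": {"RBP_A", "RBP_B"},
    "RBPMAP": {"RBP_A", "RBP_C"},
    "SFMAP": {"RBP_B"},
    "Spliceaid": {"RBP_A", "RBP_D"},
}
print(prioritise_rbps(toy, expressed={"RBP_A", "RBP_C"}))
```

In the toy input, RBP_B is supported by two tools but is dropped because it is not in the expressed set.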

QUANTIFICATION AND STATISTICAL ANALYSIS

P Values and Statistical Analysis

A two-tailed unpaired Student's t-test was used in all Figures and Supplementary Figures. P-values and other details can be found in the figure legends.
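For reference, the test used throughout is the pooled-variance (equal-variance) two-sample t-test, with the p-value taken from both tails of the t distribution with n1 + n2 - 2 degrees of freedom. In practice one would call `scipy.stats.ttest_ind(a, b)`, whose default `equal_var=True` gives this test; the dependency-free sketch below only makes the computation explicit, using the regularised incomplete beta function evaluated by a continued fraction (the identity p = I_x(df/2, 1/2) with x = df/(df + t²)).

```python
import math
import statistics


def _betacf(a, b, x, max_iter=300, eps=1e-15, tiny=1e-300):
    """Continued fraction for the regularised incomplete beta (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # even step of the continued fraction
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # odd step of the continued fraction
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def _betai(a, b, x):
    """Regularised incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_bt = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
              + a * math.log(x) + b * math.log(1.0 - x))
    bt = math.exp(log_bt)
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def t_two_tailed_p(t, df):
    """Two-tailed p-value of Student's t distribution with df degrees of freedom."""
    return _betai(df / 2.0, 0.5, df / (df + t * t))


def student_t_test(a, b):
    """Two-tailed unpaired Student's t-test with pooled variance."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    pooled = ((n1 - 1) * statistics.variance(a) + (n2 - 1) * statistics.variance(b)) / df
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled * (1.0 / n1 + 1.0 / n2))
    return t, t_two_tailed_p(t, df)
```

With n = 3 or 4 biological replicates per group, as in most panels, the test has only 4 to 6 degrees of freedom, so the equal-variance assumption and the small sample size both weigh on the reported P-values.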

DATA AND SOFTWARE AVAILABILITY

Software

See Key Resources Table

SUPPLEMENTARY REFERENCES

Javaid, S., Zhang, J., Anderssen, E., Black, J.C., Wittner, B.S., Tajima, K., Ting, D.T., Smolen, G.A., Zubrowski, M., Desai, R., et al. (2013). Dynamic Chromatin Modification Sustains Epithelial-Mesenchymal Transition following Inducible Expression of Snail-1. Cell Rep. 5, 1679–1689.

Manzur, K.L., Farooq, A., Zeng, L., Plotnikova, O., Koch, A.W., Sachchidanand, and Zhou, M.-M. (2003). A dimeric viral SET domain methyltransferase specific to Lys27 of histone H3. Nat. Struct. Biol. 10, 187–196.

Voigt, P., LeRoy, G., Drury, W.J., Zee, B.M., Son, J., Beck, D.B., Young, N.L., Garcia, B.A., and Reinberg, D. (2012). Asymmetrically modified nucleosomes. Cell 151, 181–193.

KEY RESOURCES TABLE

Supplementary List: List of Reagents and resources

Supplementary Table S1: List of ChIP-qPCR primers

Supplementary Table S2: List of RT-qPCR primers

Supplementary Table S3: List of RNAP II elongation assay and RNA-IP primers

Supplementary Table S4: List of gRNAs

Supplementary Table S5: List of shRNAs

Supplementary Table S6: List of cloning primers

Figures

Figure 1

Figure 1: Specific histone modifications correlate in time with dynamic changes in splicing during EMT. (A) Schematic representation of the epithelial-to-mesenchymal transition (EMT) and the reverse mesenchymal-to-epithelial transition (MET). The functions of the alternatively spliced genes most relevant for the EMT are shown. Normal human epithelial MCF10a-Snail-ER cells are fully reprogrammed into mesenchymal-like cells in 7 days (T7). The first changes in EMT markers are observed 6h (T0.25) after treatment with tamoxifen (TXF). Until the EMT is complete, there are several intermediate states in which heterogeneous populations of cells coexist. (B,G) Representation of the CTNND1 and FGFR2 gene loci with the positions of the primers used for ChIP-qPCR experiments. The alternatively spliced exon regulated during EMT is highlighted in red. The direction of transcription is indicated by a black arrow. (C,H) Inclusion levels of CTNND1 exon 2 and FGFR2 exon IIIc relative to total expression levels of CTNND1 and FGFR2, respectively, in MCF10a-Snail-ER cells at different time points during induction of the EMT (0 to 7 days in the presence of tamoxifen, orange) and the reverse MET (21 days after removal of tamoxifen at T7, red). RT-qPCR results are shown as the mean +/- SEM of n=4 biological replicates. (D-F and I-K) Enrichment levels of H3K27me3 (D,I), H3K27ac (E,J) and H3K4me1 (F,K) along the CTNND1 or FGFR2 locus, focusing on the alternatively spliced exons of interest (CTNND1.e2 and FGFR2.IIIc) and flanking intronic and exonic control regions, in MCF10a-Snail-ER cells treated with tamoxifen for 6h (T0.25), 24h (T1) or 7 days (T7), and in MET-reverted cells from which tamoxifen was removed for 21 days (MET). Chromatin immunoprecipitation results are shown as the mean +/- SEM of n=4 biological replicates. The percentage of input was normalized by three control regions across the different conditions. *P <0.05, **P <0.01, ***P <0.001 in two-tailed unpaired Student's t-test relative to untreated cells (grey).
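The ChIP-qPCR panels report a percentage of input that is then "normalized by three control regions". A minimal sketch of one way to do this, under two stated assumptions: the Input Ct is corrected for the fraction of chromatin kept as Input (a value not given here, so it is a parameter), and each condition is scaled by the arithmetic mean of its three control regions (a geometric mean would be an equally defensible choice). Region names are placeholders.

```python
import math
import statistics


def percent_input(ct_ip, ct_input, input_fraction):
    """Percent of input recovered by a ChIP, correcting the Input Ct for the
    fraction of chromatin kept as Input (e.g. 0.01 for a 1% Input)."""
    adjusted_input_ct = ct_input - math.log2(1.0 / input_fraction)
    return 100.0 * 2.0 ** (adjusted_input_ct - ct_ip)


def normalise_to_controls(pct_by_region, control_regions):
    """Divide every region's % input by the mean % input of the control regions,
    so that conditions with different overall ChIP efficiency become comparable."""
    scale = statistics.mean([pct_by_region[r] for r in control_regions])
    return {region: value / scale for region, value in pct_by_region.items()}
```

Scaling within each condition removes differences in overall immunoprecipitation efficiency between samples, so that changes at CTNND1.e2 or FGFR2.IIIc can be compared across time points.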

Figure 2

Figure 2: Localised changes in H3K27me3 and H3K27ac drive alternative splicing. (A) Schematic representation of the CTNND1 gene locus and the alternatively spliced exon 2 (yellow) and exon 20 (grey), with the positions of the gRNAs used to target the different dCas9-fused proteins in an exon-specific manner. Nucleosome positioning in exon 2, according to an MNase-qPCR assay (data not shown), is represented. (B,C) Enrichment levels of H3K27ac (B) and H3K27me3 (C) at CTNND1 exon 2 (yellow) and control exon 20 (grey), determined by chromatin immunoprecipitation, in MCF10a-Snail-ER cells infected with dCas9 fused to the catalytic domain of an H3K27 epigenetic modifier (summary table on the right) in the presence of exon-specific gRNAs targeting exon 2 (mean +/- SEM, n=4). The percentage of input was normalized by three control regions across the different conditions. Mutated p300* and vSETx2* were used as negative controls together with empty dCas9. (D-F) Inclusion levels of CTNND1 exon 2 (D) and exon 20 (E), and total expression levels of CTNND1 (F), relative to total CTNND1 or TBP, in MCF10a-Snail-ER cells infected with dCas9 H3K27 epigenome editors and exon 2-specific gRNAs, determined by RT-qPCR (mean +/- SEM, n=4). (G) Schematic representation of the FGFR2 gene locus, as in (A), with gRNA positions and nucleosome positioning (data not shown) at the targeted exon IIIc (green). Exon IIIb is used as a control (grey). (H,I) H3K27ac and H3K27me3 enrichment levels at the gRNA-targeted exon IIIc and control exon IIIb in MCF10a-Snail-ER cells infected with the dCas9 H3K27 modifiers, determined by ChIP-qPCR as described in (B,C) (mean +/- SEM, n=4). (J-L) Inclusion levels of FGFR2 exon IIIc and control exon IIIb, and total expression levels of FGFR2, relative to total FGFR2 or TBP expression levels, determined by RT-qPCR as described in (D-F) (mean +/- SEM, n=4). *P <0.05, **P <0.01, ***P <0.001 in two-tailed unpaired Student's t-test relative to the empty dCas9 plasmid (empty).
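Inclusion levels "relative to total CTNND1" compare an exon-specific amplicon with an amplicon present in all isoforms. The sketch below makes the calculation explicit and, as an assumption beyond the legend, exposes the per-amplicon amplification efficiency as a parameter (2.0 meaning 100% efficiency, which reduces to the usual 2^-ΔCt); the function names are hypothetical.

```python
def isoform_ratio(ct_exon, ct_total, eff_exon=2.0, eff_total=2.0):
    """Exon-containing isoform relative to total transcript.

    Each amplicon's starting amount is taken as proportional to eff**-Ct,
    where eff is the per-cycle amplification factor (2.0 = 100% efficiency)."""
    return (eff_exon ** -ct_exon) / (eff_total ** -ct_total)


def fold_vs_control(sample, control, **efficiencies):
    """Inclusion ratio of a sample divided by that of its control (e.g. empty dCas9).
    `sample` and `control` are (ct_exon, ct_total) pairs."""
    return isoform_ratio(*sample, **efficiencies) / isoform_ratio(*control, **efficiencies)
```

Because the two amplicons differ, the ratio is only meaningful relative to a control condition such as empty dCas9, not as an absolute percent-spliced-in value.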

Figure 3

Figure 3: Chromatin-induced changes in splicing recapitulate the EMT. (A-D) Expression levels of epithelial (E-Cadherin, EPCAM) and mesenchymal (ECM1, MCAM) markers at the mRNA (A,B; RT-qPCR, mean +/- SEM, n=4) and protein (C,D; flow cytometry, mean +/- SEM, n=4) levels in MCF10a-Snail-ER cells infected with the splicing-changing dCas9-fused proteins and the corresponding exon-specific gRNAs targeting FGFR2 exon IIIc (g1) or CTNND1 exon 2 (g1). mRNA levels are normalized by TBP, and protein levels are quantified above the no-primary-antibody background signal (summary scheme on the left). (E,F) Functional EMT assays testing non-directional (E) and directional migration (F) in MCF10a-Snail-ER cells infected with exon 2- or exon IIIc-specific gRNAs and dCas9-fused proteins or their corresponding catalytic mutants. Scratch assays (E) were carried out on confluent cell monolayers to evaluate the % of gap remaining 24h after wounding (mean +/- SEM, n=3). Transwell assays (F) evaluate the number of cells migrating towards FGF-2 in 12h (mean +/- SEM, n=3). (G) MCF10a-Snail-ER cells infected with gRNAs targeting CTNND1 exon 2 and either dCas9-vSETx2 or mutant dCas9-vSETx2* were cell-sorted, using a splicing-specific antibody that detects only the mesenchymal CTNND1 protein variant (mCTNND1(ex2)), for directional migration and invasion transwell assays. Negative cells not expressing mCTNND1(ex2) and tamoxifen-induced T7 EMT cells were used as controls. The percentage of mCTNND1(ex2)-positive cells is shown on the right. (H) The number of sorted cells migrating, or invading through a Matrigel matrix, for 24h was normalized to untreated cells for comparison with tamoxifen-induced T7 EMT cells (mean +/- SEM, n=3). (I-L) Expression levels of epithelial (E-Cadherin, EPCAM) and mesenchymal (MCAM, ECM1) markers in cell-sorted MCF10a-Snail-ER cells. RT-qPCR levels were normalized by TBP expression levels (mean +/- SEM, n=3).
*P <0.05, **P <0.01, ***P <0.001 in two-tailed unpaired Student's t-test relative to the corresponding control (empty, dCas9-p300*, dCas9-vSETx2* or negative cells, all in grey).
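The two migration read-outs in this figure are simple ratios: the scratch assay reports the wound still open at 24 h as a percentage of the initial wound, and the sorted-cell transwell counts are expressed relative to untreated cells. A minimal sketch, assuming the gap is measured as a width or area in the same units at 0 h and 24 h; function names are hypothetical.

```python
def percent_gap_remaining(gap_0h, gap_24h):
    """Percent of the initial wound still open: 100 * gap(24 h) / gap(0 h)."""
    if gap_0h <= 0:
        raise ValueError("initial gap must be positive")
    return 100.0 * gap_24h / gap_0h


def normalise_counts(counts, untreated_counts):
    """Transwell counts expressed relative to the mean count of untreated cells."""
    reference = sum(untreated_counts) / len(untreated_counts)
    return [count / reference for count in counts]
```

A lower % of gap remaining indicates faster wound closure, that is, more migratory cells; a normalised transwell count above 1 indicates more migration or invasion than untreated cells.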

Figure 4

Figure 4: H3K27 marks regulate splicing by modulating the recruitment of specific splicing factors to the pre-mRNA. (A,B) Appearance over time of CTNND1 exon 2 pre-mRNA in synchronized untreated (grey) and tamoxifen-induced (orange) MCF10a-Snail-ER cells upon release from the transcriptional inhibitor DRB after 0.5 days (A, T0.5) or 7 days (B, T7) of EMT induction. RT-qPCR results are normalized by tRNA expression levels (mean +/- SEM, n=3). (C) Total RNA Polymerase II levels at CTNND1 exon 2 in untreated (grey) and tamoxifen-induced EMT T7 (orange) MCF10a-Snail-ER cells by ChIP-qPCR (mean +/- SEM, n=3). The percentage of input was normalized by three control regions across the different conditions and represented relative to untreated cells (grey). (D-F) Same as (A-C) for CTNND1 control exon 15 (mean +/- SEM, n=3). (G) RNA-binding motifs along CTNND1 exon 2 predicted by at least two of the four tools used (RBPDB, RBPMAP, SFMAP and Spliceaid; details in Methods). (H) Inclusion levels of CTNND1 exon 2 upon lentiviral shRNA knockdown of candidate splicing regulators in untreated (grey) and tamoxifen-induced MCF10a-Snail-ER (T7, orange) cells. RT-qPCR results are normalized by total CTNND1 expression levels (mean ± SEM, n=3). (I) Enrichment levels of PTB at CTNND1 exon 2 pre-mRNA in untreated (UNT) and tamoxifen-induced (T7) MCF10a-Snail-ER cells. Constitutively included CTNND1 exon 6 and excluded CTNND1 exon 20 were used as negative and positive controls of PTB binding, respectively. PTB levels were quantified in the indicated regions by qPCR and normalized by IgG and CTNND1 exon 7 control levels (mean ± SEM, n=5). (J) Enrichment levels of PTB at CTNND1 exon 2 and control exon 6 and exon 20 pre-mRNA in cell-sorted cells expressing (positive) or not (negative) the mesenchymal-specific splicing isoform mCTNND1(ex2), in MCF10a-Snail-ER cells infected with dCas9-vSETx2 or mutant dCas9-vSETx2* and the exon-specific gRNAs (g1) targeting CTNND1 exon 2.
PTB levels were quantified in the indicated regions by qPCR and normalized by IgG and CTNND1 exon 7 control levels (mean ± SEM, n=6). *P <0.05, **P <0.01, ***P <0.001 in two-tailed unpaired Student's t-test relative to control cells (untreated cells, scramble shRNA or negative cells).

Supp. Fig. 1

Supplementary Figure 1: Localised enrichment of specific histone marks at alternatively spliced exons during EMT. (A,B) CTNND1 expression and exon 20 inclusion levels relative to total TBP and CTNND1, respectively, during EMT and MET in induced MCF10a-Snail-ER cells by RT-qPCR (mean +/- SEM, n=4). (C,D) Enrichment levels of H3 and H3K36me3 along the CTNND1 locus in MCF10a-Snail-ER cells induced with tamoxifen for 24h, by ChIP-qPCR (mean +/- SEM, n=4). For H3K36me3 only, the percentage of input was normalized by three control regions across the different conditions. (E,F) FGFR2 expression and exon IIIb inclusion levels relative to total TBP and FGFR2, respectively, during EMT and MET in induced MCF10a-Snail-ER cells by RT-qPCR (mean +/- SEM, n=4). (G,H) Enrichment levels of total H3 and H3K36me3 along the FGFR2 locus in MCF10a-Snail-ER cells induced with tamoxifen for 24h, by ChIP-qPCR (mean +/- SEM, n=4). For H3K36me3 only, the percentage of input was normalized by three control regions across the different conditions. (I,K,M,O) Enrichment levels of H3K27me3, H3K27ac and H3K4me1 along the SLK (I), TCF7L2 (K), SCRIB (M) and ENAH (O) loci in tamoxifen-induced MCF10a-Snail-ER cells by ChIP-qPCR (mean +/- SEM, n=4). The percentage of input was normalized by three control regions across the different conditions. (J,L,N,P) Inclusion levels of alternatively spliced exons essential for the EMT: SLK exon 13 (J), TCF7L2 exon 4 (L), SCRIB exon 16 (N) and ENAH exon 11 (P) in MCF10a-Snail-ER cells during induction of EMT. RT-qPCR values were normalized by total expression levels of SLK, TCF7L2, SCRIB and ENAH, respectively (mean +/- SEM, n=4). *P <0.05, **P <0.01, ***P <0.001 in two-tailed unpaired Student's t-test relative to untreated cells (grey).

Supp. Fig. 2

Supplementary Figure 2: Exon-specific epigenome editing of H3K27 marks is sufficient to induce a change in splicing. (A) Schematic of the vSETx2 construct with the optimal linker sequence between the two monomers. (B) Representative histone methyltransferase assay showing the specificity of native vSET and vSETx2 activity for the H3K27 residue on wild-type and H3K27A-mutated recombinant chromatin templates. (C) Schematic representation of the CTNND1 gene locus and the alternatively spliced exon 2 (yellow) and control exon 20 (grey), with the positions of the gRNAs used to target the different dCas9-fused proteins in an exon-specific manner. (D,E,H,I) Enrichment levels of H3K27ac (D,H) and H3K27me3 (E,I) at CTNND1 exon 2 (yellow) and control exon 20 (grey) in MCF10a-Snail-ER cells infected with dCas9-fused proteins and two different combinations of exon-specific gRNAs targeting exon 2 (g2) or exon 20 (g1) (mean +/- SEM, n=3). The percentage of input was normalized by three control regions across the different conditions. (F,G,J,K) Expression levels of CTNND1 exon 2 (F,J), total CTNND1 (G) and exon 20 (K) in untreated MCF10a-Snail-ER cells infected with dCas9-fused proteins and the exon-specific gRNAs targeting exon 2 (g2) or exon 20 (g1). RT-qPCR values were normalized by total CTNND1 or TBP as indicated in the graph (mean +/- SEM, n=4). (L-T) Same as (C-K) for the FGFR2 gene locus, with (L) a schematic representation of the FGFR2 locus showing gRNA positions at the alternatively spliced exon IIIc (green) and control exon IIIb (grey), and (M-T) H3K27ac, H3K27me3 and expression levels as represented in (D-K) (mean +/- SEM, n=4). (U,V) Enrichment levels of H3K4me1, H3K9ac and H3K9me2 at FGFR2 exon IIIc (green, U) or CTNND1 exon 2 (yellow, V) in untreated MCF10a-Snail-ER cells infected with dCas9-fused proteins and exon-specific gRNAs targeting exon IIIc (g1, U) or exon 2 (g1, V), by ChIP-qPCR (mean +/- SEM, n=3).
The percentage of input was normalized by three control regions across the different conditions. *P <0.05, **P <0.01, ***P <0.001 in two-tailed unpaired Student's t-test compared to the empty dCas9 control.

Supp. Fig. 3

Supplementary Figure 3: Direct effect of dCas9 epigenome editing on EMT. (A) Expression levels of the splicing factors most important for EMT (PTB, ESRP1, ESRP2, RbFOX2, MBNL1 and RBM47) in MCF10a-Snail-ER cells infected with different dCas9-fused proteins and exon-specific gRNAs targeting FGFR2 exon IIIc (g1, dark grey) or CTNND1 exon 2 (g1, light grey). RT-qPCR levels were normalized by TBP (mean +/- SEM, n=3). (B) MCF10a-Snail-ER cells infected with dCas9-vSETx2 or mutant dCas9-vSETx2* and two different combinations of exon-specific gRNAs (g1 and g2) targeting CTNND1 exon 2 were cell-sorted, using a splicing-specific antibody, according to expression of the mesenchymal CTNND1 protein variant that includes exon 2 (mCTNND1(ex2)). Negative cells not expressing mCTNND1(ex2) and tamoxifen-induced T7 EMT cells were used as controls. The percentage of mCTNND1(ex2)-positive cells per condition is shown. (C) CTNND1 exon 2 inclusion levels in cells expressing (positive) or not (negative) the splicing variant mCTNND1(ex2) in the conditions described in (B). RT-qPCR levels were normalized by total CTNND1 expression levels (mean +/- SEM, n=3). (D-G) Expression levels of epithelial (E-Cadherin, EPCAM) and mesenchymal (MCAM, ECM1) markers in cell-sorted MCF10a-Snail-ER cells infected with dCas9-vSETx2 or the mutant dCas9-vSETx2* and the second combination (g2) of gRNAs targeting CTNND1 exon 2. RT-qPCR levels were normalized by TBP expression levels (mean +/- SEM, n=3). *P <0.05, **P <0.01, ***P <0.001 in two-tailed unpaired Student's t-test relative to negative cells.

Supp. Fig. 4

Supplementary Figure 4: H3K27 marks regulate splicing by modulating the recruitment of RNA-binding proteins such as PTB. (A,B) CTNND1 expression and exon 2 inclusion levels normalized by total TBP or CTNND1 expression levels, respectively, in untreated (grey) and tamoxifen-induced EMT T7 (orange) MCF10a-Snail-ER cells treated with DMSO (control), 1 µg/mL TSA (HDAC inhibitor) or 40 µM DRB (RNA Polymerase II inhibitor) for 24h. RT-qPCR results are shown as the mean +/- SEM of n=4 biological replicates. (C) Total expression levels of the candidate splicing factors involved in CTNND1 exon 2 regulation upon shRNA knockdown in untreated (UNT, grey) and tamoxifen-induced (T7, orange) MCF10a-Snail-ER cells. RT-qPCR levels are shown relative to cells infected with scramble shRNA (mean +/- SEM, n=3). (D) PTB enrichment levels at the negative control CD44 v10 and the positive control PKM2 intron 8 in untreated (UNT, grey) and tamoxifen-induced (T7, orange) MCF10a-Snail-ER cells. PTB levels were quantified in the indicated regions by qPCR and normalized by IgG and CTNND1 exon 7 control levels as in Figure 4 (mean +/- SEM, n=5). (E) PTB enrichment levels at the same positive and negative control regions as in (D) in cell-sorted MCF10a-Snail-ER cells expressing (positive) or not (negative) the mesenchymal-specific splicing isoform mCTNND1(ex2) upon infection with dCas9-vSETx2 or mutant dCas9-vSETx2* and the exon-specific gRNAs (g1) targeting CTNND1 exon 2. PTB levels were quantified in the indicated regions by qPCR and normalized by IgG and CTNND1 exon 7 control levels (mean +/- SEM, n=6). *P <0.05, **P <0.01, ***P <0.001 in two-tailed unpaired Student's t-test relative to controls (untreated cells, scramble shRNA or negative control CD44 v10).

DISCUSSION & PERSPECTIVES

Using the EMT as a dynamic cell-reprogramming system, I identified two histone marks, H3K27ac and H3K27me3, whose changes closely correlated in time with changes in alternative splicing. Moreover, these marks were actual drivers of the changes in alternative splicing necessary for the EMT, offering new insights into the impact of epigenetics on the regulation of gene expression and cell identity.

Here I will discuss some aspects of this newly identified H3K27-mediated regulation of splicing.

1 Regulatory pathways responsible for H3K27 marking at alternatively spliced exons

The upstream regulation of the coupling between histone modifications and alternative splicing is still not well understood. Based on previous results in the lab, I propose that H3K27 writers could be recruited in two different ways: by lncRNAs (Gonzalez et al., 2015) (Figure 1) and by chromatin and/or transcription factors.

Figure 1: Role of antisense FGFR2 in alternative splicing regulation. Schematic representation of how expression of the antisense FGFR2 transcript induces recruitment of the PRC2 complex, leading to increased recruitment of KDM2a (H3K36 demethylase), which inhibits recruitment of the MRG15-PTB splicing complex. The downstream effect is increased exon IIIb inclusion (adapted from Gonzalez et al., 2015).

Our lab has previously shown that expression of an antisense lncRNA, transcribed from within the alternatively spliced FGFR2 gene, was sufficient to induce changes in H3K36me3 and H3K27me3 levels that led to inclusion of the epithelial isoform (Gonzalez et al., 2015). This lncRNA, called asFGFR2, was shown to interact with FGFR2 chromatin and to recruit the PRC2 H3K27 methyltransferase EZH2 and the H3K36me3 demethylase KDM2a. In the absence of KDM2a, asFGFR2 had no effect on FGFR2 splicing, supporting a chromatin-dependent role of this lncRNA in splicing regulation (Gonzalez et al., 2015). Moreover, it has been suggested that one mechanism of PRC2 recruitment to chromatin is via lncRNAs acting as recruitment scaffolds (Davidovich and Cech, 2015). I thus hypothesize that additional lncRNAs may regulate EMT splicing through changes in H3K27me3/ac levels.

Chromatin factors could also be involved in recruiting histone-mark writers. ATAC-seq (Assay for Transposase-Accessible Chromatin using sequencing, a widely used protocol for genome-wide identification of open, accessible chromatin) data combined with DNA motif search analysis highlighted the role of TP63 in epithelial cells. TP63 is a p53-related transcription factor that preferentially binds regulatory enhancers to maintain an epithelial-specific transcriptional program. Inhibition of this transcriptional regulator can induce the EMT and has been implicated in tumor malignancy (Lindsay et al., 2011; Somerville et al., 2018). When looking at alternatively spliced exons with differential H3K27ac and ATAC-seq levels, we found many with evidence of TP63 binding in ChIP-seq data from keratinocytes (Bao et al., 2015), the highly specialized epithelial cells of the skin (Figure 2B), suggesting a role for TP63 in binding some of these genes and regulating their H3K27me3/ac levels. Interestingly, TP63 has been shown by mass spectrometry to interact with splicing regulators such as SRSF2 (Huang et al., 2012). DNA motif search analysis combined with evidence of actual binding, using publicly available ChIP-seq datasets from hundreds of DNA-binding regulators, could point to the DNA-bound proteins responsible for the observed changes in H3K27 marks.

2 Splicing-associated chromatin signatures

Recent studies have shown that alternatively spliced exons sharing cellular functions or regulatory pathways are co-enriched with specific histone marks.

This is the case, for instance, for H3K36me3, H3K27ac and H4K8ac, which are enriched in stem cell-related genes involved in DNA damage and the cell cycle (Xu et al., 2018).

Using ChIP-seq data generated in the lab during the EMT, I found that 16% of all EMT-regulated alternatively spliced exons also showed a more than 2-fold change in H3K27ac levels precisely at the exon of interest, and not at downstream exons of the same gene (Figure 2A,B). Gene Ontology analysis of these H3K27ac-marked genes identified lamellipodium, extracellular matrix organization and focal adhesion as enriched biological terms, all of which are hallmarks of the EMT, and 22% of these H3K27ac-marked exons were associated with poor prognosis in breast cancer (Figure 2C). Together, these results support a more global role for this mark in the regulation of EMT- and cancer-related alternative splicing events.
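The exon-specific criterion used here (a more than 2-fold change in H3K27ac at the alternatively spliced exon, but not at downstream exons of the same gene) can be written as a small filter. The sketch below assumes that T7/T0 H3K27ac fold changes have already been computed per exon and that a change in either direction counts; the data structure and exon identifiers are hypothetical and the toy values are not real data.

```python
import math


def is_exon_specific(target_fc, downstream_fcs, min_fold=2.0):
    """True if H3K27ac changes by more than `min_fold` (up or down) at the exon
    of interest but at none of the downstream exons of the same gene."""
    threshold = math.log2(min_fold)

    def changed(fc):
        return abs(math.log2(fc)) > threshold

    return changed(target_fc) and not any(changed(fc) for fc in downstream_fcs)


def fraction_marked(exons, min_fold=2.0):
    """exons: {exon_id: (fold_change_at_exon, [fold changes at downstream exons])}.
    Returns the percentage of exons passing the filter and their sorted ids."""
    hits = [e for e, (fc, down) in exons.items() if is_exon_specific(fc, down, min_fold)]
    return 100.0 * len(hits) / len(exons), sorted(hits)
```

Working on log2 fold changes treats a 2-fold gain and a 2-fold loss symmetrically, which matches the idea of exons "changing" H3K27ac levels regardless of direction.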

Figure 2: Impact of H3K27ac levels on splicing and breast cancer. (A) Scatter plot correlating changes in H3K27ac ChIP-seq levels with changes in RNA-seq exon inclusion levels genome-wide during EMT. (B) UCSC Genome Browser screenshot of T0/T7 RNA-seq, H3K27ac ChIP-seq, ATAC-seq and TP63 ChIP-seq at CTNND1 exon 2 and flanking regions. (C) 5-year disease-specific survival curves of 110 TCGA basal breast cancer patients excluding (blue) or including (red) CTNND1 exon 2. Inclusion of the EMT isoform strongly associates with poor prognosis.

Even though I identified, at the genome-wide level, a correlation between H3K27ac enrichment at exons and alternative splicing changes of the same exons, I could not find a clear association between the level of H3K27ac and the level of exon inclusion. These results suggest that, even if these exons are regulated by the same histone mark, this regulation could involve different splicing regulators, or the same factor could be recruited to different binding sites depending on a combination of several histone modifications, with different downstream effects on splicing. It is already known that the same histone mark can differentially impact splicing by recruiting different splicing factors. For instance, H3K36me3 can recruit the splicing activator SRSF1 (Pradeepa et al., 2012) or the splicing repressor PTB (Luco et al., 2010) at different alternatively spliced exons (Figure 3).

Although the link between H3K27ac and splicing is still poorly understood, it is interesting to note that the Polycomb Repressive Complex 2 (PRC2), the chromatin complex responsible for deposition of H3K27me3 (the other mark I identified as impacting splicing), has RNA-binding activity and is enriched at exon-intron boundaries within RNAs, overlapping with the binding of RNA-binding proteins such as FUS and HNRNPC (Beltran et al., 2016). The splicing factor RBFOX2 has also been shown to recruit PRC2 to promoters through an interaction with the nascent RNA (Wei et al., 2016). We could therefore propose that PTB, via the H3K27ac mark, and RBFOX2, FUS or HNRNPC, via H3K27me3, are involved in complementary mechanisms regulating alternative splicing.

Figure 3: Mechanisms through which H3K36me3 affects alternative splicing. (A) The adaptor protein MRG15 binds to H3K36me3, leading to the recruitment of the splicing repressor PTB, thereby promoting exon skipping. (B) Psip1, another adaptor binding to H3K36me3, favors exclusion of the alternative exon by recruiting the splicing factor SRSF1 (Zhou et al., 2014).

Unpublished data from the lab support the idea that different combinations of histone modifications can predict alternative splicing changes and that similarly marked exons are involved in the same biological pathways (Agirre et al., in review, Nature Communications, Annexe 1).


Indeed, using a machine learning approach applied to epigenomic and transcriptomic datasets, we identified eleven chromatin modifications that differentially mark alternatively spliced exons depending on their inclusion level. Beyond the presence or absence of histone modifications, their position along the exon also matters, and the same histone mark can be found on both included and excluded exons, which increases the complexity of chromatin-associated splicing. These modifications act in a combinatorial and position-dependent way, creating splicing-associated chromatin signatures (SACS) on 34% of the alternatively spliced exons analyzed (Figure 4).
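To make "combinatorial and position-dependent" concrete, the sketch below represents a signature as two co-enriched marks in a given exon region, as described for Figure 4, and counts the fraction of exons carrying at least one signature. It does not reproduce the machine learning approach used to discover the SACS; the mark names (markA, markB), the region bins and the enrichment threshold are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Sequence, Tuple

# Enrichment of each (histone mark, exon region) pair for one exon, e.g. a
# ChIP-seq fold enrichment over input. Region names are hypothetical bins.
ExonProfile = Dict[Tuple[str, str], float]


@dataclass(frozen=True)
class Signature:
    """A combinatorial, position-dependent chromatin signature (illustrative)."""
    mark_a: str
    mark_b: str
    region: str  # e.g. "upstream", "exon" or "downstream"


def carries(profile: ExonProfile, sig: Signature, threshold: float = 1.0) -> bool:
    """True if both marks are enriched above `threshold` in the same region."""
    a = profile.get((sig.mark_a, sig.region), 0.0)
    b = profile.get((sig.mark_b, sig.region), 0.0)
    return a > threshold and b > threshold


def marked_fraction(
    profiles: Sequence[ExonProfile],
    signatures: Iterable[Signature],
    threshold: float = 1.0,
) -> float:
    """Fraction of exons that carry at least one of the given signatures."""
    sigs = list(signatures)
    if not profiles:
        return 0.0
    n = sum(1 for p in profiles if any(carries(p, s, threshold) for s in sigs))
    return n / len(profiles)
```

Because the region is part of the signature, the same pair of marks enriched upstream of the exon rather than on it counts as a different signature, mirroring the position dependence described in the text.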

Figure 4: Splicing-associated chromatin signatures (SACS). Schematic representation of the seven combinations of chromatin modifications (SACS) that differentially mark alternatively spliced exons. For each SACS, we specify the splicing group it is related to, the two co-enriched histone marks, the position of enrichment along the exon (represented by a peak) and the total number (n) of exons marked by the chromatin signature (in brackets, the percentage of chromatin-marked exons relative to the total number of exons analyzed per group) (adapted from Agirre et al., in review, Nature Communications, Annexe 1).


Importantly, each SACS marks exons with particular genomic features, such as exon length, weaker splice sites, nucleosome positioning that impacts RNA polymerase II elongation rate, and the presence of particular RNA-binding motifs, which point to specific regulatory pathways mediated by each combination of chromatin modifications. In line with this, some of these splicing-associated chromatin marks also correlated with recruitment of the splicing regulators predicted by the RNA motif search, further supporting a functional link between these chromatin signatures (SACS) and the splicing machinery. Moreover, the association between some SACS and alternative splicing patterns was conserved across cell types and species, such as mouse, further supporting a functional role of histone marks in coordinating some splicing patterns.

Finally, Gene Ontology analysis revealed that each group of alternatively spliced genes marked by a specific SACS was enriched in distinctive biological processes not found in the other groups, suggesting that chromatin might differentially mark exons sharing common functional and/or regulatory pathways.
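Gene Ontology enrichment of a gene group is commonly assessed with a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test), which asks whether more group members carry a given GO term than expected from the background. The thesis does not state which tool was used, so the function below is only an illustrative sketch; in a real analysis the P values across all tested terms would also be corrected for multiple testing.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided P(X >= k) for X ~ Hypergeometric(N, K, n).

    N -- genes in the background (e.g. all alternatively spliced genes analyzed)
    K -- background genes annotated with the GO term
    n -- genes in the group marked by one SACS
    k -- genes in that group annotated with the GO term
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    total = comb(N, n)
    upper = min(n, K)
    # Sum the probabilities of observing k, k + 1, ..., upper annotated genes.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total
```

Using exact integer binomial coefficients keeps the sum free of rounding error until the final division.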

Altogether, these data support a genome-wide impact of histone marks on alternative splicing in a combinatorial and position-dependent manner. These epigenetically marked exons share unique genomic features and regulatory functions that support a coordinated role of chromatin modifications in regulating spliced exons involved in similar functions (or regulated by the same factor).

To identify the protein regulators specifically recruited to these H3K27-dependent exons, I could use a pull-down approach coupled to proteomics to pinpoint the true splicing regulators and identify all the chromatin and splicing regulators differentially recruited to exons of interest during EMT.

3 Not all histone marks are drivers of changes in splicing: H3K4me1, a late change that could link AS with RNAP II speed

In addition to the epigenetic marks H3K27ac and H3K27me3, which are locally enriched on alternatively spliced exons and involved in early changes in alternative splicing, I found that an enrichment of H3K4me1 is associated with late changes in splicing, and more precisely with exon skipping. Moreover, in this model, late changes in alternative splicing coincide with changes in transcriptional elongation rate and RNA polymerase II recruitment specifically at the alternatively spliced exons enriched in H3K4me1. More precisely, I found that an increase in H3K4me1, associated with increased exon skipping, correlates with a local increase in transcriptional elongation rate (Figure 5).

The link between H3K4me1 and RNAP II speed is still not well understood, but a recent study has shown that elongation rates correlate positively with the active transcription mark H3K4me1 and negatively with exon density (Jonkers et al., 2014). The authors showed that, genome-wide, RNA polymerase II slows down at exons and that this local decrease in elongation rate is associated with a differential enrichment of several features, such as H3K4me1, that synergistically regulate RNAP II kinetics, thus facilitating splicing.
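Elongation rates such as those discussed above are often estimated by following the front of a transcription wave at several time points (for instance after releasing a reversible elongation block) and fitting a straight line to the front position against time. The sketch below assumes that setup; it is not the exact pipeline of Jonkers et al. (2014), and the function name and units (kilobases, minutes) are illustrative assumptions.

```python
from typing import Sequence


def elongation_rate_kb_per_min(
    times_min: Sequence[float], front_kb: Sequence[float]
) -> float:
    """Least-squares slope of wave-front position (kb) against time (min)."""
    n = len(times_min)
    if n != len(front_kb) or n < 2:
        raise ValueError("need at least two paired measurements")
    mean_t = sum(times_min) / n
    mean_f = sum(front_kb) / n
    sxx = sum((t - mean_t) ** 2 for t in times_min)
    if sxx == 0:
        raise ValueError("time points must differ")
    sxy = sum((t - mean_t) * (f - mean_f) for t, f in zip(times_min, front_kb))
    # Ordinary least-squares slope = cov(t, f) / var(t).
    return sxy / sxx
```

A local comparison, as in the exon-level analysis described in the text, would apply the same fit to wave-front positions restricted to the exon of interest and to its flanking introns.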

Figure 5: Kinetics of RNA Polymerase II and histone modifications. Epigenetic marks are highly dynamic during EMT: some, such as the H3K27 marks, precede changes in splicing, while the H3K4me1 mark is associated with late changes in splicing and correlates with local changes in transcriptional elongation rate.

Two approaches could be considered to decipher the interplay between RNAP II elongation rate and the histone modification H3K4me1, and to determine which of these two parameters influences the other. First, I could use the previously developed CRISPR/dCas9 system to modify the level of H3K4me1 specifically at the studied alternatively spliced exons and observe the effects. dCas9 can be fused to the histone methyltransferase MLL4 (dCas9-MLL4) or to the histone demethylase LSD1 (dCas9-LSD1) to increase or decrease methylation of histone H3 lysine 4 (H3K4), respectively. Combined with measurements of the transcriptional elongation rate at the same exons, these exon-specific modifications would indicate whether the H3K4me1 mark is responsible for the local variations in RNAP II speed. Conversely, the reverse relationship could be studied by using drugs that activate or inhibit transcriptional elongation and examining the impact on alternative splicing and on H3K4me1 enrichment at the alternatively spliced exon.
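Exon-specific recruitment of dCas9-MLL4 or dCas9-LSD1 depends on guide RNAs that target the exon of interest. Assuming an S. pyogenes-derived dCas9 (the text does not specify the variant), each target is a 20-nt protospacer immediately 5′ of an NGG PAM on either strand. The hypothetical helper below lists such sites in an exon sequence; real guide design would also score off-targets, GC content and distance to the exon, which this sketch ignores.

```python
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_spcas9_protospacers(region: str, length: int = 20) -> List[Tuple[int, str, str]]:
    """List (start, strand, protospacer) for every NGG PAM in `region`.

    `start` is the 0-based start of the protospacer footprint in `region`
    (+ strand coordinates); the protospacer is written 5'->3' on its strand.
    """
    region = region.upper()
    hits = []
    # + strand: protospacer occupies [i - length, i), PAM "NGG" at [i, i + 3).
    for i in range(length, len(region) - 2):
        if region[i + 1:i + 3] == "GG":
            hits.append((i - length, "+", region[i - length:i]))
    # - strand: "CCN" at [i, i + 3) on the + strand is "NGG" on the - strand,
    # and the protospacer footprint follows it at [i + 3, i + 3 + length).
    for i in range(0, len(region) - 3 - length + 1):
        if region[i:i + 2] == "CC":
            footprint = region[i + 3:i + 3 + length]
            hits.append((i + 3, "-", revcomp(footprint)))
    return hits
```

Scanning both strands matters because the exon may offer few NGG sites on the coding strand alone.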

The fact that H3K4me1 changes only once the splicing changes are already established, like the changes in RNA polymerase II elongation rate, together with the previously described positive coupling between H3K4me1 and RNA polymerase II kinetics (Jonkers et al., 2014), leads me to suggest that H3K4me1 marking at alternatively spliced exons could be a consequence of the splicing changes initiated by the H3K27me3 and H3K27ac marks. H3K4me1 would then help maintain or reinforce the newly established splicing events by impacting RNA polymerase II elongation rate, which is also known to regulate alternative splicing in a histone mark-dependent or -independent manner (Dujardin et al., 2014; de la Mata et al., 2003; Saint-André et al., 2011).

4 CRISPR/dCas9 as a potential therapeutic tool to impair EMT

Until now, studies of the coupling between chromatin and alternative splicing have been mostly correlative, and it has never been properly demonstrated whether chromatin is sufficient to induce changes in splicing. Many treatments have been used to globally alter histone modifications and observe downstream effects on splicing. One example is the use of drugs inhibiting enzymes that deposit or remove histone marks, such as TSA (Trichostatin A), an HDAC inhibitor that induces a genome-wide increase in histone acetylation responsible for changes in splicing (Schor et al., 2009). A second approach found in the literature is the over-expression or down-regulation of chromatin enzymes. The histone methyltransferases SET2 and ASH2, involved in H3K36me3 and H3K4me3 deposition, respectively, have been over-expressed and associated with changes in FGFR2 alternative splicing (Luco et al., 2010), while depletion of the heterochromatin protein HP1γ favors skipping of CD44 alternative exons (Saint-André et al., 2011).


Both approaches have a major drawback: they can affect thousands of other loci in the genome. These off-target effects could introduce other correlations that explain the observed results, or even produce effects unrelated to chromatin, leading to potential over-interpretation of the links between chromatin and alternative splicing. As a result, whether modifying the chromatin environment affects alternative splicing directly or indirectly can only be speculated upon (Figure 6).

Figure 6: Different strategies used to modulate chromatin modifications. Drugs and gene deregulation have been used for years to correlate changes in chromatin with changes in splicing, introducing substantial non-specific effects. Adaptation of the CRISPR/dCas9 system is an emerging powerful tool to elucidate the real interplay between these two processes.

These issues were overcome by using the CRISPR/dCas9 system to modulate different histone marks specifically at a given exon and test their effects on splicing and EMT induction. This tool allowed me to demonstrate, for the first time, a causal effect of specific histone modifications on alternative splicing with minimal pleiotropic effects, which could make this system a powerful tool for therapeutic applications.

Changes in histone modifications that influence alternative splicing are very dynamic, start early after EMT induction and are completely reversible, highlighting the plasticity and adaptability of the splicing machinery to new stimuli via changes in chromatin modifications. Such rapid adaptability was already known in response to external stimuli: in plants and flies, temperature and light signals are integrated via changes in chromatin and alternative splicing (Martin Anduaga et al., 2019; Petrillo et al., 2014). In this study, I propose that these two regulatory layers could also be interconnected in mammalian cells, allowing a more efficient and rapid response to external stimuli, which makes them interesting targets for anti-metastatic treatments.

A growing body of evidence suggests a central role of EMT in metastasis and cancer recurrence. This clinical relevance, combined with increasing evidence for the importance of alternative splicing in EMT, may lead to novel therapeutic strategies. I could therefore test the effect of locally modulating the two previously identified histone marks, H3K27me3 and H3K27ac, with the newly established dCas9 system during EMT induction. The aim would be to delay, impair or even reverse EMT by inhibiting key changes in alternative splicing through modulation of the chromatin (Figure 7).

Figure 7: Strategy to impair EMT via dCas9-associated AS changes. dCas9 fused to H3K27 modifiers is recruited to an EMT-relevant alternatively spliced exon to enforce expression of the epithelial isoform and inhibit the mesenchymal isoform, thereby delaying, impairing or reversing the EMT process.


Understanding the EMT process at the molecular level is of paramount importance for developing treatments against cancer and metastasis, and I have now demonstrated the causal role of the H3K27me3 and H3K27ac marks in the splicing regulation of key transcripts essential for EMT induction, which could be interesting targets for potential therapies. However, EMT is not a binary process; it proceeds through distinct intermediate states that are not yet fully characterized. Moreover, most of the changes in exon inclusion levels responsible for EMT are not unimodal, going from 100% included to 100% excluded, but multimodal, with exons being intermediately included (Pradella et al., 2017). This means that either not all cells change inclusion levels at the same time during EMT, or there is transcript variability within the same cell. Profiling changes in H3K27ac, H3K27me3 and alternative splicing in individual cells at different time points during EMT would address the splicing complexity and dynamics associated with EMT, and would reveal which exons depend on this newly identified chromatin regulatory layer, guiding the search for the chromatin regulators involved.
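The two scenarios above, all cells at an intermediate inclusion level versus a mixture of cells at opposite extremes, give the same bulk inclusion level but different per-cell distributions. The sketch below summarizes per-cell PSI values with two hypothetical cut-offs (0.2 and 0.8); single-cell PSI estimates from few reads are noisy, so in practice such cut-offs would need to take coverage into account.

```python
from statistics import mean
from typing import Dict, Sequence


def psi_modality(
    cell_psi: Sequence[float], low: float = 0.2, high: float = 0.8
) -> Dict[str, float]:
    """Summarize per-cell PSI values (fractions between 0 and 1).

    Returns the bulk (mean) PSI and the fraction of cells that mostly exclude
    the exon (PSI <= low), mostly include it (PSI >= high), or are in between.
    """
    if not cell_psi:
        raise ValueError("need at least one cell")
    n = len(cell_psi)
    excluded = sum(p <= low for p in cell_psi) / n
    included = sum(p >= high for p in cell_psi) / n
    return {
        "bulk_psi": mean(cell_psi),
        "excluded": excluded,
        "included": included,
        "intermediate": 1.0 - excluded - included,
    }
```

A population with a large "intermediate" fraction would point to transcript variability within cells, whereas large "excluded" and "included" fractions would point to cells switching asynchronously.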


CONCLUSION


Alternative splicing is a key mechanism for increasing protein diversity and modulating transcript levels, simply by including or excluding intronic and exonic sequences in a highly regulated manner. It has long been suggested that, despite being an RNA-level process, alternative splicing can also be influenced by chromatin, which modulates the recruitment of the splicing machinery to the pre-mRNA. However, whether histone marks causally affect splicing, and whether this effect is sufficient to trigger a biological response, is not yet clear. In this work, I took advantage of a well-established cell reprogramming system, the epithelial-to-mesenchymal transition (EMT), in which splicing plays a major role, to test the actual role of histone marks in EMT and splicing. Normal human epithelial MCF10a cells stably expressing the EMT inducer Snail fused to the estrogen receptor (ER) undergo EMT within 7 days of tamoxifen treatment. This system is highly dynamic: the first changes in splicing are observed only 12 h after treatment, and they can be reversed simply by removing tamoxifen (MET). I first correlated, over time, changes in histone modifications with changes in alternative splicing at exons essential for EMT. Then, taking advantage of innovative CRISPR/dCas9 editing tools, I specifically modified one histone mark at a specific exon to test its effect on splicing and EMT. This work is the first to properly address whether histone modifications are sufficient to regulate splicing in a biologically meaningful manner, and it demonstrates the highly dynamic role of chromatin.

During this work, I demonstrated for the first time that alternatively spliced exons important for EMT are differentially marked by specific histone modifications. These epigenetic marks are highly dynamic during EMT/MET: some, such as H3K27me3 and H3K27ac, precede the changes in splicing, while the H3K4me1 mark is associated with late changes in splicing. Decreasing the levels of the epithelial-specific H3K27 marks, or increasing the mesenchymal ones, precisely at these alternatively spliced exons is sufficient to shift the splicing pattern towards the mesenchymal isoform, which can partially induce EMT. Overall, I have shown that chromatin is sufficient to change EMT-specific splicing patterns, which can impact cell identity.
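Whether a chromatin change precedes or follows a splicing change, as described above for the H3K27 marks and H3K4me1, can be formalized as a comparison of onset times along the EMT time course. A minimal sketch, assuming each feature is summarized by one value per time point and using hypothetical per-feature thresholds; onset resolution is limited to the sampled time points, and replicate variability is ignored.

```python
from typing import Optional, Sequence


def onset_time(
    times_h: Sequence[float], values: Sequence[float], threshold: float
) -> Optional[float]:
    """First time point at which |value - value at t0| reaches `threshold`.

    Returns None if the change never reaches the threshold.
    """
    if len(times_h) != len(values) or not values:
        raise ValueError("need paired, non-empty time series")
    baseline = values[0]
    for t, v in zip(times_h, values):
        if abs(v - baseline) >= threshold:
            return t
    return None


def relative_timing(mark_onset: Optional[float], splicing_onset: Optional[float]) -> str:
    """Classify a chromatin change as preceding, following or coinciding with splicing."""
    if mark_onset is None or splicing_onset is None:
        return "undetermined"
    if mark_onset < splicing_onset:
        return "precedes"
    if mark_onset > splicing_onset:
        return "follows"
    return "concurrent"
```

Both functions take the time series as given; with sparse sampling, two changes falling between the same pair of time points would be reported as "concurrent".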

To go beyond these results, I investigated the mechanism bridging local chromatin changes and alternative splicing outcome during EMT. I showed that changes in RNA polymerase II elongation rate do not play a role in establishing the new CTNND1 splicing variant during EMT but may instead be a consequence of the new cell-specific splicing pattern, and that some alternatively spliced exons share common regulators, such as the splicing factor PTB (Figure 8).

Further studies will better elucidate the genome-wide impact of these histone marks on alternative splicing regulation and EMT progression, with the final aim of impairing tumor metastasis by targeting these chromatin-dependent splicing events.

Figure 8: Model linking chromatin marks with AS outcome during EMT. In epithelial cells, CTNND1 exon 2 is excluded and marked by H3K27ac and H3K4me1. During EMT induction, and before the first changes in splicing are observed, H3K27ac disappears and H3K27me3 becomes enriched on the spliced exon. These changes lead to eviction of the splicing factor PTB from exon 2, resulting in increased exon 2 inclusion. Once these changes are well established, H3K4me1 disappears together with a local decrease in RNA Polymerase II elongation rate, which could contribute to reinforcing or maintaining the new splicing state.


BIBLIOGRAPHY


Ajiro, M., and Zheng, Z.-M. (2014). Oncogenes and RNA splicing of human tumor viruses. Emerg. Microbes Infect. 3, e63.

Allfrey, V.G., and Mirsky, A.E. (1964). Structural Modifications of Histones and their Possible Role in the Regulation of RNA Synthesis. Science 144, 559.

Amara, S.G., Jonas, V., Rosenfeld, M.G., Ong, E.S., and Evans, R.M. (1982). Alternative RNA processing in calcitonin gene expression generates mRNAs encoding different polypeptide products. Nature 298, 240–244.

Anczuków, O., and Krainer, A.R. (2016). Splicing-factor alterations in cancers. RNA 22, 1285–1301.

Andersson, R., Enroth, S., Rada-Iglesias, A., Wadelius, C., and Komorowski, J. (2009). Nucleosomes are well positioned in exons and carry characteristic histone modifications. Genome Res. 19, 1732–1741.

Angadi, P., and Kale, A. (2015). Epithelial-mesenchymal transition - A fundamental mechanism in cancer progression: An overview. Indian J. Health Sci. 8, 77.

Anna, A., and Monika, G. (2018). Splicing mutations in human genetic disorders: examples, detection, and confirmation. J. Appl. Genet. 59, 253–268.

Ansieau, S., Bastid, J., Doreau, A., Morel, A.-P., Bouchet, B.P., Thomas, C., Fauvet, F., Puisieux, I., Doglioni, C., Piccinin, S., et al. (2008). Induction of EMT by Twist Proteins as a Collateral Effect of Tumor-Promoting Inactivation of Premature Senescence. Cancer Cell 14, 79–89.

Arents, G., Burlingame, R.W., Wang, B.C., Love, W.E., and Moudrianakis, E.N. (1991). The nucleosomal core histone octamer at 3.1 Å resolution: a tripartite protein assembly and a left-handed superhelix. Proc. Natl. Acad. Sci. U. S. A. 88, 10148–10152.

van Attikum, H., Fritsch, O., and Gasser, S.M. (2007). Distinct roles for SWR1 and INO80 chromatin remodeling complexes at chromosomal double-strand breaks. EMBO J. 26, 4113–4125.

Avvakumov, G.V., Walker, J.R., Xue, S., Li, Y., Duan, S., Bronner, C., Arrowsmith, C.H., and Dhe-Paganon, S. (2008). Structural basis for recognition of hemi-methylated DNA by the SRA domain of human UHRF1. Nature 455, 822–825.

Bao, X., Rubin, A.J., Qu, K., Zhang, J., Giresi, P.G., Chang, H.Y., and Khavari, P.A. (2015). A novel ATAC-seq approach reveals lineage-specific reinforcement of the open chromatin landscape via cooperation between BAF and p63. Genome Biol. 16, 284.

Baralle, F.E., and Giudice, J. (2017). Alternative splicing as a regulator of development and tissue identity. Nat. Rev. Mol. Cell Biol. 18, 437–451.

Barash, Y., Blencowe, B.J., and Frey, B.J. (2010). Model-based detection of alternative splicing signals. Bioinformatics 26, i325–i333.

Barrallo-Gimeno, A. (2005). The Snail genes as inducers of cell movement and survival: implications in development and cancer. Development 132, 3151–3161.

Barrangou, R., Fremaux, C., Deveau, H., Richards, M., Boyaval, P., Moineau, S., Romero, D.A., and Horvath, P. (2007). CRISPR provides acquired resistance against viruses in prokaryotes. Science 315, 1709–1712.


Barrero, M.J., Sese, B., Martí, M., and Izpisua Belmonte, J.C. (2013). Macro histone variants are critical for the differentiation of human pluripotent cells. J. Biol. Chem. 288, 16110–16116.

Bartel, F., and Harris, L.C. MDM2 and Its Splice Variant Messenger RNAs: Expression in Tumors and Down-Regulation Using Antisense Oligonucleotides. 8.

Bates, D.O., Cui, T.-G., Doughty, J.M., Winkler, M., Sugiono, M., Shields, J.D., Peat, D., Gillatt, D., and Harper, S.J. (2002). VEGF165b, an inhibitory splice variant of vascular endothelial growth factor, is down-regulated in renal cell carcinoma. Cancer Res. 62, 4123–4131.

Batsché, E., Yaniv, M., and Muchardt, C. (2006). The human SWI/SNF subunit Brm is a regulator of alternative splicing. Nat. Struct. Mol. Biol. 13, 22–29.

Batson, J., Toop, H.D., Redondo, C., Babaei-Jadidi, R., Chaikuad, A., Wearmouth, S.F., Gibbons, B., Allen, C., Tallant, C., Zhang, J., et al. (2017). Development of Potent, Selective SRPK1 Inhibitors as Potential Topical Therapeutics for Neovascular Eye Disease. ACS Chem. Biol. 12, 825–832.

Bebee, T.W., Cieply, B.W., and Carstens, R.P. (2014). Genome-wide activities of RNA binding proteins that regulate cellular changes in the epithelial to mesenchymal transition (EMT). Adv. Exp. Med. Biol. 825, 267–302.

Beltran, M., Yates, C.M., Skalska, L., Dawson, M., Reis, F.P., Viiri, K., Fisher, C.L., Sibley, C.R., Foster, B.M., Bartke, T., et al. (2016). The interaction of PRC2 with RNA or chromatin is mutually antagonistic. Genome Res. 26, 896–907.

Bennett, K.L., Modrell, B., Greenfield, B., Bartolazzi, A., Stamenkovic, I., Peach, R., Jackson, D.G., Spring, F., and Aruffo, A. (1995). Regulation of CD44 binding to hyaluronan by glycosylation of variably spliced exons. J. Cell Biol. 131, 1623–1633.

Bentley, D.L. (2014). Coupling mRNA processing with transcription in time and space. Nat. Rev. Genet. 15, 163–175.

Berget, S.M., Moore, C., and Sharp, P.A. (1977). Spliced segments at the 5′ terminus of adenovirus 2 late mRNA. Proc. Natl. Acad. Sci. 74, 3171–3175.

Berk, A.J., and Sharp, P.A. (1978). Spliced early mRNAs of simian virus 40. Proc. Natl. Acad. Sci. 75, 1274–1278.

Bird, A. (2002). DNA methylation patterns and epigenetic memory. Genes Dev. 16, 6–21.

Black, D.L. (2003). Mechanisms of Alternative Pre-Messenger RNA Splicing. Annu. Rev. Biochem. 72, 291–336.

Blencowe, B.J. (2006). Alternative Splicing: New Insights from Global Analyses. Cell 126, 37–47.

Bonnal, S., Vigevani, L., and Valcárcel, J. (2012). The spliceosome as a target of novel antitumour drugs. Nat. Rev. Drug Discov. 11, 847–859.

Bonomi, S., di Matteo, A., Buratti, E., Cabianca, D.S., Baralle, F.E., Ghigna, C., and Biamonti, G. (2013). HnRNP A1 controls a splicing regulatory circuit promoting mesenchymal-to-epithelial transition. Nucleic Acids Res. 41, 8665–8679.


Boutet, A., Frutos, C.A.D., Maxwell, P.H., Mayol, M.J., Romero, J., and Nieto, M.A. (2006). Snail activation disrupts tissue homeostasis and induces fibrosis in the adult kidney. EMBO J. 25, 5603–5613.

Braeutigam, C., Rago, L., Rolke, A., Waldmeier, L., Christofori, G., and Winter, J. (2014). The RNA-binding protein Rbfox2: an essential regulator of EMT-driven alternative splicing and a mediator of cellular invasion. Oncogene 33, 1082–1092.

Branlant, C., Krol, A., Ebel, J.P., Lazar, E., Haendler, B., and Jacob, M. (1982). U2 RNA shares a structural domain with U1, U4, and U5 RNAs. EMBO J. 1, 1259–1265.

Branlant, C., Krol, A., Lazar, E., Haendler, B., Jacob, M., Galego-Dias, L., and Pousada, C. (1983). High evolutionary conservation of the secondary structure and of certain nucleotide sequences of U5 RNA. Nucleic Acids Res. 11, 8359–8367.

Breathnach, R., Mandel, J.L., and Chambon, P. (1977). Ovalbumin gene is split in chicken DNA. Nature 270, 314–319.

Brown, R.L., Reinke, L.M., Damerow, M.S., Perez, D., Chodosh, L.A., Yang, J., and Cheng, C. (2011). CD44 splice isoform switching in human and mouse epithelium is essential for epithelial-mesenchymal transition and breast cancer progression. J. Clin. Invest. 121, 1064–1074.

Burgess, R.J., and Zhang, Z. (2013). Histone chaperones in nucleosome assembly and human disease. Nat. Struct. Mol. Biol. 20, 14–22.

Carstens, R.P., Wagner, E.J., and Garcia-Blanco, M.A. (2000). An Intronic Splicing Silencer Causes Skipping of the IIIb Exon of Fibroblast Growth Factor Receptor 2 through Involvement of Polypyrimidine Tract Binding Protein. Mol. Cell. Biol. 20, 7388–7400.

Carstens, R.P., Eaton, J.V., Krigman, H.R., Walther, P.J., and Garcia-Blanco, M.A. Alternative splicing of fibroblast growth factor receptor 2 (FGF-R2) in human prostate cancer. 7.

Carver, E.A., Jiang, R., Lan, Y., Oram, K.F., and Gridley, T. (2001). The Mouse Snail Gene Encodes a Key Regulator of the Epithelial-Mesenchymal Transition. Mol. Cell. Biol. 21, 8184–8188.

Chen, K. (2015). Alternative splicing: An important mechanism in stem cell biology. World J. Stem Cells 7, 1.

Chen, J., and Weiss, W.A. (2015). Alternative splicing in cancer: implications for biology and therapy. Oncogene 34, 1–14.

Chen, C., Zhao, S., Karnad, A., and Freeman, J.W. (2018). The biology and role of CD44 in cancer progression: therapeutic implications. J. Hematol. Oncol. 11, 64.

Chow, L.T., Gelinas, R.E., Broker, T.R., and Roberts, R.J. (1977). An amazing sequence arrangement at the 5′ ends of adenovirus 2 messenger RNA. Cell 12, 1–8.

Churikov, D., Siino, J., Svetlova, M., Zhang, K., Gineitis, A., Morton Bradbury, E., and Zalensky, A. (2004). Novel human testis-specific histone H2B encoded by the interrupted gene on the X chromosome. Genomics 84, 745–756.

Climente-González, H., Porta-Pardo, E., Godzik, A., and Eyras, E. (2017). The Functional Impact of Alternative Splicing in Cancer. Cell Rep. 20, 2215–2226.


Colnot, D.R., Roos, J.C., de Bree, R., Wilhelm, A.J., Kummer, J.A., Hanft, G., Heider, K.-H., Stehle, G., Snow, G.B., and van Dongen, G.A.M.S. (2003). Safety, biodistribution, pharmacokinetics, and immunogenicity of 99mTc-labeled humanized monoclonal antibody BIWA 4 (bivatuzumab) in patients with squamous cell carcinoma of the head and neck. Cancer Immunol. Immunother. 52, 576–582.

Cong, L., Ran, F.A., Cox, D., Lin, S., Barretto, R., Habib, N., Hsu, P.D., Wu, X., Jiang, W., Marraffini, L.A., et al. (2013). Multiplex genome engineering using CRISPR/Cas systems. Science 339, 819–823.

Coulter, D.E., and Greenleaf, A.L. (1985). A mutation in the largest subunit of RNA polymerase II alters RNA chain elongation in vitro. J. Biol. Chem. 260, 13190–13198.

Daguenet, E., Dujardin, G., and Valcárcel, J. (2015). The pathogenicity of splicing defects: mechanistic insights into pre-mRNA processing inform novel therapeutic approaches. EMBO Rep. 16, 1640–1655.

Danan-Gotthold, M., Golan-Gerstl, R., Eisenberg, E., Meir, K., Karni, R., and Levanon, E.Y. (2015). Identification of recurrent regulated alternative splicing events across human solid tumors. Nucleic Acids Res. 43, 5130–5144.

Davey, C.A., Sargent, D.F., Luger, K., Maeder, A.W., and Richmond, T.J. (2002). Solvent mediated interactions in the structure of the nucleosome core particle at 1.9 Å resolution. J. Mol. Biol. 319, 1097–1113.

David, C.J., and Manley, J.L. (2011). The RNA polymerase C-terminal domain: a new role in spliceosome assembly. Transcription 2, 221–225.

Davidovich, C., and Cech, T.R. (2015). The recruitment of chromatin modifiers by long noncoding RNAs: lessons from PRC2. RNA 21, 2007–2022.

Davie, J.R., Xu, W., and Delcuve, G.P. (2016). Histone H3K4 trimethylation: dynamic interplay with pre-mRNA splicing. Biochem. Cell Biol. 94, 1–11.

Davis, M.A., Ireton, R.C., and Reynolds, A.B. (2003). A core function for p120-catenin in cadherin turnover. J. Cell Biol. 163, 525–534.

De Conti, L., Baralle, M., and Buratti, E. (2013). Exon and intron definition in pre-mRNA splicing. Wiley Interdiscip. Rev. RNA 4, 49–60.

Deltcheva, E., Chylinski, K., Sharma, C.M., Gonzales, K., Chao, Y., Pirzada, Z.A., Eckert, M.R., Vogel, J., and Charpentier, E. (2011). CRISPR RNA maturation by trans-encoded small RNA and host factor RNase III. Nature 471, 602–607.

Dominguez, A.A., Lim, W.A., and Qi, L.S. (2016). Beyond editing: repurposing CRISPR-Cas9 for precision genome regulation and interrogation. Nat. Rev. Mol. Cell Biol. 17, 5–15.

Dujardin, G., Lafaille, C., de la Mata, M., Marasco, L.E., Muñoz, M.J., Le Jossic-Corcos, C., Corcos, L., and Kornblihtt, A.R. (2014). How Slow RNA Polymerase II Elongation Favors Alternative Exon Skipping. Mol. Cell 54, 683–690.

Eberharter, A., and Becker, P.B. (2002). Histone acetylation: a switch between repressive and permissive chromatin: Second in review series on chromatin dynamics. EMBO Rep. 3, 224–229.


El Marabti, E., and Younis, I. (2018). The Cancer Spliceome: Reprograming of Alternative Splicing in Cancer. Front. Mol. Biosci. 5, 80.

Epifano, C., Megias, D., and Perez-Moreno, M. (2014). p120-catenin differentially regulates cell migration by Rho-dependent intracellular and secreted signals. EMBO Rep. 15, 592–600.

Expert-Bezancon, A., Le Caer, J.P., and Marie, J. (2002). Heterogeneous Nuclear Ribonucleoprotein (hnRNP) K Is a Component of an Intronic Splicing Enhancer Complex That Activates the Splicing of the Alternative Exon 6A from Chicken β-Tropomyosin Pre-mRNA. J. Biol. Chem. 277, 16614–16623.

Fang, M., Wu, J., Lai, X., Ai, H., Tao, Y., Zhu, B., and Huang, L. (2016). CD44 and CD44v6 are Correlated with Gastric Cancer Progression and Poor Patient Prognosis: Evidence from 42 Studies. Cell. Physiol. Biochem. 40, 567–578.

Felsenfeld, G., and Groudine, M. (2003). Controlling the double helix. Nature 421, 448–453.

Fischle, W., Wang, Y., Jacobs, S.A., Kim, Y., Allis, C.D., and Khorasanizadeh, S. (2003). Molecular basis for the discrimination of repressive methyl-lysine marks in histone H3 by Polycomb and HP1 chromodomains. Genes Dev. 17, 1870–1881.

Franklin, S.G., and Zweidler, A. (1977). Non-allelic variants of histones 2a, 2b and 3 in mammals. Nature 266, 273–275.

Furuichi, Y. (2015). Discovery of m7G-cap in eukaryotic mRNAs. Proc. Jpn. Acad. Ser. B 91, 394–409.

Fyodorov, D.V., and Kadonaga, J.T. (2002). Dynamics of ATP-dependent chromatin assembly by ACF. Nature 418, 897–900.

Gao, K., Masuda, A., Matsuura, T., and Ohno, K. (2008). Human branch point consensus sequence is yUnAy. Nucleic Acids Res. 36, 2257–2267.

Garneau, J.E., Dupuis, M.-È., Villion, M., Romero, D.A., Barrangou, R., Boyaval, P., Fremaux, C., Horvath, P., Magadán, A.H., and Moineau, S. (2010). The CRISPR/Cas bacterial immune system cleaves bacteriophage and plasmid DNA. Nature 468, 67–71.

Gaur, R.K. (2006). RNA interference: a potential therapeutic tool for silencing splice isoforms linked to human diseases. BioTechniques 40, S15–S22.

Gehman, L.T., Stoilov, P., Maguire, J., Damianov, A., Lin, C.-H., Shiue, L., Ares, M., Mody, I., and Black, D.L. (2011). The splicing regulator Rbfox1 (A2BP1) controls neuronal excitation in the mammalian brain. Nat. Genet. 43, 706–711.

Ghigna, C., Giordano, S., Shen, H., Benvenuto, F., Castiglioni, F., Comoglio, P.M., Green, M.R., Riva, S., and Biamonti, G. (2005). Cell Motility Is Controlled by SF2/ASF through Alternative Splicing of the Ron Protooncogene. Mol. Cell 20, 881–890.

Ghigna, C., Valacca, C., and Biamonti, G. (2008). Alternative splicing and tumor progression. Curr. Genomics 9, 556–570.

Gilbert, W. (1978). Why genes in pieces? Nature 271, 501.

Gilbert, L.A., Larson, M.H., Morsut, L., Liu, Z., Brar, G.A., Torres, S.E., Stern-Ginossar, N., Brandman, O., Whitehead, E.H., Doudna, J.A., et al. (2013). CRISPR-mediated modular RNA-guided regulation of transcription in eukaryotes. Cell 154, 442–451.

Gonzalez, I., Munita, R., Agirre, E., Dittmer, T.A., Gysling, K., Misteli, T., and Luco, R.F. (2015a). A lncRNA regulates alternative splicing via establishment of a splicing-specific chromatin signature. Nat. Struct. Mol. Biol. 22, 370–376.

Gonzalez, I., Munita, R., Agirre, E., Dittmer, T.A., Gysling, K., Misteli, T., and Luco, R.F. (2015b). A lncRNA regulates alternative splicing via establishment of a splicing-specific chromatin signature. Nat. Struct. Mol. Biol. 22, 370–376.

Gottschalk, A., Neubauer, G., Banroques, J., Mann, M., Lührmann, R., and Fabrizio, P. (1999). Identification by mass spectrometry and functional analysis of novel proteins of the yeast [U4/U6.U5] tri-snRNP. EMBO J. 18, 4535–4548.

Gout, S., Brambilla, E., Boudria, A., Drissi, R., Lantuejoul, S., Gazzeri, S., and Eymin, B. (2012). Abnormal Expression of the Pre-mRNA Splicing Regulators SRSF1, SRSF2, SRPK1 and SRPK2 in Non Small Cell Lung Carcinoma. PLoS ONE 7, e46539.

Grewal, S.I.S., and Elgin, S.C.R. (2007). Transcription and RNA interference in the formation of heterochromatin. Nature 447, 399–406.

Grosso, A.R., Martins, S., and Carmo-Fonseca, M. (2008). The emerging role of splicing factors in cancer. EMBO Rep. 9, 1087–1093.

Grunstein, M. (1997). Histone acetylation in chromatin structure and transcription. Nature 389, 349–352.

Gruss, O.J., Meduri, R., Schilling, M., and Fischer, U. (2017). UsnRNP biogenesis: mechanisms and regulation. Chromosoma 126, 577–593.

Guo, F., Cogdell, D., Hu, L., Yang, D., Sood, A.K., Xue, F., and Zhang, W. (2014). miR-101 suppresses the epithelial-to-mesenchymal transition by targeting ZEB1 and ZEB2 in ovarian carcinoma. Oncol. Rep. 31, 2021–2028.

Havens, M.A., and Hastings, M.L. (2016). Splice-switching antisense oligonucleotides as therapeutic drugs. Nucleic Acids Res. 44, 6549–6563.

Hegele, A., Kamburov, A., Grossmann, A., Sourlis, C., Wowro, S., Weimann, M., Will, C.L., Pena, V., Lührmann, R., and Stelzl, U. (2012). Dynamic Protein-Protein Interaction Wiring of the Human Spliceosome. Mol. Cell 45, 567–580.

Henri Norman (2018). Where are histones produced? - https://www.quora.com/Where-are-histones-produced.

Hertweck, M.K., Erdfelder, F., and Kreuzer, K.-A. (2011). CD44 in hematological neoplasias. Ann. Hematol. 90, 493–508.

Hilton, I.B., D’Ippolito, A.M., Vockley, C.M., Thakore, P.I., Crawford, G.E., Reddy, T.E., and Gersbach, C.A. (2015). Epigenome editing by a CRISPR-Cas9-based acetyltransferase activates genes from promoters and enhancers. Nat. Biotechnol. 33, 510–517.

Hofmann, Y., and Wirth, B. (2002). hnRNP-G promotes exon 7 inclusion of survival motor neuron (SMN) via direct interaction with Htra2-b. 13.

Hon, G., Wang, W., and Ren, B. (2009). Discovery and Annotation of Functional Chromatin Signatures in the Human Genome. PLoS Comput. Biol. 5.

Horvath, P., and Barrangou, R. (2010). CRISPR/Cas, the immune system of bacteria and archaea. Science 327, 167–170.

Hovhannisyan, R.H., and Carstens, R.P. (2005). A Novel Intronic cis Element, ISE/ISS-3, Regulates Rat Fibroblast Growth Factor Receptor 2 Splicing through Activation of an Upstream Exon and Repression of a Downstream Exon Containing a Noncanonical Branch Point Sequence. Mol. Cell. Biol. 25, 250–263.

Huang, C.-S., Shen, C.-Y., Wang, H.-W., Wu, P.-E., and Cheng, C.-W. (2007). Increased expression of SRp40 affecting CD44 splicing is associated with the clinical outcome of lymph node metastasis in human breast cancer. Clin. Chim. Acta Int. J. Clin. Chem. 384, 69–74.

Huang, Y., Jeong, J.S., Okamura, J., Sook-Kim, M., Zhu, H., Guerrero-Preston, R., and Ratovitski, E.A. (2012). Global tumor protein p53/p63 interactome: Making a case for cisplatin chemoresistance. Cell Cycle 11, 2367–2379.

Igolkina, A.A., Zinkevich, A., Karandasheva, K.O., Popov, A.A., Selifanova, M.V., Nikolaeva, D., Tkachev, V., Penzar, D., Nikitin, D.M., and Buzdin, A. (2019). H3K4me3, H3K9ac, H3K27ac, H3K27me3 and H3K9me3 Histone Tags Suggest Distinct Regulatory Evolution of Open and Condensed Chromatin Landmarks. Cells 8.

Imielinski, M., Berger, A.H., Hammerman, P.S., Hernandez, B., Pugh, T.J., Hodis, E., Cho, J., Suh, J., Capelletti, M., Sivachenko, A., et al. (2012). Mapping the Hallmarks of Lung Adenocarcinoma with Massively Parallel Sequencing. Cell 150, 1107–1120.

Irimia, M., and Blencowe, B.J. (2012). Alternative splicing: decoding an expansive regulatory layer. Curr. Opin. Cell Biol. 24, 323–332.

Ishino, Y., Shinagawa, H., Makino, K., Amemura, M., and Nakata, A. (1987). Nucleotide sequence of the iap gene, responsible for alkaline phosphatase isozyme conversion in Escherichia coli, and identification of the gene product. J. Bacteriol. 169, 5429–5433.

Ishino, Y., Krupovic, M., and Forterre, P. (2018). History of CRISPR-Cas from Encounter with a Mysterious Repeated Sequence to Genome Editing Technology. J. Bacteriol. 200.

Jady, B.E. (2001). A small nucleolar guide RNA functions both in 2′-O-ribose methylation and pseudouridylation of the U5 spliceosomal RNA. EMBO J. 20, 541–551.

Jair, K.-W., Bachman, K.E., Suzuki, H., Ting, A.H., Rhee, I., Yen, R.-W.C., Baylin, S.B., and Schuebel, K.E. (2006). De novo CpG island methylation in human cancer cells. Cancer Res. 66, 682–692.

Jakočiūnas, T., Jensen, M.K., and Keasling, J.D. (2016). CRISPR/Cas9 advances engineering of microbial cell factories. Metab. Eng. 34, 44–59.

Javaid, S., Zhang, J., Anderssen, E., Black, J.C., Wittner, B.S., Tajima, K., Ting, D.T., Smolen, G.A., Zubrowski, M., Desai, R., et al. (2013). Dynamic Chromatin Modification Sustains Epithelial-Mesenchymal Transition following Inducible Expression of Snail-1. Cell Rep. 5, 1679–1689.

Jia, R., Ajiro, M., Yu, L., McCoy, P., and Zheng, Z.-M. (2019). Oncogenic splicing factor SRSF3 regulates ILF3 alternative splicing to promote cancer cell proliferation and transformation. RNA 25, 630–644.

Jinek, M., Chylinski, K., Fonfara, I., Hauer, M., Doudna, J.A., and Charpentier, E. (2012). A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity. Science 337, 816–821.

Jonkers, I., Kwak, H., and Lis, J.T. (2014). Genome-wide dynamics of Pol II elongation and its interplay with promoter proximal pausing, chromatin, and exons. ELife 3, e02407.

Jurica, M.S., Licklider, L.J., Gygi, S.P., Grigorieff, N., and Moore, M.J. (2002). Purification and characterization of native spliceosomes suitable for three-dimensional structural analysis. RNA 8, 426–439.

Kalluri, R., and Weinberg, R.A. (2009). The basics of epithelial-mesenchymal transition. J. Clin. Invest. 119, 1420–1428.

Kao, S.-H., Wu, K.-J., and Lee, W.-H. (2016). Hypoxia, Epithelial-Mesenchymal Transition, and TET-Mediated Epigenetic Changes. J. Clin. Med. 5, 24.

Kearns, N.A., Pham, H., Tabak, B., Genga, R.M., Silverstein, N.J., Garber, M., and Maehr, R. (2015). Functional annotation of native enhancers with a Cas9-histone demethylase fusion. Nat. Methods 12, 401–403.

Kędzierska, H., and Piekiełko-Witkowska, A. (2017). Splicing factors of SR and hnRNP families as regulators of apoptosis in cancer. Cancer Lett. 396, 53–65.

Keirsebilck, A., Bonné, S., Staes, K., van Hengel, J., Nollet, F., Reynolds, A., and van Roy, F. (1998). Molecular Cloning of the Human p120ctn Catenin Gene (CTNND1): Expression of Multiple Alternatively Spliced Isoforms. Genomics 50, 129–146.

Kim, E., Ilagan, J.O., Liang, Y., Daubner, G.M., Lee, S.C.-W., Ramakrishnan, A., Li, Y., Chung, Y.R., Micol, J.-B., Murphy, M.E., et al. (2015). SRSF2 Mutations Contribute to Myelodysplasia by Mutant-Specific Effects on Exon Recognition. Cancer Cell 27, 617–630.

Kim, H.K., Pham, M.H.C., Ko, K.S., Rhee, B.D., and Han, J. (2018). Alternative splicing isoforms in health and disease. Pflüg. Arch. - Eur. J. Physiol. 470, 995–1016.

Kiss, T. (2004). Biogenesis of small nuclear RNPs. J. Cell Sci. 117, 5949–5951.

Koren, E., Lev-Maor, G., and Ast, G. (2007). The Emergence of Alternative 3′ and 5′ Splice Site Exons from Constitutive Exons. PLoS Comput. Biol. 3, e95.

Kornblihtt, A.R. (2006). Chromatin, transcript elongation and alternative splicing. Nat. Struct. Mol. Biol. 13, 5–7.

Kornblihtt, A.R., Schor, I.E., Alló, M., Dujardin, G., Petrillo, E., and Muñoz, M.J. (2013). Alternative splicing: a pivotal step between eukaryotic transcription and translation. Nat. Rev. Mol. Cell Biol. 14, 153–165.

Kouzarides, T. (2007). Chromatin modifications and their function. Cell 128, 693–705.

Kudo-Saito, C., Shirako, H., Takeuchi, T., and Kawakami, Y. (2009). Cancer Metastasis Is Accelerated through Immunosuppression during Snail-Induced EMT of Cancer Cells. Cancer Cell 15, 195–206.

Labelle, M., Begum, S., and Hynes, R.O. (2011). Direct Signaling between Platelets and Cancer Cells Induces an Epithelial-Mesenchymal-Like Transition and Promotes Metastasis. Cancer Cell 20, 576–590.

Ladomery, M. (2013). Aberrant Alternative Splicing Is Another Hallmark of Cancer. Int. J. Cell Biol. 2013, 1–6.

Lapuk, A., Marr, H., Jakkula, L., Pedro, H., Bhattacharya, S., Purdom, E., Hu, Z., Simpson, K., Pachter, L., Durinck, S., et al. (2010). Exon-Level Microarray Analyses Identify Alternative Splicing Programs in Breast Cancer. Mol. Cancer Res. 8, 961–974.

Larson, M.H., Gilbert, L.A., Wang, X., Lim, W.A., Weissman, J.S., and Qi, L.S. (2013). CRISPR interference (CRISPRi) for sequence-specific control of gene expression. Nat. Protoc. 8, 2180–2196.

Leslie A. Pray (2008). Discovery of DNA Double Helix: Watson and Crick | Learn Science at Scitable - https://www.nature.com/scitable/topicpage/discovery-of-dna-structure-and-function-watson-397/.

Lev Maor, G., Yearim, A., and Ast, G. (2015). The alternative role of DNA methylation in splicing regulation. Trends Genet. 31, 274–280.

Li, Q.Q., Liu, Z., Lu, W., and Liu, M. (2017). Interplay between Alternative Splicing and Alternative Polyadenylation Defines the Expression Outcome of the Plant Unique OXIDATIVE TOLERANT-6 Gene. Sci. Rep. 7, 2052.

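Lia, G. (2005). Etude du mécanisme d’action des facteurs de remodelage de la chromatine par micromanipulation et visualisation de l’ADN [Study of the mechanism of action of chromatin remodelling factors by micromanipulation and visualisation of DNA].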

Lindsay, J., McDade, S.S., Pickard, A., McCloskey, K.D., and McCance, D.J. (2011). Role of DeltaNp63gamma in epithelial to mesenchymal transition. J. Biol. Chem. 286, 3915–3924.

Linker, S.M., Urban, L., Clark, S.J., Chhatriwala, M., Amatya, S., McCarthy, D.J., Ebersberger, I., Vallier, L., Reik, W., Stegle, O., et al. (2019). Combined single-cell profiling of expression and DNA methylation reveals splicing regulation and heterogeneity. Genome Biol. 20, 30.

Loh, T.J., Moon, H., Cho, S., Jang, H., Liu, Y.C., Tai, H., Jung, D.-W., Williams, D.R., Kim, H.-R., Shin, M.-G., et al. (2015). CD44 alternative splicing and hnRNP A1 expression are associated with the metastasis of breast cancer. Oncol. Rep. 34, 1231–1238.

Luco, R.F., Pan, Q., Tominaga, K., Blencowe, B.J., Pereira-Smith, O.M., and Misteli, T. (2010). Regulation of Alternative Splicing by Histone Modifications. Science 327, 996–1000.

Luco, R.F., Allo, M., Schor, I.E., Kornblihtt, A.R., and Misteli, T. (2011). Epigenetics in Alternative Pre-mRNA Splicing. Cell 144, 16–26.

Luger, K., Mäder, A.W., Richmond, R.K., Sargent, D.F., and Richmond, T.J. (1997). Crystal structure of the nucleosome core particle at 2.8 Å resolution. Nature 389, 251–260.

Maeder, M.L., Linder, S.J., Cascio, V.M., Fu, Y., Ho, Q.H., and Joung, J.K. (2013). CRISPR RNA-guided activation of endogenous human genes. Nat. Methods 10, 977–979.

Maki, R., Roeder, W., Traunecker, A., Sidman, C., Wabl, M., Raschke, W., and Tonegawa, S. (1981). The role of DNA rearrangement and alternative RNA processing in the expression of immunoglobulin delta genes. Cell 24, 353–365.

Manabe, R., Oh-e, N., Maeda, T., Fukuda, T., and Sekiguchi, K. (1997). Modulation of Cell-adhesive Activity of Fibronectin by the Alternatively Spliced EDA Segment. J. Cell Biol. 139, 295–307.

Manelyte, L., and Längst, G. (2013). Chromatin Remodelers and Their Way of Action. Chromatin Remodel.

Manzur, K.L., Farooq, A., Zeng, L., Plotnikova, O., Koch, A.W., Sachchidanand, and Zhou, M.-M. (2003). A dimeric viral SET domain methyltransferase specific to Lys27 of histone H3. Nat. Struct. Biol. 10, 187–196.

Martin Anduaga, A., Evantal, N., Patop, I.L., Bartok, O., Weiss, R., and Kadener, S. (2019). Thermosensitive alternative splicing senses and mediates temperature adaptation in Drosophila. ELife 8.

Martinez-Contreras, R., Cloutier, P., Shkreta, L., Fisette, J.-F., Revil, T., and Chabot, B. (2007). hnRNP proteins and splicing control. Adv. Exp. Med. Biol. 623, 123–147.

Martinez-Montiel, N., Rosas-Murrieta, N.H., Anaya Ruiz, M., Monjaraz-Guzman, E., and Martinez-Contreras, R. (2018). Alternative Splicing as a Target for Cancer Treatment. Int. J. Mol. Sci. 19, 545.

de la Mata, M., and Kornblihtt, A.R. (2006). RNA polymerase II C-terminal domain mediates regulation of alternative splicing by SRp20. Nat. Struct. Mol. Biol. 13, 973–980.

de la Mata, M., Alonso, C.R., Kadener, S., Fededa, J.P., Blaustein, M., Pelisch, F., Cramer, P., Bentley, D., and Kornblihtt, A.R. (2003). A slow RNA polymerase II affects alternative splicing in vivo. Mol. Cell 12, 525–532.

Matera, A.G., and Wang, Z. (2014). A day in the life of the spliceosome. Nat. Rev. Mol. Cell Biol. 15, 108–121.

Maunakea, A.K., Chepelev, I., Cui, K., and Zhao, K. (2013). Intragenic DNA methylation modulates alternative splicing by recruiting MeCP2 to promote exon recognition. Cell Res. 23, 1256–1269.

Mazoyer, S., Puget, N., Perrin-Vidoz, L., Lynch, H.T., Serova-Sinilnikova, O.M., and Lenoir, G.M. (1998). A BRCA1 nonsense mutation causes exon skipping. Am. J. Hum. Genet. 62, 713–715.

McCracken, S., Fong, N., Yankulov, K., Ballantyne, S., Pan, G., Greenblatt, J., Patterson, S.D., Wickens, M., and Bentley, D.L. (1997). The C-terminal domain of RNA polymerase II couples mRNA processing to transcription. Nature 385, 357–361.

McDonald, O.G., Wu, H., Timp, W., Doi, A., and Feinberg, A.P. (2011). Genome-scale epigenetic reprogramming during epithelial-to-mesenchymal transition. Nat. Struct. Mol. Biol. 18, 867–874.

McGhee, J.D., and Felsenfeld, G. (1980). Nucleosome structure. Annu. Rev. Biochem. 49, 1115–1156.

Medina, S.G.-D.D., Popov, Z., Chopin, D.K., Southgate, J., and Tucker, G.C. (1999). Relationship between E-cadherin and fibroblast growth factor receptor 2b expression in bladder carcinomas. 5.

Meggendorfer, M., Roller, A., Haferlach, T., Eder, C., Dicker, F., Grossmann, V., Kohlmann, A., Alpermann, T., Yoshida, K., Ogawa, S., et al. (2012). SRSF2 mutations in 275 cases with chronic myelomonocytic leukemia (CMML). Blood 120, 3080–3088.

Meistrich, M.L., Bucci, L.R., Trostle-Weige, P.K., and Brock, W.A. (1985). Histone variants in rat spermatogonia and primary spermatocytes. Dev. Biol. 112, 230–240.

Merdzhanova, G., Edmond, V., De Seranno, S., Van den Broeck, A., Corcos, L., Brambilla, C., Brambilla, E., Gazzeri, S., and Eymin, B. (2008). E2F1 controls alternative splicing pattern of genes involved in apoptosis through upregulation of the splicing factor SC35. Cell Death Differ. 15, 1815–1823.

Merkin, J., Russell, C., Chen, P., and Burge, C.B. (2012). Evolutionary Dynamics of Gene and Isoform Regulation in Mammalian Tissues. Science 338, 1593–1599.

Miura, K., Fujibuchi, W., and Unno, M. (2012). Splice variants in apoptotic pathway. Exp. Oncol. 6.

Modrek, B., and Lee, C. (2002). A genomic view of alternative splicing. Nat. Genet. 30, 13–19.

Mojica, F.J., Díez-Villaseñor, C., Soria, E., and Juez, G. (2000). Biological significance of a family of regularly spaced repeats in the genomes of Archaea, Bacteria and mitochondria. Mol. Microbiol. 36, 244–246.

Mojica, F.J.M., Díez-Villaseñor, C., García-Martínez, J., and Almendros, C. (2009). Short motif sequences determine the targets of the prokaryotic CRISPR defence system. Microbiol. Read. Engl. 155, 733–740.

Mongroo, P.S., and Rustgi, A.K. (2010). The role of the miR-200 family in epithelial-mesenchymal transition. Cancer Biol. Ther. 10, 219–222.

Montemayor, E.J., Katolik, A., Clark, N.E., Taylor, A.B., Schuermann, J.P., Combs, D.J., Johnsson, R., Holloway, S.P., Stevens, S.W., Damha, M.J., et al. (2014). Structural basis of lariat RNA recognition by the intron debranching enzyme Dbr1. Nucleic Acids Res. 42, 10845–10855.

Moore, M.J., and Sharp, P.A. (1993). Evidence for two active sites in the spliceosome provided by stereochemistry of pre-mRNA splicing. Nature 365, 364–368.

Mueller, W.F., and Hertel, K.J. (2011). The role of SR and SR-related proteins in pre-mRNA splicing - https://www.semanticscholar.org/paper/THE-ROLE-OF-SR-AND-SR%E2%80%91RELATED-PROTEINS-IN-pre%E2%80%91mRNA-Mueller-Hertel/173787a839bee7d69075afe514634b627f3ddfce.

Naarmann-de Vries, I.S., Brendle, A., Bähr-Ivacevic, T., Benes, V., Ostareck, D.H., and Ostareck-Lederer, A. (2016). Translational control mediated by hnRNP K links NMHC IIA to erythroid enucleation. J. Cell Sci. 129, 1141–1154.

Narla, G., DiFeo, A., Yao, S., Banno, A., Hod, E., Reeves, H.L., Qiao, R.F., Camacho-Vanegas, O., Levine, A., Kirschenbaum, A., et al. (2005). Targeted inhibition of the KLF6 splice variant, KLF6 SV1, suppresses prostate cancer cell growth and spread. Cancer Res. 65, 5761–5768.

Narlikar, G.J., Sundaramoorthy, R., and Owen-Hughes, T. (2013). Mechanisms and functions of ATP-dependent chromatin-remodeling enzymes. Cell 154, 490–503.

Naro, C., Bielli, P., Pagliarini, V., and Sette, C. (2015). The interplay between DNA damage response and RNA processing: the unexpected role of splicing factors as gatekeepers of genome stability. Front. Genet. 6, 142.

Navaglia, F., Fogar, P., Greco, E., Basso, D., Stefani, A.L., Mazza, S., Zambon, C.F., Habeler, W., Altavilla, G., Amadori, A., et al. (2003). CD44v10: an antimetastatic membrane glycoprotein for pancreatic cancer. Int. J. Biol. Markers 18, 130–138.

Neumann, D.P., Goodall, G.J., and Gregory, P.A. (2018). Regulation of splicing and circularisation of RNA in epithelial mesenchymal plasticity. Semin. Cell Dev. Biol. 75, 50–60.

Nieto, M.A. (2013). Epithelial Plasticity: A Common Theme in Embryonic and Cancer Cells. Science 342, 1234850.

Oesterreich, F.C., Herzel, L., Straube, K., Hujer, K., Howard, J., and Neugebauer, K.M. (2016). Splicing of Nascent RNA Coincides with Intron Exit from RNA Polymerase II. Cell 165, 372–381.

Ohkura, N., Takahashi, M., Yaguchi, H., Nagamura, Y., and Tsukada, T. (2005). Coactivator-associated arginine methyltransferase 1, CARM1, affects pre-mRNA splicing in an isoform-specific manner. J. Biol. Chem. 280, 28927–28935.

Okano, M., Bell, D.W., Haber, D.A., and Li, E. (1999). DNA methyltransferases Dnmt3a and Dnmt3b are essential for de novo methylation and mammalian development. Cell 99, 247–257.

Olins, D.E., and Olins, A.L. (2003). Chromatin history: our view from the bridge. Nat. Rev. Mol. Cell Biol. 4, 809–814.

Pachov, G.V., Gabdoulline, R.R., and Wade, R.C. (2011). On the structure and dynamics of the complex of the nucleosome and the linker histone. Nucleic Acids Res. 39, 5255–5263.

Pajares, M.J., Ezponda, T., Catena, R., Calvo, A., Pio, R., and Montuenga, L.M. (2007). Alternative splicing: an emerging topic in molecular and clinical oncology. Lancet Oncol. 8, 349–357.

Pan, Q., Shai, O., Lee, L.J., Frey, B.J., and Blencowe, B.J. (2008). Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing. Nat. Genet. 40, 1413–1415.

Patel, A.A., and Steitz, J.A. (2003). Splicing double: insights from the second spliceosome. Nat. Rev. Mol. Cell Biol. 4, 960–970.

Petrillo, E., Godoy Herz, M.A., Fuchs, A., Reifer, D., Fuller, J., Yanovsky, M.J., Simpson, C., Brown, J.W.S., Barta, A., Kalyna, M., et al. (2014). A chloroplast retrograde signal regulates nuclear alternative splicing. Science 344, 427–430.

Podlaha, O., De, S., Gonen, M., and Michor, F. (2014). Histone Modifications Are Associated with Transcript Isoform Diversity in Normal and Cancer Cells. PLoS Comput. Biol. 10, e1003611.

Pradeepa, M.M., Sutherland, H.G., Ule, J., Grimes, G.R., and Bickmore, W.A. (2012). Psip1/Ledgf p52 binds methylated histone H3K36 and splicing factors and contributes to the regulation of alternative splicing. PLoS Genet. 8, e1002717.

Pradella, D., Naro, C., Sette, C., and Ghigna, C. (2017). EMT and stemness: flexible processes tuned by alternative splicing in development and cancer progression. Mol. Cancer 16 , 8.

Qi, L.S., Larson, M.H., Gilbert, L.A., Doudna, J.A., Weissman, J.S., Arkin, A.P., and Lim, W.A. (2013). Repurposing CRISPR as an RNA-guided platform for sequence-specific control of gene expression. Cell 152, 1173–1183.

Ranieri, D., Rosato, B., Nanni, M., Magenta, A., Belleudi, F., and Torrisi, M.R. (2016). Expression of the FGFR2 mesenchymal splicing variant in epithelial cells drives epithelial-mesenchymal transition. Oncotarget 7, 5440–5460.

Razin, S.V., and Gavrilov, A.A. (2014). Chromatin without the 30-nm fiber: constrained disorder instead of hierarchical folding. Epigenetics 9, 653–657.

Régnier, V., Vagnarelli, P., Fukagawa, T., Zerjal, T., Burns, E., Trouche, D., Earnshaw, W., and Brown, W. (2005). CENP-A Is Required for Accurate Chromosome Segregation and Sustained Kinetochore Association of BubR1. Mol. Cell. Biol. 25, 3967–3981.

Reik, W. (2007). Stability and flexibility of epigenetic gene regulation in mammalian development. Nature 447, 425–432.

Resch, A. (2004). Evidence for a subpopulation of conserved alternative splicing events under selection pressure for protein reading frame preservation. Nucleic Acids Res. 32, 1261–1269.

Rieder, F., Brenmoehl, J., Leeb, S., Scholmerich, J., and Rogler, G. (2007). Wound healing and fibrosis in intestinal disease. Gut 56, 130–139.

Rino, J., Desterro, J.M.P., Pacheco, T.R., Gadella, T.W.J., and Carmo-Fonseca, M. (2008). Splicing Factors SF1 and U2AF Associate in Extraspliceosomal Complexes. Mol. Cell. Biol. 28, 3045–3057.

Roca, X. (2003). Intrinsic differences between authentic and cryptic 5′ splice sites. Nucleic Acids Res. 31, 6321–6333.

Rothbart, S.B., and Strahl, B.D. (2014). Interpreting the language of histone and DNA modifications. Biochim. Biophys. Acta 1839, 627–643.

Saint-André, V., Batsché, E., Rachez, C., and Muchardt, C. (2011). Histone H3 lysine 9 trimethylation and HP1γ favor inclusion of alternative exons. Nat. Struct. Mol. Biol. 18, 337–344.

Saldanha, R., Mohr, G., Belfort, M., and Lambowitz, A.M. (1993). Group I and group II introns. FASEB J. Off. Publ. Fed. Am. Soc. Exp. Biol. 7, 15–24.

Sánchez-Tilló, E., Lázaro, A., Torrent, R., Cuatrecasas, M., Vaquero, E.C., Castells, A., Engel, P., and Postigo, A. (2010). ZEB1 represses E-cadherin and induces an EMT by recruiting the SWI/SNF chromatin-remodeling protein BRG1. Oncogene 29, 3490–3500.

Sanidas, I., Polytarchou, C., Hatziapostolou, M., Ezell, S.A., Kottakis, F., Hu, L., Guo, A., Xie, J., Comb, M.J., Iliopoulos, D., et al. (2014). Phosphoproteomics Screen Reveals Akt Isoform-Specific Signals Linking RNA Processing to Lung Cancer. Mol. Cell 53, 577–590.

Sapranauskas, R., Gasiunas, G., Fremaux, C., Barrangou, R., Horvath, P., and Siksnys, V. (2011). The Streptococcus thermophilus CRISPR/Cas system provides immunity in Escherichia coli. Nucleic Acids Res. 39, 9275–9282.

Sauka-Spengler, T., and Bronner-Fraser, M. (2008). A gene regulatory network orchestrates neural crest formation. Nat. Rev. Mol. Cell Biol. 9, 557–568.

Savagner, P., Vallés, A.M., Jouanneau, J., Yamada, K.M., and Thiery, J.P. (1994). Alternative splicing in fibroblast growth factor receptor 2 is associated with induced epithelial-mesenchymal transition in rat bladder carcinoma cells. Mol. Biol. Cell 5, 851–862.

Scheel, C., and Weinberg, R.A. (2012). Cancer stem cells and epithelial-mesenchymal transition: Concepts and molecular links. Semin. Cancer Biol. 22, 396–403.

Schor, I.E., Rascovan, N., Pelisch, F., Alló, M., and Kornblihtt, A.R. (2009). Neuronal cell depolarization induces intragenic chromatin modifications affecting NCAM alternative splicing. Proc. Natl. Acad. Sci. U. S. A. 106, 4325–4330.

Schwartz, S., Meshorer, E., and Ast, G. (2009). Chromatin organization marks exon-intron structure. Nat. Struct. Mol. Biol. 16, 990–995.

Segal, E., Fondufe-Mittendorf, Y., Chen, L., Thåström, A., Field, Y., Moore, I.K., Wang, J.-P.Z., and Widom, J. (2006). A genomic code for nucleosome positioning. Nature 442, 772–778.

Sengez, B., Aygün, I., Shehwana, H., Toyran, N., Tercan Avci, S., Konu, O., Stemmler, M.P., and Alotaibi, H. (2019). The Transcription Factor Elf3 Is Essential for a Successful Mesenchymal to Epithelial Transition. Cells 8, 858.

Shapiro, I.M., Cheng, A.W., Flytzanis, N.C., Balsamo, M., Condeelis, J.S., Oktay, M.H., Burge, C.B., and Gertler, F.B. (2011). An EMT-Driven Alternative Splicing Program Occurs in Human Breast Cancer and Modulates Cellular Phenotype. PLoS Genet. 7, e1002218.

Shayevitch, R., Askayo, D., Keydar, I., and Ast, G. (2018). The importance of DNA methylation of exons on alternative splicing. RNA 24, 1351–1362.

Shepard, P.J., and Hertel, K.J. (2009). The SR protein family. Genome Biol. 10, 242.

Siam, A., Baker, M., Amit, L., Regev, G., Rabner, A., Najar, R.A., Bentata, M., Dahan, S., Cohen, K., Araten, S., et al. (2019). Regulation of alternative splicing by p300-mediated acetylation of splicing factors. RNA 25, 813–824.

Siemens, H., Jackstadt, R., Hünten, S., Kaller, M., Menssen, A., Götz, U., and Hermeking, H. (2011). miR-34 and SNAIL form a double-negative feedback loop to regulate epithelial-mesenchymal transitions. Cell Cycle 10, 4256–4271.

da Silva, M.R., Moreira, G.A., Gonçalves da Silva, R.A., de Almeida Alves Barbosa, É., Pais Siqueira, R., Teixera, R.R., Almeida, M.R., Silva Júnior, A., Fietto, J.L.R., and Bressan, G.C. (2015). Splicing Regulators and Their Roles in Cancer Biology and Therapy. BioMed Res. Int. 2015, 1–12.

Sims, R.J., Millhouse, S., Chen, C.-F., Lewis, B.A., Erdjument-Bromage, H., Tempst, P., Manley, J.L., and Reinberg, D. (2007). Recognition of trimethylated histone H3 lysine 4 facilitates the recruitment of transcription postinitiation factors and pre-mRNA splicing. Mol. Cell 28, 665–676.

Singh, A.K., and Lakhotia, S.C. (2016). The hnRNP A1 homolog Hrb87F/Hrp36 is important for telomere maintenance in Drosophila melanogaster. Chromosoma 125, 373–388.

Singh, M., Popowicz, G.M., Krajewski, M., and Holak, T.A. (2007). Structural ramification for acetyl-lysine recognition by the bromodomain of human BRG1 protein, a central ATPase of the SWI/SNF remodeling complex. Chembiochem Eur. J. Chem. Biol. 8, 1308–1316.

Singh, R., Valcárcel, J., and Green, M.R. (1995). Distinct binding specificities and functions of higher eukaryotic polypyrimidine tract-binding proteins. Science 268, 1173–1176.

Somarelli, J.A., Shetler, S., Jolly, M.K., Wang, X., Bartholf Dewitt, S., Hish, A.J., Gilja, S., Eward, W.C., Ware, K.E., Levine, H., et al. (2016). Mesenchymal-Epithelial Transition in Sarcomas Is Controlled by the Combinatorial Expression of MicroRNA 200s and GRHL2. Mol. Cell. Biol. 36, 2503–2513.

Somerville, T.D.D., Xu, Y., Miyabayashi, K., Tiriac, H., Cleary, C.R., Maia-Silva, D., Milazzo, J.P., Tuveson, D.A., and Vakoc, C.R. (2018). TP63-Mediated Enhancer Reprogramming Drives the Squamous Subtype of Pancreatic Ductal Adenocarcinoma. Cell Rep. 25, 1741–1755.e7.

Spies, N., Nielsen, C.B., Padgett, R.A., and Burge, C.B. (2009). Biased chromatin signatures around polyadenylation sites and exons. Mol. Cell 36, 245–254.

Sternberg, S.H., Redding, S., Jinek, M., Greene, E.C., and Doudna, J.A. (2014). DNA interrogation by the CRISPR RNA-guided endonuclease Cas9. Nature 507, 62–67.

Strahl, B.D., and Allis, C.D. (2000). The language of covalent histone modifications. Nature 403, 41–45.

Szenker, E., Ray-Gallet, D., and Almouzni, G. (2011). The double face of the histone variant H3.3. Cell Res. 21, 421–434.

Tahmasebi, S., Jafarnejad, S.M., Tam, I.S., Gonatopoulos-Pournatzis, T., Matta-Camacho, E., Tsukumo, Y., Yanagiya, A., Li, W., Atlasi, Y., Caron, M., et al. (2016). Control of embryonic stem cell self-renewal and differentiation via coordinated alternative splicing and translation of YY2. Proc. Natl. Acad. Sci. 113, 12360–12367.

Takaku, M., Grimm, S.A., Shimbo, T., Perera, L., Menafra, R., Stunnenberg, H.G., Archer, T.K., Machida, S., Kurumizaka, H., and Wade, P.A. (2016). GATA3-dependent cellular reprogramming requires activation-domain dependent recruitment of a chromatin remodeler. Genome Biol. 17, 36.

Tam, W.L., and Weinberg, R.A. (2013). The epigenetics of epithelial-mesenchymal plasticity in cancer. Nat. Med. 19, 1438–1449.

Thakore, P.I., Black, J.B., Hilton, I.B., and Gersbach, C.A. (2016). Editing the epigenome: technologies for programmable transcription and epigenetic modulation. Nat. Methods 13, 127–137.

The Cancer Genome Atlas Research Network (2014). Comprehensive molecular profiling of lung adenocarcinoma. Nature 511, 543–550.

Thiery, J.P., and Sleeman, J.P. (2006). Complex networks orchestrate epithelial-mesenchymal transitions. Nat. Rev. Mol. Cell Biol. 7, 131–142.

Thiery, J.P., Acloque, H., Huang, R.Y.J., and Nieto, M.A. (2009). Epithelial-Mesenchymal Transitions in Development and Disease. Cell 139, 871–890.

Tilghman, S.M., Tiemeier, D.C., Seidman, J.G., Peterlin, B.M., Sullivan, M., Maizel, J.V., and Leder, P. (1978). Intervening sequence of DNA identified in the structural portion of a mouse β-globin gene. 5.

Tilgner, H., Nikolaou, C., Althammer, S., Sammeth, M., Beato, M., Valcárcel, J., and Guigó, R. (2009). Nucleosome positioning as a determinant of exon recognition. Nat. Struct. Mol. Biol. 16, 996–1001.

Tolstorukov, M.Y., Goldman, J.A., Gilbert, C., Ogryzko, V., Kingston, R.E., and Park, P.J. (2012). Histone variant H2A.Bbd is associated with active transcription and mRNA processing in human cells. Mol. Cell 47, 596–607.

Tropberger, P., and Schneider, R. (2013). Scratching the (lateral) surface of chromatin regulation by histone modifications. Nat. Struct. Mol. Biol. 20, 657–661.

Tsukuda, T., Fleming, A.B., Nickoloff, J.A., and Osley, M.A. (2005). Chromatin remodelling at a DNA double-strand break site in Saccharomyces cerevisiae. Nature 438, 379–383.

Turner, N., and Grose, R. (2010). Fibroblast growth factor signalling: from development to cancer. Nat. Rev. Cancer 10, 116–129.

Turunen, J.J., Niemelä, E.H., Verma, B., and Frilander, M.J. (2013). The significant other: splicing by the minor spliceosome. Wiley Interdiscip. Rev. RNA 4, 61–76.

Urbanski, L.M., Leclair, N., and Anczuków, O. (2018). Alternative-splicing defects in cancer: Splicing regulators and their downstream targets, guiding the way to novel cancer therapeutics. Wiley Interdiscip. Rev. RNA 9, e1476.

Ustaoglu, P., Haussmann, I.U., Liao, H., Torres-Mendez, A., Arnold, R., Irimia, M., and Soller, M. (2019). Srrm234, but not canonical SR and hnRNP proteins drive inclusion of Dscam exon 9 variable exons. bioRxiv 584003.

Valastyan, S., and Weinberg, R.A. (2011). Tumor Metastasis: Molecular Insights and Evolving Paradigms. Cell 147, 275–292.

Vanharanta, S., Marney, C.B., Shu, W., Valiente, M., Zou, Y., Mele, A., Darnell, R.B., and Massagué, J. (2014). Loss of the multifunctional RNA-binding protein RBM47 as a source of selectable metastatic traits in breast cancer. eLife 3, e02734.

Venables, J.P. (2004). Aberrant and Alternative Splicing in Cancer. Cancer Res. 64, 7647–7654.

Venables, J.P., Klinck, R., Koh, C., Gervais-Bird, J., Bramard, A., Inkel, L., Durand, M., Couture, S., Froehlich, U., Lapointe, E., et al. (2009). Cancer-associated regulation of alternative splicing. Nat. Struct. Mol. Biol. 16, 670–676.

Venables, J.P., Brosseau, J.-P., Gadea, G., Klinck, R., Prinos, P., Beaulieu, J.-F., Lapointe, E., Durand, M., Thibault, P., Tremblay, K., et al. (2013). RBFOX2 Is an Important Regulator of Mesenchymal Tissue-Specific Splicing in both Normal and Cancer Tissues. Mol. Cell. Biol. 33, 396–405.

Verel, I., Heider, K.-H., Siegmund, M., Ostermann, E., Patzelt, E., Sproll, M., Snow, G.B., Adolf, G.R., and Dongen, G.A.M.S. van (2002). Tumor targeting properties of monoclonal antibodies with different affinity for target antigen CD44V6 in nude mice bearing head-and-neck cancer xenografts. Int. J. Cancer 99, 396–402.

Vidal, V.P.I., Verdone, L., Mayes, A.E., and Beggs, J.D. (1999). Characterization of U6 snRNA–protein interactions. RNA 5, 1470–1481.


Voigt, P., LeRoy, G., Drury, W.J., Zee, B.M., Son, J., Beck, D.B., Young, N.L., Garcia, B.A., and Reinberg, D. (2012). Asymmetrically modified nucleosomes. Cell 151, 181–193.

Wagner, E.J., and Carpenter, P.B. (2012). Understanding the language of Lys36 methylation at histone H3. Nat. Rev. Mol. Cell Biol. 13, 115–126.

Wahl, M.C., Will, C.L., and Lührmann, R. (2009). The Spliceosome: Design Principles of a Dynamic RNP Machine. Cell 136, 701–718.

Wang, B.-D., and Lee, N. (2018). Aberrant RNA Splicing in Cancer and Drug Resistance. Cancers 10, 458.

Wang, Z., and Burge, C.B. (2008). Splicing regulation: From a parts list of regulatory elements to an integrated splicing code. RNA 14, 802–813.

Wang, E.T., Sandberg, R., Luo, S., Khrebtukova, I., Zhang, L., Mayr, C., Kingsmore, S.F., Schroth, G.P., and Burge, C.B. (2008). Alternative isoform regulation in human tissue transcriptomes. Nature 456, 470–476.

Warzecha, C.C., and Carstens, R.P. (2012). Complex changes in alternative pre-mRNA splicing play a central role in the epithelial-to-mesenchymal transition (EMT). Semin. Cancer Biol. 22, 417–427.

Warzecha, C.C., Sato, T.K., Nabet, B., Hogenesch, J.B., and Carstens, R.P. (2009). ESRP1 and ESRP2 Are Epithelial Cell-Type-Specific Regulators of FGFR2 Splicing. Mol. Cell 33, 591–601.

Watson, J.D., and Crick, F.H. (1953). Molecular structure of nucleic acids; a structure for deoxyribose nucleic acid. Nature 171, 737–738.

Wei, C., Xiao, R., Chen, L., Cui, H., Zhou, Y., Xue, Y., Hu, J., Zhou, B., Tsutsui, T., Qiu, J., et al. (2016). RBFox2 Binds Nascent RNA to Globally Regulate Polycomb Complex 2 Targeting in Mammalian Genomes. Mol. Cell 62, 875–889.

Wiedemann, S.M., Mildner, S.N., Bönisch, C., Israel, L., Maiser, A., Matheisl, S., Straub, T., Merkl, R., Leonhardt, H., Kremmer, E., et al. (2010). Identification and characterization of two novel primate-specific histone H3 variants, H3.X and H3.Y. J. Cell Biol. 190, 777–791.

Will, C.L., and Lührmann, R. (2011). Spliceosome Structure and Function. Cold Spring Harb. Perspect. Biol. 3, a003707.

Wong, J.J.-L., Ritchie, W., Ebner, O.A., Selbach, M., Wong, J.W.H., Huang, Y., Gao, D., Pinello, N., Gonzalez, M., Baidya, K., et al. (2013). Orchestrated Intron Retention Regulates Normal Granulocyte Differentiation. Cell 154, 583–595.

Wongpalee, S.P., and Sharma, S. (2014). The Pre-mRNA Splicing Reaction. In Spliceosomal Pre-mRNA Splicing: Methods and Protocols, K.J. Hertel, ed. (Totowa, NJ: Humana Press), pp. 3–12.

Xu, Y., Zhao, W., Olson, S.D., Prabhakara, K.S., and Zhou, X. (2018). Alternative splicing links histone modifications to stem cell fate decision. Genome Biol. 19, 133.

Yan, C., Wan, R., and Shi, Y. (2019). Molecular Mechanisms of pre-mRNA Splicing through Structural Biology of the Spliceosome. Cold Spring Harb. Perspect. Biol. 11, a032409.


Yanagisawa, M., Huveldt, D., Kreinest, P., Lohse, C.M., Cheville, J.C., Parker, A.S., Copland, J.A., and Anastasiadis, P.Z. (2008). A p120 Catenin Isoform Switch Affects Rho Activity, Induces Tumor Cell Invasion, and Predicts Metastatic Disease. J. Biol. Chem. 283, 18344–18354.

Yang, L., Mali, P., Kim-Kiselak, C., and Church, G. (2014). CRISPR-Cas-mediated targeted genome editing in human cells. Methods Mol. Biol. Clifton NJ 1114, 245–267.

Yang, Y., Park, J.W., Bebee, T.W., Warzecha, C.C., Guo, Y., Shang, X., Xing, Y., and Carstens, R.P. (2016). Determination of a Comprehensive Alternative Splicing Regulatory Network and Combinatorial Regulation by Key Factors during the Epithelial-to-Mesenchymal Transition. Mol. Cell. Biol. 36, 1704–1719.

Yeo, G.W., Coufal, N.G., Liang, T.Y., Peng, G.E., Fu, X.-D., and Gage, F.H. (2009). An RNA code for the FOX2 splicing regulator revealed by mapping RNA-protein interactions in stem cells. Nat. Struct. Mol. Biol. 16, 130–137.

Yoshimoto, R., Kataoka, N., Okawa, K., and Ohno, M. (2009). Isolation and characterization of post-splicing lariat–intron complexes. Nucleic Acids Res. 37, 891–902.

Yu, M., Bardia, A., Wittner, B.S., Stott, S.L., Smas, M.E., Ting, D.T., Isakoff, S.J., Ciciliano, J.C., Wells, M.N., Shah, A.M., et al. (2013). Circulating Breast Tumor Cells Exhibit Dynamic Changes in Epithelial and Mesenchymal Composition. Science 339, 580–584.

Zeisberg, M., Yang, C., Martino, M., Duncan, M.B., Rieder, F., Tanjore, H., and Kalluri, R. (2007). Fibroblasts Derive from Hepatocytes in Liver Fibrosis via Epithelial to Mesenchymal Transition. J. Biol. Chem. 282, 23337–23347.

von Zelewsky, T., Palladino, F., Brunschwig, K., Tobler, H., Hajnal, A., and Müller, F. (2000). The C. elegans Mi-2 chromatin-remodelling proteins function in vulval cell fate determination. Dev. Camb. Engl. 127, 5277–5284.

Zentner, G.E., and Henikoff, S. (2013). Regulation of nucleosome dynamics by histone modifications. Nat. Struct. Mol. Biol. 20, 259–266.

Zhang, X., Ibrahimi, O.A., Olsen, S.K., Umemori, H., Mohammadi, M., and Ornitz, D.M. (2006). Receptor Specificity of the Fibroblast Growth Factor Family: The Complete Mammalian FGF Family. J. Biol. Chem. 281, 15694–15700.

Zhang, Y., Zhao, Y., Jiang, G., Zhang, X., Zhao, H., Wu, J., Xu, K., and Wang, E. (2014). Impact of p120-catenin Isoforms 1A and 3A on Epithelial Mesenchymal Transition of Lung Cancer Cells Expressing E-cadherin in Different Subcellular Locations. PLoS ONE 9, e88064.

Zhao, J., Hyman, L., and Moore, C. (1999). Formation of mRNA 3’ ends in eukaryotes: mechanism, regulation, and interrelationships with other steps in mRNA synthesis. Microbiol. Mol. Biol. Rev. MMBR 63, 405–445.

Zhao, Y.-Q., Jordan, I.K., and Lunyak, V.V. (2013). Epigenetics Components of Aging in the Central Nervous System. Neurotherapeutics 10, 647–663.

Zhou, H.-L., Luo, G., Wise, J.A., and Lou, H. (2014). Regulation of alternative splicing by local histone modifications: potential roles for RNA-guided mechanisms. Nucleic Acids Res. 42, 701–713.

Zhou, Y., Lu, Y., and Tian, W. (2012). Epigenetic features are significantly associated with alternative splicing. BMC Genomics 13, 123.


Zöller, M. (2011). CD44: can a cancer-initiating cell profit from an abundantly expressed molecule? Nat. Rev. Cancer 11, 254–267.


ANNEXES


Article: Splicing-associated chromatin signatures: a combinatorial and position-dependent role for histone marks in splicing definition (In review)

Agirre E., Oldfield A.J., Bellora N., Segelle A. and Luco R.F.


Splicing-associated chromatin signatures: a combinatorial and position-dependent role for histone marks in splicing definition

Agirre E. (1,3), Oldfield A.J. (1), Bellora N. (2), Segelle A. (1) and Luco R.F. (1)*

1. Institute of Human Genetics, UMR9002 CNRS-University of Montpellier, 34000 Montpellier, France.

2. Instituto Andino Patagonico de Tecnologias Biologicas y Geoambientales, CONICET, Argentina.

3. Current address: Laboratory of Molecular Neurobiology, Department of Medical Biochemistry and Biophysics, Karolinska Institutet, Stockholm, Sweden.

*Corresponding author: [email protected]

Abstract

Alternative splicing relies on the combinatorial recruitment of splicing regulators to specific RNA binding sites at the regulated exon. Chromatin has been shown to impact alternative splicing by modulating this recruitment. However, only a limited number of histone marks have been studied, over a limited number of model genes, and the combinatorial role of chromatin modifications at a genome-wide level has not yet been properly addressed. Using a machine learning approach applied to extensive epigenomics and transcriptomics datasets in human H1 embryonic stem cells and IMR90 foetal fibroblasts, we have identified eleven chromatin modifications, some of which have never been reported before, that differentially mark alternatively spliced exons depending on the level of exon inclusion. These modifications act in a combinatorial and position-dependent way, creating characteristic splicing-associated chromatin signatures (SACS) on 34% of the alternatively spliced exons analysed. Moreover, chromatin-marked exons share common features, such as shorter exon length, weaker splice sites and the presence of particular RNA binding motifs, which might underlie common regulatory pathways. Importantly, we observed a strong correlation between alternative splicing levels and enrichment of the chromatin marks identified by our SACS model when studying the same splicing event in eight different human cell lines. Furthermore, some of these splicing-associated chromatin marks were also correlated with recruitment of the splicing regulator predicted by the RNA motif search analysis, further supporting a functional link between these chromatin signatures (SACS) and the splicing machinery. We propose that the highly dynamic nature of chromatin could be a way to quickly fine-tune the alternative splicing of key genes necessary for a prompt cellular response.


Introduction

An essential aspect of cell identity is to express the right subset of proteins, at the right developmental stage, and to maintain this expression pattern as a cell memory. What matters is not only which gene is expressed, but also how its transcript is processed by mechanisms such as alternative splicing. Alternative splicing is a highly regulated process that affects most human genes. It consists of the alternative processing of a single pre-mRNA molecule into different mature mRNAs, and therefore different proteins, thereby increasing transcriptome and proteome diversity and complexity 1,2. Alternatively spliced exons are defined by cis-regulatory sequences, called RNA motifs, responsible for the recruitment of positive and negative trans-acting factors that favour or inhibit the inclusion of the regulated exon in the mature mRNA 3,4. The strength and composition of these RNA binding sites, the differential G/C content between introns and exons, RNA secondary structures and exon/intron lengths play an important role in predicting exon inclusion 5-8. In addition, splicing is mostly a co-transcriptional process in which chromatin conformation, histone modifications, DNA methylation and transcriptional regulators have also been shown to impact the final splicing outcome 9-20. Indeed, genome-wide studies have shown that nucleosomes are non-randomly distributed along genes, with a specific positioning at the intron/exon junction 17,21-24, creating transcriptional roadblocks that can shape the final splicing outcome 18. Moreover, these exon-specific nucleosomes are enriched in characteristic histone modifications, many of which have already been shown by our group and others to play an active role in the final splicing decision 9,11,22-25. So far, two models functionally link chromatin to splicing: the kinetic model, in which chromatin slows down RNA polymerase II and thereby modulates the window of time for splicing regulators to bind to competing RNA binding sites; and the recruitment model, in which chromatin modifications modulate splicing factor binding to the pre-mRNA via chromatin binding proteins that act as adaptors between the chromatin and the splicing machinery 19,26-30.

There have been several attempts to identify all the chromatin modifications that are differentially enriched along alternatively spliced exons 12,20,31-33. Recently, several studies have started to functionally link histone modifications with splicing at a genome-wide level. First, chronic cocaine administration in mice was shown to induce dramatic changes in chromatin and in the alternative splicing of genes from the nucleus accumbens, a brain reward region, through physical interaction of the splicing factor A2BP1 (RBFOX1) with H3K4me3 at target genes 34. In a disease context, deletion of the H3K79 methyltransferase DOT1L in two acute myeloid leukemia cell lines induced inclusion of H3K79me2-marked exons, which reduced cell proliferation and transformation 35. Finally, in a developmental context, half of the alternatively spliced events that change upon human embryonic stem cell differentiation were shown to be differentially enriched in H3K36me3, H3K27ac or H4K8ac. Interestingly, these chromatin-marked alternatively spliced genes shared common gene ontology terms related to stemness signatures, G2/M cell-cycle progression and DNA damage response, suggesting that chromatin might coordinate the regulation of specific splicing-related pathways important for cell differentiation 36. However, most of these studies analysed only the histone marks of interest, and individually, omitting many known chromatin marks and their combinatorial role in gene expression. Moreover, splicing has been studied in a binary way, with exons either totally included in or excluded from the mRNA, without taking into consideration the real complexity of alternative splicing, in which different spliced mRNA isoforms coexist in the same cell at different levels.

In this study, we have used a supervised machine learning approach to identify all the chromatin modifications that could classify alternatively spliced exons into four splicing groups based on exon-inclusion levels, from highly excluded (0% exon inclusion) to highly included (100% exon inclusion). Of the 28 chromatin modifications analysed, 11 were shown to differentially mark 34% of all the alternatively spliced exons analysed. When studied individually, there was no obvious association between enrichment of a specific histone mark and a percentage of exon inclusion. However, when studied in a combinatorial way, we found seven unique combinations of chromatin marks, co-enriched at a specific position along the regulated exon, that selectively marked exons with a specific range of exon inclusion levels, thereby creating what we called splicing-associated chromatin signatures (SACS). Interestingly, alternatively spliced exons marked by these SACS were mostly shorter than constitutive exons, had distinctive gene ontology functions and characteristic RNA binding motifs, suggesting that each chromatin signature might coordinate the recruitment of specific splicing regulators to the pre-mRNA of a subset of genes that share common functional pathways. As expected, a shift in exon inclusion levels between two different cell lines correlated with a change in histone mark enrichment levels and binding of the corresponding splicing regulator, as predicted by the SACS model, supporting a functional link between chromatin and cell-specific alternative splicing.

Results

Exon-specific chromatin modifications can discriminate between different levels of exon inclusion.

Splicing is mainly a co-transcriptional process in which chromatin and transcriptional regulators have long been shown to impact the final splicing outcome at a number of model genes 9,11. However, a systematic approach to identify all the histone modifications that can influence alternative splicing at a genome-wide level is lacking. Nor do we know whether these histone marks can act in a combinatorial way, or what these chromatin-marked spliced genes have in common. To address these questions, we took advantage of publicly available transcriptomics and epigenomics data from the Roadmap Epigenomics and ENCODE projects to identify, using machine learning approaches, the chromatin modifications that were informative to classify alternatively spliced exons into different splicing groups (Fig.1). We first selected, in the two cell lines for which the most extensive epigenomics data are available (human H1 embryonic stem cells and IMR90 foetal lung fibroblasts), splicing events in which an alternatively spliced exon was flanked by two constitutive exons. We then distributed these cassette exons into different splicing groups based on their percentage of exon inclusion, using the Percent Spliced In index (PSI). Most studies to date have treated splicing as a binary process, in which exons are either totally included (PSI>80%) or excluded (PSI<20%). However, a mixed population of splicing isoforms, with the exon included in some transcripts and excluded in others, can also coexist in the same cell or population of cells, increasing splicing complexity. Assuming that the mechanisms of splicing regulation might differ between these splicing conditions, we decided to distribute the alternatively spliced exons into four categories, with highly included (PSI>80%), mid-included (40%


PSI>95% in more than 75% of the 10 cell lines analysed. Lowly expressed genes (transcripts per million (TPM) < 10) and the first two exons of each alternatively spliced gene were excluded from the analysis to avoid a chromatin effect from the transcription start sites. Then, using available chromatin immunoprecipitation (ChIP-seq) and methyl DNA immunoprecipitation (MeDIP-seq) high-throughput sequencing data, we calculated the levels of 27 histone modifications and methyl DNA (5mC) at different positions upstream of, within and downstream of the selected exons, which were used as epigenetic features for downstream analysis (Table S1, Fig.1b).
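The exon filtering and PSI binning described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in this study: PSI is estimated here from junction reads as (inclusion/2) / (inclusion/2 + exclusion), halving inclusion reads because two junctions support inclusion and only one supports skipping. The 80% and 20% cut-offs, the TPM and first-two-exon filters, and the 95%/75% constitutive rule come from the text; the 40% boundary between mid-included and mid-excluded exons and the handling of ties are assumptions. The `Exon` record and all function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass
class Exon:
    """Hypothetical record for one cassette exon in one cell line."""
    exon_id: str
    rank_in_gene: int      # 1 = first exon in transcriptional order
    gene_tpm: float        # gene expression, transcripts per million
    inclusion_reads: int   # reads on the two inclusion junctions (summed)
    exclusion_reads: int   # reads on the single skipping junction


def psi(inclusion_reads: int, exclusion_reads: int) -> Optional[float]:
    """Percent Spliced In (0-100), or None when no junction reads are available."""
    inc = inclusion_reads / 2.0  # inclusion is supported by two junctions
    total = inc + exclusion_reads
    if total == 0:
        return None
    return 100.0 * inc / total


def splicing_group(value: float) -> str:
    """Bin a PSI value; the 40% mid-group boundary and tie handling are assumptions."""
    if value > 80:
        return "included"
    if value > 40:
        return "mid-included"
    if value >= 20:
        return "mid-excluded"
    return "excluded"


def is_constitutive(psi_per_cell_line: List[float]) -> bool:
    """PSI > 95% in more than 75% of the cell lines examined."""
    if not psi_per_cell_line:
        return False
    high = sum(1 for v in psi_per_cell_line if v > 95.0)
    return high / len(psi_per_cell_line) > 0.75


def classify(exons: Iterable[Exon], min_tpm: float = 10.0) -> Dict[str, str]:
    """Drop lowly expressed genes and the first two exons, then assign PSI groups."""
    groups = {}
    for exon in exons:
        if exon.gene_tpm < min_tpm or exon.rank_in_gene <= 2:
            continue
        value = psi(exon.inclusion_reads, exon.exclusion_reads)
        if value is not None:
            groups[exon.exon_id] = splicing_group(value)
    return groups


toy = [
    Exon("e1", 5, 35.0, 180, 5),   # PSI ~ 94.7 -> included
    Exon("e2", 4, 12.0, 40, 30),   # PSI = 40.0 -> mid-excluded
    Exon("e3", 2, 50.0, 100, 0),   # second exon of the gene -> filtered out
    Exon("e4", 6, 3.0, 100, 10),   # TPM < 10 -> filtered out
]
print(classify(toy))  # {'e1': 'included', 'e2': 'mid-excluded'}
```

Keeping PSI as a continuous value and binning it only at the end makes it easy to test how sensitive the downstream chromatin analysis is to the choice of group boundaries.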

Finally, using a supervised Random Forest-based classifier 38, we identified the epigenetic features that were informative to classify the selected exons into the four pre-defined splicing groups in each cell type independently. Fourteen histone modifications and DNA methylation were found in common between the two cell lines (Fig.1c). Some of them are already known to play a role in alternative splicing, such as H3K36me3, H3K4me3, H3K9me3, H3K9ac, H3K27me3 and DNA methylation 13,14,19,25-27,29,30,36. Others are novel, such as H2AK5ac, H4K91ac, H3K18ac, H3K14ac, H3K4me1, H4K20me1, H3K79me1 and H3K79me2, which broadens the potential role of chromatin in alternative splicing. Since histone marks are known to play a combinatorial role in gene expression, we next tested whether splicing-associated chromatin marks were co-enriched at alternatively spliced exons.

Splicing-associated chromatin marks act in a combinatorial and position-dependent manner along the regulated exon.

When comparing histone modification levels at the selected cassette exons, we found few striking differences between the four pre-defined splicing groups (Suppl. Fig. S1). Mid-excluded exons were slightly more enriched in 5mC and histone acetyl marks, such as H2AK5ac, H3K14ac, H3K18ac and H3K9ac, whereas included exons had reduced levels of H3K27me3, H3K36me3 and H4K20me1 (Suppl. Fig. S1). However, when studying the pairwise co-occurrence of these same chromatin modifications at specific positions around the selected exons, we found that alternatively spliced exons were significantly marked by seven unique combinations of chromatin modifications, enriched at very specific positions along the exon, thereby creating splicing-associated chromatin signatures (SACS) (Fig.2a). These chromatin signatures were selected based on two criteria: 1) there was a localised enrichment of two different chromatin marks at a specific position around the exon, either upstream of, within or downstream of the regulated exon (we restricted the search to pairs of marks for simplicity; more complex combinatorial patterns likely exist); 2) the chromatin signature was specific to one splicing category and was not significantly found in any of the other splicing groups analysed, including constitutive exons (Fig.2). In H1 hESCs, almost 900 alternatively spliced exons (34% of all the exons analysed) were marked by one of these chromatin signatures (Fig.2a). Surprisingly, chromatin not only differentially marked exons homogeneously included (PSI>80%) or excluded (PSI<20%) in a single splicing isoform, but also marked exons present in more than one splicing isoform, leading to a mixed percentage of exon inclusion (20%


To visualize the splicing-dependent co-enrichment of the newly identified chromatin signatures, we assessed the levels of one histone mark of each pair along exons enriched in the other mark, in all four splicing groups. As expected, H3K4me2 was co-enriched upstream of H3K4me1-marked exons when included (SACS1, Fig.2b). H3K79me2 only peaked at H4K20me1-marked exons when excluded (SACS4, Fig.2f), whereas H4K91ac was more enriched at H4K20me1-marked exons when mid-included (SACS3, Fig.2d). Similarly, H3K9me3 was mostly enriched at 5mC-marked exons when excluded (SACS5, Fig.2g), and shifted towards the end of the exon when included (SACS2, Fig.2c). Finally, H3K9ac peaked upstream of H3K14ac-enriched exons mainly when mid-excluded (SACS7, Fig.2e). Strikingly, 37% of the exons annotated as constitutive, meaning always included in all the mRNAs transcribed and processed from that locus, were also marked by their own unique chromatin signature, H4K20me1+H3K36me3 (SACS8, Fig.2a,i), which was not enriched at alternatively spliced exons, further supporting a role for chromatin in regulating exon recognition and splicing. Interestingly, using the same unbiased strategy with publicly available ChIP-seq datasets, we also found some of these splicing-associated chromatin signatures in mouse embryonic stem cells, such as SACS4, with 57% of the studied excluded exons strongly marked by H4K20me1+H3K79me2, which may indicate a specific and evolutionarily conserved role of some of these SACS in stem cell alternative splicing (Suppl. Fig. S2a). We expect that more SACS will be identified as more chromatin modifications are mapped in different cell types and organisms.
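The second SACS selection criterion (a pair of marks co-enriched at one position and specific to one splicing group) can be framed as an over-representation test. The sketch below assumes, for illustration only, a one-sided Fisher's exact test (the upper tail of a hypergeometric distribution) comparing how often exons of one splicing group carry both marks with how often all other exons, constitutive ones included, carry them. It is not necessarily the statistic used in this study, and the function names and counts are hypothetical. In practice one such test would be run per mark pair, position and splicing group, followed by multiple-testing correction.

```python
from math import comb


def hypergeom_upper_tail(k: int, population: int, successes: int, draws: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    numerator = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return numerator / comb(population, draws)


def pair_specificity_pvalue(pair_in_group: int, group_size: int,
                            pair_in_others: int, others_size: int) -> float:
    """One-sided Fisher's exact test for one mark pair at one exon position.

    pair_in_group:  exons of the target splicing group carrying both marks
    pair_in_others: exons of all other groups (incl. constitutive) carrying both marks
    A small p-value means the pair is over-represented in the target group.
    """
    population = group_size + others_size
    marked = pair_in_group + pair_in_others
    return hypergeom_upper_tail(pair_in_group, population, marked, group_size)


# Hypothetical counts: 30 of 100 excluded exons vs 40 of 1000 other exons carry the pair.
p = pair_specificity_pvalue(30, 100, 40, 1000)
print(f"{p:.2e}")
```

Python's arbitrary-precision integers keep the binomial coefficients exact, so the tail probability is computed without the overflow that a naive floating-point factorial would hit for groups of hundreds of exons.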

To test the link between the newly identified chromatin signatures and alternative splicing, we then compared the levels of splicing-associated histone marks at exons whose splicing pattern changed, expecting a change in chromatin to correlate with a change in splicing.

Splicing-associated chromatin signatures (SACS) are intimately linked to alternative splicing patterns.

A splicing shift from excluded to included is not a frequent event between cell lines. To accumulate enough exons for a robust statistical analysis of the correlation between changes in chromatin and splicing, we collected data from 8 different cell lines for which relevant transcriptomics and epigenomics data were publicly available (see Suppl. Table S1 for a list of the datasets). The only histone marks we could robustly study in these conditions were H4K20me1 and H3K79me2, which are precisely the ones whose chromatin signature is conserved in mice (SACS4, Suppl. Fig. S2a-b). We first selected all the exons marked by SACS4 in H1 cells that were also expressed in at least one of the cell types analysed (n=139, Fig.3a). Then, for each cell line, we classified the exons as excluded or included and calculated the enrichment of each of the two histone marks at the exon individually. We found that most of the excluded events marked by H4K20me1+H3K79me2 (SACS4) in H1 cells remained excluded and enriched in these histone marks in the other cell types (72/94 = 77%), whereas nearly half of the exons that shifted to included lost the SACS4 chromatin signature (Fig.3a). Importantly, when performing the same analysis on mid-included events marked by H4K20me1+H4K91ac in H1 cells (SACS3), we found that 65% (20/31) of the events that shifted their pattern of splicing from included (in H1) to excluded (in any other cell type) also changed their chromatin signature from H4K20me1+H4K91ac to H4K20me1+H3K79me2. In contrast, barely 20% (6/30) of the events that maintained the same inclusion levels were enriched in H4K20me1+H3K79me2 (Fig.3a). These results support a link between specific chromatin signatures and alternative splicing patterns across cell types that might be evolutionarily conserved in mammals.
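The cross-cell-line comparison amounts to a small contingency table of splicing state against the presence of the chromatin signature. A minimal sketch, assuming that for each SACS-marked exon and each other cell line a binary splicing state and a binary "both marks enriched" call are already available; the `Observation` record, the function names and the choice of counting one observation per exon and cell line are assumptions, not details given in the text. The last lines recompute the 72/94 proportion reported above as a check.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Observation(NamedTuple):
    """Hypothetical: one SACS-marked exon (defined in H1) observed in one other cell line."""
    exon_id: str
    cell_line: str
    state: str            # "excluded" or "included" in this cell line
    has_signature: bool   # both marks of the signature enriched at the exon


def tally(observations: Iterable[Observation]) -> Counter:
    """Contingency counts keyed by (splicing state, signature present)."""
    return Counter((o.state, o.has_signature) for o in observations)


def fraction_with_signature(table: Counter, state: str) -> float:
    """Fraction of observations in a given splicing state that carry the signature."""
    marked = table[(state, True)]
    total = marked + table[(state, False)]
    return marked / total if total else float("nan")


obs = [
    Observation("exA", "K562", "excluded", True),
    Observation("exA", "HeLa-S3", "included", False),
    Observation("exB", "K562", "excluded", True),
    Observation("exB", "HeLa-S3", "excluded", False),
]
table = tally(obs)
print(fraction_with_signature(table, "excluded"))  # 2 of 3 excluded observations are marked

# Check against the counts reported in the text for SACS4 excluded events.
reported = Counter({("excluded", True): 72, ("excluded", False): 22})
print(round(100 * fraction_with_signature(reported, "excluded")))  # 77
```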

To validate these in silico results, we performed ChIP-qPCR and RT-qPCR in three unrelated human cell lines: tumoral hematopoietic K562, tumoral epithelial HeLa S3 and normal epithelial MCF10A cells (Fig.3b-e). We first confirmed that excluded events are more enriched in H3K79me2 and H4K20me1 than included events (Fig.3b,c). Of note, in some cases, H4K20me1 levels in mid-included exons were as high as in excluded exons, which is consistent with its enrichment in SACS3. Importantly, when looking at three splicing events that shifted from included in K562 to excluded in MCF10A, we observed a significant increase in H3K79me2 and H4K20me1 levels, confirming a negative correlation between enrichment of these two marks and exon inclusion levels, as predicted by our model (Fig.3d,e). In conclusion, we validated in other cell types the existence of splicing-associated chromatin signatures that specifically mark exons depending on their level of exon inclusion. Importantly, a shift in alternative splicing was accompanied by a shift in the chromatin signature enriched at the regulated exon, suggesting a functional link between the two. Further functional analyses are nonetheless necessary to establish a causal role for these marks in splicing.
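ChIP-qPCR and RT-qPCR read-outs such as these are commonly summarised as percent of input and as a qPCR-based PSI. The exact quantification used here is not stated, so the formulas below are standard conventions assumed for illustration: percent input = 100 × 2^(Ct_input,adjusted − Ct_IP), where the input Ct is first shifted by log2 of the input dilution factor; and PSI from the relative amounts of the inclusion and skipping amplicons, each taken as proportional to 2^(−Ct). Both assume roughly 100% primer efficiency, and histone-mark ChIPs are often further normalised to total H3, which is not shown. The function names are hypothetical.

```python
import math


def percent_input(ct_ip: float, ct_input: float, input_fraction: float) -> float:
    """ChIP enrichment as % of input, assuming ~100% primer efficiency.

    input_fraction: fraction of chromatin kept as input (e.g. 0.01 for a 1% input).
    """
    # Scale the input Ct to 100% of the chromatin: log2(1/0.01) ~ 6.64 cycles.
    ct_input_adjusted = ct_input - math.log2(1.0 / input_fraction)
    return 100.0 * 2.0 ** (ct_input_adjusted - ct_ip)


def qpcr_psi(ct_inclusion: float, ct_exclusion: float) -> float:
    """PSI (0-100) from inclusion- and skipping-isoform amplicons.

    Assumes both primer pairs amplify with the same (~100%) efficiency, so the
    relative template amount of each isoform is proportional to 2**-Ct.
    """
    inc = 2.0 ** -ct_inclusion
    exc = 2.0 ** -ct_exclusion
    return 100.0 * inc / (inc + exc)


print(round(percent_input(ct_ip=24.0, ct_input=25.0, input_fraction=0.01), 3))  # 2.0
print(round(qpcr_psi(20.0, 21.0), 1))  # 66.7
```

When primer efficiencies differ noticeably from 100%, the base 2 in both functions would need to be replaced by the measured amplification factor of each primer pair.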

We next aimed to identify the properties that these chromatin-marked exons have in common and whether chromatin plays a role in defining their pattern of splicing.

Exons marked by a particular splicing-associated chromatin signature (SACS) share genetic and functional features.

When looking at characteristic features shared by exons marked by a specific SACS, we found that most of the chromatin-marked exons were shorter than constitutive exons, had weaker 3’ and 5’ splice sites and were surrounded by shorter flanking introns (Fig.4a,b), which is consistent with previous observations pointing to a role for chromatin in improving the recognition of suboptimal exons by the splicing machinery 8,21,22,25. Interestingly, SACS1 and SACS2 were the only groups with exons longer than constitutive exons and with no difference in splice site strength, suggesting a different mechanism of chromatin-mediated splicing regulation. Another feature important for splice-site recognition is the differential GC content between exons and introns 8. Mammalian alternatively spliced exons have evolved towards low GC levels along the exon, but with a strong exon-intron differential GC content, which helps the splicing machinery to recognize the exon when it is flanked by long introns 8. In our case, the chromatin-marked exons that are not homogeneously defined as included or excluded (mid-included SACS3 and mid-excluded SACS7), meaning that more than one splice variant coexists in the cell, had precisely a lower exon-intron differential GC content, which could explain why the splicing patterns of these exons are not as well defined as those of the highly included and excluded ones (Fig.4c). We propose that a specific chromatin signature could help to define the dominant splicing outcome in such cases. Importantly, the exons differentially enriched in methyl DNA (SACS2 and SACS5) did not show significant changes in GC content compared to non-marked alternatively spliced exons, arguing against the 5mC differences simply reflecting differences in GC content between exons (Fig.4c).
Finally, there were no major differences in total gene expression levels (all splicing variants included) between the SACS genes and the non-marked alternatively spliced genes used as controls, ruling out that the chromatin changes observed at a specific splice variant are an indirect consequence of more general gene expression effects (Fig.4d).
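The exon-intron GC differential discussed above can be computed directly from sequence. A minimal sketch, assuming the exon and its flanking intronic sequences are available as strings; the 100-nt intronic windows and the definition used here (exon GC minus the mean GC of the two flanking windows) are illustrative choices, not necessarily those of ref. 8 or of this study, and the function names are hypothetical.

```python
def gc_fraction(seq: str) -> float:
    """G+C fraction among unambiguous bases (N and other symbols are ignored)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return (seq.count("G") + seq.count("C")) / acgt


def gc_differential(upstream_intron: str, exon: str, downstream_intron: str,
                    window: int = 100) -> float:
    """Exon GC minus the mean GC of the intronic windows adjacent to the exon."""
    up = upstream_intron[-window:]     # intron end next to the 3' splice site
    down = downstream_intron[:window]  # intron start next to the 5' splice site
    intron_gc = (gc_fraction(up) + gc_fraction(down)) / 2
    return gc_fraction(exon) - intron_gc


print(round(gc_differential("AT" * 100, "GCGCAT" * 10, "TTAA" * 50), 2))  # 0.67
```

A positive value means the exon is GC-richer than its flanks; under this sketch, mid-included and mid-excluded exons with a weak differential would score close to zero.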

Gene Ontology analysis revealed that each group of alternatively spliced genes marked by a specific SACS was enriched in distinctive biological processes not found in the other groups, suggesting that chromatin might differentially mark exons sharing common functional and/or regulatory pathways (Suppl. Fig. S3). For instance, genes with excluded exons marked by H4K20me1+H3K79me2 (SACS4) were strongly enriched in biological terms related to gene expression, chromatin, RNA regulation and protein translation, whereas included events marked by H3K9me3+5mC (SACS2) were strongly related to cancer, and mid-excluded events marked by H3K9ac+H3K14ac (SACS7) were related to sugar metabolism and the oxytocin signalling pathway (Suppl. Fig. S3). Interestingly, the gene ontology terms of genes with excluded exons differentially marked by H4K20me1+H3K79me2 in mice (SACS4) overlapped substantially with those found in human cells (Suppl. Fig. S3), suggesting a conserved functional chromatin marking of alternatively spliced exons involved in specific regulatory pathways. We propose that SACS could coordinate faster and more targeted splicing responses by modifying specific histone marks at key regulatory genes, instead of changing the expression of more pleiotropic splicing regulators.

Following this line of reasoning, if chromatin-marked exons shared common regulatory pathways, this should be reflected by the presence of common RNA motifs. To address this hypothesis, we looked for SACS-specific RNA binding sites, which are responsible for recruiting specific splicing regulators to the pre-mRNA. For each group of chromatin-marked splicing events, we scanned for known RNA motifs from the CISBP-RNA database 41 that were significantly enriched at the chromatin-marked exons, or at their flanking intronic sequences, compared to alternatively spliced exons not marked by a chromatin signature. As expected, we found characteristic RNA motifs in 4 of the 7 SACS identified (Fig. 5). Some of them were shared by more than one SACS, such as hnRNPK and hnRNPL (SACS3, 4 and 5 - Fig.5b,c,d), whereas others were unique to a specific group, such as U1 snRNP and SRSF9 (SACS2 - Fig.5a), or the zinc-finger protein ZNF638 (SACS4 - Fig.5c). Interestingly, ZNF638 is a transcriptional cofactor shown to regulate splicing during adipocyte differentiation by directly interacting with the splicing machinery 42,43. Moreover, hnRNPK and hnRNPL have been shown to interact directly with some histone methyltransferases 44,45, supporting a functional link between chromatin and the recruitment of specific splicing regulators to the pre-mRNA.

We next delved deeper into how these splicing-associated chromatin signatures might impact the splicing machinery.

Splicing-associated chromatin signatures (SACS) can impact the recruitment of RNA binding proteins to the pre-mRNA.

Two models have been proposed for how chromatin regulates splicing: the RNA polymerase II kinetic model and the chromatin-adaptor recruitment model 9. To gain insights into how the newly identified chromatin signatures might regulate splicing, we first looked into the distribution of RNA polymerase II along chromatin-marked exons and their flanking intronic regions, using constitutively spliced and non-marked alternatively spliced exons as controls (Fig.6a). ChIP-seq and NET-seq studies have shown that RNA polymerase II occupancy is higher at exons with well-defined nucleosomes positioned at the intron/exon junction 46,47. These nucleosomes act as roadblocks that modulate RNA polymerase II elongation rate by inducing pausing at specific sites, which widens the window of time for splicing regulators to be recruited to weak splice sites. Importantly, a shift in nucleosome positioning can be sufficient to induce a change in RNA polymerase II pausing and alternative splicing 12. When studying RNA polymerase II occupancy at exons differentially marked by a SACS, we observed that RNA polymerase II levels were significantly higher at exons than at flanking introns, except for SACS1 and 2, in which enrichment levels were similar at the exon and flanking introns (Fig.6a). SACS1 and SACS2 are precisely the groups in which we observed an enrichment of the chromatin marks at the beginning and end of the exon, respectively, suggesting the presence of a nucleosome, or chromatin-binding protein, at these sites. We suggest that SACS1 and SACS2 cause RNA polymerase II to pause around the exon, which favours recruitment of the splicing machinery and thus inclusion of the exon. In support of this, exons affected by elongation rate mutants are longer than non-affected exons 48, which is consistent with SACS1 and SACS2 exons being longer than those of the other groups (Fig. 4b).

Finally, regarding H4K20me1-marked exons (SACS3 and 4), two SACS with clearly enriched RNA binding motifs, we found RNA motifs common to both groups (FXR1, hnRNPA2B1, hnRNPK) (Fig.5c,d). Since these two groups share the same histone mark (H4K20me1) but have opposite splicing outcomes (mid-included and excluded), we suggest a model in which the enrichment of a chromatin signature might impact the recruitment of an RNA binding protein to the regulated exon, thereby modulating the final splicing outcome. To test this hypothesis, we took advantage of available hnRNPK knock-down and eCLIP data from different cell lines to correlate enrichment of SACS4 with functional dependence on the splicing repressor hnRNPK (Fig.6b,c). We found that 86% of SACS4-marked exons were dependent on hnRNPK levels, compared to only 48% of all alternatively spliced exons (Fig.6b). Moreover, hnRNPK was bound to a higher proportion of these hnRNPK-dependent events in chromatin-marked (88%) than in non-marked exons (56%), supporting a link between SACS4 enrichment and functional hnRNPK binding (Fig.6b). Furthermore, when comparing H4K20me1+H3K79me2 and hnRNPK levels between cell lines in which SACS4-marked exons change splicing patterns, we found that 48.5% (16/33) of the exons that stayed excluded between cell lines kept the SACS4 signature and were bound by hnRNPK, whereas only 16% (3/19) of the exons that shifted to included in HepG2 or K562 cells kept these signatures (Fig.6c), further supporting a preferential binding of hnRNPK to H4K20me1+H3K79me2-rich exons.
It is important to note that several of the RNA motifs found to be enriched along these chromatin-marked exons, such as those of hnRNPL, hnRNPK and hnRNPA2B1, belong to proteins reported to interact directly with specific chromatin regulators or to be associated with the chromatin fraction in mass spectrometry analyses 44,45,49, supporting the idea that different combinations of histone marks can influence the recruitment of chromatin and splicing regulators to the pre-mRNA via protein-protein interactions.

Discussion

Since the discovery of co-transcriptional splicing, there have been many attempts to assess at a genome-wide level the extent to which histone modifications differentially mark alternatively spliced exons. Using a supervised machine learning algorithm, we found that 34% of the alternatively spliced cassette exons analysed in human embryonic stem cells were differentially marked by different combinations of 11 chromatin modifications, including DNA methylation. We show for the first time that these splicing-associated chromatin marks are highly localised along the exon and act in a combinatorial way, creating specific chromatin signatures. What matters is therefore not only which histone marks are co-enriched, but also where along the exon they are found. For instance, H3K9me3 and DNA methylation can mark both excluded and included exons depending on whether they are co-enriched at the exon body or downstream of the exon at the 5’ splice site, respectively. Adding to the complexity of chromatin-associated splicing, a single histone mark can also mark both included and excluded exons. The specificity comes from the other histone mark with which it is co-enriched, as in the case of H4K20me1, which marks inclusion when co-enriched with H4K91ac and exclusion when co-enriched with H3K79me2. The global impact of chromatin in differentially marking splicing has thus likely been underestimated in most studies, in which marks were analysed individually and exclusively at the exon body.

Another interesting conclusion from this study is that chromatin-marked exons share common genetic features, such as short exons, weak splice sites and common RNA binding motifs, suggesting a role in exon definition and recruitment of the splicing machinery to the regulated exon. Supporting this model, Yearim et al. also found H3K9me3+5mC differentially enriched at alternatively spliced exons in a highly localised way, where these chromatin modifications regulated splicing by modulating the recruitment of the splicing regulator SRSF3 to the pre-mRNA via protein-protein interaction with the H3K9me3-binding protein HP1 29. We found H3K9me3+5mC-marked exons to be enriched in U1 snRNP and SRSF9 motifs in H1 cells. Only when these chromatin-marked exons are included is RNA polymerase II enriched at the exon and in the downstream flanking intron, precisely where the chromatin marks are differentially enriched. This suggests a chromatin-dependent pausing of RNA polymerase II downstream of the exon, which could impact recruitment of the splicing machinery and inclusion of the regulated exon. Among the other chromatin-marked exons, H4K20me1+H4K91ac- and H4K20me1+H3K79me2-marked exons share common RNA binding motifs, such as hnRNPK. Using available ChIP-seq and eCLIP data from different cell lines from the ENCODE project, we found that exons included in H1 ESC and enriched in H4K20me1+H4K91ac gained H3K79me2 when excluded in other cell types. Furthermore, co-enrichment of H4K20me1+H3K79me2 at excluded exons was positively correlated with functional binding of hnRNPK, reinforcing a role for chromatin in modulating the recruitment of splicing regulators to the pre-mRNA. In support, proteomic analysis of chromatin-enriched proteins showed that hnRNPA2B1, hnRNPL and hnRNPK, whose binding motifs are enriched in several of the chromatin-marked exons identified in this study, are strongly associated with chromatin in an RNA-independent way 49, and hnRNPK was shown to physically interact with the H3K9 histone methyltransferase SETDB1 44. Taken together, these data support a functional link between the observed chromatin signatures and the recruitment of specific splicing regulators to the pre-mRNA that can impact the final splicing outcome.

Finally, we found that these splicing-associated chromatin signatures (SACS) marked not only clearly included and excluded exons, but also exons with intermediate inclusion levels. This suggests that chromatin could regulate the proportion of transcripts that include a particular exon, in a cell population or at the single-cell level, thus creating a protein diversity that could be important for cell adaptability or for the response to external stimuli. Indeed, almost half of the exons included at low levels, with low-usage splice sites (20% to 40% inclusion), are marked by H3K14ac+H3K9ac, and a quarter of those included at 40-80% are marked by H4K20me1+H4K91ac, suggesting that exons found in more than one splicing isoform in the same cell might be more sensitive to epigenetic regulation. In support of this hypothesis, a recent study using single-cell deep sequencing transcriptomics has shown that more than one splicing isoform per gene can be present in a single cell, and that 8% of all the alternatively spliced exons analysed are also differentially marked by DNA methylation 37. Importantly, this methylation-splicing association was stronger when cells were analysed individually than in bulk data using mean values, pointing again to an underestimation of the impact of chromatin on splicing decisions when using cell population data, as in our study 37.

Another caveat of working with correlations is the difficulty of assessing the real functional link between chromatin and splicing. A causal role in splicing has already been demonstrated in some model genes for several of the chromatin marks identified in this study, such as H3K4me3, H3K27me3, H3K9me3 and DNA methylation 19,26,27,29. However, their global impact at a genome-wide level remains unknown. Interestingly, most studies agree that a small subset, rather than the vast majority, of regulated exons is sensitive to chromatin. In our study, 34% of all the cassette exons analysed are marked by some type of chromatin modification. Consistent with this, in HeLa cells, 4% of the alternatively spliced exons analysed are enriched in promoter-like marks, such as H3K9ac and H3K4me3, which can stall RNA polymerase II and promote exon inclusion 12. In HEK293 cells, 16% of cassette exons are sensitive to RNA polymerase II elongation rate mutants 48. Finally, when correlating changes in splicing with changes in chromatin in 34 normal and blood cancer cell lines, 35% of the alternatively spliced exons analysed were enriched in H3K79me2 35. Nevertheless, more functional studies are needed to properly assess the causal link and biological impact of chromatin in cell-specific splicing. It will be of particular interest to study such a link in a disease context, as a potential target for novel therapeutic treatments, and in highly dynamic situations, such as the response to external stimuli. Indeed, in plants, the chromatin remodeler ZmCHB101 has been shown to impact alternative splicing in response to osmotic stress 50. Furthermore, reduction of H3K36me3 or of H3K36me-binding proteins from the MRG15 family, which our group has shown to recruit splicing regulators to the pre-mRNA 27, affects splicing of genes important for Arabidopsis flowering in response to temperature 51,52, supporting a functional link between chromatin and splicing in response to environmental stimuli.

In conclusion, we have identified 11 chromatin modifications that differentially mark alternatively spliced exons in a highly localised and combinatorial way. These chromatin-marked exons share similar inclusion levels and genomic features, such as specific RNA binding motifs. Moreover, we found that a shift in exon inclusion levels between different cell types correlates well with changes in the enrichment levels of the histone marks and splicing regulators predicted by our model, further supporting a role for chromatin in regulating the recruitment of the splicing machinery to the pre-mRNA. The fact that chromatin usually marks shorter exons with longer flanking introns and weaker splice sites supports a role for chromatin in improving exon definition and the recognition of the exon by the splicing machinery 22,49. However, in this study we find that exons are differentially marked by specific combinations of chromatin modifications depending on their level of inclusion, suggesting that it is not just about exon recognition, but also about the level of splicing diversity needed by the cell. Recent single-cell transcriptomics studies in induced pluripotent stem cells have shown that multiple splicing isoforms can be expressed within the same cell and that exon inclusion levels vary from cell to cell, even though the same genetic features and cell-specific transcriptional programs are present 37. Moreover, recent studies have shown that chromatin impacts the splicing of genes important for stem cell differentiation 36 and in response to environmental stimuli and stress 50,51, suggesting that chromatin could provide an additional regulatory layer that modulates highly dynamic changes in splicing and creates the splicing diversity at the single-cell level needed to increase the cell’s adaptability to rapidly changing conditions. Further studies, particularly at the single-cell level and in physiologically relevant model systems, such as disease models, will be essential to better understand the importance of chromatin in splicing and cell biology.

Methods

Cell lines and culture

K562 human immortalized chronic myelogenous leukemia bone marrow cells were grown in IMDM + Glutamax-I (Gibco), supplemented with 10% fetal bovine serum (Sigma) and 1x Pen./Strep. (Sigma). Human cervix adenocarcinoma HeLa S3 cells were cultured in Ham’s F-12 nutrient mix with 2mM L-Glutamine (Gibco), supplemented with 10% fetal bovine serum (Sigma) and 1x Pen./Strep. (Sigma).

Human mammary epithelial MCF10a cells were cultured as previously described 26. Briefly, they were grown in DMEM/F-12 (Sigma) supplemented with 5% horse serum (Thermofisher), 20 ng/ml EGF (Sigma), 0.5 µg/ml hydrocortisone (Sigma), 100 ng/ml cholera toxin (Sigma), 10 µg/ml insulin (Sigma), 2mM L-Glutamine (Thermofisher) and 1x Pen./Strep. (Sigma). All cells were grown at 37 °C, 5% CO2, and were regularly tested for mycoplasma presence.

RNA extraction and real time RT-qPCR

RNA was extracted from cell pellets using Trizol (Life Technologies) and then cleaned up using the GeneJET RNA purification kit (Thermofisher), following the manufacturer’s instructions. 500 ng of total RNA was reverse-transcribed into cDNA using the Transcriptor First Strand cDNA Synthesis kit (Roche, 04897030001). Quantitative PCR was performed on the Bio-Rad CFX-96 Real-Time PCR System using iTaq Universal SYBR Green Supermix (Bio-Rad), as previously described 26. Three or more biological replicates were performed for each experiment, and data were plotted as mean +/- S.E.M. Exon inclusion levels were calculated by normalizing the expression levels of the regulated exon to the total expression levels of the gene, calculated from constitutive exons. See Supplementary Table S2 for a list of primers used.
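A minimal sketch of the exon inclusion calculation described above, assuming the common 2^-Ct relative quantification with ~100% primer efficiency and the mean of the constitutive-exon primer sets as the gene-level reference. The efficiency value, the averaging of constitutive primers and all function names are assumptions, not details taken from the original protocol.

```python
import math
import statistics


def relative_expression(ct: float, efficiency: float = 2.0) -> float:
    """Relative quantity from a Ct value (efficiency 2.0 assumes perfect doubling per cycle)."""
    return efficiency ** (-ct)


def exon_inclusion(ct_exon: float, ct_constitutive: list[float]) -> float:
    """Regulated-exon signal divided by the mean signal of constitutive-exon primer sets."""
    total = statistics.mean(relative_expression(ct) for ct in ct_constitutive)
    return relative_expression(ct_exon) / total


def mean_sem(values: list[float]) -> tuple[float, float]:
    """Mean and standard error of the mean across biological replicates."""
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(len(values))


# Three hypothetical replicates: (regulated-exon Ct, [constitutive-exon Cts])
replicates = [(26.0, [24.0, 24.2]), (25.8, [24.1, 23.9]), (26.3, [24.0, 24.1])]
inclusion = [exon_inclusion(ct_e, ct_c) for ct_e, ct_c in replicates]
mean_incl, sem_incl = mean_sem(inclusion)
```

Because inclusion is a ratio to constitutive exons, it is insensitive to changes in overall gene expression, which is why the text normalizes against constitutive exons.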

Chromatin Immunoprecipitation

K562, HeLa or MCF10a cells were cross-linked with 1% formaldehyde (Thermo Scientific) in DMEM/F-12 (Sigma) for 2 min. The reaction was quenched by adding glycine (Sigma) to a final concentration of 125 mM for 5 min. Cells were then washed twice with PBS and resuspended in 1 ml of lysis buffer A [50 mM Hepes (Sigma, H3375) pH 7.5; 140 mM NaCl (Sigma, S5150); 1 mM EDTA (Gibco, 15575-038); 10% Glycerol; 0.5% IGEPAL CA-630 (Sigma, I3021); 0.25% Triton X-100 (Sigma, X100); 1x Complete protease inhibitor mixture (Roche, 4693159001), 200 nM PMSF (Sigma, P7626)]. After 10 min on ice, the cells were pelleted and resuspended in 1 ml of lysis buffer B [10 mM Tris-HCl (Sigma, T2663) pH 8.0; 200 mM NaCl (Sigma, S5150); 1 mM EDTA (Gibco, 15575-038); 0.5 mM EGTA (Bioworld, 40520008-2); 1x protease inhibitors (Roche, 4693159001); 200 nM PMSF (Sigma, P7626)]. After 10 min at room temperature, cells were sonicated in lysis buffer C [10 mM Tris-HCl (Sigma, T2663) pH 8.0; 100 mM NaCl (Sigma, S5150); 1 mM EDTA (Gibco, 15575-038); 0.5 mM EGTA (Bioworld, 40520008-2); 0.1% sodium deoxycholate (Sigma, 30970); 0.5% N-lauroylsarcosine (MP Biomedicals, 190110); 1x protease inhibitors (Roche, 4693159001); 200 nM PMSF (Sigma, P7626)] using a Diagenode Bioruptor for 12 cycles (30 sec ON; 50 sec OFF) to obtain ~200-500 bp fragments. Cell debris was removed by centrifugation at 14,000 rpm for 20 min, and 8 μg of chromatin was incubated with either anti-H3K79me2 (Abcam, ab3594) or anti-H4K20me1 (Abcam, ab9051) antibodies overnight at 4 °C. Protein G-conjugated magnetic beads (Invitrogen, 10009D) were added the next day for 2 hours. Subsequent washing and reverse cross-linking were performed as previously described 26. Quantitative PCR was performed on the Bio-Rad CFX-96 Real-Time PCR System using iTaq Universal SYBR Green Supermix (Bio-Rad) as previously described 26. Three or more biological replicates were performed for each experiment, and data were plotted as mean +/- S.E.M. ChIP enrichment for each primer set was evaluated as the percentage of input normalized to a negative primer set, used as background. See Supplementary Table S2 for a list of primers used.
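The percent-input normalisation can be sketched as follows, assuming ~100% amplification efficiency and an input kept at a known fraction of the chromatin. The 1% input fraction is a hypothetical value (the input amount is not stated above), and the function names are illustrative only.

```python
import math


def percent_input(ct_ip: float, ct_input: float, input_fraction: float) -> float:
    """Percent of input chromatin recovered by the IP, assuming ~100% primer efficiency.

    input_fraction is the share of chromatin kept as input (e.g. 0.01 for 1%);
    the input Ct is shifted so that it represents 100% of the starting material.
    """
    adjusted_input_ct = ct_input - math.log2(1 / input_fraction)
    return 100 * 2 ** (adjusted_input_ct - ct_ip)


def normalised_enrichment(target: tuple[float, float],
                          negative: tuple[float, float],
                          input_fraction: float = 0.01) -> float:
    """Percent input at a target primer set divided by that at a negative (background) set.

    Each tuple is (IP Ct, input Ct) for one primer set.
    """
    return (percent_input(*target, input_fraction)
            / percent_input(*negative, input_fraction))
```

Dividing by the negative primer set cancels differences in overall IP efficiency between samples, so values above 1 indicate enrichment over background.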

Chromatin modification analysis of ChIP-seq data

ChIP-seq and MeDIP-seq data were obtained from the ENCODE portal (https://www.encodeproject.org/), which compiles both ENCODE and ROADMAP data.

Reads were mapped to the hg19 reference genome using Bowtie 55, keeping the best unique matches with at most two mismatches to the reference (-v 2 --best --strata -m 1). Reads were extended to 200 nt in the 5′ to 3′ direction using Pyicos 56. Then, using BedTools 57, we removed repetitive reads overlapping centromeres, gaps, satellites and pericentromeric regions. For each sample, we used Pyicos to build clusters of overlapping reads along the genome, discarding single-read clusters. To recover the original reads from the clusters we used BedTools intersect -u. For downstream analyses, we used the reads assigned to the clusters to calculate the ChIP-seq and MeDIP-seq coverage over defined genomic regions with BedTools coverage, normalizing by region length and total number of reads.
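Two of the steps above, strand-aware extension of reads to 200 nt and coverage normalised by region length and library size, can be sketched in a few lines. In the actual pipeline these steps were run with Pyicos and BedTools; this is only an illustration, the helper names are hypothetical, and the exact scaling (overlapping reads per nucleotide per million mapped reads) is an assumption.

```python
def extend_read(start: int, end: int, strand: str, fragment_length: int = 200) -> tuple[int, int]:
    """Extend a mapped read to fragment_length in its 5'->3' direction.

    0-based, half-open coordinates. Plus-strand reads keep their start (5' end)
    and grow rightwards; minus-strand reads keep their end (5' end) and grow leftwards.
    """
    if strand == "+":
        return start, start + fragment_length
    if strand == "-":
        return max(0, end - fragment_length), end
    raise ValueError(f"unknown strand: {strand!r}")


def normalised_coverage(reads: list[tuple[int, int]], region: tuple[int, int],
                        total_reads: int) -> float:
    """Reads overlapping a region, per nucleotide of region, per million mapped reads."""
    r_start, r_end = region
    overlapping = sum(1 for start, end in reads if start < r_end and end > r_start)
    return overlapping / (r_end - r_start) / (total_reads / 1e6)


# A 36 nt minus-strand read is extended towards lower coordinates
fragment = extend_read(500, 536, "-")
```

Normalising by region length makes short exons and 200 nt intronic windows comparable, and normalising by library size makes samples with different sequencing depths comparable.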

Alternative Splicing Analysis from RNA-seq data

Available paired-end RNA-seq data were downloaded from the ENCODE portal (https://www.encodeproject.org/), which compiles both ENCODE and ROADMAP data. Percent spliced-in (PSI) values of alternative splicing events were calculated from the raw fastq files, in which reads were filtered following the ENCODE guidelines, for human H1 hESC, K562, IMR90, A549, Gm12878, HEK293, HelaS3, HepG2, NHEK, HMEC, HSMM, HUVEC and MCF7 cell lines and mouse mESC files. RNA-seq from the nuclear fraction was used for downstream analysis when total RNA-seq was not available. Data were aligned to the hg19 genome with STAR 2.3.0e 53 with parameters --runThreadN 4 --outFilterMismatchNmax 2 --clip3pAdapterSeq AAAAAA --clip3pAdapterMMp 0 --outSJfilterReads Unique --alignSJDBoverhangMin 3 --alignSJoverhangMin 5 --outSJfilterOverhangMin 30 12 12 12 --outSJfilterCountUniqueMin 5 1 1 1 --outSJfilterIntronMaxVsReadN 50000 100000 200000 --sjdbScore 2 --outFilterType BySJout --outSAMattributes All --seedSearchStartLmax 50, using an overhang of 99 nt for long RNA-seq reads and a custom database of exon-intron junction annotations from https://github.com/nellore/intropolis. All exons and introns were extracted from Biomart Ensembl72 annotations. Alternatively spliced cassette exons, in which an alternatively spliced exon is flanked by two constitutive exons, were extracted. First and second exons were excluded from the analysis to avoid a chromatin effect from the transcription start sites. Using the Ensembl biotype term, we also discarded from the final dataset all exons not labelled as protein coding or noncoding RNA. We considered as constitutive exons all exons annotated with the constitutive exon term in Ensembl72 and coming from the same transcripts as the selected alternatively spliced exons. For each cell line, the SJ.out.tab file from the STAR output was filtered to recover all exon-intron junctions supported by at least 5 reads. We extracted the number of exclusion and inclusion reads at all exon-intron and exon-exon junctions from the final exon triplet dataset. Final PSI was calculated as the number of reads including the exon divided by the sum of the exclusion and inclusion reads. For the inclusion reads we used the average value, since reads are expected from both the acceptor and donor sites. Based on the cumulative distribution of PSI in H1 and IMR90 cell lines, four splicing groups were created: excluded (0%= 10 to discard lowly expressed gene. When comparing different cell lines, alternatively spliced genes with different expression levels were excluded to avoid confounding transcriptional effects.
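The PSI formula above, with inclusion reads averaged over the two inclusion junctions, can be written as a short sketch. The function names and the dictionary input format are hypothetical; the 5-read junction filter mirrors the SJ.out.tab filtering step, and the group thresholds are not reproduced here.

```python
def filter_junctions(junction_counts: dict[str, int], min_reads: int = 5) -> dict[str, int]:
    """Keep junctions supported by at least min_reads reads (as for the SJ.out.tab filter)."""
    return {junction: n for junction, n in junction_counts.items() if n >= min_reads}


def psi(upstream_inclusion: int, downstream_inclusion: int, exclusion: int) -> float:
    """Fraction of transcripts including a cassette exon (multiply by 100 for a percentage).

    Inclusion is the mean of the acceptor-side and donor-side inclusion junctions,
    because every included transcript can contribute a read to each of them,
    whereas the exclusion (skipping) junction is counted once.
    """
    inclusion = (upstream_inclusion + downstream_inclusion) / 2
    total = inclusion + exclusion
    if total == 0:
        raise ValueError("no junction reads for this event")
    return inclusion / total
```

Without averaging, included transcripts would be counted twice relative to skipped ones, inflating PSI.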

Random Forest analysis

To find the epigenetic features informative for classifying splicing events into the 4 pre-defined splicing groups (excluded, mid-excluded, mid-included and included), we first calculated, in H1 hESC and IMR90 cell lines, the coverage of each histone modification and of methyl DNA at constitutive and alternatively spliced exons, together with the corresponding 200 nt intronic regions from the acceptor (3’ss) and donor (5’ss) sites, using BedTools coverage and normalizing by the total number of reads and region length. For each alternative exon, we also calculated the splice site strength with MaxEnt 58. We used splice site strengths and chromatin coverages over the defined exonic and intronic regions as classifying features for the Random Forest analysis.
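Because the acceptor (3’ss) and donor (5’ss) sides swap genomic orientation on the minus strand, defining the 200 nt intronic windows requires strand awareness. A minimal sketch, assuming 0-based half-open coordinates as in BED files (the function name is hypothetical):

```python
def splice_site_windows(exon_start: int, exon_end: int, strand: str,
                        flank: int = 200) -> dict[str, tuple[int, int]]:
    """Exon body plus the intronic windows at its acceptor (3'ss) and donor (5'ss) sides.

    On the plus strand the acceptor lies at the lower coordinate; on the minus
    strand transcription runs right to left, so the two intronic windows swap.
    """
    left = (max(0, exon_start - flank), exon_start)
    right = (exon_end, exon_end + flank)
    exon = (exon_start, exon_end)
    if strand == "+":
        return {"acceptor_intron": left, "exon": exon, "donor_intron": right}
    if strand == "-":
        return {"acceptor_intron": right, "exon": exon, "donor_intron": left}
    raise ValueError(f"unknown strand: {strand!r}")
```

Each of the three windows would then receive its own normalized chromatin coverage, giving three features per mark for the classifier.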

Selection of epigenetic features was performed using Boruta 38, separately on H1 hESC and IMR90 alternatively spliced exons. Boruta compares the importance of each feature with that of randomised (shadow) copies of the features drawn from the same dataset. Boruta was run with the following parameters: mcAdj=TRUE, maxRuns=100000, getImp=getImpRfZ, doTrace=2, ntree=100000, for each of the pairwise comparisons between the four defined splicing groups. DNA methylation and 14 histone modifications were found in common as informative features to classify exons into the four pre-defined splicing groups in H1 and IMR90 cells. As expected, splice site strength ranked as the most informative feature for inclusion and exclusion.

Method to identify the Splicing-associated chromatin signatures (SACS)

To test the co-enrichment of pairs of histone and DNA marks at alternatively spliced exons, we looked for overlaps of ChIP-seq and MeDIP-seq reads along the alternatively spliced exons and their upstream and downstream 200 nt flanking intronic regions, running version 0.8.1 of the Block Bootstrap and Segmentation script 60 with parameters -r 0.1 -n 10000, where r is the fraction of each region in each sample and n is the number of bootstrap samples. As input data, we used all the combinations of histone marks and DNA methylation overlapping at the alternative exon and flanking introns, compared to the background genome. With this method, we calculated a z-score corresponding to the number of standard deviations separating the observed overlap from the randomly expected one. The script was also run on a dataset of constitutive and flanking exons, used as controls. From the output z-scores, we filtered out any chromatin pair that was enriched in more than one splicing or control group. We obtained 7 unique pairs of chromatin modifications specifically enriched at alternatively spliced exons depending on the level of exon inclusion. As control alternatively spliced exons not marked by chromatin in downstream analyses, we used exons from the same genes as the chromatin-marked exons that did not show an enrichment of the chromatin pairs identified by our approach. Similarly, constitutive exons were selected from these same genes. For consistency, we used a comparable number of randomly selected control exons in all comparisons: 600 constitutive, 600 excluded and 600 included exons.
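A simplified sketch of the two computations described above: a z-score of the observed overlap against a null distribution of overlaps, and the filter keeping only chromatin pairs enriched in a single splicing group. In the real analysis the null overlaps come from the block bootstrap script; here they are just a list of numbers. The z > 2 enrichment cut-off, the group names and the toy pair names are hypothetical, as the threshold is not stated above.

```python
import statistics


def overlap_zscore(observed: float, null_overlaps: list[float]) -> float:
    """Number of standard deviations between the observed and the null (expected) overlap."""
    mu = statistics.mean(null_overlaps)
    sd = statistics.stdev(null_overlaps)
    return (observed - mu) / sd


def unique_pairs(zscores: dict[str, dict[str, float]], threshold: float = 2.0,
                 control_groups: tuple[str, ...] = ("constitutive",)) -> dict[str, str]:
    """Map each chromatin pair to the single splicing group in which it is enriched.

    Pairs enriched (z > threshold) in more than one group, or only in a control
    group, are discarded.
    """
    enriched_in: dict[str, list[str]] = {}
    for group, pairs in zscores.items():
        for pair, z in pairs.items():
            if z > threshold:
                enriched_in.setdefault(pair, []).append(group)
    return {
        pair: groups[0]
        for pair, groups in enriched_in.items()
        if len(groups) == 1 and groups[0] not in control_groups
    }


# Toy z-scores, not data from this study
toy = {
    "excluded": {"markA+markB": 3.1, "markC+markD": 2.5},
    "included": {"markC+markD": 4.0, "markE+markF": 1.0},
    "constitutive": {"markG+markH": 5.0},
}
signatures = unique_pairs(toy)
```

In the toy example only markA+markB survives: markC+markD is enriched in two groups, markE+markF is not enriched, and markG+markH is enriched only in the control group.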


RNA Binding Motif Analysis

RNA binding protein motif enrichment was calculated on the exonic and intronic regions of each chromatin-marked event. We used a maximum intronic flank of 250 nt of the upstream and downstream introns of each chromatin-marked event, removing 9 nt at the donor site and 30 nt at the acceptor site to avoid branch point (BP), splice site (SS) and polypyrimidine tract (PPT) signals, as described in Llorian et al. 61. Introns shorter than 60 nt were discarded. We calculated the enrichment of 5-mers and RNAcompete motif matches as described in Coelho et al. 62. As a control, we retrieved intronic and exonic regions from a set of excluded and included exons that do not overlap with the histone and 5mC signal. Statistically over-represented motifs were selected based on the Benjamini-Hochberg false discovery rate corrected p-value (BH-FDR < 0.05).
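A sketch of the intronic window selection and the Benjamini-Hochberg correction, under one reading of the trimming rule: the first 9 nt (5’ss) and last 30 nt (BP, PPT and 3’ss) of every intron are removed, and at most 250 nt adjacent to the chromatin-marked exon are kept. This interpretation and the function names are assumptions; the BH step is the standard step-up procedure.

```python
def intron_window(intron_length: int, side: str, max_flank: int = 250,
                  trim_donor: int = 9, trim_acceptor: int = 30, min_length: int = 60):
    """Intronic window next to the exon, as 0-based half-open offsets from the intron 5' end.

    side="upstream": the intron ends at the exon acceptor, so keep its 3' part.
    side="downstream": the intron starts at the exon donor, so keep its 5' part.
    Returns None for introns shorter than min_length.
    """
    if intron_length < min_length:
        return None
    lo, hi = trim_donor, intron_length - trim_acceptor
    if side == "upstream":
        lo = max(lo, intron_length - max_flank)
    elif side == "downstream":
        hi = min(hi, max_flank)
    else:
        raise ValueError(f"unknown side: {side!r}")
    return lo, hi


def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """BH-adjusted p-values (FDR), returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonic adjusted values
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Motifs would then be kept when their adjusted value is below 0.05, matching the BH-FDR threshold above.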

Analysis of hnRNPK knock-down and eCLIP publicly available data

Illumina RNA-seq data from lymphoblastoid GM19238 cells in which hnRNPK was knocked down with small interfering RNAs (siRNAs) (GSE52834: SRR1040861, SRR1040862 and SRR1040863) were analysed using VAST-TOOLS 63 to identify all exons that were affected, or not, by hnRNPK knockdown compared to control siRNAs. Exons that were also expressed in H1 cells were kept as potential hnRNPK-dependent exons for further analysis. Processed eCLIP bed files from HepG2 and K562 cell lines were downloaded from the ENCODE data portal using the hg19 genome assembly, and the exonic and intronic regions of the selected chromatin-marked events were overlapped with them using BedTools intersect 57. We then counted the number of occurrences and applied Fisher’s exact test to assess the differences between groups.
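Fisher’s exact test on a 2x2 table can be computed directly from hypergeometric probabilities. The sketch below is a minimal two-sided version using only the Python standard library; the example table reuses the 16/33 versus 3/19 counts reported for Fig.6c purely for illustration, and the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same margins
    that is no more likely than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x: int) -> float:
        return comb(row1, x) * comb(n - row1, col1 - x) / comb(n, col1)

    observed = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    total = 0.0
    for x in range(lo, hi + 1):
        p = prob(x)
        # Small relative tolerance so tables tied with the observed one are counted
        if p <= observed * (1 + 1e-7):
            total += p
    return min(1.0, total)


# Rows: exons that stayed excluded / shifted to included
# Columns: kept SACS4 and hnRNPK binding / did not
p_value = fisher_exact_two_sided(16, 33 - 16, 3, 19 - 3)
```

For large counts, a library implementation such as scipy.stats.fisher_exact gives the same two-sided result and is the more practical choice.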

Gene ontology enrichment analysis

To identify the biological processes enriched in each group of chromatin-marked exons, we ran the Enrichr R package 40 for KEGG_2019_Human pathways and GO_Biological_process_2018. Enrichr reports a Fisher’s exact test p-value (p) and a combined score (c), in which the p-value is combined with the z-score (z) of the deviation from the expected rank: $c = \log(p) \cdot z$. GO terms were selected based on this p-value (p < 0.01) and the combined score.
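The term selection described above can be sketched as follows, assuming the natural logarithm in the combined score and a hypothetical (name, p, z) input format. Because enriched terms have negative z-scores and log(p) is negative, their combined score is positive.

```python
import math


def combined_score(p: float, z: float) -> float:
    """Enrichr-style combined score c = log(p) * z (natural log assumed)."""
    return math.log(p) * z


def select_terms(terms: list[tuple[str, float, float]],
                 p_cutoff: float = 0.01) -> list[tuple[str, float]]:
    """Keep terms with p < p_cutoff, ranked by decreasing combined score."""
    kept = [(name, combined_score(p, z)) for name, p, z in terms if p < p_cutoff]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Toy terms, not results from this study
ranked = select_terms([("term_a", 0.001, -2.0), ("term_b", 0.05, -5.0), ("term_c", 0.005, -1.0)])
```

Here term_b is dropped by the p-value cut-off even though its z-score is large, and the remaining terms are ordered by combined score.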


Statistical analysis

Statistical analysis was performed using R version 3.4.3. In Fig.3, we performed Fisher’s exact test and the two-sided Wilcoxon rank-sum test. In Fig.4, 6 and Supplementary Fig.1, we performed a two-tailed Wilcoxon rank-sum test, with the following n per group. Fig.4 and 6: H4K20me1+H3K79me2 exons n=152, H3K14ac+H3K9ac exons n=139, H3K9me3+5mC excluded exons n=89, H4K20me1+H4K91ac exons n=143, H3K9me3+5mC included exons n=142, excluded exons n=600, included exons n=600 and constitutive exons n=600; Suppl. Fig.1: excluded exons n=950, mid-excluded exons n=332, mid-included exons n=634 and included exons n=675. In Fig.5, the -log10 adjusted p-value of the enrichment FDR and the associated adjusted p-value were calculated from the same number of events as in Fig.4 and 6.
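The group comparisons above were run in R. For readers working outside R, a minimal standard-library sketch of the two-sided Wilcoxon rank-sum (Mann-Whitney U) test with a normal approximation is shown below. It is not equivalent to R's wilcox.test, which computes exact p-values for small samples without ties and applies a continuity correction by default; with groups of the sizes listed above the normal approximation is usually close, but this sketch omits the tie correction to the variance.

```python
import math


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def wilcoxon_rank_sum(x: list[float], y: list[float]) -> tuple[float, float]:
    """Return (U statistic for x, two-sided p-value) using a normal approximation."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    # Two-sided p = 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    return u, math.erfc(abs(z) / math.sqrt(2))
```

The U statistic reported here matches the W statistic of wilcox.test for the first sample, so the two can be compared directly.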

Data availability

Supplementary Table S1

References

1 Irimia, M. & Blencowe, B. J. Alternative splicing: decoding an expansive regulatory layer. Curr Opin Cell Biol 24, 323-332, doi:10.1016/j.ceb.2012.03.005 (2012).

2 Wang, E. T. et al. Alternative isoform regulation in human tissue transcriptomes. Nature 456, 470-476, doi:10.1038/nature07509 (2008).

3 Wahl, M. C., Will, C. L. & Luhrmann, R. The spliceosome: design principles of a dynamic RNP machine. Cell 136, 701-718, doi:10.1016/j.cell.2009.02.009 (2009).

4 Fu, X. D. & Ares, M., Jr. Context-dependent control of alternative splicing by RNA-binding proteins. Nat Rev Genet 15, 689-701, doi:10.1038/nrg3778 (2014).

5 Busch, A. & Hertel, K. J. Splicing predictions reliably classify different types of alternative splicing. RNA 21, 813-823, doi:10.1261/.048769.114 (2015).

6 Rosenberg, A. B., Patwardhan, R. P., Shendure, J. & Seelig, G. Learning the sequence determinants of alternative splicing from millions of random sequences. Cell 163, 698-711, doi:10.1016/j.cell.2015.09.054 (2015).

7 Barash, Y. et al. Deciphering the splicing code. Nature 465, 53-59, doi:10.1038/nature09000 (2010).

8 Amit, M. et al. Differential GC content between exons and introns establishes distinct strategies of splice-site recognition. Cell Rep 1, 543-556, doi:10.1016/j.celrep.2012.03.013 (2012).


9 Luco, R. F., Allo, M., Schor, I. E., Kornblihtt, A. R. & Misteli, T. Epigenetics in alternative pre-mRNA splicing. Cell 144, 16-26, doi:10.1016/j.cell.2010.11.056 (2011).

10 Luco, R. F. & Misteli, T. More than a splicing code: integrating the role of RNA, chromatin and non-coding RNA in alternative splicing regulation. Curr Opin Genet Dev 21, 366-372, doi:10.1016/j.gde.2011.03.004 (2011).

11 Braunschweig, U., Gueroussov, S., Plocik, A. M., Graveley, B. R. & Blencowe, B. J. Dynamic integration of splicing within gene regulatory pathways. Cell 152, 1252-1269, doi:10.1016/j.cell.2013.02.034 (2013).

12 Curado, J., Iannone, C., Tilgner, H., Valcarcel, J. & Guigo, R. Promoter-like epigenetic signatures in exons displaying cell type-specific splicing. Genome Biol 16, 236, doi:10.1186/s13059-015-0797-8 (2015).

13 Shukla, S. et al. CTCF-promoted RNA polymerase II pausing links DNA methylation to splicing. Nature 479, 74-79, doi:10.1038/nature10442 (2011).

14 Maunakea, A. K., Chepelev, I., Cui, K. & Zhao, K. Intragenic DNA methylation modulates alternative splicing by recruiting MeCP2 to promote exon recognition. Cell Res 23, 1256-1269, doi:10.1038/cr.2013.110 (2013).

15 Kornblihtt, A. R. Promoter usage and alternative splicing. Curr Opin Cell Biol 17, 262-268, doi:10.1016/j.ceb.2005.04.014 (2005).

16 Han, H. et al. Multilayered Control of Alternative Splicing Regulatory Networks by Transcription Factors. Mol Cell 65, 539-553.e7, doi:10.1016/j.molcel.2017.01.011 (2017).

17 Chodavarapu, R. K. et al. Relationship between nucleosome positioning and DNA methylation. Nature 466, 388-392, doi:10.1038/nature09147 (2010).

18 Batsche, E., Yaniv, M. & Muchardt, C. The human SWI/SNF subunit Brm is a regulator of alternative splicing. Nat Struct Mol Biol 13, 22-29, doi:10.1038/nsmb1030 (2006).

19 Saint-Andre, V., Batsche, E., Rachez, C. & Muchardt, C. Histone H3 lysine 9 trimethylation and HP1gamma favor inclusion of alternative exons. Nat Struct Mol Biol 18, 337-344, doi:10.1038/nsmb.1995 (2011).

20 Agirre, E. et al. A chromatin code for alternative splicing involving a putative association between CTCF and HP1alpha proteins. BMC Biol 13, 31, doi:10.1186/s12915-015-0141-5 (2015).

21 Tilgner, H. et al. Nucleosome positioning as a determinant of exon recognition. Nat Struct Mol Biol 16, 996-1001, doi:10.1038/nsmb.1658 (2009).

22 Schwartz, S., Meshorer, E. & Ast, G. Chromatin organization marks exon-intron structure. Nat Struct Mol Biol 16, 990-995, doi:10.1038/nsmb.1659 (2009).

23 Andersson, R., Enroth, S., Rada-Iglesias, A., Wadelius, C. & Komorowski, J. Nucleosomes are well positioned in exons and carry characteristic histone modifications. Genome Res 19, 1732-1741, doi:10.1101/gr.092353.109 (2009).

24 Kolasinska-Zwierz, P. et al. Differential chromatin marking of introns and expressed exons by H3K36me3. Nat Genet 41, 376-381, doi:10.1038/ng.322 (2009).

25 Spies, N., Nielsen, C. B., Padgett, R. A. & Burge, C. B. Biased chromatin signatures around polyadenylation sites and exons. Mol Cell 36, 245-254, doi:10.1016/j.molcel.2009.10.008 (2009).

26 Gonzalez, I. et al. A lncRNA regulates alternative splicing via establishment of a splicing-specific chromatin signature. Nat Struct Mol Biol 22, 370-376, doi:10.1038/nsmb.3005 (2015).

27 Luco, R. F. et al. Regulation of alternative splicing by histone modifications. Science 327, 996-1000, doi:10.1126/science.1184208 (2010).

28 Schor, I. E., Rascovan, N., Pelisch, F., Allo, M. & Kornblihtt, A. R. Neuronal cell depolarization induces intragenic chromatin modifications affecting NCAM alternative splicing. Proc Natl Acad Sci U S A 106, 4325-4330, doi:10.1073/pnas.0810666106 (2009).

29 Yearim, A. et al. HP1 is involved in regulating the global impact of DNA methylation on alternative splicing. Cell Rep 10, 1122-1134, doi:10.1016/j.celrep.2015.01.038 (2015).

30 Pradeepa, M. M., Sutherland, H. G., Ule, J., Grimes, G. R. & Bickmore, W. A. Psip1/Ledgf p52 Binds Methylated Histone H3K36 and Splicing Factors and Contributes to the Regulation of Alternative Splicing. PLoS Genet 8, e1002717, doi:10.1371/journal.pgen.1002717 (2012).

31 Zhou, Y., Lu, Y. & Tian, W. Epigenetic features are significantly associated with alternative splicing. BMC Genomics 13, 123, doi:10.1186/1471-2164-13-123 (2012).

32 Podlaha, O., De, S., Gonen, M. & Michor, F. Histone modifications are associated with transcript isoform diversity in normal and cancer cells. PLoS Comput Biol 10, e1003611, doi:10.1371/journal.pcbi.1003611 (2014).

33 Enroth, S., Bornelov, S., Wadelius, C. & Komorowski, J. Combinations of histone modifications mark exon inclusion levels. PLoS One 7, e29911, doi:10.1371/journal.pone.0029911 (2012).

34 Feng, J. et al. Chronic cocaine-regulated epigenomic changes in mouse nucleus accumbens. Genome Biol 15, R65, doi:10.1186/gb-2014-15-4-r65 (2014).

35 Li, T., Liu, Q., Garza, N., Kornblau, S. & Jin, V. X. Integrative analysis reveals functional and regulatory roles of H3K79me2 in mediating alternative splicing. Genome Med 10, 30, doi:10.1186/s13073-018-0538-1 (2018).

36 Xu, Y., Zhao, W., Olson, S. D., Prabhakara, K. S. & Zhou, X. Alternative splicing links histone modifications to stem cell fate decision. Genome Biol 19, 133, doi:10.1186/s13059-018-1512-3 (2018).

37 Linker, S. M. et al. Combined single-cell profiling of expression and DNA methylation reveals splicing regulation and heterogeneity. Genome Biol 20, 30, doi:10.1186/s13059-019-1644-0 (2019).

38 Kursa, M. B. & Rudnicki, W. R. Feature Selection with the Boruta Package. J Stat Softw 36, doi:10.18637/jss.v036.i11 (2010).

39 Gunderson, F. Q. & Johnson, T. L. Acetylation by the transcriptional coactivator Gcn5 plays a novel role in co-transcriptional spliceosome assembly. PLoS Genet 5, e1000682, doi:10.1371/journal.pgen.1000682 (2009).

40 Kuleshov, M. V. et al. Enrichr: a comprehensive gene set enrichment analysis web server 2016 update. Nucleic Acids Res 44, W90-97, doi:10.1093/nar/gkw377 (2016).

41 Ray, D. et al. A compendium of RNA-binding motifs for decoding gene regulation. Nature 499, 172-177, doi:10.1038/nature12311 (2013).

42 Du, C., Ma, X., Meruvu, S., Hugendubler, L. & Mueller, E. The adipogenic transcriptional cofactor ZNF638 interacts with splicing regulators and influences alternative splicing. J Lipid Res 55, 1886-1896, doi:10.1194/jlr.M047555 (2014).

43 Meruvu, S., Hugendubler, L. & Mueller, E. Regulation of adipocyte differentiation by the zinc finger protein ZNF638. J Biol Chem 286, 26516-26523, doi:10.1074/jbc.M110.212506 (2011).

44 Thompson, P. J. et al. hnRNP K coordinates transcriptional silencing by SETDB1 in embryonic stem cells. PLoS Genet 11, e1004933, doi:10.1371/journal.pgen.1004933 (2015).

45 Yuan, W. et al. Heterogeneous nuclear ribonucleoprotein L Is a subunit of human KMT3a/Set2 complex required for H3 Lys-36 trimethylation activity in vivo. J Biol Chem 284, 15701-15707, doi:10.1074/jbc.M808431200 (2009).

46 Nojima, T., Gomes, T., Carmo-Fonseca, M. & Proudfoot, N. J. Mammalian NET-seq analysis defines nascent RNA profiles and associated RNA processing genome-wide. Nat Protoc 11, 413-428, doi:10.1038/nprot.2016.012 (2016).

47 Brodsky, A. S. et al. Genomic mapping of RNA polymerase II reveals sites of co-transcriptional regulation in human cells. Genome Biol 6, R64, doi:10.1186/gb-2005-6-8-r64 (2005).

48 Fong, N. et al. Pre-mRNA splicing is facilitated by an optimal RNA polymerase II elongation rate. Genes Dev 28, 2663-2676, doi:10.1101/gad.252106.114 (2014).

49 Kfir, N. et al. SF3B1 association with chromatin determines splicing outcomes. Cell Rep 11, 618-629, doi:10.1016/j.celrep.2015.03.048 (2015).

50 Yu, X. et al. The chromatin remodeler ZmCHB101 impacts alternative splicing contexts in response to osmotic stress. Plant Cell Rep 38, 131-145, doi:10.1007/s00299-018-2354-x (2019).

51 Pajoro, A., Severing, E., Angenent, G. C. & Immink, R. G. H. Histone H3 lysine 36 methylation affects temperature-induced alternative splicing and flowering in plants. Genome Biol 18, 102, doi:10.1186/s13059-017-1235-x (2017).

52 Bu, Z. et al. Regulation of arabidopsis flowering by the histone mark readers MRG1/2 via interaction with CONSTANS to modulate FT expression. PLoS Genet 10, e1004617, doi:10.1371/journal.pgen.1004617 (2014).

53 Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15-21, doi:10.1093/bioinformatics/bts635 (2013).

54 Patro, R., Duggal, G., Love, M. I., Irizarry, R. A. & Kingsford, C. Salmon provides fast and bias-aware quantification of transcript expression. Nat Methods 14, 417-419, doi:10.1038/nmeth.4197 (2017).

55 Langmead, B., Trapnell, C., Pop, M. & Salzberg, S. L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10, R25, doi:10.1186/gb-2009-10-3-r25 (2009).

56 Althammer, S., Gonzalez-Vallinas, J., Ballare, C., Beato, M. & Eyras, E. Pyicos: a versatile toolkit for the analysis of high-throughput sequencing data. Bioinformatics 27, 3333-3340, doi:10.1093/bioinformatics/btr570 (2011).

57 Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841-842, doi:10.1093/bioinformatics/btq033 (2010).

58 Yeo, G. & Burge, C. B. Maximum entropy modeling of short sequence motifs with applications to RNA splicing signals. J Comput Biol 11, 377-394, doi:10.1089/1066527041410418 (2004).

59 Corvelo, A., Hallegger, M., Smith, C. W. & Eyras, E. Genome-wide association between branch point properties and alternative splicing. PLoS Comput Biol 6, e1001016, doi:10.1371/journal.pcbi.1001016 (2010).

60 Bickel, P. J., Boley, N., Brown, J. B., Huang, H. & Zhang, N. R. Subsampling methods for genomic inference. Ann Appl Stat 4, 1660-1697, doi:10.1214/10-AOAS363 (2010).

61 Llorian, M. et al. The alternative splicing program of differentiated smooth muscle cells involves concerted non-productive splicing of post-transcriptional regulators. Nucleic Acids Res 44, 8933-8950, doi:10.1093/nar/gkw560 (2016).

62 Coelho, M. B. et al. Nuclear matrix protein Matrin3 regulates alternative splicing and forms overlapping regulatory networks with PTB. EMBO J 34, 653-668, doi:10.15252/embj.201489852 (2015).

63 Tapial, J. et al. An atlas of alternative splicing profiles and functional associations reveals new regulatory programs and genes that simultaneously express multiple major isoforms. Genome Res 27, 1759-1768, doi:10.1101/gr.220962.117 (2017).

64 Supek, F., Bosnjak, M., Skunca, N. & Smuc, T. REVIGO summarizes and visualizes long lists of gene ontology terms. PLoS One 6, e21800, doi:10.1371/journal.pone.0021800 (2011).

Acknowledgements

We thank Eduardo Eyras (Australian National University-EMBL Australia) for valuable suggestions and comments on the manuscript, Nadine Laguette's laboratory (IGH, Montpellier, France) for HeLa-S3 cells and Michael Hahne's laboratory (IGMM, Montpellier, France) for the K562 cell line. This project was funded by the ANR young investigators program (ANR-16-CE12-0012-01/CT141033), the EpiGeneSys European Network of Excellence, the ARC Foundation for cancer research and the Marie Curie framework. N.B. was supported by CONICET.

Author contributions

E.A. performed all the computational analyses. A.O. and A.S. performed all the experimental analyses. N.B. performed the motif search analysis. E.A. and R.F.L. designed the experiments and wrote the manuscript.

Figure Legends

Figure 1. Schematic pipeline of the machine learning approach used to identify the chromatin modifications that classify exons into four splicing categories. a. Cumulative distribution of alternatively spliced exons in human H1 embryonic stem cells and IMR90 fetal fibroblasts, based on available RNA-seq datasets. Four arbitrary groups were defined based on the percentage of exon inclusion (PSI). Each category has its own color code, with light blue for well excluded (0

[…] defined splicing categories in H1 and IMR90 cells. The final list of chromatin modifications common to both cell lines is shown at the bottom. As a control, we used constitutive exons (exons that are always included in the mRNA) from the same genes as the alternatively spliced exons analyzed.
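Panel a groups exons by their percentage of inclusion (PSI). The legend does not say how PSI was computed, and the group cut-offs are truncated in this copy, so the sketch below makes two explicit assumptions. First, PSI is computed from junction reads, with inclusion reads halved because two junctions support inclusion and only one supports skipping (a common convention). Second, the three cut-offs separating the four groups are supplied by the caller. The function names and the example cut-offs are hypothetical and are not the thesis values.

```python
from bisect import bisect_right

# Group labels in increasing order of inclusion, as in Fig. 1a.
GROUPS = ("excluded", "mid-excluded", "mid-included", "included")


def percent_spliced_in(inclusion_reads, skipping_reads):
    """PSI (%) of a cassette exon from junction read counts.

    Assumption: inclusion reads are summed over the two inclusion junctions,
    so they are halved to be comparable with the single skipping junction.
    Returns None when no junction read supports either isoform.
    """
    inclusion = inclusion_reads / 2
    total = inclusion + skipping_reads
    if total == 0:
        return None
    return 100 * inclusion / total


def splicing_group(psi, cutoffs):
    """Assign a PSI value to one of four groups.

    `cutoffs` holds three ascending thresholds; a value equal to a
    threshold falls into the higher group.
    """
    if len(cutoffs) != 3 or list(cutoffs) != sorted(cutoffs):
        raise ValueError("expected three ascending cut-offs")
    return GROUPS[bisect_right(cutoffs, psi)]


# Placeholder cut-offs for illustration only, not the values used in Fig. 1a.
example = percent_spliced_in(inclusion_reads=40, skipping_reads=10)
print(round(example, 1), splicing_group(example, (25, 50, 75)))
# prints: 66.7 mid-included
```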

Figure 2. Splicing-associated chromatin signatures (SACS) in H1 hESCs. a. Schematic representation of the seven combinations of chromatin modifications (SACS) that differentially mark alternatively spliced exons. For each SACS, we specify the associated splicing group, the two co-enriched histone marks, the position of enrichment along the exon (represented by a peak) and the total number (n) of exons marked by the chromatin signature (in brackets, the percentage of chromatin-marked exons relative to the total number of exons analyzed per group). b-i. Density profiles of H3K4me2 reads around exons marked by H3K4me1 upstream of the 3'ss at the exon start (b); H3K9me3 reads around exons marked by 5mC downstream of the 5'ss at the exon end (c); H4K91ac reads around H4K20me1-marked exons (d); H3K9ac reads around exons marked by H3K14ac upstream of the exon start (e); H3K79me2 reads around H4K20me1-marked exons (f); H3K9me3 reads around 5mC-marked exons (g); H3K27me3 reads around exons marked by H3K4me3 downstream of the exon end (h); and H3K36me3 reads around H4K20me1-marked exons (i) in excluded (light blue), mid-excluded (dark blue), mid-included (yellow), included (red) and constitutive (grey) exons, using available ChIP-seq datasets from H1 hESCs. The mean read density of each histone mark is shown +/- 250 bp from either the 5' or the 3' splice site, depending on the SACS. For each mark, a black arrow highlights the splicing group that is most highly enriched at a specific position around the regulated exon, as defined by the SACS. Note that H3K9me3+5mC is enriched at both included (SACS2) and excluded (SACS5) exons but at different positions; hence the two arrows (the grey arrow marks the enrichment corresponding to the other SACS).
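The density profiles in panels b-i average read coverage in a fixed window centred on a splice site. The following is a minimal standard-library Python sketch of such a meta-profile. It assumes per-base coverage has already been normalized (for example, to sequencing depth) and is stored as a plain list per chromosome. The function name and input layout are hypothetical and do not describe the pipeline that produced the figure.

```python
def mean_profile(coverage, sites, flank=250):
    """Average per-base coverage in a +/- flank window around splice sites.

    coverage: dict mapping chromosome -> list of per-base read counts (0-based).
    sites: iterable of (chrom, position, strand), position being the 0-based
        splice-site coordinate.
    Returns a list of length 2 * flank + 1 (offsets -flank .. +flank),
    oriented 5'->3' along the transcript.
    """
    width = 2 * flank + 1
    totals = [0.0] * width
    used = 0
    for chrom, pos, strand in sites:
        track = coverage.get(chrom)
        # Skip sites on unknown chromosomes or too close to a chromosome end.
        if track is None or pos - flank < 0 or pos + flank >= len(track):
            continue
        window = track[pos - flank:pos + flank + 1]
        if strand == "-":
            window = window[::-1]  # minus-strand transcripts run right to left
        totals = [t + w for t, w in zip(totals, window)]
        used += 1
    if used == 0:
        return [0.0] * width
    return [t / used for t in totals]


coverage = {"chr1": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]}
print(mean_profile(coverage, [("chr1", 5, "+"), ("chr1", 5, "-")], flank=2))
# prints: [5.0, 5.0, 5.0, 5.0, 5.0]
```

Flipping minus-strand windows puts every profile in transcript orientation, so "upstream of the 3'ss" refers to the same side of the plot for genes on either strand.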

Figure 3. Experimental validation of SACS4: H4K20me1 and H3K79me2 levels positively correlate with exon skipping. a. H4K20me1 and H3K79me2 levels at differentially spliced exons in H1 compared to several cell lines from the ENCODE project (K562, IMR90, A549, GM12878, HeLa-S3, HepG2 and MCF7). Of 139 excluded exons enriched in H4K20me1 and H3K79me2 in H1, 94 remained excluded and 45 shifted to included in at least one of the cell lines analyzed. Examining H4K20me1 and H3K79me2 co-enrichment at these exons, we found a statistically significant correlation between exon exclusion and co-enrichment of both histone marks (Fisher's exact test p-value < 0.01). The same analysis on mid-included exons enriched in H4K20me1+H4K91ac showed that, of 61 events, 30 remained included and 31 switched to excluded. Again, exclusion significantly correlated with co-enrichment of H4K20me1 and H3K79me2 (Fisher's exact test p-value < 0.01), suggesting that changes in splicing are accompanied by changes in chromatin. b-c. To validate the in silico results, five splicing events that are included (INCL.) and five that are excluded (EXCL.) in K562 (black) and HeLa S3 (green) cells were tested for H3K79me2 and H4K20me1 levels along the alternatively spliced exon. In b, exon inclusion levels are normalized to the total expression level of the corresponding gene. Below 0.2 (dotted line), the exon is considered excluded. Data are shown as the mean +/- SEM of 3 independent experiments by RT-qPCR. In c, the enrichment levels (% input) of H3K79me2 and H4K20me1 at the alternatively spliced exon are normalized to background levels, defined as the average of 3 negative control positions. Data are shown as the mean +/- SEM of 3 independent experiments by ChIP-qPCR. *** p-value < 10^-12 (Wilcoxon rank-sum test, two-sided). d-e. Same as b-c, but for three alternatively spliced (AS) events that switch exon inclusion levels between K562 (black) and MCF10a (green). An included and an excluded event that do not change between cell lines are shown as controls. ** p-value < 0.007 and *** p-value < 0.0005 (Wilcoxon rank-sum test, two-sided).
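The normalizations described for panels b and c can be written out directly. The sketch assumes the qPCR quantities (relative inclusion level, total gene expression and % input values) have already been derived from Ct values. Only the 0.2 exclusion cut-off and the use of three negative control positions come from the legend. The function names are hypothetical.

```python
import math
from statistics import mean, stdev

# Inclusion ratio below which an exon is called excluded (Fig. 3b legend).
EXCLUSION_CUTOFF = 0.2


def inclusion_ratio(exon_inclusion, gene_expression):
    """Exon inclusion level normalized to total expression of the gene."""
    if gene_expression <= 0:
        raise ValueError("gene expression must be positive")
    return exon_inclusion / gene_expression


def is_excluded(ratio, cutoff=EXCLUSION_CUTOFF):
    """True when the normalized inclusion falls below the cut-off."""
    return ratio < cutoff


def chip_enrichment(exon_percent_input, control_percent_inputs):
    """% input at the exon divided by the mean % input of the negative
    control positions (three controls in Fig. 3c)."""
    background = mean(control_percent_inputs)
    if background <= 0:
        raise ValueError("background signal must be positive")
    return exon_percent_input / background


def mean_sem(replicates):
    """Mean and standard error of the mean across independent experiments."""
    return mean(replicates), stdev(replicates) / math.sqrt(len(replicates))
```

With three biological replicates per event, `mean_sem` returns the mean +/- SEM reported in panels b-e.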

Figure 4. The genetic features of chromatin-marked alternatively spliced exons (SACS). a. Boxplots of the 3' and 5' splice site (ss) strength scores. b. Boxplots of exon, upstream intron and downstream intron lengths (in bp). c. Boxplots of the log2 ratio of the percentage of GC content at the alternatively spliced exon relative to the upstream or downstream flanking intron. d. Boxplot of normalized gene expression levels, represented as log(TPM). Boxplots are centered on the median with interquartile ranges, representing the mean and 25th quantile of all the exons enriched in a particular chromatin signature. Each chromatin-marked splicing group (SACS) has its own color code. Constitutive exons (grey+black) and non-marked excluded (grey+blue) and included (grey+red) exons are used as controls. * p-value < 0.01 and ** p-value < 0.001 in Wilcoxon rank test compared to constitutive exons (in black) or to the corresponding alternatively spliced control exons (in purple). The number of exons in each boxplot is: SACS1 (K4me1+K4me2 in included exons) n=165, SACS2 (K9me3+5mC in included exons) n=142, SACS3 (K20me1+K91ac in mid-included exons) n=143, SACS4 (K20me1+K79me2 in excluded exons) n=152, SACS5 (K9me3+5mC in excluded exons) n=89, SACS7 (K14ac+K9ac in mid-excluded exons) n=139, non-marked excluded exons n=600, non-marked included exons n=600 and constitutive exons n=600.
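Panel c compares GC content between the exon and each flanking intron as a log2 ratio. The following minimal Python sketch assumes the exon and intron sequences have already been extracted (for example, from hg19) and treats ambiguous bases such as N as uninformative, excluding them from the denominator. Both choices are assumptions, and the function names are hypothetical.

```python
import math


def gc_percent(seq):
    """Percentage of G/C among unambiguous bases (case-insensitive)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence has no A/C/G/T bases")
    return 100 * (seq.count("G") + seq.count("C")) / acgt


def gc_log2_ratio(exon_seq, intron_seq):
    """log2(GC% of the exon / GC% of a flanking intron), as plotted in Fig. 4c."""
    exon_gc, intron_gc = gc_percent(exon_seq), gc_percent(intron_seq)
    if exon_gc == 0 or intron_gc == 0:
        raise ValueError("zero GC content makes the log ratio undefined")
    return math.log2(exon_gc / intron_gc)
```

A positive value means the exon is GC-richer than its flanking intron, which is the exon-intron GC differential discussed by Amit et al. (ref. 8).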

Figure 5. SACS are defined by specific RNA-binding protein (RBP) motifs. a-d. Volcano plots of the scanned RBP motifs and 5-mers in the upstream intron (left), chromatin-marked exon (middle) and downstream intron (right) for a. included H3K9me3+5mC-marked exons (SACS2), b. excluded H3K9me3+5mC-marked exons (SACS5), c. excluded H4K20me1+H3K79me2-marked exons (SACS4) and d. mid-included H4K20me1+H4K91ac-marked exons (SACS3). Colored dots correspond to motifs with adjusted p-value < 0.01 and FDR < 0.05. The x axis represents the log2 fold enrichment (FC) of each motif compared to sequences of non-marked alternatively spliced events. The y axis represents the -log10 adjusted p-value of the enrichment. FDR and associated adjusted p-values were calculated from n=152 H4K20me1+H3K79me2 excluded exons, n=89 H3K9me3+5mC excluded exons, n=143 H4K20me1+H4K91ac mid-included exons, n=142 H3K9me3+5mC included exons, n=600 non-chromatin-marked excluded exons and n=600 non-chromatin-marked included exons.
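Each point in these volcano plots pairs an enrichment value (x axis) with a multiple-testing-adjusted p-value (y axis). The legend does not name the correction method. The sketch below assumes Benjamini-Hochberg, the procedure R computes with `p.adjust(method = "BH")`, and defines motif frequency as the fraction of sequences carrying the motif. Both are assumptions, and the function names are hypothetical.

```python
import math


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    # Walk from the largest p-value down, keeping a running minimum so the
    # adjusted values stay monotone in the raw p-values.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for k, i in enumerate(order):
        rank = m - k  # 1-based rank in ascending order of p-value
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def volcano_coordinates(freq_marked, freq_background, adjusted_p):
    """x = log2 fold enrichment of a motif in chromatin-marked vs. non-marked
    sequences; y = -log10 of its adjusted p-value."""
    return math.log2(freq_marked / freq_background), -math.log10(adjusted_p)
```

A motif found in 20% of marked exons and 10% of control exons, with an adjusted p-value of 0.01, would plot at (1, 2).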

Figure 6. SACS can impact RNA polymerase II distribution and recruitment of splicing factors. a. Boxplot of normalized RNA polymerase II read coverage over the upstream intron, exon and downstream intron of the chromatin-marked exons. Constitutive and non-marked excluded and included exons are used as controls (shaded in grey). RNA polymerase II is more enriched at exons than at introns in all conditions except for H3K4me1+H3K4me2 (SACS1) and H3K9me3+5mC (SACS2) included exons. ** p-value < 0.01 at exons compared to flanking introns (Wilcoxon rank test). b. Effect of hnRNPK knockdown on alternatively spliced exons, using available data from GM19238 cells. The number of hnRNPK-dependent events with evidence of hnRNPK binding, based on publicly available eCLIP data in K562 and HepG2 cells, is also shown. The analysis is performed on exons marked by SACS4 and on alternatively spliced exons in general. Exons that are more included upon hnRNPK knockdown are shown in red, more excluded in blue and unaffected in grey. c. hnRNPK binding and H4K20me1+H3K79me2 enrichment at alternatively spliced exons that shift splicing patterns between cell lines. Using available eCLIP and ChIP-seq data in K562 and HepG2 cells, we found that, of 52 excluded exons rich in H4K20me1+H3K79me2 in H1 hESCs, 33 remained excluded and 19 switched to included in K562 or HepG2. Excluded events were more co-enriched in H4K20me1+H3K79me2 than included events (Fisher's exact test p-value < 0.05), and most of the H4K20me1+H3K79me2-rich excluded events were bound by hnRNPK (Fisher's exact test p-value < 0.05). This supports a model in which a specific chromatin signature can favor the recruitment of a splicing regulator to the pre-mRNA.
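Panel c (like Figure 3a) relies on Fisher's exact test applied to a 2x2 table, for instance excluded vs. included events crossed with co-enriched vs. not co-enriched. The full tables are not given in the legends, so the sketch below is checked against a textbook example rather than thesis data. It computes the two-sided p-value by summing the hypergeometric probabilities of every table with the observed margins that is no more likely than the observed one. As far as we know, this is the rule used by R's `fisher.test()`, including its small relative tolerance. The function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    if min(a, b, c, d) < 0:
        raise ValueError("counts must be non-negative")
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)
    # All feasible values of the top-left cell given fixed margins.
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    probs = {
        x: comb(row1, x) * comb(n - row1, col1 - x) / denom
        for x in range(lo, hi + 1)
    }
    p_obs = probs[a]
    # Tolerance guards against floating-point near-ties with p_obs.
    p = sum(v for v in probs.values() if v <= p_obs * (1 + 1e-7))
    return min(p, 1.0)
```

For the classic tea-tasting table `[[3, 1], [1, 3]]`, this gives 34/70 (about 0.486).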

Supplementary information

Supplementary Figure S1. Enrichment levels of the 15 histone and DNA methylation (5mC) marks selected by the Random Forest classifier. Boxplots of the log2 ratio of the normalized read coverage at the exon relative to the read coverage of the upstream (left) or downstream (right) intronic region, for each chromatin mark in excluded (Ex, light blue), mid-excluded (ME, dark blue), mid-included (MI, yellow) and well-included (In, red) alternatively spliced events. Boxplots represent the mean and 25th quantile of the log2 (exon/intron) ratio. * p-value < 0.05 and ** p-value < 0.01 in a paired Wilcoxon signed-rank test.
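Supplementary Figure S1 summarizes, per exon, the log2 ratio of normalized read coverage at the exon against a flanking intron and tests it with a paired Wilcoxon test, which in its paired form is the signed-rank test. The following standard-library Python sketch rests on several assumptions. Coverage is normalized by region length, with a pseudocount of 1. Each pair is the exon and intron density of the same event. The test uses the normal approximation that, as far as we know, R's `wilcox.test(paired = TRUE)` applies by default to 50 or more pairs. The function names are hypothetical.

```python
import math
from collections import Counter
from statistics import NormalDist


def log2_exon_intron_ratio(exon_reads, exon_len, intron_reads, intron_len,
                           pseudocount=1.0):
    """log2 of length-normalized read density at the exon vs. a flanking intron.

    The pseudocount (an assumption) keeps the ratio finite for empty regions.
    """
    exon_density = (exon_reads + pseudocount) / exon_len
    intron_density = (intron_reads + pseudocount) / intron_len
    return math.log2(exon_density / intron_density)


def signed_rank_test(x, y):
    """Two-sided paired Wilcoxon signed-rank test on pairs (x[i], y[i]).

    Zero differences are dropped; normal approximation with tie and
    continuity corrections. Returns (V, p), V = rank sum of positive diffs.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    abs_d = [abs(d) for d in diffs]
    order = sorted(range(n), key=lambda i: abs_d[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs_d[order[j + 1]] == abs_d[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average rank for ties
        i = j + 1
    v = sum(r for r, d in zip(ranks, diffs) if d > 0)
    tie_term = sum(t ** 3 - t for t in Counter(abs_d).values()) / 48
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24 - tie_term)
    z = v - n * (n + 1) / 4
    correction = math.copysign(0.5, z) if z != 0 else 0.0
    z = (z - correction) / sigma
    cdf = NormalDist().cdf(z)
    return v, min(1.0, 2 * min(cdf, 1 - cdf))
```

Pairing exon and intron values from the same event removes between-event differences in overall coverage. This is why a paired test, rather than a rank-sum test, suits this comparison.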

Supplementary Figure S2. a. Heatmap of normalized read coverage upstream of, at, and downstream of the exon (exon represented as a yellow rectangle, flanking introns as lines) for each chromatin modification found in our model for which data are available in mouse mESCs (H3K27me3, H3K4me3, H4K20me1 and H3K79me2). SACS for which data were available for only one of the two chromatin marks were not analysed. Only H4K20me1+H3K79me2 was found to significantly mark excluded exons, as in SACS4 in human hESCs. Importantly, 57% of all the excluded exons analysed in mouse mESCs were enriched in this signature (357/623). b. Density profile of H3K79me2 reads around H4K20me1-marked exons, +/- 250 bp from the 3'ss (exon start), in excluded, mid-excluded, mid-included and included exons, as represented in Figure 2.

Supplementary Figure S3. Gene ontology enrichments for the chromatin-marked alternatively spliced genes (SACS). The most significant GO biological process terms are plotted for each SACS, based on the Enrichr combined score. For SACS4, matched colored arrows highlight all GO terms shared between human hESCs and mouse mESCs. All terms have a p-value < 0.01 in Fisher's exact test.

Supplementary data S1. List of the human and mouse cell lines and of the epigenomic and transcriptomic datasets used. The source and accession number of each dataset are given for each category.

Supplementary data S2. List of the primer pairs used in RT-qPCR and ChIP-qPCR, with gene names and primer sequences.

Supplementary data S3. List of the alternatively spliced exons marked by a specific chromatin signature. The chromosome, genomic coordinates of the exon (hg19), strand and gene name are given for each splicing event.

212 The Role of histone modifications in the regulation of alternative splicing during the EMT

213 The Role of histone modifications in the regulation of alternative splicing during the EMT

Figure 1 . Schematic pipeline of the machine learning approach used to identify the chromatin modifications that can classify exons into four different splicing categories . a. Cumulative distribution of alternatively spliced exons in human H1 embryonic stem cells and IMR90 fetal fibroblasts using available RNA-seq datasets. Four arbitrary groups were created based on the percentage of exon inclusion (PSI). A color code is given to each category, with light blue for well excluded (0

214 The Role of histone modifications in the regulation of alternative splicing during the EMT

Continue next page

215 The Role of histone modifications in the regulation of alternative splicing during the EMT

Figure 2. Splicing-associated chromatin signatures (SACS) in H1 hESCs. a. Schematic representation of the seven combinations of chromatin modifications (SACS) that differentially mark alternatively spliced exons. For each SACS, we specify the splicing group it is related to, the two co- enriched histone marks, the position of enrichment along the exon (represented by a peak) and the total number (n) of exons marked by the chromatin signature (in brackets the percentage of chromatin- marked exons respect the total number of exons analyzed per group). b-i. Density profiles of H3K4me2 reads around exons marked by H3K4me1 upstream the 3’ss exon start (b); H3K9me3 reads around exons marked by 5mC downstream the 5’ss end of the exon ( c); H4K91ac reads around H4K20me1- marked exons ( d), H3K9ac reads around exons marked by H3K14ac upstream the exon start (e), H3K79me2 reads around H4K20me1- marked exons ( f), H3K9me3 reads around 5mC-marked exons (g), H3K27me3 reads around exons marked by H3K4me3 downstream the end of the exon (h); and H3K36me3 reads around H4K20me1-marked exons ( i) in excluded (light blue), mid-excluded (dark blue), mid- included (yellow), included (red) and constitutive (grey) exons using available ChIP-seq datasets from H1 hESCs. The mean density of histone marks’ reads is represented +/- 250 bp from either the 5’ or 3’ splice site, depending on the SACS. For each mark, we highlight with a black arrow the splicing group that is the highest enriched at a specific position around the regulated exon, as defined by the SACS. Please notice that H3K9me3+5mC is enriched at both included (SACS2) and excluded (SACS5) exons but at different positions, this is why there are two arrows (in grey the enrichment that corresponds to the other SACS).

216 The Role of histone modifications in the regulation of alternative splicing during the EMT

217 The Role of histone modifications in the regulation of alternative splicing during the EMT

Figure 3. Experimental validation of SACS4: H4K20me1 and H3K79me2 levels positively correlate with exon skipping. a. H4K20me1 and H3K79me2 levels at differentially spliced exons in H1 compared to several cell lines from the ENCODE project (K562, IMR90, A549, Gm12878, HelaS3, HepG2 and MCF7). From 139 excluded exons enriched in H4K20me1 and H3K79me2 in H1, 94 remained excluded and 45 shifted to included in at least one of the cell lines analyzed. When looking at H4K20me1 and H3K79me2 co-enrichment levels at the selected exons, we found a statistically significant correlation between exon exclusion and co- enrichment of both histone marks (Fisher’s exact test p -value < 0.01). When performing the same type of analysis in mid-included exons enriched in H4K20me1+H4K91ac, we found that out of 61 events, 30 remained included and 31 switched to excluded. Again, exclusion significantly correlated with co-enrichment of H4K20me1 and H3K79me2 (Fisher’s exact test p-value < 0.01), suggesting a change in chromatin when a change in splicing is observed. b-c For validation of the results obtained in silico , five splicing events that are included (INCL.) and five that are excluded (EXCL.) in K562 (black) and HeLa S3 (green) cells were tested for H3K79me2 and H4K20me1 levels along the alternatively spliced exon. In b, exon inclusion levels are normalized to total expression levels of the corresponding gene. Below 0.2 (highlighted with a dotted line) the exon is considered excluded. Data is depicted as the Mean +/- SEM of 3 independent experiments by quantitative RT-qPCR. In c, the enrichment levels (% input) of H3K79me2 and H4K20me1 at the alternatively spliced exon are normalized to background levels, which correspond to the average of 3 negative control positions. Data is depicted as the Mean +/- SEM of 3 independent experiments by quantitative ChIP-qPCR. *** p-value < 10 -12 (Wilcoxon rank-sum test, two-sided). d, e. 
Same as b-c, but this time three alternatively spliced events (AS) that switched exon inclusion levels between K562 (in black) and MCF10a (in green) are shown. An included and an excluded event that do not change between cell lines are shown as controls. ** p-value < 0.007 and *** p-value < 0.0005 (Wilcoxon rank-sum test, two-sided).

218 The Role of histone modifications in the regulation of alternative splicing during the EMT

Figure 4. The genetic features of chromatin-marked alternatively spliced exons (SACS). a Boxplots of the 3’ and 5’ splice site (ss) strength scores. b. Boxplots of exon and upstream and downstream intron lengths (in bp). c. Boxplots of the log2 ratio of the percentage of GC content at the alternatively spliced exon respect the upstream or downstream flanking intron. d. Boxplot of the normalized gene expression levels, represented as log(TPM). Boxplots are centered on the median with interquartile ranges, representing the mean and 25 th quantile of all the exons enriched in a particular chromatin signature. Each chromatin-marked splicing group (SACS) has its own color-code. Constitutive exons (in grey+black) and non-marked excluded (in grey+blue) and included (in grey+red) exons are used as controls. * p-value < 0.01 and ** p-value < 0.001 in Wilcoxon rank test compared to constitutive exons (in black) or the corresponding alternatively spliced control exons (in purple) . For each of the boxplots, the number of exons are: SACS1 (K4me1+K4me2 in included exons) n=165 , SACS2 (K9me3+5mC in included exons) n=142, SACS3 (K20me1+K91ac in mid-included exons) n=143, SACS4 (K20me1+K79me2 in excluded exons) n=152, SACS5 (K9me3+5mC in excluded exons) n=89, SACS7 (K14ac+K9ac in mid-excluded exons) n=139, non-marked excluded exons n=600, non-marked included exons n=600, Constitutive exons n=600.

219 The Role of histone modifications in the regulation of alternative splicing during the EMT

220 The Role of histone modifications in the regulation of alternative splicing during the EMT

Figure 5. SACS are defined by specific RNA binding protein (RBP) motifs. a-d. Volcano plots of the scanned RBP motifs and 5mers in the upstream intron (left), chromatin-marked exon (middle) and downstream intron (right) for a. included H3K9me3+5mC-marked exons (SACS2), b. excluded H3K9me3+5mC-marked exons (SACS5), c. excluded H4K20me1+H3K79me2-marked exons (SACS4) and d. mid-included H4K20me1+H4K91ac- marked exons (SACS3). Colored dots correspond to motifs with adjusted p-value < 0.01 and FDR < 0.05. X axis represents the log2 fold enrichment (FC) of each motif compared to non- marked alternatively spliced events sequences. Y axis represents the –log10 adjusted p-value of the enrichment. FDR and associated adjusted p-value were calculated from n=152 H4K20me1+H3K79me2 excluded exons, n=89 H3K9me3+5mC excluded, n=143 H4K20me1+H4K91ac mid-included exons, n=142 H3K9me3+5mC included exons, n=600 non chromatin-marked excluded exons and n=600 non chromatin-marked included exons.

221 The Role of histone modifications in the regulation of alternative splicing during the EMT

Figure 6. SACS can impact RNA polymerase II distribution and recruitment of splicing factors. a. Boxplot of the normalized RNA polymerase II reads coverage over the upstream intron, exon and downstream intron for the chromatin-marked exons. Constitutive and non- marked excluded and included exons are used as controls (shaded in grey). RNA polymerase II is more enriched at exons than introns in all conditions except for H3K4me1+H3K4me2 (SACS1) and H3K9me3+5mC (SACS2) included exons. ** p-value < 0.01 at exons compared to flanking introns in Wilcoxon rank test. b. Splicing effect on alternatively spliced exons upon hnRNPK knockdown, using available data from GM19238 cells. The number of hnRNPK- dependent events with hnRNPK binding evidence, using publicly available eCLIP data in K562 and HepG2 cells, is also shown. The analysis is performed in exons marked by SACS4 and alternatively spliced exons in general. Exons that are more included upon hnRNPK knock down are shown in red, more excluded are shown in blue and not affected are shown in grey. c. hnRNPK binding and enrichment of H4K20me1+H3K79me2 levels at alternatively spliced exons shifting splicing patterns in different cell lines. Using available eCLIP and ChIP-seq data in K562 and HepG2 cells, we found that from 52 excluded exons rich in H4K20me1+H3K79me2 in H1 hESC, 33 remained excluded and 19 switched to included in K562 or HepG2. Excluded events were more co-enriched in H4K20me1+H3K79me2 than included (Fisher’s exact test p -value < 0.05) and most of the (H4K20me1+H3K79me2)-rich excluded events were bound by hnRNPK (Fisher’s exact test p -value < 0.05), supporting a model in which a specific chromatin signature can favor the recruitment of a splicing regulator to the pre-mRNA.

Supplementary Figure S1. Enrichment levels of the 15 histone and DNA methylation (5mC) marks selected by the Random Forest classifier. Boxplots of the log2 ratio of normalized read coverage at the exon relative to the read coverage in the upstream (left) or downstream (right) intronic region, for each chromatin mark in excluded (Ex, light blue), mid-excluded (ME, dark blue), mid-included (MI, yellow) and well-included (In, red) alternatively spliced events. Boxplots represent the mean and 25th quantile of the log2 (exon/intron) ratio. * p-value < 0.05 and ** p-value < 0.01 in Wilcoxon Rank Sum test, paired.
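
The log2 (exon/intron) ratio in Supplementary Figure S1 can be sketched as follows. The sketch assumes that read counts are first normalized by region length (reads per kilobase), so that a short exon and a long intron are comparable. Library-size normalization cancels out because both regions come from the same sample. The pseudocount value and the function names are hypothetical choices for illustration, not those of the original analysis.

```python
import math
import statistics
from typing import Dict, List, Sequence, Tuple


def coverage_per_kb(reads: float, length_bp: int) -> float:
    """Length-normalized coverage: reads per kilobase of region."""
    if length_bp <= 0:
        raise ValueError("region length must be positive")
    return reads * 1000 / length_bp


def log2_exon_intron_ratio(exon_reads: float, exon_len: int,
                           intron_reads: float, intron_len: int,
                           pseudocount: float = 0.1) -> float:
    """log2 of exon vs. flanking-intron coverage; the pseudocount avoids log(0)."""
    exon = coverage_per_kb(exon_reads, exon_len)
    intron = coverage_per_kb(intron_reads, intron_len)
    return math.log2((exon + pseudocount) / (intron + pseudocount))


def median_ratio_by_group(events: Sequence[Tuple[str, float]]) -> Dict[str, float]:
    """Median log2 ratio per inclusion group (e.g. 'Ex', 'ME', 'MI', 'In')."""
    groups: Dict[str, List[float]] = {}
    for group, ratio in events:
        groups.setdefault(group, []).append(ratio)
    return {g: statistics.median(v) for g, v in groups.items()}
```

A positive value means the mark is denser on the exon than on the chosen flanking intron. Computing the ratio separately against the upstream and the downstream intron gives the left and right panels.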

Supplementary Figure S2. a. Heatmap of normalized read coverage in the upstream intron, at the exon and in the downstream intron (represented as a yellow rectangle and a line) for each of the chromatin modifications in our model (H3K27me3, H3K4me3, H4K20me1 and H3K79me2) for which data are available in mouse mESCs. SACS for which data were available for only one of their two chromatin marks were not analysed. Only H4K20me1+H3K79me2 significantly marked excluded exons, mirroring SACS4 in human hESCs. Importantly, 57% (357/623) of all excluded exons analysed in mouse mESCs were enriched in this signature. b. Density profile of H3K79me2 reads within ±250 bp of the 3′ splice site (exon start) of H4K20me1-marked exons in excluded, mid-excluded, mid-included and included exons, as represented in Figure 2.
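
The density profile in panel b aggregates reads around a common anchor. The minimal sketch below makes several assumptions. Each read is reduced to a single genomic position, such as a fragment midpoint, and positions are stored as sorted lists per chromosome. Each anchor is the 3′ splice site coordinate of a marked exon. Offsets are oriented by strand, binned within ±250 bp and averaged over exons. The 10 bp bin size and the function name are illustrative assumptions, not parameters taken from the analysis.

```python
from bisect import bisect_left, bisect_right
from typing import Dict, List, Sequence, Tuple


def metaprofile(read_positions: Dict[str, List[int]],
                anchors: Sequence[Tuple[str, int, str]],
                flank: int = 250, bin_size: int = 10) -> List[float]:
    """Average read density in bins spanning -flank..+flank around anchors.

    read_positions: chromosome -> sorted list of read positions.
    anchors: (chromosome, 3' splice site coordinate, strand) for each exon;
             on the minus strand this is the exon's higher genomic coordinate.
    Bin i covers offsets [-flank + i*bin_size, -flank + (i+1)*bin_size).
    """
    n_bins = 2 * flank // bin_size
    counts = [0.0] * n_bins
    for chrom, anchor, strand in anchors:
        positions = read_positions.get(chrom, [])
        lo = bisect_left(positions, anchor - flank)
        hi = bisect_right(positions, anchor + flank)
        for pos in positions[lo:hi]:
            # Orient offsets in the transcript direction (intron < 0 <= exon)
            offset = pos - anchor if strand == "+" else anchor - pos
            idx = (offset + flank) // bin_size
            if 0 <= idx < n_bins:
                counts[idx] += 1
    # Average over exons so that groups of different sizes are comparable
    return [c / len(anchors) for c in counts] if anchors else counts
```

Running the function once per inclusion group (excluded, mid-excluded, mid-included, included) yields one curve per group, as in the panel.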

Supplementary Figure S3. Gene ontology enrichment for the chromatin-marked alternatively spliced genes (SACS). The most significant GO biological process terms are plotted for each SACS, ranked by the Enrichr combined score. For SACS4, matched colored arrows highlight all GO terms shared between human hESCs and mouse mESCs. All terms have a p-value < 0.01 in Fisher's exact test.
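
For reference, the combined score described in the original Enrichr publication multiplies the logarithm of the Fisher's exact test p-value, p, by the z-score, z, of the deviation of the term from its expected rank. It is reproduced here only as that original definition; the exact variant computed by the Enrichr version used for this figure is not stated in the caption.

```latex
c = \log(p) \cdot z
```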
