Analysis of the Impact of Population Copy Number Variation in Introns

UNIVERSIDAD AUTÓNOMA DE MADRID Programa de Doctorado en Biociencias Moleculares Analysis of the Impact of Population Copy Number Variation in Introns Maria Rigau de Llobet Madrid, 2019 DEPARTMENT OF BIOCHEMISTRY UNIVERSIDAD AUTÓNOMA DE MADRID FACULTY OF MEDICINE Analysis of the Impact of Population Copy Number Variation in Introns Maria Rigau de Llobet Graduate in Biology Thesis Directors: Dr. Alfonso Valencia Herrera and Dr. Daniel Rico Rodríguez Barcelona Supercomputing Center I Certificado del director o directores Dr Alfonso Valencia Herrera, Head of the Computational Biology Life Sciences Group (Barcelona Supercomputing Center) and Dr Daniel Rico Rodríguez, Head of the Chromatin, Immunity and Bioinformatics Group (Institute of Cellular Medicine, Newcastle, UK) CERTIFY: That Ms. Maria Rigau de Llobet, Graduate in Biology by the University Pompeu Fabra (Barcelona, Spain), has completed her Doctoral Thesis entitled “Analysis of the Impact of Population Copy Number Variation in Introns” under our supervision and meets the necessary requirements to obtain the PhD degree in Molecular Biosciences. To this purpose, she will defend her doctoral thesis at the Universidad Autó noma de Madrid. We hereby authorize its defense in front of the appropriate Thesis evaluation panel. We issue this certificate in Madrid on April 30th 2019 Alfonso Valencia Herrera Daniel Rico Rodríguez Thesis Director Thesis Director III This thesis was supported by a “La Caixa” – Severo Ochoa International PhD Programme V Aknowledgments Para empezar, me gustaría dar las gracias a mis dos directores de tesis, Alfonso Valencia y Daniel Rico. Muchas gracias Alfonso por la oportunidad de hacer el doctorado en tu grupo. Puedo decir sin dudar que ha sido el comienzo de una carrera profesional con la que disfruto. Me siento muy afortunada por estos casi cinco años, una etapa de aprender con libertad y con apoyo, de trabajar a gusto y de estar rodeada de gente brillante, tanto en el ámbito laboral como personal. Gracias por crear y mantener un entorno de trabajo tan estimulante y por dejarme formar parte de él. Dani, a ti te debo, para empezar, mi inesperada entrada al mundo de la bioinformática. Gracias por animarme a hacer el doctorado con vosotros, que sigo convencida de que fue la mejor decisión. Te estoy muy agradecida por todos tus consejos, por tu entusiasmo, por valorar siempre mis comentarios y opiniones y por tratarme como igual desde el principio. También te agradezco la manera cómo has afrontado y revertido con toda naturalidad mis malos momentos y que me hayas ayudado a mantenerme motivada. A David el pelirrojo también le estoy eternamente agradecida. David, muchísimas gracias por tus lluvias torrenciales de ideas, por enseñarme a descifrar e interpretar resultados, por sacar tiempo de donde no lo había y, a ti también, gracias por el apoyo moral. Quiero agradecer a Óscar, Fátima y Tomàs, mi comité de tesis, todas sus aportaciones. De mi etapa en el CNIO, quiero dedicarle un agradecimiento especial a Marta, ya que formó parte del comité de reclutamiento y bienvenida al grupo, y porque sin su compañía en esos primeros desayunos y sin todos los descansos compartidos, mi etapa en el CNIO no habría tenido ni la mitad de las risas. Y a Ángel, a quien también le agradezco la acogida desde el principio y, además, el ayudarme a tener una relación medianamente cordial con mi ordenador. Y a Enrique, por las tardes de tertulia y cacahuetes. Y a Belén, por estar en todo y hacernos la vida más fácil. VII Y no me olvido de todos los compañeros de la planta baja - ala norte del CNIO, de mis compañeras puntuales de oficina (Cata, Mónica, Mago e Irene) y de toda la gente de los Supresores de Telómeros con quien he compartido o bien muchas horas de trabajo o bien muchas de no-trabajo. Mi etapa madrileña no habría sido lo mismo sin mis compañeras de piso. Muchas gracias Cristina, Mónica, Miriam, Nerea y Leire por hacer tan familiares, alegres y cómodos mis tres años en Sandoval. Y cómo no, este agradecimiento también aplica a mis vecinos, compañeros de vacaciones, anfitriones y grandes amigos Alejandra y Juan. And then there are the Chicolas. I couldn’t be happier to have shared most of the PhD with you. I miss you and I hope we can meet soon as Doctor Chicolas. For all these years, grazie, Fede; danke, Teresa; eskerrik asko, Leire; ευχαριστώ, Dafni. Mi doctorado tiene también una etapa Barcelonesa compartida con otro gran equipo muy multidisciplinar: de trabajo, de yoga, de kebabs, de eXcursiones, de celebraciones, y de karaokes (con y sin lesión). Estos son: Miguel, Vera, Davide, Alba L, Alba J, Eva, Vicky, François, Arnau, Iker, Carlos, Hugo, Mónica, Laure, Eduard, Miguel V, Patricia, Mattia, Victor, los INBs y los mineros. Y cómo no, Juan y Jon, los mejores compañeros de doctorado posibles. Juancho, mil gracias por estos años. Por el soporte informático y moral, por las discusiones científicas y las reflexiones filosóficas, por obsequiarnos con un peinado diferente cada día y por estar siempre dispuesto a bajar de tu mundo de máXima concentración para ayudar en lo que haga falta. Y Jon, tu párrafo debería estar en mayúsculas. Muchas gracias por aguantarme del primer al último día, por tu paciencia y por estar allí siempre que lo he necesitado. Creo que nadie me discutirá que, sin ti, nuestro grupo no estaría ni tan unido, ni tan entretenido, ni tan cuidado. Mil gracias por todo. Gràcies també a tots els del grup d’en David Torrents, perquè amb qui no he compartit CNVs he compartit menjars del pingüí, congressos, aires condicionats o màrfegues de ioga. VIII També vull donar les gràcies a la colla de la universitat (+2) per tots els moments compartits durant, ara ja, més d’una dècada. Perquè ens hem fet grans junts i mai hem perdut les ganes veure’ns. I a les amigues de tota, o quasi tota la vida. Ester i Núria, no sé quina de les dues ha aguantat més “quan acabi la tesi”, però espero que a partir d’ara poguem fer totes les sortides i viatges que tenim pendents. I Silvieta, merci per estar sempre disposada a àpats improvisats, sempre riallera contagiosa i per ser sempre una font de tranquil·litat. Vull aprofitar l’ocasió per donar les gràcies a la meva gran família, especialment als meus pares, pel seu eXemple i per haver-me donat suport en totes les meves decisions, i als meus altres tres quarts: la Gemma, l’Anna i l’Anton. Us estimo infinit. I a en Carlos. Per aquests perfectes i inesperats últims mesos de doctorat. IX Abstract/Resumen XI Abstract Introns cover most of the DNA sequence in human protein-coding genes and represent approXimately half of the non-coding genome. Very little is known about the patterns of structural variation in introns and little attention has been paid to their functional implications, even if several pathogenic intronic mutations have already been characterized. Through the combined analysis of the five most eXtensive maps of Copy Number Variants (CNVs) in human populations we show that intronic losses are the most frequent type of CNV in protein-coding genes. The lower density of CNVs in introns compared to intergenic regions supports the presence negative selection on intronic CNVs. We identified many intronic deletions associated with gene expression changes by integrating genotype with RNA-seq and promoter-capture Hi-C data, supporting the implication of many CNVs in genetic regulation. Remarkably, a noteworthy number of these associations are better interpreted by long-range genome interactions. Supporting the possible impact of intronic CNVs on splicing, we have found 185 genes differentially expressed transcripts associated with deletions. Moreover, we have found changes in exon inclusion associated with deletions that alter the GC content of the intron. This finding suggests that the structure of the fragments deleted in introns play a significant role on which exons are included in the mature messenger RNA. Altogether, our findings additionally support the substantial role of intronic CNVs on gene regulation. Interestingly, we have observed that CNVs are not equally distributed among genes of different evolutionary ages. Ancient genes are, in general, depleted of losses covering their exons, but they carry the majority of intronic deletions, including intronic deletions associated with eXpression changes. On the other hand, recent primate-specific genes are enriched in CNVs implicating exons. Taken together, our findings suggest that CNVs have a role in shaping gene evolution, possibly acting at different levels at large and short evolutionary times (old and young genes). While in young genes CNVs contribute to directly alter protein sequences, in ancient genes CNVs seem to be preferentially contributing to population variability at the level of regulation with possible adaptive implications. XIII Resumen Los intrones cubren la mayor parte de la secuencia de ADN en genes codificantes para proteínas y representan aproXimadamente la mitad del genoma no codificante en humanos. Se sabe muy poco acerca de los patrones de variación estructural en los intrones y se ha prestado poca atención a sus implicaciones funcionales, incluso si ya se han caracterizado varias mutaciones intrónicas patógenicas. A través del análisis combinado de los cinco mapas más eXtensos de las Variantes en Número de Copia (CNVs) en poblaciones humanas, mostramos que las pérdidas intrónicas son el tipo más frecuente de CNV en los genes codificantes para proteínas. La menor densidad de CNVs en intrones en comparación con regiones intergénicas sugiere la presencia de selección negativa sobre las CNVs intrónicas. Integrando datos de CNVs con datos de RNA-seq y PCHi-C hemos identificado deleciones intrónicas asociadas a cambios en la eXpresión génica. Parte de estas asociaciones se interpretan mejor por interacciones genómicas entre fragmentos distantes.

Analysis of the Impact of Population Copy Number Variation in Introns

ASSESSING DIAGNOSTIC and THERAPEUTIC TARGETS in OBESITY- ASSOCIATED LIVER DISEASES Noemí Cabré Casares

Frontiers in Integrative Genomics and Translational Bioinformatics

C12) United States Patent (IO) Patent No.: US 10,774,349 B2 San Et Al

The Molecular Karyotype of 25 Clinical-Grade Human Embryonic Stem Cell Lines Received: 07 August 2015 1 1 2 3,4 Accepted: 27 October 2015 Maurice A

Dual Proteome-Scale Networks Reveal Cell-Specific Remodeling of the Human Interactome

A Computational Approach for Defining a Signature of Β-Cell Golgi Stress in Diabetes Mellitus

A Clinicopathological and Molecular Genetic Analysis of Low-Grade Glioma in Adults

In Vivo Dual RNA-Seq Analysis Reveals the Basis for Differential Tissue Tropism of Clinical Isolates of Streptococcus Pneumoniae

Genes in a Refined Smith-Magenis Syndrome Critical Deletion Interval on Chromosome 17P11.2 and the Syntenic Region of the Mouse

Large Meta-Analysis of Genome-Wide Association Studies

Meta-Analysis of Genetic Association with Diagnosed Alzheimer's Disease Identifies Novel Risk Loci and Implicates Abeta, Tau, Immunity and Lipid Processing

Supplementary Table S4. FGA Co-Expressed Gene List in LUAD