Mapping Natural Selection Through the Drosophila Melanogaster Development Following a Multiomics Data Integration Approach Marta Coronado Zamora
Total Page:16
File Type:pdf, Size:1020Kb
ADVERTIMENT. Lʼaccés als continguts dʼaquesta tesi queda condicionat a lʼacceptació de les condicions dʼús establertes per la següent llicència Creative Commons: http://cat.creativecommons.org/?page_id=184 ADVERTENCIA. El acceso a los contenidos de esta tesis queda condicionado a la aceptación de las condiciones de uso establecidas por la siguiente licencia Creative Commons: http://es.creativecommons.org/blog/licencias/ WARNING. The access to the contents of this doctoral thesis it is limited to the acceptance of the use conditions set by the following Creative Commons license: https://creativecommons.org/licenses/?lang=en doctoral thesis Mapping natural selection through the Drosophila melanogaster development following a multiomics data integration approach marta coronado zamora Directors Antonio Barbadilla Prados Isaac Salazar Ciudad Departament de Genètica i de Microbiologia Facultat de Biociències Universitat Autònoma de Barcelona 2018 ABSTRACT Charles Darwin’s theory of evolution proposes that the adaptations of organisms arise because of the process of natural selection. Natural selec- tion leaves a characteristic footprint on the patterns of genetic variation that can be detected by means of statistical methods of genomic analy- sis. Today, we can infer the action of natural selection in a genome and even quantify what proportion of the incorporated genetic variants in the populations are adaptive. The genomic era has led to the paradox- ical situation in which much more evidence of selection is available on the genome than on the phenotype of the organism, the primary target of natural selection. The advent of next-generation sequencing (NGS) technologies is provid- ing a vast amount of -omics data, especially increasing the breadth of available developmental transcriptomic series. In contrast to the genome of an organism, the transcriptome is a phenotype that varies during the lifetime and across different body parts. Studying a developmental tran- scriptome from a population genomic and spatio-temporal perspective is a promising approach to understand the genetic and developmental basis of the phenotypic change. This thesis is an integrative population genomics and evolutionary biol- ogy project following a bioinformatic approach. It is performed in three sequential steps: (i) the comparison of different variations of the McDon- ald and Kreitman test (MKT), a method to detect recurrent positive se- lection on coding sequences at the molecular level, using empirical data from a North American population of D. melanogaster and simulated data, (ii) the inference of the genome features correlated with the evolu- tionary rate of protein-coding genes, and (iii) the integration of patterns of genomic variation with annotations of large sets of spatio-temporal developmental data (evo-dev-omics). As a result of this approach, we have carried out two different stud- ies integrating the patterns of genomic diversity with multiomics layers across developmental time and space. In the first study, we give a global perspective on how natural selection acts during the whole life cycle of D. melanogaster, assessing whether different regimes of selection act iii through the developmental stages. In the second study, we draw an ex- haustive map of selection acting on the complete embryo anatomy of D. melanogaster. Taking all together, our results show that genes expressed in mid- and late-embryonic development stages exhibit the highest sequence conser- vation and the most complex structure: they are larger, consist of more exons and longer introns, encode a large number of isoforms and, on average, are highly expressed. Selective constraint is pervasive, particu- larly on the digestive and nervous systems. On the other hand, earlier stages of embryonic development are the most divergent, which seems to be due to the diminished efficiency of natural selection on maternal- effect genes. Additionally, genes expressed in these first stages have on average the shortest introns, probably due to the need for a rapid and efficient expression during the short cell cycles. Adaptation is found in the structures that also show evidence of adaptation in the adult, the immune and reproductive systems. Finally, genes that are expressed in one or a few different anatomical structures are younger and have higher rates of evolution, unlike genes that are expressed in all or almost all structures. Population genomics is no longer a theoretical science, it has become an interdisciplinary field where bioinformatics, large functional -omics datasets, statistical and evolutionary models and emerging molecular techniques are all integrated to get a systemic view of the causes and consequences of evolution. The integration of population genomics with other phenotypic multiomics data is the necessary step to gain a global picture of how adaptation occurs in nature. iv ACKNOWLEDGMENTS Primero de todo quisiera agradecer muy sinceramente al Dr. Antonio Barbadilla y al Dr. Isaac Salazar por haberme dado la gran oportunidad de realizar este trabajo bajo su supervisión. Antonio, muchas gracias por haber guiado mis primeros pasos en la genética y haberme adentrado en este apasionante mundo. Isaac, gràcies per haver-me fet més crítica i haver contribuït en el meu creixement com a científica. Quiero agradecer a David Castellano todo lo que me ha enseñado estos años. Te debo mucho, y esta tesis no habría sido posible sin ti. Así mismo, agradecer a Irepan Salvador toda su ayuda y haberme enseñado a ser más rigurosa. El trabajo con vosotros ha sido, sin duda, mucho más llevadero. Agradecer a mis compañeros del laboratorio –Carla, Jon, Isaac, Sergi, Esteve y Jesús– todos los momentos compartidos. Carla, gracias por ofrecerme tu ayuda siempre que lo necesito y ser una gran amiga. Jon, a ti te debo los chismorreos de las tardes que tanto nos gustan y las risas que nos han hecho salir del laboratorio a coger aire. Isaac, tu hueco fue irremplazable y agradezco infinitamente tu apoyo en los momentos difí- ciles. Sergi, me alegro de que esta experiencia nos haya permitido cono- cernos mejor y haber aprendido tanto el uno del otro. Esteve, porque no todos los héroes llevan capa, siempre serás el mejor sysadmin. Jesús, mi gran compañero de fatigas, gracias por estar siempre ahí, envidio las ganas y la fuerza que le echas a todo. Muchas gracias a todos los miembros, tanto antiguos como actuales, del grupo de GBBE. En especial a Sònia, Marta y Raquel, sois las culpables de que me enamorase de la bioinformática y la genómica durante la carrera. Alejandra, aprecio tus ánimos, las charlas y los buenos consejos que me has dado. Teresa, thanks for sending your cheers and support despite the distance. Marina, gracias por alegrar el cotarro las mañanas y tardes del café. Mario, gracias por saber transmitir lo que es hacer ciencia de verdad. I thank Dr. Erich Bornberg-Bauer for giving me the opportunity to stay 3 months in his lab. Thanks to Carsten for his patience with me and his v follow-up guidance. I want to thank April and Brennen for making my days much more enjoyable. També agrair a l’Alícia i a la Maria Josep pel seu ajut administratiu i haver-me facilitat tant les feines burocràtiques. Agradecer a las fuentes de financiamiento de esta tesis: el Servi- cio de Genómica i Bioinformática del IBB por la ayuda durante los primeros meses y la Secretaria d’Universitats i Recerca del Departament d’Empresa i Coneixement de la Generalitat de Catalunya por la beca FI de 3 años. Agradezco a mis amigas todos estos años que han permanecido a mi lado, aunque no nos hayamos visto todo lo que hubiéramos querido: Lorena, Lara, Dámaris, Eli, Sara, Tania y Vero. ¡Nos quedan muchas cenas, celebraciones y viajes por hacer! A las personas que han sabido mantenerme cuerda en el mejor ambiente posible: Montse, Conchi y Teresa, gracias por enseñarme a relativizar los problemas. Después de bailar se ven las cosas de otra manera. Gracias a Laia Blanes por su asesoramiento en el diseño de la portada. Elias, from the beginning you were kind, supportive, and showed me that you were always there for me. Despite the distance, this feeling hasn’t changed. Ich liebe dich. Finalmente, agradecer a mi hermana y mis padres todo su apoyo in- condicional, comprensión y ayuda que me han dado durante toda mi vida. Muchas gracias a todos, tanto los que están lejos como los que ya no están, que me han apoyado, animado y cuidado durante este camino. vi PUBLICATIONS During the Ph.D. research, the following articles were published or are currently under submission process. Three of them are the core of the present thesis (articles 2, 3 and 4) while the others were published as the result of the collaboration with members of the Bioinformatics for Genomics Diversity (BGD) group at the UAB. [1] Castellano, D., M. Coronado-Zamora, J. L. Campos, A. Barbadilla, and A. Eyre-Walker (2015). “Adaptive evolution is substantially im- peded by Hill-Robertson interference in Drosophila.” Molecular Biol- ogy and Evolution 33.2, pp. 442–455. [2] Salvador-Martínez*, I., M. Coronado-Zamora*, D. Castellano*, A. Bar- badilla, and I. Salazar-Ciudad (2018). “Mapping selection within Drosophila melanogaster embryo’s anatomy.” Molecular Biology and Evolution 35.1, pp. 66–79. [3] Coronado-Zamora*, M., I. Salvador-Martínez*, D. Castellano, A. Bar- badilla, and I. Salazar-Ciudad. “Adaptation and selective constraint throughout Drosophila melanogaster life cycle.” (Submitted). [4] Coronado-Zamora, M., J. Murga-Moreno, S. Hervás, S. Casillas, and A. Barbadilla. “Comparison of five McDonald and Kreitman test ap- proaches using Drosophila melanogaster and human population data.” (In preparation). [5] Murga-Moreno*, J., M. Coronado-Zamora*, A. Bodelón, A. Bar- badilla, and S. Casillas (2018). “PopHumanScan: the collaborative catalog of human adaptation.” Nucleic Acid Research. * The authors contributed equally to the work. Author contributions: [1]: In this article, the evolutionary advantage of genetic recombination at the gene level was quantified for the first time in the D.