Developing Genomic Resources for an Emerging Ecological Model Species Senecio lautus, Huanle Liu, MSc
Developing genomic resources for an emerging ecological model species Senecio lautus

Huanle Liu
MSc

A thesis submitted for the degree of Doctor of Philosophy at The University of Queensland in 2014
School of Biological Sciences

Abstract

Next generation sequencing (NGS) technologies have been applied successfully in ecology and evolutionary biology. The ever-expanding application of NGS to non-model organisms in wild populations makes it possible to address important ecological and evolutionary questions at larger and more precise scales. These questions range from deciphering the genetic basis of traits underlying rapid ecological adaptation and speciation to identifying the genome-wide differentiation patterns of populations at different stages of divergence. The latter question can only be answered when the whole genome of the species is available.

The Australian groundsel, Senecio lautus, is a good system for studying ecological speciation. This species complex consists of multiple ecotypes adapted to different environments. Among them, the Dune and Headland ecotypes occur close to each other in several coastal localities of Australia and display strongly contrasting morphologies despite being interfertile. Phylogenetic analyses have supported the independent origin of each Dune and Headland pair, and ecological studies have correlated phenotypic differences with environmental adaptation.

Here I used methylation filtration to develop gene space sequences and transcriptome sequencing (RNAseq) to construct a reference transcriptome for this species. I explored multiple strategies to optimize the final results in terms of assembly completeness, accuracy and contiguity. For gene space sequence assembly, I found that merging the results of different assemblers, each applied to a specific data type, is better than combining results from hybrid assemblers (i.e., assemblers that assemble all types of data at once).
In the case of transcriptome assembly, I found that the multiple spectrum assembly strategy (i.e., multiple assembly parameters and multiple assemblers) reconstructed more genes but introduced more redundancy, complicating downstream analyses such as gene family reconstruction and phylogenetic and population genetic analyses. I also found that redundancy reduction based on expression level is better than the other methods.

I used the gene space assembly as a reference to detect the genome-wide divergence patterns of multiple Dune and Headland ecotypes, and found that geographic distance affects the magnitude of genetic differentiation between populations but not the genomic divergence patterns. Applying pooled RNAseq to four S. lautus ecotypes allowed population genomic and molecular evolution analyses, which revealed a short divergence time among these ecotypes and identified candidate genes that have potentially contributed to the adaptive divergence of S. lautus. I also found decoupled patterns of differential gene expression and coding sequence divergence, suggesting that the rapid divergence of S. lautus has been achieved through evolution at both levels. This work is essential for establishing S. lautus as an excellent system for studying ecological speciation with gene flow.

Declaration by author

This thesis is composed of my original work, and contains no material previously published or written by another person except where due reference has been made in the text. I have clearly stated the contribution by others to jointly-authored works that I have included in my thesis.

I have clearly stated the contribution of others to my thesis as a whole, including statistical assistance, survey design, data analysis, significant technical procedures, professional editorial advice, and any other original research work used or reported in my thesis.
The content of my thesis is the result of work I have carried out since the commencement of my research higher degree candidature and does not include a substantial part of work that has been submitted to qualify for the award of any other degree or diploma in any university or other tertiary institution. I have clearly stated which parts of my thesis, if any, have been submitted to qualify for another award.

I acknowledge that an electronic copy of my thesis must be lodged with the University Library and, subject to the General Award Rules of The University of Queensland, immediately made available for research and study in accordance with the Copyright Act 1968.

I acknowledge that copyright of all material contained in my thesis resides with the copyright holder(s) of that material. Where appropriate I have obtained copyright permission from the copyright holder to reproduce material in this thesis.

Publications during candidature

Peer-reviewed papers

ELLIOTT, A. G., C. DELAY, H. LIU, Z. PHUA, K. J. ROSENGREN et al., 2014 Evolutionary origins of a bioactive peptide buried within preproalbumin. The Plant Cell Online 26: 981-995.

RODA, F., L. AMBROSE, G. M. WALTER, H. L. LIU, A. SCHAUL et al., 2013a Genomic evidence for the parallel evolution of coastal forms in the Senecio lautus complex. Molecular Ecology 22: 2941-2952.

RODA, F., H. LIU, M. J. WILKINSON, G. M. WALTER, M. E. JAMES et al., 2013b Convergence and divergence during the adaptation to similar environments by an Australian groundsel. Evolution 67: 2515-2529.

Publications included in this thesis

No publications included.

Contributions by others to the thesis

Contributor                Specific contribution
Federico Roda              Contributed to collecting plant tissues for Gene Space and RADs preparation (90%)
Diana M. Bernal            Contributed to collecting plant tissues for transcriptome sequencing (90%)
Daniel Ortiz-Barrientos    Contributed to Sample collection, Data analysis, and Writing (20%)

Statement of parts of the thesis submitted to qualify for the award of another degree

None

Acknowledgements

I came to Australia and started my PhD study in 2010, so I have been abroad for four and a half years. I still remember the day my parents saw me off at the airport, saying ‘good bye and take care’ to me. First of all, I would like to thank my parents for encouraging me to study abroad.

I also remember my supervisor, Daniel Ortiz-Barrientos, picking me up at Brisbane airport, bringing me to UQ and showing me the beautiful campus. Daniel is the kind of person you instantly love and never forget once you meet him. He is full of energy and enthusiasm. From then on, he has also shown me around the scientific world. Daniel has most of my gratitude for being a considerate and supportive supervisor. He has supervised me with guidance, endless patience and trust. He cheered me up when I struggled with the language at the beginning of my life abroad. He did not give up on me when I failed to meet deadlines and when the sunflower experiment, which was supposed to be my major project, failed during and after the floods. So thanks a million for his insightful and patient guidance and for his tolerance and generosity.

I would also like to express my gratitude to some friends who have made my life in Australia more memorable and enjoyable. I thank Federico Roda, a funny friend in daily life and a good student with an insightful mind. I still laugh when I reminisce about the time we made fun of each other’s accents. I thank Maria Clara Melo, with whom I have had lots of comfortable talks, and I also thank her for introducing me to happy folks just like herself. I would also like to thank all my other lab mates for the conversations we had from time to time and for their unhesitating help.
Last but not least, I really appreciate the help of the UQ Graduate School, which provides a good support system. I thank the Graduate School for approving an extension of my thesis submission in light of the floods and for offering me a scholarship covering the tuition fee during the extension.

Keywords

Next Generation Sequencing, Sequence assembly, Population genomics, Gene expression, speciation, local adaptation, genome divergence

Australian and New Zealand Standard Research Classifications (ANZSRC)

ANZSRC code: 060102, Bioinformatics, 30%
ANZSRC code: 060408, Genomics, 30%
ANZSRC code: 060411, Population, Ecological and Evolutionary Genetics, 40%

Fields of Research (FoR) Classification

FoR code: 0603, Evolutionary Biology, 40%
FoR code: 0604, Genetics, 60%

TABLE OF CONTENTS

Chapter 1
INTRODUCTION
    ABSTRACT
    AN OVERVIEW OF NGS
    A BRIEF HISTORY OF NGS
    APPLICATION OF NGS TO NON-MODEL SYSTEMS
    CHALLENGES OF NGS APPLICATION AND POTENTIAL SOLUTIONS
    THESIS CHAPTERS SUMMARIES
    REFERENCE
Chapter 2