Sydney Escholarship Repository
Total Page:16
File Type:pdf, Size:1020Kb
THE UNIVERSITY OF SYDNEY Copyright and use of this thesis This thesis must be used in accordance with the provisions of the Copyright Act 1968. Reproduction of material protected by copyright may be an infringement of copyright and copyright owners may be entitled to take legal action against persons who infringe their copyright. Section 51 (2) of the Copyright Act permits an authorized officer of a university library or archives to provide a copy (by communication or otherwise) of an unpublished thesis kept in the library or archives, to a person who satisfies the authorized officer that he or she requires the reproduction for the purposes of research or study. The Copyright Act grants the creator of a work a number of moral rights, specifically the right of attribution, the right against false attribution and the right of integrity. You may infringe the author’s moral rights if you: - fail to acknowledge the author of this thesis if you quote sections from the work - attribute this thesis to another author -subject this thesis to derogatory treatment which may prejudice the author’s reputation For further information contact the University’s Copyright Service. sydney.edu.au/copyright Genome evolution in Prochlorococcus and marine Synechococcus Camilla Lan-Chih Ip BSc (Hons), MApplSc (Bioinf) A thesis submitted in fulfillment of the requirements for the degree of Doctor of Philosophy School of Biological Sciences The University of Sydney New South Wales 2006 Australia July 2010 Statement of authorship I hereby declare that the work presented in this thesis is, to the best of my knowledge, original and my own work, except as acknowledged in the text; and that the material has not been submitted, either in whole or in part, for a degree at this or any other university. Acknowledgements A PhD is a long and difficult journey. During my candidature, many people have been generous with their time, good counsel, and resources. I am honoured that my supervisors Anthony Larkum, Michael Charleston and Murray Henwood thought I was worthy of such a significant investment of their time and energy. For the hundreds of hours collectively spent with me during my candidature, their patience and friendship, and their continued support of my efforts to complete my candidature in the face of numerous obstacles, I am profoundly grateful. Without the wisdom and guidance of my supervisors, I would not have reached my destination. I warmly thank Min Chen for welcoming me into the her laboratory group, for her insight ful discussions on the biochemistry of cyanobacteria, and for demonstrating by example how to be an efficient research scientist. I extend my thanks to Lars Jermiin, under whose supervision I started this candidature, for taking me on as a student and guiding my early research. I thank past and present members of the Bioinformatics Laboratory (Isabel Hyman, Shauna Murray, Faisal Ababneh, Vivek Jayaswal, Nick Ho and Kwok Wai (Rex) Lau, Julia Jones and Tanya Golubchik), the Marine Algae Laboratory (Yinan Zhang, Mar tin Schliep, Ray Ritchie, Kathie Donohoe, Ross McC. Lilley and Moacir Torres) and the Plant Systematics Laboratory (Trevor Wilson, Matt Pye, Endymion Cooper, Kerry Gibbons and Andrew Perkins) at the School of Biological Sciences, the Bioinformatics Interest Group (Joshua Ho, Yung Shwen Ho, Jeremy Sumner, Michael Woodhams, liana Lichtenstein and Selma Klanten) at the School of Information Technologies, and Leon Poladian from the School of Mathematics and Statistics. I thank Luke Henderson from the School of Medical Sciences in the Faculty of Medicine for generously providing access to additional computing resources. I thank the University of Sydney for funding the University Postgraduate Award scholarship that subsidised my living expenses during my candidature and allowed me to conduct my studies on a full time basis. Similarly, I was able to attend international conferences and workshops with the assistance of the Postgraduate Research Support Scheme at the University of Sydney, and Marine Genomics Europe, who generously relaxed their rules to allow a student from outside the EU to attend. These opportunities to interact with researchers in my held were critical for the development of my research skills. The final part of the thesis write-up was conducted while working as a post-doc. I warmly thank Rory Bowden, Derrick Crook, Tim Peto and Rosalind Harding for not putting additional pressure on me during this difficult period and generously allowing me to take time off to complete the thesis. Throughout the entire PhD journey, my partner Danny Yee, has been there to support me. He has been an active sounding board for ideas, provided invaluable technical support for the installation of computer software and hardware, put his own plans on hold in order to be there for me, and unfailingly believed that I could complete the candidature. Without his love and companionship, it would have been a dark and lonely journey. I thank my siblings Amy, Daniel and Lilian for all their love, support and advice during my candidature. This work is dedicated to my mother Dana, and my late father Percy, who nurtured my quest for wisdom through knowledge. IV Abstract The primary aim of this work was to explore the relative importance of vertical and non-vertical inheritance in one group of closely-related cyanobacteria, and from this to gain insights into the processes of genome evolution associated with the diversification of photosynthetic systems in bacteria. Recovering the genome-level events associated with the evolutionary diversification of photosynthetic systems is difficult. The information that could be used to recover the evolutionary relationships among taxa has been eroded by a long series of gains, losses, duplications, rearrangements and mutations in the primary sequence. Furthermore, the main technique for inferring the evolutionary relationships among taxa, phylogenetic in ference, is compromised by stochastic and systematic error, a particular problem if the sequences being studied have experienced lateral gene transfer (LGT). Cyanobacteria from the genus Prochlorococcus and the marine species of Synechococcus are small, unicellular, marine organisms that are ecologically important on a global scale. The Prochlorococcus genus is unusual in using a membrane-intrinsic light-harvesting an tenna system consisting of divinyl chlorophylls a and 6, while their closest relative, the marine Synechococcus clade, uses the system most commonly found in the cyanobacteria, a membrane-extrinsic system based on biliproteins organised into phycobilisomes. There is extensive evidence that Prochlorococcus and marine Synechococcus have evolved v from a common ancestor, but the genomic processes involved are not well understood. The evidence of common ancestry is difficult to reconcile with evidence of extensive LGT. Analyses based on different genes have found different phylogenies and it is not clear whether this is due to stochastic or systematic error arising from the phylogenetic inference procedure or is indicative of different evolutionary histories. Prochlorococcus and marine Synechococcus were chosen for this study because they are closely-related, so their genomes are very similar, while still being sufficiently different to be used for a study of genome evolution. Furthermore, as cyanobacteria they are the extant survivors of the pioneering oxyphotobacteria in which oxygenic photosynthetic originally evolved, they have genetic similarities with the anoxygenic photosynthetic bac teria, and they share a common ancestry with the plastids of the eukaryotes, ffence a better understanding of the mechanisms of genome evolution in cyanobacteria may lead to insights into the processes by which photosynthetic systems have diversified. The approach adopted in this study has been to prepare and select sequence data to prevent or minimise sources of error, so as to produce phylogenies that are a better repre sentation of the evolutionary history. Each step of the phylogenetic inference process was scrutinised to identify potential sources of error and extensive preliminary studies were conducted to identify the most appropriate methods for data preparation and analysis. Unsupervised inference of orthologous sets of protein-coding genes from diverse eubacte- ria was a necessary but particularly error-prone step, so preliminary studies were carried out to identify the best BLAST parameters to use in pairwise comparisons and several clustering methods were trialled before the Markov Clustering (MCL) method was se lected. Different phylogenetic methods, and in particular, phylogenetic network methods, were evaluated for suitability and utility. Maximum-likelihood was selected for tree-based phylogenetic inference, and the Neighbor-Net analyses for highlighting conflicting site- patterns in gene alignments. vi A set of 70 diverse eubacteria consisting of 31 cyanobacteria, 12 anoxyphotobacteria and 27 non-photosynthetic eubacteria was selected for analysis. A “core” of 439 protein coding genes was identified and, from these 280 that could be used for phylogenetic analyses, were filtered to identify a “reduced core” of 62 genes which did not exhibit large variation in amino-acid frequencies. Functional annotation was allocated to each set by cross-referencing with the Kyoto Encyclopedia of Genes and Genomes (KEGG) database to find that 12 of the 21 functional categories relevent to bacteria were covered. Phylogenetic trees were inferred using