Computational Ortholog Prediction: Evaluating Use Cases and Improving High-Throughput Performance

by
Matthew Daratha Whiteside
B.Sc. (Hons., Bioinformatics), University of Waterloo, 2006

Thesis Submitted In Partial Fulfillment of the Requirements for the Degree of
Doctor of Philosophy
in the
Department of Molecular Biology and Biochemistry
Faculty of Science

© Matthew Daratha Whiteside 2013
SIMON FRASER UNIVERSITY
Spring 2013

Approval

Name: Matthew Daratha Whiteside
Degree: Doctor of Philosophy
Title of Thesis: Computational Ortholog Prediction: Evaluating Use Cases and Improving High-Throughput Performance
Examining Committee:
  Chair: Dr. Ralph Pantophlet, Assistant Professor, Faculty of Health Sciences
  Dr. Fiona S.L. Brinkman, Senior Supervisor, Professor, Department of Molecular Biology and Biochemistry
  Dr. Jack Chen, Supervisor, Associate Professor, Department of Molecular Biology and Biochemistry
  Dr. Margo M. Moore, Supervisor, Professor, Department of Biological Sciences
  Dr. Ryan D. Morin, Internal Examiner, Assistant Professor, Department of Molecular Biology and Biochemistry
  Dr. Rosemary J. Redfield, External Examiner, Professor, Department of Zoology, University of British Columbia
Date Defended/Approved: March 8th, 2013

Partial Copyright Licence

Abstract

Orthologs are genes that diverged from an ancestral gene when the species diverged. High-throughput computational methods for ortholog prediction are a key component of many computational biology analyses. A fundamental premise in these analyses is that orthologs (when predicted correctly) are functionally equivalent and can be used to transfer gene annotations across species. Currently, many existing ortholog prediction methods generate a sizeable number of incorrect ortholog predictions, especially in cases of complex gene evolution. My thesis examines the functional equivalence hypothesis further and presents one solution that increases the precision of ortholog prediction.
To examine the use of orthologs in computational analysis, I conducted and evaluated three projects that employ ortholog prediction in distinct ways. In these projects, orthologs were used to (1) identify conserved, unique genes in metazoan species, (2) validate predicted gene regulatory modules in Pseudomonas aeruginosa, and (3) construct a transcriptional regulatory network in Aspergillus fumigatus. I identified factors affecting ortholog prediction in these specific use cases, demonstrating how successive gene duplications, incomplete genomes and rapid evolution of gene regulation can impact the results of such analyses.

To improve ortholog prediction, I evaluated and augmented an existing method called Ortholuge. Ortholuge is a computational method that increases the precision of ortholog prediction in a high-throughput setting. I evaluated the performance of Ortholuge, showing that its approach of classifying orthologs based on their relative phylogenetic divergence does identify orthologs that are more functionally equivalent. I compared Ortholuge to the contemporary methods QuartetS and OMA, and showed that Ortholuge consistently identifies functionally equivalent orthologs across a range of taxonomic distances. I also further developed Ortholuge’s functionality by reducing run-time, increasing accuracy and improving usability through a number of modifications. Lastly, to make Ortholuge results available to the research community, I developed a database of Ortholuge ortholog predictions for bacterial and archaeal species. This online database provides high-level visualization of orthologs and the ability to easily run complex queries to retrieve genes that are shared or unique between specified taxa. Overall, this work contributes an enhanced method for precise high-throughput ortholog identification and increases our understanding of the functional equivalences between orthologs.
Keywords: Orthology; Comparative Genomics; Bioinformatics; Phylogenomics; Evolution

Dedication

To my love Joske. Thank you for your eternal support. You give me the courage to follow my dreams.

Acknowledgements

I would like to thank the many people who have helped me in the completion of my PhD thesis. The first is my senior supervisor, Dr. Fiona Brinkman. You have my sincerest gratitude. Thank you for this opportunity, which has taught me so much. My PhD has been more than an education; it has been a life-changing experience.

I would like to thank past and present members of the Brinkman lab. You have made this experience truly unforgettable. Thank you so much for your support and shared laughs during these past six years.

I am very thankful for the invaluable guidance and contribution of my supervisory committee: Dr. Margo Moore and Dr. Jack Chen.

I would like to acknowledge all project collaborators. Your help and the opportunities you have provided me have made this work possible. I would like to extend thanks to:

  • Drs. Jinko Graham and Brad McNeney, Jeong Eun Min – OL.locfdr Project
  • Drs. Melissa Frederic and Michel Leroux – Metazoan Project
  • Dr. Margo Moore, Linda Pinto and Jason Catterson – Aspergillus fumigatus Iron-Limitation Project
  • Geoff Winsor and Matthew Laird – OrtholugeDB Project

Thank you to the funding agencies: the Michael Smith Foundation for Health Research and Simon Fraser University for their financial support during my PhD.

Finally, I would like to thank my family. Without your support and encouragement, none of this would be possible.

Table of Contents

Approval
Partial Copyright Licence
Abstract
Dedication
Acknowledgements
Table of Contents
List of Tables
List of Figures
List of Acronyms
Glossary

1. Introduction to Homology and Comparative Genomics
  1.1. Gene Evolution and Its Impact on Gene Function
    1.1.1. Overview of the Mechanisms of Gene Evolution
    1.1.2. Definition of Orthology and Paralogy
    1.1.3. Correspondence between Mode of Gene Evolution and Functional Divergence
      Fates of Genes after Duplication
      Neofunctionalization
      Subfunctionalization
  1.2. Detection of Orthologs
    1.2.1. Phylogenetic Tree-based Detection of Orthologs
    1.2.2. Graph-based Detection of Orthologs
    1.2.3. Hybrid Approaches
    1.2.4. Resolving Complex Ortholog Relationships
      Detection of In-paralogs
      Grouping Orthologs from Multiple Species
  1.3. Factors Affecting Ortholog Prediction Accuracy
      Gene Loss and Incomplete Genomes
      Gene Fusion and Fission
      Issues Identifying the Nearest Neighbour with the BLAST Algorithm
      Horizontal Gene Transfer
  1.4. Improving Ortholog Prediction
    1.4.1. Review of Existing Strategies for Improved Ortholog Prediction
      Phylogenetic-based Ortholog Prediction without a Species Tree
      Overcoming Limitations of the BLAST Algorithm in Identifying the Nearest Neighbour
      A Domain-Centric Approach
      Synteny
