Applying Computational Solutions for Solving Problems in Mammalian
University of Connecticut
OpenCommons@UConn
Doctoral Dissertations, University of Connecticut Graduate School
12-29-2017

Applying Computational Solutions for solving problems in Mammalian Gene Family Evolution and Single Cell Gene Expression Analysis

Ajay Obla
University of Connecticut - Storrs

Recommended Citation
Obla, Ajay, "Applying Computational Solutions for solving problems in Mammalian Gene Family Evolution and Single Cell Gene Expression Analysis" (2017). Doctoral Dissertations. 1693. https://opencommons.uconn.edu/dissertations/1693

Applying Computational Solutions for solving problems in Mammalian Gene Family Evolution and Single Cell Gene Expression Analysis
Ajay Babu Obla Kumaresh
University of Connecticut, 2018

Using computational tools to solve various biological problems has become common practice over the last decade. This has been primarily fueled by exponentially growing high-throughput biological data and the computational biology resources that support it [1]. The work presented in this thesis showcases the application of computational techniques to answer critical biological questions pertaining to gene family evolution and single cell gene expression analysis.

Dr. Susumu Ohno was a pioneer in proposing the significance of gene duplication in driving gene family evolution. Gene duplication has been shown to expand the repertoire of several gene families involved in a multitude of important biological functions. Thus, understanding the forces that lead to the retention and loss of gene duplicates has been subject to close scrutiny over the past decade. One of the works presented in this thesis provides a strong argument for a novel gene duplication model that seeks to explain the retention of previously unexplained gene duplicates.
The first part of the work involved an in-depth study of the evolutionary history of the mammalian ribosomal protein gene (RPG) family. This study confirmed our preliminary finding that there are thousands of intact duplicates whose fate could not be explained by existing gene duplication models. This led us to frame a novel gene duplication model that explains the retention of these gene duplicates. We also make a strong case for this model by employing rigorous in-silico and in-vitro tests to demonstrate its feasibility. We have achieved a rare feat by employing these two orthogonal but necessary tests, which are lacking in other gene duplication models.

Our investment in studying RPG family evolution enabled us to frame a novel single cell RNA-Seq (scRNA-Seq) QC pipeline. We hypothesized that the biological constraints under which RPGs function could serve as a robust biological indicator of cell health at single cell resolution. We formulated an outlier-based QC model consisting of three features that can be extracted from RPG transcriptional signatures in any scRNA-Seq dataset. We show stable performance of the model across various datasets and compare it with other QC features widely used in existing approaches. This QC model is designed to be easily implemented and applied to any scRNA-Seq study irrespective of experimental approach.
Applying Computational Solutions for solving problems in Mammalian Gene Family Evolution and Single Cell Gene Expression Analysis

Ajay Babu Obla Kumaresh
M.S., University of New Haven, 2010

A Dissertation Submitted in Partial Fulfilment of the Requirement for the Degree of Doctor of Philosophy at the University of Connecticut
2018

Copyright by Ajay Babu Obla Kumaresh
2018

APPROVAL PAGE

Doctor of Philosophy Dissertation

Applying Computational Solutions for solving problems in Mammalian Gene Family Evolution and Single Cell Gene Expression Analysis

Presented by Ajay Babu Obla Kumaresh, M.S.

Major Advisor: Dr. Craig Nelson
Associate Advisor: Dr. Ion Mandoiu
Associate Advisor: Dr. Victoria Robinson
Associate Advisor: Dr. David Goldhamer
Associate Advisor: Dr. Mukul Bansal

University of Connecticut
2018

Acknowledgements

I would like to first thank my doctoral adviser, Craig, for giving me the opportunity to pursue my thesis work in his lab. I was fortunate to be exposed to diverse research projects under his tutelage. Craig is uncompromising in his attention to detail and always sets a high standard of excellence in all domains of scientific research, from critical thinking to articulating complex research findings. This has pushed me to keep extending my boundaries, and I am certainly thankful to him for patiently training me to become an able scientist.

A special thanks to my lab mate Asav for being a co-traveler in two research projects and helping me immensely when I first joined Craig’s lab. We have motivated each other by working long hours to meet deadlines and brainstorming future experiments.
We had a highly productive partnership in which Asav took the lead on the wet lab and I took the lead on the dry lab.

I would like to thank my co-adviser Ion for providing me with much-needed access to computational resources and exposing me to innovative computational analyses. In particular, his help and advice in developing the scRNA-Seq QC pipeline were invaluable. My committee member Vicky has been an amazing pillar of support throughout graduate school and played a vital role in providing great insights into ribosomal biology. I would also like to thank my other committee members, David and Mukul, for their honest opinions and feedback during my interactions with them.

My pleasant stint in the Nelson Lab was greatly aided by fun-loving lab mates Fred, Albert, Ed, Jay, Caroline, Matt G and Steve, who were always there to cheer and counsel me in times of need. I am highly indebted to them for maintaining a vibrant environment.

Last, but not least, I would like to thank my parents for providing a strong home base to turn to for comfort and for being supportive of all my endeavors in life.

Contents

Chapter 1: Introduction to Gene Family Evolution
1.1 Evolution of gene families through gene duplication
1.2 Gene Duplication Mechanisms
1.3 Models for Gene Duplication
1.4 Retro-duplication in mammals
1.5 Known examples of retrotransposed duplicates
1.6 The Mammalian Ribosome and Ribosomal Proteins
1.7 Known Examples of Mammalian Ribosomal Protein Duplicates
1.8 Prior work and scope of the study

Chapter 2: Tempo and Mode of Gene Duplication in Mammalian Ribosomal Protein Evolution
2.1 Computational pipeline to reconstruct mammalian RP Gene Families
2.1.1 Ribosomal Dataset
2.1.2 Extraction of Gene Family Members
2.1.3 Identification of Duplications and Phylogenetic Analysis
2.1.4 Conservation and EST Analyses
2.2 Results from the analysis pipeline
2.2.1 76 Ribosomal Protein Family Member Analyses
2.2.2 The Fate of Ribosomal Protein Duplications over time
2.2.3 Analysis of Selective Pressure Acting on All Ribosomal Gene Duplicates
2.2.4 EST Analysis for human and mouse RP duplicate genes
2.3 Discussion

Chapter 3: Evaluation of Gene duplication models with duplication mechanisms
3.1 Mechanism of Gene Duplication
3.1.1 Unequal crossing over
3.1.2 DNA Transposition