
High-resolution sequencing of mitochondrial DNA

by Sofia Nicolai Annis

B.A. in Biology, Smith College

A dissertation submitted to The Faculty of the College of Science of Northeastern University in partial fulfillment of the requirements for the degree of Doctor of Philosophy

July 27th, 2020

Dissertation directed by Konstantin Khrapko
Department of Biology

Acknowledgments

I would first like to thank Dr. Konstantin Khrapko for inviting me into his lab and graciously letting me take control of the projects that interested me while always being available with guidance and support. I appreciate all of the meandering conversations about our science, unrelated science, and life in general. I would not have been able to keep going without the continuous supply of coffee, chocolate, and good humor. The additional mentorship and support from Dr. Jonathan Tilly and particularly Dr. Dori Woods was invaluable and truly helped me grow as a scientist.

Many aspects of this work required a true team effort. Zoe Fleischmann provided invaluable contributions to the development of LUCS and showed that bench and computational scientists can work together in harmony. Her patient guidance in teaching me the basics of coding is greatly appreciated. Other members of the Laboratory for Aging and Infertility Research were excellent technical and theoretical sounding boards, and they helped create a supportive and happy community.

I want to thank the many, many undergraduate, graduate, and high school students with whom I worked in the Khrapko lab. Through teaching them, I was able to expand and deepen my own knowledge, and I had to challenge myself to always have an answer waiting. Their willingness to take on a wide variety of projects let me chase many ideas at once. Particular thanks are owed to Housaiyin Li, Melissa Franco, and Zachary Mullin-Bernstein.

Finally, I want to thank my family for fostering my interest in science from a young age.
They encouraged me to perform my own genetics experiments with my many rabbits, sent me away to science camps in the summers, and pretended not to be too sad when I continued to spend my summers away from home in labs across the country. Last but not least, I want to thank my boyfriend Mike, who along with his family has been so patient and kind, for his unwavering support and devotion.

Abstract of Dissertation

Mitochondria, heralded as the powerhouses of the cell, contain small, highly conserved genomes. This mitochondrial DNA (mtDNA) is abundant in most eukaryotic cells, reflecting its crucial role in providing protein subunits that are necessary for oxidative phosphorylation. mtDNA is asexually transmitted through the maternal germline, but the precise mechanisms that govern the maintenance of genomic integrity through the generations are still a focus of research and lively debate. The general consensus is that germline mitochondria undergo a population bottleneck and are exposed to strong purifying selection, a combination that can generate a high degree of genetic variability among offspring. Details are lacking, however, on the exact structure of the bottleneck and the timing of selective pressures. Current mtDNA sequencing strategies that aim to elucidate these topics largely rely on next-generation sequencing (NGS) datasets that provide an abundance of data but offer no insight into mtDNA as discrete and diverse genetic elements. This research offers a high-resolution analysis of mtDNA in oocytes, with a particular focus on low-frequency variants that are often overlooked by NGS approaches but could have important clinical significance.
To further enhance this whole-mtDNA genome sequencing approach, this work describes a novel sequencing strategy that couples Oxford Nanopore’s long single-read capabilities with unique molecular identifiers that can generate highly accurate consensus sequences and ultimately capture the true genetic diversity of mtDNA populations.

Table of Contents

Acknowledgments
Abstract of Dissertation
Table of Contents
Abbreviations
Chapter 1: High-resolution analysis of oocyte mtDNA
    Introduction
    Materials and methods
    Results
    Discussion
    Figures
    Tables
    References
Chapter 2: Quasi-Mendelian Paternal Inheritance of mitochondrial DNA: A notorious artifact, or anticipated mtDNA behavior?
    Abstract
    Introduction
    Methods and results
    Figures
    References
Chapter 3: LUCS: a high-resolution nucleic acid sequencing tool for accurate long-read analysis of individual DNA molecules
    Introduction
    Materials and methods
    Results
    Discussion
    Figures
    Tables
    References

Abbreviations

bp: Base pair
CCS: Circular consensus sequencing
COX: Cytochrome c oxidase
ETC: Electron transport chain
Gbp: Giga base pair (1,000,000,000 bp)
GERP: Genomic evolutionary rate profiling
HGP: Human Genome Project
Indel: Insertion or deletion in a DNA sequence
Kbp: Kilo base pair (1,000 bp)
LHON: Leber hereditary optic neuropathy
LUCS: Long UMI-driven consensus sequencing
Mbp: Mega base pair (1,000,000 bp)
mtDNA: Mitochondrial DNA
nDNA: Nuclear DNA
NGS: Next-generation sequencing
NUMT: Nuclear mitochondrial DNA
PGC: Primordial germ cell
PolG: Polymerase gamma
SMRT: Single molecule real time sequencing
SNP: Single nucleotide polymorphism
SNV: Single nucleotide variant
ssDNA: Single-stranded DNA
UMI: Unique molecular identifier
VEP: Variant effect predictor
WGS: Whole-genome sequencing

Chapter 1: High-resolution analysis of oocyte mtDNA

Introduction

Often called the powerhouses of the cell, the mitochondria are crucial
components of eukaryotic cells that fulfill the bulk of the cells’ energetic demands through oxidative phosphorylation [1]. Mitochondria arose following the endosymbiosis of a single-celled organism, likely an α-proteobacterium, into the common, likely archaeal, ancestor of all eukaryotes around 2 billion years ago [2-4]. This monophyletic origin has been traced back by sequencing and phylogenetic analysis of the mitochondrial DNA (mtDNA) that still remains in the organelle as a vestige of its previous free-living lifestyle [5]. The emergence of mitochondria as specialized intracellular energy producers gave an immense evolutionary advantage to their host cells and is one of the key reasons for the growth in cell size and complexity of eukaryotes [6-8].

Over time, mitochondria lost most of their genetic complexity, shifting to a reliance on the nuclear genome to provide the organelle with essential proteins. Mitochondrial genome size varies across the eukaryotic domain, but in mammals it is typically 15,000-17,000 bp. Encoding just 13 protein subunits, 22 tRNAs and 2 rRNAs, mammalian mtDNA is a compact, intron-free circular genome with minimal non-coding sequence (Figure 1.1). Around 1,500 nuclear gene products are involved in governing mitochondrial function, with specific expression varying across tissue types [9, 10]. The mitochondrial genome exists in high abundance, with an average of ~4.6 copies per organelle [11, 12] and hundreds to hundreds of thousands of organelles per cell.

Across eukaryotic evolution, mitochondria have lost differing amounts of their genomic content. Plants have significantly larger mitochondrial genomes, typically ranging from 100-1,000 kbp and encoding ~40-156 genes; the lower gene-to-genome size ratio is due to a more complex genetic structure featuring introns and repetitive sequences [13].
Despite the broad diversity of mitochondrial genomes, a shared trend across eukaryotes is the retention of components of the electron transport chain (ETC) [14]. The proper function of these retained mitochondrial genes is vital for life; indeed, pathogenic mtDNA mutations that disrupt gene function can manifest in a variety of clinical syndromes ranging from mild to fatal.

Leber hereditary optic neuropathy (LHON) is the most prevalent mitochondrial disorder, with an estimated prevalence between 1 in 10,000 and 1 in 50,000 among Europeans [15, 16]. LHON was the first disorder to be directly linked to a mutation in the mitochondrial genome [17] and is characterized by the degeneration of retinal ganglion cells, leading to severe or complete vision loss. Like many mitochondrial diseases, LHON can arise from any of several point mutations, with three point mutations accounting for 90-95% of cases and dozens of other mutations eliciting a similar pathology [15]. Because mitochondrial function is vital for almost all cells in the human body, dysfunction caused by mtDNA mutations affects a broad spectrum of organ systems and results in a varied range of clinical phenotypes.

A key feature of pathogenic mtDNA mutations is that they must be present at relatively high frequency within the cell in order to exert a negative phenotypic effect. For example, the pathogenic mutations that cause LHON typically need to be present in at least 60% of the mtDNA genomes in order to result in loss of vision [15]. Because mammals are diploid, a pathogenic nuclear mutation present in as few as one copy can have a detrimental effect on mitochondrial function. The presence of a single pathogenic mutation in mtDNA within a cell, however, will not have a noticeable effect, as it is merely one disruption among hundreds or thousands of genomes.
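The threshold argument above can be made concrete with a little arithmetic. The following is a minimal Python sketch, not a method from this work: the function names are hypothetical, the copy numbers are illustrative, and the 60% default is simply the approximate LHON figure quoted above rather than a universal cutoff.

```python
def mutant_fraction(mutant_copies: int, total_copies: int) -> float:
    """Return the fraction of mtDNA copies in a cell that carry a mutation."""
    if total_copies <= 0:
        raise ValueError("total_copies must be positive")
    if not 0 <= mutant_copies <= total_copies:
        raise ValueError("mutant_copies must lie between 0 and total_copies")
    return mutant_copies / total_copies


def exceeds_threshold(mutant_copies: int, total_copies: int,
                      threshold: float = 0.60) -> bool:
    """Check whether the mutant fraction reaches a phenotypic threshold.

    The default of 0.60 is an assumption taken from the approximate LHON
    value cited in the text; other mutations may have other thresholds.
    """
    return mutant_fraction(mutant_copies, total_copies) >= threshold


# Illustrative numbers only: one mutant genome among 1,000 copies
# is far below the threshold, whereas 600 of 1,000 reaches it.
single_hit = exceeds_threshold(1, 1000)
high_load = exceeds_threshold(600, 1000)
```

Under these assumptions, a single mutant genome among 1,000 copies corresponds to a fraction of 0.001, which is why an isolated mtDNA mutation is not expected to produce a phenotype.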
Even slightly higher levels of heteroplasmy (the state of having multiple mtDNA genotypes within a cell, as contrasted with the uniform state of homoplasmy) are generally not sufficient to have a phenotypic effect, owing to the dynamic nature of mitochondrial populations. While mitochondria