Viral Quasispecies Reconstruction Using Next Generation Sequencing Reads

Total Page:16

File Type:pdf, Size:1020Kb

Viral Quasispecies Reconstruction Using Next Generation Sequencing Reads Georgia State University ScholarWorks @ Georgia State University Computer Science Dissertations Department of Computer Science Summer 8-12-2013 Viral Quasispecies Reconstruction Using Next Generation Sequencing Reads Bassam A. Tork Georgia State University Follow this and additional works at: https://scholarworks.gsu.edu/cs_diss Recommended Citation Tork, Bassam A., "Viral Quasispecies Reconstruction Using Next Generation Sequencing Reads." Dissertation, Georgia State University, 2013. https://scholarworks.gsu.edu/cs_diss/77 This Dissertation is brought to you for free and open access by the Department of Computer Science at ScholarWorks @ Georgia State University. It has been accepted for inclusion in Computer Science Dissertations by an authorized administrator of ScholarWorks @ Georgia State University. For more information, please contact [email protected]. VIRAL QUASISPECIES RECONSTRUCTION USING NEXT GENERATION SEQUENCING READS by BASSAM TORK Under the Direction of Dr. Alexander Zelikovsky ABSTRACT The genomic diversity of viral quasispecies is a subject of great interest, especially for chronic infections. Characterization of viral diversity can be addressed by high-throughput sequencing technology (454 Life Sciences, Illumina, SOLiD, Ion Torrent, etc.). Standard assembly software was originally designed for single genome assembly and cannot be used to assemble and estimate the frequency of closely related quasispecies sequences. This work focuses on parsimonious and maximum likelihood models for assembling viral quasispecies and estimating their frequencies from 454 sequencing data. Our methods have been applied to several RNA viruses (HCV, IBV) as well as DNA viruses (HBV), genotyped using 454 Life Sciences amplicon and shotgun methods. INDEX WORDS: RNA viruses, Viral quasispecies, Genome assembly VIRAL QUASISPECIES RECONSTRUCTION USING NEXT GENERATION SEQUENCING READS by BASSAM TORK A Dissertation Submitted in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy in the College of Arts and Sciences Georgia State University 2013 Copyright by Bassam Tork 2013 VIRAL QUASISPECIES RECONSTRUCTION USING NEXT GENERATION SEQUENCING READS by BASSAM TORK Committee Chair: Dr. Alexander Zelikovsky Committee: Dr. Yury Khudyakov Dr. Robert Harrison Dr. Rajshekhar Sunderraman Electronic Version Approved: Office of Graduate Studies College of Arts and Sciences Georgia State University August 2013 iv DEDICATION To my parents Abdel Rahman Tork and Mariam Tork. v ACKNOWLEDGEMENTS I would especially like to thank my advisor Dr. Alexander Zelikovsky for his encourage- ment, advisement and constant support over the years under his supervision at Georgia State University. This dissertation could not have been done without his guidance, patience and motivation. I have learned more about computer science and conducting research from him than virtually everyone else. I would especially like to thank my committee members: Dr. Yury Khudyakov, Dr. Robert Harrison, and Dr. Rajshekhar Sunderraman. Their critique of my work with no doubt strengthened and improved it. Special thanks to Professor Ion Mandoiu and Professors I.Khan and O'Neill of University of Connecticut for their valuable advisement. Special thanks to Dr. Yi Pan Chairman of the Computer Science Department for his encouragement, advisement and support over the years of my PhD study. Special thanks to Dr. Rajshekhar Sunderraman for his encouragement, advisement and support over the years of my PhD study. Special thanks to Computer Science Department of Georgia State University faculty, staff and students. Special thanks to the GSU Molecular Basis of Disease Initiative (MBD). Special thanks go out to my dear friends Nick, Hamed, Serghei, Adrian, Sasha, Blanche, Olga, and Ekaterina. Finally I am grateful to my parents, my brothers, my sisters and all my friends for their support and ecouragement. Special thanks go from my heart to my nice family and daughters whom I miss a lot. vi TABLE OF CONTENTS ACKNOWLEDGEMENTS ::::::::::::::::: v LIST OF TABLES :::::::::::::::::::: ix LIST OF FIGURES :::::::::::::::::::: xi LIST OF ABBREVIATIONS :::::::::::::::: xiv PART 1 INTRODUCTION ::::::::::::::: 1 1.1 Viral Quasispecies ............................. 1 1.2 Challenges of Quasispecies Reconstruction From NGS Reads . 2 1.3 Viral Quasispecies Spectrum Reconstruction Problem . 2 1.4 Existing Approaches For Sequencing Viral Quasispecies . 3 1.5 Main Contributions ............................ 4 1.6 Future Work ................................ 5 1.7 Roadmap of the Rest of the Dissertation ............... 5 1.8 Publications ................................. 6 1.9 Oral Presentations ............................. 7 1.10 Book Chapters ............................... 7 1.11 Posters .................................... 7 PART 2 STATE-OF-THE-ART IN VIRAL QUASISPECIES RE- CONSTRUCTION ::::::::::::::: 9 2.1 Introduction ................................. 9 2.2 NGS Technologies ............................. 10 2.3 Existing Tools ................................ 17 2.3.1 ShoRAH . 18 vii 2.3.2 QuRe . 19 2.3.3 Error Correction . 20 PART 3 QUASISPECIES SPECTRUM RECONSTRUCTION FROM SHOTGUN READS :::::::::::::: 21 3.1 ViSpA .................................... 21 3.1.1 Read Error Correction . 22 3.1.2 Read Alignment . 22 3.1.3 Processing of 454 Reads . 23 3.1.4 Read Graph . 23 3.1.5 Candidate path selection. 23 3.1.6 Estimation of candidate quasispecies sequence frequencies. 23 3.2 Fixing Erroneous Homopolymer Indels via Protein Alignment . 24 PART 4 QUASISPECIES SPECTRUM RECONSTRUCTION FROM AMPLICON READS :::::::::::::: 27 4.1 Theoretical Analysis of Quasispecies Assembly Problem for Ampli- con Reads .................................. 27 4.1.1 Quasispecies Assembly in the Error-Free, Ideal-Frequency Model Prob- lem . 28 4.1.2 Reduction of Skewed-Frequency Model to Ideal-Frequency Model 31 4.2 Viral Quasispecies Reconstruction Algorithms From Amplicon Reads 32 4.2.1 Read Graph . 32 4.2.2 Maximum Bandwidth Path . 33 4.2.3 Maximum Frequency Path. 33 4.2.4 Greedy Algorithm for Resolving Forks. 33 4.3 Data Sets .................................. 34 4.3.1 Simulated Reads . 34 4.3.2 HBV Data . 34 viii 4.4 Validation Methods ............................ 34 4.4.1 Quasispecies Assembling Validation . 35 4.4.2 Validation of Frequency Distribution . 35 4.5 Experimental Results ........................... 35 4.5.1 Simulated Reads . 35 4.5.2 Real Reads . 42 PART 5 RECONSTRUCTION OF IBV QUASISPECIES ::: 44 5.1 Introduction ................................. 44 5.2 Methods ................................... 45 5.2.1 Compared Methods . 46 5.3 Data Sets and Golden Standards .................... 47 5.4 Validation .................................. 48 5.4.1 Validation of 454 Reads . 48 5.4.2 Validation of Correction Methods . 48 5.4.3 Comparison Measures . 49 5.5 Experimental Results ........................... 50 5.5.1 Results of 454 Reads Validation . 50 5.5.2 Results of Correction Methods Validation . 53 5.5.3 Results of Reconstructed Quasispecies . 53 PART 6 CONCLUSIONS AND FUTURE WORK :::::: 63 REFERENCES ::::::::::::::::::::: 65 ix LIST OF TABLES Table 4.1 Average sensitivity/PPV of ViSpA and Maximum Bandwidth algo- rithms for shotgun and amplicon error-free read data sets simulated from ten and twenty quasispecies [1, 2]. 42 Table 4.2 Total and distinct reads of amplicons per patient [1, 2]. 43 Table 5.1 Pairwise edit distance between the 10 Sanger clones [3]. 54 Table 5.2 Edit distance between collapsed Sanger clones and reconstructed qua- sispecies using parameters 1 2 5( the number of mismatches between sub-reads and super-reads, the number of mismatches between two overlapped reads, and mutation rate respectively), threshold 0.005 on KEC corrected reads using ViSpA [3]. 55 Table 5.3 Edit distance between collapsed sanger clones and reconstructed qua- sispecies using parameters 1 2 5( the number of mismatches between sub-reads and super-reads, the number of mismatches between two overlapped reads, and mutation rate respectively), threshold 0.005 on uncorrected reads using ViSpA [3]. 56 Table 5.4 Edit distance between collapsed Sanger clones and reconstructed qua- sispecies using ShoRAH assembler with default parameters [3]. 58 Table 5.5 Average distance to clones (ADC) for the reconstructed quasispecies using different methods [3]. 60 Table 5.6 Average prediction error (APE) for the reconstructed quasispecies us- ing different methods [3]. 60 x Table 5.7 Average prediction error(APE) and Average distance to clones(ADC) of reconstructed quasispecies for different methods [3]. 61 xi LIST OF FIGURES Figure 2.1 454 Sequencing Method [4]. 12 Figure 2.2 Illumina Sequencing Method[5]. 14 Figure 2.3 SOLiD Sequencing Method [6]. 16 Figure 2.4 Ion Torrent Sequencing Method [7]. 17 Figure 3.1 Algorithmic approach to solve QSA problem. 22 Figure 3.2 protein sequence alignment (CLUSTAL W) of the quasispecies se- quence (0.0498579823867 ID20 * 1) with the reference (contig20 * 3) before systematic error fixing. 25 Figure 3.3 protein sequence alignment (CLUSTAL W) of the quasispecies se- quence (0.0498579823867 ID20 * 1) with the reference (contig20 * 3) after systematic error fixing. 26 Figure 4.1 The case of two distinct reads for both amplicons [1, 2]. 30 Figure 4.2 Sensitivity for different amplicon window shifts from uniform distribu- tion [1, 2] . 37 Figure 4.3 Sensitivity for different amplicon window shifts from skewed distribu- tion [1, 2] . 37 Figure 4.4 Sensitivity for different amplicon window shifts from geometric
Recommended publications
  • Quasispecies Theory in Virology
    JOURNAL OF VIROLOGY, Jan. 2002, p. 463–465 Vol. 76, No. 1 0022-538X/02/$04.00ϩ0 DOI: 10.1128/JVI.76.1.463–465.2002 Copyright © 2002, American Society for Microbiology. All Rights Reserved. Quasispecies Theory in Virology Holmes and Moya claim that quasispecies is an unnecessary ular sort. Darwinian principles in connection with quasispecies and misleading description of RNA virus evolution, that virol- have been explicitly invoked by theoreticians and experimen- ogists refer to quasispecies inappropriately, and that there is talists alike (11, 13, 16, 35). little evidence of quasispecies in RNA virus evolution. They What is the evidence of quasispecies dynamics in RNA virus wish to look for other ideas in evolutionary biology and to set populations, and why is quasispecies theory exerting an influ- down an agenda for future research. I argue here that real virus ence in virology? The initial experiment with phage Q␤ which quasispecies often differ from the theoretical quasispecies as provided the first experimental support for a quasispecies dy- initially formulated and that this difference does not invalidate namics in an RNA virus (14, 17) has now been carried out with quasispecies as a suitable theoretical framework to understand biological and molecular clones of representatives of the major viruses at the population level. groups of human, animal, and plant RNA viruses, including In the initial theoretical formulation to describe error-prone immunodeficiency viruses and hepatitis C virus, both in cell replication of simple RNA (or RNA-like) molecules, quasispe- culture and in vivo (11). Support for quasispecies has also cies were defined as stationary (equilibrium) mutant distribu- come from studies on replication of RNA molecules in vitro tions of infinite size, centered around one or several master (4).
    [Show full text]
  • Viruses at the Edge of Adaptation
    Virology 270, 251–253 (2000) doi:10.1006/viro.2000.0320, available online at http://www.idealibrary.com on MINIREVIEW Viruses at the Edge of Adaptation Esteban Domingo1 Centro de Biologı´a Molecular “Severo Ochoa,” (CSIC-UAM) Universidad Auto´noma de Madrid, Cantoblanco, 28049 Madrid, Spain Received February 25, 2000; accepted March 14, 2000 How vulnerable is the line that separates adaptation from extinction? Viruses, in particular RNA viruses, are well known for their high rates of genetic variation and their potential to adapt to environmental modifications (Drake and Holland, 1999; Domingo et al., 2000). Yet, fitness variations—both increases and decreases—can be spectacularly rapid, and the simple genetic stratagem of forcing virus multiplication to go through repeated genetic bottlenecks can induce fitness losses, at times near viral extinction. New information has been recently obtained on the two sides of the survival line: the edge of adaptation and the edge of extinction. © 2000 Academic Press SUBTLE ADAPTIVE STRATEGIES present in the population. Furthermore, this particular variant was systematically selected, while, when absent RNA virus quasispecies are highly effective in the from memory, its selection was rare (Ruiz-Jarabo et al., exploration of new genomic sequences (Eigen and 2000). In agreement with the concept that memory ge- Biebricher, 1988). It has long been known that mutant nomes are part of the mutant spectrum of the quasispe- swarms (the mutant spectra that populate viral quasispe- cies, memory was erased when the populations were cies) contain potentially useful variants [those with in- subjected to genetic bottlenecks (Fig. 1). Because of the creased resistance to antiviral agents, altered interferon- existence of a molecular memory, a viral quasispecies inducing capacity, or antibody-escape and cytotoxic T may be capable of reacting swiftly to a selective con- lymphocyte (CTL)-escape mutants, among others].
    [Show full text]
  • Evolutionary Virology at 40
    | PERSPECTIVES Evolutionary Virology at 40 Jemma L. Geoghegan* and Edward C. Holmes†,‡,§,**,1 *Department of Biological Sciences, Macquarie University, Sydney, New South Wales 2109, Australia and †Marie Bashir Institute for Infectious Diseases and Biosecurity, ‡Charles Perkins Centre, §School of Life and Environmental Sciences, and **Sydney Medical School, The University of Sydney, New South Wales 2006, Australia ORCID IDs: 0000-0003-0970-0153 (J.L.G.); 0000-0001-9596-3552 (E.C.H.) ABSTRACT RNA viruses are diverse, abundant, and rapidly evolving. Genetic data have been generated from virus populations since the late 1970s and used to understand their evolution, emergence, and spread, culminating in the generation and analysis of many thousands of viral genome sequences. Despite this wealth of data, evolutionary genetics has played a surprisingly small role in our understanding of virus evolution. Instead, studies of RNA virus evolution have been dominated by two very different perspectives, the experimental and the comparative, that have largely been conducted independently and sometimes antagonistically. Here, we review the insights that these two approaches have provided over the last 40 years. We show that experimental approaches using in vitro and in vivo laboratory models are largely focused on short-term intrahost evolutionary mechanisms, and may not always be relevant to natural systems. In contrast, the comparative approach relies on the phylogenetic analysis of natural virus populations, usually considering data collected over multiple cycles of virus–host transmission, but is divorced from the causative evolutionary processes. To truly understand RNA virus evolution it is necessary to meld experimental and comparative approaches within a single evolutionary genetic framework, and to link viral evolution at the intrahost scale with that which occurs over both epidemiological and geological timescales.
    [Show full text]
  • Error Threshold in RNA Quasispecies Models with Complementation
    ARTICLE IN PRESS Journal of Theoretical Biology 265 (2010) 278–286 Contents lists available at ScienceDirect Journal of Theoretical Biology journal homepage: www.elsevier.com/locate/yjtbi Error threshold in RNA quasispecies models with complementation Josep Sardanye´s a,Ã, Santiago F. Elena a,b a Instituto de Biologı´a Molecular y Celular de Plantas, Consejo Superior de Investigaciones Cientı´ficas-UPV, Ingeniero Fausto Elio s/n, 46022 Valencia, Spain b Santa Fe Institute, 1399 Hyde Park Road, Santa Fe NM 87501, USA article info abstract Article history: A general assumption of quasispecies models of replicons dynamics is that the fitness of a genotype is Received 22 December 2009 entirely determined by its sequence. However, a more biologically plausible situation is that fitness Received in revised form depends on the proteins that catalyze metabolic reactions, including replication. In a stirred population 26 April 2010 of replicons, such as viruses replicating and accumulating within the same cell, the association between Accepted 14 May 2010 a given genome and the proteins it encodes is not tight as it can be replicated by proteins translated Available online 21 May 2010 from other genomes. We have investigated how this complementation phenomenon affects the error Keywords: threshold in simple quasispecies mean field models. We first studied a model in which the master and Complementation the mutant genomes code for wild-type and mutant replicases, respectively. We assume that the Defective interfering particles mutant replicase has a reduced activity and that the wild-type replicase does not have increased affinity Error threshold for the master genome.
    [Show full text]
  • Profiling SARS-Cov-2 Mutation Fingerprints That Range from the Viral Pangenome to Individual Infection Quasispecies Billy T
    Lau et al. Genome Medicine (2021) 13:62 https://doi.org/10.1186/s13073-021-00882-2 RESEARCH Open Access Profiling SARS-CoV-2 mutation fingerprints that range from the viral pangenome to individual infection quasispecies Billy T. Lau1,2†, Dmitri Pavlichin1†, Anna C. Hooker1†, Alison Almeda1, Giwon Shin1, Jiamin Chen1, Malaya K. Sahoo3, Chun Hong Huang3, Benjamin A. Pinsky3,4, Ho Joon Lee1* and Hanlee P. Ji1,2* Abstract Background: The genome of SARS-CoV-2 is susceptible to mutations during viral replication due to the errors generated by RNA-dependent RNA polymerases. These mutations enable the SARS-CoV-2 to evolve into new strains. Viral quasispecies emerge from de novo mutations that occur in individual patients. In combination, these sets of viral mutations provide distinct genetic fingerprints that reveal the patterns of transmission and have utility in contact tracing. Methods: Leveraging thousands of sequenced SARS-CoV-2 genomes, we performed a viral pangenome analysis to identify conserved genomic sequences. We used a rapid and highly efficient computational approach that relies on k-mers, short tracts of sequence, instead of conventional sequence alignment. Using this method, we annotated viral mutation signatures that were associated with specific strains. Based on these highly conserved viral sequences, we developed a rapid and highly scalable targeted sequencing assay to identify mutations, detect quasispecies variants, and identify mutation signatures from patients. These results were compared to the pangenome genetic fingerprints. Results: We built a k-mer index for thousands of SARS-CoV-2 genomes and identified conserved genomics regions and landscape of mutations across thousands of virus genomes.
    [Show full text]
  • Virus-Host Interactions by Linking Virology and Bioinformatics
    viruses Commentary ITN—VIROINF: Understanding (Harmful) Virus-Host Interactions by Linking Virology and Bioinformatics Winfried Goettsch 1 , Niko Beerenwinkel 2,† , Li Deng 3,† , Lars Dölken 4,† , Bas E. Dutilh 5,† , Florian Erhard 4,† , Lars Kaderali 6,† , Max von Kleist 7,†, Roland Marquet 8 , Jelle Matthijnssens 9,† , Shawna McCallin 10 , Dino McMahon 11,† , Thomas Rattei 12 , Ronald P. Van Rij 13,† , David L. Robertson 14,† , Martin Schwemmle 15,† , Noam Stern-Ginossar 16 and Manja Marz 1,17,*,† 1 RNA Bioinformatics and High-Throughput Analysis, Friedrich Schiller University, 07743 Jena, Germany; [email protected] 2 Department of Biosystems Science and Engineering, ETH Zurich, Mattenstrasse 26, 4058 Basel, Switzerland; [email protected] 3 Institute of Virology, Helmholtz Centre Munich and Technical University Munich, Ingolstädter Landstraße 1, 85764 Neuherberg, Germany; [email protected] 4 Institut für Virologie und Immunbiologie, Julius-Maximilians-Universität Würzburg, 97078 Würzburg, Germany; [email protected] (L.D.); fl[email protected] (F.E.) 5 Theoretical Biology and Bioinformatics, Science for Life, Utrecht University, Hugo R. Kruytgebouw, Padualaan 8, 3584 CH Utrecht, The Netherlands; [email protected] 6 Institute of Bioinformatics, University Medicine Greifswald, 17475 Greifswald, Germany; [email protected] 7 MF1 Bioinformatics, Robert Koch-Institute, 13353 Berlin, Germany; [email protected] 8 CNRS, Architecture et Réactivité de l’ARN, Université de
    [Show full text]
  • Qcolors: an Algorithm for Conservative Viral Quasispecies Reconstruction from Short and Non-Contiguous Next Generation Sequencing Reads
    2011 IEEE International Conference on Bioinformatics and Biomedicine Workshops QColors: An Algorithm for Conservative Viral Quasispecies Reconstruction from Short and Non-Contiguous Next Generation Sequencing Reads Austin Huang∗, Rami Kantor∗, Allison DeLong†, Leeann Schreier∗, and Sorin Istrail‡ ∗ Division of Infectious Disease, † Center for Statistical Sciences, ‡ Department of Computer Science, Brown University Providence, RI, USA, Email: [email protected] Abstract—Next generation sequencing technologies have been viral population that consists of genetically distinct subpopu- successfully applied to HIV-infected patients in order to obtain lations [12]–[14]. Thus the viral population is often described the mutational spectrum of heterogeneous viral populations as a quasispecies [15]. Viral populations that circulate at low within individuals, known as quasispecies. However, the metage- nomics problem of quasispecies sequence reconstruction from levels, termed minor variants, are clinically relevent because next generation sequencing reads is not-yet widely applied in drug-resistant subpopulations may be selected upon treatment current practice and remains an emerging area of research. [16]–[18]. These minor variants are often undetectable using Furthermore, the majority of research methodology in HIV standard sequencing protocols. Specialized techniques such has focused on 454 sequencing, while many next-generation as clonal and single genome sequencing (SGS) have been sequencing platforms are limited to shorter read lengths relative to 454 sequencing. Little work has been done in determining how developed to obtain sequences of minor variant subpopulations best to address the read length limitations of other platforms. within patients [19]–[22]. SGS, in particular, was designed to The approach described here incorporates graph representa- address and minimize sequencing artifacts such as recombi- tions of both read differences and read overlap to conservatively nation of qusispecies sequences during PCR [19].
    [Show full text]
  • The Impact of Quasispecies Dynamics on the Use of Therapeutics
    Review The impact of quasispecies dynamics on the use of therapeutics 1,2 3 3 1,2,3 Celia Perales , Jaime Iranzo , Susanna C. Manrubia , and Esteban Domingo 1 Centro de Biologı´a Molecular ‘‘Severo Ochoa’’ (CSIC-UAM), Campus de Cantoblanco 28049, Madrid, Spain 2 Centro de Investigacio´ n Biome´ dica en Red de Enfermedades Hepa´ ticas y Digestivas (CIBERehd), 08036 Barcelona, Spain 3 Centro de Astrobiologı´a (INTA-CSIC), Instituto Nacional de Te´ cnica Aeroespacial, Carretera de Ajalvir km 4, 28850 Torrejo´ n de Ardoz, Madrid, Spain The application of quasispecies theory to viral popula- Glossary tions has boosted our understanding of how endoge- Combination therapy: treatment that consists of the administration of two or nous and exogenous features condition their adaptation. more drugs. Mounting empirical evidence demonstrates that internal Complementation: increase of viral progeny production mediated by gene interactions within mutant spectra may cause unexpect- products supplied by another virus (in quasispecies, supplied by closely related variants). ed responses to antiviral treatments. In this scenario, Complexity of a mutant spectrum: number of mutations and genomic increased mutagenesis could be efficient at low muta- sequences in a viral population. It is often quantified by pairwise genetic gen doses due to the lethal action of defective genomes, distances, mutation frequency (calculated by dividing the number of different mutations by the total number of nucleotides sequenced), and Shannon whereas sequential administration of antiviral drugs entropy (proportion of different genomes in the population). New technologies might be superior to combination therapies. Our ability should allow a quantitative characterization of quasispecies complexity in to predict the outcome of a particular therapy takes terms of phenotypic diversity.
    [Show full text]
  • Viral Evolution and the Emergence of SARS Coronavirus
    FirstCite Published online e-publishing Viral evolution and the emergence of SARS coronavirus Edward C. Holmes* and Andrew Rambaut Department of Zoology, University of Oxford, South Parks Road, Oxford OX1 3PS, UK The recent appearance of severe acute respiratory syndrome coronavirus (SARS-CoV) highlights the con- tinual threat to human health posed by emerging viruses. However, the central processes in the evolution of emerging viruses are unclear, particularly the selection pressures faced by viruses in new host species. We outline some of the key evolutionary genetic aspects of viral emergence. We emphasize that, although the high mutation rates of RNA viruses provide them with great adaptability and explain why they are the main cause of emerging diseases, their limited genome size means that they are also subject to major evolutionary constraints. Understanding the mechanistic basis of these constraints, particularly the roles played by epistasis and pleiotropy, is likely to be central in explaining why some RNA viruses are more able than others to cross species boundaries. Viral genetic factors have also been implicated in the emergence of SARS-CoV, with the suggestion that this virus is a recombinant between mammalian and avian corona- viruses. We show, however, that the phylogenetic patterns cited as evidence for recombination are more probably caused by a variation in substitution rate among lineages and that recombination is unlikely to explain the appearance of SARS in humans. Keywords: emerging viruses; SARS coronavirus; evolution; phylogeny; recombination 1. INTRODUCTION relatives in Himalayan palm civets (Guan et al. 2003), although it is not yet established that these are the source Since the first descriptions of AIDS in the early 1980s, population for the human form of the virus.
    [Show full text]
  • SARS-Cov-2 Quasispecies Provides Insight Into Its Genetic Dynamics During Infection
    bioRxiv preprint doi: https://doi.org/10.1101/2020.08.20.258376; this version posted August 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license. SARS-CoV-2 Quasispecies provides insight into its genetic dynamics during infection Fengming Sun1,4,#, Xiuhua Wang1,4,#, Shun Tan2,#, Yunjie Dan1,4, Yanqiu Lu2, Juan Zhang1,4, Junli Xu3, Zhaoxia Tan1, Xiaomei Xiang1, Yi Zhou1, Weiwei He1, Xing Wan1, Wei Zhang1, Yaokai Chen2,*, Wenting Tan1,4,*, Guohong Deng1,4,* 1 Department of Infectious Diseases, Southwest Hospital, Third Military Medical University (Army Medial University), Chongqing 400038, China; 2 Division of Infectious Diseases, Chongqing Public Health Medical Center, Chongqing 400036, China; 3 Beijing Novogene Company Limited, Beijing 100015, China; 4 Chongqing Key Laboratory for Research of Infectious Diseases, Chongqing 400038, China. # Authors contributed equally to this article *Corresponding Author: (1) Guohong Deng, PhD, Army Medial University, Southwest Hospital, 29 Gaotanyan Street, Shapingba District, Chongqing 400038, China, Email:[email protected]; (2) Wenting Tan, PhD, Army Medial University, Southwest Hospital, 29 Gaotanyan Street, Shapingba District, Chongqing 400038, China. Email: [email protected]; (3) Yaokai Chen, PhD, Chongqing Public Health Medical Center, 109 Baoyu Road, Shapingba District, Chongqing 400036, China, Email: [email protected]. 1 / 28 bioRxiv preprint doi: https://doi.org/10.1101/2020.08.20.258376; this version posted August 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
    [Show full text]
  • QUASISPECIES and VIRUS EVOLUTION Esteban
    Gene Therapy (2001) 8, (Suppl 1), S1–S13 2001 Nature Publishing Group All rights reserved 0969-7128/01 $15.00 www.nature.com/gt QUASISPECIES AND VIRUS EVOLUTION PANDEMIC SPREAD: INFLUENZA Robert G. Webster1•2, Yi Guan2, Scott Krauss1, Kennedy Esteban Domingo Shortridge2, Malik Peiris2 Centro de Biologfa Molecular "Severo Ochoa" 1) Dept. Virol Mo! Biol, St. Jude C R Hosp., 332 N. (CSIC-UAM). Cantoblanco, 28049 Madrid, Spain Lauderdale, Memphis, Tennessee, and 2) Dept Microbiol, Hong Kong Univ., SAR China High mutation rates and quasispecies dynamics are a Influenza virus continues to evolve, and new antigenic drift hallmark of RNA viruses. Recent studies with foot-and­ variants emerge constantly, giving rise to yearly epidemics. In mouth disease virus (FMDV) have revealed rapid addition, strains to which most humans have no immunity coevolution of antigenicity and host cell tropism of virus appear suddenly, and the resulting pandemics vary from populations in cell culture and in vivo. Remarkable serious to catastrophic. Studies in aquatic birds over 25 years expansions of host cell tropism occurred in a clonal have established that they play an important role in the natural population of FMDV as a result of prolonged cytolytic history of influenza viruses. In the past four years there have replication in BHK-21 cells. Reversion of mutants with been two different transmissions of avian influenza viruses to humans in Hong Kong. The first genetic lesions associated with decreases in viral fitness, occurred in 1997, when 6 of 18 infected humans died of H5Nl influenza virus. In March and analysis of the mutant spectra of revertant 1999, H9N2 influenza viruses were isolated from two children populations, have indicated the presence of memory in Hong Kong and unconfirmed reports of an additional five genomes in viral quasispecies.
    [Show full text]
  • A Framework for Inferring Fitness Landscapes of Patient-Derived Viruses Using Quasispecies Theory
    Zurich Open Repository and Archive University of Zurich Main Library Strickhofstrasse 39 CH-8057 Zurich www.zora.uzh.ch Year: 2015 A framework for inferring fitness landscapes of patient-derived viruses using quasispecies theory Seifert, David ; Di Giallonardo, Francesca ; Metzner, Karin J ; Günthard, Huldrych F ; Beerenwinkel, Niko Abstract: Fitness is a central quantity in evolutionary models of viruses. However, it remains difficult to determine viral fitness experimentally, and existing in vitro assays can be poor predictors of in vivo fitness of viral populations within their hosts. Next-generation sequencing can nowadays provide snapshots of evolving virus populations, and these data offer new opportunities for inferring viral fitness. Using the equilibrium distribution of the quasispecies model, an established model of intrahost viral evolution, we linked fitness parameters to the composition of the virus population, which can be estimated bynext- generation sequencing. For inference, we developed a Bayesian Markov chain Monte Carlo method to sample from the posterior distribution of fitness values. The sampler can overcome situations where no maximum-likelihood estimator exists, and it can adaptively learn the posterior distribution of highly correlated fitness landscapes without prior knowledge of their shape. We tested our approach on simulated data and applied it to clinical human immunodeficiency virus 1 samples to estimate their fitness landscapes in vivo. The posterior fitness distributions allowed for differentiating viral haplotypes from each other, for determining neutral haplotype networks, in which no haplotype is more or less credibly fit than any other, and for detecting epistasis in fitness landscapes. Our implemented approach, called QuasiFit, is available at http://www.cbg.ethz.ch/software/quasifit.
    [Show full text]