A Framework for Inferring Fitness Landscapes of Patient-Derived Viruses Using Quasispecies Theory
Total Page:16
File Type:pdf, Size:1020Kb
Zurich Open Repository and Archive University of Zurich Main Library Strickhofstrasse 39 CH-8057 Zurich www.zora.uzh.ch Year: 2015 A framework for inferring fitness landscapes of patient-derived viruses using quasispecies theory Seifert, David ; Di Giallonardo, Francesca ; Metzner, Karin J ; Günthard, Huldrych F ; Beerenwinkel, Niko Abstract: Fitness is a central quantity in evolutionary models of viruses. However, it remains difficult to determine viral fitness experimentally, and existing in vitro assays can be poor predictors of in vivo fitness of viral populations within their hosts. Next-generation sequencing can nowadays provide snapshots of evolving virus populations, and these data offer new opportunities for inferring viral fitness. Using the equilibrium distribution of the quasispecies model, an established model of intrahost viral evolution, we linked fitness parameters to the composition of the virus population, which can be estimated bynext- generation sequencing. For inference, we developed a Bayesian Markov chain Monte Carlo method to sample from the posterior distribution of fitness values. The sampler can overcome situations where no maximum-likelihood estimator exists, and it can adaptively learn the posterior distribution of highly correlated fitness landscapes without prior knowledge of their shape. We tested our approach on simulated data and applied it to clinical human immunodeficiency virus 1 samples to estimate their fitness landscapes in vivo. The posterior fitness distributions allowed for differentiating viral haplotypes from each other, for determining neutral haplotype networks, in which no haplotype is more or less credibly fit than any other, and for detecting epistasis in fitness landscapes. Our implemented approach, called QuasiFit, is available at http://www.cbg.ethz.ch/software/quasifit. DOI: https://doi.org/10.1534/genetics.114.172312 Posted at the Zurich Open Repository and Archive, University of Zurich ZORA URL: https://doi.org/10.5167/uzh-105801 Journal Article Published Version Originally published at: Seifert, David; Di Giallonardo, Francesca; Metzner, Karin J; Günthard, Huldrych F; Beerenwinkel, Niko (2015). A framework for inferring fitness landscapes of patient-derived viruses using quasispecies theory. Genetics, 199(1):191-203. DOI: https://doi.org/10.1534/genetics.114.172312 INVESTIGATION A Framework for Inferring Fitness Landscapes of Patient-Derived Viruses Using Quasispecies Theory David Seifert,*,† Francesca Di Giallonardo,‡ Karin J. Metzner,‡ Huldrych F. Günthard,‡ and Niko Beerenwinkel*,†,1 *Department of Biosystems Science and Engineering, ETH Zurich, Basel 4058, Switzerland, †Swiss Institute of Bioinformatics, Basel 4058, Switzerland, and ‡Division of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich, University of Zurich, Zurich 8091, Switzerland ABSTRACT Fitness is a central quantity in evolutionary models of viruses. However, it remains difficult to determine viral fitness experimentally, and existing in vitro assays can be poor predictors of in vivo fitness of viral populations within their hosts. Next- generation sequencing can nowadays provide snapshots of evolving virus populations, and these data offer new opportunities for inferring viral fitness. Using the equilibrium distribution of the quasispecies model, an established model of intrahost viral evolution, we linked fitness parameters to the composition of the virus population, which can be estimated by next-generation sequencing. For inference, we developed a Bayesian Markov chain Monte Carlo method to sample from the posterior distribution of fitness values. The sampler can overcome situations where no maximum-likelihood estimator exists, and it can adaptively learn the posterior distribution of highly correlated fitness landscapes without prior knowledge of their shape. We tested our approach on simulated data and applied it to clinical human immunodeficiency virus 1 samples to estimate their fitness landscapes in vivo. The posterior fitness distributions allowed for differentiating viral haplotypes from each other, for determining neutral haplotype networks, in which no haplotype is more or less credibly fit than any other, and for detecting epistasis in fitness landscapes. Our implemented approach, called QuasiFit,is available at http://www.cbg.ethz.ch/software/quasifit. ITNESS is a central quantity in evolutionary biology. It infection assays (Quiñones-Mateu and Arts 2002). A draw- Fcan be regarded as a measure of reproductive capacity of back to all in vitro measurements of viral fitness is the each individual. In evolving populations, individuals with removal of viruses from the environment to which they have higher fitness can outcompete those with lower fitness. Fit- adapted. Such estimates disregard effects on the fitness ness depends on the genetic composition of the individual’s landscape deriving from the natural in vivo environment. haplotype, i.e., the allelic constellation of multiple loci of its RNA viruses, such as human immunodeficiency virus genome, and on the environment, i.e., the host conditions (HIV), have very high mutation rates (Rezende and Prasad for viral reproduction. Determining the fitness of viruses in 2004), and the number of haplotypes that arise in the nor- a population is experimentally difficult and laborious, be- mal course of intrahost evolution can be extremely large cause individual virus particles need to be isolated and an- (Steinhauer and Holland 1987). Thus, HIV populations change alyzed separately. In vitro determination of viral fitness and explore sequence space on timescales that are much usually involves enzymatic, growth competition, or mono- shorter than those of higher eukaryotes. As such, RNA viruses lend themselves to being studied as model systems of evolu- tionary theory. In clinical settings, viral fitness is also of great Copyright © 2015 by the Genetics Society of America doi: 10.1534/genetics.114.172312 interest. Disease progression, the formation of escape Manuscript received July 31, 2014; accepted for publication November 10, 2014; mutants, and ultimately treatment failure depend on viral published Early Online November 17, 2014. fitness (Clavel and Hance 2004; Beerenwinkel et al. 2013). Available freely online through the author-supported open access option. Supporting information is available online at http://www.genetics.org/lookup/suppl/ The frequencies of viral haplotypes are determined by doi:10.1534/genetics.114.172312/-/DC1. evolutionary parameters, including mutation rate and fit- 1Corresponding author: Department of Biosystems Science and Engineering, Swiss Federal Institute of Technology, Mattenstrasse 26, 4058 Basel, Switzerland. ness. With the advent and exponential decrease in cost of E-mail: [email protected] next-generation sequencing (NGS) data (Niedringhaus et al. Genetics, Vol. 199, 191–203 January 2015 191 Figure 1 Schematic illustration of fitness landscapes and quasispecies. On the left is a fitness landscape on a simple biallelic two-locus genome. Here, 0 indicates a wild- type allele and 1 indicates a mu- tant allele and the vertical axis indicates fitness f. The fitness landscape also includes epistasis, as the fitness of the double mu- tant is not additive in the main fit- ness effects of the two mutant alleles. For this fitness landscape, at equilibrium, the quasispecies equation yields the quasispecies, i.e., the mutation–selection equi- librium (right-hand side). The vertical bars indicate the relative frequency p of each haplotype. In practice, these frequencies are not known but can be estimated from next-generation sequencing data. The stacked horizontal lines represent reads from a sequencing experiment and amount to a finite sample of the quasispecies. Solid circles and crosses indicate mutant alleles at the two loci. Due to the sampling variance inherent in the finite sample, the number of reads will not match the frequencies exactly. Note that fitness values for the haplotypes need not show a strong linear correlation with the haplotype frequencies, as mutational coupling can obscure this relationship. Given a finite sample of the population, we aim to infer properties of the fitness landscape (right-to-left arrow). 2011), the technical prerequisites for affordable in-depth and Schuster 1977; Burch and Chao 2000; Vignuzzi et al. 2005; personalized diagnostics are within reach. Acevedo et al. Metzner et al. 2009; Domingo et al. 2012), which is mathemat- (2014) have shown that inference of marginal fitness effects ically tractable. One of the predictions of quasispecies theory of single nucleotide variants in viruses with NGS is possible that has stood the test of time is the existence of an error already today. High-quality NGS data for intrahost viral pop- threshold in viral replication. If the mutation rate of a virus ulations will become ubiquitous in the near future, making lies above this critical threshold, then mutation will cause in vivo fitness analysis on the basis of such data possible. genetic information to be lost. Anderson et al. (2004) have A fitness landscape is the association of a real, non- shown this phenomenon to exist in practice with the use of negative fitness value to each haplotype. Fitness landscapes mutagenic nucleosides. can be perfectly correlated, “Mount Fuji”-like with strong In this article, we establish a computational framework correlation between the fitnesses of closely related haplo- for estimating fitness from NGS data based on quasispecies types or, on the other end of