
Downloaded by guest on September 28, 2021
www.pnas.org/cgi/doi/10.1073/pnas.1806133115

Deep mutational scanning of hemagglutinin helps predict evolutionary fates of human H3N2 influenza variants

Juhye M. Lee (a,b,c,1), John Huddleston (d,e,1), Michael B. Doud (a,b,c), Kathryn A. Hooper (a,e), Nicholas C. Wu (f), Trevor Bedford (d,g,2), and Jesse D. Bloom (a,b,g,2)

(a) Basic Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109; (b) Department of Genome Sciences, University of Washington, Seattle, WA 98195; (c) Medical Scientist Training Program, University of Washington, Seattle, WA 98195; (d) Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109; (e) Molecular and Cellular Biology Program, University of Washington, Seattle, WA 98195; (f) Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037; (g) Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA 98109

Edited by Peter Palese, Icahn School of Medicine at Mount Sinai, New York, NY, and approved July 18, 2018 (received for review April 9, 2018)

Human influenza virus rapidly accumulates mutations in its major surface protein hemagglutinin (HA). The evolutionary success of virus lineages depends on how these mutations affect HA's functionality and antigenicity. Here we experimentally measure the effects on viral growth in cell culture of all single amino acid mutations to the HA from a recent human H3N2 influenza virus strain. We show that mutations that are measured to be more favorable for viral growth are enriched in evolutionarily successful H3N2 viral lineages relative to mutations that are measured to be less favorable for viral growth. Therefore, despite the well-known caveats about cell-culture measurements of viral fitness, such measurements can still be informative for understanding evolution in nature. We also compare our measurements for H3 HA to similar data previously generated for a distantly related H1 HA and find substantial differences in which amino acids are preferred at many sites. For instance, the H3 HA has less disparity in mutational tolerance between the head and stalk domains than the H1 HA. Overall, our work suggests that experimental measurements of mutational effects can be leveraged to help understand the evolutionary fates of viral lineages in nature, but only when the measurements are made on a viral strain similar to the ones being studied in nature.

hemagglutinin | deep mutational scanning | epistasis

Significance

A key goal in the study of influenza virus evolution is to forecast which viral strains will persist and which ones will die out. Here we experimentally measure the effects of all amino acid mutations to the hemagglutinin protein from a human H3N2 influenza strain on viral growth in cell culture. We show that these measurements have utility for distinguishing among viral strains that do and do not succeed in nature. Overall, our work suggests that new high-throughput experimental approaches may be useful for understanding virus evolution in nature.

Seasonal H3N2 influenza virus evolves rapidly, fixing 3 to 4 amino acid mutations per year in its hemagglutinin (HA) surface protein (1, 2). Many of these mutations contribute to the rapid antigenic drift that necessitates frequent updates to the annual influenza vaccine (3). This evolution is further characterized by competition and turnover among groups of strains bearing different mutations, called clades (4–8). Clades vary widely in their evolutionary success, with some going on to take over the virus population and others dying out soon after their emergence (4–8). Several lines of evidence indicate that successful clades have higher fitness than clades that remain at low frequency (4–6, 9). A key goal in the study of H3N2 evolution is to identify features that enable certain clades to succeed as others die out.

Two main characteristics distinguish evolutionarily successful clades from their competitors: antigenic change, and efficient viral growth and transmission. In principle, experiments could be informative for identifying these features. Most experimental work to date has used data on circulating influenza strains to assess how mutations affect antigenicity (11–16). However, the nonantigenic effects of mutations also play an important role in influenza's evolution (5, 7, 9, 17). Specifically, due to influenza's high mutation rate (18–20) and lack of intrasegment recombination (21), deleterious mutations can become linked to beneficial ones. The resulting accumulation of deleterious mutations can affect viral fitness (9). However, there are no large-scale quantitative characterizations of how mutations to H3N2 HA affect properties central to viral growth.

It is now possible to use deep mutational scanning (22) to measure the functional effects of all single amino acid mutations to a protein (10, 23–27). However, such large-scale measurements have previously been made only for the HA from the highly laboratory-adapted A/WSN/1933 (H1N1) strain (10, 23, 24). Here, we measure the effects on viral growth in cell culture of all mutations to the HA from a recent human H3N2 strain. We show that these experimental measurements can help discriminate evolutionarily successful mutations from those that quickly die out. However, the utility of the experiments for understanding natural evolution depends on the similarity between the experimental and natural strains: Measurements made on an H1 HA are less informative for understanding the evolutionary fate of H3 viral strains.

Author contributions: J.M.L., J.H., T.B., and J.D.B. designed research; J.M.L., M.B.D., K.A.H., and J.H. performed research; J.M.L. and N.C.W. contributed new reagents/analytic tools; J.M.L., J.H., T.B., and J.D.B. analyzed data; and J.M.L., J.H., T.B., and J.D.B. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

This open access article is distributed under Creative Commons Attribution-NonCommercial-NoDerivatives License 4.0 (CC BY-NC-ND).

Data deposition: Deep sequencing data are available from the Sequence Read Archive under BioSample accession nos. SAMN08102609 and SAMN08102610. Computer code used to analyze the data and produce the results in the paper is available on GitHub at https://github.com/jbloomlab/Perth2009-DMS-Manuscript.

(1) J.M.L. and J.H. contributed equally to this work.
(2) To whom correspondence may be addressed. Email: jbloom@fredhutch.org or [email protected].

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1806133115/-/DCSupplemental.

PNAS Latest Articles | 1 of 10

EVOLUTION

Results

Deep Mutational Scanning of HA from a Recent Strain of Human H3N2 Influenza Virus. We performed a deep mutational scan to measure the effects of all amino acid mutations to HA from the A/Perth/16/2009 (H3N2) strain on viral growth in cell culture. This strain was the H3N2 component of the influenza vaccine from 2010 to 2012 (28, 29). Relative to the consensus sequence for this HA in GenBank, we used a variant with two mutations that enhanced viral growth in cell culture, G78D and T212I (see SI Appendix, Fig. S1 and Dataset S1). The G78D mutation occurs at low frequency in natural H3N2 sequences, and T212 is a site where a mutation to Ala rose to fixation in human influenza in ∼2011.

We mutagenized the entire HA coding sequence at the codon level to create mutant plasmid libraries harboring an average of ∼1.4 codon mutations per clone (see SI Appendix, Fig. S2). We then generated mutant virus libraries from the mutant plasmids using a helper-virus system that enables efficient generation of complex influenza virus libraries (10) (Fig. 1A). These mutant viruses derived all their non-HA genes from the laboratory-adapted A/WSN/1933 strain. Using WSN/1933 for the non-HA genes reduces biosafety concerns and also helped increase viral titers. To further increase viral titers, we used MDCK-SIAT1 cells (Madin–Darby canine kidney cells overexpressing α2,6-sialyltransferase) (30) that we engineered to constitutively express TMPRSS2 (Transmembrane Protease, Serine 2), which cleaves the HA precursor to activate it for membrane fusion (31, 32).

After generating the mutant virus libraries, we passaged them at low multiplicity of infection (MOI) in cell culture to create a genotype–phenotype link and select for functional HA variants (Fig. 1A). All experiments were completed in full biological triplicate (Fig. 1B). We also passaged and deep sequenced library 3 in duplicate (libraries 3-1 and 3-2) to gauge experimental noise within a single biological replicate. As a control to measure sequencing and mutational errors, we used the unmutated HA to generate and passage viruses carrying wild-type HA.

Deep sequencing of the initial plasmid mutant libraries and the passaged mutant viruses revealed selection for functional HA mutants. Specifically, stop codons were purged to 20% to 45% of their initial frequencies after correcting for error rates estimated by sequencing the wild-type controls (Fig. 1C). The incomplete purging of stop codons is likely because genetic complementation due to co-infection (33, 34) enabled the persistence of some virions with nonfunctional HAs. We also observed selection against many nonsynonymous mutations (Fig. 1C), with their frequencies falling to 30% to 40% of their initial values after error correction.

We next quantified the reproducibility of our deep mutational scanning across biological and technical replicates. We first used the deep sequencing data for each replicate to estimate the preference of each site in HA for all 20 amino acids as described in ref. 39. Because there are 566 residues in HA, there are 566 × 19 = 10,754 distinct measurements [the 20 preferences at each site sum to 1 (39)]. The correlations of the amino acid preferences between pairs of replicates are shown in Fig. 1D. The biological replicates were well correlated, with Pearson's R ranging from 0.69 to 0.78. Replicate 1 exhibited the weakest correlation with other replicates; this replicate also showed the weakest selection against stop and nonsynonymous mutations (Fig. 1C), perhaps indicating more experimental noise. The two technical replicates, 3-1 and 3-2, were only slightly more correlated than pairs of biological replicates, suggesting that bottlenecking of library diversity during viral passage contributes most of the experimental noise.

Our Measurements Are Consistent with Existing Knowledge About HA's Evolution and Function. How do the HA amino acid preferences measured in our experiments relate to the evolution of H3N2 influenza virus in nature? This question can be addressed by evaluating how well an experimentally informed codon substitution model (ExpCM) using our measurements describes H3N2 evolution compared with standard substitution models (35, 40). Table 1 shows that an ExpCM using the across-replicate average of our measurements greatly outperforms conventional substitution models. This result indicates that our experiments authentically capture some of the constraints on HA evolution. A substitution model in which the amino acid preferences were averaged across all sites (ExpCM, site average) performs no better than conventional substitution models, demonstrating that our measurements are informative because they capture site-specific constraints.
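The per-site preferences described above can be illustrated with a short sketch. It turns pre- and post-selection amino acid counts at one site into 20 preferences that sum to 1, which is why each site contributes only 19 free values to the 566 × 19 count. This is a simplified ratio estimator with a pseudocount, written under our own assumptions. It is not the estimator of ref. 39 and omits error correction. The function name, pseudocount, and example counts are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def site_preferences(pre_counts, post_counts, pseudocount=1.0):
    """Estimate amino acid preferences at one site (illustrative ratio estimator).

    pre_counts / post_counts map one-letter amino acid codes to read counts at
    this site before and after selection; missing amino acids count as 0.
    Returns a dict with one preference per amino acid, summing to 1.
    """
    n_pre = sum(pre_counts.get(aa, 0) for aa in AMINO_ACIDS)
    n_post = sum(post_counts.get(aa, 0) for aa in AMINO_ACIDS)
    denom_pre = n_pre + 20 * pseudocount
    denom_post = n_post + 20 * pseudocount
    enrichment = {}
    for aa in AMINO_ACIDS:
        # The pseudocount keeps unobserved amino acids from giving 0/0 or x/0.
        f_pre = (pre_counts.get(aa, 0) + pseudocount) / denom_pre
        f_post = (post_counts.get(aa, 0) + pseudocount) / denom_post
        enrichment[aa] = f_post / f_pre
    total = sum(enrichment.values())
    return {aa: e / total for aa, e in enrichment.items()}


# Hypothetical counts: Cys is enriched by selection, Ala and Ser are depleted.
prefs = site_preferences({"C": 100, "A": 100, "S": 100}, {"C": 290, "A": 5, "S": 5})
print(max(prefs, key=prefs.get), round(sum(prefs.values()), 12))
```

Because the 20 values at a site are normalized to sum to 1, only 19 of them can vary independently.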


Fig. 1. Deep mutational scanning of the Perth/2009 H3 HA. (A) We generated mutant virus libraries using a helper-virus approach (10) and passaged the libraries at low MOI to establish a genotype–phenotype linkage and to select for functional HA variants. Deep sequencing of the variants before and after selection allowed us to estimate each site’s amino acid preferences. (B) The experiments were performed in full biological triplicate. We also passaged and deep sequenced library 3 in duplicate. (C) Frequencies of nonsynonymous, stop, and synonymous mutations in the mutant plasmid DNA, the passaged mutant viruses, and wild-type DNA and virus controls. (D) The Pearson correlations among the amino acid preferences estimated in each replicate.
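Panel C compares mutation frequencies in the mutant libraries with those in the wild-type controls. The "purged to 20% to 45% of initial frequency" figures in the text rely on an error correction of this kind. The sketch below shows one simple way to do it. It assumes, as a simplification of our own, that the wild-type control frequency is subtracted from the mutant-library frequency before and after passage. The function name and example frequencies are hypothetical.

```python
def corrected_fraction_retained(mut_pre, mut_post, wt_pre, wt_post):
    """Fraction of a mutation class left after selection, net of background errors.

    mut_pre / mut_post: per-codon frequency of the class (e.g. stop codons) in
    the mutant plasmid library and in the passaged mutant viruses.
    wt_pre / wt_post: frequency of the same class in the wild-type plasmid and
    wild-type virus controls, taken as the sequencing/mutational error rate.
    """
    signal_pre = mut_pre - wt_pre
    if signal_pre <= 0:
        raise ValueError("no mutational signal above background in the plasmid library")
    # Clip at zero: background can exceed the true signal after strong selection.
    signal_post = max(mut_post - wt_post, 0.0)
    return signal_post / signal_pre


# Hypothetical frequencies, for illustration only.
retained = corrected_fraction_retained(mut_pre=2.0e-4, mut_post=8.0e-5,
                                       wt_pre=1.0e-5, wt_post=2.0e-5)
print(f"stop codons retained: {retained:.0%}")
```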

The relative rate of nonsynonymous to synonymous substitutions (dN/dS or ω) is well below 1 for the conventional substitution models (Table 1). However, after accounting for the amino acid preferences measured in our experiments, ω for the ExpCM is close to 1 (Table 1). This indicates that most purifying selection against nonsynonymous substitutions is accounted for by the amino acid preferences measured in the deep mutational scanning. The ExpCM stringency parameter (35) is 2.47 (Table 1), indicating that natural selection favors the same amino acids as the experiments but with greater stringency. Throughout the rest of this paper, we use the amino acid preferences rescaled by this stringency parameter (35, 40). These rescaled preferences are shown in Fig. 2.

Examination of Fig. 2 reveals that the experimentally measured amino acid preferences generally agree with existing knowledge about HA's structure and function. For instance, sites that form structurally important disulfide bridges (sites 52 and 277, 64 and 76, 97 and 139, 281 and 305, 14 and 137-HA2, and 144-HA2 and 148-HA2) (43) strongly prefer cysteine. At sites involved in receptor binding, there are strong preferences for amino acids that are known to be involved in binding sialic acid, such as Y98, D190, W153, and S228 (44–47). A positively charged amino acid at site 329 is important for cleavage of the HA0 precursor into the mature form (48), and this site strongly prefers arginine.

[Fig. 2 image: logo plots of the preferences at every site (H3 numbering; HA1 sites, then HA2 sites labeled "(HA2)"), with overlay bars for epitope site? (epi: epitope, non-epitope) and domain (DOM: sig. pep., HA1 ecto., HA2 ecto., TM, cyto. tail).]

Fig. 2. The site-specific amino acid preferences of the Perth/2009 HA measured in our experiments. The height of each letter is the preference for that amino acid, after taking the average over experimental replicates and rescaling by the stringency parameter (35) in Table 1. The top overlay bar indicates whether or not a site is in the set of epitope residues delineated in ref. 38. The bottom overlay bar indicates the HA domain (sig. pep., signal peptide; HA1 ecto., HA1 ectodomain; HA2 ecto., HA2 ectodomain; TM, transmembrane domain; cyto. tail, cytoplasmic tail). The letters directly above each logo stack indicate the wild-type amino acid at that site. Sites are in H3 numbering.

Table 1. Substitution models informed by the experiments describe HA's evolution better than traditional models

Model                  ΔAIC     LnL      Stringency   ω
ExpCM                  0.0      −8,441   2.47         0.91
GY94, M5               2,094    −9,482   n/a          0.36 (0.30, 0.84)
ExpCM, site average    …        −9,692   0.67         0.32
GY94, M0               2,536    −9,704   n/a          0.31

Shown are the maximum likelihood phylogenetic fits to an alignment of 2,501 human H3N2 HAs using ExpCM (35), ExpCM in which the experimental measurements are averaged across sites (site average), and the M0 and M5 versions of the Goldman–Yang (GY94) model (36). Models are compared by the Akaike information criterion (AIC) (37), computed from the log likelihood (LnL) and number of model parameters. The ω parameter is dN/dS for the Goldman–Yang models and the relative dN/dS after accounting for the measurements for the ExpCM. For the M5 model, we give the mean of the gamma distribution over ω followed by the shape and rate parameters of this distribution.
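Two quantities from Table 1 and the surrounding text can be checked with a few lines of arithmetic.

- The first part of the sketch recomputes ΔAIC = 2Δk − 2ΔlnL for two rows. The parameter counts k are our assumption, since Table 1 does not list them. With these counts, the rounded LnL values reproduce the tabulated ΔAIC of 2,094 and 2,536.
- The second part rescales a site's preferences by a stringency parameter β by raising each preference to the power β and renormalizing. This is our understanding of the ExpCM rescaling in refs. 35 and 40, not code from those papers.

The toy three-amino-acid site is hypothetical.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion, AIC = 2k - 2 lnL (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


# LnL values from Table 1. Parameter counts are ASSUMED, not taken from the paper:
#   ExpCM:   kappa, omega, beta + 3 nucleotide frequencies       = 6
#   GY94 M0: kappa, omega + 9 codon-position nucleotide freqs.   = 11
#   GY94 M5: as M0, but omega replaced by gamma shape and rate   = 12
MODELS = {"ExpCM": (-8441, 6), "GY94, M5": (-9482, 12), "GY94, M0": (-9704, 11)}
scores = {name: aic(lnl, k) for name, (lnl, k) in MODELS.items()}
best = min(scores.values())
delta_aic = {name: score - best for name, score in scores.items()}


def rescale_preferences(prefs, beta):
    """Raise each preference to the power beta and renormalize to sum to 1.

    beta > 1 sharpens the distribution toward the preferred amino acids,
    beta < 1 flattens it, and beta = 1 leaves it unchanged.
    """
    weighted = {aa: p ** beta for aa, p in prefs.items()}
    total = sum(weighted.values())
    return {aa: w / total for aa, w in weighted.items()}


toy_site = {"R": 0.5, "K": 0.3, "H": 0.2}  # hypothetical preferences
sharpened = rescale_preferences(toy_site, beta=2.47)
print(delta_aic)
print({aa: round(p, 3) for aa, p in sharpened.items()})
```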

However, there are also some differences between the amino acid preferences measured in our experiments and amino acid frequencies in natural H3 HA sequences (see SI Appendix, Fig. S3). Most surprisingly, the start codon does not show a particularly strong preference for methionine (Fig. 2). We validated that a virus carrying a mutation at this site from methionine to lysine does in fact reach appreciable titers (see SI Appendix, Fig. S4). This may be because of alternative translation initiation at a downstream or upstream start site, as has been described for other HAs (49). Our measurements also suggest mutational tolerance at some other sites that are relatively conserved among natural HAs, such as the N-linked glycosylation motifs near the beginning of HA1 and the transmembrane domain (Fig. 2). We validated that viruses with mutations to the glycosylation motifs at sites 22 or 38, or to a site in the transmembrane domain, do in fact grow to high titers (SI Appendix, Figs. S5 and S6, respectively). These sites are relatively conserved in nature yet mutationally tolerant in our study. The disparity could arise because cell culture does not fully capture the constraints on HA function in nature. Alternatively, these sites may not be under strong immune pressure, so mutations at them are not positively selected in nature.

There Is Less Difference in Mutational Tolerance Between the Head and Stalk Domains for H3 than for H1. Our experiments measure which amino acids are tolerated at each HA site under selection for viral growth. We can therefore use our experimentally measured amino acid preferences to calculate the inherent mutational tolerance of each site, which we quantify as the Shannon entropy of the rescaled preferences. In prior mutational studies of H1 HAs, the stalk domain was found to be substantially less mutationally tolerant than the globular head (10, 23, 24, 50).

We performed a similar analysis using our new data for the Perth/2009 H3 HA. Surprisingly, the head domain of the H3 HA is not more mutationally tolerant than the stalk domain (Fig. 3). For the WSN/1933 H1 HA, solvent-exposed residues in the head domain are substantially more mutationally tolerant than those in the stalk domain. For the Perth/2009 H3 HA, the trend is actually reversed (Fig. 3B). This difference between the relative mutational tolerances of the H1 and H3 HAs is robust to the cutoff used to define surface residues (see SI Appendix, Fig. S7). For instance, for the H3 HA, the short helix A in the stalk domain is as mutationally tolerant as many surface-exposed residues in the head domain, which is not the case for the H1 HA. Helix A forms part of the epitope of many broadly neutralizing antistalk antibodies (51–53).

We also see high mutational tolerance in many of the known antigenic regions of H3 HA (54). For instance, antigenic region B is an immunodominant area, and many recent major antigenic drift mutations have occurred in this region (14, 15, 55). We find that the most distal portion of the globular head near the 190-helix, which is part of antigenic region B, is highly tolerant of mutations (Fig. 3A). Antigenic region C is also notably mutationally tolerant.

Many residues inside HA's receptor binding pocket are known to be highly functionally constrained (45, 56). Our data indicate that these sites are relatively mutationally intolerant in both H3 and H1 HAs (Fig. 3A). In contrast, the residues surrounding the receptor binding pocket are fairly mutationally tolerant. Because mutations at these sites can have large effects on HA antigenicity (14, 54), this tolerance may contribute to the rapidity of influenza's antigenic evolution.

Our Measurements Can Help Distinguish Between Mutations That Reach Low and High Frequencies in Nature. Mutations occurring in the H3N2 virus population experience widely varying evolutionary fates (Fig. 4). Some mutations appear, spread, and fix in the population, while others briefly circulate before disappearing. We take the maximum frequency reached by a mutation as a coarse indicator of its effect on fitness, since favorable mutations generally reach higher frequencies than unfavorable ones (57). Here, we follow the population genetic definition of "mutation" and track the outcome of each individual mutation event. For example, R142G occurs multiple times on the phylogeny, and we track each of these mutations occurring on different backgrounds separately. As such, each separate mutation is shown as a separate circle in Fig. 4. Multiple mutations on the same branch cannot be disentangled. When multiple mutations occurred on a single branch, we therefore assigned a single mutational effect to that branch, based on the sum of the effects of each mutation.


Fig. 3. Mutational tolerance of each site in H3 and H1 HAs. (A) Mutational tolerance as measured in the current study is mapped onto the structure of the H3 trimer [Protein Data Bank (PDB) ID code 4O5N (41)]. Mutational tolerance of the WSN/1933 H1 HA as measured in ref. 10 is mapped onto the structure of the H1 trimer [PDB ID code 1RVX (42)]. Different color scales are used because measurements are comparable among sites within the same HA but not necessarily across HAs. Both trimers are shown in the same orientation. For each HA, the structure at Left shows a surface representation of the full trimer, while the structure at Right shows a ribbon representation of just one monomer. The sialic acid receptor is shown in red sticks. (B) The mutational tolerance of solvent-exposed residues in the head and stalk domains of the Perth/2009 H3 HA (purple) and WSN/1933 H1 HA (gold). Residues falling in between the two cysteines at sites 52 and 277 were defined as belonging to the head domain, while all other residues were defined as the stalk domain. A residue was classified as solvent exposed if its relative solvent accessibility was ≥0.2. The results are robust to the choice of solvent accessibility cutoff (see SI Appendix, Fig. S7). Note that the mutational tolerance values are not comparable between the two HAs but are comparable between domains of the same HA.
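Panel B rests on two per-residue rules: a domain assignment and a solvent-exposure cutoff. The sketch below encodes them as stated in the caption: head means between the cysteines at sites 52 and 277, and exposed means relative solvent accessibility ≥ 0.2. It then averages a per-site tolerance within each domain. The following are our assumptions, not the authors' code:

- the (chain, number) representation of sites,
- treating sites 52 and 277 themselves as head,
- treating all HA2 residues as stalk,
- the example residues.

```python
from statistics import mean


def classify_site(chain, number, rsa, rsa_cutoff=0.2):
    """Return (domain, is_exposed) for a residue in H3 numbering.

    chain is "HA1" or "HA2". HA1 residues from site 52 to site 277 are head
    (inclusive endpoints are an assumption); all other residues are stalk.
    """
    in_head = chain == "HA1" and 52 <= number <= 277
    return ("head" if in_head else "stalk"), rsa >= rsa_cutoff


def mean_tolerance_by_domain(sites, rsa_cutoff=0.2):
    """Average tolerance of solvent-exposed residues in each domain.

    sites: iterable of (chain, number, relative_solvent_accessibility, entropy).
    """
    groups = {"head": [], "stalk": []}
    for chain, number, rsa, entropy in sites:
        domain, exposed = classify_site(chain, number, rsa, rsa_cutoff)
        if exposed:
            groups[domain].append(entropy)
    return {domain: mean(values) for domain, values in groups.items() if values}


# Hypothetical residues, for illustration only.
toy_sites = [
    ("HA1", 140, 0.45, 2.1),  # exposed head residue
    ("HA1", 190, 0.60, 2.5),  # exposed head residue
    ("HA1", 30, 0.35, 1.2),   # exposed stalk residue (N-terminal part of HA1)
    ("HA2", 50, 0.10, 0.4),   # buried stalk residue: excluded
    ("HA2", 60, 0.25, 1.8),   # exposed stalk residue
]
result = mean_tolerance_by_domain(toy_sites)
print(result)
```

Because the tolerance values are comparable only within one HA, a comparison like Fig. 3B contrasts the head and stalk averages within each HA rather than across HAs.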

[Fig. 4 image: (Top) H3N2 HA phylogeny, with mutations as circles colored by mutational effect (less than −10; −10 to 0; greater than 0) and A/Perth/16/2009 marked; (Bottom) mutation frequency (0 to 1) versus year, 2004 to 2014.]

Fig. 4. Frequency trajectories of individual mutations and their relation to the experimentally measured effects of these mutations. Top shows the full H3N2 HA tree from 2004 to 2014. Circles indicate individual amino acid mutations and are colored according to the mutational effect measured in our deep mutational scanning (negative values indicate mutations measured to be deleterious to viral growth). The Perth/2009 strain is labeled with a star, and nodes from the clade containing the Perth/2009 strain were excluded from our analyses. Bottom shows the frequency trajectory of each mutation, with trajectories colored according to mutational effects as in Top. It is clear that most mutations that reach high frequency are measured to be relatively favorable in our experiments. SI Appendix, Fig. S11 shows a similar layout but colors mutations by whether they are in HA's head or stalk domain.

After annotating mutations and their frequencies on the phylogeny in this way (Fig. 4), two patterns are visually obvious. Relatively few circulating mutations are ones that we measure to be strongly deleterious, and such mutations rarely reach high frequency when they do occur.

We next sought to quantify the correlation between a mutation's experimentally measured effect and the maximum frequency it attained during natural evolution. To calculate a mutation's effect, we simply took the logarithm of the ratio of the preferences for the mutant and wild-type amino acids at that site. To minimize the effects of genetic background related to the wild-type strain used in the experiment, we excluded mutations closely related to the experimental strain itself. We partitioned the remaining mutations into 1,022 mutations predating and 299 mutations postdating the Perth/2009 strain (see SI Appendix, Fig. S8). We additionally excluded mutations from the post-Perth partition that were sampled in 2014 or after, since these mutations may not have had enough time for their evolutionary fates to be fully resolved. We used these pre-Perth and post-Perth partitions to test the utility of our measurements for post hoc and prospective analyses, respectively. We quantified the relationship between our measured mutational effects and the maximum mutation frequencies in the H3N2 phylogeny via Spearman rank correlation (Fig. 5A). In both pre-Perth and post-Perth time periods, we found a modest but statistically significant relationship between mutational effect and maximum mutation frequency (pre-Perth ρ = 0.17, post-Perth ρ = 0.15). The effect sizes are similar for the pre- and post-Perth partitions. Our experimental measurements can therefore help explain the evolutionary fates of mutations in strains that postdate the experimentally studied strain. They can also help retrospectively analyze mutations that precede the experimental strain.

Many of the HAs in sequence databases are from viral isolates that were passaged in cell culture or eggs, which can cause laboratory-adaptation mutations that confound evolutionary analyses (58). To check that our results were robust to such laboratory-adaptation mutations, we repeated our analysis using only HA sequences derived from primary isolates that had not been passaged in the laboratory. Because sequencing of unpassaged viruses has only recently become commonplace, we could only perform this analysis for the post-Perth partition of the phylogenetic tree. Fig. 5A shows that the correlation between our measured mutational effects and the maximum frequency was even stronger for mutations from unpassaged viral isolates (ρ = 0.24).

The trends in Fig. 5A are most strongly driven by the behavior of substantially deleterious mutations. We investigated this further by partitioning mutations into those that reach low, medium, and high frequencies and those that fix in the population (Fig. 5B). The mutations that reach higher frequencies have a more favorable mean effect. Mutations measured to be substantially deleterious almost never reach high frequency. Overall, these results demonstrate that measurements of how mutations affect viral growth in cell culture are informative for understanding the fates of these mutations in nature. In particular, if a mutation is measurably deleterious to viral growth, it is unlikely to prosper in nature.

Measurements Made on an H1 HA Are Less Informative for Understanding the Evolution of H3 Influenza. To determine how broadly experimental measurements can be generalized across HAs, we repeated the foregoing analysis of H3N2 mutation frequencies. This time we used mutational effects measured in our prior deep mutational scanning of the WSN/1933 H1 HA (10). This HA is highly diverged from the Perth/2009 H3 HA: the two HAs have only 42% protein sequence identity (see SI Appendix, Fig. S9). Fig. 6 shows the correlations between the H1 experimental measurements and the maximum frequency that mutations reach during H3N2 viral evolution. These correlations are consistently weaker than those using the H3 measurements (compare Fig. 6 to Fig. 5A). Therefore, the utility of an experiment for understanding natural evolution degrades as the experimental sequence becomes more diverged from the natural sequences being studied.

There Are Large Differences in the Amino Acid Preferences of Many Sites Between H3 and H1 HAs. Why is deep mutational scanning of the H1 HA less useful for understanding the evolution of H3N2 influenza viruses? An obvious hypothesis is that the effect of the same mutation is often different between these two HA subtypes. To determine if this is the case, we examined how much the amino acid preferences of homologous sites have shifted between H3 and H1 HAs. Prior experiments have found only modest shifts in amino acid preferences between two variants of influenza nucleoprotein with 94% amino acid identity (59). Similarly modest shifts were found between variants of HIV envelope (Env) with 86% amino acid identity (27). However, the H3 and H1 HAs are far more diverged, with only 42% amino acid identity (Fig. 7A). One simple way to investigate the extent of shifts in amino acid preferences is to correlate measurements from independent deep mutational scanning replicates on H1 and H3 HAs. Fig. 7B shows that replicate measurements on the same HA variant are more correlated than those on different HA variants.

[Fig. 5 image: (A) maximum frequency (0.00 to 1.00) versus mutational effect, in panels for pre-Perth/2009, post-Perth/2009, and post-Perth/2009 (unpassaged) mutations; (B) histograms (count) of mutational effect (−20 to 10) for each panel, split by maximum frequency: (0.0, 0.02], (0.02, 0.1], (0.1, 0.5], (0.5, 0.99], and (0.99, 1.0].]

Fig. 5. Experimental measurements are informative about the evolutionary fate of viral mutations. (A) Correlation between the effects of mutations as measured in our deep mutational scanning of the Perth/2009 HA and the maximum frequency reached by these mutations in nature. The plots show Spearman ρ and an empirical P value representing the proportion of 10,000 permutations of the experimental measurements for which the permuted ρ was greater than or equal to the observed ρ. (B) The distribution of mutational effects partitioned by maximum mutation frequency. The vertical black line shows the mean mutation effect for each category. The analysis is performed separately for pre-Perth/2009, post-Perth/2009, and unpassaged isolates from the post-Perth/2009 partitions of the tree (see SI Appendix, Fig. S8).
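The empirical P value described in this caption can be sketched in plain Python. The sketch computes Spearman's ρ as the Pearson correlation of average ranks. It then counts how often a random permutation of the experimental measurements gives a ρ at least as large as the observed one, following the caption's definition as written (one-sided, no +1 correction). Function names, the fixed seed, and the example inputs are our own. The functions assume neither input is constant, since that would make the correlation undefined.

```python
import random


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_permutation_test(effects, max_freqs, n_perm=10_000, seed=0):
    """Return (rho, P), with P the fraction of permutations whose rho >= observed rho."""
    effect_ranks = average_ranks(effects)
    freq_ranks = average_ranks(max_freqs)
    observed = pearson(effect_ranks, freq_ranks)
    rng = random.Random(seed)
    shuffled = list(effect_ranks)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # permuting ranks is equivalent to ranking a permuted vector
        if pearson(shuffled, freq_ranks) >= observed:
            hits += 1
    return observed, hits / n_perm


# Hypothetical mutational effects and maximum frequencies.
rho, p = spearman_permutation_test([-8.0, -2.5, 0.3, 1.1, -6.0, 0.9],
                                   [0.01, 0.05, 0.6, 1.0, 0.02, 0.3], n_perm=2000)
print(round(rho, 3), p)
```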

To more rigorously quantify shifts in amino acid preferences after correcting for experimental noise, we used the statistical approach in refs. 27 and 59. Fig. 7C shows the distribution of shifts in amino acid preferences between H3 and H1 HAs after correcting for experimental noise. Although some sites have small shifts near zero, many sites have large shifts. These shifts between H3 and H1 are much larger than expected from the null distribution that would be observed purely from experimental noise. They are also much larger than the shifts previously observed between two HIV Envs with 86% amino acid identity (27). However, the typical shift between H3 and H1 is still smaller than that observed when comparing HA to the nonhomologous HIV Env protein. Therefore, there are very substantial shifts in mutational effects between highly diverged HA homologs, although the effects of mutations remain more similar than for nonhomologous proteins.

Properties Associated with the Shifts in Amino Acid Preferences Between H3 and H1 HAs. What features distinguish the sites with shifted amino acid preferences between H3 and H1 HAs? The sites of large shifts do not obviously localize to one specific region of HA's structure (Fig. 8A). However, at the domain level, sites in HA's stalk tend to have smaller shifts than sites in HA's globular head (Fig. 8B). The HA stalk domain is also more conserved in sequence (60), suggesting that conservation of amino acid sequence is correlated with conservation of amino acid preferences. Consistent with this idea, sites that are absolutely conserved across all 18 HA subtypes are significantly less shifted than sites that are variable across HA subtypes (Fig. 8B). Presumably these sites are under consistent functional constraint across all HAs.

Despite their high sequence divergence, H1 and H3 adopt very similar protein folds (61, 62). However, different HA subtypes differ in the rotation and upward translation of the globular head subdomains relative to the central stalk domain (61, 62). Previous work has defined clades of structurally related HA subtypes (61, 62). One such clade includes H1, H2, H5, and H6, whereas another clade includes H3, H4, and H14 HAs (Fig. 7A). Sites that are conserved at different amino acid identities in these two clades tend to have exceptionally large shifts in amino acid preferences (Fig. 8B). The clade containing H1 has an upward translation of the globular

[Figure panels: pre-Perth/2009, post-Perth/2009, and post-Perth/2009 (unpassaged); x axis, mutational effect (−20 to 20); y axis, max frequency (0.0 to 1.0).]

Fig. 6. Experimental measurements on an H1 HA are less informative about the evolutionary fate of H3N2 mutations. This figure repeats the analysis of the H3N2 mutation frequencies in Fig. 5A but uses the deep mutational scanning data for an H1 HA as measured in ref. 10. SI Appendix, Fig. S10 shows the histograms comparable to those in Fig. 5B. The empirical P value represents the result of 1,000 permutations.

head relative to the clade containing H3. This structural shift has been attributed largely to the interaction between sites 107 and 75(HA2) (61, 62). Specifically, the clade containing H1 has a taller turn in the interhelical loop connecting helix A and helix B in the stalk domain, and this tall turn is stabilized by a hydrogen bond between Glu-107 and Lys-75(HA2) (Fig. 8C). In the deep mutational scanning of an H1 HA, site 107 strongly prefers Glu, and site 75(HA2) has a high preference for the positively charged Lys and Arg. In contrast, the H3 HA makes a shorter and sharper turn in the interhelical loop that is facilitated by a Gly at 75(HA2). In the deep mutational scanning of the Perth/2009 H3 HA, site 75(HA2) prefers Gly and, to a lesser extent, Val, while site 107 is fairly tolerant of mutations. Therefore, some of the shifts in amino acid preferences in HA can be rationalized in terms of changes in HA structure.

Fig. 7. There are large shifts in the effects of mutations between H1 and H3 HAs. (A) Phylogenetic tree of HA subtypes, with the WSN/1933 H1 and Perth/2009 H3 HAs labeled. (B) All pairwise correlations between the amino acid preferences measured in the three replicates of the current study and the three replicates of the prior deep mutational scanning of the H1 HA (10), comparing H3 replicates with each other, H1 replicates with each other, and H3 replicates with H1 replicates. R indicates the Pearson correlation coefficient. (C) We calculated the shift in amino acid preferences at each site between the H3 and H1 HAs using the method in ref. 27. These shifts are much larger than the null distribution expected if all differences were due to experimental noise, and much larger than the shifts previously observed between two HIV Env variants that share 86% amino acid identity. However, they are smaller than the shifts between HA and HIV Env.

Discussion

We have measured the effects of all possible single amino acid mutations to the Perth/2009 H3 HA on viral growth in cell culture and demonstrated that these measurements have some value for understanding the evolutionary fate of mutations in nature. Specifically, mutations measured to be beneficial for viral growth in cell culture tend to reach higher frequencies in nature than mutations measured to be more deleterious for viral growth. The fact that our experiments can help identify evolutionarily successful mutations suggests that they might inform viral evolutionary forecasting. In their landmark paper introducing predictive fitness models, Łuksza and Lässig (9) noted that models accounting for both antigenic and nonantigenic mutations could in principle be improved by integrating "diverse genotypic and phenotypic data" that more realistically represented the effects of specific mutations. Our work suggests that deep mutational scanning may be able to provide such data.

It is important to emphasize that measurements of viral growth in cell culture do not represent true fitness in nature. Indeed, a vast amount of work has chronicled the ways in which experiments can select for laboratory artifacts or fail to capture important pressures that are relevant in nature (63–66). As an example, although we identified G78D as a mutation favorable for viral growth in cell culture, this mutation never fixes in nature. Mutations in viral genes other than HA are also important in determining strain success in nature (67, 68).

Given these caveats, it might seem surprising that measuring viral growth in cell culture can be informative about the success of viral strains in nature. However, before our work, there were no comprehensive studies of the functional effects of mutations to H3 HA on any property that even resembled viral fitness in nature, and modeling work has either omitted the nonantigenic effects of mutations (11–13) or assumed that all nonepitope mutations had equivalent deleterious effects (9). The strength of our measurements is not that they perfectly capture fitness in nature, but that they are systematic and quantitative, and so represent an improvement over no information at all. We suspect that performing similar experiments using more realistic and complex selections (e.g., ferrets or primary human airway cultures) might further improve their utility and possibly their generalizability to more divergent strains.

We measured the effects of all single amino acid mutations to a specific H3N2 HA and then generalized these measurements to other H3N2 HAs from a 50-y timespan. These generalizations will only be valid to the extent that the effects of mutations are conserved during HA's evolution. Extensive prior work has shown that epistasis can shift the effects of mutations as proteins evolve (69–75). Our work suggests that measurements from a single human H3N2 viral strain can at least to some extent be usefully generalized across the entire evolutionary history of human H3N2 HA. On the other hand, when we compared our measurements for an H3 HA to prior measurements on an H1 HA, we found substantial shifts at many sites, much greater than those observed in prior protein-wide comparisons of more closely related homologs (27, 59). Further investigation of how mutational effects shift as proteins diverge will be important for determining how broadly any given experiment can be generalized when attempting to make evolutionary forecasts.

Our work did not characterize the antigenic effects of mutations, which also play an important role in determining strain success in nature (13, 14). However, our basic selection and deep-sequencing approach can be harnessed to completely map how mutations affect antibody recognition (76, 77). So far, however, experiments using this approach have either not examined antibodies or sera that are relevant to driving the evolution of H3N2 influenza (76, 77) or have used relevant sera but examined a noncomprehensive set of mutations (16). Future experiments that completely map how HA mutations affect recognition by human sera seem likely to be especially fruitful for informing viral forecasting.

Fig. 8. Sites with strongly shifted amino acid preferences between H3 and H1 HAs. (A) The shift in amino acid preferences between the H3 and H1 HA at each site as calculated in Fig. 7C is mapped onto the structure of the H3 HA. (B) Amino acid preferences of sites in the stalk domain are less shifted than those in the head domain. Sites absolutely conserved in all 18 HA subtypes are less shifted than other sites. Sites with one amino acid identity in the clade containing H1, H2, H5, and H6 and another identity in the clade containing H3, H4, and H14 are more shifted than other sites. (C) Sites 107 and 75(HA2) help determine the different orientation of the globular head domain in H1 versus H3 HAs. These sites are shown in spheres on the structure of H1 and H3 and colored as in A, and the experimentally measured amino acid preferences in the H1 and H3 HAs are shown. One monomer is in dark gray, while the HA1 domain of the neighboring monomer is in lighter gray.

Materials and Methods

Data and Computer Code. Deep sequencing data are available from the Sequence Read Archive under BioSample accession nos. SAMN08102609 and SAMN08102610. Computer code used to analyze the data is at https://github.com/jbloomlab/Perth2009-DMS-Manuscript.

HA Numbering. Sites are in H3 numbering, with the signal peptide in negative numbers, HA1 in plain numbers, and HA2 denoted with "(HA2)." Sequential 1, 2, ... numbering of the Perth/2009 HA can be converted to H3 numbering by subtracting 16 for the HA1 subunit and subtracting 345 for the HA2 subunit.

Creation of MDCK–SIAT1–TMPRSS2 Cells. When growing influenza virus in cell culture, trypsin is normally added to cleave HA into its mature form. To obviate the need for trypsin, we engineered MDCK–SIAT1 cells and MDCK–SIAT1–CMV–PB1 (78) cells to constitutively express the TMPRSS2 protease, which cleaves and activates HA in the human airways (31, 32). The human TMPRSS2 cDNA ORF was ordered from OriGene (NM_005656) and cloned into a pHAGE2 lentiviral vector under an EF1α-Int promoter followed by an IRES driving expression of mCherry to create plasmid pHAGE2–EF1aInt–TMPRSS2–IRES–mCherry-W. We used the lentiviral vector to transduce MDCK–SIAT1 or MDCK–SIAT1–CMV–PB1 cells and sorted an intermediate mCherry-positive population by flow cytometry. We refer to the sorted bulk population as MDCK–SIAT1–TMPRSS2 cells or MDCK–SIAT1–CMV–PB1–TMPRSS2 cells. There is no selectable marker for the TMPRSS2; however, we maintain the cells at low passage number and have seen no indication that they lose their ability to support the growth of viruses with H3 HAs in the absence of exogenous trypsin.

Generation of HA Codon Mutant Plasmid Libraries. HA and NA genes for the Perth/2009 viral strain were obtained from BEI Resources (NR-41803) and cloned into the pHW2000 (79) influenza reverse-genetics plasmids to create pHW-Perth09-HA and pHW-Perth09-NA. We initially created a virus with the HA and NA from Perth/2009 and internal genes from WSN/1933 and passaged it in cell culture to test its genetic stability. To generate this virus, we transfected a coculture of 293T and MDCK–SIAT1–TMPRSS2 cells in D10 media (DMEM supplemented with 10% heat-inactivated fetal bovine serum (FBS), 2 mM L-glutamine, 100 U of penicillin per milliliter, and 100 µg of streptomycin per milliliter) with equal amounts of pHW-Perth09-HA, pHW-Perth09-NA, the pHW18* series of plasmids (79) for all non-HA/NA viral genes, and pHAGE2–EF1aInt–TMPRSS2–IRES–mCherry-W. The next day, we changed the media to influenza growth media (IGM, consisting of Opti-MEM supplemented with 0.01% heat-inactivated FBS, 0.3% BSA, 100 U of penicillin per milliliter, 100 µg of streptomycin per milliliter, and 100 µg of calcium chloride per milliliter; no trypsin was added because the cells expressed TMPRSS2) and then collected the viral supernatant at 72 h posttransfection. This viral supernatant was blind passaged in MDCK–SIAT1–TMPRSS2 cells a total of six additional times. We isolated viral RNA from these passaged viruses and sequenced the HA gene. The passaged HA had two mutations, G78D and T212I, which enhanced viral growth as shown in SI Appendix, Fig. S1. The HA with these two mutations was cloned into pHW2000 (79) and pICR2 (80) to create pHW-Perth09-HA-G78D-T212I and pICR2-Perth09-HA-G78D-T212I. For all subsequent experiments, we used viruses with the HA containing these two mutations to improve titers and viral genetic stability, and this is the HA that we refer to as Perth/2009. We used all non-HA genes (including NA) from WSN/1933 to help increase titers and reduce biosafety concerns. The codon-mutant libraries were generated using the approach in ref. 81 with the modifications in ref. 82. See SI Appendix, Supplementary Text for full details.

Generation and Passaging of Mutant Viruses. The mutant virus libraries were generated using the helper-virus approach described in ref. 10 with several modifications, most notably the cell line used. Briefly, we transfected 5 × 10⁵ MDCK–SIAT1–TMPRSS2 cells in suspension with 937.5 ng each of four protein expression plasmids encoding the ribonucleoprotein complex (HDM–Nan95–PA, HDM–Nan95–PB1, HDM–Nan95–PB2, and HDM–Aichi68–NP) (71) and 1,250 ng of one of the three pICR2-mutant-HA libraries (or the wild-type control) using Lipofectamine 3000 (ThermoFisher L3000008). We allowed the transfected cells to adhere in six-well plates and 4 h later changed the media to D10 media. Eighteen hours after transfection, we infected the cells with the WSN/1933 HA-deficient helper virus (10) by preparing an inoculum of 500 TCID50 per microliter of helper virus (as computed on HA-expressing cells) in IGM, aspirating the D10 media from the cells, and adding 2 mL of the helper-virus inoculum to each well. After 3 h, we changed the media to fresh IGM. At 24 h after helper-virus infection, we harvested the viral supernatants for each replicate, froze aliquots at −80 °C, and titered them in MDCK–SIAT1–TMPRSS2 cells. The titers were 92, 536, 536, and 734 TCID50 per microliter for the three library replicates and the wild-type control, respectively.

We passaged 9 × 10⁵ TCID50 of the transfection supernatants at an MOI of 0.0035 TCID50 per cell. To do this, we plated 4.6 × 10⁶ MDCK–SIAT1–TMPRSS2 cells per dish in fifteen 15-cm dishes in D10 media and allowed the cells to grow for 24 h, at which time they were at ∼1.7 × 10⁷ cells per dish. We replaced the media in each dish with 25 mL of an inoculum of 2.5 TCID50 of virus per microliter in IGM. Three hours postinfection, we replaced the inoculum with fresh IGM for replicates 1, 2, and 3-2. We did not perform a media change for replicate 3-1. As can be seen in Fig. 1D, the media change does not appear to have a substantial effect, as replicate 3-1 looks comparable to the other replicates. We collected viral supernatant for sequencing 48 h postinfection.
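The passaging numbers above can be cross-checked with a few lines of arithmetic. The Python sketch below is only a consistency check written for this description; the constant names are illustrative and not from the authors' code. Fifteen dishes at ∼1.7 × 10⁷ cells each give ∼2.55 × 10⁸ cells, so 9 × 10⁵ TCID50 corresponds to an MOI of ≈0.0035, while 25 mL per dish of a 2.5 TCID50/µL inoculum sums to ≈9.4 × 10⁵ TCID50 across the 15 dishes (MOI ≈0.0037), consistent with the rounded total.

```python
# Numbers copied from the passaging description; names are illustrative only.
N_DISHES = 15
CELLS_PER_DISH = 1.7e7             # approximate cells per dish at infection
INOCULUM_UL_PER_DISH = 25 * 1000   # 25 mL of inoculum per dish, in microliters
TITER_TCID50_PER_UL = 2.5          # inoculum titer in IGM


def moi(infectious_units: float, cells: float) -> float:
    """Multiplicity of infection: infectious units (here TCID50) per cell."""
    return infectious_units / cells


total_cells = N_DISHES * CELLS_PER_DISH
delivered_tcid50 = N_DISHES * INOCULUM_UL_PER_DISH * TITER_TCID50_PER_UL

print(f"total cells: {total_cells:.3g}")
print(f"TCID50 from volume x titer: {delivered_tcid50:.3g}")
print(f"MOI from the stated 9e5 TCID50: {moi(9e5, total_cells):.4f}")
print(f"MOI from volume x titer: {moi(delivered_tcid50, total_cells):.4f}")
```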

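The Fig. 5 comparison combines two computations that this paper describes in words: a mutation's effect is the log2 ratio of its rescaled preference to the wild-type preference (Materials and Methods), and its association with maximum frequency in nature is summarized by Spearman ρ with an empirical P value from permuting the experimental measurements (Fig. 5 caption). The Python sketch below illustrates both under stated assumptions: the stringency rescaling is assumed to take the form of raising each preference to β and renormalizing, and all function names and toy data are hypothetical rather than taken from dms_tools2 or phydms.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Prefs = Dict[str, float]  # amino acid -> preference at one site (sums to 1)


def rescale(prefs: Prefs, beta: float) -> Prefs:
    """Assumed stringency rescaling: raise each preference to beta, renormalize."""
    powered = {aa: p ** beta for aa, p in prefs.items()}
    total = sum(powered.values())
    return {aa: v / total for aa, v in powered.items()}


def mutational_effect(prefs: Prefs, wildtype: str, mutant: str) -> float:
    """log2(pi_mutant / pi_wildtype); negative values mean the mutation is disfavored."""
    return math.log2(prefs[mutant] / prefs[wildtype])


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks in which tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend over a run of tied values
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rho as the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)  # undefined if either input is constant


def permutation_p_value(effects: Sequence[float], max_freqs: Sequence[float],
                        n_perm: int = 10_000, seed: int = 0) -> Tuple[float, float]:
    """Observed rho and the fraction of shuffled effects giving rho >= observed."""
    observed = spearman(effects, max_freqs)
    rng = random.Random(seed)
    shuffled = list(effects)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # permute the experimental measurements
        if spearman(shuffled, max_freqs) >= observed:
            hits += 1
    return observed, hits / n_perm


# Hypothetical toy data: effects for six mutations and their max frequencies.
rho, p = permutation_p_value([-5, -2, 0, 1, 3, 4],
                             [0.01, 0.02, 0.05, 0.3, 0.9, 1.0], n_perm=2_000)
print(f"Spearman rho = {rho:.2f}, permutation P = {p:.4f}")
```

Because the normalizing constant cancels in the ratio, rescaling with β multiplies every log2 effect by β, so the ranks, and therefore Spearman ρ, do not depend on the choice of β. The P value is the fraction of permutations whose ρ is at least the observed ρ, matching the definition in the Fig. 5 caption.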
Barcoded Subamplicon Sequencing. To extract viral RNA from the three replicate HA virus libraries and the wild-type HA virus, we clarified the supernatant by centrifuging at 2,000 × g for 5 min, then ultracentrifuged 24 mL of the clarified supernatant at 22,000 rpm for 1.5 h at 4 °C in a Beckman Coulter SW28 rotor, and extracted the viral RNA using the Qiagen RNeasy Mini Kit by resuspending the viral pellet in 400 µL of buffer RLT supplemented with β-mercaptoethanol, pipetting 30 times, transferring the liquid to a microcentrifuge tube, adding 600 µL of 70% ethanol, and proceeding according to the manufacturer's instructions. The HA gene was reverse-transcribed using AccuScript Reverse Transcriptase (Agilent 200820) with primers P09-HA-For (5′-AGCAAAAGCAGGGGATAATTCTATTAATC-3′) and P09-HA-Rev (5′-AGTAGAAACAAGGGTGTTTTTAATTACTAATACAC-3′). The barcoded-subamplicon library prep and deep sequencing were then performed as in ref. 10. See SI Appendix, Supplementary Text for full details.

Analysis of Deep Sequencing Data. We used the dms_tools2 software package (39) (https://github.com/jbloomlab/dms_tools2, version 2.2.5) to analyze the deep sequencing data. The computer code and detailed plots about read depth and other quality control metrics are at https://github.com/jbloomlab/Perth2009-DMS-Manuscript/blob/master/analysis_code/analysis_notebook.ipynb. The amino acid preferences are provided in Dataset S3.

Phylogenetic Model Comparison and Stringency Parameter. Phylogenetic model comparisons and fitting of a stringency parameter were performed using phydms as described in ref. 35. See SI Appendix, Supplementary Text for full details.

Shannon Entropy and Relative Solvent Accessibility. We calculated Shannon entropy for site $r$ as $h_r = -\sum_x \pi_{r,x} \log(\pi_{r,x})$, where $\pi_{r,x}$ is the preference for amino acid $x$ at site $r$. We quantified the absolute solvent accessibility of each site of the H3 HA [PDB ID code 4O5N (41)] or the H1 HA [PDB ID code 1RVX (42)] using DSSP (Define Secondary Structure of Proteins) (83). We then normalized these values to relative solvent accessibility using the absolute accessibilities for each amino acid in ref. 84.

Quantification of Mutational Effects. The effect of mutating site $r$ from amino acid $a_1$ to $a_2$ was quantified as $\log_2(\pi_{r,a_2}/\pi_{r,a_1})$, where $\pi_{r,a_1}$ and $\pi_{r,a_2}$ are the rescaled preferences for amino acids $a_1$ or $a_2$ at site $r$ as shown in Fig. 2. The WSN/1933 H1 HA amino acid preferences are the replicate-average values reported in ref. 10, rescaled by a stringency parameter of 2.05 (see https://github.com/jbloomlab/dms_tools2/blob/master/examples/Doud2016/analysis_notebook.ipynb).

Phylogenetic Tree and Mutation Frequencies. The H3N2 phylogenetic tree was generated using Nextstrain's augur pipeline (85), and maximum-likelihood ancestral state reconstruction and branch length timing were performed with TreeTime (86). Frequency trajectories of mutations were estimated following Nextstrain's augur pipeline as first implemented in Nextflu (87). See SI Appendix, Supplementary Text for full details.

Analysis of Mutational Shifts. We compared the preferences for the Perth/2009 H3 and WSN/1933 H1 HAs using the approach in ref. 27. See SI Appendix, Supplementary Text for full details.

Validation of Individual Perth/2009 HA Point Mutants. To validate the viral growth of the point mutants C199(HA2)K, M(-16)K, C52A, C52C, T24F, T40V, and S287A, we examined the supernatant titers after reverse-genetics generation of each of these variants in the context of PB1flank-GFP viruses (78, 88). See SI Appendix, Supplementary Text for full details.

ACKNOWLEDGMENTS. We thank Sarah Hilton, Hugh Haddox, and Sidney Bell for helpful discussions about data analysis and sharing analysis code. We thank Richard Neher for providing helpful comments on the manuscript and the Fred Hutch Genomics Core for performing the Illumina deep sequencing. This work was supported by the National Institute of Allergy and Infectious Diseases (NIAID) of the NIH Grants R01 AI127893 and U19 AI117891. J.M.L. was supported in part by the Center for Inference and Dynamics of Infectious Diseases (CIDID), which is funded by NIH National Institute of General Medical Sciences (NIGMS) Grant U54GM111274. The research of J.D.B. is supported in part by a Faculty Scholar Grant from the Howard Hughes Medical Institute and the Simons Foundation and by a Burroughs Wellcome Investigator in the Pathogenesis of Infectious Diseases grant. T.B. is a Pew Biomedical Scholar and is supported by NIH Grant R35 GM119774.

1. Fitch WM, Bush RM, Bender CA, Cox NJ (1997) Long term trends in the evolution of H(3) HA1 human influenza type A. Proc Natl Acad Sci USA 94:7712–7718.
2. Bhatt S, Holmes EC, Pybus OG (2011) The genomic rate of molecular adaptation of the human influenza A virus. Mol Biol Evol 28:2443–2451.
3. Smith DJ, et al. (2004) Mapping the antigenic and genetic evolution of influenza virus. Science 305:371–376.
4. Bedford T, Cobey S, Pascual M (2011) Strength and tempo of selection revealed in viral gene genealogies. BMC Evol Biol 11:220.
5. Strelkowa N, Lässig M (2012) Clonal interference in the evolution of influenza. Genetics 192:671–682.
6. Neher RA, Russell CA, Shraiman BI (2014) Predicting evolution from the shape of genealogical trees. eLife 3:e03568.
7. Koelle K, Rasmussen DA (2015) The effects of a deleterious mutation load on patterns of influenza A/H3N2's antigenic evolution in humans. eLife 4:e07361.
8. Bedford T, et al. (2015) Global circulation patterns of seasonal influenza viruses vary with antigenic drift. Nature 523:217–220.
9. Łuksza M, Lässig M (2014) A predictive fitness model for influenza. Nature 507:57–61.
10. Doud MB, Bloom JD (2016) Accurate measurement of the effects of all amino-acid mutations on influenza hemagglutinin. Viruses 8:155.
11. Sun H, et al. (2013) Using sequence data to infer the antigenicity of influenza virus. MBio 4:e00230–13.
12. Harvey WT, et al. (2016) Identification of low- and high-impact hemagglutinin amino acid substitutions that drive antigenic drift of influenza A(H1N1) viruses. PLoS Pathog 12:e1005526.
13. Neher RA, Bedford T, Daniels RS, Russell CA, Shraiman BI (2016) Prediction, dynamics, and visualization of antigenic phenotypes of seasonal influenza viruses. Proc Natl Acad Sci USA 113:E1701–E1709.
14. Koel BF, et al. (2013) Substitutions near the receptor binding site determine major antigenic change during influenza virus evolution. Science 342:976–979.
15. Chambers BS, Parkhouse K, Ross TM, Alby K, Hensley SE (2015) Identification of hemagglutinin residues responsible for H3N2 antigenic drift during the 2014–2015 influenza season. Cell Rep 12:1–6.
16. Li C, et al. (2016) Selection of antigenically advanced variants of seasonal influenza viruses. Nat Microbiol 1:16058.
17. Pybus OG, et al. (2007) Phylogenetic evidence for deleterious mutation load in RNA viruses and its contribution to viral evolution. Mol Biol Evol 24:845–852.
18. Holland J, et al. (1982) Rapid evolution of RNA . Science 215:1577–1585.
19. Steinhauer D, Holland J (1987) Rapid evolution of RNA viruses. Annu Rev Microbiol 41:409–431.
20. Lauring AS, Andino R (2010) Quasispecies theory and the behavior of RNA viruses. PLoS Pathog 6:e1001005.
21. Boni MF, Zhou Y, Taubenberger JK, Holmes EC (2008) Homologous recombination is very rare or absent in human influenza A virus. J Virol 82:4807–4811.
22. Fowler DM, Fields S (2014) Deep mutational scanning: A new style of protein science. Nat Methods 11:801–807.
23. Thyagarajan B, Bloom JD (2014) The inherent mutational tolerance and antigenic evolvability of influenza hemagglutinin. eLife 3:e03300.
24. Wu NC, et al. (2014) High-throughput profiling of influenza A virus hemagglutinin gene at single-nucleotide resolution. Sci Rep 4:4942.
25. Haddox HK, Dingens AS, Bloom JD (2016) Experimental estimation of the effects of all amino-acid mutations to HIV's envelope protein on viral replication in cell culture. PLoS Pathog 12:e1006114.
26. Qi H, Wu NC, Du Y, Wu TT, Sun R (2015) High-resolution genetic profile of viral genomes: Why it matters. Curr Opin Virol 14:62–70.
27. Haddox HK, Dingens AS, Hilton SK, Overbaugh J, Bloom JD (2018) Mapping mutational effects along the evolutionary landscape of HIV envelope. eLife 7:e34420.
28. WHO (2010) Recommended viruses for influenza vaccines for use in the 2010-2011 northern hemisphere influenza season. www.who.int/influenza/vaccines/virus/recommendations/201002_recommendation.pdf?ua=1. Accessed April 9, 2018.
29. WHO (2011) Recommended composition of influenza virus vaccines for use in the 2011-2012 northern hemisphere influenza season. www.who.int/influenza/vaccines/2011_02_Recommendation.pdf?ua=1. Accessed April 9, 2018.
30. Matrosovich M, Matrosovich T, Carr J, Roberts NA, Klenk HD (2003) Overexpression of the α-2,6-sialyltransferase in MDCK cells increases influenza virus sensitivity to neuraminidase inhibitors. J Virol 77:8418–8425.
31. Böttcher E, et al. (2006) Proteolytic activation of influenza viruses by serine proteases TMPRSS2 and HAT from human airway epithelium. J Virol 80:9896–9898.
32. Böttcher-Friebertshäuser E, et al. (2010) Cleavage of influenza virus hemagglutinin by airway proteases TMPRSS2 and HAT differs in subcellular localization and susceptibility to protease inhibitors. J Virol 11:5605–5614.
33. Marshall N, Priyamvada L, Ende Z, Steel J, Lowen AC (2013) Influenza virus reassortment occurs with high frequency in the absence of segment mismatch. PLoS Pathog 9:e1003421.
34. Brooke CB, et al. (2013) Most influenza A virions fail to express at least one essential viral protein. J Virol 87:3155–3162.
35. Hilton SK, Doud MB, Bloom JD (2017) phydms: Software for phylogenetic analyses informed by deep mutational scanning. PeerJ 5:e3657.

36. Yang Z, Nielsen R, Goldman N, Pedersen AMK (2000) Codon-substitution models for heterogeneous selection pressure at amino acid sites. Genetics 155:431–449.
37. Posada D, Buckley TR (2004) Model selection and model averaging in phylogenetics: Advantages of Akaike information criterion and Bayesian approaches over likelihood ratio tests. Syst Biol 53:793–808.
38. Wolf Y, Viboud C, Holmes E, Koonin E, Lipman D (2006) Long intervals of stasis punctuated by bursts of positive selection in the seasonal evolution of influenza A virus. Biol Direct 1:34.
39. Bloom JD (2015) Software for the analysis and visualization of deep mutational scanning data. BMC Bioinformatics 16:168.
40. Bloom JD (2017) Identification of positive selection in genes is greatly improved by using experimentally informed site-specific models. Biol Direct 12:1.
41. Lee PS, et al. (2014) Receptor mimicry by antibody F045-092 facilitates universal binding to the H3 subtype of influenza virus. Nat Commun 5:3614.
42. Gamblin S, et al. (2004) The structure and receptor binding properties of the 1918 influenza hemagglutinin. Science 303:1838–1842.
43. Waterfield M, Scrace G, Skehel J (1981) Disulphide bonds of haemagglutinin of Asian influenza virus. Nature 289:422–424.
44. Weis W, et al. (1988) Structure of the influenza virus haemagglutinin complexed with its receptor, sialic acid. Nature 333:426–431.
45. Martin J, et al. (1998) Studies of the binding properties of influenza hemagglutinin receptor-site mutants. Virology 241:101–111.
46. Nobusawa E, Ishihara H, Morishita T, Sato K, Nakajima K (2000) Change in receptor-binding specificity of recent human influenza A viruses (H3N2): A single amino acid change in hemagglutinin altered its recognition of sialyloligosaccharides. Virology 278:587–596.
47. Yang H, et al. (2015) Structure and receptor binding preferences of recombinant human A (H3N2) virus . Virology 477:18–31.
48. Stech J, Garn H, Wegmann M, Wagner R, Klenk H (2005) A new approach to an influenza live vaccine: Modification of the cleavage site of hemagglutinin. Nat Med 11:683–689.
49. Girard G, Gultyaev A, Olsthoorn R (2011) Upstream start codon in segment 4 of North American H2 avian influenza A viruses. Infect Genet Evol 11:489–495.
50. Heaton NS, Sachs D, Chen CJ, Hai R, Palese P (2013) Genome-wide mutagenesis of influenza virus reveals unique plasticity of the hemagglutinin and NS1 proteins. Proc Natl Acad Sci USA 110:20248–20253.
51. Mallajosyula VV, et al. (2014) Influenza hemagglutinin stem-fragment immunogen elicits broadly neutralizing antibodies and confers heterologous protection. Proc Natl Acad Sci USA 111:E2514–E2523.
52. Laursen NS, Wilson IA (2013) Broadly neutralizing antibodies against influenza viruses. Antiviral Res 98:476–483.
53. Chai N, et al. (2016) Two escape mechanisms of influenza A virus to a broadly neutralizing stalk-binding antibody. PLoS Pathog 12:e1005702.
54. Wiley D, Wilson I, Skehel J (1981) Structural identification of the antibody-binding sites of Hong Kong influenza haemagglutinin and their involvement in antigenic variation. Nature 289:373–378.
55. Popova L, et al. (2012) Immunodominance of antigenic site B over site A of hemagglutinin of recent H3N2 influenza viruses. PLoS One 7:e41895.
56. Wilson I, Skehel J, Wiley D (1981) Structure of the haemagglutinin membrane glycoprotein of influenza virus at 3 Å resolution. Nature 289:366–373.
57. Ewens WJ (2012) Mathematical Population Genetics 1: Theoretical Introduction (Springer Science & Business Media, New York).
58. McWhite C, Meyer A, Wilke C (2016) Sequence amplification via cell passaging creates spurious signals of positive adaptation in influenza virus H3N2 hemagglutinin. Virus Evol 2:vew026.
59. Doud MB, Ashenberg O, Bloom JD (2015) Site-specific amino acid preferences are mostly conserved in two closely related protein homologs. Mol Biol Evol 32:2944–2960.
60. Nobusawa E, et al. (1991) Comparison of complete amino acid sequences and receptor-binding properties among 13 serotypes of hemagglutinins of influenza A viruses. Virology 182:475–485.
61. Ha Y, Stevens DJ, Skehel JJ, Wiley DC (2002) H5 avian and H9 swine influenza virus haemagglutinin structures: Possible origin of influenza subtypes. EMBO J 21:865–875.
62. Russell R, et al. (2004) H1 and H7 influenza haemagglutinin structures extend a structural classification of haemagglutinin subtypes. Virology 325:287–296.
63. Daniels R, et al. (1985) Fusion mutants of the influenza virus hemagglutinin . Cell 40:431–439.
64. Sun X, Longping VT, Ferguson AD, Whittaker GR (2010) Modifications to the hemagglutinin cleavage site control the virulence of a neurotropic H1N1 influenza virus. J Virol 84:8683–8690.
65. Lee HK, et al. (2013) Comparison of mutation patterns in full-genome A/H3N2 influenza sequences obtained directly from clinical samples and the same samples after a single MDCK passage. PLoS One 8:e79252.
66. Wu N, et al. (2017) A structural explanation for the low effectiveness of the seasonal influenza H3N2 vaccine. PLoS Pathog 13:e1006682.
67. Memoli MJ, et al. (2009) Recent human influenza A/H3N2 virus evolution driven by novel selection factors in addition to . J Infect Dis 200:1232–1241.
68. Raghwani J, Thompson RN, Koelle K (2017) Selection on non-antigenic gene segments of seasonal influenza A virus and its impact on adaptive evolution. Virus Evol 3:vex034.
69. Pollock DD, Thiltgen G, Goldstein RA (2012) Amino acid coevolution induces an evolutionary Stokes shift. Proc Natl Acad Sci USA 109:E1352–E1359.
70. Shah P, McCandlish DM, Plotkin JB (2015) Contingency and entrenchment in protein evolution under purifying selection. Proc Natl Acad Sci USA 112:E3226–E3235.
71. Gong LI, Suchard MA, Bloom JD (2013) Stability-mediated epistasis constrains the evolution of an influenza protein. eLife 2:e00631.
72. Natarajan C, et al. (2013) Epistasis among adaptive mutations in deer mouse hemoglobin. Science 340:1324–1327.
73. Harms MJ, Thornton JW (2014) Historical contingency and its biophysical basis in glucocorticoid receptor evolution. Nature 512:203–207.
74. Starr TN, Thornton JW (2016) Epistasis in protein evolution. Protein Sci 25:1204–1218.
75. Starr TN, Picton LK, Thornton JW (2017) Alternative evolutionary histories in the sequence space of an ancient protein. Nature 549:409–413.
76. Doud MB, Hensley SE, Bloom JD (2017) Complete mapping of viral escape from neutralizing antibodies. PLoS Pathog 13:e1006271.
77. Doud MB, Lee JM, Bloom JD (2018) How single mutations affect viral escape from broad and narrow antibodies to H1 influenza hemagglutinin. Nat Commun 9:1386.
78. Bloom JD, Gong LI, Baltimore D (2010) Permissive secondary mutations enable the evolution of influenza oseltamivir resistance. Science 328:1272–1275.
79. Hoffmann E, Neumann G, Kawaoka Y, Hobom G, Webster RG (2000) A DNA transfection system for generation of influenza A virus from eight plasmids. Proc Natl Acad Sci USA 97:6108–6113.
80. Ashenberg O, Padmakumar J, Doud MB, Bloom JD (2017) Deep mutational scanning identifies sites in influenza nucleoprotein that affect viral inhibition by MxA. PLoS Pathog 13:e1006288.
81. Bloom JD (2014) An experimentally determined evolutionary model dramatically improves phylogenetic fit. Mol Biol Evol 31:1956–1978.
82. Dingens AS, Haddox HK, Overbaugh J, Bloom JD (2017) Comprehensive mapping of HIV-1 escape from a broadly neutralizing antibody. Cell Host Microbe 21:777–787.e4.
83. Kabsch W, Sander C (1983) Dictionary of protein secondary structure: Pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22:2577–2637.
84. Tien M, Meyer AG, Spielman SJ, Wilke CO (2013) Maximum allowed solvent accessibilites of residues in proteins. PLoS One 8:e80635.
85. Hadfield J, et al. (2018) Nextstrain: Real-time tracking of pathogen evolution. Bioinformatics, 10.1093/bioinformatics/bty407.
86. Sagulenko P, Puller V, Neher RA (2018) TreeTime: Maximum-likelihood phylodynamic analysis. Virus Evol 4:vex042.
87. Neher RA, Bedford T (2015) nextflu: Real-time tracking of seasonal influenza virus evolution in humans. Bioinformatics 31:3546–3548.
88. Hooper KA, Bloom JD (2013) A mutant influenza virus that uses an N1 neuraminidase as the receptor-binding protein. J Virol 87:12531–12540.
