<<

Genome-wide mutagenesis of influenza reveals unique plasticity of the hemagglutinin and NS1 proteins

Nicholas S. Heatona, David Sachsb, Chi-Jene Chena, Rong Haia, and Peter Palesea,c,1

Departments of aMicrobiology, bGenetics, and cMedicine, Icahn School of Medicine at Mount Sinai, New York, NY 10029

Contributed by Peter Palese, November 1, 2013 (sent for review October 23, 2013)

The molecular basis for the diversity across influenza strains is that although the majority of the viral genome is resistant to poorly understood. To gain insight into this question, we muta- major , two viral proteins contain extremely flexible genized the viral genome and sequenced recoverable . Only proteins scaffolds. These flexible domains may allow for rapid two small regions in the genome were enriched for insertions, the adaptation to new or changing host environments. hemagglutinin head and the immune-modulatory nonstructural protein 1. These proteins play a major role in host adaptation, and Results thus need to be able to evolve rapidly. We propose a model in To determine the relative tolerance of the IAV proteins to which certain influenza A virus proteins (or protein domains) exist mutation, we generated high-coverage mutant libraries of all as highly plastic scaffolds, which will readily accept yet eight segments of the human H1N1 influenza A/Puerto retain their functionality. This model implies that the ability to Rico/8/1934 (PR8). We mutagenized the genome in vitro using Mu fi rapidly acquire mutations is an inherent aspect of influenza HA the transposase and an arti cial transposable and nonstructural protein 1 proteins; further, this may explain element (16). After selection and removal of the transposable element, a 15-nt insertion that contains a unique 10-nt conserved why rapid antigenic drift and a broad host range is observed with A influenza A virus and not with some other RNA viruses. molecular tag remains in the virus genome (Fig. 1 ). The insertions can code for different amino acids, depending on the insertional mutagenesis | protein flexibility | frame, but cannot code for a stop codon. We chose to use in- sertional mutagenesis over a strategy that would introduce point mutations to model a more significant lesion in the fl n uenza A virus (IAV) is a segmented negative-sense, single- and thus identify the most flexible genomic locations. Istranded RNA virus (1). Every year in the United States alone, To generate recombinant viruses, the mutant library for one IAV is thought to infect 5–20% of the population, leading to segment was transfected with the seven other nonmutagenized more than 200,000 hospitalizations for IAV-related disease (2). segments into cells. To maximize the recovery of mutant viruses, Despite this major burden to human health, and significant re- multiple independent rescues were performed, and the rescued search efforts into viral pathogenesis, there are still major viral particles were pooled and used to infect the highly per- aspects of IAV biology that remain poorly understood. missive Madin–Darby canine kidney (MDCK) cell line. Virus Most RNA viruses, including IAV, encode an error-prone was allowed to replicate for 24 h (corresponding to approxi- RNA-dependent RNA polymerase responsible for replicating mately three replication cycles), and then infectious viral par- the viral genome (1, 3). Despite the fact that all eight influenza ticles were passaged on new MDCK cells for another 24 h. Total viral segments have errors introduced during RNA replication RNA was then extracted and subjected to RT-PCR for the (1), sequence analysis shows that the conservation of segments segment encoded by the mutant library. cDNA was then pre- (and therefore proteins) range from highly variable to highly pared for next-generation Illumina sequencing to identify loca- conserved (4–7). It is currently unclear as to whether this dis- tions in the viral genome that tolerate the insertions (Fig. 1B). parity arises purely from a lack of selective pressure (i.e., all Sequencing of our input libraries revealed good genome cov- proteins encoded by the virus tolerate a range of mutations, but erage for all segments (Fig. 2). Despite segments being similarly only those that confer an evolutionary advantage become fixed in a population) or if there is an inherent difference in the ability of Significance the viral proteins themselves to tolerate mutations. Currently, there is no direct experimental evidence to distinguish between Influenza A virus is characterized by a large host range and these two possibilities. — One approach to directly provide evidence of the potential rapid antigenic drift features not common to many viruses. mutability of the viral genome is genome-wide insertional mu- To begin to understand the molecular basis behind these fea- tagenesis. This technique is one in which relatively small inser- tures, we performed genome-wide mutagenesis to gauge the tions are randomly introduced at all (or at least the majority of) relative flexibility of the viral genome. We found that only two possible sites across a genome. By determining the locations at small regions in the genome (the hemagglutinin head domain which the viral genome will tolerate insertions, one can evaluate and nonstructural protein 1) were highly tolerant to trans- the intrinsic flexibility of the genome. Viruses with small genomes poson-based insertional mutation. We believe that this anal- are excellent candidates for this type of analysis. Genome-wide ysis highlights an evolutionary strategy of influenza A virus: insertional mutagenesis of two positive-stranded RNA viruses has maintain highly plastic domains in proteins required for rapid been reported (8, 9); however, no such mutagenesis has been host adaptation. This finding may help explain why influenza performed for a negative-sense RNA virus. viruses possess certain traits not common to all RNA viruses. In this report, we performed near-saturating insertional mu- tagenesis of the IAV genome. Upon mapping tolerated insertion Author contributions: N.S.H. and P.P. designed research; N.S.H., C.-J.C., and R.H. per- sites, we identified two regions of the viral genome that were formed research; N.S.H., D.S., and P.P. analyzed data; and N.S.H. and P.P. wrote the paper. highly enriched for mutations. The identified regions were in The authors declare no conflict of interest. domains associated with host adaptation and immune response 1To whom correspondence should be addressed. E-mail: [email protected]. evasion (10, 11): the viral hemagglutinin (12, 13) This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10. and nonstructural protein 1 (NS1) (14, 15). These data suggest 1073/pnas.1320524110/-/DCSupplemental.

20248–20253 | PNAS | December 10, 2013 | vol. 110 | no. 50 www.pnas.org/cgi/doi/10.1073/pnas.1320524110 Downloaded by guest on September 30, 2021 genome as a whole is tolerant to mutations, even under highly permissive conditions with minimal exogenous selection pres- sures. Further, only two small regions of the viral genome were able to tolerate insertions, identifying these domains as rare “islands” of genomic plasticity. To further characterize the nature of our insertional mutants, we decided to focus on the HA protein. We first repeated our rescue experiments of the HA library, but instead of infecting MDCK cells, we inoculated the entire rescue into embryonated chicken eggs. For this experiment, instead of trying to preserve the diversity of rescued viruses, we wanted to introduce the mutant pool into a more competitive environment and identify the most well-tolerated insertion sites. Competition in eggs was expected due to differences in viral mutant growth rates as well as a more restrictive environment due to the presence of the chicken embryo innate (17). After incubation in eggs for 48 h (a sufficient time for the virus to reach maximal potential titer), we collected virions, amplified the HA segment, and submitted the library for deep sequencing. To our surprise, many of the insertion sites identified in MDCK cells were also identified in our egg competition analysis, indicating that, for the most part, diversity of the recovered viruses was preserved (Fig. 3). In fact, essentially every site identified in the HA head in MDCKs was also identified in the egg analysis. In contrast, however, we failed to recover mutations in the HA2 domain from the eggs (Fig. 3). We interpret this data to mean that insertions in the head domain are relatively well tolerated, whereas those in the stalk domain may lead to a loss of fitness. It is well known that -escape point mutations readily occur in five well-mapped antigenic sites in the HA head (18). We therefore overlaid the known antigenic sites with the loca- tions of insertional mutants verified by plaque purification or A fl common to both the MDCK and eggs rescues (Fig. 4 ). Though Fig. 1. Construction of mutant libraries and rescue of viruses. (A)In uenza fi segments (blue) were cloned into a pGEM shuttling vector and mutagenized our analysis overlapped with four of the ve characterized sites, in vitro. The virus segment containing the transposon (red) was removed we also found insertions outside of the classical antigenic sites, from the shuttling vector and ligated into the viral RNA expression plasmid indicating that the HA head can change in areas not typically pPolI. Finally, the body of the transposon was removed and the viral seg- associated with antibody escape. Upon highlighting these loca- ment was ligated back together to leave the 15-nt insertion (red striped). (B) tions on the crystal structure of the HA protein [Protein Data The mutant library for a given segment was transfected with the other seven Bank (PDB) ID code 1RU7] (19), we observe that the insertions WT segments into 293T cells. Rescued virus was used to infect MDCK cells. all sit on the upper third of the protein structure, excluded from After 24 h, virus was collected and used to infect naïve MDCK cells. RNA was the stalk domain (Fig. 4 B and C). To confirm that a range of collected from the second set of MDCK cells and subjected to RT-PCR for the HA1 mutations are not associated with a loss of fitness, we mutant segment. The input library was PCR-amplified from the expression randomly selected five of the transposon insertion sites, cloned plasmid. Both sets of amplified DNA were prepared for sequencing and them into the reverse genetic system, and rescued the viruses. submitted for Illumina next-generation sequencing. Growth-curve analysis of the mutants compared with WT PR8 revealed no significant differences in growth under these con- ditions (Fig. 4D; Fig. S2). mutagenized and virus being rescued under the same conditions, sequencing revealed that the vast majority of the viral genome Discussion (six of the eight segments) was highly refractory to tolerating B–D F–H Our genome-wide transposon mutagenesis of IAV revealed that insertions (Fig. 2 and ; Dataset S1). Two segments, the viral genome, for the most part, tolerates very few insertions. however—the segments coding for the viral hemagglutinin (HA) — For example, only a single insertion site in any of the three viral and nonstructural protein 1(NS1)/nuclear export protein (NEP) polymerase segments could be rescued. Similarly behaved were were found to tolerate insertions in many locations; sites enriched the NA, NP, and M segments with the exception of one small greater than 10-fold over background are represented by green ∼10-nt region in the M2 protein. The most well tolerated M2 A E fi bars (Fig. 2 and ; Dataset S1). To con rm our results, we insertions, around amino acid 29, fall into the transmembrane independently repeated our rescues of the mutant libraries and helix (20). The translation of these insertions is such that a string sequenced plaquable virus clones. Though we could rescue at of alanines is inserted into an existing run of alanines (Table 1); least one insertional mutant in all of the viral coding regions, the we suspect that this simply extends the helix, which has minor HA and NS segments again were the most easily recovered and effects on structure and function. It will be important, however, overrepresented (Table 1). to characterize some of the insertional mutants for impacts on Upon closer inspection, it became apparent that the HA and ion channel activity and other M2 functions such as membrane NS segments did not uniformly tolerate mutations. In fact, most scission (21). Overall, however, in contrast to the mostly in- of the identified insertion sites in the NS segment were in NS1 tolerant genome, the vast majority of the identified insertion sites (approximately nucleotides 30–720) and excluded the vast ma- were found in the HA1 domain of the HA protein and in NS1, jority of the NEP protein (Fig. 2A; Fig. S1). Similarly, the HA despite the fact that these domains together represent only segment had the vast majority of insertion sites with greater than ∼12% of the entire genome. 10-fold enrichment over background in the HA1 region corre- A pathogen that causes persistent infection or relies on re- sponding to the HA head domain (approximately nucleotides infection of the same individual must have an 50–1000; Fig. 2E). These data exclude the notion that the viral strategy to escape the host antimicrobial response (22). When

Heaton et al. PNAS | December 10, 2013 | vol. 110 | no. 50 | 20249 Downloaded by guest on September 30, 2021 Fig. 2. Insertional mutagenesis of the IAV genome reveals unique plasticity of NS and HA segments. (A–H) Results of the Illumina sequencing analysis for all eight segments. For all panels, input libraries (Left) were sequenced from the DNA rescue plasmids to determine mutational coverage. Rescued virus was amplified via RT-PCR from MDCK cell culture (Right). The x axis corresponds to the nucleotide position along the viral segment. The y axis corresponds to raw number of reads for the input and fold increase above background for the output. For output samples, a cutoff of 10-fold enrichment is indicated with the dotted line. Green lines represent viruses that tolerate insertions at nucleotide positions greater than 10-fold above background. Red lines indicate viruses that are enriched less than 10-fold. Asterisks on A and B indicate values greater than the y-axis limit; actual values are denoted on the panel.

genome size is not a limiting factor, antigenic variability can scaffolds, which can perform a core set of functions while main- simply be encoded as is well characterized to occur with the taining a large evolutionary potential. This potential must exist antigenic pilin of some Neisseria species (23). In the case of the to rapidly adapt to antibody pressure and antagonize antiviral large DNA poxvirus , the viral genome can literally signaling in a changing host environment, respectively. NS1 has expand to increase the adaptive potential in response to a strong been described to interact with a multitude of host proteins (14, selective pressure (24). Viruses with small RNA genomes, like 25), which puts it in the center of genetic conflicts across a range IAV, cannot use similar mechanisms, and thus face an inherent of hosts. We mapped the predicted NS1 insertions on the crystal dilemma; they must encode multifunctional and highly optimized structure of NS1 and found that six of seven sites are surface- proteins to accomplish the conquest of host cells, but must also exposed (PDB ID code 3F5T) (26) (Fig. S4), and this is consis- remain flexible to escape the host response. tent with the idea of flexible viral domains interacting with host The model that we propose here implies that the strategy of proteins. Our model suggests that the virus is inherently pre- IAV is to evolve certain proteins or protein domains to be in- disposed to rapidly acquire mutations in certain proteins but herently plastic and mutable; this allows for the “viable and fit” not in others. pool of viral quasispecies generated via random replication error This framework may have significant implications for trans- to be disproportionately increased rapidly in certain viral pro- lational medicine. For example, an area of major current work in teins due to their much higher inherent tolerances (Fig. S3). The the IAV field is the development of cross-reactive , HA head domain and NS1 may have evolved to exist as flexible which target the HA2 conserved stalk domain region of the IAV

20250 | www.pnas.org/cgi/doi/10.1073/pnas.1320524110 Heaton et al. Downloaded by guest on September 30, 2021 Table 1. Verified transposon insertions sites of viruses recoverable by plaque purification Nucleotide preceding Amino acid Translation of Segment Gene insertion preceding insertion insertion 1 tnemgeS 1 2BP 182 48 GAAAAG 2 tnemgeS 2 1BP 116 591 TMAAAT 3 tnemgeS 3 AP 8702 486 GGAAAG Segment 4 HA 75 14 VRPQA HA 418 128 DAAAFE HA 439 135 NAAAPK HA 499 155 GAAAEG HA 703 223 SAAANR HA 849 272 LRPHAL HA 1730 Stop codon N/A 5 tnemgeS 5 PN 143 89 RKAAAS 6 tnemgeS 6 AN 402 16 WHPRV 7 tnemgeS 7 2M/1M 957 442 )1M( GMAAAG 15 (M2) VRPQWG 2M 997 82 AIAAAA 2M 208 92 AAAAA 2M 278 35 RNRCG Segment 8 NS1 416 130 CGRIL 1SN 615 361 PLQPRL PEN/1SN 035 861 )1SN( GTRGC 10 (NEP) DAAAQD PEN/1SN 646 602 )1SN( NSAAAN 49 (NEP) MRPQVM

Insertional mutant viruses rescued in 293T cells were amplified in eggs and plaque-purified on MDCK cells. Purified viral RNAs were sequenced individually to determine the location of insertion. The “translation of insertion” column is a reflection of into which translational frame the insertion fell. The insertions for the HA and NS segments are shaded in light blue.

HA (27, 28). Our model, in addition to explaining why the stalk to be different compared with HA. For example, antibodies is less variable than the head, predicts that if stalk-reactive against NA are thought to play a less important role in the antibodies are elicited by vaccination, the virus has a lower neutralization of IAV (35, 36). Thus, these data illustrate that probability of easily escaping neutralization due to the inherently though NA certainly has the ability to drift, NS1 and HA contain less plastic stalk domain relative to the very flexible head, which exceptionally plastic domains whose flexibility may allow for tolerates mutations. a more rapid expansion of diversity or a larger potential range of The data presented in this study represent an artificial and diversity relative to proteins like NA. stringent test of protein mutability. Our analysis almost certainly In sum, our mutagenesis of the IAV genome agrees with the did not reveal all potential insertion sites in the virus. The viral accepted notion that viruses with small genomes are highly op- (NA) is known to tolerate changes to the length timized and encode little if any nonessential genetic material. By of the stalk region (29–31). Although our deep-sequencing virtue of this fact, there is very little space that tolerates in- fi

analysis failed to recover highly t insertional mutants in this sertional mutations. However, our analysis has added that de- MICROBIOLOGY region, the one rescued mutant virus was in fact in the NA stalk spite this optimized genome, there are two major regions, the domain (Table 1), which is in general agreement with previous HA head and NS1, that exist as highly plastic and mutable studies. Further, many sites in the genome that are known to domains and are evolved to provide a neutral platform on which tolerate point mutations were not able to tolerate insertions, variability can be generated. This plasticity probably manifests a good example of which is the NA protein; despite being itself by influencing the speed and range of possible mutations. a surface glycoprotein that drifts over time (32), our analysis Finally, it is tempting to speculate that the ability of these pro- found it to be intolerant of many insertions. Although this teins to tolerate mutations may explain why IAV is observed to finding was initially unexpected, it is well known that there is less have such a rapid drift rate and large host range relative to other sequence diversity relative to HA (i.e., NA drifts slower than RNA viruses. We suspect that viruses such as virus, HA) (33, 34). Further, the selective pressures on NA are thought virus, and poliovirus may encode inherently less-flexible

Fig. 3. The HA head domain tolerates insertional mutations. (A) The HA library rescued under non- competitive conditions in MDCK cells (red) and un- der competitive conditions in embryonated chicken eggs (blue). The nucleotide position of the HA seg- ment is indicated along the x axis. The HA1 (head) and HA2 (stalk) domains are denoted above the graph. (B) Venn diagram showing the overlap of the two rescue conditions from A.

Heaton et al. PNAS | December 10, 2013 | vol. 110 | no. 50 | 20251 Downloaded by guest on September 30, 2021 Fig. 4. Locations of transposon insertions in the HA head. (A) Diagram of the HA segment showing the locations of transposons (red and green triangles) recovered from both the MDCK and egg rescues as well as HA mutants verified by plaque purification. All insertional mutants in the HA1 domain were also denoted on the primary sequence of the HA1 do- main (black triangles). Escape mutation locations defining antigenic sites are indicated by the colored boxes. Insertion locations from A are shown as red or green spheres on the crystal structure (PDB ID code 1RU7) of the HA monomer (B) and trimer (C). B Inset is an enlargement of the HA head domain. (D) Individual viruses (denoted by green in A–C) were cloned and rescued. MDCK cells were infected at an MOI of 0.001, and samples were plaqued at the in- dicated times.

scaffolds relative to IAV, which limits their drift rate or evolu- (Invitrogen). The purified DNA was then digested with SapI to release the tionary potential; this may contribute to why single vaccination viral segment (containing the transposon). After gel extraction, the segment against these viruses leads to lifelong protection and why they was ligated into the viral RNA expression plasmid pPol1, which was linear- exist as human-restricted pathogens. Although this theory remains ized with SapI via overnight 16 °C incubation with T4 DNA ligase (NEB). The to be tested, if true, this type of genome-wide mutagenesis may ligation was precipitated with sodium acetate and ethanol, resuspended in water, and again introduced into XL10-Gold E. coli and plated on ten 15-cm help predict and/or validate optimal vaccine epitopes for viruses ∼ such as IAV, which would be predisposed to evolve or mutate LB plates containing ampicillin and kanamycin. After 36 h of growth at 30 °C, bacteria were scraped and pooled, and DNA was isolated via MaxiPrep. Puri- very slowly. fied DNA was then digested with NotI to remove the transposon. After gel purification, the linear DNA vector was ligated back together via overnight Materials and Methods incubation at 16 °C with T4 DNA ligase. The ligation was again precipitated Cells. The 293T and MDCK cells were propagated in DMEM supplemented with sodium acetate and ethanol, resuspended in water, and introduced into with 10% (vol/vol) FBS, penicillin/streptomycin, and sodium bicarbonate. XL10-Gold E. coli and plated on ten 15-cm LB plates containing kanamycin (because the transposon containing KanR is now removed). Bacteria were Mutant Library Generation and Rescue of Viruses. To generate insertional scraped and pooled, and DNA was purified via MaxiPrep. The purified DNA mutants of all viral segments, the entire viral segment was amplified from from this final step represented the viral segment library of 15-nt insertions. viral rescue plasmids with the following primers: CATCGCTCTTCTGCCAGC- Virus was rescued via standard protocols (37, 38), with a few modifications. AAAAGCAGG (forward) or CATCGCTCTTCTGCCAGCGAAAGCAGG, depending The mutant library for a given segment was introduced into 293T cells along on the segment, and GGCCGGCTCTTCTATTAGTAGAAACAAGG (reverse). Viruses with the seven other WT rescue plasmids. Protein expression plasmids were subcloned into the pGEM-T shuttling vector flanked by SapI sites encoded expressing WT proteins from the influenza strain A/WSN/33 were also in- in primers. The sequences of the viral segments corresponded to GenBank troduced when appropriate. Multiple independent rescues were performed accession nos.: AF389115, AF389116, AF389117, AF389118, AF389119, AF389120, for each library, with between 6 and 12 × 106 cells transfected for each li- AF389121, and AF389122). The PB1 and PB2 segments contain SapI sites, so brary. At 24 h after transfection, the supernatant was collected, spun to we therefore made silent mutations to eliminate those sites. For PB1, we pellet any cells or debris, and used to infect a confluent 15-cm dish of MDCK used the QuikChange mutagenesis kit (Agilent) according to the manu- cells in the presence of tosyl phenylalanyl chloromethyl ketone (TPCK) facturers’ instructions with the following primers: CTGTTCCACCATTGAGGA- treated trypsin to allow for viral spread. Because the rescue contains WT GCTCAGACGGCAA (forward) and TTGCCGTCTGAGCTCCTCAATGGTGGAACAG protein, and all viruses in the initial infection may not reflect true viable (reverse). For PB2, we digested the WT pGEM-T clone with NcoI, which cuts mutants, we passaged virus to reduce potential background. At 24 h after in the vector and in the middle of segment. We amplified the WT 5′ end of the initial infection, supernatant was removed, spun to remove debris, and PB2 with the following primers: CGCATGCTCCCGGCCGCCATGGCCGC (for- used to infect a second 15-cm dish of MDCK cells again in the presence of ward) and AGCAATAATCAAGCTTTGATCAACAT (reverse). We then synthe- trypsin. In the case where viruses were rescued in eggs, the rescue from 293T sized and amplified a 456-nt region (with the SapI sites silently mutated) to cells was directly injected into 10-d-old embryonated chicken eggs and in- complete the sequence with the following primers: CAAAGCTTGATTATTG- cubated at 37 °C for 48 h. CTGCT (forward) and ATCCTCTTGTGAAAATACCATG (reverse). We then per- formed a three-piece recombination (InFusion HD; Clontech) with the two PCR RT-PCR and Next-Generation Sequencing. After 24-h incubation with the products and the rest of the segment in the digested pGEM vector to restore second passage of rescued virus, cells were washed with PBS, and RNA was the complete viral segment. isolated via TRIzol (Invitrogen) extraction per the manufacturer’s instructions All eight segments in pGEM-T were then subjected to in vitro transposon (6 mL per 150-cm dish). RNA was resuspended in RNase free water and mutagenesis using the Mutation Generation System (MGS F701; Fisher Sci- subjected to RT-PCR with the appropriate primer set (sequences described in entific) according to the manufacturer’s instructions. Two independent Table S1) via SuperScript III One-Step RT-PCR System with Platinum Taq as mutagenesis reactions were performed on 200 ng of target DNA each and per the manufacturer’s instructions. RT-PCR primers were designed to then pooled for transformation. Mutagenized DNA was introduced into overlap to ensure complete coverage of the segment. Amplified cDNA was XL10-Gold Ultracompetent Escherichia coli (Agilent), and then plated on ten gel purified, sheared via Corvaris sonication, and prepared for sequencing using 15-cm plates of LB agar supplemented with ampicillin and kanamycin to the TruSeq DNA Sample Preparation Kit (Illumina) as per the manufacturer’s select for the plasmid and the transposon. After ∼36 h of growth at 30 °C, instructions. Samples were barcoded, multiplexed, and sequenced on the bacteria were scraped and pooled, and DNA was extracted via MaxiPrep kit HiSeq2000 machine via 100-nt single-end run.

20252 | www.pnas.org/cgi/doi/10.1073/pnas.1320524110 Heaton et al. Downloaded by guest on September 30, 2021 Analysis of Sequencing Datasets. Raw sequencing data were first filtered for sequencing background noise. To set a cutoff for background noise in our reads containing the molecular tag sequence introduced by the transposon analysis, we ordered the data for each segment by number of reads at a given (TGCGGCCGCA). Positive reads were then trimmed to remove the transposon position. We averaged the number of reads for ordered positions 21–40 to and mapped against a reference genome using Bowtie. Mapping the end of generate a stringent threshold defining where real signal could be dis- the trimmed read identified the location of the insertion. Matching read criminated from background while not penalizing a dataset for having a few locations were aggregated and counted to give counts per potential insertion dominant viruses. This background threshold was used to normalize each location. All software was run on Mount Sinai’s high-performance com- output sequencing file against itself. puting cluster. In our output sequencing, large regions of the genome were absent of any Cloning and Characterization of Insertional Mutants. To rescue viruses with fl mapped transposon insertion sites; to ensure this was a re ection of biology transposon insertions in the positions identified by our deep-sequencing (i.e., no viable viruses with insertions in these areas) rather than lack of se- analysis, we amplified the HA segment in two pieces, encoding the trans- quencing coverage, we performed a depth of coverage analysis on all reads poson sequence in the 3′ end of the 5′ PCR and the 5′ end of the 3′ PCR with that mapped to the genome regardless of whether they contained the the primers indicated in Table S2. A three-piece recombination into the viral transposon. We limited our counting tool to 8,000 reads for a given se- expression plasmid pDZ was performed via InFusion HD kit (Clontech) as per quencing file, and we reached that threshold for most positions. In fact, our manufacturer’s instructions. Clones were sequence-validated and rescued in analysis revealed that all possible positions in all segments were covered by 293T cells with the seven other WT rescue plasmids. Rescued virus was pla- our deep sequencing (Fig. S5). We therefore have confidence that regions que purified and sequenced to ensure correct insertion of the transposon. that had no recoverable transposons were not a reflection of technical error. For growth-curve analysis, MDCK cells were infected at a multiplicity of in- As a control for potential PCR bias in our sequencing, we took the in- fection (MOI) of 0.001 and incubated in Opti-MEM (Invitrogen) with TPCK dividual reads from major hits from three segments and mapped the start site for each read containing the transposon sequence. Because our cDNA trypsin. At the indicated time points, samples were harvested and plaqued shearing was random, we expect to have a random distribution of read start via standard protocols (38, 39). sites containing a transposon in a given position. In contrast, if a hit is a re- flection of PCR bias during library preparation, we would expect only a few ACKNOWLEDGMENTS. We thank the Mount Sinai Genomics Core Facility and or a single clonal DNA fragment to contain the transposon. Our analysis Omar Jabado for their expertise with the Illumina sequencing; Ravi Sachida- nandam for helpful discussions; and Paul Leon for experimental assistance. revealed that the distribution was fairly even across DNA fragment start N.S.H. is a Merck Fellow of the Life Sciences Research Foundation. P.P. sites, indicating that PCR bias was probably minimal in our library prepara- acknowledges partial funding from Center for Research on Influenza Patho- tions (Fig. S6). genesis Grant HHSN266200700010C, National Institutes for Health Program Because we biased our RT-PCR to be extremely sensitive, we also amplified Project 1P01AI097092-01A1, and Program for Appropriate Technology in highly attenuated viruses and defective viral RNAs, which increased our Health (PATH).

1. Shaw ML, Palese P (2013) Orthomyxoviruses. Fields , eds Knipe DM, Howley PM 21. Rossman JS, Jing X, Leser GP, Lamb RA (2010) Influenza virus M2 protein mediates (Lippincott Williams & Wilkins, Philadelphia), pp 1151–1185. ESCRT-independent membrane scission. Cell 142(6):902–913. 2. Thompson WW, et al. (2004) Influenza-associated hospitalizations in the United 22. Janeway C, Travers P, Walport M (2001) Failures of Host Defense Mechanisms. Im- States. JAMA 292(11):1333–1340. munobiology: The Immune System in Health and Disease (Garland Science, New 3. Drake JW (1993) Rates of spontaneous mutation among RNA viruses. Proc Natl Acad York), 5th Ed, Chap 11. Sci USA 90(9):4171–4175. 23. Criss AK, Kline KA, Seifert HS (2005) The frequency and rate of pilin antigenic vari- 4. Fitch WM, Bush RM, Bender CA, Cox NJ (1997) Long term trends in the evolution of ation in Neisseria gonorrhoeae. Mol Microbiol 58(2):510–519. H(3) HA1 human influenza type A. Proc Natl Acad Sci USA 94(15):7712–7718. 24. Elde NC, et al. (2012) Poxviruses deploy genomic accordions to adapt rapidly against 5. Plotkin JB, Dushoff J, Levin SA (2002) Hemagglutinin sequence clusters and the an- host antiviral defenses. Cell 150(4):831–841. tigenic evolution of influenza A virus. Proc Natl Acad Sci USA 99(9):6263–6268. 25. de Chassey B, et al. (2013) The interactomes of influenza virus NS1 and NS2 proteins 6. Shu LL, Bean WJ, Webster RG (1993) Analysis of the evolution and variation of the identify new host factors and provide insights for ADAR1 playing a supportive role in human influenza A virus nucleoprotein gene from 1933 to 1990. JVirol67(5):2723–2729. virus replication. PLoS Pathog 9(7):e1003440. 7. Sevilla-Reyes EE, Chavaro-Pérez DA, Piten-Isidro E, Gutiérrez-González LH, Santos- 26. Bornholdt ZA, Prasad BV (2008) X-ray structure of NS1 from a highly pathogenic H5N1 fl – Mendoza T (2013) Protein clustering and RNA phylogenetic reconstruction of the in uenza virus. Nature 456(7224):985 988. fl influenza a virus NS1 protein allow an update in classification and identification of 27. Krammer F, Palese P (2013) In uenza virus hemagglutinin stalk-based antibodies and – motif conservation. PLoS ONE 8(5):e63098. vaccines. Curr Opin Virol 3(5):521 530. 8. Arumugaswami V, et al. (2008) High-resolution functional profiling of 28. Subbarao K, Matsuoka Y (2013) The prospects and challenges of universal vaccines for influenza. Trends Microbiol 21(7):350–358. virus genome. PLoS Pathog 4(10):e1000182. 29. Luo G, Chung J, Palese P (1993) Alterations of the stalk of the influenza virus neur- MICROBIOLOGY 9. Beitzel BF, Bakken RR, Smith JM, Schmaljohn CS (2010) High-resolution functional aminidase: Deletions and insertions. Virus Res 29(3):321. mapping of the Venezuelan equine encephalitis virus genome by insertional muta- 30. Li J, Zu Dohna H, Cardona CJ, Miller J, Carpenter TE (2011) Emergence and genetic genesis and massively parallel sequencing. PLoS Pathog 6(10):e1001146. variation of neuraminidase stalk deletions in avian influenza viruses. PLoS ONE 6(2): 10. dos Reis M, Tamuri AU, Hay AJ, Goldstein RA (2011) Charting the host adaptation of e14722. influenza viruses. Mol Biol Evol 28(6):1755–1767. 31. Castrucci MR, Kawaoka Y (1993) Biologic importance of neuraminidase stalk length in 11. Taubenberger JK, Kash JC (2010) Influenza virus evolution, host adaptation, and influenza A virus. J Virol 67(2):759–764. pandemic formation. Cell Host Microbe 7(6):440–451. 32. Xu X, Cox NJ, Bender CA, Regnery HL, Shaw MW (1996) Genetic variation in neur- 12. Blackburne BP, Hay AJ, Goldstein RA (2008) Changing selective pressure during an- aminidase genes of influenza A (H3N2) viruses. Virology 224(1):175–183. tigenic changes in human influenza H3. PLoS Pathog 4(5):e1000058. 33. Kilbourne ED, Johansson BE, Grajower B (1990) Independent and disparate evolution 13. Wiley DC, Skehel JJ (1987) The structure and function of the hemagglutinin mem- in nature of influenza A virus hemagglutinin and neuraminidase . Proc fl – brane glycoprotein of in uenza virus. Annu Rev Biochem 56:365 394. Natl Acad Sci USA 87(2):786–790. 14. Hale BG, Randall RE, Ortín J, Jackson D (2008) The multifunctional NS1 protein of 34. Sandbulte MR, et al. (2011) Discordant antigenic drift of neuraminidase and hem- fl – in uenza A viruses. J Gen Virol 89(Pt 10):2359 2376. agglutinin in H1N1 and H3N2 influenza viruses. Proc Natl Acad Sci USA 108(51): fl 15. García-Sastre A, et al. (1998) In uenza A virus lacking the NS1 gene replicates in 20748–20753. fi – interferon-de cient systems. Virology 252(2):324 330. 35. Nayak B, et al. (2010) Contributions of the avian influenza virus HA, NA, and M2 fi 16. Haapa S, Taira S, Heikkinen E, Savilahti H (1999) An ef cient and accurate integration surface proteins to the induction of neutralizing antibodies and protective . of mini-Mu transposons in vitro: A general methodology for functional genetic J Virol 84(5):2408–2420. analysis and molecular biology applications. Nucleic Acids Res 27(13):2777–2784. 36. Marcelin G, Sandbulte MR, Webby RJ (2012) Contribution of antibody production 17. Dar A, et al. (2009) CpG oligodeoxynucleotides activate innate immune response that against neuraminidase to the protection afforded by influenza vaccines. Rev Med suppresses infectious bronchitis virus replication in chicken embryos. Avian Dis 53(2): Virol 22(4):267–279. 261–267. 37. Fodor E, et al. (1999) Rescue of influenza A virus from recombinant DNA. J Virol 18. Caton AJ, Brownlee GG, Yewdell JW, Gerhard W (1982) The antigenic structure of the 73(11):9679–9682. influenza virus A/PR/8/34 hemagglutinin (H1 subtype). Cell 31(2 Pt 1):417–427. 38. Hai R, et al. (2012) Influenza viruses expressing chimeric hemagglutinins: Globular 19. Gamblin SJ, et al. (2004) The structure and receptor binding properties of the 1918 head and stalk domains derived from different subtypes. J Virol 86(10):5774–5781. influenza hemagglutinin. Science 303(5665):1838–1842. 39. Steel J, et al. (2009) Live attenuated influenza viruses containing NS1 truncations as 20. Schnell JR, Chou JJ (2008) Structure and mechanism of the of vaccine candidates against H5N1 highly pathogenic avian influenza. J Virol 83(4): influenza A virus. Nature 451(7178):591–595. 1742–1753.

Heaton et al. PNAS | December 10, 2013 | vol. 110 | no. 50 | 20253 Downloaded by guest on September 30, 2021