
Viral reassortment as an information exchange between viral segments

Benjamin D. Greenbaum^{a,1}, Olive T. W. Li^{b}, Leo L. M. Poon^{b}, Arnold J. Levine^{a}, and Raul Rabadan^{c,d}

^{a}The Simons Center for Systems Biology, Institute for Advanced Study, Princeton, NJ 08540; ^{b}State Key Laboratory of Emerging Infectious Diseases and Centre of Research, School of Public Health, University of Hong Kong, Hong Kong; and ^{c}Center for Computational Biology and Bioinformatics and ^{d}Department of Biomedical Informatics, Columbia University College of Physicians and Surgeons, New York, NY 10032

Edited by Robert A. Lamb, Northwestern University, Evanston, IL, and approved January 10, 2012 (received for review August 17, 2011)

BIOPHYSICS AND COMPUTATIONAL BIOLOGY

Viruses have an extraordinary ability to diversify and evolve. For segmented viruses, reassortment can introduce drastic genomic and phenotypic changes by allowing a direct exchange of genetic material between coinfecting strains. For instance, multiple influenza pandemics were caused by reassortments of viruses typically found in separate hosts. What is unclear, however, are the underlying mechanisms driving these events and the level of intrinsic bias in the diversity of strains that emerge from coinfection. To address this problem, previous experiments looked for correlations between segments of strains that coinfect cells in vitro. Here, we present an information theory approach as the natural mathematical framework for this question. We study, for influenza and other segmented viruses, the extent to which a virus's segments can communicate strain information across an infection and among one another. Our approach goes beyond previous association studies and quantifies how much the diversity of emerging strains is altered by patterns in reassortment, whether biases are consistent across multiple strains and cell types, and if significant information is shared among more than two segments. We apply our approach to a new experiment that examines reassortment patterns between the 2009 H1N1 pandemic and seasonal H1N1 strains, contextualizing its segmental information sharing by comparison with previously reported strain reassortments. We find evolutionary patterns across classes of experiments and previously unobserved higher-level structures. Finally, we show how this approach can be combined with virulence potentials to assess pandemic threats.

systems biology ∣ emerging infectious disease

Author contributions: B.D.G., O.T.W.L., L.L.P., A.J.L., and R.R. designed research; B.D.G. and O.T.W.L. performed research; B.D.G., O.T.W.L., L.L.P., A.J.L., and R.R. analyzed data; and B.D.G., O.T.W.L., L.L.P., A.J.L., and R.R. wrote the paper.
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
Freely available online through the PNAS open access option.
^{1}To whom correspondence should be addressed. E-mail: [email protected].
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1113300109/-/DCSupplemental.

Reassortment of segmented viruses is a key mechanism for rapid novel virus creation. At least two human influenza pandemics in the last century were linked to lineages where some number of genomic segments reassorted with a virus of nonhuman origin (1, 2). This fact was reinforced by the emergence of the 2009 H1N1 pandemic (2009 pdm) virus (3–5). Novel reassortant strains can evade adaptive immunity by introducing antigens to a naïve host population or overly stimulate innate immunity by presenting a new host with abundant nonself molecular signals (6–10). Moreover, both sequence database studies and in vitro experiments have shown that genome reassortment between strains happens nonrandomly: If two strains coinfect the same cell, their progeny may not sample all possible strain/segment combinations uniformly (11–14). These analyses focused on whether it is more likely that pairs of segments from the same strain appear together in reassortments, typically using chi-square tests to establish significance.

Because influenza has eight segments, there are 256 possible reassortant viruses when a cell is coinfected by two strains. Each strain type and host cellular environment can influence reassortment differently, so it would seem impossible to predict whether a new pandemic strain can form. However, not every possible progeny combination may occur or survive. As we show, this problem can be reformulated using information theory, determining the information content of a segment's strain of origin distribution and the information shared among segments. Our approach uncovers new aspects of the process of reassortment, demonstrating significant combinations that occur commonly in several different crosses of viral strains. We show that strain and host cell-type influence outcomes. We include a novel experiment with the latest pandemic strain, 2009 pdm, and the seasonal H1N1 strain circulating prior to its introduction, expanding the number of reassortment examples analyzed to date and comparing this case to other analogous experiments within our framework.

Information theory is a general mathematical framework for quantifying the transmission and exchange of information (15, 16). Within this framework, we can separate multiple levels of information transfer and exchange within viral segment replication and reassortment. At the same time, this formalism allows us to show how these different levels constrain one another and to relate information theoretic quantities, such as entropy and mutual information, to the likely diversity of viral populations produced by a host coinfection. We further provide a nonparametric permutation test to assess the statistical significance of these quantities. We show which segments share meaningful amounts of information across all experiments, implying general segregation rules in influenza, and which segments only share significant information for particular strain pairings. Significantly, we quantify how much information they actually share, a key component in determining the diversity of progeny. Finally, we extend our method to reoviruses, a member of the Reoviridae family, which includes rotavirus, the leading cause of acute childhood diarrhea worldwide (17).

In a typical experiment, a relevant cell type is coinfected with two different strains, and the repertoire of progeny viruses is explored. These experiments separate intrinsic biases from those observed in circulating strains, in refs. 11 and 12, that may have additional causes. Suppose two strains are introduced to cells in culture at equal multiplicities of infection (MOI), a typical experimental scenario. MOI is defined as the ratio of infectious agents to host targets, so each strain, ideally, is equally likely to infect a target cell. After an experiment, the output probability that a segment comes from a given parental strain may no longer be the input value of one-half. We quantify this effect as the entropy change per segment between the input probability distribution that a segment came from a given strain versus the output distribution.

If bias exists toward how pairs of segments appear together in the output virus, such as may arise from packaging effects, this will be captured by the mutual information shared between those two segments. The entropy per segment constrains this quantity.
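
The pairwise bias described above can be made concrete with a short sketch. It is only an illustration under stated assumptions: the genotype calls are hypothetical, the function names (`entropy_bits`, `mutual_information`, `permutation_p_value`) are ours, and the permutation routine is a generic label-shuffling test that counts ties at half weight, which may differ in detail from the procedure described in SI Text. It estimates the mutual information between two segments' strain-of-origin labels as $H_m + H_n - H_{mn}$ and compares it with the smaller per-segment entropy.

```python
import math
import random
from collections import Counter


def entropy_bits(labels):
    """Shannon entropy, in bits, of the empirical distribution of labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def mutual_information(seg_m, seg_n):
    """MI = H_m + H_n - H_mn for paired strain-of-origin labels."""
    joint = list(zip(seg_m, seg_n))
    return entropy_bits(seg_m) + entropy_bits(seg_n) - entropy_bits(joint)


def permutation_p_value(seg_m, seg_n, n_perm=2000, seed=0):
    """Fraction of shuffles whose MI exceeds the observed MI (ties count 1/2)."""
    rng = random.Random(seed)          # fixed seed so the sketch is repeatable
    observed = mutual_information(seg_m, seg_n)
    shuffled = list(seg_n)
    hits = 0.0
    for _ in range(n_perm):
        rng.shuffle(shuffled)          # scramble one segment's labels
        mi = mutual_information(seg_m, shuffled)
        if mi > observed + 1e-12:
            hits += 1.0
        elif abs(mi - observed) <= 1e-12:
            hits += 0.5
    return hits / n_perm


# Hypothetical calls for 20 progeny: strain "A" or "B" for two segments.
seg2 = ["A"] * 10 + ["B"] * 10
seg3 = ["A"] * 9 + ["B"] * 10 + ["A"]  # agrees with seg2 in 18 of 20 progeny

mi = mutual_information(seg2, seg3)
bound = min(entropy_bits(seg2), entropy_bits(seg3))
p = permutation_p_value(seg2, seg3)
print(f"MI = {mi:.4f} bits, bound = {bound:.4f} bits, p = {p:.4f}")
```

In this toy data each segment is individually balanced, so its entropy is 1 bit, while the strong agreement between the two label lists yields a mutual information of roughly half a bit, below that bound.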

www.pnas.org/cgi/doi/10.1073/pnas.1113300109 ∣ PNAS Early Edition

The mutual information between segments never exceeds the minimum entropy per segment. For these quantities we have designed a nonparametric “channel scrambling” test for generating p values. Furthermore, we use the total correlation to capture structures of a higher order than pairwise, a feature not found in previous analyses. A hypothetical case is represented in Fig. 1, where segments 2, 3, and 7 have a significant total correlation. In this case the segments, taken individually, are equally likely to come from either strain of origin. Yet, if one segment has a given strain of origin, the other two segments will also come from the same strain.

By formalizing the mathematical analysis for this process and testing that analysis on both new and existing reassortment data, we may better understand reassortment outcomes and predict limitations on which viruses will emerge, resulting in better preparedness. That is the goal of this program. While a full exploration of the true set of reassortment biases requires a large-scale exploration of all combinations of infecting strains, infected cell type, and cell-type of origin, in a manner that faithfully reflects the likely backgrounds and cellular response states in which coinfecting strains could replicate, our approach makes the problem more quantitative and informative. In doing so, we demonstrate both how this method can be used in future experiments to assess pandemic risk and how it can uncover fundamental limits on the ability to communicate strain information between viral segments.

Results

Entropy Change per Segment in Coinfection Experiments. In the classic coinfection experiment, first outlined in Lubeck et al., two strains of equal MOI of 1–5 PFU (plaque forming units) per cell are introduced to MDCK cell culture (11). However, the total concentration of a given segment in the progeny viruses may be far from uniform. If two strains are introduced to these cells, and each segment has an initial probability of ½ for having come from a given strain, then each segment will have an initial entropy per segment of 1 bit. We assume that there are initially M strains introduced to a cell with an equal probability, although this does not need to be the case, and $p_n(s)$ is the probability that a given segment, n, from the output viruses came from strain, s. If the entropy of a segment, n, for output viruses is defined as

\[ H_n = -\sum_{s=1}^{M} p_n(s) \log_2\left(p_n(s)\right) \]

the entropy change for that segment will be

\[ \Delta H_n = \log_2(M) - H_n. \]

Typically, there are two equally probable coinfecting strains and the first term will then be equal to 1. The above quantity measures how much the output distribution deviates from uniformity. For a given segment, a value near zero would indicate that, in the output viruses, one is equally likely to see a segment come from either strain. If the value is close to 1, it indicates that this segment is dominantly from one of the two input strains in the output viruses. Hence, a change in entropy implies that one type of progeny virus is now more likely to appear than another, whereas previously that was not the case.

We analyze this quantity for an original experiment in which MDCK cells were coinfected at MOI of 1 PFU/cell of seasonal H1N1 (A/Hong Kong/226654/07) and 2009 pdm (A/California/4/09). MDCK cells were incubated with virus inoculums for 1 h at room temperature and then were briefly washed by acidic buffer to inactivate nonincorporated wild-type parental viruses. The virus was allowed to grow, and the supernatant was collected at 72 h post infection and used to perform standard plaque assays. Plaques were purified and their genotypes were identified by RT-PCR using segment specific primers for both strains as previously described (18). A detailed description of these procedures is available in SI Text, along with a full table of results.

The results of our original experiment were compared to two previous experiments in which MDCK cells were coinfected with influenza strains. MDCK cells are commonly used to measure infection of a cell by multiple possible strains of origin, given the ability of a wide variety of influenza subtypes to infect and grow productively in these cells (19). Because of this, they provide a standard context for comparison of intrinsic reassortment potential. The first comparison experiment, performed by Li et al., cotransfected seasonal human H3N2 and equine H7N7 expression plasmids into 293T cells (20). The data in this study were derived by reverse genetics techniques, and the viral repertoire studied was generated by cotransfection of viral segment-producing plasmids for the H3N2 and H7N7 strains as well as plasmids expressing the viral polymerase complex proteins for H1N1. The second is the aforementioned Lubeck experiment in which seasonal human H1N1 and H3N2 strains coinfect MDCK cells (11). It should be noted that these experiments used different experimental settings and time courses to generate recombinant viruses. Given that there may be different experimental biases in each of these settings, using data generated from different experimental platforms might help us to reduce the overall bias in our analysis and would make trends that are observed across experimental platforms all the more convincing. A comparison of the different approaches used by the published influenza experiments under analysis in this work is also provided in SI Text.

Several noteworthy features are presented in Table 1 and Fig. 2.
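
As a rough illustration of the entropy change per segment, the sketch below computes $\Delta H_n = \log_2(M) - H_n$ from a list of strain-of-origin calls for one segment. The calls and the helper names (`entropy_bits`, `entropy_change`) are hypothetical and are not taken from the experiments; equal input probabilities for the M strains are assumed, as in the text.

```python
import math
from collections import Counter


def entropy_bits(labels):
    """Shannon entropy, in bits, of the empirical distribution of labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def entropy_change(labels, n_strains=2):
    """Delta H_n = log2(M) - H_n, assuming M equally likely input strains."""
    return math.log2(n_strains) - entropy_bits(labels)


# Hypothetical genotype calls for one segment across 8 progeny plaques.
balanced_calls = ["A", "B"] * 4           # both strains equally represented
skewed_calls = ["A"] * 6 + ["B"] * 2      # 3:1 bias toward strain A
fixed_calls = ["A"] * 8                   # segment always from strain A

for name, calls in [("balanced", balanced_calls),
                    ("skewed", skewed_calls),
                    ("fixed", fixed_calls)]:
    print(f"{name}: entropy change = {entropy_change(calls):.4f} bits")
```

A balanced segment gives a change of zero, a segment fixed on one parental strain gives the full 1 bit, and intermediate biases fall in between.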
Entropy changes of greater than ½ for our experiment correspond to segments 4 (encoding HA) and 7 (M). In comparison, the experiment of Li et al., with two strains of very different species origin compared to the other two experiments, shows four segments with changes greater than 0.4, and that of Lubeck et al., with two human seasonal strains, shows no changes greater than 0.1515 bits. Comparative examination of these results shows that, in our case and in the case of Li et al., HA dominantly occurs from one strain, which also happens once for PB2 in the Li et al. experiment. Presumably these HA segments had an advantage over the other strain's as the reassortant viruses reinfected. It is also interesting to note that segments 6 (NA) and 8 (NS) show very little change in entropy across all three experiments, implying that one strain's version of these proteins is typically not favored. Segment 8, consisting of two nonstructural proteins (NS1/NS2) and one of the more conserved sequences in influenza, may not be expected to have divergent function across strains, implying that one strain's proteins have no real strain specific advantage.

Fig. 1. Representation of a reassortment structure. Two parental strains (indicated as red and blue) produce a set of eight reassortant viruses. One can interpret the relation between the segments as an exchange of information. Segments 2, 3, and 7 constitute the most statistically significant structure (Total Correlation = 2, p value = 8.2 × 10^−4), more significant than any pair correlation. Although its strain is uniformly distributed when examined on its own, whenever segment 2 is a given color, segments 3 and 7 are the same color.

Table 1. Entropy per segment for MDCK reassortment experiments

Segment   H1N1-2009 pdm (original)   H3N2-H7N7 (20)   H1N1-H3N2 (11)
1         0.0007                     1                0.1515
2         0.0721                     0.6466           0.0072
3         0.3192                     0.14             0
4         0.7817                     0.8777           0.0018
5         0.0034                     0.4564           0.0072
6         0.1887                     0.0519           0.0163
7         0.6665                     0.1758           0.0659
8         0.0523                     0.1088           0.1515

Mutual Information Among Replication-Biased Segments. Given that the entropy per segment is altered from a uniform probability, the opportunity for viral segments to communicate their strain to one another will become limited. For instance, in the case of the H3N2-H7N7 reassortment experiment of ref. 20, segment 1 has lost all entropy per segment because every progeny strain has segment 1 from the same parental strain. Hence, even if another segment would gain an advantage from pairing with a segment 1 from H7N7, such communication about strain type is not possible because segment 1 is always from H3N2. The ability of segment 1 to communicate information to another segment is closed.

In information theoretic terms, the mutual dependence between two random variables is quantified by the mutual information, applied to this problem as

\[ MI_{mn} = \sum_{s_i=1}^{M} \sum_{s_j=1}^{M} p_{mn}(s_i, s_j) \cdot \log_2\left( \frac{p_{mn}(s_i, s_j)}{p_m(s_i) \cdot p_n(s_j)} \right) = H_m + H_n - H_{mn} \]

where $p_{mn}(s_i, s_j)$ is the joint probability that segments m and n come from strains $s_i$ and $s_j$, respectively, and, therefore, $H_{mn}$ is the entropy for the two-segment joint probability. The mutual information and entropy per segment constrain one another via the inequality

\[ MI_{mn} \le \min\{H_m, H_n\} \]

due to the fact that the joint entropy is never less than the maximum individual entropy, as summarized in SI Text (15).

Two issues now present themselves. The first is to determine when a quantity of mutual information between two segments should be deemed significant, as opposed to when it could have been achieved randomly. We use a nonparametric test for associating a p value to the mutual information shared between segments. To obtain a p value, we randomly permute the order of the output strains for each segment, “scrambling” the message. We count the number of times, out of the number of permutations, that the mutual information of the permuted data is greater than the empirical value (following the convention of counting those on the boundary half of the time). Our procedure is further described in SI Text. Second, because there are eight segments in the virus, there are multiple pairs of segments that can share information. Multiple hypothesis correction must be taken into account. To focus on the clearest associations, we use Bonferroni corrections. There are 28 possible segment pairs, so we use a p value of 0.05/28 = 0.00179 for significance. The results of those segment combinations that pass the Bonferroni corrections, under 10^5 permutations, are listed in Table 2. In this table significant pairs are listed, along with their mutual information and p value. Also listed is the normalized mutual information: the mutual information divided by the maximum possible value given by the above inequality.

As is clear from Table 2 and Fig. 3, the most consistently significant pairing is between segments 2 (PB1) and 3 (PA). In each case this pair is significant, sharing between 0.1303 and 0.1912 bits of information per segment, with the more closely related strains, seasonal H1N1 and H3N2, typically sharing the most information. Each strain also has strain specific pairings, which may indicate an association that was significant to the biology of that particular strain combination but not to others. For instance, segments 3 and 8 show significant communication for 2009 pdm and seasonal H1N1, marginal association for H1N1 and H3N2, and no association for H3N2 and H7N7. The constancy of the 2–3 pairing across strains is important. In the original experiment on the subject, ref. 11, the entire polymerase complex was significantly associated, that is, segments 1, 2, and 3. However, it is clear from this work that if, in fact, polymerase segments pair preferentially by strain, only segments 2 and 3 pair as a general rule. In H3N2-H7N7, the H7N7 PB2 is completely dominant, while for H1N1-2009 pdm segments 1 and 2 make no significant associations.

Total Correlation For More Than Two Segments. Our approach can be generalized for correlations between higher order complexes, such as triples of segments and so on. However, there are two

drawbacks. First, there are many ways to extend the concept of mutual information to multivariate distributions (21). No one function captures all aspects of the two-dimensional mutual information for multivariate distributions. Second, looking for associations among greater combinations of segments can create more hypotheses to test, necessitating more experiments. For significant association among three segments, there are 56 possible hypotheses, and the Bonferroni correction would yield a p value of 0.05/56 = 0.000893.

[Fig. 2: bar chart. x axis: Segment (1–8); y axis: Entropy Change per Segment (0 to 1); series: H1N1-2009pdm, H3N2-H7N7, H1N1-H3N2.]

Fig. 2. Entropy change per segment in parallel reassortment experiments. In these three comparable reassortment experiments, the entropy change for each segment is shown for each of the eight viral segments. In all of these cases the initial entropy is 1 bit, the maximum possible value. Any entropy change is therefore an entropy loss.

Table 2. Mutual information for significant segment pairs for MDCK reassortants

Segments   MI       p value       Normalized MI
H1N1-2009 pdm
2–3        0.1694   0             0.2488
3–8        0.1536   0.0001        0.2256
H3N2-H7N7 (20)
2–3        0.1303   0             0.3689
7–8        0.0611   0.0012        0.0741
H1N1-H3N2 (11)
1–2        0.2985   2.4 × 10^−4   0.3518
2–3        0.1912   3.3 × 10^−3   0.1928

[Fig. 3: scatter plot titled "Mutual Information Between Segments vs Significance". x axis: −log(p-value) (0 to 4); y axis: Mutual Information Between Segments (0 to 0.35).]

Fig. 3. Mutual information between segments versus significance. The amount of mutual information shared between segments is plotted versus the negative logarithm of the associated p value. The p value is calculated by the permutation test described in the text. Those mutual information values that meet the Bonferroni criterion separate at the far right from the rest of the group. The colors correspond to the same strain crosses as in Fig. 2.

Here we focus on the total correlation (22). This measure captures the notion that the proper measure of dependence is the Kullback–Leibler divergence between the multivariate distribution and the independent distribution. Another commonly used quantity, the multivariate mutual information, is a natural generalization of the idea that the relevant quantity in an N-dimensional information measure is the difference in information between an N − 1 dimensional subset and the probability distribution of that subset conditioned on an additional variable. This quantity, although containing useful insights, can be difficult to interpret in ambiguous cases (23). Unlike the total correlation and two-dimensional mutual information, it can be negative and, as such, is ill-suited for our significance test because there is not a monotonic interpretation of the meaningfulness of this quantity. Both quantities reduce to the mutual information in the two-dimensional case.

The total correlation is the difference between the total entropy of all single variables and the entropy of the N variable joint probability distribution. It is nonnegative, and one can define a nonparametric “channel scrambling” test as in the previous section. For three segments, l, m, and n, the total correlation is defined as

\[ TC_{lmn} = H_l + H_m + H_n - H_{lmn}. \]

This quantity has attractive qualities in terms of convergence and decomposition (22). The equivalent of the aforementioned inequality for mutual information is that

\[ TC_{lmn} \le H_l + H_m + H_n - \max\{H_l, H_m, H_n\} \]

as derived in SI Text. Significant total correlations are listed in Table 3 and all total correlations are plotted against the negative logarithm of their p values in Fig. 4. They follow the same pattern as the mutual information: More closely related strains have higher total correlations in general. While there is no completely consistent pattern, in all but one case significant total correlations contain two segments from the polymerase complex. This suggests that other segments may group with these polymerase complex pairs preferentially rather than with their individual segments. This is particularly true for the H3N2-H7N7 pairing, where the segment 2–3 pair, itself significant, appears significantly associated with every other segment. The distance between these two strains, one equine and one human, may therefore enhance the need for other segments to associate with this pair in a strain-specific manner, because the two strains evolved in comparatively different host backgrounds.

Table 3. Significant total correlation values for three segment groupings

Segments   Total correlation   p value
H1N1-2009 pdm
2, 3, 8    0.3871              0.000015
2, 3, 7    0.2429              0.00063
3, 7, 8    0.2308              0.000855
H3N2-H7N7 (20)
2, 3, 6    0.1619              0.00001
2, 3, 5    0.1476              <10^−5
2, 3, 4    0.1384              0.000025
2, 3, 8    0.136               0.00011
2, 3, 7    0.1305              0.00024
1, 2, 3    0.1303              0.00001
5, 7, 8    0.1185              0.00066
H1N1-H3N2 (11)
1, 2, 3    0.5065              0.00002
1, 2, 6    0.4542              0.00008
1, 2, 8    0.3921              0.00033

[Fig. 4: scatter plot. x axis: −log(p-value) (0 to 4); y axis: Three Segment Total Correlations (0 to 0.5).]

Fig. 4. Total correlation values versus significance. The amount of total correlation between segment triplexes is plotted versus the negative logarithm of the associated p value. As in Fig. 3, the p value is calculated by permutation testing. Those total correlation values that meet the Bonferroni criterion separate at the far right from the rest of the group. The colors correspond to the same strain crosses as in Fig. 2.

Comparison to Other Experimental Settings and Segmented Viruses. The above experiments deal with the important setting of equiprobable strain reassortment in MDCK cells. We now examine three variants on the previous experiments. In Octaviani et al., reassortants between H5N1 and 2009 pdm were examined in MDCK cells (24). For some strains, even when one infectious unit is provided per target cell for an infection, many cells may not be coinfected with both viruses due to a particular strain's dominance. For example, in the pilot experiment of ref. 24 when H5N1 and 2009 pdm were recombined at equal MOI of 1 PFU/cell, the H5N1 was highly dominant, so the authors increased the MOI ratio to 5:1. Clearly this is a significant study, because a combination of H5N1's high case fatality rate with the quick transmission ability of 2009 pdm could cause a significant public health crisis (25, 26). Because the initial probability distribution for a segment coming from a given strain is no longer (1/2, 1/2), but is now (1/6, 5/6), the full entropy formula is used for the initial distribution. The initial entropy is now 0.6836 bits per segment, rather than the previous 1 bit per segment. Because this is no longer the maximum entropy, the possibility exists that the entropy can either increase or decrease over the course of the experiment. In Table S2, we see that in most cases the entropy increased. That is, output viruses were typically more random per segment than the input viruses. For segment 5 (NP) the trend was reversed and

for segment 1 it was minimal. In Varich et al., a human H1N1 virus (A/WSN/33) and an avian H4N6 virus (A/Duck/Czechoslovakia/56) coinfect chicken embryos at high, equal MOI of 5–10 PFU/cell (27). Both experiments again show the potential variability of the HA containing segment from the random distribution: in each case one particular HA is preferred. Likewise, we calculated the entropy change per segment for equal MOI of 5–10 PFU/cell reassortment experiments on two mammalian reovirus strains performed by Nibert et al. (28). Reoviruses are 10 segment dsRNA viruses, so their p-value cutoff is altered accordingly, showing the strongest associations between segments 3 and 7.

An interesting aspect of these variants comes when the mutual information is examined between influenza segments, as shown in Table S3. In the case of H5N1-2009 pdm, there are many unique associations that are not seen for other strains. However, some association between segments 2 and 3 continues to persist; it is just below our cutoff. For the H4N6-H1N1 chicken egg reassortment, the highest amount of mutual information in any experiment is recorded, between segments 3 and 5 and between segments 4 and 7. Although this suggests fundamentally different interactions in avian cells, other experiments are needed before drawing such conclusions.

Discussion

Quantifying reassortment bias is critical for understanding potential future strains and gaining insight into the environment in which segmented viruses operate. As we have shown, information theory provides a natural approach to interpreting and contextualizing experimental results across multiple settings. Our method is applicable to any segmented virus, or any system where genetic segments from multiple sources combine in progeny. From it one gains the insights that come from showing that this problem can be interpreted in an information theoretic context, biological information from experiments directly studied in this work, and a template for future studies that will assist practical and theoretical endeavors. We summarize these features below.

At the single segment level, the entropy per segment captures a change from the input probability of that segment coming from a particular strain. If input segments were all equally likely to have come from one of two strains, it would be expected that, for N output viruses, there are $2^N$ equally likely outputs for each segment. If the entropy, H, is reduced and one has a reliable estimate of its value, there will typically be $2^{HN}$ “equally likely” experimental outputs. For all eight segments the number of readouts will therefore change from

\[ 2^{8N} \rightarrow 2^{\left(\sum_{i=1}^{8} H_i\right) N}. \]

In these cases, one needs solid estimates, rather than just ascertaining significance. Because these quantities are biased estimators, to minimize error one wants to ensure, in such a planned experiment, that all strain probabilities, $p_i$, satisfy $N p_i \gg 1$ (29).

Further gain comes from separating the preceding step from the mutual information between segments and beyond to total correlations. Assuming the mutual information between segments 2 and 3 is significant across strains, then, in all cases, the contribution of those terms to the diversity of strains will further change from

\[ 2^{H_2 N} \, 2^{H_3 N} \rightarrow 2^{(H_2 + H_3 - MI_{23}) N} = 2^{H_{23} N}. \]

If one assumes that enough trials have been performed, as indicated by significance under our method, one can now get a handle on which reassortants are most probable. The ratio of the likelihood of one reassortant to another would give a gross sense of relative fitness (this would not be exact because the number of generations is unclear). If the mutual information is significant, then the probability that two segments came from a given strain is to be replaced by the joint probability, and, likewise, a significant total correlation would indicate replacement of the independent probabilities of, say, three segments, with their joint probabilities.

The power of these methods can be increased when combined with approaches such as that of Li et al. (30, 31). In this work, all 256 possible output viruses were examined for their virulence potential. Because our approach gives a better handle on which reassortants are most likely to be produced, one can examine the overlap between the set of most probable viruses and a viral phenotype. We could not find a strain cross that has been assessed in the literature for both reassortment and phenotype of all strain crosses. However, the methods employed here inform future experiments, where virulence could also be combined or substituted with transmissibility or another relevant phenotype.

If a likely reassortant also has a stronger than normal phenotype, then a higher priority can be placed on preventive measures, such as vaccine preparation, surveillance, or eradication of an infected bird or pig host. When such a case exists, that strain can also be examined more closely, such as by querying its immunostimulatory potential in relevant cell types. In this way, one can attempt to characterize potentially virulent, pandemic strains before they can enter a population. This is one of the real values of our approach. Only a rare reassortant may occur with the probability and virulence to cause a dangerous pandemic. With the proper experiments one could estimate this probability, identify any viral genomic alterations that could put a population at risk, and thus respond accordingly. If, through surveillance, two strains are found in the same location, with a suitable host for mixing these strains in the wild, we could estimate the diversity of reassortant strains prior to this mixing occurring. These predictions could well provide measured responses depending upon the threat level.

Biologically, these experiments open a window into the cellular environment in which influenza replicates. The largest mutual information was associated with accessory proteins to the viral polymerase: PB1 was significantly associated with PA across all similar experiments, but not consistently with PB2 as some had found. One possibility is related to the work of ref. 32, which indicates that a PB1-PA dimer is formed separately from a monomeric PB2 and assembled in the nucleus. Our results suggest that this pairing may well be the most consistently significant effect on viral progeny diversity across strains. However, it remains to be determined experimentally if this advantage comes from the favorability of dimerization or an optimization of one dimer over another for, say, more efficient polymerization. In addition to the association of segments 2 and 3 by strain, we also note that segments 6 and 8 seem to show indifference to their strain of origin. Moreover, segment 4, containing HA, is highly variable between experiments, ranging from random to almost complete dominance. In this case, because HA binds to sialic acid for viral entry, one would be inclined to attribute the change in progeny diversity to the fitness of one HA over another. Future reassortment experiments at a single cell level would make an interesting point of comparison.

Theoretically, if a sufficient quantity of strain combination experiments were studied, the maximum mutual information would give an indication of the channel capacity between strains, giving a bound on how much information can possibly be communicated between two segments in a noisy environment. If the rate of reassortment is pushed past this limit, it could disrupt viral segment communication, a possible novel defense strategy. As more consistent groupings are found, over many experiments, investigators can narrow in on these consistent sets to see whether the increase in likelihood of certain progeny being found over others in a given environment is due to factors such as the fitness of a reassortant, functional constraints on the proteins, or genome packaging sequence differences. Understanding how much these segments transfer information about their strain of origin, and to what ex-

Greenbaum et al. PNAS Early Edition ∣ 5of6 Downloaded by guest on October 2, 2021 tent this is possible, can ultimately lead to novel antiviral strate- helpful discussions and comments. We also thank Polly Mak for her technical gies. We provide a quantitative framework, along with specific support. R.R. is supported by the Northeast Biodefence Center (U54- examples of what can be discovered, giving a greater sense of AI057158), the National Institutes of Health (U54 CA121852-05), and the Na- tional Library of Medicine (1R01LM010140-01). The laboratory work was sup- how to assess preparedness and the limits of viral segment com- ported by the Area of Excellence Scheme of the University Grants Committee munication. Hong Kong (AoE/M-12/06) and the Research Fund for the Control of Infec- tious Disease Commissioned Project from Food and Health Bureau, Hong ACKNOWLEDGMENTS. We would like to thank Vladimir Trifonov, Hossein Kong. B.D.G. is the Eric and Wendy Schmidt Member in Biology at the Insti- Khianbanian, Edo Kussell, Yoshihiro Kawaoka, and Gabriele Neumann for tute for Advanced Study and would like to thank them for their support.
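As an illustration of the counting argument in the discussion (per-segment entropy $H_i$, mutual information $MI_{23}$, and the resulting exponent of the number of typical outcomes), the following is a minimal Python sketch. It uses simple plug-in estimators, which are an assumption here and not necessarily the estimators used in this work; the function names and the toy progeny data are hypothetical and are not taken from the experiments.

```python
import math
from collections import Counter


def entropy(labels):
    """Plug-in Shannon entropy, in bits, of a list of hashable labels."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mutual_information(xs, ys):
    """Plug-in mutual information in bits: MI = H(X) + H(Y) - H(X, Y)."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


# Hypothetical toy data (not from the paper): each progeny virus is recorded
# as the parental strain ("A" or "B") of segment 2 and of segment 3.
progeny = [("A", "A")] * 4 + [("B", "B")] * 4 + [("A", "B")] + [("B", "A")]
seg2 = [p[0] for p in progeny]
seg3 = [p[1] for p in progeny]

h2 = entropy(seg2)
h3 = entropy(seg3)
mi23 = mutual_information(seg2, seg3)
h23 = h2 + h3 - mi23  # joint entropy of segments 2 and 3

n = len(progeny)
# Exponents (base 2) of the number of typical outcomes for this segment pair:
log2_if_independent = (h2 + h3) * n  # 2^{H2 N} 2^{H3 N}
log2_with_mi = h23 * n               # 2^{(H2 + H3 - MI23) N}

print(f"H2={h2:.3f} H3={h3:.3f} MI23={mi23:.3f} H23={h23:.3f}")
print(f"log2 typical outcomes: {log2_if_independent:.2f} -> {log2_with_mi:.2f}")
```

In this toy set the rarest strain combinations have $N p_i = 1$, so the condition $N p_i \gg 1$ is not met and the plug-in values would be biased; a real analysis would need many more progeny per experiment.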

1. Taubenberger JK, Morens DM (2006) 1918 influenza: The mother of all pandemics. Rev Biomed 17:69–79.
2. Kilbourne ED (2006) Influenza pandemics of the twentieth century. Emerg Infect Dis 12:9–14.
3. Novel Swine-Origin Influenza A (H1N1) Virus Investigation Team (2009) Emergence of a novel swine-origin influenza A (H1N1) virus in humans. New Engl J Med 360:2605–2615.
4. Trifonov V, Khiabanian H, Rabadan R (2009) Geographic dependence, surveillance, and origins of the 2009 influenza A (H1N1) virus. New Engl J Med 361:115–119.
5. Smith GJ, et al. (2009) Origins and evolutionary genomics of the 2009 swine-origin H1N1 influenza A epidemic. Nature 459:1122–1125.
6. Maines TR, et al. (2008) Pathogenesis of emerging avian influenza viruses in mammals and the host innate immune response. Immunol Rev 225:68–84.
7. Kobasa D, et al. (2007) Aberrant innate immune response in lethal infection of macaques with the 1918 influenza virus. Nature 445:319–323.
8. Diebold SS, et al. (2004) Innate antiviral responses by means of TLR7-mediated recognition of single-stranded RNA. Science 303:1529–1531.
9. Greenbaum BD, Rabadan R, Levine A (2009) Patterns of oligonucleotide sequences in viral and host cell RNA identify mediators of the host innate immune system. PLoS One 4:e5969.
10. Jimenez-Baranda S, et al. (2011) Oligonucleotide motifs that disappear during the evolution of influenza in humans increase IFN-alpha secretion by plasmacytoid dendritic cells. J Virol 85:3893–3904.
11. Lubeck MD, Palese P, Schulman JL (1979) Nonrandom association of parental genes in influenza A virus recombinants. Virology 95:269–274.
12. Muramoto Y, et al. (2006) Hierarchy among viral RNA (vRNA) segments in their role in vRNA incorporation into influenza A virions. J Virol 80:2318–2325.
13. Rabadan R, Levine AJ, Krasnitz M (2008) Non-random reassortment in human influenza A viruses. Influenza Other Respi Viruses 2:9–22.
14. Khiabanian H, Trifonov V, Rabadan R (2009) Reassortment patterns in swine influenza viruses. PLoS One 4:e7366.
15. Cover TM, Thomas JA (1991) Elements of Information Theory (Wiley, New York).
16. Shannon C (1948) A Mathematical Theory of Communication. Bell Syst Tech J 27:379–423.
17. United Nations Children’s Fund/World Health Organization (2009) Diarrhoea: Why children are still dying and what can be done (United Nations Children’s Fund, New York; World Health Organization, Geneva).
18. Poon LL, et al. (2010) Rapid detection of reassortment of pandemic H1N1/2009 influenza virus. Clin Chem 56:1340–1344.
19. Gaush CR, Smith TF (1968) Replication and plaque assay of influenza virus in an established line of canine kidney cells. Appl Microbiol 16:588–594.
20. Li C, Hatta M, Watanabe S, Neumann G, Kawaoka Y (2008) Compatibility among polymerase subunit proteins is a restricting factor in reassortment between equine H7N7 and human H3N2 viruses. J Virol 82:11880–11888.
21. Han TS (1980) Multiple mutual informations and multiple interactions in frequency data. Inform Control 46:26–45.
22. Watanabe S (1960) Information theoretical analysis of multivariate correlation. IBM J Res Dev 4:66–82.
23. McGill WJ (1954) Multivariate information transmission. IEEE Trans Inf Theory 4:93–111.
24. Octaviani CP, Ozawa M, Yamada S, Goto H, Kawaoka Y (2010) High level of genetic compatibility between swine-origin H1N1 and highly pathogenic avian H5N1 influenza viruses. J Virol 84:10918–10922.
25. Li FCK, Choi BCK, Sly T, Pak AWP (2008) Finding the real case-fatality rate of H5N1 avian influenza. J Epidemiol Community Health 62:555–559.
26. Fiebig L, et al. (2011) Avian influenza A (H5N1) in humans: new insights from a line list of World Health Organization confirmed cases, September 2006 to August 2010. Euro Surveill 11:19941.
27. Varich NL, Gitelman AK, Shilov AA, Smirnov YA, Kaverin NV (2008) Deviation from the random distribution pattern of influenza A virus gene segments in reassortants produced under non-selective conditions. Arch Virol 153:1149–1154.
28. Nibert ML, Margraf RL, Coombs KM (1996) Nonrandom segregation of parental alleles in reovirus reassortants. J Virol 70:7295–7300.
29. Carlton AG (1969) On the bias of information estimates. Psychol Bull 71:108–109.
30. Li C, et al. (2010) Reassortment between avian H5N1 and human H3N2 influenza viruses creates hybrid viruses with substantial virulence. Proc Natl Acad Sci USA 107:4687–4692.
31. Sun Y, et al. (2011) High genetic compatibility and increased pathogenicity of reassortants derived from avian H9N2 and pandemic H1N1/2009 influenza viruses. Proc Natl Acad Sci USA 108:4164–4169.
32. Deng T, Sharps J, Fodor E, Brownlee GG (2005) In vitro assembly of PB2 with a PB1-PA dimer supports a new model of assembly of influenza A virus polymerase subunits into a functional trimeric complex. J Virol 79:8669–8674.

6 of 6 | www.pnas.org/cgi/doi/10.1073/pnas.1113300109 Greenbaum et al.