promoters exhibit an unprecedented conservation among all

Karsten Suhre*†‡, Stéphane Audic*†, and Jean-Michel Claverie*§

*Information Génomique et Structurale, Centre National de la Recherche Scientifique, Institut de Biologie Structurale et Microbiologie, 13402 Marseille, France; and §University of Méditerranée, School of Medicine, 13385 Marseille, France

Communicated by James L. Van Etten, University of Nebraska, Lincoln, NE, August 22, 2005 (received for review May 12, 2005)

EVOLUTION

The initial analysis of the recently sequenced genome of Acanthamoeba polyphaga Mimivirus, the largest known double-stranded DNA virus, predicted a proteome of size and complexity more akin to small parasitic organisms than to other nucleocytoplasmic large DNA viruses and identified numerous functions never before described in a virus. It has been proposed that the Mimivirus lineage could have emerged before the individualization of cellular organisms from the three domains of life. An exhaustive in silico analysis of the noncoding moiety of all known viral genomes now uncovers the unprecedented perfect conservation of an AAAATTGA motif in close to 50% of the Mimivirus genes. This motif preferentially occurs in genes transcribed from the predicted leading strand and is associated with functions required early in the viral infectious cycle, such as transcription and protein translation. A comparison with the known promoter structures of unicellular eukaryotes, amoebal organisms in particular, strongly suggests that the AAAATTGA motif is the structural equivalent of the TATA box core promoter element. This element is specific to the Mimivirus lineage and may correspond to an ancestral promoter structure predating the radiation of the eukaryotic kingdoms. This unprecedented conservation of core promoter regions is another exceptional feature of Mimivirus that again raises the question of its evolutionary origin.

nucleocytoplasmic large DNA viruses | viral promoter

Freely available online through the PNAS open access option.
Abbreviations: NCLDV, nucleocytoplasmic large DNA virus; COG, Clusters of Orthologous Groups of proteins; URF, unidentified reading frame.
†K.S. and S.A. contributed equally to this work.
‡To whom correspondence should be addressed. E-mail: [email protected].
© 2005 by The National Academy of Sciences of the USA

The recent discovery and genome analysis of Acanthamoeba polyphaga Mimivirus, the largest known double-stranded DNA virus, challenged much of the accepted dogma regarding viruses (1, 2). Its particle size (>400 nm), genome length (1.2 million bp), and huge gene repertoire (911 protein-coding genes) all contribute to blur the established boundaries between viruses and the smallest parasitic cellular organisms. Phylogenetic analysis placed the evolutionary origin of Mimivirus before the emergence of the extant eukaryotic kingdoms, raising the possibility that large DNA viruses might define a domain distinct from the three other domains of life, Eucarya, Archaea, and Bacteria (2). The exceptionally large gene content of the Mimivirus genome, which includes key protein translation genes as well as a very diverse set of enzymes belonging to different metabolic pathways, is consistent with the hypothesis that Mimivirus (and other large DNA viruses) evolved from a free-living ancestor through a genome-reduction process akin to the one experienced by parasitic intracellular bacteria (3–5). However, less radical scenarios have been proposed, such as the recruitment of numerous host-acquired genes complementing a set of core genes common to all nucleocytoplasmic large DNA viruses (NCLDVs) (6). In the latter case, the acquired ORFs would have to be transferred with their own promoter or be put under the control of a suitable viral promoter to function properly. These questions about the origin of the Mimivirus genes prompted us to systematically analyze the DNA sequences immediately 5′ upstream of each of the Mimivirus ORFs in search of transcriptional motifs. Unexpectedly, this analysis revealed the presence of the perfectly conserved octamer AAAATTGA in the putative core promoter regions of nearly half of all Mimivirus genes. The same analysis, applied to the genomes of other large DNA viruses, confirmed that such a homogeneity of the core promoter sequences is a unique characteristic of Mimivirus.

Materials and Methods

The Mimivirus genome sequence and the corresponding gene annotations used here are identical to those deposited in GenBank under accession no. AY653733. The annotation of the Mimivirus genome sequence defines 911 predicted protein-coding genes plus an additional set of 347 less convincing unidentified reading frames (URFs). Intergenic regions have an average size of 157 nt, with a standard deviation of 113 nt (excluding the two-tailed 5% most extreme data points). Starting from the predicted gene map, we searched for conserved motifs within the 150-nt regions upstream of each of the predicted start codons (ATG) of the 911 genes. These subsequences were identified (i) as statistically overrepresented short oligomers (by using in-house PERL scripts that are available from the authors on request) or (ii) by using the Gibbs sampler algorithm implemented in MEME (7).

The genome data for Invertebrate iridescent virus 6 (NC_003038), Fowlpox (NC_002188), and Amsacta moorei entomopoxvirus (NC_002520) were downloaded from RefSeq and analyzed by using an identical approach.

To search for a potential occurrence of related motifs in the Acanthamoeba castellanii genome, we used the available data from a recent genomic survey sequencing project (≈19 megabases of sequence data are available at www.ncbi.nlm.nih.gov).

Results and Discussion

A Highly Conserved Motif in the 5′ Upstream Regions of Mimivirus Genes. In contrast to the common difficulty of extracting well conserved signals from eukaryotic promoters (8), our results were unexpectedly clear cut: We found that 403 of the 911 (45%) predicted Mimivirus genes exhibited a strictly conserved AAAATTGA motif within their 150-nt upstream region. The statistical significance of this finding in the context of the exact composition of the Mimivirus genome was assessed as follows. We cut the Mimivirus genome into 15,752 consecutive 150-nt-long segments (using both strands) and counted the occurrences of the above motif in each of them. Only 661 (4%) of these segments were found to contain the AAAATTGA motif at least once. Such a strong preferential occurrence of this motif in the 5′ upstream region of Mimivirus genes is indeed highly significant (Fisher exact test, P < 10^-194). Overall, 60% of the occurrences of this motif are located in front of a gene. Moreover, the distribution of the motif within these 5′ upstream regions is nonrandom, with most of the motifs being located in

a narrow range, between 50 and 110 nt from the translation start (Fig. 1). A sequence logo (9) including the flanking regions of these AAAATTGA motifs shows that they are preceded by two to three AT-rich positions and followed by eight to nine AT-rich positions (Fig. 2).

Fig. 1. The distribution of the position of the AAAATTGA motif with respect to the predicted gene start shows the location of the conserved element at ≈50–110 nt.

Fig. 2. The sequence logo shows the conserved AT-rich neighborhood of the exactly conserved AAAATTGA octamer. The logo was based on 400 genes with a strictly conserved AAAATTGA motif; 3 genes with a motif that is <20 nt from the predicted translation start were not included in the computation of the logo. (See Fig. 5 for the corresponding logo of the MEME motif.)

www.pnas.org/cgi/doi/10.1073/pnas.0506465102 PNAS | October 11, 2005 | vol. 102 | no. 41 | 14689–14693

The same set of 150-nt upstream regions was also analyzed by using the more sophisticated Gibbs sampler approach, as implemented in MEME (7). A position-weight matrix (PWM) that corresponds to a motif very similar to the previous AAAATTGA was identified (MEME motif; Fig. 5, which is published as supporting information on the PNAS web site). Using a score cutoff of 1,000 (based on the bimodal distribution of the PWM scores; Fig. 6, which is published as supporting information on the PNAS web site), 446 genes (49%) with a motif were detected. For comparison, only 464 (3%) of the above 15,752 consecutive genomic segments exhibit the MEME motif (score ≥ 1,000). Most of the occurrences are thus found upstream of a gene-coding region (Fisher exact test, P < 10^-280). We note that no other predominant motif was found in the upstream regions of genes not exhibiting the AAAATTGA motif. Finally, we verified that the prevalence of the AAAATTGA motif in the 150-nt segment upstream of the ORFs was not a mere statistical consequence of their high A + T content. When these sequences were randomized, the AAAATTGA octamer was found only 22.7 times (compared with 403), and the MEME motif was found 7.2 times (compared with 446) on average over 100 repetitions of the randomization protocol.

The Motif Is Less Frequent in the More Hypothetical URFs. The same analysis was performed with the 347 unlikely genes, annotated as URFs, in the Mimivirus genome. Only 20 (6%) of them exhibited the AAAATTGA motif in their 150-nt upstream region. Moreover, many of these motifs did not fall within the [-110, -50] range or were not flanked by AT-rich positions. Consistently, only 11 (3%) of the URFs exhibited the previously defined MEME motif (score > 1,000). This motif distribution is not significantly different from the one observed in the 15,752 consecutive 150-nt-long segments covering the whole genome.

The Presence/Absence of the Motif Does Not Correlate with the Intergenic Distance. We verified that the occurrences of the AAAATTGA motif were not trivially linked to the distance between successive ORFs. Fig. 3 shows that the proportion of genes exhibiting the motif is fairly constant across the whole range of intergenic distances, except for the smallest ones (<60 nt). It is worth noting that a fraction of these short distances might be artifactual, corresponding to cases where the proximal ATG does not coincide with the actual translation start. The net result of this artifact is a slight underestimate of the number of genes exhibiting the AAAATTGA motif.

Fig. 3. No relationship between the presence/absence of the AAAATTGA motif and the distance between two genes can be identified in the size distribution of Mimivirus intergenic regions. The exception is the virtual absence of the motif in intergenic regions that are too short to host it.

The Motif Significantly Correlates with Genes Transcribed from the Leading Strand. We then examined other possible correlations between the presence of the AAAATTGA motifs and the positions of the corresponding genes within the genome. Overall, the distribution of Mimivirus protein-coding genes does not exhibit any significant strand bias: 450 genes are found on the positive strand (R genes), and 461 are found on the negative strand (L genes; Fisher exact test, P = 0.8). The distribution of the AAAATTGA motif is similarly unbiased, with 196 occurrences in front of the R genes and 207 in front of the L genes. The MEME motif exhibits a similar distribution, with 217 in front of R genes and 229 in front of L genes (score > 1,000). There is thus no significant global strand preference for the occurrence of the upstream motif (Fisher exact test, P = 0.8). However, a previous analysis (2) identified a putative origin of replication (OR) of the genome near position 380,000 (between genes L294 and L295). On the basis of this prediction, one can distinguish a "leading" strand, with 578 genes transcribed away from the OR, and a "lagging" strand, from which 333 genes are transcribed. Interestingly, the AAAATTGA motif occurs significantly more frequently (Fisher exact test, P = 0.027) in front of the genes transcribed from the leading strand (281/578 = 48.6%) than in front of the genes transcribed from

Table 1. Number of "COGed" Mimivirus genes with or without a conserved AAAATTGA motif

With motif   Without motif   Total   COG function
7 (9)        23 (21)         30      DNA replication, recombination, and repair
0 (0)        9 (9)           9       Cell envelope biogenesis, outer membrane
3 (3)        5 (5)           8       transport and metabolism
5 (5)        11 (11)         16      Posttranslational modification, protein turnover, chaperones
1 (1)        2 (2)           3       Lipid metabolism
34 (38)      26 (22)         60      Function unknown or general function prediction only
2 (2)        1 (1)           3       Secondary metabolites biosynthesis, transport, and catabolism
5 (5)        0 (0)           5       Nucleotide transport and metabolism
7 (7)        2 (2)           9       Transcription
6 (5)        1 (2)           7       Translation, ribosomal structure, and biogenesis

Only COG classes that contain at least three Mimivirus genes are shown. Numbers corresponding to genes with or without a MEME motif are given in parentheses.
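As a reading aid for Table 1, the per-class proportion of genes carrying the exact octamer can be compared with the genome-wide proportion (403 of 911). The sketch below is only illustrative arithmetic on counts copied from the table; the dictionary layout and function names are hypothetical, and the one row whose class name is incomplete in this copy is left out.

```python
# Counts copied from Table 1: (genes with the exact octamer, total genes in class).
# The row whose class name is incomplete in this copy is omitted.
TABLE_1 = {
    "DNA replication, recombination, and repair": (7, 30),
    "Cell envelope biogenesis, outer membrane": (0, 9),
    "Posttranslational modification, protein turnover, chaperones": (5, 16),
    "Lipid metabolism": (1, 3),
    "Function unknown or general function prediction only": (34, 60),
    "Secondary metabolites biosynthesis, transport, and catabolism": (2, 3),
    "Nucleotide transport and metabolism": (5, 5),
    "Transcription": (7, 9),
    "Translation, ribosomal structure, and biogenesis": (6, 7),
}

# Genome-wide proportion of genes with the octamer, as reported in the text.
GENOME_WIDE = 403 / 911


def motif_fraction(with_motif, total):
    """Proportion of genes in one class that carry the motif."""
    return with_motif / total


def classes_above_background(table, background=GENOME_WIDE):
    """Names of classes whose motif proportion exceeds the background."""
    return sorted(
        name
        for name, (with_motif, total) in table.items()
        if motif_fraction(with_motif, total) > background
    )


if __name__ == "__main__":
    for name, (with_motif, total) in TABLE_1.items():
        print(f"{motif_fraction(with_motif, total):5.0%}  {name}")
```

With so few genes per class, these proportions are descriptive only and do not by themselves establish significance.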

the lagging strand (122/333 = 36.6%). A similar asymmetry is found for the MEME motif (313/578 = 54.2% and 133/333 = 39.9%; Fisher exact test, P = 0.015).

We also examined whether the presence of the AAAATTGA motif might correlate with certain categories of gene functions (Table 1). One hundred fifty-seven Mimivirus genes can be associated with one of the Clusters of Orthologous Groups of proteins (COG) (10) functional classes. We found that most translation- and transcription-related genes exhibited the AAAATTGA motif, which was also true of genes related to nucleotide transport and metabolism. In contrast, the motif was absent from the upstream region of most genes related to DNA replication, recombination, and repair, as well as from genes classified in the cell envelope biogenesis/outer membrane COGs. Overall, the upstream motif does not occur more frequently in front of genes associated with functional annotations (88/232 = 38%) compared with anonymous ones (315/679 = 46%) (Table 2, which is published as supporting information on the PNAS web site).

NCLDV Core Genes and the AAAATTGA Motif. Iyer et al. (6) identified a set of homologous genes that are present in all or most members of the four main NCLDV groups: Iridoviridae, Asfarviridae, Phycodnaviridae, and Poxviridae. Some of these "core" genes are also found in phages. These core genes are divided into four classes, from the most conserved to the least conserved. Remarkably, we found that none of the nine class I core genes found in Mimivirus has the exact octamer motif in its 5′ upstream region, and only two exhibit the MEME motif (Table 3, which is published as supporting information on the PNAS web site). In contrast, all but one of the six class II core genes have the octamer motif and the MEME motif. The motif distribution for the class III (7/11) and class IV (10/16) core genes is more balanced. However, none of the above distributions significantly differs from the 446/911 ratio observed for all Mimivirus genes (Fisher exact test, P > 0.1).

The Presence of Such a Highly Conserved 5′ Upstream Motif Is Unique to Mimivirus. For many years, the 5′ upstream region of eukaryotic genes has been under intense scrutiny in many different organisms from different kingdoms (e.g., fungi and metazoans) in an attempt to decipher the sequence-based signal involved in the initiation and regulation of the transcription process (11, 12). The sole common result that emerged from these numerous studies is that sequence conservation is the exception rather than the rule in the 5′ upstream regions of eukaryotic genes (13, 14). Other recent studies have confirmed this lack of conservation in several lineages of parasitic protists (15). However, large eukaryotic DNA viruses have been much less studied than their cellular counterparts. The analysis applied to the Mimivirus genome was thus performed on the genomes of NCLDVs of the Iridoviridae and two other families to establish the pattern of sequence conservation in the 5′ upstream regions of their genes. Our results show that none of these other NCLDVs exhibits a pattern of conservation comparable to the one observed for Mimivirus (Table 4, which is published as supporting information on the PNAS web site). For instance, only 30 of 178 (17%) genes of Invertebrate iridescent virus 6 (CIV, Iridoviridae) have a conserved AAAATTGA motif within their 150-nt upstream region. Fourteen of 231 (6%) Fowlpox (FOP, Poxviridae) genes and 10 of 218 (4.5%) Amsacta moorei entomopoxvirus (AME, Poxviridae) genes also exhibit this motif. In all other NCLDVs, fewer or no occurrences of this motif were detected. Note, however, that 47 of the 218 (22%) AME genes display a conserved TTTTGAAA motif (Table 4). Finally, an analysis by Gibbs sampling of all viral genomes containing >100 annotated genes showed that the Mimivirus pattern of 5′ upstream motif conservation is truly unique among large DNA viruses. Such a conservation is also absent from archaeal viruses such as AFV1 (16). In the absence of experimental data, we cannot formally exclude that the intergenic AAAATTGA motif may act on both upstream and downstream adjacent genes. However, a symmetrical analysis of the 150-nt-long intergenic regions 3′ downstream of each gene identified only half (203 of 403) of the previously identified motifs and exhibited no preferential location for these motifs with respect to the preceding stop codon (Fig. 7, which is published as supporting information on the PNAS web site). These results are thus in favor of a 5′ polarity of function.

The Conserved AAAATTGA Motif Is Likely to Be a Main Core Promoter Element. Analyses of the structure and expression of a number of genes from amoebal protists have shown that they are expressed in single transcription units and that few of them have introns (15). Because the mechanisms of gene expression used by a virus and its host must be compatible, it is reasonable to propose that the short genome regions (157-nt average) separating two consecutive Mimivirus ORFs contain most of the promoter sequence information. The eukaryotic core promoter includes DNA elements that can extend 35 bp upstream and/or downstream of the transcription initiation site. Most core promoter elements appear to interact directly with components of the basal transcription machinery. In metazoans, the most conserved and recognizable core promoter element is the TATA box, located 25–30 bp from the transcription start site and present in approximately one-third of human genes (17). The average position (-60 from the predicted initiator ATG) of the conserved octamer found in

Fig. 4. The alignment of the intergenic region of the paralogous gene cluster L175–L185 shows the perfect conservation of the AAAATTGA motif within intergenic regions that have otherwise more extensively diverged. Note that the AAAATTGA motif is part of the C terminus of gene L185 (indicated by white Xs).

Mimivirus is consistent with it playing a role similar to the TATA box for the expression of the viral genes, provided the 5′ UTRs of viral mRNA are ≈30 nt long on average. Such a compact promoter/5′ UTR structure has been observed in amitochondriate protists such as Giardia intestinalis or Entamoeba histolytica (15). The sequences of the TATA box-like element of these protozoa are also different from the 5′-TATATAAG-3′ consensus identified in the other eukaryotic kingdoms. Accordingly, we propose that the AAAATTGA motif found in close to 50% of Mimivirus intergenic regions is the virus TATA box-like motif.

Mimivirus TATA Box-Like Motif Is Not Prevalent in Amoebal Organisms. Interestingly, the Mimivirus TATA box-like motif does not bear any particular resemblance to the different TATA box-like consensus sequences (if any) that have been identified in various protozoa. For instance, the E. histolytica TATA box-like consensus is TATTTAAA (15). Using the available data from a genomic survey sequencing of A. castellanii, we verified that the AAAATTGA motif is not particularly prevalent in the genome of the closest (partially sequenced) relative of the Mimivirus host A. polyphaga. In addition, no significant difference in codon usage between the sets of genes harboring or not harboring the AAAATTGA motif could be detected. Symmetrically, of the 29 Mimivirus genes most likely to have been acquired by lateral transfer (18), 15 exhibit the AAAATTGA motif, and 14 do not have it. These results argue against the hypothesis that this promoter element, together with the large proportion of associated Mimivirus genes, was simply acquired from its host. Interestingly, clusters of paralogous genes that have been produced by multiple rounds of duplications from an ancestral Mimivirus gene (19) conserved the AAAATTGA motifs amidst the divergence of their respective promoter regions. The example of the large gene cluster L175–L185 is shown in Fig. 4.

The Two Types of Mimivirus Promoters Might Correspond to Early Versus Late Functions. Homologues of the three main proteins involved in the formation of the transcription preinitiation complex have been identified in the Mimivirus genome: the two RNA polymerase II subunits (Rpb1 and Rpb2) and the TFIID (TATA box-binding) initiation factor. The corresponding amino acid sequences are extremely distant from their closest matches: Candida albicans Rpb1 (34% identical), Dictyostelium discoideum Rpb2 (36% identical), and Plasmodium falciparum TFIID (24% identical). We propose that Mimivirus's unique TATA box-like sequence AAAATTGA might have coevolved and might be recognized by a preinitiation complex including these divergent Mimivirus proteins. A second type of promoter that is highly degenerate might then be recognized by the host preinitiation complex or involve a combination of Mimivirus and host-encoded transcription factors. This hypothesis is consistent with the preferential occurrence of the AAAATTGA motif (type I promoter) in front of Mimivirus genes encoding functions required for the early (or late-early) phase of viral infection (transcription, nucleotide transport, and protein translation) (Table 1). According to this scenario, the corresponding genes could be transcribed in the host cytoplasm. Conversely, the AAAATTGA motif is preferentially absent from the promoter of Mimivirus genes encoding "late" functions such as DNA replication and particle biogenesis and assembly (Table 1). These genes could be transcribed in the host nucleus.

Conclusion

Our bioinformatics and comparative genomics study revealed a unique feature of Mimivirus among the eukaryotic domain: the presence of a highly conserved AAAATTGA motif in the immediate 5′ upstream region of close to 50% of its protein-encoding genes. By analogy with the known promoter structures of unicellular eukaryotes, amoebal organisms in particular, we propose that this motif corresponds to a TATA box-like core promoter element. This element, and its conservation, appears to be specific to the Mimivirus lineage and might correspond to an ancestral promoter structure predating the radiation of the eukaryotic kingdoms (2). Mimivirus genes exhibiting this type of promoter might be ancestral as well. Interestingly, this observation is true in the case of all translation apparatus-related genes initially identified in a virus (four aminoacyl tRNA synthetases, mRNA cap binding protein, translation factor eF-Tu, and tRNA methyltransferase) as well as in Mimivirus Topoisomerase 1A (bacterial type). However, it is also possible, but less likely, that horizontally acquired ORFs could have been inserted downstream of a preexisting AAAATTGA motif.

More genomic data on amoebal species, other protozoa, and their associated viruses are now needed to better understand the evolutionary scenario through which the unique promoter structure of Mimivirus genes might have emerged. Our findings can now be used to guide the experimental characterization of Mimivirus transcription units and identify the transcription factors involved in the recognition of its uniquely conserved promoter element.

This work was supported by the Centre National de la Recherche Scientifique and a grant from the French National Genopole Network.
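The upstream-region scan described in Materials and Methods (150 nt 5′ of each predicted ATG, searched for the exact octamer) and the randomization control can be sketched as follows. This is a minimal illustration, not the authors' PERL scripts: the function names, the 0-based half-open gene coordinates, the per-region base shuffling, and the toy sequence are all assumptions made for the example.

```python
import random

MOTIF = "AAAATTGA"
UPSTREAM_LEN = 150  # nt scanned 5' of each predicted ATG, as in the text

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def upstream_region(genome, start, end, strand, length=UPSTREAM_LEN):
    """Sequence 5' of a gene at 0-based half-open [start, end),
    returned in the orientation of the gene (assumed convention)."""
    if strand == "+":
        return genome[max(0, start - length):start]
    return reverse_complement(genome[end:end + length])


def motif_distances(region, motif=MOTIF):
    """Distance (nt) from the first base of each motif hit to the start codon."""
    distances = []
    pos = region.find(motif)
    while pos != -1:
        distances.append(len(region) - pos)
        pos = region.find(motif, pos + 1)
    return distances


def count_regions_with_motif(regions, motif=MOTIF):
    """Number of regions containing the motif at least once."""
    return sum(1 for region in regions if motif in region)


def mean_hits_after_shuffling(regions, repeats=100, seed=0, motif=MOTIF):
    """Average number of regions with the motif after shuffling the bases of
    each region, which preserves its composition (assumed randomization)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(repeats):
        shuffled = []
        for region in regions:
            bases = list(region)
            rng.shuffle(bases)
            shuffled.append("".join(bases))
        total += count_regions_with_motif(shuffled, motif)
    return total / repeats


if __name__ == "__main__":
    # Toy sequence (not Mimivirus data): motif placed 60 nt upstream of an ATG.
    toy_genome = "C" * 60 + MOTIF + "C" * 52 + "ATG" + "C" * 30
    region = upstream_region(toy_genome, 120, 153, "+")
    print(motif_distances(region))  # [60]
```

Measuring the distance from the first base of the motif is one possible convention; the paper reports positions relative to the predicted translation start without spelling out which base of the octamer is used.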

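The strand and enrichment comparisons in Results and Discussion rely on Fisher exact tests. A two-sided test on a 2 × 2 table can be sketched with the standard library alone, as below. The function name and the table layout (motif present/absent by leading/lagging strand) are assumptions; the paper does not state how its contingency tables were built, so the P value from this sketch need not reproduce the reported ones.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denominator = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    tolerance = 1 + 1e-9  # guards against floating-point ties
    p_value = sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * tolerance
    )
    return min(1.0, p_value)


if __name__ == "__main__":
    # Assumed layout: rows = leading / lagging strand,
    # columns = genes with / without the octamer (counts from the text).
    p = fisher_exact_two_sided(281, 578 - 281, 122, 333 - 122)
    print(f"P = {p:.2g}")
```

Using exact integer binomial coefficients avoids overflow for tables of this size; for much larger counts a log-space formulation would be the more usual choice.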
1. La Scola, B., Audic, S., Robert, C., Jungang, L., de Lamballerie, X., Drancourt, M., Birtles, R., Claverie, J.-M. & Raoult, D. (2003) Science 299, 2033.
2. Raoult, D., Audic, S., Robert, C., Abergel, C., Renesto, P., Ogata, H., La Scola, B., Suzan, M. & Claverie, J.-M. (2004) Science 306, 1344–1350.
3. Ogata, H., Audic, S., Renesto-Audiffren, P., Fournier, P. E., Barbe, V., Samson, D., Roux, V., Cossart, P., Weissenbach, J., Claverie, J.-M. & Raoult, D. (2001) Science 293, 2093–2098.
4. van Ham, R. C., Kamerbeek, J., Palacios, C., Rausell, C., Abascal, F., Bastolla, U., Fernandez, J. M., Jimenez, L., Postigo, M., Silva, F. J., et al. (2003) Proc. Natl. Acad. Sci. USA 100, 581–586.
5. Waters, E., Hohn, M. J., Ahel, I., Graham, D. E., Adams, M. D., Barnstead, M., Beeson, K. Y., Bibbs, L., Bolanos, R., Keller, M., et al. (2003) Proc. Natl. Acad. Sci. USA 100, 12984–12988.
6. Iyer, L. M., Aravind, L. & Koonin, E. V. (2001) J. Virol. 75, 11720–11734.
7. Bailey, T. L. & Gribskov, M. (1998) Bioinformatics 14, 48–54.
8. Fickett, J. W. & Hatzigeorgiou, A. G. (1997) Genome Res. 7, 861–878.
9. Schneider, T. D. & Stephens, R. M. (1990) Nucleic Acids Res. 18, 6097–6100.
10. Tatusov, R. L., Fedorova, N. D., Jackson, J. D., Jacobs, A. R., Kiryutin, B., Koonin, E. V., Krylov, D. M., Mazumder, R., Mekhedov, S. L., Nikolskaya, A. N., et al. (2003) BMC Bioinformatics 4, 41.
11. Schmid, C. D., Praz, V., Delorenzi, M., Perier, R. & Bucher, P. (2004) Nucleic Acids Res. 32, D82–D85.
12. Zhao, F., Xuan, Z., Liu, L. & Zhang, M. Q. (2005) Nucleic Acids Res. 33, D103–D107.
13. Smale, S. T. & Kadonaga, J. T. (2003) Annu. Rev. Biochem. 72, 449–479.
14. Davuluri, R. V., Grosse, I. & Zhang, M. Q. (2001) Nat. Genet. 29, 412–417.
15. Vanacova, S., Liston, D. R., Tachezy, J. & Johnson, P. J. (2003) Int. J. Parasitol. 33, 235–255.
16. Bettstetter, M., Peng, X., Garrett, R. A. & Prangishvili, D. (2003) Virology 315, 68–79.
17. Suzuki, Y., Tsunoda, T., Sese, J., Taira, H., Mizushima-Sugano, J., Hata, H., Ota, T., Isogai, T., Tanaka, T., Nakamura, Y., et al. (2001) Genome Res. 11, 677–684.
18. Ogata, H., Abergel, C., Raoult, D. & Claverie, J.-M. (2005) Science 308, 1114b.
19. Suhre, K. (2005) J. Virol., in press.
