<<

Proc. NatL Acad. Sci. USA Vol. 78, No. 12, pp. 7639-7643, December 1981 Evolution

Sequence relationships among the hemagglutinin of 12 subtypes of influenza A virus (antigenic shift/antigenic drift/dideoxy ) GILLIAN M. AIR John Curtin School of Medical Research, Australian National University, P.O. Box 334, Canberra, A.C.T. 2601, Australia Communicated by Frank Fenner, June 3, 1981

ABSTRACT sequences of the 3' 20% of the hem- is ideal for making a cDNA copy and inserting this into plasmids agglutinin of32 influenza A virus strains from the 12 known for amplification in and nucleotide sequence hemagglutinin subtypes have been determined. Although the se- analysis. Thus, the sequences ofthe fowl plague virus HA gene quences ofhemagglutinin genes and ofdifferent subtypes (H7) (5) and some human strains, an Asian (H2) type (6) and differ greatly, cysteine and some other residues are several Hong Kong (H3) strains (7-9), have been determined. totally conserved, presumably reflecting evolution of the 12 dif- From the nucleotide sequences, amino acid sequences were ferent hemagglutinins from a single gene. When viruses of one predicted using the genetic code. subtype, isolated over a period of time, are compared, the hem- When the sequences ofthese three subtypes ofHA (H2, H3, agglutinin gene and sequences show a slow accumulation and are of nucleotide changes and some amino acid changes. Since se- H7) compared, at either the nucleotide or amino acid quence data from the genes coding for the matrix and nonstruc- level, they are remarkably dissimilar. The complete lack ofan- tural proteins also show an accumulation of changes with time, it tigenic crossreactivity among them is well characterized but, seems that antigenic selection (of the surface antigens) does not because substitution of a single amino acid can completely de- contribute significantly to the rate of change of influenza gene stroy interaction between a particular antigenic determinant sequences. Although the rate of nucleotide change during drift is and monoclonal antibody (10) and because there is a limit to the more than sufficient to account for the amino acid sequence dif- numberofindependent determinants on the hemagglutinin (11, ferences observed in the 12 subtypes, there is a clear distinction, 12), the degree of divergence in sequence is remarkable. by antigenic as well as sequence analyses, between viruses of one This paper describes nucleotide sequence data from the HA subtype (0-9% amino acid variation) and viruses ofother subtypes genes of32 virus strains. Most ofthe influenza sequences pub- (20-74% amino acid variation). No virus has yet been found that lished have been derived from cloned cDNA copies (5-9), but is intermediate between subtypes. there are possibilities of artifacts being introduced during the cloning procedure or cloning an unrepresentative molecule (5, Influenza remains an uncontrolled disease in man, largely be- 7). In this study, the sequences have been derived by dideoxy cause of variations in the amino acid sequences ofthe two sur- sequence analysis (13) of cDNA primed from the 3' end of the face antigens, hemagglutinin (HA) and neuraminidase. Year by viral RNA strand. The length of sequence obtained ranges up year, the influenza A virus becomes progressively more resis- to 380 and includes the 5'-noncoding region ofthe tant to neutralization by antibody made against older HAs as mRNA [except for the extra nucleotides derived from cell antigenic changes accumulate. In addition, every so often, a mRNA (14)], the signal peptide coding sequence, and the region "new" virus appears having such differences in the HA se- that codes for up to a third of the HA1 polypeptide. The infor- quence and antigenic properties that it is clearly not derived mation for each HA is far from complete but is a significant by from previously circulating viruses. The most re- proportion and of manageable size for detailed comparison. cent examples are the appearance of "Asian" flu in 1957 (H2 subtype), "Hong Kong" flu in 1968 (H3 subtype) and "Russian" flu in 1977 (H1 subtype). It is possible that the new HA is nor- MATERIALS AND METHODS mally derived from viruses circulating or sequestered in species Viruses were grown in embryonated eggs and purified by ad- other than man (1) and, for this reason, much work has been sorption to and elution from chicken erythrocytes followed by focused on characterizing influenza A viruses from birds and sucrose density-gradient centrifugation (15). animals. Currently, 12 subtypes ofHA are defined, designated Isolation of HA RNA and sequence analysis of the cDNA by HI-H12 (2), based on lack of serological crossreactivity. using as primer a synthetic dodecanucleotide complementary HA contains two polypeptide chains, HA1 and HA2, which to the 3' 12 nucleotides common to all segments of influenza are derived by proteolysis of a single precursor molecule that A viral RNA (16) (Collaborative Research, kindly given by C. includes loss of a signal peptide (3). Both polypeptides are gly- J. Lai) and the dideoxy method (13, 17) were as described (18, cosylated and contain hydrophobic regions (3, 4), making pro- 19). Sequences were obtained from HA genes of the following tein sequence analysis difficult. The amino acid sequence has viruses: Hi-A/NWS/33, A/PR/8/34 (Mt. Sinai), A/Bellamy/ been determined directly for HA1 and HA2 of only one strain 42, A/USSR/90/77, A/Fort Warren/i/50, A/Loyang/4/57, ofvirus, ofthe Hong Kong (H3) subtype (4) and, in spite ofmuch A/swine/Wisconsin/15/30, A/New Jersey/11/76, H2-A/RI/ effort, part of the hydrophobic COOH-terminal sequence of 5-/57, A/Tokyo/3/67, A/Netherlands/68, A/Berkeley/68, A/ HA2 could not be determined. duck/GDR/72, A/duck/Alberta/77/77, A/pintail/Alberta/ On the other hand, the segmented single-stranded genome 293/77; H3-A/Memphis/1/71, A/black duck/Australia/702/ 78, A/duck/Ukraine/1/63; H4-A/duck/Alberta/28/76; H5-A/ The publication costs ofthis article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertise- Abbreviations: HA, hemagglutinin; HA1, large HA polypeptide; HA2, ment" in accordance with 18 U. S. C. §1734 solely to indicate this fact. small HA polypeptide. 7639 Downloaded by guest on September 26, 2021 7640 Evolution: Air Proc. Natl. Acad. Sci. USA 78 (1981) shearwater/Australia/75; H6-A/shearwater/Australia/72; ent virus preparations, and each region of sequence was read H7-A/turkey/Oregon/71, A/equine/Prague/1/56; H8-A/ from several gels run for different times. The sequence gels of turkey/Ontario/6118/68; H9-A/turkey/Wisconsin/1/66; closely related strains were compared and differences were H10-A/duck/Manitoba/53; Hll-A/duck/England/56, A/ checked. It is, however, not possible to be certain that there duck/Ukraine/1/60, A/tern/Australia/75, A/duck/Mem- is no error in the >10,000 nucleotides sequenced. phis/546/76, A/duck/New York/12/78; H12-A/duck/Al- berta/60/76. The sequence data were stored and analyzed in a PDP-11 RESULTS computer using programs adapted from those ofStaden (20) and Fig. 1 shows the nucleotide and predicted amino acid sequences Gibbs and McIntyre (21). To enhance the accuracy ofthe data, obtained from HA genes of 12 viruses representing the 12 every experiment was repeated, often many times with differ- subtypes.

35 Hi (HOH01) MGCAAAAGCA GGGGAAAATA AAAACAACCA AA ATG AAG GCA AAC CTA CTG GTC CTG TTA TGT GCA CTT GCA GCT GCA GAT CCA GAC ACA ATA GG CAT ACG AAC I Met Lys Ala Asn Leau Leu Val Leu Leu Cys Ala Leu Ala Ala Ala Asp Ala Asp Thr 0le IIs 11 r His Ala Asn 46t RTA H2 (02) AGCAAAAGCA GGGGTTATAC CATCGACAAC CAAAAGCAAA ACA ATG GCC ATC ATT TAT CTC ATT CTC CTG TTC ACA GCA GTG AGA GCGCAC CAG AT GC. A? CAT CCC AMT II5 Ile Tyr Leu Ile Phe Thr Arg Gly Ile (11 1y yr Hos Ala Asn 31 I1- Leu Gln Ala Leau Ala Val Asp Cys (II 05 (HaoS( AGCAAAGCA GGGGTCTAAT CTATCAAA ATG GAGset AGA GTA GTCCT? CTT CTT GCA ATG ATC AGT CTT GTC MT CAC CM A ATT CG10 TA CAT GCA AA 106 Met Glu Arg Val Val Leu Leu Leu Ala Not I1. Sle Leu Val Lys Ser Asp Gln 11 Ile Tye His Ale Asn yG H11 Hao3 MAGCAAAAGCA GGGGAAATAT CTAGAATCA AA ATG 666 AAA GTA CTG CTT TT GCA GCA ATC ATC ATC TGT ATT CCA CCA CACATICM tAT1T lO T CTG AGC AAC 110 Set20rLys Lys Val Leu Leu Phe Ala Ala Ile IOl 11e Cys Ile Arg Ala1Asp Glu Ila Tye LCu STr Asn H6 (Hav6) AGCAAAAGCA GGGGAAA ATG ATT GCA ATC ATT GTA GTA GCG ATA CTG GCA MA GCC GGA AGG TCT GAC AAG ATC ATI ly~ CM CAT GCC AAC 105 set Ile Ala Ile Ile Val Val Ala Ile Leu Ala Thr Ala Gly SeraAsp Lys Ile Cys CII 22TArg 1l sys His Ala Aln II& H0 (Havg) AGCCMGCA GGGGTCACA ATG GAG AMA TC ATC GCA ATA GCA ATG CTC TTG GCG ACC ACA AAT GCA TAC GAS AGG AT ATT 11 T CAA TCA ACT 100 Glu Lys Phe Ile Met Thr Asn Tyr Arg Tyr Gon S r A n 36 set Ole Ala Ala Leu Leu Ala Sle Ala t4p Ole GO H9 (Olav9) AGCAAAGCA GGGGAACAAC ATAGCCAATC AMG ATG GAA ACA AM GCA ATA ATT GCT GCA CTG CTA ATG GTA MA GCA CCC AAT GCT CGAT AAA AT? 11 CC? CAA TCA ACA 117 Met Glu Thr Lys Ala 01. Ile Ala Ala Leu Leu Set Val Thr Ala Ala Asn Lys lie Gy2W Tor Gln Ser Thr 22AltsAlasAsp ys 012 (HaVIO) AGCAAAAGCA GCGGTCACA ATG GAA A TTCrc AT? ATT TTG AGT ACT GTC TG GCA GCA AGC TT T GTAC AM AT TAC CAG ACA AAC 100 set Glu Lys Phe Ile Ile Leu Ser Thr Val Leu Ala Al& See PAe Ala Tyrtp Lys 1le yG Leu yr Gln Thr Asn A ACT AT? H7 (Hvel) AGCAMGCA GGGGATACAA ATG AAT CAA TTG GTA TC ATT CCC TT GTG CT? AT? AAA CCT AAA GCA GAC AAA AT L u CCCAT GCT GTG LOS set Asn Thr Gln Ole Leu Val Phe Ile Ala Cys Val Leu O1l Lys Ala Lys ClytCsp Lys Ile Cys li "is His Ala Val H1O (Hav2) AGCAAAAGCA GGGGTCACA ASG TAC AAA ATA GTG CTr GTA CTC ACG CTC TIt GCA GCA GTG AAT GCC CT? CAC AMA A6A ATCAT GCA GTC 100 set Tyr Lys Ole Val Leu Val Leu Thr Leu Phe Cly Ala Val Asn Gly LeouAsp Lys Ile His His Ala Val 22T CTG IV H4 (Hav4) AGCAAAAGCA GGGGAAACA ATG CTA TCA ATC ACG AT? CTG TOt CTG CTC ATA GCA GAG GGC TCC TCT CAG AAT TAC ACA CGA AAT CCGTG ATA CAT CAT GCT CTA l lS Set Leu Ser Ile Thr Ole Leu Phe Leu Leu Ile Ala Glu Gly Ser Ser GCn Asn Tyr Thr Gly Asn Pro Val 11l His His Ala Val H3 (03) 17 CTG OC AGCAAAGCA GGGG ATG ATT CTA TTA ACC ATG AAG ACC AT? ATT GCT TTG AGC CAC ATT TC TGT CTG GTT CTC GGC CAA TAC CTt CCA GGA AAT OAC AMC AC ACA GCA ACG CT? CAT CAT GCA CTG 137 Met Ile Leu Leu Thr Met Lys Thr Ole Ile Ala Leu See His Ile Phe Cys Leu Val Leu Gly Gln Tyr Leau Pro Gly Asn Asp Asn Se Thr Ala Thr Le 1l His His Ala Val

116 I * lil AAT TCA ACC GAC ACT fTT GAC| ;,TA C-C GAn, AAG NtT GTG AC ACAI CAC TCT GTT AAC CTG CTC GCA GAC AMC CAC AM GGC AM CTJ M MA TTA AM CCA ATA GCC CCA CTA 233 q jq AspS Tr Val Asp Ir jal Lau Clu Lye ThrI Ifis Slr Val Asn Leu Leu Glu Asp Ser His Asn Gly Lys Leo cyl Arg Leau Lys Gly Ile Ala Pro Leu 0zl 112 AAT TCC ACA GAG AMG GTC GAC NTT CTA fGAf CGG AAbC GTC ACT ACT CAT GCC AAG GAC ATT CTT GAG AAG MC CAT AC GGA AMG TT? TCA AM CTA AMC GGA ATC CCT CCA CTT 23! * SrT Glu Lys Val Asp Ile Leu Glu Arg V he ThrI ((as Ala Lys Asp Ile Leu Glu Lys Thr His Asn Gly Lys Leo Cyl Lys Leu Asn Gly Ole Pro Pro Leu Va) 1.5 AAC TCC ACA GCA CAG GTT GAC | kTA ATG GAA A G NAT GTT ACA ACA CAT GCA CAM GAC ATA CTT GAA AAG ACA CAT ?AT GCG AAA CTCc TrA ACT CTG AAT GCA CTG AMG CCC TTA 226 4qm Th Glu Gln Val Asp |AC Ilc 'et GCu Lysesn Vol The a) Thr ((is Ala GCn Asp Ile Leu Glu Lys TSr His Asn Gly Lys Leo ka Cyla Slr Leu Asn Gly Val Lys Pro Leu lill MAC TCA AC GAG CJA GTG GAC| hrl INTA ATT GAG AGT AAT GTC ACG ACT AGC TCG GTT GAA CTG GTT GAC AAT CAGM ACTAC MT A TCA TCc TrAac TCA ATC GAT GCOG AMA GCA CCA ATA 230 4P"see GIlu Lys Val Asp Ile Ile Glu Ser Asn Val Thr TC ThrI Ser Ser Val Glu Leu Val Glu Asp Glu Tyr Thr Gly S-r Phb a Cyl Ser IIl Asp Gly Lys Ala Pro Ile H6 AAT TCA ACA ACA CAtA ATA GAf1 |hrl TA CIT GAG AAG MAT GTA ACT aI ACG CAC TCA GTT GAG TG CTG GCG AAC CAA AAM CAG CMAA AA TTCc TrAc AM ATC TIC AAA A60 GCC CCT CT? 21S 9 _SerTbr Thr Gln Ile Asp5 Ile Leu Glu Lys Asn Val Thr Thr Ser Val Glu Leu Leau Glu Asn Oln Lys Olu Clu AHe Pbh a Cyl9 Lys Ile Leu Lys Lys Ala Pro Leu ST ((5 AAC TCC ACA GAC ACA GTf AAC CTC ACA GAiA CA(; NAT GTG CCA ACC CAA ACA AT? GAG CTC CTG CAA ACA CMA AMA CAT CCC GCT TAT IV11T ACT GAT TTA CGT GCC CCA TTG 220 ,5P"a Asp Thr Val Csn |hrl 1.eu Thr Glu GIn Asn Val Pro Thr Gln Thr Met Glu Leu Val Glu Thr Glu Lys His Pro Ala Tyr Cyl nhrT Asp Leu Cly Ala Pro Leu H9 AAC TCC ACC GAC ACT GTC GAC C!TA ACA GAG AGC AAT GTT CCT ACA CAC ACT AAA GAA TTC CTC CAC ACA CM CAC AAT CC AT? TGG TGaT A ACT GAT CTA G0A CCT CCC CSC 237 4L iSl_ Glu Thr Val Asp Leu Thr C lu Ser Asn Val Pro Tr His Thr Lys Glu Leu Leu Hlis Thr Glu Nis Asn Cly Hot Leo cv. la Thr Asp Leu Cly His Pro Leu al 1112 AAC TCG ACT GAA ACC GTA AAC CTA AGT GAA TTA AAC GTT CCG al MCG CAG GCG CAA GAA CTT CTA CAT GCT CGG ATT GA? CCG AT? CT) r. TG',aT ACG GAA CTA G0A TCA CCA CTA 220 Thr lu Thr Val Asn Lcu Ser Glu Leu Asn Val Pro hr Gln Val Glu Glu Leau Val (is Gly Gly Ile Asp Pro Ile Leo ka Cyl ly Thr Clu Leu Cly Slr Pro Leu rrl .1 10 GCA AAC GIGA ACA AM rTG JiC| CTA ACA GAG AGA GGG ATT GAG Ta CT? MT GCC ACA ACCAA C CT? CA 6CC GCA AAT AST GCG AAM ACc To11c CAG GGC AAA AGG --- CCA ACA 222 Ala&AnlTr Lys Val Asn Leu Thr Glu Arg Gly Ile Glu a) Vol Ass Ala Thr Glu Thr Val Glu Thr Ala Asn Ile Cly Lys Ile cyl r Gln Gly Lys Arg Pro Thr 103' 1(10 CCC 66T GGC ATC ATT GTA AAG CTT MG AAC GAG AAA GAG GAG AAT GCT ACT CMA ACA GCT GCA AMC AAA ACC CT? GAC AGA CTO h c T AAA GGT CGA AAA --- TAC A6A 217 Pro Asn Gly Ile Ile Val Lys Lou Thr Asn Glu Lys Glu Glu TT GbrClu Thr ValGlu Ser Lys Thr Lea Asp Arg Lem a t Lys Cly Arg Lys Tyr Lys 114 TCC AAT GGG MA ATG GTG A TM ACT GAT GAC CAA GTA GI C ACT GCC CAM GAC TA CTG GCA TC CAA CAT CTA CCC CMA T? r. TVI AGC CCT STA AAA --- TTA GTA 232 Ser Asn Gly Thr 'et Val Lys Leu Thr Asp Asp Gln Val Glu ValI Thr Ala Gln Glu Leu Val Glu Ser Gln His Leu Pro Glu Leo Is Cyl0 Pro Slr Pro Leu Lys Leu Vol

MC Ta --- ((3 CCA AAC GGA ACC CCA GTG ATC h AAT GAT CAG ATT CaG ACT AAT CCT ACT GAG CTA GT? CAM AMC TCC TCA ACC GCC A66 ATI 20c AMT CCT CAT CCA ATC CT? 214 Pro A l u Val Lye le Thr Asn Asp Gln I le Glu Thr Asn Ala Thr Glu Leau Val Gln Ser Ser Slr Thr Gly Lys Il0 S0Asn Asn Pro His Arg I0l Leu

236 CTC TOO MC TC aCT CCA AAC cMa aaT 344 Hi CAA TC CCC ATC GCC GGA TCG CTC ICC CTT CCA CTG MA WCA TWC TAC AT? SCT Gln Leu Gly Ly c Asn Il. Ala Gly Trp Leu Leo ly Pr lu CMp Pro Lea Leu Pro Val Arg Slr Trp Ser Tyr IIe Va hr Pro Asn Ser lu Asn 241 CTA ATT TO ATA A CAC A6 CAM AAC CCC aGa caC GOT TIC TGT TAT CCA GCC 367 H2 CAA GGCG IC GCC GCA TGG CTc CTT A61 As CTT CTA AG CTG CCA CAA TCC TAT Glu Leu Gly As Cys Ser Ile Ala Gly Trp Leu Lea ly rPe lu es rg Leu Leu Ser Val Pro Olu Trp Ser Tyr Ile Se Clu Lys Glu Ass Pro Arg Asp Gly Leu Cys Tyr Pro Cly 229 RAT ATA 0C AT? CCC TAT H5 ATC TC ACG GAT TCT AC? GCA GCT GGA TGG CTC CTT 006 Rn AT?| GAC TTC CT? ACT CT? CCA CAA TOG CC TAC GTG WAF TacAA CCA MAT CTC SCC CCa 0GG 3SS Ile Leu Arg As Cys Ser Val Ala Cly Trp Leu Leu ly rP tl Asp lu Ph- Lou Thr Val Pro Clu Trp ler Tyr Il Val Ly Asp Ass Pro Ile Ass Gly Leu Cym Tyr Pro Cly 233 CGIs H1l AMT CT? CCT GAT |SC TI CCT GCG TGG AT? CTT AC A? TIC ATT G00 AM 6CA TCA TGG CT TAC ATA GTA kC CaA MCC 3 32 Slr Leu Gly As Cysl Ser Phe Ala Cly Trp Ile Leu ly ro tl ys Asp Asp Lea Ile Gly Lys Thr Ser Trp Ser Tyr Ile Val Ca =eOle 215 H6 GAC CTA AAM GGA lacATT GAG GGT TGG ATC STC RAT A TTA CTA AMT CTA CCA CMAG CC TCA TAC ATA CT? Asp Leau Lys Gly r Ile Clu Cly Trp Ile LeU ly ln| Lea Lea Leo Ser Val Pro Clu Trp Ser Tyr Ile Vol 223 Hs GCA CTS CGA MAC IACATT GAG GCA GTA ATC TAT hRo GAsAsp T? CAC CTG AAG CSC AAT GCC TOG CA TAC ATA CT? Cl 319 0lu Leu Arg Asp ys Lys Ile Clu Ale Val Ile Tyr ly ye Asp Ile His Leu Lys Val Asn Cly Trp Ser Tyr Ile Val krqegC ccPPro 240 ART ro ISGT AC AT? CTA GAC ACC crATT GAA GCA CTA ATC TAT Amn vCCA TA CTA CTA 06A00A AAA GCM TOO TC TAC AT? CT? 330 59 GIs Ile Leu Asp Thr Ile Olu Cly Leu Ole Tyr ly Pr r ys As; Ile Leu Leu Cly Cly Lys Clu Trp Ser Tyr Il0 Val 223 512 CT? CTS GA? CAC ITATTA GAG GCC CSA ASC cTa ly Jn GATF TAT TO AMT GCC AGC CAA TOO SA TAC ATA CT? kW CCC AMA 322 Vol Leau Asp Asp ler Leu Olu Cly Leau Il Lea ly ye ye Asp Lea Tyr Leu Asn Cly Arg Olu Trp Slr Tyr Ile Val lrg Pro Lys 221 ro 07 GAC TIC GCA CJA l~aCSC CTA GGA ACA CSG ATA ly GAs; CM TC TO GAC T --- 066 TSA GOH TO ATA ATS c0a aGA 0aa OCC A aaT --- A6A TGC TAT CCC G0G 34S 6^1 lrg Clu Ass AMe Ile Pro Asp Leu Cly Cln ysl ly Leu Leu Cly The Leu Il1 era IC ln Phe Leu Clu Phe Olu Leu Asp Lea Ile I0, Arg Cly Cys Tyr Cly 220 ro acy Hl0 GAC TTA GCC AA6 kC CT ATA GCC ATT ASA ST h ly T Asp CAC CTT ACC G00 AA --- TOG GA0 ACT TTS ATA A0 CAA MAT TcT ATT OCT --- Tac TOC TAT CCA 337 Asp Leu Gly As His Pro II1 Cly Ile Ile Ile 2 lLea His Lea Thr Cly Arg Trp Clu Thr Leu Ole lrg Clu Asn Ser Ile Ala Tyr Cys Tyr Pro 231 CC Cy H 4 GAT CCC CAA ACT ICCATC CSC AAT GGS GCT CTG c Asp TSG AAS GCS GCA --- CMA TOO GAT CTC TC ATA 322 Asp Gly Gln Thr Asp Ile Vol Asn Gly Ala Leo lyv 5is Lea Asn Cly la Olu Trp Asp Val Ph. Ile s1 257 lars A --- H GA CAC A CTC ATA 0AT OCT CTA ITO TST CAA AA GA0 --- ACA TOO GAC CT? TC CTT C1CCC ACC AA GCT TTC aC aAC MCC Tac CCC 373 A3CA y AOA Asp Cly IIle Asq r Lau Ile Asp Ala Lea Leo a ly Phe Oln As Clu Thr Trp Asp Lea Ph- Val) 1u Sler Lys Ala Phe Ser Ae Cys Tyr Pro

FIG. 1. Sequences of cDNA transcribed from the 3' end of HA genes of viruses representing the 12 HA subtypes and predicted amino acid se- quences of the precursor HA. The NH2-terminal amino acid of HA1 (actual or presumed) is indicated by an arrow, and potential glycosylation sites are underlined. Boxes indicate amino acid residues conserved through all subtypes; *, positions at which most variation is seen. The sequences are aligned at cysteine residues. Those shown are H1, A/PR/8/34; H2, A/RI/5-/57; H3, A/Memphis/1/71; H4, A/duck/Alberta/28/76; H5, A/ shearwater/Australia/75; H6, A/shearwater/Australia/72; H7, A/turkey/Oregon/71; H8, A/turkey/Ontario/6118/68; H9, A/turkey/Wisconsin/ 1/66; H10, A/duck/Manitoba/53; H11, A/duck/Memphis/546/76; H12, A/duck/Alberta/60/76. Downloaded by guest on September 26, 2021 Evolution: Air Proc. Nati Acad. Sci. USA 78 (1981) 7641

Table 1. Percentage amino acid sequence identity among the NH2-terminal regions of HAl of viruses representing each of the 12 HA subtypes H1 H2 H5 Hl1 H6 H8 H9 H12 H7 H10 H4 H3 H1 69 66 57 62 51 56 49 32 33 29 26 H2 80 54 61 44 56 47 39 35 33 28 H5 56 61 46 53 49 38 36 33 32 Hl1 61 51 54 49 33 30 29 29 H6 52 56 52 37 35 35 30 H8 65 61 34 31 33 28 H9 67 35 32 35 26 H12 33 34 33 30 H7 49 42 47 H10 44 42 H4 47 The virus strains used are those shown in Fig. 1. Because the signal peptide sequences are so variable, comparisons have been made from the aspartic acid residue at the NH2-terminus of H1, H2, and H7 (3, 25) and of H5, Hll, H6, H8, H9, H10, and H12 (by ) or from the corresponding amino acid where the NH2-terminus is longer [H3 (4) and probably H41. The sequences were compared pairwise by diagonal analysis (21) and percent identity was counted, deletions or insertions being counted as mismatches. A similar analysis of published complete sequences shows that the overall homology between H2 and H7 is 43%, that between H7 and H3 is 49%, and that between H2 and H3 is 42% (5-7).

The Signal Peptides. The amino acid sequences shown in residues supports their hypothesis that the basic structure of Fig. 1 are those predicted iftranslation begins at the first ATG the HA molecule is the same in all subtypes. of the cDNA sequence (22). Only for the H2 subtype has this Some subtypes are clearly more similar than others; the per- ATG been demonstrated to be the start ofa signal peptide (23), centage amino acid identities is shown in Table 1. The - which is cleaved off at or in the cell membrane during virus tionships are shown as the dendrogram in Fig. 2. The hemag- maturation. In vivo, the mRNA extends beyond the 3' end of glutinins fall into two main families, the Hong Kong group (H3, the viral strand RNA, having 10-15 nucleotides including the H4, H7, and H110) beingvery different from the rest. The closest cap spliced on from nonviral mRNAs (14). The sequences ofthe pair are H2 and H5 (80% amino acid ), but extra 10-15 nucleotides vary (24) and it is not known what would no antigenic crossreactivity has been detected between these. happen ifthey contained an AUG sequence. Two inphase ATG One of the three or four putative antigenic areas of the Hong sequences are present in several of the cDNA sequences shown Kong HAI (12, 26) is present in the 30% ofHAl sequenced and, in Fig. 1, including one near the start of the H3 (A/Mem/l/ although what happens in the rest of the molecule does not af- 71) sequence. In other H3 sequences, only the second ATG is fect the arguments presented here, it is possible that the partial present (7-9), but it is possible that in A/Mem/1/71 the signal sequences in Fig. 1 are indeed representative of the whole. peptide is five amino acids longer. Comparisons of complete sequences of H2, H7, and H3 (5-7) The signal peptide sequences are almost wholly hydropho- show the same relationships as found in the NH2-terminal re- bic, but little homology is apparent in either nucleotide or gions (Table 1 and Fig. 2). amino acid sequences among the different subtypes. There are Sequences Within a Subtype: Genetic (Including Antigenic) some amino acid similarities between H5 and Hil and between Drift. Nucleic acid and peptide analyses have shown that rather H8 and H12. The length is variable, as is the position ofthe first large differences in antigenic properties of human viruses ATG in the cDNA, ranging from nucleotides 15-18 in A/Mem/ within the H3 subtype occur with only about 20 amino acid sub- 1/71 (H3) to 44-46 in A/RI/5-/57 (H2). There is no set pat- stitutions, mostly in HAl (9, 29). Analysis of the cDNA se- tern ofcharged residues before or after the central hydrophobic quences complementary to the 3' end of the HA genes of the portion, which is 11-14 residues long. HI subtype ofviruses isolated between 1933 and 1957 showed The NH2 Terminus ofHAI. Viruses ofsubtypes HI, H2, and an accumulation ofnucleotide changes but very few amino acid H7 have been shown to have aspartic acid at the NH2 terminus DISSIMI LARITY of HAl (3, 25) and, because all viruses of all subtypes except 500 300 100 H3 and H4 have this amino acid at the same position, I have treated this as the NH2 terminus of the mature HAl. The H3 NH2 terminus is a cyclized glutamine in A/Mem/102/72 (4), and this is coded by all other H3 strains sequenced, giving an extra 10 amino acids in these strains. The only subtype in which the NH2 terminus is entirely unknown is H4. Because attempts to directly identify the NH2-terminal amino acid in this HAI failed (W. G. Laver, personal communication), a candidate is the glutamine at amino acid 17. It is clear from Fig. 1 that, although the cysteine residues can be aligned in all subtypes, very few other amino acids are totally conserved. At each of two positions, there are nine different amino acid residues in the 12 sequences, both positions being FIG. 2. Sequence relationships among the 12 HA subtypes. "Dis- to conserved residues. similarity" was calculated from the amino acid identities shown in adjacent One group ofconserved amino Table 1 by the method of Dayhoff (27), which allows forreversions, and acids, Gly-X-Pro-Y-Cys-Asp (Fig. 1), corresponds to a rather then the dendrogram was calculated from these relationships by using tight turn in the three-dimensional structure ofHong Kong (H3) the program of Lance and Williams (28) (Euclidean distance, Burr's HA derived by Wilson et al. (26), and the conservation ofthese strategy). Downloaded by guest on September 26, 2021 7642 Evolution: Ikir Proc. Natl. Acad. Sci. USA 78 (1981)

35 122 Duck/Eng/56 (Viola) AGCAASAOCA GGGGATCTAT CAAGAAGTCG AA ATG GAG AAA ATC CTG CTA TmT GCA GCT ATT TTC CTT TGT GTG AAA OCA GAT GAG ATC TGT ATC G0G TAT TTA AOC AAC AAC TCG ACA GAC Sat Glu Lys Ile Lau Lau Phe Ala Ala Ile Phe Lau Cys Val Lys Al&fop Glu IIl Cys Ile Gly Tyr Lou Sor Asn Asn Ser Thr Asp Duck/Ukraine/1/60 (Cumin) ACA ATT GOC Thr

tsrn/Aust/75 (G55C) A A A CTC CT? ACT ATC ATC TGC GCG ATT GOC AAT Lou Thr I.- Duck/im/576/76 (Russ) AA T A A AAG GTA CT? GCA ATC ATC ATC ATT CGA GAC GAA TGC ATT GGA TAC CTG TCA GAG Lys Val Ile II IXle Arg Glu Duck/NY/12/78 (Buzz) A A G AAG ACA GT? TA TAT GCA GAA TOC GGT Lys Thr Val Tyr Ala

242 Viola ASA OT? GAC ACA ATA AT? GAG AMC AAT OTCA= G GTC ACT CTAMC WT?S TOGAS GAG ACA GSA CAC ACT GGA TCA TYC TOT TCA ATC AAT &GA ASS CAA CCA ATC AMC CTT MA GAT Lys Val Asp TShr Il I1e Glu Asn Asn Val Thr V-1 Thr Bar Ser Val Glu Lau Val Glu Thr Glu His Thr Gly Ser Ph. Cys Sar Ile Asn Gly Lys Gln Pro Ile Ser Lau Gly Asp Cumin G0G GAC

G55C ACA TG TTG 0G0 ACA AGM GAC Lou Thr Arg Russ GOG AGT GSA ART GAG TAC TGC GAT 0OG OCA AGMT GT Ser Asn Tyr Asp Ala Buzz ACG ATC GSA AMC 0GG AAG ACT Sir

281 Viola TOT TSCA SST OCT GA TOO ATA TA GOC AAG CCT AT? TOT Cys Ser Ph. Ala Gly Trp IsI Lau Gly Asn Pro Nat Cys 332 Cumin GGA AAT GAT G A A GU AG SAT CA TGM TCT TMA ATA OTG GAA AAG CSA ST? Asp Asp Lou Ile Gly Lys Asn Sar Trp Ser Tyr Ile Va1 Glu Asn Gln Ser TGC TcC AAT CCC CAA Gln Russ TOC TCC GGG ATT CTT GGG CCA GAT TG GGG SAA ACA OTA GAG Thr

Buzz TCC &GA CCC GAT ATA G0G ACT TCA ATT GAG Thr

FIG. 3. Genetic drift in the HA gene of the H11 (Hav3) subtype of influenza virus. The nucleotide sequence and predicted amino acid sequence are given for cDNA transcribed from the 3' end of RNA segment 4 of A/duck/England/56. For the other strains, where there is a nucleotide change, the entire codon is shown; if this also involves an amino acid change, the new amino acid is shown. Blank areas therefore indicate regions in which nucleotide and amino acid sequences are identical. Vertical lines indicate the end of data for the strain, and the arrow shows the start of mature HAl.

changes in HAL. Two "swine" strains of this subtype-A/ greater than that observed in the neuraminidase gene (19) or swine/Wisconsin/15/30 and A/New Jersey/11/76-are also in the nonselected genes codingfor the matrix and nonstructural highly conserved (30). proteins, which have also changed with time (32). As new vari- Similar sequence analyses of the HA gene of H2 (Asian) ants of HA appear, the older viruses disappear, but isolations strains, isolated from humans in 1957 and at the end ofthe H2 of an old virus are sometimes reported [e.g., Hong Kong/68 era (1968), and of three Asian-type strains isolated later from from South Australia in 1979 (34)]. In the best documented and birds (1972 and 1977) showed the same pattern (31). There were most widespread case of reappearance, a 1950-type of HiNL many nucleotide changes, but only 6 amino acids changed in virus was isolated in 1977 (35). This virus was a relatively late the NH2-terminal 77 amino acids of HAl through this time. member of the H1 era in humans and, although the subtype Peptides from HAl of the late human H2 strains-A/Neth- disappeared from humans in 1957, the 1977 virus caused major erlands/68 and A/Berkeley/68-have been analyzed, and epidemics following its reappearance and has since drifted sig- show a similar proportion ofchanges through the rest ofthe HAl nificantly (36). The close relationship of the A/USSR/90/77 polypeptide (unpublished results). The same pattern is seen in strain to A/FW/1/50(2nucleotidedifferences in335-adenosine subtypes occurring only in birds, such as in H l, shown in Fig. at position 31 and thymidine at 182; see ref. 30) indicates that 3. Because these are sporadic isolates from different bird species this virus was not replicating during its disappearance. in different parts ofthe world, they cannot be considered as an Fig. 1 shows that the nucleotide sequence changes between evolutionary series, but the earlier strains are more similar to even the closest subtypes are much greater than any changes each other than to later strains and vice versa. There is again within subtypes (refs. 30 and 31; Fig. 3). At the amino acid se- high conservation of the amino acid sequence, although there quence level, the relationships shown in Fig. 2 are clearly seen, are many nucleotide changes. but there is still a marked gap between the most dissimilar Although the numbers of nucleotide changes within a sub- strains within a subtype (A/duck/England/56 vs. A/duck/ type increase with time (30, 31), the vast majority of these are Memphis/546/76, Hil; Fig. 3) and the most similar pair of "silent" and do not alter the protein sequence except in the subtypes (H2 and H5). In most subtypes in which several strains apparently less constrained signal peptides. have been examined [H1 (30), H2 (31), and H3 (9)], there are so few amino acid differences that it is impossible to tell whether DISCUSSION the subtype is drifting toward any other. In Hil (Fig. 3), which In the matrix and nonstructural protein genes from human in- shows rather more variation, the other viruses of the subtype fluenza viruses isolated during 1933-1977, we have observed are no more similar to any other subtype than the example used a gradual accumulation of nucleotide (and some amino acid) in Fig. 2. changes, the rate ofchange being much the same from 1934 to Three conclusions can be drawn from the sequence data. 1957 (HI) as from 1957 to 1977 (H2 and H3) (32). Genetic drift (i) Drift within HA subtypes has been continuous at an overall also appears to occur at a constant rate in the various subtypes rate of 5% nucleotide change per 20 years, with very few ob- of HA. Of particular note is that the H2 (Asian) HA, which served reversions ofnucleotides for at least 27 years (human H1 underwent antigenic drift in humans from 1957 to 1968, con- strains), 45 years (swine H1), 21 years (H2), and 22 years (Hil). tinued to drift in avian hosts at least up to 1977 (31, 33). The The Hil subtype shows drift, even though viruses of different rate of amino acid change within a subtype is not significantly bird species from different parts of the world (England, Downloaded by guest on September 26, 2021 Evolution: Air Proc. Natl. Acad. Sci. USA 78 (1981) 7643

Ukraine, USA, and Australia) were compared (Fig. 3). The H2 5. Porter, A. G., Barber, C., Carey, N. H., Hallewell, R. A., Threl- subtype drifted in human hosts for 12 years and, although fall, G. & Emtage, J. S. (1979) Nature (London) 282, 471-477. 6. Gething, M. J., Bye, J., Skehel, J. & Waterfield, M. (1980) Na- slightly diversified at the end of its human era (A/Tokyo/3/ ture (London) 287, 301-306. 67, A/Berkeley/68 and A/Netherlands/68), a remarkably 7. Sleigh, M. J., Both, G. W., Brownlee, G. G., Bender, V. J. & similar virus appeared in bird species and continued to drift for Moss, B. A. (1980) in Structure and Variation in Influenza Virus, at least 9 years more (31). Some host specificity is seen. The HI eds. Laver, W. G. & Air, 'G. M. (Elsevier, Amsterdam), pp. subtype in humans drifted between 1933 and 1956. The HI 69-80. swine virus, a probable cause of the 1918 human epidemic, is 8. Min Jou, W., Verhoeyan, M., Devos, R., Saman, E., Fang, R., Huylebroeck, D., Fiers, W., Threlfall, G., Barber, C., Carey, N. very similar in HA sequence to the human Hi viruses (30), but & Emtage, S. (1980) Cell 19, 683-696. there are characteristic differences suggesting that it evolved 9. Verhoeyan, M., Fang, R., Min Jou, W., Devos, R., Huyle- independently of the human strains between 1930 and 1976. broeck, D., Saman, E. & Fiers, W. (1980) Nature (London) 286, However, in the H3 subtype, it is not clear from sequence data 771-776. whether a 1978 avian virus (A/black duck/Australia/702/78) 10. Laver, W. G., Air, G. M., Webster, R. G., Gerhard, W., Ward, is derived from duck/Ukraine/1/63 or from. later human H3 C. W. & Dopheide, T. A. (1979) Virology 98, 226-237. 11. Yewdell, J., Webster, R. G. & Gerhard, W. (1979) Nature (Lon- strains (31). don) 279, 246-248. (ii) The rate of sequence change in the HA gene is not sig- 12. Webster, R. G. & Laver, W. G. (1980) Virology 104, 139-148. nificantly greater than that in the antigenically unselected 13. Sanger, F., Nicklen, S. & Coulson, A. R. (1977) Proc. Natl Acad. genes. Sci. USA 74, 5463-5467. (iii) Drift in any HA subtype sequence has not been observed 14. Plotch, S. J., Bouloy, M. & Krug, R. M. (1979) Proc. Natl Acad. to tend toward any other subtype. Sci. USA 76, 1618-1622. 15. Laver, W. G. (1969) in Fundamental Techniques in Virology, There is a clear distinction between the extent ofamino acid eds. Habel, K. & Salzman, N. P. (Academic, New York), pp. variation within subtypes (0-9%) and that between subtypes 82-86. (20-74%). Although it is obvious that, if drift continued over 16. Skehel, J. J. & Hay, A. J. (1978) Nucleic Acids Res. 5, 1207-1219. a long period of time, the sequence variation would easily ex- 17. Sanger, F. & Coulson, A. R. (1978) FEBS Lett. 87, 107-110. ceed that between subtypes, but there is currently no evidence 18. Air, G. M. (1979) Virology 97, 468-472. that' drift can proceed indefinitely; no viruses have been found 19. Blok, J. & Air, G. M. (1980) Virology 107, 50-60. 20. Staden, R. (1979) Nucleic Acids Res. 6, 2601-2610. that are intermediate between subtypes. 21. Gibbs, A. J. & McIntyre, G. A. (1970) Eur.J. Biochem. 16, 1-11. The different subtypes of HA are not confined to different 22. Kozak, M. (1978) Cell 15, 1109-1123. species; all subtypes have been grown in embryonated chicken 23. Elder, K. T., Bye, J. M., Skehel, J. J., Waterfield, M. D. & eggs, indicating that HA subtypes are not species specific, al- Smith, A. E. (1979) Virology 95, 343-350. though transmission to other species may be relatively rare. 24. Lai, C. J., Markoff, L. J., Sveda, M., Dhar, R. & Chanock, R. Drift occurs in human, bird, and animal hosts, in HA and other M. (1980) in Structure and Variation in Influenza Virus, eds. Laver, W. G. & Air, G. M. (Elsevier, Amsterdam), pp. 115-123. genes. As drift proceeds, the earlier strains disappear, except 25. Bucher, D. J., Li, S. S. L., Kehoe, J. M. & Kilbourne, E. D. for scattered but perhaps significant reappearances (34). The (1976) Proc. Nati Acad. Sci. USA 73, 238-242. data presented here give no hint ofthe mechanisms underlying 26. Wilson, I. A., Skehel, J. J. & Wiley, D. C. (1981) Nature (Lon- two extremes of influenza virus evolution-i.e., how a'1950 don) 289, 366-372. HINM virus reappeared almost unchanged in 1977 or how 12 27. Dayhoff, M. 0. (1972) Atlas ofProtein Sequence and Structure, subtypes that have little recognizable nucleotide sequence ho- (NatI. Biomed. Res. Found., Silver Spring, MD), Vol. 5. 28. Lance, G. N. & Williams, W. T. (1967) Aust. Comput.J. 1, 15-20. mology but various degrees of amino acid sequence homology 29. Laver, W. G., Air, G. M., Dopheide, T. A. & Ward, C. W. have evolved. (1980) Nature (London) 283, 454-457. 30. Air, G. M., Blok, J. & Hall, R. M. (1981) in Replication ofNeg- I thank Dr. Adrian Gibbs for the computer analysis to generate Table ative Strand Viruses, eds. Bishop, D. H. L. & Compans, R. W. 1 Fry (Elsevier, Amsterdam), 225-239. and Fig. 2 and Anne Mackenzie, Sally Campbell, and Kathy for 31. Air, G. M. & Hall, R. M. (1981) in ICN-UCLA Symposia on Mo- expert technical assistance. This work was supported in part by Grant lecular and Cellular Biology, eds. Nayak, D. & Fox, C. F. (Ac- Al 15343 from the National Institutes of Health, and by the Australian ademic, NY), Vol. 22, in press. Government Poultry Industry Trust Fund. 32. Hall, R. M. & Air, G. M. (1981)J. Virol. 38, 1-7. 33. Hinshaw, V. S., Webster, R. G., Bean, W. J. & Sriram, G. (1980) 1. Laver, W. G. & Webster, R. G. (1979) Br. Med. BulL 35, 29-34. Comp. Immun. Microbiol Infect. Dis. 3, 155-164. 2. World Health Organization (1980) Memo. BulL W. H. 0. 58, 34. Moore, B. W., Webster, R. G., Bean, W. J., van Wyke, K. L., 585-591. Laver, W. G., Evered, M. G. & Downie, J. C. (1981) Virology 3. Waterfield, M. D., Espelie, K., Elder, K. & Skehel, J. J. (1979) 109, 219-222. Br. Med. Bull. 35, 57-64. 35. Nakajima, D., Desselberger, U. & Palese, P. (1978) Nature (Lon- 4. Ward, C. W. & Dopheide, T. A. (1980) in Structure and Vari- don) 274, 334-339. ation in Influenza Virus, eds. Laver, W. G. & Air, G. M. (Else- 36. Young, J. F., Desselberger, W. & Palese, P. (1979) Cell 18, vier, Amsterdam), pp. 27-38. 73-83. Downloaded by guest on September 26, 2021