
Genomic Minimalism in the Early Diverging Intestinal Parasite Giardia lamblia Hilary G. Morrison, et al. Science 317, 1921 (2007); DOI: 10.1126/science.1143837


Science (print ISSN 0036-8075; online ISSN 1095-9203) is published weekly, except the last week in December, by the American Association for the Advancement of Science, 1200 New York Avenue NW, Washington, DC 20005. Copyright 2007 by the American Association for the Advancement of Science; all rights reserved. The title Science is a registered trademark of AAAS.


Genomic Minimalism in the Early Diverging Intestinal Parasite Giardia lamblia

Hilary G. Morrison,1* Andrew G. McArthur,1 Frances D. Gillin,2 Stephen B. Aley,3 Rodney D. Adam,4 Gary J. Olsen,5 Aaron A. Best,6 W. Zacheus Cande,7 Feng Chen,8 Michael J. Cipriano,1 Barbara J. Davids,2 Scott C. Dawson,9 Heidi G. Elmendorf,10 Adrian B. Hehl,11 Michael E. Holder,1 Susan M. Huse,1 Ulandt U. Kim,1 Erica Lasek-Nesselquist,1 Gerard Manning,12 Anuranjini Nigam,4 Julie E. J. Nixon,1 Daniel Palm,13 Nora E. Passamaneck,1 Anjali Prabhu,4 Claudia I. Reich,5 David S. Reiner,2 John Samuelson,14 Staffan G. Svard,15 Mitchell L. Sogin1

The genome of the eukaryotic protist Giardia lamblia, an important human intestinal parasite, is compact in structure and content, contains few introns or mitochondrial relics, and has simplified machinery for DNA replication, transcription, RNA processing, and most metabolic pathways. Protein kinases comprise the single largest protein class and reflect Giardia’s requirement for a complex signal transduction network for coordinating differentiation. Lateral gene transfer from bacterial and archaeal donors has shaped Giardia’s genome, and previously unknown gene families, for example, cysteine-rich structural proteins, have been discovered. Unexpectedly, the genome shows little evidence of heterozygosity, supporting recent speculations that this organism is sexual. This genome sequence will not only be valuable for investigating the evolution of eukaryotes, but will also be applied to the search for new therapeutics for this parasite.

1Marine Biological Laboratory, Woods Hole, MA 02543–1015, USA. 2Department of Pathology, Division of Infectious Diseases, University of California, San Diego, CA 92103–8416, USA. 3Department of Biological Sciences, University of Texas at El Paso, El Paso, TX 79968–0519, USA. 4Departments of Medicine and Immunobiology, University of Arizona College of Medicine, Tucson, AZ 85724–5049, USA. 5Department of Microbiology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA. 6Department of Biology, Hope College, Holland, MI 49423, USA. 7University of California, Berkeley, CA 94720–3200, USA. 8University of Pennsylvania, Philadelphia, PA 19194, USA. 9University of California, Davis, CA 95616, USA. 10Biology Department, Georgetown University, Washington, DC 20057, USA. 11Institute of Parasitology, University of Zürich, CH-8057 Zürich, Switzerland. 12Razavi Newman Center for Bioinformatics, The Salk Institute for Biological Studies, La Jolla, CA 92037–1099, USA. 13Centre for Microbiological Preparedness, Swedish Institute for Infectious Disease Control, 171 82 Solna, Sweden. 14Department of Molecular and Cell Biology, Boston University Goldman School of Dental Medicine, Boston, MA 02118–2932, USA. 15Department of Cell and Molecular Biology, Uppsala University, SE-751 24 Uppsala, Sweden.

*To whom correspondence should be addressed. E-mail: [email protected]

www.sciencemag.org SCIENCE VOL 317 28 SEPTEMBER 2007 1921

Giardia lamblia (syn. G. intestinalis, G. duodenalis) is the most prevalent parasitic protist in the United States, where its incidence may be as high as 0.7% (1). Worldwide, giardiasis is common among people with poor fecal-oral hygiene, and major modes of transmission include contaminated water supplies or sexual activity. Flagellated giardial trophozoites attach to epithelial cells of the small intestine, where they can cause disease without triggering a pronounced inflammatory response. There are no known virulence factors or toxins, and variable expression of surface proteins may allow evasion of host immune responses and adaptation to different host environments. Trophozoites can differentiate into infectious cysts that are transmitted through feces.

Unusual features of this enigmatic protist include the presence of two similar, transcriptionally active diploid nuclei and the absence of mitochondria and peroxisomes. Giardia is a member of the Diplomonadida, which includes both free-living (e.g., Trepomonas) and parasitic species. The phylogenetic position of diplomonads and related excavate taxa is perplexing. Ribosomal RNA (rRNA), vacuolar ATPase (adenosine triphosphatase), and elongation factor phylogenies identify Giardia as a basal eukaryote (2–4). Other gene trees position diplomonads as one of many eukaryotic lineages that diverged nearly simultaneously with the opisthokonts and plants. Discoveries of a mitochondrial-like cpn60 gene and a mitosome imply that the absence of respiring mitochondria in Giardia may reflect adaptation to a microaerophilic life-style rather than divergence before the endosymbiosis of the mitochondrial ancestor (5, 6). Because of its impact on human disease and its relevance to understanding the evolution of eukaryotes, we embarked upon a genome analysis of G. lamblia.

The genome of G. lamblia WB clone C6 (ATCC50803) is ~11.7 MB in size, distributed on five chromosomes. The edited draft genome sequence contains 306 contigs on 92 scaffolds (Supporting Online Material). The genome is compact. We identified 6470 open reading frames (ORFs) with a mean intergenic distance of 372 base pairs (bp) (Table 1). Approximately 77% of the assembled sequence defines ORFs, of which 1800 overlap and 1500 more are within 100 nucleotides (nt) of an adjacent ORF. Serial analysis of gene expression (SAGE) and cDNA sequences provided transcriptional evidence for 4787 of these ORFs (Supporting Online Material).

Although the total number of ORFs is similar to that of yeast, many specific giardial pathways appear simple in comparison with those of other eukaryotic organisms. Giardia’s genome encodes a simplified form of many cellular processes: fewer and more basic subunits, incorporation of single-domain bacterial- and archaeal-like enzymes, and a limited metabolic repertoire commonly observed in parasites. We did not detect these missing components in searches of assembled and unassembled reads; however, they may be highly divergent and difficult to recognize. Others may be nonessential or functionally redundant with other proteins in the same or another pathway. The host may provide essential metabolic products for an incomplete pathway, but this is a highly improbable explanation for missing structural proteins or subunits of core machinery.

DNA synthesis, transcription, RNA processing, and cell cycle machinery are simple (Fig. 1). The occurrence of only two origin recognition complex proteins (Orc4 and Orc1/Cdc6) in Giardia and the absence of regulatory initiation proteins (e.g., Cdt1, Dpb11, Cdc45, MCM10, and Gemini) are comparable to Archaea. Giardia has three replicative B-type DNA polymerases (Polα, Polδ, and Polε). The occurrence of four subunits in Giardia’s Polα/primase complex is typical of other eukaryotes, whereas the compositions of Polε and Polδ resemble the corresponding polymerases in Archaea. Most giardial DNA polymerase accessory proteins are typically eukaryotic.

Relative to Saccharomyces, Giardia has retained most of the RNA polymerase I (RNAPI), RNAPII, and RNAPIII core peptides. Seven proteins are missing, but six of these are unique subunits that occur in only one RNAP (7). Moreover, Giardia contains only 4 of the 12 transcription initiation factors present in Saccharomyces. The absence of polymerase core peptides is unlikely to be due to our failure to recognize highly diverged homologs in Giardia, because the missing proteins represented RNAP-specific elements rather than a random sampling of both shared and unique RNAP subunits. Absence of homologs to many of the unique subunits required for transcription is consistent with an evolutionary model hypothesizing that

Table 1. Comparison of eukaryotic genome content and organization.

                               Saccharomyces  Plasmodium  Trypanosoma  Leishmania  Entamoeba    Encephalitozoon  Trichomonas  Giardia
                               cerevisiae     falciparum  brucei       major       histolytica  cuniculi         vaginalis    lamblia
Size (MB)                      12.5           22.8        26.1         32.8        ~24          2.3              ~160         11.7
%G+C content                   38.3           19.4        46.4         59.7        ~25          47.6             32.7         49.0
Proteins encoded               5770           5268        9068         8272        9938         1997             25,949       6470
Mean CDS (bp)                  1424           2283        1592         1901        1170         1077             929          1283
Mean intergenic distance (bp)  515            1694        1279         2045        1245         129              1165         372
Gene density, per kbp          0.48           0.23        0.32         0.25        0.41         0.97             0.34         0.58

Introns 272 7406 1 0 ~2500 2654 predicted tRNAs 275 44 65 83 Subtelomeric 44 479 63 arrays
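The spacing statistics reported for the genome (mean intergenic distance, overlapping ORFs, ORFs within 100 nt of a neighbor, gene density per kbp) can be illustrated with a small calculation. The sketch below is only an illustration under stated assumptions: the function name `orf_spacing_stats`, the toy coordinates, and the choice to exclude overlapping neighbors from the mean are hypothetical and are not taken from the paper's methods. It also counts neighboring pairs rather than individual ORFs.

```python
from typing import Dict, List, Tuple


def orf_spacing_stats(orfs: List[Tuple[int, int]], contig_length: int,
                      near_cutoff: int = 100) -> Dict[str, float]:
    """Summarize ORF spacing on a single contig.

    orfs: (start, end) pairs, 1-based inclusive; strand is ignored (assumption).
    Only neighbors in start order are compared, so nested ORFs are not
    treated specially in this sketch.
    """
    ordered = sorted(orfs)
    gaps: List[int] = []
    overlapping_pairs = 0
    near_pairs = 0
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        gap = next_start - prev_end - 1  # bases strictly between neighbors
        if gap < 0:
            overlapping_pairs += 1       # neighbors share at least one base
        else:
            gaps.append(gap)
            if gap <= near_cutoff:
                near_pairs += 1
    mean_gap = sum(gaps) / len(gaps) if gaps else 0.0
    return {
        "mean_intergenic_bp": mean_gap,          # overlaps excluded (assumption)
        "overlapping_pairs": overlapping_pairs,
        "pairs_within_cutoff": near_pairs,
        "genes_per_kbp": len(ordered) / (contig_length / 1000),
    }


# Hypothetical toy contig of 1000 bp carrying four ORFs.
toy_orfs = [(1, 100), (151, 300), (290, 400), (501, 600)]
stats = orf_spacing_stats(toy_orfs, contig_length=1000)
print(stats)
```

On real data the same summary would be computed per scaffold and pooled; how overlapping ORFs enter the published mean of 372 bp is not stated in this section.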

Fig. 1. Comparison of selected multiprotein complexes between Giardia and the yeast Saccharomyces cerevisiae. Panels: initiation of DNA replication, initiation of transcription, and polyadenylation. Initiation of replication: Multiple initiator proteins assemble at the origins of replication in S. cerevisiae during the cell cycle. Giardia has fewer origin recognition proteins (Orc) and most of the initiators of the pre-initiator complex. Initiation of transcription: Transcription in S. cerevisiae is initiated by the pre-initiation complex (PIC) consisting of the RNAPII core complex (12 subunits) and general transcription factors containing several subunits: TFIIA (2), TFIIB (1), TFIID (TBP plus 14 TAFs), TFIIF (3), TFIIE (2), TFIIH (10), and the Mediator (24). These factors recognize DNA elements in the promoter, including the upstream activating sequence (UAS), the TATA box, the initiator element (INR), and the downstream promoter element (DPE). Giardia promoters have an AT-rich initiator element and lack many of the general transcription factors. Polyadenylation: The polyadenylation complex in S. cerevisiae recognizes an A/U-rich sequence, and it contains at least 25 proteins and the largest subunit of RNAPII with its C-terminal domain (CTD). The preferred polyadenylation signal in Giardia is AGTAAY, and Giardia has very few of the yeast polyadenylation proteins and a diverged CTD. Yth1 corresponds to CPSF30 and Ysh1 corresponds to CPSF73 in mammals.
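The polyadenylation signal in the legend above is written with an IUPAC ambiguity code (Y stands for C or T). As a minimal sketch of how such a degenerate signal could be searched for in sequence data, the following converts an IUPAC motif into a regular expression and reports match positions. The helper names (`iupac_to_regex`, `find_motif`) and the toy sequence are hypothetical and are not part of the paper.

```python
import re
from typing import List

# Standard IUPAC nucleotide ambiguity codes (DNA alphabet).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}


def iupac_to_regex(motif: str) -> str:
    """Translate an IUPAC motif such as 'AGTAAY' into a regex string."""
    return "".join(IUPAC[base] for base in motif.upper())


def find_motif(seq: str, motif: str) -> List[int]:
    """Return 0-based start positions of all (possibly overlapping) matches."""
    # A lookahead lets overlapping occurrences be reported as well.
    pattern = re.compile("(?=" + iupac_to_regex(motif) + ")")
    return [m.start() for m in pattern.finditer(seq.upper())]


# Hypothetical toy sequence containing both AGTAAC and AGTAAT.
toy_seq = "CCGAGTAACTTTAGTAATGG"
print(iupac_to_regex("AGTAAY"))       # AGTAA[CT]
print(find_motif(toy_seq, "AGTAAY"))  # [3, 12]
```

A scan like this only locates candidate signals; deciding which occurrences are functional would need transcript evidence such as the cDNA data the authors describe.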

class-specific polymerase subunits arose after the divergence of diplomonads.

A single intron with a noncanonical 5′ splice site was identified in a 2Fe-2S ferredoxin gene, along with components of the spliceosome (8). We generated trophozoite cDNAs and examined alignments of conserved proteins to identify other possible introns. We found three candidates, in genes for ribosomal protein L7A, a dynein light-chain protein, and an unknown protein. Two were confirmed by reverse transcription–polymerase chain reaction, and the RPL7A intron was independently reported (9). These new candidates show canonical GT/AG splice sites and contain an AC-repeat motif, [AC]CT[GA]AC[AC]CACAG (fig. S1). The AC-repeat motif is very like that common to Trichomonas introns [ACTAACACACAG (10)], suggesting a shared splicing mechanism. An intron has also been reported in the excavate Carpediemonas (11).

Giardia’s machinery for RNA processing is less complex than that of other eukaryotes, but the presumed polyadenylation signal (AGUAAA) (12) resembles that of other eukaryotes (AAUAAA). Searches for Giardia sequences that are similar to the many polyadenylation factors in yeast and other eukaryotes identified relatively few homologs (Fig. 1).

Giardia has a relative paucity of enzymes for posttranslational modification. Like Plasmodium, it lacks the vast majority of genes encoding glycosyltransferases and so makes the shortest N-glycan precursor yet identified, dolichol-PP-GlcNAc2 (13). Giardia, like Trypanosoma and Archaea, has a single-subunit oligosaccharyltransferase for transferring N-glycans from the lipid precursor to the peptide (14), compared with eight in yeast and humans. Unlike most eukaryotes, Giardia has an N-glycan–independent quality-control system for protein folding (e.g., chaperones, protein disulfide isomerases, and peptidyl-prolyl cis-trans isomerases) and protein degradation. Giardia has fewer nucleotide sugar transporters than any other eukaryotic genome, including just one for uridine 5′-diphosphate (UDP)–GlcNAc (15). Giardia is missing the set of glycosyltransferases that typically modify N- and O-linked glycans in the Golgi lumen. Instead, Giardia has a cytosolic glycosyltransferase, rare among protists, which adds O-linked GlcNAc to Ser and Thr of cytosolic proteins (15).

Giardia has a conventional endoplasmic reticulum (ER) with conserved chaperones (BiP, Hsp90, DnaJ), but is unusual in having five protein disulfide isomerases, each with only a single active site (16), and in lacking the Ero1 protein that drives disulfide formation in the ER lumen. Membrane transport in Giardia is unlike that of other parasitic protozoa (17, 18). Despite the highly polarized cell structure, there is no conclusive evidence for a stacked Golgi apparatus or cisternae for posttranslational maturation of secretory cargo except in encysting trophozoites. Only a few Rabs, SNAREs (soluble N-ethylmaleimide–sensitive factor attachment protein receptors), and a small number of adaptor protein (AP) complexes participate in vesicle docking and membrane fusion. Unlike all other eukaryotes that have at least three AP complexes, Giardia encodes only two. The presence of only two APs with no indication of pseudogenes or orphan subunits argues for a simple membrane transport system in Giardia.

Two rounds of cytokinesis, accompanied by a single round of nuclear division, occur during excystation. Giardia’s transcriptionally equivalent nuclei must synchronously divide in trophozoites and form quadrinucleate, 16N cysts (19). The presence of homologs to yeast Cin8, polo kinase, aurora kinase, and antiparallel microtubule bundling proteins suggests that the necessary spindle apparatus machinery is present. We identified giardial homologs of several mitotic exit network (MEN) proteins, indicating that regulation of cytokinesis in Giardia may be similar to that of yeast in which MEN coordinates nuclear division with cytokinesis. Homologs of actin, cyclin-dependent kinases, and the mitotic cyclins A and B are present in Giardia. However, the lack of myosin indicates that the actin-myosin cleavage furrow previously found in all eukaryotes is not present in Giardia. Possibly a nonmyosin, adhesion-dependent cytokinesis mechanism exists in Giardia, as in some mutants of Dictyostelium (20).

Like many other microaerophilic eukaryotic parasites, Giardia exhibits a limited metabolic repertoire. There are essentially no homologs for enzymes in the Krebs cycle and, except for well-known scavenging pathways, no evidence of vestigial genes associated with purine and pyrimidine biosynthesis. Amino acid metabolism is even more limited, although all tRNA synthetases are present. For lipid metabolism, the Giardia genome contains enzymes capable of limited fatty acid extension and sphingomyelin assembly, as well as phospholipid headgroup exchange and modification. Although not sufficient for de novo synthesis of lipids, these enzymes allow for remodeling of membrane components.

Glycolytic activities associated with enzymes involved in hexose processing and the interconversion and phosphorylation to fructose-1,6-phosphate glycolysis are more similar to bacterial than to higher eukaryal homologs (Fig. 2) (21). Some of these bacterial-like proteins share similarity with genes in Entamoeba and Trichomonas (table S3). Yet, the predicted origins of the sequences appear to be independent of each other and are not associated with a particular bacterial group.

Giardia metabolizes arginine by the anaerobic arginine dihydrolase pathway (Fig. 2), originally described in bacteria but unknown in eukaryotes other than Trichomonas (22). Arginine deiminase, ornithine carbamoyltransferase, and carbamate kinase generate ammonia, ornithine, and adenosine 5′-triphosphate (ATP), and all three archaeal-like enzymes are highly expressed. Trophozoites thus deprive host intestinal epithelial cells of arginine for nitric oxide biosynthesis and thereby dampen innate defenses (23, 24). During encystation, Giardia synthesizes UDPGalNAc from fructose-6-phosphate by an unusual, five-enzyme bacterial-like pathway (Fig. 2). Many eukaryotes use the first enzyme, glucosamine-6-phosphate deaminase, to generate glucosamine-6-phosphate from fructose-6-phosphate and ammonia for glycolysis. Instead, Giardia uses ammonia from arginine metabolism to drive the synthesis of glucosamine-6-phosphate for cyst wall polysaccharide biosynthesis. Although Giardia is microaerophilic and consumes oxygen, it lacks the conventional superoxide dismutase and catalase for detoxifying reactive oxygen species (25).

Motility and attachment to host cells are essential for the parasitic life-style of Giardia. The microtubule cytoskeleton organizes Giardia’s eight basal bodies and flagella, as well as other structures unique to the genus, including the ventral disk and median body (table S4). The giardial cytoskeleton undergoes dramatic changes throughout the life cycle. General signaling proteins (protein kinase A, Erk kinase, calmodulin) and a protein phosphatase localize to the basal bodies, paraflagellar dense rods, and disk. The basal bodies may act as a control center that coordinates the other cytoskeletal structures during growth and differentiation. The microtubule system is well conserved and includes all five tubulin forms, proteins involved in microtubule modification, organization, and assembly (centrins, tubulin-specific chaperones, tubulin tyrosine ligase). There are coding regions for microtubule motor proteins, including kinesins and 12 dynein heavy chains.

The most notable departure from conserved cytoskeletal structure is the absence of cytoplasmic dynein and the divergent nature of the microfilament cytoskeleton. The genome contains a single actin gene, yet does not encode other classical microfilament proteins. On the basis of sequence similarities, the three genes encoding actin-related proteins participate in chromatin remodeling, rather than cytoskeletal structure. The absence of classic microfilament-associated proteins extends to actin modification, organization, and assembly proteins. In contrast to studies that used heterologous antibodies (26, 27), permissive searches of the Giardia genome failed to identify actin-associated proteins, myosins, or any members of the microfilament-specific motor (28). Trichomonas, which may be a sister lineage, also lacks myosin. Either novel, divergent proteins substitute functionally for the missing proteins or altered cytoskeletal dynamics accommodate their absence. Giardia contains several unusual cytoskeletal protein families including α-giardins (annexin homologs), β-giardins (striated fiber assemblin homologs), the GASP-180 family (29), and several microtubule-associated coiled-coil proteins.

Giardia has 276 putative protein kinases (fig. S2) including members from 43 of the 61


primordial kinase subfamilies present in widely diverged eukaryotes (ciliates, fungi/metazoa, plants, Dictyostelium). Trichomonas also has a greatly expanded kinome, which might reflect their putative sister relationship or commonalities in the parasitic life-style. Giardia has no tyrosine-specific or histidine kinases. Most notable is that 180 (~70%) of the putative giardial protein kinases belong to the NIMA (Never in Mitosis Gene A)–Related Kinase (NEK) family, and that 137 of them are predicted to be catalytically inactive. By contrast, most organisms have fewer than 10 NEK kinases.

This non-NEK kinome is the most compact known from any eukaryote, and so it is of specific functional and evolutionary interest in defining the minimal eukaryotic kinome. Broad-spectrum signal transduction proteins gain specificity by localization to specific cellular target structures. Entamoeba histolytica, another intestinal protozoan parasite, has >80 putative transmembrane kinases (30), but in stark contrast, only four predicted giardial kinases have transmembrane domains. Giardial kinases may have other means of targeting; many have either ankyrin repeats (29), coiled-coil domains, or both, which may allow for specific localization within the cell. Protein dephosphorylation is also critical in signal transduction networks. Giardia has ~32 predicted protein phosphatases, but only one is predicted to be membrane associated.

Giardial protein sequences commonly show insertions of amino acids when compared to their

Fig. 2. Glucose, pentose-phosphate, and arginine metabolism in Giardia. Color coding denotes similarity to archaeal homolog (red), bacterial homolog (purple), or eukaryal homolog (blue). Black indicates that no homolog was found. Abbreviations and Enzyme Commission numbers: 6PGL, 6-phosphogluconolactonase, 3.1.1.31; ACYP, acylphosphatase, 3.6.1.7; ADI, arginine deiminase, 3.5.3.6; ARG-S, arginyl-tRNA synthetase, 6.1.1.19; CK, carbamate kinase, 2.7.2.2; DERA, deoxyribose-phosphate aldolase, 4.1.2.4; ENO, enolase, 4.2.1.11; FBA, fructose-bisphosphate aldolase, 4.1.2.13; G6PD, glucose-6-phosphate dehydrogenase, 1.1.1.49; GAPDH, glyceraldehyde-3-phosphate dehydrogenase, 1.2.1.12; GCK, glucokinase, 2.7.1.2; GNPDA, glucosamine-6-phosphate deaminase, 3.5.99.6; GNPNAT, glucosamine 6-phosphate N-acetyltransferase, 2.3.1.4; GPI, glucose-6-phosphate isomerase, 5.3.1.9; NOS, nitric oxide synthase, 1.14.13.39; OCD, ornithine cyclodeaminase, 4.3.1.12; OCT, ornithine carbamoyltransferase, 2.1.3.3; ODC, ornithine decarboxylase, 4.1.1.17; PFK, phosphofructokinase (pyrophosphate-based), 2.7.1.90; PGAM, phosphoglycerate mutase, 5.4.2.1; PGD, phosphogluconate dehydrogenase, 1.1.1.44; PGK, phosphoglycerate kinase, 2.7.2.3; PGM, phosphoglucomutase, 5.4.2.2; PGM3, phosphoacetylglucosamine mutase, 5.4.2.3; PK, pyruvate kinase, 2.7.1.40; PRO-S, prolyl-tRNA synthetase, 6.1.1.15; PRPPS, phosphoribosylpyrophosphate synthetase, 2.7.6.1; RBKS, ribokinase, 2.7.1.15; RPE, ribulose-phosphate 3 epimerase, 5.1.3.1; RPI, ribose-5-phosphate isomerase, 5.3.1.6; TKT, transketolase, 2.2.1.1; TPI, triose phosphate isomerase, 5.3.1.1; UAE, UDP-N-acetylglucosamine 4-epimerase, 5.1.3.7; UAP, UDP-N-acetylglucosamine diphosphorylase, 2.7.7.23.
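The Enzyme Commission numbers in the legend above encode the reaction class in their first field (1 oxidoreductases, 2 transferases, 3 hydrolases, 4 lyases, 5 isomerases, 6 ligases, 7 translocases). As a small illustration of how the legend could be held as a lookup table and grouped by class, the sketch below uses a subset of the listed enzymes. The subset choice and the function name `group_by_ec_class` are hypothetical; the abbreviations and EC numbers are copied from the legend.

```python
from collections import defaultdict
from typing import Dict, List

# Top-level EC classes (first field of an EC number).
EC_CLASSES = {
    1: "oxidoreductases", 2: "transferases", 3: "hydrolases",
    4: "lyases", 5: "isomerases", 6: "ligases", 7: "translocases",
}

# Subset of the Fig. 2 legend: abbreviation -> EC number.
ENZYMES = {
    "ADI": "3.5.3.6",      # arginine deiminase
    "OCT": "2.1.3.3",      # ornithine carbamoyltransferase
    "CK": "2.7.2.2",       # carbamate kinase
    "GCK": "2.7.1.2",      # glucokinase
    "GPI": "5.3.1.9",      # glucose-6-phosphate isomerase
    "FBA": "4.1.2.13",     # fructose-bisphosphate aldolase
    "GAPDH": "1.2.1.12",   # glyceraldehyde-3-phosphate dehydrogenase
    "ARG-S": "6.1.1.19",   # arginyl-tRNA synthetase
}


def group_by_ec_class(enzymes: Dict[str, str]) -> Dict[str, List[str]]:
    """Group enzyme abbreviations by the top-level class of their EC number."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for abbreviation, ec_number in enzymes.items():
        top_level = int(ec_number.split(".")[0])
        groups[EC_CLASSES[top_level]].append(abbreviation)
    return {name: sorted(members) for name, members in groups.items()}


groups = group_by_ec_class(ENZYMES)
for class_name, members in sorted(groups.items()):
    print(class_name, members)
```

The grouping says nothing about phylogenetic origin; the archaeal, bacterial, or eukaryal affinity shown by color in the figure would have to be stored as a separate field.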

homologs in other organisms (fig. S3). We generated protein alignments for 1518 proteins and scored the alignments for the presence of insertions in the giardial protein relative to others. We found in-frame amino acid insertions in 44 ORFs (not attributable to alignment ambiguities) with an average of 1.5 insertions per ORF. The insertions ranged in size from 8 (our lower cutoff value) to 101 amino acids, with an average of 20. To determine whether this was an unusually high frequency, we examined 54 protein alignments, for which sequences were available from several other eukaryotes (Chlamydomonas, Cryptococcus, Dictyostelium, Encephalitozoon, Entamoeba, Leishmania, Mus, Phytophthora, Plasmodium, Saccharomyces, Thalassiosira, Trichomonas, and Trypanosoma; Supporting Online Material). Giardia sequences showed 15 insertions in 11 of the 54 proteins; the number of insertions detected for the other organisms ranged from 0 to 6 (Plasmodium) (Table 2). Sequence analysis of giardial cDNAs that overlap many of these insertions demonstrates that they do not represent introns. The functions of these unusual insertions remain to be determined, although when we experimentally deleted an insertion in giardial aurora kinase and measured protein production, we observed decreased protein stability (Supporting Online Material).

Giardial trophozoites survive in an environment of host digestive enzymes and bile. A dense single molecular layer of a variant-specific surface protein (VSP) covers the membrane and likely protects the trophozoites. Clonal VSPs on individual trophozoites switch to new VSPs every 6 to 13 generations (31). VSPs vary in sequence and size; all are cysteine-rich (about 12%) with frequent CXXC motifs. Each has an N-terminal signal peptide and characteristic C terminus including a membrane-spanning region terminating in CRGKA and an extended polyadenylation signal. Unlike surface proteins associated with immune evasion in other parasitic protists (32), giardial VSP genes distribute to many noncontiguous locations on all chromosomes (Fig. 3), and they are activated or inactivated in situ with no evidence for associated rearrangement or sequence alteration. VSPs occur at only two of the telomeres where they are truncated by TTAGG telomeric repeats, suggesting that they are pseudogenes. We estimate Giardia’s VSP repertoire at 235 to 275 genes (table S5). VSPs frequently cluster as two to nine genes in head-to-tail orientation. Intergenic distances between members of a cluster can be very short, with the 5′ end of one VSP overlapping with the 3′ end of a second.

In addition to the VSPs, we found two other classes of cysteine-rich proteins (Fig. 3) (33). There are 61 HCMps (high-cysteine membrane proteins) with 10% or more cysteine and 20 or more CXXC or CXC motifs. They lack the CRGKA tail, and their single membrane-spanning domain diverges from the VSPs. No additional leucine-rich repeat cyst wall proteins (CWPs), beyond those previously identified, were found.

Giardia encodes 149 proteins that are promising drug targets, as defined by Hopkins and Groom (34). As might be expected, these include a large subset of the kinases, e.g., TOR (target of rapamycin) (table S6).

When attached to the surface of the intestinal mucosa, Giardia trophozoites have ample opportunity to pick up genes from bacteria and to scavenge products of host and bacterial metabolism. Like that of both Trichomonas and Entamoeba, Giardia’s genome contains many lateral gene transfer (LGT) candidates, indicating that LGT has played an important role in shaping Giardia’s genome and metabolic pathways. We initially identified ORFs with similarity to bacterial or archaeal proteins at a BLAST significance level of e−10 or better within the top 10 hits. Of these, ~100 had multiple bacterial or archaeal homologs at a significance level of e−30 or better within the top 20 matches (table S3). These

Table 2. Amino acid insertions detected in alignments of conserved proteins.

Organism         Total no. of  Insertion size range   Proteins
                 insertions*   (amino acids)
Chlamydomonas    4             21–434                 Ribosomal protein S13, DNA-directed RNA polymerase subunits
Cryptococcus     4             9–44                   Rad51, Dmc1b, serine palmitoyl transferase, guanosine triphosphate (GTP)–binding protein
Dictyostelium    3             18–464                 Ribosomal protein L9, DNA-directed RNA polymerase subunit, DNA topoisomerase II
Encephalitozoon  2             8                      ATP-dependent RNA helicase, DNA-directed RNA polymerase subunit
Giardia          15            8–81                   Tyrosyl-tRNA synthetase, tryptophanyl-tRNA synthetase, U5 small nuclear riboprotein, ubiquitin activating enzyme E1, poly(A) polymerase, DnaK, MCM3, DNA-directed RNA polymerase subunits, RNA helicase, nucleolar GTP-binding protein, DNA topoisomerase II
Leishmania       4             15–47                  Tryptophanyl-tRNA synthetase, RNA helicase, MCM3, DNA topoisomerase II
Mus              1             26                     RNA helicase
Plasmodium       6             12–86                  Ubiquitin-activating enzyme E1, serine palmitoyl transferase, GTP-binding protein, DNA-directed RNA polymerase subunit, vacuolar ATPase subunit, RNA helicase
Thalassiosira    3             8–13                   GTP-binding protein, DNA topoisomerase II, TCP-1 chaperonin subunit γ
Trichomonas      2             8                      26S proteasome subunit, γ-tubulin
Trypanosoma      5             8–18                   U5 snRP, GTP-binding protein, RNA helicase

*Multiple insertions occurred in some proteins.

Fig. 3. Locations of VSP and other high-cysteine proteins on assembly scaffolds. From the top, scaffolds are from chromosome 5, chromosome 4, chromosome 3, and chromosome 2. Red lines indicate high-cysteine proteins (HCNCp, HCMp, HCp) and blue lines indicate VSPs. The x axis is scaled in kilobase pairs.


include proteobacterial-like DnaK, cpn60, and cysteine sulfurtransferase (6, 35). Others are NADH (nicotinamide adenine dinucleotide, reduced) oxidase and group 3 alcohol dehydrogenase, derived by LGT from a Gram-positive coccus and a thermoanaerobic bacterium, respectively (36). Hybrid cluster protein, A-type flavoprotein, and glucosamine-6 phosphate isomerase were recently shown to be relics of LGT (37). As noted, many of the enzymes in the glycolytic and pentose phosphate pathways are more similar to bacterial than to eukaryal homologs. Several ORFs had a highly significant match to an Entamoeba and/or Trichomonas protein, with the remaining matches to bacteria or archaea. Although some of these are recognized LGT relics, the rest warrant closer examination.

Cpn60, the iron-sulfur complex proteins, and DnaK are most similar to proteobacterial and mitochondrial homologs. The iron-sulfur cluster proteins and cpn60 are demonstrably targeted to the recently discovered mitosome, believed to be a relict mitochondrion (5). Other genes with homology to mitochondrially targeted genes are detectable, e.g., a mitochondrial protein peptidase homolog, but none have phylogenetic affinity specifically to the α-proteobacteria/mitochondrial lineage. Giardia is impoverished with respect to genes that are phylogenetically linked to α-proteobacteria, unlike other eukaryotes in which up to 20% of mitochondrially targeted proteins show such ancestry (38).

Phylogenetic inference alone cannot resolve Giardia’s evolutionary history. Because so many of Giardia’s genes may have been derived from horizontal transfer or be subject to accelerated evolution, only a subset can be used to infer phylogeny. Of the ~1500 genes for which there are known homologs, only a handful included diverse eukaryotic taxa and generated robust trees, largely because the sequences could not be unambiguously aligned. We generated and examined trees for many conserved proteins, and selected ribosomal proteins for a multigene data set because they are an ancient family, whose nature (interaction with rRNAs and with all cellular proteins during their synthesis) constrains their divergence. Phylogenetic relationships were assessed with Bayesian and maximum-likelihood statistical procedures (Supporting Online Material).

The resulting tree (fig. S4) and an earlier analysis based on 100 genes (39) support the deep divergence of Giardia and Trichomonas in the eukaryotic tree. Only Encephalitozoon branches earlier in this tree. The preponderance of molecular data place microsporidia as derived relatives of fungi, on the basis of both gene trees and ultrastructural features (40). Giardia has no such affiliation with another eukaryotic lineage. Genome-scale data from other excavate taxa (41) are needed to resolve whether Giardia and Trichomonas branch deeply because that is their correct position or simply because of “long branch attraction.”

As discussed earlier, Giardia consistently shows a pattern of simplified molecular machinery, cytoskeletal structure, and metabolic pathways compared to later diverging lineages such as fungi and even Trichomonas or Entamoeba (Supporting Online Material; table S7 and fig. S5). A parsimonious explanation of this pattern is that Giardia never had many components of what may be considered “eukaryotic machinery,” not that it had and lost them through genome reduction as is evident for Encephalitozoon. Taking a whole-evidence approach, one sees that these data reflect early divergence, not a derived genome.

Because Giardia has two nuclei, a high level of heterozygosity could accumulate in the genome. Notably, heterozygosity in the genome was estimated to be less than 0.01%. We examined the two largest contigs, representing >1.2 Mbp (10% of the genome) containing 482 single-copy genes, for high-quality mismatches between individual reads and the consensus (table S8). We found only 25 in total, eight of which were in coding regions. This suggests that there may be a biological mechanism for maintaining genome fidelity and reducing heterozygosity between the four genome copies. Meiosis-associated proteins are present in Giardia (42), although they may have alternative functions.

Giardia is an excellent functional and genomic model for other intestinal protozoan parasites whose complete life cycles cannot be replicated in the laboratory. In many pathways that require multiprotein complexes, it is notable that Giardia has fewer recognizable components than other organisms. Whether due to early divergence or genomic reduction, the genome gives valuable clues to the minimal components needed for complex cellular processes. The genome sequence has revealed much but also raised intriguing questions for further study, e.g., the number and distribution of introns and the composition of the giardial spliceosome, how Giardia maintains homozygosity across the separated nuclei, and the function of the novel genes and gene families discovered. The anticipated release of a draft genome from the related Spironucleus vortens, a commensal or opportunistic parasite of angelfish, will enable comparative genomics within the diplomonads and reveal which features of the giardial genome result from its obligate parasitic life-style and which reflect its basal evolutionary position.

References and Notes
1. M. C. Hlavsa, J. C. Watson, M. J. Beach, MMWR Surveill. Summ. 54, 9 (2005).
2. T. Hashimoto et al., Mol. Biol. Evol. 12, 782 (1995).
3. E. Hilario, J. P. Gogarten, J. Mol. Evol. 46, 703 (1998).
4. D. D. Leipe, J. H. Gunderson, T. A. Nerad, M. L. Sogin, Mol. Biochem. Parasitol. 59, 41 (1993).
5. A. Regoes et al., J. Biol. Chem. 280, 30557 (2005).
6. A. J. Roger et al., Proc. Natl. Acad. Sci. U.S.A. 95, 229 (1998).
7. A. A. Best, H. G. Morrison, A. G. McArthur, M. L. Sogin, G. J. Olsen, Genome Res. 14, 1537 (2004).
8. J. E. Nixon et al., Proc. Natl. Acad. Sci. U.S.A. 99, 3701 (2002).
9. A. G. Russell, T. E. Shutt, R. F. Watkins, M. W. Gray, BMC Evol. Biol. 5, 45 (2005).
10. S. Vanacova, W. Yan, J. M. Carlton, P. J. Johnson, Proc. Natl. Acad. Sci. U.S.A. 102, 4430 (2005).
11. A. G. Simpson, E. K. MacQuarrie, A. J. Roger, Nature 419, 270 (2002).
12. R. D. Adam, Microbiol. Rev. 55, 706 (1991).
13. J. Samuelson et al., Proc. Natl. Acad. Sci. U.S.A. 102, 1548 (2005).
14. D. J. Kelleher, R. Gilmore, Glycobiology 16, 47R (2006).
15. S. Banerjee et al., Proc. Natl. Acad. Sci. U.S.A. 104, 11676 (2007).
16. A. G. McArthur et al., Mol. Biol. Evol. 18, 1455 (2001).
17. H. D. Luján et al., J. Biol. Chem. 270, 4612 (1995).
18. M. Marti et al., Mol. Biol. Cell 14, 1433 (2003).
19. R. Bernander, J. E. Palm, S. G. Svard, Cell. Microbiol. 3, 55 (2001).
20. A. Nagasaki, E. L. de Hostos, T. Q. Uyeda, J. Cell Sci. 115, 2241 (2002).
21. S. Suguri, K. Henze, L. B. Sanchez, D. V. Moore, M. Muller, J. Eukaryot. Microbiol. 48, 493 (2001).
22. J. M. Carlton et al., Science 315, 207 (2007).
23. L. Eckmann et al., J. Immunol. 164, 1478 (2000).
24. E. Li, P. Zhou, S. M. Singer, J. Immunol. 176, 516 (2006).
25. D. M. Brown, J. A. Upcroft, P. Upcroft, Mol. Biochem. Parasitol. 72, 47 (1995).
26. D. E. Feely, J. V. Schollmeyer, S. L. Erlandsen, Exp. Parasitol. 53, 145 (1982).
27. E. M. Narcisi, J. J. Paulin, M. Fechheimer, J. Parasitol. 80, 468 (1994).
28. R. D. Vale, J. Cell Biol. 163, 445 (2003).
29. H. G. Elmendorf, S. C. Rohrer, R. S. Khoury, R. E. Bouttenot, T. E. Nash, Int. J. Parasitol. 35, 1001 (2005).
30. D. L. Beck et al., Eukaryot. Cell 4, 722 (2005).
31. T. E. Nash, H. T. Lujan, M. R. Mowatt, J. T. Conrad, Infect. Immun. 69, 1922 (2001).
32. J. D. Barry, M. L. Ginger, P. Burton, R. McCulloch, Int. J. Parasitol. 33, 29 (2003).
33. B. J. Davids et al., PLoS ONE 1, e44 (2006).
34. A. L. Hopkins, C. R. Groom, Nat. Rev. Drug Discov. 1, 727 (2002).
35. J. Tachezy, L. B. Sanchez, M. Muller, Mol. Biol. Evol. 18, 1919 (2001).
36. J. E. Nixon et al., Eukaryot. Cell 1, 181 (2002).
37. J. O. Andersson, A. M. Sjogren, L. A. Davis, T. M. Embley, A. J. Roger, Curr. Biol. 13, 94 (2003).
38. S. G. Andersson, O. Karlberg, B. Canback, C. G. Kurland, Philos. Trans. R. Soc. London B Biol. Sci. 358, 165 (2003).
39. E. Bapteste et al., Proc. Natl. Acad. Sci. U.S.A. 99, 1414 (2002).
40. F. Thomarat, C. P. Vivares, M. Gouy, J. Mol. Evol. 59, 780 (2004).
41. A. G. Simpson et al., Mol. Biol. Evol. 19, 1782 (2002).
42. M. A. Ramesh, S. B. Malik, J. M. Logsdon Jr., Curr. Biol. 15, 185 (2005).
43. We acknowledge J. D. Silberman, S. (Pacocha) Preheim, S. Birkeland, M. Shapiro, V. Seshadri, and J. Woo for laboratory assistance and M. Pop, S. Salzburg, G. Hinkle, and R. Campbell for assistance with informatics and helpful discussions. We are grateful to The Institute for Genomic Research, the Sanger Centre, and the Joint Genome Institute for use of sequence data. The NIH/National Institute of Allergy and Infectious Diseases (grants AI43273 to M.L.S. and AI42488/AI51687 to F.D.G.), the G. Unger Vetlesen Foundation, the Ellison Medical Foundation, and LI-COR Biotechnology supported this work. This whole-genome shotgun project has been deposited at the DNA Databank of Japan/European Molecular Biology Laboratory/GenBank under the project accession AACB00000000. The version described in this paper is the second version, AACB02000000.

Supporting Online Material
www.sciencemag.org/cgi/content/full/317/5846/1921/DC1
Materials and Methods
Figs. S1 to S5
Tables S1 to S8
References

16 April 2007; accepted 20 August 2007
10.1126/science.1143837
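As an illustrative check of the heterozygosity figures quoted in the text (25 high-quality mismatches across more than 1.2 Mbp, reported as less than 0.01%), the arithmetic can be sketched as follows. This is a minimal sketch, not the authors' procedure: it assumes the rate is simply mismatches per examined base, and it uses 1.2 Mbp as a lower bound on the examined length, so the result is an upper bound. The function name `mismatch_rate_percent` is hypothetical.

```python
def mismatch_rate_percent(mismatches: int, examined_bp: int) -> float:
    """Return mismatches per examined base, expressed as a percentage."""
    if examined_bp <= 0:
        raise ValueError("examined_bp must be positive")
    return 100.0 * mismatches / examined_bp


# Figures quoted in the text. The examined length is given as ">1.2 Mbp",
# so 1,200,000 bp is a lower bound and the computed rate is an upper bound.
EXAMINED_BP = 1_200_000
TOTAL_MISMATCHES = 25
CODING_MISMATCHES = 8

rate = mismatch_rate_percent(TOTAL_MISMATCHES, EXAMINED_BP)
coding_rate = mismatch_rate_percent(CODING_MISMATCHES, EXAMINED_BP)

print(f"all mismatches:    {rate:.4f}% of examined bases")
print(f"coding mismatches: {coding_rate:.4f}% of examined bases")
print("below the 0.01% bound:", rate < 0.01)
```

Under these assumptions the overall rate comes to roughly 0.002%, which is consistent with the stated bound of less than 0.01%.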
