
Himmelfarb Health Sciences Library, The George Washington University Health Sciences Research Commons Microbiology, Immunology, and Tropical Medicine Microbiology, Immunology, and Tropical Medicine Faculty Publications

3-2014

Genome of the human hookworm Necator americanus

Yat T. Tang

Xin Gao

Bruce A. Rosa

Sahar Abubucker

Kymberlie Hallsworth-Pepin

See next page for additional authors

Follow this and additional works at: http://hsrc.himmelfarb.gwu.edu/smhs_microbio_facpubs Part of the Medical Immunology Commons, Medical Microbiology Commons, Parasitic Diseases Commons, and the Parasitology Commons

Recommended Citation

Tang, Y.T., Gao, X., Rosa, B.A., Abubucker, S., Hallsworth-Pepin, K. et al. (2014). Genome of the human hookworm Necator americanus. Nature Genetics, 46(3), 261-269.

This Journal Article is brought to you for free and open access by the Microbiology, Immunology, and Tropical Medicine at Health Sciences Research Commons. It has been accepted for inclusion in Microbiology, Immunology, and Tropical Medicine Faculty Publications by an authorized administrator of Health Sciences Research Commons. For more information, please contact [email protected].

Authors

Yat T. Tang, Xin Gao, Bruce A. Rosa, Sahar Abubucker, Kymberlie Hallsworth-Pepin, Peter J. Hotez, Jeffrey M. Bethony, and +23 additional authors

This journal article is available at Health Sciences Research Commons: http://hsrc.himmelfarb.gwu.edu/smhs_microbio_facpubs/107

© 2014 Nature America, Inc. All rights reserved.

ARTICLES

OPEN

Genome of the human hookworm Necator americanus

Yat T Tang, Xin Gao, Bruce A Rosa, Sahar Abubucker, Kymberlie Hallsworth-Pepin, John Martin, Rahul Tyagi, Esley Heizer, Xu Zhang, Veena Bhonagiri-Palsikar, Patrick Minx, Wesley C Warren, Qi Wang, Bin Zhan, Peter J Hotez, Paul W Sternberg, Annette Dougall, Soraya Torres Gaze, Jason Mulvenna, Javier Sotillo, Shoba Ranganathan, Elida M Rabelo, Richard K Wilson, Philip L Felgner, Jeffrey Bethony, John M Hawdon, Robin B Gasser, Alex Loukas & Makedonka Mitreva

The Genome Institute at Washington University, Washington University School of Medicine, Saint Louis, Missouri, USA. Division of Biology, California Institute of Technology, Pasadena, California, USA. Department of Microbiology, Immunology and Tropical Medicine, The George Washington University, Washington, DC, USA. Sabin Vaccine Institute and Texas Children's Hospital Center for Vaccine Development, Houston, Texas, USA. Department of Pediatrics, National School of Tropical Medicine, Baylor College of Medicine, Houston, Texas, USA. Howard Hughes Medical Institute, Chevy Chase, Maryland, USA. Centre for Biodiscovery and Molecular Development of Therapeutics, Queensland Tropical Health Alliance, James Cook University, Cairns, Queensland, Australia. Queensland Institute of Medical Research, Brisbane, Queensland, Australia. Department of Chemistry and Biomolecular Sciences, Macquarie University, Sydney, New South Wales, Australia. Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore. Departamento de Parasitologia, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Minas Gerais, Brazil. Department of Genetics, Washington University School of Medicine, Saint Louis, Missouri, USA. Division of Infectious Diseases, Department of Medicine, University of California, Irvine, Irvine, California, USA. Faculty of Veterinary Science, The University of Melbourne, Parkville, Victoria, Australia. Division of Infectious Diseases, Department of Internal Medicine, Washington University School of Medicine, Saint Louis, Missouri, USA. Authors marked 16 in the published author list contributed equally to this work. Correspondence should be addressed to M.M. ([email protected]).

Received 10 June 2013; accepted 18 December 2013; published online 19 January 2014; doi:10.1038/ng.287

Nature Genetics, advance online publication

The hookworm Necator americanus is the predominant soil-transmitted human parasite. Adult worms feed on blood in the small intestine, causing iron-deficiency anemia, malnutrition, growth and development stunting in children, and severe morbidity and mortality during pregnancy in women. We report sequencing and assembly of the N. americanus genome (244 Mb, 19,151 genes). Characterization of this first hookworm genome sequence identified genes orchestrating the hookworm's invasion of the human host, genes involved in blood feeding and development, and genes encoding proteins that represent new potential drug targets against hookworms. N. americanus has undergone a considerable and unique expansion of immunomodulator proteins, some of which we highlight as potential treatments against inflammatory diseases. We also used a protein microarray to demonstrate a postgenomic application of the hookworm genome sequence. This genome provides an invaluable resource to boost ongoing efforts toward fundamental and applied postgenomic research, including the development of new methods to control hookworm and human immunological diseases.

Soil-transmitted helminths (STHs), including Ascaris, Trichuris and hookworms, cause neglected tropical diseases affecting >1 billion people worldwide. Hookworms alone infect approximately 700 million people, primarily in disadvantaged communities in tropical and subtropical regions, causing a disease burden of 1.5–22.1 million disability-adjusted life years. N. americanus represents ~85% of all hookworm infections and causes necatoriasis, characterized clinically by anemia, malnutrition in pregnant women and an impairment of cognitive and/or physical development in children.

The life cycle of N. americanus commences with eggs being shed in the feces of infected people. Eggs embryonate in soil under favorable conditions, and then the first-stage larvae hatch, feed on environmental microbes and molt twice to become infective third-stage larvae (iL3). These larvae infect the human host by skin penetration, enter subcutaneous blood and lymph vessels, and travel via the circulation to the lungs. The iL3 break into the alveoli and migrate via the trachea to the oropharynx, after which they are swallowed and travel to the small intestine, where they develop to become dioecious adults. The adult worms (~1 cm long) attach to the mucosa, where they feed on blood (up to 30 µl per day per worm), and can survive in the human host for up to a decade. The pre-patent period of N. americanus is 4–8 weeks, and a female worm can produce up to 10,000 eggs per day.

New methods to control hookworm disease are urgently needed. Present therapy relies mainly on mass treatment with albendazole, but repeated and excessive use of this agent has the potential to lead to treatment failures and drug resistance. Recent indications of reduced cure rates in infected humans imply an urgent need for new intervention strategies. Early attempts to use bioinformatic approaches for the discovery of immunogens were hampered by a lack of understanding of the molecular biology of N. americanus and other hookworms and by the absence of genome and proteome sequences. A recent study has shown that comparative genomics facilitates the characterization and prioritization of anthelmintic targets, which results in a higher hit rate than conventional approaches.

In addition to a need for anti-hookworm vaccines in countries with high rates of hookworm infections, hookworms and other helminths are being explored as treatments (probiotics) against immunological diseases in humans in many industrialized countries where hookworm infections are not endemic. Recent studies indicate that hookworms suppress the production of pro-inflammatory molecules and promote anti-inflammatory and wound-healing properties, suggesting a mechanism by which worms reside for long periods in humans and suppress autoimmune and allergic diseases. Indeed, recombinant hookworm proteins have been tested in clinical trials for noninfectious diseases.

We sequenced, assembled and characterized the N. americanus genome and compared it with those of other nematodes and the human host. Bioinformatic analyses identified the salient molecular groups of protein-coding genes, some of which may represent new intervention targets. The production and screening of a hookworm protein microarray revealed previously undescribed features of the immune response to the parasite and enabled a postgenomic exploration of the genome sequence. In the postgenomic analysis, we identified molecules that have low similarity to proteins in other species but are recognized by all infected individuals and therefore have high diagnostic potential.

RESULTS

Genome features

The nuclear genome of N. americanus (244 megabases (Mb)) was assembled, with 11.4% (1,336) of the supercontigs (n = 11,713; ≥1 kb) comprising 90% of the genome. The 244-Mb sequence was estimated to represent 92% of the N. americanus genome (Table 1, Supplementary Figs. 1–3 and Supplementary Note). The GC content was 40.2%, the amino acid composition was comparable to that of other species (including five nematodes, the host and two outgroups; Supplementary Table 1), and the repeat content was 23.5%. In total, 669 repeat families were predicted and annotated (Supplementary Table 2 and Supplementary Note). The protein-encoding genes predicted (n = 19,151) represent 33.7% of the genome at an average density of 78.5 genes per Mb and a GC content of 45.8%. Compared to those of Caenorhabditis elegans, N. americanus exons were shorter and the introns were longer (Fig. 1a), but the average intron length and count for orthologous genes between the two species was not significantly different (P = 0.65 and 0.69, respectively; Supplementary Table 3).

Table 1 Summary of N. americanus genomic features

Assembly statistics
Estimated genome size (Mb): 244
Total number of supercontigs (≥1 kb): 11,713
Total number of base pairs (bp) in supercontigs: 244,009,025
Number of N50 supercontigs (a): 283
N50 supercontig length (bp) (a): 213,095
Number of N90 supercontigs (a): 1,336
N90 supercontig length (bp) (a): 29,214
GC content of whole genome: 40.20%
Repetitive sequences: 23.50%

Protein-coding loci
Total number of protein-coding genes: 19,151
Avg. gene locus footprint (bp): 4,289
Avg. number of exons per gene: 6.4
Avg. exon size (bp): 125
Avg. intron size (bp): 642
Avg. intergenic space (bp): 6,631

(a) N50 and N90 respectively denote 50% and 90% of all nucleotides in the assembly. 50% of the genome is in 283 supercontigs and with a minimum length of 213 kb; 90% of the genome is in 1,336 supercontigs and with a minimum length of 29 kb.
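The N50 and N90 statistics reported for the assembly can be illustrated with a minimal Python sketch. It assumes only a list of supercontig lengths; the function name `nx_stats` and the toy lengths are hypothetical and are not taken from the authors' pipeline.

```python
def nx_stats(lengths, fraction):
    """Return (count, min_length) for the largest supercontigs that
    together cover `fraction` of all assembled bases.

    With fraction=0.5 this gives the number of N50 supercontigs and the
    N50 length; with fraction=0.9 it gives the N90 equivalents.
    """
    ordered = sorted(lengths, reverse=True)   # largest supercontigs first
    target = fraction * sum(ordered)          # bases that must be covered
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= target:
            return count, length
    raise ValueError("lengths must be non-empty")


# Toy assembly (hypothetical lengths in bp, total 300)
toy_lengths = [100, 80, 60, 40, 20]
n50_count, n50_length = nx_stats(toy_lengths, 0.5)
n90_count, n90_length = nx_stats(toy_lengths, 0.9)
print(n50_count, n50_length, n90_count, n90_length)
```

Applied to the real supercontig lengths, the same calculation would be expected to give the Table 1 values (283 supercontigs of at least 213,095 bp for N50, and 1,336 supercontigs of at least 29,214 bp for N90).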

However, introns in N. americanus genes that were orthologous to C. elegans genes were significantly longer than introns in nonorthologous genes (Fig. 1c). This may indicate a diversity of function for these genes, as longer introns are thought to contain functional elements in addition to what might be regarded as 'normal' intron structure. Most genes (82.6%) were confirmed using RNA-sequencing (RNA-seq) data from the iL3 and adult stages of N. americanus (two biological replicates per stage), and 3.7% and 6.5% were overexpressed in these stages, respectively (Supplementary Table 3). N. americanus iL3-overexpressed genes had longer introns than adult-overexpressed genes (P = 5.1 × 10^-7; Supplementary Figs. 6 and 7), which may indicate a greater diversity of regulation for these gene sets. Positional bias was observed for intron length, which was comparable to lengths for orthologous genes in N. americanus and C. elegans (P < 10^-15 and 2 × 10^-15, respectively), suggesting that species-specific [...] position-specific intron [...].

A total of 3,223 N. americanus genes were predicted to be trans-spliced, of which 818 had conserved gene order and orientation with 373 C. elegans operons (Fig. 1d). The expression profiles of genes within operons were significantly more similar to one another than to those of random subsets of non-operon genes (P < 0.0001), supporting the idea that they are co-transcribed under similar regulatory control.

Alternative splicing was detected for 24.6% (4,712) of the genes, of which ~68.3% have orthologs in C. elegans. Among N. americanus genes with C. elegans orthologs, the alternatively spliced genes were more likely than other genes to belong to orthologous groups for which more than half of the C. elegans genes were also alternatively spliced (P = 0.037, binomial distribution test). As expected, genes associated with alternative splicing had a higher number of exons than those without.

The N. americanus predicted secretome (classical secretion, 1,590 proteins; nonclassical secretion, 4,785 proteins) represented 33% of the deduced proteome. Functional annotation of predicted proteins on the basis of sequence comparisons identified 4,961 unique domains and 1,411 Gene Ontology terms for 57% and 44% of the genes, respectively, and annotations were provided for 68% of the predicted proteins.

Transcript expression in infective and parasitic stages

Hookworms spend a considerable amount of time in the external environment as free-living larvae before transitioning to parasitism. Differences in gene expression between these stages reflect this developmental progression. Of the 1,948 differentially expressed genes, 36% were significantly overexpressed (according to EdgeR, q = 0.05) in iL3, and 64% in adult. Compared to iL3-overexpressed genes, nearly twice as many of the adult-overexpressed genes were specific to N. americanus (58% compared to 32%), indicating that adult-overexpressed genes are more likely to be related to parasitism than to the nonparasitic iL3 stage.

Among the iL3-overexpressed genes, eight molecular functions were over-represented (P < 0.01), including signal transduction, transmembrane receptor activity and anion transporter activity, reflecting the ability of iL3 to adapt to a complex environment and infect a suitable host (Fig. 2a and Supplementary Note). This finding is supported by the enrichment of genes encoding G protein–coupled receptor proteins among iL3-overexpressed genes (P = 4.1 × 10^-8, Supplementary Table 6) but not among adult-overexpressed genes.
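The binomial distribution test cited for alternatively spliced genes (P = 0.037) can be illustrated with a short sketch. This is not the authors' code: the function name `binomial_upper_tail` and the example counts are hypothetical, and the sketch shows only the general form of a one-sided binomial test, in which the P value is the probability of observing at least the recorded number of successes under a null success rate.

```python
from math import comb


def binomial_upper_tail(successes, trials, p_null):
    """One-sided binomial P value: P(X >= successes) for X ~ Binomial(trials, p_null)."""
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie between 0 and trials")
    if not 0.0 <= p_null <= 1.0:
        raise ValueError("p_null must be a probability")
    return sum(
        comb(trials, k) * p_null**k * (1.0 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical example: 8 of 10 genes fall in the category of interest
# when the background rate is 50%.
p_value = binomial_upper_tail(8, 10, 0.5)
print(f"{p_value:.7f}")
```

A small P value from such a test indicates that the category is over-represented relative to the assumed background rate; the actual counts and background rate used in the study are not given in this section.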

© 2014 Nature America, Inc. All rights reserved. yrlss n ctlss ( catalases proteases, and including hydrolases enzymes of spectrum broad a for transcripts of stages arrested in observed as primed, be to likely is but active not is expression gene is, (that invasion host after indicate that transcription factors are poised might for rapid gene activity expression factor–related transcription of enrichment iL3-stage scription factors were iL3 overexpressed ( tran expressed and by of that the the differentially most fact (92.5%) (GO:0003700; P activity” factor transcription binding DNA specific iL3, as evidenced by the enrichment of genes annotated with “sequence- is likely to activities be in high regulatory complexity of transcription ( genes iL3-overexpressed among enriched also was with In genes. nonorthologous to compared position intron every ( significantly ( 1 Figure Nature Ge Nature of diet high-protein a to worms adult of adaptation nutritional the P a c = 1.7 × 10 × 1.7 =

In contrast, in the adult stage, we detected overexpression of of overexpression detected we stage, adult the in contrast, In Average intron length (bp) Relative frequency (%) < 10 15 150 200 250 300 350 400 450 500 1 × 10 Avg. exon length (bp) 0 5 C. elegans C. 100 150 200 250 40 9 8 7 6 5 4 3 2 1

N. americanus Organization of of Organization N. americanus All All −10 N. americanus n C. elegans Not orth.to C. elegans Exon length −14 100 P etics ) shorter and the average intron is significantly ( significantly is intron average the and ) shorter

are shown on the the on shown are Intron length(bp) < N. americanus ) and genes with alternative splicing ( splicing alternative with genes and ) 1 × 10

250 genes notorth.to orth.to ADVANCE ONLINE PUBLICATION ONLINE ADVANCE genes −10 N. americanus N. 500 ) more introns than nonorthologous genes in both species. ( species. both in genes nonorthologous than introns ) more C. elegans C. elegans genes Intron length Orth. to upeetr Tbe 6 Table Supplementary C. elegans Intron position(5 C. elegans C. 2,000 gene features compared to to compared features gene

N. americanus Relative frequency (%) Not orth.to 0 2 4 6 C. elegans C. elegans chromosomes. 20 Supplementary NoteSupplementary ′ to All All C. elegans 3 ′ N. americanus ) C. elegans 50 C. elegans C. notorth.to orth.to Exon length(bp) 100 N. americanus genes

P ). This reflects reflects This ). Orth. to P N. americanus genes = 0.008). The The 0.008). = a < 2 × 10 × 2 < 250 – genes N. americanus 2 c P 0 , error bars indicate s.e.m. ( s.e.m. indicate bars , error ).

< 500 1 × 10 ). ). This C. elegans C. −13 200 250 300 350 400 450 2,000 10

), ),

- −10 (bp) length intron Avg. ) longer than in in than ) longer environment in the host the in environment protect the adult stage from the digestive and immunologically hostile 7 Table comparisons; ( iL3 in than adult in often more overexpressed N. americanus proteases (325 of 592) predicted to be secreted. Proteases, particularly ( secretome predicted the to substantially contributed proteases and Proteases genes, SP-containing genes. among enriched were inhibitors protease iL3-overexpressed and among enriched transcripts ( proteins domain–containing transmembrane were among genes enriched ( adult-overexpressed that transcripts had secretion for (SP) peptide signal a with Proteins blood . . ( d b 0 Supplementary Table 6 Supplementary

a Average intron length (bp) C. elegans ) The average exon in in exon average ) The 150 200 250 300 350 400 450 500 550 2 1 3 ( ). Serine-type endopeptidase inhibitor activity, required to to required activity, inhibitor endopeptidase Serine-type ). c Fig. Fig. ) In orthologous genes from from genes orthologous ) In iL3-specific genes to Not orthologous C. elegans d 20,040,000 (gene orderconsidered) N. americanus ) ) C. elegans C. –specific –specific proteases with no orthologs in 2 Fig. 2b Fig. N. americanus N. N. americanus , , 50,100,000 Supplementary Fig. 9 Fig. Supplementary 4 90,180,000 , c N. americanus Not orthologousto genes. ( genes. N. americanus N. , , 70,140,000 Average numberofintronspergene 2 Supplementary Note Supplementary and and 2 genes , was adult enriched ( enriched adult was , All genes 40,080,000 genes that are in operons and conserved conserved and operons in are that genes 10,020,000 genes Adult-specific Supplementary Note Supplementary b 5 ) Orthologous (orth.) genes have have genes (orth.) ) Orthologous (gene orderandorientationconsidered) N. americanus

C. elegans C. All genes Adult-specific genes genes is significantly significantly is genes iL3-specific genes and and 6 30,060,000 , introns are longer at longer are , introns Supplementary Note Supplementary C. elegans and and 60,120,000 P P N. americanus Orthologous to = 1.2 × 10 × 1.2 = P P = 1.6 × 10 × 1.6 = s e l c i t r A C. elegans Orthologous to < 10 < 10 < ), with 55% of all of all 55% with ), Supplementary Supplementary 100,200,000 C. C. elegans genes 7 −15 80,160,000 −15

), ), whereas for both both for −4 −8 ). The ). , , were ) had had ) 8 ). ).  © 2014 Nature America, Inc. All rights reserved. tions, including blood coagulation and immunomodulation, the the immunomodulation, and func coagulation blood physiological including diverse in tions, involved are serine humans that in Given clans. proteases inhibitor protease 17 of 8 for accounting in SPIs 87 Weannotated (SPIs). inhibitors protease anticoagulants hookworm Known by proteases. proteins of blood degradation and process tion ( adult enrichment of genes encoding structural constituents of the cuticle s e l c i t r A  in SPIs of diversity with with ( factor. TF, nodes. root transcription second-order (iii) and species, comparison the of two least at in stages cycle life adult or iL3 in and stages life-cycle specific in enriched terms ontology gene 2 Figure b a P transporter activity Transmembran = 1.7 × 10

Blood feeding in adult hookworms is facilitated by in is adult feeding an hookworms facilitated Blood anticoagula N. americanus genes with at least one ortholog in C. elegans (log ratio) C. elegans C. 2 A. suum ligand-gated ion channel activity Function Extracellular

More C. elegans More N. americanus B. malayi –3 –2 –1 Part of Negatively regulates Is Molecular functions enriched among among enriched functions Molecular 0 1 2 3 C. elegans a

−5 T. spiralis e ) also relates to protecting the parasite from the host orthologs ( orthologs iL3 only L. loa H. sapiens Transporter activity Over-represented in Under-represented in Adult-enriched iL3-enriched in –10 Cysteine Aspartic N. americanus N. transporter activity Structural molecule transmembrane b ) or with no no with ) or 2 4 Anion activity are dominated by single-domain serine serine single-domain by dominated are N. americanus N. constituent of –5 Structural N. americanus N. americanus iL3 enriched cuticle Serine Metalloendopeptidase

N. americanus

N. americanus log is probably crucial not only for for only not crucial probably is C. elegans C. 2 foldchange 0

, (ii) categories significantly ( significantly categories , (ii) N. americanus N. Adult enriche Catalytic activity orthologs ( orthologs Transferase activity Transferase activity containing groups Oxidoreductase 5 hexosyl groups d phosphorus- transferring transferring N. americanus N. Unknown Threonine activity c genes, stage-enriched genes and the the and genes stage-enriched genes, polymerase activity RNA-directed DNA threonine kinase ). Protein serine/ N. americanus N. activity 10 2 Molecular function 3 Hydrolase activity . Adult only - - , Endopeptidase

P

≤ activity adult adult by produced are SPIs of types multiple that suggest findings our but ( molecules Kunitz-type were hookworms in ( genes iL3-overexpressed ( genes adult-overexpressed the growth delay ing hookworm-associated prominent are elastase, including and proteases, chymotrypsin serine trypsin, where intestine, small the in enzymes from worms adult protect to likely are SPIs Specifically, host. the in survival but for also long-term feeding blood during anticoagulation 1 × 10 compared to other species. Included are (i) categories enriched in the the in enriched categories (i) are Included species. other to compared endopepsidase activity Metalloendopeptidase c inhibitor activity Endopeptidase

N. N. americanus Number of N. americanus orthologs 10 6 0 2 4 8 Cysteine-type −5 inhibitor activity peptidase activity endopeptidase ) over-represented or depleted in in depleted or ) over-represented activity Serine-type iL3 only Serine-type Hormone activity a DVANCE ONLINE PUBLICATION ONLINE DVANCE Cysteine Aspartic –10 b N. americanus N. , in the human host. A mass spectrometry–based spectrometry–based A mass host. human the in c ) Expression profiling of of profiling ) Expression P = 0.35). Most of the SPIs characterized characterized SPIs the of Most 0.35). = –5 Serine Metalloendopeptidase iL3 enriched Binding P degradome. ( degradome. = 3.9 × 10 × 3.9 = Zinc ionbinding log Lipid binding Nucleic acid 2 binding foldchange 0 2 RNA binding N. americanus N. 2 Sequence-speci c . . SPIs among were enriched Adult enriched DNA binding N. americanus N. DNA binding −8 Supplementary Note Supplementary a ) ‘Molecular function’ function’ ) ‘Molecular ), but not among the the among not but ),

Signal transducer 5 Nature Ge Nature Unknown Threonine 2 activity Protein binding 5 Sequence-speci compared to compared Transmembrane receptor activity , thus mediat thus , DNA-bindingT Hormone activity activity proteases proteases 10 n etics

Adult only F c ), ), - © 2014 Nature America, Inc. All rights reserved. tory factor (NIF), hookworm platelet inhibitor (HPI), galectins, galectins, (HPI), inhibitor platelet hookworm (NIF), factor tory inhibi neutrophil (MIF), factor inhibitory migration macrophage include interactions immunomodulatory host-parasite in involved in genes Additional the evolution of parasitism nematode of evolution the factor- growth (TGF- transforming encoding genes of in immunogenic/ homologs three encoding proteins genes predicted immunomodulatory previously to ogous targets vaccine or drug ( selective for candidates as serve might Note diseases inflammatory human candidates for vaccine extensively or drug studied been hookworm have as proteins SCP/TAPS interactions. in proteins canus SCP/TAPS of expansion large spe The parasite ciation. after proteins SCP/TAPS of expansion independent contained which of some only ( groups multiple into pro teins before SCP/TAPS classified originated similarity sequence have Primary may parasitism. proteins SCP/TAPS nematode that in orthologs of repertoire limited a of presence The Methods). Online see (MCL); clustering Markov to (according 137 ( overexpressed adult were nematodes. other to of the compared of 137) (69 half More than family protein this of expansion in proteins SCP/TAPS 137 ( interactions host-parasite in Table 5 Supplementary relationships. functional sequent ( functions molecular enriched for structure hierarchical the and comparison, penetration tissue americanus N. to addition in recruitment eosinophil of inhibition and eotaxin of cleavage the prominent most ( the proteases as metalloendopeptidases and ( studied Sixty cies parasitism. 
facilitate that of percent molecules identified we a distant species, and host its create nematodes, other from and genomes with genome privilege development immune sustain of which site modu defenses to products, ability immune secretory their host of evade because and years live to late several able for are host hookworms the Adult in host. the in immunity sterile americanus N. Pathogenesis Table 6 ( proteolysis to related many terms, Ontology Gene ( SPs ( Table 7 worms (Online Methods), and the proteins detected ( adult whole using was performed analysis proteomics Nature Ge Nature P Supplementary Table 5 Table Supplementary = 4.9 × 10 × 4.9 = We identified a total of 336 336 of total a We identified IPR014044; InterPro (SCP/TAPS; SCP/Tpx-1/Ag5/PR-1/Sc7 N. americanus N. ). The 96 96 The ). P upeetr Note Supplementary suggests multiple, possibly distinct roles in host-parasite host-parasite in roles distinct possibly multiple, suggests β 47 10 × 4.7 = and and and ), an important protein in modulation of inflammation and and inflammation of modulation in protein important an ), n N. americanus N. Fig. 2 Fig. Fig. 2 Fig. Supplementary Fig. 10 Supplementary Supplementary Note Supplementary etics −7

and is the only blood-feeding nematode included in the the in included nematode blood-feeding only the is upeetr Tbe 8 Table Supplementary causes chronic disease and does not usually induce induce usually not does and disease chronic causes N. americanus N. ) and SPIs ( SPIs and ) a a

) revealed shared and unique patterns and sub and patterns unique and shared revealed ) ); these proteases are probably associated with with associated probably are proteases these ); −11 SCP/TAPS proteins had orthologs in in orthologs had proteins SCP/TAPS immunobiology N. americanus N. ADVANCE ONLINE PUBLICATION ONLINE ADVANCE ) and proteins representing a wide range of of range wide a representing proteins and ) ) is a protein family inferred to be involved involved be to inferred family a protein ) is ). genes had an ortholog in the other spe other the in ortholog an had genes P 2 N. americanus N. Fig. 3b Fig. 6 P = 1.8 × 10 × 1.8 = ). Comparative analysis identified identified analysis Comparative ). . By comparing the the comparing By . < 10 < –specific SCP/TAPS identified here here identified SCP/TAPS –specific N. americanus N. Supplementary Note Supplementary N. americanus N. . elegans C. ). encoding proteins inferred to be to inferred proteins encoding ) ) were for also enriched proteases −15 , 1 c

5 Ascaris and and of or stroke or 2 ; ; 3 8 3

n hemoglobinolysis and Fig. 3 Fig. −4 hookworm ( , , Supplementary Table 5 Table Supplementary 1 ), as well as proteins with with proteins as well as ), Supplementary Fig. 12 Fig. Supplementary upeetr Fg 11 Fig. Supplementary , representing a fourfold fourfold a representing , 3 ih hi excretory- their with 3 members, suggesting suggesting members,

0 genes that are orthol are that genes a suum and as therapeutics therapeutics as and ), and only 6 of the the of 6 only and ), SCP/TAPS proteins proteins SCP/TAPS 3 1 C. elegans C. ( Supplementary Supplementary

2 Supplementary Supplementary

Supplementary Supplementary . americanus N. . americanus N. disease 4 N. N. americanus , along with with along , ). There were were There ). N. ameri N. C. elegans C. suggests suggests 2 2 3 9 7 ), ), ). ). β 2 ------. ,

Prospects for new interventions

Historically, anthelmintic drugs have been discovered using in vivo and in vitro compound screens. Recent comparative ‘omics’ studies (accompanied by experimental screening) in multiple nematode species have shown that genomic and transcriptomic data can be used to prioritize targets and raise the hit rate compared with conventional approaches. Hence, the availability of the N. americanus genome is expected to enable comparative genomic studies for the genomic prediction and prioritization of chemotherapeutic targets.

As more than half (53%) of all current drug targets consist of rhodopsin-like G protein–coupled receptors (GPCRs), nuclear receptors (NRs), ligand-gated ion channels (LGICs), voltage-gated ion channels (VGICs) and kinases, we investigated these protein groups in the N. americanus genome to identify potential therapeutic targets (Supplementary Note).

GPCRs are attractive drug targets owing to their importance in signal transduction. We identified 272 GPCR genes in N. americanus, whereas there are nearly 1,700 GPCR genes in C. elegans. Although GPCRs are challenging to characterize at the primary sequence level and the N. americanus genome is in a draft state, there may be a biological explanation for this difference in the number of GPCRs identified, including frequent amplifications of several subfamilies of GPCRs in C. elegans relative to the closely related Caenorhabditis briggsae. Three of the five GRAFS families of GPCRs (glutamate, rhodopsin and frizzled, but not adhesion or secretin) were found in N. americanus (Supplementary Table 9). The putative GPCRs were enriched for iL3 overexpression (30 genes; P = 5.1 × 10^−7), with only one gene being adult overexpressed (P = 4.1 × 10^−8 for under-representation).

N. americanus encodes members of both major ion-channel categories (LGICs and VGICs); 224 LGICs belonging to two of the three subfamilies of LGIC (Cys-loop family and glutamate-activated cation channels) were identified, compared with 159 LGIC-encoding genes in C. elegans. Genes encoding nicotinic acetylcholine receptor subunits (nAChR) of Cys-loop family members were also found. Nematodes have a much larger number of nAChR subunits than examined vertebrates (17 nAChR-encoding genes in mammals and birds, compared with 29 nAChR subunits in C. elegans), and several anthelmintics such as levamisole and monepantel have been developed to exploit these differences. Ivermectin targets a subunit of glutamate-gated chloride channels that are present in N. americanus (eight genes; InterPro IPR015680); three of these genes clustered with six glutamate-gated chloride channel genes in C. elegans (avr-14, avr-15, glc-1 to glc-4; Supplementary Fig. 13). The lack of a clear ortholog of the ivermectin-sensitive genes within the N. americanus genome, and the underlying sequence diversity at a position correlated with direct activation by ivermectin, may explain the relative ivermectin insensitivity of N. americanus compared to other nematodes (Supplementary Note).

VGICs include sodium, potassium and calcium channels and are anthelmintic targets (for example, emodepside inhibits SLO-1 in C. elegans and parasitic nematodes such as A. suum). N. americanus encodes 48 VGICs (fewer than C. elegans), including members from the major families such as voltage-gated calcium channels, voltage-gated chloride channels and 6-transmembrane (6TM) potassium channels (Supplementary Note). As in other nematodes, voltage-gated sodium channels were not present in N. americanus.

Protein kinases are involved in numerous signal transduction pathways that regulate biological processes, and they have been a major focus for drug discovery.

Of the 274 N. americanus genes encoding kinases, 15 and 12 were overexpressed in iL3 and adults, respectively. Gene expression, tissue expression, conservation among nematodes and dissimilarity to human orthologs were used for prioritization of candidate targets (Supplementary Table 10). To evaluate current drugs and inhibitors that target homologous kinases, we also prioritized compounds from a publicly available database (Online Methods). The highest-scoring compound was a tyrosine kinase inhibitor approved for treating chronic myelogenous leukemia. A total of 233 other compounds had the second-highest score (Supplementary Table 11), indicating that these existing drugs might be repurposed for treating neglected tropical diseases, thus minimizing development time and cost.

Chokepoints in metabolic pathways were analyzed and prioritized to identify further drug targets. N. americanus encodes at least 3,976 protein-coding genes associated with 3,265 KEGG orthology terms (Supplementary Table 7), 938 (24%) of which are involved in metabolic pathways (Supplementary Fig. 14), representing 32 potentially complete modules. A total of 34% of the metabolic pathway genes were classified as chokepoints (Supplementary Table 12), 120 of which were conserved among nematodes and non-nematode species used in the comparative analysis. Chokepoint prioritization, along with requirements for a chokepoint to be an expression bottleneck and for lethality upon RNAi knockdown of the orthologous gene in C. elegans, prioritized eight enzymes encoded by ten distinct genes (Supplementary Note and Supplementary Tables 12 and 13). Among the prioritized chokepoints is adenylosuccinate lyase (ASL; EC 4.3.2.2) (Supplementary Figs. 15–17), an enzyme involved in the purine metabolism pathway (KEGG pathway ko00230) and a chokepoint in the adenine ribonucleotide biosynthesis pathway (KEGG module M00049). To identify chokepoint inhibitors for repurposing, we assessed compounds from publicly available databases (449 target-compound pairs) using the same method as for kinase inhibitors. The highest-ranked candidates include compounds such as azathioprine (DrugBank DB00993), a prodrug that is converted into mercaptopurine (DrugBank DB01033) to inhibit purine metabolism and DNA synthesis (Supplementary Note and Supplementary Table 14).

Figure 3 SCP/TAPS gene family expansion in N. americanus. (a) SCPs/TAPS are enriched in the adult stage of N. americanus. (b) Schematic representation of gene structure from SCP/TAPS family members. All SCP/TAPS proteins are grouped according to their number of CAP domains and regions outside the CAP domains: single CAP domain, double CAP domain, single CAP plus miscellaneous, and double CAP plus miscellaneous. (c) Neighbor-joining clustering of all the N. americanus and C. elegans SCP/TAPS genes on the basis of their full-length primary sequence and ungapped similarity of the CAP domain. Data on domain representation, secretion type and stage of expression are included.

[Figure 3 graphic not reproduced. Panel a axes: adult expression level (DCPM) versus iL3 expression level (DCPM). Panel b domain architectures include single CAP domain (99 genes) and double CAP domain (25 genes). Panel c columns: Genes, Domain, Secretion, Expression.]

Postgenomic exploration of the N. americanus immunome

The N. americanus genome enables development of postgenomic tools to investigate the immunobiology of human hookworm disease and accelerate antigen discovery for the development of vaccines and diagnostics. We developed a protein microarray containing 564 N. americanus recombinant proteins inferred from the genome (Supplementary Table 15 and Supplementary Note). The microarray was probed with sera from individuals aged 4–66 years who were residents in an N. americanus–endemic area of northeastern Minas Gerais state in Brazil. This pilot study based on 200 individuals from the youngest (<14 years of age) and oldest (>45 years of age) age strata identified 22 antigens that were significant (P ≤ 0.05) targets of anti-hookworm immune responses (Fig. 4 and Supplementary Fig. 18). Older individuals showed stronger immunoglobin G (IgG) responses to a larger number of secreted antigens, but these antibodies seem to have no role in killing the parasite or protecting against heavy infection. Hence, unlike other STHs of humans, protective immunity to N. americanus does not seem to develop in most individuals during adolescence. This is consistent with observations that, in Necator-endemic areas, older people often harbor the heaviest-intensity infections. Younger individuals showed IgG responses against fewer antigens, usually with lower intensity. Thus, although antibodies are a key feature of the immune response to N. americanus and increase with host age, they do not protect individuals from infection over time.

There are probably multiple factors contributing to the absence of overall protective immunity to hookworm infection, in contrast to the age-acquired protective immunity observed with other STH infections. Detailed kinetic studies of the responses of IgG subclasses and IgE to hookworm antigens represented on our protein microarray will be required to better understand the roles of these antibodies in the acquisition of immunity against hookworm. The protein microarray can be probed with sera from individuals with different genetic backgrounds and histories of exposure to hookworm, as well as from animals rendered immunologically resistant to hookworm infection by vaccination with irradiated iL3 (ref. 55), thereby facilitating efforts to develop an efficacious vaccine against hookworm disease. Furthermore, secreted proteins that are recognized by most or all the infected individuals, and have no or weak homology to other nematode species, represent antigens that might form the basis of sensitive and specific serodiagnostic tests (Supplementary Note; for example, Supplementary Fig. 19).

Figure 4 Serum responses to N. americanus antigens vary with age and infection intensity. The heat map shows the immunoreactivity of 22 antigens to the IgG antibodies from groups of uninfected individuals, infected children <14 years old and infected adults >45 years old (n = 8 in each group). Duplicate crude somatic extracts from iL3 and adult stages were included as control naive antigens. Every other row represents an individual recombinant in vitro translation product. The bar chart shows the mean immunoreactivity of the three groups for each antigen, measured by mean fluorescence intensity. ‘iL3’ and ‘Adult’ labels denote stage-specific expression of indicated antigens, measured by RNA-seq data. Significant differences in antibody responses between human adults and children were detected with Student’s t-test. *P < 0.05; **P < 0.01; ***P < 0.001; NS, no significant difference.

DISCUSSION

N. americanus is responsible for causing more disease worldwide than any other STH. The characterization of the first genome of a human hookworm is expected to facilitate future fundamental explorations of the epidemiology and evolutionary biology of hookworms as well as efforts toward the development of therapeutics to combat hookworm disease. As N. americanus is the first hookworm whose genome has been sequenced, the data presented here provide a first insight into blood-feeding nematodes of major importance for human health. Our postgenomic exploration of inferred proteomic information highlights the utility of the draft genome sequence for understanding the immunobiology of human hookworm disease and accelerating the development of vaccines and diagnostics.

It is also pertinent to note that hookworms are garnering interest for their therapeutic properties against a range of noninfectious inflammatory diseases of humans. The genome sequence therefore represents a veritable pharmacopoeia; indeed, recombinant hookworm molecules have already undergone clinical trials for stroke and deep-vein thrombosis. Thus, the N. americanus genome sequence will have broad implications. It provides many opportunities to establish postgenomic methods in the quest to develop improved interventions against this ancient and neglected parasite, as well as inflammatory diseases that are reaching epidemic proportions in industrialized societies.

URLs. NCBI SRA, http://www.ncbi.nlm.nih.gov/sra; RepeatMasker, http://www.repeatmasker.org/; RepeatModeler, http://repeatmasker.org/RepeatModeler.html; RNAmmer, http://www.cbs.dtu.dk/services/RNAmmer/; SignalP, http://www.cbs.dtu.dk/services/SignalP/; Rfam database, http://www.sanger.ac.uk/resources/databases/rfam.html; Jaspar database, http://jaspar.genereg.net/; KEGG transcription factor database, http://www.genome.jp/kegg-bin/get_htext?ko03000.keg; Kinomer, http://www.compbio.dundee.ac.uk/kinomer/; PyMOL, http://www.pymol.org/; Fgenesh, http://www.softberry.com/; Seqclean, http://compbio.dfci.harvard.edu/tgi/software/; BER, http://ber.sourceforge.net/; Refcov, http://gmt.genome.wustl.edu/genome-shipit/gmt-refcov/current/; Patser, http://stormo.wustl.edu/resources.html.

METHODS

Methods and any associated references are available in the online version of the paper.

Accession codes. The whole-genome sequence of N. americanus has been deposited in DDBJ/EMBL/GenBank under the project accession ANCG00000000. The version described in this paper is the first version, ANCG01000000. All short-read data have been deposited in the Short Read Archive. RNA-seq profiles have been deposited in Nematode.net, and a browsable genome is also available at Nematode.net and WormBase.

Note: Any Supplementary Information and Source Data files are available in the online version of the paper.

ACKNOWLEDGMENTS

We thank the faculty and staff of the Genome Institute at Washington University and the Protein Microarray Laboratory at the University of California–Irvine (U54AI065359) who contributed to this study. The genome sequencing and annotation work was funded by US National Institutes of Health (NIH)–National Human Genome Research Institute grant U54HG003079 to R.K.W. Comparative genome analysis was funded by grants NIH–National Institute of Allergy and Infectious Diseases AI081803 and NIH–National Institute of General Medical Sciences GM097435 to M.M. Funds from the Australian Research Council and Australia’s National Health and Medical Research Council to R.B.G. are gratefully acknowledged. P.W.S. is an investigator with the Howard Hughes Medical Institute.

AUTHOR CONTRIBUTIONS

Y.T.T., X.G. and B.A.R. contributed equally to this work. M.M., R.B.G., P.W.S., R.K.W. and S.R. conceived and planned the project. M.M. led the project, analysis and manuscript preparation. B.Z., P.J.H., J.M.H., P.L.F., J.B. and E.M.R. provided material. K.H.-P., X.Z., V.B.-P., P.M., W.C.W., J. Martin and S.A. produced sequence data and constructed, annotated and submitted the assembly. M.M., Y.T.T., X.G., B.A.R., R.T., Q.W., S.A., J. Martin, E.H., A.L., S.T.G., P.L.F., J. Mulvenna, J.S. and A.D. performed genome-based comparative studies, differential transcription, host-parasite interaction analysis, and proteomics and protein-array analysis.

ck fcov/current et g ; Jaspar database, ; Jaspar database, no ANCG000000 h /repeatmasker.org 4146 O NCBI SRA, www.repeatmasker.org ods wledgme R R C inome ANCG0100000 / ; ; Refcov, 2 ON 0 0 , , , , , . americanus N. e pape e SRR61134 TRIBUTI r SRR60985 SRR0 / ; SignalP,; .uk/resources/databa .net ; PyMOL, PyMOL, ; http://www. The whole-genome sequence of sequence The whole-genome n http://g http://www.ncbi.nlm r 0 / ts s.htm 3680 . ; Seqclean, Seqclean, ; r 0 http://jaspar.gener . k/services/RNAmmer . The version described in this paper is the first first is the paper in this described . version The ON 1 / 0 www.cbs.dtu.d ; Fgenesh, ; Fgenesh, 0 2 , , S . All short-read data have been deposited in data have deposited . been short-read All l mt.genome.wustl.edu/ , , , ; Kinomer, Kinomer, ; SRR61135 www.py SRR6 SRR03680 genome.jp/kegg-bin/g genome sequence will have broad broad have will sequence genome /RepeatModeler.htm http://compbio 0989 w mol.org ww.softberry.com 0 ses/rfam.htm . RNA-seq profiles have been been have profiles RNA-seq . http://www.co 5 4 eg.net k/services/SignalP .nih.gov/sr , , , SRR60995 SRR0 / / ; Rfam database, database, Rfam ; ; KEGG transcription transcription KEGG ; / ; ; Patser, .dfci.harvard.edu/tg 3681 genome-shipit/gmt- et_htext?ko03000. N. N. americanus a l ; RepeatMasker, RepeatMasker, ; ; ; RepeatModeler, 1 1 l mpbio.dundee. ; RNAmmer, RNAmmer, ; , , , http://storm / ; BER, ; BER, SRR6 SRR34145 SRR0 / . 1028 3679 online online http:// ht tp:// has 1 1 9 9 o. i/ 5 - . , , ,

7. 6. 5. 4. 3. 2. 1. reprints/index.ht at online available is information permissions and Reprints The authors declare no competing interests.financial M.M., R.B.G., A.L. and J.M.H. drafted, edited and wrote the manuscript. host-parasite interaction analysis, and proteomics and protein-array analysis. 12. 11. 10. 9. 8. 28. 27. 26. 25. 24. 23. 22. 21. 20. 19. 18. 17. 16. 15. 14. 13. C O

MPETI emnh netos sseai rve ad meta-analysis. and review systematic infections: helminth trial. controlled and helminths soil-transmitted against 46 OMICS Microbiol. . intestinal and infection hookworm combat to vaccines 7 hookworm. and prevention and control of immune-mediated diseases. immune-mediated of control and prevention kinases. protein targeting PDR. Lao in infection helminth concomitant on effect and hookworm against mebendazole Trop.Dis. Negl. PLoS meta-analysis. and review systematic a treatment: drug after reinfection helminth 299 of (2000). action its preventing mechanisms. molecular and cellular Sci. cuticle. Immun. Infect. serine protease inhibitor: evidence for a role in hookworm-associated growth delay. nematodes. blood-feeding (2003). of proteases arrest. developmental during genes growth of targets? drug novel as nematodes—prospects parasitic important socioeconomically in phosphatases adaptation. parasitic nematode Res. in structure. gene diseases. inflammatory colitis. of (IL-4) Immunol. Parasite trial. (2011). e17366 controlled placebo double-blinded randomised disease–a (2012). 83–96 Keiser, J. & Utzinger, J. Efficacy of current drugs against soil-transmitted against drugs current of Efficacy J. Utzinger, & J. Keiser, P. Steinmann, P.J.vaccines. Hotez, Hookworm & Bethony,J.M. D.J., Diemert, A. Loukas, Developing A. Loukas, & M. Pearson, D.J., Diemert, J.M., P.J.,Bethony, Hotez, B. Schneider, J. Bethony, aeo, A.J. Daveson, interactions: immunological Helminth-host J.V. Weinstock, & D.E. Elliott, Taylor,C.M. P.A. Soukhathammavong, Soil-transmitted X.N. Zhou, & C.H. King, J., Utzinger, S., Melville, T.W., Jia, Kumar, S. & Pritchard, D.I. Secretion of metalloproteases by living infective larvae F.J.Culley, parasites: helminth by regulation Immune M. Yazdanbakhsh, & R.M. Maizels, enzymes. digestive pancreatic Human M.E. Lowe, & D.C. Whitcomb, A.R. Jex, nematode the of biogenesis the in involved Enzymes A.D. Winter, & A.P. Page, D. 
Chu, Digestive A. Loukas, & P.J. Hotez, D.P., Knox, P.J., Brindley, A.L., Williamson, Baugh, L.R., Demodena, J. & Sternberg, P.W. RNA Pol II accumulates at promoters Serine/threonine R.B. Gasser, & A. McCluskey, A., Hofmann, B.E., Campbell, Z. Wang, genes neighboring of Coexpression L.D. Hurst, & T. Blumenthal, M.J., Lercher, for pharmacopoeia hookworm The eukaryotic of property general a A. are introns first Longer I. Korf, & K.R. Bradnam, Loukas, & I. Ferreira, S., Navarro, I. Ferreira, infections. hookworm human of immunology The A. Loukas, & H.J. McSorley, , 1234–1244 (2011). 1234–1244 , Caenorhabditis elegans Caenorhabditis , 282–288 (2008). 282–288 , Necator americanus Necator

, 1937–1948 (2008). 1937–1948 ,

52 13 + N IL-10

, 1–17 (2007). 1–17 , 15 G G FI , 238–243 (2003). 238–243 , Adv. Parasitol. Adv. et al. et et al. et

et al. et hs ok s iesd ne a raie omn Attribution-NonCom Commons Creative a under licensed is work This visit license, this of copy a view To License. Unported 3.0 mercial-ShareAlike Infect. Immun. Infect. PLoS Negl. Trop.Dis. Negl. PLoS , 567–577 (2011). 567–577 , 8 m et al. et t al. et , 814–826 (2010). 814–826 , t al. et et al. et t al. et N l + . Molecular characterization of characterization Molecular CD4 a et al. et A t al. et http://creativeco Characterizing

t al. et PLoS ONE PLoS DVANCE ONLINE PUBLICATION ONLINE DVANCE 72 Ascaris suum Ascaris Lancet PLoS ONE PLoS N Vaccinomics for the major blood feeding helminths of humans. of helminths feeding blood major the for Vaccinomics Eotaxin is specifically cleaved by hookworm metalloproteases by hookworm cleaved specifically is Eotaxin Using existing drugs as leads for broad spectrum anthelmintics spectrum broad for leads as drugs existing Using okom xrtr/ertr pout idc interleukin-4 induce products excretory/secretory Hookworm oltasitd emnh netos acrai, , , infections: helminth Soil-transmitted

CIAL CIAL I , 2214–2221 (2004). 2214–2221 , 32 + Biotechnol. Adv. Biotechnol. A history of development. vaccine hookworm of history A fet f okom neto o wet hleg i celiac in challenge wheat on infection hookworm of Effect T cell responses and suppress pathology in a mouse model mouse a in pathology suppress and responses cell T Efficacy of single-dose and triple-dose albendazole and albendazole and triple-dose single-dose of Efficacy

, 549–559 (2010). 549–559 ,

. 6

53 n vitro in Int. J. Parasitol. J. Int. 367 , e1621 (2012). e1621 , J. Parasitol. J.

is mostly due to operons and duplicate genes. duplicate and operons to due mostly is N , 85–148 (2003). 85–148 , PLoS Pathog. PLoS 81

t al. et 3 6 , 1521–1532 (2006). 1521–1532 , TERESTS , e3093 (2008). e3093 , , 2104–2111 (2013). 2104–2111 , , e25003 (2011). e25003 , draft genome. draft mmons.org/licenses/by-nc-sa/3.0 Ancylostoma and BMC Genomics BMC

Low efficacy of single-dose albendazole and albendazole single-dose of efficacy Low 6 , e1417 (2012). e1417 ,

78 29 Nat. Rev. Immunol. Rev. Nat. n vivo in , 917–919 (1992). 917–919 ,

, 28–39 (2011). 28–39 , 43 9 , e1003149 (2013). e1003149 , , 225–231 (2013). 225–231 , Ancylostoma Nature rns Parasitol. Trends . Science

. Immunol. J. 11 , 307 (2010). 307 ,

transcriptome and exploring and transcriptome 479 Taenia

Ann. NY Acad. Sci. Acad. NY Ann.

324 http://www.nature.c , 529–533 (2011). 529–533 ,

Nature Ge Nature 3 , 92–94 (2009). 92–94 , , 733–744 (2003). 733–744 , p. a randomized a spp.: . m Md Assoc. Med. Am. J.

/ 165 .

Clin. Infect. Dis. Infect. Clin. 19 LS ONE PLoS 6447–6453 , Hum. Vaccin. Hum. 417–423 , Kunitz-type n Nat. Rev.Nat. i. Dis. Dig. Genome etics

1247 om/

6 - , ,

© 2014 Nature America, Inc. All rights reserved. 42. 41. 40. 39. 38. 37. 36. 35. 34. 33. 32. 31. 30. 29. Nature Ge Nature

1–106 (2013). 1–106 agent. antiparasitic new potent (1983). a Ivermectin: 103 candidate. development drug anthelmintic new a as 1566) (AAD children. in ascariasis of treatment nematode nomenclature. the of family gene receptor the elegans Discov. Drug Rev. Nat. discovery. parasitism. now? 34 dpie oersos suy f K2926 n ct icei stroke. ischemic acute in UK-279,276 of study dose-response adaptive trials. clinical human in in Na-ASP-2 (2009). hookworm human oet O Te ernl eoe of genome neuronal The O. Hobert, T.A.Jacob, & G. Albers-Schonberg, E.O., Stapley, M.H., W.C.,Fisher, Campbell, Kaminsky,R. the in P.E.Levamisole Soysa, & J.C. Nanayakkara, E.H., Mirando, N.D., Lionel, acetylcholine nicotinic The D.B. Sattelle, & J. Hodgkin, P., Davis, A.K., Jones, of analysis organization: synaptic and channels Ion of Ganetzky,B. J.T.& Littleton, families chemoreceptor putative The J.H. Thomas, & H.M. Robertson, Overington, J.P., Al-Lazikani, B. & Hopkins, A.L. How many drug targets are there? anthelmintic to approaches screening whole-worm and Target-based A.C. Kotze, TGF- M. Crook, F.J.& Thompson, M.E., Viney, from to helminths—where in proteins SCP/TAPS R.B. Gasser, & C. Cantacessi, M. Krams, G.N. Goud, N. Ranjit, , 2543–2548 (2003). 2543–2548 , Drosophila , 931–939 (2008). 931–939 , Mol. Cell. Probes Cell. Mol. . n WormBook etics Vet. Parasitol. Vet. t al. et et al. et Int. J. Parasitol. J. Int. et al. et Pichia pastoris Pichia et al. et genome. Invert. Neurosci. Invert. Acute Stroke Therapy by Inhibition of Neutrophils (ASTIN): an (ASTIN): Neutrophils of Inhibition by Therapy Stroke Acute rtoyi dgaain f eolbn n h itsie f the of intestine the in hemoglobin of degradation Proteolytic Expression of the of Expression

Identification of the amino-acetonitrile derivative monepantel derivative amino-acetonitrile the of Identification

ADVANCE ONLINE PUBLICATION ONLINE ADVANCE 2006 eao americanus Necator

5 26

Neuron , 993–996 (2006). 993–996 , Vaccine 186 , 1–12 (2006). 1–12 , , 54–59 (2012). 54–59 ,

and purification of the recombinant protein for use for protein recombinant the of purification and 35 , 118–123 (2012). 118–123 , , 1473–1475 (2005). 1473–1475 ,

7

26 23 , 129–131 (2007). 129–131 , , 35–43 (2000). 35–43 , Necator americanus Necator BMJ , 4754–4764 (2005). 4754–4764 , anradts elegans Caenorhabditis

anradts elegans Caenorhabditis 4 . , 340–341 (1969). 340–341 , . net Dis. Infect. J. β and the evolution of nematode of evolution the and Science hookworm larval antigen larval hookworm

.

WormBook 199 221 a udt on update an : Parasitol. Res. Parasitol. 904–912 , 823–828 ,

Stroke 2013 C. ,

54. 53. 52. 51. 50. 49. 48. 47. 46. 45. 44. 43. 55.

ua howr ifcin n Baiin community. Brazilian a in infection hookworm human of Republic People’s Province, China. Hainan in infection Necator of intensity the on Hyg. Trop.Med. women. Vietnamese in infection hookworm of intensity increased discovery. drug facilitate of drugs. existing for uses Science Discov. Drug Rev. Nat. (2012). years. million 800 parasitic the nematode of junction neuromuscular the at emodepside anthelmintic the of 37 egg-laying behaviour and development of pharynx. Parasitol. J. of ivermectin to 954–961 (2010). 954–961 153–183 (1971). 153–183 unel R.J. Quinnell, J. Bethony, D.L. Humphries, Yeh, I., Hanekamp, T., Tsoka, S., Karp, P.D. & Altman, new R.B. Computational analysis developing and identifying repositioning: Drug K.B. Thor, T.T. & Ashburn, N.P.Shah, century? twenty-first the of targets drug major kinases—the Protein P. Cohen, first the channels: sodium voltage-gated of evolution Adaptive H.H. Zakon, effect The R.J. Walker, & L. Holden-Dye, A., Harder, K., Amliwala, J., Willson, K. Bull, T.G. Geary, Richards, J.C., Behnke, J.M. & Duce, I.R. Miller, T.A. Vaccination against the canine hookworm diseases. hookworm canine the Miller,T.A.against Vaccination , 627–636 (2007). 627–636 , lsoim falciparum Plasmodium Clin. Infect. Dis. Infect. Clin.

305 et al. et Exp. Parasitol. Exp. Ascaris suum Ascaris et al. et

t al. et t al. et 25 , 399–401 (2004). 399–401 , Effects of the novel anthelmintic emodepside on the locomotion, the on emodepside anthelmintic novel the of Effects t al. et , 1185–1191 (1995). 1185–1191 ,

Overriding imatinib resistance with a novel ABL kinase inhibitor.kinase ABL novel a with resistance imatinib Overriding 91 et al. et Emerging patterns of : influence of aging of influence infection: hookworm of patterns Emerging amnhs otru: vretnidcd aayi o the of paralysis ivermectin-induced contortus: Haemonchus , 518–520 (1997). 518–520 , rc Nt. cd Si USA Sci. Acad. Natl. Proc. eao americanus Necator eei ad oshl dtriat o peipsto to predisposition of determinants household and Genetic The use of human faeces for fertilizer is associated with associated is fertilizer for faeces human of use The

1

.

Nat. Rev. Drug Discov. Drug Rev. Nat. 77 35 , 309–315 (2002). 309–315 , Genome Res. Genome Parasitology , 88–96 (1993). 88–96 , , 1336–1344 (2002). 1336–1344 , eaoim ognzn gnmc nomto to information genomic organizing metabolism:

126

14 Caenorhabditis elegans and In vitro , 917–924 (2004). 917–924 , , 79–86 (2003). 79–86 ,

3 studies on the relative sensitivity 109 nyotm ceylanicum Ancylostoma , 673–683 (2004). 673–683 , (suppl. 1), 10619–10625 10619–10625 1), (suppl. s e l c i t r A . net Dis. Infect. J. Adv.Parasitol. . Int. J. Parasitol. Trans. R. Soc. R. Trans.

. 202 Int.

9  , ,

ONLINE METHODS

Parasite material. The Anhui strain of N. americanus was maintained in Golden Syrian Hamster (3–4 weeks, male) from Harlan under the George Washington University Institutional Animal Care and Use Committee–approved protocol 053-12.2, and in accordance with all Animal Welfare guidance. Adult worms were collected from intestines of hamsters infected subcutaneously with N. americanus iL3 for 8 weeks. DNA was extracted with the QIAamp DNA Mini Kit according to manufacturer's instruction (Qiagen). For transcriptome sequencing, two key developmental stages from a host-parasite interaction perspective, the infective L3 (iL3; environmental) and adult (parasitic) worm stages, were collected.

Sequencing, assembly and annotation. Fragment, paired-end whole-genome shotgun libraries (3 kb and 8 kb insert sizes) were sequenced using the Roche/454 platform and assembled with Newbler (ref. 58). A repeat library was generated (RepeatModeler) and repeats characterized (CENSOR v. 4.2.27, ref. 59, against RepBase release 17.03, ref. 60). Ribosomal RNA genes (RNAmmer; ref. 61) and transfer RNAs (tRNAscan-SE; ref. 62) were identified. Other noncoding RNAs were identified by a sequence homology search against the Rfam database (ref. 63). Repeats and the predicted RNAs were then masked using RepeatMasker. Protein-coding genes were predicted using a combination of ab initio programs (refs. 64,65) and the annotation pipeline tool MAKER (ref. 66). A consensus high-confidence gene set from the above prediction algorithms was generated (Supplementary Note). The size and number of exons and introns in N. americanus were determined by parsing exon sizes from gff-format annotations (considering only exon features tagged as "coding_exon") and calculating the intron sizes. These were then compared to C. elegans genes (WS230). Significant differences in exon and intron numbers and lengths between species and between orthologous and nonorthologous gene groups were tested using two-tailed t-tests with unequal variance (Supplementary Note). Two separate approaches were used to identify putative operons in N. americanus (Supplementary Note). Gene product naming was determined by BER (JCVI) and functional categories of deduced proteins were assigned (refs. 67–69). Orthologous groups were built from 13 species using OrthoMCL (ref. 70), and genes not orthologous to the other 12 species were classified as N. americanus specific.

RNA sequencing. RNA was extracted, DNase treated and used to generate both Roche/454 and Illumina cDNA libraries (Supplementary Note) that were sequenced using a Genome Sequencer Titanium FLX (Roche Diagnostics) and Illumina (Illumina, San Diego, CA), with slight modifications (Supplementary Note). The 454 cDNA reads were analyzed as previously described. Illumina RNA-seq data were processed, and low-compositional complexity bases were masked. RNA-seq reads were aligned to the predicted gene set, and genes with a breadth of coverage ≥50% across the gene sequence (i.e., "expressed") were used for further downstream analysis. Expression was quantified using expression values normalized to the depth of coverage per 100 million mapped bases (DCPM). Expressed genes were subject to further differential expression analysis using EdgeR (ref. 74) (false discovery rate <0.05) in order to identify stage-overexpressed genes (Supplementary Note).

Deduced proteome functional annotation and enrichment. Proteins were searched against KEGG (ref. 75) using KAAS (ref. 68) (cut-off 35 bits), and InterProScan (ref. 69) was used to get InterPro (ref. 76) domain matches and Gene Ontology (GO; ref. 67) annotations. Proteins with signal peptides and transmembrane topology (ref. 77) and nonclassical secretion (ref. 78) were identified. The degradome was identified by comparison to the MEROPS protease unit database (ref. 79) using WU-BLAST (identifying the best hit with E ≤ e−10). Enrichment of different protease groups among different gene sets (based on similarity to …) was detected based on false discovery rate (FDR)-corrected binomial distribution probability tests (ref. 80). GO enrichment significance comparing the iL3 and adult-overexpressed gene sets was calculated using FUNC (ref. 81) at a 0.01 significance threshold after Family-Wise Error Rate (FWER) population correction. QuickGO (ref. 82) was used to analyze the hierarchical structure of significant GO categories.

Proteomic analysis of somatic worm extract. Whole worms were ground under liquid nitrogen before solubilization in lysis buffer, total protein was precipitated, and established methods (ref. 83) were used to reduce, alkylate and tryptic-digest two 1.5 mg samples of total somatic protein. Peptide fractions were prepared before LC and mass spectral analysis (Supplementary Note). Only proteins confirmed with at least two peptides and a confidence of P ≤ 0.05 were considered identified. GO functional enrichment among the genes supported by proteomics was calculated using all of the genes without proteomics support as a background for comparison.
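The exon and intron size comparison described under "Sequencing, assembly and annotation" can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes tab-separated GFF lines with 1-based inclusive coordinates, assumes that exons of one transcript share the same sequence name and the same attribute string in column 9, and assumes non-overlapping exons; the function names `exon_intron_sizes` and `welch_t` are hypothetical. Only the Welch (unequal variance) t statistic and its degrees of freedom are computed; a two-tailed P value would come from a t distribution in a statistics library.

```python
import math
from statistics import mean, variance


def exon_intron_sizes(gff_text, feature="coding_exon"):
    """Return (exon_sizes, intron_sizes) parsed from GFF-format text.

    Assumption: exons of one transcript share the same sequence name
    (column 1) and the same attribute string (column 9).
    """
    spans_by_transcript = {}
    for line in gff_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        cols = line.split("\t")
        if len(cols) < 9 or cols[2] != feature:
            continue  # keep only features tagged with the requested type
        start, end = int(cols[3]), int(cols[4])
        spans_by_transcript.setdefault((cols[0], cols[8]), []).append((start, end))

    exons, introns = [], []
    for spans in spans_by_transcript.values():
        spans.sort()
        # GFF coordinates are 1-based and inclusive.
        exons.extend(end - start + 1 for start, end in spans)
        # An intron is the gap between consecutive exons.
        introns.extend(nxt[0] - cur[1] - 1 for cur, nxt in zip(spans, spans[1:]))
    return exons, introns


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

In use, the exon (or intron) size lists from two annotations would be passed to `welch_t`, mirroring the between-species comparison in the text.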

Transcription factors and binding sites. Transcription factors in N. americanus were identified by extracting KEGG Orthology (KO) numbers from the KEGG transcription factor database (derived from TRANSFAC 7.0 (ref. 84)) and comparing to N. americanus KOs. Documented matrices of transcription factor binding sites were downloaded from the JASPAR database (ref. 85). The corresponding protein accession numbers were extracted and converted to KOs, and were compared to N. americanus transcription factor KOs to define a subset of N. americanus transcription factors with available binding site information. The binding site matrices of this subset of transcription factors were used to scan the sequences of up to 500 bp upstream and downstream of differentially expressed genes using Patser.

SCP/TAPS. Each protein was searched for the SCP/TAPS-representative protein domains IPR014044 ("CAP domain") and PF00188 ("CAP") using Interproscan (ref. 69) and hmmpfam (ref. 87). Phylogenetic relationship trees using full length primary sequences derived from ungapped genes were built using Bayesian inference (ref. 88) and neighbor joining (ref. 89) as previously described for other helminths (ref. 90). Leaves of the tree were annotated with domain information, secretion mode and expression data, and then visualized using iTOL (ref. 91).

Potential drug targets. GPCRs, LGICs and VGICs were identified with InterProScan (ref. 69). Ion channels were identified using WU-BLASTP (E ≤ e−10) against the C. elegans proteome (WS230). For ivermectin target characterization, sequence alignments were obtained by MUSCLE (ref. 92) for the orthologs within two orthologous groups (NAIF1.5_00184 and NAIF1.5_06724). Homology models for the two N. americanus orthologs (NECAME_16744 and NECAME_16780) were built by MODELLER (ref. 93) using the C. elegans crystal structure as template (ref. 94). For each ortholog, five models were built and the one with the lowest total function score (energy) was chosen as the model shown. Sequence alignments are colored by Clustalx scheme in JalView (ref. 95); protein structure models are rendered in PyMOL (Schrodinger, The PyMOL Molecular Graphics System, Version 1.3r1. 2010).

Kinome and chokepoints. N. americanus genes were screened against the collection of kinase domain models in the Kinomer database (ref. 96), and custom score thresholds were applied for each kinase group and then adjusted until an hmmpfam search came as close as possible to identifying known C. elegans kinases. Those same cutoffs were then applied to the N. americanus gene set to identify putative kinases as previously described. Kinase prioritization was done by adapting the protocol previously described. Chokepoints of KEGG metabolic pathways were defined as a reaction that either consumes a unique substrate or produces a unique product. The reaction database from KEGG v58 (ref. 98) was used and the chokepoints were identified and prioritized as previously described. Metabolic module abundances were calculated (and normalized in DCPM) based on KAAS annotations (ref. 68), and module bottlenecks were defined as reaction steps in the cascade that are both essential for the module completion and have low enzyme abundance that primarily constrains the overall module abundance. Homology models, constructed with MODELLER, were aligned with their reference sequence using T-COFFEE (ref. 100) with default parameters and PDB structures with the highest sequence similarity, and docking was performed using AutoDock4.2 (ref. 102) with default parameters. Chemogenomic screening for compound prioritization was performed as previously described.
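The chokepoint definition given in the methods (a reaction that either consumes a unique substrate or produces a unique product) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline used in the study: the reaction table is a hypothetical in-memory dictionary rather than the KEGG reaction database, and the function name `find_chokepoints` is invented for this example.

```python
from collections import defaultdict


def find_chokepoints(reactions):
    """Return the set of chokepoint reaction IDs.

    reactions: dict mapping reaction ID -> (substrates, products),
    where each is an iterable of compound IDs (hypothetical input format).
    A reaction is a chokepoint if it is the only consumer of some
    substrate or the only producer of some product.
    """
    consumers = defaultdict(set)  # compound -> reactions that consume it
    producers = defaultdict(set)  # compound -> reactions that produce it
    for rid, (substrates, products) in reactions.items():
        for compound in substrates:
            consumers[compound].add(rid)
        for compound in products:
            producers[compound].add(rid)

    chokepoints = set()
    for rxn_sets in (consumers.values(), producers.values()):
        for rxns in rxn_sets:
            if len(rxns) == 1:  # exactly one reaction touches this compound
                chokepoints |= rxns
    return chokepoints


# Toy network: two parallel routes from A to B, then a single step from B to C.
toy = {
    "R1": (["A"], ["B"]),
    "R2": (["A"], ["B"]),
    "R3": (["B"], ["C"]),
}
print(sorted(find_chokepoints(toy)))
```

In the toy network only R3 qualifies, because it is the sole consumer of B and the sole producer of C, whereas R1 and R2 are redundant with each other.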

Protein microarray. In 2005, 1494 individuals between the ages 4 and 66 years (inclusive) were enrolled (with informed consent) into a cross-sectional study in an N. americanus-endemic area of Northeastern Minas Gerais state in Brazil, using protocols approved by the George Washington University Institutional Review Board (117040 and 060605), the Ethics Committee of Instituto René Rachou and the National Ethics Committee of Brazil (CONEP; protocol numbers 04/2008 and 12/2006). Venous blood (15 mL) was collected from individuals determined to be positive for Necator (Supplementary Note). A total of 1,275 N. americanus open reading frames (ORFs) contained a classical signal peptide for secretion and had RNA-seq evidence for transcription in iL3 and/or adult worms. Of those, 623 corresponding cDNAs were successfully amplified, cloned, expressed and contact-printed without purification onto nitrocellulose glass FAST slides (Supplementary Note). The printed in vitro–expressed proteins were quality-checked using antibodies against incorporated N-terminal polyhistidine (His) and C-terminal hemagglutinin (HA) tags. Protein arrays were blocked in blocking solution (Whatman) and probed with human sera overnight. Arrays were washed, and isotype- and subclass-specific responses were detected using biotinylated mouse monoclonal antibodies against human IgG1 (Sigma, B6775, lot 031M4751, clone 8c/6-39), IgG3 (Sigma, B3523, lot 080M4811, clone HP-6050) and IgG4 (Sigma, B3648, lot 091M4783, clone HP-6025) and biotin-conjugated mouse monoclonal anti-human IgE Fc (Human Reagent Laboratory, Baltimore, MD, clone HP6061B). Microarrays were scanned using a GenePix microarray scanner (Molecular Devices). The data were analyzed using the "group average" method (ref. 103), whereby the mean fluorescence was considered for analysis (Supplementary Note).

doi:10.1038/ng.2875

56. Jian, X. et al. Necator americanus: maintenance through one hundred generations in golden hamsters (Mesocricetus auratus). I. Host sex-associated differences in hookworm burden and fecundity. Exp. Parasitol. 104, 62–66 (2003).
57. Xiao, S. et al. The evaluation of recombinant hookworm antigens as vaccines in hamsters (Mesocricetus auratus) challenged with human hookworm, Necator americanus. Exp. Parasitol. 118, 32–40 (2008).
58. Margulies, M. et al. Genome sequencing in microfabricated high-density picolitre reactors. Nature 437, 376–380 (2005).
59. Kohany, O., Gentles, A.J., Hankus, L. & Jurka, J. Annotation, submission and screening of repetitive elements in Repbase: RepbaseSubmitter and Censor. BMC Bioinformatics 7, 474 (2006).
60. Jurka, J. et al. Repbase Update, a database of eukaryotic repetitive elements. Cytogenet. Genome Res. 110, 462–467 (2005).
61. Lagesen, K. et al. RNAmmer: consistent and rapid annotation of ribosomal RNA genes. Nucleic Acids Res. 35, 3100–3108 (2007).
62. Lowe, T.M. & Eddy, S.R. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 25, 955–964 (1997).
63. Griffiths-Jones, S., Bateman, A., Marshall, M., Khanna, A. & Eddy, S.R. Rfam: an RNA family database. Nucleic Acids Res. 31, 439–441 (2003).
64. Korf, I. Gene finding in novel genomes. BMC Bioinformatics 5, 59 (2004).
65. Stanke, M., Diekhans, M., Baertsch, R. & Haussler, D. Using native and syntenically mapped cDNA alignments to improve de novo gene finding. Bioinformatics 24, 637–644 (2008).
66. Cantarel, B.L. et al. MAKER: an easy-to-use annotation pipeline designed for emerging model organism genomes. Genome Res. 18, 188–196 (2008).
67. Ashburner, M. et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 25, 25–29 (2000).
68. Moriya, Y., Itoh, M., Okuda, S., Yoshizawa, A.C. & Kanehisa, M. KAAS: an automatic genome annotation and pathway reconstruction server. Nucleic Acids Res. 35, W182–W185 (2007).
69. Quevillon, E. et al. InterProScan: protein domains identifier. Nucleic Acids Res. 33, W116–W120 (2005).
70. Li, L., Stoeckert, C.J. Jr. & Roos, D.S. OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res. 13, 2178–2189 (2003).
71. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
72. Hancock, J.M. & Armstrong, J.S. SIMPLE34: an improved and enhanced implementation for VAX and Sun computers of the SIMPLE algorithm for analysis of clustered repetitive motifs in nucleotide sequences. Comput. Appl. Biosci. 10, 67–70 (1994).
73. Trapnell, C., Pachter, L. & Salzberg, S.L. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics 25, 1105–1111 (2009).

74. Robinson, M.D., McCarthy, D.J. & Smyth, G.K. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139–140 (2010).
75. Kanehisa, M., Goto, S., Sato, Y., Furumichi, M. & Tanabe, M. KEGG for integration and interpretation of large-scale molecular data sets. Nucleic Acids Res. 40, D109–D114 (2012).
76. Hunter, S. et al. InterPro in 2011: new developments in the family and domain prediction database. Nucleic Acids Res. 40, D306–D312 (2012).
77. Käll, L., Krogh, A. & Sonnhammer, E.L. A combined transmembrane topology and signal peptide prediction method. J. Mol. Biol. 338, 1027–1036 (2004).
78. Bendtsen, J.D., Jensen, L.J., Blom, N., Von Heijne, G. & Brunak, S. Feature-based prediction of non-classical and leaderless protein secretion. Protein Eng. Des. Sel. 17, 349–356 (2004).
79. Rawlings, N.D., Barrett, A.J. & Bateman, A. MEROPS: the peptidase database. Nucleic Acids Res. 38, D227–D233 (2010).
80. Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. Roy. Stat. Soc. B 57, 289–300 (1995).
81. Prüfer, K. et al. FUNC: a package for detecting significant associations between gene sets and ontological annotations. BMC Bioinformatics 8, 41 (2007).
82. Binns, D. et al. QuickGO: a web-based tool for Gene Ontology searching. Bioinformatics 25, 3045–3046 (2009).
83. Mulvenna, J. et al. Proteomics analysis of the excretory/secretory component of the blood-feeding stage of the hookworm, Ancylostoma caninum. Mol. Cell Proteomics 8, 109–121 (2009).
84. Matys, V. et al. TRANSFAC and its module TRANSCompel: transcriptional gene regulation in eukaryotes. Nucleic Acids Res. 34, D108–D110 (2006).
85. Bryne, J.C. et al. JASPAR, the open access database of transcription factor-binding profiles: new content and tools in the 2008 update. Nucleic Acids Res. 36, D102–D106 (2008).
86. Cantacessi, C. et al. A portrait of the "SCP/TAPS" proteins of eukaryotes–developing a framework for fundamental research and biotechnological outcomes. Biotechnol. Adv. 27, 376–388 (2009).
87. Eddy, S.R. Accelerated profile HMM searches. PLoS Comput. Biol. 7, e1002195 (2011).
88. Ronquist, F. & Huelsenbeck, J.P. MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19, 1572–1574 (2003).
89. Larkin, M.A. et al. Clustal W and Clustal X version 2.0. Bioinformatics 23, 2947–2948 (2007).
90. Cantacessi, C. et al. Insights into SCP/TAPS proteins of liver flukes based on large-scale bioinformatic analyses of sequence datasets. PLoS ONE 7, e31164 (2012).
91. Letunic, I. & Bork, P. Interactive Tree Of Life (iTOL): an online tool for phylogenetic tree display and annotation. Bioinformatics 23, 127–128 (2007).
92. Edgar, R.C. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797 (2004).
93. Eswar, N. et al. Comparative protein structure modeling using MODELLER. Curr. Protoc. Protein Sci. 50, 2.9 (2007).
94. Hibbs, R.E. & Gouaux, E. Principles of activation and permeation in an anion-selective Cys-loop receptor. Nature 474, 54–60 (2011).
95. Waterhouse, A.M., Procter, J.B., Martin, D.M., Clamp, M. & Barton, G.J. Jalview Version 2—a multiple sequence alignment editor and analysis workbench. Bioinformatics 25, 1189–1191 (2009).
96. Miranda-Saavedra, D. & Barton, G.J. Classification and functional annotation of eukaryotic protein kinases. Proteins 68, 893–914 (2007).
97. Mitreva, M. et al. The draft genome of the parasitic nematode Trichinella spiralis. Nat. Genet. 43, 228–235 (2011).
98. Kanehisa, M., Goto, S., Furumichi, M., Tanabe, M. & Hirakawa, M. KEGG for representation and analysis of molecular networks involving diseases and drugs. Nucleic Acids Res. 38, D355–D360 (2010).
99. Taylor, C.M. et al. Discovery of anthelmintic drug targets and drugs using chokepoints in nematode metabolic pathways. PLoS Pathog. 9, e1003505 (2013).
100. Notredame, C., Higgins, D.G. & Heringa, J. T-Coffee: a novel method for fast and accurate multiple sequence alignment. J. Mol. Biol. 302, 205–217 (2000).
101. Sali, A. & Blundell, T.L. Comparative protein modelling by satisfaction of spatial restraints. J. Mol. Biol. 234, 779–815 (1993).
102. Morris, G.M. et al. AutoDock4 and AutoDockTools4: automated docking with selective receptor flexibility. J. Comput. Chem. 30, 2785–2791 (2009).
103. Sundaresh, S. et al. Identification of humoral immune responses in protein microarrays using DNA microarray data analysis techniques. Bioinformatics 22, 1760–1766 (2006).