© 2015 Nature America, Inc. All rights reserved.
Nature Genetics | VOLUME 47 | NUMBER 4 | APRIL 2015 | letters

The genome and transcriptome of the zoonotic hookworm Ancylostoma ceylanicum identify infection-specific gene families

Erich M Schwarz, Yan Hu, Igor Antoshechkin, Melanie M Miller, Paul W Sternberg & Raffi V Aroian

Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York, USA. Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California, USA. Section of Cell and Developmental Biology, University of California, San Diego, La Jolla, California, USA. Program in Molecular Medicine, University of Massachusetts Medical School, Worcester, Massachusetts, USA. Howard Hughes Medical Institute, California Institute of Technology, Pasadena, California, USA. Correspondence should be addressed to E.M.S. ([email protected]).

Received 12 January 2014; accepted 5 February 2015; published online 2 March 2015; corrected after print 5 May 2015; doi:10.1038/ng.323

Hookworms infect over 400 million people, stunting and impoverishing them. Sequencing hookworm genomes and finding which genes they express during infection should help in devising new drugs or vaccines against hookworms. Unlike other hookworms, Ancylostoma ceylanicum infects both humans and other mammals, providing a laboratory model for hookworm disease. We determined an A. ceylanicum genome sequence of 313 Mb, with transcriptomic data throughout infection showing expression of 30,738 genes. Approximately 900 genes were upregulated during early infection in vivo, including ASPRs, a cryptic subfamily of activation-associated secreted proteins (ASPs). Genes downregulated during early infection included ion channels and G protein–coupled receptors; this downregulation was observed in both parasitic and free-living nematodes. Later, at the onset of heavy blood feeding, C-lectin genes were upregulated along with genes for secreted clade V proteins (SCVPs), encoding a previously undescribed protein family. These findings provide new drug and vaccine targets and should help elucidate hookworm pathogenesis.

The two species causing the most infections are Necator americanus and Ancylostoma duodenale, which are generally restricted to human hosts. Hookworms are free living during part of their life cycle, with eggs hatching in soil and larvae feeding on bacteria through the first and second larval stages. At the infectious third-stage larval phase (L3i), hookworms cease feeding and wait until they encounter a human host. They generally enter their host by burrowing into skin, although A. duodenale can alternatively enter by being swallowed. Hookworms then pass through the bloodstream, lungs and digestive tract to the small intestine, where they affix themselves, mature to adulthood, mate and lay eggs that are excreted by the host. Human-specific hookworms [...]. The ability to culture A. ceylanicum in golden hamster allows it to be used as a model system for the human-specific N. americanus, upon which new drug and vaccine candidates can be tested (Fig. 1).

Hookworms belong to a class of parasitic nematodes, strongylids, that are more closely related to the free-living Caenorhabditis elegans than [...] (Fig. 2). Treatments effective against A. ceylanicum might thus also prove useful against other strongylids, such as Haemonchus contortus, that infect farm animals and depress agricultural productivity. Characterizing the genome and transcriptome of A. ceylanicum is a key step toward such comparative analysis.

We assembled an initial A. ceylanicum genome sequence of 313 Mb and a scaffold N50 of 668 kb, estimated to cover ~95% of the genome, with Illumina sequencing and RNA scaffolding (Supplementary Fig. 1 and Supplementary Tables 1 and 2). The genome size was comparable to those of Ancylostoma caninum (347 Mb) and H. contortus (320–370 Mb) but larger than those of N. americanus (244 Mb), C. elegans and Pristionchus pacificus (100–[...] Mb). We found that 40.5% of the genomic DNA was repetitive, twice as much as in C. elegans, P. pacificus or N. americanus (17–24%). We predicted 26,966 protein-coding genes with products of ≥100 residues (Supplementary Table 3). We also predicted 10,050 genes with products of 30–99 residues, to uncover smaller proteins that might aid in [...]. With RNA sequencing (RNA-seq), we detected expression of 23,855 (88.5%) and 6,883 (68.5%) of these genes, respectively (Supplementary Table 4).

The genomes of plant-parasitic, necromenic and animal-parasitic nematodes have all acquired bacterial genes through horizontal gene transfer (HGT). We detected one instance of bacterial HGT in A. ceylanicum: Acey_s0012.g1873, a homolog of the N-acetylmuramoyl-L-alanine amidase amiD, which encodes a protein that may help bacteria recycle their murein. Acey_s0012.g1873 was strongly expressed in L3i and then downregulated in all later stages of infection. It has nine predicted introns, presumably acquired after HGT; it has only one homolog in the entire phylum (NECAME_15163 from N. americanus) but many bacterial homologs (Supplementary Table 5). The sap-feeding insects Acyrthosiphon pisum and Planococcus citri also have amiD genes, acquired by HGT, that may promote bacterial lysis.
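As a minimal sketch of the assembly and expression statistics quoted in this section, the following Python computes a scaffold N50 and the percentage of predicted genes with detected expression. The function names and the toy scaffold lengths are hypothetical and chosen only for illustration; the gene counts (23,855 of 26,966 and 6,883 of 10,050) are the ones given in the text. This is not the pipeline used for the A. ceylanicum assembly.

```python
def scaffold_n50(lengths):
    """Return the N50: the length L such that scaffolds of length >= L
    together contain at least half of the total assembled bases."""
    if not lengths:
        raise ValueError("need at least one scaffold length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


def percent_expressed(expressed, predicted):
    """Percentage of predicted genes with detected expression, to one decimal."""
    return round(100 * expressed / predicted, 1)


# Toy scaffold lengths, invented for illustration (not from the real assembly).
toy_scaffolds = [900, 700, 500, 300, 200, 100]
print(scaffold_n50(toy_scaffolds))        # 700

# Gene counts quoted in the text.
print(percent_expressed(23855, 26966))    # 88.5
print(percent_expressed(6883, 10050))     # 68.5
```

With the toy lengths, the total is 2,700 bases, so the N50 is the length of the scaffold at which the running sum first reaches 1,350.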

To find genes acting at specific points of infection, we carried out RNA-seq on specimens collected at developmental stages spanning the onset and establishment of infection by A. ceylanicum in golden hamster (Figs. 1 and 3 and Supplementary Table 6), beginning at L3i and followed by 24 h of either incubation in hookworm culture medium (24.HCM), a standard model for early hookworm infection, or infection in the hamster (24.PI). We found 942 genes to be significantly upregulated from L3i after 24 h of infection (Supplementary Table 7). In contrast, we observed only 240 genes significantly upregulated from L3i after 24 h of incubation in HCM, of which 141 were also upregulated in vivo. This lower number matches previous observations and shows that infection in vivo has stronger effects on gene activity than its in vitro model.

We linked known or probable gene functions to steps of infection by assigning gene ontology (GO) terms to genes and computing which GO terms were over-represented among genes upregulated or downregulated in developmental transitions (Supplementary Tables 8 and 9). We also analyzed homologous gene families for disproportionate upregulation or downregulation; in particular, gene families identified by orthology of A. ceylanicum with N. americanus or other nematodes might encode previously undescribed components of infection (Supplementary Table 10).

Proteases, protease inhibitors, nucleases and protein synthesis were upregulated during early infection (L3i to 24.PI; Supplementary Tables 9a and 11a); proteases and protease inhibitors were also upregulated after L3i in H. contortus, as were proteases in N. americanus. Secreted proteases could allow hookworms to digest host proteins in blood and intestinal mucosa. Secreted proteases might also digest and inactivate proteins of the host's immune system. Conversely, secreted protease inhibitors could also suppress host immunity.

G protein–coupled receptors (GPCRs), receptor-gated ion channels and neurotransmission-related functions in general were downregulated during early infection (L3i to 24.PI), along with transcription factors (Supplementary Tables 9b and 11b). We observed the same pattern among genes downregulated in the transition from L3 to fourth-stage (L4) larvae in both C. elegans and H. contortus (Supplementary Table 8). This finding is consistent with downregulation after L3 of sensory perception and transcription genes in both A. caninum and Brugia and of ion channel genes in N. americanus. Such downregulation might thus be conserved in both parasitic and free-living nematodes.

Figure 1 Life cycle of A. ceylanicum. A. ceylanicum hatch in feces and grow as free-living first- to third-stage (L1–L3) larvae. Before exiting the third larval stage, they mature into infectious third-stage (L3i) larvae, arresting further development until they are inside a host. Within 24 h after gavage into golden hamsters, A. ceylanicum are still in the stomach but have exited the L3i stage (24.PI). A standard model for parasite infection is to incubate L3i larvae for 24 h in hookworm culture medium (24.HCM), which evokes changes in larval shape and behavior thought to mimic those of 24.PI larvae. By 5 d after infection, larvae have migrated further to the intestine, affixed themselves there and grown into early fourth-stage (L4) larvae (5.D; female shown) with visible sexual differentiation. By 12 d (12.D; female shown), they start heavy blood feeding and become young adults with mature males and a few gravid females, with little or no egg laying. By 17 d (17.D; male shown), they are fully mature adults. They begin laying many eggs that are deposited outside the host during defecation, renewing the life cycle. From 19 d onward (19.D; male shown), they remain fertile adults for weeks in hamsters. Scale bars: 100 µm (L3i through 5.D), 500 µm (12.D) and 1 mm (17.D and 19.D).

Figure 2 Evolutionary relatedness of A. ceylanicum to other nematodes. The phylogeny is derived from van Megen et al. and Kiontke et al. A. ceylanicum, N. americanus and H. contortus are strongylid parasites and the closest relatives of free-living, non-parasitic nematodes. Nematodes from distinct groups (clades) within the phylum are color-coded: black, close relatives, clade V; green, plant parasites, clade IV; pink, ascarid and filarial animal parasites, clade III; orange, Trichinella, an animal parasite from clade I. To the right are the numbers of strictly orthologous genes for A. ceylanicum or C. elegans and other species. Self-comparisons (bold) list all strictly defined orthologs within a genome. A. ceylanicum and C. elegans have similar orthology to diverse nematode species.
[Figure 2 panel: phylogenetic tree of A. ceylanicum, N. americanus, H. contortus, C. elegans, C. briggsae, P. pacificus, B. xylophilus, M. hapla, A. suum, B. malayi, D. immitis and T. spiralis, with two columns of ortholog counts headed A. ceylanicum and C. elegans. Count values, with row order not preserved: 5,373; 4,833; 6,933; 7,042; 5,268; 7,506; 9,860; 3,015; 5,166; 5,312; 5,955; 4,161 and 10,795; 11,323; 5,587; 4,979; 4,576; 6,225; 7,042; 3,186; 5,308; 5,545; 6,209; 4,376.]

Figure 3 Gene activity during infection. RNA expression levels for 30,738 A. ceylanicum genes are shown in log2-transformed transcripts per million (TPM), with k-means partitioning of the genes into 20 groups. Genes in yellow and blue are up- and downregulated, respectively; TPM values are shown ranging from ≤−3 to ≥3 on the log2(TPM) scale. Developmental stages (L3i, 24.HCM, 24.PI, 5.D, 12.D, 17.D and 19.D) are as in Figure 1. Changes in gene expression after 24 h of growth in HCM (24.HCM) are relatively minor, as opposed to the far-reaching changes in gene expression seen after 24 h of infection in vivo (24.PI).
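The log2-transformed transcripts per million (TPM) values used for the expression heat map can be illustrated with the standard TPM definition: each gene's read count is divided by its length, and the resulting rates are rescaled to sum to one million. The sketch below is a hedged illustration only. The function names, the toy counts and lengths, and the pseudocount used to avoid taking the logarithm of zero are all assumptions; the text does not state how zero values were handled or which software computed TPM.

```python
import math


def tpm(counts, lengths_kb):
    """Transcripts per million from read counts and gene lengths (in kb)."""
    if len(counts) != len(lengths_kb):
        raise ValueError("counts and lengths must have the same size")
    # Length-normalized rate for each gene.
    rates = [c / l for c, l in zip(counts, lengths_kb)]
    total = sum(rates)
    if total == 0:
        raise ValueError("at least one gene must have a nonzero count")
    # Rescale so that the values sum to one million.
    return [r / total * 1e6 for r in rates]


def log2_tpm(values, pseudocount=0.01):
    """log2 transform; the pseudocount is an assumption, not from the text."""
    return [math.log2(v + pseudocount) for v in values]


# Toy data for three hypothetical genes (not from the A. ceylanicum data set).
toy_counts = [100, 200, 0]
toy_lengths_kb = [1.0, 2.0, 1.0]

toy_tpm = tpm(toy_counts, toy_lengths_kb)
print(toy_tpm)            # [500000.0, 500000.0, 0.0]
print(log2_tpm(toy_tpm))
```

In the toy data the second gene has twice the reads of the first but is also twice as long, so both receive the same TPM; this length normalization is what makes TPM values comparable between genes.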

Among gene families upregulated during early infection, we found some already known from other parasitic nematodes, such as ASPs. ASP genes encode a diverse set of secreted cysteine-rich proteins, whose functions probably include blocking immune responses and blood clotting. However, we also found a family of 92 genes upregulated collectively during early infection in vivo (24.PI; q value = 0.003) that had no obvious similarity to known gene families (Supplementary Table 12a). By contrast, upregulation of these genes after 24 h of simulated infection in vitro was insignificant (24.HCM; q value = 0.93). These homologs were distantly related to ASPs, so we termed them ASP-related genes (ASPRs; Fig. 4 and Supplementary Fig. 2). We found other ASPRs in some strongylids (for example, [...]; Supplementary Tables 13 [...]) but not all (for example, [...]). [...] proteins were predicted to be secreted ([...]) [...] one ASPR in Heligmosomoides bakeri [...]. Thus, like ASPs, ASPRs might comprise an important element of hookworm infection.

A. ceylanicum had 432 ASP genes, noticeably more than the related parasites N. americanus (128 genes) and [...] and remarkably more than the non-parasitic [...] (35 and 33 genes, respectively).

Figure 4 Domain-based phylogeny of ASP and ASPR genes from A. ceylanicum and N. americanus and ASPR genes from other nematodes. The tree shows a maximum-likelihood phylogeny of protein domains rather than full-length proteins at the tips (as ASP genes sometimes encode two or more tandem ASP domains). All ASP domains and most ASPR domains are from A. ceylanicum or N. americanus. Almost all domains from ASPRs fall within a single branch, labeled in blue. ASPR genes are labeled blue (A. ceylanicum), green (N. americanus), purple (Heligmosomoides bakeri) [...]. ASP domains from orthologs of known ASP genes are labeled in gold, with their branches. N- and C-terminal domains from two-domain proteins are noted as "N" or "C." Domains from other, less familiar ASP genes are labeled in gray. Confidence values are given as decimal fractions. Identities of the corresponding genes and domains are given in [...].
[Figure 4 panel: circular tree whose tips are labeled with a gene name and a domain residue range, for example ASP-s0092.g2549/6-182, ASP-3A/23-188, Nam-ASP-g13272/1-136 and Oden-ASPR-10741/2-133. Label prefixes in the panel include ASP-, ASPR-, Nam-ASP-, Nam-ASPR-, Oden-ASPR- and Hbak-ASPR-; named genes include ASP-1 to ASP-6 and NIF. Branch confidence values are printed along the branches.]

0 Oden-ASPR-12577/2-142067.g109/86-234

ASP-s0240.g3336/ 0.64 0.94 0

351.g32

8070/1-8 Oden-A ASP-s00 54-153

0.08 00

0.96

0.93 0.92

ASP-s0066. Oden-A

0.89 0.85

46/186-257 SPR-12578/2-14 6 7 Oden-ASPR- ASP-s006

66.g

0 0.84

0.94 Nam-ASPR-g020SPR-12576/2-143

ASP-s0157.g3201/ 0.86

377

237-403 0.86 1

Oesophagostomum dentatum Oesophagostomum g3773/243-407 ASPR-s0012.g1 ASP-s 4/23

0.96 0.98

6.g371 0.78

ASP-s0520. Nam-ASP-g1 23562/12-154 7-401

0050

0

0.94 Nam- 3

ASP-s00 0.57

4/2-146 0.48

0.88

.g19 Nam-ASP-g ASP-s0009.

0.69 ASP-g18987/3-7 69/147-195

242-406 0.78 ASP-s g2851/8 0.99 0.93 Nam-ASP

87/4- 803/10-135

09.g4 9283/2 0.85 0.92

ASP-s0009.g458/10-170 0.52 Nam-ASP-g07

T 16 0.79 0.99

0009

56/5

0.36 0.83 Nam-ASP-g07493/5-10

ASP-s0009.g4 1 4-227 1729

g455/2 -7

0 -g17293/2-103 5

-169 ASP-s0119.g842/340-49 ASP-s0

.g462/ 0.36 2/2-10

0. 0.98 ASP-s 5 ASP-s0365. ables 4, 13 4, ables 2-18 9

0.86

5-173 490/13

ASP-s0365.g359 ASP-s01

009.g461/23-18 1 A. ceylanicum A. 6

0.15 0428.g1 ASP-s03 ASP-s0066.g3712/103-281 60/5-174

0 0.84 0.16 0.74 ASP-s0066.g3 -8

ASP-s0365.g 0.91 0.77 8

g3591/6-171 ASP-s0066.g3773/36-18857.g3201/9-184

ASP-s0105.g3666/19

0.92 293/88-17 3 65.g35

0.96

Nam-ASP ASP-s0240.g3336/

0.6

0.37 0.74 0.95 0.32

4 ASP-s0351.g3246/8-183 Nam-AS 9

0/8-17 0.97

94/19-1 9

ASP-s01

3592/8-7 ASP-s1508.g3901/29-138774/6-182 7

0.

0.75 0.9 0.4 americanus N.

0.99 ASP-s0477. 1 ASP-s063 -g12643/8-173

9 0.66

8 P-6/8- 0.85

ASP-s063 0.97 ASP-s0477.g2 82

0.82 0.87

2 05.g ASP-s1062.g3517/7-66 0.76

ASP-s06 1

is by adults secreted parasitic 0.97

-165

155 8-183 ASP-s ASP-s

3671/

3.g902 0.8 0.99 g2178/ Nam-ASP- ASP-s0097.g2962/1

3.g889/5-17

0.98 0.07 ASP-6/14

0.54 ASP-s0016.g3114/ 0.93 33.g89 8-17 0183.g947/ 0128.g1456/22-100

/18-17 0.41 ASP-s0010.g1028/4-176 172/93-265

ASP-s00 30-179

0.

2

0.79

ASP-s0067.g33 ASP-s001

0.99 0.88 9

5/27-1 0.98

g15467/1-109

4 Nam-ASP-g01 Nam-AS

0.

0.75 Table Supplementary 4 -162

6 0.89

9 ASP-s023 Nam-ASP-g01

7-17

67.g29/5-

0.85

95 0.73 0.87

ASP-6N

ASP-s09 0.87 Nam-ASP-g01364/13-173

0.28

0.85 0.31

0.9 0.g1034 1 Nam-ASP-g01363/7-178

ASP-s0 9-16

P-g08137/13 0.85

0.87 0.74 0.75 469-612

ASP-s01 Nam-ASP-g13

5 0.85 0.9 H. contortus

0.82 6

2.g3033/ /18-165 168 Nam-ASP-g0274 Nam-ASP- 0.86

90.g 366/4-175

0.95 0.99 3 ASP-s0095. Nam-ASP-g0 /79-257 006.g30 0.76

Nam-ASP-g0 365/1 ASP-s0061

3314/

77.g

ASP-s0489.g2364/33- ASP-s0010.g1022/24-189

0.

-16

4-201

0.43

0.84 0-178

0.9 0.97 ASP-s0010 5 ASP 0.

619 72/18-1

6-11 g0950

2 1 897/20-197

ASP-s0242.g34 ASP-s0010

8 C. elegans C.

0.35

g2855/15 /109 0.98 ASP-s0 -s0489.g2361/46-116

ASP-s048 0.76

6 0.7 1 1369/3-184

.g3212/ 0.91 ASP-s003 0.68 0.85 ASP-s0010.g1 0/18-19

9/108 4393/10-181 8

0.42 H. contortus -22 ASP-s0610.g625/257-414 ASP-s0030.g2138/151-263 5 5 0.62

0.66 0.86 0.99 ASP-s0610 .g101 ASP-s024 0.

0 010.g1014/3-187

0.3

-174

0.54 0.89 0.91 0.25

-14 ASP-s0030

71-238 7 ASP-s0017.g3 .g1019 2

9.g2365/3

0. ASP-s003 ASP-s0017.g326

1 8/526

and and 5

0.g2143/38-208

0.88 0.98 7 ASP-s0610.g625/54-217

ASP-s0030. 20/48-150

0.92

19 0.27 ASP-s0610.g626/5-174

ASP-s003 /7-1

0.99 0.94

0.92 023/2-189 ASP-s03 8 0.17

2.g342 ASP-s0610.g624/2 .g62 -701

0.95 0.98 0.81 ASP-s0610.g624/427-587

ASP-s0 85 0.84

-152 .g2215 0.

0.g2183 ASP-s0041.g354/

Nam-AS 0.76 6/21

0. 5

ASP-s0 ASP-s0522.g288

0.86 0.91 263/43-

8/35-1 0.98

8 ASP-s0256.g3

ASP-s0005.g2388/1- g2150/6-179

0.g2206/

56.g3362

0.92 3-37

0.75 0.73

061.g3273/7-18 ASP-s00 ASP-s0256.g353/112

/21-19 3/245-408

0.71

0.97 0.64 ASP-s0256.g364/18-12 ASP-s0

/24-180

P-g18166/37-1

88 0.74

005.g 0.97 0.94 Nam-ASP-g16 1 0.97

ASP-s0087.g203 0.93 0.54 21

ASP-s0087.g20 Nam-ASP-g02290/14-11 3 2

0.83

26-22 ASP-s0477.g2175/8-11 ASP-s0005. 0.97

/7-180

ASP-s0087. 0.99

74.g904/123 0.87 59-400

0.98 Nam-ASP-g061

2390 1 074.

ASP-s0005.g24 ASP-s0041.g359/

0.75

0.94 0.95 29-183

0.66 ASP-s1162.g

ASP-s0177 0. 57/3 0.86

0 2/30-18

0.36 ASP-s0048

ASP-s0004.g g903 0.91

/5-18

0.74 9

0.

0.91 7 0.89 0.82 ASP-s0004. ASP-s0048.g1591/37-204 and

0.99 0.99 9-20

8 ASP-s0048.g157 ASP-s0013.g1938/2-

14 8 0.87 0.37 753/3-6

0.9

ASP- 1 ASP-s0048.g1576/1-11

114 /102

1

0.25

g2416/8-18

0.82

ASP-4A/33-2 ASP-s0048.g16 6

3/7-186 1 g2032/10-106

-181 -215

34/8-187 ASP-s0048.g16 0.92 ASP-4 9

0. 0.84

ASP- Nam-ASP-g07115/6-179 -249 1

.g615/29-265 s0004.g1763

0.54

7 0.92 ASP-s0119.g8 0.59

ASP-s0045.g1155 .g15853714/22- 64/32-12 5

0.33 9 0.15

ASP-s ASP-s0119.g81

Nature Ge Nature 14/9-19

ASP-s0123.g1110/ 0.95

1772/1 0.89 ASP-s0935.g3111/5-17

g1770/21-177

0.92 0.95 31-132 B/34- s0004.g176 ASP-s0143.g2413/9-17

Nam-ASP-g0 ASP-s0522.g2886/9 8 0.65

ASP-s0045.g11 0

2

0.42 ASP-s0048.g1641/48-21 0.71

ASP 0.62 /17-17

0.99 0.32

0610.g628/5-19 0.72

ASP-s0045.g 1 ASP-s004

; ;

02 6

0 21 0.8 0. ASP-s0045.g 0.28 0/14-179 10 -220 ASP-s0119.g835/8

. 0.98 ASP-s050 Nam-ASP-g10 0.92

1

0.63 55/85-249

9

2 -s0346.g ASP-s0119.g828/40-197 7 Nam-A

/25-23

0.57 0.84 56/13-180

6 0.93 Nam-ASP-g05802/ ASP-s0853.g2693/8-211

180 7

0.94 ASP-s0574.g17 Nam-ASP-g 03/20-177

0.85 ASP-s01

Nam-ASP-g 0.8 5/9-19 0.8

0. 0.93 0.84 ASP-s0024.g1089/47-107 Nam-ASP-g10834/31-214 0.99

1/43-201 4 and Nam-AS

1696/6- Nam-ASP-g1 Supplementary Supplementary

1

(161 genes) and 3

7 /9-75 1 Nam-ASP-g15064/ 8.g1

ASP- 5 7 SP-g17849/11-1 0.51

Nam-ASP-g18720/14-118 ASP-s0045.g1135

3142

ASP-s0225 Nam-ASP-g15063/9-18 5.g2671

Nam-AS Nam-ASP-g16708/14 37/5-199 5-193

1192/29-143 7

1

0.

ASP-s0999. ASP-s0003.

1164/5 0. 642/

0.3 ASP-s0042.g548/54- 2

ASP-s0053.g2

0.88 9

s0045.g11 62.

ASP-s0 ASP-s

0.87 11 5

0.26

850/2

/47-22 ASP-s -132

0.99 ASP-s0035.g3034/6-17 9

P-g12045

ASP-s067 0.95

0.84 0.06 ASP-s0011. 06266/18-18 americanus N.

ASP-4N 6 ASP-s0074.g903/30 81-23 6

4

ASP-s000 g339 0749

ASP-7A/34 ASP-s0074.g909/2-136 -161

ASP-s0010.g1 /62-21 ASP-7B/23-15

-12

ASP-s0222. ASP 7/66-178

P-g13008/5

ASP-s0222.g2 ASP-s0194.

3-19 ASP-s0133.g1798/26-138 0035.g3030/12-18 7567/

ASP-s0 5

5

22-19 ASP-s0009.g82 0053.g2289/69-16 ASP-s0194.g14

.g2764 9/57 074.g875/

7/24 ASP-s038 ASP-s0183.g959/3

0 9

Nam-ASP- ASP-s0477.g2170/23-2

ASP-s01 ASP-s018

ASP-s0101. ASP-s0183.g96 -s0010.g

40/6-2 g138

g3346/2 ASP-s04

8 ). Most ASPR ASP-s0020.g 1

87 ASP-s0272.g9

ASP-s0050. /21-190

ASP-s000 ASP-s0847.g2665/69-241

2.g1383/18-18 ASP-s0 16-117 -166

-198

ASP-s

ASP-s0 ASP-s 6-18

6

ASP-s0 ASP-s0272.g952/11-13

9.g643 ASP-s0173.g393/6-17 ASP-s00

/6-201 g132

324/1-181 ASP-s0173 ASP-s00

9

/5-178 ASP-s0151 627.g815 ASP-s0001.g198/10-183

-15

Nam-ASP- ASP- 4/36-2

ASP-s0151.g28

ASP-s0

ASP-1/2 ASP-s0151.g2841/4- Nam-ASP-g0 ASP-s0173

ASP- 02

Nam-ASP-1/

ASP-s0328 ASP-s0173.g3

Nam-ASP ASP-s0005.g2699/80-179

ASP-s ASP-s0

-152 3

ASP-s0048.g1648/25 g2616/5-11

Nam-ASP-

ASP-s018 Nam-ASP-g1 ASP-s0048.g1649/

8-182

ASP-s0009

6 ASP-s0713.g g1419/15-1

9-199 Nam-ASP-g1906 Nam-ASP-g17853/1 ASP-s0025 7.g459/3

01.g3381/ Nam-ASP-g17085/6 -192

ASP-s0020 ASP-s0024.g96 ASP-s0009.

ASP-s0201.g Nam-ASP-g1 Nam-ASP-g16121/41-1

Nam-ASP-g14

Nam-ASP-g07690/6-170 1207/

Nam-ASP-g18225/1 ASP-s0020.g50/ ASP-s0069

0004. 209/2 9/40

6 0046.g1399/9-10 77.g21 7/35-154 3.g96

272.g933 215 8

/34-13

004.g2023/5-1

618/28-161 s0151.g2 050.g205 12

s0216.g2381/ ) or magenta ) magenta or 4.g202

g3382/31-162 1-45

50.g2045/61-25 3-552

50.g2042/

8/39-169 328. 17/16-16

g2040/4-15 0071

/37-

104/88-22 P.pacificus

2-182 57-2 5-19 4 g2026/8- 5

245.g3568/8 1/59-21

9 .g394 39/31-1 2/8-177

g11963/24-1 .g2847/73-234

4 5 71/17-

7-165 1 -g16749/46 3-20

.g395/1

14

g2633/

8.g1147/210-365

.g2635/

39-164

g19154/1-

2 .g52 3 9/17-146 8

/39-202 8 41-212 2238/37-1

.g633/553 844/30-1

.g1267/7 2 .g53/3-165

6

1/3-100 .g403/10-7 9 g633/16-17

7852/1-98 46/35-193 /5-17

97/37-202 1

1756/9-171 1745/10-174 1 8 8170/5-130 189 0

6/29-

3-25 1

97 42 978/12-17 0

ASP-7

1/37-190

6-17 1 74 -19

2-16 18-194

2/2-137 1-101 7 7 8 1 25-12

126

-181 5

196 8

5

-224

83 -98 66 1-197 90 -17

-176

2-121 8

08

-706 5

1

7

5 5 6

5 7 n ASP-1N etics ), ), and 4 6 .

Figure. Phylogeny of SCVPs from A. ceylanicum and other nematodes. A maximum-likelihood phylogeny of SCVPs (Supplementary Tables 4 and 18) is shown. Species are indicated by color: the hookworms A. ceylanicum and N. americanus are shown in green and olive green, respectively; H. contortus is shown in orange; the free-living Caenorhabditis nematodes (C. elegans and C. briggsae) and P. pacificus are shown in blue and light blue; and H. bacteriophora, an insect parasite, is shown in purple. Confidence values are given as decimal fractions in branches. The SCVP phylogeny falls into five branches: two large, independent gene expansions in hookworms (green); two more in H. contortus (orange); and one small branch for non-parasitic nematodes (blue). Like ASPs, SCVPs appear to have existed as a small gene family in free-living nematodes but then to have expanded greatly in both hookworms and other mammalian parasites. Branch labels in the figure: Hookworm expansion, Haemonchus expansion, Non-parasitic nematodes.

... also had 92 and 25 ASPR genes, respectively, which were entirely missing from the other species. One explanation for this diversity is the 'graypawn' hypothesis: members of a large gene family might have little individual effect on phenotypic fitness yet be collectively needed for robust fitness under variable conditions (ref. 47). For parasites, a relevant variable condition might be diverse host immune systems, which might continually favor diversifying sequences and expression profiles of ASPs and ASPRs.

For development from 24 h to 5 d after infection (24.PI to 5.D), genes encoding structural components of cuticle and genes whose products bind cytoskeletal proteins such as actin were prominently upregulated (Supplementary Fig. 5b and Supplementary Table 11e). This period in the life cycle corresponds with the start of parasitic feeding, molting into L4 larvae and overt sexual differentiation. We also observed a new protein family upregulated at this stage, with homologs in the strongylids N. americanus, H. contortus and Angiostrongylus cantonensis (Supplementary Fig. 3 and Supplementary Tables 4 and 15); the corresponding genes in H. contortus and A. cantonensis are expressed in L4 larvae and in larvae infecting brain tissue. We thus named this family strongylid L4 proteins (SL4Ps). In A. ceylanicum, 24 SL4P genes encoded proteins of ~200 residues, of which 21 were predicted to be non-classically secreted without a leader sequence (Supplementary Table 16); notably, parasitic nematodes often use non-classical rather than classical secretion to export proteins into their hosts.

From 5 to 12 d after infection (5.D to 12.D), genes encoding protein phosphatases, serine/threonine kinases and C-lectins were prominently upregulated (Supplementary Fig. 5c and Supplementary Table 11g). This period in the life cycle corresponds with maturation from late-L4 larvae to young adults with incipient fertility and the onset of heavy blood feeding, which exposes A. ceylanicum to the host's immune system. Among 22 C-lectin genes upregulated by 12 d, we detected 6 whose products had greater apparent similarity to mammalian than to nematode lectins (Supplementary Tables 12c and 17). Two of these genes encoded structural mimics of mammalian mannose receptor, with five tandem C-lectin domains that had arisen through intragenic duplication (Supplementary Fig. 4a); four other C-lectin genes resembled mammalian asialoglycoprotein receptors and neurocans but arose phylogenetically from nematode rather than mammalian lectins (Supplementary Fig. 4b). Lectin genes with similarities to mammalian lectins have also been observed in the parasitic nematodes Ascaris suum and Toxocara canis and might help suppress host immune responses.

We also observed a previously undescribed gene family upregulated at 12 d after infection, with members not only in strongylid parasites but also in related non-parasitic clade V species (found in A. ceylanicum, N. americanus, H. contortus, Heterorhabditis bacteriophora, C. elegans, Caenorhabditis briggsae and P. pacificus; Supplementary Tables 4 and 18). We thus named this family secreted clade V proteins (SCVPs). In A. ceylanicum, 53 SCVP genes encoded ~150-residue proteins, of which 48 were predicted to be classically secreted (Supplementary Table 19). Whereas A. ceylanicum, N. americanus and H. contortus had 11 to 101 SCVP genes, other nematodes had only 1 to 6, suggesting an expansion of SCVP genes in mammalian-parasitic nematodes analogous to those observed for ASP and ASPR genes.

A key motivation for parasite genomics is to identify targets for drugs or vaccines. Because drug development often fails, it is essential to identify as many targets as possible. Four drug targets (adenylosuccinate lyase, carnitine O-palmitoyltransferase, dTDP-4-dehydrorhamnose 3,5-epimerase and trehalose-6-phosphatase) have recently been identified in N. americanus (ref. 10). All four are encoded by genes with orthologs in A. ceylanicum (Supplementary Table 4). To identify additional drug targets across the genome, we searched for genes that were conserved by diverse parasites but absent from mammals, might be essential for survival in the host (determined on the basis of observed loss-of-function phenotypes), had homologs with known three-dimensional protein structures and had at least one homolog bound by a known small molecule. This screen yielded 72 genes in A. ceylanicum, one of which (Acey_s0015.g2804) encoded trehalose-6-phosphatase (Table 1 and Supplementary Fig. 6).

Vaccine targets should be both immunologically accessible and crucial for survival. Proteases meet these requirements, as they are expressed in the intestine (and thus exposed to the host's immune system) and because, without them, hookworms cannot digest host proteins such as hemoglobin. We thus selected genes encoding proteases that were permanently upregulated by 5 d after infection and that lacked mammalian orthologs but had orthologs in H. contortus that are also upregulated during infection (ref. 21). This screen yielded 12 cathepsin B–like protease genes, with 4 orthologs in H. contortus; by 19 d after infection, 5 of these 12 genes generated 1% of all transcripts (Supplementary Table 4). Because protease inhibitors were also upregulated during early infection, we searched for ones meeting our criteria; this screen yielded a previously undescribed protease inhibitor predicted to be a 79-residue secreted protein with consistently strong expression (~0.1% of all adult transcripts) and one H. contortus homolog upregulated during infection.

The sequencing of A. ceylanicum adds to a growing number of genomes for parasitic nematodes that, collectively, infect over 1 billion humans and that have been parasitizing vertebrates since the Cretaceous. Practically, these genomes will be crucial for inventing new drugs and vaccines against nematodes that evolve drug resistance rapidly. Understanding immunosuppression by parasitic nematodes might also help alleviate autoimmune disorders, which may be partly due to improved hygiene ridding humans of chronic worm infections. Intellectually, understanding these genomes may illuminate remarkable evolutionary changes. Parasitism allows adult nematodes to grow larger and live longer than their free-living relatives (N. americanus adults are ~1 cm long and live for 3–10 years, whereas C. elegans adults are ~1 mm long and live for 3 weeks), but the genomic changes underlying these adaptations are essentially unknown. The genome and transcriptome of A. ceylanicum should provide lasting benefits for biology and medicine.

Table 1 Summary of predicted drug targets in A. ceylanicum

Protein class                                      Key C. elegans genes      Drug data
4-coumarate:coenzyme A ligase, class I             acs-10                    NA
Ammonium/urea transporter                          amt-2                     NA
Cofactor-independent phosphoglycerate mutase       ipgm-1                    Limited druggability
Fumarate reductase                                 F48E8.3                   NA
Glutamate-gated chloride channel                   avr-14, avr-15, glc-2     avr-14
Glutamate synthase                                 W07E11.1                  NA
Glutamine-fructose 6-phosphate aminotransferase    gfat-1, gfat-2            NA
Isocitrate lyase/malate synthase                   icl-1                     NA
KH-domain RNA binding                              asd-2, gld-1              NA
Malate/L-lactate dehydrogenase, YlbC type          F36A2.3                   NA
NADH:flavin oxidoreductase, Oye2/3 type            F17A9.4                   NA
Nematode prostaglandin F synthase                  C35D10.6                  NA
O-acetylserine sulfhydrylase                       cysl-1                    NA
Secreted lipase                                    lips-8, lips-9            NA
Trehalose-6-phosphate synthase                     gob-1, tps-1, tps-2       gob-1

Predicted drug targets, encoded by 72 genes in A. ceylanicum (Supplementary Table 4), are listed by their protein class. For each class, the number of A. ceylanicum genes encoding it is listed, along with C. elegans homologs that have mutant or RNA interference (RNAi) phenotypes and data indicating whether the drug target is likely to work. gob-1 encodes trehalose-6-phosphatase, a predicted drug target in N. americanus; avr-14 was recently shown to be a drug target of nitazoxanide; ipgm-1, previously detected as a promising target, was found to encode poorly druggable protein. "NA" indicates protein classes for which we are not aware of pertinent data. References for all drug target classes (and their drug data, if any) are given in Supplementary Table 20. K07H8.9 is also listed among the key C. elegans genes; the row it belongs to, and the per-class gene counts, could not be aligned from the extracted text and are not shown.
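The drug-target screen described in the text applies several criteria jointly to each gene. As a minimal sketch only, and not the pipeline used in this study, the filter could be expressed in Python as below. The `GeneAnnotation` record, its field names and the toy gene identifiers are hypothetical illustrations; how each boolean would be derived (homology searches, phenotype databases, structure and ligand databases) is deliberately left out.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class GeneAnnotation:
    """Hypothetical per-gene summary; every field name here is an assumption."""
    gene_id: str
    conserved_in_parasites: bool        # conserved by diverse parasites
    absent_from_mammals: bool           # no mammalian homolog detected
    homolog_lof_phenotype: bool         # a homolog has a loss-of-function phenotype
    homolog_has_3d_structure: bool      # a homolog has a known 3D protein structure
    homolog_binds_small_molecule: bool  # a homolog is bound by a known small molecule


def passes_drug_target_screen(gene: GeneAnnotation) -> bool:
    """Return True only if the gene satisfies all screening criteria at once."""
    return all((
        gene.conserved_in_parasites,
        gene.absent_from_mammals,
        gene.homolog_lof_phenotype,
        gene.homolog_has_3d_structure,
        gene.homolog_binds_small_molecule,
    ))


def screen_genes(genes: Iterable[GeneAnnotation]) -> List[str]:
    """Collect identifiers of genes that pass the screen, preserving input order."""
    return [g.gene_id for g in genes if passes_drug_target_screen(g)]


# Toy records (not real data): only the first satisfies every criterion.
EXAMPLE_GENES = [
    GeneAnnotation("toy_gene_1", True, True, True, True, True),
    GeneAnnotation("toy_gene_2", True, False, True, True, True),   # has a mammalian homolog
    GeneAnnotation("toy_gene_3", True, True, True, True, False),   # no small-molecule evidence
]

if __name__ == "__main__":
    print(screen_genes(EXAMPLE_GENES))
```

Requiring all criteria together keeps the candidate list short; relaxing any single criterion would be one way to trade specificity for a larger set of candidates.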

URLs. Gene Ontology term tables, http://archive.geneontology.org/…; modENCODE, http://www.modencode.org/; NCoils, http://www.russell.embl-heidelberg.de/coils/coils.tar.gz; protocols by S. Kumar for running Blast2GO, InterProScan and MAKER2, https://github.com/sujaikumar/assemblage/blob/master/README-annotation.md; RepBase, http://www.girinst.org/server/RepBase/protected/RepBase19.02.fasta.tar.gz; FigTree, http://tree.bio.ed.ac.uk/software/figtree/.

Methods
Methods and any associated references are available in the online version of the paper.

Accession codes. All assemblies and raw reads have been deposited at NCBI. The corresponding NCBI accession numbers are as follows: A. ceylanicum genome assembly, JARK00000000; A. ceylanicum cDNA assembly, GASF00000000; A. ceylanicum genomic DNA reads and RNA-seq reads, SRR112484…, SRR112490…, SRR112491…, SRR112498… and SRR112500… (final digits truncated in the extracted text). In addition, we have deposited the genome sequence and gene predictions at WormBase (in archival release WS24…).

Note: Any Supplementary Information and Source Data files are available in the online version of the paper.

Acknowledgments
We thank C.T. Brown, P.K. Korhonen, S. Kumar and J.E. Stajich for advice on bioinformatics, T.A. Aydin, J. Liu and A.H.K. Roeder for comments on the manuscript, L. Schaeffer and V. Kumar for genome and RNA sequencing, and C.T. Brown for use of the Michigan State University High-Performance Computing Center (supported by grant 2010-65205-20361 from the US Department of Agriculture and grant IOS-0923812 from the National Institute of Food and Agriculture and the National Science Foundation). Sequencing was carried out at the Millard and Muriel Jacobs Genome Facility at the California Institute of Technology. This work was supported by US National Institutes of Health grants to P.W.S. (GM084389) and to R.V.A. (AI056189), by Cornell University salary and start-up funds to E.M.S. and by the Howard Hughes Medical Institute to P.W.S.

AUTHOR CONTRIBUTIONS
E.M.S., P.W.S. and R.V.A. conceived of and managed the project. Y.H. isolated genomic DNA and RNA from all infection stages of A. ceylanicum and converted RNA to cDNA for sequencing. M.M.M. maintained golden hamster cultures through full life cycles and performed the photography for Figure 1. I.A. constructed large-insert paired-end and jumping Illumina libraries and supervised both genomic and RNA-seq Illumina sequencing. E.M.S. conducted all bioinformatics and biological analyses. Writing was primarily carried out by E.M.S. but with input from all authors.

COMPETING FINANCIAL INTERESTS
The authors declare competing financial interests: details are available in the online version of the paper.

Reprints and permissions information is available online at http://www.nature.com/reprints/index.html.

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/3.0/.

1. Brooker, S., Bethony, J. & Hotez, P.J. Human hookworm infection in the 21st century. Adv. Parasitol. 58, 197–288 (2004).

30. 20. 19. 18. 17. 16. 15. 14. 13. 12. 11. 10. 9. 8. 7. 6. 5. 4. 3. 2. 35. 34. 33. 32. 31. 29. 28. 27. 26. 25. 24. 23. 22. 21.

(2009). sequences. DNA ribosomal subunit small full-length (2011). infection. primary a ( hamsters (1972). ceylanicum humans. with comparison its and adult ( hamsters golden in Adv. Biotechnol. outcomes. biotechnological and research fundamental for framework a developing . (1989). 283–289 polygyrus annotations. ontological and sets gene suite. Blast2GO adaptation. parasitic nematode (2013). symbiosis. mealybug nested tripartite a enables genome from transfer (2007). 5634–5641 of membrane outer the to tethered specificity (2013). nematodes. parasitic human in nematodes. from lessons genomes: better. be can bigger why improve to alignments Bioinformatics cDNA mapped syntenically 46 (2008). parasitism. and lifestyle nematode on perspective genomics. comparative for nematode strongylid discovery. vaccine and drug for parasite model (2008). Ancylostoma caninum graphs. Bruijn de using Res. Genome report. hosts. within sites predilection to related are order the within nematodes of origins fruits. rotting from species new numerous n h hs-aaie eainhp uig rmr infection. primary during relationship host-parasite the on 7 infections. helminth 2010. in infections Vectors helminth Parasit. transmitted soil of burden disease and infection 2010. Study Disease of Burden Global the for analysis systematic a 1990–2010: injuries and Nikoh, N. Nikoh, R. Laing, Abubucker,S. for algorithms Velvet: E. Birney, & D.R. Zerbino, Mortazavi, A. status a importance: veterinary of nematodes in resistance Drug R.M. Kaplan, evolutionary The I. Beveridge, & R.B. Gasser, F., Huby-Chilton, N.B., Chilton, K.C. Kiontke, a Mgn H. Megen, van relatives. its and worm the Nematodes: M. Blaxter, M.K. Bhopale, & S. Menon, Ray, D.K., Bhopale, K.K. & Shrivastava, V.B. Migration and growth of Jian, X. C. Cantacessi, R.J. Traub, P.Garside, J.M. Behnke, & B. Schneider, major against need we drugs the and have we drugs The Utzinger,J. & Keiser,J. of numbers Global S.J. Brooker, & R. 
Jasrasaria, J.L., Smith, R.L., Pullan, Vos,T. asmr AD & uhe, .. h fo rsuc o adult of resource food The M.V. Sukhdeo, & A.D. Bansemir, K. Prüfer, S. Götz, Z. Wang, F. Husnik, Uehara, T. & Park, J.T. An anhydro- animal polished have transfers Wu,B. gene Lateral M.N. Rosso, & E.G. Danchin, pathogens: plant filamentous in evolution Genome S. Kamoun, and & S. native Raffaele, Using D. Haussler, & R. Baertsch, M., Diekhans, M., Stanke, Tang, Y.T. C. Dieterich, L.D. Stein, E.M. Schwarz, , 1234–1244 (2011). 1234–1244 , , 261–269 (2014). 261–269 , et al. et TrendsParasitol. et al. et et al. et al. et in the . small the in et al. et Int. J. Parasitol. J. Int. et al. et al. et et al. et Mesocricetus auratus) Mesocricetus et al. et in golden hamsters t al. et t al. et Years lived with disability (YLDs) for 1160 sequelae of 289 diseases 289 of sequelae 1160 Yearsfor (YLDs) disability with lived Interdomain lateral gene transfer of an essential ferrochelatase gene ferrochelatase essential an of transfer gene lateral Interdomain

nyotm ceylanicum Ancylostoma

et al. Lancet Mol. Phylogenet. Evol. Phylogenet. Mol. Buchnera et al. et 20

High-throughput functional annotation and data mining with the with mining data and annotation functional High-throughput et al. et Characterizing et al. et Genome of the human hookworm t al. et 24 The genome and transcriptome of transcriptome and genome The Bacterial genes in the aphid genome: absence of functional gene functional of absence genome: aphid the in genes Bacterial

FUNC: a package for detecting significant associations between associations significant detecting for package a FUNC: t al. et t al. et 7

t al. et Nucleic Acids Res. Acids Nucleic , 1740–1747 (2010). 1740–1747 , 27 h gnm sqec of sequence genome The Horizontal gene transfer from diverse bacteria to an insect insect an to bacteria diverse from transfer gene Horizontal , 37 (2014). 37 , , 637–644 (2008). 637–644 , Scaffolding a A phylogeny and molecular barcodes for barcodes molecular and phylogeny A The canine hookworm genome: analysis and classification of classification and analysis genome: hookworm canine The A history of hookworm vaccine development. vaccine hookworm of history A , 376–388 (2009). 376–388 ,

Mesocricetus auratus Mesocricetus J. Helminthol. J. The Adv. Parasitol. Adv. Haemonchus contortus Haemonchus 380 pyoeei te o nmtds ae o aot 1200 about on based nematodes of tree phylogenetic A prri o te SPTP” rtis f eukaryotes— of proteins “SCP/TAPS” the of portrait A survey sequences. h gnm ad eeomna tasrpoe f the of transcriptome developmental and genome The to its host. its to

Genome Res. Genome 20 , 2163–2196 (2012). 2163–2196 ,

Nat. Rev. Microbiol. Rev. Nat. 43 PLoS Biol. PLoS Ancylostoma ceylanicum Ancylostoma , 477–481 (2004). 477–481 , rsinhs pacificus Pristionchus , 1009–1015 (2013). 1009–1015 , nyotm ceylanicum Ancylostoma Ancylostoma caninum Ancylostoma Mesocricetus auratus BMC Genomics BMC : pathogenicity and humoral immune response to response immune humoral and pathogenicity : Caenorhabditis rc Nt. cd Si USA Sci. Acad. Natl. Proc. J. Parasitol. J. : maintenance through one hundred generations N

59

PLoS Genet. PLoS Front. Cell. Infect. Microbiol. Infect. Cell. Front.

-acetylmuramyl- 73 36 40

a eeegn bt elce parasitic neglected but re-emerging a , , 143–146 (1985). 143–146 , 18 1 , 197–230 (2010). 197–230 , , 3420–3435 (2008). 3420–3435 , BMC Bioinformatics BMC , 118–128 (2006). 118–128 , , E45 (2003). E45 , Exp. Parasitol. Exp. Mol. Biochem. Parasitol. , 821–829 (2008). 821–829 , ). II. Morphological development of the of development Morphological II. ). BMC Evol. Biol. Evol. BMC . anradts briggsae Caenorhabditis

Genome Biol. Genome Escherichia coli Escherichia 80

nematode genome with RNA-seq. 10 Genome Biol. Genome 11 Necator americanus

, 24–28 (1994). 24–28 , 6 , 417–430 (2012). 417–430 , eoe rvds unique a provides genome , e1000827 (2010). e1000827 , , 307 (2010). 307 , Haemonchus contortus Haemonchus e novo de a. Genet. Nat. . in the hamster: observations hamster: the in l transcriptome and exploring and transcriptome J. Helminthol. -alanine amidase with broad Nematology LS Biol. PLoS Los 11) n golden in 1911) (Looss,

105 e novo de Cell

hr ra assembly read short s r e t t e l

, 192–200 (2003). 192–200 , 11 14 Caenorhabditis 8

.

, 41 (2007). 41 , 14 110 153 , 339 (2011). 339 , , R89 (2013). R89 ,

J. Bacteriol. J. Heligmosomoides 40 , R88 (2013). R88 , ee finding. gene

157 2 9 7748–7753 ,

11 , 1567–1578 1567–1578 , 1193–1198 , Hum. Vaccin. Hum. 46 , 27 (2012). 27 , e1001050 , . Ancylostoma a platform a : Nat. Genet. 927–950 , , 187–192 , 357–362 , a key a ,

, with ,

421 189 98 , ,

36. Ranjit, N. et al. Proteolytic degradation of hemoglobin in the intestine of the human hookworm Necator americanus. J. Infect. Dis. 199, 904–912 (2009).
37. Knox, D. in Parasitic Helminths: Targets, Screens, Drugs, and Vaccines (ed. Caffrey, C.R.) 399–420 (Wiley-VCH Verlag & Co., 2012).
38. Pearson, M.S. et al. Molecular mechanisms of hookworm disease: stealth, virulence, and vaccines. J. Allergy Clin. Immunol. 130, 13–21 (2012).
39. Klotz, C. et al. A helminth immunomodulator exploits host signaling events to regulate cytokine production in macrophages. PLoS Pathog. 7, e1001248 (2011).
40. Hartmann, S. & Lucius, R. Modulation of host immune responses by nematode cystatins. Int. J. Parasitol. 33, 1291–1302 (2003).
41. Manoury, B., Gregory, W.F., Maizels, R.M. & Watts, C. Bm-CPI-2, a cystatin homolog secreted by the filarial parasite Brugia malayi, inhibits class II MHC–restricted antigen processing. Curr. Biol. 11, 447–451 (2001).
42. Gerstein, M.B. et al. Integrative analysis of the Caenorhabditis elegans genome by the modENCODE project. Science 330, 1775–1787 (2010).
43. Kim, D., Grun, D. & van Oudenaarden, A. Dampening of expression oscillations by synchronous regulation of a microRNA and its target. Nat. Genet. 45, 1337–1344 (2013).
44. Choi, Y.J. et al. A deep sequencing approach to comparatively analyze the transcriptome of lifecycle stages of the filarial worm, Brugia malayi. PLoS Negl. Trop. Dis. 5, e1409 (2011).
45. Osman, A. et al. Hookworm SCP/TAPS protein structure—a key to understanding host-parasite interactions and developing new interventions. Biotechnol. Adv. 30, 652–657 (2012).
46. Hewitson, J.P. et al. Proteomic analysis of secretory products from the model gastrointestinal nematode Heligmosomoides polygyrus reveals dominance of venom allergen-like (VAL) proteins. J. Proteomics 74, 1573–1594 (2011).
47. Thomas, J.H. & Robertson, H.M. The Caenorhabditis chemoreceptor gene families. BMC Biol. 6, 42 (2008).
48. He, H. et al. Preliminary molecular characterization of the human pathogen Angiostrongylus cantonensis. BMC Mol. Biol. 10, 97 (2009).
49. Bendtsen, J.D., Jensen, L.J., Blom, N., Von Heijne, G. & Brunak, S. Feature-based prediction of non-classical and leaderless protein secretion. Protein Eng. Des. Sel. 17, 349–356 (2004).
50. Borloo, J. et al. In-depth proteomic and glycomic analysis of the adult-stage Cooperia oncophora excretome/secretome. J. Proteome Res. 12, 3900–3911 (2013).
51. Yoshida, A., Nagayasu, E., Horii, Y. & Maruyama, H. A novel C-type lectin identified by EST analysis in tissue migratory larvae of Ascaris suum. Parasitol. Res. 110, 1583–1586 (2012).
52. Zelensky, A.N. & Gready, J.E. The C-type lectin–like domain superfamily. FEBS J. 272, 6179–6217 (2005).

53. Loukas, A., Doedens, A., Hintz, M. & Maizels, R.M. Identification of a new C-type lectin, TES-70, secreted by infective larvae of Toxocara canis, which binds to host ligands. Parasitology 121, 545–554 (2000).
54. Scannell, J.W., Blanckley, A., Boldon, H. & Warrington, B. Diagnosing the decline in pharmaceutical R&D efficiency. Nat. Rev. Drug Discov. 11, 191–200 (2012).
55. Taylor, C.M. et al. Discovery of anthelmintic drug targets and drugs using chokepoints in nematode metabolic pathways. PLoS Pathog. 9, e1003505 (2013).
56. Fyfe, P.K., Dawson, A., Hutchison, M.T., Cameron, S. & Hunter, W.N. Structure of Staphylococcus aureus adenylosuccinate lyase (PurB) and assessment of its potential as a target for structure-based inhibitor discovery. Acta Crystallogr. D Biol. Crystallogr. 66, 881–888 (2010).
57. Ashrafian, H., Horowitz, J.D. & Frenneaux, M.P. Perhexiline. Cardiovasc. Drug Rev. 25, 76–97 (2007).
58. Sivendran, S. et al. Identification of triazinoindol-benzimidazolones as nanomolar inhibitors of the Mycobacterium tuberculosis enzyme TDP-6-deoxy-d-xylo-4-hexopyranosid-4-ulose 3,5-epimerase (RmlC). Bioorg. Med. Chem. 18, 896–908 (2010).
59. Farelli, J.D. et al. Structure of the trehalose-6-phosphate phosphatase from Brugia malayi reveals key design principles for anthelmintic drugs. PLoS Pathog. 10, e1004245 (2014).
60. Zarowiecki, M. & Berriman, M. What helminth genomes have taught us about parasite evolution. Parasitology 142 (suppl. 1), S85–S97 (2015).
61. Gilleard, J.S. Haemonchus contortus as a paradigm and model to study anthelmintic drug resistance. Parasitology 140, 1506–1522 (2013).
62. Durette-Desset, M.C., Beveridge, I. & Spratt, D.M. The origins and evolutionary expansion of the Strongylida (Nematoda). Int. J. Parasitol. 24, 1139–1165 (1994).
63. McSorley, H.J. & Maizels, R.M. Helminth infections and host immune regulation. Clin. Microbiol. Rev. 25, 585–608 (2012).
64. Yeates, G.W. & Boag, B. Female size shows similar trends in all clades of the phylum Nematoda. Nematology 8, 111–127 (2006).
65. Gems, D. Longevity and ageing in parasitic and free-living nematodes. Biogerontology 1, 289–307 (2000).
66. Desjardins, C.A. et al. Genomics of Loa loa, a Wolbachia-free filarial parasite of humans. Nat. Genet. 45, 495–500 (2013).
67. Somvanshi, V.S., Ellis, B.L., Hu, Y. & Aroian, R.V. Nitazoxanide: nematicidal mode of action and drug combination studies. Mol. Biochem. Parasitol. 193, 1–8 (2014).
68. Crowther, G.J. et al. Cofactor-independent phosphoglycerate mutase from nematodes has limited druggability, as revealed by two high-throughput screens. PLoS Negl. Trop. Dis. 8, e2628 (2014).

ONLINE METHODS

General summary. Some details of these methods are provided below; considerably more extensive details are provided in the Supplementary Note. Culture and infection of A. ceylanicum in golden hamster (Mesocricetus auratus) were carried out as described (ref. 69). All housing and care of laboratory animals used in this study conformed with the US National Institutes of Health Guide for the Care and Use of Laboratory Animals in Research (see 18-F22) and all requirements and all regulations issued by the US Department of Agriculture (USDA), including regulations implementing the Animal Welfare Act (Public Law 89-544, US Statutes at Large) as amended (see 18-F23). Stages of A. ceylanicum selected for developmental RNA-seq are shown in Figure 1 and listed in the Supplementary Tables; they are based on previously described stages of growth in golden hamster. The numbers of A. ceylanicum and hamsters used for RNA-seq are listed in the Supplementary Tables.

Genomic and RNA-seq sequencing were carried out largely as described (ref. 70). The A. ceylanicum genomic sequence was assembled from paired 100-nt Illumina reads (550 nt and 6 kb apart) with Velvet (1.2.05); gaps were closed after assembly with BGI GapCloser 1.12 (release_2011; ref. 71), and the sequence was reduced in possible heterozygosity (ref. 72) with HaploMerger (20111230; ref. 73). Genomic RNA scaffolding was performed by filtering RNA-seq reads with khmer (ref. 74) and then scaffolding with ERANGE (3.2). RNA-seq reads were assembled into cDNA with Oases (0.2.07; ref. 75). Assembled cDNAs were used both to assess genome completeness and to aid in the prediction of protein-coding genes. The true genomic size of A. ceylanicum was estimated by counting 31-mer frequencies with SOAPdenovo (V1.05), by CEGMA (v2.4.010312; ref. 76) and by mapping cDNAs to genomic DNA with BLAT (v. 34; ref. 77). Repetitive DNA elements in the final genome assembly were identified with RepeatScout (1.0.5; ref. 78).

We predicted protein-coding genes for our final genomic assembly with AUGUSTUS (2.6.1), after generating species-specific parameters with one round of MAKER2 (2.26-beta; ref. 79) (see URLs for the protocol by S. Kumar for running MAKER2) and using hints from cDNA that had been mapped to the genome assembly with BLAT. For predicted A. ceylanicum proteins, we annotated signal and transmembrane sequences with Phobius (ref. 80), low-complexity regions with SEG (ref. 81), coiled-coil domains with NCoils (ref. 82), Pfam 26.0 domains (from both Pfam-A and Pfam-B; ref. 83) with HMMER 3.0/hmmsearch (ref. 84), InterPro domains with InterProScan (4.8; ref. 85) and GO terms with Blast2GO 2.5 (build 23092011) (see URLs for protocols for running Blast2GO and InterProScan). We also assigned GO terms to C. elegans genes with Blast2GO so that comparisons of GO terms between different nematode species would be based on equivalent GO term assignments. We computed orthologies with OrthoMCL (1.3; ref. 86). Strict orthologies between genes from two or more species were defined as those orthology groups that contained only one predicted gene for each of the species. Annotations for protein-coding genes are listed in the Supplementary Tables.

For RNA-seq analysis of genes, we mapped RNA-seq reads to genes with Bowtie 2 (ref. 87) and quantified gene expression with RSEM (1.2.0; ref. 88). For C. elegans, we used published developmental RNA-seq data from the modENCODE consortium (ref. 42). For H. contortus, we used published developmental RNA-seq data. For individual genes, we computed the significance for changes in gene activity between stages or biological conditions with NOISeq-sim (2.13; ref. 89). Because we had only one biological replicate per condition, we sampled five random subsets of RNA-seq data per condition to estimate the significance of changes in gene activity. For GO terms, we used FUNC 0.4.5 with Wilcoxon rank-sum statistics to compute which GO terms were significantly associated with genes up- or downregulated between developmental stages or changes of environmental conditions (for example, drug treatment). For protein families, we also used rank-sum statistics to compute such associations.

For phylogenetic analyses, sequences homologous to a protein or single domain were extracted with psi-BLAST (ref. 90) or HMMER/jackhmmer. Protein sequences were aligned with MUSCLE (3.8.31; ref. 91) or MAFFT (v7.158b; ref. 92); alignments were edited with Trimal (v1.4.rev15; ref. 93) and visualized with JalView (2.8; ref. 94). Protein maximum-likelihood phylogenies and their branch confidence levels were computed with FastTree (2.1.7; ref. 95) and visualized with FigTree 1.4 (see URLs).

doi:10.1038/ng.3237
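The strict-orthology definition given above (an orthology group counts as strict for a set of species when it contains exactly one predicted gene from each of them) can be illustrated with a minimal Python sketch. The function name, the species labels and the `(species, gene)` tuple layout are hypothetical and are not taken from OrthoMCL's output format; ignoring genes from species outside the chosen set is also an assumption.

```python
from collections import Counter


def strict_orthology_groups(groups, species):
    """Return the groups holding exactly one gene for each listed species.

    `groups` maps a group ID to a list of (species, gene) tuples.
    Genes from species outside `species` are ignored (an assumption).
    """
    wanted = set(species)
    strict = {}
    for group_id, members in groups.items():
        counts = Counter(sp for sp, _gene in members if sp in wanted)
        # Counter returns 0 for an absent species, so missing species fail too.
        if all(counts[sp] == 1 for sp in wanted):
            strict[group_id] = members
    return strict


# Toy data with hypothetical species labels and gene names.
example_groups = {
    "OG1": [("acey", "g1"), ("cele", "c1")],                  # one gene each
    "OG2": [("acey", "g2"), ("acey", "g3"), ("cele", "c2")],  # paralogs in acey
    "OG3": [("acey", "g4")],                                  # cele missing
}
strict = strict_orthology_groups(example_groups, ["acey", "cele"])
```

Only `OG1` passes in this toy input, because `OG2` has two genes from one species and `OG3` lacks the second species.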

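The methods mention sampling five random subsets of RNA-seq data per condition because only one biological replicate was available. The sketch below shows one simple way to draw such subsets; it is not NOISeq-sim's own procedure, and the function name, the 20% subset size and the fixed seed are assumptions made only for illustration.

```python
import random


def random_subsets(items, n_subsets=5, fraction=0.2, seed=0):
    """Draw `n_subsets` random subsets, each sampled without replacement.

    `fraction` (share of items per subset) is an assumed value;
    the source does not state the subset size.
    """
    rng = random.Random(seed)  # seeded so that the draw is reproducible
    k = max(1, int(len(items) * fraction))
    return [rng.sample(items, k) for _ in range(n_subsets)]


# Hypothetical read identifiers standing in for one condition's RNA-seq data.
read_ids = list(range(100))
subsets = random_subsets(read_ids, n_subsets=5, fraction=0.2, seed=1)
```

Each subset can then be treated as a pseudo-replicate when estimating how variable a gene's expression is within a single condition.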
Assessing the completeness of genomic DNA. We estimated the assembly's completeness as 98% by computing the frequencies of 31-mers, as 91–99% by searching for conserved eukaryotic genes and as 93% by mapping cDNA (assembled independently from RNA-seq reads) to genomic DNA: these calculations supported a consensus value of 95%. The average number of orthologs observed for full-length core eukaryotic genes (ref. 76) was 1.13, which matched averages of 1.11–1.15 in C. elegans, C. briggsae and Caenorhabditis tropicalis (all of which are hermaphrodites and thus completely homozygous), suggesting that the assembly was largely free of unresolved heterozygosity. We searched the genome for tRNA genes with tRNAscan-SE-1.3.1 (ref. 96); this analysis detected a full complement of 426 tRNAs decoding all 20 standard amino acids and one selenocysteine tRNA.

Examining repetitive elements for possible horizontal gene transfer. In A. caninum, the repetitive element bandit resembles the HSMAR1 mariner-like transposon of humans and has been postulated to arise from a mammalian host by HGT (ref. 97). To determine whether a bandit homolog also existed in A. ceylanicum, we searched our library of A. ceylanicum repetitive elements with the DNA sequence for bandit via BLASTN (2.2.26+; ref. 90) (arguments: "-task blastn -evalue 1e-03"). This analysis yielded two hits, with E values of 0.0 and 7 × 10^−170. Phylogenetic analysis and domain analysis with HMMER/hmmsearch indicated that the higher-scoring hit represented an A. ceylanicum homolog of bandit, whereas the lower-scoring hit represented a partial homolog of bandit that did not encode a transposase domain (Transposase_1/PF01359.13 in Pfam).

To examine whether more evidence for lateral acquisition of repetitive elements existed in human hookworms, we used the DFAM database (ref. 98) to identify repetitive DNA elements in A. ceylanicum and N. americanus with similarity to human repetitive elements. This analysis identified two classes of elements with mammalian similarities, L3/Plat_L3-like retrotransposons and HSMAR1/2-like mariner elements. To determine whether these similarities were adventitious or real, we computed maximum-likelihood phylogenies for reverse-transcriptase domains (for L3/Plat_L3-like elements) and transposase domains (for HSMAR1/2-like elements). These phylogenies included all of the L3/Plat_L3-like and HSMAR1/2-like repetitive elements that we could detect in A. ceylanicum and N. americanus, in a diverse set of other published genome sequences from nematodes, vertebrates, arthropods, lophotrochozoans and deuterostomes and in a curated collection of eukaryotic elements from RepBase (ref. 99; see URLs for source). We extracted well-aligned, full-length protein domains from repetitive elements by requiring that they match the Pfam domains Transposase_1/PF01359.13 (for HSMAR1/2-like elements) or RVT_1 (reverse transcriptase)/PF00078.22 (for L3/Plat_L3-like elements) and also by excluding the shortest 10% of domain matches. These criteria led us to select 988 Plat_L3/L3-like RVT_1/PF00078.22 peptides (Supplementary Table 27a) and 168 HSMAR1/2-like Transposase_1/PF01359.13 peptides (Supplementary Table 27b), which we subjected to multiple-sequence alignment and phylogenetic analysis.

Analyzing protein-coding genes. For motif searches or OrthoMCL analyses of protein sequences, we used nematode and mammalian proteomes from genomic sequences and partial proteomes translated from ESTs. All proteomes and their sources are listed in Supplementary Table 3. We classified A. ceylanicum genes both by known protein motifs (through InterProScan 4.8 and HMMER 3.0/Pfam-A 26) and evolutionary relationships to genes in different species (through OrthoMCL 1.3). Pfam-A domains were detected at a threshold of E ≤ 1 × 10^−5; InterProScan and OrthoMCL were run with default parameters. We used Pfam-A and InterPro motifs, in turn, to assign GO terms to each gene with Blast2GO 2.5 (build 23092011). We performed InterProScan and Blast2GO according to available protocols (see URLs); for Blast2GO, we used both InterProScan predictions and BLASTP results against an animal-specific subset of the NCBI nr database (NCBI-nr; ref. 100). We computed orthology groups for our genes with OrthoMCL (1.3), for numbers of species ranging from 4 to 14. Strict orthologies between genes of two or more species were defined as those orthology groups that contained only one predicted gene for each of those species. Strict orthologies allowed us to compare transcriptional profiles between A. ceylanicum and C. elegans and thereby to identify a set of 406 genes that were strongly expressed under all conditions for which we had RNA-seq data from either A. ceylanicum or C. elegans (Supplementary Table 17).

Searching for horizontal gene transfer of protein-coding genes. To find possible cases of HGT of protein-coding genes from non-nematodes to A. ceylanicum, we used both orthologies (strict and non-strict) and Pfam-A domains (computed for all proteomes as with A. ceylanicum). Orthologies were considered to represent possible instances of HGT if they included A. ceylanicum and one or more genes from proteomes of Homo sapiens or Mus musculus but did not include C. elegans, C. briggsae, P. pacificus, Bursaphelenchus xylophilus or Meloidogyne hapla. Sets of genes encoding a shared Pfam-A domain were likewise considered to contain possible instances of HGT if the domains were present in A. ceylanicum and mammals but absent in C. elegans, C. briggsae, P. pacificus, B. xylophilus and M. hapla (each judged at an E-value threshold). Out of 33,243 orthology groups and 3,545 Pfam-A domains, we found 52 and 15 (respectively) that were possible instances of HGT. Each possible instance of HGT was individually checked by BLASTP searches of NCBI-nr. In most cases, BLASTP showed similarities to A. ceylanicum and other nematodes, which marked the putative HGTs as false positives. However, we also identified (through Pfam-A domains) one A. ceylanicum gene with strong similarity to bacterial amiD genes, Acey_s0012.g1873. To search for other such homologs, we reran our motif searches without the requirement for mammalian hits, but, on further testing with BLASTP against NCBI-nr, no other bacterial sequences were found.

Phylogenetic analysis of lectin homologs from the metazoa. In addition to Acey_s0012.g1873, we also observed eight A. ceylanicum genes that were more similar to vertebrate lectins than to nematode ones: these fell into three classes, showing similarity to mannose receptor (MRC), asialoglycoprotein receptor (ASGR) or neurocan (NCAN). We phylogenetically compared their domains to nematode, arthropod, deuterostome and lophotrochozoan proteomes, along with a small number of added individual nematode lectins that had been characterized because of their previously reported similarities to mammalian proteins (species and sources of proteome sequences are listed in the Supplementary Tables). To avoid misalignments and spurious similarities between multidomain proteins, we analyzed individual C-lectin domains rather than full-length lectin proteins; to identify coherent sets of homologs, we searched the custom proteome database with single-domain query sequences via psi-BLAST (2.2.26+), run for either three or four rounds with an E-value inclusion threshold. The query sequences used, with the corresponding numbers of psi-BLAST rounds, are listed in Supplementary Table 30. The resulting single-domain matches were realigned with MUSCLE (3.8.31) and phylogenetically analyzed as above. For each lectin class, the sequences in each resulting phylogeny are listed in Supplementary Table 31.

Phylogenetic analysis of amiD homologs from metazoa and bacteria. We first characterized non-bacterial and bacterial homologs of Acey_s0012.g1873 with BLASTP of NCBI-nr. This analysis yielded matches to bacterial sequences and to homologs from the hookworms N. americanus and A. ceylanicum (our own data, deposited into GenBank); it also gave nine matches to non-bacterial sequences from arthropods and basal animals (Supplementary Table 32). To more rigorously determine the phylogenetic origin of the amiD homologs in the hookworms, we generated a phylogeny for the entire Amidase_2 superfamily (N-acetylmuramoyl-l-alanine amidase; PFAM 27.0 motif PF01510.20), of which bacterial amiD genes represent one of four major subdivisions (ref. 102). We searched all of the proteomes listed in the Supplementary Tables, along with all of the individual metazoan amiD homologs listed in Supplementary Table 32, two different metagenomes from human stool and cow rumen and the entire 9 July 2014 release of UniProt (ref. 103). Species and data sources for additional proteomes are listed in Supplementary Table 33. We extracted subsequences matching the Amidase_2/PF01510.20 domain, realigned them with MAFFT v7.158b and phylogenetically analyzed them as above.
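The presence/absence screen described for possible HGT (an orthology group is a candidate when it contains A. ceylanicum and at least one mammalian proteome but none of five free-living or plant-parasitic nematodes) can be sketched as a set test. This is an illustration only: the function and constant names are hypothetical, and the sketch covers the orthology-group criterion, not the E-value-based Pfam-A criterion or the follow-up BLASTP checks.

```python
TARGET = "Ancylostoma ceylanicum"
MAMMALS = {"Homo sapiens", "Mus musculus"}
EXCLUDED_NEMATODES = {
    "Caenorhabditis elegans",
    "Caenorhabditis briggsae",
    "Pristionchus pacificus",
    "Bursaphelenchus xylophilus",
    "Meloidogyne hapla",
}


def is_hgt_candidate(species_in_group):
    """Apply the presence/absence rule to the species found in one group."""
    present = set(species_in_group)
    return (
        TARGET in present                          # hookworm gene is present
        and bool(present & MAMMALS)                # at least one mammal is present
        and not (present & EXCLUDED_NEMATODES)     # none of the excluded nematodes
    )


# Toy orthology groups (hypothetical membership).
candidate = is_hgt_candidate(["Ancylostoma ceylanicum", "Mus musculus"])
conserved = is_hgt_candidate(
    ["Ancylostoma ceylanicum", "Homo sapiens", "Caenorhabditis elegans"]
)
```

A group that passes this screen is only a candidate; as the methods note, most such candidates turned out to be false positives once checked individually.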

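The length filter described for repetitive-element domain matches (excluding the shortest 10% of matches before alignment) can be sketched as follows. How ties and rounding were handled is not stated in the source, so the floor-based cutoff here is an assumption, and the function name is hypothetical.

```python
def drop_shortest_fraction(matches, fraction=0.10):
    """Return matches sorted by length with the shortest `fraction` removed.

    The number dropped is floor(len(matches) * fraction); this rounding
    rule is an assumption, not a detail given in the methods.
    """
    ordered = sorted(matches, key=len)
    n_drop = int(len(ordered) * fraction)
    return ordered[n_drop:]


# Twenty toy peptide strings of lengths 1 to 20.
toy_matches = ["A" * n for n in range(1, 21)]
kept = drop_shortest_fraction(toy_matches)
```

With twenty toy matches, the two shortest are removed and eighteen remain.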
69. Hu, Y. et al. Mechanistic and single-dose in vivo therapeutic studies of Cry5B anthelmintic action against hookworms. PLoS Negl. Trop. Dis. 6, e1900 (2012).
70. Srinivasan, J. et al. The draft genome and transcriptome of Panagrellus redivivus are shaped by the harsh demands of a free-living lifestyle. Genetics 193, 1279–1295 (2013).
71. Li, R. et al. De novo assembly of human genomes with massively parallel short read sequencing. Genome Res. 20, 265–272 (2010).
72. Barrière, A. et al. Detecting heterozygosity in shotgun genome assemblies: lessons from obligately outcrossing nematodes. Genome Res. 19, 470–480 (2009).
73. Huang, S. et al. HaploMerger: reconstructing allelic relationships for polymorphic diploid genome assemblies. Genome Res. 22, 1581–1588 (2012).
74. Brown, C.T., Howe, A., Zhang, Q., Pyrkosz, A.B. & Brom, T.H. A single pass approach to reducing sampling variation, removing errors, and scaling de novo assembly of shotgun sequences. arXiv http://arxiv.org/abs/1203.4802 (2012).
75. Schulz, M.H., Zerbino, D.R., Vingron, M. & Birney, E. Oases: robust de novo RNA-seq assembly across the dynamic range of expression levels. Bioinformatics 28, 1086–1092 (2012).
76. Parra, G., Bradnam, K., Ning, Z., Keane, T. & Korf, I. Assessing the gene space in draft genomes. Nucleic Acids Res. 37, 289–297 (2009).
77. Kent, W.J. BLAT—the BLAST-like alignment tool. Genome Res. 12, 656–664 (2002).
78. Price, A.L., Jones, N.C. & Pevzner, P.A. De novo identification of repeat families in large genomes. Bioinformatics 21 (suppl. 1), i351–i358 (2005).
79. Holt, C. & Yandell, M. MAKER2: an annotation pipeline and genome-database management tool for second-generation genome projects. BMC Bioinformatics 12, 491 (2011).
80. Käll, L., Krogh, A. & Sonnhammer, E.L. A combined transmembrane topology and signal peptide prediction method. J. Mol. Biol. 338, 1027–1036 (2004).
81. Wootton, J.C. Non-globular domains in protein sequences: automated segmentation using complexity measures. Comput. Chem. 18, 269–285 (1994).
82. Lupas, A. Prediction and analysis of coiled-coil structures. Methods Enzymol. 266, 513–525 (1996).
83. Punta, M. et al. The Pfam protein families database. Nucleic Acids Res. 40, D290–D301 (2012).
84. Eddy, S.R. A new generation of homology search tools based on probabilistic inference. Genome Inform. 23, 205–211 (2009).
85. McDowall, J. & Hunter, S. InterPro protein classification. Methods Mol. Biol. 694, 37–47 (2011).
86. Li, L., Stoeckert, C.J. Jr. & Roos, D.S. OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res. 13, 2178–2189 (2003).
87. Langmead, B. & Salzberg, S.L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
88. Li, B. & Dewey, C.N. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics 12, 323 (2011).
89. Tarazona, S., Garcia-Alcalde, F., Dopazo, J., Ferrer, A. & Conesa, A. Differential expression in RNA-seq: a matter of depth. Genome Res. 21, 2213–2223 (2011).
90. Camacho, C. et al. BLAST+: architecture and applications. BMC Bioinformatics 10, 421 (2009).
91. Edgar, R.C. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797 (2004).
92. Katoh, K. & Standley, D.M. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol. 30, 772–780 (2013).
93. Capella-Gutiérrez, S., Silla-Martinez, J.M. & Gabaldon, T. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25, 1972–1973 (2009).
94. Waterhouse, A.M., Procter, J.B., Martin, D.M., Clamp, M. & Barton, G.J. Jalview Version 2—a multiple sequence alignment editor and analysis workbench. Bioinformatics 25, 1189–1191 (2009).
95. Price, M.N., Dehal, P.S. & Arkin, A.P. FastTree 2—approximately maximum-likelihood trees for large alignments. PLoS ONE 5, e9490 (2010).
96. Lowe, T.M. & Eddy, S.R. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 25, 955–964 (1997).
97. Laha, T. et al. The bandit, a new DNA transposon from a hookworm-possible horizontal genetic transfer between host and parasite. PLoS Negl. Trop. Dis. 1, e35 (2007).
98. Wheeler, T.J. et al. Dfam: a database of repetitive DNA based on profile hidden Markov models. Nucleic Acids Res. 41, D70–D82 (2013).
99. Jurka, J. et al. Repbase Update, a database of eukaryotic repetitive elements. Cytogenet. Genome Res. 110, 462–467 (2005).
100. Sayers, E.W. et al. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 40, D13–D25 (2012).
101. Schäffer, A.A. et al. Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements. Nucleic Acids Res. 29, 2994–3005 (2001).
102. Firczuk, M. & Bochtler, M. Folds and activities of peptidoglycan amidases. FEMS Microbiol. Rev. 31, 676–691 (2007).
103. UniProt Consortium. Activities at the Universal Protein Resource (UniProt). Nucleic Acids Res. 42, D191–D198 (2014).

ERRATA

Erratum: The genome and transcriptome of the zoonotic hookworm Ancylostoma ceylanicum identify infection-specific gene families Erich M Schwarz, Yan Hu, Igor Antoshechkin, Melanie M Miller, Paul W Sternberg & Raffi V Aroian Nat. Genet. 47, 416–422 (2015); published online 2 March 2015; corrected after print 5 May 2015

In the version of this article initially published, the following two sentences were omitted from the Acknowledgments: “Sequencing was carried out at the Millard and Muriel Jacobs Genome Facility at the California Institute of Technology. This work was supported by US National Institutes of Health grants to P.W.S. (GM084389) and to R.V.A. (AI056189), by Cornell University salary and start-up funds to E.M.S. and by the Howard Hughes Medical Institute to P.W.S.” The error has been corrected in the HTML and PDF versions of the article.
