<<

l e t t e r s

OPEN

The genome and transcriptome of the zoonotic ceylanicum identify infection-specific gene families

Erich M Schwarz1, Yan Hu2,3, Igor Antoshechkin4, Melanie M Miller3, Paul W Sternberg4,5 & Raffi V Aroian2,3

Hookworms infect over 400 million people, stunting and closely related to the free-living Caenorhabditis elegans than is the impoverishing them1–3. Sequencing hookworm genomes and free-living Pristionchus pacificus (Fig. 2)12–15. Treatments effective finding which genes they express during infection should against A. ceylanicum might thus also prove useful against other help in devising new drugs or vaccines against hookworms4,5. strongylids, such as Haemonchus contortus, that infect farm ani- Unlike other , Ancylostoma ceylanicum infects mals and depress agricultural productivity16. Characterizing the both humans and other mammals, providing a laboratory genome and transcriptome of A. ceylanicum is a key step toward such model for hookworm disease6,7. We determined an comparative analysis. A. ceylanicum genome sequence of 313 Mb, with We assembled an initial A. ceylanicum genome sequence of 313 Mb transcriptomic data throughout infection showing expression and a scaffold N50 of 668 kb, estimated to cover ~95% of the genome, of 30,738 genes. Approximately 900 genes were upregulated with Illumina sequencing and RNA scaffolding17,18 (Supplementary during early infection in vivo, including ASPRs, a cryptic Tables 1–3). The genome size was comparable to those of Ancylostoma subfamily of activation-associated secreted proteins (ASPs)8. caninum (347 Mb)19 and H. contortus (320–370 Mb)20,21 but larger Genes downregulated during early infection included than those of N. americanus, C. elegans and P. pacificus (100– ion channels and G protein–coupled receptors; this 244 Mb)22–24. We found that 40.5% of the genomic DNA was repeti- downregulation was observed in both parasitic and free-living tive, twice as much as in N. americanus, C. elegans or P. pacificus . Later, at the onset of heavy blood feeding, C-lectin (17–24%). We predicted 26,966 protein-coding genes25 with products genes were upregulated along with genes for secreted clade V of ≥100 residues (Supplementary Table 4). We also predicted 10,050 proteins (SCVPs), encoding a previously undescribed protein genes with products of 30–99 residues, to uncover smaller proteins family. These findings provide new drug and vaccine targets that might aid in parasitism26. With RNA sequencing (RNA-seq), and should help elucidate hookworm pathogenesis. we detected expression of 23,855 (88.5%) and 6,883 (68.5%) of these genes, respectively (Fig. 3). The two hookworm species causing the most infections are Necator The genomes of plant-parasitic, necromenic and -parasitic americanus and , which are generally restricted nematodes have all acquired bacterial genes through horizontal gene to human hosts1,9. Hookworms are free living during part of their transfer (HGT)27,28. We detected one instance of bacterial HGT in life cycle, with eggs hatching in soil and larvae feeding on bacteria A. ceylanicum: Acey_s0012.g1873, a homolog of the N-acetylmuramoyl- through the first and second larval stages. At the infectious third- l-alanine amidase amiD, which encodes a protein that may help stage larval phase (L3i), hookworms cease feeding and wait until they bacteria recycle their murein29. Acey_s0012.g1873 was strongly encounter a human . They generally enter their host by burrow- expressed in L3i and then downregulated in all later stages of ing into skin, although Ancylostoma can alternatively enter by being infection. It has nine predicted introns, presumably acquired after swallowed. Hookworms then pass through the bloodstream, HGT; it has only one homolog in the entire phylum and digestive tract to the , where they affix themselves, (NECAME_15163 from N. americanus) but many bacterial homologs mature to adulthood, mate and lay eggs that are excreted by the (Supplementary Fig. 1 and Supplementary Table 5). The sap-feeding host1. The ability to culture A. ceylanicum in allows insects Acyrthosiphon pisum and Planococcus citri also have amiD it to be used as a model system for the human-specific hookworms genes, acquired by HGT, that may promote bacterial lysis30,31. N. americanus and A. duodenale, upon which new drug and vaccine To find genes acting at specific points of infection, we carried out candidates can be tested (Fig. 1)6,10,11. Human-specific hookworms RNA-seq on specimens collected at developmental stages spanning belong to a class of parasitic nematodes, strongylids, that are more the onset and establishment of infection by A. ceylanicum in golden

1Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York, USA. 2Program in Molecular Medicine, University of Massachusetts Medical School, Worcester, Massachusetts, USA. 3Section of Cell and Developmental Biology, University of California, San Diego, La Jolla, California, USA. 4Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California, USA. 5Howard Hughes Medical Institute, California Institute of Technology, Pasadena, California, USA. Correspondence should be addressed to E.M.S. ([email protected])

Received 12 January 2014; accepted 5 February 2015; published online 2 March 2015; doi:10.1038/ng.3237

Nature Genetics ADVANCE ONLINE PUBLICATION  l e t t e r s

Figure 1 Life cycle of A. ceylanicum. A. ceylanicum hatch in feces and 19.D L3i 24.HCM grow as free-living first- to third-stage (L1–L3) larvae. Before exiting the third larval stage, they mature into infectious third-stage (L3i) larvae, arresting further development until they are inside a host. In 24 h after gavage into golden hamsters, A. ceylanicum are still in the but 17.D 24.PI have exited the L3i stage (24.PI). A standard model for parasite infection is to incubate L3i larvae for 24 h in hookworm culture medium (24.HCM), which evokes changes in larval shape and behavior thought to mimic those of 24.PI larvae in vivo. By 5 d after infection, larvae have migrated further to the intestine, affixed themselves there and grown into early fourth-stage 12.D 5.D (L4) larvae (5.D; female shown) with visible sexual differentiation. By 12 d (12.D; female shown), they start heavy blood feeding and become young adults with mature males and a few gravid females, with little or no egg laying. By 17 d (17.D; male shown), they are fully mature adults. They begin laying many eggs that are deposited outside the host during defecation, renewing the life cycle. From 19 d onward (19.D; male shown), they remain fertile adults for weeks in hamsters. Scale bars: with N. americanus or other nematodes might encode previously 100 m (L3i through 5.D), 500 m (12.D) and 1 mm (17.D and 19.D). µ µ undescribed components of infection (Supplementary Table 10). Proteases, protease inhibitors, nucleases and protein synthesis were hamster (Figs. 1 and 3, and Supplementary Table 6), beginning at upregulated during early infection (L3i to 24.PI; Supplementary L3i and followed by 24 h either of incubation in hookworm culture Tables 9a and 11a); proteases and protease inhibitors were medium (24.HCM), a standard model for early hookworm infection32, also upregulated after L3i in N. americanus24, as were proteases in or infection in the hamster stomach (24.PI). We found 942 genes to H. contortus21. Secreted proteases could allow hookworms to digest be significantly upregulated from L3i after 24 h of infection in vivo host proteins in blood and intestinal mucosa6,11,35–37. Secreted (Supplementary Table 7). In contrast, we observed only 240 genes proteases might also digest and inactivate proteins of the host’s significantly upregulated from L3i after 24 h of incubation in HCM, immune system37,38. Conversely, secreted protease inhibitors could of which 141 were also upregulated with in vivo infection. This lower also suppress host immunity39–41. number matches previous observations32 and shows that infection G protein–coupled receptors (GPCRs), receptor-gated ion channels in vivo has stronger effects on gene activity than its in vitro model. and neurotransmission-related functions in general were downregu- We linked known or probable gene functions to steps of infec- lated during early infection (L3i to 24.PI), along with transcription tion by assigning gene ontology (GO) terms to A. ceylanicum genes33 factors (Supplementary Tables 9b and 11b). We observed the and computing which GO terms were over-represented among same pattern among genes downregulated in the transition from L3 genes upregulated or downregulated in developmental transitions to fourth-stage (L4) larvae both in H. contortus21 and C. elegans42 (Supplementary Tables 8 and 9)34. We also analyzed homologous (Supplementary Table 8). This finding is consistent with down- gene families for disproportionate upregulation or downregulation; regulation after L3 of sensory perception and transcription genes in in particular, gene families identified by orthology of A. ceylanicum both C. elegans43 and N. americanus24 and of ion channel genes in A. caninum and Brugia malayi32,44. Such downregulation might thus A. ceylanicum C. elegans be conserved in both parasitic and free-living nematodes. A. ceylanicum 9,860 7,042 Among gene families upregulated during early infection, we found

N. americanus 7,506 6,225 some already known from other parasitic nematodes, such as ASPs

H. contortus 5,268 4,576

C. elegans 7,042 11,323 L3i 24.HCM 24.PI 5.D 12.D 17.D 19.D

C. briggsae 6,933 10,795

P. pacificus 4,833 4,979

B. xylophilus 5,373 5,587

M. hapla 4,161 4,376

A. suum 5,955 6,209

B. malayi 5,312 5,545

D. immitis 5,166 5,308

T. spiralis 3,015 3,186

Figure 2 Evolutionary relatedness of A. ceylanicum to other nematodes. 13 14 The phylogeny is derived from van Megen et al. and Kiontke et al. . log2 (TPM) 0 15 0.5 1.0 1.5 2.0 2.5 3.0 N. americanus and H. contortus are strongylid parasites and the closest –2.5 –2.0 –1.5 –1.0 –0.5 ≥ ≤–3.0 relatives of A. ceylanicum. C. elegans, C. briggsae and P. pacificus are free-living, non-parasitic nematodes. Nematodes from distinct groups Figure 3 RNA expression levels for 30,738 A. ceylanicum genes. 12 (clades) within the phylum are color-coded: black, A. ceylanicum and Gene activity during infection is shown in log2-transformed transcripts close relatives, clade V; green, plant parasites, clade IV; pink, ascarid and per million (TPM), with k partitioning of the genes into 20 groups. filarial animal parasites, clade III; orange, Trichinella, an animal parasite Genes in yellow and blue are up- and downregulated, respectively; TPM from clade I. To the right are the numbers of strictly orthologous genes for values are shown ranging from ≤2−3 to ≥23. Developmental stages are as A. ceylanicum or C. elegans and other species. Self-comparisons (bold) in Figure 1. Changes in gene expression after 24 h of growth in HCM list all strictly defined orthologs within a genome. A. ceylanicum and (24.HCM) are relatively minor, as opposed to the far-reaching changes C. elegans have similar orthology to diverse nematode species. in gene expression seen after 24 h of infection in vivo (24.PI).

 aDVANCE ONLINE PUBLICATION Nature Genetics l e t t e r s

(Supplementary Table 12a)21,24,45,46. ASP genes encode a diverse set in some strongylids (for example, N. americanus; Supplementary of secreted cysteine-rich proteins, whose functions probably include Tables 13 and 14) but not all (for example, H. contortus). Most ASPR blocking immune responses and blood clotting8. However, we also proteins were predicted to be secreted (Supplementary Table 4), and found a family of 92 genes collectively upregulated during early infec- one ASPR in Heligmosomoides bakeri is secreted by parasitic adults46. tion in vivo (24.PI; q value = 0.003) that had no obvious similarity Thus, like ASPs, ASPRs might comprise an important element of to known gene families (Supplementary Tables 4 and 12a). By con- in vivo. trast, upregulation of these genes after 24 h of simulated infection A. ceylanicum had 432 ASP genes, noticeably more than the related in vitro was insignificant (24.HCM; q value = 0.93). These homologs parasites N. americanus (128 genes) and H. contortus (161 genes) and were distantly related to ASPs, so we termed them ASP-related genes remarkably more than the non-parasitic C. elegans and P. pacificus (ASPRs; Fig. 4 and Supplementary Fig. 2). We found other ASPRs (35 and 33 genes, respectively). A. ceylanicum and N. americanus

ASPRs

A

A

A H

S

S 2

O ASPR-s0201.g1 S ASPR-s0015.g28 b ASPR-s0233.g3073/61- Nam-ASP Nam-ASPR-g15 ASPR-s0015.g2881 6

Nam-ASPR-g155 8 5

P A 3

Nam-ASPR-g P 3 8 0 a P

d 9

ASPR-s0 7 ASPR-s00

1 1 Nam-ASPR-g15526/ 3

3 S R k R

e ASP-s0031.g224 R - 1 ASPR-s00 1 9 2 6 1 - ASPR-s0145 -

P n

- 1

- - - -14 ASPR-s0031 - A - 9 9

s

8 ASPR-s0004. s 1 ASPR-s0020.g40/ s - ASP-s0 1

R 2 3

/ ASPR-s012 2

S 0 /3-13 A -136 14 /

0 / /

/ 0 1-169 Nam-ASP /5-13 ASPR-s00 9 - /30-149 ASPR-s0 0 9 9 3 4 1 P ASPR-s01 S 2 -140 4 9/13-152 s /7-143 0

9

1 0 3-137 6 0 5 4 7 4 5 R ASPR-s 0 P ASPR-s0623.g780/17-15 4 7 ASPR-s059 1 5 8 /7-15 8 7 7 42/23 7 R-g17038/54- 7 9-236 0 -17 - R 3 031.g2245/ 0 . 1

. 0 1

ASPR-s0599. 0 1 n . 2

g

g -14 g

g

- 1 Nam-ASPR-g g 19/54- 42.g572/ g 1

. s 1 2 3

2 1 .

031.g2 1 .

-

1 ASPR-s00 ASPR-s003 - -129 15.g2843/69-19 p ASPR-s0352.g3266/ 8 5/8-90 5 .

8 0 7

8 7 -243 3396/7 28/9 8 15528/5- g

R 8

- R 1 1426/8-14 2

7 ASP-s1162.g3713/1 7 5 5 Nam-ASPR-g13006 1 1 0 5 7 11/40 711/1-14

0 022.g49 268/2-153 P Oden-AS R-g047 P 8 .g2468/26-1 4 4 /15

0 4 6 0064.g 8 18.g3621/62- 0.g90 .g2249 4 1 77/1-144 ASPR- 0 Nam-ASPR- 06.g37 457.g1813/ S

/ 0 S 0 0 30/4-164 / / / 2 g1889/19-146 9 s 1 .g14 2 1 s SPR

/ s

A

- .g208 A 415/41-1 /

8 8/129-205 2 248/6-8 - - 1 3 - 2 0 81.g1435/23-13 - -s0457.g179 3 -s0457.g1806 ASPR-s02ASPR-s00 /8-146 /1-14

- R 9.g480 8-18 - - - n SPR-254 - R n 3.g13 ASPR-s0 - R -22809/18 1417 1-16 9 2 1

1 21-146 1 80/1-11 1 73.g2087/6-14 Nam-ASPR-g0 1

P 6-155 e 3/63-19 e P

2 3 P 473 Nam-ASPR-g09 67-183 0 s0081.g1431/1 5 05.g2681/44- 6 3511/ 135 5 Nam-ASPR-g09 53/2- d 150 18 d s0015.g286 S /4-7 R-s0020.g78/37-124 -101 135 6/44-192 3 3 9-19 S 34/66- 6 g483/50- S 0 6.g 9.g111/7- 2 7 2 5 1 PR-24659 17186/88-17 A ASPR-s023 O 4

O ASPR-s0457.g1807 A A 4 ASPR-s0457.g1800/4-14 ASPR 9 ASPR ASPR-s0 Oden-A 456/2 ASP-s0010.g1 Nam-ASPR-g16545 34-15 13 0 ASPR-s0042.g574/8-15 /31-16 ASPR-s0045.g1247/8ASPR-s0287.g1455 ASP-s 9 7/7- 5 28 ASPR-s0042.g717 144 ASP 7.g2983/21-16 1 ASP-s0 102019/ 90-22 9 Oden-A ASPR-s0357.g ASPR-s0081.g 6 043.g842/1 228 3 ASPR- 10.g2 42.g709/2-159 172 ASPR-s00 ASP-s0010.g ASPR-s0081.g1427/9-146 015.g2879 0/9-1 ASP-s0010 ASPR-s0081 ASPR-s0 3 ASPR-s04 13 ASPR-s028 .g111 ASP-s06 37-115 6 0010.g1059/15-14 13 1 7 Oden-ASPR /10-141 ASPR-s0081.g1423/27-10 -9 0 ASPR-s0046.g1420/75-15 37/9-129/13-141 ASP-s0181. 0/10-15 /3-145 ASPR-s0046.g1 4.g3130/ 09-24 ASPR- 121/20- 0 ASP-s001 010.g105 9692/58 178 8 ASPR-s004 8 3.g3068 8 ASPR-s0659.g1258/48-14 9-116 9 ASP-s000 0 0 ASPR-s0659.g1260/51-16 0-127 694/48 0.67 0.82 ASPR-s0034.g2844/49-1 /94-16 0.96

2 693/9-9 . ASPR-s0015.g28 ASP-s00 R-s0020.g73/24 9 ASPR-s0 6 ASPR-s009 -13 3 0.91 /113-19 063/47-140 . ASP-s0045.ASP-s0045.g11 1-147 0.74 4 ASPR-s0258.g R-s0123 7 ASPR-s0020.g41/19-103 36.g940/84-1.g10 0 0.58 s0002.g551/ 0.94 R-s023 2 0 0.84 Nam-A 3 ASPR-s0020.g74/8 6 0.79 1056/9-14 0. -142 ASP 0 ASP-s004 /6-150 6 -182 0. ASPR-s0031.g2246/24-100 00 0.g1054/ 7/25-165 01315/5 60/102-23 6 Nam-ASP- 6 ASP g867/6 73/63-16 4 0.18 ASP 45.g1162/15.g2 0 R-g03029 /3-14 0.33 4 ASPR-s0020.g13 ASP-s0 0 ASPR-s0025.g1272/8- R-s0010.g114 SP-g09 9 0.7 ASPR-s0188.g1150/81-182 ASP-s042 0.26 . ASP ASPR-s0148.g2676/28-103 1 0.78 0.4 399/ 0 5 Nam-A 6 0.89 ASPR-s0013.g20 g1167/ 7-15 0.96 ASP 3 5.g1157 3 0.99 1 -s0010.g107 12-1375-176 ASPR-s0013.g2039/13-14 SPR-g17814/4R-g15529/3-142 Nam-ASP-Nam-ASP 85 8 0.6 ASPR-s0020.g126/1 636.g935/ 28-14 SPR-g172 g155 509/20 59/4-169 0.28 ASPR-s0020.g128/7-136 0.48 Nam-ASP ASP-s0 8 0.93 ASP-s00 SP-g06987/22- 3-16 1 0.52 Nam-ASPR-g 8.g1294/ 17-18 0.53 0.34 0.53 0.83 0.79 ASPR-s0015.g2865 47/12 0.72 Nam-A ASP-s0046.g1373/1 6 1 ASP-s004 /3-21 0.14 0.78 Nam-A 1-30 6 067.g109/86-234 0.92 0.41 5 -g08845 0.84 1 Nam-ASP 046.g 4 0.8 69/147-195 g15941/67-216 73-165 0.84 7 Nam-ASPR-065676/24-16 -7 5 ASP-s0 0.51 1 46.g1397/3 0/42-141 -124 7 6 Nam-ASPR-g15527R-s0233.g3072/62-14 0.33 23562/12-154

7 803/10-135 ASP-s0050.g2030/ 15-80 9 0.79 Nam-ASPR-g17261/3-14

Nam-ASP- . 0.94 0.8 Nam-ASPR-g13007/14-1 SPR-12578/2-14 8 Nam-ASP- 0 ASP-s0101. 1371 0.86 0.88 7 ASP SPR-12576/2-143 ASP-s0 6.g137 -8 3 ASP-5N /3-163 0.23 ASPR-s0 9283/2 2/2-10 9 116.g6 175 ASPR-s0067.g115/31-173 0.99 0. 0.55 0.9 7 /8-18 0 0.83 Oden-ASPR-12577/2-142

1 0.99 3 . 0.92 0.81 7 Oden-A 1729 2/32-21 7-191 8 0.

. 490/13 g09588/7- 8 4 0.99 Oden-A 003.g1626 0.79 ASP- 11/9-1 0 Oden-ASPR- 1 g17854/9 6 0.79 ASP-s0007 -12 0.16 0.64 ASP-g18987/3-7-g17293/2-103 0.58 Nam-ASPR-g020 g3352/26-1 7 0.6 ASP-s0007.g3489/4-17 4 ASPR-s0012.g1 293/88-17 s0003. 4 Nam-ASP-g1 0.4 ASP-5A/2-1 1-98 82 0.84 7 ASP-s0 0.86 0.81 0.84 Nam- ASP-5B/3- 0.81 2 5 ASP-s0 134 0.31 . Nam-ASP-g ASP /348-413 8 0.92 774/6-182 -118 0 8-183 Nam-ASP 0.86 0.87 0.91 0. .g3490/37-205g1637 0.9 0428.g157.g3201/9-184 ASP-s00 0.21 0.35 Nam-ASP-g07 -s0815. 007.g3495/8- 38 0. 0.74 0.3 ASP-s 0.99 0.94 Nam-ASP-g07493/5-10 Nam-ASP 007.g3 9 ASP-s0119.g842/340-49 30-179 0.98 3 ASP-s0048 /8-178 75 0.26

174 7 ASP-s . 1 ASP-s01 0.34 ASP-s01 6 0002 02.g g249 0.68 0 ASP-s0 501/7-1 0.79 1 ASP-s0066.g3712/103-281 g2178/172/93-265 9 ASP-s0066.g3 ASP-s004 0. 0.85 0.84 9-16 -g1118.g608615/ 7/14-18 175 0.4 ASP-s0066.g3773/36-188

5 4 4 0.92 43.g241.g1600/14 0.34 0.0 ASP-s0240.g3336/ 469-612

8 0.98 ASP- 048.g1 19-1 6 . ASP-s0351.g3246/8-183

ASP-s0522.g2 /11-18 2 0.91 0 4/25-191 0. 0.94 0.48 ASP-s1508.g3901/29-138 ASP-s05 1 0.47 /79-257 s0522.g28.g1641/2 72 8 0.92 ASP-s0477. ASP-s 642/29 0.29 0.78 0128.g1456/22-100 3/224 0.94 ASP-s0477.g2 0-178 3-297 2 0.87 0.94 ASP-s1062.g3517/7-66 366/4-175 Nam-ASP-g 0.73 5 0.83 Nam-A 0522 . 22.g -384 0.52 0.98 0.37 ASP-s 0.g1034 365/1 0-45 0 2 ASP-s00 883/9-16 0.72 ASP-s0097.g2962/1 877/24-18566-42 0.89 ASP-s0048.g1657/164-324.g28 0.74 0.96 SP-g07 2887 0 0.85 ASP-s0016.g3114/ 0.71 0.99 0 0.81 ASP-s0010.g1028/4-176 ASP-s0048.g1654/55-215 90/1 897/20-1970/18-19 5 . 0.74 ASP-s0 48.g 07114/2-1/72- 5 ASP-s001 9 0.83 0.76 115/23 30-29 0.86 8 Nam-ASP-g01 159 213 -701 ASP-s004ASP-s0048. Nam-ASP-g01 1369/3-184 048.g1656/229-389 0.81 0.05 0.93 0.36 Nam-ASP-g01364/13-173 1/2 1 0.13 0.38 1 4393/10-181 85 ASP-s0048 7-40 24 0.38 0.88 Nam-ASP-g01363/7-178 8/526 63-3 /7-1 0.81 Nam-ASP-g13 ASP-s011 6 0.95 0.74 8.g1585 6 0.16 Nam-ASP-g0274 .g101 ASP-s0119.g8 g1657/ 8 0.85 0.79 Nam-ASP-g0 .g1019 1 ASP-s00 .g1655/28 0.66 Nam-ASP-g0 023/2-1893-37 3 ASP-s0041 9.g835/ 0.83 0.99 ASP-s0010.g1022/24-189 21 /208-3131-142 0.09 010.g1014/3-187 ASP-s025 0 ASP-s0010 6/21 48.g15 1 ASP 8-458 0.44 ASP-s0010 28/252-405217-36 0.99 .g62 263/43-3/245-408 ASP-s025 ASP-s0 -s0256.g353/273-376.g354/23587/73- 0 0.77 0.97 0.88

6.g35 0.92 . 0.86 ASP-s0010.g1 ASP-s025 8 1 0.94 8 ASP-s0610.g625/257-414 ASP-s0041. 0.88 9 0.64 0.8 59-400 7/256 23 0. 0.87 ASP-s0610 Nam-ASP-g1 6.g364/2 -401 9 0. 0.81 Nam-ASP 5 9 8 ASP-s0017.g3 6 -412 0.82 29-183 6.g349 8 0.9 0.75 ASP-s0017.g326 1 0. ASP- g359/184-29428-326 0.57 0.38 ASP-s0610.g625/54-217 2/30-18 0.91 9-20 -g16753 /5-160 0.98 ASP-s0610.g626/5-174 -215 ASP-ss0119 6755/6 0.93 0.65 5 ASP 0.49 ASP-s0610.g624/2 57/3

ASP- -s093 0119.g810/32-1.g803 0 ASP-s0610.g624/427-587 9

/133-29-241 . 8 0.92 0.76 2 0.87 1 ASP-s0041.g354/ ASP-s0173.gs0216.g5.g /219- 0.16 2 93 0.97 0.74 ASP-s0522.g288 753/3-6 0 ASP-s0173.g394/311 0.84 389 0.94 0 0.52 ASP-s0256.g3 6 ASP-s0151 2381/21/2 1 0.98 0.78 ASP-s0256.g353/112 ASP-s0173.g393 24-3 02 0.22 0.88 395/2 0.92 0.55 0.76 ASP-s0256.g364/18-1264/32-12 7 ASP-s015 35-40 85 0.85 31-13210 .g2841/19 0.92 1 Nam-ASP-g16 7 ASP-s0151.g2847/272-44 37-40 1 1 218-381 1 Nam-ASP-g02290/14-11 0.81 ASP-s0151.g2844/231-391.g2846/23 4 0.98 ASP-s0477.g2175/8-113714/22-/17-17 /218-381-359 0.85 ASP-s0173 0.83 Nam-ASP-g061 .g1585 0. 4 7 0. ASP-s0041.g359/ 0/14-179 ASP-s0048.g1 2-4032 0.28 7 0.61 ASP-s0071.g526/235-397 ASP-s1162.g .g397/248-410 0.35 0.17 ASP-s0048 ASP 3 0.93 55/85-249 ASP -s00 0.99 ASP-s0048.g1591/37-20456/13-180 647/5-7 7 0.4 ASP-s0048.g157 ASP-s0048.g1-s0048.g1605.g2 0.66 0.97 0.24 3 ASP-s0048.g1576/1-11 695/23 3 ASP-s0272.g948/12-1 0.83 0.72 0.9 7 ASP-s0048.g16 03/20-177 9 ASP-s0272. 49/163-32 0. 0. 1/43-201 -183 4 0.74 ASP-s0048.g16 648/2 0.14 0.73 6 ASP-s02 0.19 0. Nam-ASP-g07115/6-179 14-30 3 8 0.94 g933/206-3 ASP-s0119.g8 -132 5 ASP-s0119.g8 8 0.93 0.87 72.g 2 0.07 ASP-s0119.g81 Nam-ASP- 4 2 9

0.99 . 954/ 9 0.53 0.66 ASP-s0935.g3111/5-17 10-2042 0 0.89 81-23 Nam-ASP-g16Nam-AS 0.99 ASP-s0143.g2413/9-17 g1899425/7-7 5 0.91 642/ -161 8 P-g16122/2-70 0.93 ASP-s0522.g2886/98.g1 Nam-ASP-g16120/37-159 1 0.83 5 0.54 /62-21 /2-102 0.77 0.61 0.99 0.7 9 ASP-s0048.g1641/48-21 Nam- 121/174-3 0. ASP-s004 Nam-ASP- 5.g2671 ASP-g ASP-s0119.g835/8 Nam-ASP- 4 0.99 0.78 1612 2 0.98 0.91 ASP-s050 g1612 0.98 0.94 7/66-178-166 ASP-s0024.g961/243-45/1- 0.66 0.84 0.86 0.81 ASP-s0119.g828/40-197 g17852/147-2213/26-1153 0.74 0.97 9/57 ASP-s0 ASP-s0853.g2693/8-211g339 97 0.37 62. ASP-s0183.g183.g96 0.92 0.52 1 ASP-s0574.g17 16-1173 ASP-s04 0.31 ASP-s01 7567/ 6-18 2/218-3902 0.92 ASP-s0477.g2171/235-77.g 961/250-413 0.93 ASP-s0024.g1089/47-107 8 0.83 0.91 8 ASP-s0 217 0. 0.84 Nam-ASP-g1 0/25 8 0.76 -192 ASP-s 183.g95 7-4 0.76 Nam-ASP-g15064/ 17 0.06 12 ASP-s0477.g2175/ 0.93 Nam-ASP-g18720/14-118 9/251-4392 0.84 4/36-2 0162.g3 0.92 Nam-ASP-g15063/9-18g138 215 ASP-s0009. 157-30406 0.98 Nam-ASP-g16708/14 5 400/74- 0.27 ASP-s00 ASP-s0003. g643/191-343227 0.95 4 ASP 35.g303 0.89 0.97 0.87 ASP-s0042.g548/54- ASP-s0011.g13-s0119 0035.g3030/12-183-552 0/236- ASP-s 9/40 1 .g811/247-33933 0. 0.98 g132 1-45 1 7 0.34 0.88 ASP-s0035.g3034/6-17 ASP-s0074.29/616 0.57 0.99 ASP-s0 -763 0.89 ASP-s0011. 8 074.g90 g904/4-67 0.77 0.86 ASP-s0074.g903/30 5-19 ASP-s0074.g909/18 209/2 32 ASP-s0035 3/500-55 0.37 ASP-s0074.g909/2-13657-2 9 0.66 0.98 1207/ 89 3 0.6 0.8 0.92 ASP-s0010.g1 ASP-s0133.g1795/8-146.g3034/22 8-350 0.5 0.97 -s0010.g 0.93 ASP g1419/15-1 ASP-s0133.g179 1-38 0.87 0.81 0.85 8 1 ASP-s0194. ASP-s0194 0.86 0.84 17/16-161 3/65-228 0.99 0.95 ASP-s0133.g1798/26-1383-20 ASP-s0010.g1207/286-456.g1415 0.75 1 ASP-s0194.g14 1 ASP-s0010. /86-235 1 0.76 0.28 ASP-s0183.g959/3 0 ASP-s019 g1209/252-41 1/59-21 0.76 ASP-s0477.g2170/23-23.g96 4.g14 0.93 2/8-177 ASP-s0574.g1719/26 4 0.71 0.92 ASP-s018 189 3-423 71/17- Nam-ASP-g15 6/62-13 5 1 ASP-s0183.g9677.g21 42 0.9 0.9 Nam-ASP-g17566/13-1189/4-11 1 ASP-s04 39/31-1 Nam-A 3 0.98 0.32 ASP-s0272.g9 SP-g177 4 0.34 /39-202 Nam-ASP-g 18/7-13 3 0.98 ASP-s0847.g2665/69-241272.g933 8 Nam-ASP- 150 8 0.78 1 0.51 ASP-s0 63/228 1 0.94 0046.g1399/9-10 1 g1506 -352 ASP-s Nam-AS 4/23 7 P-g18 5-365 0.81 7 ASP-s0272.g952/11-13 Nam-ASP-g18512/1-110555/ 1 0.8 7 1-64 0.82 ASP-s0173.g393/6-17/5-17 ASP-s0003.g1387/18-81 0. 0.87 0.89 .g394 9 0.83 0.99 1 ASP-s0173 ASP-s0042 .g2847/73-234 0.84 0.96 ASP-s0151 90 .g548/28 0.09 844/30-1 Nam-ASP- 0-434 0.89 s0151.g2 g18281/20-13 0.93 5 ASP- 46/35-193 Nam-A 7 0. ASP-s0151.g28 8 SP-g06164/167 0.93 0.77 -19 Nam-AS -332 .g395/1 126 P-g19399 0.31 0.99 6 ASP-s0173 Nam-ASP /3-112 0.93 0.52 0.3 -g1236 0.91 ASP-s0151.g2841/4-18-194 ASP-s10 7/78-215 0.86 ASP-s0216.g2381/ 21.g34 97/37-202 Nam-ASP-g0 15/13-178 0.87 985 0.96 ASP-s0173.g3 9/34-213 ASP-s0265.g65 0.99 ASP-s0005.g2699/80-1796/29-196 6/80-210 0.71 0071.g52 Nam-ASP-g00629/81- ASP-s -176 243 0.17 0.89 0.85 ASP-s0048.g1648/25 ASP-s0438.g1464/5 1 0.86 0.87 25-126 Nam-ASP-g158 0-210 ASP-s0048.g1649/ 65/168-323 Nam-ASP-g17852/1-98 ASP-s0695.g1604/61- 0.28 1 -98 225 0.97 0.92 0.9 Nam-ASP-g17853/1 ASP-s0018.g3668/28-202 0.56 -177 ASP-s0018.g Nam-ASP-g17085/6 3669/80-226 0.99 0.71 0.92 Nam-ASP-g19062/2-137 ASP-s0967.g3238/4 0.79 0.71 55 1-334 0.9 0.3 0.93 Nam-ASP-g16121/41-1 ASP-s0055.g2562/110-265 0.66 0.95 Nam-ASP-g18170/5-130 Nam-ASP-g05953/26-135 0.99 Nam-ASP-g18225/12-121 ASP-s0016.g3127/475-626 0.57 ASP-s0024.g961/37-190 Nam-ASP-g06919/8-108 ASP-s0069.g403/10-71 ASP-s0092.g2549/6-182 0.97 0.62 Nam-ASP-g07690/6-170 0.98 0.84 ASP-s0020.g50/ Nam-ASP-g12644/3-145 0.92 0.99 2-165 Nam-ASP-6/183-341 0.64 ASP-s0020.g53/3-165 0.06 Nam-ASP-g00357/14-169 Nam-ASP-g14978/12-177 ASP-6C .g2710/7-179 8 0.84 ASP-s0201.g1745/10-174 ASP-s0224 0.75 0. 159 ASP-s0009.g633/16-17 Nam-ASP-g01191/3- 0.91 0.8 5 92/175-289 0.99 ASP-s0713.g ASP-s0365.g35 0.7 0.08 1756/9-171 5 0.72 ASP-s0025 ASP-s0365.g3596/33-11 0.99 .g1267/71-197 3590/245-413 0.17 0.6 ASP-s0009.g633/553 ASP-s0365.g 13 ASP-s018 -706 591/246-4 0.99 0.98 0.62 0.15 0.26 8.g1147/210-365 ASP-s0365.g3 6 0.91 Nam-ASP- 671/257-41 0.95 g19154/1-66 ASP-s0105.g3 1-110 0.75 ASP-s0245.g3568/8 ASP-s0105.g3668/ 0.65 -181 0.93 Nam-ASP-g16749/46 3123/1248-1411 0.25 -224 ASP-s0016.g 0.87 0.93 0.85 ASP-s0328 0.98 0.96 .g2635/ Nam-ASP-g13272/1-1360 0.79 0.92 0.98 0.88 Nam-ASP-1/ 1-101 .g898/22-15 0.28 41-212 ASP-s0633 9 0.88 Nam-ASP-g0 .g890/8-15 0.94 2238/37-1 ASP-s0633 9 0.87 ASP-s0 08 .g903/8-15 0.76 0.33 328.g2633/ ASP-s0633 ASP-1/2 6-17 201 2-182 8 1294/92- 0 0.98 Nam-ASP- Nam-ASP-g1 0.83 0.99 g11963/24-1 ASP-s0183.g948/1-646 ASP-s0001.g198/10-18383 ASP-1N 08137/167-2 0.99 0.18 0.9 ASP-s00 Nam-ASP-g 0.87 50.g2042/ 08136/6-120 0.88 0.63 ASP-s00 3-25 Nam-ASP-g 0.33 50.g2045/61-255 33/275-4117 ASP-s0 ASP-s0067.g 1 050.g205 5 ASP-6/243-4002 ASP-s0 1/3-100 2 0.84 004.g2023/5-1 8 0.4 0.84 ASP-s ASP-s0067.g31/32-2 0.74 0004. 74 -185 0.69 0.84 ASP-s000 g2026/8- ASP-s0067.g56/212-34 0.36 0.86 0.9 4.g202 97 NIF ASP-NIF-A/6 ASP-s0050. 9/17-146 B/17-208 0.97 ASP-s0020.gg2040/4-15 ASP-NIF- 51/9-156 0.54 0.83 0.94 0.42 0.71 0.86 0.7 ASP-s0101. 104/88-22 1 ASP-s0067.gg38/11-128 2 0.3 ASP-s01 g3382/31-1620 ASP-s0067. 0.77 0.93 0.74 Nam-ASP-01.g3381/ ASP-s0067.g58/7-13 ASP-s038 7/35-15439-164 ASP-3B/5-174 0.93 ASP-s0009.g827.g459/3 ASP-3A/23-1885 0.35 0.92 ASP-s0 7-165 3 1 0 9.g461/267-424-16 0.91 0.57 ASP-s0222.g2627.g8158/39-169 ASP-s000 -422 0.86 ASP-3 7 0.91 ASP-s0222. /37-14 ASP-s0009.g459/09.g458/2728-40 0.94 0.57 618/28-1616 3 0.35 ASP-7B/23-15 ASP-s00 .g462/24 0.89 g2616/5-11 0.9 0.54 ASP-7A/34 56/251-41 0.46 2 ASP-s0009 ASP-s000 6 3 0.72 -15 ASP-s0009.g409.g455/268-4234 ASP-s067 6 0.62 8 9.g643

4-51 0 0.68 1 ASP-s00 1 0.94 . ASP-s 2.g1383/18-18/34-13 .g2045/3311-190 6 0 0.99 0053.g2289/69-16 5 ASP-s0050.g2051/148-32 0.9 0.95 ASP-s0 ASP-7 ASP-s0050 0 0.95 1 ASP-s0053.g2074.g875/ 4 7 0.93 0.74 0. 0.8 ASP-s0999. 9 ASP-s0050.g2036/030/87-25 0. 0.42 9 0.25 0.95 8-182 9 0.77 Nam-AS 324/1-181 ASP-s0050.g2040/200-3298 g3346/2 1 ASP-s0225P-g13008/5 ASP-s0004.g2 0.86 0.74 7-31 0.15 ASP-s0045.g1135 9-199 ASP-s0004.g2028/76-281 2 0.59 .g2764 ASP- -152 0.44 /5-178 ASP-s0004.g2023/223-31219/28-19 2 0.99 0.97 0.99 0.94 Nam-ASs0045.g11 ASP-s0328.g2635/14 7-42 /6-201 0.95 0.98 Nam-ASP-g10834/31-214P-g1204540/6-2 Nam-ASP-g0 7-392 0.97 0.82 1 Nam-ASP-g 02 ASP-s0328.g2631/40-205 0.96 Nam-ASP-g /21-190 Nam-ASP-1/25ASP-1/22 0.87 0.42 0.78 Nam-ASP-g05802/0749 229-39393 0.99 7/24 1 0.88 0.43 Nam-A 06266/18-18 .g3568/ 0.91 0.08 0.37 0.93 0.85 -198 Nam-ASP-g01772/76-240 2 1 0.95 Nam-ASP-g10SP-g17849/11-1 .g198/229-3 0.36 -92 0.81 ASP-s0045.g ASP-s0245 0.86 0.91 22-19 9 ASP-1C 0.64 0.85 ASP-s0045.g 6 ASP-s0001 0.79 0.75 850/2 413 ASP 87 0.99 1164/5 3-19 ASP-s0188.g1147/415-581 ASP-s0045.g11-s0346.g 6 1 0.73 0. 0.86 1192/29-143 1 ASP-s0009.g633/756 9 Nam-ASP-g0 -12 2/1-9 0.7 0 ASP-s0713.g1756/268-431 0.34 ASP-s0123.g1110/3142 /47-22 ASP-s0025.g1267/247- 0.86 ASP-s 37/5-199 14-1232 0.25 0.95 1696/6- 0.93 ASP-s0045.g1155 5 Nam-ASP-g12324/131-242Nam-ASP-g1128 76 0610.g628/5-19 Nam-ASP-2/20-129 0.96 ASP- 11 ASP-2/1-15 0.86 5-1934 0.92 0.96 0.92 ASP-4s0004.g176 ASP-s0184.g989/1832/101-2 0.73 ASP-4A/33-2 7 B/34- 2 0.92 ASP- /9-75 0.54 21 5/9-19 7 0.98 ASP-s0013.g1938/2-s0004.g17632 ASP-s0085.g 5 0.72 0.81 0.99 0. 02 ASP-2 ASP-s0184.g986/7-178 9 ASP-s0004. 7 0.85 0.74 ASP-s0004.g ASP-s0114.g465/46-108 0.48 0.99 2994/197-3 9 0.18 ASP-s0177 /25-23 ASP-s0114.g461/25-11 0.91 0.94 g1770/21-177 7 ASP-s0005.g24 0.11 0.85 0.88 180 7 ASP-s0340.g2977/170-334 0.41 ASP-s0087. 1772/1 0.92 .g615/29-265 ASP-s0340.g 0.92 1 0.91 ASP-s0005. ASP-s0181.g872/85-245995/67-22 -220 ASP-4N ASP-s0087.g20 0.84 0.92 14/9-19 ASP-s0114.g459/359-518 4 0.44 ASP-s0087.g203g2032/10-106 0.97 0.75 0.6 ASP-s0 g2416/8-18 ASP-s0036.g3234/202-35 ASP-s00 0 ASP-s0340.g2 3 0.82 ASP-s0050.g2052/2-121 ASP-s0005.g2388/1-074. 34/8-187 /146-308-10 0.81 0.26 0.79 0. ASP-s0 2 ASP-s1019.g3408/8-113 0.71 0.54 74.g904/123g903 3/7-186 0.98 5 0 ASP-s0007.g3493/6-77 6 Nam-AS 0. ASP-s0 005.g /102 0.32 5 ASP-s0341.g3010/15-18 /1-11 0.9 1 0.86 ASP-s03 -249 5 P-g18166/37-1 10 0.8 0.76 ASP-s003 2390 -181 ASP-s0341.g3009 0.47 0 0.99 061.g3273/7-18 ASP-s0080.g1366/2 114 3 0.84 ASP-s0030. /5-18 0.92 56.g3362 ASP-s0335.g2860/7-81 0.45 ASP-s003 1 0.87 ASP-s0808.g2453/1-117 0 1 0.92 0.21 0.98 ASP-s0030 0.g2206/ 8 Nam-ASP-g19417 ASP-s024 /7-180 1 4 ASP-s0030.g2138/151-2630.g2183g2150/6-179 7 Nam-ASP-g14414/11-138 ASP-s003 26-22 0.6 ASP-s0672.g1385/39-ASP-s0074.g877/7-78 0.79 .g2215 0.95 0.96 1 ASP-s048 2.g342 9-183 0.85 /24-180 0 0.69 ASP-s0242.g34 ASP-s0061.g3213/11-12 0.67 ASP /21-19 ASP-s0990.g3311/3-7 -139 0.89 0.99 0.g2143/38-208 0.93 0.95 ASP-s0489.g2364/33- 8/35-1 -s0489.g2361/46-1169.g2365/3 ASP-s0275.g1031/13-140 0.96 ASP-s0061 2 3-109 0.14 0.86 ASP-s0095. 88 ASP-s0050.g2041/1-124 0.99 0.72 0.82 0.85 Nam-ASP-g17291/2-110 0.99 Nam-ASP- 20/48-150 8 0.75 ASP-s01 -152 217 0.74 0.85 0.93 Nam-ASP-g17468/5 0.8 9 0.74 . ASP-s0 .g3212/

ASP-s0230.g2959/1-110 0 0.58 0.75 ASP-s09 g2855/15 -168 0.92 0 ASP-s0018.g3665/37 0.36 ASP-s023 77.gg0950 0.83 0.76 0.92 Nam-AS 006.g30 71-23819

ASP-s0220.g2499/133-289Nam-ASP-g06017/ 1 ASP-s0067.g33 619 8 0.97 0.18 90.g 9/108 5-81 0.96 ASP-s00 -14 ASP-s0036.g3213/37-197 0.93 0.88 0.15 /109 Nam-ASP-g16614/6-182868/273-430 0.98 ASP-6/14 2.g3033/ 5 0.49 0.34 P-g08137/133314/72/18-1 -174 1 0.99 Nam-ASP- 0.36 0.98 -22 ASP-s0335.g2868/60- 0.49 0.96 ASP-s 0 6-11 0.83 67.g29/5- 0 ASP-s0559.g3446/43-161 0.66 1 ASP-s06 0.92 4-201 8 0.36 0.92 ASP-s063 -162 5 ASP-s0559.g3445/31 0.97 0.89 /18-165 6 0.99 ASP-s063 0183.g947/ ASP-s0335.g2ASP-s0559.g3448/5-175 ASP-s01 g15467/1-109 -16 0.88 0.98 ASP-s0335.g2866/ 3 0.91 0.94 0.86 1 0.38 Nam-AS 33.g89 168 2 5 0.98 0. 0.51 Nam-ASP 32-203 0.91 3.g889/5-17 0.74 ASP-s0105.g3666/19 Nam-ASP-g12445/269-430 8 0.89 0.86 3.g902 ASP-s0818.g2514/43-207 0.61 ASP-s0365.g 05.g 5/27-17-17 0.23 0.91 ASP-s03 ASP-s0463.g1918/26-199 1 0.92 P-6/8- ASP-s0372.g147/48-221 58/29-21 0.87 0.25 0.87 ASP-s0365.g359 1 .g278/28-20 0.89 0.96 3671/ ASP-s0365. -g12643/8-173 /18-17 95 ASP-s0298.g1746/34-210 0.99 0.92 0.89 ASP-s0 155 6 1 ASP-s0009.g4 8-17 ASP-s0799.g2411/216-391 /257-43 0.25 0.85 65.g35 ASP-s0443.g1548/28-203 3 ASP-s0009.g458/10-170 4 ASP-s0068.g235/ 0.29 ASP-s 3592/8-7 0. 2 ASP-s0068 0 0.17 ASP-s0009. 009.g461/23-18 8 64-19 0 0.12 0.37 ASP-s0068.g2 0.95 ASP-s00 94/19-1 0.88 g3591/6-171 0.92 ASP-s0520. -165 17 4 0.95 0009 0.37 0.98 0.67 ASP-s 0/8-17 2 ASP-s0445.g1586/129-295 79-751 0.56 0.37 ASP-s0157.g3201/ -16 0. 0. ASP-s006 ASP-s0004.g1980/503-56 0.92 3 0.97 60/5-174 0.99 .g462/ 82 7 4 0.93 9 ASP-s0066. ASP-s0004.g1980/114-283 3 09.g4 7 9 0.34 0.97 0050 g455/2 1 7 ASP-s00 ASP-s0004.g1976 0.9 0. 0.99 0.97 . 0.95 ASP-s0240.g3336/ ASP-s0031.g2326/50-221 0.97 0.91 .g2493/1-1 0 ASP-s0 g2851/8 ASP-6N ASP-s0112.g290/188-362 0. 0.82 5-173 4 7 0.91 Nam-ASP-g0 .g19 56/5 2/96-26 0.88 0.87 ASP-s0008.g207/112-285 1-511 4 9 Nam-ASP- ASP-s0008.g208/ 0.88 6.g371 2-18 0 0.68 Nam-ASP-g 0. 8 . ASP-s0004 -169 0.96 9 66.g 87/4-

ASP-s0031.g2324/74-247 9 0.63 1 0.48 0.91 6 . 0.96 9 g3773/243-407 3 ASP-s0183.g9 351.g32 4-227 6

3 0 0.63 Nam-ASP-g036 ASP-s0031.g2333/127-30 0.01 9 Nam-ASP-g01217/5 377 4/2-146 ASP-s0497 0.93 242-40616 ASP-s0154.g3024/5 0.88 Nam-ASP ASP-s0497.g2498/2 0.72 8 ASP-s0 9 0.84 1 4 ASP-s005 g15942/ 4/23 0.96 0 0.96 8070/1-8 7 0.86 ASP-s0003.g1634/13-15 6 46/186-257 41/76-25 . 0.85 7 5 18016/1-55 0.8 ASP-s1053.g3499/412-579 0.83 9 6 ASP-5B .g169 237-403 ASP-s0118.g747/329-498 0.99 ASP-s0176.g53 0/63-23 . g3545/94-2 ASP-5A 7-401 1 ASP-s0651.g1140/1-11 0 ASP ASP-s0007.g3 0.8 016.g 0.94 1 0.98 ASP-s0007 901/4-16 0.82 -g02238/154-270 47/265 54-153 80 ASP-s 7/1-14 1-152 5 0.8 9 7 ASP-s0131.g1613/34 239-41 0 -s0003.g1626/468-62 00/232-40 8 ASP-s0815.g 3.g23 36/220-292

8 . ASP-s010 ASP-s0002.g1098/167-347 8 0.63 /225 0.94 7 Nam-ASP-g171 0.59 0.96 /211- ASP-s0801.g2418/177-34 0.3 7 3129/1 . 5 5 Nam-ASP-g1801 ASP-s0038.g3566/111-28 Nam-ASP-g1 ASP-s0176.g546/111-193 76-41 -13 0 0.88 -34 0 1976/6-91 Nam-AS 0007.g3490/266-42 -383 ASP-s0245.ASP-s0250.g1 Nam-ASP 96/32 5-164 0.73 Nam-ASP 381 Nam-ASP 9

1 6-199 0.77 0.02 Nam-ASP-g0 ASP-s0005.g2423/54-23 /25-14 9 0 Nam-AS .g3499/16 593-1 9 0.98 1 6

236/30-21 . ASP-s1294.g381 1 Nam-ASP-g 489/232 ASP-s0182.g 0.84 8 Nam-ASP-g013 1.g3352

7 -204 /26-199 19 ASP-s0278.g1169/70-181 . Nam-ASP 19 8 ASP-s0010.g102 41-215 ASP-s0010.g1014/307-473 2498/29-19 9 0 P-g013 ASP-s0182.g9 ASP-s0650.g1130/193-310 768 32 ASP-s0010.g1023/277-441 6 ASP-s00 ASP-s0010.g1026 -g013 0 ASP-s0129.g1493/102-26 -37 5 ASP-s0123.g1148/ ASP-s0010.g1032/190-30 -g01305/23-184 389 55/131- ASP-s0010.g1034/346-51 P-g01367/77-196 -g01308/4-14 ASP-s0129.g1517/ASP-s0176.g534/56-230ASP-s0004.g ASP-s0149.g2708/20-20 02 ASP-s00 -206 -387 Nam-ASP- /305-46 Nam-ASP 1/1-55 ASP-s000 ASP-s0002.g610 62/1-11 8 3634/25- 0 ASP 7/27 ASP-s0371.g125/135-2 0 Nam-A 6.g2892/ 92-251 401 1366/24 09/1-16 .g3635/25- Nam-A Nam-ASP-g19466/2-9 -g1554 0274 6 ASP-s00 ASP-s0118.g745/2ASP-s0068.g 2273/28-144 /74-23 ASP-s 9-184 9 ASP-s0046.g137 ASP-s0046.g13

ASP-s0046.g1398/138 ASP-s0068.g229/28 Nam-A 2 1 ASP-s0005.g2413/9-12 10.g 53 0 6-134 422 ASP-s0 29 ASP-s0068.g226/28-139 ASP-s0177. A ASP 7-448 A 0 ASP-s0004.g1771/1-17 A -s0116.g611/2 778/24-15 3 A

A A A 9 4 2 S 69/268-42 242-342 6 S

6

6 6 S 1

ASP-s0038.g3633 S 1 S S 2/13 844/26-2 S 16.g3 .g1347/209 08 5 9 P

5 6 6 0 ASP-s0019.g3863/2ASP-s0395.g660 8 P SP-g18444/4- /6-11 P

P 102 P /214-27 P P 7/269-3 5 4 SP-g06987 ASP-s0395.g653/26-199 2

1 4 3 0046.g1373/177-342 5/140-302 .g776/ 4 - 7-40 -s0005.g241 06/63-17 4-11 - - -g059 g05957/401-536 - -

3-45 - -

------s 2.g6

s

4 2

4

s s /79-239 5 s SP-g09589/1-116 9

ASP-s033 1 2-29 8 0 2844/4 46/46-20 2 9 46.g1372/267 /479-645 0

9/28-21 A /

0 0 ASP-s0336.g2889/41-226 49.g1663 B 0 005. 8/24 /221-381 9

4

0 1 311-472 2 5 117 ASP-s0038 0

5 0 0 0 / 2

/ ASP-s0038.g .g3088/5-150 2 8 3 7 2.g608/ 3 2 /16-18 3 / 0

3

/

6 04/ .g615 / 0 0 / / 0

.g2323 7 395/1-15 0 4 5 71/75 05/25-138 1 ASP-s0031.g 5 6 ASP-s0468.g2011/133-31 8 9 4 4 5 3 4 ASP-5C ASP-s0012.g1875/23-160 9 5 . /1430 2-414 2 g2389 1 .

1

5

2 g616/2-142 g 1 . . 015.g 5 9 . g /106-19 75-19 0 g g g - 4

-

8

g 6 6 ASP-s0002.g 8 6 2 ASP-s0080 ASP-s0080.g1347/10-151 1.g3408/ 4

4

6.g3072/262-4 . 2764/262-428 1 ASP-s0002.g777/ 1 1 97/242-407 Nam-ASP-g07903/367-5 2

2

g 2 2 2 35-401 3 6

7

5 . 7 ASP-s0043.g 7 3 7 1/238-402 /227-3 g

ASP-s0002 g 4 ASP-s04 g 53.g2287 g 0-91 . 9 7

. ASP-s0043.g843/24-201 525.g292 0

9

7 5 6 0061.g3212/370- 2.g3033/ . . 6

/ -15 6/234-361 142 5 0 6 5 2

0 7 3 7 5 3 3 5 /43-18 0 1

0 /

9

0 0 1 / 9 ASP-s000 5 / 9 / 2 s0007.g3401/38-141 3 2 2 Nam-ASP-g11184/237- ASP-s0535 0 0 0 s - 4 0 ASP-s0213.g23 0 8 -432 8 5

ASP-s0002 - 3 4 9 2 0

ASPR-s0 0 s 4 ASP-s0053 0 0 -30 92 5

- 9 Nam-ASP-g089 - s 7 3

P

s

s s 1

ASP-s 1 - -

-

- - - - 3

P ASP-s010 4 ASP-s000 ASP-s0 S 8

4 ASP-s0101.g3402 3 4 0

P ASP-s00 ASP- P 3 ASP-s0005.g2 P P 1 S

A 5 ASP-s0101.g3406 2 1

S

S 7 ASP-s0005.g24 ASP-s0672.g1383/35 S S

A 2 ASP-s023 1

ASP-s0225.g ASP-s0005.g2399/246-4

A

A

A A ASP-4C

Figure 4 Domain-based phylogeny of ASP and ASPR genes from A. ceylanicum and N. americanus and ASPR genes from other nematodes. The tree shows a maximum-likelihood phylogeny of protein domains rather than full-length proteins at the tips (as ASP genes sometimes encode two or more tandem ASP domains). All ASP domains and most ASPR domains are from A. ceylanicum or N. americanus. Almost all domains from ASPRs fall within a single branch, labeled in blue. ASPR genes are labeled blue (A. ceylanicum), green (N. americanus), purple (Oesophagostomum dentatum) or magenta (Heligmosomoides bakeri). ASP domains from orthologs of known ASP genes are labeled in gold, with their branches. N- and C-terminal domains from two-domain proteins are noted as “N” or “C.” Domains from other, less familiar ASP genes are labeled in gray. Confidence values are given as decimal fractions (Supplementary Fig. 2). Identities of the corresponding genes and domains are given in Supplementary Tables 4, 13 and 14.

Nature Genetics ADVANCE ONLINE PUBLICATION  l e t t e r s also had 92 and 25 ASPR genes, respectively, which were missing with the start of parasitic feeding, molting into L4 larvae and overt entirely from the other species. One explanation for this diversity sexual differentiation (Fig. 1)6,10. We also observed a new protein is the ‘gray pawn’ hypothesis: members of a large gene family might family upregulated at this stage, with homologs in the strongylids have little individual effect on phenotypic fitness yet be collectively A. ceylanicum, N. americanus, H. contortus and Angiostrongylus needed for robust fitness under variable conditions47. For parasites, ­cantonensis (Supplementary Fig. 3 and Supplementary Tables 4, 12b a relevant variable condition might be diverse host immune systems, and 15); the corresponding genes in A. cantonensis are expressed in which might favor continually diversifying sequences and expression L4 larvae infecting brain tissue48. We thus named this family strong- profiles of ASPs and ASPRs. ylid L4 proteins (SL4Ps). In A. ceylanicum, 24 SL4P genes encoded For development from 24 h to 5 d after infection (24.PI to 5.D), genes proteins of ~200 residues, of which 21 were predicted to be non- encoding structural components of cuticle and genes whose products ­classically secreted49 without a leader sequence (Supplementary bind cytoskeletal proteins such as actin were prominently upregulated Table 16); notably, parasitic nematodes often use non-classical rather (Supplementary Table 11e). This period in the life cycle corresponds than classical secretion to export proteins into their hosts50.

Haemonchus expansion

Hookworm expansion

Non-parasitic nematodes

Hookworm expansion

Haemonchus expansion

Figure 5 Phylogeny of SCVPs from A. ceylanicum and other nematodes. A maximum-likelihood phylogeny of SCVPs (Supplementary Fig. 5 and Supplementary Tables 4 and 18) is shown. Species are indicated by color: the hookworms A. ceylanicum and N. americanus are shown in green and olive green, respectively; H. contortus is shown in orange; the free-living Caenorhabditis nematodes (C. elegans and C. briggsae) and P. pacificus are shown in blue and light blue; and H. bacteriophora, an insect parasite, is shown in purple. Confidence values are given as decimal fractions (Supplementary Fig. 5b). The SCVP phylogeny falls into five branches: two large, independent gene expansions in hookworms (green); two more branches in H. contortus (orange); and one small branch for non-parasitic nematodes (blue). Like ASPs, SCVPs appear to have existed as a small gene family in free-living nematodes but then to have expanded greatly in both hookworms and other mammalian parasites.

 aDVANCE ONLINE PUBLICATION Nature Genetics l e t t e r s

Table 1 summary of predicted drug targets in A. ceylanicum Protein class A. ceylanicum genes Key C. elegans genes Drug data 4-coumarate:coenzyme A ligase, class I 10 acs-10 NA Ammonium/urea transporter 5 amt-2 NA Cofactor-independent phosphoglycerate mutase 1 ipgm-1 Limited druggability Fumarate reductase 1 F48E8.3 NA Glutamate-gated chloride channel 10 avr-14, avr-15, glc-2 avr-14 observed Glutamate synthase 1 W07E11.1 NA Glutamine-fructose 6-phosphate aminotransferase 3 gfat-1, gfat-2 NA Isocitrate lyase/malate synthase 2 icl-1 NA KH-domain RNA binding 5 asd-2, gld-1, K07H8.9 NA Malate/l-lactate dehydrogenase, YlbC type 4 F36A2.3 NA NADH:flavin oxidoreductase, Oye2/3 type 14 F17A9.4 NA Nematode prostaglandin F synthase 3 C35D10.6 NA O-acetylserine sulfhydrylase 2 cysl-1 NA Secreted lipase 6 lips-8, lips-9 NA Trehalose-6-phosphate synthase 5 gob-1, tps-1, tps-2 gob-1 predicted Predicted drug targets, encoded by 72 genes in A. ceylanicum (Supplementary Table 4), are listed by their protein class. For each class, the number of A. ceylanicum genes encoding it is listed, along with C. elegans homologs that have mutant or RNA interference (RNAi) phenotypes and data indicating whether the drug target is likely to work. avr-14 was recently shown to be a drug target of nitazoxanide67; ipgm-1, previously detected as a promising target, was found to encode a poorly druggable protein68; and gob-1 encodes trehalose-6-phosphatase, a predicted drug target in H. contortus20. “NA” indicates protein classes for which we are not aware of pertinent data. References for all drug target classes (and their drug data, if any) are given in Supplementary Table 19.

From 5 to 12 d after infection (5.D to 12.D), genes encoding parasites but absent from mammals, might be essential for survival protein tyrosine phosphatases, serine/threonine kinases and C-lectins in the host (determined on the basis of C. elegans loss-of-function were prominently upregulated (Supplementary Tables 11g and 12c). phenotypes), had homologs with known three-dimensional protein This period in the life cycle corresponds with maturation from late-L4 structures and had at least one homolog bound by a known small larvae to young adults with incipient fertility and the onset of heavy molecule (Supplementary Fig. 6). This screen yielded 72 genes in blood feeding, which exposes A. ceylanicum to the host’s immune A. ceylanicum, one of which (Acey_s0015.g2804) encoded trehalose- system (Fig. 1)10,11. Among 22 C-lectin genes upregulated by 12 d, 6-phosphatase (Table 1 and Supplementary Tables 4, 19 and 20). we detected 6 whose products had greater apparent similarity to mam- Vaccine targets should be both immunologically accessible and malian than to nematode lectins (Supplementary Tables 4 and 17). crucial for survival. Proteases meet these requirements, as they are Two of these genes encoded structural mimics of mammalian expressed in the intestine (and thus exposed to the host’s immune mannose receptor51, with five tandem C-lectin domains that had system) and because, without them, hookworms cannot digest host arisen through intragenic duplication (Supplementary Fig. 4a). The proteins such as hemoglobin36. We thus selected genes encoding pro- other four C-lectin genes resembled mammalian asialoglycoprotein teases that were permanently upregulated by 5 d after infection and receptors and neurocans51 but arose phylogenetically from nematode that lacked mammalian orthologs but had H. contortus homologs lectins (Supplementary Fig. 4b,c). Lectin genes with similarities to that are also upregulated during infection21. This screen yielded mammalian rather than nematode lectins have also been observed in 12 cathepsin B–like protease genes, with 4 orthologs in H. contortus; the parasitic nematodes Ascaris suum and and might by 19 d after infection, 5 of these 12 genes generated 1% of all tran- help suppress host immune responses52,53. scripts (Supplementary Table 4). Because protease inhibitors were We also observed a previously undescribed gene family upregulated also upregulated during early infection, we searched for ones meeting at 12 d after infection, with members not only in strongylid parasites our criteria; this screen yielded a previously undescribed protease (A. ceylanicum, N. americanus, H. contortus and Heterorhabditis bacte- inhibitor predicted to be a 79-residue secreted protein with con- riophora) but also in related non-parasitic clade V species (C. elegans, sistently strong expression (~0.1% of all adult transcripts) and one Caenorhabditis briggsae and P. pacificus; Fig. 5, Supplementary Fig. 5 H. contortus homolog upregulated during infection. and Supplementary Tables 12c and 18). We thus named this family The sequencing of A. ceylanicum adds to a growing number of secreted clade V proteins (SCVPs). In A. ceylanicum, 53 SCVP genes genomes for parasitic nematodes that, collectively, infect over encoded ~150-residue proteins, of which 48 were predicted to be clas- 1 billion humans60. Practically, these genomes will be crucial for sically secreted (Supplementary Table 4). Whereas N. americanus and inventing new drugs and vaccines against nematodes that rapidly H. contortus had 11 to 101 SCVP genes, other nematodes had only 1 evolve drug resistance61 and that have been parasitizing vertebrates to 6, suggesting an expansion of SCVP genes in mammalian-parasitic since the Cretaceous62. Understanding immunosuppression by nematodes analogous to those observed for ASP and ASPR genes. parasitic nematodes might also help alleviate autoimmune disorders, A key motivation for parasite genomics is to identify targets for which may be partly due to improved hygiene ridding humans of drugs or vaccines. Because drug development often fails54, it is chronic worm infections63. Intellectually, understanding these genomes essential to identify as many targets as possible. Four drug targets may illuminate remarkable evolutionary changes. allows (adenylosuccinate lyase, carnitine O-palmitoyltransferase, dTDP-4- adult nematodes to grow larger and live longer than their free-living dehydrorhamnose 3,5-epimerase and trehalose-6-phosphatase) have relatives (N. americanus adults are ~1 cm long and live for 3–10 years, recently been identified in H. contortus and N. americanus20,24,55–59. whereas C. elegans adults are ~1 mm long and live for 3 weeks), but All four are encoded by genes with A. ceylanicum orthologs the genomic changes underlying these adaptations are essentially (Supplementary Table 4). To identify additional drug targets across unknown1,64–66. The genome and transcriptome of A. ceylanicum the genome, we searched for genes that were conserved by diverse should provide lasting benefits for biology and medicine.

Nature Genetics ADVANCE ONLINE PUBLICATION  l e t t e r s

URLs. FigTree, http://tree.bio.ed.ac.uk/software/figtree/; Gene 4. Keiser, J. & Utzinger, J. The drugs we have and the drugs we need against major helminth infections. Adv. Parasitol. 73, 197–230 (2010). Ontology term tables, http://archive.geneontology.org/full/; 5. Schneider, B. et al. A history of hookworm vaccine development. Hum. Vaccin. modENCODE, http://www.modencode.org/; NCoils, http://www. 7, 1234–1244 (2011). russell.embl-heidelberg.de/coils/coils.tar.gz; protocols by S. Kumar 6. Garside, P. & Behnke, J.M. Ancylostoma ceylanicum in the hamster: observations on the host-parasite relationship during primary infection. 98, for running Blast2GO, InterProScan and MAKER2, https://github. 283–289 (1989). com/sujaikumar/assemblage/blob/master/README-annotation.md; 7. Traub, R.J. Ancylostoma ceylanicum, a re-emerging but neglected parasitic RepBase, http://www.girinst.org/server/RepBase/protected/ . Int. J. Parasitol. 43, 1009–1015 (2013). 8. Cantacessi, C. et al. A portrait of the “SCP/TAPS” proteins of eukaryotes— RepBase19.02.fasta.tar.gz. developing a framework for fundamental research and biotechnological outcomes. Biotechnol. Adv. 27, 376–388 (2009). 9. Jian, X. et al. : maintenance through one hundred generations Methods in golden hamsters (Mesocricetus auratus). II. Morphological development of the Methods and any associated references are available in the online adult and its comparison with humans. Exp. Parasitol. 105, 192–200 (2003). version of the paper. 10. Ray, D.K., Bhopale, K.K. & Shrivastava, V.B. Migration and growth of Ancylostoma ceylanicum in golden hamsters Mesocricetus auratus. J. Helminthol. 46, 357–362 (1972). Accession codes. All assemblies and raw reads have been 11. Menon, S. & Bhopale, M.K. Ancylostoma ceylanicum (Looss, 1911) in golden deposited at NCBI. The corresponding NCBI accession numbers hamsters (Mesocricetus auratus): pathogenicity and humoral immune response to a primary infection. J. Helminthol. 59, 143–146 (1985). are as follows: A. ceylanicum genome assembly, JARK00000000; 12. Blaxter, M. Nematodes: the worm and its relatives. PLoS Biol. 9, e1001050 A. ceylanicum cDNA assembly, GASF00000000; A. ceylanicum (2011). 13. van Megen, H. et al. A phylogenetic tree of nematodes based on about 1200 genomic DNA reads, SRR1124846 and SRR1124848; A. ceylani- full-length small subunit ribosomal DNA sequences. Nematology 11, 927–950 cum RNA-seq reads, SRR1124849, SRR1124850, SRR1124900, (2009). SRR1124905, SRR1124906, SRR1124907, SRR1124908, SRR1124909, 14. Kiontke, K.C. et al. A phylogeny and molecular barcodes for Caenorhabditis, with numerous new species from rotting fruits. BMC Evol. Biol. 11, 339 (2011). SRR1124910, SRR1124911, SRR1124912, SRR1124913, SRR1124914, 15. Chilton, N.B., Huby-Chilton, F., Gasser, R.B. & Beveridge, I. The evolutionary SRR1124985 and SRR1124986; C. elegans RNA-seq reads, SRR1125007 origins of nematodes within the order are related to predilection sites within hosts. Mol. Phylogenet. Evol. 40, 118–128 (2006). and SRR1125008. In addition, we have deposited the genome sequence 16. Kaplan, R.M. Drug resistance in nematodes of veterinary importance: a status and gene predictions at WormBase (in archival release WS245). report. Trends Parasitol. 20, 477–481 (2004). 17. Mortazavi, A. et al. Scaffolding a Caenorhabditis nematode genome with RNA-seq. Note: Any Supplementary Information and Source Data files are available in the Genome Res. 20, 1740–1747 (2010). online version of the paper. 18. Zerbino, D.R. & Birney, E. Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res. 18, 821–829 (2008). Acknowledgments 19. Abubucker, S. et al. The canine hookworm genome: analysis and classification of survey sequences. 157, 187–192 We thank C.T. Brown, P.K. Korhonen, S. Kumar and J.E. Stajich for advice on Ancylostoma caninum Mol. Biochem. Parasitol. (2008). bioinformatics, T.A. Aydin, J. Liu and A.H.K. Roeder for comments on the 20. Laing, R. et al. The genome and transcriptome of Haemonchus contortus, a key manuscript, L. Schaeffer and V. Kumar for genome and RNA sequencing, and model parasite for drug and vaccine discovery. Genome Biol. 14, R88 (2013). C.T. Brown for use of the Michigan State University High-Performance Computing 21. Schwarz, E.M. et al. The genome and developmental transcriptome of the Center (supported by grant 2010-65205-20361 from the US Department of strongylid nematode Haemonchus contortus. Genome Biol. 14, R89 (2013). Agriculture and grant IOS-0923812 from the National Institute of Food and 22. Stein, L.D. et al. The genome sequence of Caenorhabditis briggsae: a platform Agriculture and the National Science Foundation). for comparative genomics. PLoS Biol. 1, E45 (2003). 23. Dieterich, C. et al. The Pristionchus pacificus genome provides a unique AUTHOR CONTRIBUTIONS perspective on nematode lifestyle and parasitism. Nat. Genet. 40, 1193–1198 (2008). E.M.S., P.W.S. and R.V.A. conceived of and managed the project. Y.H. isolated 24. Tang, Y.T. et al. Genome of the human hookworm Necator americanus. Nat. Genet. genomic DNA and RNA from all infection stages of A. ceylanicum and converted 46, 261–269 (2014). RNA to cDNA for sequencing. M.M.M. maintained A. ceylanicum and golden 25. Stanke, M., Diekhans, M., Baertsch, R. & Haussler, D. Using native and hamster cultures through full life cycles and performed the photography for syntenically mapped cDNA alignments to improve de novo gene finding. Figure 1. I.A. constructed large-insert paired-end and jumping Illumina Bioinformatics 24, 637–644 (2008). libraries and supervised both genomic and RNA-seq Illumina sequencing. 26. Raffaele, S. & Kamoun, S. Genome evolution in filamentous plant pathogens: E.M.S. conducted all bioinformatics and biological analyses. Writing was why bigger can be better. Nat. Rev. Microbiol. 10, 417–430 (2012). primarily carried out by E.M.S. but with input from all authors. 27. Danchin, E.G. & Rosso, M.N. Lateral gene transfers have polished animal genomes: lessons from nematodes. Front. Cell. Infect. Microbiol. 2, 27 (2012). 28. Wu, B. et al. Interdomain lateral gene transfer of an essential ferrochelatase gene COMPETING FINANCIAL INTERESTS in human parasitic nematodes. Proc. Natl. Acad. Sci. USA 110, 7748–7753 The authors declare competing financial interests: details are available in the online (2013). version of the paper. 29. Uehara, T. & Park, J.T. An anhydro-N-acetylmuramyl-l-alanine amidase with broad specificity tethered to the outer membrane of Escherichia coli. J. Bacteriol. 189, Reprints and permissions information is available online at http://www.nature.com/ 5634–5641 (2007). reprints/index.html. 30. Nikoh, N. et al. Bacterial genes in the aphid genome: absence of functional gene transfer from Buchnera to its host. PLoS Genet. 6, e1000827 (2010). 31. Husnik, F. et al. Horizontal gene transfer from diverse bacteria to an insect This work is licensed under a Creative Commons Attribution- genome enables a tripartite nested mealybug symbiosis. Cell 153, 1567–1578 ­NonCommercial-ShareAlike 3.0 Unported License. The images or other (2013). third party material in this article are included in the article’s Creative 32. Wang, Z. et al. Characterizing Ancylostoma caninum transcriptome and exploring Commons license, unless indicated otherwise in the credit line; if the material is not included nematode parasitic adaptation. BMC Genomics 11, 307 (2010). under the Creative Commons license, users will need to obtain permission from the license 33. Götz, S. et al. High-throughput functional annotation and data mining with the holder to reproduce the material. To view a copy of this license, visit http://creativecommons. Blast2GO suite. Nucleic Acids Res. 36, 3420–3435 (2008). org/licenses/by-nc-sa/3.0/. 34. Prüfer, K. et al. FUNC: a package for detecting significant associations between gene sets and ontological annotations. BMC Bioinformatics 8, 41 (2007). 1. Brooker, S., Bethony, J. & Hotez, P.J. Human hookworm infection in the 21st 35. Bansemir, A.D. & Sukhdeo, M.V. The food resource of adult Heligmosomoides century. Adv. Parasitol. 58, 197–288 (2004). polygyrus in the small intestine. J. Parasitol. 80, 24–28 (1994). 2. Vos, T. et al. Years lived with disability (YLDs) for 1160 sequelae of 289 diseases 36. Ranjit, N. et al. Proteolytic degradation of hemoglobin in the intestine of the and injuries 1990–2010: a systematic analysis for the Global Burden of Disease human hookworm Necator americanus. J. Infect. Dis. 199, 904–912 (2009). Study 2010. Lancet 380, 2163–2196 (2012). 37. Knox, D. in Parasitic Helminths: Targets, Screens, Drugs, and Vaccines 3. Pullan, R.L., Smith, J.L., Jasrasaria, R. & Brooker, S.J. Global numbers of (ed. Caffrey, C.R.) 399–420 (Wiley-VCH Verlag & Co., 2012). infection and disease burden of soil transmitted helminth infections in 2010. 38. Pearson, M.S. et al. Molecular mechanisms of hookworm disease: stealth, Parasit. Vectors 7, 37 (2014). virulence, and vaccines. J. Allergy Clin. Immunol. 130, 13–21 (2012).

 aDVANCE ONLINE PUBLICATION Nature Genetics l e t t e r s

39. Klotz, C. et al. A helminth immunomodulator exploits host signaling events to 54. Scannell, J.W., Blanckley, A., Boldon, H. & Warrington, B. Diagnosing the decline regulate cytokine production in macrophages. PLoS Pathog. 7, e1001248 in pharmaceutical R&D efficiency. Nat. Rev. Drug Discov. 11, 191–200 (2011). (2012). 40. Hartmann, S. & Lucius, R. Modulation of host immune responses by nematode 55. Taylor, C.M. et al. Discovery of anthelmintic drug targets and drugs using cystatins. Int. J. Parasitol. 33, 1291–1302 (2003). chokepoints in nematode metabolic pathways. PLoS Pathog. 9, e1003505 41. Manoury, B., Gregory, W.F., Maizels, R.M. & Watts, C. Bm-CPI-2, a cystatin (2013). homolog secreted by the filarial parasite , inhibits class II MHC– 56. Fyfe, P.K., Dawson, A., Hutchison, M.T., Cameron, S. & Hunter, W.N. Structure restricted antigen processing. Curr. Biol. 11, 447–451 (2001). of Staphylococcus aureus adenylosuccinate lyase (PurB) and assessment of its 42. Gerstein, M.B. et al. Integrative analysis of the Caenorhabditis elegans genome potential as a target for structure-based inhibitor discovery. Acta Crystallogr. by the modENCODE project. Science 330, 1775–1787 (2010). D Biol. Crystallogr. 66, 881–888 (2010). 43. Kim, D., Grun, D. & van Oudenaarden, A. Dampening of expression oscillations 57. Ashrafian, H., Horowitz, J.D. & Frenneaux, M.P. Perhexiline. Cardiovasc. Drug by synchronous regulation of a microRNA and its target. Nat. Genet. 45, Rev. 25, 76–97 (2007). 1337–1344 (2013). 58. Sivendran, S. et al. Identification of triazinoindol-benzimidazolones as nanomolar 44. Choi, Y.J. et al. A deep sequencing approach to comparatively analyze the inhibitors of the Mycobacterium tuberculosis enzyme TDP-6-deoxy-d-xylo-4- transcriptome of lifecycle stages of the filarial worm, Brugia malayi. PLoS Negl. hexopyranosid-4-ulose 3,5-epimerase (RmlC). Bioorg. Med. Chem. 18, 896–908 Trop. Dis. 5, e1409 (2011). (2010). 45. Osman, A. et al. Hookworm SCP/TAPS protein structure—a key to understanding 59. Farelli, J.D. et al. Structure of the trehalose-6-phosphate phosphatase from Brugia host-parasite interactions and developing new interventions. Biotechnol. Adv. 30, malayi reveals key design principles for anthelmintic drugs. PLoS Pathog. 10, 652–657 (2012). e1004245 (2014). 46. Hewitson, J.P. et al. Proteomic analysis of secretory products from the model 60. Zarowiecki, M. & Berriman, M. What helminth genomes have taught us about gastrointestinal nematode Heligmosomoides polygyrus reveals dominance of parasite evolution. Parasitology 42 (suppl. 1), S85–S97 (2015). venom allergen-like (VAL) proteins. J. Proteomics 74, 1573–1594 (2011). 61. Gilleard, J.S. Haemonchus contortus as a paradigm and model to study 47. Thomas, J.H. & Robertson, H.M. The Caenorhabditis chemoreceptor gene families. anthelmintic drug resistance. Parasitology 140, 1506–1522 (2013). BMC Biol. 6, 42 (2008). 62. Durette-Desset, M.C., Beveridge, I. & Spratt, D.M. The origins and evolutionary 48. He, H. et al. Preliminary molecular characterization of the human pathogen expansion of the strongylida (Nematoda). Int. J. Parasitol. 24, 1139–1165 Angiostrongylus cantonensis. BMC Mol. Biol. 10, 97 (2009). (1994). 49. Bendtsen, J.D., Jensen, L.J., Blom, N., Von Heijne, G. & Brunak, S. Feature-based 63. McSorley, H.J. & Maizels, R.M. Helminth infections and host immune regulation. prediction of non-classical and leaderless protein secretion. Protein Eng. Des. Clin. Microbiol. Rev. 25, 585–608 (2012). Sel. 17, 349–356 (2004). 64. Yeates, G.W. & Boag, B. Female size shows similar trends in all clades of the 50. Borloo, J. et al. In-depth proteomic and glycomic analysis of the adult-stage Cooperia phylum Nematoda. Nematology 8, 111–127 (2006). oncophora excretome/secretome. J. Proteome Res. 12, 3900–3911 (2013). 65. Gems, D. Longevity and ageing in parasitic and free-living nematodes. 51. Zelensky, A.N. & Gready, J.E. The C-type lectin–like domain superfamily. Biogerontology 1, 289–307 (2000). FEBS J. 272, 6179–6217 (2005). 66. Desjardins, C.A. et al. Genomics of , a Wolbachia-free filarial parasite of 52. Yoshida, A., Nagayasu, E., Horii, Y. & Maruyama, H. A novel C-type lectin identified humans. Nat. Genet. 45, 495–500 (2013). by EST analysis in tissue migratory larvae of Ascaris suum. Parasitol. Res. 110, 67. Somvanshi, V.S., Ellis, B.L., Hu, Y. & Aroian, R.V. Nitazoxanide: nematicidal mode of 1583–1586 (2012). action and drug combination studies. Mol. Biochem. Parasitol. 193, 1–8 (2014). 53. Loukas, A., Doedens, A., Hintz, M. & Maizels, R.M. Identification of a new C-type 68. Crowther, G.J. et al. Cofactor-independent phosphoglycerate mutase from lectin, TES-70, secreted by infective larvae of Toxocara canis, which binds to nematodes has limited druggability, as revealed by two high-throughput screens. host ligands. Parasitology 121, 545–554 (2000). PLoS Negl. Trop. Dis. 8, e2628 (2014).

Nature Genetics ADVANCE ONLINE PUBLICATION  ONLINE METHODS Assessing the completeness of genomic DNA. We estimated the assembly’s General summary. Culture and infection of A. ceylanicum in golden hamster completeness as 98% by computing the frequencies of 31-mers71, as 91–99% (Mesocricetus auratus) were carried out as described69. All housing and care by searching for conserved eukaryotic genes (Supplementary Table 3)76 and of laboratory used in this study conformed with the US National as 93% by mapping cDNA (assembled independently from RNA-seq reads) Institutes of Health Guide for the Care and Use of Laboratory Animals in to genomic DNA: these calculations supported a consensus value of 95%. Research (see 18-F22) and all requirements and all regulations issued by the The average number of orthologs observed for full-length core eukaryo- US Department of Agriculture (USDA), including regulations implementing tic genes76 was 1.13, which matched averages of 1.11–1.15 in C. elegans, the Animal Welfare Act (Public Law 89-544, US Statutes at Large) as amended C. briggsae and Caenorhabditis tropicalis (all of which are hermaphrodites and (see 18-F23). Stages of A. ceylanicum selected for developmental RNA-seq are thus are completely homozygous), suggesting that the assembly was largely shown in Figure 1 and listed in Supplementary Table 6; they are based on free of unresolved heterozygosity. We searched the genome for tRNA genes previously described stages of growth in golden hamster10. with tRNAscan-SE-1.3.1 (ref. 96); this analysis detected a full complement of Genomic sequencing and RNA-seq were carried out largely as described70. 426 tRNAs decoding all 20 standard amino acids and one selenocysteine tRNA The numbers of A. ceylanicum and hamsters used for A. ceylanicum RNA-seq (Supplementary Table 24). are listed in Supplementary Table 21. The A. ceylanicum genomic sequence was assembled from paired Illumina 100-nt reads (550 nt and 6 kb apart) with Examining repetitive elements for possible horizontal gene transfer. In Velvet (1.2.05)18, gaps were closed after assembly with BGI GapCloser 1.12 A. caninum, the repetitive element bandit resembles the HSMAR1 mariner-like (release_2011)71 and the sequence was reduced in possible heterozygosity72 with transposon of humans and has been postulated to arise from a mammalian HaploMerger (20111230)73. Genomic RNA scaffolding was performed by filter- host by HGT97. To determine whether a bandit homolog also existed in A. cey- ing RNA-seq reads with khmer74 and then scaffolding with ERANGE (3.2)17. lanicum, we searched our library of A. ceylanicum repetitive elements with the RNA-seq reads were assembled into cDNA with Oases (0.2.07)75. Assembled DNA sequence for bandit via BLASTN (2.2.26+)90 (arguments: “-task blastn cDNAs (Supplementary Table 2) were used both to assess genome completeness -evalue 1e-03”). This analysis yielded two hits, with E values of 0.0 and 7 × 10−170 and to aid in the prediction of protein-coding genes. The true genomic size of (Supplementary Table 25). Phylogenetic analysis (Supplementary Fig. 7) and A. ceylanicum was estimated by counting 31-mer frequencies with SOAPdenovo domain analysis with HMMER/hmmsearch indicated that the higher-scoring (V1.05)71, by CEGMA (v2.4.010312) (Supplementary Table 3)76 and by map- hit represented an A. ceylanicum homolog of bandit, whereas the lower-scoring ping cDNAs to genomic DNA with BLAT (v. 34)77. Repetitive DNA elements in hit represented a partial homolog of bandit that did not encode a transposase the final genome assembly were identified with RepeatScout (1.0.5)78. domain (Transposase_1/PF01359.13 in Pfam). We predicted protein-coding genes for our final genomic assembly with To examine whether more evidence for lateral acquisition of repetitive ele- AUGUSTUS (2.6.1)25, after generating species-specific parameters with one ments existed in human hookworms, we used the DFAM database98 to identify round of MAKER2 (2.26-beta)79 (see URLs for the protocol by S. Kumar for repetitive DNA elements in A. ceylanicum and N. americanus with similarity to running MAKER2) and using hints from cDNA that had been mapped to the human repetitive elements. This analysis identified two classes of elements with genome assembly with BLAT. For predicted A. ceylanicum proteins, we annotated mammalian similarities, L3/Plat_L3-like retrotransposons and HSMAR1/2- signal and transmembrane sequences with Phobius80, low-complexity regions like mariner elements (Supplementary Table 25). To determine whether these with SEG81, coiled-coil domains with NCoils82, Pfam 26.0 domains (from both similarities were adventitious or real, we computed maximum-likelihood Pfam-A and Pfam-B)83 with HMMER 3.0/hmmsearch84, InterPro domains with phylogenies for reverse-transcriptase domains (for L3/Plat_L3-like elements) InterProScan (4.8)85 and GO terms with Blast2GO 2.5 (build 23092011)33 (see and transposase domains (for HSMAR1/2-like elements). These phylogenies URLs for protocols for running Blast2GO and InterProScan). We also assigned included all of the L3/Plat_L3-like and HSMAR1/2-like repetitive elements that GO terms to C. elegans and H. contortus genes with Blast2GO so that comparisons we could detect in A. ceylanicum and N. americanus, in a diverse set of other of GO terms between different nematode species would be based on equivalent published genome sequences from nematodes, vertebrates, arthropods, lopho- GO term assignments. We performed InterProScan and Blast2GO for both trochozoans and deuterostomes (Supplementary Table 26) and in a curated A. ceylanicum and C. elegans. We computed orthologies with OrthoMCL (1.3)86. collection of eukaryotic elements from RepBase99 (see URLs for source). We Strict orthologies between genes from two or more species were defined as those extracted well-aligned, full-length protein domains from repetitive elements by orthology groups that contained only one predicted gene for each of the species. requiring that they match the Pfam domains Transposase_1/PF01359.13 (for Annotations for protein-coding genes are listed in Supplementary Table 4. HSMAR1/2-like elements) or RVT_1 (reverse transcriptase)/PF00078.22 (for For RNA-seq analysis of C. elegans, we used published developmental L3/Plat_L3-like elements) and also by excluding the shortest 10% of domain data from the modENCODE consortium (Supplementary Table 22)42. For matches. These criteria led us to select 988 Plat_L3/L3-like RVT_1/PF00078.22 H. contortus, we used published developmental RNA-seq data21. peptides (Supplementary Table 27a) and 168 HSMAR1/2-like Transposase_ We mapped RNA-seq reads to genes with Bowtie 2 (ref. 87) and quantified 1/PF01359.13 peptides (Supplementary Table 27b), which we subjected to gene expression with RSEM (1.2.0)88. For individual genes, we computed multiple-sequence alignment and phylogenetic analysis. the significance for changes in gene activity between stages or biological conditions (Supplementary Tables 7 and 23) with NOISeq-sim (2.13)89. Analyzing protein-coding genes. For motif searches or OrthoMCL analyses Because we had only one biological replicate per condition, we sampled five of protein sequences, we used nematode and mammalian proteomes from random subsets of RNA-seq data per condition to estimate the significance genomic sequences and partial nematode proteomes from translated ESTs. All of changes in gene activity. For A. ceylanicum, C. elegans and H. contortus, we proteomes and their sources are listed in Supplementary Table 28. We classi- used FUNC 0.4.5 with Wilcoxon rank-sum statistics34 to compute which GO fied A. ceylanicum, H. contortus and C. elegans genes both by known protein terms were significantly associated with genes up- or downregulated between motifs (through HMMER 3.0/Pfam-A 26 and InterProScan 4.8)83–85 and evo- developmental stages or environmental conditions (for example, changes lutionary relationships to genes in different species (through OrthoMCL 1.3)86. of drug treatment). For A. ceylanicum, we also used rank-sum statistics to Pfam-A domains were detected at a threshold of E ≤ 1 × 10−5; InterProScan and compute such associations for protein families. OrthoMCL were run with default parameters. We used Pfam-A and InterPro For phylogenetic analyses, sequences homologous to a protein or motifs, in turn, to assign GO terms to each gene with Blast2GO 2.5 (build single domain were extracted with psi-BLAST90 or HMMER/jackhmmer. 23092011)33. We performed InterProScan and Blast2GO according to avail- Protein sequences were aligned with MUSCLE (3.8.31)91 or MAFFT able protocols (see URLs); for Blast2GO, we used both InterProScan predic- (v7.158b)92; alignments were edited with Trimal (v1.4.rev15)93 and visual- tions and BLASTP results against an animal-specific subset of the NCBI nr ized with JalView (2.8)94. Protein maximum-likelihood phylogenies and their database (NCBI-nr)100. We computed orthology groups for our A. ceylanicum branch confidence levels were computed with FastTree (2.1.7)95 and visualized genes with OrthoMCL (1.3)86, for numbers of species ranging from 4 to 14 with FigTree 1.4 (see URLs). (Supplementary Tables 4 and 28). Strict orthologies between genes of two or Some details of these methods are provided below; considerably more more species were defined as those orthology groups that contained only one extensive details are provided in the Supplementary Note. predicted gene for each of those species (Fig. 2). Strict orthologies allowed

Nature Genetics doi:10.1038/ng.3237 us to compare transcriptional profiles between A. ceylanicum and C. elegans 69. Hu, Y. et al. Mechanistic and single-dose in vivo therapeutic studies of Cry5B and to thereby identify a set of 406 A. ceylanicum genes that were strongly anthelmintic action against hookworms. PLoS Negl. Trop. Dis. 6, e1900 (2012). 70. Srinivasan, J. et al. The draft genome and transcriptome of Panagrellus redivivus expressed under all conditions for which we had RNA-seq data from either are shaped by the harsh demands of a free-living lifestyle. Genetics 193, A. ceylanicum or C. elegans. 1279–1295 (2013). 71. Li, R. et al. De novo assembly of human genomes with massively parallel short Searching for horizontal gene transfer of protein-coding genes. To find read sequencing. Genome Res. 20, 265–272 (2010). 72. Barrière, A. et al. Detecting heterozygosity in shotgun genome assemblies: lessons possible cases of HGT of protein-coding genes from non-nematodes to from obligately outcrossing nematodes. Genome Res. 19, 470–480 (2009). A. ceylanicum, we used both orthologies (strict and non-strict) and Pfam-A 73. Huang, S. et al. HaploMerger: reconstructing allelic relationships for polymorphic domains (computed for all proteomes as with A. ceylanicum). Orthologies diploid genome assemblies. Genome Res. 22, 1581–1588 (2012). were considered to represent possible instances of HGT if they included 74. Brown, C.T., Howe, A., Zhang, Q., Pyrkosz, A.B. & Brom, T.H. A single pass approach to reducing sampling variation, removing errors, and scaling A. ceylanicum, Homo sapiens and Mus musculus but did not include C. elegans, de novo assembly of shotgun sequences. arXiv. http://arxiv.org/abs/1203.4802 C. briggsae, P. pacificus, Bursaphelenchus xylophilus or Meloidogyne hapla. Sets (2012). of genes encoding a shared Pfam-A domain were likewise considered to con- 75. Schulz, M.H., Zerbino, D.R., Vingron, M. & Birney, E. Oases: robust de novo tain possible instances of HGT if the domains were present in A. ceylanicum RNA-seq assembly across the dynamic range of expression levels. Bioinformatics and mammals (at E ≤ 1 × 10−6) but absent in C. elegans, C. briggsae, P. pacificus, 28, 1086–1092 (2012). 76. Parra, G., Bradnam, K., Ning, Z., Keane, T. & Korf, I. Assessing the gene space −5 B. xylophilus and M. hapla (at E ≤ 1 × 10 ). Out of 33,243 orthology groups in draft genomes. Nucleic Acids Res. 37, 289–297 (2009). and 3,545 Pfam-A domains, we found 52 and 15 (respectively) that were 77. Kent, W.J. BLAT—the BLAST-like alignment tool. Genome Res. 12, 656–664 instances of possible HGT. Each possible instance of HGT in A. ceylanicum was (2002). individually checked by BLASTP searches of NCBI-nr. In most cases, BLASTP 78. Price, A.L., Jones, N.C. & Pevzner, P.A. De novo identification of repeat families in large genomes. Bioinformatics 21 (suppl. 1), i351–i358 (2005). showed similarities to C. elegans and other nematodes, which marked the 79. Holt, C. & Yandell, M. MAKER2: an annotation pipeline and genome-database putative HGTs as false positives. However, we also identified (through Pfam-A management tool for second-generation genome projects. BMC Bioinformatics 12, domains) one A. ceylanicum gene with strong similarity to bacterial amiD, 491 (2011). Acey_s0012.g1873. To search for other such homologs, we reran our motif 80. Käll, L., Krogh, A. & Sonnhammer, E.L. A combined transmembrane topology and signal peptide prediction method. J. Mol. Biol. 338, 1027–1036 searches without the requirement for mammalian hits, but, on further testing (2004). with BLASTP against NCBI-nr, no other bacterial sequences were found. 81. Wootton, J.C. Non-globular domains in protein sequences: automated segmentation using complexity measures. Comput. Chem. 18, 269–285 (1994). Phylogenetic analysis of lectin homologs from metazoa. In addition to 82. Lupas, A. Prediction and analysis of coiled-coil structures. Methods Enzymol. 266, 513–525 (1996). the amiD homolog Acey_s0012.g1873, we also observed eight A. ceylanicum 83. Punta, M. et al. The Pfam protein families database. Nucleic Acids Res. 40, genes that were more similar to vertebrate lectins than to nematode ones D290–D301 (2012). (Supplementary Table 17): these fell into three classes, showing similarity 84. Eddy, S.R. A new generation of homology search tools based on probabilistic to mannose receptor (MRC), asialoglycoprotein receptor (ASGR) or neuro­ inference. Genome Inform. 23, 205–211 (2009). 85. McDowall, J. & Hunter, S. InterPro protein classification.Methods Mol. Biol. 694, can (NCAN). We phylogenetically compared their domains to nematode, 37–47 (2011). arthropod, deuterostome and lophotrochozoan proteomes, along with a 86. Li, L., Stoeckert, C.J. Jr. & Roos, D.S. OrthoMCL: identification of ortholog groups small number of added individual nematode lectins that had been character- for eukaryotic genomes. Genome Res. 13, 2178–2189 (2003). ized because of their previously reported similarities to mammalian proteins 87. Langmead, B. & Salzberg, S.L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012). (species listed in Supplementary Table 29; sources of proteome sequences 88. Li, B. & Dewey, C.N. RSEM: accurate transcript quantification from listed in Supplementary Table 28). To avoid misalignments and spurious RNA-Seq data with or without a reference genome. BMC Bioinformatics 12, 323 similarities between multidomain proteins, we analyzed individual C-lectin (2011). domains rather than full-length lectin proteins; to identify coherent sets of 89. Tarazona, S., Garcia-Alcalde, F., Dopazo, J., Ferrer, A. & Conesa, A. Differential expression in RNA-seq: a matter of depth. Genome Res. 21, 2213–2223 (2011). homologs, we searched the custom proteome database with single-domain 90. Camacho, C. et al. BLAST+: architecture and applications. BMC Bioinformatics 90,101 query sequences via psi-BLAST (2.2.26+) , run for either three or four 10, 421 (2009). rounds at an inclusion threshold of E ≤ 1 × 10−20. The query sequences 91. Edgar, R.C. MUSCLE: multiple sequence alignment with high accuracy and high used, with the corresponding numbers of psi-BLAST rounds, are listed in throughput. Nucleic Acids Res. 32, 1792–1797 (2004). 92. Katoh, K. & Standley, D.M. MAFFT multiple sequence alignment software Supplementary Table 30. The resulting single-domain matches were realigned version 7: improvements in performance and usability. Mol. Biol. Evol. 30, 772–780 with MUSCLE (3.8.31) and phylogenetically analyzed as above. For each lectin (2013). class, the sequences in each resulting phylogeny are listed in Supplementary 93. Capella-Gutiérrez, S., Silla-Martinez, J.M. & Gabaldon, T. trimAl: a tool for Table 31. automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25, 1972–1973 (2009). 94. Waterhouse, A.M., Procter, J.B., Martin, D.M., Clamp, M. & Barton, G.J. Jalview Phylogenetic analysis of amiD homologs from metazoa and bacteria. Version 2—a multiple sequence alignment editor and analysis workbench. We first characterized non-bacterial and bacterial homologs of Acey_ Bioinformatics 25, 1189–1191 (2009). s0012.g1873 with BLASTP of NCBI-nr. This analysis yielded matches to 95. Price, M.N., Dehal, P.S. & Arkin, A.P. FastTree 2—approximately maximum- sequences from the hookworms A. ceylanicum (our own data, deposited likelihood trees for large alignments. PLoS ONE 5, e9490 (2010). 96. Lowe, T.M. & Eddy, S.R. tRNAscan-SE: a program for improved detection of into GenBank) and N. americanus; it also gave nine matches to non-bacterial transfer RNA genes in genomic sequence. Nucleic Acids Res. 25, 955–964 sequences from arthropods and basal animals (Supplementary Table 32). (1997). To more rigorously determine the phylogenetic origin of the amiD genes 97. Laha, T. et al. The bandit, a new DNA transposon from a hookworm-possible in the hookworms A. ceylanicum and N. americanus, we generated a phy­ horizontal genetic transfer between host and parasite. PLoS Negl. Trop. Dis. 1, e35 (2007). logeny for the entire Amidase_2 superfamily (N-acetylmuramoyl-l-alanine 98. Wheeler, T.J. et al. Dfam: a database of repetitive DNA based on profile hidden amidase; PFAM 27.0 motif PF01510.20), of which bacterial amiD genes rep- Markov models. Nucleic Acids Res. 41, D70–D82 (2013). resent one of four major subdivisions102. We searched all of the proteomes 99. Jurka, J. et al. Repbase Update, a database of eukaryotic repetitive elements. listed in Supplementary Table 29, along with all of the individual metazoan Cytogenet. Genome Res. 110, 462–467 (2005). 100. Sayers, E.W. et al. Database resources of the National Center for Biotechnology amiD homologs listed in Supplementary Table 32 and more proteomes from Information. Nucleic Acids Res. 40, D13–D25 (2012). arthropods, two different metagenomes from human stool and cow rumen 101. Schäffer, A.A. et al. Improving the accuracy of PSI-BLAST protein database and the entire 9 July 2014 release of UniProt103. Species and data sources searches with composition-based statistics and other refinements. Nucleic Acids for additional proteomes are listed in Supplementary Table 33; source files Res. 29, 2994–3005 (2001). 102. Firczuk, M. & Bochtler, M. Folds and activities of peptidoglycan amidases. FEMS are listed in Supplementary Table 28. We extracted subsequences matching Microbiol. Rev. 31, 676–691 (2007). the Amidase_2/PF01510.20 domain, realigned them with MAFFT v7.158b 103. UniProt Consortium. Activities at the Universal Protein Resource (UniProt). and phylogenetically analyzed them as above. Nucleic Acids Res. 42, D191–D198 (2014).

doi:10.1038/ng.3237 Nature Genetics