Supplementary data for manuscript
Comparative genomics of the tardigrades Hypsibius dujardini and Ramazzottius varieornatus

Yuki Yoshida1,2*, Georgios Koutsovoulos3*¶, Dominik R. Laetsch3,4, Lewis Stevens3, Sujai Kumar3, Daiki D. Horikawa1,2, Kyoko Ishino1, Shiori Komine1, Takekazu Kunieda5, Masaru Tomita1,2, Mark Blaxter3, Kazuharu Arakawa1,2

1 Institute for Advanced Biosciences, Keio University, Kakuganji 246-2, Mizukami, Tsuruoka City, Yamagata, Japan
2 Systems Biology Program, Graduate School of Media and Governance, Keio University, 5322, Endo, Fujisawa City, Kanagawa, Japan
3 Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh EH9 4JT UK
4 The James Hutton Institute, Dundee DD2 5DA, United Kingdom
5 Department of Biological Sciences, Graduate School of Science, University of Tokyo, Hongo 7-3-1, Bunkyo-ku, Tokyo, Japan

* Joint first authors
¶ Current addresses: GK: [email protected]

Addresses for correspondence:
Kazuharu Arakawa [email protected]
Mark Blaxter [email protected]

Supplementary Information

Tables
Supplementary Table S1. Data used in this study
Supplementary Table S2. Mapping statistics of various DNA-Seq data
Supplementary Table S3. Repeat content in two tardigrades
Supplementary Table S4. Telomeric regions in the H. dujardini genome
Supplementary Table S5. Mapping ratio for RNA-Seq data
Supplementary Table S6. Mapping ratio of Trinity assembled transcriptome
Supplementary Table S7. Proteomes used for Orthologue clustering
Supplementary Table S8. HGT content calculation of Ecdysozoa
Supplementary Table S9. Location of tardigrade specific protection related proteins
Supplementary Table S10. Functional annotations of synapomorphies
Supplementary Table S11. Transcriptome data used for phylogenomics
Supplementary Table S12. Software used in this study and their options

Figures
Supplementary Figure S1. DNA-Seq coverage of the H. dujardini genome
Supplementary Figure S2. tRNA content of the two tardigrades
Supplementary Figure S3. Length of genetic features
Supplementary Figure S4. Clustered HGT loci in H. dujardini and R. varieornatus
Supplementary Figure S5. Phylogeny of protection related proteins
Supplementary Figure S6. HGT obtained pathways in two tardigrades

Data files
1. miRNA_data
2. hgt_trees
3. HGT_cluster_matrix
4. DEG_list
5. Tardigrade_DEGs.fuctional_annotation
6. Orthofinder.clustering
7. KinFin_input
8. Tardigrade_counts_representation_tests
9. 463_putative HGTs
10. proteome_fastas

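Supplementary Tables

Supplementary Table S1. Data used in this study

(a) DNA-sequencing
Origin | Keio | Keio | UNC | UNC | UNC
Accession ID | SRX2495681 | DRR055040 | SRX1474871 | SRX1474929 | SRX1474950
Platform | PacBio | MiSeq | HiSeq 2000 | HiSeq 2000 | HiSeq 2000
# Reads | 779,905 | 51,607,261 | 87,744,967 | 64,049,630 | 45,733,238
# bp | 5,877,440,568 | NA | NA | NA | NA
Read Length | NA | 300bp paired | 100bp paired | 100bp paired | 100bp paired
Maximum length | 49,455 | NA | NA | NA | NA
N50 length | 10,657 | NA | NA | NA | NA
Average length | 7,536 | NA | NA | NA | NA
Insert length | NA | 477.0953 | 347.566 | 496.9285 | 749.2743

We generated new sequencing data using PacBio SMRT technology. In addition, we used sequence data from Boothby et al. (2015) for assembly, and single-individual sequencing data from our previous report (Arakawa 2016).

(b) Hypsibius dujardini RNA-Seq
#Individuals | Platform | Sample | #Replicate | Sample ID | #Reads | Accession
10,000 | HiSeq2000 | Active | 1 | Active_1 | 25,172,359 | SRX2528369
10,000 | HiSeq2000 | Active | 2 | Active_2 | 26,497,216 | SRX2528370
10,000 | HiSeq2000 | Active | 3 | Active_3 | 28,141,582 | SRX2528371
10,000 | HiSeq2000 | Tun | 1 | Tun_1 | 25,782,478 | SRX2528372
10,000 | HiSeq2000 | Tun | 2 | Tun_2 | 27,832,551 | SRX2528373
10,000 | HiSeq2000 | Tun | 3 | Tun_3 | 27,001,002 | SRX2528374
30 | NextSeq500 | Active | 1 | act-1 | 11,399,144 | SRX2528375
30 | NextSeq500 | Active | 2 | act-2 | 10,744,670 | SRX2528376
30 | NextSeq500 | Active | 3 | act-3 | 10,939,323 | SRX2528377
30 | NextSeq500 | Tun | 1 | tun-1 | 10,325,677 | SRX2528378
30 | NextSeq500 | Tun | 2 | tun-2 | 10,689,489 | SRX2528379
30 | NextSeq500 | Tun | 3 | tun-3 | 10,455,913 | SRX2528380
30 | NextSeq500 | Egg 1st day after laying | 1 | H-E1-1 | 8,822,054 | SRX2528333
30 | NextSeq500 | Egg 1st day after laying | 2 | H-E1-2 | 10,286,604 | SRX2528334
30 | NextSeq500 | Egg 1st day after laying | 3 | H-E1-3 | 8,319,242 | SRX2528335
30 | NextSeq500 | Egg 2nd day after laying | 1 | H-E2-1 | 11,794,526 | SRX2528336
30 | NextSeq500 | Egg 2nd day after laying | 2 | H-E2-2 | 11,086,054 | SRX2528337
30 | NextSeq500 | Egg 2nd day after laying | 3 | H-E2-3 | 10,151,210 | SRX2528338
30 | NextSeq500 | Egg 3rd day after laying | 1 | H-E3-1 | 10,057,550 | SRX2528339
30 | NextSeq500 | Egg 3rd day after laying | 2 | H-E3-2 | 9,253,951 | SRX2528340
30 | NextSeq500 | Egg 3rd day after laying | 3 | H-E3-3 | 11,871,780 | SRX2528341
30 | NextSeq500 | Egg 4th day after laying | 1 | H-E4-1 | 11,622,113 | SRX2528342
30 | NextSeq500 | Egg 4th day after laying | 2 | H-E4-2 | 12,386,383 | SRX2528343
30 | NextSeq500 | Egg 4th day after laying | 3 | H-E4-3 | 9,654,776 | SRX2528344
30 | NextSeq500 | Egg 5th day after laying | 1 | H-E5-1 | 11,921,100 | SRX2528345
30 | NextSeq500 | Egg 5th day after laying | 2 | H-E5-2 | 11,569,382 | SRX2528346
30 | NextSeq500 | Egg 5th day after laying | 3 | H-E5-3 | 10,503,387 | SRX2528347
30 | NextSeq500 | Juvenile 1st day | 1 | H-B1-1 | 12,440,551 | SRX2528348
30 | NextSeq500 | Juvenile 1st day | 2 | H-B1-2 | 12,306,138 | SRX2528349
30 | NextSeq500 | Juvenile 1st day | 3 | H-B1-3 | 12,734,126 | SRX2528350
30 | NextSeq500 | Juvenile 2nd day | 1 | H-B2-1 | 13,107,156 | SRX2528351
30 | NextSeq500 | Juvenile 2nd day | 2 | H-B2-2 | 14,437,609 | SRX2528352
30 | NextSeq500 | Juvenile 2nd day | 3 | H-B2-3 | 13,870,809 | SRX2528353
30 | NextSeq500 | Juvenile 3rd day | 1 | H-B3-1 | 8,360,076 | SRX2528354
30 | NextSeq500 | Juvenile 3rd day | 2 | H-B3-2 | 6,542,790 | SRX2528355
30 | NextSeq500 | Juvenile 3rd day | 3 | H-B3-3 | 9,775,113 | SRX2528356
30 | NextSeq500 | Juvenile 4th day | 1 | H-B4-1 | 9,824,335 | SRX2528357
30 | NextSeq500 | Juvenile 4th day | 2 | H-B4-2 | 16,666,875 | SRX2528358
30 | NextSeq500 | Juvenile 4th day | 3 | H-B4-3 | 15,995,271 | SRX2528359
30 | NextSeq500 | Juvenile 5th day | 1 | H-B5-1 | 6,928,823 | SRX2528360
30 | NextSeq500 | Juvenile 5th day | 2 | H-B5-2 | 8,857,975 | SRX2528361
30 | NextSeq500 | Juvenile 5th day | 3 | H-B5-3 | 12,901,947 | SRX2528362
30 | NextSeq500 | Juvenile 6th day | 1 | H-B6-1 | 9,843,726 | SRX2528363
30 | NextSeq500 | Juvenile 6th day | 2 | H-B6-2 | 12,913,346 | SRX2528364
30 | NextSeq500 | Juvenile 6th day | 3 | H-B6-3 | 11,745,564 | SRX2528365
30 | NextSeq500 | Juvenile 7th day | 1 | H-B7-1 | 12,384,307 | SRX2528366
30 | NextSeq500 | Juvenile 7th day | 2 | H-B7-2 | 9,182,107 | SRX2528367
30 | NextSeq500 | Juvenile 7th day | 3 | H-B7-3 | 13,626,269 | SRX2528368
5,000 | HiSeq2000 | miRNA-Seq | 1 | HD_miRNA | 32,254,413 | SRX2495676

(c) Ramazzottius varieornatus RNA-Seq
#Individuals | Platform | Sample | #Rep | Sample ID | #Reads | Accession
1~2.5 | NextSeq500 | Active-Fast | 1 | Y-active_slow_1 | 12,146,289 | SRX2528399
1~2.5 | NextSeq500 | Active-Fast | 2 | Y-active_slow_2 | 11,076,841 | SRX2528400
1~2.5 | NextSeq500 | Active-Fast | 3 | Y-active_slow_3 | 11,211,443 | SRX2528401
1~2.5 | NextSeq500 | Tun-Fast | 1 | Y-tun_slow_1 | 11,781,529 | SRX2528402
1~2.5 | NextSeq500 | Tun-Fast | 2 | Y-tun_slow_2 | 11,966,104 | SRX2528403
1~2.5 | NextSeq500 | Tun-Fast | 3 | Y-tun_slow_3 | 12,361,848 | SRX2528404
30 | NextSeq500 | Active-Slow | 1 | Y-active_fast_1 | 31,330,380 | SRX2528405
30 | NextSeq500 | Active-Slow | 2 | Y-active_fast_2 | 35,320,831 | SRX2528406
30 | NextSeq500 | Active-Slow | 3 | Y-active_fast_3 | 36,895,441 | SRX2528407
30 | NextSeq500 | Tun-Slow | 1 | Y-tun_fast_1 | 35,469,871 | SRX2528408
30 | NextSeq500 | Tun-Slow | 2 | Y-tun_fast_2 | 38,879,671 | SRX2528409
30 | NextSeq500 | Tun-Slow | 3 | Y-tun_fast_3 | 31,835,650 | SRX2528410
30 | NextSeq500 | Egg 1st day after laying | 1 | Y-E1-1 | 11,688,367 | SRX2528381
30 | NextSeq500 | Egg 1st day after laying | 2 | Y-E1-2 | 13,064,048 | SRX2528382
30 | NextSeq500 | Egg 1st day after laying | 3 | Y-E1-3 | 13,389,666 | SRX2528383
30 | NextSeq500 | Egg 2nd day after laying | 1 | Y-E2-1 | 12,702,879 | SRX2528384
30 | NextSeq500 | Egg 2nd day after laying | 2 | Y-E2-2 | 14,385,811 | SRX2528385
30 | NextSeq500 | Egg 2nd day after laying | 3 | Y-E2-3 | 13,101,271 | SRX2528386
30 | NextSeq500 | Egg 3rd day after laying | 1 | Y-E3-1 | 14,348,899 | SRX2528387
30 | NextSeq500 | Egg 3rd day after laying | 2 | Y-E3-2 | 13,640,410 | SRX2528388
30 | NextSeq500 | Egg 3rd day after laying | 3 | Y-E3-3 | 8,817,117 | SRX2528389
30 | NextSeq500 | Egg 4th day after laying | 1 | Y-E4-1 | 12,606,663 | SRX2528390
30 | NextSeq500 | Egg 4th day after laying | 2 | Y-E4-2 | 15,271,225 | SRX2528391
30 | NextSeq500 | Egg 4th day after laying | 3 | Y-E4-3 | 12,517,722 | SRX2528392
30 | NextSeq500 | Egg 5th day after laying | 1 | Y-E5-1 | 12,599,958 | SRX2528393
30 | NextSeq500 | Egg 5th day after laying | 2 | Y-E5-2 | 14,476,417 | SRX2528394
30 | NextSeq500 | Egg 5th day after laying | 3 | Y-E5-3 | 17,324,895 | SRX2528395
30 | NextSeq500 | Juvenile 1st day | 1 | Y-B1-1 | 4,811,886 | SRX2528396
30 | NextSeq500 | Juvenile 1st day | 2 | Y-B1-2 | 6,210,798 | SRX2528397
30 | NextSeq500 | Juvenile 1st day | 3 | Y-B1-3 | 5,637,785 | SRX2528398

Supplementary Table S2. Mapping statistics of various DNA-Seq data
Origin | Accession ID | Mapped Reads | Coverage (Mean/SD) | # In/Del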

UNC | SRR2986339 | 47.6M (54.24%) | 107±234 | 6.85M/7.70M
UNC | SRR2986435 | 53.8M (83.95%) | 50±97 | 1.84M/2.02M
UNC | SRR2986451 | 39.2M (85.68%) | 45±70 | 2.82M/2.90M
Edinburgh | ERR1147177 | 116.0M (77.15%) | 50±87 | 3.2M/3.32M
Edinburgh | ERR1147178 | 99.2M (80.03%) | 36±88 | 2.30M/2.37M
Keio | DRR055040 | 50.0M (96.91%) | 113±136 | 8.59M/8.48M

Supplementary Table S3. Repeat content in two tardigrades
Category | Term | H. dujardini | R. varieornatus

Simple | #elements | 65,638 | 3,301
Simple | #length (bp) | 5,391,682 | 137,297
Simple | %genome | 5.18 | 0.25
Unclassified | #elements | 158,522 | 65,730
Unclassified | #length (bp) | 24,232,698 | 11,020,138
Unclassified | %genome | 23.27 | 19.74

Supplementary Table S4. Telomeric regions in the H. dujardini genome
Scaffold | Start | End | Repeat | Length from End | Length
scaffold0088 | 327857 | 336800 | TTGATGGGTT | 49 | 8943
scaffold0114 | 15 | 7307 | ATCAAAACCC | 15 | 7292
scaffold0012 | 1 | 5955 | CATCAAAACC | 1 | 5954
scaffold0363 | 14 | 4481 | ATCAAAACCC | 14 | 4467
scaffold0321 | 157 | 4157 | AAAACCCATC | 157 | 4000
scaffold0005 | 52 | 3367 | CAAAACCCAT | 52 | 3315
scaffold0128 | 239844 | 242599 | GGTTTTGATG | 823 | 2755
scaffold0001 | 7702 | 9727 | AAACCCATCA | 7702 | 2025
scaffold0192 | 164482 | 165621 | GGGTTTTGAT | 49 | 1139
scaffold0343 | 22767 | 23510 | TTTTGATGGG | 22767 | 743
scaffold0023 | 54017 | 54373 | ATCAAAACCC | 54017 | 356
scaffold0287 | 57943 | 58286 | ATGGGTTTTG | 36758 | 343
scaffold0212 | 65340 | 65622 | TTTTGATGGG | 65340 | 282
scaffold0093 | 201990 | 202227 | TTGATGGGTT | 107275 | 237
scaffold0072 | 51074 | 51288 | TGGGTTTTGA | 51074 | 214
scaffold0070 | 189916 | 190113 | CATCAAAACC | 189916 | 197
scaffold0031 | 383706 | 383897 | GATGGGTTTT | 227004 | 191
scaffold0090 | 105640 | 105790 | TCAAAACCCA | 105640 | 150
scaffold0092 | 52492 | 52623 | ATGGGTTTTG | 52492 | 131
scaffold0005 | 3799 | 3922 | ATCAAAACCC | 3799 | 123
scaffold0036 | 185539 | 185660 | ATCAAAACCC | 185539 | 121
scaffold0036 | 17528 | 17649 | ATCAAAACCC | 17528 | 121
scaffold0070 | 137262 | 137380 | AAACCCATCA | 137262 | 118
scaffold0117 | 133706 | 133813 | ATCAAAACCC | 120682 | 107
scaffold0171 | 121075 | 121176 | AAACCCATCA | 66249 | 101
scaffold0268 | 65742 | 65843 | ATCAAAACCC | 36714 | 101

Regions located close to scaffold ends were colored in yellow; they may represent telomeric ends.

Supplementary Table S5. Mapping ratio for RNA-Seq data

(A) Anhydrobiosis samples

Hypsibius dujardini
Sample set | Sample | H. dujardini transcriptome | H. dujardini genome
10k Individuals | act-1 | 35,536,088 (70.08%) | 51,790,540 (92.95%)
10k Individuals | act-2 | 38,335,541 (71.78%) | 55,208,281 (93.93%)
10k Individuals | act-3 | 40,570,560 (71.53%) | 58,667,535 (94.03%)
10k Individuals | tun-1 | 37,104,384 (71.41%) | 54,280,558 (94.77%)
10k Individuals | tun-2 | 39,611,771 (70.58%) | 58,602,179 (94.87%)
10k Individuals | tun-3 | 38,118,839 (69.87%) | 57,146,947 (95.28%)

Sample set | Sample | Mapped reads
30 Individuals | act-1 | 4,276,18 (66.19%)
30 Individuals | act-2 | 5,604,724 (52.04%)
30 Individuals | act-3 | 2,628,272 (24.98%)
30 Individuals | tun-1 | 6,410,871 (61.75%)
30 Individuals | tun-2 | 4,524,015 (61.87%)
30 Individuals | tun-3 | 5,562,719 (61.55%)

Ramazzottius varieornatus
Sample set | Sample | R. varieornatus transcriptome | R. varieornatus genome | Hashimoto et al. gene model
Slow-dry | act-1 | 6,414,974 (52.68%) | 11,459,570 (90.77%) | 6,592,920 (54.14%)
Slow-dry | act-2 | 6,052,043 (54.49%) | 10,525,516 (91.22%) | 6,199,903 (55.82%)
Slow-dry | act-3 | 5,587,457 (49.73%) | 10,581,005 (90.96%) | 5,735,575 (51.04%)
Slow-dry | tun-1 | 5,327,210 (45.14%) | 10,816,699 (88.77%) | 5,484,698 (46.47%)
Slow-dry | tun-2 | 5,593,452 (46.66%) | 11,003,773 (88.82%) | 5,754,535 (48.00%)
Slow-dry | tun-3 | 6,108,721 (49.31%) | 11,614,986 (90.71%) | 6,287,399 (50.75%)

Sample set | Sample | Mapped reads
Fast-dry | act-1 | 15,088,390 (48.06%)
Fast-dry | act-2 | 17,576,037 (49.66%)
Fast-dry | act-3 | 17,942,032 (48.53%)
Fast-dry | tun-1 | 11,430,186 (32.18%)
Fast-dry | tun-2 | 13,773,138 (35.37%)
Fast-dry | tun-3 | 12,529,323 (39.30%)

(B) Developmental samples
Development | H. dujardini Transcriptome | H. dujardini Genome | R. varieornatus Transcriptome | R. varieornatus Genome
E1-1 | 4,378,145 (49.51%) | 7,962,696 (87.17%) | 5,191,424 (44.32%) | 9,436,055 (77.92%)
E1-2 | 5,253,306 (50.92%) | 9,311,381 (87.33%) | 5,299,355 (40.49%) | 9,693,795 (71.78%)
E1-3 | 3,958,477 (47.46%) | 7,341,290 (84.97%) | 5,051,568 (37.66%) | 9,497,572 (68.66%)
E2-1 | 4,818,034 (40.76%) | 10,213,359 (83.77%) | 2,591,777 (20.38%) | 5,835,138 (45.13%)
E2-2 | 4,777,963 (42.98%) | 9,834,385 (85.84%) | 3,098,601 (21.52%) | 6,685,797 (45.59%)
E2-3 | 4,525,795 (44.45%) | 8,777,764 (83.40%) | 2,514,170 (19.17%) | 5,823,252 (43.70%)
E3-1 | 4,529,697 (44.92%) | 8,738,778 (83.96%) | 3,504,071 (24.40%) | 7,761,912 (53.03%)
E3-2 | 3,661,789 (39.49%) | 7,335,588 (76.93%) | 2,935,065 (21.50%) | 6,961,489 (50.14%)
E3-3 | 4,941,305 (41.53%) | 10,139,902 (82.78%) | 1,551,655 (17.59%) | 3,826,220 (42.74%)
E4-1 | 4,446,480 (38.18%) | 8,251,277 (68.94%) | 2,286,988 (18.13%) | 5,335,818 (41.70%)
E4-2 | 5,057,162 (40.74%) | 9,514,858 (74.52%) | 1,805,923 (11.82%) | 4,599,738 (29.82%)
E4-3 | 3,622,851 (37.44%) | 7,249,760 (73.06%) | 1,451,975 (11.59%) | 3,634,522 (28.75%)
E5-1 | 5,008,318 (41.91%) | 9,568,902 (77.70%) | 929,244 (7.37%) | 2,786,443 (21.98%)
E5-2 | 5,415,199 (46.66%) | 10,173,646 (84.90%) | 299,688 (2.07%) | 1,680,860 (11.58%)
E5-3 | 4,816,060 (45.72%) | 9,050,720 (83.26%) | 630,014 (3.64%) | 2,605,069 (14.98%)
B1-1 | 6,372,485 (51.09%) | 11,862,133 (91.80%) | 2,043,701 (42.39%) | 3,563,016 (71.91%)
B1-2 | 3,753,075 (30.46%) | 7,981,613 (63.46%) | 2,871,856 (46.14%) | 4,832,785 (75.38%)
B1-3 | 4,574,819 (35.86%) | 9,315,606 (71.27%) | 2,684,092 (47.50%) | 4,539,890 (77.94%)
B2-1 | 2,727,779 (20.79%) | 6,397,464 (48.10%) | NA | NA
B2-2 | 2,293,308 (15.87%) | 5,574,352 (38.18%) | NA | NA
B2-3 | 2,244,975 (16.17%) | 5,219,405 (37.19%) | NA | NA
B3-1 | 3,009,036 (35.94%) | 6,134,894 (71.71%) | NA | NA
B3-2 | 1,800,045 (27.48%) | 3,885,947 (58.34%) | NA | NA
B3-3 | 3,070,949 (31.38%) | 6,447,666 (64.60%) | NA | NA
B4-1 | 3,213,530 (32.66%) | 6,355,544 (63.35%) | NA | NA
B4-2 | 3,718,153 (22.29%) | 8,188,239 (48.43%) | NA | NA
B4-3 | 3,808,703 (23.79%) | 8,102,210 (49.89%) | NA | NA
B5-1 | 2,366,172 (34.10%) | 4,775,355 (67.40%) | NA | NA
B5-2 | 2,507,768 (28.28%) | 5,510,508 (61.17%) | NA | NA
B5-3 | 3,797,968 (29.40%) | 7,968,128 (60.63%) | NA | NA
B6-1 | 4,176,593 (42.35%) | 8,477,432 (83.80%) | NA | NA
B6-2 | 5,616,490 (43.40%) | 11,116,147 (83.69%) | NA | NA
B6-3 | 4,280,233 (36.37%) | 8,557,430 (71.17%) | NA | NA
B7-1 | 5,380,476 (43.35%) | 10,719,826 (84.09%) | NA | NA
B7-2 | 4,177,942 (45.40%) | 8,118,862 (85.79%) | NA | NA
B7-3 | 5,987,512 (43.85%) | 11,144,850 (79.27%) | NA | NA

Supplementary Table S6. Mapping ratio of Trinity assembled transcriptome
Hypsibius dujardini 10k Individuals | Total number of transcripts | H. dujardini genome | H. dujardini transcriptome
act-1 | 66886 | 59283 (88.63%) | 43608 (65.20%)
act-2 | 68941 | 60998 (88.48%) | 45168 (65.52%)
act-3 | 73982 | 68767 (99.74%) | 52217 (70.58%)
tun-1 | 63670 | 59503 (93.46%) | 43667 (68.58%)
tun-2 | 77919 | 74304 (95.37%) | 54768 (61.59%)
tun-3 | 149853 | 117883 (78.67%) | 72078 (48.10%)

Supplementary Table S7. Proteomes used for Orthologue clustering
Index | Species | GeneBuild ID / Accession | Source
0 | Anopheles gambiae | 2014-08-VectorBase | EnsemblMetazoa
1 | Apis mellifera | 2014-05-BeeBase | EnsemblMetazoa
2 | Acyrthosiphon pisum | 2013-07-AphidBase | EnsemblMetazoa
3 | Ascaris suum | PRJNA80881 | WormbaseParasite5
4 | Brugia malayi | PRJNA10729 | WormbaseParasite5
5 | Bursaphelenchus xylophilus | PRJEA64437 | WormbaseParasite5
6 | Caenorhabditis elegans | PRJNA13758 | WormbaseParasite5
7 | Cimex lectularius | v0.5.3 | I5K
8 | Capitella teleta | 2012-12-JGI | EnsemblMetazoa
38 | Drosophila melanogaster | r6_09 | FLYBASE
9 | Dendroctonus ponderosae | 2013-04-EnsemblMetazoa | EnsemblMetazoa
10 | Daphnia pulex | 2011-02-EnsemblMetazoa | EnsemblMetazoa
11 | Hypsibius dujardini | nHd.3.0 | this study
12 | Ixodes scapularis | 2014-08-VectorBase | EnsemblMetazoa
13 | Meloidogyne hapla | PRJNA29083 |
14 | Nasonia vitripennis | 2010-12-NasoniaBase | EnsemblMetazoa
15 | Octopus bimaculoides | 2016-03-OIST | EnsemblMetazoa
16 | Priapulus caudatus | GCF_000485595 | NCBI
17 | Pediculus humanus | 2014-04-VectorBase | EnsemblMetazoa
18 | Plectus murrayi | nPm.2.0 | ngenomes.org
19 | Pristionchus pacificus | PRJNA12644 | WormbaseParasite5
20 | Plutella xylostella | DBM_FJ_v1_1 | LEPBASE
37 | Ramazzottius varieornatus | nRv.1.1 | this study
22 | Solenopsis invicta | 2013-10-AntGenomesPortal | EnsemblMetazoa
23 | Strigamia maritima | 2013-02-EG | EnsemblMetazoa
24 | Tribolium castaneum | 2012-09-24 | EnsemblMetazoa
25 | Trichuris muris | PRJEB126 | WormbaseParasite5
26 | Trichinella spiralis | PRJNA12603 | WormbaseParasite5
27 | Tetranychus urticae | 2012-11-ORCAE | EnsemblMetazoa

Supplementary Table S8. HGT content calculation of Ecdysozoa

Gene sets: ENSEMBL (DL:2016.09.20) and Ab initio (Augustus 3.2.2)
Category | Augustus3 model | Organism | ENSEMBL # Gene | ENSEMBL HGT | ENSEMBL % | Ab initio # Gene | Ab initio HGT | Ab initio %
Arthropod | aedes | Aedes aegypti | 15,796 | 182/11906 | 1.53 | 103,215 | 6364/70452 | 9.03
Arthropod | honeybee1 | Apis mellifera | 15,314 | 485/9773 | 4.96 | 14,115 | 897/10073 | 8.90
Arthropod | bombus_impatiens1 | Bombus impatiens | 15,896 | 1166/11329 | 10.29 | 18,793 | 1487/10910 | 13.63
Arthropod | culex | Culex quinquefasciatus | 18,968 | 203/14015 | 1.45 | 25,343 | 297/18296 | 2.17
Arthropod | NA | Daphnia magna | 29,127 | 1343/17234 | 7.79 | NA | NA | NA
Arthropod | fly | Drosophila ananassae | 15,069 | 91/10336 | 0.88 | 21,842 | 1816/15694 | 11.58
Arthropod | fly | Drosophila erecta | 15,044 | 85/10121 | 0.84 | 15,447 | 616/11491 | 5.36
Arthropod | fly | Drosophila grimshawi | 14,982 | 90/10448 | 0.86 | 15,293 | 263/11015 | 2.39
Arthropod | fly | Drosophila melanogaster | 13,918 | 74/19191 | 0.73 | 15,535 | 532/11401 | 4.67
Arthropod | fly | Drosophila mojavensis | 14,594 | 86/9918 | 0.87 | 15,677 | 372/11338 | 3.28
Arthropod | fly | Drosophila persimilis | 16,874 | 82/10800 | 0.76 | 21,673 | 786/14385 | 5.46
Arthropod | fly | Drosophila pseudoobscura | 15,864 | 96/10649 | 0.90 | 16,705 | 2452/11387 | 2.21
Arthropod | fly | Drosophila sechellia | 16,465 | 71/10787 | 0.66 | 22,119 | 1151/16788 | 6.86
Arthropod | fly | Drosophila simulans | 15,413 | 70/9820 | 0.71 | 16,148 | 362/11393 | 3.18
Arthropod | fly | Drosophila virilis | 14,491 | 82/10081 | 0.81 | 15,991 | 512/11862 | 4.32
Arthropod | fly | Drosophila willistoni | 15,512 | 156/10638 | 1.47 | 16,942 | 738/12259 | 6.02
Arthropod | fly | Drosophila yakuba | 16,077 | 85/10463 | 0.81 | 17,774 | 539/12469 | 4.32
Arthropod | heliconius_melpomene1 | Heliconius melpomene | 12,669 | 132/9421 | 1.40 | 20,333 | 289/14640 | 1.97
Arthropod | nasonia | Nasonia vitripennis | 17,083 | 223/12130 | 1.92 | 26,010 | 538/15457 | 3.48
Arthropod | rhodnius | Rhodnius prolixus | 15,438 | 498/10733 | 4.63 | 52,161 | 1833/39130 | 4.68
Arthropod | tribolium2012 | Tribolium castaneum | 16,524 | 124/1115 | 1.12 | 16,160 | 105/10442 | 1.00
Nematode | caenorhabditis | Caenorhabditis brenneri | 30,660 | 253/14748 | 1.70 | 38,953 | 518/17298 | 2.99
Nematode | caenorhabditis | Caenorhabditis briggsae | 21,814 | 239/10936 | 2.19 | 20,745 | 242/11335 | 2.13
Nematode | caenorhabditis | Caenorhabditis elegans | 20,362 | 223/10574 | 2.11 | 18,177 | 215/10278 | 2.09
Nematode | caenorhabditis | Caenorhabditis japonica | 29,931 | 315/15260 | 2.06 | 29,556 | 352/15842 | 2.22
Nematode | caenorhabditis | Caenorhabditis remanei | 31,437 | 766/14483 | 5.29 | 30,506 | 1288/14291 | 9.01
Nematode | trichinella | Trichinella spiralis | 16,380 | 47/8616 | 0.55 | 11,310 | 41/8079 | 0.51
Tardigrade | BRAKER | Hypsibius dujardini | NA | NA | NA | 19913 | 463/12616 | 3.67
Tardigrade | BRAKER | Ramazzottius varieornatus | 19521 | 242/10760 | 2.25 | 13917 | 220/9894 | 2.22

Supplementary Table S9. Location of tardigrade specific protection related proteins
Gene ID | Scaffold | Category | %Identity | Length | E-value | Bitscore
bHd17608.1 | scaffold0022 | CAHS1 | 54.55 | 165 | 5.0E-45 | 158
bHd17663.1 | scaffold0023 | CAHS1 | 49.46 | 184 | 2.0E-45 | 161
bHd04182.1 | scaffold0087 | CAHS1 | 61.99 | 221 | 7.0E-65 | 209
bHd04184.1 | scaffold0087 | CAHS1 | 78.18 | 55 | 4.0E-19 | 85.5
bHd06166.1 | scaffold0123 | CAHS1 | 45.13 | 195 | 2.0E-51 | 174
bHd16038.1 | scaffold0013 | CAHS2 | 56.55 | 168 | 8.0E-57 | 187
bHd17504.1 | scaffold0022 | CAHS2 | 53.33 | 105 | 3.0E-28 | 117
bHd17505.1 | scaffold0022 | CAHS2 | 47.40 | 192 | 3.0E-47 | 162
bHd17506.1 | scaffold0022 | CAHS2 | 67.71 | 192 | 1.0E-82 | 253
bHd18862.1 | scaffold0032 | CAHS2 | 51.63 | 184 | 9.0E-41 | 148
bHd01486.1 | scaffold0050 | CAHS2 | 55.09 | 167 | 8.0E-54 | 179
bHd02925.1 | scaffold0069 | CAHS3 | 57.33 | 75 | 2.0E-20 | 90.5
bHd19902.1 | scaffold0018 | MAHS | 42.51 | 167 | 5.0E-24 | 107
bHd16514.1 | scaffold0016 | RvLEAM | 41.45 | 234 | 3.0E-40 | 147
bHd00493.1 | scaffold0002 | SAHS1 | 36.30 | 135 | 2.0E-24 | 100
bHd07979.1 | scaffold0005 | SAHS1 | 34.50 | 171 | 2.0E-25 | 103
bHd10755.1 | scaffold0239 | SAHS1 | 45.88 | 170 | 2.0E-44 | 152
bHd10756.1 | scaffold0239 | SAHS1 | 38.01 | 171 | 2.0E-32 | 121
bHd10757.1 | scaffold0239 | SAHS1 | 38.01 | 171 | 2.0E-32 | 121
bHd10758.1 | scaffold0239 | SAHS1 | 54.23 | 142 | 3.0E-47 | 159
bHd10759.1 | scaffold0239 | SAHS1 | 54.68 | 139 | 2.0E-49 | 164
bHd10762.1 | scaffold0239 | SAHS1 | 47.37 | 171 | 5.0E-47 | 159
bHd10763.1 | scaffold0239 | SAHS1 | 46.47 | 170 | 3.0E-47 | 159
bHd10764.1 | scaffold0239 | SAHS1 | 69.63 | 135 | 2.0E-65 | 205

Supplementary Table S10. Functional annotations of synapomorphies
cluster_id | proteins | hypothesis | taxon coverage | proportion of nematode proteomes present (n=9) | proportion of arthropod proteomes present (n=15) | proportion of tardigrade proteomes present (n=2)
OG0000436 | 104 | Panarthropoda | 1.00 | 0.00 | 1.00 | 1.00
OG0001236 | 54 | Panarthropoda | 1.00 | 0.00 | 1.00 | 1.00
OG0002592 | 36 | Panarthropoda | 1.00 | 0.00 | 1.00 | 1.00
OG0006538 | 19 | Panarthropoda | 1.00 | 0.00 | 1.00 | 1.00
OG0006541 | 19 | Panarthropoda | 1.00 | 0.00 | 1.00 | 1.00
OG0006869 | 17 | Panarthropoda | 1.00 | 0.00 | 1.00 | 1.00
OG0005117 | 27 | Panarthropoda | 0.88 | 0.00 | 0.93 | 0.50
OG0005941 | 22 | Panarthropoda | 0.77 | 0.00 | 0.73 | 1.00
OG0006662 | 18 | Panarthropoda | 0.82 | 0.00 | 0.80 | 1.00
OG0006889 | 17 | Panarthropoda | 0.71 | 0.00 | 0.73 | 0.50
OG0006940 | 17 | Panarthropoda | 0.82 | 0.00 | 0.87 | 0.50
OG0006941 | 17 | Panarthropoda | 0.71 | 0.00 | 0.67 | 1.00
OG0006951 | 17 | Panarthropoda | 0.71 | 0.00 | 0.67 | 1.00
OG0007141 | 16 | Panarthropoda | 0.82 | 0.00 | 0.80 | 1.00
OG0007285 | 15 | Panarthropoda | 0.71 | 0.00 | 0.67 | 1.00
OG0007290 | 15 | Panarthropoda | 0.82 | 0.00 | 0.80 | 1.00
OG0007298 | 15 | Panarthropoda | 0.88 | 0.00 | 0.87 | 1.00
OG0007328 | 15 | Panarthropoda | 0.71 | 0.00 | 0.67 | 1.00
OG0007463 | 14 | Panarthropoda | 0.77 | 0.00 | 0.73 | 1.00
OG0007689 | 13 | Panarthropoda | 0.71 | 0.00 | 0.67 | 1.00
OG0005423 | 26 | Nematoda+Tardigrada | 0.82 | 0.89 | 0.00 | 0.50
OG0006414 | 20 | Nematoda+Tardigrada | 0.82 | 0.78 | 0.00 | 1.00
OG0007199 | 16 | Nematoda+Tardigrada | 0.91 | 1.00 | 0.00 | 0.50
OG0007812 | 13 | Nematoda+Tardigrada | 0.82 | 0.78 | 0.00 | 1.00
OG0008368 | 11 | Nematoda+Tardigrada | 0.82 | 0.78 | 0.00 | 1.00

Supplementary Table S11. Transcriptome data used for phylogenomics.
Species | Phylum | Data Type | Action | Link
Echiniscus testudo | Tardigrada | assembled transcripts | predicted proteins using TransDecoder | ftp.ncbi.nlm.nih.gov/sra/wgs_aux/GD/AL/GDAL01/GDAL01.1.fsa_nt.gz
Milnesium tardigradum | Tardigrada | assembled transcripts | predicted proteins using TransDecoder | https://www.ncbi.nlm.nih.gov/nuccore/?term=PRJNA34121
Euperipatoides kanangrensis | Onychophora | ESTs | assembled with CAP3; predicted proteins using TransDecoder | ftp://ftp-private.ncbi.nlm.nih.gov/pub/TraceDB/euperipatoides_kanangrensis/fasta.euperipatoides_kanangrensis.001.gz
Peripatopsis sedgwicki | Onychophora | ESTs | assembled with CAP3; predicted proteins using TransDecoder | https://www.ncbi.nlm.nih.gov/nucest/?term=Peripatopsis+sedgwicki%5Borganism%5D
Peripatopsis capensis | Onychophora | raw Illumina reads | trimmed using Skewer; assembled with Trinity; predicted proteins using TransDecoder | https://www.ncbi.nlm.nih.gov/bioproject/PRJNA236598
Pycnophyes kielensis | | assembled transcripts | predicted proteins using TransDecoder | ftp://ftp.ncbi.nlm.nih.gov/sra/wgs_aux/GD/AN/GDAN01/GDAN01.1.fsa_nt.gz
Echinoderes horni | Kinorhyncha | ESTs | assembled with CAP3; predicted proteins using TransDecoder | ftp://ftp-private.ncbi.nlm.nih.gov/pub/TraceDB/echinoderes_horni/fasta.echinoderes_horni.001.gz
Halicryptus spinulosus | | assembled transcripts | predicted proteins using TransDecoder | ftp://ftp.ncbi.nlm.nih.gov/sra/wgs_aux/GD/AM/GDAM01/GDAM01.1.fsa_nt.gz

Supplementary Table S12. Software used in this study and their options.

Tool name | Reference | Version | Relevant parameters | Commentary on usage

Raw data processing and filtering
FastQC | [82] | v0.11.3 | None |
Skewer | [114] | 0.2.2 | -n -q 30 -l 51 -m pe |
bwa | [93] | 0.7.12-r1039 | None |
blobtools | [87] | v0.9.19 | | DK from https://github.com/DRL/blobtools
samtools | [94] | Version: 1.2 (using htslib 1.2.1) | [view] -@ 30 -bS; [sort] -@ 30; [index] None |
ncbi-blast+ | [85] | | |

Genome assembly
Usearch | [83] | v. 8.0.1517 | None |
SPAdes | [84] | v. 3.8.1 | -k 21,33,55,77,99,127 --only-assembler --careful |
Bowtie2 | [88] | v. 2.2.4 | None |
Platanus | [44] | v. 1.2.3 | -u 0.2 |
Falcon | [43] | v.0.2.2 | [Error correction] daligner cutoff of 4,000 bp and -k16 -e0.70 -s1000 -t16 -l1000 -h64 -w7; [Second daligner] -k20 -e.96 -s1000 -t32 -l1500 -h256; [Final assembly] min coverage of 2, max coverage of 80, max diff coverage of 40 |
SSPACE-LongReads | [46] | v. 1.1 | -m 50 |
PBJelly | [89] | v. 13.10 | -m 50 |
Pilon | [90] | v. 1.17 | None |
Qualimap | [95] | v. 2.2 | qualimap bamqc -bam input.mem.sorted.bam -outformat pdf --java-mem-size=16G |
CEGMA | [47] | v. 2.5 | None |

Phylogenetic analyses
RAxML | [65] | | |
ClustalW2 | [99] | v2.1 | None |
MAFFT | [103] | v7.271 (2016/1/6) | None |
FastTree | [108] | 2.1.8 SSE3 | -boot 1000 |
trimal | [116] | | |
fasconcat-G | [117] | | |
FigTree | [109] | v. 1.4.2 | |

Annotation and databasing
ENSEMBL | [54] | Version 85 | |
Hmmsearch | [122] | HMMER 3.1b2 (February 2015) | --cpu 32 --domE 1e-15 |
Braker | [48] | v1.9 | None |
Augustus | [61] | v3.2.2 | None |
GeneMark-ES | [92] | v.4.21 | None |
RepeatScout | [100] | Version 1.0.5 | None |
RepeatMasker | [101] | version open-4.0.5 | None |
tRNAscan-SE | [51] | tRNAscan-SE 1.3.1 (January 2012) | None |
RNAmmer | [50] | v1.2 | -multi -S euk -m lsu,ssu,tsu |
KAAS | [98] | Automatic Annotation Server Ver. 2.1 | Representative set for GENES | Via web interface.
Legacy blast | [64] | v2.2.22 | -e 1e-15 |
Diamond | [63] | v1.8.2 | -e 10 --sensitive |
InterProScan | [99] | interproscan-5.19-58.0 | --goterms -appl TIGRFAM-15.0,ProDom-2006.1,SMART-7.1,SignalP-EUK-4.1,PrositePatterns-20.119,PRINTS-42.0,SuperFamily-1.75,Pfam-29.0,PrositeProfiles-20.119 -f TSV | https://github.com/ebi-pf-team/interproscan
miRDeep | [52] | v.2.0.0.8 | None |
SSEARCH | [102] | v36.3.8e Sep, 2016 (preload9) | None |

Genome comparison
murasaki | [58] | Murasaki version 1.68.6 (LARGESEQ) | -p[28:36] -M 100 |
Mauve | [107] | mauve_snapshot_2015-02-13/ | None | Progressive alignment with GUI

Databases used in annotation
Swiss-Prot | [57] | DL at 2016/5/23 | |
TrEMBL | [57] | DL at 2016/1/17 | |
Pfam-A | [97] | DL at 2016/7/22 | |
Dfam | [56] | DL at 2016/09/26 | |
miRBase | [53] | DL at 2016/12/20 | |

Transcriptome analyses
Trinity | [49] | 2.2.0 | Default params |
TransDecoder | [115] | 3.0.0 | TransDecoder.LongOrfs (default); TransDecoder.Predict --retain_blastp_hits --single_best_orf |
Tophat2 | [91] | v2.1.1 | tophat -o output -p 30 ref.fa a.r1.fq a.r2.fq |
DESeq2 | [104] | 1.8.2 | None |
Bowtie2 | [88] | version 2.2.8 | None | Used in TopHat2
Kallisto | [105] | kallisto 0.42.4 | [index] None; [quant-single] --bias -b 100 --single -l 200 -s 50; [quant-paired] --bias -b 100 |

Gene family analyses
Gephi | [123] | v0.9.1 | | https://gephi.org/
KinFin | [60] | v0.8.2 | | https://github.com/DRL/kinfin
OrthoFinder | [59] | v1.1.2 | Using the following inflation values: 1.1, 1.5, 2.0, 2.5, 3.5, 4.0, 4.5, 5.0 | https://github.com/davidemms/OrthoFinder

Others
EasyImport | [119] | | |
G-language Genome Analysis Environment | [120, 121] | v.1.9.1 | |
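As a rough illustration of how the Kallisto parameters listed in Supplementary Table S12 translate into command lines, the following Python sketch assembles (but does not run) the argument lists for `kallisto quant`. The index path, output directory and FASTQ file names are hypothetical placeholders. The `-i` and `-o` options are kallisto's usual index and output arguments and are not listed in the table, and the paired-end bootstrap option is assumed here to be `-b 100`, matching the single-end row.

```python
from typing import List


def kallisto_quant_cmd(index: str, out_dir: str, reads: List[str],
                       single_end: bool) -> List[str]:
    """Build a `kallisto quant` argument list; nothing is executed here."""
    # -i / -o are required by kallisto but are not part of Table S12;
    # the paths passed in are placeholders chosen by the caller.
    cmd = ["kallisto", "quant", "-i", index, "-o", out_dir,
           "--bias", "-b", "100"]
    if single_end:
        if not reads:
            raise ValueError("single-end mode needs at least one FASTQ file")
        # Single-end runs also need a fragment length mean (-l) and SD (-s).
        cmd += ["--single", "-l", "200", "-s", "50"]
    elif len(reads) == 0 or len(reads) % 2 != 0:
        raise ValueError("paired-end mode needs FASTQ files in pairs")
    return cmd + list(reads)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(" ".join(kallisto_quant_cmd("hd.idx", "out/act-1",
                                      ["act-1.fq.gz"], single_end=True)))
    print(" ".join(kallisto_quant_cmd("hd.idx", "out/Active_1",
                                      ["a.r1.fq", "a.r2.fq"], single_end=False)))
```

Returning a list rather than a shell string keeps the sketch easy to pass to `subprocess.run` without quoting issues, should one wish to execute it.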

Supplementary Figures

Supplementary Figure S1. DNA-Seq coverage of the H. dujardini genome

[Figure: Qualimap report panel "Coverage across reference"]

Single-individual DNA sequencing data (DRR055040) were mapped with BWA MEM, and the genomic coverage was calculated with Qualimap bamqc.
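Coverage summaries of the kind reported as mean ± SD in Supplementary Table S2 come down to simple statistics over per-base depth. The sketch below is not the Qualimap implementation; it only illustrates the arithmetic, assuming three-column, tab-separated input (scaffold, position, depth) such as that written by `samtools depth -a`. The use of the population standard deviation is an assumption, and the example records are made up.

```python
import io
import statistics
from typing import Iterable, Tuple


def coverage_mean_sd(lines: Iterable[str]) -> Tuple[float, float]:
    """Return (mean, population SD) of the depth column."""
    depths = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        _scaffold, _position, depth = line.split("\t")
        depths.append(int(depth))
    if not depths:
        raise ValueError("no depth records found")
    # pstdev = population SD; swap in statistics.stdev for the sample SD.
    return statistics.fmean(depths), statistics.pstdev(depths)


# Made-up records standing in for a real depth file.
example = io.StringIO(
    "scaffold0001\t1\t10\n"
    "scaffold0001\t2\t20\n"
    "scaffold0001\t3\t30\n"
)
mean, sd = coverage_mean_sd(example)
print(f"{mean:.1f} ± {sd:.1f}")
```

For a real genome the depth column can be streamed line by line from a file handle instead of the in-memory example used here.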

Supplementary Figure S2. tRNA content of the two tardigrades

tRNA loci were predicted with tRNAscan-SE and the counts were summarized. Each codon is colored by the number of tRNA loci, ranging from 0 to 8.

Supplementary Figure S3. Length of genetic features

Comparisons of gene structures in 4728 single-copy orthologues between H. dujardini and R. varieornatus. Outliers are defined as genes in H. dujardini which have CDS lengths 20% longer (long outliers; orange; 576 genes) or 20% shorter (short outliers; black; 294 genes) than their orthologues in R. varieornatus. A positive log2 ratio indicates a trend towards a larger count or span in H. dujardini.

Supplementary Figure S4. Clustered HGT loci in H. dujardini and R. varieornatus

HGT candidates are clustered in the genomes of H. dujardini and R. varieornatus. The number of genes between each HGT is shown.

Supplementary Figure S5. Phylogeny of protection related proteins

[Figure: phylogenetic tree with clades labelled CAHS3, CAHS1/2, SAHS2, SAHS1 and MAHS]

All amino acid sequences of CAHS, SAHS, MAHS, RvLEAM and Dsup genes in H. dujardini and R. varieornatus were aligned with ClustalW2, and a phylogenetic tree was constructed with FastTree with 1,000 bootstraps. Each clade was annotated with the corresponding R. varieornatus gene registered in UniProt (CAHS1:g673, CAHS2:g675, CAHS3:g884, SAHS1:g1671, SAHS2:g1676, MAHS:g6834, RvLEAM:g2978, Dsup:g4591).

Supplementary Figure S6. HGT obtained pathways in two tardigrades

[Figure, panel A: Ascorbate synthesis pathway. Metabolites shown: myo-Inositol, UDP-D-glucose, UDP-D-glucuronate, D-Glucuronate, L-Gulonate, L-Gulono-1,4-lactone, L-xylo-Hexulono-lactone, L-Ascorbate, L-Dehydroascorbate. Enzymes shown: inositol oxygenase, glucuronosyltransferase, UDPglucose 6-dehydrogenase, glucuronate reductase (alcohol dehydrogenase), gluconolactonase, L-gulonolactone oxidase, L-ascorbate oxidase. Legend: Metabolite; non-HGT; HGT; Conserved in R. varieornatus.]

[Figure, panel B: Trehalose synthesis via TPS. Shown for both R. varieornatus and H. dujardini: UDP-glucose, Trehalose-6P, Trehalose, D-Glucose, with enzymes TPS, TPS and treA.]

Trehalose and ascorbate synthesis pathways were reconstructed with KAAS and KEGG mapper.

Right) Trehalose phosphate synthase/phosphatase (TPS) is lost in H. dujardini, whereas R. varieornatus has a horizontally transferred TPS.