Comparative Genomics of the Tardigrades Hypsibius dujardini and Ramazzottius varieornatus
Supplementary data for manuscript
Comparative genomics of the tardigrades Hypsibius dujardini and Ramazzottius varieornatus

Yuki Yoshida1,2*, Georgios Koutsovoulos3*¶, Dominik R. Laetsch3,4, Lewis Stevens3, Sujai Kumar3, Daiki D. Horikawa1,2, Kyoko Ishino1, Shiori Komine1, Takekazu Kunieda5, Masaru Tomita1,2, Mark Blaxter3, Kazuharu Arakawa1,2

1 Institute for Advanced Biosciences, Keio University, Kakuganji 246-2, Mizukami, Tsuruoka City, Yamagata, Japan
2 Systems Biology Program, Graduate School of Media and Governance, Keio University, 5322, Endo, Fujisawa City, Kanagawa, Japan
3 Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, EH9 4JT, UK
4 The James Hutton Institute, Dundee DD2 5DA, United Kingdom
5 Department of Biological Sciences, Graduate School of Science, University of Tokyo, Hongo 7-3-1, Bunkyo-ku, Tokyo, Japan

* Joint first authors
¶ Current addresses: GK: [email protected]

Addresses for correspondence:
Kazuharu Arakawa [email protected]
Mark Blaxter [email protected]

Supplementary Information

Tables
Supplementary Table S1. Data used in this study
Supplementary Table S2. Mapping statistics of various DNA-Seq data
Supplementary Table S3. Repeat content in two tardigrades
Supplementary Table S4. Telomeric regions in the H. dujardini genome
Supplementary Table S5. Mapping ratio for RNA-Seq data
Supplementary Table S6. Mapping ratio of Trinity assembled transcriptome
Supplementary Table S7. Proteomes used for Orthologue clustering
Supplementary Table S8. HGT content calculation of ecdysozoa
Supplementary Table S9. Location of tardigrade specific protection related proteins
Supplementary Table S10. Functional annotations of synapomorphies
Supplementary Table S11. Transcriptome data used for phylogenomics.
Supplementary Table S12. Software used in this study and their options.

Figures
Supplementary Figure S1. DNA-Seq coverage of the H. dujardini genome
Supplementary Figure S2. tRNA content of the two tardigrades
Supplementary Figure S3. Length of genetic features
Supplementary Figure S4. Clustered HGT loci in H. dujardini and R. varieornatus
Supplementary Figure S5. Phylogeny of protection related proteins
Supplementary Figure S6. HGT obtained pathways in two tardigrades

Data files
1. miRNA_data
2. hgt_trees
3. HGT_cluster_matrix
4. DEG_list
5. Tardigrade_DEGs.fuctional_annotation
6. Orthofinder.clustering
7. KinFin_input
8. Tardigrade_counts_representation_tests
9. 463_putative HGTs
10. proteome_fastas

Supplementary Tables
Supplementary Table S1.
Data used in this study

(a) DNA-sequencing

Origin               Keio            Keio            UNC             UNC             UNC
Accession ID         SRX2495681      DRR055040       SRX1474871      SRX1474929      SRX1474950
Platform             PacBio          MiSeq           HiSeq 2000      HiSeq 2000      HiSeq 2000
# Reads              779,905         51,607,261      87,744,967      64,049,630      45,733,238
# bp                 5,877,440,568   NA              NA              NA              NA
Read length          NA              300bp paired    100bp paired    100bp paired    100bp paired
Maximum length (bp)  49,455          NA              NA              NA              NA
N50 length (bp)      10,657          NA              NA              NA              NA
Average length (bp)  7,536           NA              NA              NA              NA
Insert length (bp)   NA              477.0953        347.566         496.9285        749.2743

We generated new sequencing data using PacBio SMRT technology. In addition, we used sequence data from Boothby et al. (2015) for assembly, and single-individual sequencing data from our previous report (Arakawa 2016).

(b) Hypsibius dujardini RNA-Seq

Blank cells repeat the value above them.

#Individuals  Platform    Sample                    #Replicate  Sample ID        #Reads      Accession
10,000        HiSeq2000   Active                    1           Active_1         25,172,359  SRX2528369
                                                    2           Active_2         26,497,216  SRX2528370
                                                    3           Active_3         28,141,582  SRX2528371
                          Tun                       1           Tun_1            25,782,478  SRX2528372
                                                    2           Tun_2            27,832,551  SRX2528373
                                                    3           Tun_3            27,001,002  SRX2528374
30            NextSeq500  Active                    1           act-1            11,399,144  SRX2528375
                                                    2           act-2            10,744,670  SRX2528376
                                                    3           act-3            10,939,323  SRX2528377
                          Tun                       1           tun-1            10,325,677  SRX2528378
                                                    2           tun-2            10,689,489  SRX2528379
                                                    3           tun-3            10,455,913  SRX2528380
                          Egg 1st day after laying  1           H-E1-1           8,822,054   SRX2528333
                                                    2           H-E1-2           10,286,604  SRX2528334
                                                    3           H-E1-3           8,319,242   SRX2528335
                          Egg 2nd day after laying  1           H-E2-1           11,794,526  SRX2528336
                                                    2           H-E2-2           11,086,054  SRX2528337
                                                    3           H-E2-3           10,151,210  SRX2528338
                          Egg 3rd day after laying  1           H-E3-1           10,057,550  SRX2528339
                                                    2           H-E3-2           9,253,951   SRX2528340
                                                    3           H-E3-3           11,871,780  SRX2528341
                          Egg 4th day after laying  1           H-E4-1           11,622,113  SRX2528342
                                                    2           H-E4-2           12,386,383  SRX2528343
                                                    3           H-E4-3           9,654,776   SRX2528344
                          Egg 5th day after laying  1           H-E5-1           11,921,100  SRX2528345
                                                    2           H-E5-2           11,569,382  SRX2528346
                                                    3           H-E5-3           10,503,387  SRX2528347
                          Juvenile 1st day          1           H-B1-1           12,440,551  SRX2528348
                                                    2           H-B1-2           12,306,138  SRX2528349
                                                    3           H-B1-3           12,734,126  SRX2528350
                          Juvenile 2nd day          1           H-B2-1           13,107,156  SRX2528351
                                                    2           H-B2-2           14,437,609  SRX2528352
                                                    3           H-B2-3           13,870,809  SRX2528353
                          Juvenile 3rd day          1           H-B3-1           8,360,076   SRX2528354
                                                    2           H-B3-2           6,542,790   SRX2528355
                                                    3           H-B3-3           9,775,113   SRX2528356
                          Juvenile 4th day          1           H-B4-1           9,824,335   SRX2528357
                                                    2           H-B4-2           16,666,875  SRX2528358
                                                    3           H-B4-3           15,995,271  SRX2528359
                          Juvenile 5th day          1           H-B5-1           6,928,823   SRX2528360
                                                    2           H-B5-2           8,857,975   SRX2528361
                                                    3           H-B5-3           12,901,947  SRX2528362
                          Juvenile 6th day          1           H-B6-1           9,843,726   SRX2528363
                                                    2           H-B6-2           12,913,346  SRX2528364
                                                    3           H-B6-3           11,745,564  SRX2528365
                          Juvenile 7th day          1           H-B7-1           12,384,307  SRX2528366
                                                    2           H-B7-2           9,182,107   SRX2528367
                                                    3           H-B7-3           13,626,269  SRX2528368
5,000         HiSeq2000   miRNA-Seq                 1           HD_miRNA         32,254,413  SRX2495676

(c) Ramazzottius varieornatus RNA-Seq

Blank cells repeat the value above them.

#Individuals  Platform    Sample                    #Replicate  Sample ID        #Reads      Accession
1~2.5         NextSeq500  Active-Fast               1           Y-active_slow_1  12,146,289  SRX2528399
                                                    2           Y-active_slow_2  11,076,841  SRX2528400
                                                    3           Y-active_slow_3  11,211,443  SRX2528401
                          Tun-Fast                  1           Y-tun_slow_1     11,781,529  SRX2528402
                                                    2           Y-tun_slow_2     11,966,104  SRX2528403
                                                    3           Y-tun_slow_3     12,361,848  SRX2528404
30                        Active-Slow               1           Y-active_fast_1  31,330,380  SRX2528405
                                                    2           Y-active_fast_2  35,320,831  SRX2528406
                                                    3           Y-active_fast_3  36,895,441  SRX2528407
                          Tun-Slow                  1           Y-tun_fast_1     35,469,871  SRX2528408
                                                    2           Y-tun_fast_2     38,879,671  SRX2528409
                                                    3           Y-tun_fast_3     31,835,650  SRX2528410
                          Egg 1st day after laying  1           Y-E1-1           11,688,367  SRX2528381
                                                    2           Y-E1-2           13,064,048  SRX2528382
                                                    3           Y-E1-3           13,389,666  SRX2528383
                          Egg 2nd day after laying  1           Y-E2-1           12,702,879  SRX2528384
                                                    2           Y-E2-2           14,385,811  SRX2528385
                                                    3           Y-E2-3           13,101,271  SRX2528386
                          Egg 3rd day after laying  1           Y-E3-1           14,348,899  SRX2528387
                                                    2           Y-E3-2           13,640,410  SRX2528388
                                                    3           Y-E3-3           8,817,117   SRX2528389
                          Egg 4th day after laying  1           Y-E4-1           12,606,663  SRX2528390
                                                    2           Y-E4-2           15,271,225  SRX2528391
                                                    3           Y-E4-3           12,517,722  SRX2528392
                          Egg 5th day after laying  1           Y-E5-1           12,599,958  SRX2528393
                                                    2           Y-E5-2           14,476,417  SRX2528394
                                                    3           Y-E5-3           17,324,895  SRX2528395
                          Juvenile 1st day          1           Y-B1-1           4,811,886   SRX2528396
                                                    2           Y-B1-2           6,210,798   SRX2528397
                                                    3           Y-B1-3           5,637,785   SRX2528398

Supplementary Table S2. Mapping statistics of various DNA-Seq data

Origin     Accession ID  Mapped reads     Coverage (mean ± SD)  # In/Del
UNC        SRR2986339    47.6M (54.24%)   107±234               6.85M/7.70M
UNC        SRR2986435    53.8M (83.95%)   50±97                 1.84M/2.02M
UNC        SRR2986451    39.2M (85.68%)   45±70                 2.82M/2.90M
Edinburgh  ERR1147177    116.0M (77.15%)  50±87                 3.2M/3.32M
Edinburgh  ERR1147178    99.2M (80.03%)   36±88                 2.30M/2.37M
Keio       DRR055040     50.0M (96.91%)   113±136               8.59M/8.48M

Supplementary Table S3. Repeat content in two tardigrades

Category      Term          H. dujardini  R. varieornatus
Simple        #elements     65,638        3,301
              #length (bp)  5,391,682     137,297
              %genome       5.18          0.25
Unclassified  #elements     158,522       65,730
              #length (bp)  24,232,698    11,020,138
              %genome       23.27         19.74

Supplementary Table S4. Telomeric regions in the H. dujardini genome

Scaffold      Start   End     Repeat      Length from End  Length
scaffold0088  327857  336800  TTGATGGGTT  49               8943
scaffold0114  15      7307    ATCAAAACCC  15               7292
scaffold0012  1       5955    CATCAAAACC  1                5954
scaffold0363  14      4481    ATCAAAACCC  14               4467
scaffold0321  157     4157    AAAACCCATC  157              4000
scaffold0005  52      3367    CAAAACCCAT  52               3315
scaffold0128  239844  242599  GGTTTTGATG  823              2755
scaffold0001  7702    9727    AAACCCATCA  7702             2025
scaffold0192  164482  165621  GGGTTTTGAT  49               1139
scaffold0343  22767   23510   TTTTGATGGG  22767            743
scaffold0023  54017   54373   ATCAAAACCC  54017            356
scaffold0287  57943   58286   ATGGGTTTTG  36758            343
scaffold0212  65340   65622   TTTTGATGGG  65340            282
scaffold0093  201990  202227  TTGATGGGTT  107275           237
scaffold0072  51074   51288   TGGGTTTTGA  51074            214
scaffold0070  189916  190113  CATCAAAACC  189916           197
scaffold0031  383706  383897  GATGGGTTTT  227004           191
scaffold0090  105640  105790  TCAAAACCCA  105640           150
scaffold0092  52492   52623   ATGGGTTTTG  52492            131
scaffold0005  3799    3922    ATCAAAACCC  3799             123
scaffold0036  185539  185660  ATCAAAACCC  185539           121
scaffold0036  17528   17649   ATCAAAACCC  17528            121
scaffold0070  137262  137380  AAACCCATCA  137262           118
scaffold0117  133706  133813  ATCAAAACCC  120682           107
scaffold0171  121075  121176  AAACCCATCA  66249            101
scaffold0268  65742   65843   ATCAAAACCC  36714            101

Regions close to scaffold ends were colored in yellow in the original table; they may represent telomeric ends.
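The repeat units listed in Supplementary Table S4 look different from row to row, but a tandem repeat can be read from any starting position and from either strand. The following minimal sketch, which is an illustration and not the procedure used in the study, reduces each listed 10-mer to a canonical form (the lexicographically smallest rotation of the unit or of its reverse complement). The function names are hypothetical, and the repeat units are copied from the table.

```python
# Illustrative sketch only: helper names are hypothetical, not from the study.
# Repeat units are the distinct 10-mers listed in Supplementary Table S4.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def rotations(seq: str) -> list[str]:
    """Return every cyclic rotation of seq, starting with seq itself."""
    return [seq[i:] + seq[:i] for i in range(len(seq))]


def canonical_unit(seq: str) -> str:
    """Smallest rotation of seq or of its reverse complement."""
    return min(rotations(seq) + rotations(reverse_complement(seq)))


REPEAT_UNITS = [
    "TTGATGGGTT", "ATCAAAACCC", "CATCAAAACC", "AAAACCCATC", "CAAAACCCAT",
    "GGTTTTGATG", "AAACCCATCA", "GGGTTTTGAT", "TTTTGATGGG", "ATGGGTTTTG",
    "TGGGTTTTGA", "GATGGGTTTT", "TCAAAACCCA",
]

canonical_forms = {canonical_unit(unit) for unit in REPEAT_UNITS}
print(canonical_forms)
```

Under this canonicalization, every repeat unit in the table collapses to a single 10-mer, which is consistent with all listed regions carrying the same underlying repeat read in different phases or on opposite strands.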