Resequencing of the Common Marmoset Genome Improves Genome Assemblies and Gene-Coding Sequence Analysis
Supplementary information: “Resequencing of the common marmoset genome improves genome assemblies and gene-coding sequence analysis”

Kengo Sato, Yoko Kuroki, Wakako Kumita, Asao Fujiyama, Atsushi Toyoda, Jun Kawai, Atsushi Iriki, Erika Sasaki, Hideyuki Okano, Yasubumi Sakakibara*

Supplemental Figures

Supplemental figure 1. The distribution of insert sizes in the mapped BAC-ends.

Supplemental figure 2. Histograms of the mapping rate (by length) of marmoset and human cDNAs to the improved genome sequence. cDNAs with a mapping rate below 80% were discarded.

Supplemental figure 3. Venn diagram showing how the human cDNA transcripts mapped to the improved (CIEA) genome sequence overlap with the other four annotations.

Supplemental Tables

Supplemental table 1. The number and location of the MGSAC contigs that were unmapped to chromosomes in the MGSAC draft but are mapped to a chromosome in the improved genome. The bin columns give the number of contigs by relative position on the chromosome from the 5'-end, binned by every 10% of the total chromosome length.

chr   # of contigs   0%-10%  10%-20%  20%-30%  30%-40%  40%-50%  50%-60%  60%-70%  70%-80%  80%-90%  90%-100%
chr1         12214      906     1063      897     1141     1022      850     1034     1116     2044      2141
chr2         10822     1983     1058     1169     1257      791      768      959      793      878      1166
chr3          9257      981      838      843      851      799      858      827     1075      804      1381
chr4          8957     1081     1146     1165      742      806      695      811      658      743      1110
chr5         11671     1145     1106     1193     1204     1282     1169     1249     1232     1197       894
chr6          8369     1050      993      755      676      643      730      720      783      844      1175
chr7         10135      941      848     1378     1514     1311     1043      711      652      772       965
chr8          7482      789      756      597      555      882      697      632      812      784       978
chr9          8364      802      858      671     1145      576      539      690      710     1062      1311
chr10         7513      908      758      684      671      640      607      676      739      629      1201
chr11         7836      667      801      562      554      702      937      545      584      861      1623
chr12         7766     1164      888      903      719      719      455      650      711      622       935
chr13         6050      717      583      558      750      593      557      439      591      580       682
chr14         5988      605      716      477      614      512      515      586      671      520       772
chr15         5248      636      479      770      483      425      349      636      592      450       428
chr16         5028      637      429      405      473      436      756      509      478      409       496
chr17         3562      376      337      353      300      333      346      300      312      343       562
chr18         3186      416      465      392      431      214      343      202      229      257       237
chr19         2961      338      231      292      299      316      350      323      286      259       267
chr20         3018      290      229      321      145      313      337      280      245      356       502
chr21         2781      272      214      221      148      192      251      241      316      338       588
chr22         6192      898      687      688      620      373      423      557      637      685       624
chrX         17389     1396     1414     1408     1870     2342     1752     1794     2087     1612      1714
chrY          5290      437      591       72      229      313      127     1735      473      534       779

Supplemental table 2. The list of the completely filled transcripts. The first, second, and third columns give the Ensembl transcript_id, gene_id, and gene_name, respectively.
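
The 10% position bins used in Supplemental table 1 can be illustrated with a minimal Python sketch. The function names `decile_bin` and `bin_counts` are hypothetical, and the convention that a position equal to the chromosome length falls into the last bin is an assumption made for illustration; the source does not describe how the binning was implemented.

```python
def decile_bin(position, chrom_length):
    """Return the 0-based 10% bin (0..9) of a position measured from the 5'-end.

    Assumption: positions are integers in [0, chrom_length]; a position equal
    to chrom_length is placed in the last bin (90%-100%).
    """
    if chrom_length <= 0:
        raise ValueError("chrom_length must be positive")
    if not 0 <= position <= chrom_length:
        raise ValueError("position must lie within the chromosome")
    # Integer arithmetic avoids floating-point edge cases at bin boundaries.
    return min(position * 10 // chrom_length, 9)


def bin_counts(positions, chrom_length):
    """Count how many contig positions fall into each of the ten bins."""
    counts = [0] * 10
    for position in positions:
        counts[decile_bin(position, chrom_length)] += 1
    return counts


# Consistency check on the chr1 row of Supplemental table 1:
# the ten bin counts should add up to the "# of contigs" column.
chr1_bins = [906, 1063, 897, 1141, 1022, 850, 1034, 1116, 2044, 2141]
assert sum(chr1_bins) == 12214
```

Applied to the start coordinate of each newly placed contig together with the length of its chromosome, `bin_counts` would yield one row of the table in the same column order.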