Application of Genome Assembly to Bovidae Species
Tim Smith
U.S. Meat Animal Research Center, Clay Center, Nebraska
September 20, 2018
PacBio User Group Meeting, St. Louis, MO
AGRICULTURAL RESEARCH SERVICE is an equal opportunity provider and employer

U.S. Meat Animal Research Center, Clay Center, Nebraska
• 8 mi (11 km)
• Science complex
• 7,800 breeding cows/heifers
• 800 swine litters/year
• 2,000 breeding ewes

The Bovinae Family Tree
Branch lengths are not proportional to time (from Hernandez-Fernandez and Vrba, 2005)
• Nilgai (India)
• Four-horned antelope, Chousingha (India)
• Lesser kudu (Ethiopia)
• Nyala (South Africa)
• Bushbuck (Senegal)
• Sitatunga (Tanzania)
• Bongo (West Africa)
• Greater kudu (South Africa)
• Mountain Nyala (Ethiopia)
• Giant eland (Gambia)
• Common eland (South Africa)
• Saola (Laos)
• African Buffalo
• Domestic Water Buffalo, Bubalus bubalis
• Tamaraw (Mindoro island, Philippines)
• Lowland anoa (Indonesia)
• Mountain anoa (Philippines)
• Gaur (Bangladesh)
• Banteng (Indonesia, Java)
• Kouprey (Cambodia)
• Progenitor of Bos taurus
• Yak (Boreal Asia; also Bos grunniens)
• American Bison (North America)
• Wisent (Poland)

Cave painting of Aurochs, ca. 14,000 BC
Aurochs drawing, ca. 1885

Selective breeding has substantially changed bovid species
• Modern South Devon bull
• “The ideal animal”: South Devon, circa 1835

The idea of creating specialized breeds through selective matings began in the 1700s
• The “ideal” Durham, ca. 1819
• Modern Durham (Shorthorn) bull

The first principle of genetics was known as early as the mid-17th century: hindquarters inherited from cow, forequarters from bull
• Bos indicus (Zebu, Bos taurus indicus)
• Bos taurus (Taurine, Continental or British)

Even within breed, substantial variation affecting traits exists
• ca. 1900: average milk yield/cow = 1,800 kg/yr
• ca. 2000: average milk yield/cow = 8,500 kg/yr

“Breeds of cattle” chromolithograph, ca. 1879
Are these large phenotypic differences a result of accumulated SNP variation by selection?
Comparisons of the genomes of breeds are needed to find out; mapping of short reads to a single reference may miss significant differences.

Genetic selection accounts for about 1/3 of increased beef production
• amount of beef produced
• U.S. cattle herd size
(USDA, NASS, 2010)

Reference-quality cattle genomes
• Angus
• Hereford
• Jersey
• Brahman
• Nelore
• Holstein

Long-read Hereford reference assembly, compared with the Nelore short-read reference assembly and a long-read Jersey assembly

Statistic                                  Hereford     Nelore       Jersey
Total sequence length                      2.71 Gb      2.67 Gb      2.77 Gb
Total assembly gap length                  28,162       198.13 Mb    NA
Number of scaffolds                        2,211        32           NA
Scaffold N50                               103.31 Mb    106.31 Mb    NA
Scaffold L50                               12           11           NA
Number of contigs                          2,597        253,770      1,831
Contig N50                                 25.90 Mb     28 kb        46.75 Mb
Contig L50                                 32           25,227       32
Total number of chromosomes and plasmids   31           32           31

Hereford: L1 Dominette 01449; RSII, P6/C4; 75x coverage; 0.21% heterozygous
Nelore: Futuro
Jersey: Sequel, v2.0; 63x coverage; 0.57% heterozygous

Cattle subspecies: Bos taurus taurus and Bos taurus indicus
• Aurochs
• Angus, Bos taurus taurus: domesticated ≈ 11,000 ya
• Brahman, Bos taurus indicus: domesticated ≈ 9,000 ya
• F1 Angus x Brahman: 0.9% heterozygous

TrioCanu for F1 Angus x Brahman
Sequel, v2.0; ≈135x coverage; 0.9% heterozygous

Statistic                Angus       Brahman
Total sequence length    2.57 Gb     2.68 Gb
Number of haplotigs      1,747       1,585
Haplotig N50             26.65 Mb    23.26 Mb
Coverage                 66.9x       67.3x

Scaffolding ongoing:
• Haplotype-resolved HiC (Phase Genomics and Arima Genomics)
• Haplotype-resolved optical maps (Bionano)
X-chromosome PAR: single contig
Y-chromosome PAR: single scaffold (4 contigs)

Other domesticated and wild Bovinae
• Banteng, Bos javanicus
• Water buffalo, Bubalus bubalis
• Plains Bison, Bison bison bison
• Gaur, Bos gaurus
• Yak, Bos grunniens (Bos mutus)
• Cape Buffalo, Syncerus caffer

Riverine water buffalo
Olimpia; Sequel, v2.0; 69x coverage

Total sequence length                      2.66 Gb
Total assembly gap length                  373,500
Number of scaffolds                        509
Scaffold N50                               117.22 Mb
Scaffold L50                               9
Number of contigs                          919
Contig N50                                 22.44 Mb
Contig L50                                 36
Total number of chromosomes and plasmids   26

Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo

Gaur genome
Omaha zoo, blood collection 2001; Sequel, v2.0; 53x coverage

Total sequence length                      2,700,417,543
Number of contigs                          2,868
Contig N50                                 13,257,066
Contig L50                                 64
Total number of chromosomes and plasmids   28

Interspecies crosses maximize contrast between parental genome contributions
• “Duke”, Scottish Highland
• “Molly”, Imperial Yak
• “Esperanza”, Yaklander cattle/yak F1, heterozygosity = 1.3%

Yaklander genome assembly
“Esperanza”, Yaklander cattle/yak F1, heterozygosity = 1.3%

Total sequence length                      2.67 Gb
Number of contigs                          625
Contig NG50                                69.88 Mb
Contig LG50                                15
Total number of chromosomes and plasmids   30

Scaffolding: Scaffolding? We don’t need no stinkin’ scaffolding!
The NG90 is 95.3% (dam) and 95.5% (sire) in 58 and 55 haplotigs, respectively

Salsa_default with HiC read pairs

Total sequence length                      2.66 Gb
Number of scaffolds                        527
Scaffold NG50                              86.25 Mb
Scaffold LG50                              12

Yaklander interspecies F1 has best assembly EVAH!
• Scottish Highland (paternal) genome
• Yak (maternal) genome
Courtesy: Sergey Koren
Important note: these are the initial haplotigs, with no scaffolding, gap-filling, etc.; just alignment to the cattle reference
• Record haplotig/contig N50: >70 Mb
• Record longest haplotig/contig: 155 Mb (compared to previous record for human, 143 Mb)
• Except X chromosome, as good as current human assembly
• Human genome (GRCh38)
• Yak (maternal) genome

YAK
THERE CAN BE ONLY ONE (haplotig per chromosome)

F1 bison x Simmental
American Simmental Association
Wade Shafer
Fred Schuetze
Brad Stroud

Ben Rosen, Juan Medrano, Derek Bickhart, Bob Schnabel, Sergey Koren, Richard Hall
Ben Rosen, Christine Couldrey
Sergey Koren, Arang Rhie, Adam Phillippy, Wai-Yee Low, John Williams, Stefan Hiendleder, Derek Bickhart, Ben Rosen, Rick Tearle, Sarah Kingan
Mike Heaton, Peter Hackett, Tim Hardy, Jessica Petersen, Ed Rice, Sergey Koren

Mention of trade names or commercial products in this presentation is solely for the purpose of providing specific information and does not imply recommendation or endorsement by the U.S. Department of Agriculture.
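
Note on the contiguity statistics quoted in the slides: N50 is conventionally the length of the shortest contig (or scaffold) in the smallest set of longest contigs that together cover at least half of the assembly, and L50 is the number of contigs in that set. NG50 and LG50 use half of an estimated genome size as the target instead of half of the assembly total. The following is only a minimal Python sketch of those usual definitions; the function name `nx_and_lx` and the toy lengths are illustrative assumptions and do not come from the talk or from any particular assembler.

```python
from typing import Optional, Sequence, Tuple


def nx_and_lx(lengths: Sequence[int], fraction: float = 0.5,
              genome_size: Optional[int] = None) -> Tuple[int, int]:
    """Return (Nx, Lx) for a collection of contig or scaffold lengths.

    With genome_size=None the target is a fraction of the assembly total
    (N50/L50 when fraction=0.5). With genome_size given, the target is a
    fraction of that estimate instead (NG50/LG50).
    """
    if not lengths:
        raise ValueError("no lengths given")
    total = genome_size if genome_size is not None else sum(lengths)
    target = fraction * total
    running = 0
    # Walk from the longest sequence down until the target is covered.
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if running >= target:
            return length, count
    raise ValueError("assembly is shorter than the requested fraction of genome_size")


# Hypothetical toy lengths, not data from the presentation.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
n50, l50 = nx_and_lx(toy_contigs)                      # half of 300 is 150
ng50, lg50 = nx_and_lx(toy_contigs, genome_size=400)   # half of 400 is 200
print(n50, l50)    # 70 2
print(ng50, lg50)  # 50 3
```

Because NG50 is measured against a fixed genome size, it can be compared across assemblies of the same species even when their total lengths differ, which is presumably why it is the figure reported for the Yaklander haplotigs.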