
Application of genome assembly to

Tim Smith, U.S. Meat Animal Research Center, Clay Center, Nebraska

PacBio User Group Meeting, St. Louis, MO, September 20, 2018

Agricultural Research Service

USDA is an equal opportunity provider and employer.

U.S. Meat Animal Research Center, Clay Center, Nebraska

Map of the center, with the science complex marked

7,800 breeding cows/heifers; 800 swine litters/year; 2,000 breeding ewes

The Bovinae family tree (branch lengths are not proportional to time; from Hernandez-Fernandez and Vrba, 2005)

Nilgai (India)
Four-horned antelope, or chousingha (India)
Lesser kudu (Ethiopia)
Nyala (South Africa)
Bushbuck (Senegal)
Sitatunga (Tanzania)
Bongo (West Africa)
Greater kudu (South Africa)
Mountain nyala (Ethiopia)
Giant eland (Gambia)
Common eland (South Africa)
Saola (Laos)
African buffalo
Domestic water buffalo, Bubalus bubalis
Tamaraw (Mindoro Island, Philippines)
Lowland anoa (Indonesia)
Mountain anoa (Philippines)
Gaur (Bangladesh)
Banteng (Indonesia, Java)
Kouprey (Cambodia)
Progenitor of Bos taurus
Yak (Boreal Asia; also Bos grunniens)
American bison (North America)
Wisent (Poland)

Aurochs: depiction from ca. 14,000 BC; drawing from ca. 1885

Selective breeding has substantially changed bovid species

“The ideal animal”: South Devon, circa 1835, and a modern South Devon

The idea of creating specialized breeds through selective matings began in the 1700s.

The “ideal” Durham, ca. 1819, and a modern Durham (Shorthorn) bull

The first principle of genetics was known as early as the mid-17th century: hindquarters are inherited from the cow, forequarters from the bull.

Bos indicus (Bos taurus indicus); Bos taurus (taurine; Continental or British)

Even within a breed, substantial variation affecting traits exists.

Average milk yield per cow: ca. 1900, 1,800 kg/yr; ca. 2000, 8,500 kg/yr

“Breeds of …” chromolithograph, ca. 1879

Are these large phenotypic differences a result of accumulated SNP variation by selection?

Whole-genome comparisons between breeds are needed to answer this question: mapping short reads to a single reference may miss significant differences.

Genetic selection accounts for about one-third of the increase in production.

• Amount of beef produced
• U.S. cattle herd size
(USDA, NASS, 2010)

Reference-quality cattle genomes

Angus, Hereford, Jersey, Brahman, Nelore, Holstein

Long-read Hereford reference assembly (L1 Dominette 01449; RSII, P6/C4; 75x coverage; 0.21% heterozygous)
Total sequence length: 2.71 Gb
Total assembly gap length: 28,162
Number of scaffolds: 2,211
Scaffold N50: 103.31 Mb
Scaffold L50: 12
Number of contigs: 2,597
Contig N50: 25.90 Mb
Contig L50: 32
Total number of chromosomes and plasmids: 31

Long-read Hereford versus Nelore short-read reference assembly
Nelore short-read reference assembly:
Total sequence length: 2.67 Gb
Total assembly gap length: 198.13 Mb
Number of scaffolds: 32
Scaffold N50: 106.31 Mb
Scaffold L50: 11
Number of contigs: 253,770
Contig N50: 28 kb
Contig L50: 25,227
Total number of chromosomes and plasmids: 32

Futuro

Long-read Jersey assembly (Sequel, v2.0; 63x coverage; 0.57% heterozygous), compared with the long-read Hereford reference assembly:
Total sequence length: 2.77 Gb
Total assembly gap length: NA
Number of scaffolds: NA
Scaffold N50: NA
Scaffold L50: NA
Number of contigs: 1,831
Contig N50: 46.75 Mb
Contig L50: 32
Total number of chromosomes and plasmids: 31
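The contig and scaffold N50/L50 values above summarize contiguity. The following is a minimal Python sketch of the conventional definition, assuming the input is simply a list of contig (or scaffold) lengths in base pairs. The function name `n50_l50` is hypothetical and is not taken from any assembler or from the statistics pipeline behind these slides.

```python
def n50_l50(contig_lengths):
    """Return (N50, L50) for a list of contig lengths in bp.

    N50 is the length of the shortest contig in the smallest set of
    longest contigs whose combined length is at least half of the total
    assembly length; L50 is the number of contigs in that set.
    """
    # Walk from the longest contig down, accumulating length.
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    running = 0
    for count, length in enumerate(lengths, start=1):
        running += length
        # Compare 2 * running to total to avoid float division.
        if running * 2 >= total:
            return length, count
    raise ValueError("contig_lengths must contain at least one length")
```

For example, `n50_l50([100, 80, 60, 40, 20])` returns `(80, 2)`: the two longest contigs (180 of 300 bp) are the first to reach half of the total. Applied to scaffold lengths instead of contig lengths, the same function gives scaffold N50 and L50.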

Cattle: Bos taurus taurus and Bos taurus indicus

Aurochs

Angus, Bos taurus taurus: domesticated ≈ 11,000 years ago
Brahman, Bos taurus indicus: domesticated ≈ 9,000 years ago

F1 Angus x Brahman

TrioCanu for F1 Angus x Brahman (Sequel, v2.0; ≈135x coverage; 0.9% heterozygous)

Angus haplotype (66.9x)
Total sequence length: 2.57 Gb
Number of haplotigs: 1,747
Haplotig N50: 26.65 Mb

Brahman haplotype (67.3x)
Total sequence length: 2.68 Gb
Number of haplotigs: 1,585
Haplotig N50: 23.26 Mb
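TrioCanu follows the trio-binning idea. Short reads from each parent identify k-mers found in only one parental haplotype, and the F1's long reads are sorted into a maternal or paternal bin before each bin is assembled separately. Below is a toy Python sketch of that read-classification step. It assumes reads are plain strings, and it ignores reverse complements and the k-mer count thresholds that a real implementation needs to filter out sequencing errors. All function names are hypothetical and are not TrioCanu's API.

```python
from collections import Counter


def kmers(seq, k):
    """Yield all overlapping k-mers of seq (forward strand only, for simplicity)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def parent_specific_kmers(maternal_reads, paternal_reads, k):
    """Return (maternal-only, paternal-only) k-mer sets from parental short reads."""
    mat = {km for read in maternal_reads for km in kmers(read, k)}
    pat = {km for read in paternal_reads for km in kmers(read, k)}
    return mat - pat, pat - mat


def classify_read(read, mat_only, pat_only, k):
    """Assign a child long read to 'maternal', 'paternal', or 'unknown'."""
    hits = Counter()
    for km in kmers(read, k):
        if km in mat_only:
            hits["maternal"] += 1
        elif km in pat_only:
            hits["paternal"] += 1
    # Normalize by marker-set size so a parent with more markers is not favored.
    mat_score = hits["maternal"] / max(len(mat_only), 1)
    pat_score = hits["paternal"] / max(len(pat_only), 1)
    if mat_score > pat_score:
        return "maternal"
    if pat_score > mat_score:
        return "paternal"
    return "unknown"  # no markers, or a tie
```

The sketch also suggests why higher heterozygosity helps. The more the parents differ, the more parent-specific k-mers each long read carries, so fewer reads end up as "unknown".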

Scaffolding ongoing: haplotype-resolved Hi-C (Phase Genomics and Arima Genomics) and haplotype-resolved optical maps (Bionano)

X-chromosome PAR: single contig
Y-chromosome PAR: single scaffold (4 contigs)

Other domesticated and wild Bovinae

Banteng, Bos javanicus
Water buffalo, Bubalus bubalis
Bison bison bison
Gaur, Bos gaurus
Yak, Bos grunniens (Bos mutus)
Cape buffalo, Syncerus caffer

Riverine water buffalo (Olimpia; Sequel, v2.0; 69x coverage)
Total sequence length: 2.66 Gb
Total assembly gap length: 373,500
Number of scaffolds: 509
Scaffold N50: 117.22 Mb
Scaffold L50: 9
Number of contigs: 919
Contig N50: 22.44 Mb
Contig L50: 36
Total number of chromosomes and plasmids: 26

Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo.

Gaur genome

Omaha Zoo; blood collected in 2001
Sequel, v2.0; 53x coverage
Total sequence length: 2,700,417,543 bp
Number of contigs: 2,868
Contig N50: 13,257,066 bp
Contig L50: 64
Total number of chromosomes and plasmids: 28

Interspecies crosses maximize the contrast between parental genome contributions.

“Duke”, Scottish Highland, and “Molly”, Imperial yak

“Esperanza”, Yaklander cattle/yak F1 (heterozygosity = 1.3%)

Yaklander genome assembly
Total sequence length: 2.67 Gb
Number of contigs: 625
Contig NG50: 69.88 Mb
Contig LG50: 15
Total number of chromosomes and plasmids: 30

Scaffolding? We don’t need no stinkin’ scaffolding!

The NG90 is 95.3% (dam) and 95.5% (sire) in 58 and 55 haplotigs, respectively.
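The Yaklander statistics use NG50/LG50 (and NG90) rather than N50/L50. The NG variants measure the threshold against an estimated genome size instead of the assembly's own total length, so assemblies of different total size can be compared on the same footing. Below is a minimal Python sketch, assuming the caller supplies the genome-size estimate in bp. The name `ngx_lgx` is hypothetical.

```python
def ngx_lgx(contig_lengths, genome_size, x=50):
    """Return (NGx, LGx) for contig lengths and an estimated genome size, both in bp.

    NGx is the length of the shortest contig in the smallest set of longest
    contigs whose combined length reaches x% of genome_size; LGx is the number
    of contigs in that set. Returns (None, None) if all contigs together cover
    less than x% of the genome estimate.
    """
    if not 0 < x <= 100:
        raise ValueError("x must be in (0, 100]")
    threshold = genome_size * x / 100
    running = 0
    for count, length in enumerate(sorted(contig_lengths, reverse=True), start=1):
        running += length
        if running >= threshold:
            return length, count
    return None, None
```

With contigs of 100, 80, 60, 40, and 20 bp and a 400 bp genome estimate, `ngx_lgx(lengths, 400)` returns `(60, 3)`. `ngx_lgx(lengths, 400, x=90)` returns `(None, None)` because the 300 bp assembly never reaches 360 bp. When the genome estimate exceeds the assembly length, NG50 can therefore be smaller than N50.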

SALSA (default settings) with Hi-C read pairs
Total sequence length: 2.66 Gb
Number of scaffolds: 527
Scaffold NG50: 86.25 Mb
Scaffold LG50: 12

Yaklander interspecies F1 has best assembly EVAH!

Figure: Scottish Highland (paternal) and yak (maternal) genomes aligned to the cattle reference (courtesy: Sergey Koren)

Important note: these are the initial haplotigs, with no scaffolding, gap-filling, etc., just alignment to the cattle reference.

Record haplotig/contig N50: >70 Mb
Record longest haplotig/contig: 155 Mb (compared with the previous record of 143 Mb, for human)
Except for the X chromosome, as good as the current human assembly

Figure: human genome (GRCh38) versus yak (maternal) genome

THERE CAN BE ONLY ONE (haplotig per chromosome)

F1 bison x Simmental

American Simmental Association, Wade Shafer, Fred Schuetze, Brad Stroud

Ben Rosen, Juan Medrano, Derek Bickhart, Bob Schnabel, Sergey Koren, Richard Hall

Ben Rosen, Christine Couldrey

Sergey Koren, Arang Rhie, Adam Phillippy, Wai-Yee Low, John Williams, Stefan Hiendleder, Derek Bickhart, Ben Rosen, Rick Tearle, Sarah Kingan

Mike Heaton, Peter Hackett, Tim Hardy, Jessica Petersen, Ed Rice, Sergey Koren

Mention of trade names or commercial products in this presentation is solely for the purpose of providing specific information and does not imply recommendation or endorsement by the U.S. Department of Agriculture.