
Sharing of Agrogenomics data

Hendrik-Jan Megens

Animal Breeding & Genomics Centre, Wageningen University

Agrogenomics

“cheap genomic data might have a bigger impact on agriculture than human health in the long term”

Ewan Birney, Plant and Animal Genome XXII, 13/01/2014, San Diego, USA

§ Many different species
§ Non-model organisms but many resources available
§ University/Institutional research as well as corporate (breeding companies)

Species specific communities, resources

Genomes:                      MD/HD SNP assays:
● Chicken (2004)              60K, 660K
● Cattle (2009)               60K, 770K
● Turkey (2010)               no
● Atlantic salmon (2010)      800K, lower plex (custom)
● Pig (2012)                  60K
● Duck (2013)                 no
● Sheep (subm.)               60K
● Goat (in prep)              60K

Impact of genomics in animal breeding

§ Genomic selection
  ● Genomic estimated breeding values
§ Dairy cattle
  ● Long generation interval
  ● Bulls → phenotype from daughters
  ● Shortening selection interval by years
§ Layer chickens
  ● Evaluation of phenotypes > 2yrs (e.g. ‘duration of laying period’)
§ Turkey
  ● No added value (growth, growth, growth…)

Different communities, common platforms
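As an illustration of the "genomic estimated breeding values" item on the genomic selection slide: a GEBV can be computed as the sum, over SNPs, of an animal's allele dosage (0, 1 or 2) multiplied by the estimated effect of that SNP. The sketch below assumes the marker effects have already been estimated elsewhere; the animal IDs, dosages and effect sizes are hypothetical values chosen for illustration, not data from the talk.

```python
from typing import Dict, List


def gebv(dosages: List[int], effects: List[float]) -> float:
    """Sum of allele dosage (0, 1 or 2) times the estimated SNP effect."""
    if len(dosages) != len(effects):
        raise ValueError("one effect per SNP is required")
    return sum(d * e for d, e in zip(dosages, effects))


def rank_candidates(genotypes: Dict[str, List[int]],
                    effects: List[float]) -> List[str]:
    """Return animal IDs ordered from highest to lowest GEBV."""
    scores = {animal: gebv(d, effects) for animal, d in genotypes.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical marker effects for four SNPs (illustration only).
effects = [0.5, -0.25, 1.0, 0.0]

# Hypothetical allele dosages for three selection candidates.
genotypes = {
    "bull_A": [2, 0, 1, 1],
    "bull_B": [0, 2, 0, 2],
    "bull_C": [1, 1, 2, 0],
}

ranking = rank_candidates(genotypes, effects)
print(ranking)
```

Because the score needs only a genotype, candidates can be ranked at birth instead of waiting for daughter phenotypes, which is where the shortening of the selection interval in dairy cattle comes from.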

§ SNP assays have paved the way for data integration
  ● Illumina Infinium (2007 – 2009)
  ● “HapMap” consortia
    ● Cattle, Pig, Sheep, Goat
    ● Not for poultry
§ First truly genome wide and worldwide assessment of variation, selective sweeps, etc.
§ WGS is the next step: speciation, domestication and selection

[Figure: timeline of pig evolution covering speciation, domestication and selection, with time points at ~50 Mya, ~20 Mya, ~10 Mya, ~4 Mya, 1 Mya, 10 Kya and 200 ya. Labels include: Peccary (South America), Babyrousa (Sulawesi), Red river hog, African, Bearded pig (Borneo), Visaya Warty pig, Javan Warty pig, Celebes Warty pig, Wild boar (Europe, Asia), European domestic, Asian domestic.]

Demographic history of wild boars (PSMC)

[Figure: PSMC plot of effective population size (×10⁴) against years (g = 5, μ = 2.5×10⁻⁸) for wild boar samples WBnl, WBit, WBNch and WBSch. Annotations: population expansion following colonisation of Eurasia; population decline during LGM.]

Groenen et al., 2012, Nature 491:393

Transition to modern pig breeds


• Asian variation has been important in shaping modern pig breeds.
• But evidently selection has been the major driver – selection on Asian (e.g. IGF2), and European variation?

Pigs got longer in the 19th Century

Darwin, C. R. 1868. The variation of animals and plants under domestication. London: John Murray. First edition, first issue. Volume 1.

Three of the major signatures of selection in modern pigs are related to body length

• NR6A1
• PLAG1
• LCORL

Rubin, Megens, et al., 2012, PNAS 109:19529

~300 individuals sequenced to date (~10x)

Groups: European, Asian, Wild

§ Breeds: Meishan, Angler Sattelschwein, Casertana, Mangalica, Retinto, Leping Spotted, Gloucester Old Spot, Large Black, Negra siciliana, Tamworth, Jiangquhai, British Saddleback, Negra iberico, Chato murciano, Linderödssvin, Xiang, Berkshire, Bunte Bentheimer, Calabrese, Middle White, Wannan Spotted, Hampshire, Pietrain, Large White, Jinhua, Landrace, Duroc, Zang
§ European (>40) and Asian (11) wild boar
§ 18 individuals from 10 other Suids (Asia, Africa, South America)
§ +10 commercial cross-breds

Generic data analysis strategies

§ Farm animal genomes are often similar in size and composition to human (for at least)
§ In all farm animal species, Human 1000 Genomes practices are the accepted best practice
  ● Mapping: BWA → BAM
  ● Genotype calling: GATK (samtools) → VCF
§ Varying success, some considerations on genome assembly quality and completeness
§ Allows for adoption of data sharing infrastructure developed for Human studies

Sequence integration consortia
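The mapping and genotype-calling steps named on the "Generic data analysis strategies" slide (BWA to BAM, GATK to VCF) can be sketched as a small command builder. This is a minimal sketch under stated assumptions, not the pipeline used by the author: it assumes paired-end reads, uses GATK4-style `gatk HaplotypeCaller` syntax (which postdates the talk), and all file and sample names are hypothetical. The commands are only assembled as strings and lists here, not executed.

```python
import shlex
from typing import List


def mapping_pipeline(ref: str, fq1: str, fq2: str, sample: str,
                     threads: int = 4) -> str:
    """Shell pipeline: align paired reads with bwa mem, sort into a BAM."""
    # bwa expands the literal "\t" in the read-group string into tabs.
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA"
    bwa = ["bwa", "mem", "-t", str(threads), "-R", read_group, ref, fq1, fq2]
    sort = ["samtools", "sort", "-o", f"{sample}.sorted.bam", "-"]
    return f"{shlex.join(bwa)} | {shlex.join(sort)}"


def index_command(sample: str) -> List[str]:
    """Index the sorted BAM so that the caller can access it by region."""
    return ["samtools", "index", f"{sample}.sorted.bam"]


def calling_command(ref: str, sample: str) -> List[str]:
    """Per-sample gVCF calling with GATK4 HaplotypeCaller (assumed version)."""
    return [
        "gatk", "HaplotypeCaller",
        "-R", ref,
        "-I", f"{sample}.sorted.bam",
        "-O", f"{sample}.g.vcf.gz",
        "-ERC", "GVCF",
    ]


# Hypothetical file names, for illustration only.
print(mapping_pipeline("ref.fa", "pig1_1.fq.gz", "pig1_2.fq.gz", "pig1"))
print(shlex.join(index_command("pig1")))
print(shlex.join(calling_command("ref.fa", "pig1")))
```

The reference would also need to be prepared beforehand (`bwa index`, `samtools faidx` and a sequence dictionary). Emitting one gVCF per sample is a design choice that suits consortium settings, since samples from different groups can be jointly genotyped later without realigning reads.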

§ Cattle: 1000 Bulls project
  ● Add 15 bulls @>10x coverage to be part of consortium
  ● Data submitted as BAM
  ● Data is semi-closed
§ Pigs: ~600 sequenced (300 Wageningen, 80 Korea, >100 China)
  ● Various smaller sequencing efforts
  ● Integration: closed or open?
§ Sheep, Goat: >100 each
  ● Similar ‘loose’ consortia as in pig
§ Chicken: hundreds, very fragmented
§ Turkey: 150, subsidized by USDA; work done in Wageningen

Data ownership considerations

§ Data confidentiality often important in agricultural species
  ● No privacy/informed consent limitations
  ● Individuals are owned by somebody
§ Who owns the animals/plants?
  ● Institutions/genebanks
  ● National interests
  ● Breeding companies
§ Limitations for sharing / not sharing data
  ● Lead data generation institutes/consortia
  ● Breeding companies
  ● Journals / publishers
  ● Funding organizations

Sequence integration: how?

§ Metadata
  ● Identifiers
  ● Gene banks? Animalgenome.org? EBI?
  ● Phenotype
  ● Pedigrees
  ● Locations/checksums/etc. for fastq, BAM, (g)VCF
§ Sequence data
  ● QC
  ● Generic pipelines
  ● Storage → BAM?
§ Genotype data
  ● VCF; gVCF?
  ● Virtual genomes → imputation data
  ● Flat text? Databases?

Database platforms genotype/sequence data
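The "locations/checksums" metadata item from the "Sequence integration: how?" slide can be illustrated with a small manifest builder. This is a sketch under assumptions: the column names, the choice of MD5 and the sample identifier are hypothetical, and a real consortium would agree on its own schema. The demo writes a tiny stand-in file to a temporary directory in place of a real fastq, BAM or (g)VCF.

```python
import csv
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, List


def md5sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """MD5 of a file, read in chunks so large BAM files fit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest_rows(sample_id: str, files: List[Path]) -> List[Dict[str, str]]:
    """One metadata row per data file: identifier, location, size, checksum."""
    return [
        {
            "sample_id": sample_id,
            "file": path.name,
            "location": str(path),
            "bytes": str(path.stat().st_size),
            "md5": md5sum(path),
        }
        for path in files
    ]


def write_manifest(rows: List[Dict[str, str]], out: Path) -> None:
    """Write the rows as a tab-separated table with a header line."""
    with out.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0]), delimiter="\t")
        writer.writeheader()
        writer.writerows(rows)


# Demo on a stand-in file; "sample1" is a hypothetical identifier.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "sample1.fastq"
    demo.write_bytes(b"hello")
    rows = manifest_rows("sample1", [demo])
    write_manifest(rows, Path(tmp) / "manifest.tsv")
    manifest_text = (Path(tmp) / "manifest.tsv").read_text()

print(manifest_text)
```

A manifest of this kind lets a receiving site confirm that transferred files are complete before any pipeline is run on them, independently of where the identifiers themselves are registered.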

§ Open?
  ● E.g. Dutch Variant Database / Varda
  ● Pros:
    ● Open
    ● between communities
    ● federation
  ● Cons:
    ● Some features lacking
    ● Development?
§ Commercial?
  ● E.g. BC platforms
  ● Pros:
    ● Feature rich
    ● Fast & ‘easy’ implementation
    ● Support
  ● Cons:
    ● License fee
    ● Limits data sharing
    ● Development is up to company

Final considerations: across community efforts?

§ Most communities are small
  ● each too small to develop new platforms (exception: cattle)
§ Common platforms, workflows → integration between species
  ● Open platforms
§ Coordination?
  ● Gene banks
  ● Animalgenome.org
  ● Intl. animal breeding societies
  ● EBI
§ Funding!
§ New developments may force further integration
  ● Farm-animal ENCODE project

Acknowledgements

§ Wageningen University, NL
  ● Martien Groenen
  ● Richard Crooijmans
  ● Ole Madsen
  ● Bert Dibbits
  ● Mirte Bosse
  ● Yogesh Paudel
  ● Laurent Frantz
§ University of Illinois, USA
  ● Larry Schook
  ● Laurie Rund
§ CAU, Beijing, China
  ● Ning Li
§ Uppsala University, Sweden
  ● Carl-Johan Rubin
  ● Leif Andersson
§ Roslin Institute, UK
  ● Alan Archibald
§ Durham University, UK
  ● Greger Larson
§ University of California, Berkeley, USA
  ● Joshua Schraiber
  ● Montgomery Slatkin
§ IPG, Beuningen, NL
  ● Barbara Harlizius
§ Res. Centre for Biol. – LIPI, Indonesia
  ● Gono Semiadi
§ International Swine Genome Sequencing Consortium
§ The Porcine HapMap Consortium

Funding