www.nature.com/scientificreports

OPEN

Rapid Discovery of De Novo Deleterious Mutations in Cattle Enhances the Value of Livestock as Model Species

Received: 13 October 2016
Accepted: 23 August 2017
Published: xx xx xxxx

E. Bourneuf1,2, P. Otz3, H. Pausch4, V. Jagannathan5, P. Michot1,6, C. Grohs1, G. Piton1,2, S. Ammermüller4, M.-C. Deloche1,6, S. Fritz1,6, H. Leclerc1,7, C. Péchoux1,8, A. Boukadiri1, C. Hozé1,6, R. Saintilan1,6, F. Créchet1,2, M. Mosca9, D. Segelke10, F. Guillaume1, S. Bouet1, A. Baur1,6, A. Vasilescu11, L. Genestout11, A. Thomas12, A. Allais-Bonnet1,6, D. Rocha1, M.-A. Colle13,14, C. Klopp15, D. Esquerré16, C. Wurmser4, K. Flisikowski17, H. Schwarzenbacher18, J. Burgstaller19, M. Brügmann20, E. Dietschi5, N. Rudolph21, M. Freick22, S. Barbey23, G. Fayolle24, C. Danchin-Burge7, L. Schibler6, B. Bed’Hom1, B. J. Hayes25,26, H. D. Daetwyler25,27, R. Fries4, D. Boichard1, D. Pin9, C. Drögemüller5 & A. Capitan1,6

In humans, the clinical and molecular characterization of sporadic syndromes is often hindered by the small number of patients and the difficulty in developing animal models for severe dominant conditions. Here we show that the availability of large data sets of whole-genome sequences, high-density SNP chip genotypes and extensive recording of phenotype offers an unprecedented opportunity to quickly dissect the genetic architecture of severe dominant conditions in livestock. We report on the identification of seven dominant de novo mutations in CHD7, COL1A1, COL2A1, COPA, and MITF and exploit the structure of cattle populations to describe their clinical consequences and map modifier loci. Moreover, we demonstrate that the emergence of recessive genetic defects can be monitored by detecting de novo deleterious mutations in the genome of bulls used for artificial insemination. These results demonstrate the attractiveness of cattle as a model species in the post-genomic era, particularly to confirm the genetic aetiology of isolated clinical case reports in humans.

1GABI, INRA, AgroParisTech, Université Paris-Saclay, 78350, Jouy-en-Josas, France.
2LREG, CEA, Université Paris-Saclay, 78350, Jouy-en-Josas, France.
3Veterinary School of Lyon (VetAgro Sup), Cattle Pathology Unit, Marcy l’Etoile, France.
4Chair of Animal Breeding, Technische Universitaet Muenchen, Freising-Weihenstephan, Germany.
5Institute of Genetics, Vetsuisse Faculty, University of Bern, Bremgartenstrasse 109a, 3001, Bern, Switzerland.
6ALLICE, Paris, France.
7Institut de l’Elevage, Paris, France.
8INRA, MIMA2 MET – Equipe Plateformes, Jouy-en-Josas, France.
9Université de Lyon, Veterinary School of Lyon (VetAgro Sup), UPSP 2011-03-101 Interactions Cellules Environnement, Marcy l’Etoile, France.
10Vereinigte Informationssysteme Tierhaltung w.V., Verden, Germany.
11LABOGENA, Jouy-en-Josas, France.
12Veterinary School of Lyon (VetAgro Sup), Imaging Department, Marcy l’Etoile, France.
13INRA, UMR703 Animal Pathophysiology and Biotherapy for Muscle and Nervous system Diseases, Atlantic Gene Therapies, Nantes, France.
14LUNAM University, Oniris, Nantes-Atlantic National College of Veterinary Medicine, Food science and Engineering, Nantes, France.
15INRA, Sigenae Bioinformatics Group, UR875, 31320, Castanet-Tolosan, France.
16GenPhySE, Université de Toulouse, INRA, INPT, INP-ENVT, 31320, Castanet Tolosan, France.
17Chair of Animal Biotechnology, Technische Universität München, Freising-Weihenstephan, Germany.
18ZuchtData GmbH, Vienna, Austria.
19Clinic for Ruminants, University of Veterinary Medicine Vienna, Vienna, Austria.
20Lower Saxony State Office for Consumer Protection and Food Safety, Department of Pathology, Oldenburg, Germany.
21Landesuntersuchungsanstalt für das Gesundheits- und Veterinärwesen Sachsen, Bahnhofstrasse 60, 04158, Leipzig, Germany.
22Veterinary Practice Zettlitz, Strasse der Jugend 68, 09306, Zettlitz, Germany.
23INRA, UE0326, Domaine expérimental du Pin-au-Haras, Exmes, France.
24UMOTEST, Ceyzeriat, France.
25AgriBio, Centre for AgriBioscience, Biosciences Research, DEDJTR, Bundoora, Australia.
26Queensland Alliance for Agriculture and Food Innovation, Centre for Animal Science, The University of Queensland, Brisbane, Australia.
27School of Applied Systems Biology, La Trobe University, Bundoora, Australia.
E. Bourneuf and P. Otz contributed equally to this work. Correspondence and requests for materials should be addressed to A.C. (email: aurelien. [email protected])

SCIENTIFIC REPORTS | 7: 11466 | DOI:10.1038/s41598-017-11523-3

For centuries humans have taken advantage of their apparent similarities with animals to gain insights into their own anatomy, physiology, and diseases. In modern biology, research mainly focuses on a very limited number of species (i.e. laboratory animals) selected for their high reproductive rate, small generation interval and ease of manipulation1. The study of laboratory animals has dramatically improved our understanding of the molecular basis of a number of phenotypes, and in particular human genetic disorders, with the characterization of thousands of genes/mutations in the past 30 years. Nevertheless, the use of laboratory animals has limitations when it comes to the study of certain dominant conditions. For example, rodents, which are the most commonly studied mammals, have undergone a rapid evolution and display an unusual tolerance to haploinsufficiency of transcription factor genes2. The absence of genetic variability in inbred strains requires several generations of crossing with other strains to map spontaneous de novo mutations.
Finally, the most harmful dominant mutations cannot be studied since (i) they result in the death of the carriers before they reach sexual maturity and (ii) the prolificacy of laboratory animals is often not high enough to produce enough affected individuals among the progeny of mosaic parents.

Livestock such as cattle have the potential to be relevant models for these dominant conditions. Cattle have higher effective prolificacy (i.e. through the use of artificial insemination (AI), bulls can have thousands or tens of thousands of offspring), greater genetic diversity, and greater similarity with humans in terms of physiology, body size and longevity. In the past, the costs per animal and lack of genomic tools limited the value of cattle as a model species. Recently there has been a rapid accumulation of whole genome sequences (WGS) of bulls, industry scale SNP chip genotyping data for routine genomic evaluations (with millions of animals genotyped), and comprehensive phenotyping in large well-recorded pedigrees (through industry performance recording). This provides a unique opportunity to speed up the identification of mutations associated with Mendelian diseases in cattle, and makes cattle an attractive model species3,4. The aim of this study was (i) to identify deleterious de novo mutations using WGS data as a primary source of information and (ii) to draw the attention of the scientific community to the opportunities offered by the bovine animal model.

Results and Discussion

Rapid discovery of de novo deleterious mutations using WGS data. We collected samples from affected animals and their healthy relatives for seven autosomal dominant conditions in four cattle breeds: glass-eyed albino (GEA) and dominant red (DR) in Holstein cattle, neurocristopathy (NC) in Montbéliarde cattle, osteogenesis imperfecta type 2 (OI) in Fleckvieh cattle, and bulldog calf syndrome (BD1, BD2 and BD3) in a Charolais × Salers cross and in two Holstein pedigrees (Fig.
1; for information on pedigrees see Supplementary Note 1). We sequenced the genome of one animal per condition at 9 to 20.7× coverage on Illumina HiSeq instruments using paired-end reads (see Methods). We called variants with SAMtools5 and filtered them to retain heterozygous polymorphisms that were absent from 1,230 control genomes4 (Supplementary Table 1) and were predicted to be deleterious to protein function according to Ensembl Variant Effect Predictor annotations6. For each of the seven conditions studied, the large number of control genomes enabled us to decrease the number of candidate causal polymorphisms from millions to only one mutation (Table 1, Supplementary Table 2, and Supplementary Fig. 1).

Confirmation of the de novo nature and validation of the causality of the candidate mutations. Several analyses were conducted to confirm the causality of the candidate mutations and the reliability of our approach. For GEA, DR and NC, we took advantage of the large pedigrees of livestock populations and mapped the conditions to 1.6-, 4.6- and 0.7-Mb intervals, respectively, using case-control association testing with Illumina BovineSNP50 genotype data (see Methods, Supplementary Table 3). We mined the WGS data of one affected animal per condition and identified only 3, 4 and 2 small private mutations that were compatible with dominant inheritance and located within the GEA, DR and NC associated intervals, respectively. In addition, we performed structural variant detection in the sequence data of case and control animals with Pindel7, DELLY8, and the Integrative Genomics Viewer (IGV)9, and we did not detect any additional candidate mutation within these intervals.
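
As a rough illustration of the filtering logic described above (heterozygous in the sequenced case, absent from the control genomes, predicted deleterious, and optionally restricted to a mapped interval), a minimal Python sketch follows. It is not the authors' pipeline: the Variant record, the DELETERIOUS consequence set, the candidate_variants function and the toy coordinates are all hypothetical names and values introduced here for illustration, and a real analysis would read genotypes and annotations from VCF files.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

# Assumption: an illustrative set of consequence labels treated as deleterious.
# The study's actual criteria come from its Methods, not from this list.
DELETERIOUS = {
    "missense_variant",
    "stop_gained",
    "frameshift_variant",
    "splice_donor_variant",
    "splice_acceptor_variant",
}

# Genotype strings counted as heterozygous (unphased and phased forms).
HETEROZYGOUS = {"0/1", "1/0", "0|1", "1|0"}


@dataclass(frozen=True)
class Variant:
    """Hypothetical, simplified record for one variant call in the case animal."""
    chrom: str
    pos: int
    case_genotype: str      # e.g. "0/1" for heterozygous
    control_carriers: int   # control genomes carrying the alternate allele
    consequence: str        # predicted consequence label


def candidate_variants(
    variants: Iterable[Variant],
    interval: Optional[Tuple[str, int, int]] = None,
    require_deleterious: bool = True,
) -> List[Variant]:
    """Keep variants compatible with a dominant de novo mutation."""
    kept = []
    for v in variants:
        if v.case_genotype not in HETEROZYGOUS:
            continue  # a dominant de novo mutation is expected to be heterozygous
        if v.control_carriers != 0:
            continue  # must be private to the case, i.e. absent from controls
        if require_deleterious and v.consequence not in DELETERIOUS:
            continue
        if interval is not None:
            chrom, start, end = interval
            if v.chrom != chrom or not (start <= v.pos <= end):
                continue  # outside the mapped interval
        kept.append(v)
    return kept


# Toy data with made-up coordinates, for demonstration only.
toy = [
    Variant("chrA", 100, "0/1", 0, "missense_variant"),
    Variant("chrA", 200, "0/1", 3, "missense_variant"),    # seen in controls
    Variant("chrA", 300, "1/1", 0, "stop_gained"),         # homozygous
    Variant("chrB", 400, "0/1", 0, "synonymous_variant"),  # not deleterious
    Variant("chrB", 500, "0/1", 0, "frameshift_variant"),
]

genome_wide = candidate_variants(toy)
within_interval = candidate_variants(toy, interval=("chrA", 50, 150))
```

Setting require_deleterious=False together with an interval mimics the second use described in the text, where all small private heterozygous variants inside a mapped interval are listed regardless of predicted effect.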
We then mined Illumina BovineSNP50 phased genotypes from case relatives and from the large French genomic selection database (231,115 Holstein cattle genotypes and 99,044 Montbéliarde cattle genotypes) to identify unaffected individuals carrying ancestral versions of the disease-associated haplotypes, i.e.