Molecular characterisation of ichthyosis fetalis and

Niemann-Pick type C disease in cattle and brachygnathia,

cardiomegaly and renal hypoplasia syndrome and

pulmonary hypoplasia with anasarca in sheep

Shernae Alicia Woolley

BAnVetBioSc (Hon I)

Faculty of Science

University of Sydney

A thesis submitted in fulfilment of the requirements for the degree of Doctor of

Philosophy

2021

Declaration

This is to certify that to the best of my knowledge, the content of this thesis is my own work.

This thesis has not been submitted for any degree or other purposes. Information derived from the work of others has been acknowledged in the text and references prior or within each chapter.

The work contained in Chapter 4: Niemann-Pick type C disease in Angus/Angus-cross cattle includes Sanger sequencing data that were produced during my undergraduate honours degree. These data were re-analysed for this thesis with a larger sample size and additional methods to validate the variant. The early sequencing work was re-written for publication.

I certify that the intellectual content of this thesis is the product of my own work and that all the assistance received in preparing this thesis and sources have been acknowledged.

Shernae Alicia Woolley

November, 2020

Acknowledgements

First and foremost, I would like to express my profound gratitude to my supervisors Imke Tammen, Brendon O’Rourke and Cali Willet. I will be forever grateful for your enduring support, guidance and friendship throughout my research journey. I have learnt so much from your expertise and experience within your respective fields, and each of you is an inspiration to me. I truly cannot thank you enough for your kindness, your unwavering support for my research endeavours and for having the belief in me to complete this research.

To our internal and external collaborators, including those at the University of Oxford, the University of Bern and the University of Cambridge, your help and generosity in contributing your expertise to the studies in this thesis is truly appreciated. I would also like to thank the producers and veterinarians who have contributed samples and their time to this project. The studies in this project could not have been completed without your help.

To the wonderful staff at the Elizabeth Macarthur Agricultural Institute, especially the Biotechnology team, Naomi Porter, Kim Whitaker, Leanne Bringolf and Katie Eager, thank you for your unwavering assistance and friendship. You have all made my experience in the laboratory that much more enjoyable, and I have such fond memories of my time within the laboratory. I am also thankful to the Elizabeth Macarthur Agricultural Institute and Brendon O’Rourke for the invaluable experience of working within a veterinary diagnostic laboratory. I will carry this experience with me for the entirety of my career and I have learnt so much.

I am thankful to the Australian Government for the Research Training Stipend received throughout my postgraduate studies, and to the University of Sydney and the Department of Primary Industries of New South Wales for their scholarship funding and resource support. I have been fortunate enough to conduct my postgraduate research across both an academic and an industry setting, which has allowed me to gain a unique perspective on, and appreciation for, the intersection between academic and industry research.

To my friends, I thank you for your support throughout my postgraduate journey and for listening to me constantly talk about my research. To my dog Ren, I adopted you when I first began my postgraduate studies, and you have been there with me every step of the way. Coming home to your mischievous face after very long days made all the difference. I love you, my boy.

And finally, to my amazing family. My undergraduate and postgraduate achievements would not have been possible without the support and love from my mother Alison, my nan Colleen, my sister Logan and my brother Orlando. Thank you for your unwavering belief in me, and for listening to me talk about my research even when you did not share the same enthusiasm! My mother Alison has been both a source of support and strength, and your dedication in supporting me through this journey has made all of the difference. You have been a source of encouragement, and through every time of struggle and joy, you have always been there for me. I could not have done this without you, and I love you with all of my heart.

I dedicate this thesis to my mother Alison, the person I admire most in this world.

Table of Contents

Abstract
Chapter 1 | Literature review
  1. Introduction
  2. Animal breeding
    2.1 History of animal breeding
    2.2 Integration of molecular genetics in animal breeding
    2.3 Unintended effects of animal breeding
  3. Inherited diseases
    3.1 Mendelian disease
    3.2 Mendelian disease in cattle and sheep
  4. Identifying and mutations for inherited diseases
    4.1 Candidate analysis
    4.2 Mapping approaches
    4.3 Microsatellites
    4.4 SNP genotyping
    4.5 Next generation sequencing
    4.6 Other sequencing approaches
    4.7 Validation of likely disease-causing variants
  5. Management of inherited diseases in livestock populations
    5.1 Reporting
    5.2 Genotyping and diagnostic assays
    5.3 Breeding management
  6. Aims of this thesis
  7. References
Chapter 2 | Overview of the inherited diseases investigated and strategies used
  2.1 Synopsis
  2.2 Investigating emerging inherited diseases in Australian livestock: A collaborative approach
  2.3 Molecular investigation of several emerging inherited diseases in cattle and sheep
Chapter 3 | Ichthyosis fetalis in Shorthorn cattle
  3.1 Synopsis
  3.2 An ABCA12 missense variant in a Shorthorn calf with ichthyosis fetalis
  3.3 Appendix: Supplementary material for Chapter 3
Chapter 4 | Niemann-Pick type C disease in Angus/Angus-cross cattle
  4.1 Synopsis
  4.2 Molecular basis for a new bovine model of Niemann-Pick type C disease
  4.3 Appendix: Supplementary material for Chapter 4
Chapter 5 | Brachygnathia, cardiomegaly and renal hypoplasia syndrome in Merino sheep
  5.1 Synopsis
  5.2 Molecular basis of a new ovine model for human 3M syndrome-2
  5.3 Appendix: Supplementary material for Chapter 5
Chapter 6 | Pulmonary hypoplasia with anasarca in Persian/Persian-cross sheep
  6.1 Synopsis
  6.2 A splice site mutation in ADAMTS3 is the likely causal variant for pulmonary hypoplasia with anasarca in Australian Persian/Persian-cross sheep
  6.3 Appendix: Supplementary material for Chapter 6
Chapter 7 | General discussion and conclusions
  7.1 General discussion
    7.1.1 Attitudes towards inherited diseases in cattle and sheep
    7.1.2 Inherited disease investigation
    7.1.3 Challenging breeding attitudes for future diversity and food security
  7.2 Conclusion
  7.3 References

A note on style

This thesis comprises a series of manuscripts, with a literature review preceding the first manuscript and a general discussion following the last manuscript. A synopsis is provided before each manuscript, followed by an appendix containing supplementary material to provide greater context. Each manuscript is formatted according to the guidelines of its respective journal.

Manuscripts contributing to this thesis

Chapter 2

Woolley S.A., Tsimnadis E.R., Nowak N., Tulloch R.L., Shariflou M.R., Leeb T., Willet C.E., Khatkar M.S., O’Rourke B.A., Tammen I. (2017) Investigating emerging inherited diseases in Australian livestock: A collaborative approach. Proc. Assoc. Advmt. Anim. Breed. Genet (Vol. 22, pp. 15-18).

Woolley S.A., Tsimnadis E.R., Tulloch R.L., Hughes P., Hopkins B., Hayes S.E., Shariflou M.R., Bauer A., Häfliger I.M., Jagannathan V., Drögemüller C. (2019) Molecular investigation of several emerging inherited diseases in cattle and sheep. Proc. Assoc. Advmt. Anim. Breed. Genet (Vol. 23, pp. 270-273).

Chapter 3

Woolley, S.A., Eager, K.L.M., Häfliger, I.M., Bauer, A., Drögemüller, C., Leeb, T., O'Rourke, B.A. and Tammen, I. (2019), An ABCA12 missense variant in a Shorthorn calf with ichthyosis fetalis. Anim Genet, 50: 749-752. doi:10.1111/age.12856

Chapter 4

Woolley S.A., Tsimnadis E.R., Lenghaus C., Healy P.J., Walker K., Morton A., Khatkar M.S., Elliott A., Kaya E., Hoerner C., Priestman D.A., Shepherd D., Platt F.M., Porebski B.T., Willet C.E., O’Rourke B.A., Tammen I. (2020) Molecular basis for a new bovine model of Niemann-Pick type C disease. PLOS ONE 15(9): e0238697. https://doi.org/10.1371/journal.pone.0238697

Chapter 5

Woolley S.A., Hayes S.E., Shariflou M.R., Nicholas F.W., Willet C.E., O’Rourke B.A., Tammen I. (2020) Molecular basis of a new ovine model for human 3M syndrome-2. BMC Genet 21, 106. https://doi.org/10.1186/s12863-020-00913-8

Chapter 6

Woolley S.A., Hopkins B., Khatkar M.S., Jerrett I.V., Willet C.E., O’Rourke B.A., Tammen I. A splice site mutation in ADAMTS3 is the likely causal variant for pulmonary hypoplasia with anasarca in Australian Persian/Persian-cross sheep. To be submitted.

Invited presentations, conference proceedings and manuscripts not included in this thesis

Emerging inherited diseases and animal welfare: A case study of congenital mandibular prognathia in Droughtmaster cattle. (2016) Proceedings of the 31st Biennial Conference of the Australian Society of Animal Production, 1263, Glenelg, Australia, 4th-7th July (Conference proceedings, poster)

Molecular characterisation and management of several emerging inherited diseases in cattle (Bos taurus) and sheep (Ovis aries). (2016) Faculty of Veterinary Science Annual Postgraduate Conference, Camperdown, Australia, 9th-10th November (Conference proceedings, presentation)

Investigating emerging inherited diseases in Australian livestock by utilising whole genome sequencing data. (2017) Sydney Bioinformatics Research Symposium, Camperdown, Australia, 13th June (Conference proceedings, presentation)

Congenital mandibular prognathia in Droughtmaster cattle. (2017) 36th International Society for Animal Genetics Conference, MT314, Dublin, Republic of Ireland, 16th-21st July (Conference proceedings, poster)

Molecular characterisation and management of emerging inherited diseases in cattle and sheep. (2017) Sydney School of Veterinary Science Annual Postgraduate Conference, Camden, Australia, 7th November (Conference proceedings, poster)

Investigating emerging inherited diseases in Australian livestock. (2018) NSW Department of Primary Industries and The University of Sydney Seminar, Menangle, Australia, 31st October (Conference proceedings, presentation)

Molecular characterisation of emerging inherited diseases in cattle and sheep. (2018) Sydney School of Veterinary Science Annual Postgraduate Conference, Camperdown, Australia, 7th November (Conference proceedings, presentation)

Investigating emerging inherited diseases in Australian livestock. (2019) District Veterinarians Conference, Newcastle, Australia, 9th-11th April (Invited presentation)

Investigating emerging inherited diseases in Australian livestock: A snapshot. (2019) 37th International Society for Animal Genetics Conference, OP103, Lleida, Spain, 7th-12th July (Conference proceedings, presentation)

Eager, K.L.M., Conyers, L.E., Woolley, S.A., Tammen, I. and O'Rourke, B.A. (2020). A novel ABCA12 frameshift mutation segregates with ichthyosis fetalis in a Polled Hereford calf. Anim Genet, 51: 837-838. doi:10.1111/age.12973

Author attribution statements

Chapter 2

Chapter 2.2 Investigating emerging inherited diseases in Australian livestock: A collaborative approach. Proceedings of the 22nd Association for the Advancement of Animal Breeding and Genetics Conference.

Chapter 2.3 Molecular investigation of several emerging inherited diseases in cattle and sheep. Proceedings of the 23rd Association for the Advancement of Animal Breeding and Genetics Conference.

For the above publications, I co-designed the study with A.Prof. Imke Tammen and Dr Brendon O’Rourke and performed the experiments and analysis under their supervision. As part of my honours degree, I conducted Sanger sequencing of a candidate gene for each of PHA, CMP and NPC; a disease-causing variant was identified only for NPC. All additional work outlined in each chapter for PHA and NPC was conducted during my PhD, and any work conducted during my honours year was completely re-written and re-analysed for the purpose of this thesis. I conducted and analysed SNP genotyping for PHA, and performed the Sanger sequencing and/or whole genome sequencing preparation and analysis for the following diseases: CMP, NPC, CVS, BCRHS and PHA. I conducted the whole genome sequencing bioinformatics analysis under the supervision of Dr Cali Willet. I prepared the DNA for IF in Shorthorn cattle and OD for Prof. Tosso Leeb, Prof. Cord Drögemüller, Dr Jagannathan, Anne Bauer and Irene Häfliger to generate and analyse whole genome sequencing data. I analysed and validated the variant results for ichthyosis fetalis in Shorthorn cattle.

Honours research student Emily Tsimnadis, under the supervision of A.Prof. Imke Tammen, Dr Brendon O’Rourke and Dr Mehar Khatkar, conducted analysis of SNP genotyping data for six inherited conditions (congenital mandibular prognathia (CMP) in Droughtmaster cattle, ichthyosis fetalis (IF) in Hereford cattle, Niemann-Pick type C disease (NPC) in Angus cattle, congenital blindness (CB) in white Shorthorn cattle, a new variant of cardiomyopathy woolly haircoat syndrome (CWH) in Hereford cattle and suspected congenital contractural arachnodactyly (CCA) in Murray Grey cattle) and conducted partial Sanger sequencing analysis of a candidate gene for CMP. Honours research student Natalie Nowak, under the supervision of A.Prof. Imke Tammen and Dr Brendon O’Rourke, conducted Sanger sequencing analysis of the PPP1R13L candidate gene for the atypical CWH cases. Both Natalie Nowak and Emily Tsimnadis completed this work prior to the beginning of this thesis.

A.Prof. Imke Tammen, Dr Brendon O’Rourke, Dr Mehar Khatkar and I co-supervised and assisted honours research student Rachel Tulloch’s SNP genotyping and analysis of this data for cervicothoracic vertebral subluxation (CVS) in Merino sheep. Honours research student Patrick Hughes was supervised by A.Prof. Imke Tammen and Dr Brendon O’Rourke for the Sanger sequencing of candidate genes in ovine dermatosparaxis (OD) in Merino sheep. I developed the validation test for the OD variant identified via whole genome sequencing. Patrick Hughes conducted the validation under my supervision. Bethany Hopkins and Sarah Hayes conducted candidate gene analysis for PHA and BCRHS under the supervision of A.Prof. Imke Tammen, Dr Brendon O’Rourke and myself as part of their research project in the Doctor of Veterinary Medicine program. I developed and validated the variants identified for BCRHS and PHA. Bethany Hopkins and Sarah Hayes conducted the validation under our supervision. Dr Mohammed Shariflou provided access to samples and data for BCRHS and contributed to the revision of early draft manuscripts. I wrote and developed the discussion points in the draft manuscripts, and critical revision was conducted by myself, A.Prof. Imke Tammen, Dr Brendon O’Rourke, Dr Cali Willet, Prof. Tosso Leeb and Prof. Cord Drögemüller.

Chapter 3

Chapter 3.2 An ABCA12 missense variant in a Shorthorn calf with ichthyosis fetalis. Animal Genetics.

For this publication, I co-designed the study with A.Prof. Imke Tammen and Dr Brendon O’Rourke and performed the experiments and analysis under their supervision. I prepared the DNA for whole genome sequencing and analysis by Prof. Tosso Leeb, Prof. Cord Drögemüller, Dr Jagannathan, Anne Bauer and Irene Häfliger. I analysed and validated the candidate variant and implemented diagnostic genotyping. I wrote and developed the discussion points in the draft manuscript, and critical review was conducted by myself, A.Prof. Imke Tammen, Dr Brendon O’Rourke, Prof. Tosso Leeb and Prof. Cord Drögemüller.

Chapter 4

Chapter 4.2 Molecular basis for a new bovine model of Niemann-Pick type C disease. PLoS ONE.

For this publication, I co-designed the study with A.Prof. Imke Tammen and Dr Brendon O’Rourke. I performed the Sanger sequencing of the candidate gene, molecular genetic analysis of the causal variant, validation, development and screening of the diagnostic genotyping assay and data curation under the supervision of A.Prof. Imke Tammen and Dr Brendon O’Rourke. Emily Tsimnadis conducted the SNP genotyping analysis and selection of candidate genes under the supervision of A.Prof. Imke Tammen and Dr Brendon O’Rourke. Mehar Khatkar provided SNP genotyping analysis and visualisation. Annette Elliott provided resources for cell culture and maintained the cells. Prof. Frances Platt, Ecem Kaya, Clarisse Hoerner, Dr David A. Priestman and Dawn Shepherd conducted the bioanalysis investigation of fibroblast cells and provided the resources for it. Anthony Morton provided the clinical description of the phenotype, collected samples and coordinated communication with producers. Methodology and analysis for in silico modelling was performed by Dr Ben Porebski and Dr Cali Willet. I wrote and developed the discussion points in the draft manuscript, and critical review was conducted by myself, A.Prof. Imke Tammen, Dr Brendon O’Rourke and co-authors.

Chapter 5

Chapter 5.2 Molecular basis of a new ovine model for human 3M syndrome-2. BMC Genetics.

For this publication, I co-designed the study with A.Prof. Imke Tammen and Dr Brendon O’Rourke and performed experiments under their supervision. I extracted and prepared DNA for sample screening and whole genome sequencing. Bioinformatics and analysis of whole genome sequencing data was conducted by myself under the supervision of Dr Cali Willet. I optimised and validated the diagnostic genotyping assay, including screening animal samples. A.Prof. Imke Tammen, Dr Brendon O’Rourke and I co-supervised Sarah Hayes and her analysis of candidate genes, extraction of DNA and screening of animal samples. Sarah Hayes contributed to writing sections of the paper relating to candidate gene identification. Dr Mohammed Shariflou collected and assembled pedigree information, contributed to its analysis and communicated with the producer. Prof. Frank Nicholas initiated collaboration with the producer and provided pedigree analysis along with A.Prof. Imke Tammen. I wrote and developed the discussion in the draft manuscript, and critical review was conducted by myself, A.Prof. Imke Tammen, Dr Brendon O’Rourke, Dr Cali Willet, Dr Mohammed Shariflou and Prof. Frank Nicholas.

Chapter 6

Chapter 6.2 A splice site mutation in ADAMTS3 is the likely causal variant for pulmonary hypoplasia with anasarca in Australian Persian/Persian-cross sheep (manuscript to be submitted).

For this publication, I co-designed the study with A.Prof. Imke Tammen and Dr Brendon O’Rourke and performed experiments under their supervision. I extracted and prepared DNA for SNP genotyping, Sanger sequencing, whole genome sequencing and animal screening. Analysis of SNP genotyping data was conducted by Dr Mehar Khatkar, A.Prof. Imke Tammen and myself. I conducted bioinformatics analysis of whole genome sequencing data under the supervision of Dr Cali Willet. I optimised and validated the diagnostic genotyping assay, including screening animal samples. A.Prof. Imke Tammen, Dr Brendon O’Rourke and I co-supervised Bethany Hopkins and her analysis of candidate genes, extraction of DNA, PCR and Sanger sequencing analysis. Bethany Hopkins also contributed to writing the sections of the paper relating to candidate gene identification. Dr Ian Jerrett conducted pathology and histopathology and contributed to the writing of the paper. I wrote and developed the discussion in the draft manuscript, and critical review was conducted by myself, A.Prof. Imke Tammen and Dr Brendon O’Rourke.

Author attribution attestment statement

In addition to the statements above, in cases where I am not the corresponding author of a published item included in the chapters of this thesis, permission to include the published material has been granted by the corresponding author, A.Prof. Imke Tammen.

Shernae A. Woolley

November, 2020

As supervisor for the candidature upon which this thesis is based, I can confirm that the authorship attribution statements above are correct.

A.Prof. Imke Tammen

November, 2020

Abbreviations

µg microgram

µL microlitre

2AA anthranilic acid

4MU-α-Mann 4-Methylumbelliferyl-α-D-Mannopyranoside

ABCA12 ATP binding cassette subfamily A member 12

ADAMTS3 a disintegrin and metalloproteinase with thrombospondin type 1 motif 3

AGRF Australian Genome Research Facility

Arg arginine

bam binary alignment map

BCRHS brachygnathia, cardiomegaly and renal hypoplasia syndrome

BLAST Basic Local Alignment Search Tool

bp base pair

BSA bovine serum albumin

BTA Bos taurus autosome

BVDV bovine viral diarrhea virus

BWA-mem Burrows-Wheeler Aligner

C cytosine

CB congenital blindness

CCA congenital contractural arachnodactyly

CCDC8 Coiled-Coil Domain Containing 8

cDNA copy deoxyribonucleic acid

CHPF Chondroitin polymerizing factor

CMP congenital mandibular prognathia

CRISPR-Cas9 clustered regularly interspaced short palindromic repeats-associated protein 9

CTD C-terminal domain

CUL7 Cullin 7

CVS cervicothoracic vertebral subluxation

CWH cardiomyopathy woolly haircoat syndrome

dbSNP Single Nucleotide Polymorphism Database

del deletion

dH2O deionised water

DNA deoxyribonucleic acid

dNTP deoxynucleotide triphosphate (dATP, dCTP, dGTP, dTTP)

DOPE Discrete Optimized Protein Energy

EDTA ethylenediaminetetra-acetic acid

EMAI Elizabeth Macarthur Agricultural Institute

FBS foetal bovine serum

FLT4 Fms Related Receptor Tyrosine Kinase 4

FOXC2 Forkhead Box C2

fs femtosecond

g gravitational force

GATK Genome Analysis Toolkit

GMPAA GDP-mannose pyrophosphorylase A

GSLs Glycosphingolipids

Gus glucose unit values

GVCF genomic variant call format

GWAS genome-wide association studies

hr hour

indel insertion:deletion

ins insertion

Kb kilobase

Leu leucine

LSD lysosomal storage disease

M molar

MaxENT maximum entropy scoring model

Mb megabase

MD molecular dynamics

mg milligram

MGI Mouse Genome Informatics

min minute

ml millilitre

mmol/L millimolar

mRNA messenger ribonucleic acid

NCBI National Center for Biotechnology Information

NCBI ORF National Center for Biotechnology Information Open Reading Frame Finder

ng nanogram

NIHF non-immune hydrops foetalis

nmol/L nanomolar

NPC Niemann-Pick type C

NPC1 NPC intracellular cholesterol transporter 1

NPC2 NPC intracellular cholesterol transporter 2

NP-HPLC normal-phase high-performance liquid chromatography

NSW New South Wales

nt nucleotide

NTD N-terminal domain

OAR Ovis aries autosome

OBSL1 obscurin like cytoskeletal adaptor 1

OD ovine dermatosparaxis

OMIA Online Mendelian Inheritance in Animals

OMIM Online Mendelian Inheritance in Man

PCR polymerase chain reaction

PHA pulmonary hypoplasia with anasarca

PIEZO1 Piezo Type Mechanosensitive Ion Channel Component 1

POPC 1,2-palmitoyl-oleoyl-sn-glycero-3-phosphocholine

RFLP restriction fragment length polymorphism

RMSD root mean square deviation

RNA ribonucleic acid

ROH run of homozygosity

rpm revolutions per minute

RT-PCR reverse-transcriptase PCR

s second

SMPD1 sphingomyelin phosphodiesterase 1

SNP single nucleotide polymorphism

SOX18 SRY-Box Transcription Factor 18

SCS single-cell sequencing

SSCP single-strand conformation polymorphism

TE tris-EDTA buffer

TSP1 thrombospondin type 1

U units

WES whole exome sequencing

WGS whole genome sequencing

Abstract

Since the domestication of cattle and sheep, these species have been extensively farmed to produce food and fibre for human consumption and use. The move from natural breeding to selective breeding has enabled desired production traits to be selected, and has accelerated genetic gain within these populations. The presence of deleterious alleles is not a new phenomenon in animal breeding, yet inherited diseases continue to impact animal welfare, productivity and profitability. Advances in molecular genetics, sequencing technologies and bioinformatics over the past few decades have facilitated the widespread generation of genomes for cattle and sheep, and have improved variant discovery. Despite these advances, reporting of inherited diseases in cattle and sheep is not commonplace in Australia. This thesis used several approaches, including SNP genotyping, Sanger sequencing, candidate gene analysis and whole genome sequencing, to identify causal mutations for an initial set of ten inherited diseases. These approaches were successful for four inherited diseases, which form the basis of this thesis: ichthyosis fetalis in Shorthorn cattle, Niemann-Pick type C disease in Angus/Angus-cross cattle, brachygnathia, cardiomegaly and renal hypoplasia syndrome in Merino sheep and pulmonary hypoplasia with anasarca in Persian sheep. Diagnostic DNA tests were developed for these four diseases, and were used to improve breeding management. The communication of the results from this thesis will help raise awareness of emerging inherited diseases in Australian livestock populations, and will highlight the importance of taking a proactive approach to reporting and managing inherited diseases in livestock.

Chapter 1 | Literature review

1. Introduction

Inherited diseases within livestock can have detrimental impacts on animal welfare and can cause significant economic losses (Conington et al. 2010; Windsor et al. 2011; Gibson & Jackson 2017). Investigating the molecular basis of inherited diseases is vital for the identification and management of deleterious alleles in livestock populations. The identification of causative mutations not only has its benefits for improved animal breeding, but also for potential animal models of disease for human therapy exploration (Agerholm 2008; Cibelli et al. 2013).

The improvement of livestock production over the past decades through the application of quantitative genetics to selection, and particularly through the use of estimated breeding values and genomic selection, has increased profitability and efficiency within the livestock sectors (Kennedy et al. 1990; Charlier et al. 2008). The genetic diversity of livestock populations, especially those under heavy selection pressure such as cattle and sheep, is a crucial factor in determining the viability of future production output and sustainability (Cundiff et al. 1986; Groeneveld et al. 2010). The ability to disseminate desirable alleles tied to traits of economic importance can at times result in a monopoly of gene pools, where elite sires or sire lines can dominate breeding populations (Weigel 2001; Teseling & Parnell 2011). Whilst the use of elite genetics is fundamental in achieving rapid genetic gain (Flint & Woolliams 2008; Andersson 2013), the inheritance of undesirable alleles must also be considered when designing and implementing breeding programs.

Breeding cattle and sheep typically focuses on desirable traits that can increase production output or sustainability of an animal within a herd or flock (Groeneveld et al. 2010). However, this can result in the inadvertent increase of deleterious alleles either via heterozygote advantage (Robertson 1962; Sellis et al. 2011; Hedrick 2012, 2014) or, in the case of recessive diseases, due to an increased chance of inheriting two copies of a deleterious allele. Reporting emerging inherited diseases is a crucial first step in managing these diseases, followed by clear phenotypic descriptions, sample collection and molecular characterisation. Consideration of the role that inherited diseases can play in animal breeding programs, and how deleterious alleles may enter or increase in livestock populations, is vital to effective inherited disease management (Windsor et al. 2011). This literature review focuses on the impact of inherited diseases in animal breeding and the approaches used to identify causative alleles and genes.

2. Animal breeding

2.1 History of animal breeding

Domestication has been part of our culture and civilisation for the past 14,000 years, beginning with the domestication of the dog as both a companion and hunting animal (Mignon-Grasteau et al. 2005). The domestication of livestock started later, about 8,000 to 10,000 years ago in Anatolia and South Asia, and coincided with increased labour and food demands (Bruford et al. 2003; Mignon-Grasteau et al. 2005; Marshall et al. 2014).

Since the domestication of animals, animal selection has focused on production or cultural traits. Livestock selection began with a focus on phenotypes, such as selection for particular coat colours, improved fibre production in sheep and goats (Alberto et al. 2018) and drafting ability in larger livestock species such as cattle and horses (Marshall et al. 2014). More recently, animal breeding has focused on the improved production of animals or animal products for human consumption due to a rapidly expanding human population (Flint & Woolliams 2008), as well as increasing consideration of animal health and welfare traits (Tammen 2016). During this time, quantitative genetics has played a significant role in improving the selection of production animals.

In the past few decades, advanced reproductive techniques have accelerated genetic gain across the cattle and sheep industries (Faber et al. 2003). In particular, readily accepted and widely used techniques such as artificial insemination and embryo transfer have allowed for the rapid dissemination of superior genetics throughout the globe (Faber et al. 2003; Groeneveld et al. 2010; Hayes et al. 2013). Whilst this has contributed to substantial genetic gain in livestock (Hayes et al. 2013), advanced reproductive techniques can also inadvertently contribute to the dissemination of deleterious alleles, increases in inbreeding and a decrease in effective population size (Young & Seykora 1996; Charlier et al. 2008). It is important here to distinguish between contribution and causality, as the practice of using advanced reproductive techniques does not in itself cause increased incidences of inherited diseases.
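The link between heavy use of a few sires and a reduced effective population size can be illustrated with a short calculation. The sketch below is not taken from the studies cited above; it assumes the standard textbook approximation for unequal numbers of breeding males and females, Ne = 4 Nm Nf / (Nm + Nf), and the corresponding expected rate of inbreeding per generation, 1 / (2 Ne). The function names and herd sizes are hypothetical and chosen for illustration only.

```python
def effective_population_size(n_sires: int, n_dams: int) -> float:
    """Approximate effective population size for unequal sex ratios.

    Uses the textbook approximation Ne = 4 * Nm * Nf / (Nm + Nf),
    which assumes random mating and discrete generations.
    """
    if n_sires <= 0 or n_dams <= 0:
        raise ValueError("both sexes must contribute at least one parent")
    return 4.0 * n_sires * n_dams / (n_sires + n_dams)


def inbreeding_rate(ne: float) -> float:
    """Expected increase in inbreeding per generation: 1 / (2 * Ne)."""
    return 1.0 / (2.0 * ne)


if __name__ == "__main__":
    # Hypothetical herds: the same 1000 dams mated to fewer and fewer sires.
    for n_sires in (100, 10, 2):
        ne = effective_population_size(n_sires, 1000)
        print(f"{n_sires:>3} sires, 1000 dams: "
              f"Ne = {ne:.1f}, inbreeding rate = {inbreeding_rate(ne):.4f}")
```

Under these assumptions, the number of sires dominates the result: adding dams has little effect once sires are the limiting sex, which is consistent with the concern that a small number of widely used sires can narrow a breeding population.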

2.2 Integration of molecular genetics in animal breeding

The application of quantitative genetics (Lush 1943) during the 1960s revolutionised genetic selection and the speed at which the genetic output of offspring could be measured (Meuwissen et al. 2013; Georges et al. 2019). Whilst quantitative genetics can facilitate early selection and shortened generation intervals (Lush 1943; Clark et al. 2012; Boerner et al. 2014), modern molecular techniques have also contributed substantially to animal selection. Initially, molecular genetics was used to identify causative mutations that impacted single-gene traits, such as citrullinaemia in cattle (Harper et al. 1986; Dennis et al. 1989) and the Booroola fecundity Merino sheep strain (Turner 1978; Montgomery et al. 1992). The identification of causative mutations for deleterious and favourable traits and the development of DNA tests to screen and manage populations have proven useful in livestock. A key example is porcine stress syndrome, a type of myopathy that is triggered by stress or halothane inhalation and results in unfavourable meat characteristics and, in some animals, sudden death (Fujii et al. 1991). The deleterious variant was identified in the skeletal muscle ryanodine receptor gene, and the development of a diagnostic test facilitated improved management of the disease (Fujii et al. 1991). Identifying the causal mutation and developing a diagnostic test meant that fatalities and dissemination of the causative allele could be managed and reduced.

Utilising molecular genetics for single gene traits to not only improve breeding management but also animal products, has meant that similar techniques have been used to identify variation in genes that contribute to the variation seen in quantitative traits. Genotyping for such variants was used in marker-assisted selection (MAS), which represents an opportunity to improve the

selection of quantitative traits in livestock for loci that had both direct and indirect impact on traits (Lande & Thompson 1990).

Livestock selection programs have focussed on improving desirable quantitative traits by using estimated breeding values (EBV) and more recently, using genomic selection programs (Boerner et al. 2014). Estimated breeding values are calculated from performance data of an animal and its relatives and from pedigree information, to select superior animals for their genetic merit.

This approach is common in cattle, sheep, pigs and poultry (Bijma 2012; Clark et al. 2012). In

Australia, the calculation of estimated breeding values (EBV; called Australian breeding values

(ABV) for dairy cattle and Australian Sheep Breeding Values (ASBV) for sheep) has assisted producers to select animals for their genetic potential (Truscott & Thomas 2010; Boerner et al.

2014).

More recently, genomic estimated breeding values (GEBV) have been incorporated to guide selection decisions. This approach utilises a combination of traditional EBV and the information from genetic markers covering the whole genome (Clark et al. 2012; Meuwissen et al. 2013;

Georges et al. 2019). The incorporation of genetic markers with traditional EBVs has allowed more accurate estimates of animal breeding values, including for traits with low heritability and those that are difficult to measure (Clark et al. 2012; Hayes et al. 2016).

2.3 Unintended effects of animal breeding

Increased accuracy of estimating the breeding value of an animal, and the consequent use of elite sire lines or maternal breeding lines within closed or small effective populations, increase the risk of inheriting recessive conditions (Charlier et al. 2008; Windsor et al. 2011). This can be favourable if the allele is coding for a desired trait. Mutations affecting desirable traits such as milk yield, muscle mass, growth and other advantageous traits are often under positive selection

(Qanbari & Simianer 2014). Deleterious alleles within inbred populations are a critical aspect of inbreeding depression (Charlesworth & Charlesworth 1999). In livestock production systems, and especially within commercial operations, inbreeding is common practice in the hope of fixing desirable alleles in the population related to production traits or elite lines (Kristensen &

Sørensen 2005). However, inbreeding contributes to production inefficiencies in livestock, and can impact fertility and overall hybrid vigour (Blouin & Blouin 1988). The fact that offspring from inbred parents are less fit than outbred crosses may not appear essential if these offspring are intended for consumption (Blouin & Blouin 1988). However, inbreeding increases the chance of inheriting deleterious alleles in a homozygous form, which can reduce the fitness of a population and impact on production efficiency and financial profits (McDaniel 2001; Pryce et al. 2012).
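The effect of inbreeding on homozygosity for a deleterious recessive allele can be illustrated with a short calculation. The following is a minimal sketch that assumes the standard population genetics expression q² + Fq(1 − q) for the expected frequency of homozygous recessive genotypes, where q is the allele frequency and F the inbreeding coefficient. The function name and the example values are hypothetical and are not taken from the studies cited above.

```python
def homozygous_recessive_frequency(q: float, f: float = 0.0) -> float:
    """Expected frequency of homozygous-recessive genotypes.

    q: frequency of the recessive allele (hypothetical value)
    f: inbreeding coefficient, 0 for random mating
    """
    if not 0.0 <= q <= 1.0 or not 0.0 <= f <= 1.0:
        raise ValueError("q and f must lie between 0 and 1")
    # Random-mating term (q^2) plus the excess homozygosity due to inbreeding.
    return q * q + f * q * (1.0 - q)


# Illustrative values only: a rare allele and three levels of inbreeding.
for f in (0.0, 0.0625, 0.25):
    print(f"F = {f:<6} expected affected frequency = "
          f"{homozygous_recessive_frequency(0.01, f):.6f}")
```

Under these assumptions, even a modest inbreeding coefficient raises the expected frequency of affected animals well above the random-mating value for a rare allele.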

3. Inherited diseases

Inherited diseases do not always conform to Mendelian single-gene inheritance patterns (van

Heyningen & Yeyati 2004). These include oligogenic diseases, where a phenotype is impacted by the modifying action of a few loci (van Heyningen & Yeyati 2004), and polygenic diseases,

where multiple genes and environmental factors can influence the expression of disease (Badano

& Nicholas 2002; van Heyningen & Yeyati 2004), such as polydactyly in Simmental cattle

(Johnson et al. 1981).

3.1 Mendelian disease

Genetic variation is the cornerstone of diversity in all species. Mutation rates are less than 10⁻⁷ per nucleotide in each generation, yet the acquisition of variants is important for maintaining genetic diversity (Lynch et al. 2016). However, some mutations have a deleterious effect. The extent to which deleterious variants impact protein structure, function, folding, stability and interaction with other proteins and substrates depends on the type of mutation (Studer et al.

2013).

Pedigree information and accurate phenotype records are vital in elucidating the mode of inheritance. It is important to consider that single-gene traits or diseases do not always produce phenotypes that follow a recessive or dominant mode of inheritance, such as incomplete dominance for chondrodysplasia in Dexter cattle (Harper et al. 1998) and co-dominant modes of inheritance such as roan coat colour in cattle (Charlier et al. 1996; Seitz et al. 1999). Other non-Mendelian modes of inheritance, such as incomplete penetrance, must also be considered.

Understanding the inheritance of genetic diseases is paramount to disease management. Even without the knowledge of the disease-causing mutation, the mode of inheritance can be used to estimate genotype probabilities for individuals within a population if pedigree information is available for affected individuals (Kerr & Kinghorn 1996). The discovery of causal variants is

much more effective, and the key to managing inherited diseases at an individual and population level.
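The idea of estimating genotype probabilities from the mode of inheritance alone can be illustrated for the simplest case of a single autosomal locus. The sketch below is not the segregation analysis method of Kerr & Kinghorn (1996). It is a much simpler enumeration of a single mating, and the function names and genotype labels ('A' for the wildtype allele, 'a' for the recessive disease allele) are assumptions made for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def offspring_genotypes(sire: str, dam: str) -> dict:
    """Genotype probabilities for one autosomal locus, e.g. sire='Aa', dam='Aa'."""
    # Each parent passes on one of its two alleles with equal probability.
    counts = Counter("".join(sorted(pair)) for pair in product(sire, dam))
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


def carrier_probability_if_unaffected(sire: str, dam: str) -> Fraction:
    """Probability that a clinically normal offspring is a carrier ('Aa')."""
    probs = offspring_genotypes(sire, dam)
    unaffected = 1 - probs.get("aa", Fraction(0))
    if unaffected == 0:
        raise ValueError("all offspring of this mating are expected to be affected")
    return probs.get("Aa", Fraction(0)) / unaffected


print(offspring_genotypes("Aa", "Aa"))
print(carrier_probability_if_unaffected("Aa", "Aa"))
```

For a mating between two carriers, the enumeration gives the familiar 1:2:1 genotype ratio, and a clinically normal offspring has a two in three chance of being a carrier.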

Information about traits and diseases in animals is rapidly increasing. Databases have been created to provide easy access to a wide range of information about inherited diseases across a wide range of species. The Online Mendelian Inheritance in Man (OMIM) database (Online

Mendelian Inheritance in Man 2019) is a resource for human Mendelian and non-Mendelian diseases where information is gathered both for the predicted or proven molecular basis of the disease, and the phenotypic presentation (Hamosh et al. 2000).

Similarly, the Mouse Genome Informatics (MGI) database provides researchers with information for gene annotation, mouse models of human disease and phenotype descriptions (Bult et al.

2019). For non-laboratory animals, Online Mendelian Inheritance in Animals (OMIA) is a database that contains information about inherited diseases and non-disease traits for more than

250 species (Online Mendelian Inheritance in Animals 2020a). In cattle and sheep, 254 and 110 disorders or traits have been reported, with 161 and 55 likely causal mutations identified respectively (Table 1) (Online Mendelian Inheritance in Animals 2020b).

Table 1. Summary of traits and causal variants identified in cattle and sheep obtained from the Online Mendelian Inheritance in Animals database, as of August 2020.

                                                            Cattle   Sheep
Total traits/disorders                                         544     256
Mendelian trait/disorder                                       254     110
Mendelian trait/disorder; likely causal variant(s) known       161      55
Likely causal variants                                         217      70
Potential models for human disease                             220     115

As increasing strides are made in molecular genetics and the tools used to identify causal variants are becoming more cost-effective and accurate, the number of variants being discovered is increasing (Figure 1). Discovery of disease-causing variants is likely to increase if reporting of emerging inherited diseases improves.

Figure 1. Bar graph of the number of likely causal variants for Mendelian traits reported each year from 1986 to August 2020 in non-human and non-laboratory animals from the Online Mendelian Inheritance in Animals database, showing an increase in the number of variants reported. Graph obtained from the Online Mendelian Inheritance in Animals database, accessed 6th September 2020.

The types of mutations that can occur within the genome vary, and can impart differing effects on phenotype (Botstein & Risch 2003). Genetic variants vary from small single point mutations to complex, large-scale chromosomal rearrangements such as chromosomal inversions or translocations (Table 2). In animals, more than one third of the reported likely causal variants are missense variants, followed by small nucleotide deletions, nonsense variants, splice site variants and nucleotide insertions (Table 2).

Table 2. A summary of the type, count and percentage of likely causal variants reported in non-human and non-laboratory animal species in the Online Mendelian Inheritance in Animals database, as of September 2020.

Variant Type                 Count   Percent
Missense                       415     34.2%
Deletion, small (<=20)         193     15.9%
Nonsense (stop-gain)           141     11.6%
Deletion, gross (>20)           96      7.9%
Splicing                        95      7.8%
Insertion, small (<=20)         69      5.7%
Insertion, gross (>20)          63      5.2%
Regulatory                      33      2.7%
Complex rearrangement           32      2.6%
Delins, small (<=20)            18      1.5%
Duplication                     17      1.4%
Repeat variation                16      1.3%
Haplotype                        8      0.7%
Inversion                        7      0.6%
Extension (stop-lost)            3      0.2%
Not known                        3      0.2%
Delins, gross (>20)              2      0.2%
Start-lost                       2      0.2%
Wildtype                         2      0.2%

3.2 Mendelian disease in cattle and sheep

In contrast to the 4,318 genes and 4,334 causative variants identified in humans to cause inherited diseases (Online Mendelian Inheritance in Man 2019), the OMIA database lists only

254 bovine and 110 ovine inherited diseases and traits. Only 161 and 55 likely causal variants have been identified in cattle and sheep, respectively (Online Mendelian Inheritance in Animals

2020b). The gap between humans and non-human species in the number of inherited disorders and causal variants identified is evident. This gap in knowledge should be addressed by veterinarians and animal scientists, supported by additional funding avenues and collaborative research opportunities.

Lack of reporting due to misdiagnosis as a non-genetic condition or under-reporting due to concerns for reputation loss surrounding the report of an inherited disorder are important considerations for the lower number of characterised causal variants in animals (Windsor &

Agerholm 2009). However, early reporting and diagnosis of an inherited disease facilitates early research efforts and if a causal mutation is identified, early exploration of management options can be implemented. The identification of inherited diseases in cattle and sheep poses an

opportunity for human research. The use of animal models of human disease is becoming increasingly important. Animal models, especially those involving large animals such as cattle and sheep where organ scaling and lifespan are more similar to humans than rodent models, have been shown to be useful in investigating underlying disease mechanisms as well as developing and evaluating therapeutic strategies for human disease (Pinnapureddy et al. 2015; Gurda & Vite

2019).

Some animal breeding practices can contribute to an increase in incidence of inherited diseases within small or inbred populations. A disease gene can be in close linkage with a desirable trait(s) or may be inherited when there is an advantage for the heterozygous variant (Hedrick

2014). An example of this selection advantage is chondrodysplasia in Dexter cattle.

Heterozygous animals are selected because these animals have the desired disproportionate dwarfism phenotype while homozygous mutant animals present with severe lethal chondrodysplasia (Harper et al. 1998; Cavanagh et al. 2007).

The use of popular sire lines to disseminate desired traits is a common practice in cattle and sheep breeding. Whilst the use of popular sire lines can be beneficial, the inadvertent dissemination of deleterious alleles causing inherited disease can occur and result in an increase in allele frequency within populations. In Shorthorn and Maine-Anjou cattle, three popular sires with a common ancestor were extensively used for breeding within the associated cattle populations (Whitlock et al. 2008). All three sires carried a single missense mutation that is proposed to cause pulmonary hypoplasia and anasarca (PHA) when inherited in homozygous form (Whitlock et al. 2008). The


extensive use of these sires resulted in the dissemination of carriers of the PHA mutation and an increased occurrence of PHA-affected progeny. The delay in pedigree analysis meant that the source of the mutation was not identified for a significant period of time (Whitlock et al. 2008).

More recently, the importance of recessive loci resulting in early embryonic mortality has attracted more attention. Embryonic lethal phenotypes with a recessive mode of inheritance are well known, such as the deficiency of uridine monophosphate synthase (DUMPS) disorder in dairy cattle (Schwenger et al. 1993). While embryonic loss can arise from chromosomal abnormalities as well as numerous non-genetic factors (Diskin & Morris 2008), recent approaches using single nucleotide polymorphism (SNP) genotyping data followed by genome-wide association studies (GWAS) have enabled identification of embryonic lethal conditions with recessive modes of inheritance. These conditions were detected by missing homozygous haplotypes in large-scale genotyping data, for which approximately 51 haplotype-related disorders are listed in OMIA for cattle (Online Mendelian Inheritance in Animals 2020b).
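The logic behind the missing haplotype approach can be sketched with a simple expectation: if a haplotype is reasonably common but no genotyped animal is homozygous for it, a recessive lethal effect becomes plausible. The example below is a simplification that assumes random mating (Hardy-Weinberg proportions) and independent animals. Published analyses typically use mating and pedigree information instead, and the function names and numbers here are hypothetical.

```python
def expected_homozygotes(n_animals: int, hap_freq: float) -> float:
    """Expected number of homozygous animals under random mating."""
    return n_animals * hap_freq ** 2


def prob_no_homozygotes(n_animals: int, hap_freq: float) -> float:
    """Probability of seeing zero homozygotes if the haplotype were harmless."""
    # Each animal is independently homozygous with probability hap_freq^2.
    return (1.0 - hap_freq ** 2) ** n_animals


# Illustrative values only: 10,000 genotyped animals, haplotype frequency 2%.
n, freq = 10_000, 0.02
print(f"expected homozygotes: {expected_homozygotes(n, freq):.1f}")
print(f"P(no homozygotes observed): {prob_no_homozygotes(n, freq):.4f}")
```

A small probability of observing no homozygotes, as in this illustrative case, flags the haplotype as a candidate for further investigation rather than as proof of lethality.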

It is important to note that genetic heterogeneity, which is more commonly reported in humans, is increasingly observed in livestock. Examples of genetic heterogeneity observed in livestock are bovine maple syrup urine disease in Poll Hereford and Shorthorn cattle (Healy & Dennis

1994), and neuronal ceroid lipofuscinosis in sheep (Tammen et al. 2006). In addition to genetic heterogeneity, phenocopies need to be considered when investigating inherited disease in ruminants. Environmental factors need to be considered, especially for lysosomal storage diseases, where clinical signs of a disease can be the result of the ingestion of plant toxins or an


inherited metabolic disorder, as seen in mannosidosis cases in cattle (Hocking et al. 1972;

Dorling et al. 1978; Huxtable & Dorling 1982).

4. Identifying genes and mutations for inherited diseases

The emergence of a new disease is often first met with the investigation of numerous potential aetiologies such as metabolic, toxic and infectious disease mechanisms before genetic causes are considered. Congenital arthrogryposis-hydranencephaly caused by intrauterine infection with

Akabane virus via transmission from biting midges or mosquitoes is a prime example of how similar clinical signs in a disease can be the result of infectious sources (Jagoe et al. 1993).

Once an inherited disease is suspected, the next obstacle to overcome is establishing the mode of inheritance, and ideally the identification of a causal mutation. To identify genes and causal mutations, approaches can be tailored depending on the resources available. Historically, pedigree analysis and test crosses have been used to establish the mode of inheritance (Man et al. 2007). This approach has proven easier for dominant diseases, whereas identifying heterozygous animals for recessive conditions can be problematic when the causal variant is unknown and there are no accurate genotyping tests available.

The identification of causal mutations has accelerated in recent decades owing to the advances in genotyping and genetic sequencing technologies and the bioinformatics tools available

(McKenna et al. 2010; DePristo et al. 2011; Gut 2013; Heather & Chain 2016). Due to these


advances ranging from the use of microsatellite markers for genotyping to next generation sequencing technologies, mapping and variant discovery have become more efficient and more cost-effective due to the number of approaches that can be used (Niedringhaus et al. 2011).

Initially, genotyping animals with microsatellite markers followed by linkage analysis or homozygosity mapping was used to identify areas of interest, which was followed by fine mapping and Sanger sequencing of positional candidate genes (Hearne et al. 1992). The development of high throughput SNP genotyping made mapping more efficient and, increasingly, genome-wide association studies (GWAS) are used to identify regions associated with the condition (Hearne et al. 1992; Purfield et al. 2012; Bolormaa et al. 2013).

The transition from Sanger sequencing (considered to be the first generation of sequencing) to parallel sequencing using DNA libraries attached to beads that are washed over wells within picolitre reaction plates signified the advancement to next generation sequencing (NGS; second generation sequencing) (Margulies et al. 2005; Niedringhaus et al. 2011; Heather & Chain

2016). This mass production of reads meant that short reads ranging up to 400 bp could be generated very quickly with high yields (Heather & Chain 2016). The third generation of sequencing, also known as long read sequencing, signified a dynamic shift from producing short reads to producing long contiguous reads up to 10 Kb in length. These long reads can help identify large structural variants, which can be problematic in short read sequencing methods

(Eid et al. 2009; Flusberg et al. 2010; Amarasinghe et al. 2020). The first method of long read sequencing was developed by Pacific Biosciences, using the single molecule real time sequencing technology (Eid et al. 2009; Pollard et al. 2018). This approach again used parallel


sequencing but instead utilised polymerases joined to single molecules of DNA that had hairpin adaptors ligated to each molecule of DNA (Eid et al. 2009; Pollard et al. 2018).

These technologies paved the way for current approaches in inherited disease investigation.

Multiple approaches can therefore be taken in order to determine the genetic aetiology of an inherited disease. Each sequencing technology can be used for specific purposes, whether that is the sequencing of individual genes as often used for Sanger sequencing, next generation sequencing for generation of individual genomes, or long read sequencing, where large structural variants can be mapped and identified more accurately for disease investigation.

4.1 Candidate gene analysis

The candidate gene approach is popular if the underlying biological mechanism is understood and the deficiency of a specific protein is thought to be disease-causing (Zhu & Zhao 2007;

Masoudi-Nejad et al. 2012). Before reference genomes were available, various approaches were used to obtain the wildtype sequence of a candidate gene, and either genomic DNA or cDNA of affected animals was Sanger sequenced to identify disease-causing variants (Goldmann et al.

1990; Zhang et al. 1990; Tan et al. 1997). The availability of bovine and ovine genomic data and an increased understanding of gene function and tools has made the identification and analysis of candidate genes easier (Wheeler et al. 2007; Bovine Genome et al. 2009; The International

Sheep Genomics Consortium et al. 2010; Zerbino et al. 2017; Hayes & Daetwyler 2019). This


has meant that for single-gene disorders, identification of causal variants can be rapid if sufficient resources are available (van Driel & Brunner 2006; He et al. 2020).

It is now common practice to conduct whole genome sequencing or exome sequencing followed by a targeted analysis of candidate genes for likely disease-causing variants (Zhang et al. 2019;

Letko et al. 2020; Paris et al. 2020). This approach has superseded Sanger sequencing of candidate genes particularly when multiple candidate genes are identified, as the cost of NGS is now more affordable (Boycott et al. 2013).

Candidate gene analysis is often pursued following genotyping that allows a disease to be mapped to a genome region. Reference genomes allow identification of all genes in the mapped interval and positional candidate genes to be identified based on protein function and analogous syndromes in other species (Sasaki et al. 2016; Paris et al. 2020). Cross-species analysis of inherited diseases also allows for accelerated identification of positional candidate genes. This has been shown to be useful for species with genome assemblies that have not been annotated to the same detail as either the mouse or human genome. Regions with conserved synteny can be identified between species for candidate gene identification, and this has been a successful approach that was used in the identification of a causal mutation in a recessive disorder in

Miniature Schnauzer dogs (Willet et al. 2015).

4.2 Mapping approaches

If a candidate gene cannot be identified, genotyping affected animals and control animals with

DNA markers that span the whole genome can be a useful mapping method to identify region(s) associated with disease. In the past genotyping was conducted with microsatellite markers (Ihara et al. 2004; Guichoux et al. 2011) although SNP genotyping is more commonplace due to the large number of SNPs across the genome, and greater accuracy and precision of genotyping

(Kirov et al. 2006; Gurgul et al. 2014). Genotyping can facilitate mapping of genome regions by linkage analysis (Crawford et al. 1995; Andersson 2001; Ott et al. 2015), which requires genotyping of large families in which the disease is segregating (Tate et al. 1992; Charlier et al.

1995; Grobet et al. 1997).

For recessive diseases, homozygosity mapping approaches became more attractive due to the use of SNP genotyping data, which can be combined with linkage mapping approaches to identify candidate genes (Ohba et al. 2000). Homozygosity mapping of recessive inherited diseases exploits the fact that affected individuals will inherit segments of DNA that are identical by descent if the disease is caused by a mutation in a common founder (Lander & Botstein 1987;

Ohba et al. 2000). Affected individuals, their parents, grandparents and other closely related individuals such as siblings and half-siblings, as well as unrelated controls, are useful to accurately define regions of homozygosity common to affected animals only. Assessment of

SNP genotyping data for runs of homozygosity makes this a useful approach to identify candidate genes (Charlier et al. 2008). Similarly, genome-wide association studies (GWAS) utilise SNP genotyping data to identify variants associated with traits or diseases in case vs control studies

often consisting of large numbers of genotyped individuals that are related and unrelated (Ozaki et al. 2002; Topol et al. 2007).
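The case vs control comparison at the heart of a GWAS can be illustrated for a single SNP. The sketch below assumes a basic allelic chi-square test on a 2 x 2 table of allele counts, which is only one of several tests used in practice. It ignores relatedness, population structure and multiple testing, all of which real analyses must address, and the function names and counts are hypothetical.

```python
import math


def allelic_chi_square(case_alt: int, case_ref: int,
                       control_alt: int, control_ref: int) -> float:
    """Pearson chi-square statistic (1 df) for a 2 x 2 table of allele counts."""
    observed = [[case_alt, case_ref], [control_alt, control_ref]]
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column must contain at least one allele")
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2


def p_value_1df(chi2: float) -> float:
    """Upper-tail probability of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# Illustrative allele counts for one SNP in 20 cases and 20 controls.
stat = allelic_chi_square(case_alt=30, case_ref=10, control_alt=10, control_ref=30)
print(f"chi-square = {stat:.2f}, p = {p_value_1df(stat):.2e}")
```

In a genome-wide analysis this test would be repeated for every SNP, with regions of interest identified by clusters of markers exceeding a significance threshold corrected for the number of tests.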

4.3 Microsatellites

Since the 1980s, microsatellite DNA markers have been identified for genotyping.

Microsatellites are simple sequence repeats and specific microsatellite markers can be genotyped by designing primers in flanking regions, and amplifying the different alleles for that locus

(Hearne et al. 1992; Guichoux et al. 2011). Microsatellite markers have been used extensively across a number of eukaryotic species, initially for humans (Hearne et al. 1992; Bhargava &

Fuentes 2010), and then later in cattle and sheep (Fries 1993; Crawford et al. 1995). Ovine and bovine linkage maps have been developed in international collaborations that identify the order in which these markers are present on chromosomes (Bishop et al. 1994; Crawford et al. 1995;

Kappes et al. 1997; Ihara et al. 2004).
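Because microsatellites are simple sequence repeats, candidate loci can be located in a DNA sequence by searching for short motifs repeated in tandem. The following is a minimal sketch using a regular expression with a backreference. The function name, the motif length limits and the minimum repeat number are assumptions chosen for illustration, and dedicated repeat-finding software applies more refined criteria.

```python
import re


def find_microsatellites(sequence: str, min_unit: int = 2, max_unit: int = 6,
                         min_repeats: int = 5) -> list:
    """Return (start, repeat_unit, repeat_count) for perfect tandem repeats."""
    # The lazy quantifier prefers the shortest repeat unit at each position;
    # the backreference \1 then requires further copies of that same unit.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(sequence.upper()):
        unit = match.group(1)
        hits.append((match.start(), unit, len(match.group(0)) // len(unit)))
    return hits


# Hypothetical sequence containing a (CA)8 dinucleotide repeat.
example = "TTG" + "CA" * 8 + "GGT"
print(find_microsatellites(example))
```

Primers for genotyping would then be designed in the unique sequence flanking each reported repeat, as described above.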

Microsatellite markers have been used successfully to identify genome regions that contain genes responsible for inherited diseases, as well as key production traits in livestock (Georges et al.

1993a; Georges et al. 1993b; Ron et al. 1994). The use of microsatellite markers to identify regions of homozygosity amongst affected animals led to the identification of a positional candidate gene, PITX3, and a causal variant for microphthalmia in sheep (Becker et al. 2010).

Whilst modern sequencing techniques are now available, microsatellite markers can still be used to identify regions of interest for disease gene investigation and offer opportunities for gene cloning in species that are lacking annotated genome assemblies.

4.4 SNP genotyping

SNP genotyping chips have been useful for genotyping large numbers of animals for many genetic markers due to the highly automated approach for calling genotypes (Gurgul et al. 2014).

A variety of low, medium and high density SNP chip panels are available for cattle and sheep.

For cattle, densities range from 10,000 SNPs on the Parallele SNP10K chip (Affymetrix, Santa

Clara, CA), 50,000 SNPs on the BovineSNP50K BeadChip (Illumina, San Diego, CA), to in excess of 777,000 SNPs on the Illumina Bovine HD BeadChip (Illumina, San Diego, CA). In sheep, densities range from 12,233 SNPs on the Ovine LD (12k) SNP chip (Bolormaa et al.

2015), 49,034 SNPs on the OvineSNP50 BeadChip (Illumina, San Diego, CA), to approximately

600,000 SNPs available on the Ovine Infinium® HD SNP BeadChip (Kijas et al. 2014). Low-density SNP chips are generally used for parentage verification, and four sets of SNP markers totalling 854 SNPs have been selected for Australian and New Zealand sheep flocks for this purpose (Heaton et al. 2014). Medium-density SNP chips are often used in GWAS and for homozygosity mapping (Bolormaa et al. 2013; Curik et al. 2014). High-density SNP chips allow for more detailed insight into individual genomes of animals, as well as improved accuracy for linkage disequilibrium analysis (Kijas et al. 2014; Porto-Neto et al. 2014). Both medium and high-density SNP genotyping data are a vital resource for inherited disease research, especially when considering the use of homozygosity mapping or GWAS approaches (Charlier et al. 2008;

Gurgul et al. 2014).

In homozygosity mapping, the use of SNP genotyping panels to identify runs of homozygosity

(ROH) has been paramount to the identification of genome regions that allowed for the identification of positional candidate genes for autosomal recessive diseases. The length of ROH

can indicate the inferred age of a mutation, with long ROH associated with recent inbreeding and shorter ROH associated with more historical inbreeding (Lander & Botstein 1987; Gurgul et al.

2014). However, it is important to note that compound heterozygous animals will go undetected in ROH analysis, and this should be considered when analysing ROH data for recessive inherited diseases

(Curik et al. 2014).
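The detection of ROH from SNP genotypes can be sketched as a scan for uninterrupted stretches of homozygous calls. The example below is a deliberately simplified illustration: it assumes genotypes coded as 0, 1 or 2 copies of the alternate allele, no missing calls and no tolerance for heterozygous calls within a run, whereas established software allows for genotyping errors and applies physical length and density thresholds. The function name and data are hypothetical.

```python
def runs_of_homozygosity(genotypes, positions, min_snps=4):
    """Return (start_bp, end_bp, n_snps) for runs of consecutive homozygous SNPs.

    genotypes: alternate allele counts per SNP (0 or 2 homozygous, 1 heterozygous)
    positions: base-pair positions of the same SNPs, in chromosome order
    """
    if len(genotypes) != len(positions):
        raise ValueError("genotypes and positions must have the same length")
    runs, start = [], None
    # A heterozygous sentinel at the end closes any run that reaches the last SNP.
    for i, genotype in enumerate(list(genotypes) + [1]):
        if genotype in (0, 2):
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_snps:
                runs.append((positions[start], positions[i - 1], i - start))
            start = None
    return runs


# Hypothetical genotypes for 13 SNPs spaced 100 bp apart on one chromosome.
calls = [0, 2, 2, 0, 1, 2, 2, 1, 0, 0, 0, 0, 2]
coords = list(range(100, 1400, 100))
print(runs_of_homozygosity(calls, coords))
```

For disease mapping, runs found in each affected animal would then be intersected to find segments that are homozygous for the same alleles in all cases but not in controls.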

4.5 Next generation sequencing

Whilst SNP genotyping has its benefits for genotyping across DNA markers spanning the genome of animals in a cost effective manner, NGS has allowed for cost effective sequencing of whole genomes (Mardis 2008; Buermans & den Dunnen 2014). Next generation sequencing is used to identify likely disease-causing variants in candidate genes or positional candidate genes by sequencing affected animals and comparing to reference genomes or sequences of control animals. Exome sequencing on the other hand is often used without candidate gene information or mapping information, particularly in human disease investigations (Ng et al. 2009; Ng et al.

2010; Gilissen et al. 2011), but increasingly in animals (Cosart et al. 2011; Fairfield et al. 2011).

Exome sequencing is particularly useful when diseases have already been mapped, but require further variant discovery, such as diseases with known haplotypes (McClure et al. 2014). Other

NGS approaches such as RNA sequencing can be useful approaches to validate the impact of identified mutations on gene expression.

The cost of NGS has drastically reduced (Shen et al. 2015), and increasing numbers of species have been sequenced using strategies such as whole genome sequencing (WGS) and whole

exome sequencing (WES) (Gilissen et al. 2011; Buermans & den Dunnen 2014). The use of short read sequencing and the increasing availability of whole genome sequences in databases such as the European Nucleotide Archive (ENA) (Leinonen et al. 2011) and the National Center for Biotechnology Information (NCBI) (NCBI Resource Coordinators 2018) has allowed for more diverse species to become accessible for inherited disease research.

Sequence repositories for both cattle and sheep are available, such as the 1000 Bull Genomes

Project (Hayes & Daetwyler 2019) and the Sheep Genomes Database (SheepGenomesDB), an initiative of the International Sheep Genomics Consortium (The International Sheep Genomics

Consortium et al. 2010). Access to these resources can be limited, which can quickly diminish their usefulness. The number of publicly available WGS datasets for cattle and sheep is growing, with over 2,700 bovine genomes available through the 1000 Bull Genomes Project

(Hayes & Daetwyler 2019) and approximately 2,900 ovine genomes available via

SheepGenomesDB (The International Sheep Genomics Consortium 2020) if researchers are able to contribute sequence data to these projects.

The ability to whole genome sequence affected individuals, family trios or a large cohort of animals allows a more direct approach to identifying disease genes or causal mutations. This approach has been shown to work effectively in identifying de novo mutations for a variety of disorders in cattle (Bourneuf et al. 2017).

Given that approximately 1% of the genome in humans is protein coding (Boycott et al. 2013) and similar percentages can be assumed for cattle and sheep, WES is often an excellent choice to not only reduce costs, but to also enable quick investigation of the coding regions. The first use of WES in cattle examined over 16,000 exons in 2,570 genes (Cosart et al. 2011). The use of

WES has grown and when combined with data from SNP genotyping, GWAS and/or ROH analysis, it can be a useful resource for identifying variants responsible for monogenic conditions. The use of WES to identify inherited disease in cattle has been successful, with WES used to identify a causal variant for one of three embryonic lethal haplotypes in dairy cattle

(McClure et al. 2014). One major caveat with WES is that the methods have so far focused on delivering tools for the more accurately annotated human and mouse genomes and not all causal mutations are located within exonic regions, and at times, additional re-sequencing using WGS methods is required (Marcq et al. 1998; Clop et al. 2006; McClure et al. 2014).

Whilst these sequencing technologies have been beneficial to inherited disease research, the quantity and quality of genomic data require careful curation and analysis. Sequencing depth is an important factor when identifying variants, as higher sequencing depths can assist in identifying sequencing errors within short reads (Kircher & Kelso 2010; Sims et al. 2014). In a study involving pigs, the optimal sequencing depth for 150 bp paired-end reads was identified as 10X, achieving more than 99% genome coverage for accurate variant calling (Jiang et al. 2019). Similarly, the length of reads and whether these are single or paired-end reads can help identify and place repetitive regions in the genome during assembly (Sims et al. 2014). To minimise false-positive variant discovery, multiple bioinformatics tools are available to assess the quality of data, as well as assembling and aligning data to genome assemblies.
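The relationship between the number of reads, read length and sequencing depth can be made concrete with a short calculation. The sketch below assumes the idealised Lander-Waterman model, in which reads are placed uniformly at random, so that mean depth is c = NL/G and the expected fraction of the genome covered by at least one read is 1 − e^(−c). The genome size used is a round illustrative figure and not a value from the cited studies, and real data deviate from the model because of repeats, GC bias and mapping difficulties.

```python
import math


def mean_depth(n_reads: int, read_length: int, genome_size: int) -> float:
    """Average sequencing depth c = N * L / G."""
    return n_reads * read_length / genome_size


def expected_fraction_covered(depth: float) -> float:
    """Expected fraction of bases covered at least once under random placement."""
    return 1.0 - math.exp(-depth)


def reads_needed(depth: float, read_length: int, genome_size: int) -> int:
    """Number of reads required to reach a target mean depth."""
    return math.ceil(depth * genome_size / read_length)


# Illustrative values: 150 bp reads and a genome of roughly 3 billion bases.
genome, length = 3_000_000_000, 150
for target in (1, 5, 10, 30):
    print(f"{target:>2}X: {reads_needed(target, length, genome):>12,} reads, "
          f"expected coverage {expected_fraction_covered(target):.5f}")
```

Under this idealised model a mean depth of 10X already leaves very little of the genome uncovered, although accurate genotype calling at individual sites depends on the depth at each site and not only on the average.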

The types of tools selected during the early processing stages of genomic data drastically affect the output, and the tools chosen are usually dictated by the research question that is developed

(Gut 2013). As new and updated software tools and genome assemblies become available, it is important for researchers to ensure their analyses are still considered to be in line with best practice when assessing quality of sequence data, and are able to capture a range of genomic features including both SNPs and large structural variants (DePristo et al. 2011; Kulkarni &

Frommolt 2017). Whilst it is inevitable that the data generated from next generation and long read sequencing will be analysed slightly differently between research groups, it is important that the curation of this data when uploaded to public depositories is as transparent as possible to aid other researchers in utilising their data.

4.6 Other sequencing approaches

Long read sequencing, in comparison to short read sequencing methods, enables read lengths in excess of 10 kilobases (Kb) to be generated (Amarasinghe et al. 2020). The major advantage of long read sequencing is the reduced mapping ambiguity and the ability to accurately identify large structural variants, which can be challenging if only short reads are available (Amarasinghe et al. 2020). Whilst next generation sequencing technologies are currently more cost-effective, the use of long read sequencing for genome assemblies and for improving genomic selection through understanding genome architecture is growing (Couldrey et al. 2017; Lamb et al. 2020; Upadhyay et al. 2020). Furthermore, the computational demands of long read sequencing are likely to decrease due to longer reads of between 20 and 100 Kb (Gut 2013; Pollard et al. 2018), and long read sequencing could thus become more accessible than the current next generation sequencing technologies in terms of resource input (Schadt et al. 2010). However, it is important to note that long read sequencing technologies are not yet at the same price point as short read sequencing (Amarasinghe et al. 2020), which could limit financial accessibility. The accuracy of long read sequencing, as well as the types of errors introduced, such as increased numbers of pseudo insertions and deletions, is a drawback when compared to short read sequencing (Ross et al. 2013; Hackl et al. 2014). An error rate of up to 20% in long read sequencing poses a computational challenge due to the sheer size of the data that require correction (Hackl et al. 2014). Therefore, the selection of correction pipelines for long read data is paramount to maximise the quality of genomic information obtained from long read sequencing.

In cattle and sheep, long read sequencing is still a relatively new technology that is beginning to gain traction. Recently, a combination of long read sequencing and short read sequencing was used to generate a copy number variation library for dairy cattle in New Zealand (Couldrey et al. 2017). The study identified differences in copy number variation size between the two sequencing technologies, and highlighted that the type of sequencing technology used can impact the data that are generated. The use of long read sequencing to identify structural variants accurately by providing sufficient flanking sequence surrounding breakpoints has been useful for identifying traits with multiple alleles, as observed for the poll allele in Australian Brahman cattle (Lamb et al. 2020). In sheep, the use of long read sequencing has enabled improved genome assemblies, notably the Texel Oar_v4.0 (GCA_000298735.2) assembly and the recent Rambouillet Oar_rambouillet_v1.0 (GCA_002742125.1) assembly (Liu et al. 2016). Both genome assemblies are an improvement on the Oar_v3.1 (GCA_000298735.1) assembly, with fewer gaps and a longer contig N50, where N50 is the length of the shortest contig in the set of longest contigs that together span at least 50% of the total assembly length (Liu et al. 2016).
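As a minimal sketch of the contig N50 statistic, the example below uses the conventional definition: contigs are sorted from longest to shortest, and N50 is the length of the contig at which the running total first reaches half of the total assembly length. The contig lengths are hypothetical and the function name is illustrative.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the contig N50 of an assembly.

    Contigs are visited from longest to shortest; the N50 is the length
    of the contig at which the cumulative length first reaches at least
    half of the total assembly length.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: the running total always reaches total")


if __name__ == "__main__":
    # Hypothetical contig lengths (arbitrary units), total = 300.
    contigs = [80, 70, 50, 40, 30, 20, 10]
    print(n50(contigs))  # 80 + 70 = 150 reaches half of 300, so N50 is 70
```

A more contiguous assembly places half of its sequence in fewer, longer contigs, which is why a longer N50 is read as an improvement, provided the assemblies being compared have similar total lengths.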

The next frontier in molecular genetics is single cell sequencing. Single cell sequencing (SCS) utilises next generation sequencing technologies such as WGS, WES and RNA sequencing at the level of a single cell, including across its entire genome (Zhu et al. 2017). The potential use of this in disorders that exhibit mosaicism, such as the tortoiseshell coat colour phenotype in female cats, could allow for precise identification of causal mutations across a variety of cell types (Campbell et al. 2015). SCS could also be used to screen for de novo mutations in the germline of elite animals in order to avoid disadvantageous matings, where offspring could potentially be subject to an inherited disorder. The use of SCS could therefore signify the future of individual animal genome annotation in animal breeding; however, the cost and feasibility of this venture still need to be assessed in livestock.

4.7 Validation of likely disease-causing variants

Variants identified using the above approaches need to be validated within the relevant populations to confirm segregation with disease. Validation strategies vary depending on the type of mutation and the predicted impact on protein function. These strategies can include the use of bioinformatics tools to predict the impact of a mutation on the protein (Kumar et al. 2009; Flanagan et al. 2010) and, for missense variants, assessing the conservation of amino acids across species or within conserved regions of genes through multiple species alignments. Furthermore, direct sequencing over the variant and flanking sequence using Sanger sequencing (Baudhuin et al. 2015), PCR-RFLP or real-time PCR (Pourzand & Cerutti 1993; Gibson 2006) can be useful validation tools.

Functional genomics can also be used to validate the predicted impact of a variant on protein abundance through Western blot assays, rescue experiments, gene editing using CRISPR-Cas9 or animal models (Rodenburg 2018). Whilst these approaches are extremely useful in assessing the impact of a variant, suitable sample types and size are needed in order to extract as much useful information as possible. After the validation of a likely causal variant, the segregation within relevant populations can be determined, and the management of these variants becomes paramount in reducing the risk of the birth of affected animals.

5. Management of inherited diseases in livestock populations

Management of inherited conditions with a recessive mode of inheritance and late onset of disease can be challenging. In recessive diseases, the disease allele may be present in the population for multiple generations until an affected animal that has inherited two copies is born. It is therefore vital to monitor and report animals that may present with an emerging inherited disorder (Healy 1996). The under-reporting of inherited diseases is problematic for the identification and management of emerging conditions (Windsor et al. 2011) and represents an area within the Australian cattle and sheep industries that requires significant improvement.
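The way a recessive allele can circulate unnoticed can be illustrated with a minimal sketch that assumes Hardy-Weinberg proportions (random mating, no selection, large population). This idealisation and the allele frequency used are assumptions introduced here for illustration, not estimates for any disease in this thesis.

```python
def hardy_weinberg(q: float) -> dict:
    """Expected genotype frequencies for a recessive disease allele
    with frequency q, assuming Hardy-Weinberg proportions."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("allele frequency must lie between 0 and 1")
    p = 1.0 - q
    return {
        "non_carrier": p * p,   # two normal alleles
        "carrier": 2 * p * q,   # one disease allele, clinically normal
        "affected": q * q,      # two disease alleles
    }


if __name__ == "__main__":
    q = 0.05  # hypothetical disease allele frequency
    freqs = hardy_weinberg(q)
    for genotype, freq in freqs.items():
        print(f"{genotype:>12}: {freq:.4f}")
    # Carriers outnumber affected animals by a factor of 2p/q.
    print(f"carriers per affected animal: {freqs['carrier'] / freqs['affected']:.0f}")
```

At a low allele frequency, nearly all copies of the disease allele sit in clinically normal carriers, which is consistent with an allele persisting for generations before the first affected animal is reported.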

5.1 Reporting

The reporting of genetic defects in cattle and sheep is essential to identify their frequency in at-risk populations (Dennis 1993). Despite this importance, emerging inherited diseases are not commonly reported, due either to fear of loss of income or reputation, or to misdiagnosis.

The surveillance of emerging inherited diseases in cattle and sheep populations is vital to the collection and analysis of pedigree information, accurate characterisation of the phenotype and biobanking of suitable samples. Without the collection and storage of samples from the affected animal and related animals, such as parents, siblings, half-siblings or grandparents, inherited disease studies can be compromised. A survey of members of a Swiss sheep society on the occurrence of congenital disorders in several Swiss sheep breeds revealed that the frequency of some inherited diseases was rising (Greber et al. 2013). This study used a survey format to gather primary information, and had a response rate of 31.2% (Greber et al. 2013).

This response from producers could reflect the attitude towards reporting inherited diseases, which also may not be reported due to misdiagnosis or poor lamb or calf survivability (Burns et al. 2010; Brien et al. 2014). Inherited diseases are considered a taboo subject, as they can be perceived to affect the livelihoods of producers, especially stud breeders. What is important to highlight is that inherited diseases can be managed if the correct resources are available, and the mechanisms for reporting must be made available to producers and veterinarians.

The Irish Cattle Breeding Federation (ICBF) has a dedicated webpage that allows producers to report disease on farm, including suspected genetic diseases (Irish Cattle Breeding Federation 2016). The inclusion of a congenital defect questionnaire for the reporting of inherited diseases allows disease prevalence to be determined (Irish Cattle Breeding Federation 2016). A similar system in Australia is lacking, and the cattle and sheep industries within Australia would benefit from such a database to determine the prevalence of inherited diseases. This is particularly important when elite sires or sire lines are widely used and are responsible for large proportions of the gene pool, as this could lead to a higher incidence of inherited disease. This has been observed in Angus cattle for a variety of conditions, including developmental duplication, α-mannosidosis, arthrogryposis multiplex, neuropathic hydrocephalus and contractural arachnodactyly (Teseling & Parnell 2013). When reporting emerging inherited diseases and known defects, anonymity is important to foster better relations between producers and researchers, and to encourage greater engagement. However, such an approach of anonymity can delay reporting of suspected pedigrees to the wider farming community. If producers are open to communicating information about suspected inherited conditions to the breed societies, then this should be encouraged.

5.2 Genotyping and diagnostic assays

Historically, diagnostic assays for causal variants of genetic defects have been available as individual tests for each specific disease, through polymerase chain reaction-restriction fragment length polymorphism (PCR-RFLP) assays and/or polymerase chain reaction-single-strand conformation polymorphism (PCR-SSCP) assays. Multiplexing of several diagnostic tests has reduced costs, and has been achieved through the use of TaqMan® real-time genotyping assays. Inclusion of disease variants on SNP genotyping chips that are routinely used for parentage verification and genomic EBV prediction can further reduce costs, provide higher throughput options and allow for monitoring of allele frequencies within populations.

Since the first use of PCR-RFLP to genotype mutations that affect the recognition site of a restriction enzyme (Parry et al. 1990), PCR-RFLP has been extremely popular due to its sensitivity and ease of interpretation (Pourzand & Cerutti 1993; Poli et al. 1996; Parsons & Heflich 1997; Soethout et al. 2002). However, maintaining sensitivity is important for RFLP assays, and the presence of a second native site within the PCR product allows the activity of the enzyme to be confirmed as a quality assurance step (Parry et al. 1990; O'Rourke et al. 2006). It is also important to consider that not all mutations affect a restriction enzyme recognition site and that PCR-RFLP is not always the most discriminatory method, in which case other methods of genotyping should be used (Smith et al. 2002). For improved sensitivity, PCR-SSCP assays have been used, where target DNA is amplified with labelled primers or nucleotides and the product is then denatured to single strands. This denatured product is then analysed via gel electrophoresis to identify mutation bands at different positions on an autoradiogram (Hayashi 1992). Both assays offer genotyping of mutations, but are limited by their low throughput.
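The logic of a PCR-RFLP assay with a second native restriction site as an internal digestion control can be sketched in silico. The amplicon below is entirely hypothetical, and the enzyme is assumed to recognise GAATTC and cut after the first base of the site on the top strand; the function names are illustrative and no published assay is reproduced here.

```python
from typing import List

SITE = "GAATTC"   # assumed recognition site
CUT_OFFSET = 1    # assumed cut position within the site (after the G)


def build_amplicon(variant_site: str) -> str:
    """Hypothetical 84 bp amplicon: a native control site that is always
    present, followed by the site that the variant may alter."""
    return "ACGT" * 5 + SITE + "ACGT" * 10 + variant_site + "ACGT" * 3


def digest(amplicon: str, site: str = SITE, cut_offset: int = CUT_OFFSET) -> List[int]:
    """Return top-strand fragment lengths after cutting at every site."""
    amplicon, site = amplicon.upper(), site.upper()
    cuts = []
    start = amplicon.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = amplicon.find(site, start + 1)
    bounds = [0] + cuts + [len(amplicon)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    wild_type = build_amplicon("GAATTC")  # variant site intact
    mutant = build_amplicon("GAGTTC")     # variant destroys the site
    print("wild type:", digest(wild_type))  # cut at both sites
    print("mutant:   ", digest(mutant))     # cut at the control site only
    # A heterozygote would show the union of both banding patterns.
    print("carrier:  ", sorted(set(digest(wild_type)) | set(digest(mutant)), reverse=True))
```

In this sketch, an undigested product would appear as a single 84 bp band. Because the control site cuts in every genotype, a single uncut band points to enzyme failure and cannot be mistaken for a mutant homozygote.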

Currently, high throughput genotyping assays are more attractive as a diagnostic tool, as more cattle and sheep are being genotyped for production traits. The use of real-time PCR for genotyping has allowed high throughput genotyping to be achieved, with the technology overcoming some of the shortfalls of PCR-RFLP surrounding ambiguous genotype allocation due to inefficient or incomplete enzyme digestion (Johnson et al. 2004). Real-time PCR assays are not only used for elucidating genotypes for monogenic diseases, but can also be used to detect variants that are associated with production-related traits such as disease resistance, as well as for assessing viral or bacterial load in individual samples (Konnai et al. 2003; Álvarez-Sánchez et al. 2005).

Another cost-effective option that is increasingly being used as a diagnostic tool is the inclusion of disease variants on SNP genotyping panels. Since SNP chip panels are already used by producers for parentage verification and genomic selection, the inclusion of causal mutations for inherited diseases would be beneficial for monitoring allele frequencies in the population and for producers to make informed breeding decisions. The patenting of mutations or variants can hinder the transparency of inherited disease investigation by withholding key mutation information, and can slow testing if the genotyping tests are expensive and monopolised by a particular company or organisation (Matthijs 2004). It is therefore important that mutation discovery is as transparent as possible in order to aid producers and industry in understanding the prevalence of inherited diseases and how best to manage them.

5.3 Breeding management

Whilst cattle and sheep breeding management is often designed to maximise desirable traits for production value, strategies to decrease inbreeding are becoming more important to maintain genetic diversity and reduce the risk of deleterious recessive conditions (McDaniel 2001; Pryce et al. 2012). As mentioned earlier, the use of pedigrees is paramount to maintaining accurate breeding records and to allow breeding lines to be swiftly identified if a recessive disease is suspected (Man et al. 2007). Managing non-random matings in cattle and sheep is imperative to reduce the occurrence of inbreeding depression (Wang & Hill 1999), and it is reasonable to assume that increasing homozygosity for particular genes can lead to detrimental effects upon the fitness of a population (Wang & Hill 1999). This is particularly important if the effective population size is relatively small. Small effective population sizes have been observed for a few cattle breeds as a result of the exclusive use of elite sires and high selection pressure, as observed for Holstein-Friesian and Angus cattle, which are subject to selection pressures for different traits (MacEachern et al. 2009). To improve breeding management by maximising genetic diversity whilst obtaining genetic gain, software that assists in mate selection using pedigree information and existing information about inherited diseases, such as MateSel (Kinghorn & Kinghorn 2020), can help minimise inbreeding.
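The effect of a small number of elite sires on effective population size can be illustrated with two textbook approximations: Wright's formula for unequal numbers of breeding males and females, Ne = 4NmNf / (Nm + Nf), and the expected rate of inbreeding per generation, ΔF = 1 / (2Ne). These idealised formulas are assumptions introduced here for illustration and are not the estimators used in the cited studies; the animal numbers are hypothetical.

```python
def effective_population_size(n_sires: int, n_dams: int) -> float:
    """Wright's approximation for unequal sex ratio:
    Ne = 4 * Nm * Nf / (Nm + Nf)."""
    if n_sires <= 0 or n_dams <= 0:
        raise ValueError("both sexes need at least one breeding animal")
    return 4 * n_sires * n_dams / (n_sires + n_dams)


def inbreeding_rate(ne: float) -> float:
    """Expected increase in inbreeding per generation: 1 / (2 * Ne)."""
    return 1 / (2 * ne)


if __name__ == "__main__":
    # Hypothetical breeding populations with 1000 dams each.
    for sires in (10, 50, 500):
        ne = effective_population_size(sires, 1000)
        print(f"{sires:>3} sires, 1000 dams: Ne = {ne:.1f}, "
              f"rate of inbreeding = {inbreeding_rate(ne):.4f}")
```

Under these assumptions the rarer sex dominates the result, so a population with many dams but few sires behaves genetically like a much smaller one.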

Using genomic data for breeding management concerning inherited diseases enables fast and accurate identification of affected animals for diseases with late onset, and of carrier animals for recessive diseases. This allows producers to avoid mating carrier animals to each other, and can therefore drastically reduce the number of affected animals born. This will not only improve animal welfare, but will also decrease income losses. The inclusion of DNA test results in herdbooks is essential not only for producers to make informed decisions, but also for predicting genotypes of all animals in populations using software such as GeneProb (Kerr & Kinghorn 1996).
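The benefit of knowing carrier status before mating can be shown with a minimal Mendelian sketch for a single autosomal recessive locus. The allele symbols ("N" for the normal allele, "d" for the disease allele) and the function name are illustrative; this is not a reproduction of how MateSel or GeneProb work.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def offspring_genotypes(sire: str, dam: str) -> dict:
    """Expected offspring genotype proportions for one autosomal locus.

    Each parent is a two-character genotype such as "NN", "Nd" or "dd".
    Every combination of one sire allele and one dam allele is assumed
    to be equally likely.
    """
    if len(sire) != 2 or len(dam) != 2:
        raise ValueError("genotypes must contain exactly two alleles")
    counts = Counter("".join(sorted(a + b)) for a, b in product(sire, dam))
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


if __name__ == "__main__":
    print("carrier x carrier:    ", offspring_genotypes("Nd", "Nd"))
    print("carrier x non-carrier:", offspring_genotypes("Nd", "NN"))
```

A carrier by carrier mating is expected to produce one affected ("dd") offspring in four, whereas mating a carrier to a tested non-carrier produces no affected offspring, although half of the progeny are expected to be carriers.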

6. Aims of this thesis

In this thesis, ten inherited diseases in seven breeds of sheep and cattle with a suspected recessive mode of inheritance are investigated, to identify disease-causing mutations and develop DNA diagnostic tools for industry (Table 3). Reporting and initial sample collection for these diseases occurred in the past, resulting in limitations to retrospective analyses. A range of approaches are utilised, including pedigree analysis, SNP genotyping, homozygosity mapping, candidate gene analysis, Sanger sequencing and whole genome sequencing of affected and control animals, validation of likely causative variants as well as the development of diagnostic tests.

Table 3. List of inherited diseases investigated in this thesis.

Inherited disease                                            Species     Breed                  Centre reported  Suspected MOI*
Ichthyosis fetalis                                           Bos taurus  Shorthorn, Hereford    EMAI             Recessive
Niemann-Pick type C                                          Bos taurus  Angus/Angus-cross      EMAI             Recessive
Brachygnathia, cardiomegaly and renal hypoplasia syndrome    Ovis aries  Merino                 USYD             Recessive
Pulmonary hypoplasia with anasarca                           Ovis aries  Persian/Persian-cross  USYD             Recessive
Ovine dermatosparaxis                                        Ovis aries  Merino                 EMAI             Recessive
Cervicothoracic vertebral subluxation                        Ovis aries  Merino                 EMAI             Recessive
Congenital mandibular prognathia                             Bos taurus  Droughtmaster          USYD             Recessive
Congenital blindness                                         Bos taurus  Shorthorn              EMAI             Recessive
Atypical variant of cardiomyopathy woolly haircoat syndrome  Bos taurus  Hereford               EMAI             Recessive
Congenital contractural arachnodactyly                       Bos taurus  Murray Grey            USYD             Recessive

*Mode of inheritance (MOI)

7. References

Agerholm J.S. (2008) Inherited disorders of ruminants: The sheep as a model of disease in

humans. The Veterinary Journal 177, 305-6.

Alberto F.J., Boyer F., Orozco-terWengel P., Streeter I., Servin B., de Villemereuil P.,

Benjelloun B., Librado P., Biscarini F., Colli L., Barbato M., Zamani W., Alberti A.,

Engelen S., Stella A., Joost S., Ajmone-Marsan P., Negrini R., Orlando L., Rezaei H.R.,

Naderi S., Clarke L., Flicek P., Wincker P., Coissac E., Kijas J., Tosser-Klopp G., Chikhi

A., Bruford M.W., Taberlet P. & Pompanon F. (2018) Convergent genomic signatures of

domestication in sheep and goats. Nature Communications 9, 813-22.

Álvarez-Sánchez M.A., Pérez-García J., Cruz-Rojo M.A. & Rojo-Vázquez F.A. (2005) Real time

PCR for the diagnosis of benzimidazole resistance in trichostrongylids of sheep.

Veterinary Parasitology 129, 291-8.

Amarasinghe S.L., Su S., Dong X., Zappia L., Ritchie M.E. & Gouil Q. (2020) Opportunities and

challenges in long-read sequencing data analysis. Genome Biology 21, 1-16.

Andersson L. (2001) Genetic dissection of phenotypic diversity in farm animals. Nature Reviews

Genetics 2, 130-8.

Andersson L. (2013) Molecular consequences of animal breeding. Current Opinion in Genetics

& Development 23, 295-301.

Badano J.L. & Nicholas K. (2002) Human genetics and disease: Beyond Mendel: An evolving

view of human genetic disease transmission. Nature Reviews Genetics 3, 779-89.

Baudhuin L.M., Lagerstedt S.A., Klee E.W., Fadra N., Oglesbee D. & Ferber M.J. (2015)

Confirming variants in next-generation sequencing panel testing by Sanger sequencing.

The Journal of Molecular Diagnostics 17, 456-61.

Becker D., Tetens J., Brunner A., Bürstel D., Ganter M., Kijas J., for the International Sheep

Genomics Consortium & Drögemüller C. (2010) Microphthalmia in Texel sheep is associated with

a missense mutation in the paired-like homeodomain 3 (PITX3) gene. PLoS ONE 5,

e8689:1-9.

Bhargava A. & Fuentes F.F. (2010) Mutational dynamics of microsatellites. Molecular

Biotechnology 44, 250-66.

Bijma P. (2012) Accuracies of estimated breeding values from ordinary genetic evaluations do

not reflect the correlation between true and estimated breeding values in selected

populations. Journal of Animal Breeding and Genetics 129, 345-58.

Bishop M.D., Kappes S.M., Keele J.W., Stone R.T., Sunden S.L., Hawkins G.A., Toldo S.S.,

Fries R., Grosz M.D. & Yoo J. (1994) A genetic linkage map for cattle. Genetics 136,

619-39.

Blouin S.F. & Blouin M. (1988) Inbreeding avoidance behaviors. Trends in Ecology & Evolution

3, 230-3.

Boerner V., Johnston D.J. & Tier B. (2014) Accuracies of genomically estimated breeding values

from pure-breed and across-breed predictions in Australian beef cattle. Genetics Selection

Evolution 46, 1-11.

Bolormaa S., Gore K., van der Werf J.H.J., Hayes B.J. & Daetwyler H.D. (2015) Design of a

low-density SNP chip for the main Australian sheep breeds and its effect on imputation

and genomic prediction accuracy. Animal Genetics 46, 544-56.

Bolormaa S., Pryce J.E., Kemper K.E., Hayes B.J., Zhang Y., Tier B., Barendse W., Reverter A.

& Goddard M.E. (2013) Detection of quantitative trait loci in Bos indicus and Bos taurus

cattle using genome-wide association studies. Genetics Selection Evolution 45, 43-55.

Botstein D. & Risch N. (2003) Discovering genotypes underlying human phenotypes: Past

successes for Mendelian disease, future approaches for complex disease. Nature Genetics

33, 228-37.

Bourneuf E., Otz P., Pausch H., Jagannathan V., Michot P., Grohs C., Piton G., Ammermüller S.,

Deloche M.C., Fritz S., Leclerc H., Péchoux C., Boukadiri A., Hozé C., Saintilan R.,

Créchet F., Mosca M., Segelke D., Guillaume F., Bouet S., Baur A., Vasilescu A.,

Genestout L., Thomas A., Allais-Bonnet A., Rocha D., Colle M.A., Klopp C., Esquerré

D., Wurmser C., Flisikowski K., Schwarzenbacher H., Burgstaller J., Brügmann M.,

Dietschi E., Rudolph N., Freick M., Barbey S., Fayolle G., Danchin-Burge C., Schibler

L., Bed’Hom B., Hayes B.J., Daetwyler H.D., Fries R., Boichard D., Pin D., Drögemüller

C. & Capitan A. (2017) Rapid discovery of de novo deleterious mutations in cattle

enhances the value of livestock as model species. Scientific Reports 7, 11466-84.

Bovine Genome Sequencing and Analysis Consortium, Elsik C.G., Tellam R.L., Worley K.C., Gibbs R.A., Muzny

D.M., Weinstock G.M., Adelson D.L., Eichler E.E., Elnitski L., Guigó R., Hamernik

D.L., Kappes S.M., Lewin H.A., Lynn D.J., Nicholas F.W., Reymond A., Rijnkels M.,

Skow L.C., Zdobnov E.M., Schook L., Womack J., Alioto T., Antonarakis S.E., Astashyn

A., Chapple C.E., Chen H.-C., Chrast J., Câmara F., Ermolaeva O., Henrichsen C.N.,

Hlavina W., Kapustin Y., Kiryutin B., Kitts P., Kokocinski F., Landrum M., Maglott D.,

Pruitt K., Sapojnikov V., Searle S.M., Solovyev V., Souvorov A., Ucla C., Wyss C.,

Anzola J.M., Gerlach D., Elhaik E., Graur D., Reese J.T., Edgar R.C., McEwan J.C.,

Payne G.M., Raison J.M., Junier T., Kriventseva E.V., Eyras E., Plass M., Donthu R.,

Larkin D.M., Reecy J., Yang M.Q., Chen L., Cheng Z., Chitko-McKown C.G., Liu G.E.,

Matukumalli L.K., Song J., Zhu B., Bradley D.G., Brinkman F.S.L., Lau L.P.L.,

Whiteside M.D., Walker A., Wheeler T.T., Casey T., German J.B., Lemay D.G.,

Maqbool N.J., Molenaar A.J., Seo S., Stothard P., Baldwin C.L., Baxter R., Brinkmeyer-

Langford C.L., Brown W.C., Childers C.P., Connelley T., Ellis S.A., Fritz K., Glass E.J.,

Herzig C.T.A., Iivanainen A., Lahmers K.K., Bennett A.K., Dickens C.M., Gilbert

J.G.R., Hagen D.E., Salih H., Aerts J., Caetano A.R., Dalrymple B., Garcia J.F., Gill

C.A., Hiendleder S.G., Memili E., Spurlock D., Williams J.L., Alexander L., Brownstein

M.J., Guan L., Holt R.A., Jones S.J.M., Marra M.A., Moore R., Moore S.S., Roberts A.,

Taniguchi M., Waterman R.C., Chacko J., Chandrabose M.M., Cree A., Dao M.D., Dinh

H.H., Gabisi R.A., Hines S., Hume J., Jhangiani S.N., Joshi V., Kovar C.L., Lewis L.R.,

Liu Y.-S., Lopez J., Morgan M.B., Nguyen N.B., Okwuonu G.O., Ruiz S.J., Santibanez

J., Wright R.A., Buhay C., Ding Y., Dugan-Rocha S., Herdandez J., Holder M., Sabo A.,

Egan A., Goodell J., Wilczek-Boney K., Fowler G.R., Hitchens M.E., Lozado R.J., Moen

C., Steffen D., Warren J.T., Zhang J., Chiu R., Schein J.E., Durbin K.J., Havlak P., Jiang

H., Liu Y., Qin X., Ren Y., Shen Y., Song H., Bell S.N., Davis C., Johnson A.J., Lee S.,

Nazareth L.V., Patel B.M., Pu L.-L., Vattathil S., Williams R.L., Jr., Curry S., Hamilton

C., Sodergren E., Wheeler D.A., Barris W., Bennett G.L., Eggen A., Green R.D., Harhay

G.P., Hobbs M., Jann O., Keele J.W., Kent M.P., Lien S., McKay S.D., McWilliam S.,

Ratnakumar A., Schnabel R.D., Smith T., Snelling W.M., Sonstegard T.S., Stone R.T.,

Sugimoto Y., Takasuga A., Taylor J.F., Van Tassell C.P., Macneil M.D., Abatepaulo

A.R.R., Abbey C.A., Ahola V., Almeida I.G., Amadio A.F., Anatriello E., Bahadue S.M.,

Biase F.H., Boldt C.R., Carroll J.A., Carvalho W.A., Cervelatti E.P., Chacko E., Chapin

J.E., Cheng Y., Choi J., Colley A.J., de Campos T.A., De Donato M., Santos I.K.F.d.M., de Oliveira C.J.F., Deobald H., Devinoy E., Donohue K.E., Dovc P., Eberlein A.,

Fitzsimmons C.J., Franzin A.M., Garcia G.R., Genini S., Gladney C.J., Grant J.R.,

Greaser M.L., Green J.A., Hadsell D.L., Hakimov H.A., Halgren R., Harrow J.L., Hart

E.A., Hastings N., Hernandez M., Hu Z.-L., Ingham A., Iso-Touru T., Jamis C., Jensen

K., Kapetis D., Kerr T., Khalil S.S., Khatib H., Kolbehdari D., Kumar C.G., Kumar D.,

Leach R., Lee J.C.M., Li C., Logan K.M., Malinverni R., Marques E., Martin W.F.,

Martins N.F., Maruyama S.R., Mazza R., McLean K.L., Medrano J.F., Moreno B.T.,

Moré D.D., Muntean C.T., Nandakumar H.P., Nogueira M.F.G., Olsaker I., Pant S.D.,

Panzitta F., Pastor R.C.P., Poli M.A., Poslusny N., Rachagani S., Ranganathan S., Razpet

A., Riggs P.K., Rincon G., Rodriguez-Osorio N., Rodriguez-Zas S.L., Romero N.E.,

Rosenwald A., Sando L., Schmutz S.M., Shen L., Sherman L., Southey B.R., Lutzow

Y.S., Sweedler J.V., Tammen I., Telugu B.P.V.L., Urbanski J.M., Utsunomiya Y.T.,

Verschoor C.P., Waardenberg A.J., Wang Z., Ward R., Weikard R., Welsh T.H., Jr.,

White S.N., Wilming L.G., Wunderlich K.R., Yang J. & Zhao F.-Q. (2009) The genome

sequence of taurine cattle: A window to ruminant biology and evolution. Science 324,

522-8.

Boycott K.M., Vanstone M.R., Bulman D.E. & Mackenzie A.E. (2013) Rare-disease genetics in

the era of next-generation sequencing: Discovery to translation. Nature Reviews Genetics

14, 681-91.

Brien F.D., Cloete S.W.P., Fogarty N.M., Greeff J.C., Hebart M.L., Hiendleder S., Edwards

J.E.H., Kelly J.M., Kind K.L., Kleemann D.O., Plush K.L. & Miller D.R. (2014) A

review of the genetic and epigenetic factors affecting lamb survival. Animal Production

Science 54, 667-93.

Bruford M.W., Bradley D.G. & Luikart G. (2003) DNA markers reveal the complexity of

livestock domestication. Nature Reviews Genetics 4, 900-10.

Buermans H.P.J. & den Dunnen J.T. (2014) Next generation sequencing technology: Advances

and applications. Biochimica et Biophysica Acta 1842, 1932-41.

Bult C.J., Blake J.A., Smith C.L., Kadin J.A. & Richardson J.E. (2019) Mouse Genome Database

(MGD) 2019. Nucleic Acids Research 47, D801-6.

Burns B.M., Fordyce G. & Holroyd R.G. (2010) A review of factors that impact on the capacity

of beef cattle females to conceive, maintain a pregnancy and wean a calf—Implications

for reproductive efficiency in northern Australia. Animal Reproduction Science 122, 1-22.

Campbell I.M., Shaw C.A., Stankiewicz P. & Lupski J.R. (2015) Somatic mosaicism:

Implications for disease and transmission genetics. Trends in Genetics 31, 382-92.

Cavanagh J., Tammen I., Windsor P., Bateman J., Savarirayan R., Nicholas F. & Raadsma H.

(2007) Bulldog dwarfism in Dexter cattle is caused by mutations in ACAN. Mammalian

Genome 18, 808-14.

Charlesworth B. & Charlesworth D. (1999) The genetic basis of inbreeding depression. Genetics

Research 74, 329-40.

Charlier C., Coppieters W., Farnir F., Grobet L., Leroy P.L., Michaux C., Mni M., Schwers A.,

Vanmanshoven P., Hanset R. & Georges M. (1995) The mh gene causing double-muscling

in cattle maps to bovine Chromosome 2. Mammalian Genome 6, 788-92.

Charlier C., Coppieters W., Rollin F., Desmech D., Agerholm J.S., Cambisano N., Carta E.,

Dardano S., Dive M., Fasquelle C., Fennet J.C., Hanset R., Hubin X., Jorgensen C.,

Karim L., Kent M., Harvey K., Pearce B.R., Simon P., Tama N., Nie H., Vandeputte S.,

Lien S., Longeri M., Fredholm M., Harvey R.J. & Georges M. (2008) Highly effective

SNP-based association mapping and management of recessive defects in livestock.

Nature Genetics 40, 449-54.

Charlier C., Denys B., Belanche J.I., Coppieters W., Grobet L., Mni M., Womack J., Hanset R.

& Georges M. (1996) Microsatellite mapping of the bovine roan locus: A major

determinant of White Heifer Disease. Mammalian Genome 7, 138-42.

Cibelli J., Emborg M.E., Prockop D.J., Roberts M., Schatten G., Rao M., Harding J. &

Mirochnitchenko O. (2013) Strategies for improving animal models for regenerative

medicine. Cell Stem Cell 12, 271-4.

Clark S., Hickey J., Daetwyler H. & van der Werf J. (2012) The importance of information on

relatives for the prediction of genomic breeding values and the implications for the

makeup of reference data sets in livestock breeding schemes. Genetics Selection

Evolution 44, 4-13.

Clop A., Marcq F., Takeda H., Pirottin D., Tordoir X., Bibé B., Bouix J., Caiment F., Elsen J.M.,

Eychenne F., Larzul C., Laville E., Meish F., Milenkovic D., Tobin J., Charlier C. &

Georges M. (2006) A mutation creating a potential illegitimate microRNA target site in

the myostatin gene affects muscularity in sheep. Nature Genetics 38, 813-8.

Conington J., Gibbons J., Haskell M. & Bünger L. (2010) The use of breeding to improve animal

welfare. In: Proceedings of 9th World Congress on Genetics Applied to Livestock

Production, Leipzig, Germany. German Society for Animal Science, Leipzig, Germany.

Cosart T., Beja-Pereira A., Chen S., Ng S.B., Shendure J. & Luikart G. (2011) Exome-wide

DNA capture and next generation sequencing in domestic and wild species. BMC

Genomics 12, 347-55.

Couldrey C., Keehan M., Johnson T., Tiplady K., Winkelman A., Littlejohn M.D., Scott A.,

Kemper K.E., Hayes B., Davis S.R. & Spelman R.J. (2017) Detection and assessment of

copy number variation using PacBio long-read and Illumina sequencing in New Zealand

dairy cattle. Journal of Dairy Science 100, 5472-8.

Crawford A.M., Dodds K.G., Ede A.J., Pierson C.A., Montgomery G.W., Garmonsway H.G.,

Beattie A.E., Davies K., Maddox J.F. & Kappes S.W. (1995) An autosomal genetic

linkage map of the sheep genome. Genetics 140, 703-24.

Cundiff L.V., Gregory K.E., Koch R.M. & Dickerson G.E. (1986) Genetic diversity among cattle

breeds and its use to increase beef production efficiency in a temperate environment.

Curik I., Ferenčaković M. & Sölkner J. (2014) Inbreeding and runs of homozygosity: A possible

solution to an old problem. Livestock Science 166, 26-34.

Dennis J.A., Healy P.J., Beaudet A.L. & O'Brien W.E. (1989) Molecular definition of bovine

argininosuccinate synthetase deficiency. Proceedings of the National Academy of

Sciences 86, 7947-51.

Dennis S.M. (1993) Congenital defects of sheep. Veterinary Clinics of North America: Food

Animal Practice 9, 203-17.

DePristo M.A., Banks E., Poplin R., Garimella K.V., Maguire J.R., Hartl C., Philippakis A.A.,

del Angel G., Rivas M.A., Hanna M., McKenna A., Fennell T.J., Kernytsky A.M.,

Sivachenko A.Y., Cibulskis K., Gabriel S.B., Altshuler D. & Daly M.J. (2011) A

framework for variation discovery and genotyping using next-generation DNA

sequencing data. Nature Genetics 43, 491-502.

Diskin M.G. & Morris D.G. (2008) Embryonic and early foetal losses in cattle and other

ruminants. Reproduction in Domestic Animals 43, 260-7.

Dorling P.R., Huxtable C.R. & Vogel P. (1978) Lysosomal storage in Swainsona spp. toxicosis:

An induced mannosidosis. Neuropathology and Applied Neurobiology 4, 285-95.

Eid J., Fehr A., Gray J., Luong K., Lyle J., Otto G., Peluso P., Rank D., Baybayan P., Bettman

B., Bibillo A., Bjornson K., Chaudhuri B., Christians F., Cicero R., Clark S., Dalal R.,

deWinter A., Dixon J., Foquet M., Gaertner A., Hardenbol P., Heiner C., Hester K.,

Holden D., Kearns G., Kong X., Kuse R., Lacroix Y., Lin S., Lundquist P., Ma C., Marks

P., Maxham M., Murphy D., Park I., Pham T., Phillips M., Roy J., Sebra R., Shen G.,

Sorenson J., Tomaney A., Travers K., Trulson M., Vieceli J., Wegener J., Wu D., Yang

A., Zaccarin D., Zhao P., Zhong F., Korlach J. & Turner S. (2009) Real-time DNA

sequencing from single polymerase molecules. Science 323, 133-8.

Faber D.C., Molina J.A., Ohlrichs C.L., Vander Zwaag D.F. & Ferré L.B. (2003)

Commercialization of animal biotechnology. Theriogenology 59, 125-38.

Fairfield H., Gilbert G.J., Barter M., Corrigan R.R., Curtain M., Ding Y., D'Ascenzo M.,

Gerhardt D.J., He C. & Huang W. (2011) Mutation discovery in mice by whole exome

sequencing. Genome Biology 12, R86-98.

Flanagan S.E., Patch A.M. & Ellard S. (2010) Using SIFT and PolyPhen to predict loss-of-

function and gain-of-function mutations. Genetic Testing and Molecular Biomarkers 14,

533-7.

Flint A.P.F. & Woolliams J.A. (2008) Precision animal breeding. Philosophical transactions of

the Royal Society of London. Series B, Biological sciences 363, 573-90.

Flusberg B.A., Webster D.R., Lee J.H., Travers K.J., Olivares E.C., Clark T.A., Korlach J. &

Turner S.W. (2010) Direct detection of DNA methylation during single-molecule, real-

time sequencing. Nature Methods 7, 461-5.

Fries R. (1993) Mapping the bovine genome: Methodological aspects and strategy*. Animal

Genetics 24, 111-6.

Fujii J., Otsu K., Zorzato F., Khanna V., Weiler J., O'Brien P. & MacLennan D. (1991)

Identification of a mutation in porcine ryanodine receptor associated with malignant

hyperthermia. Science 253, 448-51.

Georges M., Charlier C. & Hayes B. (2019) Harnessing genomic information for livestock

improvement. Nature Reviews Genetics 20, 135-56.

Georges M., Dietz A.B., Mishra A., Nielsen D., Sargeant L.S., Sorensen A., Steele M.R., Zhao

X., Leipold H. & Womack J.E. (1993a) Microsatellite mapping of the gene causing

weaver disease in cattle will allow the study of an associated quantitative trait locus.

Proceedings of the National Academy of Sciences 90, 1058-62.

Georges M., Drinkwater R., King T., Mishra A., Moore S.S., Nielsen D., Sargeant L.S., Sorensen

A., Steele M.R., Zhao X., Womack J.E. & Hetzel J. (1993b) Microsatellite mapping of a

gene affecting horn development in Bos taurus. Nature Genetics 4, 206-10.

Gibson N.J. (2006) The use of real-time PCR methods in DNA sequence variation analysis.

Clinica Chimica Acta 363, 32-47.

Gibson T. & Jackson E. (2017) Predictable inherited diseases in purebred companion animals.

Gilissen C., Hoischen A., Brunner H.G. & Veltman J.A. (2011) Unlocking Mendelian disease

using exome sequencing. Genome Biology 12, 228-39.

Goldmann W., Hunter N., Foster J.D., Salbaum J.M., Beyreuther K. & Hope J. (1990) Two

alleles of a neural protein gene linked to scrapie in sheep. Proceedings of the National

Academy of Sciences of the United States of America 87, 2476-80.

Greber D., Doherr M., Drögemüller C. & Steiner A. (2013) Occurrence of congenital disorders

in Swiss sheep. Acta Veterinaria Scandinavica 55, 27-34.

Grobet L., Royo Martin L.J., Poncelet D., Pirottin D., Brouwers B., Riquet J., Schoeberlein A.,

Dunner S., Ménissier F., Massabanda J., Fries R., Hanset R. & Georges M. (1997) A

deletion in the bovine myostatin gene causes the double–muscled phenotype in cattle.

Nature Genetics 17, 71-4.

Groeneveld L.F., Lenstra J.A., Eding H., Toro M.A., Scherf B., Pilling D., Negrini R., Finlay

E.K., Jianlin H., Groeneveld E., Weigend S. & The GLOBALDIV Consortium (2010) Genetic

diversity in farm animals – A review. Animal Genetics 41, 6-31.

Guichoux E., Lagache L., Wagner S., Chaumeil P., Léger P., Lepais O., Lepoittevin C.,

Malausa T., Revardel E., Salin F. & Petit R.J. (2011) Current trends in microsatellite

genotyping. Molecular Ecology Resources 11, 591-611.

Gurda B.L. & Vite C.H. (2019) Large animal models contribute to the development of therapies

for central and peripheral nervous system dysfunction in patients with lysosomal storage

diseases. Human Molecular Genetics 28, R119-31.

Gurgul A., Semik E., Pawlina K., Szmatola T., Jasielczuk I. & Bugno-Poniewierska M. (2014)

The application of genome-wide SNP genotyping methods in studies on livestock

genomes. Journal of Applied Genetics 55, 197-208.

Gut I.G. (2013) New sequencing technologies. Clinical and Translational Oncology 15, 879-81.

Hackl T., Hedrich R., Schultz J. & Förster F. (2014) proovread: large-scale high-accuracy

PacBio correction through iterative short read consensus. Bioinformatics 30, 3004-11.

Hamosh A., Scott A.F., Amberger J., Valle D. & McKusick V.A. (2000) Online Mendelian

Inheritance in Man (OMIM). Human Mutation 15, 57-61.

Harper P., Latter M., Nicholas F., Cook R. & Gill P. (1998) Chondrodysplasia in Australian

Dexter cattle. Australian Veterinary Journal 76, 199-202.

Harper P.A.W., Healy P.J., Dennis J.A., O'Brien J.J. & Rayward D.H. (1986) Citrullinaemia as a

cause of neurological disease in neonatal Friesian calves. Australian Veterinary Journal

63, 378-9.

Hayashi K. (1992) PCR-SSCP: A method for detection of mutations. Genetic Analysis:

Biomolecular Engineering 9, 73-9.

Hayes B.J. & Daetwyler H.D. (2019) 1000 Bull genomes project to map simple and complex

genetic traits in cattle: Applications and outcomes. Annual Review of Animal Biosciences

7, 89-102.

Hayes B.J., Donoghue K.A., Reich C.M., Mason B.A., Bird-Gardiner T., Herd R.M. & Arthur

P.F. (2016) Genomic heritabilities and genomic estimated breeding values for methane

traits in Angus cattle. Journal of Animal Science 94, 902-8.

Hayes B.J., Lewin H.A. & Goddard M.E. (2013) The future of livestock breeding: genomic

selection for efficiency, reduced emissions intensity, and adaptation. Trends in Genetics

29, 206-14.

He S., Zhang Z., Sun Y., Ren T., Li W., Zhou X., Michal J.J., Jiang Z. & Liu M. (2020)

Genome‐wide association study shows that microtia in Altay sheep is caused by a 76 bp

duplication of HMX1. Animal Genetics 51, 132-6.

Healy P.J. (1996) Testing for undesirable traits in cattle: An Australian perspective. Journal of

Animal Science 74, 917–22.

Healy P.J. & Dennis J.A. (1994) Molecular heterogeneity for bovine maple syrup urine disease.

Animal Genetics 25, 329-32.

Hearne C.M., Ghosh S. & Todd J.A. (1992) Microsatellites for linkage analysis of genetic traits.

pp. 288-94. Elsevier Ltd.

Heather J.M. & Chain B. (2016) The sequence of sequencers: The history of sequencing DNA.

Genomics 107, 1-8.

Heaton M.P., Leymaster K.A., Kalbfleisch T.S., Kijas J.W., Clarke S.M., McEwan J., Maddox

J.F., Basnayake V., Petrik D.T., Simpson B., Smith T.P.L., Chitko-McKown C.G. & the

International Sheep Genomics Consortium (2014) SNPs for parentage testing and traceability

in globally diverse breeds of sheep. PLoS ONE 9, e94851:1-10.

Hedrick P.W. (2012) What is the evidence for heterozygote advantage selection? Trends in

Ecology & Evolution 27, 698-704.

Hedrick P.W. (2014) Heterozygote advantage: The effect of artificial selection in livestock and

pets. Journal of Heredity 106, 141-54.

Hocking J.D., Jolly R.D. & Batt R.D. (1972) Deficiency of α-mannosidase in Angus cattle. An

inherited lysosomal storage disease. Biochemical Journal 128, 69-78.

Huxtable C.R. & Dorling P.R. (1982) Poisoning of livestock by Swainsona spp.: Current status.

Australian Veterinary Journal 59, 50-3.

Ihara N., Takasuga A., Mizoshita K., Takeda H., Sugimoto M., Mizoguchi Y., Hirano T., Itoh T.,

Watanabe T. & Reed K.M. (2004) A comprehensive genetic map of the cattle genome

based on 3802 microsatellites. Genome Research 14, 1987-98.

Irish Cattle Breeding Federation (2016) Health and Disease. Accessed 26th October 2020. URL:

https://www.icbf.com/wp/?page_id=2170.

Jagoe S., Kirkland P.D. & Harper P.A.W. (1993) An outbreak of Akabane virus—induced

abnormalities in calves after agistment in an endemic region. Australian Veterinary

Journal 70, 56-8.

Jiang Y., Jiang Y., Wang S., Zhang Q. & Ding X. (2019) Optimal sequencing depth design for

whole genome re-sequencing in pigs. BMC Bioinformatics 20, 556-68.

Johnson J.L., Leipold H.W., Schalles R.R., Guffy M.M., Peeples J.G., Castleberry R.S. &

Schneider H.J. (1981) Hereditary polydactyly in Simmental cattle. Journal of Heredity

72, 205-8.

Johnson V.J., Yucesoy B. & Luster M.I. (2004) Genotyping of single nucleotide polymorphisms

in cytokine genes using real-time PCR allelic discrimination technology. Cytokine 27,

135-41.

Kappes S.M., Keele J.W., Stone R.T., McGraw R.A., Sonstegard T.S., Smith T.P., Lopez-

Corrales N.L. & Beattie C.W. (1997) A second-generation linkage map of the bovine

genome. Genome Research 7, 235-49.

Kennedy B.W., Verrinder Gibbins A.M., Gibson J.P. & Smith C. (1990) Coalescence of

molecular and quantitative genetics for livestock improvement. Journal of Dairy Science

73, 2619-27.

Kerr R.J. & Kinghorn B.P. (1996) An efficient algorithm for segregation analysis in large

populations. Journal of Animal Breeding and Genetics 113, 457-69.

Kijas J.W., Porto-Neto L., Dominik S., Reverter A., Bunch R., McCulloch R., Hayes B.J.,

Brauning R. & McEwan J. (2014) Linkage disequilibrium over short physical distances

measured in sheep using a high-density SNP chip. Animal Genetics 45, 754-7.

Kinghorn B. & Kinghorn S. (2020) MateSel. Accessed 12th October 2020. URL:

https://www.matesel.com/.

Kircher M. & Kelso J. (2010) High-throughput DNA sequencing – concepts and limitations.

Bioessays 32, 524-36.

Kirov G., Nikolov I., Georgieva L., Moskvina V., Owen M.J. & O'Donovan M.C. (2006) Pooled

DNA genotyping on Affymetrix SNP genotyping arrays. BMC Genomics 7, 1-10.

Konnai S., Usui T., Ohashi K. & Onuma M. (2003) The rapid quantitative analysis of bovine

cytokine genes by real-time RT-PCR. Veterinary Microbiology 94, 283-94.

Kristensen T.N. & Sørensen A.C. (2005) Inbreeding – lessons from animal breeding,

evolutionary biology and conservation genetics. Animal Science 80, 121-33.

Kulkarni P. & Frommolt P. (2017) Challenges in the setup of large-scale next-generation

sequencing analysis workflows. Computational and Structural Biotechnology Journal 15,

471-7.

Kumar P., Henikoff S. & Ng P.C. (2009) Predicting the effects of coding non-synonymous

variants on protein function using the SIFT algorithm. Nature Protocols 4, 1073-81.

Lamb H.J., Ross E.M., Nguyen L.T., Lyons R.E., Moore S.S. & Hayes B.J. (2020)

Characterization of the poll allele in Brahman cattle using long-read Oxford Nanopore

sequencing. Journal of Animal Science 98, 1-5.

Lande R. & Thompson R. (1990) Efficiency of marker-assisted selection in the improvement of

quantitative traits. Genetics 124, 743-56.

Lander E. & Botstein D. (1987) Homozygosity mapping: a way to map human recessive traits

with the DNA of inbred children. Science 236, 1567-70.

Leinonen R., Akhtar R., Birney E., Bower L., Cerdeno-Tárraga A., Cheng Y., Cleland I.,

Faruque N., Goodgame N., Gibson R., Hoad G., Jang M., Pakseresht N., Plaister S.,

Radhakrishnan R., Reddy K., Sobhany S., Ten Hoopen P., Vaughan R., Zalunin V. &

Cochrane G. (2011) The European Nucleotide Archive. Nucleic Acids Research 39, D28-

D31.

Letko A., Dijkman R., Strugnell B., Häfliger I.M., Paris J.M., Henderson K., Geraghty T., Orr

H., Scholes S. & Drögemüller C. (2020) Deleterious AGXT missense variant associated

with type 1 primary hyperoxaluria (PH1) in Zwartbles sheep. Genes 11, 1147-56.

Liu Y., Murali S.C., Harris R.A., English A.C., Qin X., Skinner E., Richards S., Rogers J., Han

Y., Vee V., Wang M., Meng Q., Heaton M.P., Smith T.P.L., Dalrymple B.P., Kijas J.,

Cockett N.E., Boerwinkle E.A., Muzny D.M., Gibbs R.A. & Worley K.C. (2016) P1009

Sheep reference genome sequence updates: Texel improvements and Rambouillet

progress. Journal of Animal Science 94, 18-9.

Lush J.L. (1943) Animal breeding plans. Edition 2, Iowa State College Press., Ames, Iowa.

Lynch M., Ackerman M.S., Gout J.F., Long H., Sung W., Thomas W.K. & Foster P.L. (2016)

Genetic drift, selection and the evolution of the mutation rate. Nature Reviews Genetics

17, 704-714.

MacEachern S., Hayes B., McEwan J. & Goddard M. (2009) An examination of positive

selection and changing effective population size in Angus and Holstein cattle populations

(Bos taurus) using a high density SNP genotyping platform and the contribution of

ancient polymorphism to genomic diversity in domestic cattle. BMC Genomics 10, 181-

200.

Man W.Y.N., Nicholas F.W. & James J.W. (2007) A pedigree-analysis approach to the

descriptive epidemiology of autosomal-recessive disorders. Preventive Veterinary

Medicine 78, 262–73.

Marcq F., El Barkouki S., Elsen J.-M., Grobet L., Royo L., Leroy P. & Georges M. (1998)

Investigating the role of myostatin in the determinism of double muscling characterizing

Belgian Texel sheep. FAO of the UN.

Mardis E.R. (2008) The impact of next-generation sequencing technology on genetics. Trends in

Genetics 24, 133-41.

Margulies M., Egholm M., Altman W.E., Attiya S., Bader J.S., Bemben L.A., Berka J.,

Braverman M.S., Chen Y.-J., Chen Z., Dewell S.B., Du L., Fierro J.M., Gomes X.V.,

Godwin B.C., He W., Helgesen S., Ho C.H., Irzyk G.P., Jando S.C., Alenquer M.L.I.,

Jarvie T.P., Jirage K.B., Kim J.-B., Knight J.R., Lanza J.R., Leamon J.H., Lefkowitz

S.M., Lei M., Li J., Lohman K.L., Lu H., Makhijani V.B., McDade K.E., McKenna M.P.,

Myers E.W., Nickerson E., Nobile J.R., Plant R., Puc B.P., Ronan M.T., Roth G.T.,

Sarkis G.J., Simons J.F., Simpson J.W., Srinivasan M., Tartaro K.R., Tomasz A., Vogt

K.A., Volkmer G.A., Wang S.H., Wang Y., Weiner M.P., Yu P., Begley R.F. & Rothberg

J.M. (2005) Genome sequencing in microfabricated high-density picolitre reactors.

Nature 437, 376-80.

Marshall F.B., Dobney K., Denham T. & Capriles J.M. (2014) Evaluating the roles of directed

breeding and gene flow in animal domestication. Proceedings of the National Academy of

Sciences of the United States of America 111, 6153-8.

Masoudi-Nejad A., Meshkin A., Haji-Eghrari B. & Bidkhori G. (2012) Candidate gene

prioritization. Molecular Genetics and Genomics 287, 679-98.

Matthijs G. (2004) Patenting genes. BMJ 329, 1358-60.

McClure M.C., Bickhart D., Null D., VanRaden P., Xu L., Wiggans G., Liu G., Schroeder S.,

Glasscock J., Armstrong J., Cole J.B., Van Tassell C.P., Sonstegard T.S. & Moore S.

(2014) Bovine exome sequence analysis and targeted SNP genotyping of recessive

fertility defects BH1, HH2, and HH3 reveal a putative causative mutation in SMC2 for

HH3. PLoS ONE 9, e92769:1-9.

McDaniel B.T. (2001) Uncontrolled Inbreeding. Journal of Dairy Science 84, E185-6.

McKenna A., Hanna M., Banks E., Sivachenko A., Cibulskis K., Kernytsky A., Garimella K.,

Altshuler D., Gabriel S., Daly M. & DePristo M.A. (2010) The Genome Analysis

Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data.

Genome Research 20, 1297-303.

Meuwissen T., Hayes B. & Goddard M. (2013) Accelerating improvement of livestock with

genomic selection. Annual Review of Animal Biosciences 1, 221-37.

Mignon-Grasteau S., Boissy A., Bouix J., Faure J.-M., Fisher A.D., Hinch G.N., Jensen P., Le

Neindre P., Mormède P., Prunet P., Vandeputte M. & Beaumont C. (2005) Genetics of

adaptation and domestication in livestock. Livestock Production Science 93, 3-14.

Montgomery G.W., McNatty K.P. & Davis G.H. (1992) Physiology and molecular genetics of

mutations that increase ovulation rate in sheep. Endocrine Reviews 13, 309-28.

NCBI Resource Coordinators (2018) Database resources of the National Center for

Biotechnology Information. Nucleic Acids Research 46, D8-13.

Ng S.B., Buckingham K.J., Lee C., Bigham A.W., Tabor H.K., Dent K.M., Huff C.D., Shannon

P.T., Jabs E.W., Nickerson D.A., Shendure J. & Bamshad M.J. (2010) Exome sequencing

identifies the cause of a mendelian disorder. Nature Genetics 42, 30-5.

Ng S.B., Turner E.H., Robertson P.D., Flygare S.D., Bigham A.W., Lee C., Shaffer T., Wong

M., Bhattacharjee A., Eichler E.E., Bamshad M., Nickerson D.A. & Shendure J. (2009)

Targeted capture and massively parallel sequencing of 12 human exomes. Nature 461,

272-278.

Niedringhaus T.P., Milanova D., Kerby M.B., Snyder M.P. & Barron A.E. (2011) Landscape of

next-generation sequencing technologies. Analytical Chemistry 83, 4327-41.

O'Rourke B.A., Dennis J.A. & Healy P.J. (2006) Internal restriction sites: Quality assurance aids

in genotyping. Journal of Veterinary Diagnostic Investigation 18, 195-7.

Ohba Y., Kitagawa H., Kitoh K., Asahina S., Nishimori K., Yoneda K., Kunieda T. & Sasaki Y.

(2000) Homozygosity mapping of the locus responsible for renal tubular dysplasia of

cattle on bovine chromosome 1. Mammalian Genome 11, 316-9.

Online Mendelian Inheritance in Animals (2020a) Sydney School of Veterinary Science,

University of Sydney, Sydney. Accessed 20th August 2020. URL: https://omia.org/.

Online Mendelian Inheritance in Animals (2020b) Sydney School of Veterinary Science,

University of Sydney, Sydney. Accessed 31st August 2020. URL: https://omia.org/.

Online Mendelian Inheritance in Man (2019) Johns Hopkins University, Baltimore MD.

Accessed 26th December 2019. URL: https://omim.org/.

Ott J., Wang J. & Leal S.M. (2015) Genetic linkage analysis in the age of whole-genome

sequencing. Nature Reviews Genetics 16, 275-84.

Ozaki K., Ohnishi Y., Iida A., Sekine A., Yamada R., Tsunoda T., Sato H., Sato H., Hori M.,

Nakamura Y. & Tanaka T. (2002) Functional SNPs in the lymphotoxin-alpha gene that

are associated with susceptibility to myocardial infarction. Nature Genetics 32, 650-4.

Paris J.M., Letko A., Häfliger I.M., Švara T., Gombač M., Klinc P., Škibin A., Pogorevc E. &

Drögemüller C. (2020) A de novo variant in OTX2 in a lamb with otocephaly. Acta

Veterinaria Scandinavica 62, 5-11.

Parry J.M., Shamsher M. & Skibinski D.O. (1990) Restriction site mutation analysis, a proposed

methodology for the detection and study of DNA base changes following mutagen

exposure. Mutagenesis 5, 209-12.

Parsons B.L. & Heflich R.H. (1997) Genotypic selection methods for the direct analysis of point

mutations. Mutation Research/Reviews in Mutation Research 387, 97-121.

Pinnapureddy A.R., Stayner C., McEwan J., Baddeley O., Forman J. & Eccles M.R. (2015)

Large animal models of rare genetic disorders: sheep as phenotypically relevant models

of human genetic disease. Orphanet Journal of Rare Diseases 10, 1-8.

Poli M.A., Dewey R., Semorile L., Lozano M.E., Albarino C.G., Romanowski V. & Grau O.

(1996) PCR screening for carriers of bovine leukocyte adhesion deficiency (BLAD) and

uridine monophosphate synthase (DUMPS) in Argentine Holstein cattle. Journal of

Veterinary Medicine Series A 43, 163-8.

Pollard M.O., Gurdasani D., Mentzer A.J., Porter T. & Sandhu M.S. (2018) Long reads: Their

purpose and place. Human Molecular Genetics 27, R234-41.

Porto-Neto L.R., Kijas J.W. & Reverter A. (2014) The extent of linkage disequilibrium in beef

cattle breeds using high-density SNP genotypes. Genetics Selection Evolution 46, 22-27.

Pourzand C. & Cerutti P. (1993) Genotypic mutation analysis by RFLP/PCR. Mutation

Research/Fundamental and Molecular Mechanisms of Mutagenesis 288, 113-21.

Pryce J.E., Hayes B.J. & Goddard M.E. (2012) Novel strategies to minimize progeny inbreeding

while maximizing genetic gain using genomic information. Journal of Dairy Science 95,

377-88.

Purfield D.C., Berry D.P., McParland S. & Bradley D.G. (2012) Runs of homozygosity and

population history in cattle. BMC Genomics 13, 1-11.

Qanbari S. & Simianer H. (2014) Mapping signatures of positive selection in the genome of

livestock. Livestock Science 166, 133-43.

Robertson A. (1962) Selection for heterozygotes in small populations. Genetics 47, 1291-300.

Rodenburg R.J. (2018) The functional genomics laboratory: Functional validation of genetic

variants. Journal of Inherited Metabolic Disease 41, 297-307.

Ron M., Band M., Yanai A. & Weller J.I. (1994) Mapping quantitative trait loci with DNA

microsatellites in a commercial dairy cattle population. Animal Genetics 25, 259-64.

Ross M.G., Russ C., Costello M., Hollinger A., Lennon N.J., Hegarty R., Nusbaum C. & Jaffe

D.B. (2013) Characterizing and measuring bias in sequence data. Genome Biology 14,

R51-71.

Sasaki S., Hasegawa K., Higashi T., Suzuki Y., Sugano S., Yasuda Y. & Sugimoto Y. (2016) A

missense mutation in solute carrier family 12, member 1 (SLC12A1) causes hydrallantois

in Japanese Black cattle. BMC Genomics 17, 724-39.

Schadt E.E., Turner S. & Kasarskis A. (2010) A window into third-generation sequencing.

Human Molecular Genetics 19, R227-40.

Schwenger B., Schöber S. & Simon D. (1993) DUMPS cattle carry a point mutation in the

uridine monophosphate synthase gene. Genomics 16, 241-4.

Seitz J.J., Schmutz S.M., Thue T.D. & Buchanan F.C. (1999) A missense mutation in the bovine

MGF gene is associated with the roan phenotype in Belgian Blue and Shorthorn cattle.

Mammalian Genome 10, 710-2.

Sellis D., Callahan B.J., Petrov D.A. & Messer P.W. (2011) Heterozygote advantage as a natural

consequence of adaptation in diploids. Proceedings of the National Academy of Sciences

108, 20666-71.

Shen T., Lee A., Shen C. & Lin C.J. (2015) The long tail and rare disease research: The impact

of next-generation sequencing for rare Mendelian disorders. Genetics Research 97, 1-14.

Sims D., Sudbery I., Ilott N.E., Heger A. & Ponting C.P. (2014) Sequencing depth and coverage:

Key considerations in genomic analyses. Nature Reviews Genetics 15, 121-32.

Smith S., Cantet F., Angelini F., Marais A., Mégraud F., Bayerdöffer E. & Miehlke S. (2002)

Discriminatory power of RAPD, PCR-RFLP and southern blot analyses of ureCD or

ureA gene probes on Helicobacter pylori isolates. Zeitschrift für Naturforschung - Section

C Journal of Biosciences 57, 516-21.

Soethout E.C., Verkaar E.L.C., Jansen G.H., Muller K.E. & Lenstra J.A. (2002) A direct StyI

polymerase chain reaction–restriction fragment length polymorphism (PCR–RFLP) test

for the myophosphorylase mutation in cattle. Journal of Veterinary Medicine Series A 49,

289-90.

Studer R.A., Dessailly B.H. & Orengo C.A. (2013) Residue mutations and their impact on

protein structure and function: detecting beneficial and pathogenic changes. The

Biochemical Journal 449, 581-594.

Tammen I. (2016) Breeding Focus 2016: Improving Welfare. University of New England,

Armidale, Australia.

Tammen I., Houweling P.J., Frugier T., Mitchell N.L., Kay G.W., Cavanagh J.A.L., Cook R.W.,

Raadsma H.W. & Palmer D.N. (2006) A missense mutation (c.184C>T) in ovine CLN6

causes neuronal ceroid lipofuscinosis in Merino sheep whereas affected South Hampshire

sheep have reduced levels of CLN6 mRNA. Biochimica et Biophysica Acta - Molecular

Basis of Disease 1762, 898-905.

Tan P., Allen J.G., Wilton S.D., Akkari P.A., Huxtable C.R. & Laing N.G. (1997) A splice-site

mutation causing ovine McArdle's disease. Neuromuscular Disorders 7, 336-42.

Tate M.L., Manly H.C., Dodds K.G. & Montgomery G.W. (1992) Genetic linkage analysis

between protein polymorphisms and the Fec B major gene in sheep. Animal Genetics 23,

417-24.

Teseling C. & Parnell P. (2011) The effective management of deleterious genetic conditions of

cattle. In: Association for the Advancement of Animal Breeding and Genetics, 131-4.

Teseling C. & Parnell P. (2013) How Angus breeders have reduced the frequency of deleterious

recessive genetic conditions. In: Association for the Advancement of Animal Breeding and

Genetics, 558-61.

The International Sheep Genomics Consortium (2020) Overview. Accessed 12th October 2020.

URL: https://www.sheephapmap.org/overview.php.

The International Sheep Genomics Consortium, Archibald A.L., Cockett N.E., Dalrymple B.P.,

Faraut T., Kijas J.W., Maddox J.F., McEwan J.C., Oddy V.H., Raadsma H.W., Wade C.,

Wang J., Wang W. & Xun X. (2010) The sheep genome reference sequence: A work in

progress. Animal Genetics 41, 449–53.

Topol E.J., Murray S.S. & Frazer K.A. (2007) The genomics gold rush. Jama 298, 218-21.

Truscott G. & Thomas P. (2010) A strategy for achieving innovation through Sheep Cooperative

Research Centre research and development. Animal Production Science 50, 1145–51.

Turner H. (1978) Selection for reproduction rate in Australian Merino sheep: Direct responses.

Australian Journal of Agricultural Research 29, 327-50.

Upadhyay M., Hauser A., Kunz E., Krebs S., Blum H., Dotsev A., Okhlopkov I., Bagirov V.,

Brem G., Zinovieva N. & Medugorac I. (2020) The first draft genome assembly of Snow

sheep (Ovis nivicola). Genome Biology and Evolution 12, 1330-6.

van Driel M.A. & Brunner H.G. (2006) Bioinformatics methods for identifying candidate disease

genes. Human Genomics 2, 429–32.

van Heyningen V. & Yeyati P.L. (2004) Mechanisms of non-Mendelian inheritance in genetic

disease. Human Molecular Genetics 13, R225-33.

Wang J. & Hill W.G. (1999) Effect of selection against deleterious mutations on the decline in

heterozygosity at neutral loci in closely inbreeding populations. Genetics 153, 1475-89.

Weigel K.A. (2001) Controlling inbreeding in modern breeding programs. Journal of Dairy

Science 84, E177-84.

Wheeler D.L., Barrett T., Benson D.A., Bryant S.H., Canese K., Chetvernin V., Church D.M.,

DiCuccio M., Edgar R., Federhen S., Geer L.Y., Kapustin Y., Khovayko O., Landsman

D., Lipman D.J., Madden T.L., Maglott D.R., Ostell J., Miller V., Pruitt K.D., Schuler

G.D., Sequeira E., Sherry S.T., Sirotkin K., Souvorov A., Starchenko G., Tatusov R.L.,

Tatusova T.A., Wagner L. & Yaschenko E.Y. (2007) Database resources of the National

Center for Biotechnology Information. Nucleic Acids Research 35, D5–12.

Willet C., Makara M., Reppas G., Tsoukalas G., Malik R., Haase B. & Wade C. (2015) Canine

disorder mirrors human disease: Exonic deletion in HES7 causes autosomal recessive

spondylocostal dysostosis in Miniature Schnauzer dogs. PLoS ONE 10, e0117055:1-13.

Windsor P.A. & Agerholm J.S. (2009) Inherited diseases of Australian Holstein-Friesian cattle.

Australian Veterinary Journal 87, 193–9.

Windsor P.A., Kessell A.E. & Finnie J.W. (2011) Neurological diseases of ruminant livestock

in Australia. V: congenital neurogenetic disorders of cattle. Australian Veterinary

Journal 89, 394-401.

Young C.W. & Seykora A.J. (1996) Estimates of inbreeding and relationship among registered

Holstein females in the United States. Journal of Dairy Science 79, 502-5.

Zerbino D.R., Achuthan P., Akanni W., Amode M.R., Barrell D., Bhai J., Billis K., Cummins C.,

Gall A., Girón C.G., Gil L., Gordon L., Haggerty L., Haskell E., Hourlier T., Izuogu

O.G., Janacek S.H., Juettemann T., To J.K., Laird M.R., Lavidas I., Liu Z., Loveland

J.E., Maurel T., McLaren W., Moore B., Mudge J., Murphy D.N., Newman V., Nuhn M.,

Ogeh D., Ong C.K., Parker A., Patricio M., Riat H.S., Schuilenburg H., Sheppard D.,

Sparrow H., Taylor K., Thormann A., Vullo A., Walts B., Zadissa A., Frankish A., Hunt

S.E., Kostadima M., Langridge N., Martin F.J., Muffato M., Perry E., Ruffier M., Staines

D.M., Trevanion S.J., Aken B.L., Cunningham F., Yates A. & Flicek P. (2017) Ensembl

2018. Nucleic Acids Research 46, D754-61.

Zhang B., Healy P.J., Zhao Y., Crabb D.W. & Harris R.A. (1990) Premature translation

termination of the pre-E1 alpha subunit of the branched chain alpha-ketoacid

dehydrogenase as a cause of maple syrup urine disease in Polled Hereford calves. Journal

of Biological Chemistry 265, 2425-7.

Zhang X., Swalve H.H., Pijl R., Rosner F., Wensch-Dorendorf M. & Brenig B. (2019)

Interdigital hyperplasia in Holstein Cattle is associated with a missense mutation in the

signal peptide region of the tyrosine-protein kinase transmembrane receptor gene.

Frontiers in Genetics 10, 1-11.

Zhu M. & Zhao S. (2007) Candidate gene identification approach: Progress and challenges.

International Journal of Biological Sciences 3, 420-7.

Zhu W., Zhang X.-Y., Marjani S., Zhang J., Zhang W., Wu S. & Pan X. (2017) Next-generation

molecular diagnosis: Single-cell sequencing from bench to bedside. Cellular and

Molecular Life Sciences 74, 869-80.

Chapter 2 | Overview of the inherited diseases investigated and strategies used

2.1 Synopsis

This chapter serves as a preface and overview of the research presented in this thesis. A general overview of the inherited diseases initially considered for this research is given in section 2.2, and a research update is provided in section 2.3. Both manuscripts in sections 2.2 and 2.3 were published in the proceedings of the Association for the Advancement of Animal Breeding and Genetics conference.

Of the ten inherited diseases detailed in this chapter, disease-causing variants have been identified and validated for four diseases: ichthyosis fetalis in Shorthorn cattle, Niemann-Pick type C disease in Angus/Angus-cross cattle, brachygnathia, cardiomegaly and renal hypoplasia syndrome in Merino sheep and pulmonary hypoplasia with anasarca in Persian/Persian-cross sheep. The detailed information regarding this research is presented in Chapters 3-6. The remaining inherited diseases were not prioritised and, as the research is ongoing, detailed information about them is not included in this thesis. A brief summary of current progress and the reasons for not prioritising these projects is given in the following paragraphs.

Ovine dermatosparaxis

Whilst a possible disease-causing variant was identified for dermatosparaxis in Merino sheep in a single affected sheep, samples from additional animals from the same flock were not available for validation. The variant was not present in affected sheep from a second flock and genetic heterogeneity was therefore considered. In both flocks the diagnosis of dermatosparaxis was based on clinical description alone and histopathological confirmation of the phenotype was not possible. Whole genome sequencing data of the affected sheep of the second flock has been analysed, although so far no likely disease-causing variant has been identified.

Congenital mandibular prognathia

Congenital mandibular prognathia in Droughtmaster cattle has a low impact on animal welfare when compared to the other diseases listed in this thesis. Clinical signs of affected animals include misalignment of the mandible and maxilla, resulting in a craniofacial abnormality.

Affected animals are still able to graze as the abnormality is not extreme. While homozygosity mapping identified a region of interest and whole genome sequencing identified three large deletions in the identified interval, a clear association between the deletions and disease phenotype was difficult to establish. The diagnosis of affected animals was based on observation by the owner. At the time of this study, no affected animals were available for a more detailed characterisation of the phenotype or for sample collection to investigate if the identified deletions would impact gene expression.

Congenital blindness in Shorthorn and a new variant of cardiomyopathy woolly haircoat syndrome in Hereford cattle

For both of these diseases, investigation was limited by the low number of quality samples from affected animals, an insufficient number of samples from control animals and incomplete phenotype descriptions. Unless additional cases are reported, these investigations are unlikely to lead to the identification of the underlying genetic cause.

Congenital contractural arachnodactyly in Murray Grey cattle

Several Murray Grey cattle presented with a phenotype similar to congenital contractural arachnodactyly (CCA) in Angus cattle. The causative mutation for this disease in Angus cattle has not been published. SNP genotyping showed that the disease in Murray Grey cattle mapped to the same region to which CCA mapped in Angus cattle, and subsequent commercial DNA diagnostics confirmed that the affected Murray Grey cattle were homozygous for the same unpublished mutation that causes CCA in Angus cattle.

Cervicothoracic vertebral subluxation in Merino sheep

Several affected sheep were SNP genotyped and homozygosity mapping suggested a region of interest; however, this region was later associated with a selective sweep once additional control animals were genotyped. While the phenotype was characterised by post-mortem examination in some of the affected animals, uncertainty regarding the onset of disease, and concerns that only a few of the affected animals progress to clinical signs of ataxia, created doubt about the true phenotypes of the control animals. Whole genome sequencing data have been analysed for likely causative variants in a number of candidate genes without success, and high-density SNP genotyping of an extended sample set is being considered for future investigation.

Ichthyosis fetalis in Poll Hereford cattle

For ichthyosis fetalis in Poll Hereford cattle, the DNA quality of the affected animal was insufficient for whole genome sequencing. Research student Liberty Conyers, under the supervision of Katie Eager and Dr Brendon O’Rourke, Sanger sequenced the ABCA12 candidate gene that is known to cause ichthyosis, and the identification of a likely disease-causing variant was published (Eager et al. 2020). This work has not been included in this thesis as my role in this study was limited.

2.2 Investigating emerging inherited diseases in Australian livestock: A collaborative approach

Proc. Assoc. Advmt. Anim. Breed. Genet. 22:15-18

INVESTIGATING EMERGING INHERITED DISEASES IN AUSTRALIAN LIVESTOCK: A COLLABORATIVE APPROACH

S.A. Woolley1, E.R. Tsimnadis1, N. Nowak1, R.L. Tulloch1, M.R. Shariflou1, T. Leeb2, C.E. Willet3, M.S. Khatkar1, B.A. O’Rourke4, I. Tammen1

1 Sydney School of Veterinary Science, Faculty of Science, The University of Sydney, Camden, NSW, Australia 2 Institute of Genetics, Vetsuisse Faculty, University of Bern, Bern, Switzerland 3 Sydney Informatics Hub, Core Research Facilities, The University of Sydney, Sydney, NSW, Australia 4 Elizabeth Macarthur Agricultural Institute, NSW Department of Primary Industries, Menangle, NSW, Australia

SUMMARY Emerging inherited diseases can cause numerous issues for producers, including productivity loss, profit loss and animal welfare problems. Under-reporting of emerging inherited diseases can result in difficulties associated with identifying and managing these diseases. The development of a research centre between the University of Sydney and Elizabeth Macarthur Agricultural Institute, NSW Department of Primary Industries is a current collaborative effort to encourage the submission of suspected inherited disease cases. Previous collaboration has resulted in the ongoing investigation of 10 inherited diseases using SNP-based homozygosity mapping and next generation sequencing to identify positional candidate genes and causal mutations. The long-term aim is to formally develop a research centre that allows independent investigation of emerging inherited diseases in livestock that builds upon current joint research.

INTRODUCTION Emerging inherited diseases within Australian livestock can often go unreported, either because they are misdiagnosed as non-inherited diseases or are not reported due to concerns of profit loss and reputation damage. Not reporting suspected inherited disease cases can lead to a loss of valuable sample resources and a lost opportunity to characterise the phenotype(s), thus causing a delay in investigating or monitoring these diseases. Without the assurance of a robust genotyping test to identify heterozygous individuals, the management of autosomal recessive inherited diseases can become problematic, especially if detailed pedigrees are unknown for at-risk populations (Man et al. 2007). The under-reporting of suspected recessive inherited diseases can contribute to the inadvertent dissemination of deleterious alleles throughout populations. If a deleterious allele can be traced to a common ancestor within a prominent sire line, all offspring are at risk of being heterozygous for the deleterious allele and only a DNA test will be able to accurately identify true heterozygous animals. Emerging inherited disease monitoring and the implementation of management programs to avoid carrier by carrier matings are important for reducing the number of affected progeny born, as well as mitigating any production and economic losses. The importance of these management programs has been shown in the case of brachygnathia, cardiomegaly and renal hypoplasia syndrome in Merino sheep (Shariflou et al. 2013), where breeding programs have reduced the number of affected progeny born (Shariflou, personal communication). Researchers at the University of Sydney and the Elizabeth Macarthur Agricultural Institute, NSW Department of Primary Industries (EMAI) each have a longstanding history in investigating inherited diseases in Australian livestock and have recently started to collaborate on numerous research projects. 
So far, 10 inherited diseases are being investigated and are likely to be inherited via a recessive mode of inheritance: congenital mandibular prognathia (CMP) in Droughtmaster cattle, pulmonary hypoplasia with anasarca (PHA) in Persian sheep, Niemann-Pick type C disease (NPC) in Angus cattle, congenital blindness (CB) in white Shorthorn cattle, cervicothoracic vertebral subluxation (CVS) in Merino sheep, a new variant of cardiomyopathy woolly haircoat syndrome (CWH) in Hereford cattle, new variants of ichthyosis fetalis (IF) in Hereford and Shorthorn cattle, suspected cases of congenital contractural arachnodactyly (CCA) in Murray Grey cattle, ovine dermatosparaxis (OD) in Merino sheep as well as the previously reported brachygnathia, cardiomegaly and renal hypoplasia syndrome (BCRHS) in Merino sheep (Shariflou et al. 2013). A SNP-chip based homozygosity mapping approach and next generation sequencing are described, with the aim of identifying positional candidate genes, identifying causal mutations and developing diagnostic DNA tests. The long-term aim resulting from these collaborations is to develop an independent centre where producers and veterinarians can report and submit samples of suspected inherited disease cases. The centre will follow a similar approach to previous studies conducted and will benefit the Australian livestock industries through increased awareness and acceptance of reporting.

MATERIALS AND METHODS In current collaborative research projects, SNP genotyping was performed by the Animal Genetics Laboratory (University of Queensland, Gatton, Australia) and Australian Genome Research Facility (Westmead, Australia) (Table 1). Sliding windows of 25, 50 and 100 SNPs were used to identify runs of homozygosity (ROH) for all affected animals using the bovine UMD3.1 genome assembly and the ovine Oarv1.0 genome assembly. ROH were analysed using PLINK (Purcell et al. 2007) and were considered to be regions of interest if these regions were shared by all of the affected animals and not with any of the carrier and control animals. These regions were scanned for positional candidate genes based on gene function.
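The region-of-interest rule described above (runs shared by all affected animals and by none of the carrier or control animals) can be illustrated with a minimal Python sketch. It is not the PLINK sliding-window procedure used in the study. As assumptions for illustration only, genotypes are coded as alternate-allele counts (0, 1, 2) in map order, a run is an uninterrupted stretch of homozygous calls, and the function names `roh_mask` and `shared_regions` are hypothetical.

```python
from typing import List, Tuple


def roh_mask(genotypes: List[int], min_snps: int) -> List[bool]:
    """Flag SNPs that lie in a run of at least min_snps consecutive
    homozygous calls (0 = hom ref, 1 = het, 2 = hom alt).
    Simplification: any heterozygous call ends a run."""
    mask = [False] * len(genotypes)
    start = None
    # A trailing heterozygous sentinel closes a run that reaches the end.
    for i, g in enumerate(genotypes + [1]):
        if g != 1:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_snps:
                for j in range(start, i):
                    mask[j] = True
            start = None
    return mask


def shared_regions(affected: List[List[int]],
                   controls: List[List[int]],
                   min_snps: int) -> List[Tuple[int, int]]:
    """Return inclusive (first, last) SNP index intervals where all affected
    animals are in a run, share the same homozygous genotype, and no control
    is in a run with that same genotype."""
    n = len(affected[0])
    aff_masks = [roh_mask(a, min_snps) for a in affected]
    ctl_masks = [roh_mask(c, min_snps) for c in controls]
    keep = []
    for i in range(n):
        same_allele = len({a[i] for a in affected}) == 1
        in_all_affected = all(m[i] for m in aff_masks)
        in_any_control = any(
            m[i] and c[i] == affected[0][i]
            for m, c in zip(ctl_masks, controls)
        )
        keep.append(same_allele and in_all_affected and not in_any_control)
    # Collapse the per-SNP flags into intervals.
    regions, start = [], None
    for i, flag in enumerate(keep + [False]):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            regions.append((start, i - 1))
            start = None
    return regions
```

In practice the intervals would be translated to chromosome positions from the SNP map and repeated for each window size (25, 50 and 100 SNPs); missing calls and tolerated heterozygous calls, which PLINK handles, are deliberately ignored here.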

Table 1. Number of affected and carrier DNA samples sent for SNP chip genotyping and regions of homozygosity, including species specific OMIA ID

Disease | OMIA ID¹ | Breed | Affected/Carrier | SNP chip | Region of interest
Cervicothoracic vertebral subluxation | 000077-9940 | Merino | 14/2 | SNP50² | OAR10
Pulmonary hypoplasia with anasarca | 000493-9940 | Persian | 5/5 | SNP50² | OAR1,3,4,6,7,9,17,25,26
Cardiomyopathy and woolly haircoat syndrome | 000161-9913 | Poll Hereford | 2/0 | SNP80³ | BTA1,4,6,12,15,24,25
Congenital blindness | - | Shorthorn | 2/3 | SNP80³ | BTA5,14,16,22,24
Congenital contractural arachnodactyly | 001511-9913 | Murray Grey | 5/5 | SNP80³ | BTA21
Congenital mandibular prognathia | - | Droughtmaster | 9/4 | SNP80³ | BTA26
Ichthyosis fetalis | 000547-9913 | Hereford | 1/3 | SNP80³ | multiple
Niemann-Pick disease | - | Angus | 2/2 | SNP80³ | BTA3,4,16,24,29
¹ OMIA http://omia.angis.org.au; - indicates no species specific OMIA ID.
² SNP50 = Illumina® OvineSNP50 Genotyping BeadChip (CA, USA).
³ SNP80 = GeneSeek® Genomic Profiler Bovine HD Chip 80K chip (Neogen, NE, USA).


Sanger sequencing of select candidate genes was commenced but was cost and labour intensive. Next generation sequencing (NGS) of affected animals for CMP, CVS, PHA and BCRHS using the Illumina HiSeq™ X Ten sequencing platform was performed by the Kinghorn Centre for Clinical Genomics (Garvan Institute of Medical Research, Darlinghurst, Australia) through the Ramaciotti Centre for Genomics (University of New South Wales, Sydney, Australia) with 150 bp paired-end reads at 30X coverage (Table 2). These NGS data have been aligned to either the bosTau8 or oviAri3 reference genome assemblies and will be analysed for genetic variants. Samples of affected animals for IF, CWH and OD are undergoing sequencing using an in-house Illumina HiSeq® 3000 sequencing platform in Switzerland (Table 2).

Table 2. Number of affected DNA samples for next generation sequencing

Disease | Breed | Affected | Expected coverage | % of sequences with mean Q>30
Brachygnathia, cardiomegaly and renal hypoplasia syndrome | Merino | 1 | 30X | 85.84
Cardiomyopathy and woolly haircoat syndrome | Poll Hereford | 2 | 20X | In progress
Cervicothoracic vertebral subluxation | Merino | 2 | 30X | 92.16
Congenital mandibular prognathia | Droughtmaster | 2 | 30X | 86.58
Ichthyosis fetalis | Hereford | 1 | 20X | In progress
Ichthyosis fetalis | Shorthorn | 1 | 20X | In progress
Ovine dermatosparaxis | Merino | 2 | 20X | In progress
Pulmonary hypoplasia with anasarca | Persian | 2 | 30X | 90.17

RESULTS AND DISCUSSION Homozygosity mapping has successfully revealed and/or excluded positional candidate genes for all of the inherited diseases currently being investigated (Table 1; Shariflou et al. 2013; Tammen et al. 2016). The known mutation for CCA in Angus cattle was confirmed to be present in the Murray Grey cattle with suspected CCA. Validation of a genetic variant in a positional candidate gene for NPC is ongoing. Partial Sanger sequencing of positional candidate genes for CVS, PHA, CMP and CWH did not reveal any disease-causing mutations and affected animals were therefore re-sequenced using NGS. Previous mapping of BCRHS did not identify a clear positional candidate gene and an affected animal sample was submitted for NGS. Known candidate genes for CWH and CB were excluded and alternate candidate genes need to be investigated within the regions of interest identified (Table 1). Strong candidate genes exist for IF and OD, as these diseases have been previously characterised in different breeds (Charlier et al. 2008; Zhou et al. 2012). The affected animals tested negative for the known disease-causing mutations and were re-sequenced due to suspected genetic heterogeneity. Preliminary quality control analysis of the NGS data is positive, with the percentage of sequences with a mean Phred score of Q>30 ranging from 85.84% to 92.16% (Table 2) and no over-represented sequences identified. After aligning data to the bosTau8 or oviAri3 genome assemblies, allelic variations including SNPs, indels and structural variants will be identified in the regions of interest previously identified, with a focus on positional candidate genes identified by homozygosity mapping. The results from these studies indicate that SNP genotyping and homozygosity mapping methods are highly effective in identifying positional candidate genes for a range of disorders even if sample sizes are small and phenotypes are poorly defined. Genome wide SNP genotyping and homozygosity mapping approaches have successfully identified candidate genes and causal mutations in a range of recessive inherited diseases in cattle, including ichthyosis fetalis in Chianina cattle (Charlier et al. 2008). The inclusion of NGS data to identify allelic variations will allow for several runs of homozygosity identified through homozygosity mapping to be further investigated.
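As background to the quality metric reported in Table 2, a Phred score Q corresponds to a base-call error probability of 10^(-Q/10), so Q30 means one expected error per 1000 bases. The short Python sketch below shows how a "percentage of sequences with mean Q>30" figure could be computed. It assumes per-read quality scores have already been parsed into lists of integers, and the function names are hypothetical rather than part of the quality control software used in the study.

```python
from typing import List


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score to a base-call error probability."""
    return 10 ** (-q / 10)


def percent_reads_above(read_quals: List[List[int]], threshold: float = 30) -> float:
    """Percentage of reads whose mean per-base quality exceeds threshold."""
    passing = sum(
        1 for quals in read_quals if sum(quals) / len(quals) > threshold
    )
    return 100 * passing / len(read_quals)
```

Taking the arithmetic mean of Phred scores per read is one common convention; tools may instead average the underlying error probabilities, which gives slightly different percentages.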

CENTRE CONCEPT The methodology framework and results described in the current research projects between the University of Sydney and EMAI demonstrate the success of the working relationship between both groups. The concept of an independent research centre geared towards the molecular characterisation of emerging inherited diseases in livestock could provide a central point of contact for veterinarians, breeders, producers and breed societies. It has the potential to increase confidential reporting of suspected cases and provide research services with the aim of rapidly developing low-cost diagnostic tests based on frameworks that are already implemented at both institutions. The availability of diagnostic DNA tests will allow informed breeding decisions to be made to avoid potentially devastating profit loss and animal welfare issues. The centre will aim to publish validated results, which will increase awareness of the role of emerging inherited diseases within Australian livestock populations. The future development of the centre will be focussed on developing a streamlined research and diagnostic service that may involve additional research and industry groups. The key driving factor behind successfully developing an independent centre will be the collaborative relationships and shared resources between numerous research groups to encourage greater surveillance of emerging inherited disease in livestock across NSW and nation-wide.

ACKNOWLEDGMENTS The authors acknowledge and thank the producers and veterinarians for the submission of samples to EMAI and the University of Sydney. The authors thank the genetics laboratory staff at EMAI for their assistance. The University of Sydney is acknowledged for the use of services and HPC facilities at the Sydney Informatics Hub and the Faculty of Veterinary Science provided research support for 3 honours projects (NN 2013, ET 2014 & SW 2015), funding support from the Animal Welfare, FS Quiney and Dorothy Minchin Bequests. The Australian Wool Education Trust is acknowledged for the support of a research scholarship for RT (2016). The NGS is funded by a Faculty of Veterinary Science - DPI collaborative compact fund and research grant CRSII3_160738 from the Swiss National Science Foundation to TL.

REFERENCES
Charlier C., Coppieters W., Rollin F., Desmecht D., Agerholm J.S., et al. (2008) Nature Genet. 40: 449.
Man W.Y.N., Nicholas F.W. & James J.W. (2007) Prev Vet Med. 78: 262.
Purcell S., Neale B., Todd-Brown K., Thomas L., Ferreira M.A.R., et al. (2007) Am J Hum Genet. 81: 559.
Shariflou M.R., Wade C.M., Kijas J., McCulloch R., Windsor P.A., et al. (2013) Anim Genet. 44: 231.
Tammen I., Woolley S., Tsimnadis E., Nowak N., Tulloch R., et al. (2016) J Anim Sci. 94 (7 Supplement 4): 173.
Zhou H., Hickford J.G.H. & Fang Q. (2012) Anim Genet. 43: 471.


2.3 Molecular investigation of several emerging inherited diseases in cattle and sheep

Detection of Causal Variants

MOLECULAR INVESTIGATION OF SEVERAL EMERGING INHERITED DISEASES IN CATTLE AND SHEEP

S.A. Woolley1, E.R. Tsimnadis1, R.L. Tulloch1, P. Hughes1, B. Hopkins1, S.E. Hayes1, M.R. Shariflou1, A. Bauer2, I.M. Häfliger2, V. Jagannathan2, C. Drögemüller2, T. Leeb2, M.S. Khatkar1, C.E. Willet3, B.A. O’Rourke4, I. Tammen1

1The University of Sydney, Faculty of Science, Sydney School of Veterinary Science, Camden, NSW, 2570 Australia 2The University of Bern, Institute of Genetics, Vetsuisse Faculty, Bern, Switzerland 3The University of Sydney, Sydney Informatics Hub, Core Research Facilities, Sydney, NSW, 2006 Australia 4The Elizabeth Macarthur Agricultural Institute, NSW Department of Primary Industries, Menangle, NSW, 2568 Australia

SUMMARY Emerging inherited diseases can cause numerous issues for producers, including productivity loss, profit loss and animal welfare problems. Current collaborative efforts between the University of Sydney and the Elizabeth Macarthur Agricultural Institute, NSW Department of Primary Industries have resulted in the ongoing investigation of several inherited diseases using both SNP-based homozygosity mapping and whole genome sequencing approaches to identify positional candidate genes and likely causal variants. This paper serves as a brief update for eight of the investigated inherited diseases in cattle and sheep, with these studies aiming to identify positional candidate genes and causal variants to facilitate the improved management of at-risk populations for each inherited disease investigated.

INTRODUCTION The advancement of livestock breeding has allowed for desirable traits and elite genetics to be disseminated throughout livestock populations within relatively short periods of time. Small effective population sizes and inbreeding pose a risk for the inheritance of deleterious alleles in homozygous form and can contribute to the increased observation of animals with recessive inherited diseases (Charlier et al. 2008; Groeneveld et al. 2010), especially when considering closed herds or flocks. The reporting of inherited diseases within Australian livestock is limited due to either misdiagnosis of a prospective inherited disease or concern for reputation damage and profit losses. Detailed clinical and phenotypic descriptions of suspected recessive inherited diseases are imperative to future molecular investigations. Without consistent reporting and detailed phenotype information, the molecular characterisation of emerging inherited diseases can be delayed due to resource loss or lack of key information such as pedigree data and clinical descriptions. This can therefore impact on the monitoring and management of the inherited disease in at-risk populations, especially if detailed pedigrees are unknown when genotyping tests become available (Man et al. 2007). Collaborative projects between researchers at the University of Sydney and the Elizabeth Macarthur Agricultural Institute, NSW Department of Primary Industries (EMAI) have enabled the investigation of several emerging recessive inherited diseases in livestock. With an increasing number of suspected inherited disease cases being investigated, the use of SNP-chip based homozygosity mapping and whole genome sequencing approaches is becoming routine in identifying positional candidate genes and causal variants, and in facilitating the development of genotyping tests for inherited diseases with little pedigree information or phenotypic description. This paper serves as an update for eight of the emerging inherited diseases with a suspected recessive mode of inheritance currently under investigation by the University of Sydney and EMAI. These emerging inherited diseases include: cardiomyopathy and woolly haircoat syndrome (CWH) in Hereford cattle, congenital mandibular prognathia (CMP) in Droughtmaster cattle, Niemann-Pick type C disease (NPC) in Angus cattle, new variants of ichthyosis fetalis (IF) in Hereford and Shorthorn cattle, the previously reported brachygnathia, cardiomegaly and renal hypoplasia syndrome (BCRHS) in Merino sheep (Shariflou et al. 2013), cervicothoracic vertebral subluxation (CVS) in Merino sheep, ovine dermatosparaxis (OD) in Merino sheep, and pulmonary hypoplasia with anasarca (PHA) in Persian sheep. The aim for these studies was to identify positional candidate genes and likely causal variants to facilitate improved management of at-risk populations for each inherited disease investigated.

MATERIALS AND METHODS Analysis of SNP genotype data for carrier and affected animals (Table 1) using sliding windows of 25, 50 and 100 SNPs to identify runs of homozygosity (ROH) was previously conducted using the bovine UMD3.1 genome assembly and the ovine Oarv1.0 genome assembly (Woolley et al. 2017). ROH were analysed using PLINK (Purcell et al. 2007) and were considered to be regions of interest if they were shared by all of the affected animals only. These regions were scanned for positional candidate genes based on gene function and comparative genomics methods.

Table 1. Number of affected and carrier DNA samples submitted for SNP chip genotyping and regions of homozygosity, including species specific OMIA ID

Disease | OMIA ID¹ | Breed | Affected/Carrier | SNP chip
Cardiomyopathy and woolly haircoat syndrome | 000161-9913 | Poll Hereford | 2/0 | SNP80²
Congenital mandibular prognathia | - | Droughtmaster | 9/4 | SNP80²
Ichthyosis fetalis | 000547-9913 | Hereford | 1/3 | SNP80²
Niemann-Pick disease | - | Angus/Angus X | 2/2 | SNP80²
Cervicothoracic vertebral subluxation | 000077-9940 | Merino | 14/2 | SNP50³
Pulmonary hypoplasia with anasarca | 000493-9940 | Persian | 5/5 | SNP50³
¹ OMIA http://omia.angis.org.au; - indicates no species specific OMIA ID.
² SNP80 = GeneSeek® Genomic Profiler Bovine HD Chip 80K chip (Neogen, NE, USA).
³ SNP50 = Illumina® OvineSNP50 Genotyping BeadChip (CA, USA).

Sanger sequencing for inherited diseases with identified positional candidate genes commenced but was cost and labour intensive. Whole genome sequencing (WGS) was conducted for affected animals for CMP, BCRHS, CVS and PHA (Woolley et al. 2017) with 150 bp paired-end reads at an expected coverage of 20X or 30X (Table 2). Sequence reads were aligned with BWA-mem (Li 2013) to either the bosTau8 or oviAri3 reference genome assemblies and analysed for novel genetic variants using a modified GATK best practice pipeline (McKenna et al. 2010; DePristo et al. 2011). Large structural variant calling was completed using DELLY (version 0.7.6), LUMPY-sv (version 0.2.12) and LUMPY SVtyper (Rausch et al. 2012; Layer et al. 2014). WGS data generated at the University of Bern similarly applied standard bioinformatics pipelines using software and steps to process fastq files into bam and GVCF files in accordance with the latest 1000 Bulls processing guidelines (www.1000bullgenomes.com). For variant filtering, control genomes from other samples that were sequenced during this study were used according to species and breed, and for the Shorthorn IF and OD samples, 341 control genomes of various cattle breeds and 16 control genomes of various sheep breeds were used to identify novel variants for affected animals only. Genetic variants were annotated using SnpEff for predicted effects and filtered using SnpSift (Cingolani et al. 2012). To predict the functional effects of candidate causal variants, both SnpEff and SIFT (Kumar et al. 2009; Cingolani et al. 2012) were used to assess whether candidate disease-causing variants were deleterious to protein function.
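The filtering logic described above (variants homozygous for the alternate allele in affected animals, absent from control genomes, and with a sufficiently severe predicted impact) can be sketched in Python as follows. This is an illustration only and not the SnpSift commands used in the study. The dictionary layout, the sample names and the function names are assumptions; the impact labels follow the SnpEff categories HIGH, MODERATE, LOW and MODIFIER.

```python
from typing import Dict, Iterable, List


def _alleles(gt: str) -> List[str]:
    """Split a VCF-style genotype such as '0/1' or '1|1' into alleles."""
    return gt.replace("|", "/").split("/")


def is_hom_alt(gt: str) -> bool:
    """True for a diploid call with two identical non-reference alleles."""
    a = _alleles(gt)
    return len(a) == 2 and a[0] == a[1] and a[0] not in ("0", ".")


def carries_alt(gt: str) -> bool:
    """True if any called allele is non-reference.
    Assumption: missing calls ('.') are treated as not carrying the variant."""
    return any(x not in ("0", ".") for x in _alleles(gt))


def private_hom_alt(variants: Iterable[Dict],
                    affected: List[str],
                    controls: List[str],
                    impacts=("MODERATE", "HIGH")) -> List[Dict]:
    """Keep variants that are homozygous alternate in every affected sample,
    absent from every control sample, and of the requested predicted impact."""
    kept = []
    for v in variants:
        gts = v["genotypes"]
        if v["impact"] not in impacts:
            continue
        if not all(is_hom_alt(gts[s]) for s in affected):
            continue
        if any(carries_alt(gts[s]) for s in controls):
            continue
        kept.append(v)
    return kept
```

Excluding every control that carries the alternate allele is a strict choice suited to rare recessive variants when controls are unrelated; if known carriers are among the comparison animals, heterozygous calls in those animals would need to be allowed.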

RESULTS AND DISCUSSION As previously identified, homozygosity mapping was able to successfully reveal and/or exclude positional candidate genes for all of the inherited diseases investigated, with a likely causal variant in a positional candidate gene identified for NPC through Sanger sequencing of affected animals (Shariflou et al. 2013; Woolley et al. 2017). Affected samples for BCRHS, CMP, CWH, IF, CVS, OD and PHA were re-sequenced using WGS (Table 2) as either homozygosity mapping did not identify positional candidate genes of interest or Sanger sequencing of affected animals did not identify causal variants within positional candidate genes. Preliminary quality control analysis of the WGS data was positive (Woolley et al. 2017); however, WGS for CWH in Poll Hereford cattle and IF in Hereford cattle was unsuccessful due to inadequate DNA quality. Further investigation of other positional candidate genes and genomic regions of interest based on SNP genotyping data will be required for CWH and IF. After application of filtering parameters on samples that were whole genome sequenced, numerous genetic variants that were homozygous for the alternate allele in the affected animal(s) only were identified either across the genome or within previously identified ROH (Table 2) (Woolley et al. 2017).

Table 2. Variants identified in affected animals for which each animal was homozygous alternate to the reference sequence

Disease | Breed | Affected/Carrier | No. homozygous alternate variants | Likely causal variant identified
Brachygnathia, cardiomegaly and renal hypoplasia syndrome | Merino | 1 | 215¹ | Yes
Cervicothoracic vertebral subluxation | Merino | 2 | Ongoing | Ongoing
Ovine dermatosparaxis | Merino | 1 | 1864² | Yes
Pulmonary hypoplasia with anasarca | Persian | 2/1 | 333¹,³ | Under validation
Congenital mandibular prognathia | Droughtmaster | 2 | 5780⁴ | Under validation
Ichthyosis fetalis | Shorthorn | 1 | 298² | Yes
¹ Filtered for low, moderate and high impact with known dbSNPs included.
² Private homozygous alternate and heterozygous protein-changing variants with a moderate or high predicted impact.
³ At least one animal was homozygous alternate.
⁴ Includes SNPs and small indels.

Further manual filtering based on the predicted variant impact on protein function revealed candidate causal variants for BCRHS, PHA, CMP and IF in Shorthorn cattle (Table 2). A candidate causal variant with possible heterogeneity was identified for OD in Merino sheep and requires greater sample numbers to facilitate further validation. Genotyping assays were developed for these five inherited diseases, with preliminary validation results showing variant segregation with disease in related herds or flocks. The development of these genotyping assays has allowed producers to plan breeding management strategies in advance.


Proc. Assoc. Advmt. Anim. Breed. Genet. 23:270-273

Despite small sample sizes, poor phenotypic descriptions and challenging sample types, candidate causal variants have been successfully identified through the combined use of genome wide SNP genotyping, homozygosity mapping and WGS. These approaches have successfully identified candidate genes and causal mutations in a range of recessive inherited diseases in cattle, including ichthyosis fetalis in Chianina cattle (Charlier et al. 2008). The reporting of the inherited diseases investigated in these studies has enabled better screening and preliminary management and has showcased the ability to identify candidate causal variants using modern genomic technologies.

CONCLUSIONS Despite the challenges surrounding insufficient sample numbers and poorly defined phenotypes, the results from these studies indicate that candidate causal variants can be identified by utilising targeted approaches. The identification of likely causal variants for BCRHS, OD with possible genetic heterogeneity, PHA, CMP, NPC and IF in Shorthorn cattle has enabled the development of genotyping assays that are able to successfully discriminate between homozygous wildtype, heterozygous and homozygous alternate genotypes. These assays are being used as a preliminary screen for related or founder herds or flocks and would prove to be a useful tool for screening wider populations to gain a more holistic understanding of population allele frequencies and to inform future breed management strategies.

ACKNOWLEDGMENTS The authors acknowledge and thank the producers and veterinarians for the submission of samples to EMAI and the University of Sydney. The authors thank the genetics laboratory staff at EMAI for their assistance. The authors acknowledge the Sydney Informatics Hub, a Core Research Facility at the University of Sydney for facilitating access to the High Performance Computer Artemis, where computational analysis in this study was performed. The University of Sydney Faculty of Veterinary Science provided research support for 3 honours projects (NN 2013, ET 2014 & SW 2015), and funding support from the Animal Welfare, FS Quiney and Dorothy Minchin Bequests. The Australian Wool Education Trust is acknowledged for the support of a research scholarship for RT (2016) and PH (2018) and the Australian Government for the Australian Government Research Training Program (RTP) Scholarship for SW. The WGS is funded by a Faculty of Veterinary Science - DPI collaborative compact fund and research grant CRSII3_160738 from the Swiss National Science Foundation to TL.

REFERENCES
Auwera G.A., Carneiro M.O., Hartl C., et al. (2013) Curr Protoc Bioinformatics 43: 11.0.1.
Charlier C., Coppieters W., Rollin F., et al. (2008) Nat. Genet. 40: 449.
Cingolani P., Platts A., Wang le L., et al. (2012) Fly (Austin) 6: 80.
DePristo M.A., Banks E., Poplin R., et al. (2011) Nat. Genet. 43.
Groeneveld L.F., Lenstra J.A., Eding H., et al. (2010) Anim. Genet. 41: 6.
Kumar P., Henikoff S. and Ng P.C. (2009) Nat Protoc 4: 1073.
Layer R.M., Chiang C., Quinlan A.R. and Hall I.M. (2014) Genome Biol. 15: R84.
Li H. (2013) arXiv:1303.3997v1 [q-bio.GN].
Man W.Y.N., Nicholas F.W. and James J.W. (2007) Prev. Vet. Med. 78: 262.
McKenna A., Hanna M., Banks E., et al. (2010) Genome Res. 20: 1297.
Purcell S., Neale B., Todd-Brown K., et al. (2007) Am. J. Hum. Genet. 81.
Rausch T., Zichner T., Schlattl A., et al. (2012) Bioinformatics 28: i333.
Shariflou M.R., Wade C.M., Kijas J., et al. (2013) Animal Genet. 44: 231.
Woolley S.A., Tsimnadis E.R., Nowak N. et al. (2017) Proc. Assoc. Advmt. Anim. Breed. Genet. 15.


Chapter 3 | Ichthyosis fetalis in Shorthorn cattle

3.1 Synopsis

In this chapter, the identification of a causal variant in one affected Shorthorn calf using a whole genome sequencing approach, followed by the validation of this variant in a herd of 130 Shorthorn cattle, is described. This research is a continuation of the O’Rourke et al. 2017 study describing the clinical signs and histopathology of a Poll Hereford and a Shorthorn calf with ichthyosis fetalis. After publication of the research in section 3.2, a likely causal missense variant was identified in the Poll Hereford calf via Sanger sequencing of the ABCA12 gene by Eager et al. 2020. As outlined in Chapter 2, my contribution to the study by Eager et al. 2020 was limited and did not warrant inclusion in this thesis. The supplementary materials associated with the publication in section 3.2 are provided in section 3.3.

A license was obtained from John Wiley and Sons and the Copyright Clearance Center to include the publication of the work in section 3.2 in this thesis. The license number is 4920740594259 and was obtained on 2 October 2020.

3.2 An ABCA12 missense variant in a Shorthorn calf with ichthyosis fetalis

SHORT COMMUNICATION
doi: 10.1111/age.12856
An ABCA12 missense variant in a Shorthorn calf with ichthyosis fetalis

S. A. Woolley*, K. L. M. Eager†, I. M. Häfliger‡, A. Bauer‡, C. Drögemüller‡, T. Leeb‡, B. A. O’Rourke† and I. Tammen*
*Faculty of Science, Sydney School of Veterinary Science, University of Sydney, Camden, 2570, NSW, Australia.
†NSW Department of Primary Industries, Elizabeth Macarthur Agricultural Institute, Menangle, 2568, NSW, Australia.
‡Vetsuisse Faculty, Institute of Genetics, University of Bern, Bern 3001, Switzerland.

Summary Two clinical forms of ichthyosis in cattle have been reported, ichthyosis fetalis and congenital ichthyosis. Ichthyosis poses animal welfare and economic issues and the more severe form, ichthyosis fetalis, is lethal. A Shorthorn calf with ichthyosis fetalis was investigated and a likely causal missense variant on chromosome 2 in the ABCA12 gene (NM_001191294.2:c.6776T>C) was identified by whole genome sequencing. Mutations in the ABCA12 gene are known to cause ichthyosis fetalis in cattle and Harlequin ichthyosis in humans. Sanger sequencing of the affected calf and the dam confirmed the variant was homozygous in the affected calf and heterozygous in the dam. Further genotyping of 130 Shorthorn animals from the same property revealed an estimated allele frequency of 3.8%. The presented findings enable genetic testing for breeding and diagnostics.

Keywords cattle, ichthyosis fetalis, rare disease, recessive, whole genome sequencing

Address for correspondence: I. Tammen, Faculty of Science, Sydney School of Veterinary Science, University of Sydney, Camden, 2570 NSW, Australia. E-mail: [email protected]
Accepted for publication 09 August 2019
© 2019 Stichting International Foundation for Animal Genetics

Ichthyosis fetalis, or Harlequin ichthyosis, is part of a heterogeneous group of rare inherited skin conditions that are characterised by general hyperkeratosis (Akiyama et al. 2005). Ichthyosis fetalis is the most severe and lethal form of the non-syndromic ichthyoses, and is characterised by alopecia, hard skin plaques and deep skin fissures that restrict movement and cause malformation of the eyes, lips and ears (Thomas et al. 2006). Non-syndromic ichthyoses have been reported in several species, including humans (OMIM 242500), cattle (OMIA 002193-9913 and OMIA 002188-9913), sheep (OMIA 002193-9940), mice (MGI ID 4461044), dogs (OMIA 000546-9615, OMIA 001588-9615, OMIA 001973-9615, OMIA 001980-9615 and OMIA 002099-9615) and greater kudu (OMIA 002188-9946). In humans, 15 genes have been associated with non-syndromic forms of ichthyosis and these numbers are expected to increase as molecular genetic studies identify further underlying genetic causes for other forms of ichthyosis (Oji et al. 2010). More than 50 causal mutations for Harlequin ichthyosis in humans have been identified in the ATP binding cassette subfamily A member 12 (ABCA12) gene (Akiyama et al. 2005; Kelsell et al. 2005; Akiyama 2014; Zhang et al. 2016). Other species for which causal mutations within the ABCA12 gene have been identified include cattle (Charlier et al. 2008) and a line of mice (Smyth et al. 2008). The ABCA12 protein is believed to play a major role in transporting keratinocyte lipids across cell membranes within the lamellar granules of granular layer keratinocytes that form the lipid layers in the stratum corneum (Akiyama 2014).

The purpose of this study was to identify the causal mutation for a previously reported case of ichthyosis fetalis in a Shorthorn calf (O’Rourke et al. 2017). O’Rourke et al. (2017) determined that clinical signs and gross histopathological findings in the Shorthorn calf were consistent with ichthyosis fetalis. The affected calf and its reported parents were homozygous wildtype for the only known bovine missense mutation identified in the ABCA12 gene in Chianina cattle for ichthyosis fetalis, NM_001191294.2:g.103030489T>C (positive strand) (Charlier et al. 2008; O’Rourke et al. 2017).

Whole genome sequencing (WGS) using the Illumina HiSeq® 3000 was performed using DNA extracted from EDTA blood of the affected Shorthorn calf. The sequence reads were mapped to the ARS-UCD1.2 bovine genome assembly and variants were identified. The applied software and steps to process fastq files into binary alignment map (bam) and genomic variant call format (GVCF) files are in accordance with the latest 1000 Bulls processing guidelines (www.1000bullgenomes.com). Single nucleotide variants
and small indel variants were called using GENOTYPEGVCFs of (ThermoFisher Scientific, USA) and a modified in-house GATK (version 3.8; DePristo et al. 2011). To predict the parentage verification protocol (Lee et al. 2018) with 200 functional effects of the called variants, SNPEFF software SNPs from the International Society of Animal Genetics (Cingolani et al. 2012) together with the NCBI Bos taurus panel, parentage verification for the affected calf and both Annotation (release 106) ARS-UCD1.2 were used. For reported parents confirmed the dam and excluded the variant filtering, 341 control genomes of various cattle reported sire when analysing hot spot data using CERVUS breeds were used (Table S1). A total of 298 private protein- version 3.0.7 (Kalinowski et al. 2007; Table S4), explaining changing variants with a moderate or high predicted the sire’s genotype at the ABCA12 locus. impact, located within 257 different genes or loci, were The prevalence of the ABCA12 c.6776T>C variant was identified in the affected calf (Table S2). Of these, only one determined from tail hair samples of 130 Shorthorn animals variant was present in a known candidate gene. This single from the same property. Genomic DNA was isolated from homozygous missense variant in ABCA12 on chromosome hair roots using a standard hair digest protocol (Healy et al. 2:103016791T>C or NM_001191294.2:c.6776T>C was 1995). A custom allelic discrimination assay was used to further investigated as the most likely causal mutation for genotype the c.6776T>C variant (Applied BiosystemsTM, CA, the observed syndrome. USA) (Appendix S2). Ten animals were identified as Analysis of the identified ABCA12 missense variant heterozygous, with the remaining 120 animals homozy- (NM_001191294.2:c.6776T>C) using polymorphism phe- gous wildtype for the ABCA12 c.6776T>C variant, giving notyping (PolyPhen-2) (Adzhubei et al. 2010) predicted an estimated allele frequency of 3.8% for this herd. 
that the variant was deleterious to protein function and the The predicted deleterious protein effect of the c.6776T>C SNPEFF software (Cingolani et al. 2012) predicted a moderate variant and the conservation of the leucine amino acid impact on protein function. Cross-species protein align- residue at position 2259 of ABCA12 across nine species ments using T-COFFEE (Di Tommaso et al. 2011) and suggest that the novel variant is likely to have caused the BOXSHADE (version 3.2) showed that the implicated disease in the affected Shorthorn calf (O’Rourke et al. 2017). leucine amino acid in the predicted amino acid exchange Human ichthyosis studies have shown that at least one from leucine to proline (NP_001178223.2:p.(Leu2259- deletion or truncation mutation in ABCA12 leads to the Pro)) in the affected calf is highly conserved (Fig. 1). severe phenotype of Harlequin ichthyosis owing to severe To confirm that the c.6776T>C variant in ABCA12 alteration of the ABCA12 protein (Akiyama 2006; Akiyama segregated in a recessive mode of inheritance, the region et al. 2006). In contrast, missense mutations in the ABCA12 was amplified by PCR and Sanger sequenced in the affected gene are often associated with a less severe type of calf, reported sire and dam and four unrelated Shorthorn ichthyosis in humans: lamellar ichthyosis type 2 (Lefevre samples (Appendix S1). Analysis of sequencing data using et al. 2003; Akiyama et al. 2006). However, a study GENEIOUS version 11.0.3 (https://www.geneious.com) conducted by Akiyama et al. (2006) showed that a human revealed that the affected calf was homozygous for the patient with Harlequin ichthyosis harboured a causal variant C allele, the obligate carrier dam was heterozygous, compound heterozygous mutation: a de-novo missense but the reported sire and the four non-related Shorthorn mutation and a maternal deletion mutation in the ABCA12 controls were homozygous for the wildtype allele (Fig. 2, gene. 
Additional studies have shown that causal compound Table S3). Using the Ion Torrent S5TM XL system heterozygous mutations containing at least one missense

Figure 1 Multiple-species ABCA12 protein alignment using T-COFFEE and BOXSHADE was completed using accession numbers NP_001178223.2 (Bos taurus), NP_775099.2 (Homo sapiens), XP_001149722.1 (Pan troglodytes), XP_001084970.2 (Macaca mulatta), XP_536058.2 (Canis lupus familiaris), NP_780419.2 (Mus musculus), XP_237242.6 (Rattus norvegicus), XP_421867.4 (Gallus gallus), XP_686632.6 (Danio rerio) and XP_004918315.1 (Xenopus tropicalis). CowMT refers to the mutant Bos taurus sequence and CowWT refers to the wildtype Bos taurus sequence. The predicted change from the highly conserved leucine to proline in the affected calf is highlighted by an asterisk.

© 2019 Stichting International Foundation for Animal Genetics, doi: 10.1111/age.12856 76 ABCA12 variant in a calf with ichthyosis fetalis 3

Figure 2 Schematic diagram of the ABCA12 gene showing the location of the candidate causal variant NM_001191294.2:g.103016791T>C with chromatograms from Sanger sequencing data for the reported dam, sire and affected calf. (a) Location of the bovine ABCA12 gene, Chr2:103002532-103202095 on the ARS-UCD1.2 bovine genome assembly. (b) Enlarged view of the ABCA12 gene with 53 exons. (c) Genomic region containing the T>C missense variant with protein translation frames obtained from NCBI Genomic Data Viewer (NCBI, accessed 7 May 2019, ). The correct protein reading frame is identi- fied by a black box. (d) Sanger sequencing chromatograms for the reported dam, sire and affected calf.

mutation have been associated with the human Harlequin ichthyosis phenotype (Xie et al. 2016; Loo et al. 2018). Acknowledgements These findings suggest that the location of causal mutations The authors would like to acknowledge and thank the affecting the ABCA12 protein may impact on the severity of producer and veterinarians Jillian Kelly and Erica Kennedy the ichthyosis phenotype observed (Akiyama et al. 2006). for submitting samples and data associated with this case. Rare diseases in livestock animals are traditionally poorly This study was supported by the University of Sydney and diagnosed. The report of this case by local district veteri- NSW Department of Primary Industries compact funding narians followed by the diagnosis of ichthyosis fetalis and and an Australian Government Research Training Program WGS has resulted in a likely causal variant to be identified Scholarship. The authors would like to thank the Next in the known ABCA12 candidate gene, which has facilitated Generation Sequencing Platform of the University of Bern improved breeding management practices to be imple- for performing the WGS experiment, and the Interfaculty mented through the availability of a genotyping assay. Bioinformatics Unit of the University of Bern for providing Future studies to assess the functionality of the ABCA12 high-performance computing infrastructure. protein in the presence of the c.6776T>C variant will be valuable for understanding the biological impact of the variant. Screening the c.6776T>C variant in the wider Availability of data Australian Shorthorn population will enable better assess- The WGS data were deposited under the study accession no. ment of the population allele frequency for this variant. PRJEB18113 at the European Nucleotide Archive (www.eb

© 2019 Stichting International Foundation for Animal Genetics, doi: 10.1111/age.12856 77 4 Woolley et al.

i.ac.uk/ena) and the sample accession no. is Loo B.K.G., Batilando M.J., Tan E.C. & Koh M.J.A. (2018) SAMEA4644752. Compound heterozygous mutations with novel missense ABCA12 mutation in harlequin ichthyosis. BMJ Case Reports 2018,1–4 bcr-2017-222025. References Oji V., Tadini G., Akiyama M. et al. (2010) Revised nomenclature and classification of inherited ichthyoses: results of the First Adzhubei I.A., Schmidt S., Peshkin L., Ramensky V.E., Gerasimova Ichthyosis Consensus Conference in Soreze 2009. Journal of the A., Bork P., Kondrashov A.S. & Sunyaev S.R. (2010) A method American Academy of Dermatology 63, 607–41. and server for predicting damaging missense mutations. Nature O’Rourke B.A., Kelly J., Spiers Z.B., Shearer P.L., Porter N.S., Parma Methods 7, 248–9. P. & Longeri M. (2017) Ichthyosis fetalis in Polled Hereford and Akiyama M. (2006) Pathomechanisms of harlequin ichthyosis and Shorthorn calves. Journal of Veterinary Diagnostic Investigation 29, ABCA transporters in human diseases. Archives of Dermatology 874–6. 142, 914–8. Smyth I., Hacking D.F., Hilton A.A. et al. (2008) A mouse model of Akiyama M. (2014) The roles of ABCA12 in epidermal lipid barrier harlequin ichthyosis delineates a key role for Abca12 in lipid formation and keratinocyte differentiation. Biochimica et Biophys- homeostasis. PLoS Genetics 4, e1000192. ica Acta 1841, 435–40. Thomas A.C., Cullup T., Norgett E.E. et al. (2006) ABCA12 is the Akiyama M., Sugiyama-Nakagiri Y., Sakai K. et al. (2005) Muta- major harlequin ichthyosis gene. Journal of Investigative Derma- tions in lipid transporter ABCA12 in harlequin ichthyosis and tology 126, 2408–13. functional recovery by corrective gene transfer. Journal of Clinical Xie H., Xie Y., Peng R., Li L., Zhu Y. & Guo J. (2016) Harlequin Investigation 115, 1777–84. ichthyosis: a novel compound mutation of ABCA12 with Akiyama M., Sakai K., Sugiyama-Nakagiri Y., Yamanaka Y., prenatal diagnosis. 
Clinical And Experimental Dermatology 41, McMillan J.R., Sawamura D., Niizeki H., Miyagawa S. & Shimizu 636–9. H. (2006) Compound Heterozygous Mutations Including a De Zhang L., Ferreyros M., Feng W. et al. (2016) Defects in stratum Novo Missense Mutation in ABCA12 Led to a Case of Harlequin corneum desquamation are the predominant effect of impaired Ichthyosis with Moderate Clinical Severity. Journal of Investigative ABCA12 function in a novel mouse model of harlequin Dermatology 126, 1518–23. ichthyosis. PLoS ONE 11, e0161465. Charlier C., Coppieters W., Rollin F. et al. (2008) Highly effective SNP-based association mapping and management of recessive defects in livestock. Nature Genetics 40, 449–54. Supporting information Cingolani P., Platts A., Wang L.L., Coon M., Nguyen T., Wang L., Land S.J., Lu X. & Ruden D.M. (2012) A program for annotating Additional supporting information may be found online in and predicting the effects of single nucleotide polymorphisms, the Supporting Information section at the end of the article. SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly (Austin) 6,80–92. Table S1 Cattle breeds used for variant filtering of the DePristo M.A., Banks E., Poplin R. et al. (2011) A framework for whole genome sequencing data obtained from the affected variation discovery and genotyping using next-generation DNA Shorthorn calf. 43 – sequencing data. Nature Genetics , 491 501. Table S2 Private protein-changing variants with moder- Di Tommaso P., Moretti S., Xenarios I., Orobitg M., Montanyola A., ate or high predicted impact located within 257 different Chang J.-M., Taly J.-F. & Notredame C. (2011) T-Coffee: a web genes or loci in the whole genome sequencing data of the server for the multiple sequence alignment of protein and RNA sequences using structural information and homology extension. affected calf. Table S3 Nucleic Acids Research 39, W13–7. 
Variants identified in Sanger re-sequencing of Healy P.J., Dennis J.A. & Moule J.F. (1995) Use of hair root as a the reported dam, reported sire, affected calf and four source of DNA for the detection of heterozygotes for recessive unrelated Shorthorn controls. defects in cattle. Australian Veterinary Journal 72, 392. Table S4 Parentage testing results. Output data from Kalinowski S.T., Taper M.L. & Marshall T.C. (2007) Revising how Cervus for the affected calf, reported dam and reported sire the computer program CERVUS accommodates genotyping error showing 0/197 loci mismatches and a positive LOD score increases success in paternity assignment. Molecular Ecology 16, for the reported dam with strict pair confidence (*) assigned – 1099 106. at 95%, and 20/197 loci mismatches and a negative LOD Kelsell D.P., Norgett E.E., Unsworth H. et al. (2005) Mutations in score for the reported sire with relaxed confidence (+) set at ABCA12 underlie the severe congenital skin disease harlequin 80%. ichthyosis. American Journal of Human Genetics 76, 794–803. Appendix S1 Lee E., Le T., Zhu Y. et al. (2018) A craniosynostosis massively Methodology for conventional PCR to create parallel sequencing panel study in 309 Australian and New template for Sanger sequencing. Zealand patients: findings and recommendations. Genetics in Appendix S2 Methodology for the custom allelic discrim- Medicine 20, 1061–8. ination assay real-time PCR. Lefevre C., Audebert S., Jobard F. et al. (2003) Mutations in the transporter ABCA12 are associated with lamellar ichthyosis type 2. Human Molecular Genetics 12, 2369–78.

3.3 Appendix: Supplementary material for Chapter 3

Table S1 Cattle breeds used for variant filtering of the whole genome sequence data obtained from the affected Shorthorn calf.

EBI Project ID  EBI Sample ID  Breed
PRJEB12093      SAMEA3706825   Angus
PRJEB18113      SAMEA5159851   Australian Lowline
PRJEB18113      SAMEA5415491   Belgium Blue
PRJEB18113      SAMEA5415493   Belgium Blue x Holstein
PRJEB18113      SAMEA5415495   Belgium Blue x Holstein
PRJEB18113      SAMEA5415497   Belgium Blue x Holstein
PRJEB14604      SAMEA4051548   Brown Swiss
PRJEB18113      SAMEA4644727   Brown Swiss
PRJEB18113      SAMEA4644728   Brown Swiss
PRJEB18113      SAMEA4644735   Brown Swiss
PRJEB18113      SAMEA4644739   Brown Swiss
PRJEB18113      SAMEA4644742   Brown Swiss
PRJEB18113      SAMEA4644743   Brown Swiss
PRJEB18113      SAMEA4644754   Brown Swiss
PRJEB18113      SAMEA4644755   Brown Swiss
PRJEB18113      SAMEA4644756   Brown Swiss
PRJEB18113      SAMEA4644757   Brown Swiss
PRJEB18113      SAMEA4644758   Brown Swiss
PRJEB18113      SAMEA4644762   Brown Swiss
PRJEB18113      SAMEA4644763   Brown Swiss
PRJEB18113      SAMEA4644765   Brown Swiss
PRJEB18113      SAMEA4644766   Brown Swiss
PRJEB18113      SAMEA4644769   Brown Swiss
PRJEB18113      SAMEA5415488   Brown Swiss
PRJEB18113      SAMEA5415489   Brown Swiss
PRJEB18113      SAMEA5415490   Brown Swiss
PRJEB18113      SAMEA5415498   Brown Swiss
PRJEB18113      SAMEA19312918  Brown Swiss
PRJEB18113      SAMEA19313668  Brown Swiss
PRJEB18113      SAMEA19314418  Brown Swiss
PRJEB18113      SAMEA19315168  Brown Swiss
PRJEB18113      SAMEA19318918  Brown Swiss
PRJEB18113      SAMEA19323418  Brown Swiss
PRJEB18113      SAMEA19847668  Brown Swiss
PRJEB18113      SAMEA19864918  Brown Swiss
PRJEB18113      SAMEA32980918  Brown Swiss
PRJEB18113      SAMEA32981668  Brown Swiss
PRJEB18113      SAMEA32982418  Brown Swiss
PRJEB18113      SAMEA32997418  Brown Swiss
PRJEB18113      SAMEA5159761   Brown Swiss
PRJEB18113      SAMEA5159769   Brown Swiss
PRJEB18113      SAMEA5159770   Brown Swiss
PRJEB18113      SAMEA5159771   Brown Swiss
PRJEB18113      SAMEA5159772   Brown Swiss
PRJEB18113      SAMEA5159773   Brown Swiss
PRJEB18113      SAMEA5159774   Brown Swiss
PRJEB18113      SAMEA5159775   Brown Swiss
PRJEB18113      SAMEA5159777   Brown Swiss
PRJEB18113      SAMEA5159778   Brown Swiss
PRJEB18113      SAMEA5159779   Brown Swiss
PRJEB18113      SAMEA5159780   Brown Swiss
PRJEB18113      SAMEA5159781   Brown Swiss
PRJEB18113      SAMEA5159782   Brown Swiss
PRJEB18113      SAMEA5159783   Brown Swiss
PRJEB18113      SAMEA5159784   Brown Swiss
PRJEB18113      SAMEA5159785   Brown Swiss
PRJEB18113      SAMEA5159786   Brown Swiss
PRJEB18113      SAMEA5159787   Brown Swiss
PRJEB18113      SAMEA5159788   Brown Swiss
PRJEB18113      SAMEA5159791   Brown Swiss
PRJEB18113      SAMEA5159792   Brown Swiss
PRJEB18113      SAMEA5159793   Brown Swiss
PRJEB18113      SAMEA5159797   Brown Swiss
PRJEB18113      SAMEA5159798   Brown Swiss
PRJEB18113      SAMEA5159799   Brown Swiss
PRJEB18113      SAMEA5159847   Brown Swiss
PRJEB18113      SAMEA5159853   Brown Swiss
PRJEB18113      SAMEA5159861   Brown Swiss
PRJEB18113      SAMEA5159862   Brown Swiss
PRJEB18113      SAMEA5159863   Brown Swiss
PRJEB18113      SAMEA5159864   Brown Swiss
PRJEB18113      SAMEA5159865   Brown Swiss
PRJEB18113      SAMEA5159866   Brown Swiss
PRJEB18113      SAMEA5159867   Brown Swiss
PRJEB18113      SAMEA5159868   Brown Swiss
PRJEB18113      SAMEA5159869   Brown Swiss
PRJEB18113      SAMEA5159870   Brown Swiss
PRJEB18113      SAMEA5159871   Brown Swiss
PRJEB18113      SAMEA5159872   Brown Swiss
PRJEB18113      SAMEA5159873   Brown Swiss
PRJEB18113      SAMEA5159874   Brown Swiss
PRJEB18113      SAMEA5159875   Brown Swiss
PRJEB18113      SAMEA5159885   Brown Swiss
PRJEB18113      SAMEA5415485   Brown Swiss
PRJEB18113      SAMEA5415486   Brown Swiss
PRJEB7528       SAMEA2821387   Charolais
PRJEB18113      SAMEA32998168  Chianina
PRJEB18113      SAMEA32999668  Chianina
PRJEB18113      SAMEA33668668  Chianina
PRJEB18113      SAMEA5159835   Chianina
PRJEB18113      SAMEA32989918  Cika
PRJEB12094      SAMEA3706827   Danish Red Dairy
PRJEB12094      SAMEA3706828   Danish Red Dairy
PRJEB12094      SAMEA3706829   Danish Red Dairy
PRJEB18113      SAMEA19849168  Eringer
PRJEB18113      SAMEA4560538   Eringer
PRJEB14604      SAMEA4051550   Galloway
PRJEB7527       SAMEA2821386   Hereford
PRJEB11963      SAMEA3682653   Holstein
PRJEB11963      SAMEA3682654   Holstein
PRJEB12092      SAMEA3706814   Holstein
PRJEB12092      SAMEA3706815   Holstein
PRJEB12092      SAMEA3706816   Holstein
PRJEB12095      SAMEA3706830   Holstein
PRJEB18113      SAMEA4644726   Holstein
PRJEB18113      SAMEA4644729   Holstein
PRJEB18113      SAMEA4644731   Holstein
PRJEB18113      SAMEA4644732   Holstein
PRJEB18113      SAMEA4644733   Holstein
PRJEB18113      SAMEA4644736   Holstein
PRJEB18113      SAMEA4644737   Holstein
PRJEB18113      SAMEA4644738   Holstein
PRJEB18113      SAMEA4644746   Holstein
PRJEB18113      SAMEA4644747   Holstein
PRJEB18113      SAMEA4644748   Holstein
PRJEB18113      SAMEA4644751   Holstein
PRJEB18113      SAMEA4644753   Holstein
PRJEB18113      SAMEA4644759   Holstein
PRJEB18113      SAMEA4644760   Holstein
PRJEB18113      SAMEA4644761   Holstein
PRJEB18113      SAMEA4644767   Holstein
PRJEB18113      SAMEA5415492   Holstein
PRJEB18113      SAMEA5415494   Holstein
PRJEB18113      SAMEA5415496   Holstein
PRJEB18113      SAMEA5415500   Holstein
PRJEB18113      SAMEA5415501   Holstein
PRJEB18113      SAMEA5415502   Holstein
PRJEB18113      SAMEA5415503   Holstein
PRJEB18113      SAMEA19309918  Holstein
PRJEB18113      SAMEA19310668  Holstein
PRJEB18113      SAMEA19311418  Holstein
PRJEB18113      SAMEA19312168  Holstein
PRJEB18113      SAMEA19316668  Holstein
PRJEB18113      SAMEA19317418  Holstein
PRJEB18113      SAMEA19318168  Holstein
PRJEB18113      SAMEA19320418  Holstein
PRJEB18113      SAMEA19321168  Holstein
PRJEB18113      SAMEA19321918  Holstein
PRJEB18113      SAMEA19322668  Holstein
PRJEB18113      SAMEA19325668  Holstein
PRJEB18113      SAMEA19846918  Holstein
PRJEB18113      SAMEA19848418  Holstein
PRJEB18113      SAMEA19864168  Holstein
PRJEB18113      SAMEA19865668  Holstein
PRJEB18113      SAMEA19867168  Holstein
PRJEB18113      SAMEA19871668  Holstein
PRJEB18113      SAMEA19874668  Holstein
PRJEB18113      SAMEA19876918  Holstein
PRJEB18113      SAMEA32983168  Holstein
PRJEB18113      SAMEA32983918  Holstein
PRJEB18113      SAMEA32984668  Holstein
PRJEB18113      SAMEA32986918  Holstein
PRJEB18113      SAMEA32990668  Holstein
PRJEB18113      SAMEA32991418  Holstein
PRJEB18113      SAMEA32992168  Holstein
PRJEB18113      SAMEA32995168  Holstein
PRJEB18113      SAMEA32995918  Holstein
PRJEB18113      SAMEA32996668  Holstein
PRJEB18113      SAMEA33000418  Holstein
PRJEB18113      SAMEA33004168  Holstein
PRJEB18113      SAMEA4560543   Holstein
PRJEB18113      SAMEA5159802   Holstein
PRJEB18113      SAMEA5159804   Holstein
PRJEB18113      SAMEA5159808   Holstein
PRJEB18113      SAMEA5159809   Holstein
PRJEB18113      SAMEA5159810   Holstein
PRJEB18113      SAMEA5159813   Holstein
PRJEB18113      SAMEA5159814   Holstein
PRJEB18113      SAMEA5159815   Holstein
PRJEB18113      SAMEA5159816   Holstein
PRJEB18113      SAMEA5159817   Holstein
PRJEB18113      SAMEA5159818   Holstein
PRJEB18113      SAMEA5159819   Holstein
PRJEB18113      SAMEA5159820   Holstein
PRJEB18113      SAMEA5159821   Holstein
PRJEB18113      SAMEA5159822   Holstein
PRJEB18113      SAMEA5159826   Holstein
PRJEB18113      SAMEA5159828   Holstein
PRJEB18113      SAMEA5159832   Holstein
PRJEB18113      SAMEA5159833   Holstein
PRJEB18113      SAMEA5159834   Holstein
PRJEB18113      SAMEA5159839   Holstein
PRJEB18113      SAMEA5159840   Holstein
PRJEB18113      SAMEA5159841   Holstein
PRJEB18113      SAMEA5159842   Holstein
PRJEB18113      SAMEA5159844   Holstein
PRJEB18113      SAMEA5159854   Holstein
PRJEB18113      SAMEA5159855   Holstein
PRJEB18113      SAMEA5159856   Holstein
PRJEB18113      SAMEA5159857   Holstein
PRJEB18113      SAMEA5159858   Holstein
PRJEB18113      SAMEA5159859   Holstein
PRJEB18113      SAMEA5159860   Holstein
PRJEB18113      SAMEA5159878   Holstein
PRJEB18113      SAMEA5159879   Holstein
PRJEB18113      SAMEA5159880   Holstein
PRJEB18113      SAMEA5159881   Holstein
PRJEB18113      SAMEA5159882   Holstein
PRJEB18113      SAMEA5159883   Holstein
PRJEB18113      SAMEA5159884   Holstein
PRJEB18113      SAMEA5159890   Holstein
PRJEB18113      SAMEA5160021   Holstein
PRJEB18113      SAMEA5160152   Holstein
PRJEB18113      SAMEA5415483   Holstein
PRJEB18113      SAMEA5415484   Holstein
PRJEB7707       SAMEA3113485   Holstein
PRJEB18113      SAMEA19867918  Limousin
PRJEB18113      SAMEA32985418  Limousin
PRJEB18113      SAMEA5159763   Limousin
PRJEB18113      SAMEA32992918  Limousin x Brown Swiss
PRJEB18113      SAMEA19866418  Limousin x Holstein
PRJEB18113      SAMEA33001168  Normande
PRJEB18113      SAMEA4644730   Original Braunvieh
PRJEB18113      SAMEA4644734   Original Braunvieh
PRJEB18113      SAMEA4644740   Original Braunvieh
PRJEB18113      SAMEA4644741   Original Braunvieh
PRJEB18113      SAMEA4644749   Original Braunvieh
PRJEB18113      SAMEA4644750   Original Braunvieh
PRJEB18113      SAMEA4644764   Original Braunvieh
PRJEB18113      SAMEA4644768   Original Braunvieh
PRJEB18113      SAMEA5159767   Original Braunvieh
PRJEB18113      SAMEA5159768   Original Braunvieh
PRJEB18113      SAMEA5159776   Original Braunvieh
PRJEB18113      SAMEA5159789   Original Braunvieh
PRJEB18113      SAMEA5159790   Original Braunvieh
PRJEB18113      SAMEA5159794   Original Braunvieh
PRJEB18113      SAMEA5159795   Original Braunvieh
PRJEB18113      SAMEA5159796   Original Braunvieh
PRJEB18113      SAMEA5159837   Original Braunvieh
PRJEB18113      SAMEA5159843   Original Braunvieh
PRJEB18113      SAMEA5159848   Original Braunvieh
PRJEB18113      SAMEA5159849   Original Braunvieh
PRJEB18113      SAMEA5159850   Original Braunvieh
PRJEB18113      SAMEA5159886   Original Braunvieh
PRJEB28191      SAMEA4827645   Original Braunvieh
PRJEB28191      SAMEA4827646   Original Braunvieh
PRJEB28191      SAMEA4827647   Original Braunvieh
PRJEB28191      SAMEA4827648   Original Braunvieh
PRJEB28191      SAMEA4827649   Original Braunvieh
PRJEB28191      SAMEA4827650   Original Braunvieh
PRJEB28191      SAMEA4827651   Original Braunvieh
PRJEB28191      SAMEA4827652   Original Braunvieh
PRJEB28191      SAMEA4827653   Original Braunvieh
PRJEB28191      SAMEA4827654   Original Braunvieh
PRJEB28191      SAMEA4827655   Original Braunvieh
PRJEB28191      SAMEA4827656   Original Braunvieh
PRJEB28191      SAMEA4827657   Original Braunvieh
PRJEB28191      SAMEA4827658   Original Braunvieh
PRJEB28191      SAMEA4827659   Original Braunvieh
PRJEB28191      SAMEA4827660   Original Braunvieh
PRJEB28191      SAMEA4827661   Original Braunvieh
PRJEB28191      SAMEA4827662   Original Braunvieh
PRJEB28191      SAMEA4827663   Original Braunvieh
PRJEB28191      SAMEA4827664   Original Braunvieh
PRJEB28191      SAMEA4827665   Original Braunvieh
PRJEB28191      SAMEA4827666   Original Braunvieh
PRJEB28191      SAMEA4827667   Original Braunvieh
PRJEB28191      SAMEA4827668   Original Braunvieh
PRJEB28191      SAMEA4827669   Original Braunvieh
PRJEB28191      SAMEA4827670   Original Braunvieh
PRJEB28191      SAMEA4827671   Original Braunvieh
PRJEB28191      SAMEA4827672   Original Braunvieh
PRJEB28191      SAMEA4827673   Original Braunvieh
PRJEB28191      SAMEA4827674   Original Braunvieh
PRJEB28191      SAMEA5059741   Original Braunvieh
PRJEB28191      SAMEA5059742   Original Braunvieh
PRJEB28191      SAMEA5059743   Original Braunvieh
PRJEB28191      SAMEA5059744   Original Braunvieh
PRJEB28191      SAMEA5059745   Original Braunvieh
PRJEB28191      SAMEA5059746   Original Braunvieh
PRJEB28191      SAMEA5059747   Original Braunvieh
PRJEB28191      SAMEA5059748   Original Braunvieh
PRJEB28191      SAMEA5059749   Original Braunvieh
PRJEB28191      SAMEA5059750   Original Braunvieh
PRJEB28191      SAMEA5059751   Original Braunvieh
PRJEB28191      SAMEA5059752   Original Braunvieh
PRJEB28191      SAMEA5059753   Original Braunvieh
PRJEB28191      SAMEA5059754   Original Braunvieh
PRJEB28191      SAMEA5059755   Original Braunvieh
PRJEB28191      SAMEA5059756   Original Braunvieh
PRJEB28191      SAMEA5059757   Original Braunvieh
PRJEB28191      SAMEA5059758   Original Braunvieh
PRJEB28191      SAMEA5059759   Original Braunvieh
PRJEB8226       SAMEA3209361   Pezzata Rossa Italiana
PRJEB18113      SAMEA19324918  Piemontese
PRJEB18113      SAMEA33001918  Piemontese
PRJEB18113      SAMEA5159836   Piemontese
PRJEB18113      SAMEA19872418  Piemontese x Normande
PRJEB18113      SAMEA5159845   Pinzgauer
PRJEB18113      SAMEA19876168  Romagnola
PRJEB18113      SAMEA32987668  Romagnola
PRJEB18113      SAMEA32988418  Romagnola
PRJEB18113      SAMEA32989168  Romagnola
PRJEB5965       SAMEA2422242   Romagnola
PRJEB18113      SAMEA19849918  Scottish Highland
PRJEB18113      SAMEA19873918  Scottish Highland
PRJEB18113      SAMEA5159829   Scottish Highland
PRJEB18113      SAMEA5159830   Scottish Highland
PRJEB18113      SAMEA5159831   Scottish Highland
PRJEB12093      SAMEA3706826   Simmental
PRJEB18113      SAMEA4644744   Simmental
PRJEB18113      SAMEA4644745   Simmental
PRJEB18113      SAMEA5415499   Simmental
PRJEB18113      SAMEA19309168  Simmental
PRJEB18113      SAMEA19852168  Simmental
PRJEB18113      SAMEA19852918  Simmental
PRJEB18113      SAMEA19853668  Simmental
PRJEB18113      SAMEA19868668  Simmental
PRJEB18113      SAMEA19869418  Simmental
PRJEB18113      SAMEA19870168  Simmental
PRJEB18113      SAMEA19875418  Simmental
PRJEB18113      SAMEA19877668  Simmental
PRJEB18113      SAMEA33004918  Simmental
PRJEB18113      SAMEA33669418  Simmental
PRJEB18113      SAMEA5159760   Simmental
PRJEB18113      SAMEA5159765   Simmental
PRJEB18113      SAMEA5159766   Simmental
PRJEB18113      SAMEA5159800   Simmental
PRJEB18113      SAMEA5159801   Simmental
PRJEB18113      SAMEA5159803   Simmental
PRJEB18113      SAMEA5159805   Simmental
PRJEB18113      SAMEA5159806   Simmental
PRJEB18113      SAMEA5159807   Simmental
PRJEB18113      SAMEA5159811   Simmental
PRJEB18113      SAMEA5159812   Simmental
PRJEB18113      SAMEA5159823   Simmental
PRJEB18113      SAMEA5159824   Simmental
PRJEB18113      SAMEA5159825   Simmental
PRJEB18113      SAMEA5159827   Simmental
PRJEB18113      SAMEA5159876   Simmental
PRJEB18113      SAMEA5159877   Simmental
PRJEB18113      SAMEA5160153   Simmental
PRJEB18113      SAMEA5415487   Simmental
PRJEB12093      SAMEA3706824   Simmental x Angus
PRJEB18113      SAMEA5159762   Simmental x Holstein
PRJEB18113      SAMEA5159846   Tux-Zillertal
PRJEB11962      SAMEA3682652   Tyrolean Grey
PRJEB18113      SAMEA5159887   Tyrolean Grey
PRJEB18113      SAMEA5159888   Tyrolean Grey
PRJEB18113      SAMEA5159889   Tyrolean Grey
PRJEB5435       SAMEA2357050   Tyrolean Grey
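The filtering step that uses the control genomes in Table S1 can be illustrated with a minimal sketch. It assumes that "private" means the affected calf carries a non-reference allele that no control genome carries, and that only variants with a moderate or high predicted impact are retained. The record layout, variant identifiers and genotypes below are hypothetical toy data and are not taken from the study.

```python
# Minimal sketch of private-variant filtering (toy data, hypothetical field names).
KEPT_IMPACTS = {"MODERATE", "HIGH"}


def carries_alt(genotype: str) -> bool:
    """True if a VCF-style genotype such as '0/1' or '1|1' has a non-reference allele."""
    alleles = genotype.replace("|", "/").split("/")
    return any(allele not in ("0", ".") for allele in alleles)


def is_private(case_gt: str, control_gts: list[str]) -> bool:
    """Assumed definition: present in the case and absent from every control."""
    return carries_alt(case_gt) and not any(carries_alt(gt) for gt in control_gts)


# Toy variants: 'case' is the affected animal, 'controls' stands in for the control genomes.
variants = [
    {"id": "v1", "impact": "MODERATE", "case": "1/1", "controls": ["0/0", "0/0"]},
    {"id": "v2", "impact": "MODERATE", "case": "0/1", "controls": ["0/1", "0/0"]},
    {"id": "v3", "impact": "LOW", "case": "1/1", "controls": ["0/0", "0/0"]},
    {"id": "v4", "impact": "HIGH", "case": "0/1", "controls": ["./.", "0/0"]},
]

private_ids = [
    v["id"]
    for v in variants
    if v["impact"] in KEPT_IMPACTS and is_private(v["case"], v["controls"])
]
print(private_ids)
```

In this sketch a missing control genotype (`./.`) is treated as not carrying the variant, which is a simplification; a stricter filter might require a minimum number of called controls.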

Table S3 Variants identified in Sanger re-sequencing of the reported dam, reported sire, affected calf and four un-related Shorthorn controls.

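Genomic location (ARS-UCD1.2)  Ref  Alt  Type                   Sample ID      Genotype of individual
Chr2:g.103016935               G    A    Non-coding             Reported dam   GA
Chr2:g.103016898               G    A    rs209014003            Control A      AA
Chr2:g.103016898               G    A    rs209014003            Control B      AA
Chr2:g.103016898               G    A    rs209014003            Control C      AA
Chr2:g.103016898               G    A    rs209014003            Control D      AA
Chr2:g.103016898               G    A    rs209014003            Reported dam   GA
Chr2:g.103016898               G    A    rs209014003            Affected calf  AA
Chr2:g.103016898               G    A    rs209014003            Reported sire  GA
Chr2:g.103016791               A    G    Likely causal variant  Reported dam   AG
Chr2:g.103016791               A    G    Likely causal variant  Affected calf  GG
Chr2:g.103016791               A    G    Likely causal variant  Reported sire  AA
Chr2:g.103016660               C    T    rs135167457            Control A      TT
Chr2:g.103016660               C    T    rs135167457            Control B      TT
Chr2:g.103016660               C    T    rs135167457            Control C      TT
Chr2:g.103016660               C    T    rs135167457            Control D      TT
Chr2:g.103016660               C    T    rs135167457            Reported dam   TT
Chr2:g.103016660               C    T    rs135167457            Affected calf  TT
Chr2:g.103016660               C    T    rs135167457            Reported sire  CT
Chr2:g.103016544               A    G    rs211650610            Control A      GG
Chr2:g.103016544               A    G    rs211650610            Control B      GG
Chr2:g.103016544               A    G    rs211650610            Control C      GG
Chr2:g.103016544               A    G    rs211650610            Control D      GG
Chr2:g.103016544               A    G    rs211650610            Reported dam   GG
Chr2:g.103016544               A    G    rs211650610            Affected calf  GG
Chr2:g.103016544               A    G    rs211650610            Reported sire  GG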
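The trio genotypes in Table S3 can be checked for Mendelian consistency with a short sketch: a calf genotype is consistent with its reported parents if it can be formed from one allele of each. The genotypes are copied from Table S3 for the four positions where all three animals were typed; the function names are illustrative only.

```python
from itertools import product


def possible_offspring(dam: str, sire: str) -> set[str]:
    """Unordered genotypes obtainable by taking one allele from each parent."""
    return {"".join(sorted(a + b)) for a, b in product(dam, sire)}


def is_consistent(calf: str, dam: str, sire: str) -> bool:
    """True if the calf genotype is compatible with both reported parents."""
    return "".join(sorted(calf)) in possible_offspring(dam, sire)


# (affected calf, reported dam, reported sire) genotypes from Table S3
trio = {
    "Chr2:g.103016898": ("AA", "GA", "GA"),
    "Chr2:g.103016791": ("GG", "AG", "AA"),
    "Chr2:g.103016660": ("TT", "TT", "CT"),
    "Chr2:g.103016544": ("GG", "GG", "GG"),
}

results = {pos: is_consistent(*genotypes) for pos, genotypes in trio.items()}
for pos, ok in results.items():
    print(pos, "consistent" if ok else "inconsistent with reported parents")
```

Only the position of the likely causal variant is inconsistent with the reported sire, which agrees with the exclusion of the reported sire by parentage testing (Table S4).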


Table S4 Parentage testing results. Output data from Cervus for the affected calf, reported dam and reported sire showing 0/197 loci mismatches and a positive LOD score for the reported dam with strict pair confidence (*) assigned at 95%, and 20/197 loci mismatches and a negative LOD score for the reported sire with relaxed confidence (+) set at 80%.

Offspring ID: Affected calf (loci typed: 197)

Candidate  Candidate ID   Loci typed  Pair loci compared  Pair loci mismatching  Pair LOD score  Pair top LOD  Pair confidence
1st        Reported dam   197         197                 0                      68.1            68.1          *
2nd        Reported sire  197         197                 20                     -25.8           -25.8         +
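As a rough illustration of the mismatch counts in Table S4, the sketch below counts loci at which an offspring and a candidate parent share no allele, and converts the reported counts into mismatch rates. This is not the likelihood (LOD) method that Cervus applies; it is a simple exclusion count, and the SNP names and genotypes in the toy example are hypothetical.

```python
def count_mismatches(offspring: dict[str, str], candidate: dict[str, str]) -> tuple[int, int]:
    """Return (loci compared, loci mismatching) for loci typed in both animals.

    A locus is counted as mismatching when the two genotypes share no allele.
    """
    shared_loci = offspring.keys() & candidate.keys()
    mismatching = sum(
        1 for locus in shared_loci if not set(offspring[locus]) & set(candidate[locus])
    )
    return len(shared_loci), mismatching


# Hypothetical toy genotypes, for illustration only
toy_offspring = {"snp1": "AG", "snp2": "CC", "snp3": "TT"}
toy_candidate = {"snp1": "AA", "snp2": "TT", "snp3": "CT"}
toy_counts = count_mismatches(toy_offspring, toy_candidate)

# Mismatch rates from the counts reported in Table S4 (mismatching, compared)
reported = {"Reported dam": (0, 197), "Reported sire": (20, 197)}
rates = {name: mism / compared for name, (mism, compared) in reported.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.1%} of compared loci mismatching")
```

A handful of mismatches can arise from genotyping error, which is one reason likelihood-based software such as Cervus is used rather than a strict exclusion rule.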

Appendix S1 Methodology for conventional PCR to create template for Sanger sequencing:

PCR was performed using a Mastercycler® pro (Eppendorf, Hamburg, Germany) with 5’-TTGTCACTTACCATGTCAACATAAA-3’ (forward primer) and 5’-ACTTGGCACACTTTGGTGAAG-3’ (reverse primer) primers in 20 µL volumes comprising 10 mmol/L Tris-HCl pH 8.3, 50 mmol/L KCl, 1.5 mmol/L MgCl2, 0.1 mM dNTPs, 0.5 µM of each primer, 0.05 U of Taq polymerase (Roche, Switzerland) and 15-20 ng/µL of target DNA. The initial denaturation step was performed at 94°C for 3 min, followed by 35 cycles consisting of a denaturation step of 15 s at 94°C, an annealing step of 20 s at 62°C and an extension step of 30 s at 72°C. A final step was performed at 72°C for 5 min with a hold step at 15°C.
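The cycling conditions above can be written down as a small data structure, which makes the programmed hold time easy to compute. This is only a sketch: the class and variable names are illustrative, and the total ignores ramp times between temperatures and the final 15°C hold, so the real run time on the instrument will be longer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: int
    seconds: int


# Values taken from Appendix S1
initial_denaturation = Step("initial denaturation", 94, 3 * 60)
cycle = [
    Step("denaturation", 94, 15),
    Step("annealing", 62, 20),
    Step("extension", 72, 30),
]
final_step = Step("final step", 72, 5 * 60)
N_CYCLES = 35

# Sum of programmed hold times only (no ramping, no 15°C hold)
total_seconds = (
    initial_denaturation.seconds
    + N_CYCLES * sum(step.seconds for step in cycle)
    + final_step.seconds
)
print(f"Programmed hold time: {total_seconds} s ({total_seconds / 60:.1f} min)")
```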

Appendix S2 Methodology for the custom allelic discrimination assay real-time PCR:

Allelic discrimination was performed using the ViiA™ 7 system (Applied Biosystems™, CA, USA) in a final reaction volume of 20 µL. Each reaction contained 1 x TaqMan® Genotyping Master Mix (Applied Biosystems™, CA, USA), 900 nmol/L of assay specific primers 5’-ACAGCTATAATCTTTTTGTGGATAAGTTGGT-3’ (forward primer) and 5’-CGAGTTGAGAAAGGTGCAAATGATT-3’ (reverse primer), 250 nmol/L of allele specific 5’-CTCCATCGTCTCACAAAG-3’ (wildtype probe) and 5’-CCATCGTCCCACAAAG-3’ (mutant probe) probes and between 10 and 30 ng of genomic DNA. Each assay commenced with a pre-read stage at 60°C for 30 s followed by an initial denaturation at 95°C for 10 min. Denaturation then occurred at 95°C for 15 s followed by annealing/extension at 60°C for 60 s for 45 cycles, with a post-read stage at 60°C for 30 s. Genotypes were analysed using the QuantStudio™ Real-Time PCR System version 1.3 (Applied Biosystems™, CA, USA).
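The herd screening performed with this assay identified 10 heterozygous and 120 homozygous wildtype animals among 130 Shorthorn cattle. The reported allele frequency of 3.8% follows from counting variant alleles over all alleles sampled, as the short sketch below shows; the function name is illustrative.

```python
def variant_allele_frequency(n_heterozygous: int, n_homozygous_variant: int, n_animals: int) -> float:
    """Variant allele count divided by the total number of alleles (two per animal)."""
    if n_animals <= 0:
        raise ValueError("n_animals must be positive")
    return (n_heterozygous + 2 * n_homozygous_variant) / (2 * n_animals)


# Counts reported for the screened herd: 10 carriers, no homozygous variant animals
allele_freq = variant_allele_frequency(n_heterozygous=10, n_homozygous_variant=0, n_animals=130)
carrier_freq = 10 / 130

print(f"Variant allele frequency: {allele_freq:.1%}")
print(f"Carrier frequency: {carrier_freq:.1%}")
```

Because all animals came from one property, this estimate describes that herd only and should not be read as a breed-wide frequency.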

Table S2 Private protein-changing variants with moderate or high predicted impact located within 257 different genes or loci in the whole genome sequence data of the affected calf.

#CHROM  POS  REF  ALT  CALF_GENOTYPE  EFFECT  IMPACT  GENE  GENEID  FEATURE  FEATUREID  BIOTYPE  RANK  HGVS_C  HGVS_P
1  5517896  C  G  1/1  missense_variant  MODERATE  LOC785121  785121  transcript  XM_002684593.5  protein_coding  1  c.23G>C  p.Gly8Ala
1  56750154  C  A  0/1  missense_variant  MODERATE  ABHD10  515563  transcript  NM_001015606.2  protein_coding  5  c.837C>A  p.His279Gln
1  56750156  G  A  0/1  missense_variant  MODERATE  ABHD10  515563  transcript  NM_001015606.2  protein_coding  5  c.839G>A  p.Arg280Gln
1  56750159  T  A  0/1  missense_variant  MODERATE  ABHD10  515563  transcript  NM_001015606.2  protein_coding  5  c.842T>A  p.Met281Lys
1  56750160  G  A  0/1  missense_variant  MODERATE  ABHD10  515563  transcript  NM_001015606.2  protein_coding  5  c.843G>A  p.Met281Ile
1  56750164  G  A  0/1  missense_variant  MODERATE  ABHD10  515563  transcript  NM_001015606.2  protein_coding  5  c.847G>A  p.Glu283Lys
1  116650303  T  G  0/1  missense_variant  MODERATE  IGSF10  537487  transcript  NM_001206785.1  protein_coding  6  c.7110T>G  p.Ser2370Arg
1  116685396  T  C  0/1  missense_variant  MODERATE  MED12L  538979  transcript  NM_001206505.1  protein_coding  37  c.5603A>G  p.Gln1868Arg
1  116897105  G  C  0/1  missense_variant  MODERATE  GPR171  767929  transcript  NM_001077002.1  protein_coding  2  c.28G>C  p.Val10Leu
1  126499452  G  A  0/1  missense_variant  MODERATE  ATR  504869  transcript  XM_002685057.6  protein_coding  12  c.2599G>A  p.Glu867Lys
1  141687838  G  A  0/1  missense_variant  MODERATE  MX1  280872  transcript  NM_173940.2  protein_coding  7  c.656G>A  p.Arg219His
2  6627053  C  A  0/1  missense_variant  MODERATE  ANKAR  100140244  transcript  XM_024979745.1  protein_coding  13  c.2632G>T  p.Gly878Cys
2  18124036  C  T  0/1  missense_variant  MODERATE  LOC112442545  112442545  transcript  XM_024979897.1  protein_coding  1  c.3401C>T  p.Pro1134Leu
2  26517539  C  G  0/1  missense_variant  MODERATE  UBR3  537932  transcript  XM_005202390.4  protein_coding  1  c.29G>C  p.Gly10Ala
2  26946061  C  T  0/1  missense_variant  MODERATE  LRP2  100337021  transcript  XM_024983502.1  protein_coding  26  c.3893C>T  p.Pro1298Leu
2  34054887  C  A  0/1  missense_variant  MODERATE  LOC112442562  112442562  transcript  XM_024980377.1  protein_coding  1  c.140C>A  p.Ala47Glu
2  37136378  A  C  0/1  missense_variant  MODERATE  TANC1  507983  transcript  XM_002685349.5  protein_coding  27  c.4183T>G  p.Leu1395Val
2  79937746  C  T  1/1  missense_variant  MODERATE  MYO1B  537282  transcript  NM_001102199.1  protein_coding  10  c.869C>T  p.Ser290Leu
2  96288344  G  A  1/1  missense_variant  MODERATE  PLEKHM3  533312  transcript  NM_001206217.1  protein_coding  2  c.370C>T  p.Arg124Cys
2  103016791  A  G  1/1  missense_variant  MODERATE  ABCA12  523479  transcript  NM_001191294.2  protein_coding  45  c.6776T>C  p.Leu2259Pro
2  112071216  C  A  0/1  missense_variant  MODERATE  MRPL44  532389  transcript  NM_001046321.2  protein_coding  2  c.415C>A  p.Gln139Lys
2  118294664  G  A  0/1  missense_variant  MODERATE  SP140  510377  transcript  XM_024981805.1  protein_coding  19  c.1874G>A  p.Arg625His
3  8567298  A  T  1/1  missense_variant  MODERATE  LOC100336682  100336682  transcript  XM_010802791.3  protein_coding  6  c.631A>T  p.Thr211Ser
3  10270213  C  T  0/1  missense_variant  MODERATE  LOC616952  616952  transcript  XM_010802812.1  protein_coding  3  c.934G>A  p.Ala312Thr
3  11126728  A  G  0/1  missense_variant  MODERATE  LOC513494  513494  transcript  XM_002685933.5  protein_coding  1  c.782A>G  p.Lys261Arg
3  17753881  G  A  1/1  missense_variant  MODERATE  LOC507527  507527  transcript  NM_001034301.1  protein_coding  2  c.329C>T  p.Pro110Leu
3  17753921  AT  A  1/1  frameshift_variant  HIGH  LOC507527  507527  transcript  NM_001034301.1  protein_coding  2  c.288delA  p.Cys97fs
3  17753923  GGCTCCGGAGCCTTGGGGTGGCAC  G  1/1  frameshift_variant  HIGH  LOC507527  507527  transcript  NM_001034301.1  protein_coding  2  c.264_286delGTGCCACCCCAAGGCTCCGGAGC  p.Cys89fs
3  54500697  A  G  0/1  missense_variant  MODERATE  GBP4  613313  transcript  NM_001102261.2  protein_coding  8  c.1207A>G  p.Lys403Glu
3  54590474  A  G  0/1  missense_variant  MODERATE  GBP6  533657  transcript  NM_001075995.1  protein_coding  2  c.295A>G  p.Met99Val
3  54774191  G  T  1/1  missense_variant  MODERATE  LOC507055  507055  transcript  XM_002686269.6  protein_coding  8  c.1170G>T  p.Lys390Asn
3  65668962  C  T  1/1  missense_variant  MODERATE  ADGRL4  535066  transcript  NM_001076908.2  protein_coding  2  c.68C>T  p.Thr23Ile
3  101188407  C  T  0/1  stop_gained  HIGH  PTCH2  507948  transcript  XM_002686431.6  protein_coding  7  c.880C>T  p.Gln294*
3  104816988  C  T  1/1  missense_variant  MODERATE  FOXO6  112446032  transcript  XM_024990211.1  protein_coding  2  c.925G>A  p.Gly309Ser
3  104818146  A  G  1/1  missense_variant  MODERATE  FOXO6  112446032  transcript  XM_024990212.1  protein_coding  1  c.128T>C  p.Phe43Ser
3  105247097  T  C  1/1  missense_variant  MODERATE  LOC786231  786231  transcript  XM_010803796.3  protein_coding  2  c.820T>C  p.Cys274Arg
trans XM_0108 protein_ 3
105247524 C T 1/1 missense_variant MODERATE LOC786231 786231 2 c.1247C>T p.Ala416Val cript 03796.3 coding trans XM_0052 protein_ 3 105615650 C A 1/1 missense_variant MODERATE ZNF684 112441496 5 c.506G>T p.Gly169Val cript 04806.4 coding trans NM_0010 protein_ 3 105633991 T A 1/1 missense_variant MODERATE EXO5 538883 3 c.470A>T p.Glu157Val cript 81608.2 coding trans XM_0249 protein_ 3 106318899 G A 1/1 missense_variant MODERATE BMP8B 100296238 4 c.772G>A p.Gly258Ser cript 90223.1 coding trans XM_0026 protein_ 1 p.Ter522Glue 4 31619078 A C 0/1 stop_lost HIGH FAM126A 540584 c.1564T>G cript 86713.6 coding 1 xt*? CA trans XM_0249 protein_ 4 32412145 C 0/1 frameshift_variant HIGH LOC785370 785370 4 c.682_683delAT p.Met228fs T cript 91318.1 coding trans NM_0010 protein_ 4 51665516 G A 0/1 missense_variant MODERATE MET 280855 5 c.1667C>T p.Thr556Met cript 12999.2 coding trans NM_0011 protein_ 4 62274602 T G 0/1 missense_variant MODERATE NPSR1 519486 1 c.38A>C p.Asn13Thr cript 92977.1 coding trans NM_0010 protein_ 1 4 62733626 C T 0/1 missense_variant MODERATE BMPER 534101 c.1723G>A p.Val575Met cript 77997.1 coding 3

92 GG TC disruptive_inframe_d trans XM_0249 protein_ 2 p.Glu1122_T 4 77200984 G 0/1 MODERATE AEBP1 317693 c.3363_3368delGGAGAC TC eletion cript 91120.1 coding 1 hr1123del C trans XM_0052 protein_ 1 p.Gly1528Gl 4 79061762 G A 0/1 missense_variant MODERATE GLI3 785371 c.4583G>A cript 05659.4 coding 5 u trans NM_0011 protein_ 1 4 92504572 C T 0/1 missense_variant MODERATE RBM28 507132 c.1364G>A p.Gly455Glu cript 91390.1 coding 3 trans XM_0249 protein_ 4 107346953 C T 0/1 missense_variant MODERATE LOC787740 787740 1 c.395C>T p.Thr132Met cript 90943.1 coding trans XM_0154 protein_ 4 107372136 T C 0/1 missense_variant MODERATE LOC514891 514891 1 c.143T>C p.Ile48Thr cript 70791.2 coding trans XM_0026 protein_ 4 107479074 C G 0/1 missense_variant MODERATE LOC514662 514662 1 c.26C>G p.Ala9Gly cript 87031.1 coding trans XM_0249 protein_ 4 107536176 C T 0/1 missense_variant MODERATE LOC508101 508101 1 c.697C>T p.Arg233Cys cript 90944.1 coding trans XM_0026 protein_ 4 107556537 G A 0/1 missense_variant MODERATE LOC507423 507423 1 c.174G>A p.Met58Ile cript 87133.1 coding AG disruptive_inframe_i trans XM_0035 protein_ 5 4935415 A 1/1 MODERATE GLIPR1L2 536555 6 c.957_959dupGGA p.Glu320dup AG nsertion cript 86037.5 coding trans XM_0249 protein_ 1 5 9803089 C T 0/1 missense_variant MODERATE OTOGL 531410 c.1784C>T p.Ala595Val cript 92350.1 coding 5 CC trans NM_1741 protein_ 5 10286530 C 0/1 frameshift_variant HIGH MYF5 281335 3 c.691_692delTC p.Ser231fs T cript 16.1 coding trans NM_1743 protein_ 1 5 56390327 G A 1/1 missense_variant MODERATE MYO1A 281936 c.988G>A p.Val330Ile cript 95.3 coding 1 trans XM_0026 protein_ 3 p.Lys1491Ar 5 60881753 A G 0/1 missense_variant MODERATE CFAP54 787705 c.4472A>G cript 87523.6 coding 3 g trans XM_0052 protein_ 5 62698685 G A 0/1 missense_variant MODERATE TMPO 510267 4 c.1303G>A p.Glu435Lys cript 06549.4 coding trans NM_0011 protein_ 5 62785019 A G 0/1 missense_variant MODERATE APAF1 537782 1 c.73A>G p.Ile25Val cript 91507.1 coding trans XM_0052 
protein_ 5 63754756 C T 0/1 stop_gained HIGH FAM71C 101903934 1 c.211C>T p.Gln71* cript 28600.1 coding trans NM_0011 protein_ 5 65951347 T C 0/1 missense_variant MODERATE NUP37 100139879 2 c.74A>G p.Asn25Ser cript 91327.1 coding trans NM_0012 protein_ 5 p.Thr1858Me 5 67409316 C T 0/1 missense_variant MODERATE STAB2 407177 c.5573C>T cript 06679.1 coding 3 t trans NM_0011 protein_ 5 73516288 C T 0/1 missense_variant MODERATE HMGXB4 505539 2 c.440C>T p.Ser147Leu cript 01856.1 coding trans XM_0108 protein_ 5 106834013 C G 0/1 missense_variant MODERATE TEAD4 526771 2 c.596G>C p.Trp199Ser cript 05630.3 coding trans NM_0012 protein_ 1 5 107429316 C T 0/1 missense_variant MODERATE B4GALNT3 527026 c.2173C>T p.Arg725Cys cript 06695.1 coding 5 trans NM_0010 protein_ p.Ter358Leue 5 115731698 G T 0/1 stop_lost HIGH FAM118A 505415 8 c.1073G>T cript 38035.3 coding xt*? trans XM_0052 protein_ 6 35760934 C T 0/1 missense_variant MODERATE FAM13A 282605 4 c.556C>T p.Arg186Cys cript 07747.4 coding trans XM_0108 protein_ 3 7 536128 G A 0/1 missense_variant MODERATE FLT4 338031 c.4048G>A p.Ala1350Thr cript 06455.2 coding 0 LOC112447 trans XM_0249 protein_ 7 8849344 A G 0/1 missense_variant MODERATE 112447485 1 c.614T>C p.Val205Ala 485 cript 95307.1 coding trans XM_0249 protein_ 2 7 16828516 G A 0/1 missense_variant MODERATE FBN3 787458 c.3443C>T p.Ala1148Val cript 94922.1 coding 3 trans NM_0010 protein_ 7 16928642 T C 0/1 missense_variant MODERATE CERS4 505233 4 c.319T>C p.Cys107Arg cript 15520.1 coding trans XM_0108 protein_ 7 18612876 G T 0/1 missense_variant MODERATE HSD11B1L 404546 6 c.571C>A p.Leu191Ile cript 06889.2 coding

93 trans XM_0108 protein_ 1 7 21168789 T C 0/1 missense_variant MODERATE TMPRSS9 518647 c.2402A>G p.Glu801Gly cript 07063.3 coding 5 trans XM_0249 protein_ 1 7 27366440 G A 0/1 missense_variant MODERATE GRAMD2B 505627 c.1396C>T p.Pro466Ser cript 93940.1 coding 3 trans XM_0108 protein_ 7 51910495 A T 0/1 missense_variant MODERATE PCDHA3 787538 1 c.2212A>T p.Arg738Trp cript 07383.3 coding trans XM_0154 protein_ 7 69114792 T C 0/1 missense_variant MODERATE NIPAL4 532275 6 c.709T>C p.Phe237Leu cript 72345.2 coding trans NM_0010 protein_ 7 69297113 G A 0/1 missense_variant MODERATE SOX30 538232 1 c.221C>T p.Pro74Leu cript 46429.2 coding trans NM_1810 protein_ 7 83405292 G T 1/1 missense_variant MODERATE VCAN 282662 7 c.1813G>T p.Val605Phe cript 35.2 coding GC disruptive_inframe_i trans XM_0108 protein_ 1 8 1040713 G 1/1 MODERATE PALLD 617678 c.2283_2285dupCCC p.Pro762dup CC nsertion cript 07668.3 coding 1 trans NM_0011 protein_ 8 70305189 C G 0/1 missense_variant MODERATE RHOBTB2 784567 4 c.922C>G p.Pro308Ala cript 03104.1 coding trans NM_0011 protein_ 8 70305597 G A 0/1 missense_variant MODERATE RHOBTB2 784567 4 c.1330G>A p.Glu444Lys cript 03104.1 coding trans NM_0011 protein_ 8 70314884 G A 0/1 missense_variant MODERATE RHOBTB2 784567 9 c.2102G>A p.Arg701Gln cript 03104.1 coding trans XM_0108 protein_ 8 94127995 T C 1/1 missense_variant MODERATE LOC783328 783328 2 c.355A>G p.Arg119Gly cript 08213.2 coding GC trans NM_0010 protein_ 8 102632035 G 1/1 frameshift_variant HIGH HDHD3 510680 3 c.449_450dupTG p.Arg151fs A cript 14896.1 coding missense_variant&sp trans NM_0010 protein_ 1 p.Met1255Va 8 104233069 T C 1/1 MODERATE TNC 540664 c.3763A>G lice_region_variant cript 78026.2 coding 3 l missense_variant&sp trans NM_0012 protein_ 5 9 14682182 C T 0/1 MODERATE COL12A1 359712 c.8269G>A p.Ala2757Thr lice_region_variant cript 06497.1 coding 3 trans XM_0052 protein_ 3 9 40655013 A T 0/1 missense_variant MODERATE AK9 504511 c.4018A>T p.Ile1340Leu cript 10829.3 coding 3 trans 
NM_0010 protein_ 9 40703663 G A 0/1 missense_variant MODERATE MICAL1 508306 8 c.838G>A p.Asp280Asn cript 81582.1 coding trans XM_0052 protein_ 1 9 59717757 G A 0/1 missense_variant MODERATE MAP3K7 529146 c.1238G>A p.Arg413His cript 10872.4 coding 2 trans XM_0026 protein_ 5 9 67664740 G A 0/1 missense_variant MODERATE LAMA2 100138434 c.8182G>A p.Ala2728Thr cript 90220.5 coding 8 trans XM_0026 protein_ 6 9 67677291 C T 0/1 missense_variant MODERATE LAMA2 100138434 c.8408C>T p.Ala2803Val cript 90220.5 coding 0 trans NM_0012 protein_ 9 70511774 C T 0/1 missense_variant MODERATE TAAR6 783342 1 c.509C>T p.Thr170Met cript 06565.1 coding TG disruptive_inframe_d LOC100336 trans XM_0026 protein_ 9 70562211 T 0/1 MODERATE 100336507 1 c.27_29delAGC p.Ala10del CA eletion 507 cript 90250.1 coding trans XM_0108 protein_ 9 70580528 C T 0/1 missense_variant MODERATE LOC613867 613867 1 c.172C>T p.Leu58Phe cript 08587.2 coding trans XM_0026 protein_ 9 70661977 A C 0/1 missense_variant MODERATE LOC516101 516101 1 c.869T>G p.Ile290Ser cript 90246.3 coding trans XM_0026 protein_ 9 70694653 C T 0/1 missense_variant MODERATE LOC782807 782807 1 c.770C>T p.Ala257Val cript 90248.5 coding trans XM_0154 protein_ 9 70870595 T G 0/1 missense_variant MODERATE VNN3 785817 1 c.53A>C p.Gln18Pro cript 72861.2 coding trans NM_0011 protein_ 9 75752221 G A 0/1 missense_variant MODERATE TNFAIP3 508105 6 c.1336G>A p.Ala446Thr cript 92170.1 coding trans NM_0011 protein_ 9 75752692 A G 0/1 missense_variant MODERATE TNFAIP3 508105 6 c.1807A>G p.Met603Val cript 92170.1 coding missense_variant&sp trans XM_0026 protein_ 9 76049900 C T 0/1 MODERATE ARFGEF3 522232 6 c.542C>T p.Thr181Met lice_region_variant cript 90274.5 coding

94 trans XM_0249 protein_ 1 9 95548780 C A 0/1 missense_variant MODERATE FNDC1 100850256 c.2000C>A p.Ser667Tyr cript 97129.1 coding 1 trans NM_0011 protein_ 9 96022749 C T 0/1 stop_gained HIGH WTAP 532996 8 c.1174C>T p.Gln392* cript 13254.2 coding trans XM_0026 protein_ 2 9 96771854 G A 0/1 missense_variant MODERATE MAP3K4 511779 c.4336G>A p.Val1446Ile cript 90377.6 coding 3 trans XM_0249 protein_ 1 9 96960266 C G 0/1 missense_variant MODERATE PRKN 530858 c.1429G>C p.Asp477His cript 96689.1 coding 2 1 trans XM_0035 protein_ 1 14212561 C A 0/1 missense_variant MODERATE IQCH 519277 c.884C>A p.Thr295Lys 0 cript 86516.5 coding 0 1 trans NM_0010 protein_ 17680751 C T 0/1 stop_gained HIGH LARP6 787569 3 c.594G>A p.Trp198* 0 cript 99205.1 coding 1 LOC101907 trans 10190710 protein_ 23499255 A G 1/1 missense_variant MODERATE 101907109 2 c.157T>C p.Tyr53His 0 109 cript 9 coding 1 trans XM_0154 protein_ 1 28991974 T G 1/1 missense_variant MODERATE RYR3 539899 c.1339A>C p.Ile447Leu 0 cript 73104.2 coding 3 1 trans NM_0012 protein_ 3 55701771 A C 1/1 missense_variant MODERATE UNC13C 100337128 c.5894T>G p.Ile1965Ser 0 cript 06460.1 coding 0 1 trans XM_0026 protein_ 85018229 G A 1/1 missense_variant MODERATE ACOT2 785383 1 c.169G>A p.Ala57Thr 0 cript 90999.6 coding AC TG GG GT CC 1 conservative_inframe trans XM_0026 protein_ c.785_805dupCCAGCCCCCAGGGGACC p.Ala262_Pro 7432911 A CC 0/1 MODERATE MFSD9 514165 6 1 _insertion cript 91186.6 coding CCAG 268dup TG GG GG CT GG 1 trans NM_0011 protein_ 9242876 C T 0/1 missense_variant MODERATE TGFBRAP1 514660 2 c.115G>A p.Val39Met 1 cript 02008.1 coding 1 missense_variant&sp trans NM_0010 protein_ 10505255 G A 0/1 MODERATE BOLA3 614629 1 c.53G>A p.Arg18Gln 1 lice_region_variant cript 35452.2 coding 1 trans NM_0011 protein_ 10877788 G A 0/1 missense_variant MODERATE TPRKB 519224 3 c.172G>A p.Ala58Thr 1 cript 92507.1 coding 1 trans XM_0026 protein_ 13573437 C T 0/1 missense_variant MODERATE CLEC4F 511001 2 c.124C>T p.Arg42Cys 1 cript 91231.5 
coding 1 trans NM_0011 protein_ 68599352 A C 1/1 missense_variant MODERATE PCYOX1 100125835 6 c.1182A>C p.Glu394Asp 1 cript 05474.2 coding 1 trans NM_0010 protein_ 86988292 G A 0/1 missense_variant MODERATE NOL10 516314 9 c.623G>A p.Arg208Gln 1 cript 75740.2 coding 1 trans XM_0026 protein_ 93380897 C G 0/1 missense_variant MODERATE LOC509073 509073 1 c.393C>G p.Phe131Leu 1 cript 91549.1 coding 1 trans XM_0026 protein_ 93456641 C G 0/1 missense_variant MODERATE LOC615170 615170 1 c.230G>C p.Ser77Thr 1 cript 91553.4 coding 1 trans NM_0012 protein_ 2 101374362 C T 0/1 missense_variant MODERATE NUP214 784219 c.3479C>T p.Ser1160Leu 1 cript 05443.1 coding 6 1 missense_variant&sp trans NM_0012 protein_ 1 101711521 C T 0/1 MODERATE RAPGEF1 520454 c.1999G>A p.Ala667Thr 1 lice_region_variant cript 05872.1 coding 3 1 trans NM_0012 protein_ 102910573 G A 0/1 missense_variant MODERATE AK8 615606 1 c.25C>T p.Arg9Cys 1 cript 06250.1 coding 1 trans XM_0249 protein_ 5 105396965 C T 0/1 missense_variant MODERATE COL5A1 100848491 c.4283C>T p.Pro1428Leu 1 cript 99726.1 coding 5 1 trans NM_0010 protein_ 11442329 T C 1/1 missense_variant MODERATE MTRF1 504754 6 c.742A>G p.Asn248Asp 2 cript 35013.2 coding 1 trans XM_0250 protein_ 4 p.Asp1225Hi 85145851 G C 0/1 missense_variant MODERATE COL4A2 508632 c.3673G>C 2 cript 00171.1 coding 0 s

95 1 trans NM_0011 protein_ 1 22144924 C G 1/1 missense_variant MODERATE NEBL 517987 c.1681G>C p.Asp561His 3 cript 91413.2 coding 7 1 trans NM_0011 protein_ 5 31346998 C T 0/1 stop_gained HIGH CUBN 523202 c.7793G>A p.Trp2598* 3 cript 92575.1 coding 0 1 trans NM_0011 protein_ 34435277 C T 0/1 missense_variant MODERATE ZNF438 510512 7 c.179C>T p.Ala60Val 3 cript 43772.2 coding CA AA 1 AA trans NM_0012 protein_ 50953972 C 0/1 frameshift_variant HIGH SMOX 527211 5 c.1307_1308insTTTTTTTTTT p.Met436fs 3 AA cript 05439.1 coding AA A 1 trans NM_0013 protein_ 2 69989679 C T 0/1 missense_variant MODERATE LPIN3 521637 c.2344C>T p.Arg782Cys 3 cript 54379.1 coding 0 1 trans XM_0250 protein_ 3 76852188 C T 1/1 missense_variant MODERATE PREX1 527410 c.4079G>A p.Arg1360His 3 cript 01303.1 coding 2 1 trans NM_1744 protein_ 77623630 A G 1/1 missense_variant MODERATE PTGIS 282021 3 c.247T>C p.Tyr83His 3 cript 44.1 coding 1 trans XM_0035 protein_ 945049 T C 0/1 missense_variant MODERATE EPPK1 100337278 2 c.4361T>C p.Val1454Ala 4 cript 86872.5 coding 1 trans NM_0010 protein_ 1 1041490 G A 0/1 missense_variant MODERATE MAPK15 512125 c.1274C>T p.Ala425Val 4 cript 46110.1 coding 2 1 missense_variant&sp trans NM_0010 protein_ 1042621 G A 0/1 MODERATE MAPK15 512125 8 c.781C>T p.Arg261Trp 4 lice_region_variant cript 46110.1 coding 1 LOC101905 trans XM_0250 protein_ 1 1778359 C T 0/1 missense_variant MODERATE 101905222 c.1721C>T p.Thr574Met 4 222 cript 01993.1 coding 7 1 trans NM_0011 protein_ 1847538 C T 0/1 missense_variant MODERATE ADGRB1 524070 9 c.1904G>A p.Arg635Gln 4 cript 91541.1 coding 1 trans XM_0026 protein_ 1 2747243 G A 0/1 missense_variant MODERATE DENND3 508078 c.2189C>T p.Thr730Met 4 cript 92572.5 coding 3 1 trans XM_0250 protein_ 6 54911806 C T 1/1 missense_variant MODERATE PKHD1L1 100336824 c.10835G>A p.Ser3612Asn 4 cript 01865.1 coding 8 1 trans NM_0011 protein_ 57828326 G A 0/1 missense_variant MODERATE ABRA 539379 2 c.853G>A p.Glu285Lys 4 cript 92234.1 coding 1 trans 
NM_0012 protein_ 70507818 G A 0/1 missense_variant MODERATE TMEM67 506762 5 c.530C>T p.Ala177Val 4 cript 05299.1 coding 1 trans NM_0012 protein_ 5780735 G T 0/1 missense_variant MODERATE MMP3 281309 4 c.516G>T p.Leu172Phe 5 cript 06637.1 coding 1 trans NM_0012 protein_ 5780736 C T 0/1 missense_variant MODERATE MMP3 281309 4 c.517C>T p.Pro173Ser 5 cript 06637.1 coding 1 trans NM_0012 protein_ 5780737 C T 0/1 missense_variant MODERATE MMP3 281309 4 c.518C>T p.Pro173Leu 5 cript 06637.1 coding 1 trans NM_0012 protein_ 5780742 G T 0/1 missense_variant MODERATE MMP3 281309 4 c.523G>T p.Asp175Tyr 5 cript 06637.1 coding 1 trans NM_0012 protein_ 5780743 A T 0/1 missense_variant MODERATE MMP3 281309 4 c.524A>T p.Asp175Val 5 cript 06637.1 coding 1 trans NM_0012 protein_ 5780745 G T 0/1 stop_gained HIGH MMP3 281309 4 c.526G>T p.Gly176* 5 cript 06637.1 coding 1 trans NM_0010 protein_ 27486710 C G 0/1 missense_variant MODERATE APOC3 408009 2 c.22C>G p.Leu8Val 5 cript 01175.2 coding 1 trans NM_0012 protein_ 29498633 G A 0/1 missense_variant MODERATE BCL9L 539340 6 c.2129C>T p.Ala710Val 5 cript 05656.2 coding 1 trans XM_0108 protein_ 1 34999824 A G 0/1 missense_variant MODERATE USH1C 530709 c.2119A>G p.Asn707Asp 5 cript 12478.3 coding 9 1 LOC100337 trans XM_0026 protein_ 45914938 C T 0/1 missense_variant MODERATE 100337392 1 c.785C>T p.Pro262Leu 5 392 cript 85934.4 coding 1 trans XM_0154 protein_ 46360225 C G 0/1 missense_variant MODERATE LOC783920 783920 8 c.4991C>G p.Thr1664Ser 5 cript 74717.2 coding

96 1 trans NM_0011 protein_ 6 p.Asn4319As 18966208 T C 1/1 missense_variant MODERATE USH2A 100296635 c.12955A>G 6 cript 91425.1 coding 2 p 1 CC conservative_inframe trans XM_0026 protein_ 1 p.Thr882_Gly 59860922 C 0/1 MODERATE RASAL2 540692 c.2646_2647insTTC 6 TT _insertion cript 94157.6 coding 4 883insPhe 1 trans NM_0011 protein_ 61976963 C G 0/1 missense_variant MODERATE XPR1 536550 9 c.994C>G p.Leu332Val 6 cript 92883.1 coding 1 trans NM_0011 protein_ 7 67155752 C T 0/1 missense_variant MODERATE HMCN1 521326 c.11102C>T p.Thr3701Ile 6 cript 92537.2 coding 2 1 trans XM_0026 protein_ 75926669 TC T 0/1 frameshift_variant HIGH LOC519737 519737 4 c.536delC p.Pro179fs 6 cript 94268.5 coding 1 trans XM_0052 protein_ 77604599 C T 0/1 missense_variant MODERATE PTPRC 407152 5 c.302C>T p.Thr101Ile 6 cript 17330.4 coding 1 trans XM_0026 protein_ 35284406 T C 1/1 missense_variant MODERATE ADAD1 537320 7 c.776A>G p.Gln259Arg 7 cript 94395.5 coding 1 trans XM_0026 protein_ 6 35362825 A G 1/1 missense_variant MODERATE KIAA1109 100297914 c.10550T>C p.Val3517Ala 7 cript 94408.4 coding 2 1 trans XM_0026 protein_ 3 35410654 G A 1/1 missense_variant MODERATE KIAA1109 100297914 c.6062C>T p.Ala2021Val 7 cript 94408.4 coding 8 1 trans XM_0249 protein_ 48417746 A G 0/1 missense_variant MODERATE TMEM132C 512602 9 c.3251T>C p.Val1084Ala 7 cript 77666.1 coding 1 trans XM_0249 protein_ p.Asp1059As 48417822 C T 0/1 missense_variant MODERATE TMEM132C 512602 9 c.3175G>A 7 cript 77666.1 coding n 1 trans NM_0012 protein_ 1 50863072 A G 0/1 missense_variant MODERATE DHX37 529207 c.2509A>G p.Met837Val 7 cript 05961.1 coding 9 1 trans NM_0011 protein_ 3 p.Met1117Va 52884253 T C 0/1 missense_variant MODERATE KNTC1 506353 c.3349A>G 7 cript 92091.2 coding 4 l 1 trans XM_0249 protein_ 1 53483661 G A 0/1 missense_variant MODERATE SETD1B 514102 c.4564C>T p.Pro1522Ser 7 cript 77702.1 coding 2 1 trans NM_0011 protein_ 1 64404088 G A 1/1 missense_variant MODERATE SART3 505922 c.2710G>A p.Gly904Arg 7 cript 
01860.1 coding 8 1 trans NM_0011 protein_ 64580311 G A 1/1 missense_variant MODERATE CMKLR1 615411 1 c.850G>A p.Val284Met 7 cript 45235.1 coding 1 trans NM_0011 protein_ 72193930 A T 0/1 missense_variant MODERATE HIC2 539541 2 c.1067A>T p.Lys356Met 7 cript 92270.1 coding GT GG GC GG GG 1 splice_donor_variant trans NM_0010 protein_ c.720+1_720+2insTGGGCGGGGCGCGAA 72320021 G CG 0/1 HIGH AIFM3 526295 7 7 &intron_variant cript 46281.2 coding TGGGGC CG AA TG GG GC 1 trans XM_0249 protein_ 2 16594252 G A 0/1 missense_variant MODERATE ABCC11 101909228 c.3719C>T p.Thr1240Ile 8 cript 78937.1 coding 7 1 trans NM_0012 protein_ 21649569 T G 0/1 missense_variant MODERATE CHD9 539275 2 c.1441T>G p.Cys481Gly 8 cript 05650.1 coding 1 trans XM_0026 protein_ 35375857 G A 0/1 missense_variant MODERATE EDC4 513171 9 c.1040G>A p.Arg347Gln 8 cript 94902.6 coding 1 trans NM_0012 protein_ 39241943 C T 0/1 missense_variant MODERATE ATXN1L 521025 3 c.674G>A p.Arg225Gln 8 cript 05881.1 coding AG GC 1 CT trans NM_0010 protein_ 45981074 A 0/1 frameshift_variant HIGH LSR 508651 8 c.1572_1581delGGCCTTGAGG p.Ala525fs 8 TG cript 83394.1 coding AG G

97 GA AA 1 AA trans NM_0010 protein_ 45981084 G 0/1 frameshift_variant HIGH LSR 508651 8 c.1586_1587insAAAAAAAAAA p.Gly530fs 8 AA cript 83394.1 coding AA A 1 trans NM_0010 protein_ 46459144 A G 0/1 missense_variant MODERATE U2AF1L4 615198 1 c.14T>C p.Leu5Ser 8 cript 34778.2 coding 1 LOC112441 trans XM_0249 protein_ 51426769 A AC 0/1 frameshift_variant HIGH 112441502 3 c.285dupG p.Ser96fs 8 502 cript 79122.1 coding 1 trans XM_0249 protein_ 54605524 G A 0/1 missense_variant MODERATE BICRA 531868 7 c.274G>A p.Gly92Ser 8 cript 79183.1 coding 1 missense_variant&sp trans NM_0010 protein_ 54649039 G T 0/1 MODERATE NOP53 506401 2 c.226G>T p.Gly76Cys 8 lice_region_variant cript 38507.2 coding 1 trans XM_0249 protein_ 1 p.Val1491Me 55157838 C T 0/1 missense_variant MODERATE LMTK3 100140306 c.4471G>A 8 cript 79226.1 coding 7 t 1 trans NM_0011 protein_ 55353537 G A 0/1 missense_variant MODERATE MAMSTR 505540 9 c.1040C>T p.Pro347Leu 8 cript 95011.1 coding 1 LOC101904 trans XM_0052 protein_ 60123360 C A 0/1 missense_variant MODERATE 101904879 3 c.310G>T p.Gly104Cys 8 879 cript 19626.4 coding 1 trans XM_0249 protein_ 60940752 A C 0/1 missense_variant MODERATE LOC506868 506868 3 c.599A>C p.Asn200Thr 8 cript 79357.1 coding 1 trans XM_0249 protein_ 63457429 G A 0/1 missense_variant MODERATE NLRP8 506161 3 c.787G>A p.Ala263Thr 8 cript 79487.1 coding 1 CT trans XM_0108 protein_ 64640549 C 0/1 frameshift_variant HIGH LOC528802 528802 4 c.1130_1131delTT p.Phe377fs 8 T cript 15628.3 coding 1 trans XM_0052 protein_ 7953640 C A 1/1 missense_variant MODERATE AKAP1 532072 2 c.1229C>A p.Pro410His 9 cript 19948.4 coding 1 trans NM_0010 protein_ 18692677 T G 1/1 missense_variant MODERATE OMG 407186 2 c.901T>G p.Ser301Ala 9 cript 77524.2 coding 1 trans NM_0010 protein_ 18692689 A G 1/1 missense_variant MODERATE OMG 407186 2 c.913A>G p.Lys305Glu 9 cript 77524.2 coding 1 trans NM_0010 protein_ 18692717 A C 1/1 missense_variant MODERATE OMG 407186 2 c.941A>C p.Glu314Ala 9 cript 77524.2 coding 1 
trans NM_0010 protein_ 18692951 A G 1/1 missense_variant MODERATE OMG 407186 2 c.1175A>G p.Asp392Gly 9 cript 77524.2 coding 1 trans XM_0052 protein_ 21674978 G T 0/1 missense_variant MODERATE ABR 515556 1 c.387G>T p.Lys129Asn 9 cript 20104.4 coding 1 trans XM_0026 protein_ 23948205 C T 0/1 missense_variant MODERATE LOC522582 522582 1 c.55G>A p.Gly19Ser 9 cript 95725.1 coding 1 trans XM_0026 protein_ 24148887 A G 0/1 missense_variant MODERATE LOC515540 515540 1 c.379T>C p.Cys127Arg 9 cript 95696.3 coding 1 trans XM_0249 protein_ 2 24556970 C G 0/1 missense_variant MODERATE ATP2A3 512313 c.2818G>C p.Gly940Arg 9 cript 79978.1 coding 0 1 trans NM_0010 protein_ 25152336 C T 0/1 missense_variant MODERATE XAF1 509740 7 c.616G>A p.Ala206Thr 9 cript 35075.1 coding 1 trans XM_0052 protein_ 1 25287168 G A 0/1 missense_variant MODERATE KIAA0753 512933 c.2638G>A p.Glu880Lys 9 cript 20266.4 coding 7 1 trans XM_0249 protein_ 25412616 T C 0/1 missense_variant MODERATE PIMREG 540455 5 c.853A>G p.Thr285Ala 9 cript 80359.1 coding 1 trans XM_0249 protein_ 25412750 A G 0/1 missense_variant MODERATE PIMREG 540455 5 c.719T>C p.Ile240Thr 9 cript 80359.1 coding 1 trans NM_1742 protein_ 25423482 C T 0/1 missense_variant MODERATE AIPL1 281609 2 c.272C>T p.Thr91Ile 9 cript 34.1 coding 1 trans NM_0011 protein_ 2 27025101 T C 0/1 missense_variant MODERATE NEURL4 528485 c.4262A>G p.Asn1421Ser 9 cript 92661.2 coding 7 1 trans NM_0010 protein_ 27053371 C T 0/1 missense_variant MODERATE KCTD11 539167 1 c.52C>T p.Pro18Ser 9 cript 35421.2 coding

98 1 trans XM_0249 protein_ 3 27239178 T C 0/1 missense_variant MODERATE POLR2A 282312 c.5473T>C p.Tyr1825His 9 cript 79629.1 coding 0 1 trans NM_0012 protein_ 27284302 C A 0/1 missense_variant MODERATE TNFSF12 100551513 3 c.235C>A p.Gln79Lys 9 cript 05143.1 coding 1 trans NM_0010 protein_ 27318060 C T 0/1 missense_variant MODERATE MPDU1 504961 5 c.446C>T p.Thr149Met 9 cript 75179.1 coding 1 trans NM_0011 protein_ 1 27441413 C T 0/1 missense_variant MODERATE DNAH2 789044 c.1562C>T p.Ala521Val 9 cript 91249.2 coding 0 1 trans NM_0011 protein_ 5 27491665 A G 0/1 missense_variant MODERATE DNAH2 789044 c.8522A>G p.Asn2841Ser 9 cript 91249.2 coding 4 1 trans NM_0011 protein_ 6 p.Arg3205Gl 27497820 G A 1/1 missense_variant MODERATE DNAH2 789044 c.9614G>A 9 cript 91249.2 coding 2 n 1 trans NM_0012 protein_ 2 p.Val1194Le 27779741 C A 0/1 missense_variant MODERATE PER1 516318 c.3580G>T 9 cript 89772.1 coding 1 u 1 trans NM_0012 protein_ 2 27894044 C T 0/1 missense_variant MODERATE PFAS 520318 c.2978C>T p.Ser993Leu 9 cript 56564.1 coding 3 1 trans NM_0012 protein_ 27926176 T A 0/1 missense_variant MODERATE ARHGEF15 512021 1 c.254T>A p.Leu85His 9 cript 05712.1 coding 1 trans NM_0010 protein_ 1 34215383 C A 0/1 missense_variant MODERATE SLC5A10 407227 c.1594G>T p.Gly532Trp 9 cript 01442.1 coding 3 1 trans NM_0010 protein_ 35909994 C T 0/1 missense_variant MODERATE WFIKKN2 531979 2 c.701G>A p.Arg234Gln 9 cript 76885.1 coding 1 trans XM_0249 protein_ 41366611 C T 0/1 missense_variant MODERATE LOC788228 788228 1 c.122G>A p.Arg41His 9 cript 81151.1 coding 1 trans XM_0249 protein_ 45116070 G A 0/1 missense_variant MODERATE PLEKHM1 523424 6 c.1127C>T p.Pro376Leu 9 cript 80934.1 coding 1 trans NM_0010 protein_ 2 50788329 G A 0/1 missense_variant MODERATE FASN 281152 c.4003G>A p.Ala1335Thr 9 cript 12669.1 coding 5 1 trans NM_0010 protein_ 1 56472565 C A 0/1 missense_variant MODERATE HID1 540436 c.1777C>A p.Gln593Lys 9 cript 76924.1 coding 4 1 trans NM_0011 protein_ 56501337 G T 0/1 
missense_variant MODERATE USH1G 531104 1 c.79G>T p.Ala27Ser 9 cript 92702.2 coding 1 trans NM_1746 protein_ 2 56553694 G A 0/1 missense_variant MODERATE FDXR 282604 c.1345G>A p.Val449Met 9 cript 91.1 coding 2 2 trans NM_0011 protein_ 9155712 C T 0/1 missense_variant MODERATE ZNF366 539632 1 c.337C>T p.Pro113Ser 0 cript 92285.1 coding 2 LOC104975 trans XM_0154 protein_ 23005328 T C 1/1 missense_variant MODERATE 104975351 5 c.191A>G p.Glu64Gly 1 351 cript 59215.2 coding 2 trans NM_0011 protein_ 23085215 G T 1/1 missense_variant MODERATE WHAMM 510810 2 c.654G>T p.Met218Ile 1 cript 91456.1 coding 2 trans NM_0011 protein_ 23085238 A C 1/1 missense_variant MODERATE WHAMM 510810 2 c.677A>C p.Glu226Ala 1 cript 91456.1 coding 2 trans NM_0011 protein_ 1 23099105 C T 1/1 missense_variant MODERATE WHAMM 510810 c.2290C>T p.Pro764Ser 1 cript 91456.1 coding 3 2 trans NM_0010 protein_ 23114439 T C 1/1 missense_variant MODERATE HOMER2 510811 9 c.917A>G p.Tyr306Cys 1 cript 75467.2 coding 2 trans NM_0010 protein_ 23245557 G A 0/1 missense_variant MODERATE RAMMET 614296 3 c.145G>A p.Gly49Ser 1 cript 34757.1 coding 2 trans NM_0010 protein_ 48395090 C T 1/1 missense_variant MODERATE CLEC14A 509367 1 c.830G>A p.Gly277Asp 1 cript 77890.1 coding 2 trans NM_0010 protein_ 58656754 G A 1/1 missense_variant MODERATE OTUB2 504880 6 c.694G>A p.Asp232Asn 1 cript 15517.1 coding 2 trans NM_0012 protein_ 67572776 G A 0/1 missense_variant MODERATE TRAF3 506182 1 c.31G>A p.Gly11Ser 1 cript 05586.1 coding CC 2 GG conservative_inframe trans NM_0012 protein_ c.43_69delCGCCCCCAGGAGCCGGCGG p.Arg15_His2 67763739 C 0/1 MODERATE EXOC3L4 526547 1 1 AG _deletion cript 05930.2 coding AGCCGCAC 3del CC


GC AC CG CC CC CA GG AG CC GG 2 trans XM_0249 protein_ 2 69190842 G A 0/1 missense_variant MODERATE INF2 538493 c.3464G>A p.Arg1155His 1 cript 82196.1 coding 1 2 trans XM_0249 protein_ 69415237 T C 0/1 missense_variant MODERATE CDCA4 527837 4 c.427A>G p.Thr143Ala 1 cript 82246.1 coding 2 trans NM_0011 protein_ 32422155 A G 0/1 missense_variant MODERATE LMOD3 509978 3 c.1270A>G p.Ile424Val 2 cript 00332.1 coding 2 trans NM_0012 protein_ 36787704 A G 1/1 missense_variant MODERATE ADAMTS9 537051 4 c.739A>G p.Arg247Gly 2 cript 06573.1 coding 2 trans XM_0026 protein_ 1 42508897 C G 1/1 missense_variant MODERATE C22H3orf67 534800 c.1240C>G p.Gln414Glu 2 cript 97057.6 coding 1 2 trans NM_0010 protein_ 1 42925496 G A 1/1 missense_variant MODERATE PXK 614093 c.1568C>T p.Pro523Leu 2 cript 99132.1 coding 8 2 trans XM_0026 protein_ 53354248 A C 1/1 missense_variant MODERATE FYCO1 100139246 8 c.2030A>C p.Glu677Ala 2 cript 97098.6 coding 2 trans NM_0012 protein_ 5531171 C A 0/1 missense_variant MODERATE FAM83B 540231 2 c.238G>T p.Asp80Tyr 3 cript 05795.1 coding 2 trans NM_0010 protein_ 10772318 C T 0/1 missense_variant MODERATE PPIL1 508179 4 c.340G>A p.Val114Met 3 cript 14869.1 coding 2 CG trans XM_0249 protein_ 17786294 C 0/1 frameshift_variant HIGH TCTE1 523600 4 c.1070_1071delAC p.His357fs 3 T cript 83876.1 coding 2 trans XM_0026 protein_ 4 24325254 A G 1/1 missense_variant MODERATE PKHD1 537895 c.6875T>C p.Val2292Ala 3 cript 97311.5 coding 5 2 trans NM_0010 protein_ 30410655 G A 1/1 missense_variant MODERATE PGBD1 539564 2 c.110C>T p.Pro37Leu 3 cript 76174.1 coding 2 trans NM_0010 protein_ 30410722 C T 1/1 missense_variant MODERATE PGBD1 539564 2 c.43G>A p.Gly15Ser 3 cript 76174.1 coding 2 trans NM_0011 protein_ 30417376 C T 1/1 missense_variant MODERATE ZSCAN26 512445 5 c.1178G>A p.Arg393Lys 3 cript 01957.1 coding 2 trans NM_0011 protein_ 30421523 C T 1/1 missense_variant MODERATE ZSCAN26 512445 4 c.491G>A p.Arg164Gln 3 cript 01957.1 coding 2 trans NM_0011 protein_ 
30422277 G T 1/1 missense_variant MODERATE ZSCAN26 512445 3 c.46C>A p.Leu16Met 3 cript 01957.1 coding CC 2 TG conservative_inframe trans NM_0010 protein_ p.Cys12_Gly 51168351 C 0/1 MODERATE GMDS 617688 1 c.34_39delTGCGGC 3 CG _deletion cript 80331.1 coding 13del G 2 trans XM_0249 protein_ 25284346 A C 1/1 missense_variant MODERATE TRAPPC8 538805 7 c.974A>C p.His325Pro 4 cript 84432.1 coding 2 trans NM_0011 protein_ 26047264 A T 1/1 missense_variant MODERATE DSC2 281128 8 c.953A>T p.Lys318Ile 4 cript 66526.1 coding 2 trans XM_0154 protein_ 3 p.Arg1474Le 32718999 C A 1/1 missense_variant MODERATE LAMA3 100336873 c.4421G>T 4 cript 60080.2 coding 3 u 2 trans XM_0249 protein_ 3 p.Arg1252Th 32826040 C G 1/1 missense_variant MODERATE LAMA3 100336873 c.3755G>C 4 cript 84437.1 coding 1 r 2 trans XM_0026 protein_ 1 33306989 C T 1/1 missense_variant MODERATE CABLES1 100138286 c.1889G>A p.Ser630Asn 4 cript 97726.4 coding 0 2 trans NM_1741 protein_ 43566930 G A 1/1 missense_variant MODERATE MC2R 281299 2 c.431C>T p.Pro144Leu 4 cript 09.2 coding

100 2 trans XM_0249 protein_ 46943260 C T 1/1 missense_variant MODERATE SKOR2 531776 1 c.1270G>A p.Ala424Thr 4 cript 84563.1 coding 2 trans NM_0012 protein_ 1 16503827 A G 0/1 missense_variant MODERATE SMG1 525143 c.1337T>C p.Phe446Ser 5 cript 05915.1 coding 1 2 trans NM_1742 protein_ 18019069 G A 1/1 missense_variant MODERATE UMOD 281567 6 c.1277C>T p.Ala426Val 5 cript 13.2 coding 2 trans NM_0010 protein_ 22546070 G A 0/1 missense_variant MODERATE SLC5A11 539084 7 c.496G>A p.Ala166Thr 5 cript 34660.2 coding 2 trans NM_0012 protein_ 27102573 G A 0/1 missense_variant MODERATE SETD1A 782887 7 c.2083G>A p.Gly695Ser 5 cript 05433.2 coding 2 missense_variant&sp trans NM_0010 protein_ 34218856 C A 0/1 MODERATE TMEM120A 520173 1 c.81G>T p.Gln27His 5 lice_region_variant cript 79600.1 coding 2 trans NM_0010 protein_ 1 38300641 G A 0/1 missense_variant MODERATE DAGLB 538021 c.1739C>T p.Ser580Leu 5 cript 83487.1 coding 4 2 trans NM_0011 protein_ 41744697 G A 0/1 missense_variant MODERATE C25H7orf50 522840 5 c.421G>A p.Gly141Arg 5 cript 10078.2 coding 2 trans XM_0108 protein_ 41908703 C A 0/1 missense_variant MODERATE DNAAF5 100300875 1 c.174G>T p.Glu58Asp 5 cript 19700.3 coding 2 trans NM_0010 protein_ 36750492 G C 0/1 missense_variant MODERATE CCDC172 522573 3 c.98G>C p.Arg33Thr 6 cript 75833.2 coding 2 trans NM_0010 protein_ 36782657 T G 0/1 missense_variant MODERATE CCDC172 522573 5 c.437T>G p.Ile146Ser 6 cript 75833.2 coding 2 trans NM_0010 protein_ 36806277 C G 0/1 missense_variant MODERATE CCDC172 522573 8 c.712C>G p.Leu238Val 6 cript 75833.2 coding 2 trans XM_0154 protein_ 36992507 T C 0/1 missense_variant MODERATE LOC616241 616241 3 c.290T>C p.Ile97Thr 6 cript 60748.2 coding 2 trans XM_0154 protein_ 36997219 G A 0/1 missense_variant MODERATE LOC616241 616241 7 c.626G>A p.Gly209Asp 6 cript 60748.2 coding 2 trans XM_0154 protein_ 36997251 C A 0/1 missense_variant MODERATE LOC616241 616241 7 c.658C>A p.His220Asn 6 cript 60748.2 coding 2 trans XM_0249 protein_ 37020184 C 
G 0/1 missense_variant MODERATE PNLIPRP2 510772 2 c.59C>G p.Ala20Gly 6 cript 85570.1 coding 2 trans XM_0249 protein_ 37020189 G T 0/1 missense_variant MODERATE PNLIPRP2 510772 2 c.64G>T p.Gly22Trp 6 cript 85570.1 coding 2 trans XM_0249 protein_ 37020190 G A 0/1 missense_variant MODERATE PNLIPRP2 510772 2 c.65G>A p.Gly22Glu 6 cript 85570.1 coding 2 trans NM_0011 protein_ 1 37035141 A C 0/1 missense_variant MODERATE PNLIPRP2 510772 c.973A>C p.Lys325Gln 6 cript 05355.2 coding 0 2 C26H10orf8 trans NM_0011 protein_ 37058780 C T 0/1 missense_variant MODERATE 616318 6 c.569G>A p.Arg190Gln 6 2 cript 01253.2 coding 2 trans NM_0012 protein_ 37249761 A C 0/1 missense_variant MODERATE ENO4 767880 4 c.617A>C p.Lys206Thr 6 cript 05373.1 coding 2 trans NM_0012 protein_ 37249770 A G 0/1 missense_variant MODERATE ENO4 767880 4 c.626A>G p.Lys209Arg 6 cript 05373.1 coding 2 trans NM_0010 protein_ 37831197 G A 0/1 missense_variant MODERATE EMX2 523601 1 c.331G>A p.Ala111Thr 6 cript 75845.1 coding 2 trans NM_0012 protein_ 34131748 A G 0/1 missense_variant MODERATE HTRA4 514946 4 c.949A>G p.Met317Val 7 cript 72014.1 coding 2 CSGALNA trans NM_0011 protein_ 38195706 C T 0/1 missense_variant MODERATE 528057 4 c.758C>T p.Thr253Met 7 CT1 cript 02116.2 coding 2 trans XM_0108 protein_ 9146684 G A 1/1 missense_variant MODERATE EDARADD 518089 6 c.284G>A p.Cys95Tyr 8 cript 20445.3 coding 2 trans NM_0010 protein_ 27419327 G A 1/1 missense_variant MODERATE UNC5B 524942 2 c.193G>A p.Val65Met 8 cript 99029.2 coding 2 trans XM_0249 protein_ 5548061 T C 0/1 missense_variant MODERATE LOC530556 530556 5 c.1672A>G p.Ser558Gly 9 cript 87380.1 coding


2 trans XM_0249 protein_ 5684588 C T 0/1 missense_variant MODERATE TRIM48 101902282 1 c.16C>T p.Pro6Ser 9 cript 87576.1 coding 2 trans XM_0154 protein_ 2 p.Asn1713Ly 17357839 C A 1/1 missense_variant MODERATE TENM4 509726 c.5139C>A 9 cript 61143.2 coding 7 s 2 trans NM_0010 protein_ 41005044 C T 1/1 missense_variant MODERATE UBXN1 506676 8 c.608G>A p.Arg203Gln 9 cript 37588.2 coding 2 missense_variant&sp trans NM_0012 protein_ 49269030 G A 1/1 MODERATE TSPAN32 530507 5 c.455C>T p.Thr152Met 9 lice_region_variant cript 42955.1 coding GG TG GT GG GT CC TG GG GT 2 disruptive_inframe_d trans XM_0249 protein_ 3 c.7062_7097delGACAACCACAGGGACC p.Thr2355_T 50342139 TG G 0/1 MODERATE MUC2 789571 9 eletion cript 87595.1 coding 0 CCAACCCCAGGACCCACCAC hr2366del GG GT CC CT GT GG TT GT C C1GALT1C trans NM_0010 protein_ X 5173038 C T 0/1 missense_variant MODERATE 531644 2 c.203G>A p.Arg68Gln 1 cript 34575.1 coding trans XM_0026 protein_ X 15488899 G A 0/1 missense_variant MODERATE LOC528106 528106 1 c.407G>A p.Ser136Asn cript 99546.2 coding trans NM_0010 protein_ X 34222715 G A 1/1 missense_variant MODERATE CNGA2 407172 7 c.1604G>A p.Arg535Gln cript 01139.2 coding TA trans XM_0052 protein_ X 35254699 T 1/1 frameshift_variant HIGH PNMA5 785413 1 c.95_96delAA p.Lys32fs A cript 27633.2 coding trans XM_0108 protein_ X 36718638 G A 1/1 missense_variant MODERATE LOC518106 518106 7 c.1355G>A p.Arg452Gln cript 21577.3 coding trans XM_0052 protein_ X 42119461 C G 1/1 missense_variant MODERATE NAP1L3 788191 3 c.1279G>C p.Asp427His cript 28608.4 coding trans NM_0012 protein_ X 51253393 C T 0/1 missense_variant MODERATE XKRX 524975 3 c.1288C>T p.Arg430Trp cript 05914.1 coding trans XM_0108 protein_ X 52807274 C T 1/1 missense_variant MODERATE SLC25A53 786529 1 c.92G>A p.Ser31Asn cript 21724.3 coding trans NM_0011 protein_ 1 X 74382271 T C 1/1 missense_variant MODERATE ATP7A 541275 c.2950A>G p.Ile984Val cript 92852.1 coding 4 missense_variant&sp trans NM_0010 protein_ X 87286378 A G 1/1 
MODERATE CCDC22 534246 9 c.1091A>G p.Gln364Arg lice_region_variant cript 76016.2 coding LOC101908 trans XM_0249 protein_ X 137291102 C T 0/1 missense_variant MODERATE 101908350 2 c.167G>A p.Arg56His 350 cript 88433.1 coding

NOTE: Duplicate records for a specific variant position due to multiple transcripts have been removed.

Chapter 4 | Niemann-Pick type C disease in Angus/Angus-cross cattle

4.1 Synopsis

This chapter describes the investigation of Niemann-Pick type C disease in three Angus/Angus-cross calves. The publication in section 4.2 details the multi-faceted approach used to diagnose Niemann-Pick type C, identify the missense causal variant and develop a diagnostic assay.

Section 4.2 showcases that using a combination of SNP genotyping, homozygosity mapping, candidate gene analysis and Sanger sequencing approaches can yield successful results. The variant was validated through SIFT analysis, protein alignment, protein modelling and animal screening using a diagnostic test. The supplementary materials associated with section 4.2 are included in section 4.3. Two videos were associated with the publication and these can be accessed via .

The publication of the work in section 4.2 was included in this thesis under the Creative Commons Attribution 4.0 International License.

4.2 Molecular basis for a new bovine model of Niemann-Pick type C disease

PLOS ONE

RESEARCH ARTICLE Molecular basis for a new bovine model of Niemann-Pick type C disease

Shernae A. Woolley1, Emily R. Tsimnadis1, Cor Lenghaus2, Peter J. Healy3, Keith Walker4, Andrew Morton5, Mehar S. Khatkar1, Annette Elliott4, Ecem Kaya6, Clarisse Hoerner6, David A. Priestman6, Dawn Shepherd6, Frances M. Platt6, Ben T. Porebski7, Cali E. Willet8, Brendon A. O'Rourke4, Imke Tammen1*

1 Faculty of Science, Sydney School of Veterinary Science, The University of Sydney, Camden, NSW, Australia, 2 Ararat, VIC, Australia, 3 Harolds Cross, NSW, Australia, 4 NSW Department of Primary Industries, Elizabeth Macarthur Agricultural Institute, Menangle, NSW, Australia, 5 Wagga Wagga, NSW, Australia, 6 Department of Pharmacology, University of Oxford, Oxford, United Kingdom, 7 Medical Research Council Laboratory of Molecular Biology, Cambridge Biomedical Campus, Cambridge, United Kingdom, 8 The University of Sydney, Sydney Informatics Hub Core Research Facilities, Darlington, NSW, Australia

* [email protected]

Abstract

Niemann-Pick type C disease is a lysosomal storage disease affecting primarily the nervous system that results in premature death. Here we present the first report and investigation of Niemann-Pick type C disease in Australian Angus/Angus-cross calves. After a preliminary diagnosis of Niemann-Pick type C, samples from two affected calves and two obligate carriers were analysed using single nucleotide polymorphism genotyping and homozygosity mapping, and NPC1 was considered as a positional candidate gene. A likely causal missense variant on chromosome 24 in the NPC1 gene (NM_174758.2:c.2969C>G) was identified by Sanger sequencing of cDNA. SIFT analysis, protein alignment and protein modelling predicted the variant to be deleterious to protein function. Segregation of the variant with disease was confirmed in two additional affected calves and two obligate carrier dams. Genotyping of 403 animals from the original herd identified an estimated allele frequency of 3.5%. The Niemann-Pick type C phenotype was additionally confirmed via biochemical analysis of Lysotracker Green, cholesterol, sphingosine and glycosphingolipids in fibroblast cell cultures originating from two affected calves. The identification of a novel missense variant for Niemann-Pick type C disease in Angus/Angus-cross cattle will enable improved breeding and management of this disease in at-risk populations. The results from this study offer a unique opportunity to further the knowledge of human Niemann-Pick type C disease through the potential availability of a bovine model of disease.

OPEN ACCESS

Citation: Woolley SA, Tsimnadis ER, Lenghaus C, Healy PJ, Walker K, Morton A, et al. (2020) Molecular basis for a new bovine model of Niemann-Pick type C disease. PLoS ONE 15(9): e0238697. https://doi.org/10.1371/journal.pone.0238697

Editor: Emanuele Buratti, International Centre for Genetic Engineering and Biotechnology, ITALY

Received: November 17, 2019

Accepted: August 21, 2020

Published: September 24, 2020

Copyright: © 2020 Woolley et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability Statement: All relevant data are within the manuscript and its Supporting Information files. The variant information is available at the European Nucleotide Archive (www.ebi.ac.uk/ena/) deposited under the study accession number PRJEB40043.

Funding: This study was funded by the University of Sydney Faculty of Veterinary Science - Dorothy Minchin Bequest for IT, the University of Sydney Faculty of Veterinary Science honours project support for SAW and ERT and the Australian Government Research Training Program (RTP) Scholarship for SAW. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

PLOS ONE | https://doi.org/10.1371/journal.pone.0238697 September 24, 2020

Introduction

Lysosomal storage diseases are a heterogeneous group of at least 70 disorders that stem from the dysfunctional transport and accumulation of substrates and lipids, resulting in compromised lysosomal function [1, 2]. Lysosomal storage diseases often result in premature death and efficient therapeutic interventions for most of these disorders are still being investigated [3]. Niemann-Pick disease is a group of rare neurodegenerative lysosomal storage diseases with a recessive mode of inheritance that are fatal due to impaired un-esterified cholesterol and sphingomyelin transport and metabolism [4, 5].

Niemann-Pick disease in humans is categorised into types A, B, C, and D, based on the type of lipid deficiency and clinical signs [6–9]. Causal variants for Niemann-Pick types A and B have been identified in the sphingomyelin phosphodiesterase 1 (SMPD1) gene, with variants in this gene resulting in decreased activity of the lysosomal enzyme sphingomyelinase [4, 10]. Causal variants for Niemann-Pick type C (NPC) have been identified within the NPC intracellular cholesterol transporter 1 (NPC1) gene and the NPC intracellular cholesterol transporter 2 (NPC2) gene, whereas the causal variant for the Nova Scotian Niemann-Pick type D disease has been identified only in the NPC1 gene [6, 11]. Causal variants in the NPC1 and NPC2 genes result in the accumulation of un-esterified cholesterol within the late endosomes and lysosomes [12–15]. Over 95% of the human NPC causative variants identified so far occur within NPC1 [4]. The exact role of the NPC1 protein in facilitating un-esterified cholesterol transportation to target organelles and cells has not been established [6, 16], but it is postulated that the NPC1 and NPC2 proteins interact at different stages of un-esterified cholesterol export from the lysosome to other organelles [6, 17].

The age of onset and the presentation of clinical signs of Niemann-Pick disease in humans vary between types, which can make diagnosis difficult [18, 19]. For therapeutic approaches to be of any benefit to NPC affected patients, early diagnosis of NPC is essential and the understanding of the role of the NPC1 and NPC2 proteins in transporting un-esterified cholesterol is imperative to allow for targeted therapeutic approaches [3].

Niemann-Pick disease types have also been diagnosed in a wide range of animal species, including mice (MGI: 2685089), cats (OMIA 001795–9685, OMIA 000725–9685, OMIA 002065–9685), dogs (OMIA 001795–9615, OMIA 000725–9615), a raccoon (OMIA 001795–9654) and cattle (OMIA 001795–9913) [5, 7, 12, 20–35]. The previous report of Niemann-Pick disease in Hereford cattle was identified as Niemann-Pick type A and the underlying genetic cause remains unknown [20].

Here, we present the first report and genetic characterisation of Niemann-Pick type C disease in three Australian Angus/Angus-cross calves. The aims of this study were to confirm the diagnosis of Niemann-Pick type C disease and to identify the causative mutation to improve management of this disease within the affected cattle population.

Materials and methods

Ethics statement

The collection of hair samples for this study was approved by the University of Sydney Animal Ethics Committee (Project No: 2016/998). Samples of affected animals and two dams were collected as part of diagnostic procedures in 2005 before commencement of this study.

Animals

Several cases of a progressive neurological disease were reported in a herd of Australian beef cattle between 2002 and 2005. The herd consisted originally of various beef cattle breeds, but the use of purebred Angus bulls over multiple years resulted in a predominantly Angus/Angus-cross population. The same three Angus bulls were used repetitively for at least 5 years. Onset of disease in affected animals was observed from three months of age and animals died or were euthanised at about seven months of age. Clinical signs included hind limb weakness, dysmetria, incoordination, a wide based stance, walking sideways or falling over and recumbency (S1 Video) followed by death. Head tremors were observed in at least one animal. The condition was described to be exacerbated by stress, and animals were reported to be in good body condition at onset of disease.

In 2005, three affected half-siblings (calf 1, calf 2 and calf 3) were reported to the district veterinarian who commenced an investigation with the suspicion of a genetic condition. Detailed pedigree information was not available for these paternal half-siblings, although the possibility of sire-daughter matings was recorded. Despite the lack of ill thrift, α-mannosidosis was considered as a differential diagnosis. DNA testing of calf 1 for the known Angus and Galloway α-mannosidosis mutations [36, 37] was conducted. All three calves were tested in the Virology Laboratory at the NSW Department of Primary Industries Elizabeth Macarthur Agricultural Institute for antibodies to bovine viral diarrhea virus (BVDV), using an agar gel immunodiffusion assay, and to Akabane virus, using a Simbu virus serogroup competitive enzyme-linked immunosorbent assay, as these viruses are known to cause congenital defects in cattle. All three calves were euthanised for post mortem investigation at around 6–7 months of age.

Histopathology

Tissue samples from calf 1 (brain, spinal cord, heart, kidney, liver, lung and spleen) and calf 2 (brain, spinal cord, pituitary gland, muscle, peripheral nerve, eye, lymph nodes, thymus, thyroid, salivary gland, sections of the alimentary tract, heart, aorta, trachea, lung, pancreas, liver, kidney, spleen, adrenal gland and synovium) were collected at necropsy and fixed in phosphate-buffered formal-saline. Tissue samples from calf 3 (brain, spinal cord, heart, liver, lymph node, small intestine, adrenal gland) were fixed using Karnovsky's fluid. Tissues were embedded in paraffin wax, cut to 5 micron thick sections and stained with haematoxylin and eosin. Brain, liver and lymph node tissues were also stained with Periodic acid–Schiff, Ziehl–Neelsen and Perls' Prussian blue stains. For calf 1, ultrathin sections of the brain were examined.

Fibroblast cell culture

Subcutaneous tissue was collected from calves 2 and 3 at necropsy. The tissue was initially cut into small pieces using sterile scissors and the fragments were washed in five times the tissue volume of cell culture medium (Minimal Essential Medium with Earle's salts; MEM, MP Biomedicals) containing antibiotics (penicillin G and streptomycin sulfate) but without serum. The fragments were then transferred to a 200 mL glass conical flask containing a sterile magnetic stirring bar and 100 mL of 0.5% trypsin solution (BDH®, VWR Analytical Chemicals) in phosphate-buffered saline (PBS) containing antibiotics. The flask was placed on a stirring platform and stirred gently for 10 minutes at 37˚C. The suspension was decanted through a funnel holding a fine sterile metal sieve and was mixed with 50 mL of cold cell culture growth medium consisting of MEM and 10% foetal bovine serum (FBS, Gibco™, ThermoFisher Scientific). A second lot of 0.5% trypsin solution was added to the residual tissue fragments and was stirred for 30 minutes at 37˚C. The supernatant was decanted into the flask containing the first lot of supernatant and then poured into 50 mL tubes and centrifuged at approximately 800 g for 15 minutes. The resulting cell pellets were each resuspended in 2 mL of cell culture growth medium, pooled and the volume expanded to a total of 20 mL. The culture medium and cells were then incubated in two 25 cm² cell culture flasks at 37˚C. The growth medium was changed every 4–5 days until the monolayers were confluent.

Cells were maintained by removing the growth medium and replacing with 10 mL of maintenance medium (MEM and 2% FBS). Cells were subcultured by removing the culture medium, briefly rinsing with 1 mL of 0.025% trypsin solution in PBS containing antibiotics, removing the excess trypsin solution, adding 1 mL of fresh trypsin solution and incubating at 37˚C until the cells separated from the surface of the flask. The cells were then dispersed by gentle aspiration with a pipette and 8 mL of fresh growth medium was added to the flask. The cells were then subcultured by adding 2 mL of the re-suspended cell solution to 8 mL of growth medium and transferred to a 25 cm² flask. For cryopreservation, 8 mL of MEM containing 20% FBS and 10% dimethyl sulfoxide (Ajax Finechem Laboratory Chemicals) was added to the trypsinised cell suspension. Cells in lots of 1 mL were then added to cryovials that were slowly frozen to -80˚C and then transferred to liquid nitrogen. On the first occasion that cells from an individual animal were frozen, 2 mL of the cell suspension solution was added to 18 mL of growth medium and two 25 cm² flasks were seeded and incubated until the viability of the frozen cells had been confirmed.

In 2005, fibroblast cell cultures were submitted to Professor John Hopwood at the Lysosomal Diseases Research Unit, Adelaide, Australia, for initial phenotype characterisation. During 2015 and 2017, RNA was extracted from fibroblast cell cultures for sequencing of cDNA, and the NPC phenotype was validated in these fibroblast cell cultures as described below.

DNA and RNA isolation

Genomic DNA was extracted from EDTA blood samples from calf 1, calf 2 and calf 3, as well as from the two obligate carrier dams of calf 2 and calf 3 using the UltraClean® Tissue & Cells DNA Isolation Kit (Mo Bio Laboratories Inc.) following the manufacturer's protocol. Genomic DNA was isolated from hair roots from 403 Angus/Angus-cross animals collected in 2016 and 2017 from the original herd using a standard hair digest protocol [38].

RNA was extracted from cultured fibroblast cells from calf 2 and calf 3 and one unrelated Angus animal using the RNeasy Mini Kit (QIAGEN). Extraction was performed according to the manufacturer's protocol with the addition of 600 μL of RLT buffer to lyse cells.

DNA and RNA concentration and purity were measured using the NanoDrop 8000 spectrophotometer (ThermoFisher Scientific).

SNP genotyping

DNA samples of affected calves 2 and 3 and their two obligate carrier dams were submitted to the Australian Genome Research Facility (AGRF) for genotyping with the GeneSeek® Genomic Profiler Bovine HD Chip 80K chip (Neogen). Runs of homozygosity (ROH) were computed using PLINK v1.07 [39]. Threshold values of 80% and 0.01 were used for single nucleotide polymorphism (SNP) call rate and minor allele frequency, respectively. ROH were identified using the --homozyg command. The minimum length of a ROH was pre-set at 5 megabases (Mb), spanning a minimum of 100 SNPs. To account for a 1% error rate in genotyping calls, one heterozygote and up to two missing genotypes were allowed for each ROH. The ROH were visualised using R software [40].
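The ROH criteria above can be illustrated with a simplified scan. This is a minimal sketch and not PLINK's sliding-window algorithm: it makes one greedy pass along a chromosome and keeps runs that meet the stated thresholds (at least 100 SNPs, at least 5 Mb, at most one heterozygote and two missing calls). The function name `find_roh`, the genotype coding (0 and 2 homozygous, 1 heterozygous, `None` missing) and the toy data are assumptions made for illustration only.

```python
from typing import List, Optional, Tuple


def find_roh(
    positions: List[int],
    genotypes: List[Optional[int]],
    min_snps: int = 100,
    min_length_bp: int = 5_000_000,
    max_het: int = 1,
    max_missing: int = 2,
) -> List[Tuple[int, int, int]]:
    """Greedy ROH scan on one chromosome (illustrative, not PLINK's method).

    Returns (start_bp, end_bp, n_snps) for each run that passes the thresholds.
    """
    runs = []
    n = len(genotypes)
    start = 0
    while start < n:
        het = missing = 0
        end = start
        # Extend the run until one more het/missing call would exceed the limits.
        while end < n:
            g = genotypes[end]
            if g is None:
                if missing == max_missing:
                    break
                missing += 1
            elif g == 1:
                if het == max_het:
                    break
                het += 1
            end += 1
        # Trim so the reported run begins and ends on a homozygous call.
        lo, hi = start, end - 1
        while lo <= hi and genotypes[lo] not in (0, 2):
            lo += 1
        while hi >= lo and genotypes[hi] not in (0, 2):
            hi -= 1
        if lo <= hi:
            n_snps = hi - lo + 1
            length = positions[hi] - positions[lo]
            if n_snps >= min_snps and length >= min_length_bp:
                runs.append((positions[lo], positions[hi], n_snps))
        start = max(end, start + 1)
    return runs


if __name__ == "__main__":
    # Toy data: 20 SNPs spaced 1 kb apart, with two heterozygous calls.
    toy_positions = list(range(0, 20_000, 1_000))
    toy_genotypes = [0] * 8 + [1] + [2] * 5 + [1] + [0] * 5
    print(find_roh(toy_positions, toy_genotypes, min_snps=5, min_length_bp=4_000))
```

Regions that such a scan reports in every affected animal, but not in the obligate carriers, would be the natural places to look for positional candidate genes.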

RT-PCR and Sanger sequencing

PrimerBLAST [41] was used to design 6 primer pairs to amplify the cDNA (ENSBTAT00000020219.5) of the positional candidate gene NPC1 (Table 1 and Fig 1). Six overlapping reverse transcriptase PCRs (RT-PCR) were performed using a Mastercycler® pro and Mastercycler® (Eppendorf) in 50 μL volumes comprising RNase-free water, 2.5 mmol/L MgCl2 pH 8.7, 0.8 mmol/L Tris-Cl pH 9.0, 4 mmol/L KCl, 0.04 mmol/L DTT, 0.004 mmol/L EDTA, 2.0 mM dNTPs, 0.6 μM of each primer (Table 1) and 20–30 ng/μL of RNA from calf 2 to amplify the entire NPC1 cDNA. The denaturation step was performed at 95˚C for 15 minutes, followed by 40 cycles consisting of a denaturation step for 1 minute at 94˚C, annealing at 55˚C and extension at 72˚C. A final step was performed at 72˚C for 10 minutes with a hold step at 15˚C.

For calf 3 and an unrelated Angus control the protocol was modified to amplify the region containing the NM_174758.2:c.2969C>G variant using primers LSD_F4 and LSD_R5 (Table 1). A touchdown RT-PCR was performed with a reverse transcription step at 50˚C for 30 minutes, followed by denaturation at 95˚C for 15 minutes. Cycling commenced at 94˚C for 1 minute, followed by annealing at 65–55˚C for 1 minute and extension at 72˚C for 2 minutes for 11 cycles, with the annealing temperature decreasing by 1˚C each cycle. A further 29 cycles were performed with denaturation at 94˚C for 1 minute, annealing at 55˚C for 1 minute and extension at 72˚C for 2 minutes. A final extension step at 72˚C for 1 minute concluded the PCR.

PCR products were purified using the MinElute PCR Purification Kit (QIAGEN) or bands were excised from a 2% agarose gel using the Wizard® SV Gel and PCR Clean-up System (Promega) before submission to AGRF for DNA sequencing. Sequencing data was analysed using Sequencher® (version 5.3, Gene Codes Corporation) by aligning the sequences to NM_174758.2 to identify variants. Variants were compared to the variant database in Ensembl [42] and predicted impacts of novel variants on protein function were determined by SIFT analysis [43]. Cross-species NPC1 protein alignments were conducted using T-Coffee [44] and BOXSHADE (v3.2) across ten species.

Table 1. Primer sequences for amplification of cDNA for NPC1 and their location in the ENSBTAT00000020219.5 transcript.

Primer name   Primer sequence 5'-3'     Exon   Location    Product size (bp)
LSD_F1        CTTGGTTTCCTCCCTCCGC       1      54–72       737
LSD_R1        TGAAAAGACGGGGGTGATGG      5      771–790
LSD_F2        GCGCCTGCAATGCTACTAAC      5      705–724     902
LSD_R2        CAGGATGGTGCAGTTCTGGT      9      1587–1606
LSD_F3        ACAACGAGACTGTGACCCTG      9      1533–1552   799
LSD_R3        ACCTGTTGGTCGAGGGTTTC      14     2312–2331
LSD_F4        CCTGGTCCAGACCTACCAGA      13     2272–2291   895
LSD_R4        GAACATGGGCAGGAACCTCA      20     3147–3166
LSD_F5        CAGAGGGCAAGCAGAGACC       20     3111–3129   965
LSD_R5        GTTCAACCGAGCCTCGACA       25     4057–4075
LSD_F6        AGGTACAGAACGAGAACAGCTC    25     3961–3982   722
LSD_R6        ACCCAGTATCCACATCTAGGAG    25     4661–4682

https://doi.org/10.1371/journal.pone.0238697.t001

Fig 1. Schematic diagram of the NPC1 gene showing the location of the candidate causal mutation NM_174758.2: g.33099467C>G with chromatograms from Sanger sequencing data for two affected calves and a wildtype control. (a) Location of the bovine NPC1 gene, Chr24: 33058694–33105394 on the ARS-UCD1.2 bovine genome assembly. (b) Position of the six overlapping RT-PCR products (blue bars) in relation to the NPC1 cDNA containing 25 exons (grey bars) with a transcript length of 4721 bp (ENSBTAT00000020219.5). The start codon (ATG) is indicated by the green circle and the stop codon (UAG) is indicated by the red circle, with the cDNA location of the first nucleotide of the start and stop codon given. (c) Genomic region containing the C>G missense variant with protein translation frames obtained from NCBI Genomic Data Viewer (NCBI, accessed 12th August 2019, ). The reading frame is identified by a black box. (d) Sanger sequencing chromatograms for the two affected calves and a wildtype control. https://doi.org/10.1371/journal.pone.0238697.g001
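The primer coordinates in Table 1 can be checked for internal consistency. The sketch below recomputes each product size from the transcript coordinates and confirms that consecutive RT-PCR products overlap, as the text describes. The coordinates and sizes are copied from Table 1; the names `AMPLICONS`, `product_size` and `check_tiling` are hypothetical helpers introduced for illustration.

```python
from typing import List, Tuple

# (primer pair, forward primer start, reverse primer end, reported product size in bp)
# Coordinates refer to the ENSBTAT00000020219.5 transcript, as listed in Table 1.
AMPLICONS: List[Tuple[str, int, int, int]] = [
    ("LSD_F1/LSD_R1", 54, 790, 737),
    ("LSD_F2/LSD_R2", 705, 1606, 902),
    ("LSD_F3/LSD_R3", 1533, 2331, 799),
    ("LSD_F4/LSD_R4", 2272, 3166, 895),
    ("LSD_F5/LSD_R5", 3111, 4075, 965),
    ("LSD_F6/LSD_R6", 3961, 4682, 722),
]


def product_size(start: int, end: int) -> int:
    """Length of an amplicon given inclusive start and end coordinates."""
    return end - start + 1


def check_tiling(amplicons: List[Tuple[str, int, int, int]]) -> List[int]:
    """Verify reported sizes and return the overlap (bp) between consecutive amplicons."""
    for name, start, end, reported in amplicons:
        if product_size(start, end) != reported:
            raise ValueError(f"{name}: computed size differs from reported size")
    overlaps = []
    for (_, _, prev_end, _), (name, next_start, _, _) in zip(amplicons, amplicons[1:]):
        overlap = prev_end - next_start + 1
        if overlap <= 0:
            raise ValueError(f"gap before {name}")
        overlaps.append(overlap)
    return overlaps


if __name__ == "__main__":
    print(check_tiling(AMPLICONS))
    # Product spanned by LSD_F4 and LSD_R5, the pair used in the modified protocol.
    print(product_size(2272, 4075))
```

Under these coordinates every reported product size matches the primer positions, and each amplicon overlaps the next, so the six products tile the transcript from position 54 to position 4682.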

Genotyping assays

Two genotyping tests were developed for the NM_174758.2:c.2969C>G missense variant in the NPC1 gene to confirm segregation with disease and to estimate the allele frequency in the current herd (n = 403).

Genotyping by real-time PCR. Real-time PCR was performed using the ViiA™ 7 system (Applied Biosystems™) in a final reaction volume of 20 μL. Each reaction contained 1 x TaqMan® Genotyping Master Mix (Applied Biosystems™), 900 nmol/L of assay specific primers (Table 2) (Sigma-Aldrich), 10 mmol/L of allele specific probes (LGC Biosearch Technologies) (Table 2) and 15–30 ng of genomic DNA. Each PCR commenced with a pre-read stage at 60˚C for 30 seconds followed by an initial denaturation at 95˚C for 10 minutes. Denaturation then occurred at 95˚C for 10 seconds followed by annealing/extension at 62˚C for 60 seconds for 50 cycles, with a post-read stage at 60˚C for 30 seconds. Genotypes were analysed using the ViiA™ 7 software version 1.1 (Applied Biosystems™).

Genotyping by PCR and restriction enzyme analysis. PCR was performed using a Mastercycler® pro (Eppendorf) using primers (Table 2) in 20 μL volumes comprising 10 mmol/L Tris-HCl pH 8.3, 50 mmol/L KCl, 1.5 mmol/L MgCl2, 0.1 mM dNTPs, 0.5 μM of each primer (Table 2), 0.5 U of Q solution, 0.05 U of Taq polymerase (Roche) and 15–20 ng/μL of target DNA. The initial denaturation step was performed at 94˚C for 3 minutes, followed by 45 cycles consisting of a denaturation step at 94˚C for 10 sec, annealing at 60˚C for 15 sec and extension at 72˚C for 20 sec. A final extension step was performed at 72˚C for 2 minutes with a hold step at 15˚C.

Table 2. Primer and probe sequences for the real-time PCR assay and primer sequences for the RFLP PCR assay.

Primer or probe name   PCR type        Primer sequence 5'-3'        Product size (bp)
NP2016_F               Real-time PCR   GGTCAACCCTACCTGTGTC          103
NP2016_R                               GTCGGAGAGGAACATGGG
NP_wildtype                            CAAGCAGAGACCTCAGGGCGCAGAC
NP_mutant                              CAAGCAGAGACGTCAGGGCGCAGAC
NP2016_RFLP_F          PCR-RFLP        GAGGTTGTCCTTAAAGCAGGCA       331
NP2016_R                               GTCGGAGAGGAACATGGG

https://doi.org/10.1371/journal.pone.0238697.t002


Restriction enzyme digestions were performed in final volumes of 15 μL containing 10 μL of PCR product and 5 μL of enzymatic solution (1 x PBS, 3.7 x 10X NEBuffer™ 1 and 3 U of HpyCH4IV enzyme) (New England BioLabs Inc.) at 37˚C overnight. Products were visualised on a 3.7% agarose gel or the 2100 Bioanalyzer Instrument using the 2100 Expert Software (version B.02.10.SI764) (Agilent Technologies).
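The logic of the restriction assay can be illustrated with the two allele-specific probe sequences from Table 2, which differ only at the variant base. The sketch below assumes that HpyCH4IV recognises the sequence ACGT; this recognition sequence comes from general knowledge of the enzyme and is not stated in the text. Which allele is cut is therefore an inference from the probe sequences, and the fragment sizes within the 331 bp product are not derived here. The helper name `site_positions` is hypothetical.

```python
from typing import List

# Allele-specific probe sequences copied from Table 2.
WILDTYPE_PROBE = "CAAGCAGAGACCTCAGGGCGCAGAC"
MUTANT_PROBE = "CAAGCAGAGACGTCAGGGCGCAGAC"

# Assumption: HpyCH4IV recognition sequence (not given in the source text).
HPYCH4IV_SITE = "ACGT"


def site_positions(seq: str, site: str = HPYCH4IV_SITE) -> List[int]:
    """Return 0-based start positions of every occurrence of `site` in `seq`."""
    seq = seq.upper()
    return [i for i in range(len(seq) - len(site) + 1) if seq[i:i + len(site)] == site]


if __name__ == "__main__":
    for label, probe in (("wildtype", WILDTYPE_PROBE), ("mutant", MUTANT_PROBE)):
        print(label, site_positions(probe))
```

Under this assumption the C>G change creates a recognition site in the mutant sequence that is absent from the wildtype sequence, which is what would allow the digest to separate the two alleles on a gel.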

Characterisation of NPC1 phenotype in fibroblast culture

Fibroblasts from affected calves 2 and 3 and a non-related Angus control from a different study were shipped to the University of Oxford, Oxford, United Kingdom as CO2 equilibrated growing cultures in an insulated polystyrene box.

Activity assay for α-mannosidosis. The α-mannosidase activity was assayed in triplicate in a wildtype control and affected calves 2 and 3, using 4-Methylumbelliferyl-α-D-Mannopyranoside (4MU-α-Mann) as substrate, to investigate α-mannosidosis as a differential diagnosis. Briefly, hydrolysis of 4MU-α-Mann (Sigma-Aldrich) was conducted at 37˚C in 100 mM sodium acetate buffer (pH 4.0) using 10 μL of fibroblast homogenates at a final volume of 50 μL after addition of substrate (3.00 mM final concentration 4MU-α-Mann). Reactions were stopped by the addition of 0.5 M Na2CO3. Fluorescence of released 4MU was measured using a Clariostar 96-well plate reader (excitation 365 nm and emission 450 nm).

Lysotracker. Bovine fibroblasts (1 × 10⁵, in triplicate) were stained with 0.5 mL of 200 nM Lysotracker-green DND-26 (Invitrogen) in PBS for 10 minutes in the dark at room temperature. Cells were centrifuged for 5 minutes at 2000 revolutions per minute, re-suspended in 0.5 mL FACS buffer (PBS + 1% FCS) and kept on ice for a maximum of 1 hour prior to flow cytometric analysis (BD Biosciences FACSCanto II). The cytometer was calibrated using Cytometer Setup and Tracking beads (BD). Samples were acquired with gating on singlets (FSC-H versus FSC-A). In total, 10,000 singlet events were collected. The mean fluorescence was calculated using FACSDiva software (BD) and the mean of the triplicate reading plotted.

Sphingoid base measurements. Sphingoid bases (sphingosine and sphinganine) were extracted from 100 μL fibroblast homogenates in 500 μL of chloroform:methanol (1:2 v/v) followed by sonication for 10 minutes at room temperature. Subsequently, 500 μL of 1 M sodium chloride, 500 μL of chloroform and 100 μL of 3 M sodium hydroxide were added to the samples, which were vortexed every 5 minutes for 15 minutes at room temperature.
Homogenates were centrifuged at 13,000 g for 10 minutes, and the lower organic phase retained. Sphingoid bases were purified from the samples by pre-equilibrating SPA-NH2 columns with 2 x 1 mL chloroform followed by sample elution with 3 x 300 μL acetone. The samples were dried under nitrogen. Lipids were re-suspended in 50 μL pre-warmed 37˚C ethanol and 50 μL o-phthaldialdehyde (OPA) labelling solution (1 mg OPA/20 μL ethanol/1 μL 2-mercaptoethanol; dilution 1:2000 in 3% boric acid pH 10.5) was added. Samples were kept at room temperature in the dark for 20 minutes and vortexed every 10 minutes. Samples were buffered with 100 μL methanol:5 mM Tris pH 7 (9:1) and centrifuged at 5,000 g for 2 minutes. Supernatants (150 μL) were loaded onto a reverse phase HPLC. Chromatography was carried out using a mobile phase of 85% acetonitrile/15% H2O at a flow rate of 1.0 mL/minute. The OPA-labelled derivatives were monitored at an excitation wavelength of 340 nanometres (nm) and an emission wavelength of 450 nm. Quantification of trace peak area was carried out using EZChrom Elite software v3.2.1 (http://www.jascoinc.com/ezchrom).

Cholesterol measurements. Cholesterol was measured with the Amplex Red kit (Molecular Probes) according to the manufacturer's instructions. Briefly, cell and tissue homogenates in 100 μL milliQ H2O were Folch extracted and dried down under nitrogen. The pellets containing cholesterol were resuspended in 1X reaction buffer, and 50 μL was loaded for each sample in a flat bottom 96-well plate. The reaction was initiated by adding working solution per sample (25% Amplex® Red, 2 U/mL HRP stock solution, 2 U/mL cholesterol oxidase stock solution, 0.2 U/mL cholesterol esterase stock solution in 1X reaction buffer). Samples were incubated at 37˚C for 30 minutes and the fluorescence was measured in a microplate reader (Optima, BMG Labtech) using excitation in the range of 530–560 nm and emission detection at ~590 nm.

Filipin staining. Free cellular cholesterol was visualised with filipin (from Streptomyces filipinensis) (Sigma-Aldrich). Cells were fixed with 4% paraformaldehyde, washed 3 x with PBS and incubated with 1.5 mg/mL glycine for 10 min to quench auto-fluorescence. Cells were then incubated with filipin (0.05 mg/mL in PBS/10% FBS/0.2% Triton x100) for 2 hours, washed 3 x with PBS and visualised by Leica-SP8 confocal microscope.

Cholera toxin B staining. GM1 staining with cholera toxin was conducted on fixed and permeabilised cells by incubating cells with a 1:1000 dilution of stock solution (1 mg/mL) for 2 hours in 0.5% bovine serum albumin (BSA) in PBS, followed by washing 3 x with PBS. Cholera toxin B staining of cells was visualised by Leica-SP8 confocal microscope.

Image quantification. Filipin and cholera toxin signals were quantified by acquiring the mean gray value for each defined cell (ROI) using Fiji Software (http://fiji.sc/Fiji) [45].

Normal phase HPLC for glycosphingolipids in cultured bovine fibroblasts. Glycosphingolipids (GSLs) were analysed essentially as described previously [46]. Lipids from cultured bovine fibroblasts were extracted with chloroform and methanol overnight at 4˚C. The GSLs were then further purified using solid-phase C18 columns (Telos, Kinesis).
After elution, the GSL fractions were dried down under a stream of nitrogen at 42˚C and treated with recom- binant ceramide glycanase (rEGCase I, prepared by Genscript and kindly donated by Orpha- zyme) to obtain oligosaccharides from more complex GSLs. The liberated free glycans were then fluorescently-labelled with anthranillic acid (2AA). To remove excess 2AA label, labelled glycans were purified using DPA-6S SPE columns (Supelco). Purified 2 2AA-labelled oligosac- charides were separated and quantified by normal-phase high-performance liquid chromatog- raphy (NP-HPLC) as previously described (Neville et al, 2004). The NP-HPLC system consisted of a Waters Alliance 2695 separations module and an in-line Waters 2475 multi λ- fluorescence detector set at E x λ360 nm and Em λ425 nm. The solid phase used was a 4.6 x 250 mm TSK gel-Amide 80 column (Anachem). A 2AA-labelled glucose homopolymer ladder (Ludger) was included to determine the glucose unit values (GUs) for the HPLC peaks. Indi- vidual GSL species were identified by their GU values and quantified by comparison of inte- grated peak areas with a known amount of 2AA-labelled BioQuant chitotriose standard (Ludger). Protein concentration in fibroblast homogenates was determined using the BCA assay (Sigma-Aldrich).
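The peak-area quantification described above can be illustrated with a short sketch. It is not the analysis code used in this study: the function name and all numerical values are hypothetical, and it assumes that each glycan carries a single 2AA label, so that fluorescence peak area is proportional to the molar amount.

```python
def gsl_pmol_per_mg(peak_area, std_area, std_pmol, protein_mg):
    """Convert an integrated HPLC peak area to pmol GSL per mg protein.

    Assumes one 2AA label per glycan, so equal peak areas correspond to
    equal molar amounts of the sample glycan and the chitotriose standard.
    """
    if std_area <= 0 or protein_mg <= 0:
        raise ValueError("standard area and protein mass must be positive")
    pmol = peak_area / std_area * std_pmol  # amount relative to the standard
    return pmol / protein_mg                # normalise to protein content


# Hypothetical values, for illustration only.
example = gsl_pmol_per_mg(peak_area=5000.0, std_area=2500.0,
                          std_pmol=2.5, protein_mg=0.5)
```

Normalising to the protein content of the homogenate is what allows GSL levels to be compared between cultures of different cell density.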

Structural analysis

Homology modelling of bovine NPC1. At the time of writing, no wildtype bovine NPC1 (NP_777183.1) structures were available in the Protein Data Bank (PDB) (http://www.rcsb.org/) [47]. We therefore built a homology model based on a recent cryo-electron microscopy structure of the human NPC1 protein (PDB: 6UOX) [48]. This was performed using Modeller v.9.24 [49]: 100 models were constructed using the full-length wildtype bovine NPC1 sequence (NP_777183.1), and the model with the lowest Discrete Optimized Protein Energy (DOPE) score was selected for further evaluation. A model of bovine NPC1 containing the p.P990R mutation was created using the PyMOL Molecular Graphics System, v.2.3.4 (Schrödinger, LLC), selecting the rotamer of best fit.

Molecular dynamics simulations. All steps of the simulation system setup were performed using the CHARMM-GUI webserver’s membrane builder tool [50, 51]. Initially, chain termini were capped with neutral groups (acetyl and methylamide). Residues were protonated as per their expected states at pH 7. Disulfide bonds were added based on homology to structures of human NPC1. As NPC1 is an endosomal transmembrane protein, we opted to explicitly model the protein in a 100% POPC (1-palmitoyl-2-oleoyl-sn-glycero-3-phosphocholine) lipid bilayer. The PPM web server was used to determine the rotational and translational position of the transmembrane region [52]. The simulation box was set to be a rectangular system with an XYZ size of 100.12 x 100.12 x 219.21 Å, which resulted in 121 and 109 lipids in the top and bottom layers respectively. Ions were added to the system to yield a NaCl concentration of 150 mM, and the system was then solvated with TIP3P water [53]. Protein and ions were modelled with the AMBER ff14SB force field [54]. Lipids were modelled with the LIPID14 force field [55]. All bonds involving hydrogen atoms were constrained to their equilibrium lengths with the SHAKE algorithm [56]. The resulting systems were subjected to at least 5,000 steps of energy minimisation to remove any clashes, followed by an equilibration protocol with periodic boundaries. The equilibration protocol was provided by the CHARMM-GUI web server and consists of 6 steps with harmonic positional restraints applied. Restraints started at 10 kcal/mol/Å² for backbone atoms, 5 kcal/mol/Å² for sidechain atoms, 10 kcal/mol/Å² for ions and 2.5 kcal/mol/Å² for membrane atoms, and were gradually relaxed over 6 cycles, initially of 125 picoseconds (ps) each at a time step of 1 femtosecond (fs). Temperature was initialised using random velocities with a fixed target of 311.15 Kelvin, maintained by a Langevin thermostat. After 2 cycles of equilibration, constant pressure control was introduced using a semi-isotropic Berendsen barostat, at a target pressure of 1.0 bar. After 3 cycles of equilibration, positional restraints continued to be relaxed, the cycle time was extended to 500 ps and the time step was increased to 2 fs. Throughout, a 9 Å cut-off radius was used for range-limited interactions, with Particle Mesh Ewald electrostatics for long-range interactions. Production simulations were conducted in an isothermal-isobaric (NPT) ensemble as described above, with a 2 fs time step for 500 ns in total, and were run in triplicate from independent starting velocities using Amber v.19.19.12 [57, 58] and PMEMD.cuda on Nvidia V100 GPUs. Simulation trajectories were processed and analysed using a combination of AmberTools, PyMol, VMD and custom Python (v3.7) (https://www.python.org/) scripts.
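As a worked check of the simulation lengths quoted above, the number of integration steps implied by each stage can be computed directly. The sketch below is illustrative only; it assumes that cycles 1 to 3 each ran for 125 ps at 1 fs and cycles 4 to 6 each ran for 500 ps at 2 fs, which is one reading of the protocol, and the variable and function names are hypothetical.

```python
FS_PER_PS = 1_000
FS_PER_NS = 1_000_000


def n_steps(duration_fs, timestep_fs):
    """Number of integration steps needed to cover a duration."""
    if duration_fs % timestep_fs:
        raise ValueError("duration is not a whole number of time steps")
    return duration_fs // timestep_fs


# (cycle length in ps, time step in fs) for the six equilibration cycles,
# under the assumption stated in the text above.
equilibration = [(125, 1)] * 3 + [(500, 2)] * 3

equil_steps = [n_steps(ps * FS_PER_PS, dt) for ps, dt in equilibration]
equil_total_ps = sum(ps for ps, _ in equilibration)

# One production replicate: 500 ns at a 2 fs time step.
production_steps = n_steps(500 * FS_PER_NS, 2)
```

Working in integer femtoseconds avoids floating-point rounding when dividing durations by the time step.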

Results

Clinical findings

Between 2002 and 2005, a small proportion of calves from a commercial beef herd in Australia presented with progressive neurological signs (S1 Video). The three affected calves (calves 1, 2 and 3) investigated in 2005 were reported to be sired by the same bull. Herd and grazing history did not support plant toxicity as a cause of disease and all three affected calves tested negative for BVDV and Akabane virus. Due to a possible history of inbreeding, a recessive inherited disease was considered. Based on the suspicion of an inherited neurodegenerative disease, calf 1 was genotyped for two mutations known to cause bovine α-mannosidosis, a lysosomal storage disease that has been previously described in Angus and Galloway cattle (OMIA 000625–9913) [36, 37, 59]. The affected calf was homozygous wildtype for both bovine α-mannosidosis mutations.

Pathology

At necropsy of calf 1, excessive cerebrospinal fluid was noted; no gross pathology was reported for calves 2 and 3. Histologically, brain tissue from these calves presented with similar widespread, foamy vacuolation of the cytoplasm of neurons and glia, with eosinophilic and axonal swellings (spheroids) (Fig 2). This effect was observed throughout the brain and spinal cord, with more intense degeneration in selected areas. In the cerebellum, there was marked vacuolation of Purkinje cells with prominent, swollen proximal axonal processes in the internal granular cell layer (torpedoes). There was extensive Wallerian-type degeneration of white matter tracts, particularly in the cerebellum, with numerous digestion chambers containing macrophages and detritus. In calf 2, with perhaps the most advanced disease, there was marked astrocytosis and microgliosis with a large proportion of degenerated neurons and neuronophagic nodules present. The chronicity of the process was reflected in the presence of golden-brown lipid breakdown products, seen as globules in the cytoplasm of perivascular macrophages (globoid cells). In the heart there was marked hypertrophy of Purkinje cells, with vacuolation of the cytoplasm. There were large aggregates of foamy macrophages at the cortico-medullary junction in lymph nodes. Brain, liver and lymph node tissues stained only weakly with Periodic Acid–Schiff, consistent with the presence of soluble storage materials, which can be lost during tissue processing, rather than polysaccharides. Tissues stained with Ziehl–Neelsen were negative and Perls' Prussian Blue stains were negative for iron. The examination of ultrathin brain sections using electron microscopy revealed material that was finely granular, sometimes within intracytoplasmic, membranous bodies. Further intracellular storage material was difficult to identify, due to the appearance of general stored cellular debris. Examination of these sections did not reveal any micro-organisms or viral particles. Based on these findings a lysosomal storage disease was diagnosed.

Fig 2. Haematoxylin and eosin staining of 5 micron thick sections from two affected and two unaffected calves at 40x magnification. (a) Thalamus of calf 2 showing nuclei with foamy, granular basophilic cytoplasm (black arrow), degenerate neurons with a condensed and basophilic nucleus (orange arrow) with the ensuing phagocytosis (red arrows) and macrophages with brown-pigmented cellular debris accumulating around a blood vessel (green arrow). (b) Cerebellar white matter of calf 3 showcasing widespread foamy vacuolation of the cytoplasm of neurons and glia, with eosinophilic and axonal swellings (white arrows) and macrophages and lymphocytes surrounding a small blood vessel (grey arrow). (c) Thalamus from a healthy calf and (d) cerebellum from a healthy calf. https://doi.org/10.1371/journal.pone.0238697.g002

Analysis of fibroblasts from affected calves 2 and 3 and a normal Angus control at the Lysosomal Storage Disease Research Unit in Adelaide, South Australia, in 2005 was suggestive of Niemann-Pick disease due to positive filipin staining. However, these results were inconclusive due to variability in filipin staining across cells.

SNP genotyping and homozygosity analysis

Call rates for the SNP genotyping data for two affected and two obligate carrier animals were on average 99.81%. A sliding window of 100 SNPs identified shared ROH between calf 2 and calf 3 on Bos taurus autosome (BTA) 24 and BTA10 that were not shared with the obligate carrier dams of each calf. These ROH windows were approximately 40 Mb and 6 Mb in size respectively (Fig 3). Bovine homologs of the genes that cause Niemann-Pick disease in humans (SMPD1, NPC1 and NPC2) are located on BTA15, BTA24 and BTA10 respectively. The NPC1 gene was considered as a positional candidate gene based on the described gene function in humans and mice, and the phenotypes associated with a defective NPC1 protein.
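The logic of the homozygosity screen above can be sketched as follows. This is a deliberately simplified illustration and not the software used in this study: genotypes are assumed to be coded as 0, 1 or 2 copies of an alternate allele in map order, no missing or heterozygous calls are tolerated within a run, and the function name and toy genotypes are hypothetical.

```python
def shared_roh(affected, carriers, min_snps=100):
    """Find runs where all affected animals share a homozygous genotype.

    Returns (start, end) index pairs, end exclusive, for runs of at least
    min_snps consecutive SNPs. A run is kept only if every carrier has at
    least one heterozygous call (1) within it, i.e. the run is not shared
    with the carriers.
    """
    n = len(affected[0])
    runs, start = [], None
    for i in range(n + 1):
        calls = {g[i] for g in affected} if i < n else set()
        ok = len(calls) == 1 and calls <= {0, 2}
        if ok and start is None:
            start = i                      # a candidate run begins
        elif not ok and start is not None:
            if i - start >= min_snps:
                runs.append((start, i))    # run is long enough to report
            start = None
    return [(s, e) for s, e in runs
            if all(1 in c[s:e] for c in carriers)]


# Toy genotypes for two affected calves and their dams (hypothetical).
affected = [[0, 0, 2, 2, 0, 1, 2, 2],
            [0, 0, 2, 2, 0, 0, 1, 2]]
carriers = [[0, 1, 2, 1, 0, 1, 2, 2],
            [1, 0, 2, 2, 1, 0, 1, 2]]
regions = shared_roh(affected, carriers, min_snps=3)
```

Requiring the obligate carriers to be heterozygous somewhere in the run reflects the expectation that, for a recessive disease, a carrier holds one copy of the disease haplotype and one different haplotype.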

Fig 3. Two Regions of Homozygosity (ROH) for affected calves 2 and 3 and their respective dams showing approximate locations for two NPC candidate genes of interest based on the UMD_3.1 bovine genome assembly. (a) A shared ROH on chromosome 10 did not contain the NPC2 gene NM_173918:g. 86,170,653–86,179,237 and (b) A shared ROH on chromosome 24 contained the NPC1 gene NM_174758:g. 33,438,455–33,485,188. https://doi.org/10.1371/journal.pone.0238697.g003

Identification of a likely causal variant

A likely causal homozygous missense variant (NM_174758.2:c.2969C>G) was identified in calf 2 by sequencing six overlapping RT-PCR products that spanned all 25 exons of the bovine NPC1 gene (Table 2 and Fig 1). The homozygous c.2969C>G missense variant results in the replacement of a proline residue by arginine at position 990 of the NPC1 protein (Fig 4). To predict the effect of the p.P990R missense mutation on function, SIFT [43] and PolyPhen-2 [60] were used. SIFT calculated a score of 0, which is interpreted to be ‘damaging’ to function across all three NPC1 transcripts. PolyPhen-2 calculated a score of 0.999, which is interpreted to be ‘probably damaging’ with a high probability. Although both SIFT and PolyPhen-2 are trained on human datasets, homology (88.6% identity) and conservation of this position with human NPC1 is high, and both methods are in agreement that this is a damaging mutation.

A further 11 variants were identified in the sequence data and included a novel heterozygous missense variant and 10 homozygous previously reported variants (Table 3). Of the previously reported variants, 9 were synonymous and one (p.F360L) was a missense variant, with a calculated SIFT score of 0.64 (tolerated) and a PolyPhen-2 score of 0.0 (benign), and is therefore unlikely to be disease-causing.

The likely causal variant c.2969C>G was validated using Sanger sequencing of a targeted RT-PCR product in calves 2 and 3 and an unrelated control. Validation using two genotyping assays was conducted for all three affected calves, the dams of calves 2 and 3, and 403 animals from the original herd, with concordant results (S1 Fig). All three affected calves were homozygous for the c.2969C>G variant, and both obligate carrier dams were heterozygous. Of the 403 animals that were sampled in 2016 and 2017, 28 animals were heterozygous and the remaining 375 animals were homozygous wildtype. An estimated allele frequency of 3.5% was calculated for the current herd that was sampled.
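The allele frequency reported above follows directly from the genotype counts. The sketch below reproduces the arithmetic by allele counting; the function name is hypothetical and the carrier frequency is an additional derived value, not one reported in the text.

```python
def allele_frequency(n_het, n_hom_alt, n_total):
    """Frequency of the alternate allele from diploid genotype counts."""
    return (n_het + 2 * n_hom_alt) / (2 * n_total)


# Counts from the herd sampled in 2016 and 2017:
# 28 heterozygous, 0 homozygous variant, 403 animals in total.
q = allele_frequency(n_het=28, n_hom_alt=0, n_total=403)
allele_percent = round(q * 100, 1)

# Proportion of sampled animals that carry one copy of the variant.
carrier_percent = round(28 / 403 * 100, 1)
```

Each animal contributes two alleles, so 28 variant alleles among 806 gives approximately 3.5%.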

Characterisation of NPC1 phenotype in fibroblast culture

Owing to the retrospective nature of this study, no recent cases have been available since the report of the affected calves in 2005. Therefore, only fibroblast cell cultures for calf 2 and calf 3 were available for further phenotype analysis.

Fig 4. Multiple species NPC1 protein alignment using T-Coffee and BOXSHADE was completed using accessions NP_777183.1 (Bos taurus), NP_000262.2 (Homo sapiens), XP_001155285.1 (Pan troglodytes), XP_002800934.1 (Macaca mulatta), NP_032746.2 (Mus musculus), NP_705888.2 (Rattus norvegicus), NP_001003107.1 (Canis lupus familiaris), XP_419162.3 (Gallus gallus), NP_001230804.1 (Danio rerio) and XP_004915269.1 (Xenopus tropicalis). CowMT refers to the mutant Bos taurus sequence and CowWT refers to the wildtype Bos taurus sequence. The predicted change from proline to arginine in the affected calves is highlighted by an asterisk (*) in the CowMT sequence. https://doi.org/10.1371/journal.pone.0238697.g004

Table 3. Variants identified in Sanger sequencing of NPC1 in affected calf 2.

Genomic location (ARS-UCD1.2)  Ref  Alt  Variant ID    Effect on protein (NP_777183.1)  SIFT  PolyPhen-2  Genotype of individual
Chr24:33058797                 G    A    -             -                                -     -           GA
Chr24:33076979                 T    C    rs382241638   p.Arg179=                        -     -           CC
Chr24:33076997                 T    C    rs380249089   p.Asn185=                        -     -           CC
Chr24:33080459                 T    C    rs136708615   p.Ala333=                        -     -           CC
Chr24:33080538                 T    C    rs134085739   p.Phe360Leu                      0.64  0.0         CC
Chr24:33080768                 T    C    rs137533522   p.Ala436=                        -     -           CC
Chr24:33084639                 A    G    rs109486809   p.Gly535=                        -     -           GG
Chr24:33090972                 C    T    rs133148564   p.Ile654=                        -     -           TT
Chr24:33091053                 C    T    rs137071819   p.Pro681=                        -     -           TT
Chr24:33095637                 T    C    rs110517880   p.Ile770=                        -     -           CC
Chr24:33098763                 A    G    rs132653527   p.Ala969=                        -     -           GG
Chr24:33099467                 C    G    -             p.Pro990Arg                      0     0.999       GG

The likely causal variant (bold) was also Sanger sequenced in calf 3 and a wildtype control. https://doi.org/10.1371/journal.pone.0238697.t003
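The relationship between the coding position c.2969 and residue 990 in Table 3 can be checked with a short calculation. The sketch below is illustrative only: the function name is hypothetical, and because the actual proline codon in the bovine transcript is not stated here, all four proline codons of the standard genetic code are tested.

```python
def codon_position(cds_pos):
    """Map a 1-based coding sequence position to (codon number, base in codon)."""
    index = cds_pos - 1
    return index // 3 + 1, index % 3 + 1


codon_number, base_in_codon = codon_position(2969)

# Standard genetic code: proline is CCN; arginine is CGN, AGA or AGG.
PRO_CODONS = ("CCT", "CCC", "CCA", "CCG")
ARG_CODONS = {"CGT", "CGC", "CGA", "CGG", "AGA", "AGG"}

# Apply the C>G change at the affected base of each possible proline codon.
mutated = [c[:base_in_codon - 1] + "G" + c[base_in_codon:] for c in PRO_CODONS]
all_arginine = all(m in ARG_CODONS for m in mutated)
```

Position 2969 is the second base of codon 990, and a C>G change at the second base of any proline codon yields an arginine codon, in agreement with p.Pro990Arg.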

Specific α-mannosidase enzyme activities for the control, calf 2 and calf 3 were 126, 147 and 135 nmol/hr/mg protein respectively, and were not consistent with α-mannosidosis.

Characterisation of fibroblast cultures from calf 2 and calf 3 and an unrelated Angus control confirmed the NPC phenotype. In the affected animals, the overall acid compartment measured in the Lysotracker analysis, sphingosine levels and cholesterol levels were elevated by 20.3%, 15.5% and 105.5% respectively (Fig 5). The accumulation of unesterified cholesterol in the late endosomal/lysosomal compartments in the fibroblasts of affected calves was visualised by filipin staining (Fig 6). Cholera toxin B staining (Fig 7) showed the accumulation of ganglioside GM3 in perinuclear vesicular structures due to the impairment of the recycling of the GM3/Cholera toxin complex typical for NPC disease.

The HPLC of 2AA-labelled GSL-derived glycans from bovine cultured fibroblasts showed that, as observed in cultured human fibroblasts from patients with NPC, some GSLs remain constant, such as GM2 and GM1b, whereas others are significantly elevated, such as GM3, or decreased, as for GM1a and GD1a, in the samples from the affected calves (Fig 8).

Fig 5. Characterisation of NPC1-mutant bovine fibroblasts for (i) wildtype (WT) and NPC1-affected calves (NPC), consisting of calves 2 and 3 and (ii) individual profiles. (A) Total acidic compartment volume measurements with Lysotracker Green fluorescence values (mean, standard deviation (SD)). (B) Total sphingosine levels with samples labelled with o-phthaldialdehyde solution (mean, SD). (C) Total cholesterol levels measured with the Amplex Red cholesterol assay following Folch extraction (mean, SD). https://doi.org/10.1371/journal.pone.0238697.g005

Fig 6. Filipin staining for free cholesterol in wildtype Angus fibroblast cells (WT) and affected calves 2 (NPC1-1) and 3 (NPC1-2). Scale bar is equal to 30 microns. (A) Representative images for filipin staining (yellow) for wildtype and affected calves, with vertical tiles under each sample. (B) Fluorescence quantification of filipin staining. NPC1-1 (calf 2) is represented in red and NPC1-2 (calf 3) in black. Mean gray value was quantified for each cell with Fiji (ImageJ) software. https://doi.org/10.1371/journal.pone.0238697.g006

Fig 7. Cholera toxin B staining for GM1 localisation in wildtype Angus fibroblast cells (WT) and affected calves 2 (NPC1-1) and 3 (NPC1-2). (A) Representative images for cholera toxin staining (red). Scale bar is equal to 30 microns. (B) Fluorescence quantification of cholera toxin staining. NPC1-1 (calf 2) is represented in red and NPC1-2 (calf 3) in black. Mean gray value was quantified for each cell with Fiji software. https://doi.org/10.1371/journal.pone.0238697.g007

Fig 8. HPLC of 2AA-labelled GSL-derived glycans from bovine cultured fibroblasts. Fluorescence profiles for affected calf 2 (NPC1-1) are shown in red (NPC) and a wildtype Angus animal in blue (WT). https://doi.org/10.1371/journal.pone.0238697.g008

Fig 9. Quantification of glycosphingolipid (GSL) species in bovine cultured fibroblasts from HPLC analysis for wildtype (control), calf 2 (NPC1-1) and calf 3 (NPC1-2). https://doi.org/10.1371/journal.pone.0238697.g009

Structural analysis

Destabilising effect of p.P990R. Both SIFT and PolyPhen-2 indicated p.P990R to be a variant that is damaging to protein function in the context of the human NPC1 protein. To further investigate the effect of this variant on protein structure, homology modelling, molecular dynamics simulations and force field-based energy functions, such as Rosetta and FoldX, were used. In the absence of a bovine NPC1 structure, a homology model based on a recent human NPC1 cryoEM structure (PDB: 6UOX) was built. As described in the methods section, this model was built for the wildtype protein before introducing the p.P990R mutation (Fig 10A). Comparative analysis of the wildtype and R990 models showed no steric clashes on mutagenesis; however, the charged nature of arginine may interfere with or weaken the extant salt bridge between R980 and D994 (Fig 10B). To better understand the effect of the p.P990R mutation over time, we embedded the homology models into a 1-palmitoyl-2-oleoyl-sn-glycero-3-phosphocholine (POPC) membrane and performed 500 ns of molecular dynamics (MD) simulation of the wildtype and p.P990R mutant protein in triplicate.

The MD simulations showed a small overall increase in root mean square deviation (RMSD) of the R990 variant in comparison to wildtype (mean RMSD of 7.3 Å vs. 6.5 Å over Cα atoms) (Fig 10C). Although this is not a substantial difference, further examination of the simulation trajectories revealed spatial rearrangement and distortion of the C-terminal domain (CTD) and N-terminal domain (NTD) in R990, which are absent in the wildtype protein (Fig 10D and S2 Video). Specifically, the simulations show that the introduction of R990 weakens the extant salt bridge of R980 and D994, presumably by competing with R980. This results in larger distances between R980 and D994 over the lifetime of the simulation (Fig 10E). With this salt bridge weakened, we observed a larger degree of dynamic motion in the CTD of the R990 mutant when compared to wildtype, and this appears to have a knock-on effect within the NTD, which also exhibits a large degree of motion and structural distortion (Fig 10D and 10F and S2 Video).
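The two trajectory measures discussed above, RMSD over a set of atoms and the distance between two atoms across frames, can be sketched in a few lines. This is not the custom analysis code used in this study: the function names and toy coordinates are hypothetical, and the RMSD function assumes that frames have already been superimposed on the reference, since no fitting is performed here.

```python
import math


def distance(a, b):
    """Euclidean distance between two (x, y, z) points, in input units."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pair_distances(traj_a, traj_b):
    """Per-frame distance between two atoms, e.g. R980 CZ and D994 CG."""
    return [distance(a, b) for a, b in zip(traj_a, traj_b)]


def rmsd(coords, ref):
    """RMSD between two pre-aligned coordinate sets of equal length."""
    if len(coords) != len(ref):
        raise ValueError("coordinate sets must have the same number of atoms")
    total = sum(distance(p, q) ** 2 for p, q in zip(coords, ref))
    return math.sqrt(total / len(ref))


# Toy coordinates in Å (hypothetical): three frames for each atom.
cz_frames = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
cg_frames = [(3.0, 4.0, 0.0), (0.0, 0.0, 5.0), (6.0, 8.0, 0.0)]
salt_bridge = pair_distances(cz_frames, cg_frames)

toy_rmsd = rmsd([(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
                [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)])
```

A sustained increase in the per-frame distance is the kind of signal summarised in Fig 10E, while the RMSD summarises overall deviation from the starting structure.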

Fig 10. Structural modelling and simulations of bovine NPC1 wildtype and p.P990R mutant. (a) The homology model of bovine NPC1 wildtype embedded in a POPC membrane, showing the NTD in tan, the CTD in purple and the P990 position in pink. (b) Coordination of the R980-D994 salt bridge surrounding the P990 position in wildtype (left) and R990 mutant (right), showing how R990 competes for the negative charge of D994. This snapshot comes from an equilibrated state of the homology model. (c) Root mean square deviation (RMSD) plot of all protein Cα atoms for wildtype (blue) and R990 mutant (red), over 500 ns of simulation and for each replicate. (d) Superimposed simulation snapshots (every 100 ns) over the simulation replicates show structural distortions and enhanced conformational sampling of the CTD (purple) and NTD (tan) domains in the presence of R990 (pink). (e) Distance plot that measures the distance between the R980 CZ and D994 CG atoms over time for wildtype (blue) and R990 mutant (red). (f) RMSD plots showing the larger dynamic motion of the CTD (left) and NTD (right) domains in the R990 (red) mutant simulations in comparison to wildtype (blue). https://doi.org/10.1371/journal.pone.0238697.g010

Discussion

Here, we present the first report detailing the clinical signs, pathology, biochemistry and genetic characterisation of Niemann-Pick type C (NPC) disease in three Australian Angus/Angus-cross calves.

After review of case history and results from initial clinical investigations, a genetic cause of disease was proposed. A lysosomal storage disease (LSD), α-mannosidosis, which was first reported in Australian Angus cattle in 1957 [61], was considered. Cattle diagnosed with α-mannosidosis display head tremors, ataxia, incoordination and failure to thrive, leading to premature death [61]. Affected calves in this study presented with neurological signs, but in contrast to animals affected with α-mannosidosis, calves in this study were reported to have good body condition at onset of clinical signs. Genotyping of the two known bovine α-mannosidosis mutations [36, 37] in calf 1 showed that the calf was homozygous wildtype for both loci. The α-mannosidase assay showed that specific activity in fibroblasts from the two affected calves was similar to that of the control. These results, when combined with the homozygous wildtype genotype of calf 1 for both Angus and Galloway bovine α-mannosidosis mutations, excluded α-mannosidosis.

Plant toxicity and viral infection were also considered as potential causes of disease. Ingestion of legumes from the Swainsona spp. by livestock is known to cause clinical signs and pathological manifestations similar to those observed in inherited LSDs [62]. More specifically, the alkaloid swainsonine found within the Swainsona spp. inhibits α-mannosidases, thus causing a disease that is phenotypically similar to mannosidosis [63, 64]. The report of only three
affected calves from this property, as well as the lack of any known toxic plants, suggests that acquired LSD through plant toxicity was not the cause of disease. Viral infection has been demonstrated in a range of neurodegenerative diseases in livestock that result in variable abnormalities of the central nervous system (CNS) [65]. Both BVDV and Akabane virus are known to cause congenital abnormalities, notably cerebral hypoplasia and hydranencephaly depending on the foetal stage of infection [65, 66], but were excluded in all three cases.

Histopathology revealed degenerated neurons and widespread foamy vacuolation of the cytoplasm of neurons and glia of the CNS in all three affected calves. The accumulation of storage material within the CNS and peripheral organs in the affected calves resulted in a diagnosis of an LSD. Based on the clinical investigation, pedigree information, histopathology and initial filipin staining results, NPC was considered and was later confirmed with biochemical characterisation of fibroblast cells.

Initial filipin staining of fibroblast cells in 2005 produced inconclusive results due to variability in staining across cells. However, filipin staining can result in variable staining patterns in human NPC fibroblasts, and based on the variation, staining patterns are described as either classic, intermediate or variant phenotype [67, 68]. Particular NPC1 mutations located within the C-terminal luminal domain of the human NPC1 protein [67, 69] are often associated with the variant pattern of filipin staining.

After identification of a possible causative mutation in NPC1 in the affected calves, central to the diagnosis of NPC in this study was the characterisation of fibroblast cell cultures from affected calves 2 and 3 in 2017. The cells displayed elevated total acidic compartment volume measurements via Lysotracker Green analysis (Fig 5A) and increased sphingosine (Fig 5B) and total cholesterol (Fig 5C) levels.
The elevation in Lysotracker staining, GSLs and cholesterol is consistent with that observed in human patients with NPC [70]. Biochemically, GSL expression in bovine cultured fibroblasts is slightly different to that in humans. In control human fibroblasts, Gb3 and GM3 are the most prominent GSLs, with small amounts of GM2 and Gb4. In human patient NPC fibroblasts, increased amounts of both GM3 and GM2 are observed (Platt lab, Oxford, unpublished data). In the bovine control fibroblasts, the most prominent GSLs are GM3 and GM1b, with small amounts of GM2 and Gb4 (Figs 8 and 9). In the bovine NPC fibroblasts an almost two-fold increase in GM3 was observed, similar to that seen in human patient samples, but GM2 was unchanged. Cholera toxin can bind to a number of different gangliosides, as well as GM1, with varying affinities [71]. From the HPLC analysis (Figs 8 and 9), GM1a is less abundant in the affected calf fibroblasts than in the control. Thus, the cholera toxin binding seen in Fig 7 is likely to be a result of interaction with GM3. The total concentration of GSLs in the control bovine fibroblasts is similar to the values we have observed in human fibroblast cultures. These tend to remain very constant in healthy patient samples (Platt lab, Oxford, unpublished data). Overall, the bovine biochemical data are very similar to observations made in human NPC patient fibroblasts and thus strongly suggest that these calves do exhibit phenotypes of NPC disease.

To characterise the genetic basis of NPC in the affected calves, SNP genotyping and homozygosity analysis of affected calves 2 and 3 and their obligate carrier dams were performed. These revealed a region of homozygosity on bovine chromosome 24, common to the affected animals only, that included the location of NPC1 (Fig 3). No region of homozygosity was observed at the location of NPC2 on BTA10 (Fig 3).
Selection of NPC1 as a positional candidate gene was based on similar phenotype and disease progression in humans and mice, as well as gene function. Sanger sequencing of bovine cDNA revealed the homozygous missense variant NM_174758.2:c.2969C>G (NP_777183.1:p.Pro990Arg), which leads to a non-conservative amino acid change within the C-terminal luminal domain of the bovine NPC1 protein. Through SIFT analysis and PolyPhen-2 analysis, this variant is predicted to have a deleterious impact on NPC1 protein function. Multiple species alignment of the NPC1 protein (Fig 4) showed that the proline residue was conserved across all ten species. Since proline is a small non-polar amino acid and arginine is a large positively charged amino acid, this exchange may impact the structure and function of NPC1. As such, we explored homology modelling and molecular dynamics (MD) simulations of wildtype bovine NPC1 protein and the p.P990R mutant protein. Here, the homology model suggested that the p.P990R mutation might interfere with a salt bridge (R980-D994) within the C-terminal domain (CTD) (Fig 10B). The MD simulations supported this observation, revealing an overall increase in the distance between the salt bridge residues (Fig 10E), which in turn appears to be responsible for increased dynamic motions of and between the CTD and NTD, along with local structural distortion of the domains (Fig 10D and 10F). Although it is difficult to measure effects on function in simulation, this adds further evidence for the deleterious nature of the p.P990R mutation.

The NM_174758.2:c.2969C>G variant was observed in a homozygous state in affected calves 2 and 3 via Sanger sequencing, in a homozygous state in affected calf 1 and in a heterozygous state in the obligate carrier dams of calves 2 and 3 via genotyping. The location of NP_777183.1:p.Pro990Arg is in the C-terminal luminal domain of the NPC1 protein [72, 73]. The human NPC1 protein contains three luminal domains and 13 transmembrane domains [72]. Most causative variants within the human NPC1 protein reside within the C-terminal luminal domain spanning from residues 855 to 1098 [72, 73]. Deleterious variants within this region can result in the misfolding of the NPC1 protein, suggesting that this region has structural importance for normal protein function [11, 72, 73]. Further validation of the c.2969C>G variant in 403 animals was conducted using two genotyping assays.
Human, feline and murine NPC is inherited via a Mendelian recessive mode of inheritance [22, 74–77]. Limited pedigree information in this study was suggestive of inbreeding and therefore a recessive mode of inheritance for the disease in these animals is proposed. Under this assumption the c.2969C>G variant segregates with the disease. Considering the location of this variant within the C-terminal luminal domain, the predicted deleterious impact of this variant on protein function and its segregation with disease, it is likely that this missense variant is responsible for the NPC phenotype observed in the Angus/Angus-cross calves.

Variants within the NPC1 gene have been previously reported in cattle, cats and mice. Four SNPs within the NPC1 gene were identified in Qinchuan cattle and were associated with body size traits. These variants included one missense variant and three synonymous variants [78]. The missense variant did not localise to the C-terminal luminal domain of the bovine NPC1 protein, and was instead located within loop A of the protein [78]. The cattle in that study were noted to be healthy, with no comment made in relation to any neurological or behavioural abnormalities [78]. In a colony of cats with NPC, a recessive causal variant was identified that caused similar clinical signs and biochemical profiles to human juvenile-onset NPC [22]. This variant caused an amino acid change located within the C-terminal luminal domain of the feline NPC1 protein, which is highly homologous to the human NPC1 protein [22, 79]. Three murine models of NPC have been reported, with the Npc1spm and Npc1nih models showing phenotypic similarity to early-onset NPC, and the recent Npc1nmf164 model showing phenotypic similarity to late-onset NPC in humans [74–77]. The causal variants identified in the Npc1spm and Npc1nmf164 models similarly correspond to amino acid changes in the C-terminal luminal loop of the NPC1 protein [77].
The Npc1nih model however corresponds to an amino acid change in the second terminal loop (loop A) of the NPC1 protein, but is still associated with early-onset NPC in humans [77]. Animal models of human disease have proven useful for advancing the knowledge of the biochemical mechanisms of disease, as well as improving therapeutic studies. The feline and murine models of NPC have been used in investigating the effectiveness of treatments [80–82].

PLOS ONE | https://doi.org/10.1371/journal.pone.0238697 September 24, 2020 19 / 26 122 PLOS ONE Molecular basis for a bovine model of Niemann-Pick type C disease

However, some doubts have been noted for the Npc1spm mouse model, because it is unknown whether this model accurately reflects human NPC disease [74–76] despite the causal variant for this model being located in a region of high variant frequency [77]. While both the murine and feline animal models have contributed to human NPC research, a bovine model would be a valuable addition. Large animal models have been fundamental to discovering underlying disease mechanisms and effective therapies for lysosomal storage diseases in humans [83]. In contrast to rodent models, large animal models have a more varied genetic background, a longer lifespan, larger organs, and clinical presentation and progression of disease that are often more similar to human disease. However, breeding and maintaining research populations of large animal models is more time consuming and costly when compared to rodent models. In regard to evaluating safety and effectiveness of therapeutic interventions, large animal models are therefore recognized as important intermediates when moving from preclinical research to clinical trials in humans [83]. The development of a bovine model for human NPC disease would enable therapeutic approaches to be evaluated and implemented on a model with organ size scaling more comparable to humans than cats and rodents. Potential issues may arise when sourcing suitable reagents for testing therapeutic drug use in non-human species [84], as well as the expense of providing these therapeutic drugs to large animals over extended periods of time. Our finding of a missense variant in three Angus/Angus-cross calves represents a unique opportunity to further the knowledge of the underlying biochemistry and mechanisms of NPC in humans, especially as similar neurological clinical signs are observed between affected human patients and cattle. For NPC, fibroblast cell cultures can be used to demonstrate the disease phenotype. 
Therefore, cell rescue experiments involving the overexpression of NPC1 in the established fibroblast cell line could be conducted. The cell rescue experiments would enable a reversal of the cell phenotype and thus confirm the diagnosis of NPC1. Alternatively, future work could involve the reversal of the c.2969C>G variant via CRISPR-Cas9 genome-editing [85] in affected fibroblasts to correct the phenotype. The identification of a causative variant and a robust genotyping test to identify heterozygous individuals allowed for the effective management of this inherited disease in the herd of origin. Genotyping of the wider Angus cattle population will reveal if the deleterious allele needs to be managed on a national or international level [86]. A bovine model for human NPC offers a unique opportunity to investigate the underlying mechanisms of the disease, as well as opportunities for targeted therapeutic approaches.
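As a consistency check on the variant nomenclature used throughout this discussion, the sketch below maps the coding position c.2969 to its codon and confirms that a C>G change at that position converts any proline codon into an arginine codon. The function names are illustrative; the actual proline codon used at this position in bovine NPC1 is not stated here, so all four proline codons are tested.

```python
def codon_position(cdna_pos):
    """Map a 1-based coding-sequence position to (codon number, base in codon)."""
    codon = (cdna_pos - 1) // 3 + 1
    base_in_codon = (cdna_pos - 1) % 3 + 1
    return codon, base_in_codon


# Standard genetic code, restricted to the two codon families relevant here.
PROLINE_CODONS = {"CCT", "CCC", "CCA", "CCG"}
ARGININE_CGN_CODONS = {"CGT", "CGC", "CGA", "CGG"}


def substitute(codon, base_in_codon, alt):
    """Return the codon with the base at the 1-based position replaced by alt."""
    i = base_in_codon - 1
    return codon[:i] + alt + codon[i + 1:]


codon, base = codon_position(2969)
mutated = {substitute(c, base, "G") for c in PROLINE_CODONS}
print(codon, base, sorted(mutated))
```

Position 2969 falls on the second base of codon 990, which agrees with the protein-level description p.Pro990Arg.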

Supporting information

S1 Fig. Two genotyping assays were developed to discriminate the NM_174758.2:c.2969C>G variant for homozygous wildtype, heterozygous and homozygous mutant individuals. (a) Allelic discrimination plot visualised using QuantStudio™ Real-Time PCR System version 1.3 (Applied Biosystems™) for a TaqMan genotyping assay for homozygous wildtype (red dots), heterozygous (green dots), homozygous mutant (blue dots) individuals and no DNA template control (black square). (b) PCR-RFLP size discrimination visualised on a Bioanalyzer Instrument (Agilent Technologies) for (1) homozygous wildtype (230 bp, 101 bp and 44 bp), (2) heterozygous (230 bp, 186 bp, 101 bp and 44 bp), (3) homozygous mutant (186 bp, 101 bp and 44 bp) individuals and (4) no DNA template control. (TIF)

S1 Video. Video recording of an affected calf showing clinical signs including hind limb weakness, dysmetria, incoordination, walking sideways, falling over and recumbency. The video was recorded in 2005 by the producer. (MP4)

S2 Video. Simulation trajectories of wildtype (left) and p.P990R mutant (right) models. Both models were subjected to 500 ns of MD simulation, performed in triplicate (replicates 1, 2 and 3). The movie shows the NPC1 models displayed in the cartoon representation and embedded in a POPC membrane. This movie highlights the C-terminal domain (CTD) (purple), N-terminal domain (NTD) (tan) and site of mutagenesis (pink Van der Waals spheres). The p.P990R simulations show an increased amount of motion and distortion of the CTD and NTD internally, but also of their positions relative to each other. The full length of this movie encompasses 500 ns of MD simulation, with water molecules and ions hidden for clarity. (MP4)

Acknowledgments

The authors would like to acknowledge and thank the producers for submitting samples, video recordings of affected animals and data associated with this case, as well as veterinarians Nerida Evans and Clem Watson for their assistance with collecting hair samples. The authors would like to acknowledge Professor John Hopwood for early functional experimentation of the bovine fibroblast cells. The authors would like to further acknowledge the Australian Genome Research Facility for conducting SNP genotyping and sequencing services. The authors acknowledge the assistance of resources from the National Computational Infrastructure (NCI Australia), an NCRIS enabled capability supported by the Australian Government for this study. The authors acknowledge the technical assistance provided by the Sydney Informatics Hub, a Core Research Facility of the University of Sydney and the use of the University of Sydney’s high performance computing cluster, Artemis. The authors would like to thank Dr Julie Dennis and Dr Peter Kirkland for providing feedback on the paper during the draft stages.

Author Contributions

Conceptualization: Shernae A. Woolley, Brendon A. O’Rourke, Imke Tammen.
Data curation: Shernae A. Woolley.
Formal analysis: Shernae A. Woolley, Emily R. Tsimnadis, Frances M. Platt, Brendon A. O’Rourke, Imke Tammen.
Funding acquisition: Imke Tammen.
Investigation: Shernae A. Woolley, Emily R. Tsimnadis, Cor Lenghaus, Peter J. Healy, Keith Walker, Andrew Morton, Annette Elliott, Ecem Kaya, Clarisse Hoerner, David A. Priestman, Dawn Shepherd.
Methodology: Shernae A. Woolley, Ben T. Porebski, Cali E. Willet.
Project administration: Imke Tammen.
Resources: Andrew Morton, Frances M. Platt, Ben T. Porebski, Cali E. Willet, Brendon A. O’Rourke, Imke Tammen.
Software: Mehar S. Khatkar, Ben T. Porebski, Cali E. Willet.
Supervision: Brendon A. O’Rourke, Imke Tammen.
Visualization: Shernae A. Woolley, Ben T. Porebski.
Writing – original draft: Shernae A. Woolley.
Writing – review & editing: Shernae A. Woolley, Cor Lenghaus, Mehar S. Khatkar, Annette Elliott, David A. Priestman, Frances M. Platt, Ben T. Porebski, Cali E. Willet, Brendon A. O’Rourke, Imke Tammen.

References

1. Aerts JM, Kallemeijn WW, Wegdam W, Joao Ferraz M, van Breemen MJ, Dekker N, et al. Biomarkers in the diagnosis of lysosomal storage disorders: proteins, lipids, and inhibodies. J Inherit Metab Dis. 2011; 34(3):605–19. https://doi.org/10.1007/s10545-011-9308-6 PMID: 21445610
2. Wang N, Zhang Y, Gedvilaite E, Loh JW, Lin T, Liu X, et al. Using whole-exome sequencing to investigate the genetic bases of lysosomal storage diseases of unknown etiology. Hum Mutat. 2017; 38(11):1491–9. https://doi.org/10.1002/humu.23291 PMID: 28703315
3. Patterson MC, Platt F. Therapy of Niemann–Pick disease, type C. Biochim Biophys Acta-Mol Cell Biol L. 2004; 1685(1–3):77–82. https://doi.org/10.1016/j.bbalip.2004.08.013
4. Vanier MT, Millat G. Niemann–Pick disease type C. Clin Genet. 2003; 64:269–81. https://doi.org/10.1034/j.1399-0004.2003.00147.x PMID: 12974729
5. Crocker AC, Farber S. Niemann-Pick disease: A review of eighteen patients. Med. 1958; 37(1):1–95. https://doi.org/10.1097/00005792-195802000-00001
6. Vanier MT. Complex lipid trafficking in Niemann-Pick disease type C. J Inherit Metab Dis. 2015; 38:187–99. https://doi.org/10.1007/s10545-014-9794-4 PMID: 25425283
7. Crocker AC. The cerebral defect in Tay-Sachs disease and Niemann-Pick disease. J Neurochem. 1961; 7(1):69–80. https://doi.org/10.1111/j.1471-4159.1961.tb13499.x
8. Vanier MT, Rodriguez-Lafrasse C, Rousson R, Duthel S, Harzer K, Pentchev PG, et al. Type C Niemann-Pick disease: Biochemical aspects and phenotypic heterogeneity. Dev Neurosci. 1991; 13(4–5):307–14. https://doi.org/10.1159/000112178 PMID: 1817036
9. Yu XH, Jiang N, Yao PB, Zheng XL, Cayabyab FS, Tang CK. NPC1, intracellular cholesterol trafficking and atherosclerosis. Clin Chim Acta. 2014; 429:69–75. https://doi.org/10.1016/j.cca.2013.11.026 PMID: 24296264
10. McGovern MM, Lippa N, Bagiella E, Schuchman EH, Desnick RJ, Wasserstein MP. Morbidity and mortality in type B Niemann-Pick disease. Genet Med. 2013; 15(8):618–23. https://doi.org/10.1038/gim.2013.4 PMID: 23412609
11. Greer WL, Dobson MJ, Girouard GS, Byers DM, Riddell DC, Neumann PE. Mutations in NPC1 highlight a conserved NPC1-specific cysteine-rich domain. Am J Hum Genet. 1999; 65(5):1252–60. https://doi.org/10.1086/302620 PMID: 10521290
12. Schuchman EH, Desnick RJ. Types A and B Niemann-Pick disease. Mol Genet Metab. 2017; 120(1–2):27–33. https://doi.org/10.1016/j.ymgme.2016.12.008 PMID: 28164782
13. Vanier MT, Rodriguez-Lafrasse C, Rousson R, Gazzah N, Juge M-C, Pentchev PG, et al. Type C Niemann-Pick disease: Spectrum of phenotypic variation in disruption of intracellular LDL-derived cholesterol processing. Biochim et Biophys Acta. 1991; 1096(4):328–37. https://doi.org/10.1016/0925-4439(91)90069-L
14. Vanier MT. Biochemical studies in Niemann-Pick disease I. Major sphingolipids of liver and spleen. Biochim et Biophys Acta-Lipid Lipid Met. 1983; 750(1):178–84. https://doi.org/10.1016/0005-2760(83)90218-7
15. Zitman D, Chazan S, Klibansky C. Sphingomyelinase activity levels in human peripheral blood leukocytes, using [3H]sphingomyelin as substrate: study of heterozygotes and homozygotes for Niemann-Pick disease variants. Clin Chim Acta. 1978; 86(1):37–43. https://doi.org/10.1016/0009-8981(78)90455-2 PMID: 26487
16. Lloyd-Evans E, Platt FM. Lysosomal Ca2+ homeostasis: Role in pathogenesis of lysosomal storage diseases. Cell Calcium. 2011; 50(2):200–5. https://doi.org/10.1016/j.ceca.2011.03.010 PMID: 21724254
17. Deffieu MS, Pfeffer SR. Niemann–Pick type C 1 function requires lumenal domain residues that mediate cholesterol-dependent NPC2 binding. PNAS. 2011; 108(47):18932–6. https://doi.org/10.1073/pnas.1110439108 PMID: 22065762
18. Vanier MT, Wenger DA, Comly ME, Rousson R, Brady RO, Pentchev PG. Niemann-Pick disease group C: Clinical variability and diagnosis based on defective cholesterol esterification: A collaborative study on 70 patients. Clin Genet. 1988; 33(5):331–48. https://doi.org/10.1111/j.1399-0004.1988.tb03460.x PMID: 3378364


19. Fink JK, Filling-Katz MR, Sokol J, Cogan DG, Pikus A, Sonies B, et al. Clinical spectrum of Niemann-Pick disease type C. Neurology. 1989; 39(8):1040–9. https://doi.org/10.1212/wnl.39.8.1040 PMID: 2761697
20. Saunders GK, Wenger DA. Sphingomyelinase Deficiency (Niemann-Pick disease) in a Hereford Calf. Vet Pathol. 2008; 45:201–2. https://doi.org/10.1354/vp.45-2-201 PMID: 18424834
21. Kuwamura M, Awakura T, Shimada A, Umemura T, Kagota K, Kawamura N, et al. Type C Niemann-Pick disease in a boxer dog. Acta Neuropathol. 1993; 85:345–8. https://doi.org/10.1007/BF00227733 PMID: 8460536
22. Somers KL, Royals MA, Carstea ED, Rafi MA, Wenger DA, Thrall MA. Mutation analysis of feline Niemann-Pick C1 disease. Mol Genet Metab. 2003; 79(2):99–103. https://doi.org/10.1016/s1096-7192(03)00074-x PMID: 12809639
23. Somers KL, Wenger DA, Royals MA, Carstea ED, Connally HE, Kelly T, et al. Complementation Studies in Human and Feline Niemann-Pick Type C Disease. Mol Genet Metab. 1999; 66:117–21. https://doi.org/10.1006/mgme.1998.2778 PMID: 10068514
24. Bundza A, Lowden JA, Charlton KM. Niemann-Pick Disease in a Poodle Dog. Vet Pathol. 1979; 16(5):530–8. https://doi.org/10.1177/030098587901600504 PMID: 573013
25. Zampieri S, Bianchi E, Cantile C, Saleri R, Bembi B, Dardis A. Characterization of a Spontaneous Novel Mutation in the NPC2 Gene in a Cat Affected by Niemann Pick Type C Disease. PLoS One. 2014; 9(11):e112503. https://doi.org/10.1371/journal.pone.0112503 PMID: 25396745
26. Chrisp CE, Ringler DH, Abrams GD, Radin NS, Brenkert A. Lipid storage disease in a Siamese cat. J Am Vet Med Assoc. 1970; 156(5):616–22. PMID: 5461697
27. Wenger DA, Sattler M, Kudoh T, Snyder SP, Kingston RS. Niemann-Pick Disease: A Genetic Model in Siamese Cats. Science. 1980; 208(4451):1471–3. https://doi.org/10.1126/science.7189903 PMID: 7189903
28. Vapniarsky N, Wenger DA, Scheenstra D, Mete A. Sphingomyelin Lipidosis (Niemann–Pick Disease) in a Juvenile Raccoon (Procyon lotor). J Comp Pathol. 2013; 149(2):385–9. https://doi.org/10.1016/j.jcpa.2013.01.011
29. Baker HJ, Wood PA, Wenger DA, Walkley SU, Inui K, Kudoh T, et al. Sphingomyelin lipidosis in a cat. Vet Pathol. 1987; 24(5):386–91. https://doi.org/10.1177/030098588702400504 PMID: 3672804
30. Niemann A. Ein unbekanntes krankheitsbild. Jahrb Kinderheilkd. 1914; 79(1):1–10.
31. Pick L. Über die lipoidzellige splenohepatomegalie typus Niemann-Pick als stoffwechselerkrankung. Med Klin. 1927; 23:1483–6.
32. Pflander U. La maladie de Niemann-Pick dans le cadre des lipoidoses. Schweiz Med Wochenschr. 1946:76.
33. Dusendschon A. Deux cas familiaux de maladie de Niemann-Pick chez l'adulte: Impr. Réunies; 1946.
34. Vance JE. Lipid imbalance in the neurological disorder, Niemann-Pick C disease. FEBS Lett. 2006; 580(23):5518–24. https://doi.org/10.1016/j.febslet.2006.06.008 PMID: 16797010
35. Higashi Y, Murayama S, Pentchev PG, Suzuki K. Cerebellar degeneration in the Niemann-Pick type C mouse. Acta Neuropathol. 1993; 85(2):175–84. https://doi.org/10.1007/BF00227765 PMID: 8382896
36. Tollersrud OK, Berg T, Healy P, Evjen G, Ramachandran U, Nilssen O. Purification of bovine lysosomal alpha-mannosidase, characterization of its gene and determination of two mutations that cause alpha-mannosidosis. Eur J Biochem. 1997; 246(2):410–9. https://doi.org/10.1111/j.1432-1033.1997.00410.x PMID: 9208932
37. Berg T, Healy PJ, Tollersrud OK, Nilssen O. Molecular heterogeneity for bovine alpha-mannosidosis: PCR based assays for detection of breed-specific mutations. Res Vet Sci. 1997; 63(3):279–82. https://doi.org/10.1016/s0034-5288(97)90034-5 PMID: 9491457
38. Healy PJ, Dennis JA, Moule JF. Use of hair root as a source of DNA for the detection of heterozygotes for recessive defects in cattle. Aust Vet J. 1995; 72(10):392. https://doi.org/10.1111/j.1751-0813.1995.tb06178.x PMID: 8599573
39. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, et al. PLINK: A Tool Set for Whole-Genome Association and Population-Based Linkage Analyses. Am J Hum Genet. 2007; 81(3):559–75. https://doi.org/10.1086/519795 PMID: 17701901; PubMed Central PMCID: PMC1950838
40. Team RC. R: A Language and Environment for Statistical Computing 2014. Available from: https://www.R-project.org.
41. Ye J, Coulouris G, Zaretskaya I, Cutcutache I, Rozen S, Madden T. Primer-BLAST: A tool to design target-specific primers for polymerase chain reaction. BMC Bioinformatics. 2012; 13:134–45. https://doi.org/10.1186/1471-2105-13-134 PMID: 22708584


42. Cunningham F, Amode MR, Barrell D, Beal K, Billis K, Brent S, et al. Ensembl 2015. Nucleic Acids Res. 2015; 43(Database issue):D662–D9. https://doi.org/10.1093/nar/gku1010 PMID: 25352552; PubMed Central PMCID: PMC4383879
43. Kumar P, Henikoff S, Ng PC. Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm. Nat Protoc. 2009; 4(7):1073–81. https://doi.org/10.1038/nprot.2009.86 PMID: 19561590
44. Di Tommaso P, Moretti S, Xenarios I, Orobitg M, Montanyola A, Chang J-M, et al. T-Coffee: A web server for the multiple sequence alignment of protein and RNA sequences using structural information and homology extension. Nucleic Acids Res. 2011; 39(Web Server issue):W13–W7. https://doi.org/10.1093/nar/gkr245 PMID: 21558174
45. Schindelin J, Arganda-Carreras I, Frise E, Kaynig V, Longair M, Pietzsch T, et al. Fiji: an open-source platform for biological-image analysis. Nat Methods. 2012; 9(7):676–82. https://doi.org/10.1038/nmeth.2019 PMID: 22743772
46. Neville DCA, Coquard V, Priestman DA, Te Vruchte DJM, Sillence DJ, Dwek RA, et al. Analysis of fluorescently labeled glycosphingolipid-derived oligosaccharides following ceramide glycanase digestion and anthranilic acid labeling. Anal Biochem. 2004; 331(2):275–82. https://doi.org/10.1016/j.ab.2004.03.051 PMID: 15265733
47. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, et al. The Protein Data Bank. Nucleic Acids Res. 2000; 28(1):235–42. https://doi.org/10.1093/nar/28.1.235 PMID: 10592235
48. Long T, Qi X, Hassan A, Liang Q, De Brabander JK, Li X. Structural basis for itraconazole-mediated NPC1 inhibition. Nat Commun. 2020; 11(1):152. https://doi.org/10.1038/s41467-019-13917-5 PMID: 31919352
49. Eswar N, Webb B, Marti-Renom MA, Madhusudhan MS, Eramian D, Shen MY, et al. Comparative protein structure modeling using MODELLER. Curr Protoc Protein Sci. 2007; Chapter 2: Unit 2.9. Epub 2008/04/23. https://doi.org/10.1385/1-59259-890-0:831 PMID: 18429317
50. Jo S, Kim T, Im W. Automated Builder and Database of Protein/Membrane Complexes for Molecular Dynamics Simulations. PLoS One. 2007; 2(9):e880. https://doi.org/10.1371/journal.pone.0000880 PMID: 17849009
51. Jo S, Kim T, Iyer VG, Im W. CHARMM-GUI: A web-based graphical user interface for CHARMM. J Comput Chem. 2008; 29(11):1859–65. https://doi.org/10.1002/jcc.20945 PMID: 18351591
52. Lomize MA, Pogozheva ID, Joo H, Mosberg HI, Lomize AL. OPM database and PPM web server: resources for positioning of proteins in membranes. Nucleic Acids Res. 2012; 40(Database issue):D370–D6. Epub 2011/09/02. https://doi.org/10.1093/nar/gkr703 PMID: 21890895
53. Jorgensen WL, Chandrasekhar J, Madura JD, Impey RW, Klein ML. Comparison of simple potential functions for simulating liquid water. J Chem Phys. 1983; 79(2):926–35. https://doi.org/10.1063/1.445869
54. Maier JA, Martinez C, Kasavajhala K, Wickstrom L, Hauser KE, Simmerling C. ff14SB: Improving the Accuracy of Protein Side Chain and Backbone Parameters from ff99SB. J Chem Theory Comput. 2015; 11(8):3696–713. https://doi.org/10.1021/acs.jctc.5b00255 PMID: 26574453
55. Dickson CJ, Madej BD, Skjevik ÅA, Betz RM, Teigen K, Gould IR, et al. Lipid14: The Amber Lipid Force Field. J Chem Theory Comput. 2014; 10(2):865–79. https://doi.org/10.1021/ct4010307 PMID: 24803855
56. Lippert RA, Bowers KJ, Dror RO, Eastwood MP, Gregersen BA, Klepeis JL, et al. A common, avoidable source of error in molecular dynamics integrators. J Chem Phys. 2007; 126(4):046101. Epub 2007/02/09. https://doi.org/10.1063/1.2431176 PMID: 17286520
57. Case DA, Belfon K, Ben-Shalom IY, Brozell SR, Cerutti DS, Cheatham ITE, et al. AMBER 2020. University of California, San Francisco. 2020.
58. Salomon-Ferrer R, Case DA, Walker RC. An overview of the Amber biomolecular simulation package. WIRES Comput Mol Sci. 2013; 3(2):198–210. https://doi.org/10.1002/wcms.1121
59. Jolly RD. Diagnosis and control of pseudolipidosis of angus calves. N Z Vet J. 1970; 18(10):228–9. https://doi.org/10.1080/00480169.1970.33909 PMID: 5285875
60. Adzhubei IA, Schmidt S, Peshkin L, Ramensky VE, Gerasimova A, Bork P, et al. A method and server for predicting damaging missense mutations. Nat Methods. 2010; 7(4):248–9. Epub 2010/04/01. https://doi.org/10.1038/nmeth0410-248 PMID: 20354512; PubMed Central PMCID: PMC2855889
61. Whittem JH, Walker D. "Neuronopathy" and "pseudolipidosis" in Aberdeen-Angus calves. J Pathol Bacteriol. 1957; 74:281–8.
62. Dorling PR, Huxtable CR, Vogel P. Lysosomal storage in Swainsona spp. toxicosis: an induced mannosidosis. Neuropath Appl Neuro. 1978; 4(4):285–95. https://doi.org/10.1111/j.1365-2990.1978.tb00547.x PMID: 703929


63. Huxtable CR, Dorling PR. Poisoning of livestock by Swainsona spp.: current status. Aust Vet J. 1982; 59(2):50–3. https://doi.org/10.1111/j.1751-0813.1982.tb02716.x PMID: 6816206
64. Molyneux RJ, McKenzie RA, O'Sullivan BM, Elbein AD. Identification of the Glycosidase Inhibitors Swainsonine and Calystegine B2 in Weir Vine (Ipomoea sp. Q6 {aff. calobra}) and Correlation with Toxicity. J Nat Prod. 1995; 58(6):878–86. https://doi.org/10.1021/np50120a009 PMID: 7673932
65. Kessell AE, Finnie JW, Windsor PA. Neurological diseases of ruminant livestock in Australia. IV: viral infections. Aust Vet J. 2011; 89(9):331–7. https://doi.org/10.1111/j.1751-0813.2011.00817.x PMID: 21864304
66. Hartley WJ, Wanner RA, Della-Porta AJ, Snowdon WA. Serological evidence for the association of Akabane virus with epizootic bovine congenital arthrogryposis and hydranencephaly syndromes in New South Wales. Aust Vet J. 1975; 51(2):103–4. https://doi.org/10.1111/j.1751-0813.1975.tb09422.x
67. Vanier MT, Latour P. Laboratory diagnosis of Niemann-Pick disease type C: the filipin staining test. Methods Cell Biol. 2015; 126:357–75. https://doi.org/10.1016/bs.mcb.2014.10.028 PMID: 25665455
68. Vanier MT, Rodriguez-Lafrasse C, Rousson R, Gazzah N, Juge MC, Pentchev PG, et al. Type C Niemann-Pick disease: Spectrum of phenotypic variation in disruption of intracellular LDL-derived cholesterol processing. Biochimica et Biophysica Acta. 1991; 1096(4):328–37. https://doi.org/10.1016/0925-4439(91)90069-l PMID: 2065104
69. Millat G, Marçais C, Tomasetto C, Chikh K, Fensom AH, Harzer K, et al. Niemann-Pick C1 Disease: Correlations between NPC1 Mutations, Levels of NPC1 Protein, and Phenotypes Emphasize the Functional Significance of the Putative Sterol-Sensing Domain and of the Cysteine-Rich Luminal Loop. Am J Hum Genet. 2001; 68(6):1373–85. https://doi.org/10.1086/320606 PMID: 11333381
70. te Vruchte D, Speak AO, Wallom KL, Al Eisa N, Smith DA, Hendriksz CJ, et al. Relative acidic compartment volume as a lysosomal storage disorder–associated biomarker. J Clin Invest. 2014; 124(3):1320–8. https://doi.org/10.1172/JCI72835 PMID: 24487591
71. Kuziemko GM, Stroh M, Stevens RC. Cholera Toxin Binding Affinity and Specificity for Gangliosides Determined by Surface Plasmon Resonance. Biochemistry. 1996; 35(20):6375–84. https://doi.org/10.1021/bi952314i PMID: 8639583
72. Scott CE, Ioannou YA. The NPC1 protein: structure implies function. Biochim Biophys Acta-Mol Cell Biol L. 2004; 1685(1–3):8–13. https://doi.org/10.1016/j.bbalip.2004.08.006
73. Li X, Lu F, Trinh MN, Schmiege P, Seemann J, Wang J, et al. 3.3 Å structure of Niemann-Pick C1 protein reveals insights into the function of the C-terminal luminal domain in cholesterol transport. PNAS. 2017; 114(34):9116–21. https://doi.org/10.1073/pnas.1711716114 PMID: 28784760
74. Pentchev PG, Gal AE, Booth AD, Omodeo-Sale F, Fours J, Neumeyer BA, et al. A lysosomal storage disorder in mice characterized by a dual deficiency of sphingomyelinase and glucocerebrosidase. Biochim Biophys Acta-Lipids Lipid Metab. 1980; 619(3):669–79.
75. Miyawaki S, Mitsuoka S, Sakiyama T, Kitagawa T. Sphingomyelinosis, a new mutation in the mouse: A model of Niemann-Pick disease in humans. J Hered. 1982; 73(4):257–63. https://doi.org/10.1093/oxfordjournals.jhered.a109635 PMID: 7202025
76. Miyawaki S, Yoshida H, Mitsuoka S, Enomoto H, Ikehara S. A mouse model for Niemann-Pick disease: Influence of genetic background on disease expression in spm/spm mice. J Hered. 1986; 77(6):379–84. https://doi.org/10.1093/oxfordjournals.jhered.a110265 PMID: 3559164
77. Maue RA, Burgess RW, Wang B, Wooley CM, Seburn KL, Vanier MT, et al. A novel mouse model of Niemann–Pick type C disease carrying a D1005G-Npc1 mutation comparable to commonly observed human mutations. Hum Mol Genet. 2012; 21(4):730–50. https://doi.org/10.1093/hmg/ddr505 PMID: 22048958
78. Dang Y, Li M, Yang M, Cao X, Lan X, Lei C, et al. Identification of bovine NPC1 gene cSNPs and their effects on body size traits of Qinchuan cattle. Gene. 2014; 540:153–60. https://doi.org/10.1016/j.gene.2014.03.001 PMID: 24607034
79. Brown DE, Thrall MA, Walkley SU, Wenger DA, Mitchell TW, Smith MO, et al. Feline Niemann-Pick Disease Type C. Am J Pathol. 1994; 144(6):1412–5. PMID: 8203477
80. Stein VM, Crooks A, Ding W, Prociuk M, O'Donnell P, Bryan C, et al. Miglustat Improves Purkinje Cell Survival and Alters Microglial Phenotype in Feline Niemann-Pick Disease Type C. J Neuropathol Exp Neurol. 2012; 71(5):434–48. https://doi.org/10.1097/NEN.0b013e31825414a6 PMID: 22487861
81. Vite CH, Bagel JH, Swain GP, Prociuk M, Sikora TU, Stein VM, et al. Intracisternal cyclodextrin prevents cerebellar dysfunction and Purkinje cell death in feline Niemann-Pick type C1 disease. Neurodegener Dis. 2015; 7(276):1–15.
82. D'Arcangelo G, Grossi D, Racaniello M, Cardinale A, Zaratti A, Rufini S, et al. Miglustat Reverts the Impairment of Synaptic Plasticity in a Mouse Model of NPC Disease. Neural Plast. 2016; 2016(Article ID: 3830424). https://doi.org/10.1155/2016/3830424


83. Gurda BL, Vite CH. Large animal models contribute to the development of therapies for central and peripheral nervous system dysfunction in patients with lysosomal storage diseases. Hum Mol Genet. 2019; 28(R1):R119–R31. Epub 2019/08/07. https://doi.org/10.1093/hmg/ddz127 PMID: 31384936
84. Cibelli J, Emborg ME, Prockop DJ, Roberts M, Schatten G, Rao M, et al. Strategies for improving animal models for regenerative medicine. Cell Stem Cell. 2013; 12(3):271–4. https://doi.org/10.1016/j.stem.2013.01.004 PMID: 23472868
85. Hsu PD, Lander ES, Zhang F. Development and applications of CRISPR-Cas9 for genome engineering. Cell. 2014; 157(6):1262–78. https://doi.org/10.1016/j.cell.2014.05.010 PMID: 24906146
86. Man WYN, Nicholas FW, James JW. A pedigree-analysis approach to the descriptive epidemiology of autosomal-recessive disorders. Prev Vet Med. 2007; 78:262–73. https://doi.org/10.1016/j.prevetmed.2006.10.010 PMID: 17126430


4.3 Appendix: Supplementary material for Chapter 4

S1 Fig. Two genotyping assays were developed to discriminate the NM_174758.2:c.2969C>G variant for homozygous wildtype, heterozygous and homozygous mutant individuals. (a) Allelic discrimination plot visualised using QuantStudio™ Real-Time PCR System version 1.3 (Applied Biosystems™) for a TaqMan genotyping assay for homozygous wildtype (red dots), heterozygous (green dots), homozygous mutant (blue dots) individuals and no DNA template control (black square). (b) PCR-RFLP size discrimination visualised on a Bioanalyzer Instrument (Agilent Technologies) for (1) homozygous wildtype (230 bp, 101 bp and 44 bp), (2) heterozygous (230 bp, 186 bp, 101 bp and 44 bp), (3) homozygous mutant (186 bp, 101 bp and 44 bp) individuals and (4) no DNA template control. (TIF)
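The PCR-RFLP banding patterns in the S1 Fig caption can be turned into a simple genotype call. The sketch below is illustrative only: the 230 bp and 186 bp bands are treated as the diagnostic fragments because the 101 bp and 44 bp bands appear in every genotype, and the `tolerance` parameter is a hypothetical sizing allowance, not a value from the study. The sizes are consistent with the mutant allele gaining a cut site within the 230 bp fragment (186 + 44 = 230), although the caption does not state this explicitly.

```python
# Diagnostic band sizes (bp) taken from the S1 Fig caption.
WILDTYPE_BAND = 230
MUTANT_BAND = 186


def call_genotype(band_sizes, tolerance=5):
    """Call a genotype from observed fragment sizes (bp).

    `tolerance` (bp) is a hypothetical allowance for fragment sizing error.
    """
    def has_band(expected):
        return any(abs(size - expected) <= tolerance for size in band_sizes)

    if not band_sizes:
        return "no product"
    has_wildtype = has_band(WILDTYPE_BAND)
    has_mutant = has_band(MUTANT_BAND)
    if has_wildtype and has_mutant:
        return "heterozygous"
    if has_wildtype:
        return "homozygous wildtype"
    if has_mutant:
        return "homozygous mutant"
    return "uninterpretable"


for lane in ([230, 101, 44], [230, 186, 101, 44], [186, 101, 44], []):
    print(lane, "->", call_genotype(lane))
```

A lane with no bands corresponds to the no DNA template control, while a lane showing only the shared 101 bp and 44 bp bands is flagged as uninterpretable rather than guessed.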


S1 Video. Video recording of an affected calf showing clinical signs including hind limb weakness, dysmetria, incoordination, walking sideways, falling over and recumbency. The video was recorded in 2005 by the producer. https://doi.org/10.1371/journal.pone.0238697.s002 (MP4).

S2 Video. Simulation trajectories of wildtype (left) and p.P990R mutant (right) models. Both models were subjected to 500 ns of MD simulation, performed in triplicate (replicates 1, 2 and 3). The movie shows the NPC1 models displayed in the cartoon representation and embedded in a POPC membrane. This movie highlights the C-terminal domain (CTD) (purple), N-terminal domain (NTD) (tan) and site of mutagenesis (pink Van der Waals spheres). The p.P990R simulations show an increased amount of motion and distortion of the CTD and NTD internally, but also of their positions relative to each other. The full length of this movie encompasses 500 ns of MD simulation, with water molecules and ions hidden for clarity. https://doi.org/10.1371/journal.pone.0238697.s003 (MP4).


Chapter 5 | Brachygnathia, cardiomegaly and renal hypoplasia syndrome in Merino sheep

5.1 Synopsis

In this chapter, published research in section 5.2 is presented for brachygnathia, cardiomegaly and renal hypoplasia syndrome in Merino sheep. This research continues from work published by Shariflou et al. 2011 and Shariflou et al. 2012 where the disease was first described and later mapped, respectively. The work presented in section 5.2 details the identification of a causal frameshift variant through the use of a whole genome sequencing approach, followed by protein impact analysis and then validation using a diagnostic test in 583 animals from the original flock.

The supplementary materials associated with section 5.2 are included in section 5.3.

The publication of the work in section 5.2 was included in this thesis under the Creative Commons Attribution 4.0 International License.

5.2 Molecular basis of a new ovine model for human 3M syndrome-2

Woolley et al. BMC Genetics (2020) 21:106 https://doi.org/10.1186/s12863-020-00913-8

RESEARCH ARTICLE Open Access

Molecular basis of a new ovine model for human 3M syndrome-2

S. A. Woolley1, S. E. Hayes1, M. R. Shariflou1, F. W. Nicholas1, C. E. Willet2, B. A. O’Rourke3 and I. Tammen1*

Abstract

Background: Brachygnathia, cardiomegaly and renal hypoplasia syndrome (BCRHS, OMIA 001595–9940) is a previously reported recessively inherited disorder in Australian Poll Merino/Merino sheep. Affected lambs are stillborn with various congenital defects as reflected in the name of the disease, as well as short stature, a short and broad cranium, a small thoracic cavity, thin ribs and brachysternum. The BCRHS phenotype shows similarity to certain human short stature syndromes, in particular the human 3M syndrome-2. Here we report the identification of a likely disease-causing variant and propose an ovine model for human 3M syndrome-2.

Results: Eight positional candidate genes were identified among the 39 genes in the approximately 1 Mb interval to which the disease was mapped previously. Obscurin like cytoskeletal adaptor 1 (OBSL1) was selected as a strong positional candidate gene based on gene function and the resulting phenotypes observed in humans with mutations in this gene. Whole genome sequencing of an affected lamb (BCRHS3) identified a likely causal variant ENSOARG00000020239:g.220472248delC within OBSL1. Sanger sequencing of seven affected, six obligate carrier, two phenotypically unaffected animals from the original flock and one unrelated control animal validated the variant. A genotyping assay was developed to genotype 583 animals from the original flock, giving an estimated allele frequency of 5%.

Conclusions: The identification of a likely disease-causing variant resulting in a frameshift (p.(Val573Trpfs*119)) in the OBSL1 protein has enabled improved breeding management of the implicated flock. The opportunity for an ovine model for human 3M syndrome and ensuing therapeutic research is promising given the availability of carrier ram semen for BCRHS.

Keywords: Whole genome sequencing, Sheep, Inherited disease, Lethal, Frameshift variant, Animal model, 3M syndrome-2
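To put the estimated allele frequency of 5% reported in the abstract into context, the sketch below shows how an allele frequency is derived from genotype counts and what genotype proportions it would imply under Hardy-Weinberg equilibrium. The genotype counts in the example are hypothetical round numbers, not data from the study, and Hardy-Weinberg proportions are only an assumption here, since a lethal recessive allele in a managed flock need not satisfy them.

```python
def allele_frequency(n_het, n_hom_mut, n_total):
    """Mutant allele frequency from genotype counts in diploid animals."""
    return (n_het + 2 * n_hom_mut) / (2 * n_total)


def hardy_weinberg(q):
    """Expected genotype proportions for mutant allele frequency q."""
    p = 1.0 - q
    return {
        "homozygous wildtype": p * p,
        "carrier": 2 * p * q,
        "homozygous mutant": q * q,
    }


# Hypothetical example: 10 carriers among 100 genotyped animals, no
# homozygous mutants (affected lambs are stillborn, so none are sampled).
q_example = allele_frequency(n_het=10, n_hom_mut=0, n_total=100)

# Expected proportions at the reported allele frequency of 5%.
expected = hardy_weinberg(0.05)
for genotype, proportion in expected.items():
    print(f"{genotype}: {proportion:.4f}")
```

Under these assumptions an allele frequency of 5% corresponds to roughly 9.5% carriers and 0.25% affected offspring under random mating, which illustrates why genotype-informed mating decisions matter even at a modest allele frequency.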

Background

Brachygnathia, cardiomegaly and renal hypoplasia syndrome (BCRHS, OMIA 001595–9940) is a previously reported lethal inherited disorder in Australian Poll Merino/Merino sheep [1, 2] that, to the best of our knowledge, has not been reported in other sheep breeds in Australia. This disorder is characterized by a range of congenital defects and conforms with an autosomal recessive mode of inheritance based on previous pedigree information and segregation analyses [1]. Affected lambs are stillborn and the primary defects associated with this disorder are brachygnathia, cardiomegaly and renal hypoplasia, with additional skeletal defects including short stature, a short and broad cranium, a small thoracic cavity reduced in size by approximately 25%, thin ribs and brachysternum (Fig. 1). Affected lambs also present with congestive hepatopathy and small kidneys, which are reduced in size by approximately 50%, with male affected lambs having bilateral cryptorchidism [1].

* Correspondence: [email protected]
1 Faculty of Science, Sydney School of Veterinary Science, The University of Sydney, Camden, NSW 2570, Australia
Full list of author information is available at the end of the article

© The Author(s). 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

133 Woolley et al. BMC Genetics (2020) 21:106 Page 2 of 11

Fig. 1 Comparison between an unrelated control stillborn Merino lamb (a) and a BCRHS-affected stillborn lamb (b). The image of the affected lamb highlights some of the characteristic clinical signs: short stature, shortened broad cranium, brachygnathia and small thoracic cavity (ruler length in both images = 30 cm)

Overall, these findings suggest a syndromic growth disorder in affected lambs.

Growth disorders leading to short stature in humans can be broadly grouped into two main categories: disproportionate short stature, where height and some body proportions are reduced in length; and proportionate short stature, where overall height is reduced but all body proportions remain within normal limits [3]. Short stature or dwarfism in livestock can arise from selective breeding for small body phenotypes [4]. Short stature phenotypes can also arise in livestock as an undesired trait, with underlying complex or Mendelian inheritance [5].

Multiple cases of inherited short stature in non-human species have been listed in the Mouse Genome Informatics (MGI) and Online Mendelian Inheritance in Animals (OMIA) databases. Inherited forms are common in mice [6] and many genes have been implicated in cattle, sheep, goats, horses, pigs, rabbits, dogs, cats, chickens, Japanese quail and Sumatran tigers [7] (Additional file 1: Table S1). The BCRHS condition with its multitude of defects, in addition to dwarfism, represents an opportunity to further elucidate human disorders with similar phenotypes. A number of inherited short stature syndromes in humans have been reported, including Seckel, Mulibrey Nanism, Bloom, Meier–Gorlin, microcephalic osteodysplastic primordial dwarfism types I and II, Silver-Russell and 3M syndromes [8, 9]. Overlapping clinical signs are observed between BCRHS and a number of these syndromes, with particular similarity observed with the phenotypes of Silver-Russell and 3M syndromes [9–11]. Human 3M syndrome has been sub-categorized into three types: 3M1 (OMIM 273750) with causal mutations located in the Cullin 7 (CUL7) gene, 3M2 (OMIM 612921) with causal mutations located in the Obscurin like cytoskeletal adaptor 1 (OBSL1) gene, and 3M3 (OMIM 614205 [12]) with causal mutations located in the Coiled-Coil Domain Containing 8 (CCD8) gene. The phenotypic similarity between BCRHS and certain human short stature syndromes, in particular 3M syndrome-2, adds further potential for developing a large animal model for the disease.

Shariflou et al. (2012) mapped BCRHS to a 1.1 megabase (Mb) region on ovine chromosome OAR2, flanked by single nucleotide polymorphisms (SNP) s50915 and s40177, using a medium density ovine SNP chip for genotyping, followed by genome-wide association and homozygosity analyses. At that time, 25 genes were predicted to be located in the 1.1 Mb segment [2], as only an early virtual ovine genome assembly was available, based on a comparative mapping approach that mapped sheep DNA segments onto bovine and other mammalian genomes [13]. Since the publication of the Shariflou et al. (2012) study, the Ovis aries Oar_v3.1 genome assembly (GCA_000298735.1) became available. Therefore, the aim of the present study was to identify the gene and causal mutation for BCRHS by analyzing the region flanked by SNPs s50915 and s40177 from whole genome sequence data aligned to the Oar_v3.1 reference genome.

Results

Identification of positional candidate genes

Thirty-nine genes were identified in the region flanked by SNPs s50915 and s40177 identified by Shariflou et al. (2012), which corresponds to position OAR2:g.220083076–221052836 on the Oar_v3.1 genome assembly (GCA_000298735.1). Twenty-five of these genes were identified to code for known proteins, six genes coded for uncharacterized proteins and eight were RNA genes (Additional file 2: Table S2). Eight protein coding genes were identified as functional positional candidates for BCRHS (Additional file 3: Table S3), and were prioritized based on known function and the extent to which causal mutations in these genes produced phenotypes similar to the BCRHS phenotype. The OBSL1


gene was selected as the strongest candidate based on this approach.

Whole genome sequencing

Whole genome sequencing of an affected lamb (BCRHS3) identified 11,671 raw variants in an interval that included the region of interest plus an additional 1 Mb flanking sequence (OAR2:219083025–222052887) on the Oar_v3.1 genome assembly (GCA_000298735.1). After filtering and removal of known Single Nucleotide Polymorphism Database (dbSNP) variants, 103 variants with a predicted ‘low’, ‘moderate’ or ‘high’ impact on protein function that were homozygous alternate in BCRHS3 and not homozygous alternate in the control Merino sheep Y0346 were identified (Additional file 4: Table S4). Twenty-six of the 103 variants were located within three of the top eight prioritized positional candidate genes. Fifteen variants were located in OBSL1, 10 variants in the chondroitin polymerizing factor (CHPF) gene and one variant in the GDP-mannose pyrophosphorylase A (GMPAA) gene (Additional file 4: Table S4). Visual inspection of these 26 variants using SAMtools tview [14] in BCRHS3 and three control genomes (Merino sheep Y0346 and Y0244 and one Persian sheep) revealed that all variants for CHPF and GMPAA were present in these unrelated control animals and were therefore unlikely to be disease-causing. Seven variants within the CHPF gene, located at positions OAR2:220443181, 220443186, 220443189, 220443192, 220443194, 220443200 and 220443202, were in the same region of poor sequencing quality across all controls and the affected BCRHS animal, and were therefore not further considered.

Fourteen of the OBSL1 variants were observed in controls or in areas of very low sequencing coverage, leaving only one strong candidate variant. This variant was a single nucleotide deletion ENSOARG00000020239:g.220472248delC; ENSOART00000022037.1:c.1716delC (Fig. 2; XM_027965226.1:g.236304071delC or XM_027965226.1:c.1716delC on the new Oar_rambouillet_v1 genome assembly (GCA_002742125.1)). This results in a frameshift of the OBSL1 protein after the valine amino acid at residue position 573 (p.(Val573Trpfs*119); Fig. 3) and a prematurely truncated protein. The National Center for Biotechnology Information Open Reading Frame (NCBI ORF) Finder [15] predicted the ovine OBSL1 wildtype sequence start codon to begin at nucleotide 55 and the stop codon to end at nucleotide 5754, with an amino acid length of 1899. The mutant sequence containing the c.1716delC variant results in a frameshift with the last nucleotide of the stop codon at nucleotide 2130, yielding a predicted altered amino acid length of 691 (p.(Val573Trpfs*119)) and a 64% truncation of OBSL1 (Additional file 6: Fig. S1). The c.1716delC variant is predicted to alter the amino acid sequence (Additional file 7) within a conserved fibronectin type 3 domain, resulting in the truncation of this domain as well as the loss of four immunoglobulin (Ig) domains (Additional file 8: Fig. S2).

Validation of c.1716delC

Segregation of the c.1716delC variant with BCRHS was initially investigated in seven affected (including BCRHS3), six obligate carrier and two phenotypically unaffected animals from the same flock, and one control Merino sheep from an unrelated flock. Polymerase chain reaction (PCR) products were amplified for all 16 samples and Sanger sequenced (Table 1). The results supported segregation of the c.1716delC variant with BCRHS (Fig. 4). All seven affected animals were homozygous for the deletion, the six obligate carrier animals were heterozygous, and the two phenotypically unaffected animals and the one unrelated control animal were homozygous wildtype (Fig. 2).

The c.1716delC variant was not listed as a known variant in the Ensembl Genome Browser and was not present in the variant database from 935 sequenced sheep processed by Agriculture Victoria Research staff.

TaqMan PCR genotyping assay

A custom TaqMan PCR genotyping assay (Additional file 9: Fig. S3) was developed to genotype an additional 583 animals from the current cohort of sheep in the original flock, revealing 61 heterozygous animals and 522 homozygous wildtype animals, giving an estimated allele frequency of 5%.

Discussion

Our study identified a novel likely causal variant, c.1716delC, in the ovine OBSL1 gene in a lamb affected with BCRHS by analyzing whole genome sequence within a genomic region previously associated with the disease. This variant was validated in additional animals and the development of a discriminatory genotyping assay has facilitated improved breeding management practices. Moreover, screening of the original flock has revealed a high estimated minor allele frequency of 5%. These results are a continuation of the work reported by Shariflou et al. (2011; 2012). These studies first described the clinical signs and pathology of BCRHS and, after extensive pedigree analyses, identified this disorder as a recessively inherited Mendelian trait that was mapped to a 1.1 Mb region on OAR2.

Inherited growth disorders are relatively common, with numerous genes identified for these disorders in both humans and livestock [3, 16]. Short stature syndromes such as 3M syndrome have been characterized in humans and are part of a group of clinically heterogeneous growth disorders.
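The 5% allele frequency quoted for the flock screen can be re-derived from the reported genotype counts. The following is a minimal sketch, assuming a diploid sample in which each heterozygote carries one copy of the deletion; the function name is illustrative and is not taken from the study.

```python
def alt_allele_frequency(n_het: int, n_hom_alt: int, n_total: int) -> float:
    """Frequency of the alternate allele in a diploid sample of n_total animals."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    alt_alleles = n_het + 2 * n_hom_alt      # one copy per carrier, two per homozygote
    return alt_alleles / (2 * n_total)       # two alleles per animal

# Counts reported for the TaqMan screen: 61 heterozygous, 522 homozygous wildtype.
N_HET, N_HOM_WT = 61, 522
N_TOTAL = N_HET + N_HOM_WT                   # 583 animals genotyped

freq = alt_allele_frequency(N_HET, 0, N_TOTAL)
carrier_rate = N_HET / N_TOTAL
print(f"allele frequency = {freq:.3f}, carrier frequency = {carrier_rate:.3f}")
```

Under these counts roughly one animal in ten is a carrier, which is the figure of practical interest when planning matings.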


Fig. 2 Schematic diagram of the ovine OBSL1 gene showing the location of the candidate causal mutation ENSOARG00000020239:g.220472248delC with Sanger sequencing chromatograms for one wildtype control, one obligate carrier and one affected animal. a Location of the ovine OBSL1 gene, OAR2:220453801–220475937 on the Oar_v3.1 ovine genome assembly. b Enlarged view of the OBSL1 gene with 25 exons. c Genomic region containing the c.1716delC frameshift variant with protein translation frames obtained from Ensembl genome browser 98 (Ensembl, accessed 26th December 2019, <http://asia.ensembl.org/Ovis_aries/Location/View?db=core;g=ENSOARG00000020239;r=2:220472236-220472254;t=ENSOART00000022037>). The position of the variant is identified by a red box and the protein reading frame is identified by a black box. d Sanger sequencing chromatograms for one wildtype, one obligate carrier and one affected animal (reverse strand)

While eight genes were initially considered as positional candidate genes, the identification of 26 of 103 private variants within only three of these genes - OBSL1, CHPF and GMPAA - allowed for further prioritization of candidate genes based on gene function and mutation association with disease. Of all the positional candidate genes within the identified region, OBSL1 was considered the strongest candidate as mutations in this gene cause 3M syndrome-2 in humans (OMIM 612921) [12]. This autosomal recessive growth disorder results in growth, facial and skeletal abnormalities in pre- and postnatal children with similar phenotypes to BCRHS cases [17, 18].

The OBSL1 gene encodes a cytoskeletal adaptor protein that is involved in cell interactions and the cell matrix [19]. Mutations within OBSL1 have been associated with human 3M syndrome-2, a short stature growth disorder [20] that shows similar clinical signs to BCRHS-affected lambs. The clinical signs in humans


Fig. 3 Multiple species partial OBSL1 protein alignment using T-Coffee and BOXSHADE. Accessions used were XP_027821027.1 (Ovis aries), NP_056126.1 (Homo sapiens), XP_003309516.1 (Pan troglodytes), XP_002799127.1 (Macaca mulatta), XP_005640767.1 (Canis lupus familiaris), NP_001068959.2 (Bos taurus), XP_343600.4 (Rattus norvegicus), XP_003641689.2 (Gallus gallus) and NP_001121829.1 (Danio rerio). SheepMT refers to the mutant Ovis aries sequence and SheepWT refers to the wildtype Ovis aries sequence. The predicted frameshift, starting with the replacement of valine with tryptophan at position 573 in affected sheep, is highlighted by an asterisk (*), with the partial new amino acid sequence, which is predicted to terminate at position 691, highlighted in red in the SheepMT sequence
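The protein lengths quoted for wildtype and truncated OBSL1 follow directly from the open reading frame coordinates reported in the Results (start codon at nucleotide 55; last nucleotide of the stop codon at 5754 for wildtype and 2130 for the mutant). The sketch below redoes that arithmetic, assuming 1-based inclusive mRNA coordinates; the helper name is illustrative only.

```python
def orf_protein_length(start_nt: int, stop_end_nt: int) -> int:
    """Amino acid count for an ORF given 1-based inclusive mRNA coordinates.

    start_nt is the first base of the start codon and stop_end_nt is the
    last base of the stop codon; the stop codon itself is not counted.
    """
    span = stop_end_nt - start_nt + 1
    if span % 3 != 0:
        raise ValueError("ORF span is not a whole number of codons")
    return span // 3 - 1

wildtype_aa = orf_protein_length(55, 5754)   # wildtype ORF
mutant_aa = orf_protein_length(55, 2130)     # ORF after the c.1716delC frameshift
fraction_lost = 1 - mutant_aa / wildtype_aa

print(wildtype_aa, mutant_aa, f"{fraction_lost:.0%}")
```

The result agrees with the 1899 and 691 residue lengths and the 64% truncation stated in the text.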

resulting from mutations in the remaining two positional candidate genes, CHPF and GMPAA, did not show the same degree of similarity with the clinical signs of BCRHS-affected lambs [21–23]. The CHPF gene is involved in cell division and cytokinesis, with defects within this gene resulting in defective early embryogenesis through the arrest of cell division [21]. Mutations within the GMPAA gene have been reported to be involved with defects associated with neurological impairment and facial abnormalities [22, 23]. Whole genome sequencing provided further support for the selection of OBSL1 as the prime candidate gene. Of the 26 variants that passed filtering and were located within the top three positional candidate genes, only the c.1716delC OBSL1 variant was predicted to impact protein function and was observed in the homozygous non-reference state in the single affected animal.

The c.1716delC variant results in a frameshift after the valine amino acid residue at position 573 in the OBSL1 protein, where the remaining amino acid sequence is

Table 1 Sanger sequencing genotype results for the ENSOARG00000020239:g.220472248delC variant identified within the OBSL1 gene

Sample ID     Phenotype          Reference   Alternate   Genotype of individual
Control_1     Unaffected         CC          C-          CC/CC
Control_2     Unaffected         CC          C-          CC/CC
Control_3     Unaffected         CC          C-          CC/CC
Carrier_1     Obligate carrier   CC          C-          CC/C-
Carrier_2     Obligate carrier   CC          C-          CC/C-
Carrier_3     Obligate carrier   CC          C-          CC/C-
Carrier_4     Obligate carrier   CC          C-          CC/C-
Carrier_5     Obligate carrier   CC          C-          CC/C-
Carrier_6     Obligate carrier   CC          C-          CC/C-
BCRHS3        Affected           CC          C-          C-/C-
BCRHS11       Affected           CC          C-          C-/C-
Affected_3    Affected           CC          C-          C-/C-
Affected_4    Affected           CC          C-          C-/C-
Affected_5    Affected           CC          C-          C-/C-
Affected_6    Affected           CC          C-          C-/C-
Affected_7    Affected           CC          C-          C-/C-
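The segregation argument behind Table 1 can be expressed as a simple consistency check: under recessive inheritance, affected animals should be homozygous for the deletion and obligate carriers heterozygous, while unaffected animals may be either homozygous wildtype or heterozygous. The sketch below transcribes the table and applies that rule; the data structure and function name are illustrative assumptions, not part of the study's workflow.

```python
# Genotype calls transcribed from Table 1: (sample ID, phenotype, genotype).
TABLE_1 = (
    [(f"Control_{i}", "Unaffected", "CC/CC") for i in range(1, 4)]
    + [(f"Carrier_{i}", "Obligate carrier", "CC/C-") for i in range(1, 7)]
    + [("BCRHS3", "Affected", "C-/C-"), ("BCRHS11", "Affected", "C-/C-")]
    + [(f"Affected_{i}", "Affected", "C-/C-") for i in range(3, 8)]
)

# Genotypes compatible with each phenotype class under a recessive model.
# Unaffected animals could be carriers, so two genotypes are allowed for them.
COMPATIBLE = {
    "Unaffected": {"CC/CC", "CC/C-"},
    "Obligate carrier": {"CC/C-"},
    "Affected": {"C-/C-"},
}

def discordant_samples(records):
    """Return IDs of samples whose genotype contradicts the recessive model."""
    return [sid for sid, phenotype, genotype in records
            if genotype not in COMPATIBLE[phenotype]]

mismatches = discordant_samples(TABLE_1)
print(f"{len(TABLE_1)} samples checked, {len(mismatches)} discordant")
```

An empty list of discordant samples is consistent with segregation but, with 16 animals from one flock, does not by itself prove causality.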


Fig. 4 Partial pedigree showing segregation of the c.1716delC mutation (C = wildtype, − = deletion) with the BCRHS phenotype. The pedigree links animals in this study to three obligate carriers (animals 4, 5 and 6) and two suspected carriers (animals 1 and 8) identified in a detailed pedigree by Shariflou et al. (2011). Females and males are denoted by circles and squares, respectively. Filled symbols represent affected animals and shaded symbols represent obligate carriers. The affected animal whole genome sequenced in this study is indicated by ‘BCRHS3’

completely changed thereafter (Fig. 3, Additional file 6). Four ovine OBSL1 isoforms have been predicted, each with differing amino acid residue lengths of 1899 amino acids in OBSL1 isoform X1 (XP_027821027), 1807 amino acids in isoform X2 (XP_027821028), 1802 amino acids in isoform X3 (XP_027821029) and 1023 amino acids in isoform X4 (XP_027821030). Similarly, in humans, three different OBSL1 isoforms exist with a 1896 amino acid sequence length for the OBSL1 isoform 1 precursor (NP_056126), 1543 amino acids for the isoform 2 precursor (NP_001166902) and 1025 amino acids for the isoform 3 precursor (NP_001166879). The truncation of the ovine OBSL1 isoform X1 from the c.1716delC variant imparts a 64% loss of the protein, resulting in a shortened amino acid length from 1899 to 691 amino acid residues (Fig. 4). The early termination of the protein and the ensuing loss of four conserved Ig domains (Additional file 8: Fig. S2) indicate that protein function may be diminished. The Ig domains located within the OBSL1 protein play important roles in forming interactive binding sites for other proteins and forming complexes with muscle proteins such as titin [24, 25]. The OBSL1 protein has been shown to form a direct complex with CUL7, named the cullin complex, through binding at the CUL7 C-terminus [25, 26]. It is therefore plausible that, due to the p.(Val573Trpfs*119) variant and the ensuing truncation and loss of four Ig domains, the binding ability and interaction of the OBSL1 protein in the cullin complex could be altered through loss of available protein binding sites. Further protein interaction and modelling studies are required to fully understand the binding behavior of the mutated ovine OBSL1 in BCRHS-affected sheep, and its interaction with the CUL7 protein.

The OBSL1 gene was considered as a strong candidate, based on the phenotypic similarity between 3M syndrome-2 in humans and the BCRHS-affected lambs, as well as its biological function as a widespread cytoskeletal adaptor protein important for tissue stabilization in multiple organs [19]. In brief, clinical signs of 3M syndrome-2 include a large head, frontal bossing, short nose and triangular-shaped face with a pointed chin during later years in life, a short neck and thorax, thin ribs, slender long bones and tall vertebral bodies [17, 18]. Endocrine function and growth hormone levels are within normal limits for affected children [17, 27]. The presentation of BCRHS-affected lambs, showcasing multiple congenital defects including brachygnathia, short stature, a short and broad cranium, a small thoracic cavity, thin ribs and brachysternum [1], draws obvious similarities to the clinical signs displayed by children affected by 3M syndrome-2. However, BCRHS-affected lambs also present with congestive hepatopathy and small kidneys [1].

A majority of disease-causing mutations for human 3M syndrome are located within one of three genes, with approximately 70% of cases occurring within the CUL7 gene, 25% in the OBSL1 gene and 5% within the CCD8 gene [28, 29]. It is important to note that there have been no observed phenotypic differences between human


patients with either CUL7 or OBSL1 mutations [20]. Mutations within the human OBSL1 gene for 3M syndrome-2 typically occur within the first six to eight exons [17, 20] and affect all three human isoforms. The c.1716delC variant identified in this study is similarly located within the sixth exon of the ovine OBSL1 gene (Fig. 2). Hanson et al. (2009) used a gene knockout model to show that OBSL1 appears to play a role in regulating CUL7 protein levels in cells and therefore may act in a common pathway. Interaction studies conducted by Hanson et al. (2011) showed that the OBSL1 protein acts as an adaptor protein for CUL7 and CCD8, despite the lack of interaction between the CUL7 and CCD8 proteins [30].

To determine the protein effect of the c.1716delC variant for BCRHS-affected sheep, similar gene knockout models and protein interaction assays would be beneficial to further elucidate the impact of this variant on disease phenotype. As no affected animals are currently available for further study, CRISPR-Cas9 [31] could be utilized to replicate this variant in an ovine cell culture gene knockout model. Protein levels of OBSL1, CUL7 and CCD8 could be investigated to confirm whether this common pathway also exists within sheep, and whether altering this pathway and complex through OBSL1 truncation results in the BCRHS phenotype.

Silver-Russell syndrome is recognized as a differential diagnosis for all three subcategories of 3M syndrome, as it is characterized by slow growth before and after birth [17]. Skeletal surveys and radiology are often used to help differentiate between these two possible diagnoses, as Silver-Russell syndrome patients do not display the skeletal phenotypes observed in 3M syndrome patients [32]. Treatment of human 3M syndrome often involves growth hormone administration; however, the efficiency of this treatment has not been determined [17].

The use of animal models to assist in advancing the knowledge of human disease has been proven to be beneficial [33]. The sheep investigated within this study would be prime candidates for an ovine model of human 3M syndrome-2. The affected sheep appear to suffer from a more severe phenotype compared to humans with 3M syndrome-2, and the development of a large animal model would enable further disease characterization on both the molecular and protein level to evaluate therapeutic interventions by using a model with comparable organ size scaling [34].

Conclusions

The c.1716delC variant described in this study results in a frameshift mutation and the premature truncation of the OBSL1 protein. Variant segregation among our ovine study set and similarity of the BCRHS-affected ovine phenotype to human 3M syndrome-2 suggest that BCRHS-affected sheep represent an ovine model for human 3M syndrome-2. The discovery of this variant has enabled the development of a robust genotyping assay that is being used for the identification of carrier animals and for improved breeding management. The availability of a large animal model for human 3M syndrome-2 represents a unique opportunity to further investigate the biochemical basis of human 3M syndrome-2 as well as offering alternatives to validate therapeutic interventions in preclinical trials.

Methods

Animals and DNA isolation

Brachygnathia, cardiomegaly and renal hypoplasia syndrome was reported from a single Merino/Poll Merino flock in Australia and samples were collected by the owner. Tissue samples (liver, kidney, heart and ear notches) were collected from stillborn lambs or slaughter animals and either stored in RNAlater or frozen. Blood cards were collected as per diagnostic DNA testing protocols [35]. Genomic DNA for whole genome sequencing was extracted from tissue stored in RNAlater (ThermoFisher Scientific, DE, USA) from two affected animals. Genomic DNA for Sanger sequencing was extracted from frozen tissue for an additional five affected lambs, six obligate carriers and two unaffected Merino sheep from the same flock. All extractions used the QIAGEN DNeasy Blood & Tissue Kit following the manufacturer’s Animal Tissues Spin-Column protocol (QIAGEN, CA, USA).

Genomic DNA for the genotyping assay was isolated from blood cards collected from 583 Merino sheep from the original flock using a standard blood card digest protocol [36]. DNA was also available from one additional Merino sheep from an unrelated flock.

Pedigree information was compiled from over 40 years of breeding records, with details presented in Shariflou et al. (2011). Pedigraph [37] and the R package kinship2 version 1.8.4 [38] were used for drawing pedigree trees.

Identification of positional candidate genes

The flanking SNPs s50915 and s40177 for the 1.1 Mb region reported by Shariflou et al. (2012) were used to identify the region of interest in the Ensembl Oar_v3.1 genome assembly (GCA_000298735.1), where the region is smaller in size at approximately 0.97 Mb (OAR2(CM001583.1):g.220083076–221052836). Genes in this region were identified using the University of California Santa Cruz (UCSC) Genome Browser (http://genome.ucsc.edu/) and Table Browser and Ensembl annotations [39–41]. The online databases Online Mendelian Inheritance in Man (OMIM) [12], MGI [6] and PubMed were used to identify positional candidate genes based on the normal function of the protein or any reported phenotypes that were similar to BCRHS (Additional file 5: Table S5).
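The size of the mapped interval, and the effect of adding 1 Mb of flanking sequence for variant analysis, can be checked with a few lines of arithmetic. This is a minimal sketch assuming 1-based inclusive coordinates on OAR2; the function names are illustrative.

```python
# Region flanked by SNPs s50915 and s40177 on Oar_v3.1 (1-based, inclusive).
REGION_START, REGION_END = 220_083_076, 221_052_836
FLANK_BP = 1_000_000

def interval_length_mb(start: int, end: int) -> float:
    """Length of an inclusive interval in megabases."""
    return (end - start + 1) / 1_000_000

def pad_interval(start: int, end: int, flank: int) -> tuple[int, int]:
    """Extend an interval by `flank` bases on each side, never below position 1."""
    return max(1, start - flank), end + flank

size_mb = interval_length_mb(REGION_START, REGION_END)
padded = pad_interval(REGION_START, REGION_END, FLANK_BP)
print(f"region = {size_mb:.2f} Mb, padded = OAR2:{padded[0]}-{padded[1]}")
```

The interval quoted in the Results, OAR2:219083025–222052887, is 51 bp wider on each side than this exact 1 Mb pad; the sketch does not attempt to account for that difference.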


Whole genome sequencing

Genomic DNA concentration and purity of samples from two affected lambs (BCRHS3 and BCRHS11) were measured using the NanoDrop 8000 spectrophotometer and Qubit® 3.0 fluorometer (Thermo Scientific, DE, USA) and visualized on a 1% agarose gel.

Due to financial constraints as well as quantity and quality measures, only BCRHS3 was submitted for whole genome sequencing. Three additional sheep whole genome sequenced for the investigation of other unrelated inherited conditions (Merino sheep Y0244 and Y0346 and one Persian sheep, all from different flocks) were used as controls in the present study. Whole genome sequencing was performed using the Illumina HiSeq™ X Ten sequencing platform (Illumina, San Diego, CA) by the Kinghorn Centre for Clinical Genomics (Garvan Institute of Medical Research, Darlinghurst, Australia). DNA libraries were prepared using the Illumina® TruSeq DNA Nano Library Prep kit. Each sample was sequenced as 150 base pair (bp) paired-end reads at an expected 30X coverage. Adaptor sequences were removed by the service provider. Quality visualization and control was conducted on the resulting sequence reads using FastQC (version 0.11.3) (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Inspection of FastQC output indicated that the sequence data for all four samples were of good quality (yield ranged from 54.87 Gb to 80.16 Gb, 76.15 to 97.42% > PHRED30, 40.5 to 42.5% GC content and no adaptor contamination flagged). Therefore no quality trimming was conducted.

Read mapping, variant calling and annotation

Paired-end sequence reads were mapped to the Ovis aries Oar_v3.1 genome assembly (GCA_000298735.1) using Burrows-Wheeler Aligner (BWA-mem) version 0.7.15 [42] with default settings. Polymerase chain reaction (PCR) duplicates were marked using samblaster version 0.1.22 [43]. Lane-level binary alignment maps (BAMs) were merged using Picard version 1.119 (http://broadinstitute.github.io/picard/). Sorting and indexing was performed with SAMtools [14] version 1.6. Local realignment around insertion and deletion sites as well as base quality score recalibration using known variants downloaded from Ensembl’s dbSNP database for Ovis aries version 87 [41] were performed with the Genome Analysis Toolkit (GATK) version 3.7.0 [44, 45].

Single nucleotide polymorphisms were called using GATK HaplotypeCaller in GVCF mode [46] and were genotyped using GATK GenotypeGVCFs [44, 45]. Annotation and prediction of functional effects of SNP on OAR2 was conducted using SnpEff [47] version 4.3 and the Ensembl annotation release 86 for Oar_v3.1.

Variant filtering

Variants annotated by SnpEff [47] within the 0.97 Mb region of interest and an additional 1 Mb flanking on OAR2 were selected for filtering using a case-control approach in SnpSift [48] version 4. Variants that were homozygous alternate for BCRHS3 and that were not homozygous alternate for control Merino sheep Y0346 were selected. Variants were filtered for ‘low’, ‘moderate’ or ‘high’ impact on protein function as annotated by SnpEff [47], with known dbSNP variants and duplicate variants manually removed.

Of these, variants present in the prioritized positional candidate genes underwent visual inspection using SAMtools tview in the sequence data for BCRHS3, Y0346, an additional Merino (Y0244) and one Persian sheep. This reduced the list of candidate variants by excluding SNPs in regions of poor sequencing quality or those that were present in the controls.

Validation of c.1716delC

Following PCR amplification of the region flanking the c.1716delC variant, Sanger sequencing was conducted to validate the variant in seven affected, six obligate carrier and three unaffected control animals (Table 1).

PrimerBLAST [49] was used to design a primer pair to amplify the region flanking the ENSOART00000022037.1:c.1716delC (Oar_v3.1) variant in the candidate gene OBSL1. PCR amplification of a 229 bp product was performed using a Gradient Palm-Cycler™ Thermal Cycler (CGI-96, Corbett Life Science, NSW, Australia) in a total volume of 25 μL, containing 1x Platinum™ SuperFi™ PCR Master Mix (Invitrogen, ThermoFisher Scientific, DE, USA), 0.5 μM of each primer F2 5′-GTGTTGGCCGAAATGTTCAAG-3′ and R2 5′-GTTCGCTGACAGTGCAGACTC-3′ and approximately 50 ng of genomic DNA. The initial denaturation step was performed at 98 °C for 30 s, followed by 35 cycles consisting of a denaturation step at 98 °C for 10 s, annealing at 64 °C for 10 s and extension at 72 °C for 30 s. A final extension was performed at 72 °C for 5 min. PCR products were visualised on a 2% agarose gel before submission to Macrogen (Seoul, Korea) for DNA sequencing.

Sequencing data was analysed using MEGAX software [50] by aligning the sequences to genomic DNA to identify variants. Variants were compared to the variant database in Ensembl [51] and predicted impacts of novel variants on protein function were additionally determined by SIFT analysis [52]. Cross-species OBSL1 protein alignments were conducted using T-Coffee [53] and BOXSHADE (v3.2) across nine species. These included OBSL1 protein sequences from sheep (Ovis aries), human (Homo sapiens), chimpanzee (Pan troglodytes), Rhesus macaque (Macaca mulatta), dog (Canis lupus familiaris), cattle (Bos taurus), Brown rat (Rattus norvegicus), chicken (Gallus gallus) and zebrafish (Danio rerio).
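The case-control filtering logic described under Variant filtering can be illustrated in plain Python. This is a sketch of the selection rule only and is not the SnpSift command used in the study: the `Variant` record, the genotype strings and the toy positions are all invented for illustration, and duplicates are identified here simply by position. The impact labels follow SnpEff's LOW, MODERATE and HIGH categories.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Variant:
    pos: int          # toy coordinate, not a real OAR2 position
    case_gt: str      # genotype in the affected lamb, e.g. "1/1"
    control_gt: str   # genotype in the control sheep
    impact: str       # SnpEff-style impact label
    in_dbsnp: bool    # True if already a known dbSNP variant

KEPT_IMPACTS = {"LOW", "MODERATE", "HIGH"}

def is_candidate(v: Variant) -> bool:
    """Homozygous alternate in the case, not in the control, with a kept impact, and novel."""
    return (v.case_gt == "1/1" and v.control_gt != "1/1"
            and v.impact in KEPT_IMPACTS and not v.in_dbsnp)

def filter_candidates(variants):
    """Apply the filter and drop repeated positions, preserving input order."""
    seen, kept = set(), []
    for v in variants:
        if is_candidate(v) and v.pos not in seen:
            seen.add(v.pos)
            kept.append(v)
    return kept

toy = [
    Variant(100, "1/1", "0/0", "HIGH", False),      # kept
    Variant(100, "1/1", "0/0", "HIGH", False),      # duplicate, dropped
    Variant(200, "1/1", "1/1", "MODERATE", False),  # control also homozygous alternate
    Variant(300, "0/1", "0/0", "HIGH", False),      # case is heterozygous
    Variant(400, "1/1", "0/1", "MODIFIER", False),  # impact category not kept
    Variant(500, "1/1", "0/0", "LOW", True),        # known dbSNP variant
    Variant(600, "1/1", "0/1", "LOW", False),       # kept: control is only heterozygous
]
print([v.pos for v in filter_candidates(toy)])
```

Note that a heterozygous control does not exclude a variant under this rule, which is appropriate for a recessive disorder where unaffected animals may be carriers.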


To assess the predicted impact of the c.1716delC variant on OBSL1 length, NCBI ORF Finder [15] was used to compare the wildtype and mutant mRNA sequences from the ovine OBSL1 isoform X1 (XP_027821027) to identify alternative stop codon sites in the mutant sequence. To identify predicted losses of conserved ovine OBSL1 protein domains, the NCBI Conserved Domains database [15, 54] was used for the mutant OBSL1 mRNA sequence (XM_027965226.1) containing the c.1716delC variant.

To assess whether the c.1716delC variant had been previously reported in sheep, known ovine OBSL1 variants in the Ensembl genome browser (https://www.ensembl.org/Ovis_aries/Gene/Variation_Gene/Table?db=core;g=ENSOARG00000020239;r=2:220453801-220475937;t=ENSOART00000022037) were investigated. Presence of the variant was also screened in an additional database of sequence variants generated by the Agriculture Victoria Research team at the Centre For AgriBioscience, Melbourne. These variants were discovered from 935 sheep sequences: 453 from the SheepGenomesDB Project and 482 contributed by the Sheep CRC Project [55]. A range of different breeds were represented …

TaqMan PCR genotyping assay

A custom TaqMan real-time PCR assay was designed using the Custom TaqMan® Assay Design tool (ThermoFisher Scientific, DE, USA) to discriminate between homozygous wildtype, heterozygous and homozygous mutant genotypes. Allelic discrimination was performed using the ViiA™ 7 system (Applied Biosystems™, CA, USA) in a final reaction volume of 12.5 μL. Each reaction contained 1 x TaqMan® Genotyping Master Mix (Applied Biosystems, CA, USA), 900 nmol/L of assay specific primers 5′-CGGTAGGCACGCAGTCC-3′ and 5′-TACAGTGCTTCAGCATTGAGAAAG-3′, 250 nmol/L of allele specific 5′-VIC-CACCTCCACGCCCCG-NFQ-3′ (wildtype) and 5′-FAM-CCTCCACGGCCCCG-NFQ-3′ (mutant) probes and approximately 10–30 ng of genomic DNA. Each assay commenced with a pre-read stage at 60 °C for 30 s followed by an initial denaturation at 95 °C for 10 min, followed by 45 cycles of denaturation at 95 °C for 15 s, annealing and extension at 60 °C for 60 s and a final post-read stage at 60 °C for 30 s. Genotypes were analysed using the QuantStudio™ Real-Time PCR System version 1.3 (Applied Biosystems™, CA, USA).

Supplementary information

Supplementary information accompanies this paper at https://doi.org/10.1186/s12863-020-00913-8.

Additional file 1: Table S1. Adapted Online Mendelian Inheritance in Animals (OMIA) list of reported inherited forms of short stature (or dwarfism) in cattle, sheep, a goat, horses, pigs, rabbits, dogs, a cat, chickens, Japanese quail and a Sumatran tiger with the associated OMIA ID.

Additional file 2: Table S2. List of genes identified in the region flanked by SNPs s50915 and s40177 corresponding to positions OAR2:g.220083076–221052836 on the Oar_v3.1 genome assembly.

Additional file 3: Table S3. Top eight protein coding positional candidate genes identified in the OAR2:g.220083076–221052836 region on the Oar_v3.1 genome assembly.

Additional file 4: Table S4. List of 103 private whole genome sequencing variants identified in an affected lamb (BCRHS3) after filtering based on segregation, predicted protein impact, removal of known SNPs and duplicates.

Additional file 5: Table S5. Gene list obtained from Mouse Genome Informatics (MGI) with PubMed literature counts identifying genes causing similar phenotypes to BCRHS.

Additional file 6: Figure S1. Comparison of predicted open reading frames (ORF) for OBSL1 mRNA (XM_027965226.1) and predicted mutant ovine mRNA for OBSL1 using NCBI ORF Finder [15] (accessed 18th December 2019, https://www.ncbi.nlm.nih.gov/orffinder/). (a) ORF1 (black *) represents the ORF that codes for OBSL1 (1899 amino acid residues). (b) ORF1 (red *) codes for a truncated and modified protein of 691 amino acid residues.

Additional file 7. Ovine OBSL1 isoform X1 protein sequences (XP_027821027) for wildtype (SheepWT) and mutant (SheepMT) sheep. The predicted mutant c.1716delC (p.(Val573Trpfs*119)) altered amino acid sequence is highlighted in red.

Additional file 8: Figure S2. Wildtype OBSL1 protein showing conserved domains obtained from the NCBI Conserved Domains database [15, 54] (accessed 18th December 2019). The location of the c.1716delC variant is indicated. The resulting modified protein p.(Val573Trpfs*119) is predicted to have a truncated fibronectin type 3 domain and is lacking four immunoglobulin domains.

Additional file 9: Figure S3. Allelic discrimination plot visualised using the QuantStudio™ Real-Time PCR System version 1.3 (Applied Biosystems™) for a TaqMan genotyping assay used to discriminate the ENSOART00000022037.1:c.1716delC variant for homozygous wildtype (red dots), heterozygous (green dots), homozygous mutant (blue) individuals and a no DNA template control (black square).

Abbreviations

BCRHS: Brachygnathia, cardiomegaly and renal hypoplasia syndrome; bp: Base pair; CCDC8: Coiled-Coil Domain Containing 8; CHPF: Chondroitin polymerizing factor; CUL7: Cullin 7; dbSNP: Single Nucleotide Polymorphism Database; DNA: Deoxyribonucleic acid; GMPPA: GDP-mannose pyrophosphorylase A; Mb: Megabase; MGI: Mouse Genome Informatics; NCBI ORF: National Center for Biotechnology Information Open Reading Frame Finder; NCBI: National Center for Biotechnology Information; OAR2: Ovis aries chromosome 2; OBSL1: Obscurin like cytoskeletal adaptor 1; OMIA: Online Mendelian Inheritance in Animals; OMIM: Online Mendelian Inheritance in Man; PCR: Polymerase chain reaction; SNP: Single nucleotide polymorphism

Acknowledgments

The authors would like to acknowledge and thank Charlotte Carter for providing samples and data associated with this study, and for her unflagging enthusiasm and support during the many years since she first reported this disorder to the then MLA/AWI Sheep Genomics Program. The University of Sydney is acknowledged for the use of the Artemis HPC services and facilities at the Sydney Informatics Hub. The authors would like to thank the Biotechnology laboratory staff at the Elizabeth Macarthur Agricultural Institute for assisting in genotyping the final subset of the samples submitted for this study. The authors would like to acknowledge Dr. Iona MacLeod and Dr. Hans Daetwyler at the Centre For AgriBioscience, Agriculture Victoria Research, Melbourne for the provision of allele frequency data from Run2 of the combined SheepGenomesDB and Sheep CRC dataset of 935 sequences. We also acknowledge the Sheep CRC, SheepGenomesDB (http://sheep.genomedb.org) and all institutions that have made their sheep sequence data available.


Authors’ contributions

All authors read and approved this manuscript for publication, with the additional contributions: SAW: DNA extraction/preparing samples for WGS, WGS analysis, bioinformatics, writing the paper, optimisation of the genotyping assay, running samples through the genotyping assay, co-supervision of SEH, study design and response to reviewers. SEH: Candidate gene analysis, extraction of DNA from tissue samples and blood cards, running samples through the genotyping assay and contributing to writing the paper. MRS: Collection and assembly of pedigree information and subsequent analysis, review of paper, coordination of the collection and delivery of samples (including BCRHS3 & 11) and maintaining communication with the producer. FWN: Initiation and facilitation of collaboration between the producer and researchers; review of paper and pedigree analysis. CEW: Bioinformatics, paper review, co-supervision and bioinformatics training of SAW and study design. BOR: Co-supervision and training of SEH & SAW, study design, paper review and funding applications. IT: Conception of study, paper review, co-supervision and training of SEH & SAW, communication with the producer, funding applications and response to reviewers.

Funding

Whole genome sequencing was supported by the University of Sydney and NSW Department of Primary Industries compact funding and an Australian Government Research Training Program (RTP) Scholarship for SAW to undertake this project. The Sydney School of Veterinary Science, The University of Sydney provided research student support for SAW and SEH through assistance with consumables and Sanger sequencing.

Availability of data and materials

The dataset generated and/or analysed during the current study is available at the European Nucleotide Archive (www.ebi.ac.uk/ena/) under the study accession number PRJEB39179 and the sample accession SAMEA7034587.

Ethics approval and consent to participate

Ethics approval was obtained for the collection of samples by the University of Sydney Animal Ethics Committee (Animal Ethics Project No. 2016/998).

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Author details

1 Faculty of Science, Sydney School of Veterinary Science, The University of Sydney, Camden, NSW 2570, Australia. 2 Sydney Informatics Hub, Core Research Facilities, The University of Sydney, Sydney, NSW 2006, Australia. 3 NSW Department of Primary Industries, Elizabeth Macarthur Agricultural Institute, Menangle, NSW 2568, Australia.

Received: 15 March 2020 Accepted: 30 August 2020

References

1. Shariflou MR, Wade CM, Windsor PA, Tammen I, James JW, Nicholas FW. Lethal genetic disorder in poll merino/merino sheep in Australia. Aus Vet J. 2011;89(7):254–9.
2. Shariflou MR, Wade CM, Kijas J, McCulloch R, Windsor PA, Tammen I, et al. Brachygnathia, cardiomegaly and renal hypoplasia syndrome (BCRHS) in merino sheep maps to a 1.1-megabase region on ovine chromosome OAR2. Anim Genet. 2012;44(2):231–3.
3. Boegheim IJM, Leegwater PAJ, van Lith HA, Back W. Current insights into the molecular genetic basis of dwarfism in livestock. Vet J. 2017;224:64–75.
4. Parnell PF, Arthur PF, Barlow R. Direct response to divergent selection for yearling growth rate in Angus cattle. Livest Prod Sci. 1997;49(3):297–304.
5. Cavanagh J, Tammen I, Windsor P, Bateman J, Savarirayan R, Nicholas F, et al. Bulldog dwarfism in Dexter cattle is caused by mutations in ACAN. Mamm Genome. 2007;18(11):808–14.
6. Bult CJ, Blake JA, Smith CL, Kadin JA, Richardson JE. Mouse genome database (MGD) 2019. Nucleic Acids Res. 2019;47(D1):D801–6.
7. Online Mendelian Inheritance in Animals. Sydney School of Veterinary Science, University of Sydney, Sydney. 2019: https://omia.org/ Accessed 26th December 2019.
8. Boycott KM, Vanstone MR, Bulman DE, Mackenzie AE. Rare-disease genetics in the era of next-generation sequencing: discovery to translation. Nat Rev Genet. 2013;14(10):681–91.
9. Clayton PE, Hanson D, Magee L, Murray PG, Saunders E, Abu-Amero SN, et al. Exploring the spectrum of 3-M syndrome, a primordial short stature disorder of disrupted ubiquitination. Clin Endocrinol. 2012;77(3):335–42.
10. Silver HK, Kiyasu W, George J, Deamer WC. Syndrome of congenital hemihypertrophy, shortness of stature, and elevated urinary gonadotropins. Pediatrics. 1953;12(4):368–76.
11. Russell A. A syndrome of intra-uterine dwarfism recognizable at birth with cranio-facial dysostosis, disproportionately short arms, and other anomalies (5 examples). Proc Roy Soc Med. 1954;47(12):1040–4.
12. Online Mendelian Inheritance in Man. Johns Hopkins University (Baltimore, MD). 2019. https://omim.org/. Accessed 26th December 2019.
13. The International Sheep Genomics Consortium, Archibald AL, Cockett NE, Dalrymple BP, Faraut T, Kijas JW, et al. The sheep genome reference sequence: a work in progress. Anim Genet. 2010;41:449–53.
14. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The sequence alignment/map format and SAMtools. Bioinformatics. 2009;25(16):2078–9.
15. NCBI Resource Coordinators. Database resources of the National Center for biotechnology information. Nucleic Acids Res. 2018;46(D1):D8–D13.
16. Argente J, Tatton-Brown K, Lehwalder D, Pfäffle R. Genetics of growth disorders: which patients require genetic testing? Front Endocrinol. 2019;10(602):1–15.
17. Huber C, Munnich A, Cormier-Daire V. The 3M syndrome. Best Pract Res Clin Endocrinol Metab. 2011;25(1):143–51.
18. Murray PG, Hanson D, Coulson T, Stevens A, Whatmore A, Poole RL, et al. 3-M syndrome: a growth disorder associated with IGF2 silencing. Endocr Connect. 2013;2(4):225–35.
19. Geisler SB, Robinson D, Hauringa M, Raeker MO, Borisov AB, Westfall MV, et al. Obscurin-like 1, OBSL1, is a novel cytoskeletal protein related to obscurin. Genomics. 2007;89(4):521–31.
20. Hanson D, Murray PG, Sud A, Temtamy SA, Aglan M, Superti-Furga A, et al. The primordial growth disorder 3-M syndrome connects Ubiquitination to the cytoskeletal adaptor OBSL1. Am J Hum Genet. 2009;84(6):801–6.
21. Izumikawa T, Kitagawa H, Mizuguchi S, Nomura KH, Nomura K, Tamura JI, et al. Nematode chondroitin polymerizing factor showing cell-/organ-specific expression is indispensable for chondroitin synthesis and embryonic cell division. J Biol Chem. 2004;279(51):53755.
22. Gold WA, Sobreira N, Wiame E, Marbaix A, Van Schaftingen E, Franzka P, et al. A novel mutation in GMPPA in siblings with apparent intellectual disability, epilepsy, dysmorphism, and autonomic dysfunction. Am J Med Genet A. 2017;173(8):2246–50.
23. Koehler K, Malik M, Mahmood S, Gießelmann S, Beetz C, Hennings JC, et al. Mutations in GMPPA cause a glycosylation disorder characterized by intellectual disability and autonomic dysfunction. Am J Hum Genet. 2013;93(4):727–34.
24. Sauer F, Vahokoski J, Song YH, Wilmanns M. Molecular basis of the head-to-tail assembly of giant muscle proteins obscurin-like 1 and titin. EMBO Rep. 2010;11(7):534–40.
25. Benian GM, Mayans O. Titin and Obscurin: giants holding hands and discovery of a new Ig domain subset. J Mol Biol. 2015;427(4):707–14.
26. Litterman N, Ikeuchi Y, Gallardo G, O'Connell BC, Sowa ME, Gygi SP, et al. An OBSL1-Cul7Fbxw8 ubiquitin ligase signaling mechanism regulates Golgi morphology and dendrite patterning. PLoS Biol. 2011;9(5):e1001060.
27. Al-Dosari MS, Al-Shammari M, Shaheen R, Faqeih E, Alghofely MA, Boukai A, et al. 3M syndrome: an easily recognizable yet underdiagnosed cause of proportionate short stature. J Pediatr. 2012;161(1):139–45.
28. Hanson D, Murray PG, Coulson T, Sud A, Omokanye A, Stratta E, et al. Mutations in CUL7, OBSL1 and CCDC8 in 3-M syndrome lead to disordered growth factor signalling. J Mol Endocrinol. 2012;49(3):267–75.
29. Huber C, Fradin M, Edouard T, Le Merrer M, Alanay Y, Da Silva DB, et al. OBSL1 mutations in 3-M syndrome are associated with a modulation of IGFBP2 and IGFBP5 expression levels. Hum Mutat. 2010;31(1):20–6.
30. Hanson D, Murray PG, Sullivan J, Urquhart J, Daly S, Bhaskar SS, et al. Exome sequencing identifies CCDC8 mutations in 3-M syndrome, suggesting that CCDC8 contributes in a pathway with CUL7 and OBSL1 to control human growth. Am J Hum Genet. 2011;89(1):148–53.
31. Hsu PD, Lander ES, Zhang F. Development and applications of CRISPR-Cas9 for genome engineering. Cell. 2014;157(6):1262–78.


32. Akawi NA, Ali BR, Hamamy H, Al-Hadidy A, Al-Gazali L. Is autosomal recessive Silver-Russel syndrome a separate entity or is it part of the 3-M syndrome spectrum? Am J Med Genet A. 2011;155(6):1236–45.
33. Robinson NB, Krieger K, Khan FM, Huffman W, Chang M, Naik A, et al. The current state of animal models in research: a review. Int J Surg. 2019;72:9–13.
34. Davidson MK, Lindsey JR, Davis JK. Requirements and selection of an animal model. Isr J Med Sci. 1987;23(6):551–5.
35. NSW Department of Primary Industries. Sample Collection Guide Blood Cards. 2017: https://www.dpi.nsw.gov.au/__data/assets/pdf_file/0019/701335/sample-collection-guide-blood-card.pdf Accessed 29th June 2020.
36. O’Rourke BA, Kelly J, Spiers ZB, Shearer PL, Porter NS, Parma P, et al. Ichthyosis fetalis in polled Hereford and shorthorn calves. J Vet Diagn Investig. 2017;29(6):874–6.
37. Garbe JR, Da Y. Pedigraph: A Software Tool for the Graphing and Analysis of Large Complex Pedigree. https://animalgene.umn.edu/pedigraph. User manual Version 2.4, Department of Animal Science, University of Minnesota (2008) Accessed February 2020.
38. Sinnwell JP, Therneau TM, Schaid DJ. The kinship2 R package for pedigree data. Hum Hered. 2014;78(2):91–3.
39. Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, et al. The human genome browser at UCSC. Genome Res. 2002;12(6):996–1006.
40. Karolchik D, Hinrichs AS, Furey TS, Roskin KM, Sugnet CW, Haussler D, et al. The UCSC table browser data retrieval tool. Nucleic Acids Res. 2004;32(Database issue):D493–6.
41. Zerbino DR, Achuthan P, Akanni W, Amode MR, Barrell D, Bhai J, et al. Ensembl 2018. Nucleic Acids Res. 2017;46(D1):D754–D61.
42. Li H, Durbin R. Fast and accurate short read alignment with burrows-wheeler transform. Bioinformatics. 2009;25(14):1754–60.
43. Faust GG, Hall IM. SAMBLASTER: fast duplicate marking and structural variant read extraction. Bioinformatics. 2014;30(17):2503–5.
44. McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, et al. The genome analysis toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010;20(9):1297–303.
45. DePristo MA, Banks E, Poplin R, Garimella KV, Maguire JR, Hartl C, et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet. 2011;43:491–502.
46. Van der Auwera GA, Carneiro MO, Hartl C, Poplin R, del Angel G, Levy-Moonshine A, et al. From FastQ data to high confidence variant calls: the Genome Analysis Toolkit best practices pipeline. Curr Protoc Bioinformatics. 2013;11(1110):11.0.1–0.33.
47. Cingolani P, Platts A, Wang Le L, Coon M, Nguyen T, Wang L, et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly. 2012;6(2):80–92.
48. Cingolani P, Patel V, Coon M, Nguyen T, Land S, Ruden D, et al. Using Drosophila melanogaster as a model for Genotoxic chemical mutational studies with a new program, SnpSift. Front Genet. 2012;3(35):1–9.
49. Ye J, Coulouris G, Zaretskaya I, Cutcutache I, Rozen S, Madden T. Primer-BLAST: a tool to design target-specific primers for polymerase chain reaction. BMC Bioinformatics. 2012;13:134–45.
50. Kumar S, Stecher G, Tamura K. MEGA7: molecular evolutionary genetics analysis version 7.0 for bigger datasets. Mol Biol Evol. 2016;33(7):1870–4.
51. McLaren W, Pritchard B, Rios D, Chen Y, Flicek P, Cunningham F. Deriving the consequences of genomic variants with the Ensembl API and SNP effect predictor. Bioinformatics. 2010;26(16):2069–70.
52. Flanagan SE, Patch AM, Ellard S. Using SIFT and PolyPhen to predict loss-of-function and gain-of-function mutations. Genet Test Mol Biomarkers. 2010;14(4):533–7.
53. Di Tommaso P, Moretti S, Xenarios I, Orobitg M, Montanyola A, Chang J-M, et al. T-Coffee: A web server for the multiple sequence alignment of protein and RNA sequences using structural information and homology extension. Nucleic Acids Res. 2011;39(Web Server issue):W13–W17.
54. Marchler-Bauer A, Bo Y, Han L, He J, Lanczycki CJ, Lu S, et al. CDD/SPARCLE: functional classification of proteins via subfamily domain architectures. Nucleic Acids Res. 2017;45(D1):D200–D3.
55. Daetwyler HD, Brauning R, Chamberlain AJ, McWilliam S, McCulloch A, Vander Jagt CJ, et al. 1000 Bull Genomes And Sheepgenomedb Projects: Enabling Cost-effective Sequence Level Analyses Globally. Proc Assoc Advmt Anim Breed Genet. 2017;22:201–4.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


5.3 Appendix: Supplementary material for Chapter 5

Additional file 1: Table S1 Adapted Online Mendelian Inheritance in Animals (OMIA) list of reported inherited forms of short stature (or dwarfism) in cattle, sheep, a goat, horses, pigs, rabbits, dogs, a cat, chickens, Japanese quail and a Sumatran tiger with the associated OMIA ID.

OMIA ID Phene Species Scientific Name Species Common Name Gene

OMIA 001522-9615 Oculoskeletal dysplasia 1 Canis lupus familiaris Dog COL9A3

OMIA 000187-9615 Chondrodysplasia Canis lupus familiaris Dog -

OMIA 001271-9913 Dwarfism, ACAN-related Bos taurus Cattle ACAN

OMIA 000303-9031 Dwarfism, autosomal Gallus gallus Chicken C1H12ORF23

OMIA 001271-9796 Dwarfism, ACAN-related Equus caballus Horse ACAN

OMIA 001926-9913 Achondrogenesis, type II Bos taurus Cattle COL2A1

OMIA 000187-9913 Chondrodysplasia Bos taurus Cattle EVC2

OMIA 001686-9913 Dwarfism, proportionate, with inflammatory lesions Bos taurus Cattle RNF11

OMIA 000299-9986 Dwarfism Oryctolagus cuniculus Rabbit HMGA2

OMIA 000299-9796 Dwarfism Equus caballus Horse -

OMIA 002068-9796 Dwarfism, Friesian Equus caballus Horse B4GALT7

OMIA 001523-9615 Oculoskeletal dysplasia 2 Canis lupus familiaris Dog COL9A2

OMIA 001323-9823 Dwarfism, Laron Sus scrofa Pig -

OMIA 000309-9031 Dwarfism, sex-linked Gallus gallus Chicken GHR

OMIA 000402-9615 Gangliosidosis, GM1 Canis lupus familiaris Dog GLB1

OMIA 000536-9615 Hypothyroidism, congenital Canis lupus familiaris Dog TPO

OMIA 001473-9913 Dwarfism, growth-hormone deficiency Bos taurus Cattle GH1

OMIA 000666-9685 Mucopolysaccharidosis VI Felis catus Domestic cat ARSB

OMIA 001997-9986 Achondroplasia-2 Oryctolagus cuniculus Rabbit -

OMIA 000189-9986 Chondrodystrophy Oryctolagus cuniculus Rabbit -

OMIA 001998-9986 Dwarfism, Dahlem Oryctolagus cuniculus Rabbit -

OMIA 000302-9940 Dwarfism, Ancon Ovis aries Sheep -

OMIA 000307-9615 Dwarfism, pituitary Canis lupus familiaris Dog LHX3

OMIA 001985-9913 Dwarfism, Fleckvieh Bos taurus Cattle GON4L

OMIA 001903-9940 Skeletal dysplasia with craniofacial deformity and disproportionate dwarfism Ovis aries Sheep -

OMIA 001400-9940 Chondrodysplasia, Texel Ovis aries Sheep SLC13A1

OMIA 001296-9615 Dwarfism, hypochondroplastic Canis lupus familiaris Dog -

OMIA 001294-9913 Dwarfism, growth-hormone-receptor deficiency Bos taurus Cattle -

OMIA 000156-9986 C8 deficiency Oryctolagus cuniculus Rabbit -

OMIA 001814-9913 Congenital chondrodystrophy of unknown origin Bos taurus Cattle -

OMIA 001772-9615 Skeletal dysplasia 2 (SD2) Canis lupus familiaris Dog COL11A2

OMIA 000299-9823 Dwarfism Sus scrofa Pig -

OMIA 001718-9823 Dwarfism, Schmid metaphyseal chondrodysplasia Sus scrofa Pig COL10A1

OMIA 001485-9913 Dwarfism, Angus Bos taurus Cattle PRKG2

OMIA 001659-9913 Dwarfism, dominant Bos taurus Cattle -

OMIA 000311-9913 Dwarfism, stumpy Bos taurus Cattle -

OMIA 000308-9913 Dwarfism, proportionate Bos taurus Cattle -

OMIA 000310-9913 Dwarfism, snorter Bos taurus Cattle -

OMIA 000306-9031 Dwarfism, crooked neck Gallus gallus Chicken -

OMIA 000537-9695 Hypothyroidism and dwarfism Panthera tigris sumatrae Sumatran tiger -

OMIA 000306-93934 Dwarfism, crooked neck Coturnix japonica Japanese quail -

OMIA 000616-9685 Lysosomal storage disease Felis catus Domestic cat -

OMIA 000299-9685 Dwarfism Felis catus Domestic cat -

OMIA 000299-9913 Dwarfism Bos taurus Cattle -

OMIA 000299-9615 Dwarfism Canis lupus familiaris Dog -

OMIA 000299-9925 Dwarfism Capra hircus Goat -

OMIA 000299-9940 Dwarfism Ovis aries Sheep -

OMIA 001323-9913 Dwarfism, Laron Bos taurus Cattle -

OMIA 000300-9615 Dwarfism with anaemia Canis lupus familiaris Dog -

OMIA 000570-9913 Joint laxity and dwarfism, congenital Bos taurus Cattle -

OMIA 000664-9913 Mucopolysaccharidosis I Bos taurus Cattle -

OMIA 000187-9823 Chondrodysplasia Sus scrofa Pig -
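Each OMIA ID in Table S1 pairs a six-digit phene number with a species suffix, which appears to be the NCBI Taxonomy ID of the species (for example 9913 for Bos taurus and 9940 for Ovis aries), so the same phene can be traced across species. The following is a minimal Python sketch of splitting and grouping such IDs, using four rows copied from the table. The function names are illustrative only and are not part of OMIA or of the analysis reported here.

```python
from collections import defaultdict


def parse_omia_id(omia_id):
    """Split an ID such as 'OMIA 001271-9913' into (phene number, species suffix)."""
    prefix, _, rest = omia_id.partition(" ")
    if prefix != "OMIA":
        raise ValueError(f"not an OMIA ID: {omia_id!r}")
    phene, _, taxon = rest.partition("-")
    return phene, int(taxon)


# Four rows copied from Table S1: (OMIA ID, phene, species scientific name).
ROWS = [
    ("OMIA 001271-9913", "Dwarfism, ACAN-related", "Bos taurus"),
    ("OMIA 001271-9796", "Dwarfism, ACAN-related", "Equus caballus"),
    ("OMIA 000299-9940", "Dwarfism", "Ovis aries"),
    ("OMIA 000302-9940", "Dwarfism, Ancon", "Ovis aries"),
]


def species_by_phene(rows):
    """Group species names by the phene number shared across species."""
    grouped = defaultdict(list)
    for omia_id, _phene_name, species in rows:
        phene, _taxon = parse_omia_id(omia_id)
        grouped[phene].append(species)
    return dict(grouped)
```

Grouping on the phene number rather than the full ID is what allows the ACAN-related dwarfism entries for cattle and horses to be listed together.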


Additional file 2: Table S2 List of genes identified in the region flanked by SNPs s50915 and s40177 corresponding to positions OAR2:g.220083076–221052836 on the Oar_v3.1 genome assembly.

Database | Ensembl name | Gene name | Gene type
Ensembl & UCSC | ENSOARG00000025850 | ENSOARG00000025850 | LincRNA
Ensembl & UCSC | ENSOARG00000023421 | ENSOARG00000023421 | MiRNA
Ensembl & UCSC | ENSOARG00000024463 | ENSOARG00000024463 | MiRNA
Ensembl & UCSC | ENSOARG00000023812 | ENSOARG00000023812 | MiRNA
Ensembl & UCSC | ENSOARG00000023371 | MIRN153-1 | MiRNA
Ensembl & UCSC | ENSOARG00000022190 | ENSOARG00000022190 | SnoRNA
Ensembl & UCSC | ENSOARG00000022244 | SNORD39 | SnoRNA
Ensembl & UCSC | ENSOARG00000022972 | ENSOARG00000022972 | SnoRNA
Ensembl & UCSC | ENSOARG00000020190 | ENSOARG00000020190 | Uncharacterised, protein coding
Ensembl & UCSC | ENSOARG00000020199 | ENSOARG00000020199 | Uncharacterised, protein coding
Ensembl & UCSC | ENSOARG00000020195 | ENSOARG00000020195 | Uncharacterised, protein coding
Ensembl & UCSC | ENSOARG00000020189 | ENSOARG00000020189 | Uncharacterised, protein coding
Ensembl & UCSC | ENSOARG00000020202 | ENSOARG00000020202 | Uncharacterised, protein coding
UCSC | ENSOARG00000020248 | ENSOARG00000020248 | Uncharacterised, protein coding
Ensembl & UCSC | ENSOARG00000019795 | NHEJ1 | Protein coding
Ensembl & UCSC | ENSOARG00000019940 | Autophagy related 9A | Protein coding
Ensembl & UCSC | ENSOARG00000020207 | ASIC4 | Protein coding
Ensembl & UCSC | ENSOARG00000019920 | ABCB6 | Protein coding
Ensembl & UCSC | ENSOARG00000020032 | tubulin alpha-1D chain | Protein coding
Ensembl & UCSC | ENSOARG00000020185 | DES | Protein coding
Ensembl & UCSC | ENSOARG00000020203 | GMPPA | Protein coding
Ensembl & UCSC | ENSOARG00000020243 | INHA | Protein coding
Ensembl & UCSC | ENSOARG00000020255 | SLC4A3 | Protein coding
Ensembl & UCSC | ENSOARG00000019836 | CNPPD1 | Protein coding
Ensembl & UCSC | ENSOARG00000020020 | TUBA4A | Protein coding
Ensembl & UCSC | ENSOARG00000020161 | DNPEP | Protein coding
Ensembl & UCSC | ENSOARG00000019885 | ZFAND2B | Protein coding
Ensembl & UCSC | ENSOARG00000020051 | DNAJB2 | Protein coding
Ensembl & UCSC | ENSOARG00000020209 | CHPF | Protein coding
Ensembl & UCSC | ENSOARG00000019815 | SLC23A3 | Protein coding
Ensembl & UCSC | ENSOARG00000019976 | GLB1L | Protein coding
Ensembl & UCSC | ENSOARG00000019867 | RETREG2 | Protein coding
Ensembl & UCSC | ENSOARG00000020016 | STK16 | Protein coding
Ensembl & UCSC | ENSOARG00000020210 | TMEM198 | Protein coding
Ensembl & UCSC | ENSOARG00000019955 | ANKZF1 | Protein coding
Ensembl & UCSC | ENSOARG00000020084 | PTPRN | Protein coding
Ensembl & UCSC | ENSOARG00000020239 | OBSL1 | Protein coding
Ensembl & UCSC | ENSOARG00000020148 | ENSOARG00000020148 | Protein coding
Ensembl & UCSC | ENSOARG00000020122 | ENSOARG00000020122 | Protein coding
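As a rough illustration of how the interval and gene list in Table S2 might be summarised, the Python sketch below parses the region string from the table caption, computes its length in base pairs and tallies gene types. Only five rows are copied from the table for brevity, and the helper names are hypothetical rather than part of the published analysis.

```python
import re
from collections import Counter


def parse_region(region):
    """Parse 'OAR2:g.220083076–221052836' into (chromosome, start, end)."""
    match = re.fullmatch(r"(\w+):g\.(\d+)[–-](\d+)", region)
    if match is None:
        raise ValueError(f"unrecognised region: {region!r}")
    chrom, start, end = match.groups()
    return chrom, int(start), int(end)


def region_length(region):
    """Length in bp, counting both flanking positions."""
    _chrom, start, end = parse_region(region)
    return end - start + 1


# Five rows copied from Table S2: (database, Ensembl name, gene name, gene type).
ROWS = [
    ("Ensembl & UCSC", "ENSOARG00000023371", "MIRN153-1", "MiRNA"),
    ("Ensembl & UCSC", "ENSOARG00000022244", "SNORD39", "SnoRNA"),
    ("Ensembl & UCSC", "ENSOARG00000020185", "DES", "Protein coding"),
    ("Ensembl & UCSC", "ENSOARG00000020203", "GMPPA", "Protein coding"),
    ("Ensembl & UCSC", "ENSOARG00000020239", "OBSL1", "Protein coding"),
]


def count_gene_types(rows):
    """Tally how many listed genes fall into each gene type."""
    return Counter(row[3] for row in rows)
```

Calling `region_length("OAR2:g.220083076–221052836")` gives 969761 bp for the interval on Oar_v3.1.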

Additional file 3: Table S3 Top eight protein coding positional candidate genes identified in the OAR2:g.220083076–221052836 region on the Oar_v3.1 genome assembly.

Gene | Biological function summary | Disease(s) associated with mutations
NHEJ1 | Controls repair of double stranded breaks in DNA, that would otherwise cause cell death | Growth retardation, microcephaly in humans and mice (1, 2)
ATG9A | Required for autophagosome formation | Growth retardation, foetal death, abnormal bone surface in mice (3, 4)
GLB1L | Produces the β-galactosidase lysosomal enzyme | Skeletal changes, cardiomyopathy, hydrops foetalis in humans (5)
RESP18 | Protein located in endocrine and neuroendocrine tissues, essential for embryonic development | Embryonic death in mice (6)
DES | Produces desmin, the intermediate filament protein for muscle | Myopathies of cardiac, skeletal and smooth muscle (7, 8)
GMPPA | Involved in formation of glycoprotein and glycolipid precursors | Neurological deficits, adrenal insufficiency, microcephaly and craniofacial dysmorphism in humans (9, 10)
CHPF | Involved in formation of extracellular matrix molecules (chondroitin polymerase) | Defects in cell division early in embryogenesis (11)
OBSL1 | Produces cytoskeletal adaptor proteins that link the cytoskeleton to the cell membrane | 3-M Syndrome in humans characterised by pregnancy losses, growth retardation, short limbs and interesting facial features; dilated cardiomyopathy (12, 13)

References

(1) El Waly B, Buhler E, Haddad MR et al. Nhej1 deficiency causes abnormal development of the cerebral cortex. Mol Neurobiol 2015;52:771-782.
(2) Buck D, Malivert L, de Chasseval R et al. Cernunnos, a novel nonhomologous end-joining factor, is mutated in human immunodeficiency with microcephaly. Cell 2006;124:287-299.
(3) Kojima T, Yamada T, Akaishi R et al. Role of the Atg9a gene in intrauterine growth and survival of fetal mice. Reprod Biol 2015;15:131-138.
(4) Imagawa Y, Saitoh T, Tsujimoto Y. Vital staining for cell death identifies Atg9a-dependent necrosis in developmental bone formation in mouse. Nat Commun 2016;7:13391.
(5) Brunetti-Pierri N, Scaglia F. GM1 gangliosidosis: Review of clinical, molecular, and therapeutic aspects. Mol Genet Metab 2008;94:391-396.
(6) Liang M, Yang JL, Bian MJ et al. Requirement of regulated endocrine-specific protein-18 for development and expression of regulated endocrine-specific protein-18 isoform c in mice. Mol Biol Rep 2011;38:2557-2562.
(7) Joanne P, Chourbagi O, Agbulut O. Desmin filaments and their disorganization associated with myofibrillar myopathies. Biol Aujourd'hui 2011;205:163-177.
(8) Höllrigl A, Puz S, Al-Dubai H et al. Amino-terminally truncated desmin rescues fusion of des−/− myoblasts but negatively affects cardiomyogenesis and smooth muscle development. FEBS Lett 2002;523:229-233.
(9) Koehler K, Malik M, Mahmood S et al. Mutations in GMPPA cause a glycosylation disorder characterized by intellectual disability and autonomic dysfunction. Am J Hum Genet 2013;93:727-734.
(10) Gold WA, Sobreira N, Wiame E et al. A novel mutation in GMPPA in siblings with apparent intellectual disability, epilepsy, dysmorphism, and autonomic dysfunction. Am J Med Genet A 2017;173:2246-2250.
(11) Izumikawa T, Kitagawa H, Mizuguchi S et al. Nematode chondroitin polymerizing factor showing cell-/organ-specific expression is indispensable for chondroitin synthesis and embryonic cell division. J Biol Chem 2004;279:53755-53761.
(12) Marshall CR, Farrell SA, Cushing D et al. Whole-exome analysis of foetal autopsy tissue reveals a frameshift mutation in OBSL1, consistent with a diagnosis of 3-M Syndrome. BMC Genomics 2015;16:S12.
(13) Geisler SB, Robinson D, Hauringa M et al. Obscurin-like 1, OBSL1, is a novel cytoskeletal protein related to obscurin. Genomics 2007;89:521-531.


Additional file 4: Table S4 List of 103 private whole genome sequencing variants identified in an affected lamb (BCRHS3) after filtering based on segregation, predicted protein impact, removal of known SNPs and duplicates.
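The records in Table S4 follow the VCF column layout (#CHROM to INFO), with SnpEff annotations carried in the ANN key of the INFO column as comma-separated entries of pipe-separated subfields. The following Python sketch shows one way such a record could be read to pull out the protein-altering annotations. It is an illustration under stated assumptions, not the filtering procedure used in this study: the example line is a trimmed copy of the second record (most INFO keys are omitted and the position is read from the table), the function names are hypothetical, and only the first eleven ANN subfields are named.

```python
# Names for the leading ANN subfields, following the SnpEff annotation order.
ANN_FIELDS = [
    "allele", "effect", "impact", "gene_name", "gene_id",
    "feature_type", "feature_id", "biotype", "rank", "hgvs_c", "hgvs_p",
]


def parse_info(info):
    """Turn 'K1=V1;K2=V2;FLAG' into a dict (flags map to True)."""
    parsed = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        parsed[key] = value if sep else True
    return parsed


def parse_ann(ann_value):
    """Split an ANN value into one dict per annotation entry."""
    return [dict(zip(ANN_FIELDS, entry.split("|")))
            for entry in ann_value.split(",")]


def protein_altering(vcf_line, impacts=("HIGH", "MODERATE")):
    """Return (chrom, pos, ref, alt, gene, HGVS.c, HGVS.p) for selected impacts."""
    fields = vcf_line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alt, _qual, _filter, info = fields[:8]
    annotations = parse_ann(parse_info(info)["ANN"])
    return [
        (chrom, int(pos), ref, alt, a["gene_name"], a["hgvs_c"], a["hgvs_p"])
        for a in annotations
        if a["impact"] in impacts
    ]


# Trimmed copy of the second Table S4 record (INFO shortened for illustration).
EXAMPLE = "\t".join([
    "2", "219242939", ".", "G", "C", "1343.2", ".",
    "AC=2;AF=1.00;AN=2;DP=34;"
    "ANN=C|missense_variant|MODERATE|ENSOARG00000019495|ENSOARG00000019495|"
    "transcript|ENSOART00000021233.1|protein_coding|3/12|c.274G>C|p.Val92Leu|"
    "274/1128|274/1128|92/375||WARNING_TRANSCRIPT_NO_START_CODON,"
    "C|upstream_gene_variant|MODIFIER|SNORA42|ENSOARG00000025091|transcript|"
    "ENSOART00000026993.1|snoRNA||n.-2725G>C|||||2725|",
])
```

For the example line, `protein_altering(EXAMPLE)` keeps the missense annotation and drops the MODIFIER-impact upstream annotation for SNORA42.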

#CHROM POS ID REF ALT QUAL FILTER INFO
2 219242932 . T G 1207.2 . AC=2;AF=1.00;AN=2;BaseQRankSum=-1.712e+00;ClippingRankSum=0.00;DP=34;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=58.01;MQRankSum=-3.000e-01;QD=30.97;SOR=0.283;ANN=G|synonymous_variant|LOW|ENSOARG00000019495|ENSOARG00000019495|transcript|ENSOART00000021233.1|protein_coding|3/12|c.267T>G|p.Thr89Thr|267/1128|267/1128|89/375||WARNING_TRANSCRIPT_NO_START_CODON,G|upstream_gene_variant|MODIFIER|SNORA42|ENSOARG00000025091|transcript|ENSOART00000026993.1|snoRNA||n.-2732T>G|||||2732|;Cases=1,0,2;Controls=0,0,0;CC_TREND=NaN;CC_GENO=NaN;CC_ALL=1.000e+00;CC_DOM=1.000e+00;CC_REC=1.000e+00
2 219242939 . G C 1343.2 . AC=2;AF=1.00;AN=2;BaseQRankSum=-1.743e+00;ClippingRankSum=0.00;DP=34;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=58.01;MQRankSum=-3.000e-01;QD=34.24;SOR=0.283;ANN=C|missense_variant|MODERATE|ENSOARG00000019495|ENSOARG00000019495|transcript|ENSOART00000021233.1|protein_coding|3/12|c.274G>C|p.Val92Leu|274/1128|274/1128|92/375||WARNING_TRANSCRIPT_NO_START_CODON,C|upstream_gene_variant|MODIFIER|SNORA42|ENSOARG00000025091|transcript|ENSOART00000026993.1|snoRNA||n.-2725G>C|||||2725|;Cases=1,0,2;Controls=0,0,0;CC_TREND=NaN;CC_GENO=NaN;CC_ALL=1.000e+00;CC_DOM=1.000e+00;CC_REC=1.000e+00
2 219242946 . G T 1208.2 . AC=2;AF=1.00;AN=2;BaseQRankSum=-8.940e-01;ClippingRankSum=0.00;DP=33;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=57.94;MQRankSum=-3.000e-01;QD=29.03;SOR=0.446;ANN=T|missense_variant|MODERATE|ENSOARG00000019495|ENSOARG00000019495|transcript|ENSOART00000021233.1|protein_coding|3/12|c.281G>T|p.Cys94Phe|281/1128|281/1128|94/375||WARNING_TRANSCRIPT_NO_START_CODON,T|upstream_gene_variant|MODIFIER|SNORA42|ENSOARG00000025091|transcript|ENSOART00000026993.1|snoRNA||n.-2718G>T|||||2718|;Cases=1,0,2;Controls=0,0,0;CC_TREND=NaN;CC_GENO=NaN;CC_ALL=1.000e+00;CC_DOM=1.000e+00;CC_REC=1.000e+00
2 219288348 . C G 933.42 . AC=2;AF=0.500;AN=4;DP=72;ExcessHet=0.7918;FS=0.000;MLEAC=2;MLEAF=0.500;MQ=60.00;QD=23.93;SOR=1.236;ANN=G|missense_variant|MODERATE|PNKD|ENSOARG00000019505|transcript|ENSOART00000021243.1|protein_coding|2/10|c.173C>G|p.Pro58Arg|617/1948|173/1158|58/385||,G|missense_variant|MODERATE|PNKD|ENSOARG00000019505|transcript|ENSOART00000021245.1|protein_coding|2/3|c.173C>G|p.Pro58Arg|617/873|173/429|58/142||,G|upstream_gene_variant|MODIFIER|AAMP|ENSOARG00000019500|transcript|ENSOART00000021238.1|protein_coding||c.-3304G>C|||||2701|,G|downstream_gene_variant|MODIFIER|TMBIM1|ENSOARG00000019527|transcript|ENSOART00000021268.1|protein_coding||c.*3952G>C|||||3256|;Cases=1,0,2;Controls=0,0,0;CC_TREND=1.573e-01;CC_GENO=NaN;CC_ALL=1.667e-01;CC_DOM=5.000e-01;CC_REC=5.000e-01
2 219420809 . A AGTGGCTCGGCCACTTCTTTGGGGGCGACTGCTACCTGCTGCTC 1449.2 . AC=2;AF=1.00;AN=2;DP=26;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=58.83;QD=27.93;SOR=0.941;ANN=AGTGGCTCGGCCACTTCTTTGGGGGCGACTGCTACCTGCTGCTC|frameshift_variant&splice_region_variant|HIGH|VIL1|ENSOARG00000019575|transcript|ENSOART00000021318.1|protein_coding|12/20|c.1246-1_1246insGTGGCTCGGCCACTTCTTTGGGGGCGACTGCTACCTGCTGCTC|p.Tyr416fs|1257/2450|1246/2439|416/812||;LOF=(VIL1|ENSOARG00000019575|1|1.00);Cases=1,0,2;Controls=0,0,0;CC_TREND=NaN;CC_GENO=NaN;CC_ALL=1.000e+00;CC_DOM=1.000e+00;CC_REC=1.000e+00
2 219505828 . TG T 959.4 . AC=2;AF=0.500;AN=4;DP=75;ExcessHet=1.5490;FS=0.000;MLEAC=3;MLEAF=0.750;MQ=60.24;QD=23.40;SOR=0.954;ANN=T|splice_region_variant&intron_variant|LOW|USP37|ENSOARG00000019601|transcript|ENSOART00000021347.1|protein_coding|3/23|c.329-6delC||||||;Cases=1,0,2;Controls=0,0,0;CC_TREND=1.573e-01;CC_GENO=NaN;CC_ALL=1.667e-01;CC_DOM=5.000e-01;CC_REC=5.000e-01
2 219592414 . C CCGGGCAGCAAAGT 1939.2 . AC=2;AF=1.00;AN=2;BaseQRankSum=-3.690e-01;ClippingRankSum=0.00;DP=47;ExcessHet=3.0103;FS=3.745;MLEAC=2;MLEAF=1.00;MQ=58.53;MQRankSum=0.00;QD=28.47;ReadPosRankSum=1.28;SOR=0.820;ANN=CCGGGCAGCAAAGT|splice_acceptor_variant&splice_donor_variant&intron_variant|HIGH|ZNF142|ENSOARG00000019647|transcript|ENSOART00000021396.1|protein_coding|8/10|c.4527+1_4528-1insACTTTGCTGCCCG||||||WARNING_TRANSCRIPT_NO_START_CODON,CCGGGCAGCAAAGT|downstream_gene_variant|MODIFIER|PLCD4|ENSOARG00000019627|transcript|ENSOART00000021375.1|protein_coding||c.*4829_*4830insCGGGCAGCAAAGT|||||4830|;LOF=(ZNF142|ENSOARG00000019647|1|1.00);Cases=1,0,2;Controls=0,0,0;CC_TREND=NaN;CC_GENO=NaN;CC_ALL=1.000e+00;CC_DOM=1.000e+00;CC_REC=1.000e+00
2 219592417 . C CAGAG 1671.2 . AC=2;AF=1.00;AN=2;DP=39;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=58.22;QD=30.88;SOR=0.859;ANN=CAGAG|frameshift_variant&splice_region_variant|HIGH|ZNF142|ENSOARG00000019647|transcript|ENSOART00000021396.1|protein_coding|8/11|c.4525_4526insCTCT|p.Gly1509fs|4685/5581|4525/5421|1509/1806||WARNING_TRANSCRIPT_NO_START_CODON,CAGAG|downstream_gene_variant|MODIFIER|PLCD4|ENSOARG00000019627|transcript|ENSOART00000021375.1|protein_coding||c.*4832_*4833insAGAG|||||4833|;LOF=(ZNF142|ENSOARG00000019647|1|1.00);Cases=1,0,2;Controls=0,0,0;CC_TREND=NaN;CC_GENO=NaN;CC_ALL=1.000e+00;CC_DOM=1.000e+00;CC_REC=1.000e+00
2 219592419 . T TGGCAGCGGTGGCTG 1629.2 . AC=2;AF=1.00;AN=2;DP=39;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=58.22;QD=31.23;SOR=0.859;ANN=TGGCAGCGGTGGCTG|frameshift_variant|HIGH|ZNF142|ENSOARG00000019647|transcript|ENSOART00000021396.1|protein_coding|8/11|c.4523_4524insCAGCCACCGCTGCC|p.Leu1508fs|4683/5581|4523/5421|1508/1806||WARNING_TRANSCRIPT_NO_START_CODON,TGGCAGCGGTGGCTG|downstream_gene_variant|MODIFIER|PLCD4|ENSOARG00000019627|transcript|ENSOART00000021375.1|protein_coding||c.*4834_*4835insGGCAGCGGTGGCTG|||||4835|;LOF=(ZNF142|ENSOARG00000019647|1|1.00);Cases=1,0,2;Controls=0,0,0;CC_TREND=NaN;CC_GENO=NaN;CC_ALL=1.000e+00;CC_DOM=1.000e+00;CC_REC=1.000e+00
2 2195924… . A ATGCTGCTGCCTCCGGTGCTCGTC 1359.2 . AC=2;AF=1.00;AN=2;DP=32;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=57.83;QD=26.55;SOR=0.693;ANN=ATGCTGCTGCCTCCGGTGCTCGTC|frameshift_variant|HIGH|ZNF142|ENSOARG00000019647|transcript|ENSOART00000021396.1|protein_coding|8/11|c.4520_4521insGACGAGCACCGGAGGCAGCAGCA|p.Cys1507fs|4680/5581|4520/5421|1507/1806||WARNING_TRANSCRIPT_NO_START_CODON,ATGCTGCTGCCTCCGGTGCTCGTC|downstream_gene_variant|MODIFIER|PLCD4|ENSOARG00000019627|transcript|ENSOART00000021375.1|protein_coding||c.*4837_*4838insTGCTGCTGCCTCCGGTGCTCGTC|||||4838|;LOF=(ZNF142|ENSOARG00000019647|1|1.00);Cases=1,0,2;Controls=0,0,0;CC_TREND=NaN;CC_GENO=NaN;CC_ALL=1.000e+00;CC_DOM=1.000e+00;CC_REC=1.000e+00

2 2 2 1 9 5 AC=2;AF=1.00;AN=2;DP=25;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=57.20;QD=31.39;SOR=0.963;ANN 9 =AGTGCCGGCCGGC|conservative_inframe_insertion|MODERATE|ZNF142|ENSOARG00000019647|transcript|ENSOART00 2 000021396.1|protein_coding|8/11|c.4518_4519insGCCGGCCGGCAC|p.Leu1506_Cys1507insAlaGlyArgHis|4678/5581|4518/54 4 21|1506/1806||WARNING_TRANSCRIPT_NO_START_CODON,AGTGCCGGCCGGC|downstream_gene_variant|MODIFIER |PLCD4|ENSOARG00000019627|transcript|ENSOART00000021375.1|protein_coding||c.*4839_*4840insGTGCCGGCCGGC||||| 2 4840|;Cases=1,0,2;Controls=0,0,0;CC_TREND=NaN;CC_GENO=NaN;CC_ALL=1.000e+00;CC_DOM=1.000e+00;CC_REC= 2 4 . A AGTGCCGGCCGGC 1314.2 . 1.000e+00 2 1 9 5 9 AC=2;AF=1.00;AN=2;DP=21;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=56.65;QD=28.44;SOR=1.022;ANN 2 =AGGG|conservative_inframe_insertion|MODERATE|ZNF142|ENSOARG00000019647|transcript|ENSOART00000021396.1|p 4 rotein_coding|8/11|c.4516_4517insCCC|p.Leu1505_Leu1506insSer|4676/5581|4516/5421|1506/1806||WARNING_TRANSCRIP T_NO_START_CODON,AGGG|downstream_gene_variant|MODIFIER|PLCD4|ENSOARG00000019627|transcript|ENSOART 2 00000021375.1|protein_coding||c.*4841_*4842insGGG|||||4842|;Cases=1,0,2;Controls=0,0,0;CC_TREND=NaN;CC_GENO=Na 2 6 . A AGGG 1269.2 . N;CC_ALL=1.000e+00;CC_DOM=1.000e+00;CC_REC=1.000e+00 2 1 9 5 AC=2;AF=1.00;AN=2;DP=20;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=56.48;QD=27.25;SOR=0.914;ANN 9 =AAGGCCTCCTGG|frameshift_variant|HIGH|ZNF142|ENSOARG00000019647|transcript|ENSOART00000021396.1|protein_c 2 oding|8/11|c.4515_4516insCCAGGAGGCCT|p.Leu1506fs|4675/5581|4515/5421|1505/1806||WARNING_TRANSCRIPT_NO_S 4 TART_CODON,AAGGCCTCCTGG|downstream_gene_variant|MODIFIER|PLCD4|ENSOARG00000019627|transcript|ENSOA RT00000021375.1|protein_coding||c.*4842_*4843insAGGCCTCCTGG|||||4843|;LOF=(ZNF142|ENSOARG00000019647|1|1.00) 2 ;Cases=1,0,2;Controls=0,0,0;CC_TREND=NaN;CC_GENO=NaN;CC_ALL=1.000e+00;CC_DOM=1.000e+00;CC_REC=1.000 2 7 . A AAGGCCTCCTGG 1089.2 . 
e+00 2 1 9 5 9 AC=2;AF=1.00;AN=2;DP=21;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=56.46;QD=27.64;SOR=1.022;ANN 2 =AG|frameshift_variant|HIGH|ZNF142|ENSOARG00000019647|transcript|ENSOART00000021396.1|protein_coding|8/11|c.451 4 3dupC|p.Leu1505fs|4673/5581|4513/5421|1505/1806||WARNING_TRANSCRIPT_NO_START_CODON,AG|downstream_gene _variant|MODIFIER|PLCD4|ENSOARG00000019627|transcript|ENSOART00000021375.1|protein_coding||c.*4844_*4845insG| 2 ||||4845|;LOF=(ZNF142|ENSOARG00000019647|1|1.00);Cases=1,0,2;Controls=0,0,0;CC_TREND=NaN;CC_GENO=NaN;CC_ 2 9 . A AG 1134.2 . ALL=1.000e+00;CC_DOM=1.000e+00;CC_REC=1.000e+00 2 1 9 5 9 AC=2;AF=1.00;AN=2;DP=22;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=56.62;QD=28.14;SOR=1.127;ANN 2 =ACACTCG|conservative_inframe_insertion|MODERATE|ZNF142|ENSOARG00000019647|transcript|ENSOART0000002139 4 6.1|protein_coding|8/11|c.4507_4508insCGAGTG|p.Leu1503delinsSerSerVal|4667/5581|4507/5421|1503/1806||WARNING_TR ANSCRIPT_NO_START_CODON,ACACTCG|downstream_gene_variant|MODIFIER|PLCD4|ENSOARG00000019627|transcr 3 ipt|ENSOART00000021375.1|protein_coding||c.*4850_*4851insCACTCG|||||4851|;Cases=1,0,2;Controls=0,0,0;CC_TREND=Na 2 5 . A ACACTCG 909.16 . N;CC_GENO=NaN;CC_ALL=1.000e+00;CC_DOM=1.000e+00;CC_REC=1.000e+00 2 AC=2;AF=1.00;AN=2;DP=21;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=56.46;QD=29.91;SOR=1.292;ANN =GCC|frameshift_variant|HIGH|ZNF142|ENSOARG00000019647|transcript|ENSOART00000021396.1|protein_coding|8/11|c.45 1 05_4506insGG|p.Leu1503fs|4665/5581|4505/5421|1502/1806||WARNING_TRANSCRIPT_NO_START_CODON,GCC|downstr 2 9 . G GCC 909.16 . eam_gene_variant|MODIFIER|PLCD4|ENSOARG00000019627|transcript|ENSOART00000021375.1|protein_coding||c.*4852_*

5 4853insCC|||||4853|;LOF=(ZNF142|ENSOARG00000019647|1|1.00);Cases=1,0,2;Controls=0,0,0;CC_TREND=NaN;CC_GENO 9 =NaN;CC_ALL=1.000e+00;CC_DOM=1.000e+00;CC_REC=1.000e+00 2 4 3 7 2 1 9 5 AC=2;AF=1.00;AN=2;DP=20;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=56.27;QD=25.52;SOR=1.179;ANN 9 =GGGTGCTGCTTCCGGGTGTGCCCCC|disruptive_inframe_insertion|MODERATE|ZNF142|ENSOARG00000019647|transcr 2 ipt|ENSOART00000021396.1|protein_coding|8/11|c.4502_4503insGGGGGCACACCCGGAAGCAGCACC|p.Ile1501delinsMet 4 GlyAlaHisProGluAlaAlaPro|4662/5581|4502/5421|1501/1806||WARNING_TRANSCRIPT_NO_START_CODON,GGGTGCT GCTTCCGGGTGTGCCCCC|downstream_gene_variant|MODIFIER|PLCD4|ENSOARG00000019627|transcript|ENSOART000 4 00021375.1|protein_coding||c.*4855_*4856insGGTGCTGCTTCCGGGTGTGCCCCC|||||4856|;Cases=1,0,2;Controls=0,0,0;CC_ 2 0 . G GGGTGCTGCTTCCGGGTGTGCCCCC 1044.2 . TREND=NaN;CC_GENO=NaN;CC_ALL=1.000e+00;CC_DOM=1.000e+00;CC_REC=1.000e+00 2 1 9 5 9 AC=2;AF=1.00;AN=2;DP=18;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=56.34;QD=33.22;SOR=1.244;ANN 2 =G|missense_variant|MODERATE|ZNF142|ENSOARG00000019647|transcript|ENSOART00000021396.1|protein_coding|8/11|c 4 .4502T>C|p.Ile1501Thr|4662/5581|4502/5421|1501/1806||WARNING_TRANSCRIPT_NO_START_CODON,G|downstream_ge ne_variant|MODIFIER|PLCD4|ENSOARG00000019627|transcript|ENSOART00000021375.1|protein_coding||c.*4856A>G|||||48 4 56|;Cases=1,0,2;Controls=0,0,0;CC_TREND=NaN;CC_GENO=NaN;CC_ALL=1.000e+00;CC_DOM=1.000e+00;CC_REC=1.0 2 1 . A G 1053.2 . 
00e+00 2 1 9 5 AC=2;AF=1.00;AN=2;DP=18;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=56.34;QD=31.21;SOR=1.244;ANN =AGGCTGGCGGGGCTGGCGCACAGCAGCCCGC|disruptive_inframe_insertion|MODERATE|ZNF142|ENSOARG0000001 9 9647|transcript|ENSOART00000021396.1|protein_coding|8/11|c.4499_4500insGCGGGCTGCTGTGCGCCAGCCCCGCCAGC 2 C|p.Leu1500_Ile1501insArgAlaAlaValArgGlnProArgGlnPro|4659/5581|4499/5421|1500/1806||WARNING_TRANSCRIPT_NO 4 _START_CODON,AGGCTGGCGGGGCTGGCGCACAGCAGCCCGC|downstream_gene_variant|MODIFIER|PLCD4|ENSOA RG00000019627|transcript|ENSOART00000021375.1|protein_coding||c.*4858_*4859insGGCTGGCGGGGCTGGCGCACAGC 4 AGGCTGGCGGGGCTGGCGCACAGCAGC AGCCCGC|||||4859|;Cases=1,0,2;Controls=0,0,0;CC_TREND=NaN;CC_GENO=NaN;CC_ALL=1.000e+00;CC_DOM=1.000e+ 2 3 . A CCGC 1089.2 . 00;CC_REC=1.000e+00 2 1 9 5 9 AC=2;AF=1.00;AN=2;DP=27;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=57.58;QD=32.63;SOR=1.358;ANN 2 =G|frameshift_variant|HIGH|ZNF142|ENSOARG00000019647|transcript|ENSOART00000021396.1|protein_coding|8/11|c.4497 4 delA|p.Ile1501fs|4657/5581|4497/5421|1499/1806||WARNING_TRANSCRIPT_NO_START_CODON,G|downstream_gene_vari ant|MODIFIER|PLCD4|ENSOARG00000019627|transcript|ENSOART00000021375.1|protein_coding||c.*4861delT|||||4861|;LOF 4 =(ZNF142|ENSOARG00000019647|1|1.00);Cases=1,0,2;Controls=0,0,0;CC_TREND=NaN;CC_GENO=NaN;CC_ALL=1.000e 2 5 . GT G 1134.2 . +00;CC_DOM=1.000e+00;CC_REC=1.000e+00 2 1 9 AC=2;AF=1.00;AN=2;DP=28;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=57.67;QD=38.84;SOR=1.445;ANN 5 =GGTGCAGTGC|disruptive_inframe_insertion|MODERATE|ZNF142|ENSOARG00000019647|transcript|ENSOART00000021 9 396.1|protein_coding|8/11|c.4493_4494insGCACTGCAC|p.Leu1498_Arg1499insHisCysThr|4653/5581|4493/5421|1498/1806||W ARNING_TRANSCRIPT_NO_START_CODON,GGTGCAGTGC|downstream_gene_variant|MODIFIER|PLCD4|ENSOARG0 2 0000019627|transcript|ENSOART00000021375.1|protein_coding||c.*4864_*4865insGTGCAGTGC|||||4865|;Cases=1,0,2;Control 2 4 . G GGTGCAGTGC 1269.2 . 
s=0,0,0;CC_TREND=NaN;CC_GENO=NaN;CC_ALL=1.000e+00;CC_DOM=1.000e+00;CC_REC=1.000e+00

4 9 2 1 9 5 AC=2;AF=1.00;AN=2;BaseQRankSum=1.91;ClippingRankSum=0.00;DP=34;ExcessHet=3.0103;FS=7.754;MLEAC=2;MLEAF 9 =1.00;MQ=57.17;MQRankSum=-2.830e-01;QD=26.12;ReadPosRankSum=- 2 1.108e+00;SOR=0.962;ANN=C|missense_variant|MODERATE|ZNF142|ENSOARG00000019647|transcript|ENSOART0000002 4 1396.1|protein_coding|8/11|c.4486T>G|p.Cys1496Gly|4646/5581|4486/5421|1496/1806||WARNING_TRANSCRIPT_NO_STAR T_CODON,C|downstream_gene_variant|MODIFIER|PLCD4|ENSOARG00000019627|transcript|ENSOART00000021375.1|prot 5 ein_coding||c.*4872A>C|||||4872|;Cases=1,0,2;Controls=0,0,0;CC_TREND=NaN;CC_GENO=NaN;CC_ALL=1.000e+00;CC_D 2 7 . A C 1273.2 . OM=1.000e+00;CC_REC=1.000e+00 2 1 9 5 AC=2;AF=1.00;AN=2;BaseQRankSum=2.14;ClippingRankSum=0.00;DP=36;ExcessHet=3.0103;FS=7.014;MLEAC=2;MLEAF 9 =1.00;MQ=57.33;MQRankSum=-2.740e-01;QD=30.28;ReadPosRankSum=-8.070e- 2 01;SOR=0.669;ANN=C|missense_variant|MODERATE|ZNF142|ENSOARG00000019647|transcript|ENSOART00000021396.1| 4 protein_coding|8/11|c.4476T>G|p.Asp1492Glu|4636/5581|4476/5421|1492/1806||WARNING_TRANSCRIPT_NO_START_CO DON,C|downstream_gene_variant|MODIFIER|PLCD4|ENSOARG00000019627|transcript|ENSOART00000021375.1|protein_co 6 ding||c.*4882A>C|||||4882|;Cases=1,0,2;Controls=0,0,0;CC_TREND=NaN;CC_GENO=NaN;CC_ALL=1.000e+00;CC_DOM=1. 2 7 . A C 1363.2 . 000e+00;CC_REC=1.000e+00 2 1 9 5 AC=2;AF=1.00;AN=2;BaseQRankSum=2.38;ClippingRankSum=0.00;DP=36;ExcessHet=3.0103;FS=7.014;MLEAC=2;MLEAF 9 =1.00;MQ=57.33;MQRankSum=-2.740e- 2 01;QD=33.69;ReadPosRankSum=0.727;SOR=0.669;ANN=G|missense_variant|MODERATE|ZNF142|ENSOARG00000019647| 4 transcript|ENSOART00000021396.1|protein_coding|8/11|c.4472T>C|p.Val1491Ala|4632/5581|4472/5421|1491/1806||WARNIN G_TRANSCRIPT_NO_START_CODON,G|downstream_gene_variant|MODIFIER|PLCD4|ENSOARG00000019627|transcript| 7 ENSOART00000021375.1|protein_coding||c.*4886A>G|||||4886|;Cases=1,0,2;Controls=0,0,0;CC_TREND=NaN;CC_GENO=Na 2 1 . A G 1363.2 . 
N;CC_ALL=1.000e+00;CC_DOM=1.000e+00;CC_REC=1.000e+00 2 1 9 5 9 2 AC=2;AF=1.00;AN=2;BaseQRankSum=2.43;ClippingRankSum=0.00;DP=42;ExcessHet=3.0103;FS=3.284;MLEAC=2;MLEAF 6 =1.00;MQ=59.54;MQRankSum=0.340;QD=25.15;ReadPosRankSum=0.636;SOR=2.133;ANN=T|missense_variant|MODERAT E|ZNF142|ENSOARG00000019647|transcript|ENSOART00000021396.1|protein_coding|8/11|c.4327C>A|p.Pro1443Thr|4487/55 1 81|4327/5421|1443/1806||WARNING_TRANSCRIPT_NO_START_CODON;Cases=1,0,2;Controls=0,0,0;CC_TREND=NaN;C 2 6 . G T 1031.2 . C_GENO=NaN;CC_ALL=1.000e+00;CC_DOM=1.000e+00;CC_REC=1.000e+00 2 1 9 5 9 2 AC=2;AF=0.500;AN=4;DP=39;ExcessHet=0.7918;FS=0.000;MLEAC=3;MLEAF=0.750;MQ=60.28;QD=30.71;SOR=1.863;AN 6 N=C|frameshift_variant|HIGH|ZNF142|ENSOARG00000019647|transcript|ENSOART00000021396.1|protein_coding|8/11|c.430 3_4315delAGCGGCTCCCCCC|p.Ser1435fs|4475/5581|4303/5421|1435/1806||WARNING_TRANSCRIPT_NO_START_COD 2 CGGGGGGAG ON;LOF=(ZNF142|ENSOARG00000019647|1|1.00);Cases=1,0,2;Controls=0,0,0;CC_TREND=1.573e- 2 7 . CCGCT C 1655.5 . 01;CC_GENO=NaN;CC_ALL=1.667e-01;CC_DOM=5.000e-01;CC_REC=5.000e-01 2 AC=2;AF=1.00;AN=2;DP=38;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=59.49;QD=26.41;SOR=1.802;ANN 1 =G|frameshift_variant|HIGH|ZNF142|ENSOARG00000019647|transcript|ENSOART00000021396.1|protein_coding|8/11|c.4296 2 9 . GCA G 1690.2 . _4297delTG|p.Ala1433fs|4457/5581|4296/5421|1432/1806||WARNING_TRANSCRIPT_NO_START_CODON;LOF=(ZNF142|

5 ENSOARG00000019647|1|1.00);Cases=1,0,2;Controls=0,0,0;CC_TREND=NaN;CC_GENO=NaN;CC_ALL=1.000e+00;CC_D 9 OM=1.000e+00;CC_REC=1.000e+00 2 6 4 5 2 1 9 5 9 2 6 AC=2;AF=1.00;AN=2;DP=39;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=59.51;QD=27.49;SOR=1.670;ANN =T|missense_variant|MODERATE|ZNF142|ENSOARG00000019647|transcript|ENSOART00000021396.1|protein_coding|8/11|c 4 .4295G>A|p.Cys1432Tyr|4455/5581|4295/5421|1432/1806||WARNING_TRANSCRIPT_NO_START_CODON;Cases=1,0,2;Co 2 8 . C T 1670.2 . ntrols=0,0,0;CC_TREND=NaN;CC_GENO=NaN;CC_ALL=1.000e+00;CC_DOM=1.000e+00;CC_REC=1.000e+00 2 1 9 6 1 4 AC=2;AF=1.00;AN=2;DP=36;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=58.31;QD=29.77;SOR=1.418;ANN 9 =G|frameshift_variant|HIGH|RNF25|ENSOARG00000019664|transcript|ENSOART00000021415.1|protein_coding|7/10|c.552del G|p.Glu184fs|552/1380|552/1380|184/459||,G|3_prime_UTR_variant|MODIFIER|BCS1L|ENSOARG00000019649|transcript|EN 9 SOART00000021399.1|protein_coding|7/7|c.*2981delC|||||2981|;LOF=(RNF25|ENSOARG00000019664|1|1.00);Cases=1,0,2;Co 2 2 . GC G 1477.2 . ntrols=0,0,0;CC_TREND=NaN;CC_GENO=NaN;CC_ALL=1.000e+00;CC_DOM=1.000e+00;CC_REC=1.000e+00 2 1 9 6 1 AC=2;AF=1.00;AN=2;DP=36;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=58.31;QD=31.83;SOR=1.418;ANN 4 =T|frameshift_variant|HIGH|RNF25|ENSOARG00000019664|transcript|ENSOART00000021415.1|protein_coding|7/10|c.549_5 9 50delGG|p.Glu184fs|550/1380|549/1380|183/459||,T|3_prime_UTR_variant|MODIFIER|BCS1L|ENSOARG00000019649|transcr ipt|ENSOART00000021399.1|protein_coding|7/7|c.*2983_*2984delCC|||||2983|;LOF=(RNF25|ENSOARG00000019664|1|1.00); 9 Cases=1,0,2;Controls=0,0,0;CC_TREND=NaN;CC_GENO=NaN;CC_ALL=1.000e+00;CC_DOM=1.000e+00;CC_REC=1.000e 2 4 . TCC T 1477.2 . 
+00 2 1 9 6 AC=2;AF=1.00;AN=2;DP=46;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=59.34;QD=25.96;SOR=0.737;ANN =T|missense_variant&splice_region_variant|MODERATE|TTLL4|ENSOARG00000019690|transcript|ENSOART00000021445.1 7 |protein_coding|1/20|c.164G>T|p.Arg55Leu|164/6263|164/3597|55/1198||,T|5_prime_UTR_premature_start_codon_gain_variant| 6 LOW|TTLL4|ENSOARG00000019690|transcript|ENSOART00000021443.1|protein_coding|1/18|c.- 0 392G>T||||||,T|5_prime_UTR_variant|MODIFIER|TTLL4|ENSOARG00000019690|transcript|ENSOART00000021443.1|protein_ coding|1/18|c.- 7 392G>T|||||392|;Cases=1,0,2;Controls=0,0,0;CC_TREND=NaN;CC_GENO=NaN;CC_ALL=1.000e+00;CC_DOM=1.000e+00;C 2 7 . G T 1985.2 . C_REC=1.000e+00 2 1 9 AC=2;AF=1.00;AN=2;BaseQRankSum=-4.620e- 7 01;ClippingRankSum=0.00;DP=23;ExcessHet=3.0103;FS=2.521;MLEAC=2;MLEAF=1.00;MQ=58.72;MQRankSum=-4.670e- 3 01;QD=32.22;ReadPosRankSum=2.32;SOR=0.929;ANN=CTGTTCTGTCTCGGGGCAAAT|frameshift_variant|HIGH|CYP27A 1|ENSOARG00000019716|transcript|ENSOART00000021471.1|protein_coding|1/9|c.3_4insGTTCTGTCTCGGGGCAAATT|p. 4 Pro2fs|575/2122|4/1551|2/516||WARNING_TRANSCRIPT_NO_START_CODON&INFO_REALIGN_3_PRIME;Cases=1,0,2; 2 8 . C CTGTTCTGTCTCGGGGCAAAT 846.16 . Controls=0,0,0;CC_TREND=NaN;CC_GENO=NaN;CC_ALL=1.000e+00;CC_DOM=1.000e+00;CC_REC=1.000e+00

7 3 2 1 9 7 AC=2;AF=1.00;AN=2;DP=16;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=58.16;QD=29.75;SOR=0.941;ANN =CAGGCTCTCTGGCTCGCGAGCCAAAGGCGCGGCAGGGCGAGCCATGAACCGAATGGGCGCTCTGGGCTCCGCGA 3 CAGGCTCTCTGGCTCGCGAGCCAAAGG GGTTGCGGTTGGCGCTGCTGGGGACGCGGGCGGCACT|frameshift_variant&stop_gained|HIGH|CYP27A1|ENSOARG00 4 CGCGGCAGGGCGAGCCATGAACCGAAT 000019716|transcript|ENSOART00000021471.1|protein_coding|1/9|c.5_6insAGGCTCTCTGGCTCGCGAGCCAAAGGCGCG 8 GGGCGCTCTGGGCTCCGCGAGGTTGCG GCAGGGCGAGCCATGAACCGAATGGGCGCTCTGGGCTCCGCGAGGTTGCGGTTGGCGCTGCTGGGGACGCGGGCG GCACT|p.Pro3fs|577/2122|6/1551|2/516||WARNING_TRANSCRIPT_NO_START_CODON;LOF=(CYP27A1|ENSOARG0000 7 GTTGGCGCTGCTGGGGACGCGGGCGGC 0019716|1|1.00);NMD=(CYP27A1|ENSOARG00000019716|1|1.00);Cases=1,0,2;Controls=0,0,0;CC_TREND=NaN;CC_GENO 2 6 . C ACT 2100.2 . =NaN;CC_ALL=1.000e+00;CC_DOM=1.000e+00;CC_REC=1.000e+00 2 1 9 7 3 4 8 AC=2;AF=1.00;AN=2;DP=32;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=31.84;SOR=1.284;ANN =T|synonymous_variant|LOW|CYP27A1|ENSOARG00000019716|transcript|ENSOART00000021471.1|protein_coding|1/9|c.9G 8 >T|p.Pro3Pro|580/2122|9/1551|3/516||WARNING_TRANSCRIPT_NO_START_CODON;Cases=1,0,2;Controls=0,0,0;CC_TRE 2 0 . G T 1389.2 . ND=NaN;CC_GENO=NaN;CC_ALL=1.000e+00;CC_DOM=1.000e+00;CC_REC=1.000e+00 2 1 9 8 2 AC=2;AF=1.00;AN=2;BaseQRankSum=2.68;ClippingRankSum=0.00;DP=45;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF 4 =1.00;MQ=60.00;MQRankSum=0.00;QD=26.68;ReadPosRankSum=3.53;SOR=0.427;ANN=CTA|frameshift_variant&splice_re 7 gion_variant|HIGH|WNT6|ENSOARG00000019734|transcript|ENSOART00000021490.1|protein_coding|6/7|c.796_797insAT|p. Trp266fs|797/939|797/939|266/312||WARNING_TRANSCRIPT_NO_START_CODON&INFO_REALIGN_3_PRIME;LOF=( 0 WNT6|ENSOARG00000019734|1|1.00);Cases=1,0,2;Controls=0,0,0;CC_TREND=NaN;CC_GENO=NaN;CC_ALL=1.000e+00; 2 4 . C CTA 1841.2 . 
CC_DOM=1.000e+00;CC_REC=1.000e+00 2 1 9 8 2 4 AC=2;AF=1.00;AN=2;DP=41;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=31.03;SOR=0.843;ANN 7 =GCCGCCGACTCGCCCGACTTCT|disruptive_inframe_insertion&splice_region_variant|MODERATE|WNT6|ENSOARG000 00019734|transcript|ENSOART00000021490.1|protein_coding|6/7|c.797_798insCCGCCGACTCGCCCGACTTCT|p.Trp266deli 0 nsCysArgArgLeuAlaArgLeuLeu|798/939|798/939|266/312||WARNING_TRANSCRIPT_NO_START_CODON;Cases=1,0,2;Co 2 6 . G GCCGCCGACTCGCCCGACTTCT 1843.2 . ntrols=0,0,0;CC_TREND=NaN;CC_GENO=NaN;CC_ALL=1.000e+00;CC_DOM=1.000e+00;CC_REC=1.000e+00 2 1 9 8 2 4 AC=2;AF=1.00;AN=2;DP=36;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=25.95;SOR=0.804;ANN 7 =CCAA|conservative_inframe_insertion|MODERATE|WNT6|ENSOARG00000019734|transcript|ENSOART00000021490.1|pro tein_coding|7/7|c.804_805insAAC|p.Pro268_Arg269insAsn|805/939|805/939|269/312||WARNING_TRANSCRIPT_NO_START 1 _CODON&INFO_REALIGN_3_PRIME;Cases=1,0,2;Controls=0,0,0;CC_TREND=NaN;CC_GENO=NaN;CC_ALL=1.000e+0 2 3 . C CCAA 2664.2 . 0;CC_DOM=1.000e+00;CC_REC=1.000e+00 2 GGCGCACGGGTTCGCCGGGCACCCGCG AC=2;AF=1.00;AN=2;DP=58;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=56.36;QD=24.87;SOR=0.693;ANN =GGCGCACGGGTTCGCCGGGCACCCGCGGCCGCGCCTGCAACAGCAGCGCCCCGGACCTCAGCGGCTGCGACCT|di 1 GCCGCGCCTGCAACAGCAGCGCCCCGG sruptive_inframe_insertion|MODERATE|WNT6|ENSOARG00000019734|transcript|ENSOART00000021490.1|protein_coding|7 2 9 . G ACCTCAGCGGCTGCGACCT 3732.2 . /7|c.808_809insGCACGGGTTCGCCGGGCACCCGCGGCCGCGCCTGCAACAGCAGCGCCCCGGACCTCAGCGGCTGCG

8 ACCTGC|p.Arg269_Leu270insArgThrGlySerProGlyThrArgGlyArgAlaCysAsnSerSerAlaProAspLeuSerGlyCysAspLeu|809/939| 809/939|270/312||WARNING_TRANSCRIPT_NO_START_CODON&INFO_REALIGN_3_PRIME;Cases=1,0,2;Controls=0,0, 2 0;CC_TREND=NaN;CC_GENO=NaN;CC_ALL=1.000e+00;CC_DOM=1.000e+00;CC_REC=1.000e+00 4 7 1 6 2 1 9 8 2 4 7 AC=2;AF=1.00;AN=2;DP=52;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=55.93;QD=36.76;SOR=1.136;ANN =T|missense_variant|MODERATE|WNT6|ENSOARG00000019734|transcript|ENSOART00000021490.1|protein_coding|7/7|c.81 2 4G>T|p.Gly272Cys|814/939|814/939|272/312||WARNING_TRANSCRIPT_NO_START_CODON;Cases=1,0,2;Controls=0,0,0; 2 4 . G T 2301.2 . CC_TREND=NaN;CC_GENO=NaN;CC_ALL=1.000e+00;CC_DOM=1.000e+00;CC_REC=1.000e+00 2 1 9 8 2 4 7 AC=2;AF=1.00;AN=2;DP=52;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=55.93;QD=29.57;SOR=1.136;ANN =C|missense_variant|MODERATE|WNT6|ENSOARG00000019734|transcript|ENSOART00000021490.1|protein_coding|7/7|c.8 3 20G>C|p.Gly274Arg|820/939|820/939|274/312||WARNING_TRANSCRIPT_NO_START_CODON;Cases=1,0,2;Controls=0,0,0 2 0 . G C 2299.2 . 
;CC_TREND=NaN;CC_GENO=NaN;CC_ALL=1.000e+00;CC_DOM=1.000e+00;CC_REC=1.000e+00 AC=2;AF=1.00;AN=2;BaseQRankSum=1.53;ClippingRankSum=0.00;DP=9;ExcessHet=3.0103;FS=4.260;MLEAC=2;MLEAF= 1.00;MQ=57.28;MQRankSum=-1.570e-01;QD=33.30;ReadPosRankSum=- 1.534e+00;SOR=1.447;ANN=TCGCCTGACTTCTGCGAGCGGGAGCCGCGCCTGGACTCGGCGGGCACCGTGGGCCGCC 2 TGTGCAACAAGAGCAGCGCGGGCCCCGACGGCTGCGGCAGCATGTGCTGCGGCCGCGGCCACAACATCCTTCGGC AGACA|frameshift_variant|HIGH|WNT10A|ENSOARG00000019751|transcript|ENSOART00000021508.1|protein_coding|5/5|c 1 .1116_1117insCTGACTTCTGCGAGCGGGAGCCGCGCCTGGACTCGGCGGGCACCGTGGGCCGCCTGTGCAACAAGAG 9 CAGCGCGGGCCCCGACGGCTGCGGCAGCATGTGCTGCGGCCGCGGCCACAACATCCTTCGGCAGACACGC|p.Ser373 8 TCGCCTGACTTCTGCGAGCGGGAGCCG fs|1117/1206|1117/1206|373/401||INFO_REALIGN_3_PRIME,TCGCCTGACTTCTGCGAGCGGGAGCCGCGCCTGGACTC GGCGGGCACCGTGGGCCGCCTGTGCAACAAGAGCAGCGCGGGCCCCGACGGCTGCGGCAGCATGTGCTGCGGCCG 4 CGCCTGGACTCGGCGGGCACCGTGGGC CGGCCACAACATCCTTCGGCAGACA|intron_variant|MODIFIER|ENSOARG00000025849|ENSOARG00000025849|transcri 4 CGCCTGTGCAACAAGAGCAGCGCGGGC pt|ENSOART00000027814.1|lincRNA|1/1|n.248-5900_248- 8 CCCGACGGCTGCGGCAGCATGTGCTGC 5899insCTGACTTCTGCGAGCGGGAGCCGCGCCTGGACTCGGCGGGCACCGTGGGCCGCCTGTGCAACAAGAGCAGC GCGGGCCCCGACGGCTGCGGCAGCATGTGCTGCGGCCGCGGCCACAACATCCTTCGGCAGACACGC||||||INFO_REAL 4 GGCCGCGGCCACAACATCCTTCGGCAG IGN_3_PRIME;LOF=(WNT10A|ENSOARG00000019751|1|1.00);Cases=1,0,2;Controls=0,0,0;CC_TREND=NaN;CC_GENO= 2 7 . T ACA 1618.2 . 
NaN;CC_ALL=1.000e+00;CC_DOM=1.000e+00;CC_REC=1.000e+00 2 2 0 1 AC=2;AF=1.00;AN=2;DP=36;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=24.79;SOR=0.927;ANN =T|missense_variant|MODERATE|FAM134A|ENSOARG00000019867|transcript|ENSOART00000021631.1|protein_coding|1/9| 0 c.236G>T|p.Arg79Leu|236/1650|236/1629|79/542||,T|upstream_gene_variant|MODIFIER|CNPPD1|ENSOARG00000019836|tran 6 script|ENSOART00000021598.1|protein_coding||c.- 4 2466C>A|||||2466|,T|upstream_gene_variant|MODIFIER|CNPPD1|ENSOARG00000019836|transcript|ENSOART00000021597.1 |protein_coding||c.- 5 2007C>A|||||2007|WARNING_TRANSCRIPT_NO_START_CODON;Cases=1,0,2;Controls=0,0,0;CC_TREND=NaN;CC_GEN 2 3 . G T 1545.2 . O=NaN;CC_ALL=1.000e+00;CC_DOM=1.000e+00;CC_REC=1.000e+00 2 AC=2;AF=1.00;AN=2;DP=35;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=27.64;SOR=1.002;ANN =A|missense_variant|MODERATE|FAM134A|ENSOARG00000019867|transcript|ENSOART00000021631.1|protein_coding|1/9| 2 c.241G>A|p.Ala81Thr|241/1650|241/1629|81/542||,A|upstream_gene_variant|MODIFIER|CNPPD1|ENSOARG00000019836|tran 0 script|ENSOART00000021598.1|protein_coding||c.- 1 2471C>T|||||2471|,A|upstream_gene_variant|MODIFIER|CNPPD1|ENSOARG00000019836|transcript|ENSOART00000021597.1 |protein_coding||c.- 0 2012C>T|||||2012|WARNING_TRANSCRIPT_NO_START_CODON;Cases=1,0,2;Controls=0,0,0;CC_TREND=NaN;CC_GEN 2 6 . G A 1517.2 . O=NaN;CC_ALL=1.000e+00;CC_DOM=1.000e+00;CC_REC=1.000e+00

4 5 8 2 2 0 1 AC=2;AF=1.00;AN=2;DP=44;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=59.37;QD=35.09;SOR=1.259;ANN 6 =AACCCTCAGACCCCCGACCCT|frameshift_variant|HIGH|ANKZF1|ENSOARG00000019955|transcript|ENSOART0000002 1 1724.1|protein_coding|12/13|c.2020_2021insACCCTCAGACCCCCGACCCT|p.Ser674fs|2049/2212|2021/2184|674/727||,AACC 9 CTCAGACCCCCGACCCT|downstream_gene_variant|MODIFIER|GLB1L|ENSOARG00000019976|transcript|ENSOART0000 0021748.1|protein_coding||c.*691_*692insAGGGTCGGGGGTCTGAGGGT|||||691|;LOF=(ANKZF1|ENSOARG00000019955|1| 1 1.00);Cases=1,0,2;Controls=0,0,0;CC_TREND=NaN;CC_GENO=NaN;CC_ALL=1.000e+00;CC_DOM=1.000e+00;CC_REC=1 2 6 . A AACCCTCAGACCCCCGACCCT 2086.2 . .000e+00 2 2 0 2 1 AC=2;AF=1.00;AN=2;DP=30;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.36;QD=31.27;SOR=1.765;ANN 0 =A|conservative_inframe_deletion|MODERATE|DNAJB2|ENSOARG00000020051|transcript|ENSOART00000021831.1|protein 5 _coding|8/9|c.730_741delGGGTCGGGGAGC|p.Gly244_Ser247del|730/975|730/975|244/324||,A|downstream_gene_variant|MO DIFIER|PTPRN|ENSOARG00000020084|transcript|ENSOART00000021867.1|protein_coding||c.*4599_*4610delGCTCCCCG 7 AGGGTCGGG ACCC|||||4610|WARNING_TRANSCRIPT_NO_START_CODON;Cases=1,0,2;Controls=0,0,0;CC_TREND=NaN;CC_GENO= 2 1 . GAGC A 1314.2 . NaN;CC_ALL=1.000e+00;CC_DOM=1.000e+00;CC_REC=1.000e+00 2 2 0 2 AC=2;AF=1.00;AN=2;DP=34;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.32;QD=29.31;SOR=1.819;ANN 1 =T|frameshift_variant&splice_donor_variant&splice_region_variant&intron_variant|HIGH|DNAJB2|ENSOARG00000020051|tr 0 anscript|ENSOART00000021831.1|protein_coding|8/9|c.747_750+2delGTGTCA|p.Cys250fs|747/975|747/975|249/324||,T|downs 5 tream_gene_variant|MODIFIER|PTPRN|ENSOARG00000020084|transcript|ENSOART00000021867.1|protein_coding||c.*4588_ *4593delTGACAC|||||4593|WARNING_TRANSCRIPT_NO_START_CODON;LOF=(DNAJB2|ENSOARG00000020051|1|1.00 8 );Cases=1,0,2;Controls=0,0,0;CC_TREND=NaN;CC_GENO=NaN;CC_ALL=1.000e+00;CC_DOM=1.000e+00;CC_REC=1.000 2 8 . TGTGTCA T 1494.2 . 
e+00 2 2 0 2 1 AC=2;AF=1.00;AN=2;DP=31;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.35;QD=32.70;SOR=1.609;ANN 0 =A|splice_region_variant&intron_variant|LOW|DNAJB2|ENSOARG00000020051|transcript|ENSOART00000021831.1|protein_ 5 coding|8/8|c.750+6_750+12delCCCCGCG||||||,A|downstream_gene_variant|MODIFIER|PTPRN|ENSOARG00000020084|transcri pt|ENSOART00000021867.1|protein_coding||c.*4578_*4584delCGCGGGG|||||4584|WARNING_TRANSCRIPT_NO_START_ 9 CODON;Cases=1,0,2;Controls=0,0,0;CC_TREND=NaN;CC_GENO=NaN;CC_ALL=1.000e+00;CC_DOM=1.000e+00;CC_RE 2 7 . ACCCCGCG A 1494.2 . C=1.000e+00 2 2 0 2 AC=2;AF=0.500;AN=4;BaseQRankSum=2.78;ClippingRankSum=0.00;DP=34;ExcessHet=3.0103;FS=0.000;MLEAC=3;MLEA F=0.750;MQ=60.00;MQRankSum=0.00;QD=33.18;ReadPosRankSum=- 2 2.691e+00;SOR=1.382;ANN=TCCAAGAGAGGGGGCAGCAGCTCAGCCTGCAAAGGGGACAGGGGGGAGCCCGCCCC 8 TCCAAGAGAGGGGGCAGCAGCTCAGCC AGCTCCTCCCCCACCTACTGAAGGTCGAGGAAGC|frameshift_variant&splice_region_variant|HIGH|PTPRN|ENSOARG0 0 TGCAAAGGGGACAGGGGGGAGCCCGC 0000020084|transcript|ENSOART00000021867.1|protein_coding|5/24|c.479_480insGCTTCCTCGACCTTCAGTAGGTGGGG GAGGAGCTGGGGCGGGCTCCCCCCTGTCCCCTTTGCAGGCTGAGCTGCTGCCCCCTCTCTTGG|p.His161fs|479/2865|4 3 CCCAGCTCCTCCCCCACCTACTGAAGG 79/2865|160/954||WARNING_TRANSCRIPT_NO_START_CODON;LOF=(PTPRN|ENSOARG00000020084|1|1.00);Cases=1, 2 5 . T TCGAGGAAGC 2788.4 . 0,2;Controls=0,0,0;CC_TREND=1.573e-01;CC_GENO=NaN;CC_ALL=1.667e-01;CC_DOM=5.000e-01;CC_REC=5.000e-01 AC=2;AF=1.00;AN=2;DP=39;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=31.90;SOR=1.329;ANN 2 =GA|frameshift_variant|HIGH|DNPEP|ENSOARG00000020161|transcript|ENSOART00000021951.1|protein_coding|6/14|c.558 2 2 . G GA 1766.8 . _559insT|p.Leu187fs|558/1416|558/1416|186/471||WARNING_TRANSCRIPT_NO_START_CODON,GA|downstream_gene_va

0 riant|MODIFIER|ENSOARG00000023421|ENSOARG00000023421|transcript|ENSOART00000025323.1|miRNA||n.*2501_*25 02insA|||||2502|;LOF=(DNPEP|ENSOARG00000020161|1|1.00);Cases=1,0,2;Controls=0,0,0;CC_TREND=NaN;CC_GENO=Na 2 N;CC_ALL=1.000e+00;CC_DOM=1.000e+00;CC_REC=1.000e+00 8 9 8 9 5 2 2 0 2 8 AC=2;AF=1.00;AN=2;DP=39;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=26.09;SOR=1.387;ANN 9 =A|missense_variant|MODERATE|DNPEP|ENSOARG00000020161|transcript|ENSOART00000021951.1|protein_coding|6/14|c. 8 558G>T|p.Gln186His|558/1416|558/1416|186/471||WARNING_TRANSCRIPT_NO_START_CODON,A|downstream_gene_var iant|MODIFIER|ENSOARG00000023421|ENSOARG00000023421|transcript|ENSOART00000025323.1|miRNA||n.*2502C>A||| 9 ||2502|;Cases=1,0,2;Controls=0,0,0;CC_TREND=NaN;CC_GENO=NaN;CC_ALL=1.000e+00;CC_DOM=1.000e+00;CC_REC= 2 6 . C A 1775.8 . 1.000e+00 2 2 0 2 AC=2;AF=1.00;AN=2;DP=42;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=58.55;QD=27.51;SOR=1.445;ANN 8 =G|splice_acceptor_variant&disruptive_inframe_deletion&splice_region_variant&intron_variant|HIGH|DNPEP|ENSOARG0000 9 0020161|transcript|ENSOART00000021951.1|protein_coding|6/14|c.552- 8 3_554delAAGAAA|p.Leu184_Asn185delinsPhe|554/1416|552/1416|184/471||WARNING_TRANSCRIPT_NO_START_CODO N,G|downstream_gene_variant|MODIFIER|ENSOARG00000023421|ENSOARG00000023421|transcript|ENSOART0000002532 9 3.1|miRNA||n.*2506_*2511delTTTCTT|||||2506|;Cases=1,0,2;Controls=0,0,0;CC_TREND=NaN;CC_GENO=NaN;CC_ALL=1.0 2 9 . GTTTCTT G 1809.2 . 00e+00;CC_DOM=1.000e+00;CC_REC=1.000e+00 2 2 0 2 8 AC=2;AF=1.00;AN=2;DP=42;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=58.55;QD=32.88;SOR=1.445;ANN 9 =C|splice_region_variant&intron_variant|LOW|DNPEP|ENSOARG00000020161|transcript|ENSOART00000021951.1|protein_c 9 oding|5/13|c.552- 7C>G||||||WARNING_TRANSCRIPT_NO_START_CODON,C|downstream_gene_variant|MODIFIER|ENSOARG00000023421| 0 ENSOARG00000023421|transcript|ENSOART00000025323.1|miRNA||n.*2515G>C|||||2515|;Cases=1,0,2;Controls=0,0,0;CC_T 2 9 . G C 1818.2 . 
REND=NaN;CC_GENO=NaN;CC_ALL=1.000e+00;CC_DOM=1.000e+00;CC_REC=1.000e+00 2 2 0 2 8 AC=2;AF=0.500;AN=4;DP=42;ExcessHet=0.7918;FS=0.000;MLEAC=3;MLEAF=0.750;MQ=60.00;QD=29.90;SOR=1.445;AN 9 N=C|splice_region_variant&intron_variant|LOW|DNPEP|ENSOARG00000020161|transcript|ENSOART00000021951.1|protein_ 9 coding|5/13|c.552- 8C>G||||||WARNING_TRANSCRIPT_NO_START_CODON,C|downstream_gene_variant|MODIFIER|ENSOARG00000023421| 1 ENSOARG00000023421|transcript|ENSOART00000025323.1|miRNA||n.*2516G>C|||||2516|;Cases=1,0,2;Controls=0,0,0;CC_T 2 0 . G C 1769.4 . REND=1.573e-01;CC_GENO=NaN;CC_ALL=1.667e-01;CC_DOM=5.000e-01;CC_REC=5.000e-01 2 2 0 AC=2;AF=1.00;AN=2;DP=8;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=29.20;SOR=1.609;ANN= 2 G|missense_variant|MODERATE|DNPEP|ENSOARG00000020161|transcript|ENSOART00000021951.1|protein_coding|1/14|c.2 9 1A>C|p.Arg7Ser|21/1416|21/1416|7/471||WARNING_TRANSCRIPT_NO_START_CODON,G|downstream_gene_variant|MOD IFIER|ENSOARG00000023421|ENSOARG00000023421|transcript|ENSOART00000025323.1|miRNA||n.*3920T>G|||||3920|;Ca 1 ses=1,0,2;Controls=0,0,0;CC_TREND=NaN;CC_GENO=NaN;CC_ALL=1.000e+00;CC_DOM=1.000e+00;CC_REC=1.000e+0 2 3 . T G 272.23 . 0

1 4 2 2 0 2 9 1 AC=2;AF=1.00;AN=2;DP=6;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=29.01;SOR=1.329;ANN= 3 G|synonymous_variant|LOW|DNPEP|ENSOARG00000020161|transcript|ENSOART00000021951.1|protein_coding|1/14|c.19A> C|p.Arg7Arg|19/1416|19/1416|7/471||WARNING_TRANSCRIPT_NO_START_CODON,G|downstream_gene_variant|MODIFI 1 ER|ENSOARG00000023421|ENSOARG00000023421|transcript|ENSOART00000025323.1|miRNA||n.*3922T>G|||||3922|;Cases 2 6 . T G 272.23 . =1,0,2;Controls=0,0,0;CC_TREND=NaN;CC_GENO=NaN;CC_ALL=1.000e+00;CC_DOM=1.000e+00;CC_REC=1.000e+00 2 2 0 2 9 AC=2;AF=1.00;AN=2;DP=6;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=31.67;SOR=1.329;ANN= 1 C|missense_variant|MODERATE|DNPEP|ENSOARG00000020161|transcript|ENSOART00000021951.1|protein_coding|1/14|c.1 3 6A>G|p.Thr6Ala|16/1416|16/1416|6/471||WARNING_TRANSCRIPT_NO_START_CODON,C|downstream_gene_variant|MOD IFIER|ENSOARG00000023421|ENSOARG00000023421|transcript|ENSOART00000025323.1|miRNA||n.*3925T>C|||||3925|;Ca 1 ses=1,0,2;Controls=0,0,0;CC_TREND=NaN;CC_GENO=NaN;CC_ALL=1.000e+00;CC_DOM=1.000e+00;CC_REC=1.000e+0 2 9 . T C 243.26 . 0 2 2 0 2 9 AC=2;AF=1.00;AN=2;DP=6;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=30.56;SOR=1.329;ANN= 1 C|missense_variant|MODERATE|DNPEP|ENSOARG00000020161|transcript|ENSOART00000021951.1|protein_coding|1/14|c.1 3 4A>G|p.Glu5Gly|14/1416|14/1416|5/471||WARNING_TRANSCRIPT_NO_START_CODON,C|downstream_gene_variant|MOD IFIER|ENSOARG00000023421|ENSOARG00000023421|transcript|ENSOART00000025323.1|miRNA||n.*3927T>C|||||3927|;Ca 2 ses=1,0,2;Controls=0,0,0;CC_TREND=NaN;CC_GENO=NaN;CC_ALL=1.000e+00;CC_DOM=1.000e+00;CC_REC=1.000e+0 2 1 . T C 243.26 . 
0 2 2 0 2 9 AC=2;AF=1.00;AN=2;DP=6;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=27.70;SOR=1.329;ANN= 1 T|missense_variant|MODERATE|DNPEP|ENSOARG00000020161|transcript|ENSOART00000021951.1|protein_coding|1/14|c.1 3 3G>A|p.Glu5Lys|13/1416|13/1416|5/471||WARNING_TRANSCRIPT_NO_START_CODON,T|downstream_gene_variant|MOD IFIER|ENSOARG00000023421|ENSOARG00000023421|transcript|ENSOART00000025323.1|miRNA||n.*3928C>T|||||3928|;Ca 2 ses=1,0,2;Controls=0,0,0;CC_TREND=NaN;CC_GENO=NaN;CC_ALL=1.000e+00;CC_DOM=1.000e+00;CC_REC=1.000e+0 2 2 . C T 243.26 . 0 2 2 0 2 9 AC=2;AF=1.00;AN=2;DP=5;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=27.52;SOR=1.981;ANN= 1 C|missense_variant|MODERATE|DNPEP|ENSOARG00000020161|transcript|ENSOART00000021951.1|protein_coding|1/14|c.1 3 0A>G|p.Ser4Gly|10/1416|10/1416|4/471||WARNING_TRANSCRIPT_NO_START_CODON,C|downstream_gene_variant|MOD IFIER|ENSOARG00000023421|ENSOARG00000023421|transcript|ENSOART00000025323.1|miRNA||n.*3931T>C|||||3931|;Ca 2 ses=1,0,2;Controls=0,0,0;CC_TREND=NaN;CC_GENO=NaN;CC_ALL=1.000e+00;CC_DOM=1.000e+00;CC_REC=1.000e+0 2 5 . T C 243.26 . 0 2 AC=2;AF=1.00;AN=2;DP=5;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=27.82;SOR=1.981;ANN= 2 G|synonymous_variant|LOW|DNPEP|ENSOARG00000020161|transcript|ENSOART00000021951.1|protein_coding|1/14|c.3A> 2 0 . T G 198.33 . C|p.Ala1Ala|3/1416|3/1416|1/471||WARNING_TRANSCRIPT_NO_START_CODON,G|downstream_gene_variant|MODIFIER|


2 ENSOARG00000023421|ENSOARG00000023421|transcript|ENSOART00000025323.1|miRNA||n.*3938T>G|||||3938|;Cases=1, 9 0,2;Controls=0,0,0;CC_TREND=NaN;CC_GENO=NaN;CC_ALL=1.000e+00;CC_DOM=1.000e+00;CC_REC=1.000e+00 1 3 3 2 2 2 0 3 4 9 7 AC=2;AF=1.00;AN=2;DP=26;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=58.83;QD=24.01;SOR=0.770;ANN =C|missense_variant|MODERATE|ENSOARG00000020189|ENSOARG00000020189|transcript|ENSOART00000021983.1|prot 4 ein_coding|3/4|c.292G>C|p.Gly98Arg|292/592|292/591|98/196||WARNING_TRANSCRIPT_NO_STOP_CODON;Cases=1,0,2;C 2 7 . G C 600.2 . ontrols=0,0,0;CC_TREND=NaN;CC_GENO=NaN;CC_ALL=1.000e+00;CC_DOM=1.000e+00;CC_REC=1.000e+00 2 2 0 3 4 9 AC=2;AF=1.00;AN=2;DP=28;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=58.92;QD=25.53;SOR=0.770;ANN 7 =CGGGCAGCCCAAG|conservative_inframe_insertion|MODERATE|ENSOARG00000020189|ENSOARG00000020189|transcr ipt|ENSOART00000021983.1|protein_coding|3/4|c.303_304insGGCAGCCCAAGG|p.Pro101_Gln102insGlySerProArg|304/592| 5 304/591|102/196||WARNING_TRANSCRIPT_NO_STOP_CODON&INFO_REALIGN_3_PRIME;Cases=1,0,2;Controls=0,0,0; 2 7 . C CGGGCAGCCCAAG 1130.2 . CC_TREND=NaN;CC_GENO=NaN;CC_ALL=1.000e+00;CC_DOM=1.000e+00;CC_REC=1.000e+00 2 2 0 3 7 2 3 AC=2;AF=1.00;AN=2;DP=55;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=58.90;QD=31.16;SOR=0.737;ANN =T|missense_variant|MODERATE|ENSOARG00000020190|ENSOARG00000020190|transcript|ENSOART00000021984.1|prot 8 ein_coding|8/9|c.1096C>T|p.Leu366Phe|1096/1272|1096/1272|366/423||;Cases=1,0,2;Controls=0,0,0;CC_TREND=NaN;CC_GE 2 5 . C T 1910.2 . NO=NaN;CC_ALL=1.000e+00;CC_DOM=1.000e+00;CC_REC=1.000e+00 2 2 0 3 7 2 3 AC=2;AF=1.00;AN=2;DP=51;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=26.79;SOR=0.836;ANN =T|missense_variant|MODERATE|ENSOARG00000020190|ENSOARG00000020190|transcript|ENSOART00000021984.1|prot 9 ein_coding|8/9|c.1106C>T|p.Pro369Leu|1106/1272|1106/1272|369/423||;Cases=1,0,2;Controls=0,0,0;CC_TREND=NaN;CC_GE 2 5 . C T 1947.2 . 
NO=NaN;CC_ALL=1.000e+00;CC_DOM=1.000e+00;CC_REC=1.000e+00 2 2 0 3 AC=2;AF=1.00;AN=2;DP=51;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=24.62;SOR=0.941;ANN 7 =CTGGA|frameshift_variant|HIGH|ENSOARG00000020190|ENSOARG00000020190|transcript|ENSOART00000021984.1|prot ein_coding|8/9|c.1108_1109insTGGA|p.Gln370fs|1109/1272|1109/1272|370/423||;LOF=(ENSOARG00000020190|ENSOARG00 2 000020190|1|1.00);Cases=1,0,2;Controls=0,0,0;CC_TREND=NaN;CC_GENO=NaN;CC_ALL=1.000e+00;CC_DOM=1.000e+0 2 3 . C CTGGA 2073.2 . 0;CC_REC=1.000e+00


9 7 2 2 0 3 7 2 AC=2;AF=0.500;AN=4;DP=55;ExcessHet=0.7918;FS=0.000;MLEAC=3;MLEAF=0.750;MQ=60.00;QD=32.12;SOR=0.976;AN 3 N=GAT|frameshift_variant|HIGH|ENSOARG00000020190|ENSOARG00000020190|transcript|ENSOART00000021984.1|protei n_coding|8/9|c.1110_1111insAT|p.Val371fs|1111/1272|1111/1272|371/423||;LOF=(ENSOARG00000020190|ENSOARG000000 9 20190|1|1.00);Cases=1,0,2;Controls=0,0,0;CC_TREND=1.573e-01;CC_GENO=NaN;CC_ALL=1.667e-01;CC_DOM=5.000e- 2 9 . G GAT 2029.4 . 01;CC_REC=5.000e-01 2 2 0 3 7 2 AC=2;AF=1.00;AN=2;DP=53;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=27.72;SOR=0.929;ANN 4 =GAA|frameshift_variant|HIGH|ENSOARG00000020190|ENSOARG00000020190|transcript|ENSOART00000021984.1|protein _coding|8/9|c.1114_1115insAA|p.Gly372fs|1115/1272|1115/1272|372/423||;LOF=(ENSOARG00000020190|ENSOARG0000002 0 0190|1|1.00);Cases=1,0,2;Controls=0,0,0;CC_TREND=NaN;CC_GENO=NaN;CC_ALL=1.000e+00;CC_DOM=1.000e+00;CC_ 2 3 . G GAA 2028.2 . REC=1.000e+00 2 AC=2;AF=1.00;AN=2;DP=30;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=35.15;SOR=2.124;ANN 2 =GCAGTCGTATCTCCTGCATCTCGGTCTCCAGCATCTCGGTTGGAGCCCAAGGCCCACCCGGCTGTGTTCTCTAGGG 0 TGGCCTC|frameshift_variant|HIGH|ENSOARG00000020195|ENSOARG00000020195|transcript|ENSOART00000021988.1|pr 3 otein_coding|8/9|c.1742_1743insAGTCGTATCTCCTGCATCTCGGTCTCCAGCATCTCGGTTGGAGCCCAAGGCCCACCC GGCTGTGTTCTCTAGGGTGGCCTCC|p.His582fs|1743/2454|1743/2454|581/817||INFO_REALIGN_3_PRIME,GCAGTCGT 8 ATCTCCTGCATCTCGGTCTCCAGCATCTCGGTTGGAGCCCAAGGCCCACCCGGCTGTGTTCTCTAGGGTGGCCTC|up 9 stream_gene_variant|MODIFIER|ENSOARG00000020199|ENSOARG00000020199|transcript|ENSOART00000021992.1|protei 2 GCAGTCGTATCTCCTGCATCTCGGTCTC n_coding||c.-4371_- 4370insCAGTCGTATCTCCTGCATCTCGGTCTCCAGCATCTCGGTTGGAGCCCAAGGCCCACCCGGCTGTGTTCTCTA 4 CAGCATCTCGGTTGGAGCCCAAGGCCC GGGTGGCCTC|||||4370|;LOF=(ENSOARG00000020195|ENSOARG00000020195|1|1.00);Cases=1,0,2;Controls=0,0,0;CC_TR 2 5 . G ACCCGGCTGTGTTCTCTAGGGTGGCCTC 1720.2 . 
END=NaN;CC_GENO=NaN;CC_ALL=1.000e+00;CC_DOM=1.000e+00;CC_REC=1.000e+00 2 2 0 4 1 AGGGCGTGC AC=2;AF=1.00;AN=2;DP=47;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=63.20;QD=24.39;SOR=1.270;ANN 0 GGCTCCGAG =A|frameshift_variant&splice_acceptor_variant&splice_region_variant&intron_variant|HIGH|GMPPA|ENSOARG00000020203| 6 AGAGCATCA transcript|ENSOART00000021997.1|protein_coding|11/14|c.943+5_947delGGGCGTGCGGCTCCGAGAGAGCATCATCGAC GATAGGCGG|p.Gly316fs||944/1281|315/426||WARNING_TRANSCRIPT_NO_STOP_CODON;LOF=(GMPPA|ENSOARG00 8 TCGACGATA 000020203|1|1.00);Cases=1,0,2;Controls=0,0,0;CC_TREND=NaN;CC_GENO=NaN;CC_ALL=1.000e+00;CC_DOM=1.000e+0 2 9 . GGCGG A 1608.2 . 0;CC_REC=1.000e+00 2 2 0 4 1 9 1 AC=2;AF=1.00;AN=2;DP=30;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=30.88;SOR=0.906;ANN =C|missense_variant|MODERATE|ASIC4|ENSOARG00000020207|transcript|ENSOART00000022001.1|protein_coding|1/12|c. 3 46G>C|p.Glu16Gln|67/2007|46/1986|16/661||;Cases=1,0,2;Controls=0,0,0;CC_TREND=NaN;CC_GENO=NaN;CC_ALL=1.000 2 4 . G C 1278.2 . e+00;CC_DOM=1.000e+00;CC_REC=1.000e+00 2 AC=2;AF=1.00;AN=2;DP=28;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=25.65;SOR=0.922;ANN =G|missense_variant|MODERATE|ASIC4|ENSOARG00000020207|transcript|ENSOART00000022001.1|protein_coding|1/12|c. 2 50C>G|p.Pro17Arg|71/2007|50/1986|17/661||;Cases=1,0,2;Controls=0,0,0;CC_TREND=NaN;CC_GENO=NaN;CC_ALL=1.000 2 0 . C G 1278.2 . e+00;CC_DOM=1.000e+00;CC_REC=1.000e+00


4 1 9 1 3 8 2 2 0 4 1 9 1 AC=2;AF=1.00;AN=2;DP=29;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=29.55;SOR=0.997;ANN =T|missense_variant|MODERATE|ASIC4|ENSOARG00000020207|transcript|ENSOART00000022001.1|protein_coding|1/12|c.5 4 2C>T|p.Pro18Ser|73/2007|52/1986|18/661||;Cases=1,0,2;Controls=0,0,0;CC_TREND=NaN;CC_GENO=NaN;CC_ALL=1.000e+ 2 0 . C T 1278.2 . 00;CC_DOM=1.000e+00;CC_REC=1.000e+00 2 2 0 4 1 9 1 AC=2;AF=1.00;AN=2;DP=30;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=30.85;SOR=0.906;ANN =CT|frameshift_variant|HIGH|ASIC4|ENSOARG00000020207|transcript|ENSOART00000022001.1|protein_coding|1/12|c.55_5 4 6insT|p.Pro19fs|77/2007|56/1986|19/661||;Cases=1,0,2;Controls=0,0,0;CC_TREND=NaN;CC_GENO=NaN;CC_ALL=1.000e+0 2 3 . C CT 1314.2 . 0;CC_DOM=1.000e+00;CC_REC=1.000e+00 2 2 0 4 1 9 1 AC=2;AF=1.00;AN=2;DP=31;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=30.98;SOR=0.826;ANN =CCTCG|frameshift_variant|HIGH|ASIC4|ENSOARG00000020207|transcript|ENSOART00000022001.1|protein_coding|1/12|c. 4 60_61insGCTC|p.Thr21fs|82/2007|61/1986|21/661||INFO_REALIGN_3_PRIME;Cases=1,0,2;Controls=0,0,0;CC_TREND=NaN 2 5 . C CCTCG 1359.2 . ;CC_GENO=NaN;CC_ALL=1.000e+00;CC_DOM=1.000e+00;CC_REC=1.000e+00 2 2 0 4 AC=2;AF=1.00;AN=2;DP=33;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=56.46;QD=30.23;SOR=1.382;ANN 4 =CAT|frameshift_variant|HIGH|CHPF|ENSOARG00000020209|transcript|ENSOART00000022003.1|protein_coding|6/6|c.1382 3 _1383insAT|p.Glu462fs|1382/2328|1382/2328|461/775||WARNING_TRANSCRIPT_NO_START_CODON,CAT|upstream_gene 1 _variant|MODIFIER|TMEM198|ENSOARG00000020210|transcript|ENSOART00000022004.1|protein_coding||c.-4912_- 4911insAT|||||4911|,CAT|downstream_gene_variant|MODIFIER|ASIC4|ENSOARG00000020207|transcript|ENSOART00000022 8 001.1|protein_coding||c.*2283_*2284insAT|||||2284|;LOF=(CHPF|ENSOARG00000020209|1|1.00);Cases=1,0,2;Controls=0,0,0; 2 1 . C CAT 1269.2 . 
CC_TREND=NaN;CC_GENO=NaN;CC_ALL=1.000e+00;CC_DOM=1.000e+00;CC_REC=1.000e+00 2 2 AC=2;AF=1.00;AN=2;DP=33;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=57.89;QD=28.32;SOR=1.536;ANN 0 =CGG|frameshift_variant|HIGH|CHPF|ENSOARG00000020209|transcript|ENSOART00000022003.1|protein_coding|6/6|c.1377 4 _1378insCC|p.Gly460fs|1377/2328|1377/2328|459/775||WARNING_TRANSCRIPT_NO_START_CODON,CGG|upstream_gen 4 e_variant|MODIFIER|TMEM198|ENSOARG00000020210|transcript|ENSOART00000022004.1|protein_coding||c.-4907_- 4906insGG|||||4906|,CGG|downstream_gene_variant|MODIFIER|ASIC4|ENSOARG00000020207|transcript|ENSOART0000002 3 2001.1|protein_coding||c.*2288_*2289insGG|||||2289|;LOF=(CHPF|ENSOARG00000020209|1|1.00);Cases=1,0,2;Controls=0,0,0; 2 1 . C CGG 1449.2 . CC_TREND=NaN;CC_GENO=NaN;CC_ALL=1.000e+00;CC_DOM=1.000e+00;CC_REC=1.000e+00


8 6 2 2 0 4 AC=2;AF=1.00;AN=2;DP=35;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=58.01;QD=29.45;SOR=1.473;ANN 4 =A|missense_variant|MODERATE|CHPF|ENSOARG00000020209|transcript|ENSOART00000022003.1|protein_coding|6/6|c.13 3 75C>T|p.Arg459Cys|1375/2328|1375/2328|459/775||WARNING_TRANSCRIPT_NO_START_CODON,A|upstream_gene_varia 1 nt|MODIFIER|TMEM198|ENSOARG00000020210|transcript|ENSOART00000022004.1|protein_coding||c.- 4904G>A|||||4904|,A|downstream_gene_variant|MODIFIER|ASIC4|ENSOARG00000020207|transcript|ENSOART00000022001. 8 1|protein_coding||c.*2291G>A|||||2291|;Cases=1,0,2;Controls=0,0,0;CC_TREND=NaN;CC_GENO=NaN;CC_ALL=1.000e+00;C 2 9 . G A 1413.2 . C_DOM=1.000e+00;CC_REC=1.000e+00 2 2 0 4 AC=2;AF=1.00;AN=2;DP=35;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=58.01;QD=26.86;SOR=1.473;ANN 4 =AT|frameshift_variant|HIGH|CHPF|ENSOARG00000020209|transcript|ENSOART00000022003.1|protein_coding|6/6|c.1371_1 3 372insA|p.Ser458fs|1371/2328|1371/2328|457/775||WARNING_TRANSCRIPT_NO_START_CODON,AT|upstream_gene_vari 1 ant|MODIFIER|TMEM198|ENSOARG00000020210|transcript|ENSOART00000022004.1|protein_coding||c.-4901_- 4900insT|||||4900|,AT|downstream_gene_variant|MODIFIER|ASIC4|ENSOARG00000020207|transcript|ENSOART00000022001 9 .1|protein_coding||c.*2294_*2295insT|||||2295|;LOF=(CHPF|ENSOARG00000020209|1|1.00);Cases=1,0,2;Controls=0,0,0;CC_T 2 2 . A AT 1449.2 . 
REND=NaN;CC_GENO=NaN;CC_ALL=1.000e+00;CC_DOM=1.000e+00;CC_REC=1.000e+00 2 2 0 4 AC=2;AF=1.00;AN=2;DP=34;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=58.63;QD=31.27;SOR=1.681;ANN 4 =GAA|frameshift_variant|HIGH|CHPF|ENSOARG00000020209|transcript|ENSOART00000022003.1|protein_coding|6/6|c.1369 3 _1370insTT|p.Pro457fs|1369/2328|1369/2328|457/775||WARNING_TRANSCRIPT_NO_START_CODON,GAA|upstream_gene 1 _variant|MODIFIER|TMEM198|ENSOARG00000020210|transcript|ENSOART00000022004.1|protein_coding||c.-4899_- 4898insAA|||||4898|,GAA|downstream_gene_variant|MODIFIER|ASIC4|ENSOARG00000020207|transcript|ENSOART0000002 9 2001.1|protein_coding||c.*2296_*2297insAA|||||2297|;LOF=(CHPF|ENSOARG00000020209|1|1.00);Cases=1,0,2;Controls=0,0,0; 2 4 . G GAA 1494.2 . CC_TREND=NaN;CC_GENO=NaN;CC_ALL=1.000e+00;CC_DOM=1.000e+00;CC_REC=1.000e+00 2 2 0 4 AC=2;AF=1.00;AN=2;DP=30;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=30.66;SOR=1.765;ANN 4 =G|frameshift_variant|HIGH|CHPF|ENSOARG00000020209|transcript|ENSOART00000022003.1|protein_coding|6/6|c.1363del 3 A|p.Thr455fs|1363/2328|1363/2328|455/775||WARNING_TRANSCRIPT_NO_START_CODON,G|upstream_gene_variant|MO 2 DIFIER|TMEM198|ENSOARG00000020210|transcript|ENSOART00000022004.1|protein_coding||c.- 4892delT|||||4892|,G|downstream_gene_variant|MODIFIER|ASIC4|ENSOARG00000020207|transcript|ENSOART00000022001. 0 1|protein_coding||c.*2303delT|||||2303|;LOF=(CHPF|ENSOARG00000020209|1|1.00);Cases=1,0,2;Controls=0,0,0;CC_TREND= 2 0 . GT G 1359.2 . 
NaN;CC_GENO=NaN;CC_ALL=1.000e+00;CC_DOM=1.000e+00;CC_REC=1.000e+00 2 2 0 AC=2;AF=1.00;AN=2;DP=30;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=25.23;SOR=1.765;ANN 4 =G|frameshift_variant&splice_acceptor_variant&splice_region_variant&intron_variant|HIGH|CHPF|ENSOARG00000020209|tra nscript|ENSOART00000022003.1|protein_coding|6/6|c.1357- 4 2_1361delAAAACGC|p.Asn453fs|1361/2328|1357/2328|453/775||WARNING_TRANSCRIPT_NO_START_CODON,G|upstrea 3 m_gene_variant|MODIFIER|TMEM198|ENSOARG00000020210|transcript|ENSOART00000022004.1|protein_coding||c.- 2 4890_- 4884delGCGTTTT|||||4890|,G|downstream_gene_variant|MODIFIER|ASIC4|ENSOARG00000020207|transcript|ENSOART0000 0 0022001.1|protein_coding||c.*2305_*2311delGCGTTTT|||||2305|;LOF=(CHPF|ENSOARG00000020209|1|1.00);Cases=1,0,2;Co 2 2 . GGCGTTTT G 1314.2 . ntrols=0,0,0;CC_TREND=NaN;CC_GENO=NaN;CC_ALL=1.000e+00;CC_DOM=1.000e+00;CC_REC=1.000e+00 2 AC=2;AF=1.00;AN=2;DP=32;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=59.40;QD=29.41;SOR=0.892;ANN =A|missense_variant|MODERATE|CHPF|ENSOARG00000020209|transcript|ENSOART00000022003.1|protein_coding|3/6|c.77 2 0G>T|p.Arg257Leu|770/2328|770/2328|257/775||WARNING_TRANSCRIPT_NO_START_CODON,A|upstream_gene_variant| 2 0 . C A 1336.2 . MODIFIER|TMEM198|ENSOARG00000020210|transcript|ENSOART00000022004.1|protein_coding||c.-


4 3411C>A|||||3411|,A|downstream_gene_variant|MODIFIER|ASIC4|ENSOARG00000020207|transcript|ENSOART00000022001. 1|protein_coding||c.*3784C>A|||||3784|;Cases=1,0,2;Controls=0,0,0;CC_TREND=NaN;CC_GENO=NaN;CC_ALL=1.000e+00;C 4 C_DOM=1.000e+00;CC_REC=1.000e+00 4 6 8 2 2 2 0 4 AC=2;AF=1.00;AN=2;DP=32;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=59.40;QD=34.40;SOR=1.044;ANN 4 =A|missense_variant|MODERATE|CHPF|ENSOARG00000020209|transcript|ENSOART00000022003.1|protein_coding|3/6|c.75 4 7C>T|p.Pro253Ser|757/2328|757/2328|253/775||WARNING_TRANSCRIPT_NO_START_CODON,A|upstream_gene_variant|M 6 ODIFIER|TMEM198|ENSOARG00000020210|transcript|ENSOART00000022004.1|protein_coding||c.- 3398G>A|||||3398|,A|downstream_gene_variant|MODIFIER|ASIC4|ENSOARG00000020207|transcript|ENSOART00000022001. 9 1|protein_coding||c.*3797G>A|||||3797|;Cases=1,0,2;Controls=0,0,0;CC_TREND=NaN;CC_GENO=NaN;CC_ALL=1.000e+00;C 2 5 . G A 1362.2 . C_DOM=1.000e+00;CC_REC=1.000e+00 2 2 0 AC=2;AF=1.00;AN=2;DP=30;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.36;QD=28.16;SOR=0.976;ANN 4 CAGCACCCC =C|frameshift_variant&splice_acceptor_variant&splice_region_variant&intron_variant|HIGH|CHPF|ENSOARG00000020209|tra nscript|ENSOART00000022003.1|protein_coding|3/6|c.744+6_752delCACACCCCCCCCAGGCCCCTGCCTCCACTGTGGTT 4 CAAAACCAC TTGGGGGTGCT|p.Gly249fs|752/2328|745/2328|249/775||WARNING_TRANSCRIPT_NO_START_CODON,C|upstream_gen 4 AGTGGAGGC e_variant|MODIFIER|TMEM198|ENSOARG00000020210|transcript|ENSOART00000022004.1|protein_coding||c.-3393_- 6 AGGGGCCTG 3348delAGCACCCCCAAAACCACAGTGGAGGCAGGGGCCTGGGGGGGGTGTG|||||3393|,C|downstream_gene_variant|MO DIFIER|ASIC4|ENSOARG00000020207|transcript|ENSOART00000022001.1|protein_coding||c.*3802_*3847delAGCACCCCC 9 GGGGGGGTG AAAACCACAGTGGAGGCAGGGGCCTGGGGGGGGTGTG|||||3802|;LOF=(CHPF|ENSOARG00000020209|1|1.00);Cases=1, 2 9 . TG C 1309.2 . 
0,2;Controls=0,0,0;CC_TREND=NaN;CC_GENO=NaN;CC_ALL=1.000e+00;CC_DOM=1.000e+00;CC_REC=1.000e+00 2 2 0 4 AC=2;AF=1.00;AN=2;DP=36;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=59.47;QD=25.46;SOR=1.418;ANN 5 =CGGCGGCGCATCTGCAGAGGCAGTGCTGG|frameshift_variant|HIGH|OBSL1|ENSOARG00000020239|transcript|ENSOA 4 RT00000022037.1|protein_coding|24/25|c.5317_5318insCCAGCACTGCCTCTGCAGATGCGCCGCC|p.Arg1773fs|5317/5586| 5 5317/5586|1773/1861||WARNING_TRANSCRIPT_NO_START_CODON,CGGCGGCGCATCTGCAGAGGCAGTGCTGG|do wnstream_gene_variant|MODIFIER|TMEM198|ENSOARG00000020210|transcript|ENSOART00000022004.1|protein_coding||c. 4 CGGCGGCGCATCTGCAGAGGCAGTGCT *1977_*1978insGGCGGCGCATCTGCAGAGGCAGTGCTGG|||||1978|;Cases=1,0,2;Controls=0,0,0;CC_TREND=NaN;CC_GE 2 9 . C GG 1537.2 . NO=NaN;CC_ALL=1.000e+00;CC_DOM=1.000e+00;CC_REC=1.000e+00 2 2 0 4 5 5 AC=2;AF=1.00;AN=2;DP=26;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=29.08;SOR=0.846;ANN 6 =A|splice_region_variant&intron_variant|LOW|OBSL1|ENSOARG00000020239|transcript|ENSOART00000022037.1|protein_c oding|21/24|c.5017+6G>T||||||WARNING_TRANSCRIPT_NO_START_CODON,A|downstream_gene_variant|MODIFIER|TME 3 M198|ENSOARG00000020210|transcript|ENSOART00000022004.1|protein_coding||c.*3063C>A|||||3063|;Cases=1,0,2;Controls 2 5 . C A 756.2 . 
=0,0,0;CC_TREND=NaN;CC_GENO=NaN;CC_ALL=1.000e+00;CC_DOM=1.000e+00;CC_REC=1.000e+00 AC=2;AF=1.00;AN=2;DP=28;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=26.68;SOR=0.836;ANN =AGGCGGCGCGTGCCGAGGGCCTGGAGCCTGAGCCGAGCGCTGGGGGTGAGCGGCTGCCCGTCCTAGGGGCGGG|st 2 op_gained&conservative_inframe_insertion|HIGH|OBSL1|ENSOARG00000020239|transcript|ENSOART00000022037.1|protein 2 _coding|21/25|c.4921_4922insCCCGCCCCTAGGACGGGCAGCCGCTCACCCCCAGCGCTCGGCTCAGGCTCCAGGCCCT CGGCACGCGCCGCC|p.Asn1640_Leu1641insProArgProTerAspGlyGlnProLeuThrProSerAlaArgLeuArgLeuGlnAlaLeuGlyThr 0 ArgArg|4921/5586|4921/5586|1641/1861||WARNING_TRANSCRIPT_NO_START_CODON,AGGCGGCGCGTGCCGAGGG 4 CCTGGAGCCTGAGCCGAGCGCTGGGGGTGAGCGGCTGCCCGTCCTAGGGGCGGG|downstream_gene_variant|MODIFI 5 AGGCGGCGCGTGCCGAGGGCCTGGAGC ER|TMEM198|ENSOARG00000020210|transcript|ENSOART00000022004.1|protein_coding||c.*3164_*3165insGGCGGCGCG TGCCGAGGGCCTGGAGCCTGAGCCGAGCGCTGGGGGTGAGCGGCTGCCCGTCCTAGGGGCGGG|||||3165|;LOF=(OBS 5 CTGAGCCGAGCGCTGGGGGTGAGCGGC L1|ENSOARG00000020239|1|1.00);NMD=(OBSL1|ENSOARG00000020239|1|1.00);Cases=1,0,2;Controls=0,0,0;CC_TREND= 2 7 . A TGCCCGTCCTAGGGGCGGG 2421.2 . NaN;CC_GENO=NaN;CC_ALL=1.000e+00;CC_DOM=1.000e+00;CC_REC=1.000e+00


3 6 2 2 0 4 5 8 AC=2;AF=1.00;AN=2;DP=32;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=27.29;SOR=1.112;ANN 5 =T|frameshift_variant|HIGH|OBSL1|ENSOARG00000020239|transcript|ENSOART00000022037.1|protein_coding|18/25|c.4234 _4235delGG|p.Gly1412fs|4235/5586|4234/5586|1412/1861||WARNING_TRANSCRIPT_NO_START_CODON;LOF=(OBSL1| 6 ENSOARG00000020239|1|1.00);Cases=1,0,2;Controls=0,0,0;CC_TREND=NaN;CC_GENO=NaN;CC_ALL=1.000e+00;CC_D 2 1 . TCC T 1433.2 . OM=1.000e+00;CC_REC=1.000e+00 2 2 0 4 5 8 5 AC=2;AF=1.00;AN=2;DP=33;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=26.43;SOR=1.352;ANN =A|synonymous_variant|LOW|OBSL1|ENSOARG00000020239|transcript|ENSOART00000022037.1|protein_coding|18/25|c.42 6 30G>T|p.Gly1410Gly|4230/5586|4230/5586|1410/1861||WARNING_TRANSCRIPT_NO_START_CODON;Cases=1,0,2;Contro 2 7 . C A 1667.2 . ls=0,0,0;CC_TREND=NaN;CC_GENO=NaN;CC_ALL=1.000e+00;CC_DOM=1.000e+00;CC_REC=1.000e+00 2 2 0 4 5 8 AC=2;AF=1.00;AN=2;DP=35;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=25.61;SOR=1.483;ANN 5 =CAT|frameshift_variant|HIGH|OBSL1|ENSOARG00000020239|transcript|ENSOART00000022037.1|protein_coding|18/25|c.4 228_4229insAT|p.Gly1410fs|4228/5586|4228/5586|1410/1861||WARNING_TRANSCRIPT_NO_START_CODON;LOF=(OBS 6 L1|ENSOARG00000020239|1|1.00);Cases=1,0,2;Controls=0,0,0;CC_TREND=NaN;CC_GENO=NaN;CC_ALL=1.000e+00;CC 2 8 . C CAT 1629.2 . _DOM=1.000e+00;CC_REC=1.000e+00 2 2 0 4 5 8 AC=2;AF=1.00;AN=2;DP=37;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=27.53;SOR=1.432;ANN 5 =CTGCAG|frameshift_variant|HIGH|OBSL1|ENSOARG00000020239|transcript|ENSOART00000022037.1|protein_coding|18/2 5|c.4226_4227insCTGCA|p.Gly1410fs|4226/5586|4226/5586|1409/1861||WARNING_TRANSCRIPT_NO_START_CODON;L 7 OF=(OBSL1|ENSOARG00000020239|1|1.00);Cases=1,0,2;Controls=0,0,0;CC_TREND=NaN;CC_GENO=NaN;CC_ALL=1.00 2 0 . C CTGCAG 1719.2 . 
0e+00;CC_DOM=1.000e+00;CC_REC=1.000e+00 2 2 0 4 5 8 5 AC=2;AF=1.00;AN=2;DP=37;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=28.29;SOR=1.609;ANN =G|synonymous_variant|LOW|OBSL1|ENSOARG00000020239|transcript|ENSOART00000022037.1|protein_coding|18/25|c.42 7 24G>C|p.Gly1408Gly|4224/5586|4224/5586|1408/1861||WARNING_TRANSCRIPT_NO_START_CODON;Cases=1,0,2;Contro 2 3 . C G 1722.2 . ls=0,0,0;CC_TREND=NaN;CC_GENO=NaN;CC_ALL=1.000e+00;CC_DOM=1.000e+00;CC_REC=1.000e+00 2 AC=2;AF=1.00;AN=2;DP=39;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=26.25;SOR=1.552;ANN 2 CCCCCACCC =C|frameshift_variant|HIGH|OBSL1|ENSOARG00000020239|transcript|ENSOART00000022037.1|protein_coding|18/25|c.4211 2 0 . ACA C 1668.2 . _4221delTGTGGGTGGGG|p.Leu1404fs|4221/5586|4211/5586|1404/1861||WARNING_TRANSCRIPT_NO_START_CODON;


4 LOF=(OBSL1|ENSOARG00000020239|1|1.00);Cases=1,0,2;Controls=0,0,0;CC_TREND=NaN;CC_GENO=NaN;CC_ALL=1.0 5 00e+00;CC_DOM=1.000e+00;CC_REC=1.000e+00 8 5 7 5 2 2 0 4 AC=2;AF=1.00;AN=2;DP=20;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=27.75;SOR=1.085;ANN 5 =GGTGAATGGTGGGGGGGGGCATGAGGCCCTTGGTGGGGGGGGGCCGGGGCCCCAGGGGAGTGAGGACGACACT 9 GTTACCT|frameshift_variant&splice_region_variant|HIGH|OBSL1|ENSOARG00000020239|transcript|ENSOART0000002203 6 GGTGAATGGTGGGGGGGGGCATGAGG 7.1|protein_coding|15/25|c.3917_3918insAGGTAACAGTGTCGTCCTCACTCCCCTGGGGCCCCGGCCCCCCCCCACCAAG GGCCTCATGCCCCCCCCCACCATTCAC|p.Asp1306fs|3917/5586|3917/5586|1306/1861||WARNING_TRANSCRIPT_NO_S 4 CCCTTGGTGGGGGGGGGCCGGGGCCCC TART_CODON;LOF=(OBSL1|ENSOARG00000020239|1|1.00);Cases=1,0,2;Controls=0,0,0;CC_TREND=NaN;CC_GENO=N 2 8 . G AGGGGAGTGAGGACGACACTGTTACCT 450.23 . aN;CC_ALL=1.000e+00;CC_DOM=1.000e+00;CC_REC=1.000e+00 2 2 0 4 6 ACCAGGCGA AC=2;AF=1.00;AN=2;DP=35;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=59.76;QD=28.23;SOR=0.760;ANN 1 CGGCGGGGC =A|frameshift_variant&splice_acceptor_variant&splice_region_variant&intron_variant|HIGH|OBSL1|ENSOARG00000020239|t 7 CCGGGGGGG ranscript|ENSOART00000022037.1|protein_coding|12/25|c.3120+14_3136delGGCCGGCCGCCCCCCCCCGGGCCCCGCCGT CGCCTGG|p.Pro1041fs|3136/5586|3121/5586|1041/1861||WARNING_TRANSCRIPT_NO_START_CODON;LOF=(OBSL1|E 9 GGCGGCCGG NSOARG00000020239|1|1.00);Cases=1,0,2;Controls=0,0,0;CC_TREND=NaN;CC_GENO=NaN;CC_ALL=1.000e+00;CC_DO 2 6 . CC A 1261.2 . 
M=1.000e+00;CC_REC=1.000e+00 2 2 0 4 7 AC=2;AF=1.00;AN=2;DP=31;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=31.67;SOR=0.976;ANN 2 =C|frameshift_variant|HIGH|OBSL1|ENSOARG00000020239|transcript|ENSOART00000022037.1|protein_coding|6/25|c.1716d 2 elC|p.Val573fs|1716/5586|1716/5586|572/1861||WARNING_TRANSCRIPT_NO_START_CODON,C|upstream_gene_variant|M ODIFIER|INHA|ENSOARG00000020243|transcript|ENSOART00000022040.1|protein_coding||c.- 4 4473delG|||||4473|;LOF=(OBSL1|ENSOARG00000020239|1|1.00);Cases=1,0,2;Controls=0,0,0;CC_TREND=NaN;CC_GENO= 2 8 . CG C 950.16 . NaN;CC_ALL=1.000e+00;CC_DOM=1.000e+00;CC_REC=1.000e+00 2 2 0 4 7 AC=2;AF=1.00;AN=2;DP=35;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=59.14;QD=28.45;SOR=0.693;ANN 3 =A|missense_variant|MODERATE|OBSL1|ENSOARG00000020239|transcript|ENSOART00000022037.1|protein_coding|3/25|c. 1 1141C>T|p.Arg381Cys|1141/5586|1141/5586|381/1861||WARNING_TRANSCRIPT_NO_START_CODON,A|upstream_gene_v ariant|MODIFIER|INHA|ENSOARG00000020243|transcript|ENSOART00000022040.1|protein_coding||c.- 2 3598G>A|||||3598|;Cases=1,0,2;Controls=0,0,0;CC_TREND=NaN;CC_GENO=NaN;CC_ALL=1.000e+00;CC_DOM=1.000e+00 2 4 . G A 967.2 . ;CC_REC=1.000e+00 2 2 0 AC=2;AF=1.00;AN=2;DP=55;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=59.45;QD=29.85;SOR=1.287;ANN 4 =T|missense_variant|MODERATE|OBSL1|ENSOARG00000020239|transcript|ENSOART00000022037.1|protein_coding|2/25|c. 7 188G>A|p.Gly63Asp|188/5586|188/5586|63/1861||WARNING_TRANSCRIPT_NO_START_CODON,T|upstream_gene_variant| MODIFIER|INHA|ENSOARG00000020243|transcript|ENSOART00000022040.1|protein_coding||c.- 5 1111C>T|||||1111|;Cases=1,0,2;Controls=0,0,0;CC_TREND=NaN;CC_GENO=NaN;CC_ALL=1.000e+00;CC_DOM=1.000e+00 2 6 . C T 2326.2 . ;CC_REC=1.000e+00

166 1 1 2 2 0 4 7 AC=2;AF=1.00;AN=2;DP=56;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=59.46;QD=30.79;SOR=1.329;ANN 5 =A|missense_variant|MODERATE|OBSL1|ENSOARG00000020239|transcript|ENSOART00000022037.1|protein_coding|2/25|c. 6 185G>T|p.Gly62Val|185/5586|185/5586|62/1861||WARNING_TRANSCRIPT_NO_START_CODON,A|upstream_gene_variant| MODIFIER|INHA|ENSOARG00000020243|transcript|ENSOART00000022040.1|protein_coding||c.- 1 1108C>A|||||1108|;Cases=1,0,2;Controls=0,0,0;CC_TREND=NaN;CC_GENO=NaN;CC_ALL=1.000e+00;CC_DOM=1.000e+00 2 4 . C A 2326.2 . ;CC_REC=1.000e+00 2 2 0 5 0 7 AC=2;AF=1.00;AN=2;DP=13;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=24.68;SOR=1.022;ANN 4 =G|missense_variant|MODERATE|ENSOARG00000020248|ENSOARG00000020248|transcript|ENSOART00000022047.1|prot ein_coding|20/24|c.2651A>G|p.Gln884Arg|2651/3225|2651/3225|884/1074||WARNING_TRANSCRIPT_NO_START_CODON; 1 Cases=1,0,2;Controls=0,0,0;CC_TREND=NaN;CC_GENO=NaN;CC_ALL=1.000e+00;CC_DOM=1.000e+00;CC_REC=1.000e 2 9 . A G 296.2 . +00 2 2 0 5 AC=2;AF=1.00;AN=2;DP=12;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=29.40;SOR=2.093;ANN =TGGAGCTGAACGAGCTGATGCTGGATCGCAGCCAGGAGCCCCACTGGCGGGAGACGGCCCGCTGGATCAAGTTT 2 GAGGAGGATGTGGA|disruptive_inframe_insertion|MODERATE|SLC4A3|ENSOARG00000020255|transcript|ENSOART000 6 TGGAGCTGAACGAGCTGATGCTGGATC 00022052.1|protein_coding|8/24|c.1022_1023insGCTGAACGAGCTGATGCTGGATCGCAGCCAGGAGCCCCACTGGCGGG 8 GCAGCCAGGAGCCCCACTGGCGGGAG AGACGGCCCGCTGGATCAAGTTTGAGGAGGATGTGGAGGA|p.Glu341_Glu342insLeuAsnGluLeuMetLeuAspArgSerGln GluProHisTrpArgGluThrAlaArgTrpIleLysPheGluGluAspValGluGlu|1023/3654|1023/3654|341/1217||INFO_REALIGN_3_PRI 6 ACGGCCCGCTGGATCAAGTTTGAGGAG ME;Cases=1,0,2;Controls=0,0,0;CC_TREND=NaN;CC_GENO=NaN;CC_ALL=1.000e+00;CC_DOM=1.000e+00;CC_REC=1. 2 2 . T GATGTGGA 1146.2 . 
000e+00 2 2 0 5 3 1 5 AC=2;AF=1.00;AN=2;DP=39;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=29.97;SOR=1.096;ANN =T|frameshift_variant|HIGH|SLC4A3|ENSOARG00000020255|transcript|ENSOART00000022052.1|protein_coding|16/24|c.237 4 3_2374delCG|p.Val792fs|2373/3654|2373/3654|791/1217||;LOF=(SLC4A3|ENSOARG00000020255|1|1.00);Cases=1,0,2;Control 2 4 . TCG T 1647.2 . s=0,0,0;CC_TREND=NaN;CC_GENO=NaN;CC_ALL=1.000e+00;CC_DOM=1.000e+00;CC_REC=1.000e+00 2 2 0 5 3 1 AC=2;AF=1.00;AN=2;DP=39;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=33.34;SOR=1.096;ANN 5 TCCTCGCCC =T|frameshift_variant&splice_donor_variant&splice_region_variant&intron_variant|HIGH|SLC4A3|ENSOARG00000020255|tra nscript|ENSOART00000022052.1|protein_coding|16/24|c.2376_2389+7delCCTCGCCCTAGTGGCTGCGGA|p.Leu793fs|2376/ 4 TAGTGGCTG 3654|2376/3654|792/1217||;LOF=(SLC4A3|ENSOARG00000020255|1|1.00);Cases=1,0,2;Controls=0,0,0;CC_TREND=NaN;CC 2 7 . CGGA T 1647.2 . _GENO=NaN;CC_ALL=1.000e+00;CC_DOM=1.000e+00;CC_REC=1.000e+00


Key:

GMPPA

CHPF

OBSL1

Likely causal variant
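The INFO column in the table above packs GATK statistics, SnpEff annotations (ANN) and SnpSift case-control counts into one semicolon-delimited string. The following is a minimal sketch of how such a string could be unpacked, using the OBSL1 c.1716delC record from the table as input. The function names and dictionary keys are illustrative assumptions and are not part of the thesis pipeline; the ANN sub-field order follows the SnpEff annotation format, and the reading of `Cases=1,0,2` as homozygous count, heterozygous count and total variant alleles is assumed from the SnpSift case-control output convention.

```python
# Sub-field order of a SnpEff ANN entry; the Python key names are illustrative.
ANN_FIELDS = [
    "allele", "annotation", "impact", "gene_name", "gene_id",
    "feature_type", "feature_id", "transcript_biotype", "rank",
    "hgvs_c", "hgvs_p", "cdna_pos", "cds_pos", "aa_pos",
    "distance", "messages",
]

# INFO string for the OBSL1 c.1716delC record, rejoined from the table above.
INFO = (
    "AC=2;AF=1.00;AN=2;DP=31;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;"
    "MQ=60.00;QD=31.67;SOR=0.976;"
    "ANN=C|frameshift_variant|HIGH|OBSL1|ENSOARG00000020239|transcript|"
    "ENSOART00000022037.1|protein_coding|6/25|c.1716delC|p.Val573fs|"
    "1716/5586|1716/5586|572/1861||WARNING_TRANSCRIPT_NO_START_CODON,"
    "C|upstream_gene_variant|MODIFIER|INHA|ENSOARG00000020243|transcript|"
    "ENSOART00000022040.1|protein_coding||c.-4473delG|||||4473|;"
    "LOF=(OBSL1|ENSOARG00000020239|1|1.00);Cases=1,0,2;Controls=0,0,0;"
    "CC_TREND=NaN;CC_GENO=NaN;CC_ALL=1.000e+00;CC_DOM=1.000e+00;CC_REC=1.000e+00"
)


def parse_info(info):
    """Split an INFO string into a dict; keys without '=' are stored as True."""
    fields = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def parse_ann(ann_value):
    """Return one dict per comma-separated ANN entry."""
    entries = []
    for entry in ann_value.split(","):
        parts = entry.split("|")
        # Pad short entries so that every key is present.
        parts += [""] * (len(ANN_FIELDS) - len(parts))
        entries.append(dict(zip(ANN_FIELDS, parts)))
    return entries


def case_control_counts(value):
    """Interpret 'hom,het,alleles' counts (assumed SnpSift convention)."""
    hom, het, alleles = (int(x) for x in value.split(","))
    return {"homozygous": hom, "heterozygous": het, "variant_alleles": alleles}


if __name__ == "__main__":
    info = parse_info(INFO)
    for ann in parse_ann(info["ANN"]):
        print(ann["gene_name"], ann["annotation"], ann["impact"], ann["hgvs_c"])
    print(case_control_counts(info["Cases"]))
```

Filtering the parsed entries on `impact` (for example, keeping HIGH and MODERATE) would reproduce the kind of shortlist shown in the table, although the exact filters used in the thesis are described in the chapter methods rather than here.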


Additional file 5: Table S5 Gene list obtained from Mouse Genome Informatics (MGI) with PubMed literature counts identifying genes causing similar phenotypes to BCRHS.

Gene | MGI ID | MGI phenotype overview | PubMed reference hits linked from MGI
NHEJ1 | 1922820 | Cellular, endocrine and exocrine glands, hematopoietic system, homeostasis and immune system (1) | 138
ATG9A | 2138446 | Cellular, homeostasis, immune system and mortality (2) | 31
GLB1L | 1921827 | Various tissues and organ systems (part of larger studies) (3) | 19
DNAJB2 | 1928739 | Adipose tissue (4) | 22
RESP18 | 1098222 | Growth and mortality (5) | 29
DES | 94885 | Behaviour, cardiovascular system, cellular, homeostasis, mortality and muscle (6) | 421
GMPPA | 1916330 | Various tissues and organ systems (part of larger studies) (7) | 22
CHPF | 106576 | Behaviour, growth, homeostasis, limbs and skeleton (8) | 26
OBSL1 | 2138628 | Various tissues and organ systems (part of larger studies) (9) | 16
STK11IP | 1918978 | Mortality (10) | 19

References

(1) Mouse Genome Database (MGD) at the Mouse Genome Informatics website, The Jackson Laboratory, Bar Harbor, Maine. World Wide Web (URL: http://www.informatics.jax.org/marker/MGI:1922820), February 2020

(2) Mouse Genome Database (MGD) at the Mouse Genome Informatics website, The Jackson Laboratory, Bar Harbor, Maine. World Wide Web (URL: http://www.informatics.jax.org/marker/MGI:2138446), February 2020

(3) Mouse Genome Database (MGD) at the Mouse Genome Informatics website, The Jackson Laboratory, Bar Harbor, Maine. World Wide Web (URL: http://www.informatics.jax.org/marker/MGI:1921827), February 2020

(4) Mouse Genome Database (MGD) at the Mouse Genome Informatics website, The Jackson Laboratory, Bar Harbor, Maine. World Wide Web (URL: http://www.informatics.jax.org/marker/MGI:1928739), February 2020

(5) Mouse Genome Database (MGD) at the Mouse Genome Informatics website, The Jackson Laboratory, Bar Harbor, Maine. World Wide Web (URL: http://www.informatics.jax.org/marker/MGI:1098222), February 2020

(6) Mouse Genome Database (MGD) at the Mouse Genome Informatics website, The Jackson Laboratory, Bar Harbor, Maine. World Wide Web (URL: http://www.informatics.jax.org/marker/MGI:94885), February 2020

(7) Mouse Genome Database (MGD) at the Mouse Genome Informatics website, The Jackson Laboratory, Bar Harbor, Maine. World Wide Web (URL: http://www.informatics.jax.org/marker/MGI:1916330), February 2020

(8) Mouse Genome Database (MGD) at the Mouse Genome Informatics website, The Jackson Laboratory, Bar Harbor, Maine. World Wide Web (URL: http://www.informatics.jax.org/marker/MGI:106576), February 2020

(9) Mouse Genome Database (MGD) at the Mouse Genome Informatics website, The Jackson Laboratory, Bar Harbor, Maine. World Wide Web (URL: http://www.informatics.jax.org/marker/MGI:2138628), February 2020

(10) Mouse Genome Database (MGD) at the Mouse Genome Informatics website, The Jackson Laboratory, Bar Harbor, Maine. World Wide Web (URL: http://www.informatics.jax.org/marker/MGI:1918978), February 2020


Additional file 6: Figure S1 Comparison of predicted open reading frames (ORFs) for OBSL1 mRNA (XM_027965226.1) and predicted mutant ovine mRNA for OBSL1 using NCBI ORF Finder [15] (accessed 18th December 2019, <https://www.ncbi.nlm.nih.gov/orffinder/>). (a) ORF1 (black *) represents the ORF that codes for OBSL1 (1899 amino acid residues). (b) ORF1 (red *) codes for a truncated and modified protein of 691 amino acid residues.
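The effect summarised in this figure, a single-base deletion shifting the reading frame so that translation continues with altered residues until a new stop codon is reached, can be illustrated with a short sketch. The coding sequence below is an invented toy example, not the ovine OBSL1 sequence, and the helper names are hypothetical; the code uses the standard genetic code and does not reproduce the NCBI ORF Finder algorithm.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds):
    """Translate from the first base and stop at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        residue = CODON_TABLE[cds[i:i + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


def delete_base(cds, position):
    """Remove the base at a 1-based coding position."""
    return cds[:position - 1] + cds[position:]


# Toy coding sequence (hypothetical): ATG GCC GTG GAG GTG ACC CTG AAA TAA
wildtype_cds = "ATGGCCGTGGAGGTGACCCTGAAATAA"
mutant_cds = delete_base(wildtype_cds, 6)  # delete the C at position 6

if __name__ == "__main__":
    print(translate(wildtype_cds))  # MAVEVTLK
    print(translate(mutant_cds))    # MAWR
```

In this toy case the deletion changes the third residue from valine to tryptophan and introduces a stop codon two residues later, which mirrors in miniature the Val-to-Trp change and premature termination predicted for the ovine variant.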

Additional file 7: Ovine OBSL1 isoform X1 protein sequences (XP_027821027) for wildtype (SheepWT) and mutant (SheepMT) sheep. The altered amino acid sequence predicted for the c.1716delC (p.(Val573Trpfs*119)) mutant is highlighted in red.

>SheepWT
MKAGSGDQGSPPCFLRFPRPVRVVSGAEAELKCVVLGEPPPIVVWEKGGQQLAASDRLSFPVDGAEHCLLL

SGALPTDAGVYVCRARNSAGEAYAAAAVTVLEPPAPEPEPQLAERPLPPPGAGEGAPVFLTGPRSQWVLR

GAEVVLECQVGGLPAPTLYWEKDGMALDEVWDSSHFSLEPGRAGAGASLALRILAARLPDSGVYVCHARN

AHGHARAGALLQVQQPPESPPEDPDEAPTPVVEPLKCAPKTFWVNEGKHAKFRCYVMGKPEPEIEWHWE

GRPLLPDRRRLMYRDRDGGFVLKVLYCQAKDRGLYVCAARNSAGQTLSAVQLHVKEPRLRFSRPLQDVEG

REHGIAVLECKVPNSRIPTAWFREDQRLLPCRKYEQIEEGTVRRLIIHRLKADDDGVYLCEMRGRVRTVANV


TVKGPILKRLPRKLDVFEGENAVLLVETREAGVEGRWSRDGEDLPATCQSSSGHMHALVLPGVTREDAGEV

TFSLGNSRTTTLLRVKCIKHSPPGPPVLAEMFKGHRNTVLLTWKPPDPTPETAFIYRLERQEVGSEDWVQCF

SIEKAGAVEVPGDCVPTEGDYRFRVCTVSEHGRSPHVVFHGSAHLVPTARLVAGLEEVQVYDGEDAVFSLD

LSTVIQGTWFLNGEELKSNEPEGQVGPGPLRYRVEQRGLQHRLILQAVRHQDSGALIGFSCPGVQDSAALTI

QESPVHILSPQDKVSLTFTTSDRVVLTCELSRVDFPASWYKDGQQVEESESLVVKMDGRKHRLILPEAQVQD

SGEFECRTEGVSAFFSVTVQDPPVHIVAPREHVFVHAITSECVMLTCEVDREDAPVHWFKDGQEVEESDFV

LLESEGPHHRLVLPSAQPSIGGEFQCVAGDERAYFTVTITDVSSWIVYPSGKVYVAAVRLERVVLTCELCRPW

AEVRWTKDGEEVVESPTLLLQKEDTVRRLVLPAVQLEDSGEYLCEIDDESASFTVTVTEPPVRILYPRDEVTLV

AVSLECVVLMCELSREDAPVRWYKDGLEVEESEALVLESDGPRRRLVLPAAQPQDGGEFVCDAGDDSAFFT

VTVTAPPERIVHPAARSLDLQFRAPGRVELRCEVAPAGSQVRWYKDGLEVEASEALQLGAEGPTRTLTLPH

AQPEDAGEYVCETRDEAVTFNVSLAEPPVQFLAPEAAPGPLCVAPGEPVVLSCELSRAGALVFWSHNGKPV

QTGEGLELRAEGPRRVLCIRAADLAHAGLYTCQCGAAPGAPSLSFTVQVAEPPVRVVAPEAAQTRVRSTPG

GDLELAVRLSGPGGPVRWYKDGERLASQGRVQLEQDGARQVLRVRGARSRDAGEYLCDTPQDSRIFLVSV

EEPPLVKLVSELTPLTVHEGDDATFRCEVSPPDADITWLRNGVVITPGPQLETTQNGSSRTLTVRSCRLEDAG

TVTARAGGTSTSARLHVRETELLFLRRLQDVRAEEGQDVCLEVETGRVGAAGAVRWVRGGAPLPPDSRLST

AQDGHVFRLFIHSVVLADQGTYGCESHHDRTLARLSVRPKQLRVLRPLEDVTIIEGGNATFQLELSQEGVTG

EWARGGVRLQPGPKCQIQAEGPTHHLVLSGLGLADSGCISFTADTLRCAARLTVREAPVTIVRGLQDLEVTE

GDTATFECALSQALADVTWEKDGQPLTPSARLRLQALGTRRLLQLRRCSPLDAGTYSCVVGMARTGPVHL

VVRERKVSVLSELRSVSAREGDGATFECTVSEVETAGSWELGGRPLRPGGRVRIRQEGKKHILVLSELRAEDA

GEVRFQAGPAQSVAQLEVEALPLQMRRRPPREKTVLVGRRAVLEVTVSRPGGQVCWLREGAELCPGDKY

QLRSHGPTHSLVIHDVRPEDQGTYCCRAGQDSAYTRLLVEGDAPLST

>SheepMT
MKAGSGDQGSPPCFLRFPRPVRVVSGAEAELKCVVLGEPPPIVVWEKGGQQLAASDRLSFPVDGAEHCLLL

SGALPTDAGVYVCRARNSAGEAYAAAAVTVLEPPAPEPEPQLAERPLPPPGAGEGAPVFLTGPRSQWVLR

GAEVVLECQVGGLPAPTLYWEKDGMALDEVWDSSHFSLEPGRAGAGASLALRILAARLPDSGVYVCHARN

AHGHARAGALLQVQQPPESPPEDPDEAPTPVVEPLKCAPKTFWVNEGKHAKFRCYVMGKPEPEIEWHWE

GRPLLPDRRRLMYRDRDGGFVLKVLYCQAKDRGLYVCAARNSAGQTLSAVQLHVKEPRLRFSRPLQDVEG

REHGIAVLECKVPNSRIPTAWFREDQRLLPCRKYEQIEEGTVRRLIIHRLKADDDGVYLCEMRGRVRTVANV

TVKGPILKRLPRKLDVFEGENAVLLVETREAGVEGRWSRDGEDLPATCQSSSGHMHALVLPGVTREDAGEV

TFSLGNSRTTTLLRVKCIKHSPPGPPVLAEMFKGHRNTVLLTWKPPDPTPETAFIYRLERQEVGSEDWVQCF

SIEKAGAWRCPGTACLPKATTASESALSANTAAAPTWCSTGLLTSCPQLAWWLVWRRYRCMMGKTPSSP

WISPPSSRAPGSLTGRSSRVTSQRARWGPGPCGTGWNSVACSTGSSCRPSGIRTAGP

Additional file 8: Figure S2 Wildtype OBSL1 protein showing conserved domains obtained from the NCBI Conserved Domains database [15, 54] (accessed 18th December 2019,

). The location of the c.1716delC variant is indicated. The resulting modified protein p.(Val573Trpfs*119) is predicted to have a truncated fibronectin type 3 domain and is lacking four immunoglobulin domains.


Additional file 9: Figure S3 Allelic discrimination plot visualised using QuantStudio™ Real-Time PCR System version 1.3 (Applied Biosystems™) for a TaqMan genotyping assay used to discriminate the ENSOART00000022037.1:c.1716delC variant for homozygous wildtype (red dots), heterozygous (green dots) and homozygous mutant (blue dots) individuals and a no DNA template control (black square).


Chapter 6 | Pulmonary hypoplasia with anasarca in

Persian/Persian-cross sheep

6.1 Synopsis

This chapter describes research relating to pulmonary hypoplasia with anasarca in Persian sheep.

Section 6.2 includes a manuscript for submission that describes the disease in this breed and its molecular characterisation. A likely causal splice site variant was identified following SNP genotyping, homozygosity mapping, whole genome sequencing and candidate gene analysis.

Validation of this variant was conducted through Sanger sequencing of gDNA and cDNA, splice site recognition impact analysis and diagnostic testing of 193 animals from the original flock.

The supplementary materials associated with section 6.2 are included in section 6.3. Section 6.2 follows the guidelines of the journal to which this manuscript will be submitted.


6.2 A splice site mutation in ADAMTS3 is the likely causal variant for pulmonary hypoplasia with anasarca in Australian Persian/Persian-cross sheep

A splice site mutation in ADAMTS3 is the likely causal variant for pulmonary hypoplasia with anasarca in Australian Persian/Persian-cross sheep

S.A. Woolley1, B. Hopkins1, M.S. Khatkar1, I.V. Jerrett2, C.E. Willet3, B.A. O’Rourke4, I. Tammen1*

1The University of Sydney, Faculty of Science, Sydney School of Veterinary Science, Camden, 2570, NSW, Australia

2Agriculture Victoria Research, Department of Jobs, Precincts and Regions, AgriBio Centre, Bundoora, Victoria 3083

3The University of Sydney, Sydney Informatics Hub, Core Research Facilities, Sydney, 2006, NSW, Australia

4NSW Department of Primary Industries, Elizabeth Macarthur Agricultural Institute, Menangle, 2568, NSW, Australia

Corresponding author: Imke Tammen, The University of Sydney, Sydney School of Veterinary Science, Camden, NSW, Australia


Summary

Pulmonary hypoplasia with anasarca, or hydrops foetalis, a lethal disease that is characterised by diffuse oedema and generalised lymph node hypoplasia, has been reported in several breeds of cattle and sheep as an inherited condition with a recessive mode of inheritance. This is the first report of the disease in three flocks of Persian/Persian-cross sheep in Australia. Affected foetuses were reported from three flocks over multiple years, and a total of eight affected, eight obligate carriers and 176 related Persian/Persian-cross animals were available for analysis, as well as unrelated control animals. SNP genotyping revealed a region of homozygosity in five affected animals on ovine chromosome six that contained the functional candidate gene ADAMTS3. Whole genome sequencing of two affected foetuses and one obligate carrier ewe revealed a single nucleotide deletion, ENSOARG00000013204:g.87124344delC, located 3 bp downstream from a donor splice site region in the ADAMTS3 gene. Sanger sequencing of cDNA containing this variant further revealed that it is likely to introduce an early splice site in exon 14, resulting in a loss of six amino acids at the junction of exon 14 and intron 14/15. A genotyping assay was developed and the ENSOARG00000013204:g.87124344delC variant segregated with disease in 194 animals, allowing for effective identification of carrier animals.

Key words: Pulmonary hypoplasia with anasarca, sheep, recessive, inherited disease, whole genome sequencing


Introduction

Pulmonary hypoplasia with anasarca or hydrops foetalis is a lethal inherited disease that is characterised by diffuse generalised oedema within body cavities and tissues, under-developed lungs and generalised hypoplasia of lymph tissues (Windsor et al. 2006; Whitlock et al. 2008;

Alleaume et al. 2012). Pulmonary hypoplasia with anasarca has been reported in cattle

(OMIA001562-9913), and documented as hydrops foetalis in several other species including humans (OMIM236750; OMIM613124), within murine models (Caron & Smithies 2001;

Mackie et al. 2018), sheep (OMIA000493-9940), rabbits (OMIA000493-9986), dogs

(OMIA000493-9615) and pigs (OMIA000493-9823).

In humans, hydrops foetalis is not considered a standalone disorder, but rather a feature of many disorders, with non-immune hydrops foetalis (NIHF; OMIM 236750) being the most frequent form of hydrops foetalis reported (Bellini et al. 2009). Causes of NIHF are often multifaceted, and can include foetal and chromosomal abnormalities, metabolic disorders, congenital infections, lysosomal storage diseases and rare genetic disorders (Bellini et al. 2009; Sparks et al. 2019). Given the vast range of genetic etiologies for NIHF, the approach of categorising the affected organs in addition to NIHF has meant that numerous genes of interest have been identified (Mardy et al. 2019).

The genetic etiology of hydrops foetalis in rabbits and dogs is yet to be determined, however a recent CRISPR-Cas9 experiment utilising a 5 base pair (bp) deletion in the FBN1 gene has resulted in piglets with hydrops foetalis phenotypes (Tsang et al. 2020). In mice, a genetic model


involving a gene knockout of the Adm gene resulted in mice with similarly severe hydrops foetalis and cardiovascular defects (Caron & Smithies 2001). In ruminants, pulmonary hypoplasia with anasarca (PHA) has been reported in Dexter, Shorthorn, Maine-Anjou, Cika and Holstein cattle (Windsor et al. 2006; Whitlock et al. 2008; Svara et al. 2016; Švara et al. 2017; Häfliger et al. 2020), and in several sheep breeds, including Cheviot, Merino, Poll Dorset-cross, Awassi and crossbred sheep (Plant et al. 1987; d'Assonville 1989; Hailat et al. 1997; Monteagudo et al. 2002; Alleaume et al. 2012). The genetic cause has been established in Maine-Anjou and Dexter cattle: a point mutation and an 84 bp deletion, respectively (Whitlock et al. 2008). However, the causal gene was not disclosed in these breeds despite the availability of a commercial test. In Cika cattle, a causative missense mutation for PHA has been identified in a disintegrin and metalloproteinase with thrombospondin type 1 motif 3 (ADAMTS3) gene located on chromosome 20 (Häfliger et al. 2020). Häfliger et al. (2020) also identified a trisomy of chromosome 20 in a Holstein foetus affected with PHA, and the resulting copy number variation of ADAMTS3 was proposed to be disease causing.

Here, we report the clinical signs, the pathology of four affected foetuses and the identification of a likely causal mutation for PHA in Persian/Persian-cross sheep.


Materials and methods

Animals

Samples from three different Australian Persian/Persian-cross sheep flocks were available for this study. The collection of blood and tissue samples for genetic analysis was approved by the

University of Sydney Animal Ethics Committee (Project No: 2016/998).

Flock 1: One intact affected foetus was available for a complete necropsy (PPHA68). Additional tissue samples from four affected foetuses (PPHA69, PPHA70, PPHA93 and PPHA94) were collected by veterinarians, and were available for histopathology only.

Blood and tissue samples for DNA analysis were collected by veterinarians and/or the owner.

EDTA blood samples were submitted from seven obligate carriers (PPHA73, PPHA74, PPHA79,

PPHA91, PPHA92, PPHA97 and PPHA98) and 26 clinically unaffected sheep. Tissue samples were available from seven affected foetuses (PPHA68, PPHA69, PPHA70, PPHA93 and

PPHA94 collected for histopathology, as well as lung, liver, kidney and/or spleen from two additional foetuses PPHA72 and PPHA100). Tissue samples were either formalin fixed, frozen fresh or frozen in RNAlater (ThermoFisher Scientific, DE, USA). Blood cards of 87 clinically unaffected animals were collected by the owner as per diagnostic DNA testing protocols (NSW

Department of Primary Industries 2017). Pedigree information was provided by the owner from personal records and was visualised using R package kinship2 version 1.8.5 (Sinnwell et al.

2014) (https://CRAN.R-project.org/package=kinship2).


Flock 2: Cases of affected foetuses showing clinical signs similar to affected foetuses in flock 1 were historically observed. The owner of flock 1 had purchased several sheep from this flock.

EDTA-blood samples from one obligate carrier (PPHA21) and 66 clinically unaffected sheep were provided by the owner.

Flock 3: A formalin fixed paraffin embedded tissue sample of a single affected foetus (PPHA71) showing clinical signs similar to affected foetuses in Flock 1 was provided by the owner.

DNA and RNA extraction

Genomic DNA was extracted from 98 EDTA blood samples and 10 tissue samples using the

QIAGEN DNeasy Blood & Tissue Kit (QIAGEN, CA, USA) following the manufacturer’s protocol. Formalin-fixed and paraffin-embedded tissues were treated using a tissue wash protocol (Rivero et al. 2006) before extraction. Genomic DNA was isolated from blood cards collected from 87 animals using a standard blood card digest (O’Rourke et al. 2017).

Total RNA was extracted from fresh spleen tissue stored in RNAlater (ThermoFisher Scientific,

DE, USA) from two affected animals (PPHA93 and PPHA94) using the QIAGEN RNeasy Mini

Kit (QIAGEN, CA, USA) following the manufacturer’s protocol for animal tissues.


Pathology

Detailed necropsy was conducted on one affected foetus (PPHA68). Tissue samples (skin, lymph nodes, lung, thymus, liver, spleen, bone marrow, heart, kidney, adrenal gland, pancreas, salivary gland, small intestine, large intestine, skeletal muscle, visceral nerves and brain) were collected and fixed in formalin.

Additional formalin-fixed tissues of affected foetuses PPHA69 (lung), PPHA70 (brain) and

PPHA93 and PPHA94 (lung, skin, mesenteric lymph node, liver, kidney, heart, spleen, gall bladder and small and large intestine) were collected for histopathology by veterinarians.

Formalin-fixed tissues were embedded in paraffin wax, cut and then stained with haematoxylin and eosin.

SNP genotyping

Genomic DNA samples from five affected foetuses (PPHA68, PPHA69, PPHA72, PPHA93 and

PPHA94), three obligate carriers (PPHA73, PPHA91 and PPHA92) and six clinically normal

Persian/Persian-cross sheep as well as 22 control samples from different breeds (six Icelandic and 16 Merino sheep) were submitted to the Australian Genome Research Facility (AGRF) for genotyping with the Illumina® OvineSNP50 Genotyping BeadChip (Neogen). Runs of homozygosity (ROH) were computed using PLINK v1.07 (Purcell et al. 2007). Threshold values of 80% and 0.01 were used for single nucleotide polymorphism (SNP) call rate and minor allele frequency, respectively. ROH were identified using the --homozyg command. The minimum length of a ROH was pre-set at 5 megabases (Mb), spanning a minimum of 100 SNPs. In order to account for 1% error in genotyping calls, one heterozygote and up to two missing


genotypes were allowed for each ROH. The ROH were visualised using R software (Team 2014) and Excel.
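The ROH criteria described above (at least 100 SNPs, at least 5 Mb, at most one heterozygote and two missing genotypes per run) can be illustrated with a minimal sketch. This is a simplified greedy scan written for illustration only; it is not the sliding-window procedure implemented in PLINK. The function name `find_roh` and the genotype encoding (0 or 2 for homozygous calls, 1 for heterozygous, `None` for missing) are assumptions made for this example.

```python
def find_roh(positions, genotypes, min_snps=100, min_length_bp=5_000_000,
             max_het=1, max_missing=2):
    """Greedy scan for runs of homozygosity (ROH) on a single chromosome.

    positions: SNP coordinates in bp, sorted ascending.
    genotypes: one call per SNP (assumed encoding: 0 or 2 = homozygous,
               1 = heterozygous, None = missing).
    Returns a list of (start_bp, end_bp, n_snps) tuples.
    """
    runs = []
    n = len(positions)
    start = 0
    while start < n:
        # A run must begin on a homozygous call.
        if genotypes[start] is None or genotypes[start] == 1:
            start += 1
            continue
        het = missing = 0
        last_hom = start
        i = start + 1
        while i < n:
            g = genotypes[i]
            if g is None:
                if missing == max_missing:
                    break
                missing += 1
            elif g == 1:
                if het == max_het:
                    break
                het += 1
            else:
                last_hom = i  # the run is trimmed to end on a homozygous call
            i += 1
        n_snps = last_hom - start + 1
        length_bp = positions[last_hom] - positions[start]
        if n_snps >= min_snps and length_bp >= min_length_bp:
            runs.append((positions[start], positions[last_hom], n_snps))
            start = last_hom + 1
        else:
            start += 1
    return runs
```

Applied per animal and per chromosome, the returned intervals could then be intersected across affected animals to look for shared homozygous regions.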

Whole genome sequencing

Genomic DNA concentration and purity for samples from two affected foetuses (PPHA93 and

PPHA94) and the obligate carrier dam (PPHA92) of PPHA94 were measured using the

NanoDrop 8000 spectrophotometer and Qubit® 3.0 fluorometer (Thermo Scientific, DE, USA) and visualised on a 1% agarose gel.

Whole genome sequencing was performed on the DNA of these three animals using the Illumina

HiSeq™ X Ten sequencing platform (Illumina, San Diego, CA) by the Kinghorn Centre for

Clinical Genomics (Garvan Institute of Medical Research, Darlinghurst, Australia). DNA libraries were prepared using the Illumina® TruSeq DNA Nano Library Prep kit for both affected animals, and the Roche KAPA PCR-Free Library Prep kit for the obligate carrier animal. Each sample was sequenced as 150 base pair (bp) paired-end reads at an expected 30x coverage. Adaptor sequences were removed by the service provider. Quality visualization and control was conducted on the resulting sequence reads using FastQC (version 0.11.3)

(https://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Inspection of FastQC output indicated that the sequence data for all three samples were of good quality (yield ranged from

58.37 Gb to 76.23 Gb, 82.45% to 96.6%>PHRED30, 40.5% to 42% GC content and no adaptor contamination flagged). Therefore, no quality trimming was conducted.
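As a rough cross-check of the reported yields against the expected 30x coverage, mean raw depth can be approximated as total yield divided by genome size. The sketch below assumes a sheep genome size of about 2.6 Gb; this value is an assumption for illustration and is not stated in the text. The estimate ignores unmapped reads and duplicates.

```python
# Assumption: approximate sheep genome size in Gb (not taken from the text).
ASSUMED_GENOME_SIZE_GB = 2.6


def mean_depth(yield_gb, genome_size_gb=ASSUMED_GENOME_SIZE_GB):
    """Approximate mean raw sequencing depth as yield divided by genome size."""
    return yield_gb / genome_size_gb


# Reported lowest and highest yields across the three samples.
for label, yield_gb in (("lowest yield", 58.37), ("highest yield", 76.23)):
    print(f"{label}: {yield_gb} Gb gives about {mean_depth(yield_gb):.1f}x")
```

Under this assumed genome size, the reported yields correspond to roughly 22x to 29x raw depth.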


Read mapping, variant calling and annotation

Paired-end sequence reads were mapped to the Ovis aries Oar_v3.1 genome assembly

(GCA_000298735.1) using Burrows-Wheeler Aligner (BWA-mem) version 0.7.15 (Li & Durbin

2009) with default settings. PCR duplicates were marked using samblaster version 0.1.22 (Faust

& Hall 2014). Lane-level BAMs were merged using Picard version 1.119

(http://picard.sourceforge.net). Sorting and indexing was performed with SAMtools version 1.6.

Local realignment around insertion and deletion sites as well as base quality score recalibration using known variants downloaded from Ensembl’s dbSNP database for Ovis aries version 87

(Zerbino et al. 2017) were performed with the Genome Analysis Toolkit version 3.7.0 (GATK)

(McKenna et al. 2010; DePristo et al. 2011).

Single nucleotide polymorphisms (SNP) were called using GATK HaplotypeCaller in GVCF mode (Van der Auwera et al. 2013) and were genotyped using GATK GenotypeGVCFs

(McKenna et al. 2010; DePristo et al. 2011). Annotation and prediction of functional effects of

SNPs were conducted using SnpEff version 4.3 (Cingolani et al. 2012b) and the Ensembl annotation release 86 for Oar_v3.1.

Candidate gene analysis

Candidate genes were identified using searches in PubMed (NCBI 2018), Online Mendelian

Inheritance in Man (OMIM) (Online Mendelian Inheritance in Man) and Online Mendelian

Inheritance in Animals (OMIA) (Online Mendelian Inheritance in Animals 2019). The terms


‘Genetic OR congenital’, ‘lymphoedema’, ‘pulmonary hypoplasia,’ and ‘hydrops (foetalis OR fetalis) AND pulmonary hypoplasia’ were used to search the PubMed database. The term ‘non-immune hydrops foetalis’ was used to search the OMIM database, and the exhaustive search terms ‘hydrops foetalis,’ ‘lymph node hypoplasia’ and ‘pulmonary hypoplasia’ were used for the

OMIA database. Functional candidate genes were prioritised based on their biological gene function and disease associations from OMIM, Mouse Genome Informatics (MGI) (Bult et al.

2019) and PubMed to identify genes causing similar phenotypes to PHA.

Variant filtering

Variants annotated by SnpEff (Cingolani et al. 2012b) within a 40 Mb region containing the functional candidate gene ADAMTS3 (GCA_000298735.1, OAR6: 87097877-87386270) were selected for filtering using a case-control approach in SnpSift (Cingolani et al. 2012a) version 4.

Variants that were homozygous alternate for affected foetuses PPHA93 and PPHA94 and that were not homozygous alternate for obligate carrier PPHA92 were selected. Variants were filtered for ‘low’, ‘moderate’ or ‘high’ impact on protein function as annotated by SnpEff (Cingolani et al. 2012b) and known dbSNP variants and duplicate variants were manually removed.

Of these, variants present in the positional candidate gene ADAMTS3 underwent visual inspection using SAMtools tview in the sequence data for PPHA93, PPHA94 and PPHA92.
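The case-control filtering logic described above can be sketched in a few lines. This is an illustration only and does not reproduce the SnpSift commands used in the study; the record layout (dictionaries with `genotypes`, `impact` and `dbsnp_id` keys) and the function names are hypothetical. Sample names and impact categories follow the text.

```python
AFFECTED = ("PPHA93", "PPHA94")   # affected foetuses (from the text)
CARRIER = "PPHA92"                # obligate carrier dam (from the text)
KEPT_IMPACTS = {"LOW", "MODERATE", "HIGH"}
HOM_ALT = {"1/1", "1|1"}          # VCF-style homozygous alternate genotypes


def passes_case_control(variant):
    """True if both affected animals are homozygous alternate, the carrier is
    not homozygous alternate, the annotated impact is low, moderate or high,
    and the variant has no known dbSNP identifier."""
    gts = variant["genotypes"]
    affected_hom_alt = all(gts.get(sample) in HOM_ALT for sample in AFFECTED)
    # A missing carrier genotype also counts as "not homozygous alternate".
    carrier_not_hom_alt = gts.get(CARRIER) not in HOM_ALT
    return (affected_hom_alt
            and carrier_not_hom_alt
            and variant["impact"] in KEPT_IMPACTS
            and not variant.get("dbsnp_id"))


def filter_variants(variants):
    """Apply the case-control filter and drop duplicate records."""
    seen = set()
    kept = []
    for v in variants:
        key = (v["chrom"], v["pos"], v["ref"], v["alt"])
        if key in seen or not passes_case_control(v):
            continue
        seen.add(key)
        kept.append(v)
    return kept
```

A stricter variant of this sketch would require the obligate carrier to be heterozygous, which matches the expected genotype under recessive inheritance; the text specifies only that the carrier is not homozygous alternate.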


Validation of ENSOART00000014359.1:c.2055+3delG

PrimerBLAST (Ye et al. 2012) was used to design a primer pair to amplify the region flanking the ENSOART00000014359.1: c.2055+3delG (Oar_v3.1) variant. PCR amplification of a 360 base pair (bp) product was performed in one affected foetus (PPHA68), six obligate carrier animals (PPHA21, PPHA73, PPHA74, PPHA79, PPHA91 and PPHA92), five clinically normal

Persian/Persian-cross sheep (PPHA12, PPHA13, PPHA19, PPHA83 and PPHA89) and one

Merino control using a Gradient Palm-Cycler™ Thermal Cycler (CGI-96, Corbett Life Science,

NSW, Australia) in a total volume of 25μL, containing 1x Platinum™ SuperFi™ PCR Master

Mix (Invitrogen, ThermoFisher Scientific, DE, USA), 0.5μM of each primer F1 5’-

AAAATTTTCTCCAGTGACCAGTTTA-3’ and R2 5’-

CTTCCATCTTATACAGCCAAGAAAA-3’ and approximately 50 ng of genomic DNA. The initial denaturation step was performed at 98°C for 30 seconds, followed by 40 cycles consisting of a denaturation step at 98°C for 10 seconds, annealing at 64°C for 10 seconds and extension at 72°C for 30 seconds. A final extension was performed at 72°C for 5 minutes. PCR products were visualised on a 2% agarose gel before submission to Macrogen (Seoul, Korea) for DNA sequencing. Sequencing data were analysed by aligning the sequences to the NCBI Oar_v3.1 sheep genome.

Presence of the variant in sheep was also screened in a database of sequence variants generated by the Agriculture Victoria Research team at the Centre For AgriBioscience, Melbourne. These variants were discovered from 935 sheep sequences: 453 from the SheepGenomesDB Project and


482 contributed by the Sheep CRC Project (Daetwyler et al. 2017). A range of different breeds were represented, including 21 Persian sheep from Iran.

In-silico prediction of consequences of ENSOART00000014359.1: c.2055+3delG

To assess the strength of the 5’ splice site junction, a splice site motif scoring system based on a maximum entropy scoring model (MaxENT), MaxEntScan::score5ss (http://hollywood.mit.edu/burgelab/maxent/Xmaxentscan_scoreseq.html) (Yeo & Burge 2004), was used. A 9-mer sequence for the donor 5’ splice site of intron 14/15 was used for the ovine ADAMTS3 wildtype (ADAMTS3_WT: gtgGTGAGT), the mutant splice site (ADAMTS3_MT: gtgGTAGTT) and a cryptic splice site identified in cDNA analysis (ADAMTS3_CT: tgtGTGCGT). As a comparison, the human wildtype (ADAMTS3_HUM_WT: gtgGTAAGT), mutant (ADAMTS3_HUM_MT: gtgGTAGTT) and cryptic (ADAMTS3_HUM_CT: tgtGTGCGA) ADAMTS3 equivalents of the ovine variant were also tested for splice site strength. For these analyses, only the Maximum Entropy Model was selected as the scoring model.
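The relationship between the wildtype and mutant 9-mers given above can be checked with a short sketch. A donor-site 9-mer consists of the last three exonic bases followed by the first six intronic bases, so deleting the base at intronic position +3 shifts the next intronic base into the window. The helper names below are hypothetical, and the seventh intronic base (T) is inferred from the final base of the mutant 9-mers quoted in the text. The sketch only builds the input sequences; it does not compute MaxEntScan scores, which require the published model.

```python
def donor_9mer(exon_tail, intron_head):
    """Build a donor-site 9-mer: last 3 exonic bases (lower case)
    followed by the first 6 intronic bases (upper case)."""
    return exon_tail[-3:].lower() + intron_head[:6].upper()


def delete_intron_base(intron_head, position):
    """Delete the intronic base at a 1-based position (+1 = first intronic base)."""
    i = position - 1
    return intron_head[:i] + intron_head[i + 1:]


# Wildtype exon/intron context: 9-mers from the text plus the next intronic base.
contexts = {
    "ovine": ("gtg", "GTGAGTT"),
    "human": ("gtg", "GTAAGTT"),
}

for species, (exon_tail, intron_head) in contexts.items():
    wildtype = donor_9mer(exon_tail, intron_head)
    mutant = donor_9mer(exon_tail, delete_intron_base(intron_head, 3))
    print(species, wildtype, "->", mutant)
```

Both the ovine and the human wildtype contexts give the same mutant 9-mer after the +3 deletion, which agrees with the identical ADAMTS3_MT and ADAMTS3_HUM_MT sequences listed above.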

The JSI splice site prediction tool varSEAK (v.2.0; https://varseak.bio/index.php) for human sequences was used to analyse the predicted impact of the equivalent variant in the human

ADAMTS3 transcript (ENST00000286657.4) using ADAMTS3_HUM_MT: gtgGTAGTTCAAATTGCTTCCCCAAAG as the input sequence.


In-vivo consequences of ENSOART00000014359.1: c.2055+3delG

To assess the impact of the ENSOART00000014359.1:c.2055+3delG (Oar_v3.1) variant in cDNA, PrimerBLAST (Ye et al. 2012) was used to design a primer pair to amplify the cDNA from exons 13 to 17 that contained the variant. A reverse transcriptase PCR (RT-PCR) amplification was performed in two affected sheep (PPHA93 and PPHA94) using a Gradient

Palm-Cycler™ Thermal Cycler (CGI-96, Corbett Life Science, NSW, Australia) in a total volume of 25 μL containing 10 μL of 2.5x OneStep Ahead RT-PCR Master Mix (QIAGEN, CA, USA), 1 μL of 25x OneStep Ahead RT-Mix (QIAGEN, CA, USA), 0.5 μM of each primer F4 5’-TGAACATCCCGACTCCAAGAA-3’ and R4 5’-TTGACTTGGCTTCCTCCCCTT-3’, approximately 100 ng of total RNA from PPHA93 and PPHA94 and additional RNase free water to the total volume per reaction. Reverse transcription was performed at 50°C for 10 minutes, followed by an initial denaturation step at 95°C for 5 minutes. This was followed by 40 cycles consisting of a denaturation step at 95°C for 10 seconds, annealing at 55°C for 10 seconds and an extension at 72°C for 10 seconds. A final extension step was performed at 72°C for 10 seconds. The RT-PCR products were visualised on a 2% agarose gel before submission to Macrogen (Seoul, Korea) for Sanger sequencing. Sanger sequencing results contained overlapping chromatogram peaks, and so a nested PCR was conducted to improve specificity.

A primer pair was designed for a nested PCR using PrimerBLAST (Ye et al. 2012). Primers were located in exon 14 and exon 17 and diluted RT-PCR product from both affected animals was used as template. Amplification was conducted in a Gradient Palm-Cycler™ Thermal Cycler

(CGI-96, Corbett Life Science, NSW, Australia) in a total volume of 25μL containing 10 μL of


2.5x OneStep Ahead RT-PCR Master Mix (QIAGEN, CA, USA), 1 μL of 25x OneStep Ahead RT-Mix (QIAGEN, CA, USA), 0.5 μM of each primer R1 5’-CGCGCTGTTCCTACAAAGAC-3’ and R4 5’-TTGACTTGGCTTCCTCCCCTT-3’, a 1/1000 dilution of the RT-PCR product and additional RNase free water to the total volume per reaction.

As for the RT-PCR, the reaction was held at 50°C for 10 minutes and then at 95°C for 5 minutes. This was followed by 40 cycles consisting of a denaturation step at 95°C for 10 seconds, annealing at 55°C for 10 seconds and an extension at 72°C for 10 seconds. A final extension step was performed at 72°C for 10 seconds. The nested PCR products were visualised on a 2% agarose gel before submission to Macrogen (Seoul, Korea) for Sanger sequencing.

After cDNA analysis, the strength of the 5’ cryptic splice site junction for intron 14/15 resulting from the c.2055+3delG was investigated using MaxEntScan::score5ss

(http://hollywood.mit.edu/burgelab/maxent/Xmaxentscan_scoreseq.html) (Yeo & Burge 2004).

A 9-mer sequence for the donor 5’ splice site of intron 14/15 was used for the ovine ADAMTS3 wildtype (ADAMTS3_WT: gtgGTGAGT) and cryptic splice site (ADAMTS3_CT: tgtGTGCGT) cDNA. As a comparison, the human wildtype (ADAMTS3_HUM_WT: gtgGTAAGT) and cryptic (ADAMTS3_HUM_CT: tgtGTGCGA) ADAMTS3 cDNA (ENST00000286657.10) equivalents of the ovine variant were also tested for splice site strength. For both analyses, only the

Maximum Entropy Model was selected for the scoring model.


TaqMan PCR genotyping assay

A TaqMan real-time PCR was designed using the Custom TaqMan® Assay Design tool

(ThermoFisher Scientific, DE, USA) to discriminate between homozygous wildtype, heterozygous and homozygous mutant genotypes.

Allelic discrimination was performed using the ViiA™ 7 system (Applied Biosystems™, CA,

USA) in a final reaction volume of 12.5 µL. Each reaction contained 1x TaqMan® Genotyping Master Mix (Applied Biosystems, CA, USA), 900 nmol/L of assay specific forward primer 5’-ACTAGAGGAAGCTTCGGTGGAA-3’ and reverse primer 5’-CAAAGACCCGTACAGCATATGTGT-3’, 250 nmol/L of allele specific 5’-VIC-TGTGTGGTGAGTTCAG-NFQ-3’ (wildtype) and 5’-FAM-AGTGTGTGGTAGTTCAG-NFQ-3’ (mutant) probes and approximately 10-30 ng of genomic DNA. Each assay commenced with a pre-read stage at 60°C for 30 seconds followed by an initial denaturation at 95°C for 10 minutes, then 45 cycles of denaturation at 95°C for 15 seconds and annealing/extension at 60°C for 60 seconds, and a final post-read stage at 60°C for 30 seconds. Genotypes were analysed using the

QuantStudio™ Real-Time PCR System version 1.3 (Applied Biosystems™, CA, USA).

Results

Clinical signs and pedigree analysis

Several cases of abnormally large pregnant ewes with oedematous foetuses leading to severe dystocia were reported in at least three flocks of Persian and Persian-cross sheep from 2014 onwards. This is a relatively uncommon breed in Australia and the three flocks were known to be


genetically linked. Clinical signs first manifested in pregnant ewes that showed bloating, lethargy and recumbency. On vaginal examination of the ewes, it became apparent that the foetuses were profoundly large and oedematous requiring a caesarean section or euthanasia of the ewe. The affected foetuses showed systemic oedema with high volumes of pleural fluid and were stillborn

(Figure 1).

Figure 1 Gross morphology of PHA-affected foetuses reported in Persian/Persian-cross sheep.

(a) An affected PHA foetus with severe malformation due to systemic oedema. (b) A submandibular midline incision showing subcutaneous fluid accumulation. (c) Thoracic viscera including left hypoplastic lung (L), heart (H) and thymus (T). (d) Right thorax with pleural fluid accumulation. Heart (H) is visible but extensive fluid obscures the hypoplastic lungs.


Pedigree analysis of flock 1 supports the suspected recessive mode of inheritance for PHA in

Persian sheep (Figure 2).

Figure 2 Pedigree of flock 1 with available genotypes showcasing inbreeding within the flock and segregation of the disease-causing mutation. Males are designated by a square, and females are designated by a circle. Wildtype animals (C/C) are designated by black outlines, obligate carriers (C/-) are highlighted in red and affected animals (-/-) are designated by black-filled symbols. Symbols with a diagonal strike-through have not been genotyped or sequenced.

Connecting dotted lines represent the same animal in other pedigree branches.


Gross pathology

Full necropsy conducted on a female foetus (PPHA68) showed marked anasarca with subcutaneous oedema over the head, neck, thorax and limbs (Figure 3). The foetus had a crown to rump length of 50 cm and a bodyweight of 9.7 kg. Moderate oedema was also observed in the caudal portion of the foetus, with patchy subcutaneous congestion and haemorrhage. No prominent lymph nodes were visible in the peripheral, mesenteric or other abdominal regions.

A small structure measuring 3 mm was observed in the parotid salivary tissue, although it could not be confirmed as a lymph node. The liver, kidneys and spleen appeared normal. The abomasum contained a moderate amount of yellow mucoid fluid and there was mild reddening of the small intestinal serosa. The meconium of the large intestine was semi-fluid. Large quantities of yellow fluid were present in the thorax, and the lungs were bilaterally hypoplastic, with each lobe measuring approximately 25 mm in length, 20 mm wide and 12 mm in depth. The heart appeared to be globose but otherwise normal. Excess translucent pericardial fluid was observed, as well as marked subserosal oedema of the parietal pleura. The thymus appeared normal, and the brain appeared mildly misshapen with an elongated appearance. The spine appeared fractured and separated in the upper lumbar region, with minor haemorrhage in adjacent soft tissue. In addition, mandibular prognathia was observed, with the lower incisor region extending beyond the dental pad.


Figure 3 Histopathology of PHA-affected foetus PPHA68 tissues. (a) Dysplastic mesenteric lymph node lacking normal architectural arrangement. The lymphocytes are scant, and are intermingled with granulocytes. (b) Liver with proliferative disorganised bile ductules and intraductular or periductular bile stasis. Scale bar = 200 µm.

For PPHA93 and PPHA94, samples for histopathology were collected by a veterinarian. At sample collection, large quantities of pleural fluid were observed in both PPHA93 and PPHA94.

Necropsy of lung tissue from PPHA93 revealed small lung lobes measuring 55 mm in total length with the caudal lobe measuring 30 mm in length, 17 mm in width and 17 mm in depth.

Necropsy of PPHA94 showed small lungs similar to PPHA93, as well as a swollen liver with uniform orange discolouration. The gall bladder had collapsed with only trace bile present. Gross pathology was not available for PPHA69 and PPHA70, but preliminary diagnosis of all affected cases included pulmonary hypoplasia with anasarca with lymphoid hypoplasia and dysplasia, as well as cholangiopathy in some cases (PPHA68 and PPHA94).


Histopathology

Skin samples from PPHA68 showed marked oedema and mild diffuse mixed inflammatory cell infiltration of the deep dermis and subcutis (Figure 3). Lymphatic tissues were dilated and of irregular tortuous shape. The mesenteric lymph nodes were small and poorly populated with lymphocytes, with disorganised architecture and lack of normal capsular, subcapsular and trabecular structure. The lungs were congested with patchy accumulation of proteinaceous fluid within alveoli. Bronchioles contained collapsed lumina with papillary projections of mucosal epithelium. There was fibrosis of subpleural connective tissue. There was marked patchy colonisation of alveoli and occasional bronchioles with bacilli, but no evidence that this colonisation was associated with necrosis or inflammatory cell infiltration. The liver of PPHA68 contained diffuse bile ductule proliferation with mild associated portal fibroplasia. There was marked stasis of the bile within canaliculi and within some bile ductules. The remaining tissues collected showed no significant findings.

In animal PPHA69, the right lung measured 35 mm in length, 15 mm wide and 12 mm in depth.

The bronchi appeared smaller than normal, with lobes showing crowding of bronchi, bronchioles and large blood vessels. A moderately thick pleural sub-mesothelial layer of fibrous tissue containing many lymph vessels and blood vessels was observed. As only the brain was available for PPHA70 and did not show any significant findings, no further observations could be made regarding other organ structures. The dams of PPHA69 and PPHA70 were also available for necropsy. The dam of PPHA69 showed marked uterine and placental oedema and hyperplastic abomasitis, whilst the dam of PPHA70 showed hyperplastic abomasitis.


In animals PPHA93 and PPHA94, skin samples showed expanded subcutaneous connective tissue due to diffuse oedema, with plump pleomorphic spindle cells. In the lung, there was a diffuse layer of pleural sub-mesothelial deposition of loose fibrous tissue containing many lymph vessels. The bronchi appeared smaller than normal, with lobes showing crowding of bronchi and bronchioles. Blood vessels adjacent to the bronchi were also prominent, with accumulation of patchy intra-alveolar foamy macrophages in PPHA93, with PPHA94 containing respiratory bronchioles lined by prominent cuboidal epithelium. The mesenteric lymph nodes for both animals contained areas of clustered abnormal lymph vessels within a loose collagenous stroma containing scant lymphocytes and areas of extramedullary haemopoiesis. In PPHA93, the liver was congested with mild diffuse hydropic change of hepatocytes; in PPHA94, there was diffuse hydropic change of hepatocytes. For PPHA94, there was also marked diffuse cholangiole proliferation in portal areas, and moderate portal proliferation of plump spindle cells resembling primitive mesenchyme. The kidney cortex was congested, and the cortical clusters of tubules of immature appearance alternated with areas of normal tubules. Glomeruli appeared to be concentrated in the outer cortical zone, with dilatation of the renal pelvis and moderate oedema of hilar connective tissue. There were no significant findings for the heart, spleen, small intestine and large intestine for either animal.

Identification of candidate genes

Five protein coding genes were identified as functional candidate genes based on similar phenotypes to PHA observed in humans and mice (Table S1). These genes included the


ADAMTS3 gene, the Fms Related Receptor Tyrosine Kinase 4 (FLT4) gene, the Forkhead Box

C2 (FOXC2) gene, the Piezo Type Mechanosensitive Ion Channel Component 1 (PIEZO1) gene and the SRY-Box Transcription Factor 18 (SOX18) gene. The ADAMTS3 gene has been implicated in lymphangiogenesis, and the proteins from the ADAMTS family are thought to have numerous functions within the extracellular matrix (Tang 2001; Janssen et al. 2016). The

FLT4 and FOXC2 genes are associated with lymphangiogenesis during early development

(Kaipainen et al. 1995; Sheik et al. 2015). Similarly, the PIEZO1 protein is involved in the mechanotransduction of several cell types, including the vasculature system. The SOX18 gene is involved in maintaining the endothelial barrier of cells, as well as vascular and lymphatic development and maintenance (Irrthum et al. 2003; Fontijn et al. 2008). The ADAMTS3 gene was considered a strong candidate gene based on its function in an Adamts3-/- knockout mouse model (Janssen et al. 2016) and the ROH results in this study. The selection of ADAMTS3 as a strong candidate gene was later supported by its genetic implication in PHA-affected cattle

(Häfliger et al. 2020).

SNP genotyping

Samples with call rates under 98% were removed from this study, which resulted in the removal of five animals: three PHA-affected foetuses (PPHA68, PPHA69 and PPHA72) and two control Merino sheep. Call rates for the remaining 31 samples averaged 99.5%.
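The call rate filter described above can be sketched as follows. This is a minimal illustration only: the function name and the example call rates are hypothetical and are not taken from the study data.

```python
def filter_by_call_rate(call_rates, threshold=0.98):
    """Split samples into kept and removed groups by SNP call rate.

    call_rates: dict mapping sample ID to call rate (0.0 to 1.0).
    Samples with a call rate below the threshold are removed.
    """
    kept = {s: r for s, r in call_rates.items() if r >= threshold}
    removed = {s: r for s, r in call_rates.items() if r < threshold}
    return kept, removed


# Hypothetical call rates, for illustration only
example = {"sample_A": 0.995, "sample_B": 0.971, "sample_C": 0.980, "sample_D": 0.902}
kept, removed = filter_by_call_rate(example)
print(sorted(kept))     # ['sample_A', 'sample_C']
print(sorted(removed))  # ['sample_B', 'sample_D']
```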

The PHA-affected foetuses (PPHA93 and PPHA94) showed a shared region of homozygosity for approximately 40 Mb on chromosome 6 that contained the ADAMTS3 gene. This region of homozygosity for ADAMTS3 was not conserved in the carrier or control Persian/Persian-cross animals (Figure 4). For the remaining four candidate genes, no shared regions of homozygosity were identified for the two PHA-affected foetuses.
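The idea of a shared region of homozygosity can be illustrated with a simplified sketch. It is not the method used in this study: the genotype coding (0, 1, 2), the function name and the example data are all assumptions made for illustration. The function scans ordered SNP genotypes and reports runs in which every affected animal is homozygous for the same allele.

```python
def shared_homozygous_runs(genotypes, min_snps=3):
    """Return (start, end) SNP index pairs of runs where all animals are
    homozygous for the same allele.

    genotypes: list of equal-length lists, one per animal, with calls coded
    0 (homozygous reference), 1 (heterozygous), 2 (homozygous alternate)
    or None (missing). A heterozygous, discordant or missing call ends a run.
    """
    n_snps = len(genotypes[0])
    runs, start = [], None
    for i in range(n_snps):
        calls = {g[i] for g in genotypes}
        shared = len(calls) == 1 and calls <= {0, 2}
        if shared and start is None:
            start = i
        elif not shared and start is not None:
            if i - start >= min_snps:
                runs.append((start, i - 1))
            start = None
    # Close a run that extends to the last SNP
    if start is not None and n_snps - start >= min_snps:
        runs.append((start, n_snps - 1))
    return runs


# Hypothetical genotypes for two affected animals across eight ordered SNPs
affected_1 = [0, 1, 2, 2, 0, 2, 1, 0]
affected_2 = [0, 0, 2, 2, 0, 2, 2, 0]
print(shared_homozygous_runs([affected_1, affected_2]))  # [(2, 5)]
```

In practice the run boundaries would be converted back to base-pair positions and compared against the positions of the candidate genes.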

Figure 4 Schematic summary of SNP genotyping results for affected (PPHA93 and PPHA94), carrier (PPHA91, PPHA92, PPHA19, PPHA21 and PPHA73) and control (PPHA83, PPHA89, PPHA12, PPHA13) Persian/Persian-cross animals on the Ovis_aries_1.0 genome assembly (GCA_000005525.1). The region OAR6: 95302002-95870367 containing the ADAMTS3 gene is highlighted by a black box. A region of homozygosity is highlighted in green amongst affected animals and heterozygous SNPs are highlighted in yellow. The SNP marker at OAR6_95646692.1 was removed due to poor performance across all samples.

Whole genome sequencing

Whole genome sequencing data for affected foetuses (PPHA93 and PPHA94) and one obligate carrier (PPHA92) identified 1958 raw variants in the SnpEff annotated VCF file in the ADAMTS3 positional candidate gene.


After removal of known Single Nucleotide Polymorphism Database (dbSNP) variants and duplicates, only one variant was identified within the functional candidate gene ADAMTS3 that was homozygous alternate in at least one affected foetus and heterozygous in the obligate carrier PPHA92. This variant was a single nucleotide deletion predicted to be located within the exon 14/intron 14 splice site, ENSOARG00000013204:g.87124344delC; ENSOART00000014359.1:c.2055+3delG (reverse strand), and was homozygous alternate in both affected foetuses and heterozygous in the obligate carrier PPHA92 (Table S2). Visual inspection of this variant using SAMtools tview (Li et al. 2009) in PPHA93, PPHA94 and PPHA92 confirmed these genotypes.

Validation of c.2055+3delG

Sanger sequencing of 13 animals that included one affected foetus (PPHA68), six obligate carrier animals (PPHA21, PPHA73, PPHA74, PPHA79, PPHA91 and PPHA92), five clinically normal Persian/Persian-cross sheep (PPHA12, PPHA13, PPHA19, PPHA83 and PPHA89) and one Merino control sheep showed segregation of the variant with the PHA phenotype considering a recessive mode of inheritance (Table 1; Figure 5). Affected foetus PPHA68 was homozygous alternate (-/-) for the deletion, all obligate carriers were heterozygous (C/-) and the five clinically normal Persian animals were heterozygous (PPHA83 and PPHA89) or homozygous wildtype (C/C; PPHA12, PPHA13 and PPHA19) for the deletion. The Merino control was homozygous wildtype for the deletion.


Table 1 Sanger sequencing genotype results for the ENSOARG00000013204:g.87124344delC variant identified in the ADAMTS3 gene.

Animal ID        Reference   Alternate   Genotype of individual   Phenotype
Merino control   C           -           CC                       Control
PPHA-19          C           -           CC                       Wildtype
PPHA-12          C           -           CC                       Wildtype
PPHA-13          C           -           CC                       Wildtype
PPHA-73          C           -           C-                       Obligate carrier
PPHA-74          C           -           C-                       Obligate carrier
PPHA-79          C           -           C-                       Obligate carrier
PPHA-91          C           -           C-                       Obligate carrier
PPHA-92          C           -           C-                       Obligate carrier
PPHA-21          C           -           C-                       Obligate carrier
PPHA-83          C           -           C-                       Carrier
PPHA-89          C           -           C-                       Carrier
PPHA-68          C           -           --                       Affected
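The segregation argument for Table 1 can be checked mechanically. The sketch below is illustrative only: it encodes the Table 1 genotypes together with each animal's status as described in the text, and tests them against the expectations of a fully penetrant recessive model (affected animals homozygous for the deletion, obligate carriers heterozygous, clinically normal animals never homozygous for the deletion). The status labels and function name are assumptions introduced here.

```python
# Genotypes from Table 1; status follows the text (clinically normal animals
# include those later labelled Carrier or Wildtype on the basis of genotype).
TABLE_1 = [
    ("Merino control", "CC", "normal"),
    ("PPHA-19", "CC", "normal"),
    ("PPHA-12", "CC", "normal"),
    ("PPHA-13", "CC", "normal"),
    ("PPHA-73", "C-", "obligate_carrier"),
    ("PPHA-74", "C-", "obligate_carrier"),
    ("PPHA-79", "C-", "obligate_carrier"),
    ("PPHA-91", "C-", "obligate_carrier"),
    ("PPHA-92", "C-", "obligate_carrier"),
    ("PPHA-21", "C-", "obligate_carrier"),
    ("PPHA-83", "C-", "normal"),
    ("PPHA-89", "C-", "normal"),
    ("PPHA-68", "--", "affected"),
]

# Genotypes permitted for each status under a recessive model
ALLOWED = {
    "affected": {"--"},
    "obligate_carrier": {"C-"},
    "normal": {"CC", "C-"},
}


def consistent_with_recessive(genotype, status):
    """True if the genotype is compatible with the status under recessive inheritance."""
    return genotype in ALLOWED[status]


violations = [animal for animal, gt, status in TABLE_1
              if not consistent_with_recessive(gt, status)]
print(violations)  # [] : no animal contradicts recessive inheritance
```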


Figure 5 Schematic diagram of the ovine ADAMTS3 gene showing the location of the candidate causal mutation ENSOARG00000013204:g.87124344delC with Sanger sequencing chromatograms for one wildtype control, one obligate carrier and one affected animal. (a) Location of the ovine ADAMTS3 gene, OAR6: 87,097,877-87,386,270 on the Oar_v3.1 ovine genome assembly. (b) Enlarged view of the ADAMTS3 gene with 22 exons. (c) Genomic region containing the g.87124344delC variant with protein translation frames obtained from Ensembl genome browser 98 (Ensembl, accessed 20th September 2020, <https://www.ensembl.org/Ovis_aries/Location/View?db=core;g=ENSOARG00000013204;r=6:87124332-87124392;t=ENSOART00000014359>). The position of the variant is identified by a red box and the protein reading frame is identified by a black box. (d) Sanger sequencing chromatograms for one wildtype (PPHA19), one obligate carrier (PPHA92) and one affected animal (PPHA68).

Results from the TaqMan genotyping assay showed segregation of the variant with disease in the 192 Persian/Persian-cross sheep across all three flocks. All seven affected animals tested were homozygous alternate, all eight reported obligate carriers were heterozygous and the remaining animals were either homozygous wildtype (n=108) or heterozygous (n=68) (Table S3). The DNA quality for the affected animal from flock 3 was insufficient for the genotyping assay. The two Merino controls were homozygous wildtype. Based on pedigree information provided for flock 1, the c.2055+3delG variant segregates with disease (Figure 2). The estimated allele frequency for flocks 1 and 2 was 24%.
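As a worked illustration of how an allele frequency follows from genotype counts, the sketch below pools the counts reported above (7 homozygous alternate, 8 + 68 heterozygous and 108 homozygous wildtype animals). Treating these 191 successfully genotyped Persian/Persian-cross animals as the set underlying the reported estimate is an assumption, and the Merino controls are excluded.

```python
def alt_allele_frequency(n_hom_alt, n_het, n_hom_ref):
    """Frequency of the alternate allele from genotype counts.

    Each homozygous alternate animal contributes two alternate alleles,
    each heterozygous animal contributes one, and every animal carries
    two alleles in total.
    """
    n_animals = n_hom_alt + n_het + n_hom_ref
    return (2 * n_hom_alt + n_het) / (2 * n_animals)


n_het = 8 + 68  # obligate carriers plus other heterozygous animals
freq = alt_allele_frequency(n_hom_alt=7, n_het=n_het, n_hom_ref=108)
print(f"{freq:.3f}")  # 0.236, i.e. approximately 24%
```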

The c.2055+3delG variant was not listed as a known variant in the Ensembl Genome Browser and was not present in the variant database from 935 sequenced sheep processed by Agriculture Victoria Research staff.


Functional consequences of ENSOART00000014359.1: c.2055+3delG

SnpEff identified ENSOART00000014359.1:c.2055+3delG as a splice region and intron variant at the boundary of exon 14 and intron 14/15 of the ADAMTS3 gene. In silico analysis with MaxENT and varSEAK predicted that the variant ENSOART00000014359.1:c.2055+3delG disrupts the splice site. The MaxENT scores for the wildtype sheep and human sequences were high (8.95 and 10.36 respectively), while the mutated splice site in both the ovine sequence and the equivalent human sequence gave the same weak score (MaxENT = 1.00). Using the varSEAK software, which is only available for human sequences, a human variant equivalent to the ovine variant was predicted to impart a loss of function of the authentic splice site and induce exon skipping, with a +71.01% reference score (wildtype) and a -64.02% variant score (mutant). varSEAK also calculated a MaxENT score of 10.36 for the wildtype human sequence and 1.00 for the human equivalent of the ovine variant.

Sanger sequencing of PCR products that were generated using RT-PCR amplicons from cDNA from two affected animals indicated that the ENSOART00000014359.1: c.2055+3delG variant results in the activation of a cryptic splice site within exon 14 for both PPHA93 and PPHA94. The resulting cDNA lacked the last 18 nucleotides of exon 14 (Table S4), and the protein is therefore predicted to contain six fewer amino acids (p.(Val680_Val685del)) (Figure 6; S1; S2). The loss of these six amino acids is located within the thrombospondin type 1 repeats (TSP1) conserved domain of the ADAMTS3 protein (Zerbino et al. 2017) (Figure S2). We propose that this deletion of six amino acids via the activation of a cryptic splice site is likely to be disease causing.
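The correspondence between the 18 missing nucleotides and the predicted protein change can be verified with simple arithmetic. The sketch below assumes, as the variant name c.2055+3delG implies under standard HGVS numbering, that c.2055 is the last coding nucleotide of exon 14; the function name is introduced here for illustration.

```python
def codon_number(cds_pos):
    """1-based codon index containing a 1-based coding (c.) position."""
    return (cds_pos - 1) // 3 + 1


last_exon14_nt = 2055   # last coding nucleotide of exon 14 (from c.2055+3delG)
deleted_nt = 18         # nucleotides missing from the end of exon 14 in the cDNA

first_deleted_nt = last_exon14_nt - deleted_nt + 1   # c.2038
first_codon = codon_number(first_deleted_nt)         # 680
last_codon = codon_number(last_exon14_nt)            # 685

# In frame if a whole number of codons is removed, starting on a codon boundary
in_frame = deleted_nt % 3 == 0 and (first_deleted_nt - 1) % 3 == 0

print(first_deleted_nt, first_codon, last_codon, in_frame)
# 2038 680 685 True
```

The result, six whole codons from position 680 to 685 removed without a frameshift, agrees with the predicted p.(Val680_Val685del).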


Figure 6 Schematic diagram of part of the ovine ADAMTS3 cDNA and protein showing the location of the removed 18 nucleotides (highlighted in red) due to the variant ENSOART00000014359.1:c.2055+3delG, and the predicted loss of six amino acids ENSOARP00000014152.1:p.(Val680_Val685del) in bold nucleotides. Figure adapted from

Ensembl (accessed 12th September 2020,

0013204;r=6:87061904-87186831;t=ENSOART00000014359>).

In silico analysis of this cryptic splice site (TGT|gtgcgt) using the MaxENT splice site motif scoring system (Yeo & Burge 2004) predicted a lower MaxENT value for the cryptic splice site (MaxENT = 0.78) when compared to the mutant (MaxENT = 1) and wildtype (MaxENT = 8.95) sequences.

Affected foetus PPHA94 also appeared to present with an additional splice variant, as an alternative exon followed the shortened exon 14 sequence after analysis of the cDNA PCR products (Table S4). Analysis with the Basic Local Alignment Search Tool (BLAST) and Multiple Sequence Alignment Viewer (MSA) identified that the 139 bp sequence in the cDNA of PPHA94 was a possible alternative exon. From this sequence, 83 bp was identified in humans, mice and dogs. The full 139 bp sequence was identified in flying foxes, mole rats and some birds. The 139 bp sequence was not identified in annotated ovine transcripts (Figure S1) (NCBI 2018). A BLAST search did not identify this sequence to be present in the ovine reference genome, possibly due to gaps in the genome. As this alternative exon appears to represent a normal splice variant in other species and it was only detected in one of the two affected foetuses, we are uncertain if it is associated with the disease.

Discussion

Our study identified a novel likely causal variant, ENSOARG00000013204:g.87124344delC, in the ADAMTS3 gene in PHA-affected foetuses by utilising a multi-faceted approach. This approach harnessed homozygosity analysis of SNP genotyping data, whole genome sequencing, Sanger sequencing of both genomic DNA and cDNA, and the development of a genotyping assay to aid in breeding management.

The PHA cases reported in Persian/Persian-cross sheep of this study are phenotypically similar to previously reported cases in sheep and cattle (Plant et al. 1987; d'Assonville 1989; Hailat et al. 1997; Monteagudo et al. 2002; Windsor et al. 2006; Whitlock et al. 2008; Alleaume et al. 2012; Švara et al. 2017; Häfliger et al. 2020). Similar clinical signs across these species include diffuse oedema, marked anasarca, lymph node hypoplasia and dystocia. Hydrops foetalis in mice has so far only been observed in laboratory models, with the knockout of the Adrenomedullin gene (Adm) being shown to produce hydrops foetalis phenotypes and cardiovascular abnormalities (Caron & Smithies 2001). Hydrops foetalis in humans, however, is more difficult to clinically and genetically define due to its syndromic nature and association with a multitude of disorders (Bellini et al. 2009). Nevertheless, acute oedema within body cavities (Bellini et al. 2009) and cardiovascular abnormalities (Mardy et al. 2019) are common clinical signs.

The inclusion of etiologic categories for NIHF by Bellini et al. (2009), such as cardiovascular, chromosomal, extra thoracic tumours, gastrointestinal, hematologic, idiopathic, inborn errors of metabolism, infections, lymphatic dysplasia, miscellaneous, syndromic, thoracic, twin-to-twin transfusion-placental and urinary tract malformations, highlights the difficulty in attributing a correct diagnosis and targeting viable candidate genes for hydrops foetalis. Lymphoid hypoplasia and, at times, complete absence of lymphoid tissue is an important phenotypic hallmark that must be taken into account when selecting functional candidate genes. Five functional candidate genes (ADAMTS3, FOXC2, FLT4, PIEZO1 and SOX18) were selected in this study due to their involvement in lymphatic development. Of these candidate genes, biallelic loss of function mutations in the PIEZO1 gene have been shown to cause an autosomal recessive disorder, congenital lymphatic dysplasia with non-immune hydrops foetalis (Lukacs et al. 2015; Martin-Almedina et al. 2018). Similarly, both recessive and dominant mutations in the SOX18 gene in humans and mice result in defective lymphatic tissue development followed by diffuse lymphatic oedema (Fontijn et al. 2008; Pendeville et al. 2008). However, in PHA cases reported in cattle, a majority of causative mutations have been identified in ADAMTS3 (Häfliger et al. 2020; Jonathan Beever, University of Tennessee, pers. comm.). The family of ADAMTS proteins plays a role in cell-matrix interactions and the extracellular matrix (ECM), with some members also being involved in embryogenesis and angiogenesis (Tang 2001; Dubail & Apte 2015). The ADAMTS3 protein is thought to play a role in fibrillar procollagen maturation; however, the function of this protein is still to be fully understood (Dubail & Apte 2015). A knockout Adamts3-/- mouse model was generated to elucidate the function of ADAMTS3 in vivo (Janssen et al. 2016). The Adamts3-/- embryos were not viable past day 15 of gestation; however, a reduced liver size, diffuse lymphoedema and complete lack of lymphatic tissue were observed in these mice when compared to Adamts3+/+ embryos (Janssen et al. 2016). This obvious phenotype in Adamts3-/- mouse embryos added strong support for the selection of ADAMTS3 as a candidate gene for PHA in sheep, given the highly similar phenotypes observed.

A majority of PHA cases in cattle have followed a recessive mode of inheritance and this has also been predicted in sheep (Monteagudo et al. 2002; Whitlock et al. 2008; Häfliger et al. 2020). In the present study, the pedigree information for flock 1 is supportive of a recessive mode of inheritance (Figure 2). Pedigree analysis, combined with genotype analysis for flock 1, supports that the g.87124344delC variant segregates with disease.

The primary challenge when investigating rare inherited diseases is the availability of samples and suitable sample sizes for meaningful analyses. It is however possible to identify and prioritise positional candidate genes by first utilising approaches such as SNP genotyping and identifying runs of homozygosity within small sample sizes (Charlier et al. 2008). When using this approach, it is particularly important to have clear phenotype descriptions and flock history.

Despite using only two of five affected animals for SNP genotyping analysis due to DNA quality issues, a region of homozygosity was able to be identified when coupled with obligate carrier and control sheep, and this approach allowed for prioritisation of one of the functional candidate genes identified for further analysis. Whole genome sequencing rather than direct sequencing of the ADAMTS3 gene was considered more cost effective as the ADAMTS3 gene is relatively long with a gene length of 1,711,607 bp (ENSOARG00000013204) and a transcript length of 4120 bp (ENSOART00000014359.1) and 1138 bp (ENSOART00000014360.1) for the two transcripts currently annotated.

The c.2055+3delG variant identified in this study is predicted to disrupt the 5’ donor splice site in intron 14/15 of ADAMTS3 and was shown to result in the activation of a cryptic splice site resulting in a loss of six amino acids (p.(Val680_Val685del)) (Figure 6). The assessment of the strength of this mutant splice site using maximum entropy scores (Yeo & Burge 2004) showed a low score for the mutant splice site in both the ovine and human sequences, which supports that the c.2055+3delG variant disrupts the splice site.

Larger numerical values for the MaxENT score are typically associated with more efficient splicing and therefore, a stronger splice site for each exon (Eng et al. 2004). It has been reported that the ideal MaxENT score for a 5’ donor splice site in human sequences is 11.81 (Eng et al. 2004). The wildtype MaxENT scores for the ovine and human intron 14/15 donor splice sites were both strong, at values of 8.95 and 10.36 respectively. The MaxENT scores for the mutant splice sites had a low value of 1.0. However, the proposed activated cryptic splice site identified by sequencing of cDNA had an even lower value of 0.78. Other consequences in addition to the activation of the cryptic splice site, such as exon skipping, can therefore not be excluded. The prediction of the equivalent mutant splice site in humans by varSEAK suggested a class five splicing effect that resulted in a loss of function. The in silico predictions, the cDNA analysis, the segregation of the variant with disease and the evidence of similar phenotypes in cattle and mice resulting from mutations in the ADAMTS3 gene support that this variant is disease-causing; however, not all consequences of the disruption of the splice site may have been discovered.

This study was limited by the availability of tissues and RNA from affected, obligate carrier and control animals to further analyse the functional effect of the splice variant. If high quality RNA from affected animals from different tissues and time points becomes available, RNA sequencing analysis or quantitative RT-PCR could be used to further analyse splice variants and expression in this gene and their association with disease. Mutations in splice site regions have been shown in humans (Padgett 2012) and animals to cause disease, with 95 splicing variants currently listed in OMIA as deleterious variants for Mendelian traits (Online Mendelian Inheritance in Animals 2020). Mutations in splice site regions are especially important when the exon/intron boundary sequence is altered from the standard GT dinucleotide at the 5’ end of the intron (Padgett 2012; Abramowicz & Gos 2018). Despite the difficulty in identifying splice site mutations and their effect on gene isoform structure (Lewandowska 2013), numerous acceptor splice site mutations have been identified as causal in human and animal inherited diseases.

The loss of six amino acids (p.(Val680_Val685del)) is located within the thrombospondin type 1 repeats (TSP1) conserved domain of the ADAMTS3 protein (Figure S2). The TSP1 domain is part of the extracellular matrix of a wide variety of cells, and functions as an inhibitor to endothelial cell growth and angiogenesis. It has been shown in cattle that the TSP1 domain binds latent TGF-β, which is produced by cells in a latent form (Schultz-Cherry et al. 1994). This latent form of TGF-β is actively involved in angiogenesis and other biological processes (Schultz-Cherry et al. 1994). The deletion of six amino acids within the TSP1 domain could impact on the ability of TSP1 to bind latent TGF-β; however, further in silico protein structure and function analysis is required to fully elucidate the impact of the c.2055+3delG variant on protein function.

Interestingly, there was evidence of an alternative exon 15 in the cDNA of PPHA94. As this alternative exon was not detected in the cDNA of PPHA93, it is unlikely to be disease related, but may represent a tissue-specific splice variant, as only RNA from spleen tissue was available for this study.

The use of a SNP genotyping and whole genome sequencing approach has allowed for the identification of g.87124344delC as a likely disease-causing variant. This has enabled the development and successful use of a genotyping assay to identify heterozygous animals in flock 1. Since the implementation of genotyping, no further affected foetuses have been observed in the flock as the owner is now able to avoid carrier by carrier matings. Wider use of DNA testing for this mutation will enable improved breeding management in the small population of Australian Persian/Persian-cross sheep.

Acknowledgments

The authors would like to acknowledge and thank Ernesto Angrilli, Denis Russell and Colin Walker for providing samples and pedigree information associated with this study. Veterinarians Kylie Flanagan and Natarsha Williams collected samples and conducted necropsies. The University of Sydney is acknowledged for the use of the Artemis HPC services and facilities at the Sydney Informatics Hub. The authors would like to thank the Biotechnology laboratory staff at the Elizabeth Macarthur Agricultural Institute for their assistance in the DNA extraction of some of the samples submitted for this study. The authors would like to acknowledge Dr Iona MacLeod and Dr Hans Daetwyler at the Centre For AgriBioscience, Agriculture Victoria Research, Melbourne for the provision of allele frequency data from Run2 of the combined SheepGenomesDB and Sheep CRC dataset of 935 sequences. We also acknowledge the Sheep CRC, SheepGenomesDB (http://sheepgenomedb.org) and all institutions that have made their sheep sequence data available.

Funding

Whole genome sequencing was supported by the University of Sydney and NSW Department of Primary Industries compact funding and an Australian Government Research Training Program (RTP) Scholarship for SAW to undertake this project. The Sydney School of Veterinary Science, The University of Sydney provided research student support for SAW and BH through assistance with consumables and Sanger sequencing. Ernesto Angrilli and Denis Russell kindly donated funds to contribute to genotyping and whole genome sequencing costs.

Availability of data

The datasets generated and/or analysed during the current study are available at the European Nucleotide Archive (www.ebi.ac.uk/ena/) under the study accession number PRJEB39179, with sample accession numbers SAMEA7034588, SAMEA7034589 and SAMEA7034590.

References

Abramowicz A. & Gos M. (2018) Splicing mutations in human genetic disorders: Examples,

detection, and confirmation. Journal of Applied Genetics 59, 253-68.

Alleaume C., Strugnell B., Spooner R. & Schock A. (2012) Hydrops foetalis with pulmonary

hypoplasia in Cheviot and Cheviot-Texel cross-lambs. Veterinary Record, 1-2.

Bellini C., Hennekam R.C.M., Fulcheri E., Rutigliani M., Morcaldi G., Boccardo F. & Bonioli E.

(2009) Etiology of nonimmune hydrops fetalis: A systematic review. American Journal

of Medical Genetics Part A 149A, 844-51.

Bult C.J., Blake J.A., Smith C.L., Kadin J.A. & Richardson J.E. (2019) Mouse Genome Database

(MGD) 2019. Nucleic Acids Research 47, D801-6.

Caron K.M. & Smithies O. (2001) Extreme hydrops fetalis and cardiovascular abnormalities in

mice lacking a functional Adrenomedullin gene. Proceedings of the National Academy of

Sciences 98, 615-9.

Charlier C., Coppieters W., Rollin F., Desmech D., Agerholm J.S., Cambisano N., Carta E.,

Dardano S., Dive M., Fasquelle C., Fennet J.C., Hanset R., Hubin X., Jorgensen C.,

Karim L., Kent M., Harvey K., Pearce B.R., Simon P., Tama N., Nie H., Vandeputte S.,

Lien S., Longeri M., Fredholm M., Harvey R.J. & Georges M. (2008) Highly effective

SNP-based association mapping and management of recessive defects in livestock.

Nature Genetics 40, 449-54.

Cingolani P., Patel V., Coon M., Nguyen T., Land S., Ruden D. & Lu X. (2012a) Using

Drosophila melanogaster as a model for genotoxic chemical mutational studies with a

new program, SnpSift. Frontiers in Genetics 3, 1-9.

Cingolani P., Platts A., Wang le L., Coon M., Nguyen T., Wang L., Land S.J., Lu X. & Ruden

D.M. (2012b) A program for annotating and predicting the effects of single nucleotide

polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118;

iso-2; iso-3. Fly 6, 80-92.

d'Assonville J.A. (1989) Hydrops foetalis in a mutton merino ewe. Journal of the South African

Veterinary Association 60, 174-5.

Daetwyler H.D., Brauning R., Chamberlain A.J., McWilliam S., McCulloch A., Vander Jagt

C.J., Sunduimijid B., Hayes B.J. & Kijas J.W. (2017) 1000 bull genomes and

SheepGenomeDB projects: Enabling cost-effective sequence level analyses globally. In:

Proceedings of Association for the Advancement of Animal Breeding and Genetics 22,

201-4.

DePristo M.A., Banks E., Poplin R., Garimella K.V., Maguire J.R., Hartl C., Philippakis A.A.,

del Angel G., Rivas M.A., Hanna M., McKenna A., Fennell T.J., Kernytsky A.M.,

Sivachenko A.Y., Cibulskis K., Gabriel S.B., Altshuler D. & Daly M.J. (2011) A

framework for variation discovery and genotyping using next-generation DNA

sequencing data. Nature Genetics 43, 491-502.

Eng L., Coutinho G., Nahas S., Yeo G., Tanouye R., Babaei M., Dörk T., Burge C. & Gatti R.A.

(2004) Nonclassical splicing mutations in the coding and noncoding regions of the ATM

gene: Maximum entropy estimates of splice junction strengths. Human Mutation 23, 67-

76.

Faust G.G. & Hall I.M. (2014) Samblaster: Fast duplicate marking and structural variant read

extraction. Bioinformatics 30, 2503-5.

Fontijn R.D., Volger O.L., Fledderus J.O., Reijerkerk A., de Vries H.E. & Horrevoets A.J.G.

(2008) SOX-18 controls endothelial-specific claudin-5 gene expression and barrier

function. American Journal of Physiology-Heart and Circulatory Physiology 294, H891-

900.

Häfliger I.M., Wiedemar N., Švara T., Starič J., Cociancich V., Šest K., Gombač M., Paller T.,

Agerholm J.S. & Drögemüller C. (2020) Identification of small and large genomic

candidate variants in bovine pulmonary hypoplasia and anasarca syndrome. Animal

Genetics 51, 382-90.

Hailat N., Lafi S.Q., al-Darraji A., el-Maghraby H.M., al-Ani F. & Fathalla M. (1997) Foetal

anasarca in Awassi sheep. Australian Veterinary Journal 75, 257-9.

Irrthum A., Devriendt K., Chitayat D., Matthijs G., Glade C., Steijlen P.M., Fryns J.P., Van

Steensel M.A.M. & Vikkula M. (2003) Mutations in the transcription factor gene SOX18

underlie recessive and dominant forms of hypotrichosis-lymphedema-telangiectasia. The

American Journal of Human Genetics 72, 1470-8.

Janssen L., Dupont L., Bekhouche M., Noel A., Leduc C., Voz M., Peers B., Cataldo D., Apte

S.S. & Dubail J. (2016) ADAMTS3 activity is mandatory for embryonic

lymphangiogenesis and regulates placental angiogenesis. Angiogenesis 19, 53-65.

Kaipainen A., Korhonen J., Mustonen T., van Hinsbergh V.W., Fang G.H., Dumont D.,

Breitman M. & Alitalo K. (1995) Expression of the fms-like tyrosine kinase 4 gene

becomes restricted to lymphatic endothelium during development. In: Proceedings of the

National Academy of Sciences 92, 3566-70.

Lewandowska M.A. (2013) The missing puzzle piece: Splicing mutations. International Journal

of Clinical and Experimental Pathology 6, 2675-82.


Li H. & Durbin R. (2009) Fast and accurate short read alignment with Burrows-Wheeler

transform. Bioinformatics 25, 1754-60.

Li H., Handsaker B., Wysoker A., Fennell T., Ruan J., Homer N., Marth G., Abecasis G. &

Durbin R. (2009) The Sequence Alignment/Map format and SAMtools. Bioinformatics

25, 2078-9.

Lukacs V., Mathur J., Mao R., Bayrak-Toydemir P., Procter M., Cahalan S.M., Kim H.J.,

Bandell M., Longo N., Day R.W., Stevenson D.A., Patapoutian A. & Krock B.L. (2015)

Impaired PIEZO1 function in patients with a novel autosomal recessive congenital

lymphatic dysplasia. Nature Communications 6, 8329-35.

Mackie D.I., Al Mutairi F., Davis R.B., Kechele D.O., Nielsen N.R., Snyder J.C., Caron M.G.,

Kliman H.J., Berg J.S., Simms J., Poyner D.R. & Caron K.M. (2018) hCALCRL

mutation causes autosomal recessive nonimmune hydrops fetalis with lymphatic

dysplasia. Journal of Experimental Medicine 215, 2339-53.

Mardy A.H., Chetty S.P., Norton M.E. & Sparks T.N. (2019) A system‐based approach to the

genetic etiologies of non‐immune hydrops fetalis. Prenatal Diagnosis 39, 732-50.

Martin-Almedina S., Mansour S. & Ostergaard P. (2018) Human phenotypes caused by PIEZO1

mutations; one gene, two overlapping phenotypes? The Journal of Physiology 596, 985-

92.

McKenna A., Hanna M., Banks E., Sivachenko A., Cibulskis K., Kernytsky A., Garimella K.,

Altshuler D., Gabriel S., Daly M. & DePristo M.A. (2010) The Genome Analysis

Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data.

Genome Research 20, 1297-303.


Monteagudo L., Lujan L., Tejedor T., Climent S., Acin C., Navarro A. & Arruga M.V. (2002)

Fetal anasarca (hydrops foetalis) associated with lymphoid tissue agenesis possibly due to

an autosomal recessive gene defect in sheep. Theriogenology 58, 1219-28.

NCBI (2018) Database resources of the National Center for Biotechnology Information. Nucleic

Acids Research 46, D8-13.

NSW Department of Primary Industries (2017) Sample collection guide blood cards. Accessed

29th June 2020. URL:

https://www.dpi.nsw.gov.au/__data/assets/pdf_file/0019/701335/sample-collection-

guide-blood-card.pdf

O’Rourke B.A., Kelly J., Spiers Z.B., Shearer P.L., Porter N.S., Parma P. & Longeri M. (2017)

Ichthyosis fetalis in Polled Hereford and Shorthorn calves. Journal of Veterinary

Diagnostic Investigation 29, 874-6.

Online Mendelian Inheritance in Animals (2019) Sydney School of Veterinary Science,

University of Sydney, Sydney. Accessed 26th December 2019. URL: https://omia.org/.

Online Mendelian Inheritance in Animals (2020) Sydney School of Veterinary Science,

University of Sydney, Sydney. Accessed 31st August 2020. URL: https://omia.org/

Online Mendelian Inheritance in Man (2019) Johns Hopkins University, Baltimore, MD.

Accessed 26th December 2019. URL: https://omim.org/.

Padgett R.A. (2012) New connections between splicing and human disease. Trends in Genetics

28, 147-54.

Pendeville H., Winandy M., Manfroid I., Nivelles O., Motte P., Pasque V., Peers B., Struman I.,

Martial J.A. & Voz M.L. (2008) Zebrafish Sox7 and Sox18 function together to control

arterial–venous identity. Developmental Biology 317, 405-16.

Plant J.W., Lomas S.T., Harper P.A.W., Duncan D.W. & Carroll S.N. (1987) Hydrops foetalis in

sheep. Australian Veterinary Journal 64, 308-10.

Purcell S., Neale B., Todd-Brown K., Thomas L., Ferreira M.A.R., Bender D., Maller J., Sklar

P., de Bakker P.I.W. & Daly M.J. (2007) PLINK: A tool set for whole-genome

association and population-based linkage analyses. American Journal of Human Genetics

81, 559-575.

Rivero E.R.C., Neves A.C., Silva-Valenzuelac M.G., Sousac S.O.M. & Nunes F.D. (2006)

Simple salting-out method for DNA extraction from formalin-fixed, paraffin-embedded

tissues. Pathology – Research and Practice 202, 523-9.

Schultz-Cherry S., Lawler J. & Murphy-Ullrich J.E. (1994) The type 1 repeats of

thrombospondin 1 activate latent transforming growth factor-beta. Journal of Biological

Chemistry 269, 26783-8.

Sheik Y., Qureshi S.F., Mohhammed B. & Nallari P. (2015) FOXC2 and FLT4 gene variants in

lymphatic filariasis. Lymphatic Research and Biology 13, 112-9.

Sinnwell J.P., Therneau T.M. & Schaid D.J. (2014) The kinship2 R package for pedigree data.

Human Heredity 78, 91-3.

Sparks T.N., Thao K., Lianoglou B.R., Boe N.M., Bruce K.G., Datkhaeva I., Field N.T., Fratto

V.M., Jolley J., Laurent L.C., Mardy A.H., Murphy A.M., Ngan E., Rangwala N.,

Rottkamp C.A.M., Wilson L., Wu E., Uy C.C., Valdez Lopez P. & Norton M.E. (2019)

Nonimmune hydrops fetalis: Identifying the underlying genetic etiology. Genetics in

Medicine 21, 1339-44.

Švara S., Cociancich V., Šest K., Gombač M., Paller T., Starič J. & Drögemüller C. (2017)

Pulmonary hypoplasia and anasarca syndrome: A newly diagnosed genetic disorder in

Cika cattle. Journal of Comparative Pathology 156, 107.

Svara T., Cociancich V., Sest K., Gombac M., Paller T., Staric J. & Drogemuller C. (2016)

Pulmonary hypoplasia and anasarca syndrome in Cika cattle. Acta Veterinaria

Scandinavica 58, 1-5.

Tang B.L. (2001) ADAMTS: A novel family of extracellular matrix proteases. The International

Journal of Biochemistry & Cell Biology 33, 33-44.

R Core Team (2014) R: A language and environment for statistical computing. URL:

https://www.R-project.org.

Tsang H.-G., Lillico S., Proudfoot C., McCulloch M.E.B., Markby G., Trejo-Reveles V.,

Corcoran B.M., Whitelaw C.B.A., MacRae V.E. & Summers K.M. (2020) Severe

perinatal hydrops fetalis in genome edited pigs with a biallelic five base pair deletion of

the Marfan syndrome gene. bioRxiv, 2020.07.20.213108.

Van der Auwera G.A., Carneiro M.O., Hartl C., Poplin R., del Angel G., Levy-Moonshine A.,

Jordan T., Shakir K., Roazen D., Thibault J., Banks E., Garimella K.V., Altshuler D.,

Gabriel S. & DePristo M.A. (2013) From FastQ data to high confidence variant calls: The

Genome Analysis Toolkit best practices pipeline. Current Protocols in Bioinformatics 43,

11.10.1-11.10.33.

Whitlock B.K., Kaiser L. & Maxwell H.S. (2008) Heritable bovine fetal abnormalities.

Theriogenology 70, 535-49.

Windsor P.A., Cavanagh J. & Tammen I. (2006) Hydrops fetalis associated with pulmonary

hypoplasia in Dexter calves. Australian Veterinary Journal 84, 278-81.

Ye J., Coulouris G., Zaretskaya I., Cutcutache I., Rozen S. & Madden T. (2012) Primer-BLAST:

A tool to design target-specific primers for polymerase chain reaction. BMC

Bioinformatics 13, 134-45.

Yeo G. & Burge C.B. (2004) Maximum entropy modeling of short sequence motifs with

applications to RNA splicing signals. Journal of Computational Biology 11, 377-94.

Zerbino D.R., Achuthan P., Akanni W., Amode M.R., Barrell D., Bhai J., Billis K., Cummins C.,

Gall A., Girón C.G., Gil L., Gordon L., Haggerty L., Haskell E., Hourlier T., Izuogu

O.G., Janacek S.H., Juettemann T., To J.K., Laird M.R., Lavidas I., Liu Z., Loveland

J.E., Maurel T., McLaren W., Moore B., Mudge J., Murphy D.N., Newman V., Nuhn M.,

Ogeh D., Ong C.K., Parker A., Patricio M., Riat H.S., Schuilenburg H., Sheppard D.,

Sparrow H., Taylor K., Thormann A., Vullo A., Walts B., Zadissa A., Frankish A., Hunt

S.E., Kostadima M., Langridge N., Martin F.J., Muffato M., Perry E., Ruffier M., Staines

D.M., Trevanion S.J., Aken B.L., Cunningham F., Yates A. & Flicek P. (2017) Ensembl

2018. Nucleic Acids Research 46, D754-61.

6.3 Appendix: Supplementary material for Chapter 6

Table S1 Top five protein coding functional candidate genes identified based on literature review of similar phenotypes to PHA in humans and mice. File format: Excel (.xls).

Gene | Biological function summary | Disease(s) associated with mutations
ADAMTS3 | Lymphangiogenesis during early development and extracellular matrix functionality | Embryonic death and abnormal lymphangiogenesis in mice (1), and lymphoedema and Hennekam lymphangiectasia-lymphedema syndrome 3 in humans (2)
FLT4 | Believed to be involved in the development of lymphatics | Lymphatic malformation and lymphoedema (Milroy disease) in humans (3, 4)
FOXC2 | Believed to be involved in the development of lymphatics | Lymphedema-distichiasis syndrome in humans (5)
PIEZO1 | Involved in ion channel activity | Lymphatic malformation and non-immune hydrops fetalis in humans (6)
SOX18 | Regulation of embryonic lymphatic development | Hypotrichosis-lymphedema-telangiectasia syndrome and hydrops fetalis in humans (7)

References

(1) Janssen, L., et al. (2016). ADAMTS3 activity is mandatory for embryonic lymphangiogenesis and regulates placental angiogenesis. Angiogenesis 19(1): 53-65.
(2) Brouillard, P., et al. (2017). Loss of ADAMTS3 activity causes Hennekam lymphangiectasia–lymphedema syndrome 3. Human Molecular Genetics 26(21): 4095-4104.
(3) Esterly, J. R. (1965). Congenital Hereditary Lymphoedema. Journal of Medical Genetics 2(2): 93.
(4) Brice, G., et al. (2005). Milroy disease and the VEGFR-3 mutation phenotype. Journal of Medical Genetics 42(2): 98.
(5) Mangion, J., et al. (1999). A Gene for Lymphedema-Distichiasis Maps to 16q24.3. The American Journal of Human Genetics 65(2): 427-432.
(6) Fotiou, E., et al. (2015). Novel mutations in PIEZO1 cause an autosomal recessive generalized lymphatic dysplasia with non-immune hydrops fetalis. Nat Commun 6(1): 8085.
(7) Irrthum, A., et al. (2003). Mutations in the Transcription Factor Gene SOX18 Underlie Recessive and Dominant Forms of Hypotrichosis-Lymphedema-Telangiectasia. The American Journal of Human Genetics 72(6): 1470-1478.

Table S2 List of 44 private whole genome sequencing variants identified in two affected foetuses (PPHA93 and PPHA94) and one obligate carrier (PPHA92) after filtering based on segregation, predicted protein impact, removal of known SNPs and duplicates.

Likely causal variant is highlighted in yellow. File format: Excel (.xls).

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT PPHA92 PPHA93 PPHA94 6 8 . T C 1 . AC=5;AF=0.833;AN=6;BaseQRankSum=- GT: 0/1:8,7:1 1/1:0,20: 1/1:0,31:31:92:838, 4 4 1.108e+00;ClippingRankSum=0.00;DP=95;ExcessHet=3.0103;FS=4.337;MLEAC= AD: 5:99:124 20:59:51 92,0 5 4 5;MLEAF=0.833;MQ=55.35;MQRankSum=0.320;QD=21.90;ReadPosRankSum=0. DP: ,0,174 4,59,0 8 5 999;SOR=1.918;ANN=C|missense_variant|MODERATE|ENSOARG00000009541| GQ: 7 . ENSOARG00000009541|transcript|ENSOART00000010395.1|protein_coding|2/10| PL 3 1 c.382A>G|p.Ser128Gly|382/1530|382/1530|128/509||,C|intron_variant|MODIFIER| 9 3 ENSOARG00000009541|ENSOARG00000009541|transcript|ENSOART00000010 0 388.1|protein_coding|1/6|c.722- 5569A>G||||||;Cases=2,0,4;Controls=0,1,1;CC_TREND=8.326e- 02;CC_GENO=NaN;CC_ALL=3.333e-01;CC_DOM=3.333e- 01;CC_REC=1.000e+00 6 8 . G A 3 . AC=5;AF=0.833;AN=6;BaseQRankSum=1.93;ClippingRankSum=0.00;DP=137;Ex GT: 0/1:14,1 1/1:0,56: 1/1:0,51:51:99:153 6 5 cessHet=3.0103;FS=5.683;MLEAC=5;MLEAF=0.833;MQ=60.00;MQRankSum=0. AD: 5:29:99: 56:99:16 7,153,0 4 1 00;QD=25.83;ReadPosRankSum=-9.200e- DP: 346,0,28 61,168,0 0 3 02;SOR=0.453;ANN=A|synonymous_variant|LOW|SLC4A4|ENSOARG00000012 GQ: 3 6 . 378|transcript|ENSOART00000013465.1|protein_coding|15/25|c.2211G>A|p.Thr73 PL 0 1 7Thr|2211/6280|2211/3441|737/1146||,A|synonymous_variant|LOW|SLC4A4|ENSO 5 3 ARG00000012378|transcript|ENSOART00000013467.1|protein_coding|15/24|c.201 5 0G>A|p.Thr670Thr|2211/6183|2010/3285|670/1094||;Cases=2,0,4;Controls=0,1,1;C C_TREND=8.326e-02;CC_GENO=NaN;CC_ALL=3.333e-01;CC_DOM=3.333e- 01;CC_REC=1.000e+00 6 8 . T TAAAATTTTTATTGG 6 . AC=2;AF=1.00;AN=2;DP=58;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1 GT: ./.:8,0:8: 1/1:0,16: ./.:21,0:21:.:0,0,0 6 AATAGAGTTGATTT 8 .00;MQ=42.43;QD=29.43;SOR=5.670;ANN=TAAAATTTTTATTGGAATAGAG AD: .:0,0,0 16:47:71 9 ACAACGTTGTGTTA 2 TTGATTTACAACGTTGTGTTAATTTCTGCTATACAGAAAAGTGAATCAGT DP: 8,47,0 9 ATTTCTGCTATACAG . 
TGTAAATACATATA|frameshift_variant|HIGH|ENSOARG00000013035|ENSOA GQ: 2 AAAAGTGAATCAGT 8 RG00000013035|transcript|ENSOART00000014172.1|protein_coding|10/10|c.918_ PL 4 TGTAAATACATATA 6 919insAAAATTTTTATTGGAATAGAGTTGATTTACAACGTTGTGTTAATTT 1 CTGCTATACAGAAAAGTGAATCAGTTGTAAATACATATA|p.Leu307fs|919/1 0 221|919/1221|307/406||WARNING_TRANSCRIPT_NO_START_CODON;LOF=( ENSOARG00000013035|ENSOARG00000013035|1|1.00);Cases=1,0,2;Controls=0, 0,0;CC_TREND=NaN;CC_GENO=NaN;CC_ALL=1.000e+00;CC_DOM=1.000e+ 00;CC_REC=1.000e+00


6 8 . TC T 4 . AC=5;AF=0.833;AN=6;BaseQRankSum=1.49;ClippingRankSum=0.00;DP=132;Ex GT: 0/1:10,1 1/1:0,64: 1/1:0,39:39:99:143 7 3 cessHet=3.0103;FS=0.000;MLEAC=5;MLEAF=0.833;MQ=60.00;MQRankSum=0. AD: 8:28:99: 64:99:23 9,117,0 1 4 00;QD=33.16;ReadPosRankSum=0.696;SOR=0.743;ANN=T|splice_region_variant DP: 581,0,26 64,193,0 2 4 &intron_variant|LOW|ADAMTS3|ENSOARG00000013204|transcript|ENSOART0 GQ: 7 4 . 0000014359.1|protein_coding|14/21|c.2055+3delG||||||;Cases=2,0,4;Controls=0,1,1; PL 3 0 CC_TREND=8.326e-02;CC_GENO=NaN;CC_ALL=3.333e-01;CC_DOM=3.333e- 4 9 01;CC_REC=1.000e+00 3 6 8 . G T 2 . AC=3;AF=0.750;AN=4;BaseQRankSum=2.87;ClippingRankSum=0.00;DP=25;Exc GT: 0/1:12,1 1/1:0,2:2 ./.:1,0:1:.:0,0,0 8 6 essHet=3.0103;FS=0.000;MLEAC=3;MLEAF=0.750;MQ=58.79;MQRankSum=0.0 AD: 0:22:99: :6:61,6,0 5 3 0;QD=10.96;ReadPosRankSum=2.03;SOR=1.022;ANN=T|missense_variant|MOD DP: 230,0,24 8 . ERATE|ENSOARG00000014766|ENSOARG00000014766|transcript|ENSOART00 GQ: 3 4 1 000016069.1|protein_coding|3/5|c.75G>T|p.Met25Ile|75/360|75/360|25/119||;Cases= PL 8 5 1,0,2;Controls=0,1,1;CC_TREND=1.573e-01;CC_GENO=NaN;CC_ALL=5.000e- 2 01;CC_DOM=5.000e-01;CC_REC=1.000e+00 5 6 9 . C CAAA 3 . AC=5;AF=0.833;AN=6;BaseQRankSum=1.33;ClippingRankSum=0.00;DP=122;Ex GT: 0/1:16,1 1/1:1,19: 1/1:1,37:48:89:165 0 0 cessHet=3.0103;FS=7.017;MLEAC=4;MLEAF=0.667;MQ=60.00;MQRankSum=0. AD: 6:32:99: 24:64:85 3,89,0 1 6 00;QD=34.00;ReadPosRankSum=0.434;SOR=0.235;ANN=CAAA|splice_region_v DP: 588,0,53 4,64,0 2 0 ariant&intron_variant|LOW|CDKL2|ENSOARG00000015497|transcript|ENSOART GQ: 7 2 . 00000016865.1|protein_coding|1/11|c.169-6_169- PL 2 0 5insTTT||||||;Cases=2,0,4;Controls=0,1,1;CC_TREND=8.326e- 7 9 02;CC_GENO=NaN;CC_ALL=3.333e-01;CC_DOM=3.333e- 3 01;CC_REC=1.000e+00 6 9 . T TCATGGAGCACAAG 1 . AC=5;AF=0.833;AN=6;BaseQRankSum=0.953;ClippingRankSum=0.00;DP=60;Ex GT: 0/1:20,1 1/1:0,17: 1/1:0,10:10:30:450, 0 GG 4 cessHet=3.0103;FS=7.348;MLEAC=5;MLEAF=0.833;MQ=60.00;MQRankSum=0. 
AD: 0:30:99: 17:54:81 30,0 8 8 00;QD=26.00;ReadPosRankSum=0.792;SOR=2.057;ANN=TCATGGAGCACAAG DP: 262,0,71 0,54,0 2 2 GG|disruptive_inframe_insertion|MODERATE|ENSOARG00000016978|ENSOAR GQ: 6 2 . G00000016978|transcript|ENSOART00000018493.1|protein_coding|1/7|c.166_167i PL 4 0 nsTGGAGCACAAGGGCA|p.Phe55_Arg56insMetGluHisLysGly|167/1302|167/13 8 9 02|56/433||INFO_REALIGN_3_PRIME;Cases=2,0,4;Controls=0,1,1;CC_TREND= 3 8.326e-02;CC_GENO=NaN;CC_ALL=3.333e-01;CC_DOM=3.333e- 01;CC_REC=1.000e+00 6 9 . A T 3 . AC=5;AF=0.833;AN=6;BaseQRankSum=- GT: 0/1:8,18: 1/1:0,55: 1/1:0,38:38:99:113 2 1 1.777e+00;ClippingRankSum=0.00;DP=119;ExcessHet=3.0103;FS=5.596;MLEAC AD: 26:99:44 55:99:16 6,114,0 6 5 =5;MLEAF=0.833;MQ=60.00;MQRankSum=0.00;QD=26.52;ReadPosRankSum=0 DP: 4,0,167 07,164,0 1 6 .389;SOR=0.174;ANN=T|synonymous_variant|LOW|FRAS1|ENSOARG00000018 GQ: 2 . 254|transcript|ENSOART00000019882.1|protein_coding|5/74|c.522A>T|p.Pro174Pr PL 6 1 o|522/12057|522/12057|174/4018||,T|synonymous_variant|LOW|FRAS1|ENSOARG 7 3 00000018254|transcript|ENSOART00000019879.1|protein_coding|6/74|c.519A>T|p 9 .Pro173Pro|519/12039|519/12039|173/4012||;Cases=2,0,4;Controls=0,1,1;CC_TRE ND=8.326e-02;CC_GENO=NaN;CC_ALL=3.333e-01;CC_DOM=3.333e- 01;CC_REC=1.000e+00 6 9 . G A 3 . AC=4;AF=1.00;AN=4;DP=14;ExcessHet=3.0103;FS=0.000;MLEAC=4;MLEAF=1 GT: ./.:3,0:3: 1/1:0,3:3 1/1:0,8:8:24:1|1:94 4 0 .00;MQ=53.18;QD=27.83;SOR=4.977;ANN=A|splice_region_variant&intron_varia AD: .:.:.:0,0,0 :9:.:.:92, 653532_G_A:240,2 6 6 nt|LOW|ENSOARG00000019008|ENSOARG00000019008|transcript|ENSOART00 DP: 9,0 4,0 5 . 000020695.1|protein_coding|1/2|c.111+4G>A||||||WARNING_TRANSCRIPT_NO_ GQ: 3 1 STOP_CODON;Cases=2,0,4;Controls=0,0,0;CC_TREND=NaN;CC_GENO=NaN; PGT 5 8 CC_ALL=1.000e+00;CC_DOM=1.000e+00;CC_REC=1.000e+00 :PID :PL


3 2 6 9 . G GCGGCCGCCCAGTA 2 . AC=2;AF=1.00;AN=2;DP=11;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1 GT: ./.:3,0:3: ./.:0,0:0:. 1/1:0,8:8:24:1|1:94 4 GGGTGGCCAC 0 .00;MQ=51.17;QD=25.61;SOR=4.407;ANN=GCGGCCGCCCAGTAGGGTGGCC AD: .:.:.:0,0,0 :.:.:0,0,0 653532_G_A:240,2 6 4 AC|splice_region_variant&intron_variant|LOW|ENSOARG00000019008|ENSOAR DP: 4,0 5 . G00000019008|transcript|ENSOART00000020695.1|protein_coding|1/2|c.111+7_1 GQ: 3 8 11+8insCGGCCGCCCAGTAGGGTGGCCAC||||||WARNING_TRANSCRIPT_NO PGT 5 8 _STOP_CODON;Cases=1,0,2;Controls=0,0,0;CC_TREND=NaN;CC_GENO=NaN :PID 3 ;CC_ALL=1.000e+00;CC_DOM=1.000e+00;CC_REC=1.000e+00 :PL 5 6 9 . C T 3 . AC=5;AF=0.833;AN=6;BaseQRankSum=-8.730e- GT: 0/1:31,2 1/1:0,62: 1/1:0,58:58:99:168 5 8 01;ClippingRankSum=0.00;DP=172;ExcessHet=3.0103;FS=5.003;MLEAC=5;MLE AD: 1:52:99: 62:99:18 8,173,0 4 9 AF=0.833;MQ=60.00;MQRankSum=0.00;QD=22.66;ReadPosRankSum=0.047;SO DP: 387,0,73 54,185,0 4 8 R=0.355;ANN=T|synonymous_variant|LOW|BMP3|ENSOARG00000019042|trans GQ: 9 3 . cript|ENSOART00000020731.1|protein_coding|4/5|c.456C>T|p.Ser152Ser|456/459 PL 5 1 5|456/1362|152/453||WARNING_TRANSCRIPT_NO_START_CODON;Cases=2,0 5 3 ,4;Controls=0,1,1;CC_TREND=8.326e-02;CC_GENO=NaN;CC_ALL=3.333e- 4 01;CC_DOM=3.333e-01;CC_REC=1.000e+00 6 9 . GA G 1 . AC=5;AF=0.833;AN=6;BaseQRankSum=0.580;ClippingRankSum=0.00;DP=132;E GT: 0/1:22,1 1/1:0,49: 1/1:1,34:35:82:712, 7 9 xcessHet=3.0103;FS=1.910;MLEAC=5;MLEAF=0.833;MQ=60.16;MQRankSum= AD: 5:37:99: 49:99:10 82,0 6 5 1.27;QD=16.13;ReadPosRankSum=1.22;SOR=0.531;ANN=G|splice_region_varian DP: 211,0,38 69,146,0 9 2 t&intron_variant|LOW|HPSE|ENSOARG00000002360|transcript|ENSOART00000 GQ: 7 2 . 
002560.1|protein_coding|6/12|c.927- PL 1 0 5delT||||||WARNING_TRANSCRIPT_NO_START_CODON,G|splice_region_varia 1 9 nt&intron_variant|LOW|HPSE|ENSOARG00000002360|transcript|ENSOART0000 7 0002558.1|protein_coding|5/11|c.885- 5delT||||||;Cases=2,0,4;Controls=0,1,1;CC_TREND=8.326e- 02;CC_GENO=NaN;CC_ALL=3.333e-01;CC_DOM=3.333e- 01;CC_REC=1.000e+00 6 9 . TA T 5 . AC=5;AF=0.833;AN=6;BaseQRankSum=1.28;ClippingRankSum=0.00;DP=171;Ex GT: 0/1:23,1 1/1:0,55: 1/1:0,57:57:99:230 7 AA 1 cessHet=3.0103;FS=5.845;MLEAC=5;MLEAF=0.833;MQ=60.06;MQRankSum=0. AD: 7:40:99: 57:99:22 2,171,0 8 1 00;QD=33.63;ReadPosRankSum=- DP: 576,0,84 73,171,0 3 1 2.097e+00;SOR=0.297;ANN=T|splice_region_variant&intron_variant|LOW|HELQ| GQ: 9 0 . ENSOARG00000002630|transcript|ENSOART00000002855.1|protein_coding|5/17| PL 5 0 c.1355-5_1355-3delTTT||||||;Cases=2,0,4;Controls=0,1,1;CC_TREND=8.326e- 6 9 02;CC_GENO=NaN;CC_ALL=3.333e-01;CC_DOM=3.333e- 2 01;CC_REC=1.000e+00 6 9 . G A 3 . AC=5;AF=0.833;AN=6;BaseQRankSum=1.02;ClippingRankSum=0.00;DP=88;Exc GT: 0/1:7,11: 1/1:0,41: 1/1:0,26:26:84:1|1: 9 5 essHet=3.0103;FS=0.000;MLEAC=5;MLEAF=0.833;MQ=53.62;MQRankSum=0.2 AD: 18:99:0| 41:99:1| 99175589_TAGTC 1 6 98;QD=32.39;ReadPosRankSum=0.454;SOR=0.955;ANN=A|synonymous_variant| DP: 1:99175 1:99175 _T:1260,84,0 7 5 LOW|WDFY3|ENSOARG00000003468|transcript|ENSOART00000003780.1|prote GQ: 589_TA 589_TA 5 . in_coding|34/70|c.5643C>T|p.Ala1881Ala|5768/10694|5643/10569|1881/3522||;Cas PGT GTC_T: GTC_T: 6 1 es=2,0,4;Controls=0,1,1;CC_TREND=8.326e- :PID 441,0,26 1895,12 1 3 02;CC_GENO=NaN;CC_ALL=3.333e-01;CC_DOM=3.333e- :PL 1 9,0 0 01;CC_REC=1.000e+00 6 1 . C G 2 . AC=2;AF=1.00;AN=2;DP=11;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1 GT: ./.:0,0:0: 1/1:0,6:6 ./.:5,0:5:.:.:.:0,0,0 0 4 .00;MQ=44.31;QD=32.35;SOR=3.912;ANN=G|5_prime_UTR_premature_start_co AD: .:.:.:0,0,0 :18:1|1:1 1 3 don_gain_variant|LOW|KLHL8|ENSOARG00000004639|transcript|ENSOART000 DP: 0173940 7 . 
00005051.1|protein_coding|1/11|c.- GQ: 9_C_G:2 3 782G>C||||||,G|5_prime_UTR_variant|MODIFIER|KLHL8|ENSOARG00000004639 PGT 70,18,0


9 9 |transcript|ENSOART00000005051.1|protein_coding|1/11|c.- :PID 4 7 782G>C|||||25359|,G|intron_variant|MODIFIER|HSD17B11|ENSOARG0000000477 :PL 1 9|transcript|ENSOART00000005203.1|protein_coding|7/7|c.951- 9 35329G>C||||||WARNING_TRANSCRIPT_NO_START_CODON;Cases=1,0,2;Con trols=0,0,0;CC_TREND=NaN;CC_GENO=NaN;CC_ALL=1.000e+00;CC_DOM= 1.000e+00;CC_REC=1.000e+00 6 1 . T C 3 . AC=2;AF=1.00;AN=2;DP=2;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1. GT: ./.:0,0:0: 1/1:0,2:2 ./.:0,0:0:.:0,0,0 0 3 00;MQ=34.12;QD=16.94;SOR=2.303;ANN=C|missense_variant|MODERATE|ENS AD: .:0,0,0 :6:59,6,0 2 . OARG00000005253|ENSOARG00000005253|transcript|ENSOART00000005734.1 DP: 2 8 |protein_coding|4/31|c.376A>G|p.Ile126Val|376/3039|376/3039|126/1012||;Cases=1, GQ: 1 8 0,2;Controls=0,0,0;CC_TREND=NaN;CC_GENO=NaN;CC_ALL=1.000e+00;CC_ PL 6 DOM=1.000e+00;CC_REC=1.000e+00 9 8 1 6 1 . T C 3 . AC=2;AF=1.00;AN=2;DP=3;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1. GT: ./.:0,0:0: 1/1:0,3:3 ./.:0,0:0:.:0,0,0 0 9 00;MQ=30.39;QD=13.14;SOR=2.833;ANN=C|synonymous_variant|LOW|ENSOA AD: .:0,0,0 :9:65,9,0 2 . RG00000005253|ENSOARG00000005253|transcript|ENSOART00000005734.1|pro DP: 2 4 tein_coding|4/31|c.297A>G|p.Lys99Lys|297/3039|297/3039|99/1012||;Cases=1,0,2; GQ: 1 2 Controls=0,0,0;CC_TREND=NaN;CC_GENO=NaN;CC_ALL=1.000e+00;CC_DO PL 7 M=1.000e+00;CC_REC=1.000e+00 0 6 0 6 1 . G GTAGAGGTA 7 . AC=3;AF=0.750;AN=4;BaseQRankSum=0.093;ClippingRankSum=0.00;DP=23;Ex GT: 0/1:6,15: ./.:0,0:0:. 1/1:0,2:2:6:1|1:112 1 3 cessHet=3.0103;FS=2.236;MLEAC=3;MLEAF=0.750;MQ=42.86;MQRankSum=1. AD: 21:99:.:. :.:.:0,0,0 437536_C_T:90,6, 2 7 67;QD=32.05;ReadPosRankSum=2.61;SOR=1.341;ANN=GTAGAGGTA|frameshi DP: :684,0,3 0 4 . 
ft_variant|HIGH|ENSOARG00000010776|ENSOARG00000010776|transcript|ENS GQ: 69 3 1 OART00000011727.1|protein_coding|1/20|c.33_34insTACCTCTA|p.Leu12fs|33/28 PGT 7 1 32|33/2832|11/943||WARNING_TRANSCRIPT_NO_START_CODON;Cases=1,0, :PID 5 2;Controls=0,1,1;CC_TREND=1.573e-01;CC_GENO=NaN;CC_ALL=5.000e- :PL 4 01;CC_DOM=5.000e-01;CC_REC=1.000e+00 3 6 1 . C CAGAGGCAG,G 1 . AC=1,3;AF=0.250,0.750;AN=4;DP=26;ExcessHet=3.0103;FS=0.000;MLEAC=1,3; GT: 1/2:0,7,1 ./.:0,0,0: 2/2:0,0,2:2:6:1|1:11 1 1 MLEAF=0.250,0.750;MQ=44.31;QD=30.37;SOR=2.303;ANN=CAGAGGCAG|fra AD: 7:24:99:. 0:.:.:.:0,0 2437536_C_T:90,9 2 9 meshift_variant|HIGH|ENSOARG00000010776|ENSOARG00000010776|transcript DP: :.:1131,7 ,0,0,0,0 0,90,6,6,0 4 7 |ENSOART00000011727.1|protein_coding|1/20|c.32_33insCTGCCTCT|p.Leu12fs| GQ: 14,726,4 3 . 32/2832|32/2832|11/943||WARNING_TRANSCRIPT_NO_START_CODON,G|syn PGT 17,0,366 7 7 onymous_variant|LOW|ENSOARG00000010776|ENSOARG00000010776|transcri :PID 5 8 pt|ENSOART00000011727.1|protein_coding|1/20|c.33G>C|p.Gly11Gly|33/2832|33/ :PL 4 2832|11/943||WARNING_TRANSCRIPT_NO_START_CODON;Cases=1,0,2;Cont 4 rols=0,1,2;CC_TREND=NaN;CC_GENO=NaN;CC_ALL=1.000e+00;CC_DOM=1 .000e+00;CC_REC=1.000e+00 6 1 . G C 1 . AC=3;AF=0.750;AN=4;BaseQRankSum=1.88;ClippingRankSum=0.00;DP=46;Exc GT: 0/1:7,29: ./.:2,0:2:. 1/1:0,1:1:3:1|1:112 1 1 essHet=3.0103;FS=6.543;MLEAC=2;MLEAF=0.500;MQ=53.74;MQRankSum=1.8 AD: 42:99:.:. :.:.:0,0,0 490222_G_C:45,3, 2 6 3;QD=32.47;ReadPosRankSum=-1.860e- DP: :1147,0, 0 4 8 01;SOR=0.144;ANN=C|synonymous_variant|LOW|ENSOARG00000011020|ENS GQ: 293 9 . OARG00000011020|transcript|ENSOART00000011993.1|protein_coding|5/20|c.60 PGT 0 8 3G>C|p.Ser201Ser|603/2991|603/2991|201/996||;Cases=1,0,2;Controls=0,1,1;CC_T :PID 2 8 :PL


2 REND=1.573e-01;CC_GENO=NaN;CC_ALL=5.000e-01;CC_DOM=5.000e- 2 01;CC_REC=1.000e+00 6 1 . T C 6 . AC=2;AF=1.00;AN=2;DP=2;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1. GT: ./.:0,0:0: 1/1:0,2:2 ./.:0,0:0:.:.:.:0,0,0 1 4 00;MQ=39.50;QD=32.44;SOR=2.303;ANN=C|synonymous_variant|LOW|ENSOA AD: .:.:.:0,0,0 :6:1|1:11 2 . RG00000011020|ENSOARG00000011020|transcript|ENSOART00000011993.1|pro DP: 2492350 4 8 tein_coding|7/20|c.855T>C|p.Asn285Asn|855/2991|855/2991|285/996||;Cases=1,0,2; GQ: _T_C:90 9 8 Controls=0,0,0;CC_TREND=NaN;CC_GENO=NaN;CC_ALL=1.000e+00;CC_DO PGT ,6,0 2 M=1.000e+00;CC_REC=1.000e+00 :PID 3 :PL 5 0 6 1 . C T 6 . AC=2;AF=1.00;AN=2;DP=2;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1. GT: ./.:0,0:0: 1/1:0,2:2 ./.:0,0:0:.:.:.:0,0,0 1 4 00;MQ=39.50;QD=32.44;SOR=2.303;ANN=T|synonymous_variant|LOW|ENSOA AD: .:.:.:0,0,0 :6:1|1:11 2 . RG00000011020|ENSOARG00000011020|transcript|ENSOART00000011993.1|pro DP: 2492350 4 8 tein_coding|7/20|c.870C>T|p.Phe290Phe|870/2991|870/2991|290/996||;Cases=1,0,2; GQ: _T_C:90 9 8 Controls=0,0,0;CC_TREND=NaN;CC_GENO=NaN;CC_ALL=1.000e+00;CC_DO PGT ,6,0 2 M=1.000e+00;CC_REC=1.000e+00 :PID 3 :PL 6 5 6 1 . A G 6 . AC=2;AF=1.00;AN=2;DP=2;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1. GT: ./.:0,0:0: 1/1:0,2:2 ./.:0,0:0:.:.:.:0,0,0 1 4 00;MQ=39.50;QD=32.44;SOR=2.303;ANN=G|missense_variant|MODERATE|EN AD: .:.:.:0,0,0 :6:1|1:11 2 . SOARG00000011020|ENSOARG00000011020|transcript|ENSOART00000011993. DP: 2492350 4 8 1|protein_coding|7/20|c.871A>G|p.Thr291Ala|871/2991|871/2991|291/996||;Cases= GQ: _T_C:90 9 8 1,0,2;Controls=0,0,0;CC_TREND=NaN;CC_GENO=NaN;CC_ALL=1.000e+00;C PGT ,6,0 2 C_DOM=1.000e+00;CC_REC=1.000e+00 :PID 3 :PL 6 6 6 1 . C T 1 . AC=4;AF=1.00;AN=4;DP=6;ExcessHet=3.0103;FS=0.000;MLEAC=4;MLEAF=1. GT: ./.:2,0:2: 1/1:0,2:2 1/1:0,2:2:6:1|1:112 1 5 00;MQ=27.86;QD=27.86;SOR=1.609;ANN=T|synonymous_variant|LOW|ENSOA AD: .:.:.:0,0,0 :6:1|1:11 498559_C_T:90,6, 2 5 RG00000011020|ENSOARG00000011020|transcript|ENSOART00000011993.1|pro DP: 2498559 0 4 . 
tein_coding|11/20|c.1531C>T|p.Leu511Leu|1531/2991|1531/2991|511/996||,T|upstre GQ: _C_T:90 9 0 am_gene_variant|MODIFIER|ENSOARG00000011137|ENSOARG00000011137|tr PGT ,6,0 8 2 anscript|ENSOART00000012113.1|protein_coding||c.- :PID 5 1113C>T|||||1113|;Cases=2,0,4;Controls=0,0,0;CC_TREND=NaN;CC_GENO=NaN :PL 5 ;CC_ALL=1.000e+00;CC_DOM=1.000e+00;CC_REC=1.000e+00 9 6 1 . T C 1 . AC=4;AF=1.00;AN=4;DP=6;ExcessHet=3.0103;FS=0.000;MLEAC=4;MLEAF=1. GT: ./.:2,0:2: 1/1:0,2:2 1/1:0,2:2:6:1|1:112 1 5 00;MQ=27.86;QD=34.95;SOR=1.609;ANN=C|missense_variant|MODERATE|ENS AD: .:.:.:0,0,0 :6:1|1:11 498559_C_T:90,6, 2 5 OARG00000011020|ENSOARG00000011020|transcript|ENSOART00000011993.1 DP: 2498559 0 4 . |protein_coding|11/20|c.1534T>C|p.Tyr512His|1534/2991|1534/2991|512/996||,C|up GQ: _C_T:90 9 0 stream_gene_variant|MODIFIER|ENSOARG00000011137|ENSOARG0000001113 PGT ,6,0 8 2 7|transcript|ENSOART00000012113.1|protein_coding||c.- :PID 5 1110T>C|||||1110|;Cases=2,0,4;Controls=0,0,0;CC_TREND=NaN;CC_GENO=NaN :PL 6 ;CC_ALL=1.000e+00;CC_DOM=1.000e+00;CC_REC=1.000e+00 2 6 1 . T C 1 . AC=4;AF=1.00;AN=4;DP=6;ExcessHet=3.0103;FS=0.000;MLEAC=4;MLEAF=1. GT: ./.:2,0:2: 1/1:0,2:2 1/1:0,2:2:6:1|1:112 1 5 00;MQ=27.86;QD=30.64;SOR=1.609;ANN=C|synonymous_variant|LOW|ENSOA AD: .:.:.:0,0,0 :6:1|1:11 498559_C_T:90,6, 2 5 RG00000011020|ENSOARG00000011020|transcript|ENSOART00000011993.1|pro DP: 2498559 0


4 . tein_coding|11/20|c.1536T>C|p.Tyr512Tyr|1536/2991|1536/2991|512/996||,C|upstre GQ: _C_T:90 9 0 am_gene_variant|MODIFIER|ENSOARG00000011137|ENSOARG00000011137|tr PGT ,6,0 8 2 anscript|ENSOART00000012113.1|protein_coding||c.- :PID 5 1108T>C|||||1108|;Cases=2,0,4;Controls=0,0,0;CC_TREND=NaN;CC_GENO=NaN :PL 6 ;CC_ALL=1.000e+00;CC_DOM=1.000e+00;CC_REC=1.000e+00 4 6 1 . T C 1 . AC=3;AF=0.750;AN=4;BaseQRankSum=- GT: 0/1:5,7:1 1/1:0,1:1 ./.:4,0:4:.:.:.:0,0,0 1 7 1.301e+00;ClippingRankSum=0.00;DP=17;ExcessHet=3.0103;FS=0.000;MLEAC= AD: 2:99:.:.: :3:1|1:11 2 7 3;MLEAF=0.750;MQ=52.47;MQRankSum=0.00;QD=14.76;ReadPosRankSum=- DP: 159,0,11 2657956 6 . 1.090e- GQ: 4 _T_C:45 5 1 01;SOR=1.402;ANN=C|splice_region_variant&intron_variant|LOW|KIAA0232|EN PGT ,3,0 7 1 SOARG00000011238|transcript|ENSOART00000012220.1|protein_coding|1/9|c.- :PID 9 269+5T>C||||||;Cases=1,0,2;Controls=0,1,1;CC_TREND=1.573e- :PL 5 01;CC_GENO=NaN;CC_ALL=5.000e-01;CC_DOM=5.000e- 6 01;CC_REC=1.000e+00 6 1 . C CA 3 . AC=3;AF=0.750;AN=4;BaseQRankSum=0.319;ClippingRankSum=0.00;DP=31;Ex GT: 0/1:4,9:1 1/1:1,3:4 ./.:1,0:1:.:.:.:0,0,0 1 5 cessHet=3.0103;FS=2.271;MLEAC=3;MLEAF=0.750;MQ=59.38;MQRankSum=0. AD: 3:99:0|1: :7:1|1:11 4 6 396;QD=20.99;ReadPosRankSum=-3.190e- DP: 1148195 4819523 8 . 01;SOR=1.336;ANN=CA|splice_region_variant&intron_variant|LOW|ENSOARG0 GQ: 23_C_C _C_CA: 1 8 0000013472|ENSOARG00000013472|transcript|ENSOART00000014654.1|protein PGT A:267,0, 127,7,0 9 8 _coding|1/1|c.245-8_245- :PID 99 5 7insA||||||WARNING_TRANSCRIPT_NO_STOP_CODON,CA|upstream_gene_var :PL 2 iant|MODIFIER|ENSOARG00000013494|ENSOARG00000013494|transcript|ENS 3 OART00000014675.1|protein_coding||c.-1900_- 1899insA|||||1800|WARNING_TRANSCRIPT_NO_START_CODON;Cases=1,0,2; Controls=0,1,1;CC_TREND=1.573e-01;CC_GENO=NaN;CC_ALL=5.000e- 01;CC_DOM=5.000e-01;CC_REC=1.000e+00 6 1 . A T 6 . 
AC=2;AF=1.00;AN=2;DP=24;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1 GT: ./.:22,0:2 1/1:0,2:2 ./.:0,0:0:.:.:.:0,0,0 1 4 .00;MQ=11.10;QD=32.44;SOR=0.693;ANN=T|synonymous_variant|LOW|ENSOA AD: 2:.:.:.:0, :6:1|1:11 4 . RG00000013697|ENSOARG00000013697|transcript|ENSOART00000014904.1|pro DP: 0,0 4966801 9 8 tein_coding|1/5|c.51A>T|p.Arg17Arg|51/1608|51/1608|17/535||WARNING_TRAN GQ: _A_T:90 6 8 SCRIPT_NO_START_CODON;Cases=1,0,2;Controls=0,0,0;CC_TREND=NaN;C PGT ,6,0 6 C_GENO=NaN;CC_ALL=1.000e+00;CC_DOM=1.000e+00;CC_REC=1.000e+00 :PID 8 :PL 0 1 6 1 . C T 6 . AC=2;AF=1.00;AN=2;DP=24;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1 GT: ./.:22,0:2 1/1:0,2:2 ./.:0,0:0:.:.:.:0,0,0 1 4 .00;MQ=11.10;QD=32.44;SOR=0.693;ANN=T|synonymous_variant|LOW|ENSOA AD: 2:.:.:.:0, :6:1|1:11 4 . RG00000013697|ENSOARG00000013697|transcript|ENSOART00000014904.1|pro DP: 0,0 4966801 9 8 tein_coding|1/5|c.52C>T|p.Leu18Leu|52/1608|52/1608|18/535||WARNING_TRAN GQ: _A_T:90 6 8 SCRIPT_NO_START_CODON;Cases=1,0,2;Controls=0,0,0;CC_TREND=NaN;C PGT ,6,0 6 C_GENO=NaN;CC_ALL=1.000e+00;CC_DOM=1.000e+00;CC_REC=1.000e+00 :PID 8 :PL 0 2 6 1 . G A 6 . AC=2;AF=1.00;AN=2;DP=24;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1 GT: ./.:22,0:2 1/1:0,2:2 ./.:0,0:0:.:.:.:0,0,0 1 4 .00;MQ=11.10;QD=32.44;SOR=0.693;ANN=A|synonymous_variant|LOW|ENSOA AD: 2:.:.:.:0, :6:1|1:11 4 . RG00000013697|ENSOARG00000013697|transcript|ENSOART00000014904.1|pro DP: 0,0 4966801 9 8 tein_coding|1/5|c.60G>A|p.Pro20Pro|60/1608|60/1608|20/535||WARNING_TRANS GQ: _A_T:90 6 8 PGT ,6,0

225 6 CRIPT_NO_START_CODON;Cases=1,0,2;Controls=0,0,0;CC_TREND=NaN;CC :PID 8 _GENO=NaN;CC_ALL=1.000e+00;CC_DOM=1.000e+00;CC_REC=1.000e+00 :PL 1 0 6 1 . G A 6 . AC=2;AF=1.00;AN=2;DP=24;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1 GT: ./.:22,0:2 1/1:0,2:2 ./.:0,0:0:.:.:.:0,0,0 1 4 .00;MQ=11.10;QD=32.44;SOR=0.693;ANN=A|missense_variant|MODERATE|EN AD: 2:.:.:.:0, :6:1|1:11 4 . SOARG00000013697|ENSOARG00000013697|transcript|ENSOART00000014904. DP: 0,0 4966801 9 8 1|protein_coding|1/5|c.68G>A|p.Arg23Gln|68/1608|68/1608|23/535||WARNING_T GQ: _A_T:90 6 8 RANSCRIPT_NO_START_CODON;Cases=1,0,2;Controls=0,0,0;CC_TREND=Na PGT ,6,0 6 N;CC_GENO=NaN;CC_ALL=1.000e+00;CC_DOM=1.000e+00;CC_REC=1.000e :PID 8 +00 :PL 1 8 6 1 . T A 6 . AC=2;AF=1.00;AN=2;DP=24;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1 GT: ./.:22,0:2 1/1:0,2:2 ./.:0,0:0:.:.:.:0,0,0 1 4 .00;MQ=11.10;QD=32.44;SOR=0.693;ANN=A|missense_variant|MODERATE|EN AD: 2:.:.:.:0, :6:1|1:11 4 . SOARG00000013697|ENSOARG00000013697|transcript|ENSOART00000014904. DP: 0,0 4966801 9 8 1|protein_coding|1/5|c.73T>A|p.Ser25Thr|73/1608|73/1608|25/535||WARNING_TR GQ: _A_T:90 6 8 ANSCRIPT_NO_START_CODON;Cases=1,0,2;Controls=0,0,0;CC_TREND=NaN PGT ,6,0 6 ;CC_GENO=NaN;CC_ALL=1.000e+00;CC_DOM=1.000e+00;CC_REC=1.000e+0 :PID 8 0 :PL 2 3 6 1 . A G 6 . AC=2;AF=1.00;AN=2;DP=24;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1 GT: ./.:22,0:2 1/1:0,2:2 ./.:0,0:0:.:.:.:0,0,0 1 4 .00;MQ=11.10;QD=32.44;SOR=0.693;ANN=G|synonymous_variant|LOW|ENSOA AD: 2:.:.:.:0, :6:1|1:11 4 . RG00000013697|ENSOARG00000013697|transcript|ENSOART00000014904.1|pro DP: 0,0 4966801 9 8 tein_coding|1/5|c.78A>G|p.Pro26Pro|78/1608|78/1608|26/535||WARNING_TRANS GQ: _A_T:90 6 8 CRIPT_NO_START_CODON;Cases=1,0,2;Controls=0,0,0;CC_TREND=NaN;CC PGT ,6,0 6 _GENO=NaN;CC_ALL=1.000e+00;CC_DOM=1.000e+00;CC_REC=1.000e+00 :PID 8 :PL 2 8 6 1 . C A 6 . 
AC=2;AF=1.00;AN=2;DP=24;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1 GT: ./.:22,0:2 1/1:0,2:2 ./.:0,0:0:.:.:.:0,0,0 1 4 .00;MQ=11.10;QD=32.44;SOR=0.693;ANN=A|missense_variant|MODERATE|EN AD: 2:.:.:.:0, :6:1|1:11 4 . SOARG00000013697|ENSOARG00000013697|transcript|ENSOART00000014904. DP: 0,0 4966801 9 8 1|protein_coding|1/5|c.83C>A|p.Thr28Lys|83/1608|83/1608|28/535||WARNING_T GQ: _A_T:90 6 8 RANSCRIPT_NO_START_CODON;Cases=1,0,2;Controls=0,0,0;CC_TREND=Na PGT ,6,0 6 N;CC_GENO=NaN;CC_ALL=1.000e+00;CC_DOM=1.000e+00;CC_REC=1.000e :PID 8 +00 :PL 3 3 6 1 . T G 6 . AC=2;AF=1.00;AN=2;DP=25;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1 GT: ./.:23,0:2 1/1:0,2:2 ./.:0,0:0:.:.:.:0,0,0 1 4 .00;MQ=10.88;QD=32.44;SOR=0.693;ANN=G|synonymous_variant|LOW|ENSOA AD: 3:.:.:.:0, :6:1|1:11 4 . RG00000013697|ENSOARG00000013697|transcript|ENSOART00000014904.1|pro DP: 0,0 4966801 9 8 tein_coding|1/5|c.108T>G|p.Pro36Pro|108/1608|108/1608|36/535||WARNING_TRA GQ: _A_T:90 6 8 NSCRIPT_NO_START_CODON;Cases=1,0,2;Controls=0,0,0;CC_TREND=NaN; PGT ,6,0 6 CC_GENO=NaN;CC_ALL=1.000e+00;CC_DOM=1.000e+00;CC_REC=1.000e+0 :PID 8 0 :PL 5 8


6 1 . A AC 3 . AC=3;AF=0.750;AN=4;BaseQRankSum=-6.870e- GT: 0/1:9,13: 1/1:0,5:5 ./.:0,0:1:.:0|1:11533 1 8 01;ClippingRankSum=0.00;DP=29;ExcessHet=3.0103;FS=0.000;MLEAC=3;MLE AD: 22:99:.:. :15:.:.:12 2578_AC_A:0,0,0 5 7 AF=0.750;MQ=60.00;MQRankSum=0.00;QD=14.34;ReadPosRankSum=0.167;SO DP: :258,0,3 1,15,0 3 . R=0.527;ANN=AC|splice_region_variant&intron_variant|LOW|ENSOARG000000 GQ: 11 3 2 26621|ENSOARG00000026621|transcript|ENSOART00000028645.1|lincRNA|1/1| PGT 2 n.1005- :PID 5 9dupC||||||INFO_REALIGN_3_PRIME,AC|upstream_gene_variant|MODIFIER|EN :PL 7 SOARG00000026621|ENSOARG00000026621|transcript|ENSOART00000028646. 8 1|lincRNA||n.-2752_- 2751insC|||||2751|,AC|intron_variant|MODIFIER|SH3BP2|ENSOARG00000014948| transcript|ENSOART00000016273.1|protein_coding|3/17|c.327- 532dupG||||||WARNING_TRANSCRIPT_NO_STOP_CODON,AC|non_coding_tra nscript_exon_variant|MODIFIER|ENSOARG00000026621|ENSOARG0000002662 1|transcript|ENSOART00000028644.1|lincRNA|2/2|n.1036dupC||||||INFO_REALIG N_3_PRIME;Cases=1,0,2;Controls=0,1,1;CC_TREND=1.573e- 01;CC_GENO=NaN;CC_ALL=5.000e-01;CC_DOM=5.000e- 01;CC_REC=1.000e+00 6 1 . C G 2 . AC=3;AF=0.750;AN=4;BaseQRankSum=-7.910e- GT: 0/1:16,1 1/1:0,2:2 ./.:0,0:0:.:0,0,0 1 2 01;ClippingRankSum=0.00;DP=29;ExcessHet=3.0103;FS=3.452;MLEAC=3;MLE AD: 1:27:99: :6:54,6,0 5 2 AF=0.750;MQ=60.00;MQRankSum=0.00;QD=7.66;ReadPosRankSum=- DP: 196,0,32 4 . 1.568e+00;SOR=0.537;ANN=G|synonymous_variant|LOW|FAM193A|ENSOARG GQ: 3 7 1 00000015077|transcript|ENSOART00000016412.1|protein_coding|1/18|c.228G>C|p PL 0 5 .Ala76Ala|228/3609|228/3609|76/1202||;Cases=1,0,2;Controls=0,1,1;CC_TREND= 1 1.573e-01;CC_GENO=NaN;CC_ALL=5.000e-01;CC_DOM=5.000e- 3 01;CC_REC=1.000e+00 0 6 1 . G T 6 . AC=2;AF=1.00;AN=2;DP=23;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1 GT: ./.:20,0:2 ./.:1,0:1:. 1/1:0,2:2:6:1|1:116 1 4 .00;MQ=17.69;QD=32.44;SOR=2.303;ANN=T|missense_variant|MODERATE|FG AD: 0:.:.:.:0, :.:.:0,0,0 119455_G_T:90,6, 6 . 
FR3|ENSOARG00000015904|transcript|ENSOART00000017316.1|protein_coding| DP: 0,0 0 1 8 3/15|c.389C>A|p.Pro130Gln|389/1995|389/1995|130/664||WARNING_TRANSCRI GQ: 1 8 PT_NO_START_CODON,T|intron_variant|MODIFIER|ENSOARG00000015218|E PGT 9 NSOARG00000015218|transcript|ENSOART00000016560.1|protein_coding|2/7|c.9 :PID 4 3- :PL 5 23894C>A||||||WARNING_TRANSCRIPT_NO_START_CODON;Cases=1,0,2;Con 5 trols=0,0,0;CC_TREND=NaN;CC_GENO=NaN;CC_ALL=1.000e+00;CC_DOM= 1.000e+00;CC_REC=1.000e+00 6 1 . G T 6 . AC=2;AF=1.00;AN=2;DP=24;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1 GT: ./.:21,0:2 ./.:1,0:1:. 1/1:0,2:2:6:1|1:116 1 4 .00;MQ=17.32;QD=32.44;SOR=2.303;ANN=T|missense_variant|MODERATE|FG AD: 1:.:.:.:0, :.:.:0,0,0 119455_G_T:90,6, 6 . FR3|ENSOARG00000015904|transcript|ENSOART00000017316.1|protein_coding| DP: 0,0 0 1 8 3/15|c.385C>A|p.Leu129Met|385/1995|385/1995|129/664||WARNING_TRANSCRI GQ: 1 8 PT_NO_START_CODON,T|intron_variant|MODIFIER|ENSOARG00000015218|E PGT 9 NSOARG00000015218|transcript|ENSOART00000016560.1|protein_coding|2/7|c.9 :PID 4 3- :PL 5 23898C>A||||||WARNING_TRANSCRIPT_NO_START_CODON;Cases=1,0,2;Con 9 trols=0,0,0;CC_TREND=NaN;CC_GENO=NaN;CC_ALL=1.000e+00;CC_DOM= 1.000e+00;CC_REC=1.000e+00 6 1 . G A 6 . AC=2;AF=1.00;AN=2;DP=24;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1 GT: ./.:22,0:2 ./.:0,0:0:. 1/1:0,2:2:6:1|1:116 1 4 .00;MQ=17.32;QD=32.44;SOR=2.303;ANN=A|synonymous_variant|LOW|FGFR3| AD: 2:.:.:.:0, :.:.:0,0,0 119455_G_T:90,6, 6 . ENSOARG00000015904|transcript|ENSOART00000017316.1|protein_coding|3/15| DP: 0,0 0 1 c.372C>T|p.Ser124Ser|372/1995|372/1995|124/664||WARNING_TRANSCRIPT_N GQ:

227 1 8 O_START_CODON,A|intron_variant|MODIFIER|ENSOARG00000015218|ENSO PGT 9 8 ARG00000015218|transcript|ENSOART00000016560.1|protein_coding|2/7|c.93- :PID 4 23911C>T||||||WARNING_TRANSCRIPT_NO_START_CODON;Cases=1,0,2;Con :PL 7 trols=0,0,0;CC_TREND=NaN;CC_GENO=NaN;CC_ALL=1.000e+00;CC_DOM= 2 1.000e+00;CC_REC=1.000e+00 6 1 . C T 2 . AC=2;AF=1.00;AN=2;DP=16;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1 GT: ./.:14,0:1 1/1:0,1:1 ./.:1,0:1:.:.:.:0,0,0 1 0 .00;MQ=15.00;QD=20.70;SOR=1.609;ANN=T|missense_variant|MODERATE|CT AD: 4:.:.:.:0, :3:1|1:11 6 . BP1|ENSOARG00000016137|transcript|ENSOART00000017575.1|protein_coding| DP: 0,0 6356927 3 7 8/8|c.1045C>T|p.Arg349Trp|1045/1059|1045/1059|349/352||WARNING_TRANSC GQ: _G_A:4 5 RIPT_NO_STOP_CODON;Cases=1,0,2;Controls=0,0,0;CC_TREND=NaN;CC_G PGT 5,3,0 6 ENO=NaN;CC_ALL=1.000e+00;CC_DOM=1.000e+00;CC_REC=1.000e+00 :PID 9 :PL 3 2 6 1 . G GCACA 5 . AC=2;AF=1.00;AN=2;DP=17;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1 GT: ./.:14,0:1 1/1:0,2:2 ./.:1,0:1:.:.:.:0,0,0 1 5 .00;MQ=20.58;QD=27.92;SOR=2.303;ANN=GCACA|frameshift_variant|HIGH|CT AD: 4:.:.:.:0, :6:1|1:11 6 . BP1|ENSOARG00000016137|transcript|ENSOART00000017575.1|protein_coding| DP: 0,0 6356927 3 8 8/8|c.1054_1055insCACA|p.Gly352fs|1055/1059|1055/1059|352/352||WARNING_ GQ: _G_A:9 5 4 TRANSCRIPT_NO_STOP_CODON;Cases=1,0,2;Controls=0,0,0;CC_TREND=Na PGT 0,6,0 6 N;CC_GENO=NaN;CC_ALL=1.000e+00;CC_DOM=1.000e+00;CC_REC=1.000e :PID 9 +00 :PL 4 1 6 1 . TG T 5 . AC=2;AF=1.00;AN=2;DP=15;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1 GT: ./.:12,0:1 1/1:0,2:2 ./.:1,0:1:.:.:.:0,0,0 1 GG 5 .00;MQ=21.91;QD=27.92;SOR=2.303;ANN=T|frameshift_variant&splice_region_v AD: 2:.:.:.:0, :6:1|1:11 6 GG . 
ariant|HIGH|CTBP1|ENSOARG00000016137|transcript|ENSOART00000017575.1| DP: 0,0 6356927 3 GA 8 protein_coding|8/8|c.1059_*11delGGGGGGAGTGGG|p.Val353fs|1059/1059|1059/ GQ: _G_A:9 5 GT 4 1059|353/352||WARNING_TRANSCRIPT_NO_STOP_CODON,T|downstream_ge PGT 0,6,0 6 GG ne_variant|MODIFIER|CTBP1|ENSOARG00000016137|transcript|ENSOART0000 :PID 9 G 0017575.1|protein_coding|8/8|c.1059_*11delGGGGGGAGTGGG|||||0|WARNING_ :PL 4 TRANSCRIPT_NO_STOP_CODON,T|intergenic_region|MODIFIER|CTBP1- 5 ENSOARG00000026629|ENSOARG00000016137- ENSOARG00000026629|intergenic_region|ENSOARG00000016137- ENSOARG00000026629|||n.116356946_116356957delGGGGGGAGTGGG||||||;Cas es=1,0,2;Controls=0,0,0;CC_TREND=NaN;CC_GENO=NaN;CC_ALL=1.000e+00 ;CC_DOM=1.000e+00;CC_REC=1.000e+00

Table S3 TaqMan PCR assay genotyping results for 194 animals. Yellow highlighted cells indicate animals that were also Sanger sequenced. File format: Excel (.xls).

Sample ID       Genotype  Reported phenotype         Flock
Control 1       CC        Clinically normal, Merino  Control
Control 2       CC        Clinically normal, Merino  Control
PPHA-68         --        Affected                   Flock 1
PPHA-69         --        Affected                   Flock 1
PPHA-70         --        Affected                   Flock 1
PPHA-72         --        Affected                   Flock 1
PPHA-73         C-        Obligate carrier           Flock 1
PPHA-74         C-        Obligate carrier           Flock 1
PPHA-75         C-        Clinically normal          Flock 1
PPHA-76         C-        Clinically normal          Flock 1
PPHA-77         C-        Clinically normal          Flock 1
PPHA-78         C-        Clinically normal          Flock 1
PPHA-79         C-        Obligate carrier           Flock 1
PPHA-80         CC        Clinically normal          Flock 1
PPHA-81         C-        Clinically normal          Flock 1
PPHA-82         CC        Clinically normal          Flock 1
PPHA-83         C-        Clinically normal          Flock 1
PPHA-84         C-        Clinically normal          Flock 1
PPHA-85         C-        Clinically normal          Flock 1
PPHA-86         C-        Clinically normal          Flock 1
PPHA-87         C-        Clinically normal          Flock 1
PPHA-88         CC        Clinically normal          Flock 1
PPHA-89         C-        Obligate carrier           Flock 1
PPHA-90         CC        Clinically normal          Flock 1
PPHA-91         C-        Obligate carrier           Flock 1
PPHA-92         C-        Obligate carrier           Flock 1
PPHA-93         --        Affected                   Flock 1
PPHA-94         --        Affected                   Flock 1
PPHA-95         C-        Clinically normal          Flock 1
PPHA-96         CC        Clinically normal          Flock 1
PPHA-97         C-        Obligate carrier           Flock 1
PPHA-98         C-        Obligate carrier           Flock 1
PPHA-99         C-        Clinically normal          Flock 1
PPHA-100        --        Affected                   Flock 1
PPHA-102        CC        Clinically normal          Flock 1
PPHA-103        CC        Clinically normal          Flock 1
PPHA-110        C-        Clinically normal          Flock 1
PPHA-111        C-        Clinically normal          Flock 1
PPHA-112        C-        Clinically normal          Flock 1
PPHA-113        C-        Clinically normal          Flock 1
X18-00033/0001  CC        Clinically normal          Flock 1
X18-00033/0002  C-        Clinically normal          Flock 1
X18-00033/0003  CC        Clinically normal          Flock 1
X18-00033/0004  C-        Clinically normal          Flock 1
X18-00033/0005  C-        Clinically normal          Flock 1
X18-00033/0006  C-        Clinically normal          Flock 1
X18-00033/0007  CC        Clinically normal          Flock 1
X18-00033/0008  CC        Clinically normal          Flock 1
X18-00033/0009  C-        Clinically normal          Flock 1
X18-00033/0010  CC        Clinically normal          Flock 1
X18-00033/0011  CC        Clinically normal          Flock 1
X18-00033/0012  C-        Clinically normal          Flock 1
X18-00033/0013  C-        Clinically normal          Flock 1
X18-00033/0014  C-        Clinically normal          Flock 1
X18-00033/0015  CC        Clinically normal          Flock 1
X18-00033/0016  C-        Clinically normal          Flock 1
X18-00033/0017  C-        Clinically normal          Flock 1
X18-00033/0018  C-        Clinically normal          Flock 1
X18-00033/0019  CC        Clinically normal          Flock 1
X18-00033/0020  CC        Clinically normal          Flock 1
X18-00033/0021  CC        Clinically normal          Flock 1
X18-00033/0022  C-        Clinically normal          Flock 1
X18-00033/0023  C-        Clinically normal          Flock 1
X18-00033/0024  C-        Clinically normal          Flock 1
X18-00033/0025  CC        Clinically normal          Flock 1
X18-00033/0026  C-        Clinically normal          Flock 1
X18-00033/0027  CC        Clinically normal          Flock 1
X18-00033/0028  CC        Clinically normal          Flock 1
X18-00033/0029  CC        Clinically normal          Flock 1
X18-00033/0030  CC        Clinically normal          Flock 1
X18-00033/0031  C-        Clinically normal          Flock 1
X18-00033/0032  CC        Clinically normal          Flock 1
X18-00033/0033  C-        Clinically normal          Flock 1
X18-00033/0034  C-        Clinically normal          Flock 1
X18-00033/0035  C-        Clinically normal          Flock 1
X18-00033/0036  C-        Clinically normal          Flock 1
X18-00033/0037  C-        Clinically normal          Flock 1
X18-00033/0038  C-        Clinically normal          Flock 1
X18-00033/0039  C-        Clinically normal          Flock 1
X18-00033/0040  C-        Clinically normal          Flock 1
X18-00033/0041  C-        Clinically normal          Flock 1
X18-00033/0042  C-        Clinically normal          Flock 1
X18-00033/0043  C-        Clinically normal          Flock 1
X18-00033/0044  C-        Clinically normal          Flock 1
X18-00033/0045  CC        Clinically normal          Flock 1
X18-00033/0046  C-        Clinically normal          Flock 1
X18-00033/0047  CC        Clinically normal          Flock 1
X18-00033/0048  CC        Clinically normal          Flock 1
X18-00033/0049  C-        Clinically normal          Flock 1
X18-00033/0050  C-        Clinically normal          Flock 1
X18-00033/0051  C-        Clinically normal          Flock 1
X18-00033/0052  C-        Clinically normal          Flock 1
X18-00033/0053  CC        Clinically normal          Flock 1
X18-00033/0054  C-        Clinically normal          Flock 1
X18-00033/0055  CC        Clinically normal          Flock 1
X18-00033/0056  CC        Clinically normal          Flock 1
X18-00033/0057  C-        Clinically normal          Flock 1
X18-00033/0058  C-        Clinically normal          Flock 1
X18-00033/0059  CC        Clinically normal          Flock 1
X18-00033/0060  CC        Clinically normal          Flock 1
X18-00033/0061  CC        Clinically normal          Flock 1
X18-00033/0062  CC        Clinically normal          Flock 1
X18-00033/0063  CC        Clinically normal          Flock 1
X18-00033/0064  C-        Clinically normal          Flock 1
X18-00033/0065  C-        Clinically normal          Flock 1
X18-00033/0066  CC        Clinically normal          Flock 1
X18-00033/0067  CC        Clinically normal          Flock 1
X18-00033/0068  C-        Clinically normal          Flock 1
X18-00033/0069  C-        Clinically normal          Flock 1
X18-00033/0070  CC        Clinically normal          Flock 1
X18-00033/0071  C-        Clinically normal          Flock 1
X18-00033/0072  CC        Clinically normal          Flock 1
X18-00033/0073  CC        Clinically normal          Flock 1
X18-00033/0074  CC        Clinically normal          Flock 1
X18-00033/0075  CC        Clinically normal          Flock 1
X18-00033/0076  CC        Clinically normal          Flock 1
X18-00033/0077  CC        Clinically normal          Flock 1
X18-00033/0078  CC        Clinically normal          Flock 1
X18-00033/0079  CC        Clinically normal          Flock 1
X18-00033/0080  C-        Clinically normal          Flock 1
X18-00033/0081  C-        Clinically normal          Flock 1
X18-00033/0082  C-        Clinically normal          Flock 1
X18-00033/0083  CC        Clinically normal          Flock 1
X18-00033/0084  C-        Clinically normal          Flock 1
X18-00033/0085  CC        Clinically normal          Flock 1
X18-00033/0086  CC        Clinically normal          Flock 1
X18-00033/0087  C-        Clinically normal          Flock 1
PPHA-1          C-        Clinically normal          Flock 2
PPHA-2          CC        Clinically normal          Flock 2
PPHA-3          C-        Clinically normal          Flock 2
PPHA-4          CC        Clinically normal          Flock 2
PPHA-5          CC        Clinically normal          Flock 2
PPHA-6          CC        Clinically normal          Flock 2
PPHA-7          CC        Clinically normal          Flock 2
PPHA-8          CC        Clinically normal          Flock 2
PPHA-9          CC        Clinically normal          Flock 2
PPHA-10         C-        Clinically normal          Flock 2
PPHA-11         CC        Clinically normal          Flock 2
PPHA-12         CC        Clinically normal          Flock 2
PPHA-13         CC        Clinically normal          Flock 2
PPHA-14         CC        Clinically normal          Flock 2
PPHA-15         CC        Clinically normal          Flock 2
PPHA-16         CC        Clinically normal          Flock 2
PPHA-17         CC        Clinically normal          Flock 2
PPHA-18         CC        Clinically normal          Flock 2
PPHA-19         CC        Clinically normal          Flock 2
PPHA-20         CC        Clinically normal          Flock 2
PPHA-21         C-        Obligate carrier           Flock 2
PPHA-22         CC        Clinically normal          Flock 2
PPHA-23         CC        Clinically normal          Flock 2
PPHA-24         CC        Clinically normal          Flock 2
PPHA-25         CC        Clinically normal          Flock 2
PPHA-26         C-        Clinically normal          Flock 2
PPHA-27         CC        Clinically normal          Flock 2
PPHA-28         CC        Clinically normal          Flock 2
PPHA-29         CC        Clinically normal          Flock 2
PPHA-30         CC        Clinically normal          Flock 2
PPHA-31         CC        Clinically normal          Flock 2
PPHA-32         CC        Clinically normal          Flock 2
PPHA-33         CC        Clinically normal          Flock 2
PPHA-34         CC        Clinically normal          Flock 2
PPHA-35         CC        Clinically normal          Flock 2
PPHA-36         CC        Clinically normal          Flock 2
PPHA-37         CC        Clinically normal          Flock 2
PPHA-38         CC        Clinically normal          Flock 2
PPHA-39         CC        Clinically normal          Flock 2
PPHA-40         CC        Clinically normal          Flock 2
PPHA-41         CC        Clinically normal          Flock 2
PPHA-42         CC        Clinically normal          Flock 2
PPHA-43         CC        Clinically normal          Flock 2
PPHA-44         CC        Clinically normal          Flock 2
PPHA-45         CC        Clinically normal          Flock 2
PPHA-46         CC        Clinically normal          Flock 2
PPHA-47         CC        Clinically normal          Flock 2
PPHA-48         CC        Clinically normal          Flock 2
PPHA-49         CC        Clinically normal          Flock 2
PPHA-50         CC        Clinically normal          Flock 2
PPHA-51         CC        Clinically normal          Flock 2
PPHA-52         CC        Clinically normal          Flock 2
PPHA-53         CC        Clinically normal          Flock 2
PPHA-54         CC        Clinically normal          Flock 2
PPHA-55         CC        Clinically normal          Flock 2
PPHA-56         CC        Clinically normal          Flock 2
PPHA-57         CC        Clinically normal          Flock 2
PPHA-58         CC        Clinically normal          Flock 2
PPHA-59         CC        Clinically normal          Flock 2
PPHA-60         CC        Clinically normal          Flock 2
PPHA-61         CC        Clinically normal          Flock 2
PPHA-63         CC        Clinically normal          Flock 2
PPHA-64         CC        Clinically normal          Flock 2
PPHA-65         C-        Clinically normal          Flock 2
PPHA-66         CC        Clinically normal          Flock 2
PPHA-67         CC        Clinically normal          Flock 2
PPHA-71         Fail      Affected                   Flock 3

Table S4 Overview of exon lengths in the ADAMTS3 gene in humans (Homo sapiens), mice (Mus musculus), sheep (Ovis aries) and PHA-affected foetuses PPHA93 and PPHA94. Blank cells indicate no annotated exon and N/A indicates lack of Sanger sequencing.

Length (bp)

Species Ensembl/NCBI transcript Exon 1 Exon 2 Exon 3 Exon 4 Exon 5 Exon 6 Exon 7 Exon 8 Exon 9 Exon 10 Exon 11 Exon 12 Exon 13 Exon 14 Exon 15 Alternative* Exon 16 Exon 17 Exon 18 Exon 19 Exon 20 Exon 21 Exon 22 Exon 23

Human-1 ENST00000286657.10 528 28 407 157 200 84 157 106 144 133 114 146 176 134 124 81 164 166 133 208 118 2736

Human-2 ENST00000622135.1 103 28 407 157 200 84 157 106 144 133 114 146 176 134 124 81 164 166 133 208 118 1243 153

Mouse-1 ENSMUST00000163159.7 114 28 407 157 200 84 157 106 144 133 114 146 176 134 124 84 164 166 133 208 118 840

Mouse-2 ENSMUST00000061427.9 114 28 407 157 200 84 157 106 144 133 114 146 176 134 124 81 164 166 133 208 118 840

Mouse-3 ENSMUST00000198151.1 513 28 407 885

Sheep-1 ENSOART00000014359.1 580 28 407 157 200 84 157 106 144 133 114 146 176 134 124 81 164 166 133 208 118 560

Sheep-2 ENSOART00000014360.1 580 28 407 28 35 60

PPHA93 ENSOART00000014359.1 N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A 116 124 81 164 N/A N/A N/A N/A N/A

PPHA94 ENSOART00000014359.1 N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A 116 124** 139** 81** 164 N/A N/A N/A N/A N/A

*Supported evidence for this alternative exon is indicated by cDNA sequences identified by BLAST search.

**Sequence data shows that these transcripts are present (3).


Figure S1 Multiple species alignment from the NCBI BLAST of the 139 bp insertion in PPHA94. Grey coloured bars indicate consensus sequence and red coloured bars indicate sequence mismatches.

Figure S2 Approximate location of the six amino acids predicted to be removed (ENSOARP00000014152.1:p.(Val680_Val685del)) as indicated by the red arrow in the wildtype thrombospondin type-1 (TSP1) repeat of the ADAMTS3 protein, generated from Ensembl (accessed 19th September 2020, 013204;r=6:87097877-87386270;t=ENSOART00000014359>).

Chapter 7 | General discussion and conclusions

7.1 General discussion

This thesis describes the molecular investigation of four inherited diseases in cattle and sheep for which causal mutations were previously unknown: ichthyosis fetalis (IF) in Shorthorn cattle, Niemann-Pick type C (NPC) disease in Angus/Angus-cross cattle, brachygnathia, cardiomegaly and renal hypoplasia syndrome (BCRHS) in Merino sheep and pulmonary hypoplasia with anasarca (PHA) in Persian/Persian-cross sheep. This study began with the investigation of ten inherited diseases affecting cattle and sheep. Several constraints, including sample size, sample quality, phenotype accuracy, budget and time, prevented the progression of six of these investigations, which will be pursued independently of this thesis.

Most of the inherited diseases investigated in this thesis originated from historical cases, which created challenges. More specifically, samples were often available for only some of the reported affected animals and some of their relatives, while samples from unrelated control animals from the herd or flock were mostly unavailable. Samples were often not collected for the specific purpose of DNA and RNA extraction, which limited the quality and quantity of DNA and RNA that could be extracted, particularly as samples had been in long-term storage or had degraded. Samples for most cases were collected in the field; detailed phenotypic descriptions were often lacking and samples for clinical pathology or histopathology were either not collected or no longer available. Pedigree information was often absent, which made any prediction of a mode of inheritance difficult. Founder animals could not be traced and other at-risk herds or flocks could not be identified. The lack of pedigree information and the time elapsed since the diseases were first reported made it difficult to establish research herds or flocks, which would have allowed further research to confirm the mode of inheritance, characterise the phenotype and collect high-quality samples for analysis. Furthermore, such research flocks and herds could enable the use of affected animals as models for human disease.

Despite these challenges, the investigations of NPC, BCRHS, PHA and IF were successful using a variety of approaches. These approaches included pedigree analysis, SNP chip genotyping, homozygosity mapping, candidate gene analysis, Sanger sequencing, whole genome sequencing and confirmation of disease segregation in the original herds and flocks after the development of diagnostic DNA tests. Tailored approaches led to the discovery of four likely causal mutations, one each for IF in Shorthorn cattle (NM_001191294.2:c.6776T>C; Chapter 3), NPC in Angus/Angus-cross cattle (NM_174758.2:c.2969C>G; Chapter 4), BCRHS in Merino sheep (ENSOARG00000020239:g.220472248delC; Chapter 5) and PHA in Persian/Persian-cross sheep (ENSOARG00000013204:g.87124344delC; Chapter 6).

7.1.1 Attitudes towards inherited diseases in cattle and sheep

The discovery of causal mutations in Australian cattle and sheep has a strong history. Citrullinaemia affecting Australian dairy cattle (Harper et al. 1986; Healy et al. 1990) was the second inherited disease in cattle for which a disease-causing mutation was identified (Dennis et al. 1989). The implementation of DNA diagnostics allowed for successful management (Healy et al. 1990). The same group later identified or contributed to the identification of disease-causing mutations and developed DNA diagnostics for maple syrup urine disease (Zhang et al. 1990; Dennis & Healy 1999), α-mannosidosis (Berg et al. 1997; Tollersrud et al. 1997), glycogen storage disease II (Dennis et al. 2000) and myoclonus (Pierce et al. 2001). Research by an Australian and New Zealand collaboration focused on animal models for neuronal ceroid lipofuscinosis and identified the disease-causing mutations for variants of this disease in Devon cattle (Houweling et al. 2006), and Merino, Borderdale and South Hampshire sheep (Tammen et al. 2001; Tammen et al. 2006; Frugier et al. 2008). These are some examples of the pioneering work to identify and manage inherited diseases in cattle and sheep and to characterise animal models for human disease within Australia and New Zealand.

However, inherited diseases are still currently under-reported or misdiagnosed as non-heritable diseases. To allow for more effective management of inherited diseases in the future, efficient processes for reporting and investigation must be put into place especially as genetic technologies are becoming more accessible. In addition, awareness about inherited diseases needs to be raised among producers and veterinarians.

The reporting of emerging inherited diseases is not well organised in Australia. This could stem from perceived negative repercussions on income and reputation, misdiagnosis as a non-heritable disease or a lack of knowledge surrounding reporting processes. Some breeding societies are making progress in opening communication about inherited conditions (Teseling & Parnell 2011). These societies are providing educational resources for known inherited diseases in the breed, improving testing availability and providing advice on how to manage inherited diseases on-farm (Teseling & Parnell 2013; Angus Australia 2020).

The development of tools that improve the prediction of genotypes in pedigrees has enhanced this progress, notably through the use of GeneProb (http://www-personal.une.edu.au/~bkinghor/geneprob.htm; Kerr & Kinghorn 1996). This tool allows for segregation analysis of single biallelic loci and the estimation of genotype probabilities for all individuals in a herdbook, even if DNA testing results are available for key breeding animals only (Kerr & Kinghorn 1996). This allows producers to effectively manage their herds or flocks for known recessive inherited diseases without needing to cull heterozygous animals or conduct DNA testing for every animal. Breeding societies and producers have the potential to influence the rate at which emerging inherited diseases are reported and therefore managed within the industry.
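The principle behind genotype probability estimation can be illustrated with a minimal sketch. The code below is not the GeneProb algorithm; it only shows, under the assumptions of a single biallelic locus, Mendelian segregation and independent parental genotypes, how the genotype probabilities of untested offspring follow from those of the parents. The allele labels (A for wildtype, a for the recessive disease allele) and the function name are hypothetical.

```python
# Illustrative sketch only (not GeneProb): offspring genotype probabilities
# at one biallelic locus, assuming Mendelian segregation and that the sire
# and dam genotypes are independent of each other.

GENOTYPES = ("AA", "Aa", "aa")  # A = wildtype allele, a = disease allele

# Probability that a parent with a given genotype transmits the "a" allele
TRANSMIT_A = {"AA": 0.0, "Aa": 0.5, "aa": 1.0}


def offspring_probs(sire, dam):
    """Return offspring genotype probabilities.

    sire, dam: dicts mapping each genotype in GENOTYPES to its probability
    (a DNA-tested animal has probability 1.0 for its known genotype).
    """
    p_sire = sum(sire[g] * TRANSMIT_A[g] for g in GENOTYPES)
    p_dam = sum(dam[g] * TRANSMIT_A[g] for g in GENOTYPES)
    return {
        "AA": (1 - p_sire) * (1 - p_dam),
        "Aa": p_sire * (1 - p_dam) + (1 - p_sire) * p_dam,
        "aa": p_sire * p_dam,
    }


carrier = {"AA": 0.0, "Aa": 1.0, "aa": 0.0}
wildtype = {"AA": 1.0, "Aa": 0.0, "aa": 0.0}

print(offspring_probs(carrier, carrier))   # carrier by carrier mating
print(offspring_probs(carrier, wildtype))  # carrier by homozygous wildtype
```

Under these assumptions a carrier by carrier mating gives a one in four chance of an affected offspring, whereas mating a carrier to a homozygous wildtype animal produces no affected offspring and, on average, half carriers.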

A breeding society that has made strides within this space is the Australian Angus society, whose proactive approach has permitted testing for common inherited diseases in Angus (Angus Australia 2020). An early inherited disease that the society advocated testing for was α-mannosidosis (Hocking et al. 1972; Berg et al. 1997; Tollersrud et al. 1997). The effective use of heterozygote testing for α-mannosidosis, initially using enzymatic assays and later DNA diagnostics, in New Zealand and Australia (Jolly et al. 1974a; Jolly et al. 1974b; Jolly et al. 1974c; Healy 1996; Berg et al. 1997) facilitated successful management programs in which the occurrence of the disease was drastically reduced. The successful management of this lysosomal storage disease has provided a framework for the management of inherited conditions and has influenced how the society now manages the risk of inherited conditions in the breed. DNA testing results and/or GeneProb information are provided for nine inherited diseases (Angus Australia 2020).

The inherited diseases investigated in this thesis were originally reported by veterinarians, or were directly reported to research scientists by producers. The relationship between producers and veterinarians is significant to foster clear communication for the future steps of reporting inherited diseases. District veterinarians within Australia play a vital role in upholding the health and productivity of animals within their region, for which these veterinarians offer specialist support often funded by the State (Frawley 2003). District veterinarians provide a link to government laboratories and resources, and the importance of communication surrounding the reporting and investigation of inherited diseases is therefore integral. It is important for researchers however to keep information regarding affected animals and their owners confidential during the research stage in order to maintain positive working relationships with producers for the collection of samples. The data relating to a suspect inherited disease is crucial in dictating the success of identifying a causal mutation, with the true incidence of inherited disease cases and emerging inherited diseases proving difficult to measure without consistent reporting.

Despite the inconsistent reporting, the number of publications describing new inherited conditions as well as disease-causing mutations is increasing across the globe. As discussed in Chapter 1, the number of causal variants identified per year is growing, especially within the last ten years (Online Mendelian Inheritance in Animals 2020). This is partially due to the availability of well annotated reference genomes through the Browser Genome Release Agreement between the major browsers Ensembl, NCBI and the University of California Santa Cruz (UCSC) Genome Browser, which ensures data is readily available and interchangeable (http://Aug2020.archive.ensembl.org/info/about/legal/browser_agreement.html). This has meant that new and increasingly affordable molecular genetic technologies such as SNP chip genotyping and next generation sequencing are now more accessible than ever before (Buermans & den Dunnen 2014; Shen et al. 2015). The improved efficiency in identifying disease-causing mutations has been demonstrated in this thesis, where causal mutations have been identified for inherited diseases for which affected animal sample sizes were small, such as one affected animal for Shorthorn IF in Chapter 3 and three for NPC disease in Chapter 4, or for which previous studies utilising SNP genotyping and disease mapping accelerated variant discovery, such as BCRHS in Chapter 5 and PHA in Chapter 6.

The growing number of causal mutations identified in cattle and sheep indicates that the accessibility of early molecular genetic tools such as Sanger sequencing of candidate genes and linkage mapping via microsatellites, and more recently, high throughput genotyping and next generation sequencing technologies, is enabling more discoveries and therefore allowing a rise in inherited disease frequency to be measured (Dennis 1993; Buermans & den Dunnen 2014; Gurgul et al. 2014). However, it is not known what method of reporting of inherited conditions is preferred. To improve how inherited diseases are reported, a national reporting centre for livestock that is independent of breeding societies and organisations, and so allows anonymity, would be beneficial to streamline reporting and monitor the frequency of emerging inherited diseases.

As discussed in Chapter 1, an initiative in the Republic of Ireland has been formed to enable easier reporting by producers (Irish Cattle Breeding Federation 2016). Earlier initiatives such as the Danish Bovine Genetic Disease Programme have also been used to monitor and discover inherited diseases in cattle (Agerholm et al. 1993). The National Research Institute for Agriculture, Food and Environment (INRAE) in France has similarly established a research program based on inherited disorders in dairy cattle (Capitan et al. 2014). This program encourages the investigation and reporting of inherited diseases in cattle through the National Bovine Anomalies Observatory via a questionnaire that also outlines the French legal obligation to report genetic disorders when registering the parentage of cattle (Légifrance 2020; National Bovine Anomalies Observatory 2020).

Establishing a similar initiative in Australia to streamline the reporting process would allow reporting to become more mainstream. Farmers and veterinarians should be provided with information on known inherited conditions and with key directives regarding investigations of emerging inherited diseases. These directives should specify which animals to sample and what type of samples to collect, and should request information about the disease phenotype and, if available, the pedigree. Consideration of how the described phenotype impacts animal welfare and production, as well as its prevalence, will assist in prioritising inherited disease investigations. In canines, the impact of inherited diseases on animal welfare is measured via a Generic Illness Severity Index that bases its score calculations on the severity and duration of an inherited disease (Collins et al. 2011). This approach allows for the prioritisation of disease investigation based on animal welfare implications (Collins et al. 2011).

Similar scoring systems could be implemented in livestock to assess impacts on animal welfare, production efficiency and profitability. However, they may need to be tailored to each industry and species if scores are not transferable. The four inherited diseases investigated in this thesis were prioritised among the ten diseases for which research initially commenced based on the availability of phenotypic descriptions and pedigree information, the quality and quantity of samples for genetic analysis, the identification of strong candidate genes and the degree of impact of the disease on animal welfare. These factors were considered to maximise the chance of discovering a causal mutation and to maximise the impact of any resulting DNA diagnostics on animal welfare. This meant that other diseases initially proposed in this study, such as congenital mandibular prognathia in Droughtmaster cattle, were de-prioritised due to a lower perceived impact on animal welfare when compared to lethal inherited diseases such as IF, BCRHS, NPC and PHA. Investigation of emerging inherited diseases is vital to reducing the risk of affected animals being born, and if the frequency of the deleterious allele in the population is suspected to be high, such diseases should be prioritised for management as the risk of affected offspring is higher.

For diseases with a recessive mode of inheritance, producers can avoid the birth of affected animals by avoiding carrier by carrier matings, but can continue to breed carriers with homozygous wildtype animals and either sell the offspring for slaughter or DNA test the offspring that they plan to use as replacement animals. If diagnostic assays are available and affordable, producers may be more willing to manage the number of heterozygous animals in their herd or flock to avoid breeding affected animals. Producers in this research study were interested in managing disease using DNA testing; 403 Angus/Angus-cross cattle from the founder herd were tested for the NPC1 mutation (Chapter 4), 583 Merino/Merino-cross sheep were tested for the BCRHS mutation (Chapter 5) and 195 Persian/Persian-cross sheep were tested for the PHA mutation (Chapter 6). The screening of additional animals from related herds or flocks assists producers in making informed choices for future breeding. Software packages that assist producers in making selection and breeding decisions to maximise genetic gain and diversity while taking information about genetic defects into account are available, such as MateSel (Kinghorn & Kinghorn 2020).

The producers' interest in having their whole herd or flock DNA tested provided an efficient means of validating the disease-causing mutations in these studies, as no clinically normal animals were shown to be homozygous for the likely disease allele. This additional DNA testing allowed estimation of allele frequencies for the disease allele within the affected herds or flocks: 3.5% for NPC, 5% for BCRHS and 24% for PHA among the animals genotyped (including affected animals).
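Allele frequencies of this kind are obtained by direct allele counting from genotype calls. As a minimal sketch, assuming genotype calls written in the notation of Table S3 ("CC" for homozygous wildtype, "C-" for heterozygous, "--" for homozygous for the deletion, and any other value such as "Fail" treated as missing), the calculation could look as follows. The function name and the example counts are hypothetical and are not the data reported in this thesis.

```python
# Illustrative sketch: disease allele frequency by allele counting.
# Genotype notation follows Table S3; the example calls below are
# hypothetical and are NOT the thesis data.

DISEASE_ALLELES = {"CC": 0, "C-": 1, "--": 2}  # copies of the deletion allele


def disease_allele_frequency(calls):
    """Return the frequency of the deletion allele among successful calls."""
    scored = [c for c in calls if c in DISEASE_ALLELES]  # drop failed assays
    if not scored:
        raise ValueError("no successful genotype calls")
    n_disease = sum(DISEASE_ALLELES[c] for c in scored)
    return n_disease / (2 * len(scored))  # two alleles per diploid animal


example_calls = ["CC"] * 6 + ["C-"] * 3 + ["--"] * 1 + ["Fail"]
print(f"{disease_allele_frequency(example_calls):.2%}")
```

Because affected animals are deliberately sampled in such studies, a frequency estimated this way describes the genotyped animals and may overestimate the frequency in the wider flock or breed.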

Due to the low estimated disease allele frequency for NPC and the accurate detection of heterozygous cattle in the herd, the producer chose to keep the relatively small number of heterozygous cows whilst using bulls that were homozygous wildtype for the disease allele (Personal communication, 2019). The high estimated disease allele frequency for BCRHS and PHA led to the producers making breeding decisions to cull all heterozygous rams (Personal communication, 2019). For BCRHS the producer is aiming to exclude heterozygous ewes from future breeding to avoid the need for ongoing DNA testing.

Diagnostic testing for PHA allowed the producer to select homozygous wildtype rams for subsequent breeding, and therefore mitigate the risk of producing affected offspring without a need to cull all heterozygous ewes. Flock 2 now no longer exists, and the impact on breeding management for this flock cannot be assessed. These results show that if diagnostic tests are affordable or available, producers are eager to test their stock and can make informed breeding decisions that avoid incidence of affected animals.

Whilst the availability of diagnostic genotyping assays enables producers to make informed decisions, the culling of all heterozygous animals can be counter-productive when disease-allele frequency is high. Once diagnostic DNA tests are available, molecular geneticists have a role in providing expert advice to producers and industry on how to use these tests, and one consideration should be that genetic diversity in the population is maintained or improved (Bell 2011). In particular, if all heterozygous animals are culled in populations with low genetic diversity or a small effective population size, there is a risk that genetic diversity within flocks, herds or even entire breeds could be adversely impacted (Groeneveld et al. 2010; Pryce et al. 2012). The potential increase in homozygosity in inbred populations can detrimentally affect the fitness of individual animals (Smith et al. 1998; Notter 1999; Pryce et al. 2012).

The diseases investigated in this thesis have not been reported outside of these flocks or herds and the allele frequencies for the disease-causing mutations for IF, NPC, BCRHS and PHA within the wider populations of the associated breeds are unknown. The initial investigation of the genetic aetiology of BCRHS was published in the scientific literature (Shariflou et al. 2011; Shariflou et al. 2012) as well as in industry-relevant publications (Windsor 2010), although no further cases apart from those within the original flock were reported. The flock had been managed using a majority of internally bred rams alongside some external rams, and it is possible that the disease is present only in this flock or is not being reported elsewhere.

Both NPC and IF were isolated to the affected herds; however, the late onset of clinical signs for NPC meant that there was an additional risk of misdiagnosis as a non-inherited disease or as a similar disease such as α-mannosidosis, which could halt more rigorous investigations. The commercial herd in which NPC occurred had purchased several purebred Angus bulls from the same stud breeder and it is possible that the disease allele is not restricted to the commercial herd. The stud breeder and the Angus society have been made aware of this research. Similarly, the Beef Shorthorn Australia Society was informed of ongoing research into IF, and both breeding organisations were provided with the submitted manuscripts before publication to consider any implications for their breeds. The flocks in which PHA was reported shared genetic resources, partially due to the small number of Persian sheep within Australia. All producers that contributed samples for PHA were informed about the identification of a disease-causing variant and the availability of DNA testing. The breed organisation will also be informed. Informing breeding societies whilst maintaining anonymity for the producers allows the inherited disease to be acknowledged, and will hopefully facilitate more open communication within member forums through the dissemination of newsletters or other forms of communication between breed societies and members. Routine educational seminars or information packs provided by either breed societies or district veterinarians could be an important resource to inform producers about emerging inherited diseases and about management options for current inherited diseases, including the availability of diagnostic tests for some of these diseases.

The management of inherited diseases may not be as straightforward as simply culling breeding animals from a flock or herd, especially if a heterozygote advantage exists, such as in Dexter cattle. Dexter cattle are selected for a smaller frame and inadvertently this meant that Dexter cattle that were heterozygous for mutations in the ACAN gene, which cause chondrodysplasia with an incompletely dominant mode of inheritance, had a selection advantage. In the homozygous state these mutations cause a lethal form of dwarfism (Curran 1990; Cavanagh et al. 2007; Hedrick 2014). Breeding decisions will be made that maximise production efficiency and profitability, yet the consideration of inherited diseases is sometimes not included in breeding objectives due to the perceived reductions in production output or efficiency (Emanuelson 1988). As breeding objectives vary between producers (Nielsen et al. 2014), the identification of animals that are heterozygous for disease alleles should be made available within herd/flock books or through a centralised database, where producers can effectively select elite animals for both their production value and genetic integrity (Bell 2011).
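
The value of recording carrier status for mating decisions can be illustrated with a minimal sketch of expected offspring genotypes at a single autosomal locus. It assumes a fully penetrant recessive disease allele and simple Mendelian segregation; the allele labels 'N' (normal) and 'd' (disease) and the function name are hypothetical and are not taken from the studies in this thesis.

```python
from fractions import Fraction
from itertools import product


def offspring_genotype_probs(sire: str, dam: str) -> dict[str, Fraction]:
    """Expected offspring genotype proportions at one autosomal locus.

    Genotypes are two-character strings, e.g. 'Nd' for a heterozygous carrier.
    Each parent transmits either of its two alleles with probability 1/2.
    """
    probs: dict[str, Fraction] = {}
    for sire_allele, dam_allele in product(sire, dam):
        # Sort the allele pair so that 'dN' and 'Nd' are counted together.
        genotype = "".join(sorted((sire_allele, dam_allele)))
        probs[genotype] = probs.get(genotype, Fraction(0)) + Fraction(1, 4)
    return probs


# Carrier x carrier: one quarter of offspring are expected to be affected.
carrier_by_carrier = offspring_genotype_probs("Nd", "Nd")
# Carrier x non-carrier: no affected offspring, but half are expected carriers.
carrier_by_normal = offspring_genotype_probs("Nd", "NN")
```

Under these assumptions, a carrier sire can still be used without producing affected offspring provided that he is mated only to non-carrier dams, although half of his progeny are then expected to be carriers and would need to be tested before breeding.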

In each of the inherited diseases investigated, inbreeding contributed to the relatively large number of affected animals born. Pedigree analysis identified common sires and dams featuring throughout multiple generations for BCRHS and PHA and sire-daughter matings were considered to have occurred in the NPC herd. This serves as a reminder that reducing inbreeding will reduce the risk that genetic conditions with a recessive mode of inheritance manifest.
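
The effect of a sire-daughter mating on inbreeding can be made concrete with a short pedigree calculation. The sketch below uses the standard recursive kinship (coancestry) method, in which the inbreeding coefficient of an animal equals the kinship between its parents. The four-animal pedigree and its animal names are hypothetical and only mimic the type of mating described above; founders are assumed to be unrelated and non-inbred.

```python
from functools import lru_cache

# Hypothetical pedigree: animal -> (sire, dam); None marks an unknown parent.
# Animals must be listed so that parents appear before their offspring.
PEDIGREE = {
    "sire": (None, None),
    "dam": (None, None),
    "daughter": ("sire", "dam"),
    "calf": ("sire", "daughter"),  # product of a sire-daughter mating
}
BIRTH_ORDER = {name: rank for rank, name in enumerate(PEDIGREE)}


@lru_cache(maxsize=None)
def kinship(a, b):
    """Probability that alleles drawn at random from a and b are identical by descent."""
    if a is None or b is None:
        return 0.0  # unknown parents are treated as unrelated founders
    if a == b:
        sire, dam = PEDIGREE[a]
        return 0.5 * (1.0 + kinship(sire, dam))
    # Always expand the younger animal so the recursion moves up the pedigree.
    if BIRTH_ORDER[a] < BIRTH_ORDER[b]:
        a, b = b, a
    sire, dam = PEDIGREE[a]
    return 0.5 * (kinship(sire, b) + kinship(dam, b))


def inbreeding(animal):
    """Inbreeding coefficient F: the kinship between the animal's parents."""
    sire, dam = PEDIGREE[animal]
    return kinship(sire, dam)


f_calf = inbreeding("calf")          # 0.25 for a sire-daughter mating
f_daughter = inbreeding("daughter")  # 0.0, parents are unrelated founders
```

An inbreeding coefficient of 0.25 means that, at any given locus, the calf has a one in four chance of carrying two copies of the same ancestral allele, which is why a single carrier sire mated to his own daughters can produce a cluster of affected offspring.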

Outcrossing provides an excellent opportunity to improve genetic diversity in inbred populations, as new genetic material from different breeds can contribute to heterosis for desirable production traits (Cundiff 1970). Maintaining heterozygosity and actively screening for known deleterious alleles enables stock to maintain genetic diversity and also to remain more flexible in responding to future production and market system changes (McDaniel 2001). Whilst outcrossing can be extremely beneficial for improving genetic diversity, there is a risk of recombination loss, where the genetic gain achieved in the parental breeds could be lost if the crossbred population is greater than 50% of the breeding population used (Sørensen et al. 2008).

7.1.2 Inherited disease investigation

When investigating inherited diseases in cattle and sheep, it can be difficult to prioritise diseases when limited resources are available, despite clear guidelines on how to prioritise diseases based on welfare impact and allele frequency (Bell 2011; Collins et al. 2011). The identification of four likely causal mutations in this thesis was achieved with limited sample sizes and resources. As explored in previous chapters, diseases for which detailed phenotypes and strong candidate genes were available allow for more targeted approaches and have a higher chance for a quick discovery of disease-causing alleles.

Detailed phenotype descriptions are key for the identification of candidate genes through gene function and comparative genomics. It is important to note that the phenotypic presentation of many inherited diseases can closely resemble that of non-genetic diseases; α-mannosidosis in cattle, for example, can stem from a genetic cause or from ingestion of plant toxins (Hocking et al. 1972; Dorling et al. 1978). Diagnosis based on clinical signs alone can be difficult as these are often not pathognomonic. It is therefore crucial to differentiate between genetic and non-genetic diagnoses, and the addition of clinical pathology and histopathological investigations can assist with differentiation (Adissu et al. 2014). In Chapter 3, the gross pathology and histopathology conducted for IF in Shorthorn cattle (O’Rourke et al. 2017) enabled the identification of a strong candidate gene, ABCA12, which is involved in the transport of lipid within the epidermis and during keratinisation (Akiyama et al. 2005), due to the clinical similarity to Harlequin ichthyosis in humans and ichthyosis in Chianina cattle (O’Rourke et al. 2017).

In Chapter 4, extensive phenotype descriptions were provided for affected calves 2 and 3, including a video recording, gross pathology and histopathology, which indicated a lysosomal storage disorder. A lysosomal storage disease within Angus cattle that causes neurological signs is widely known and well described (Hocking et al. 1972; Tollersrud et al. 1997), and this contributed to the veterinarian considering an inherited disease and requesting a detailed pathological investigation after the DNA test for α-mannosidosis. The availability of fibroblast cell culture from affected animals was essential in this study to allow for the functional analysis that confirmed the diagnosis of bovine NPC. Homozygosity mapping was used to prioritise NPC1, one of the two genes that can cause NPC in humans (Vanier & Millat 2003), which allowed for specific analysis of this candidate gene using Sanger sequencing. Despite the very small sample size of three affected animals and two obligate carriers, the likely disease-causing variant was identified.

In Chapter 5, initial studies utilised larger sample sizes of 10 affected animals and 27 control animals to map BCRHS to a 1.1 Mb region on OAR2 (Shariflou et al. 2012). That mapping study followed the publication of the clinical signs of BCRHS (Shariflou et al. 2011), where clear pathological descriptions were provided. The combination of mapping information and phenotype description allowed for the identification of a strong positional candidate gene. This undoubtedly accelerated the discovery of the causal mutation in this study, where only one affected animal was selected for WGS, and control animals were initially limited before utilising the genotyping test that was developed during the study. Similarly, in Chapter 6 for PHA, only two affected animals and one obligate carrier were selected for WGS. SNP genotyping, homozygosity mapping and detailed pathological descriptions were available for PHA and permitted a candidate gene analysis by comparing disease ontology descriptions in humans and cattle.

The collection of a variety of sample types can be integral to inherited disease research. Pathological changes can be confined to specific tissue types or organs, and understanding the underlying epidemiology of a disease should inform the collection of adequate samples and phenotypes within at-risk populations (Rasmussen et al. 2002). With the sample size of a study often underpinning its likelihood of success (Bacchetti et al. 2008; Bacchetti et al. 2011), obtaining large sample sizes for rare inherited diseases from affected animals can be challenging.

In the past, researchers created research flocks and herds to breed additional animals for inherited disease or production trait investigation via back crossing or test crosses (McKenzie & Clarke 1988; Otsu et al. 1991; Charlier et al. 1995). This investigative practice is time consuming, costly and raises some ethical concerns about breeding animals that are likely to have disease (Varga et al. 2010). Compared with linkage analysis, newer approaches based on homozygosity mapping or identity by descent, or on knowledge of strong candidate genes, can reduce the number of affected animals and controls required. Lack of appropriate controls can lead to false positives, where regions thought to be associated with disease sometimes correspond to genetic sweeps unrelated to the disease. Lack of controls can also hinder validation of a likely causative variant in related herds or flocks.
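
The role of controls in homozygosity mapping can be sketched in a few lines of code. The example below is a simplified illustration and not the pipeline used in this thesis: genotypes are assumed to be coded as 0, 1 or 2 copies of an arbitrary reference allele for SNPs ordered along one chromosome, and the function names, the `min_snps` threshold and the toy genotypes are all hypothetical.

```python
def shared_homozygous_runs(affected, min_snps=3):
    """Find maximal runs of consecutive SNPs at which every affected animal
    carries the same homozygous genotype (0 or 2).

    Returns a list of (start, end) SNP index pairs, both inclusive.
    """
    n_snps = len(affected[0])
    runs, start = [], None
    for i in range(n_snps):
        calls = {animal[i] for animal in affected}
        shared = len(calls) == 1 and calls <= {0, 2}
        if shared and start is None:
            start = i
        elif not shared and start is not None:
            if i - start >= min_snps:
                runs.append((start, i - 1))
            start = None
    if start is not None and n_snps - start >= min_snps:
        runs.append((start, n_snps - 1))
    return runs


def exclude_runs_seen_in_controls(runs, affected, controls):
    """Drop runs in which any unaffected control is homozygous for the same
    haplotype as the affected animals across the whole run."""
    kept = []
    for start, end in runs:
        case_haplotype = affected[0][start:end + 1]
        if all(control[start:end + 1] != case_haplotype for control in controls):
            kept.append((start, end))
    return kept


# Toy genotypes for ten SNPs (hypothetical data).
affected = [
    [0, 2, 2, 0, 2, 1, 0, 0, 2, 2],
    [0, 2, 2, 0, 2, 2, 0, 0, 2, 1],
]
controls = [
    [0, 2, 2, 0, 2, 0, 1, 0, 2, 2],  # shares the first run with the cases
    [1, 2, 1, 0, 2, 2, 0, 1, 2, 2],
]

candidate_runs = shared_homozygous_runs(affected)
filtered_runs = exclude_runs_seen_in_controls(candidate_runs, affected, controls)
```

In this toy example two candidate regions are found in the affected animals, but one of them is also homozygous in an unaffected control and is therefore discarded. Without the controls, both regions would have been carried forward, which mirrors the false positive risk described above.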

The evolution of genome editing offers vast opportunities for inherited disease research. The recent use of the CRISPR-Cas9 complex to introduce directed alterations to the genome has proven successful (Bell et al. 2014). Genome editing of cell lines could be used for validation purposes, for example by correcting the mutation in the fibroblast cell line maintained for bovine NPC in Chapter 4. Reversal of the phenotype in the fibroblasts could be considered as proof that the mutation is disease causing. Discussions surrounding the editing of livestock genomes to improve production traits or the health of animals have been fraught with ethical and sustainability questions (Eriksson et al. 2018).

Whilst the use of genome-edited animals is still in its infancy, screening for potential off-target effects, assessment of risk versus gain and the development of common regulatory frameworks are needed in order to facilitate further research (Young et al. 2020). It is therefore a question of when these genome editing technologies will become more common in correcting the germ plasm of elite bulls or dams for inherited diseases or production traits, and what industries will be targeted for their widespread use. It has already been suggested that variants responsible for desirable production traits must be clearly identified before large scale genome editing of elite animals can begin (Mclean et al. 2020). Desirable traits are generally not linked to single genes with large effects, but rather are complex traits linked to numerous variants with small-scale effects, which could make targeting variants associated with production traits for genome editing difficult (Hayes et al. 2013). The applicability of this technology to correcting disease alleles opens a multitude of ethical and sustainability questions and concerns, and must be approached in a manner that facilitates open discussion between industry, producers and consumers.

Whilst genome editing provides opportunities for altering and potentially reducing the known disease allele landscape in livestock, animal models of disease can be extremely useful for human disease research. This thesis has identified the direct link between the potential availability and usefulness of animal models for human disease, as explored in Chapters 4 and 5, where both NPC and BCRHS are excellent candidates for human NPC and human 3M syndrome-2 studies. Large animal models of disease facilitate improved organ scaling when compared to rodent models, and allow for enhanced disease characterisation and discovery of underlying disease mechanisms (Agerholm 2008; Pinnapureddy et al. 2015). Similar to genome editing in animals, the use of animal models for human disease also raises ethical concerns when alternative in vivo approaches can be utilised (Varga et al. 2010). The cost of breeding and maintaining research populations of large animal models, and the time required to do so, are important considerations when comparing large animal models to rodent models. Large animal models are, however, considered to be safe intermediates between preclinical research and clinical trials in humans (Gurda & Vite 2019), and are thus a valuable resource when needing to prove efficacy of therapeutic treatments.

7.1.3 Challenging breeding attitudes for future diversity and food security

Livestock production is a major cornerstone of Australian agriculture. The importance of breeding efficient and highly productive animals amid urbanisation, human population growth and climate change is becoming more evident (Keating & Carberry 2010). It is expected that by 2050, agricultural production across the globe will need to at least double to meet consumer demand (Dorrough et al. 2007). Livestock production will play a vital role in providing protein for the increasing human population (Godber & Wall 2014), and livestock breeding and management will become crucial for future sustainability of the livestock industry, both within Australia and across the globe. Future sustainability of the livestock industry will also need to take into consideration the role of consumers, as the importance of a social licence to operate in the livestock industries within Australia is growing, especially in regard to animal welfare concerns (Lusk & Norwood 2008; Nocella et al. 2010).

To improve both animal welfare and production efficiency, animal breeding attitudes about the reporting and management of inherited diseases must change. Inherited disease research is underfunded as the true impact of inherited conditions is underestimated. To effectively manage inherited diseases and educate producers and industry, research within this topic area needs to expand. Funding should not be restricted to contributions from breeding societies, as this disadvantages smaller breed societies with lower membership and limited resources. Consumer concern for animal welfare may require breed societies to undertake more proactive approaches to managing inherited diseases. The stigma attached to reporting inherited diseases within herds or flocks needs to change, as there are plentiful resources available to facilitate the investigation and potential identification of causal variants.

7.2 Conclusion

This study has identified causal mutations for four inherited diseases: IF in Shorthorn cattle, NPC in Angus/Angus-cross cattle, BCRHS in Merino sheep and PHA in Persian/Persian-cross sheep, using multiple approaches to harness and extract the available information for each inherited disease. Detailed phenotype descriptions, sample quality and quantity, pedigree information and candidate gene analysis have been essential in discovering mutations for rare inherited diseases. The importance of reporting, investigating and managing inherited diseases within the cattle and sheep industries in Australia must not be underestimated.

Once disease-causing mutations have been identified, DNA diagnostics should be implemented in ways that do not reduce genetic diversity. As the growing human population becomes more reliant on the production of protein, the health and genetic diversity of cattle and sheep will become even more imperative to future food security.

7.3 References

Adissu H.A., Estabel J., Sunter D., Tuck E., Hooks Y., Carragher D.M., Clarke K., Karp N.A.,

Project S.M.G., Newbigging S., Jones N., Morikawa L., White J.K. & McKerlie C.

(2014) Histopathology reveals correlative and unique phenotypes in a high-throughput

mouse phenotyping screen. Disease Models & Mechanisms 7, 515.

Agerholm J.S. (2008) Inherited disorders of ruminants: The sheep as a model of disease in

humans. The Veterinary Journal 177, 305-6.

Agerholm J.S., Basse A. & Christensen K. (1993) Investigations on the occurrence of hereditary

diseases in the Danish cattle population 1989-1991. Acta Veterinaria Scandinavica 34,

245-53.

Akiyama M., Sugiyama-Nakagiri Y., Sakai K., McMillan J.R., Goto M., Arita K., Tsuji-Abe Y.,

Tabata N., Matsuoka K., Sasaki R., Sawamura D. & Shimizu H. (2005) Mutations in lipid

transporter ABCA12 in harlequin ichthyosis and functional recovery by corrective gene

transfer. Journal of Clinical Investigation 115, 1777-84.

Angus Australia (2020) Genetic Conditions in Angus. Accessed 29th September 2020. URL:

https://www.angusaustralia.com.au/education/breeding-and-genetics/genetic-conditions-

in-angus/.

Bacchetti P., Deeks S.G. & McCune J.M. (2011) Breaking free of sample size dogma to perform

innovative translational research. Science Translational Medicine 3, 1-4.

Bacchetti P., McCulloch C.E. & Segal M.R. (2008) Simple, defensible sample sizes based on

cost efficiency. Biometrics 64, 577-94.

Bell C.C., Magor G.W., Gillinder K.R. & Perkins A.C. (2014) A high-throughput screening

strategy for detecting CRISPR-Cas9 induced mutations using next-generation

sequencing. BMC Genomics 15, 1-7.

Bell J.S. (2011) Researcher responsibilities and genetic counseling for pure-bred dog

populations. The Veterinary Journal 189, 234-5.

Berg T., Healy P.J., Tollersrud O.K. & Nilssen O. (1997) Molecular heterogeneity for bovine

alpha-mannosidosis: PCR based assays for detection of breed-specific mutations.

Research in Veterinary Science 63, 279-82.

Buermans H.P.J. & den Dunnen J.T. (2014) Next generation sequencing technology: Advances

and applications. Biochimica et Biophysica Acta 1842, 1932–41.

Capitan A., Michot P., Guillaume F., Grohs C., Djari A., Fritz S., Barbey S., Otz P., Bourneuf E.,

Esquerre D., Gallard Y., Klopp C. & Boichard D. (2014) Rapid discovery of mutations

responsible for sporadic dominant genetic defects in livestock using genome sequence

data: enhancing the value of farm animals as model species. In: 10. World Congress of

Genetics Applied to Livestock Production, Vancouver, Canada.

Cavanagh J., Tammen I., Windsor P., Bateman J., Savarirayan R., Nicholas F. & Raadsma H.

(2007) Bulldog dwarfism in Dexter cattle is caused by mutations in ACAN. Mammalian

Genome 18, 808-14.

Charlier C., Coppieters W., Farnir F., Grobet L., Leroy P.L., Michaux C., Mni M., Schwers A.,

Vanmanshoven P., Hanset R. & Georges M. (1995) The mh gene causing double-

muscling in cattle maps to bovine chromosome 2. Mammalian Genome 6, 788-92.

Collins L.M., Asher L., Summers J. & McGreevy P. (2011) Getting priorities straight: Risk

assessment and decision-making in the improvement of inherited disorders in pedigree

dogs. The Veterinary Journal 189, 147-54.

Cundiff L.V. (1970) Experimental results on crossbreeding cattle for beef production. Journal of

Animal Science 30, 694-705.

Curran P.L. (1990) Kerry and Dexter cattle and other ancient Irish breeds : A history. Royal

Dublin Society, Dublin.

Dennis J.A. & Healy P.J. (1999) Definition of the mutation responsible for maple syrup urine

disease in Poll Shorthorns and genotyping Poll Shorthorns and Poll Herefords for maple

syrup urine disease alleles. Research in Veterinary Science 67, 1-6.

Dennis J.A., Healy P.J., Beaudet A.L. & O'Brien W.E. (1989) Molecular definition of bovine

argininosuccinate synthetase deficiency. Proceedings of the National Academy of

Sciences 86, 7947-51.

Dennis J.A., Moran C. & Healy P.J. (2000) The bovine alpha-glucosidase gene: Coding region,

genomic structure, and mutations that cause bovine generalized glycogenosis.

Mammalian Genome 11, 206-12.

Dennis S.M. (1993) Congenital defects of sheep. Veterinary Clinics of North America: Food

Animal Practice 9, 203-17.

Dorling P.R., Huxtable C.R. & Vogel P. (1978) Lysosomal storage in Swainsona spp. toxicosis:

An induced mannosidosis. Neuropathology and Applied Neurobiology 4, 285-95.

Dorrough J., Moll J. & Crosthwaite J. (2007) Can intensification of temperate Australian

livestock production systems save land for native biodiversity? Agriculture, Ecosystems

& Environment 121, 222-32.

Emanuelson U. (1988) Recording of production diseases in cattle and possibilities for genetic

improvements: A review. Livestock Production Science 20, 89-106.

Eriksson S., Jonas E., Rydhmer L. & Röcklinsberg H. (2018) Invited review: Breeding and

ethical perspectives on genetically modified and genome edited cattle. Journal of Dairy

Science 101, 1-17.

Frawley P.T. (2003) Review of rural veterinary services report. Department of Agriculture,

Fisheries and Forestry Australia, Australia, 1-133.

Frugier T., Mitchell N.L., Tammen I., Houweling P.J., Arthur D.G., Kay G.W., van Diggelen

O.P., Jolly R.D. & Palmer D.N. (2008) A new large animal model of CLN5 neuronal

ceroid lipofuscinosis in Borderdale sheep is caused by a nucleotide substitution at a

consensus splice site (c.571+1G>A) leading to excision of exon 3. Neurobiology of

Disease 29, 306-15.

Godber O.F. & Wall R. (2014) Livestock and food security: Vulnerability to population growth

and climate change. Global Change Biology 20, 3092-102.

Groeneveld L.F., Lenstra J.A., Eding H., Toro M.A., Scherf B., Pilling D., Negrini R., Finlay

E.K., Jianlin H., Groeneveld E., Weigend S. & Consortium. T.G. (2010) Genetic

diversity in farm animals – a review. Animal Genetics 41, 6-31.

Gurda B.L. & Vite C.H. (2019) Large animal models contribute to the development of therapies

for central and peripheral nervous system dysfunction in patients with lysosomal storage

diseases. Human Molecular Genetics 28, R119-R131.

Gurgul A., Semik E., Pawlina K., Szmatola T., Jasielezuk I. & Bugno-Poniewierska M. (2014)

The application of genome-wide SNP genotyping methods in studies on livestock

genomes. Journal of Applied Genetics 55, 197-208.

Harper P.A.W., Healy P.J., Dennis J.A., O' Brien J.J. & Rayward D.H. (1986) Citrullinaemia as a

cause of neurological disease in neonatal Friesian calves. Australian Veterinary Journal

63, 378-9.

Hayes B.J., Lewin H.A. & Goddard M.E. (2013) The future of livestock breeding: Genomic

selection for efficiency, reduced emissions intensity, and adaptation. Trends in Genetics

29, 206-14.

Healy P.J. (1996) Testing for undesirable traits in cattle: An Australian perspective. Journal of

Animal Science 74, 917–22.

Healy P.J., Harper P.A.W. & Dennis J.A. (1990) Bovine citrullinaemia: A clinical, pathological,

biochemical and genetic study. Australian Veterinary Journal 67, 255-8.

Hedrick P.W. (2014) Heterozygote advantage: The effect of artificial selection in livestock and

pets. Journal of Heredity 106, 141-54.

Hocking J.D., Jolly R.D. & Batt R.D. (1972) Deficiency of α-mannosidase in Angus cattle. An

inherited lysosomal storage disease. Biochemical Journal 128, 69-78.

Houweling P.J., Cavanagh J.A., Palmer D.N., Frugier T., Mitchell N.L., Windsor P.A., Raadsma

H.W. & Tammen I. (2006) Neuronal ceroid lipofuscinosis in Devon cattle is caused by a

single base duplication (c.662dupG) in the bovine CLN5 gene. Biochimica et Biophysica

Acta - Molecular Basis of Disease 1762, 890-7.

Irish Cattle Breeding Federation (2016) Health and Disease. Accessed 26th October 2020. URL:

https://www.icbf.com/wp/?page_id=2170.

Jolly R.D., Digby J.G. & Rammell C.G. (1974a) A mass screening programme of Angus cattle

for the mannosidosis genotype - a prototype programme for control of inherited diseases

in animals. New Zealand Veterinary Journal 22, 218-22.

Jolly R.D., Thompson K.G. & Tse C.A. (1974b) Evaluation of a screening programme for

identification of mannosidosis heterozygotes in Angus cattle. New Zealand Veterinary

Journal 22, 185-90.

Jolly R.D., Thompson K.G., Tse C.A., Munford R.E. & Merrall M. (1974c) Identification of

mannosidosis heterozygotes — factors affecting normal plasma α-mannosidase levels.

New Zealand Veterinary Journal 22, 155-62.

Keating B.A. & Carberry P.S. (2010) Emerging opportunities and challenges for Australian

broadacre agriculture. Crop & Pasture Science 61, 269-78.

Kerr R.J. & Kinghorn B.P. (1996) An efficient algorithm for segregation analysis in large

populations. Journal of Animal Breeding and Genetics 113, 457-69.

Kinghorn B. & Kinghorn S. (2020) MateSel. Accessed 30th October 2020. URL:

https://www.matesel.com/.

Légifrance. Official Journal of the French Republic. (2020) Order of 12 December 2013 relating

to the registration and certification of the parentage of bovines. In: Article 6 (ed. by

Légifrance). Légifrance, France.

Lusk J.L. & Norwood F.B. (2008) A survey to determine public opinion about the ethics and

governance of farm animal welfare. Journal of the American Veterinary Medical

Association 233, 1121–6.

McDaniel B.T. (2001) Uncontrolled inbreeding. Journal of Dairy Science 84, E185-E6.

McKenzie J.A. & Clarke G.M. (1988) Diazinon resistance, fluctuating asymmetry and fitness in

the Australian sheep blowfly, Lucilia cuprina. Genetics 120, 213-20.

Mclean Z., Oback B. & Laible G. (2020) Embryo-mediated genome editing for accelerated

genetic improvement of livestock. Frontiers of Agricultural Science and Engineering. 7,

148-60.

National Bovine Anomalies Observatory INRAE. (2020) Report an anomaly, instructions for

use. Accessed 19th October 2020. URL: https://www.onab.fr/Declarer-une-anomalie.

Nielsen H.M., Amer P.R. & Byrne T.J. (2014) Approaches to formulating practical breeding

objectives for animal production systems. Acta Agriculturae Scandinavica, Section A —

Animal Science 64, 2-12.

Nocella G., Hubbard L. & Scarpa R. (2010) Farm animal welfare, consumer willingness to pay,

and trust: results of a cross-national survey. Applied Economic Perspectives and Policy

32, 275-97.

Notter D.R. (1999) The importance of genetic diversity in livestock populations of the future.

Journal of Animal Science 77, 61-9.

O’Rourke B.A., Kelly J., Spiers Z.B., Shearer P.L., Porter N.S., Parma P. & Longeri M. (2017)

Ichthyosis fetalis in Polled Hereford and Shorthorn calves. Journal of Veterinary

Diagnostic Investigation 29, 874-6.

Online Mendelian Inheritance in Animals (2020) Sydney School of Veterinary Science,

University of Sydney, Sydney. Accessed 31st August 2020. URL: https://omia.org/.

Otsu K., Khanna V.K., Archibald A.L. & Maclennan D.H. (1991) Cosegregation of porcine

malignant hyperthermia and a probable causal mutation in the skeletal muscle ryanodine

receptor gene in backcross families. Genomics 11, 744-50.

Pierce K.D., Handford C.A., Morris R., Vafa B., Dennis J.A., Healy P.J. & Schofield P.R. (2001)

A nonsense mutation in the α1 subunit of the inhibitory glycine receptor associated with

bovine myoclonus. Molecular and Cellular Neuroscience 17, 354-63.

Pinnapureddy A.R., Stayner C., McEwan J., Baddeley O., Forman J. & Eccles M.R. (2015)

Large animal models of rare genetic disorders: Sheep as phenotypically relevant models

of human genetic disease. Orphanet Journal of Rare Diseases 10, 1-8.

Pryce J.E., Hayes B.J. & Goddard M.E. (2012) Novel strategies to minimize progeny inbreeding

while maximizing genetic gain using genomic information. Journal of Dairy Science 95,

377-88.

Rasmussen S.A., Lammer E.J., Shaw G.M., Finnell R.H., McGehee R.E., Gallagher M., Romitti

P.A. & Murray J.C. (2002) Integration of DNA sample collection into a multi‐site birth

defects case‐control study. Teratology 66, 177-84.

Shariflou M.R., Wade C.M., Kijas J., McCulloch R., Windsor P.A., Tammen I. & Nicholas F.W.

(2012) Brachygnathia, cardiomegaly and renal hypoplasia syndrome (BCRHS) in Merino

sheep maps to a 1.1-megabase region on ovine chromosome OAR2. Animal Genetics 44,

231-3.

Shariflou M.R., Wade C.M., Windsor P.A., Tammen I., James J.W. & Nicholas F.W. (2011)

Lethal genetic disorder in Poll Merino/Merino sheep in Australia. Australian Veterinary

Journal 89, 254-9.

Shen T., Lee A., Shen C. & Lin C.J. (2015) The long tail and rare disease research: The impact

of next-generation sequencing for rare Mendelian disorders. Genetics Research 97, 1-14.

Smith L.A., Cassell B.G. & Pearson R.E. (1998) The effects of inbreeding on the lifetime

performance of dairy cattle. Journal of Dairy Science 81, 2729-37.

Sørensen M.K., Norberg E., Pedersen J. & Christensen L.G. (2008) Invited Review:

Crossbreeding in dairy cattle: A Danish perspective. Journal of Dairy Science 91, 4116-

28.

Tammen I., Cook R.W., Nicholas F.W. & Raadsma H.W. (2001) Neuronal ceroid lipofuscinosis

in Australian Merino sheep: A new animal model. European Journal of Paediatric

Neurology 5, 37-41.

Tammen I., Houweling P.J., Frugier T., Mitchell N.L., Kay G.W., Cavanagh J.A.L., Cook R.W.,

Raadsma H.W. & Palmer D.N. (2006) A missense mutation (c.184C>T) in ovine CLN6

causes neuronal ceroid lipofuscinosis in Merino sheep whereas affected South Hampshire

sheep have reduced levels of CLN6 mRNA. Biochimica et Biophysica Acta - Molecular

Basis of Disease 1762, 898-905.

Teseling C. & Parnell P. (2011) The effective management of deleterious genetic conditions of

cattle. In: Proceedings of Association for the Advancement of Animal Breeding and

Genetics, 131-4.

Teseling C. & Parnell P. (2013) How Angus breeders have reduced the frequency of deleterious

recessive genetic conditions. In: Proceedings of Association for the Advancement of Animal

Breeding and Genetics. AAABG 20, 558-61.

Tollersrud O.K., Berg T., Healy P., Evjen G., Ramachandran U. & Nilssen O. (1997)

Purification of bovine lysosomal alpha-mannosidase, characterization of its gene and

determination of two mutations that cause alpha-mannosidosis. European Journal of

Biochemistry 246, 410-9.

Vanier M.T. & Millat G. (2003) Niemann–Pick disease type C. Clinical Genetics 64, 269-81.

Varga O.E., Hansen A.K., Sandøe P. & Olsson I.A.S. (2010) Validating animal models for

preclinical research: A scientific and ethical discussion. Alternatives to Laboratory

Animals 38, 245-8.

Windsor P. (2010) Brachygnathia, cardiomegaly and renal hypoplasia syndrome (BCRHS) in

Poll Merino lambs. Flock and Herd. URL:

http://www.flockandherd.net.au/sheep/reader/bcrhs.html

Young A.E., Mansour T.A., McNabb B.R., Owen J.R., Trott J.F., Brown C.T. & Van

Eenennaam A.L. (2020) Genomic and phenotypic analyses of six offspring of a genome-

edited hornless bull. Nature Biotechnology 38, 225-32.

Zhang B., Healy P.J., Zhao Y., Crabb D.W. & Harris R.A. (1990) Premature translation

termination of the pre-E1 alpha subunit of the branched chain alpha-ketoacid

dehydrogenase as a cause of maple syrup urine disease in Polled Hereford calves. Journal

of Biological Chemistry 265, 2425-7.
