Iowa State University Capstones, Theses and Dissertations
Graduate Theses and Dissertations

2012

Gene mapping of monogenic disorders and complex diseases via genome wide association studies

Xia Zhao
Iowa State University


Recommended Citation Zhao, Xia, "Gene mapping of monogenic disorders and complex diseases via genome wide association studies" (2012). Graduate Theses and Dissertations. 12878. https://lib.dr.iastate.edu/etd/12878

Gene mapping of monogenic disorders and complex diseases via genome wide association studies

by

Xia Zhao

A dissertation submitted to the graduate faculty

in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Major: Genetics

Program of Study Committee: Max Rothschild, Co-major Professor Dorian Garrick, Co-major Professor Susan Lamont N. Matthew Ellinwood Peng Liu

Iowa State University

Ames, Iowa

2012

Copyright © Xia Zhao, 2012. All rights reserved.

TABLE OF CONTENTS

LIST OF FIGURES  iv
LIST OF TABLES  vi
ABSTRACT  vii
CHAPTER 1. GENERAL INTRODUCTION  1
  INTRODUCTION  1
  RESEARCH OBJECTIVES  2
  DISSERTATION ORGANIZATION  2
  LITERATURE REVIEW  4
  REFERENCES  27
CHAPTER 2. A NOVEL NONSENSE MUTATION IN THE DMP1 GENE IDENTIFIED BY A GENOME-WIDE ASSOCIATION STUDY IS RESPONSIBLE FOR INHERITED RICKETS IN CORRIEDALE SHEEP  38
  ABSTRACT  38
  INTRODUCTION  39
  RESULTS  41
  DISCUSSION  44
  MATERIALS AND METHODS  47
  REFERENCES  51
CHAPTER 3. A MISSENSE MUTATION IN AGTPBP1 WAS IDENTIFIED IN SHEEP WITH A LOWER MOTOR NEURON DISEASE  61
  ABSTRACT  61
  INTRODUCTION  62
  METHODS  64
  RESULTS  68
  DISCUSSION  72
  ACKNOWLEDGEMENTS  76
  REFERENCES  76
CHAPTER 4. IN A SHAKE OF A LAMB’S TAIL: USING GENOMICS TO UNRAVEL A CAUSE OF CHONDRODYSPLASIA IN TEXEL SHEEP  87
  SUMMARY  87
  INTRODUCTION  88
  MATERIALS AND METHODS  91
  RESULTS  95
  DISCUSSION  99
  REFERENCES  105
CHAPTER 5. BAYESIAN INFERENCE IDENTIFIES A CANDIDATE REGION ASSOCIATED WITH CANINE CRYPTORCHIDISM THAT INCLUDES THE AMHR2 GENE  120
  ABSTRACT  120
  INTRODUCTION  121
  METHODS  125
  RESULTS  129
  DISCUSSION  133
  ACKNOWLEDGEMENTS  138
  REFERENCES  138
CHAPTER 6. GENERAL CONCLUSIONS  157
  FUTURE RESEARCH  161
  CONCLUSIONS  162
ACKNOWLEDGEMENTS  164

LIST OF FIGURES

CHAPTER 1. GENERAL INTRODUCTION  1
  Figure 1. Structures involved in testicular descent  20
CHAPTER 2. A NOVEL NONSENSE MUTATION IN THE DMP1 GENE IDENTIFIED BY A GENOME-WIDE ASSOCIATION STUDY IS RESPONSIBLE FOR INHERITED RICKETS IN CORRIEDALE SHEEP  38
  Figure 1. Corriedale sheep with rickets  54
  Figure 2. Work flow chart for discovering the mutation of inherited rickets in Corriedale sheep  55
  Figure 3. Mutation analyses  56
  Figure S1. Complete predicted DMP1 sequence alignments among rickets affected sheep, wild type sheep and cow  57
CHAPTER 3. A MISSENSE MUTATION IN AGTPBP1 WAS IDENTIFIED IN SHEEP WITH A LOWER MOTOR NEURON DISEASE  61
  Figure 1. The distribution of identical homozygous fragments in affected lambs  79
  Figure 2. Work flow chart for discovering the mutation locus of lower motor neuron disease in Romney sheep  80
  Figure 3. The conserved domain structures of partial protein sequence of AGTPBP1  81
  Figure S1. Conserved amino acids predicted by “CD-search”  84
  Figure S2. Characteristics of protein AGTPBP1  85
  Figure S3. Secondary structure modeling of the AGTPBP1 protein indicates critical residues  86
CHAPTER 4. IN A SHAKE OF A LAMB’S TAIL: USING GENOMICS TO UNRAVEL A CAUSE OF CHONDRODYSPLASIA IN TEXEL SHEEP  87
  Figure 1. Chondrodysplasia of Texel sheep  109
  Figure 2. Work flow chart for discovering the mutation locus of chondrodysplasia in Texel sheep  110
  Figure 3. The topology prediction of transmembrane domains (TMDs) of ovine SLC13A1 protein  111
  Figure 4. The direct sequencing results of the causative mutation (g.25513delT)  112
  Figure S1. The pedigree information for the 23 sheep genotyped by ovine SNP chip  113
  Figure S2. The distribution of homozygous fragments commonly in affected lambs  114
  Figure S3. The CNV region display  115
  Figure S4. The comparison of the SLC13A1 protein sequences among 7 species  116
CHAPTER 5. BAYESIAN INFERENCE IDENTIFIES A CANDIDATE REGION ASSOCIATED WITH CANINE CRYPTORCHIDISM THAT INCLUDES THE AMHR2 GENE  120
  Figure 1. Re-grouping dogs based on the value of cluster membership estimated from STRUCTURE  144
  Figure 2. Whole genome association analyses for subgroup 1 comprising 129 mostly USA sourced (60 cases and 69 controls) Siberian Huskies  145
  Figure S1. Whole genome association analyses for subgroup 2 comprising 41 mixed sourced (26 cases and 15 controls) Siberian Huskies  146
  Figure S2. Whole genome association analyses for subgroup 3 comprising 35 mostly UK sourced (20 cases and 15 controls) Siberian Huskies  147
  Figure S3. Whole genome association analyses for a pooled sample of 204 Siberian Huskies in which 105 were diagnosed with cryptorchidism  148

LIST OF TABLES

CHAPTER 2. A NOVEL NONSENSE MUTATION IN THE DMP1 GENE IDENTIFIED BY A GENOME-WIDE ASSOCIATION STUDY IS RESPONSIBLE FOR INHERITED RICKETS IN CORRIEDALE SHEEP  38
  Table 1. The genotyping results of the causative mutation (R145X)  54
  Table S1. Primer sequence, annealing temperature, PCR amplicon information and genetic variants identified by sequencing  58
  Table S2. The list of ovine positional candidate genes based on the bovine reference gene sequences  59
CHAPTER 3. A MISSENSE MUTATION IN AGTPBP1 WAS IDENTIFIED IN SHEEP WITH A LOWER MOTOR NEURON DISEASE  61
  Table 1. The genotyping results of three exonic SNPs  79
  Table S1. Primer sequences, annealing temperatures, PCR amplicon information and targeted encoding regions of gene AGTPBP1  82
  Table S2. The list of ovine positional candidate genes based on the bovine reference gene sequences  83
CHAPTER 4. IN A SHAKE OF A LAMB’S TAIL: USING GENOMICS TO UNRAVEL A CAUSE OF CHONDRODYSPLASIA IN TEXEL SHEEP  87
  Table 1. The genotyping results of the causative mutation (g.25513del)  108
  Table S1. A list of candidate gene symbol, primer sequence, annealing temperature, PCR amplicon information, genetic variants identified by sequencing and accessed template sequence  117
CHAPTER 5. BAYESIAN INFERENCE IDENTIFIES A CANDIDATE REGION ASSOCIATED WITH CANINE CRYPTORCHIDISM THAT INCLUDES THE AMHR2 GENE  120
  Table 1. The top 1 Mb windows with the high percentage of genetic variance obtained from BayesB analyses in different Siberian Husky subgroups  149
  Table 2. The results of BayesB analyses on 17 previously reported genes being associated with cryptorchidism  150
  Table S1. Re-grouping dogs based on the value of cluster membership (k=3)  151
  Table S2. A list of genes in the putative regions for subgroup 1  152
  Table S3. A list of genes in the putative regions for subgroup 2 and subgroup 3  155
  Table S4. A list of genes in the putative regions for all Siberian Huskies (including subgroups 1, 2 and 3)  156

ABSTRACT

Inherited rickets, lower motor neuron disease and chondrodysplasia are recently reported genetic disorders that were identified on sheep farms in New Zealand. Animals with rickets showed softening and weakening of bone, and were characterized by decreased growth rate, thoracic lordosis and angular limb deformities. Sheep diagnosed with lower motor neuron disease were characterized by poor muscle tone and progressive weakness from about one week of age, leading to severe tetraparesis, recumbency and muscle atrophy. Chondrodysplasia in Texel sheep was characterized by dwarfism and angular deformities of the forelimbs. Backcross breeding trials indicated that these three disorders are likely simple autosomal recessive traits. One complex disease, cryptorchidism, was found with high prevalence in Siberian Husky dogs. This disease is a congenital disorder characterized by failure of one or both testes to descend into the scrotum. In order to map gene loci and thus the mutations involved, genome wide association studies were conducted using the Illumina OvineSNP50 BeadChip for the three monogenic disorders and the Illumina CanineHD BeadChip for cryptorchidism. Several mapping strategies were utilized, including homozygosity mapping, Bayesian inference and case-control analysis, and these were followed by fine mapping with a candidate gene approach.

The results revealed a nonsense mutation 250C > T in exon 6 of the dentin matrix protein 1 gene (DMP1) as the putative cause of inherited rickets in Corriedale sheep. The missense mutation c.2909G > C on exon 21 of the ATP/GTP-binding protein 1 gene (AGTPBP1) was identified as likely being responsible for lower motor neuron disease in Romney sheep. A 1-bp deletion (g.25513delT) on exon 3 in the solute carrier family 13, member 1 gene (SLC13A1) was discovered for chondrodysplasia seen in Texel sheep. The 100% concordance between genotypes at these mutations and the recessive pattern of inheritance in the sheep populations studied indicates a causative relationship between the mutations and their respective monogenic disorders. Gene knockout mice and studies of similar diseases in humans provide more evidence to support this conclusion. In the cryptorchidism study, the anti-mullerian hormone type II receptor gene (AMHR2) was found to be located in a 1 Mb window on dog chromosome CFA27 that was associated with cryptorchidism using a BayesB approach in a subgroup of Siberian Husky dogs. Previous studies support AMHR2 as a very promising functional candidate gene for the development of cryptorchidism. In conclusion, we have successfully identified putative causative mutations responsible for the three monogenic disorders in sheep, and also found a promising candidate gene in a putative region associated with cryptorchidism in dogs. Our findings could shed light on further investigations of the role of the genes involved, and the affected animals could be used as animal models for human diseases with similar mechanisms.

CHAPTER 1. GENERAL INTRODUCTION

INTRODUCTION

Genetic disorders in humans and animals include illnesses caused by abnormalities in genes or chromosomes, those abnormalities typically existing at birth. Disorders can be passed down from the parents’ genes, or caused by de novo mutations at DNA loci. Disorders caused by a single mutated gene are called single-gene or monogenic defects. Over 4,000 human diseases are of this kind (Wikipedia, http://en.wikipedia.org/wiki/Genetic_disorder).

Single gene disorders are relatively easy to study, especially when a supportive pedigree exists. In this dissertation, we describe the detection of causative mutations involved in three sporadically occurring heritable disorders in sheep: inherited rickets, lower motor neuron disease and chondrodysplasia. These defects all show an inheritance pattern that reflects a simple single-gene autosomal recessive basis. Homozygosity mapping was applied to locate the genomic regions, and then the positional candidate gene approach was carried out to search for the causative mutation responsible for each of these disorders.

Genetic disorders might not be monogenic, but can be caused by complex, multifactorial or so-called polygenic effects. The effects of multiple genes and environmental factors contribute to the development of complex diseases. Compared to monogenic disorders, the genetic basis underlying complex disorders is not easily deciphered, and different strategies are used in the genetic analyses of this kind of disease. Here we performed a whole genome scan for a complex disorder called cryptorchidism in Siberian Husky dogs. Cryptorchidism is an abnormality whereby one or both testes fail to descend into the scrotum. Undescended testes can reduce fertility and increase the risk of testicular tumors. For certain purebred breeds of dogs, the incidence of cryptorchidism is as high as 14% of the population. Dogs with cryptorchidism do not qualify with their breed organizations for breeding purposes or for most dog competitions or dog shows in the United States. It is essential to remove affected dogs and prevent carriers from contributing to the next generation of a dog breed. In this dissertation, we integrate information from a genome-wide association study with biological functions and comparative genomics to predict possible genomic regions and candidate genes that might be the cause of cryptorchidism.

RESEARCH OBJECTIVES

In this dissertation, the first goal is to explore the genetic basis for each of the disorders above. The causative mutations can be used as genetic markers for breeding purposes in order to identify carriers and avoid at-risk matings, to improve animal welfare and to decrease economic losses. Second, due to their body size and relatively low cost of maintenance, sheep with rickets, lower motor neuron disease or chondrodysplasia could be used as animal models for studies of similar disorders in humans. Third, the results from the dog cryptorchidism study may provide valuable information for human research on cryptorchidism.

DISSERTATION ORGANIZATION

This dissertation begins with a literature review (Chapter 1), followed by four manuscripts published in peer-reviewed journals or prepared for submission. Chapter 2 is a manuscript that explores the causative mutation responsible for inherited rickets in Corriedale sheep. Dorian Garrick designed this experiment and helped to uncover the identical by descent regions. The fine mapping was conducted and the manuscript was written by Xia Zhao under the direction of Dorian Garrick and Max Rothschild. Keren Dittmer suggested good candidate genes and contributed substantially to manuscript preparation. This work has been published in PLoS One 2011;6:e21739.

Chapter 3 is a manuscript that examines the genetic basis of a lower motor neuron disease that occurred in an extended family of Romney lambs in New Zealand. Dorian Garrick designed this experiment and helped to uncover the identical by descent regions. The search for priority candidate genes in the homozygous regions, mutation discovery and mutation analyses were performed by Xia Zhao. Suneel Onteru contributed initial sequencing during the screening of positional candidate genes. The manuscript was written by Xia Zhao under the supervision of Dorian Garrick and Max Rothschild. This work has been published in Heredity 2012;109:156-62.

Chapter 4 is a manuscript that unravels the cause of chondrodysplasia in Texel sheep by a whole genome scan. The identical by descent region was located with the help of Dorian Garrick. Fine mapping of positional candidate genes was carried out by Xia Zhao and Suneel Onteru. Eventually, the putative causal mutation was identified, and the manuscript was written by Xia Zhao under the supervision of Dorian Garrick and Max Rothschild. This work has been published in a special issue of Animal Genetics 2012;43 Suppl 1:9-18.

Chapter 5 is a manuscript that examines the associated genomic regions and candidate genes for cryptorchidism in Siberian Huskies using a whole genome association study built on a foundation of Bayesian inference. This research was conducted by Xia Zhao under the supervision of Max Rothschild and Dorian Garrick. Max Rothschild helped to design this experiment. Suneel Onteru and Mahdi Saatchi conducted the BayesB analyses. The manuscript was written by Xia Zhao under the supervision of Max Rothschild and Dorian Garrick.

Chapter 6 describes the conclusions drawn from Chapters 2, 3, 4 and 5.

LITERATURE REVIEW

1. Current Trends in Characterizing the Underlying Basis of Genetic Disorders

The completed whole genome sequences of humans and many animal species, and the subsequent generation of widespread markers of genetic variation, have dramatically improved the ability of investigators to map gene loci associated with monogenic or polygenic diseases. Monogenic disorders are rare at a population level and result from a single mutated gene. These disorders typically follow Mendelian patterns of transmission (autosomal dominant, autosomal recessive, X-linked dominant, X-linked recessive) or non-Mendelian patterns of transmission (Y-linked and mitochondrial or maternally inherited). As a result, monogenic disorders are easily studied when supportive pedigrees with many affected members are available. The database OMIM (Online Mendelian Inheritance In Man, http://www.ncbi.nlm.nih.gov/Omim/mimstats.html) reported, as of September 21, 2012, a total of 3,571 human phenotypes with a known molecular basis. The database OMIA (Online Mendelian Inheritance in Animals, http://omia.angis.org.au/home/) also provides a list of 473 Mendelian traits with a known important mutation in animals, and 1,218 potential animal models for human diseases. The two databases overlap where similar disease phenotypes have been characterized, and this helps to establish the basis for comparative genomics to facilitate further disease studies in humans or animal species.

More recently, researchers have shifted their focus from rare monogenic to more common polygenic disorders. Unlike monogenic disorders, polygenic disorders do not usually follow a Mendelian inheritance pattern, and appear to be caused by an unknown number of genes. Only a small fraction of affected individuals in some complex diseases, such as cardiovascular defects, diabetes and several cancers, have been found to result from a single mutant gene transmitted with Mendelian inheritance (Scheuner et al., 2004). The effects of multiple genes and environmental factors might contribute together to the development of these kinds of diseases. The genetic mechanisms of many complex diseases are still unclear. Studying the genetic basis of complex disorders is more difficult than studying monogenic diseases. Genome wide association studies (GWAS) represent a major tool for unraveling complex traits (Stranger et al., 2011). Linkage mapping can also be used in studies of complex multifactorial disease. Additionally, studies of monogenic diseases will contribute greatly to knowledge of polygenic forms of human diseases (Antonarakis and Beckmann, 2006). The following section describes the basic principles of several mapping methods and their applications for uncovering mutations responsible for monogenic disorders and complex diseases.

2. Gene Mapping Strategy

Understanding the genetic basis of human diseases was extremely challenging for biomedical researchers before the 20th century. Great improvement has come with the development of novel genetic tools and technologies, such as single nucleotide polymorphism (SNP) chips, microarrays and next generation sequencing (NGS). Several gene mapping strategies have been applied broadly in studies of human and animal diseases, including linkage mapping, GWAS, homozygosity mapping, and whole genome sequencing.

2.1 Linkage Mapping

Linkage mapping was widely applied as a traditional mapping strategy for qualitative and quantitative trait loci (QTL) mapping over the last few decades. This mapping strategy is based on the assumption that linked genes are on the same chromosome, whereas the chance of two gene loci located on different chromosomes, or more than 50 centimorgans (cM) apart, being co-inherited in one meiotic event is 0.5. According to this principle, detecting markers that are inherited together with a given phenotype with a probability much larger than 0.5 will help discover phenotype related genes or QTL regions. Linkage mapping is usually conducted in families that can provide informative meioses for linkage detection. A meiosis is informative for linkage when one can identify whether or not the resulting gamete is a recombinant in a particular region.
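The co-inheritance principle above is commonly quantified with a LOD score, which compares the likelihood of the observed meioses at a recombination fraction theta against the likelihood under free recombination (theta = 0.5). The following is a minimal sketch, assuming phase-known informative meioses that can each be scored as recombinant or non-recombinant; the function names and the example counts are illustrative assumptions and are not taken from the studies reviewed here.

```python
import math


def lod_score(recombinants: int, nonrecombinants: int, theta: float) -> float:
    """Return log10[L(theta) / L(0.5)] for phase-known informative meioses."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    n = recombinants + nonrecombinants
    if n == 0:
        raise ValueError("at least one informative meiosis is required")
    if theta == 0.0 and recombinants > 0:
        # Complete linkage is impossible once a recombinant has been observed.
        return float("-inf")
    log_l_theta = 0.0
    if recombinants:
        log_l_theta += recombinants * math.log10(theta)
    if nonrecombinants:
        log_l_theta += nonrecombinants * math.log10(1.0 - theta)
    # Under free recombination every meiosis has probability 0.5.
    return log_l_theta - n * math.log10(0.5)


def max_lod(recombinants: int, nonrecombinants: int) -> tuple:
    """Return (theta_hat, LOD at theta_hat), with theta_hat capped at 0.5."""
    n = recombinants + nonrecombinants
    if n == 0:
        raise ValueError("at least one informative meiosis is required")
    theta_hat = min(recombinants / n, 0.5)
    return theta_hat, lod_score(recombinants, nonrecombinants, theta_hat)


if __name__ == "__main__":
    # Hypothetical family data: 2 recombinants among 20 informative meioses.
    theta_hat, lod = max_lod(recombinants=2, nonrecombinants=18)
    print(f"theta_hat = {theta_hat:.2f}, LOD = {lod:.2f}")
```

A LOD score of 3 or more is the conventional threshold for declaring linkage; in this hypothetical example the marker and the trait locus would be co-inherited far more often than the 0.5 expected for unlinked loci.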

Different types of markers have been used for linkage map construction. Restriction fragment length polymorphisms (RFLPs) are differences between samples of homologous DNA molecules that come from differing locations of restriction enzyme sites. The differing locations of restriction enzyme sites could be caused by nucleotide substitutions, rearrangements, insertions, or deletions (Chang et al., 1988). Amplified fragment length polymorphisms (AFLPs) are improved markers produced by ligating adaptors to the ends of digested restriction fragments, which are then amplified by polymerase chain reaction (PCR) (Tan et al., 2001). These two types of markers (RFLPs and AFLPs) are now used less often in practice for linkage mapping than simple sequence repeats (SSR or microsatellites) and SNPs (Ball et al., 2010). Microsatellites or SSR are repeating sequences of 2-6 base pairs of DNA, with advantages for linkage mapping such as being PCR-based, highly reproducible and polymorphic, generally co-dominant and abundant in animal and plant genomes (Miao et al., 2005). Single nucleotide polymorphisms are single nucleotide variations in a DNA sequence. A high-resolution linkage map built from high density SNP markers facilitates fine mapping of QTLs and enables the identification of genomic regions associated with high recombination rates (Groenen et al., 2009). More recently, a novel genetic marker called restriction-site associated DNA (RAD) was developed. This kind of marker is a short fragment of DNA adjacent to each side of a particular restriction enzyme recognition site (Miller et al., 2007). With next-generation sequencing as the genotyping platform, RAD markers have been rapidly adopted for linkage map construction in many species, since they yield a high-density genetic map and simultaneously provide sufficient sequence information for mutation analysis (Baird et al., 2008; Wang et al., 2012).

Linkage mapping at a genome wide scale using commercial SNP chips has been widely applied to many traits of interest. The density of SNP chips, the haplotype frequencies in the population and the amount of linkage disequilibrium (LD) in the genome will strongly affect the ability to detect genomic regions harboring the real causative mutations (Teo et al., 2010). Alleles that are in non-random association are said to be in LD. Linkage disequilibrium represents the chance of co-inheritance of alleles at different loci. It can be the result of physical linkage of genes or due to selection, mutation, migration, crossing or genetic drift. Owing to their longer LD structure and artificial selection, livestock are preferred study populations for linkage mapping of many economically important traits such as reproduction traits, disease resistance, behavior traits, lactation (milk production) and meat quality (Georges, 2007; Rosendo et al., 2012; Heifetz et al., 2009; Glenske et al., 2011; Marques et al., 2011; Nones et al., 2012). On the other hand, mapping gene loci using linkage data is extremely inefficient in humans, because there are typically too few progeny to calculate reliable map distances, traits are difficult to follow for many generations, and the distances between known genes are large compared to those in livestock populations (Griffiths et al., 2000).
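Non-random association between alleles at two loci is usually summarized by the disequilibrium coefficient D, its normalized form D', and the squared correlation r². The sketch below computes these from phased haplotype counts at two biallelic loci; the function name, the argument names and the example counts are illustrative assumptions, and phased haplotypes are assumed to be available.

```python
import math


def ld_stats(n_AB: int, n_Ab: int, n_aB: int, n_ab: int) -> tuple:
    """Return (D, D', r^2) from counts of the four two-locus haplotypes."""
    n = n_AB + n_Ab + n_aB + n_ab
    if n == 0:
        raise ValueError("haplotype counts must not all be zero")
    p_AB = n_AB / n
    p_A = (n_AB + n_Ab) / n  # frequency of allele A at locus 1
    p_B = (n_AB + n_aB) / n  # frequency of allele B at locus 2
    # D is the excess of the AB haplotype over its expectation under independence.
    d = p_AB - p_A * p_B
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max if d_max > 0 else math.nan
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r2 = d * d / denom if denom > 0 else math.nan
    return d, d_prime, r2


if __name__ == "__main__":
    # Hypothetical counts: only AB and ab haplotypes are observed (complete LD).
    print(ld_stats(50, 0, 0, 50))
    # Hypothetical counts: all four haplotypes equally frequent (no LD).
    print(ld_stats(25, 25, 25, 25))
```

A marker in strong LD with a causative mutation (r² close to 1) can stand in for it on a SNP chip, which is why chip density and the extent of LD together determine mapping power.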

2.2 Homozygosity Mapping

Homozygosity mapping was first proposed in 1987 (Lander et al., 1987) as a powerful method of mapping genes underlying recessive traits in humans. The principle of this approach is based on the assumption that the homozygous genotype at a recessive disease mutation is “identical by descent (IBD)”, because the affected allele is transmitted to the affected child from a common ancestor through both parents. The adjacent chromosomal region surrounding the mutated locus is likely to be homozygous by descent in closely related offspring, and this short segment of homozygosity can be detected in the affected group but not in carriers. This mapping strategy has been used successfully in both human and animal recessive diseases (Lander et al., 1987; Drielsma et al., 2012; Charlier et al., 2008). Traditional homozygosity mapping requires suitable consanguineous families with multiple affected offspring. However, the limited availability of suitable families has hampered the discovery of genes for many diseases. Moreover, family members that are too closely related share large homozygous segments that contain too many candidate genes, which hinders efficient identification of causative genes. Applying homozygosity mapping to single affected individuals from outbred populations may aid in the discovery of short candidate chromosome segments and accelerate the identification of novel disease genes (Hildebrandt et al., 2009).
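The core of homozygosity mapping can be sketched as a scan for runs of consecutive SNPs at which every affected animal carries the same homozygous genotype, followed by exclusion of runs that an obligate carrier also carries in the same homozygous state. The sketch below assumes genotypes coded as 0, 1 or 2 copies of one allele and ordered by map position; the function names, the `min_snps` threshold and the toy genotypes are illustrative assumptions, not the procedure used in the cited studies.

```python
from typing import List, Sequence, Tuple


def shared_homozygous_runs(affected: Sequence[Sequence[int]],
                           min_snps: int = 3) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs of runs where all affected share one homozygous call."""
    n_snps = len(affected[0])
    runs = []
    start = None
    for i in range(n_snps):
        calls = {individual[i] for individual in affected}
        # Shared only if everyone has the same call and that call is homozygous.
        shared = len(calls) == 1 and calls <= {0, 2}
        if shared and start is None:
            start = i
        elif not shared and start is not None:
            if i - start >= min_snps:
                runs.append((start, i - 1))
            start = None
    if start is not None and n_snps - start >= min_snps:
        runs.append((start, n_snps - 1))
    return runs


def exclude_runs_seen_in_carriers(runs: Sequence[Tuple[int, int]],
                                  affected: Sequence[Sequence[int]],
                                  carriers: Sequence[Sequence[int]]) -> List[Tuple[int, int]]:
    """Drop runs that any carrier shares in the same homozygous state."""
    kept = []
    for start, end in runs:
        reference = list(affected[0][start:end + 1])
        if not any(list(c[start:end + 1]) == reference for c in carriers):
            kept.append((start, end))
    return kept


if __name__ == "__main__":
    # Toy genotypes for two affected lambs and one obligate carrier (7 ordered SNPs).
    affected = [[0, 1, 2, 2, 0, 2, 1],
                [2, 1, 2, 2, 0, 2, 0]]
    carriers = [[0, 1, 2, 1, 0, 2, 1]]
    runs = shared_homozygous_runs(affected, min_snps=3)
    print(runs)
    print(exclude_runs_seen_in_carriers(runs, affected, carriers))
```

In practice the surviving runs would be converted to physical positions and searched for positional candidate genes.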

For many years, microsatellite markers were used in homozygosity mapping. With the development of high throughput genotyping platforms such as array-based genotyping, this mapping approach is now mainly based on data generated by commercially available chips with hundreds of thousands of SNPs. The decreasing cost of NGS has driven researchers to try homozygosity mapping with sequencing data, for example exome sequencing data in a study of a cohort of Crohn’s disease cases. A new algorithm was implemented in this study that detects IBD from exome sequencing data by finding small slices of matching alleles between pairs of individuals and then extending them into full IBD segments. However, the accuracy of this method in assessing the causative regions is low (Zhuang et al., 2012).

There are several freely available online computational tools for homozygosity mapping to facilitate the analysis of large datasets for gene discovery, for example HomozygosityMapper (Seelow et al., 2009, http://www.homozygositymapper.org/), IBDFinder (Carr et al., 2009, http://www.insilicase.co.uk/Desktop/Ibdfinder.aspx), and KinSNP (Amir et al., 2010, http://bioinfo.bgu.ac.il/bsu/software/KinSNP/).

2.3 Genome Wide Association Study

Genome wide association studies have recently become a powerful tool for investigating the genetic basis of common diseases such as Crohn’s disease (Barrett et al., 2008), type 2 diabetes (Zeggini et al., 2008) and obesity (Wang et al., 2011). This approach searches for associations between DNA sequence variants (SNPs) across the whole genome and phenotypes of interest. In an attempt to pinpoint genes that contribute to the etiology of a disease, scientists use GWAS to determine whether one variant of an SNP is statistically more common in individuals with a given phenotype than in those without. This mapping strategy became available for human diseases after the completion of the Human Genome Project in 2003 and the HapMap in 2005. The map of genetic variants and a series of new technologies assist the accurate analysis of samples for genetic variations contributing to the onset of a disease. However, the associated variants themselves may not be the causative mutations, but may just be in LD with the actual causal genetic variants. A genome wide association analysis is usually followed by fine mapping with a larger number of recombinant individuals to narrow the region of interest, and may include targeted sequencing of genome libraries prepared for the candidate genomic regions. Genome wide association studies can be confounded by population stratification caused by systematic ancestry differences between cases and controls (Price et al., 2010). Population stratification will drive spurious associations when allele frequencies differ between subpopulations (Campbell et al., 2005). Therefore, correcting for stratification is critical in performing GWAS.
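The single-SNP comparison described above can be illustrated with a basic allelic case-control test: allele counts in cases and controls form a 2 x 2 table that is tested with a Pearson chi-square statistic on one degree of freedom. This is a minimal sketch using only the Python standard library; the function name and the example counts are illustrative assumptions, no continuity correction or multiple-testing adjustment is applied, and it is not the BayesB approach used later in this dissertation.

```python
import math


def allelic_association(case_a: int, case_b: int,
                        control_a: int, control_b: int) -> tuple:
    """Return (chi-square, p-value, odds ratio) for a 2x2 table of allele counts."""
    n = case_a + case_b + control_a + control_b
    row_case, row_control = case_a + case_b, control_a + control_b
    col_a, col_b = case_a + control_a, case_b + control_b
    if min(row_case, row_control, col_a, col_b) == 0:
        raise ValueError("every row and column total must be positive")
    # Pearson chi-square for a 2x2 table, without continuity correction.
    cross = case_a * control_b - case_b * control_a
    chi2 = n * cross ** 2 / (row_case * row_control * col_a * col_b)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    denominator = case_b * control_a
    odds_ratio = (case_a * control_b) / denominator if denominator else math.inf
    return chi2, p_value, odds_ratio


if __name__ == "__main__":
    # Hypothetical counts of allele A and allele B in cases and controls.
    chi2, p, odds = allelic_association(30, 10, 10, 30)
    print(f"chi2 = {chi2:.2f}, p = {p:.2e}, OR = {odds:.1f}")
```

In a genome wide scan this test would be repeated for every SNP, so the resulting p-values must be judged against a threshold corrected for multiple testing, and a significant SNP may only be in LD with the causal variant, as noted above.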

2.4 Gene Hunting for Diseases with Whole Genome Sequencing

With the development of novel biomedical technologies and dramatically decreased prices for sequencing, whole genome sequencing is more affordable than in the past. Gene hunting for diseases with whole genome sequencing is a modern approach for revealing the underlying causal gene locus, not only for Mendelian diseases but also for complex diseases. This approach avoids the need for routine molecular analysis in the lab, which is often labor intensive and time consuming, and effectively provides genetic information for doctors to counsel at-risk patients. Whole genome sequencing may also quickly allow suitable patients the option of gene therapy. For example, the application of whole genome sequencing could bring remarkable advances in understanding disease-specific genetic alterations associated with cancers, especially through the detection of structural variation (Mardis & Wilson, 2009). Artificial selection in livestock will also be easier to conduct when genetic markers are quickly identified by whole genome sequencing (Larkin et al., 2012).

Phenotypes of Mendelian diseases might not be genetically homogeneous. The causal type of mutation, such as point mutations or segmental variations, may vary between affected individuals with the same syndrome. Additionally, different patterns of inheritance can be involved in the same phenotype. For instance, Charcot-Marie-Tooth disease (CMT) is an inherited peripheral neural disease that can be separated into autosomal dominant, recessive, or X-linked forms, and both SNPs and copy number variants confer susceptibility to CMT (Lupski et al., 2010). Different genetic variants in the same gene locus might cause similar phenotypes. For instance, cystic fibrosis (CF) is a recessive inherited disease. Over 1,000 mutations in the cystic fibrosis transmembrane conductance regulator gene (CFTR) have been found to be associated with CF (Jaffe and Bush, 2001). Screening all known mutation loci does not always reveal the causative mutation for a new CF patient. To detect the genetic basis underlying these genetically heterogeneous or polymorphic disorders, performing whole genome sequencing within individual families that have well characterized pathologic records will be quick and efficient (Lupski et al., 2010).

Most complex diseases result from the combined small effects of multiple genes. Whole genome sequencing is a powerful tool for mapping novel genes for complex diseases (Casals et al., 2012). Whole genome sequencing can detect alterations across the entire genome and provide new insights into the mutational spectra. For example, a whole genome sequencing analysis of a group of hepatocellular carcinoma (HCC) patients revealed the influence of etiological background on somatic mutation patterns, and also identified recurrent mutations in chromatin regulators in the development of HCC (Fujimoto et al., 2012).

3. Gene Mapping of Genetic Diseases

In order to understand the applications of different mapping strategies for detecting genetic components underlying heritable diseases, gene mapping of three monogenic disorders (inherited rickets, chondrodysplasia, and lower motor neuron disease) and of one polygenic complex disease, cryptorchidism, is reviewed here.

3.1 Gene Mapping of Inherited Rickets

Rickets is a metabolic bone disease due to nutritional deficiencies of vitamin D, phosphorus, magnesium and/or calcium. The lack of those micronutrients leads to defective mineralization of cartilage at sites of endochondral ossification. The weakened or softened bones often result in fractures and limb deformities (Fitch, 1943; Dittmer et al., 2009; Ozkan, 2010). Inherited rickets is an inborn error of metabolism, resulting from mutations that disrupt normal bone metabolism. Several different forms of inherited rickets have been described, including X-linked hypophosphatemic rickets, autosomal dominant hypophosphatemic rickets (ADHR), autosomal recessive hypophosphatemic rickets (ARHR), hereditary hypophosphataemic rickets with hypercalciuria (HHRH) and two types of Vitamin D-dependent rickets (VDDR-I and VDDR-II).

X-linked hypophosphatemic rickets is commonly known as HYP or XLH; HYP is the acronym used in this review. Based on pedigree analysis, HYP was considered a dominant disorder, and it is the most common form of familial hypophosphatemic rickets. To map the causative locus, a multilocus linkage analysis using a series of X chromosome markers was conducted in eleven human families with HYP, which located the locus to a region on the X chromosome (Read et al., 1986). The mutation (Hyp) has been identified on a homologous segment of the mouse X chromosome, and the mice exhibited symptoms similar to those of human HYP (Eicher et al., 1976). Using a positional cloning approach, the HYP consortium successfully isolated a new gene, named phosphate regulating gene with homologies to endopeptidase on the X chromosome (PHEX; MIM 300550). The causative mutations in HYP patients were then determined (HYP consortium, 1995).

Similarly, genome-wide multilocus linkage mapping in a large family with ADHR placed the gene responsible for this disorder within an 18-cM interval on human chromosome 12p13 (Econs et al., 1997). Combining the results from this large family with those from another, smaller ADHR pedigree reduced the disease locus to a 1.5-Mb region. Subsequent direct sequencing of coding and promoter regions in the positional candidate genes detected missense mutations in the fibroblast growth factor 23 gene (FGF23; MIM 605380), a member of the FGF family (ADHR consortium, 2000).

Gene mapping of ARHR was carried out using a series of mapping tools. In one study, homozygosity mapping using a SNP array identified a 4.6-Mb candidate region on human chromosome 4q21. Mutation screening by direct sequencing found loss-of-function mutations in a plausible positional candidate gene, DMP1 (dentin matrix protein 1; MIM 600980). The elevated level of the phosphaturic protein FGF23 suggested that dysfunctional DMP1 mis-regulates the expression of FGF23, resulting in ARHR (Lorenz-Depiereux et al., 2010). Similar methods were used in another study of human ARHR patients, but mutations were found in a different causative gene, ecto-nucleotide pyrophosphatase/phosphodiesterase 1 (ENPP1). Biochemical experiments confirmed that the inactivating mutations in ENPP1 reduce enzymatic activity in hypophosphatemic patients but may not affect renal phosphate (Pi) reabsorption (Levy-Litan et al., 2010).
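The idea behind homozygosity mapping can be illustrated with a minimal Python sketch. It assumes genotypes coded as alternate-allele counts (0, 1 or 2) for affected individuals only, with SNPs ordered along one chromosome, and it reports runs of consecutive SNPs at which every affected individual is homozygous for the same allele. The function name, the toy genotypes and the `min_snps` threshold are illustrative assumptions, not the method or data of the studies cited above; real analyses also account for marker positions, genotyping errors and unaffected relatives.

```python
from typing import List, Tuple


def shared_homozygous_runs(
    genotypes: List[List[int]], min_snps: int = 3
) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) SNP index ranges where all affected
    individuals are homozygous for the same allele.

    genotypes[i][j] is the alternate-allele count (0, 1 or 2) of affected
    individual i at SNP j (hypothetical coding, chosen for illustration).
    """
    n_snps = len(genotypes[0])
    runs = []
    start = None
    for j in range(n_snps):
        column = {individual[j] for individual in genotypes}
        # Shared homozygosity: one genotype class only, and not heterozygous.
        shared = len(column) == 1 and column != {1}
        if shared:
            if start is None:
                start = j
        else:
            if start is not None and j - start >= min_snps:
                runs.append((start, j - 1))
            start = None
    # Close a run that extends to the last SNP.
    if start is not None and n_snps - start >= min_snps:
        runs.append((start, n_snps - 1))
    return runs


# Toy data: three affected individuals typed at eight ordered SNPs.
affected = [
    [0, 1, 2, 2, 0, 2, 1, 0],
    [1, 0, 2, 2, 0, 2, 2, 0],
    [0, 2, 2, 2, 0, 2, 0, 1],
]
print(shared_homozygous_runs(affected, min_snps=3))  # [(2, 5)]
```

In this toy example only SNPs 2 to 5 are homozygous for the same allele in all three individuals, so that interval would be the candidate region to screen for positional candidate genes.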

Another form of autosomal recessive hypophosphatemic rickets is HHRH (Tieder et al., 1985). The distinctive characteristic of HHRH compared with other forms of hypophosphatemic rickets is hypercalciuria due to increased serum 1,25-dihydroxyvitamin D levels and increased intestinal calcium absorption. A genome-wide linkage scan with homozygosity mapping, followed by the candidate gene method, was performed and identified a single nucleotide deletion in the solute carrier family 34 (sodium phosphate), member 3 gene (SLC34A3; MIM 241530). This gene encodes the renal sodium-phosphate cotransporter NaPi-IIc (Bergwitz et al., 2006).

Two other recessively inherited forms of rickets in humans are due to disruptions of either the synthesis or metabolism of vitamin D. Vitamin D-dependent rickets, type I (VDDR-I) is caused by mutations in the CYP27B1 gene, which encodes vitamin D 1-alpha-hydroxylase (Fu et al., 1997). The locus was initially mapped by linkage analysis in large pedigrees, followed by the candidate gene method, which involved sequencing for causative genetic variants (Labuda et al., 1990; Fu et al., 1997). Vitamin D-dependent rickets, type II (VDDR-II) is also called hereditary vitamin D-resistant rickets (HVDRR) (MIM 277440). Shortly after the successful cloning of a full-length cDNA sequence of the vitamin D receptor gene (VDR) in 1988, researchers discovered many mutations in VDR that were associated with HVDRR (Malloy et al., 1999).

3.2 Gene Mapping of Lower Motor Neuron Disease

Motor neuron diseases (MND) are a group of progressive neurological disorders that involve selective degeneration of upper motor neurons (UMN) and/or lower motor neurons (LMN). Messages from neurons in the brain (UMN) are initially transmitted to neurons in the brain stem and spinal cord (LMN) and are then transmitted from LMN to particular muscles to produce movements such as walking, chewing or blinking. When the signals between LMN and the muscle are disrupted, the condition is commonly called LMN disease.

The incidence of MND varies among human ethnic populations from 0.6 to 2.4 individuals per 100,000 person-years (Lee, 2010). Common characteristics of MND are muscle weakness and/or spastic paralysis. Symptoms of UMN degeneration include spastic tone, hyperreflexia and an upgoing plantar reflex. Signs of LMN degeneration include muscle atrophy, fasciculations and weakness (Lee, 2010). Different types of MND have been classified based on their time of onset, type of motor neuron involvement and detailed clinical features. These MND include amyotrophic lateral sclerosis (ALS), hereditary spastic paraplegia (HSP), primary lateral sclerosis (PLS), spinal bulbar muscular atrophy (SBMA), spinal muscular atrophy (SMA) and lethal congenital contracture syndrome (LCCS). Among these diseases, both UMN and LMN are involved in the development of ALS and LCCS (Dion et al., 2009).

Previous evidence has shown that genetic variants within more than 30 genes might be responsible for MND (Dion et al., 2009). Amyotrophic lateral sclerosis is a fatal degenerative disorder of motor neurons. Linkage mapping using single-strand conformational polymorphisms (SSCP) identified markers near the Cu/Zn-binding superoxide dismutase (SOD1) gene that were in significant linkage with ALS. Mutations in this gene, identified by direct sequencing, explain about 20% of cases of familial ALS (Rosen et al., 1993). Hereditary spastic paraplegia with an autosomal dominant inheritance pattern is characterized by progressive spasticity of the lower limbs. Pathogenic mutations in the spastin gene (SPAST) were revealed by a positional cloning strategy involving construction of BAC (bacterial artificial chromosome) contigs for sequencing. The spastin gene was shown to be the most common causal locus in patients with autosomal dominant HSP (Hazan et al., 1999). Spinal muscular atrophy is a common fatal autosomal recessive disorder that leads to progressive paralysis with muscular atrophy. Genomic regions linked with SMA were first determined by linkage analysis. Yeast artificial chromosome (YAC) contigs were then used to narrow the linked regions, which finally led to the identification of loss-of-function mutations or deletions in SMN1 (survival of motor neuron 1, telomeric) responsible for SMA (Lefebvre et al., 1995). Lethal congenital contracture syndrome type 2 (LCCS2) is an autosomal recessive neurogenic form of arthrogryposis, characterized by multiple joint contractures and a markedly distended urinary bladder. To map the locus for this MND, linkage analyses and homozygosity mapping using microsatellite markers were used. One substitution in exon 11 of ERBB3 (receptor tyrosine-protein kinase erbB-3) was identified as being associated with LCCS2 (Narkis et al., 2007).

3.3 Gene Mapping of Chondrodysplasia

Chondrodysplasia is a genetically diverse group of diseases, present in humans and many animals, that is characterized by abnormal development of cartilage. Defective genes disrupt normal chondrogenesis and cartilage development, leading to abnormal shape and structure of the skeleton. The inheritance patterns of chondrodysplasia are diverse. Mutations in several genes related to the synthesis of collagen or proteoglycans, or to signal-transduction mechanisms, have been reported to be involved in inherited chondrodysplasia. Collagen II is the major collagenous component of cartilage. Traditional genetic tools such as genomic cloning and sequencing indicated that several point mutations in the human collagen, type II, alpha 1 gene (COL2A1) interrupt the normal formation of type II collagen and cause autosomal dominant achondrogenesis type II (Vissing et al., 1989; Chan et al., 1995). Partially disrupted Col2a1 has been associated with a chondrodysplasia phenotype in transgenic mice (Vandenberg et al., 1991). Subsequent genetic linkage and positional candidate cloning studies have shown that disruption of genes encoding other cartilage-specific collagens can cause different types of chondrodysplasia. In families with multiple epiphyseal dysplasia (MED), an autosomal dominantly inherited chondrodysplasia, mutations have been reported in genes encoding the α1, α2 and α3 chains of collagen IX. A mutation in intron 8 of COL9A1 (collagen, type IX, alpha 1) causing a splicing defect was found in one family with MED (Czarny-Ratajczak et al., 2001). Several mutations have been described that cause skipping of exon 3 of COL9A2 (collagen, type IX, alpha 2) in families with MED (Muragaki et al., 1996; Holden et al., 1999; Spayde et al., 2000). In separate families with MED, mutations in the acceptor splice site of intron 2 in COL9A3 (collagen, type IX, alpha 3) caused skipping of exon 3 and resulted in MED phenotypes (Paassilta et al., 1999b; Bönnemann et al., 2000). Type X collagen is a short chain collagen expressed by hypertrophic chondrocytes during endochondral ossification. A nonsense mutation in the collagen, type X, alpha 1 (COL10A1) gene has been found to be responsible for Schmid type metaphyseal chondrodysplasia (MCDS) in a human patient (Woelfle et al., 2011).

Sulfated glycosaminoglycan is important for maintaining the elasticity and water-binding capacity of the cartilage matrix (Jerosch, 2011). Absorption and re-absorption of sulfate ions are critical for normal cartilage and bone development. Different forms of chondrodysplasia in humans, including multiple epiphyseal dysplasia, atelosteogenesis type II and diastrophic dysplasia, are due to mutations in the solute carrier family 26, member 2 gene (SLC26A2), which encodes a sulfate transporter protein (DTDST). These results were supported by genetic analyses using positional cloning and direct sequencing. The impaired sulfate uptake prevents the normal development of cartilage (Cho et al., 2010; Hästbacka et al., 1996; Hästbacka et al., 1999).

Mutations in genes encoding a number of osteogenic growth factors and signaling molecules have also been reported to be associated with different forms of chondrodysplasia. A 22-bp tandem duplication resulting in a frameshift in CDMP1 (cartilage-derived morphogenetic protein 1) was found to be associated with chondrodysplasia in human patients (Thomas et al., 1996). A multi-breed association study in domestic dogs demonstrated that an evolutionarily recently acquired retrogene encoding fibroblast growth factor 4 (fgf4), an extra copy of the FGF4 gene, was strongly associated with chondrodysplasia (Parker et al., 2009). The most common inherited chondrodysplasia of sheep is “spider lamb syndrome” (SLS) in the Suffolk breed. Direct sequencing showed that the mutation responsible for the disease was a non-synonymous transversion in the highly conserved tyrosine kinase II domain of the fibroblast growth factor receptor 3 gene (FGFR3) (Cockett et al., 1999; Beever et al., 2006).

3.4 Gene Mapping of Cryptorchidism

Cryptorchidism, also known as undescended testis, is the absence from the scrotum of one

or both testes. It is the most common genital problem encountered in pediatrics. The

incidence of this defect in human studies varies from 0.2% to 13% (Thonneau et al., 2003).

Cryptorchidism has also been seen in domestic animals such as dogs (Cox et al., 1978), cats

(Yates et al., 2003), horses (Cox et al., 1979), pigs (Rothschild et al., 1988), goats (Singh and

Ezeasor, 1989), sheep (Smith et al., 2007) and cattle (St Jean et al., 1992), and wild animals

such as maned wolves (Burton and Ramsay, 1986), black bears (Dunbar et al., 1996) and

reindeer (Leader-Williams, 1979). An undescended testicle is considered a risk factor for

testicular cancer and male infertility (Berkowitz et al., 1993; Carizza et al., 1990).

In most mammals, the testis must descend from the abdomen to the scrotum, which provides an environment with a temperature lower than deep body temperature, which is too high for normal spermatogenesis. The elephant is an exception since its testes remain completely inside the body (Short, 2005). Normal testicular descent is a two-phase process comprising transabdominal and inguinoscrotal descent (Hutson, 1985; Figure 1). The key anatomical events that release the testes from their urogenital ridge location and guide the gonad into the scrotum are the degeneration of the cranio-suspensory ligament, the swelling, outgrowth and regression of the gubernaculum, and the elevation of intra-abdominal pressure (Hutson and Hasthorpe, 2005). Insulin-like hormone 3 plays an important role in the initial, transabdominal phase of testicular descent, and androgen acts in both phases. The gubernaculum is the principal structure in testicular descent, and both insulin-like hormone 3 and androgen help control the changes in the gubernaculum (Nation et al., 2009). Other factors, including hormones such as Müllerian inhibiting substance, estrogen and gonadotropin-releasing hormone, and proteins encoded by members of the HOX gene family, also contribute to testicular descent. Abnormalities during the key processes of testicular descent are recognized to be related to cryptorchidism.

Figure 1. Structures involved in testicular descent. Two critical phases of testis descent, transabdominal (left) and inguinoscrotal (right), are essential to move the testes into the scrotum. The intermediate stage is shown in the middle. The process of testicular descent depends on the degeneration of the cranio-suspensory ligament (CSL) and the swelling, outgrowth and regression of the gubernaculum. This figure was modified from Klonisch et al., 2004.

Cryptorchidism is a genetic defect caused by the interaction of genetic, epigenetic and environmental factors (Hutson and Hasthorpe, 2005). Environmental factors with estrogenic or anti-androgenic effects play a role in the development of cryptorchidism (Fisher, 2004; Gill et al., 1979). Various genes implicated in the regulation of testicular descent have been investigated for cryptorchidism. Previous reports have identified approximately 20 genes that might be associated with cryptorchidism (Klonisch et al., 2004; Amann and Veeramachaneni, 2007). Androgens may act in both the transabdominal and inguinoscrotal phases by regulating the release of calcitonin gene-related peptide by the genitofemoral nerve. Several androgen-related genes have been investigated in studies of cryptorchidism. The androgen receptor (AR) was found to be strongly expressed in the gubernaculum (George and Peterson, 1988), and a case-control study showed that the GGN trinucleotide repeat length variant in AR was associated with a risk of cryptorchidism and penile hypospadias in an Iranian population (Radpour et al., 2007). Calcitonin gene-related peptides are encoded by two genes, CGRP-alpha (CALCA) and CGRP-beta (CALCB). Exogenous calcitonin gene-related peptide injected into the scrotum can change the direction of gubernacular migration in the mutant trans-scrotal (TS) rat (Griffiths et al., 1993; Clarnette and Hutson, 1999). However, mutation screening by SSCP analysis identified no pathogenic sequence changes in CALCA or CALCB in 90 selected human patients with unilateral or bilateral cryptorchidism (Zuccarello et al., 2004). According to previous reports, a minority of affected cases have mutations either in the insulin-like hormone 3 gene (INSL3) or in its receptor gene, relaxin/insulin-like family peptide receptor 2 (RXFP2), also known as the leucine-rich repeat-containing G protein-coupled receptor 8 gene (LGR8). Insulin-like factor 3 is expressed in the pre- and postnatal Leydig cells of the testis and the postnatal theca cells of the ovary, and is a member of the insulin-like hormone superfamily (Zimmermann et al., 1997). Targeted disruption of this gene prevents gubernaculum growth and causes bilateral cryptorchidism in mice (Nef and Parada, 1999; Zimmermann et al., 1999). The receptor gene LGR8 is highly expressed in the gubernaculum and is responsible for mediating hormonal signals that control normal testicular descent in humans (Gorlov et al., 2002). Knockout mouse models for this gene exhibit bilateral intra-abdominal cryptorchidism (Bogatcheva et al., 2007). Disruptions of other hormone-related genes are also associated with aberrant testicular descent. Müllerian inhibiting substance (MIS), or anti-Müllerian hormone (AMH), a 140-kDa glycoprotein produced by Sertoli cells, is critical for regression of the Müllerian duct during male sexual differentiation (Bashir and Wells, 1995) and is associated with the swelling reaction of the gubernaculum that occurs during the first phase of testicular descent (Kubota et al., 2002). In mice, the testicular feminization (Tfm) mutation with the deletion of the Mis gene was found with a phenotype of undescended testis, suggesting a causal relationship between Mis and cryptorchidism (Behringer et al., 1994). A mutated gonadotropin-releasing hormone receptor gene (Gnrhr) also failed to initiate testicular descent in male mice (Pask et al., 2005). Estrogen receptor 1 (ESR1) is a ligand-activated transcription factor that is important in the process of testicular descent in children (Przewratil et al., 2004). Esr1 knockout male mice with retracted gonads suggest that estrogens may play a role in the scrotal positioning of the testicle (Donaldson et al., 1996). HOX genes have also been investigated in studies of cryptorchidism. Homeobox A10 (Hoxa10) is highly expressed in the gubernaculum and has been found to be essential for gubernacular migration during testicular descent in mice. Disruption of this gene implicates it in the development of cryptorchidism (Nightingale et al., 2008; Rijli et al., 1995). A single SNP variant in HOXD13 (homeobox D13) was significantly associated with cryptorchidism in humans in a case-control analysis (Wang et al., 2007).

4. Animal Models for Human Diseases

Using appropriate model organisms for human disease studies will help understand the

genetic components behind disorders. Domestic animals have proven to be good research

models due to their large body size and many anatomical and physiological similarities to

humans. Numerous genetic defects have been identified in domestic animals, and inherited

disorders in these animal models can provide valuable information to assist in the

identification of human disease genes and the study of gene therapy.

4.1 The Sheep Model

Sheep (Ovis aries) were initially domesticated approximately 11,000 years ago for agricultural purposes including the production of fleece, skins, meat and milk (Colledge et al., 2005; Chessa et al., 2009). The top three sheep meat producing countries, in order of quantity, are China, Australia and New Zealand, based on 2010 data from the Food and Agriculture Organization of the United Nations (http://faostat.fao.org/site/569/DesktopDefault.aspx?PageID=569#ancor). Large animal models like sheep have received extensive attention in clinical research on human genetic diseases because of their advantages over rodent models, such as greater physiological similarity to human patients, a longer life span and larger body size. Sheep are considered excellent models for studying respiratory physiology (Meeusen et al., 2009), renal physiology (Ramchandra et al., 2012), reproductive physiology (Vinet et al., 2012), cardiovascular physiology (Wylie et al., 2012) and the endocrine system (Recabarren et al., 2005). The body size of sheep allows for frequent blood sampling and easy insertion of monitoring devices into organs. These advantages allow prenatal gene therapy to be performed in sheep prior to disease onset to achieve long-term expression of the exogenous gene without modifying the germ line of the recipient (Porada et al., 1998).

The construction of the reference sheep genome sequence was started in 2009 by the International Sheep Genomics Consortium. The genome sequence will enable accelerated genetic gain and improved understanding of the etiology of diseases that are deleterious to sheep breeding programs (The International Sheep Genomics Consortium et al., 2010). Most modern sheep breeds have maintained high levels of genetic diversity due to strong gene flow between individuals of different lines and breeds (Kijas et al., 2012). The initial analysis of LD structure in sheep was conducted on more than 3,000 sheep from 74 diverse breeds. The results suggest that LD in sheep is substantially lower than in cattle but higher than in humans. Moreover, substantial variation in LD structure exists between ovine breeds (Raadsma et al., 2010). The new reference sheep genome sequence will help biomedical researchers overcome the previous challenge of limited gene sequence availability.

4.2 The Dog Model

The domestic dog (Canis lupus familiaris) may have been the first animal to be domesticated, approximately 15,000 years ago (Savolainen et al., 2002). Dogs were initially domesticated for working or hunting. More recently, the dog has been used extensively as a model for complex human disease research because of the similarities between canine and human diseases, including disease manifestation and genetic predisposition, in diseases such as cancers (Khanna et al., 2006), diabetes (Srinivasan and Ramarao, 2007), heart disease (Kohler et al., 1984) and renal disease (Aresu et al., 2008) (Karlsson and Lindblad-Toh, 2008). Additionally, many anatomic and physiological aspects, particularly of the cardiovascular, urogenital, nervous and musculoskeletal systems, are similar in dogs and humans (Khanna et al., 2006). The similar spectrum of diseases, the specific dog population structure, and the close social relationship and shared living environment of humans and dogs make the domestic dog an ideal model for mapping human diseases. When the first draft of the dog genome sequence was completed in 2005, the dog genome was found to be relatively small (2.4 Gb compared with 3.2 Gb in humans) and to have diverged less from the human genome than the mouse genome has (Lindblad-Toh et al., 2005; International Human Genome Sequencing Consortium, 2004). This similarity makes the dog a suitable animal model for human diseases with a genetic component.

Additionally, the dog genome has a favorable LD structure for investigating complex disorders. Two population bottlenecks occurred during domestication of the dog. These created an advantageous linkage structure that is ideal for gene mapping. First, long-range LD, from hundreds of kilobases to several megabases, exists within a breed, unlike the tens of kilobases observed in the human genome. Second, only short-range LD exists across the whole dog population because of its long domestication period (Lindblad-Toh et al., 2005). Given limited LD across the whole dog population and extensive LD within breeds, a two-stage mapping strategy is favored: the disease locus is first identified within one breed, and the genomic region is then narrowed using multiple breeds (Karlsson and Lindblad-Toh, 2008). An example is the mapping of white spotting in dogs. Initially, 19 boxer dogs were used to discover an 800-kb region. A second breed, the bull terrier, was then used to confirm the association and narrow it to a 100-kb region that contained the microphthalmia-associated transcription factor gene (MITF). Candidate regulatory mutations associated with the spotting trait were identified (Karlsson et al., 2007).
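The contrast between extensive within-breed LD and limited across-breed LD can be made concrete with the r² statistic, r² = D² / (pA(1 − pA)pB(1 − pB)), where D = pAB − pA·pB. The Python sketch below computes r² for two biallelic SNPs from phased haplotypes. The haplotype counts are invented for illustration and are not taken from the studies cited above.

```python
def r_squared(haplotypes):
    """Pairwise LD (r^2) between two biallelic SNPs.

    haplotypes is a list of (a, b) tuples giving the allele (0 or 1)
    carried at SNP A and SNP B on each phased haplotype.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n  # frequency of allele 1 at SNP A
    p_b = sum(b for _, b in haplotypes) / n  # frequency of allele 1 at SNP B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                     # LD coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    return d * d / denom


# Hypothetical single breed: the two SNPs are always inherited together.
within_breed = [(1, 1)] * 5 + [(0, 0)] * 5
# Hypothetical multi-breed sample: historical recombination has mixed alleles.
across_breeds = [(1, 1)] * 3 + [(1, 0)] * 2 + [(0, 1)] * 2 + [(0, 0)] * 3

print(r_squared(within_breed))             # 1.0
print(round(r_squared(across_breeds), 3))  # 0.04
```

Under these assumed counts the SNP pair is in complete LD within the breed but nearly independent across breeds, which is why a second breed can narrow a region that a single breed cannot resolve.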

The most powerful and commonly used mapping approach in dogs for disease research is GWAS. Fewer than 50 individuals are sufficient to map a simple Mendelian recessive trait using GWAS, but many more dogs are needed to map complex diseases (Karlsson and Lindblad-Toh, 2008). GWAS has been used successfully to map many Mendelian traits in dogs. Examples include a missense mutation in SOD1 (superoxide dismutase 1) associated with canine degenerative myelopathy (DM) (Awano et al., 2009), a 23-kb deletion in canine ADAM9 (a disintegrin and metalloprotease domain, family member 9) associated with a canine cone-rod dystrophy (crd3) (Goldstein et al., 2010), and a missense mutation in a conserved functional domain of β-glucuronidase (GUSB) confirmed to cause mucopolysaccharidosis VII (MPS VII) in a population of Brazilian terriers (Hytönen et al., 2012).
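At a single SNP, the core of a case-control GWAS can be sketched as an allelic chi-square test on a 2 x 2 table of allele counts. The Python example below is a minimal sketch under stated assumptions: the counts are invented (20 cases and 20 controls, so 40 alleles per group), the function name is hypothetical, and no correction for multiple testing, population stratification or relatedness is applied, all of which a real genome-wide scan would require.

```python
import math


def allelic_chi_square(case_alt, case_ref, control_alt, control_ref):
    """Chi-square test (1 degree of freedom) on a 2x2 allele-count table.

    Returns (chi-square statistic, p-value).
    """
    table = [[case_alt, case_ref], [control_alt, control_ref]]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][c] + table[1][c] for c in range(2)]
    total = sum(row_totals)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column must contain at least one allele")
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Invented counts: the alternate allele is common in cases, rarer in controls.
chi2, p = allelic_chi_square(case_alt=36, case_ref=4, control_alt=12, control_ref=28)
print(chi2)      # 30.0
print(p < 1e-6)  # True
```

For a recessive Mendelian trait, the causal region shows a very large allele frequency difference between cases and controls, which is consistent with the small sample sizes noted above.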

Other mapping strategies, including QTL mapping, across-breed mapping and selective sweep mapping, have also been applied in dogs (Karlsson and Lindblad-Toh, 2008; Sutter et al., 2007). Selective sweep mapping identifies selective sweeps that are associated with a strongly selected phenotype. A selective sweep is a region where genetic variation in the DNA neighboring a mutation has been reduced or eliminated by recent and strong positive natural or artificial selection (Palaisa et al., 2004). Artificial selection for breeding purposes over hundreds of years has produced large morphological differences among modern breeds and uniform physical features within pure-bred dogs (Wayne and Ostrander, 2007). This characteristic of dog populations provides suitable resources for QTL mapping and selective sweep mapping to discover genes underlying breed-specific characteristics. For example, a common haplotype in the insulin-like growth factor 1 gene (IGF1) associated with body size in dogs was identified by QTL mapping and selective sweep mapping using small and giant dog breeds (Sutter et al., 2007). Because long haplotypes and extensive homozygosity within a breed can confound mapping, across-breed selective sweep mapping is helpful for finding loci of interest. A study that used this strategy to analyze genetic variation in 46 dog breeds identified 44 chromosomal regions that are extremely variable between breeds and are likely to control many of the morphological and behavioral traits that vary between them (Vaysse et al., 2011). By contrast, the difficulty of producing large numbers of offspring from crosses restricts the application of QTL mapping in dogs compared with livestock animals.
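One simple way to look for the reduced variation that defines a selective sweep is to scan ordered SNPs with a sliding window and report the window with the lowest mean expected heterozygosity, 2p(1 − p). The Python sketch below illustrates this idea only; the allele frequencies, window size and function names are assumptions made for the example, and published sweep scans use more elaborate statistics.

```python
def window_heterozygosity(alt_freqs, window=5):
    """Mean expected heterozygosity 2p(1-p) for each sliding window of SNPs.

    alt_freqs holds the alternate-allele frequency at each SNP, ordered
    along the chromosome.
    """
    het = [2 * p * (1 - p) for p in alt_freqs]
    return [
        sum(het[i:i + window]) / window
        for i in range(len(het) - window + 1)
    ]


def lowest_window(alt_freqs, window=5):
    """Return (start index, mean heterozygosity) of the least variable window."""
    means = window_heterozygosity(alt_freqs, window)
    start = min(range(len(means)), key=means.__getitem__)
    return start, means[start]


# Invented frequencies: SNPs 3 to 6 are fixed for one allele (no variation).
freqs = [0.5, 0.4, 0.6, 0.0, 1.0, 1.0, 0.0, 0.5, 0.3]
print(lowest_window(freqs, window=3))  # (3, 0.0)
```

In an across-breed design, the same scan would be run within each breed that carries the selected phenotype, and windows that lose variation in all of them would be prioritized.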

Genetic diseases constitute major public health problems for humans and also cause large economic losses in the animal industry. Finding the causative mutations will help to improve this situation. Applying suitable gene mapping tools can make gene hunts for genetic diseases faster, cheaper and more practical. Animal models are preferable for experimental disease research because of the advantages reviewed above. Gene discoveries in animal models will facilitate the study of human genetic diseases.

REFERENCES

ADHR Consortium. Autosomal dominant hypophosphataemic rickets is associated with mutations in FGF23. Nat Genet. 2000. 26:345-8.

Amann RP, Veeramachaneni DNR. Cryptorchidism in common eutherian mammals. Reproduction. 2007. 133:541-1.

Amir el-AD, Bartal O, Morad E, Nagar T, Sheynin J, Parvari R, et al. KinSNP software for homozygosity mapping of disease genes using SNP microarrays. Hum Genomics. 2010. 4:394-40.

Antonarakis SE, Beckmann JS. Mendelian disorders deserve more attention. Nat Rev Genet. 2006. 7:277-82.

Aresu L, Pregel P, Bollo E, Palmerini D, Sereno A, Valenza F. Immunofluorescence staining for the detection of immunoglobulins and complement (C3) in dogs with renal disease. Vet Rec. 2008. 163:679-82.

Awano T, Johnson GS, Wade CM, Katz ML, Johnson GC, Taylor JF, et al. Genome-wide association analysis reveals a SOD1 mutation in canine degenerative myelopathy that resembles amyotrophic lateral sclerosis. Proc Natl Acad Sci U S A. 2009. 106:2794-9.

Baird NA, Etter PD, Atwood TS, Currey MC, Shiver AL, Lewis ZA, et al. Rapid SNP discovery and genetic mapping using sequenced RAD markers. PLoS One. 2008. 3:e3376.

Ball AD, Stapley J, Dawson DA, Birkhead TR, Burke T, Slate J. A comparison of SNPs and microsatellites as linkage mapping markers: lessons from the zebra finch (Taeniopygia guttata). BMC Genomics. 2010. 11:218.

Barrett JC, Hansoul S, Nicolae DL, Cho JH, Duerr RH, Rioux JD, et al. Genome-wide association defines more than 30 distinct susceptibility loci for Crohn's disease. Nat Genet. 2008. 40:955-62.

Bashir MS, Wells M. Müllerian inhibiting substance. J Pathol. 1995. 176:109-10.

Beever JE, Smit MA, Meyers SN, Hadfield TS, Bottema C, Albretsen J, et al. A single-base change in the tyrosine kinase II domain of ovine FGFR3 causes hereditary chondrodysplasia in sheep. Anim Genet. 2006. 37:66-71.

Behringer RR, Finegold MJ, Cate RL. Müllerian-inhibiting substance function during mammalian sexual development. Cell. 1994. 79:415-25.

Bergwitz C, Roslin NM, Tieder M, Loredo-Osti JC, Bastepe M, Abu-Zahra H, et al. SLC34A3 mutations in patients with hereditary hypophosphatemic rickets with hypercalciuria predict a key role for the sodium-phosphate cotransporter NaPi-IIc in maintaining phosphate homeostasis. Am J Hum Genet. 2006. 78:179-92.

Berkowitz GS, Lapinski RH, Dolgin SE, Gazella JG, Bodian CA, Holzman IR. Prevalence and natural history of cryptorchidism. Pediatrics. 1993. 92:44-9.

Bogatcheva NV, Truong A, Feng S, Engel W, Adham IM, Agoulnik AI. GREAT/LGR8 is the only receptor for insulin-like 3 peptide. Mol Endocrinol. 2003. 17:2639-46.

Bönnemann CG, Cox GF, Shapiro F, Wu J-J, Feener CA, Thompson TG, et al. A mutation in the α3 chain of type IX collagen causes autosomal dominant multiple epiphyseal dysplasia with mild myopathy. Proc Natl Acad Sci U S A. 2000. 97:1212-17.

Burton M, Ramsay E. Cryptorchidism in Maned Wolves. J Zoo An Med. 1986. 17:133-5.

Campbell CD, Ogburn EL, Lunetta KL, Lyon HN, Freedman ML, Groop LC, et al. Demonstrating stratification in a European American population. Nat Genet. 2005. 37:868-72.

Carizza C, Antiba A, Palazzi J, Pistono C, Morana F, Alarcón M. Testicular maldescent and infertility. Andrologia. 1990. 22:285-8.

Carr IM, Sheridan E, Hayward BE, Markham AF, Bonthron DT. IBDfinder and SNPsetter: tools for pedigree-independent identification of autozygous regions in individuals with recessive inherited disease. Hum Mutat. 2009. 30:960-7.

Casals F, Idaghdour Y, Hussin J, Awadalla P. Next-generation sequencing approaches for genetic mapping of complex diseases. J Neuroimmunol. 2012. 248:10-22. Review.

Chan D, Cole WG, Chow CW, Mundlos S, Bateman JF. A COL2A1 mutation in achondrogenesis type II results in the replacement of type II collagen by type I and III collagens in cartilage. J Biol Chem. 1995. 270:1747-53.

Chang C, Bowman JL, DeJohn AW, Lander ES, Meyerowitz EM. Restriction fragment length polymorphism linkage map for Arabidopsis thaliana. Proc Natl Acad Sci U S A. 1988. 85:6856-60.

Charlier C, Coppieters W, Rollin F, Desmecht D, Agerholm JS, Cambisano N, et al. Highly effective SNP-based association mapping and management of recessive defects in livestock. Nat Genet. 2008. 40:449-54.

Chessa B, Pereira F, Arnaud F, Amorim A, Goyache F, Mainland I, et al. Revealing the History of Sheep Domestication Using Retrovirus Integrations. Science. 2009. 324:532-6.

Cho TJ, Kim OH, Lee HR, Shin SJ, Yoo WJ, Park WY, et al. Autosomal recessive multiple epiphyseal dysplasia in a Korean girl caused by novel compound heterozygous mutations in the DTDST (SLC26A2) gene. J Korean Med Sci. 2010. 25:1105-8.

Clarnette TD, Hutson JM. Exogenous calcitonin gene-related peptide can change the direction of gubernacular migration in the mutant trans-scrotal rat. J Pediatr Surg. 1999. 34:1208-12.

Cockett NE, Shay TL, Beever JE, Nielsen D, Albretsen J, Georges M, et al. Localization of the locus causing Spider Lamb Syndrome to the distal end of ovine Chromosome 6. Mamm Genome. 1999. 10:35-8.

Colledge S, Conolly J, Shennan S. The evolution of Neolithic farming from SW Asian origins to NW European limits. Eur J Arch. 2005. 8:137-56.

Cox JE, Edwards GB, Neal PA. An analysis of 500 cases of equine cryptorchidism. Equine Vet J. 1979. 11:113-6.

Cox VS, Wallace LJ, Jessen CR. An anatomic and genetic study of canine cryptorchidism. Teratology. 1978. 18:233-40.

Dion PA, Daoud H, Rouleau GA. Genetics of motor neuron disorders: new insights into pathogenic mechanisms. Nat Rev Genet. 2009. 10:769-82.

Dittmer KE, Thompson KG, Blair HT. Pathology of inherited rickets in Corriedale sheep. J Comp Pathol. 2009. 141:147-55.

Donaldson KM, Tong SY, Washburn T, Lubahn DB, Eddy EM, Hutson JM, et al. Morphometric study of the gubernaculum in male estrogen receptor mutant mice. J Androl. 1996. 17:91-5.

Drielsma A, Jalas C, Simonis N, Désir J, Simanovsky N, Pirson I, et al. Two novel CCDC88C mutations confirm the role of DAPLE in autosomal recessive congenital hydrocephalus. J Med Genet. 2012 [Epub ahead of print].

Dunbar MR, Cunningham MW, Wooding JB, Roth RP. Cryptorchidism and delayed testicular descent in Florida black bears. J Wildl Dis. 1996. 32:661-4.

Econs MJ, McEnery PT, Lennon F, Speer MC. Autosomal dominant hypophosphatemic rickets is linked to chromosome 12p13. J Clin Invest. 1997. 100:2653-7.

Eicher EM, Southard JL, Scriver CR, Glorieux FH. Hypophosphatemia: mouse model for human familial hypophosphatemic (vitamin D-resistant) rickets. Proc Natl Acad Sci U S A. 1976. 73:4667-71.

Fisher JS. Environmental anti-androgens and male reproductive health: focus on phthalates and testicular dysgenesis syndrome. Reproduction. 2004. 127:305-15. Review.

Fitch LWN. Rickets in hoggets, with a note on the aetiology and definition of the disease. Australian Vet J. 1943.19:2. Fu GK, Lin D, Zhang MY, Bikle DD, Shackleton CH, Miller WL, et al. Cloning of human 25-hydroxyvitamin D-1 α-hydroxylase and mutations causing vitamin D-dependent rickets type 1. Mol Endocrinol.1997. 11:1961-70. Fujimoto A, Totoki Y, Abe T, Boroevich KA, Hosoda F, Nguyen HH, et al. Whole-genome sequencing of liver cancers identifies etiological influences on mutation patterns and recurrent mutations in chromatin regulators. Nat Genet. 2012. 44:760-4. George FW, Peterson KG. Partial characterization of the androgen receptor of the newborn rat gubernaculum. Biol Reprod. 1988. 39:536-9. Georges M. Mapping, Fine Mapping, and Molecular Dissection of Quantitative Trait Loci in Domestic Animals. Annu Rev Genomics Hum Genet. 2007. 8: 131-62. Gill WB, Schumacher GF, Bibbo M, Straus FH 2nd, Schoenberg HW. Association of diethylstilbestrol exposure in utero with cryptorchidism, testicular hypoplasia and semen abnormalities. J Urol. 1979. 122:36-9. Glenske K, Prinzenberg EM, Brandt H, Gauly M, Erhardt G. A chromosome-wide QTL study on BTA29 affecting temperament traits in German Angus beef cattle and mapping of DRD4. Animal. 2011. 5:195-7. Goldstein O, Mezey JG, Boyko AR, Gao C, Wang W, Bustamante CD, et al. An ADAM9 mutation in canine cone-rod dystrophy 3 establishes with human cone-rod dystrophy 9. Mol Vis. 2010. 16:1549-69. 31

Gorlov IP, Kamat A, Bogatcheva NV, Jones E, Lamb DJ, Truong A, et al. Mutations of the GREAT gene cause cryptorchidism. Hum Mol Genet. 2002. 11:2309-18. Griffiths AJF, Miller JH, Suzuki DT, Lewontin RC, Gelbart WM. An ntroduction to genetic analysis. 2000. 7th edition. New York: W. H. Freeman. Griffiths AL, Middlesworth W, Goh DW, Hutson JM. 1993. Exogenous calcitonin gene- related peptide causes gubernacular development in neonatal (Tfm) mice with complete androgen resistance. J Pediatr Surg. 28:1028-30. Groenen MA, Wahlberg P, Foglio M, Cheng HH, Megens HJ, Crooijmans RP, et al. A high- density SNP-based linkage map of the chicken genome reveals sequence features correlated with recombination rate. Genome Res. 2009. 19:510-9. Hästbacka J, Kerrebrock A, Mokkala K, Clines G, Lovett M, Kaitila I, et al. Identification of the Finnish founder mutation for diastrophic dysplasia (DTD). Eur J Hum Genet. 1999. 7:664-70.

Hästbacka J, Superti-Furga A, Wilcox WR, Rimoin DL, Cohn DH, Lander ES. Atelosteogenesis type II is caused by mutations in the diastrophic dysplasia sulfate- transporter gene (DTDST): evidence for a phenotypic series involving three chondrodysplasias. Am J Hum Genet. 1996. 58:255-62. Hazan J, Fonknechten N, Mavel D, Paternotte C, Samson D, Artiguenave F, et al. Spastin, a new AAA protein, is altered in the most frequent form of autosomal dominant spastic paraplegia. Nat Genet. 1999. 23:296–303. Heifetz EM, Fulton JE, O'Sullivan NP, Arthur JA, Cheng H, Wang J, et al. Mapping QTL affecting resistance to Marek's disease in an F6 advanced intercross population of commercial layer chickens. BMC Genomics. 2009. 14:10-20. Hildebrandt F, Heeringa SF, Rüschendorf F, Attanasio M, Nürnberg G, Becker C, et al. A Systematic Approach to Mapping Recessive Disease Genes in Individuals from Outbred Populations. PLoS Genet. 2009. 5: e1000353. Holden P, Canty EG, Mortier GR, Zabel B, Spranger J, Carr A, et al. Identification of novel pro-α2(IX) collagen gene mutations in two families with distinctive oligo-epiphyseal forms of multiple epiphyseal dysplasia. Am J Hum Genet. 1999. 65:31-8. Hutson JM, Hasthorpe S. Abnormalities of testicular descent. Cell Tissue Res. 2005. 322:155-8. Review. Hutson JM. A biphasic model for the hormonal control of testicular descent. Lancet. 1985. 24:419-21.

HYP Consortium. A gene (PEX) with homologies to endopeptidases is mutated in patients with X-linked hypophosphatemic rickets. Nat Genet. 1995. 11: 130-6. Hytönen MK, Arumilli M, Lappalainen AK, Kallio H, Snellman M, Sainio K, et al. A novel GUSB mutation in Brazilian terriers with severe skeletal abnormalities defines the disease as mucopolysaccharidosis VII. PLoS One. 2012. 7:e40281. 32

International Human Genome Sequencing Consortium. Finishing the euchromatic sequence of the human genome. Nature. 2004. 431:931-45.

International Sheep Genomics Consortium, Archibald AL, Cockett NE, Dalrymple BP, Faraut T, Kijas JW, et al. The sheep genome reference sequence: a work in progress. Anim Genet. 2010. 41:449-53. Review.

Jaffé A, Bush A. Cystic fibrosis: review of the decade. Monaldi Arch Chest Dis. 2001. 56:240-7. Review. Jerosch J. Effects of Glucosamine and Chondroitin Sulfate on Cartilage Metabolism in OA: Outlook on Other Nutrient Partners Especially Omega-3 Fatty Acids. Int J Rheumatol. 2011:969012.

Karlsson EK, Baranowska I, Wade CM, Salmon Hillbertz NH, Zody MC, Anderson N, et al. Efficient mapping of mendelian traits in dogs through genome-wide association. Nat Genet. 2007. 39:1321-8.

Karlsson EK, Lindblad-Toh K. Leader of the pack: gene mapping in dogs and other model organisms. Nat Rev Genet. 2008. 9:713-25. Review.

Khanna C, Lindblad-Toh K, Vail D, London C, Bergman P, Barber L, et al. The dog as a cancer model. Nat Biotechnol. 2006. 24: 1065- 6.

Kijas JW, Lenstra JA, Hayes B, Boitard S, Porto Neto LR, San Cristobal M, et al. Genome- Wide Analysis of the World's Sheep Breeds Reveals High Levels of Historic Mixture and Strong Recent Selection. PLoS Biol. 2012. 10: e1001258. Klonisch T, Fowler PA, Hombach-Klonisch S. Molecular and genetic regulation of testis descent and external genitalia development. Dev Bio. 2004. 270:1-18. Kohler J, Silverman NA, Levitsky S, Pavel DG, Eckner FA, Fang RB. A model of cyanotic heart disease: functional, pathological, and metabolic sequelae in the immature canine heart. J Surg Res. 1984. 37:309-13. Kubota Y, Temelcos C, Bathgate RA, Smith KJ, Scott D, Zhao C, et al. The role of insulin 3, testosterone, Müllerian inhibiting substance and relaxin in rat gubernacular growth. Mol Hum Reprod. 2002. 8:900-5. Labuda M, Morgan K, Glorieux FH. Mapping autosomal recessive vitamin D dependency type I to chromosomal 12q14 by linkage analysis. Am J Hum Genet. 1990. 47:28-36. Lander ES, Botstein D. Homozygosity mapping: a way to map human recessive traits with the DNA of inbred children. Science. 1987. 236:1567-70. Larkin DM, Daetwyler HD, Hernandez AG, Wright CL, Hetrick LA, Boucek L, et al. Whole-genome resequencing of two elite sires for the detection of haplotypes under selection in dairy cattle. Proc Natl Acad Sci U S A. 2012. 109:7693-8. 33

Leader-Williams N. Abnormal testes in reindeer, Rangifer tarandus. J Reprod Fertil. 1979. 57:127-30. Lee CN. Reviewing evidences on the management of patients with motor neuron disease. Hong Kong Med J. 2012. 18:48-55. Lefebvre S, Bürglen L, Reboullet S, Clermont O, Burlet P, Viollet L, et al. Identification and characterization of a spinal muscular atrophy-determining gene. Cell. 1995. 80:155–65.

Levy-Litan V, Hershkovitz E, Avizov L, Leventhal N, Bercovich D, Chalifa-Caspi V, et al. Autosomal-recessive hypophosphatemic rickets is associated with an inactivation mutation in the ENPP1 gene. Am J Hum Genet. 2010. 86: 273-8.

Lindblad-Toh K, Wade CM, Mikkelsen TS, Karlsson EK, Jaffe DB, Kamal M, et al. Genome sequence, comparative analysis and haplotype structure of the domestic dog. Nature. 2005. 438:803-19. Lorenz-Depiereux B, Schnabel D, Tiosano D, Hausler G, Strom TM. Loss-of-function ENPP1 mutations cause both generalised arterial calcification of infancy and autosomal- recessive hypophosphatemic rickets. Am J Hum Genet. 2010. 86: 267-72. Lupski JR, Reid JG, Gonzaga-Jauregui C, Rio Deiros D, Chen DC, Nazareth L, et al. Whole- genome sequencing in a patient with Charcot-Marie-Tooth neuropathy. N Engl J Med. 2010. 362:1181-91.

Malloy PJ, Pike JW, Feldman D. The vitamin D receptor and the syndrome of hereditary 1,25-dihydroxyvitamin D-resistant rickets. Endo Rev 1999. 20: 156-88. Mardis ER, Wilson RK. Cancer genome sequencing: a review. Hum Mol Genet. 2009.18:R163-8. Marques E, Grant JR, Wang Z, Kolbehdari D, Stothard P, Plastow G, et al. Identification of candidate markers on bovine chromosome 14 (BTA14) under milk production trait quantitative trait loci in Holstein. J Anim Breed Genet. 2011. 128:305-13. Meeusen EN, Snibson KJ, Hirst SJ, Bischof RJ. Sheep as a model species for the study and treatment of human asthma and other respiratory diseases. Drug Discov Today Dis Models. 2009. 5: 101-6. Miao XX, Xub SJ, Li MH, Li MW, Huang JH, Dai FY, et al. Simple sequence repeat-based consensus linkage map of Bombyx mori. Proc Natl Acad Sci U S A. 2005. 102:16303-8. Miller MR, Dunham JP, Amores A, Cresko WA, Johnson EA. Rapid and cost-effective polymorphism identification and genotyping using restriction site associated DNA (RAD) markers. Genome Res. 2007. 17: 240-48. Muragaki Y, Mariman ECM, van Beersum SEC, Perälä M, van Mourik JBA, Warman ML, et al. A mutation in the gene encoding the α2 chain of the fibril-associated collagen IX, COL9A2, causes multiple epiphyseal dysplasia (EDM2). Nat Genet. 1996. 12:103-5. 34

Narkis G, Ofir R, Manor E, Landau D, Elbedour K, Birk OS. Lethal congenital contractural syndrome type 2 (LCCS2) is caused by a mutation in ERBB3 (Her3), a modulator of the phosphatidylinositol-3-kinase/Akt pathway. Am J Hum Genet. 2007. 81:589–95. Nation TR, Balic A, Southwell BR, Newgreen DF, Hutson JM. The hormonal control of testicular descent. Pediatr Endocrinol Rev. 2009. 7:22-31. Nef S, Parada LF. Cryptorchidism in mice mutant for Insl3. Nat Genet. 1999. 22:295-9. Nightingale SS, Western P, Hutson JM. The migrating gubernaculum grows like a "limb bud". J Pediatr Surg. 2008. 43:387-90. Nones K, Ledur MC, Zanella EL, Klein C, Pinto LF, Moura AS, et al. Quantitative trait loci associated with chemical composition of the chicken carcass. Anim Genet. 2012. 43:570-6.

Ozkan B. Nutritional rickets. J Clin Res Pediatr Endocrinol. 2010. 2: 137- 43.

Paassilta P, Lohiniva J, Annunen S, Bonaventure J, Le Merrer M, Pai L, et al. COL9A3: a third locus for multiple epiphyseal dysplasia. Am J Hum Genet. 1999. 64:1036-44. Palaisa K, Morgante M, Tingey S, Rafalski A. "Long-range patterns of diversity and linkage disequilibrium surrounding the maize Y1 gene are indicative of an asymmetric selective sweep". Proc Natl Acad Sci U.S.A. 2004. 101: 9885-90. Parker HG, VonHoldt BM, Quignon P, Margulies EH, Shao S, Mosher DS, et al. An expressed fgf4 retrogene is associated with breed-defining chondrodysplasia in domestic dogs. Science. 2009. 325:995-8. Pask AJ, Kanasaki H, Kaiser UB, Conn PM, Janovick JA, Stockton DW, et al. A novel mouse model of hypogonadotrophic hypogonadism: N-ethyl-N-nitrosourea-induced gonadotropin-releasing hormone receptor gene mutation. Mol Endocrinol. 2005. 19:972-81. Porada CD, Tran N, Eglitis M, Moen RC, Troutman L, Flake AW, et al. In utero gene therapy: transfer and long-term expression of the bacterial neo(r) gene in sheep after direct injection of retroviral vectors into preimmune fetuses. Hum Gene Ther. 1998. 9: 1571-85. Price AL, Zaitlen NA, Reich D, Patterson N. New approaches to population stratification in genome-wide association studies. Nat Rev Genet. 2010. 11:459-63. Review. Przewratil P, Paduch DA, Kobos J, Niedzielski J. Expression of estrogen receptor alpha and progesterone receptor in children with undescended testicle previously treated with human chorionic gonadotropin. J Urol. 2004. 172:1112-6. Raadsma HW, ISGC Int. Sheep Genomics Consortium. Linkage Disequilibrium In The Sheep Genome: Findings From The ISGC HapMap Initiative. Abstract:W134. Plant & Animal genomes XVIII Conference, Jan. 9-30, 2010. San Diego, CA. Radpour R, Rezaee M, Tavasoly A, Solati S, Saleki A. Association of long polyglycine tracts (GGN repeats) in exon 1 of the androgen receptor gene with cryptorchidism and penile hypospadias in Iranian patients. J Androl. 2007. 28:164-9. 35

Ramchandra R, Hood S, Frithiof R, McKinley MJ, May CN. The Role of the Paraventricular Nucleus of the Hypothalamus in the Regulation of Cardiac and Renal Sympathetic Nerve Activity in Conscious Normal and Heart Failure Sheep. J Physiol. 2012. Jul 2. Read AP, Thakker RV, Davies KE, Mountford RC, Brenton DP, Davies M, et al. Mapping of human X-linked hypophosphataemic rickets by multilocus linkage analysis. Hum Genet. 1986. 73:267-70. Recabarren SE, Padmanabhan V, Codner E, Lobos A, Durán C, Vidal M, et al. Postnatal developmental consequences of altered insulin sensitivity in female sheep treated prenatally with testosterone. Am J Physiol Endocrinol Metab. 2005. 289:E801-6. Rijli FM, Matyas R, Pellegrini M, Dierich A, Gruss P, Dollé P, et al. Cryptorchidism and homeotic transformations of spinal nerves and vertebrae in Hoxa-10 mutant mice. Proc Natl Acad Sci U S A. 1995. 92:8185-9. Rosen DR, Siddique T, Patterson D, Figlewicz DA, Sapp P, Hentati A, et al. Mutations in Cu/Zn superoxide dismutase gene are associated with familial amyotrophic lateral sclerosis. Nature. 1993. 362:59-62. Rosendo A, Iannuccelli N, Gilbert H, Riquet J, Billon Y, Amigues Y, et al. Microsatellite mapping of quantitative trait loci affecting female reproductive tract characteristics in Meishan x Large White F(2) pigs. J Anim Sci. 2012. 90:37-44. Rothschild MF, Christian LL, Blanchard W. Evidence for multigene control of cryptorchidism in swine. J Hered. 1988. 79:313-4.

Savolainen P, Zhang YP, Luo J, Lundeberg J, Leitner T. "Genetic evidence for an East Asian origin of domestic dogs". Science. 2008. 298: 1610–3.

Scheuner MT, Yoon PW, Khoury MJ. Contribution of Mendelian disorders to common chronic disease: opportunities for recognition, intervention, and prevention. Am J Med Genet C Semin Med Genet. 2004. 125C:50-65. Review. Seelow D, Schuelke M, Hildebrandt F, Nürnberg P. HomozygosityMapper--an interactive approach to homozygosity mapping. Nucleic Acids Res. 2009. 37(Web Server issue):W593-9. Short RW. "Why the elephant has intra-abdominal testes" (On-line). 2005. Accessed at http://www.publish.csiro.au/?act=view_file&file_id=SRB03Ab59.pdf (pdf). Singh A, Ezeasor D. Ultrastructure of Sertoli cells in scrotal testis of unilaterally cryptorchid goat. Prog Clin Biol Res. 1989. 296:159-64. Smith KC, Brown PJ, Morris J, Parkinson TJ. Cryptorchidism in North Ronaldsay sheep. Vet Rec. 2007. 161:658-9. Spayde EC, Joshi AP, Wilcox WR, Briggs M, Cohn DH, Olsen BR. Exon skipping mutation in the COL9A2 gene in a family with multiple epiphyseal dysplasia. Matrix Biol. 2000. 19:121-8. 36

Srinivasan K, Ramarao P. Animal models in type 2 diabetes research: an overview. Indian J Med Res. 2007. 125: 451-72. St Jean G, Gaughan EM, Constable PD. Cryptorchidism in North American cattle: breed predisposition and clinical findings. Theriogenology. 1992. 38:951-8.

Stranger BE, Stahl EA, Raj T. Progress and promise of genome-wide association studies for human complex trait genetics. Genetics. 2011. 187:367-83. Review. Sutter NB, Bustamante CD, Chase K, Gray MM, Zhao K, Zhu L, et al. A single IGF1 allele is a major determinant of small size in dogs. Science. 2007. 316:112-5. Tan YD, Wan C, Zhu Y, Lu C, Xiang Z, Deng HW. An amplified fragment length polymorphism map of the silkworm. Genetics. 2001. 157:1277-84. Teo YY, Small KS, Kwiatkowski DP. Methodological challenges of genome-wide association analysis in Africa. Nat Rev Genet. 11:149-60. Thomas JT, Lin K, Nandedkar M, Camargo M, Cervenka J, Luyten FP. A human chondrodysplasia due to a mutation in a TGF-beta superfamily member. Nat Genet. 1996. 12:315-7. Thonneau PF, Gandia P, Mieusset R. Cryptorchidism: incidence, risk factors, and potential role of environment; an update. J Androl. 2003. 24:155-62. Review. Tieder M, Modai D, Samuel R, Arie R, Halabe A, Bab I, et al. Hereditary hypophosphatemic rickets with hypercalciuria. N Engl J Med. 1985. 312 : 611-7. Vandenberg P, Khillan JS, Prockop DJ, Helminen H, Kontusaari S, Ala-Kokko L. Expression of a partially deleted gene of human type II procollagen (COL2A1) in transgenic mice produces a chondrodysplasia. Proc Natl Acad Sci USA. 1991. 88:7640-4.

Vaysse A, Ratnakumar A, Derrien T, Axelsson E, Rosengren Pielberg G, Sigurdsson S, et al. Identification of genomic regions associated with phenotypic variation between dog breeds using selection mapping. PLoS Genet. 2011. 7:e1002316. Vinet A, Drouilhet L, Bodin L, Mulsant P, Fabre S, Phocas F. Genetic control of multiple births in low ovulating mammalian species. Mamm Genome. 2012. Aug 8. [Epub ahead of print]. Vissing H, D'Alessio M, Lee B, Ramirez F, Godfrey M, Hollister DW. to substitution in the triple helical domain of pro-alpha 1 (II) collagen results in a lethal perinatal form of short-limbed dwarfism. J Biol Chem. 1989. 264:18265-7. Wang K, Li WD, Zhang CK, Wang Z, Glessner JT, Grant SF, et al. A genome-wide association study on obesity and obesity-related traits. PLoS One. 2011. 6:e18939. Wang N, Fang L, Xin H, Wang L, Li S. Construction of a high-density genetic map for grape using next generation restriction-site associated DNA sequencing. BMC Plant Biol. 2012. 12:148. 37

Wang Y, Barthold J, Kanetsky PA, Casalunovo T, Pearson E, Manson J. Allelic variants in HOX genes in cryptorchidism. Birth Defects Res A Clin Mol Teratol. 2007. 79:269-75. Wayne RK, Ostrander EA. Lessons learned from the dog genome. Trends Genet. 2007. 23:557-67. Woelfle JV, Brenner RE, Zabel B, Reichel H, Nelitz M. Schmid-type metaphyseal chondrodysplasia as the result of a collagen type X defect due to a novel COL10A1 nonsense mutation: A case report of a novel COL10A1 mutation. J Orthop Sci. 2011. 16:245-9. Wylie MC, Johnson VM, Carpino E, Mullen K, Hauser K, Nedder A, et al. Respiratory, neuromuscular, and cardiovascular effects of neosaxitoxin in isoflurane-anesthetized sheep. Reg Anesth Pain Med. 2012. 37:152-8. Yates D, Hayes G, Heffernan M, Beynon R. Incidence of cryptorchidism in dogs and cats. Vet Rec. 2003. 152:502-4. Zeggini E, Scott LJ, Saxena R, Voight BF, Marchini JL, Hu T, et al. Meta-analysis of genome-wide association data and large-scale replication identifies additional susceptibility loci for type 2 diabetes. Nat Genet. 2008. 40:638-45. Zhuang Z, Gusev A, Cho J, Pe'er I. Detecting identity by descent and homozygosity mapping in whole-exome sequencing data. PLoS One. 2012.7:e47618 Zimmermann S, Schöttler P, Engel W, Adham IM. Mouse Leydig insulin-like (Ley I-L) gene: structure and expression during testis and ovary development. Mol Reprod Dev. 1997. 47:30- 8. Zimmermann S, Steding G, Emmen JM, Brinkmann AO, Nayernia K, Holstein AF, et al. Targeted disruption of the Insl3 gene causes bilateral cryptorchidism. Mol Endocrinol. 1999. 13:681-91. Zuccarello D, Morini E, Douzgou S, Ferlin A, Pizzuti A, Salpietro DC, et al. Preliminary data suggest that mutations in the CgRP pathway are not involved in human sporadic cryptorchidism. J Endocrinol Invest. 2004. 27:760-4.


CHAPTER 2. A NOVEL NONSENSE MUTATION IN THE DMP1 GENE IDENTIFIED BY A GENOME-WIDE ASSOCIATION STUDY IS RESPONSIBLE FOR INHERITED RICKETS IN CORRIEDALE SHEEP

A paper published in PLoS One 2011;6:e21739.

Xia Zhao1 #, Keren E Dittmer2 #, Hugh T Blair2, Keith G Thompson2, Max F Rothschild1, Dorian J Garrick1,2 *

1Department of Animal Science and Center for Integrated Animal Genomics, Iowa State University, Ames, IA 50011, USA

2Institute of Veterinary, Animal & Biomedical Sciences, Massey University, Palmerston North 4474, New Zealand

# These two authors contributed equally to this work.

ABSTRACT

Inherited rickets of Corriedale sheep is characterized by decreased growth rate, thoracic lordosis and angular limb deformities. Previous outcross and backcross studies indicate that the disease is inherited as a simple autosomal recessive disorder. A genome-wide association study was conducted using the Illumina OvineSNP50 BeadChip on 20 related sheep comprising 17 affected animals and 3 carriers. A homozygous region of 125 consecutive single-nucleotide polymorphism (SNP) loci was identified in all affected sheep, covering 6 Mb on ovine chromosome 6. Among the 35 candidate genes in this region, the dentin matrix protein 1 gene (DMP1) was sequenced, revealing a nonsense mutation, 250C/T, in exon 6. This mutation introduces a stop codon (R145X) and could truncate the C-terminal amino acids. Genotyping by PCR-RFLP for this mutation showed that all 17 affected sheep were “T T”; the 3 carriers were “C T”; 24 phenotypically normal related sheep were either “C T” or “C C”; and 46 unrelated normal control sheep from other breeds were all “C C”. The other SNPs in DMP1 were not concordant with the disease and can all be ruled out as candidates. Previous research has shown that mutations in the DMP1 gene are responsible for autosomal recessive hypophosphatemic rickets in humans, and Dmp1-knockout mice exhibit rickets phenotypes. We believe the R145X mutation to be responsible for the inherited rickets found in Corriedale sheep. A simple diagnostic test can be designed to identify carriers of the defective “T” allele. Affected sheep could be used as animal models for this form of human rickets and for further investigation of the role of DMP1 in phosphate homeostasis.

Keywords: nonsense mutation; DMP1 gene; inherited rickets; sheep; genetic disease

INTRODUCTION

Rickets is a metabolic bone disease of humans and animals, with most cases caused by a nutritional deficiency of either vitamin D or phosphorus. The disease leads to softening and weakening of bone caused by defective mineralization of cartilage at sites of endochondral ossification, and can result in fractures and limb deformities [1], [2], [3]. Rickets may also result from mutations that disrupt genes whose functions are critical for normal bone metabolism. Five types of inherited rickets that share the clinical characteristic of renal phosphate wasting have been described in humans: X-linked hypophosphatemic rickets (XLH), autosomal dominant hypophosphatemic rickets (ADHR), autosomal recessive hypophosphatemic rickets (ARHR) types 1 and 2, and hereditary hypophosphatemic rickets with hypercalciuria (HHRH). A large number of different mutations in PHEX (phosphate regulation gene with homologies to endopeptidases on the X chromosome), comprising single base pair deletions, nonsense mutations and missense mutations, have been identified as being responsible for XLH [4], [5], [6]. ADHR is caused by gain-of-function mutations leading to increased activity of fibroblast growth factor 23, encoded by the gene FGF23 [7], and mutations in the dentin matrix protein 1 gene (DMP1) and the ecto-nucleotide pyrophosphatase/phosphodiesterase 1 gene (ENPP1) have been identified in ARHR types 1 and 2, respectively [8], [9], [10]. The mutation for HHRH occurs in the gene for the NaPi-IIc protein (SLC34A3, human chromosome 9q34), a renal Na-P co-transporter [11]. Two other recessively inherited forms of rickets in humans are due to disruption of either the synthesis or the metabolism of vitamin D. Vitamin D-dependent rickets type I (VDDR-I) is caused by mutations in the CYP27B1 gene, which encodes vitamin D 1-alpha-hydroxylase [12]. Loss-of-function mutations in the vitamin D receptor gene (VDR) are the genetic basis for vitamin D-dependent rickets type II (VDDR-II), which is also called hereditary vitamin D-resistant rickets (HVDRR) [13], [14].

Inherited forms of rickets have been identified frequently in humans but are rare in domestic animals. Recently, an inherited form of rickets was described in purebred Corriedale sheep from a commercial flock in New Zealand, with an incidence of up to 20 lambs out of 1,600 over a 2-year period. Affected sheep were characterized by decreased growth rate, thoracic lordosis and angular limb deformities (Figure 1) [15], with low serum calcium and phosphate concentrations and normal 25-hydroxyvitamin D and 1,25-dihydroxyvitamin D3 concentrations [16]. Embryo transfer and backcross breeding trials have determined that this disease is likely a simple autosomal recessive disorder [16].

Large animal models of human genetic diseases have received extensive clinical attention due to their advantages over rodent models, including greater physiological similarity to human patients, a longer life span and larger body size. Sheep have been used as an animal model for gene therapy (GT) of human diseases, such as post-traumatic osteoarthritis [17] and steroid-induced ocular hypertension [18]. The convenience of performing in utero GT prior to disease onset will help to achieve long-term expression of the exogenous gene without modifying the germ line of the recipient [19]. Based on the hypotheses that Corriedale sheep affected by inherited rickets could be a model for human rickets, and that a genetic marker could help sheep breeders to avoid at-risk matings, we performed a genome-wide association study by genotyping 54,241 evenly distributed single nucleotide polymorphisms (SNPs) in 17 affected Corriedale sheep and 3 carriers using the Illumina OvineSNP50 BeadChip. Analysis of genome-wide high-density SNPs has been shown to be an effective strategy to map and identify causal mutations for recessive diseases [20].

RESULTS

Homozygosity Mapping

The SNP BeadChip genotyping data for this study were submitted to the NCBI Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo) under Super Series accession no. GSE26310. Exactly 3 homozygous regions of more than 10 consecutive SNPs were identified in all 17 affected sheep, whereas the 3 carriers were heterozygous in those regions. The largest homozygous segment consisted of 125 consecutive SNP loci, whereas the other two segments had only 19 and 11 consecutive SNPs. The 125-SNP region started at SNP OAR6_109334543.1 and extended to SNP s03175.1, covering 5.95 Mb (109,334,543 to 115,285,275 bp) on the long arm of sheep chromosome 6 (OAR 6). There were 35 genes located in this region based on the bovine reference sequence (Figures 2a, 2b and Table S2). The second homozygous segment was about 0.70 Mb in size (OAR6: 118,884,834 to 119,587,883 bp) and was separated from the first homozygous segment on the same chromosome (OAR 6) by only 3.2 Mb. The third homozygous segment covered a 0.66 Mb region (OAR15: 1,131,166 to 1,654,496 bp) on the short arm of OAR 15. The second and third homozygous regions contained 8 and 6 positional candidate genes, respectively (Table S2). One gene in the 125-SNP region, DMP1, was considered the most plausible candidate because of its biological functions in mineralization and phosphate homeostasis and previous reports of its involvement in human rickets [8], [21]. By comparison with the bovine genomic reference sequence, this gene was found to lie at 112.20 Mb on OAR6q (Table S2), to span more than 16 kb and to contain 6 exons.
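The search for homozygous segments shared by all affected animals can be illustrated with a short sketch. This is a minimal Python illustration under stated assumptions, not the pipeline used in the study: the function names, the two-letter genotype encoding (for example "AA" or "AG", with any other string treated as a missing call), and the requirement that every animal carries the same homozygous genotype at each SNP are all hypothetical choices made for the example. The run-length threshold is left as a parameter.

```python
from typing import Dict, List, Tuple


def is_homozygous(call: str) -> bool:
    """True for a called genotype with two identical alleles, e.g. "AA"."""
    return len(call) == 2 and call[0] == call[1] and call[0] in "ACGT"


def shared_homozygous_runs(
    genotypes: Dict[str, List[str]], min_snps: int = 10
) -> List[Tuple[int, int]]:
    """Find runs of consecutive SNPs at which all animals share one homozygous genotype.

    genotypes maps an animal ID to its genotype calls for one chromosome,
    ordered by map position. Returns inclusive (start, end) SNP indices of
    every run containing at least min_snps SNPs.
    """
    animals = list(genotypes.values())
    n_snps = len(animals[0])
    runs: List[Tuple[int, int]] = []
    start = None  # index where the current run began, if any
    for i in range(n_snps):
        calls = {calls_for_animal[i] for calls_for_animal in animals}
        shared = len(calls) == 1 and is_homozygous(next(iter(calls)))
        if shared and start is None:
            start = i
        elif not shared and start is not None:
            if i - start >= min_snps:
                runs.append((start, i - 1))
            start = None
    # Close a run that extends to the last SNP on the chromosome.
    if start is not None and n_snps - start >= min_snps:
        runs.append((start, n_snps - 1))
    return runs


# Toy data (hypothetical): two affected animals, six ordered SNPs.
toy = {
    "affected_1": ["AA", "CC", "GG", "AG", "TT", "TT"],
    "affected_2": ["AA", "CC", "GG", "GG", "TT", "TT"],
}
runs_min3 = shared_homozygous_runs(toy, min_snps=3)
runs_min2 = shared_homozygous_runs(toy, min_snps=2)
```

In practice the SNP indices of each run would be mapped back to base-pair positions to obtain segment boundaries, and carriers would be checked separately for heterozygosity within each candidate segment.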

Candidate Gene Sequencing

Fine mapping was carried out by individual PCR amplification and sequencing of all the exons of the DMP1 gene to identify the mutation responsible for this disease (Table S1). A BLAST search of the sheep genomic reference sequence with the bovine DMP1 coding sequence (ENSBTAT00000025468) returned several alignments for exon 6 with high identity (91%-96%). Because exon 6 is the last and largest exon, covering 88% of the open reading frame of DMP1, it was prioritized for re-sequencing. Direct sequencing of 3 carriers and 1 normal control revealed 1 published SNP (s18093.1, OAR6: 112,213,812 bp) and 8 novel SNPs in this exon (Figure 2e). The complete genomic sequence of exon 6 of the sheep DMP1 gene was submitted to GenBank (http://www.ncbi.nlm.nih.gov/genbank/) under accession no. HQ600592.

Moreover, we successfully amplified exons 1, 3, 4 and 5 of the sheep DMP1 gene using primers designed from the intronic sequences flanking each exon in the cow reference sequence. Amplification of exon 2 failed with several primer pairs. Two intronic SNPs (SNP1 and SNP2) and a 1-bp insertion/deletion (indel: A) were identified by direct sequencing in intron 1, intron 5 and intron 3, respectively (Table S1).

Mutation Analysis

Seventeen base pairs upstream of SNP s18093.1, a SNP (C/T, OAR6: 112,213,795 bp) located at +250 bp of exon 6 caused a non-synonymous amino acid change in the DMP1 protein. This C -> T transition introduced a stop codon (R145X) that truncated the DMP1 protein at the 145th amino acid (Figure 3a, Figure 3b and Figure S1). Genotyping of this mutation by PCR-RFLP (Polymerase Chain Reaction-Restriction Fragment Length Polymorphism) (Figure 3c) and direct sequencing showed that all 17 affected animals had the “T T” genotype; the 3 carriers were “C T”; 24 phenotypically normal related sheep were either “C T” or “C C”; and 46 unrelated control sheep of other breeds were all “C C” (Table 1). This mutation was 100% concordant with the recessive pattern of inheritance in affected, carrier and normal individuals. Moreover, genotyping of the other 8 SNPs in exon 6, which surround the R145X mutation in DMP1, showed that these markers were in incomplete linkage disequilibrium with both the disease and the mutant SNP (Figure 2f). The 3 genomic variants identified in introns 1, 3 and 5 were not located at either splice donor or splice acceptor sites. Therefore, those variants could be ruled out as candidate mutation loci. RT-PCR/RFLP (Figure 3d) was performed for the R145X mutation and showed the same digestion pattern as PCR-RFLP.
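A short worked check makes the R145X change concrete. The sketch below is illustrative only: it assumes the standard genetic code and a coding-strand C -> T substitution, and the function name is hypothetical. The codon itself is not stated in this section. Enumerating the six arginine codons shows that only one of them can be converted to a stop codon by a single C -> T transition.

```python
# Standard genetic code (assumption): stop codons and arginine codons, DNA alphabet.
STOP_CODONS = {"TAA", "TAG", "TGA"}
ARG_CODONS = ["CGT", "CGC", "CGA", "CGG", "AGA", "AGG"]


def c_to_t_nonsense(codons):
    """List (codon, position, mutant) for every single C->T change that creates a stop."""
    hits = []
    for codon in codons:
        for pos, base in enumerate(codon):
            if base != "C":
                continue
            mutant = codon[:pos] + "T" + codon[pos + 1:]
            if mutant in STOP_CODONS:
                hits.append((codon, pos, mutant))
    return hits


arg_to_stop = c_to_t_nonsense(ARG_CODONS)
# Only CGA -> TGA qualifies, with the change at the first codon position (index 0).
```

Under these assumptions, an arginine-to-stop change produced by a C -> T transition implies a CGA codon mutated to TGA.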

Taken together, our results strongly suggest that the R145X substitution is causative for

inherited rickets in Corriedale sheep, disrupting the function of the DMP1 protein, preventing

successful bone mineralization and phosphate homeostasis.
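
The R145X change described above can be checked against the standard genetic code. The following is a minimal sketch, not part of the original analysis: it lists which codons for a given amino acid become a stop codon through a single C -> T transition read on the coding strand. The function name `c_to_t_nonsense_codons` is hypothetical, and the text does not state the affected codon explicitly.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def c_to_t_nonsense_codons(amino_acid="R"):
    """Return (codon, mutant) pairs where one C -> T change gives a stop codon."""
    hits = []
    for codon, aa in CODON_TABLE.items():
        if aa != amino_acid:
            continue
        for i, base in enumerate(codon):
            if base == "C":
                mutant = codon[:i] + "T" + codon[i + 1:]
                if CODON_TABLE[mutant] == "*":  # "*" marks a stop codon
                    hits.append((codon, mutant))
    return hits


print(c_to_t_nonsense_codons("R"))  # [('CGA', 'TGA')]
```

Under these assumptions, CGA -> TGA is the only arginine codon change of this kind, which is consistent with a C -> T transition producing R145X.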

DISCUSSION

Homozygosity mapping is a powerful method to map simple recessive traits since the regions

immediately flanking the mutation locus will likely be homozygous by descent in closely

related offspring. The affected animals in this study resulted from consanguineous mating

between pairs of offspring descended from one common male ancestor in the embryo transfer

and backcross studies used to characterize the inheritance of the disease [15], [16]. The

region containing the mutation should in these circumstances span a large chromosome

segment that would be progressively diminished in subsequent meioses when recombination

events occur near to the mutation. This mapping strategy has been used successfully for

recessive diseases in both animal and human studies [20], [22]. Homozygosity mapping in this study was based on the SNP genotypes of affected individuals and the genomic locations of those SNPs, and used command-line UNIX tools including awk, sort and join to identify any regions of more than 10 consecutive homozygous SNPs, equivalent to about 0.5 cM or more. It was presumed that one such region would contain the mutation associated with inherited rickets in Corriedale sheep. In this study, 3 homozygous regions met the criterion of exceeding 10 consecutive homozygous SNP markers. Previously reported rickets genes, including PHEX, FGF23, ENPP1, SLC34A3, CYP27B1 and VDR, were not located in any of these homozygous segments based on the ovine genome assembly v1.0, whereas DMP1 was contained in the largest region. This is not surprising because PHEX- or FGF23-related rickets has a distinctive inheritance mode that differs from the autosomal recessive

inheritance of the rickets studied here. The severe hypophosphatemia in affected sheep

indicates disturbed phosphate balance within the body. Therefore, the DMP1 gene was

considered to be a high priority candidate gene. The nonsense mutation we identified in the

DMP1 gene was 100% concordant with the recessive pattern of inheritance for the disease

studied.

DMP1 encodes a serine-rich acidic protein and is a member of the Small Integrin-Binding Ligand, N-linked Glycoprotein (SIBLING) family, which mineralizes tissues such as bones and teeth by binding strongly to hydroxyapatite using an Arg-Gly-Asp (RGD) tripeptide [23].

Dual functions of the DMP1 protein have been reported based on the in vitro studies. The

nonphosphorylated cytoplasmic DMP1 protein translocates to the cell nucleus and initially

acts as a transcriptional factor to enhance gene transcription of the osteoblast-specific genes

such as alkaline phosphatase and osteocalcin, and then it moves out to the extracellular

matrix during the osteoblast to osteocyte transition phase, to promote mineralization and

phosphate homeostasis [24], [25]. Loss of function mutations in the human DMP1 gene have

been shown to cause autosomal recessive hypophosphatemic rickets in different ethnic

groups including Turkish, consanguineous Spanish, Lebanese and Japanese populations.

These variant mutations in DMP1 include deletions in exon 6, nucleotide substitution in the

splice acceptor sequence of intron 2, and missense mutations in exon 2 or exon 3 that

introduce a premature termination codon [8], [21], [26]. The Dmp1-null mouse model also exhibits hypophosphatemic rickets, which indicates the important role of the DMP1 gene in normal bone formation [21]. Since the majority of dentin matrix protein 1 is encoded by the sixth and last exon of DMP1, as is typical for proteins in the SIBLING family [27], the novel nonsense mutation (250C/T, R145X) identified here in the ovine DMP1 gene is a plausible mutation leading to loss of function following nonsense-mediated mRNA decay, which has been shown to occur in human patients. Nonsense-mediated mRNA decay is a post-transcriptional mechanism in eukaryotic species that eliminates abnormal transcripts carrying premature termination codons [28]. The previous studies strongly support

the nonsense mutation we identified in exon 6 as causing loss of function of the DMP1

protein and being responsible for ARHR type 1 in Corriedale sheep.

Despite the wide variety of mutated genes that result in the different forms of hereditary

hypophosphatemic rickets, the clinical features are relatively similar (with the exception of

HHRH). In XLH, ADHR, ARHR type 1 and 2, the overriding common feature is increased

serum concentration and activity of FGF23. FGF23 inhibits the NPT-2a co-transporter (Na-Pi co-transporter) in the kidneys, leading to loss of phosphate from renal tubules into the urine (phosphaturia). FGF23 also inhibits the activity of CYP27B1, thereby inhibiting production of active vitamin D (1,25-dihydroxyvitamin D3) [29]. As a result of the increased FGF23 concentrations, patients with XLH, ADHR, and ARHR type 1 and 2 present with rickets, hypophosphatemia, phosphaturia and inappropriately normal serum 1,25-dihydroxyvitamin D3 concentrations [9], [30]. Corriedale sheep with the mutation in DMP1 also developed rickets, were severely hypophosphatemic and had normal serum 1,25(OH)2D3 concentrations [1], [16]. The mechanism by which mutations in PHEX, DMP1 or ENPP1 result in upregulation of FGF23 is unknown. An unusual feature of ARHR type 2 is that mutations in ENPP1 are also associated with generalized arterial calcification of infancy; again, the mechanism by which a mutation in ENPP1 can result in both a disease characterised by hypomineralization and one characterised by hypermineralization is unknown [9]. In HHRH,

CYP27B1 is actually upregulated, resulting in increased 1,25(OH)2D3 concentrations, and perhaps explaining the hypercalciuria [11].

In conclusion, this is the first report of ARHR type 1 caused by a mutation in the DMP1 gene in a sheep population. In this study, we demonstrate the successful use of high-density SNP arrays to localize a mutation causing a recessive inherited defect. The results can be rapidly applied as a selection marker to identify carriers of the defective “T” allele in order to avoid at-risk matings, improve animal welfare and decrease economic losses. At this stage, the incidence of the mutation in the Corriedale sheep population is unknown; however, it is suspected to be widespread, and testing of New Zealand Corriedale sheep is planned. More importantly, due to their size and relatively low cost of maintenance, Corriedale sheep with inherited rickets could be used as an animal model for this form of human rickets. Even though the human and sheep mutations are not exactly the same, both disrupt the function of the DMP1 protein. The sheep model can be used for further investigation of the role of DMP1 in phosphate homeostasis and to explore the potential for early medical intervention to reduce the development of the disease in homozygous individuals.

MATERIALS and METHODS

Ethics statement

Approvals dealing with DNA or tissue collection from animals were obtained from the Massey University Animal Ethics Committee (Approval Number 05/123). The Iowa State University Animal Care & Use Committee did not require additional approvals.

Animals

In total, 17 affected Corriedale sheep (9 males, 8 females) and 27 phenotypically normal crossbred Corriedale sheep (6 males, 21 females), as well as 46 normal controls (Texel and crossbred Perendale sheep), were examined. The age of the examined animals ranged from a fetus of 133 days gestation to a 3-year-old animal. Eight affected sheep were from the original commercial property at Marlborough, New Zealand, and 9 affected sheep were their offspring. The 27 phenotypically normal crossbred Corriedale sheep were progeny from the mating of unrelated Romney ewes with the Corriedale carrier ram (F1 generation from outcross), or progeny of the F1 generation daughters mated back to the carrier ram (F2 generation from backcross). SNP genotypes were obtained for the 17 affected and 3 carrier Corriedale sheep, while fine mapping was conducted on all 44 pure or crossbred Corriedale sheep. The 46 normal control sheep used for validation of the mutation were from unrelated populations without evidence of rickets.

SNP genotyping

DNA was extracted from blood using a Roche MagNA Pure automated analyser. DNA samples were randomized on genotyping plates according to disease status. Genotyping was performed using the Ovine SNP50 BeadChip containing 54,241 SNPs (Illumina, San Diego, CA, USA) with standard procedures at GeneSeek Inc., Lincoln, NE, USA. Retained SNPs were required to have a call rate > 80% and a minimum GenTrain score of 0.25. All genotypes with a satisfactory call rate and GenTrain score were used regardless of minor allele frequency (MAF).
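
The SNP retention rule above can be written as a simple predicate. This is a minimal sketch only: the function name `passes_qc` and its parameter names are hypothetical, and it assumes the call rate is expressed as a fraction and that the GenTrain threshold is inclusive ("minimum ... of 0.25").

```python
def passes_qc(call_rate, gentrain, min_call_rate=0.80, min_gentrain=0.25):
    """Return True if a SNP meets the stated thresholds.

    call_rate: fraction of samples with a genotype call (assumed 0-1 scale).
    gentrain:  GenTrain clustering score for the SNP.
    No minor allele frequency filter is applied, as stated in the text.
    """
    return call_rate > min_call_rate and gentrain >= min_gentrain


# Hypothetical SNP records: (name, call rate, GenTrain score)
snps = [("snp_a", 0.99, 0.81), ("snp_b", 0.75, 0.90), ("snp_c", 0.95, 0.10)]
retained = [name for name, cr, gt in snps if passes_qc(cr, gt)]
print(retained)  # ['snp_a']
```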

SNP chip data analyses

The IBD program used the awk, join and sort UNIX commands to scan the whole genome of the genotyped affected cases and produce a list of all strings of consensus homozygous markers with a length of more than 10 SNPs. The same process was repeated in the carriers to ensure that the regions identified in the affected individuals were not homozygous in the carriers. The homozygosity mapping algorithm involved the following steps. First, SNPs in the Illumina ovine 50K map file were sorted by chromosome and position, and numerically coded in contiguous order. Second, the “A/B” genotypes (provided by GeneSeek Inc.) for each individual were joined by SNP name with the renumbered map file to identify the genotypes by their locations. The joined file was separated into three groups according to disease status: affected, carrier and normal. Third, the number of occurrences of each genotype (“AA”, “AB”, “BB” and no call “--”) was counted for each SNP in the affected and carrier groups. Fourth, the genotype frequencies were calculated for each SNP. In the frequency calculation, the numerator was the count of a given genotype (“AA”, “AB” or “BB”), and the denominator was the total count excluding no calls. Finally, if the highest genotype frequency for a SNP was equal to 1 and the two alleles of this genotype were the same, such as “AA” or “BB”, this SNP was considered a homozygous locus. Spans of contiguous homozygous loci were accumulated and reported if the region included more than 10 loci. Genes in these IBD regions were examined for potential involvement using comparative maps between the ovine and bovine genomes through the Ovine Genome Assembly v1.0 (http://www.livestockgenomics.csiro.au/perl/gbrowse.cgi/oar1.0/#search).
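
The steps above can be sketched in a few lines of Python. This is an illustration of the described logic only, not the awk/join/sort pipeline that was actually used; the function names, the in-memory data layout and the handling of SNPs with no calls at all are assumptions made for the example.

```python
from collections import Counter


def is_consensus_homozygous(calls):
    """True if every called genotype at one SNP is the same homozygote."""
    counts = Counter(c for c in calls if c != "--")  # exclude no calls
    total = sum(counts.values())
    if total == 0:  # assumption: a SNP with no calls is not counted
        return False
    genotype, n = counts.most_common(1)[0]
    # Highest genotype frequency equal to 1, and both alleles the same.
    return n == total and genotype in ("AA", "BB")


def homozygous_runs(snps, min_loci=11):
    """Find runs of contiguous consensus-homozygous SNPs.

    snps: list of (chrom, pos, calls) already sorted by chromosome and
    position; calls holds one genotype string per animal in the group.
    min_loci=11 encodes "more than 10 loci".
    Returns a list of (chrom, start_pos, end_pos, n_loci).
    """
    runs, current = [], []

    def flush():
        if len(current) >= min_loci:
            runs.append((current[0][0], current[0][1], current[-1][1], len(current)))
        current.clear()

    for chrom, pos, calls in snps:
        if current and current[-1][0] != chrom:
            flush()  # a run never spans two chromosomes
        if is_consensus_homozygous(calls):
            current.append((chrom, pos))
        else:
            flush()
    flush()
    return runs
```

Running the same function on the carrier group and discarding any region that also appears there would mirror the check described in the text.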

DMP1 gene sequencing

Primers were designed to amplify all 6 exons of sheep DMP1 (Table S1). The cow DMP1 genomic sequence was used as a template for primer design in order to amplify the first 5 exons of the sheep DMP1 gene. PCR was performed using a 10 µl system with GoTaq DNA polymerase (Promega, Madison, WI). The annealing temperatures were 58 °C or 60 °C. PCR products were purified with ExoSAP-IT® (USB, Cleveland, OH) before pooling from 3 carriers and sequencing commercially using a 3730xl DNA Analyzer.

Mutation analyses

Sheep EST sequence (DY509762) encompassing exon 1 to exon 5 of sheep DMP1, along with the re-sequencing results, was used to predict the DMP1 protein sequence with the ExPASy translate tool (http://ca.expasy.org/tools/dna.html). The predicted sheep DMP1 protein was compared to the wild-type (normal) sheep and cow (AAI49017) proteins using the CLUSTALW alignment tool (http://align.genome.jp/). Each SNP variant was analyzed to check whether it could induce a missense or nonsense mutation. The identified SNPs were genotyped on all 44 related Corriedale sheep and/or 46 normal controls by restriction enzyme digestion (PCR-RFLP) or direct sequencing. The genotyping results from all 9 SNPs distributed in exon 6 of the DMP1 gene were formatted and loaded into Haploview software (http://www.broadinstitute.org/haploview/haploview). The linkage disequilibrium (LD) between pairs of SNPs in exon 6 was analyzed with Haploview. Under the LD tab, r2, the squared correlation coefficient, was saved to show the LD between each pair of SNPs. The related LD color scheme was produced accordingly.
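
For readers unfamiliar with the r2 statistic, the sketch below computes it for two biallelic SNPs from haplotype counts. It is an illustration under a simplifying assumption: the four haplotype counts are taken as known, whereas software working from unphased genotypes must first estimate haplotype frequencies. The function name and argument names are hypothetical, and this is not Haploview's implementation.

```python
def r_squared(n_AB, n_Ab, n_aB, n_ab):
    """Squared correlation between alleles at two biallelic SNPs.

    Arguments are counts of the four haplotypes (allele at SNP 1 followed by
    allele at SNP 2). r2 = D^2 / (pA * (1 - pA) * pB * (1 - pB)),
    with D = pAB - pA * pB.
    """
    total = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total
    p_B = (n_AB + n_aB) / total
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("r2 is undefined when either SNP is monomorphic")
    d = p_AB - p_A * p_B
    return d * d / denom


print(r_squared(10, 0, 0, 10))  # 1.0, complete LD
print(r_squared(5, 5, 5, 5))    # 0.0, no LD
```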

RNA extraction and Reverse transcriptase-polymerase chain reaction (RT-PCR)

RNA was extracted from fibroblast cells cultured in vitro from skin biopsies of affected and control sheep to confirm that the mutation existed at the transcript level. Fibroblast cell culture was performed as previously described [16]. Total RNA was extracted using Tri Reagent as per the manufacturer’s instructions (Sigma-Aldrich Co., USA). RT-PCR was performed using the Superscript® one-step RT-PCR system with Platinum® Taq polymerase (Invitrogen Corp., USA). The RT-PCR conditions were: 55ºC for 30 min, 94ºC for 2 min, 40 cycles of 30 s at 94ºC, 30 s at 58ºC and 30 s at 72ºC, and finally a 5 min elongation step at 72ºC.

Restriction enzyme digestion

The nonsense mutation (C -> T) created a new recognition site for the restriction enzyme NlaIII (New England Biolabs Inc., USA). DNA fragments from the PCR and RT-PCR reactions were subjected to restriction enzyme digestion with NlaIII. The reaction mix was incubated at 37ºC overnight, and the products were analyzed on a 2.5% (w/v) ultra-pure agarose gel.
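
How a C -> T change can alter the banding pattern may be easier to see with a toy digest. The sketch below uses a made-up 129-bp sequence, not the real DMP1 amplicon, with one pre-existing NlaIII site (CATG) and a second site created by the transition. The fragment sizes are therefore purely illustrative; only the band counts are meant to parallel the normal, affected and carrier patterns. The function name `digest` and the cut position (after the G of CATG) are assumptions stated in the comments.

```python
def digest(seq, site="CATG", cut_offset=4):
    """Return fragment lengths after cutting seq at every occurrence of site.

    cut_offset=4 assumes the enzyme cuts immediately after the 4-base site
    on the strand shown.
    """
    cuts, start = [], 0
    while True:
        i = seq.find(site, start)
        if i == -1:
            break
        cuts.append(i + cut_offset)
        start = i + 1
    edges = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(edges, edges[1:]) if b > a]


# Hypothetical amplicon: one existing CATG site, then a CACGA context in
# which a C -> T change (CGA -> TGA) creates a second CATG site.
wild = "G" * 30 + "CATG" + "T" * 50 + "CACGA" + "G" * 40
mutant = wild[:86] + "T" + wild[87:]

normal_bands = digest(wild)      # [34, 95]      -> 2 bands
affected_bands = digest(mutant)  # [34, 54, 41]  -> 3 bands
carrier_bands = sorted(set(normal_bands) | set(affected_bands))  # 4 bands
print(normal_bands, affected_bands, carrier_bands)
```

A heterozygous carrier has both alleles in the reaction, so its lane shows the union of the two patterns.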

REFERENCES

1. Dittmer KE, Thompson KG, Blair HT (2009) Pathology of inherited rickets in Corriedale sheep. J Comp Pathol 141: 147-155.
2. Fitch LWN (1943) Rickets in hoggets, with a note on the aetiology and definition of the disease. Australian Vet J 19:2.
3. Ozkan B (2010) Nutritional rickets. J Clin Res Pediatr Endocrinol 2: 137-143.
4. HYP Consortium (1995) A gene (HYP) with homologies to endopeptidases is mutated in patients with X-linked hypophosphatemic rickets. Nat Genet 11: 130-136.
5. Sabbagh Y, Jones AO, Tenenhouse HS (2000) PHEXdb, a locus-specific database for mutations causing X-linked hypophosphatemia. Hum. Mutat 16: 1-6.
6. Ichikawa S, Traxler EA, Estwick SA, Curry LR, Johnson ML, et al. (2008) Mutational survey of the PHEX gene in patients with X-linked hypophosphatemic rickets. Bone 43: 663-666.
7. ADHR Consortium (2000) Autosomal dominant hypophosphataemic rickets is associated with mutations in FGF23. Nat Genet. 26:345-348.
8. Lorenz-Depiereux B, Bastepe M, Benet-Pages A, Amyere M, Wagenstaller J, et al. (2006) DMP1 mutations in autosomal recessive hypophosphatemia implicate a bone matrix protein in the regulation of phosphate homeostasis. Nat Genet 38: 1248-1250.
9. Lorenz-Depiereux B, Schnabel D, Tiosano D, Hausler G, Strom TM (2010) Loss-of-function ENPP1 mutations cause both generalised arterial calcification of infancy and autosomal-recessive hypophosphatemic rickets. Am J Hum Genet 86: 267-272.
10. Levy-Litan V, Hershkovitz E, Avizov L, Leventhal N, Bercovich D, et al. (2010) Autosomal-recessive hypophosphatemic rickets is associated with an inactivation mutation in the ENPP1 gene. Am J Hum Genet 86: 273-278.
11. Bergwitz C, Roslin NM, Tieder M, Loredo-Osti JC, Bastepe M, et al. (2006) SLC34A3 mutations in patients with hereditary hypophosphatemic rickets with hypercalciuria predict a key role for the sodium-phosphate cotransporter NaPi-IIc in maintaining phosphate homeostasis. Am J Hum Genet 78: 179-192.
12. Alzahrani AS, Zou M, Baitei EY, Alshaikh OM, Al-Rijjal RA, et al. (2010) A novel G102E mutation of CYP27B1 in a large family with vitamin D-dependent rickets type 1. Clin Endocrinol Metab 95: 4176-4183.
13. Malloy PJ, Pike JW, Feldman D (1999) The vitamin D receptor and the syndrome of hereditary 1,25-dihydroxyvitamin D-resistant rickets. Endo Rev 20: 156-188.
14. Malloy PJ, Xu R, Peng L, Peleg S, Al-Ashwal A, et al. (2004). Hereditary 1,25-dihydroxyvitamin D resistant rickets due to a mutation causing multiple defects in vitamin D receptor function. Endocrine 145: 5106-5114.
15. Thompson KG, Dittmer KE, Blair HT, Fairley RA, Sim DF (2007) An outbreak of rickets in Corriedale sheep: evidence for a genetic aetiology. N Z Vet J 55: 137-142.
16. Dittmer KE, Howe L, Thompson KG, Stowell KM, Blair HT, et al. (2010) Normal vitamin D receptor function with increased expression of 25-hydroxyvitamin D(3)-24-hydroxylase in Corriedale sheep with inherited rickets. Res Vet Sci 91:362-369.
17. Hurtig M, Frederic D, Darilyn F, Betina L, Antonio C, et al. (2008) BMP-7 Gene Therapy for Mitigation of Post-traumatic Osteoarthritis in Sheep. Eur Cell Mater 16: 52
18. Gerometta R, Spiga MG, Borras T, Candia OA (2010) Treatment of sheep steroid-induced ocular hypertension with a glucocorticoid-inducible MMP1 gene therapy virus. Invest Ophthalmol Vis Sci 51: 3042-3048.
19. Porada CD, Tran N, Eglitis M, Moen RC, Troutman L, et al. (1998) In utero gene therapy: transfer and long-term expression of the bacterial neo(r) gene in sheep after direct injection of retroviral vectors into preimmune fetuses. Hum Gene Ther 9: 1571-1585.
20. Charlier C, Coppieters W, Rollin F, Desmecht D, Agerholm JS, et al. (2008) Highly effective SNP-based association mapping and management of recessive defects in livestock. Nat Genet 40: 449-454.
21. Feng JQ, Ward LM, Liu S, Lu Y, Xie Y, et al. (2006) Loss of DMP1 causes rickets and osteomalacia and identifies a role for osteocytes in mineral metabolism. Nat Genet 38: 1310-1315.
22. Lander ES, Botstein D (1987) Homozygosity mapping: a way to map human recessive traits with the DNA of inbred children. Science 236: 1567-1570.
23. Fisher LW, Fedarko NS (2003) Six genes expressed in bones and teeth encode the current members of the SIBLING family of proteins. Connect Tissue Res 44 Suppl 1: 33-40.
24. Narayanan K, Ramachandran A, Hao J, He G, Park KW, et al. (2003) Dual functional roles of dentin matrix protein 1. Implications in biomineralization and gene transcription by activation of intracellular Ca2+ store. J Biol Chem 278: 17500-17508.
25. Schiavi SC (2006) Bone talk. Nat Genet 38: 1230-1231.
26. Koshida R, Yamaguchi H, Yamasaki K, Tsuchimochi W, Yonekawa T, et al. (2010) A novel nonsense mutation in the DMP1 gene in a Japanese family with autosomal recessive hypophosphatemic rickets. J Bone Miner Metab 28: 585-590.
27. Fisher LW, Torchia DA, Fohr B, Young MF, Fedarko NS (2001) Flexible structures of SIBLING proteins, bone sialoprotein, and osteopontin. Biochem Biophys Res Commun 280: 460-465.
28. Maquat LE (2004) Nonsense-mediated mRNA decay: splicing, translation and mRNP dynamics. Nat Rev Mol Cell Biol 5: 89-99.
29. Bergwitz C, Jüppner H (2010) Regulation of Phosphate Homeostasis by PTH, Vitamin D, and FGF23. Annu Rev Med 61: 91-104.
30. Dittmer KE, Thompson KG (2011) Vitamin D metabolism and rickets in domestic animals: a review. Vet Pathol 48: 389-407.

Table 1. The genotyping results of the causative mutation (R145X)

Sheep breed        Rickets status        No. sheep     Genotype    No. sheep     Rate of
                   (phenotype)           (phenotype)               (genotype)    concordance
Corriedale         Affected              17            T T         17            100%
                   Known carrier         3             T C         3             100%
                   Phenotype normal a    24            T C, C C    24            100%
Texel, Perendale   Normal                46            C C         46            100%

a These sheep are phenotypically normal offspring from the F1 generation and F2 backcross generations generated by out-crossing and back-crossing trials.

Figure 1. Corriedale sheep with rickets. (a) 1½-year-old Corriedale sheep with inherited rickets showing angular limb deformities of the forelimbs and lordosis of the spine in the mid-thoracic region (white arrow). (b) 1½-year-old Corriedale sheep with inherited rickets showing combined varus and valgus or “windswept” deformity of the forelimbs.

Figure 2. Workflow for discovering the mutation locus of inherited rickets in Corriedale sheep. (a) Sheep chromosome 6 (OAR 6). (b) The IBD region containing the causative locus is located on OAR 6 between 109 and 115 Mb. (c) The black boxes show 35 cow reference genes encompassing the sheep IBD region based on the comparative genomic maps. (d) & (e) The boxes represent exons of the cow and the predicted sheep DMP1 genes. The solid parts of the boxes show the coding region of DMP1. The vertical lines on exon 6 represent the mutant SNP (R145X) and the other 8 SNPs. (f) The number and color in each diamond show the r2 value for each pair of SNPs. Higher numbers and darker colors indicate higher LD. Four SNPs in block 1 show complete mutual linkage disequilibrium (LD) but not with the causative mutation (R145X).

Figure 3. Mutation analyses (a) The chromatogram shows the location of the mutation (Y: C/T). (b) Protein sequence alignments of DMP1. The highlighted R145X substitution induces a stop codon that leads to premature termination of the protein. MUT, R145X mutant; WT, wild type. (c) PCR-RFLP genotyping results for the affected (3 bands) and carrier (4 bands) Corriedale sheep and normal (2 bands) control sheep. M, 1kb marker; A, affected; C, carrier; N, normal. (d) RT-PCR-RFLP results for 2 controls (N1 &N2) and 3 affected sheep (A1, A2 &A3).

Figure S1. Complete predicted DMP1 protein sequence alignments among rickets-affected sheep, wild-type sheep and cow. The bordered amino acid (R145X) marks the position where the C -> T transition induced a stop codon and led to a truncated DMP1 protein at the 145th amino acid.

Table S1. Primer sequence, annealing temperature, PCR amplicon information and genetic variants identified by sequencing.

Primer name  Primer sequence              Annealing   Size  Fragment  Genetic variant             SNP name  SNP
                                          temp. (ºC)  (bp)  name      (position in the fragment)            characteristic
DMP1e1F      5’-CTTTTCCCATCCTTGGGTCT-3’   60          560   DMP1e1    C/T (+453bp)                SNP1      intronic
DMP1e1R      5’-GCACTGTTCTCCCCATCTTT-3’
DMP1e3F      5’-CAAAATGTTATCCCCAGACCA-3’  60          388   DMP1e3    no                          -         no
DMP1e3R      5’-TTCAGCATTTGCAGTGAAGC-3’
DMP1e4F      5’-GGCTAAACACAAGCCAAGGA-3’   60          659   DMP1e4    indel (A) (+284bp)          -         intronic
DMP1e4R      5’-GGAGATTGGGAGGGTCATGT-3’
DMP1e5F      5’-CAGGCCATTTGGAAAGTCAT-3’   60          411   DMP1e5    C/G (+341bp)                SNP2      intronic
DMP1e5R      5’-GAGGAACATTTAGGGCCACA-3’
DMP1e6_2F a  5’-ATGGAAAATGGGGTGACTTG-3’   58          668   DMP1e6_2  C/T (+455bp)                R145X     exonic
DMP1e6_2R a  5’-CATCCCTTCATCGTCGAACT-3’                               C/T (+472bp)                S18033.1  exonic
                                                                      A/G (+493bp)                SNP3      exonic
DMP1e6_5F    5’-ATGAGTCCAGGGGTGACAAC-3’   58          703   DMP1e6_5  C/G (+101bp)                SNP4      exonic
DMP1e6_5R    5’-CAACAATGGGCATCTTTCCT-3’                               C/T (+134bp)                SNP5      exonic
                                                                      C/T (+140bp)                SNP6      exonic
                                                                      A/G (+167bp)                SNP7      exonic
                                                                      A/C (+218bp)                SNP8      exonic
                                                                      C/T (+602bp)                SNP9      3’ UTR

a These primer sets were used for PCR-RFLP genotyping of the mutation R145X.

Table S2. The list of ovine positional candidate genes based on the bovine reference gene sequences.

Ovine chromosome position   Bovine ref.  Bovine ref.    Bovine ref. gene name
                            gene symbol  accession no.

Targeted region 1 (5.95 Mb)
OAR6:109103268..109364732   WDFY3      XM_617252     Similar to WD repeat and FYVE domain-containing protein 3 (Autophagy-linked FYVE protein) (Alfy)
OAR6:110461394..110535578   ARHGAP24   NM_001102234  Rho GTPase activating protein 24
OAR6:111226629..111449410   PTPN13     NM_174590     Protein tyrosine phosphatase, non-receptor type 13 (APO-1/CD95 (Fas)-associated phosphatase)
OAR6:111458495..111488852   SLC10A6    NM_001081738  Solute carrier family 10 (sodium/bile acid cotransporter family), member 6
OAR6:111512006..111515283   UBE2D3P    NM_001075135  Similar to Putative ubiquitin-conjugating enzyme E2 D3-like protein
OAR6:111512014..111515294   LOC783264  XM_001249763  Similar to ubiquitin-conjugating enzyme E2D 3
OAR6:111512014..111515294   LOC783114  XM_001250124  Similar to ubiquitin-conjugating enzyme E2D 3
OAR6:111600842..111740902   AFF1       XM_001249488  Similar to AF4/FMR2 family, member 1
OAR6:111756587..111811197   KLHL8      XM_612186     Similar to KIAA1378 protein, transcript variant 1
OAR6:111885205..111904838   HSD17B13   NM_001046616  Hydroxysteroid (17-beta) dehydrogenase 13
OAR6:111920170..111969527   HSD17B11   NM_001046286  Hydroxysteroid (17-beta) dehydrogenase 11
OAR6:112013056..112050582   NUDT9      NM_001101096  Nudix (nucleoside diphosphate linked moiety X)-type motif 9
OAR6:112063439..112116046   SPARCL1    NM_001034302  SPARC-like 1 (hevin)
OAR6:112199546..112216300*  DMP1*      NM_174038     Dentin matrix acidic phosphoprotein 1
OAR6:112365289..112411784   MAN2B2     XM_601803     Similar to mannosidase, alpha, class 2B, member 2
OAR6:112588875..112650749   PPP2R2C    XM_001250700  Similar to Serine/threonine-protein phosphatase 2A 55 kDa regulatory subunit B gamma isoform
OAR6:112757862..112757903   LOC617028  XM_882441     Hypothetical LOC617028
OAR6:112759544..112781041   LOC782334  XM_001250977  Hypothetical LOC782334
OAR6:112780837..112910375   JAKMIP1    NM_001102251  Janus kinase and microtubule interacting protein 1
OAR6:113024908..113058856   LOC615433  XM_867237     Hypothetical LOC615433
OAR6:113144679..113208484   CRMP1      XM_580336     Similar to Dihydropyrimidinase-related protein 1 (DRP-1) (Collapsin response mediator protein 1)
OAR6:113213443..113312073   EVC        NM_174747     Ellis van Creveld syndrome
OAR6:113335352..113473188   EVC2       NM_173927     Ellis van Creveld syndrome 2
OAR6:113451908..113473188   LOC786485  XM_001254147  Hypothetical LOC786485
OAR6:113787927..114038860   STK32B     XM_607574     Similar to serine/threonine kinase 32B
OAR6:114257470..114261815   CYTL1      XM_581416     Similar to Cytokine-like protein 1 precursor (Protein C17)
OAR6:114391388..114395645   MSX1       NM_174798     Msh homeobox 1
OAR6:114700887..114900429   STX18      NM_001099719  Syntaxin 18
OAR6:114899716..114940430   NSG1       NM_001077957  Neuron specific gene family member 1
OAR6:114899716..114901567   LOC789276  XM_001256071  Hypothetical LOC789276
OAR6:115074361..115095979   ZNF509     XM_583985     Similar to hCG2039195, transcript variant 1
OAR6:115101064..115121797   LYAR       NM_001031769  Ly1 antibody reactive homolog (mouse)
OAR6:115135963..115151278   TMEM128    NM_001034454  Transmembrane protein 128
OAR6:115207964..115221470   LOC525643  XM_603996     Similar to otopetrin
OAR6:115249664..115251638   DRD5       XM_604584     Dopamine receptor D5

Targeted region 2 (0.70 Mb)
OAR6:118851890..118883051   PDE6B      NM_174418     Phosphodiesterase 6B, cGMP-specific, rod, beta
OAR6:118941179..118943474   GPI7       NM_001024518  GPI7 protein
OAR6:118976061..119063813   LOC614799  XM_866430     Similar to solute carrier family 2 (facilitated glucose transporter), member 9
OAR6:119204026..119245137   WDR1       NM_001046346  WD repeat domain 1 (WDR1)
OAR6:119417009..119417107   LOC786062  XM_001788422  Similar to protein 518B-like
OAR6:119417009..119417587   ZNF518B    XM_584181     Similar to Zinc finger protein 518B

Targeted region 3 (0.66 Mb)
OAR15:1220402..1223574      KIAA1826   NM_001046078  Hypothetical protein LOC511226
OAR15:1338164..1344393      KBTBD3     XM_602528     Kelch repeat and BTB (POZ) domain containing 3
OAR15:1418829..1442937      AASDHPPT   XM_001250765  Similar to aminoadipate-semialdehyde dehydrogenase-phosphopantetheinyl transferase
OAR15:1484534..1488246      ANKRD49    NM_001014965  Ankyrin repeat domain 49
OAR15:1503540..1575517      MRE11A     XM_603439     Double-strand break repair protein meiotic recombination 11 homolog A
OAR15:1596294..1619791      GPR83      XM_615580     Similar to G protein-coupled receptor 83
OAR15:1220402..1223574      KIAA1826   NM_001046078  Hypothetical protein LOC511226

CHAPTER 3. A MISSENSE MUTATION IN AGTPBP1 WAS IDENTIFIED IN

SHEEP WITH A LOWER MOTOR NEURON DISEASE

A paper published in Heredity 2012;109:156-62

Xia Zhao1, Suneel K Onteru1, Keren E Dittmer2, Kathleen Parton2, Hugh T Blair2, Max F

Rothschild1, Dorian J Garrick1,2

1Department of Animal Science and Center for Integrated Animal Genomics, Iowa State

University, Ames, IA 50011, USA

2Institute of Veterinary, Animal & Biomedical Sciences, Massey University, Palmerston

North 4474, New Zealand

Running title: A missense mutation in AGTPBP1 and sheep LMN disease

Word count for main text: 22,172 characters

Keywords: AGTPBP1; lower motor neuron disease; missense mutation;

degeneration; sheep

ABSTRACT

A type of lower motor neuron disease inherited as an autosomal recessive trait in Romney sheep is characterized by normal appearance at birth, followed by progressive weakness and tetraparesis after the first week of life. Here we carried out genome-wide homozygosity mapping using Illumina OvineSNP50 BeadChips on lambs descended from one carrier ram, including 19 sheep diagnosed as affected and 11 of their parents that were therefore known carriers. A homozygous region of 136 consecutive single-nucleotide polymorphism (SNP) loci on chromosome 2 was common to all affected sheep and was the basis for searching for positional candidate genes. Other homozygous regions shared by all affected sheep spanned 8 or fewer SNP loci. The 136-SNP region contained the sheep ATP/GTP-binding protein 1 (AGTPBP1) gene. Mutations in this gene have been shown to be related to Purkinje cell degeneration (pcd) phenotypes including ataxia in mice. One missense mutation, c.2909G>C in exon 21 of AGTPBP1, was discovered; it induces an Arg to Pro substitution (p.Arg970Pro) at amino acid 970, a conserved residue for the catalytic activity of AGTPBP1. Genotyping of this mutation showed 100% concordance with the recessive pattern of inheritance in affected, carrier, phenotypically normal and unrelated normal individuals. This is the first report showing that a mutant AGTPBP1 is associated with a lower motor neuron disease in a large animal model. Our finding raises the possibility of human patients with the same etiology caused by this gene or other genes in the same pathway of neuronal development.

INTRODUCTION

Motor neuron diseases (MND) are a group of disorders involving selective degeneration of upper motor neurons (UMN) and/or lower motor neurons (LMN). The incidence of MND varies from 0.6 to 2.4 per 100,000 person-years in different ethnic human populations (Lee, 2010). UMN originate in the cerebral cortex or the brain stem and provide indirect stimulation of the target muscles, while LMN connect the brain stem and spinal cord to muscle fibers and provide nerve signals between the UMN and muscles. Common characteristics of MND are muscle weakness and/or spastic paralysis. Signs of LMN lesions are muscle atrophy, fasciculations and weakness when the associated LMN degenerate. Signs such as spastic tone, hyperreflexia and an upgoing plantar reflex appear as UMN degenerate (Lee, 2012). Weakness of the respiratory or bulbar muscles can even cause death of MND patients due to respiratory insufficiency (Borasio et al., 1998). Based on the time of onset, types of motor neuron involvement and detailed clinical features, MND in humans can be classified into six types of diseases: amyotrophic lateral sclerosis (ALS), hereditary spastic paraplegia (HSP), primary lateral sclerosis (PLS), spinal bulbar muscular atrophy (SBMA), spinal muscular atrophy (SMA) and lethal congenital contracture syndrome (LCCS). MND are considered incurable and no effective treatment is currently available. However, gene therapy, in which genes are transferred to patients using adeno-associated virus (AAV) vectors, is a promising therapeutic approach (Nizzardo et al., 2012). Previous evidence has shown that genetic variants within more than 30 genes might be responsible for MND (Dion et al., 2009). For example, mutations in the Cu/Zn-binding superoxide dismutase (SOD1) gene have been identified in about 20% of patients with familial ALS (Rosen et al., 1993). However, the genetic causes remain unclear for more than half of human patients with MND.

In 1999, an extended family of Romney lambs was found to have a suspected hereditary

MND, characterized by poor muscle tone and progressive weakness from about 1 week of age, leading to severe tetraparesis, recumbency and muscle atrophy. Lambs remained alert up to four weeks of age when they were euthanized. The muscles were atrophied and pale, and histologically had variation in muscle fiber diameter and small angular fibers. Large foamy macrophages were present in cerebrospinal fluid, and there were degenerate neurons in the spinal cord leading to a decrease in the number of motor neurons, consistent with LMN diseases. No histological changes were seen in the cerebellum (Anderson et al., 1999).


Outcross and backcross breeding trials were consistent with this LMN disease having simple autosomal recessive inheritance.

To find the causative locus responsible for this disease, we performed genome-wide homozygosity mapping and then fine mapped the mutation using a positional candidate gene approach within the largest homozygous genomic region found in the affected sheep. The goals of this research were to discover the genetic basis of this disease, to help sheep breeders avoid matings between carrier animals, and to provide a potential animal model for gene therapy treatment of human LMN disease.

METHODS

Animals

Approvals for DNA or tissue collection from animals were obtained from the Massey University Animal Ethics Committee (Approval Number 99/135). The Iowa State University Animal Care & Use Committee did not require additional approvals.

A single Romney putative carrier ram was mated in New Zealand to 130 crossbred Finnish Landrace ewes to produce 115 daughters that were subsequently backcrossed to their sire. The backcrosses produced 5 affected sheep out of 47 lambs in the first lambing, and 11 out of 71 in the second lambing. The observed rate of affected sheep was consistent with the 1 in 8 expectation for a homozygous recessive disorder from this mating scheme. Affected offspring and their dams, now proven to be carriers, formed the basis of the experimental population. High density SNP genotypes were obtained from 19 affected sheep, which included 3 affected lambs from the original Romney flock, and from 11 known carrier sheep. Fine mapping using novel SNPs was conducted on 52 Romney or Romney/Finn crosses, comprising the 30 animals above with known disease status and 22 potential carriers which were


phenotypically normal but sired by the carrier ram. In addition, 85 control sheep, consisting of 14 normal Romney sheep, 25 crossbred Texel sheep and 46 crossbred Corriedale sheep, were used for validation of the causative mutation. These sheep came from unrelated populations without evidence of LMN disease.
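
The 1 in 8 expectation can be checked directly against the reported lambing counts. The following is a minimal Python sketch and not part of the original analysis. It assumes that the outcross ewes were non-carriers, so that half of the daughters carry the allele and one quarter of the lambs from carrier daughters are affected (1/2 × 1/4 = 1/8), and it uses an exact two-sided binomial test as one reasonable choice of test. The function name `binomial_two_sided_p` is hypothetical.

```python
from math import comb

def binomial_two_sided_p(k: int, n: int, p: float) -> float:
    """Exact two-sided binomial p-value.

    Sums the probability of every outcome that is no more likely than
    the observed count k.
    """
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    cutoff = probs[k] * (1 + 1e-9)  # small tolerance for floating-point ties
    return min(1.0, sum(q for q in probs if q <= cutoff))

# (affected, total) lambs per lambing, as reported in the text.
lambings = [(5, 47), (11, 71)]
affected = sum(a for a, _ in lambings)
total = sum(n for _, n in lambings)

# Assumption: outcross ewes were non-carriers, so half of the daughters are
# carriers and a quarter of lambs from carrier x carrier matings are affected.
expected_rate = 0.5 * 0.25
expected_count = total * expected_rate
p_value = binomial_two_sided_p(affected, total, expected_rate)

print(f"observed: {affected}/{total} = {affected / total:.3f}")
print(f"expected: {expected_count:.2f}/{total} = {expected_rate:.3f}")
print(f"exact two-sided binomial p-value: {p_value:.3f}")
```

With the counts above, 16 of 118 lambs were affected against an expected 14.75, and a large p-value indicates no evidence against the 1 in 8 ratio.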

SNP genotyping

DNA was extracted from blood using the QIAamp DNA Blood Mini Kit (QIAGEN, CA, USA) and the Roche MagNA Pure automated analyser (Roche, USA). Genotyping of 54,241 evenly distributed polymorphic SNP markers across the genome was performed using the Ovine SNP50 BeadChip (Illumina, San Diego, CA, USA) by GeneSeek Inc. (Lincoln, NE, USA). Standard procedures were followed with a PCR- and ligation-free protocol, which routinely achieves high average call rates and accuracy.

Genome wide homozygosity mapping

To define the molecular genetic basis for the LMN disease in these Romney sheep, a search for putative homozygous-by-descent regions (also termed identical by descent, IBD) was initiated by performing a genome-wide scan using an in-house UNIX script as described in our previous publication (Zhao et al., 2011). The SNP genotypes were ordered according to their genomic location as provided in the Illumina manifest, and used to identify regions of consecutive homozygous SNP loci that were common to all affected animals. The same procedure was applied to known carriers to ensure that the identified region was not simply fixed across all animals. A threshold of 10 SNPs defining a candidate consecutive homozygous region was chosen after calculating the expected IBD fragment size for 19 affected lambs produced through the outcross and backcross experiments, assuming that exactly one crossover event occurred in every meiosis on the chromosome carrying the causal mutation


with uniform probability along the chromosome expressed in linkage units (cM). The homozygous fragment containing the causative mutation would have exceeded 0.5 cM, or about 10 consecutive SNPs, with probability > 95%. The expected size of the IBD region that contains the causative mutation depends in part on the number of meioses that separate the common ancestor from the affected lambs, and on the number of affected individuals. The IBD region that included the causative mutation had a 23% probability of being shorter than 2 cM and a 32% probability of being longer than 5 cM. The mean expected length of the IBD region was about 4 cM. Genes in the identified IBD regions were examined for potential involvement using comparative homology between ovine and bovine genomes as provided through the Ovine Genome Assembly v1.0 (http://www.livestockgenomics.csiro.au/perl/gbrowse.cgi/oar1.0/#search).
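
The core of the homozygosity scan described above can be sketched in a few lines. The in-house UNIX script itself is not shown in this chapter, so the Python below is only an illustration under stated assumptions: genotypes are coded "AA", "AB" or "BB", SNPs are already sorted by genomic position, and the function names (`shared_homozygous_runs`, `all_carriers_heterozygous`) and the toy data are hypothetical. The carrier check is one simple way to confirm that a region is not fixed across all animals.

```python
from typing import List, Tuple

HOMOZYGOUS = {"AA", "BB"}  # assumed genotype coding; "AB" is heterozygous

def shared_homozygous_runs(
    genotypes: List[List[str]], min_snps: int = 10
) -> List[Tuple[int, int]]:
    """Find runs of consecutive SNPs at which all animals share one homozygous call.

    genotypes[i][j] is the call of animal i at SNP j, with SNPs already
    sorted by genomic position. Returns inclusive (start, end) index pairs
    for runs spanning at least min_snps SNPs.
    """
    n_snps = len(genotypes[0])
    runs, start = [], None
    for j in range(n_snps + 1):  # one extra step closes a run at the end
        shared = False
        if j < n_snps:
            calls = {animal[j] for animal in genotypes}
            shared = len(calls) == 1 and calls <= HOMOZYGOUS
        if shared and start is None:
            start = j
        elif not shared and start is not None:
            if j - start >= min_snps:
                runs.append((start, j - 1))
            start = None
    return runs

def all_carriers_heterozygous(run: Tuple[int, int], carriers: List[List[str]]) -> bool:
    """True if every carrier has at least one heterozygous call inside the run."""
    start, end = run
    return all("AB" in animal[start : end + 1] for animal in carriers)

# Toy data (hypothetical): 3 affected and 2 carrier animals, 7 ordered SNPs.
affected = [
    ["AA", "AA", "BB", "AA", "AB", "BB", "BB"],
    ["AA", "AA", "BB", "AA", "AA", "BB", "BB"],
    ["AA", "AA", "BB", "AA", "BB", "BB", "AA"],
]
carriers = [
    ["AA", "AB", "BB", "AB", "AB", "BB", "BB"],
    ["AB", "AA", "AB", "AA", "AA", "BB", "AB"],
]

runs = shared_homozygous_runs(affected, min_snps=3)
print(runs)
print([all_carriers_heterozygous(r, carriers) for r in runs])
```

In this sketch a missing genotype call would simply break a run; a real scan would need an explicit rule for no-calls.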

Positional candidate genes sequencing

Primer3 software (version 0.4.0) and the reference sequences from Ovine Genome Assembly v1.0 were used to design PCR primers flanking the coding regions of ovine AGTPBP1 (Supplementary Table S1). Among the many genes in the identified homozygous regions, AGTPBP1 was considered to be the best candidate gene because of its role in neural functions as reported by several publications (Harris et al., 2000; Fernandez-Gonzalez et al., 2002). PCR was performed using a 10 µl cocktail mixture and a standard program as previously described (Zhao et al., 2011), with an annealing temperature of 58ºC or 60ºC. Negative controls were included to check for the presence of contaminants in all reactions. PCR products were verified by 1.5% agarose gel electrophoresis and treated with ExoSAP-IT® (Affymetrix, USA) before pooling them from 2 affected and 2 carrier sheep. The pooled PCR products were sequenced commercially using a 3730xl DNA Analyzer (Applied


Biosystems, USA). The Sequencher software (version 3.0) was used for sequence alignment and comparison.

Mutation analyses

Since annotation of the ovine genome has not yet been completed, the bovine AGTPBP1 reference mRNA sequence (GenBank: XM_001252536.3) was used as a query in cross-species BLAST searches to identify sheep ESTs. Both the bovine AGTPBP1 mRNA and the corresponding ovine ESTs served as references to search for the coding regions within the ovine AGTPBP1 gene. The AGTPBP1 protein sequence was predicted using the successfully amplified and re-sequenced exonic regions of the gene from the sheep population, the bovine AGTPBP1 reference mRNA sequence and the ovine ESTs. To obtain a functional annotation of this protein, the conserved domains within the sheep protein sequence were searched by querying the complete protein sequence against the Conserved Domains database (database: CDD-40526 PSSMs) at NCBI (http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi), with the maximum number of hits set to 500. Multiple sequence alignments provide a basis for conserved domain models. The specific hits and superfamilies were reported for the queried protein. Moreover, the conserved protein residues involved in conserved features such as binding or catalysis were annotated when the query sequence residues were in the Protein database. The sequence alignments of AGTPBP1 from multiple species, including human, mouse, chicken, frog, purpuratus, sea squirt, trichoplax and sheep, were displayed to show the conserved feature sites. The fold structures of the conserved domain of AGTPBP1 in sheep (residues 850-1150) and mouse (residues 800-1110) were predicted using the 3D-PSSM program (http://www.sbg.bio.ic.ac.uk/3dpssm/index2.html). The known


crystal structures of carboxypeptidases in the PDB database helped to generate the predicted structure of AGTPBP1. Sequence alignment was performed on the ClustalW2 server (http://www.ebi.ac.uk/Tools/msa/clustalw2/) among sheep and mouse AGTPBP1, human carboxypeptidase A2 (CPA2; PDB ID: 1DTD) and cattle carboxypeptidase A (CPA; PDB ID: 1HDU). This increased our confidence in the conserved features of the AGTPBP1 secondary structure and in the amino acids critical for carboxypeptidase catalysis. A three-dimensional macromolecular structure of the conserved domain was searched for and downloaded using homologous sequences with confirmed crystal structures (http://www.ncbi.nlm.nih.gov/Structure/MMDB/mmdb.shtml). Each SNP variant was analyzed to check whether it could induce a missense or nonsense mutation. The identified SNPs were then genotyped using restriction enzyme digestion (PCR-RFLP: Polymerase Chain Reaction-Restriction Fragment Length Polymorphism) on the affected and carrier sheep as well as on other normal controls.

Restriction enzyme digestion (PCR-RFLP)

The sequences of the PCR primer pair “LMN_new” (Supplementary Table S1) were 5’ TGTTAGCACCTCCTCTTTGCT 3’ and 5’ AGCAGACCCTTGGCATGATA 3’. DNA fragments from the PCR reactions were subjected to restriction enzyme digestion with AciI (New England Biolabs Inc., USA). The missense mutation in AGTPBP1 removes a recognition site of the AciI enzyme. The 10 µl reaction mix contained 1 µl of NEB buffer 3, 0.5 units of AciI and 2 µl of PCR product. The reaction mix was incubated at 37ºC overnight, and the products were analyzed on a 2.0% (w/v) ultra-pure agarose gel.
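
The band patterns implied by this assay can be sketched in Python as follows. This is an illustration rather than the authors' procedure. It assumes a single AciI site in the amplicon that is present on the G allele and absent on the C allele, and it uses the 611 bp amplicon and the 519 bp and 92 bp digestion products reported in the Results. The function names `expected_bands` and `call_genotype` are hypothetical.

```python
AMPLICON_BP = 611               # "LMN_new" PCR product size reported in the Results
CUT_FRAGMENTS_BP = (519, 92)    # AciI digestion products of the G allele

def expected_bands(genotype: str) -> list:
    """Predict gel band sizes (bp, largest first) for a c.2909 genotype such as 'GC'."""
    bands = set()
    for allele in genotype:
        if allele == "G":       # reference allele keeps the AciI site and is cut
            bands.update(CUT_FRAGMENTS_BP)
        elif allele == "C":     # mutant allele lacks the site and stays intact
            bands.add(AMPLICON_BP)
        else:
            raise ValueError(f"unexpected allele: {allele!r}")
    return sorted(bands, reverse=True)

def call_genotype(bands) -> str:
    """Infer the genotype from an observed band pattern."""
    patterns = {tuple(expected_bands(g)): g for g in ("GG", "GC", "CC")}
    return patterns[tuple(sorted(bands, reverse=True))]

for genotype in ("GG", "GC", "CC"):
    print(genotype, expected_bands(genotype))
```

Because the two digestion products sum to the amplicon length, an uncut 611 bp band alongside the 519 bp and 92 bp bands identifies a heterozygote under these assumptions.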

RESULTS

Detection of IBD regions


The whole genome scan discovered 209 homozygous fragments consisting of 3 or more consecutive SNPs shared by all 19 affected lambs. The size distribution of common homozygous fragments in these lambs is shown in Figure 1. Only one large homozygous segment exceeded the 10-SNP threshold; it consisted of 136 consecutive SNP loci identified in all 19 affected sheep, whereas the 11 carriers exhibited heterozygosity in this segment. The region started at SNP s56017.1 and extended to SNP OAR2_36428027.1. This segment is physically positioned on the short arm of ovine chromosome 2 (OAR 2) from 29,284,415 bp to 36,428,027 bp, covering a region of about 7 Mb (Figure 2a and b). Based on the bovine reference sequences, this segment likely encompassed 36 genes (Figure 2d and Supplementary Table S2). The gene ATP/GTP binding protein, transcript variant 1 (AGTPBP1), also named Nna1 (nervous system nuclear protein induced by axotomy protein 1 homolog), was the most plausible candidate due to its involvement in axonal regeneration and its causal relationship with the mouse pcd phenotype (Harris et al., 2000; Fernandez-Gonzalez et al., 2002). The ovine AGTPBP1 gene spans about 0.17 Mb and encodes a 1,225-amino-acid intracellular protein that contains a predicted zinc carboxypeptidase domain near its C-terminus (Harris et al., 2000). This gene was initially cloned from regenerating spinal cord neurons of the mouse (supplied by OMIM 606830).

Gene coding region re-sequencing and mutation analyses

Functional mutations in candidate genes were examined by re-sequencing all available exons using primers based on the adjacent intronic sequences of these genes. The ovine AGTPBP1 gene was predicted to have 25 exons based on the bovine reference gene and protein sequences. All exons of this gene except exons 11, 12 and 17 were successfully amplified, and three exonic SNPs were identified. The first exonic SNP (called c.2909G>C)


was located at the 6th bp of exon 21 of the sheep AGTPBP1 gene. The other two were located at the 93rd and 99th bp of exon 22. An additional 17 SNPs were identified in either introns or the 3’ UTR of this gene. These were immediately excluded as causative loci since none were located in the splice donor/acceptor regions or other important regulatory regions, and none were in concordance with sheep phenotypes (data not shown). After comparing our findings with the bovine coding sequence of AGTPBP1, we observed that the first exonic SNP (c.2909G>C) introduced a missense transversion (Arg>Pro) at amino acid 970 (p.Arg970Pro) (Figure 2e), and the third exonic SNP (c.3133T>C) also resulted in a missense mutation causing a Pro to Ser substitution (p.Pro1045Ser). The second exonic SNP (c.3126C>T) was a synonymous mutation and did not change any amino acid. A conserved domain named M14_Nna1 (CDD: cd06906), spanning amino acid residues 859 to 1136 of AGTPBP1 in sheep, was predicted in this study using the “CD-search” option available at NCBI. This domain is a zinc-binding carboxypeptidase domain, which hydrolyzes the peptide bonds at the C-terminal part of certain amino acids in polypeptide chains. The sequence alignment among multiple species demonstrated 2 specific feature hits: metallocarboxypeptidase activity and zinc binding. Nine conserved amino acid residues were involved in these two feature hits (Supplementary Figure S1 and Figure S2; Gomis-Rüth et al., 1999; Aloy et al., 2001). Three conserved residues (His920, Glu923 and His1017), expressed as the motif “HXXE…H”, are responsible for zinc binding, and all 9 conserved residues (His920, Glu923, Arg970, Asn979, Arg980, His1017, Gly1018, Met1027 and Glu1102) are important for substrate binding and catalysis of this protein (Supplementary Figure S1 and Figure 3a). In terms of the size and relative positioning of β-sheets and α-helices, the overall predicted secondary structures of AGTPBP1 in sheep and


mouse were the same as those of other metallocarboxypeptidases in human and cattle according to the 3D-PSSM program. Features and conserved amino acids and motifs are located in similar positions. For example, the well-known zinc-binding HXXE…H motif lies between the second β-sheet and the first α-helix as reported (Wang et al., 2006). Six residues (His920, Glu923, Arg970, Asn979, Arg980 and Glu1102) remain conserved among these proteins (Supplementary Figure S3). The first exonic missense mutation (c.2909G>C), identified in exon 21, changes the middle nucleotide of the codon “CGC”, so that Arg970 (codon “CGC”), one of the conserved residues responsible for substrate binding and catalysis, is replaced by Pro970 (codon “CCC”) (Figure 3b). Molecular modeling based on the crystal structure of a homologous protein, a putative carboxypeptidase (PDB ID: 3K2K), also showed that Arg970 forms part of a loop (the short yellow section of the middle loop in Figure 3c), which might be a key site for substrate binding or catalytic activity. The amino acid residue affected by the exonic missense mutation in exon 22 (c.3133T>C; p.Pro1045Ser) varies considerably among different species (Figure 3b). This indicates that this missense mutation may not affect the function of the AGTPBP1 protein, as the residue is involved neither in zinc binding nor in substrate binding and catalysis.
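
The relationship between the coding-sequence position c.2909, codon 970 and the CGC to CCC change can be verified with simple arithmetic. The Python sketch below is illustrative only. It assumes that coding positions are numbered from 1 at the first base of the start codon with no offset, uses the standard genetic code, and takes the reference codon “CGC” from the text; the helper names are hypothetical.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, with codons enumerated in TCAG order.
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
PURINES = {"A", "G"}

def codon_coordinates(cdna_pos: int) -> tuple:
    """Map a 1-based coding-sequence position to (codon number, position in codon)."""
    return (cdna_pos - 1) // 3 + 1, (cdna_pos - 1) % 3 + 1

def substitution_class(ref: str, alt: str) -> str:
    """Transition if both bases are purines or both pyrimidines, else transversion."""
    return "transition" if (ref in PURINES) == (alt in PURINES) else "transversion"

def mutate_codon(codon: str, pos_in_codon: int, alt: str) -> str:
    """Return the codon with the base at the 1-based position replaced by alt."""
    return codon[: pos_in_codon - 1] + alt + codon[pos_in_codon:]

codon_number, pos_in_codon = codon_coordinates(2909)  # c.2909G>C
ref_codon = "CGC"                                      # codon reported in the text
alt_codon = mutate_codon(ref_codon, pos_in_codon, "C")

print(codon_number, pos_in_codon)
print(ref_codon, CODON_TABLE[ref_codon], "->", alt_codon, CODON_TABLE[alt_codon])
print(substitution_class("G", "C"))
```

Under the same numbering assumption, `codon_coordinates(3126)` falls on the third base of codon 1042 and `codon_coordinates(3133)` on the first base of codon 1045, which matches the residue numbers given for the two exon 22 SNPs.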

The Arg970Pro mutation was genotyped by PCR-RFLP (Figure 2e). The 611 bp “LMN_new” PCR fragment amplified from normal sheep had one restriction site recognized by the restriction enzyme AciI, which cut it into 2 fragments of 519 bp and 92 bp. However, the Arg970Pro mutation removed the restriction site, so the 611 bp fragment amplified from the affected lambs remained intact after digestion. Therefore, the PCR fragment amplified from sheep with a genotype of “GG” was cut into 2 fragments of 519 bp and 92 bp; the one with a genotype of “GC”


had 3 fragments of 611 bp, 519 bp and 92 bp; the one with a genotype of “CC” had a single fragment of 611 bp. Our genotyping results showed that all 19 affected sheep were “CC”, the 11 known carriers were “GC”, and the 22 phenotypically normal but possible carrier sheep were either “GC” or “GG”. In contrast, the 85 unrelated control sheep were all “GG” (Table 1). This mutation was 100% concordant with the recessive pattern of inheritance in affected, carrier, phenotypically normal and unrelated normal individuals. The two exonic SNPs in exon 22 (c.3126C>T and c.3133T>C) were genotyped by direct sequencing. However, neither of these two SNPs was perfectly concordant with disease status based on the counts for each of the genotypes (Table 1). At this stage, we excluded these two SNPs as causative mutations for this defect on the basis of their genotyping results and mutation analyses. Taken together, our results strongly suggest that the Arg970Pro substitution is the causative mutation for LMN disease in Romney sheep and that it could act by decreasing the function of the AGTPBP1 protein.
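
The concordance rates in Table 1 can be reproduced from the genotype counts. The Python sketch below is a hedged reconstruction, since the chapter does not define the calculation explicitly. It assumes that an affected animal is concordant when homozygous for the allele shared by all affected sheep (called the risk allele here), a known carrier when heterozygous, and a phenotypically normal animal when not homozygous for the risk allele. This definition is inferred because it reproduces the percentages in Table 1. The counts are taken from Table 1 and the function name `concordance` is hypothetical.

```python
def concordance(status: str, hom_risk: int, het: int, hom_other: int) -> float:
    """Percent of animals whose genotype fits a recessive model for their status."""
    total = hom_risk + het + hom_other
    matching = {
        "affected": hom_risk,        # expected homozygous for the risk allele
        "carrier": het,              # expected heterozygous
        "normal": het + hom_other,   # anything except homozygous risk
    }[status]
    return round(100 * matching / total, 1)

# Counts from Table 1 as (homozygous risk allele, heterozygous, homozygous other).
table1 = {
    "c.2909G>C": {"affected": (19, 0, 0), "carrier": (0, 13, 0), "normal": (0, 13, 9)},
    "c.3126C>T": {"affected": (19, 0, 0), "carrier": (3, 10, 0), "normal": (1, 13, 6)},
    "c.3133T>C": {"affected": (19, 0, 0), "carrier": (10, 3, 0), "normal": (18, 2, 0)},
}

for snp, groups in table1.items():
    rates = {status: concordance(status, *counts) for status, counts in groups.items()}
    print(snp, rates)
```

Only c.2909G>C fits the recessive model in every group under this definition, which is the basis for excluding the two exon 22 SNPs.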

DISCUSSION

The AGTPBP1 gene encodes a member of the cytosolic carboxypeptidase subfamily (M14D subfamily) (Kalinina et al., 2007; Rodriguez de la Vega et al., 2007). This protein plays a role in protein turnover by cleaving proteasome-generated peptides into amino acids (Berezniuk et al., 2010), enzymatically removes gene-encoded glutamic acids or tyrosine from the C terminus of proteins (Rogowski et al., 2010; Kalinina et al., 2007), and also plays a role in neuronal bioenergetics, at least in the mouse (Chakrabarti et al., 2010). Similar to other cytosolic carboxypeptidases, the mouse Nna1 protein is broadly expressed in brain, pituitary, eye, testis and other tissues (Kalinina et al., 2007). Mutations or abnormal expression of Nna1 have been found to cause a Purkinje cell


degeneration (pcd) phenotype in mice, characterized by degeneration of Purkinje cells, mitral cells of the olfactory bulb, thalamic neurons and retinal photoreceptor cells as well as degeneration of sperm (Fernandez-Gonzalez et al., 2002; Mullen et al., 1976). Twelve alleles in mouse mutants related to Agtpbp1 have been summarized by the Jackson Laboratory (http://www.informatics.jax.org/searches/accession_report.cgi?id=MGI:2159437). Among these, loss of function of Agtpbp1 was indicated as being responsible for the pcd3J and pcd5J mouse mutants (Fernandez-Gonzalez et al., 2002; Wang and Morgan, 2007; Chakrabarti et al., 2008). Expression of full-length Nna1 successfully rescued Purkinje cell loss and ataxic behavior in pcd3J mice, and retinal photoreceptor loss and Purkinje cell degeneration in pcd5J mice (Wang et al., 2006; Chakrabarti et al., 2008).

In this study, the missense mutation c.2909G>C in exon 21 of AGTPBP1 induces an Arg to Pro substitution (p.Arg970Pro) at amino acid 970 (R970). Based on the crystal structure of human CPA2, it has been reported that arginine 127 (R127) in CPA2 is an electrophile site crucially responsible for the catalytic activity along with glutamate 270 (Mangani et al., 1994; Garcia-Saez et al., 1997; Christianson et al., 1989). Similarly, the homologous arginine residue R127 (numbered according to mature bovine carboxypeptidase A1) in C. elegans AGBL4 (ATP/GTP binding protein-like 4) has been predicted to stabilize the oxyanion hole in the S1’ site of the carboxypeptidase domain (Rodriguez de la Vega et al., 2007). Sequence comparison and structural modeling of sheep AGTPBP1 with other carboxypeptidases predicted that R970 is located in a region known as the substrate-binding pocket and catalytic center of metallocarboxypeptidases, as in mouse Nna1 (Supplementary Figure S3; Wang et al., 2006). These results reinforce the view that R970 in sheep is a highly conserved site that is critical for catalytic activity. Therefore, we concluded


the Arg970Pro substitution could be a candidate causative mutation, possibly altering the characteristics of the AGTPBP1 protein and very likely impairing its catalytic activity.

Compared with the phenotype of the original pcd mouse model, the affected sheep do not have apparent degeneration of Purkinje cells but show their own specific characteristics, including degeneration of neurons in the ventral horns of the spinal cord and brain stem, Wallerian degeneration of motor nerve fibers and atrophy of skeletal muscles. The phenotype of these sheep is more similar to the symptoms in the Sod1 transgenic mice (mSOD1), which were constructed as animal models for human ALS (Gurney et al., 1994). It is interesting to note that the Lox (lysyl oxidase) gene, which encodes a protein with an important role in the formation and repair of extracellular matrix, was found to be up-regulated in both mSOD1 mice and Agtpbp1pcd-sid mice (Li et al., 2004; Li et al., 2010; Lucero and Kagan, 2006). The Agtpbp1pcd-sid mutant was characterized as a deletion of Agtpbp1 exon 7 that produced a stop codon in exon 8. Absence of the Agtpbp1 protein decreases dendrite development of cerebellar Purkinje cells by changing the expression of downstream components, including Lox propeptide and the NF-κB signaling pathway in mice (Li et al., 2010). Therefore, we hypothesize that the p.Arg970Pro substitution might impair the catalytic activity of AGTPBP1 and decrease its ability to turn over LOX protein. The overabundant LOX protein would cause abnormal dendrite development in LMN and thereby produce the phenotypes seen in these sheep.

To confirm that the Arg970Pro substitution is truly a causative mutation, it would be desirable to measure mRNA expression of the AGTPBP1 and LOX genes and their enzymatic activities in the spinal cord, brain stem and cerebellar tissues. It could exclude the


possibility that the Arg970Pro mutation in the AGTPBP1 protein is only a coincidental finding from the genetic mapping study, or that this mutation disrupts expression of another gene in the same region of the genome. Furthermore, point mutagenesis of the mouse Nna1 gene at arginine 962 (R962), the residue homologous to R970 in sheep, to generate transgenic mice, followed by rescue of the phenotypes in these mice with the intact full-length Nna1 gene, would provide further evidence that the Arg970Pro substitution is a causative mutation in sheep.

The missense mutation c.2909G>C (p.Arg970Pro) in AGTPBP1 appears to be a spontaneous one arising from a single sheep family. To avoid spreading this disease throughout the sheep population, a simple genetic test can easily be designed based on our results to identify and remove carriers of the defective c.2909C allele, and to decrease the risk of economic losses in the future.

The expression of NNA1 protein by regenerating motor neurons in human nervous tissues indicated that NNA1 was a key factor for motor neuronal degeneration (Harris et al., 2000). To date, no human cases have been reported with spontaneous mutations in AGTPBP1 that would implicate the gene as a cause of a LMN disease. Neither have such mutations been reported in other large mammalian species. Mice and hamsters are convenient animal models for studies of human neural diseases (Mashimo et al., 2009; Akita et al., 2007). However, there are still significant structural and functional differences between rodents and large mammalian species. Even among rodents themselves, gene distributions can differ. Hamsters with undetectable Nna1 mRNA in the brain did not exhibit neurodegenerative features as severe as those of mouse mutants, which could possibly be explained by the different distributions of Nna1 and other Nna1-like genes in the brains of mice and hamsters (Kalinina et al., 2007; Akita et al., 2007; Akita and Arai, 2009). Sheep have large brain size


and long life span relative to rodents, and a gestation period similar to that of humans. Our finding in sheep, a large mammalian species, raises the possibility that similar disease in human patients may also result from mutations in this gene or in other genes in the same pathway of neuronal development. Since there is no effective treatment available for motor neuron diseases, the identification of molecular pathogenetic targets will shed light on the development of gene therapy for this type of disease (Nizzardo et al., 2012). Here we report that LMN disease originating in a Romney ram was likely caused by a mutation in the AGTPBP1 (NNA1) gene. The affected sheep could be used as a valuable animal model for gene therapy if human patients are successfully diagnosed in the future.

ACKNOWLEDGEMENTS

The authors appreciate the financial support provided by the State of Iowa and Hatch Funds and the support from the College of Agriculture and Life Sciences at Iowa State University. Animal studies and diagnoses were funded by Massey University. H.T.B. was partially funded by the National Research Centre for Growth and Development.

Conflict of interest

The authors declare no conflict of interest.

Supplementary information

Supplementary information is available at Heredity’s website.

Data archiving

Data deposited in the Dryad repository: doi:10.5061/dryad.8fp2pc88

REFERENCES

Akita K, Arai S (2009). The ataxic Syrian hamster: an animal model homologous to the pcd mutant mouse? Cerebellum 8: 202-210.


Akita K, Arai S, Ohta T, Hanaya T, Fukuda S (2007). Suppressed Nna1 gene expression in the brain of ataxic Syrian hamsters. J Neurogenet 21: 19-29.

Aloy P, Companys V, Vendrell J, Aviles FX, Fricker LD, Coll M et al. (2001). The crystal structure of the inhibitor-complexed carboxypeptidase D domain II and the modeling of regulatory carboxypeptidases. J Biol Chem 276: 16177-16184.

Anderson PD, Parton KH, Collett MG, Sargison ND, Jolly RD (1999). A lower motor neuron disease in newborn Romney lambs. N Z Vet J 47: 112-114.

Berezniuk I, Sironi J, Callaway MB, Castro LM, Hirata IY, Ferro ES et al. (2010). CCP1/Nna1 functions in protein turnover in mouse brain: Implications for cell death in Purkinje cell degeneration mice. FASEB J 24: 1813-1823.

Borasio GD, Gelinas DF, Yanagisawa N (1998). Mechanical ventilation in amyotrophic lateral sclerosis: a cross-cultural perspective. J Neurol 245 Suppl 2: S7-12.

Chakrabarti L, Eng J, Martinez RA, Jackson S, Huang J, Possin DE et al. (2008). The zinc-binding domain of Nna1 is required to prevent retinal photoreceptor loss and cerebellar ataxia in Purkinje cell degeneration (pcd) mice. Vision Res 48: 1999-2005.

Chakrabarti L, Zahra R, Jackson SM, Kazemi-Esfarjani P, Sopher BL, Mason AG et al. (2010). Mitochondrial dysfunction in NnaD mutant flies and Purkinje cell degeneration mice reveals a role for Nna proteins in neuronal bioenergetics. Neuron 66: 835-847.

Christianson DW, Mangani S, Shoham G, Lipscomb WN (1989). Binding of D-phenylalanine and D-tyrosine to carboxypeptidase A. J Biol Chem 264: 12849-12853.

Dion PA, Daoud H, Rouleau GA (2009). Genetics of motor neuron disorders: new insights into pathogenic mechanisms. Nat Rev Genet 10: 769-782.

Fernandez-Gonzalez A, La Spada AR, Treadaway J, Higdon JC, Harris BS, Sidman RL et al. (2002). Purkinje cell degeneration (pcd) phenotypes caused by mutations in the axotomy-induced gene, Nna1. Science 295: 1904-1906.

Garcia-Saez I, Reverter D, Vendrell J, Aviles FX, Coll M (1997). The three-dimensional structure of human procarboxypeptidase A2. Deciphering the basis of the inhibition, activation and intrinsic activity of the zymogen. EMBO J 16: 6906-6913.

Gomis-Rüth FX, Companys V, Qian Y, Fricker LD, Vendrell J, Aviles FX et al. (1999). Crystal structure of avian carboxypeptidase D domain II: a prototype for the regulatory metallocarboxypeptidase subfamily. EMBO J 18: 5817-5826.

Gurney ME, Pu H, Chiu AY, Dal Canto MC, Polchow CY, Alexander DD et al. (1994). Motor neuron degeneration in mice that express a human Cu,Zn superoxide dismutase mutation. Science 264: 1772-1775.

Harris A, Morgan JI, Pecot M, Soumare A, Osborne A, Soares HD (2000). Regenerating motor neurons express Nna1, a novel ATP/GTP-binding protein related to zinc carboxypeptidases. Mol Cell Neurosci 16: 578-596.


Kalinina E, Biswas R, Berezniuk I, Hermoso A, Aviles FX, Fricker LD (2007). A novel subfamily of mouse cytosolic carboxypeptidases. FASEB J 21: 836-850.

Lee CN (2012). Reviewing evidences on the management of patients with motor neuron disease. Hong Kong Med J 18: 48-55.

Li J, Gu X, Ma Y, Calicchio ML, Kong D, Teng YD et al. (2010). Nna1 mediates Purkinje cell dendritic development via lysyl oxidase propeptide and NF-kappaB signaling. Neuron 68: 45-60.

Li PA, He Q, Cao T, Yong G, Szauter KM, Fong KS et al. (2004). Up-regulation and altered distribution of lysyl oxidase in the central nervous system of mutant SOD1 transgenic mouse model of amyotrophic lateral sclerosis. Brain Res Mol Brain Res 120: 115-122.

Lucero HA, Kagan HM (2006). Lysyl oxidase: an oxidative enzyme and effector of cell function. Cell Mol Life Sci 63: 2304-2316.

Mangani S, Ferraroni M, Orioli P (1994). Interaction of carboxypeptidase A with anions: crystal structure of the complex with the HPO4(2-) anion. Inorg Chem 33: 3421-3423.

Mashimo T, Hadjebi O, Amair-Pinedo F, Tsurumi T, Langa F, Serikawa T et al. (2009). Progressive Purkinje cell degeneration in tambaleante mutant mice is a consequence of a missense mutation in HERC1 E3 ubiquitin ligase. PLoS Genet 5: e1000784.

Mullen RJ, Eicher EM, Sidman RL (1976). Purkinje cell degeneration, a new neurological mutation in the mouse. Proc Natl Acad Sci USA 73: 208-212.

Nizzardo M, Simone C, Falcone M, Riboldi G, Rizzo F, Magri F et al. (2012). Research advances in gene therapy approaches for the treatment of amyotrophic lateral sclerosis. Cell Mol Life Sci 69: 1641-1650.

Rodriguez de la Vega M, Sevilla RG, Hermoso A, Lorenzo J, Tanco S, Diez A et al. (2007). Nna1-like proteins are active metallocarboxypeptidases of a new and diverse M14 subfamily. FASEB J 21: 851-865.

Rogowski K, van Dijk J, Magiera MM, Bosc C, Deloulme JC, Bosson A et al. (2010). A family of protein-deglutamylating enzymes associated with neurodegeneration. Cell 143: 564-578.

Rosen DR, Siddique T, Patterson D, Figlewicz DA, Sapp P, Hentati A et al. (1993). Mutations in Cu/Zn superoxide dismutase gene are associated with familial amyotrophic lateral sclerosis. Nature 362: 59-62.

Wang T, Morgan JI (2007). The Purkinje cell degeneration (pcd) mouse: an unexpected molecular link between neuronal degeneration and regeneration. Brain Res 1140: 26-40.

Wang T, Parris J, Li L, Morgan JI (2006). The carboxypeptidase-like substrate-binding site in Nna1 is essential for the rescue of the Purkinje cell degeneration (pcd) phenotype. Mol Cell Neurosci 33: 200-213.

Zhao X, Dittmer KE, Blair HT, Thompson KG, Rothschild MF, Garrick DJ (2011). A novel nonsense mutation in the DMP1 gene identified by a genome-wide association study is responsible for inherited rickets in Corriedale sheep. PLoS One 6: e21739.


Table 1. The genotyping results of three exonic SNPs

                                                        Exon 21: c.2909G>C (p.Arg970Pro)    Exon 22: c.3126C>T                  Exon 22: c.3133T>C (p.Pro1045Ser)
LMND status         Sheep breed                         CC   CG   GG   Concordance          TT   CT   CC   Concordance          CC   CT   TT   Concordance
Affected            Romney and Romney-sired crossbreds  19   0    0    100%                 19   0    0    100%                 19   0    0    100%
Known carrier       Romney and Romney-sired crossbreds  0    13   0    100%                 3    10   0    76.9%                10   3    0    23.1%
Phenotype normal(a) Romney and Romney-sired crossbreds  0    13   9    100%                 1    13   6    95.0%                18   2    0    10.0%
Normal              Romney and Romney-sired crossbreds  0    0    14   100%
Normal              Texel/Corriedale                    0    0    71   100%

(a) These sheep include phenotypically normal sheep and suspected sheep that died without a clear cause.

Figure 1. The distribution of identical homozygous fragments in LMND affected lambs. Counts of identical homozygous fragments shared by all affected lambs are shown according to the number of SNPs spanning each fragment. Only one homozygous fragment, spanning 136 SNPs, was common to all 19 affected lambs when the “larger than 10-SNP” threshold was applied.


Figure 2. Work flow chart for discovering the mutation locus of lower motor neuron disease in Romney sheep. (a) The sheep chromosome 2 (OAR 2). (b) The IBD region containing the causative locus is located on OAR 2 between 29 and 36 Mb. (c) SNP locations on the Ovine SNP50 BeadChip in the IBD region. (d) The black boxes show reference genes within the sheep IBD region based on the comparative genomic maps. The gene AGTPBP1 is outlined by a red box. (e) Sequence analyses of AGTPBP1 showed a point mutation in exon 21, resulting in an Arg to Pro substitution. This missense mutation is located at a residue that is highly conserved and important for the catalytic activity of this protein. The PCR-RFLPs show different patterns for affected and carrier groups.


Figure 3. The conserved domain structures of the partial protein sequence of AGTPBP1. (a) The conserved domain includes zinc-binding sites and putative active sites shared by M14_Nna1-like superfamily genes. (b) Alignment of the predicted sheep AGTPBP1 protein with the 8 most dissimilar protein sequences, from human, mouse, chicken, frog, purpuratus, sea squirt, trichoplax and zebrafish. The residues labeled by hash-marks on the top are known to be involved in substrate binding or catalytic activity. The two missense mutations are labeled on the protein sequence. Upper case amino acids are aligned, and a red to blue color scale represents the degree of conservation, with red showing highly conserved residues. Lower case, grey amino acids are unaligned. Dashes indicate variations in sequence length within the multiple sequence alignment. (c) Three-dimensional molecular modeling of the conserved domain based on the crystal structure of a putative carboxypeptidase (PDB ID: 3K2K). The short yellow section in the middle loop represents the position of the mutated amino acid residue.

Table S1. Primer sequences, annealing temperatures, PCR amplicon information and targeted encoding regions of gene AGTPBP1.

Primer name  Primer sequence                    Size (bp)  Annealing temperature (°C)  Targeted exon
sAGTe1bF     5’TCAAACATTCTCCTGTGGTTTC3’         783        60                          Exon 1
sAGTe1bR     5’TGCTCAACACTCATTTTCTGC3’
sAGTe2F      5’CAGGGTAGGATTTTGGCTTG3’           443        60                          Exon 2
sAGTe2R      5’ATGGGGAAAGGAGGACAATC3’
sAGTe3F      5’AGCAACATCAGTGGAAGGAAC3’          678        60                          Exon 3
sAGTe3R      5’CTCCCATTCACTGGCTCTTC3’
sAGTe4F      5’GCTCAAGTGATTTGCAAGAATTT3’        445        60                          Exon 4
sAGTe4R      5’GGTTTGAGGGACACTGGAAG3’
sAGTe5F      5’CTTCCAGTGTCCCTCAAACC3’           589        60                          Exon 5
sAGTe5R      5’CTTCCTTTTTCCATCACATGC3’
sAGTe6F      5’CCCGCCTCCTACTCTTTCTT3’           521        60                          Exon 6
sAGTe6R      5’TCACCTTTGAGGGAGTGGTT3’
sAGTe7F      5’GTTGCTTTTGGGCAATTCAC3’           407        60                          Exon 7
sAGTe7R      5’AAGGCCCCAAATATCAAAGG3’
sAGTe8F      5’TCCTTTGTGAACTCTGGCCTA3’          520        60                          Exon 8
sAGTe8R      5’CATCAATCAACTGCACTAAAAGA3’
sAGTe9F      5’TGTTTTCAATTTGACTTTACCAGTG3’      321        60                          Exon 9
sAGTe9R      5’CCTATCATTAACAATAACAGGAAGAAA3’
sAGTe10F     5’GCCATATGAGGTTCTTACAAAGAGA3’      410        60                          Exon 10
sAGTe10R     5’GAAGCCAGCAAGAAGCATGT3’
sAGTe13_1F   5’GGTGTAATGAATTTTGTGCTGGT3’        519        60                          Exon 13
sAGTe13_1R   5’CAACGTAATTCGGTCCAAGG3’
sAGTe13_2F   5’AGCAGCTCGGTGATGAAAGT3’           815        60                          Exon 13
sAGTe13_2R   5’TGTCCAATGGCCAGGTTTAT3’
sAGTe14F     5’TTTACTTGGAGGAAAGTTGTGC3’         502        60                          Exon 14
sAGTe14R     5’AAACTGCAGCATTCATACAAAATC3’
sAGTe15_16F  5’GTGAGCGAGCAATAATGGTG3’           522        60                          Exon 15 & 16
sAGTe15_16R  5’TCCTCTGGATGAACAGACTTCA3’
sAGTe18F     5’ACAGAATTTTTGTGTGGATGAAT3’        502        60                          Exon 18
sAGTe18R     5’GAGATTCAATGCTTCTCATTTAAACA3’
sAGTe19F     5’TGTTTGAGATCTGGATGCATTT3’         421        60                          Exon 19
sAGTe19R     5’GGCAGCTCCCTCATATTCAA3’
sAGTe20F     5’ATGAAATCAAGGTAATGTTAAAGAACT3’    540        60                          Exon 20
sAGTe20R     5’ACACAGGACAGCCACAAAAA3’
sAGTe21F     5’TTGGAGGGTCCAGATAGCAC3’           633        60                          Exon 21
sAGTe21R     5’TGCTTTTGAAACTAACATACAGCTT3’
LMN_newF     5’TGTTAGCACCTCCTCTTTGCT3’          611        58                          Exon 21
LMN_newR     5’AGCAGACCCTTGGCATGATA3’
sAGTe22F     5’ACCCTTGGGACAATGATGAA3’           611        60                          Exon 22
sAGTe22R     5’CAGAATAGCAAAATGGCAACA3’
sAGTe23F     5’TTTTTATCTTTGCAGACTTTGCCTA3’      400        60                          Exon 23
sAGTe23R     5’TTCATAAGAACCCAAAGCAACC3’
sAGTe25_1F   5’AACCCAAATTACTGCTTGCAT3’          429        60                          Exon 25
sAGTe25_1R   5’CCAAACAGGTCCAAGATGCT3’
sAGTe25_2F   5’CTTGAACCCTTTGCCATCAC3’           522        60                          Exon 25
sAGTe25_2R   5’GGTTTCAAGGCAAACTCTGG3’

Table S2. The list of ovine positional candidate genes based on the bovine reference gene sequences.

Ovine Chromosome Position  Bovine Ref. Gene Accession No.  Bovine Gene Symbol  Bovine Ref. Gene Name
OAR2:29285324..29294958    XM_586318                       LOC509375           PREDICTED: Bos taurus similar to Ninjurin 1
OAR2:29285341..29296154    NM_001038082                    NINJ1               Bos taurus ninjurin 1
OAR2:29338280..29380982    XM_590060                       SUSD3               PREDICTED: Bos taurus similar to sushi domain containing 3
OAR2:29382607..29413924    NM_001024527                    FGD3                Bos taurus FYVE, RhoGEF and PH domain containing 3
OAR2:29570653..29607558    XM_599340                       IPPK                PREDICTED: Bos taurus similar to inositol 1,3,4,5,6-pentakisphosphate 2-kinase
OAR2:29648909..29688207    NM_001034597                    ECM2                Bos taurus extracellular matrix protein 2, female organ and adipocyte specific
OAR2:29744485..29759383    NM_173947                       OMD                 Bos taurus osteomodulin
OAR2:29878441..29959955    NM_001101069                    IARS                Bos taurus isoleucyl-tRNA synthetase
OAR2:29762217..29777791    NM_173946                       OGN                 Bos taurus osteoglycin
OAR2:29845850..29871806    XM_001253324                    NOL8                PREDICTED: Bos taurus nucleolar protein 8
OAR2:29995325..30004700    XM_580638                       ZNF484              PREDICTED: Bos taurus similar to zinc finger protein 484
OAR2:30316185..30318955    XM_001788193                    LOC616883           PREDICTED: Bos taurus similar to zinc finger protein 782
OAR2:30350587..30352681    NM_001076436                    C8H9orf21           Bos taurus open reading frame 21 ortholog
OAR2:30476264..30592054    XM_877619                       CDC14B              PREDICTED: Bos taurus similar to CDC14 homolog B, transcript variant 6
OAR2:30610577..30646729    NM_001081523                    HABP4               Bos taurus hyaluronan binding protein 4
OAR2:30685815..30689928    XM_001788164                    LOC100140121        PREDICTED: Bos taurus similar to rCG47065
OAR2:30688798..30713324    XM_613307                       ZNF367              PREDICTED: Bos taurus similar to zinc finger protein 367, transcript variant 1
OAR2:30790734..30845612    NM_001076439                    HSD17B3             Bos taurus hydroxysteroid (17-beta) dehydrogenase 3
OAR2:31108137..31147487    NM_001082606                    LOC508357           Bos taurus chromosome 9 open reading frame 102 ortholog
OAR2:31292157..31295300    NM_001103310                    LOC789977           Bos taurus hypothetical LOC789977
OAR2:31555711..31623497    XM_599250                       PTCH1               PREDICTED: Bos taurus patched homolog 1 (Drosophila)
OAR2:31850742..32286795    NM_174316                       FANCC               Bos taurus Fanconi anemia, complementation group C
OAR2:32301777..32654570    XM_610259                       AP-O (LOC531757)    PREDICTED: Bos taurus similar to Aminopeptidase O (AP-O)
OAR2:32877326..32920047    NM_001046164                    FBP2                Bos taurus fructose-1,6-bisphosphatase 2
OAR2:33007170..33228793    XM_613544                       DAPK1               PREDICTED: Bos taurus similar to death-associated protein kinase 1, transcript variant 1
OAR2:32942781..32949974    NM_001083686                    CTSL2               Bos taurus cathepsin L2
OAR2:33971137..33971720    XM_593294                       GAS1                Bos taurus similar to growth arrest-specific 1
OAR2:34660860..34718709    XM_590860                       ZCCHC6              PREDICTED: Bos taurus similar to zinc finger, CCHC domain containing 6, transcript variant 1
OAR2:34725688..34738850    NM_001034470                    ISCA1               Bos taurus iron-sulfur cluster assembly 1 homolog (S. cerevisiae)
OAR2:34778116..34782313    XM_001787825                    LOC785863           PREDICTED: Bos taurus similar to chromosome 9 open reading frame 153
OAR2:34874490..34921202    XM_595216                       GOLM1               PREDICTED: Bos taurus similar to golgi membrane protein 1
OAR2:34929288..35016263    XM_868391                       MAK10               PREDICTED: Bos taurus MAK10 homolog, amino-acid N-acetyltransferase subunit, (S. cerevisiae)
OAR2:34929660..35016263    XR_042634                       LOC782560           PREDICTED: Bos taurus misc_RNA
OAR2:35074125..35075486    XR_028207                       LOC785119           PREDICTED: Bos taurus misc_RNA
OAR2:35163069..35333039    XM_001252536                    AGTPBP1             PREDICTED: Bos taurus similar to AGTPBP1 protein, transcript variant 1
OAR2:35990235..36232516    NM_001075225                    NTRK2               Bos taurus neurotrophic tyrosine kinase, receptor, type 2

Feature     # #
sheep_A     859  LQKLESahn-PQQIYFRKDVLCETLSGNSCPLVTITAMPESNYYEHI cq----FRNRPYVFLSARVHPGETNASWVMKGT 933
sheep_N     859  LQKLESahn-PQQIYFRKDVLCETLSGNSCPLVTITAMPESNYYEHI cq----FRNRPYVFLSARVHPGETNASWVMKGT 933
mice        793  LQKLESahn-PQQIYFRKDVLCETLSGNICPLVTITAMPESNYYEHI cq----FRTRPYIFLSARVHPGETNASWVMKGT 867
frog        853  LQKLESlhs-PQQIYFRQEVLCETLGGNGCPVITITAMPESNYYEHV yq----FRNRPYIFLTSRVHPGETNASWVMKGT 927
purpuratus  694  LSKLQAslgeFSDIYYRKQSLCLSLGGNECPVLTITANPTTLDKEGV lq----FRCRRYLFLSARVHPGESNSSYIMKGT 769
zebrafish   785  LQKLSAlc--TPEIYYRQEDLCETLGGNGCPLLTITAMPESSSDEHI sq----FRSRPVIFLSARVHPGETNSSWVMKGS 858
human       859  LQKLESahn-PQQIYFRKDVLCETLSGNSCPLVTITAMPESNYYEHI ch----FRNRPYVFLSARVHPGETNASWVMKGT 933
chicken     876  LQKLESmhn-PQQIYFRQDALCETLGGNICPIVTITAMPESNYYEHI cq----FRNRPYIFLSARVHPGETNASWVMKGT 950
sea squirt  902  LQNLMSnid-TSTMYVRQQTLCNTLSGNSVPVLTITSMPISDNAGSA ieavheLRSRPYIFLSARVHPGETNASWTMRGT 980
trichoplax  218  LEHIRNvad-NNSIYFKCQELCLTLNGNVCNLMTITNSPNKERLNDD ky----VPRRPYIFLSARVHPGESNSSWIMKGL 292

Feature     # ##
sheep_A     934  LEYLMSn------nPTAQSLRESYIFKIVPMLNPDGVINGNH PCSLSGEDLNRQWQSPNPDLHPTIYHAKGLL 1000
sheep_N     934  LEYLMSn------nPTAQSLRESYIFKIVPMLNPDGVINGNHRCSLSGEDLNRQWQSPNPDLHPTIYHAKGLL 1000
mice        868  LEYLMSn------sPTAQSLRESYIFKIVPMLNPDGVINGNHRCSL SGEGLNRQWQSPNPELHPTIYHAKGLL 934
frog        928  LEFLMGs------sATAQSLRESYIFKIVPMLNPDGVINGNHRCSL SGEDLNRQWQNPNSDLHPTIYHTKGLL 994
purpuratus  770  LKFLMSt------hPTAQALREIFIFKIVPMLNPDGVINGSHRCSL SGEDLNRRWLQPIKELHPTIYHTKGLL 836
zebrafish   859  LEFLMSc------sPQAQSLRQSYIFKIMPMLNPDGVINGNHRCSL SGEDLNRQWQNPNAELHPTIYHAKSLL 925
human       934  LEYLMSn------nPTAQSLRESYIFKIVPMLNPDGVINGNHRCSL SGEDLNRQWQSPSPDLHPTIYHAKGLL 1000
chicken     951  LEYLMSs------nPSAQSLRESYIFKIIPMLNPDGVINGNHRCSL SGEDLNRQWQNPNPDLHPTIYHAKGLL 1017
sea squirt  981  LKLLLTppasqdqpediriiaSIAEDLRKSYIFKIIPMLNPDGVINGHHRCSL SGEDLNRTWLDPNPQFFPTIYHTKGLL 1060
trichoplax  293  LDFITSd------dDCAIQLRESYIFKIIPMLNPDGVVNGCHRCSL SGHDLNRCWISPDPRIHPTIYHTKGLI 359

Feature     ## #
sheep_A     1001 QYLAAVKRLPLVYCDYHGHSRKKNVFMYGC Siketvwhtndnaapcdvvedv GYRTLPKILSHIAPAFCMSSCSFVVEKS 1080
sheep_N     1001 QYLAAVKRLPLVYCDYHGHSRKKNVFMYGC Siketvwhtndnaapcdvvedv GYRTLPKILSHIAPAFCMSSCSFVVEKS 1080
mice        935  QYLAAVKRLPLVYCDYHGHSRKKNVFMYGC Siketvwhthdnsascdivegm GYRTLPKILSHIAPAFCMSSCSFVVEKS 1014
frog        995  QYLSAIKRVPLVYCDYHGHSRKKNVFMYGC Siketvwhtnanaascdmvees GYKTLPKVLNQIAPAFSMSSCSFVVEKS 1074
purpuratus  837  QYLKLIERSPLVYCDFHGHSRKKNVFMYGN Saamshspedl-ikfgipsgvsGAKTLPRILSSIAPAFSHSNCSFAVDKG 915
zebrafish   926  QYLRATGRTPLVFCDYHGHSRKKNVFMYGC Siketmwqssvntstcdlnedl GYRTLPKLLSQMTPAFSLSSCSFVVERS 1005
human       1001 QYLAAVKRLPLVYCDYHGHSRKKNVFMYGC Siketvwhtndnatscdvvedt GYRTLPKILSHIAPAFCMSSCSFVVEKS 1080
chicken     1018 QYLAAIKRLPLVYCDYHGHSRKKNVFMYGC Aiketmwhtnvntascdlmedp GYRVLPKILSQTAPAFCMGSCSFVVEKS 1097
sea squirt  1061 QYLQSIQKAPLVYCDYHGHSRKMNVFMYGC Shkdsaaaget-sttvggpddtGFKTLPKLLNHFAPSFSLKGCSYVVEKS 1139
trichoplax  360  QYMVTIGKSPLIFCDFHGHSRRKNIFTYAC Ypytntn------hraedqMLRALPRALQEVSPVFSYQLCSFAVEKC 430

Feature     #
sheep_A     1081 KESTARVVVWREIGVQRSYTMESTLCGCDQGKYK QLQIGTRELEEMGAKFCVGLLR 1136
sheep_N     1081 KESTARVVVWREIGVQRSYTMESTLCGCDQGKYK QLQIGTRELEEMGAKFCVGLLR 1136
mice        1015 KESTARVVVWREIGVQRSYTMESTLCGCDQGRYK QLQIGTRELEEMGAQFCVGLLR 1070
frog        1075 KESTARVVVWKEIGVQRSYTMESTLCGCDQGKYK DLQIGTKELEEMGAKFCVGLLR 1130
purpuratus  916  KESAARVVVWREIGVARSYTMESTYCGFDQGKYK QLQVTTAMLEEMGQHFCEGLYK 971
zebrafish   1006 KETTARVVVWREIGVQRSYTMESTLCGCDQGKYK QLQIGTAELEEMGSQFCLALLR 1061
human       1081 KESTARVVVWREIGVQRSYTMESTLCGCDQGKYK QLQIGTRELEEMGAKFCVGLLR 1136
chicken     1098 KESTARVVVWREIGVQRSYTMESTLCGCDQGKYK QLQIGTKELEEMGAKFCVGLLR 1153
sea squirt  1140 KVTTARVVVWKEIGVARSYTMESTYCGCDQGRYR QLQVGTKELEEMGARFVEVLLH 1195
trichoplax  431  KESTARVVLWRELCILRSYTMESTYCGMDQGVYK QYHINTTALEDMGKDFCRGLLK 486

Figure S1. Conserved amino acids predicted by “CD-search”. Alignment of the complete predicted sheep protein AGTPBP1 with the 8 most dissimilar protein sequences from human, mouse, chicken, frog, purpuratus, sea squirt, trichoplax and zebrafish predicted 2 specific feature hits consisting of 9 conserved amino acid residues. His920, Glu923 and His1017, labeled by hash marks on the top, constitute the zinc-binding motif “HXXE…H”, which is conserved feature 1. Conserved feature 2 represents the metallocarboxypeptidase active sites and consists of all 9 conserved residues, including the zinc-binding residues. These residues are His920, Glu923, Arg970, Asn979, Arg980, His1017, Gly1018, Met1027 and Glu1102, which are labeled by hash marks on the top.

sheep   MSKLKVIPEKSLANNSRIVGLLAQLEKINTESAESDTARYVTAKILHLAQSQEKTRREMT 60
cattle  MSKLKVIPEKSLTNNSRIVGLLAQLEKINTESTESDTARYVTSKILHLAQSQEKTRREMT 60
human   MSKLKVIPEKSLTNNSRIVGLLAQLEKINAEPSESDTARYVTSKILHLAQSQEKTRREMT 60
mouse   MSKLKVVGEKSLTNSSRVVGLLAQLEKINTDSTESDTARYVTSKILHLAQSQEKTRREMT 60
        ******: ****:*.**:***********::.:*********:*****************

sheep   AKGSTGMEVLLSTLENTKDLQTTLNILSILVELVSAGGGRRASFLVSKGGSQILLQLLMN 120
cattle  AKGSTGMEVLLSTLENTKDLQTTLNILSILVELVSAGGGRRASFLVSKGGSQILLQLLMN 120
human   AKGSTGMEILLSTLENTKDLQTTLNILSILVELVSAGGGRRVSFLVTKGGSQILLQLLMN 120
mouse   TKGSTGMEVLLSTLENTKDLQTVLNILSILIELVSSGGGRRASFLVAKGGSQILLQLLMN 120
        :*******:*************.*******:****:*****.****:*************

sheep   ASKESPLNEELMVQIHSILAKIGPKDKKFGVKARINGALNITLNLVKQNLQNHRLVLPCL 180
cattle  ASKESPLNEELMVQIHSILAKIGPKDKKFGVKARINGALNITLNLVKQNLQNHRLVLPCL 180
human   ASKESPPHEDLMVQIHSILAKIGPKDKKFGVKARINGALNITLNLVKQNLQNHRLVLPCL 180
mouse   ASKDSPPHEEVMVQTHSILAKIGPKDKKFGVKARVNGALTVTLNLVKQHFQNYRLVLPCL 180
        ***:** :*::*** *******************:****.:*******::**:*******

sheep   QLLRVYSANSVNSVSLGKNGVVELMFKIIGPFSKKNSSLMKVALDTLAALLKSKTNARRA 240
cattle  QLLRVYSANSVNSVSLGKNGVVELMFKIIGPFSKKNSNLMKVALDTLAALLKSKTNARRA 240
human   QLLRVYSANSVNSVSLGKNGVVELMFKIIGPFSKKNSSLIKVALDTLAALLKSKTNARRA 240
mouse   QLLRVYSTNSVNSVSLGKNGVVELMFKIIGPFSKKNSGLMKVALDTLAALLKSKTNARRA 240
        *******:*****************************.*:********************

sheep   VDRGYVQVLLTIYVDWHRHDNRHRNMLIRKGILQSLKSVTNIKLGRKAFIDANGMKILYN 300
cattle  VDRGYVQVLLTIYVDWHRHDNRHRNMLIRKGILQSLKSVTNIKLGRKAFIDANGMKILYN 300
human   VDRGYVQVLLTIYVDWHRHDNRHRNMLIRKGILQSLKSVTNIKLGRKAFIDANGMKILYN 300
mouse   VDRGYVQVLLTIYVDWHRHDNRHRNMLIRKGILQSLKSVTNIKLGRKAFIDANGMKILYN 300
        ************************************************************

sheep   TSQECLAVRTLDPLVNTSSLIMRKCFPKNRLPLPTIKSSFHFQLPVIPVTGPVAQLYSLP 360
cattle  TSQECLAVRTLDPLVNTSSLIMRKCFPKNRLPLSTIKSSFHFQLPVIPVTGPVAQLYSLP 360
human   TS------QLPVIPVTGPVAQLYSLP 320
mouse   TSQECLAVRTLDPLVNTSSLIMRKCFPKNRLPLPTIKSSFHFQLPIIPVTGPVAQLYSLP 360
        ** ***:**************

sheep   PEVDDVVDESDDNDDIDLEAENETENEDDLDQNFKNDDIETDINKLKPQQEPGRTIEELK 420
cattle  PEVDDVVDESDDNDDIDLEAENETENEDDLDQNFKNDDIETDINKLKPQQEPGRTIEELK 420
human   PEVDDVVDESDDNDDIDVEAENETENEDDLDQNFKNDDIETDINKLKPQQEPGRTIEDLK 380
mouse   PEVDDVVDESDDNDDIDLEVENELENEDDLDQSFKNDDIETDINKLRPQQVPGRTIEELK 420
        *****************:*.*** ********.*************:*** ******:**

Figure S2. Characteristics of protein AGTPBP1. Letters highlighted in green denote the rich domain; letters in pink correspond to the ATP binding domain; letters in red mark the zinc-binding site; and letters in blue mark the carboxypeptidase catalytic site and substrate binding sites. The entire enzymatic domain is indicated by bold lettering.

[Upper panel: schematic of the AGTPBP1 protein with a residue scale marked at 200, 400, 600, 800 and 1000.]

Agtpbp1_mouse  ICPLVTITAMPESNYYEHICQFR-TRPYIFLSARVHPGETNASWVMKGTLEYLMS---NS
AGTPBP1_sheep  SCPLVTITAMPESNYYEHICQFR-NRPYVFLSARVHPGETNASWVMKGTLEYLMS---NN
CPA2_human     PMNVLKF------STGG-DKPAIWLDAGIHAREWVTQATALWTANKIVSDYGKD
CPA_cattle     PIYVLKF------STGGSNRPAIWIDLGIHSREWITQATGVWFAKKFTENYGQN

                                                *
Agtpbp1_mouse  PTAQSLRESYIFKIVPMLNPDGV-----INGNHRCSLS------GEGLNRQW----Q
AGTPBP1_sheep  PTAQSLRESYIFKIVPMLNPDGV-----INGNHRCSLS------GEDLNRQW----Q
CPA2_human     PSITSILDALDIFLLPVTNPDGYVFSQTKNRMWRKTRSKVSAGSLCVGVDPNRNWDAGFG
CPA_cattle     PSFTAILDSMDIFLEIVTNPNGFAFTHSENRLWRKTRSKVTSSSLCVGVDANRNWDAGFG

Agtpbp1_mouse  SP----NP---ELH------PTIYHAKGLLQYLAAVKRLPLVYCDYHGHSRKKNVFMYG
AGTPBP1_sheep  SP----NP---DLH------PTIYHAKGLLQYLAAVKRLPLVYCDYHGHSRKKNVFMYG
CPA2_human     GPGASSNPCSDSYHGPSANSEVEVKSIVDFIKSHG----KVKAFIILH--S-YSQLLMFP
CPA_cattle     KAGASSSPCSETYHGKYANSEVEVKSIVDFVKNHG----NFKAFLSIH--S-YSQLLLYP

Agtpbp1_mouse  CSIKETVWHTHDNSASCDIVEGMGYRTLPKILSHIAPAFCMSSCSFVVEKSKESTARVVV
AGTPBP1_sheep  CSIKETVWHTNDNAASCDVVEDVGYRTLPKILSHIAPAFCMSSCSFVVEKSKESTARVVV
CPA2_human     YGYKCT--KLDD-FDELSEVAQKAAQSLSRLHGTKY--KVGPICSVIYQASGGSIDWSYD
CPA_cattle     YGYT-T-QSIPD-KTELNQVAKSAVAALKSLYGTSY--KYGSIITTIYQASGGSIDWSYN

Agtpbp1_mouse  WREIGVQRSYTMESTLCGCDQGRYKGLQIGTRELEEMGAQFCVGLLRLKRLTSSLEYNLP
AGTPBP1_sheep  WREIGVQRSYTMESTLCGCDQGKYKGLQIGTRELEEMGAKFCVGLLRLKRLTSPLEYNLP
CPA2_human     Y---GIKYSFAFELR----DTGRYG-FLLPARQILPTAEETWLGLKAIMEHVRDHPY---
CPA_cattle     Q---GIKYSFTFELR----DTGRYG-FLLPASQIIPTAQETWLGVLTIMEHTVNN-----

Figure S3. Secondary structure modeling of the AGTPBP1 protein indicates critical amino acid residues. The fold structure of the carboxypeptidase (CP) domain (orange box in upper panel) of the AGTPBP1 protein in sheep and mouse was predicted using the 3D-PSSM software. The sequences of CPA2_human (PDB ID: 1DTD) and CPA_cattle (PDB ID: 1HDU), which have known tertiary structures, were also included in the alignment. Green highlighting indicates β-sheet and blue highlighting indicates α-helix. The conserved amino acids critical for substrate binding and catalysis are shown in red (His920, Glu923, Arg970, Asn979, Arg980 and Glu1102). The location of arginine 970 in the sheep AGTPBP1 protein is marked with an asterisk above it. The numbering is based on the sheep AGTPBP1 protein.

CHAPTER 4. IN A SHAKE OF A LAMB’S TAIL: USING GENOMICS TO UNRAVEL A CAUSE OF CHONDRODYSPLASIA IN TEXEL SHEEP

A paper published in Animal Genetics 2012;43 Suppl 1:9-18.

Xia Zhao1, Suneel K Onteru1, Susan Piripi2,3, Keith G Thompson2, Hugh T Blair2, Dorian J Garrick1,2, Max F Rothschild1

1Department of Animal Science and Center for Integrated Animal Genomics, Iowa State University, Ames, IA 50011, USA

2Institute of Veterinary, Animal & Biomedical Sciences, Massey University, Palmerston North 4474, New Zealand

3College of Veterinary Medicine, Oregon State University, Corvallis, OR 97331, USA

Running title: the SLC13A1 gene responsible for Texel chondrodysplasia

Keywords: deletion; SLC13A1 gene; chondrodysplasia; Texel sheep; genetic disease

SUMMARY

Chondrodysplasia in Texel sheep is a recessively inherited disorder characterized by dwarfism and angular deformities of the forelimbs. A genome-wide association study using the Illumina OvineSNP50 BeadChip on 15 sheep diagnosed as affected and 8 carriers descended from 3 affected rams was conducted to uncover the genetic cause. A homozygous region of 25 consecutive SNP loci was identified in all affected sheep, covering a region of 1 Mbp on ovine chromosome 4. Seven positional candidate genes, including solute carrier family 13, member 1 (SLC13A1), were identified and used to search for new single nucleotide polymorphisms (SNPs) for fine mapping of the causal locus. The SLC13A1 gene, encoding a sodium/sulfate transporter, was the primary candidate gene due to similar phenotypes observed in the Slc13a1 knockout mouse model. We discovered a 1-bp deletion of T (g.25513delT) at the 107 bp position of exon 3 in the SLC13A1 gene. Genotyping by direct sequencing and restriction fragment length polymorphism analysis for this mutation showed that all 15 affected sheep were “g.25513delT/g.25513delT”, the 8 carriers were “g.25513delT/T” and 54 normal controls were “T/T”. The mutation g.25513delT shifts the open reading frame of SLC13A1, introducing a stop codon and truncating the C-terminal amino acids. It was concluded that the g.25513delT mutation in the SLC13A1 gene was responsible for the chondrodysplasia seen in these Texel sheep. This knowledge can be used to identify carriers of the defective g.25513delT allele in order to avoid at-risk matings, improve animal welfare and decrease economic losses.

INTRODUCTION

Nearly 30 years have passed since the seminal research of Soller and colleagues (Soller & Beckman 1983; Beckman & Soller 1983), which described the use of genetic markers for genetic improvement of crops and livestock. Considerable advancements have allowed animal geneticists to springboard from these original marker papers to the development of genetic maps, quantitative trait loci analyses and on to genome sequencing and the resultant development of high-density panels of single nucleotide polymorphisms (SNPs). These recent tools greatly facilitate the exploration of genetic abnormalities in livestock species.

Chondrodysplasia is a general term for disorders of cartilage development that occur in humans and many animal species. Other terms such as achondroplasia, achondrogenesis, hypochondrogenesis and chondrodystrophy have been used to describe various forms of chondrodysplasia in the published clinical and research literature (Thompson et al. 2005; Martínez et al. 2000; Superti-Furga et al. 1999; Zankl et al. 2008; Hastbacka et al. 1999). Chondrodysplasia is primarily caused by defective genes affecting normal chondrogenesis and cartilage development; these defects present phenotypically as abnormal shape and structure of the skeleton.

Mutations in several genes related to the synthesis of collagen or proteoglycans, or to signal-transduction mechanisms, have been reported to be involved in inherited chondrodysplasia. Three forms of chondrodysplasia in humans, namely achondrogenesis 1B, atelosteogenesis type II and diastrophic dysplasia, are due to mutations in the solute carrier family 26, member 2 (SLC26A2) gene, which encodes a sulfate transporter protein (DTDST). The impaired sulfate uptake affects the normal development of cartilage (Superti-Furga et al. 1999; Hastbacka et al. 1996; Hastbacka et al. 1999). A nonsense mutation of the collagen, type X, alpha 1 (COL10A1) gene has been found to be responsible for Schmid type metaphyseal chondrodysplasia (MCDS) in a human patient (Woelfle et al. 2011). A multibreed association study in domestic dogs demonstrated that an evolutionarily recently acquired fgf4 retrogene, an extra copy of the fibroblast growth factor 4 gene (FGF4), was strongly associated with chondrodysplasia (Parker et al. 2009). The most common inherited chondrodysplasia of sheep is the “spider lamb syndrome” (SLS) in the Suffolk breed. The mutation responsible for the disease is a non-synonymous T > A transversion in the highly conserved tyrosine kinase II domain of the fibroblast growth factor receptor 3 (FGFR3) gene (Cockett et al. 1999; Beever et al. 2006).

Chondrodysplasia in Texel sheep has been determined by outcross and backcross experiments to be a simple autosomal recessive disorder (Byrne et al. 2008). It was distinguished from SLS and other chondrodysplasias of sheep by gross phenotypic observations and microscopic analysis (Thompson et al. 2005; Piripi 2008).

Chondrodysplasia in Texel sheep is phenotypically characterized by disproportionate dwarfism, a stocky appearance with shortened neck and limbs, bilateral varus deformities of the forelimbs, flexion of the carpal joints, a barrel-shaped chest and a wide-based stance (Fig. 1A & 1B). Consistent lesions were observed grossly and microscopically in many cartilaginous tissues, especially in hyaline cartilage. The marked narrowing of the tracheal lumen (Fig. 1C) in the affected lambs could result in sudden death due to tracheal collapse.

Immunohistochemistry of cartilage in Texel chondrodysplasia demonstrated that both types II and VI collagen were distributed abnormally in the matrix, suggesting that the extracellular assembly of these collagen types was disturbed. Moreover, a significantly reduced level of chondroitin 4-sulfate was observed using capillary electrophoresis in chondrodysplastic sheep compared to age-matched controls (Piripi et al. 2011). Most cases could be diagnosed at 14 days of age, and all lambs showing a characteristic chondrodysplastic appearance by approximately 4 months of age were diagnosed as affected.

Because the sulfation pattern resembles that of achondrogenesis 1B and diastrophic dysplasia in humans, the ovine SLC26A2 gene was previously considered a prime candidate gene for this disease, but no mutations were identified in the 85.4% of exonic DNA that was sequenced (Byrne 2005). That initial investigation was followed by targeted linkage disequilibrium (LD) mapping to locate the causative mutation for this disorder. However, this attempt also failed because the microsatellite markers were selected only from ovine chromosomes 1, 5, 6, 13, and 22, based on the limited known locations of functional candidate genes or their linkage to chondrodysplastic diseases in other species (Piripi 2008).

In order to understand the genetic basis for chondrodysplasia in Texel sheep, we performed homozygosity mapping through whole genome analysis using the Illumina OvineSNP50 BeadChip to identify the genomic region containing the mutation, and then identified the disease-causing mutation by direct examination of the positional candidate genes.

MATERIALS AND METHODS

Ethics statement

Approvals dealing with DNA or tissue collection from animals were obtained from the Massey University Animal Ethics Committee (approval number 05/123). The Iowa State University Animal Care & Use Committee did not require additional approvals.

Animals

Three related affected Texel rams were out-crossed to 221 unrelated, phenotypically normal mixed-breed ewes. Then 123 F1 hogget daughters were backcrossed to the same 3 rams, and these matings produced 46 surviving back-cross lambs. In another mating trial, 22 mixed-breed ewes, which either had chondrodysplasia or had previously produced chondrodysplastic offspring, were mated to the 3 affected Texel rams and produced 26 lambs. The 3 affected rams and the 22 ewes with a chondrodysplasia background were purchased from the Southland property in New Zealand that was the original source of this disease. A total of 23 lambs were genotyped using the OvineSNP50 BeadChip for genome-wide homozygosity mapping and the subsequent fine mapping experiments. These lambs included 11 affected (5 males and 6 females) and 7 presumed carriers (4 males and 3 females) chosen from the back-cross lambs, and 4 affected females with 1 presumed male carrier chosen from the lambs produced in the second mating trial. The pedigree information for these 23 lambs is provided (Fig. S1).

Fine mapping also included 54 Romney control sheep from an unrelated population without evidence of this inherited chondrodysplasia.

DNA extraction

DNA was extracted from samples of spleen collected during necropsy. Most samples were extracted using the Wizard genomic DNA extraction kit (Promega, Madison, WI, USA). DNA for an additional 2 samples was extracted with a GenElute mammalian genomic DNA miniprep kit (Sigma-Aldrich Corp., St. Louis, MO, USA) and a High Pure PCR template preparation kit (Roche Diagnostics, Mannheim, Germany). The DNA extractions followed the standard procedures for each of the kits.

Genome wide homozygosity mapping

DNA samples from the 23 Texel crossbred sheep were genotyped using the Ovine SNP50 BeadChip (Illumina, San Diego, CA, USA) with standard procedures at GeneSeek Inc., Lincoln, NE, USA. The homozygosity mapping was conducted using the same in-house IBD (identical by descent) approach, implemented using command line UNIX tools, as described previously (Zhao et al. 2011). This approach identified common homozygous segments in the affected lambs. Among those segments, it was presumed that one would contain the mutation associated with chondrodysplasia in Texel sheep. Once the targeted IBD segments were identified, a list of positional candidate genes was obtained using comparative maps between the ovine and bovine genomes accessed through the Ovine Genome Assembly v1.0.
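The idea behind this kind of homozygosity mapping can be illustrated with a minimal Python sketch. It is not the in-house UNIX implementation referred to above: the function name `shared_homozygous_runs`, the "AA"/"AB"/"BB" genotype coding, the animal labels and the toy data are all assumptions made for illustration. The sketch scans SNPs in map order and reports runs of more than 10 consecutive SNPs at which every affected animal carries the same homozygous genotype.

```python
from typing import Dict, List, Tuple

HOMOZYGOUS = {"AA", "BB"}  # assumed genotype coding; "AB" is heterozygous


def shared_homozygous_runs(
    affected: Dict[str, List[str]], min_snps: int = 11
) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) SNP index pairs for runs of at least
    min_snps consecutive SNPs at which all affected animals carry the
    same homozygous genotype."""
    calls = list(affected.values())
    if not calls:
        return []
    n_snps = len(calls[0])
    runs: List[Tuple[int, int]] = []
    start = None
    for i in range(n_snps + 1):
        # One step past the last SNP we force a break so an open run is closed.
        shared = False
        if i < n_snps:
            column = {animal[i] for animal in calls}
            shared = len(column) == 1 and column <= HOMOZYGOUS
        if shared and start is None:
            start = i
        elif not shared and start is not None:
            if i - start >= min_snps:
                runs.append((start, i - 1))
            start = None
    return runs


# Toy data: 3 hypothetical affected lambs typed at 20 ordered SNPs.
toy = {
    "lamb_1": ["AB"] + ["AA"] * 12 + ["BB"] * 3 + ["AB"] * 4,
    "lamb_2": ["AA"] + ["AA"] * 12 + ["BB"] * 3 + ["AA"] * 4,
    "lamb_3": ["BB"] + ["AA"] * 12 + ["AB"] * 3 + ["BB"] * 4,
}
print(shared_homozygous_runs(toy))
```

In the toy data only SNPs 1 to 12 (0-based) are identically homozygous in all three animals, so a single 12-SNP run passes the "larger than 10-SNP" threshold. Real chip data would additionally need missing-call handling and ordering by chromosome and map position.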

Copy number variation (CNV) analysis

The CNV analysis based on the whole genome SNP chip genotyping data was carried out with the CNV partition software, a plug-in within the GenomeStudio program (v1.0, Illumina). The CNV partition software estimated the copy number at each SNP probe on the OvineSNP50 BeadChip within each individual. Three files, the sample sheet, the intensity file and the manifest file, were required for data evaluation in this program. Software parameters were kept at the developers’ default values. The CNV value and confidence for chromosomal regions in each sample were computed. The CNV value represented the estimated copy number for each allele. The results of the CNV analyses were viewed using the heat map produced by the CNV partition software. Each column in the heat map represents the copy numbers of SNP loci across the genome for one individual animal.

New genetic variants discovery

In order to amplify exons and adjacent genomic sequences for the sheep positional candidate genes located in the homozygous region, primers (IDT, Coralville, IA, USA) were designed using the online Primer3 tool (Rozen & Skaletsky 2000), targeting the sheep genomic sequences downloaded from the Ovine Genome Assembly v1.0 (Table S1). PCR was performed using a 10 µl system, which included 1x GoTaq reaction buffer containing 1.5 mM Mg2+ (Promega, Madison, WI, USA), 0.125 mM of each dNTP, 2.5 pM of each primer, 12.5 ng of genomic DNA, and 0.25 units of GoTaq DNA polymerase (Promega, Madison, WI, USA). The PCR program was 35 cycles of 30 s at 94°C, 30 s at the corresponding annealing temperature, and 30 s at 72°C, with an additional 3 min of denaturation in the first cycle and 5 min of extension in the last cycle. Negative controls were included in all reactions to check for the presence of contaminants. PCR products were verified by 1.5% agarose gel electrophoresis and treated with ExoSAP-IT®. The PCR pools from 2 affected sheep, 2 carriers and 2 normal controls were used for commercial re-sequencing using the 3730xl DNA Analyzer. The sequences were aligned and compared to discover new genetic variants using Sequencher software version 3.0.

Mutation analyses

The cattle reference SLC13A1 mRNA (GenBank: NM_001193219) was used as a query in cross-species BLAST searches to identify sheep ESTs. The sheep re-sequencing results using templates from the affected, carrier and normal control sheep, along with the identified sheep ESTs, were used to predict a complete ovine SLC13A1 mRNA sequence. The ovine mRNA sequence was then translated into a protein sequence using the ExPASy translate tool. The predicted sheep SLC13A1 protein sequence was compared to proteins in other species, including cattle (Bos taurus, GenBank: DAA30458), human (Homo sapiens, GenBank: NP_071889), mouse (Mus musculus, GenBank: AAH22672), rat (Rattus norvegicus, GenBank: NP_113839), dog (Canis lupus familiaris, GenBank: XP_539548) and zebrafish (Danio rerio, GenBank: NP_954975), using the CLUSTALW alignment tool to check the completeness of the sheep protein sequence and the conserved regions. The topology prediction of transmembrane domains (TMDs) was undertaken using the online tool TopPred 0.01. The upper and lower cutoff values were set to 1 and 0.6, respectively. Each genetic variant was analyzed to check whether it was a missense or nonsense mutation, or whether it could shift the open reading frame of the candidate gene. Every suspected genetic variant was genotyped on all the sheep samples to check whether it was concordant with the recessive inheritance pattern of the disease.
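The concordance check described above can be sketched in a few lines of Python. This is an illustrative assumption about how such a check might be organized, not the procedure actually used: the function name `discordant_animals`, the genotype strings and the animal identifiers are hypothetical, and genotypes are assumed to be already normalized to one spelling per class.

```python
from typing import Dict, List

# Expected genotype at a candidate variant for each phenotype class under
# simple autosomal recessive inheritance ("del" = deletion allele).
EXPECTED = {"affected": "del/del", "carrier": "del/T", "control": "T/T"}


def discordant_animals(genotype: Dict[str, str], status: Dict[str, str]) -> List[str]:
    """Return the sorted IDs of animals whose genotype does not match the
    genotype expected for their phenotype class."""
    return sorted(
        animal for animal, call in genotype.items()
        if EXPECTED[status[animal]] != call
    )


# Hypothetical animals and calls, for illustration only.
status = {"a1": "affected", "a2": "affected", "c1": "carrier",
          "n1": "control", "n2": "control"}
genotype = {"a1": "del/del", "a2": "del/del", "c1": "del/T",
            "n1": "T/T", "n2": "del/T"}

print(discordant_animals(genotype, status))
```

A variant is a plausible causal candidate only if the returned list is empty; in the toy data the control `n2` carries the deletion allele, so it is reported as discordant.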

Mutation “g.25513delT” genotyping

Direct sequencing and PCR restriction enzyme digestion (PCR-RFLP) were carried out to genotype the suspected causative mutation “g.25513delT” on all Texel crossbred sheep with clear chondrodysplasia status as well as on 54 Romney control sheep. The restriction enzyme digestion procedures were as follows: DNA fragments from PCR reactions were subjected to restriction enzyme digestion with HpyAV (New England Biolabs Inc., USA). The 10 µl reaction mix contained 1 µl of NEB buffer 4, 3 units of HpyAV (1.5 µl), 0.1 µl of BSA (100X) and 3 µl of PCR products. The reaction mix was then incubated at 37°C overnight, and the products were analyzed on a 2.5% (w/v) ultra-pure agarose gel.

RESULTS

Identity by descent region

The IBD analysis narrowed the defective region down to a single 1 Mb homozygous segment consisting of 25 consecutive SNP loci on sheep chromosome 4 (OAR 4) in all 15 affected sheep, using a larger than 10-SNP threshold in our IBD approach (Fig. S2). This region was heterozygous in 7 presumed carriers but not in lamb 14-04. Lamb 14-04 was phenotypically normal but produced affected offspring. It was considered a carrier, but was homozygous in the 1 Mb segment with the same haplotype as the affected sheep. The 25-SNP homozygous segment was defined by SNP OAR4_92447165.1 and SNP OAR4_93438733.1, and spanned a chromosomal region between 92.45 Mb and 93.44 Mb (Fig. 2A & 2B). It encompassed 7 genes based on the cow reference sequences, including Ca2+ dependent activator protein for secretion 2 (CADPS2) (OAR4:92155977..92738612), solute carrier family 13 member 1 (SLC13A1) (OAR4:92888272..92993056), IQ motif and ubiquitin domain containing (IQUB) (OAR4:93209447..93265179), ring finger protein 148 (RNF148) (OAR4:92554935..92556138), ankyrin repeat and SOCS box-containing 15 (ASB15) (OAR4:93343952..93353409), leiomodin 2 (cardiac) (LMOD2) (OAR4:93370342..93377214) and Wiskott-Aldrich syndrome-like (WASL) (OAR4:93431017..93481693) (Fig. 2C). The SLC13A1 gene, spanning over 79 kb in length, was the most plausible candidate due to its biological functions and previous reports of its involvement with chondrodysplasia in other species. Most of the subsequent effort therefore focused on this gene in further mutation analyses.
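The positional candidate list can be cross-checked against the homozygous segment with a short Python sketch. The gene coordinates are those quoted above; taking the segment boundaries from the numbers embedded in the two flanking SNP names is an assumption, although it agrees with the 92.45 to 93.44 Mb range given in the text. The names `SEGMENT`, `GENES` and `overlaps` are illustrative.

```python
# Segment boundaries (bp), assumed from the flanking SNP names
# OAR4_92447165.1 and OAR4_93438733.1.
SEGMENT = (92_447_165, 93_438_733)

# Gene coordinates on OAR4 as listed in the text.
GENES = {
    "CADPS2": (92_155_977, 92_738_612),
    "SLC13A1": (92_888_272, 92_993_056),
    "IQUB": (93_209_447, 93_265_179),
    "RNF148": (92_554_935, 92_556_138),
    "ASB15": (93_343_952, 93_353_409),
    "LMOD2": (93_370_342, 93_377_214),
    "WASL": (93_431_017, 93_481_693),
}


def overlaps(a, b):
    """True if closed intervals a and b share at least one position."""
    return a[0] <= b[1] and b[0] <= a[1]


overlapping = [g for g, span in GENES.items() if overlaps(span, SEGMENT)]
fully_inside = [
    g for g, (start, end) in GENES.items()
    if SEGMENT[0] <= start and end <= SEGMENT[1]
]

print("overlap segment:", overlapping)
print("fully inside   :", fully_inside)
```

Under these assumptions all 7 genes overlap the segment, while CADPS2 and WASL extend beyond its boundaries and the remaining 5 genes lie entirely within it.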

CNV analysis

The heat map from CNV analyses showed copy numbers of SNP loci for each individual sheep. No CNVs were discovered that were associated with chondrodysplasia in Texel sheep. There were only two loci with CNV values from 0 to 0.5, on OAR1 and OAR3, respectively (the dark red horizontal bars in Fig. S3). However, the 2 CNVs with rare copy numbers showed up mostly among the first 12 animals on the heat map, and did not occur specifically in either the affected group or the carrier group (Fig. S3). Therefore, the CNV analysis was not pursued further.


Mutation analysis

Two corresponding ovine EST entries (GenBank: FE030665.1, FE030666.1) were found using the cattle reference SLC13A1 mRNA as a query. The putative ovine genomic structure was determined by aligning the 2 ovine ESTs, the complete bovine SLC13A1 mRNA and the human SLC13A1 mRNA. Although the human SLC13A1 gene contains 15 exons and spans 83 kb (Lee et al. 2000), the alignments with the bovine homolog showed the presence of a 16th exon in both the bovine and ovine SLC13A1 mRNA; this exon 16 contains only a long 3’ untranslated region. The “GT-AG” splice sites in the ovine SLC13A1 gene support this genomic structure. The translated protein length ranged from 583 to 597 amino acids in the 7 species. In sheep and cattle, it is 597 amino acids long, 2 amino acids longer than in humans. Many regions of this protein are conserved across the 7 species (Fig. S4). The topology prediction showed that the predicted ovine SLC13A1 protein comprises 13 transmembrane domains (TMDs). Each peak above the upper cut-off value represents a TMD (Fig. 3).

Re-sequencing all available exons and the adjacent introns of the SLC13A1 gene in affected, carrier and normal control sheep identified a mutation potentially responsible for this disease: a 1-bp deletion of T (JN108880:g.25513delT) at position 107 bp of exon 3 (148 bp on the “sRFLPdelT” fragment, Table S1). This 1-bp deletion lies 3,751 bp from the nearest published SNP, OAR4_92963794.1, on one side and 27,654 bp from SNP s11336.1 on the other side. An additional 11 novel SNPs in this candidate gene and 5 genetic variants in the other 6 candidate genes were either intronic or synonymous and were excluded as causes of this defect (Table S1).

The g.25513delT deletion in the SLC13A1 gene shifted the open reading frame and introduced a premature stop codon (p.Val113Stop) at amino acid 113 of the predicted ovine SLC13A1 protein sequence (Fig. 2D).
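How a single-base deletion can shift the reading frame and expose a stop codon in the next codon can be shown with a toy example. The 12-base sequence below is hypothetical and is not the ovine SLC13A1 sequence; it was chosen only so that deleting one T from a leucine codon turns the following valine codon into TGA. The helper names `translate` and `delete_base` are likewise illustrative.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate a coding sequence codon by codon, stopping at the first stop."""
    protein = []
    for pos in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE[cds[pos:pos + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def delete_base(seq: str, index: int) -> str:
    """Return seq with the base at the 0-based index removed."""
    return seq[:index] + seq[index + 1:]


# Hypothetical reading frame: ATG CTT GTG ACC (Met-Leu-Val-Thr).
wild_type = "ATGCTTGTGACC"
# Deleting one T from the second codon gives ATG CTG TGA CC:
# the Leu codon is preserved and the former Val codon now reads as a stop.
mutant = delete_base(wild_type, 5)

print(translate(wild_type))  # MLVT
print(translate(mutant))     # ML
```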

Genotyping of mutation g.25513delT

Genotyping all the sheep, including unrelated controls, by direct sequencing showed that all the affected Texel sheep were homozygous for the deletion of “T” (g.25513delT/g.25513delT) in exon 3 of the SLC13A1 gene, the 8 carriers including lamb 14-04 were heterozygous with a single “T” allele (g.25513delT/T), and the pooled normal controls were all homozygous for the “T” allele (T/T) (Fig. 4). The g.25513delT mutation created an HpyAV restriction endonuclease recognition site on the PCR fragment “sRFLPdelT”, which was used to distinguish affected, carrier and normal control sheep. In the PCR-RFLP experiment, the “sRFLPdelT” fragment from a “g.25513delT/g.25513delT” genotype was cut into 2 fragments of 192 bp and 156 bp; the fragment from a “g.25513delT/T” genotype yielded 4 fragments of 349 bp, 348 bp, 192 bp and 156 bp; and the fragment from a “T/T” genotype could not be cut and kept its original length of 349 bp. The PCR-RFLP genotyping results were consistent with those from direct sequencing. The g.25513delT mutation was the only polymorphism 100% concordant with the recessive pattern of inheritance in affected, carrier and unrelated normal individuals (Table 1).
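The banding patterns reported for the PCR-RFLP test can be turned into a simple genotype call, sketched below. The fragment sizes (349 bp uncut; 192 bp and 156 bp after HpyAV digestion) come from the text; the function name `call_genotype`, the 5 bp size tolerance and the "undetermined" fallback are assumptions made for illustration. Because the 349 bp and 348 bp bands are treated as one uncut-size band, a delT fragment that failed to digest would be indistinguishable from the T allele in this sketch.

```python
from typing import Iterable

UNCUT_BP = 349          # intact "sRFLPdelT" fragment (T allele)
CUT_BP = (192, 156)     # HpyAV digestion products (g.25513delT allele)


def call_genotype(bands: Iterable[int], tolerance: int = 5) -> str:
    """Call the g.25513delT genotype from observed band sizes in bp.

    `tolerance` is an assumed allowance for sizing error on a gel.
    """
    bands = list(bands)

    def has_band(size: int) -> bool:
        return any(abs(band - size) <= tolerance for band in bands)

    uncut_present = has_band(UNCUT_BP)
    cut_present = all(has_band(size) for size in CUT_BP)

    if cut_present and not uncut_present:
        return "g.25513delT/g.25513delT"   # affected
    if cut_present and uncut_present:
        return "g.25513delT/T"             # carrier
    if uncut_present:
        return "T/T"                       # normal
    return "undetermined"


print(call_genotype([192, 156]))             # g.25513delT/g.25513delT
print(call_genotype([349, 348, 192, 156]))   # g.25513delT/T
print(call_genotype([349]))                  # T/T
```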

Taken together, our results strongly support the conclusion that the deletion of “T” in exon 3 of the SLC13A1 gene is the causative mutation for inherited chondrodysplasia in Texel sheep.


DISCUSSION

The use of genetic markers to either develop a genetic map or for QTL analysis has come a long way since early reports suggesting their use (Soller & Beckmann, 1983). With the advent of the ovine genome sequence and a high density SNP chip, the tools for analyses of quantitatively inherited traits and genetic disorders have improved markedly. These tools, combined with comparative genomic approaches to determine useful candidate genes (Rothschild & Soller, 1997), make it possible to find genetic causes of both quantitative traits and monogenic simple recessive disorders in a relatively short time.

Previous reports showed that mutations in the SLC26A2 gene were responsible for human achondrogenesis. The structural, microscopic, and biochemical analyses of chondrodysplasia in Texel sheep suggested that a similar pathogenic mechanism was involved in this defect. Genes encoding solute carriers, which are membrane transport proteins, and especially the sulfate transporters, are therefore prime candidate genes. However, there are almost 300 solute carrier genes distributed throughout the human genome, organized into 51 gene families (Hediger et al. 2004). Neither low density microsatellite-based linkage mapping nor the direct candidate gene method was suitable for handling this many candidates directly, and neither was capable of localizing the mutation underlying this Mendelian disease.

The recently developed ovine SNP chip, containing more than 50,000 SNP markers well distributed across the sheep genome, has become the most efficient tool for the genome-wide dissection of likely mutant loci (Becker et al. 2010). The rationale is as follows. Provided that a disease is recessive and accurately diagnosed when present, and that the defective allele is absent from the part of the population not descended from the founder, then regardless of the level of penetrance the affected animals must all be identical by descent at the causative mutation and most likely in the surrounding region. The expected size of the IBD region that contains the causative mutation depends in part on the number of meioses that separate the common ancestor from the affected individuals and on the number of affected individuals. In our mapping strategy, a homozygous fragment spanning more than 10 SNPs was chosen as the threshold for valid candidate regions that needed to be examined (Zhao et al. 2011). A 10-SNP interval on the sex-averaged map (Sheep Best Positions Linkage Map Version 4.7, http://rubens.its.unimelb.edu.au/~jillm/jill.htm) is about 0.64 cM. The probability that a 0.64 cM region is inherited intact through one meiosis is 1 - 0.0064 = 0.9936, and the probability that this occurs in all three meioses that separate a founder from a back-crossed affected individual is 0.9936^3 = 0.9809. The 10-SNP region will only be IBD if it was inherited intact in all 11 backcross offspring, an outcome with probability 0.9809^11 = 0.81. This value is large enough to give us confidence that the causative mutation would be contained in regions of more than 10 consecutive SNPs, and that we should examine those regions first because they had the highest likelihood of carrying the mutation.
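The arithmetic behind the 10-SNP threshold can be reproduced in a few lines. The sketch below assumes, as the text does, that the chance of a recombination within a short interval in one meiosis equals its length in Morgans, and that meioses are independent. The function name `prob_segment_intact` is hypothetical.

```python
def prob_segment_intact(length_cm: float, meioses: int) -> float:
    """Probability that a segment of `length_cm` centimorgans is passed on
    without recombination through `meioses` independent meioses."""
    per_meiosis = 1.0 - length_cm / 100.0   # 1 cM = 0.01 expected crossovers
    return per_meiosis ** meioses


SEGMENT_CM = 0.64          # approximate length of a 10-SNP interval
MEIOSES_PER_ANIMAL = 3     # founder to back-crossed affected individual
N_BACKCROSS = 11           # backcross offspring considered in the text

p_one_meiosis = prob_segment_intact(SEGMENT_CM, 1)
p_one_animal = prob_segment_intact(SEGMENT_CM, MEIOSES_PER_ANIMAL)
p_all_animals = p_one_animal ** N_BACKCROSS

print(round(p_one_meiosis, 4))   # 0.9936
print(round(p_one_animal, 4))    # 0.9809
print(round(p_all_animals, 2))   # 0.81
```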

In this study, only one such IBD fragment was detected; it comprised 25 SNP loci, and the causative mutation was located within it. The results from our current and previous studies (Zhao et al. 2011) show that our genome-wide homozygosity mapping strategy has enough power to detect causative loci for rare recessive genetic diseases where consanguineous matings are available. The candidate gene approach (Rothschild & Soller, 1997) can act as a bridge between the targeted genomic regions and the causative mutations, helping to find and verify the mutation. Candidate gene analysis can be accomplished by direct sequencing or PCR-RFLP tests. The RFLP method was first described by Dr. Soller and colleagues in 1983 (Soller & Beckmann, 1983; Beckmann & Soller, 1983). It has been successfully applied for about 30 years in a variety of polymorphism analyses, including analyses of SNPs associated with many traits of interest (Rothschild et al. 1996; Zhao et al. 2009; Zhao et al. 2010). The candidate gene approach using RFLP tests is usually more convenient, and has historically been more economical, than direct sequencing. Our work provides a successful example of combining GWAS with standard candidate gene analysis.

In this study, the presumed carrier lamb 14-04 was not heterozygous across the 25 consecutive SNPs in the targeted IBD region, as the other carriers were, but was homozygous for the same haplotype as the affected sheep. This was probably because the lamb carried one copy of the ancestral haplotype within which the g.25513delT mutation later arose in another ancestor, that is, the same SNP haplotype without the deletion. Genotyping of g.25513delT showed that lamb 14-04 was in fact heterozygous, which is consistent with its presumed disease status.

The SLC13A1 gene was considered the most plausible functional candidate gene within the 1 Mb interval on OAR 4. This gene belongs to solute carrier family 13 (SLC13) and encodes the renal Na+-dependent inorganic sulfate transporter-1 (sodium/sulfate cotransporter, NaS1). DNA sequencing revealed that the 1-bp deletion of “T” was perfectly associated with the chondrodysplasia phenotype in Texel sheep. We also demonstrated that this deletion was not observed in any of the 54 normal Romney sheep controls. We confirmed the presence of this deletion in genomic DNA and believe that the mutant SLC13A1 transcript in affected sheep may undergo nonsense-mediated mRNA decay, a process that detects nonsense mutations and prevents the expression of truncated or erroneous proteins. RT-PCR analysis of SLC13A1 mRNA could provide further evidence to support our conclusions. Unfortunately, most of the originally affected animals either died due to tracheal collapse or were euthanized at approximately 5 months of age. New lambs from designed matings between carrier animals would need to be produced for any future experiments.

Our study is the first report showing that a mutation in the SLC13A1 gene is responsible for inherited chondrodysplasia in a large mammalian species. A previous study reported that a homologous Slc13a1 knockout mouse model produced clinical phenotypes similar to those of our affected Texel sheep, including reduced serum sulfate concentration and general growth retardation throughout adulthood (Dawson et al. 2003). The knockout mouse model further supports our findings.

The NaS1 co-transporter is located on the brush border membranes of epithelial cells lining the renal proximal tubule and the gastrointestinal tract. It helps to absorb inorganic sulfate in the small intestine and to reabsorb it in the kidney proximal tubule, thereby maintaining the serum sulfate concentration (Markovich, 2001). Sulfate is the fourth most abundant anion in human plasma and plays an essential role in growth, development, cellular metabolism, and bone/cartilage formation. For example, sulfated proteoglycans, the largest group of sulfoconjugates in mammals, are required to maintain the normal structure of bone and cartilage (Fernandes et al. 1997). The SLC13 family was believed to be functionally and structurally distinct from the SLC26 gene family, which comprises Na+-independent inorganic sulfate transporters (Markovich et al. 2008). A recent study indicated that SLC26A2 protein expression was limited to the microvillus membrane of the kidney proximal tubule (Chapman & Karniski, 2010). We believe that the NaS1 and SLC26A2 proteins share similar functions in sulfate transport because of their similar localizations. The exact role of each of the two proteins in sulfate transport is still uncertain.

Copy number variation analysis has received more attention since the development of high density SNP chips and the dramatic improvements in new sequencing technologies. Several CNVs have been found to be associated with human diseases such as obesity (Chen et al. 2011), autoimmune disease (Mamtani et al. 2010) and autism (Sebat et al. 2007). Moreover, an extra copy of the FGF4 gene, in the form of an evolutionarily acquired fgf4 retrogene, was found to be strongly associated with chondrodysplasia in dogs (Parker et al. 2009). To determine whether CNVs were associated with inherited chondrodysplasia in Texel sheep, we performed a CNV analysis based on the SNP chip genotyping data. However, no CNVs were apparently associated with this defect. The copy number variations detected on OAR1 and OAR3 using the partition software probably arose from the SNP chips themselves rather than from disease status.

In summary, our study indicates that the 1-bp coding deletion of “T” in the ovine SLC13A1 gene is responsible for this form of inherited chondrodysplasia in Texel sheep. Our results suggest that NaS1 is essential for maintaining sulfate homeostasis in sheep. Future work should compare SLC13A1 gene expression (for example, by in situ hybridization) or directly measure NaS1 protein levels in kidney tissue from chondrodysplasia-affected, carrier and normal Texel sheep. This study again confirmed that combining the ovine high density SNP array with a candidate gene approach is an efficient way to localize the mutation causing a simple recessive inherited defect in sheep. Similar methodologies have been used to identify mutations underlying other recessive inherited defects in sheep, such as microphthalmia and inherited rickets (Becker et al. 2010; Zhao et al. 2011).

The g.25513delT mutation in the SLC13A1 gene can be rapidly applied as a selection marker to identify carriers of the defective allele, so that carriers can be culled or at-risk matings avoided, improving animal welfare and decreasing economic losses. Because of their moderate size and the relatively low cost of maintaining carriers, sheep with chondrodysplasia could serve as an animal model for human chondrodysplasia clinical drug development if similar human cases are diagnosed in the future.

ACKNOWLEDGEMENTS

The authors appreciate the opportunity for this paper to be included in this journal’s special issue in honor of Dr. Moshe Soller and thank the editor and referees for their comments. The authors appreciate the financial support provided by State of Iowa and Hatch Funds and the support from the College of Agriculture and Life Sciences at Iowa State University. Animal studies and diagnoses were funded by Massey University. H.T.B. was partially funded by the National Research Centre for Growth and Development.

Conflicts of Interest

The authors have declared no potential conflicts.

Web URLs

Primer3: http://fokker.wi.mit.edu/primer3/

Ovine Genome Assembly v1.0: http://www.livestockgenomics.csiro.au/perl/gbrowse.cgi/oar1.0/#search

ExPASy translate tool: http://ca.expasy.org/tools/dna.html


CLUSTALW: http://align.genome.jp/

TopPred 0.01: http://bioweb.pasteur.fr/seqanal/interfaces/toppred.html

REFERENCES

Becker D., Tetens J., Brunner A., Burstel D., Ganter M., Kijas J. & Drogemuller C. (2010) Microphthalmia in Texel sheep is associated with a missense mutation in the paired-like homeodomain 3 (PITX3) gene. PLoS One 5, e8689.

Beckmann J.S. & Soller M. (1983) Restriction fragment length polymorphisms in genetic improvement: methodologies, mapping and costs. Theoretical and Applied Genetics 67, 35-43.

Beever J.E., Smit M.A., Meyers S.N., Hadfield T.S., Bottema C., Albretsen J. & Cockett N.E. (2006) A single-base change in the tyrosine kinase II domain of ovine FGFR3 causes hereditary chondrodysplasia in sheep. Animal Genetics 37, 66-71.

Byrne T.J. (2005) Investigation into the inheritance and biochemistry of chondrodysplasia in Texel sheep. Master’s thesis. Massey University, New Zealand.

Byrne T.J., Blair H.T., Thompson K.G., Stowell K.M. & Piripi S. (2008) Inheritance of chondrodysplasia in Texel sheep. Proceedings of the New Zealand Society of Animal Production 68, 20-2.

Chapman J.M. & Karniski L.P. (2010) Protein localization of SLC26A2 (DTDST) in rat kidney. Histochemistry and Cell Biology 133, 541-7.

Chen Y., Liu Y.J., Pei Y.F., Yang T.L., Deng F.Y., Liu X.G., Li D.Y. & Deng H.W. (2011) Copy number variations at the Prader-Willi syndrome region on chromosome 15 and associations with obesity in whites. Obesity (Silver Spring) 19, 1229-34.

Cockett N.E., Shay T.L., Beever J.E., Nielsen D., Albretsen J., Georges M., Peterson K., Stephens A., Vernon W., Timofeevskaia O., South S., Mork J., Maciulis A. & Bunch T.D. (1999) Localization of the locus causing Spider Lamb Syndrome to the distal end of ovine Chromosome 6. Mammalian Genome 10, 35-8.

Dawson P.A., Beck L. & Markovich D. (2003) Hyposulfatemia, growth retardation, reduced fertility, and seizures in mice lacking a functional NaSi-1 gene. Proceedings of the National Academy of Sciences USA 100, 13704-9.

Fernandes I., Hampson G., Cahours X., Morin P., Coureau C., Couette S., Prie D., Biber J., Murer H., Friedlander G. & Silve C. (1997) Abnormal sulfate metabolism in vitamin D-deficient rats. The Journal of Clinical Investigation 100, 2196-203.

Hastbacka J., Kerrebrock A., Mokkala K., Clines G., Lovett M., Kaitila I., de la Chapelle A. & Lander E.S. (1999) Identification of the Finnish founder mutation for diastrophic dysplasia (DTD). European Journal of Human Genetics 7, 664-70.

Hastbacka J., Superti-Furga A., Wilcox W.R., Rimoin D.L., Cohn D.H. & Lander E.S. (1996) Atelosteogenesis type II is caused by mutations in the diastrophic dysplasia sulfate-transporter gene (DTDST): evidence for a phenotypic series involving three chondrodysplasias. The American Journal of Human Genetics 58, 255-62.

Hediger M.A., Romero M.F., Peng J.B., Rolfs A., Takanaga H. & Bruford E.A. (2004) The ABCs of solute carriers: physiological, pathological and therapeutic implications of human membrane transport proteins - Introduction. Pflügers Archiv 447, 465-8.

Lee A., Beck L. & Markovich D. (2000) The human renal sodium sulfate cotransporter (SLC13A1; hNaSi-1) cDNA and gene: organization, chromosomal localization, and functional characterization. Genomics 70, 354-63.

Mamtani M., Anaya J.M., He W. & Ahuja S.K. (2010) Association of copy number variation in the FCGR3B gene with risk of autoimmune diseases. Genes and Immunity 11, 155-60.

Markovich D. (2001) Physiological roles and regulation of mammalian sulfate transporters. Physiological Reviews 81, 1499-533.

Markovich D., Romano A., Storelli C. & Verri T. (2008) Functional and structural characterization of the zebrafish Na+-sulfate cotransporter 1 (NaS1) cDNA and gene (slc13a1). Physiological Genomics 34, 256-64.

Martínez S., Valdés J. & Alonso R.A. (2000) Achondroplastic dog breeds have no mutations in the transmembrane domain of the FGFR-3 gene. Canadian Journal of Veterinary Research 64, 243-5.

Parker H.G., VonHoldt B.M., Quignon P., Margulies E.H., Shao S., Mosher D.S., Spady T.C., Elkahloun A., Cargill M., Jones P.G., Maslen C.L., Acland G.M., Sutter N.B., Kuroki K., Bustamante C.D., Wayne R.K. & Ostrander E.A. (2009) An expressed fgf4 retrogene is associated with breed-defining chondrodysplasia in domestic dogs. Science 325, 995-8.

Piripi S.A. (2008) Chondrodysplasia of Texel sheep. Doctorate thesis. Massey University, New Zealand.

Piripi S.A., Williams M.A.K. & Thompson K.G. (2011) On the sulfation pattern of polysaccharides in the extracellular matrix of sheep with chondrodysplasia. Cartilage 2, 36-9.

Rothschild M., Jacobson C., Vaske D., Tuggle C., Wang L., Short T., Eckardt G., Sasaki S., Vincent A., McLaren D., Southwood O., van der Steen H., Mileham A. & Plastow G. (1996) The estrogen receptor locus is associated with a major gene influencing litter size in pigs. Proceedings of the National Academy of Sciences USA 93, 201-5.

Rothschild M.F. & Soller M. (1997) Candidate gene analysis to detect traits of economic importance in domestic livestock. Probe 8, 13-22.

Rozen S. & Skaletsky H.J. (1999) Primer3 on the WWW for general users and for biologist programmers. In: Bioinformatics Methods and Protocols: Methods in Molecular Biology (eds. by Misener S. & Krawetz S.A.), pp. 365-86. Humana Press Inc., Totowa, NJ.

Sebat J., Lakshmi B., Malhotra D., Troge J., Lese-Martin C., Walsh T., Yamrom B., Yoon S., Krasnitz A., Kendall J., Leotta A., Pai D., Zhang R., Lee Y.H., Hicks J., Spence S.J., Lee A.T., Puura K., Lehtimaki T., Ledbetter D., Gregersen P.K., Bregman J., Sutcliffe J.S., Jobanputra V., Chung W., Warburton D., King M.C., Skuse D., Geschwind D.H., Gilliam T.C., Ye K. & Wigler M. (2007) Strong association of de novo copy number mutations with autism. Science 316, 445-9.

Soller M. & Beckmann J.S. (1983) Genetic polymorphisms in varietal identification and genetic improvement. Theoretical and Applied Genetics 67, 25-33.

Superti-Furga A., Hastbacka J., Wilcox W.R., Cohn D.H., van der Harten H.J., Rossi A., Blau N., Rimoin D.L., Steinmann B., Lander E.S. & Gitzelmann R. (1996) Achondrogenesis type IB is caused by mutations in the diastrophic dysplasia sulphate transporter gene. Nature Genetics 12, 100-2.

Thompson K.G., Blair H.T., Linney L.E., West D.M. & Byrne T. (2005) Inherited chondrodysplasia in Texel sheep. New Zealand Veterinary Journal 53, 208-12.

Woelfle J.V., Brenner R.E., Zabel B., Reichel H. & Nelitz M. (2011) Schmid-type metaphyseal chondrodysplasia as the result of a collagen type X defect due to a novel COL10A1 nonsense mutation: A case report of a novel COL10A1 mutation. Journal of Orthopaedic Science 16, 245-9.

Zankl A., Mornet E. & Wong S. (2008) Specific ultrasonographic features of perinatal lethal hypophosphatasia. American Journal of Medical Genetics 146A, 1200-4.

Zhao X., Dittmer K.E., Blair H.T., Thompson K.G., Rothschild M.F. & Garrick D.J. (2011) A novel nonsense mutation in the DMP1 gene identified by a genome-wide association study is responsible for inherited rickets in Corriedale sheep. PLoS ONE 6, e21739. doi:10.1371/journal.pone.0021739.

Zhao X., Du Z.Q. & Rothschild M.F. (2010) An association study of 20 candidate genes with cryptorchidism in Siberian Husky dogs. Journal of Animal Breeding and Genetics 127, 327-31.

Zhao X., Du Z.Q., Vukasinovic N., Rodriguez F., Clutter A.C. & Rothschild M.F. (2009) Association of HOXA10, ZFPM2, and MMP2 genes with scrotal hernias evaluated via biological candidate gene analyses in pigs. American Journal of Veterinary Research 70, 1006-12.


Table 1. The genotyping results of the causative mutation (g.25513delT)

Sheep    Phenotype         No. of  Genotype                                          No. of  Rate of
breed    status            sheep   Direct sequencing        PCR-RFLP                 sheep   concordance
Texelᵃ   Affected          15      g.25513delT/g.25513delT  g.25513delT/g.25513delT  15      100%
         Presumed carrier  8       g.25513delT/T            g.25513delT/T            8       100%
Romney   Normal            54      T/T                      T/T                      54      100%

ᵃ These sheep include pure Texel sheep and Texel crossbred sheep.

Figure 1. Chondrodysplasia of Texel sheep. (A) A pair of full sibling lambs at 56 days. The left lamb (lamb ID: 23-04) was phenotypically normal and its twin brother (lamb ID: 24-04) showed stunted growth. (B) A thirty-two-day-old chondrodysplastic lamb with the characteristic short neck and wide-based stance. (C) Sections through the trachea of a chondrodysplastic lamb (bottom) and a normal lamb (top) illustrating the thickened, flattened cartilage ring in the affected lamb.

Figure 2. Work flow chart for discovering the mutation locus of chondrodysplasia in Texel sheep. (A) Sheep chromosome 4 (OAR 4). (B) The IBD region containing the causative locus is located on OAR 4 between 92.45 and 93.44 Mb. (C) Seven cow reference genes, including SLC13A1, located in the sheep IBD region based on the comparative genomic maps. (D) Comparison of the partial SLC13A1 mRNA sequences and partial protein sequences between normal (wild type) and chondrodysplastic sheep. The 3 underlined nucleotides in the mRNA sequence, highlighted in yellow, represent the codon affected by the g.25513delT mutation. The corresponding amino acids are also underlined in the protein sequences. The following codon in the sequence of affected sheep, highlighted in red, is a stop codon, represented by “X”. WT: wild type; A: affected sheep.

Figure 3. The topology prediction of transmembrane domains (TMDs) of ovine SLC13A1 protein. The peaks above the cut-off value of 1 represent the TMDs. There are 13 putative TMDs in this protein.

Figure 4. The direct sequencing results for the causative mutation g.25513delT. All the affected sheep show two copies of the T deletion; the presumed carriers show one copy; the pooled normal controls show no copy.

Figure S1. The pedigree information for the 23 sheep genotyped with the Ovine SNP chip. Parentage was established from microsatellite results (data not shown). The boxes below the animal identifiers indicate chondrodysplasia status. The 3 terminal rams are at the top left. They sired dams 3-17 in the out-cross trials, and F1 daughters in the back-cross and 4 other ewes with chondrodysplasia background. Arrows indicate parentage deduced from observation of maternal behavior soon after the birth of lambs and from microsatellite markers. M: male; F: female.

Figure S2. The distribution of homozygous fragments shared by affected lambs. The counts of homozygous fragments shared by all affected lambs are shown according to the number of SNPs spanned by each fragment. When the “larger than 10-SNP” threshold was applied, only one homozygous fragment, spanning 25 SNPs, was shared by all 15 affected lambs (indicated by the blue arrow).

Figure S3. The CNV region display. Each column represents one individual sheep, with the copy numbers of SNP loci across the whole genome. Different colours show different ranges of copy numbers. The chondrodysplasia status of each sheep is listed at the top of the display map. A: affected; C: presumed carriers.

[Figure S4: CLUSTALW (1.81) multiple sequence alignment of SLC13A1 protein sequences from mouse, rat, sheep, cattle, human, dog and zebrafish]

Figure S4. The comparison of the SLC13A1 protein sequences among 7 species. The comparison was conducted using the multiple sequence alignment tool (CLUSTALW).

Table S1. A list of candidate gene symbol, primer sequence, annealing temperature, PCR amplicon information, genetic variants identified by sequencing and accessed template sequence.

Gene     Primer       Primer sequence            Temp.  Size  Fragment   Genetic variant         Variant       SNP                 Sequence accession no.
symbol   name         (5’-3’)                    (ºC)   (bp)  name       (position in fragment)  name          characteristic      (GenBank/Ovine 1.0)

SLC13A1  sSLCre1F     AGCCACCTGTGCAAGACAAT       60     472   sSLCe1     no                      -             -                   JN108880
         sSLCre1R     CGCATTGCTGCCTAGTCTC
         sSLCre2F     AATAACCATTTGCGGCCCTA       60     301   sSLCe2     no                      -             -                   OAR4:92886000-92995055
         sSLCre2R     AAGTCAAACATTCGCAGGAAA
         sSLCdelrF    TGGATAAGGGATGTGGAACTG      56     411   sSLCdel    C > T (170bp)           g.25395C>T    intronic            JN108880
         sSLCdelrR    TGCCTGGATGTGCATTACTT                               Deletion T (288bp)      g.25513delTᵇ  exonic
         sRFLPdelTFᵃ  TGAATTACACTGCCATACAAATGA   58     349   sRFLPdelT  Deletion T (148bp)      g.25513delTᵇ  exonic              JN108880
         sRFLPdelTRᵃ  GGTCAAAGTTAGGAGCAGCAA
         sSLCre5_6F   TCCACCTTCACACACACACA       64     510   sSLCe5_6   A > G (181bp)           g.78757A>G    intronic            OAR4:92886000-92995055
         sSLCre5_6R   TGCAAACTGCCATCACTATG                               A > G (261bp)           g.78677A>G    intronic
         sSLCre8F     CTGTTCCCAGGTCTGCTTGT       64     585   sSLCe8     C > T (381bp)           g.59017C>T    intronic            JN108880
         sSLCre8R     AGAACACGGAACCCAGACAC
         sSLCre9F     CTCGCCACCATATCCCTCTA       58     711   sSLCre9    C > T (61bp)            g.63875C>T                        JN108880
         sSLCre9R     AACCAGAAACAAAGCCAGGA                               A > G (212bp)           g.64026A>G    exonic, synonymous
         sSLCre10F    CCAGGTAGAAGGATAAGAAGCTGA   58     609   sSLCre10   no                      -             -                   JN108880
         sSLCre10R    GCTCCAAGGTCTCTACTCCTGA
         sSLCre11F    AAGGGAACTTTTGGCCTAGC       60     545   sSLCre11   G > T (169bp)           g.39802G>T    intronic            OAR4:92886000-92995055
         sSLCre11R    AGAGGAAGGGTGCAGTCTGA                               A > C (239bp)           g.39732A>C    intronic
                                                                         C > T (444bp)           g.39527C>T    intronic
         sSLCre12F    AAAATGCAAAGAAGTTTTCCAA     60     535   sSLCre12   no                      -             -                   OAR4:92886000-92995055
         sSLCre12R    GCTGGGACATCATAAAAATGG
         sSLCre13F    AAAAACCACAGGAAAATATGAAAGA  60     505   sSLCre13   no                      -             -                   OAR4:92886000-92995055
         sSLCre13R    CTGCTGCTTGAACATGGAAT
         sSLCre14F    CAATCACCATATAAACAACCGAAA   60     507   sSLCre14   A > T (54bp)            g.30951A>T    intronic            OAR4:92886000-92995055
         sSLCre14R    GCACGTTTGCTCCAGGTTAT
         sSLCre15F    GAAACTTCTCTGATGCTCTCCAA    58     529   sSLCre15   A > G (27bp)            g.29232A>G    intronic            OAR4:92886000-92995055
         sSLCre15R    GGCACATAGATGCTCTTGGTT
         sSLCre16F    TGTTGAGTAGCCAGTTGTGGTT     58     567   sSLCre16   no                      -             -                   OAR4:92886000-92995055
         sSLCre16R    TTGCAATTGACTTTCCTTGC

CADPS2   sCADPS2-1F   TGGTGGGCAATTCACTTCTT       58     339   sCADPS-1   no                      -             -                   OAR4:92155745-92255745
         sCADPS2-1F   CGGGTCAACTTAGCAGGTTT
         sCADPS2-2F   TTGCAGACACACCCTCAAAA       58     405   sCADPS-2   no                      -             -                   OAR4:92155745-92255745
         sCADPS2-2R   GGACAGAGAAGCCTGGAGTG
         sCADPS2-3F   GAACCTTCCTGTTGATGCAAG      60     515   sCADPS2-3  no                      -             -                   OAR4:92155745-92255745
         sCADPS2-3R   CCATGTTTCCAGATAAATTCTGTTC
         sCADPS2-4F   AAATCATGGGCATTCAGACC       58     353   sCADPS2-4  no                      -             -                   OAR4:92155745-92255745
         sCADPS2-4R   CATGAAAGGCAGCATGAGAA

IQUB     sIQUB-1F     AGTTCCAGGGAGACCATTCC       58     302   sIQUB-1    no                      -             -                   OAR4:93209752-93265179
         sIQUB-1R     TGATTAAACCACCAGCACCA
         sIQUB-2F     GAATACAGGGCAGGGAAAGG       58     320   sIQUB-2    no                      -             -                   OAR4:93209752-93265179
         sIQUB-2R     GCAATGGTTCATTACTTTTGGA
         sIQUB-3R     CCCCCAAAAGCAGAACTTTA       58     392   sIQUB-3    no                      -             -                   OAR4:93209752-93265179
         sIQUB-3R     GCATATCTACTTCAGAGGGCTTT
         sIQUB-4F     AAATGAAGCCAGTAGCGGAAT      58     404   sIQUB-4    no                      -             -                   OAR4:93209752-93265179
         sIQUB-4R     TTTGTTATCATGGAAGAGACTTGC

ASB15    sASB15_1F    CTGTGTGAAGCGATGGTCAG       58     607   sASB15-1   A > G (150bp)           g.150A>G      exonic              JN108881
         sASB15_1R    GGCACTCGGATTTGTAGGTC
         sASB15_2F    GGGGACCTACAAATCCGAGT       58     519   sASB15-2   no                      -             -                   JN108881
         sASB15_2R    GGGGAAAGTGGTCTGTTTGA
         sASB15_3F    TTGAAAAGGCCAAGAACACA       58     404   sASB15-3   no                      -             -                   JN108881
         sASB15_3R    CTGTTTGGATTTTGGGAGACA
         sASB15_4F    TTGCCTGTGGCTCTCATACA       58     343   sASB15-4   no                      -             -                   OAR4:93343948-93353368
         sASB15_4R    TTACCAGCACAGCCATACGA
         sASB15_5F    TGCAGATGCCCTCATTGTTA       58     307   sASB15-5   no                      -             -                   OAR4:93343948-93353368
         sASB15_5R    CCTGTTGAACGGAGCGTAA
         sASB15_6F    TCCAAACATGCAATCCAGAA       58     372   sASB15-6   no                      -             -                   OAR4:93343948-93353368
         sASB15_6R    ACTGTATGGCACTGGGGAAG

LMOD2    sLMOD2-1F    AGCTGGACGACATTGAACCT       58     434   sLMOD2-1   C > T (207bp)           g.470C>T      intronic            OAR4:93370239-93377211
         sLMOD2-1R    TCTAAACCCGTGCTTTGCTT                               G > T (230bp)           g.493G>T      intronic
         sLMOD2-2F    GGGTGTCAACCCATTTCAAC       58     444   sLMOD2-2   no                      -             -                   OAR4:93370239-93377211
         sLMOD2-2R    GTGCCAACAGGACTTGGTTT
         sLMOD2-3F    TGTGAGTTCATCCCTGGTCA       58     337   sLMOD2-3   no                      -             -                   OAR4:93370239-93377211
         sLMOD2-3R    TGCTCTGGAAAACCCTTGAT
         sLMOD2-4F    CAGCGAAGGAGAGGAAAGAA       63     430   sLMOD2-4   no                      -             -                   OAR4:93370239-93377211
         sLMOD2-4R    GATGTTGACGCTGGTGATGT
         sLMOD2-5F    CCTTTGGCTACAGGAGAGGA       58     306   sLMOD2-5   no                      -             -                   OAR4:93370239-93377211
         sLMOD2-5R    GGGATTGGGAATAGCTGAGG

WASL     sWASL-1F     GGGGGTCTCGTCTTTTCTCT       63     466   sWASL-1    C > T (248bp)           g.256C>T      intronic            OAR4:93431017-93481693
         sWASL-1R     TTGTCCCTCTGCTGTCACTG
         sWASL-2F     TGTTTTGTGTAAAGGGCCAGA      60     494   sWASL-2    no                      -             -                   OAR4:93431017-93481693
         sWASL-2R     TCCATGCTTTTTGCCTTCTT
         sWASL-3F     CGTTGCTTCCTATTAAGGCGTA     58     381   sWASL-3    Deletion A (309bp)      g.5692delA    intronic            OAR4:93431017-93481693
         sWASL-3R     TGCTCTGTGTGTAAAGGCTGTT
         sWASL-4F     AACGCCCATGCTTCAATATC       58     399   sWALS-4    no                      -             -                   OAR4:93431017-93481693
         sWASL-4R     GGGGTTCTGGTTCAGTGTCT


CHAPTER 5. BAYESIAN INFERENCE IDENTIFIES A CANDIDATE

REGION ASSOCIATED WITH CANINE CRYPTORCHIDISM THAT INCLUDES

THE AMHR2 GENE

A paper prepared to submit to the Journal of Heredity

Xia Zhao, Suneel Onteru, Mahdi Saatchi, Dorian Garrick, Max Rothschild

Department of Animal Science, Iowa State University, IA, 50011 USA.

Key words: AMHR2, cryptorchidism, BayesB analysis

ABSTRACT

Cryptorchidism is a condition whereby one or both testes fail to descend into the scrotal

sac. It is a common congenital disorder prevalent in many mammalian species.

Cryptorchidism is a complex genetic disease considered to be caused by the interaction of

genetic, epigenetic and environmental factors. No single mutated gene has been found to

cause more than 10% of cryptorchids. Here we performed a genome wide association

study (GWAS) using case-control analysis in PLINK software and the BayesB approach

in GenSel software with a 1Mb window of SNPs or haplotypes. The haplotypes were

constructed from a genealogical tree on the population of 204 Siberian Huskies and its 3

subgroups clustered by genomic relationships to account for population structure. No

significant association was observed in the case-control analyses after FDR corrections in

PLINK. In contrast, the BayesB analysis identified a 1Mb region on dog chromosome 27


(CFA27) containing the AMHR2 (anti-mullerian hormone type II receptor) gene that

explained a high percentage of genetic variance in subgroup 1 (~ 10 fold greater than the

expected value for an infinitesimal model in the SNP window analysis and ~ 9 fold

greater in the haplotype window analysis). It is known that mutations in AMHR2 cause

persistent mullerian duct syndrome (PMDS) in humans and the majority of such patients

with PMDS exhibit cryptorchidism. In conclusion, we identified a putative candidate

region at the 4th Mb on CFA27 (CanFam2.0) harboring a very promising functional candidate gene AMHR2 associated with the development of cryptorchidism in a subgroup of our Siberian Husky population.

INTRODUCTION

Testicular descent is an essential developmental stage for normal reproduction in male

individuals of many mammalian species. Normally, testes descend from an

intra-abdominal position down into the scrotal sac. The scrotal environment, 2 to 4 °C below

normal body temperature, is a prerequisite for spermatogenesis in humans and many other

mammals (Klonisch et al. 2004; Thonneau et al. 1998). The approximate timing of

testicular descent varies in different species. In mice, rats and especially dogs, testicular

descent is normally completed postnatally, while in humans, rabbits, pigs and horses, this

process is normally completed before birth (Amann and Veeramachaneni, 2007; Hughes

and Acerini, 2008). The failure of a testis to descend into the scrotal sac is called

cryptorchidism (Greek: hidden testicle). This defect is regarded as one of the most

common congenital human defects. About 3% of full-term and 30% of premature infant

boys are born with at least one testicle undescended (Klonisch et al. 2004).

Cryptorchidism has been observed in other mammalian species, such as dogs (Cox et al.


1978), cats (Yates et al. 2003), horses (Cox et al. 1979), pigs (Rothschild et al. 1988), goats (Singh and Ezeasor 1989), sheep (Smith et al. 2007) and cattle (St Jean et al. 1992), as well as in wild animals like maned wolves (Burton and Ramsay, 1986), black bears

(Dunbar et al. 1996) and reindeer (Leader-Williams 1979). An undescended testicle is considered a risk factor for germ-cell tumors and bilateral cryptorchidism can result in male infertility (Berkowitz et al. 1993; Carizza et al. 1990).

Cryptorchidism is generally thought to result from multiple factors, including genetic, epigenetic and environmental components (Spencer et al. 1991; Klonisch et al.

2004; Thonneau et al. 2003; Hutson and Hasthorpe, 2005; Amann and Veeramachaneni

2007). A number of environmental factors have been recognized to be associated with an undescended testis in humans, and these factors include low birth weight, premature birth, maternal consumption of alcohol, smoking and caffeine, gestational diabetes and prenatal exposure to pesticides (Hughes and Acerini 2008; Wang and Wang 2002).

Familial analyses indicate that cryptorchidism has a genetic basis. Various genes implicated in the regulation of testicular descent have been investigated for cryptorchidism. The process of testicular descent has been described by a biphasic model, with a “transabdominal” phase followed by an “inguinoscrotal” phase (Hutson 1985). Androgens might play a role in both phases, and impaired fetal androgen action can result in cryptorchidism. Genes related to androgen action have therefore been investigated in the study of cryptorchidism. The androgen receptor gene (AR) is strongly expressed in the gubernaculum (George and Peterson 1988). Length variants of the trinucleotide repeats GGN (encoding glycine) and CAG (encoding glutamine) in AR were found to be associated with a risk of cryptorchidism in an Iranian population and in Hispanic boys, respectively (Radpour et al. 2007; Davis-Dao et al. 2012). Mice with conditional

inactivation of Ar developed cryptorchidism with testes located in a suprascrotal position

(Kaftanovskaya et al. 2012). Androgens act on testicular descent via the release of a

calcitonin gene-related peptide by the genitofemoral nerve. Calcitonin gene-related

peptides are encoded by two separate genes named CALCA (CGRP-alpha) and CALCB

(CGRP-beta). Injection of exogenous calcitonin gene-related peptide into the scrotum can change the direction of gubernacular migration in the mutant trans-scrotal (TS) rat

(Griffiths et al. 1993; Clarnette and Hutson 1999). However, the CGRP gene was not found to be a major factor for cryptorchidism based on a mutation association analysis in a human study (Zuccarello et al. 2004).

The swelling and migration of the male gubernaculum, together with relaxation of the cranial suspensory ligament, allow the testes to move from the posterior abdominal wall to the internal inguinal ring during testicular descent (Hutson 1985). The gubernaculum is a continuous column of mesenchyme connecting the testis to the scrotal sac. Insulin-like factor 3 (INSL3) is expressed in the pre- and postnatal Leydig cells of the testis and postnatal theca cells of the ovary, and is a member of the insulin-like hormone superfamily (Zimmermann et al. 1997). Targeted disruption of this gene prevents gubernaculum growth and causes bilateral cryptorchidism in mice (Nef and Parada 1999;

Zimmermann et al. 1999). The receptor of INSL3 is encoded by a gene called relaxin/insulin-like family peptide receptor 2 (RXFP2), also named the leucine-rich repeat-containing G protein-coupled receptor 8 gene (LGR8). LGR8 is highly expressed in the gubernaculum and is responsible for mediating hormonal signals that control normal testicular descent (Bogatcheva et al. 2003). Mutant mice for this gene exhibit

bilateral intra-abdominal cryptorchidism due to the undifferentiated gubernaculum

(Gorlov et al. 2002). Previous reports indicate that only a minority of affected cases have a mutation in either the INSL3 gene or its receptor gene LGR8.

Disruptions of other hormone-related genes also show associations with aberrant testicular descent. Mullerian inhibiting substance (MIS), or anti-Mullerian hormone

(AMH), a 140-kDa glycoprotein produced by Sertoli cells, is critical for mullerian duct regression during male sexual differentiation (Bashir and Wells 1995) and is associated with the swelling reaction of the gubernaculum that occurs during the first phase of testicular descent (Kubota et al. 2002). A mutated gonadotropin-releasing hormone receptor (Gnrhr) gene is associated with undescended testes in mice (Pask et al. 2005).

In humans, maternal exposure to the synthetic nonsteroidal estrogen, diethylstilbestrol

(DES), results in a high risk of offspring with cryptorchidism (Gill et al. 1979). The receptor of estrogen, estrogen receptor 1 (ESR1), is one of the ligand-activated transcription factors and is important in the process of testicular descent (Donaldson et al.

1996; Przewratil et al. 2004). Retracted gonads in male Esr1 knockout mice suggest that estrogens play a direct role in the scrotal positioning of the testis (Donaldson et al.

1996). Investigations have also been carried out on homeobox (Hox) genes in mice for cryptorchidism. Homeobox A10 (Hoxa10) is highly expressed in the gubernaculum and this gene has been found to be essential for the gubernacular migration during testicular descent (Nightingale et al. 2008). In mice, disruption of this gene and another Hox gene, called Hoxa11, results in the development of cryptorchidism (Satokata et al. 1995; Hsieh-

Li et al. 1995).


There is a high prevalence (about 14%) of cryptorchidism in Siberian Husky dogs (Zhao et al. 2010). Although several studies show associations of some candidate genes with cryptorchidism, the genetic basis of the development of cryptorchidism is still controversial. Here we performed a genome wide association study in Siberian Husky dogs using the Illumina CanineHD BeadChip, which carries more than 170,000 single nucleotide polymorphisms (SNPs), to identify genomic regions and propose positional candidate genes associated with variation in the incidence of cryptorchidism.

METHODS

Animals and phenotypes

A total of 205 male Siberian Huskies and 12 male Chinooks were included in this study.

Among them, 43 Siberian Huskies were from the UK, 8 from Canada, and the remaining

154 Siberian Huskies and all Chinooks were from the USA. Dogs were diagnosed at 6 months of age by veterinarians or dog owners as being either unilateral or bilateral cryptorchids, or as normal unaffected dogs. Among the Siberian Huskies, 106 dogs had cryptorchidism and 99 were diagnosed as normal. Six affected and 6 control Chinooks were included as a control population for population stratification.

Samples and DNA preparation

Buccal samples were collected by owners from 186 Siberian Huskies and 12 male

Chinooks using sterile cytology brushes (Fisher Scientific, Hampton, NH, USA). The

Canine Health Information Center (CHIC) DNA repository provided DNA samples from 19 male Siberian

Huskies with an average concentration of 50 ng/µl. DNA was extracted from 2 to 3 cytology brushes per dog within a single reaction of the QIAamp DNA Mini

Kit (QIAGEN, Inc., Valencia, CA, USA). The DNA concentrations ranged from 10.0 to

50.0 ng/µl. Samples with a DNA concentration less than 20 ng/µl were subjected to whole-genome amplification (WGA) using the GenomiPhi kit (GE Healthcare, Buckinghamshire,

England) to yield high-quality DNA suitable for the Illumina Infinium canine genotyping platform.

Data generation and analyses

SNP array genotyping and quality control

DNA samples with a total DNA amount of 700-1,000 ng were sent to GeneSeek, Inc.

(Lincoln, NE, USA) for genotyping on the CanineHD BeadChip. The SNPs with call rate ≤ 40%, GenTrain score ≤ 40%, and minor allele frequency ≤ 0.01 were excluded from the data set.
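The exclusion rules above can be written as a simple filter. The following is a minimal sketch and not the pipeline used in the study. It assumes that a SNP is dropped when it fails any one of the three criteria; the record fields (`call_rate`, `gentrain`, `maf`) and the example marker names are hypothetical, and the thresholds are copied from the text as stated.

```python
from dataclasses import dataclass

# Thresholds as stated in the text; a SNP at or below any of them is excluded
# (assumption: failing a single criterion is enough for exclusion).
MIN_CALL_RATE = 0.40
MIN_GENTRAIN = 0.40
MIN_MAF = 0.01


@dataclass
class SnpSummary:
    name: str         # hypothetical marker identifier
    call_rate: float  # fraction of samples with a genotype call
    gentrain: float   # Illumina GenTrain clustering score
    maf: float        # minor allele frequency


def passes_qc(snp: SnpSummary) -> bool:
    """Return True when the SNP exceeds all three exclusion thresholds."""
    return (snp.call_rate > MIN_CALL_RATE
            and snp.gentrain > MIN_GENTRAIN
            and snp.maf > MIN_MAF)


# Invented example records, not data from the study.
snps = [
    SnpSummary("snp_a", 0.99, 0.85, 0.25),   # kept
    SnpSummary("snp_b", 0.35, 0.90, 0.30),   # low call rate
    SnpSummary("snp_c", 0.98, 0.80, 0.005),  # rare minor allele
    SnpSummary("snp_d", 0.97, 0.40, 0.10),   # GenTrain exactly at the threshold
]
kept = [s.name for s in snps if passes_qc(s)]
print(kept)
```

Because the thresholds are written with "≤", a SNP that sits exactly on a threshold (such as `snp_d`) is excluded in this sketch.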

Analyses of population structure

Genotypes of all 217 dogs including 205 Siberian Huskies and 12 Chinooks were imported into Haploview v4.2 software (Barrett et al. 2005). Pairwise linkage disequilibrium (LD) comparisons among markers of each chromosome were performed.

Tagging SNPs were identified using the TAGGER routine implemented in Haploview.

In order to remove the underlying group effects, we stratified the population using the model-based clustering method in STRUCTURE v2.3 software (Pritchard et al. 2000), which infers population structure from multi-locus genotypes, here the Illumina 170K CanineHD BeadChip genotypes for all animals. The admixture model was run in STRUCTURE on the selected tagging SNP markers, and most of the parameters were left at their default values.

The lengths of both the burn-in and the post-burn-in Markov chain Monte Carlo (MCMC) iterations were 5,000, with 3 runs for each K (the number of presumed population


clusters). The presumed value of K was varied from 1 to 7. To detect the true value of K,

the second-order rate of change of the likelihood function with respect to K, represented as

ΔK, was applied. The highest value of the ΔK distribution was used as an indicator of the

strength of the signal detected by STRUCTURE (Evanno et al., 2005). The dogs were assigned into different subgroups according to the constitution of Q values (estimated membership coefficient) in presumed clusters for each individual.
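The ΔK criterion of Evanno et al. (2005) can be illustrated with a short calculation. The sketch below uses the common formulation based on run means, ΔK = |mean L(K+1) - 2 mean L(K) + mean L(K-1)| / sd(L(K)), where L(K) is the Ln P(D) value reported by STRUCTURE. The log-likelihood values are invented for illustration and are not the values obtained in this study.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Compute Delta K for every K that has both neighbours K-1 and K+1.

    lnp_by_k maps K to the list of Ln P(D) values from replicate runs.
    """
    result = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in lnp_by_k or k + 1 not in lnp_by_k:
            continue  # Delta K is undefined at the ends of the K range
        second_diff = abs(mean(lnp_by_k[k + 1])
                          - 2 * mean(lnp_by_k[k])
                          + mean(lnp_by_k[k - 1]))
        result[k] = second_diff / stdev(lnp_by_k[k])
    return result


# Hypothetical Ln P(D) values, 3 runs per K (not the study's data).
runs = {
    1: [-5000.0, -5002.0, -4998.0],
    2: [-4700.0, -4704.0, -4696.0],
    3: [-4400.0, -4401.0, -4399.0],
    4: [-4390.0, -4410.0, -4370.0],
    5: [-4385.0, -4425.0, -4345.0],
}
dk = delta_k(runs)
best_k = max(dk, key=dk.get)
print(best_k, dk)
```

In this toy example the likelihood stops improving sharply after K = 3 and the replicate runs at K = 3 agree closely, so ΔK peaks at K = 3. Note that ΔK cannot be evaluated for the smallest and largest K tested.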

Genome wide association studies

Genome wide association studies were carried out on each of the subgroups assigned by

STRUCTURE as well as on all male Siberian Huskies. Three approaches were applied for the data analyses. The first approach was the case-control analysis implemented in

PLINK v1.06 software (Purcell et al. 2007) using single SNP markers. The linear step-up procedure of Benjamini and Hochberg for FDR control in PLINK was used to correct for multiple testing. The second approach was a BayesB model averaging approach using the GenSel v4R software (http://bigs.ansci.iastate.edu) with the categorical analysis option, to obtain the variance explained by SNPs in one megabase

(1Mb) genomic windows (which we refer to as the SNP model). The statistical model used for

BayesB was as follows:

$$y_i = \mu + \sum_{j=1}^{k} z_{ij} u_j + e_i$$

where $y_i$ is the phenotype of animal $i$, $\mu$ is an overall mean, $k$ is the total number of SNPs in the panel, $z_{ij}$ is the genotype at marker $j$ in individual $i$, $u_j$ is a random substitution effect for marker $j$, with $u_j \sim N(0, \sigma^2_{u_j})$ having its own variance $\sigma^2_{u_j}$ (with a probability of $1-\pi$) or $u_j = 0$ (with a probability of $\pi$), and $e_i$ is the random residual for animal $i$, with residuals assumed to be normally distributed $N(0, \sigma^2_e)$. The parameter $\pi$ was assumed to be 0.9999 so as to fit

10-20 markers (0.0001 × 170K) per iteration of the Markov chain in a mixture model for the estimation of individual SNP effects. A total of 51,050 MCMC iterations, with a burn-in of 1,000 iterations and an output frequency of 50 iterations, were used for inference. There were 2,346 unique 1Mb windows across the dog genome based on the genotyped markers. The number of markers in each 1Mb window varied according to the genomic region. It was assumed that under an infinitesimal model all windows would account for the same share of genetic variance, about 0.04% (1/2,346). Hence, the 1Mb windows explaining at least 0.4% of genetic variance, which is 10 fold greater than expected, were considered as indicative candidate regions for a gene influencing the incidence of cryptorchidism.
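The numbers quoted in this paragraph follow from simple arithmetic, which the sketch below reproduces. It only restates values given in the text; the constant names are ours, and the count of stored samples assumes that one sample is written every 50 post-burn-in iterations, which is our reading of "output frequency" rather than documented GenSel behavior.

```python
N_WINDOWS = 2346          # unique 1Mb windows reported in the text
N_SNPS_NOMINAL = 170_000  # approximate panel size ("170K")
PI = 0.9999               # prior probability that a SNP effect is zero
TOTAL_ITER = 51_050       # total MCMC iterations
BURN_IN = 1_000           # iterations discarded as burn-in
OUTPUT_EVERY = 50         # output frequency in iterations

# Expected share of genetic variance per window if all windows were equal.
expected_pct = 100.0 / N_WINDOWS
snp_threshold_pct = 10 * expected_pct   # 10-fold criterion (SNP model)
hap_threshold_pct = 5 * expected_pct    # 5-fold criterion (haplotype model)

# Expected number of markers with non-zero effect in each iteration.
markers_per_iter = (1 - PI) * N_SNPS_NOMINAL

# Iterations available for inference and, under our assumption, stored samples.
post_burn_in = TOTAL_ITER - BURN_IN
stored_samples = post_burn_in // OUTPUT_EVERY

print(f"expected variance per window: {expected_pct:.4f}%")
print(f"10-fold threshold: {snp_threshold_pct:.3f}%")
print(f"5-fold threshold: {hap_threshold_pct:.3f}%")
print(f"markers fitted per iteration: {markers_per_iter:.1f}")
print(f"post-burn-in iterations: {post_burn_in}, stored samples: {stored_samples}")
```

The exact expectation is slightly above 0.04%, so the rounded thresholds of 0.4% and 0.2% used in the text are marginally more permissive than strict 10-fold and 5-fold cut-offs.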

In order to utilize haplotype information for identifying genes underlying complex traits, a tree-based association method that captures information about the evolutionary history

(genealogy) of the genome was implemented in the analysis. The third approach, which we refer to as the BayesB analysis with the haplotype model, was carried out on each of the subgroups and their combined dataset to further verify regions showing associations with cryptorchidism in the BayesB analysis with the SNP model. To perform this analysis, genotypes for each individual were phased on each chromosome (based on the CanFam

2.0 assembly) using BEAGLE v3.3 software, in which a localized haplotype-cluster model was fitted using an EM algorithm (Browning & Browning 2007). Subsequently, local phylogenetic trees were constructed for each SNP marker with default values in Blossoc software (Mailund et al. 2006) and were drawn as rooted binary trees (Sahana et al.

2011). At the third level of each tree, there are eight possible nodes and all haplotypes


descending from each of the nodes (with the same phylogenetic lineage) were considered

as a cluster. A new incidence matrix was developed to relate each of the genotyped

animals to the detected haplotype clusters and its original SNP genotypes. The same

BayesB model was applied to the new incidence matrix and the genetic variance

explained by each 1 Mb genomic window was obtained using GenSel software. The

windows passing the criterion (> 0.40% of genetic variance) in the SNP model and also

explaining at least 0.20% of genetic variance in the haplotype model, which is 5 fold greater than expected, were

retained as candidate regions in this study.
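One way to picture the haplotype-cluster incidence matrix is as a count of how many copies of each cluster an animal carries at a given local tree. The sketch below is an illustration under our own assumptions and is not the coding used by the authors: it takes hypothetical cluster labels (0 to 7, one for each node at the third level of a rooted binary tree) for the two phased haplotypes of each dog and builds one row of copy counts per animal.

```python
N_CLUSTERS = 8  # nodes at the third level of a rooted binary tree (2 ** 3)


def cluster_counts(hap_pair):
    """Return the number of copies (0, 1 or 2) of each cluster for one animal."""
    row = [0] * N_CLUSTERS
    for cluster in hap_pair:
        row[cluster] += 1
    return row


# Hypothetical cluster labels for the two phased haplotypes of each dog.
animals = {
    "dog_1": (0, 0),  # both haplotypes fall in cluster 0
    "dog_2": (0, 5),
    "dog_3": (7, 2),
}
incidence = {name: cluster_counts(pair) for name, pair in animals.items()}
for name, row in incidence.items():
    print(name, row)
```

Each row sums to 2 because every animal contributes two haplotypes. Columns built this way could be placed alongside the original SNP genotype columns before fitting the same mixture model.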

RESULTS

There were 143,286 SNPs for GWAS analyses after quality control edits. One Siberian

Husky dog was excluded due to its low genotype call rate and the remaining 105 affected

and 99 controls were included for the GWAS analyses. A total of 47,672 tagging SNP

markers were selected using Haploview (data not shown) and then used to run

simulations in STRUCTURE. Under a model assuming admixture, the results recover 3

population clusters (K=3) according to the ΔK statistic, which is based on the rate of

change in the log probability of the data (Figure 1B). The estimated proportion of ancestry (Q value) derived from the 3 assumed population clusters varied among individual dogs. The 12 Chinook dogs were distinct from the Siberian Husky dogs

(Figure 1A). Here we re-assigned dog samples into 4 subgroups according to their constitution of Q values in the 3 population clusters (Figure 1C, Supplementary Table S1).

The first 3 subgroups are Siberian Huskies and subgroup 4 included all 12 Chinooks with extreme Q values of cluster 3 (0.928 to 1). Due to the small sample size, the Chinook dogs in subgroup 4 were used only as a control breed for the structure clustering, but not


included in subsequent GWAS. There were 129 dogs in subgroup 1 with a range of Q

values from 0.709 to 1 for cluster 1. The majority of these dogs were collected from the USA

(95%) and 5% were from Canada. Sixty dogs were cryptorchids and 69 were controls in this

subgroup. Subgroup 3 included 20 cryptorchids and 15 control dogs, with high Q values

of cluster 2 (0.700-0.838). Most of these dogs were sampled from the UK (94%).

Individuals in the subgroup 2 had intermediate Q values for both clusters 1 and 2. These

dogs were of mixed origins, being sampled from the USA (71%), the UK (24%) and Canada

(5%). There were 41 individuals in subgroup 2, and 26 were cryptorchids. The sample

sources and breeds for each dog subgroup are in Figure 1C and the supplementary Table

S1.
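The subgroup assignment described above can be sketched as a rule on the membership coefficients. The text reports the observed Q ranges in each subgroup rather than an explicit decision rule, so the single cutoff of 0.70 used below is our assumption, chosen to be consistent with those ranges; the example dogs and their Q values are invented.

```python
CUTOFF = 0.70  # assumed cutoff; the text reports observed Q ranges, not a rule


def assign_subgroup(q1, q2, q3):
    """Assign a dog to subgroup 1-4 from its membership coefficients."""
    if q3 >= CUTOFF:
        return 4  # dominated by cluster 3 (the Chinook cluster)
    if q1 >= CUTOFF:
        return 1  # dominated by cluster 1
    if q2 >= CUTOFF:
        return 3  # dominated by cluster 2
    return 2      # intermediate membership in clusters 1 and 2


# Hypothetical (Q1, Q2, Q3) triples, each summing to 1.
examples = {
    "dog_a": (0.95, 0.04, 0.01),
    "dog_b": (0.50, 0.48, 0.02),
    "dog_c": (0.15, 0.80, 0.05),
    "dog_d": (0.02, 0.03, 0.95),
}
subgroups = {name: assign_subgroup(*q) for name, q in examples.items()}
print(subgroups)
```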

Twelve 1Mb SNP windows showed a high percentage of genetic variance (> 0.4%, 10

fold greater than the expected value) when the BayesB analyses were conducted in

subgroup 1. These regions include one window on each of dog chromosomes CFA1,

CFA3, CFA14 and CFA18, and four windows on each of CFA5 and CFA27. Notably, the

four windows on CFA27 explain 4.67% of genetic variance and one of them located at

the 5th Mb explains 3.24% of genetic variance by itself (Table 1, Figure 2). Furthermore, three of these four windows on CFA27 still explained a high percentage of genetic variance (> 5 fold of the expected value; 0.34%, 0.53% and 0.22% for the 1Mb windows located at the 4th, 5th and 6th Mb on CFA27, respectively) when the window approach with the haplotype model was used in the BayesB analyses for subgroup 1

(Table 1). The windows on CFA1 and CFA14 were both at 50 Mb and were also detected by the haplotype analysis accounting for genetic variance over 0.2% (Table 1, Figure 2).

By searching the database of CanFam 2.0 using Ensembl browser, 74 characterized genes


have been mapped between the 4th and the 6th Mb on CFA27, 8 on the 50th Mb of CFA1, and 5 on the 50th Mb of CFA14 (Supplementary Table S2). Among them, a candidate gene named anti-mullerian hormone type II receptor (AMHR2) was found to be located in

the region of the 4th Mb of CFA27 (Figure 2, Supplementary Table S2). Human patients with mutations in this gene typically develop cryptorchidism (Clarnette et al. 1997).

Moreover, a cluster of HOX-C genes including HOXC4, HOXC5, HOXC6, HOXC8,

HOXC10, HOXC11, HOXC12 and HOXC13 were identified as being located within the

5th Mb window on CFA27 (Table S2).

The results from BayesB analyses by SNP model for subgroup 2 and subgroup 3 identified six different regions. Among those six 1Mb regions, only two of the 1Mb windows were also identified as candidate regions with the haplotype model, those windows at 75 Mb on CFA4 for subgroup 2 (Table 1, Supplementary Figure S1) and 18

Mb on CFA31 for subgroup 3 (Table 1, Supplementary Figure S2). Eight characterized genes, including CAPSL (calcyphosine-like), IL7R (interleukin 7 receptor) and SPEF2

(sperm flagellar 2), have been mapped to 75 Mb of CFA4 and two types of U6 spliceosomal RNA have been mapped to 18 Mb of CFA31 (Supplementary Table S3). No genes in these regions have been reported so far to be associated with the processes of testicular descent or the development of cryptorchidism. The sample sizes for these two subgroups are relatively small. More samples will be needed to discover new loci if dogs with cryptorchidism in these two subgroups carry causative mutations distinct from those of individuals in subgroup 1.

The results from GWAS analyses using all Siberian Huskies (105 affected and 99 controls) show that 1Mb SNP windows on CFA3, 6, 9, 24, 27, X explain a high


percentage of genetic variance (higher than 0.40%) (Table 1). Among those, four regions

including the 31st and 33rd Mb windows on CFAX, the 56th Mb window on CFA9 and the 13th Mb window on CFA27 also showed a high percentage of variance (higher than 0.20%) when the windows were fitted using the haplotype model (Table 1, Supplementary Figure S3).

Comparing the results of this pooled analysis with previous analyses from each of the 3 subgroups separately, we did not identify a common window explaining high percentage of genetic variance.

It is to be noted that most of the SNPs with lowest raw P-values on each chromosome in the case-control analyses in PLINK were located within the windows detected from the

BayesB analyses (Figure 2A, Supplementary Figure S1A, S2A and S3A), even though their significance disappeared after FDR corrections (data not shown).
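The loss of significance after FDR correction can be understood from how the Benjamini and Hochberg step-up adjustment works: each raw p-value is multiplied by the number of tests divided by its rank, and the results are then made monotone. The function below is a minimal sketch of that adjustment written for illustration; it is not PLINK's implementation, and the p-values are invented.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values are monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.20]  # hypothetical raw p-values
adj = benjamini_hochberg(raw)
for p, q in zip(raw, adj):
    print(f"raw {p:.2f} -> adjusted {q:.4f}")
```

With 143,286 SNPs tested, the multiplier applied to the top-ranked markers is very large, so a raw p-value has to be far smaller than conventional thresholds to remain significant after adjustment.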

Additionally, we summarized the results from the BayesB analyses for genes that had previously been shown to be associated with cryptorchidism, including ESR1, NR5A1

(nuclear receptor subfamily 5, group a, member 1), GNRHR, HOXA10, HOXA11, FGFR1

(fibroblast growth factor receptor 1), SOS1 (son of sevenless, drosophila, homolog 1),

WT-1 (Wilms tumor gene 1), INSL3, AMH, CALCA, PROKR2 (prokineticin receptor 2),

LGR8, COL2A1, KAL1 (Kallmann syndrome interval gene 1) and AR (Table 2). There are no regions covering these genes passing the criteria set in this study to show apparent associations with cryptorchidism, neither in the three subgroup analyses nor in the pooled analysis of the Siberian Husky population. The causative loci for cryptorchidism in our study do not seem to coincide with those previously reported candidate genes.


DISCUSSION

Genetic association studies relate differences in disease status, reflected as case and control groups, to differences in allele frequencies at SNPs. Undetected population substructure in an admixed population can lead to spurious allelic associations (Cardon and Palmer 2003). A disease is heterogeneous when the same condition is caused by different genes, or by different alleles of the same gene, in different populations. The genetic basis for cryptorchidism is likely heterogeneous in different populations. Perhaps at least 20 genes might be associated with cryptorchidism, since no single mutated gene has been found to cause more than 10% of cryptorchids (Klonisch et al. 2004; Amann and Veeramachaneni 2007). Previous studies have shown that only 3-5% of cryptorchid men carry aberrations in INSL3 or GREAT and < 16% in AR or ESR1 (Amann and

Veeramachaneni 2007). In another instance, a specific haplotype of ESR1 has been found to be associated with cryptorchidism status in a Japanese population (Yoshida et al.

2005), but to be associated only with the severity of cryptorchidism in other populations (Wang et al. 2008). In such cases, isolating a relatively homogeneous subpopulation from a heterogeneous population is necessary for cryptorchidism studies.

Here we used STRUCTURE to infer the presence of distinct population clusters and then re-assigned individuals to 4 new subgroups in order to discover aberrant genes associated with the condition in each subsample. The re-assigned dog subgroups were for the most part consistent with their geographic distribution, but the regions (windows) identified differed distinctly among the subgroups. This suggests that different genes might contribute to cryptorchidism in different groups of dogs.


Comparing the results of the pooled analyses (all 204 Siberian Huskies) using BayesB

approaches with results from the three subgroups separately, we did not identify a

common window accounting for high genetic variance. This may be due to the fact that dogs

from different geographical backgrounds in the pooled population blended the allele

frequencies of SNPs associated with the disease in any particular subgroup.

In this study, the AMHR2 gene was considered a promising candidate gene for canine cryptorchidism in Siberian Husky subgroup 1 based on results from the BayesB analyses using the window approaches. The SNP located within the AMHR2 gene (BICF2G630137913, at 4,791,435 bp on CFA27) was not significantly associated with cryptorchidism in the single marker analysis in PLINK after FDR corrections (raw P = 0.002, genome-wide P = 0.9971).

This marker is about 0.59Mb away on CFA27 from SNP BICF2G6301380, which has the smallest raw P value in the analyses (raw P = 2.39E-05). In the

LD structure for SNPs on CFA27 examined using Haploview, BICF2G630137913 did not show strong LD with nearby SNPs (r² = 0.2 with the left SNP BICF2G630137904 or the right

SNP BICF2G630137922). The average LD for BICF2G630137913 with other markers on CFA27 was very low (r² = 0.04). It is therefore not surprising that the window containing BICF2G630137913 explains a high percentage of genetic variance although no significant association was observed between this SNP and cryptorchidism in the single marker analysis using PLINK. If a true causative mutation in AMHR2 is not in high LD with SNP BICF2G630137913, the single SNP analysis will not capture the association with the disease. Therefore, the cumulative effects of 1Mb SNP windows, or of windows integrating haplotype information, would facilitate detection of regions harboring causative loci that may not be apparent in single SNP analyses.


To confirm whether AMHR2 is a causative locus for cryptorchidism, expression array studies and/or fine mapping by sequencing the whole gene are necessary.

The case-control analysis using single SNP tests in the PLINK program is a frequentist approach that calculates a p-value for the null hypothesis of no association at each SNP. Multiple testing corrections such as FDR and Bonferroni adjustment are usually performed (Noble

2009). The sample size and minor allele frequency strongly affect the results of this approach (Stephens and Balding 2009). In this study, no SNP on the BeadChip remained significantly associated with the disease after FDR corrections in any data set. The relatively small sample size would limit the power to detect associated markers or regions for a complex disease. Bayesian approaches with SNP or haplotype windows were therefore also used to assess evidence for an association between genomic regions containing multiple genetic variants and cryptorchidism. The 1Mb window approach accounts for linkage disequilibrium among adjacent SNPs and accumulates the effects of multiple genetic variants in a region. This method reduces the chance of spurious associations that can easily occur in a single locus analysis (Balding, 2006). Incorporating biological information on genes into GWAS analyses may also help find causative loci. Here we found a potential candidate gene, AMHR2, that has a known biological function in sex differentiation and is plausibly related to the development of cryptorchidism.

It is known that anti-mullerian hormone (AMH) is one of the hormones mediating male sex differentiation by regression of mullerian ducts which would otherwise differentiate into the uterus and fallopian tubes (Behringer 1994). Double-knockout mouse models for

AMH and AMHR2 suggest that AMH is the only ligand of the AMHR2 receptor (Mishina

et al. 1996). Mutations in AMH and AMHR2 have been found to cause persistent mullerian duct syndrome (PMDS), a rare form of male pseudohermaphroditism (Imbeaud et al. 1995; Messika-Zeitoun et al. 2001). In 60-70% of patients with PMDS, the testes are located in the abdomen at a position analogous to that of the ovaries and have not descended to their normal scrotal position (Clarnette et al. 1997). Many PMDS patients lack gubernacular structures or have a feminized gubernaculum (Clarnette et al. 1997). Disruptions of AMH or AMHR2 may cause human cryptorchidism by preventing transabdominal descent through failure of the gubernacular swelling reaction (Clarnette et al. 1997). A boy carrying a heterozygous WT1 nonsense mutation was reported to have bilateral cryptorchidism, nystagmus and

Wilms’ tumor. The nonsense mutation leads to a truncated protein that lacks three zinc-finger domains for DNA binding (Terenziani et al. 2009). cDNA microarray analyses of Wt1 (Wilms’ tumor suppressor gene) knockout mice have shown that inactivation of the Wt1 gene dramatically down-regulates expression of Amhr2.

This indicates that Amhr2 is a target gene of Wt1 in its role in urogenital and gonadal development during sex differentiation (Klattig et al. 2007). Therefore, the WT1,

AMHR2 and AMH genes might be within the same pathway to regulate sex differentiation.

Gene disruptions of one of these genes could prompt the development of undescended testicles in males.

It is known that HOX genes are a group of related genes determining the regional identity of an embryo along the anterior-posterior axis and the type of segment structure (Joseph et al. 2005). The gubernaculum is central to testicular descent, with evidence suggesting that it elongates to the scrotum like a limb bud (Huynh et al. 2007). The genes HOXA10

137

and HOXA11 involved in limb bud outgrowth are expressed within the gubernaculum

(Yuan et al. 2006), and these two genes have been found to play a role in the developing genito-urinary tract. The Hoxa10-/- and Hoxa11-/- mutant male mice present cryptorchidism (Satokata et al. 1995; Hsieh-Li et al. 1995). Disrupted gene functions or ectopic expression of some of the Hoxc genes such as Hoxc4, Hoxc6 or Hoxc8 have been considered being responsible for the lack of hindlimb phenotypes in chickens (Nelson et al. 1996). However, there is no sufficient research evidence to show relationships between any of the HOX-C genes with cryptorchidism.

In conclusion, the important findings of the present GWAS are: (i) the AMHR2 gene was identified in a putative candidate region associated with canine cryptorchidism by using the 1Mb window approach and Bayesian inference; (ii) different chromosomal regions, and thus different genes, appear to be associated with the disease in dogs from pedigrees from different geographical regions, and population structure clustering helps correct for this stratification; (iii) cryptorchidism is a complex disease and shows genetic heterogeneity; (iv) single SNP analyses may not capture the genomic regions harboring the causative loci when there is not enough LD between the markers on the chip and nearby causative mutations, and the 1Mb window approach along with BayesB analysis may help to resolve this problem; (v) whole genome sequencing of pairs of affected and unaffected dogs from the same family may facilitate finding causative mutations without needing to consider LD structure and disease heterogeneity.
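The 1Mb window labels used in this chapter (for example 27_4 for the window that contains AMHR2) can be illustrated with a minimal sketch. It assumes that a window is labeled by the chromosome and the integer part of the base-pair position divided by 1,000,000, which agrees with labels such as 27_4 for AMHR2 (start 4,790,871 bp). The function names and the SNP names below are hypothetical and are not taken from GenSel.

```python
from collections import defaultdict

WINDOW_BP = 1_000_000  # 1 Mb windows, as used in the text


def window_id(chrom, pos_bp):
    """Return a 'chr_Mb' label, e.g. ('27', 4790871) -> '27_4'."""
    return f"{chrom}_{pos_bp // WINDOW_BP}"


def group_snps(snps):
    """Group SNPs by window.

    snps: iterable of (snp_name, chrom, pos_bp) tuples (names are made up).
    Returns a dict mapping window label -> list of SNP names.
    """
    windows = defaultdict(list)
    for name, chrom, pos_bp in snps:
        windows[window_id(chrom, pos_bp)].append(name)
    return dict(windows)


# Hypothetical SNPs: two fall in window 27_4, one in window 27_5.
example = [("snpA", "27", 4_100_000), ("snpB", "27", 4_790_871), ("snpC", "27", 5_002_178)]
grouped = group_snps(example)
```

Grouping markers this way lets the effects of all SNPs in a window be summarized together, which is the idea behind finding (iv).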

Software

BEAGLE: http://faculty.washington.edu/browning/beagle/beagle.html
Blossoc: http://www.daimi.au.dk/~mailund/Blossoc/index.html
GenSel: http://bigs.ansci.iastate.edu
Haploview: http://www.broadinstitute.org/scientific-community/science/programs/medical-and-population-genetics/haploview/haploview
PLINK: http://pngu.mgh.harvard.edu/~purcell/plink/
STRUCTURE: http://pritch.bsd.uchicago.edu/structure.html

Funding

This work was supported by the AKC Canine Health Foundation (Grant No. 1248) and State of Iowa Funds.

ACKNOWLEDGEMENTS

We appreciate the help with sample collection provided by Sheila E. (Blanker) Morrissey (DVM), individual dog breeders and owners, Eddie Dziuk (Chief Operating Officer, CHIC), Caroline Kisko (Secretary, Kennel Club in UK) and Dr. Nancy Bartol (Chinook Club of America). We also thank Ziqing Weng for help with using the BEAGLE program.

REFERENCES

Amann RP, Veeramachaneni DN, 2007. Cryptorchidism in common eutherian mammals. Reproduction. 133:541-61.

Balding DJ, 2006. A tutorial on statistical methods for population association studies. Nat Rev Genet. 7:781-91.

Barrett JC, Fry B, Maller J, Daly MJ, 2005. Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics. 21:263-5.

Bashir MS, Wells M, 1995. Müllerian inhibiting substance. J Pathol. 176:109-10.

Behringer RR, Finegold MJ, Cate RL, 1994. Müllerian-inhibiting substance function during mammalian sexual development. Cell. 79:415-25.

Behringer RR, 1994. The in vivo roles of müllerian-inhibiting substance. Curr Top Dev Biol. 29:171-87. Review.

Berkowitz GS, Lapinski RH, Dolgin SE, Gazella JG, Bodian CA, Holzman IR, 1993. Prevalence and natural history of cryptorchidism. Pediatrics. 92:44-9.

Bogatcheva NV, Truong A, Feng S, Engel W, Adham IM, Agoulnik AI, 2003. GREAT/LGR8 is the only receptor for insulin-like 3 peptide. Mol Endocrinol. 17:2639-46.

Browning SR, Browning BL, 2007. Rapid and accurate haplotype phasing and missing data inference for whole genome association studies using localized haplotype clustering. Am J Hum Genet. 81:1084-97.

Burton M, Ramsay E, 1986. Cryptorchidism in Maned Wolves. J Zoo An Med. 17:133-5.

Cardon LR, Palmer LJ, 2003. Population stratification and spurious allelic association. Lancet. 361:598-604. Review.

Carizza C, Antiba A, Palazzi J, Pistono C, Morana F, Alarcón M, 1990. Testicular maldescent and infertility. Andrologia. 22:285-8.

Clarnette TD, Hutson JM, 1999. Exogenous calcitonin gene-related peptide can change the direction of gubernacular migration in the mutant trans-scrotal rat. J Pediatr Surg. 34:1208-12.

Clarnette TD, Sugita Y, Hutson JM, 1997. Genital anomalies in human and animal models reveal the mechanisms and hormones governing testicular descent. Br J Urol. 79:99-112.

Cox JE, Edwards GB, Neal PA, 1979. An analysis of 500 cases of equine cryptorchidism. Equine Vet J. 11:113-6.

Cox VS, Wallace LJ, Jessen CR, 1978. An anatomic and genetic study of canine cryptorchidism. Teratology. 18:233-40.

Davis-Dao C, Koh CJ, Hardy BE, Chang A, Kim SS, De Filippo R, Hwang A, Pike MC, Carroll JD, Coetzee GA, 2012. Shorter androgen receptor CAG repeat lengths associated with cryptorchidism risk among Hispanic white boys. J Clin Endocrinol Metab. 97:E393-9.

Donaldson KM, Tong SY, Washburn T, Lubahn DB, Eddy EM, Hutson JM, Korach KS, 1996. Morphometric study of the gubernaculum in male estrogen receptor mutant mice. J Androl. 17:91-5.

Dunbar MR, Cunningham MW, Wooding JB, Roth RP, 1996. Cryptorchidism and delayed testicular descent in Florida black bears. J Wildl Dis. 32:661-4.

Evanno G, Regnaut S, Goudet J, 2005. Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study. Mol Ecol. 14:2611-20.

George FW, Peterson KG, 1988. Partial characterization of the androgen receptor of the newborn rat gubernaculum. Biol Reprod. 39:536-9.

Gill WB, Schumacher GF, Bibbo M, Straus FH 2nd, Schoenberg HW, 1979. Association of diethylstilbestrol exposure in utero with cryptorchidism, testicular hypoplasia and semen abnormalities. J Urol. 122:36-9.

Gorlov IP, Kamat A, Bogatcheva NV, Jones E, Lamb DJ, Truong A, Bishop CE, McElreavey K, Agoulnik AI, 2002. Mutations of the GREAT gene cause cryptorchidism. Hum Mol Genet. 11:2309-18.

Griffiths AL, Middlesworth W, Goh DW, Hutson JM, 1993. Exogenous calcitonin gene-related peptide causes gubernacular development in neonatal (Tfm) mice with complete androgen resistance. J Pediatr Surg. 28:1028-30.

Hadziselimovic NO, de Geyter Ch, Demougin P, Oakeley EJ, Hadziselimovic F, 2010. Decreased expression of FGFR1, SOS1, RAF1 genes in cryptorchidism. Urol Int. 84:353-61.

Hou JW, Tsai WY, Wang TR, 1999. Detection of KAL-1 gene deletion with fluorescence in situ hybridization. J Formos Med Assoc. 98:448-51.

Hsieh-Li HM, Witte DP, Weinstein M, Branford W, Li H, Small K, Potter SS, 1995. Hoxa 11 structure, extensive antisense transcription, and function in male and female fertility. Development. 121:1373-85.

Hughes IA, Acerini CL, 2008. Factors controlling testis descent. Eur J Endocrinol. 159 Suppl 1:S75-82.

Hutson JM, Hasthorpe S, 2005. Abnormalities of testicular descent. Cell Tissue Res. 322:155-8. Review.

Hutson JM, 1985. A biphasic model for the hormonal control of testicular descent. Lancet. 24:419-21.

Huynh J, Shenker NS, Nightingale S, Hutson JM, 2007. Signalling molecules: clues from development of the limb bud for cryptorchidism? Pediatr Surg Int. 23:617-24.

Imbeaud S, Faure E, Lamarre I, Mattéi MG, di Clemente N, Tizard R, Carré-Eusèbe D, Belville C, Tragethon L, Tonkin C, 1995. Insensitivity to anti-müllerian hormone due to a mutation in the human anti-müllerian hormone receptor. Nat Genet. 11:382-8.

Kaftanovskaya EM, Huang Z, Barbara AM, De Gendt K, Verhoeven G, Gorlov IP, Agoulnik AI, 2012. Cryptorchidism in mice with an androgen receptor ablation in gubernaculum testis. Mol Endocrinol. 26:598-607.

Klattig J, Sierig R, Kruspe D, Besenbeck B, Englert C, 2007. Wilms' tumor protein Wt1 is an activator of the anti-Müllerian hormone receptor gene Amhr2. Mol Cell Biol. 27:4355-64.

Klonisch T, Fowler PA, Hombach-Klonisch S, 2004. Molecular and genetic regulation of testis descent and external genitalia development. Dev Biol. 270:1-18.

Kubota Y, Temelcos C, Bathgate RA, Smith KJ, Scott D, Zhao C, Hutson JM, 2002. The role of insulin 3, testosterone, Müllerian inhibiting substance and relaxin in rat gubernacular growth. Mol Hum Reprod. 8:900-5.

Leader-Williams N, 1979. Abnormal testes in reindeer, Rangifer tarandus. J Reprod Fertil. 57:127-30.

Mailund T, Besenbacher S, Schierup MH, 2006. Whole genome association mapping by incompatibilities and local perfect phylogenies. BMC Bioinformatics. 7:454.

Messika-Zeitoun L, Gouédard L, Belville C, Dutertre M, Lins L, Imbeaud S, Hughes IA, Picard JY, Josso N, di Clemente N, 2001. Autosomal recessive segregation of a truncating mutation of anti-Müllerian type II receptor in a family affected by the persistent Müllerian duct syndrome contrasts with its dominant negative activity in vitro. J Clin Endocrinol Metab. 86:4390-7.

Mishina Y, Rey R, Finegold MJ, Matzuk MM, Josso N, Cate RL, Behringer RR, 1996. Genetic analysis of the Müllerian-inhibiting substance signal transduction pathway in mammalian sexual differentiation. Genes Dev. 10:2577-87.

Nef S, Parada LF, 1999. Cryptorchidism in mice mutant for Insl3. Nat Genet. 22:295-9.

Nelson CE, Morgan BA, Burke AC, Laufer E, DiMambro E, Murtaugh LC, Gonzales E, Tessarollo L, Parada LF, Tabin C, 1996. Analysis of Hox gene expression in the chick limb bud. Development. 122:1449-66.

Nightingale SS, Western P, Hutson JM, 2008. The migrating gubernaculum grows like a "limb bud". J Pediatr Surg. 43:387-90.

Noble WS, 2009. How does multiple testing correction work? Nat Biotechnol. 27:1135-7.

Pask AJ, Kanasaki H, Kaiser UB, Conn PM, Janovick JA, Stockton DW, Hess DL, Justice MJ, Behringer RR, 2005. A novel mouse model of hypogonadotrophic hypogonadism: N-ethyl-N-nitrosourea-induced gonadotropin-releasing hormone receptor gene mutation. Mol Endocrinol. 19:972-81.

Pearson JC, Lemons D, McGinnis W, 2005. Modulating Hox gene functions during animal body patterning. Nat Rev Genet. 6:893-904. Review.

Pritchard JK, Stephens M, Rosenberg NA, Donnelly P, 2000. Association mapping in structured populations. Am J Hum Genet. 67:170-81.

Przewratil P, Paduch DA, Kobos J, Niedzielski J, 2004. Expression of estrogen receptor alpha and progesterone receptor in children with undescended testicle previously treated with human chorionic gonadotropin. J Urol. 172:1112-6.

Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, Maller J, Sklar P, de Bakker PI, Daly MJ, 2007. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 81:559-75.

Radpour R, Rezaee M, Tavasoly A, Solati S, Saleki A, 2007. Association of long polyglycine tracts (GGN repeats) in exon 1 of the androgen receptor gene with cryptorchidism and penile hypospadias in Iranian patients. J Androl. 28:164-9.

Rothschild MF, Christian LL, Blanchard W, 1988. Evidence for multigene control of cryptorchidism in swine. J Hered. 79:313-4.

Sahana G, Mailund T, Lund MS, Guldbrandtsen B, 2011. Local genealogies in a linear mixed model for genome-wide association mapping in complex pedigreed populations. PLoS One. 6:e27061.

Sarfati J, Dodé C, Young J, 2010. Kallmann syndrome caused by mutations in the PROK2 and PROKR2 genes: pathophysiology and genotype-phenotype correlations. Front Horm Res. 39:121-32. Review.

Satokata I, Benson G, Maas R, 1995. Sexually dimorphic sterility phenotypes in Hoxa10-deficient mice. Nature. 374:460-3.

Singh A, Ezeasor D, 1989. Ultrastructure of Sertoli cells in scrotal testis of unilaterally cryptorchid goat. Prog Clin Biol Res. 296:159-64.

Smith KC, Brown PJ, Morris J, Parkinson TJ, 2007. Cryptorchidism in North Ronaldsay sheep. Vet Rec. 161:658-9.

Spencer JR, Torrado T, Sanchez RS, Vaughan ED Jr, Imperato-McGinley J, 1991. Effects of flutamide and finasteride on rat testicular descent. Endocrinology. 129:741-8.

Stephens M, Balding DJ, 2009. Bayesian statistical methods for genetic association studies. Nat Rev Genet. 10:681-90.

St Jean G, Gaughan EM, Constable PD, 1992. Cryptorchidism in North American cattle: breed predisposition and clinical findings. Theriogenology. 38:951-8.

Terenziani M, Sardella M, Gamba B, Testi MA, Spreafico F, Ardissino G, Fedeli F, Fossati-Bellani F, Radice P, Perotti D, 2009. A novel WT1 mutation in a 46,XY boy with congenital bilateral cryptorchidism, nystagmus and Wilms tumor. Pediatr Nephrol. 24:1413-7.

Thonneau P, Bujan L, Multigner L, Mieusset R, 1998. Occupational heat exposure and male fertility: a review. Hum Reprod. 13:2122.

Thonneau PF, Gandia P, Mieusset R, 2003. Cryptorchidism: incidence, risk factors, and potential role of environment; an update. J Androl. 24:155-62. Review.

Wada Y, Okada M, Hasegawa T, Ogata T, 2005. Association of severe micropenis with Gly146Ala polymorphism in the gene for steroidogenic factor-1. Endocr J. 52:445-8.

Wang J, Wang B, 2002. Study on risk factors of cryptorchidism. Zhonghua Liu Xing Bing Xue Za Zhi. 23:190-3.

Wang Y, Barthold J, Figueroa E, González R, Noh PH, Wang M, Manson J, 2008. Analysis of five single nucleotide polymorphisms in the ESR1 gene in cryptorchidism. Birth Defects Res A Clin Mol Teratol. 82:482-5.

Wang Y, Barthold J, Kanetsky PA, Casalunovo T, Pearson E, Manson J, 2007. Allelic variants in HOX genes in cryptorchidism. Birth Defects Res A Clin Mol Teratol. 79:269-75.

Yates D, Hayes G, Heffernan M, Beynon R, 2003. Incidence of cryptorchidism in dogs and cats. Vet Rec. 152:502-4.

Yoshida R, Fukami M, Sasagawa I, Hasegawa T, Kamatani N, Ogata T, 2005. Association of cryptorchidism with a specific haplotype of the estrogen receptor alpha gene: implication for the susceptibility to estrogenic environmental endocrine disruptors. J Clin Endocrinol Metab. 90:4716-21.

Yuan FP, Lin DX, Rao CV, Lei ZM, 2006. Cryptorchidism in LhrKO animals and the effect of testosterone-replacement therapy. Hum Reprod. 21:936-42.

Zhao X, Du ZQ, Rothschild MF, 2010. An association study of 20 candidate genes with cryptorchidism in Siberian Husky dogs. J Anim Breed Genet. 127:327-31.

Zimmermann S, Schöttler P, Engel W, Adham IM, 1997. Mouse Leydig insulin-like (Ley I-L) gene: structure and expression during testis and ovary development. Mol Reprod Dev. 47:30-8.

Zimmermann S, Steding G, Emmen JM, Brinkmann AO, Nayernia K, Holstein AF, Engel W, Adham IM, 1999. Targeted disruption of the Insl3 gene causes bilateral cryptorchidism. Mol Endocrinol. 13:681-91.

Zuccarello D, Morini E, Douzgou S, Ferlin A, Pizzuti A, Salpietro DC, Foresta C, Dallapiccola B, 2004. Preliminary data suggest that mutations in the CgRP pathway are not involved in human sporadic cryptorchidism. J Endocrinol Invest. 27:760-4.


Figure 1. Re-grouping of dogs based on the cluster membership values estimated by STRUCTURE. A. Summary plot of estimated Q values (cluster membership coefficients) for each individual dog collected from the USA, UK and Canada. Each individual is represented by a single vertical line broken into K colored segments, with lengths proportional to its estimated membership in each of the K inferred clusters. B. Identification of the most likely value of K, calculated from 47,672 tagging SNPs. Delta K was calculated for K=1 through K=6 under a model assuming admixture. K=3 is the best estimate based on the Delta K statistics in the plot. C. Dogs were assigned to 4 subgroups based on their Q values when K=3. The re-grouping of dogs is consistent with the dogs' geographical distributions. SH: Siberian Husky; CHI: Chinook.
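The Delta K statistic referred to in panel B follows Evanno et al. (2005). As a rough illustration only, the sketch below computes it from replicate STRUCTURE runs as the absolute second-order difference of the mean ln P(D) divided by the standard deviation of ln P(D) at that K. The log-likelihood values are made up, the function names are hypothetical, and the statistic is only defined for values of K that have a neighbor on each side.

```python
from statistics import mean, stdev


def delta_k(ln_prob):
    """Compute Delta K for every K that has both K-1 and K+1 available.

    ln_prob: dict mapping K -> list of ln P(D) values from replicate runs.
    Raises ZeroDivisionError if the replicates at some K are identical.
    """
    avg = {k: mean(values) for k, values in ln_prob.items()}
    result = {}
    for k in sorted(ln_prob):
        if k - 1 in avg and k + 1 in avg:
            second_diff = abs(avg[k + 1] - 2 * avg[k] + avg[k - 1])
            result[k] = second_diff / stdev(ln_prob[k])
    return result


def best_k(ln_prob):
    """Return the K with the largest Delta K."""
    dk = delta_k(ln_prob)
    return max(dk, key=dk.get)


# Made-up replicate log-likelihoods, for illustration only.
runs = {
    1: [-5000.0, -5002.0, -4998.0],
    2: [-4600.0, -4604.0, -4596.0],
    3: [-4300.0, -4301.0, -4299.0],
    4: [-4290.0, -4310.0, -4270.0],
    5: [-4285.0, -4320.0, -4250.0],
}
```

With these made-up values the likelihood stops improving sharply after K=3 and the replicates become noisier, so Delta K peaks at K=3.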


Figure 2. Whole genome association analyses for subgroup 1, comprising 129 mostly USA-sourced Siberian Huskies (60 cases and 69 controls). A. Results of single marker analyses in PLINK before false discovery rate (FDR) correction. B. Results from the GenSel 1Mb SNP window analyses (SNP model). Each spot is a 1Mb window containing a group of consecutive SNPs. C. Results using haplotypes and SNP genotypes together in GenSel (SNP & haplotype models). Each spot is a 1Mb window containing a group of consecutive SNPs and haplotypes. The red broken arrows link SNPs in PLINK to their 1Mb SNP windows with genetic variance over 0.40% (10-fold the expected value). The blue broken arrows show windows repeatedly identified with high variance (> 0.20%, 5-fold the expected value) when haplotypes were integrated into the analyses in GenSel. The most promising putative regions for cryptorchidism in this subpopulation are on CFA27, 1 and 14.
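To make the "percentage of genetic variance" plotted for each window more concrete, the sketch below shows one common way such a quantity can be defined for a single set of sampled marker effects: the variance across animals of the genetic value contributed by the markers in the window, divided by the variance of the genetic value from all markers. This is an assumption for illustration; how GenSel computes and averages the quantity over MCMC samples is not described in this section, and the function name, genotypes and effects are hypothetical.

```python
from statistics import pvariance


def window_variance_percent(genotypes, effects, window_cols):
    """Percentage of genetic variance attributed to one window.

    genotypes: list of rows (one per animal) of allele counts (0, 1 or 2).
    effects: one sampled effect per marker (many are zero under BayesB).
    window_cols: column indices of the markers that fall in the window.
    """
    # Genetic value of each animal from all markers.
    total = [sum(g * a for g, a in zip(row, effects)) for row in genotypes]
    # Genetic value of each animal from the markers in this window only.
    local = [sum(row[j] * effects[j] for j in window_cols) for row in genotypes]
    total_var = pvariance(total)
    if total_var == 0:
        raise ValueError("total genetic variance is zero for this sample")
    return 100.0 * pvariance(local) / total_var


# Toy data: four animals, two uncorrelated markers with equal effects.
toy_genotypes = [[0, 0], [2, 0], [0, 2], [2, 2]]
toy_effects = [1.0, 1.0]
share = window_variance_percent(toy_genotypes, toy_effects, [0])
```

In this toy case the two markers contribute equally and independently, so the window holding the first marker accounts for half of the variance.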


Figure S1. Whole genome association analyses for subgroup 2, comprising 41 mixed-source Siberian Huskies (26 cases and 15 controls). A. Results of single marker analyses in PLINK before false discovery rate (FDR) correction. B. Results from the GenSel 1Mb SNP window analyses (SNP model). Each spot is a 1Mb window containing a group of consecutive SNPs. C. Results using haplotypes and SNP genotypes together in GenSel (SNP & haplotype models). Each spot is a 1Mb window containing a group of consecutive SNPs and haplotypes. The red broken arrows link SNPs in PLINK to their 1Mb SNP windows with genetic variance over 0.40% (10-fold the expected value). The blue broken arrows show windows repeatedly identified with high variance (> 0.20%, 5-fold the expected value) when haplotypes were integrated into the analyses in GenSel. The most promising putative region for cryptorchidism in subgroup 2 is on CFA4.


Figure S2. Whole genome association analyses for subgroup 3, comprising 35 mostly UK-sourced Siberian Huskies (20 cases and 15 controls). A. Results of single marker analyses in PLINK before false discovery rate (FDR) correction. B. Results from the GenSel 1Mb SNP window analyses (SNP model). Each spot is a 1Mb window containing a group of consecutive SNPs. C. Results using haplotypes and SNP genotypes together in GenSel (SNP & haplotype models). Each spot is a 1Mb window containing a group of consecutive SNPs and haplotypes. The red broken arrows link SNPs in PLINK to their 1Mb SNP windows with genetic variance over 0.40% (10-fold the expected value). The blue broken arrows show windows repeatedly identified with high variance (> 0.20%, 5-fold the expected value) when haplotypes were integrated into the analyses in GenSel. The most promising putative region for cryptorchidism in subgroup 3 is on CFA31.


Figure S3. Whole genome association analyses for a pooled sample of 204 Siberian Huskies, of which 105 were diagnosed with cryptorchidism. A. Results of single marker analyses in PLINK before false discovery rate (FDR) correction. B. Results from the GenSel 1Mb SNP window analyses (SNP model). Each spot is a 1Mb window containing a group of consecutive SNPs. C. Results using haplotypes and SNP genotypes together in GenSel (SNP & haplotype models). Each spot is a 1Mb window containing a group of consecutive SNPs and haplotypes. The red broken arrows link SNPs in PLINK to their 1Mb SNP windows with genetic variance over 0.40% (10-fold the expected value). The blue broken arrows show windows repeatedly identified with high variance (> 0.20%, 5-fold the expected value) when haplotypes were integrated into the analyses in GenSel. Windows on CFA9, 27 and X were identified as putative candidate regions for cryptorchidism in the pooled population.


Table 1. The top 1Mb windows with the highest percentage of genetic variance obtained from BayesB analyses in different Siberian Husky subgroups

Group                                  Window (Chr_Mb)  SNP model (%)  SNP & haplotype models (%)
Subgroup 1 (60 A & 69 U)               1_50             0.73           0.22
                                       3_86             0.44           0.08
                                       5_8              0.54           0.16
                                       5_10             0.47           0.07
                                       5_12             0.68           0.06
                                       5_16             1.04           0.05
                                       14_50*           0.4            0.22
                                       18_11            0.55           0.18
                                       27_4*            0.42           0.34
                                       27_5*            3.24           0.53
                                       27_6*            0.59           0.22
                                       27_18            0.42           0.07
Subgroup 2 (26 A & 15 U)               4_75*            0.56           0.23
                                       27_13            0.58           0.16
                                       35_26            0.4            0.02
Subgroup 3 (20 A & 15 U)               24_37            0.39           0.04
                                       25_14            0.47           0.14
                                       31_18*           0.52           0.33
Pooled, subgroups 1-3 (106 A & 99 U)   3_87             0.42           0.15
                                       6_62             0.45           0.16
                                       9_56*            1.42           0.27
                                       24_31            0.55           0.11
                                       24_36            0.55           0.16
                                       27_13*           1.14           0.21
                                       27_18            0.43           0.14
                                       X_31*            0.71           0.31
                                       X_33*            1.1            0.39
                                       X_34             0.43           0.15
                                       X_35             0.53           0.14

*These windows were considered putative candidate regions for cryptorchidism: their genetic variance was over 0.40% (10-fold the expected value) in the 1Mb SNP window analyses (SNP model) and again over 0.20% (5-fold the expected value) in the 1Mb window analyses that integrated haplotypes (SNP & haplotype models). The positional candidate genes are listed in supplementary Tables S2, S3 and S4.
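The two-step rule in the table footnote can be written as a small filter. The cutoffs of 0.40% and 0.20% are described as 10-fold and 5-fold the expected value, which implies an expected share of 0.04% per window. The sketch below is only an illustration: the function name is hypothetical, and it uses "greater than or equal to" because window 14_50 is printed as 0.4 and still carries an asterisk. The rows are a subset of the subgroup 1 values from Table 1.

```python
SNP_CUTOFF = 0.40        # percent, 10-fold the implied expected value of 0.04%
HAPLOTYPE_CUTOFF = 0.20  # percent, 5-fold the implied expected value of 0.04%


def candidate_windows(rows):
    """Return labels of windows that pass both cutoffs.

    rows: iterable of (window_label, snp_model_pct, snp_haplotype_pct).
    """
    return [
        label
        for label, snp_pct, hap_pct in rows
        if snp_pct >= SNP_CUTOFF and hap_pct >= HAPLOTYPE_CUTOFF
    ]


# Subset of the subgroup 1 rows in Table 1.
subgroup1 = [
    ("1_50", 0.73, 0.22),
    ("5_16", 1.04, 0.05),
    ("14_50", 0.40, 0.22),
    ("27_4", 0.42, 0.34),
    ("27_5", 3.24, 0.53),
    ("27_6", 0.59, 0.22),
    ("27_18", 0.42, 0.07),
]
selected = candidate_windows(subgroup1)
```

Window 5_16 shows why the second step matters: it has the second-highest variance under the SNP model but drops to 0.05% once haplotypes are included. Window 1_50 passes both cutoffs and is listed in Table S2, although it carries no asterisk in Table 1.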

Table 2. Results of BayesB analyses for 17 genes previously reported to be associated with cryptorchidism. Values are the percentage of genetic variance (%) of the 1Mb window containing each gene (BayesB approach), under the SNP model (SNP) and the SNP & haplotype models (Haplo.).

                                                                                                                                                                  Subpop1        Subpop2        Subpop3        Pooled population
Gene symbol  CFA    Gene name                                         Genome position (CanFam2.0)  Window (chr_Mb)  Ref.                                              SNP    Haplo.  SNP    Haplo.  SNP    Haplo.  SNP    Haplo.
ESR1         CFA1   Estrogen receptor 1                               45129993-45409811            1_45             Yoshida et al. 2005                               0.01   0.05    0      0.04    0.03   0.07    0.03   0.03
NR5A1 (SF1)  CFA9   nuclear receptor subfamily 5, group A, member 1   61804755-61824397            9_61             Wada et al. 2005                                  0.05   0.01    0.03   0.05    0.08   0.09    0.08   0.06
GNRHR        CFA13  gonadotropin-releasing hormone receptor           61190588-61204076            13_61            Pask et al. 2005                                  0      0.04    0.03   0.02    0.04   0.03    0.01   0.04
HOXA10       CFA14  homeobox A10                                      43301964-43304348            14_43            Satokata et al. 1995                              0      0.07    0.08   0.09    0.02   0.03    0      0.05
HOXA11       CFA14  homeobox A11                                      43312981-43315443            14_43            Hsieh-Li et al. 1995                              0      0.07    0.08   0.09    0.02   0.03    0      0.05
FGFR1        CFA16  fibroblast growth factor receptor 1               29991434-30031766            16_30            Hadziselimovic et al. 2010                        0.15   0.17    0.12   0.06    0.03   0.05    0.14   0.03
SOS1         CFA17  son of sevenless homolog 1                        34112601-34186729            17_34            Hadziselimovic et al. 2010                        0      0.02    0.07   0.01    0.02   0.03    0.04   0.06
WT-1         CFA18  Wilms tumor 1                                     38119539-3167266             18_38            Terenziani et al. 2009                            0.09   0.11    0.08   0.04    0.03   0.05    0.04   0.13
INSL3        CFA20  insulin-like factor 3                             48072721-48074397            20_48            Nef and Parada 1999                               0.12   0.1     0.04   0.04    0.1    0.03    0.09   0.07
AMH (MIS)    CFA20  anti-Mullerian hormone                            59926297-59928993            20_59            Behringer et al. 1994                             0.02   0.05    0.03   0.04    0      0.03    0.02   0.04
RAF1         CFA8   v-raf-1 murine leukemia viral oncogene homolog 1  8899094-8975879              20_8             Hadziselimovic et al. 2010                        0.02   0.04    0.04   0.05    0.06   0.05    0.1    0.06
CALCA        CFA21  Calcitonin/calcitonin-related polypeptide, alpha  41104822-41108840            21_41            Griffiths et al. 1993; Clarnette and Hutson 1999  0.03   0.02    0.04   0.04    0.06   0.04    0.14   0.03
PROKR2       CFA24  prokineticin receptor 2                           19351537-19359239            24_19            Sarfati et al. 2010                               0.01   0.03    0.07   0.09    0.02   0.06    0      0.04
LGR8         CFA25  Relaxin receptor 2                                11294671-11340203            25_11            Tomyama et al. 2003                               0.01   0.03    0.02   0.12    0.13   0.05    0.03   0.03
COL2A1       CFA27  collagen, type II, alpha 1                        9768511-9799248              27_9             Zhao et al. 2010                                  0.06   0.07    0.06   0.07    0.06   0.03    0.03   0.05
KAL1         CFAX   Kallmann syndrome 1 sequence                      5369941-5428059              X_5              Hou et al. 1999                                   0      0.03    0      0.06    0      0.05    0      0.01

Table S1. Re-grouping of dogs based on the value of cluster membership (K=3)

                                                  Q value                                Geographical distribution
K=3         Breed  No. of cases  No. of controls  Cluster1     Cluster2     Cluster3     USA_SH  Canada_SH  UK_SH  USA_CHI
Subgroup 1  SH     60            69               0.709-1      0-0.291      0-0.036      0.95    0.05       0      0
Subgroup 2  SH     26            15               0.304-0.694  0.306-0.696  0-0.092      0.71    0.05       0.24   0
Subgroup 3  SH     20            15               0.162-0.3    0.7-0.838    0-0.024      0.06    0          0.94   0
Subgroup 4  CHI    6             6                0.01-0.031   0-0.041      0.928-1      0       0          0      1

The Q value represents the estimated proportion of ancestry when 3 assumed populations were given. SH: Siberian Husky; CHI: Chinook.
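The Q value ranges in Table S1 are consistent with a simple assignment rule based on a membership cutoff of 0.7. The sketch below illustrates such a rule; the cutoff is inferred from the ranges in the table rather than stated in this section, and the function name is hypothetical.

```python
def assign_subgroup(q1, q2, q3, cutoff=0.7):
    """Assign a dog to a subgroup from its three cluster memberships (K=3).

    q1, q2, q3: membership coefficients for Cluster1, Cluster2 and Cluster3.
    cutoff: assumed threshold, inferred from the Q value ranges in Table S1.
    """
    if q3 >= cutoff:
        return 4  # Cluster3-dominated (the Chinooks in Table S1)
    if q1 >= cutoff:
        return 1  # Cluster1-dominated (mostly USA Siberian Huskies)
    if q2 >= cutoff:
        return 3  # Cluster2-dominated (mostly UK Siberian Huskies)
    return 2      # admixed between Cluster1 and Cluster2


# Hypothetical dogs, one per subgroup.
examples = [(0.80, 0.20, 0.00), (0.50, 0.45, 0.05), (0.20, 0.78, 0.02), (0.02, 0.03, 0.95)]
assigned = [assign_subgroup(*q) for q in examples]
```

Dogs with no single cluster above the cutoff form the admixed subgroup 2, which matches its mixed geographical origin in the table.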


Table S2. A list of genes in the putative regions for subgroup 1

Window   CFA  Gene      Gene      Ensembl Gene ID     Gene symbol       Description
(chr_Mb)      Start     End
              (bp)      (bp)
1_50th   1    50145766  50258876  ENSCAFG00000000618  ZDHHC14           zinc finger, DHHC-type containing 14
         1    50286593  50286847  ENSCAFG00000028358  7SK               7SK RNA
         1    50340242  50341092  ENSCAFG00000000624  Novel gene        pseudogene Ensembl genes
         1    50436290  50498735  ENSCAFG00000000627  SNX9              sorting nexin 9
         1    50550972  50622090  ENSCAFG00000000644  SYNJ2             synaptojanin 2
         1    50642818  50692684  ENSCAFG00000000647  SERAC1            serine active site containing 1
         1    50711829  50723881  ENSCAFG00000000648  GTF2H5            general transcription factor IIH, polypeptide 5
         1    50919150  50993873  ENSCAFG00000000652  TULP4             tubby like protein 4
         1    50935969  50936076  ENSCAFG00000028188  U6                U6 spliceosomal RNA
14_50th  14   50199420  50200394  ENSCAFG00000003182  NOVEL             pseudogene
         14   50211571  50244090  ENSCAFG00000003194  HERPUD2           homocysteine-responsive endoplasmic reticulum-resident ubiquitin-like domain, member 2 protein
         14   50219951  50220703  ENSCAFG00000003199  NOVEL_pseudogene
         14   50373100  50456616  ENSCAFG00000003216  SEPT7L            septin 7-like
         14   50573856  50682597  ENSCAFG00000003227  EEPD1             endonuclease/exonuclease/phosphatase family domain containing 1
         14   50708853  50745061  ENSCAFG00000003232  KIAA0895          KIAA0895
         14   50773901  50815300  ENSCAFG00000003243  ANLN              Actin-binding protein anillin
         14   50856685  50950463  ENSCAFG00000003252  AOAH              Acyloxyacyl hydrolase (neutrophil)
27_4th   27   4008479   4010926   ENSCAFG00000006572  NFE2              nuclear factor (erythroid-derived 2)
         27   4016380   4020656   ENSCAFG00000006580  Novel gene        Uncharacterized protein
         27   4038247   4050570   ENSCAFG00000006593  CBX5              chromobox homolog 5
         27   4090107   4092358   ENSCAFG00000006603  SMUG1             single-strand-selective monofunctional uracil-DNA glycosylase 1
         27   4211932   4213230   ENSCAFG00000006611  HOXC4             homeobox C4
         27   4231278   4232665   ENSCAFG00000006618  HOXC5             homeobox C5
         27   4231753   4231837   ENSCAFG00000028294  cfa-mir-615       cfa-mir-615
         27   4235965   4237469   ENSCAFG00000006622  HOXC6             homeobox C6
         27   4254284   4256431   ENSCAFG00000006626  HOXC8             homeobox C8
         27   4272335   4272444   ENSCAFG00000020562  cfa-mir-196a-2    cfa-mir-196a-2
         27   4274730   4278914   ENSCAFG00000006636  HOXC10            homeobox C10
         27   4288980   4291010   ENSCAFG00000025599  HOXC11            homeobox C11
         27   4307265   4308845   ENSCAFG00000006645  HOXC12            homeobox C12
         27   4318372   4324857   ENSCAFG00000006648  HOXC13            homeobox C13
         27   4408336   4408471   ENSCAFG00000025595  Novel gene        Uncharacterized protein
         27   4465522   4465593   ENSCAFG00000028208  Novel gene        miRNA ncRNAs
         27   4471215   4473053   ENSCAFG00000006653  Novel gene        Uncharacterized protein
         27   4526984   4543023   ENSCAFG00000006699  CALCOCO1          calcium binding and coiled-coil domain 1
         27   4618776   4707507   ENSCAFG00000006726  ATF7              activating transcription factor 7
         27   4716529   4721245   ENSCAFG00000006874  TARBP2            TAR (HIV-1) RNA binding protein 2
         27   4730433   4736128   ENSCAFG00000006918  MAP3K12           mitogen-activated protein kinase kinase kinase 12
         27   4737476   4760860   ENSCAFG00000006821  PCBP2             poly(rC) binding protein 2
         27   4771276   4771641   ENSCAFG00000006927  PRR13             proline rich 13
         27   4790871   4796699   ENSCAFG00000006934  AMHR2             anti-Mullerian hormone receptor, type II
         27   4808070   4846934   ENSCAFG00000006950  SP1               Sp1 transcription factor
         27   4852684   4913472   ENSCAFG00000007016  PFDN5             prefoldin subunit 5
         27   4886171   4888554   ENSCAFG00000025559  SP7               Sp7 transcription factor
         27   4893129   4903977   ENSCAFG00000006963  AAAS              achalasia, adrenocortical insufficiency, alacrimia
         27   4904239   4909825   ENSCAFG00000006981  C12orf10          chromosome 12 open reading frame 10
         27   4915195   4937670   ENSCAFG00000007030  ESPL1             extra spindle pole bodies homolog 1 (S. cerevisiae)
         27   4945566   4947561   ENSCAFG00000007035  MFSD5             major facilitator superfamily domain containing 5
         27   4962069   4981222   ENSCAFG00000007050  RARG              retinoic acid receptor, gamma
         27   4993303   4998519   ENSCAFG00000007065  ITGB7             integrin, beta 7
27_5th   27   5002178   5004869   ENSCAFG00000007070  ZNF740            zinc finger protein 740
         27   5028008   5039893   ENSCAFG00000007077  CSAD              cysteine sulfinic acid decarboxylase
         27   5056925   5066187   ENSCAFG00000007085  SOAT2             sterol O-acyltransferase 2
         27   5067628   5071493   ENSCAFG00000007097  IGFBP6            insulin-like growth factor binding protein 6
         27   5084602   5098690   ENSCAFG00000007101  SPRYD3            SPRY domain containing 3
         27   5100630   5115952   ENSCAFG00000007105  TENC1             tensin like C1 domain containing phosphatase (tensin 2)
         27   5123868   5154188   ENSCAFG00000007117  EIF4B             eukaryotic translation initiation factor 4B
         27   5193231   5197229   ENSCAFG00000007154  Novel gene        Uncharacterized protein
         27   5214844   5215090   ENSCAFG00000025408  Novel gene        Uncharacterized protein
         27   5225567   5233056   ENSCAFG00000007199  Novel gene        Uncharacterized protein
         27   5261219   5268719   ENSCAFG00000025402  KRT78             keratin 78
         27   5273786   5283987   ENSCAFG00000007204  KRT79             keratin 79
         27   5294126   5300384   ENSCAFG00000007208  KRT4              keratin 4
         27   5403819   5415836   ENSCAFG00000025382  KRT77             keratin 77
         27   5425745   5458087   ENSCAFG00000007212  K2C1_CANFA        Keratin, type II cytoskeletal 1
         27   5481521   5490471   ENSCAFG00000007233  KRT73             keratin 73
         27   5498594   5510056   ENSCAFG00000025358  KRT72             keratin 72
         27   5523441   5530483   ENSCAFG00000025346  KRT74             keratin 74
         27   5540426   5547762   ENSCAFG00000007278  D6BR72_CANFA      keratin 71
         27   5570885   5576777   ENSCAFG00000007223  KRT5              keratin 5
         27   5595688   5601257   ENSCAFG00000007251  Novel gene        Uncharacterized protein
         27   5613256   5618136   ENSCAFG00000007209  Novel gene        Uncharacterized protein
         27   5625571   5627338   ENSCAFG00000025323  Novel gene        Uncharacterized protein
         27   5638886   5648810   ENSCAFG00000025322  KRT75             keratin 75
         27   5650991   5663478   ENSCAFG00000025318  Novel gene        Uncharacterized protein
         27   5667080   5678048   ENSCAFG00000007281  KRT82             keratin 82
         27   5686336   5686456   ENSCAFG00000028307  5S_rRNA           5S ribosomal RNA
         27   5687503   5695036   ENSCAFG00000007284  KRT84             keratin 84
         27   5709947   5716497   ENSCAFG00000007286  KRT85             keratin 85
         27   5722652   5735581   ENSCAFG00000007293  Novel gene        Uncharacterized protein
         27   5749848   5755811   ENSCAFG00000007311  Novel gene        Uncharacterized protein
         27   5764377   5770662   ENSCAFG00000007307  Novel gene        Uncharacterized protein
         27   5779873   5784692   ENSCAFG00000007300  Novel gene        Uncharacterized protein
         27   5790522   5794208   ENSCAFG00000007318  Novel gene        Uncharacterized protein
         27   5808380   5814164   ENSCAFG00000007320  Novel gene        Uncharacterized protein
         27   5816046   5830969   ENSCAFG00000007322  KRT7              keratin 7
         27   5865148   5884620   ENSCAFG00000007328  KRT80             keratin 80
         27   5931294   5931569   ENSCAFG00000014784  Novel gene        Uncharacterized protein
         27   5934133   5936981   ENSCAFG00000007331  C12orf44          chromosome 12 open reading frame 44
         27   5949972   5957895   ENSCAFG00000007338  NR4A1_CANFA       Nuclear receptor subfamily 4 group A member 1
         27   5996708   6003640   ENSCAFG00000025231  GRASP             GRP1 (general receptor for phosphoinositides 1)-associated scaffold protein
27_6th   27   6014634   6032573   ENSCAFG00000007357  ACVR1B            activin A receptor, type IB
         27   6079269   6086578   ENSCAFG00000025220  ACVRL1            activin A receptor type II-like 1
         27   6104064   6106685   ENSCAFG00000007345  ANKRD33           ankyrin repeat domain 33
         27   6179810   6298254   ENSCAFG00000007600  SCN8A             sodium channel, voltage gated, type VIII, alpha subunit
         27   6242075   6242213   ENSCAFG00000027574  U2                U2 spliceosomal RNA
         27   6407456   6462367   ENSCAFG00000007728  Q9GK56_CANFA      Uncharacterized protein
         27   6517095   6538877   ENSCAFG00000007758  GALNT6            UDP-N-acetyl-alpha-D-galactosamine:polypeptide N-acetylgalactosaminyltransferase 6 (GalNAc-T6)
         27   6544212   6733918   ENSCAFG00000007774  CELA1_CANFA       Chymotrypsin-like elastase family member 1
         27   6565180   6596945   ENSCAFG00000007778  BIN2              bridging integrator 2
         27   6619721   6620699   ENSCAFG00000007785  SMAGP             small cell adhesion glycoprotein
         27   6622959   6626272   ENSCAFG00000007787  DAZAP2            DAZ associated protein 2
         27   6656427   6671622   ENSCAFG00000007790  POU6F1            POU class 6 homeobox 1
         27   6682523   6741630   ENSCAFG00000007831  TFCP2             transcription factor CP2
         27   6755279   6763644   ENSCAFG00000007860  CSRNP2            cysteine-serine-rich nuclear protein 2
         27   6767612   6774873   ENSCAFG00000007872  LETMD1            LETM1 domain containing 1
         27   6805741   6820814   ENSCAFG00000007902  SLC11A2           solute carrier family 11 (proton-coupled divalent metal ion transporters), member 2
         27   6831874   6871607   ENSCAFG00000007924  METTL7A           methyltransferase like 7A
         27   6924625   6954071   ENSCAFG00000007928  ATF1              activating transcription factor 1


Table S3. A list of genes in the putative regions for subgroup 2 and subgroup 3

Group       Window    CFA  Gene      Gene      Ensembl Gene ID     Gene symbol   Description
            (chr_Mb)       Start     End
                           (bp)      (bp)
Subgroup 2  4_75th    4    75800049  75809381  ENSCAFG00000018745  CAPSL         calcyphosine-like
                      4    75832871  75860982  ENSCAFG00000018752  IL7R          interleukin 7 receptor
                      4    75896215  76060543  ENSCAFG00000018767  SPEF2         sperm flagellar 2
                      4    75133068  75211061  ENSCAFG00000018704  Q9N280_CANFA  excitatory amino acid transporter 1
                      4    75464205  75489129  ENSCAFG00000018711  RANBP3L       RAN binding protein 3-like
                      4    75494657  75536151  ENSCAFG00000018721  NADKD1        NAD kinase domain containing 1
                      4    75551591  75571161  ENSCAFG00000018725  SKP2          S-phase kinase-associated protein 2, E3 ubiquitin protein ligase
                      4    75591977  75621896  ENSCAFG00000018731  LMBRD2        LMBR1 domain containing 2
                      4    75658019  75676510  ENSCAFG00000023144  Novel gene    Uncharacterized protein
                      4    75680288  75711566  ENSCAFG00000018738  Novel gene    Uncharacterized protein
                      4    75751766  75776502  ENSCAFG00000018743  Novel gene    Uncharacterized protein
Subgroup 3  31_18th   31   18049292  18049396  ENSCAFG00000021702  U6            U6 spliceosomal RNA
                      31   18936469  18936573  ENSCAFG00000027925  U6            U6 spliceosomal RNA


Table S4. A list of genes in the putative regions for all Siberian Huskies (including subgroups 1, 2 and 3)

| Window (chr_Mb) | CFA | Gene Start (bp) | Gene End (bp) | Ensembl Gene ID | Gene symbol | Description |
|---|---|---|---|---|---|---|
| 9_56th | 9 | 56770082 | 56817552 | ENSCAFG00000019945 | ASS1 | argininosuccinate synthase 1 |
| | 9 | 56765474 | 56765884 | ENSCAFG00000019944 | HBXIP | hepatitis B virus x interacting protein |
| | 9 | 56458287 | 56458469 | ENSCAFG00000024407 | QRFP | pyroglutamylated RFamide peptide |
| | 9 | 56831407 | 56978021 | ENSCAFG00000019950 | Novel gene | Uncharacterized protein |
| | 9 | 55986530 | 56045429 | ENSCAFG00000019908 | PRRC2B | proline-rich coiled-coil 2B |
| | 9 | 56125301 | 56139605 | ENSCAFG00000019911 | PPAPDC3 | phosphatidic acid phosphatase type 2 domain containing 3 |
| | 9 | 56151597 | 56164178 | ENSCAFG00000019912 | FAM78A | family with sequence similarity 78, member A |
| | 9 | 56187199 | 56289132 | ENSCAFG00000019920 | NUP214 | nucleoporin 214kDa |
| | 9 | 56292948 | 56418185 | ENSCAFG00000019926 | AIF1L | allograft inflammatory factor 1-like |
| | 9 | 56316975 | 56360237 | ENSCAFG00000019927 | LAMC3 | laminin, gamma 3 |
| | 9 | 56419461 | 56448416 | ENSCAFG00000019930 | FIBCD1 | fibrinogen C domain containing 1 |
| | 9 | 56465925 | 56487400 | ENSCAFG00000019938 | ABL1 | c-abl oncogene 1, non-receptor tyrosine kinase |
| | 9 | 56495457 | 56496470 | ENSCAFG00000023093 | Novel gene | Uncharacterized protein |
| | 9 | 56612649 | 56621679 | ENSCAFG00000019941 | EXOSC2 | exosome component 2 |
| | 9 | 56629556 | 56644738 | ENSCAFG00000023076 | PRDM12 | PR domain containing 12 |
| | 9 | 56662039 | 56688826 | ENSCAFG00000019942 | FUBP3 | far upstream element (FUSE) binding protein 3 |
| | 9 | 56235131 | 56235229 | ENSCAFG00000021075 | U6 | U6 spliceosomal RNA |
| 27_13th | 27 | 13293117 | 13305402 | ENSCAFG00000009652 | TWF1 | twinfilin, actin-binding protein, homolog 1 (Drosophila) |
| | 27 | 13326156 | 13343992 | ENSCAFG00000009700 | IRAK4 | interleukin-1 receptor-associated kinase 4 |
| | 27 | 13356837 | 13379958 | ENSCAFG00000009729 | PUS7L | pseudouridylate synthase 7 homolog (S. cerevisiae)-like |
| | 27 | 13504107 | 13665614 | ENSCAFG00000009744 | ADAMTS20 | ADAM metallopeptidase with thrombospondin type 1 motif, 20 |
| | 27 | 13189041 | 13189531 | ENSCAFG00000025198 | Novel gene | pseudogene Ensembl genes |
| | 27 | 13159473 | 13160267 | ENSCAFG00000024514 | Novel gene | protein coding Ensembl genes |
| X_31st | X | 31935641 | 31936435 | ENSCAFG00000024521 | Novel gene | protein coding Ensembl gene |
| | X | 31622005 | 31623093 | ENSCAFG00000013845 | PCBP1 | poly(rC) binding protein 1 |
| | X | 31853711 | 31854505 | ENSCAFG00000024210 | Novel gene | Uncharacterized protein |
| | X | 30996456 | 31056775 | ENSCAFG00000013840 | CXorf59 | chromosome X open reading frame 59 |
| | X | 31148254 | 31150659 | ENSCAFG00000023630 | Novel gene | Uncharacterized protein |
| | X | 31277509 | 31342758 | ENSCAFG00000023600 | CXorf30 | chromosome X open reading frame 30 |
| | X | 31820228 | 31821353 | ENSCAFG00000023558 | Novel gene | Uncharacterized protein |
| X_33rd | X | 33390642 | 33413474 | ENSCAFG00000014051 | Novel gene | Uncharacterized protein |
| | X | 33498639 | 33499190 | ENSCAFG00000014056 | MID1IP1 | MID1 interacting protein 1 |
| | X | 33792251 | 33794292 | ENSCAFG00000014059 | Novel gene | Uncharacterized protein |
| | X | 33006703 | 33055120 | ENSCAFG00000014003 | RPGR_CANFA | X-linked retinitis pigmentosa GTPase regulator |
| | X | 33080945 | 33149321 | ENSCAFG00000014023 | OTC | ornithine carbamoyltransferase |


CHAPTER 6. GENERAL CONCLUSIONS

Mapping the causative mutations responsible for genetic diseases is the primary goal of medical geneticists. Precisely mapped genetic variants can contribute to the future use of gene therapy to cure patients. Therefore, understanding the mode of inheritance of a given disease and using a suitable mapping strategy to identify causative mutations are very important. Moreover, given that animals and humans share many genetic diseases, animal models such as dogs and sheep are useful for investigating gene functions and human diseases. By examining gene alterations in these animal models, one can map the gene loci for a given disease and so help to clarify the causal relationship between mutations in homologous genes and the corresponding human disease.

The relationship between phenotypes and genotypes

The classical forward genetics approach allows researchers to locate, on its chromosome, the gene associated with a trait of interest. The genotype is the genetic makeup of an individual with respect to the specific genetic character under consideration. Measuring the genotypes of several individuals provides information on the differences among individuals in a population. Combined with various mapping strategies, genotype data help to map the causative or major genes responsible for the traits being studied. Over the past several decades, considerable effort has been devoted to improving genotyping techniques in order to obtain accurate genotypes at lower cost. However, researchers have realized that improving the quality of genotypes alone is not enough to achieve a high success rate in mapping traits in a population when the measurement and categorization of phenotypes is imprecise, for quantitative and categorical traits alike. In this dissertation, we have successfully mapped three monogenic sheep disorders (Chapters 2, 3 and 4). The prerequisites for success in these studies were the accurate diagnosis of the disease status of each sheep and the very clear mode of inheritance of each disorder. Without these, we would not have so quickly located the candidate genes, performed the fine mapping and eventually found the putative causative mutations. By contrast, in the canine cryptorchidism study in this dissertation (Chapter 5), dogs lacking one or both testicles in the scrotal sac were diagnosed as cryptorchid. It is known that disruption of different regulatory factors during testicular descent causes the gonads to remain in various positions inside the body, such as the abdominal cavity close to the kidneys, near the internal or external inguinal ring, or in the inguinal canal. Therefore, the different locations of the undescended testicle in affected dogs have likely increased the complexity and heterogeneity of this disease and made gene mapping difficult. Since different hormones regulate testicular descent in its different phases, it is important to know the exact details of cryptorchidism in each dog in order to efficiently find the genes associated with this disorder. Improving the measurement and categorization of phenotypes has therefore become a new challenge for many studies that use a forward genetics approach.

Comparative genomics for monogenic disorders and complex diseases

Finding the genetic variants associated with a disease is just a start. Providing evidence to confirm that a genetic variant is the causative mutation requires a lot of work. Studies in comparative genomics, for example biochemical or genetic studies of the homologous gene in other species, can reveal detailed functions of the genes under study. Gene knock-out mouse models are likely to show phenotypes similar to those of human patients with a loss of function of the same genes. However, many knock-out mice exhibit apparently different phenotypes, or no phenotype at all, because of physiological differences between humans and mice. Selecting a suitable animal model is therefore important for the study of human diseases. Dogs, sheep or other animal species affected by a specific disease with characteristics similar to the human condition offer an alternative to mouse models as model systems for human diseases.

Comparing the expression profiles of genes in the same pathway between different species may give hints about the key genes affecting phenotypes of interest. For example, testicular descent is a prerequisite for normal spermatogenesis in many mammalian species. Cryptorchidism is an abnormality of testicular descent. Interestingly, elephants do not undergo testicular descent. Comparing gene expression, gene coding sequences and translated protein sequences among elephants, cryptorchid dogs and normal dogs, in order to produce a list of genes involved in testicular descent in dogs, might yield interesting findings.

From candidate gene approach to whole genome scan

Scientists have applied the candidate gene approach for several decades to search for causative mutations or DNA markers associated with a given trait. This approach requires that the biological functions of the candidate genes be clear. Reverse genetics, which uses techniques such as mutagenesis, gene targeting, or targeting induced local lesions in genomes (TILLING) to engineer a change or disruption in the DNA, can provide information on how a gene influences phenotypes. If there is not sufficient evidence of gene function, it is hard to decide where in the genome to start mapping the causative gene. Whole genome scans using different forms of genetic markers may solve this problem. SNP array based whole genome association studies have been widely applied to the study of disease traits. The higher the density of markers, the higher the resolution that can be obtained for mapping associated genes. SNPs are the DNA markers with the highest density in the genome. Because they are widely distributed across the genome, one can detect association signals on almost every chromosome.

From DNA marker assisted mapping to whole genome sequencing

Advances in sequencing technology have made direct sequencing of genomic or coding sequences on a genome-wide scale a reality. Compared with traditional mapping approaches based on linkage and linkage disequilibrium, whole genome sequencing is less abstract and more straightforward, because the researcher can observe almost any change in the DNA sequence. This will help many scientists to search quickly for disease loci without devoting too much effort to preparing populations suitable for a specific marker assisted mapping approach. However, DNA sequencing generates large volumes of genomic sequence data relatively rapidly. So many genetic changes are identified across the genome that finding the real mutation becomes more difficult. Monogenic disorders usually have a clear mode of inheritance, and the corresponding phenotypes are mostly caused by mutations that disrupt protein function. Therefore, whole genome sequencing can be an efficient method to discover genes causing monogenic disorders. To map causative mutations and risk alleles for common complex diseases, a larger sample size is still preferred in order to find the common genetic variants responsible for common disease. DNA marker assisted mapping and whole genome sequencing can be applied together to identify genes contributing to diseases efficiently and economically.

The genetic basis underlying phenotypes is not simple in many cases. The mutations commonly found are likely to be changes in DNA sequence. However, more and more evidence shows that epigenetic changes, such as changes in the pattern of methylation or in histone modification, play important roles in the development of specific phenotypes. Therefore, it is necessary to screen for epigenetic changes in the genome in disease studies. Robust and cheaper technologies for this purpose are under development in commercial companies. A new era in genetic disease mapping is expected when powerful tools that integrate genetic and epigenetic knowledge become available for discovering the causative mutations responsible for genetic disorders in humans and other species.

FUTURE RESEARCH

For the sheep industry, our findings on the three sheep disorders can be used to develop genetic tests, based on the mutations we discovered, that identify and exclude affected individuals and carriers in sheep populations and so avoid at-risk matings. Additionally, affected sheep and carriers can be retained as animal models for the study of related human diseases. Gene expression and protein analyses of these causative gene loci can be done in the sheep models to further understand the gene functions involved in normal bone and cartilage metabolism as well as motor neuron activity.

Future research on cryptorchidism should first include fine mapping of the promising candidate gene AMHR2 by screening its promoter region, exons and splice regions to discover causative mutations responsible for cryptorchidism. Gene expression analysis should also be done for the AMHR2 gene using gubernaculum tissues from dogs diagnosed as affected and normal, to provide solid evidence for a causative relationship between this gene and cryptorchidism. Second, collecting more Siberian Husky samples is desirable to increase the power of the experiment. Multiple offspring from closely related dog families will help to decipher the genetic basis of cryptorchidism in dogs with a similar genetic background. Third, whole genome sequencing can be done with a small group of dogs. Whole genome sequencing of pairs of affected and unaffected dogs from the same family may facilitate finding causative mutations without needing to consider LD structure and disease heterogeneity.

CONCLUSIONS

In conclusion, we have successfully found putative causative mutations for the three monogenic sheep diseases: a nonsense mutation in DMP1 for inherited rickets, a missense mutation in AGTPBP1 for lower motor neuron disease, and a 1 base pair deletion in SLC13A1 for chondrodysplasia. This conclusion is based on 100% concordance of the genotypes with the recessive pattern of inheritance in affected, carrier and phenotypically normal individuals, as well as on the gene functions implicated in other species.
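The concordance criterion described above can be illustrated with a minimal Python sketch. It is not the analysis code used in this dissertation. The genotype coding (number of copies of the putative mutant allele), the status labels and the function name `recessive_concordance` are all assumptions introduced for illustration. Under a recessive model, affected animals are expected to carry two copies, known carriers one copy, and phenotypically normal animals zero or one copy.

```python
from typing import Iterable, Tuple

# Hypothetical coding: copies of the putative mutant allele allowed for each
# status under a fully penetrant recessive model (an assumption of this sketch).
EXPECTED_COPIES = {
    "affected": {2},   # homozygous for the mutant allele
    "carrier": {1},    # known (obligate) carriers are heterozygous
    "normal": {0, 1},  # phenotypically normal animals may be undetected carriers
}


def recessive_concordance(samples: Iterable[Tuple[str, int]]) -> float:
    """Return the fraction of (status, mutant_allele_copies) records
    whose genotype agrees with the recessive model."""
    records = list(samples)
    if not records:
        raise ValueError("no samples supplied")
    concordant = sum(
        1 for status, copies in records if copies in EXPECTED_COPIES[status]
    )
    return concordant / len(records)


if __name__ == "__main__":
    # Made-up example records, not data from this study.
    flock = [("affected", 2), ("affected", 2), ("carrier", 1),
             ("normal", 0), ("normal", 1)]
    print(recessive_concordance(flock))
```

A returned value of 1.0 corresponds to the 100% concordance reported here. Any lower value would point to genotyping error, misdiagnosis, or a variant that is not the causative mutation.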

Moreover, by applying Bayesian inference, we identified a candidate region at the 4th Mb window on dog chromosome CFA27 associated with the development of cryptorchidism in a subgroup of our Siberian Husky population. Based on studies of human diseases, the candidate region harbors a very promising functional candidate gene, AMHR2. Once significantly associated markers or causative mutations for cryptorchidism are identified in AMHR2, this information could be quickly applied to design a genetic test for dog breeding programs, to cull dogs carrying the deleterious allele and to test other breeds. In addition, our finding implicates the AMHR2 gene in normal testicular descent. This result will shed light on future studies of genes in the same pathway in the development of human cryptorchidism.


ACKNOWLEDGEMENTS

The completion of my dissertation and subsequent Ph.D. has been a long journey. The first three research projects (Chapters 2, 3 and 4) in my dissertation were completed smoothly, even though the breakthrough on the first one took me more than half a year. A large amount of time and effort was put into the last project (Chapter 5). Enormous problems arose during this study. I felt deeply sad, lost confidence in this project many times and got writer’s block before starting to write this dissertation. Eventually, I finished my dissertation, but not alone. I could not have succeeded without the invaluable support of many people. Without these supporters, especially the select few I’m about to mention, I might not have gotten to where I am today.

Firstly, special thanks go to Dr. Max F. Rothschild, my major professor, for his gentle encouragement and help in pushing me through the first to the last chapter of my dissertation; for his guidance not only on my study but also on the real life aspect of being a strong and successful person.

I would also like to give a heartfelt, special thanks to Dr. Dorian Garrick, my co-major professor. His patience, flexibility, knowledgeable insight and genuine caring enabled me to complete my four research projects.

I am very grateful to the remaining members of my study committee, Dr. Sue Lamont, Dr. Matthew Ellinwood and Dr. Peng Liu. Their academic support and input and personal cheering are greatly appreciated.

Thanks too to the members of the Rothschild lab group (Dr. Danielle Gorbach, Dr. Zhi-Qiang Du, Dr. Suneel K. Onteru, Dr. Fan Bin, Agatha Ampaire and Muhammed Walugembe) for your kind concern and support. Thanks to Dr. Rothschild’s family, especially Denise, for the countless times she hosted the members of the lab group at her home.

Next I would like to thank several of my good friends: Dawn Koltes, James Koltes, Xinhua Xiao and Difei Shen. They have always shown warm care, concern and support during these years, and encouraged me with their best wishes.

I sincerely thank my parents-in-law for their love and support. Without their help in taking care of my daughter, I would not have been able to complete my study and this dissertation. Thanks also go to my parents and my younger sister’s family for taking responsibility for family affairs in China over the past several years.

Finally, I must thank my husband Chuanhe for his unfailing soulful love that gave me the constant inspiration to fulfill my study. He was always there cheering me up and stood by me through the good and bad times. Many thanks go to my daughter Jasmine. She is such a sweet girl and her smile is my central motivation to finish my Ph.D. study.

There is no doubt that numerous people were involved in making my Ph.D. study possible. To the countless others who have been with me and helped me along the way, I apologize for not specifically mentioning you, but please know that your friendship, help and guidance are just as greatly appreciated. Thank you all.