EQUINE VETERINARY JOURNAL 569 Equine vet. J. (2010) 42 (Suppl. 38) 569-575 doi: 10.1111/j.2042-3306.2010.00181.x

Association of sequence variants in CKM (creatine kinase, muscle) and COX4I2 (cytochrome c oxidase, subunit 4, isoform 2) with racing performance in Thoroughbred horses

J. GU, D. E. MACHUGH, B. A. McGIVNEY, S. D. E. PARK, L. M. KATZ† and E. W. HILL*

Animal Genomics Laboratory, UCD School of Agriculture, Food Science and Veterinary Medicine, †University Veterinary Hospital, UCD School of Agriculture, Food Science and Veterinary Medicine, University College Dublin, Ireland.

Keywords: horse; SNP; racing performance; Thoroughbred; CKM; COX4I2

Summary

Reasons for performing study: The wild progenitors of the domestic horse were subject to natural selection for speed and stamina for millennia. Uniquely, this process has been augmented in Thoroughbreds, which have undergone at least 3 centuries of intense artificial selection for athletic phenotypes. While the phenotypic adaptations to exercise are well described, only a small number of the underlying genetic variants contributing to these phenotypes have been reported.

Objectives: A panel of candidate performance-related genes was examined for DNA sequence variation in Thoroughbreds, and the association of this variation with racecourse performance was investigated.

Materials and methods: Eighteen candidate genes were chosen for their putative roles in exercise. Re-sequencing in Thoroughbred samples was successful for primer sets in 13 of these genes. SNPs identified in this study and from the EquCab2.0 SNP database were genotyped in 2 sets of Thoroughbred samples (n = 150 and 148), and a series of population-based case-control investigations was performed by separating the samples into discrete cohorts on the basis of retrospective racecourse performance.

Results: Twenty novel SNPs were detected in 3 genes: ACTN3, CKM and COX4I2. Genotype frequency distributions for 3 SNPs in CKM and COX4I2 differed significantly (P<0.05) between elite Thoroughbreds and racehorses that had never won a race. These associations were not validated when an additional independent set of samples (n = 130) was genotyped, but when analyses included all samples (n = 278) the significance of association at COX4I2 g.22684390C>T was confirmed (P<0.02).

Conclusions: While molecular genetic information has the potential to become a powerful tool for improved decision-making in the horse industries, it is vital that rigour is applied to the studies generating these data and that adequate and appropriate sample sets, particularly for independent replication, are used.

Introduction

For 3 centuries the natural athleticism of the horse has been selected by breeders to produce Thoroughbred racehorses that, aided by intense management, have become athletes with extreme exercise performance. While management of exercise conditioning and nutrition has considerable effects on the development of elite Thoroughbred athletes (approximately 65% of the variation), a significant proportion of variation in athletic ability is heritable (Gaffney and Cunningham 1988). Genetic contributions to human athletic performance phenotypes are well documented and more than 220 loci have been described (Bray et al. 2009). While it is likely that Thoroughbred racing performance is also influenced by a large number of genes, only 2 performance-associated sequence variants in exercise-relevant genes (MSTN and PDK4) have been reported for the horse (Hill et al. 2010a,b). As well as these genes, it has been established that genomic regions containing an over-representation of genes responsible for insulin signalling, fatty acid metabolism and muscle strength have been selected during the development of the Thoroughbred (Gu et al. 2009). Additionally, relationships between mitochondrial genotypes and athletic performance in the Thoroughbred have been reported (Harrison and Turrion-Gomez 2006).

A single nucleotide polymorphism (SNP) is a single base substitution in a genomic DNA sequence. SNPs are stable, bi-allelic genetic markers that are abundantly distributed throughout the genome. They may be discovered by aligning segments of a genomic region from different individuals following re-sequencing of candidate genes. We hypothesised that genomic sequence variants may be detected in exercise-relevant genes in Thoroughbred horses. Such sequence variants may be developed for future use to distinguish between individuals with greater potential for elite racetrack performance and individuals with lesser prospects for success. The prospective identification of genetic potential may improve selection decisions and reduce operating costs, and may provide opportunities to design individual conditioning programmes that reduce injury risk.


*Corresponding author. [Paper received for publication 08.01.10; Accepted 01.06.10]

© 2010 EVJ Ltd 570 SNP association with elite racing performance

Therefore, we investigated a panel of candidate athletic performance genes with functions in muscle development and metabolism, many of which were included in the human gene map for performance (Rankinen et al. 2006). The aim of this study was to identify sequence variation in a panel of candidate athletic performance genes and to evaluate SNP association with racing performance phenotypes. As the Thoroughbred population has been subjected to recent and strong selection, we hypothesised that adaptation to exercise performance may have resulted in advantageous sequence variants in genes that contribute to an athletic phenotype, and that these variants may be found at higher frequencies in successful subgroups of the population.

Materials and methods

Ethics

This work has been approved by the University College Dublin Animal Research Ethics Committee.

Horse populations and DNA samples

Thoroughbreds: More than 1400 registered Thoroughbred horse samples (hair or fresh blood) were collected from stud farms, racing yards and sales establishments in Ireland and New Zealand between 1997 and 2009. To minimise confounding effects of racing over obstacles, only horses with performance records in Flat races were considered for inclusion in the study cohorts. The highest standard and most valuable elite Flat races are known as Group races. Horses were categorised based on retrospective racecourse performance records as 'Elite Thoroughbreds' (TBE) or 'Other Thoroughbreds' (TBO). Elite Thoroughbreds were Flat racehorses that had won at least one Group race (Group 1, Group 2 or Group 3). Other Thoroughbreds had competed in at least one race, but had never won a race and had handicap ratings (Racing Post Rating, RPR) <80. Race records were derived from 3 sources: European race records from the Racing Post online database (www.racingpost.co.uk); Australasian and South East Asian race records from Arion Pedigrees (www.arion.co.nz); and North American race records from the Pedigree Online Thoroughbred database (www.pedigreequery.com). In all cases pedigree information was used to control for genetic background by attempting to exclude samples sharing sires. No dams were shared. Overrepresentation of popular sire lines (e.g. Northern Dancer) within the pedigrees was also avoided where possible.

Sample Set I: Sample Set I (Table S1) comprised 150 elite (TBE, n = 80; mean RPR = 117) and nonelite (TBO, n = 70; mean RPR = 63) performing Thoroughbreds. There was some sharing of sires among the sample cohorts, i.e. there were 63 sires among the n = 80 TBE samples and 57 sires among the n = 70 TBO samples. The elite performer group contained a subset of animals that competed successfully in short distance (≤8 f, ≤1609 m) and long distance (>8 f, >1609 m) races.

Sample Set II: Sample Set II (Table S1) (n = 148) was refined to include individuals with no shared sires or dams within each cohort. Therefore, 17 samples were removed from Sample Set I, and the remainder was supplemented with additional samples collected on an on-going basis during the project. Sample Set II contained 86 elite (mean RPR = 115) and 62 nonelite (mean RPR = 59) performing Thoroughbreds. The elite performer group contained a subset of animals that competed successfully in short distance (≤8 f) and long distance (>8 f) races.

Validation Sample Set: A set of 130 additional Thoroughbred samples (97 TBE and 33 TBO) was selected from the repository for validation of the SNP associations; criteria for inclusion were as for Sample Set II.

Non-Thoroughbreds: Samples from 3 non-Thoroughbred populations were included as diverse samples in a panel for SNP discovery by re-sequencing: Akhal-Teke (AH; Turkmenistan, Central Asia), Connemara (CON; Ireland, Western Europe) and Tuva (TU; Republic of Tuva, Southern Siberian Steppes).

Genomic DNA was extracted from either fresh whole blood or hair samples using a modified version of a standard phenol/chloroform method (Sambrook and Russell 2001).

Retrieval of equine genomic sequence for SNP discovery

Twenty-three candidate athletic performance genes (Table 1) were selected for SNP discovery on the basis that their key functions were relevant to exercise physiology and on the availability of equine-specific genomic sequence at the time of PCR assay design. An overview of the study genes, including gene symbol, gene name, location and functional ontology, is given in Table 1. Primers for PCR were selected and designed from 3 different sources based on the availability of equine-specific genomic sequence for each candidate gene. Before the availability of the first assembly (EquCab1.0) of the horse (Equus caballus) genome (February 2007), PCR primers were either: 1) chosen from published papers; or 2) designed from available sequences in GenBank and the Ensembl Trace Archive (Wheeler et al. 2008). In instances where no horse sequence was available, a comparative genomics approach was taken and PCR primers were designed based on the conserved regions between human (Homo sapiens) and cattle (Bos taurus) sequences.

PCR primer design, amplification and purification and DNA re-sequencing

To amplify candidate gene target regions, PCR primers were designed using the online tool Primer3 (Rozen and Skaletsky 2000) and synthesised by Invitrogen1. Assays that were successfully optimised (Table S2) were used to amplify their target regions in a SNP discovery panel of 8 individuals from 4 diverse horse populations representing modern and ancient breeds (Thoroughbred n = 4; Akhal-Teke n = 2; Connemara n = 1 and Tuva n = 1). As such, the SNP discovery phase of the study was sufficiently powered to detect 75% of SNPs that had a minor allele frequency >0.05 (Kruglyak and Nickerson 2001). PCR products were purified using the ChargeSwitch PCR product purification kit1. Bidirectional DNA sequencing of PCR products was outsourced to Macrogen2 and carried out using AB 3730xl sequencers3. Sequence variants were detected by visual examination of sequences following processing with Pregap4 and alignment with Gap4 (Bonfield et al. 1995), comparing the sequences of different samples within each amplicon to detect single-base differences.


TABLE 1: Candidate athletic performance genes

Gene symbol | Gene name | Chr | KEGG pathway and/or GO biological process | Resequencing
--- | --- | --- | --- | ---
ACE2 | Angiotensin I converting enzyme (peptidyl-dipeptidase A) 2 | X | hsa04614:Renin-angiotensin system | Y
ACOT9 | Acyl-CoA thioesterase 9 | X | GO:0006629~lipid metabolic process | Y
ACTN3 | Actinin, alpha 3 | 12 | hsa04510:Focal adhesion; GO:0003012~muscle system process | Y
AGTR2 | Angiotensin II receptor, type 2 | X | GO:0002016~renin-angiotensin regulation of blood volume | Y
CKM | Creatine kinase, muscle | 10 | GO:0006603~phosphocreatine metabolic process | Y
COX4I2 | Cytochrome c oxidase, subunit 4, isoform 2 | 22 | hsa00190:Oxidative phosphorylation | Y
CYCS | Cytochrome c, somatic | 4 | GO:0045333~cellular respiration | Y
FBP1 | Fructose-1,6-bisphosphatase 1 | 23 | hsa00010:Glycolysis/Gluconeogenesis | Y
GYG1 | Glycogenin 1 | 16 | GO:0005977~glycogen metabolic process | Y
GYG2 | Glycogenin 2 | X | GO:0005977~glycogen metabolic process | Y
HIF1A | Hypoxia inducible factor 1, alpha subunit (basic helix-loop-helix transcription factor) | 24 | GO:0001666~response to hypoxia | Y
LDHA | Lactate dehydrogenase A | 7 | hsa00010:Glycolysis/Gluconeogenesis | Y
MYEF2 | Myelin expression factor 2 | 1 | GO:0007518~myoblast cell fate determination | N
PDHA1 | Pyruvate dehydrogenase (lipoamide) alpha 1 | X | hsa00010:Glycolysis/Gluconeogenesis | Y
PDHA2 | Pyruvate dehydrogenase (lipoamide) alpha 2 | 3 | hsa00010:Glycolysis/Gluconeogenesis | Y
PFKM | Phosphofructokinase, muscle | 6 | hsa00010:Glycolysis/Gluconeogenesis | Y
PGK1 | Phosphoglycerate kinase 1 | X | hsa00010:Glycolysis/Gluconeogenesis | Y
PHKA1 | Phosphorylase kinase, alpha 1 (muscle) | X | hsa04910:Insulin signalling pathway | Y
PPARGC1A | Peroxisome proliferator-activated receptor gamma, coactivator 1 alpha | 3 | hsa04910:Insulin signalling pathway | Y
PRKAA1 | Protein kinase, AMP-activated, alpha 1 catalytic subunit | 21 | hsa04910:Insulin signalling pathway | N
PRKAA2 | Protein kinase, AMP-activated, alpha 2 catalytic subunit | 2 | hsa04910:Insulin signalling pathway | N
SLC2A4 | Solute carrier family 2 (facilitated glucose transporter), member 4 | 11 | hsa04910:Insulin signalling pathway | N
VEGFA | Vascular endothelial growth factor A | 20 | hsa04370:VEGF signalling pathway; hsa04510:Focal adhesion | N

DNA re-sequencing in the COX4I2 gene

The re-sequencing of the COX4I2 gene was designed to detect 95% of SNPs with a minor allele frequency (MAF) >0.05 in the Thoroughbred population (Kruglyak and Nickerson 2001). There was particular interest in identifying SNPs in this gene as it has been shown to be differentially expressed in Thoroughbred horse skeletal muscle following exercise (Eivers et al. 2010). Eleven pairs of overlapping PCR primers were designed to cover the entire COX4I2 genomic sequence using the PCR Suite extension to the Primer3 web-based primer design tool (Rozen and Skaletsky 2000; van Baren and Heutink 2004) (Table S3). Twenty-four unrelated Thoroughbred DNA samples were included in a re-sequencing panel to identify Thoroughbred-specific sequence variants in the COX4I2 gene. Bidirectional DNA sequencing of PCR products was outsourced to Macrogen and carried out using AB 3730xl sequencers. Sequence variants were detected by visual examination of sequences following alignment using Consed version 19.0 (Gordon et al. 1998).

High-throughput SNP genotyping

A panel of 47 SNPs in 14 genes was genotyped in either Sample Set I (n = 150) or Sample Set II (n = 148) (Tables S4, S5). The assays included novel SNPs that were discovered in this study. The SNP identification process was augmented by the comparison of sequences deposited in the Equus caballus Ensembl Trace Archive repository before January 2007. Furthermore, the SNP panel was supplemented with EquCab2.0 SNPs lying within candidate gene regions that were discovered in the Horse Genome Sequencing Project (Wade et al. 2009). Genotyping in Sample Set I was outsourced to KBiosciences4 using either competitive allele specific PCR or TaqMan assay3. Genotyping in Sample Set II was carried out using iPlex technology5 at Sequenom's facilities. Genotyping in the Validation Sample Set was performed on a TaqMan StepOnePlus instrument3 according to the manufacturer's instructions.

Statistical analyses

All statistical analyses, including tests of association, were performed using PLINK Version 1.05 (http://pngu.mgh.harvard.edu/purcell/plink/) (Purcell et al. 2007). Quality control analyses included computation of sample allele frequency and percent missing genotypes. Case-control association tests were performed for all loci. Statistical significance was assessed using the Cochran-Armitage test for trend and an unconditioned genotypic model. A linear regression model was used to evaluate quantitative trait association at loci CKM g.15884567A>G and COX4I2 g.22684390C>T, using best race distance (furlongs) as the phenotype.

Bioinformatics

MatInspector (Cartharius et al. 2005) was used to identify putative transcription factor binding sequences in the sequences surrounding the polymorphic sites at the CKM g.15884567A>G and COX4I2 g.22684390C>T loci.

Results

A population-based case-control study was performed to investigate candidate gene single nucleotide polymorphism (SNP)
association with retrospective racecourse performance phenotypes in Thoroughbred horses. Twenty-three candidate genes (Table 1) that had molecular functions relevant to physiological processes important for exercise were selected for the study. The genes were interrogated for SNPs using 3 approaches: 1) de novo SNP discovery by selective re-sequencing; 2) SNP identification from the Ensembl Trace Archive (Wheeler et al. 2008); and 3) SNP selection from the Horse Genome Sequence SNP Database Version 2.0.

The initial SNP discovery phase executed by these combined approaches revealed 16 sequence variants in 4 (ACTN3 n = 4; CKM n = 4; CYCS n = 7; NOS2 n = 1) of 12 candidate genes. Eight SNPs in 2 genes (ACTN3 n = 4; CKM n = 4) were not recorded in the EquCab2.0 SNP Database. Targeted re-sequencing in COX4I2 identified 14 sequence variants, of which 12 were not included in the EquCab2.0 SNP Database (Table S6). Two of the novel COX4I2 sequence variants and 3 EquCab2.0 COX4I2 SNPs were included in the genotyping panel. An additional 26 SNPs in 9 genes (ACATE2, ACE2, HIF1A, MYEF2, PPARGC1A, PRKAA1, PRKAA2, SLC2A4 and VEGF) were selected from the Horse Genome Sequence (EquCab1.0) SNP Database Version 1.0 following localisation of candidate gene regions by BLAST analysis of homologous Homo sapiens gene sequences against Horse Genome Sequence Version 1.0. Upon availability of the annotated genome sequence (EquCab2.0), the SNP locations were cross-referenced to confirm that they were contained within candidate gene genomic regions (nomenclature henceforth refers to EquCab2.0).

In total, 47 SNPs in 14 genes were genotyped in genomic DNA from a panel of Thoroughbred horse samples, either Sample Set I or Sample Set II (Tables S4, S5). Six SNPs had minor allele frequencies <0.05 and were excluded from the association analyses. An exact test for deviation from Hardy-Weinberg proportions was applied at each SNP (Wigginton et al. 2005). We did not exclude SNPs that deviated significantly (P<0.01) from Hardy-Weinberg proportions from subsequent analyses, since the Thoroughbred population does not conform to many of the conditions under which Hardy-Weinberg equilibrium would be expected to hold. In addition, the tests for association remain valid under departure from Hardy-Weinberg proportions, albeit with a potential loss in power if the departures reflect systematic genotyping errors. Therefore, 41 SNPs were included in a series of phenotype-based genetic association tests (Tables S7, S8). All further discussion refers to SNPs in Sample Set II unless otherwise stated.

Sequence variants in the CKM (g.15884567A>G, χ² = 5.355, P = 0.021, OR = 2.45) and COX4I2 (g.22684390C>T, χ² = 4.654, P = 0.031, OR = 1.731; and g.22684676C>T, χ² = 4.384, P = 0.036, OR = 1.680) genes were significantly associated (P<0.05) with elite racing performance (Table 2). Of the COX4I2 SNPs, g.22684390C>T had the strongest association (P = 0.03) with performance. For both genes the significance of the associations became stronger (CKM g.15884567A>G, χ² = 7.724, P = 0.005, OR = 4.401; and COX4I2 g.22684390C>T, χ² = 7.172, P = 0.007, OR = 2.233) when elite sprinters were compared with nonwinners.

In order to determine the most parsimonious genetic model for the associations between these SNPs and elite racing performance, the analysis was repeated with coding variables for additive, recessive and overdominant models. For CKM, an allelic model in which the A allele is favourable (i.e. A:A or A:G) provided the best explanation for the data (P = 0.021), and for COX4I2 a recessive model was the best fit, in which homozygous individuals (i.e. T:T) for the minor allele were superior racehorses (P = 0.014) (Table 2).

TABLE 2: Genetic model for the CKM (g.15884567A>G) and COX4I2 (g.22684390C>T) SNPs (Sample Set II)

SNP ID | Allele 1 | Allele 2 | Model | TBE | TBO | χ² | DF | P value
--- | --- | --- | --- | --- | --- | --- | --- | ---
CKM_15884567 | G | A | GENO | 1/10/70 | 2/14/39 | 5.03 | 2 | 0.081
CKM_15884567 | G | A | TREND | 12/150 | 18/92 | 4.865 | 1 | 0.027
CKM_15884567 | G | A | ALLELIC | 12/150 | 18/92 | 5.355 | 1 | 0.021
CKM_15884567 | G | A | DOM | 11/70 | 16/39 | 4.953 | 1 | 0.026
CKM_15884567 | G | A | REC | 1/80 | 2/53 | 0.876 | 1 | 0.349
COX4I2_22684390 | C | T | GENO | 4/44/32 | 10/30/15 | 6.979 | 2 | 0.031
COX4I2_22684390 | C | T | TREND | 52/108 | 50/60 | 5.58 | 1 | 0.018
COX4I2_22684390 | C | T | ALLELIC | 52/108 | 50/60 | 4.654 | 1 | 0.031
COX4I2_22684390 | C | T | DOM | 48/32 | 40/15 | 2.326 | 1 | 0.127
COX4I2_22684390 | C | T | REC | 4/76 | 10/45 | 6.093 | 1 | 0.014

TBE and TBO counts, where 1 and 2 denote Allele 1 and Allele 2: GENO, genotypes 1/1, 1/2 and 2/2; TREND and ALLELIC, alleles 1 and 2; DOM, (1/1 + 1/2) vs. 2/2; REC, 1/1 vs. (1/2 + 2/2). DF, degrees of freedom.

As the associations became stronger when winners of short distance Group races were compared with nonwinners, the relationship between SNP genotype and best race distance (furlongs) was examined for each individual. The best race distance was defined as the distance of the highest grade Group race won by each individual. In cases where multiple races of the same grade were won, the distance of the race in which the most prize money was won was used. A quantitative association test analysis for elite individuals that had won a Group race (n = 86) revealed a significant association with best race distance for 4 of the COX4I2 SNPs (Table 3). Individuals homozygous for the favourable allele at COX4I2 g.22684390C>T on average won their best races over shorter distances (P = 0.025). The average best race distances for each genotype were: T:T = 7.9 f; C:T = 8.9 f and C:C = 10.6 f.

To validate the associations, we genotyped an additional set of samples (n = 130) for the 2 SNPs with the most significant associations (COX4I2 g.22684390C>T and CKM g.15884567A>G). While the associations were not identified in this set of samples (P = 0.516 and P = 0.792), when all of the samples that had been genotyped were considered together (n = 278) the association at COX4I2 g.22684390C>T was retained (P = 0.014). Following correction for multiple testing, the association between this SNP and racing performance did not remain significant. However, while correction for multiple testing is essential to control for false positives in global, non-hypothesis-driven experiments, such correction is not always necessary in hypothesis-driven candidate gene studies (Perneger 1998).

Discussion

Cytochrome c oxidase (COX) is a multi-subunit enzyme (Complex IV) that catalyses electron transfer from reduced cytochrome c to oxygen in mitochondrial respiration. COX is a dimer in which each monomer is made up of 13 subunits, 3 of which are encoded by the mitochondrial genome (COX1, 2 and 3). Nuclear encoded
COX4 is responsible for the regulation and assembly of mitochondrially encoded subunits on the inner mitochondrial membrane and has been associated with mitochondrial volume. COX4 comprises 2 isoforms (COX4-1 and COX4-2), encoded by the COX4I1 and COX4I2 genes, that are differentially regulated in normoxic and hypoxic environments (Fukuda et al. 2007). In normal oxygen environments the COX4I1 gene is preferentially transcribed. In limited oxygen environments the master regulator of the hypoxic response, HIF-1 (hypoxia inducible factor 1), activates transcription of COX4I2 and of the mitochondrial LON gene, which inhibits the expression of COX4I1. It has been proposed that this environmental regulation of COX4-2 may increase the efficiency of cellular respiration (Fukuda et al. 2007).

We have identified a weak, but significant, association between an intronic SNP in the COX4I2 gene and retrospective racing performance. The COX4I2 g.22684390C>T SNP disrupts a putative glucocorticoid response element (GRE) binding site (C/TGTT). The favourable allele (T) retains the site (TGTT), whereas the less favourable allele (C) disrupts it (CGTT), which may disable GRE binding and repress expression of the gene. Alternatively, the SNP disrupts a putative p53 tumour suppressor binding site (CAC/TG). The favourable allele (T) retains the site (CATG), therefore enabling p53 binding, while the less favourable allele (C) disrupts the site (CACG), disabling p53 binding. There is growing evidence for a pivotal role of p53 in the regulation of exercise adaptation via mitochondrial biogenesis and apoptosis (Saleem et al. 2009) and in the regulation of the cytochrome c oxidase complex (Kruse and Gu 2006; Matoba et al. 2006). For instance, p53 has been shown to promote aerobic metabolism and exercise capacity by regulating a number of mitochondria-specific genes in a tissue-specific manner (Park et al. 2009). While this study has not determined the significance of the maintenance or disruption of a putative p53 binding site in COX4I2, we hypothesise that the presence or absence of the binding site may contribute to mitochondrial biogenesis and therefore to overall aerobic capacity.

TABLE 3: Quantitative association test results for elite individuals in Sample Set II. a) Association test using best race distance; b) quantitative trait means for best race distance

a)

CHR | SNP | NMISS | BETA | SE | R2 | T | P
--- | --- | --- | --- | --- | --- | --- | ---
22 | COX4I2_22684844 | 83 | 1.194 | 0.468 | 0.074 | 2.549 | 0.013
22 | COX4I2_22684390 | 80 | 1.195 | 0.497 | 0.069 | 2.405 | 0.019
22 | COX4I2_22684676 | 82 | 1.023 | 0.482 | 0.053 | 2.124 | 0.037
22 | COX4I2_22683226 | 77 | 0.991 | 0.481 | 0.054 | 2.061 | 0.043

NMISS, number of non-missing genotypes; BETA, regression coefficient; SE, standard error of BETA; R2, proportion of variance explained; T, t statistic.

b) COX4I2_22684390 (CHR 22)

Genotype | C/C | C/T | T/T
--- | --- | --- | ---
Counts | 4 | 44 | 32
Frequency | 0.05 | 0.55 | 0.40
Mean best race distance (f) | 10.620 | 8.943 | 7.875
SD (f) | 2.056 | 2.783 | 2.254

The creatine kinase, muscle gene (CKM) encodes a muscle-type isozyme of creatine kinase that is found exclusively in striated muscle and is involved in cellular energetics. During exercise, CKM gene knockout mice show a lack of burst activity but maintain normal absolute muscle force (van Deursen et al. 1993). A CKM sequence variant described in man confers a tendency to be more effective in a 90 min performance test and to have less decline in force production during a 60 s force generation test (Bouchard et al. 1989). Human CKM polymorphisms have been shown to be associated with an increase in cardiorespiratory endurance, as indexed by maximal oxygen uptake, following 20 weeks of training (Echegaray and Rivera 2001). In the Thoroughbred horse skeletal muscle transcriptome, CKM mRNA is the most abundantly expressed transcript, representing 6.9% of the annotated transcriptome (McGivney et al. 2010), whereas CKM makes up ~1% of the human skeletal muscle transcriptome (Welle et al. 1999). This much higher expression of CKM mRNA in equine than in human skeletal muscle suggests that the CKM gene product is important in the highly adapted athletic phenotype of the Thoroughbred. In support of this, CKM gene transcripts are significantly increased 4 h following treadmill exercise (Eivers et al. 2010) and following a 10 month period of training (Eivers et al. 2010).

We have identified a preliminary association between a CKM polymorphism and racing performance; however, this relationship must be validated in additional sample sets before the information can usefully be applied. Briefly, the CKM g.15884567A>G SNP is located in intron 4 and disrupts a putative interferon regulatory factor 1 (IRF-1) binding site (GCA/GA). The A allele retains the site (GCAA), while the G allele disrupts it (GCGA). IRF-1 is an oxygen-mediated transcription factor involved in mitochondrial biogenesis and metabolism, and in man has been shown to be significantly activated after a period of endurance exercise (Mahoney et al. 2005).

Conclusion

To date, a range of approaches has been taken to investigate measurable associations with athletic performance phenotypes in Thoroughbred racehorses, including assessment of heart size (Young et al. 2005), muscle fibre type (Rivero et al. 1993, 1995; Barrey et al. 1999; Young et al. 2005), musculoskeletal conformation (Fang et al. 2000), post exercise lactate concentration (Evans et al. 1993), speed at maximal heart rate (Gramkow and Evans 2006), haematological variables (Revington 1983) and other physiological variables (Harkins et al. 1993). The availability of the horse genome sequence (Wade et al. 2009) and the parallel development of molecular genomics tools for the horse have rapidly enabled the identification of sequence variants associated with athletic performance phenotypes in Thoroughbreds. The first genetic test for a known performance-associated trait (Hill et al. 2010a) is now commercially available and will assist the bloodstock industry to maximise the genetic potential of each Thoroughbred horse. As it is likely that a considerable number of genes contribute to the heterogeneity observed in phenotypic performance, it will be necessary to further refine and
develop genetic selection methodologies to enhance decision-making.

This study contributes to the growing body of knowledge regarding the genetic contributions to elite performance phenotypes in the equine athlete and suggests that sequence variation in 2 genes (COX4I2 and CKM) contributes to the performance observed on the racetrack. However, it is important to note that a genotype with a significant but weak effect on racing ability is unlikely to have commercial value, as many other factors (both genetic and environmental) will also influence performance. Therefore, with regard to the evaluation of specific genomic information for the equine industries, it is imperative that all preliminary associations are validated in adequate and appropriate cohorts of individuals. The application of genomic information in the Thoroughbred industry is a new and emerging field and should adhere to rigorous scientific standards to establish and maintain the integrity of such information among end-users.

Conflicts of interest

None declared.

Manufacturers' addresses

1Invitrogen, California, USA.
2Macrogen, Seoul, South Korea.
3Applied Biosystems, Foster City, California, USA.
4KBiosciences Ltd., Hoddesdon, Hertfordshire, UK.
5Sequenom, San Diego, California, USA.

References

Barrey, E., Valette, J.P., Jouglin, M., Blouin, C. and Langlois, B. (1999) Heritability of percentage of fast myosin heavy chains in skeletal muscles and relationship with performance. Equine vet. J., Suppl. 30, 289-292.
Bonfield, J.K., Smith, K. and Staden, R. (1995) A new DNA sequence assembly program. Nucleic Acids Res. 23, 4992-4999.
Bouchard, C., Chagnon, M., Thibault, M.C., Boulay, M.R., Marcotte, M., Cote, C. and Simoneau, J.A. (1989) Muscle genetic variants and relationship with performance and trainability. Med. Sci. Sports Exerc. 21, 71-77.
Bray, M.S., Hagberg, J.M., Perusse, L., Rankinen, T., Roth, S.M., Wolfarth, B. and Bouchard, C. (2009) The human gene map for performance and health-related fitness phenotypes: the 2006-2007 update. Med. Sci. Sports Exerc. 41, 35-73.
Cartharius, K., Frech, K., Grote, K., Klocke, B., Haltmeier, M., Klingenhoff, A., Frisch, M., Bayerlein, M. and Werner, T. (2005) MatInspector and beyond: promoter analysis based on transcription factor binding sites. Bioinformatics 21, 2933-2942.
Echegaray, M. and Rivera, M.A. (2001) Role of creatine kinase isoenzymes on muscular and cardiorespiratory endurance: genetic and molecular evidence. Sports Med. 31, 919-934.
Eivers, S.S., McGivney, B.A., Fonseca, R.G., MacHugh, D.E., Menson, K., Park, S.D., Rivero, J.L., Taylor, C.T., Katz, L.M. and Hill, E.W. (2010) Alterations in oxidative gene expression in equine skeletal muscle following exercise and training. Physiol. Genomics 40, 83-93.
Evans, D.L., Harris, R.C. and Snow, D.H. (1993) Correlation of racing performance with blood lactate and heart rate after exercise in thoroughbred horses. Equine vet. J. 25, 441-445.
Fang, J., Dagenais, S.L., Erickson, R.P., Arlt, M.F., Glynn, M.W., Gorski, J.L., Seaver, L.H. and Glover, T.W. (2000) Mutations in FOXC2 (MFH-1), a forkhead family transcription factor, are responsible for the hereditary lymphedema-distichiasis syndrome. Am. J. Hum. Genet. 67, 1382-1388.
Fukuda, R., Zhang, H., Kim, J.W., Shimoda, L., Dang, C.V. and Semenza, G.L. (2007) HIF-1 regulates cytochrome oxidase subunits to optimize efficiency of respiration in hypoxic cells. Cell 129, 111-122.
Gordon, D., Abajian, C. and Green, P. (1998) Consed: a graphical tool for sequence finishing. Genome Res. 8, 195-202.
Gramkow, H.L. and Evans, D.L. (2006) Correlation of race earnings with velocity at maximal heart rate during a field exercise test in thoroughbred racehorses. Equine vet. J., Suppl. 36, 118-122.
Gu, J., Orr, N., Park, S.D., Katz, L.M., Sulimova, G., MacHugh, D.E. and Hill, E.W. (2009) A genome scan for positive selection in thoroughbred horses. PLoS ONE 4, e5767.
Harkins, J.D., Beadle, R.E. and Kamerling, S.G. (1993) The correlation of running ability and physiological variables in thoroughbred racehorses. Equine vet. J. 25, 53-60.
Harrison, S.P. and Turrion-Gomez, J.L. (2006) Mitochondrial DNA: an important female contribution to thoroughbred racehorse performance. Mitochondrion 6, 53-63.
Hill, E.W., Gu, J., Eivers, S.S., Fonseca, R.G., McGivney, B.A., Govindarajan, P., Orr, N., Katz, L.M. and MacHugh, D.E. (2010a) A sequence polymorphism in MSTN predicts sprinting ability and racing stamina in thoroughbred horses. PLoS ONE 5, e8645.
Hill, E.W., Gu, J., McGivney, B.A. and MacHugh, D.E. (2010b) Targets of selection in the Thoroughbred genome contain exercise-relevant gene SNPs associated with elite racecourse performance. Anim. Genet. Revision submitted.
Kruglyak, L. and Nickerson, D.A. (2001) Variation is the spice of life. Nat. Genet. 27, 234-236.
Kruse, J.P. and Gu, W. (2006) p53 aerobics: the major tumor suppressor fuels your workout. Cell Metab. 4, 1-3.
McGivney, B.A., McGettigan, P.A., Browne, J.A., Evans, A.C.O., Fonseca, R.G., Loftus, B.J., Lohan, A., MacHugh, D.E., Murphy, B.A., Katz, L.M. and Hill, E.W. (2010) Characterization of the equine skeletal muscle transcriptome identifies novel functional responses to exercise training. BMC Genomics 23, 398.
Mahoney, D.J., Parise, G., Melov, S., Safdar, A. and Tarnopolsky, M.A. (2005) Analysis of global mRNA expression in human skeletal muscle during recovery from endurance exercise. FASEB J. 19, 1498-1500.
Matoba, S., Kang, J.G., Patino, W.D., Wragg, A., Boehm, M., Gavrilova, O., Hurley, P.J., Bunz, F. and Hwang, P. (2006) p53 regulates mitochondrial respiration. Science 312, 1650-1653.
Park, J.Y., Wang, P.Y., Matsumoto, T., Sung, H.J., Ma, W., Choi, J.W., Anderson, S.A., Leary, S.C., Balaban, R.S., Kang, J.G. and Hwang, P.M. (2009) p53 improves aerobic exercise capacity and augments skeletal muscle mitochondrial DNA content. Circ. Res. 105, 705-712.
Perneger, T.V. (1998) What's wrong with Bonferroni adjustments. Br. Med. J. 316, 1236-1238.
Purcell, S., Neale, B., Todd-Brown, K., Thomas, L., Ferreira, M.A., Bender, D., Maller, J., Sklar, P., de Bakker, P.I., Daly, M.J. and Sham, P.C. (2007) PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559-575.
Rankinen, T., Bray, M.S., Hagberg, J.M., Perusse, L., Roth, S.M., Wolfarth, B. and Bouchard, C. (2006) The human gene map for performance and health-related fitness phenotypes: the 2005 update. Med. Sci. Sports Exerc. 38, 1863-1888.
Revington, M. (1983) Haematology of the racing Thoroughbred in Australia 2: haematological values compared to performance. Equine vet. J. 15, 145-148.
Rivero, J.L., Ruz, M.C., Serrano, A.L. and Diz, A.M. (1995) Effects of a 3 month endurance training programme on skeletal muscle histochemistry in Andalusian, Arabian and Anglo-Arabian horses. Equine vet. J. 27, 51-59.
Rivero, J.L., Serrano, A.L., Henckel, P. and Aguera, E. (1993) Muscle fiber type composition and fiber size in successfully and unsuccessfully endurance-raced horses. J. appl. Physiol. 75, 1758-1766.
Rozen, S. and Skaletsky, H. (2000) Primer3 on the WWW for general users and for biologist programmers. Methods Mol. Biol. 132, 365-386.
Saleem, A., Adhihetty, P.J. and Hood, D.A. (2009) Role of p53 in mitochondrial biogenesis and apoptosis in skeletal muscle. Physiol. Genomics 37, 58-66.
Sambrook, J. and Russell, D. (2001) Molecular Cloning: A Laboratory Manual, 3rd edn., Cold Spring Harbor Laboratory Press, New York, USA.
Van Baren, M.J. and Heutink, P. (2004) The PCR suite. Bioinformatics 20, 591-593.
Van Deursen, J., Heerschap, A., Oerlemans, F., Ruitenbeek, W., Jap, P., Laak, H. and Wieringa, B. (1993) Skeletal muscles of mice deficient in muscle creatine kinase lack burst activity. Cell 74, 621-631.
Gaffney, B. and Cunningham, E.P.
(1988) Estimation of genetic trend in racing Wade, C.M., Giulotto, E., Sigurdsson, S., Zoli, M., Gnerre, S., Imsland, F., Lear, T.L., performance of Thoroughbred horses. Nature 332, 722-724. Adelson, D.L., Bailey, E., Bellone, R.R., Blocker, H., Distl, O., Edgar, R.C.,

© 2010 EVJ Ltd J. Gu et al. 575

Garber, M., Leeb, T., Mauceli, E., MacLeod, J.N., Penedo, M.C., Raison, J.M., Supporting information Sharpe, T., Vogel, J., Andersson, L., Antczak, D.F., Biagi, T., Binns, M.M., Chowdhary, B.P., Coleman, S.J., Della Valle, G., Fryc, S., Guerin, G., Hasegawa, T., Hill, E.W., Jurka, J., Kiialainen, A., Lindgren, G., Liu, J., Magnani, E., Additional Supporting Information may be found in the online Mickelson, J.R., Murray, J., Nergadze, S.G., Onofrio, R., Pedroni, S., Piras, M.F., version of this article: Raudsepp, T., Rocchi, M., Roed, K.H., Ryder, O.A., Searle, S., Skow, L., Swinburne, J.E., Syvanen, A.C., Tozaki, T., Valberg, S.J., Vaudin, M., White, J.R., Table S1: Description of population cohorts. Sample Set I Zody, M.C., Lander, E.S. and Lindblad-Toh, K. (2009) Genome sequence, comprised n = 150 performing Thoroughbreds. Sample Set II comparative analysis, and population genetics of the domestic horse. Science 326, (n = 148) was refined to include individuals with no shared sires or 865-867. dams within each cohort. Welle, S., Bhatt, K. and Thornton, C.A. (1999) Inventory of high-abundance mRNAs in skeletal muscle of normal men. Genome Res. 9, 506-513. Table S2: Designed amplicons in candidate genes. Table S3: Overlapping PCR primers for re-sequencing COX4I2. Wheeler, D.L., Barrett, T., Benson, D.A., Bryant, S.H., Canese, K., Chetvernin, V., Church, D.M., Dicuccio, M., Edgar, R., Federhen, S., Feolo, M., Geer, L.Y., Table S4: A panel of 28 SNPs in 12 genes were genotyped in Helmberg, W., Kapustin, Y., Khovayko, O., Landsman, D., Lipman, D.J., Madden, Sample Set I. T.L., Maglott, D.R., Miller, V., Ostell, J., Pruitt, K.D., Schuler, G.D., Shumway, Table S5: A panel of 19 SNPs in 5 genes were genotyped in M., Sequeira, E., Sherry, S.T., Sirotkin, K., Souvorov, A., Starchenko, G., Tatusov, R.L., Tatusova, T.A., Wagner, L. and Yaschenko, E. (2008) Database resources of Sample Set II. the National Center for Biotechnology Information. Nucleic. Acids. Res. 
36, Table S6: Novel SNPs discovered by re-sequencing. D13-D21. Table S7: Case-control association test results for a panel of 28 Wigginton, J.E., Cutler, D.J. and Abecasis, G.R. (2005) A note on exact tests of SNPs in 12 genes were genotyped in Sample Set I. Hardy-Weinberg equilibrium. Am. J. Hum. Genet. 76, 887-893. Table S8: Case-control association test results for a panel of 19 Young, L.E., Rogers, K. and Wood, J.L. (2005) Left ventricular size and systolic SNPs in 5 genes that were genotyped in Sample Set II. function in Thoroughbred racehorses and their relationships to race performance. J. appl. Physiol. 99, 1278-1285. Please note: Wiley-Blackwell are not responsible for the content or functionality of any supporting materials supplied by the authors. Any queries (other than missing material) should be directed to the corresponding author for the article.

© 2010 EVJ Ltd