UNRAVELLING THE GENOMIC ARCHITECTURE OF BULL FERTILITY IN DAIRY CATTLE

By

YI HAN

A THESIS PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE

UNIVERSITY OF FLORIDA

2016

© 2016 Yi Han

To my parents and my grandfather

ACKNOWLEDGMENTS

I thank my advisor, Dr. Francisco Peñagaricano, for giving me the opportunity to enter this field and for his help throughout my research. I thank my committee members, Dr. Raluca Mateescu and Dr. Mauricio Elzo, for being so kind, understanding and supportive. Finally, thanks to Mom, Dad and Ruohan for their encouragement and endless support.


TABLE OF CONTENTS

ACKNOWLEDGMENTS

LIST OF TABLES

LIST OF FIGURES

LIST OF ABBREVIATIONS

ABSTRACT

CHAPTER

1 INTRODUCTION

2 LITERATURE REVIEW

    Bull Fertility
    Compensable Traits vs. Uncompensable Traits
    Prediction of Sire Fertility
    Fertility Estimation Using Field Data (in vivo)
    Technician non-return rates (NNR)
    Estimated relative conception rate (ERCR) and agritech analytics (ATA)
    Sire conception rate (SCR)
    Predictors of Fertility in vitro
    Making Predictions
    The Genetic Basis of Bull Fertility
    Chromosomal Aberrations
    Numerical aberrations
    Structural aberrations
    Gene Identification
    GWAS
    Pathway based analysis
    Novel Omic Technologies on Bull Fertility

3 UNRAVELLING THE GENOMIC ARCHITECTURE OF BULL FERTILITY IN HOLSTEIN CATTLE

    Background
    Methods
    Phenotypic and Genotypic Data
    Statistical Methods for Genome-Wide Association Mapping
    Genome-Wide Association Mapping Using ssGBLUP
    Genome-Wide Association Mapping Using Single Marker Regression (cGWAS)
    Gene Set Analysis
    Results
    Whole Genome Association Analysis
    Gene Set Analysis
    Discussion
    Conclusion

LIST OF REFERENCES

BIOGRAPHICAL SKETCH


LIST OF TABLES

Table

2-1 Summary of the strongest candidate found by GWAS

3-1 Most significant genetic markers associated with Sire Conception Rate

3-2 Gene Ontology (GO) Molecular Function terms significantly enriched with genes associated with Sire Conception Rate

3-3 MeSH terms significantly enriched with genes associated with Sire Conception Rate (SCR)


LIST OF FIGURES

Figure

3-1 Descriptive statistics for Sire Conception Rate (SCR)

3-2 Manhattan plots showing the results of the genome-wide association mapping for Sire Conception Rate

3-3 Genomic regions (1.5 Mb) that explain more than 0.50% of the genetic variance for Sire Conception Rate

3-4 Gene Ontology Biological Process terms significantly enriched with genes associated with Sire Conception Rate


LIST OF ABBREVIATIONS

AI Artificial insemination

AIPL Animal Improvement Programs Laboratory

ATA Agritech Analytics

CASA Computer-assisted semen analysis

CDCB Council on Dairy Cattle Breeding

CDDR Cooperative Dairy DNA Repository

cGWAS Classical genome-wide association studies

ERCR Estimated relative conception rate

GEBVs Genomic estimated breeding values

GO Gene Ontology

GSEA Gene set enrichment analysis

HOST Hypo-osmotic swelling test

IBD Identical-by-descent

MeSH Medical Subject Headings

NAAB National Association of Animal Breeders

NNR Non-return rate

PMI Plasma membrane integrity

QTL Quantitative trait loci

SCR Sire conception rate

SNP Single-nucleotide polymorphism

ssGBLUP Single-step genomic best linear unbiased predictor

SSFS Service Sire Fertility Summary

USDA United States Department of Agriculture


Abstract of Thesis Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Master of Science

UNRAVELLING THE GENOMIC ARCHITECTURE OF BULL FERTILITY IN DAIRY CATTLE

By

Yi Han

December 2016

Chair: Francisco Peñagaricano

Major: Animal Sciences

Fertility is considered an important economic trait in dairy cattle. Most studies have investigated cow fertility while bull fertility has received much less consideration.

The main objective of this study was to perform a comprehensive genomic analysis in order to unravel the genomic architecture underlying sire fertility in dairy cattle. The analysis included the application of alternative genome-wide association mapping approaches and the subsequent use of diverse gene set enrichment tools.

The association analyses identified at least eight genomic regions strongly associated with bull fertility. Most of these regions harbor genes, such as KAT8, CKB, TDRD9 and IGF1R, with functions related to sperm biology, including sperm development, motility and sperm-egg interaction. Moreover, the gene set analyses revealed many significant functional terms, including fertilization, sperm motility, calcium channel regulation, and SNARE . Most of these terms are directly implicated in sperm physiology and male fertility.

This study contributes to the identification of genetic variants and biological processes underlying sire fertility. These findings can provide opportunities for improving bull fertility via marker-assisted selection.


CHAPTER 1 INTRODUCTION

Rapidly rising demand for dairy products has put pressure on a comparatively small number of dairy cows in recent years (Potgieter, 2012). Multiple programs, including nutrition, management and genetic selection, have been used successfully to increase milk production. However, traits associated with fitness, such as fertility, are seldom taken into account in dairy cattle breeding programs. In response to intense selection for genetic merit for milk yield, a worldwide decline in fertility has become a major concern (Lucy, 2001a; Pryce and Veerkamp, 2001). This phenomenon is both logical and natural (Berry et al., 2016). One possible explanation is that natural selection has always prevented the transmission of mutations causing infertility, and that artificial selection has broken this protection (Ferlin et al., 2006). In addition, some studies have suggested that this negative association may be caused by genetic correlations, which result from pleiotropy and linkage, and by physiological factors (e.g., metabolic disease); because the genetic correlation is less than one, genetics is not the only factor contributing to this phenomenon (Miglior et al., 2005; Berry et al., 2016). From a physiological perspective, a trade-off hypothesis from evolutionary biology states that both milk production and reproduction require resources, and hence a trade-off occurs when resources are limited (Berry et al., 2016). For example, milk production has priority for nutrients and energy, which leads to depressed fertility (Berry et al., 2016).

Fertility is an economically important trait in dairy cattle production. Subfertility causes substantial direct and indirect financial losses, including sub-optimal calving intervals, higher veterinary and insemination costs, and forced replacement costs (Dijkhuizen et al., 1985; Heringstad et al., 2003). Nowadays, female reproduction traits receive more consideration and are included in selection indices (Miglior et al., 2005). In fact, most of the attention has been paid to female fertility, while bull fertility has received less consideration (Peñagaricano et al., 2012c). However, male fertility is crucial in mammalian reproduction. In humans, male infertility accounts for 40%-50% of infertility cases (Peddinti et al., 2008).


CHAPTER 2 LITERATURE REVIEW

Bull Fertility

Bull fertility can be defined as the ability of the spermatozoon to fertilize and activate the oocyte so that the resulting zygote continues through embryonic and fetal development until the delivery of live offspring (Peddinti et al., 2008; Kaya and Memili, 2016). Male infertility can be described as a reduced or absent ability to produce spermatozoa that are capable of fertilizing the oocyte and supporting subsequent embryonic and fetal development until birth (Feugang et al., 2010).

To be fertile, sperm must be produced properly through spermatogenesis, which includes a series of mitotic and meiotic divisions of male primordial germ cells followed by morphological remodeling and capacitation. The resulting spermatozoa must also be capable of the following functions: adequate motility, to move properly in both the male and female reproductive tracts and successfully reach the oocyte; hyperactivation, to support penetration of the zona pellucida; the acrosome reaction, the release of enzymes from the acrosome that helps the sperm penetrate the ovum; penetration of the zona; insemination; and other roles involved in the further support of embryogenesis (Feugang et al., 2010; Kaya and Memili, 2016). During this long and complex process, different environmental and genetic factors, including genetic mutations, may result in abnormal and dysfunctional spermatozoa, which ultimately leads to bull subfertility (Feugang et al., 2010).

Compensable Traits vs. Uncompensable Traits

The primary responsibility of the AI industry is to provide products that meet the goals of the whole dairy industry (DeJarnette and Amann, 2010). Because of the increasing awareness of the declining reproductive performance of dairy cattle, major AI organizations have emphasized semen quality traits (DeJarnette and Amann, 2010).

Semen traits related to bull fertility can be categorized as compensable or uncompensable; this simple classification has been the foundation of semen quality control programs (Saacke et al., 2000; DeJarnette et al., 2004a). Traits for which low fertility can be overcome by increasing the sperm count per AI dose until the maximum fertilization rate is reached are considered compensable; traits that do not respond to changes in sperm concentration are considered uncompensable (Saacke et al., 2000).

For compensable traits, it is important to determine the minimum amount of semen per AI dose needed to achieve maximum fertility (Saacke et al., 2000). These traits typically affect the series of events that take place before the sperm penetrates the oocyte membrane, and thus concern the ability of inseminated sperm to move and function in the female tract and to reach the ovum or the site of fertilization (Evenson, 1999; Amann and DeJarnette, 2012). Conventional compensable traits can be described in terms of viability (motility) and morphology (Saacke et al., 2000). Some studies have shown that immotile sperm are stopped at several barriers in the female reproductive tract (e.g., the tubal isthmus) (Overstreet and Cooper, 1978), and only sperm with appropriate movement can cross these barriers or become part of the accessory sperm population (Saacke et al., 1994). Besides these conventional compensable traits, other semen traits related to functional modifications that occur in the oviducts, including sperm capacitation, sperm-egg recognition, and the acrosome reaction, are crucial in the fertilization process, particularly in the penetration of the sperm into the zona pellucida (Saacke et al., 2000). Thus, compensable traits include not only the ability of the sperm to move in the female reproductive tract, but also the initiation of the fertilization process and polyspermy blocks (Saacke et al., 2000).

Uncompensable traits dramatically affect bull fertility regardless of sperm dosage. Low-fertility bulls with uncompensable defects have been shown to have a reduced early cleavage rate and delayed pronuclear formation (Eid et al., 1994). Sperm with uncompensable defects typically have low fertilization rates and produce low-quality embryos (Saacke et al., 2000). These semen traits leave spermatozoa unable to complete the late stages of the fertilization process or to sustain embryonic development, which becomes a limiting factor of bull fertility (Saacke et al., 2000). Uncompensable traits include nuclear vacuoles, morphological deficiencies that do not depress movement, chromatin structure defects and aberrations, and functional genetic attributes (Peddinti et al., 2008; Kaya and Memili, 2016). Sperm with normal or near-normal morphology but with chromatin defects are known to cause subfertility (Gledhill, 1970). In fact, chromatin aberrations are probably the best-known uncompensable semen traits (Saacke et al., 2000). Finally, there is growing evidence that several chromosomal regions contribute to genetic variation in uncompensable traits (Blaschek et al., 2011b). In summary, uncompensable traits appear to be very important in the fertilization process and in the subsequent maintenance of embryogenesis, and they result from molecular and genomic defects.

A world-wide survey found that AI organizations place more sperm than needed in each straw in order to ensure maximum fertility (Vishwanath, 2003). However, many bulls still show low fertility, probably because they carry uncompensable semen defects. Because these defects cannot be corrected in the laboratory by increasing sperm concentration, uncompensable traits should receive more attention (DeJarnette and Amann, 2010; Amann and DeJarnette, 2012). Indeed, to estimate semen quality precisely, it is important to detect and characterize potential uncompensable defects separately from typical compensable traits (Braundmeier and Miller, 2001).

Prediction of Sire Fertility

Sire fertility has become a concern for both dairy and beef producers when they select genetically superior bulls (DeJarnette and Amann, 2010). Thus, ensuring highly fertile semen straws is a primary goal for any AI center (Sellem et al., 2015). Accurate prediction of sire fertility may provide a valuable tool to both the AI industry and producers (DeJarnette and Amann, 2010). The basic principle of sire fertility prediction requires identifying a dependent variable, i.e., a good estimate of male fertility, and several independent variables (Utt, 2016).

Fertility Estimation Using Field Data (in vivo)

Fertility estimation is quite complicated. Broadly speaking, fertility is defined as the ability to give birth to live offspring (Utt, 2016). The term fertility is convenient as an expression, but it is inconvenient in terms of trait definition and measurement (Amann and DeJarnette, 2012). Because the process from insemination to the birth of viable offspring is complex, it is hard to choose which endpoint phenotype should be used to measure fertility (Utt, 2016). Thus, multiple systems for evaluating bull fertility are available, and in general they are based on fertilization rates, non-return rates (NNR) or conception rates (Utt, 2016).


Technician non-return rates (NNR)

The oldest method of measuring bull fertility is based on professional technician non-return rates (NNR), which have been used in the dairy industry since the 1940s (DeJarnette and Amann, 2010). Non-return means that a specific cow was inseminated and did not return to estrus; the method therefore assumes that this animal is pregnant. The rate is calculated as the proportion of cows that are not re-bred by the same professional technician within a specific interval of time. This method has been adopted as an indicator of bull fertility in many countries because the data are easily collected and the cost is reasonable (Miglior et al., 1998). Some studies have shown that NNR is less biased by selection than other methods, and quite reliable when causative factors are well controlled (Fouz et al., 2011). However, this simple method has some important limitations. For instance, data recording is crucial: a given cow may have no second service record because she left the herd, because she was serviced by another technician, or simply because the second service was never recorded; in these cases, the absence of a second service has nothing to do with conception (Amann and DeJarnette, 2012). Furthermore, many factors affecting conception have to be assumed equal across sires because little is known about the data, which can also contribute to bias and unexplained variation (DeJarnette and Amann, 2010).
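The non-return calculation described in this section can be illustrated with a minimal sketch. It is not the procedure of any particular AI organization: the record layout, the function name `non_return_rate`, the 56-day window and the toy records are all assumptions made for this example.

```python
from datetime import date, timedelta

def non_return_rate(first_services, repeat_services, window_days=56):
    """Proportion of first services with no recorded repeat service by the
    same technician within `window_days` (hypothetical layout and window).

    first_services:  dict cow_id -> (technician_id, service_date)
    repeat_services: list of (cow_id, technician_id, service_date)
    """
    window = timedelta(days=window_days)
    returned = set()
    for cow_id, tech_id, when in repeat_services:
        if cow_id not in first_services:
            continue
        first_tech, first_date = first_services[cow_id]
        # Only a repeat by the same technician inside the window counts as a return.
        if tech_id == first_tech and timedelta(0) < when - first_date <= window:
            returned.add(cow_id)
    return 1 - len(returned) / len(first_services)

# Toy records (invented for illustration only).
first = {
    "cow1": ("techA", date(2016, 1, 4)),
    "cow2": ("techA", date(2016, 1, 5)),
    "cow3": ("techA", date(2016, 1, 6)),
    "cow4": ("techA", date(2016, 1, 7)),
}
repeats = [
    ("cow1", "techA", date(2016, 1, 25)),  # same technician, inside window: a return
    ("cow2", "techB", date(2016, 1, 26)),  # re-bred by another technician: not detected
    ("cow3", "techA", date(2016, 4, 20)),  # outside the window: not counted
]
nnr = non_return_rate(first, repeats)
print(f"NNR = {nnr:.2f}")
```

In this toy data set, cow2 was re-bred by a different technician and is therefore counted as a non-return, which mirrors the recording limitation discussed in the text.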

Estimated relative conception rate (ERCR) and agritech analytics (ATA)

From 1986 to November 2005, the Animal Improvement Programs Laboratory (AIPL) provided the US dairy industry with an estimate of bull fertility called Estimated Relative Conception Rate (ERCR), which is the difference in conception rate (measured as non-return rate at 70 d) between a sire and the other AI sires used in the same herd (Clay and McDaniel, 2001; DeJarnette and Amann, 2010). This estimate became popular because of its increased accuracy, achieved by considering several environmental factors that may affect conception (DeJarnette and Amann, 2010; Amann and DeJarnette, 2012). These factors included milk yield, lactation number, stage of lactation, month of breeding, year and herd. Moreover, the bull fertility estimate ATA, now called Service Sire Fertility Summary (SSFS), was introduced to the dairy industry in 2003. This estimate uses 75-day confirmed pregnancy instead of non-return rate and includes information on up to 5 services per cow per lactation (Norman et al., 2008). While ERCR was calculated using 3-year rolling data, the ATA system is based on perpetual data.

Sire conception rate (SCR)

Since August 2008, AIPL has provided a new estimate of bull fertility called Sire Conception Rate (SCR). This more complex and more accurate estimate of AI service-sire fertility, developed by Dr. Melvin Kuhn, replaced ERCR (Di Croce, 2010). The SCR is calculated three times per year using a large-scale, nation-wide database (Amann and DeJarnette, 2012). Generally speaking, the SCR model includes several factors related to the ability of the sire to achieve a conception, and simultaneously removes multiple factors related to the cows that receive the semen (Kuhn and Hutchison, 2008a; Kuhn et al., 2008b). These nuisance variables can indeed obscure the true fertility of the bull (Norman et al., 2008). Factors related to the bull include bull age at mating, inbreeding of the service sire, inbreeding of the embryo resulting from the mating, and stud-year of insemination (Kuhn and Hutchison, 2008a; Norman et al., 2008). The cow factors were identified by comparing the predictions of alternative models with different combinations of potential nuisance factors (Kuhn et al., 2008b). These nuisance variables included the group effect of cow herd, year-state-month, lactation, service number, the effect of a short interval between breedings, age of the cow at breeding, milk yield, the random animal effect, and the random permanent environmental effect (Kuhn et al., 2008b; Norman et al., 2008).

SCR is intended as a phenotypic rather than a genetic evaluation, because published estimates reflect both genetic and permanent environmental effects (Norman et al., 2015). Similar to ERCR, SCR is the expected difference in conception rate between a sire and the mean of all other evaluated sires (Di Croce, 2010; Amann and DeJarnette, 2012). For example, a bull with an SCR of -1.0% is expected to have a conception rate 1 percentage point lower than the average conception rate. The term “expected” denotes that this statistical estimate is valid only when the database has an extremely large number of matings (Norman et al., 2008).
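The interpretation of an SCR value as a deviation from the average conception rate can be made concrete with a small arithmetic sketch. The average conception rate of 32% and the function name `expected_conception_rate` are hypothetical values chosen only for illustration; they are not taken from the SCR evaluation itself.

```python
def expected_conception_rate(mean_rate_pct, scr_pct):
    """Expected conception rate (%) of a sire: average rate plus its SCR deviation."""
    return mean_rate_pct + scr_pct

mean_rate = 32.0  # hypothetical average conception rate, in percent

for scr in (-1.0, 0.0, 2.5):
    rate = expected_conception_rate(mean_rate, scr)
    # Expected change in pregnancies per 1,000 services relative to an average sire.
    extra = 1000 * scr / 100
    print(f"SCR {scr:+.1f}% -> expected rate {rate:.1f}%, "
          f"{extra:+.0f} pregnancies per 1,000 services")
```

Under these assumed numbers, a difference of one SCR point corresponds to about ten pregnancies per 1,000 services, which is why the estimate is only meaningful over a very large number of matings.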

The SCR model has been in use for 8 years. Because more bulls have published evaluations and the assumptions of the model might have changed during this period, the National Association of Animal Breeders (NAAB) Sire Fertility Evaluation Committee requested a re-examination of the model; the results of this re-evaluation showed that the SCR model is still valid (Norman et al., 2015).

Finally, since there are different traits and estimation procedures for bull fertility, one may ask which one to believe. The answer is that we should believe all of them; each evaluation is designed for a certain situation and depends heavily on the data used (DeJarnette and Amann, 2010).

Predictors of Fertility in vitro

To predict bull fertility accurately and precisely, the identification of multiple independent variables is often very beneficial (Utt, 2016). The measurement and control of sperm quality is a routine process in any AI company (Sellem et al., 2015). In fact, sperm quality characteristics assessed in vitro are very useful for predicting bull fertility (Utt, 2016). In vitro fertility predictors can be classified into two general categories: direct measurements of different sperm attributes, such as motility and morphology, and functional bioassays of sperm competence in a certain task related to the fertilization process, such as capacitation, acrosome reaction, or mucus penetration (Utt, 2016).

Most commonly, sperm motility is selected as one of the major predictors of male fertility, given that it is closely related to sperm viability and structural integrity (Kathiravan et al., 2011). Computer-assisted semen analysis (CASA) provides a very precise and accurate measurement of sperm motility (Kathiravan et al., 2011). Sperm morphology also appears to be crucial to fertility. Briefly, sperm morphology may be related to both compensable and uncompensable traits. Sperm morphology can be evaluated using a microscope or automated morphometry (Mocé and Graham, 2008). Another useful laboratory technique is flow cytometry, which has proven useful for assessing membrane integrity, mitochondrial status, the level of oxidation, and acrosomal membrane integrity in bovine sperm (Sellem et al., 2015). Furthermore, many assays are available to evaluate plasma membrane integrity (PMI); several assays should be applied because they differ in their ability to assess different PMI characteristics (Mocé and Graham, 2008). For instance, the hypo-osmotic swelling test (HOST) is an indirect indicator of PMI; the results of this test are associated with in vitro fertilization rate and non-return rate (Utt, 2016). Mitochondrial status, evaluated with several fluorescent markers, can be used to assess sperm motility indirectly (Thomas et al., 1998). Western blot techniques or flow cytometry can also be used to evaluate membrane changes (Mocé and Graham, 2008). Finally, the quality of the genetic package and different molecular indicators are also strong predictors of bull fertility; these variables are discussed in detail in the sections ‘The Genetic Basis of Bull Fertility’ and ‘Novel Omic Technologies on Bull Fertility’.

Making Predictions

It is important to note that both in vitro predictors and field data estimates have their own weaknesses and bottlenecks (Mocé and Graham, 2008). Studies have shown that the correlation between the results of the two methods is not always as high as expected (Mocé and Graham, 2008). Fertility estimates are somewhat limited by the definition adopted for the term fertility and by the quality of the field data collected (Mocé and Graham, 2008).

Prediction models for bull fertility generally use large datasets in order to improve precision; however, large datasets cannot eliminate unexplained variation and may even introduce more noise (Amann and DeJarnette, 2012; Utt, 2016).

In vitro prediction is limited for several reasons, including the complexity of sperm function, predictor selection, the uncertainty and inaccuracy of in vitro measurements, and the presence of uncontrollable factors (Mocé and Graham, 2008; Amann and DeJarnette, 2012; Utt, 2016). To fertilize an oocyte, a sperm requires many attributes. First, not all of these attributes can be identified; second, the specific attribute measured in vitro is unlikely to account for the whole cause of infertility (Mocé and Graham, 2008; Amann and DeJarnette, 2012; Utt, 2016). The use of several attributes (predictors) in the model may be reasonable for fertility prediction; however, finding the best attributes is also complicated (Amann and DeJarnette, 2012). In addition, sperm traits are in general correlated, and such correlations among predictors in a multiple regression model can lead to unreliable estimates (Utt, 2016). Finally, in vitro methods, by definition, are conducted outside the normal biological context, which leads to the inherent disadvantage that the results of an assay may not represent what happens after insemination (Amann and DeJarnette, 2012). Thus, to make a reasonable prediction of sire fertility, one needs to combine information and use not only different statistical approaches but also knowledge of reproductive biology (Utt, 2016).

The Genetic Basis of Bull Fertility

The quality of the genetic package is a good candidate for inferring bull fertility. Genetics seems to account for at least one third of the decline in pregnancy rate in dairy cattle, and recent evidence also suggests that genetics is a major component of male infertility (Ferlin et al., 2006; Shook, 2006a). Because of the wide use of AI, there is a substantial risk in dairy cattle that elite bulls may rapidly spread their genetic and chromosomal defects throughout the entire population (Citek et al., 2009). Generally, the genetic causes of sire infertility can be categorized into chromosomal abnormalities and gene mutations.

Chromosomal Aberrations

Chromosomal aberrations have been widely studied because they are directly linked to different fertility and reproductive problems in both humans and domestic animals (Larkin et al., 2015). Chromosomal abnormalities can negatively affect meiosis, gametogenesis, and the viability of the zygote and the subsequent embryo (Raudsepp and Chowdhary, 2016). There are two types of chromosomal aberrations, namely numerical and structural aberrations (Larkin et al., 2015).


Numerical aberrations

Numerical aberrations can be classified into polyploidy and aneuploidy (Raudsepp and Chowdhary, 2016). Polyploidy means that the cells contain more than two paired sets of chromosomes, which most likely results from suppression of the first cleavage division or from embryonic cell fusion (Larkin et al., 2015). In mammals, aneuploidy is the only numerical aberration that has been found; it results from failed disjunction of homologous chromosomes during meiosis (Larkin et al., 2015). Aneuploidy means that the cells contain an abnormal number of chromosomes, most commonly monosomies (one chromosome of a pair is missing) and trisomies (one additional copy of a chromosome) (Larkin et al., 2015). Autosomal numerical aberrations are seldom found in live animals because they are typically lethal during the embryonic stage (Raudsepp and Chowdhary, 2016). The few cases that have been reported are mainly related to an abnormal number of sex chromosomes (Villagómez et al., 2009). For instance, XXY trisomy was found in 3 Charolais sires; it seems to be related to testicular hypoplasia and hence negatively affects bull fertility (Citek et al., 2009; Villagómez et al., 2009).

Structural aberrations

Structural aberrations arise from alterations during meiosis and are very often related to reduced bull fertility (Molteni et al., 2005). Structural aberrations appear as changes in the integrity of one or more chromosomes. These aberrations, caused by misrepair of DNA breaks during meiotic recombination, can be categorized as balanced or unbalanced (Larkin et al., 2015; Raudsepp and Chowdhary, 2016). Balanced aberrations do not alter the DNA content and typically include fusions, fissions, inversions, or translocations (Larkin et al., 2015). In principle, balanced aberrations should be phenotypically harmless, and carriers could be ignored (Raudsepp and Chowdhary, 2016). These carriers, however, generally have reproductive problems because most of these aberrations disturb meiotic pairing and chromosome segregation during gametogenesis, and hence generate unbalanced gametes (Raudsepp and Chowdhary, 2016). These unbalanced gametes result in early embryonic failure (Larkin et al., 2015; Raudsepp and Chowdhary, 2016). Therefore, balanced aberrations are especially harmful in breeding programs (Iannuzzi et al., 2002; Raudsepp and Chowdhary, 2016). Unbalanced aberrations include deletions and duplications, which reduce or increase DNA content, respectively. Large changes in chromosomal segments are almost always lethal; however, the association between small changes and fertility is not yet well understood (Raudsepp and Chowdhary, 2016).

There is growing evidence that Robertsonian (rob) translocations are commonly related to cattle fertility, including both male infertility (Molteni et al., 2005) and female infertility (Bonnet-Garnier et al., 2008). In fact, this type of translocation is responsible for a reduction in fertility of roughly 5-10% (Larkin et al., 2015). Some studies have also shown that an abnormal Y chromosome, resulting from a reciprocal translocation between BTAY and BTA9, can lead to azoospermia in young bulls (Iannuzzi et al., 2002).

Gene Identification

As a complex or multifactorial trait, fertility is simultaneously affected by numerous genes and environmental factors (Andersson and Georges, 2004). To enhance our understanding of the genetic basis of this complex trait, the identification of genes related to bull fertility is very important. In addition, this knowledge can provide opportunities for improving bull fertility via marker-assisted selection.


Quantitative trait loci (QTL) are regions of the genome that influence complex traits (Andersson and Georges, 2004). In the past two decades, identical-by-descent (IBD) mapping has been used to detect QTL associated with complex traits in livestock species (Andersson and Georges, 2004). However, the confidence interval of a given QTL is generally too large, and hence the candidate region may harbor tens to hundreds of genes (Ron and Weller, 2007). This severely limits our understanding of the biology of the trait and also makes animal selection less accurate. Fortunately, the discovery of hundreds of thousands of single-nucleotide polymorphisms (SNP) spanning the whole bovine genome, together with the development of cost-effective genotyping techniques, has provided unique opportunities to identify genetic variants underlying complex traits through genome-wide association studies (GWAS) (Goddard and Hayes, 2009).

GWAS

The term association in GWAS refers to the association between the trait of interest and the genotype at a given SNP. In general, these associations are tested using linear models, which assume that the phenotype can be explained by the sum of the effects of independent variables. The SNP is included as a fixed effect, fitting one SNP at a time. The significance of the SNP is tested using a likelihood ratio test that compares the full model with a reduced model without the SNP effect. GWAS have successfully identified thousands of variants associated with multiple phenotypes (Gibson, 2012). It is important to note that complex traits are controlled by a large number of genes, each with a small effect. Therefore, the genomic regions detected by GWAS are only those with the largest effects; the vast majority of genes with small effects fall below the significance threshold and remain undetected.
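The single-marker test described in this section can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the model used in any specific study: it fits only an intercept and one SNP coded 0/1/2 by ordinary least squares, assumes independent Gaussian residuals, and omits the polygenic and other fixed effects that real analyses usually include. The function names and the toy phenotypes and genotypes are invented for the example.

```python
import math

def rss_reduced(y):
    """Residual sum of squares of the reduced model y = mu + e."""
    mu = sum(y) / len(y)
    return sum((yi - mu) ** 2 for yi in y)

def rss_full(y, g):
    """Least-squares fit of the full model y = mu + b*g + e.

    Returns (residual sum of squares, b). The SNP must be polymorphic,
    otherwise the slope is undefined (division by zero).
    """
    n = len(y)
    g_bar, y_bar = sum(g) / n, sum(y) / n
    sxx = sum((gi - g_bar) ** 2 for gi in g)
    sxy = sum((gi - g_bar) * (yi - y_bar) for gi, yi in zip(g, y))
    b = sxy / sxx
    rss = sum((yi - y_bar - b * (gi - g_bar)) ** 2 for gi, yi in zip(g, y))
    return rss, b

def single_marker_lrt(y, g):
    """Likelihood ratio test for one SNP: LRT = n * ln(RSS_reduced / RSS_full).

    Under the null hypothesis the statistic is compared with a chi-square
    distribution with 1 degree of freedom, whose upper tail is erfc(sqrt(x / 2)).
    """
    n = len(y)
    rss0 = rss_reduced(y)
    rss1, b = rss_full(y, g)
    lrt = max(n * math.log(rss0 / rss1), 0.0)
    p_value = math.erfc(math.sqrt(lrt / 2))
    return b, lrt, p_value

# Toy data (invented): 8 animals, 2 SNP coded as allele counts.
phenotypes = [1.2, -0.5, 0.3, 2.1, -1.0, 0.8, 1.5, -0.2]
genotypes = {
    "snp_a": [2, 0, 1, 2, 0, 1, 2, 0],
    "snp_b": [1, 1, 0, 2, 1, 0, 1, 2],
}

# Fit one SNP at a time, as in a classical GWAS scan.
results = {name: single_marker_lrt(phenotypes, g) for name, g in genotypes.items()}
for name, (b, lrt, p) in results.items():
    print(f"{name}: effect = {b:.3f}, LRT = {lrt:.2f}, p = {p:.3g}")
```

In a genome-wide scan the same test is repeated for every marker, so the p-values must be compared against a threshold adjusted for multiple testing.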

Single step GBLUP is an alternative statistical approach to further understand the

genetic structure of complex traits. This method considers all markers simultaneously

and combines phenotypes from genotyped and also ungenotyped animals. Thus,

additional information can be used compared with traditional GWAS, increasing the

sample size, and hence, the power of detection. Several studies have shown that this

alternative GWAS method is a very useful tool (Misztal and Wang, 2014; Fragomeni et

al., 2015).

Table 2-1 summarizes the findings of several GWAS related to bull fertility.

Pathway based analysis

The traditional GWAS strategy is not powerful enough to detect SNP that have

marginally weak but jointly strong effects (Weng et al., 2011). An alternative approach, called pathway-based analysis, focuses on the association of a group of genes rather than exploring only the most significant markers. This strategy can greatly complement

GWAS results (Wang et al., 2007; Weng et al., 2011). Indeed, pathway-based analysis

provides a unique tool to detect pathways and genetic mechanisms that contribute to

complex traits (Peñagaricano et al., 2012c). One common approach of pathway

analysis is the gene set enrichment analysis (GSEA). GSEA is used to determine

whether members of a given gene set (i.e., genes that are involved in the same

biological process or have the same molecular function) are overrepresented with

significant genes (Subramanian et al., 2005). Using this approach, one can focus on

gene sets rather than individual genes, which would be more reproducible and also

interpretable. In addition, the signal to noise ratio can be greatly increased, and hence,

it is possible to detect subtle changes in some mechanisms. In fact, gene set analysis

has been used to explore sire conception rate, identifying several pathways, such as

small GTPase mediated signal transduction, neurogenesis, and calcium ion binding, that may play a role in male fertility in cattle (Peñagaricano et al., 2012c).

Novel Omic Technologies on Bull Fertility

The primary idea of ‘omic’ technology is to evaluate multiple biological molecules

in a specific biological sample and then gain greater understanding of the trait at the

whole system level (Horgan and Kenny, 2011). The ‘omic’ studies often lead to

biomarker discovery as multiple molecules are investigated simultaneously (Joyce and

Palsson, 2006; Horgan and Kenny, 2011). Besides the biomarkers found in the genome

using genomics, which is the most mature among all the ‘omics’ technologies, there are

other biomarkers found by other ‘omic’ studies that can provide further insights into bull

fertility. Along the path from the gene to the endpoint phenotype, DNA is transcribed to

mRNA, which is then translated into a protein; ‘omic’ studies can investigate each step

and give rise to genomics, transcriptomics, proteomics and metabolomics, respectively

(Joyce and Palsson, 2006).

The main goal of a transcriptomic analysis is to measure global gene expression,

evaluating both the presence and the abundance of individual transcripts (Joyce and

Palsson, 2006). Consequently, gene activity in any given moment can be evaluated

(Horgan and Kenny, 2011). Regarding bull fertility, microarray expression technology

has been used to evaluate the sperm transcriptome (Feugang et al., 2010). The

advantage of transcriptome analysis in bull fertility is the ability to access the remnant

mRNA from spermatogenesis, which is partially or totally delivered by the spermatozoa

to the oocyte in addition to the paternal genome (Feugang et al., 2010). Although the

role of this residual RNA is still unclear, this transcriptomic profile may represent a

fingerprint of spermatogenesis quality (Lalancette et al., 2008). Specific transcripts

associated with bull fertility have been identified in several studies. In particular, three transcripts, protamine 1, casein beta 2, and thrombospondin receptor CD36 molecule, were associated with bull fertility (Feugang et al., 2010). Additionally, transcripts

encoding a serine/threonine testis-specific protein kinase (TSSK6) and a

metalloproteinase non coding RNA (ADAM5P) were found to be related to high semen

motility (Bissonnette et al., 2009). One limitation of any transcriptomic analysis is that

gene expression is a measure of an intermediate product rather than an endpoint

product, which may complicate data interpretation.

The term proteomics refers to the large scale study of all expressed proteins in

an organism or system (Joyce and Palsson, 2006). Proteomics holds a special promise

for male fertility since it is the only comprehensive method that allows the study of the

molecular function of spermatozoa, which have no active transcription or translation

(Peddinti et al., 2008; O'Brien et al., 2010). In particular, proteomics is ideal for biomarker

detection because the proteins directly affect the endpoint phenotype (Horgan and

Kenny, 2011). The first bovine spermatozoa proteomic analysis was conducted by

comparing extreme fertility bulls; this study reported a panel of protein biomarkers and

also potential signaling pathways closely related to bull fertility (Peddinti et al., 2008).

Another proteomic study, using 2-DE-based analysis, found that proteins involved in

sperm-egg interaction and cell cycle regulation are related to differences in bull fertility

(Gaviraghi et al., 2010). In addition, a panel of transmembrane proteins was reported

as potential biomarkers (Byrne et al., 2012), and nine proteins were identified as

putative sperm fertility biomarkers affecting the ability of bull sperm to reach and fertilize

the oocyte (D'Amours et al., 2010).

Metabolomics refers to the measurement of the entire set of metabolites in a biological system, which can provide a panel of potential biomarkers and also indicate cell functionality (Horgan and Kenny, 2011). One obvious advantage of metabolomics over the other ‘omic’ technologies is that the metabolite is considered to be very close to the final (endpoint) phenotype (Horgan and Kenny, 2011). Metabolomics studies in human fertility have identified oxidative stress biomarkers closely associated with semen quality (Deepinder et al., 2007). Therefore, this approach is promising for bull fertility biomarker detection.

The biomarkers discovered using ‘omic’ strategies can be very useful to evaluate semen quality and predict bull fertility (Kaya and Memili, 2016). It is worth noting that, in order to maximize the information obtained from these ‘omic’ platforms, an integrated systems biology approach is needed to gain further understanding of complex phenotypes (Kaya and Memili, 2016).

Table 2-1. Summary of the strongest candidate genes found by GWAS

| Chr | Gene | Gene function | Trait | Species | Reference |
|---|---|---|---|---|---|
| 1 | ITGB5 | ITGB5 gene may be involved in sperm-egg interaction. | Noncompensatory fertility in semen | Holstein Bulls | (Feugang et al., 2009a) |
| 5 | LOC784935 | LOC784935 encodes cpb-1 protein that is present in the germ line just prior to overt spermatogenesis and is essential for successful progression through meiosis. | SCR | Holstein Bulls | (Peñagaricano et al., 2012b) |
| 23 | PPP1R11 | PPP1R11 plays important roles in sperm motility and spermatogenesis. | SCR | Holstein Bulls | (Li et al., 2012a) |
| 19 | STAT5A | STAT5A encodes a protein that is a member of the STAT family of transcription factors. | ERCR | Holstein Bulls | (Khatib et al., 2010a) |
| 17 | FGF2 | Plays an important role in the regulation of cell survival, cell division, angiogenesis, cell differentiation and cell migration. Functions as potent mitogen in vitro. Can induce angiogenesis. | | | |
| X | AR | The gene encodes androgen receptor. Androgen-signaling pathway is critical for testicular development, spermatogenesis, and mammalian fertility. | Scrotal circumference at 12 mo | Brahman Bulls | (Fortes et al., 2012a) |
| X | SERPINA7 | The gene SERPINA7 encodes the major thyroid hormone transport protein, TBG, in serum that can influence steroidogenesis and spermatogenesis. | Proportion of normal sperm at 24 mo | | |
| 14 | PLAG1 | PLAG1, which is developmentally regulated, has been shown to be consistently rearranged in pleomorphic adenomas of the salivary glands. | IGF1 | | |
| 7 | PROP1 | PROP1 gene is expressed specifically in the pituitary gland, and it plays an important role in the ontogenesis of pituitary gonadotropes, somatotropes, lactotropes, and caudomedial thyrotropes. | SCR | Holstein Bulls | (Lan et al., 2013a) |
| 7 | CSF1R | The gene is involved in the regulation of self-renewal and extrinsic stimulation of spermatogonia stem cells and is the tissue-specific stem cell of mammalian testes, committed to establishing and maintaining spermatogenesis and maintaining male fertility. | Semen volume per ejaculate | Chinese Holstein Bulls | (Qin et al., 2016) |

Table 2-1. Continued

| Chr | Gene | Gene function | Trait | Species | Reference |
|---|---|---|---|---|---|
| 1 | SOD1 | The gene is important in sperm production and ultimately contributes to the maturation and/or sperm survival. | Sperm concentration per ejaculate | | |
| 1 | LOC785875 | This gene is important in chemo-sensing and in regulation of sperm motility. | Sperm motility | Holstein-Friesian Bulls | (Hering et al., 2014a) |
| 10 | GALC | GALC plays an important function in spermiogenesis and indicates a critical role for lysosomal enzymes in testicular as well as epididymal maturation of the mouse spermatozoa. | Semen Volume | Holstein-Friesian Bulls | (Hering et al., 2014c) |
| 22 | PHF7 | PHF7 is found to be involved in germ-cell development and spermatogenesis, and also plays a role in germline stem cell keeping and gametogenesis in males. | Total Number of Sperm | | |
| 3 | PRMT6 | PRMT5 is associated with germ-cell development in mice, and the expression of PRMT5 is specific for gonads in adult medaka. | Sperm Concentration | Holstein-Friesian Bulls | (Hering et al., 2014b) |
| X | MAGEB10 | The encoded protein is specifically expressed in testis and tumor cells. | Semen Volume & Number of Spermatozoa | Holstein-Friesian Bulls | (Suchocki and Szyda, 2015) |
| X | KLH13 | This gene plays a role in proper chromosome segregation and completion of cytokinesis. | Motility Score | | |
| 6 | SGMS2 | SGMS2 gene plays a role in developing acrosome in spermatids and cell membrane reconstructing. | Sperm Membrane Integrity | Holstein-Friesian Bulls | (Kamiński et al., 2016) |
| 19 | TMEM95 | A nonsense mutation in TMEM95 can cause idiopathic male subfertility. The protein is located at the surface of spermatozoa of fertile animals whereas it is absent in spermatozoa of subfertile animals. | Male Reproductive Ability | Fleckvieh Bulls | (Pausch et al., 2014) |

CHAPTER 3
UNRAVELLING THE GENOMIC ARCHITECTURE OF BULL FERTILITY IN HOLSTEIN CATTLE

Background

Improving reproductive efficiency of dairy cattle has become one of the major challenges of the dairy industry worldwide. The intense selection for production traits in the last decades has led to a decrease in fertility (Royal et al., 2000; Lucy, 2001b).

Fertilization failure and early embryonic loss have been identified as the two main

factors contributing to this decline (Morris and Diskin, 2008; Diskin et al., 2012). For

instance, fertilization rate in high-producing dairy cows is about 75%, and only 65% of

the fertilized eggs are considered viable at 5-6 days post-fertilization (Santos et al.,

2004). It is no surprise that conception rates are only 35-45% (Santos et al., 2004).

Many factors may account for this decline in reproductive performance, including physiological, nutritional, environmental, and genetic factors. In this sense, several studies have recognized that there is substantial genetic variation underlying reproductive success in dairy cattle (Shook, 2006b; Weigel, 2006).

Reproduction is a very complex process that involves numerous consecutive events, including gametogenesis, fertilization, and early embryo development, that should be accomplished in a well-orchestrated manner in order to achieve a successful pregnancy. The relative importance of the parental effects on reproductive success, i.e., maternal versus paternal contribution to the zygote, is still largely unknown (Kropp et al., 2014). Most studies in dairy cattle have focused on female fertility, while male fertility has received much less attention. It is worth noting that the service sire has a direct influence not only on the fertilization process but also on the viability of the preimplantation embryo (Stalhammar et al., 1994; Amann and DeJarnette, 2012). In

fact, previous studies have reported that the service sire represents an important source of variation for conception rate in dairy cattle (DeJarnette et al., 2004b; Jamrozik et al.,

2005; Nagamine and Sasaki, 2008).

Both candidate gene (Khatib et al., 2010b; Li et al., 2012b; Lan et al., 2013b) and

whole-genome scan (Feugang et al., 2009b; Blaschek et al., 2011a; Fortes et al.,

2012b; Peñagaricano et al., 2012a) approaches have attempted to identify genomic

regions and individual genes responsible for the genetic variation in bull fertility. For

instance, two highly conserved spermatogenesis genes, MAP1B and PPP1R11, were

significantly associated with male fertility in Holsteins (Li et al., 2012b). In addition,

genetic markers in BTA2, BTA14, and BTAX were associated with testicular

development, sperm quality, and hormone levels in young Brahman bulls (Fortes et al.,

2012b). It should be noted that these association studies detect in general only the most

significant markers, and hence, the vast majority of the genetic variants contributing to

the trait remain hidden. In this context, gene set or pathway-based analysis offers an

alternative strategy based on evaluating modules of functionally related genes, rather

than focusing only on the most significant markers (Chasman, 2008; Peng et al., 2010).

This approach provides unique opportunities to detect the genetic mechanisms

underlying complex phenotypes. Indeed, using this pathway-based approach, we have

identified some processes, such as small GTPases mediated signal transduction or

calcium ion binding, that may explain part of the differences in sire fertility

(Peñagaricano et al., 2013).

The main objective of this study was to unravel the genomic architecture

underlying sire fertility in dairy cattle. Sire Conception Rate (SCR) was used as a

measure of bull fertility. SCR is a new and more accurate phenotypic evaluation of dairy

sire fertility calculated using field data. Two complementary genome-wide association

approaches plus different gene set analyses were performed in order to identify

genomic regions, individual genes, functional gene terms, and biological pathways

associated with sire fertility. These findings can contribute to a better understanding of the genetics underlying this complex trait and may point out opportunities for improving bull fertility via selective breeding.

Methods

Phenotypic and Genotypic Data

The Animal Improvement Programs Laboratory of the United States Department of Agriculture (AIPL-USDA) implemented in 2008 a national phenotypic evaluation of bull fertility called Sire Conception Rate (SCR). The model that is being used in the U.S.

bull fertility evaluation includes both factors related to the service sire under evaluation

(including age of the bull and AI organization) and also factors (nuisance variables)

associated with the cow that receives the unit of semen (including herd-year-season,

cow age, parity, and milk yield) (Kuhn and Hutchison, 2008b; Kuhn et al., 2008a). The

trait SCR is defined as the expected difference in conception rate of a given bull

compared to the mean of all other evaluated bulls; in other words, a bull with an SCR

value of +5.0% is expected to achieve a conception rate of 37% in a herd that normally

averages 32% and uses average SCR bulls. It is worth noting that the U.S. bull fertility

evaluation, in contrast to evaluations for other traits such as production, is intended as a

phenotypic rather than a genetic evaluation, because the estimates include not only

genetic but also some (permanent) environmental effects.
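As a quick arithmetic check of the SCR definition above, here is a minimal sketch. The function name is hypothetical; it simply adds a bull's SCR value to the conception rate that a herd usually achieves with average-SCR bulls.

```python
def expected_conception_rate(herd_average_pct, scr_pct):
    """Expected conception rate (%) of a bull in a given herd.

    herd_average_pct: conception rate (%) the herd normally achieves
                      when using average-SCR bulls
    scr_pct:          the bull's Sire Conception Rate value (%)
    """
    return herd_average_pct + scr_pct

# Example from the text: SCR of +5.0% in a herd that averages 32%
print(expected_conception_rate(32.0, 5.0))  # 37.0
```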

The entire evaluation of U.S. Holstein bull fertility was used in this study.

Specifically, a total of 44,449 SCR records were available from a total of 10,884

Holstein bulls. These SCR records were obtained from 23 consecutive evaluations

provided to the U.S. dairy industry between August 2008 and April 2016. These 23

different SCR evaluations are available at the Council on Dairy Cattle Breeding (CDCB) website (https://www.cdcb.us/). Figure 3-1 shows (A) the distribution of SCR values per

evaluation and (B) the distribution of the number of SCR records per bull, i.e., total

number of repeated measurements per sire evaluated. The reliabilities of the SCR

records, calculated as a function of the number of breedings, were also available for the

analyses.

Genotype data for 60,671 single nucleotide polymorphism (SNP) markers were

available for 7,447 out of the 10,884 Holstein bulls with SCR evaluation. The SNP data

were kindly provided by the Cooperative Dairy DNA Repository (CDDR). Those SNP

markers that mapped to the sex chromosomes, or were monomorphic, or had minor

allele frequency less than 1% were removed from our dataset. After data editing, a total

of 58,029 SNP markers were retained for subsequent genomic analysis.
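The marker editing rules described above can be sketched as follows. This is an illustration only, not the pipeline actually used: the data layout (a list of dictionaries), the function names, and the SNP names are hypothetical, and genotypes are assumed to be coded as 0, 1 or 2 copies of one allele.

```python
def minor_allele_frequency(genotypes):
    """Minor allele frequency of one SNP (genotypes coded 0/1/2, None = missing)."""
    observed = [g for g in genotypes if g is not None]
    if not observed:
        return 0.0
    freq = sum(observed) / (2 * len(observed))
    return min(freq, 1.0 - freq)

def filter_snps(snps, maf_threshold=0.01, sex_chromosomes=("X", "Y")):
    """Return the names of autosomal, polymorphic SNPs with MAF >= threshold."""
    kept = []
    for snp in snps:
        if snp["chrom"] in sex_chromosomes:
            continue  # mapped to a sex chromosome
        maf = minor_allele_frequency(snp["genotypes"])
        if maf == 0.0 or maf < maf_threshold:
            continue  # monomorphic, or minor allele frequency below 1%
        kept.append(snp["name"])
    return kept

# Toy data (illustrative only)
snps = [
    {"name": "snp_a", "chrom": "1", "genotypes": [0, 1, 2, 1, 0, 1]},
    {"name": "snp_b", "chrom": "X", "genotypes": [0, 1, 2, 1, 0, 1]},
    {"name": "snp_c", "chrom": "5", "genotypes": [0, 0, 0, 0, 0, 0]},
    {"name": "snp_d", "chrom": "7", "genotypes": [1] + [0] * 99},
]
kept = filter_snps(snps)
print(kept)  # ['snp_a']
```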

Statistical Methods for Genome-Wide Association Mapping

The association analysis between phenotypes and genotypes using related individuals with repeated measurements can be implemented within the framework of the classical repeatability animal model,

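$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{u} + \mathbf{W}\mathbf{pe} + \mathbf{e} \tag{3-1}$$

where $\mathbf{y}$ is the vector of phenotypic records (SCR values), $\boldsymbol{\beta}$ is the vector of fixed effects included in the model, $\mathbf{u}$ is the vector of random animal effects, $\mathbf{pe}$ is the vector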

of random permanent environmental and non-additive effects, and e is the vector of random residual effects. The matrices X, Z, and W are the incidence matrices relating phenotypic records to fixed, animal, and permanent environmental effects, respectively.

In this context, the random effects are assumed to follow a multivariate normal distribution,

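$$\begin{pmatrix} \mathbf{u} \\ \mathbf{pe} \\ \mathbf{e} \end{pmatrix} \Bigm| \sigma_u^2, \sigma_{pe}^2, \sigma_e^2 \sim N\left( \mathbf{0}, \begin{pmatrix} \mathbf{K}\sigma_u^2 & \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{I}\sigma_{pe}^2 & \mathbf{0} \\ \mathbf{0} & \mathbf{0} & \mathbf{R}\sigma_e^2 \end{pmatrix} \right) \tag{3-2}$$

where $\sigma_u^2$, $\sigma_{pe}^2$, and $\sigma_e^2$ are the animal additive genetic, permanent environmental, and residual variances respectively; $\mathbf{K}$ is a kinship matrix that can be calculated using either pedigree or genotypic information, and $\mathbf{R}$ is typically an identity matrix ($\mathbf{I}$) or a diagonal matrix.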

In this particular study, two alternative genome-wide association mapping

approaches were performed: (1) single-step genomic best linear unbiased prediction

(ssGBLUP) and (2) classical genome-wide association study (cGWAS) using regular

single-marker regression analysis but with correction for population structure. The

ssGBLUP combines all the available phenotypic, pedigree and genotypic information,

and fits all the SNP simultaneously, while cGWAS typically uses only animals that have

both phenotypic and genotypic data, and fits the SNP markers one at a time.

Genome-Wide Association Mapping Using ssGBLUP

The ssGBLUP method is one of a group of statistical methods that were originally

developed for genomic prediction and were later extended to perform gene mapping.

Indeed, the ssGBLUP model is a modification of the classical BLUP model where the

pedigree relationship matrix A is replaced by H which combines pedigree and genotypic

information (Aguilar et al., 2010). The inverse of the combined pedigree-genomic relationship matrix, $\mathbf{H}^{-1}$, is calculated as follows,

$$\mathbf{H}^{-1} = \mathbf{A}^{-1} + \begin{pmatrix} \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{G}_1^{-1} - \mathbf{A}_{22}^{-1} \end{pmatrix} \tag{3-3}$$

where $\mathbf{G}_1^{-1}$ is the inverse of the genomic relationship matrix and $\mathbf{A}_{22}^{-1}$ is the inverse of the pedigree-based relationship matrix for genotyped animals. In this case, $\mathbf{G}_1$ has dimensions 7,993 × 7,993 and it was created using the 7,447 sires with both SCR and SNP data plus 546 genotyped sires with no SCR records. In addition, the $\mathbf{A}$ matrix (25,075 × 25,075) was calculated based on a five generation pedigree downloaded from AIPL-USDA website. The random effects were assumed multivariate normal with $\mathbf{u} \sim N(\mathbf{0}, \mathbf{H}\sigma_u^2)$, $\mathbf{pe} \sim N(\mathbf{0}, \mathbf{I}_n\sigma_{pe}^2)$, and $\mathbf{e} \sim N(\mathbf{0}, \mathbf{Q}_N^{-1}\sigma_e^2)$. Note that in this case the original kinship matrix $\mathbf{K}$ is replaced by $\mathbf{H}$, and the residual matrix $\mathbf{R}$ is the inverse of a diagonal matrix $\mathbf{Q}$ with its elements representing the reliabilities of the SCR values. The subscripts $n$ and $N$ indicate the size of the matrices and represent the number of individuals with SCR records ($n$ = 10,884) and the total number of SCR records ($N$ = 44,449), respectively.

Candidate regions associated with sire fertility were identified based on the

amount of genetic variance explained by 1.5 Mb windows of adjacent SNPs evaluated

across the entire bovine genome. Given the genomic estimated breeding values

(GEBVs), the SNP effects can be estimated as $\hat{\mathbf{s}} = \mathbf{D}\mathbf{Z}'[\mathbf{Z}\mathbf{D}\mathbf{Z}']^{-1}\hat{\mathbf{a}}_g$, where $\hat{\mathbf{s}}$ is the vector of SNP marker effects, $\mathbf{D}$ is a diagonal matrix of weights of SNPs, and $\hat{\mathbf{a}}_g$ is the vector of GEBVs (Wang et al., 2012). The percentage of genetic variance explained by a given

1.5 Mb genomic region was then calculated as,

$$\frac{\mathrm{Var}(u_i)}{\sigma_u^2} \times 100 = \frac{\mathrm{Var}\left(\sum_{j=1}^{B} \mathbf{Z}_j \hat{s}_j\right)}{\sigma_u^2} \times 100 \tag{3-4}$$

where $u_i$ is the genetic value of the $i$th genomic region under consideration, $B$ is the total number of adjacent SNPs within the 1.5 Mb region, and $\hat{s}_j$ is the marker effect of the $j$th SNP within the $i$th region. All the ssGBLUP calculations were performed using the BLUPF90 family of programs from Ignacy Misztal and collaborators, University of Georgia.
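The window-based calculation, i.e., the variance across animals of the summed SNP contributions within a window, expressed as a percentage of the additive genetic variance, can be sketched as follows. This is only an illustration under stated assumptions: the function name and toy values are hypothetical, the population variance is used, and the actual computations in this study were done with the BLUPF90 programs.

```python
from statistics import pvariance

def window_variance_pct(Z, snp_effects, window, sigma2_u):
    """Percentage of genetic variance explained by one SNP window.

    Z:           genotype matrix (animals x SNPs), coded 0/1/2
    snp_effects: estimated SNP effects, one per column of Z
    window:      column indices of the adjacent SNPs in the window
    sigma2_u:    total additive genetic variance
    """
    # Genetic value of the window for each animal: sum over j of Z_j * s_j
    window_values = [sum(row[j] * snp_effects[j] for j in window) for row in Z]
    # Population variance is an assumption of this sketch
    return pvariance(window_values) / sigma2_u * 100.0

# Toy data (illustrative only): 4 animals, 3 SNPs, window = first two SNPs
Z = [[0, 1, 2],
     [1, 1, 0],
     [2, 0, 1],
     [1, 2, 1]]
snp_effects = [0.5, -0.2, 0.1]
pct = window_variance_pct(Z, snp_effects, window=[0, 1], sigma2_u=2.0)
print(f"{pct:.2f}% of the genetic variance")
```

Repeating this calculation for every window of adjacent SNPs along the genome gives the profile that is used to rank candidate regions.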

Genome-Wide Association Mapping Using Single Marker Regression (cGWAS)

For the whole genome single marker regression, we extended the repeatability

model as,

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{X}_{SNP}\beta_{SNP} + \mathbf{Z}\mathbf{u} + \mathbf{W}\mathbf{pe} + \mathbf{e} \tag{3-5}$$

where $\mathbf{X}_{SNP}$ is the design matrix for the SNP under study (coded as 0, 1 or 2) and $\beta_{SNP}$ is the regression coefficient or SNP effect (also known as the allele substitution effect). In this particular case, the random effects were assumed multivariate normal with $\mathbf{u} \sim N(\mathbf{0}, \mathbf{G}_2\sigma_u^2)$, $\mathbf{pe} \sim N(\mathbf{0}, \mathbf{I}_m\sigma_{pe}^2)$, and $\mathbf{e} \sim N(\mathbf{0}, \mathbf{I}_M\sigma_e^2)$. Here the original kinship matrix $\mathbf{K}$ is replaced by $\mathbf{G}_2$ that is calculated based on the 7,447 sires that had both SCR records and genotypic data. The subscripts $m$ and $M$ indicate the size of the identity matrices and represent the number of individuals with SCR records ($m$ = 7,447) and the total number of SCR records ($M$ = 32,590) used in this particular analysis.

Note that the extended repeatability model can be written as $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{X}_{SNP}\beta_{SNP} + \boldsymbol{\epsilon}$, where $\boldsymbol{\epsilon} \sim N(\mathbf{0}, \mathbf{V})$ with $\mathbf{V} = \mathbf{Z}\mathbf{G}_2\mathbf{Z}'\sigma_u^2 + \mathbf{W}\mathbf{W}'\sigma_{pe}^2 + \mathbf{I}_M\sigma_e^2$. In this scenario, the significance of the SNP marker effect can be tested using a standard Wald statistic computed from the ratio of the estimate of $\beta_{SNP}$ and its standard error. However, the application of this

test across the whole genome is computationally prohibitive. Alternatively, the association of a given SNP with SCR can be evaluated in a more computationally efficient way using the following test statistic,

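$$z = \frac{\mathbf{X}_{SNP}'\mathbf{V}_o^{-1}(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})}{\sqrt{\mathbf{X}_{SNP}'\mathbf{V}_o^{-1}\mathbf{X}_{SNP}}} \tag{3-6}$$

which approximates the Wald test, and hence, is asymptotically standard normal. Here, $\mathbf{V}_o$ is computed as $\mathbf{V}$ but from a model where the term $\mathbf{X}_{SNP}\beta_{SNP}$ is removed, and $\hat{\boldsymbol{\beta}}$ is obtained from the model $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{X}_{SNP}\beta_{SNP} + \mathbf{e}$, assuming $\mathbf{e} \sim N(\mathbf{0}, \mathbf{V}_o\sigma_e^2)$. These analyses were performed using the R package RepeatABEL (Rönnegård et al., 2016).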

Gene Set Analysis

The gene set analysis consists basically of three different steps (Gambra et al.,

2013; Peñagaricano et al., 2013): (i) the assignment of SNPs to genes, (ii) the

assignment of genes to functional categories, and (iii) the association analysis between

each functional category and the phenotype of interest.

1. The SNPs were assigned to bovine genes based on the UMD3.1 bovine genome sequence assembly (Zimin et al., 2009) using the Bioconductor R package biomaRt (Durinck et al., 2005; Durinck et al., 2009). A given SNP was assigned to a particular gene if it was located within the gene or within 15kb either upstream or downstream of the gene. An arbitrary threshold of P-value ≤ 0.01 was used to define significant SNPs (based on the results of the cGWAS); in this context, significant genes were defined as those genes that contained at least one significant SNP.

2. The databases Gene Ontology (GO) (Ashburner et al., 2000), and Medical Subject Headings (MeSH) (Nelson et al., 2004; Cole et al., 2011) were used to define functional categories of genes. The idea is that genes assigned to the same functional category can be considered as members of a group of genes that share some particular properties, typically their involvement in the same biological or molecular process.

3. The significant association of a given term with SCR was analyzed using Fisher’s exact test. The P-value of observing $g$ significant genes in the term was calculated by

$$P\text{-value} = 1 - \sum_{i=0}^{g-1} \frac{\binom{S}{i}\binom{N-S}{k-i}}{\binom{N}{k}} \tag{3-7}$$

where $S$ is the total number of significant genes associated with SCR, $N$ is the total number of genes that were analyzed, and $k$ is the total number of genes in the term considered (Peñagaricano et al., 2013; Abdalla et al., 2016). The GO gene set enrichment analysis was performed using the R package goseq (using method hypergeometric) (Young et al., 2010) while the MeSH enrichment analysis was carried out using the R package meshr (Morota et al., 2015; Tsuyuzaki et al., 2015). Additionally, the semantic similarities among GO functional terms were calculated based on the GO hierarchy using the R package GOSemSim (Yu et al., 2010).
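The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in this study (which relied on the R packages biomaRt, goseq and meshr): the function names, the tuple layouts, and the toy gene and SNP records are hypothetical, and only the 15 kb flank, the P-value ≤ 0.01 threshold, and the hypergeometric tail come from the text.

```python
from math import comb

FLANK_BP = 15_000  # SNPs within 15 kb of a gene are assigned to it

def assign_snps_to_genes(snps, genes, flank=FLANK_BP):
    """Step (i): map each gene to the SNPs inside it or within `flank` bp.

    snps:  list of (name, chrom, position, p_value)
    genes: list of (symbol, chrom, start, end)
    """
    gene_snps = {}
    for symbol, g_chrom, start, end in genes:
        hits = [s for s in snps
                if s[1] == g_chrom and start - flank <= s[2] <= end + flank]
        if hits:
            gene_snps[symbol] = hits
    return gene_snps

def significant_genes(gene_snps, p_threshold=0.01):
    """Genes that contain at least one SNP with P-value <= threshold."""
    return {g for g, hits in gene_snps.items()
            if any(s[3] <= p_threshold for s in hits)}

def enrichment_p_value(g, S, N, k):
    """Step (iii): probability of observing at least g significant genes
    in a term of k genes, given S significant genes among N analyzed."""
    lower_tail = sum(comb(S, i) * comb(N - S, k - i) for i in range(g))
    return 1.0 - lower_tail / comb(N, k)

# Toy data (illustrative only)
demo_genes = [("GENE_A", "21", 100_000, 200_000)]
demo_snps = [("snp1", "21", 90_000, 0.005),   # 10 kb upstream: assigned
             ("snp2", "21", 230_000, 0.5),    # 30 kb downstream: not assigned
             ("snp3", "5", 150_000, 0.001)]   # other chromosome: not assigned
demo_map = assign_snps_to_genes(demo_snps, demo_genes)
print(significant_genes(demo_map))

# N and S as reported in the Results; the term size k and count g are hypothetical
print(enrichment_p_value(g=4, S=349, N=17259, k=40))
```

Step (ii), the assignment of genes to functional categories, depends on the annotation database and is therefore left out of the sketch.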

Results

Whole Genome Association Analysis

Two complementary genome-wide association approaches, ssGBLUP and

cGWAS, were performed in order to identify genomic regions and candidate genes

associated with bull fertility. These two alternative methods differ slightly in how they

identify significant regions or genes associated with the phenotype of interest. On the

one hand, ssGBLUP allows identifying genomic regions that explain a given amount of

genetic variance. On the other hand, using cGWAS, it is possible to formally evaluate

the significance of the association (using a statistical test) between each genetic marker

and the phenotype of interest. In our study, these two methods yielded very similar

results; in fact, the Spearman's rank correlation coefficient between the SNP effects

calculated with ssGBLUP and cGWAS was equal to 0.943. In addition, the corresponding

Manhattan plots showed similar profiles with common significant regions in BTA21 and

also BTA25 (Figure 3-2). Note that, as expected, ssGBLUP yields less noisy results with

well-defined peaks across the entire genome.
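For reference, Spearman's rank correlation between two vectors of SNP effects is the Pearson correlation of their ranks. The following is a minimal sketch with hypothetical function names and toy effect values (not the estimates from this study).

```python
def average_ranks(values):
    """Ranks starting at 1; tied values receive their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next value is tied with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2.0 + 1.0
        for idx in order[pos:end + 1]:
            ranks[idx] = mean_rank
        pos = end + 1
    return ranks

def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Toy SNP effects from two methods (illustrative only)
effects_method_1 = [0.10, -0.30, 0.05, 0.40, -0.20]
effects_method_2 = [0.12, -0.25, 0.20, 0.35, -0.28]
rho = spearman(effects_method_1, effects_method_2)
print(f"Spearman rho = {rho:.3f}")
```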

Figure 3-2A displays the results obtained with ssGBLUP method in terms of the proportion of genetic variance explained by 1.5 Mb SNP windows across the entire bovine genome. A total of six different genomic regions, distributed on chromosomes

BTA5, BTA13, BTA21 and BTA25, explained more than 0.50% of the genetic variance for sire conception rate. Figure 3-3 shows the genomic location, the percentage of genetic variance explained, and the list of genes located in each of these SNP windows. The region that explained the highest percentage of genetic variance (1.06%) was located on chromosome 21 (21:8031396-9528223). Interestingly, this region harbors IGF1R, an

insulin-like growth factor receptor that plays critical roles in different reproductive

events, including testis development and spermatogenesis. Another SNP-window on

BTA21 (21:68846429-70294301) also explained a substantial amount of genetic

variance (0.82%); this region harbors two genes, TDRD9 and CKB, which are

implicated in sperm development and sperm quality, respectively. Moreover, two

different regions on BTA25 (25:3148958-4647188, and 25:26736589-28233820) explained together almost 1.50% of the genetic variance. Notably, these regions harbor

several putative candidate genes for bull fertility, including MGRN1 and SEPT12, which

are directly involved in spermatogenesis, and CCT6A that is implicated in the

fertilization process. Finally, two genomic regions on BTA5 and BTA13 were also

identified; each of these windows explains roughly 0.60% of the genetic variance. The

region located on BTA5 (5:105357507-106813133) harbors two genes, PARP11 and

AKAP3, that are involved in sperm maturation and motility. In addition, at least two

putative genes related to male infertility, CTCFL and SPO11, are located in the middle

of the region detected on BTA13 (13:58456868-59951247).

Figure 3-2B displays the results obtained with cGWAS in terms of $-\log_{10}(P\text{-value})$ for each of the SNP markers evaluated across the genome. In addition, Table 3-1 describes in detail the six most significant SNP markers detected in this

analysis (P-value ≤ 1.5E-05; q-value ≤ 0.15). The most significant SNP (BTB-01438088,

P-value = 5.1E-08) is located in BTA9 in an intron of the gene RIMS1. This gene

regulates synaptic vesicle exocytosis and is also involved in the regulation of

voltage-gated calcium channels. Unsurprisingly, the RIMS1 allele negatively associated with

conception rate is present at very low frequency in the bull population (fB = 0.038). Two SNP

markers located in chromosome 25, BTA-59768-no-rs and ARS-BFGL-NGS-112660, showed remarkable associations with sire conception rate (P-value = 2.8E-07). Note that this genomic region (BTA25 26-28 Mb) was also detected using the ssGBLUP method.

The two significant markers were highly correlated (high linkage disequilibrium), and therefore, it is very likely that they represent the same genetic signal. The marker BTA-59768-no-rs is located in an intron of the gene KAT8. This gene encodes a histone

acetylase implicated in chromatin modification and gene expression regulation. Finally,

like ssGBLUP, the single marker regression also detected the region in BTA21 at 68-71

Mb as significantly associated with sire fertility (P-value = 1.4E-05). The significant SNP

marker ARS-BFGL-NGS-106232 is located within the gene BRF1, which encodes one

of the subunits of the RNA polymerase III transcription factor complex, and hence, it is

directly involved in transcription initiation.

Gene Set Analysis

The whole-genome association analysis was complemented with a gene set enrichment analysis in order to detect potential functional categories and molecular mechanisms associated with sire fertility. Of the 58,029 SNP markers evaluated in the

analysis, 27,066 were located within or surrounding annotated genes; this set of SNPs pointed to a total of 17,259 annotated genes. A subset of 349 of these 17,259 genes had at least one SNP with P-value ≤ 0.01, and hence, were defined as significantly associated with bull fertility.

Figure 3-4 displays a set of GO Biological Process terms that were significantly enriched with genes associated with SCR. Notably, some of these terms are closely associated with male fertility, such as reproductive process (GO:0022414) and fertilization (GO:0009566). These two categories, highly related in the GO hierarchy, had four significant genes in common, namely BSP3, BSP5, SLC22A16, and ZP2, all of them directly involved in the process of spermatogenesis and subsequent ovum fecundation. Furthermore, many significant GO terms were associated with ion transport and homeostasis, including cation transport (GO:0006812), zinc II ion transport

(GO:0006829), regulation of sodium ion transport (GO:0002028), zinc ion homeostasis

(GO:0055069), and cellular metal ion homeostasis (GO:0006875). Moreover, terms related to developmental biology (e.g. GO:0048588), small GTPase mediated signal transduction (e.g. GO:0032482), and mRNA processing (e.g. GO:0050685) were also enriched with significant genes.

Several GO terms classified into the Molecular Function domain showed an overrepresentation of genes associated with sire fertility (Table 3-2). In particular,

functional terms related to channel regulation [e.g., calcium channel regulator activity

(GO:0005246, P-value = 0.020) and sodium channel regulator activity (GO:0017080,

P-value = 0.010)], and transmembrane transporter activity [e.g., inorganic cation

transmembrane transporter activity (GO:0022890, P-value = 0.009) and ion

transmembrane transporter activity (GO:0015075, P-value = 0.015)] showed an overrepresentation of significant genes. Of particular interest, two closely related terms,

SNARE binding (GO:0000149, P-value = 0.007) and SNAP receptor activity

(GO:0005484, P-value = 0.003), which involve a group of membrane-associated

proteins that participate in different reproductive events including spermatogenesis and

acrosome reaction, were significantly enriched with at least three genes, STX1A,

STX1B and STX8, associated with sire conception rate.
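The overrepresentation of significant genes in a functional term can be illustrated with an upper-tail hypergeometric probability, which is equivalent to a one-sided Fisher's exact test. The sketch below is a hedged illustration only: it assumes that the 17,259 annotated genes and the 349 significant genes reported above form the background, whereas the actual analysis may have restricted the background to genes carrying an annotation in the relevant database. The function name `enrichment_pvalue` is hypothetical, and the result is not expected to reproduce the P-values in Table 3-2 exactly.

```python
from math import comb


def enrichment_pvalue(n_genes, n_significant, n_term, n_term_significant):
    """Upper-tail hypergeometric probability P(X >= n_term_significant).

    n_genes            : genes in the background set
    n_significant      : significant genes in the background set
    n_term             : genes annotated to the functional term
    n_term_significant : significant genes annotated to the term
    """
    total = comb(n_genes, n_term)  # all ways to draw the term's genes
    upper = min(n_term, n_significant)
    tail = sum(
        comb(n_significant, k) * comb(n_genes - n_significant, n_term - k)
        for k in range(n_term_significant, upper + 1)
    )
    return tail / total


# SNARE binding (GO:0000149): 20 genes in the term, 3 of them significant.
# The background sizes (17,259 and 349) are an assumption, as noted above.
p_snare = enrichment_pvalue(17259, 349, 20, 3)
```

Exact integer arithmetic via `math.comb` avoids the rounding problems that arise when factorials of this size are evaluated in floating point.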

Table 3-3 shows a panel of MeSH terms that were enriched with genes

associated with SCR. Many of these terms are closely related to male fertility, such as

spermatozoa (D013094), sperm capacitation (D013075), and sperm motility (D013081).

Five genes associated with SCR, namely AKAP3, BSP3, BSP5, NTRK2 and ZP2, were

part of these terms. Additionally, two other terms related to fertility, follicle stimulating

hormone (D005640) and pregnancy rate (D018873), were also enriched with significant

genes, including AKT1, CTTNBP2NL, FSHR and IGF1R. Finally, functional categories

involving protein kinases (D017868) and GTPases (D020691) were also detected as

significant in the MeSH-informed enrichment analysis.

Discussion

There is growing evidence that bull fertility is influenced by genetic factors. The

present study was specifically performed to unravel the genomic architecture underlying

sire conception rate, an accurate phenotypic measure of dairy sire fertility. Although

previous studies have attempted to identify potential genes and pathways related to

SCR (Peñagaricano et al., 2012a; Peñagaricano et al., 2013), this study has some

unique features, including the analysis of a large dataset of almost 11k bulls with

about 45k fertility records, the use of alternative methods for gene mapping, and the application of novel gene set tools, such as MeSH enrichment analysis.

Many methods have been proposed to detect and localize genes underlying complex traits. Given that no method is clearly superior to the others, it is recommended to combine multiple approaches in order to obtain more reliable findings

(Legarra et al., 2015). As such, two alternative whole genome scans were implemented in this study, including a regular single marker regression (cGWAS) and a single-step genomic prediction method (ssGBLUP). It is worth noting that these two methods yielded very similar results. In particular, both approaches identified candidate genomic regions in BTA21 and BTA25 that may be underlying the genetic variation in dairy sire fertility.

The significant region in BTA21 located at 68-71 Mb (see Figure 3-2 and Figure

3-3) harbors at least two candidate genes, namely CKB and TDRD9 that might be directly involved in sire fertility. Gene CKB encodes the enzyme creatine kinase, and previous studies have reported that elevated levels of creatine kinase in the sperm are associated with severe oligospermia and male infertility (Gergely et al., 1999). In fact, some researchers have proposed that creatine kinase should be used as an indicator of sperm quality and maturity in humans (Hallak et al., 2001). Similarly, gene TDRD9

encodes a helicase that plays an important role during spermatogenesis by silencing

potential transposable elements, and hence, protecting the integrity of the male

germline (Shoji et al.). Hence, our findings provide a foundation for future studies that

seek to decipher the specific roles of CKB and TDRD9 in bull fertility. No less important,

the results of ssGBLUP in BTA21 at 8-9 Mb strongly suggest IGF1R as a candidate

gene for sire conception rate. This gene belongs to a family of insulin-like growth factors

that has important roles in sex determination, testis development, spermatogenesis and

steroidogenesis (Griffeth et al., 2014). Interestingly, IGF1R has been implicated in

regulating Sertoli cell proliferation and maturation, testis size, and sperm capacitation

(Pitetti et al., 2013; Wang et al., 2015). Therefore, our findings provide more evidence of

the association between IGF1R and male fertility.

Both ssGBLUP and cGWAS identified the region in BTA25 at 26-28 Mb as

significantly associated with SCR. This region harbors at least two genes, namely KAT8

and CCT6A, with potential roles in dairy sire fertility. The gene KAT8, a member of the

MYST histone acetyltransferase family, is highly expressed during sperm development

(Thomas et al., 2007), and it plays essential roles during early embryonic development

(Thomas et al., 2008). In addition, the gene CCT6A encodes a molecular chaperone

that mediates the sperm-oocyte interaction during fertilization (Dun et al., 2011).

Moreover, the significant region detected in BTA25 but at 3-4 Mb also contains

candidate genes for bull fertility, such as SEPT12 and MGRN1. Indeed, SEPT12 is

expressed specifically in the testis and encodes a GTP-binding protein that has been

implicated in sperm morphogenesis, sperm motility and male infertility (Lin et al., 2009;

Kuo et al., 2015). Likewise, the gene MGRN1 is widely expressed in the male

reproductive system, and recent studies have shown that MGRN1 knockout in mice

results in male infertility, with disrupted hormone secretion and impaired sperm

motility (Cheng et al., 2014). It should be noted that this specific region in BTA25 had

already been associated with sire fertility (Peñagaricano et al., 2012a). Overall, our

findings provide further evidence for the presence of one or more genes that affect bull

fertility in these regions of BTA25. Additional functional studies, including resequencing and fine mapping, are needed to decipher the roles that these genomic regions have in male fertility.

Given that whole-genome scans only detect the most significant regions, and these regions explain only a small fraction of the genetic variance, additional approaches are needed in order to dissect the complex genetic architecture of a quantitative trait. In the present study, different pathway-based approaches, using GO and MeSH databases, were used in order to obtain additional insights regarding the genetic determinants and biological mechanisms underlying sire fertility. Interestingly, some biological processes directly related to male fertility, such as fertilization and sperm motility, were among the most significant functional categories. Further analyses revealed that at least six genes associated with SCR, including AKAP3, BSP3, BSP5,

NTRK2, SLC22A16, and ZP2, were part of these functional categories. Interestingly, the gene AKAP3 is expressed in the spermatozoa and is involved in sperm motility, sperm capacitation, and the acrosome reaction (Ficarro et al., 2003). In addition, the genes

BSP3 and BSP5 encode two binder of sperm proteins implicated in sperm capacitation and fertilization (Hung and Suarez, 2012). The gene ZP2 encodes a sperm receptor that mediates gamete recognition during fertilization (Avella et al., 2014). These findings clearly demonstrate that gene set tools can greatly complement genome-wide association studies in order to understand the genetic basis of complex traits.

Of special interest, GO molecular function terms related to SNARE proteins showed an overrepresentation of significant genes. SNARE proteins are implicated in membrane fusion events, including several events that occur during spermatogenesis

and also the acrosome reaction (Gamboa and Ramalho-Santos, 2005). In fact, it was proposed that SNARE proteins are key players involved in controlling the acrosome reaction during fertilization (Ramalho-Santos et al., 2002). Therefore, our findings provide further evidence regarding the active role of SNARE proteins in male fertility. On the other hand, several GO terms associated with ion transport and channel regulation also showed a significant enrichment of genes associated with SCR. It is well-documented that ion channels regulate several sperm physiological responses, including maturation, motility, and chemotaxis (Lishko et al., 2012). Interestingly, most of the significant terms were related to calcium transport and regulation, and several studies have reported that calcium is indeed implicated in the regulation of sperm motility, and it is an essential second messenger for the acrosome reaction (Darszon et al., 2011). Therefore, our findings provide further evidence of the important association between calcium and sperm physiology. More generally, the genetic markers located in genes initially detected in our GO or MeSH-informed enrichment analysis may facilitate the incorporation and implementation of genomic selection in commercial breeding schemes.

Conclusion

In this study, a comprehensive genomic analysis was performed with the purpose of unravelling the genetic architecture underlying sire conception rate in Holstein dairy cattle. Genomic regions in BTA5, BTA9, BTA13, BTA15, BTA21 and BTA25 were associated with sire fertility. Most of these regions harbor genes with known roles in sperm biology, including sperm maturation, motility and fertilization. Moreover, gene set analysis revealed that many of the significant terms, such as reproductive process, calcium ion channels, and SNARE proteins, are implicated in biological processes

related to male fertility. Overall, this integrative study sheds light on the genetic variants and mechanisms underlying this complex phenotype in cattle. In addition, these findings can provide opportunities for improving bull fertility via marker-assisted selection.

Figure 3-1. Descriptive statistics for Sire Conception Rate (SCR): (A) Distribution of SCR values per evaluation, and (B) Distribution of the number of SCR records per bull.

Figure 3-2. Manhattan plots showing the results of the genome-wide association mapping for Sire Conception Rate: (A) Percentage of genetic variance explained by 1.5 Mb SNP windows across the genome (ssGBLUP method), and (B) −log10(P-value) for each of the genetic markers evaluated across the genome (cGWAS method).

Figure 3-3. Genomic regions (1.5 Mb) that explain more than 0.50% of the genetic variance for Sire Conception Rate: genomic location, percentage of variance explained, and list of genes. Adapted from www.ensembl.org using bovine assembly UMD 3.1.

Figure 3-4. Gene Ontology Biological Process terms significantly enriched with genes associated with Sire Conception Rate: (A) Name, total number of genes, P-value, and total number of significant genes per functional term, and (B) Semantic similarity among functional terms.

Table 3-1. Most significant genetic markers associated with Sire Conception Rate.

Marker                Chr  Position  Frequency  β ± se        P-value  q-value  Nearest Gene
BTB-01438088          9    11867269  0.038      -0.65 ± 0.12  5.1E-08  0.001    RIMS1 (within)
BTB-01138539          15   26472899  0.815      0.26 ± 0.06   7.0E-06  0.102    CADM1 (22 kb)
ARS-BFGL-NGS-106232   21   71210609  0.670      0.20 ± 0.05   1.4E-05  0.136    BRF1 (within)
BTA-59768-no-rs       25   27477941  0.266      -0.29 ± 0.06  2.7E-07  0.005    KAT8 (within)
ARS-BFGL-NGS-112660   25   27672891  0.266      -0.29 ± 0.06  2.8E-07  0.005    ITGAM (34 kb)
Hapmap8541-BTA-59825  25   28711626  0.150      -0.30 ± 0.07  1.4E-05  0.136    TYW1 (within)

Table 3-2. Gene Ontology (GO) Molecular Function terms significantly enriched with genes associated with Sire Conception Rate.

GO ID       GO Term Name                                         No. Genes  No. Significant Genes  P-value
GO:0000149  SNARE binding                                        20         3                      0.007
GO:0005484  SNAP receptor activity                               15         3                      0.003
GO:0005246  calcium channel regulator activity                   11         2                      0.020
GO:0016247  channel regulator activity                           36         5                      0.001
GO:0017080  sodium channel regulator activity                    8          2                      0.010
GO:0005385  zinc ion transmembrane transporter activity          7          2                      0.008
GO:0015075  ion transmembrane transporter activity               188        9                      0.015
GO:0022857  transmembrane transporter activity                   215        9                      0.031
GO:0022890  inorganic cation transmembrane transporter activity  116        7                      0.009
GO:0019901  protein kinase binding                               77         5                      0.020
GO:0019899  enzyme binding                                       283        13                     0.005
GO:0016772  transferase activity                                 197        10                     0.007
GO:0016779  nucleotidyltransferase activity                      46         5                      0.002

Table 3-3. MeSH terms significantly enriched with genes associated with Sire Conception Rate (SCR).

MeSH Term ID  MeSH Term Name                        No. Genes  No. Significant Genes  P-value
D005640       Follicle Stimulating Hormone          34         4                      6.4E-03
D013075       Sperm Capacitation                    9          2                      1.6E-02
D013081       Sperm Motility                        13         4                      1.4E-04
D013094       Spermatozoa                           71         5                      2.0E-02
D017868       Cyclic AMP-Dependent Protein Kinases  75         5                      2.5E-02
D018698       Glutamic Acid                         35         4                      7.1E-03
D018873       Pregnancy Rate                        4          2                      2.8E-03
D020691       rab GTP-Binding Proteins              12         3                      2.0E-03

LIST OF REFERENCES

Abdalla, E.A., Peñagaricano, F., Byrem, T.M., Weigel, K.A., Rosa, G.J.M., 2016. Genome-wide association mapping and pathway analysis of leukosis incidence in a US Holstein cattle population. Animal Genetics.

Aguilar, I., Misztal, I., Johnson, D.L., Legarra, A., Tsuruta, S., Lawlor, T.J., 2010. Hot topic: A unified approach to utilize phenotypic, full pedigree, and genomic information for genetic evaluation of Holstein final score. Journal of Dairy Science 93, 743-752.

Amann, R.P., DeJarnette, J.M., 2012. Impact of genomic selection of AI dairy sires on their likely utilization and methods to estimate fertility: A paradigm shift. Theriogenology 77, 795-817.

Andersson, L., Georges, M., 2004. Domestic-animal genomics: deciphering the genetics of complex traits. Nature Reviews Genetics 5, 202-212.

Ashburner, M., Ball, C.A., Blake, J.A., Botstein, D., Butler, H., Cherry, J.M., Davis, A.P., Dolinski, K., Dwight, S.S., Eppig, J.T., Harris, M.A., Hill, D.P., Issel-Tarver, L., Kasarskis, A., Lewis, S., Matese, J.C., Richardson, J.E., Ringwald, M., Rubin, G.M., Sherlock, G., Gene Ontology, C., 2000. Gene Ontology: tool for the unification of biology. Nature Genetics 25, 25-29.

Avella, M.A., Baibakov, B., Dean, J., 2014. A single domain of the ZP2 zona pellucida protein mediates gamete recognition in mice and humans. Journal of Cell Biology 205, 801-809.

Berry, D.P., Friggens, N.C., Lucy, M., Roche, J.R., 2016. Milk Production and Fertility in Cattle. Annu Rev Anim Biosci 4, 269-290.

Bissonnette, N., Lévesque-Sergerie, J.-P., Thibault, C., Boissonneault, G., 2009. Spermatozoal transcriptome profiling for bull sperm motility: a potential tool to evaluate semen quality. Reproduction 138, 65-80.

Blaschek, M., Kaya, A., Zwald, N., Memili, E., Kirkpatrick, B.W., 2011a. A whole-genome association analysis of noncompensatory fertility in Holstein bulls. Journal of Dairy Science 94, 4695-4699.

Bonnet-Garnier, A., Lacaze, S., Beckers, J.-F., Berland, H., Pinton, A., Yerle, M., Ducos, A., 2008. Meiotic segregation analysis in cows carrying the t (1; 29) Robertsonian translocation. Cytogenetic and genome research 120, 91-96.

Braundmeier, A.G., Miller, D.J., 2001. The search is on: finding accurate molecular markers of male fertility. J Dairy Sci 84, 1915-1925.

Byrne, K., Leahy, T., McCulloch, R., Colgrave, M.L., Holland, M.K., 2012. Comprehensive mapping of the bull sperm surface proteome. Proteomics 12, 3559-3579.

Chasman, D.I., 2008. On the utility of gene set methods in genomewide association studies of quantitative traits. Genetic Epidemiology 32, 658-668.

Cheng, D., Xiong, C., Li, J., Sui, C., Wang, S., Li, H., Jiang, X., 2014. The effect of mahogunin gene mutant on reproduction in male mice: a new sight for infertility? Andrologia 46, 98-105.

Citek, J., Rubes, J., Hajkova, J., 2009. Short communication: Robertsonian translocations, chimerism, and aneuploidy in cattle. Journal of Dairy Science 92, 3481-3483.

Clay, J., McDaniel, B., 2001. Computing mating bull fertility from DHI nonreturn data. Journal of dairy science 84, 1238-1245.

Cole, J.B., Wiggans, G.R., Ma, L., Sonstegard, T.S., Lawlor, T.J., Jr., Crooker, B.A., Van Tassell, C.P., Yang, J., Wang, S., Matukumalli, L.K., Da, Y., 2011. Genome-wide association analysis of thirty one production, health, reproduction and body conformation traits in contemporary US Holstein cows. BMC Genomics 12.

D'Amours, O., Frenette, G., Fortier, M., Leclerc, P., Sullivan, R., 2010. Proteomic comparison of detergent-extracted sperm proteins from bulls with different fertility indexes. Reproduction 139, 545-556.

Darszon, A., Nishigaki, T., Beltran, C., Trevino, C.L., 2011. Calcium channels in the development, maturation, and function of spermatozoa. Physiological Reviews 91, 1305-1355.

Deepinder, F., Chowdary, H.T., Agarwal, A., 2007. Role of metabolomic analysis of biomarkers in the management of male infertility. Expert review of molecular diagnostics 7, 351-358.

DeJarnette, J., Amann, R., 2010. Understanding estimates of AI sire fertility: From A to Z. Proceedings of the 23rd Tech Conference Artific Insem Reprod Natl Assoc Anim Breeders, pp. 13-27.

DeJarnette, J., Marshall, C., Lenz, R., Monke, D., Ayars, W., Sattler, C., 2004. Sustaining the fertility of artificially inseminated dairy cattle: the role of the artificial insemination industry. Journal of dairy Science 87, E93-E104.

Di Croce, F.A., 2010. Development of Genetic and Genomic Predictors of Fertility in Argentinean Holstein Cattle.

Dijkhuizen, A., Stelwagen, J., Renkema, J., 1985. Economic aspects of reproductive failure in dairy cattle. I. Financial loss at farm level. Preventive Veterinary Medicine 3, 251-263.

Diskin, M.G., Parr, M.H., Morris, D.G., 2012. Embryo death in cattle: an update. Reproduction Fertility and Development 24, 244-251.

Dun, M.D., Smith, N.D., Baker, M.A., Lin, M., Aitken, R.J., Nixon, B., 2011. The chaperonin containing TCP1 complex (CCT/TRiC) is involved in mediating sperm-oocyte interaction. Journal of Biological Chemistry 286, 36875-36887.

Durinck, S., Moreau, Y., Kasprzyk, A., Davis, S., De Moor, B., Brazma, A., Huber, W., 2005. BioMart and Bioconductor: a powerful link between biological databases and microarray data analysis. Bioinformatics 21, 3439-3440.

Durinck, S., Spellman, P.T., Birney, E., Huber, W., 2009. Mapping identifiers for the integration of genomic datasets with the R/Bioconductor package biomaRt. Nature Protocols 4, 1184-1191.

Eid, L.N., Lorton, S.P., Parrish, J.J., 1994. Paternal influence on S-phase in the first cell cycle of the bovine embryo. Biol Reprod 51, 1232-1237.

Evenson, D.P., 1999. Loss of livestock breeding efficiency due to uncompensable sperm nuclear defects. Reprod Fertil Dev 11, 1-15.

Ferlin, A., Arredi, B., Foresta, C., 2006. Genetic causes of male infertility. Reprod Toxicol 22, 133-141.

Feugang, J.M., Kaya, A., Page, G.P., Chen, L., Mehta, T., Hirani, K., Nazareth, L., Topper, E., Gibbs, R., Memili, E., 2009. Two-stage genome-wide association study identifies integrin beta 5 as having potential role in bull fertility. BMC Genomics 10, 176.

Feugang, J.M., Rodriguez-Osorio, N., Kaya, A., Wang, H., Page, G., Ostermeier, G.C., Topper, E.K., Memili, E., 2010. Transcriptome analysis of bull spermatozoa: implications for male fertility. Reprod Biomed Online 21, 312-324.

Ficarro, S., Chertihin, O., Westbrook, V.A., White, F., Jayes, F., Kalab, P., Marto, J.A., Shabanowitz, J., Herr, J.C., Hunt, D.F., Visconti, P.E., 2003. Phosphoproteome analysis of capacitated human sperm - Evidence of tyrosine phosphorylation of a kinase-anchoring protein 3 and valosin-containing protein/p97 during capacitation. Journal of Biological Chemistry 278, 11579-11589.

Fortes, M.R., Reverter, A., Hawken, R.J., Bolormaa, S., Lehnert, S.A., 2012. Candidate genes associated with testicular development, sperm quality, and hormone levels of inhibin, luteinizing hormone, and insulin-like growth factor 1 in Brahman bulls. Biology of reproduction 87, 58.

Fouz, R., Gandoy, F., Sanjuán, M.L., Yus, E., Diéguez, F.J., 2011. Factors associated with 56-day non-return rate in dairy cattle. Pesquisa Agropecuária Brasileira 46, 648-654.

Fragomeni, B.O., Lourenco, D.A., Tsuruta, S., Masuda, Y., Aguilar, I., Legarra, A., Lawlor, T.J., Misztal, I., 2015. Hot topic: Use of genomic recursions in single-step genomic best linear unbiased predictor (BLUP) with a large number of genotypes. J Dairy Sci 98, 4090-4094.

Gamboa, S., Ramalho-Santos, J., 2005. SNARE proteins and caveolin-1 in stallion spermatozoa: possible implications for fertility. Theriogenology 64, 275-291.

Gambra, R., Peñagaricano, F., Kropp, J., Khateeb, K., Weigel, K.A., Lucey, J., Khatib, H., 2013. Genomic architecture of bovine k-casein and beta-lactoglobulin. Journal of Dairy Science 96, 5333-5343.

Gaviraghi, A., Deriu, F., Soggiu, A., Galli, A., Bonacina, C., Bonizzi, L., Roncada, P., 2010. Proteomics to investigate fertility in bulls. Veterinary research communications 34, 33-36.

Gergely, A., Szollosi, J., Falkai, G., Resch, B., Kovacs, L., Huszar, G., 1999. Sperm creatine kinase activity in normospermic and oligozospermic Hungarian men. Journal of Assisted Reproduction and Genetics 16, 35-40.

Gibson, G., 2012. Rare and common variants: twenty arguments. Nature Reviews Genetics 13, 135-145.

Gledhill, B.L., 1970. Enigma of spermatozoal deoxyribonucleic acid and male infertility: a review. Am J Vet Res 31, 539-545.

Goddard, M.E., Hayes, B.J., 2009. Mapping genes for complex traits in domestic animals and their use in breeding programmes. Nature Reviews Genetics 10, 381-391.

Griffeth, R.J., Bianda, V., Nef, S., 2014. The emerging role of insulin-like growth factors in testis development and function. Basic and clinical andrology 24, 12-12.

Hallak, J., Sharma, R.K., Pasqualotto, F.F., Ranganathan, P., Thomas, A.J., Agarwal, A., 2001. Creatine kinase as an indicator of sperm quality and maturity in men with oligospermia. Urology 58, 446-451.

Hering, D., Olenski, K., Kaminski, S., 2014a. Genome-wide association study for poor sperm motility in Holstein-Friesian bulls. Animal reproduction science 146, 89-97.

Hering, D., Olenski, K., Kaminski, S., 2014b. Genome‐Wide Association Study for Sperm Concentration in Holstein‐Friesian Bulls. Reproduction in Domestic Animals 49, 1008-1014.

Hering, D., Oleński, K., Ruść, A., Kaminski, S., 2014c. Genome-wide association study for semen volume and total number of sperm in Holstein-Friesian bulls. Animal reproduction science 151, 126-130.

Heringstad, B., Klemetsdal, G., Svendsen, M., Steine, T., 2003. Heifer fertility in Norwegian dairy cattle: Variance components and genetic change. Journal of dairy science 86, 2706-2714.

Horgan, R.P., Kenny, L.C., 2011. ‘Omic’ technologies: genomics, transcriptomics, proteomics and metabolomics. The Obstetrician & Gynaecologist 13, 189-195.

Hung, P.H., Suarez, S.S., 2012. Alterations to the bull sperm surface proteins that bind sperm to oviductal epithelium. Biology of Reproduction 87.

Iannuzzi, L., Molteni, L., Di Meo, G., De Giovanni, A., Perucatti, A., Succi, G., Incarnato, D., Eggen, A., Cribiu, E., 2002. A case of azoospermia in a bull carrying a Y-autosome reciprocal translocation. Cytogenetic and Genome Research 95, 225-227.

Jamrozik, J., Fatehi, J., Kistemaker, G.J., Schaeffer, L.R., 2005. Estimates of genetic parameters for Canadian Holstein female reproduction traits. Journal of Dairy Science 88, 2199-2208.

Joyce, A.R., Palsson, B.Ø., 2006. The model organism as a system: integrating 'omics' data sets. Nature Reviews Molecular Cell Biology 7, 198-210.

Kamiński, S., Hering, D.M., Oleński, K., Lecewicz, M., Kordan, W., 2016. Genome-wide association study for sperm membrane integrity in frozen-thawed semen of Holstein-Friesian bulls. Animal Reproduction Science 170, 135-140.

Kathiravan, P., Kalatharan, J., Karthikeya, G., Rengarajan, K., Kadirvel, G., 2011. Objective Sperm Motion Analysis to Assess Dairy Bull Fertility Using Computer‐Aided System–A Review. Reproduction in Domestic Animals 46, 165-172.

Kaya, A., Memili, E., 2016. Sperm macromolecules associated with bull fertility. Anim Reprod Sci.

Khatib, H., Monson, R., Huang, W., Khatib, R., Schutzkus, V., Khateeb, H., Parrish, J., 2010. Short communication: validation of in vitro fertility genes in a Holstein bull population. Journal of dairy science 93, 2244-2249.

Kropp, J., Penagaricano, F., Salih, S.M., Khatib, H., 2014. Invited review: Genetic contributions underlying the development of preimplantation bovine embryos. Journal of Dairy Science 97, 1187-1201.

Kuhn, M.T., Hutchison, J.L., 2008a. Prediction of dairy bull fertility from field data: use of multiple services and identification and utilization of factors affecting bull fertility. J Dairy Sci 91, 2481-2492.

Kuhn, M.T., Hutchison, J.L., Norman, H.D., 2008b. Modeling nuisance variables for prediction of service sire fertility. Journal of Dairy Science 91, 2823-2835.

Kuo, Y.-C., Shen, Y.-R., Chen, H.-I., Lin, Y.-H., Wang, Y.-Y., Chen, Y.-R., Wang, C.-Y., Kuo, P.-L., 2015. SEPT12 orchestrates the formation of mammalian sperm annulus by organizing core octameric complexes with other SEPT proteins. Journal of Cell Science 128, 923-934.

Lalancette, C., Thibault, C., Bachand, I., Caron, N., Bissonnette, N., 2008. Transcriptome analysis of bull semen with extreme nonreturn rate: use of suppression-subtractive hybridization to identify functional markers for fertility. Biology of Reproduction 78, 618-635.

Lan, X., Peñagaricano, F., DeJung, L., Weigel, K., Khatib, H., 2013a. Short communication: A missense mutation in the PROP1 (prophet of Pit 1) gene affects male fertility and milk production traits in the US Holstein population. Journal of dairy science 96, 1255-1257.

Lan, X.Y., Peñagaricano, F., DeJung, L., Weigel, K.A., Khatib, H., 2013b. A missense mutation in the PROP1 (prophet of Pit 1) gene affects male fertility and milk production traits in the US Holstein population. Journal of Dairy Science 96, 1255-1257.

Larkin, D., Farré, M., Garrick, D., Ruvinsky, A., 2015. Cytogenetics and chromosome maps. The Genetics of Cattle, 103-129.

Legarra, A., Croiseau, P., Sanchez, M.P., Teyssedre, S., Salle, G., Allais, S., Fritz, S., Moreno, C.R., Ricard, A., Elsen, J.M., 2015. A comparison of methods for whole-genome QTL mapping using dense markers in four livestock species. Genetics Selection Evolution 47.

Li, G., Peñagaricano, F., Weigel, K.A., Zhang, Y., Rosa, G., Khatib, H., 2012b. Comparative genomics between fly, mouse, and cattle identifies genes associated with sire conception rate. Journal of Dairy Science 95, 6122-6129.

Lin, Y.-H., Lin, Y.-M., Wang, Y.-Y., Yu, I.S., Lin, Y.-W., Wang, Y.-H., Wu, C.-M., Pan, H.-A., Chao, S.-C., Yen, P.H., Lin, S.-W., Kuo, P.-L., 2009. The expression level of septin12 is critical for spermiogenesis. American Journal of Pathology 174, 1857-1868.

Lishko, P.V., Kirichok, Y., Ren, D.J., Navarro, B., Chung, J.J., Clapham, D.E., 2012. The control of male fertility by spermatozoan ion channels. In: Julius, D., Clapham, D.E. (Eds.), Annual Review of Physiology, Vol 74, pp. 453-475.

Lucy, M.C., 2001. Reproductive loss in high-producing dairy cattle: where will it end? Journal of Dairy Science 84, 1277-1293.

Miglior, F., Muir, B.L., Van Doormaal, B.J., 2005. Selection indices in Holstein cattle of various countries. J Dairy Sci 88, 1255-1263.

Miglior, F., Pizzi, F., Guaita, N., 1998. Effect of environmental factors on non return rate in Italian Holstein-Friesians. Interbull Bulletin, 106.

Misztal, I., Wang, H., 2014. GWAS using ssGBLUP. 10th World Congress on Genetics Applied to Livestock Production. Asas.

Mocé, E., Graham, J., 2008. In vitro evaluation of sperm quality. Animal reproduction science 105, 104-118.

Molteni, L., Meggiolaro, D., Macchi, A.D.G., De Lorenzi, L., Crepaldi, P., Stacchezzini, S., Cremonesi, F., Ferrara, F., 2005. Fertility of cryopreserved sperm in three bulls with different Robertsonian translocations. Animal reproduction science 86, 27-36.

Morota, G., Peñagaricano, F., Petersen, J.L., Ciobanu, D.C., Tsuyuzaki, K., Nikaido, I., 2015. An application of MeSH enrichment analysis in livestock. Animal Genetics 46, 381-387.

Morris, D., Diskin, M., 2008. Effect of progesterone on embryo survival. Animal 2, 1112-1119.

Nagamine, Y., Sasaki, O., 2008. Effect of environmental factors on fertility of Holstein-Friesian cattle in Japan. Livestock Science 115, 89-93.

Nelson, S.J., Schopen, M., Savage, A.G., Schulman, J.L., Arluk, N., 2004. The MeSH translation maintenance system: structure, interface design, and implementation. Stud Health Technol Inform 107, 67-69.

Norman, H.D., Hutchison, J.L., Wright, J.R., Hubbard, S.M., 2008. A national sire fertility index. Proc. Dairy Cattle Reproductive Council Conf., Omaha, NE. Dairy Cattle Reproductive Council, Hartland, WI, pp. 45-52.

Norman, H.D., Wright, J.R., Dürr, J., 2015. Re-examination of service-sire conception rates in the United States. Interbull Bulletin.

O'Brien, K.L.F., Varghese, A.C., Agarwal, A., 2010. The genetic causes of male factor infertility: a review. Fertility and sterility 93, 1-12.

Overstreet, J.W., Cooper, G.W., 1978. Sperm transport in the reproductive tract of the female rabbit: II. The sustained phase of transport. Biol Reprod 19, 115-132.

Pausch, H., Kölle, S., Wurmser, C., Schwarzenbacher, H., Emmerling, R., Jansen, S., Trottmann, M., Fuerst, C., Götz, K.-U., Fries, R., 2014. A nonsense mutation in TMEM95 encoding a nondescript transmembrane protein causes idiopathic male subfertility in cattle. PLoS Genet 10, e1004044.

Peddinti, D., Nanduri, B., Kaya, A., Feugang, J.M., Burgess, S.C., Memili, E., 2008. Comprehensive proteomic analysis of bovine spermatozoa of varying fertility rates and identification of biomarkers associated with fertility. BMC Syst Biol 2, 19.

Peng, G., Luo, L., Siu, H.C., Zhu, Y., Hu, P.F., Hong, S.J., Zhao, J.Y., Zhou, X.D., Reveille, J.D., Jin, L., Amos, C.I., Xiong, M.M., 2010. Gene and pathway-based second-wave analysis of genome-wide association studies. European Journal of Human Genetics 18, 111-117.

Peñagaricano, F., Weigel, K.A., Khatib, H., 2012a. Genome-wide association study identifies candidate markers for bull fertility in Holstein dairy cattle. Animal Genetics 43, 65-71.

Peñagaricano, F., Weigel, K.A., Rosa, G.J., Khatib, H., 2012b. Inferring quantitative trait pathways associated with bull fertility from a genome-wide association study. Front Genet 3, 307.

Peñagaricano, F., Weigel, K.A., Rosa, G.J.M., Khatib, H., 2013. Inferring quantitative trait pathways associated with bull fertility from a genome-wide association study. Frontiers in Genetics 3, 307.

Pitetti, J.-L., Calvel, P., Zimmermann, C., Conne, B., Papaioannou, M.D., Aubry, F., Cederroth, C.R., Urner, F., Fumel, B., Crausaz, M., Docquier, M., Herrera, P.L., Pralong, F., Germond, M., Guillou, F., Jegou, B., Nef, S., 2013. An essential role for insulin and IGF1 receptors in regulating sertoli cell proliferation, testis size, and FSH action in mice. Molecular Endocrinology 27, 814-827.

Potgieter, J.P., 2012. Estimation of genetic parameters for fertility traits and the effect of milk production on reproduction performance in South African Holstein cows. Stellenbosch: Stellenbosch University.

Pryce, J., Veerkamp, R., 2001. The incorporation of fertility indices in genetic improvement programmes. BSAS occasional publication, 237-250.

Qin, C., Yin, H., Zhang, X., Sun, D., Zhang, Q., Liu, J., Ding, X., Zhang, Y., Zhang, S., 2016. Genome‐wide association study for semen traits of the bulls in Chinese Holstein. Animal Genetics.

Ramalho-Santos, J., Schatten, G., Moreno, R.D., 2002. Control of membrane fusion during spermiogenesis and the acrosome reaction. Biology of Reproduction 67, 1043-1051.

Raudsepp, T., Chowdhary, B.P., 2016. Chromosome Aberrations and Fertility Disorders in Domestic Animals. Annual review of animal biosciences 4, 15-43.

Ron, M., Weller, J., 2007. From QTL to QTN identification in livestock–winning by points rather than knock‐out: a review. Animal genetics 38, 429-439.

Royal, M.D., Darwash, A.O., Flint, A.P.E., Webb, R., Woolliams, J., Lamming, G.E., 2000. Declining fertility in dairy cattle: changes in traditional and endocrine parameters of fertility. Animal science 70, 487-501.

Rönnegård, L., McFarlane, S.E., Husby, A., Kawakami, T., Ellegren, H., Qvarnström, A., 2016. Increasing the power of genome wide association studies in natural populations using repeated measures – evaluation and implementation. Methods in Ecology and Evolution.

Saacke, R., Nadir, S., Dalton, J., Bame, J., DeJarnette, J., Degelos, S., Nebel, R., 1994. Accessory sperm evaluation and bull fertility: an update. Proc. 15th Tech. Conf. Artif. Insem. and Reprod., Natl. Assoc. Animal Breeders, pp. 57-67.

Saacke, R.G., Dalton, J.C., Nadir, S., Nebel, R.L., Bame, J.H., 2000. Relationship of seminal traits and insemination time to fertilization rate and embryo quality. Anim Reprod Sci 60-61, 663-677.

Santos, J.E.P., Thatcher, W.W., Chebel, R.C., Cerri, R.L.A., Galvao, K.N., 2004. The effect of embryonic death rates in cattle on the efficacy of estrus synchronization programs. Animal Reproduction Science 82-83, 513-535.

Sellem, E., Broekhuijse, M.L., Chevrier, L., Camugli, S., Schmitt, E., Schibler, L., Koenen, E.P., 2015. Use of combinations of in vitro quality assessments to predict fertility of bovine semen. Theriogenology 84, 1447-1454.e1445.

Shoji, M., Tanaka, T., Hosokawa, M., Reuter, M., Stark, A., Kato, Y., Kondoh, G., Okawa, K., Chujo, T., Suzuki, T., Hata, K., Martin, S.L., Noce, T., Kuramochi-Miyagawa, S., Nakano, T., Sasaki, H., Pillai, R.S., Nakatsuji, N., Chuma, S., 2009. The TDRD9-MIWI2 complex is essential for piRNA-mediated retrotransposon silencing in the mouse male germline. Developmental Cell 17, 775-787.

Shook, G.E., 2006. Major advances in determining appropriate selection goals. Journal of Dairy Science 89, 1349-1361.

Stalhammar, E.M., Janson, L., Philipsson, J., 1994. Genetic studies on fertility in AI bulls. II. Environmental and genetic effects on non-return rates of young bulls. Animal Reproduction Science 34, 193-207.

Subramanian, A., Tamayo, P., Mootha, V.K., Mukherjee, S., Ebert, B.L., Gillette, M.A., Paulovich, A., Pomeroy, S.L., Golub, T.R., Lander, E.S., Mesirov, J.P., 2005. Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Sciences 102, 15545-15550.

Suchocki, T., Szyda, J., 2015. Genome-wide association study for semen production traits in Holstein-Friesian bulls. Journal of dairy science 98, 5774-5780.

Thomas, C., Garner, D., DeJarnette, J., Marshall, C., 1998. Effect of cryopreservation on bovine sperm organelle function and viability as determined by flow cytometry. Biology of Reproduction 58, 786-793.

Thomas, T., Dixon, M.P., Kueh, A.J., Voss, A.K., 2008. Mof (MYST1 or KAT8) is essential for progression of embryonic development past the blastocyst stage and required for normal chromatin architecture. Molecular and Cellular Biology 28, 5093-5105.

Thomas, T., Loveland, K.L., Voss, A.K., 2007. The genes coding for the MYST family histone acetyltransferases, Tip60 and Mof, are expressed at high levels during sperm development. Gene Expression Patterns 7, 657-665.

Tsuyuzaki, K., Morota, G., Ishii, M., Nakazato, T., Miyazaki, S., Nikaido, I., 2015. MeSH ORA framework: R/Bioconductor packages to support MeSH over-representation analysis. BMC Bioinformatics 16, 1-17.

Utt, M.D., 2016. Prediction of bull fertility. Animal reproduction science 169, 37-44.

Villagómez, D., Parma, P., Radi, O., Di Meo, G., Pinton, A., Iannuzzi, L., King, W., 2009. Classical and molecular cytogenetics of disorders of sex development in domestic animals. Cytogenetic and genome research 126, 110-131.

Vishwanath, R., 2003. Artificial insemination: the state of the art. Theriogenology 59, 571-584.

Wang, H., Misztal, I., Aguilar, I., Legarra, A., Muir, W.M., 2012. Genome-wide association mapping including phenotypes from relatives without genotypes. Genetics Research 94, 73-83.

Wang, J., Qi, L., Huang, S., Zhou, T., Guo, Y., Wang, G., Guo, X., Zhou, Z., Sha, J., 2015. Quantitative phosphoproteomics analysis reveals a key role of insulin growth factor 1 receptor (IGF1R) tyrosine kinase in human sperm capacitation. Molecular & Cellular Proteomics 14, 1104-1112.

Wang, K., Li, M., Bucan, M., 2007. Pathway-based approaches for analysis of genomewide association studies. Am J Hum Genet 81, 1278-1283.

Weigel, K.A., 2006. Prospects for improving reproductive performance through genetic selection. Animal Reproduction Science 96, 323-330.

Weng, L., Macciardi, F., Subramanian, A., Guffanti, G., Potkin, S.G., Yu, Z., Xie, X., 2011. SNP-based pathway enrichment analysis for genome-wide association studies. BMC Bioinformatics 12, 99.

Young, M., Wakefield, M., Smyth, G., Oshlack, A., 2010. Gene ontology analysis for RNA-seq: accounting for selection bias. Genome Biology 11, R14.

Yu, G., Li, F., Qin, Y., Bo, X., Wu, Y., Wang, S., 2010. GOSemSim: an R package for measuring semantic similarity among GO terms and gene products. Bioinformatics 26, 976-978.

Zimin, A.V., Delcher, A.L., Florea, L., Kelley, D.R., Schatz, M.C., Puiu, D., Hanrahan, F., Pertea, G., Van Tassell, C.P., Sonstegard, T.S., Marcais, G., Roberts, M., Subramanian, P., Yorke, J.A., Salzberg, S.L., 2009. A whole-genome assembly of the domestic cow, Bos taurus. Genome Biology 10.

BIOGRAPHICAL SKETCH

Yi Han grew up in Harbin, a city in northern China, where he attended school through high school. He majored in animal sciences and received his bachelor’s degree from China Agricultural University in the spring of 2014. In the fall of 2014, he entered the University of Florida to pursue a Master of Science degree. In the fall of 2015, he joined Dr. Peñagaricano’s lab, focusing on quantitative genetics. Upon graduation, he will pursue his interest in biostatistics and work toward a doctoral degree.
