
Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology 632

______

Genome Mapping in the Horse

BY

GABRIELLA LINDGREN

ACTA UNIVERSITATIS UPSALIENSIS
UPPSALA 2001

Dissertation for the Degree of Doctor of Philosophy in Evolutionary Genetics presented at Uppsala University in 2001

ABSTRACT

Lindgren, G. 2001. Genome Mapping in the Horse. Acta Universitatis Upsaliensis. Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology 632. 33 pp. Uppsala. ISBN 91-554-5036-9

Our ability to map and sequence whole genomes is one of the most important developments in biological science. It will provide us with an unprecedented insight into the genetic background of phenotypic traits, such as disease resistance, reproduction and growth, and also makes it feasible to study the processes of genome evolution. The main focus of this thesis has been to develop a linkage map of the horse (Equus caballus) genome. A secondary aim was to expand the number of physically mapped genes in the horse, allowing comparative analyses with data from the human map. Finally, attempts were made to identify single nucleotide polymorphisms (SNPs) on the horse Y chromosome.

The development of a genome map relies on the information generated by both linkage and cytogenetic studies. To integrate genetic and physical assignments in the very early phase of equine genome map construction, 19 polymorphic microsatellite markers were isolated from lambda phage clones which, in parallel, were physically assigned to chromosomes by fluorescent in situ hybridization (FISH). The microsatellites were simultaneously mapped by linkage analysis in a Swedish reference pedigree. A first primary male autosomal linkage map of the domestic horse was constructed by segregation analysis of 140 genetic markers within eight half-sib families with, in total, 263 offspring. One hundred markers were arranged into 25 linkage groups, 22 of which could be assigned physically to 18 different chromosomes. The total map distance contained within linkage groups was 679 cM. The presented map provides an important framework for future genome mapping in the horse.

Our contribution to the comparative map was the presentation of map data for 12 novel genes using FISH and somatic cell hybrid mapping. All chromosomal assignments except one were in agreement with human-horse Zoo-FISH data. The exception concerned the CLU gene, which was mapped by synteny to ECA2 while human-horse Zoo-FISH data predicted that it would be located on ECA9. The level of SNPs on the horse Y chromosome was also investigated by DNA sequencing and denaturing high performance liquid chromatography (DHPLC) of Y chromosome-specific fragments derived mainly from BAC clone subcloning. The amount of genetic variability was found to be very low, consistent with a low male effective population size.

Key words: Horse, microsatellites, linkage analysis, FISH, Type I markers, BAC library, Y chromosome.

Gabriella Lindgren, Department of , Evolutionary Biology Centre, Uppsala University, Norbyvägen 18D, SE-752 36 Uppsala, Sweden.

© Gabriella Lindgren 2001

ISSN 1104-232X
ISBN 91-554-5036-9
Printed in Sweden by Uppsala University, Tryck & Medier, Uppsala, 2001

Appendix

Papers I-V

The present thesis is based on the following papers, which will be referred to by their Roman numerals:

I. Breen M, Lindgren G, Binns MM, Norman J, Irvin Z, Bell K, Sandberg K, and Ellegren H. (1997) Genetical and physical assignments of equine microsatellites - first integration of anchored markers in horse genome mapping. Mammalian Genome 8: 267-273.

II. Lindgren G, Sandberg K, Persson H, Marklund S, Breen M, Sandgren B, Carlstén J, and Ellegren H. (1998) A primary male autosomal linkage map of the horse genome. Genome Research 8: 951-966.

III. Lindgren G, Breen M, Godard S, Bowling A, Murray J, Scavone M, Skow L, Sandberg K, Guérin G, Binns MM, and Ellegren H. (2001) Mapping of 13 horse genes by fluorescence in-situ hybridization (FISH) and somatic cell hybrid analysis. Chromosome Research 9:53-59.

IV. Lindgren G, Swinburne J, Breen M, Mariat D, Sandberg K, Guérin G, Ellegren H, and Binns MM. (2001) Physical anchorage and orientation of equine linkage groups by FISH mapping BAC clones containing microsatellite markers. Animal Genetics 32: 37-39.

V. Lindgren G, Aldridge V, Einarsson A, Hellborg L, Sandberg K, Binns M, and Ellegren H. (2001) Low genetic variability on the Y chromosome of domestic horses. Manuscript.

Papers I-IV are reproduced with permission from the publishers.

Abbreviations list

bp      base pair
cM      centiMorgan
BAC     bacterial artificial chromosome
BLAST   basic local alignment search tool
CATS    comparative anchor-tagged sequences
cDNA    complementary deoxyribonucleic acid
cR      centiRay
DAPI    4,6-diamidino-2-phenylindole
DIG     digoxigenin
DNA     deoxyribonucleic acid
dNTP    any of dATP, dCTP, dGTP or dTTP; or a mixture of all four
ECA     Equus caballus (horse)
ELA     equine major histocompatibility complex
EST     expressed sequence tag
FACS    fluorescence activated cell sorter
FISH    fluorescence in situ hybridization
kb      kilobase
LINE    long interspersed elements
Lod     logarithm (base 10) of odds
MAS     marker-assisted selection
Mb      megabase
mt      mitochondrial
mya     million years ago
PCR     polymerase chain reaction
RAPD    randomly amplified polymorphic DNA
RFLP    restriction fragment length polymorphisms
RH      radiation hybrid
SINE    short interspersed elements
SCH     somatic cell hybrid
SNP     single nucleotide polymorphism
SSCP    single strand conformational polymorphism
STR     short tandem repeat
STS     sequence-tagged sites
tRNA    transfer RNA
QTL     quantitative trait loci
YAC     yeast artificial chromosome

Table of contents

Introduction
    Background to genome mapping
    The significance of genome mapping in domestic animals
    Methods in genome mapping
        Genetic linkage mapping
        Physical mapping
            In situ hybridisation
            Somatic cell hybrid (SCH) mapping
            Radiation hybrid (RH) mapping
        Comparative mapping
    The equine genome
        The study organism
        Genome structure and size
        The karyotype
    Equine genome mapping
General research aims
Results and discussion
    Integration of genetical and physical assignments of equine microsatellite markers (Paper I)
    A primary linkage map of the horse genome (Paper II)
    FISH and somatic cell hybrid mapping of a set of horse genes (Paper III)
    Physical anchorage and orientation of equine linkage groups (Paper IV)
    Low genetic variability on the Y chromosome of domestic horses (Paper V)
Future prospects
References
Acknowledgements

Introduction

The horse (Equus caballus) was domesticated about 6000 years ago (Clutton-Brock 1999). Since then, horses have played an essential role in the development of many civilizations. The bond between people and horses remains strong: in addition to their importance as companion animals in sport and recreation, contact with horses offers an alternative route to rehabilitation from certain psychological problems, neurological injuries and stroke in humans (Håkansson 2000). Early horse breeders selected individuals with the most attractive phenotypic characters for breeding without knowledge of their genetic status, and much was indeed achieved through this kind of selection. Today, approximately 400 distinctive horse breeds are recognised (Scherf 1995), although in general they are not as phenotypically distinctive as dog breeds. These horse breeds mainly reflect selection on type and utility, but are also based on colour, geographical origin and speed.

A greater knowledge of the horse genome and of the genes controlling desired traits will dramatically increase the efficiency of artificial selection. Identification of quantitative trait loci (QTL) and implementation of marker-assisted selection (MAS) provide new tools for animal breeding and will most likely become common tools for selective breeding in the future. The main driving force behind genome mapping in animals is the wish to understand the genetic background of traits such as disease, reproduction and growth. In addition, a genome map of the horse will contribute to a greater understanding of the evolution of the mammalian genome. This thesis focuses on the development of the horse genome map.

Background to genome mapping

Genetics has become an indispensable component of almost all research in modern biology and medicine. This position has been achieved through the synthesis of classical and molecular approaches. Hereditary phenomena have been utilised by humans since before civilisation, with ancient people improving plant crops and domesticated animals by selecting the most desirable individuals for breeding. However, an understanding of the principles of genetics only began in the 1860s, when the Augustinian monk Gregor Mendel performed a set of experiments that revealed the existence of heritable elements, which were later defined as genes. According to Mendel's theory, characters are determined by discrete units that are inherited intact down through the generations. This model explained many observations that could not be explained by the idea of blending inheritance. The importance of Mendel's ideas and contribution was not recognised until around 1900, after his death.

Modern genetics is based on the concept of the gene, the fundamental unit of heredity. A gene is a functional section of the long DNA molecule that constitutes the fundamental structure of a chromosome. Each cell in an organism typically carries one or two copies of its DNA (polyploid organisms, such as tetraploids, carry more); one complete set is called a genome. Most organisms utilise a common information storage system, DNA, in which information flows from DNA to RNA to protein.

During the first quarter of the 20th century, studies focused on the discovery and visualization of chromosomes in several species. The most crucial breakthrough was the description of the DNA double helix by Watson and Crick in 1953. Thereafter, the genetic code was unlocked with the discovery of how cells read the information contained in genes. Recombinant DNA technology, including cloning and sequencing, was developed in the 1970s, allowing the sequencing of genes and subsequently whole genomes, which creates the ultimate physical genetic map. Recently, the initial sequence of the human genome was reported (Venter et al. 2001; International Human Genome Sequencing Consortium 2001).

Genome mapping utilises several complementary techniques, of which the most important are described below. Briefly, there are two kinds of genome maps: genetic and physical. The genetic map is based on recombination frequencies and involves analysis of the segregation of polymorphic genetic markers in pedigrees. It gives information on which markers belong to the same linkage group, their relative order and the distance between them. The physical map determines on which chromosome or chromosomal region a DNA fragment, e.g. a marker or a gene, resides. Genetic markers used in genome mapping have been classified based on the level of conservation. Type I markers are coding markers conserved across different species and typically have low polymorphism (e.g. CATS and SSCPs). By contrast, Type II markers, exemplified by microsatellites, are highly polymorphic markers, invaluable for constructing genetic linkage maps.

The significance of genome mapping in domestic animals

Why is it of interest to study genomes? DNA has provided the basis for the evolutionary process that created the tremendous number of different life forms that exist today. Studying the genomes of different organisms may help to explain this organismal variety, along with many other questions. Genome mapping can be divided into two main components: construction of the genome map and utilisation of the map. For many years, large-scale genetic linkage mapping was held up by the lack of available markers. The rapid development of genetic markers that came along with the invention of recombinant DNA technology has opened the way for the construction of detailed genetic linkage maps for numerous species. For farm animals, comprehensive genetic linkage maps are available for several of the most important species, such as cattle (Bos taurus) (Kappes 1997), chicken (Gallus gallus) (Groenen 2000), sheep (Ovis aries) (de Gortari 1998) and pig (Sus scrofa) (Mikawa 1999).

Genome maps can be used to identify genes or chromosomal regions underlying various phenotypic traits. Both monogenic traits, controlled by a single gene, and polygenic traits, controlled by an unknown number of genes as well as by environmental factors, can be studied. A trait can be mapped to a region by linkage analysis, in which markers evenly spread over the genome are analysed in a family material in which the trait is segregating. Markers that are inherited together with the trait must be physically close to the gene for that trait. The final goal is to clone the gene and identify the causative mutation. Further, genome maps can be used for comparative genome mapping, in which homologous loci in other species are explored, for evolutionary studies, and also for comparisons of the recombination frequency within and between species.

Methods in genome mapping

Genetic linkage mapping

Genetic linkage represents a deviation from Mendel's law regarding the independent assortment of alleles during the formation of gametes. When two genes are close together on the same chromosome, they do not assort independently at meiosis. Genes that are far apart on the same chromosome do appear to assort independently. The reason for this phenomenon is the occurrence of cross-overs between a pair of homologous chromosomes at meiosis (see Figure 1). The proportion of intrachromosomal recombinant genotypes is a measure of the crossing-over frequency of a particular chromosome pair. The more distant two loci are on a chromosome, the more likely it is that a cross-over will occur between them. The recombination frequency can therefore be used as a measure of the distance between loci. If the recombination frequency is lower than 50%, the loci are said to be genetically linked. The unit of map distance is the centiMorgan (cM), with one cM corresponding to a recombination frequency of 1%.

Figure 1. A molecular event of recombination may be schematically represented by two double-stranded molecules breaking and rejoining. (Panel labels: Homologous chromosomes; Cross-over; Results from recombination.)

Linkage mapping involves following the segregation of alleles in families to establish whether or not the alleles at one locus cosegregate with alleles at other loci. To be able to do this, it is necessary to determine the parental origin of each allele at a locus in the progeny. Both polymorphic genetic markers and pedigrees are required for this. Currently, the most commonly used markers for linkage analysis are microsatellites, which will be described in more detail below. There are two principal designs used to generate a resource pedigree with a high level of heterozygosity: the intercross and the backcross. In both, two divergent breeds are first crossed to obtain a high level of heterozygosity in the F1 generation. In a backcross design the F1 animals are then crossed to one of the parental lines. In an intercross design the F1 animals are crossed to each other to generate F2 individuals.

When constructing a map with many linked loci it is more efficient to perform multi-locus analyses than to do separate pairwise linkage analyses and try to put them together into a map. CRIMAP (Green et al. 1990) and LINKAGE (Lathrop and Lalouel 1988), among others, provide computational tools to perform multi-locus analyses.

Although the idea behind linkage mapping is quite simple, there are a number of pitfalls to be aware of. For example, over longer chromosomal distances it is likely that there will be multiple cross-overs, which can lead to recombinants being scored as non-recombinants. Recombination frequencies are therefore not additive. Another problem is the phenomenon of interference: with positive interference, for example, a cross-over reduces the probability of a second cross-over in its vicinity. Several mapping functions exist to accommodate these problems and convert the frequency of cross-overs into linear map distance. Some mapping functions take interference into account whereas others do not, and a number of them have been derived depending upon the degree of interference assumed (Kosambi 1944). Increasing the number of markers will make the map more accurate with respect to these pitfalls.
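The conversion from recombination frequency to map distance described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used by CRIMAP or LINKAGE: it uses the textbook Haldane function (no interference) and the Kosambi function (positive interference), and a two-point Lod score that assumes fully informative, phase-known meioses. The function names and the example counts are hypothetical.

```python
import math


def haldane_cm(r):
    """Map distance in cM from recombination fraction r, assuming no interference."""
    if not 0 <= r < 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)


def kosambi_cm(r):
    """Map distance in cM from recombination fraction r, allowing for positive interference."""
    if not 0 <= r < 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5)")
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


def lod_score(recombinants, meioses):
    """Two-point Lod score at the observed recombination fraction (capped at 0.5).

    Assumes every meiosis is informative and phase-known (a simplification).
    """
    theta = min(recombinants / meioses, 0.5)
    non_recombinants = meioses - recombinants

    def log10_likelihood(t):
        total = 0.0
        if recombinants:
            total += recombinants * math.log10(t)
        if non_recombinants:
            total += non_recombinants * math.log10(1.0 - t)
        return total

    # Likelihood at the estimate relative to free recombination (theta = 0.5).
    return log10_likelihood(theta) - log10_likelihood(0.5)


# Hypothetical example: 2 recombinants among 20 informative meioses.
r = 2 / 20
print(haldane_cm(r), kosambi_cm(r), lod_score(2, 20))
```

For small recombination fractions both functions give roughly one cM per 1% recombination; they diverge as the fraction approaches 50%, which is where multiple cross-overs and interference matter most.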

The frequency of recombination varies with age and chromosomal position, being particularly high in certain chromosomal regions called "hot spots". Chromosomal regions close to the telomeres tend to recombine more frequently than regions close to the centromeres, although it has been shown that recombination suppression at the centromere is not universal (Lynn 2000). Furthermore, the heterogametic sex generally shows less recombination than the homogametic sex (Dunn and Bennett 1967). Generally, these factors are not corrected for in map construction, although the extent to which this affects the construction of linkage maps is not clear. However, available linkage analysis programs allow separate estimation of distances in the two sexes, which in turn can be compared with a map derived assuming equal recombination (Archibald and Haley 1998).

Physical mapping

In situ hybridisation

In situ hybridisation is important in gene mapping for localising and orienting linkage groups on chromosomes, as well as for assigning syntenic groups (see below) of markers established from somatic cell hybrid mapping to chromosomes. It enables the visualisation of a probe of interest on the chromosome. The probe is usually a particular segment of cloned DNA which is labelled to allow detection. The target is normally a preparation of chromosomal DNA, usually metaphase or prometaphase chromosomes, placed on a microscope slide. Both the probe and the target DNA are denatured and then hybridised to let the probe anneal to the complementary sequence. Unbound probe is then washed off to allow the visualisation of the specific hybridisation signal using appropriate detection methods. The probes used can vary considerably in size, and range from a few bases to several hundred kilobases cloned in e.g. yeast artificial chromosome (YAC) vectors. Non-isotopic probe labelling methods have largely replaced the earlier isotopic procedures (Waterston and Sulston 1995). Fluorescence in situ hybridisation (FISH) is the most common non-isotopic method used. FISH for chromosome assignment and chromosome band localisation is now a routine mapping procedure. The development of fluorescent labelling detection systems enables simultaneous hybridisation of multiple probes, known as multicolour FISH.

Somatic cell hybrid (SCH) mapping

This technique is based on the discovery that interspecific hybrids lose the chromosomes of one species, allowing genetic characters to be assigned to particular chromosomes by analysing a panel of somatic cell hybrid clones (Naylor 1997). It is sometimes referred to as synteny mapping, where synteny means on the same chromosome. When markers are known to be located on the same chromosome, they are referred to as syntenic markers. The SCH technique is a very common way of identifying markers that belong to the same chromosome, i.e. of defining syntenic groups of markers. Basically, two cell lines from different species are fused to produce hybrid cells. By using a selective medium, only the hybrid cells will survive. These hybrid cells lose the majority of the chromosomes of the target species but may keep one or a few of them. The principle of construction of somatic cell hybrids is shown in Figure 2. The optimal case is when the clones have kept only one whole chromosome or chromosome arm (Chowdhary 1998a). Cytogenetic methods are used to characterise the hybrid clones for their foreign chromosomal content. The presence or absence of a particular locus in the cell line is usually analysed by PCR. Neither polymorphic markers nor pedigrees are required for this technique. Somatic cell hybrid mapping provides a valuable tool for the development of comparative gene maps because it allows the mapping of large numbers of coding genes.
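The logic of synteny assignment described above can be illustrated with a short Python sketch: a marker is assigned to the chromosome whose presence or absence across the hybrid panel agrees best with the PCR result for the marker. The panel, the chromosome labels and the marker pattern below are invented for illustration only, and the function names are hypothetical.

```python
def concordance(marker_pattern, chromosome_pattern):
    """Fraction of hybrid clones in which marker and chromosome are both present or both absent."""
    if len(marker_pattern) != len(chromosome_pattern) or not marker_pattern:
        raise ValueError("patterns must be non-empty and of equal length")
    agreeing = sum(m == c for m, c in zip(marker_pattern, chromosome_pattern))
    return agreeing / len(marker_pattern)


def assign_chromosome(marker_pattern, panel):
    """Return (chromosome, concordance) for the best-matching chromosome in the panel."""
    scored = ((chrom, concordance(marker_pattern, pattern)) for chrom, pattern in panel.items())
    return max(scored, key=lambda item: item[1])


# Invented panel of six hybrid clones: 1 = chromosome retained, 0 = lost.
panel = {
    "chrA": [1, 0, 1, 1, 0, 0],
    "chrB": [0, 1, 1, 0, 0, 1],
}
# Invented PCR result for one marker across the same six clones.
marker = [1, 0, 1, 1, 0, 0]
print(assign_chromosome(marker, panel))
```

Two markers that show the same best-matching chromosome would be placed in the same syntenic group; in practice a threshold on concordance and a larger panel would be needed before accepting an assignment.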

Figure 2. Principle of construction of whole-cell somatic hybrids.

Radiation hybrid (RH) mapping

Radiation hybrid mapping combines approaches from both physical and genetic mapping and is based on experiments by Gross and Harris (1975, 1977a, b). The technique was further developed by Cox et al. (1990), who demonstrated that a hamster-human somatic cell hybrid containing only one human chromosome could be used as a donor in hybrid formation, and that the selectable marker could lie in the hamster portion of the irradiated donor genome. Essentially, it is an SCH technique with the difference that, before fusion of the cell lines, the whole or partial genome of the target species is irradiated to cause fragmentation of the chromosomes. Cox et al. (1990) assumed that the further apart any two markers are on a chromosome, the more likely it is that the irradiation will break them apart so that they segregate into different hybrids. When using this technique, a panel of radiation hybrid cell lines is analysed for the presence or absence of markers. The frequency with which the markers co-segregate reflects the distance between them and allows ordering of markers. The distance between markers is expressed in centiRays (cR), and a distance of one cR between two markers corresponds to a 1% frequency of chromosomal breakage between the markers after exposure to a particular X-ray dose (Walter and Goodfellow 1993). It is possible to control the frequency of radiation-induced breaks by altering the radiation dose, where a higher dose will cause more breaks. This means that the resolution of the map can be increased by using a higher radiation dose on the donor DNA, and decreased by using a lower one (Stewart and Cox 1997). The frequency of breakage appears to be linearly related to physical distance and there do not seem to be hot spots for breakage along the chromosome (Cox and Myers 1992). Recently, a high-resolution radiation hybrid map of the human genome draft sequence was constructed by using a panel of 90 whole-genome radiation hybrids (Olivier et al. 2001). RH panels exist for some farm animals such as pig (Yerle et al. 1998) and cattle (Womack et al. 1997).
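As a rough illustration of how co-segregation in a radiation hybrid panel translates into a distance, the following Python sketch applies the commonly cited two-point estimator, in which the number of hybrids retaining only one of two markers is divided by the number expected if a break always separated them. The retention patterns are invented, the function names are hypothetical, and the logarithmic conversion to centiRays is an assumption of the sketch rather than something stated in the text; it reduces to roughly one cR per 1% breakage when breakage is rare.

```python
import math


def breakage_frequency(marker_a, marker_b):
    """Two-point estimate of the breakage frequency between two markers.

    marker_a and marker_b hold 1 (retained) or 0 (absent) for each hybrid line.
    """
    n = len(marker_a)
    if n == 0 or n != len(marker_b):
        raise ValueError("patterns must be non-empty and of equal length")
    discordant = sum(a != b for a, b in zip(marker_a, marker_b))
    retention_a = sum(marker_a) / n
    retention_b = sum(marker_b) / n
    # Expected number of discordant hybrids if the markers were always separated.
    expected = n * (retention_a + retention_b - 2.0 * retention_a * retention_b)
    if expected == 0:
        raise ValueError("retention patterns are uninformative")
    return min(discordant / expected, 1.0)


def centirays(theta):
    """Convert a breakage frequency into a distance in cR (assumed logarithmic mapping)."""
    if theta >= 1.0:
        return math.inf
    return -100.0 * math.log(1.0 - theta)


# Invented retention patterns for two markers across ten hybrid lines.
a = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
b = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]
theta = breakage_frequency(a, b)
print(theta, centirays(theta))
```

Because the breakage frequency depends on the radiation dose, distances obtained this way are only comparable between panels constructed with the same dose.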

Comparative mapping

Comparative maps display the chromosomal location of homologous genes in different species and the genetic segments conserved through evolutionary time. Genome organisation is well conserved in mammals. With well-developed comparative maps, gene mapping data can be transferred from the denser human and mouse gene maps to other species. Comparative chromosome painting, or Zoo-FISH, is a powerful method for detecting evolutionarily conserved chromosomal regions (Chowdhary et al. 1998b). It involves hybridisation of whole chromosomes, or parts of chromosomes, derived from one species to metaphase chromosomes of a second species. However, this technique only gives a gross picture of the conserved segments, because the probes used have to be very large (at least 5 Mb) to allow proper hybridisation. The results therefore need to be fine-tuned with techniques such as linkage or RH mapping.

The equine genome

The study organism

This thesis focuses on mapping the genome of the horse (Figure 3). The modern horse has one of the most complete fossil histories and belongs to the order Perissodactyla (the odd-toed hoofed mammals). Table 1 outlines the classification of the Perissodactyla (Prothero and Schoch 1989). The order contains three extant groups: horses, tapirs and rhinos. Tapirs and rhinos diverged in the Late Eocene, 40 million years ago (mya).

Table 1. Classification of the order Perissodactyla by Prothero and Schoch (1989).
______
Order Perissodactyla
    Suborder Titanotheriomorpha - brontotheres
    Suborder Hippomorpha
        Superfamily Pachynolophoidea
        Superfamily Equoidea
            Family Equidae
            Family Palaeotheriidae
    Suborder Moropomorpha
        Parvorder Ancylopoda - chalicotheres
        Parvorder Ceratomorpha - tapirs and rhinos
______

The family Equidae includes all horse breeds in addition to various species of zebra and ass. This family is known from the early Eocene, 54 mya (Groves and Ryder 2000). The purported ancestor of the Equidae is the species formerly known as Hyracotherium cuniculum, from the genus Cymbalophus (Hooker 1984). A slightly shortened classification of the Equidae is given in Table 2 (Evander 1989). The evolutionary history can be traced through the different genera: Orohippus (Early Eocene, 50-47 mya), Epihippus (Middle and Late Eocene, 47-40 mya), Mesohippus (Late Eocene and Early Oligocene, 40-30 mya), and Miohippus (Latest Eocene and Oligocene, 37-25 mya). Thus, the lineage of the modern horse appears to have branched very little until the latest stages.

Table 2. Classification of the family Equidae, slightly modified from Evander (1989).
______
Family Equidae
    Cymbalophus
    Orohippus
    Epihippus
    Mesohippus
    Miohippus
    Subfamily Anchitheriinae
    Subfamily Equinae
        Tribe Protohippini
        Tribe Hipparionini
        Tribe Equini
            Onohippidion
            Equus
______

The archaeological record suggests that the horse was domesticated about 6000 years ago somewhere in the region of and southern Russia. Recently, an analysis of the mtDNA control region sequences of 191 domestic horses revealed a high sequence diversity (Vilà et al. 2001), suggesting a widespread integration of matrilines and a prolonged period of domestication of wild horses.

Genome structure and size

Mammalian genomes appear highly conserved in size and gene content. The size of the haploid genome is known as the C-value, and this value has been estimated as follows for some mammals: human 3200 Mb, mouse 2700 Mb, pig 2700 Mb and cow 3000 Mb (Miller 1997). The fact that genome size does not correlate well with the complexity of organisms is known as the C-value paradox. However, the paradox might be explained by the discovery that genomes contain a large quantity of repetitive sequences (reviewed in Gregory et al. 1999). The relative proportions of the genome that are occupied by unique DNA and repetitive DNA vary substantially between different organisms (Lewin 1994). In contrast to the similarity in genome size and gene number in mammals, chromosome number varies greatly between mammalian species: from the Indian muntjac (Muntiacus muntjak) with 2n=6/7 to a South American rodent (Tympanoctomys barrerae) with 2n=102 (Qumsiyeh 1994).

Each eukaryotic nucleus encloses the nuclear DNA, which is structurally arranged in a fixed number of chromosomes. Chromosomes exist in a highly extended linear form during most of the cell cycle, condensing into more compact bodies prior to cell division. In this condensed stage the chromosomes may be examined microscopically. The chromosome contains two defined parts known as the centromere and the telomere. While chromosomes generally duplicate in interphase, the centromere always duplicates during the contracted metaphase stage. The centromere is a constriction at a fixed position on each chromosome that controls the movement of the chromosome (the separation of the two sister chromatids) during mitosis. Mammalian centromeres are made up of very large regions of repeated sequences. The position of the centromere on the chromosome is used to classify chromosomes as meta-, submeta-, telo- or acrocentric. The ends of the chromosomes, the telomeres, are composed of small arrays of tandemly repeated DNA.

Figure 3.

The substance of chromosomes is chromatin, which includes not only DNA but also chromosomal proteins and RNA. Chromatin exists in two forms: the condensed heterochromatin and the less condensed euchromatin. Heterochromatin stains densely in microscopic preparations and is believed to be genetically inert. Euchromatin is thought to contain the normally functioning genes. Both centromeres and telomeres consist of heterochromatin.

Briefly, there are two types of repetitive DNA: interspersed and tandemly repeated DNA. One common type of interspersed DNA is classified according to the size of the repeat unit into long interspersed elements (LINEs, typically longer than 1000 bases) and short interspersed elements (SINEs, usually shorter than 500 bases). The majority of LINEs and SINEs have most likely arisen through the action of reverse transcriptase. For SINEs, the DNA is a copy of either 7SL RNA or a certain tRNA molecule. Most mammalian tRNA-derived SINEs are thought to originate from tRNA-Lys, whilst in horses they appear to derive from tRNA-Ser (Sakagami et al. 1994b). Equine SINEs form two families, ERE-1 and ERE-2, which share some sequence similarity but have different subunit structure (Gallagher et al. 1999). As is the case for many other mammals, SINEs and LINEs in the horse are often associated with microsatellites at their 3' end (Breen et al. 1997; Hiromura et al. 1997; Gallagher et al. 1999).

Where the repeat units occur immediately adjacent to one another they are termed tandem repeats. If there are many thousands of copies of the repeat unit at one site, the DNA is called satellite DNA. The size of the repeat unit in satellite DNA is usually less than 500 bp, with the total number of repeats at one site ranging between 1000 and 50000. As the size of the tandem repeat unit at one site decreases, the terms minisatellite (Jeffreys et al. 1985a) and microsatellite (Litt and Luty 1989) are used to describe them. Minisatellites have repeat units of 10-50 bp while microsatellites have repeat units of 1-6 bp. The most common repeat units in microsatellites are mono-, di-, tri- and tetranucleotides. The use of microsatellites, in particular, as highly polymorphic genetic markers has revolutionised the field of genome mapping since the early 1990s.
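The definition of a microsatellite as a tandem array of 1-6 bp units lends itself to a small Python sketch that scans a DNA sequence for such arrays. This is a minimal illustration only: the function name, the minimum of five repeats and the example sequence are assumptions made for the sketch, and real microsatellite screening would also have to handle imperfect repeats.

```python
import re


def find_microsatellites(sequence, min_repeats=5, max_unit=6):
    """Return (start, repeat_unit, repeat_count) for perfect tandem repeats.

    A hit is a unit of 1 to max_unit bases repeated at least min_repeats times.
    The shortest possible unit is reported, so (TG)8 is not reported as (TGTG)4.
    """
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_repeats - 1))
    hits = []
    for match in pattern.finditer(sequence.upper()):
        unit = match.group(1)
        hits.append((match.start(), unit, len(match.group(0)) // len(unit)))
    return hits


# Invented sequence containing a (TG)8 dinucleotide repeat with short flanks.
example = "CCAA" + "TG" * 8 + "ACCA"
print(find_microsatellites(example))
```

The positions reported are zero-based offsets into the sequence; the flanking sequence on either side of a hit is what primer design for a PCR-based marker would draw on.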

The karyotype

The number and morphology of chromosomes in a species is called the karyotype. In 1959 Rothfels et al. showed that the diploid chromosome number of the horse is 2n=64. Among the domestic animals, only the dog, with 2n=78, has a higher diploid number. The horse karyotype consists of 13 meta- and submetacentric (bi-armed) and 18 acrocentric pairs of autosomes, as well as a large submetacentric X and a small acrocentric Y chromosome. The current standard karyotype for the horse was defined by the International System for Cytogenetic Nomenclature of the Domestic Horse (ISCNH 1997) and is shown in Figure 4. The committee provided a detailed description of both G- and R-banded chromosomes along with ideograms showing landmarks and band numbers for each of the banding types. In G-banding, the chromosomes are treated with trypsin followed by Giemsa staining, which defines regions with high A (adenine) and T (thymine) content. Reversed banding patterns are produced by R-banding, in which BrdU (5-bromo-2'-deoxyuridine) incorporation in late interphase is followed by Giemsa staining to highlight the GC-rich chromatin.

Cytogenetic studies in several related equid species have revealed great variation in chromosome number: the diploid number varies from 2n=32 in Hartmann's zebra (Benirschke and Malouf 1967) to 2n=66 in Przewalski's horse (Ryder 1978). Despite the extensive chromosomal differences, the various equine species can produce viable offspring when interbred. In general, however, the hybrid offspring are sterile, except in crosses between E. caballus and E. przewalskii or among the hemiones (Mongolian wild ass, Persian wild ass, Transcaspian wild ass, ), and occasionally in mules and hinnies (Rong et al. 1988).

Equine genome mapping

Several linkages between equine protein polymorphisms and blood groups systems were reported relatively early (e.g. Sandberg 1974; Sandberg and Juneja 1978). Equine linkage mapping continued to deal with protein polymorphic markers during the 1980s (e.g. McGuire and Weitkamp; Guttormsen and Weitkamp et al. 1981; Weitkamp et al. 1982; Andersson et al. 1983a, b; Weitkamp et al. 1983; Andersson and Sandberg 1984; Bowling 1987). However, coding sequences are under selective pressure to conserve the biological function of the proteins and this selection means that most proteins show little variation. Without variation proteins are not very informative in linkage analysis. Minisatellites demonstrate extensive intraspecific polymorphism in terms of varying number of repeats and have been used for DNA fingerprinting of organisms (Jeffreys et al. 1985a, b), e.g. for identification of individuals and for studying genetic relationships. When a cloned probe containing a minisatellite sequence is annealed with DNA blots of restriction endonuclease digested DNA, multiple bands will hybridise. The pattern of bands produced is specific for each individual. Georges et al. (1988) first examined the use of minisatellites in the horse on a family of Belgian half-bred horses using four different minisatellite probes. Hopkins et al. (1991) used human minisatellite probes 33.15 and 33.6 to solve a paternity case in closely related Exmoor . Guerin

Figure 4. The current standard karyotype for the horse as defined by the International System for Cytogenetic Nomenclature of the Domestic Horse (ISCNH 1997).

et al. (1993) also used the 33.6 minisatellite probe in a variety of horse breeds. DNA fingerprinting has also been carried out using a synthetic (TG)n polynucleotide probe to study its application to population comparisons (Ellegren et al. 1992a). Minisatellite sequences have also been characterised in the horse by Flint et al. (1988) and Anglana et al. (1996). DNA fingerprinting requires comparatively large amounts of DNA for Southern blotting and has largely been replaced by PCR-based methods, e.g. using microsatellites.

The discovery of microsatellite markers at the end of the 1980s was a major breakthrough for genome mapping. Microsatellite polymorphism is commonly measured as the length of the repeat sequence amplified by PCR, using primers flanking the repeat region. The sequences immediately flanking the microsatellites, which serve as primer binding sites, are generally not conserved between species. Hence, there is a need to isolate species-specific microsatellites. The first horse microsatellites were isolated by Ellegren et al. (1992b) and Marklund et al. (1994), who demonstrated their suitability for parentage testing and linkage studies due to high levels of polymorphism. To date, approximately 900 microsatellites (of which 634 have been published) have been isolated in the horse, most of which were developed from small insert genomic libraries. However, a proportion of microsatellites have also been isolated from cosmid, phage or BAC libraries (Tozaki et al. 1995; Sakagami et al. 1995; Breen et al. 1997; Godard et al. 1997, 1998; Hirota et al. 1997; Marti et al. 1998).

The radiolabelling approach was used when physical assignment by in situ hybridisation of loci to horse chromosomes began. The first assignments were for the equine major histocompatibility complex (ELA) to ECA20q14-q22 (Ansari et al. 1988; Mäkinen et al. 1989). Human and pig genomic or cDNA probes were used in the beginning when no horse-specific probes were available.
However, specifically for the genome map, a significant effort was made to develop equine genomic libraries and identify genes, microsatellite markers and sequence-tagged sites (STSs) in large insert clone libraries (phage, cosmid and BAC libraries) suitable for in situ hybridisation (Breen et al. 1997; Godard et al. 1997, 1998, 1999, 2000; Hirota et al. 1997; Lear et al. 2000, 2001; Lindgren et al. 2001a, b; Mariat 2001). To date, there is only one reported study using double-colour FISH for ordering of markers in the horse (Raudsepp et al. 1999). There are no reported studies using higher resolution techniques like fibre FISH or interphase FISH in horses.

Three research groups have produced synteny mapping data for the horse using horse-rodent SCH panels (Williams et al. 1993; Bailey et al. 1995; Shiue et al. 1999). In the first one, developed by Williams et al. (1993), three syntenic groups based on eight Type I loci mapped by Southern blotting and isozyme analysis were identified. In 1995, Bailey et al. established six synteny groups by analysing 15 microsatellite markers and two Type I loci. More than 500 markers have now been assigned to 33 syntenic groups using the UC Davis horse x mouse SCH panel (Caetano et al. 1999a, b, c; Hopman et al. 1999; Murphie et al. 1999; Ruth et al. 1999; Shiue et al. 1999, 2000; Tallmadge et al. 1999; Bowling et al. 2000). The markers used include RAPDs, microsatellites and Type I markers. All 33 syntenic groups have been assigned to chromosomes using in situ hybridisation and Zoo-FISH (Raudsepp et al. 1996).

Two radiation hybrid panels have been established and characterised in the horse (B. P. Chowdhary, personal communication; Kiguwa et al. 2000). The Copenhagen-Texas panel was generated by irradiating a horse fibroblast cell line with a total dose of 5000 rad. The cells were then fused with the Chinese hamster TK- fibroblast line A23 (Chowdhary and Raudsepp 2000).
The second panel was prepared by the Department of Genetics, Cambridge University, Cambridge in collaboration with Research Genetics, Cambridge, UK. These resources will be very important for incorporating and ordering Type I markers on the horse gene map, since radiation hybrid mapping does not require the markers to be polymorphic.

Comparative chromosome painting, or Zoo-FISH, is a powerful method for detecting evolutionarily conserved chromosomal regions (Chowdhary et al. 1998b). Raudsepp et al. (1996) presented conserved chromosomal segments in horse and man using Zoo-FISH, and the results are shown in Figure 5. This work determined the gross degree of chromosome conservation between the horse and the human genome. Five genes and two equine microsatellites were FISH mapped in both horse and donkey (Raudsepp et al. 1997, 1999), providing the first comparative map data between the two equid species.

A summary of the information stored in databases of gene mapping projects in the horse can be found at:

ArkDB horse Genome (Roslin Node): http://www.ri.bbsrc.ac.uk/cgi-bin/arkdb/browsers/browser.sh?species=horse
Horsmap (INRA, Jouy-en-Josas, France): http://locus.jouy.inra.fr/cgi-bin/lgbc/mapping/common/intro2.pl?BASE=horse

At present, the number of assigned loci in the horse is at least 673 (at least 150 additional markers have been mapped but are not yet published), of which 523 are microsatellite markers and 150 are designated genes. A total of 147 microsatellites have been mapped by FISH.

Figure 5. Conserved chromosomal segments in horse and man using Zoo-FISH (Raudsepp et al. 1996).

General research aims

• To integrate genetical and physical assignments of equine microsatellite markers

• To develop a primary male autosomal linkage map of the horse genome

• To physically anchor and orientate equine linkage groups

• To contribute with information to the comparative horse map by using fluorescent in situ hybridisation (FISH) and somatic cell hybrid mapping

• To identify single nucleotide polymorphisms (SNPs) on the horse Y chromosome

Results and discussion

Integration of genetical and physical assignments of equine microsatellite markers (Paper I)

Microsatellites or short tandem repeats (STRs) are widely accepted as the markers of choice for the development of genome maps. The main reasons are their high level of polymorphism, their abundance in the genome and the ease with which they can be analysed. They are conventionally isolated from some form of small insert library. The small insert size facilitates sequencing of microsatellite flanking regions, but at the same time precludes physical mapping by in situ hybridisation because the hybridisation signal is too weak. The development of a genome map relies on the information generated by both linkage and cytogenetical studies, i.e. it is important to anchor genetic markers to particular chromosomal regions. One way to integrate genetical and physical assignments is to isolate polymorphic microsatellite markers from some form of large insert genomic library, such as a phage, cosmid or BAC library. This is important in the first phase of developing genome maps, as it gives an overview of how well the karyotype is covered and allows rapid assignment of linkage groups to chromosomes. In later stages of mapping, physical and genetical integration is important, for example, in tracing candidate genes and for studies of the patterns of recombination along the chromosomes. In this study we report the first integration of genetical and physical data for the equine genome map. We screened 10,000 clones from a horse genomic DNA library (constructed in the vector λ-FIX II, with an average insert size of 15 kb) with a poly[dAC:dTG] probe using standard procedures. The detection of approximately 200 strong positive clones indicated that a (CA)n microsatellite occurs on average every 750 kb in the horse genome; obviously, this estimate is greatly affected by what is considered a positive clone.
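The 750 kb figure follows from dividing the total amount of DNA screened by the number of positive clones. A minimal sketch of that arithmetic is given below; the function name is hypothetical, and the calculation assumes that every positive clone carries exactly one repeat and that the clones sample the genome evenly.

```python
def microsatellite_spacing_kb(clones_screened, mean_insert_kb, positive_clones):
    """Average spacing (kb) between repeats: DNA screened / positive clones.

    Assumes one repeat per positive clone and even genome sampling.
    """
    if positive_clones <= 0:
        raise ValueError("at least one positive clone is required")
    screened_kb = clones_screened * mean_insert_kb  # total DNA screened, in kb
    return screened_kb / positive_clones


# Values from the text: 10,000 clones, 15 kb inserts, about 200 positives.
print(microsatellite_spacing_kb(10_000, 15, 200))  # 750.0
```

Halving or doubling the number of clones scored as positive changes the estimate proportionally, which is why the threshold for calling a clone positive matters so much.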
Twenty positive clones (named ASB followed by a number for each clone) were selected at random and subsequently subcloned and sequenced. A (CA)n microsatellite was identified in all clones, with 90% (18) containing 15 or more tandem repeat units, and 40% (8) containing 20 or more. Primers were designed to amplify across the microsatellite in each locus; however, we were unable to produce a clean PCR product for one of them, even with a number of different primers. Each of the remaining 19 microsatellite loci was tested for polymorphism in three horse populations; Australian , Arab and a population of mixed breeds (, Shetland, Finn, New Forest and Welsh Cob). As expected, both the number of alleles and the heterozygosity of each marker were generally greater in the mixed population than in the single breeds. Of the 19 markers studied, at least 15 (79%) showed levels of heterozygosity in excess of 50% in all three populations. The microsatellite flanking sequences were searched against the GenBank, cDNA and NBRF databases with the BLAST program (Altschul et al. 1990). The microsatellite repeat sequences were removed from their flanking sequences to avoid matches with similar repeats in the databases. Four of the 20 microsatellite positive clones could be matched to known genes. The chromosomal location of the phage clones representing all 20 loci was determined by FISH analysis. Seventeen mapped to single sites, one mapped to two clear locations suggesting that the clone was chimeric, one mapped to multiple centromeres suggesting that it contained satellite sequences, and one produced no visible signal despite repeated attempts. These assignments involved eight equine autosomes; ECA1, 2, 4, 6, 9, 10, 15 and 16. The precise localisation of each locus is given in Table 3 (Paper I, Appendix) and representative images are presented in Figure 6.
The microsatellites were simultaneously mapped by linkage analysis in a Swedish reference pedigree comprising eight half-sib families. Segregation of the markers was related to a set of 35 other genetic markers previously typed in the pedigree. Cases of linkage were detected with CRI-MAP (Green et al. 1990), applying a lod score (z) criterion of 3.0 for significant linkage. Thirteen of

Figure 6. Partial metaphases demonstrating the fluorescence in situ localisation of ASB2-9, 11-19 and 22.

the 19 markers showed linkage to at least one other marker (see Table 3 in Paper I, Appendix). Linkages between two ASB markers were always consistent with their being assigned physically to the same chromosome; ASB8 and ASB12 on ECA1 (horse chromosome 1), ASB4 and ASB5 on ECA9, ASB6 and ASB9 on ECA10, and ASB15 and ASB19 on ECA15. Some of the linkages detected in this study relate in various ways to previously made assignments. One example is the gene for glucose phosphate isomerase (GPI), which is located at the terminal part of ECA10p (Harbitz et al. 1990) and has been found to be linked to A1BG with a recombination frequency of 23% (Andersson et al. 1983). We observed significant linkage (z = 5.59) and 14% recombination between A1BG and ASB9. Since ASB9 was in situ mapped to ECA10q15-17, the linkage group GPI-A1BG-ASB9 is likely to cover the whole p arm of ECA10, in that case representing the first chromosomal arm of a horse chromosome being tagged with linked genetic markers. The last compilation of genetically mapped markers in the horse included only approximately 20 loci distributed in five different linkage groups (Sandberg and Andersson 1992). Hence, in this study we made a substantial contribution to the genetic map of the horse, providing linkage data for 21 new markers. Furthermore, the fact that 13 of these markers were assigned to chromosomes by in situ hybridisation, and that six additional but so far unlinked markers were also physically mapped, provided a starting point for future equine genome mapping.
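To make the lod score criterion concrete, the sketch below computes a two-point lod score for the simplest possible case: phase-known meioses in which recombinants can be counted directly. This is an illustrative simplification and not the likelihood calculation performed by CRI-MAP, which also handles phase-unknown sires and partially informative offspring; the function name and the example counts (7 recombinants in 50 meioses) are hypothetical.

```python
import math


def lod_score(recombinants, meioses, theta):
    """Two-point lod score z(theta) for phase-known data.

    z = log10[ theta^k * (1 - theta)^(n - k) / 0.5^n ]
    where k = recombinants and n = informative meioses.
    """
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie between 0 and meioses")
    if not 0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")

    def count_times_log(count, p):
        if count == 0:
            return 0.0            # 0 * log(p) is taken as 0, also for p = 0
        if p == 0:
            return float("-inf")  # observed events that are impossible at this theta
        return count * math.log10(p)

    non_recombinants = meioses - recombinants
    return (count_times_log(recombinants, theta)
            + count_times_log(non_recombinants, 1 - theta)
            + meioses * math.log10(2))


# Hypothetical counts: 7 recombinants among 50 informative meioses.
k, n = 7, 50
theta_hat = k / n                      # maximum likelihood estimate, 0.14
z = lod_score(k, n, theta_hat)
print(round(theta_hat, 2), z >= 3.0)   # linkage is declared when z reaches 3.0
```

A lod score of 3.0 means that the data are 1000 times more likely under linkage at the estimated recombination fraction than under free recombination.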

A primary linkage map of the horse genome (Paper II)

The importance of linkage mapping in domestic animals has been discussed in the Introduction. In an initiative to extend the knowledge of the equine genome, we have established a reference pedigree for genome mapping, and here we present the construction of the first preliminary male autosomal linkage map of the horse genome. The reference pedigree consists of eight half-sib families (four Standardbred trotters and four Icelandic horse stallions) with a total of 263 offspring. The pedigree was genotyped with 140 polymorphic markers, 121 of which were PCR-based microsatellites and about one-third of which had been physically assigned to chromosomes by in situ hybridisation (all presented in Table 1 in Paper II, Appendix). The markers included five new, previously undescribed restriction fragment length polymorphisms (RFLPs), which were detected using human or porcine cDNA probes; FUCA1, GLUT1, LPL, MYL1,3 and TYR. Marker heterozygosity varied extensively between loci, from one out of the eight sires being heterozygous to all being so. The average number of heterozygous sires was 4.3, corresponding to an observed heterozygosity (Ho) of 53%. In general, microsatellites were more variable than other markers (Ho=56% vs. 35%). Two important consequences of only the sires and their offspring being included in the study were that we only measured male recombination fractions and that we could not deduce the transmission of paternal alleles for all offspring. The average number of fully informative offspring per marker was 89, with the range being from 9 to 235. An overview of the linkage data is given in Figure 1 in Paper II (Appendix), where the equine idiogram (ISCNH 1997) is shown together with the established linkage groups. Linkage between markers was analysed using the program CRI-MAP (Green et al. 1990). One hundred of the 140 markers (71%) showed linkage to at least one other marker.
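Why the paternal allele cannot always be deduced when dams are not genotyped can be illustrated with a small sketch. The rule below is a simplified assumption about how an offspring would be scored as fully informative (it is not taken from Paper II), and the function name and allele labels are hypothetical.

```python
def paternal_allele(sire, offspring):
    """Return the allele the sire transmitted, or None if it cannot be deduced.

    sire and offspring are pairs of allele labels; the dam is not genotyped.
    """
    sire_alleles = set(sire)
    if len(sire_alleles) == 1:
        return None                      # homozygous sire: meiosis is uninformative
    shared = sire_alleles & set(offspring)
    if len(shared) == 1:
        return next(iter(shared))        # only one sire allele is present in the offspring
    # Either the offspring has the same heterozygous genotype as the sire
    # (ambiguous without the dam) or it shares no allele (typing or pedigree error).
    return None


sire = ("A", "B")
print(paternal_allele(sire, ("A", "C")))  # A
print(paternal_allele(sire, ("B", "B")))  # B
print(paternal_allele(sire, ("A", "B")))  # None
```

Offspring with the same heterozygous genotype as the sire are the main reason why the number of fully informative offspring per marker is lower than the number of offspring of heterozygous sires.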
Using multipoint analysis we established 25 linkage groups. Twenty-two of these were assigned to 18 different chromosomes, generally because one or more of the markers included in a linkage group were physically mapped by in situ hybridisation (see legend of Figure 1, Paper II, Appendix). Chromosomes covered by linkage groups were ECA1, ECA2, ECA3, ECA4, ECA5, ECA6, ECA7, ECA9, ECA10, ECA11, ECA13, ECA15, ECA16, ECA18, ECA19, ECA21, ECA22, and ECA30. Linkage groups ranged between 0 and 103 cM (multi-point distances in Kosambi cM), the longest residing on ECA4, and the number of markers within groups between two and 10. The sum of the length of all linkage groups was 679 cM, with an average

distance between adjacent markers of 12.6 cM. Clearly, a much greater total genetic length is revealed if one accounts for the flanking distances covered by end markers in linkage groups, and the distances covered by the 40 unlinked markers. Using the mean number of informative meioses per marker (89), we can estimate that, on average, our data set allows linkage between two random markers spaced up to 15 cM apart to be detected with a lod score criterion of 3. Assuming that each of the 2x25=50 end markers covers on average 15 flanking cM, the total map length would be about 1425 cM. Furthermore, with the addition of unlinked markers, it is reasonable to assume that the marker set covers well above 1500 cM. It is hard to deduce what fraction of the genome is thereby covered, principally because we do not know the total recombinational distance (genetic length) of the equine genome. Total sex-average distances for other mammalian species range from about 1600 cM (mouse; Davisson and Roderick 1989) to 3500 cM or even higher (human; Weissenbach et al. 1992). In at least some species the male recombination rate is considerably lower than that in females (Ellegren et al. 1994, Morton 1991), but this might not be a ubiquitous phenomenon among mammals, as suggested by data from cattle and sheep (Barendse et al. 1997, Crawford et al. 1995). The only clue to the genetic length of the equine genome comes from the analysis of meiotic chromosomes (Scott and Long 1980). The number of chiasmata per cell in late diplotene or diakinesis among stallions was 54.4 ±1.8 S.E., which is comparable to that found in sheep (Chapman and Bruere 1977, Long 1978), but higher than that in pig (Burt and Bell 1987), cattle and goat (Logue 1977). The observed number of chiasmata in the horse would suggest a total male genetic distance of 2720 cM (while it would be 2000 - 2500 cM in pig, cattle and goat).
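The two length estimates in this paragraph are simple arithmetic and can be sketched as follows. The function names are hypothetical. The flank-adjusted sum comes to 1429 cM, which corresponds to the "about 1425 cM" quoted in the text, and the chiasma-based estimate uses the standard assumption that one chiasma corresponds to 50 cM of genetic map.

```python
def flank_adjusted_length_cm(linked_cm, n_groups, flank_cm):
    """Map length when each of the two end markers per group covers flank_cm."""
    return linked_cm + 2 * n_groups * flank_cm


def chiasma_map_length_cm(chiasmata_per_cell, cm_per_chiasma=50.0):
    """Genetic length implied by a chiasma count (one chiasma = 50 cM)."""
    return chiasmata_per_cell * cm_per_chiasma


covered = flank_adjusted_length_cm(679, 25, 15)   # 679 + 50 * 15
total = chiasma_map_length_cm(54.4)               # 54.4 * 50
print(covered)                     # 1429
print(round(total))                # 2720
print(round(covered / total, 2))   # rough fraction of the male map covered
```

The last line gives only a rough lower bound on coverage, since it ignores the 40 unlinked markers and treats the chiasma-based length as exact.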
The linkage groups on ECA3, ECA4, and ECA15 contained five in situ mapped markers, the one on ECA2 had four, whereas ECA1, ECA9, ECA10, ECA11, ECA19 and ECA22 had either two or three physically anchored markers. In most of these cases this allowed the orientation of the linkage group along the respective chromosome to be determined. Data from chromosomes with more than one physical tag also allowed a rough analysis of the relationship between genetic and physical distances in the equine genome. Using the approach described in Ellegren et al. (1994), we analysed this relationship by measuring the physical distance between the most distant anchored markers within linkage groups, expressed as the relative proportion of the genome covered by these markers (measured with a ruler on the karyotype, from the midpoints of in situ assignments), and compared this with the recombinational distance between these markers. A minimum length of linkage groups to be considered was set to 20 cM. Six groups fulfilled these criteria and gave a mean of 0.68 cM/Mb ± 0.05 S.E.

Two new gene assignments were made in this study, phosphoglucomutase (PGM) to ECA5 and transferrin (TF) to ECA16. Human PGM maps to HSA1p31. According to Zoo-FISH (Raudsepp et al. 1996), HSA1 corresponds to three different equine chromosomes, ECA2p, ECA5 and ECA30, but it is not known which parts of the human chromosome are homologous to each of the three equine chromosomes. Our mapping of PGM now shows that ECA5 is homologous to at least parts of the p arm of HSA1. Since the linkage group on ECA5 is not orientated, we cannot deduce in what way the HSA1 conservation is arranged along ECA5. The assignment of the TF locus to ECA16 is in agreement with Zoo-FISH data since human TF maps to HSA3q21 and HSA3 corresponds to ECA16 and ECA19.
The HSA3q21 band is relatively close to the centromere and, given the rather distal location of TF on ECA16, it is possible that the entire q arm of HSA3 corresponds to ECA16, but is orientated inversely. If so, the HSA3-ECA19 homology would involve the p arm of HSA3. Of course, internal rearrangements may occur, but Zoo-FISH data suggest that such events have been rare following the split of the human and equine lineages (Raudsepp et al. 1996).

With the linkage map presented here, it will now become feasible to make genome scans for traits of unknown genetic background in the horse. Also, the present map should provide an important framework for future genome mapping in the horse. In addition to the linkage map presented in this study, two other linkage maps have been constructed. In 1995, the First International Equine Gene Mapping Workshop was held with the goal of constructing a low density male linkage map of the horse genome, and the results from this effort are described in Guérin et al. (1999). More recently, a

comprehensive low-density horse linkage map based on two three-generation, full-sibling, cross-bred reference families has been presented by Swinburne et al. (2000). All three maps can be viewed at: Horse http://www.uky.edu/Ag/Horsemap/Maps/index.html

FISH and somatic cell hybrid mapping of a set of horse genes (Paper III)

Microsatellite markers have been extensively used in the construction of genome maps in domestic animals, including the cat (Menotti-Raymond et al. 1999), dog (Werner et al. 1999), sheep (de Gortari et al. 1998), cow (Kappes et al. 1997) and also the horse (Lindgren et al. 1998, Guérin et al. 1999, Shiue et al. 1999, Swinburne et al. 2000). However, although microsatellites are highly suitable as linkage markers, they represent anonymous DNA segments and are therefore of limited use for transferring mapping data from one species to another. For this reason it is advantageous to incorporate Type I coding genes into maps (O'Brien et al. 1993). As coding sequences tend to be well conserved through evolutionary time, gene map information can be transferred between the genomes under investigation, e.g. between human and other mammalian genomes. Indeed, comparative mapping has become a most useful tool for identification of trait genes in domestic animals (e.g. Hasler-Rapacz et al. 1998). Recently, a first comparative gene map of the horse genome was established, mainly based on somatic cell hybrid data (Caetano et al. 1999b). The fact that detailed map information is only available for a limited number of horse genes restricts the possibility of candidate approaches in mapping trait loci through comparative data. To widen the information on the comparative horse map, we present here map data for 12 novel genes using fluorescence in situ hybridisation and somatic cell hybrid mapping, and confirm the synteny assignment of a further gene using FISH.

GenBank was searched for horse-specific coding genomic DNA or mRNA sequences. Primers were designed for amplification of 200-500 bp fragments of either introns, or 5' or 3' untranslated regions, of 13 different genes (ANP, CD2, CLU, CRISP3, CYP17, FGG, IL1RN, IL10, MMP13, PRM1, PTGS2, TNFA and TP53); their full names and primer sequences are listed in Table 1 in Paper III (Appendix).
Two different horse BAC libraries (Godard et al. 1998, L. Skow, unpublished) were screened by PCR for the presence of clones containing these 13 Type I loci. The screening procedure essentially followed that described in Godard et al. (1998). First, the BAC library presented by Godard et al. (1998) was screened by PCR for the 13 horse Type I markers, eight of which (ANP, CD2, CRISP3, FGG, IL1RN, PRM1, PTGS2 and TNFA) were found to be represented by at least one clone. The remaining five genes were subsequently screened for in the BAC library developed at the Texas A & M University (L. Skow, unpublished), but only one, MMP13, could be identified. Thus, no clone was found for CLU, CYP17, IL10 or TP53 in either of the two BAC libraries. FISH signals for individual positive BAC clones were analysed in 10 metaphase chromosome spreads. All nine BAC clones showed clear hybridisation sites on both homologues of the corresponding chromosome; the resulting chromosome band assignments are given in Table 3 and representative images are presented in Figure 1 (Paper III, Appendix). The clones mapped to six different equine chromosomes (ECA2, ECA5, ECA7, ECA13, ECA15 and ECA20). CD2 and PTGS2 are the first genes to be FISH mapped to ECA5.

The establishment of the UC Davis SCH panel has been described previously (Shiue et al. 1999). DNA from the same 108 horse-mouse heterohybridoma cell lines as used in previous studies (Caetano et al. 1999a, b; Shiue, 1999; Shiue et al. 1999, 2000) was used in this study. DNA from each cell line was amplified with the four primer sets that did not yield a positive BAC clone (CLU, CYP17, IL10 and TP53) and

Table 3. FISH and synteny mapping results for 13 Type I loci in the horse and comparison with the human-horse Zoo-FISH data.
______
Locus    FISH (F)/     Physical position    Physical position   Zoo-FISH
         Synteny (S)   Horse                Human               Human-Horse
______
ANP      F             2p14prox-p13         1p36                2p, 5, 30
CD2      F             5q11                 1p13                2p, 5, 30
CLU      S             2                    8p21-p12            9
CRISP3   F             20q22                6                   10q, 20
CYP17    S*            1p                   10q24.3             1p, 29
FGG      F             2q14.3-q21.1         4q28                2q, 3q
IL1RN    F             15q13dist-q14prox    2q14.2              1q, 15, 18
IL10     S             5                    1q31-q32            2p, 5, 30
MMP13    F*            7p15                 11q22.3             7
PRM1     F             13q14-q15prox        16p13.3             3p, 13q
PTGS2    F             5p15                 1q25.2-q25.3        2p, 5, 30
TNFA     F             20q16dist-q21.1      6p21.3              10q, 20
TP53     S             11                   17p13.1             11
______
F: Loci FISH mapped using clones found in a BAC library developed in France (Godard et al. 1998).
F*: Loci FISH mapped using clones found in a horse BAC library developed in Texas (L. Skow, unpublished).
S: Loci mapped by somatic cell hybrid analysis (Shiue et al. 1999) and for which no clones were found in either of the two BAC libraries.
S*: Loci mapped by Y-L Shiue (1999).

the product was scored for presence or absence of horse-specific fragments after electrophoresis. Correlation coefficients were calculated between all of the markers in the UC Davis SCH panel database and each of the loci studied. A correlation value greater than or equal to 0.70 was accepted as evidence for synteny between two markers (Chevalet and Corpet 1986). The synteny assignments of IL10, CLU and TP53 are based on significant correlation values with reference markers from Caetano et al. (1999b). The synteny assignment of CYP17 is based on significant correlation values with markers linked to H27A1, which has been FISH mapped to ECA1p16-p15 (K. Mathiasson, personal communication).

Although the overall relationships between horse and human chromosomes have been delineated by comparative chromosome painting (Zoo-FISH; Raudsepp et al. 1996) and as many as 140 genes have been mapped by somatic cell hybrid analysis (Caetano et al. 1999b, Shiue et al.
2000), the addition of Type I loci to the physical or genetic horse map is still valuable for deriving more detailed information on how the horse genome relates to that of other mammals. For this reason we compared all new gene assignments with existing human-horse Zoo-FISH data, some of which justify further comment. The entire ECA15 shares homology with HSA2. IL1RN, assigned to ECA15q13dist-q14prox, is the fourth gene positioned by FISH on ECA15, permitting analysis of gene order conservation between ECA15 and HSA2 (Figure 7). Apparently, an inversion must have occurred in one lineage, since the gene order is IL1RN-LCT-CAD-ODC1 in the horse and LCT-IL1RN-CAD-ODC1 in human. This kind of chromosomal rearrangement within conserved segments has been seen in comparisons of human and other domestic animals, for example the pig (Johansson et al. 1995). CD2 and PTGS2, FISH mapped to ECA5q11 and ECA5p15, respectively, are the first genes mapped by in situ hybridisation to this horse chromosome. Four other equine Type I genes have previously been mapped to ECA5 by somatic cell hybrid mapping (Caetano et al. 1999b) and an additional gene (PGM) has been assigned to the same chromosome by linkage analysis (Lindgren et al. 1998). In man, both CD2 and PTGS2 map to HSA1 and their relative location on the chromosome resembles that in the horse, except that the chromosome arms are reversed. CD2 is located in a proximal position (HSA1p13, ECA5q11) in both species whereas PTGS2 is found towards the middle of the other chromosome arm (HSA1q25.2-q25.3, ECA5p15). The assignment of

CD2 also shows that there is a chromosomal breakpoint on the p-arm of HSA1, as ANP on HSA1p36 maps to ECA2. One new gene assignment was not in agreement with Zoo-FISH data. CLU was localised to ECA2 by somatic cell hybrid mapping; however, CLU maps to HSA8p21-p12 in man, and HSA8 is homologous to ECA9 according to Zoo-FISH data. Whether a small region of HSA8 is actually conserved with a region on ECA2 will need to be defined by further comparative mapping of loci from HSA8 in the horse. The results presented in this study highlight the need for well-defined and characterised relationships between the human and horse genomes. Comparative maps of greater resolution may in the future lead to a more efficient identification of trait genes through comparative candidate gene strategies.
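The synteny criterion used with the somatic cell hybrid panel (a correlation of at least 0.70 between the presence/absence scores of two markers across cell lines) can be sketched as below. The sketch assumes a plain Pearson correlation on binary scores, also known as the phi coefficient; the exact statistic of Chevalet and Corpet (1986) is not reproduced here, and the function names and the eight-line score vectors are hypothetical.

```python
import math


def phi_correlation(scores_a, scores_b):
    """Pearson (phi) correlation between two presence/absence score vectors."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("score vectors must be non-empty and of equal length")
    pairs = list(zip(scores_a, scores_b))
    n11 = sum(1 for a, b in pairs if a and b)          # both markers present
    n10 = sum(1 for a, b in pairs if a and not b)      # only marker A present
    n01 = sum(1 for a, b in pairs if not a and b)      # only marker B present
    n00 = len(pairs) - n11 - n10 - n01                 # both markers absent
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        return 0.0   # a marker present in all or no cell lines is uninformative
    return (n11 * n00 - n10 * n01) / denom


def is_syntenic(scores_a, scores_b, threshold=0.70):
    """Apply the correlation threshold used as evidence for synteny."""
    return phi_correlation(scores_a, scores_b) >= threshold


# Hypothetical scores over eight hybrid cell lines (1 = horse fragment present).
new_locus = [1, 1, 1, 0, 0, 0, 1, 0]
reference_same_chromosome = [1, 1, 1, 0, 0, 0, 0, 0]
reference_other_chromosome = [1, 0, 1, 0, 1, 0, 1, 0]
print(is_syntenic(new_locus, reference_same_chromosome))   # True
print(is_syntenic(new_locus, reference_other_chromosome))  # False
```

With the real panel the vectors would have 108 entries, one per horse-mouse cell line, and each new locus would be compared against every marker already in the panel database.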

Figure 7. Comparison of ECA15 and HSA2 for the loci IL1RN (IL1RA), LCT, CAD and ODC1. An inversion has occurred in either lineage.
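The inference that a single inversion separates the two gene orders can be checked mechanically. The sketch below looks for one contiguous segment whose reversal turns the horse order into the human order, using the two orders quoted in the text. The function name is hypothetical, and the check says nothing about which lineage carries the derived arrangement.

```python
def find_single_inversion(order_a, order_b):
    """Return (i, j) if reversing order_a[i..j] yields order_b, else None."""
    if sorted(order_a) != sorted(order_b):
        raise ValueError("both orders must contain the same loci")
    if order_a == order_b:
        return None                      # identical orders, nothing to explain
    i = 0
    while order_a[i] == order_b[i]:      # first position where the orders differ
        i += 1
    j = len(order_a) - 1
    while order_a[j] == order_b[j]:      # last position where the orders differ
        j -= 1
    candidate = order_a[:i] + order_a[i:j + 1][::-1] + order_a[j + 1:]
    return (i, j) if candidate == order_b else None


horse_eca15 = ["IL1RN", "LCT", "CAD", "ODC1"]
human_hsa2 = ["LCT", "IL1RN", "CAD", "ODC1"]
print(find_single_inversion(horse_eca15, human_hsa2))  # (0, 1)
```

The result (0, 1) means that inverting the segment containing IL1RN and LCT is sufficient to explain the difference, which is the smallest rearrangement consistent with the four mapped loci.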

Physical anchorage and orientation of equine linkage groups (Paper IV)

To combine previously generated linkage data in the horse (Lindgren et al. 1998, Guérin et al. 1999, Swinburne et al. 2000) with physical map data, we screened a horse BAC library (Godard et al. 1998) with a set of microsatellite markers derived from presently unassigned linkage groups, or in one case an unassigned marker. Three markers were also selected for the purpose of orienting anchored linkage groups with respect to the chromosomes. BAC clones were subsequently used as probes in dual colour FISH to identify their precise chromosomal origin.

A horse BAC library was screened by PCR for the presence of 19 different microsatellite markers; AHT7, AHT20, ASB38, EB2E8, HMS5, HMS25, HMS45, LEX005, LEX014, LEX015, LEX023, LEX029, LEX038, LEX044, TKY111, UCDEQ14, UCDEQ425, UCDEQ464 and VIASH21. The screening procedure essentially followed that described in Godard et al. (1998). The primer pairs were used to amplify DNA from the 17 super-pools of the library and then from the corresponding 44 plate/row/column pools. Positive clones could be identified for eleven of the 19 microsatellites (Table 1, Paper IV, Appendix). The chromosomal band location of the BAC clones representing the 11 loci was determined by dual colour FISH analysis, with reference to the latest standardised karyotype of the horse (ISCNH 1997). The resulting chromosome band assignments are given in Table 1 (Paper IV, Appendix).

The 11 microsatellites were from eight different linkage groups, except for one that was an unlinked marker. Most of these linkage groups were believed to map to the telocentric chromosomes, based on somatic cell hybrid analysis (Shiue et al. 1999). The eight groups and the single unlinked marker were assigned to nine horse chromosomes; for four of them (ECA6, ECA25, ECA27 and ECA28), these were their first in situ anchored markers. The microsatellite AHT20 was FISH mapped to ECA21q14-q15.
This assignment makes it possible to orientate the linkage group TKY21-SGCV14-SGCV16-AHT20-HTG10-LEX060-COR068-COR073-LEX031-LEX037 along the chromosome, as both SGCV14 and SGCV16 had previously been FISH mapped to ECA21q13. The orientation is thus centromere-TKY21-SGCV14-SGCV16-AHT20-HTG10-LEX060-COR068-COR073-LEX031-LEX037-telomere. A further two microsatellites, LEX044 and EB2E8, were FISH mapped to ECA26q13-q15 and ECA26q15, respectively. A17 has previously been FISH mapped to 26q13-q14. This allows the orientation of the linkage group COR071-LEX044-A17-EB2E8-UM005 with COR071 proximal to the centromere. ECA27 was tagged for the first time with in situ hybridisation data; LEX005 and HMS45, included in the linkage group COR031-UCDEQ5-ASB38-COR040-HMS45-LEX005-COR017, both map to ECA27q16, so the orientation remains to be determined. Two more equine chromosomes, ECA6 and ECA25, were also tagged with in situ hybridisation data for the first time, which enabled anchoring of linkage groups COR010-LEX065-UM015-TKY111-TKY28 to ECA6 and NVHEQ43-AHT7-UCDEQ405-COR018-TXN-UCDEQ464 to ECA25, through the mapping of TKY111 and UCDEQ464, respectively. ECA28 was mapped with a single microsatellite marker, UCDEQ425, which is not included in any linkage group; however, it is the first microsatellite marker to be FISH mapped to that particular chromosome. For five of the markers used in this study (EB2E8, LEX014, LEX023, LEX044 and UCDEQ464), somatic cell hybrid data (Shiue et al. 1999) were available, and the FISH locations determined here were all in agreement with those. Several of the assigned chromosomes still have only one physically mapped marker that anchors their linkage group, and the development of additional physically mapped markers on these linkage groups will therefore be important. Further, it will also be of great interest to physically map the two markers situated at the ends of each linkage group to estimate the coverage of each chromosome by the linkage groups.
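The logic used above to orientate a linkage group from two or more FISH-mapped markers can be written down as a short sketch. The function name is hypothetical, and the numeric positions are crude proxies derived from the band numbers (for example 14.5 for q14-q15), introduced here only for illustration; larger numbers mean further from the centromere on the q arm.

```python
def orient_linkage_group(order, distance_from_centromere):
    """Return the group ordered centromere to telomere, or None if undetermined.

    order: marker names in linkage order (either direction).
    distance_from_centromere: positions for the FISH-mapped markers only.
    """
    anchored = [(i, distance_from_centromere[m])
                for i, m in enumerate(order) if m in distance_from_centromere]
    if len(anchored) < 2:
        return None                      # a single anchor cannot orientate a group
    first_pos = anchored[0][1]           # anchor nearest one end of the group
    last_pos = anchored[-1][1]           # anchor nearest the other end
    if first_pos == last_pos:
        return None                      # anchors fall in the same band
    return list(order) if first_pos < last_pos else list(reversed(order))


eca21 = ["TKY21", "SGCV14", "SGCV16", "AHT20", "HTG10", "LEX060",
         "COR068", "COR073", "LEX031", "LEX037"]
eca21_fish = {"SGCV14": 13.0, "SGCV16": 13.0, "AHT20": 14.5}   # q13, q13, q14-q15

eca27 = ["COR031", "UCDEQ5", "ASB38", "COR040", "HMS45", "LEX005", "COR017"]
eca27_fish = {"HMS45": 16.0, "LEX005": 16.0}                   # both q16

print(orient_linkage_group(eca21, eca21_fish)[0])   # TKY21
print(orient_linkage_group(eca27, eca27_fish))      # None
```

The ECA27 case shows why additional, more widely spaced anchors are needed: two markers in the same band tag the chromosome but leave the direction of the group open.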

Low genetic variability on the Y chromosome of domestic horses (Paper V)

The application of genetic markers from the Y chromosome, e.g. single nucleotide polymorphisms (SNPs) or microsatellites, has recently proven most useful for the study of population and evolutionary processes. Importantly, by tracking the genetic history of patrilines, studies of Y chromosome polymorphisms can complement analyses based on mitochondrial DNA (mtDNA), which reflects matrilineal inheritance, or on autosomal markers. Certainly, the application of Y chromosome markers would be useful for studies of horse domestication, relationships between breeds and the genetic composition of individual breeds. As no polymorphic markers are yet available for the equine Y chromosome, we report here the results from a large-scale screen for Y-linked SNPs among domestic horses. An equine bacterial artificial chromosome (BAC) library (L. Skow, unpublished) was screened by PCR for the presence of clones containing the zinc finger protein Y (ZFY) gene. The primers used were designed to amplify a fragment from within one of the exons of the ZFY gene, and in such a way that they would not amplify the homologous gene on the X chromosome, ZFX. The BAC clone was subsequently subcloned and sequenced, and PCR primers were designed for 13 different 300-1000 bp fragments of anonymous sequence, avoiding known equine repetitive elements within primer sequences. Two further Y chromosome-specific markers were developed for the SMCY gene, with primers based on conserved sequence identified from an alignment of the Y-linked SMCY genes of human and mouse. The two markers flanked introns 3 and 7, respectively. Again, primers were designed such that they should not amplify from the homologous SMCX gene. All primer pairs proved to be Y chromosome-specific, as judged from their subsequent successful amplification of male but not female horse genomic DNA.
The 15 Y chromosome fragments were screened for SNPs by DNA sequencing of PCR products amplified from unrelated male domestic horses of several different breeds, plus one male Przewalski’s horse. The DNA samples from male domestic horses consisted of a set of 23 individuals from twelve different breeds. These horses included Arab (2 individuals), Akhal Teké (2), Ardenne (2), (2), (2), Fjord (2), pony (2), Icelandic horse (2), Shetland pony (2), Standardbred (1), North-Swedish horse (2), and Thoroughbred (2). Twelve or 24 individuals were generally screened for each fragment (Table 1, Paper V, Appendix), and the total length of Y chromosome sequence scanned in this way was 6014 bp. For about half of the fragments, overlapping sequence was obtained from both ends (Table 1, Paper V, Appendix), while sequencing was successful in only one direction for the remainder. For one fragment, ZFY43A, we also used DHPLC to screen for variability, in this case on a total of 48 individuals. Only one segregating site (K) was identified, in the ZFY55B fragment. This was in the form of a rare allele present in a single individual, the Przewalski’s horse. The polymorphism was an A↔G transition at a potential CpG site, confirmed by several independent amplifications and sequencing reactions (Figure 8). The failure to demonstrate genetic variability in about 6 kb of Y chromosome sequence of domestic horses clearly points to low levels of polymorphism on this chromosome. However, it is not likely that the low Y chromosome variability of domestic horses reflects an overall low level of polymorphism in the genome.
For instance, when screening for genetic variability in introns and untranscribed regions of equine genes using single strand conformation polymorphism (SSCP) analysis, clearly a less sensitive method than DNA sequencing, we found three polymorphisms in about 6 kb of autosomal sequence; only eight horses of two different breeds were screened in that case (Lindgren et al. 1998, 2001a). Moreover, a recent screening for SNPs in 26 kb of autosomal sequence among revealed one polymorphic site every 1500 bp (M.B., unpublished). Finally, data from both autosomal microsatellites (Ellegren et al. 1992b) and mtDNA (Marklund et al. 1995) suggest high levels of genetic variability in the domestic horse.
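As a rough back-of-the-envelope illustration of this contrast, the short Python sketch below asks how many polymorphic sites would be expected in the 6014 bp of Y chromosome sequence if it varied at the autosomal rate quoted above (one site per 1500 bp), and how likely it would then be to see none. The Poisson model and the function name `prob_no_variation` are assumptions introduced here. The calculation ignores differences in sample size and sampled breeds between the two screens, so it is an order-of-magnitude argument rather than a result from Paper V.

```python
import math

def prob_no_variation(scanned_bp, bp_per_polymorphic_site):
    """Expected number of polymorphic sites, and the Poisson probability of
    observing none, if the scanned sequence varied at the given rate."""
    expected_sites = scanned_bp / bp_per_polymorphic_site
    return expected_sites, math.exp(-expected_sites)

# 6014 bp of Y sequence; autosomal rate of one polymorphic site per 1500 bp.
expected, p_zero = prob_no_variation(6014, 1500)
print(f"expected sites: {expected:.2f}, P(no variable site): {p_zero:.3f}")
```

About four variable sites would be expected at the autosomal rate, so complete monomorphism among the domestic horses would be unlikely under this simple model (a probability of roughly 2%).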

Figure 8. DNA sequence chromatograms showing the identified A↔G SNP at the equine ZFY55B locus. Top: sequence from the domestic horse. Below: sequence from the Przewalski’s horse.

Assuming that the overall level of genetic variability in the domestic horse is not unusually low (cf. above), variability on Y may yet be lower than expected if there is a strong sex-bias in breeding. Specifically, if the number of stallions contributing to the gene pool is significantly lower than that of mares, variability on Y should be lowered compared to the rest of the genome. We believe this could be a plausible scenario, and indeed it gains some support from previous genetic studies. Horse breeding has traditionally taken place, and still does, by selecting stallions and letting them cover many mares each (Levine 1999). Under such a breeding scheme the number of patrilines remaining in the population would be significantly less than the number of matrilines. Vilà et al. (2001) found a clear difference in the genetic structure among horse breeds revealed by mtDNA and autosomal microsatellite data. While mtDNA haplotypes are widely distributed among breeds with no clear genetic structure, microsatellite data reveal distinct differentiation between breeds. This can be explained if maternal gene flow between breeds exceeds paternal gene flow and/or if the male effective population size is smaller than the female effective population size.

However, a male of the Przewalski’s horse, a with unclear taxonomic status, was found to differ at one nucleotide position compared to domestic horses. In the absence of knowledge of the ancestral state of this polymorphism, through information from an outgroup species, we cannot say which lineage represents the derived state. Nevertheless, the distinction between patrilines of Przewalski’s and domestic horses points at the former representing a genetically different from the domestic horse and its ancestors.
This contrasts with recent data based on analyses of mtDNA.

In conclusion, we have made the first large-scale survey of genetic variability on the Y chromosome of the domestic horse. The observation of complete monomorphism in about 6 kb of sequence among 12-24 male horses of many different breeds indicates that the level of genetic variability on the domestic horse Y chromosome is low. One possible interpretation of this is that the number of males contributing to the gene pool of domestic horses has been low, reducing the number of surviving patrilines. The identification of a variable site in a single male Przewalski’s horse suggests that the evolutionary lineage leading to the Przewalski’s horse is either ancestral to domestic horses or represents a more recent branch separated from domestic horses or their wild ancestors.
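The effect of a sex-biased breeding scheme on the male effective population size can be sketched with the standard textbook expressions for an idealised population with separate sexes. The thesis does not present this model, and it is used here only for illustration. With N_m breeding males and N_f breeding females, the autosomal effective size is 4·N_m·N_f/(N_m + N_f), while the Y chromosome and mtDNA, being haploid and inherited through one sex only, have effective sizes N_m/2 and N_f/2. The function name and the example numbers below are hypothetical.

```python
def effective_sizes(n_males, n_females):
    """Idealised effective population sizes for autosomes, the Y chromosome
    and mtDNA, given the numbers of breeding males and females."""
    ne_autosomal = 4 * n_males * n_females / (n_males + n_females)
    ne_y = n_males / 2       # paternally inherited, haploid
    ne_mt = n_females / 2    # maternally inherited, haploid
    return ne_autosomal, ne_y, ne_mt

# Hypothetical breeding schemes: equal sexes versus few stallions covering many mares.
for n_m, n_f in [(100, 100), (10, 1000)]:
    ne_a, ne_y, ne_mt = effective_sizes(n_m, n_f)
    print(f"Nm={n_m}, Nf={n_f}: Y/autosome={ne_y / ne_a:.3f}, Y/mtDNA={ne_y / ne_mt:.3f}")
```

In this idealised model the ratio of Y to mtDNA effective size equals N_m/N_f, so a scheme in which few stallions cover many mares would be expected to leave far fewer patrilines than matrilines, in line with the interpretation given above.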

Future Prospects

There are several important tasks to fulfil in the field of horse genome mapping in the future. Firstly, the existing genetic and physical maps should be integrated. The availability of equine BAC libraries (Godard et al. 1998 and L. Skow, unpublished) will allow identification of large-insert clones containing genetically mapped markers, and the subsequent physical mapping of such clones with FISH. Secondly, gaps in the present map should be filled with markers. Moreover, it is of interest to develop particularly high marker density in certain chromosome regions, for cloning of genes in those regions. Monochromosomal somatic cell hybrids, flow-sorted chromosomes (using FACS) or microdissected chromosome fragments provide sources for the identification of markers (e.g. microsatellites) from specific regions of the genome. The goal will be to extend the present linkage maps to reach nearly complete genome coverage. During the last three years, three genetic linkage maps have been developed for the equine genome (Lindgren et al. 1998, Guérin et al. 1999, Swinburne et al. 2000). It is likely that this goal could be reached in the near future by merging linkage data from the different maps into a consensus linkage map, through a strong international collaborative effort by the Equine Gene Mapping Workshop. The present linkage maps open the way to genome scans for identification of phenotypically important genes in the horse. However, the density of markers will probably be too low for positional cloning of a specific gene of interest. One alternative is the use of a comparative positional approach, for which a comparative gene map is needed (Murray and Bowling 2000). With a well developed comparative map it would be possible to transfer mapping data from one species to another, as coding sequences tend to be evolutionarily well conserved.
The point here is to transfer mapping data from the well developed human and mouse genome maps to the less developed maps of, e.g., the domestic animals. Raudsepp et al. (1996) presented conserved chromosomal segments in horse and man using Zoo-FISH. This work determined the gross degree of chromosome conservation between the horse and the human genome. However, a more fine-tuned comparative gene map is needed for positional candidate cloning of genes of interest. Only a few genes have been put on the horse linkage map because of the limited amount of polymorphism found in such loci. The initial comparative gene maps are often based on synteny maps that do not give information on gene order (Murray and Bowling 2000). Caetano et al. (1999c) presented a comparative gene map of the horse consisting of 127 loci, mainly based on somatic cell hybrid mapping. The use of cDNA fragments from cDNA libraries, termed expressed sequence tags (ESTs), as markers in genome mapping is believed to be an effective approach to increase the number of mapped genes in the horse. ESTs provide good markers for use in radiation hybrid mapping, as there is no need for polymorphism. Radiation hybrid mapping will probably be an important tool for rapidly increasing the information content of the horse comparative gene map in the future. As most biologically important traits, as well as many diseases, show polygenic inheritance, the identification of genes that regulate quantitative traits will be essential for our basic understanding of the relationship between genotype and phenotype. Further, the mapping and identification of quantitative trait loci (QTLs), as well as of loci for qualitative traits, in domestic animals are valuable for animal breeders. At present, DNA tests based on flanking markers for QTLs are used in animal breeding programmes for reproductive traits in pigs (Rothschild et al. 1997) and for milk production in cattle (Riquet et al. 1999).
Hence, it is possible that the identification of QTLs (e.g. QTLs underlying certain diseases or QTLs important for reproduction) will also prove to be important in horse breeding in the future. The more we learn, the more there is to explore...

References

Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. Journal of Molecular Biology 215: 403-410.
Andersson L, Sandberg K, Adalsteinsson S, and Gunnarsson E. (1983a) Linkage of the equine serum esterase (Es) and mitochondrial glutamate oxaloacetate transaminase (GOTM) loci. A horse-mouse homology. The Journal of Heredity 74: 361-362.
Andersson L, Juneja K, and Sandberg K. (1983b) Genetic linkage between the loci for phosphohexose isomerase (PHI) and serum protein (Xk) in horses. Anim. Blood Grps. Biochem. Genet. 14: 45-50.
Andersson L, and Sandberg K. (1984) Genetic linkage in the horse. II. Distribution of male recombination estimates and the influence of age, breed and sex on recombination frequency. Genetics 106: 109-122.
Anglana M, Vigoni MT, and Giulotto E. (1996) Four horse genomic fragments containing minisatellites detect highly polymorphic DNA fingerprints. Animal Genetics 27: 286.
Ansari HA, Hediger R, Fries R, and Stranzinger G. (1988) Chromosomal localization of the major histocompatibility complex of the horse (ELA) by in situ hybridization. Immunogenetics 28: 362-364.
Archibald AL, and Haley CS. (1998) Genetic linkage maps. In: Rothschild MF, and Ruvinsky A (eds), The Genetics of the Pig. CABI Publishing, CAB International, Oxon, UK, pp. 265-294.
Bailey E, Graves KT, Cothran EG, Reid R, Lear TL, and Ennis RB. (1995) Synteny-mapping horse microsatellite markers using a heterohybridoma panel. Animal Genetics 26: 177-180.
Barendse, W., D. Vaiman, S.J. Kemp, Y. Sugimoto, S.M. Armitage, J.L. Williams, et al. 1997. A medium-density genetic linkage map of the bovine genome. Mammalian Genome 8: 21-28.
Benirschke K, and Malouf N. (1967) Chromosome studies in Equidae. Equus 1: 253-284.
Bowling AT. (1987) Equine linkage groups II: phase conservation of To with A1B and GcS. The Journal of Heredity 78: 248-250.
Bowling AT, Shiue Y-L, Caetano AR, Metallinos DL, Millon LV, Murray JD, Rieder S.
(2000) Chromosome and synteny assignments for the candidate coat colour genes of horses. Mammalian Genome, in press.
Breen M, Lindgren G, Binns MM, Norman J, Irvin Z, Bell K, Sandberg K, and Ellegren H. (1997) Genetical and physical assignments of equine microsatellites - first integration of anchored markers in horse genome mapping. Mammalian Genome 8: 267-273.
Burt, A., and G. Bell. 1987. Mammalian chiasma frequencies as a test of two theories of recombination. Nature 326: 803-805.
Caetano AR, Pomp D, Murray JD, and Bowling AT. (1999a) Comparative mapping of 18 equine type I genes assigned by somatic cell hybrid analysis. Mammalian Genome 10: 271-276.
Caetano AR, Lyons LA, Laughlin TF, O’Brien SJ, Murray JD, and Bowling AT. (1999b) Equine synteny mapping of comparative anchor-tagged sequences (CATS) from human chromosome 5. Mammalian Genome 10: 1082-1084.
Caetano AR, Shiue Y-L, Lyons LA, et al. (1999c) A comparative gene map of the horse (Equus caballus). Genome Research 9: 1239-1249.
Chapman, H.M., and A. Bruere. 1977. Chromosome morphology during meiosis of normal and Robertsonian translocation-carrying rams (Ovis aries). Can. J. Genet. Cytol. 19: 93-102.
Chevalet C, Corpet F (1986) Statistical decision rules concerning synteny or independence between markers. Cytogenet Cell Genet 43: 132-139.
Chowdhary B. (1998a) Cytogenetics and physical chromosome maps. In: Rothschild MF, and Ruvinsky A (eds), The Genetics of the Pig. CABI Publishing, CAB International, Oxon, UK, pp. 199-264.

Chowdhary BP, Raudsepp T, Frönicke L, and Scherthan H. (1998b) Emerging patterns of comparative genome organization in some mammalian species as revealed by Zoo-FISH. Genome Research 8: 577-589.
Chowdhary B, and Raudsepp T. (2000) Cytogenetics and physical gene maps. In: Bowling AT, and Ruvinsky A (eds), The Genetics of the Horse. CABI Publishing, CAB International, Oxon, UK, pp. 171-241.
Clutton-Brock J. (1999). A Natural History of Domesticated Mammals. Cambridge University Press, Cambridge.
Cox DR, Burmeister M, Price ER, Kim S, and Myers RM. (1990) Radiation hybrid mapping: a somatic cell genetic method for constructing high-resolution maps of mammalian chromosomes. Science 250: 245-250.
Cox DR, and Myers RM. (1992) Current Biology 2: 338.
Crawford, A.M., K.G. Dodds, A.J. Ede, C.A. Pierson, G.W. Montgomery, H.G. Garmonsway, A.E. Beattie, et al. 1995. An autosomal genetic linkage map of the sheep genome. Genetics 140: 703-724.
Davisson, M.T., and T.H. Roderick. 1989. Linkage map. In Genetic Variants and Strains of the Laboratory Mouse, 2nd edition (ed. M.F. Lyon and A.G. Searle), pp. 416-427. Gustav Fischer Verlag, Stuttgart.
de Gortari MJ, Freking BA, Cuthbertson RP, Kappes SM, Keele JW, Stone RT, Leymaster KA, Dodds KG, Crawford AM, and Beattie CW. (1998) A second-generation linkage map of the sheep genome. Mammalian Genome 9: 204-209.
Dunn LC, and Bennett D. (1967) Sex differences in recombination of linked genes in animals. Genet. Res. Camb. 9: 211-220.
Ellegren H, Andersson L, Johansson M, and Sandberg K. (1992a) DNA fingerprinting in horses using simple (TG)n probe and its application to population comparisons. Animal Genetics 23: 1-9.
Ellegren H, Johansson M, Sandberg K, and Andersson L. (1992b) Cloning of highly polymorphic microsatellites in the horse. Animal Genetics 23: 133-142.
Ellegren, H., B.P. Chowdhary, M. Johansson, L. Marklund, M. Fredholm, I. Gustavsson, and L. Andersson. 1994.
A primary linkage map of the porcine genome reveals a low rate of genetic recombination. Genetics 137: 1089-1100.
Evander R.L. (1989) Phylogeny of the family Equidae. In: Prothero DR, and Schoch RM (eds), The Evolution of Perissodactyls. Oxford University Press, Oxford, pp. 109-127.
Flint J, Taylor AM, Clegg JB. (1988) Structure and evolution of the horse zeta globin locus. Journal of Molecular Biology 199: 427-437.
Gallagher PC, Lear TL, Coogle LD, Bailey E. (1999) Two SINE families associated with equine microsatellite loci. Mammalian Genome 10: 140-144.
Georges M, Lequarre AS, Castelli M, Hanset R, Vassart G. (1988) DNA fingerprinting in domestic animals using four different minisatellite probes. Cytogenetics and Cell Genetics 47: 127-131.
Godard S, Vaiman D, Oustry A, Nocart M, Bertaud M, Guzylack S, Mériaux JC, Cribiu EP, and Guérin G. (1997) Characterization, genetic and physical mapping analysis of 36 horse plasmid and cosmid-derived microsatellites. Mammalian Genome 8: 745-750.
Godard S, Schibler L, Oustry A, Cribiu EP, and Guérin G. (1998) Construction of a horse BAC library and cytogenetical assignment of 20 type I and type II markers. Mammalian Genome 9: 633-637.
Godard S, Oustry-Vaiman A, Cribiu EP, and Guérin G. (1999) FISH mapping assignment of two horse BAC clones containing HMS41 and HTG3 microsatellites. Animal Genetics 30: 233-234.
Godard S, Oustry-Vaiman A, Schibler L, Mariat D, Vaiman D, Cribiu EP, and Guérin G. (2000) Cytogenetic localization of 44 new coding sequences in the horse. Mammalian Genome 11: 1093-1097.
Green P, Falls KA, and Crooks S. (1990) Documentation for CRI-MAP, version 2.4. Washington University School of Medicine, St. Louis, MO.
Gregory TR, and Hebert PH. (1999) The modulation of DNA content: proximate causes and ultimate consequences. Genome Research 9: 317-324.

Groenen MA, Cheng HH, Bumstead N, Benkel BF, Briles WE, Burke T, Burt DW, Crittenden LB, Dodgson J, Hillel J, Lamont S, de Leon AP, Soller M, Takahashi H, and Vignal A. (2000) A consensus linkage map of the chicken genome. Genome Research 10: 137-147.
Gross SJ, and Harris H. (1975) Nature 255: 680.
Gross SJ, and Harris H. (1977a) Journal of Cell Science 25: 17.
Gross SJ, and Harris H. (1977b) Journal of Cell Science 25: 39.
Groves CP, and Ryder OA. (2000) Systematics and Phylogeny of the Horse. In: Bowling AT, and Ruvinsky A (eds), The Genetics of the Horse. CABI Publishing, CAB International, Oxon, UK, pp. 1-24.
Guérin G, Bailey E, Bernoco D, et al. (1999) Report of the International Equine Gene Mapping Workshop: Male Linkage Map. Animal Genetics 30: 341-354.
Guérin G, Bertand M, Billoud B, and Meriaux JC. (1993) A genetic analysis of variable number tandem repeat (VNTR) polymorphism in the horse. Genetics Selection and Evolution 25: 435-445.
Guttormsen SA, and Weitkamp LR. (1981) Equine marker genes: polymorphism of soluble erythrocyte malic enzyme. Anim. Blood Grps. Biochem. Genet. 12: 53-57.
Harbitz I, Chowdhary BP, Saether H, Hauge JG, Gustavsson I. (1990) A porcine genomic glucosephosphate isomerase probe detects a multiallelic restriction fragment length polymorphism assigned to chromosome 10pter in horse. Hereditas 112: 151-156.
Hasler-Rapacz J, Ellegren H, Fridolfsson A-K et al. (1998) Identification of a mutation in the low density lipoprotein receptor gene associated with recessive familial hypercholesterolemia in swine. Am J Med Genet 76: 379-386.
Hirota K, Mashima S, Tozaki T, Sakagami M, Mukoyama H, and Miura N. (1997) Sequence tagged sites on horse chromosomes. Archivos de Zootecnia 46: 3-7.
Hooker JJ. (1984) A primitive ceratomorph (Perissodactyla, Mammalia) from the early Tertiary of Europe. Zoological Journal of the Linnaean Society 82: 229-244.
Hopkins B, O’Connell FM, and Hopkins J.
(1991) Use of DNA fingerprinting in paternity analysis of closely related Exmoor ponies. Equine Veterinary Journal 23: 277-279. Hopman TJ, Han EB, Story MR, et al. (1999) Equine dinucleotide repeat loci COR001- COR020. Animal Genetics 30: 225-226. Håkansson, M. (2000) Slutrapport Projekt Pegasus. Västra Götalands Landsting, Psykiatriska kliniken NU-sjukvården. International Human Genome Sequencing Consortium. (2001) Initial sequencing and analysis of the human genome. Nature 409: 860-921. ISCNH (1997) International System for Cytogenetic Nomenclature of the Domestic Horse. Bowling AT, Breen M, Chowdhary BP, Hirota K, Lear T, Millon LV, Ponce de Leon FA, Raudsepp T, and Stranzinger G. (Committee). Chromosome Research 5: 433-443. Jeffreys AJ, Wilson V, and Thein SL. (1985a) Hypervariable ”minisatellite” regions in human DNA. Nature 314: 67-73. Jeffreys AJ, Wilson V, and Thein SL. (1985b) Individual-specific ”fingerprints” of human DNA. Nature 316: 76-79. Johansson M, Ellegren H, Andersson L (1995) Comparative mapping reveals extensive linkage conservation - but with gene order rearrangements - between the pig and the human genomes. Genomics 25: 682-690. Kappes SM, Keele JW, Stone RT, McGraw RA, Sonstegard TS, Smith TP, Lopez- Corrales NL, and Beattie CW. (1997) A second-generation linkage map of the bovine genome. Genome Research 7: 235-249. Kiguwa SL, Hextall P, Smith AL, Critcher R, Swinburne J, Millon L, Binns MM, Goodfellow PN, McCarthy LC, Farr CJ, Oakenfull EA. (2000) A horse whole genome-radiation hybrid panel: Chromosome 1 and 10 preliminary maps. Mammalian Genome 11: 803-805. Kosambi DD. (1944) The estimation of map distances from recombination values. Annals of Eugenics 12: 172-175. Lathrop GM, and Lalouel JM. (1988) Efficient computations in multi-locus linkage analysis. American Journal of Human Genetics 42: 498-505.

Lear TL, Piumi F, Terry RR, Guerin G, and Bailey E. (2000) Physical mapping of horse v-fes feline sarcoma viral oncogene homologue; pyruvate kinase, muscle type; plasminogen; beta spectrin, nonerythrocytic 1; thymidylate synthetase and microsatellite LEX078 to 1q14-q15, 1q21, 31q12-q14, 15q22, 8q12-q14, 14q27, respectively. Chromosome Research 8: 361.
Lear TL, Brandon R, Piumi F, Terry RR, Guerin G, Thomas S, and Bailey E. (2001) Mapping of 31 horse genes in BACs by FISH. Chromosome Research, in press.
Levine, M.A. 1999. Botai and the origins of horse domestication. Journal of Anthropological Archaeology 18: 29-78.
Lewin B. (1994) The C-value paradox describes variations in genome size. In: Genes V. Oxford University Press, Oxford, pp. 658-665.
Lindgren G, Sandberg K, Persson H, et al. (1998) A primary male autosomal linkage map of the horse genome. Genome Research 8: 951-966.
Lindgren G, Breen M, Godard S, Bowling A, Murray J, Scavone M, Skow L, Sandberg K, Guérin G, Binns MM, and Ellegren H. (2001a) Mapping of 13 horse genes by fluorescence in-situ hybridization (FISH) and somatic cell hybrid analysis. Chromosome Research 9: 53-59.
Lindgren, G., Swinburne, J., Breen, M., Mariat, D., Sandberg, K., Guérin, G., Ellegren, H. and Binns, M.M. (2001b) Physical anchorage and orientation of equine linkage groups by FISH mapping BAC clones containing microsatellite markers. Animal Genetics 32: 37-39.
Litt M, and Luty JA. (1989) American Journal of Human Genetics 44: 397.
Logue, D.N. 1977. Meiosis in the domestic ruminants with particular reference to Robertsonian translocations. Ann. Genet. Sel. Anim. 9: 493-507.
Long, S.E. 1978. Chiasma counts and non-disjunction frequencies in a normal ram and in rams carrying the Massey I (t1) Robertsonian translocation. J. Reprod. Fertil. 53: 353-356.
Lynn A, Kashuk C, Petersen MB, Bailey JA, Cox DR, Antonarakis SE, Chakravarti A. (2000) Patterns of meiotic recombination on the long arm of human chromosome 21.
Genome Research 10:1319-32. Mariat D, Oustry-Vaiman A, Cribiu EP, Raudsepp T, Chowdhary B, and Guérin G. (2001) Isolation, characterization and FISH assignments of horse BAC clones containing type I and type II markers. Cytogenetics and Cell Genetics, in press. Marklund S, Chaudhary R, Marklund L, Sandberg K, and Andersson L. (1995) Extensive mtDNA diversity in horses revealed by PCR-SSCP analysis. Animal Genetics 26: 193-196. Marklund S, Ellegren H, Eriksson S, Sandberg K, and Andersson L. (1994) Parentage testing and linkage analysis in the horse using a set of highly polymorphic microsatellites. Animal Genetics 25: 19-23. Marti E, Breen M, Fischer P, and Binns MM (1998) Isolation, characterisation and physical mapping of cosmid derived microsatellites from the horse. Animal Genetics 29: 236-238. McGuire TR, and Weitkamp LR. (1980) Equine marker genes: polymorphism for transferrin alleles, TfF1 and TfF2, in thoroughbreds. Anim. Blood Grps. Biochem. Genet. 11: 113-117. Menotti-Raymond M, David VA, Lyons LA et al. (1999) A genetic linkage map of microsatellites in the domestic cat (Felis catus). Genomics 57: 9-23. Mikawa S, Akita T, Hisamatsu N, Inage Y, Ito Y, Kobayashi E, Kusumoto H, Matsumoto T, Mikami H, Minezawa M, Miyake M, Shimanuki S, Sugiyama C, Uchida Y, Wada Y, Yanai S, Yasue H. (1999) A linkage map of 243 DNA markers in an intercross of Gottingen miniature and Meishan pigs. Anim Genet. 30:407-17. Miller, R. (1997) Linkage mapping of plant and animal genomes. In: Dear, PH (editor), Genome mapping: a practical approach. Oxford University Press, Oxford, pp. 27-47. Morton, N.E. 1991. Parameters of the human genome. Proc. Natl. Acad. Sci. USA 88: 7474-7476. Murphie AM, Hopman TJ, Schug MD, et al. (1999) Equine dinucleotide repeat loci COR021-COR040. Animal Genetics 30: 235-237.

Murray JD, and Bowling AT. (2000) Linkage and Comparative Maps for the Horse (Equus caballus). In: Bowling AT, and Ruvinsky A (eds), The Genetics of the Horse. CABI Publishing, CAB International, Oxon, UK, pp. 243-280.
Mäkinen A, Chowdhary B, Mahdy E, Andersson L, and Gustavsson I. (1989) Localisation of the equine major histocompatibility complex (ELA) to chromosome 20 by in situ hybridization. Hereditas 110: 93-96.
Naylor S. (1997) Construction and use of somatic cell hybrids. In: Dear PH (editor), Genome mapping: a practical approach. Oxford University Press, Oxford, pp. 125-163.
O'Brien SJ, Womack JE, Lyons LA, Moore KJ, Jenkins NA, Copeland NG (1993) Anchored reference loci for comparative genome mapping in mammals. Nat Genet 3: 103-112.
Olivier M, Aggarwal A, Allen J, et al. (2001) A high-resolution radiation hybrid map of the human genome draft sequence. Science 291: 1298-1302.
Prothero DR, and Schoch RM. (1989) Classification of the Perissodactyla: summary and synthesis. In: Prothero DR, and Schoch RM (eds), The Evolution of Perissodactyls. Oxford University Press, Oxford, pp. 530-537.
Qumsiyeh MB. (1994) Evolution of number and morphology of mammalian chromosomes. The Journal of Heredity 85: 455-465.
Raudsepp T, Frönicke L, Scherthan H, Gustavsson I, and Chowdhary BP. (1996) Zoo-FISH delineates conserved chromosomal segments in horse and man. Chromosome Research 4: 218-225.
Raudsepp T, Otte K, Rozell B, and Chowdhary BP. (1997) FISH mapping of the IGF2 gene in horse and donkey – detection of homoeology with HSA11. Mammalian Genome 8: 569-572.
Raudsepp T, Kijas J, Godard S, Guérin G, Andersson L, and Chowdhary B. (1999) Comparison of horse chromosome 3 with donkey and human chromosomes using cross species painting and heterologous FISH mapping. Mammalian Genome 10: 277-282.
Riquet J, Coppieters W, Cambisano N, Arranz JJ, Berzi P, Davis SK, Grisart B, Farnir F, Karim L, Mni M, Simon P, Taylor JF, Vanmanshoven P, Wagenaar D, Womack JE, and Georges M.
(1999) Fine-mapping of quantitative trait loci by identity by descent in outbred populations: application to milk production in dairy cattle. Proceedings of the National Academy of Sciences USA 96: 9252-9257.
Rong R, Chandley AC, Song J, McBeath S, Tan PP, Bai Q, and Speed RM. (1988) A fertile mule and hinny in China. Cytogenetics and Cell Genetics 47: 134-139.
Rothfels KH, Alexrad AA, Siminovitch L, and McCulloch Parker RC. (1959) The origin of altered cell lines from mouse, monkey and man, as indicated by chromosome and transplantation studies. Proceeding of Canadian Cancer Research Conference 3: 189-214.
Rothschild MF, Messer LA, and Vincent A. (1997). Molecular approaches to improved pig fertility. J Reprod Fertil Suppl 52: 227-236.
Ruth L, Hopman TJ, Schug MD, et al. (1999) Equine dinucleotide repeat loci COR041-COR060. Animal Genetics 30: 320-321.
Ryder OA, Epel NC, and Benirschke K. (1978) Chromosome banding studies of the Equidae. Cytogenetics and Cell Genetics 20: 323-350.
Sakagami M, Ohshima K, Mukoyama H, Yasue H, and Okada N. (1994b) A novel tRNA species as an origin of short interspersed repetitive elements (SINEs): equine SINEs may have originated from tRNASer. Journal of Molecular Biology 239: 731-735.
Sakagami M, Tozaki T, Mashima S, Hirota K, and Mukoyama H. (1995). Equine parentage testing by microsatellite locus at chromosome 1q2.1. Animal Genetics 26: 123-124.
Sandberg K. (1974) Linkage between the K blood group locus and the 6-PGD locus in horses. Anim. Blood Grps. Biochem. Genet. 5: 137-141.
Sandberg K, and Juneja K. (1978) Close linkage between the albumin and Gc loci in the horse. Anim. Blood Grps. Biochem. Genet. 9: 169-173.

Sandberg K and Andersson L. (1992) Horse (Equus caballus). In: Genetic Maps 6th ed. (SJ O’Brien, ed) (Cold Spring Harbor, N.Y.: Cold Spring Harbor Laboratory Press), pp 4.276-8.
Scherf B. (1995) World Watch List for Domestic Animal Diversity, 2nd edn. The Food and Agriculture Organization (FAO), Rome.
Scott, I.S., and S.E. Long. 1980. An examination of chromosomes in the stallion (Equus caballus) during meiosis. Cytogenet. Cell Genet. 26: 7-13.
Shiue, Y.-L. (1999) Construction of a horse (Equus caballus) synteny and comparative map based on Type I and Type II markers. Ph.D. dissertation, University of California, Davis.
Shiue Y, Bickel LA, Caetano AR, Millon LV, Clark RS, Eggleston ML, Michelmore R, Bailey E, Guérin G, Godard S, Mickelson JR, Valberg SJ, Murray JD, and Bowling AT. (1999). A synteny map of the horse genome comprised of 240 microsatellite and RAPD markers. Animal Genetics 30: 1-9.
Shiue Y-L, Millon LV, Skow LC, Honeycutt D, Murray JD, and Bowling AT (2000) Synteny and regional marker order assignment of 26 type I and microsatellite markers to the horse X- and Y-chromosomes. Chromosome Research 8: 45-55.
Stewart EA, and Cox DR. (1997) Radiation hybrid mapping. In: Dear PH (editor), Genome mapping: a practical approach. Oxford University Press, Oxford, pp. 73-93.
Swinburne J, Gerstenberg C, Breen M, et al. (2000) First comprehensive low-density horse linkage map based on two three-generation, full-sibling, cross-bred reference families. Genomics 66: 123-134.
Tallmadge RL, Hopman TJ, Schug MD, et al. (1999) Equine dinucleotide repeat loci COR061-COR080. Animal Genetics 30: 462-463.
Tozaki T, Sakagami M, Mashima S, Hirota K, and Mukoyama H. (1995) ECA-3: equine (CA) repeat polymorphism at chromosome 2p1.3-4. Animal Genetics 26: 283.
Venter CJ, Adams MD, Myers EW, et al. (2001) The sequence of the human genome. Science 291: 1304-1351.
Vilà C, Leonard JA, Gotherstrom A, Marklund S, Sandberg K, Liden K, Wayne RK, and Ellegren H.
(2001) Widespread Origins of Domestic Horse Lineages. Science 291: 474-477.
Walter MA, and Goodfellow PN. (1993) Radiation hybrids: irradiation and fusion gene transfer. Trends in Genetics 9: 352-356.
Waterston R, and Sulston J. (1995) The genome of Caenorhabditis elegans. Proceedings of the National Academy of Sciences USA 92: 10836-10840.
Watson JD, and Crick FHC. (1953) Molecular structure of nucleic acids. Nature 171: 737-738.
Weissenbach, J., G. Gyapay, C. Dib, A. Vignal, J. Morisette, P. Millasseau, G. Vaysseix, and M. Lathrop. 1992. A second-generation linkage map of the human genome. Nature 359: 794-801.
Weitkamp LR, Costello-Leary P, and Guttormsen SA. (1983) Equine marker genes: polymorphism for plasminogen. Anim. Blood Grps. Biochem. Genet. 14: 219-223.
Weitkamp LR, Guttormsen SA, and Costello-Leary P. (1982) Equine gene mapping: close linkage between the loci for soluble malic enzyme and Xk (Pa). Anim. Blood Grps. Biochem. Genet. 13: 279-284.
Williams H, Richards CM, Konfortov BA, Miller JR, and Tucker EM. (1993) Synteny mapping in the horse-mouse heterohybridomas. Animal Genetics 24: 257-260.
Womack JE, Johnson JS, Owens EK, Rexroad III CE, Schläpfer J, and Yang Y. (1997) A whole genome radiation panel for bovine gene mapping. Mammalian Genome 8: 854-856.
Yerle M, Pinton P, Robic A, Alfonso A, Palvadeau Y, Delcros C, Hawken R, Alexander L, Beattie C, Schook L, Milan D, and Gellin J. (1998) Construction of a whole-genome radiation panel for high-resolution gene mapping in pigs. Cytogenetics and Cell Genetics 82: 182-188.

Acknowledgements

I am happy to have met so many nice people during my time as a PhD student!
Hans Ellegren. I would like to express my sincere thanks for your guidance over the past years, for being helpful when needed, for fun, for giving me free space, and for being an immeasurable, natural source of inspiration. I have learnt so much from you!
Matthew Binns. Thank you so much for letting me work with you and your coworkers (thanks to all of you!) in a well-functioning lab. It has meant a lot to me to get the opportunity to work in another lab during my time as a PhD student. I have really enjoyed it! You must have noticed, as I keep coming back… I have had the opportunity to see lots of things in England, some of which I remember more than others, like the Sanger Centre, proper English pubs, sterile bacterial work(!), some really good places in London and very much alive young thoroughbreds during training in the early morning… probably the best place of all to look at world-class race horses.
Kaj Sandberg. My co-supervisor and the best person there is to collaborate with. I can't think of any better collaborator. Thank you!
Leif Andersson. You have been like an anchor for me in this project and answered lots of questions during these years - until the very end. It is a delight to hear you speak about science.
Matthew Breen, for introducing me to the world of chromosomes. I'm not sure if I could agree about the English bulldog thing though!
June Swinburne, for giving me the opportunity to learn what efficiency is (I will do my best to adopt at least some of it) and for the small chats about our dearest quadruped friends.
Simon, for commenting on this thesis and giving very good and valuable suggestions for improvements – thank you for that. Good luck with the blue tits among other things!
Anna-Karin Fridolfsson. My office mate for a couple of years. Always with a fresh mind. Thanks for the most valuable comments you gave on this thesis. Good luck with your hobbies – and hold on to them! No doubt you will though…
Terje Raudsepp, for our talks about horses and for teaching me what their chromosomes are like.

Gudrun Wieslander and Ulla Gustavsson – thanks for all your expertise. To all the people at my previous department, Animal Breeding and Genetics. Anna Härlid, for creating such a nice atmosphere around you. Øystein, for being Øystein. Rose-Marie, for your experience and expertise. Lori, for being a very nice person. Helping with all different kinds of things – and solving them in most cases. Sofia, for your fresh and open mind! Linda, I’m sure everything will go very well for you – keep on working. I’m looking forward to seeing “Fame” with your dog. Jesper, for being a critical scientist. Hannah, for your nice engagement in things! Carles, the man with the big smile. The “FlyTech-group” Glenn, Thomas and Jon, for bringing a good atmosphere! Henrik, for always being genuinely nice - you are missed here, but I’m sure you are having a good time in Norway. Anna-Karin, Eva and Jenny, for making our department an enjoyable place. Our students Christine and Sara. The new members of our group, Helene and Anders.

Anki and Susanne - I want to give special thanks to both of you for doing all the intensive sequencing during the last period of my PhD studies. Adrienne, for always listening and for your advice through these years. Dick (thanks for helping me with formatting the manuscript in this thesis), Kristina, Anna K, Marta, Hege and Martin - my friends. Anna Rosander, for being so nice, and you certainly know how to make the best cakes… Lotta, thanks for all the small talks we have had, especially lately, being in the same situation as me with a dissertation in the middle of May. You certainly know how to make a party… restart? Anki Weibull, for being nice and helpful regarding horses or other things. Sofia Wretblad – shall we go swimming? Or just go to the sauna maybe… Christina – my dearest friend. Annika Dahlén, for good company and chats at Ultuna as well as at conferences all over the world, including massive shopping, for instance the bargain of lamps at Tiffany’s in Minneapolis… very handy on the flight back home. Annica Pontén, especially for the evenings out in Stockholm. To all the nice people I got to know here at EBC! Annika, for being a great friend. To all the people I met and collaborated with through the Equine Gene Mapping Workshops held at several places around the world. To my family, Nils and Charlotte – probably among the best parents there are to get. And to the rest of you - Ina, Jois, Emma and Fredrik!
