
THE ISOLATION AND CHARACTERISATION

OF COSMID-DERIVED CANINE

MICROSATELLITE DNA SEQUENCES

Thesis submitted for the degree of

Doctor of Philosophy

at the University of Leicester

by

Nicola Mary Suter BSc (Keele)

Department of Biochemistry

University of Leicester

February 1998

UMI Number: U104146

All rights reserved


UMI U104146 Published by ProQuest LLC 2013. Copyright in the Dissertation held by the Author. Microform Edition © ProQuest LLC. All rights reserved. This work is protected against unauthorized copying under Title 17, United States Code.

ProQuest LLC, 789 East Eisenhower Parkway, P.O. Box 1346, Ann Arbor, MI 48106-1346

To my brother Jonathan, for being himself

ACKNOWLEDGEMENTS

I would like to acknowledge several people whose help and support have enabled me to produce this thesis: Dr. Sue Adams, for invaluable practical help, advice and support in the laboratory; my supervisor, Dr. Jeff Sampson, for help and advice with reports and talks, and for encouragement when times were hard; Rachael Thomas, for her dedicated help and criticism with my thesis; and James and my parents, for their patience, understanding and financial support. I would also like to thank The Health Trust, particularly Dr. Matthew Breen for his help with both the practical and written work contributing to Chapter 4 of my thesis, and Ed Ryder for his help with linkage analysis. Finally, I wish to acknowledge Pedigree Petfoods for their financial contribution.

CONTENTS

Title page
Dedication
Acknowledgements
Contents
Figures
Tables
Abstract
Abbreviations

1.0 CHAPTER 1
INTRODUCTION
1.1 The origins of the domestic dog
1.2 The need for a map
1.3 Markers for map production
1.4 Microsatellites
1.4.1 Microsatellites associated with disease
1.4.2 The function of microsatellites
1.4.3 The potential and limitations of microsatellites
1.5 Linkage markers in the canine genome
1.6 Principles of linkage mapping
1.7 studies of the Canidae
1.8 Physical mapping and the use of fluorescence in situ hybridisation (FISH)
1.8.1 FISH
1.8.2 Probes for FISH
1.8.3 Applications of FISH
1.8.4 Physical mapping
1.8.5 Physical mapping of the canine genome
1.9 Comparative mapping
1.10 Combining the maps
1.11 Research aims

2.0 CHAPTER 2
GENERAL SOLUTIONS
METHODS
2.1.0 Identification of cosmids and fragments containing putative microsatellites
2.1.1 Partial canine cosmid library
2.1.2 Hedgehog blot production
2.1.3 Probing for cosmid colonies containing microsatellites - Radioactive method
2.1.4 Probing for cosmid colonies containing microsatellites - ECL method
2.1.5 Isolation of putative positive clones
2.1.6 Small-scale Alkaline Lysis Method
2.1.7 Large-scale Alkaline Lysis Method
2.1.8 Quantification of DNA
2.1.9 Digestion of DNA with restriction endonucleases
2.1.10 Separation of putative positive clone fragments digested with a restriction endonuclease
2.1.11 Agarose gels
2.1.12 Fluorescein d-UTP endlabelling of DNA molecular weight markers
2.1.13 Southern blotting
2.1.14 Putative microsatellite repeat-containing fragment identification
2.1.15 Band elution from agarose gels
2.1.16 Preparation of DNA fragments for subcloning
2.2.0 Subcloning
2.2.1 Production of competent cells (XL-1 Blue)
2.2.2 Vector preparation of KS+
2.2.3 Treatment of vectors with phosphatase
2.2.4 Ligation of inserts into vectors
2.2.5 Transformation of KS+ ligated vectors into competent XL-1 Blue MRF' cells
2.2.6 Plating of XL-1 Blue containing the KS+ vector
2.2.7 DNA production of insert-containing KS+ vectors
2.2.8 Identification of containing positively-hybridising sequences
2.3.0 dsDNA sequencing and primer design
2.3.1 Manual sequencing
2.3.2 Automatic sequencing
2.3.3 Designing primers for PCR of microsatellites
2.3.4 Precipitation and quantification of primers
2.4.0 Polymerase chain reaction (PCR)
2.4.1 PCR reaction conditions
2.4.2 Optimising PCR conditions
2.5.0 Extraction of canine genomic DNA from blood and blood-clots
2.6.0 Determination of polymorphic microsatellites and their subsequent analysis across the DogMap reference families
2.6.1 Assessment of the polymorphic properties of microsatellite repeats
2.6.2 Radioactive endlabelling of forward PCR primers
2.6.3 Re-optimisation of PCR conditions
2.6.4 Sequencing gel preparation
2.6.5 Reference family typing
2.7.0 Fluorescence in situ hybridisation (FISH)
2.7.1 Cosmid labelling and cleaning
2.7.2 Cosmid preparation for hybridisation to chromosome spreads
2.7.3 Slide preparation for hybridisation of labelled cosmids
2.7.4 Hybridisation of cosmids to chromosome spreads
2.7.5 Post hybridisation washing and immunocytological detection of hybridisation sites
APPENDIX

3.0 CHAPTER 3
ISOLATION AND SEQUENCING OF MICROSATELLITES
3.1 Summary
3.2 Introduction
Flow chart to show microsatellite isolation and polymorphism detection
3.3 Results
3.3.1 Cosmid screening for microsatellite repeats
3.3.2 Digestion and Southern blotting of possible microsatellite-containing cosmids
3.3.3 Subcloning of fragments containing putative repeats
3.3.4 Sequencing of plasmids containing putative microsatellite repeats
3.3.5 PCR of microsatellite repeats
3.3.6 Summary of loss of microsatellite repeats from initial pool
3.3.7 BLAST search results
3.4 Discussion
3.4.1 Partial canine cosmid library
3.4.2 Methods of isolation
3.4.3 Selection stringency and ECL
3.4.4 Microsatellite duplications
3.4.5 Repeat sequences
3.4.6 Assessment of the polymorphic properties of the microsatellite repeats
3.4.7 BLAST search results

4.0 CHAPTER 4
FLUORESCENCE IN SITU HYBRIDISATION OF COSMIDS CONTAINING POLYMORPHIC MICROSATELLITES
4.1 Summary
4.2 Introduction
4.3 Results
4.3.1 FISH of cosmids containing polymorphic microsatellites to canine
4.3.2 Identification of chromosomes to which cosmids hybridise
4.3.3 The centromeric repeat-containing cosmid, 2A11
4.3.4 Cosmid 1E7 and the pseudoautosomal region
4.4 Discussion
4.4.1 FISH results
4.4.2 Centromeric cosmid 2A11
4.4.3 1E3/2A6 hybridisation to chromosome 18q21
4.4.4 1E7 - the pseudoautosomal cosmid

5.0 CHAPTER 5
GENETIC LINKAGE OF MICROSATELLITES
5.1 Summary
5.2 Introduction
5.3 Results
5.3.1 Optimisation of PCR conditions using radioactively-labelled primers and sequencing gels
5.3.2 Genotyping the reference families
5.3.3 Allele non-amplification of microsatellite 1B10
5.3.4 Genetic linkage analysis of microsatellites
5.4 Discussion
5.4.1 PCR optimisation
5.4.2 Informativeness of microsatellites for reference family analysis
5.4.3 Genotyping the reference families
5.4.4 Mutations
5.4.5 Linkage analysis of reference family data
APPENDIX
(i) Table of reference family data
(ii) Linkage analysis results - Twopoint analysis
(iii) Initial multipoint linkage analysis results of marker groups (build option)
(iv) Further analysis of ordered linkage groups using the all option

6.0 CHAPTER 6
6.1 GENERAL DISCUSSION
6.1.1 Project aims and achievements
6.1.2 The progress of the human genome map
6.1.3 The canine genome map
6.1.4 Comparative mapping
6.1.5 The future of canine genome mapping
6.2 Future work
Conclusions

GENERAL APPENDIX
(i) Sequences of microsatellite-containing subclones, indicating sequence used in BLAST searches, primer sequences and repeat sequences
(ii) Results of BLAST searches which gave significant matches to database sequences

REFERENCES

FIGURES

CHAPTER 1
1.1 Canine karyotype

CHAPTER 3
3.1a Radioactively probed cosmid hedgehog blot (CA probe)
3.1b Fluorescently labelled cosmid hedgehog blot (GAAA probe)
3.2a Ethidium Bromide stained agarose gel showing BamHI restriction fragments of cosmid clones containing (CA)n microsatellites
3.2b Southern blot of gel in Figure 3.2a probed with a fluorescently labelled (CA)10 probe
3.3a Ethidium Bromide stained agarose gel showing Sau3AI digests of cosmids containing (CA)n microsatellites
3.3b Southern blot of gel in Figure 3.3a probed with fluorescently labelled (CA)10 probe
3.4a Ethidium Bromide stained agarose gel showing Sau3AI digests of cosmids containing (GAAA)n microsatellites
3.4b Southern blot of gel in Figure 3.4a probed with fluorescently labelled (GAAA)6 probe
3.5 Hedgehog blot of plasmid subclones showing subclones containing (CA)n microsatellites using a fluorescently labelled (CA)10 probe
3.6 Alternative positive subclone identification method using a fluorescently labelled (CA)10 probe
3.7 Manual sequence of 1B7 GAAA repeat
3.8 Alleles of a microsatellite after PCR on twelve breeds of dog

CHAPTER 4
4.1 FISH results of cosmid hybridisation to canine chromosome spreads
4.2 Identification of hybridising chromosome, and precise localisation of cosmids on each chromosome
4.3a Chromosome paint hybridising to chromosomes 8 and 11, with cosmid probe 2A7
4.3b Chromosome paint hybridising to chromosome 7 with cosmid 1B10
4.3c Chromosome paint hybridising to chromosome 30 with cosmid 1F11
4.4a Dual hybridisation of cosmids 1E3 and 2A6 hybridising to chromosome 18
4.4b DAPI-banded chromosome 18
4.5a Ethidium Bromide stained agarose gel showing restriction patterns of cosmid 2A11 with different restriction endonucleases
4.5b Southern blot of gel in Figure 4.5a probed with fluorescently labelled (CA)10 probe, showing the fragment containing the microsatellite
4.6a Chromosome spread showing PstI fragment containing 2A11 microsatellite hybridising to chromosome 6
4.6b DAPI-banded spread indicating chromosomes where fragment hybridised
4.6c Enlargement of DAPI-banded chromosome pair for confirmation of chromosome 6 identification
4.7 Families used to confirm presence of 1E7 microsatellite in the pseudoautosomal region

CHAPTER 5
5.1a Non-specific PCR product of 2A7 at 64°C
5.1b Non-specific PCR product of 2A7 at 66°C with loss of positive control
5.2 DogMap reference families
5.3a Parents of reference families at locus 1B7
5.3b 1D6
5.3c 1E7
5.3d 1E12
5.3e 1F11
5.3f 2A6
5.3g 2A11
5.3h 2D2
5.3i 2H12
5.3j 1B10
5.3k 1E3
5.3l 1F5
5.4a Example of Family 1 results with microsatellite 1F11
5.4b Example of Family 2 results with microsatellite 1E7
5.4c Example of Family 3 results with microsatellite 1F11
5.4d Example of Family 6 results with microsatellite 1E3
5.5a Family 7 with 2D2 microsatellites showing single base-pair size difference between alleles
5.5b Microsatellite 2H12 showing a spontaneous mutation in Family 8
5.6 Detection of previously unidentified allele for 1B10 using alternative primer sets
5.7 Most likely linkage group orders with centiMorgan distances

TABLES

CHAPTER 3
3.1 Repeat lengths and types of isolated microsatellites

CHAPTER 4
4.1 Chromosome locations of cosmids containing microsatellites

CHAPTER 5
5.1 PCR conditions and primer sequences of isolated, polymorphic microsatellites
5.2 Microsatellite allele sizes, PIC values, alleles present in reference families and number of heterozygous parents
5.3 Linkage results showing groups of linked markers

CHAPTER 6
6.1 General results summary

THE ISOLATION AND CHARACTERISATION OF COSMID-DERIVED CANINE MICROSATELLITE DNA SEQUENCES

Nicola Mary Suter

ABSTRACT

Microsatellites are the most commonly used DNA sequences in genetic mapping studies because of their frequency, distribution and polymorphic properties. (GAAA)n and (CA)n microsatellites were isolated from a partial canine cosmid library using (GAAA)6 and (CA)10 probes. Of a total of 46 cosmids, 25 were subcloned and sequenced, revealing the presence of microsatellite repeats. Oligonucleotide primers flanking the repeat were designed for 17 microsatellites. A total of thirteen cosmid-derived repeats (nine (CA)n and four (GAAA)n) were found to be polymorphic across a panel of 12 breeds of dog. Twelve of the cosmids from which the repeats were derived were localised to eleven separate canine chromosomes by fluorescence in situ hybridisation. Two were assigned to a similar location on canine chromosome 18, and one to the telomeric end of the pseudoautosomal region of the canine sex chromosomes. The remaining nine were assigned to canine chromosomes 2, 3, 6, 7, 8, 9, 20, 24 and 30.

Eleven of the thirteen markers were mapped in a set of canine reference families. Eight linkage groups including nine of these markers were revealed; four of these extended existing linkage groups and the remainder formed novel linkage groups. Five linkage groups could be newly assigned to canine chromosomes. Analysis of the DNA sequence surrounding one repeat showed high homology to the 3' UTR of both bovine and ovine IGF II sequences, suggesting possible synteny between canine chromosome 18, to which IGF II has been physically mapped, and human, bovine and ovine chromosomes 11, 25 and 21 respectively.

The genetic and physical mapping in this study extends both the canine map as a whole and the integration of the genetic and physical maps of the domestic dog.

ABBREVIATIONS

A - Adenine

AFLP - Amplified fragment-length polymorphism

APLP - Amplified product length polymorphism

AP-PCR - Arbitrarily-primed polymerase chain reaction

APS - Ammonium persulphate

ATP - Adenosine Triphosphate

BAA - Biotinylated Anti-Avidin D

BAC - Bacterial Artificial Chromosome

bp - base-pair

BSA - Bovine Serum Albumin

C - Cytosine

can-SINE - canine Short Interspersed Nucleotide Element

CATS - Comparative Anchor-Tagged Sequence

CEPH - Centre d'Etude du Polymorphisme Humain

cDNA - complementary Deoxyribonucleic Acid

Ci - Curie

CIAP - Calf Intestinal Alkaline Phosphatase

CTP - Cytidine Triphosphate

d - 2'-deoxyribo

DAPI - 4',6-Diamidino-2-Phenylindole Dihydrochloride Hydrate

dH2O - distilled, deionised water

DIG - Digoxigenin

DNA - Deoxyribonucleic Acid

dNTP - mix of dATP, dCTP, dGTP and dTTP

DNase - Deoxyribonuclease

ds - double-stranded

DTT - Dithiothreitol

EDTA - Disodium Ethylenediaminetetraacetic Acid

EST - Expressed Sequence Tag

FISH - Fluorescence In Situ Hybridisation

FITC - Fluorescein Isothiocyanate

g - gram

G - Guanine

GET - Glucose EDTA Tris

GTP - Guanosine Triphosphate

hr - hour

IMS - Industrial Methylated Spirit

IPTG - Isopropyl-β-D-thiogalactopyranoside

Kb - Kilobase

kW - Kilowatt

λ - lambda

l - Litre

LINE - Long Interspersed Nucleotide Element

LB - Luria Broth

MER - Medium Reiteration sequence

m - milli

M - Molar

MI - Microsatellite Instability

µ - micro

nm - nanometer

OD - Optical Density

PCR - Polymerase Chain Reaction

PEG - Polyethylene Glycol

PNACL - Protein and Nucleic Acid Chemistry Laboratory

RAPD - Randomly Amplified Polymorphic DNA

RBC - Red Blood Cell

RFLP - Restriction Fragment Length Polymorphism

RNA - Ribonucleic Acid

RNase - Ribonuclease

rpm - revolutions per minute

SCEUS - Smallest Conserved Evolutionary Segment

SDS - Sodium Dodecyl Sulphate

SINE - Short Interspersed Nucleotide Element

SSC - Tri-Sodium Citrate and Sodium Chloride

SSCP - Single-Stranded Conformation Polymorphism

SSM - Slipped-Strand Mispairing

STS - Sequence-Tagged Site

T - Thymine

TBE - Tris-Borate-EDTA

TBq - Terabecquerel

TE - Tris EDTA

TEMED - N,N,N',N'-Tetramethylethylenediamine

TNE - Tris Sodium EDTA

Tris-HCl - Tris[hydroxymethyl]aminomethane hydrochloride

TTP - Thymidine Triphosphate

UM-STS - Universal Mammalian Sequence-Tagged Site

UTP - Uridine Triphosphate

UTR - Untranslated Region

UV - Ultraviolet

V - Volts

VNTR - Variable Number Tandem Repeat

vol - volumes

w/v - weight for volume

X-Gal - 5-Bromo-4-chloro-3-indolyl-β-D-galactopyranoside

YAC - Yeast Artificial Chromosome

/ - per

2X - Two times

CHAPTER 1

1.0 INTRODUCTION

1.1 The origins of the domestic dog

The dog family, Canidae, consists of 34 species, whose success is illustrated by their ability to adapt to vastly different climates and feeding habits. The canids belong to the order Carnivora, which also includes cats, bears and seals, and comparative chromosome studies have helped to elucidate the relationships among them. The canids have a highly variable diploid chromosome number, ranging from 36 in the red fox to 78 in the domestic dog and its closest relatives, such as the grey wolf, the jackal and the coyote (Wayne 1993).

Mitochondrial DNA studies suggest that canine domestication began more than 100,000 years ago (Morell 1997)(Vila et al. 1997). This research indicated that today's domestic dogs are descended from two original, separate domestication events, followed by two further dog/wolf matings, resulting in four separately evolving groups, or clades, of domestic dog. Because mutation rates in mitochondrial DNA vary, the evidence behind this date is not incontrovertible, but it strongly suggests that domestication took place significantly earlier than the 14,000 years ago indicated by archaeological evidence (Morell 1997)(Vila et al. 1997).

Today's domestic dogs show diverse morphology, with each breed possessing its own physical and behavioural characteristics. The inbreeding used to produce today's pedigree breeds has left most breeds with one or more recessively inherited diseases, often as a result of the 'founder effect'. If a particular dog has the characteristics required of its breed, it is bred from in order to produce as many offspring with these preferred characteristics as possible. Dogs with the required characteristics are then bred together until the 'ideal' individual is produced. Should the original stud dog be a carrier of a recessively inherited disease, breeding from that individual will increase the frequency of the defective gene within the population. As the offspring of this stud dog are bred together to maintain the breed characteristics, some of the resulting individuals may inherit two mutant alleles and show the recessively inherited disease phenotype. To enable identification of dogs that carry these diseases, there is clearly a need to produce a map of the canine genome.
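The risk created by breeding together the offspring of a carrier stud dog can be made concrete with simple Mendelian arithmetic. The sketch below is illustrative only: it assumes a carrier sire (Aa) mated to unrelated non-carrier dams (AA), followed by matings between two of his offspring, and the function names are hypothetical. Exact fractions from Python's standard library are used so that the probabilities are not rounded.

```python
from fractions import Fraction

# Illustrative assumptions, not data from this study:
# the stud dog is a carrier (Aa) of a recessive allele a, every dam is a
# non-carrier (AA), and the carrier status of different pups is independent.
P_PUP_CARRIER = Fraction(1, 2)       # Aa x AA: each pup is Aa with probability 1/2
P_AA_FROM_CARRIERS = Fraction(1, 4)  # Aa x Aa: each pup is aa with probability 1/4


def p_affected_half_sib_mating(p_carrier=P_PUP_CARRIER):
    """Probability that a pup from a mating of two of the stud dog's offspring is affected."""
    # both parents must be carriers, and the pup must then receive a from each
    return p_carrier * p_carrier * P_AA_FROM_CARRIERS


def expected_affected(litter_size):
    """Expected number of affected pups in a litter from such a mating."""
    return litter_size * p_affected_half_sib_mating()


print(p_affected_half_sib_mating())  # 1/16
print(expected_affected(8))          # 1/2
```

Under these assumptions no affected pups can arise from the stud dog's own matings to non-carrier dams; the disease phenotype first appears when his offspring are bred together.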

1.2 The need for a genome map

Production of a map of the canine genome is essential if any area of canine genetic research is to proceed efficiently. Such a map has many potential uses, including the identification of disease-causing genes, at least one of which is present in most pedigree dog populations (Patterson 1995). More than 300 different genetic diseases are currently known amongst more than 200 breeds of dog, 73% of which are recessively inherited: 68% of all the diseases show autosomal recessive and 5% X-linked recessive inheritance (Patterson 1995). Examples of these recessively inherited diseases include copper toxicosis in Bedlington Terriers, around 25% of which are affected (Yuzbasiyan-Gurkan et al. 1997), canine X-linked severe combined immunodeficiency (SCID) (Deschenes et al. 1994) and progressive retinal atrophy, a late-onset blindness commonly found in Labradors and dachshunds (Gould et al. 1995).

A genome map assists the development of tests for disease gene carriers. Once carriers and affected dogs can be identified by the presence of a particular polymorphic marker allele, they can be excluded from breeding programmes, and many recessively inherited canine diseases may eventually be eradicated. Some diseases of dogs have equivalents in other animals; X-linked SCID, for example, shows similar characteristics in humans and dogs. These include failure of the young to thrive and increasing susceptibility to chronic infections later in life. In common with the human form of the disease, affected dogs also show morphologically abnormal thymuses and B lymphocytes that fail to produce specific antibodies. Such similarities provide an opportunity for comparative mapping to identify the disease gene in dogs. Once the disease defect has been confirmed as being of the same origin in both dogs and humans (Deschenes et al. 1994), it may be possible to use the dog as a model for the human disease, enabling therapies and treatments to be developed.

A map also helps to identify the genetic basis of the vast morphological and behavioural differences between breeds of dog (Ostrander et al. 1993) by providing markers whose alleles may be inherited with characteristics such as tail size or a love of water. Wider uses include maintaining the diversity of closely related endangered species such as wolves and jackals (Ostrander et al. 1993), the Ethiopian wolf (Gottelli et al. 1994) and the Mexican wolf (García-Moreno et al. 1996). This is done by comparing marker similarity between individuals and allowing only those which are less closely related to produce offspring. Gene-specific markers isolated to produce the map can be used to study other species (Haberfeld et al. 1991), and vice versa. A map can therefore be used as an aid to breeding less closely related, and therefore healthier, dogs within pedigrees (Ostrander et al. 1993). A further use is in the analysis of the canid family and assessment of the extent of their evolutionary relationships (Wayne 1993). These uses are discussed in greater detail, with respect to the markers used, in a later section.
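Marker similarity between two individuals can be measured in several ways. One simple measure, shown here as an illustrative sketch rather than the method used in the studies cited, is the proportion of alleles shared between two dogs, averaged over all loci typed in both. Genotypes are represented as pairs of allele sizes; the function names, marker names and allele sizes are hypothetical.

```python
from collections import Counter


def shared_alleles(g1, g2):
    """Number of alleles (0, 1 or 2) shared by two diploid genotypes at one locus."""
    # Counter intersection keeps the minimum count of each allele in the two genotypes
    return sum((Counter(g1) & Counter(g2)).values())


def allele_sharing(dog1, dog2):
    """Mean proportion of alleles shared over the loci typed in both dogs."""
    loci = sorted(set(dog1) & set(dog2))
    if not loci:
        raise ValueError("no loci typed in both dogs")
    return sum(shared_alleles(dog1[locus], dog2[locus]) / 2 for locus in loci) / len(loci)


# Hypothetical genotypes: allele sizes (bp) at three invented marker loci
dog_a = {"M1": (150, 152), "M2": (201, 205), "M3": (98, 98)}
dog_b = {"M1": (150, 152), "M2": (205, 209), "M3": (100, 102)}
print(allele_sharing(dog_a, dog_b))  # 0.5
```

In a breeding programme of the kind described above, pairs with lower allele sharing would be preferred as mates; a measure that also weights alleles by their population frequency would be a natural refinement.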

The development of maps for most other animals is usually directed specifically to the requirements of the species in question. The maps of economically important species such as cattle, sheep, pigs and chickens concentrate on characteristics such as meat production, milk yield and growth rate. Similar characteristics are prioritised in commercially important crop species such as wheat and barley. The purpose of the human genome mapping project is to understand gene action in all its aspects, from the genetic basis of disease to the mechanism of ageing, and to improve understanding of human evolutionary heritage (O'Brien 1991). Combined with the maps of other animals, this may eventually allow a complete history of the evolution of life on earth to be established. The human genome mapping project is well under way, with sufficient markers found and mapped to enable the mapping of any monogenically inherited disease (Dib et al. 1996).

There are two types of genome map: physical maps and linkage maps. A physical map is based on physically hybridising markers to chromosomes, whereas a linkage map groups markers according to the frequency with which their alleles recombine during inheritance. Markers close together on the same chromosome tend to be inherited together more often than those on different chromosomes or far apart on the same chromosome. By following the inheritance of marker alleles within families, markers can be grouped according to the tendency of their alleles to be inherited together. Eventually, the number of linkage groups should equal the number of chromosomes in the species under study. This, however, rests on two assumptions. Firstly, the linked markers must, taken together, originate from every chromosome of the animal in question; secondly, there must be no gap between markers on the same chromosome large enough to split that chromosome into more than one linkage group. Two markers at opposite ends of the same chromosome may not be genetically linked, and if the nearest markers of two groups are too far apart to show linkage, the groups will be assumed to belong to different chromosomes. Thus, for a complete view of the genome, both types of map need to be developed.
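The grouping of markers by co-inheritance is normally quantified with a recombination fraction (theta) and a LOD score. The sketch below is a simplified illustration, not the analysis used later in this thesis: it assumes phase-known, fully informative meioses so that recombinants can simply be counted, and it uses the Haldane mapping function and a LOD threshold of 3 as conventional choices. The function names, marker names and counts are hypothetical.

```python
import math


def lod_score(recombinants, meioses):
    """LOD for linkage at the estimated theta versus free recombination (theta = 0.5).

    Assumes phase-known, fully informative meioses, so recombinants are counted directly.
    """
    r, n = recombinants, meioses
    theta = min(r / n, 0.5)  # theta is restricted to the range [0, 0.5]
    log_l = 0.0
    if r > 0:
        log_l += r * math.log10(theta)
    if n - r > 0:
        log_l += (n - r) * math.log10(1 - theta)
    return log_l - n * math.log10(0.5)


def haldane_cm(theta):
    """Map distance in centiMorgans from a recombination fraction (Haldane function)."""
    if theta >= 0.5:
        return math.inf
    return -50.0 * math.log(1 - 2 * theta)


def linkage_groups(markers, pair_counts, lod_threshold=3.0):
    """Group markers transitively: any pair with LOD >= threshold joins the same group."""
    parent = {m: m for m in markers}

    def find(m):
        while parent[m] != m:
            parent[m] = parent[parent[m]]  # path halving keeps the trees shallow
            m = parent[m]
        return m

    for (a, b), (r, n) in pair_counts.items():
        if lod_score(r, n) >= lod_threshold:
            parent[find(a)] = find(b)

    groups = {}
    for m in markers:
        groups.setdefault(find(m), []).append(m)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical counts: (recombinants, informative meioses) for each marker pair
markers = ["A", "B", "C", "D"]
pair_counts = {("A", "B"): (2, 40), ("B", "C"): (5, 40),
               ("C", "D"): (20, 40), ("A", "D"): (19, 40)}
print(linkage_groups(markers, pair_counts))  # [['A', 'B', 'C'], ['D']]
print(round(haldane_cm(2 / 40), 2))          # 5.27
```

Here A and C are placed in one group only through their common linkage to B, which mirrors how a linkage group is built up from overlapping pairwise linkages; marker D shows no significant linkage and remains a group on its own.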

1.3 Markers for map production

To produce a map of any species, markers need to be placed throughout the genome as reference points. There are two types of marker, known as Type I and Type II anchor loci. Type I markers are evolutionarily conserved loci, such as genes. Because these loci are conserved, those already placed on the well-developed map of one species can be used to locate genes on the less developed map of another, speeding the construction of the latter. Several sets of gene-specific markers have been developed for this purpose, one of which is the universal mammalian sequence-tagged sites (UM-STSs) (Venta et al. 1996). This set consists of gene-specific markers chosen for their conservation between species and their even spacing within the species in which they have been mapped. These markers are being applied to the canine genome (Venta et al. 1996), and each will be physically assigned to a chromosome, aiding physical map production. However, a map consisting solely of genes would not be very informative for genetic linkage mapping within a species, because genes are not usually polymorphic. Linkage mapping within a species requires markers with more than one allele, and this is where Type II anchor loci are needed. Good Type II anchor loci must be polymorphic, stably inherited, numerous and evenly spaced within the genome. Type II markers are usually found in non-coding regions of the genome, where polymorphism is much more common. There are several kinds of Type II marker:

Restriction fragment length polymorphisms (RFLPs) These polymorphisms are based on the presence or absence of restriction endonuclease sites within the genome, which leads to the production of DNA fragments of different lengths. The fragment lengths depend both on the enzyme used and on whether a particular site has been created or destroyed by mutation, and RFLPs were formerly the markers used in linkage map production (Botstein et al. 1980). A variation on RFLPs is amplified fragment length polymorphism (AFLP), in which genomic DNA is digested with restriction enzymes, linkers are added to each end of the fragments, and the fragments are amplified using the polymerase chain reaction (PCR) with primers designed from the linker sequence (Vos et al. 1995). The amplified fragments are then separated on denaturing polyacrylamide gels, and differences in banding pattern between individuals are studied. Other markers based on natural variation between individuals include randomly amplified polymorphic DNA (RAPD) (Bielawski and Pumo 1997) and arbitrarily-primed PCR (AP-PCR) (Welsh and McClelland 1990), the latter of which has been used in dogs (Ezer et al. 1996). Both of these methods use arbitrary primer sequences to produce a number of PCR products, which are analysed on gels. A more complicated extension of these methods, known as amplified product length polymorphism (APLP) (Watanabe et al. 1997), requires extensive sequence knowledge and uses allele-specific primers to detect polymorphisms.
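The principle behind an RFLP can be illustrated with an in silico digest: a single base change that destroys a restriction site merges two fragments into one. The sketch below assumes the BamHI recognition site GGATCC, which BamHI cuts after the first G; the function name and both sequences are invented for illustration, and real RFLP typing relies on gel electrophoresis and Southern blotting rather than string searching.

```python
def digest(seq, site="GGATCC", cut_offset=1):
    """Fragment lengths after cutting seq at every occurrence of a recognition site.

    cut_offset is where the top strand is cut within the site: 1 for BamHI (G^GATCC).
    """
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


# Two invented alleles; in allele_2 a C->T change destroys the second BamHI site
allele_1 = "A" * 20 + "GGATCC" + "T" * 30 + "GGATCC" + "C" * 15
allele_2 = "A" * 20 + "GGATCC" + "T" * 30 + "GGATTC" + "C" * 15
print(digest(allele_1))  # [21, 36, 20]
print(digest(allele_2))  # [21, 56]
```

On a gel, an individual heterozygous for these two alleles would show all four fragment sizes, which is what makes the site polymorphism usable as a marker.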

Single-stranded conformation polymorphisms (SSCPs) These polymorphisms are detected on non-denaturing gels: a polymorphism that changes the secondary structure of single-stranded DNA also changes the speed at which the DNA migrates (Lucas et al. 1997).

Variable number tandem repeats (VNTRs) These repeats are the most commonly used linkage markers. They consist of units of DNA repeated a number of times in tandem. The number of repeat units at any one locus can vary between individuals, and the extent of this variation depends upon both the length of the repeat and the specific repeat unit involved. Polymorphic minisatellites are hypervariable repeat sequences consisting of tandemly repeated units 10 to 250 base pairs in size, and are used in the production of individual-specific DNA fingerprints (Jeffreys et al. 1985). The VNTRs most commonly used today are microsatellites (Tautz and Renz 1984), which are short stretches of tandemly repeated units of 1 to 5 nucleotides.
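Because a microsatellite is simply a short motif repeated in tandem, perfect repeats can be located in a sequence with a regular expression that uses a back-reference. This is a minimal sketch with hypothetical function names: the minimum of five repeat units is an arbitrary threshold chosen for illustration, not a criterion used in this study, and only perfect, non-overlapping runs are reported. Motifs that are themselves repeats of a shorter unit (for example CACA) are discarded so that each run is reported once, under its shortest unit.

```python
import re


def is_composite(motif):
    """True if motif is itself a repeat of a shorter unit (e.g. 'CACA' = 'CA' * 2)."""
    n = len(motif)
    return any(n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n))


def find_microsatellites(seq, min_unit=1, max_unit=5, min_repeats=5):
    """Return (start, motif, repeat_count) for perfect tandem repeats in seq."""
    seq = seq.upper()
    hits = []
    for unit in range(min_unit, max_unit + 1):
        # ([ACGT]{unit}) captures one copy of a motif; \1{n,} demands further copies of it
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_repeats - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if not is_composite(motif):
                hits.append((m.start(), motif, len(m.group(0)) // unit))
    return sorted(hits)


# Invented sequence containing a (CA)12 and a (GAAA)6 repeat
seq = "GATTC" + "CA" * 12 + "TTGAC" + "GAAA" * 6 + "CCTG"
print(find_microsatellites(seq))  # [(5, 'CA', 12), (34, 'GAAA', 6)]
```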

Microsatellites are used extensively as markers for the production of linkage maps, and show most of the characteristics required of an ideal genetic marker: they are common and randomly distributed within the genome (Stallings et al. 1991), usually polymorphic and generally stably inherited. The human and mouse genetic maps are both made up largely of microsatellite markers (Dietrich et al. 1996)(Dib et al. 1996).

The other marker types described above are also used, but are less suitable. Minisatellites cluster towards the telomeres of human chromosomes (Royle et al. 1988), and so give uneven coverage of the genome. RFLPs and the variations based on them are less polymorphic than microsatellites, and not all are locus-specific, which makes them unsuitable as genetic map markers. The RFLP variations that use PCR are very sensitive to even slight changes in amplification conditions, which makes their results difficult to interpret.

1.4 Microsatellites

Microsatellites are found in high numbers interspersed throughout many eukaryotic genomes (Tautz and Renz 1984), nearly all of them in non-coding regions. It has been suggested that those found in coding regions may have remained more stable during the evolutionary divergence of mammals, and might therefore be useful for finding microsatellites in many mammalian species (Shibuya et al. 1993). Some repeats are more common in coding regions, especially (CAG)n repeats, whose lengths are constrained so that they do not interrupt the function of the encoded molecule (Hancock 1996); a change in the number of trinucleotide units adds or removes whole codons without shifting the reading frame.

Simple sequences are found in five phylogenetically widespread , but only to a small degree in (Tautz and Renz 1984)(Tautz et al. 1986). On average there is one microsatellite every 6 kilobases in humans, with (CA)n repeats occurring about every 30 kilobases (Stallings et al. 1991)(Beckmann and Weber 1992); in mice, (CA)n repeats occur about every 18 kb (Stallings et al. 1991).
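The spacing figures quoted above translate directly into expected numbers of loci. The back-of-envelope calculation below assumes a haploid human genome of about 3 × 10^9 bp, a round figure that is not taken from the studies cited, and treats microsatellites as evenly spaced, which is only an approximation; the function name is hypothetical.

```python
GENOME_BP = 3_000_000_000  # assumed round figure for the haploid human genome


def expected_loci(genome_bp, mean_spacing_bp):
    """Expected number of loci if one occurs, on average, every mean_spacing_bp."""
    return genome_bp // mean_spacing_bp


all_microsatellites = expected_loci(GENOME_BP, 6_000)  # one microsatellite every ~6 kb
ca_repeats = expected_loci(GENOME_BP, 30_000)          # one (CA)n repeat every ~30 kb
print(all_microsatellites, ca_repeats)  # 500000 100000
```

On these assumptions the human genome would contain roughly half a million microsatellites, of which about 100,000 would be (CA)n repeats.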

Length polymorphism within microsatellites has been both predicted (Tautz et al. 1986) and observed (Tautz 1989)(Weber and May 1989). It is thought to be caused by replication slippage within the repetitive stretches (Tautz and Renz 1984), and may be a major force in sequence evolution. With the use of PCR, the potential of microsatellites as DNA markers for both genome mapping and linkage studies was realised (Litt and Luty 1989)(Tautz 1989)(Weber and May 1989), as was their collective use in individual identity testing (Tautz 1989). Alleles of (CA)n repeats were found to differ in size between individuals by multiples of two bases, suggesting that the variation was due to differing numbers of repeat units. The stable, Mendelian inheritance of their alleles was also observed (Tautz 1989)(Weber and May 1989), leading to their first use in solving parentage disputes (Tautz 1989). Many types of simple-sequence repeat are polymorphic (Tautz 1989) in a wide range of genomes (Gupta et al. 1994), confirming their potential for mapping both plants and animals. Studies on (CA)n repeats have shown that the length of a repeat is related to its polymorphism. In general, repeats of fewer than ten units tend to show little polymorphism, those of 11-15 units show variable polymorphism, and longer repeats show consistently moderate to high polymorphism (Weber 1990).
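The polymorphism of a marker is commonly summarised by its expected heterozygosity and its polymorphism information content (PIC), the latter as defined by Botstein et al. (1980). The sketch below computes both from allele frequencies, assuming Hardy-Weinberg proportions; the function names and the allele frequencies in the example are hypothetical.

```python
from itertools import combinations


def heterozygosity(freqs):
    """Expected heterozygosity, 1 - sum(p_i^2), assuming Hardy-Weinberg proportions."""
    return 1 - sum(p * p for p in freqs)


def pic(freqs):
    """Polymorphism information content (Botstein et al. 1980)."""
    if abs(sum(freqs) - 1) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")
    # subtract the matings in which both parents share the same heterozygous genotype
    correction = sum(2 * p * p * q * q for p, q in combinations(freqs, 2))
    return heterozygosity(freqs) - correction


# Hypothetical loci: a short repeat with 2 alleles and a longer one with 4 alleles
print(pic([0.5, 0.5]))  # 0.375
print(pic([0.25] * 4))  # 0.703125
```

Longer repeats, which tend to have more alleles, would be expected to score higher on both measures.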

The frequency of the different types of repeat varies widely depending upon the genome in question. The GC dinucleotide repeat is underrepresented in all vertebrate genomes (Tautz et al. 1986). This is partly explained by the mutation of cytosine (C) to thymine (T): cytosines in CpG dinucleotides are often methylated, and deamination of the resulting 5-methylcytosine converts it to thymine (Karlin and Burge 1995). The most common, and therefore most studied, form of tandem repeat in animals is (CA)n (Love et al. 1990), whilst in plants (GA)n and (AT)n repeats are more common (Dow et al. 1995)(Panaud et al. 1995), with the (GA)n repeats of plants showing longer average repeat lengths than the (CA)n repeats of mammals (Dow et al. 1995). The average length of (CA)n repeats in cold-water fish is twice that in humans and indeed most mammals (Brooker et al. 1994). Microsatellites are found at low frequency in the avian genome, about one every 30 kilobases; the most common is the (CA)n repeat, which occurs approximately once every 145 kilobases. The small size of the avian genome and its lower density of non-coding sequence, where most microsatellites are found, may explain this lower frequency of microsatellite repeats (Primmer et al. 1997).
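The depletion of CpG dinucleotides described above is usually measured as an observed/expected CpG ratio. The sketch below uses a commonly used definition, (number of CpG × sequence length) / (number of C × number of G), introduced here for illustration; the function name and test sequences are hypothetical.

```python
def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: (CpG count x length) / (C count x G count).

    'CG' cannot overlap itself, so str.count gives the true dinucleotide count.
    """
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)


print(cpg_obs_exp("ACGT" * 10))  # 4.0 (CpG-rich)
print(cpg_obs_exp("CAGT" * 10))  # 0.0 (no CpG at all)
```

A ratio well below 1 indicates that CpG occurs less often than its base composition predicts, as expected for methylated vertebrate DNA.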

There are three categories of microsatellite repeat: perfect, imperfect and compound, the last of which is subdivided into compound perfect and compound imperfect (Weber 1990). Each is defined below:

Perfect (comprising 64% of repeats in humans): an uninterrupted run of one repeat unit, for example (CA)8, CACACACACACACACA, with no interruptions or flanking repeat sequences.

Imperfect (comprising 25% of repeats in humans): for example,

CACACACA(N)1-3CACACA, where 1-3 nonrepeat bases break up two or more runs of similar repeat units, with at least 3 repeats on either side of the 'break'.

Internal repeats must be at least 1.5 units long: (CA)10TACAGT(CA)13 is considered an imperfect repeat sequence, but (CA)10GTACG(CA)11 would be considered to be two separate perfect repeats, (CA)10 and (CA)11.

Compound (comprising 11% of repeats in humans): a run of at least 5 repeats, separated by 3 or fewer consecutive nonrepeat bases from a second run of a different repeat that is also at least 5 repeats long. For example: CACACACACA(N)0-3(CT)7. Compound repeats can be perfect, as shown above, or imperfect, containing 'breaks' within the individual repeat types. (All classifications and repeat frequencies described above follow Weber (1990).)
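The Weber (1990) categories can be sketched as a simple classifier for dinucleotide repeats. This is a deliberately simplified illustration, not a published tool: it finds perfect runs greedily, compares only the first two runs, and ignores the 1.5-unit internal-repeat rule, so it would wrongly call (CA)10TACAGT(CA)13 two separate repeats. All names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Run:
    motif: str  # dinucleotide motif as found, e.g. "CA"
    start: int  # 0-based start position in the sequence
    units: int  # number of complete repeat units

    @property
    def end(self) -> int:
        return self.start + 2 * self.units


def canonical(motif: str) -> str:
    """Treat rotations such as CA and AC as the same repeat type."""
    return min(motif, motif[1] + motif[0])


def find_runs(seq: str, min_units: int = 3) -> list[Run]:
    """Greedy left-to-right scan for perfect dinucleotide runs."""
    runs = []
    i = 0
    while i < len(seq) - 1:
        motif = seq[i:i + 2]
        if motif[0] == motif[1]:  # skip mononucleotide stretches such as "AA"
            i += 1
            continue
        j = i
        while seq[j:j + 2] == motif:
            j += 2
        units = (j - i) // 2
        if units >= min_units:
            runs.append(Run(motif, i, units))
            i = j
        else:
            i += 1
    return runs


def classify(seq: str) -> str:
    """Classify the first two runs using simplified Weber (1990) rules."""
    runs = find_runs(seq)
    if not runs:
        return "no repeat"
    if len(runs) == 1:
        return "perfect"
    a, b = runs[0], runs[1]
    gap = b.start - a.end  # nonrepeat bases between the two runs
    same_type = canonical(a.motif) == canonical(b.motif)
    if same_type and gap <= 3 and a.units >= 3 and b.units >= 3:
        return "imperfect"
    if not same_type and gap <= 3 and a.units >= 5 and b.units >= 5:
        return "compound"
    return "separate perfect repeats"


print(classify("CA" * 8))                         # perfect
print(classify("CACACACA" + "G" + "CACACA"))      # imperfect
print(classify("CA" * 5 + "GG" + "CT" * 7))       # compound
print(classify("CA" * 10 + "GTACG" + "CA" * 11))  # separate perfect repeats
```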

76% of all human microsatellites belong to the group A, CA, AAAN, AAN and GT (in decreasing order of abundance), and 80% of these (excluding CA) are associated with interspersed, repetitive Alu elements (Beckmann and Weber 1992). Only a small proportion of CA repeats are so associated. A and CA repeats are generally found in non-coding sequences (Tautz and Renz 1984)(Weber and May 1989), and CA repeats are also found in coding regions, although these contain only small numbers of repeats (Weber and May 1989). Because A-rich microsatellites tend to be associated with longer repetitive sequences, primer design for polymorphism analysis can be problematic: one primer may fall within the highly repeated Alu element, resulting in variable amplification (Beckmann and Weber 1992).

The use of tetranucleotide repeats as markers has increased because they produce more clearly identifiable alleles than dinucleotide repeats. Some dinucleotide repeats produce stutter bands during PCR, at times to such an extent that the actual allele is difficult to distinguish from the stutter bands. Stutter bands, an artifact of PCR, were observed when PCR was first used to study repeats (Tautz 1989)(Weber and May 1989). In general, the larger the repeat unit, the less a repeat tends to produce stutter bands (Beckmann and Weber 1992). These extra bands are explained in part by the model of slipped strand displacement, the misalignment of single-stranded DNA that can occur if the polymerase falls off during synthesis of the repeat (Schlotterer et al. 1997). This is supported by sequencing of stutter bands, which reveals the loss of one or more repeat units, depending upon the band analysed (Murray et al. 1993)(Walsh et al. 1996). A stutter band can be more intense than the actual allele band and can sometimes result in incorrect genotyping (He et al. 1996). In a comparison of semi-automated genotyping accuracy for di- and tetranucleotide repeats in the human genome, tetranucleotide repeats gave fewer errors, making them more informative in linkage studies (He et al. 1996). Despite the tendency of dinucleotide repeats to produce stutter bands during PCR, (CA)n repeats are sufficiently abundant that those producing numerous or strong stutter bands can be avoided. A combination of marker types, exploiting the abundance of (CA)n repeats and the stability of tri- and tetranucleotide repeats in PCR, would provide a good and abundant set of Type II markers for a comprehensive genome map in any organism.

1.4.1 Microsatellites associated with disease

Although microsatellites are useful because of their stability both within an individual and through generations, general microsatellite instability (MI) has been found to occur in some tumours. MI is a change in the number of repeats, or loss of heterozygosity, at a microsatellite within tumour cells. The instability was detected in hereditary non-polyposis colorectal cancer (HNPCC) tumours, which displayed widespread and frequent alterations in microsatellite lengths. Such widespread instability indicates a deficiency in the correction of DNA replication errors in the malignant tumour, and led to the identification of the defective gene responsible (Jiricny 1994)(Lengauer et al. 1997). Endometrial carcinoma, often associated with HNPCC, also shows MI (Tucker-Burks et al. 1994), as do some sporadic cancers, such as those of the lung, bladder and prostate (Kunzler et al. 1995). MI, whether contributory to tumourigenesis or a by-product of it, is thought to reflect genomic instability and may therefore indicate cells undergoing genetic alteration, possibly leading to cancer or other disease phenotypes (Jiricny 1994). Microsatellite instability may also be used as a 'molecular tumour clock' to follow the progression of some cancers (Shibata 1997).

Microsatellites are more commonly associated with several hereditary, mostly dominantly inherited, diseases. At present, twelve diseases are known to be caused by trinucleotide repeat expansion, eleven of them involving CG-rich repeats (Warren 1996). The diseases can be separated into two classes: those where the expansion is in coding DNA and those where the microsatellite is found in noncoding sequence. When the microsatellite is in a coding region, an expansion of the repeat may disrupt the correct functioning of the gene. For example, the CAG/CTG repeat type is usually found in the coding regions of genes, and the diseases it causes include Spinal and bulbar muscular atrophy, Spinocerebellar ataxia Type 1, Huntington's disease, Dentatorubral-pallidoluysian atrophy and its phenotypic variant Haw River syndrome, and Machado-Joseph disease. These are called type I trinucleotide disorders: the disease phenotype appears when the repeat expands to 2-3 times the normal allele size, which in healthy individuals is up to approximately 30 repeats. The CAG codon encodes glutamine, so the repeat codes for polyglutamine stretches, and expansion changes the function of the protein produced. Disease-causing expansions are transmitted in a dominant fashion and show continued instability in successive generations.

Expanded triplet repeats outside protein-coding regions are tolerated to a much greater extent, with mutant alleles reaching much larger sizes, although the expansion can still greatly affect gene expression (Kunzler et al. 1995). Diseases caused by trinucleotide repeat expansion outside the coding region include Fragile X syndrome, Fragile XE mental retardation (both GGC/CCG repeats), Myotonic dystrophy (CTG/CAG repeat) and Friedreich's ataxia (GAA repeat). Repeat expansion in Fragile X syndrome causes loss of expression of the FMR1 protein (Kunzler et al. 1995), possibly because cytosine methylation of the expanded repeat prevents transcription from taking place.

With the exception of Friedreich's ataxia, all the diseases mentioned are caused by expansion of GC-rich repeats, and several theories have been proposed to explain this apparent bias. One experiment tested the ability of GC-rich repeats to form stable, hairpin-like structures, which could explain the persistently large size of disease alleles in Huntington's disease (Gacy et al. 1995). A failure to form such structures could explain the increased stability that interruptions of the CGG repeat confer on normal alleles in Fragile X syndrome (Eichler et al. 1994). The longer the uninterrupted run of CGG repeats, the more likely unstable transmission is to produce an expansion that causes disease. Interruptions may disrupt the secondary structure of the repeat, making it less stable and therefore less prone to form hairpins of threshold stability (Gacy et al. 1995). The exception to this secondary-structure 'rule' is Friedreich's ataxia, which has recently been found to be caused in the majority of cases by a greatly expanded intronic GAA repeat. The expansion may cause abnormal RNA processing or interrupt transcription in some way (Cariello et al. 1996), rather than forming strong secondary structure; the ability of the GAA repeat to form such structure in vitro is low at any copy number (Gacy et al. 1995). The discovery of a non-GC-rich trinucleotide repeat expansion causing disease, also the only one to be recessively inherited, suggests that further types of repeat, triplet or otherwise, may cause as yet unexplained diseases. One example of a different repeat type causing disease is the recent discovery of a dodecamer minisatellite repeat expansion causing progressive myoclonus epilepsy (Lalioti et al. 1997). The 12-base repeat unit is GC-rich and shows premutation alleles which are unstably inherited, as observed in Huntington's disease.
The dodecamer may be able to form significant secondary structure, and the threshold length for stability of both the dodecamer and the trinucleotide repeats is equivalent, at 144-156 base pairs (Mandel 1997). A further 15-18-mer repeat expansion in the same gene, EPM1 (Virtaneva et al. 1997), is also GC-rich but shows no evidence of premutation alleles. The nature of the disease and of the repeats causing it needs further study. Susceptibility to type 1 diabetes is associated with size variation at an insulin gene minisatellite locus, possibly a contributory factor in the disease haplotype (Bennett et al. 1995). None of the diseases involving repeat sequences, whether of a mini- or microsatellite nature, is as simple, or as similar to the others, as first appears: they vary in repeat base composition, repeat unit size, length and position within the genome. Many more diseases may yet prove to have a genetic origin based on repeat expansion.
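The equivalence of the quoted threshold (144-156 bp) for the two repeat types can be checked with simple arithmetic, counting the same length in 12-base and in 3-base units:

```latex
\begin{aligned}
144\ \text{bp} &= 12 \times 12\ \text{bp} = 48 \times 3\ \text{bp},\\
156\ \text{bp} &= 13 \times 12\ \text{bp} = 52 \times 3\ \text{bp}.
\end{aligned}
```

The threshold therefore corresponds to roughly 12-13 dodecamer units or 48-52 trinucleotide units.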

1.4.2 The function of microsatellites

The random distribution and frequency of microsatellite repeats suggest that microsatellites have no general function in gene expression (Tautz and Renz 1984), although their universal presence in eukaryotic genomes suggests that they are an important part of the genome (Tautz et al. 1986). Some microsatellites are found in coding regions, notably some of the triplet repeats mentioned above, so at least some repeats have specific roles to play within the genome.

Several functions have been proposed for microsatellite repeats. One is the regulation of gene expression: a microsatellite of (GA)n·(CT)n repeats was found to be necessary in a Drosophila promoter sequence, enhancing heat-induced expression of the hsp26 gene (Glaser et al. 1990), and a satellite repeat has been shown to influence gene expression in a dose-dependent, orientation-specific way (Maiorano et al. 1997). Enhanced gene expression in vitro by a (CA)n·(GT)n repeat was attributed to its ability to form a left-handed DNA conformation called Z-DNA, which was important when the repeat lay close to the promoter (Hamada et al. 1984). Reduction of gene expression has also been demonstrated with (GAA)n and (GCA)n repeats, and is orientation- and position-dependent (Aoki et al. 1997). Microsatellite repeats have also been associated with transcription regulation and activation. Evidence for transcription regulation was found in triplet repeat-based homopolymeric glutamine and proline stretches (Brahmachari et al. 1997). Transcription activation may be influenced by the secondary structures formed by the (CA)n·(GT)n dinucleotide repeat (Gerber et al. 1994). (TA)n repeats have been found to have high affinity for C2H2 zinc fingers in vivo, resulting in a length-dependent up-regulation of transcription (Gogos and Karayiorgou 1996). Several early studies have also suggested that microsatellite repeats may be recombination hotspots ((Tautz and Renz 1984) and references within).

Structurally, the (CNG)n repeat type is able to form cruciform structures (Kunzler et al. 1995) and hairpin loops (Petruska et al. 1996), and induces heterochromatin formation (Kunzler et al. 1995). A further, recently proposed DNA secondary structure involving both strands, called triad-DNA, may form in GC-rich trinucleotide repeats (Kuryavyi and Jovin 1995). In a study of Myotonic dystrophy, (CTG)n repeat blocks of 75 or more repeats 'become the strongest known natural nucleosome positioning elements and may profoundly alter the local chromatin structure' (p572 (Wang and Griffith 1995)), although in vivo evidence for this structure is not conclusive (Debin et al. 1997). Where repeats are necessary, large expansions are thought to be avoided by the use of different codons, varying the DNA sequence while the amino-acid sequence remains the same (Kunzler et al. 1995).

Dinucleotide repeats can also adopt unusual structures. The alternating pyrimidine-purine sequence of the (CA)n repeat enables the formation of left-handed Z-DNA, giving locus-specific organisation of DNA folding into a superstructure able to attract specific nuclear proteins (Vogt 1990). It is suggested that these evolutionarily conserved repeat sequences are needed for the secondary and tertiary structures they fold into, and the resulting ability to attract nuclear proteins, rather than for any particular effect of their primary sequence. Further, bent DNA is a common recognition site for the binding of nuclear proteins, such as the histones of nucleosomes (Vogt 1990). Conserved patterns of bending, resembling the bending of DNA in nucleosomes, have been found in the tertiary structure of satellite DNA (Vogt 1990)(Fitzgerald et al. 1994), suggesting a possible role in the positioning of nucleosomes along the satellite DNA (Fitzgerald et al. 1994). Single-stranded (TC)n repeat protein-binding activity is conserved across species, suggesting an important role for some repeat types (Yee et al. 1991). Supporting evidence for this function of repeat DNA comes from research suggesting that d(GA·GA)n duplex DNA, formed by the complementary strand in vitro, also occurs in vivo (Huertas et al. 1993), leaving the single-stranded (TC)n repeat free for protein binding.

1.4.3 The potential and limitations of microsatellites

Genetic markers

The combination of PCR and the discovery of the potential of microsatellites led to a revolution in genetic mapping, first realised in the human genome mapping project by the production of 813 polymorphic microsatellite markers covering 90% of the genome (Weissenbach et al. 1992). The random distribution and high frequency of (CA)n repeats allowed them to be the only marker type used in the production of the Genethon human linkage map, which consists of 5,264 (CA)n microsatellites (Dib et al. 1996). Microsatellites are also the main marker choice in the construction of the linkage maps of many other animals, including those of the murine and bovine genomes.

There is increasing use of microsatellites with longer repeat units for mapping, because they give clearer results following PCR, and further human maps have been constructed based entirely on these types of repeat (Adamson et al. 1995). The (GAAA)n repeat is the most common tetranucleotide repeat in the human genome, and also gives one of the most clearly interpretable groups of results within the tetranucleotide repeat class (Adamson et al. 1995). One disadvantage of tetranucleotide repeats is their higher mutation rate in humans compared with dinucleotide repeats (Weber and Wong 1993). Indeed, tetranucleotide repeats have been found to be exceptionally polymorphic in some cases (Liu et al. 1995). In the canine genome, a subset of (GAAA)n repeats are extremely polymorphic, more so than almost any described for the human genome (Francisco et al. 1996). Tetranucleotides in general show a mutation rate of 0.4% per locus per generation in the dog (Francisco et al. 1996), but are still stable enough to be used for genetic mapping.
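To see what a per-locus rate of this size means in practice, one can compute the chance that at least one of several loci mutates in a single parent-to-offspring transmission. This is a minimal sketch assuming independent loci that all share the 0.4% rate quoted above. The 12-locus panel size is illustrative and the function name is hypothetical.

```python
def prob_any_mutation(rate_per_locus: float, n_loci: int) -> float:
    """Chance that at least one of n_loci independent loci mutates in a
    single transmission, each with the same per-locus mutation rate."""
    if not 0.0 <= rate_per_locus <= 1.0:
        raise ValueError("mutation rate must be a probability")
    return 1.0 - (1.0 - rate_per_locus) ** n_loci


# 0.4% per locus per generation, over a hypothetical 12-locus panel
print(f"{prob_any_mutation(0.004, 12):.3f}")  # 0.047
```

Under these assumptions roughly one transmission in twenty would show a mutation somewhere in the panel, which is one reason a mismatch at a single locus is not treated as proof of exclusion.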

Microsatellites and evolution

Microsatellites are widely used in evolutionary studies, but analysis of the evolution of microsatellites themselves reveals limitations to their use. Several models for the evolution of microsatellites have been proposed, all of which take slipped-strand mispairing (SSM) rather than recombination (Darvasi and Kerem 1995) as the mechanism for the variation in size of alleles at a given microsatellite locus (Goldstein et al. 1995a)(Goldstein et al. 1995b). This assumption leads to several theories based on a stepwise mutation model of microsatellite evolution (Valdes et al. 1993)(Bell and Jurka 1997), with different models appropriate for different studies (Takezaki and Nei 1996). The model which currently best fits the evolution of microsatellite repeats is a two-step (two-phase) model of mutation, combining frequent single repeat-unit additions and deletions with rare large changes in microsatellite length (Di Rienzo et al. 1994). The model has been further refined by introducing a constraint on allele size: mutations are biased towards increasing allele size, but rare large deletions keep allele size in check overall (Garza et al. 1995). This is supported by further research demonstrating a tendency towards increased allele size (Rubinsztein et al. 1995), and by studies in yeast demonstrating both allele size increase and constraint of allele size by rare deletions (Wierdl et al. 1997). Rare, large deletions are not the only constraint on allele size. Increasing allele length raises the mutation rate of microsatellites (Edwards et al. 1992)(Jin et al. 1996)(Petes et al. 1997), but long repeats can be stabilised by interruptions (probably by reducing SSM), which decrease the mutation rate (Pepin et al. 1995)(Jin et al. 1996)(Petes et al. 1997).
Evidence for the stabilising effect of interruptions within long repeats comes from Fragile X syndrome, where their presence decreases the likelihood that premutation alleles expand to disease-causing lengths on transmission (Eichler et al. 1994).
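The two-step (two-phase) model described above can be illustrated with a small simulation of one allele lineage. This is a toy sketch, not a published implementation. The parameter names and values are hypothetical, and large changes are drawn from a geometric distribution as an assumption. The optional upward bias (p_up) and a floor on allele size stand in, very crudely, for the size constraint of Garza et al. (1995).

```python
import random


def simulate_two_phase(start_units, generations, mu, p_single,
                       mean_jump=4.0, p_up=0.5, min_units=1, seed=0):
    """Simulate repeat number along one lineage under a two-phase model.

    Each generation the allele mutates with probability mu. A mutation is a
    single-unit change with probability p_single; otherwise it is a larger
    change of 2 or more units with mean size mean_jump (geometric tail).
    Expansions occur with probability p_up, contractions otherwise.
    """
    if mean_jump <= 2:
        raise ValueError("mean_jump must exceed 2 units")
    rng = random.Random(seed)    # seeded for reproducible runs
    q = 1.0 / (mean_jump - 1.0)  # stopping probability for the geometric tail
    units = start_units
    history = [units]
    for _ in range(generations):
        if rng.random() < mu:
            if rng.random() < p_single:
                step = 1
            else:
                step = 2
                while rng.random() > q:  # each failure adds one more unit
                    step += 1
            units += step if rng.random() < p_up else -step
            units = max(units, min_units)  # an allele cannot shrink below the floor
        history.append(units)
    return history


lineage = simulate_two_phase(start_units=20, generations=10_000,
                             mu=1e-3, p_single=0.9, seed=1)
print(min(lineage), max(lineage), lineage[-1])
```

Setting p_up above 0.5 adds a bias towards expansion; a realistic model would also need the length-dependent mutation rate discussed above, which this sketch omits.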

The accuracy of microsatellites in evolutionary studies depends on exact knowledge of the mutation rate of the microsatellite in question and on the assumption that variation in allele size is based on variation in repeat number alone. Estimates of microsatellite mutation rates vary dramatically, from 10⁻⁵ to 10⁻² mutations per locus per generation (Weber and Wong 1993) and higher (Hastbacka et al. 1992). Mutation of microsatellites is not always based upon changes in repeat number alone, but can also involve variation in the sequence flanking the microsatellite (Garza et al. 1995) between the PCR primers. This can make microsatellites homoplasic for size: alleles of the same size can differ in sequence, both within and around the repeat. Such allelic homoplasy can cause problems in interspecies studies (Garza and Freimer 1996)(Grimaldi and Crouau-Ray 1997)(Pepin et al. 1995), where similar-sized alleles in different species may not reflect identical sequence between the primers flanking the repeat. At one highly polymorphic tetranucleotide locus, sequencing fragments of a single length from different individuals revealed sequence variations defining several further alleles, despite identical allele sizes (Rolf et al. 1997). Homoplasy for size has implications for the use of microsatellites in linkage disequilibrium, gene mapping and cross-species evolutionary studies, possibly reducing their usefulness in the last area altogether (Angers and Bernatchez 1997). The effect is likely to be better hidden in some microsatellites than in others. Compound repeats can vary in either repeat type, so an increase of one repeat unit in one type and a decrease of one unit in the other (where the two units are the same length) will give an allele of identical size but different composition (Garza and Freimer 1996).
One way to detect at least some of these size-homoplasic differences is single-strand conformation polymorphism (SSCP) analysis, which detects sequence differences through the resulting changes in the folding of single-stranded DNA (Lucas et al. 1997). Microsatellites can be used in interspecific evolutionary studies as long as care is taken in their selection. For reliable results, all microsatellites need to show a similar, low mutation rate and a large range of allele sizes, with no degeneration of the microsatellite (or its flanking sequence) between taxa, and at least 18-20 separate loci should be used (Feldman et al. 1997). For other purposes, such as linkage analysis, sequencing of similarly sized alleles could provide much more information about microsatellite variation, making the locus far more informative and useful in finding and ordering linked markers.
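The compound-repeat case of size homoplasy described above can be made concrete with a short calculation. In this sketch the 100 bp of flanking sequence, the (CA)n(CT)m copy numbers and the function name are hypothetical. The point is only that opposite one-unit changes in two repeat types of equal unit length leave the amplicon size unchanged.

```python
def amplicon_size(flank_bp, composition):
    """Length of a PCR product: flanking bases plus every repeat block.

    composition maps each repeat motif (e.g. "CA") to its copy number.
    """
    return flank_bp + sum(len(motif) * copies
                          for motif, copies in composition.items())


allele_a = {"CA": 12, "CT": 7}
allele_b = {"CA": 13, "CT": 6}  # one extra CA unit, one fewer CT unit

# Different repeat compositions, yet both alleles are 138 bp on a gel.
print(amplicon_size(100, allele_a), amplicon_size(100, allele_b))  # 138 138
```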

Related to this sequence variation in the flanking sequences of microsatellites is the phenomenon of allele non-amplification. A relatively common occurrence, it is the failure of an allele to amplify because a mutation in one or both primer sequences prevents the primer from binding (Callen et al. 1993)(Koorey et al. 1993). Some 30% of CEPH sibships contain such mutations (Callen et al. 1993), resulting in a loss of information when the markers are used to create linkage maps. Allele non-amplification can also result in false exclusion in parentage cases. Designing new primers alleviates the problem, and the possibility of a mutation within the primer sequences should be investigated whenever apparent non-inheritance of a parental allele by an offspring is observed at a locus.
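A simple screen for the pattern described above can be sketched as follows. The function is hypothetical, and genotypes are written as pairs of allele sizes. A parent and offspring who are homozygous for different alleles are flagged as a possible null allele rather than a definite exclusion, because both could in fact carry an allele that failed to amplify.

```python
def assess_locus(parent, offspring):
    """Compare a parent and offspring genotype at one microsatellite locus.

    Genotypes are 2-tuples of allele sizes in bp; an apparent homozygote
    such as (150, 150) may hide an unamplified (null) allele.
    """
    if set(parent) & set(offspring):
        return "compatible"
    if parent[0] == parent[1] and offspring[0] == offspring[1]:
        # Opposite homozygotes: both may share an undetected null allele.
        return "possible null allele"
    return "exclusion"


print(assess_locus((150, 152), (152, 156)))  # compatible
print(assess_locus((150, 150), (154, 154)))  # possible null allele
print(assess_locus((150, 152), (154, 156)))  # exclusion
```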

Intraspecies variation and evolution

Drosophila melanogaster populations reveal adaptation-specific loss of variation at certain microsatellite loci in some fly populations. These results show that microsatellites can reflect adaptive change through loss of variation within populations over a short time (Schlotterer et al. 1997). Similar intra- and interpopulation studies use microsatellites to measure allelic variance in toads (Scribner et al. 1994) and breed differentiation in goats (Pepin et al. 1995). Two separate studies of the origin and evolution of human populations reached similar conclusions using different microsatellites (Bowcock et al. 1994)(Perez-Lezaun et al. 1997), emphasising their usefulness in intraspecies studies. Conservation of microsatellite flanking sequences is also observed, persisting across millions of years of marine turtle evolution and suggesting a particularly slow rate of microsatellite mutation in this group (FitzSimmons et al. 1995). Studies on canids using microsatellites demonstrate that variation within pure breeds of dog is low compared with variation between breeds, and that breeds differ mainly in allele distribution rather than in the presence of breed-specific alleles (Fredholm and Wintero 1995)(Pihkanen et al. 1996)(Zajc et al. 1997). Other studies have used microsatellites to determine the evolution of, and resulting breeding strategies for, endangered wolf species such as the Mexican grey wolf (Garcia-Moreno et al. 1996), the Ethiopian wolf (Gottelli et al. 1994) and the Red wolf (Roy et al. 1996). The Ethiopian wolf shows low microsatellite variability, reflecting population decline due to habitat reduction, and further microsatellite evidence from this study suggests that domestic dogs are also hybridising with the wolves.
A captive breeding programme with genetically pure Ethiopian wolves would therefore be needed to preserve the canid's diversity (Gottelli et al. 1994).

Interspecies studies

There are many studies of the conservation of microsatellite sequences between closely related species. Analysis of sheep and goats suggests that they are more closely related phylogenetically to each other than either is to cattle (Pepin et al. 1995), although some microsatellite conservation is seen between sheep and cattle (Moore et al. 1991) and between cattle and goats, despite differing flanking sequences (Pepin et al. 1995). Studies of human and primate microsatellites also show variation in flanking sequences and repeats (Garza et al. 1995). Studies between apes and New World monkeys using human microsatellites (Coote and Bruford 1996), and between dogs, red foxes and Arctic foxes (Fredholm and Wintero 1995), analysed allele size only; they therefore demonstrate only cross-species conservation of microsatellites and some of their polymorphic properties, rather than the full sequence variation between the primers. Microsatellites can also lose polymorphism between species, which adds to the possible problems of cross-species comparisons (FitzSimmons et al. 1995). Sequence analysis of cross-species conserved microsatellites and their flanking sequences is therefore important if they are to be used in interspecies analysis.

Paternity testing and forensic studies

The properties of microsatellites have led to their use in solving cases of disputed parentage. Paternity testing was initially carried out using minisatellites and, although these are still used for such purposes, microsatellites are increasingly the marker of choice. Minisatellite probes have the disadvantage that they produce complicated banding patterns, with bands of varying intensity which can be difficult to interpret, and in multilocus probing the different loci are not always easily separated (Pena and Chakraborty 1994). The comparatively lower heterozygosity of most microsatellites can be overcome by using 10 to 20 loci (Queller et al. 1993); combined with multiplexing (Queller et al. 1993)(Gloatzki-Mullis et al. 1995)(Lindqvist et al. 1996), this provides a fast and efficient test of parentage. The possibility of spontaneous mutation should be taken into account, as it could result in incorrect parentage analysis: no result should be based on exclusion at one locus alone (Pena and Chakraborty 1994)(Heyen et al. 1997). Where a single exclusion is found, the possibility of allele non-amplification should also be considered before a parent is excluded (Gloatzki-Mullis et al. 1995).
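The benefit of typing many loci, and the rule against excluding on a single locus, can be sketched as follows. The per-locus exclusion probabilities are hypothetical inputs and the loci are assumed to be independent. The two-locus threshold simply encodes the caution stated above; none of this is taken from a particular published protocol, and the function names are illustrative.

```python
from math import prod


def combined_exclusion(per_locus):
    """Chance that at least one locus excludes a random non-parent,
    assuming independent loci: 1 - prod(1 - P_i)."""
    return 1.0 - prod(1.0 - p for p in per_locus)


def paternity_call(n_excluding_loci, min_loci=2):
    """Never exclude on the evidence of a single locus."""
    if n_excluding_loci >= min_loci:
        return "excluded"
    if n_excluding_loci == 1:
        return "inconclusive: check for mutation or null allele"
    return "not excluded"


# Ten loci that each exclude 30% of random non-parents (hypothetical value)
print(round(combined_exclusion([0.3] * 10), 3))  # 0.972
print(paternity_call(1))
```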

The use of microsatellites in parentage testing of dogs, as well as of farm animals and humans, is increasing. A panel of tri- and tetranucleotide repeats has been set up for human paternity testing and forensic applications (Lindqvist et al. 1996). Research has shown that, with technical modifications, microsatellites can be amplified from fragmented DNA from various sources, including archaeological samples (Golenberg et al. 1996), indicating the possibility of obtaining DNA evidence many years after a crime has been committed. Microsatellites are already used successfully in cattle (Gloatzki-Mullis et al. 1995)(Heyen et al. 1997), goat (Pepin et al. 1995) and dog (Fredholm and Wintero 1996) parentage testing. The canine parentage testing used a panel of 12 dinucleotide microsatellite loci, of which between six and nine were sufficient to solve all the parentage cases presented. Complications in parentage testing, such as superfecundation (Fredholm and Wintero 1996), can also be resolved using microsatellite panels. In superfecundation, a dam has a litter in which the pups have at least two sires between them, because two or more males fertilised the dam while she was in heat. In such cases, excluding a given sire at more than one locus is even more important.
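A minimal way to detect a multiple-sired litter at one locus is to count the alleles that the pups must have inherited from a sire, since a single diploid sire can contribute at most two distinct alleles. The sketch below assumes the dam is known and genotyped and treats every pup allele absent from the dam as paternal. It gives only a lower bound on the number of sires. Mutation or null alleles could mimic the same pattern, so evidence from more than one locus would still be needed. The function names and genotypes are hypothetical.

```python
def obligate_paternal_alleles(dam, pups):
    """Alleles at one locus that some pup must have inherited from a sire.

    dam and each pup are 2-tuples of allele sizes (bp). A pup allele absent
    from the dam must be paternal; a pup with no allele from the dam is
    inconsistent with that dam (or shows a mutation or null allele).
    """
    paternal = set()
    for pup in pups:
        not_from_dam = [allele for allele in pup if allele not in dam]
        if len(not_from_dam) == 2:
            raise ValueError(f"pup {pup} shares no allele with dam {dam}")
        paternal.update(not_from_dam)  # nothing added if the paternal allele is ambiguous
    return paternal


def min_sires(dam, pups):
    """Lower bound on sires: a diploid sire carries at most two alleles."""
    k = len(obligate_paternal_alleles(dam, pups))
    return max(1, (k + 1) // 2)


dam = (150, 152)
litter = [(150, 154), (152, 156), (150, 158)]  # hypothetical genotypes
print(obligate_paternal_alleles(dam, litter))  # three distinct paternal alleles
print(min_sires(dam, litter))                  # 2
```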

1.5 Linkage markers in the canine genome

The production of a map of the canine genome is in progress, and the markers so far isolated for the map are mostly microsatellites. As in the human genome, (CA)n repeats are by far the most common repeat type in dogs, occurring every 42 kilobases (Rothuzien et al. 1994). After the dinucleotide repeats, the most common repeats in the canine genome are the tetranucleotide repeats (TTCC)n, found every 122 kb (Rothuzien et al. 1994), and (GAAA)n, found every 130 kilobases, about one-third as often as (CA)n repeats (Francisco et al. 1996). Particularly when large, (GAAA)n repeats are often irregular, including variable combinations of G(A)n-containing repeats (Liu et al. 1995)(Francisco et al. 1996).
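The spacings quoted above can be converted into densities per megabase, which makes the relative abundance of each repeat type easier to compare. (GAAA)n repeats, for example, come out at roughly one-third of the (CA)n density. This is simple arithmetic on the cited figures and nothing more.

```python
# Average spacing (kb) between repeats in the dog genome, as cited in the text
SPACING_KB = {"(CA)n": 42, "(TTCC)n": 122, "(GAAA)n": 130}


def loci_per_mb(spacing_kb: float) -> float:
    """Expected number of repeats per megabase for a given mean spacing."""
    return 1000.0 / spacing_kb


for motif, kb in SPACING_KB.items():
    relative = SPACING_KB["(CA)n"] / kb  # density relative to (CA)n
    print(f"{motif}: {loci_per_mb(kb):.1f} per Mb, {relative:.2f} x (CA)n")
```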

The microsatellites found have largely been (CA)n repeats, and many isolated from the canine genome show sufficient polymorphism to be of use in genetic studies (Holmes et al. 1993)(Ostrander et al. 1993)(Ostrander et al. 1995). Many (GAAA)n repeats have also been isolated from the canine genome (Francisco et al. 1996). In a heterogeneous dog population, a total of around 400 evenly spaced markers, consisting of di-, tri- and tetranucleotide repeats, would produce a map of 10 cM resolution, possibly higher (Ostrander et al. 1993).

1.6 Principles of linkage mapping

Markers

The original markers used for linkage mapping were RFLPs, but these are too often diallelic, with low polymorphism and unfavourable allele distributions (Botstein et al. 1980). The advent of microsatellites makes high-resolution linkage mapping of virtually any genome a realistic prospect.

To be of use in linkage mapping, a marker must be sufficiently polymorphic. The most commonly used measure of polymorphism is the Polymorphism Information Content, or PIC value (Botstein et al. 1980), which combines the number of alleles at a locus and their relative frequencies within a population. Linkage map markers need to be as informative as possible, so using the PIC value to assess the potential of markers for mapping is important.
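The PIC value is computed from allele frequencies alone. The sketch below implements the standard formula of Botstein et al. (1980), PIC = 1 - Σpᵢ² - Σ_{i<j} 2pᵢ²pⱼ², alongside expected heterozygosity (1 - Σpᵢ²) for comparison. The allele frequencies in the example are hypothetical.

```python
from itertools import combinations


def heterozygosity(freqs):
    """Expected heterozygosity: 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """Polymorphism Information Content (Botstein et al. 1980)."""
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    # Subtract cases where both parents share the same heterozygous genotype
    # and the child is also heterozygous, so its allele origin is untraceable.
    return heterozygosity(freqs) - sum(
        2 * (p * q) ** 2 for p, q in combinations(freqs, 2))


print(pic([0.5, 0.5]))  # 0.375
print(pic([0.25] * 4))  # 0.703125
```

The two-allele example shows why diallelic markers such as most RFLPs are of limited value: even with equal allele frequencies the PIC cannot exceed 0.375.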

Establishing linkage

Suitable markers can be assessed for linkage to each other, as measured by their recombination fraction (RF). The RF ranges between 0 and 0.5 and can be converted to a genetic map distance, measured in Morgans and defined as 'the expected number of crossovers occurring on a single chromosome strand between the two genes' (pg 12, (Ott 1991)). The distance is commonly quoted in centiMorgans (cM). For small distances, 1 cM is approximately equivalent to 1% recombination (one recombinant in 100 meioses). This genetic distance corresponds roughly (depending upon the species) to a physical distance of 1 megabase of DNA along the chromosome. The comparison is only approximate: if two loci lie on either side of a recombination hotspot, for example, they will recombine frequently, making their genetic distance much greater than their physical separation would suggest.

The conversion of the RF to a cM map distance is necessary because of the possibility of multiple crossover events between two loci: an even number of crossovers restores the parental combination of alleles and goes undetected, so the RF underestimates the number of crossovers and cannot exceed 0.5, however far apart the loci are. There are sex-specific differences in recombination rate in many genomes, resulting in longer genetic distances on female than on male maps in many animals. The human female genetic map is 1.9 times the length of the male map (Matise et al. 1994) because of the greater recombination rate in females. Not all animals show such a striking difference between the sexes; there is little difference between the male and female bovine maps (Kappes et al. 1997). Genetic map distances are usually quoted separately for males and females, or as an average of the two, the sex-averaged map distance.
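Two widely used map functions for this conversion are Haldane's, which assumes no crossover interference, and Kosambi's, which allows for some interference. The text above does not specify which is used, so the sketch below simply implements both standard formulas; the function names are illustrative.

```python
import math


def _check(r):
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5)")


def haldane_cm(r):
    """Haldane map distance in cM: d = -1/2 ln(1 - 2r) Morgans."""
    _check(r)
    return -50.0 * math.log(1.0 - 2.0 * r)


def kosambi_cm(r):
    """Kosambi map distance in cM: d = 1/4 ln[(1 + 2r)/(1 - 2r)] Morgans."""
    _check(r)
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


for r in (0.01, 0.1, 0.3):
    print(f"RF {r}: Haldane {haldane_cm(r):.1f} cM, Kosambi {kosambi_cm(r):.1f} cM")
```

At small RF both functions give about 1 cM per 1% recombination; they diverge increasingly as the RF approaches 0.5, where both distances grow without limit.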

Linkage groups and their order

Each marker is assessed according to its recombination with every other marker. For each pair of markers, the likelihood of the observed data is calculated over a range of recombination fractions; the recombination fraction giving the highest likelihood is the maximum likelihood estimate. The likelihood is usually expressed as a ratio of linkage versus nonlinkage, that is, the likelihood of the data if the two markers are linked at that recombination fraction against the likelihood if they are unlinked. The logarithm (base 10) of this highest ratio, known as the Lodscore, is then given. A Lodscore of 3 describes linkage as 1000 times more likely than nonlinkage (Ott 1991), and is the minimum value accepted as indicating definite linkage. Once more than two markers are established as significantly linked to each other by their Lodscores, the order of the markers in the group can be established. This completes the process of genetic map production, in which a series of loci are identified in a specific order and at defined distances from each other (Goodfellow 1993). Lodscores are used to compare the likelihood of one marker order against another, and an order is considered 'definite' only if it is at least 1000 times more likely than the next best order. By adding markers to each linkage group in this way, a linkage map can be built of chromosome arms (Kwiatkowski et al. 1992), an entire chromosome (Hazan et al. 1992)(Du et al. 1995), and eventually the entire genome (Reed et al. 1994).
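For the simplest case, fully informative meioses of known phase, the two-point likelihood calculation described above has a closed form. With R recombinants among N meioses, LOD(θ) = log10[θ^R (1 - θ)^(N - R) / 0.5^N], which is maximised at θ̂ = R/N. The sketch below assumes this idealised situation; real pedigree data with unknown phase need dedicated linkage software, and the function names are hypothetical.

```python
import math


def lod(theta, n_meioses, n_recombinants):
    """Two-point LOD score for phase-known, fully informative meioses."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    n, r = n_meioses, n_recombinants
    if theta == 0.0 and r > 0:
        return float("-inf")  # any recombinant rules out complete linkage
    score = n * math.log10(2.0)  # divides by the no-linkage likelihood 0.5^N
    if r:
        score += r * math.log10(theta)
    if n - r:
        score += (n - r) * math.log10(1.0 - theta)
    return score


def max_lod(n_meioses, n_recombinants):
    """Maximum likelihood estimate of theta and the LOD score there."""
    theta_hat = min(n_recombinants / n_meioses, 0.5)
    return theta_hat, lod(theta_hat, n_meioses, n_recombinants)


print(max_lod(10, 0))  # 10 non-recombinants: LOD of about 3.01 at theta = 0
print(max_lod(10, 1))
```

Under these assumptions, ten informative meioses with no recombinants are only just enough to pass the Lodscore threshold of 3.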

Reference families The most efficient way to produce a genetic linkage map of a genome is to obtain a standard set of reference families, such as the CEPH families used in the human genome mapping project. Ideally, all the families should be complete three-generation families with large numbers of offspring, and none of the founding individuals should be related in any way. In the mouse, controlled matings can be made between different strains, increasing the potential polymorphism of markers; mice also have large litters and can breed several litters in a lifetime. Three-generation families enable linkage phase to be established, that is, which of each parent's alleles was inherited from its mother and which from its father, and in turn which of these alleles is passed on to each offspring. This enables more recombinations to be detected and gives more information for linkage analysis, eventually producing better quality maps.
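To illustrate why known phase matters, the following sketch counts recombinant gametes from one parent whose two haplotypes have been deduced from the grandparents' genotypes. The allele labels are hypothetical, and the sketch assumes that the alleles transmitted by this parent to each offspring can already be identified (for example, from the other parent's genotype).

```python
from typing import List, Tuple

Haplotype = Tuple[str, str]  # (allele at locus A, allele at locus B)


def count_recombinants(parent_haplotypes: Tuple[Haplotype, Haplotype],
                       transmitted: List[Haplotype]) -> Tuple[int, int]:
    """Return (recombinants, non-recombinants) among transmitted gametes.

    parent_haplotypes: the parent's two phased haplotypes, inferred from
    the grandparents, e.g. (("A1", "B1"), ("A2", "B2")).
    """
    hap1, hap2 = parent_haplotypes
    recombinant_types = ((hap1[0], hap2[1]), (hap2[0], hap1[1]))
    recombinants = 0
    for gamete in transmitted:
        if gamete in (hap1, hap2):
            continue  # parental (non-recombinant) combination
        if gamete in recombinant_types:
            recombinants += 1
        else:
            raise ValueError(f"gamete {gamete} not explained by the parent's genotype")
    return recombinants, len(transmitted) - recombinants


phase = (("A1", "B1"), ("A2", "B2"))
offspring = [("A1", "B1"), ("A2", "B2"), ("A1", "B2"), ("A2", "B2"), ("A1", "B1")]
print(count_recombinants(phase, offspring))
```

Without the grandparental information, the two haplotypes could equally be (A1, B2) and (A2, B1), and the recombinant and non-recombinant classes would be swapped.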

The ideal canine reference families would be derived from four different breeds which are as evolutionarily distant as possible, producing a crossbred set of families with a large F2 generation. At least 100 markers have been mapped in a set of two-generation families, producing sixteen linkage groups (Lingaas et al. 1997), and these are to be combined with 150 further markers mapped in a separate set of three-generation families (Mellersh et al. 1997).

1.7 Chromosome studies of the Canidae

The Canidae comprise 34 extant species (Wayne 1993) with chromosome numbers ranging from 2n=36 (mainly metacentric) in the red fox (Vulpes vulpes) to 2n=78 (all acrocentric) in a number of wolf-like canids, including the domestic dog (Canis familiaris) (Wayne 1993). Many evolutionary studies of the Canidae have been based on their chromosome differences, and each suggests that the grey wolf is the closest relative of the domestic dog. The wide range of chromosome numbers in the Canidae has led to the suggestion that the 'primitive' canid karyotype diverged in two directions: towards species with high numbers of acrocentric chromosomes, such as the wolf-like canids and the South American canids, and towards species with low numbers of metacentric chromosomes, such as the red fox-like canids and others including the raccoon dog and the grey fox (Wayne et al. 1987b)(Wayne 1993). The ancestral karyotype is assumed to be metacentric, based on the metacentric nature of the felid chromosomes within the Carnivora. The original canid karyotype may have consisted of around 50-60 metacentric chromosomes, with the number decreasing by fusion in some lineages, giving species with lower diploid numbers (2n<50), and increasing by fission in others, giving species with higher diploid numbers (2n>64) (Wayne et al. 1987a).

Many difficulties arise in the precise identification of the chromosomes of the domestic dog because of their acrocentric morphology and their similarity in size and banding pattern. Chromosome banding patterns are produced using various stains, the most common being Giemsa stain which, after pretreatment of the chromosomes, produces dark and light bands generally corresponding to AT-rich and GC-rich regions respectively (Ayala and Kiger Jr. 1984). The increased resolution of G (Giemsa)-banding obtained with elongated metaphase preparations allows more accurate comparison between karyotypes, such as those of the red fox (Vulpes vulpes) and the domestic dog (Canis familiaris) (Graphodatsky et al. 1995). Conservation of G-banding between the two species enables the two sets of chromosomes to be 'matched' and the evolutionary differences between the species to be studied, although genetic homology can only be established by fluorescence in situ hybridisation (FISH).

Several karyotypes of the domestic dog are available (Selden et al. 1975)(Manolache et al. 1976)(Stone and Prieur 1991)(Reimann et al. 1996)(Switonski et al. 1996), all noting the difficulty of clearly identifying the canine chromosomes. New technologies and methods allow lengthened chromosomes to be produced, giving more detailed banding patterns and making chromosome identification easier. The Committee for the Standardised Karyotype of the Dog (Canis familiaris) has published an agreed standardisation of the largest 21 pairs of autosomes and the sex chromosomes (Switonski et al. 1996). The remaining chromosomes can now also be identified, and the Committee's completed, standardised canine karyotype is about to be submitted for publication (personal communication, M. Breen (1997)). Maximum use of the low divergence between dog and fox could be made by using the well characterised fox karyotypes in the physical mapping of the canine genome (Fredholm and Wintero 1995). The most comprehensive published complete karyotype is shown in Figure 1.1 (Reimann et al. 1996), an ideogram, or diagrammatic representation, of the G-banding pattern of each chromosome. Chromosomes are broadly divided into the p and q arms, representing the shorter and longer arms respectively. Each arm is then subdivided into numbered regions according to the presence of landmark bands, with numbers increasing away from the centromere. These regions are then subdivided into bands until all the individual bands are discernible. In the canine ideogram shown in Figure 1.1, only the X and Y chromosomes are metacentric and have both a p- and a q-arm. All autosomes are acrocentric, so only a q-arm is present. This system of nomenclature allows the precise location of a locus to be specified.
If a locus were mapped to the first black band of chromosome 24, its position would be 24q12: 24 for the chromosome number, q for the q arm, 1 for the first region of the arm and 2 for the band within that region. At position 24q24, the band is further subdivided into .1, .2, .3 and .4, which can be added to the chromosome location where discernible. A range, such as 24q21-3, is given if the precise location is not clear. To enable accurate and reliable identification of all the smaller canine chromosomes, a further method of recognising each chromosome is needed. Such a method would make use of chromosome-specific 'paints' and chromosome-specific cosmids.
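The band nomenclature described above can be captured in a small parser. This is a sketch under stated assumptions: region and band are each taken to be a single digit, as in the examples given, and a range such as 24q21-3 is read as running from band 21 to band 23 within the same region. The class and function names are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class BandLocation(NamedTuple):
    chromosome: str
    arm: str                               # "p" (short) or "q" (long)
    region: int                            # numbered outwards from the centromere
    band: int                              # band within the region
    sub_band: Optional[int] = None         # e.g. the ".3" in 24q24.3
    band_range_end: Optional[int] = None   # e.g. the "3" in 24q21-3 (assumed same region)


# chromosome, arm, one-digit region, one-digit band, optional ".sub-band",
# optional "-n" giving the last band of a range within the same region
_BAND = re.compile(r"^(\d{1,2}|X|Y)([pq])(\d)(\d)(?:\.(\d))?(?:-(\d))?$")


def parse_band(text: str) -> BandLocation:
    match = _BAND.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised band designation: {text!r}")
    chrom, arm, region, band, sub, end = match.groups()
    return BandLocation(
        chromosome=chrom,
        arm=arm,
        region=int(region),
        band=int(band),
        sub_band=int(sub) if sub else None,
        band_range_end=int(end) if end else None,
    )


for designation in ("24q12", "24q24.3", "24q21-3"):
    print(designation, parse_band(designation))
```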

1.8 Physical mapping and the use of fluorescence in situ hybridisation (FISH)

Existing cytogenetic or chromosome maps consist of patterns of chromosomal bands which both uniquely identify an individual chromosome and divide the genome into distinguishable units. In the human cytogenetic map, there are at least 300 dark and light bands within the karyotype. Each band is unique and labelled positionally. Physical and genetic maps can be keyed to the cytogenetic map using these bands. The human genome mapping project is now physically mapping YACs (Yeast Artificial Chromosomes), containing up to 1 megabase of DNA, allowing integration of the cytogenetic, genetic and physical maps (Bray-Ward et al. 1996). Once a complete karyotype for an organism is compiled, it can be used to physically map unique segments of DNA, such as a cosmid containing a gene, to the specific area of the genome from which it was derived.

Figure 1.1 An extended ideogram of the canine karyotype (Reimann et al. 1996) with 460 numbered bands, based on G-banded chromosomes.

1.8.1 FISH In situ hybridisation of a DNA probe to metaphase chromosomes allows the DNA contained within that probe to be assigned to its point of origin within the genome. The most commonly used technique is fluorescence in situ hybridisation (FISH) (Pinkel et al. 1986)(Lichter et al. 1990). The general technique (Trask 1991) begins with chromatin preparation and denaturation: the chromosomes of cells at metaphase are spread onto slides and the chromosomal DNA is made single-stranded. DNA or RNA probes are labelled with reporter molecules such as biotin or digoxigenin, enabling binding of fluorescent affinity reagents. The probe is mixed with sonicated competitor DNA to prevent the binding of non-specific and repetitive sequences, and hybridised to the chromosomes. The hybridised DNA is then incubated with a fluorochrome conjugate such as fluorescent avidin (where biotin is the reporter molecule), followed by anti-avidin antibodies and fluorescently labelled anti-immunoglobulins, to amplify the signal. Fluorescein isothiocyanate (FITC) gives a yellow/green signal where biotin is used, and rhodamine a red signal where digoxigenin (DIG) is used. 4',6-diamidino-2-phenylindole dihydrochloride hydrate (DAPI) is used as a counterstain to produce a G-like banding pattern on the chromosomes, and the signal location and the chromosomes are identified by microscopic analysis of the spreads.

1.8.2 Probes for FISH (Yung 1996) Several types of probe can be used in hybridisation analysis, including YACs, centromere and telomere probes, whole chromosome painting probes and unique sequence probes (USPs). Chromosome-specific YACs provide a simple method of chromosome identification and also allow an ordered group of genetic markers that has been assigned to a YAC by PCR to be physically mapped: the hybridisation position of the YAC gives the physical location of the genetic markers previously assigned to it. Centromeric and telomeric probes contain repetitive sequences and give strong signals which can be chromosome-specific. Where small, acrocentric chromosomes are present, such as those of the dog, both centromeric and telomeric probes can be used for orientation (Reimann et al. 1996), aiding identification of the chromosomes by their banding pattern. Whole chromosome painting probes can be produced either from a number of chromosome-specific probes hybridised together or from a labelled whole chromosome. Producing chromosome-specific paints from whole chromosomes involves flow-sorting the chromosomes, which separates them on the basis of both DNA content and base-pair composition. The canine chromosomes are resolved into 32 peaks, representing all 38 pairs of canine autosomes and the sex chromosomes. Eight of the peaks hybridised to two pairs of chromosomes, and the rest were chromosome-specific (Langford et al. 1996). Since all chromosomes in the canine karyotype were represented, once these paints have been assigned to their corresponding chromosomes, all physically mapped markers can also be accurately assigned. Such chromosome paints can be used to label their corresponding chromosomes in metaphase spreads of hybrid cells, in interspecies studies and for labelling specific chromosomes (Pinkel et al. 1986).
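As a check on the statement that all chromosomes were represented, the peak counts reported above can be reconciled with the canine karyotype, assuming that the X and Y chromosomes are counted as two separate chromosome types:

$$
\underbrace{(32 - 8) \times 1}_{\text{single-chromosome peaks}} \;+\; \underbrace{8 \times 2}_{\text{two-chromosome peaks}} \;=\; 24 + 16 \;=\; 40 \;=\; \underbrace{38}_{\text{autosome pairs}} + \underbrace{2}_{X,\,Y}
$$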

A further type of probe hybridises to one specific place in the genome, and is the most widely used probe for physical mapping. These probes contain unique sequence, such as genes, enabling comparative mapping to take place: a DNA probe containing a human gene sequence hybridised to a canine chromosome spread allows the corresponding gene to be quickly located in the canine genome, thus building a comparative physical map. Routinely used FISH probes are generally at least five kilobases in size, large enough to produce a fluorescent signal that can be detected above non-specific background signals. Fluorescent probes as small as two kilobases have been used (Dreyling et al. 1997), although hybridisation efficiency is reduced.

1.8.3 Applications of FISH In addition to physical mapping, FISH has been used in the diagnosis of diseases which show chromosomal abnormality, such as Down's syndrome (trisomy 21), and of disorders caused by deletions or rearrangements, such as cri-du-chat syndrome and haemoglobin H disease (Yung 1996). FISH is also used to elucidate the chromosome rearrangements seen in cancerous tumours. Microdeletions which cause disease through the absence of a unique sequence are revealed by the absence of the appropriate signal following FISH with the relevant unique sequence probe (USP), helping to find the gene or genes responsible. The short-stature 'gene' was localised to a 270 kilobase area of the pseudoautosomal region of the human sex chromosomes by analysing the loss of signal from cosmids within that region in chromosome preparations from affected patients (Rao et al. 1997). As techniques for visualising smaller DNA fragments on chromosomes are developed (Dreyling et al. 1997), further diseases caused by deletions or rearrangements can be physically localised to their causative gene(s).

FISH techniques have also been used to create a novel method of identifying human chromosomes, by hybridising mixtures of probes labelled with different fluorochromes to chromosome spreads. Where red- and green-labelled probes hybridise to the same chromosome area, the signal appears yellow. Using these colour variations, a 'multicolour chromosome bar code' has been developed for the human chromosomes, providing a simplified method of chromosome identification (Muller et al. 1997).

1.8.4 Physical mapping Physical mapping is largely based on the use of single-copy, large-insert clones such as cosmids, and localising genetic linkage maps by their assignment to radiation hybrid panels which contain identified chromosome segments (O'Brien 1991).

Physical clone ordering YACs, cosmids and phage clones containing insert DNA ranging from one megabase down to nine kilobases are easily visualised on chromosome spreads. Hybridising more than one cosmid at a time to a spread enables physical ordering of cosmids with inserts from the same chromosome, and genes used as probes produce an ordered gene map of the chromosome (Lichter et al. 1990). Initially, clones on metaphase spreads could only be ordered when they were separated by at least one megabase (Heiskanen et al. 1996)(Buckle and Kearney 1993), as chromatin folding disrupted the visual order of the clones, but new techniques now allow higher resolution. The use of interphase nuclei enables the ordering of clones which are between 20 and 1,000 kilobases apart, and mechanically stretched DNA allows clone ordering at a resolution of above 200 kilobases (Buckle and Kearney 1993)(Heiskanen et al. 1996). Direct visual hybridisation (DIRVISH), based on the production of free, linearly extended DNA from interphase nuclei (Buckle and Kearney 1993), enables ordering of clones which are only five kilobases apart. A further technique using interphase nuclei, Fiber-FISH, linearises DNA strands and has been used to order clones at a resolution of one kilobase (Heiskanen et al. 1996).
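The resolution limits quoted above can be summarised as a simple lookup that, for a given separation between two clones, lists the approaches whose quoted range covers it. This only restates the figures in the text: no upper limits are given for DIRVISH or Fiber-FISH, so the sketch treats them as unbounded, which is a simplification, and the stretched-DNA figure is omitted because its range is not fully specified. The names are hypothetical.

```python
from typing import List, Optional, Tuple

# (approach, smallest separation ordered in kb, largest separation in kb or None)
# Figures as quoted in the text; None means no upper limit was stated.
ORDERING_METHODS: List[Tuple[str, float, Optional[float]]] = [
    ("metaphase FISH", 1000, None),
    ("interphase FISH", 20, 1000),
    ("DIRVISH", 5, None),
    ("Fiber-FISH", 1, None),
]


def methods_for(separation_kb: float) -> List[str]:
    """Return the approaches whose quoted range covers a given clone separation."""
    suitable = []
    for name, lower, upper in ORDERING_METHODS:
        if separation_kb >= lower and (upper is None or separation_kb <= upper):
            suitable.append(name)
    return suitable


for kb in (2, 10, 500, 2000):
    print(f"{kb:>5} kb apart: {', '.join(methods_for(kb))}")
```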

Hybrid cell line use in mapping There are two types of hybrid cell line useful for mapping studies: somatic cell hybrids and radiation hybrids (Stewart et al. 1997). Somatic cell hybrids are recipient cells containing a single donor chromosome or part-chromosome. The chromosome or chromosome fragment (produced by irradiation of the chromosome) usually survives as a separate, individual chromosome within the recipient cell. Such cell lines are available for the dog, representing up to 90% of the canine genome (Langston et al. 1997), although some of the lines contain fragments of different chromosomes, causing confusion over correct marker assignment in some cases. Such lines also lose the foreign (donor) chromosomes from time to time, so the characteristics of the cell lines can change over time. Radiation hybrids are more stable. They are produced by fusing a single chromosome or an entire donor genome, previously fragmented by irradiation, to a similarly irradiated rodent cell (usually that of a hamster). Parts of the donor chromosome(s) are incorporated into the rearranged chromosomes of the recipient cell, making the new hybrid cell more stable than a somatic cell hybrid line. The resultant irradiated composite cell is then fused to a non-irradiated recipient cell, and the viable colonies which form consist of hybrids between irradiated and non-irradiated cells. Some of these cells contain integrated, stabilised parts of donor chromosomes, providing a basis for mapping those parts of the donor chromosome. Different cell lines contain different parts of different donor chromosomes, and together the complete panel of cell lines retains the entire karyotype of the donor animal. The higher the radiation dose the chromosome or whole cell receives, the smaller the fragments produced and the greater the mapping resolution (Stewart et al. 1997).
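As a sketch of how radiation hybrid panel data yield distances, the code below uses a common simplified two-point model, not necessarily the analysis used in the studies cited: fragments are assumed to be retained independently with probability r, so two markers separated by a radiation-induced break (probability θ) are discordant in a hybrid with probability 2θr(1 − r). Distance in rays is then D = −ln(1 − θ), usually quoted in centirays (cR) for a stated radiation dose. The retention patterns are hypothetical.

```python
import math
from typing import Sequence, Tuple


def rh_two_point(a: Sequence[int], b: Sequence[int]) -> Tuple[float, float]:
    """Estimate breakage probability theta and distance (cR) for two markers.

    a, b: 1 if the marker is retained in a hybrid, 0 if absent.
    Simplified model: P(discordant) = 2 * theta * r * (1 - r).
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("retention vectors must be non-empty and of equal length")
    n = len(a)
    r = (sum(a) + sum(b)) / (2 * n)  # pooled retention frequency
    if r in (0.0, 1.0):
        raise ValueError("retention frequency must lie strictly between 0 and 1")
    discordance = sum(x != y for x, y in zip(a, b)) / n
    theta = min(discordance / (2 * r * (1 - r)), 1.0)
    distance_cr = float("inf") if theta >= 1.0 else -100.0 * math.log(1.0 - theta)
    return theta, distance_cr


marker_a = [1, 1, 0, 0, 1, 0, 1, 0, 1, 0]
marker_b = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
theta, cr = rh_two_point(marker_a, marker_b)
print(f"theta = {theta:.2f}, distance = {cr:.1f} cR")
```

Because a higher dose produces more breaks, the same physical separation gives a larger θ, and hence a larger distance in cR, which is why higher-dose panels offer greater resolution.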

Both somatic cell hybrids and radiation hybrids provide means of mapping and ordering sequence-tagged sites (STSs), and are able to incorporate both polymorphic and nonpolymorphic markers, such as expressed-sequence tags (ESTs).

ESTs are Type I anchor loci, generally produced from cDNA libraries, and their use greatly facilitates comparative mapping. The human genome mapping project is using gene-specific STSs found in the 3' untranslated regions of human cDNAs (Berry et al. 1995), or ESTs derived from cDNA libraries and attached to genes, to provide a transcript map of the human genome (Boguski and Schuler 1995). More than 16,000 genes are already positioned relative to a map of polymorphic genetic markers (Schuler et al. 1996), so the completion of a gene map for the human genome is well under way. The human genome mapping project includes 65,000 ESTs, based on the 3' and 5' ends of cDNAs from libraries in the public domain (Boguski and Schuler 1995), the eventual aim being to map more than 50,000 of these at 0.5 megabase intervals (Gianfrancesco et al. 1997). Fifty-nine EST gene markers have been mapped to the human X chromosome (Gianfrancesco et al. 1997), with an average spacing of four to five megabases. Eventually, every gene will have an associated gene-based STS (Berry et al. 1995).

1.8.5 Physical map of the canine genome The physical map of the canine genome is in its early stages of development, although the recent production of a standardised karyotype will speed map production (personal communication, M. Breen (1997)). Both large insert libraries based on BACs (bacterial artificial chromosomes) (Ostrander 1997) and somatic cell hybrid lines (Ostrander 1997)(Vignaux et al. 1997) are being developed.

Tentative chromosomal assignment has been made for at least seven genes at anchor loci in the dog (Guevara-Fujita et al. 1996), and the potential use of genes found in other species to produce a gene map of the dog has been investigated with promising results (Venta et al. 1996). cDNA libraries are also being used to isolate canine genes and physically map them to canine chromosomes, either with hybrid cell lines or directly by FISH.

Several microsatellite repeats have been found associated with genes in the dog, including CAG repeats in the coding region of the canine androgen receptor gene (Shibuya et al. 1993) and microsatellites in the canine MHC class I region (Burnett et al. 1995), in an intron of the canine von Willebrand factor gene (Shibuya et al. 1994) and in an intron of the canine Wilms tumour 1 (WT1) gene (Shibuya et al. 1996). The physical mapping of these genes can be combined with the use of their associated microsatellites in linkage mapping, giving these markers the characteristics of both Type I and Type II markers and helping to unify the physical and linkage maps of the canine genome.

The combined use of canine chromosome-specific paints (Langford et al. 1996) and physically assigned cosmid clones containing microsatellite markers for each chromosome provides a means of identifying canine chromosomes and of integrating genetic linkage groups with physical maps of the canine genome (Breen et al. 1997).

1.9 Comparative mapping

Highly developed human and mouse maps are speeding the progression of less developed maps using comparative mapping. Mapping of conserved genes between species reveals extensive conservation of chromosomal areas between animals as diverse as the human and the cat (O'Brien et al. 1997).

Human chromosome segments hybridise to the chromosomes of other species, often showing extensive banding homology (Hozier and Davis 1992). This conservation between species is known as synteny, and extends from conservation of a chromosome segment to conservation of the grouping and order of genes on those segments. Because the genetic and physical maps of many mammals, including the human, the mouse and economically important species such as cattle and sheep, are well advanced, comparative mapping (exploiting synteny between species) provides an efficient way of producing maps which are developing more slowly, such as that of the domestic dog. The use of reciprocal cat and human FISH probes also reveals substantial syntenic conservation across the chromosomes, cat chromosome D1 being entirely contained in human chromosome 11 (O'Brien et al. 1997). Chromosome paints developed in humans and mice have been used to find homology between distantly related species, including rodents, whales and ungulates, an approach called ZOO-FISH (Scherthan et al. 1994). The method demonstrates the potential of chromosome paints for aiding comparative mapping and evolutionary studies.

The gene resource provided by the ESTs of the human project is being used in numerous other species. The bovine genome project makes maximum use of these resources and consists of synteny groups which need only to be karyotypically assigned to provide the basis of the physical gene map of the bovine chromosomes (Fries 1993). Comparative mapping has revealed synteny between human chromosome 13 and bovine chromosome 12 (Sun et al. 1997), genes mapped in humans and mice have been physically assigned to sheep chromosomes, and further comparative mapping between sheep and cattle is under way (Ansari et al. 1994). One species in which these Type I markers are fully exploited is the cat, for which 105 physically assigned gene markers are available, based mostly on human homologues of genes suggested as anchor loci for comparative mapping (O'Brien et al. 1997). Initial studies reveal synteny between human and canine chromosomes: loci on human chromosome 17 are syntenic with loci on canine chromosomes 9 and 5 (Werner et al. 1997) and on one of the medium-sized chromosomes, tentatively assigned as chromosome 23 (Park 1996). Human and canine X chromosomes also share similar loci, and several individual genes in human YACs have been used to identify the equivalent genes and their chromosomal locations in the dog (Dutra et al. 1996). Synteny does not necessarily mean that the order of the genes within a syntenic group is conserved, and syntenic mapping as a whole is based on the smallest conserved evolutionary unit segments (SCEUS) within a group of syntenic loci (O'Brien et al. 1993). Synteny between human chromosome 17, mouse chromosome 11 and canine chromosome 9 involves an inverted gene order in the dog (Werner et al. 1997), demonstrating rearrangement within the syntenic group. The synteny of canine and human chromosomes shown by these studies, combined with the reported success of primers amplifying conserved gene loci in the dog (Venta et al. 1996) and the development of hybrid cell lines, should together facilitate the rapid development of the physical and gene maps of the dog.
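The distinction between conserved synteny and conserved gene order can be made concrete with a short sketch that compares the order of a syntenic block in one species with the positions of the same genes in another. The gene symbols and positions are hypothetical and are not the data of Werner et al. (1997).

```python
from typing import Dict, List


def compare_order(order_a: List[str], positions_b: Dict[str, float]) -> str:
    """Classify how a syntenic block of genes is ordered in a second species.

    order_a: gene symbols in their order along the chromosome of species A.
    positions_b: positions of the same genes along the chromosome of species B.
    Returns 'conserved', 'inverted' or 'rearranged'.
    """
    ranks = [positions_b[gene] for gene in order_a]
    if all(x < y for x, y in zip(ranks, ranks[1:])):
        return "conserved"
    if all(x > y for x, y in zip(ranks, ranks[1:])):
        return "inverted"
    return "rearranged"


block = ["g1", "g2", "g3", "g4"]  # hypothetical genes in species A order
print(compare_order(block, {"g1": 10, "g2": 20, "g3": 30, "g4": 40}))
print(compare_order(block, {"g1": 40, "g2": 30, "g3": 20, "g4": 10}))
print(compare_order(block, {"g1": 10, "g2": 30, "g3": 20, "g4": 40}))
```

All three cases are syntenic (the same genes lie on one chromosome in both species); only the first also conserves gene order.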

Nearly all comparative mapping has been carried out using the group of gene loci suggested originally by the committee on comparative genome mapping (O'Brien and Marshall Graves 1991). Various modifications have been made to this suggested group of around 200 loci, including its expansion to 321 loci for comparative mammalian genome mapping, selected for even spacing in the human genome and conservation in at least two species (O'Brien et al. 1993). A further modification investigated comparative anchor tagged sequences (CATS), again based on previous anchor loci but optimised for cats (Lyons et al. 1997). This latter group shows promise for the canine genome map, given that dogs are more closely related to cats than to humans.

1.10 Combining the maps

The physical and genetic maps produced for each species need to be combined to produce a complete map. The human physical and genetic maps are being combined by FISH mapping of YAC clones, thus physically assigning the linkage groups of the YAC library (Bray-Ward et al. 1996). Integration is also taking place using somatic cell hybrids (Budarf et al. 1996) and radiation hybrids (Stewart et al. 1997). The advantage of using such hybrid cell lines is that markers mapped both physically and by linkage can be used, which aids the integration of the two map types. Linkage groups of polymorphic markers can be assigned to specific cell lines containing the chromosomes, or parts of chromosomes, to which the groups will eventually be physically assigned. Gene loci can also be assigned to a cell line by PCR (Venta et al. 1996), thus combining the linkage and physical maps; in that study, primers were designed in evolutionarily conserved regions of the genes, enabling them to amplify canine DNA sequences. A further approach combines linkage and physical maps where the linkage groups are based on more than one set of reference families, as with the bovine map (Fries 1993). As proposed for the bovine genome (Fries 1993), this can be achieved by creating a set of physically mapped, polymorphic cosmid markers representing all the chromosomes of the animal in question, with two markers for each chromosome, one at each end. Physical assignment of each marker would establish it as an 'anchor' locus for its chromosome, and its polymorphism would enable this small set to be typed across all the sets of reference families. All linkage groups from all the different families could then be assigned to specific chromosomes, placing the groups on a common physical map. Progress towards the development of such markers is being made for the canine genome (Breen et al. 1997), and their availability to the canine mapping community as a whole will enable the integration of the linkage groups and their physical assignment.
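The anchor-marker strategy described above amounts to a simple lookup: each linkage group inherits the chromosome of any physically mapped anchor it contains. A minimal sketch with hypothetical marker and group names follows (CFA is used as the conventional prefix for dog chromosomes); it also flags groups with no anchor, or with anchors on different chromosomes.

```python
from typing import Dict, List


def assign_linkage_groups(groups: Dict[str, List[str]],
                          anchors: Dict[str, str]) -> Dict[str, str]:
    """Assign each linkage group to a chromosome via physically mapped anchors.

    groups: linkage group name -> markers in that group (from any family set).
    anchors: anchor marker -> chromosome it was FISH-mapped to.
    Returns group -> chromosome, 'unassigned' if the group holds no anchor,
    or 'conflict' if its anchors point to different chromosomes.
    """
    result = {}
    for name, markers in groups.items():
        chromosomes = {anchors[m] for m in markers if m in anchors}
        if not chromosomes:
            result[name] = "unassigned"
        elif len(chromosomes) == 1:
            result[name] = chromosomes.pop()
        else:
            result[name] = "conflict"
    return result


groups = {"LG1": ["M1", "M2", "M3"], "LG2": ["M4", "M5"],
          "LG3": ["M6"], "LG4": ["M7", "M8"]}
anchors = {"M2": "CFA9", "M5": "CFA5", "M7": "CFA1", "M8": "CFA2"}
print(assign_linkage_groups(groups, anchors))
```

Unassigned groups remain usable as pure linkage groups until a new anchor is typed in them; conflicts point to either a mapping error or a chimeric group.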

1.11 Research aims

Genetic mapping of the canine genome has so far largely involved the use of microsatellite markers. Most have been isolated from small-insert libraries, which are ideal for linkage mapping but not for physical mapping, as the latter requires at least 5 kilobases of DNA to produce an easily identifiable signal. If the physical and genetic maps of the canine genome are to be integrated, polymorphic markers which can also be physically mapped need to be developed. This strategy has already been used with some success for the porcine (Robic et al. 1995), bovine (Toldo et al. 1993) and equine (Breen et al. 1997) maps. To this end, it was decided to isolate polymorphic markers from a partial canine cosmid library with an average insert size of 38 kilobases, which is suitable for physical mapping by FISH. Once cosmid-derived microsatellites had been isolated, polymorphic markers would be identified by assessing their allele size variation across twelve breeds of dog. The cosmids themselves could be physically mapped by FISH, assigning each to a canine chromosome. Genetic linkage analysis of the polymorphic microsatellite markers in two-generation reference families would then identify markers genetically linked to those which had been physically mapped. Each marker showing linkage to a physically mapped marker could then be assigned to a specific chromosome, bridging the gap between the genetic and physical maps. Any physically assigned markers which were not genetically linked to others in the reference families would still be useful as more markers are typed across the families in the future. By the same reasoning, any markers which cannot be assigned to a chromosome are still useful as pure linkage markers.
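The text does not specify at this point how allele size variation would be summarised. As a hedged sketch, two widely used measures of marker informativeness, expected heterozygosity and polymorphism information content (PIC), can be computed from the allele sizes observed across a panel of dogs. The allele sizes in the example are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def allele_frequencies(allele_sizes: Iterable[int]) -> Dict[int, float]:
    """Frequency of each allele (identified by its size in bp)."""
    counts = Counter(allele_sizes)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def heterozygosity(freqs: Dict[int, float]) -> float:
    """Expected heterozygosity, H = 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs.values())


def pic(freqs: Dict[int, float]) -> float:
    """PIC = 1 - sum(p_i^2) - sum over i<j of 2 p_i^2 p_j^2."""
    p = list(freqs.values())
    value = 1.0 - sum(x * x for x in p)
    for i in range(len(p)):
        for j in range(i + 1, len(p)):
            value -= 2.0 * p[i] ** 2 * p[j] ** 2
    return value


# hypothetical allele sizes (bp) scored in a panel of dogs, two alleles per dog
sizes = [148, 150, 150, 152, 152, 152, 154, 150, 148, 152, 156, 150]
freqs = allele_frequencies(sizes)
print(f"alleles: {len(freqs)}, H = {heterozygosity(freqs):.3f}, PIC = {pic(freqs):.3f}")
```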
The complete characterisation of a small number of microsatellite markers, which could be achieved within the time available for the PhD, would make an important contribution to the progress of the canine genome map as a whole.

CHAPTER 2

GENERAL SOLUTIONS

ACRYLAMIDE GEL (6% and 4%) - DENATURING
For 100ml:
43g Urea (USB Ultrapure)
5ml 10x TBE
15ml (6%) or 10ml (4%) Accugel (40%) (National Diagnostics, Flowgen)
dH2O to 100ml
500µl 10% APS (ammonium persulphate)
30µl TEMED (N,N,N',N'-tetramethylethylenediamine)

ACRYLAMIDE GEL (5%) - NON-DENATURING
For 100ml:
10ml 10x TBE
12.5ml Accugel (40%)
77.5ml dH2O
500µl 10% APS
30µl TEMED
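The stock volumes in the two acrylamide recipes follow from the dilution relationship C1V1 = C2V2. A minimal Python sketch (the function name is hypothetical) reproduces them from the 40% Accugel stock; the same relation gives the TBE volumes, 5ml of 10x TBE in 100ml being 0.5x and 10ml being 1x.

```python
def stock_volume_ml(final_conc: float, total_ml: float, stock_conc: float = 40.0) -> float:
    """Volume of stock needed, from C1 * V1 = C2 * V2 (units of concentration must match)."""
    if final_conc > stock_conc:
        raise ValueError("final concentration cannot exceed the stock concentration")
    return final_conc * total_ml / stock_conc


for pct in (4, 5, 6):
    print(f"{pct}% gel, 100ml: {stock_volume_ml(pct, 100):.1f}ml of 40% Accugel")
print(f"0.5x TBE, 100ml: {stock_volume_ml(0.5, 100, stock_conc=10):.1f}ml of 10x TBE")
```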

DENATURING SOLUTION (Amersham Hybond-N booklet)
1.5M NaCl
0.5M NaOH

0.5M EDTA (disodium ethylenediaminetetraacetic acid), pH8
For 1 litre:
Add 186.1g solid EDTA to 800ml of H2O.
Stir well and adjust to pH8 with ~20g of NaOH pellets.
Make up to 1 litre with dH2O.
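The masses in the EDTA and Tris-HCl recipes follow from mass = molarity × volume × formula weight. The sketch below assumes that the 'solid EDTA' is disodium EDTA dihydrate (formula weight 372.24 g/mol), which the recipe does not state explicitly; Tris base has a formula weight of 121.14 g/mol. The function name is hypothetical.

```python
def grams_for_solution(molarity: float, volume_l: float, formula_weight: float) -> float:
    """Mass of solute (g) = molarity (mol/l) x volume (l) x formula weight (g/mol)."""
    return molarity * volume_l * formula_weight


# disodium EDTA dihydrate, FW 372.24 g/mol (assumed form of the reagent)
print(f"{grams_for_solution(0.5, 1.0, 372.24):.1f}g EDTA for 1 litre of 0.5M")
# Tris base, FW 121.14 g/mol
print(f"{grams_for_solution(1.0, 1.0, 121.14):.1f}g Tris base for 1 litre of 1M")
```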

EXTRACTION BUFFER
150mM NaCl
10mM Tris-HCl, pH 8.0
100mM EDTA, pH 8.0
1% SDS (sodium dodecyl sulphate)
0.5mg/ml Proteinase K (Gibco BRL)

G.E.T SOLUTION
50mM Glucose
10mM EDTA pH8.0
25mM Tris-HCl pH8.0

LB (Luria Broth)
For 1 litre:
10g Bacto-tryptone (Oxoid)
5g Bacto-yeast extract (Oxoid)
10g NaCl
H2O to 1 litre
Autoclave (at 15lb/square inch for 20 minutes) and allow to cool to ~50°C before adding antibiotics as required.

LB (Luria Broth) PLATE AGAR
Recipe as for LB, with the addition of:
15g/l Bacto-agar

LOADING BUFFER (6x)
In H2O:
0.25% w/v Bromophenol Blue (BDH) (from a 5% stock solution in dH2O)
0.25% w/v Xylene Cyanol (BDH) (from a 5% stock solution in dH2O)
30% v/v Glycerol (from a 50% stock solution)

10x M9 SALTS
Add to 800ml dH2O:
60g Na2HPO4
30g KH2PO4
5g NaCl
10g NH4Cl
Make up to 1 litre with dH2O.

M9 MINIMAL AGAR
Add to 750ml sterile deionised H2O containing 15g/l bacteriological agar:
100ml 10x M9 salts
2ml 1M MgSO4
0.1ml 1M CaCl2
20ml 20% Glucose Solution
Make up to 1 litre with sterile deionised H2O.

NEUTRALISING SOLUTION (Amersham Hybond-N booklet)
1.5M NaCl
0.5M Tris-HCl pH7.2
0.001M EDTA

PCR BUFFER (11.1x STOCK, for 676µl) (recipe from Alec Jeffreys, personal communication)
Volume    Stock solution             Final reaction concentration
167µl     2M Tris-HCl (pH8.0)        45mM
83µl      1M Ammonium sulphate       11mM
33.5µl    1M MgCl2                   4.5mM
3.6µl     100% 2-mercaptoethanol     6.7mM
3.4µl     10mM EDTA (pH8.0)          4.4µM
75µl      100mM dATP                 1mM
75µl      100mM dCTP                 1mM
75µl      100mM dGTP                 1mM
75µl      100mM dTTP                 1mM
85µl      10mg/ml BSA                113µg/ml
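Since the stock is 11.1x, each final reaction concentration should equal the stock concentration multiplied by the fraction of the ~676µl of buffer it occupies, divided by 11.1. The sketch below checks the recipe in this way; the 14.3M figure for neat 2-mercaptoethanol is an assumption not given in the recipe. On this arithmetic the EDTA works out at roughly 4.5µM (micromolar) in the final reaction, and the other components agree closely with the quoted final concentrations.

```python
# (component, volume added in µl, stock concentration, unit of stock and final values)
PCR_BUFFER_11X = [
    ("Tris-HCl (pH8.0)", 167.0, 2000.0, "mM"),
    ("ammonium sulphate", 83.0, 1000.0, "mM"),
    ("MgCl2", 33.5, 1000.0, "mM"),
    ("2-mercaptoethanol", 3.6, 14300.0, "mM"),  # neat liquid, ~14.3M (assumed)
    ("EDTA (pH8.0)", 3.4, 10.0, "mM"),
    ("dATP", 75.0, 100.0, "mM"),
    ("dCTP", 75.0, 100.0, "mM"),
    ("dGTP", 75.0, 100.0, "mM"),
    ("dTTP", 75.0, 100.0, "mM"),
    ("BSA", 85.0, 10000.0, "µg/ml"),            # 10mg/ml
]
FOLD = 11.1  # the stock is 11.1x the working concentration


def final_concentrations(recipe, fold=FOLD):
    """Return the total stock volume and each component's final reaction concentration."""
    total_ul = sum(vol for _, vol, _, _ in recipe)
    finals = {name: stock * vol / total_ul / fold for name, vol, stock, _ in recipe}
    return total_ul, finals


total, finals = final_concentrations(PCR_BUFFER_11X)
print(f"total volume {total:.1f}µl")
for name, _, _, unit in PCR_BUFFER_11X:
    print(f"{name:20s} {finals[name]:.4g} {unit}")
```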

POTASSIUM ACETATE SOLUTION (pH4.8)
60ml 5M Potassium acetate solution
11.5ml Glacial Acetic Acid
28.5ml H2O
(The resulting solution is 3M with respect to potassium and 5M with respect to acetate.)

PREHYBRIDISATION SOLUTION FOR RADIOACTIVE USE (for 200ml)
6x SSC (from 20x stock solution)
5mM EDTA (from 0.5M stock solution, pH8.0)
6% w/v PEG (polyethylene glycol) (from 36% stock solution)
1% w/v SDS (sodium dodecyl sulphate) (from 10% stock solution)
0.25% w/v Marvel Milk

20x SSC (Amersham Hybond-N booklet)
3M NaCl
0.3M Tri-sodium citrate

STOP SOLUTION
95% Formamide
20mM EDTA (from 0.5M stock solution, pH8.0)
0.05% Bromophenol Blue (from 5% stock solution)
0.05% Xylene Cyanol (from 5% stock solution)

10x TBE (Tris-Borate-EDTA)
For 1 litre:
108g Tris base
55g Boric acid
40ml 0.5M EDTA, pH8.0

1M TRIS-HCl (Tris[hydroxymethyl]aminomethane)
121.1g Tris base in 800ml H2O
Adjust pH to that required with concentrated HCl and make up volume to 1 litre with dH2O.

METHODS

2.1 Identification of cosmids and fragments containing putative microsatellites

2.1.1 Partial canine cosmid library Clones were obtained from a partial canine cosmid library in the pWE15 vector (cloned into the BamHI site) from Stratagene, donated by M. Binns of the Animal Health Trust, Newmarket.

Bacteria containing cosmid recombinants were grown on LB agar containing 50µg/ml ampicillin (Sigma). Colonies were then picked into individual wells of microtitre plates, each well containing 200µl of LB + ampicillin (50µg/ml), and grown overnight at 37°C with shaking. Glycerol was then added to each well and mixed with a pipette tip to give a 15% glycerol solution, and the plates were stored at -70°C.
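The protocol does not state the concentration or volume of the glycerol added. Assuming, for illustration, that a glycerol stock is added to the 200µl of culture without removing any medium, the volume needed to reach 15% (v/v) follows from stock × v / (200 + v) = 15, as sketched below. The function name and stock concentrations are hypothetical.

```python
def glycerol_to_add(culture_ul: float, target_percent: float, stock_percent: float) -> float:
    """Volume of glycerol stock (µl) to add to a culture to reach target_percent (v/v).

    Solves stock * v / (culture + v) = target for v.
    """
    if not 0 < target_percent < stock_percent:
        raise ValueError("target must be positive and below the stock concentration")
    return culture_ul * target_percent / (stock_percent - target_percent)


print(f"{glycerol_to_add(200, 15, 100):.1f}µl of 100% glycerol per well")
print(f"{glycerol_to_add(200, 15, 50):.1f}µl of 50% glycerol per well")
```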

2.1.2 Hedgehog blot production
Each microtitre plate was replicated onto Hybond-N membrane (Amersham) as follows. The prongs of a 'hedgehog' replicating device (Microtitre, Dynatech) were sterilised by immersion in IMS and flaming. After cooling, the hedgehog was placed in the microtitre plate and then directly onto the Hybond-N, which had been placed on LB agar containing 50µg/ml ampicillin. This was then incubated at 37°C overnight. One of two methods was used to lyse the bacteria and to fix the DNA to the Hybond-N:

Method 1
The membranes were placed colony side up on Whatman 3MM chromatography paper soaked in 2x SSC/5% SDS for 2 minutes in a microwave-proof container. The container was then heated in a Hitachi 1.4kW microwave oven at the maximum setting for 2½ minutes to lyse the cells.

Method 2
The membranes (always colony side up) were placed onto 3MM paper soaked in 0.5M NaOH/1.5M NaCl for 5 minutes. They were then transferred onto Whatman 3MM paper soaked in 1M Tris-HCl, pH7.2, for 5 minutes and finally onto Whatman 3MM paper soaked in 0.5M Tris, pH7.5/1.5M NaCl, for 5 minutes. The membranes were left to air dry. The DNA was fixed either by placing the membrane colony side down on a Fluo-Link TFL-35M UV illuminator for exactly 1½ minutes, or by placing it face up in an Amersham UV crosslinker using the default settings.

2.1.3 Probing for cosmid colonies containing microsatellites - Radioactive method
1. Probe synthesis
2µl poly(AC)n·(GT)n at 30ng/µl, 1µl oligo (CA)10 (50pmol/µl, or 324ng/µl), 2µl 10x TM and 5µl H2O were combined and placed at 70°C for 5 minutes. The solution was then left to cool to room temperature for 30 minutes. 2µl dATP (0.125mM), 2µl (20µCi) [32P]dCTP (specific activity 3000Ci/mmol, concentration 10mCi/ml, from NEN), 0.5µl (0.5U) Klenow and 5.5µl H2O were then added and mixed. The solution was placed in a 37°C water bath for 1 hour, boiled, and added while still hot to the prehybridisation solution (see General solutions section, pg 49 for recipe).
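The two-stage probe synthesis can be checked with a short tally. The sketch below sums the listed volumes and converts the radiolabel into picomoles using the stated specific activity. It ignores radioactive decay since the calibration date, which is an assumption.

```python
# Tally of the radioactive probe-synthesis reaction (section 2.1.3).
annealing_mix_ul = {
    "poly(AC)n.(GT)n, 30ng/ul": 2.0,
    "oligo (CA)10": 1.0,
    "10x TM": 2.0,
    "H2O": 5.0,
}
labelling_additions_ul = {
    "dATP, 0.125mM": 2.0,
    "[32P]dCTP, 10mCi/ml": 2.0,
    "Klenow": 0.5,
    "H2O": 5.5,
}

total_ul = sum(annealing_mix_ul.values()) + sum(labelling_additions_ul.values())

# 10mCi/ml is 10uCi per ul of label.
dctp_uci = labelling_additions_ul["[32P]dCTP, 10mCi/ml"] * 10.0
# 3000Ci/mmol is 3uCi/pmol (nominal, ignoring decay).
dctp_pmol = dctp_uci / 3.0
# 0.125mM = 125uM dATP stock, diluted into the final volume.
datp_final_um = 125.0 * labelling_additions_ul["dATP, 0.125mM"] / total_ul
# Amount of poly(AC)n.(GT)n: 2ul at 30ng/ul.
poly_ng = 30.0 * annealing_mix_ul["poly(AC)n.(GT)n, 30ng/ul"]

print(f"{total_ul}ul total, {dctp_uci}uCi ({dctp_pmol:.1f}pmol) dCTP, "
      f"{datp_final_um}uM dATP, {poly_ng}ng poly(AC)n.(GT)n")
```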

2. Probing and washing blots
The blot was placed in prehybridisation solution at 60°C for several hours or overnight. The prehybridisation solution was then replaced with fresh solution and the probe added as described above (this hybridisation solution can be stored at -20°C for at least 6 months and re-used). The probe was left to hybridise at 60°C overnight. All washes were carried out at 60°C. Washing continued until radioactivity levels, as detected with a monitor (Type B Series 900 mini-monitor G-M tube, Mini-Instruments Ltd), had reached a plateau. The washes were as follows: 20 minutes in 6x SSC/0.1% SDS, 10 minutes in 1x SSC/0.1% SDS and three washes of 10 minutes each in 0.5x SSC/0.1% SDS.

Positive clones were detected by placing the blot in a film cassette with Fuji Medical X-Ray film or Kodak Diagnostic film. All X-Ray films were developed by a DuPont CRONEX CX-130 processor.

2.1.4 Probing for cosmid colonies containing microsatellites - ECL method
The Amersham LIFE SCIENCE ECL™ 3'-oligolabelling and detection systems kit provided all materials except Buffers 1 and 2 and the wash and stringency wash solutions.

Cleaning cosmid and plasmid blots for ECL detection
To obtain sufficient positive signal from the cosmid and plasmid blots in the ECL method, it was essential to remove all visible cellular debris from the membrane after fixing the DNA. This was achieved by rubbing the membrane with a clean glove in 2x SSC.

1. Probe synthesis (pp16-18 of the Amersham LIFE SCIENCE ECL™ 3'-oligolabelling and detection systems booklet)

xµl      (CA)10 or (GAAA)6 oligonucleotide, 100pmol (100 x 10⁻¹² mol) (synthesised by PNACL, Leicester)
10µl     Fluorescein-11-dUTP
16µl     Cacodylate buffer
yµl      dH2O
16µl     Terminal transferase
160µl    Total

x and y were adjusted so that the total reaction volume was 160µl. The solution was mixed gently using a pipette and incubated at 37°C for 60-90 minutes. The labelled probe was stored at -20°C, protected from light.
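Because x and y are defined only by the 160µl total, their values depend on the concentration of the oligonucleotide stock, which is not stated. The minimal sketch below takes a hypothetical stock concentration from the caller and returns both volumes.

```python
FIXED_UL = {
    "Fluorescein-11-dUTP": 10.0,
    "Cacodylate buffer": 16.0,
    "Terminal transferase": 16.0,
}
TOTAL_UL = 160.0
OLIGO_PMOL = 100.0


def labelling_volumes(oligo_pmol_per_ul: float) -> tuple:
    """Return (x, y): ul of oligonucleotide and ul of dH2O for a 160ul reaction."""
    x = OLIGO_PMOL / oligo_pmol_per_ul
    y = TOTAL_UL - sum(FIXED_UL.values()) - x
    if y < 0:
        raise ValueError("oligonucleotide stock too dilute for a 160ul reaction")
    return x, y


print(labelling_volumes(50.0))  # hypothetical 50pmol/ul stock
```

For example, a 50pmol/µl stock gives x = 2µl and y = 116µl.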

2. Probing and washing blots (pp19-27 of the Amersham LIFE SCIENCE ECL™ 3'-oligolabelling and detection systems booklet)
The blots were placed in a bag with 0.125ml/cm² hybridisation buffer (5x SSC, 0.1% (w/v) hybridisation buffer component, 0.02% (w/v) SDS, 20-fold dilution of liquid block) and prehybridised for at least 30 minutes with agitation at 45°C. The labelled oligonucleotide probe was then added to the prehybridisation solution at a final concentration of 5ng/ml, and the blot was hybridised under the same conditions for 1-2 hours (or overnight). The blot was removed from the hybridisation solution and washed twice for 5 minutes in an excess of Wash Buffer (5x SSC, 0.1% (w/v) SDS) at room temperature. It was then given two 15-minute washes with agitation at 45°C in pre-warmed Stringency Wash Buffer (1x SSC, 0.1% (w/v) SDS). All subsequent steps were carried out at room temperature. The blot was rinsed in Buffer 1 (0.15M NaCl, 0.1M Tris base, pH7.5) for 1 minute. It was then placed in a bag with 0.125ml/cm² of a 20-fold dilution of liquid block in Buffer 1 and incubated for at least 30 minutes. After a further rinse in Buffer 1, the blot was placed in a bag for 30 minutes with 0.125ml/cm² diluted antibody conjugate solution. This solution consisted of anti-fluorescein HRP conjugate diluted 1000-fold in Buffer 2 (0.4M NaCl, 0.1M Tris base, pH7.5) containing 0.5% (w/v) bovine serum albumin (fraction V). The blot was then washed in excess Buffer 2 for 4x 5 minutes and stored in cling film at 4°C until signal generation and detection.

3. Signal generation and detection
All steps were carried out in the darkroom. Excess wash buffer was drained from the blot. 0.0625ml/cm² each of detection solutions 1 and 2 were mixed and immediately poured over the blot (DNA side up on cling film). The blot was incubated for exactly 1 minute. Excess detection buffer was then drained off and the blot was wrapped in cling film, with all air pockets removed. The blot was then placed DNA side up in a film cassette, the lights were switched off and a sheet of autoradiography film was placed over the blots. The cassette was closed and the film exposed for between 2 minutes and 3 hours until an appropriate signal was obtained.
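The hybridisation, blocking, antibody and detection volumes in this protocol all scale with membrane area. The sketch below applies the stated per-cm² figures. The 12 x 8cm blot used in the example is hypothetical.

```python
HYB_ML_PER_CM2 = 0.125          # hybridisation, blocking and antibody steps
DETECTION_ML_PER_CM2 = 0.0625   # each of detection solutions 1 and 2
PROBE_NG_PER_ML = 5.0           # final probe concentration in hybridisation


def ecl_volumes(width_cm: float, height_cm: float) -> dict:
    """Reagent volumes (ml) and probe amount (ng) for a blot of the given size."""
    area = width_cm * height_cm
    hyb_ml = HYB_ML_PER_CM2 * area
    return {
        "area_cm2": area,
        "hybridisation_buffer_ml": hyb_ml,
        "probe_ng": PROBE_NG_PER_ML * hyb_ml,
        "detection_solution_1_ml": DETECTION_ML_PER_CM2 * area,
        "detection_solution_2_ml": DETECTION_ML_PER_CM2 * area,
    }


print(ecl_volumes(12, 8))  # hypothetical 12 x 8cm blot
```

For a 12 x 8cm blot, this gives 12ml of hybridisation buffer, 60ng of probe and 6ml of each detection solution.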

2.1.5 Isolation of putative positive clones
Using a sterile wooden toothpick, a scrape was taken from each frozen microtitre plate well containing a possible microsatellite-containing clone. These samples were then grown overnight in 20ml Sterilin tubes containing 5ml of LB and 50µg/ml ampicillin. Three 0.75ml aliquots of each culture were placed in 1.5ml Eppendorf tubes and frozen at -70°C in 15% glycerol for long-term storage.

The remainder of the overnight culture was then used to prepare DNA by the alkaline lysis method (see section 2.1.6).

2.1.6 Small-scale Alkaline Lysis Method (adapted from Birnboim and Doly, 1979)
Up to 4.5ml of culture was spun for 2 minutes at 9,500g. The supernatant was removed, and the pellet was gently resuspended using a pipette in 200µl of ice-cold GET (solution I) and left at room temperature for 5 minutes.

300µl of freshly made 0.2M NaOH/1% SDS (solution II) was added, mixed by inversion and kept on ice for 5 minutes. The solution was neutralised with 300µl of ice-cold 3M potassium acetate, pH4.8 (solution III) and incubated on ice for 5 minutes (it can be left for an hour or more at this stage). Cellular debris was removed by centrifuging for 10 minutes at room temperature and the supernatant was transferred to a fresh tube. DNase-free RNase (Promega) was added to a final concentration of 20µg/ml and the solution incubated for 20 minutes at 37°C. The solution was extracted twice with 400µl chloroform:iso-amyl alcohol (24:1). The layers were mixed by hand for 30 seconds and then spun for 2 minutes in a centrifuge to separate the phases, and the upper, aqueous phase was removed to a fresh tube. The DNA was precipitated by the addition of an equal volume of 100% isopropanol and immediately spun for 10 minutes at room temperature. The DNA pellet was washed with 500µl 70% ethanol, dried under vacuum for 3 minutes and resuspended in sterile dH2O.

2.1.7 Large-scale Alkaline Lysis Method (adapted from Birnboim and Doly, 1979)
1. An E. coli culture harbouring the cosmid of interest was grown overnight (12-16 hours) at 37°C in a 50ml Corex tube in up to 30ml LB culture medium containing ampicillin at a concentration of 50µg/ml.
2. The culture was spun for 10 minutes at 2500g and the supernatant removed.
3. The pellet was gently resuspended with a pipette in 1ml of ice-cold solution I and left at room temperature for 5 minutes.
4. 2ml of freshly prepared solution II was added, and the solution was mixed by inversion and kept on ice for 5 minutes.
5. The solution was neutralised by adding 1.5ml of ice-cold solution III and incubating on ice for a minimum of 5 minutes (storage for up to an hour is possible at this stage).
6. Cellular debris was removed by centrifugation for 15 minutes at maximum speed (5600g) at 4°C.
7. The supernatant was transferred to a fresh tube and filtered using a 5ml tip filled with sterilised poly-allomes wool.
8. DNase-free RNase was added to a final concentration of 100µg/ml and incubated for 30 minutes at 37°C.
9. An equal volume of phenol:chloroform:isoamyl alcohol (25:24:1) was added and the suspension mixed by vortexing. The tube was then spun at 2500g for 5 minutes and the supernatant was transferred to a fresh tube. The extraction was then repeated using chloroform:isoamyl alcohol (24:1).
10. Double-stranded DNA was precipitated by adding 0.1 vol of NaAc (3M, pH5.2), followed by 2.5 vol of 100% ethanol, to the supernatant from the previous step. The solution was immediately spun in a centrifuge at 9,500g for 20 minutes at room temperature.
11. All the supernatant was removed, taking care not to detach the DNA pellet from the wall of the tube. The pellet was washed with 70% ethanol and allowed to air dry for 5 minutes.
12. The DNA was resuspended in 20-50µl of sterile dH2O and stored in an Eppendorf tube at -20°C.

2.1.8 Quantification of DNA
The DNA was left to dissolve. Once it had resuspended, 1µl of each DNA sample to be quantified was added to 9µl sterile dH2O and 2µl 6x loading buffer. Each sample was then loaded on a 1% agarose gel containing 0.5µl/ml ethidium bromide (from a 10mg/ml stock solution). The samples were subjected to electrophoresis alongside lambda DNA samples of known concentration (Amersham). The DNA concentration of each sample was estimated by comparing the intensity of its staining with that of the lambda DNA standards.

DNA was also quantified using a spectrophotometer (PYE UNICAM PU 8600 UV/VIS Spectrophotometer, Philips) by taking an absorbance reading at 260nm. Double-stranded DNA at a concentration of 50ng/µl gives an A260 of 1.0, so the concentration of DNA in a sample was calculated as follows:

DNA concentration = A260 x dilution factor x 50ng/µl

For example, if 10µl of sample DNA in 990µl dH2O (a 1 in 100 dilution) gives an A260 of 0.168, the concentration of the sample DNA is:

0.168 x 100 x 50ng/µl = 840ng/µl
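The calculation above can be written as a small function. The 50ng/µl conversion applies only to double-stranded DNA. The function also assumes that the reading lies within the linear range of the spectrophotometer.

```python
DSDNA_NG_PER_UL_AT_A260_1 = 50.0  # dsDNA giving an A260 of 1.0


def dsdna_concentration(a260: float, sample_ul: float, diluent_ul: float) -> float:
    """Concentration (ng/ul) of double-stranded DNA in the undiluted sample."""
    dilution_factor = (sample_ul + diluent_ul) / sample_ul
    return a260 * dilution_factor * DSDNA_NG_PER_UL_AT_A260_1


print(f"{dsdna_concentration(0.168, 10, 990):.0f}ng/ul")
```

With the worked values above, it reproduces the result of 840ng/µl.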

2.1.9 Digestion of DNA with restriction endonucleases
The required amount of DNA was digested to completion with a restriction endonuclease that recognises a four- or six-base sequence, as specified by the manufacturer (Gibco-BRL or Promega unless otherwise indicated).

2.1.10 Separation of putative positive clone fragments digested with a restriction endonuclease
0.5µg of each positive clone was digested overnight with a restriction endonuclease. Loading buffer was then added and the samples were loaded onto a large 0.8% agarose gel. The gel was run at 60V for several hours (usually overnight), until the bands were separated well enough for most of them to be identified individually. If the gel was to be blotted, it was run with fluorescently labelled markers (λHindIII and λEcoRI digests), labelled as described in section 2.1.12.

2.1.11 Agarose gels
Unless stated otherwise, 1.0% agarose gels (Multi-purpose agarose, Boehringer Mannheim) were made up in 1x TBE containing 0.5µg/ml ethidium bromide. The gels were run at 120V for 1-2 hours and visualised under UV illumination from a UVP Dual-Intensity transilluminator. Photographs of gels were taken using a UVP camera and a SONY Videographic printer UP-860CE.

2.1.12 Fluorescein-dUTP end-labelling of DNA molecular weight markers (modified from the Amersham FluoroGreen booklet)
1µg DNA to be end-labelled
10µl each of 300µM dATP, dCTP and dGTP
2µl FluoroGreen (Fluorescein-11-dUTP, 1mM, Amersham)
1µl Klenow (Promega)
5µl 10x Klenow buffer
dH2O to 50µl
The solution was mixed and incubated at 20°C overnight. 3-4µl was run on a gel as a fluorescently labelled marker.

2.1.13 Southern Blotting (Amersham Hybond-N protocol booklet, pp5-6)
The gel containing the bands was rinsed in distilled water and then incubated at room temperature for 30 minutes in denaturing solution with agitation. The gel was then rinsed in distilled water and placed in neutralising solution for 15 minutes with agitation. The procedure was then repeated and the gel was rinsed in distilled water. A platform covered with three layers of 3MM paper saturated with 20x SSC was placed in a tray containing 20x SSC. The gel was placed on the filter paper, taking care not to trap air bubbles underneath it. A sheet of Hybond-N membrane cut to the exact size of the gel was placed on top of the gel. Any air bubbles were carefully squeezed out, and three sheets of 3MM cut to size and wetted with 20x SSC were placed on top of the membrane. A stack of absorbent paper towels 5cm high was placed on top of the 3MM, followed by a flat plate and a 1kg weight. Transfer was allowed to take place for 2-16 hours. The membrane was then removed from the apparatus and left to air dry. The DNA was fixed either by placing the membrane DNA side down on a Fluo-Link TFL-35M UV illuminator for exactly 1½ minutes, or by placing it face up in an Amersham UV crosslinker using the default settings.

2.1.14 Putative microsatellite repeat-containing fragment identification
Sau3AI was used to digest the putative microsatellite-containing cosmids. The digested cosmid DNAs were then run on an agarose gel as described above and Southern blotted. The sizes of the fragments containing the putative microsatellite were determined using the ECL method described above. A large number of cosmids contained possible (CA)n repeats, so for (CA)n-repeat microsatellites the cosmid was discarded if the smallest fragment containing a repeat was larger than 1.0kb. For (GAAA)n-repeat fragments, the process was repeated with different restriction enzymes until a fragment of approximately 200-1500 base pairs was obtained.
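The fragment-size rules in this section amount to a simple decision procedure. In this sketch, the function name is hypothetical, and the "approximately 200-1500 base pairs" window is treated as a pair of hard limits, which is an assumption.

```python
def next_step(repeat: str, smallest_positive_bp: int) -> str:
    """Decide what to do with a cosmid, given its smallest repeat-positive fragment."""
    if repeat == "CA":
        # Many (CA)n cosmids were available, so large fragments were not pursued.
        return "discard cosmid" if smallest_positive_bp > 1000 else "proceed"
    if repeat == "GAAA":
        if 200 <= smallest_positive_bp <= 1500:
            return "proceed"
        return "re-digest with another enzyme"
    raise ValueError(f"unhandled repeat type: {repeat}")


print(next_step("GAAA", 2000))
```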

The whole procedure was then repeated to verify the identity and size of the fragment of interest. When more than one repeat was present in the cosmid, the band of interest was eluted from the gel as described below. In cases where only one positive fragment was detected, the entire cosmid digest providing the correct positive fragment size was cleaned (see section 2.1.16 Method 2) in preparation for subcloning.

2.1.15 Band elution from agarose gels (McDonell et al., 1977)
The appropriate band was cut from an agarose gel with a clean scalpel blade. Each agarose slice was placed in dialysis tubing (Medicell International Ltd; prepared as recommended in Maniatis 2nd edition, E39) clipped at one end. Fresh 1x TBE was added until the slice was covered and the excess liquid was tipped out. Any air bubbles were carefully removed and the tubing was clipped at the open end. The slice was pushed gently to one side of the dialysis tubing, and the tubing was placed in an electrophoresis tank, immersed in 1x TBE with the slice nearest to the negative electrode. The samples were then subjected to electrophoresis at approximately 150V for 20-30 minutes, depending on the size of the DNA fragment to be eluted. The polarity of the electrodes was reversed for 10 seconds to release DNA from the inside surface of the dialysis membrane. As much as possible of the buffer surrounding the agarose slice was then removed and placed in a 1.5ml Eppendorf tube.

2.1.16 Preparation of DNA fragments for subcloning
The DNA from an agarose slice was prepared for subcloning using one of the following two methods:

Method 1
0.1 volume of 3M sodium acetate was added to the tube containing the electroeluted DNA in 1x TBE (see section 2.1.15), followed by 2 volumes of 100% ethanol. The solution was mixed by inversion and the DNA was precipitated for 2 hours at -70°C or overnight at -20°C. The DNA was recovered by spinning at 13,000rpm for 30 minutes. The supernatant was decanted, and the pellet was washed in 1ml 100% ethanol and spun for 15 minutes. This was then repeated with 70% ethanol. The pellet was then resuspended in 10-20µl of sterile dH2O.

Method 2 (used for preparation of whole cosmid digests as well as for eluted bands)

The DNA was cleaned using Qiagen QIAquick Nucleotide Removal Kit following the manufacturer's instructions provided in the QIAquick™ handbook:

1. 10 volumes of buffer PN were added to one volume of DNA solution (for fragments larger than 100bp, only 5 volumes were added).
2. The solution was mixed, placed in a QIAquick spin column in a 2ml collection tube and spun in a microcentrifuge at ~5600g for 1 minute.
3. The flow-through was discarded and the column was washed by adding 750µl buffer PE (with ethanol added according to the manufacturer's instructions) and spinning as before. The flow-through was again discarded and the column was spun at 26,500g for a further minute.
4. The column was then placed in a clean 1.5ml centrifuge tube, and 30-50µl dH2O was added to the column and left for 1 minute.
5. The column and tube were then spun for 1 minute at 26,500g. The flow-through containing the DNA was stored in a 0.5ml Eppendorf tube at -20°C until required.

2.2 Subcloning

2.2.1 Production of competent cells (XL-1 Blue)
A single colony of XL-1 Blue (for genotype see Appendix) was picked from a minimal medium plate containing 1mM thiamine and placed in 5ml LB with 50µg/ml tetracycline (Sigma). The culture was grown overnight at 37°C with vigorous shaking. 200µl of the overnight culture was added to 100ml LB with the appropriate antibiotic in a 1 litre flask. The cells were incubated for ~3 hours at 37°C with vigorous shaking until they reached a density of no more than 10⁸ cells/ml (an A600 of 0.2-0.3). The cells were then aseptically transferred to two sterile, disposable, ice-cold 50ml polypropylene tubes.

Using aseptic technique, the cells were recovered by centrifugation at 4000rpm for 10 minutes at 4°C and the medium was decanted from the pellets. The tubes were stood inverted for 1 minute to allow the remaining medium to drain away. Each pellet was then carefully resuspended in 10ml ice-cold 50mM CaCl2 and kept on ice for at least 1 hour. The cells were recovered by centrifugation at 4000rpm for 10 minutes at 4°C, and each pellet was gently resuspended in 2ml ice-cold 50mM CaCl2. The cells were then kept at 4°C for 12 hours or overnight to increase their competence. They were then divided into 200µl aliquots and stored in 15% glycerol at -70°C until needed.
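Centrifugation speeds in these methods are given sometimes in rpm and sometimes in g. The standard relation RCF = 1.118 x 10⁻⁵ x r x rpm², where r is the rotor radius in cm, converts between the two. The rotor radii were not recorded, so the radius in this sketch is a parameter that the reader must supply. The 15cm value in the example is hypothetical.

```python
RCF_CONSTANT = 1.118e-5  # RCF = k x radius(cm) x rpm^2


def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (x g) at the given rotor radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rpm_from_rcf(rcf: float, radius_cm: float) -> float:
    """Speed (rpm) needed to reach a given relative centrifugal force."""
    return (rcf / (RCF_CONSTANT * radius_cm)) ** 0.5


if __name__ == "__main__":
    radius_cm = 15.0  # hypothetical rotor radius; not recorded in the text
    print(f"4000rpm at {radius_cm}cm is about {rcf_from_rpm(4000, radius_cm):.0f}g")
```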

2.2.2 Vector preparation (KS+ [Stratagene])
Vectors were cut with BamHI or SmaI according to the manufacturer's instructions and were then treated with phosphatase.

2.2.3 Treatment of vectors with phosphatase (modified from Maniatis 1st edition, p133)
This was carried out using CIAP (calf intestinal alkaline phosphatase, New England Biolabs). 1µl CIAP enzyme was added directly to the restriction enzyme mix and incubated for 1 hour at 37°C. The enzyme was then removed by cleaning with the Qiagen QIAquick Nucleotide Removal Kit, as described in section 2.1.16 (Method 2).

2.2.4 Ligation of inserts into vectors
Ligation reactions were generally carried out at a vector:insert molar ratio of 1:2 in a total volume of 10µl. Each reaction contained 1x ligation buffer (New England Biolabs; 50mM Tris-HCl, 10mM MgCl2, 10mM DTT, 1mM ATP, 50µg/ml BSA (Sigma), pH7.8) and T4 DNA ligase (400U/µl, New England Biolabs): 1µl for blunt-ended ligations or 0.5µl for cohesive-end ligations. The ligation mix was then subjected to temperature cycling for 12-16 hours in a Perkin Elmer DNA Thermal Cycler programmed to cycle indefinitely between 30s at 10°C and 30s at 30°C (Lund et al., 1996). The following controls were included in all ligation experiments:
1. Vector with ligase but no insert DNA
2. Vector with neither ligase nor insert DNA
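Two quantities follow from this protocol. The 1:2 vector:insert molar ratio fixes the mass of insert needed for a given amount of vector. The 30s/30s temperature cycling fixes the number of cycles completed during the incubation. This sketch assumes that KS+ is pBluescript II KS(+), 2961bp. The vector mass and insert length in the example are hypothetical.

```python
KS_PLUS_BP = 2961  # assumption: KS+ = pBluescript II KS(+), 2961bp


def insert_ng(vector_ng: float, insert_bp: int, vector_bp: int = KS_PLUS_BP,
              insert_per_vector: float = 2.0) -> float:
    """Mass of insert giving the requested insert:vector molar ratio."""
    # Moles of DNA are proportional to mass divided by length.
    return vector_ng * (insert_bp / vector_bp) * insert_per_vector


def ligation_cycles(hours: float, seconds_per_cycle: int = 60) -> int:
    """Number of complete 10°C/30°C cycles (30s + 30s) in the given time."""
    return int(hours * 3600 // seconds_per_cycle)


if __name__ == "__main__":
    # Hypothetical example: 50ng vector, 500bp insert.
    print(f"{insert_ng(50, 500):.1f}ng insert")
    print(f"{ligation_cycles(12)}-{ligation_cycles(16)} cycles overnight")
```

With 50ng of vector and a 500bp insert, about 16.9ng of insert is needed. A 12-16 hour incubation gives 720-960 cycles.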

2.2.5 Transformation of KS+ ligated vectors into competent XL-1 Blue MRF' cells
The ligation mix was added to 150µl of competent cells, mixed by gently tapping the side of the tube and kept on ice for 20-30 minutes. The tubes were transferred to a 42°C water bath and incubated for exactly 90 seconds without shaking. They were then transferred to an ice bath and the cells were allowed to chill for 1-2 minutes. 600µl LB and 10µl IPTG (isopropyl-β-D-thiogalactoside, 100mg/ml stock) were added to each tube. The cultures were incubated at 37°C in a water bath for 45 minutes to allow the cells to recover and to express antibiotic resistance. Two further controls were included:
1. Competent bacteria transformed with uncut vector (also a test of the competence of the cells)
2. Competent bacteria alone

2.2.6 Plating of XL-1 Blue bacteria containing the KS+ vector
The mix was added to an LB agar plate containing 50µg/ml ampicillin and 80µg/ml X-gal (5-bromo-4-chloro-3-indolyl-β-D-galactoside, Calbiochem). The X-gal was added from a 20mg/ml stock in N,N-dimethylformamide (Sigma). The mix was spread well over the plate using a glass spreader and left to absorb at room temperature. The plates were then inverted and incubated at 37°C for 12-16 hours.

2.2.7 DNA production of insert-containing KS+ vectors
Method 1 (used where whole cosmid digests were subcloned)
A selection of white colonies was picked to fill up to two microtitre plates, then grown and stored as described in section 2.1.1. Some blue colonies were also stored as control DNA. Hedgehog blots were produced as described in section 2.1.2.

Method 2 (used where the positive band alone was subcloned)
12 white colonies were picked, each into 3.75ml LB containing 50µg/ml ampicillin, and grown overnight at 37°C. 0.75ml of each culture was stored in glycerol at -70°C, and the remainder was used to prepare DNA with the Wizard™ Plus Minipreps DNA Purification System:
1. The bacteria were pelleted in a 1.5ml centrifuge tube at 26,500g for 2 minutes, repeating until all 3ml of culture had been pelleted. The supernatant was discarded and the pellet was carefully drained to remove excess medium.
2. The pellet was completely resuspended in 200µl Cell Resuspension Solution (50mM Tris (pH7.5), 10mM EDTA, 100µg/ml RNase A). 200µl of Cell Lysis Solution (0.2M NaOH, 1% SDS) was then added and mixed by inversion. The suspension should clear almost immediately.
3. 200µl of Neutralisation Solution (1.32M potassium acetate) was added to the lysed cells and mixed by inversion. The lysate was incubated on ice for 5 minutes and then spun at 26,500g to pellet the cell debris.
4. For each miniprep, one Wizard™ Minicolumn was prepared: 1ml of mixed Wizard™ Minipreps DNA Purification Resin was added to a 3ml syringe barrel attached to a Luer-Lok® extension.
5. All the supernatant from each miniprep was transferred to the barrel containing the resin. The syringe plunger was inserted into the barrel and the slurry was pushed into the Minicolumn.
6. The syringe barrel was detached from the Minicolumn, the plunger removed and the barrel reattached to the Minicolumn. 2ml of Column Wash Solution (80mM potassium acetate, 8.3mM Tris-HCl, pH7.5, 40µM EDTA and 55% ethanol) was added to the barrel and pushed through the Minicolumn as before.
7. The syringe was removed and the Minicolumn was transferred to a 1.5ml centrifuge tube. The Minicolumn was spun at 26,500g for 3 minutes to dry the resin.
8. The Minicolumn was then transferred to a clean 1.5ml tube, and 50µl dH2O was added to the Minicolumn and left for 1 minute. The DNA was eluted by spinning at 26,500g for 2 minutes and stored at -20°C.

2.2.8 Identification of plasmids containing positively hybridising sequences
If Method 1 was used, ECL detection of possible positives was carried out as described in section 2.1.4. The same detection was used for Method 2, but the DNA was first transferred to Hybond-N as follows. A piece of Hybond-N was divided into 1cm squares using a pencil. A total of 3µl of each miniprep was placed in its own square by dotting 1µl volumes onto the membrane and allowing each to dry before the next was added. This was done for all 12 minipreps from each plated ligation reaction. KS+ and a previously positive plasmid were included as negative and positive controls respectively. The DNA was then denatured and fixed to the membrane as described in section 2.1.2, Method 2.

2.3 dsDNA Sequencing and primer design

2.3.1 Manual Sequencing (SEQUENASE™ Version 2 protocol from United States Biochemical)

Alkaline denaturation of double-stranded DNA for sequencing
0.1 volume of 2M NaOH, 2mM EDTA was added to 3-5µg of plasmid DNA and incubated at 37°C for 30 minutes. The mixture was neutralised by adding 0.1 volume of 3M NaAc (pH4.5-5.5), and the DNA was precipitated by adding 3 volumes of 100% ethanol (-70°C, 15 minutes). The ethanol/DNA mixture was spun at 26,500g for 20 minutes and the DNA pellet was washed with 70% ethanol. The pellet was redissolved in 7µl dH2O.

Annealing template and primer
The annealing reaction components were as follows: 0.5pmol primer (for primers used, see Appendix), 7µl DNA and 2µl Sequencing buffer, made up to a final volume of 10µl with sterile dH2O. The mixture was then placed in a 37°C water bath for 15-30 minutes.

A home-made labelling mix was prepared and diluted 1:20-1:25 to give sequence in the region required.

Labelling reaction
First, 2.5µl of each termination mixture was placed in an Eppendorf tube and prewarmed at 37°C. Each mixture contained 80µM each of dATP, dCTP, dGTP and dTTP and 50mM NaCl. In addition, the "G" mixture contained 8µM ddGTP, the "A" mixture 8µM ddATP, the "C" mixture 8µM ddCTP and the "T" mixture 8µM ddTTP. The following were then added to the annealed template/primer mixture: 1µl 0.1M DTT, 2µl of a 1:20 dilution of labelling mix, 5µCi [α-35S]dATP (Amersham) and 3 units of Sequenase enzyme. This was mixed well and incubated at room temperature for 2-5 minutes.

Termination reactions
3.5µl of the labelling reaction was transferred to each termination mixture and incubated at 37°C for 5 minutes. The reaction was then stopped by the addition of 4µl of stop solution, and the samples were stored at -20°C until required. Just before loading on the gel, the samples were heated to >80°C for 3-4 minutes and placed immediately on ice. 3µl was loaded in each lane of a vertical 6% polyacrylamide gel (see section 2.6.4 for preparation).

Bulk sequencing reaction for preparation of a base-pair size marker using KS+
The method above was carried out with the volumes adjusted as follows. 15µg KS+ plasmid was dissolved in 25µl dH2O after alkaline denaturation. 2.5pmol primer and 5µl Sequencing buffer were added to the plasmid solution, and the volume was made up to 50µl with dH2O. 12.5µl of each termination mixture was prewarmed. The following were added to the annealed template/primer mixture: 5µl 0.1M DTT, 10µl of diluted labelling mix, 25µCi (2.5µl) [α-35S]dATP or [α-33P]dATP (ICN) and 10.5µl diluted Sequenase enzyme. 17.5µl of the reaction mixture was added to each termination mixture, and the reaction was stopped by the addition of 20µl stop solution. The reaction was then divided into 7.5µl aliquots and stored at -20°C until needed.

2.3.2 Automatic Sequencing: ABI PRISM™ Dye Terminator Cycle Sequencing Ready Reaction Kit.

The PCR reaction
Plasmid DNA containing a possible microsatellite was prepared as described in section 2.2.7, Method 2. The DNA was quantified and brought to a concentration of 100-500ng/µl, and 250-500ng of DNA was used in each sequencing reaction. The reaction mix was made up as follows (from Protocol booklet P/N 402078):
8µl Dye-terminator Ready Reaction Mix (ABI)
250-500ng dsDNA
3.2pmol primer (T7 (forward) or KS+ (reverse))
dH2O to 20µl
The mixture was overlaid with mineral oil (Sigma). The tubes were placed in a Perkin Elmer DNA Thermal Cycler and subjected to the following program:
1. Rapid thermal ramp to 96°C; 96°C for 30 seconds
   Rapid thermal ramp to 50°C; 50°C for 15 seconds
   Rapid thermal ramp to 60°C; 60°C for 4 minutes
2. Repeat step 1 for 25 cycles
3. Rapid thermal ramp to 4°C and hold

Purifying extension products
1. 50µl 95% ethanol and 2µl 3M NaAc, pH5.2 were added to an empty, clean 1.5ml microcentrifuge tube.
2. The PCR reaction was drawn from beneath the oil and added to the tube containing the ethanol mixture. The tube was vortexed briefly and kept on ice for 15 minutes.
3. The tube was centrifuged at 13,000rpm for 30 minutes to pellet the extension products.
4. The ethanol mixture was carefully and completely removed from the pellet, and 250µl 70% ethanol was added to the tube.
5. The ethanol was carefully removed and the pellet was air-dried.

Gel conditions for sample running (recipe from the Protein and Nucleic Acids Laboratory (PNACL))
The pellet was then sent to be run on the ABI 377 automated sequencer, using a denaturing acrylamide gel:
21g Urea (USB, ultrapure grade)
5.5g 29:1 Acrylamide premix (Biorad)
28ml deionised dH2O
The mixture was deionised with Amberlite MB-1 while the urea dissolved. 5ml 10x TBE was filtered through a 2µm nitrocellulose filter, followed by the acrylamide mix, and the solution was degassed for about 3 minutes. 35µl TEMED followed by 300µl 10% APS were added and the solution was mixed well. The 0.2mm gel was then poured and left to polymerise for 2 hours, and was used as soon as possible afterwards.

Sample pellets were resuspended in 4µl 25mM EDTA:formamide (1:5 ratio) with Blue Dextran added, and 2µl of each sample was loaded. The run conditions were as follows: 2400V, 50mA and 200W for 10 hours at a gel temperature of 48°C, with a laser power of 40mW.

2.3.3 Designing primers for PCR of microsatellites
Primers were designed using the following criteria whenever possible:
1. At least 16 nucleotides long
2. A minimum GC content of 50%
3. No runs of more than 3 identical nucleotides
4. As close to the repeat as possible while still lying in 'unique' DNA
5. At least one G or C nucleotide at both the 3' and the 5' end of the primer (more importantly at the 3' end, as this anchors the primer at the end nearest the repeat)
6. Melting temperatures (Tm) of the two primers close to each other. The Tm was calculated as follows: G and C = 4°C each, A and T = 2°C each.

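The number of each nucleotide type within the primer was counted, each nucleotide was assigned its value and the values were summed. The resulting value was taken as the Tm of the oligonucleotide, i.e. Tm = 4°C x (number of G + C) + 2°C x (number of A + T).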

For example, the primer CGATAGTTCAGTCCATAGG contains 9 G or C and 10 A or T nucleotides. Its Tm is therefore (9 x 4) + (10 x 2) = 56°C. As many of these criteria as possible were met in designing each pair of primers.
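The Tm rule above, commonly called the Wallace rule, and the sequence-based criteria can be checked automatically. The sketch below covers criteria 1, 2, 3 and 5 as well as the Tm. Criteria 4 and 6 depend on the position of the repeat and on the partner primer, so they are left to the designer. The function names are illustrative.

```python
from itertools import groupby


def wallace_tm(seq: str) -> int:
    """Tm (°C) = 4 x (G + C) + 2 x (A + T)."""
    s = seq.upper()
    gc = s.count("G") + s.count("C")
    at = s.count("A") + s.count("T")
    return 4 * gc + 2 * at


def longest_run(seq: str) -> int:
    """Length of the longest run of identical nucleotides."""
    return max(len(list(group)) for _, group in groupby(seq.upper()))


def check_primer(seq: str) -> dict:
    """Report which of the sequence-based design criteria a primer meets."""
    s = seq.upper()
    gc_fraction = (s.count("G") + s.count("C")) / len(s)
    return {
        "length >= 16": len(s) >= 16,              # criterion 1
        "GC >= 50%": gc_fraction >= 0.5,           # criterion 2
        "no run > 3": longest_run(s) <= 3,         # criterion 3
        "G/C at 5' end": s[0] in "GC",             # criterion 5
        "G/C at 3' end": s[-1] in "GC",            # criterion 5
    }


primer = "CGATAGTTCAGTCCATAGG"
print(wallace_tm(primer), check_primer(primer))
```

For the example primer, the function returns a Tm of 56°C. It flags the GC content (9/19, about 47%) as just below 50%, which illustrates why the criteria were applied "whenever possible".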

The primers were synthesised on an Applied Biosystems 394 or 380B by PNACL.

2.3.4 Precipitation and quantification of primers
400µl of primer, as received from the synthesiser in 1ml ammonia solution, was placed in a 1.5ml tube. 0.1 volume of 3M NaAc, pH5.2 was added, followed by 2.5 volumes of 100% ethanol. The solution was mixed well and the tube placed at -70°C for 30 minutes or overnight. The primer was then pelleted by spinning at maximum speed in a microcentrifuge for 30 minutes, washed with 70% ethanol and spun for a further 15 minutes. The pellet was air-dried and redissolved in 100µl sterile dH2O.

Primers were quantified as described in section 2.1.8 and diluted as necessary to 25ng/µl for use in PCR.
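Diluting a quantified stock to a working concentration is a C1V1 = C2V2 calculation. A minimal sketch follows; the 500ng/µl stock in the example is hypothetical, and the function name is illustrative.

```python
def dilution_volumes(stock_ng_per_ul, target_ng_per_ul, final_ul):
    """Volumes (µl) of stock and water giving final_ul at the target concentration.

    Uses C1 * V1 = C2 * V2, so V1 = C2 * V2 / C1.
    """
    if stock_ng_per_ul < target_ng_per_ul:
        raise ValueError("stock is more dilute than the target concentration")
    stock_ul = target_ng_per_ul * final_ul / stock_ng_per_ul
    return stock_ul, final_ul - stock_ul


# Hypothetical 500 ng/µl primer stock diluted to the 25 ng/µl working concentration
stock, water = dilution_volumes(500, 25, 100)
print(f"{stock:.1f} µl primer + {water:.1f} µl water")
```

The same calculation applies to diluting genomic DNA to 50ng/µl for PCR.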

2.4 Polymerase Chain Reaction (PCR)

2.4.1 PCR reaction conditions
PCR was carried out using the following protocol: 25ng of each primer, 50ng template DNA, 1 unit Taq DNA polymerase ('Red Hot', Advanced Biotechnologies, 5U/µl), sterile dH2O and 11.1x PCR buffer to give 1x PCR buffer in the final volume. A PCR stock solution containing every component except the genomic DNA template was made up for the number of reactions required. A typical PCR reaction was made up to a final volume of 10µl with water:
1µl DNA
1µl forward primer
1µl reverse primer
0.9µl 11.1x PCR buffer
0.2µl 'Red Hot' Taq
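The per-reaction volumes above can be scaled into the PCR stock solution. This is a minimal sketch, assuming a 10% excess for pipetting losses (an assumption, not stated in the text) and taking the water volume as whatever brings each reaction to 10µl; the dictionary keys are illustrative labels.

```python
PER_REACTION_UL = {          # volumes per 10 µl reaction, excluding template
    "forward primer": 1.0,   # 25 ng/µl
    "reverse primer": 1.0,   # 25 ng/µl
    "11.1x PCR buffer": 0.9,
    "Red Hot Taq": 0.2,      # 5 U/µl, so 1 U per reaction
}
TEMPLATE_UL = 1.0            # 50 ng genomic DNA, added to each tube separately
FINAL_UL = 10.0


def master_mix(n_reactions, overage=0.1):
    """Scale the stock solution (everything except template) for n reactions.

    overage is an assumed extra fraction to cover pipetting losses.
    """
    water = FINAL_UL - TEMPLATE_UL - sum(PER_REACTION_UL.values())
    per_reaction = dict(PER_REACTION_UL, water=water)
    factor = n_reactions * (1 + overage)
    return {name: vol * factor for name, vol in per_reaction.items()}


# Final buffer strength: 0.9 µl of 11.1x buffer in 10 µl
buffer_x = PER_REACTION_UL["11.1x PCR buffer"] * 11.1 / FINAL_UL
print(f"final buffer strength: {buffer_x:.3f}x")

# e.g. the 12-dog panel plus negative and positive controls
for name, vol in master_mix(14).items():
    print(f"{name}: {vol:.1f} µl")
```

The buffer check gives 0.999x, confirming that 0.9µl of the 11.1x stock yields 1x buffer in 10µl, and each reaction receives 5.9µl of water.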

The reaction was overlaid with a drop of mineral oil (Sigma). Each set of PCR reactions included a negative control, with water in place of DNA, and a positive control containing the appropriate plasmid DNA.

2.4.2 Optimising PCR conditions
All PCR was carried out on a Grant Autogene II or a Perkin Elmer Cetus DNA Thermal Cycler.

PCR conditions were optimised by altering the programme used and the number of cycles. Touchdown programmes (Don et al. 1991) were used in each case, in which the annealing temperature was decreased by 1 or 2°C every second cycle down to a touchdown temperature, at which 10-25 cycles were carried out. Several programmes were used, all with denaturation at 94°C for 1 minute, annealing for 1 minute and extension at 72°C for 1 minute (Holmes et al. 1993), followed by a final extension for 10 minutes and a soak at 6°C. All primer pairs were initially subjected to touchdown PCR from 72°C to 54°C (2°C intervals), with 12 cycles at 54°C, and the conditions were then changed as appropriate. Once the conditions were optimised, the primers were used to amplify DNA from a panel of 12 dogs, each of a different breed.
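The touchdown programme can be written out cycle by cycle. This sketch assumes that each annealing temperature above the touchdown temperature is held for two cycles before dropping by 2°C, which is one reading of "every second cycle". The denaturation (94°C) and extension (72°C) steps, each 1 minute, are the same in every cycle and are not listed. The function name is hypothetical.

```python
def touchdown_annealing(start_c, touchdown_c, step_c=2, cycles_per_step=2,
                        touchdown_cycles=12):
    """Annealing temperature (°C) for each cycle of a touchdown programme.

    Assumption: each temperature above the touchdown temperature is used for
    cycles_per_step cycles before it is lowered by step_c.
    """
    temps = []
    t = start_c
    while t > touchdown_c:
        temps.extend([t] * cycles_per_step)
        t -= step_c
    temps.extend([touchdown_c] * touchdown_cycles)
    return temps


schedule = touchdown_annealing(72, 54)
print(len(schedule), "cycles; first six:", schedule[:6])
```

Under this assumption, the initial programme (72°C to 54°C in 2°C steps, then 12 cycles at 54°C) comprises 18 stepping cycles followed by 12 touchdown cycles, 30 in total. Raising the touchdown temperature or reducing `touchdown_cycles` corresponds to the re-optimisation described in section 2.6.3.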

2.5 Extraction of canine genomic DNA from blood and blood clots

Blood samples were taken by a vet, mostly into heparinised tubes to prevent clotting, and were stored at -20°C until needed.

Two methods were used:

Method 1
1. 0.5ml thawed blood or 500mg blood clot was taken and 1ml 1x SSC was added. Blood clots were first finely chopped with a scalpel blade to allow complete resuspension in the SSC.
2. The blood mixture was spun at 4°C for 15 minutes, and the pellet was completely resuspended in 1x SSC and respun at least twice more, until it had lost most of its red colour.
3. The pellet was then resuspended in extraction buffer (General Solutions, pg 49), 15µl RNase A was added and the mixture was incubated for 30 minutes at 37°C.
4. 15µl Proteinase K (Boehringer Mannheim) was added and the mixture was incubated with gentle shaking overnight at 42°C (blood or clot) or for 3 hours at 55°C (blood only).
5. The solution was extracted twice with an equal volume of phenol:chloroform:isoamyl alcohol and once with chloroform:isoamyl alcohol.
6. The genomic DNA sample was divided into 160µl aliquots and purified with BioMag® carboxyl-terminated magnetic beads (Hawkins et al. 1994, from step 3 onwards), with the modification that the EDTA used was pH 8.0.
7. The aliquots were recombined and the DNA precipitated with 0.1 volume 3M NaAc and 3 volumes 100% ethanol. The DNA was immediately pelleted by spinning for 5-10 minutes at 13,000rpm and washed with 70% ethanol. After a further 5-minute spin, the DNA was air-dried and resuspended in 50µl sterile dH2O.
8. The DNA was quantified and diluted to approximately 50ng/µl for use in PCR reactions.

Method 2 (using the Genomic G2 DNA extraction kit, Immunogen International)
Whole blood pre-treatment:
1. 3 volumes of RBC Prep Sol A were added to 300µl blood in a 1.5ml microcentrifuge tube. The solution was mixed and incubated at room temperature for exactly 10 minutes, followed by a 10-minute spin at 26,500g.
2. The supernatant was discarded and the pellet rinsed with 1 volume of RBC Prep Sol A. The pellet was spun for 5 minutes and the supernatant discarded.
Protocol:
1. The cell pellet was resuspended thoroughly in 300µl Gen I, and 300µl Gen II was added.
2. The mixture was incubated at 55°C for 30 minutes.
3. 150µl Gen III was added and mixed fairly vigorously. The tube was incubated for 10 minutes at room temperature with occasional mixing, followed by centrifugation for 5 minutes.
4. The supernatant was transferred to a fresh tube, and 450µl isopropanol was added and mixed well (at this stage the DNA precipitate should be clearly visible). The mixture was centrifuged for 5 minutes and the supernatant discarded.
5. The pellet was rinsed in 70% ethanol and centrifuged for a further 5 minutes. The supernatant was discarded and the pellet was air-dried for 5 minutes at room temperature.
6. The pellet was resuspended in 30µl Gen IV (TE) or dH2O and left to dissolve at room temperature overnight, or more quickly at 55°C. The samples were then stored at -20°C until needed.

2.6 Determination of polymorphic microsatellites and their subsequent analysis across the DogMap reference families

2.6.1 Assessment of the polymorphic properties of microsatellite repeats
The PCR products were run on a 5% non-denaturing polyacrylamide gel in a Bio-Rad Protean® II xi cell, at 150-170V for between 4 and 9 hours depending on fragment size, in 1x TBE buffer alongside a 1kb DNA ladder (Gibco BRL). Photographs were taken with the apparatus described in section 2.1.10. Once the microsatellites had been assessed, those which were polymorphic were accurately sized using radioactively labelled primers and sequencing gels.
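One common way to summarise a locus typed across the 12-dog panel is to count distinct alleles and compute observed and expected heterozygosity (expected heterozygosity = 1 - Σp², where p is each allele frequency). This is offered as a standard summary, not as the assessment used here. The sketch assumes genotypes recorded as pairs of allele sizes in bp; the example genotypes are hypothetical, not data from this study.

```python
from collections import Counter


def summarise_locus(genotypes):
    """Summarise one microsatellite typed across a panel of dogs.

    genotypes: list of (allele1_bp, allele2_bp) tuples, one per dog.
    """
    allele_counts = Counter(a for pair in genotypes for a in pair)
    n_typed = sum(allele_counts.values())
    expected_het = 1.0 - sum((c / n_typed) ** 2 for c in allele_counts.values())
    observed_het = sum(a != b for a, b in genotypes) / len(genotypes)
    return {
        "n_alleles": len(allele_counts),
        "polymorphic": len(allele_counts) > 1,
        "observed_het": observed_het,
        "expected_het": expected_het,
    }


# Hypothetical genotypes for four dogs (allele sizes in bp)
print(summarise_locus([(150, 152), (150, 150), (152, 156), (150, 156)]))
```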

2.6.2 Radioactive end-labelling of forward PCR primers
For every 10µl reaction volume: 600ng forward primer, 25µCi [γ-33P] dATP (ICN), 1µl 10x reaction buffer for T4 polynucleotide kinase, 20U T4 polynucleotide kinase (MBI Fermentas) and dH2O to 10µl. The mixture was incubated at 37°C for 45 minutes and the enzyme was then denatured by incubation at 65°C for 20 minutes. The reaction mixture was diluted with an equal volume of dH2O and stored at -20°C until needed.

2.6.3 Re-optimisation of PCR conditions
Because PCR with radioactively labelled primers is more sensitive, the PCR conditions for each primer pair had to be re-optimised to allow clear allele identification, usually by increasing the touchdown temperature and/or reducing the number of cycles at the touchdown temperature. The total volume of the PCR reaction was then reduced to 5µl by halving all volumes, which saved both the limited template DNA and the amount of radioactivity used in each reaction. Following PCR, 1.5µl stop solution was added to each sample.

2.6.4 Sequencing gel preparation
The 6% or 4% denaturing polyacrylamide mix was made up as described in General Solutions, pg 49; 4% gels were used when PCR products were larger than 350bp. Glass plates were prepared by cleaning each side with detergent and rinsing in distilled water. The inside surface of each plate was then cleaned with IMS, and the inner surface of the smaller plate was sprayed with Acrylease (Stratagene) to repel the gel from that glass surface. The plates were then assembled with spacers and secured with bulldog clips. The gel mixture was carefully poured between the two plates, a shark's tooth comb was placed in the top of the gel, and the gel was left to set for at least 2 hours.

2.6.5 Reference family typing
2µl of each PCR sample was loaded onto the sequencing gel using a Drummond sequencing pipette (Laser Laboratory Systems Ltd), and equal amounts of each of the C, A, G and T marker reactions (KS+; see bulk sequencing reaction, section 2.3.1) were loaded into 4 lanes for allele sizing. The gel was run on a Model S2 sequencing gel electrophoresis system (BRL) at 1200V in 0.5x TBE buffer for up to 4 hours, depending on PCR product size, using an LKB Biochrom 2103 power supply. Gels were dried onto Whatman 3MM filter paper in a Bio-Rad Model 583 gel dryer and exposed to Fuji Medical film at room temperature.
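Because each band of the C, A, G and T ladder differs by one nucleotide, an allele can be sized by reading off the ladder band level with it. Where an allele falls between ladder bands, linear interpolation between the two flanking bands is a reasonable approximation over such a short distance. The sketch below assumes that migration distances are measured from the well on the autoradiograph; the function name and example distances are hypothetical.

```python
from bisect import bisect_left


def size_band(distance_mm, ladder):
    """Estimate a band's size (bp) by linear interpolation between ladder bands.

    ladder: (size_bp, distance_mm) pairs read from the C/A/G/T lanes.
    Larger fragments migrate less, so distance falls as size rises.
    """
    ladder = sorted(ladder, key=lambda band: band[1])   # ascending distance
    distances = [d for _, d in ladder]
    i = bisect_left(distances, distance_mm)
    if i < len(distances) and distances[i] == distance_mm:
        return float(ladder[i][0])                      # level with a ladder band
    if i == 0 or i == len(distances):
        raise ValueError("band lies outside the ladder range")
    (size_a, dist_a), (size_b, dist_b) = ladder[i - 1], ladder[i]
    fraction = (distance_mm - dist_a) / (dist_b - dist_a)
    return size_a + fraction * (size_b - size_a)


# Hypothetical ladder bands and a hypothetical allele band
ladder = [(100, 50.0), (102, 48.0), (104, 46.0)]
print(size_band(47.0, ladder))  # 103.0
```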

2.7 Fluorescence in situ hybridisation (FISH)

All work in this section was carried out at The Animal Health Trust in Newmarket.

2.7.1 Cosmid labelling and cleaning
Cosmids were labelled using the BioNick Labelling System (Gibco BRL) according to the manufacturer's instructions, optimised for labelling every 25 nucleotides, as follows:
1µg cosmid DNA
5µl dNTP mix (0.2mM each of dCTP, dGTP and dTTP, 0.1mM dATP, 0.1mM biotin-14-dATP, 500mM Tris-HCl (pH 7.8), 50mM MgCl2, 100mM β-mercaptoethanol, 100µg/ml nuclease-free BSA)
5µl 10x enzyme mix (0.5 units/µl DNA polymerase I, 0.0075 U/µl DNase I, 50mM Tris-HCl (pH 7.5), 5mM magnesium acetate, 1mM β-mercaptoethanol, 0.1mM phenylmethylsulfonyl fluoride, 50% (v/v) glycerol, 100µg/ml nuclease-free BSA)
The mixture was made up to 50µl with dH2O.

The solution was incubated at 16°C for 2 hours and the reaction stopped by adding 5µl stop buffer (300mM EDTA), 2µl 5% sodium lauryl sulphate and 20µl 1x TNE (0.1M NaCl, 10mM Tris-HCl (pH 8.0), 1mM EDTA (pH 8.0)). The solution was mixed (NOT vortexed). Each nick-labelled cosmid was then freed of unincorporated nucleotides and salts using Sephadex G-50 NICK columns (Pharmacia):
1. The column was equilibrated by removing first the bottom cap, then the top cap, and pouring off the buffer. 3ml 1x TNE was added to the column and removed; a further 3ml 1x TNE was added and allowed to drip through.
2. Without allowing the gel matrix to dry out, the labelled probe was added to the column and allowed to settle. 400µl 1x TNE was added and allowed to drip through (10-11 drops).
3. A collection tube was then placed underneath the column, a further 400µl 1x TNE was added to the top of the column, and all drops were collected. Allowing for 80% recovery, the 400µl eluate contained approximately 800ng of labelled cosmid DNA at a concentration of 2ng/µl.

2.7.2 Cosmid preparation for hybridisation to chromosome spreads
For 1 hybridisation reaction:

Labelled cosmid DNA (100ng): 50µl
Sonicated canine DNA (10mg/ml): 2.5µl
Sonicated salmon sperm DNA (10mg/ml) (Stratagene): 1µl
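The probe quantities here follow from simple arithmetic: 1µg of cosmid labelled and 80% recovered in 400µl gives 2ng/µl, so 100ng of probe is 50µl, and 2.5µl of 10mg/ml (10µg/µl) canine competitor DNA is 25µg. A sketch of that calculation, with parameter names that are illustrative only:

```python
def probe_mix(input_ng=1000.0, recovery=0.8, elution_ul=400.0, probe_ng=100.0,
              competitor_ul=2.5, salmon_ul=1.0, stock_mg_per_ml=10.0):
    """Probe concentration, probe volume and competitor excess per hybridisation."""
    probe_ng_per_ul = input_ng * recovery / elution_ul
    probe_ul = probe_ng / probe_ng_per_ul
    stock_ng_per_ul = stock_mg_per_ml * 1000.0   # 1 mg/ml = 1 µg/µl = 1000 ng/µl
    canine_ng = competitor_ul * stock_ng_per_ul
    salmon_ng = salmon_ul * stock_ng_per_ul
    return {
        "probe_ng_per_ul": probe_ng_per_ul,
        "probe_ul": probe_ul,
        "canine_fold_excess": canine_ng / probe_ng,
        "salmon_fold_excess": salmon_ng / probe_ng,
    }


print(probe_mix())
```

With the values in the text, the canine competitor DNA is present in 250-fold excess over the labelled cosmid, and the salmon sperm DNA in 100-fold excess.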

At least 2 volumes of ice-cold 100% ethanol were added. The solution was vortexed and placed at -70°C for 1 hour to precipitate the DNA. The precipitate was then collected by centrifugation at 13,000g for 10 minutes and the resultant pellet was air-dried for 30 minutes. The pellet was then resuspended in hybridisation buffer by adding to the pellet the following solutions in the order indicated:

1. Deionised formamide: 7.5µl
2. dH2O: 1.5µl
3. 20x SSC: 1.5µl
4. Tween 20 (1%): 1.5µl
5. Dextran sulphate (50%): 3µl
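Assuming that the DNA pellet adds negligible volume and that the deionised formamide is taken as 100%, the final composition of the hybridisation buffer can be checked with a short calculation:

```python
# (component, volume in µl, stock strength) from the list above;
# formamide is taken as 100% and water contributes nothing (assumptions).
COMPONENTS = [
    ("formamide", 7.5, 100.0),         # %
    ("water", 1.5, 0.0),
    ("SSC", 1.5, 20.0),                # x
    ("Tween 20", 1.5, 1.0),            # %
    ("dextran sulphate", 3.0, 50.0),   # %
]

total_ul = sum(volume for _, volume, _ in COMPONENTS)
final = {name: stock * volume / total_ul
         for name, volume, stock in COMPONENTS if stock > 0}

print(f"total volume: {total_ul} µl")
for name, strength in final.items():
    print(f"{name}: {strength:g}")
```

This gives 15µl of buffer containing 50% formamide, 2x SSC, 0.1% Tween 20 and 10% dextran sulphate, matching the 15µl of hybridisation mix applied to each slide.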

The solution was incubated at room temperature for 30 minutes, with flicking and brief spins to ensure mixing and resuspension of the pellet in the hybridisation buffer. The mixture was then incubated at 70°C for 10 minutes to denature the probe and competitor DNAs, followed by a 30-60 minute incubation at 37°C for pre-annealing, to suppress hybridisation of repetitive DNA.

2.7.3 Slide preparation for hybridisation of labelled cosmid
(Pre-prepared chromosome spreads were used, prepared as described by Fischer et al. 1996.)

All incubations were carried out in Coplin jars unless otherwise stated.

Slides were dehydrated by 3-minute washes in an ethanol (AnalaR grade, BDH) series of 70%, 70%, 90%, 90% and 100%. The slides were then air-dried and incubated for exactly 2 minutes in 70% formamide/2x SSC pre-heated to 68°C. They were immediately transferred to ice-cold 70% ethanol on ice for at least 2 minutes with agitation, then dehydrated through the ethanol series as described above and air-dried. Slides were pre-heated to 37°C.

2.7.4 Hybridisation of cosmids to chromosome spreads
Pre-annealed cosmid probe and competitor DNA were applied to the prepared chromosome spreads by placing 15µl hybridisation mix onto a coverslip pre-warmed to 37°C. The pre-warmed slide was then inverted onto the coverslip, taking care to avoid bubbles forming underneath it. The coverslip was sealed with Cow-gum and the slide was incubated at 37°C for 16-18 hours on a platform in a sealed sandwich box humidified with 4x SSC on damp tissue paper.

2.7.5 Post-hybridisation washing and immunocytochemical detection of hybridisation sites
All incubations were carried out at 42°C in Coplin jars unless otherwise stated.

The Cow-gum seal was very carefully removed, the slide was dipped into pre-warmed 2x SSC, and the coverslip was carefully removed without sliding. The slides were then washed for 3 × 3 minutes in 50% formamide/2x SSC with agitation; it was important not to allow the slide to dry out at any time during the washes. The slide was then washed for 3 × 3 minutes in 2x SSC, followed by a rinse in 4x SSC/0.05% Triton.

Non-specific antibody binding was blocked by a 30-minute incubation at 37°C with 100-120µl 3% BSA in 4x SSC/0.05% Triton under Nescofilm. The immunological detection reagents were prepared as follows:

Tube 1: 1ml 3% BSA in 4x SSC + 2µl FITC-avidin (fluorescein isothiocyanate conjugated to avidin, Vector Laboratories)
Tube 2: 1ml 3% BSA in 4x SSC + 10µl BAA (biotinylated anti-avidin D, Vector Laboratories)
Tube 3: 1ml 3% BSA in 4x SSC + 2µl FITC-avidin

Each tube was mixed thoroughly by vortexing for 30 seconds, and undissolved antibody was pelleted by centrifugation at 13,000g for 5 minutes. The tubes were then stored on ice.

Following blocking, the Nescofilm was removed and the slide was washed for 3 × 3 minutes in 4x SSC/0.05% Triton at 42°C. Slides were incubated for 30 minutes at 37°C with 120µl of the tube 1 mixture (FITC-avidin) under Nescofilm, followed by 3 washes of 3 minutes each in 4x SSC/0.05% Triton at 42°C. The procedure was then repeated for tube 2 and tube 3, with washes in between. After the final washes in 4x SSC/0.05% Triton, the slide was rinsed in 2x SSC. Chromosome preparations were counterstained with 4-6µl DAPI (4',6-diamidino-2-phenylindole dihydrochloride hydrate (Sigma), 80ng/ml in 2x SSC) for 2-3 minutes before being mounted in Vectashield antifade medium (Vector Laboratories) under a coverslip.

Image capture and analysis were carried out using a Smart Capture FISH station (Digital Scientific, Cambridge, UK) comprising a fluorescence microscope (Zeiss Axiophot) with FITC, Texas Red and DAPI filters (Chroma Technologies), a CCD camera (Photometrics) and a Macintosh Quadra 800 computer with dedicated software (Smart Capture). Chromosomes were identified by digitally processing the DAPI component of the images to reveal the banding (Fischer et al. 1996).

APPENDIX

PRIMERS (all primers were obtained from Stratagene)

Manual and automatic sequencing primers (sections 3.1 and 3.2)
Forward primer (T7): 5' gtaatacgactcactatagggc 3'
Reverse primer (KS): 5' tcgaggtcgacggtatc 3'

Bulk sequencing reaction primer (section 3.1)
M13 reverse primer: 5' ggaaacagctatgaccatg 3'

Genotype of XL1-Blue MRF': Δ(mcrA)183 Δ(mcrCB-hsdSMR-mrr)173 endA1 supE44 thi-1 recA1 gyrA96 relA1 lac [F' proAB lacIqZΔM15 Tn10 (Tetr)]

pWE15 cosmid vector (Stratagene):

[Figure: map of the pWE15 cosmid vector (8.16kb), showing the T3 and T7 promoters and the NEO gene]

Plasmid vector (Stratagene):

[Figure: map of the pBluescript® II KS(+) plasmid vector (2.96kb), showing restriction sites and the ColE1 origin]

CHAPTER 3

ISOLATION AND SEQUENCING OF MICROSATELLITES

3.1 Summary

This chapter describes the isolation, sequencing and initial assessment of the polymorphic properties of thirteen microsatellite repeat sequences.

Restriction endonucleases were used to digest cosmids, previously identified as containing microsatellite repeats, into fragments which were separated on agarose gels. DNA was then transferred to a nylon membrane by Southern blotting and probed with both a radiolabelled (CA)n polynucleotide and fluorescently labelled (CA)10 or (GAAA)6 oligonucleotides, in order to determine the sizes of the fragments containing repeat sequences. Either the entire digest or eluted bands containing the repeat sequences (of no more than 1.5 kilobases) were then subcloned into the KS+ plasmid vector. Plasmids containing repeats were identified and isolated, and their inserts were sequenced to confirm the presence of a microsatellite. After elimination of (CA)n repeats containing fewer than ten repeat units, primers flanking the remaining microsatellites were designed. Each of these microsatellites was then assessed for its polymorphic properties using PCR. Thirteen polymorphic microsatellites were isolated: four (GAAA)n and nine (CA)n repeats.

One microsatellite showed high homology to the 3' UTR of the bovine and ovine insulin-like growth factor II (IGF-II) gene, and another showed homology to a MER LINE-1-like sequence (medium reiteration long interspersed nucleotide element). A further two were found to be associated with canine-specific sequences, and another showed homology to a can-SINE (canid-specific short interspersed nucleotide element).

3.2 Introduction

Microsatellites are short tandem repeats with repeat units of 1 to 5 or 6 base pairs. They are abundant, randomly distributed throughout the genome and highly polymorphic (Tautz and Renz 1984; Tautz et al. 1986; Tautz 1989), properties which make them ideal genetic markers (Tautz 1989). The frequency and distribution of (CA)n microsatellites in the canine genome are similar to those in the human genome (Ostrander et al. 1992). (CA)n is the most abundant repeat type in the canine genome (Rothuizen et al. 1994), with (GAAA)n microsatellites about one third as common (Francisco et al. 1996). (CA)n microsatellites were the first repeat type to be used in genetic mapping: the current genetic map of the human genome consists of about 7500 microsatellite markers (Weissenbach 1997), of which at least 5264 are (CA)n repeats (Dib et al. 1996). Microsatellites with longer repeat units have become more popular in recent years, despite their lower abundance, because they are less prone to producing stutter bands during PCR, which makes their alleles easier to identify. In the case of the canine genome, many (GAAA)n tetranucleotide repeats have already been isolated and characterised (Francisco et al. 1996).

When repeats are isolated and characterised from large-insert libraries such as cosmid and YAC libraries, the clone from which a microsatellite originated can be used to map that sequence physically to a chromosome, so that the same marker contributes to both the genetic and the physical map. Working with a smaller number of isolated polymorphic microsatellites enabled both genetic and physical mapping to be carried out.

Microsatellite repeats are not often found within the coding sequences of genes. This is perhaps not unexpected: their tendency towards high polymorphism makes them unlikely to persist in regions of the genome that must be conserved. Those microsatellites which are found in coding sequences may serve as genetic markers for these genes, and if the repeats have been isolated from a large-insert library, the DNA inserts can be used as physical markers, enabling chromosomal assignment of the gene.

Pick individual colonies of the partial canine cosmid library into microtitre plates
↓
Duplicate library onto nylon membrane
↓
Select positives with either CA or GAAA repeat oligonucleotides
↓
Isolate cosmid DNA
↓
Digest DNA with an appropriate restriction enzyme; detect positive fragments of up to 1kb by fluorescent probing of a Southern blot of the gel
↓
Gel-purify fragment
↓
Subclone fragment(s) into an appropriate vector
↓
Identify subclones containing appropriate insert DNA by blue/white selection and probing of DNA with oligonucleotide probe
↓
Sequence from each end of the insert to confirm the presence of a microsatellite and design flanking primers
↓
Amplify repeat by PCR using DNA from 12 dogs of different breeds to assess polymorphism of the microsatellite

FLOW CHART TO SHOW MICROSATELLITE ISOLATION AND POLYMORPHISM DETECTION

3.3 Results

The flow-chart on the previous page briefly shows the overall strategy used for isolation of cosmid-derived genomic fragments containing polymorphic microsatellite repeats.

3.3.1 Cosmid screening for microsatellite repeats
A partial, ordered-array canine cosmid library was duplicated onto nylon membrane (Methods sections 2.1.1 and 2.1.2) and probed both with a radioactively labelled (CA)n probe (Methods section 2.3) and with fluorescently labelled (CA)10 and (GAAA)6 probes (Methods section 2.1.4) to detect cosmids containing positively hybridising sequences, as shown in Figures 3.1a and 3.1b respectively.

The average cosmid insert size is 38 kilobases (Holmes et al. 1995), and the partial library provided was ordered in two microtitre plates containing a total of 188 cosmids, giving 188 × 38 = 7144 kilobases of canine DNA. The haploid size of the canine genome is estimated to be 3 × 10^6 kilobase pairs, so, assuming all inserts are unique, the percentage of the genome represented by the library was $\frac{7144}{3 \times 10^{6}} \times 100 \approx 0.24\%$.
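The genome-coverage estimate is a single product and ratio. As a sketch (the function name is illustrative):

```python
def library_coverage(n_clones, mean_insert_kb, genome_kb):
    """Total cloned DNA (kb) and percentage of the genome, assuming unique inserts."""
    total_kb = n_clones * mean_insert_kb
    return total_kb, 100.0 * total_kb / genome_kb


total_kb, percent = library_coverage(188, 38, 3e6)
print(f"{total_kb} kb = {percent:.2f}% of the haploid genome")
# 7144 kb = 0.24% of the haploid genome
```

With 188 cosmids of 38kb average insert and a 3 × 10^6kb haploid genome, the library holds 7144kb, about 0.24% of the genome, on the same assumption that no two inserts overlap.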

Thirty-two cosmids positive for sequences hybridising to the radioactively labelled (CA)n probe were initially detected, and a further 19 cosmids were detected using the fluorescently labelled (CA)10 probe, giving a total of 51. Duplicate blots were probed with a fluorescently labelled (GAAA)6 probe, which gave a total of 10 possible positives, six of which are visible as dark circles in Figure 3.1b. Probing with fluorescently labelled probes was less stringent, because of the lower hybridisation temperature, and therefore produced more positives than the radioactive method. The higher background produced by the ECL method, shown in Figure 3.1b, appeared after 45 minutes of exposure to X-ray film, and this background made it possible to locate the positive clones. The positives were visible after only 2 minutes' exposure, but the absence of background at that time made it difficult to ascertain which positions on the microtitre plate had been identified. Figure 3.1b is the image from which the positives were identified.

[Figure 3.1a: hedgehog blot of microtitre plate 2, columns 1-12]

[Figure 3.1b: hedgehog blot of microtitre plate 1, columns 1-12]

Figures 3.1a and b show "hedgehog" blots (Methods, section 2.1.2) of microtitre plates 2 and 1 respectively. The blots were probed with a 32P-labelled (CA.GT)n polynucleotide (Figure 3.1a) and a fluorescently labelled (GAAA)6 oligonucleotide (Figure 3.1b). The darker spots represent putative CA- or GAAA-repeat-containing cosmids, which were subsequently identified by their positions in the microtitre plates. For example, cosmid 2A6 in Figure 3.1a is the cosmid at position A6 of microtitre plate 2. Figure 3.1b shows six GAAA positives in total, one of which is 1C9.

3.3.2 Digestion and Southern blotting of possible microsatellite-containing cosmids
DNA was isolated from a total of 46 positive cosmids and digested with SmaI or BamHI, both of which recognise six base-pair sequences. Figures 3.2a and b show BamHI digests of DNA from positive cosmids electrophoresed on an agarose gel (a) and the Southern blot of the gel probed with a (CA)n repeat (b). Most of the (CA)n-positive fragments were larger than 2 kilobases and therefore too large to sequence in a single pass, so further subcloning would have been needed to obtain a positive fragment small enough to sequence. To avoid more than one subcloning step, Sau3AI, a restriction endonuclease recognising a four base-pair sequence, was used instead; the results are shown in Figures 3.3a and b. This digestion produced more fragments, mostly smaller than 2 kilobases (Figure 3.3a), a size that would allow single-pass sequencing after a single subcloning step. Figure 3.3b shows that 1G1 was either a false positive, or that the fragment(s) containing its repeat sequence were very small and therefore unlikely to produce a visible signal on Southern blotting.
A total of seven of the initial 46 positive cosmids identified with the (CA)10 and (GAAA)6 probes were found to be false positives, leaving 39 genuine positives.

The (GAAA)n repeat-containing cosmids were analysed in a similar fashion and the results are presented in Figures 3.4a and b. Fewer (GAAA)n repeats per cosmid were observed, whereas (CA)n-positive clones contained up to four possible repeats within one cosmid after restriction digestion, as illustrated by the digest of cosmid 2F11 in Figure 3.2b. Of the ten initial (GAAA)n repeat-containing cosmids, two separate groups of clones were found to contain identical repeats. One group of three clones (1A8, 2H12 and 1D8 in Figure 3.4a) showed similar restriction patterns, and the positively hybridising fragment was the same size in each case (Figure 3.4b), suggesting identical or overlapping DNA inserts in each of the cosmids. All three subclones were sequenced (two by Edafe Knabe, an undergraduate project student), confirming their similarity at the sequence level. The second group consisted of two clones (1C9 and 2D2, seen in the same figures) which contained identical sequences; this was discovered only after the repeats had been sequenced. In this case, neither the restriction patterns nor the sizes of the positively hybridising fragments were similar. This can probably be explained either by the two cosmids containing slightly overlapping DNA inserts, with the overlap including the repeat sequence, or by one or both of the cosmids containing two inserts, producing two clones containing the same DNA sequence. A total of four of the 39 positive cosmids were lost due to duplication: three (GAAA)n repeats and one (CA)n repeat.

[Figure 3.2a: agarose gel of BamHI-digested cosmid DNA, lanes labelled by cosmid, with the origin marked]

Figure 3.2a shows an ethidium bromide-stained 0.8% agarose gel of DNA fragments from restriction digests of cosmid clones that hybridised to a (CA)10 probe. The colonies were taken from microtitre plate 2 of the partial canine library. The samples were digested with BamHI and electrophoresed. M = fluorescently labelled marker, UM = unlabelled marker; the marker consisted of separate digests of lambda DNA with HindIII and EcoRI. -ve = negative control DNA taken from a cosmid which hybridised with neither the CA nor the GAAA probe in the initial screens. Cosmids are named as described in Figure 3.1. In this gel, no DNA was visible for cosmid 2C3 after digestion, probably because little or no DNA was present in the digest. The arrow marked "origin" indicates DNA retained in the wells of the gel.

[Figure 3.2b: Southern blot of the gel in Figure 3.2a]

Figure 3.2b shows a Southern blot of the agarose gel in Figure 3.2a. Dark bands indicate fragments which hybridised to the fluorescently labelled (CA)10 oligonucleotide probe. The unlabelled marker lane was removed from the Southern blot.

[Figures 3.3a and 3.3b: agarose gel and Southern blot of Sau3AI-digested cosmids, with marker bands of 3000 and 4000bp labelled]

Figures 3.3a and b show Sau3AI digests of six cosmids (1B10, 1E7, 1E12, 1F5, 1F11 and 1G1), each containing putative (CA)n microsatellite repeats, electrophoresed on a 0.8% agarose gel. The ethidium bromide-stained gel (Figure 3.3a) and the probed Southern blot revealing the positions of the positively hybridising fragments (Figure 3.3b) are shown. M = fluorescently labelled 1kb marker; UM = unlabelled 1kb marker, removed from the Southern blot shown in Figure 3.3b.

[Figures 3.4a and 3.4b: agarose gel and Southern blot of Sau3AI-digested putative (GAAA)n cosmids, with marker bands of 4162 and 123bp labelled]

Figures 3.4a and b show an ethidium bromide-stained agarose gel and its corresponding Southern blot. Putative (GAAA)n repeat-containing cosmids were digested with Sau3AI and run on a 0.8% agarose gel (Figure 3.4a). Figure 3.4b shows the fragments containing sequence which hybridises with a fluorescently labelled (GAAA)6 probe.

3.3.3 Subcloning of fragments containing putative repeats
Three (CA)n-positive cosmids were initially chosen for further study. DNA from each was digested with BamHI and separated by agarose gel electrophoresis. CA-positive fragments, identified by Southern blotting, were eluted from the gel (Methods, section 2.1.15) and ligated into the BamHI site of the pBluescript KS+ vector. Following transformation into XL1-Blue (see appendix to Methods, pg 76, for plasmid diagram), subclones containing (CA)n repeats were identified as described below (Figure 3.5). Two of these subclones contained fragments too large to sequence entirely from each end, so a second subcloning was attempted by digesting the recombinant plasmid with Sau3AI and re-subcloning into KS+. For one of the two, the insert could not be excised from the initial plasmid, possibly because the site had been destroyed during the initial subcloning; the other was successfully re-subcloned and sequenced. This approach proved costly and time-consuming, so all other subcloning was carried out using fragments generated with the four base-pair recognition enzyme Sau3AI.

Three cosmids containing (CA)n repeats produced (CA)n-positive fragments larger than approximately 1 kilobase and were abandoned, because these could not be sequenced unambiguously without further, laborious subcloning. Because duplicated repeat sequences had reduced the number of different (GAAA)n repeat-containing cosmids from 10 to 7, DNA fragments larger than 1 kilobase containing these repeats were nevertheless subcloned, in order to maximise the number of tetranucleotide repeats in the final collection of microsatellites.

Figure 3.5 shows one method used for identifying subclones containing possible repeats. The whole cosmid digest was subcloned into pBluescript, and up to two full microtitre plates of colonies (blue/white selected) were duplicated onto nylon membranes as described for the initial detection of positive cosmids. A fluorescently labelled probe was used to detect plasmids containing positively hybridising fragments. However, the composition of the nylon membrane (Amersham Hybond-N) was altered by the manufacturer during this period, and the new membrane produced very high background after only a few minutes' exposure to X-ray film, as seen in Figure 3.5, making the positives difficult to distinguish from the background. The difficulties encountered are discussed further in section 3.4. Although it was sometimes possible to detect a true positive, as shown in Figure 3.5, the background was so severe that an alternative method, illustrated in Figure 3.6, was adopted. This method was labour-intensive but gave clear results. Cosmid-digest fragments containing putative repeats were identified as above, and DNA was eluted from that area of the gel. The fragments were then subcloned and up to twelve of the resulting recombinant colonies were picked. DNA was isolated from overnight cultures of these colonies and dotted onto a grid as described in Figure 3.6, and plasmids containing repeats were detected by hybridisation with a fluorescently labelled (CA)n probe. Any subcloned DNA which did not yield a positive subclone after three attempts was abandoned; two possible positives, one putative (GAAA)n repeat and one putative (CA)n repeat, were rejected in this way.

[Figure 3.5: blot of subclones in microtitre-plate layout, rows A-H, columns 1-12]

Figure 3.5 shows two positive (CA)n repeat-containing subclones (indicated by the arrows), obtained by "shotgun" cloning of cosmid 2A11 digested with Sau3AI. The blot was probed with a fluorescently labelled CA probe at 45°C. A1 is a negative control containing LB with ampicillin only, and shows background levels of hybridisation.

[Figure 3.6: dot-blot of subclone minipreps]

Figure 3.6 shows the results of a dot-blot in which fragments containing putative (CA)n repeats were eluted from a gel and subcloned. Up to 12 DNA minipreps were made from the resulting recombinant colonies, and 3µl of each preparation was dotted onto a grid in an ordered array. A fluorescently labelled (CA)10 probe was then used to detect DNA from clones containing repeat sequences. + = positive control DNA, - = negative control DNA. 1E12 and 2A6 each have one positively hybridising dot, indicating a picked subclone containing a putative CA repeat sequence.

3.3.4 Sequencing of plasmids containing putative microsatellite repeats
All plasmids containing repeats were sequenced using either an ABI automatic sequencer or the Sequenase sequencing kit. An example of manual sequencing of the 1B7 repeat is shown in Figure 3.7. This figure shows a (GAAA)13 repeat that was not clearly identified with the automatic sequencer (see General Results Appendix (i), pg 191 for sequence data), possibly because secondary structure caused ambiguous results. Secondary structure can also be seen in the manual sequencing gel, but the stronger band was always in the G lane, and the even spacing of the G and A repeat nucleotides added to the evidence for a (GAAA)13 repeat. Sequencing results for all repeats are summarised in General Appendix (i), pg 191. Another putative (GAAA)n repeat-containing subclone was rejected because of repeated difficulty in obtaining good-quality sequencing data, and two repeats were found to be too small, suggesting that they may not be polymorphic. A minimum of ten (CA)n repeat units was used as the cut-off (Weber 1990).
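The ten-unit cut-off can be expressed as a simple sequence screen. The sketch below is illustrative only and is not a procedure described in this thesis; the function names and the use of Python's `re` module are assumptions. It finds the longest uninterrupted run of a given motif and applies the ten-unit threshold.

```python
import re

MIN_UNITS = 10  # cut-off used in the text (Weber 1990)

def longest_perfect_run(seq: str, motif: str) -> int:
    """Return the largest number of uninterrupted tandem copies of `motif` in `seq`."""
    seq, motif = seq.upper(), motif.upper()
    pattern = f"(?:{re.escape(motif)})+"  # one or more back-to-back copies
    return max((len(m.group()) // len(motif) for m in re.finditer(pattern, seq)), default=0)

def passes_cutoff(seq: str, motif: str, min_units: int = MIN_UNITS) -> bool:
    """True if the longest perfect run reaches the minimum repeat-unit count."""
    return longest_perfect_run(seq, motif) >= min_units

# Example modelled on 1F5: a (TG)12 run, then CAC and a short (GA)3 run.
example = "AC" + "TG" * 12 + "CAC" + "GA" * 3
print(longest_perfect_run(example, "TG"), passes_cutoff(example, "TG"))  # 12 True
print(longest_perfect_run(example, "GA"), passes_cutoff(example, "GA"))  # 3 False
```

Note that (TG)n and (GT)n describe the same run read in a different phase, so counts for the two motifs can differ by one unit.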

Sequencing of the 2A11 subclone revealed two (CA)n microsatellite repeats; the larger repeat was selected and primers were designed around it. 1F5 contained a (TG)12 repeat, as seen in Table 3.1, but the region within the flanking primers also contained a GA repeat interspersed with Cs replacing the Gs for nearly 40 bases. The 2A6 repeat also contained a short TG repeat, with interspersed Cs replacing the Ts, or possibly a GC repeat extension with Ts replacing the Cs. All of the GAAA repeats contained G(A)n-rich sequence around the main repeat indicated in Table 3.1, and 2A7 contained a short CT repeat within the sequence flanking the microsatellite. The full DNA sequences of the repeats and flanking DNA are given in General Appendix (i), pg 191.

Figure 3.7 [manual sequencing gel of 1B7; lanes G, A, T and C; the (GAAA)13, (GA)18 and (GAAA)4 regions are labelled, with an extended run showing (GAAA)13]

Figure 3.7 shows the sequence of the GAAA repeat 1B7, indicated to the left of the figure as (GAAA)4(GA)18(GAAA)13. The sequence to the right, derived from a longer run of the same sequencing reaction, shows the (GAAA)13 repeat clearly. G, A, T and C indicate the nucleotide lanes of the sequence.

Table 3.1:

Cosmid name   Repeat characteristics       Repeat type
1B7           [(GAAA)4](GA)18(GAAA)13#     compound perfect*
1B10          (GT)18                       perfect
1D6           (AC)15                       perfect
1E3           (TG)11(GA)14                 compound perfect
1E7           (TG)21                       perfect
1E12          (TG)13[(CACG)3]              perfect
1F5           (TG)12                       perfect*
1F11          (GT)19                       perfect
2A6           (TG)13[(CG)4]                perfect*
2A7           (GAAA)12                     perfect*
2A11          (CA)n                        perfect
2D2           (GAAA)9                      perfect*
2H12          (GAAA)16                     perfect*

* indicates that the largest of the repeats lying between the primers is shown. All the repeats indicated are surrounded by repeat-like sequences, which could be partly responsible for some of the polymorphic properties observed for each microsatellite (see Discussion for details); these flanking sequences can be seen in the General Appendix. [ ] indicates sequence that is not included in the classification system used to name the repeat types (Weber 1990). # The last part of this repeat, (GAAA)13, was not clearly sequenced using the ABI automatic sequencer, possibly due to the small size of the repeat, but manual sequencing showed a repeat of that size to be present.

3.3.5 PCR of microsatellite repeats
Primers were designed for each of 17 microsatellites in total, using the parameters described in the Methods, section 2.3.3. Primers were partly optimised by varying the number of cycles and the cycling temperatures in the PCR reactions to produce as few spurious products as possible. A further (GAAA)n repeat was discarded, despite exhaustive attempts to optimise the PCR conditions, because it produced so many spurious bands that the properties of the repeat were impossible to assess. Another (GAAA)n repeat, 2A7, also produced spurious bands after PCR amplification, but its properties were thought to be assessable, so it was included in subsequent analyses.

Each microsatellite repeat was then assessed for polymorphism across a panel of DNAs prepared from dogs representing 12 breeds. Three microsatellites were monomorphic across all 12 breeds and were not processed further. An example of a microsatellite that was polymorphic across the 12 breeds is shown in Figure 3.8, which shows the different alleles of a PCR product amplified across a (CA)17 microsatellite repeat. A total of 13 microsatellites were isolated, sequenced and confirmed by PCR as containing polymorphic repeats. Table 3.1 describes the repeat types and characteristics. The sequences of these microsatellites and the positions of the designed primers (underlined) are given in General Appendix (i), pg 191.

The PCR conditions were not optimised further at this stage, as the intention was to use the primers to amplify each microsatellite in the DogMap set of reference families. This would involve a more sensitive and more accurate assessment of allele size, by radioactively end-labelling one primer and running the products on sequencing gels. The initial confirmation that the microsatellites were polymorphic was required in order to maximise their potential information value in the reference families.

Figure 3.8

[Gel image: lanes M, +ve, -ve, BC, BM, Dal, GH, GSD, IS, JR, Lab, P, RW, Sa, SH, M; a 134 bp size is marked]

Figure 3.8 shows PCR products, electrophoresed on a small non-denaturing 5% polyacrylamide gel, demonstrating the polymorphism of microsatellite 2A6 over 12 dog breeds. BC = Border Collie, BM = Bernese Mountain Dog, Dal = Dalmatian, GH = Greyhound, GSD = German Shepherd Dog, IS = Irish Setter, JR = Jack Russell, Lab = Labrador, P = Poodle, RW = Rottweiler, Sa = Samoyed, SH = Siberian Husky, M = marker, +ve = plasmid DNA control (the original subclone of cosmid 2A6 used to sequence the insert DNA), -ve = negative DNA control. The extra band in the Irish Setter sample, above the 1/3 genotype alleles, is an artifact commonly seen in PCR products (Holmes et al. 1993) and is more visible because the sample was slightly overloaded. The three alleles of this microsatellite are illustrated in the Poodle and Rottweiler samples, which are heterozygous 1/3 and homozygous 2/2 respectively.

3.3.6 Summary of loss of microsatellite repeats from initial pool
During the course of the project, 46 of a total of 61 possible repeats were characterised, using the combined methods of radioactive and ECL detection. Of these, 13 polymorphic microsatellites were finally identified. The rest were rejected as follows:

7 False positives
4 Found to contain identical repeat sequences
3 Positive fragments produced after Sau3A I digestion were too large to enable complete sequencing after a single subcloning
3 Proved to be monomorphic over 12 breeds
2 Repeat size found to be too small to be polymorphic (<10 repeat units)
2 Could not be subcloned
1 Would not sequence properly
1 Found to produce non-specific bands upon PCR

Ten putative positives remained once 13 polymorphic microsatellites had been identified, so these were not investigated further. About one third of the original number of cosmids containing putative microsatellite repeats therefore proved suitable for further study. This emphasised the need to begin with a much larger number of positives, so that a sufficient number of microsatellites would eventually remain for further analysis.
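The figures in this section can be reconciled with a short tally. The sketch assumes, as the numbers suggest but the text does not state explicitly, that the ten uninvestigated putative positives are counted among the 46 characterised repeats. On that assumption the categories account for all 46, and the proportion of positives yielding a polymorphic marker lies between about 0.28 and 0.36, depending on whether the uninvestigated positives are included.

```python
# Counts taken from section 3.3.6; the reconciliation itself is an illustration.
characterised = 46
polymorphic = 13
not_investigated = 10  # putative positives left once 13 polymorphic loci were found
rejected = {
    "false positives": 7,
    "identical repeat sequences": 4,
    "Sau3A I fragment too large to sequence": 3,
    "monomorphic over 12 breeds": 3,
    "too small (<10 repeat units)": 2,
    "could not be subcloned": 2,
    "would not sequence properly": 1,
    "non-specific PCR bands": 1,
}

total_rejected = sum(rejected.values())
accounted_for = polymorphic + total_rejected + not_investigated
yield_all = polymorphic / characterised                            # all characterised positives
yield_pursued = polymorphic / (characterised - not_investigated)   # positives actually pursued

print(total_rejected, accounted_for)           # 23 46
print(f"{yield_all:.2f} {yield_pursued:.2f}")  # 0.28 0.36
```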

3.3.7 BLAST search results
The 13 DNA sequences shown to contain polymorphic microsatellites were searched for matches against the GenBank sequence database using the BLAST search tool. The BLASTn search tool (Altschul et al. 1990) searched for nucleotide sequences with identity to the submitted sequence in several databases: GenBank, EMBL, DDBJ and PDB (Protein Data Bank). Dinucleotide repeat sequences were filtered out to avoid spurious matches with other, unrelated repeat sequences.
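The effect of filtering out dinucleotide repeats can be illustrated with a simple pre-masking function that replaces such runs with N before a search. This is a hypothetical sketch: BLAST's own low-complexity filtering for nucleotide queries (the DUST filter) works differently, and the five-unit threshold and function name are assumptions.

```python
import re

def mask_dinucleotide_runs(seq: str, min_units: int = 5) -> str:
    """Replace runs of >= min_units copies of a two-base motif (e.g. CA, GA) with Ns.

    Homopolymer runs such as AAAAAA are left alone because the two bases of the
    motif must differ; tetranucleotide repeats such as (GAAA)n are not masked.
    """
    # First base, a different second base, then at least (min_units - 1) further copies.
    pattern = re.compile(r"([ACGT])(?!\1)([ACGT])(?:\1\2){%d,}" % (min_units - 1))
    return pattern.sub(lambda m: "N" * len(m.group()), seq.upper())

query = "AAGT" + "CA" * 6 + "GG"
print(mask_dinucleotide_runs(query))
```

Masking keeps the query length and coordinates unchanged, so any hit positions reported for the flanking sequence still refer to the original read.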

Of the 26 sequences searched (the forward and reverse sequences of each microsatellite), seven microsatellites produced significant matches (significance generally being taken as a smallest sum probability of 1e-10 or less), two of which were to non-specific DNA clones (1B7 and 1D6). 1B7 matched mostly with (GAAA)n repeat sequences of various clones. Three had matches to specific sequences:

The forward and reverse sequences of 1E3 both showed high matches to bovine and ovine cDNA for insulin-like growth factor II. The sequences showed a maximum of 80% identity over 78 bases (in the forward sequence) to bovine mRNA for insulin-like growth factor II (accession no. X53553), and 73% identity over 78 bases (in the forward sequence) to the equivalent ovine sequence (accession no. X53554) (Brown et al. 1990).

The best overall match, based on both alignment length and percentage identity, was with the complement of the reverse sequence of 1E3, and is shown below ('Sbjct' is the database sequence):

emb|X53553|BTILGF2 Bovine mRNA for insulin-like growth factor II, partial
Length = 845

Minus Strand HSPs:

Score = 306 (84.6 bits), Expect = 6.6e-29, Sum P(2) = 6.6e-29
Identities = 84/120 (70%), Positives = 84/120 (70%), Strand = Minus / Plus

Query: 254 CCCCCTCCATCAGGGNGAGGAGATCNTNGTAACACCTCTAAAAANGTACAAANTAAANTG 195
Sbjct: 630 CCCCCTCCATCTGGCTGAGGGGATCAGAACAACATCTCTAAAAATGTACAAAACCAATTG 689

Query: 194 GCTTTCATAACCCCCCAAAATTANNNNNNNNAAATTTTTCCCCAATTAACACAACNGAAA 135
Sbjct: 690 GCTTTAAATATCCCCCCAAATTATCACCCCCCAAATTACCCCCAAATTACACAACCAAAA 749

Score = 178 (49.2 bits), Expect = 6.6e-29, Sum P(2) = 6.6e-29
Identities = 44/55 (80%), Positives = 44/55 (80%), Strand = Minus / Plus

Query: 55 TCNGTCCCCTTAAAACAAATTGGCTTTTTAGGAACACCAGCAAAATTAATTAGTT 1
Sbjct: 771 TCAGCCCCCTTGAAACGAATTGGCTTTTTAGCAACACCAGAAAAGCAAACTAGCT 825

The entire length of the insert sequence was matched to the IGF II sequence. Further matches involving the 1E3 sequence can be found in general appendix (ii).
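The identity figures in these alignments are the number of identical aligned positions divided by the alignment length. The sketch below reproduces this for the second 1E3 HSP; the function names are illustrative. The percentages quoted appear to be truncated rather than rounded (67/92 is 72.8%, reported as 72%), so the sketch truncates; that reading of the BLAST output is an assumption.

```python
def count_identities(query: str, sbjct: str) -> int:
    """Count aligned positions where query and subject carry the same base.

    An ambiguous N never counts as an identity, so N-rich stretches lower the score.
    """
    if len(query) != len(sbjct):
        raise ValueError("aligned sequences must be the same length")
    return sum(q == s and q in "ACGT" for q, s in zip(query.upper(), sbjct.upper()))

def percent_identity(query: str, sbjct: str) -> int:
    """Whole-number percentage identity, truncated rather than rounded."""
    return count_identities(query, sbjct) * 100 // len(query)

# Second 1E3 HSP (query positions 55-1, subject positions 771-825).
query_1e3 = "TCNGTCCCCTTAAAACAAATTGGCTTTTTAGGAACACCAGCAAAATTAATTAGTT"
sbjct_1e3 = "TCAGCCCCCTTGAAACGAATTGGCTTTTTAGCAACACCAGAAAAGCAAACTAGCT"
print(count_identities(query_1e3, sbjct_1e3), len(query_1e3),
      percent_identity(query_1e3, sbjct_1e3))  # 44 55 80
```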

The partial sequence of the bovine cDNA for insulin-like growth factor II (Brown et al. 1990) is shown below. The sequence in bold indicates where the match with the submitted sequence was found; the entire insert sequence was homologous to sequence within the 3' UTR (untranslated region) of the gene. The underlined sequence within the match denotes a gap where homology between the two sequences is lost. In the canine sequence, it is replaced by the microsatellite repeat, between bases 51 and 135 of sequence 1E3 in the alignment above (] denotes the end of the coding sequence of IGF II).

1 cgagactctg tgcggcgggg agctggtgga caccctccag tttgtctgtg gggaccgcgg
61 cttctacttc agccgaccat ccagccgcat aaaccgacgc agccgtggca tcgtggaaga
121 gtgttgcttc cgaagctgcg acctggccct gctggagact tactgtgcca cccccgccaa
181 gtccgagagg gatgtgtctg cctctacgac cgtgcttccg gacgacgtca ccgcataccc
241 cgtgggcaag ttcttccaat atgacatctg gaagcagtcc acccagcgcc tgcgcagggg
301 cctgcccgcc ttcctgcgag cacgccgggg tcgcacgctc gccaaggagc tggaggccct
361 cagagaggcc aagagtcacc gtccgctgat cgccctgccc acccaggacc ctgccatcca
421 cgggggcgcc tcttccaagg catccagcga ttag] aagtga gccaaagtgt cgtaattctg
481 ccaagtggca ccatctacct cgcgccgacc tcctgaccgg gaccgcccca ctaggtctct
541 ctctgaaatc cctgtaccgt cctgtctgcg ggctcccctg ccccggcctc tgtgccccaa
601 cctccccacg tcaggcgaat ccccctcggc cccctccatc tggctgaggg gatcagaaca
661 acatctctaa aaatgtacaa aaccaattgg ctttaaatat ccccccaaat tatcaccccc
721 caaattaccc ccaaattaca caaccaaaat tcrcaatcata aacccctcaa tcagccccct
781 tgaaacgaat tggcttttta gcaacaccag aaaagcaaac tagctttcca aaaacttctt
841 aaaac
(Brown et al. 1990)

2A11 (forward and reverse sequences) showed 72% identity with a human Mermaid LINE 1 element mRNA sequence (Hoyle et al. 1996); Mermaid is homologous to MER (medium reiteration frequency) sequences. A LINE 1 is a long interspersed nucleotide element found widely distributed throughout mammalian genomes; MER sequences are found as part of the 3' untranslated region (UTR) of LINE 1 elements, although they were initially thought to be a separate class of repeat sequences within the genome (Smit et al. 1995). As with 1E3, the best match was found with the complement of the reverse sequence of 2A11:

gb|U31059|HSU31059 Human Mermaid LINE-1 element mRNA sequence
Length = 302

Minus Strand HSPs:

Score = 241 (66.6 bits), Expect = 6.9e-23, Sum P(3) = 6.9e-23
Identities = 67/92 (72%), Positives = 67/92 (72%), Strand = Minus / Plus

Query: 326 TGTGCCACCATCACCACCATCTTNTGCCAGAACNTCATCATCATCCCAAATNGAAACTCC 267
Sbjct: 1 TGTGCAGCCATCACCACCATCCATCTCCAGAACTTTTTCATCTTCCCAAACTGAAACTCT 60

Query: 266 ACACCCACTGAACAGTTCCCCGCCCCACCCCC 235
Sbjct: 61 GTACCCATTAAACACTAACTCCCCACTCCCCC 92

Score = 161 (44.5 bits), Expect = 6.9e-23, Sum P(3) = 6.9e-23
Identities = 45/63 (71%), Positives = 45/63 (71%), Strand = Minus / Plus

Query: 214 TTNTGCNTCTGTGAATTTAACAAGCATAGCGACATCACATAAGTGGAATCATAGAGCATT 155
Sbjct: 126 TTCTGTCTCTATGAATTTGACTACTCNAGGTACCTCATNTAAGTGGAATCATACAGTATT 185

Query: 154 TAT 152
Sbjct: 186 TGT 188

Score = 97 (26.8 bits), Expect = 6.9e-23, Sum P(3) = 6.9e-23
Identities = 21/23 (91%), Positives = 21/23 (91%), Strand = Minus / Plus

Query: 115 TTTGTGACTGGCTTATTTTACGT 93
Sbjct: 192 TTTGTGACTGGCTTATTTCACTT 214

The Mermaid sequence is shown below. As with 1E3, the bold sequence shows where homology with the canine sequence was found. The underlined sequence (bases 189 to 191) indicates the insertion of a (CA)n microsatellite repeat specific to the canine sequence. There was a loss of alignment in the middle of the sequence from bases 93 to 125, shown in normal type, which is also evident in the sequence alignment above.

1 tgtgcagcca tcaccaccat ccatctccag aactttttca tcttcccaaa ctgaaactct
61 gtacccatta aacactaact ccccactccc ccctctcccc agcccctggc aaccaccatt
121 ctactttctg tctctatgaa tttgactact cnaggtacct catntaagtg gaatcataca
181 gtatttgt[...]tttgtgact ggcttatttc acttagcata atgtcctcan gcttcatnnn
241 nnccatgttg tagcatgtgt cagaatttcc ttccttttta aggctgaana atattccatg
301 tt
(Hoyle et al. 1996)
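Coordinates in the alignments (for example 'Sbjct: 1' to '60', or 'bases 189 to 191') are 1-based positions in numbered listings such as the bovine IGF II and Mermaid sequences above. A small sketch for turning such a listing into a plain string and extracting a window is given below; the helper names are hypothetical. Applied to the first Mermaid line, it reproduces subject positions 1-60 of the 2A11 alignment.

```python
import re

def parse_numbered_sequence(listing: str) -> str:
    """Join a numbered sequence listing into one upper-case string.

    Position numbers, spaces, brackets and line breaks are dropped; only letters
    are kept, so pass the sequence lines only (not the citation after them).
    """
    return "".join(re.findall(r"[A-Za-z]+", listing)).upper()

def subsequence(seq: str, start: int, end: int) -> str:
    """Bases start..end inclusive, using the 1-based numbering of the listing."""
    return seq[start - 1:end]

mermaid_line_1 = "1 tgtgcagcca tcaccaccat ccatctccag aactttttca tcttcccaaa ctgaaactct"
seq = parse_numbered_sequence(mermaid_line_1)
print(subsequence(seq, 1, 10))  # TGTGCAGCCA
```

The parser does not check that each line's leading number matches the running base count, so an illegible or extra base in a listing would shift every later coordinate.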

2A7 showed 94% identity to a canine-specific SINE (can-SINE), found dispersed throughout the canine genome (Coltman and Wright 1994)(Minnick et al. 1992) (see Chapter 5, section 5.3.1, pg 129 for alignment details).

Two showed matches to canine-specific sequences (2D2 Forward only and 2H12 Reverse only), but not to specific genes.
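Several of the best matches in this section were to the complement of the reverse sequence, that is, to its reverse complement; BLAST reports such hits as 'Strand = Minus / Plus', with query coordinates running downwards (for example 254 to 195). A minimal sketch of the conversion is shown below, with illustrative function names.

```python
# Only A, C, G, T and the ambiguity code N occur in the reads discussed here.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the order (5'->3' on the other strand)."""
    return seq.translate(_COMPLEMENT)[::-1]

def original_position(pos_on_rc: int, read_length: int) -> int:
    """Map a 1-based position on the reverse complement back to the original read."""
    return read_length - pos_on_rc + 1

print(reverse_complement("AAGTCAN"))  # NTGACTT
```

For example, in a read of 300 bases (a hypothetical length), position 1 of the reverse complement corresponds to position 300 of the original read.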

The microsatellite sequences 2A7, 2D2 and 2H12 were all (GAAA)n repeats and, on further study of their sequences, were found to have identity in parts to the canine-specific SINE (short interspersed nucleotide element) (Coltman and Wright 1994)(Minnick et al. 1992). All three could therefore be associated with can-SINEs (Coltman and Wright 1994). Determining which gene (if any) each repeat is associated with would require further sequencing of the cosmid from which the subclone was derived. Results of the BLAST searches can be seen in General Appendix (ii), pg 206.

3.4 Discussion

3.4.1 Partial canine cosmid library
Since only 0.25% of the library was used to identify microsatellites, an estimate of the frequency of (CA)n and (GAAA)n microsatellites within the genome based on this study would not be statistically valid. However, these frequencies have been estimated for the canine genome: (CA)n microsatellites occur about every 42 kilobases (Rothuizen et al. 1994) and (GAAA)n microsatellites about one third as often, about every 130 kilobases (Francisco et al. 1996). For (GAAA)n microsatellites, this would mean that one in every 3 to 4 cosmids in the library should have contained a (GAAA)n repeat, giving an expected total of 47 to 63 repeats of this type. Since only 7 unique (GAAA)n microsatellites were identified from this library, sample size, detection methods, and the age and number of previous amplifications of the library may have significantly reduced the number of (GAAA)n sequences ultimately identified. The library was not obtained directly from the manufacturer, so the number of amplifications it had undergone cannot be determined. Over-amplification of such libraries causes cosmids with inserts that replicate less efficiently to be lost, and those which replicate efficiently to increase in number. From the observed frequency of (CA)n repeats, nearly every cosmid would have been expected to be positive in the eventual screen (170 positives in 188 cosmids), but again the observed number was significantly lower. The aim of the project was to identify a small number of polymorphic microsatellites to carry out both genetic and physical mapping.
Since many of the cosmids investigated contained more than one (CA)n repeat, and not all positives were investigated beyond the initial identification of a cosmid containing putative positive(s), the counts are incomplete, and these statistics cannot be assessed for the partial library used. The microsatellites were isolated from a large-insert library so that each could be both physically and genetically mapped, which meant that only a small number of microsatellites was required. For the purposes of this study, it was considered necessary neither to investigate all positives identified nor to ensure that all positives present in the library were identified. Between 10 and 15 polymorphic microsatellites were to be identified to enable physical and genetic mapping to be performed.
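The expected (GAAA)n count quoted in this section follows from simple division. The sketch below reproduces it and also shows the cosmid insert size that 'one repeat in every 3 to 4 cosmids' implies at a 130 kb spacing (about 33 to 43 kb, roughly in line with typical cosmid inserts). It assumes evenly spaced repeats, which real repeats are not, so the figures are order-of-magnitude only; the function names are illustrative.

```python
GAAA_SPACING_KB = 130  # mean (GAAA)n spacing cited from Francisco et al. (1996)
N_COSMIDS = 188        # cosmids screened, from the text

def implied_insert_kb(cosmids_per_repeat: float, spacing_kb: float = GAAA_SPACING_KB) -> float:
    """Insert size implied by 'one repeat per n cosmids' at the given spacing."""
    return spacing_kb / cosmids_per_repeat

def expected_repeats(cosmids_per_repeat: float, n_cosmids: int = N_COSMIDS) -> int:
    """Expected number of repeat-containing cosmids, rounded to a whole number."""
    return round(n_cosmids / cosmids_per_repeat)

for n in (4, 3):
    print(n, round(implied_insert_kb(n), 1), expected_repeats(n))
# 4 32.5 47
# 3 43.3 63
```

Against this expectation, only 7 unique (GAAA)n microsatellites were found.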

3.4.2 Methods of isolation
Many different methods of isolating microsatellites from large-insert libraries have been described, mainly PCR-based methods that avoid the need for time-consuming subcloning. These methods often used the repeat itself as a primer binding site, or ligated primer sequences onto the ends of DNA fragments cut with a specific restriction endonuclease. Some used magnetic beads to capture biotin-labelled fragments (Yuille et al. 1991)(Baron et al. 1992)(Pandolfo 1992)(Taylor et al. 1992)(Robic et al. 1994)(Rothuizen and Raak 1994)(Rowe et al. 1994)(Lench et al. 1996)(Prochazka 1996). A subcloning-based method, as used in the pig (Robic et al. 1995) and cattle (Toldo et al. 1993), was chosen because it enabled confirmation of the presence of a repeat sequence at almost every stage, as well as providing stable storage of each repeat at most stages of the isolation. This allowed problems with the methodology to be assessed efficiently where they arose, without the need to return to the beginning of the isolation process after solving each problem.

3.4.3 Selection stringency and ECL
Cosmids containing microsatellite repeats were isolated mainly using fluorescently labelled oligonucleotide probes (ECL; see Methods, section 2.1.4). This method was adopted primarily because it is much safer than the use of radioactivity, which was used for the initial screening for (CA)n repeats. It was also found to be less stringent than radioactive probing, as it detected more positives. This difference was probably due to the different hybridisation temperatures used in each method: 60°C for radioactive probing and 45°C for fluorescent probing.

However, 6 false positives were detected after radioactive probing (18%), but only one after ECL (1.5%). Thus, positives detected by the ECL method were more likely to be genuine than those detected by the radioactive method. The radioactive method did produce results with less background than the fluorescent method, but this was not a real problem in the initial detection of cosmid positives. A background problem did arise, however, when the composition of the nylon membrane used was altered. High background was present after the membrane had been used to grow colonies overnight that had been replica-plated onto it. Colleagues also encountered this plating problem, which remained unsolved. Southern blotting was not affected by the change, which suggests that the problem arose from agar absorbed by the membrane during overnight incubation. Several other types of membrane were tried, but none gave acceptable results. Consequently, the last few microsatellite subclones had to be identified using an alternative method, described in the Results, section 3.3.3, pg 90. This method was more labour-intensive, as it involved numerous DNA isolations and hybridisations, as well as repeated subcloning in some cases. However, several microsatellites had already been identified as polymorphic by this stage, so the time spent using the alternative method was minimal.
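To illustrate why the two hybridisation temperatures differ in stringency, the Wallace rule gives a rough melting temperature (Tm) for a short oligonucleotide. Assuming, for illustration only, a (CA)10 20-mer like the fluorescent probe described for Figure 3.6 (the radioactive probe's sequence is not given here), the estimated Tm is 60°C: hybridisation at 60°C would be at about the estimated Tm, whereas 45°C is some 15°C below it, which would be consistent with the lower stringency of the fluorescent screen. The rule is only a rule of thumb, and the function name is illustrative.

```python
def wallace_tm(oligo: str) -> int:
    """Rough melting temperature (deg C) by the Wallace rule: 2(A+T) + 4(G+C).

    Intended only for short oligonucleotides; it ignores salt, formamide and
    probe concentration, all of which affect real hybridisation stringency.
    """
    s = oligo.upper()
    return 2 * (s.count("A") + s.count("T")) + 4 * (s.count("G") + s.count("C"))

probe = "CA" * 10  # (CA)10, assumed for illustration
tm = wallace_tm(probe)
for temp in (60, 45):
    print(temp, tm - temp)  # degrees below the estimated Tm: 0, then 15
```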

3.4.4 Microsatellite duplications
Four of the positives initially identified were subsequently shown to be reisolates of a previously characterised cosmid, three of which were (GAAA)n repeats. This suggests the library provided may have been amplified several times, which would cause cosmids that replicate easily to become more abundant and those which do not to be lost from the library altogether. Given the small proportion of the genome represented by the library provided, it was surprising to find any duplicate sequences. Contamination of the isolated cosmids is unlikely: digestion of a cosmid mixture would give a fragment pattern including all bands of the uncontaminated cosmid plus extra bands from the contaminating cosmid, which was not the case with the (GAAA)n repeats. The duplicated (CA)n repeats, 1G9 and 2A6, were not recognised as identical until sequencing, although subsequent restriction analysis showed that the cosmid inserts were identical, as their restriction patterns were the same.
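The reasoning used to rule out contamination (a mixture would show every band of one cosmid plus extra bands) can be written as a comparison of fragment-size multisets. The sketch below uses hypothetical fragment sizes and exact size matching, both simplifying assumptions; it is not how the gels in this study were scored.

```python
from collections import Counter

def compare_digests(a: list[int], b: list[int]) -> str:
    """Compare two restriction digests given as lists of fragment sizes (bp).

    Sizes are matched exactly; real gel estimates would need a size tolerance.
    """
    ca, cb = Counter(a), Counter(b)
    if ca == cb:
        return "identical"
    if not ca - cb:  # every band of a also appears in b
        return "b = a plus extra bands"
    if not cb - ca:  # every band of b also appears in a
        return "a = b plus extra bands"
    return "different"

# Hypothetical fragment sizes, for illustration only.
print(compare_digests([9000, 6500, 2100], [2100, 9000, 6500]))        # identical
print(compare_digests([9000, 6500, 2100], [9000, 6500, 2100, 4300]))  # b = a plus extra bands
```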

3.4.5 Repeat sequences
The isolated microsatellites were often flanked by repeat-like sequences. This has been reported before for both canine dinucleotide (Holmes et al. 1993)(Ostrander et al. 1993)(Fredholm and Wintero 1995) and tetranucleotide repeats (Francisco et al. 1996). Two of the (CA)n microsatellites, 1F5 and 2A6, had repeat-like sequence surrounding the main repeat as follows (the repeat length recorded in Table 3.1 is shown in bold type):

1F5: (TG)12 CAC (GA)3 CA (GA)2 CA (GA)3 CA (GA)7

2A6: (TG)13 (CG)4 TG (CG)2 (TG)6

Under the classification of Weber (1990), described in section 1.4, pg 22 of the general introduction, 1F5 is a compound imperfect repeat, comprising one perfect repeat, (TG)12, and one imperfect repeat, (GA)3CA(GA)2CA(GA)3CA(GA)7, separated by 3 nucleotides. 2A6 is considered to be two separate perfect repeats, (TG)13 and (TG)6, because there are more than three non-repeat bases between the two longer repeats, and these bases are not interspersed with at least 1.5 TG repeat units. Each of the repeats within these sequences, however, is potentially polymorphic, so each region had to be treated as essentially one repeat, although technically 2A6 consists of two separate repeats.
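The spacer rule applied above (repeat blocks separated by no more than three non-repeat bases are grouped together; larger spacers keep them apart) can be sketched as follows. This deliberately implements only that rule as stated in this section; it does not assign Weber's perfect, imperfect or compound names or apply the repeat-unit interspersion criterion, and the function names are illustrative.

```python
import re

MAX_SPACER = 3  # non-repeat bases allowed between blocks, as described in the text

def repeat_blocks(seq: str, motif: str, min_units: int = 2) -> list[tuple[int, int]]:
    """(start, end) spans, 0-based and end-exclusive, of perfect runs of `motif`."""
    pattern = "(?:%s){%d,}" % (re.escape(motif), min_units)
    return [m.span() for m in re.finditer(pattern, seq)]

def group_blocks(spans: list[tuple[int, int]], max_spacer: int = MAX_SPACER) -> list[tuple[int, int]]:
    """Merge blocks separated by <= max_spacer bases; larger spacers keep them apart."""
    groups: list[tuple[int, int]] = []
    for start, end in sorted(spans):
        if groups and start - groups[-1][1] <= max_spacer:
            groups[-1] = (groups[-1][0], max(end, groups[-1][1]))
        else:
            groups.append((start, end))
    return groups

f5 = "TG" * 12 + "CAC" + "GA" * 3 + "CA" + "GA" * 2 + "CA" + "GA" * 3 + "CA" + "GA" * 7
a6 = "TG" * 13 + "CG" * 4 + "TG" + "CG" * 2 + "TG" * 6
print(group_blocks(repeat_blocks(f5, "TG") + repeat_blocks(f5, "GA")))  # [(0, 63)]
print(group_blocks(repeat_blocks(a6, "TG")))                            # [(0, 26), (40, 52)]
```

For 1F5 every spacer is 2 or 3 bases, so the whole 63-base region forms one group; for 2A6 the 14 bases between (TG)13 and (TG)6 split it into two.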

In the case of 1F5, the original repeat may have been a compound perfect (TG)12(GA)n repeat in which certain Gs within the second half have been replaced by Cs. Mutations such as these can stabilise a repeat, making it less polymorphic, as found when normal individuals are compared with those affected by some diseases caused by trinucleotide repeat expansion (Eichler et al. 1994). An interrupted (imperfect) repeat is likely to be less polymorphic than an uninterrupted one (Weber 1990).

All the isolated (GAAA)n microsatellite repeats were surrounded by G(A)n repeats, which may also contribute to their polymorphic properties. Sequence variability within (GAAA)n tetranucleotide repeats has been observed by several groups (Liu et al. 1995)(Francisco et al. 1996). Longer alleles are more likely to contain irregular repeat sequences with various combinations of G and A, often with higher mutation rates (Liu et al. 1995). The types of repeat found in an intron of one gene included a perfect, uninterrupted repeat, and irregular sequences with different combinations of G and A (Liu et al. 1995). (GAAA)n repeats are more polymorphic than (CA)n repeats in the canine genome (Francisco et al. 1996), and they give results that are easier to interpret, owing to less stutter band production and a larger size gap between alleles one repeat unit apart (Beckmann and Weber 1992)(Murray et al. 1993)(Adamson et al. 1995), so causing fewer genotyping errors (He et al. 1996).
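The larger size gap for tetranucleotide alleles can be made concrete by converting product sizes to repeat counts. Adjacent dinucleotide alleles are 2 bp apart, but tetranucleotide alleles are 4 bp apart, so a 1 to 2 bp sizing error or stutter artefact is less likely to be mistaken for a real allele. In the sketch below the reference size and repeat number are hypothetical, and real allele calling would also need a tolerance for sizing error.

```python
def repeat_count(size_bp: int, ref_size_bp: int, ref_units: int, unit_bp: int):
    """Repeat number of an allele, from its size relative to a reference allele.

    Returns None when the size is not a whole number of repeat units away from
    the reference (e.g. a sizing error or a non-repeat length change).
    """
    diff = size_bp - ref_size_bp
    if diff % unit_bp:
        return None
    return ref_units + diff // unit_bp

# Hypothetical reference: a 150 bp product known to carry 13 repeat units.
print([repeat_count(s, 150, 13, 2) for s in (146, 148, 150, 151)])  # dinucleotide
print([repeat_count(s, 150, 13, 4) for s in (142, 146, 150, 152)])  # tetranucleotide
```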

3.4.6 Assessment of the polymorphic properties of the microsatellite repeats
The polymorphic properties of each microsatellite were assessed using a panel of twelve dogs, each of a different breed. The intention was to find a set of markers showing polymorphism throughout the domestic dog population, maximising their potential usefulness in all breeds, not just those of the families to be used in genetic linkage analysis. Markers assessed in this way would be of more use in general applications such as parentage testing and evolutionary studies. Indeed, microsatellite variability within dog breeds has consistently been found to be lower than between breeds (Pihkanen et al. 1996)(Zajc et al. 1997), so it was advantageous to assess polymorphism using dogs of different breeds rather than families of only two or three breeds. Some microsatellite loci that are monomorphic in some breeds are variable in others (Zajc et al. 1997).
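As a sketch of how a breed panel could be summarised, the functions below compute allele frequencies and observed and expected heterozygosity, and flag monomorphic loci. These statistics are not reported in this chapter; the example uses only the two genotypes described for Figure 3.8 and is illustrative, since a full analysis would use all twelve dogs, and even then a sample that small gives only rough estimates.

```python
from collections import Counter

def allele_frequencies(genotypes):
    """Allele frequencies from a list of diploid genotypes such as (1, 3)."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}

def observed_heterozygosity(genotypes):
    """Fraction of individuals carrying two different alleles."""
    return sum(a != b for a, b in genotypes) / len(genotypes)

def expected_heterozygosity(genotypes):
    """1 - sum(p_i^2), with no small-sample correction."""
    return 1 - sum(p * p for p in allele_frequencies(genotypes).values())

def is_monomorphic(genotypes):
    """True if only one allele is seen across the whole panel."""
    return len(allele_frequencies(genotypes)) == 1

# Genotypes described for Figure 3.8 (Poodle 1/3, Rottweiler 2/2) only.
panel = [(1, 3), (2, 2)]
print(observed_heterozygosity(panel), expected_heterozygosity(panel), is_monomorphic(panel))
```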

3.4.7 BLAST search results
Of the sequence alignments found using the BLASTn search tool, only one showed significant homology to a gene sequence: the entire reverse sequence of 1E3 matched the 3' UTR of the IGF II gene. The IGF II gene itself has not yet been identified in the dog, although it has been identified in the human (O'Brien and Marshall Graves 1991)(O'Brien et al. 1993), rat (Moore et al. 1991), mouse (Moore et al. 1991)(O'Brien et al. 1993), cattle (O'Brien and Marshall Graves 1991)(O'Brien et al. 1993), sheep (Brown et al. 1990)(Ansari et al. 1994), horse (Raudsepp et al. 1997) and owl monkey (O'Brien and Marshall Graves 1991). Primers designed for the human IGF II gene produce a single PCR product when amplified on genomic DNA from numerous mammals, including the dog (Lyons et al. 1997). The presence of the canine IGF II gene within the cosmid would need to be confirmed, by sequencing 5' to the insert sequence, before the cosmid could be said to contain the canine IGF II gene, and thus be assigned to chromosome 18q21, as assigned in Table 4.1, pg 106, and discussed further in Chapter 4, pg 124. The presence of a microsatellite repeat within the homologous region of the canine sequence, where none is present in the bovine, ovine or equivalent region of the human sequence (Dull et al. 1984), suggests that microsatellite repeats are not conserved between these species at this locus. To one side of the repeat, near 1E3 sequence position 135 in the bovine sequence, there is a small CA-rich region, allowing for the possibility that a microsatellite evolved around that region. The region that was lost in the bovine sequence and replaced in the canine sequence by a microsatellite consisted of 66% C and A nucleotides and two CT dinucleotide repeats, possibly providing the basis for the compound repeat in the canine sequence at that point.
Depending on the evolutionary relationship between the canids and the bovids (cattle and sheep), the microsatellite could equally have been lost from the bovine and ovine sequences and retained in the canine sequence.

The sequence immediately surrounding the 2A11 repeat matched a Mermaid sequence (Hoyle et al. 1996). This showed high homology to the MER12 sequence, which has been found to be associated with an ancient LINE 1 (L1) sequence, L1MB3 (Smit et al. 1995). Further evidence that L1MB3 may be a subfamily found in the carnivores comes from the high homology of both the 2A11 sequence and the Mermaid sequence to the L1 element in the human RVP (red visual pigment) gene (Hoyle et al. 1996); see General Appendix (ii), pg 206 for matches to the 2A11 sequence. In the human, this gene was found to contain L1MB3 (Smit et al. 1995) in the same area and sequence where matches were found for both 2A11 and the Mermaid sequence. It is possible that the sequence of 2A11 is part of an L1MB3 element or similar. However, the Mermaid sequence was classified as a different MER from MER12, because no homology was found between bases 1-76 of MER12 and the Mermaid sequence, and because the Mermaid sequence extends past the 3' end of MER12 (Hoyle et al. 1996). The Mermaid sequence matches the RVP gene more closely than it matches the MER12 sequence, suggesting that the Mermaid sequence, rather than MER12, is what is present in the gene. This casts doubt on whether the L1MB3 element associated with MER12 contains the Mermaid sequence. Since many MERs are regarded as fragments of L1 elements (Smit et al. 1995), the Mermaid sequence is probably also part of one subfamily of L1 elements, if not of L1MB3. Preliminary analysis of the PCR product including the 2A11 microsatellite gave a specific product, so although the full cosmid sequence may reveal a LINE 1 element, the sequence immediately surrounding the repeat seems to be unique within the genome.

LINE 1 elements are transposable elements that can move around the genome by reverse transcription of an RNA intermediate. This process is called retrotransposition, and the movement of these elements to new parts of the genome can disrupt gene function, sometimes causing disease (Sassaman et al. 1997). Short interspersed repeats, such as MERs, the Mermaid sequence and SINEs (short interspersed nucleotide elements), have often been found to be part of the truncated 3' ends of older LINE 1 elements rather than independent interspersed elements (Smit et al. 1995), lending further support to the possibility that the 2A11 sequence is part of a LINE 1 element. The sequence homology of 2A11 extends either side of the polymorphic 2A11 (CA)n repeat. 2A11 actually contains two (CA)n repeats, although the homology includes only the larger repeat, which was subsequently found to be polymorphic. (CA)n repeats have been reported in the 3' UTR of LINE 1 elements, at the site of recombination where two transposed LINE 1 elements have combined to produce a new LINE 1 source gene (Smit et al. 1995). The 2A11 sequence shows these characteristics, providing further evidence that 2A11 may be part of a canine LINE 1-like element, although the reason these repeats occur there is not certain.

The sequences of 2A7, which matched a can-SINE, and of 2D2 and 2H12, which matched various canine-specific sequences, suggest that the matches are to repetitive sequences widely distributed throughout the mammalian genome. In humans, about 80% of A(2-3)N repeat sequences have been found adjacent to the 3' end of repetitive sequences called Alu elements, some with imperfect repeats between the end of the Alu element and the start of the uninterrupted run of repeats (Beckmann and Weber 1992). The matches obtained by sequence alignment for the GAAA repeats seem to support this finding.
This association suggests that the microsatellites are likely to be polymorphic, although placement of one of the PCR primers within the Alu element may cause high background (Economou et al. 1990). The loss of the (GAAA)n repeat 1H6, owing to the inability to produce specific PCR products, may be due to this association, which was confirmed to be the reason for the loss of 2A7 (Chapter 5, section 5.3.1, pg 129). Precise identification of the repetitive sequences was possible only if sequence data from outside the repeat were obtained. The presence of a Sau3AI restriction site, where the fragment had been ligated to the vector, precluded further sequencing of the DNA adjacent to the repeat. Without sequencing part of the original cosmid adjacent to the subcloned sequences, more specific matches to genes (if present) or to other non-repetitive sequences are unlikely.

1B7 showed identity to many sequences, but analysis of the results showed that all matches were with the (GAAA)n repeat sequence. The subcloned insert was very small; only just enough unique sequence was present on either side of the repeat to design primers. More specific matches would be found only by using a different restriction endonuclease to excise a larger repeat-containing fragment from the original cosmid, enabling alignment of the non-repeat sequence flanking the microsatellite.

It was not surprising to find a match to only one gene, as whole genomic DNA was used to produce the cosmid library. If matches with genes had been required, a cDNA library would have been used, although the small inserts of such a library would usually preclude its direct use in physical mapping.

CHAPTER 4

FLUORESCENCE IN SITU HYBRIDISATION OF COSMIDS CONTAINING POLYMORPHIC MICROSATELLITES

4.1 Summary

FISH analysis of cosmids containing polymorphic microsatellites was carried out in order to physically map the markers onto their corresponding canine chromosomes. Table 4.1 below summarises the results produced:

Table 4.1

Cosmid   Chromosome location
1B7      20q17
1B10     7q16-17.2
1D6      2q34-35
1E3      18q21
1E7      Xp24 and Yp13
1E12     3q31
1F5      24q24
1F11     30q15
2A6      18q21
2A7      8q33
2A11     centromeres and 6q21
2D2      9q25
2H12     -

Twelve of the thirteen cosmids were physically mapped to eleven separate pairs of chromosomes, as shown in the table above. Two cosmids (1E3 and 2A6) were located to chromosome 18q21. Cosmid 2A11 hybridised to the centromeres of a subset of the smaller chromosomes, but further analysis of a restriction fragment from the cosmid localised the microsatellite sequence itself to chromosome 6q21. 1E7 hybridised to the pseudoautosomal region of the X and Y chromosomes, a position confirmed by sex-linkage analysis. Cosmid 2H12 gave spurious results and did not hybridise specifically to any chromosome. The chromosomes to which the cosmids hybridised were identified by several methods, including banding pattern analysis, relative size, and the use of chromosome-specific paints.

4.2 Introduction

Fluorescence in situ hybridisation (FISH) is a technique in which fluorescently labelled probes, at least 5 kilobases in size, are hybridised to chromosome preparations to identify their chromosomal origin. The dog has 78 chromosomes (2n = 78); all autosomes are acrocentric and the sex chromosomes are metacentric. Canine chromosomes are difficult to identify by their banding pattern alone, as many of them are small and have similar banding patterns, and several canine karyotypes have been proposed (Selden et al. 1975)(Manolache et al. 1976)(Reimann et al. 1996)(Switonski et al. 1996). The Committee for the Standardised Karyotype of the Dog (Switonski et al. 1996) was set up in 1994 (Switonski et al. 1994) with an agreed standardisation procedure, and a fully standardised, complete canine karyotype has been agreed (personal communication, M. Breen (1997)). The small size of many of the chromosomes has demonstrated a need for chromosome-specific cosmids and chromosome-specific paints to verify chromosome identification. Chromosome-specific paints have already been produced (Langford et al. 1996) and, in combination with chromosome-specific cosmids, have been used to conclusively identify all the canine chromosomes comprising the karyotype.

4.3 Results

All practical work in this chapter was carried out at the Animal Health Trust, Newmarket under the supervision of Dr. Matthew Breen.

4.3.1 FISH of cosmids containing polymorphic microsatellites to canine chromosomes

Figure 4.1 shows the results of hybridisation of twelve fluorescently labelled cosmids to canine metaphase chromosome spreads. Although thirteen cosmids containing polymorphic microsatellites were hybridised to the spreads, one (2H12) gave consistently spurious binding results and was not localised to any chromosome. Each picture in Figure 4.1 is a representative example of one of 20 to 30 spreads on a single slide to which the same labelled cosmid was hybridised. Cosmid 1E7 hybridised to both the X and Y chromosomes, and cosmid 2A11 hybridised to the centromeric regions of several of the smaller chromosomes as well as giving a faint signal on one pair of the larger chromosomes, which can be identified as chromosome 6 when enlarged (Figure 4.2).

4.3.2 Identification of chromosomes to which cosmids hybridised

Studies of both partial (Switonski et al. 1996) and complete (Reimann et al. 1996) karyotypes have recently been published, one of which (Reimann et al. 1996) has been used to identify the chromosomes to which each cosmid hybridised (Figure 4.1). The complete canine karyotype (Reimann et al. 1996) can be seen in Figure 1.1 of the Introduction.

Figure 4.2 shows each of the twelve cosmids hybridised to a pair of homologous chromosomes (stained blue) and the linear, enhanced DAPI banding pattern (similar to G-banding), accompanied by the ideogram used for identification (Reimann et al. 1996). The precise location of each cosmid on the chromosome is also indicated. The banding pattern was used to identify the chromosomes to which 1B10, 1D6, 1E7, 1E12, 2A7 and 2D2 had hybridised, while the rest were identified by additional methods. The chromosomes to which 1B7, 1E3, 1F5, 1F11 and 2A6 hybridised were identified by relative size comparison and banding pattern analysis, as follows. The number of chromosomes that were smaller and/or larger than the chromosomes carrying the signal was ascertained, which provided a small range of possible chromosomes to which the cosmid had hybridised. The banding patterns of these candidates were then compared with the banding pattern of the chromosome carrying the signal. The results were confirmed by repeating the hybridisation and also, in the case of cosmid 1F11, by cohybridising with a chromosome 30 paint (carried out by Matthew Breen), as seen in Figure 4.3c. The location of all cosmids was confirmed by cohybridising the cosmids with chromosome paints on chromosome spreads, two examples of which are shown in Figures 4.3a and b.

[Figure 4.1: FISH panels, one per cosmid (including 2A6, 2A7, 2A11 and 2D2)]

Figure 4.1 Fluorescence in situ hybridisation of twelve cosmids containing polymorphic microsatellite repeats. The yellow arrows indicate the pairs of blue metaphase chromosomes to which the fluorescently labelled cosmids hybridised. Cosmid clone 2A11 can be seen hybridising to the centromeres of a subset of the smaller chromosomes as well as giving a faint but consistent signal in the middle of the chromosomes indicated by the light blue arrows. The signals on the chromosomes are seen more clearly in Figure 4.2.

[Figure 4.2: Chromosome ideogram and cosmid location (Reimann et al. 1996). Panels: 1B7 20q17; 1B10 7q16-17.2; 1D6 2q34-35; 1E7 Xp24 and Yp13; 1E12 3q31; 1F5 24q24; 1F11 30q15; 2A11 6q21; 2A7 8q33; 2D2 9q25; 1E3 and 2A6 18q21]

Figure 4.2 shows the identification of individual chromosomes to which labelled cosmids hybridised. The chromosomes were identified by their DAPI banding pattern using the ideograms of Reimann et al. (1996), shown to the right of each cosmid name.

Initial studies indicated that both 1E3 and 2A6 hybridised to chromosome 18. This was later confirmed by labelling the two cosmids in different colours (red and green) and cohybridising both cosmids to the same chromosome spread, as seen in Figure 4.4 (carried out by Matthew Breen). The signals are so close together that they are seen as a single yellow signal, and both cosmids were therefore assigned the same location on the chromosome, 18q21 (Figures 4.2 and 4.4).

4.3.3 The centromeric-repeat containing cosmid, 2A11

Cosmid 2A11 hybridised to the centromeric regions of several of the smaller chromosome pairs (Figure 4.1). However, a faint signal was consistently observed on chromosome 6 (Figure 4.2).

In order to confirm that the unique microsatellite within cosmid 2A11 was in fact located on chromosome 6, a fragment of DNA containing the microsatellite (3 to 10 kilobases in size) was isolated from the insert of the original cosmid clone and hybridised to canine chromosome spreads (this work was carried out in Leicester). The cosmid DNA was digested separately with several restriction endonucleases recognising six base pairs. The digests were electrophoresed on an agarose gel, Southern blotted and probed with a fluorescently labelled (CA)10 repeat, as shown in Figure 4.5. Fragments containing the microsatellite which were large enough for successful FISH, yet small enough potentially to exclude the sequences hybridising to some of the canine centromeres, were eluted from the gel and used as FISH probes. The results of hybridisation of a 3.7 kilobase PstI fragment containing the repeat are shown in Figure 4.6 (carried out at the Animal Health Trust), confirming the presence of the microsatellite on canine chromosome 6. This result suggests that the unique microsatellite and its flanking sequence are present on chromosome 6, and that the additional centromeric signals obtained with the whole cosmid were due to centromeric repeat sequences, specific to some of the smaller chromosomes, which are also present in the cosmid.
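The fragment-selection step above was carried out experimentally, by Southern blotting. Where the cosmid sequence is known, the same choice can be sketched in silico. The sketch below is hypothetical: the insert sequence is synthetic, the recognition sites and cut positions are those of the three enzymes named in the Figure 4.5 legend (BamHI, PstI and SmaI), the 3-10 kb window follows the text, and the function names are illustrative only.

```python
import re

# Recognition sites of the enzymes named in Figure 4.5, with the top-strand
# cut position given as an offset from the start of the site. All three sites
# are palindromic, so searching the top strand alone finds every site.
ENZYMES = {
    "BamHI": ("GGATCC", 1),  # G^GATCC
    "PstI": ("CTGCAG", 5),   # CTGCA^G
    "SmaI": ("CCCGGG", 3),   # CCC^GGG
}

def cut_positions(seq, site, offset):
    """Top-strand cut coordinates for every occurrence of site in seq."""
    # The lookahead also reports overlapping occurrences.
    return [m.start() + offset for m in re.finditer(f"(?={site})", seq)]

def repeat_fragment(seq, repeat_start, repeat_end, site, offset):
    """Length of the restriction fragment containing seq[repeat_start:repeat_end],
    or None if the enzyme does not cut on both sides within seq."""
    cuts = cut_positions(seq, site, offset)
    left = [c for c in cuts if c <= repeat_start]
    right = [c for c in cuts if c >= repeat_end]
    if not left or not right:
        return None  # fragment runs off the end of the known sequence
    return min(right) - max(left)

def candidate_enzymes(seq, repeat_start, repeat_end, min_bp=3000, max_bp=10000):
    """Enzymes whose repeat-containing fragment lies within the size window."""
    result = {}
    for name, (site, offset) in ENZYMES.items():
        size = repeat_fragment(seq, repeat_start, repeat_end, site, offset)
        if size is not None and min_bp <= size <= max_bp:
            result[name] = size
    return result

# Synthetic insert: a (CA)20 repeat flanked by two PstI sites.
insert = "A" * 1000 + "CTGCAG" + "T" * 1500 + "CA" * 20 + "T" * 2000 + "CTGCAG" + "A" * 500
start = insert.index("CA" * 20)
print(candidate_enzymes(insert, start, start + 40))  # {'PstI': 3546}
```

Treating a missing flanking site as "unknown size" rather than "whole sequence" mirrors the practical situation: a fragment that extends into uncharacterised cosmid or vector DNA cannot be sized from the sequence alone.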

[Figure 4.3: panels a, b and c]

Figures 4.3a, b and c show examples of chromosome painting. Figure 4.3a shows digoxigenin-labelled cosmid 2A7 (red dots) on chromosome 8, cohybridised with a chromosome paint specific to chromosomes 8 and 11. Figure 4.3b shows similarly labelled cosmid 1B10 on chromosome 7, cohybridised with a chromosome paint specific to chromosome 7. Figure 4.3c shows the chromosome 30 paint together with labelled cosmid probe 1F11; this dual hybridisation was used to identify the chromosome to which 1F11 hybridised.

[Figure 4.4: panels a and b]


Figure 4.4a shows dual hybridisation of cosmids 1E3 and 2A6 to a similar location on canine chromosome 18. The cosmids were labelled separately in red and green and are seen as a single yellow signal (yellow arrows) because they lie close together on the chromosome. Figure 4.4b shows the DAPI-banded chromosomes to which the labelled cosmids hybridised (chromosome 18), indicated by the blue arrows.

[Figure 4.5a: agarose gel, and Figure 4.5b: Southern blot, of cosmid 2A11 restriction digests; size markers from 506/517 bp to 12216 bp]

Figures 4.5a and b show the gel and corresponding Southern blot of cosmid 2A11 digested with various restriction enzymes, to identify enzymes producing (CA)n repeat-containing fragments between 2 kb and 10 kb in size. The blot shows three enzyme digests (BamHI, PstI and SmaI) giving fragments within the required size range.

[Figure 4.6: panels a and b]

Figure 4.6a shows hybridisation of the 4 kilobase PstI fragment of cosmid 2A11 containing the polymorphic (CA)n repeat. Yellow arrows indicate the chromosomes to which the fragment hybridised. Figure 4.6b shows the DAPI banding of the chromosome spread in Figure 4.6a. The chromosomes to which the fragment hybridised are indicated by the blue arrows in Figures 4.6b and 4.6c. Figure 4.6c is an enlarged portion of Figure 4.6b, showing the banding pattern of these chromosomes and that the fragment hybridised to the same chromosome as seen for 2A11 in Figure 4.2.

4.3.4 Cosmid 1E7 and the pseudoautosomal region

1E7 was identified as hybridising to the X and Y chromosomes. The hybridisation position on the X chromosome was at the extreme end of the p-arm. The cosmid insert also hybridised to the Y chromosome, suggesting that it might lie in the pseudoautosomal region (Burgoyne 1982) of the canine sex chromosomes. Nothing is known of this region in the dog, so its characteristics were assumed to be similar to those of the human pseudoautosomal region, about which much more is known. In humans, this is the region in which pairing of the X and Y chromosomes takes place during meiosis (Burgoyne 1982). The region shows a gradient of sex-linkage (Rouyer et al. 1986a): the extent of sex-linked inheritance of a marker within the pseudoautosomal region depends on its position within that region. There is a distinct border between the pseudoautosomal region and the sex-specific regions of the X and Y chromosomes (Ellis et al. 1989). The nearer a marker is to the border, the more likely it is to be inherited in a sex-linked fashion; conversely, the further it is from the border, and thus the nearer to the telomere, the more likely it is to be inherited in an autosomal fashion. In humans, there is one obligatory crossover within the pseudoautosomal region at male meiosis (Ellis et al. 1989)(Cooke et al. 1985), which explains this variation in inheritance.
This fact was used to investigate microsatellite 1E7 further. If the marker was pseudoautosomal, it would be inherited either with some degree of sex-linkage or in a non-sex-linked fashion, depending on its position.
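The gradient of sex-linkage can be illustrated with a deliberately simplified model, sketched below under stated assumptions: exactly one crossover per male meiosis, placed uniformly along the pseudoautosomal region (PAR), as assumed for humans above. Real crossover positions are not uniform, and nothing here is measured canine data. Because a crossover involves only two of the four chromatids, a marker at fractional distance x from the PAR boundary recombines with sex in x/2 of gametes, from 0 at the boundary to 0.5 (apparently autosomal) at the telomere. The function names are hypothetical.

```python
import random

def expected_theta(x):
    """Expected recombination fraction between a PAR marker and sex, assuming
    one crossover per male meiosis placed uniformly along the PAR.
    x is the marker's fractional distance from the PAR boundary
    (0 = boundary, 1 = telomere)."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie between 0 and 1")
    # The crossover separates marker from sex only if it falls between the
    # boundary and the marker (probability x); it involves two of the four
    # chromatids, so only half of the resulting gametes are recombinant.
    return x / 2

def simulate_theta(x, meioses=100_000, seed=1):
    """Monte Carlo check of expected_theta under the same assumptions."""
    rng = random.Random(seed)
    recombinant = 0
    for _ in range(meioses):
        between = rng.random() < x               # crossover between boundary and marker?
        chromatid_involved = rng.random() < 0.5  # sampled chromatid took part in it?
        if between and chromatid_involved:
            recombinant += 1
    return recombinant / meioses

def position_from_theta(theta):
    """Invert the model: fractional distance from the boundary implied by theta."""
    return min(1.0, 2 * theta)

# A recombination frequency near 48%, such as that reported for 1E7 below,
# places the marker close to the telomere under this model.
print(round(position_from_theta(13 / 27), 3))  # 0.963
```

Under these assumptions a marker showing almost no recombination with sex would lie near the PAR boundary, while one recombining in roughly half of meioses would lie near the telomere.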

The pseudoautosomal position of 1E7 was tested using three-generation families of dogs. Of the six three-generation families obtained (the kind gifts of Nigel Holmes at the Animal Health Trust and Gus Aguirre at Cornell University), only four litters in three families were found to be informative for the pseudoautosomal microsatellite (Figure 4.7). The alleles are numbered according to their size difference in base pairs from the plasmid control. In Irish Setter Family 1, the genotype of the sire (II2) was +2,-4. His sire was of genotype +4,+2 and his mother -4,-4. The sire inherited the -4 allele from his mother, so the 'male' allele, inherited on the Y chromosome from his sire, was +2. If the locus is sex-linked, all male offspring and no female offspring should inherit this allele. All offspring in litter IIIa were female, but five (individuals 2, 4, 7, 8 and 9) inherited the +2 allele from the sire's Y chromosome; these are the recombinants. In Irish Setter Family 2, the allele on the Y chromosome of the sire (II1) is the -2 allele, as his mother's genotype was +2,-4; his father's genotype was not available. In this case, recombinant offspring will be females with the -2 allele and males without the -2 allele. As can be seen in Figure 4.7, individuals III3, 4, 5 and 6 are all male, but none has the allele from the Y chromosome of the sire, so all are recombinant.

[Figure 4.7: pedigrees of Irish Setter Family 1 and Irish Setter Family 2 (Animal Health Trust) and Family D1 (Cornell University), with the 1E7 genotype of each individual]

Figure 4.7 The diagrams show three-generation families; squares denote males and circles denote females. The + and - numbers represent the different alleles of the microsatellite, and the other numbers identify the individuals. The + or - numbers in bold represent the allele originally on the Y chromosome of the father, and R indicates the recombinant offspring. A dash (-) indicates individuals for which no PCR product was produced, and ? indicates individuals for which recombination could not be determined. I indicates the first ('grandparent') generation, II the second ('parent') generation, and IIIa or IIIb the final generation, from which results were taken. Family D1 was donated by Gus Aguirre and the Irish Setter families by Nigel Holmes, as described in the text.

The sire in Family D1 (II2) has a +2,-4 genotype; the allele inherited from his sire, and therefore on his Y chromosome, is the +2 allele. The -4 allele is not present in his sire's genotype and must therefore have been inherited from his mother, whose genotype was unavailable. This sire fathered two litters, IIIa and IIIb, with different dams (II1 and II3 respectively), but not all offspring from these litters yielded DNA that could be amplified by PCR (denoted by - in Figure 4.7). If the locus is sex-linked, male offspring should inherit the +2 allele and female offspring the -4 allele. Two results were obtained for litter IIIa: one male (IIIa3), which was nonrecombinant, and one female (IIIa4), which had inherited the sire's Y-chromosome allele and was therefore recombinant.

In litter IIIb of this family, three of a possible seven results were obtained, one of which was recombinant (IIIb5), the only male in this family to have inherited the allele from the X chromosome of the sire.

Litter IIIb of Irish Setter Family 1 was only partly informative, as its parents (II1 and II2) were heterozygous for the same alleles; the recombination status of the heterozygous offspring (IIIb1, 6 and 8), which could have inherited either allele from either parent, therefore could not be determined. The allele inherited from the sire's father was the +2 allele. The informative offspring were therefore the homozygotes: the recombinant females (IIIb4 and 5) inherited the +2 allele from both parents, the nonrecombinant females (IIIb2 and 3) inherited the -4 allele from both parents, and the nonrecombinant male (IIIb7) inherited the +2 allele from both parents. The two families which were not used were discontinued for the following reasons: in family D2, the sire was homozygous, so there was no way of determining which allele came from which of his parents, and therefore which allele his offspring would inherit; in family D3, both parents were homozygous for identical alleles. The results of the informative families are shown in Figure 4.7.

From a total of 27 results, 13 were found to be recombinant, giving a recombination frequency of 48%. Within the limits of this small sample, no sex-linkage is evident. The results do, however, support the FISH results, strongly suggesting that the 1E7 locus is situated in the pseudoautosomal region, most likely towards the telomeric ends of the X and Y chromosomes. A further marker within this region is required for more precise positioning of 1E7.

4.4 DISCUSSION

4.4.1 FISH results

Of the twelve cosmids which hybridised to the chromosome spreads, six were localised to chromosomes 1-10. Since the chromosomes are numbered in order of decreasing size, a larger proportion would be expected to hybridise to the larger chromosomes. The remaining cosmids hybridised throughout the rest of the chromosomes: three in the range 11-20, two in the range 21-30 and one on the X/Y chromosomes. The decreasing number in each group reflects the decreasing size of the chromosomes, and therefore the decreasing amount of chromosomal material. All the cosmids were found to hybridise towards the telomeric end of the chromosomes to which they were assigned. Given the small number of cosmids hybridised, this preference may be entirely due to chance. However, it has been suggested that (CA)n repeats may be preferentially located towards the telomeres of bovine chromosomes (Toldo et al. 1993); this may also be true of canine chromosomes, and will be known only when the many linkage groups have been physically located on chromosomes.

The hybridisation signals on each chromosome spread represented in Figure 4.1 were consistent and easily identified; each panel shows one representative spread to which the labelled cosmid was hybridised. All chromosomes were initially identified by their banding pattern and/or relative size, as described in the Results section.

4.4.2 Centromeric cosmid 2A11

Cosmids hybridising to centromeric sequences, as seen with 2A11, have been reported before in both dog (Fischer et al. 1996) and bovine chromosomes (Toldo et al. 1993). Sequence analysis of the corresponding microsatellite-containing subclone showed high homology to a human Mermaid sequence (Hoyle et al. 1996) (see Chapter 3 discussion, section 3.3.7, pg 95). By virtue of the Mermaid sequence's homology to MER12, the subclone could similarly lie within a highly repetitive LINE 1 element, subfamilies of which are found throughout mammalian genomes (Smit et al. 1995). The Mermaid sequence, to which 2A11 was homologous, was found throughout the human genome, although it was less common than MER12 (Hoyle et al. 1996). The Mermaid sequence was also localised to a single originating position in the human genome, chromosome 21q22.3 (Hoyle et al. 1996). Microsatellite 2A11 was localised to a single locus on chromosome 6q21, as indicated both by its specific physical mapping and by the specific PCR product produced by primers flanking the 2A11 repeat. The FISH characteristics of the cosmid suggested that the unique sequence was associated with a larger repetitive element found at some canine centromeres, possibly a LINE 1-like element, as discussed in Chapter 3, section 3.4.7, pg 103.

Canine-specific centromeric sequences have been reported (Fanning et al. 1988)(Modi et al. 1988)(Fanning 1989), but these hybridised to all centromeres rather than to a subset, as seen with 2A11. A cosmid containing canine DNA has previously been shown to hybridise to a subset of nine pairs of chromosomes (Fischer et al. 1996), so canine repeat DNA showing a degree of chromosome specificity has been observed before. This suggests that there are repeat sequences in the canine genome which are particular to some centromeres and not others. Given the sequence homology of 2A11 and its inferred connection to LINE 1 sequences, these repetitive sequences may form part of the satellite DNA of some canine centromeres. Once characterised, they could be used, together with chromosome-specific cosmids and paints, to identify some of the smaller chromosomes.

4.4.3 1E3/2A6 hybridisation to chromosome 18q21

The successful hybridisation of these two cosmids to the same area of chromosome 18 is interesting for two reasons.

Firstly, two cosmids physically mapped so close together may prove to be genetically as well as physically linked. Since the cosmids contain polymorphic microsatellites intended for use in genetic linkage analysis, any genetic linkage between the two microsatellites would support the physical mapping, each method confirming the other. This was found to be the case when linkage analysis was carried out: microsatellites 2A6 and 1E3 were found to be closely linked, with a high LOD score of 13.99 at a recombination fraction of 0.01 (Chapter 5, Table 5.3, pg 153).

Secondly, sequence alignment has shown 1E3 to be highly homologous to the 3' UTR of the IGF II gene in sheep and cattle, as described in Chapter 3, pg 99. Subject to confirmation by further sequencing of the gene itself, the physical mapping of the cosmid containing this sequence to chromosome 18q21 suggests that this area of the canine chromosome may show synteny with the chromosomal regions of other mammals to which IGF II has already been mapped.

IGF II is one of the suggested 'anchor loci' for comparative gene mapping (O'Brien and Marshall Graves 1991)(O'Brien et al. 1993) and has been physically mapped to human chromosome 11p15.5, mouse chromosome 7, bovine chromosome 25 (O'Brien et al. 1993), ovine chromosome 21q21-qter (Ansari et al. 1994 and references therein) and equine chromosome 12q13 (Raudsepp et al. 1997). Canine chromosome 18q21 may therefore be homologous to human chromosome 11p15.5, part of mouse chromosome 7, bovine chromosome 25, ovine chromosome 21q21-qter and equine chromosome 12q13. Other genes found linked to these regions in these species can then be tested for their presence in the corresponding canine chromosomal region. In horse, donkey, sheep and cattle, IGF II was found at a terminal location on the respective chromosomes (Raudsepp et al. 1997), whereas the putative canine IGF II locus is not terminal. Assuming that the gene homology can be confirmed by further sequencing, this non-terminal position may reflect a chromosome rearrangement particular to the dog, and further gene mapping in the dog would be needed to determine the nature of the difference.

Chromosome paints developed in humans could be used to assess the physical extent of homology of the region including IGF II between, for example, humans and dogs.

4.4.4 1E7 - the pseudoautosomal cosmid

1E7 was shown, both by physical mapping and by studies of sex-linked inheritance, to lie in the pseudoautosomal region of the X and Y chromosomes of the dog. The lack of evidence of sex-linkage of the microsatellite supports its presence in the pseudoautosomal region. The number of results used to assess linkage was too small to give any specific indication of the location of the marker within the region, although it is probably towards the telomeric end of the X and Y chromosomes. If the marker had been close to the border with the sex-specific regions of the X and Y chromosomes, it would have shown almost total sex-specific inheritance. If no recombinations had been seen among the 27 results obtained, further evidence would have been required to place the microsatellite in the pseudoautosomal region rather than the sex-specific region. A much larger sample would be required to obtain statistically significant results, together with the presence of recombinations, to prove true pseudoautosomal inheritance. By similar reasoning, a much larger sample would also have been necessary to detect any slight sex-linkage of the locus. Examples of loci on the borderline between sex-linkage and non-sex-linkage are the human pseudoautosomal gene MIC2 and the repeat element DXYZ2prox, which recombine with sex in only 2.5% and 2.2% of male meioses respectively (Rouyer et al. 1986b). Three families (four litters) of the six three-generation families provided were informative, and not all possible offspring genotypes from the informative families were obtained. Results from these families were nevertheless included in the analysis, as the genotype of one offspring is independent of that of another, so incomplete litters should not bias the overall recombination frequency. The few results obtained support the presence of 1E7 in the pseudoautosomal region. More precise positioning of 1E7 within the pseudoautosomal region of the sex chromosomes would require results from a significantly larger sample of informative three-generation families than shown here.

Microsatellite 1E7 has been mapped to the pseudoautosomal region of the dog, both physically and by sex-linkage analysis. It is thought to be the first polymorphic pseudoautosomal marker described in the dog. This cosmid has also been used to orientate the Y chromosome, allowing localisation of a male-specific microsatellite to the sub-telomeric end of the q-arm of the canine Y chromosome (Olivier et al., in preparation).

The pseudoautosomal region is of interest because there are differences between the human and mouse pseudoautosomal regions, both of which are better characterised. One gene which is X-linked in humans has been found to be pseudoautosomal in mice, and mice have shown evidence of double crossovers in this region (Signer et al. 1994), whereas humans have not. The characteristics of the canine pseudoautosomal region have yet to be elucidated, and more markers within this region would be needed to confirm the presence in the dog of a gradient of sex-linkage, which is a characteristic of both the mouse and human pseudoautosomal regions.

CHAPTER 5

GENETIC LINKAGE OF MICROSATELLITES

5.1 Summary

Polymorphism Information Content (PIC) values were calculated for the polymorphic microsatellites, and genetic linkage analysis was carried out using the microsatellites which were informative for the DogMap reference families. One marker had alleles whose sizes indicated that the polymorphism was due to the loss or gain of nucleotides other than whole repeat units, rather than to the assumed variation in repeat-unit number. Five spontaneous mutations were found in two different tetranucleotide repeats; four of these were found in the same microsatellite, revealing it to be exceptionally unstable. Apparent non-Mendelian inheritance was observed for one microsatellite, which was subsequently found to be due to non-amplification of one of the alleles. New primers were designed which amplified this allele, increasing the informativeness of the microsatellite for linkage analysis.

Eight linkage groups involving nine of the eleven microsatellites which were informative in the reference families were found, four of which extended already published linkage groups. None of the groups was ordered with the certainty required for genetic mapping, although two, a group of four and a group of five markers, were ordered with likelihoods as high as those of already published groups. The remaining groups were as follows: two groups of four markers, two groups of three and two groups of two markers. The groups containing two and three markers were novel linkage groups.

5.2 Introduction

The production of a genetic linkage map of the canine genome will form the backbone of genetic research in the dog, and will enable identification of carriers of the many inherited diseases of the dog. Canine diseases with human equivalents could be studied in the dog, which could serve as a model for the human disorder and for testing new therapies before their use in humans. The diverse physical and behavioural traits of the dog could also be investigated.

A linkage map is a map based on polymorphic markers, most commonly microsatellites. Microsatellites fulfil almost all the properties of a good linkage marker, in that they are numerous, polymorphic and randomly distributed throughout the genome (Tautz 1989)(Weber and May 1989)(Weber 1990). The linkage mapping of these and other polymorphic markers is carried out by genotyping them across reference families. Given sufficient numbers of markers typed across the reference families, linkage groups will become apparent. These consist of two or more markers whose alleles tend to segregate together across the reference families. Markers which usually segregate together are likely to be genetically close, undergoing few, if any, of the recombination events that would separate them. As the size of each linkage group increases, the markers within the group can be ordered, and the order becomes more certain as further markers are added. Eventually, the number of linkage groups should match the number of chromosomes of the animal under study; in the dog, a total of 40 linkage groups, corresponding to the 38 autosomes and the X and Y chromosomes, would be expected. A genetic map of sufficient density to enable the identification of monogenic diseases has already been produced for both the human and the mouse (Dietrich et al. 1996)(Dib et al. 1996). The canine genetic map is developing: at present there are sixteen known linkage groups in the dog (Lingaas et al. 1997), created from a pool of around one hundred genetic markers, including some polymorphic gene markers, using the DogMap set of reference families. A further thirty linkage groups including one hundred and fifty microsatellite markers have been identified using a different three-generation reference family (Mellersh et al. 1997).
Once the two sets of markers are combined, the canine linkage map will be greatly improved.

5.3 Results

5.3.1 Optimisation of PCR conditions using radioactively labelled primers and sequencing gels

Radioactive labelling was used to analyse the reference families because the PCR products could then be run on sequencing gels, allowing more samples to be run on each gel, with the added advantages of better separation of individual alleles and more accurate sizing.

The optimal PCR conditions for each primer pair were found by adjusting the annealing temperatures and by varying the number of cycles at the final (lowest) temperature during touchdown PCR (Don et al. 1991) (see Methods, section 2.4.2, pg 68). The forward primer in each case was end-labelled with [γ-33P]ATP.
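As a rough illustration of how the touchdown programmes in Table 5.1 translate into cycles, the sketch below expands one programme into a per-cycle list of annealing temperatures. It is a hypothetical helper (`touchdown_schedule` is not part of any PCR software) and it assumes, from the general conditions given with Table 5.1, that 2 cycles are run at each step from the starting temperature down to, but not including, the final touchdown temperature, followed by the stated number of cycles at the final temperature.

```python
def touchdown_schedule(start_c, end_c, step_c, final_cycles, cycles_per_step=2):
    """Return the annealing temperature (degrees C) used in each PCR cycle.

    Assumption: the descent visits start_c, start_c - step_c, ... while the
    temperature is still above end_c, running cycles_per_step cycles at each
    step; final_cycles cycles are then run at end_c itself.
    """
    if step_c <= 0 or start_c < end_c:
        raise ValueError("expected start_c >= end_c and a positive step")
    temps = []
    t = start_c
    while t > end_c:
        temps.extend([t] * cycles_per_step)
        t -= step_c
    temps.extend([end_c] * final_cycles)
    return temps


# Two programmes transcribed from Table 5.1: (start, end, step, final cycles).
programmes = {
    "1B7": (72, 63, 1, 17),
    "1F5": (72, 54, 2, 20),
}
for name, args in programmes.items():
    cycles = touchdown_schedule(*args)
    print(name, "total cycles:", len(cycles), "first/last:", cycles[0], cycles[-1])
```

Under this reading the 1B7 programme amounts to 35 cycles in total and the 1F5 programme to 38; if the final temperature were also counted as a descent step, each total would rise by 2.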

Table 5.1 shows the PCR conditions and primer sequences of twelve microsatellites. The primers are also shown as the underlined sequences in General Appendix (i), pg 191, which gives the sequence of each microsatellite and its flanking DNA. Once the PCR conditions had been optimised for each primer pair, the polymorphism of each microsatellite was assessed over 12 dogs of different breeds (as used in Chapter 3). Table 5.2 shows the Polymorphism Information Content (PIC) value of each microsatellite over the 12 dogs. The PIC value (Botstein et al. 1980) reflects the number of alleles of a marker and their frequency distribution in a population, and is calculated as:

$$\mathrm{PIC} = 1 - \sum_{i=1}^{n} p_i^2 - \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} 2\,p_i^2\,p_j^2$$

where p_i is the population frequency of the i-th allele and n is the number of alleles (Ott 1991).

A PIC value of more than 0.5 is considered useful for linkage analysis (Botstein et al. 1980), as it generally indicates a marker with sufficiently numerous and evenly distributed alleles to be informative.
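The Botstein et al. (1980) PIC formula can be sketched directly in code. This is not the HGMP program used for Table 5.2; it is a minimal illustration that assumes allele frequencies are estimated by simple allele counting, and the genotypes in the example are hypothetical rather than data from this study.

```python
from collections import Counter


def pic(freqs):
    """Polymorphism information content (Botstein et al. 1980).

    freqs: allele frequencies p_1..p_n, which must sum to 1.
    """
    p = list(freqs)
    if abs(sum(p) - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")
    homozygosity = sum(x * x for x in p)
    pair_term = sum(
        2 * p[i] ** 2 * p[j] ** 2
        for i in range(len(p))
        for j in range(i + 1, len(p))
    )
    return 1.0 - homozygosity - pair_term


def allele_frequencies(genotypes):
    """Estimate allele frequencies by counting alleles in diploid genotypes."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return [c / total for c in counts.values()]


# Hypothetical genotypes for 12 dogs at one locus (not data from this study).
dogs = [("a", "b"), ("a", "c"), ("b", "c"), ("a", "d"), ("c", "d"), ("a", "a"),
        ("b", "d"), ("c", "c"), ("a", "b"), ("d", "d"), ("b", "c"), ("a", "d")]
value = pic(allele_frequencies(dogs))
print(f"PIC = {value:.2f}", "(above 0.5)" if value > 0.5 else "(0.5 or below)")
```

With only 12 dogs, each frequency rests on 24 allele copies, so such PIC estimates are approximate.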

The 2A7 microsatellite could not be optimised, despite numerous attempts using the variations described above. For example, increasing the touchdown temperature was not effective (Figure 5.1).

Table 5.1

Microsatellite  Primer sequences (5' to 3')       PCR conditions  °C interval  No. of cycles
1B7             F: GCAATCCCACCCCAAAGA             72°C-63°C       1            17
                R: CTGTTGCTATGATCATGGGTG
1B10*           F: TTGTCTTTAGACTCAATAGTAGC        72°C-63°C       1            15
                R: CCCAACGATCCGTTCTTTGAC
1D6             F: GCTCCATCCACATCATTGC            72°C-66°C       1            14
                R: AGCAGCATCGACAATAGCC
1E3             F: GAGGAGATCCTAGTAACACC           72°C-63°C       1            17
                R: GCTGGTGTTCCTAAAAAGCC
1E7             F: CGTGCCCTCTGCAAAGTG             70°C            -            22
                R: CTGACCGCATGACTTATCC
1E12            F: GGAAATGCATGTAGATACATGTG        70°C-61°C       1            15
                R: AGAATGGGAATTGGAAGTCAGG
1F5             F: AGCTCGTTCAGAGTCACGG            72°C-54°C       2            20
                R: TCTCAGGGATGTTGTCTCTG
1F11            F: TTTCAGAGTTCTGATGCTCC           72°C-54°C       2            12
                R: GTCAGTTCAATTAGCCAGAG
2A6             F: ATGTCTTTATTCATAATGAGGG         67°C-58°C       1            20
                R: AAGGAGAAGGTGACCATCC
2A11            F: CATAGCGACATCACATAAGT           72°C-54°C       2            15
                R: GGGCTGATAATTCAGTGAGA
2D2             F: CGAGACCACATTGGGCTCC            72°C-63°C       1            15
                R: CCAGACGCTCTCCATGCCTC
2H12            F: CCATCCAGGTGAGTCCAGG            66°C-50°C       2            12
                R: CCACAATCTGGGCTTTAACG

The general PCR conditions were: initial denaturation for 4 minutes at 94°C, followed by 2 cycles at each of the touchdown temperatures with the following parameters: 1 minute at 94°C, 1 minute at the relevant annealing temperature and a 1 minute elongation at 72°C. Once the final touchdown temperature was reached, the stated number of cycles was carried out with the same parameters, followed by a final elongation at 72°C for 6 minutes and a hold at 6°C. * indicates the second set of primers, which revealed that the initial primer set was unable to amplify at least one allele. F: forward primer sequence; R: reverse primer sequence.

Table 5.2

Microsatellite  Allele size (bp)  PIC value over 12 dogs  No. of reference family alleles  No. of heterozygote parents
1B7             198               0.61                    9                                15
1B10            119               0.70                    3                                8
1D6             172               0.55                    4                                15
1E3             220               0.67                    5                                7
1E7             128               0.57                    7                                17
1E12            152               0.35                    2                                9
1F5             154               0.70                    1                                0
1F11            175               0.77                    4                                18
2A6             117               0.59                    4                                14
2A11            184               0.70                    4                                15
2D2             390               0.79                    10                               20
2H12            388               0.64                    6                                16

PIC values are given to 2 decimal places and were calculated using the HGMP linkage analysis PIC value program via the Internet. The second column from the right shows the number of alleles present over the 23 parents of the reference families, and the far right-hand column shows how many of the parents are heterozygotes. The second column from the left shows the product size of the plasmid subclone, used as the positive control.

[Figure 5.1a and Figure 5.1b]

Figures 5.1a and b show PCR products produced at touchdown temperatures of 64°C and 66°C respectively. +ve = plasmid subclone 2A7, -ve = no-DNA control, G = canine genomic DNA, M = 1 kb marker. PCR conditions were touchdown starting at 72°C, reducing by 1°C every 2 cycles to 64°C or 66°C, followed by 15 cycles at these touchdown temperatures.

Raising the touchdown temperature from 64°C (Figure 5.1a) to 66°C (Figure 5.1b) led to a loss of PCR product from the positive control (the plasmid subclone), while spurious banding was still produced from genomic DNA. Neither decreasing nor increasing the cycle number improved specificity, and nor did designing alternative primers. The plasmid insert was too small to design further sets of primers, so the microsatellite was abandoned. A later database search on the sequence of the microsatellite and its surrounding DNA revealed 92% identity between part of the sequence and a Canis familiaris repetitive DNA sequence (EMBL accession no. X57357) (Minnick et al. 1992), as mentioned in Chapter 3, section 3.3.7, pg 95. The BLAST search result is shown below (Query is the submitted sequence, Sbjct is the matching sequence):

emb|X57357|CFREPDNA Canis familiaris (dog) repetitive DNA sequence Length = 130

Plus Strand HSPs:

Score = 229 (63.3 bits), Expect = 2.5e-11, P = 2.5e-11
Identities = 49/53 (92%), Positives = 49/53 (92%), Strand = Plus / Plus

Query:  1 GAGTCCCACGTCAGGCTCCCGGTGCATGGAGCCTGCTTCTCCCTCTGCCTATG 53
          || |||||| || ||||||||||||||||||||||||||||||||||||| ||
Sbjct: 70 GAATCCCACATCGGGCTCCCGGTGCATGGAGCCTGCTTCTCCCTCTGCCTGTG 122

The sequence below is part of the 2A7F sequence from General Appendix (i); the arrow indicates the start of the homology with the sequence shown above:

NANNCTTNTT NNTTGGACCC CCGCGGTGGC GGCCGCTCTA GAACTAGTGG ..A A TC Q ^TC C C eeCGGTGCAT GGAGCCTGCT TCTCCCTCTG

CCTATGTCTC TGCCTCTCTC TCTCTCTGTG TGACTATCAT NATTNNNTTC

AAATTANTAA AATCNGAANA GAAAGAAGAA AGAAGAAAGA AAGAAAGAAA

The two forward primers were designed as indicated by the underlined and the outlined sequences; the italicised sequence is plasmid sequence and the bold letters indicate part of the tetranucleotide repeat. A further reverse primer could not be designed, because there was no suitable unique sequence before the end of the insert DNA. The underlined primer sequence is identical to part of the canine SINE sequence, and the second primer differs from it by only 3 base-pairs towards its 5' end. The 130 bp canSINE sequence is dispersed throughout the genome (Minnick et al. 1992)(Coltman and Wright 1994), sometimes as part of larger repeat sequences. The near-identity of both primers to the SINE could explain why spurious banding was consistently produced during PCR. Re-subcloning the microsatellite in order to design primers outside the canSINE would have given a product too large for allele-size differences to be determined accurately, as the product obtained with the above primers was already almost 400 base-pairs.

5.3.2 Genotyping the reference families

Once the polymorphic properties of the microsatellites had been determined, their alleles were studied in the two-generation reference panel of the international DogMap collaboration (Lingaas et al. 1997). This panel consists of eight families, one of German Shepherd dogs and the rest of Beagles; Figure 5.2 shows the pedigrees of the families. The parents of each family were assessed for their informativeness at each microsatellite locus. Where the parents' alleles would not produce offspring with informative allele combinations, the offspring were not typed. Non-informative parents were those whose alleles would give no detectable variation among their offspring at that locus; examples are given later. Table 5.2 indicates the potential usefulness of the microsatellites within the reference families: microsatellites with higher PIC values were generally more informative within the reference families. One exception was 1F5, which had a high PIC value but was totally uninformative within the reference families.

Figure 5.3 shows the alleles present in the parents of the reference families for each of the twelve microsatellites. The actual allele sizes of the parents can be seen in Appendix (i) of this chapter.

Figure 5.3a shows 1B7, one of the most polymorphic microsatellites in the reference families, with a total of nine alleles. Figure 5.3b shows the four alleles of 1D6. The bands of individuals 24, 52 and 122 show some distortion due to the presence of urea in the wells; its effects include slight retardation of the samples, so that the bands curve upwards, smearing of the lane and narrowing of the bands, as seen with the samples of these individuals. Stutter bands are also present in the results for this microsatellite. They are usually found in multiples of two base-pairs below the main bands, and can cause confusion in allele calling. In 1D6, the stutter bands are of lower intensity than the allele bands, and once

[Figure 5.2: pedigrees of the German Shepherd Dog family (Family 1) and the Beagle families (Families 2 to 8)]

Figure 5.2 shows the DogMap reference panel for typing polymorphic markers. It consists of 8 families (129 individuals). Family 1, a pure-bred German Shepherd family, is the largest, with 35 offspring; Families 2 to 8 are Beagle families with a total of 71 offspring. Each individual has a number for ease of identification; males are represented by squares, females by circles.

Figure 5.3 shows the results of PCR of each parent of the reference families with each microsatellite. The numbers shown above each picture are the identification numbers of the parents, as shown between the figures on each page. PF1 represents the Family 1 parents, and so on. Lines joining parents indicate that offspring were produced from those individuals. The letters C, A, G and T are sequencing lanes of the KS+ plasmid and act as a base-pair marker for the alleles. The lower-case letters indicate the genotypes of the parents. + = the plasmid subclone used as a positive control, - = no-DNA control. Figures 5.3c, d, e and f are dye-sublimation prints, produced to give maximum clarity of results. Figures 5.3g, j, k and l are photographs of X-ray films rather than scans, because the results were clearest when photographed.

[Figure 5.3 panels: a, 1B7; b, 1D6; c, 1E7; d, 1E12; e, 1F11; f, 2A6; g, 2A11; h, 2D2; i, 2H12; j, 1B10; k, 1E3; l, 1F5. Parents shown in each panel: 1, 2, 8, 13, 24, 33, 37, 43, 44, 52, 53, 61, 67, 68, 72, 73, 87, 88, 98, 106, 107, 122, 123 (PF1 to PF8 = parents of Families 1 to 8).]

Stutter bands can also be seen in Figure 5.3d, but all are less intense than the allele bands, of which there were only two, ten base-pairs apart. Figure 5.3e, the 1F11 microsatellite, also shows some stutter bands, but these do not impede allele identification when the relative intensities of allele-plus-stutter band and allele alone are taken into account, as seen in the samples of individuals 98 and 122, both heterozygotes with more intense lower allele bands. The results for 2A6 in Figure 5.3f are also easily interpreted in this way; note the samples of parents 53, 61 and 67, which are heterozygous, homozygous and heterozygous respectively. Microsatellites 2A11 (Figure 5.3g) and 1B10 (Figure 5.3j) gave similar stutter band problems, also resolved as described above, although the background of the 1B10 samples made the results more difficult to interpret.

The parental results at the 1B10 locus shown in the figure were amplified with a set of primers which gave less clear results and failed to amplify one allele in individuals 53 and 122 (see Figure 5.6 for the result of individual 122). The primer set eventually used across the families revealed this allele, and is discussed later. Individual 61 in Figure 5.3i is retarded by the presence of urea in the well, and the gel itself ran unevenly, as can be seen from the 'e' alleles of the Family 1 parents. Stutter bands were a particular problem for microsatellite 1E3 (Figure 5.3k), and the results were seen most clearly by photographing the gel. Several runs were carried out to determine the bands of each individual, made difficult by the unusual presence of a band two base-pairs above the actual allele, sometimes as intense as the allele band itself, as with individuals 1, 2, 13 and 67. This phenomenon has been observed before in the PCR of microsatellites (Holmes et al. 1993). The results shown for the parents in Figure 5.3k are the clearest obtained over all the parents for the 1E3 microsatellite.

The informativeness of the microsatellites across the parents varied. Examples of non-informative families or parts of families include the parents of offspring 14-23 in Family 1, namely 13 and 8. They are non-informative for microsatellite 1B7 (Figure 5.3a) because they are both homozygous, for different alleles. All their offspring will be heterozygous for the same two alleles, and because each parent carries two copies of the same allele, there is no way of identifying which of a parent's two chromosomes has been inherited. A further example is microsatellite 1E12, for which all the parents of Family 1 are homozygous for the same allele (Figure 5.3d); these parents were therefore not used for linkage analysis.

Microsatellite 1F5 was found to be homozygous for the same allele in all parents of the reference panel (Figure 5.3l), despite having a PIC value of 0.7 when assessed over 12 dogs of different breeds. Stutter bands were present, most noticeably two base-pairs below the allele band, but in each case they were less intense than the allele band itself. The uneven running of the bands of some individuals, especially in Family 1, was due to the presence of urea in the wells. This microsatellite was therefore not analysed further.

[Figure 5.4 panels: a, 1F11, Family 1; b, 1E7, Family 2; c, 1F11, Family 3; d, 1E3, Family 6]

Figure 5.4 shows examples of results obtained from genotyping the reference families. The diagrams above each picture show the family structure; circles represent females and squares represent males. The numbers in white are the numbers given to each individual in the families, and the offspring of each family are shown in order of increasing number. C, A, G and T = base-pair marker of KS+ plasmid, + = plasmid control, - = no-DNA control. b, c, d, e and f = alleles of the microsatellite as named in Figure 5.3.

Where parents shared two alleles, one could be homozygous, as seen with microsatellite 2D2 in Family 4 (Figure 5.3h), or both could be heterozygous for the two alleles, as seen in Family 2 with microsatellite 2A11 (Figure 5.3g). Where the parents carry three alleles between them, one may be heterozygous and the other homozygous for a different allele, as shown in Family 2 with microsatellite 2H12 (Figure 5.3i). Alternatively, both parents may be heterozygous with one shared allele, as in Family 2 for 1E7 (Figure 5.3c for the parents alone and Figure 5.4b for the family). The most informative parents of all are those with 4 different alleles between them, as seen with the parents of part of Family 6 (dogs 87 and 88) with microsatellite 1E7 (Figure 5.3c), and with parents 8 and 13 in Family 1 for microsatellite 1F11 (Figure 5.4a). The presence of four different alleles in two parents was not common.
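The parental scenarios above can be compared by asking how many distinguishable offspring genotypes a mating can produce. The sketch below is a simplified illustration, not the procedure used in this study: the function names are hypothetical, and it only counts distinguishable offspring genotypes for a single sire x dam mating, ignoring phase and information from other markers.

```python
from itertools import product


def offspring_genotypes(sire, dam):
    """Group the four possible transmissions by the offspring genotype they give.

    Returns a dict mapping an unordered genotype (sorted tuple) to the set of
    (sire allele index, dam allele index) transmissions that produce it.
    """
    outcomes = {}
    for i, j in product(range(2), range(2)):
        genotype = tuple(sorted((sire[i], dam[j])))
        outcomes.setdefault(genotype, set()).add((i, j))
    return outcomes


def classify_mating(sire, dam):
    """Rough informativeness class for one marker in one sire x dam mating."""
    n = len(offspring_genotypes(sire, dam))
    if n == 1:
        return "non-informative"        # every offspring has the same genotype
    if n == 4:
        return "fully informative"      # each transmission can be told apart
    return "partially informative"


examples = {
    "both homozygous, different alleles": (("g", "g"), ("i", "i")),
    "one homozygous, two alleles shared": (("a", "a"), ("a", "b")),
    "both heterozygous, same two alleles": (("a", "b"), ("a", "b")),
    "four different alleles": (("a", "b"), ("c", "d")),
}
for label, (sire, dam) in examples.items():
    print(f"{label}: {classify_mating(sire, dam)}")
```

Under this crude criterion, an a/b x a/c mating (both parents heterozygous with one shared allele) also gives four distinguishable offspring genotypes.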

Once the informativeness of the parents had been established, the offspring of the informative parents were genotyped. Representative examples of results from genotyping the reference families are shown in Figure 5.4. Samples were arranged on the gels to allow clear interpretation of results, especially where families included more than two parents, such as Family 3 (Figure 5.4c) and Family 6 (Figure 5.4d). The best example is Family 1 (Figure 5.4a), which has a total of seven parents; most litters were sired by a single dog, individual 8. This dog sired four litters and appears twice on the Family 1 gels for easier interpretation of the results. All results from genotyping the reference families with each of the 11 informative microsatellites can be found in Appendix (i) of this chapter.

Figure 5.5 gives two examples of atypical results found while typing the reference families with the microsatellites. Figure 5.5a shows the alleles of a tetranucleotide repeat (2D2) in Family 7, where the two alleles of parent 106 are only 1 base-pair apart (also shown in Figure 5.3h). In both figures, the parental result appears as a single, thick band rather than two separate bands. The two alleles were revealed in the offspring, each of which inherited one or the other allele of dog 106; separated in the offspring, the alleles were clearly different, identifying the parent as a heterozygote. A size difference this small is unlikely to be due to the loss or gain of a repeat unit, and more likely reflects the loss or gain of a single nucleotide, possibly outside the repeat sequence itself. Individual 110 in Figure 5.5a carries the 'i' allele, although it looks like a different allele one base-pair smaller. This was due to retardation of the sample by urea in the well: both alleles of this dog are narrower than those of the other individuals, one characteristic of urea in the wells, as described earlier and illustrated in Figure 5.3b for dogs 24, 52 and 122.

An apparent spontaneous mutation in individual 126 is indicated by the arrow in Figure 5.5b. The mutation arose by either the loss or the gain of a repeat unit from parent 122, which has alleles one repeat unit above and one below the allele shown in its offspring. This microsatellite (2H12) showed four spontaneous mutations within the reference family offspring, indicated by italics in Appendix (i). Microsatellite 2H12 was therefore particularly unstable, making it unreliable for uses such as parentage testing and less useful for linkage analysis, as results involving spontaneous mutations have to be ignored. One further mutation was found with microsatellite 1B7, in individual 57 of Family 3. It was in the allele inherited from the sire of the litter (individual 52); the mutation was four base-pairs in size, either a deletion or an insertion depending on which allele was inherited from the sire.

[Figure 5.5a: 2D2, Family 7; Figure 5.5b: 2H12, Family 8]

Figures 5.5a and b show unusual results found while genotyping the reference families. Figure 5.5a shows Family 7 typed with 2D2, a GAAA repeat microsatellite; the sire of the family was heterozygous for alleles one base-pair apart rather than the expected four base-pairs. Alleles are indicated by the letters at the sides of the figures. Figure 5.5b shows one offspring of Family 8, typed with 2H12 (also a GAAA repeat), which carries a spontaneous mutation; the individual is indicated by the arrow and by the grey shaded box. Individuals are identified by the numbers above the pictures.

5.3.3 Allele non-amplification of microsatellite 1B10

The initial set of primers designed for microsatellite 1B10 is indicated in italics on the sequence in Figure 5.6. Following observations of apparently impossible allelotypes among the offspring of Family 3, the possibility of a mutation in the primer sequences was investigated. Individual 53, the dam of the family, was the dog in which allele non-amplification was apparent, since she presented as a homozygote. Designing a further primer pair and re-amplifying DNA from each of the parents of the reference families revealed a previously undetected allele, fourteen base-pairs larger than the plasmid control, in parents 53 and 122. Detection of this allele explained the unusual inheritance pattern in Family 3, and also showed that Family 8 became informative with the new primer pair. The allele revealed in individual 122 is shown in Figure 5.6, along with its inheritance in some of the offspring of Family 8; it was clearly visible only with the new primer set, as indicated by the upper arrow. Of the combinations of new and old primers, only the new reverse primer with the old forward primer revealed the allele (Figure 5.6, lower arrow). Because the allele was visible both with the new primer set and with the new reverse primer combined with the original forward primer, there was probably a mutation of some kind at the 3' end of the old reverse primer sequence. Given the much larger size of the allele revealed by the new primers, the mutation may be a deletion or insertion, although this can be confirmed only by sequencing the allele itself.
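A simple Mendelian consistency check, of the kind that flags the apparently impossible allelotypes described above, can be sketched as follows. It is an illustration with hypothetical genotypes, not the procedure or data used here; the function names and the null-allele label "n" are assumptions. Such a check cannot by itself distinguish a non-amplifying allele from a spontaneous mutation or a scoring error.

```python
def consistent(offspring, sire, dam):
    """True if the offspring can have received one allele from each parent."""
    a, b = offspring
    return (a in sire and b in dam) or (b in sire and a in dam)


def flag_inconsistent(sire, dam, offspring_by_id):
    """Return the IDs of offspring whose genotype breaks Mendelian inheritance."""
    return [oid for oid, g in offspring_by_id.items() if not consistent(g, sire, dam)]


# Hypothetical null allele "n" that fails to amplify: a b/n dam is scored as
# b/b and an a/n offspring is scored as a/a.
sire = ("a", "c")
scored_dam = ("b", "b")
scored_offspring = {"pup1": ("a", "b"), "pup2": ("a", "a"), "pup3": ("b", "c")}
print(flag_inconsistent(sire, scored_dam, scored_offspring))  # ['pup2']

# Re-typing with primers that amplify the missing allele removes the conflict.
true_dam = ("b", "n")
true_offspring = {"pup1": ("a", "b"), "pup2": ("a", "n"), "pup3": ("b", "c")}
print(flag_inconsistent(sire, true_dam, true_offspring))  # []
```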

All family results previously gained for 1B10 with the initial primer set were rechecked to ensure the same results were produced with the new set of primers and the results were recorded as seen in the results table in Appendix (i) of this chapter.

[Figure 5.6: gel lanes labelled New, New F, New R and Old]

Part-sequence of 1B10:
TTTGCAATGC ATTTTTGTGT CTAACTCCTT CTTGTCTTTA GACTCAATAG
TAGCTCCCAA GTGAATCACC AAATTTGGCA AAATTGATAT TCAAACATGG
TGTGTGTGTG TGTGTGTGTG TGTGTGTGTG TGTGTTTNCA CATGGTGTCA
AAGAACGGAT CGTTGGGAAC CGGANCTGAA

Figure 5.6 shows a previously undetected allele of the 1B10 microsatellite, indicated by the upper white arrow on the left of the picture. The lower arrow indicates the allele revealed using the new reverse and old forward primers. Individual 122 is the sire of Family 8, in which the allele was revealed when alternative primers were designed; 123 is the dam of the family, and 124 and 129 are two of the offspring, the former carrying the previously undetected allele. C, A, G and T = base-pair marker lanes (KS+), + = plasmid control, - = no-DNA control. New = alternative primers; New F = alternative forward primer with original reverse primer; New R = alternative reverse primer with original forward primer; Old = original forward and reverse primers. Old primers are indicated by the italicised sequences below the picture, new primers by the underlined sequences.

5.3.4 Genetic linkage analysis of microsatellites

The results of all informative families were collated, and genetic linkage analysis was carried out to assess how many of the isolated polymorphic markers were genetically linked to other such markers analysed using the same reference families. Data were compiled using the Cyrillic 2.10 database (Cherwell Scientific Publishing Ltd.), and CRI-MAP (Lander and Green 1987) was used to assess linkage with other polymorphic markers submitted to the DogMap database (Lingaas et al. 1997). Two-point analyses were carried out on a total of 115 markers (including the 11 isolated here), giving pairs of markers linked to each other (for results, see Appendix). For each pair of linked markers, the recombination fraction and the Lod score were given. The recombination fraction, denoted θ, is the number of recombinants (k) as a proportion of the total number of informative meioses (n), so that θ = k/n, with values ranging from 0 to 0.5. The recombination fraction can be converted to genetic distance in Kosambi centiMorgans (cM) (Kosambi 1944) using the following formula:

$$x = \tfrac{1}{2}\tanh^{-1}(2\theta) = \tfrac{1}{4}\ln\frac{1+2\theta}{1-2\theta}$$

where x is the genetic distance in Morgans (multiplied by 100 to give centiMorgans).
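The Kosambi conversion is easy to apply in code. The sketch below implements the mapping function and its inverse for single recombination fractions; the function names are illustrative, and map distances produced by CRI-MAP from multipoint analysis need not equal these two-point conversions.

```python
import math


def kosambi_cm(theta):
    """Kosambi map distance in centiMorgans for recombination fraction theta."""
    if not 0.0 <= theta < 0.5:
        raise ValueError("theta must be in [0, 0.5)")
    morgans = 0.25 * math.log((1 + 2 * theta) / (1 - 2 * theta))
    return 100.0 * morgans


def kosambi_theta(cm):
    """Inverse mapping: recombination fraction for a Kosambi distance in cM."""
    return 0.5 * math.tanh(2 * cm / 100.0)


# Recombination fractions taken from Table 5.3.
for pair, theta in [("2D2-cxx.250", 0.07), ("1F11-CPH21", 0.15), ("2H12-LEI015", 0.22)]:
    print(f"{pair}: theta = {theta:.2f} -> {kosambi_cm(theta):.1f} cM")
```

For small θ the Kosambi distance in cM is close to 100θ (for example, θ = 0.1 gives about 10.1 cM), while larger fractions are stretched more.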

The Lod score, Z(θ), is the base-10 logarithm of the likelihood ratio comparing linkage at recombination fraction θ with non-linkage (θ = 0.5), and so measures how much more likely linkage between two markers is than non-linkage:

$$Z(\theta) = \log_{10}\frac{L(\theta)}{L(0.5)}$$

where L is the likelihood.

When the likelihood is based on k recombinants and n - k non-recombinants, the Lod score is given by:

$$Z(\theta) = n\log_{10}2 + k\log_{10}\theta + (n-k)\log_{10}(1-\theta), \qquad \theta > 0$$

((Ott 1991), pg 41).

When Z(θ) is equal to or greater than 3, the null hypothesis of free recombination is rejected, and when it falls below -2, the null hypothesis is accepted ((Ott 1991), pg 57). Values between the two limits are regarded as indicating indeterminate linkage. A Lod score of 3 indicates that linkage between two loci is one thousand times more likely than non-linkage, equivalent to a probability of less than or equal to 0.001.
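The two-point Lod score for phase-known data and the thresholds above can be sketched as follows. This is a minimal illustration with hypothetical counts, not CRI-MAP's likelihood computation (which also handles unknown phase and multiple families); the function names are assumptions. At θ = 0 the formula is evaluated with the convention 0^0 = 1, so it is finite only when there are no recombinants.

```python
import math


def lod(k, n, theta):
    """Two-point Lod score for k recombinants in n phase-known meioses (Ott 1991)."""
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if theta == 0.0:
        if k > 0:
            return float("-inf")  # any recombinant rules out theta = 0
        return n * math.log10(2)
    return n * math.log10(2) + k * math.log10(theta) + (n - k) * math.log10(1 - theta)


def max_lod(k, n):
    """Maximum-likelihood theta (k/n, capped at 0.5) and its Lod score."""
    theta_hat = min(k / n, 0.5)
    return theta_hat, lod(k, n, theta_hat)


def verdict(z):
    """Apply the thresholds quoted in the text."""
    if z >= 3.0:
        return "linked"
    if z < -2.0:
        return "linkage excluded"
    return "indeterminate"


# Hypothetical counts: 2 recombinants among 20 informative meioses.
theta_hat, z = max_lod(2, 20)
print(f"theta = {theta_hat:.2f}, Z = {z:.2f}, {verdict(z)}")
```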

The analysis assesses each marker against every other marker singly, producing pairs of markers with Lod scores above 2, although only linked markers with a Lod score of 3 or above were used in further analysis. Using this cut-off was important: although in many cases a value just below 3 would provide supporting evidence for markers within a linkage group, as with F66 and CPH20 (Lod score 2.95) and 2A6 and k292 (Lod score 2.95), such values can also be misleading, as with CPH20 and LEI030 (Lod score 2.84). These two markers are in different linkage groups (those of 1B10 and 2A11 respectively) and on different chromosomes (see Table 6.1).

Eight linkage groups involving the eleven submitted microsatellites were found: two with two markers, two with three, two with four and two with five. The groups, with their recombination fractions and Lod scores, are shown in Table 5.3. In some cases, not all markers within a linkage group are linked to each other. For example, 1D6 is linked to cxx.030, which is in turn linked to 111, but 111 shows no significant linkage to 1D6. This does, however, give some indication of the order. The results of the analyses of these markers can be found in Appendix (ii) of this chapter.
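The way markers such as 1D6 and 111 end up in one group without being directly linked can be illustrated by joining marker pairs transitively. The sketch below assumes simple single-linkage grouping with a union-find structure at Lod ≥ 3; this is an illustrative choice, not necessarily how the groups were assigned in this study. The pairs are transcribed from Table 5.3 with the reference symbols dropped.

```python
def linkage_groups(pairs, min_lod=3.0):
    """Join markers transitively whenever a pair reaches min_lod.

    pairs: iterable of (marker_a, marker_b, lod_score).
    Returns a list of sets, one per linkage group.
    """
    parent = {}

    def find(m):
        parent.setdefault(m, m)
        while parent[m] != m:
            parent[m] = parent[parent[m]]  # path halving
            m = parent[m]
        return m

    for a, b, z in pairs:
        root_a, root_b = find(a), find(b)  # also registers both markers
        if z >= min_lod and root_a != root_b:
            parent[root_a] = root_b

    groups = {}
    for m in list(parent):
        groups.setdefault(find(m), set()).add(m)
    return list(groups.values())


# Linked pairs and Lod scores transcribed from Table 5.3.
TABLE_5_3 = [
    ("1F11", "CPH21", 7.68), ("2D2", "cxx.250", 18.46),
    ("1D6", "cxx.030", 3.99), ("cxx.030", "111", 9.36),
    ("2H12", "LEI015", 3.31), ("2H12", "cxx.147", 4.26),
    ("1B7", "cfp6301", 6.58), ("1B7", "k20", 14.34),
    ("k20", "cfp6301", 8.22), ("cfp6301", "CPH16", 4.07),
    ("1B10", "F66", 6.32), ("1B10", "VIAS-D10", 3.57),
    ("VIAS-D10", "CPH20", 3.63),
    ("2A6", "1E3", 13.99), ("2A6", "cxx.349", 3.20),
    ("2A6", "k32", 3.01), ("k32", "k292", 3.86),
    ("2A11", "cxx.069", 11.04), ("2A11", "CPH3", 8.27),
    ("2A11", "LEI030", 14.30), ("2A11", "109", 4.23),
    ("cxx.069", "CPH3", 14.31), ("cxx.069", "LEI030", 9.60),
    ("cxx.069", "109", 5.65), ("CPH3", "LEI030", 8.30),
    ("CPH3", "109", 6.48), ("LEI030", "109", 3.58),
]

for group in sorted(linkage_groups(TABLE_5_3), key=len):
    print(len(group), sorted(group))
```

Applied to these pairs, the sketch recovers eight groups of two, two, three, three, four, four, five and five markers; raising `min_lod` splits groups that are held together only by weaker pairs.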

Each group of microsatellites was then analysed further to order the markers within it, using the build option of the CRI-MAP program. This uses two strongly linked markers as 'anchor loci' and orders the remaining markers around and/or between them. The option does not explore all possible orders: each additional locus is placed without breaking the order already determined for earlier placed markers, and cM distances between the ordered markers are given. A minimum Lod score of 2 was used in ordering the markers because the linkage groups were small; even so, not all markers within each group could be ordered under this constraint. All results are given in Appendix (iii) of this chapter.

The all option was then used for further analysis (all results are given in Appendix (iv) of this chapter). This option evaluates each marker against every other marker in the group and gives the most likely order. If this order is less than 1000x more likely than the next best order, a short list of possible orders is given in decreasing order of likelihood. The higher stringency of a minimum Lod score of 3 was used for this analysis, as it gave the final determination of the most likely order of the microsatellites within each group. A simple example of the output is shown below:

0 LEI015
1 cxx.147
2 2H12
ordered loci: 0 2
inserted loci: 1
0 2 1   -20.835
0 1 2   -21.255
1 0 2   -22.919

Table 5.3

Marker     Linked marker  Lod score  Recombination fraction
1F11       CPH21°         7.68       0.15
2D2        cxx.250+       18.46      0.07
1D6        cxx.030+       3.99       0.15
cxx.030+   111*           9.36       0.02
2H12       LEI015*        3.31       0.22
           cxx.147+       4.26       0.16
1B7        cfp6301        6.58       0.13
           k20*           14.34      0.04
k20*       cfp6301        8.22       0.08
cfp6301    CPH16t         4.07       0.12
1B10       F66°           6.32       0.0
           VIAS-D10a      3.57       0.17
VIAS-D10a  CPH20t         3.63       0.08
2A6        1E3            13.99      0.01
           cxx.349¹       3.20       0.22
           k32°           3.01       0.18
k32°       k292°          3.86       0.08
2A11       cxx.069+       11.04      0.02
           CPH3t          8.27       0.12
           LEI030°        14.30      0.06
           109*           4.23       0.04
cxx.069+   CPH3t          14.31      0.02
           LEI030°        9.60       0.05
           109*           5.65       0.08
CPH3t      LEI030°        8.30       0.12
           109*           6.48       0.07
LEI030°    109*           3.58       0.18

Table 5.3: Results of linkage analysis of the microsatellite markers, showing linked marker pairs, their Lod scores and recombination fractions. References for the markers found to be linked to the microsatellites: ° markers which are unpublished or of unknown origin; + (Ostrander et al. 1993); * (Holmes et al. 1993); t (Mellersh et al. 1994); * (Fischer et al. 1996); t (Fredholm and Wintero 1995); a (Primmer and Matthews 1993); ¹ (Ostrander et al. 1995).

Taken in adjacent pairs, the log10 likelihood values shown to the right of each order can be used to calculate how much more likely the best order is than the next best: a difference of 3 or more between adjacent values means the first order is at least antilog 3, or 1000x, more likely than the second. In the example above the difference is 0.42, so the first order is only 2.6x more likely than the next best. The ambiguity, however, stems from the program's inability to order loci 1 and 2, namely cxx.147 and 2H12. If these are taken in the order of the most likely row above (an order supported by the higher Lod score between 2H12 and cxx.147 than between 2H12 and LEI015, shown in Table 5.3), the difference is 2.084, making this order 121x more likely than the order in which the two markers are split. 1D6 presents a similar problem: markers 111 and cxx.030 are closely linked and the program cannot decide their order, although none of the alternative orders splits these two markers. If more information were available to order these two markers, the result would be definite.
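The conversion from adjacent log-likelihood differences to odds can be checked with a few lines of Python. This is a minimal sketch that assumes, as the text does, that the values printed beside each order are log10 likelihoods, so a difference d corresponds to odds of 10^d; the list of orders is transcribed from the example output, and the function names are illustrative only.

```python
# Log10 likelihoods for the three candidate orders of LEI015 (0), cxx.147 (1)
# and 2H12 (2), transcribed from the example "all" output above.
ORDERS = [
    ("0 2 1", -20.835),
    ("0 1 2", -21.255),
    ("1 0 2", -22.919),
]


def odds(best_log10, other_log10):
    """How many times more likely the first order is than the second.

    Assumes both values are log10 likelihoods, so the ratio is 10**difference.
    """
    return 10 ** (best_log10 - other_log10)


def adjacent_odds(orders):
    """Odds of each order over the next one in a best-first list."""
    return [
        (orders[i][0], orders[i + 1][0], odds(orders[i][1], orders[i + 1][1]))
        for i in range(len(orders) - 1)
    ]


if __name__ == "__main__":
    for better, worse, ratio in adjacent_odds(ORDERS):
        print(f"{better} vs {worse}: {ratio:.1f}x")
    # Keeping cxx.147 and 2H12 together and comparing with the order that splits them.
    print(f"0 2 1 vs 1 0 2: {odds(ORDERS[0][1], ORDERS[2][1]):.0f}x")
    # A difference of 3 corresponds to the 1000x criterion.
    print(f"threshold: {odds(0.0, -3.0):.0f}x")
```

The first comparison reproduces the 2.6x figure, and the comparison of 0 2 1 with 1 0 2 reproduces the 121x figure quoted above.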

None of the linkage groups was ordered with the certainty needed to confirm its order, that is, a Lod score difference of >3, or the first order being >1000x more likely than the next best order. However, a linkage group whose order was only 45.5x more likely than the next best has been published (Lingaas et al. 1997). For 2A6/1E3 and 1B10 there is no figure for how much more likely the order shown in Figure 5.7 is than the next best order. For 2A6/1E3, it is the pairs 2A6/1E3 and k292/k32 that cause the problems: if each of these pairs is treated as a single unit, the next best order that separates one of the pairs is 398x less likely than the order shown. For 1B10, the problem pairs are VIAS-D10/CPH20 and F66/1B10; if these pairs are treated in the same way, the next best order after the one shown is 132x less likely.

Multipoint linkage analysis was used to determine the order of the markers within each linkage group containing more than two markers. The six such groups were assessed, and the most likely order of the markers within each group, using the combined information of the build and all options, is shown in Figure 5.7. Despite the different minimum Lod scores of the two options, the most likely orders were the same in all cases. The probabilities of the orders were taken from the all option, with its higher minimum Lod score of 3.

[Figure 5.7 diagram: marker orders and map distances (cM) for the linkage groups 1D6, 2H12, 1B7 (L13), 1B10 (L15), 2A6/1E3 (L16) and 2A11 (L3)]

Figure 5.7: The ordering of six groups of linked markers. Multipoint linkage analysis was carried out, based on twopoint linkage analysis results, using a minimum Lod score of 2. Markers in bold type are those isolated in this study. The distances between markers are proportional on the diagrams and are given as sex-averaged Kosambi genetic distances in cM (centiMorgans). Bracketed distances were calculated manually from the recombination fractions given in Table 5.3, as the program used to order the markers was not able to place a marker at one particular position with sufficient certainty. Double arrow-headed unbroken lines indicate where the positions of two markers could be swapped, giving almost equal likelihood for the alternative order. The arrow with 2A6 next to it in the 2A6/1E3 group shows that 2A6 can be positioned with almost equal likelihood on either side of 1E3; {2A6} indicates the slightly more likely position of the marker. Dotted lines with an unbracketed marker indicate the most likely order of the markers, although this is not significantly more likely than the order shown by the dotted lines with the bracketed markers. Where present, the numbers below each ordered group show how much more likely the order shown is than the next best order (the next best order is shown using dotted lines, as described above, when the order shown is less than 10x more likely than the next best order).
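The bracketed distances in Figure 5.7 were calculated manually from the recombination fractions in Table 5.3 and expressed as Kosambi distances. A minimal sketch of that conversion follows, assuming the standard Kosambi map function d = 0.25 ln[(1 + 2r)/(1 - 2r)] Morgans and its inverse r = 0.5 tanh(2d); the function names are illustrative and not part of CRI-MAP.

```python
import math


def kosambi_cm(r):
    """Kosambi map distance in centiMorgans for a recombination fraction r.

    Uses d = 0.25 * ln((1 + 2r) / (1 - 2r)) Morgans, valid for 0 <= r < 0.5.
    """
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5)")
    return 100 * 0.25 * math.log((1 + 2 * r) / (1 - 2 * r))


def kosambi_r(d_cm):
    """Inverse map function: recombination fraction for a distance in cM."""
    d_morgans = d_cm / 100
    return 0.5 * math.tanh(2 * d_morgans)


if __name__ == "__main__":
    # Marker pairs and recombination fractions taken from Table 5.3.
    for pair, r in [("2H12/LEI015", 0.22), ("2H12/cxx.147", 0.16),
                    ("1D6/cxx.030", 0.15), ("1B10/VIAS-D10", 0.17)]:
        print(f"{pair}: r = {r:.2f} -> {kosambi_cm(r):.1f} cM")
```

For example, r = 0.22 for 2H12 and LEI015 gives about 23.6 cM. Unlike the Haldane function, Kosambi's allows for some crossover interference, so distances are slightly shorter for the same r.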

Of the eight linkage groups found, four extended groups that had already been published (Lingaas et al. 1997): 1B7 (extending L13), 1B10 (extending L15), 2A6/1E3 (extending L16) and 2A11 (extending L3). The other four were novel groups, two containing two markers (1F11 and 2D2) and two containing three markers (1D6 and 2H12).
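One way to see how the linkage groups arise from the pairwise results is to treat every pair in Table 5.3 with a Lod score of at least 3 as an edge and take the connected components. The sketch below does this with a small union-find; the pair list is transcribed from Table 5.3 with the reference symbols dropped, and the function name is illustrative. It is not a description of how CRI-MAP itself groups markers.

```python
# Linked marker pairs and Lod scores transcribed from Table 5.3.
PAIRS = [
    ("1F11", "CPH21", 7.68), ("2D2", "cxx.250", 18.46),
    ("1D6", "cxx.030", 3.99), ("cxx.030", "111", 9.36),
    ("2H12", "LEI015", 3.31), ("2H12", "cxx.147", 4.26),
    ("1B7", "cfp630", 6.58), ("1B7", "k20", 14.34),
    ("k20", "cfp630", 8.22), ("cfp630", "CPH16", 4.07),
    ("1B10", "F66", 6.32), ("1B10", "VIAS-D10", 3.57),
    ("VIAS-D10", "CPH20", 3.63),
    ("2A6", "1E3", 13.99), ("2A6", "cxx.349", 3.20),
    ("2A6", "k32", 3.01), ("k32", "k292", 3.86),
    ("2A11", "cxx.069", 11.04), ("2A11", "CPH3", 8.27),
    ("2A11", "LEI030", 14.30), ("2A11", "109", 4.23),
    ("cxx.069", "CPH3", 14.31), ("cxx.069", "LEI030", 9.60),
    ("cxx.069", "109", 5.65), ("CPH3", "LEI030", 8.30),
    ("CPH3", "109", 6.48), ("LEI030", "109", 3.58),
]


def linkage_groups(pairs, min_lod=3.0):
    """Connected components of the graph whose edges are pairs with Lod >= min_lod."""
    parent = {}

    def find(marker):
        parent.setdefault(marker, marker)
        while parent[marker] != marker:
            parent[marker] = parent[parent[marker]]  # path halving
            marker = parent[marker]
        return marker

    for a, b, lod_score in pairs:
        if lod_score >= min_lod:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_a] = root_b

    groups = {}
    for marker in list(parent):
        groups.setdefault(find(marker), set()).add(marker)
    return list(groups.values())


if __name__ == "__main__":
    for group in sorted(linkage_groups(PAIRS), key=len, reverse=True):
        print(len(group), sorted(group))
```

Run on these pairs, the sketch yields eight groups, six of which contain more than two markers, matching the counts given in the text. Grouping by connected components says nothing about marker order, which is why the build and all options are still needed.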

Microsatellites 1E3 and 2A6 were shown to be very closely linked, confirming the physical mapping data in Chapter 4, in which the cosmid insert DNA from which both microsatellites were isolated hybridised to the same region of the same canine chromosome. The close genetic linkage of the microsatellites therefore confirmed the physical linkage seen with the FISH results.

Apart from 1F5, for which all parents were homozygous, only two microsatellites, 1E12 and 1E7, were found not to be genetically linked to any marker already in the database. Since there are so few markers in the database, this result is unsurprising. 1E7 was physically mapped to the pseudoautosomal region (Chapter 4, pg 120) and would not be expected to link to any markers in the database, since it is unlikely that any of the 115 markers present are also pseudoautosomal. Because 1E7 showed no significant sex-linkage, it would not show genetic linkage with any sex-linked markers, and because it does not lie on an autosome, it would not show linkage to markers on those chromosomes either. 1E12 showed only three alleles in the reference family parents and had the lowest PIC value of the isolated microsatellites. Non-significant linkage (Lod score of 2.95) was found to the database marker cfp.629 (Ostrander et al. 1995) (see Appendix (ii), pg 173), but establishing significant genetic linkage for 1E12 will require either a further, highly informative marker to which it is linked or the typing of both 1E12 and cfp.629 on further reference families.

5.4 Discussion

5.4.1 PCR optimisation
PCR amplification was optimised by varying the temperatures used in touchdown PCR (Don et al. 1991) rather than the magnesium concentration of the buffer solution. As a result, many primer pairs could be run under similar conditions, differing only in the number of cycles. This increased the efficiency of genotyping by reducing the number of different PCR programs required.
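Touchdown PCR starts with an annealing temperature above the expected primer melting temperature and lowers it step by step, so the earliest cycles favour the specific product. The sketch below generates the per-cycle annealing temperatures of such a program. The temperatures, step size and cycle numbers in the example are hypothetical, not the conditions used in this study, and the function name is illustrative.

```python
def touchdown_schedule(start_c, end_c, step_c=1.0, cycles_per_step=1, final_cycles=25):
    """Annealing temperature for every cycle of a touchdown PCR program.

    The temperature falls by step_c every cycles_per_step cycles, from start_c
    down to (but not including) end_c; final_cycles cycles are then run at end_c.
    """
    if start_c <= end_c or step_c <= 0:
        raise ValueError("start_c must exceed end_c and step_c must be positive")
    n_steps = round((start_c - end_c) / step_c)
    temps = []
    for i in range(n_steps):
        temps.extend([start_c - i * step_c] * cycles_per_step)
    temps.extend([end_c] * final_cycles)
    return temps


if __name__ == "__main__":
    # Hypothetical program: 65 C down to 55 C in 1 C steps, then 25 cycles at 55 C.
    program = touchdown_schedule(65, 55)
    print(len(program), program[:12])
```

Two primer pairs that share the same touchdown range but need different numbers of final cycles differ only in final_cycles, which mirrors how varying the cycle number alone kept the number of distinct programs small.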

Sequence-specific problems meant that 2A7 could not be optimised. Spurious priming was caused by the presence of one of the primers within a repetitive element next to the GAAA repeat. Up to 80% of (A2-3G)n repeats in humans and rats have been found to be associated with Alu elements (Beckmann and Weber 1992). The presence of a Can SINE immediately adjacent to the microsatellite in 2A7 precludes designing primers outside this sequence, as the element itself contains the Sau3AI site at which the DNA insert ends and the plasmid begins. These SINEs are distributed throughout the canine genome (Coltman and Wright 1994)(Minnick et al. 1992), providing many potential priming sites. The primer designed within the repetitive element therefore probably primed at similar sites throughout the canine genome on the template DNA. The strong binding at temperatures high enough to prevent primer binding to the plasmid control, even with one primer designed outside the repeat sequence itself, suggests multiple priming events from a single primer. Indeed, the presence of specific, stronger bands within the spurious banding, although the amount of product varied greatly, suggested that the two primers were binding specifically in some cases.

5.4.2 Informativeness of microsatellites for reference family analysis
The number of alleles found in the parents of the reference families (Table 5.2) roughly corresponds to the number of heterozygous parents, especially where there are very few or very many alleles. Care was taken when comparing these values with the PIC values, which were calculated from different data. The PIC value should be used only as a guide to the informativeness of the microsatellites; a PIC value should be calculated from at least 25 individuals to give a reliable estimate (Ostrander et al. 1995).
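For reference, the PIC value can be computed from allele frequencies with the usual formula of Botstein et al. (1980), PIC = 1 - sum(p_i^2) - sum over i<j of 2 p_i^2 p_j^2, alongside expected heterozygosity H = 1 - sum(p_i^2). The sketch below assumes frequencies are estimated by simple allele counting; the genotypes in the example are hypothetical, and with so few dogs the estimate is exactly the rough guide the text warns about.

```python
from collections import Counter
from itertools import combinations


def allele_frequencies(genotypes):
    """Allele frequencies from a list of (allele1, allele2) genotypes."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def expected_heterozygosity(freqs):
    """H = 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """Polymorphism information content (Botstein et al. 1980)."""
    freqs = list(freqs)
    correction = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return expected_heterozygosity(freqs) - correction


if __name__ == "__main__":
    # Hypothetical genotypes (allele sizes in bp) for five unrelated dogs.
    sample = [(196, 192), (196, 196), (214, 192), (218, 204), (204, 196)]
    freqs = allele_frequencies(sample)
    print({a: round(p, 2) for a, p in sorted(freqs.items())})
    print("H =", round(expected_heterozygosity(freqs.values()), 3))
    print("PIC =", round(pic(freqs.values()), 3))
```

PIC is always slightly below H because it discounts matings in which both parents share the same heterozygous genotype, where the parental origin of an offspring allele cannot be traced.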

Microsatellite 2D2 had the highest values in all cases, suggesting that it is the most informative microsatellite. 1B7 was highly informative within the reference families, although its PIC value across the twelve dog breeds did not reflect this. Many parents were heterozygous at the 2A6 locus, although neither its PIC value nor its allele number reflects this degree of potential informativeness. Two microsatellites, 2A11 and 2D2, produced alleles that were informative throughout the reference panel families, as reflected in the number of heterozygous parents. This suggests that the number of heterozygous parents, rather than the total number of alleles of the microsatellite, is what matters. 1F5 was totally uninformative in the reference families despite having a high PIC value: two dog breeds were represented in the reference families, and all the individuals happened to be homozygous for the same allele of 1F5, illustrating the great variation in microsatellite alleles between dog breeds. Given its PIC value, however, 1F5 could well be informative in other dog breeds and therefore in other reference families. The only microsatellite with a PIC value of less than 0.5 was 1E12, which was not very informative in the reference families.

5.4.3 Genotyping the reference families
Genotyping was carried out carefully to ensure accuracy, since stutter (shadow) bands were present with 1E3 and 1B7. Stutter bands can mask the true allele sizes (Figure 5.3k). Both 1B7 and 1E3 contain compound perfect repeat sequences (Weber and May 1989) with a large total number of repeats: 1E3 contained a total of 28 uninterrupted dinucleotide repeats, and 1B7 contained 18 dinucleotide and 13 tetranucleotide repeats. In general, stutter increases with increasing allele length (Walsh et al. 1996) and decreases with increasing repeat unit length (Beckmann and Weber 1992), as was seen with the 1E3 repeat. These extra bands arise during PCR (Murray et al. 1993) and generally appear with graded intensity at multiples of 2 base pairs below the main band (Weber and May 1989). This was not a serious problem for family genotyping, since the genotypes of the offspring could be used to confirm the parental genotypes. Stutter bands can even aid accurate allele sizing, since they provide a convenient 'ladder' for counting the number of repeats up and down between different allele sizes (Edwards et al. 1992). Where an individual was heterozygous for two alleles one repeat unit apart, the stutter band of the upper allele made the lower allele appear more intense. Once this phenomenon is recognised, heterozygotes can be distinguished from homozygotes even where the stutter band below an allele is particularly strong. Where two bands one repeat unit apart are present, their relative intensities indicate whether the individual is homozygous or heterozygous: if the lower band is the more intense, it consists of the true allele plus the stutter band of the allele above it, and the individual is a heterozygote; if the upper band is the more intense, the individual is homozygous at that locus and the lower band is a stutter band.
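The intensity rule for two bands one repeat unit apart follows from a simple additive model of stutter. The sketch below assumes each allele copy gives a band of relative intensity 1.0 at its own size plus a stutter band of fixed relative intensity one repeat unit below, ignoring second-order stutter and differential amplification; the stutter fraction of 0.3 and the allele sizes are hypothetical, and the function names are illustrative.

```python
from collections import defaultdict


def expected_bands(genotype, repeat_bp=2, stutter=0.3):
    """Idealised band intensities for a genotype of two allele sizes (bp).

    Each allele copy contributes 1.0 at its own size plus `stutter` one repeat
    unit below; stutter two or more repeats below is ignored in this model.
    """
    bands = defaultdict(float)
    for size in genotype:
        bands[size] += 1.0
        bands[size - repeat_bp] += stutter
    return dict(bands)


def call_adjacent_bands(lower_intensity, upper_intensity):
    """Rule from the text for two bands one repeat unit apart: a stronger lower
    band means a heterozygote (true allele plus stutter of the upper allele);
    a stronger upper band means a homozygote (lower band is stutter only)."""
    return "heterozygote" if lower_intensity > upper_intensity else "homozygote"


if __name__ == "__main__":
    for genotype in [(220, 218), (220, 220)]:
        bands = expected_bands(genotype)
        call = call_adjacent_bands(bands.get(218, 0.0), bands.get(220, 0.0))
        print(genotype, {k: round(v, 2) for k, v in sorted(bands.items())}, call)
```

In this model the heterozygote's lower band has intensity 1 + s against 1 for the upper band, while the homozygote's lower band has 2s against 2, so the rule holds whenever the stutter fraction s is below 1.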

In compound microsatellites, allele size can vary through changes in the number of either repeat unit type. Using such microsatellites, for example 1B7, in evolutionary studies would not be possible without sequencing each allele (Angers and Bernatchez 1997). Compound repeats also pose possible problems for linkage analysis, in that two alleles of the same size may not be identical in sequence, masking a variation in the repeat. Where the two repeat types have the same unit length, the loss of a single repeat unit of one type and the simultaneous gain of a unit of the other type would result in no size change, and the difference between the sequences would not be detected; for a di/tetranucleotide compound such as 1B7, the loss of one tetranucleotide unit and the gain of two dinucleotide units has the same effect. Alleles that are identical in size but not in sequence in this way are an example of allelic homoplasy (Garza and Freimer 1996), which can also arise in the sequences flanking the repeat (Grimaldi and Crouau-Roy 1997). However, sequence analysis of each allele length could provide extra information about the true polymorphic properties of the microsatellite, making it more informative.
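The size-homoplasy problem for a di/tetranucleotide compound repeat such as 1B7 can be illustrated by enumerating repeat compositions that produce the same PCR product length. This sketch assumes product length is simply flanking sequence plus 2 bp per dinucleotide and 4 bp per tetranucleotide unit; the 120 bp flank and the search limits are hypothetical, and the function names are illustrative.

```python
from itertools import product


def allele_length(n_di, n_tetra, flank_bp):
    """PCR product length of a compound (di)n(tetra)m allele."""
    return flank_bp + 2 * n_di + 4 * n_tetra


def same_size_compositions(length, flank_bp, max_di=40, max_tetra=30):
    """All (n_di, n_tetra) combinations giving a product of this length."""
    return [
        (d, t)
        for d, t in product(range(max_di + 1), range(max_tetra + 1))
        if allele_length(d, t, flank_bp) == length
    ]


if __name__ == "__main__":
    FLANK_BP = 120  # hypothetical combined length of the flanking sequence
    # 18 dinucleotide and 13 tetranucleotide units, as reported for 1B7.
    observed = allele_length(18, 13, FLANK_BP)
    print(observed, same_size_compositions(observed, FLANK_BP)[:5])
```

Here (18, 13) and (20, 12) give identical product lengths, so gel sizing alone cannot tell these alleles apart.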

The tetranucleotide repeat 2D2 gave some interesting results, revealing allele sizes that did not correspond to the loss or gain of a whole repeat unit. As illustrated in Figure 5.5a, the sire of family 7 (individual 106) was heterozygous for alleles 1 base pair apart in size. The alleles were inherited in the normal fashion, as seen in the figure, so they must represent a true polymorphism. As described in the discussion to Chapter 3, the repeat is surrounded by repeat-like sequence, with various short stretches of mono- and dinucleotide repeats. The loss of a single nucleotide from one of the mononucleotide stretches is a likely cause of the 1 bp size difference. This illustrates the complexity, and consequent increased informativeness, of tetranucleotide repeats surrounded by repeat-like sequences.

5.4.4 Mutations
Several spontaneous mutations were observed during genotyping. 1B7 revealed a mutation in individual 57 of family 3, originating from the father and probably involving the loss or gain of a single 4 base pair repeat unit. Since this microsatellite is a compound repeat that includes a dinucleotide repeat, the mutation could also have been due to the loss or gain of two dinucleotide repeats, which is rare (Weber and Wong 1993). Mutations were also found in microsatellite 2H12, which showed four mutations spread over three families, three of the four originating from the father. This supports the observation that more spontaneous mutations occur in the male than in the female germline (Weber and Wong 1993). Each of these mutations was consistent with the loss or gain of one repeat unit.

There have been several reports of a higher mutation rate for tetranucleotide repeats than for dinucleotide repeats (Edwards et al. 1992)(Weber and Wong 1993)(Francisco et al. 1996), as well as wide variation in the reported mutation rates of microsatellite repeats in general, from 10^-2 to 10^-5 per locus per gamete per generation in humans ((Weber and Wong 1993)(Di Rienzo et al. 1994) and references therein). Against the maximum reported figure of 10^-2 (1%), the single 1B7 mutation, one in a total of 106 offspring carrying 212 alleles (4.7 x 10^-3, 0.5%), falls within the reported range, and 1B7 is not the most unstable microsatellite reported; a (GAAA)n tetranucleotide repeat has been found to be equally unstable (Liu et al. 1995). 2H12, with four mutations, gives a rate of 1.9 x 10^-2 (1.9%), suggesting that this microsatellite is extremely unstable: a human (CCTT)n repeat with a reported mutation rate of between 0.3 and 2.7% is thought to be one of the most unstable microsatellites in the human genome (Hastbacka et al. 1992). The average mutation rate calculated for canine tetranucleotide repeats was 0.4% (Francisco et al. 1996). Because of the instability of these microsatellite repeats, the results of the individuals carrying mutations were excluded at the linkage analysis stage, giving less informative results.
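The mutation rates above are mutations divided by transmitted alleles, with each offspring carrying one paternal and one maternal allele. The sketch below assumes 106 offspring (212 alleles), the count implied by the quoted rates. The binomial tail function is an illustrative addition rather than a calculation made in this study: it gives the chance of seeing at least k mutations in 212 gametes if the true rate were, say, the 0.4% canine tetranucleotide average.

```python
from math import comb


def mutation_rate(n_mutations, n_offspring):
    """Mutations per locus per gamete, counting two transmitted alleles per offspring."""
    return n_mutations / (2 * n_offspring)


def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return 1.0 - sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))


if __name__ == "__main__":
    N_OFFSPRING = 106  # offspring count implied by the rates quoted in the text
    for locus, k in [("1B7", 1), ("2H12", 4)]:
        rate = mutation_rate(k, N_OFFSPRING)
        print(f"{locus}: {k}/{2 * N_OFFSPRING} = {rate:.1e} ({100 * rate:.1f}%)")
    # How surprising would four mutations be at an assumed true rate of 0.4%?
    print(f"P(X >= 4 | n = 212, p = 0.004) = {prob_at_least(4, 212, 0.004):.3f}")
```

The first two lines reproduce the 4.7 x 10^-3 and 1.9 x 10^-2 figures. With so few events, any single-locus rate estimate carries a wide margin of uncertainty.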

A further mutation, causing allele non-amplification, was discovered and overcome by designing alternative primers, which subsequently gave more informative results with that microsatellite (1B10). The allele revealed for parents 53 and 122 had not previously been seen with the original primers (see Figure 5.3j). The presence of some non-specific product in the region where the allele was revealed by the second primer set indicated that the original primer was binding to some extent, so the mutation was probably reducing binding rather than preventing it. The new primer design took into account that a mutation in the primer sequence has the greatest effect if it lies at the 3' end (Koorey et al. 1993): the new primers were designed so that their 3' ends overlapped the 5' ends of the original primer sequences, the 3' end being the most crucial for accurate binding during PCR. An additional allele segregating in a family is not always revealed, and this can lead to a loss of information in linkage analysis by lowering the observed heterozygosity of the microsatellite (Callen et al. 1993). The mutation was probably located at the 3' end of the original reverse primer sequence, since the allele was visualised both with the new primer pair and with the new reverse primer combined with the old forward primer. The clearest results were produced using both new primers, so these were used subsequently. The dam of family 3 (individual 53) was initially thought to be homozygous, but her offspring produced results that seemed impossible given the genotype of the sire. Detection of the allele revealed the dam and some of her offspring to be heterozygotes. Family 8, initially discarded because both parents appeared to be homozygous, also revealed the allele in question, showing the sire (individual 122) to be a heterozygote with the new primers.
Revealing this extra allele was important: it increased the number of linkage groups obtained with the microsatellites. One study of human microsatellites showed the incidence of allele non-amplification to be 30% (Callen et al. 1993), indicating a potentially large loss of information due to mutations in the flanking sequences of microsatellites. Such alleles could be revealed if two primer sets were routinely designed and used for each polymorphic microsatellite identified, but in practice a second set is normally designed only where non-amplification is detected through unusual allele segregation.
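The unusual segregation that exposed the 1B10 null allele can be made concrete with a small consistency check. This sketch assumes that a non-amplifying allele shows up only as an apparent homozygote, so an observed genotype x,x may really be x/null; the genotypes in the example are hypothetical, and the helper names are illustrative.

```python
from itertools import product

NULL = 0  # placeholder for a non-amplifying (null) allele


def is_mendelian(child, sire, dam):
    """True if the child can have received one allele from each parent."""
    a, b = child
    return (a in sire and b in dam) or (b in sire and a in dam)


def possible_genotypes(observed):
    """An apparent homozygote x,x may really carry a null allele (x/null)."""
    a, b = observed
    return [observed, (a, NULL)] if a == b else [observed]


def explained_by_null(child, sire, dam):
    """True if some assignment of null alleles makes the trio consistent."""
    return any(
        is_mendelian(c, s, d)
        for c, s, d in product(
            possible_genotypes(child), possible_genotypes(sire), possible_genotypes(dam)
        )
    )


if __name__ == "__main__":
    # Hypothetical trio: both parents look homozygous, yet the child lacks the dam's allele.
    sire, dam, child = (172, 172), (164, 164), (172, 172)
    print("Mendelian as typed:", is_mendelian(child, sire, dam))
    print("Consistent if a null allele is present:", explained_by_null(child, sire, dam))
```

A trio that fails the first check but passes the second is the signature that justifies designing a second primer pair, as was done for 1B10.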

5.4.5 Linkage analysis of reference family data
The minimum Lod score of 3.0 used in linkage analysis studies gives a level of significance higher than that usually required for statistical tests. The reason lies in the natural clustering of markers, which occurs by virtue of their lying on the same chromosome. The 5% significance level used in many statistical tests, meaning a 5% chance of detecting linkage where none exists, is not stringent enough: given that there are 23 human chromosomes, only about 4.4% of marker pairs will lie on the same chromosome in any case, so with a 5% significance level as many false as true linkages will be detected (Ott 1991, pg 66). Taking into account other factors, such as markers being too far apart on the same chromosome to show linkage, the critical odds ratio of 1000, or a Lod score of 3, can be seen as equivalent to a linkage probability of 95% (Ott 1991). More recent calculations have reset the limit at a Lod score of 3.2 (Sawcer et al. 1997), but given the small number of markers so far genetically linked in the canine genome, the linkage groups found were constructed on the basis of a minimum Lod score of 3.0. Pairs of linked markers (Lod score >3) were identified, and the resulting linkage groups were first assessed for the order of the markers within each group, using the build option of the CRI-MAP program at the lower significance level of a Lod score >2. With this initial information, the all option (at the higher significance level of >3) was used to assess the reasons behind the uncertainties of the orders produced with the initial build option.
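To make the Lod threshold concrete, the sketch below computes a two-point Lod score in the simplest case of phase-known, fully informative meioses: Lod(theta) = log10[theta^R (1 - theta)^NR / 0.5^(R + NR)], maximised at theta = R/n. This is an illustrative simplification, since CRI-MAP also handles phase-unknown pedigrees. The recombinant counts in the example are hypothetical, and the function names are illustrative.

```python
import math


def lod(recombinants, nonrecombinants, theta):
    """Two-point Lod score for phase-known, fully informative meioses."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")

    def term(count, p):
        if count == 0:
            return 0.0  # p**0 == 1 contributes nothing
        if p == 0.0:
            return float("-inf")  # recombinants observed but theta = 0
        return count * math.log10(p)

    n = recombinants + nonrecombinants
    return term(recombinants, theta) + term(nonrecombinants, 1.0 - theta) + n * math.log10(2)


def max_lod(recombinants, nonrecombinants):
    """Maximum-likelihood theta (R / n, capped at 0.5) and the Lod score there."""
    n = recombinants + nonrecombinants
    theta_hat = min(recombinants / n, 0.5)
    return theta_hat, lod(recombinants, nonrecombinants, theta_hat)


if __name__ == "__main__":
    # Hypothetical counts: 2 recombinants among 20 informative meioses.
    theta_hat, score = max_lod(2, 18)
    print(f"theta = {theta_hat:.2f}, Lod = {score:.2f}, odds = {10 ** score:.0f}:1")
    print("significant at Lod >= 3:", score >= 3.0)
```

With 2 recombinants in 20 meioses the maximum Lod is about 3.2, just over the threshold; with no recombinants, about ten informative meioses are needed to reach a Lod of 3.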

There were six groups with enough markers to enable ordering, although only two groups produced orders with any appreciable support, and even these were not significant under the criterion of a Lod score difference of three or more between the best and the next best order. These are the groups containing 2A11 and 1B7, both of which improved the support for the order of the already published groups (L3 and L13 respectively) (Lingaas et al. 1997). The 2A11 group had the most markers linked both to each other and to the added marker: 2A11 itself was linked with a Lod score of >3 to all the other markers, each of these markers was linked to at least two further markers within the group, and the order of the published markers within the group was conserved. Linkage was less complete in the 1B7 group, which spans a larger genetic distance and contains fewer markers; again, the order of the published linkage group was maintained. The orders of these two groups with the additional markers can therefore be stated with relative certainty, even though that certainty does not reach the required level of significance. As stated in the results section, the ordering of the remaining groups was less certain, owing to insufficient information and the small size of the groups.

The group containing 1E3 and 2A6 is an interesting example because of the high Lod score between these two markers, suggesting they are very closely linked. This, together with the lack of significant linkage of 1E3 to any of the other markers in the group (some linkage is present, but with Lod scores between 2 and 3, so linkage is not certain), presents a confusing set of data to the linkage program. This could be explained by the lower informativeness of 1E3 compared with 2A6: 1E3 had half as many heterozygous parents within the reference families as 2A6. If the order of the two markers could be determined and 1E3 were significantly linked to more of the markers in the group, the order would become much more certain.

As the number of markers genetically mapped in the canine genome increases, any spurious linkage should be revealed, producing a more accurate and informative map. Not all markers have been genetically mapped using the same set of reference families, so to combine the information from all families it would be necessary to agree on a single set of reference families on which all markers are assessed, and to use this set alone. At present, more than one set of reference families is in use. The set used in this study is the DogMap reference panel, used by laboratories from 20 countries (Dolf 1997). A further set of linkage groups has been established using a different set of reference families (Mellersh et al. 1997). It comprises 150 markers in 30 linkage groups, and the results are to be combined with those of the DogMap reference panel by typing a subset of markers from each linkage group on the other reference families. Of the 150 markers used in the second linkage analysis, some have already been typed on the DogMap reference panel, and three were found to be linked to the groups containing the nine polymorphic microsatellites: cxx.030 (Ostrander et al. 1993) from linkage group L21 (Mellersh et al. 1997), in the 1D6 group, assigned to chromosome 2; cxx.250 (Ostrander et al. 1993) from linkage group L18 (Mellersh et al. 1997), in the 2D2 linkage group, assigned to chromosome 9; and cxx.147 (Ostrander et al. 1993) from linkage group L13 (Mellersh et al. 1997), in the 2H12 linkage group, which has not been physically assigned. Combining these two sets of data, the 1D6 group would contain eleven markers, two from the 1D6 group and nine from the L21 group, and can be assigned to chromosome 2.
The L18 group contains six markers and expands the 2D2 group to a total of seven markers on chromosome 9, and the unassigned 2H12 group of three markers is expanded to five with L13. Precise ordering of the combined groups would require more than one marker from each group to be typed on the other reference families, but the potential for rapid expansion of the canine linkage map is great.

An ideal set of reference families, of three generations, would allow phase (knowledge of which alleles in a family were inherited from the male and which from the female) to be established, and would allow the number of recombination events between two markers within a family to be determined more accurately. This extra information would lead to higher Lod scores in all aspects of genetic mapping and would allow any double crossover events to be identified. Cross-breeding a set of reference families, for example with all four grandparents in a family of a different breed, would greatly increase the possible number of different alleles for any given polymorphic marker. A cross-bred set of reference families would thus exploit the natural variation of microsatellite alleles between dog breeds.

A good set of reference families would give everyone involved in genetic mapping the maximum opportunity to combine their data and so speed the progress of the genetic map of the canine genome as a whole. Standard reference families have been used with great success in other genome mapping projects, notably the CEPH families in the human genome mapping project. A standard set of families is also in use, to great effect, in the PigMap project (Signer et al. 1994).

APPENDIX

(i) Table of reference family data

The data in the table overleaf represent allele sizes at each of eleven polymorphic microsatellite loci, named across the top of the table on each page, over the eight reference families. Data shaded in grey represent the parents in each family. Data in italics indicate individuals with spontaneous mutations. A thick line separates families on the same page. Families are numbered 1 to 8 in the Link-fam column. Animal is the individual dog number in the reference families; f denotes the father and m the mother of the offspring, and each offspring's father and mother are identified by their link numbers in the f and m columns. The sex of each individual is given in the sex column as 1 for male or 2 for female.

Genotypes of reference families at microsatellite loci

Link-fam Animal f m sex 1B7 1B10 1D6 1E3 1E7 1E12 1F11 2A6 2A11 2D2 2H12

3 1 2 2 206,198 166,164 130,126 169,169 180,178 406,406 396,392
4 1 2 2 206,198 166,166 132,126 173,169 180,180 406,379 396,384
5 1 2 2 198,192 166,164 132,126 169,169 180,180 379,375 404,392
6 1 2 2 192,192 166,166 130,130 173,171 180,180 406,375 396,392
7 1 2 2 198,192 166,164 130,126 173,171 180,180 379,375 396,384

9 8 2 1 206,196 166,166 130,126 173,171 180,178 414,406 396,384
10 8 2 1 196,192 164,164 130,130 173,169 180,180 406,406 396,384
11 8 2 1 196,192 166,166 130,126 171,171 180,180 406,379 400,384
12 8 2 2 206,196 164,164 132,126 173,171 180,180 406,406 400,392

14 8 13 1 121,119 164,164 130,130 175,171 180,178 414,375 396,384
15 8 13 1 121,119 166,166 130,130 173,169 180,180 414,375 400,388
16 8 13 2 121,121 166,166 130,130 171,169 180,178 414,375 396,392
17 8 13 2 121,121 166,164 130,130 175,173 180,178 414,406 400,392
18 8 13 2 121,121 166,164 132,130 175,171 178,178 414,406 396,384
19 8 13 2 121,121 166,166 132,130 175,171 180,180 414,406 396,384
20 8 13 2 121,121 166,164 132,130 175,171 178,178 414,406 400,384
21 8 13 2 121,119 166,164 132,130 175,173 178,178 414,406 400,384
22 8 13 1 121,121 164,164 132,130 175,173 178,178 406,406 400,384
23 8 13 2 121,121 166,164 132,130 175,171 180,180 406,406 400,392

196,192 166,166 130,130 173,173 180,178 414,378 396,396
196,192 166,166 130,130 173,173 180,180 406,375 400,384
196,192 166,166 130,130 173,171 180,180 414,375 400,396
196,196 172,164 132,130 173,173 180,180 414,375 400,396
196,196 172,164 132,130 171,17 180,180 414,378 400,396
196,192 172,166 130,130 173,17 180,178 414,378 396,396
196,196 166,166 130,130 171,17 180,180 406,375 400,384
196,196 166,166 132,130 171,17 180,178 406,375 396,384

196,192 166,164 130,130 171,17 180,180 406,375 396,384
196,196 166,164 130,130 173,17 180,180 414,406 396,392
196,196 166,164 130,130 171,17 180,178 414,406 396,392

196,194 164,164 224,224 175,17 115,115 180,178 375,375 392,384
194,192 166,164 224,224 175,17 115,115 180,178 414,406 384,384
196,196 166,164 224,220 175,1 117,115 180,180 375,375 384,384
196,192 164,164 224,224 175,17 115,115 180,178 414,406 392,384
196,194 166,166 224,224 175,17 115,115 180,178 414,406 392,384

45 43 44 1 218,206 121,119 172,172 226,220 134,124 152,142 175,169 121,115 184,184 379,374 396,388
46 43 44 2 218,206 121,119 172,164 230,220 126,124 142,142 175,169 121,113 184,184 402,374 396,388
47 43 44 1 218,206 121,121 172,172 230,220 134,126 142,142 175,169 115,113 184,180 379,374 392,388
48 43 44 218,214 121,121 172,164 226,220 126,126 152,152 175,169 121,115 184,180 402,374 392,388
49 43 44 1 218,214 121,121 172,172 226,220 134,126 152,142 175,169 115,115 184,180 402,374 396,388
50 43 44 1 218,214 121,121 172,172 230,220 126,126 152,142 171,169 121,113 184,180 402,374 392,388
51 43 44 218,206 121,121 172,172 230,220 134,126 152,142 171,169 121,113 180,180 402,374 392,388

54 52 53 1 214,198 131,119 172,172 228,224 128,126 142,142 173,169 113,113 184,180 402,379 392,380
55 52 53 1 214,198 131,119 172,172 228,224 134,134 142,142 169,169 121,113 182,180 402,379 388,384
56 52 53 1 214,198 121,121 172,172 228,224 134,134 142,142 173,169 121,113 184,180 398,379 392,380
57 52 53 2 214,202 131,119 174,172 224,220 128,126 142,142 173,169 115,113 184,180 398,379 392,380
58 52 53 1 214,198 121,119 174,172 224,220 134,126 142,142 169,169 121,115 184,180 402,379 392,380
59 52 53 2 214,198 131,119 172,172 228,224 128,126 142,142 169,169 113,113 182,180 402,379 392,380
60 52 53 2 214,198 121,119 172,172 228,224 134,134 142,142 173,169 113,113 182,180 402,379 392,380

62 61 53 2 218,214 131,119 172,164 220,220 134,128 142,142 171,169 115,115 182,180 379,379 388,380
63 61 53 2 214,196 121,121 172,164 226,220 134,128 142,142 173,171 115,115 182,180 379,379 396,388
64 61 53 1 214,196 121,119 172,172 228,226 134,128 142,142 173,171 115,113 184,180 379,379 396,388
65 61 53 1 218,214 131,119 172,172 220,220 134,126 142,142 173,171 115,115 184,180 379,379 388,380
66 61 53 1 214,196 121,121 172,164 228,220 134,134 152,142 171,169 115,115 184,180 398,379 396,388

69 67 68 1 218,206 119,119 134,124 152,142 175,169 115,113 180,180 379,379
70 67 68 1 218,206 121,121 134,128 142,142 175,175 115,113 180,180 379,375
71 67 68 2 [illegible] 180,180 379,379

74 72 73 1 218,206 121,121 172,164 220,220 128,128 152,142 175,169 113,113 184,180 406,379 388,380
75 72 73 1 218,204 121,121 172,164 220,220 128,128 152,142 175,175 115,113 184,180 379,374 388,384
76 72 73 1 218,206 121,119 172,172 220,220 128,128 142,142 175,175 113,113 184,180 406,379 388,380
77 72 73 2 218,206 121,119 172,164 220,220 134,128 152,142 175,175 115,113 184,180 406,379 384,384
78 72 73 2 218,204 121,119 172,164 230,220 134,128 152,142 175,171 113,113 184,180 406,383 384,380
79 72 73 2 218,204 121,121 172,164 220,220 134,126 142,142 171,169 113,113 184,180 379,374 384,380
80 72 73 1 218,206 121,119 172,164 220,220 134,126 142,142 175,175 115,113 184,184 406,383 388,380
81 72 73 1 218,204 121,121 172,164 220,220 128,126 142,142 175,171 115,113 180,180 406,379 384,380
82 72 73 1 218,204 121,119 172,164 230,220 128,126 152,142 175,169 115,113 184,180 379,374 384,380
83 72 73 1 218,206 121,119 172,164 230,220 134,126 142,142 171,169 115,113 180,180 379,374 384,380
84 72 73 2 218,204 121,119 172,172 230,220 134,126 152,142 175,175 115,113 184,180 383,374 388,384
85 72 73 2 218,204 121,119 172,164 220,220 134,126 142,142 175,175 115,113 184,184 406,379 388,380
86 72 73 2 218,204 121,119 172,172 230,220 134,128 152,142 175,169 115,113 180,180 406,379 384,380


89 87 88 1 196,192 121,119 172,172 220,220 134,124 142,142 117,113 184,184 379,374 388,380
90 87 88 214,192 119,119 172,164 220,220 128,122 152,142 117,113 184,180 398,374 392,388
91 87 88 214,192 119,119 172,172 224,220 134,124 152,142 121,113 184,180 398,374 392,388
92 87 88 204,196 121,119 172,172 224,220 134,122 152,142 121,115 180,180 398,374 392,388
93 87 88 214,204 121,119 172,164 220,220 128,124 142,142 117,115 184,180 398,374 392,388
94 87 88 196,192 119,119 172,172 224,220 134,122 152,142 121,115 184,180 379,379 388,380
95 87 88 196,192 121,119 172,164 220,220 128,122 142,142 117,115 180,180 398,374 392,388
96 87 88 204,196 121,119 172,164 220,220 128,122 142,142 117,115 184,180 398,379 392,388
214,204 172,172 224,220 128,122 142,142 180,180 398,379 392,388

204,202 172,166 220,220 128,128 169,169 184,180 379,375 392,380
100 87 98 204,202 121,119 172,172 220,220 128,128 175,169 117,113 180,180 422,379 392,388
101 87 98 218,192 121,119 166,164 228,220 128,128 169,169 115,113 180,180 398,375 392,392
102 87 98 202,192 121,119 172,166 220,220 134,128 175,169 117,113 184,180 422,379 392,380
103 87 98 218,204 121,121 172,166 228,220 134,128 169,169 113,113 184,180 422,379 392,388
104 87 98 202,192 121,121 172,172 220,220 128,128 169,169 117,113 180,180 379,375 392,392
105 87 98 218,192 121,119 172,164 220,220 134,128 169,169 117,115 180,180 422,398 392,388


108 106 107 1 218,218 172,172 226,226 128,128 152,152 175,171 115,113 180,180 379,374 396,396
109 106 107 1 218,218 172,166 226,226 134,128 152,152 175,171 115,113 180,180 379,375 396,396
110 106 107 1 218,198 172,166 226,220 134,128 152,142 171,169 117,113 184,180 398,375 396,380
111 106 107 1 218,198 172,172 226,220 134,128 152,142 175,171 117,115 184,180 379,374 396,380
112 106 107 2 218,218 172,172 226,220 128,128 152,152 175,171 117,113 180,180 398,375 396,380
113 106 107 2 218,198 172,172 226,220 134,128 152,142 175,169 117,115 180,180 379,375 400,380
114 106 107 2 218,218 172,172 226,226 128,128 152,152 175,169 115,113 180,180 379,375 396,380
115 106 107 2 218,198 172,166 220,220 134,128 152,152 175,171 117,117 184,180 379,374 396,380
116 106 107 1 218,218 172,166 220,220 134,128 152,152 175,169 117,117 180,180 379,375 396,396
117 106 107 1 218,218 172,166 226,220 134,128 152,152 171,171 117,113 180,180 398,375 396,380
118 106 107 2 218,198 172,172 226,226 134,134 152,152 171,171 115,113 180,180 379,375 396,380
119 106 107 2 218,198 172,172 226,226 134,128 152,142 175,171 115,113 184,180 398,374 396,380
120 106 107 2 218,198 172,166 226,220 134,128 152,142 171,169 117,115 184,180 379,374 396,380
121 106 107 2 218,218 172,172 226,220 134,128 152,152 171,169 117,113 184,180 379,374 396,396
[illegible]
124 122 123 1 218,214 131,121 174,172 134,124 152,142 175,169 115,115 184,182 379,375 388,384
125 122 123 1 218,214 131,121 172,164 132,124 142,142 175,175 115,113 184,180 402,402 384,380
126 122 123 1 218,194 131,121 174,172 134,124 142,142 173,169 115,115 184,182 379,375 384,384
127 122 123 1 218,214 131,121 174,172 134,124 142,142 175,175 115,113 184,180 402,379 384,380
128 122 123 2 218,194 131,121 172,164 132,124 142,142 175,175 115,115 184,182 402,402 388,384
129 122 123 2 218,214 121,119 174,172 132,124 142,142 175,175 115,115 180,180 379,375 384,380

(ii) Linkage analysis results: two-point analysis

Below is a list of marker pairs, linked either directly or indirectly to the microsatellites added to the DogMap database, with a Lod score ('lods') of 2.0 or above; 'rec. fracs' gives the recombination fraction at which that maximum Lod score was obtained. The numbers beneath each result are the Lod scores at a series of recombination fractions. Results in bold are those linked to the microsatellites with a maximum Lod score of 3.0 or greater.
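As a minimal sketch of what a two-point Lod score measures, the Python below assumes phase-known, fully informative meioses. This is a simplification; it is not the likelihood the linkage program computes for the DogMap families, where phase is often unknown. For R recombinant and NR non-recombinant meioses, the score at recombination fraction θ is log10[θ^R (1 - θ)^NR / 0.5^(R + NR)], and it is maximised at θ = R/(R + NR). The function names, the grid of θ values and the counts are hypothetical.

```python
import math


def lod(theta, recombinants, nonrecombinants):
    """Phase-known two-point Lod score: log10[L(theta) / L(0.5)]."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if theta == 0.0 and recombinants > 0:
        return float("-inf")  # a single recombinant excludes theta = 0
    n = recombinants + nonrecombinants
    score = nonrecombinants * math.log10(1.0 - theta) - n * math.log10(0.5)
    if recombinants > 0:
        score += recombinants * math.log10(theta)
    return score


def max_lod(recombinants, nonrecombinants):
    """Maximum-likelihood theta, R / (R + NR) capped at 0.5, and its Lod score."""
    theta_hat = min(recombinants / (recombinants + nonrecombinants), 0.5)
    return theta_hat, lod(theta_hat, recombinants, nonrecombinants)


# Hypothetical example: 2 recombinants among 20 scored meioses.
grid = [0.0, 0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5]
scores = [round(lod(t, 2, 18), 2) for t in grid]
theta_hat, best = max_lod(2, 18)
```

Here theta_hat is 0.1 with a Lod score of about 3.2, which would pass the 3.0 threshold used for the bold entries. By construction, the score falls to 0 at θ = 0.5.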

Markers linked to 1B7

CPH16 cfp.630 rec. fracs. = 0.12, lods = 4.07
-5.98 -0.15 3.24 4.01 4.00 3.66 3.12 2.47 1.76 1.02 0.34 0.00

k20 cfp.630 rec. fracs. = 0.08, lods = 8.22
0.63 5.42 7.95 8.18 7.73 6.94 5.94 4.76 3.45 2.06 0.72 0.00

1B7 cfp.630 rec. fracs. = 0.13, lods = 6.58
-8.96 -0.19 5.06 6.41 6.55 6.15 5.41 4.41 3.22 1.91 0.64 0.00

1B7 cxx.251 rec. fracs. = 0.06, lods = 2.56
1.21 2.14 2.56 2.48 2.26 1.97 1.62 1.24 0.83 0.44 0.12 0.00

1B7 k20 rec. fracs. = 0.07, lods = 14.34
4.25 10.89 14.16 14.14 13.15 11.70 9.94 7.94 5.76 3.44 1.19 0.00

Markers linked to 1B10

CPH20 V1AS-D10 rec. fracs. = 0.08, lods = 3.63
0.61 2.53 3.53 3.62 3.42 3.09 2.67 2.17 1.60 0.97 0.33 0.00

F66 CPH20 rec. fracs. = 0.17, lods = 2.95
-9.88 -3.01 1.25 2.55 2.93 2.91 2.64 2.21 1.64 0.98 0.32 0.00

1B10 V1AS-D10 rec. fracs. = 0.17, lods = 3.57
-11.07 -3.23 1.65 3.12 3.54 3.50 3.18 2.65 1.95 1.15 0.37 0.00

1B10 F66 rec. fracs. = 0.00, lods = 6.32
6.31 6.21 5.74 5.13 4.49 3.81 3.09 2.33 1.56 0.83 0.24 0.00

Markers linked to 1D6

cxx.030 111 rec. fracs. = 0.02, lods = 9.36
8.42 9.25 9.16 8.43 7.52 6.50 5.39 4.21 2.98 1.73 0.57 0.00

1D6 cxx.188 rec. fracs. = 0.16, lods = 2.48
-6.58 -1.68 1.34 2.24 2.47 2.42 2.18 1.81 1.34 0.81 0.28 0.00

1D6 cxx.030 rec. fracs. = 0.15, lods = 3.99
-10.17 -2.36 2.40 3.72 3.99 3.82 3.37 2.73 1.98 1.16 0.39 0.00

Markers linked to 1E12

1E12 cfp.629 rec. fracs. = 0.13, lods = 2.95
-3.88 0.01 2.32 2.89 2.93 2.72 2.37 1.91 1.37 0.80 0.26 0.00

Markers linked to 1F11

1F11 123 rec. fracs. = 0.19, lods = 2.04
-11.08 -4.20 0.13 1.50 1.96 2.03 1.87 1.55 1.13 0.66 0.21 0.00

1F11 CPH21 rec. fracs. = 0.15, lods = 7.68
-17.35 -3.66 4.73 7.13 7.67 7.40 6.62 5.48 4.05 2.42 0.82 0.00

Markers linked to 2A6/1E3

k32 k292 rec. fracs. = 0.08, lods = 3.86
0.91 2.82 3.79 3.83 3.58 3.20 2.72 2.17 1.55 0.89 0.28 0.00

1E3 k292 rec. fracs. = 0.16, lods = 2.71
-6.28 -1.39 1.62 2.49 2.70 2.62 2.35 1.94 1.42 0.84 0.27 0.00

1E3 k32 rec. fracs. = 0.11, lods = 2.04
-1.19 0.74 1.84 2.04 1.97 1.78 1.51 1.18 0.81 0.44 0.13 0.00

2A6 cxx.349 rec. fracs. = 0.22, lods = 3.20
-26.06 -11.28 -1.78 1.45 2.73 3.17 3.12 2.72 2.08 1.28 0.44 0.00

2A6 k292 rec. fracs. = 0.13, lods = 2.95
-3.88 0.01 2.32 2.89 2.93 2.72 2.36 1.89 1.34 0.75 0.23 0.00

2A6 k32 rec. fracs. = 0.18, lods = 3.01
-13.77 -4.93 0.62 2.37 2.94 2.99 2.73 2.24 1.60 0.89 0.26 0.00

2A6 1E3 rec. fracs. = 0.01, lods = 13.99
13.23 13.96 13.47 12.23 10.80 9.25 7.61 5.88 4.08 2.26 0.69 0.00

Markers linked to 2A11

LEI030 109 rec. fracs. = 0.18, lods = 3.58
-16.77 -5.98 0.77 2.85 3.51 3.54 3.21 2.64 1.89 1.06 0.33 0.00

cxx.069 109 rec. fracs. = 0.08, lods = 5.65
1.22 4.08 5.53 5.59 5.24 4.68 3.99 3.20 2.31 1.37 0.46 0.00

cxx.069 LEI030 rec. fracs. = 0.05, lods = 9.60
5.73 8.51 9.60 9.20 8.37 7.31 6.10 4.77 3.35 1.90 0.61 0.00

CPH3 109 rec. fracs. = 0.07, lods = 6.48
2.12 4.97 6.39 6.40 5.98 5.36 4.59 3.70 2.70 1.62 0.56 0.00

CPH3 LEI030 rec. fracs. = 0.12, lods = 8.30
-10.16 0.52 6.77 8.20 8.18 7.50 6.43 5.08 3.56 1.99 0.63 0.00

CPH3 cxx.069 rec. fracs. = 0.02, lods = 14.31
13.53 14.28 13.83 12.64 11.25 9.73 8.11 6.38 4.56 2.67 0.90 0.00

CPH20 LEI030 rec. fracs. = 0.21, lods = 2.84
-27.83 -12.09 -2.08 1.22 2.46 2.84 2.71 2.26 1.62 0.90 0.27 0.00

2A11 109 rec. fracs. = 0.04, lods = 4.23
3.01 3.92 4.23 4.01 3.64 3.19 2.68 2.11 1.50 0.88 0.30 0.00

2A11 LEI030 rec. fracs. = 0.06, lods = 14.30
4.54 11.14 14.20 13.95 12.74 11.09 9.18 7.10 4.91 2.77 0.92 0.00

2A11 cxx.069 rec. fracs. = 0.02, lods = 11.04
10.22 10.99 10.66 9.63 8.44 7.14 5.79 4.42 3.07 1.78 0.62 0.00

2A11 CPH3 rec. fracs. = 0.12, lods = 8.27
-9.86 0.80 6.92 8.22 8.07 7.29 6.15 4.79 3.32 1.86 0.59 0.00

Markers linked to 2D2

cxx.250 129 rec. fracs. = 0.25, lods = 2.28
-39.85 -19.12 -5.65 -0.91 1.14 2.04 2.28 2.09 1.60 0.94 0.30 0.00

2D2 cxx.250 rec. fracs. = 0.07, lods = 18.46
3.96 13.46 18.17 18.21 16.88 14.90 12.49 9.78 6.85 3.88 1.24 0.00

Markers linked to 2H12

2H12 LEI015 rec. fracs. = 0.22, lods = 3.31
-26.06 -11.27 -1.75 1.50 2.81 3.27 3.24 2.85 2.18 1.33 0.46 0.00

2H12 cxx.147 rec. fracs. = 0.16, lods = 4.26
-11.97 -3.16 2.27 3.86 4.26 4.13 3.69 3.02 2.19 1.27 0.41 0.00

(iii) Initial multipoint linkage analysis results of marker groups (build option)

Explanatory notes: The results show the order of the markers within each group, as far as the available information allows, using a Lod score cut-off of 2. For each ordered group, the recombination fraction and the distance in Kosambi centiMorgans (cM) between each pair of adjacent markers are given, and the right-hand column gives the cumulative genetic distance along the group. Where an order is uncertain, the unplaced marker is shown above the placed markers, with Xs beneath indicating its possible positions within the group.
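The distances in the maps below use the Kosambi mapping function, d = 25 ln[(1 + 2r)/(1 - 2r)] cM. The right-hand totals are running sums of the adjacent-interval distances. The Python sketch below uses hypothetical helper names. It implements the function, its inverse r = tanh(2d)/2 (with d in Morgans) and the running total, and it reproduces the cumulative positions of the 1B7 sex-averaged map from its printed interval distances.

```python
import math
from itertools import accumulate


def kosambi_cm(r):
    """Kosambi map distance in centiMorgans for recombination fraction 0 <= r < 0.5."""
    if not 0.0 <= r < 0.5:
        raise ValueError("r must lie in [0, 0.5)")
    return 25.0 * math.log((1 + 2 * r) / (1 - 2 * r))


def kosambi_r(cm):
    """Inverse of kosambi_cm: recombination fraction for a distance given in cM."""
    return 0.5 * math.tanh(2 * cm / 100.0)


def cumulative_positions(adjacent_cm):
    """Running totals of adjacent-interval distances, starting from 0.0 cM."""
    return [round(x, 1) for x in accumulate(adjacent_cm, initial=0.0)]


# Interval distances printed for the 1B7 sex-averaged map (1B7, k20, cfp.630, CPH16).
positions = cumulative_positions([6.7, 9.3, 15.3])
```

Recomputing kosambi_cm(0.07) gives about 7.05 cM rather than the printed 6.7 cM, because the printed recombination fractions are rounded to two decimal places. Conversely, kosambi_r(6.7) is about 0.067, which rounds to the printed 0.07.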

Linkage group including 1B7:

Sex_averaged map (recomb. frac., Kosambi cM):

3 1B7        0.0
     0.07   6.7
2 k20        6.7
     0.09   9.3
0 cfp.630   16.0
     0.15  15.3
1 CPH16     31.3

log10_like = -29.83

Sex-specific map (recomb. frac., Kosambi cM -- female, male):

3 1B7        0.0   0.0
     0.08   8.3   0.05   4.5
2 k20        8.3   4.5
     0.08   8.1   0.13  12.9
0 cfp.630   16.4  17.5
     0.17  18.1   0.00   0.4
1 CPH16     34.5  17.8

log10_like = -29.25

Linkage group including 1B10:

0 V1AS-D10 1 CPH20 2 F66 3 1B10

Sex_averaged map (recomb. frac., Kosambi cM):

0 V1AS-D10   0.0
     0.08   8.4
1 CPH20      8.4

* denotes recomb. frac. held fixed in this analysis

log10_like = 2.73

Sex-specific map (recomb. frac., Kosambi cM -- female, male ):

0 V1AS-D10   0.0   0.0
     0.10  10.1   0.07   7.2
1 CPH20     10.1   7.2

* denotes recomb. frac. held fixed in this analysis

log10_like = 2.74

F66    0 1 X X
       -7.91 -8.12

1B10   0 1 X X
       -7.13 -8.11

Linkage group including 1D6:

0 111 1 cxx.030 2 1D6

Sex_averaged map (recomb. frac., Kosambi cM) :

0 111        0.0
     0.02   2.2
1 cxx.030    2.2

log10_like = 3.03

Sex-specific map (recomb. frac., Kosambi cM -- female, male ):

0 111        0.0   0.0
     0.00   0.0   0.07   6.7
1 cxx.030    0.0   6.7

log10_like = 3.52

1D6 0 1 X X

Linkage group including 2A6 and 1E3:

0 cxx.349 1 k292 2 k32 3 1E3 4 2A6

Sex_averaged map (recomb. frac., Kosambi cM) :

0 cxx.349    0.0
     0.23  24.4
3 1E3       24.4
     0.16  16.2
1 k292      40.6

* denotes recomb. frac. held fixed in this analysis

log10_like = -5.67

Sex-specific map (recomb. frac., Kosambi cM -- female, male ):

0 cxx.349    0.0   0.0
     0.28  31.3   0.15  15.8
3 1E3       31.3  15.8
     0.11  10.9   0.46  77.1
1 k292      42.2  92.9

* denotes recomb. frac. held fixed in this analysis

log10_like = -4.87

k32    0 3 1 X X
       -26.66 -26.68

2A6    0 3 1 X X
       -20.61 -21.81

Linkage group including 2A11:

Sex_averaged map (recomb. frac., Kosambi cM):

0 109        0.0
     0.07   7.2
3 CPH3       7.2
     0.03   2.6
2 cxx.069    9.8
     0.06   6.0
4 2A11      15.8
     0.05   5.4
1 LEI030    21.2

log10_like = -33.86

Sex-specific map (recomb. frac., Kosambi cM -- female, male):

0 109        0.0   0.0
     0.06   5.9   0.09   9.0
3 CPH3       5.9   9.0
     0.09   9.3   0.00   0.0
2 cxx.069   15.1   9.0
     0.00   0.0   0.07   7.2
4 2A11      15.2  16.2
     0.08   8.1   0.03   3.5
1 LEI030    23.3  19.6

log10_like = -32.71

Linkage group including 2H12:

0 LEI015 1 cxx.147 2 2H12

Sex_averaged map (recomb. frac., Kosambi cM) :

0 LEI015     0.0
     0.26  28.9
1 cxx.147   28.9

log10_like = 0.81

Sex-specific map (recomb. frac., Kosambi cM -- female, male ):

0 LEI015     0.0   0.0
     0.27  30.6   0.26  28.5
1 cxx.147   30.6  28.5

log10_like = 0.82

2H12   0 1 X

(iv) Further analysis of ordered linkage groups using the all option

Explanatory notes: Each marker is given a number, and the two loci that are furthest apart but still linked are chosen as the ordered loci. The remaining loci are then inserted, and the possible orders of the markers within the group are listed at the end, each followed by its log10 likelihood. The differences between these likelihoods are what matter (see results text). The Lod score cut-off for this set of results is 3.
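The log10 likelihoods listed with each order can be compared by their deficit from the best order. A deficit of d corresponds to odds of 10^d in favour of the best order. A deficit of 3 (odds of 1000:1) is a commonly used threshold for rejecting an alternative order, but this threshold is an assumed convention and is not stated in this appendix. The Python helper below is hypothetical. Its inputs are the orders and likelihoods printed for the 1B7 and 1B10 groups.

```python
def order_support(orders, threshold=3.0):
    """Rank marker orders by log10 likelihood, best first.

    orders    : list of (order_string, log10_likelihood) pairs
    threshold : log10 deficit treated as decisive (assumed convention)
    Returns (order, deficit_vs_best, odds_against, rejected) tuples.
    """
    ranked = sorted(orders, key=lambda item: item[1], reverse=True)
    best = ranked[0][1]
    result = []
    for order, loglike in ranked:
        deficit = round(best - loglike, 3)
        result.append((order, deficit, 10 ** deficit, deficit >= threshold))
    return result


# Orders and log10 likelihoods as printed for two of the groups in this section.
group_1b7 = [("1 0 2 3", -29.832), ("0 2 3 1", -32.616)]
group_1b10 = [("1 0 2 3", -18.171), ("1 0 3 2", -18.244), ("0 1 3 2", -18.977)]
```

For the 1B7 group, the alternative order is 2.784 log10 units worse (odds of roughly 600:1), which falls short of the assumed threshold. For the 1B10 group, the two best orders differ by only 0.073 (odds of about 1.2:1), so they cannot be distinguished.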

Linkage group including 1B7:

0 cfp.630 1 CPH16 2 k20 3 1B7 ordered loci: 0 3 inserted loci: 1 2

1 0 2 3   -29.832
0 2 3 1   -32.616

Linkage group including 1B10:

0 V1AS-D10 1 CPH20 2 F66 3 1B10 ordered loci: 0 3 inserted loci: 1 2

1 0 2 3   -18.171
1 0 3 2   -18.244
0 1 3 2   -18.977
0 1 2 3   -19.013
1 2 0 3   -20.299
0 2 3 1   -20.494
0 3 2 1   -20.553

Linkage group including 1D6:

0 111 1 cxx.030 2 1D6 ordered loci: 2 1 inserted loci: 0

2 1 0   -13.457
2 0 1   -13.586

Linkage group including 2A6/1E3:

0 cxx.349 1 k292 2 k32 3 1E3 4 2A6 ordered loci: 0 1 inserted loci: 2 3 4

0 4 3 1 2   -40.095
0 4 3 2 1   -40.202
0 3 4 2 1   -41.304
0 3 4 1 2   -41.320
0 1 3 4 2   -42.695

Linkage group including 2A11:

0 109 1 LEI030 2 cxx.069 3 CPH3 4 2A11 ordered loci: 4 3 inserted loci: 0 1 2

1 4 2 3 0   -33.857
1 4 3 2 0   -36.435
1 4 2 0 3   -36.564

Linkage group including 2H12:

0 LEI015 1 cxx.147 2 2H12 ordered loci: 0 2 inserted loci: 1

0 2 1   -20.835
0 1 2   -21.255
1 0 2   -22.919

CHAPTER 6

6.1 GENERAL DISCUSSION

6.1.1 Project aims and achievements

The aim of the project was to initiate progress in combining the physical and linkage maps of the canine genome. Isolating polymorphic microsatellite repeats from canine DNA fragments, and mapping them both genetically and physically, would strengthen the connection between the two maps and so speed their integration. The main results of the project are summarised in Table 6.1, which shows the linkage groups and their chromosome assignments. Each linkage group contains at least one of the polymorphic microsatellite repeats isolated from the cosmid library. This study increased the number of linkage groups in the canine genetic map built with the DogMap reference panel from 16 to 20, and enabled chromosomal assignment of 5 further linkage groups among these 20. The project also extended two linkage groups that had already been physically mapped, and supported both the genetic linkage order and the physical mapping results for these groups. Furthermore, any marker found to be linked, in any set of reference families, to one of the 8 physically assigned polymorphic markers presented here can thereby be assigned to one of seven chromosomes, further integrating the physical and genetic maps of the dog.

Progress on the canine map as a whole is slow compared with that of the human genome mapping project. The present states of both the human and the canine maps are therefore discussed below, to highlight the differences between them and to show the significance of the work presented in this study for the canine map.

6.1.2 The progress of the human genome map

For both the mouse and the human, a final genetic map has been produced, with an average spacing between markers of 0.2cM for the mouse (Deitrich et al. 1996) and 1.6cM for the human (Dib et al. 1996). Each map consists of several thousand markers, mostly microsatellites. The density of these markers is sufficient to allow the mapping of monogenic, and possibly polygenic, traits in either species. Both maps provide a framework for the construction of a physical map based on YAC (yeast artificial chromosome) libraries (Deitrich et al. 1996)(Dib et al. 1996).

Table 6.1

Linkage group   No. of markers   Published linkage   Chromosome
                in group         group name*         assignment
1B7             4                L13                 20*
1B10            4                L15                 7
1D6             3                -                   2
1F11            2                -                   30
2A6/1E3         5                L16                 18*
2A11            5                L3                  6
2D2             2                -                   9
2H12            3                -                   -

Table 6.1 shows the 8 genetic linkage groups containing the isolated microsatellites, the number of markers in each group, the published name of any existing linkage group that was extended, and the chromosome to which the cosmid containing the microsatellite was physically mapped. * indicates linkage groups already assigned to chromosomes (Lingaas et al. 1997); these assignments were further confirmed by the FISH results for cosmids 1B7, 2A6 and 1E3.

The development of a human radiation hybrid cell line map (Stewart et al. 1997) promises to bring completion of the physical map of the genome closer than expected. This gene-specific STS (sequence-tagged site) map contains over 10,000 previously mapped genetic markers, covering almost all of the genome. Three radiation hybrid panels are available (Stewart et al. 1997); they contain chromosome fragments of different average sizes and therefore produce maps of different resolutions. A YAC-based physical map of the human genome is also in progress, although problems such as YAC chimerism and rearrangement, and the under-representation of GC-rich regions, have made alternative physical mapping methods necessary to increase the accuracy of the final map (Stewart et al. 1997).

A human EST (expressed sequence tag) map, or gene-based STS map (Berry et al. 1995)(Boguski and Schuler 1995), is also being established. A total of 65,000 EST sequences have already been placed in the public domain (Boguski and Schuler 1995); mapping these, along with others, will eventually provide a comprehensive gene map of the human genome. Combining all types of map of the human genome would allow a specific gene to be localised to a chromosome, a YAC and a hybrid cell line, and placed relative to other genetically linked markers. At least 16,000 genes have already been mapped, helping to unify the existing genetic and physical maps of the human genome (Schuler et al. 1996). When completed, the map will be useful in wide-ranging areas of human science, such as the identification of candidate genes and markers for them, and extrapolation from cytogenetic abnormalities to the gene(s) involved (Bray-Ward et al. 1996).

The high-throughput, automated approach of the human genome mapping program was made possible by world-wide contributions to its funding and organisation, and by co-operation between the researchers involved to limit duplication of work. The development of methods for error detection and for efficient, high-throughput working has enabled the human genome map to progress at an ever-increasing rate.

6.1.3 The canine genome map

Several hundred polymorphic markers have been isolated from the canine genome (Ostrander et al. 1993)(Francisco et al. 1996)(Guevara-Fujita et al. 1996), among others. Several genes have also been identified, both by physical mapping (Guevara-Fujita et al. 1996) and through their linkage to polymorphic microsatellites (Burnett et al. 1995)(Shibuya et al. 1996). At least 45 canine genes have been identified (O'Brien and Marshall Graves 1991), but compared with the genome maps of humans, mice and economically important species such as the chicken, cow, pig and sheep, the canine map is rather underdeveloped.

There are many lessons to be learnt from the progress of the human map, the main one being to ensure that the knowledge and information gained from its production are used efficiently to speed the production of less economically important maps, such as that of the dog. Large-scale automation is not possible for financial reasons, but much use can be made of the information gained from the human and other maps in progress. The eventual aim of the canine genome map is the identification of genes, especially disease-causing genes. Sufficient markers have already been identified to provide a canine map of 10cM resolution, but there is as yet no universal set of reference families with which to map them genetically. A two-generation set of reference families has been used to map over 100 markers into 16 linkage groups (Lingaas et al. 1997). It is not an ideal set of reference families: phase cannot be established with only two generations, so some recombination data are lost. The families are also pure-bred, so few markers are likely to be highly informative throughout the families. A three-generation set of cross-bred families would be ideal, and would be likely to provide much more information at any given locus. However, constraints of funding, time and housing of the animals have led to the use of existing families bred for other purposes, usually for mapping disease loci.
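The information lost when phase is unknown can be illustrated with a simplified sibship. The Python sketch below makes four assumptions: one parent is doubly heterozygous, every offspring is informative, the parent's phase cannot be read from typed grandparents, and both phases are equally likely a priori. Under these assumptions, the likelihood averages over the two ways of labelling offspring as recombinant. The function names are hypothetical, and this is a textbook model, not the method used for the DogMap families. It shows that for a clear-cut sibship, the phase-unknown Lod score is lower by almost exactly log10 2 (about 0.30).

```python
import math


def _log10_phase(theta, recombinants, nonrecombinants):
    """log10 likelihood of one phase assignment, for 0 < theta <= 0.5."""
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must lie in (0, 0.5]")
    return recombinants * math.log10(theta) + nonrecombinants * math.log10(1.0 - theta)


def lod_phase_known(theta, recombinants, nonrecombinants):
    """Lod score when a typed third generation fixes the parent's phase."""
    n = recombinants + nonrecombinants
    return _log10_phase(theta, recombinants, nonrecombinants) - n * math.log10(0.5)


def lod_phase_unknown(theta, recombinants, nonrecombinants):
    """Lod score averaging the likelihood over both phases, each with prior 1/2."""
    n = recombinants + nonrecombinants
    a = _log10_phase(theta, recombinants, nonrecombinants)
    b = _log10_phase(theta, nonrecombinants, recombinants)  # phase swapped
    m = max(a, b)
    # log10((10**a + 10**b) / 2), computed without underflow
    log_mix = m + math.log10(10 ** (a - m) + 10 ** (b - m)) - math.log10(2)
    return log_mix - n * math.log10(0.5)


# Hypothetical sibship: 2 recombinants among 20 offspring, evaluated at theta = 0.1.
known = lod_phase_known(0.1, 2, 18)
unknown = lod_phase_unknown(0.1, 2, 18)
```

Here the phase-known Lod score is about 3.20, and the phase-unknown score is about 2.90, which falls below a threshold of 3. Typing a third generation to fix phase would recover the full value.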

A further 30 linkage groups have been identified using a second reference family (Mellersh et al. 1997), and once these have been integrated with the DogMap reference panel, the linkage map of the canine genome will start to take shape.

The size and number of the canine chromosomes have meant that a standardised canine karyotype has been slow to become established. Such a karyotype has now been elucidated (M. Breen, pers. comm.), so physical mapping of the canine genome can progress more rapidly. All 78 canine chromosomes can now be identified using paint probes, and the construction of a set of chromosome-specific cosmid clones containing polymorphic microsatellite markers for use with these paints is underway (Breen et al. 1997). This will allow present and future genetic linkage groups to be physically mapped onto the canine map. Development of both radiation hybrid cell lines (Vignaux et al. 1997) and a bacterial artificial chromosome (BAC) library (Ostrander 1997) is underway, and canine-rodent hybrid cell lines representing up to 90% of the canine genome are already available (Langston et al. 1997).

One possible first step towards integrating the genetic linkage maps and the physical maps of the canine genome is the use of a common set of markers, as has been suggested for mapping the bovine genome (Fries 1993). Cosmid-derived, physically mapped polymorphic microsatellites, one or two per chromosome, could be used as reference markers (Fries 1993). This approach is being applied to the canine genome (Breen et al. 1997), and this project has made a significant contribution to its progress. The markers developed in this way could be used to place existing genetic linkage groups on a common map. Hybrid cell lines will subsequently allow the genetic linkage groups to be physically mapped. The reference markers will identify the chromosome fragments in the cell lines in two ways: directly, by producing a PCR product from a cell line, and by extrapolation, since markers linked to the reference marker on a particular chromosome would be assigned to the same chromosome as that reference marker. Genes previously identified in the dog can also be mapped onto these hybrid cell lines, giving their chromosomal locations at the same time.
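The extrapolation step described above propagates a chromosome label from each physically mapped reference marker to every marker in its linkage group. The Python sketch below uses a hypothetical function name. Its data are simplified from the linkage groups in the appendix results and the FISH assignments in Table 6.1. The function also flags any group whose reference markers disagree, which would point to a mapping or genotyping error.

```python
def assign_by_linkage(groups, anchors):
    """Give every marker in a linkage group the chromosome of its reference marker(s).

    groups  : iterable of sets of marker names, one set per linkage group
    anchors : dict mapping physically mapped reference markers to chromosomes
    Returns a dict marker -> chromosome; groups without an anchor are left out.
    """
    assignment = {}
    for group in groups:
        chromosomes = {anchors[m] for m in group if m in anchors}
        if len(chromosomes) > 1:
            raise ValueError(f"conflicting anchors in {sorted(group)}: {sorted(chromosomes)}")
        if chromosomes:
            (chromosome,) = chromosomes
            for marker in group:
                assignment[marker] = chromosome
    return assignment


# Linkage groups from this study; anchors are the FISH assignments in Table 6.1.
groups = [
    {"1B7", "k20", "cfp.630", "CPH16"},
    {"cxx.349", "k292", "k32", "1E3", "2A6"},
    {"LEI015", "cxx.147", "2H12"},
]
anchors = {"1B7": 20, "1E3": 18, "2A6": 18}
chromosome_of = assign_by_linkage(groups, anchors)
```

The 2H12 group has no physically assigned marker, so it stays unassigned, matching its entry in Table 6.1.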

6.1.4 Comparative mapping

Gene sequences are conserved between species, which makes them ideal Type I loci for comparative mapping. Such markers have been called 'anchored reference loci' (O'Brien et al. 1993), and 380 were initially proposed by the Committee on Comparative Gene Mapping (O'Brien and Marshall Graves 1991). The loci were chosen for their equivalent spacing in the human and mouse genomes. This group of markers has since been drawn on in developing further sets of anchored reference loci (O'Brien et al. 1993)(Venta et al. 1996)(Lyons et al. 1997). These loci have been given various names, including universal mammalian sequence-tagged sites (UM-STSs) (Venta et al. 1996) and comparative anchor-tagged sites (CATS) (Lyons et al. 1997). The UM-STSs were developed specifically for the canine genome, and 86 genes have been identified by PCR that can be mapped by FISH and with hybrid cell lines. Some of these anchor loci have been found to contain microsatellite repeats, suggesting that they could also be used in linkage mapping. The sequence surrounding the microsatellite in cosmid 1E3 showed high homology with the 3' UTR of the IGF II gene. If further sequence analysis confirmed the presence of the coding sequence of the gene itself, it would be an ideal UM-STS. IGF II is one of the anchor reference loci proposed in several papers (O'Brien et al. 1993)(Venta et al. 1996)(Lyons et al. 1997); it has been mapped to chromosome 18q21 of the dog and is also part of an already published genetic linkage group, L16, previously assigned to the same chromosome (Lingaas et al. 1997). Details of the comparison between genomes using this locus are given in the discussion to Chapter 4, pg 124.

A good example of the use of comparative genetics is the set of CATS optimised for the cat (Lyons et al. 1997). These loci were taken from previously recommended anchored reference loci (O'Brien et al. 1993), many of which have been physically mapped in the cat genome (O'Brien et al. 1997). Comparative mapping of this kind, along with reciprocal chromosome painting, has allowed the chromosomal rearrangements between human and cat to be largely determined, revealing a great deal of synteny between the two mammalian orders. Comparative mapping by hybridising chromosome paints from one species to the chromosomes of another is called ZOO-FISH (Scherthan et al. 1994), and has been used with human chromosome paints to compare the syntenic groupings of different mammalian orders. The current development of canine chromosome paints will enable direct syntenic mapping against well-mapped mammals such as the human or mouse, speeding the mapping of genes onto the canine genome.

The potential of these anchor loci for mapping the canine genome is enormous, and the advent of canine cDNA libraries will allow further genes to be identified by their homology to genes already mapped in other species. Some synteny has already been found between the dog and other species: loci on human chromosome 17 have been found to map to canine chromosomes 9 and 5. The gene order was found to be essentially the same, with only one inversion (Werner et al. 1997), again illustrating the evolutionary conservation of syntenic groups of gene loci.

6.1.5 The future of canine genome mapping

Maximum use needs to be made of the resources available through other genome mapping projects, notably those of the human and mouse, which could significantly aid the production of a comprehensive map of the canine genome. Microsatellite markers need to be identified and genetically mapped in one of two ways:

1. Using a universal set of reference families, preferably three-generation and made up of cross-bred animals, so that all researchers can place their identified markers into the same linkage groups.
2. Using reference loci, consisting of one or two physically mapped, polymorphic microsatellite-containing cosmids per chromosome. This would enable linkage groups built from markers typed in different sets of reference families to be placed onto a common linkage map, and also assigned a chromosomal location.

Full use should be made of the suggested anchored reference loci, in conjunction with the developing canine cDNA libraries from which canine-specific genes can be identified. With canine hybrid cell lines and BAC libraries, these genes can then be physically mapped onto the canine genome efficiently. Comparing the canine genome with those of other animals may then provide important information about the evolution of the canids and about the possible reasons for conservation of synteny between species.

The development of physically mapped, cosmid-derived microsatellite markers would enable integration of the physical and genetic maps of the dog, and of linkage groups based on different reference families. Work to combine the linkage groups identified with different canine reference families, by mapping a subset of markers from each linkage group in the other reference family, is ongoing and will aid the progress of the canine linkage map.

6.2 Future work

The results of this study have opened up further possible avenues of research into areas of canine biology about which relatively little is known at present. The polymorphic microsatellite discovered in the pseudoautosomal region of the dog could be investigated further using a larger sample of three-generation families, to assess more closely its position relative to the sex-specific region of the canine sex chromosomes. Although no sex-linkage was observed, some sex-linkage may have gone undetected in the small sample used in this study. Such linkage would be detectable only with a larger sample, as in the study of a pseudoautosomal pig minisatellite (Signer et al. 1994). The results of the pig study were breed-specific, showing loose sex-linkage in one breed and no linkage in a second. If the same holds in the dog, linkage results may also be breed-specific, and a number of studies could be carried out. These would include a study of a large number of three-generation pedigrees to test for breed-specific differences in sex-linkage. A further study of cross-bred, three-generation families could also be carried out to determine the general extent of linkage, especially if the breed-specific studies reveal no differences. These studies would be limited until a further polymorphic pseudoautosomal locus is discovered, allowing recombination with that locus, as well as sex-linkage, to be assessed. This would provide much more information about this little-known region in the dog. A marker proximal to the pseudoautosomal boundary would enable female meioses to be studied in this region, and so allow male and female recombination rates in the dog to be compared; in humans, recombination in this region is 10-fold greater in male meioses than in female meioses (Henke et al. 1993).
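The expected Lod score per meiosis gives a rough idea of how much larger a sample is needed. The Python sketch below assumes phase-known, fully informative meioses, an idealisation, since real pedigrees contribute less. Under that assumption, each meiosis contributes on average θ log10(2θ) + (1 - θ) log10[2(1 - θ)] at the true recombination fraction θ, so the number of meioses needed for an expected Lod score of 3 rises steeply as linkage loosens. The function names are hypothetical.

```python
import math


def elod_per_meiosis(theta):
    """Expected Lod score from one phase-known, fully informative meiosis."""
    if not 0.0 < theta < 0.5:
        raise ValueError("theta must lie strictly between 0 and 0.5")
    return theta * math.log10(2 * theta) + (1 - theta) * math.log10(2 * (1 - theta))


def meioses_needed(theta, target_lod=3.0):
    """Smallest number of such meioses whose expected Lod score reaches target_lod."""
    return math.ceil(target_lod / elod_per_meiosis(theta))
```

Under these assumptions, about 19 meioses suffice at θ = 0.1, but about 84 are needed at θ = 0.3. Loose sex-linkage of the kind reported for the pig would therefore easily be missed in a small sample.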

The identification of a sequence surrounding a microsatellite that shows high homology to the 3' UTR of IGF II, on canine chromosome 18q21, invites further study to confirm the presence of the coding sequence of the gene, and hence the extent of possible synteny between this region of the canine chromosome and human chromosome 11p. Since the sequence homologous to the gene spanned the entire subclone insert, a restriction map of the cosmid insert from which the subclone was taken would need to be made. This would allow the sequences on either side of the subcloned insert to be identified and then sequenced, confirming the presence of the coding sequence of canine IGF II. As stated above, this study's linkage analysis places 1E3 as one of five linked loci in this region of the dog genome. A combination of FISH with human probes and paints, and linkage analysis using nearby conserved gene loci in the human and other animals such as the mouse and cattle, could be used to establish the extent of synteny and to confirm the presence of IGF II by synteny. The details of the synteny between human chromosome 17 and canine chromosomes 9 and 5 were established using a combination of FISH and linkage mapping (Werner et al. 1997). A human chromosome 17 paint was used to identify the parts of canine chromosomes 9 and 5 that are homologous to the human chromosome. Canine genomic clones containing genes from the human syntenic region were then identified using human and mouse cDNAs. The dog-specific clones were screened for polymorphic DNA, which was used to create a linkage map of the region. A similar approach could be used to test for synteny between canine chromosome 18q and human chromosome 11p. The canine chromosome paints (Langford et al. 1996) could be used to identify human chromosomes with homology to canine chromosome 18.
CATS, consisting of gene-specific primer pairs, have been identified for IGF II and for nearby loci (Lyons et al. 1997), both telomeric and centromeric to IGF II (Venta et al. 1996). With optimisation, these primers could be used in the dog to identify canine clones containing the genes in question. Once such a canine clone has been identified, it should be possible to establish synteny. If such a clone cannot be identified, the equivalent human clones could be used in FISH analysis to establish synteny.

Conclusions

The project has increased the size and detail of the present canine physical and linkage maps, and has contributed towards the integration of the two maps. A pseudoautosomal marker for the dog has been identified, and there is also evidence suggesting the possible identification of the IGF II gene in the dog. In general, the isolated microsatellites have proved to be good markers for the construction of the canine genome map, despite the high mutation rates observed in a small proportion of them.

GENERAL APPENDIX

(i)

Sequences of microsatellite-containing plasmid subclones, indicating the sequence used in BLAST searches, primer sequences and repeat sequences

Explanatory notes: Groups of letters in BOLD type show the microsatellites; single letters in BOLD type mark positions where an N has been changed to A, C, G or T by comparison with the overlapping region of the complementary sequence. Letters in ITALICS show sequence not included in BLAST searches, either because it was vector sequence or because it contained a large number of undetermined bases (Ns). UNDERLINED sequences are primer sequences. An asterisk (*) indicates that the primer was designed on that sequence, although the complementary primer sequence may also be underlined on the complementary sequence. Differences specific to particular sequences are explained with the sequence.

(CA)n*(GT)n MICROSATELLITE REPEATS

1B10

1B10F:

GNNNNNNNNN NNNNNNNNNN GANTNAATCG GTTTCACGCG GTGGCGGGCG CTCTAGAATA GTGGACAAAA GAGNANTCCT AGCATCCCAA GAGTCCCGTC ATCTCTTCTC ACCTTATACA GATAACCACT CATCTAATGT CTAGCTGTAG TAAATAGTCT TGCTTGTCAC AATCAGTAAA TATTCTGTTT GTTTACTTGT TGATTTTATG TTTCTCCTAT TAGAGCATCA ACTCCCAAGA AGAGAACCTG TTTATTTAGT TCTAAGCCCA GTGTCTAGCT CAGTTCTTAC ACATAACAAG TTTGCAATGC ATTTTTGTGT CTAACTCCTT CTTGTCTTTA GACTCAATAG TAGCTCCCAA GTGAATCACC AAATTTGGCA AAATTGATAT TCAAACATGG TGTGTGTGTG TGTGTGTGTG TGTGTGTGTG TGTGTTTNCA CATGGTGTCA AAGAACGGAT CGTTGGGAAC CGGANCTGAA TNAACCNNNC CNNNCGACNA GCGTGACCCN CNANGCCNGC NNCNNNGGCN ACNACNTNGC GCNNACTANN NNCNGGNNAA CNACNNNNCN NNCNNCCCGG NNNCNNNNNN NANACGGATG GANGCGGATN NNNTNGCNNG ACNNNNCTGC GCNCGGCCNN CNGGNNGGNN GGTNNANGNN GANNNNCNGG ACCNGNGAGC GTGGGTCCNC NGNNCNNNNN NNNNCNGGGG GCCNNNAANG NNNCCCCCCC

1B10R:

NNGANACTTA TTCGAATCCT GCAGCCCGGG GGATCTTCAG CATCTTTTAC TTTCACCAGC GTNTCTGGGT GAGCAAAAAC AGGAAGGCAA AATGCCGCAA AAAAGGGAAT AAGGGCGACA CGGAAATGTT GAATACTCAT ACTCTTCCTT TTTCAATATT ATTGAANCAT TTATCAGGGT TATTGTCTCA TGAGCGGATA CATATTTGAA TGTTTTTANA AAAATAAACA AATAGGGGTT CCGCGCACAT TTCNCCGAAA AGTGCCACCC GACNTCNTAA GAAACCCATT ATTTNCCTGA CNTTNNCCNT ATAAAAATTN GCGTATNACG AAGCCCTTTC GTCTTCAANA ATTCGCGGNC GCAATTNACC CTCACTAAAG GGATCTGTCT ATTTCGTTCA TCCATANTTG CCTNNCTTCC TCCTNNTTGN TTACTCTTTC CAATNTTNTA NGGGCTTTNC TCCTNNNCCC NTTGTTTCTC TTTTTAACCN TTTNNAAACC CCTNTCNCCT TTTNTCCCCT NNTTTCTTCT CNCTTTTTTT CTTCNTCCCT CTCCGCAANG NCNCGATAAT CNCCTTAATN TTGTTCTCTT CNTGTATTAC TCANCCCCTC CTCTCCACTC CCTTTCTCTT ATTNCCCTCT CNTNTGNCNC NCGGGNANNT CCCCTCCACT CACTCTNTTT TCTCCTCTCN CCCTCTNTCT NTCATTCCNG CCCCTTTTNC CCTCCNGTCN CCCTCATNTT NCNCTCCCCN CNGNTTCTAT NCNTCGTTTC NTTTTCTCTC CCTTCNCTTT TNNTGTCNNT GNGNCTNTCT TTCNNTTTNN CCCTCTCCNC CTCTCTTTTC NCTACTCTCN TCTCTTTCTC TNTNNCANCC TTCCTTTTTT CTCCACCTNC TTTCCNTTTT TTTNTCCATT TTNTCCCTNT CTTNTCTCNT CTCCCACCTN CTCTCTCNGT CTTCCTCNGC CTN

1D6

1D6F:

NNNNGATTGG ACCCACGCGG TGGCGGCCGC TCTAGAACTA GTGGATCCAC TCTAGCTCCA TCCACATCAT TGCAAATGGT AAGATTTCAT TCTATGTGAC GGCTGAGTAA TATTACACAC ACACACACAC ACACACACAC ACACGCCCGC CACACCACGT CTTCCTTACG TGTTTATCAA AGGACATTTG GGCTCTTTCC CTAGTTTGGC TATTGTCGAT GCTGCTGCTA TAAACCTAGG GGTGCATGTA TCCCTCTGAA TTAGTATTTT GCCTCTTTTG GGTAAATACC CAGTAATGCA GTCCCTGGAT CCCCCGGGCT GCAGGAATTC GATATCAAGC TTATCGATAC CGTCGACTCG AGGGGGGCCG GTACCACTTT TGTCCCTTTA TGAAGGTAAT TGCCCTGGCG TACATGTCTA CGTTCTGTTG AATGTACCCC CATCANCAAA CACCGACAAA TTAANCGGTG NATATACTAT CATATGNTGC CCANGCCCTC ATCGGAACGT TCACCTATAT CGCACCCGGA AGGGTGGTTG GNCTCCTCCC CATACCGNCC GCTCGTGGGA

1D6R:

TNNNCTTTNA TAACTTGAGC CGTTTCCGNC ACCCGGGGGA NGCATGGACT GCACTACTGG GTATTTACCC AAAAGAGGCA AAAATACTAA TTCAGANGGA TNCATGCACC CCTAGGTTTA TAGCAGCAGC ATCNACAATA GCCAAACTAG GGAAAGAGCC CAAATGTCCT TTGATAAACA CGTAAGGAAA ACGTGGTGTG GCGGGCGTGT GTGTGTGTGT GTGTGTGTGT GTGTGTAANA TGACTCAGCC GTCACATAGA ATGAAATCTT NCCATTTGCA ATGATGTTGG ATNGAGCTAG ANTGGATCCA CTATTTCTAG AACGGCCGCC ACCGNGGTGG ACTCCAATTC GCCCTATANT GAGTCGGNNG NCGCGCGCTC ACTGGCCGNC GTTTTACAAC GTCGTGACTG GGAAAACCCC GNCGTTACCC AACTTAATCN CCTTGCAGCA CATCCCCCTT TCGCCAGNTG GCGTNGATAA CAAAAAGNNC GCACCGATCG CCCTTCNCAA CAGTTGCGNA GCCNGAAATG GCGAATNGGG ACCCGCCCTT GTTTGCTGGC GCCNTTAAAN CCCCGCGGGG TGTTGGGTGG GTTTNCCCCC CCCCGTNNAA NCGCCTACTN NTTNCNANGC GGCCCTTAAA NCGCCCGGGC ANCCTATTCC GNTTTGCCTT CCCCTGGCCC TTTNCTCCGN CCNANGTTTC CCCCCGGGGT TTTCCCCCCG TCCAAAGCCC CTAAATTCNN GGGGGGCNNC CNTTTAAG

1E3

1E3F:

GNGTAAGTAT TNACCNTCGG CNTTTGGCAA ACTANAATAG GGCCTCCCCC TCCAACAGGC TGAGGAGATC CTAGTAACAC CTCTAAAAAT GTACAAACTC AATTGGCTTT CATAACCCCC CAAAATTACC CCCCCCANAT TATCCCAAAT TACACAACAG AAAAACCTAC GAATTTCTGA AAATATCTCT CTCTCTCTCT CTCTCTCTCT CTCTCACACA CACACACACA CACACACACA CATCAGTCCC CTTAAAAACA ATTTGGNTTT TTTAGGAACA CCAGCAAAAT TAATTAGTTA AAAAAAAACA ACCCTAAAAT TNTTTTTTTN TTTTTAAAAA AAAAAAAAAA ATACCAAAAC CAAATTGGNT TAAAAACAAT TGGCATCNTG AAAGAATTAG GGTGNAGGAA TTCGATATCA AGCTTATCGA TACCGTCCAA CCTCGANGGG GGGCCGGTAC CCANCTTTTG TTCCCCTTTA GTGANGGTTA ATTGGCGCGC TTGGGNGTAA TNCANGGGTC ATAAGCTGTT TTNCTGGTGT TGAAAATTGG TTANCCGGCT CCACAAATTC CCCCNCNACA TTNCGAAACC CGGGGAAGCA TTAAATGTTG TTTNAANCCC TGGGGGGGTG GCCTTAAATG AAATTGGAGC NTAANCTTCC CCTTTTAAAT TTGGGGTTTG GNGGCCTNCN TTGGGCCCGC NTTTTCCCTN TNCNGGGGAA AAAACTTGGT CCGTTGGCCA GGCTTGGGCT TTTTAAATGN AAAATCCGGG GCTAACCCNC CCNGGGGGGA AAAA

1E3R:

CNNCNNNNTT TTNACNAATC CGCCACCTAA TTCTTTCATG ATGCCAATTG TTTTTAANCC AATTTGGTTT TGGTATTTTT TTTTTTTTTT TTTAGCCAAT TGATAATTTT AGGGTTGTTT TTTTTTAACT AATTAATTTT GCTGGTGTTC CTAAAAAGCC AATTTGTTTT AAGGGGACNG ATGTGTGTGT GTGTGTGTGT GTGTGTGTGA GANANAGAGA GAGAGAGAGA GAGAGAGATA TTTTCNGAAA TTCNTANGTT TTTCNGTTGT GTTAATTGGG GAAAAATTTG GGGGGGGTAA TTTTGGGGGG TTATGAAAGC CANTTTANTT TGTACNTTTT TAGAGGTGTT ACNANGATCT CCTCNCCCTG ATGGAGGGGG ATCCACTANT TCTAGAGCGG CCGCCACCGC GGTGGANCTC CNATTCNCCC NATANTGANT CGTATTACGC GCGCTCACTG GCCGTCCTTT TACAACNTCN TGACTGGGAA AACCCTGGCG TTACCCAACT TAATCGCCTT GCAGCACATC CCCCTTTCNC CAGCTGGCGT AATAACAAAA AAGCCGCACC GATCGCCCCT CCCAACAATT GCGCNCCCTG AATGGCGAAT GGGGACCCCC CTGTTTCCGC GCATTAAACN CCGCGGGTGT TGTGGTTACC CCCNACCTTG ACCGCTACAC TTGCCAANCG CCCNAACNCC CCNC

1E7

1E7F:

NNNNNNNTTN NATTGGACTG CCCNCGGTGG NGGCCGCTCT AAAANAATGG ATCCTTATGC CAACTGGTTG AAANATGTGT TTTCCAGGAG GGTAGATTAT GGAAGCGTAT TGAANACGCC ATCGTGCCCT CTGCAAAGTG £CCCGCTANC TATATGTGTG TGTGTGTGTG TGTGTGTGTG TGTGTGTGTG TGTGTGATNT CCCTAGGACC CTCTGTCTTC TGCCGANNTS. QATAAQTC M ,GCGGTCAGTG ACACATGTGC GTTGTTCCCC CANNGCCCCN GGGCCNAAGG NNGACACATG GTGAGCCTTG GGAGNGTCTT GTTTANTGAA TGAATGAATA AACAAAAGAA CTGATTCATC CCNATAAGAC CNNNCACNTC ATATGAAGCA TCCATAATGC TGGCGTATCA TGGCGTTGAT TTATCTCNAA ACAGTCANCC CAGANCTCTC NNNGGGCAGA ANCACAACAT TTGAATCCTC TGCNCCNATT ATGTAGANAA ACCCCTCNAA AATTNANTTG GTTCCCGATG TTTTCCCGGG TTAAAACNTN ATNCCTNGGA ANCCNACTNG CCTAAAANGG TTTNNCNTCC AACCACNACA GAANANATCT TTTCNCTNAN TTTCCAAANT TCTTTTGGGG GAAGATCCCC CGGGCTGCAN GAAATCCACT NCATCCTTTT TCTANTNCCT CCACCCCGAA AGGGGGGGCC NGGGTTCCCA NCTTTGGTTC CCCCCNATNN AAGGGTTTAT TNCCCN

1E7R:

NNGANACTGN NCGAATCCTG CAGCCCGGGG GATCCTCCCC CAAAAGCACT TTGAAACTCT GTTAAATGAT GTCTCTGTCT GCTTGGANAA AAACCTTTTA GGCAGTTGGT TTCCAGGCAT CTGTTTATTT GGCTACATCC TGAACAAATC AATTTTGANA GGTTATCTAA ATAATTGGTG CANATGAATT CAAATGTTGT GTTTCTGCCC TTGANANCTC TGGGTTGACT GTTTCGANAT AAATCAAAGC CCTGATACCC NGCATTCTGG ATGCTTCATA TAATGTGCTT GGGTCTTATC GGGAATGAAT CANTTCTTTT GTTTATTCAT TCNTTCATTA AACAACGACT TTCCCAAGGC TCACCATGTG TCCGCCATCG CCCGGGGCAC TGGGANAACA ACGCACATGT GTCNCTGACC GCATGACCTA TCCAAATCGG CAGAAAACAG ANGGTCCTAA GGAAATCACA CACACACACA CTCACACACN CACACACACA CACACNCANA TATATAGCGG GGCACTTTGC NNAAGGCACG AANGGCGTCT TCCAATACCC TTCCCATAAT CTAACCCCCC CTGGGAAAAA CACATCCTTT TCCAACCAAN TTGGGCNTTA AAGGGATCCC CCTTAANTTT CCCAAAAAAC CGGGGCCGNC CCCCCGNCGG GNGGGNAAAC TCCCAAATTT CCNCCCCAAA AAANTGGAAN TTCNTNNNTN AACCCCCNCN CCCCCCCCNG GGCCCGNCCC NTTTTTAAAA AACCNTTCCT TTAAACTGGG GGNAAAAAAC CCCTGGGGNT NTTTNCCNAA NN

1E12

1E12F:

NNNTTGGACC CACGCGTNGC GGCCGCTCTA GAATAGTGGA TCACAGGTGA CAGGTGCAAA GTCATGGTTT GGGATATACA CAGATATGGA AATGCATGTA GATACATGTG TGTAAAAATG TGTGTGTGTG TGTGTGTGTG TGTGCACGCA CGCACGAAAT GCATCCCTTT GCCCCCCACC CCANCACCGA NAGGTCCTAN GACANAAACC CCCCAATCCT GACTTCCAAT TCCCATTCTC CATAAACAGG AACCAAAATC CCCNGGCTGC AGGAATTCNA TATCAAGCTT ATCGATACCG TCGACCTCGA GGGGGGGCCC GGTACCCAGC TTTTGTTCCC TTTANTGAGG GTTAATTGCG CGCTTGGCGT AATCATGGTC ATAACTGTTT CCTGTGTGAA ATTGTTATCC GCTCACAATT CCNCACAACA TACGANCCGG GANCATAAAG TGTNAAGCCT GGGGTGCCTA ATGAGTGANC TAACTCNCAT TAATTGCGTT GCGCTCACTG CCCGCTTTCC AGTCNGGAAA CTGTTCNTGC CANCTGCATT AATNAATCNG CCAACCCCGG GGAAAAGCGG TTTGCTTTTT GGGGGCTCTT CCGCTTCCTC CGCTCAATAA ATCCCTGCCC CCCGGTCTTT CGGGCTGCNG GGAAACNGGT TTCACTCCCC TCCAAAGGGN GGTAATAACG GTTTTCCN

1E12R:

NNGGATATTT GATATCGAAT TCGGCCTNCC GGGGGATCTT GGCTCNTGTT TATGGANAAT GGGAATTGGA AGTCAGGATT GGGGGGTTTC TGTCCTAGGA CCTCTCGGTG ATGGGGTGGG GGGCAAANGG ATGCATCTCG TGCGTNCGTG CACACNCACA CACACANACA CACNCACTCT TACACACATG TATCTACATN CATTTCCATA TCTGTGTATA TCCCAAACCA TGACTTTGCA CCTGTCACCT GTGATCCACT AGTTCTAAAA CCGGCCGCCA CCGCGGTGGG ANCTCCAATT CGCCCTATAT TGAGTCNTAT TACGCGCGCT CACTGGCCGT CGTTTTACAA CGTCGTGACT GGGAAAACCC TGGCGTTACC CAACTTAATC GCCTTGNAGC ACATCCCCCT TTCGCCAGNT GGCGTNATAT CGAAAAAGNC CGCACGATNN CCCTTCCCAA CAGTTGCGNC NGCCTGAATG GCNAATGGGA CCCGCCNTGT NTCGGCGCAT TAANCNCGGC GGGGTGTNGT GGGTTACTCC CCANCGTTGA ACCGCTTACA CTTGCNAACG GCCCTAAACG GCCCGCTCCC TTTCCGCTTT TCCTTCCCNT TCCTTTTTCT TNGCCCACNN TTNCGCCGGG GNTTTTCCCC CCGTCCAAAN CTNTTAAAAT CGGGGGGGGG CTCCCCTTTT NANGGGGTTC CCAAATTTNA ATTGCTTTTA ANGGGNACCC TCNAAACCCC CAANAAAAAC TTTN

1F5

1F5F:

TNNNNNNNGT TTGGACTCCA CGGCGTNGCG GCGCTCTAGA ATAGTGGATC CATCCTTATG ACGGGGAAAC TAAGGAAAGT GGGCCAAGTG ACTGCCTGAG GTCACATGTN CAGGATGTGG CCGCGTCTGG ATTCGAATCC AGAGCCGGGC GCCTGGCCTC CCACAGCCTT GCCACGTCCC CATCTCGGAG TGCANAAAAG GAGCACCTCC CTGAGCTCCT CCTTCANAAA NCAGTTGCTC AGGACCTCGG TCTTCCCAGC CGTTGGGGTT TCTGCTTCTG GAAAATNCTG TTTCAAAAGG AAACGCTGCT NAAAGCTNAN ATACCTTGCC TGTNGGGCGG GGANCAAAAA AAGCGTCTGG GANGCTCGTA ACTGCCCCCT GGGAATGTTT TTCGACTGTC NTCNGAAAAA NAATNAGAAC TCGTTCACAA TNNCCGGGAA AGATTCNNCN TATTTTNACC ACGGTCAGCC CTCTCTCTCT CTGTCTCTCT GTCTCTGTCT CTCGTTCACN CNCNCNCNCN CNCTCNCTCT TNAAGGCGGG AANACNCCNC AAAACTNAAA CAACNTNCTN AANACCCTAN CCCCCCTANT TATTCCCCCC GGGGTGCCGG GAANTTCCCA ATTTTCNAAC TTTTTNCNAT TNCCCTTTCA AACCTCCAAA AGGGGGGGGC CNGGGGNNCC CNNNNNNTNN GTTNCCCCNC NNANTNANGG GGTTNANNNC CTCCCCTTGG GNNT

1F5R:

NTNNNNGATN ANTTGATATC GANTCGCGCA GCCCGGGGGA TCATCTGTGT GCTTGTGTCT CAGGGATGTT GTCTCTGTTT CTGGGTGTTT CCTGCCTCTG TGTGTGTGTG TGTGTGTGTG TGCACGAGAG ACAGAGACAG AGAGACAGAG AGAGAGAGAG GCTGACCGTG GTCTAAATCT GCTGAATCTT TCCCCGTGAC TCTGAACGAG CTCTTACTCT TTTTCCGACG ACAGTCGAAA CACATTCCCA GCGGGCAGTT ACGAACCTCC CAGACGCCTC TTCTGGTCCC CGCCCCACAG GCNAGGCTAT CTCAGCCTTC AGCAGCGTTT CCTTTTGAAA CAGCATCTTC CANAAGCAGA AAACCCCNAC GGCTGGGAAG ACCGAGGTCC TGAGCAACTG CTCTCTGAAN GAAGAGCTCA GGGAGGTGCT CCTTTTCTGC ACTCCGAAAT NGGGACGTGG CTAGGCTGTG GGANGCCANN CCCCNGCTCT GGATCNAATC CAGACGCGCC ACNTCCTGTN CTTTNACCTC AGGCAGTCAC TTTGGCCACT TTCCCTAATT TCCCCGTCCT TNANGATGGA TCCACTANTT TCTANANCNG CCCCCACCGC GGTGGAACTC CNATTCNCCC ATAATTGAGT CTTTTTNCCC NCGCNCNNGG GCCGTCTTTT TACANCTNCC TNANTGGGGA AAACCNGNCT TTCCCC

1F11

1F11F:

NNNNNNNTTG TTTGGACCCN CGGGGTGGCG GCCGCTCTAG AATAGTGGAT CTGGAGCAAA CCCTTCTCCC TCTGGAGGCC ACAGGTTCCT GATATGTAAA ATGAGAGGGT TCATTTAGAG TTAATCTGGA AGTTTCTTCC AAGTTCTGTG TTCTGTATTA TGTCCATGAT TCTGTCTTTT TGCTGTCAAC AGATAATGAG CCACCCTCAG TGACCAAAAA GAGAAGAAAA GCCCCTTTAG CAATTTTCTC GCAAAACAGT TGTGTTAAAC TATACCAAGA ACCTCCAAGA CCAAGTGATT CTCACCTTCC CTTGCCTAAA CCAAATTGAA GTAAACAAGT TGCCTTTGAT TCTGCCCCCC ACCTGTGTTT CTCTAGAAAG ACTGGTTAGA TTTCAGAGTT CTGATGCTCC TTCTTGCTTA ACCTTTGGTT GCATGCACAC GCANGCGTGT * GTGTGTGTGT GTGTGTGTGT GTGTGTGTGT GTGTAAAAAA AAATCCCCNC CTGTTTCCTN GGNCCANAAA TAAAAGCCAA ACCGGNNNGC CCTAANATTC CCCTNGNCTC NTTGAACTGA NGGNGGGGGG GGNCCCGAAA AAAATTCCCC NGGGTNGCNG GAATTNCNAN NTTCNACCTT TTCGATACCG NCCNAACCCC NAAGGGGGGG GCCGGGNGCC CACTTTTNGT CCCCNTTTAA TTGNAGGGTT AATTNNCCNC CTTGNNNTN

1F11R:

NNTNNNGATA ATTTGATATC GAATTCGCGC ANCCCGGGGG ATCTCTCAGT AACCACCCAC AGTCAGTTCA ATTAGCCAGA GAATTTTATT TACAACAGTT TGGCTTTATT TTTGGCAAGG AAACACGTGT GGATTTTCTA TACACACACA CACACACACA CACACANACA CNCACNCACG CCTGCGTGTG CATGCNACCA AAGGTNNATC TAGAAGGANC ATCAGAACTC TNANATCTAA CCANTCTTTC TAGAGAAACA CANGTGGGGG GCAGAATCNA AGGCAACTTG TTTACTTCAA TTTGGTTTAG GCCAGGGAAG GTNANAATCA CTTGGTCTTG GAGGTTCTTG GTATANTTTT ACACANCTGT TTTGCGAGAA AATTGCTAAA GGGGCTTTTC TTCTCTTTTT GGTCACTGAG GGTGGCTCAT TATCTGTTGA CAGCNAAAAG KCAGAATCAT GGACATANTN CAGAACTCAG AACTTGGAAG AAACTTCCNN ATTAACTCTT AAATGAACCC TCTCATTTTA CATATCAGGA ACCTGTGGCC TCCCANAAGG GANAANGGTT TGCTCCCAAA TCCACTAATT CTANANCNGC CCCCCCCGCN GTGGGAACTN CCAATCCNCC CTATAAGTGA ANTCCTTTTA NCCCGCGCTC NCTGGCCGGC CGTTTTAAAA CGTTCNTGAC NGGGGAAAAA NCCTGGCGTT TNCCCNNTTG

2A6

2A6F:

CNCTTCNTTT TGGACTCCAC GCGGTGGCGG CCGCTCTAAA ATANTGGATC CTGTGACAAT GTCTTTATTC ATAATGAGGG AAAAGATGTG TGTGTGTGTG TGTGTGTGTG TGCGCGCGCG TGCGCGTGTG TGTGTGTGCG CGTGCATCGG GGGGCGGATG GTCACCTTCT CCTTGCGCTG ACCCCCACGG CGCCCTGGCT CGGGGGCTTC GGGGACCAAA CGCTANGGGG AAGCGGGAAG CCCAACAACT TCTGGAAAAA ACACGGGGGT CANGGTGCAA ATGCCAANCT GGGGCTGTGG GGGCCCCNCC CGGGGGACCN TCCTTGGGGT GGGTGGGGCC GTGNAACTCA CTGTGTCCTG CCCAGGGGCG ACCGACAAAC ANCTGTGCCC CACTCANAAA TGGCTCCTCA CNTCAGCCCC GTCCCCGTGC TCTTCCANCC ACCGCATGCC CATANAAAGG CACGGCTTGG GCANTCTGGC TCCNGCACCC CNCCTCCTCC TTNGCCCCAT CCTTGGCCCC CNCGCCTCCC NNCTNTCCTG CANGCANAAG AAAAACANGA NANCNCGGAN CTNNTCNAAG CCNTNGCNNG GCNGGAAANC CNGGGCTNCN CCTTTNCTNG NCCCNNNCCA GAANCCCNNN CNGNAANNAG GGGCCGGTTN CCCCNNANNC CCCTAAAAAN NNNGAACCNC CNNNNN

2A6R:

TNNTTTNNTT TGAAGCCATT CCTGCNGCCC GGGGGANTCG TCTGANTGCT CCAACCCNAT AAACCCTGCC CAGCCTTGGC CTTCCCAGGG GCTGCTGGGT GGGGGCTTTC CANGCCACAA GTCCCCCCCG GGCACTGGGA CTTCTCACCA CCATGGACAN GAAAATCCCC GTTTCCTTAN TCACCTTGTG AAATGANAAC CTGCCTCACC CTGGGTCTCG GTTTCATTCT GACACCACAC TGGGGGGATN AAATTTNTTG GGTCCAAAGT TCTANGGCCT GGGGGCANCC NGNCCTTGTC CAGCAAGGGC TCNTTGATCG ATCCANCATA TGTGCATNCC CNGNCTCTCN TCTGCCCAAT NNCTCGGNNA AGTCNGCGCT GTNNTGCTTC TCCTCTGCCT GCANGANANN NTNNGAANTG GGGGGGCCAA TGATTGGGTN AATGAAGGAN GTGGGGTTTC TGAACNNANA NTNTCCTTTC CNTGCCTTTC TTATTTGNTN CGTTTGTCTG GAANAANANN GGGAANNGGG NCNTACTNTN ANGAAACNAT TTCTTTAATN TNGTCNCATT TTTTTCTTTT CNGTCCTCCC CTTCNGGCTT GGATATNCAT TNAATTCTTC NNTGGGTTTC CACCCTTATT TCTNCAANCN AAATTTTNTN CCTTTCCNTT TAAGNGGGTT TTCNTCATTT TCCTTTNANN TCTCTTCTAT ACTTTTTTNT CTCCTTATAT NCTCTCCTTC TGTTNCTTTT TTTTTGAAAT N

2A11

2A11F:

NNNNNNGNTT GTACCCACGN GGTGGCGGCC GCTCTAGAAT AGTGGATCCA CCCTCAAGGA GTAAACTGTG AAACAGATAA AATTGTAACA GTCTGAATCA TGGAGATGTA ACTTTAGTTT GAAATCTGCC TTTTTTGAAG AACCTCATTC ACCCATGGTG TTCGGTAAAT GGAACATTAC TACATGTAGT TTGTTACAGG TGTGTGCGTG TGTGTGTGTG TGTGTGTGTG TGAAANTTAA AATACATATC NCAGAAATTT ACAACTTGAA ATCATTCTAA GTGCACAGCT CGCTGGCATG AGGCACCCCA CACGGGTGTG CCACCATCAC CACCATCCTC TGCCAGAACC TCATCATCAT CCCAAATTGA AACTCCACAC CCACTGAACA GCTCCCCGCC CCACCCCCAN CCTGCAACCA CCACCTGACT CTGCCTCTGT GAATTTAACA AGCATAGCGA CATCACNTAA GTGGAATCAT AGANCATCAC CGTGTGTGTG TGTGTGTGTG TGTNTGTGTG TGTGTCCCGT GACCGGCCTA TTTCACNTTT GGCATTTACA NACTTTTTAA ANACNTAANT CCCATTGGGT ANCCCCTGGT GCCNACTAAT TTTTACTCCC CCCTGAAATT AAACNNCCCC TNTAAAACNA TGAATNCCCC CGGGNGGCNN GGAAATCCAA AATCAAGCNT TATCCGAATA CCNNTCGAAC NTCGAAGG

2A11R:

CNNNGAAATT TGAACGAATC GCCANCCCGG GGGACATGTT TAAAGGGCTG ATAATTCAGT GAGAATAAAA ATAGTAAGCA CAAAGGATAG CAATGAGAAT TATGTATTTA AAAAGTCTGT AATGCAAAAC GTAAAATAAG CCAGTCACAA AACACACACA CACACACACA CACACACACA CACACAAATA AATGCTCTAT GATTCCACTT ATGTGATGTC GCTATGCTTG TTAAATTCAC AGANGCANAA TCNGGTGGTG GTTGCANGGT GGGGGTGGGG CGGGGAACTG TTCAGTGGGT GTGGAGTTTC NATTTGGGAT GATGATGANG TTCTGGCANA AGATGGTGGT GATGGTGGCA CACCCGTGTG GGGTGCCTCA TGCCANCGAG CTGTGCACTT AGAATGATTC NAGTTGTAAA TTTCTGCGAT ATGTATTTTA ACTCTCACAC ACACACACAC NCACACNCAC GCACNCACCT GGTAACAAAC TACATGTTAN TAATGTTCCN TTTACCGAAC ACCATGGGTT GAAATGAAGT CCTTCCAAAA ANGGCAGATT TCAACTAAAA NTTNCNTCNC CCTGAATCNN ACTGTTACAA TTTTTTCTGT TTCCCANTTT ACCCCTGAAG GGTGGATCCC CNATTCCANA ACGGCCNCCC CCCGGNGGAA CCCCCATCCC CCCANAATGA NTCCTTTTAC CCCCCC

(GAAA)n*(CTTT)n MICROSATELLITE REPEATS

1B7

1B7F:

GGNNGGGGGT NTTTTTCCTT TTTGGNGTCT CNGCCATCAN ANCAATCTTG GCCCCCGGGC NATCNACTGT GGTTNAACAG TATGGCAGTT CCTCCAAAAA TTAAACAATT AATCATATGA TCCAGCAATC CCACCCCAAA GANTTGNTTG CAGGAAGANN GAGAGACAGA AAANGAAAGC ANGAAGAAAG AAAGAAAGAA AGAGAGAGAG AGAGAGAGAG AGAGAGAGAG AGAGAGATGA TTGATNGANA GANNGAAAGA ANGAAAGATN GNTAGAAAGA AAGAAACTCA AAGAGATATN TNNACACCCA TGATCATAGC AACAGNATCC ACAGTGGNCA TTACGTGGTT GNACCCAAGT TCCCACTGAA GGATGNATTG ATGANCAGAA TGTGGNCCAG ACGTACAATG GAATNTTTCT CTTGTGTTTA NNNNGGGNAA GGACATCCTG ACAACTNCTT CAACGTTGTT TTGTCCTTTG GNGGNTTTTT TNNTTCNTTN TNNTNTNGCC NGTCTAAAAT NGGGACANAT TTCTGTTCCG ATTNCCNNTC CCTTTNANGT TTCCTTTTNT GNATCANTTT CCATNTGTNT TNGTTTTTTN GGNNAGCCNN NTGGNTTTTG GGGGGGGNNN AGGTTTTTTT TTTTTTTTCT NNNNGGNATA ANTTCTCTTT TTCNGGTAAA NTTNNAANTT NCCCCCCNTN NTTCNTNGTT CTCTTCTCNT CTTCNNTTNN TTGGGNTNGG TNTTTNNTTT NTNNNTTTNT TTTTNTTNNN TCNTNNNCCN TNTNATTTCT CTCN

1B7R:

GNATTGACCA CGCGGCCNGT TGTCGNAAAN CNNTCTTTCT NGCCNTCTTT CTTTCTTTCT TTCTTTCTCT CTCTCTCTNT CACTCANTCG CNNTCTCTCT CTTNCTTTCT TTCTTTCNTC TTNCTTTCTT TNTGTGTCTC TNTTTCTTCC TGCTTTCAAT TCTTTGGGGT GGGATAGCNG GATCACATGA TTAATTGTTT AATTANTGGA GGAACTGCCA NACTGTTNCC CACAATGGAT NCACCATNNT GTATNCCCAC CANCAGTGNA NAAAGGNGCC AAATTTCTTC ANANCCNCAC CAACACTNGT TACNTANNAC CCAACTCTTT ANGTTTGTGA ACGTGAGANA GNTANCTANN GNTGTTTANN NACCCCCAGT CCNNGNAGCT GAAAAANANN AGCATTTAAT GTCTCACACA AATTCTGAAG GNGNNGAAAT CGANAATGGG NGNTNATGAA TGGGGCAAAN NNTCTTCNTT CCCCCNCANG GGCCCACCNT ANAAANGGAA TATCCCNNCT CGNTGGNATT GNAAAAANAA AAGGNANTTT TCCCNGAAAA GNTNGTATAN NNNNGGAAAN CATNGGNTGC TCACGGNGGT GTGCGGTNTA TNGNGCTNGG GACNAAGAAA GNAAATTTTT NGAAANANAA ATNCCCTCGN TCNTGCAAAN CACTCNGGGN GANNANATCG CTNTTTATCA TCNGNTGGNA AGAAANATTN CNCTCNCNGA GGGGGCTANT ANAGNGGNNG ANNGTTATNC TACCGCCGNA GANAGAAAGA NANATANNGG AAAANGTNCT TCTNNNAANA GGGGGGNAAC TCTTAAATAT ATCNCGNATA TATCATCTAT ANTNNGGNNT AAAATANNAN

2A7

2A7F:

NANNCTTNTT NNTTGGACCC CCGCGGTGGC GGCCGCTCTA GAACTAGTGG A T C O M T C C C A C Q T C M Q C T CCCGGTGCAT GGAGCCTGCT TCTCCCTCTG CCTATGTCTC TGCCTCTCTC TCTCTCTGTG TGACTATCAT NATTNNNTTC AAATTANTAA AATCNGAANA GAAAGAAGAA AGAAGAAAGA AAGAAAGAAA GGAANGAAAG AAAGAAGAGA NANANAAAGA ANGAAGGAAG GAAGGAAGGA AGAANGAAAG AAGAANGGAG AAAGANGGAA AGAAAGAAAG AAAGAAAGAA AGAAAGAAAG AAAGAAAGAA AGAAAGAACN ATTTTGAAGG AGAGAAGAAA GGAAGACCCT TTTTTGGGAA AAAGGGGAAA AAATNATATC CTTTNTTACA TTGAATTNAA ACCCCNAAAG GCNGATCCCC CGGCTGCNGG AATTCCATNT CCAGCTTNTC GATTCCGTCT ACCCCAAGGG GGGCCGGTTN CCNCCTTTTG TTCCCCTTAA TGAAGGTTTA TTGCGCCCCT GGNGTTTTCC TNGTCNTAAC TGTTTCCNGT TGNNNAATTG TNTCCNTCCC CNTTCCCCCC TCNTTCNANC CGGAACATAA TTNTTAGCCG GGGTGCNNTG ANTTACNNNN CCCTTTNTTT GNTTTCNCCC CCGGCCCTTT CCCTCNGGAA ACNTTNTTCC CCTTTCTTTT TTATTCCGCC NCCCCCGGGA AN

2A7R:

NNNNNNAACT GATNCGAAAC CTGCAGCGNG GGGGATCTGN CTNNNGTGGT TTNAATTCAA TGTTATATAT GANTTAATTT TCTCCCCTTT TTCCCATATA AGTTGTCNTC CTTGNCTTCT TTCNTTCTTT ATTTTTCTTT CTTTCTTTCT TTCTTTCTTT CTTTCTTTCT TTCTTTCTTT CTTTCTTTCT TTCTCTTTTT CTTCTTTCCT TCTTCCTTCC TTCCTTCCTT CCTTCCTTCT CTCTCTCTTC TTTCTTTCTT TCCTTTCTTT CTTTCTTTCT TCTTTCTTCT TTCTTTTCTT T T T T T T T T AA TTTTTA TTTA TTTATGATAG TCACACAGAG AGAGAGAGAG GCAGAGACAT AGGCAGAGGG AGAAGCAGGN TCCATGCACC GGGAGCCTGA CGTGGGANTC GAT CCACTAG TTCTAGAGCG GNCGCCACCG NGGTGGGAGC TCCAATTCGC CCTANAGTGA GTCGTATTAC GCGCGCTCAC TGGCCGTCGT TTTACAACGT CGTGACTGGG AAAACCCTGG CGTTACCCAA CTTAATCGCC TTGCAGCACA TCCCCCTTTC GCCAGCTGGG GTAATAGCGA ANAAGNCCGC AACGATCGCC CTTCCCAACA GTTGCGCAGC CTGAATGGNG AAAGGGAACG CCCTGTAGCG GGGCATTAAA CGCGGGGGGT GTGGTGGTTA CGCGCAGGTG ACGNTAAACT TGGCAGCGGC CTANCGCCCG CTCCTTTTCG CTTTCTTCCC TTCCCTTCTC GCCAAGTTCG GCGGGGTTCC CCGTCNAAGC TCNAAATCGG GGGGNCCCNT TTAGGGNTCC GAAATNAATG GTTTAANGGG ACCTCGAACC CAAAAAAAAT TGAATAAGGG NGAAAGGNTC ACNTANTGGG GCAACGCCCC GAAAAAAANG GGTTTNCGCC CNTGGAAAGN TGGGAAGTCC CNTTCCCTTA AAAAANGGAA ACCTGGTTCC AAAANGGGAA AAAAATCCAA CCCAAACCCG GGCNATTCTT TGAAAANAAA AAGGGAATTG GNCGAATTCC GGCCATNGGG TAAAAAANGA ACGGAATNAC AAAATTTAAN NNGAAATTTA N

The outlined letters indicate the site where a second primer was designed for PCR, because the initially designed primer was non-specific.

2D2

2D2F:

NNCGGNTNCT TTTGTTNGNT NGNCCCTTGG CAACTAGAAT AGGGGATCGA GACCACATTG GGCTCCCCGC ATGGAGCCTG TTTCTCCCTC TTGCCTGTGT* CTCTGCCTCT CTCTATGTGG CCCTCAAGAA CAAATTAATT ATTTTTTAAA AAAAGAAAGA AGAAAGAAAG AAGAAAGAAA GAAAGAAAGA AAGAAAGAAA GAAAGNAAGG AAGNAAGANA GNAAGGAAGG NANGNGAATT TTNGGAAGGA CCNAACCNCA NCTCTTGNGN TTAAAAAAAG NTCCAGGTCC CGGTNACAAA CNCAAANGGA CGGGGATTCA AATCAAANTT TNNTTCTNCG GGGGAAAAGT GGGTNTGGGG GAGAANAGGG TGGGGGNGGA GACCCCCCCC CATTTTTTNT CCCCAGCAGG GGANAAAACC CCCCCCTTNG TTGCAAGAAG GNTCNCCNAG GNAATNATTG NAAAAGTTTT TTTGGGCCCN CAANANGGGG NCCCCCTCCC CCCCTTTNAA AGNAGTTNNT AGANAANAGC CCCCCCCACA ANGNATANCN CCCNTGGNTT A TNNAAAACC CCCCNGTNNA GGANNCCCCC TTAAAAANNN TTTNTTNNCC CCNNCCCCCN GGGGGGGGCN NCCCCCCTTT TGNNCCCCCT NNNAAGNTTT TTTTTTNNNC GGNGGNNNNA AAAAAAANTN NTCTCNTCTN TNNTTNNNTT CNNCCAAAAN ANNTCAAAAA ANAATNCNNT CNTNTNTATT TNTTTNTNNC NNNNGGNNCT NTTNNNCTNT TTTN

2D2R:

NNNANTGNAG TTGATCGATT CGNNAGCNGG NNNATCTGAG TTTCGACCGG CGTTTCTNCT TGTGGGTCGT CTNNCTNAAC GTCTTCTAGT GGNGACGCGG GTCCCCTGTG GGCCCCAGAC GCTCTCCATG CCTCCCGAAC CCCTGCTCCT* CGCTTGGGTT TGTCTCTCTG TGGTTACATA GNGGGTGGTT TCTCCCCACC ACTGTCTCCA ANCCCACTTC TCCGGANGAT CTAATCTGAC ATNAATCCCG CCCCGTGTGG GTGNCATCNG AACTGGAGCT TTNTTTAANN TCTAAAAATA TGGTGTGGNT CTNTCTAACA TTCTCTTTCT ATCTTNNTNT CTNNCTNCTA NNTANNTNNC TNTCTNTCTC TCTNNCTATN TAACTCCTTN CNNATCTCCT NNCTNANNNA AAAAATATTA TTTTTTGGGN CAAGNAGAGG NAAAATAGAG NNGAGGCAGN AGAACAAAAG AGNAGAGGGG AGAAAANAGA ATCCAATGNN GGAGGAGNCN AAATNNTGAN TTTTTCNAAN CCAANTTNGN TNNCTNAGNA GGNNAGACCN NCCAACCANG ANGNGAGGGG AGGNTTCAAA ANTANCNNNN ANNNAATNAA NAANAATCNG GTAATTTTAA GGNNANNNNN TCNAAATTNG GCCNNNNNGG ANNAANAAAA AANGAGCGGN ATAAANTNAT ANAAAAAANC CNNNNNNNAT TAANCCCAAA ANNTAAANTT NNNNCNANGG AAAAAAAANN ACCCCNNTNN AAAAAAAATT NNNNTNAANN ANNNAAANAN GAANANNNAN NAATCCNANN NANANAANTN TANNTANNNA ACNTAANTTT ATNNAN

2H12

2H12F:

ATTGGACCCA CGCGGTGGCG GCGCTCTAGA ATAGTGGATC CCAAGGNAGC CTGGGGATTC TTTGTCCCTT CCTTCAGTCT GGANCCAGGA TGANCCGGGT GCCCATTGGG GCACGGGGAA AGGGCGCATC TGANCCTGAT AATGGCTCCC CANGCCCAGG GTCACTGGGA AGCANGAAGG AAGGCCAGGG GCTCCGGGCG TCCCGCTGTN CACGTCCCCG GCATCCCCTG ANGGGTTGGC CGGACGCGGA AANGACTGGC GCGTCCTGGA AAGGTCTCCT GTGTCCCCCG CCTTGGGCAT GGCCTTGCAC ACCCCTTGAA CCTCCGTTGC CTTCCTTCNG AATTGGGGGA AANTCCTACC CACNCANTTG CCGGANGCCG TCAACCGCNT TTGAATTCCC ACCTGCGAAC AAACCNGGAA CCGNNGGTTT CCTGCCTCCT NGGGGGCTGG AAGCTTNGGG GGAACAAAAA TAANCCTNCT TCTNTTCTTT CCCCAGGGTC CCTGNGGGGT TTGGAAACCC CTTNNACCAT CCTGCNTGNT TCTCCCGGCC GTTGCCAAAA AACAACCCNA NGTTTAATTC CNANTTTGGN CCCCCTAACC TTTGGGGGCC NCNACCCTNC CCCNNGGGGG AAACCCTAAA ACCTTNGGGT NCCCTNAAAT TTTTTTNCNT TTTAAAACCC CNTTGGGTCN ANCCCNNCNN NTTTNCCTTT TTTTNTNCCC TTTCCCNNTN CCCNTCCCTT TNCCTTTTTT CCCTTTNCCC NTNCCCTTTT CCCNTNCCCN TTNCCNTTCC CNTNCCCNTT NCCNTNTCCC TTNCCNNTNC CCCTNNNNNN NN

2H12R:

GNGNNNGGGA AACTTAACNA TTCTGCCCCC GGGGGATCCT GGGGGGCTCA NCANTTNAAG CACCTGCCTT CAGCCCAGGG CATGACTCTG GGGTCCTGGA ATCGAGTCCT ACNTCGGGCT CCCTGCATGG AACCTGCTTC TCCCTCTGCC TGGGTCTCTG CCTCTCTCTC TCTCTGCATC TCTCATGAAT AAATAAATAA NATCTTGNAA ATTTTTTAAA TTGTGTTTTT GAACACTGAT GTNAGAAATA ACCATGCGCC TTCGCCGGNA AGCTGGNTNA ATTNCACCTC CCGGGTTCCC GGGAAGCCCA ACGTTCACCC GTGCAATTNA NCCACAGCTG TGNACGGAAA NAATNCTTCC CCNCAATCTG GGCTTTAACG TGGGGANGAA GCTGCCTCCN CNAAAATAAA AGANAAAACC GGCAATNCCC CACCTGATAC ATATCACCTT TGGGANATTN NNTTCCCCGA ATTCNCGAAA GNGGAAAAAG AAAGAAANCC NANGAAAAAA AAGGAANGGA GAGAAANGNG AGAAAGGAGA GAAAAAGAAA GAACGAANGA AAGAAAGAAN GANAGAANGA ANNAANGAAN GAAAGAAGGA AGGAANNNCA ANNGGNAAGA AANAAANAAA GAANNNNNGA ATTNTTNAAT NACCCCGNTT TTTTNCNNNC TTTNNNCCCN NCTCNTGGGC CCCCCGGGAA GNTTTTTCNC NTTTGGGCCC CCNNAAACCC CCNNNNTTGT TTGGGNGGGN NAAA

The underlined primers shown above were designed to improve sequencing of the repeat itself, as the insert was too large for sequencing directly from the vector primers to produce data of sufficient quality.

2H12F(2):

GCNNANNNTT TTGCTTCNNN CNGGATTGGG GGANANTCCT ACCCACACAG TTGCCGGNCG CGGTCACCGC GTTTGACTTC CCACCTGCGA CCAAACCGGG AAGCGGCGGT GTCCTGCCTT CTGGGGGGCT GGAGGCTGGG GGACCANAAT GACCCGTGCT CTGTCGTTCG CAGGCTCCTG GGGGTGGGAC CCCTGCACCA TCTGCTGCTC TCGGCCGTGC CAGACCATCC AGGTGAGTCC AGGTGGCCCC * TACCTGGGGC CACACCTCCC AGGGGAANCC TGAGCTTGGT GCTTANTGTT GCTTTACTCT GTGGTCATCG TCATTTCTTT TTTTCTTTCT TTCTTTCTTT CTTTTTCTTT TCTTTCTTTC TTTCTTTCTT TCTTTCTTTC TTTCTTTCTT TCTTTCTTTC TTTCTTTCTT TCTCTTTCCT TTCTTTCCTT TCTTTCCTTT CCTTTCTTTC CTCTGTTTTC TTTCTTTTCT TCCTTCTTGA ACTCTGGGAA TAAATGTCCA AAGGTGATAT GTATCAAGGT TTTTGTTTTG TTTTGTTTTT TCCTTTCTTT TCCTGGATGC AGCCTCCTCC CCACGTTAAA GCCCAGATTG TGGGGAACGA ATCTTTCCGT GCANAACTGT GGCTTAATTG NACGGGTGAA NGTTGGGGCC TCCCGGGAAC CNGGAAGTGC ACTTAACCAA CTGCCGGGGA AAGGGCAAGG TTAANTCTAA AATCAATGTT CAAAAAANAN AAATTTAAAA AAATTTTAAA AAATTTAATT AATTAATCCN TGAAAAAAAN NCCNAAAAAA AAAAAAAAAG GGNNAAAAAA CCNNGGNNAA AAAGGGAAAA AAACAAGGGG NCCCAANNGN AAGGGGGAAA CCCCAAAANT TTTGGNAAAA CCNAAATNCC CNNNAAAAAC CCCNAAAAAA TTNNTTNCCC CCNNGGGGGG NNNAAAAAAG GGGAANGNTT TNNNNNAAAA ATNGGNNNNA AAAAACCCCC CCCCAAAGGG GGNAAAA TNN AANAAAANTT TNNNNAAAAA AAAAGGGGGN NNNCCAAAAN NNNNGGGGGN GGGAAAAAAN CCCNAAANTT NCCCCCCCCC NAAAAAANNN NAAAAATTNN NNNNANNAAN NNCCCCCCNC CCNCNNNNNN GGGGNNNNNN NNNNTTTTTT NAAAAAAAAN NTTTNNTNNA AAAANGGGGG AAAAAAANNN NNNNMJNNNN N

2H12R(2):

NNNNNNTGAN TTTTNCNNNG TCCTGCCCTT TCTCTCTCTG CATCTCTCAT GACTAAAGNA NTAAAATCTT AAAAATTTTT TAAATTGTGT TTTTGAACAC TGATGTAAGA AGTAACCATG CGCCTTCGCC GGCAGCTGGC TAAGTGCACC TCCCGGGTTC CCGGGAGGCC CCAACGTTCA CCCGTGCAAT TAAGCCACAG CTGTGCACGG AAAGAATCGC TTCCCCACAA TCTGGGCTTT AACGTGGGGA * GGAGGCTGCA TCCAGGAAAA GAAAGGAAAA AACAAAACAA AACAAAAACC TTGATACATA TCACCTTTGG ACATTTATTC CCAGAGTTCA AGAAGGAAGA AAAGAAAGAA AACAGAGGAA AGAAAGGAAA GGAAAGAAAG GAAAGAAAGG AAAGAGAAAG AAAGAAAGAA AGAAAGAAAG AAAGAAAGAA AGAAAGAAAG AAAGAAAGAA AGAAAGAAAA GAAAAAGAAA GAAAGAAAGA AAGAAAAAAA GAAATGACGA TGACCNCAGA GTTAAGCAAC ACTAAGCACC AAGCTCAGGC TCCCCTGGGA GGTGTGCCCC ANGTTNGGGC CNCCTGGACT CNCCTGGATG GTCTGGCNCG GNCGAAANCA NCANATGGTG CNGGGGTCCC ACCCCCAGGA ACCTGCGAAC ACANANCCGG GTCTTCTGGT CCCCNCCCCN CCCCCCANAA GCNGGACNCG CCCTCCCGTT GGTCCCNGTG GGAATCNACC CGTGACGCTC CGCNCTGTTN GTNGATCCCC NCCCCANGAA GCNCNAGCCN GGGTTCNGCC TCCCNGNGGG ANGAACTCNA CCNTCTTCCT CGGCCCTNGC ACGGATTNCG ACCGACCGCC TNCCCTNCNG CGGACTCGCN TCCCTCCTCC TGCCGCCNNC ANNN

GENERAL APPENDIX

(ii)

Results of BLAST searches which gave significant matches to database sequences (Altschul et al. 1990)

The BLASTn search tool was used; it reports significant matches between the submitted sequence and database sequences, where the two sequences are almost identical over the matching region.

Explanatory notes: Dinucleotide repeats were screened out of each sequence to avoid alignments with the repeat alone. Where the repeat is included in the matching sequence, it is represented by NNNN.... The list of sequences given at the start of each search is structured as follows. On the far left is the EMBL (emb), GenBank (gb) or DDBJ (dbj) accession number. The middle section gives the name of the sequence matching the submitted sequence. In the final columns, the important value is the probability, P(N): the probability that a match of this quality (or a set of N such matches) would arise by chance, so that smaller values indicate more significant similarity. N is the number of separate regions (high-scoring segment pairs) in which the submitted sequence matches the database sequence.
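The screening step described above, in which dinucleotide repeats are replaced by Ns before the BLAST search, can be sketched in Python as follows. The minimum of five repeat units is an assumption for illustration, not a threshold stated in this study.

```python
"""Mask dinucleotide microsatellite repeats with N before a similarity search."""
import re


def mask_dinucleotide_repeats(seq: str, min_units: int = 5) -> str:
    """Replace each run of at least min_units tandem copies of a two-base
    motif (for example CA or GT) with the same number of Ns.

    Group 2 captures the first base of the motif and the negative lookahead
    requires the second base to differ, so mononucleotide runs such as
    AAAAAA are left unmasked. The input is expected in upper case.
    """
    if min_units < 2:
        raise ValueError("min_units must be at least 2")
    pattern = re.compile(r"(([ACGT])(?!\2)[ACGT])\1{%d,}" % (min_units - 1))
    return pattern.sub(lambda m: "N" * len(m.group(0)), seq)


if __name__ == "__main__":
    # Stretch of subclone 1D6F containing its (CA)n repeat.
    stretch = "GGCTGAGTAA" "TATTACACAC" "ACACACACAC" "ACACACACAC" "ACACGCCCGC"
    print(stretch)
    print(mask_dinucleotide_repeats(stretch))
```

Because each masked run keeps its length, coordinates in the BLAST output still refer to positions in the original sequence.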

In the next section, which shows where the matches occur within each sequence, the submitted sequence is labelled 'Query' and the matching database sequence 'Sbjct'. 'Identities' and 'Positives' give the number of identical bases over the total length of the match, expressed both as a fraction and as a percentage. Sequences are listed with the highest-scoring matches first.

SUBCLONE 1B7

Only part of the sequence match of the highest-scoring sequence in the forward match, and only the highest-scoring sequence in the reverse match, have been shown.
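Using the first minus-strand alignment reported for 1B7F as an example, the 'Identities' value can be reproduced from the aligned Query and Sbjct strings. In this Python sketch an undetermined base (N) is never counted as identical, and the percentage is truncated rather than rounded; truncation is an inference from the reported figures (43/60 appears as 71%, not 72%), and gapped alignments are not handled.

```python
"""Count identities in a gapless BLAST-style alignment of two equal-length strings."""
from typing import Tuple


def identities(query: str, sbjct: str) -> Tuple[int, int, int, str]:
    """Return (identical, length, percent, match_line) for aligned strings.

    N is treated as an undetermined base and never counts as identical.
    The percentage is truncated, not rounded.
    """
    if len(query) != len(sbjct):
        raise ValueError("aligned strings must have equal length")
    matches = [q == s and q != "N" for q, s in zip(query.upper(), sbjct.upper())]
    n_ident = sum(matches)
    match_line = "".join("|" if m else " " for m in matches)
    return n_ident, len(query), 100 * n_ident // len(query), match_line


# First minus-strand alignment for 1B7F (Query 262-203, Sbjct 63542-63601).
QUERY = "TATCTCTTTGAGTTTCTTTCTTTCTANCNATCTTTCNTTCTTTCNNTCTNTCNATCAATC"
SBJCT = "TCTTTCTTTTCCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTC"

if __name__ == "__main__":
    n, length, pct, line = identities(QUERY, SBJCT)
    print(f"Identities = {n}/{length} ({pct}%)")
    print(QUERY)
    print(line)
    print(SBJCT)
```

For this pair the function returns 43 identities over 60 positions (71%), agreeing with the BLAST report.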

Sequence 1B7F:

Query= 1B7F (294 letters)

Database: Non-redundant GenBank+EMBL+DDBJ+PDB sequences 347,056 sequences; 530,023,315 total letters. Searching done

Sequences producing High-scoring Segment Pairs:               High Score   Smallest Sum Probability P(N)   N
emb|Z82200|HS12 Human DNA sequence from clone J333.             161   2.4e-19   5
gb|U73642|U73642 Human Chromosome 11 Cosmid cSRL30h.            161   2.1e-12   3
gb|L31733|HUMUT643B Human STS UT643, 3' primer bind.            161   8.5e-11   2
gb|AC000115|HSAC000115 Human BAC clone GS188P18; HTGS pha.      162   9.0e-11   4
dbj|D87016|D87016 Human (lambda) DNA for immunoglobi.           152   1.1e-10   3

emb|Z82200|HS12 Human DNA sequence from clone J333E231
Length = 139,101

Minus Strand HSPs:

Score = 161 (44.5 bits), Expect = 5.7e-08, Sum P(4) = 5.7e-08 Identities = 43/60 (71%), Positives = 43/60 (71%), Strand = Minus / Plus

Query: 262   TATCTCTTTGAGTTTCTTTCTTTCTANCNATCTTTCNTTCTTTCNNTCTNTCNATCAATC 203
             | | |||||   |||||||||||||  |  |||||| |||||||  ||| ||  ||  ||
Sbjct: 63542 TCTTTCTTTTCCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTC 63601

Score = 152 (42.0 bits), Expect = 7.3e-06, Sum P(4) = 7.3e-06 Identities = 42/60 (70%), Positives = 42/60 (70%), Strand = Minus / Plus

Query: 262   TATCTCTTTGAGTTTCTTTCTTTCTANCNATCTTTCNTTCTTTCNNTCTNTCNATCAATC 203
             | ||| | |   |||||||||||||  |  |||||| |||||||  ||| ||  ||  ||
Sbjct: 63554 TTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTC 63613

Score = 152 (42.0 bits), Expect = 7.3e-06, Sum P(4) = 7.3e-06 Identities = 42/60 (70%), Positives = 42/60 (70%), Strand = Minus / Plus

Query: 262   TATCTCTTTGAGTTTCTTTCTTTCTANCNATCTTTCNTTCTTTCNNTCTNTCNATCAATC 203
             | ||| | |   |||||||||||||  |  |||||| |||||||  ||| ||  ||  ||
Sbjct: 63558 TTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTC 63617

Score = 146 (40.3 bits), Expect = 2.4e-19, Sum P(5) = 2.4e-19 Identities = 36/48 (75%), Positives = 36/48 (75%), Strand = Minus / Plus

Query: 250   TTTCTTTCTTTCTANCNATCTTTCNTTCTTTCNNTCTNTCNATCAATC 203
             |||||||||||||  |  |||||| |||||||  ||| ||  ||  ||
Sbjct: 63470 TTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTC 63517

Score = 146 (40.3 bits), Expect = 2.4e-19, Sum P(5) = 2.4e-19 Identities = 36/48 (75%), Positives = 36/48 (75%), Strand = Minus / Plus

Query: 250   TTTCTTTCTTTCTANCNATCTTTCNTTCTTTCNNTCTNTCNATCAATC 203
             |||||||||||||  |  |||||| |||||||  ||| ||  ||  ||
Sbjct: 63474 TTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTC 63521

Score = 146 (40.3 bits), Expect = 1.4e-06, Sum P(4) = 1.4e-06 Identities = 38/52 (73%), Positives = 38/52 (73%), Strand = Minus / Plus

Query: 262   TATCTCTTTGAGTTTCTTTCTTTCTANCNATCTTTCNTTCTTTCNNTCTNTC 211
             | ||| | |   |||||||||||||  |  |||||| |||||||  ||| ||
Sbjct: 63470 TTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTC 63521

Score = 141 (39.0 bits), Expect = 4.0e-06, Sum P(4) = 4.0e-06 Identities = 37/51 (72%), Positives = 37/51 (72%), Strand = Minus / Plus

Query: 262   TATCTCTTTGAGTTTCTTTCTTTCTANCNATCTTTCNTTCTTTCNNTCTNT 212
             | ||| | |   |||||||||||||  |  |||||| |||||||  ||| |
Sbjct: 63474 TTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTCT 63524

Score = 141 (39.0 bits), Expect = 6.8e-05, Sum P(4) = 6.8e-05 Identities = 37/51 (72%), Positives = 37/51 (72%), Strand = Minus / Plus

Query: 262   TATCTCTTTGAGTTTCTTTCTTTCTANCNATCTTTCNTTCTTTCNNTCTNT 212
             | ||| | |   |||||||||||||  |  |||||| |||||||  ||| |
Sbjct: 63570 TTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTT 63620

Score = 140 (38.7 bits), Expect = 2.4e-19, Sum P(5) = 2.4e-19 Identities = 40/57 (70%), Positives = 40/57 (70%), Strand = Minus / Plus

Query: 294   GGATNCTGTTGCTATGATCATGGGTGTNNANATATCTCTTTGAGTTTCTTTCTTTCT 238
             | ||  || ||| |||| |||||| ||  | ||||||||||||  | ||   || ||
Sbjct: 15    GAATAGTGCTGCAATGAACATGGGAGTGCACATATCTCTTTGATATACTCGTTTCCT 71

Score = 132 (36.5 bits), Expect = 2.5e-05, Sum P(4) = 2.5e-05 Identities = 36/51 (70%), Positives = 36/51 (70%), Strand = Minus / Plus

Query: 262   TATCTCTTTGAGTTTCTTTCTTTCTANCNATCTTTCNTTCTTTCNNTCTNT 212
             | ||| | |   |||||||||||||  |  |||||| |||||||  | | |
Sbjct: 63478 TTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTCTTTCT 63528

Score = 123 (34.0 bits), Expect = 2.4e-19, Sum P(5) = 2.4e-19 Identities = 31/40 (77%), Positives = 31/40 (77%), Strand = Minus / Plus

Query: 164 TTCTTTCTTTCTTTCTTCNTGCTTTCNTTTTCTGTCTCTC 125
Sbjct: 63591 TTCTTTCTTTCTTTCTTTCTTTCTTTCTTTTCTTTCTTTC 63630

Score = 118 (32.6 bits), Expect = 1.4e-05, Sum P(4) = 1.4e-05 Identities = 30/39 (76%), Positives = 30/39 (76%), Strand = Minus / Plus

Query: 164 TTCTTTCTTTCTTTCTTCNTGCTTTCNTTTTCTGTCTCT 126
Sbjct: 63503 TTCTTTCTTTCTTTCTTTCTCTTTCTTTTTTCTTTCTTT 63541

Score = 114 (31.5 bits), Expect = 0.00087, Sum P(4) = 0.00087 Identities = 34/51 (66%), Positives = 34/51 (66%), Strand = Minus / Plus

Query: 262 TATCTCTTTGAGTTTCTTTCTTTCTANCNATCTTTCNTTCTTTCNNTCTNT 212
Sbjct: 63482 TTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTCTTTCTTTTT 63532

Score = 109 (30.1 bits), Expect = 2.9e-18, Sum P(5) = 2.9e-18 Identities = 33/49 (67%), Positives = 33/49 (67%), Strand = Minus / Plus

Query: 164 TTCTTTCTTTCTTTCTTCNTGCTTTCNTTTTCTGTCTCTCNNTCTTCCT 116
Sbjct: 67904 TTTTTTCTTTCTTTCTTTTTTTGTCTTTTTTCCCTTTCTTTCTCCTCCT 67952

Score = 108 (29.8 bits), Expect = 3.5e-18, Sum P(5) = 3.5e-18 Identities = 28/37 (75%), Positives = 28/37 (75%), Strand = Minus / Plus

Query: 164 TTCTTTCTTTCTTTCTTCNTGCTTTCNTTTTCTGTCT 128
Sbjct: 63595 TTCTTTCTTTCTTTCTTTCTTTCTTTTCTTTCTTTCT 63631

Score = 104 (28.7 bits), Expect = 0.079, Sum P(4) = 0.076 Identities = 30/43 (69%), Positives = 30/43 (69%), Strand = Minus / Plus

Sequence 1B7R:

Query= 1B7R (532 letters)

Database: Non-redundant GenBank+EMBL+DDBJ+PDB sequences 347,056 sequences; 530,023,315 total letters. Searching...... done

Sequences producing High-scoring Segment Pairs: High Score, Smallest Sum Probability P(N), N
gb|AC002128|AC002128 Human DNA from chromosome 19 cosmi... 291 4.5e-14 1
gb|L78751|HUM21DC9Z Homo sapiens (subclone 2_a2 from P... 260 3.7e-13
gb|L46900|HUM21DC8Z Homo sapiens (subclone 6_g3 from P... 260 3.8e-13 2
emb|Z69838|HSV1164A6 Human DNA sequence from cosmid V11... 224
emb|Z86001|HSSRL9A13 Human DNA sequence from cosmid SRL... 218 6.3e-08 1

gb|AC002128|AC002128 Human DNA from chromosome 19 cosmid F19410, genomic sequence; HTGS phase 3, complete sequence [Homo sapiens] Length = 45,328

Minus Strand HSPs:

Score = 291 (80.4 bits), Expect = 4.5e-14, P = 4.5e-14 Identities = 81/117 (69%), Positives = 81/117 (69%), Strand = Minus / Plus

Query: 312 TAACNAGTGTTGGTGNGGNTNTGAAGAAATTTGGCNCCTTTNTNCACTGNTGGTGGGNAT 253
Sbjct: 28249 TAATAAGTTTTGGTGGGGATGTGGAGAAATTGGAATCCTCCGTGCATTGCTGGTGGGAAT 28308

Query: 252 ACANNATGGTGNATCCATTGTGGGNAACAGTNTGGCAGTTCCTCCANTAATTAAACA 196
Sbjct: 28309 GTACAATAGTGCAGTCATTGGGGAAAACAGTTTGGCAGTTCCTCAAAAGGTTAAAAA 28365

SUBCLONE 1D6

Only the alignment of the highest matching sequence is shown.

Sequence 1D6F:

Query= 1D6F (275 letters)

Database: Non-redundant GenBank+EMBL+DDBJ+PDB sequences 347,056 sequences; 530,023,315 total letters. Searching... done

Sequences producing High-scoring Segment Pairs: High Score, Smallest Sum Probability P(N), N
gb|AC002385|AC002385 Human BAC clone RG222A16 from 7q31... 220 1.8e-17 3
emb|Z69907|HSN22D1 Human DNA sequence from cosmid N22... 263
gb|AC000110|HSAC000110 Human Cosmid g0771a233; HTGS phase... 188 2.7e-17 3
emb|Z74696|HS203C2 Human DNA sequence from cosmid 203... 249 9.4e-17 2
emb|Z73965|HSV857G6 Human DNA sequence from cosmid V85... 197 2.3e-16 3

gb|AC002385|AC002385 Human BAC clone RG222A16 from 7q31, complete sequence [Homo sapiens] Length = 201,990

Plus Strand HSPs:

Score = 220 (60.8 bits), Expect = 2.3e-08, P = 2.3e-08 Identities = 84/134 (62%), Positives = 84/134 (62%), Strand = Plus / Plus

Query: 142 ATCAAAGGACATTTGGGCTCTTTCCCTAGTTTGGCTATTGTCGATGCTGCTGCTATAAAC 201
Sbjct: 117759 ATTGATGGTCATCTGGGCTGGTTCCCTATTTTTGCAATTGCAAACTGTGCTGCTATAAAC 117818

Query: 202 CTAGGGGTGCATGTATCCCTCTGAATTAGTATTTTGCCTCTTTTGGGTAAATACCCAGTA 261
Sbjct: 117819 ATGCGTGTGCAGTATCTTTTTCGTATAATGACTTCTTTTCCTCTTGGTAGATACCTAGTA 117878

Query: 262 ATGCAGTCCCTGGA 275
Sbjct: 117879 GTGGGATTGCTGGA 117892

Score = 199 (55.0 bits), Expect = 1.8e-17, Sum P(3) = 1.8e-17 Identities = 51/65 (78%), Positives = 51/65 (78%), Strand = Plus / Plus

Query: 16 CTCTAGCTCCATCCACATCATTGCAAATGGTAAGATTTCATTCTATGTGACGGCTGAGTA 75
Sbjct: 140312 CTCCAACTGCATCCACATTATTGCAAAGGGCATGATTTCATTCTTTTTTATTGCTGTGTA 140371

Query: 76 ATATT 80
Sbjct: 140372 TTATT 140376

Score = 194 (53.6 bits), Expect = 5.3e-09, Sum P(2) = 5.3e-09 Identities = 54/73 (73%), Positives = 54/73 (73%), Strand = Plus / Plus

Query: 146 AAGGACATTTGGGCTCTTTCCCTAGTTTGGCTATTGTCGATGCTGCTGCTATAAACCTAG 205
Sbjct: 48564 ATGGACATTTGGGTTCGTTCCAAGTCTTTGCTATTGTGAATGGTGCCGCAATAAACATAC 48623

Query: 206 GGGTGCATGTATC 218
Sbjct: 48624 ATGTGCATGTGTC 48636

Score = 175 (48.4 bits), Expect = 1.8e-17, Sum P(3) = 1.8e-17 Identities = 51/71 (71%), Positives = 51/71 (71%), Strand = Plus / Plus

Query: 148 GGACATTTGGGCTCTTTCCCTAGTTTGGCTATTGTCGATGCTGCTGCTATAAACCTAGGG 207
Sbjct: 147299 GGACATTTGGGTTGGTTCCAAGTCTTTGCTATCGTGAATAATGCCGCAATAAACATACGT 147358

Query: 208 GTGCATGTATC 218
Sbjct: 147359 GTGCATGTGTC 147369

Score = 166 (45.9 bits), Expect = 1.2e-06, Sum P(2) = 1.2e-06 Identities = 50/71 (70%), Positives = 50/71 (70%), Strand = Plus / Plus

Query: 148 GGACATTTGGGCTCTTTCCCTAGTTTGGCTATTGTCGATGCTGCTGCTATAAACCTAGGG 207
Sbjct: 60470 GGACATTTGGGTTGGTTCCAAGTCTTCGGTATTGTGAATAATGCCGCAATAAACATACAT 60529

Query: 208 GTGCATGTATC 218
Sbjct: 60530 GTGCATGTGTC 60540

Score = 125 (34.5 bits), Expect = 5.3e-09, Sum P(2) = 5.3e-09 Identities = 33/43 (76%), Positives = 33/43 (76%), Strand = Plus / Plus

Query: 232 ATTTTGCCTCTTTTGGGTAAATACCCAGTAATGCAGTCCCTGG 274
Sbjct: 67502 ATTTATAGTCTTTTGGGTATATACCCAGTAATGGGATGGCTGG 67544

Score = 116 (32.1 bits), Expect = 1.8e-17, Sum P(3) = 1.8e-17 Identities = 32/43 (74%), Positives = 32/43 (74%), Strand = Plus / Plus

Query: 232 ATTTTGCCTCTTTTGGGTAAATACCCAGTAATGCAGTCCCTGG 274
Sbjct: 147384 ATTTATAGTCCTTTGGGTATATACCCAGTAATGGGATGGCTGG 147426

Score = 116 (32.1 bits), Expect = 3.0e-08, Sum P(2) = 3.0e-08 Identities = 32/43 (74%), Positives = 32/43 (74%), Strand = Plus / Plus

Query: 232 ATTTTGCCTCTTTTGGGTAAATACCCAGTAATGCAGTCCCTGG 274
Sbjct: 48651 ATTTATAATCCTTTGGGTATATACCCAGTAATGGGATGACTGG 48693

Score = 80 (22.1 bits), Expect = 1.2e-05, Sum P(2) = 1.2e-05 Identities = 28/43 (65%), Positives = 28/43 (65%), Strand = Plus / Plus

Query: 162 TTTCCCTAGTTTGGCTATTGTCGATGCTGCTGCTATAAACCTA 204
Sbjct: 179260 TTTCCCTAGTTCAGTTTCTTTCTCTGCAGTTGCACTACATTTA 179302

Minus Strand HSPs:

Score = 184 (50.8 bits), Expect = 3.7e-08, Sum P(2) = 3.7e-08 Identities = 52/71 (73%), Positives = 52/71 (73%), Strand = Minus / Plus

Query: 218 GATACATGCACCCCTAGGTTTATAGCAGCAGCATCGACAATAGCCAAACTAGGGAAAGAG 159
Sbjct: 32390 GACACATGCACACGTATGTTTATTGCGGCACTATTCACAATAGCAAAGACTTGGAACCAA 32449

Query: 158 CCCAAATGTCC 148
Sbjct: 32450 CCCAAATGTCC 32460
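On "Minus / Plus" HSPs the query coordinates run downwards. This indicates that the reverse complement of the query is what aligns to the subject's plus strand. The sketch below gives a reverse complement that keeps N as N. Applying it to query 148-218 as printed in the Plus / Plus HSPs (for example against 147299-147369) reproduces query 218-148 as printed in the Minus / Plus HSP against 32390-32460. The name `reverse_complement` is hypothetical. IUPAC ambiguity codes other than N are not handled.

```python
# Map each base to its complement; N (unresolved base) stays N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the order."""
    return seq.translate(_COMPLEMENT)[::-1]


plus_148_218 = ("GGACATTTGGGCTCTTTCCCTAGTTTGGCTATTGTCGATGCTGCTGCTATAAACCTAGGG"
                "GTGCATGTATC")
minus_218_148 = ("GATACATGCACCCCTAGGTTTATAGCAGCAGCATCGACAATAGCCAAACTAGGGAAAGAG"
                 "CCCAAATGTCC")
print(reverse_complement(plus_148_218) == minus_218_148)  # True
```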

Score = 183 (50.6 bits), Expect = 4.5e-08, Sum P(2) = 4.5e-08 Identities = 55/78 (70%), Positives = 55/78 (70%), Strand = Minus / Plus

Query: 217 ATACATGCACCCCTAGGTTTATAGCAGCAGCATCGACAATAGCCAAACTAGGGAAAGAGC 158
Sbjct: 104471 AGACATGCACAAGTATGTTTATTGCAGCACTATTCACAATAGCAAAGGCTTGGAACCAAC 104530

Query: 157 CCAAATGTCCTTTGATAA 140
Sbjct: 104531 CCAAATGTCCATCAATGA 104548

Score = 125 (34.5 bits), Expect = 3.7e-08, Sum P(2) = 3.7e-08 Identities = 33/43 (76%), Positives = 33/43 (76%), Strand = Minus / Plus

Query: 274 CCAGGGACTGCATTACTGGGTATTTACCCAAAAGAGGCAAAAT 232
Sbjct: 32333 CCAGCCATCCCATTACTGGGTATATACCCAAATGAGTATAAAT 32375

Score = 116 (32.1 bits), Expect = 2.6e-07, Sum P(2) = 2.6e-07 Identities = 32/43 (74%), Positives = 32/43 (74%), Strand = Minus / Plus

Query: 274 CCAGGGACTGCATTACTGGGTATTTACCCAAAAGAGGCAAAAT 232
Sbjct: 104415 CCAGCCATCCCATTACTGGGTATATACCCAAAGGATTATAAAT 104457

Sequence 1D6R:

Query= 1D6R (272 letters)

Database: Non-redundant GenBank+EMBL+DDBJ+PDB sequences 347,056 sequences; 530,023,315 total letters. Searching done

Sequences producing High-scoring Segment Pairs: High Score, Smallest Sum Probability P(N), N
gb|AC002385|AC002385 Human BAC clone RG222A16 from 7q31... 264 2.6e-14 2
gb|AF003627|HSAF003627 Homo sapiens cosmids Qc5E3, LC1833... 287 3.9e-14 1
gb|AC002074|HSAC002074 Human BAC clone GS056H18 from 7q31... 287 3.9e-14 1
emb|Z68908|HSU227D1 Human DNA sequence from cosmid U22... 284 7.1e-14 1
gb|AC002069|HSAC002069 Human BAC clone RG326K09 from 7q21... 284 7.1e-14 1
gb|L11910|HUMRETBLAS Human retinoblastoma susceptibilit... 240 1.4e-13 2
emb|Z47556|HSSG1SG2 H.sapiens genes for semenogelin I... 242

gb|AC002385|AC002385 Human BAC clone RG222A16 from 7q31, complete sequence [Homo sapiens] Length = 201,990

Plus Strand HSPs:

Score = 254 (70.2 bits), Expect = 2.7e-11, P = 2.7e-11 Identities = 80/118 (67%), Positives = 80/118 (67%), Strand = Plus / Plus

Query: 12 CACTACTGGGTATTTACCCAAAAGAGGCAAAAATACTAATTCAGANGGATNCATGCACCC 71
Sbjct: 32343 CATTACTGGGTATATACCCAAATGAGTATAAATCATGCTGCTATAAAGACACATGCACAC 32402

Query: 72 CTAGGTTTATAGCAGCAGCATCNACAATAGCCAAACTAGGGAAAGAGCCCAAATGTCC 129
Sbjct: 32403 GTATGTTTATTGCGGCACTATTCACAATAGCAAAGACTTGGAACCAACCCAAATGTCC 32460

Score = 179 (49.5 bits), Expect = 0.00055, Sum P(2) = 0.00055 Identities = 53/75 (70%), Positives = 53/75 (70%), Strand = Plus / Plus

Query: 63 CATGCACCCCTAGGTTTATAGCAGCAGCATCNACAATAGCCAAACTAGGGAAAGAGCCCA 122
Sbjct: 104474 CATGCACAAGTATGTTTATTGCAGCACTATTCACAATAGCAAAGGCTTGGAACCAACCCA 104533

Query: 123 AATGTCCTTTGATAA 137
Sbjct: 104534 AATGTCCATCAATGA 104548

Score = 79 (21.8 bits), Expect = 0.00055, Sum P(2) = 0.00055 Identities = 23/32 (71%), Positives = 23/32 (71%), Strand = Plus / Plus

Query: 13 ACTACTGGGTATTTACCCAAAAGAGGCAAAAA 44
Sbjct: 29595 ACTACTGGGTACATAACGAAATGAAGGCAGAA 29626

Score = 79 (21.8 bits), Expect = 0.00055, Sum P(2) = 0.00055 Identities = 23/32 (71%), Positives = 23/32 (71%), Strand = Plus / Plus

Query: 13 ACTACTGGGTATTTACCCAAAAGAGGCAAAAA 44
Sbjct: 68317 ACTACTGGGTACATAACGAAATGAAGGCAGAA 68348

Minus Strand HSPs:

Score = 264 (72.9 bits), Expect = 3.7e-12, P = 3.7e-12 Identities = 82/120 (68%), Positives = 82/120 (68%), Strand = Minus / Plus

Query: 131 AAGGACATTTGGGCTCTTTCCCTAGTTTGGCTATTGTNGATGCTGCTGCTATAAACCTAG 72
Sbjct: 48564 ATGGACATTTGGGTTCGTTCCAAGTCTTTGCTATTGTGAATGGTGCCGCAATAAACATAC 48623

Query: 71 GGGTGCATGNATCCNTCTGAATTAGTATTTTTGCCTCTTTTGGGTAAATACCCAGTAGTG 12
Sbjct: 48624 ATGTGCATGTGTCTTTATAGCAGCGTGATTTATAATCCTTTGGGTATATACCCAGTAATG 48683

Score = 245 (67.7 bits), Expect = 1.6e-10, P = 1.6e-10 Identities = 79/118 (66%), Positives = 79/118 (66%), Strand = Minus / Plus

Query: 129 GGACATTTGGGCTCTTTCCCTAGTTTGGCTATTGTNGATGCTGCTGCTATAAACCTAGGG 70
Sbjct: 67417 GGACATTTGGATTGGTTCCAAGTCTTTGCTATTGTGAATAGTGCCGCAATAAACATACGT 67476

Query: 69 GTGCATGNATCCNTCTGAATTAGTATTTTTGCCTCTTTTGGGTAAATACCCAGTAGTG 12
Sbjct: 67477 GTGCATGTGTCTTTATAGCAGCATGATTTATAGTCTTTTGGGTATATACCCAGTAATG 67534

Score = 236 (65.2 bits), Expect = 2.6e-14, Sum P(2) = 2.6e-14 Identities = 78/118 (66%), Positives = 78/118 (66%), Strand = Minus / Plus

Query: 129 GGACATTTGGGCTCTTTCCCTAGTTTGGCTATTGTNGATGCTGCTGCTATAAACCTAGGG 70
Sbjct: 147299 GGACATTTGGGTTGGTTCCAAGTCTTTGCTATCGTGAATAATGCCGCAATAAACATACGT 147358

Query: 69 GTGCATGNATCCNTCTGAATTAGTATTTTTGCCTCTTTTGGGTAAATACCCAGTAGTG 12
Sbjct: 147359 GTGCATGTGTCTTTATAGCAGCATGATTTATAGTCCTTTGGGTATATACCCAGTAATG 147416

Score = 218 (60.2 bits), Expect = 3.2e-08, P = 3.2e-08 Identities = 76/118 (64%), Positives = 76/118 (64%), Strand = Minus / Plus

Query: 129 GGACATTTGGGCTCTTTCCCTAGTTTGGCTATTGTNGATGCTGCTGCTATAAACCTAGGG 70
Sbjct: 60470 GGACATTTGGGTTGGTTCCAAGTCTTCGGTATTGTGAATAATGCCGCAATAAACATACAT 60529

Query: 69 GTGCATGNATCCNTCTGAATTAGTATTTTTGCCTCTTTTGGGTAAATACCCAGTAGTG 12
Sbjct: 60530 GTGCATGTGTCTTTATAGCAGCATGATTTATAGTCCTTTGGGTACATTCCCAGTAATG 60587

Score = 177 (48.9 bits), Expect = 9.7e-05, P = 9.7e-05 Identities = 51/71 (71%), Positives = 51/71 (71%), Strand = Minus / Plus

Query: 135 ATCAAAGGACATTTGGGCTCTTTCCCTAGTTTGGCTATTGTNGATGCTGCTGCTATAAAC 76
Sbjct: 117759 ATTGATGGTCATCTGGGCTGGTTCCCTATTTTTGCAATTGCAAACTGTGCTGCTATAAAC 117818

Query: 75 CTAGGGGTGCA 65
Sbjct: 117819 ATGCGTGTGCA 117829

Identities = 39/53 (73%), Positives = 39/53 (73%), Strand = Minus / Plus

Query: 249 CAACATCATTGCAAATGGNAAGATTTCATTCTATGTGACGGCTGAGTCATNTT 197
Sbjct: 140324 CCACATTATTGCAAAGGGCATGATTTCATTCTTTTTTATTGCTGTGTATTATT 140376

SUBCLONE 1E3

Sequence 1E3F:

Query= 1E3F (397 letters)

Database: Non-redundant GenBank+EMBL+DDBJ+PDB sequences 347,056 sequences; 530,023,315 total letters. Searching done

Sequences producing High-scoring Segment Pairs: High Score, Smallest Sum Probability P(N), N
emb|X53553|BTILGF2 Bovine mRNA for insulin-like growth... 255 1.4e-12 2
gb|U00668|OAINIGFII7 Ovis aries insulin-like growth fact... 246 1.7e-10 1
emb|X15248|OOIGFII Sheep mRNA for insulin-like growth... 246 1.7e-10 1
gb|M89788|SHPIGFIIA Ovis aries insulin-like growth fact... 246 1.8e-10 1

Plus Strand HSPs:

Score = 255 (70.5 bits), Expect = 1.4e-12, Sum P(2) = 1.4e-12 Identities = 63/78 (80%), Positives = 63/78 (80%), Strand = Plus / Plus

Query: 1 TCCAACAGGCTGAGGAGATCCTAGTAACACCTCTAAAAATGTACAAACTCAATTGGCTTT 60
Sbjct: 635 TCCATCTGGCTGAGGGGATCAGAACAACATCTCTAAAAATGTACAAAACCAATTGGCTTT 694

Query: 61 CATAACCCCCCAAAATTA 78
Sbjct: 695 AAATATCCCCCCAAATTA 712

Score = 82 (22.7 bits), Expect = 1.4e-12, Sum P(2) = 1.4e-12 Identities = 18/20 (90%), Positives = 18/20 (90%), Strand = Plus / Plus

Query: 94 CCCAAATTACACAACAGAAA 113
Sbjct: 730 CCCAAATTACACAACCAAAA 749

gb|U00668|OAINIGFII7 Ovis aries insulin-like growth factor II (IGF-II) gene, exon 10 sequence and complete cds Length = 720

Plus Strand HSPs:

Score = 246 (68.0 bits), Expect = 1.7e-10, P = 1.7e-10 Identities = 62/78 (79%), Positives = 62/78 (79%), Strand = Plus / Plus

Query: 1 TCCAACAGGCTGAGGAGATCCTAGTAACACCTCTAAAAATGTACAAACTCAATTGGCTTT 60
Sbjct: 484 TCCATCTGGCCGAGGGGATCAGAACAACATCTCTAAAAATGTACAAAACCAATTGGCTTT 543

Query: 61 CATAACCCCCCAAAATTA 78
Sbjct: 544 AAATATCCCCCCAAATTA 561

emb|X15248|OOIGFII Sheep mRNA for insulin-like growth factor II Length = 883

Plus Strand HSPs:

Score = 246 (68.0 bits), Expect = 1.7e-10, P = 1.7e-10 Identities = 62/78 (79%), Positives = 62/78 (79%), Strand = Plus / Plus

Query: 1 TCCAACAGGCTGAGGAGATCCTAGTAACACCTCTAAAAATGTACAAACTCAATTGGCTTT 60
Sbjct: 733 TCCATCTGGCCGAGGGGATCAGAACAACATCTCTAAAAATGTACAAAACCAATTGGCTTT 792

Query: 61 CATAACCCCCCAAAATTA 78
Sbjct: 793 AAATATCCCCCCAAATTA 810

gb|M89788|SHPIGFIIA Ovis aries insulin-like growth factor II (IGF-II) mRNA, complete cds. Length = 1036

Plus Strand HSPs:

Score = 246 (68.0 bits), Expect = 1.8e-10, P = 1.8e-10 Identities = 62/78 (79%), Positives = 62/78 (79%), Strand = Plus / Plus

Query: 1 TCCAACAGGCTGAGGAGATCCTAGTAACACCTCTAAAAATGTACAAACTCAATTGGCTTT 60
Sbjct: 822 TCCATCTGGCCGAGGGGATCAGAACAACATCTCTAAAAATGTACAAAACCAATTGGCTTT 881

Query: 61 CATAACCCCCCAAAATTA 78
Sbjct: 882 AAATATCCCCCCAAATTA 899

emb|X55638|OAIGFIIR O.aries mRNA for insulin-like growth factor-II Length = 1225

Plus Strand HSPs:

Score = 246 (68.0 bits), Expect = 1.8e-10, P = 1.8e-10 Identities = 62/78 (79%), Positives = 62/78 (79%), Strand = Plus / Plus

Query: 1 TCCAACAGGCTGAGGAGATCCTAGTAACACCTCTAAAAATGTACAAACTCAATTGGCTTT 60
Sbjct: 1003 TCCATCTGGCCGAGGGGATCAGAACAACATCTCTAAAAATGTACAAAACCAATTGGCTTT 1062

Query: 61 CATAACCCCCCAAAATTA 78
Sbjct: 1063 AAATATCCCCCCAAATTA 1080

emb|X53554|OAIGFII Ovine mRNA for insulin-like growth factor II (IGF-II) Length = 1152

Plus Strand HSPs:

Score = 201 (55.5 bits), Expect = 1.2e-06, P = 1.2e-06 Identities = 57/78 (73%), Positives = 57/78 (73%), Strand = Plus / Plus

Query: 1 TCCAACAGGCTGAGGAGATCCTAGTAACACCTCTAAAAATGTACAAACTCAATTGGCTTT 60

Sbjct: 947 TCCATCTGGCCGAGGGGATCAGAACAACTATCTCAAAAATGTACAAAACCAATTGGCTTT 1006

Query: 61 CATAACCCCCCAAAATTA 78

Sbjct: 1007 AAATATCCCCCCAAATTA 1024

Sequence 1E3R:

Query= 1E3R (256 letters)

Database: Non-redundant GenBank+EMBL+DDBJ+PDB sequences 347,056 sequences; 530,023,315 total letters. Searching done

Sequences producing High-scoring Segment Pairs: High Score, Smallest Sum Probability P(N), N
emb|X53553|BTILGF2 Bovine mRNA for insulin-like growth... 306 6.6e-29 2
gb|U00668|OAINIGFII7 Ovis aries insulin-like growth facto... 297 1.6e-27 2
gb|M89788|SHPIGFIIA Ovis aries insulin-like growth facto... 297 2.8e-27 2
emb|X55638|OAIGFIIR O.aries mRNA for insulin-like growth... 297 2.0e-26 2
emb|X53554|OAIGFII Ovine mRNA for insulin-like growth f... 252 1.7e-23 2
emb|X15248|OOIGFII Sheep mRNA for insulin-like growth f... 297 1.5e-15 1

emb|X53553|BTILGF2 Bovine mRNA for insulin-like growth factor II, partial Length = 845

Minus Strand HSPs:

Score = 306 (84.6 bits), Expect = 6.6e-29, Sum P(2) = 6.6e-29 Identities = 84/120 (70%), Positives = 84/120 (70%), Strand = Minus / Plus

Query: 254 CCCCCTCCATCAGGGNGAGGAGATCNTNGTAACACCTCTAAAAANGTACAAANTAAANTG 195
Sbjct: 630 CCCCCTCCATCTGGCTGAGGGGATCAGAACAACATCTCTAAAAATGTACAAAACCAATTG 689

Query: 194 GCTTTCATAACCCCCCAAAATTANNNNNNNNAAATTTTTCCCCAATTAACACAACNGAAA 135
Sbjct: 690 GCTTTAAATATCCCCCCAAATTATCACCCCCCAAATTACCCCCAAATTACACAACCAAAA 749

Score = 178 (49.2 bits), Expect = 6.6e-29, Sum P(2) = 6.6e-29 Identities = 44/55 (80%), Positives = 44/55 (80%), Strand = Minus / Plus

Query: 55 TCNGTCCCCTTAAAACAAATTGGCTTTTTAGGAACACCAGCAAAATTAATTAGTT 1
Sbjct: 771 TCAGCCCCCTTGAAACGAATTGGCTTTTTAGCAACACCAGAAAAGCAAACTAGCT 825

gb|U00668|OAINIGFII7 Ovis aries insulin-like growth factor II (IGF-II) gene, exon 10 sequence and complete cds Length = 720

Minus Strand HSPs:

Score = 297 (82.1 bits), Expect = 1.6e-27, Sum P(2) = 1.6e-27 Identities = 83/120 (69%), Positives = 83/120 (69%), Strand = Minus / Plus

Query: 254 CCCCCTCCATCAGGGNGAGGAGATCNTNGTAACACCTCTAAAAANGTACAAANTAAANTG 195
Sbjct: 479 CCCCCTCCATCTGGCCGAGGGGATCAGAACAACATCTCTAAAAATGTACAAAACCAATTG 538

Query: 194 GCTTTCATAACCCCCCAAAATTANNNNNNNNAAATTTTTCCCCAATTAACACAACNGAAA 135
Sbjct: 539 GCTTTAAATATCCCCCCAAATTATCACCCCCCAAATTACCCCCAAATTATACAACCAAAA 598

Score = 169 (46.7 bits), Expect = 1.6e-27, Sum P(2) = 1.6e-27 Identities = 43/55 (78%), Positives = 43/55 (78%), Strand = Minus / Plus

Query: 55 TCNGTCCCCTTAAAACAAATTGGCTTTTTAGGAACACCAGCAAAATTAATTAGTT 1
Sbjct: 620 TCAGCCCCCTTGAAATGAATTGGCTTTTTAGCAACACCAGAAAAGCAAACTAGCT 674

gb|M89788|SHPIGFIIA Ovis aries insulin-like growth factor II (IGF-II) mRNA, complete cds. Length = 1036

Minus Strand HSPs:

Score = 297 (82.1 bits), Expect = 2.8e-27, Sum P(2) = 2.8e-27 Identities = 83/120 (69%), Positives = 83/120 (69%), Strand = Minus / Plus

Query: 254 CCCCCTCCATCAGGGNGAGGAGATCNTNGTAACACCTCTAAAAANGTACAAANTAAANTG 195
Sbjct: 817 CCCCCTCCATCTGGCCGAGGGGATCAGAACAACATCTCTAAAAATGTACAAAACCAATTG 876

Query: 194 GCTTTCATAACCCCCCAAAATTANNNNNNNNAAATTTTTCCCCAATTAACACAACNGAAA 135
Sbjct: 877 GCTTTAAATATCCCCCCAAATTATCACCCCCCAAATTACCCCCAAATTATACAACCAAAA 936

Score = 169 (46.7 bits), Expect = 2.8e-27, Sum P(2) = 2.8e-27 Identities = 43/55 (78%), Positives = 43/55 (78%), Strand = Minus / Plus

Query: 55 TCNGTCCCCTTAAAACAAATTGGCTTTTTAGGAACACCAGCAAAATTAATTAGTT 1
Sbjct: 958 TCAGCCCCCTTGAAATGAATTGGCTTTTTAGCAACACCAGAAAAGCAAACTAGCT 1012

emb|X55638|OAIGFIIR O.aries mRNA for insulin-like growth factor-II Length = 1225

Minus Strand HSPs:

Score = 297 (82.1 bits), Expect = 2.0e-26, Sum P(2) = 2.0e-26 Identities = 83/120 (69%), Positives = 83/120 (69%), Strand = Minus / Plus

Query: 254 CCCCCTCCATCAGGGNGAGGAGATCNTNGTAACACCTCTAAAAANGTACAAANTAAANTG 195
Sbjct: 998 CCCCCTCCATCTGGCCGAGGGGATCAGAACAACATCTCTAAAAATGTACAAAACCAATTG 1057

Query: 194 GCTTTCATAACCCCCCAAAATTANNNNNNNNAAATTTTTCCCCAATTAACACAACNGAAA 135
Sbjct: 1058

Score = 160 Identities =

Query: 55
Sbjct: 1139

emb|X53554|OAIGFII Ovine mRNA for insulin-like growth factor II (IGF-II) Length = 1152

Minus Strand HSPs:

Score = 252 (69.6 bits), Expect = 1.7e-23, Sum P(2) = 1.7e-23 Identities = 78/120 (65%), Positives = 78/120 (65%), Strand = Minus / Plus

Query: 254 CCCCCTCCATCAGGGNGAGGAGATCNTNGTAACACCTCTAAAAANGTACAAANTAAANTG 195
Sbjct: 942 CCCCCTCCATCTGGCCGAGGGGATCAGAACAACTATCTCAAAAATGTACAAAACCAATTG 1001

Query: 194 GCTTTCATAACCCCCCAAAATTANNNNNNNNAAATTTTTCCCCAATTAACACAACNGAAA 135
Sbjct: 1002 GCTTTAAATATCCCCCCAAATTATCACCCCCCAAATTACCCCCAAATTATACAACCAAAA 1061

Score = 169 (46.7 bits), Expect = 1.7e-23, Sum P(2) = 1.7e-23 Identities = 43/55 (78%), Positives = 43/55 (78%), Strand = Minus / Plus

Query: 55 TCNGTCCCCTTAAAACAAATTGGCTTTTTAGGAACACCAGCAAAATTAATTAGTT 1
Sbjct: 1083 TCAGCCCCCTTGAAATGAATTGGCTTTTTAGCAACACCAGAAAAGCAAACTAGCT 1137

emb|X15248|OOIGFII Sheep mRNA for insulin-like growth factor II Length = 883

Minus Strand HSPs:

Score = 297 (82.1 bits), Expect = 1.5e-15, P = 1.5e-15 Identities = 83/120 (69%), Positives = 83/120 (69%), Strand = Minus / Plus

Query: 254 CCCCCTCCATCAGGGNGAGGAGATCNTNGTAACACCTCTAAAAANGTACAAANTAAANTG 195
Sbjct: 728 CCCCCTCCATCTGGCCGAGGGGATCAGAACAACATCTCTAAAAATGTACAAAACCAATTG 787

Query: 194 GCTTTCATAACCCCCCAAAATTANNNNNNNNAAATTTTTCCCCAATTAACACAACNGAAA 135
Sbjct: 788 GCTTTAAATATCCCCCCAAATTATCACCCCCCAAATTACCCCCAAATTATACAACCAAAA 847

SUBCLONE 2A7

Sequence 2A7F:

Query= 2A7F (355 letters)

Database: Non-redundant GenBank+EMBL+DDBJ+PDB sequences 347,056 sequences; 530,023,315 total letters. Searching done

Sequences producing High-scoring Segment Pairs: High Score, Smallest Sum Probability P(N), N
gb|M73046|DOGSNVD17B Dog inserted sequence in spleen necr... 273 2.0e-15 1
gb|M63427|DOGCOLIP Dog pancreatic colipase gene, comple... 256 1.8e-13 1
emb|X57357|CFREPDNA Canis familiaris (dog) repetitive DN... 229 2.5e-11 1
gb|U17996|CFU17996 Canis familiaris SINE and TC microsa... 238 4.5e-11 1
emb|Y11309|CFCGMPI2 C.familiaris gene encoding cGMP-gate... 229 4.2e-10 1
gb|L47165|DOGTYRA Canis familiaris tyrosine aminotrans... 220 3.0e-09 1

gb|M73046|DOGSNVD17B Dog inserted sequence in spleen necrosis vector clone. Length = 202

Plus Strand HSPs:

Score = 273 (75.4 bits), Expect = 2.0e-15, P = 2.0e-15 Identities = 73/109 (66%), Positives = 73/109 (66%), Strand = Plus / Plus

Query: 1 GAGTCCCACGTCAGGCTCCCGGTGCATGGAGCCTGCTTCTCCCTCTGCCTATGNNNNNNN 60
Sbjct: 59 GAATCCCGCGTCGGGCTCCCGGTGCATGGAGCCTGCTTCTCCCTCTGTCTATGTCTCTGC 118

Query: 61 NNNNNNNNNNNNNNGTGTGACTATCATNATTNNNTTCAAATTANTAAAA 109
Sbjct: 119 CTCTCTCTCTGTGTGTGTGACTATCATAAATAAATAAAAAATTAAAAAA 167
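In this 2A7F alignment against M73046, the ambiguous calls at query positions 61-74 lie opposite CTCTCTCTCTGTGT (subject 119-132), a CT/GT dinucleotide stretch. The sketch below lists what the subject carries under each run of Ns in the query. It assumes an ungapped alignment with both sequences printed over the same span. `subject_under_n_runs` and its `min_len` threshold are hypothetical.

```python
import re


def subject_under_n_runs(query: str, sbjct: str, min_len: int = 3):
    """List (start, end, subject bases) for each run of >= min_len Ns in the query.

    Offsets are 0-based and end-exclusive, relative to the printed line.
    """
    if len(query) != len(sbjct):
        raise ValueError("an ungapped alignment needs equal-length sequences")
    pattern = re.compile("N{%d,}" % min_len)
    return [(m.start(), m.end(), sbjct[m.start():m.end()])
            for m in pattern.finditer(query.upper())]


q = "NNNNNNNNNNNNNNGTGTGACTATCATNATTNNNTTCAAATTANTAAAA"   # query 61-109
s = "CTCTCTCTCTGTGTGTGTGACTATCATAAATAAATAAAAAATTAAAAAA"   # subject 119-167
for start, end, bases in subject_under_n_runs(q, s):
    print(start, end, bases)
# 0 14 CTCTCTCTCTGTGT
# 31 34 AAA
```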

gb|M63427|DOGCOLIP Dog pancreatic colipase gene, complete cds. Length = 3164

Plus Strand HSPs:

Score = 256 (70.7 bits), Expect = 1.8e-13, P = 1.8e-13 Identities = 64/91 (70%), Positives = 64/91 (70%), Strand = Plus / Plus

Query: 1 GAGTCCCACGTCAGGCTCCCGGTGCATGGAGCCTGCTTCTCCCTCTGCCTATGNNNNNNN 60
Sbjct: 579 GAATCCCACATCAGGCTCCCGGTGCATGGAGCCTGCTTCTCCCTCTGCCTGTGTCTCTGC 638

Query: 61 NNNNNNNNNNNNNNGTGTGACTATCATNATT 91
Sbjct: 639 CTCTCTCTCTCTCTCTGTGACTATCATGAAT 669

Minus Strand HSPs:

Score = 137 (37.9 bits), Expect = 2.4e-05, Sum P(2) = 2.4e-05 Identities = 29/31 (93%), Positives = 29/31 (93%), Strand = Minus / Plus

Query: 53 CATAGGCAGAGGGAGAAGCAGGCTCCATGCA 23
Sbjct: 1219 CACAGGCAGGGGGAGAAGCAGGCTCCATGCA 1249

Score = 101 (27.9 bits), Expect = 2.4e-05, Sum P(2) = 2.4e-05 Identities = 21/22 (95%), Positives = 21/22 (95%), Strand = Minus / Plus

Query: 22 CCGGGAGCCTGACGTGGGACTC 1
Sbjct: 1248 CAGGGAGCCTGACGTGGGACTC 1269

emb|X57357|CFREPDNA Canis familiaris (dog) repetitive DNA sequence Length = 130

Plus Strand HSPs:

Score = 229 (63.3 bits), Expect = 2.5e-11, P = 2.5e-11 Identities = 49/53 (92%), Positives = 49/53 (92%), Strand = Plus / Plus

Query: 1 GAGTCCCACGTCAGGCTCCCGGTGCATGGAGCCTGCTTCTCCCTCTGCCTATG 53
Sbjct: 70 GAATCCCACATCGGGCTCCCGGTGCATGGAGCCTGCTTCTCCCTCTGCCTGTG 122

gb|U17996|CFU17996 Canis familiaris SINE and TC microsatellite repeat units, genomic sequence Length = 737

Plus Strand HSPs:

Score = 238 (65.8 bits), Expect = 4.5e-11, P = 4.5e-11 Identities = 50/53 (94%), Positives = 50/53 (94%), Strand = Plus / Plus

Query: 1 GAGTCCCACGTCAGGCTCCCGGTGCATGGAGCCTGCTTCTCCCTCTGCCTATG 53
Sbjct: 64 GAGTCCCATGTCGGGCTCCCGGTGCATGGAGCCTGCTTCTCCCTCTGCCTGTG 116

emb|Y11309|CFCGMPI2 C.familiaris gene encoding cGMP-gated channel alpha subunit, intron 2 Length = 1295

Minus Strand HSPs:

Score = 229 (63.3 bits), Expect = 4.2e-10, P = 4.2e-10 Identities = 49/53 (92%), Positives = 49/53 (92%), Strand = Minus / Plus

Query: 53 CATAGGCAGAGGGAGAAGCAGGCTCCATGCACCGGGAGCCTGACGTGGGACTC 1
Sbjct: 738 CATAGGCAGAGGGAGAAGCAGGCTCCATACACCGGGAGCCCGATGTGGGATTC 790

gb|L47165|DOGTYRA Canis familiaris tyrosine aminotransferase gene, partial cds. Length = 919

Minus Strand HSPs:

Score = 220 (60.8 bits), Expect = 3.0e-09, P = 3.0e-09 Identities = 48/53 (90%), Positives = 48/53 (90%), Strand = Minus / Plus

Query: 53 CATAGGCAGAGGGAGAAGCAGGCTCCATGCACCGGGAGCCTGACGTGGGACTC 1
Sbjct: 404 CACAGGCAGAGGGAGAAGCAGGCTCCATGCACAGGGAGCCCGCCGTGGGATTC 456

gb|U46916|CFU46916 Canis familiaris MHC DLA Class II DRB pseudogene DRB2. Length = 2117

Plus Strand HSPs:

Score = 150 (41.4 bits), Expect = 0.0067, P = 0.0066 Identities = 34/39 (87%), Positives = 34/39 (87%), Strand = Plus / Plus

Query: 15 GCTCCCGGTGCATGGAGCCTGCTTCTCCCTCTGCCTATG 53
Sbjct: 1666 GGTCTCCCTGCATGGAGCCTGCTTCTCCCTCTGCCTGTG 1704

gb|U33628|CFU33628 Canis familiaris sphingolipid Ca2+ release mediating protein of endoplasmic reticulum mRNA, complete cds. Length = 1869

Plus Strand HSPs:

Score = 146 (40.3 bits), Expect = 0.015, P = 0.015 Identities = 30/31 (96%), Positives = 30/31 (96%), Strand = Plus / Plus

Query: 23 TGCATGGAGCCTGCTTCTCCCTCTGCCTATG 53
Sbjct: 1209 TGCATGGAGCCTGCTTCTCCCTCTGCCTGTG 1239

emb|Y09004|CFRHOD C.familiaris gene encoding rhodopsin Length = 5352

Minus Strand HSPs:

Score = 137 (37.9 bits), Expect = 0.092, P = 0.088 Identities = 29/31 (93%), Positives = 29/31 (93%), Strand = Minus / Plus

Query: 53 CATAGGCAGAGGGAGAAGCAGGCTCCATGCA 23
Sbjct: 1922 CACAGGCAGAGGAAGAAGCAGGCTCCATGCA 1952

Score = 118 (32.6 bits), Expect = 4.1, P = 0.98 Identities = 26/29 (89%), Positives = 26/29 (89%), Strand = Minus / Plus

Query: 53 CATAGGCAGAGGGAGAAGCAGGCTCCATG 25
Sbjct: 1450 CAGAGGTAGAGGGAGAAGCAGGCTCCCTG 1478
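Several listings give both an Expect value E and a single-HSP probability P. These pairs (Expect = 4.1 with P = 0.98, Expect = 0.38 with P = 0.32, Expect = 0.092 with P = 0.088) fit the Poisson relation P = 1 - exp(-E) used in Karlin-Altschul statistics. For small E, P is almost exactly E, which is why most lines show the two values as equal. The minimal sketch below implements that conversion; `p_from_expect` is a hypothetical name. The Sum P values for multi-HSP sets are not modelled here. The loop prints 0.98, 0.32, 0.088 and 2.4e-19 beside the four E values.

```python
import math


def p_from_expect(e: float) -> float:
    """Probability of at least one chance HSP when e are expected (Poisson)."""
    # -expm1(-e) equals 1 - exp(-e) but stays accurate when e is tiny.
    return -math.expm1(-e)


for e in (4.1, 0.38, 0.092, 2.4e-19):
    print(e, f"{p_from_expect(e):.2g}")
```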

emb|X86187|CFCALE009 C.familiaris CA repeat sequence Length = 210

Minus Strand HSPs:

Score = 128 (35.4 bits), Expect = 0.38, P = 0.32 Identities = 28/31 (90%), Positives = 28/31 (90%), Strand = Minus / Plus

Query: 53 CATAGGCAGAGGGAGAAGCAGGCTCCATGCA 23
Sbjct: 158 CACAGGCAGGGGGAGAAGCAGACTCCATGCA 188

gb|L25313|DOGDNRPTPM Canis familaris VIAS-D23 locus dinucleotide repeat polymorphism. Length = 229

Plus Strand HSPs:

Score = 128 (35.4 bits), Expect = 0.40, P = 0.33 Identities = 28/31 (90%), Positives = 28/31 (90%), Strand = Plus / Plus

Query: 23 TGCATGGAGCCTGCTTCTCCCTCTGCCTATG 53
Sbjct: 5 TGCGTGGAACCTGCTTCTCCCTCTGCCTGTG 35

emb|Y13283|CFPDEA5UT Canis familiaris, PDEA gene, 5'UTR Length = 738

Minus Strand HSPs:

Score = 128 (35.4 bits), Expect = 0.51, P = 0.40 Identities = 28/31 (90%), Positives = 28/31 (90%), Strand = Minus / Plus

Query: 53 CATAGGCAGAGGGAGAAGCAGGCTCCATGCA 23
Sbjct: 399 CACAAGCAGAGGGAGAAGCAGGCCCCATGCA 429

emb|A33693|A33693 C.familiaris IFN-alpha 1 Length = 1952

Plus Strand HSPs:

Score = 128 (35.4 bits), Expect = 0.55, P = 0.42 Identities = 28/31 (90%), Positives = 28/31 (90%), Strand = Plus / Plus

Query: 23 TGCATGGAGCCTGCTTCTCCCTCTGCCTATG 53
Sbjct: 995 TACACGGAGCCTTCTTCTCCCTCTGCCTATG 1025

gb|M21757|DOGFIX Canis familiaris factor IX mRNA, complete cds. Length = 2869

Plus Strand HSPs:

Score = 128 (35.4 bits), Expect = 0.55, P = 0.42 Identities = 28/31 (90%), Positives = 28/31 (90%), Strand = Plus / Plus

Query: 23 TGCATGGAGCCTGCTTCTCCCTCTGCCTATG 53
Sbjct: 2781 TGCAGGGGGCCTGCTTCTCCCTCTGCTTATG 2811

Sequence 2A7R:

Query= 2A7R (378 letters)

Database: Non-redundant GenBank+EMBL+DDBJ+PDB sequences 347,056 sequences; 530,023,315 total letters.

Sequences producing High-scoring Segment Pairs: High Score, Smallest Sum Probability P(N), N
gb|M73046|DOGSNVD17B Dog inserted sequence in spleen necr... 351 1.6e-22 1
gb|M63427|DOGCOLIP Dog pancreatic colipase gene, comple... 342 1.2e-20 1

gb|M73046|DOGSNVD17B Dog inserted sequence in spleen necrosis virus vector provirus clone. Length = 202

Minus Strand HSPs:

Score = 351 (97.0 bits), Expect = 1.6e-22, P = 1.6e-22 Identities = 79/98 (80%), Positives = 79/98 (80%), Strand = Minus / Plus

Query: 378 ATCGANTCCCACGTCAGGCTCCCGGTGCATGGANCCTGCTTCTCCCTCTGCCTATGTCTC 319
Sbjct: 56

Query: 318 TGCNNNNNNNNNNNNNNGTGTGACTATCATAAATAAAT 281
Sbjct: 116 TGCCTCTCTCTCTGTGTGTGTGACTATCATAAATAAAT 153
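The percentages in the Identities and Positives fields appear to be truncated rather than rounded: 79/98 is 80.6%, yet it is reported above as 80%. The sketch below reproduces that convention with integer division. Truncation is an inference from the numbers in this appendix, not something stated in the BLAST documentation, and the helper name `percent_identity` is hypothetical.

```python
def percent_identity(identical: int, aligned: int) -> int:
    """Percentage of identical positions in an HSP, truncated to a whole number."""
    # Integer division discards the fractional part (e.g. 80.6 -> 80)
    return (100 * identical) // aligned


# (identities, alignment length, reported %) taken from HSP lines in this appendix
reported = [(28, 31, 90), (79, 98, 80), (78, 98, 79), (68, 92, 73), (66, 103, 64)]

for identical, aligned, shown in reported:
    print(identical, aligned, percent_identity(identical, aligned), shown)
```

Ordinary rounding would give 81% for 79/98 and 74% for 68/92, so it disagrees with the reported values where truncation does not.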

gb|M63427|DOGCOLIP Dog pancreatic colipase gene, complete cds. Length = 3164

Plus Strand HSPs:

Score = 165 (45.6 bits), Expect = 3.0e-08, Sum P(2) = 3.0e-08 Identities = 35/38 (92%), Positives = 35/38 (92%), Strand = Plus / Plus

Query: 316 GCAGAGACATAGGCAGAGGGAGAAGCAGGNTCCATGCA 353
Sbjct: 1212 GCAGAGACACAGGCAGGGGGAGAAGCAGGCTCCATGCA 1249

Score = 109 (30.1 bits), Expect = 3.0e-08, Sum P(2) = 3.0e-08 Identities = 23/25 (92%), Positives = 23/25 (92%), Strand = Plus / Plus

Query: 354 CCGGGAGCCTGACGTGGGANTCGAT 378
Sbjct: 1248 CAGGGAGCCTGACGTGGGACTCGAT 1272

Minus Strand HSPs:

Score = 342 (94.5 bits), Expect = 1.2e-20, P = 1.2e-20 Identities = 78/98 (79%), Positives = 78/98 (79%), Strand = Minus / Plus

Query: 378 ATCGANTCCCACGTCAGGCTCCCGGTGCATGGANCCTGCTTCTCCCTCTGCCTATGTCTC 319
Sbjct: 576 ATCGAATCCCACATCAGGCTCCCGGTGCATGGAGCCTGCTTCTCCCTCTGCCTGTGTCTC 635

Query: 318 TGCNNNNNNNNNNNNNNGTGTGACTATCATAAATAAAT 281
Sbjct: 636 TGCCTCTCTCTCTCTCTCTGTGACTATCATGAATAAAT 673

emb|Y11309|CFCGMPI2 C.familiaris gene encoding cGMP-gated channel alpha subunit, intron 2 Length = 1295

Plus Strand HSPs:

Score = 274 (75.7 bits), Expect = 8.1e-19, Sum P(2) = 8.1e-19 Identities = 58/63 (92%), Positives = 58/63 (92%), Strand = Plus / Plus

Query: 316 GCAGAGACATAGGCAGAGGGAGAAGCAGGNTCCATGCACCGGGAGCCTGACGTGGGANTC 375
Sbjct: 731 GCAGAGACATAGGCAGAGGGAGAAGCAGGCTCCATACACCGGGAGCCCGATGTGGGATTC 790

Query: 376 GAT 378
Sbjct: 791 GAT 793

Score = 91 (25.1 bits), Expect = 8.1e-19, Sum P(2) = 8.1e-19 Identities = 19/20 (95%), Positives = 19/20 (95%), Strand = Plus / Plus

Query: 281 ATTTATTTATGATAGTCACA 300
Sbjct: 698 ATTTATTTATGATAGTCTCA 717

gb|U17996|CFU17996 Canis familiaris SINE and TC microsatellite repeat units, genomic sequence Length = 737

Minus Strand HSPs:

Score = 265 (73.2 bits), Expect = 2.0e-18, Sum P(2) = 2.0e-18 Identities = 57/63 (90%), Positives = 57/63 (90%), Strand = Minus / Plus

Query: 378 ATCGANTCCCACGTCAGGCTCCCGGTGCATGGANCCTGCTTCTCCCTCTGCCTATGTCTC 319
Sbjct: 61 ATCGAGTCCCATGTCGGGCTCCCGGTGCATGGAGCCTGCTTCTCCCTCTGCCTGTGTCTG 120

Query: 318 TGC 316
Sbjct: 121 TGC 123

Score = 91 (25.1 bits), Expect = 2.0e-18, Sum P(2) = 2.0e-18 Identities = 19/20 (95%), Positives = 19/20 (95%), Strand = Minus / Plus

Query: 300 TGTGACTATCATAAATAAAT 281
Sbjct: 143 TGTGACTATCATAAAAAAAT 162

emb|X57357|CFREPDNA Canis familiaris (dog) repetitive DNA sequence Length = 130

Minus Strand HSPs:

Score = 274 (75.7 bits), Expect = 3.1e-16, P = 3.1e-16 Identities = 58/63 (92%), Positives = 58/63 (92%), Strand = Minus / Plus

Query: 378 ATCGANTCCCACGTCAGGCTCCCGGTGCATGGANCCTGCTTCTCCCTCTGCCTATGTCTC 319
Sbjct: 67 ATCGAATCCCACATCGGGCTCCCGGTGCATGGAGCCTGCTTCTCCCTCTGCCTGTGTCTC 126

Query: 318 TGC 316
Sbjct: 127 TGC 129

gb|L47165|DOGTYRA Canis familiaris tyrosine aminotransferase gene, partial cds. Length = 919

Plus Strand HSPs:

Score = 265 (73.2 bits), Expect = 3.3e-13, P = 3.3e-13 Identities = 57/63 (90%), Positives = 57/63 (90%), Strand = Plus / Plus

Query: 316 GCAGAGACATAGGCAGAGGGAGAAGCAGGNTCCATGCACCGGGAGCCTGACGTGGGANTC 375
Sbjct: 397 GCAGAGACACAGGCAGAGGGAGAAGCAGGCTCCATGCACAGGGAGCCCGCCGTGGGATTC 456

Query: 376 GAT 378
Sbjct: 457 GAT 459

gb|U46916|CFU46916 Canis familiaris MHC DLA Class II DRB pseudogene DRB2. Length = 2117

Minus Strand HSPs:

Score = 192 (53.1 bits), Expect = 1.9e-06, P = 1.9e-06 Identities = 54/81 (66%), Positives = 54/81 (66%), Strand = Minus / Plus

Query: 361 GCTCCCGGTGCATGGANCCTGCTTCTCCCTCTGCCTATGTCTCTGCNNNNNNNNNNNNNN 302
Sbjct: 1666 GGTCTCCCTGCATGGAGCCTGCTTCTCCCTCTGCCTGTGTCTCTGCCTCTCTCTCTGTGT 1725

Query: 301 GTGTGACTATCATAAATAAAT 281
Sbjct: 1726 GTCTCTCATGGATAAATAAAT 1746

SUBCLONE 2A11

Sequence 2A11F: Only the sequence match of the highest scoring sequence has been shown.

BLASTN 1.4.9MP [26-March-1996] [Build 14:27:07 Apr 1 1996]

Reference: Altschul, Stephen F., Warren Gish, Webb Miller, Eugene W. Myers, and David J. Lipman (1990). Basic local alignment search tool. J. Mol. Biol. 215:403-10.

Notice: this program and its default parameter settings are optimized to find nearly identical sequences rapidly. To identify weak similarities encoded in nucleic acid, use BLASTX, TBLASTN or TBLASTX.

Query= 2A11F (604 letters)

Database: Non-redundant GenBank+EMBL+DDBJ+PDB sequences 347,056 sequences; 530,023,315 total letters. Searching...... done

Smallest Sum High Probability Sequences producing High-scoring Segment Pairs: Score P(N) N

gb|U31059|HSU31059 Human Mermaid LINE-1 element mRNA... 244 1.1e-17 2
emb|Z69648|HSE122E9 Human DNA sequence from cosmid E1... 235 1.1e-14 2
gb|U52112|HSU52112 Human Xq28 genomic DNA in the reg... 260 3.3e-11 1
gb|L35701|HUM9DC98Z Homo sapiens (subclone H9 7_d4 fr... 249 2.6e-10 1
gb|U07000|HSU07000 Human breakpoint cluster region (... 177 1.1e-08 2

gb|U31059|HSU31059 Human Mermaid LINE-1 element mRNA sequence Length = 302

Plus Strand HSPs:

Score = 244 (67.4 bits), Expect = 1.1e-17, Sum P(2) = 1.1e-17 Identities = 68/92 (73%), Positives = 68/92 (73%), Strand = Plus / Plus

Query: 266
Sbjct: 1 ?GTGCAGCCATCACCACCATCCATCTCCAGAAC

Query: 326 acccactgaacagctccccgccccaccccc
Sbjct: 61

Score = 152 Identities =

Query: 379

Sbjct: 127

Sequence 2A11R: Only selected sequence matches, indicated by *, are shown.

BLASTN 1.4.9MP [26-March-1996] [Build 14:27:07 Apr 1 1996]

Reference: Altschul, Stephen F., Warren Gish, Webb Miller, Eugene W. Myers, and David J. Lipman (1990). Basic local alignment search tool. J. Mol. Biol. 215:403-10.

Notice: this program and its default parameter settings are optimized to find nearly identical sequences rapidly. To identify weak similarities encoded in nucleic acid, use BLASTX, TBLASTN or TBLASTX.

Query= 2A11R (456 letters)

Database: Non-redundant GenBank+EMBL+DDBJ+PDB sequences 347,056 sequences; 530,023,315 total letters. Searching...... done

Smallest Sum High Probability Sequences producing High-scoring Segment Pairs: Score P(N) N

gb|U31059|HSU31059 Human Mermaid LINE-1 element mRNA... 241 6.9e-23 3 *
emb|Z69648|HSE122E9 Human DNA sequence from cosmid E12... 233 1.2e-17 3
gb|S44029|S44029 red visual pigment {5' region} [hu... 188 9.6e-11 3 *
gb|AF003626|HSAF003626 Homo sapiens cosmids IM0525, LC123... 212 1.8e-10 2
gb|U07000|HSU07000 Human breakpoint cluster region (B... 184 8.1e-10 2

gb|U31059|HSU31059 Human Mermaid LINE-1 element mRNA sequence Length = 302

Minus Strand HSPs:

Score = 241 (66.6 bits), Expect = 6.9e-23, Sum P(3) = 6.9e-23 Identities = 67/92 (72%), Positives = 67/92 (72%), Strand = Minus / Plus

Query: 326 TGTGCCACCATCACCACCATCTTNTGCCAGAACNTCATCATCATCCCAAATNGAAACTCC
Sbjct: 1

Query: 266
Sbjct: 61

Score = 161 Identities =

Query: 214
Sbjct: 126

Query: 154
Sbjct: 186

Score = 97 (26.8 bits), Expect = 6.9e-23, Sum P(3) = 6.9e-23 Identities = 21/23 (91%), Positives = 21/23 (91%), Strand = Minus / Plus

Query: 115 TTTGTGACTGGCTTATTTTACGT 93
Sbjct: 192 TTTGTGACTGGCTTATTTCACTT 214

gb|S44029|S44029 red visual pigment {5' region} [human, Genomic, 6018 nt] Length = 6018

Minus Strand HSPs:

Score = 188 (51.9 bits), Expect = 9.6e-11, Sum P(3) = 9.6e-11 Identities = 66/103 (64%), Positives = 66/103 (64%), Strand = Minus / Plus

Query: 334 CACACGGGTGTGCCACCATCACCACCATCTTNTGCCAGAACNTCATCATCATCCCAAATN 275
Sbjct: 3416 CACGGTGCTGTGCAGCCATCACCACCATCCAGCCGCAGAACTTGTTCATCTTCCCAAACT 3475

Query: 274 GAAACTCCACACCCACTGAACAGTTCCCCGCCCCACCCCCACC 232
Sbjct: 3476 GAAGCTCTGGCCCCGTTCAACCCCAACTCCCCACCCCCCAGCC 3518

Score = 138 Identities =

Query: 213
Sbjct: 3541

Query: 153

Sbjct: 3601

Score = 77 Identities =

Query: 113
Sbjct: 3608

SUBCLONE 2D2

Only the sequence match of the highest-scoring sequence has been shown.

Sequence 2D2F:

BLASTN 1.4.9MP [26-March-1996] [Build 14:27:07 Apr 1 1996]

Reference: Altschul, Stephen F., Warren Gish, Webb Miller, Eugene W. Myers, and David J. Lipman (1990). Basic local alignment search tool. J. Mol. Biol. 215:403-10.

Notice: this program and its default parameter settings are optimized to find nearly identical sequences rapidly. To identify weak similarities encoded in nucleic acid, use BLASTX, TBLASTN or TBLASTX.

Query= 2D2F (404 letters)

Database: Non-redundant GenBank+EMBL+DDBJ+PDB sequences 347,056 sequences; 530,023,315 total letters.

Smallest Sum High Probability Sequences producing High-scoring Segment Pairs: Score P(N) N

gb|L42326|DOGFRA Canis familiaris (clone GPCR W) DNA... 163 1.6e-09 2
gb|U46916|CFU46916 Canis familiaris MHC DLA Class II DR... 159 1.8e-08
emb|Y09004|CFRHOD C.familiaris gene encoding rhodopsin 168 2.6e-07 2
gb|M63427|DOGCOLIP Dog pancreatic colipase gene, comple... 141 5.9e-06 2

gb|L42326|DOGFRA Canis familiaris (clone GPCR W) DNA fragment Length = 1638

Plus Strand HSPs:

Score = 163 (45.0 bits), Expect = 1.6e-09, Sum P(2) = 1.6e-09 Identities = 43/56 (76%), Positives = 43/56 (76%), Strand = Plus / Plus

Query: 46 TGCCTGTGTCTCTGCCTCTCTCTATGTGGCCCTCAAGAACAAATTAATTATTTTTT 101
Sbjct: 108 TGCCAGTGTCTCTGCCTCTCTCTCTGGGTCTTTCATGAATAAATAAATAAAATCTT 163

Score = 141 (39.0 bits), Expect = 1.6e-09, Sum P(2) = 1.6e-09 Identities = 33/39 (84%), Positives = 33/39 (84%), Strand = Plus / Plus

Query: 7 CCACATTGGGCTCCCCGCATGGAGCCTGTTTCTCCCTCT 45
Sbjct: 70 CCATGTCGGGCTTCCTGGATGGAGCCTGTTTCTCCCTCT 108

SUBCLONE 2H12

Only the sequence match of the highest-scoring sequence has been shown.

Sequence 2H12R:

BLASTN 1.4.9MP [26-March-1996] [Build 14:27:07 Apr 1 1996]

Reference: Altschul, Stephen F., Warren Gish, Webb Miller, Eugene W. Myers, and David J. Lipman (1990). Basic local alignment search tool. J. Mol. Biol. 215:403-10.

Notice: this program and its default parameter settings are optimized to find nearly identical sequences rapidly. To identify weak similarities encoded in nucleic acid, use BLASTX, TBLASTN or TBLASTX.

Query= 2H12R (718 letters)

Database: Non-redundant GenBank+EMBL+DDBJ+PDB sequences 347,056 sequences; 530,023,315 total letters. Searching...... done

Smallest Sum High Probability Sequences producing High-scoring Segment Pairs: Score P(N) N

gb|L42326|DOGFRA Canis familiaris (clone GPCR W) DN... 280 4.4e-16 2
gb|U47339|CFMHCDRB03 Canis familiaris MHC class II DLA... 276 5.7e-14 2
gb|M63427|DOGCOLIP Dog pancreatic colipase gene, comp... 251 2.4e-13 2
emb|X80208|CFANXIIIA C.familiaris mRNA for annexin XIIIa 232 4.4e-12
gb|U46916|CFU46916 Canis familiaris MHC DLA Class II... 240 7.0e-12 2

gb|L42326|DOGFRA Canis familiaris (clone GPCR W) DNA fragment Length = 1638

Plus Strand HSPs:

Score = 280 (77.4 bits), Expect = 4.4e-16, Sum P(2) = 4.4e-16 Identities = 66/79 (83%), Positives = 66/79 (83%), Strand = Plus / Plus

Query: 24 GCACCTGCCTTCAGCCCAGGGCATGACTCTGGGGTCCTGGAATCGAGTCCTACNTCGGGC 83
Sbjct: 21 GCACCTGCCTTTGGCCCAGGGCATGACCCTGGAGTCCTGGGATCAAGTCCCATGTCGGGC 80

Query: 84 TCCCTGCATGGAACCTGCT 102
Sbjct: 81 TTCCTGGATGGAGCCTGTT 99

Score = 112 (30.9 bits), Expect = 4.4e-16, Sum P(2) = 4.4e-16 Identities = 24/27 (88%), Positives = 24/27 (88%), Strand = Plus / Plus

Query: 149 ATGAATAAATAAATAANATCTTGNAAA 175
Sbjct: 142 ATGAATAAATAAATAAAATCTTTAAAA 168

REFERENCES

Adamson, D., Albertson, H., Ballard, L., Bradley, P., Carlson, M., Cartwright, P., Council, C., Eisner, T., Fuhrman, D., Gerken, S., Harris, L., Holik, P. R., Kimball, A., Knell, J., Lawrence, E., Lu, J., Marks, A., Matsunami, N., Melis, R., Milner, B., Moore, M., Nelson, L., Odelberg, S., Peters, G., Plaetke, R., Riley, R., Robertson, M., Sargent, R., Staker, G., Tingey, A., Ward, K., Zhao, X., and White, R. (1995). A collection of ordered tetranucleotide-repeat markers from the human genome. American Journal of Human Genetics 57, 619-628.

Altschul, S. F., Gish, W., Miller, W., Myers, E. W., and Lipman, D. J. (1990). Basic local alignment search tool. Journal of Molecular Biology 215, 403-410.

Angers, B., and Bernatchez, L. (1997). Complex evolution of a salmonid microsatellite locus and its consequences in inferring allelic divergence from size information. Molecular Biology and Evolution 14, 230-238.

Ansari, H. A., Pearce, P. D., Maher, D. W., and Broad, T. E. (1994). Regional assignment of conserved reference loci anchors unassigned linkage and syntenic groups to ovine chromosomes. Genomics 24, 451-455.

Aoki, T., Koch, K. S., and Leffert, H. L. (1997). Attenuation of gene expression by a trinucleotide repeat-rich tract from the terminal exon of the rat hepatic polymeric immunoglobulin receptor gene. Journal of Molecular Biology 267, 229-236.

Ayala, F. J., and Kiger Jr., J. A. (1984). Modern Genetics, second Edition, L. Hofmann, ed. (Menlo Park: The Benjamin/Cummings Publishing Company, Inc.).

Baron, B., Poirier, C., Simon-Chazottes, D., Barnier, C., and Guenet, J.-L. (1992). A new strategy useful for rapid identification of microsatellites from DNA libraries with large size inserts. Nucleic Acids Research 20, 3665-3669.

Beckmann, J. S., and Weber, J. L. (1992). Survey of Human and Rat microsatellites. Genomics 12, 627-631.

Bell, G. I., and Jurka, J. (1997). The length distribution of perfect dimer repetitive DNA is consistent with its evolution by an unbiased single-step mutation process. Journal of Molecular Evolution 44, 414-421.

Bennett, S. T., Lucassen, A. M., Gough, S. C. L., Powell, E. E., Undlien, D. E., Pritchard, L. E., Merriman, M. E., Kawaguchi, Y., Dronsfield, M. J., Pociot, F., Nerup, J., Bouzekri, N., Cambon-Thomsen, A., Ronningen, K. S., Barnett, A. H., Bain, S. C., and Todd, J. A. (1995). Susceptibility to human type 1 diabetes at IDDM2 is determined by tandem repeat variation at the insulin gene minisatellite locus. Nature Genetics 9, 284-292.

Berry, R., Stevens, T. J., Walter, N. A. R., Wilcox, A. S., Rubano, T., Hopkins, J. A., Weber, J., Goold, R., Soares, M. B., and Sikela, J. M. (1995). Gene-based sequence-tagged sites (STSs) as the basis for a human gene map. Nature Genetics 10, 415-423.

Bielawski, J. P., and Pumo, D. E. (1997). Randomly amplified polymorphic DNA (RAPD) analysis of Atlantic Coast striped bass. Heredity 78, 32-40.

Birnboim, H. C., and Doly, J. (1979). A rapid alkaline extraction procedure for screening recombinant plasmid DNA. Nucleic Acids Research 7, 1513-1523.

Boguski, M. S., and Schuler, G. D. (1995). ESTablishing a human transcript map. Nature Genetics 10, 369-371.

Botstein, D., White, R. L., Skolnick, M., and Davis, R. W. (1980). Construction of a genetic linkage map in man using restriction fragment length polymorphisms. American Journal of Human Genetics 32, 314-331.

Bowcock, A. M., Ruiz-Linares, A., Tomfohrde, J., Minch, E., Kidd, J. R., and Cavalli-Sforza, L. L. (1994). High resolution of human evolutionary trees with polymorphic microsatellites. Nature 368, 455-457.

Brahmachari, S. K., Sarkar, P. S., Raghavan, S., Narayan, M., and Maiti, A. K. (1997). Polypurine/polypyrimidine sequences as cis-acting transcriptional regulators. Gene 190, 17-26.

Bray-Ward, P., Menninger, J., Lieman, J., Desai, T., Mokady, N., Banks, A., and Ward, D. C. (1996). Integration of the cytogenetic, genetic and physical maps of the human genome by FISH mapping of CEPH YAC clones. Genomics 32, 1-14.

Breen, M., Langford, C. F., Carter, N. P., Holmes, N. P., Dickens, H. F., Thomas, R., Suter, N., Ryder, E. J., Pope, M., and Binns, M. M. (1997a). FISH mapping and identification of canine chromosomes. In Canine genetics: the map, the genes, the diseases (Cornell University, Ithaca, U.S.A.).

Breen, M., Lindgren, G., Binns, M. M., Norman, J., Irvin, Z., Bell, K., Sandberg, K., and Ellegren, H. (1997b). Genetical and physical assignments of equine microsatellites-first integration of anchored markers in horse genome mapping. Mammalian Genome 8, 267-273.

Brooker, A. L., Cook, D., Bentzen, P., Wright, J. M., and Doyle, R. W. (1994). Organisation of microsatellites differs between mammals and cold-water Teleost fishes. Canadian Journal of Fisheries and Aquatic Sciences 51, 1959-1966.

Brown, W. M., Dziegielewska, K. M., Foreman, R. C., and Saunders, N. R. (1990). The nucleotide and deduced amino acid sequences of insulin-like growth factor II cDNAs from adult bovine and foetal sheep liver. Nucleic Acids Research 18, 4614.

Buckle, V. J., and Kearney, L. (1993). Untwirling dirvish. Nature Genetics 5, 4-5.

Budarf, M. L., Eckman, B., Michaud, D., McDonald, T., Gavigan, S., Buetow, K. H., Tatsumura, Y., Liu, Z., Hilliard, C., Driscoll, D., Goldmuntz, E., Meese, E., Zwarthoff, E. C., Williams, S., McDermid, H., Dumanski, J. P., Biegel, J., Bell, C. J., and Emanuel, B. S. (1996). Regional localisation of over 300 loci on human chromosome 22 using a somatic cell hybrid mapping panel. Genomics 35, 275-288.

Burgoyne, P. S. (1982). Genetic homology and crossing over in the X and Y chromosomes of mammals. Human Genetics 61, 85-90.

Burnett, R. C., Francisco, L. V., DeRose, S. A., Storb, R., and Ostrander, E. A. (1995). Identification and characterization of a highly polymorphic microsatellite marker within the canine MHC Class I region. Mammalian Genome 6, 684-685.

Callen, D. F., Thompson, A. D., Shen, Y., Phillips, H. A., Richards, R. I., Mulley, J. C., and Sutherland, G. R. (1993). Incidence and origin of "null" alleles in the (AC)n microsatellite markers. American Journal of Human Genetics 52, 922-927.

Cariello, L., Cristofaro, T. d., Zanetti, L., Cuomo, T., Maio, L. D., Campanella, G., Rinaldi, S., Zanetti, P., Lauro, R. D., and Varrone, S. (1996). Transglutaminase activity is related to CAG repeat length in patients with Huntington's disease. Human Genetics 98, 633-635.

Coltman, D. W., and Wright, J. M. (1994). Can SINEs: a family of tRNA-derived retroposons specific to the superfamily Canoidea. Nucleic Acids Research 22, 2726-2730.

Cooke, H. J., Brown, W. R. A., and Rappold, G. A. (1985). Hypervariable telomeric sequences from the human sex chromosomes are pseudoautosomal. Nature 317, 687-692.

Coote, T., and Bruford, M. W. (1996). Human microsatellites applicable for analysis of genetic variation in apes and Old World monkeys. The Journal of Heredity 87, 406-410.

Darvasi, A., and Kerem, B. (1995). Deletion and insertion mutations in short tandem repeats in the coding regions of human genes. European Journal of Human Genetics 3, 14-20.

Debin, A., Malvy, C., and Svinarchuk, F. (1997). Investigation of the formation and intracellular stability of purine·(purine/pyrimidine) triplexes. Nucleic Acids Research 25, 1965-1974.

Dietrich, W. F., Miller, J., Steen, R., Merchant, M. A., Damron-Boles, D., Husain, Z., Dredge, R., Daly, M. J., Ingalls, K. A., O'Connor, T. J., Evans, C. A., DeAngelis, M. M., Levinson, D. M., Kruglyak, L., Goodman, N., Copeland, N. G., Jenkins, N. A., Hawkins, T. L., Stein, L., Page, D. C., and Lander, E. S. (1996). A comprehensive genetic map of the mouse genome. Nature 380, 149-152.

Deschenes, S. M., Puck, J. M., Dutra, A. S., Somberg, R. L., Felsberg, P. J., and Henthorn, P. S. (1994). Comparative mapping of canine and human proximal Xq and genetic analysis of canine X-linked severe combined immunodeficiency. Genomics 23, 62-68.

Di Rienzo, A., Peterson, A. C., Garza, J. C., Valdes, A. M., Slatkin, M., and Freimer, N. B. (1994). Mutational processes of simple-sequence repeat loci in human populations. Proceedings of the National Academy of Sciences, U.S.A. 91, 3166-3170.

Dib, C., Faure, S., Fizames, C., Samson, D., Drouot, N., Vignal, A., Millasseau, P., Marc, S., Hazan, J., Seboun, E., Lathrop, M., Gyapay, G., Morissette, J., and Weissenbach, J. (1996). A comprehensive genetic map of the human genome based on 5,264 microsatellites. Nature 380, 152-154.

Dolf, G. (1997). DogMap - an international collaboration towards a low resolution canine genetic marker map. The DogMap Consortium. In Canine Genetics: The map, the genes, the diseases (Cornell University, Ithaca).

Don, R. H., Cox, P. T., Wainwright, B. J., Baker, K., and Mattick, J. S. (1991). 'Touchdown' PCR to circumvent spurious priming during gene amplification. Nucleic Acids Research 19, 4008.

Dow, B. D., Ashley, M. V., and Howe, H. F. (1995). Characterization of highly variable (GA/CT)n microsatellites in the bur oak, Quercus macrocarpa. Theoretical and Applied Genetics 91, 137-141.

Dreyling, M. H., Olopade, O. I., and Bohlander, S. K. (1997). Generation of small insert genomic FISH probes with high signal intensity suitable for deletion mapping. Cytogenetics and Cell Genetics 76, 202-205.

Du, Y., Remmers, E. F., Zha, H., Goldmuntz, E. A., Mathern, P., Crofford, L. J., Szpirer, J., Szpirer, C., and Wilder, R. L. (1995). Genetic map of eight microsatellite markers comprising two linkage groups on rat chromosome 6. Cytogenetics and Cell Genetics 68, 107-111.

Dull, T. J., Gray, A., Hayflick, J. S., and Ullrich, A. (1984). Insulin-like growth factor II precursor gene organisation in relation to insulin gene family. Nature 310, 777-781.

Dutra, A. S., Mignot, E., and Puck, J. M. (1996). Gene localisation and syntenic mapping by FISH in the dog. Cytogenetics and Cell Genetics 74, 113-117.

Economou, E. P., Bergen, A. W., Warren, A. C., and Antonarakis, S. E. (1990). The polydeoxyadenylate tract of Alu repetitive elements is polymorphic in the human genome. Proceedings of the National Academy of Sciences, U.S.A. 87, 2951-2954.

Edwards, A., Hammond, H. A., Jin, L., Caskey, T., and Chakraborty, R. (1992). Genetic variation at five tetrameric tandem repeat loci in four human population groups. Genomics 12, 241-253.

Eichler, E. E., Holden, J. J. A., Popovich, B. W., Reiss, A. L., Snow, K., Thibodeau, S. N., Richards, C. S., Ward, P. A., and Nelson, D. L. (1994). Length of uninterrupted CGG repeats determines instability in the FMR1 gene. Nature Genetics 8, 88-94.

Ellis, N. A., Goodfellow, P. J., Pym, B., Smith, M., Palmer, M., Frischauf, A.-M., and Goodfellow, P. N. (1989). The pseudoautosomal boundary in man is defined by an Alu repeat sequence inserted on the Y chromosome. Nature 337, 81-84.

Eisner, T., Fuhrman, D., Gerken, S., Harris, L., Holik, P. R., Kimball, A., Knell, J., Lawrence, E., Lu, J., Marks, A., Matsunami, N., Melis, R., Milner, B., Moore, M., Nelson, L., Odelberg, S., Peters, G., Plaetke, R., Riley, R., Robertson, M., Sargent, R., Staker, G., Tingey, A., Ward, K., Zhao, X., and White, R. (1995). A collection of ordered tetranucleotide-repeat markers from the human genome. American Journal of Human Genetics 57, 619-628.

Ezer, A. D., Williams, R. W., and Goldowitz, D. (1996). Arbitrary primer PCR of dog DNA with estimates of average heterozygosity. Journal of Heredity 87, 450-455.

Fanning, T. G., Modi, W. S., Wayne, R. K., and O'Brien, S. J. (1988). Evolution of heterochromatin-associated satellite DNA loci in felids and canids (Carnivora). Cytogenetics and Cell Genetics 48, 214-219.

Fanning, T. G. (1989). Molecular evolution of centromere-associated nucleotide sequences in two species of canids. Gene 85, 559-563.

Feldman, M. W., Bergman, A., Pollock, D. D., and Goldstein, D. B. (1997). Microsatellite genetic distances with range constraints: Analytic description and problems of estimation. Genetics 145, 207-216.

Fischer, P. E., Holmes, N. G., Dickens, H. F., Thomas, R., Binns, M. M., and Nacheva, E. P. (1996). The application of FISH techniques for physical mapping in the dog (Canis familiaris). Mammalian Genome 7, 37-41.

Fitzgerald, D. T., Dryden, G. L., Bronson, E. C., Williams, J. S., and Anderson, J. N. (1994). Conserved patterns of bending in satellite and nucleosome positioning DNA. The Journal of Biological Chemistry 269, 21303-21314.

FitzSimmons, N. N., Moritz, C., and Moore, S. S. (1995). Conservation and dynamics of microsatellite loci over 300 million years of marine turtle evolution. Molecular Biology and Evolution 12, 432-444.

Francisco, L. V., Langston, A. A., Mellersh, C. S., Neal, C. L., and Ostrander, E. A. (1996). A class of highly polymorphic tetranucleotide repeats for canine genetic mapping. Mammalian Genome 7, 359-362.

Fredholm, M., and Wintero, A. K. (1995). Variation of short tandem repeats within and between species belonging to the Canidae family. Mammalian Genome 6, 11-18.

Fries, R. (1993). Mapping the bovine genome: methodological aspects and strategies. Animal Genetics 24, 111-116.

Gacy, A. M., Goellner, G., Juranić, N., Macura, S., and McMurray, C. T. (1995). Trinucleotide repeats that expand in human disease form hairpin structures in vitro. Cell 81, 533-540.

García-Moreno, J., Matocq, M. D., Roy, M. S., and Wayne, R. K. (1996). Relationships and genetic purity of the endangered Mexican Wolf based on analysis of microsatellite loci. Conservation Biology 10, 376-389.

Garza, J. C., and Freimer, N. B. (1996). Homoplasy for size at microsatellite loci in humans and chimpanzees. Genome Research 6, 211-217.

Garza, J. C., Slatkin, M., and Freimer, N. B. (1995). Microsatellite allele frequencies in humans and chimpanzees, with implications for constraints on allele size. Molecular Biology and Evolution 12, 594-603.

Gerber, H.-P., Seipel, K., Georgiev, O., Hofferer, M., Hug, M., Rusconi, S., and Schaffner, W. (1994). Transcriptional activation modulated by homopolymeric glutamine and proline stretches. Science 263, 808-811.

Gianfrancesco, F., Esposito, T., Ruini, L., Houlgatte, R., Nagaraja, R., D'Esposito, M., Rocchi, M., Auffray, C., Schlessinger, D., D'Urso, M., and Forabosco, A. (1997). Mapping of 59 EST gene markers in 31 intervals spanning the human X chromosome. Gene 187, 179-184.

Glaser, R. L., Thomas, G. H., Siegfried, E., Elgin, S. C. R., and Lis, J. T. (1990). Optimal heat-induced expression of the Drosophila hsp26 gene requires a promoter sequence containing (CT)n.(GA)n repeats. Journal of Molecular Biology 211, 751-761.

Glowatzki-Mullis, M.-L., Gaillard, C., Wigger, G., and Fries, R. (1995). Microsatellite-based parentage control in cattle. Animal Genetics 26, 7-12.

Gogos, J. A., and Karayiorgou, M. (1996). Sequence-specific and length-dependent interaction of C2H2 zinc fingers and (TA)n microsatellites. Human Genetics 98, 616-619.

Goldstein, D. B., Ruiz Linares, A., Cavalli-Sforza, L. L., and Feldman, M. W. (1995a). An evaluation of genetic distances for use with microsatellite loci. Genetics 139, 463-471.

Goldstein, D. B., Ruiz Linares, A., Cavalli-Sforza, L. L., and Feldman, M. W. (1995b). Genetic absolute dating based on microsatellites and the origin of modern humans. Proceedings of the National Academy of Sciences, U.S.A. 92, 6723-6727.

Golenberg, E. M., Bickel, A., and Weihs, P. (1996). Effect of highly fragmented DNA on PCR. Nucleic Acids Research 24, 5026-5033.

Goodfellow, P. N. (1993). Microsatellites and the new genetic map. Current Biology 3, 149-151.

Gottelli, D., Sillero-Zubiri, C., Applebaum, G. D., Roy, M. S., Girman, D. J., Garcia-Moreno, J., Ostrander, E. A., and Wayne, R. K. (1994). Molecular genetics of the most endangered canid: the Ethiopian wolf Canis simensis. Molecular Ecology 3, 301-312.

Gould, D. J., Petersen-Jones, S. M., Sohal, A., Barnett, K. C., and Sargan, D. R. (1995). Investigation of the role of opsin gene polymorphism in generalised progressive retinal atrophies in dogs. Animal Genetics 26, 261-267.

Graphodatsky, A. S., Beklemisheva, V. R., and Dolf, G. (1995). High-resolution GTG-banding patterns of dog and silver fox chromosomes: description and comparative analysis. Cytogenetics and Cell Genetics 69, 226-231.

Grimaldi, M.-C., and Crouau-Roy, B. (1997). Microsatellite allelic homoplasy due to variable flanking sequences. Journal of Molecular Evolution 44, 336-340.

Guevara-Fujita, M. L., Loechel, R., Venta, P. J., Yuzbasiyan-Gurkan, V., and Brewer, G. J. (1996). Chromosomal Assignment of Seven Genes on Canine Chromosomes by Fluorescence In Situ Hybridisation. Mammalian Genome 7, 268-270.

Gupta, M., Chyi, Y.-S., Romero-Severson, J., and Owen, J. L. (1994). Amplification of DNA markers from evolutionarily diverse genomes using single primers of simple-sequence repeats. Theoretical and Applied Genetics 89, 998-1006.

Haberfeld, A., Cahaner, A., Yoffe, O., Plotsky, Y., and Hillel, J. (1991). Short Communication: DNA fingerprints of farm animals generated by microsatellite and minisatellite DNA probes. Animal Genetics 22, 299-305.

Hamada, H., Seidman, M., Howard, B. H., and Gorman, C. M. (1984). Enhanced gene expression by the poly(dT-dG)-poly(dC-dA) sequence. Molecular and Cellular Biology 4, 2622-2630.

Hancock, J. M. (1996). Simple sequences and the expanding genome. Genes and Genomes 18, 421-425.

Hastbacka, J., de la Chapelle, A., Kaitila, I., Sistonen, P., Weaver, A., and Lander, E. (1992). Linkage disequilibrium mapping in isolated founder populations: diastrophic dysplasia in Finland. Nature Genetics 2, 204-211.

Hawkins, T. L., O'Connor-Marin, T., Roy, A., and Santillan, C. (1994). DNA purification and isolation using a solid phase. Nucleic Acids Research 22, 4543-4544.

Hazan, J., Dubay, C., Pankowiak, M.-P., and Weissenbach, J. (1992). A genetic linkage map of human chromosome 20 composed entirely of microsatellite markers. Genomics 12, 183-189.

He, L., Morris, S., Lennon, A., Clair, D. M. S., Porteous, D. J., Wright, A. F., Muir, W. J., and Blackwood, D. H. R. (1996). A genome-wide search for linkage in a large bipolar family: comparison of genotyping accuracy using di- and tetranucleotide repeat microsatellite markers. Psychiatric Genetics 6, 123-129.

Heiskanen, M., Peltonen, L., and Palotie, A. (1996). Visual mapping by high resolution FISH. Trends In Genetics 12, 379-382.

Henke, A., Fischer, C., and Rappold, G. A. (1993). Genetic map of the human pseudoautosomal region reveals a high rate of recombination in female meiosis at the Xp telomere. Genomics 18, 478-485.

Heyen, D. W., Beever, J. E., Da, Y., Evert, R. E., Green, C., Bates, S. R. E., Zeigle, J. S., and Lewin, H. A. (1997). Exclusion probabilities of 22 bovine microsatellite markers in fluorescent multiplexes for semi-automated parentage testing. Animal Genetics 28, 21-27.

Holmes, N. G., Mellersh, C. S., Humphreys, S. J., Binns, M. M., Holliman, A., Curtis, R., and Sampson, J. (1993). Isolation and characterisation of microsatellites from the canine genome. Animal Genetics 24, 289-292.

Holmes, N. G., Dickens, H. F., Thomas, R., Fischer, P. E., and Binns, M. M. (1995). Characterisation of canine microsatellites. In 2nd International DOGMAP Meeting (Robinson College, Cambridge).

Hoyle, J., Yulung, I. G., Johnston, K., Scambler, P. J., and Fisher, M. C. H. (1996). Characterisation of a short interspersed repeat (Mermaid) that has family members on human chromosome 21 and elsewhere in the human genome. Human Genetics 97, 117-120.

Hozier, J., and Davis, L. M. (1992). Cytogenetic approaches to genome mapping. Analytical Biochemistry 200, 205-217.

Huertas, D., Bellsolell, L., Casasnovas, J. M., Coll, M., and Azorín, F. (1993). Alternating d(GA)n DNA sequences form antiparallel stranded homoduplexes stabilised by the formation of G·A base pairs. The EMBO Journal 12, 4029-4038.

Jeffreys, A. J., Wilson, V., and Thein, S. L. (1985). Individual-specific 'fingerprints' of human DNA. Nature 316, 76-79.

Jin, L., Macaubas, C., Hallmayer, J., Kimura, A., and Mignot, E. (1996). Mutation rate varies among alleles at a microsatellite locus: Phylogenetic evidence. Proceedings of the National Academy of Sciences, USA 93, 15285-15288.

Jiricny, J. (1994). Colon cancer and DNA repair: have mismatches met their match? Trends In Genetics 10, 164-168.

Kappes, S. M., Keele, J. W., Stone, R. T., McGraw, R. A., Sonstegard, T. S., Smith, T. P. L., Lopez-Corrales, N. L., and Beattie, C. W. (1997). A second-generation linkage map of the bovine genome. Genome Research 7, 235-249.

Karlin, S., and Burge, C. (1995). Dinucleotide relative abundance extremes: a genomic signature. Trends In Genetics 11, 283-290.

Koorey, D. J., Bishop, G. A., and McCaughan, G. W. (1993). Allele non-amplification: a source of confusion in linkage studies employing microsatellite polymorphisms. Human Molecular Genetics 2, 289-291.

Kosambi, D. D. (1944). The estimation of map distances from recombination values. Annals of Eugenics 13, 35-71.

Kunzler, P., Matsuo, K., and Schaffner, W. (1995). Pathological, physiological and evolutionary aspects of short unstable DNA repeats in the human genome. Biological Chemistry 376, 201-211.

Kuryavyi, V. V., and Jovin, T. D. (1995). Triad-DNA: a model for trinucleotide repeats. Nature Genetics 9, 339-341.

Kwiatkowski, D. J., Henske, E. P., Weimer, K., Ozelius, L., Gusella, J. F., and Haines, J. (1992). Construction of a GT polymorphism map of human 9q. Genomics 12, 229-240.

Lalioti, M. D., Scott, H. S., Buresi, C., Rossier, C., Bottani, A., Morris, M. A., Malafosse, A., and Antonarakis, S. E. (1997). Dodecamer repeat expansion in cystatin B gene in progressive myoclonus epilepsy. Nature 386, 847-851.

Lander, E. S., and Green, P. (1987). Construction of multilocus genetic linkage maps in humans. Proceedings of the National Academy of Sciences, U.S.A 84, 2363-2367.

Langford, C. F., Fischer, P. E., Binns, M. M., Holmes, N. G., and Carter, N. P. (1996). Chromosome-specific paints from a high-resolution flow karyotype of the dog. Chromosome Research 4, 115-123.

Langston, A. A., Mellersh, C. S., Neal, C. L., Ray, K., Acland, G. M., Gibbs, M., Aguirre, G. D., Fournier, R. E. K., and Ostrander, E. A. (1997). Construction of a panel of canine-rodent hybrid cell lines for use in partitioning of the canine genome. Genomics 46, 317-325.

Lench, N. J., Norris, A., Bailey, A., Booth, A., and Markham, A. F. (1996). Vectorette PCR isolation of microsatellite repeat sequences using anchored dinucleotide repeat primers. Nucleic Acids Research 24, 2190-2191.

Lengauer, C., Kinzler, K. W., and Vogelstein, B. (1997). Genetic instability in colorectal cancers. Nature 386, 623-627.

Lichter, P., Tang, C.-J. C., Call, K., Hermanson, G., Evans, G., Housman, D., and Ward, D. C. (1990). High-resolution mapping of human chromosome 11 by in situ hybridisation with cosmid clones. Science 247, 64-69.

Lindqvist, A.-K. B., Magnusson, P. K. E., Balciuniene, J., Wadelius, C., Lindholm, E., Alarcon-Riquelme, M. E., and Gyllensten, U. B. (1996). Chromosome-specific panels of tri- and tetranucleotide microsatellite markers for multiplex fluorescent detection and automated genotyping: evaluation of their utility in pathology and forensics. Genome Research 6, 1170-1176.

Lingaas, F., Sorensen, A., Juneja, R. K., Johansson, S., Fredholm, M., Wintero, A. K., Sampson, J., Mellersh, C., Curzon, A., Holmes, N. G., Binns, M. M., Dickens, H. F., Ryder, E. J., Gerlach, J., Baumle, E., and Dolf, G. (1997). Towards construction of a canine linkage map: establishment of 16 linkage groups. Mammalian Genome 8, 218-221.

Litt, M., and Luty, J. A. (1989). A hypervariable microsatellite revealed by in vitro amplification of a dinucleotide repeat within the cardiac muscle actin gene. American Journal of Human Genetics 44, 397-401.

Liu, Y., Rasool, O., Grander, D., Lindblom, A., and Einhorn, S. (1995). Sequence variability of a prolonged tetranucleotide repeat. Human Molecular Genetics 4, 727-729.

Love, J. M., Knight, A. M., McAleer, M. A., and Todd, J. A. (1990). Towards construction of a high resolution map of the mouse genome using PCR-analysed microsatellites. Nucleic Acids Research 18, 4123-4129.

Lucas, M., Munoz, C., Pintado, E., and Solano, F. (1997). Highly informative single-stranded conformation polymorphism (SSCP) of short tandem repeats in DNA identification. Journal of Forensic Science 42, 118-120.

Lund, A. H., Duch, M., and Pedersen, F. S. (1996). Increased cloning efficiency by temperature-cycle ligation. Nucleic Acids Research 24, 800-801.

Lyons, L. A., Laughlin, T. F., Copeland, N. G., Jenkins, N. A., Womack, J. E., and O'Brien, S. J. (1997). Comparative anchor tagged sequences (CATS) for integrative mapping of mammalian genomes. Nature Genetics 15, 47-56.

Maiorano, D., Cece, R., and Badaracco, G. (1997). Satellite DNA from the brine shrimp Artemia affects the expression of a flanking gene in yeast. Gene 189, 13-18.

Mandel, J.-L. (1997). Breaking the rule of three. Nature 386, 767-769.

Manolache, M., Ross, W. M., and Schmid, M. (1976). Banding analysis of the somatic chromosomes of the domestic dog (Canis familiaris). Canadian Journal of Genetics and Cytology 18, 513-518.

Matise, T. C., Perlin, M., and Chakravarti, A. (1994). Automated construction of genetic linkage maps (MultiMap): A human genome linkage map. Nature Genetics 6, 384-390.

McDonell, M. W., Simon, M. N., and Studier, F. W. (1977). Analysis of restriction fragments of T7 DNA and determination of molecular weights by electrophoresis in neutral and alkaline gels. Journal of Molecular Biology 110, 119-146.

Mellersh, C., Holmes, N., Binns, M., and Sampson, J. (1994). Dinucleotide repeat polymorphisms at four canine loci (LEI 003, LEI 007, LEI 008 and LEI 015). Animal Genetics 25, 125-126.

Mellersh, C. S., Langston, A. A., Acland, G. M., Fleming, M. A., Ray, K., Wiegrand, N. A., Francisco, L. V., Gibbs, M., Aguirre, G. D., and Ostrander, E. A. (1997). A linkage map of the canine genome. Genomics 46, 326-336.

Minnick, M. F., Stillwell, L. C., Heineman, J. M., and Steigler, G. L. (1992). A highly repetitive DNA sequence possibly unique to canids. Gene 110, 235-238.

Modi, W. S., Fanning, T. G., Wayne, R. K., and O'Brien, S. J. (1988). Chromosomal localisation of satellite DNA sequences among 22 species of felids and canids (Carnivora). Cytogenetics and Cell Genetics 48, 208-213.

Moore, S. S., Sargeant, L. L., King, T. J., Mattick, J. S., Georges, M., and Hetzel, D. J. S. (1991). The conservation of dinucleotide microsatellites among mammalian genomes allows the use of heterologous PCR primer pairs in closely related species. Genomics 10, 654-660.

Morell, V. (1997). The origin of dogs: running with the wolves. Science 276, 1647-1648.

Muller, S., Rocchi, M., Ferguson-Smith, M. A., and Wienberg, J. (1997). Toward a multicolour chromosome bar code for the entire human karyotype by fluorescence in situ hybridisation. Human Genetics 100, 271-278.

Murray, V., Monchawin, C., and England, P. R. (1993). The determination of the sequences present in the shadow bands of a dinucleotide repeat PCR. Nucleic Acids Research 21, 2395-2398.

O'Brien, S. J. (1991). Mammalian genome mapping: lessons and prospects. Current Biology 1, 105-111.

O'Brien, S. J., and Marshall Graves, J. A. (1991). Report of the committee on comparative gene mapping. Cytogenetics and Cell Genetics 58, 1124-1151.

O'Brien, S. J., Womack, J. E., Lyons, L. A., Moore, K. J., Jenkins, N. A., and Copeland, N. G. (1993). Anchored reference loci for comparative genome mapping in mammals. Nature Genetics 3, 103-112.

O'Brien, S. J., Wienberg, J., and Lyons, L. A. (1997). Comparative genomics: lessons from cats. Trends In Genetics 13, 393-399.

Ostrander, E. A., Jong, P. J., Rine, J., and Duyk, G. (1992). Construction of small-insert genomic DNA libraries highly enriched for microsatellite repeat sequences. Proceedings of the National Academy of Sciences, U.S.A 89, 3419-3423.

Ostrander, E. A., Sprague, G. F., Jr., and Rine, J. (1993). Identification and characterization of dinucleotide repeat (CA)n markers for genetic mapping in dog. Genomics 16, 207-213.

Ostrander, E. A., Mapa, F. A., Yee, M., and Rine, J. (1995). One hundred and one new simple sequence repeat-based markers for the canine genome. Mammalian Genome 6, 192-195.

Ostrander, E. O. (1997). Physical mapping resources: canine/rodent hybrid cell lines and a collaborative canine BAC library. In Canine genetics: the map, the genes, the diseases (Cornell University, Ithaca, U.S.A.).

Ott, J. (1991). Analysis of Human Genetic Linkage, Revised Edition (Baltimore: The Johns Hopkins University Press).

Panaud, O., Chen, X., and McCouch, S. R. (1995). Frequency of Microsatellite Sequences in Rice (Oryza sativa L.). Genome 38, 1170-1176.

Pandolfo, M. (1992). A rapid method to isolate (GT)n repeats from yeast artificial chromosomes. Nucleic Acids Research 20, 1154.

Park, J. P. (1996). Shared synteny of human chromosome 17 loci in canids. Cytogenetics and Cell Genetics 74, 133-137.

Patterson, D. F. (1995). Genetic defects in the dog. In Second international DOGMAP meeting (Cambridge, U.K.).

Pena, S. D. J., and Chakraborty, R. (1994). Paternity testing in the DNA era. Trends In Genetics 10, 204-209.

Pepin, L., Amigues, Y., Lepingle, A., Berthier, J.-L., Bensaid, A., and Vaiman, D. (1995). Sequence conservation of microsatellites between Bos taurus (cattle), Capra hircus (goat) and related species. Examples of use in parentage testing and phylogeny analysis. Heredity 74, 53-61.

Perez-Lezaun, A., Calafell, F., Mateu, E., Comas, D., Ruiz-Pacheco, R., and Bertranpetit, J. (1997). Microsatellite variation and the differentiation of modern humans. Human Genetics 99, 1-7.

Petes, T. D., Greenwell, P. W., and Dominska, M. (1997). Stabilisation of microsatellite sequences by variant repeats in the yeast Saccharomyces cerevisiae. Genetics 146, 491-498.

Petruska, J., Arnheim, N., and Goodman, M. F. (1996). Stability of Intrastrand Hairpin Structures Formed by the CAG/CTG Class of DNA Triplet Repeats Associated with Neurological Diseases. Nucleic Acids Research 24, 1992-1998.

Pihkanen, S., Vainola, R., and Varvio, S. (1996). Characterising dog breed differentiation with microsatellite markers. Animal Genetics 27, 343-346.

Pinkel, D., Straume, T., and Gray, J. W. (1986). Cytogenetic analysis using quantitative, high-sensitivity, fluorescence hybridisation. Proceedings of the National Academy of Sciences, USA 83, 2934-2938.

Primmer, C. R., and Matthews, M. E. (1993). Canine tetranucleotide repeat polymorphism at the VIAS-D10 locus. Animal Genetics 24, 332.

Primmer, C. R., Raudsepp, T., Chowdhary, B. P., Moller, A. P., and Ellegren, H. (1997). Low frequency of microsatellites in the avian genome. Genome Research 7, 471-482.

Prochazka, M. (1996). Microsatellite hybrid capture technique for simultaneous isolation of various STR markers. Genome Research 6, 646-649.

Queller, D. C., Strassmann, J. E., and Hughes, C. R. (1993). Microsatellites and kinship. Trends in Ecology and Evolution 8, 285-288.

Rao, E., Weiss, B., Fukami, M., Mertz, A., Meder, J., Ogata, T., Heinrich, U., Garcia-Heras, J., Schiebel, K., and Rappold, G. A. (1997). FISH-deletion mapping defines a 270-kb short stature critical interval in the pseudoautosomal region PAR1 on human sex chromosomes. Human Genetics 100, 236-239.

Raudsepp, T., Otte, K., Rozell, B., and Chowdhary, B. P. (1997). FISH mapping of the IGF2 gene in horse and donkey - detection of homeology with HSA11. Mammalian Genome 8, 569-572.

Reed, P. W., Davies, J. L., Copeman, J. B., Bennett, S. T., Palmer, S. M., Pritchard, L. E., Gough, S. C. L., Kawaguchi, Y., Cordell, H. J., Balfour, K. M., Jenkins, S. C., Powell, E. E., Vignal, A., and Todd, J. A. (1994). Chromosome-specific microsatellite sets for fluorescence-based, semi-automated genome mapping. Nature Genetics 7, 390-395.

Reimann, N., Bartnitzke, S., Bullerdiek, J., Schmitz, U., Rogalla, P., Nolte, I., and Ronne, M. (1996). An extended nomenclature of the canine karyotype. Cytogenetics and Cell Genetics 73, 140-144.

Robic, A., Dubois, C., Milan, D., and Gellin, J. (1994). A rapid method to isolate microsatellite markers from cosmid clones. Mammalian Genome 5, 177-179.

Robic, A., Parrou, J.-L., Yerle, M., Goureau, A., Dalens, M., Milan, D., and Gellin, J. (1995). Pig microsatellites isolated from cosmids revealing polymorphism and localised on chromosomes. Animal Genetics 26, 1-6.

Rolf, B., Schürenkamp, M., Junge, A., and Brinkmann, B. (1997). Sequence polymorphism at the tetranucleotide repeat of the human beta-actin related pseudogene H-beta-Ac-psi-2 (ACTBP2) locus. International Journal of Legal Medicine 110, 69-72.

Rothuizen, J., and van Raak, M. (1994). Rapid PCR-based characterization of sequences flanking microsatellites in large-insert libraries. Nucleic Acids Research 22, 5512-5513.

Rothuizen, J., Wolfswinkel, J., Lenstra, J. A., and Frants, R. R. (1994). The incidence of mini- and micro-satellite repetitive DNA in the canine genome. Theoretical and Applied Genetics 89, 403-406.

Rouyer, F., Simmler, M.-C., Johnsson, C., Vergnaud, G., Cooke, H. J., and Weissenbach, J. (1986a). A gradient of sex linkage in the pseudoautosomal region of the human sex chromosomes. Nature 319, 291-295.

Rouyer, F., Simmler, M.-C., Vergnaud, G., Johnsson, C., Levilliers, J., Petit, C., and Weissenbach, J. (1986b). The pseudoautosomal region of the human sex chromosomes. Cold Spring Harbor Symposia on Quantitative Biology 51, 221-228.

Rowe, P. S. N., Francis, F., and Goulding, J. (1994). Rapid isolation of DNA sequences flanking microsatellite repeats. Nucleic Acids Research 22, 5135-5136.

Roy, M. S., Geffen, E., Smith, D., and Wayne, R. K. (1996). Molecular genetics of pre-1940 Red Wolves. Conservation Biology 10, 1413-1424.

Royle, N. J., Clarkson, R. E., Wong, Z., and Jeffreys, A. J. (1988). Clustering of hypervariable minisatellites in the proterminal regions of human autosomes. Genomics 3, 352-360.

Rubinsztein, D. C., Amos, W., Leggo, J., Goodburn, S., Jain, S., Li, S.-H., Margolis, R. L., Ross, C. A., and Ferguson-Smith, M. A. (1995). Microsatellite evolution: evidence for directionality and variation in rate between species. Nature Genetics 10, 337-343.

Sassaman, D. M., Dombroski, B. A., Moran, J. V., Kimberland, M. L., Naas, T. P., DeBerardinis, R. J., Gabriel, A., Swergold, G. D., and Kazazian, H. H., Jr. (1997). Many human L1 elements are capable of retrotransposition. Nature Genetics 16, 37-43.

Sawcer, S., Jones, H. B., Judge, D., Visser, F., Compston, A., Goodfellow, P. N., and Clayton, D. (1997). Empirical genome-wide significance levels established by whole genome simulations. Genetic Epidemiology 14, 223-229.

Scherthan, H., Cremer, T., Arnason, U., Weier, H.-U., Lima-de-Faria, A., and Frönicke, L. (1994). Comparative chromosome painting discloses homologous segments in distantly related mammals. Nature Genetics 6, 342-347.

Schlotterer, C., Vogl, C., and Tautz, D. (1997). Polymorphism and locus-specific effects on polymorphism at microsatellite loci in natural Drosophila melanogaster populations. Genetics 146, 309-320.

Schuler, G. D., Boguski, M. S., Stewart, E. A., Stein, L. D., Gyapay, G., Rice, K., White, R. E., Rodriguez-Tomé, P., Aggarwal, A., Bajorek, E., Bentolila, S., Birren, B. B., Butler, A., Castle, A. B., Chiannilkulchai, N., Chu, A., Clee, C., Cowles, S., Day, P. J. R., Dibling, T., Drouot, N., Dunham, I., Duprat, S., East, C., Edwards, C., Fan, J.-B., Fang, N., Fizames, C., Garrett, C., Green, L., Hadley, D., Harris, M., Harrison, P., Brady, S., Hicks, A., Holloway, E., Hui, L., Hussain, S., Louis-Dit-Sully, C., Ma, J., MacGilvery, A., Mader, C., Maratukulam, A., Matise, T. C., McKusick, K. B., Morissette, J., Mungall, A., Muselet, D., Nusbaum, H. C., Page, D. C., Peck, A., Perkins, S., Piercy, M., Qin, F., Quackenbush, J., Ranby, S., Reif, T., Rozen, S., Sanders, C., She, X., Silva, J., Slonim, D. K., Soderlund, C., Sun, W.-L., Tabar, P., Thangarajah, T., Vega-Czarny, N., Vollrath, D., Voyticky, S., Wilmer, T., Wu, X., Adams, M. D., Auffray, C., Walter, N. A. R., Brandon, R., Dehejia, A., Goodfellow, P. N., Houlgatte, R., Hudson Jr., J. R., Ide, S. E., Iorio, K. R., Lee, W. Y., Seki, N., Nagase, T., Ishikawa, K., Torres, R., Venter, J. C., Sikela, J. M., Beckmann, J. S., Weissenbach, J., Myers, R. M., Cox, D. R., James, M. R., Bentley, D., Deloukas, P., Lander, E. S., and Hudson, T. J. (1996). A gene map of the human genome. Science 274, 540-546.

Scribner, K. T., Arntzen, J. W., and Burke, T. (1994). Comparative analysis of intra- and interpopulation genetic diversity in Bufo bufo, using allozyme, single-locus microsatellite, minisatellite, and multilocus minisatellite data. Molecular Biology and Evolution 11, 737-748.

Selden, J. R., Moorhead, P. S., Oehlert, M. L., and Patterson, D. F. (1975). The Giemsa banding pattern of the canine karyotype. Cytogenetics and Cell Genetics 15, 380-387.

Shibata, D. (1997). Molecular tumour clocks. Annals of Medicine 29, 5-7.

Shibuya, H., Nonneman, D. J., Huang, T. H.-M., Ganjam, V. K., Mann, F. A., and Johnson, G. S. (1993). Two polymorphic microsatellites in a coding segment of the canine androgen receptor gene. Animal Genetics 24, 345-348.

Shibuya, H., Collins, B. K., Huang, T. H.-M., and Johnson, G. S. (1994). A polymorphic (AGGAAT)n tandem repeat in an intron of the canine von Willebrand factor genes. Animal Genetics 25, 122.

Shibuya, H., Collins, B. K., Collier, L. L., Huang, T. H.-M., Nonneman, D., and Johnson, G. S. (1996). A polymorphic (GAAA)n Microsatellite in a Canine Wilms Tumor 1 (WT 1) Gene Intron. Animal Genetics 27, 59-60.

Signer, E. N., Gu, F., Gustavsson, I., Andersson, L., and Jeffreys, A. J. (1994). A pseudoautosomal minisatellite in the pig. Mammalian Genome 5, 48-51.

Smit, A. F. A., Toth, G., Riggs, A. D., and Jurka, J. (1995). Ancestral, mammalian-wide subfamilies of LINE-1 repetitive sequences. Journal of Molecular Biology 246, 401-417.

Stallings, R. L., Ford, A. F., Nelson, D., Torney, D. C., Hildebrand, C. E., and Moyzis, R. K. (1991). Evolution and distribution of (GT)n repetitive sequences in mammalian genomes. Genomics 10, 807-815.

Stewart, E. A., McKusick, K. B., Aggarwal, A., Bajorek, E., Brady, S., Chu, A., Fang, N., Hadley, D., Harris, M., Hussain, S., Lee, R., Maratukulam, A., O'Connor, K., Perkins, S., Piercy, M., Qin, F., Reif, T., Sanders, C., She, X., Sun, W.-L., Tabar, P., Voyticky, S., Cowles, S., Fan, J.-B., Mader, C., Quackenbush, J., Myers, R. M., and Cox, D. R. (1997). An STS-based radiation hybrid map of the human genome. Genome Research 7, 422-433.

Stone, D. M., and Prieur, P. B. (1991). The Giemsa banding pattern of canine chromosomes, using a cell synchronisation technique. Genome 34, 407-412.

Sun, H. S., Cai, L., Davis, S. K., Taylor, J. F., Doud, L. K., Bishop, M. D., Hayes, H., Barendse, W., Vaiman, D., McGraw, R. A., Hirano, T., Sugimoto, Y., and Kirkpatrick, B. W. (1997). Comparative linkage mapping of human chromosome 13 and bovine chromosome 12. Genomics 39, 47-54.

Switonski, M., Fischer, P., and Reimann, N. (1994). International efforts for establishing standard karyotype of the dog. In Proceedings of the 11th European Colloquium on the Cytogenetics of Domestic Animals (Copenhagen), pp. 150-152.

Switonski, M., Reimann, N., Bosma, A. A., Long, S., Bartnitzke, S., Pienkowska, A., Moreno-Milan, M. M., and Fischer, P. (1996). Report on the progress of standardisation of the G-banded canine (Canis familiaris) karyotype. Chromosome Research 4, 306-309.

Takezaki, N., and Nei, M. (1996). Genetic distances and reconstruction of phylogenetic trees from microsatellite DNA. Genetics 144, 389-399.

Tautz, D., and Renz, M. (1984). Simple sequences are ubiquitous repetitive components of eukaryotic genomes. Nucleic Acids Research 12, 4127-4138.

Tautz, D., Trick, M., and Dover, G. A. (1986). Cryptic simplicity in DNA is a major source of genetic variation. Nature 322, 652-656.

Tautz, D. (1989). Hypervariability of simple sequences as a general source for polymorphic DNA markers. Nucleic Acids Research 17, 6463-6471.

Taylor, G. R., Haward, S., Noble, J. S., and Murday, V. (1992). Isolation and sequencing of CA/GT repeat microsatellites from chromosomal libraries without subcloning. Analytical Biochemistry 200, 125-129.

Toldo, S. S., Fries, R., Steffen, P., Neibergs, H. L., Barendse, W., Womack, J. E., Hetzel, D. J. S., and Stranzinger, G. (1993). Physically mapped, cosmid-derived microsatellite markers as anchor loci on bovine chromosomes. Mammalian Genome 4, 720-727.

Trask, B. J. (1991). Fluorescence in situ hybridisation: applications in cytogenetics and gene mapping. Trends In Genetics 7, 149-154.

Tucker-Burks, R., Kessis, T. D., Cho, K. R., and Hedrick, L. (1994). Microsatellite instability in endometrial carcinoma. Oncogene 9, 1163-1166.

Valdes, A. M., Slatkin, M., and Freimer, N. B. (1993). Allele frequencies at microsatellite loci: the stepwise mutation model revisited. Genetics 133, 737-749.

Venta, P. J., Brouillette, J. A., Yuzbasiyan-Gurkan, V., and Brewer, G. J. (1996). Gene-specific universal mammalian sequence-tagged sites: Application to the canine genome. Biochemical Genetics 34, 321-341.

Vignaux, F., Priat, C., Jouquand, S., Hitte, C., Jiang, Z., Cheron, A., Renier, C., Andre, C., and Galibert, F. (1997). Radiation hybrid cell lines. In Canine genetics: the map, the genes, the diseases (Cornell University, Ithaca, U.S.A.).

Vila, C., Savolainen, P., Maldonado, J. E., Amorim, I. R., Rice, J. E., Honeycutt, R. L., Crandall, K. A., Lundeberg, J., and Wayne, R. K. (1997). Multiple and ancient origins of the domestic dog. Science 276, 1687-1689.

Virtaneva, K., D'Amato, E., Miao, J., Koskiniemi, M., Norio, R., Avanzini, G., Franceschetti, S., Michelucci, R., Tassinari, C. A., Omer, S., Pennacchio, L. A., Myers, R. M., Dieguez-Lucena, J. L., Krahe, R., de la Chapelle, A., and Lehesjoki, A.-E. (1997). Unstable minisatellite expansion causing recessively inherited myoclonus epilepsy, EPM1. Nature Genetics 15, 393-396.

Vogt, P. (1990). Potential genetic functions of tandem repeated DNA sequence blocks of the human genome are based on a highly conserved chromatin folding code. Human Genetics 84, 301-336.

Vos, P., Hogers, R., Bleeker, M., Reijans, M., van de Lee, T., Hornes, M., Frijters, A., Pot, J., Peleman, J., Kuiper, M., and Zabeau, M. (1995). AFLP: a new technique for DNA fingerprinting. Nucleic Acids Research 23, 4407-4414.

Walsh, P. S., Fildes, N. J., and Reynolds, R. (1996). Sequence analysis and characterization of stutter products at the tetranucleotide repeat locus vWA. Nucleic Acids Research 24, 2807-2812.

Wang, Y.-H., and Griffith, J. (1995). Expanded CTG triplet blocks from the Myotonic Dystrophy gene create the strongest known natural nucleosome positioning elements. Genomics 25, 570-573.

Warren, S. T. (1996). The expanding world of trinucleotide repeats. Science 271, 1374-1375.

Watanabe, G., Umetsu, K., Yuasa, I., and Suzuki, T. (1997). Amplified product length polymorphism (APLP): a novel strategy for genotyping the ABO blood group. Human Genetics 99, 34-37.

Wayne, R. K., Nash, W. G., and O'Brien, S. J. (1987). Chromosomal evolution of the Canidae I. Species with high diploid numbers. Cytogenetics and Cell Genetics 44, 123-133.

Wayne, R. K., Nash, W. G., and O'Brien, S. J. (1987). Chromosomal evolution of the Canidae II. Divergence from the primitive carnivore type. Cytogenetics and Cell Genetics 44, 134-141.

Wayne, R. K. (1993). Molecular evolution of the dog family. Trends In Genetics 9, 218-224.

Weber, J. L., and May, P. E. (1989). Abundant class of human DNA polymorphisms which can be typed using the polymerase chain reaction. American Journal of Human Genetics 44, 388-396.

Weber, J. L. (1990). Informativeness of human (dC-dA)n.(dG-dT)n polymorphisms. Genomics 7, 524-530.

Weber, J. L., and Wong, C. (1993). Mutation of human short tandem repeats. Human Molecular Genetics 2, 1123-1128.

Weissenbach, J., Gyapay, G., Dib, C., Vignal, A., Morissette, J., Millasseau, P., Vaysseix, G., and Lathrop, M. (1992). A second-generation linkage map of the human genome. Nature 359, 794-801.

Weissenbach, J. (1997). The human genome program (French). Pathologie Biologie 45, 205-208.

Welsh, J., and McClelland, M. (1990). Fingerprinting genomes using PCR with arbitrary primers. Nucleic Acids Research 18, 7213-7218.

Werner, P., Raducha, M. G., Prociuk, U., Henthorn, P. S., and Patterson, D. F. (1997). Physical and linkage mapping of human chromosome 17 loci to dog chromosomes 9 and 5. Genomics 42, 74-82.

Wierdl, M., Dominska, M., and Petes, T. D. (1997). Microsatellite instability in yeast: dependence on the length of the microsatellite. Genetics 146, 769-779.

Yee, H. A., Wong, A. K. C., van de Sande, J. H., and Rattner, J. B. (1991). Identification of novel single-stranded d(TC)n binding proteins in several mammalian species. Nucleic Acids Research 19, 949-953.

Yuille, M. A. R., Goudie, D. R., Affara, N. A., and Ferguson-Smith, M. A. (1991). Rapid determination of sequences flanking microsatellites. Nucleic Acids Research 19, 1950.

Yung, J.-F. (1996). New FISH probes - the end in sight. Nature Genetics 14, 10-12.

Yuzbasiyan-Gurkan, V., Blanton, S. H., Cao, Y., Ferguson, P., Li, J., Venta, P. J., and Brewer, G. J. (1997). Linkage of a microsatellite marker to the canine copper toxicosis locus in Bedlington Terriers. American Journal of Veterinary Research 58, 23-27.

Zajc, I., Mellersh, C. S., and Sampson, J. (1997). Variability of canine microsatellites within and between different dog breeds. Mammalian Genome 8, 182-185.