
Comparative Genome Analysis Between the Atlantic Salmon Sex Chromosome and the Genomes of Other Teleosts

COMPARATIVE GENOME ANALYSIS BETWEEN THE ATLANTIC SALMON SEX CHROMOSOME AND THE GENOMES OF OTHER TELEOSTS

by

Teng-Kai (Kevin) Huang B. Sc., Simon Fraser University, 2006

THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

MASTER OF SCIENCE

In the Department of Molecular Biology and Biochemistry

© Teng-Kai (Kevin) Huang 2009

SIMON FRASER UNIVERSITY

Spring 2009

All rights reserved. This work may not be reproduced in whole or in part, by photocopy or other means, without permission of the author.

APPROVAL

Name: Teng-Kai (Kevin) Huang

Degree: Master of Science

Title of Thesis: Comparative Genome Analysis between the Atlantic Salmon Sex Chromosome and the Genomes of Other Teleosts

Examining Committee:

Chair: Dr. Andrew J. Bennet, Professor, Department of Chemistry

Dr. William S. Davidson, Senior Supervisor, Professor, Department of Molecular Biology and Biochemistry

Dr. Jack N. Chen, Supervisor, Associate Professor, Department of Molecular Biology and Biochemistry

Dr. Norbert H. Haunerland, Supervisor, Professor, Department of Biological Sciences

Dr. Fiona S. L. Brinkman, Public Examiner, Associate Professor, Department of Molecular Biology and Biochemistry

Date Defended/Approved:

SIMON FRASER UNIVERSITY LIBRARY

Declaration of Partial Copyright Licence

The author, whose copyright is declared on the title page of this work, has granted to Simon Fraser University the right to lend this thesis, project or extended essay to users of the Simon Fraser University Library, and to make partial or single copies only for such users or in response to a request from the library of any other university, or other educational institution, on its own behalf or for one of its users.

The author has further granted permission to Simon Fraser University to keep or make a digital copy for use in its circulating collection (currently available to the public at the "Institutional Repository" link of the SFU Library website at: ) and, without changing the content, to translate the thesis/project or extended essays, if technically possible, to any medium or format for the purpose of preservation of the digital work.

The author has further agreed that permission for multiple copying of this work for scholarly purposes may be granted by either the author or the Dean of Graduate Studies.

It is understood that copying or publication of this work for financial gain shall not be allowed without the author's written permission.

Permission for public performance, or limited permission for private scholarly use, of any multimedia materials forming part of this work, may have been granted by the author. This information may be found on the separately catalogued multimedia material and in the signed Partial Copyright Licence.

While licensing SFU to permit the above uses, the author retains copyright in the thesis, project or extended essays, including the right to change the work for subsequent purposes, including editing and publishing the work in whole or in part, and licensing other parties, as the author may desire.

The original Partial Copyright Licence attesting to these terms, and signed by this author, may be found in the original bound copy of this work, retained in the Simon Fraser University Archive.

Simon Fraser University Library Burnaby, BC, Canada

Revised: Fall 2007

ABSTRACT

In Atlantic salmon, Linkage Group (LG) 1 contains the sex-determining locus, and previous studies found that LG 1 corresponds to chromosome 2. I have completed a screen of the Atlantic salmon bacterial artificial chromosome (BAC) libraries for all known microsatellite markers on LG 1, and was able to integrate much of the physical and linkage maps in this genomic region. I have also constructed BAC minimum tiling pathways in many regions of chromosome 2, and obtained the sequences of some of these regions. All the BAC-end sequences from the contigs assigned to Atlantic salmon LG 1 were subjected to BLASTx searches against the genomes of medaka, stickleback, zebrafish and Tetraodon. The orthologues found were used to identify the regions of these fish genomes that are syntenic to Atlantic salmon LG 1. Three sex-determining candidates were also identified: ZFYVE27, Zmat4 and TSG118.

ACKNOWLEDGEMENTS

First of all, I would like to thank my senior supervisor, Dr. William S. Davidson, for his tremendous help during the past two years of my master's program. My learning experience with him has been fun and enjoyable, and I really appreciate his kindness, patience, and support. I am grateful for all the guidance and advice that he provided as I carried out my research over the past two years, and for his help with the completion of my thesis and my master's degree.

I thank Dr. Jack N. Chen and Dr. Norbert H. Haunerland, the members of my thesis committee, for contributing their time and expertise to my work, and for their good-natured support.

I am grateful too for the support of everyone in the lab over the past two years. I enjoyed the friendships and all the times that we had together. In particular, I need to express my gratitude and appreciation to Dr. Kazuhiro Fujiki, who taught me all the skills necessary for the two years of my master's program, and shared his knowledge and advice with me regarding my project. I would also like to thank Jieying Li for her help with some of the minimum tiling pathway constructions, a job she did very well. My appreciation also goes to Keith Boroevich for developing the Asalbase website with all the BLASTx results, and William Chow for developing the GRASP website with all the sequence annotation data.

I must acknowledge all of my church friends, especially Benson Hsu and Howard Liu, and my church pastors, Victor Pai and Dove Jang, who prayed for me, encouraged me, and supported me during the past two years of my master's program. I would not have been able to complete my master's program without their support.

Finally, I would like to thank my Grade 4 science teacher, Ms. Lucia Chen. She inspired me to pursue science as my career through a project that we worked on together for a science competition 16 years ago, and it is this inspiration that drives me forward to this day. I am also grateful for the support and encouragement that she gave me during the period of my thesis writing. Without her, my life could be completely different, and the completion of this thesis would be impossible. (Thank you, Ms. Chen.)

TABLE OF CONTENTS

Approval
Abstract
Acknowledgements
Table of Contents
List of Figures
List of Tables
Chapter 1: Introduction
  1.1 Sex-Determining Mechanisms
    1.1.1 Environmental Mechanisms
    1.1.2 Genetic Mechanisms
  1.2 Sex-Determination and Sex Chromosome Evolution in Vertebrates
    1.2.1 Mammals
    1.2.2 Birds
    1.2.3 Fishes
  1.3 Sex-Determination in Teleost Fishes
    1.3.1 Medaka
    1.3.2 Poeciliid
    1.3.3 Tilapia
    1.3.4 Stickleback
    1.3.5 Pufferfish
  1.4 Sex-Determinations in Salmonids
  1.5 Atlantic Salmon Linkage Group (LG) 1
  1.6 Comparative Genome Analysis
    1.6.1 Whole-Genome Duplication
    1.6.2 Salmonid Genome Duplication
    1.6.3 Comparing Genomes from Different Species
  1.7 Aim of the Thesis
Chapter 2: Materials and Methods
  2.1 Integration of Microsatellite and SNP Markers into Atlantic Salmon Linkage Group 1
    2.1.1 Probe and Primer Design
    2.1.2 BAC Library Screening
    2.1.3 Positiveness Check by PCR
    2.1.4 Identifying the Corresponding Contig
  2.2 Chromosome "Walking" Along Atlantic Salmon Linkage Group 1
    2.2.1 Probe and Primer Design
    2.2.2 BAC Library Screening
    2.2.3 Positiveness Check by PCR
    2.2.4 Joining the Contigs
  2.3 Sequencing the BAC Inserts
    2.3.1 PCR Procedure
    2.3.2 Subcloning the PCR Product
    2.3.3 Insert Check by Colony PCR
    2.3.4 Sequencing Reaction
  2.4 BAC Sequencing Analysis and Gene Identification
  2.5 Comparative Genome Analysis
Chapter 3: Results
  3.1 Updating the Atlantic Salmon Linkage Group (LG) 1 Physical Map
    3.1.1 An Overview of Atlantic Salmon LG 1
    3.1.2 Incorporating Sex-Linked Microsatellite Markers into the Physical Map
  3.2 Extension of Coverage of Atlantic Salmon LG 1 Physical Map
    3.2.1 Chromosome Walking
    3.2.2 Joining the Contigs
  3.3 Minimum Tiling Pathways
    3.3.1 Super-Contig 783
    3.3.2 Super-Contig 332
    3.3.3 Super-Contig 315
    3.3.4 Super-Contig 2169
    3.3.5 OMYIIINRA
  3.4 Assembly and Annotation of BAC Sequences
    3.4.1 Sequences Annotation
    3.4.2 Genes Identified
  3.5 Comparative Genome Analysis
    3.5.1 Medaka
    3.5.2 Stickleback
    3.5.3 Zebrafish
    3.5.4 Tetraodon
  3.6 A Comparison between the Comparative Genomics Data and the BAC Sequencing Data
  3.7 Candidate Sex-Determining Genes
Chapter 4: Discussions and Conclusions
  4.1 A Comparison between the Previous Work in Comparative Genomics and my Results
  4.2 Future Work
References

LIST OF FIGURES

Figure 3.1 Atlantic salmon Linkage Group 1.
Figure 3.2 CHORI-214 Atlantic salmon BAC library filter hybridized with BHMS7.029 probe
Figure 3.3 Hybridization positive BACs found by BHMS7.029 microsatellite probe were confirmed by PCR
Figure 3.4 Contig 332 as viewed from Asalbase
Figure 3.5 Checking the Orientation of BACs located at the end of a contig using PCR
Figure 3.6 PCR result of end BACs in contig 783, using both T7 and SP6 primers from S0065E16
Figure 3.7 Super-contig 783
Figure 3.8 Super-contig 332
Figure 3.9 Super-contig 315
Figure 3.10 Super-contig 2169
Figure 3.11 The region around the microsatellite marker OMYIIINRA.
Figure 3.12 BAC Sequences with their scaffolds in super-contig 783
Figure 3.13 BAC Sequences with the scaffold in contig 818
Figure 3.14 BAC Sequences in contig 2705 (scaffold 2705)
Figure 3.15 Gene annotation of scaffold 2354 (super-contig 783)
Figure 3.16 Gene annotation of scaffold 1052 (super-contig 783)
Figure 3.17 Gene annotation of scaffold 1968 (super-contig 783)
Figure 3.18 Gene annotation of S0066L24 (super-contig 783)
Figure 3.19 Gene annotation of S0397C07 (super-contig 783)
Figure 3.20 Gene annotation of S0363E24 (super-contig 783)
Figure 3.21 Gene annotation of scaffold 818 (contig 818)
Figure 3.22 Gene annotation of S0120P04 (contig 818)
Figure 3.23 Gene annotation of S0120P04 (contig 818)
Figure 3.24 Putative orientation of the three BACs and the location of all the genes in contig 2705
Figure 3.25 Atlantic salmon LG 1 with the contigs and their corresponding markers
Figure 3.26 Atlantic salmon LG 1 with the orthologous chromosome from medaka
Figure 3.27 Atlantic salmon LG 1 with the orthologous chromosome from stickleback
Figure 3.28 Atlantic salmon LG 1 with the orthologous chromosome from zebrafish
Figure 3.29 Atlantic salmon LG 1 with the orthologous chromosome from tetraodon
Figure 3.30 The region of medaka genome that is orthologous to contig 2705
Figure 3.31 The region of stickleback genome that is orthologous to contig 2705
Figure 3.32 The region of zebrafish genome that is orthologous to contig 2705
Figure 3.33 The region of Tetraodon genome that is orthologous to contig 2705
Figure 3.34 Expression profile of Zmat4, ZFYVE27 and TSG118 in the Atlantic salmon smolt.

LIST OF TABLES

Table 3.1 Microsatellite markers and SNPs used for the integration of Atlantic salmon Linkage Group 1 physical map
Table 3.2 BACs selected for chromosome walking and contigs joining
Table 3.3 Contigs and clones that were joined to form super-contigs, with their corresponding microsatellite markers
Table 3.4 A summary of all the minimum tiling pathways constructed
Table 3.5 BACs selected for building the minimum tiling pathway in super-contig 783
Table 3.6 BACs selected for building the minimum tiling pathway in super-contig 332
Table 3.7 BACs selected for building the minimum tiling pathway in super-contig 315
Table 3.8 BACs selected for building the minimum tiling pathway in super-contig 2169
Table 3.9 BACs selected for building the minimum tiling pathway around microsatellite marker OMYIIINRA.
Table 3.10 BAC sequences in super-contig 783
Table 3.11 BAC sequences in contig 818 and contig 2705
Table 3.12 Oxford grid showing conservation of synteny between Atlantic salmon LG 1 with medaka
Table 3.13 Oxford grid showing conservation of synteny between Atlantic salmon LG 1 with stickleback.
Table 3.14 Oxford grid showing conservation of synteny between Atlantic salmon LG 1 with zebrafish
Table 3.15 Oxford grid showing conservation of synteny between Atlantic salmon LG 1 with tetraodon
Table 3.16 A comparison of the genes identified in the genomes of the four teleosts that correspond to contig 2705 in Atlantic salmon LG 1.

CHAPTER 1: INTRODUCTION

Almost all animal species have developed two sexes, male and female. This involves one of the most fundamental biological processes, known as sex-determination. Interestingly, although the existence of two sexes is a conserved phenomenon, the mechanisms that control sex-determination are diverse, ranging from environmental influences to strictly genetic factors (Kikuchi et al., 2007). Nevertheless, it is now known that sexual reproduction has an advantage over asexual reproduction, since sexual reproduction provides new genetic combinations that can rapidly adapt to a changing environment, and prevents the accumulation of deleterious mutations in the population (Kondrashov, 1993). In the introduction to this thesis, I will examine some different sex-determining mechanisms used by vertebrates, and will then focus primarily on sex-determination in the Salmonids, especially in the Atlantic salmon (Salmo salar).

1.1 Sex-Determining Mechanisms

Sex-determination can be divided into two broad categories: environmental mechanisms and genetic mechanisms.

1.1.1 Environmental Mechanisms

As the name suggests, an organism using this type of sex-determining mechanism requires external stimuli to determine its sex. Many reptile and fish species fall into this category, in which the temperature at which the eggs are incubated controls the differentiation of sex. Population dynamics can also affect the sex of an individual: when a dominant individual of a particular sex becomes scarce, the next dominant individual may undergo sex reversal to compensate for the loss. In addition, there is evidence that factors such as municipal waste, pH, salinity, and organic compounds can lead to sex reversal in fish (Devlin and Nagahama, 2002).

1.1.2 Genetic Mechanisms

In this case, the organism uses one or more genes on the sex chromosome as a master switch that triggers the development of the embryonic gonad into a testis or an ovary. There are two major types of genetic sex-determination systems: (1) the heterogametic male system, in which males have X and Y chromosomes and females have two X chromosomes, and (2) the heterogametic female system, in which females have Z and W chromosomes and males have two Z chromosomes. However, the sex ratio is not always regulated strictly by the sex chromosomes, since in some species a polyfactorial system can take part in the sex-determining pathway. This system includes autosomal modifiers that can override the sex chromosome, or different combinations of genetic factors inherited from the male and female parent (Volff and Schartl, 2001; Devlin and Nagahama, 2002; Mank et al., 2006).

1.2 Sex-Determination and Sex Chromosome Evolution in Vertebrates

It is now widely accepted that sex chromosomes arise in two major steps. The first is the emergence of a novel autosomal gene with two alleles, in which homozygosity leads to the development of one sex and heterozygosity to the other sex. Therefore, at the early stage of sex chromosome evolution (proto-X and proto-Y), the chromosome structures are expected to be homomorphic. Second, to give rise to heteromorphic sex chromosomes, suppression of recombination has to occur in the sex-determining region in order to preserve the gene(s) that benefit one sex over the other. The lack of recombination also means that as mutations accumulate, there is no way to repair them, and eventually the Y chromosome degenerates (Peichel et al., 2004; Charlesworth et al., 2005; Takehana et al., 2007b). Ultimately, the Y chromosome will be so diverged that only a tiny region, necessary for segregation during meiosis, will remain homologous with the X chromosome (the pseudoautosomal region, PAR) (Waters et al., 2007). The observation that Y chromosomes contain paralogous or pseudogenized genes that have close relatives on the X chromosome provides good evidence that the Y was originally homologous to the X but has progressively degenerated (Graves, 1995). Here I will focus on a few major groups that are widely studied.

1.2.1 Mammals

Mammals have an XX female:XY male chromosomal sex-determination system. The X and Y chromosomes are usually highly differentiated, with the X chromosome being large, euchromatic, and highly conserved among species, whereas the Y chromosome is tiny and heterochromatic (Graves, 1995; Waters et al., 2007). The master switch that determines maleness is on the Y chromosome and is called the SRY gene. It encodes the testis-determining factor (TDF) that induces testis formation, and it acts upstream in the sex-determination cascade (Waters et al., 2007). SRY (the TDF) is a transcription factor containing the High Mobility Group (HMG) domain, which binds to DNA at a 6-base consensus target sequence and introduces specific bends, which might bring other proteins to the site for activation (Graves, 1995; Waters et al., 2007). Overall, when different mammalian SRY sequences were compared, it was found that they share moderate conservation of the HMG domain, but little conservation outside the HMG region (Koopman et al., 2004; Waters et al., 2007). SRY shares the highest sequence similarity with a gene located on the X chromosome, the SOX3 (SRY-like HMG box-containing) gene. Both SRY and SOX3 are intronless. The fact that the two genes are closely related and SOX3 is located on the X suggests that SRY could have arisen from the copy of SOX3 on the Y chromosome and gained its testis-determining function once it was isolated from recombination (Graves, 1995; Quintana-Murci et al., 2001). Indeed, it was shown that Sox3 mutations in mice cause developmental problems in the gonads of both sexes and that spermatogenesis fails (Waters et al., 2007). Furthermore, loss of function of another closely related gene, SOX9 on human chromosome 17, can cause XY sex reversal (Wagner et al., 1994).

Another interesting observation worth mentioning is that the human X chromosome appears to have four evolutionary "strata", where each stratum represents an event. Each event is thought to correspond to an inversion on the Y chromosome, and these inversions are characterized by different levels of divergence between the X- and Y-linked homologues. These caused the suppression of X-Y recombination and, ultimately, the degeneration of the Y chromosome (Lahn and Page, 1999). These strata arose in a stepwise fashion, with the distal q arm being the oldest stratum, and the distal p arm the most recent one. This observation further supports the hypothesis that a Y chromosome degenerates gradually from an autosome. Remarkably, similar strata can also be observed in some dioecious plants (Liu, 2004; Filatov, 2005; Nicolas et al., 2005), chicken (Handley et al., 2004) and fungal species (Menkis et al., 2008). These phenomena might indicate that the progressive and stepwise cessation of recombination is the driving force in the evolution of sex chromosomes throughout all organisms.

1.2.2 Birds

Birds diverged from mammals about 300 million years ago (Nanda et al., 1999), and they present a ZZ/ZW sex-determining system, in which the females are the heterogametic sex. Although birds and mammals use different heterogametic systems, one feature they have in common is that the Z chromosome is large and gene rich, just like the mammalian X chromosome, whereas the W chromosome is small, highly differentiated, and heterochromatic, similar to the mammalian Y chromosome (Shetty et al., 2002; Ezaz et al., 2006; Smith and Voss, 2007). Because the W chromosome is found only in female birds, it was once thought that a female-determining gene located on the W controls the development of ovaries. However, such a gene has not been found (Nanda et al., 2000; Ezaz et al., 2006). Nevertheless, the Z chromosome may contain a dosage-dependent male determinant that controls avian sex; in this case, birds with two Z chromosomes develop testes, whereas ZW birds undergo female sexual differentiation (Nanda et al., 2000). Such dosage-compensation mechanisms controlling sexual differentiation have been observed in Drosophila melanogaster (Burtis and Baker, 1989) and Caenorhabditis elegans (Stothard and Pilgrim, 2003; Pires-daSilva, 2007). Currently, a likely candidate sex-determining gene in chicken and ratite birds is the doublesex/male abnormal-3 related transcription factor-1 (DMRT1) gene, which encodes a protein containing a zinc-finger DNA binding domain (DM-domain) and is orthologous to doublesex (dsx) in D. melanogaster and male abnormal-3 (mab-3) in C. elegans. These proteins act downstream in the sex-determination cascade in both of these organisms (Shetty et al., 2002; Schartl, 2004; Yamaguchi et al., 2006). DMRT1 is located only on the Z chromosome in all the birds investigated so far (Shetty et al., 2002; Ezaz et al., 2006), and the region containing DMRT1 on the chicken Z chromosome shares extensive conservation of synteny with human chromosome 9 (Nanda et al., 2000). This further supports the hypothesis, since DMRT1 in human acts in a dosage-dependent manner in the sex-determining pathway, and deletion of the DMRT1 region causes XY sex reversal (Raymond et al., 1999).

It was once believed that the sex-determination mechanisms in birds and mammals have evolved independently for more than 300 million years (Nanda et al., 1999), but recent studies have shown that both sex chromosomes could have been derived from a common ancestor in the monotremes or amphibians. In these cases, both the human X and chicken Z chromosomes share orthologues on the platypus X chromosomes (Grutzner et al., 2004) and a salamander autosome (Smith and Voss, 2007). However, more investigations are needed to test these findings.

1.2.3 Fishes

Fish are an attractive group of organisms in which to study sex-determination, because there are more than 24,000 species of fish worldwide, which provide a rich source of material for the study of vertebrate sex-determination and possible insights into how genetic sex-determination in vertebrates arose (Devlin and Nagahama, 2002; Schartl, 2004). Almost every possible sex-determining mechanism is represented in fish, including male and female heterogamety, hermaphroditism, and environmental influence. So, in contrast to the relatively stable sex-determining mechanisms found in both mammals and birds, the diversity of such mechanisms found in fishes might indicate frequent evolutionary changes in their sex-determination systems (Mank et al., 2006; Volff et al., 2007). Different types of fishes can adopt different types of sex-determining mechanisms according to their needs. For example, sex-determination by environmental influence could be an advantage if the ecological conditions or population composition favours one sex over the other, and the ability to alter sex even after the maturation of the gonads could provide some flexibility to withstand selection pressures (Mank et al., 2006). For fishes that have discrete habitats or low population densities, hermaphroditism could be a better choice, and this is found in reef fishes (Mank et al., 2006). Finally, fishes that have sex chromosomes are those that are unlikely to encounter a shortage of potential mates, such as diadromous species. It has been suggested that sex chromosomes tend to be associated with diadromous species, since the individuals come together during spawning and hence are likely to meet each other (Mank et al., 2006). Currently, male heterogamety is found to be twice as common as female heterogamety. Whether a single gene was ancestral to both male and female heterogamety and the two systems later evolved independently, or whether there was a transition from one heterogamety to the other, is still under debate (Ezaz et al., 2006).

For those species that utilize sex chromosomes for sex-determination, it is believed that the majority of their sex chromosomes might still be in the earliest stage of evolution. Most of these species have homomorphic sex chromosomes, and YY males are viable and fertile, indicating that the Y chromosomes in fish still contain functional genes and have not degenerated to the point where the YY genotype is lethal (Onozato, 1989). Therefore, fish can also serve as model organisms for how sex chromosomes differentiate from homomorphic to the heteromorphic systems that are observed in higher vertebrates (Devlin and Nagahama, 2002). Among the 1,700 species that have been cytogenetically characterized, only about 176, or 10.4%, have cytogenetically distinct sex chromosomes. Heteromorphic chromosomes have been found in at least 72 families within Chondrichthyes (cartilaginous fish) and Osteichthyes (bony fish) (Devlin and Nagahama, 2002). It is also interesting that homomorphic sex chromosomes are often associated with species that have autosomal or environmental influences on sex-determination, possibly because such sex chromosomes could not retain a dominant influence over sex-determination for sufficient time to undergo degeneration. Therefore, whenever a new mutation causes an autosomal gene to override the effect of the existing master gene, the original sex chromosome would be replaced by the autosome carrying the new master gene, and the process of chromosomal degeneration would start again (Devlin and Nagahama, 2002).

Despite the functional importance of SRY in mammals, it has not been found in non-mammalian vertebrates bearing the XX/XY system (Matsuda et al., 2002; Nanda et al., 2002). However, the SRY paralogue SOX9 is found in many teleosts, including rice field eel (Monopterus albus), zebrafish (Danio rerio), rainbow trout (Oncorhynchus mykiss), and medaka (Oryzias latipes), in which SOX9 is expressed in the testes, ovaries, and ovotestes during gonadal differentiation, indicating a conserved role for this gene (Takamatsu et al., 1997; Chiang et al., 2001; Yokoi et al., 2002; Zhou et al., 2003). A more recent study has even revealed a novel sex-determining gene that acts just like SRY in mammals: DMY in medaka (discussed in section 1.3.1). As more fully sequenced genomes of teleost fish become available (Sarropoulou et al., 2008), it will be possible to carry out comparative genome analysis and look for putative sex-determining genes.

1.3 Sex-Determination in Teleost Fishes

As mentioned previously, fish provide an important resource for the study of sex-determination due to their amazing diversity of species with various types of sex-determining mechanisms. Here I will focus on a few teleost fish that have been well studied.

1.3.1 Medaka

Medaka (Oryzias latipes) possesses an XX/XY sex-determination system like that in mammals, but has homomorphic sex chromosomes (Matsuda et al., 2002; Schartl, 2004; Matsuda, 2005). Recent studies have discovered a male-determining gene that spans about 250 kb on its Linkage Group (LG) 1, which also corresponds to chromosome 1, and that causes XY females when mutated (Matsuda et al., 2002; Kasahara et al., 2007). This gene, called DMY, is the only gene found so far that acts as a master switch in male development in fish (Zhang, 2004). DMY is an orthologue of DMRT1 which, as mentioned previously, plays an important role in the sex-determining pathway in human. Its orthologues, dsx in D. melanogaster and mab-3 in C. elegans, are also involved in sex-determination in these species. However, DMY serves as an upstream regulator in medaka (Nanda et al., 2002), whereas the DMRT1 orthologues serve as downstream regulators in both D. melanogaster and C. elegans. It will be interesting to investigate why the orthologues behave so differently in spite of the evolutionarily conserved mechanism, and their possible common origin (Raymond et al., 1998). It is believed that DMY emerged about 10 million years ago from a recent duplication of the DMRT1 gene located on LG 9; the copy was inserted into LG 1 and later gained the sex-determining function, making chromosome 1 the Y chromosome in medaka (Nanda et al., 2002; Schartl, 2004; Zhang, 2004).

Despite the seeming importance of DMY in O. latipes, this gene is absent in many closely related species (Kondo et al., 2004; Takehana et al., 2007a; Takehana et al., 2007b) and in other fishes (Kondo et al., 2003), suggesting that the duplication of DMRT1 occurred only in the lineage leading to O. latipes and O. curvinotus (Kondo et al., 2004), and that DMY is therefore not the male-determining gene in all fishes. Some medaka species have even adopted a ZZ/ZW system (Takehana et al., 2007b), which further illustrates the independent evolution of sex-determining mechanisms among closely related species. In addition, it was found that DMY might not be necessary for male development in every case, since a considerable number of males lacking DMY were identified (Volff and Schartl, 2002). This suggests that there are autosomal modifiers that control the male-determining function in medaka, and the chromosomes carrying such modifiers may develop into neo-Y chromosomes that could replace the present DMY and become the master regulator of male determination in the future (Schartl, 2004).

1.3.2 Poeciliid

The Poeciliidae, which includes guppy (Poecilia reticulata), molly (genus Poecilia), and platyfish (Xiphophorus maculatus), is one of the best-studied families of fish with respect to sex-determination. The Poeciliids generally have homomorphic sex chromosomes, and have a range of sex-determining mechanisms from XX/XY or ZZ/ZW systems to polyfactorial sex-determination (Volff and Schartl, 2001). I will briefly look at each of these examples, and discuss the possible implications of these phenomena.

Guppies carry many sex-linked genes that are responsible for pigmentation patterns and fin shape. These phenotypes are therefore excellent tools for studying sex-determination in guppies. This species uses an XX/XY mechanism, but there are loci on the autosomes that can override the effect of the sex-determining gene on the sex chromosome, which occasionally leads to XX males and XY females. When XX males are crossed with normal females, all offspring are female, and when XY females are crossed with normal males, both XY and YY males are produced, with the YY males being viable and fertile (Volff and Schartl, 2001).
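The outcome of the two guppy crosses described above can be checked with a simple Punnett-square enumeration. The following is a minimal sketch, assuming that each parent passes on either of its two sex chromosomes with equal probability; the function name `cross` is purely illustrative and does not come from the cited work. It lists sex-chromosome genotypes only and does not model the autosomal modifiers.

```python
from collections import Counter
from itertools import product


def cross(mother: str, father: str) -> Counter:
    """Count offspring sex-chromosome genotypes for one cross.

    Each parent is given as a two-letter genotype such as "XX" or "XY".
    Assumption: every gamete combination is equally likely.
    """
    offspring = Counter()
    for egg, sperm in product(mother, father):
        # Sort the pair so that "YX" and "XY" count as the same genotype.
        offspring["".join(sorted(egg + sperm))] += 1
    return offspring


# Sex-reversed XX male crossed with a normal XX female.
print(dict(cross("XX", "XX")))
# Sex-reversed XY female crossed with a normal XY male.
print(dict(cross("XY", "XY")))
```

In the first cross every combination is XX, in line with the all-female offspring reported. In the second cross the combinations are XX, XY and YY, so both XY and YY males appear alongside XX females.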

Currently, little information is available on the sex-determination mechanism in mollies, because sex-linked traits have not been found in these species (Volff and Schartl, 2001). However, it is now known that XX/XY systems are used in P. velifera and P. latipinna, and that both XX/XY and ZZ/ZW systems occur in different stocks of P. sphenops. In a cytogenetic study by Haaf and Schmid (1984), the karyotype of P. sphenops was subjected to C-band staining. It was found that a heterochromatic region was confined to a single chromosome 1 in females, but not in males, hence confirming the ZZ/ZW mechanism in this species.

The platyfish is the best-studied fish with regard to sex-determination within the Poeciliids. This species contains three different sex chromosomes: X, Y, and W. Males are XY or YY, and females can be XX, XW, or WY (Volff and Schartl, 2001; Volff and Schartl, 2002). Due to this spectacular three-chromosome situation, it is possible to study the relationship between male heterogamety (XY male, XX female) and female heterogamety (WY female, YY male), and such a phenomenon could be a transition state between the two heterogameties in the evolution of teleost sex-determination (Ezaz et al., 2006). The mechanism for this three-chromosome situation is rather complicated and may even involve some autosomal modifiers and repressors (Volff and Schartl, 2001). The master sex-determining gene(s) are yet to be identified in platyfish.

It can be seen from the above that the Poeciliidae display a wide variety of sex-determination mechanisms, which might reflect an early stage of sex-determination evolution, as the poeciliid fishes are a relatively young group in the history of teleost evolution (Volff and Schartl, 2001). The sex-determining region in Xiphophorus undergoes frequent rearrangements that include duplications, amplifications, deletions and transpositions, and such events might be the first step toward the cessation of sex chromosome recombination and the divergence of the Y chromosome. Furthermore, it is proposed that in Xiphophorus the X chromosome contains one copy of a dosage-sensitive gene involved in sex-determination, the Y chromosome contains two copies of this gene, and the W chromosome contains none. Two copies or fewer of the gene would lead to a female, whereas three copies or more would lead to a male. Therefore, the high genome plasticity of Xiphophorus, which causes the deletion or duplication of this gene, would lead to the different types of sex chromosomes and the frequent switching of sex-determining mechanisms observed in this species (Volff and Schartl, 2001).
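The proposed dosage model can be written out as a short check against the platyfish genotypes listed earlier. This is a minimal sketch of the hypothesis as stated in the text, not an established mechanism; the names `COPIES` and `predicted_sex` are introduced here purely for illustration.

```python
# Copy number of the proposed dosage-sensitive gene on each chromosome
# type, as described for Xiphophorus in the text.
COPIES = {"X": 1, "Y": 2, "W": 0}


def predicted_sex(genotype):
    """Return 'male' for three or more total copies, otherwise 'female'."""
    total = sum(COPIES[chromosome] for chromosome in genotype)
    return "male" if total >= 3 else "female"


for genotype in ["XX", "XW", "WY", "XY", "YY"]:
    print(genotype, predicted_sex(genotype))
```

With these copy numbers the threshold reproduces the reported pattern: XX, XW and WY individuals fall at or below two copies (female), while XY and YY individuals carry three and four copies (male).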

1.3.3 Tilapia

The tilapia (Oreochromis, Sarotherodon and Tilapia) are among the world's most important fish species in aquaculture. Therefore, there has been a long history of work on sex-determination in tilapia in the hope of understanding their reproductive mechanisms in order to improve growth rates in commercial production settings (Lee et al., 2003; Cnaani et al., 2008). It is now confirmed that there are genes which regulate and determine the sex of tilapia, and both male (XX/XY) and female (ZZ/ZW) heterogametic systems are found among closely related species (Lee et al., 2004; Cnaani et al., 2008). However, the observed sex ratios frequently deviate from the expected Mendelian segregation, indicating that a strict genetically-determined mechanism is not always the case, and so autosomal genes and environmental influences could also take part in the sex-determination of tilapia (Cnaani et al., 2008). It is interesting that different species of tilapia use different sex chromosomes and different heterogametic systems. In both T. zillii and O. niloticus, the sex-determining locus is located on LG 1 and both species possess male heterogametic sex-determination. In contrast, both T. mariae and O. karongae adopt female heterogametic sex-determination, and in both species the sex-determining locus is located on LG 3. A more complex system was observed in O. aureus (female heterogametic system) and O. mossambicus (heterogamety unknown), where both LG 1 and LG 3 contain the sex-determining locus in both species (Lee et al., 2004; Cnaani et al., 2008). The above observations may indicate that a dominant sex-determining gene has not been fixed in the species, but nevertheless, it now seems that both LG 1 and LG 3 contribute to sex-determination in tilapia. This was confirmed by fluorescence in situ hybridization (FISH) analysis using sex-linked DNA markers (Ezaz et al., 2004; Cnaani et al., 2008). This supports the hypothesis that autosomal genes may also be involved in their sexual development. Some characteristics of early-developing sex chromosomes were observed, such as a broad region of recombination suppression in LG 3 with an accumulation of repetitive elements and a possible inversion, and LG 1 may contain lethal alleles caused by the lack of recombination (Cnaani et al., 2008). Such combinations of early-evolving sex chromosome phenomena, with the variability of their sex-determination mechanisms among closely related species, have not been observed in other vertebrates (Cnaani et al., 2008). Therefore, tilapia are considered an excellent model for the study of vertebrate sex chromosome evolution.

1.3.4 Stickleback

The threespine stickleback (Gasterosteus aculeatus) has been extensively studied in the field of genetics and evolution over the past decades due to its great diversity and wide range of aquatic habitats (Cunado et al., 2002). It was once believed that temperature and population dynamics played an important role in sexual differentiation in stickleback (Peichel et al., 2004), but recent studies have shown that an allozyme of isocitrate dehydrogenase (IDH) is linked to sexually dimorphic characteristics in some stickleback populations, and sex-specific DNA markers have also been found (Griffiths et al., 2000; Peichel et al., 2004). These observations suggest a genetic basis for sex-determination in stickleback. Indeed, after further examination by genome-wide linkage mapping, it was found that LG 19 contains the sex-determining locus. Genes near the sex-determining locus show a male-specific reduction in recombination frequency, indicating a suppression of the recombination rate as predicted in an evolving sex chromosome. Many alleles in this region are constantly heterozygous in males, implying that stickleback uses the male heterogametic (XX/XY) system, and that the sex-determining locus on LG 19 is a male-determining gene. Recently, FISH analysis using sex-linked markers has revealed that threespine stickleback contains a heteromorphic Y chromosome (Ross and Peichel, 2008), which was not observed in earlier studies (Cunado et al., 2002) due to the similarity in size between the X and Y chromosomes. The FISH analysis has also identified a chromosomal deletion on the telomeric end of the Y chromosome, and the Y chromosome has experienced multiple inversions involving the centromere (Ross and Peichel, 2008), which could have created evolutionary "strata" similar to those of the Y chromosomes in human, chicken, some dioecious plants and some fungal species (section 1.2.1). In addition, it is expected that a degeneration of genes would be observed near the sex-determining locus due to the lack of recombination between the X and the Y chromosome, and this is indeed the case. When the sequences of regions of the X and Y chromosomes were compared, poor sequence similarity, large deletions/insertions, and transposable elements were observed (Peichel et al., 2004). All these observations show that LG 19 corresponds to the Y chromosome (which is also heteromorphic) in stickleback, and that there is a gene that controls sex-determination. However, the identity of the sex-determining gene remains unknown and the search for such a gene continues (Peichel et al., 2004).

1.3.5 Pufferfish

The pufferfish are considered good model organisms for comparative genome analysis and sex-determination studies due to their small genome sizes: the freshwater pufferfish (Tetraodon nigroviridis) contains the smallest known vertebrate genome, while the tiger pufferfish (Takifugu rubripes) contains a compact genome that is about eight times smaller than the human genome (Jaillon et al., 2004; Kai et al., 2005). Currently, the fully sequenced genomes of both Tetraodon and Takifugu are available, and these genomes can be used as references to understand other vertebrate genomes and to discover new genes, especially novel sex-determining genes (Kai et al., 2005; Sarropoulou et al., 2008).

Recent evidence suggests that DMRT1 might also play a role in the sex-determining pathway in Takifugu. A high level of DMRT1 expression was detected until the final stage of spermatogenesis during gonadal development, and the expression was abundant in the developing testes, but not in the ovaries, of 3-month-old fish (Yamaguchi et al., 2006). Again, this sexual differentiation mechanism is similar to the downstream role of dsx in D. melanogaster and mab-3 in C. elegans. Furthermore, a more recent study has shown that Takifugu has adopted an XX/XY sex-determining system, and the sex-determining locus is located on LG 19, since the sex-linked markers around this region have a reduced recombination rate (Kikuchi et al., 2007). Interestingly, DMRT1 does not reside on LG 19, and hence it is not the master gene for sex-determination in this species. However, two genes that play a key role in mammalian male sexual differentiation, amhrII and inhbb (Vigier et al., 1989; Brown et al., 2000), were found near the sex-determining locus, and these have become candidate genes for the sex-determining locus in Takifugu (Kikuchi et al., 2007).

1.4 Sex-Determination in Salmonids

Salmonids are also excellent fish species for the study of sex-determination. Since they are an important component of the human food supply, understanding the reproductive biology of both wild and aquacultured species would allow effective management in salmon breeding (Devlin and Nagahama, 2002). To date, all of the salmonid species studied have genetically-determined sexual differentiation, and all of them are male heterogametic systems. There are three pieces of evidence to support this: (1) cytogenetic studies have shown that a few of the salmonid species contain heteromorphic Y chromosomes, such as rainbow trout (Oncorhynchus mykiss), sockeye salmon (Oncorhynchus nerka), lake trout (Salvelinus namaycush), and least cisco (Coregonus sardinella) (Thorgaard, 1977; Phillips and Rab, 2001); (2) when hormonally-treated sex-reversed males (XX males) were crossed with normal females, all of the offspring produced were female (Johnstone et al., 1979; Hunter et al., 1982; Hunter et al., 1983; Johnstone and Youngson, 1984); and (3) FISH analysis using sex-linked DNA markers has identified sex chromosomes in several salmonid species (Stein et al., 2001; Phillips et al., 2005; Artieri et al., 2006). In addition, those sex-specific markers are often heterozygous in males and homozygous in females (Sakamoto et al., 2000; Devlin et al., 2001; Stein et al., 2002). Currently, sex chromosomes have been identified in eight species of salmonid fishes: rainbow trout, sockeye salmon, lake trout, Atlantic salmon, chinook salmon (Oncorhynchus tshawytscha), coho salmon (Oncorhynchus kisutch), chum salmon (Oncorhynchus keta) and pink salmon (Oncorhynchus gorbuscha) (Phillips et al., 2005, 2007). Salmonid fishes other than rainbow trout, lake trout and sockeye salmon all contain homomorphic sex chromosomes, indicating an early stage of sex chromosome evolution (Phillips and Rab, 2001). This is also supported by the fact that YY males are viable and fertile, suggesting that the X and Y chromosomes still share a large portion of functional genes (Chevassus et al., 1988; Onozato, 1989).

A comparative genetic analysis was carried out on the sex linkage groups of four salmonid species: Atlantic salmon, brown trout, Arctic charr, and rainbow trout. No sex-linked markers were found to be linked to the sex-determining locus in more than one species, since the sex-linked markers from each of these species mapped to autosomal linkage groups in each of the other species. This could be caused by chromosomal translocations, which could lead to sex-linkage disruption. Therefore, no conservation of the sex-determining region was seen across salmonid fishes, and they may have different sex-determining genes (Woram et al., 2003).

Furthermore, chromosome painting was carried out using probes designed from the Y chromosome of the lake trout, and the probes were hybridized to the genomes of rainbow trout, chinook salmon, Atlantic salmon, and brown trout. It was found that the probes only hybridized to autosomes rather than sex chromosomes in all these fishes. Therefore, chromosome painting has also confirmed the lack of conservation and the independent evolution of sex chromosomes in different genera of salmonid fishes (Phillips et al., 2001).

The actual master gene(s) for sex-determination have not yet been identified in any of the salmonid species mentioned above, but the search for sex-determining genes is underway. It appears that the sex-determining locus is located at the telomeric end of a chromosome in all of the salmonids examined, except in rainbow trout where the sex-determining locus seems to be located near the centromere. It has been postulated that this was due to a chromosomal rearrangement (Woram et al., 2003). Interestingly, two genes, SOX9 and DMRT1, were isolated in rainbow trout, and both of them seem to play a crucial role in sexual differentiation (Takamatsu et al., 1997; Marchand et al., 2000). SOX9 is a paralogue of SRY in mammals, and in human it is involved in the formation of the testis. Therefore, haploinsufficiency of SOX9 in human can cause XY sex reversal (Foster et al., 1994; Wagner et al., 1994). The rainbow trout SOX9 has 70% amino acid sequence identity with the human SOX9, and a high level of expression is observed in the testis of rainbow trout (Takamatsu et al., 1997). DMRT1 and its orthologues are involved in sexual differentiation in many invertebrates, fish and human, as mentioned previously. The rainbow trout DMRT1 shows high sequence similarity to the DMRT1 from other organisms and it is highly expressed during testicular differentiation, especially during spermatogenesis. This expression pattern was not observed during ovarian differentiation. In adult tissues, there was a high level of DMRT1 expression in the testis, plus a very low level of expression in the ovary (Marchand et al., 2000). Therefore, these genes are probably good candidates for the study of the sex-determining mechanisms and the isolation of the actual sex-determining gene(s) in salmonid fishes.

1.5 Atlantic Salmon Linkage Group (LG) 1

The Atlantic salmon possesses a male (XX/XY) heterogametic system, with homomorphic sex chromosomes (Johnstone and Youngson, 1984; Phillips and Rab, 2001; Artieri et al., 2006). Previous studies of microsatellite markers isolated from Atlantic salmon have shown that the markers associated with the male phenotype are on LG 1, which has hence been assigned as the sex linkage group in Atlantic salmon (Woram et al., 2003; Artieri et al., 2006). The sex-determining locus, SEX, was located at the distal end of LG 1 using three AFLP markers and 15 microsatellite markers (Woram et al., 2003). Since the recombination rate in females is approximately ten times greater than in males in salmonid fish (Artieri et al., 2006), LG 1 was built based on the female maps from the two Atlantic salmon families, Br5 and Br6.

A physical map of Atlantic salmon comprising ~4,400 contigs was generated based on HindIII fingerprints of a bacterial artificial chromosome (BAC) library from a Norwegian male Atlantic salmon (Ng et al., 2005). The BAC library, CHORI-214, consists of three segments made from partially EcoRI-digested DNA and a fourth segment made from partially Sau3AI-digested DNA, and provides at least 20-fold coverage of the genome. BAC filters, each representing more than 18,000 Atlantic salmon clones in duplicate, have been produced for hybridization screening (Thorsen et al., 2005). In the study by Artieri et al. (2006), the flanking regions of 16 microsatellite markers from LG 1 (information provided by Danzmann and Hoyheim, unpublished data) were chosen to design oligonucleotide probes. These probes were hybridized to the CHORI-214 BAC library filters, and 12 of the 16 probes successfully identified positive BACs. The positive BACs were PCR-confirmed using primers corresponding to the microsatellite markers, and they were placed into the contigs of the Atlantic salmon physical map. These contigs, with their associated markers, provided the starting point for chromosome walking and contig extension on LG 1, which is the basis of my research.

Six of these 12 successfully mapped microsatellite markers (OmyFGT8TUF, One18ASC, BHMS447, One102ADFG, BHMS150 and Ssa202DU), together with one unique positive BAC for each marker, were chosen for FISH analysis to determine their physical locations on Atlantic salmon chromosomes. These six microsatellite markers are dispersed throughout the entire LG 1 (Figure 3.1). The FISH result showed that the six BACs (and hence, the six microsatellite markers) hybridized to chromosome 2, making it the sex chromosome in Atlantic salmon. Interestingly, both of the chromosome 2 homologues gave positive hybridization results with the BACs in all cases, which makes it impossible to distinguish which of the chromosome 2 homologues is the Y chromosome at this time. Nevertheless, the order of the FISH-hybridized BACs along chromosome 2 correlates well with the order of the corresponding microsatellite markers in LG 1 (Artieri et al., 2006). DAPI staining of Atlantic salmon chromosomes revealed a large region of heterochromatin on the telomeric end of the q arm of chromosome 2. Therefore, combining the information obtained from both the cytogenetic analysis and the LG 1 data, it is now believed that SEX is located between the microsatellite marker Ssa202DU and the heterochromatin region, which is near the telomeric end of the q arm of chromosome 2 (Artieri et al., 2006).

1.6 Comparative Genome Analysis

1.6.1 Whole-Genome Duplication

It has been proposed that duplication of genes, chromosomal segments, or entire genomes provides the raw genetic material necessary for macroevolution: such duplications are crucial for generating novel gene functions and expression patterns, which could possibly have led to the emergence of metazoans, vertebrates and mammals from unicellular organisms (Ohno, 1970). Ohno (1970) also proposed that one or two rounds of whole-genome duplication (WGD) took place during the early evolution of vertebrates. This idea later became known as the 2R hypothesis, in which the first round of WGD occurred before the emergence of all vertebrates, followed by a second round of WGD that shaped the genome of jawed vertebrates (Sidow, 1996; Kasahara, 2007). It is now believed that a third round of WGD (3R duplication) also occurred in the lineage that led to the emergence of teleost fishes (Christoffels et al., 2004; Meyer and Van de Peer, 2005).

Evidence supporting the 2R and 3R duplication comes from extensive studies of HOX gene clusters, as zebrafish and medaka contain seven HOX clusters, while mammals have four and invertebrates have one (Amores et al., 1998; Naruse et al., 2000). Another example is the egfr (epidermal growth factor receptor) gene, in which one egfr gene is found in invertebrates such as D. melanogaster and C. elegans, whereas mammals contain four egfr-related genes and teleost fishes have seven (Volff, 2005). In addition, comparative genome analysis between different types of teleosts also seems to support the 3R duplication (discussed in section 1.6.3). Therefore, the above observations all support the 2R and 3R duplication, and since only seven copies are found in the teleosts (rather than eight) for both HOX and egfr, it is believed that one copy has been lost during teleost evolution in both cases.
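The copy-number argument above is simple doubling arithmetic, which can be sketched as follows. The helper name `expected_copies` is hypothetical, and the calculation assumes that every round of WGD doubles the copy number and that no copies are lost unless stated.

```python
def expected_copies(ancestral_copies, wgd_rounds):
    """Copies expected after successive whole-genome duplications, no loss."""
    return ancestral_copies * 2 ** wgd_rounds


# (rounds of WGD, observed HOX clusters or egfr-related genes) from the text.
observed = {"invertebrates": (0, 1), "mammals": (2, 4), "teleosts": (3, 7)}

for group, (rounds, seen) in observed.items():
    expected = expected_copies(1, rounds)
    print(f"{group}: expected {expected}, observed {seen}, "
          f"inferred losses {expected - seen}")
```

Starting from a single ancestral copy, two rounds give four copies (as in mammals) and three rounds give eight, so the seven copies observed in teleosts imply the loss of one copy.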

After gene duplication has occurred, the duplicated genes are often referred to as paralogous genes (Zhang, 2003). Paralogous genes can have three types of fate: (1) one copy becomes silenced or lost (nonfunctionalization); (2) one copy acquires a novel function that is beneficial to the organism so that natural selection can preserve it (neofunctionalization); or (3) both gene copies experience mutations so that they have complementary gene expression patterns that together correspond to that of their single ancestral gene (subfunctionalization). Over time, the modification of new genomes through mutation, natural selection and genetic drift would create novel gene functions, gene networks, and gene cascades, which could facilitate speciation, a phenomenon known as divergent resolution. Divergent resolution states that after gen(om)e duplication, one population loses the function of one copy of the gene while another population loses the function of the second gene copy. Hybrids between the two populations would then produce gametes that completely lack the functional gene, making their offspring less viable and/or fertile. Such interspecific genomic incompatibility would mark the beginning of speciation (Lynch and Force, 2000). Divergent resolution could be the reason why there is such a huge diversity of teleosts (more than 24,000 species), making them the most species-rich group of vertebrates today.
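The consequence of divergent resolution for hybrids can be made concrete by enumerating gametes. The sketch below is illustrative only: the allele labels are hypothetical (uppercase for a functional copy, lowercase for a silenced copy), and it assumes that the two duplicate loci are unlinked and assort independently.

```python
from fractions import Fraction
from itertools import product

# Hypothetical labels: uppercase = functional copy, lowercase = silenced copy.
# Population 1 kept the copy at locus 1; population 2 kept the copy at locus 2.
gamete_pop1 = ("A", "b")
gamete_pop2 = ("a", "B")

# The F1 hybrid carries one allele from each parent at each of the two loci.
f1_locus1 = (gamete_pop1[0], gamete_pop2[0])
f1_locus2 = (gamete_pop1[1], gamete_pop2[1])

# Assuming unlinked loci, F1 gametes are all equally likely combinations.
f1_gametes = list(product(f1_locus1, f1_locus2))


def is_null(gamete):
    """A gamete is null when it carries no functional (uppercase) copy."""
    return all(allele.islower() for allele in gamete)


null_gametes = Fraction(sum(is_null(g) for g in f1_gametes), len(f1_gametes))
# In an F1 x F1 cross, a zygote lacks the gene only if both gametes are null.
null_zygotes = null_gametes ** 2
print(null_gametes, null_zygotes)
```

Under these assumptions one quarter of the hybrid's gametes carry no functional copy, and one sixteenth of the F2 zygotes lack the gene entirely, which illustrates how reciprocal gene loss alone can reduce hybrid fitness.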

Sex-determination is one of the major problems for an organism that has undergone a WGD. One way to overcome this problem is that the duplicated gene undergoes neofunctionalization and becomes the novel sex-determining master switch that overrides the original sex-determining gene, as in the case of DMY in medaka (section 1.3.1). Alternatively, one of the duplicated sex-determining genes could become pseudogenized or lost from the genome in order to allow the sex-determination system to return to the diploid state (Davidson et al., in press). It is reasonable to expect that different lineages would evolve different sex-determining mechanisms after WGD due to different selection pressures, which is another possible way to promote speciation (Davidson et al., in press).

1.6.2 Salmonid Genome Duplication

It is believed that the common ancestor of salmonids underwent a fourth WGD (i.e. 4R duplication) about 25 to 100 million years ago (Allendorf and Thorgaard, 1984). Evidence supporting this hypothesis also comes from studies of HOX gene clusters in the salmonids: both Atlantic salmon and rainbow trout contain 13 HOX gene clusters, which is roughly twice the number found in zebrafish and medaka (Moghadam et al., 2005a, 2005b). However, sequence data analysis suggested that at least 14 HOX gene clusters should have been present in the common ancestor of salmonids (Moghadam et al., 2005a, 2005b). Other evidence comes from karyotype studies in salmonids. The number of chromosome arms (NF) in salmonids is between 96 and 104, whereas in most teleosts the NF is between 48 and 52 (Phillips and Rab, 2001; Mank and Avise, 2006), indicating that the salmonid NF is roughly twice that of other teleost fishes. Furthermore, quadrivalent meiotic configurations are often observed in male meiosis, and these pairings seem to involve metacentric chromosomes that may or may not include an acrocentric pair (Wright et al., 1983). Tetrasomic inheritance or partial tetrasomic ratios are also observed in the segregation ratios after male meiosis, indicating that salmonid genomes have not completely returned to a stable diploid state, a process called rediploidization (Allendorf and Danzmann, 1997; Wolfe, 2001). In addition, recent studies in Atlantic salmon and rainbow trout using comparative genome analysis also support the 4R duplication (discussed in section 1.6.3). The fact that salmonids contain duplicated genomes has provided us with excellent models to study the rediploidization process, the fate of duplicate genes, and the development of possible novel sex-determining mechanisms after an organism has undergone WGD (Davidson et al., in press).

1.6.3 Comparing Genomes from Different Species

Comparative genome analysis is an excellent tool for comparing the similarities and differences between organisms in terms of gene content, and for seeing how genomes evolve in different lineages. This approach can also help us to predict the locations of genes by comparing gene maps from different organisms. As more and more organisms are sequenced, they provide us with new gene maps that are useful for this purpose. A good way to do this is to compare the chromosomal locations of homologous genes or DNA markers in different species and to see the degree of genome conservation and rearrangement between them. The occurrence of two or more genes on the same chromosome is referred to as synteny (Ehrlich et al., 1997), and one wants to find out whether there are syntenic regions in different genomes in order to infer evolutionary relationships.

There are three different types of synteny: (1) conserved synteny, where three or more genes are syntenic in two or more species, regardless of the gene order, so gene order is not necessarily conserved in this case; (2) conserved linkage, the conservation of both synteny and order of three or more homologous genes between species; and (3) disrupted synteny, where two or more genes are located on the same chromosome in one species but their homologues are located on different chromosomes in another species (Ehrlich et al., 1997).
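The three definitions can be expressed as a small decision procedure. The following is a rough sketch rather than an established tool: the function name, the gene names and the chromosome labels are all hypothetical, and it assumes that a fully reversed gene order still counts as conserved order because chromosome orientation is arbitrary.

```python
def classify_synteny(genes, species_a, species_b):
    """Classify homologous genes using the three categories described above.

    species_a and species_b map gene name -> (chromosome, position).
    """
    chroms_a = {species_a[g][0] for g in genes}
    chroms_b = {species_b[g][0] for g in genes}
    if len(chroms_a) != 1:
        return "not syntenic in species A"
    if len(chroms_b) > 1:
        return "disrupted synteny"
    if len(genes) < 3:
        return "syntenic, but fewer than three genes"
    order_a = sorted(genes, key=lambda g: species_a[g][1])
    order_b = sorted(genes, key=lambda g: species_b[g][1])
    # Assumption: a fully reversed order is treated as conserved order.
    if order_a == order_b or order_a == order_b[::-1]:
        return "conserved linkage"
    return "conserved synteny"


# Hypothetical example data: three genes on one chromosome in species A.
genes = ["g1", "g2", "g3"]
spec_a = {"g1": ("chrA", 10), "g2": ("chrA", 20), "g3": ("chrA", 30)}
same_order = {"g1": ("chr5", 3), "g2": ("chr5", 7), "g3": ("chr5", 9)}
shuffled = {"g1": ("chr5", 7), "g2": ("chr5", 3), "g3": ("chr5", 9)}
split = {"g1": ("chr5", 3), "g2": ("chr8", 7), "g3": ("chr5", 9)}

for other in (same_order, shuffled, split):
    print(classify_synteny(genes, spec_a, other))
```

The three hypothetical comparison species illustrate, in turn, conserved linkage, conserved synteny without conserved order, and disrupted synteny.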

Comparative genome analysis has been very helpful in the reconstruction of ancestral vertebrate genomes, and has thus supported the 3R duplication. In the study by Kohn et al. (2006), the genomes of the tetrapod lineage (human and chicken) and the teleost lineage (pufferfish and zebrafish) were compared. It was found that large portions of conserved syntenies and paralogous segments were dispersed on different chromosomes in the four organisms, and when these conserved regions were assembled, 11 pairs of ancestral vertebrate proto-chromosomes were formed (Kohn et al., 2006). The study by Nakatani et al. (2007) reached a similar conclusion, but proposed that the number of ancestral proto-chromosomes ranged from 10 to 13 depending on the choice of fission model or fusion model. Similar approaches have previously been applied to medaka, zebrafish, Tetraodon, and human, and these studies have led to the hypothesis that the common ancestor of teleosts and tetrapods had 12 pairs of proto-chromosomes, followed by a WGD (i.e. the 3R duplication) that occurred early in the lineage leading to the teleosts (Jaillon et al., 2004; Mulley and Holland, 2004; Naruse et al., 2004). These studies have also revealed large portions of conserved syntenies between zebrafish and human (Barbazuk et al., 2000; Woods et al., 2000), as well as cases where pairs of zebrafish genes appear to be orthologous to single mammalian genes, an indication of the gene duplication event (Gates et al., 1999). Conserved syntenies were also observed between medaka and zebrafish (Naruse et al., 2000). All of these findings imply chromosomal rearrangements and gene shuffling after the WGD.

In the case of salmonids, comparative genome analysis has also been carried out between the genomes of Atlantic salmon, rainbow trout, zebrafish and medaka (Danzmann et al., 2008). It was found that two chromosomes in zebrafish and medaka (both are descendants of the 3R duplication) contained conserved regions in four whole or partial chromosomal arms in the salmonids, and it was postulated that one of the ancestral linkage groups has undergone an additional duplication following the 3R duplication (Danzmann et al., 2008). Large portions of conserved syntenies were also identified between the ancestral linkage groups of the 3R fishes and the 50 homeologous chromosomal segments in Atlantic salmon and rainbow trout (Danzmann et al., 2008). In addition, Atlantic salmon and rainbow trout shared portions of conserved syntenies as well (Danzmann et al., 2008), and it has been suggested that Robertsonian translocations and chromosomal fusions/fissions were the main forces that shaped the salmonid genome (Phillips and Rab, 2001). In this thesis, I will compare the sex linkage group of Atlantic salmon, LG 1, with the genomes of four different teleost fishes: medaka, stickleback, zebrafish and Tetraodon. This will not only give us some insights into the evolution of sex chromosomes between the 3R and 4R fishes, but may also provide the necessary tool for the prediction of the sex-determining gene in Atlantic salmon by looking at their conserved syntenic regions.

1.7 Aim of the Thesis

There are four objectives of my research project:

(1) to screen the Atlantic salmon BAC libraries for all known microsatellite markers and single nucleotide polymorphisms (SNPs) on LG 1, and to integrate as much of the physical and linkage maps in this genomic region as possible;

(2) to perform chromosome walking and contig extension from the existing contigs with their associated microsatellite markers and SNPs, with the eventual goal of covering the entire chromosome 2 of the Atlantic salmon;

(3) to construct BAC minimum tiling pathways in the known contigs, so that the BACs that make up the minimum tiling pathways can be subjected to sequencing; and

(4) to carry out comparative genome analysis between the LG 1 of Atlantic salmon and the genomes of medaka, stickleback, zebrafish and Tetraodon to study their evolutionary relationships and to predict the possible sex-determining gene in the Atlantic salmon.

CHAPTER 2: MATERIALS AND METHODS

2.1 Integration of Microsatellite and SNP Markers into Atlantic Salmon Linkage Group 1

2.1.1 Probe and Primer Design

All of the 17 oligonucleotide probes (approximately 40 nucleotides in length), as well as their corresponding reverse primers (approximately 20 nucleotides in length), were designed from the flanking regions of 12 microsatellite markers and five single nucleotide polymorphisms (SNPs) using OLIGO 4.0 software (Rychlik and Rhoads, 1989). The sequences of the probes were also checked with the RepeatMasker program (http://repeatmasker.org; also http://lucy.ceh.uvic.ca/repeatmasker/cbcrepeatmasker.py), since the Atlantic salmon genome is full of repetitive elements. The oligonucleotide probes and the reverse primers were designed to have a GC content of 50% or more if possible, with the Tm of the probes equal to or higher than 65°C.
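The GC-content criterion can be checked with a few lines of code. This is a minimal sketch and not the procedure used by OLIGO 4.0: the function names are hypothetical, the example sequence is the 40-mer C. briggsae overgo reference probe quoted in section 2.1.2, and the melting temperature is deliberately not computed because the Tm model used by the design software is not stated here.

```python
def gc_fraction(sequence):
    """Fraction of G and C bases in a DNA sequence (case-insensitive)."""
    sequence = sequence.upper()
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


def passes_gc_rule(sequence, minimum=0.5):
    """True when the GC content meets the 50% design target."""
    return gc_fraction(sequence) >= minimum


# The C. briggsae overgo reference probe given in section 2.1.2.
reference_probe = "GTTGCCAAATTCCGAGATCTTGGCGACGAAGCCACATGAT"

print(len(reference_probe), gc_fraction(reference_probe),
      passes_gc_rule(reference_probe))
```

The reference probe is 40 nucleotides long with 20 G or C bases, so it sits exactly at the 50% GC target.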

2.1.2 BAC Library Screening

To begin the BAC library screening, two types of probing solutions were prepared: the 40-mer oligonucleotide probe and a Caenorhabditis briggsae 40-mer overgo reference probe (5'-GTTGCCAAATTCCGAGATCTTGGCGACGAAGCCACATGAT-3'). For each probing solution mix, 0.5 µL of 10 µM oligonucleotide probe (or overgo reference probe) was mixed with 1 µL of 5X Forward Reaction Buffer (Invitrogen), 0.5 µL of 10 U/µL T4 polynucleotide kinase (Invitrogen), 1 µL of 32P-γ ATP (0.37 MBq/µL), and 2 µL of dH2O. The total volume was therefore 5 µL for each of the oligonucleotide probe and overgo reference probe solution mixes, which was enough for three CHORI-214 Atlantic salmon BAC library filters. The probing solution mixtures were incubated at 37°C for more than one hour so that the 32P could be incorporated into the 5' end of the probes.
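As a quick consistency check on the labelling mix, the listed volumes can be summed and the final concentrations derived. The sketch below only restates the recipe from the text; the dictionary keys and the helper name `final_concentration` are hypothetical.

```python
# Volumes (in microlitres) of one labelling reaction, as listed in the text.
mix_ul = {
    "10 uM oligonucleotide probe": 0.5,
    "5X Forward Reaction Buffer": 1.0,
    "10 U/uL T4 polynucleotide kinase": 0.5,
    "32P-gamma ATP (0.37 MBq/uL)": 1.0,
    "dH2O": 2.0,
}
total_ul = sum(mix_ul.values())


def final_concentration(stock, volume_ul, total_volume_ul):
    """Concentration after dilution of a stock into the total volume."""
    return stock * volume_ul / total_volume_ul


probe_um = final_concentration(10, 0.5, total_ul)  # final probe, in uM
buffer_x = final_concentration(5, 1.0, total_ul)   # final buffer, in X
kinase_units = 10 * 0.5                            # total kinase, in U
activity_mbq = 0.37 * 1.0                          # total activity, in MBq

print(total_ul, probe_um, buffer_x, kinase_units, activity_mbq)
```

The volumes add up to the stated 5 µL, which corresponds to a final probe concentration of 1 µM in 1X reaction buffer, with 5 U of kinase and 0.37 MBq of label per mix.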

Each CHORI-214 Atlantic salmon BAC library filter (http://bacpac.chori.org/salmon214.htm) was pre-hybridized (three filters per hybridization bottle) at 65 °C for more than two hours with 100 mL of hybridization buffer containing 5X SSC (pH 7.0), 0.5% SDS, and 5X Denhardt's solution (50X Denhardt's solution was made from 5 g of bovine serum albumin, 5 g of Ficoll 400, 5 g of polyvinylpyrrolidone and 500 mL of dH2O). After the pre-hybridization, both the oligonucleotide probe solution and the overgo reference probe solution were added to the BAC filters, and the hybridization was carried out at 65 °C for more than 16 hours (overnight). Filters were then washed twice at 50 °C for one hour with 200 mL of washing buffer containing 1X SSC and 0.1% SDS, wrapped in Saran wrap, and exposed to storage phosphor screens (Molecular Dynamics) overnight. The storage phosphor screens were then scanned using a Typhoon 9410 Phosphor Imager to detect the hybridization signals.
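As a worked example of the buffer arithmetic, the volume of 50X Denhardt's stock needed for 100 mL of 5X working solution follows from C1V1 = C2V2. The helper names below are hypothetical. Stock concentrations for SSC and SDS are not given in the text, so they are left out.

```python
def stock_volume_ml(final_conc, stock_conc, final_volume_ml):
    """Volume of stock needed for a dilution, from C1 * V1 = C2 * V2."""
    if stock_conc <= 0 or final_conc > stock_conc:
        raise ValueError("stock must be more concentrated than the final solution")
    return final_volume_ml * final_conc / stock_conc


def percent_w_v(grams, volume_ml):
    """Weight/volume percentage (g per 100 mL)."""
    return 100.0 * grams / volume_ml


# 5X Denhardt's in 100 mL of hybridization buffer, from the 50X stock.
print(stock_volume_ml(5, 50, 100))   # mL of 50X stock
# Each 50X Denhardt's component: 5 g in 500 mL.
print(percent_w_v(5, 500))           # % (w/v) of each component in the stock
```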

2.1.3 Positiveness Check by PCR

BAC clones that were positive by hybridization were manually picked from the CHORI-214 Atlantic salmon genomic library. The BAC clones were plated on an agar plate containing 50 µg/mL chloramphenicol, which was incubated at 37 °C overnight, forming a "master plate". Each PCR reaction mix contained the following: 1 µL of 10X PCR buffer containing MgCl2 (QIAGEN), 1 µL of 2 mM dNTP, 0.5 µL of 10 µM probe, 0.5 µL of 10 µM reverse primer, 0.15 µL of Taq DNA polymerase (QIAGEN), and 6.85 µL of dH2O. Using a toothpick, the BAC clone was picked from the master plate and dipped into the PCR reaction mix. PCR amplification was carried out in a T3 Thermocycler (Biometra), and the PCR temperature profile comprised the following steps: an initial denaturation at 95 °C for 5 min; 40 cycles of 95 °C for 30 sec, 55 °C for 30 sec, and 72 °C for 1 min; and a final extension step at 72 °C for 5 min. PCR products were electrophoresed on a 1% agarose gel containing 1X TBE and ethidium bromide (50 mg/mL; 1 µL of ethidium bromide was added to every 10 mL of agarose), and were visualized using a UV trans-illuminator (Alpha Innotech). BACs that produced the appropriate PCR product were considered "true positives" of the hybridization result, and these BAC colonies were transferred from the master plate to 5 mL of LB containing 50 µg/mL of chloramphenicol for an overnight incubation at 37 °C, with shaking at 250 RPM. After the overnight incubation, 0.7 mL of culture was added to 0.3 mL of 50% glycerol and stored at -80 °C as a glycerol stock.
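The reaction volume and the programmed run time implied by this protocol can be verified as follows. This is an illustrative sketch with hypothetical names. It sums only the programmed hold times and ignores the ramping time of the thermocycler.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: int
    seconds: int


# Component volumes (µL) of one colony PCR reaction, as listed in the text.
PCR_MIX_UL = {
    "10X PCR buffer with MgCl2": 1.0,
    "2 mM dNTP": 1.0,
    "10 µM probe": 0.5,
    "10 µM reverse primer": 0.5,
    "Taq DNA polymerase": 0.15,
    "dH2O": 6.85,
}

INITIAL = Step(95, 5 * 60)
CYCLE = (Step(95, 30), Step(55, 30), Step(72, 60))
FINAL = Step(72, 5 * 60)
N_CYCLES = 40


def hold_time_seconds(initial, cycle, n_cycles, final):
    """Total programmed hold time, excluding temperature ramps."""
    return initial.seconds + n_cycles * sum(s.seconds for s in cycle) + final.seconds


total = hold_time_seconds(INITIAL, CYCLE, N_CYCLES, FINAL)
print(round(sum(PCR_MIX_UL.values()), 2), "µL per reaction")
print(total // 60, "min of programmed hold time")
```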

2.1.4 Identifying the Corresponding Contig

BAC clones containing the sex-linked microsatellite or SNP markers, confirmed by both hybridization and PCR, were checked in the Asalbase database (http://www.asalbase.org/sal-bin/tools/bac2contig) in order to find the contig(s) with which these BACs were associated, and the newly identified contig(s) were incorporated into Atlantic salmon LG 1.

2.2 Chromosome "Walking" Along Atlantic Salmon Linkage Group 1

2.2.1 Probe and Primer Design

All 34 oligonucleotide probes and their corresponding reverse primers were designed from BAC-end sequences using OLIGO 4.0 software (Rychlik and Rhoads, 1989), and the BAC-end sequences were checked with the RepeatMasker program (http://repeatmasker.org; also http://lucy.ceh.uvic.ca/repeatmasker/cbcrepeatmasker.py) before the probes and reverse primers were designed. Again, all of the probes contained approximately 40 nucleotides, with a Tm equal to or higher than 65 °C, and the reverse primers contained approximately 20 nucleotides. Each probe/reverse primer pair was designed to produce a PCR product of 200 to 400 bp.
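The 200 to 400 bp product-size criterion can be checked against a BAC-end sequence by locating the probe (used as the forward primer) and the reverse complement of the reverse primer. The sketch below uses exact string matching on a made-up template. The function names and the toy sequences are assumptions for illustration, not thesis data.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def amplicon_length(template, forward, reverse):
    """Length of the product spanned by forward and reverse primers, or None.

    The forward primer must match the template strand exactly, and the reverse
    primer must match the opposite strand downstream of the forward primer.
    """
    template = template.upper()
    start = template.find(forward.upper())
    if start == -1:
        return None
    rc = reverse_complement(reverse)
    end = template.find(rc, start + len(forward))
    if end == -1:
        return None
    return end + len(rc) - start


def in_target_range(length, lo=200, hi=400):
    """True if a product exists and falls within the 200 to 400 bp design range."""
    return length is not None and lo <= length <= hi


# Toy example: hypothetical primers flanking 250 bp of filler sequence.
fwd, rev = "GATTACAGGC", "CCTGAGTCAA"
toy_template = "TT" + fwd + "A" * 250 + reverse_complement(rev) + "GG"
size = amplicon_length(toy_template, fwd, rev)
print(size, in_target_range(size))
```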

2.2.2 BAC Library Screening

The procedure for BAC library screening was exactly the same as described in section 2.1.2.

2.2.3 Positiveness Check by PCR

The procedure for PCR positiveness check was exactly the same as described in section 2.1.3.

2.2.4 Joining the Contigs

BAC clones confirmed by both hybridization and PCR were checked in the Asalbase database (http://www.asalbase.org/sal-bin/tools/bac2contig) in order to find the contig(s) with which these BACs were associated. Once a contig was identified, it could be merged end-to-end with the contig from which the probe was designed, forming a "super-contig".

2.3 Sequencing the BAC Inserts

2.3.1 PCR Procedure

Each PCR reaction mix contained the following: 1 µL of 10X PCR buffer containing MgCl2 (QIAGEN), 1 µL of 2 mM dNTP, 0.5 µL of 10 µM probe, 0.5 µL of 10 µM reverse primer, 0.15 µL of Taq DNA polymerase (QIAGEN), and 6.85 µL of dH2O. Using a toothpick, the BAC containing the desired insert for sequencing was picked from the agar plate and dipped into the PCR reaction mix. The amplification was carried out in a T3 Thermocycler (Biometra), and the PCR temperature profile comprised the following steps: an initial denaturation at 95 °C for 5 min; 40 cycles of 95 °C for 30 sec, 55 °C for 30 sec, and 72 °C for 1 min; and a final extension step at 72 °C for 5 min. PCR products were then run on a 1% agarose gel containing SYBR Safe (Invitrogen) and visualized using the Safe Imager Transilluminator (Invitrogen). The PCR fragments were cut out from the gel, and the DNA was purified using an Ultrafree-DA column (Millipore).

2.3.2 Subcloning the PCR Product

2 µL of the PCR product was mixed with 2.5 µL of Clonables 2X Ligation Premix and 0.5 µL of AccepTor Vector (50 ng/µL) from the Novagen AccepTor Vector Kit in a 1.5 mL Eppendorf tube on ice, and the ligation mixture was left at 4 °C overnight. After the overnight incubation the tube was placed on ice, and 25 µL of NovaBlue Singles Competent Cells were added to the ligation mixture with gentle pipetting. The competent cell mixture was incubated on ice for 20 min, heat-shocked in a 42 °C water bath for 45 sec, and then quickly returned to ice for an additional 3 min. 100 µL of LB was added to the cell mixture, which was incubated at 37 °C for about 30 min with shaking at 250 RPM. During the 30 min incubation, 50 µL of 4% X-gal and 25 µL of 0.1 M (2.4%) IPTG were spread on an agar plate containing 50 µg/mL ampicillin. After the 30 min incubation at 37 °C, the cell culture was gently spread on the ampicillin/X-gal/IPTG plate, followed by overnight incubation at 37 °C.

2.3.3 Insert Check by Colony PCR

In order to make sure that the white colonies formed on the plate truly contained vectors with inserts, an "insert check" was carried out using PCR. The U-20mer primer (5'-GGTGACACTATAGAATACAG-3') and the R-20mer primer (5'-ATGACCATGATTACGCCAAG-3') were designed from the flanking regions of the AccepTor Vector insert site, and they produce a PCR fragment of 197 bp if no DNA insert is present. Each PCR reaction mix contained the following: 1 µL of 10X PCR buffer containing MgCl2 (QIAGEN), 1 µL of 2 mM dNTP, 0.5 µL of 10 µM U-20mer primer, 0.5 µL of 10 µM R-20mer primer, 0.15 µL of Taq DNA polymerase (QIAGEN), and 6.85 µL of dH2O. Using a toothpick, the white colony was picked from the ampicillin/X-gal/IPTG plate and dipped into the PCR reaction mix. Just before the PCR amplification, a small aliquot (~1 µL) of the PCR reaction mix was transferred to an agar plate containing 50 µg/mL ampicillin as the "master plate". PCR amplification was carried out in a T3 Thermocycler (Biometra), and the PCR temperature profile comprised the following steps: an initial denaturation at 95 °C for 5 min; 40 cycles of 95 °C for 30 sec, 55 °C for 30 sec, and 72 °C for 1 min; and a final extension step at 72 °C for 5 min. PCR products were then run on a 1% agarose gel containing 1X TBE and ethidium bromide (50 mg/mL; 1 µL of ethidium bromide was added to every 10 mL of agarose), and were visualized using a UV trans-illuminator (Alpha Innotech). A PCR product larger than 197 bp corresponded to a white colony that contained a true insert. PCR-positive white colonies were transferred from the master plate to 5 mL of LB containing 100 µg/mL of ampicillin for an overnight incubation at 37 °C, with shaking at 250 RPM. After the overnight incubation, 0.7 mL of culture was added to 0.3 mL of 50% glycerol and stored at -80 °C as a glycerol stock. The rest of the cell culture (4.3 mL) was used for DNA isolation and sequencing (section 2.3.4).

2.3.4 Sequencing Reaction

Plasmid DNA was isolated from the 4.3 mL cell culture mentioned in section 2.3.3 using the QIAprep Spin Miniprep Kit (QIAGEN). Each sequencing reaction mix consisted of the following: 4 µL of Amersham DYEnamic ET terminator cycle sequencing kit master mix, 2 µL of 2 µM R-20mer primer, and 4 µL of isolated DNA. The sequencing reaction was carried out in a T3 Thermocycler (Biometra), and the temperature profile comprised the following steps: an initial denaturation at 96 °C for 1 min; 40 cycles of 96 °C for 10 sec, 50 °C for 5 sec, and 60 °C for 2 min. After the sequencing reaction, 40 µL of 99.9% EtOH and 1 µL of sodium acetate/EDTA buffer (1.5 M sodium acetate, 250 mM EDTA) were added to the sequencing reaction mix, followed by centrifugation at 14,000 RPM for 15 min at 4 °C. The supernatant was carefully removed by pipetting, and 100 µL of 70% EtOH was added to the DNA pellet, followed by centrifugation at 14,000 RPM for 5 min at 4 °C. The supernatant was carefully removed by pipetting, and the pellet was air-dried at room temperature for 5 min. The pellet was dissolved in 2 µL of formamide loading dye, and the sequencing analysis was carried out on an ABI Prism 377 DNA Sequencer (Applied Biosystems).

2.4 BAC Sequencing Analysis and Gene Identification

Twenty BACs from super-contig 783, five BACs from contig 818 (S0120P04, S0343H24, S0206M03, S0503014, and S0539C02) and three BACs from contig 2705 (S0180H04, S0429J16, S0048H24) were chosen for sequencing by the Baylor College of Medicine Human Genome Sequencing Center. The shotgun library for S0132J08 (contig 818) was constructed at Simon Fraser University, British Columbia, and was sent to the University of Victoria, Victoria, British Columbia, for sequencing. Once all of the BAC sequences were obtained, they were first masked with RepeatMasker (http://repeatmasker.org) and with the Salmon Repeat Mask Library (http://lucy.ceh.uvic.ca/repeatmasker/cbcrepeatmasker.py), and then aligned using BLAST (http://blast.ncbi.nlm.nih.gov/Blast.cgi) to the following databases: NCBI non-redundant (nr) database (04/2008) (e-value cutoff = 1), UniRef database (release 13.4, 05/2008) (e-value cutoff = 1), NCBI conserved domain database (CDD) (03/2007) (e-value cutoff = 1e-6), TC1 database (02/2005) (e-value cutoff = 1e-6), and Ensembl organism genome protein database (release 49, 05/2008; http://www.ensembl.org/) (the five fish used from the Ensembl database for alignment were medaka, zebrafish, stickleback, Tetraodon, and Takifugu). The BAC sequences were also aligned using BLAT (Kent, 2002) against the Atlantic salmon EST database (http://lucy.ceh.uvic.ca/contigs/cbccontig_viewer.py), and using SSAHA2 (Ning et al., 2001) against Atlantic salmon BAC ends (http://www.asalbase.org/). Gene prediction was done with Genscan (Burge and Karlin, 1997), and transmembrane helix structures were predicted using TMHMM (Krogh et al., 2001). More detailed BAC sequencing information is also available on the website (http://grasp.mbb.sfu.ca/).
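The database-specific e-value cutoffs can be expressed as a small lookup table and applied to parsed alignment hits. The sketch below is illustrative only: the hit records are a hypothetical in-memory format, the cutoff is assumed to be inclusive, and no cutoff is listed for the Ensembl database because none is stated in the text.

```python
# E-value cutoffs per database, as stated in the text.
EVALUE_CUTOFFS = {
    "NCBI nr": 1.0,
    "UniRef": 1.0,
    "NCBI CDD": 1e-6,
    "TC1": 1e-6,
}


def keep_hits(hits, cutoffs=EVALUE_CUTOFFS):
    """Keep hits whose e-value is at or below the cutoff of their database.

    Each hit is a dict with 'database' and 'evalue' keys (hypothetical format).
    Hits from databases without a listed cutoff raise KeyError on purpose,
    so that a missing cutoff is never silently ignored.
    """
    return [h for h in hits if h["evalue"] <= cutoffs[h["database"]]]


example_hits = [
    {"database": "NCBI nr", "evalue": 0.5},    # kept: 0.5 <= 1
    {"database": "NCBI CDD", "evalue": 1e-3},  # dropped: 1e-3 > 1e-6
    {"database": "TC1", "evalue": 1e-9},       # kept: 1e-9 <= 1e-6
]
print(keep_hits(example_hits))
```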

2.5 Comparative Genome Analysis

The genomes of medaka, stickleback, zebrafish and Tetraodon (Ensembl release 48, 12/2007; http://www.ensembl.org/) were used for the comparison with Linkage Group (LG) 1 of Atlantic salmon. A separate BLASTx search (e-value < 0.01) (Altschul et al., 1990; Altschul et al., 1997; Zhang and Madden, 1997) was performed between all the BAC-end sequences from the contigs assigned to Atlantic salmon LG 1 and each of the proteomes of medaka, stickleback, zebrafish and Tetraodon. The lists of BLASTx results can be found on the ASalbase website (http://www.asalbase.org/). In order to make sure that the correct syntenic region was chosen for each of the contigs examined, only those orthologues that were in close proximity to each other on the same chromosome were included. If no such case was found in a contig (e.g. each of the orthologues belonged to a different chromosome), I selected the orthologues that shared more than 80% sequence identity with the BAC-end sequence. Using these criteria, the most likely orthologues could be identified and selected.
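The two selection criteria can be sketched as a simple procedure: prefer the largest group of hits that lie close together on one chromosome, and fall back to the identity threshold when no such group exists. This is not the procedure as it was carried out (the selection in this thesis was made by inspection). The hit format, the function name, the 5 Mb gap used to define "close proximity", and the requirement of at least two clustered hits are all assumptions.

```python
from collections import defaultdict


def select_orthologues(hits, max_gap_bp=5_000_000, min_identity=80.0):
    """Select likely orthologues for one contig.

    hits: list of dicts with 'chrom', 'pos' and 'identity' keys (hypothetical).
    Returns the largest run of hits on one chromosome whose neighbours are at
    most max_gap_bp apart. If no run has two or more hits, returns the hits
    with more than min_identity percent identity instead.
    """
    by_chrom = defaultdict(list)
    for hit in hits:
        by_chrom[hit["chrom"]].append(hit)

    best = []
    for chrom_hits in by_chrom.values():
        chrom_hits.sort(key=lambda h: h["pos"])
        cluster = [chrom_hits[0]]
        for hit in chrom_hits[1:]:
            if hit["pos"] - cluster[-1]["pos"] <= max_gap_bp:
                cluster.append(hit)
            else:
                if len(cluster) > len(best):
                    best = cluster
                cluster = [hit]
        if len(cluster) > len(best):
            best = cluster

    if len(best) >= 2:
        return best
    return [h for h in hits if h["identity"] > min_identity]


clustered = [
    {"chrom": "chr1", "pos": 1_000_000, "identity": 60.0},
    {"chrom": "chr1", "pos": 2_000_000, "identity": 55.0},
    {"chrom": "chr5", "pos": 9_000_000, "identity": 90.0},
]
scattered = [
    {"chrom": "chr1", "pos": 1_000_000, "identity": 85.0},
    {"chrom": "chr2", "pos": 1_000_000, "identity": 70.0},
    {"chrom": "chr3", "pos": 1_000_000, "identity": 81.0},
]
print(select_orthologues(clustered))
print(select_orthologues(scattered))
```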

In order to see how these syntenic regions are distributed along Atlantic salmon LG 1, the orthologous chromosomes from each of the teleosts examined were used to reconstruct Atlantic salmon LG 1, using the method of Naruse et al. (2004), with the three-column image generator program (http://grasp.mbb.sfu.ca/cgi-bin/BACAnno/DEV/GeneImage_Multi_test.cgi). Four Atlantic salmon LG 1 maps were reconstructed by rearranging the orthologous chromosomes from medaka, stickleback, zebrafish and Tetraodon in the order of their corresponding microsatellite markers and contigs on Atlantic salmon LG 1.

CHAPTER 3: RESULTS

3.1 Updating the Atlantic Salmon Linkage Group (LG) 1 Physical Map

3.1.1 An Overview of Atlantic Salmon LG 1

The Atlantic salmon LG 1 genetic maps from Danzmann and Hoyheim (unpublished data), Woram et al. (2003), Artieri et al. (2006) and Moen et al. (2008) were used as the starting point of this thesis. As of September 1, 2008, the merged Atlantic salmon female LG 1 map contains 38 microsatellite markers, five single nucleotide polymorphisms (SNPs), and six amplified fragment length polymorphisms (AFLPs) (Figure 3.1). It should be noted that the paper by Artieri et al. (2006) uses a different nomenclature for the contigs. In this thesis, the contig nomenclature has been revised, and the information can be found on the ASalbase website (http://www.asalbase.org/).

Figure 3.1 Atlantic salmon Linkage Group 1. Currently it contains 38 microsatellite markers, 5 single nucleotide polymorphisms (SNPs) and 6 amplified fragment length polymorphisms (AFLPs). The numbers on the right indicate the distances in centimorgans (cM). The sex-determining region is believed to be located at the telomeric end.

3.1.2 Incorporating Sex-Linked Microsatellite Markers into the Physical Map

Oligonucleotide probes, approximately 40 nucleotides in length, were designed from the flanking regions of 12 microsatellite markers and five SNPs. The sequences of the probes were checked with the RepeatMasker program (http://repeatmasker.org; also http://lucy.ceh.uvic.ca/repeatmasker/cbcrepeatmasker.py), since the Atlantic salmon genome is rich in repetitive elements. It is important to avoid these repetitive elements when designing a probe, because a probe that contains the sequence of a repetitive element would jeopardize the hybridization result. Matching reverse primers, approximately 20 nucleotides in length, were also designed for use in PCR. Before the hybridization, all of the probe/reverse primer sets were tested using Atlantic salmon genomic DNA to make sure that they gave clean PCR products. The probes were used to screen the CHORI-214 Atlantic salmon BAC library filters. Hybridization-positive clones from the filters were selected from the CHORI-214 Atlantic salmon BAC library, and PCR was carried out using the corresponding probe/reverse primer sets to check whether these clones were true positives. The probe and primer sequences corresponding to the microsatellite markers and SNPs are shown in Table 3.1.

Each of the 12 probes for the microsatellite markers and the five probes for the SNPs was hybridized to Segment 1 (six filters) and Segment 4 (three filters) of the CHORI-214 Atlantic salmon BAC library. Segment 2 (six filters) was also used in cases where Segments 1 and 4 did not give any positive results (e.g. BX311884, HoxA11b, and OMM1016). Eleven of the 12 probes associated with microsatellite markers and all five probes from SNPs gave positive hybridization signals to BAC clones (Figure 3.2 shows an example). The probe corresponding to OMM1016 did not yield any positive signals in any of the three segments, but when the probe and the reverse primer were used to amplify Atlantic salmon genomic DNA, a PCR product could be seen. This probe had been tested previously for hybridization, and no positive clones were found then either (Artieri et al., 2006). This could be because the region near OMM1016 in the Atlantic salmon genome is not represented in the BAC library.

All the BACs that were positive by hybridization were checked by PCR using primers corresponding to the microsatellite markers and SNPs that are described in Table 3.1 (Figure 3.3). The PCR-confirmed BACs were then placed into their corresponding contigs in the ASalbase database (http://www.asalbase.org/sal-bin/tools/bac2contig) (Figure 3.4). Among the 11 probes corresponding to microsatellite markers and the five probes corresponding to SNPs that gave positive hybridization results, 11 hybridized to a single contig: BHMS7.029 (contig 332), CA341677 (contig 783), CL7880 (contig 1269), OMM1122 (contig 3400), OMM5188 (contig 3400), Ssa0114ECIG (contig 155), Ssa0166ECIG (contig 5983), Ssa15SSFU (contig 2169), Ssa0247BSFU (contig 2169), Ssa0219ECIG (contig 2183), and Ssa-A14 (contig 3400). Two probes hybridized to two contigs: CL11428 (contig 807 and contig 938) and Ssa0182ECIG (contig 1185 and contig 846), which could be due either to duplicated regions or to overlapping contigs. In order to see which of these alternatives was correct, PCR products (amplified with the probe/reverse primer designed from CL11428) from clones S0001D09 (contig 807) and S0231A17 (contig 938) were sequenced. The two sequences had a 100% match, indicating that contig 807 and contig 938 overlap. The same procedure was carried out for contig 1185 and contig 846: PCR products (amplified with the probe/reverse primer designed from Ssa0182ECIG) from clones S0232A12 (contig 846) and S0234H14 (contig 1185) were sequenced. The two sequences also had a 100% match, indicating that contig 846 and contig 1185 overlap as well. Two probes hybridized to singletons (i.e. BACs that are not part of any contig): HoxA11b (S0901A19, S0934C08) and OMM11INRA (S0247F07, S0208D14). One probe hybridized to five contigs: BX311884 (contig 397, contig 458, contig 1481, contig 2263, contig 2673), which could be due to the probe being part of a repetitive element. This result also indicates that, despite the effort to avoid repetitive elements when designing probes, there are still uncharacterized repetitive elements within the Atlantic salmon genome that are not detected by the RepeatMasker program. Table 3.1 summarizes all the hybridization results.
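The counts quoted above can be cross-checked by tallying the outcomes listed in Table 3.1. The sketch below is only a bookkeeping aid: the dictionary is transcribed from the table, and the category labels and function name are my own.

```python
from collections import Counter

# Positive contigs or BACs per marker, transcribed from Table 3.1.
HYBRIDIZATION_RESULTS = {
    "BHMS7.029": ["ctg332"],
    "BX311884": ["ctg397", "ctg458", "ctg1481", "ctg2263", "ctg2673"],
    "CA341677": ["ctg783"],
    "CL7880": ["ctg1269"],
    "CL11428": ["ctg807", "ctg938"],
    "HoxA11b": ["S0901A19", "S0934C08"],
    "OMM1016": [],
    "OMM1122": ["ctg3400"],
    "OMM5188": ["ctg3400"],
    "OMM11INRA": ["S0247F07", "S0208D14"],
    "Ssa0114ECIG": ["ctg155"],
    "Ssa0166ECIG": ["ctg5983"],
    "Ssa0182ECIG": ["ctg1185", "ctg846"],
    "Ssa15SSFU": ["ctg2169"],
    "Ssa0247BSFU": ["ctg2169"],
    "Ssa0219ECIG": ["ctg2183"],
    "Ssa-A14": ["ctg3400"],
}


def classify(targets):
    """Label a hybridization outcome by what the probe hit."""
    if not targets:
        return "no signal"
    if all(t.startswith("ctg") for t in targets):
        return f"{len(targets)} contig(s)"
    return "singleton BACs"


tally = Counter(classify(t) for t in HYBRIDIZATION_RESULTS.values())
for label, count in sorted(tally.items()):
    print(f"{count:2d}  {label}")
```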

Table 3.1 Microsatellite markers and SNPs used for the integration of the Atlantic salmon Linkage Group 1 physical map. F- Forward primer. R- Reverse primer.

Marker        Primers                                          Positive contig(s) or BACs
BHMS7.029     Probe-CAGGGGAAGGGGTTGGACTTCAAGGTTACCTGCTTTTCAC   ctg332
              R-GGATGTAGGGAGCCATAAAG
BX311884      Probe-CCTGTCACATGCTGGCTGTAGAGGAGGTCTGGACCTGAAG   ctg397, ctg458, ctg1481, ctg2263, ctg2673
              R-AGGTCCTTGCTGAAGTTGTC
CA341677      Probe-GTCACTGCAGGGTGTGGTTTCTGGACTACGTCATGTTCAC   ctg783
              R-TTCAACTCCAGGCGGTGCTC
CL7880        Probe-ATGTTGGAACCCAGGAAATGACGACCGCATCAACATTGTC   ctg1269
              R-GCATGCAAAGATAGTTGGTG
CL11428       Probe-CAGGAGGTCATCTGTCTCTGGGTCATGTTTATATCTCTGG   ctg807, ctg938
              R-ACACGTCTAGGTTTGTCACC
HoxA11b       Probe-ATGAGCGGGTCCCAGTTGGGTCTAATATGTATTTGCCCGGC  S0901A19, S0934C08
              R-ATTTACTGGAGCTACCAATG
OMM1016       Probe-CTCAGAGCCTCTCAGTCTTAAGTAGACCACACTGTTAAAG   None
              R-TAACGGATGAGGGATATGTC
OMM1122       Probe-ATCATTTTGAAGGAATCTCCAGGTACAAGGTATCTGTAGG   ctg3400
              R-AAGAGCAGTGGTTCTCAGTC
OMM5188       Probe-TGTGCCCCAGCCCAACTATCCTGGTGTCTACACCAAGGTC   ctg3400
              R-AGCCTTCTATCTCCCTCTAC
OMM11INRA     Probe-GTTCAAGTCAAAGCAGCTGGACAACTCTCCCTGAAGGTTAC  S0247F07, S0208D14
              R-CCAGGGCATGGCTTTG
Ssa0114ECIG   Probe-GCATAAAGCCTGTTGAGTGACCATGGACTGAAGGTCATTC   ctg155
              R-GCTAAGACTGAGGGTTCTGG
Ssa0166ECIG   Probe-TGGCCTTATTTCTTACAAGCACATCCAGCTGCCGCCAAAC   ctg5983
              R-GGAACCAGTCACCAGTTTAC
Ssa0182ECIG   Probe-TGAAGGGCTCAACTATGAAGTGCCCAAGCCTAAGGCTGTG   ctg1185, ctg846
              R-CACAAGTGTAAGGTGTAAC
Ssa15SSFU     F-ATTCATTGGTGACCGTAAAC                           ctg2169
              R-ATCTGTCTGGAAAACGTGTC
Ssa0247BSFU   F-CTACGCGCCTACTCTACATT                           ctg2169
              R-CTCTAGGGGTACCAGAGGAT
Ssa0219ECIG   Probe-CGAACCTGGAAAAAGTGGACAAGTGGTGTTTGTTGCAGTG   ctg2183
              R-AGTGGTCAGCAGTGGCACAG
Ssa-A14       Probe-GAGTCTTCTGCGTCATTAGGCAGAGCAGGCAACAGATGGC   ctg3400
              R-GCAAATCTCTTCTGCACCAG

Figure 3.2 CHORI-214 Atlantic salmon BAC library filter hybridized with the BHMS7.029 probe. The blue rectangles indicate hybridization by the overgo reference probe, which is made from C. briggsae. The green circles indicate positive hybridization, which appears as duplicate spots. The red circles indicate noise, which appears as single spots.

Figure 3.3 Hybridization-positive BACs found by the BHMS7.029 microsatellite probe were confirmed by PCR.

Lanes: 1. S0002F04 2. S0040B21 3. S0040003 4. S0044L15 5. S0064E07 6. S0067K01 7. S0068A04 8. S0092009 9. S0109B13 10. S0109B15 11. S0110L18 12. S0116022 13. S0163G01 14. S0165H08 15. S0166E23 16. S0168011 17. S0184C17 18. S0243N11 19. S0243P11 20. S0251C14 21. S0253106 22. S0268C12 23. S0271H22 24. 100 bp ladder.

Figure 3.4 Contig 332 as viewed from Asalbase. BHMS7.029 positive clones are shown in green.

3.2 Extension of Coverage of Atlantic Salmon LG 1 Physical Map

3.2.1 Chromosome Walking

After finding the contigs corresponding to the microsatellite and SNP markers, the next step was to extend out (or "walk out") along the chromosome, using these contigs as the starting points. In order to walk out from a contig, I designed hybridization probes from the BACs located at the end of each contig. Approximately one hundred thousand BACs from the CHORI-214 Atlantic salmon BAC library were used for end-sequencing. With the help of the Genome Science Centre in Vancouver, the DNA inserts adjacent to both the T7 and SP6 promoters of the BAC vector were sequenced, producing approximately two hundred thousand BAC-end sequences that cover about 3.5% of the Atlantic salmon genome. Each BAC-end sequence was given the name "BAC-name_T7" or "BAC-name_SP6" (for example, the sequences from the T7 end and the SP6 end of S0397C07 are S0397C07_T7 and S0397C07_SP6, respectively). The T7 and SP6 BAC-end sequences are now available on the ASalbase website (http://www.asalbase.org/), and they can be used to design probes. All the BAC ends that were selected for chromosome walking and contig joining are listed in Table 3.2, with their corresponding contigs and T7 or SP6 probe/reverse primer sequences.
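The naming convention for BAC-end sequences is easy to handle programmatically. The helpers below are a hypothetical sketch: they only split and build names of the form "BAC-name_T7" and "BAC-name_SP6", and do not query ASalbase.

```python
VALID_ENDS = ("T7", "SP6")


def bac_end_names(bac_name):
    """Return the T7 and SP6 end-sequence names for a BAC clone."""
    return tuple(f"{bac_name}_{end}" for end in VALID_ENDS)


def parse_bac_end(name):
    """Split an end-sequence name into (BAC name, end)."""
    bac_name, sep, end = name.rpartition("_")
    if not sep or not bac_name or end not in VALID_ENDS:
        raise ValueError(f"not a BAC-end sequence name: {name!r}")
    return bac_name, end


print(bac_end_names("S0397C07"))
print(parse_bac_end("S0397C07_SP6"))
```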

Before designing the hybridization probes, the BAC-end sequences were checked with the RepeatMasker program mentioned previously to try to avoid repetitive elements. All the hybridization probes designed from the BAC-end sequences contain approximately 40 nucleotides, and the corresponding reverse primers for PCR checks contain approximately 20 nucleotides. Before the hybridization, it was important to find out which end of the selected BAC, T7 or SP6, was the "true end" of the contig. To do this, several other BACs located near the selected BAC were chosen for PCR analysis using the probes/reverse primers designed from both the T7 and SP6 sequences of the selected BAC. If PCR products were obtained, the BAC end was pointing inward and was located within the contig. If there were no PCR products, the BAC end was pointing outward, and the probe from this end was used for hybridization. Figure 3.5 shows a hypothetical scenario that illustrates the concept, and Figure 3.6 shows an actual result. Thirty-four probes were used for chromosome walking, and the procedure was the same as for the microsatellite marker and SNP probes: the probes were hybridized to the CHORI-214 Atlantic salmon BAC library filters, hybridization-positive clones were checked by PCR, and the PCR-confirmed BACs were checked to determine their corresponding contigs.
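The orientation rule can be written as a small decision function. This is a sketch of the logic only, with a hypothetical input format (one list of PCR outcomes per BAC end, one entry per neighbouring BAC). Returning None for ambiguous patterns is my assumption, since the text does not describe how such cases were handled.

```python
def outward_end(pcr_results):
    """Return the BAC end ('T7' or 'SP6') that points out of the contig.

    pcr_results maps each end to a list of booleans, one per neighbouring BAC,
    where True means a PCR product was obtained with that end's primers.
    An end that amplifies from neighbouring BACs lies inside the contig.
    Returns None when the pattern is ambiguous (both or neither end amplifies).
    """
    outward = [end for end, amplified in pcr_results.items() if not any(amplified)]
    return outward[0] if len(outward) == 1 else None


# The hypothetical scenario of Figure 3.5: SP6 amplifies from all terminal BACs.
print(outward_end({"T7": [False, False, False], "SP6": [True, True, True]}))
```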

Among the 34 probes, six gave hybridization signals that were too complex to interpret (S0099D15_SP6, S0068C06_T7, S0111L24_SP6, S0422B20_T7, S0422B20_SP6, and S0186D05_SP6) and so could not be analyzed further. Two probes (S0126017_T7 and S0088G07_T7) hybridized to repetitive regions and could not be extended either. Two gave no hybridization signal (S0135I14_SP6 and S0105H17_T7). Two probes (S0057M11_T7 and S0068C06_SP6) hybridized to two contigs, and four probes (S0065E16_T7, S0198G07_T7, S0247F07_T7, and S0208D14_SP6) hybridized to singletons. The remaining 18 probes each hybridized to only one contig other than the contig from which they were designed (see Table 3.2 for a summary of the hybridization results).

Table 3.2 BACs selected for chromosome walking and contig joining. F- Forward primer. R- Reverse primer.

BAC end         Contig     Primers                                          Positive contig(s) or BACs
S0397C07_SP6    1052       Probe-TCCTTTCAGCCCTCTCTACCTCACTCCTCCATCCTTCCCC   ctg1968
                           R-CTCTCATCCATCCCTTCTCC
S0363E24_SP6    1968       Probe-AGCCCTGGAATGACCTAATGCCCAGCTTCTACCATGGCAG   ctg783
                           R-GTAGCTATGCCGTATGTATC
S0065E16_T7     783        Probe-CTGCTCAATCTTGTGGATAAGGATAGTGATTTGGTGCCTG   S0847E16
                           R-CTCTGCCTTCATCATACAGC
S0066L24_SP6    1052       Probe-GCAGTCAGTCTCACCCTAACCCCGGACAGAAAAGGAAACG   ctg6118
                           R-CCACCATCACTATGCCAACG
S0946D20_SP6    6118       F-CCCATTTAGTGTCCAAGGTG                           ctg2354
                           R-TCTGTCTCTGTCCGTGACTG
S0057M11_T7     2354       Probe-AGTAAAATAAACGTACGCGTGTGTGCCTCAGTCACCAGAT   ctg374, ctg567
                           R-TTAGTAGGGCATTTGTCCAC
S0068A04_SP6    332        Probe-GACTTTGACAATCTATCAGGATAAGGTAAGACCCAGGTGC   ctg767
                           R-CTGTCTTTACCCAGCCTCTC
S0057D16_T7     332        Probe-AGTGACGGAGTTTGTTTTGCGAACAATGATCAGAAAATGC   ctg2178
                           R-CGGCTTAATACAGTAACCAG
S0116022_SP6    767        Probe-TGCGGGGATCATCAACTACACTGAACAAAAATAAAAACGC   ctg332
                           R-GTGGGGAAAAACTCTTGCTG
S0126017_T7     2178       Probe-CGCTGGCTCGATGCTCTATCTGGTTCTTTTATGGCGG      ctg30, ctg771, ctg1049, ctg1129, ctg2505, ctg5498
                           R-GAGATGAACTTACTTTGGTG
S0272N03_T7     2178       F-TTCCCGAGCTTCCATCATAG                           ctg332
                           R-CCGTAATTGGCGAAATAGAC
S0099D15_SP6    1668       Probe-ACGTAGAACACCAAATAATGCATGCAGAGGAGAATTAGGC   Signals too complex to interpret
                           R-CTCTTGAGAGCCAGGTCTGC
S0023K12_T7     1668       Probe-AGCTATACCACGGCTCTTCAGTGATACGATAGAATCGAGC   ctg315
                           R-CTGGATGTCTAAGGGCAATG
S0135I14_SP6    315        Probe-GGCTAAGGTGTATGTAAACTTCCAACTTCAACTGTAGGTG   None
                           R-GGACTTCCTCCCTGTATCAC
S0068C06_T7     Singleton  Probe-CATTATTTTGGTCTTACTGGGATTTACTGTCAGGGCCCAG   Signals too complex to interpret
                           R~ACCCTACTAGAATCTGAAG
S0068C06_SP6    Singleton  Probe-AGATGAGCGATGACAAAGGAAGAGAGAGGATAAGATGGGG   ctg3961, ctg4476
                           R-GGTGAACAGTATTGGAGCAC
S0161E01_T7     3961       Probe-CAATACCCACCTTCTCCATCTCATTTCACTCCAGTGTCTC   ctg2169
                           R-GGCTTCACAAGACTATGCTG
S0132F13_SP6    3725       Probe-ATGTGCAAGCTACCAAAGGAGGTGGTCACCACTCCTCCAC   ctg155
                           R-GCGATACATAAGTGCTAAAG
S0131A08_SP6    2322       Probe-GACAGCTTAGATTGTTATCGGATCTGGAGACACCCTGCCG   ctg684
                           R-ACCTCTGCCGTCCCTTAGAC
S0238C08_SP6    684        Probe-GGACTCTGATAAGGGCTGTAGCCTACAAAACTGCACTGC    ctg2322
                           R-TCATTACGAGGTAGCGATAC
S0154I18_T7     722        Probe-TGCATGGTACTTCTGGTTGTGACATGGATACGAGTCATAC   ctg172
                           R~ACTAGGAATAGCAGCACTG
S0230B09_T7     172        Probe-GGCATTGGGCCTCTGGTTAGAAGAAAAAGTGAGGGGGAAC   ctg722
                           R-CTTTTATTGGCTGACTCCAC
S0111L24_SP6    807        Probe-CGTTGAATTGACATCTGTGCTCAGTGGCATAATACTCTGC   Signals too complex to interpret
                           R-ATATTTCCCTCGCCCCTCTG
S0088G07_T7     807        Probe-CACATGAACACAGGATTTCCAATCCAGATTGTGTTCTGAG   ctg138, ctg241, ctg987, ctg1137, ctg3120
                           R-CATGGGCGTTGACATTATTG
S0422B20_T7     1269       Probe-GTGGCTTCTTCCCTGGTATCCTCCCATGAACACATTCTTG   Signals too complex to interpret
                           R-AGCATTGCCTGATGTGAACC
S0422B20_SP6    1269       Probe-CACATGCCTGTTTGCCATCCATGAATCCCTTGGGAATAGC   Signals too complex to interpret
                           R-GTTCCGTCAGCGGTGAGTAG
S0105H17_T7     1269       Probe-TGACCCTCTTTCTACTTCTGAGGACGTTCTCTAACATCCG   None
                           R-GGTCGCTTCGGAACTGTTAC
S0037M11_T7     1269       Probe-AGTGCTCGCTGTTTACTTACCATACGAGTAATAAATCGAC   ctg1469
                           R-AGATGCTGTCTCGGCTTTTG
S0082H05_T7     1469       Probe-CAAGAATGGTGTCGGATACAATGCGAGAGTTACAACGATG   ctg1269
                           R-TTTGTTGCTGGCGTGTTATG
S0198G07_T7     1469       Probe-ACACGCCACAGAACATAAGGGATGCTTAAACTGCCCGGTG   S0915B24
                           R-GTCTAACGCAAACAAACCTC
S0247F07_T7     Singleton  Probe-GTGGTGCGAGACAGTGTGATGATCAGTGCTTTGCTCCTTC   S0091C24, S0850H02, S0896B21
                           R-GTTTTATCTCCCAGCCAGAC
S0208D14_SP6    Singleton  Probe-CTCGGAATTAAGTCGGATGGTGACCCAAATTCCATGTCGG   S0243124, S0927D23
                           R-GACCAGCCAATCAAATCTC
S0186D05_T7     Singleton  Probe-ACATGCTGCTATATGCTGCCATATGCTGCCAGCACGCCC    ctg5983
                           R-ACGGTGCATGTACACTAGAT
S0186D05_SP6    Singleton  Probe-GATTTTCCAGGAGGCACTGAAGGACCTGTCGAACATGGAG   Signals too complex to interpret
                           R-CCTGACCAAACTACAATAGC

Figure 3.5 Checking the orientation of BACs located at the end of a contig using PCR. In this hypothetical scenario, all of the BACs shown are located at the end of a contig, and the BAC at the top is used for hybridization and extension of the contig. The SP6 end of this BAC amplifies from all the terminal BACs, indicating that this end overlaps with these BACs and so is buried inside the contig. The T7 end, on the other hand, shows no amplification from these BACs; it is therefore the true end of the contig and should be used to design the probe for contig extension.

Figure 3.6 PCR result of end BACs in contig 783, using both T7 and SP6 primers from S0065E16.

T7 end: 1. S0074M19 2. S0171H08 3. S0174K16 4. S0848M16 5. S0861102 6. S0065E16 (control) 7. Atlantic salmon genomic DNA (control). 8. 100 bp ladder. SP6 end: 9. S0074M19 10. S0171H08 11. S0174K16 12. S0848M16 13. S0861102 14. S0065E16 (control) 15. Atlantic salmon genomic DNA (control).

3.2.2 Joining the Contigs

The contigs and clones that were joined with other BACs are listed in Table 3.3, with their corresponding microsatellite markers. When two or more contigs were merged, the "super-contig" was renamed using the smallest contig number among them. For example, when contig 332 was merged with contig 767, the super-contig was renamed contig 332.
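The naming rule can be stated as a small sketch. The function name is illustrative; the contig numbers in the example are the merged sets listed in Table 3.3.

```python
def name_super_contig(contig_numbers):
    """Return (name, members) for a set of merged contigs.

    The super-contig takes the smallest contig number among its members.
    """
    members = sorted(set(contig_numbers))
    if not members:
        raise ValueError("at least one contig is required")
    return f"contig {members[0]}", members


# Merged sets taken from Table 3.3.
name_a, _ = name_super_contig([332, 767])
name_b, _ = name_super_contig([2354, 6118, 1052, 1968, 783])
print(name_a, name_b)
```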

Currently, eight super-contigs have been produced (contigs 783, 332, 315, 2169, 1269, 684, 155 and 172; see Table 3.3 for the merged contigs). Super-contigs 783, 332, 315 and 2169, and the region around the marker OMY11INRA, now have minimum tiling pathways built and are ready for sequencing analysis (see section 3.3). The minimum tiling pathways for super-contigs 1269, 684, 155 and 172 and for contig 807 are not yet completed. In addition, super-contigs 684, 155 and 172 have not been extended. Super-contig 1269 could not be extended any further (see Table 3.2): the probes S0422B20_T7, S0422B20_SP6 and S0105H17_T7 gave hybridization signals that were too complex to interpret, which could be due to repetitive regions on the chromosome, and S0198G07_T7 identified only a singleton. Similarly, contig 807 cannot be extended further (Table 3.2), because S0111L24_SP6 gave hybridization signals that were too complex to interpret and S0088G07_T7 identified repetitive regions.

Table 3.3 Contigs and clones that were joined to form super-contigs, with their corresponding microsatellite markers.

Starting Contigs or Clones | Corresponding Markers | Merged Contigs or Clones | Name of the Super-Contig
ctg1052 | BHMS150 | ctg2354, ctg6118, ctg1052, ctg1968, ctg783 | ctg783
ctg332 | BHMS7.029 | ctg2178, ctg332, ctg767 | ctg332
ctg1668 | Ssa0219BSFU | ctg1668, ctg315 | ctg315
S0068C06 | Ssa406UoS | ctg3961, ctg4476, ctg2169 | ctg2169
ctg1269 | CL7880 | ctg1269, ctg1469 | ctg1269
ctg684 | OmyFGT8TUF | ctg684, ctg2322 | ctg684
ctg3725 | Ssa0114ECIG | ctg3725, ctg155 | ctg155
ctg722 | BHMS216 | ctg172, ctg722 | ctg172
S0247F07 | OMY11INRA | S0091C24, S0850H02, S0896B21 | No contig is found yet
S0208D14 | OMY11INRA | S0243124, S0927D23 | No contig is found yet

3.3 Minimum Tiling Pathways

In order to sequence the region of the genome represented by a contig, a minimum tiling pathway must be constructed. This is the minimum number of BACs necessary to cover the entire contig, and thus the corresponding genomic region. Currently, four of the eight super-contigs have minimum tiling pathways constructed (super-contigs 783, 332, 315 and 2169) and are ready for sequencing and analysis. Table 3.4 shows a summary of all the minimum tiling pathways constructed.
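As an illustration of what "the minimum number of BACs necessary to cover the entire contig" means, the following sketch applies the standard greedy interval-cover approach to clones with known positions. It is not the procedure used in this work, where overlaps between BACs were established experimentally with BAC-end primers; the clone names and coordinates below are hypothetical.

```python
def minimum_tiling_path(clones):
    """Pick the fewest clones that cover a contig from end to end.

    clones maps a clone name to (start, end) in arbitrary map units.
    Greedy rule: among clones starting at or before the covered position,
    take the one that reaches furthest, then repeat.
    """
    if not clones:
        return []
    ordered = sorted(clones.items(), key=lambda item: item[1][0])
    contig_end = max(end for _, end in clones.values())
    covered_to = ordered[0][1][0]  # leftmost start on the contig
    path = []
    i = 0
    while covered_to < contig_end:
        best_name, best_end = None, covered_to
        while i < len(ordered) and ordered[i][1][0] <= covered_to:
            name, (_, end) = ordered[i]
            if end > best_end:
                best_name, best_end = name, end
            i += 1
        if best_name is None:
            raise ValueError("gap in contig: no clone extends the coverage")
        path.append(best_name)
        covered_to = best_end
    return path


# Hypothetical clone positions along one contig.
clones = {
    "A": (0, 150), "B": (50, 200), "C": (100, 320),
    "D": (180, 330), "E": (300, 450), "F": (310, 400),
}
path = minimum_tiling_path(clones)
print(path)
```

With these positions, three of the six clones are enough to span the contig.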

Table 3.4 A summary of all the minimum tiling pathways constructed.

Super-contig | No. of BACs in Minimum Tiling Pathway | Coverage | Markers
783 | 31 | ~6 Mb | BHMS150, Ssa208
332 | 15 | ~2 Mb | OMM1278, BHMS7.029/1A
315 | 5 | ~900 Kb | Ssa0219BSFU
2169 | 6 | ~1.1 Mb | Ssa15SSFU, Ssa0247BSFU, Ssa406UoS
OMY11INRA | 9 | ~1.1 Mb | OMY11INRA

3.3.1 Super-Contig 783

Super-contig 783 is currently the largest one constructed, covering a total length of approximately 6 Mb. The starting point of this super-contig was contig 1052, with the microsatellite marker BHMS150 positioned on clone S0397C07, which is located at the end of contig 1052. A probe was designed from the SP6 end of S0397C07 for hybridization, but the hybridization result was quite complex, containing a total of 182 positive clones that corresponded to 46 contigs. Further analysis showed that most of the clones came from five contigs (contig 71, contig 1127, contig 1708, contig 1968 and contig 2172), so two clones from each of the five contigs were chosen for PCR analysis. After testing all the candidate clones with PCR, I found that the positive clones (S0036K17, S0446M18, S0130K18, S0114H06 and S0256122) belong to contig 1968 only. Hence contig 1052 and contig 1968 were merged. S0363E24_SP6 in contig 1968 was used for extension, and it identified three clones (S0247M17, S0854L05 and S0858H02) that belong to contig 783. From contig 783, S0065E16_T7 was used to probe the filters, but it hybridized to only one singleton, S0847E16. Currently, S0847E16 has no sequences from either the T7 or SP6 end, so the extension cannot continue until these end sequences are obtained.

At the other end of contig 1052, the S0066L24_SP6 sequence was used for contig extension. Clones from two contigs (contig 6118 and contig 2354) were positive. PCR analysis showed that the BAC belonging to contig 2354 (S0015I08) gave a faint PCR product, whereas the clone belonging to contig 6118 (S0946D20) gave a strong band. Therefore, I predicted that S0946D20 (contig 6118) was situated between contig 1052 and contig 2354. To test this, both S0015I08 and S0066L24 were analyzed by PCR using S0946D20_T7 and S0946D20_SP6. S0946D20_T7 amplified S0066L24 (contig 1052), whereas S0946D20_SP6 amplified S0015I08 (contig 2354), indicating that contigs 1052, 6118 and 2354 are connected.
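The inference made here, that a clone whose two ends amplify BACs from two different contigs lies between those contigs, can be sketched as follows. The function and argument names are illustrative assumptions; the clone and contig assignments in the example are those given in the paragraph above.

```python
def bridged_contigs(clone_contig, end_hits, bac_to_contig):
    """Return [left contig, clone's contig, right contig] if the clone bridges.

    end_hits maps "T7" and "SP6" to the BACs amplified by primers from
    that end of the clone. The clone is called a bridge only when each
    end points to exactly one other contig and the two contigs differ.
    """
    t7_contigs = {bac_to_contig[bac] for bac in end_hits["T7"]} - {clone_contig}
    sp6_contigs = {bac_to_contig[bac] for bac in end_hits["SP6"]} - {clone_contig}
    if len(t7_contigs) == 1 and len(sp6_contigs) == 1 and t7_contigs != sp6_contigs:
        return [next(iter(t7_contigs)), clone_contig, next(iter(sp6_contigs))]
    return None


# S0946D20 (contig 6118): T7 end amplified S0066L24, SP6 end amplified S0015I08.
bac_to_contig = {"S0066L24": 1052, "S0015I08": 2354}
end_hits = {"T7": ["S0066L24"], "SP6": ["S0015I08"]}
chain = bridged_contigs(6118, end_hits, bac_to_contig)
print(chain)
```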

In addition, contig 2354 contains another microsatellite marker, Ssa208, and so by placing both BHMS150 and Ssa208 on the physical map corresponding to Atlantic salmon LG 1, I was able to determine the true orientation of this super-contig: the end of contig 2354 (S0057M11) points toward the sex-determining locus and thus the telomeric region. When the S0057M11_T7 probe was used for hybridization, BACs from two contigs (contig 374 and contig 567) were identified. This could be due to duplicated regions or overlap. To see which contig correctly follows contig 2354, PCR products (amplified by the S0057M11_T7 probe/reverse primer) from clones S0024C15 (contig 567) and S0039D06 (contig 374) were sequenced. The two PCR products contained completely different sequences, indicating that the S0057M11_T7 probe/reverse primer had amplified two different portions of the Atlantic salmon genome. When the two sequences were checked with the RepeatMasker program, both turned out to be repetitive sequences. Therefore, the S0057M11_T7 probe identified a repetitive region in the Atlantic salmon genome, and the extension could not continue in this direction. In summary, this super-contig currently contains five smaller contigs (contig 2354, contig 6118, contig 1052, contig 1968 and contig 783), and so it was renamed contig 783.

The minimum tiling pathway of this super-contig was also built, and the primers used are listed in Table 3.5. There are 31 BACs in the minimum tiling pathway (Figure 3.7), but the BAC ends from only 26 clones were used to design primers for constructing it (Table 3.5) because the end sequences from the other five BACs were not available. Within the minimum tiling pathway, 20 BACs were subjected to sequencing analysis at the Baylor College of Medicine Human Genome Sequencing Center (see section 3.4).

As mentioned in section 3.1.2, both contig 1185 and contig 846 were discovered when probing with the SNP probe Ssa0182ECIG, and the two contigs were found to overlap with one another. Further analysis has shown that contig 1185 also contains another microsatellite marker, Ssa0337BSFU, which has been incorporated into Atlantic salmon LG 1 at a distance of 18.4 cM from the centromere (Figure 3.1). Since super-contig 783 corresponds to a region 15.2 cM to 23.9 cM from the centromere (Figure 3.1), Ssa0337BSFU is predicted to lie within super-contig 783, but this marker was not detected in earlier studies when the minimum tiling pathway of super-contig 783 was constructed. A possible explanation is that contig 1185 (and also contig 846) overlaps with super-contig 783. PCR has shown that primers designed from Ssa0337BSFU could amplify a few clones in super-contig 783, such as S0036K17, S0446M18, S0130K08 and S0114H06. The PCR products from S0036K17, S0446M18 and a BAC from contig 1185 (S0023B04, the BAC where Ssa0337BSFU was identified) were sequenced and compared, and the sequences were identical except for a few nucleotide differences within the microsatellite itself. This indicates that contig 1185, contig 846 and super-contig 783 are very likely to overlap. However, due to the complexity of this region, it might be necessary to carry out fluorescent in situ hybridization (FISH) analysis to confirm whether these contigs do in fact overlap.

Figure 3.7 Super-contig 783. This contig covers about 6 Mb and contains five smaller contigs: contig 2354, contig 6118, contig 1052, contig 1968 and contig 783 (diagram not to scale).

Table 3.5 BACs selected for building the minimum tiling pathway in super-contig 783. F- Forward Primer. R- Reverse Primer.

BAC end | Primers | Positive BACs
 | Probe-CTGCTCAATCTTGTGGATAAGGATAGTGATTTGGTGCCTG / R-CTCTGCCTTCATCATACAGC | S0847E16
 | Probe-TTTTCTGTCTTTGGGGAGTATGGGTGAGGGTAGGCTCGGG / R-ATTAATATGCCTCGCCTGTG | S0174K16
 | Probe-ACTCGGAGGACCAACAACCTCCACATACAGACCAACAAAC / R-TCACGAGCGCACAAATGTAG | S0065E16
 | Probe-TGCTGTGGCTCTGACCAGTCAGCATTGACATCAGTACACC / R-GACCGCTGTTTGACGAGTGC | S0048N06
 | F-GATACGCGTGTGACCCAGAC / R-ACCCGGTATCATAGCAGATG | S0359P08
 | F-AGCGTTAGATTGTCCGTTCG / R-CTGGTGTCAGTCAACATTGG | S0048N06
 | F-GATCCCCAATTAGAGACAAC / R-GGAAAGCTGCTGAAAGAAGC | S0051J09
 | F-AAGACCCAGTCACTCAAAGC / R-CTGAAAAGCAGCAGCCTCTG | S0359P08
 | F-TTGAGCTCTTACCACCTGAG / R-TCCCGTACGTGTGCGAACAG | S0086A16
 | F-CCTGCCCTAGCCTACATTAC / R-ACCAACTGTGTTTCTATGAG | S0213A09
 | F-GAAGTGGCCAGAGTTGAAAC / R-TGTGATTACATTGTGCTTGC | S0025G22
S0025G22_T7 | F-TAGACGGTCCACTTTTCCTG / R-GTACTGGTCTCACCTCACTG | S0242I21
 | F-GAGAGTACTGACCCTGACAC / R-CCAGCTTCTATCTTTCTTGC | S0086A16
S0242I21_T7 | Probe-CATGTAGTTCTGCCAATAGTCTGCTTTGCAGGCTGTGTGG / R-GTCCTACACCAAATCCAGTG | S0025G22
S0242I21_SP6 | Probe-TTAAAAGCTCAGGGTAGTGAGCTGATATTGGCCAGCCAGC / R-CATCATTTGCATTACAGTGC | S0247M17
S0363E24_T7 | Probe-GGAGGATTTTCCACTTCAGAGTGTCTTTTATCCTGCTTTC / R-AGCCCCATTGTTCACTCACC | S0247M17
S0363E24_SP6 | Probe-AGCCCTGGAATGACCTAATGCCCAGCTTCTACCATGGCAG / R-GTAGCTATGCCGTATGTATC | S0501D07
S0501D07_T7 | Probe-AGGCAGACAGTAGAATAGGCATCAGACCTGTAAGAGAGAC / R-GAGACAAGACCGAGACACTC | S0045K22
S0045K22_T7 | Probe-TTCTCTGCTTCCTCCTATGCTTGGCACTACATCGCAGCGT / R-TGGTTAAGTGGCTCTGTGAC | S0036K17
S0045K22_SP6 | F-AGGCTGACCAATGTAGATAG / R-TTCGTTAGTTATGCTCCAAC | S0501D07
S0036K17_T7 | Probe-TACCATAGCCGCTTCCTTTTCAAACGTAATCCCTTATGTC / R-CTAGCCAGCTTCTCTGTGAC | S0045K22
S0397C07_T7 | Probe-AGTGCTTGATCTGGGCAACATGTTCAGTATTCATCCTTCA / R-TCCCTGCATATTCCCTTCAG | S0185D11
S0397C07_SP6 | Probe-TCCTTTCAGCCCTCTCTACCTCACTCCTCCATCCTTCCCC / R-CTCTCATCCATCCCTTCTCC | S0036K17
S0185D11_T7 | Probe-GGATGGGATGGGGTCCTTACTTCCTTTTCAACAGTGCTGC / R-CAGCAACCGATGTGTCCAAG | S0397C07
S0185D11_SP6 | Probe-TTAGCCCATGAACTTAGCACTTAACTTGATTGTGGTTTGA / R-GGCACCCCGAAAAGACTCAG | S0086K22
S0086K22_T7 | Probe-CCTAGTGGGTTGTGACTGGAGGGAGTACAGCTAAGACCGG / R-AGGCTCCTCCCCTGAAATAC | S0126A12
S0086K22_SP6 | Probe-TGATCTCAAAGCAGTAGACGGAGCTTGCTTCGCTAACAAC / R-TGTATTTTTGGGAGCAGAGC | S0185D11
S0050K08_T7 | Probe-CGTTTACTCTCCCCTGACACAAATCAGACTTGTATTGTTC / R-GACGGTGAGGATGATGATGG | S0126A12
S0050K08_SP6 | Probe-TGTGTAATGACTTGGTGCTGCCAATCTCTTGAAAAAGAGA / R-GGGCCTATAGTAACCTACGC | S0066L24
S0066L24_T7 | Probe-CCAGCTAGCCAGAGAAACTCACATCAGGACACTAATACGG / R-TAAGAATCATCCAGCAGACC | S0050K08
S0066L24_SP6 | Probe-GCAGTCAGTCTCACCCTAACCCCGGACAGAAAAGGAAACG / R-CCACCATCACTATGCCAACG | S0946D20
S0946D20_T7 | F-ACAATCTGCCCCTTGACTAC / R-GACAATAGCAGCCCCTTTAC | S0066L24
S0946D20_SP6 | F-CCCATTTAGTGTCCAAGGTG / R-TCTGTCTCTGTCCGTGACTG | S0015I08
S0015I08_T7 | Probe-AAGCCCCTTCACCTCGCTCACCAGCTCCTTGTTGTCACTC / R-CACCTGAGAAAATAGACCTG | S0946D20
S0015I08_SP6 | Probe-TCCAGATCACAGCACCTCTACACTGTGTGATGGTAAAAGG / R-AACCAAGCAACTTTGTTTCG | S0011013
S0011013_T7 | Probe-CACCTTACATGACCTGGTAGTTGACATGGCAGTGGCTGTT / R-ACATAATGTTGGATGGTTCC | S0070C22
S0011013_SP6 | Probe-GGCCCTGCACAGTGACAAACAAACCAATCATTTTAAAAGT / R-TGAGAGCAGAGCGGTGAGAG | S0015I08
S0070C22_T7 | Probe-CGCAATGTTAAAATACCACCTTTGCCACCCAACTGAACAT / R-TCAACAGCAGAGAAACCTTG | S0065H11
S0065H11_T7 | Probe-TAAACTTAAGTCCAATAACGTTTGGTCGAGAAAAGTAAGG / R-CTGTCTCTCACTGTTCAAAT | S0076C15
S0076C15_T7 | Probe-CAACTAATTACCTCCAGATGACTTTCCGGTAAACAAAATG / R-TACACACGTCACATTTCAAC | S0074L06
S0076C15_SP6 | Probe-GCTGTGGAATTTTGGATTTGATTTAAATCAGCCCCACACC / R-TGCTCACGTTGCTGGTGGTC | S0065H11
S0074L06_T7 | F-GGACTTGTCGAAAGGTTCTA / R-CTTGTCTAATGCCCCGTCTC | S0076C15
S0074L06_SP6 | F-CTCGGGCTGTCCTGTGATTC / R-CAGCCTAGCTCGCCTGAGTC | S0057M11
S0057M11_T7 | Probe-AGTAAAATAAACGTACGCGTGTGTGCCTCAGTCACCAGAT / R-TTAGTAGGGCATTTGTCCAC | S0468F14

3.3.2 Super-Contig 332

Contig 332 was identified as containing the marker BHMS7.029. Initially, BHMS7.029 hybridized to BACs belonging to two contigs (contig 332 and contig 767). Careful analysis of the hybridization-positive clones showed that all of these clones were located at the end of each contig, suggesting that BHMS7.029 actually spanned the two contigs. By designing primers from clones located at the ends of both contigs, such as S0116022_SP6 (contig 767) and S0068A04_SP6 (contig 332), and carrying out PCR analysis, I confirmed that these two contigs were connected. I did not carry on with the extension from the end containing S0159B18 in contig 767, because all of the BAC-end sequences currently available contain repetitive sequences and so hybridization probes could not be designed. It would be very helpful if other BACs from this region were end-sequenced to determine whether repeat-free sequences are available for probe design.

After merging contigs 332 and 767, the end containing S0057D16 in contig 332 was used to continue walking out. A probe from S0057D16_T7 was used for this purpose, and it hybridized to BACs that belong to contig 2178, a rather small contig. The extension continued using a probe from S0126017_T7, located at the end of contig 2178, but it hybridized to six contigs (contig 30, contig 771, contig 1049, contig 1129, contig 2505 and contig 5498), indicating that it had encountered a repetitive region, and thus the extension could not continue in this direction. Overall, super-contig 332 comprises three contigs: contig 2178, contig 332 and contig 767. The minimum tiling pathway of this super-contig was also built, consisting of 15 BACs (Figure 3.8) and covering approximately 2 Mb. All the primers used for building the minimum tiling pathway are listed in Table 3.6, and the BACs from the minimum tiling pathway are ready for sequencing and annotation studies.

Figure 3.8 Super-contig 332. This contig covers about 2 Mb and contains three smaller contigs: contig 2178, contig 332 and contig 767 (diagram not to scale).

[Figure 3.8 diagram: BACs of contig 2178, contig 332 and contig 767 with their T7 and SP6 ends; markers OMM1278 and BHMS7.029/1A; circles mark positive hits.]

Table 3.6 BACs selected for building the minimum tiling pathway in super-contig 332. F- Forward Primer. R- Reverse Primer.

BAC end | Primers | Positive BACs
 | Probe-GCTTACACGACCTTTCACCTCCCTGTTCCCTAGTTTGACC / R-GTGGGGTATGGCTATTTGTC | 
 | Probe-AACGTGTGTAGAAAGTAAACATCAGCACCTCGATGCTGCC / R-CTATGTTGTCATCTCAGGTC | S0140D01
 | Probe-AATCACACAGAAGAGGAGTGAGTCCGTCTGCTCAGACGAG / R-GCGTTATTGATGGCGTCCAC | S0144N21
S0140D01_SP6 | Probe-CCACGTTGAAAGTCACTAAGCTCTTCAGTACGGGCCATTC / R-AATCAAACCCTGAGTTCCTG | S0159B18
S0144N21_T7 | F-AGGGAGACGTGTTGAGACTG / R-GATGTTTGCAGGAACCACTG | S0092009
 | F-GAGGTTGCTGTTTCCTGTTG / R-TTCAAGTCTCATAGCGAAGG | S0140D01
S0092009_T7 | F-AGCAGAAAGGCAAGGTTGAG / R-TCGGTCAGGATTGGGCTATG | S0116022
S0092009_SP6 | F-GTCTGTTGAAGCACCTTTGG / R-ATCACAGAGACCTCATTATC | S0144N21
S0116022_T7 | Probe-GCCAGATGCTCTCGGGGTATGCGAGAACACGTGGGCAGAC / R-GCCGCTGCCACTCTAACCTG | S0092009
 | Probe-TGCGGGGATCATCAACTACACTGAACAAAAATAAAAACGC / R-GTGGGGAAAAACTCTTGCTG | S0068A04
S0068A04_T7 | Probe-AGGAATTCACAACGGAGACGGAGAAAGCCAACTCAGATGG / R-ACCGCTGAGCCCTCTGACAG | S0110L18
S0068A04_SP6 | Probe-AGACTGATCGAAGGTTTCTGCTGATTAATTAAGGCCATCG / R-TCGACAATGTCCTGGCATCC | S0116022
 | F-GAGAAAACATTGAGGTGAGG / R-CGAAGTGAGCTGATAGTGAC | S0110L18
S0119C05_SP6 | F-TGTGGACTCGACTTCTCATC / R-TTCCAAAGGTGGGGGTAGAC | S0093C11
S0093C11_T7 | F-CAGTAGAGTGGCAACTTACC / R-CCTTGGTCTCGTGCGAATAC | S0119C05
S0093C11_SP6 | F-CAGATGGCAGGCAGTAAATG / R-TCCAGTTCTGGGTAGGTATC | S0116F11
S0116F11_T7 | F-CTACCAGAGCAATGACTCAG / R-TCTCGCCTCACTGTTAGAAG | S0187004
S0116F11_SP6 | F-CTTCTCCCTTCATTATCGTC / R-GTCCAGGGCTACAACAAACG | S0093C11
S0094K11_T7 | Probe-TTGACTTTTCGTAGCAGGTTAGGAGAACTTATGCAGCACC / R-CCTAGCGGACCCTCCTAAGC | S0187004
S0094K11_SP6 | Probe-AGACTGATCGAAGGTTTCTGCTGATTAATTAAGGCCATCG / R-TCGACAATGTCCTGGCATCC | S0057D16
S0057D16_T7 | Probe-AGTGACGGAGTTTGTTTTGCGAACAATGATCAGAAAATGC / R-CGGCTTAATACAGTAACCAG | S0272N03
S0057D16_SP6 | Probe-TCAGTCTCCTTTGTGATTCATGTTATCACCTCTTTCCCAG / R-AGCCATACACCATCTTTCTG | S0094K11
S0272N03_T7 | F-TTCCCGAGCTTCCATCATAG / R-CCGTAATTGGCGAAATAGAC | S0057D16
S0126017_T7 | Probe-CGCTGGCTCGATGCTCTATCTGGTTCTTTTATGGCGG / R-GAGATGAACTTACTTTGGTG | 
S0126017_SP6 | Probe-ATAGCCTGTGGCAGCCTGAGCCTCCAGCAGTTCAAAGGC / R-CCAACAGACGGTCACTGAGC | S0272N03

3.3.3 Super-Contig 315

The starting point of super-contig 315 was contig 1668, with the corresponding microsatellite marker Ssa0219BSFU located on clone S0023K12. Since S0023K12 was also located at the end of contig 1668, the T7 end of this BAC was used for hybridization. This end hybridized to BACs belonging to two contigs (contig 315 and contig 627), but PCR confirmed that only contig 315 was truly connected with contig 1668. The extension from the other end of contig 1668 was also carried out, using the S0099D15_SP6 probe. Unfortunately, the hybridization result was too complex to interpret, which indicates a repetitive region in the genome, and hence the contig cannot be extended further from S0099D15. As for contig 315, the S0135I14_SP6 probe was used for the extension, but the hybridization showed no positive BACs. Therefore, the extension cannot continue in this direction either. Altogether, this super-contig is a small one, covering about 900 Kb and comprising only five BACs in its minimum tiling pathway (Figure 3.9). Table 3.7 gives a summary of all the primers used for building the minimum tiling pathway of super-contig 315.

Figure 3.9 Super-contig 315. This contig covers about 900 Kb and contains two smaller contigs: contig 1668 and contig 315 (diagram not to scale).

[Figure 3.9 diagram: BACs of contig 1668 and contig 315 with their T7 and SP6 ends; marker Ssa0219BSFU.]

Table 3.7 BACs selected for building the minimum tiling pathway in super-contig 315. F- Forward Primer. R- Reverse Primer.

BAC end | Primers | Positive BACs
 | Probe-TCGTGCGTTCAAGACACTATCCGAAGTGTAAATAAGGGTC / R-TCCCATGGGCTCTCAGAAGG | S0038B07
 | Probe-GGCTAAGGTGTATGTAAACTTCCAACTTCAACTGTAGGTG / R-GGACTTCCTCCCTGTATCAC | 
 | Probe-GGCAGCTGAGTCTTTTTGGTAAAAACACCTGGATTGTAC / R-CTTTCAGCAGGACAATAACC | S0135I14
S0038B07_SP6 | Probe-TATCTAGCGTTCACCACCAAGCTTTAGCTAACGGTAGCTG / R-TCGACCCTGCGTGTCATCAG | S0023K12
 | Probe-AGCTATACCACGGCTCTTCAGTGATACGATAGAATCGAGC / R-CTGGATGTCTAAGGGCAATG | S0038B07
 | Probe-AAGCGTTACGTCTGGAGGAAACCTGGCACCATCCCTACGG / R-TGTGCTTAGGGTCGTTGTCC | S0075E09
S0075E09_T7 | Probe-GTCGAACTCAAGGAAATCGTCAGTGAGATCTACCGCATAG / R-GATCATTGAGCCTCTGTATC | S0023K12
S0075E09_SP6 | Probe-TGGAACGTGTTTCAAAGGACGACTTCGCCATTCCCTGTAC / R-GTCTGCAACAATGGTGTGAC | S0099D15
 | Probe-TTCAGCGTTTTCCCTCTGGATATTTCTAGCTTTAGGATCG / R-TGCACAAATCACTGCTCCTC | S0075E09
S0099D15_SP6 | Probe-ACGTAGAACACCAAATAATGCATGCAGAGGAGAATTAGGC / R-CTCTTGAGAGCCAGGTCTGC | 

3.3.4 Super-Contig 2169

The starting point of super-contig 2169 was clone S0068C06, a singleton found by Artieri et al. (2006) using the microsatellite marker Ssa406UoS. Initially I wanted to see whether it was possible to incorporate S0068C06 into a contig. The hybridization result using the S0068C06_T7 probe was too complex to interpret, so the extension could not continue in this direction. The S0068C06_SP6 probe identified clones that belong to contig 3961 and contig 4476. Since both of these contigs were small (contig 3961 contains only five BACs and contig 4476 only four), it was possible that they actually overlapped with each other in the genome. PCR showed that the S0068C06_SP6 probe/reverse primer could amplify all the clones from both contig 3961 and contig 4476; hence the two contigs do overlap. At the same time, the microsatellite marker Ssa0247BSFU from clone S0030E15 (contig 2169) was used to check the clones in these contigs by PCR, because this marker was believed to be situated somewhere in this region. Ssa0247BSFU amplified BACs from all three contigs (contig 2169, contig 3961 and contig 4476), which strengthened the likelihood that these contigs are connected. The BAC ends of S0161E01 from contig 3961 were used to design primers to check this possibility: the S0161E01_SP6 primers amplified clones from contig 3961, and the S0161E01_T7 primers amplified clones from contig 2169. This confirms that contig 2169, contig 3961 and the singleton S0068C06 are connected. In summary, super-contig 2169 consists of three contigs (contig 3961, contig 4476 and contig 2169) plus the singleton S0068C06, and covers a total length of approximately 1.1 Mb. The minimum tiling pathway of this super-contig contains six BACs (Figure 3.10), and the BACs used for building it are listed in Table 3.8.

Figure 3.10 Super-contig 2169. This contig covers about 1.1 Mb and contains three smaller contigs: contig 3961, contig 4476 and contig 2169 (diagram not to scale). Contig 4476 is not shown because it overlapped with contig 3961.

[Figure 3.10 diagram: BACs of contig 2169, contig 3961 and the singleton S0068C06 with their T7 and SP6 ends; markers Ssa15SSFU, Ssa0247BSFU and Ssa406UoS.]

Table 3.8 BACs selected for building the minimum tiling pathway in super-contig 2169. F- Forward Primer. R- Reverse Primer.

BAC end | Primers | Positive BACs
 | Probe-CATTATTTTGGTCTTACTGGGATTTACTGTCAGGGCCCAG / R-CACCCTACTAGAATCTGAAG | 
 | Probe-AGATGAGCGATGACAAAGGAAGAGAGAGGATAAGATGGGG / R-GGTGAACAGTATTGGAGCAC | S0161E01
S0161E01_T7 | Probe-CAATACCCACCTTCTCCATCTCATTTCACTCCAGTGTCTC / R-GGCTTCACAAGACTATGCTG | S0030E15
 | F-CACCCCCTTTGCTATGAAGC / R-CCCTGACCTGTTTGGAGAGC | S0357E14
 | F-CGACCGCATTATCTACCAAG / R-ACAATGATCCAGGGTTTTGC | S0030E15
 | Probe-TACGCTTGCAGTTTACTTTCAGTGTGGCGTTATCAAGCCC / R-ATCCTAGGCTCACCATAGAG | S0864C10, S0905I04
 | Probe-GCACTTTAGTGGTCCAGTCGGACCAGACCGGCATTATCTC / R-TTGCACATGGGGGTTCCTAT | S0159F09

3.3.5 OMY11INRA

Currently, OMY11INRA is not incorporated into the merged female Atlantic salmon LG 1 because this marker was informative only in the male parent, and so it is placed only on the male LG 1. The only way to incorporate this marker into the merged female Atlantic salmon LG 1 is to place it in a contig that is already associated with the female LG 1. The probe designed from the sequence containing OMY11INRA identified two singletons, S0247F07 and S0208D14. Thus, the extension of BACs and the search for a known contig associated with the female LG 1 began here. Probes designed from S0247F07_T7 and S0208D14_SP6 were used to screen the BAC library so that the extension could be carried out in both directions. S0247F07_T7 hybridized to three singletons, S0091C24, S0850H02 and S0896B21. Since none of these BACs has been end-sequenced, extension will not be possible in this direction until the end sequences for at least one of these clones are available for probe design. The S0208D14_SP6 probe identified two singletons, S0243124 and S0927D23. The extension continued using a probe designed from S0243124_SP6, and it hybridized to two more singletons, S0137A06 and S0919I02. PCR showed that these two clones seem to contain a large region of repetitive elements, since multiple PCR bands occurred. When the S0243124_SP6 probe/reverse primer was tested on Atlantic salmon genomic DNA by PCR, multiple PCR bands also occurred, indicating that the S0243124_SP6 probe identified a repetitive region in the Atlantic salmon genome; thus chromosome walking could not continue in this direction. Therefore, the effort to incorporate this region into a contig, and eventually into the Atlantic salmon LG 1, continues. Currently, this region consists of nine BACs (see Table 3.9 for all the BACs used to make the minimum tiling pathway, and Figure 3.11).

Figure 3.11 The region around the microsatellite marker OMY11INRA. This marker has not so far been incorporated into either Atlantic salmon LG 1 or any contig (diagram not to scale).

[Figure 3.11 diagram: BACs 247F07 and 208D14 carrying OMY11INRA, with the singletons 91C24, 850H02, 896B21, 243124, 927D23, 137A06 and 919I02 placed at their T7 and SP6 ends.]

Table 3.9 BACs selected for building the minimum tiling pathway around microsatellite marker OMY11INRA. F- Forward Primer. R- Reverse Primer.

BAC end | Primers | Positive BACs
 | Probe-CAGTGTCTTAAACAAAATAGGACCACTTGAGACTGGTTAC / R-CACTTACATGGTGGAAAACC | S0208D14, S0247F07
 | Probe-ACATGAATTAGCCGTCAGACCACGCATTCTAACATAGCTC / R-CCTGTATTCTCAAGACCAAG | S0137A06, S0919I02
 | Probe-GGTAATTGTTTCTGTCATTGTGTTTCACCTGACAGGACTG / R-CCCTCATAGCCTGGTTCCTC | S0247F07
 | Probe-CTCGGAATTAAGTCGGATGGTGACCCAAATTCCATGTCGG / R-GACCAGCCAATCAAATCTC | S0243124, S0927D23
S0247F07_T7 | Probe-GTGGTGCGAGACAGTGTGATGATCAGTGCTTTGCTCCTTC / R-GTTTTATCTCCCAGCCAGAC | S0091C24, S0850H02, S0896B21
S0247F07_SP6 | Probe-ACGATGCACGTCCTTACCAGACAGGATGCCACCAAAGGGC / R-ATTCGGGGTTGTCCCTTTGC | 

3.4 Assembly and Annotation of BAC Sequences

3.4.1 Sequences Annotation

The minimum tiling pathway of super-contig 783 consists of 31 BACs. Twenty of these BACs (which cover approximately 4 Mb) were chosen for sequencing by the Baylor College of Medicine Human Genome Sequencing Center. Six of them (S0247M17, S0501D07, S0185D11, S0050K08, S0057M11 and S0468F14) did not yield results due to sequencing failure. In addition, six BACs that made up the minimum tiling pathway of contig 818 (S0120P04, S0132J08, S0343H24, S0206M03, S0503014 and S0539C02) and three BACs that made up the minimum tiling pathway of contig 2705 (S0180H04, S0429J16 and S0048H24) were also sequenced. Since contig 2705 resides very close to the sex-determining region (Figure 3.1), the three BACs that make up its minimum tiling pathway are a valuable source for identifying candidates for the sex-determining gene. The shotgun library of S0132J08 (contig 818) was constructed at Simon Fraser University, British Columbia, and was sent to the University of Victoria, Victoria, British Columbia for sequencing. The other eight BACs were sequenced by the Baylor College of Medicine Human Genome Sequencing Center.

Once all of the BAC sequences were obtained, they were combined into a continuous sequence where possible. Because six BACs in super-contig 783 did not yield sequencing results, the sequence was interrupted, and three continuous sequences and three single BAC sequences were obtained; each continuous sequence was given a scaffold name (Figure 3.12). The first continuous sequence contains seven BACs (S0946D20, S0015I08, S0011013, S0069107, S0065H11, S0076C15 and S0074L06) and is named "scaffold 2354". The second continuous sequence consists of two BACs (S0126A12 and S0086K22) and is named "scaffold 1052". The third continuous sequence consists of two BACs (S0036K17 and S0045K22) and is named "scaffold 1968". The three single BAC sequences that do not belong to any scaffold in super-contig 783 are S0066L24, S0397C07 and S0363E24. S0066L24 does not have a complete sequence and therefore could not be incorporated into scaffold 2354, whereas S0397C07 has a complete sequence but could not be incorporated into scaffold 1968.

As mentioned in section 3.3.1, clones (including S0036K17) from contig 1968 were found by chromosome walking using a probe designed from the SP6 end of S0397C07 (from contig 1052), and therefore contig 1968 was merged into super-contig 783. However, the sequence of S0397C07 could not be assembled together with the sequence of S0036K17 to become part of scaffold 1968. One possible explanation is that the S0397C07_SP6 probe contained a repetitive element that was not identified by RepeatMasker; if S0036K17 also contains this repetitive element, the S0397C07_SP6 probe would hybridize to S0036K17. In other words, S0397C07 and S0036K17 might share a very small portion of the same repetitive element, while the other regions of the two BACs contain completely different sequences. This would mean that contig 1052 and contig 1968 were misjoined, and might also explain why S0397C07 and S0036K17 could not be assembled together. Since the S0397C07_SP6 hybridization gave a total of 182 positive clones corresponding to 46 contigs (section 3.3.1), we cannot ignore this possibility, and this further emphasizes the complex nature of the Atlantic salmon genome.
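The way missing BAC sequences break a tiling pathway into scaffolds and single BACs can be sketched as follows. This is a simplified illustration only: the real assembly also depended on whether each BAC sequence was complete, and the clone names and their order below are hypothetical.

```python
def split_into_runs(tiling_path, sequenced):
    """Split an ordered tiling path into runs of consecutive sequenced BACs.

    tiling_path is the ordered list of BACs; sequenced is the set of BACs
    that yielded usable sequence. A BAC without sequence ends the current run.
    """
    runs, current = [], []
    for bac in tiling_path:
        if bac in sequenced:
            current.append(bac)
        elif current:
            runs.append(current)
            current = []
    if current:
        runs.append(current)
    return runs


# Hypothetical ordered pathway; b4 and b6 failed to sequence.
path = ["b1", "b2", "b3", "b4", "b5", "b6", "b7", "b8"]
sequenced = {"b1", "b2", "b3", "b5", "b7", "b8"}
runs = split_into_runs(path, sequenced)
scaffolds = [run for run in runs if len(run) > 1]   # continuous sequences
single_bacs = [run[0] for run in runs if len(run) == 1]
print(scaffolds, single_bacs)
```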

In contig 818 (Figure 3.13), only S0343H24, S0206M03, S0503014 and S0539C02 have complete sequences, so only these have been combined into a continuous sequence, "scaffold 818". S0120P04 and S0132J08 could not be incorporated into scaffold 818 because neither of these two BACs has a complete sequence. For contig 2705 (Figure 3.14), none of the BACs was sequenced completely, so they could not be combined into a continuous sequence. However, they were grouped together under the name "scaffold 2705".

Figure 3.12 BAC sequences with their scaffolds in super-contig 783. BACs in green are the successful sequences, BACs in black are those that could not be incorporated into any scaffolds, and BACs in red are the failed sequences.

[Figure 3.12 diagram: BACs of super-contig 783 with their T7 and SP6 ends; marker BHMS150; scaffold 2354 indicated. Legend: successfully sequenced BACs; BACs that could not be incorporated into any scaffolds; failed BACs.]

Figure 3.13 BAC sequences with the scaffold in contig 818. BACs in green are the successful sequences, and BACs in black are those that could not be incorporated into any scaffolds.

[Figure 3.13 diagram: BACs 120P04, 539C02, 132J08, 503014 and 343H24 of contig 818 with their T7 and SP6 ends; marker labels Sal1UoG, Ssa202DU and Ssa55BSFU; scaffold 818 indicated. Legend: successfully sequenced BACs; BACs that could not be incorporated into any scaffolds.]

Figure 3.14 BAC sequences in contig 2705 (scaffold 2705).

[Figure 3.14 diagram: BACs 180H04, 429J16 and 48H24 of contig 2705; marker Ssa233BSFU.]

3.4.2 Genes Identified

Section 3.4.2.1 lists the genes identified in super-contig 783 (17 genes), and section 3.4.2.2 lists the genes identified in contig 818 (three genes) and contig 2705 (nine genes). More detailed BAC sequencing information is available on the website (http://grasp.mbb.sfu.ca/).

3.4.2.1 Super-contig 783

Figure 3.15 shows a diagram of the gene annotation for scaffold 2354 (S0946D20, S0015I08, S0011013, S0070C22, S0065H11, S0076C15 and S0074L06), which contains the following nine annotated genes: nonmuscle myosin heavy chain (accession number XP_683046); 40S ribosomal protein Sa-like protein (accession number AAT44424); solute carrier family 25 member 28 (accession number NP_001014905); neuropathy target esterase (accession number XP_001236055); patatin-like phospholipase domain containing 7 (accession number EDL08192); myosin heavy polypeptide 9 (myh9) (accession number XP_689928); muscle segment homeobox C (msh-C) (accession number NP_571347); SORCS receptor 1 (accession number XP_001512681); and SORCS receptor 3 (accession number XP_001512594). It should be noted that, although two myosin heavy chain genes were identified, further analysis showed that they are transcribed in opposite orientations on the BAC sequence. The sequence of the microsatellite marker Ssa208 was also identified in scaffold 2354 (on S0011013).

Scaffold 1052 (S0126A12 and S0086K22) (Figure 3.16) contains three annotated genes: growth factor receptor-bound protein 2 (accession number NP_998200); ubiquitin specific protease 43 (accession number XP_426827); and TBC1 domain family member 24 (accession number XP_001232297).

Scaffold 1968 (S0036K17 and S0045K22) (Figure 3.17) contains only one annotated gene, which is septin 9b (accession number NP_001014329).

As for the three single BAC sequences, S0066L24 (Figure 3.18) contains only one annotated gene: phosphoinositide-3-kinase, regulatory subunit 5 (accession number NP_001025868). S0397C07 (Figure 3.19) contains two annotated genes: ADP-ribosylation factor binding protein GGA3 (Golgi-localized, gamma ear-containing, ARF-binding protein 3) (accession number XP_540429), and hematological and neurological expressed 1 (HN1) (accession number NP_991176). The sequence of the microsatellite marker BHMS150 was also identified on S0397C07. S0363E24 (Figure 3.20) also contains only one annotated gene: olfactory receptor Olr836 (accession number XP_001515117).

Table 3.10 summarizes the gene content of super-contig 783, including all the scaffolds and single BACs.

Figure 3.15 Gene annotation of scaffold 2354 (super-contig 783).

[Figure: annotation tracks for scaffold 2354 (rps-blast alignment to the Conserved Domain Database, Genscan predictions, BLASTx alignment against the NCBI nr database, and BLAT alignment against the UVIC EST cluster database) with labelled genes: patatin-like phospholipase domain containing 7, nonmuscle myosin heavy chain, solute carrier family 25 member 28, SORCS receptor 1, neuropathy target esterase, myosin heavy polypeptide 9 (myh9), muscle segment homeobox C (msh-C), SORCS receptor 3, and 40S ribosomal protein Sa-like protein.]

Figure 3.16 Gene annotation of scaffold 1052 (super-contig 783).

[Figure: annotation tracks for scaffold 1052 (rps-blast alignment to the Conserved Domain Database, Genscan predictions, BLASTx alignment against the NCBI nr database, and BLAT alignment against the UVIC EST cluster database) with labelled features: growth factor receptor-bound protein 2, TBC1 domain family member 24, ubiquitin specific protease 43, a repetitive element, and an EST labelled "Partial Syntax 8".]

Figure 3.17 Gene annotation of scaffold 1968 (super-contig 783).

[Figure: annotation tracks for scaffold 1968 with one labelled gene: septin 9b.]

Figure 3.18 Gene annotation of S0066L24 (super-contig 783).

[Figure: annotation tracks for S0066L24 (rps-blast alignment to the Conserved Domain Database, Genscan predictions, BLASTx alignment against the NCBI nr database, and BLAT alignment against the UVIC EST cluster database) with labelled features: phosphoinositide-3-kinase, regulatory subunit 5, and partial netrin 1a.]

Figure 3.19 Gene annotation of S0397C07 (super-contig 783).

[Figure: annotation tracks for S0397C07 (rps-blast alignment to the Conserved Domain Database, Genscan predictions, BLASTx alignment against the NCBI nr database, and BLAT alignment against the UVIC EST cluster database) with labelled genes: ADP-ribosylation factor binding protein GGA3 (Golgi-localized, gamma ear-containing, ARF-binding protein 3), and hematological and neurological expressed 1.]

Figure 3.20 Gene annotation of S0363E24 (super-contig 783).

[Figure: annotation tracks for S0363E24 with labelled features: olfactory receptor Olr836 and "Partial Syntax 8".]

Table 3.10 BAC sequences in super-contig 783.

Scaffold 2354
BACs: S0946D20, S0015I08, S0011O13, S0069I07, S0065H11, S0076C15, S0074L06
Orthologous Genes Found | Organisms | GenBank Accession Number
Nonmuscle myosin heavy chain | Danio rerio | XP_683046
40S ribosomal protein Sa-like protein | Sparus aurata | AAT44424
Solute carrier family 25 member 28 | Bos taurus | NP_001014905
Myosin heavy polypeptide 9 (myh9) | Danio rerio | XP_689928
Neuropathy target esterase | Gallus gallus | XP_001236055
Patatin-like phospholipase domain containing 7 | Mus musculus | EDL08192
Muscle segment homeobox C (msh-C) | Danio rerio | NP_571347
SORCS receptor 1 | Ornithorhynchus anatinus | XP_001512681
SORCS receptor 3 | Ornithorhynchus anatinus | XP_001512594

Scaffold 1052
BACs: S0126A12, S0086K22
Orthologous Genes Found | Organisms | GenBank Accession Number
Growth factor receptor-bound protein 2 | Danio rerio | NP_998200
Ubiquitin specific protease 43 | Gallus gallus | XP_426827
TBC1 domain family member 24 | Gallus gallus | XP_001232297

Scaffold 1968
BACs: S0036K17, S0045K22
Orthologous Genes Found | Organisms | GenBank Accession Number
Septin 9b | Danio rerio | NP_001014329

BACs that do not belong to any scaffolds
BACs | Orthologous Genes Found | Organisms | GenBank Accession Number
S0066L24 | Phosphoinositide-3-kinase, regulatory subunit 5 | Gallus gallus | NP_001025868
S0397C07 | ADP-ribosylation factor binding protein GGA3 (Golgi-localized, gamma ear-containing, ARF-binding protein 3) | Canis familiaris | XP_540429
S0397C07 | Hematological and neurological expressed 1 (HN1) | Danio rerio | NP_991176
S0363E24 | Olfactory receptor Olr836 | Ornithorhynchus anatinus | XP_001515117

3.4.2.2 Contig 818 and Contig 2705

Scaffold 818 (S0206M03, S0343H24, S0503O14 and S0539C02) (Figure 3.21) contains only one annotated gene, the ATPase H+ transporting Vo unit (accession number NP_660265), so this region appears to have a relatively low gene content. Of the two BACs (S0132J08 and S0120P04) that could not be incorporated into scaffold 818, S0132J08 does not contain any annotated genes, and S0120P04 (Figure 3.22 and Figure 3.23) contains two annotated genes: dihydrodipicolinate synthase-like, mitochondrial precursor (DHDS-like protein) (accession number Q5M8W9) and ligand-dependent co-repressor (LCoR) (accession number NP_001073441). Two figures are necessary for S0120P04 because its sequence is not yet complete.

As mentioned in section 3.4.1, none of the BACs in contig 2705 (S0180H04, S0429J16 and S0048H24) were sequenced completely, so a continuous BAC sequence could not be constructed for this contig and the sequence analysis had to be done separately for each BAC. All three BACs were also grouped into "scaffold 2705". Although a continuous BAC sequence is not currently available, the data analysis shows that several genes are shared among the three BACs, which implies that the three BACs overlap. The zinc finger matrin-type 4 (Zmat4) gene (accession number NP_796060) was found in all three BACs. Both S0180H04 and S0429J16 contain the following four genes: acidic and secreted protein (accession number ABD37673), growth inhibition and differentiation related protein 86 (GIDR) (accession number XP_001508391), zinc finger FYVE domain containing 27 (ZFYVE27) (accession number XP_690160), and golgi autoantigen golgin subfamily a 7 (accession number EDL94236). Both S0429J16 and S0048H24 contain phosphatidylinositol 4-kinase type 2 alpha (accession number NP_998523). In addition, some genes were identified in only one BAC: S0180H04 contains the mitochondrial ribosome recycling factor (accession number XP_001345877), and S0048H24 contains phosphoglycerate mutase 1 (accession number NP_958457) and testis-specific gene 118 (TSG118) (Ensembl number ENSGACP00000025036). Table 3.11 summarizes the gene content of contig 818 and contig 2705, and Figure 3.24 shows a putative orientation of the three BACs and the location of all the genes in contig 2705.

Figure 3.21 Gene annotation of scaffold 818 (contig 818).

[Figure: annotation tracks for scaffold 818, with a scale from 0 to 500 kb and one labelled gene: ATPase H+ transporting Vo unit.]

Figure 3.22 Gene annotation of S0120P04 (contig 818). Green: Genscan against full, unmasked sequence; predicted proteins are also aligned against the Uniref50 database, and transmembrane helices are predicted using TMHMM. Red: Genscan against 10,000 bp sequence blocks, masked; predicted exons are aligned against the Uniref50 database, and predicted CDS are aligned against the Ensembl fish proteome. Blue: BLASTx alignment against the NCBI nr database using 10,000 bp sequence blocks, masked. Yellow: BLAT alignment against the UVIC 300/96 EST Cluster DB using 10,000 bp sequence blocks, masked.

[Figure: annotation tracks for the first part of S0120P04 with one labelled gene: ligand-dependent co-repressor (LCoR).]

Figure 3.23 Gene annotation of S0120P04 (contig 818). Green: Genscan against full, unmasked sequence; predicted proteins are also aligned against the Uniref50 database, and transmembrane helices are predicted using TMHMM. Red: Genscan against 10,000 bp sequence blocks, masked; predicted exons are aligned against the Uniref50 database, and predicted CDS are aligned against the Ensembl fish proteome. Blue: BLASTx alignment against the NCBI nr database using 10,000 bp sequence blocks, masked. Yellow: BLAT alignment against the UVIC 300/96 EST Cluster DB using 10,000 bp sequence blocks, masked.

[Figure: annotation tracks for the second part of S0120P04, with a scale from 0 to 50 kb and one labelled gene: dihydrodipicolinate synthase-like, mitochondrial precursor (DHDS-like protein).]

Figure 3.24 Putative orientation of the three BACs and the location of all the genes in contig 2705.

[Figure: BACs 180H04, 429J16 and 48H24 drawn as overlapping lines, with the labelled genes: mitochondrial ribosomal recycling factor, acidic and secreted protein, ZFYVE27, phosphatidylinositol 4-kinase, growth inhibition and differentiated related protein, golgin subfamily, TSG118, Zmat4, and phosphoglycerate mutase.]

Table 3.11 BAC sequences in contig 818 and contig 2705.

Scaffold 818
BACs: S0206M03, S0343H24, S0503O14, S0539C02
Orthologous Genes Found | Organisms | GenBank Accession Number
ATPase H+ transporting Vo unit | Homo sapiens | NP_660265

BACs that do not belong to any scaffolds
BACs | Orthologous Genes Found | Organisms | GenBank Accession Number
S0120P04 | Dihydrodipicolinate synthase-like, mitochondrial precursor (DHDS-like protein) | Null | Q5M8W9
S0120P04 | Ligand-dependent co-repressor (LCoR) | Danio rerio | NP_001073441
S0132J08 | N/A | |

Contig 2705 (Scaffold 2705)
BACs | Orthologous Genes Found | Organisms | GenBank Accession Number
S0180H04 | Mitochondrial ribosome recycling factor | Danio rerio | XP_001345877
S0180H04 | Acidic and secreted protein | Sparus aurata | ABD37673
S0180H04 | Growth inhibition and differentiation related protein 86 | Ornithorhynchus anatinus | XP_001508391
S0180H04 | Zinc finger FYVE domain containing 27 (ZFYVE27) | Danio rerio | XP_690160
S0180H04 | Golgi autoantigen golgin subfamily a 7 | Rattus norvegicus | EDL94236
S0180H04 | Zinc finger matrin-type 4 (Zmat4) | Mus musculus | NP_796060
S0429J16 | Acidic and secreted protein | Sparus aurata | ABD37673
S0429J16 | Golgi autoantigen golgin subfamily a 7 | Rattus norvegicus | EDL94236
S0429J16 | Growth inhibition and differentiation related protein 86 | Ornithorhynchus anatinus | XP_001508391
S0429J16 | Phosphatidylinositol 4-kinase type 2 alpha | Danio rerio | NP_998523
S0429J16 | Zinc finger FYVE domain containing 27 (ZFYVE27) | Danio rerio | XP_690160
S0429J16 | Zinc finger matrin-type 4 (Zmat4) | Mus musculus | NP_796060
S0048H24 | Phosphatidylinositol 4-kinase type 2 alpha | Danio rerio | NP_998523
S0048H24 | Phosphoglycerate mutase 1 | Danio rerio | NP_958457
S0048H24 | Testis-specific gene 118 (TSG118) | Gasterosteus aculeatus | ENSGACP00000025036 (Ensembl number)
S0048H24 | Zinc finger matrin-type 4 (Zmat4) | Mus musculus | NP_796060

3.5 Comparative Genome Analysis

Comparative genome analysis depends on the identification of orthologous genes from different species, so I compared the Atlantic salmon LG 1 with the genomes of four teleosts: medaka, stickleback, zebrafish and Tetraodon. All four are well-established model organisms whose genomes have been studied for many years. In recent years, their genomes have been sequenced in order to address questions in the field of comparative genomics, and the sequenced genomes are now available at the Ensembl website (http://www.ensembl.org/).

By making such comparisons, it may be possible to predict and discover candidate gene(s) for sex-determination in Atlantic salmon. As of September 1, 2008, 33 microsatellite markers and five SNPs have had their corresponding contigs identified (Figure 3.25). A separate BLASTx search (Altschul et al., 1990; Altschul et al., 1997; Zhang and Madden, 1997) was performed between all the BAC-end sequences from the contigs assigned to Atlantic salmon LG 1 and each of the proteomes of medaka, stickleback, zebrafish and Tetraodon. The lists of BLAST results can be found on the ASalbase website (http://www.asalbase.org/). To make sure that the correct synteny was chosen in each of the contigs examined, only those orthologues that were in close proximity to each other on the same chromosome were included. If no such case was found in a contig (e.g. each of the orthologues belonged to a different chromosome), then I selected the orthologues with more than 80% sequence identity to the BAC-end sequence (e.g. the myosin heavy chain 11 protein on zebrafish chromosome 6; see section 3.5.3). Using these criteria, the most likely orthologues could be identified and selected.
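The two selection criteria described above can be sketched in code. The following Python sketch is illustrative only and is not the procedure actually used to produce the results: the `Hit` record, the function name `select_orthologues` and, in particular, the 1 Mb definition of "close proximity" (`MAX_GAP_BP`) are assumptions introduced here, since no base-pair threshold is stated in the text. Only the 80% identity fallback comes from the criteria above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One BLASTx hit of a BAC-end sequence (hypothetical record layout)."""
    gene: str          # orthologue name in the target species
    chromosome: str    # target chromosome or linkage group
    position: int      # start coordinate on that chromosome (bp)
    identity: float    # percent identity of the alignment


# Assumed value: the text does not define "close proximity" in base pairs.
MAX_GAP_BP = 1_000_000
# From the text: fallback keeps hits with more than 80% sequence identity.
MIN_IDENTITY = 80.0


def select_orthologues(hits, max_gap=MAX_GAP_BP, min_identity=MIN_IDENTITY):
    """Return the hits of one contig that pass the two selection rules."""
    by_chrom = defaultdict(list)
    for hit in hits:
        by_chrom[hit.chromosome].append(hit)

    # Rule 1: keep hits that have a neighbour close by on the same chromosome.
    selected = []
    for chrom_hits in by_chrom.values():
        chrom_hits.sort(key=lambda h: h.position)
        for i, hit in enumerate(chrom_hits):
            near_prev = i > 0 and hit.position - chrom_hits[i - 1].position <= max_gap
            near_next = (i + 1 < len(chrom_hits)
                         and chrom_hits[i + 1].position - hit.position <= max_gap)
            if near_prev or near_next:
                selected.append(hit)
    if selected:
        return selected

    # Rule 2 (fallback): no clustered hits, so keep only high-identity hits.
    return [h for h in hits if h.identity > min_identity]
```

A contig whose hits all fall on different chromosomes therefore passes through the fallback branch, which is the situation described for the myosin heavy chain 11 hit on zebrafish chromosome 6.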

An excellent way to visualize the conserved synteny between LG 1 and the four teleost genomes is to construct an Oxford grid for each of them. An Oxford grid arrays the orthologues from two species according to their chromosome in each species (Woods et al., 2000). In order to see how these syntenic regions are distributed along Atlantic salmon LG 1, the orthologous chromosomes from each of the teleosts examined were used to reconstruct the Atlantic salmon LG 1, using the method of Naruse et al. (2004). Four Atlantic salmon LG 1 maps were reconstructed by arranging the orthologous chromosomes from medaka, stickleback, zebrafish and Tetraodon in the order in which their corresponding microsatellite markers and contigs appear on Atlantic salmon LG 1 (see sections 3.5.1, 3.5.2, 3.5.3 and 3.5.4).
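Because only one Atlantic salmon linkage group is examined, each Oxford grid here reduces to a single row of counts. A minimal Python sketch of how such a row can be tallied is given below; the function name `oxford_grid_row` is an assumption made for illustration, and the example input simply restates the medaka counts reported in section 3.5.1 (18, 3 and 15 orthologues on chromosomes 1, 11 and 16).

```python
from collections import Counter


def oxford_grid_row(hit_chromosomes, chromosome_labels):
    """Count orthologues per target chromosome, in the given column order."""
    counts = Counter(hit_chromosomes)
    return [counts.get(label, 0) for label in chromosome_labels]


# Medaka has 24 chromosomes; one list entry per selected orthologue.
medaka_labels = [str(n) for n in range(1, 25)]
medaka_hits = ["1"] * 18 + ["11"] * 3 + ["16"] * 15

row = oxford_grid_row(medaka_hits, medaka_labels)
for label, count in zip(medaka_labels, row):
    if count:
        print(f"chromosome {label}: {count} orthologues")
```

Columns with a count of zero indicate chromosomes that share no selected orthologues with LG 1, so conserved synteny groups stand out as the few non-zero cells.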

Figure 3.25 Atlantic salmon LG 1 with the contigs and their corresponding markers.

[Figure: linkage map of Atlantic salmon LG 1 listing each marker with its map position (cM) and, where identified, its assigned contig.]

3.5.1 Medaka

In the medaka genome, a total of 36 orthologues were shared with Atlantic salmon LG 1, corresponding to three conserved synteny groups: chromosome 1 (18 orthologues), chromosome 11 (3 orthologues) and chromosome 16 (15 orthologues) (Table 3.12). Figure 3.26 shows the medaka orthologous chromosomes arranged in the order in which they would appear on Atlantic salmon LG 1. It is clear from Figure 3.26 that the p arm of Atlantic salmon LG 1 is orthologous to medaka chromosome 16 and the q arm is orthologous to medaka chromosome 1, although a few orthologous genes from medaka chromosome 11 are dispersed between chromosome 16 and chromosome 1, which could be due to chromosomal translocations or inversions, or perhaps incorrect assignment of orthologues. Therefore, there is extensive conservation of syntenic regions between the Atlantic salmon LG 1 and the medaka genome. It is interesting that the portion corresponding to the q arm of Atlantic salmon chromosome 2 (where the sex-determining region lies) has a large syntenic region from medaka chromosome 1, since medaka chromosome 1 is the sex chromosome (Kasahara et al., 2007). However, since DMY (or DMRT1) was not identified among the medaka chromosome 1 orthologues, it is reasonable to conclude that medaka and Atlantic salmon have adopted different sex-determining genes. The positions of the orthologous genes in the regions of conserved synteny on LG 1 were also examined to see the extent to which gene order has been maintained. Gene order has been altered, so significant intrachromosomal rearrangements have occurred since the divergence of the Atlantic salmon and the medaka. The gene ZFYVE27 (see section 3.7 for more detail on this gene) can be found at the telomeric end of the Atlantic salmon LG 1 in contig 2705, the sex-determining region of Atlantic salmon. However, the function of ZFYVE27 in medaka is not yet known.
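One simple way to quantify how far gene order has been maintained within a conserved synteny group is to count order breaks. The Python sketch below is an illustration under stated assumptions, not the procedure used in this study, in which gene order was examined from the reconstructed maps. The function name `count_order_breaks` and the coordinates in the example are hypothetical, and the positions are assumed to be distinct.

```python
def count_order_breaks(positions):
    """Count adjacent gene pairs on LG 1 that are not neighbours on the
    target chromosome.

    positions: target-chromosome coordinates of the orthologues, listed in
    the order in which they appear along Atlantic salmon LG 1. Coordinates
    are assumed to be distinct.
    """
    # Rank each gene by its coordinate on the target chromosome.
    ranks = {pos: rank for rank, pos in enumerate(sorted(positions))}
    order = [ranks[pos] for pos in positions]
    # Neighbouring genes with consecutive ranks preserve local order,
    # whether the block runs forwards or is inverted as a whole.
    return sum(1 for a, b in zip(order, order[1:]) if abs(a - b) != 1)


# Hypothetical coordinates (bp) for illustration only.
print(count_order_breaks([100, 200, 300, 400]))  # fully collinear block
print(count_order_breaks([100, 300, 200, 400]))  # one internal inversion
```

A count of zero means the block is collinear (or inverted as a whole), whereas a larger count points to intrachromosomal rearrangements within the synteny group.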

Figure 3.26 Atlantic salmon LG 1 with the orthologous chromosomes from medaka. On the left, the orthologues are arranged in the order of the corresponding microsatellite markers that appear on LG 1. The numbers on the right of the boxes indicate the location of the gene on the chromosome of that particular organism. On the right, the orthologues are sorted according to the chromosome order of that particular organism.

[Figure: two-panel diagram. Left panel, "Atlantic Salmon Linkage Group 1": contigs with their map positions (cM), the orthologous medaka genes, and the medaka chromosome and position (bases) of each gene. Right panel, "Sorted by Medaka Chromosome Number": the same orthologues sorted by medaka chromosome.]

Table 3.12 Oxford grid showing conservation of synteny between Atlantic salmon LG 1 and medaka.

Medaka Chromosome Number
     | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24
LG 1 | 18 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 0 | 0 | 0 | 0 | 15 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0

3.5.2 Stickleback

The stickleback genome and Atlantic salmon LG 1 shared 52 orthologues in total; 28 of them belong to LG IX and 24 belong to LG XX (Table 3.13), so two conserved synteny groups were found. Figure 3.27 shows how these stickleback orthologues and syntenic regions are arranged on the Atlantic salmon LG 1. It shows that the p arm of Atlantic salmon LG 1 is orthologous to stickleback LG XX, and the q arm is orthologous to stickleback LG IX. Again, extensive conservation of syntenic regions is found between the Atlantic salmon LG 1 and the stickleback genome. However, the sex chromosome in the stickleback is known to correspond to LG XIX (Peichel et al., 2004), so Atlantic salmon and stickleback have evolved different sex chromosomes. The orthologous gene positions in the two conserved synteny groups were also examined on LG 1, and gene order has been altered, indicating that intrachromosomal rearrangements have occurred since the divergence of the stickleback and the Atlantic salmon. Interestingly, the orthologous gene RPTOR_HUMAN occurs twice in LG IX when arranged on the Atlantic salmon LG 1 (in contig 393 and contig 1185). This could be caused by a tandem duplication in Atlantic salmon. It is unlikely that contig 393 and contig 1185 overlap, because contig 393 contains the microsatellite marker Ssa691BSFU, which lies 15.2 cM from the centromere on LG 1, whereas contig 1185 contains the microsatellite marker Ssa337BSFU, which lies 18.4 cM from the centromere on LG 1. This implies that the two contigs are about 3 cM apart on LG 1. The sex-determining gene has not yet been identified in stickleback, but the gene ZFYVE27 can also be found at the telomeric end of Atlantic salmon LG 1 on contig 2705, which reinforces the possibility that this gene could be a candidate for sex-determination in Atlantic salmon.

Figure 3.27 Atlantic salmon LG 1 with the orthologous chromosomes from stickleback.

[Figure: two-panel diagram. Left panel, "Atlantic Salmon Linkage Group 1": contigs with their map positions (cM), the orthologous stickleback genes, and the stickleback linkage group and position (bases) of each gene. Right panel, "Sorted by Stickleback Linkage Group": the same orthologues sorted by stickleback linkage group.]

Table 3.13 Oxford grid showing conservation of synteny between Atlantic salmon LG 1 and stickleback.

Stickleback Linkage Groups
     | I | II | III | IV | V | VI | VII | VIII | IX | X | XI | XII | XIII | XIV
LG 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 28 | 0 | 0 | 0 | 0 | 0

Stickleback Linkage Groups
     | XV | XVI | XVII | XVIII | XIX | XX | XXI | XXII | XXIII | XXIV | XXV | XXVI
LG 1 | 0 | 0 | 0 | 0 | 0 | 24 | 0 | 0 | 0 | 0 | 0 | 0

3.5.3 Zebrafish

The zebrafish genome shared 35 orthologues with the Atlantic salmon LG 1, with 34 of them belonging to six conserved syntenic groups, corresponding to zebrafish chromosomes 1, 3, 11, 16, 19 and 21 (Table 3.14). One of the orthologues, myosin heavy chain 11 (myh11) from chromosome 6, does not belong to any of the other synteny groups, and it may reflect the existence of an additional synteny group that will emerge as more regions along LG 1 are sequenced (Figure 3.28). When all of the syntenic regions were put together on the Atlantic salmon LG 1, as shown in Figure 3.28, the overall configuration of the syntenic groups was quite disrupted, although relatively large portions of zebrafish chromosome 16 and chromosome 21 are conserved. The disrupted configuration of the syntenic groups might indicate that the genomes were shuffled by chromosomal translocations and rearrangements after the divergence of the Atlantic salmon and zebrafish. The orthologous gene order was also examined in the two largest syntenic groups, chromosomes 16 and 21. The gene order has been altered on chromosome 16, an indication that intrachromosomal rearrangements have occurred. Chromosome 21, on the other hand, shows no alteration in gene order. However, since all of the orthologous genes on chromosome 21 came from contig 783 in Atlantic salmon, the claim that there was no intrachromosomal rearrangement in this region is weak, and further investigation is needed to confirm the validity of this result. Although neither sex-linked markers nor sex chromosomes have been identified in zebrafish so far, and the sex-determination mechanism for this species remains a mystery (reviewed in Volff et al., 2007), the gene ZFYVE27 is once again found in the region corresponding to the telomeric end of Atlantic salmon LG 1, supporting its possible role in sex-determination of Atlantic salmon.

Figure 3.28 Atlantic salmon LG 1 with the orthologous chromosomes from zebrafish.

[Figure: two-panel diagram. Left panel, "Atlantic Salmon Linkage Group 1": contigs with their map positions (cM), the orthologous zebrafish genes, and the zebrafish chromosome and position (bases) of each gene. Right panel, "Sorted by Zebrafish Chromosome Number": the same orthologues sorted by zebrafish chromosome.]

Table 3.14 Oxford grid showing conservation of synteny between Atlantic salmon LG 1 and zebrafish.

Zebrafish Chromosome Number
     | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25
LG 1 | 3 | 0 | 2 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 3 | 0 | 0 | 0 | 0 | 13 | 0 | 0 | 4 | 0 | 9 | 0 | 0 | 0 | 0

3.5.4 Tetraodon

Only 23 orthologues were found between the Tetraodon genome and Atlantic salmon LG 1. Five conserved syntenic groups were found: chromosome 7 (5 orthologues), chromosome 8 (5 orthologues), chromosome 11 (3 orthologues), chromosome 15 (2 orthologues), and chromosome 18 (8 orthologues) (Table 3.15). Figure 3.29 shows the arrangement of all the Tetraodon orthologues on the Atlantic salmon LG 1, and the overall configuration of the orthologous chromosomes is quite disrupted here as well. When I examined the three largest conserved synteny groups, chromosomes 7, 8 and 18, few intrachromosomal rearrangements appeared to have occurred in these orthologous chromosomes, but because these conserved synteny groups are relatively small, this claim needs further support. Currently, the sex chromosome and the sex-determining mechanism are unknown in Tetraodon (Figure 1 in the review by Volff et al., 2007). However, the gene SOX21 was identified at the telomeric end of the Atlantic salmon LG 1. Since the mammalian SRY gene is thought to be derived from a member of the SOX family, this gene could also be a good candidate for sex-determination in Atlantic salmon.

Figure 3.29 Atlantic salmon LG 1 with the orthologous chromosomes from Tetraodon.

[Figure: two-panel diagram. Left panel, "Atlantic Salmon Linkage Group 1": contigs with their map positions (cM), the orthologous Tetraodon genes, and the Tetraodon chromosome and position (bases) of each gene. Right panel, "Sorted by Tetraodon Chromosome Number": the same orthologues sorted by Tetraodon chromosome.]

Table 3.15 Oxford grid showing conservation of synteny between Atlantic salmon LG 1 and Tetraodon.

Tetraodon Chromosome Number
        1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21
LG 1    0   0   0   0   0   0   5   5   0   0   3   0   0   0   2   0   0   8   0   0   0
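The Oxford grid row in Table 3.15 is a tally of orthologue hits per Tetraodon chromosome. The following is a minimal Python sketch of that tally, not the procedure actually used in this work. It assumes the hits are available as a list of chromosome numbers; the list here is rebuilt from the counts reported above rather than from the raw BLASTx output. The function name `oxford_grid_row` and the cut-off of two or more hits for calling a syntenic group are illustrative assumptions.

```python
from collections import Counter


def oxford_grid_row(hit_chromosomes, n_chromosomes):
    """Count orthologue hits per chromosome, numbered 1..n_chromosomes."""
    counts = Counter(hit_chromosomes)
    return [counts.get(c, 0) for c in range(1, n_chromosomes + 1)]


# Chromosome assignments rebuilt from the counts reported in the text:
# 5 on chr 7, 5 on chr 8, 3 on chr 11, 2 on chr 15, 8 on chr 18.
tetraodon_hits = [7] * 5 + [8] * 5 + [11] * 3 + [15] * 2 + [18] * 8

row = oxford_grid_row(tetraodon_hits, n_chromosomes=21)

# Assumption: a chromosome with at least two hits counts as a syntenic group.
syntenic_groups = {c: n for c, n in enumerate(row, start=1) if n >= 2}

print(sum(row))          # total number of orthologues
print(syntenic_groups)   # chromosome -> number of orthologues
```

The same function could be applied to the hit lists for medaka, stickleback and zebrafish by changing `n_chromosomes` to the number of chromosomes or linkage groups in each genome.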

3.6 A Comparison between the Comparative Genomics Data and the BAC Sequencing Data

As mentioned in section 3.5, the comparative genome analysis was done by subjecting the BAC-ends to a BLASTx search. However, since I now have the full BAC sequences in the regions of super-contig 783, contig 818 and contig 2705, I can compare these two sets of results. In super-contig 783, the result from the BAC sequence analysis is quite consistent with the result from the BAC-end BLASTx search in both medaka and stickleback, in which septin 9, phosphoinositide-3-kinase regulatory subunit 5, SORCS receptor 1, myosin heavy chain, and ADP-ribosylation factor binding protein GGA3 (not identified in the BAC-end BLASTx search for medaka) were identified. Therefore, the BAC sequencing data confirm the result of the BAC-end sequences in both medaka and stickleback. However, the genes identified using BAC-end sequences for both zebrafish and Tetraodon turned out to be quite different from the BAC sequencing data. One reason for this could be that the genome of zebrafish has undergone many chromosomal rearrangements and much gene shuffling (Figure 3.28 shows a disrupted configuration of syntenic groups), and the genome of Tetraodon has undergone compaction (see Introduction, section 1.3.5). Therefore, both the zebrafish and Tetraodon genomes could be less conserved with the Atlantic salmon genome and they might not be a good choice for comparative genome analysis. In addition, many of the genes in Tetraodon have not been annotated, and this could be the reason why many orthologous genes could not be identified in Tetraodon. It can also be seen that some partial genes, such as syntaxin 8 (in both scaffold 1052 and S0363E24; see Figure 3.16 and Figure 3.20, respectively) and netrin 1a (in S0066L24; see Figure 3.18), were identified. This could be due to a frameshift mutation that truncates the protein sequence, or to the random insertion of gene fragments by transposable elements, leading to the pseudogenization of the genes.
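The agreement between the two data sets for super-contig 783 can be expressed as a simple set comparison. The sketch below is illustrative only. The gene sets are transcribed from the paragraph above, on the reading that all five genes were found by BAC sequencing and that only GGA3 was missed by the medaka BAC-end search. The function name `compare_gene_sets` is hypothetical.

```python
def compare_gene_sets(reference, observed):
    """Return (fraction of reference genes recovered, reference genes missed)."""
    recovered = reference & observed
    return len(recovered) / len(reference), reference - observed


# Genes found by full BAC sequencing in super-contig 783 (from the text).
bac_sequence_genes = {
    "septin 9",
    "phosphoinositide-3-kinase regulatory subunit 5",
    "SORCS receptor 1",
    "myosin heavy chain",
    "GGA3",
}

# Genes found by the BAC-end BLASTx search, per species (as read from the text).
bac_end_genes = {
    "medaka": bac_sequence_genes - {"GGA3"},
    "stickleback": set(bac_sequence_genes),
}

results = {
    species: compare_gene_sets(bac_sequence_genes, genes)
    for species, genes in bac_end_genes.items()
}

for species, (fraction, missed) in results.items():
    print(species, fraction, sorted(missed))
```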

In contig 818, orthologous genes were not identified using the BLASTx search in the genomes of the four fish that I analyzed, whereas three genes were found when the six BACs in this contig were sequenced (section 3.4.2.2). As mentioned in section 3.4.2.2, contig 818 turned out to be a gene-poor region from the BAC sequencing data. Since BAC-end sequences are very short in length, it is possible that none of the BAC-ends in contig 818 contain any genes, and therefore it is reasonable that the BLASTx search using BAC-end sequences failed to provide any information on orthologous genes.

In contig 2705, some degree of consistency can also be seen between the BLASTx search using BAC-end sequences and the BAC sequencing data, in which ZFYVE27 was identified in medaka, stickleback and zebrafish. Since contig 2705 is located near the sex-determining region of LG 1, and ZFYVE27 is expressed in the ovary of Atlantic salmon smolt (see section 3.7), this gene might have a role in the gonad development of Atlantic salmon and is therefore a good candidate for sex determination. The genes that were not identified using the BAC-end search but were identified from the BAC sequencing results were probably missed because these BAC-ends do not contain the genes (i.e. a situation similar to that of the BACs in contig 818). However, none of the BACs in contig 2705 is fully sequenced, and they therefore contain many gaps, which made the sequence analysis more difficult; it is currently impossible to obtain the true orientation of the three BACs in this contig. Figure 3.25 shows that contig 2705 is located near the telomeric end of LG 1, and from the DAPI staining of Atlantic salmon chromosomes we know that there is a large region of heterochromatin on the telomeric end of the q arm of chromosome 2 (Artieri et al., 2006). Therefore, contig 2705 might contain many repetitive elements that compromised the sequencing results and produced many gaps, which makes it difficult to assemble the three BACs into a continuous sequence. Figure 3.24 is an attempt to assemble the three BACs into a possible orientation with their gene locations according to the sequence data of each individual BAC, and we might need a better sequencing tool in order to reveal the true features of contig 2705.

3.7 Candidate Sex-Determining Genes

As shown in Figure 3.25, contig 2705 is located near the sex-determining region of Atlantic salmon LG 1, so this contig is of great interest in the search for sex-determining gene candidates. From the comparative genome analysis using the BAC-end sequences (section 3.5), the gene encoding the zinc finger FYVE domain containing 27 (ZFYVE27) protein was identified in the regions of the medaka, stickleback and zebrafish genomes that correspond to contig 2705, and SOX 21 was identified in the region of the Tetraodon genome that also corresponds to contig 2705. The next step was to take a closer look at the gene contents of the genomic regions in medaka, stickleback, zebrafish and Tetraodon that are orthologous to contig 2705 in Atlantic salmon LG 1, using ZFYVE27 (and SOX 21 for Tetraodon) as the centre point, with the help of the ASalbase database (http://www.asalbase.org). Figure 3.30 shows the region of the medaka genome that is orthologous to contig 2705, which corresponds to the region from 31.0 Mb to 31.2 Mb (covering approximately 200 kb) on medaka chromosome 1. Figure 3.31 shows the region of the stickleback genome that is orthologous to contig 2705, with the corresponding region from 14.8 Mb to 15.0 Mb (covering approximately 200 kb) on stickleback LG IX. Figure 3.32 shows the region of the zebrafish genome orthologous to contig 2705, corresponding to the region from 51.3 Mb to 51.5 Mb (covering approximately 200 kb) on zebrafish chromosome 1. Figure 3.33 shows the region of the Tetraodon genome orthologous to contig 2705, which corresponds to the region from 1920 kb to 1990 kb (covering approximately 70 kb) on Tetraodon chromosome 18. It seems reasonable that a coverage of 70 kb is sufficient in the case of Tetraodon, since its genome is more compact than those of the other three teleosts. These four figures are consistent with each other in terms of gene content and gene order, and they are also quite consistent with the BAC sequencing data (section 3.4.2.2 and Figure 3.24), indicating conservation of the syntenic regions among the four teleosts and Atlantic salmon with regard to contig 2705. By comparing the BAC sequencing data with the regions of the four teleost genomes corresponding to contig 2705, it can be seen that genes such as ZFYVE27, golgin subfamily, cartilage acidic protein (CRTAC1), phosphatidylinositol 4-kinase type 2 alpha (PI4K2A), phosphoglycerate mutase, testis-specific gene (TSG118), tektin-4 (TEKT4) and G-protein coupled receptor (GPRC5B) are quite common in the genomes of the four teleosts and Atlantic salmon, although there are some slight variations among the four teleost genomes, which could be due to chromosomal rearrangements or gene shuffling during the course of evolution. Table 3.16 is a summary of all the gene contents in the genomes of the four teleosts corresponding to contig 2705 in Atlantic salmon LG 1.
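The comparison of gene content across the four teleost regions can be sketched as a count of how many genomes carry each gene. The Python example below is a hedged illustration rather than the analysis performed here. The gene sets are transcribed from the legends of Figures 3.30 to 3.33 only, unannotated entries are omitted, and the Tetraodon names are kept as annotated (for example "tektin" is not merged with TEKT4). Those naming choices and the function name `shared_by_at_least` are assumptions, and a different name mapping would change the counts.

```python
from collections import Counter

# Gene names per region, transcribed from the legends of Figures 3.30-3.33.
regions = {
    "medaka": {
        "TEKT4", "LOXL4", "GIDR", "CRTAC1", "golgin subfamily", "ZFYVE27",
        "PI4K2A", "EXOSC1", "TSG118", "GPRC5B", "LPS-induced TNF-alpha factor",
    },
    "stickleback": {
        "EPN3", "B3GNTL1", "METRNL", "immunoglobulin", "TEKT4", "CRTAC1",
        "golgin subfamily", "ZFYVE27", "PI4K2A", "phosphoglycerate mutase",
        "TSG118", "LPS-induced TNF-alpha factor", "SSTR5", "PKD1",
    },
    "zebrafish": {
        "NACHT nucleoside triphosphatase", "ZFYVE27", "golgin subfamily",
        "CRTAC1", "far upstream element binding protein",
        "calcium binding mitochondrial carrier",
    },
    "Tetraodon": {
        "DNA/RNA non-specific endonuclease", "leucine-rich repeat", "tektin",
        "lysyl oxidase homolog", "SOX21", "integrin alpha chain", "ZFYVE27",
        "phosphatidylinositol 3- and 4-kinase", "phosphoglycerate mutase",
        "G-protein coupled receptor", "proline-rich region",
    },
}


def shared_by_at_least(regions, k):
    """Return the sorted gene names present in at least k of the regions."""
    counts = Counter(gene for genes in regions.values() for gene in genes)
    return sorted(gene for gene, n in counts.items() if n >= k)


# Genes present in every one of the four teleost regions.
in_all_four = set.intersection(*regions.values())

print(in_all_four)
print(shared_by_at_least(regions, 3))
```

With the names kept exactly as annotated, ZFYVE27 is the only gene found in all four regions, which is why it serves as the centre point of the comparison.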

Since SRY and DMRT1, two genes that are involved in sex determination, both contain a DNA-binding motif and are both transcription factors (see Introduction), the goal was to identify any putative DNA-binding protein and, in particular, transcription factors. Currently, there are three likely candidates for sex determination: ZFYVE27, zinc finger matrin-type 4 (Zmat4) and TSG118. ZFYVE27 encodes a FYVE-type zinc-finger protein that is involved in neuronal development in humans, and a defect in ZFYVE27 causes a degenerative spinal cord disorder named spastic paraplegia (Mannan et al., 2006). Although it is a phospholipid-binding protein rather than a nucleotide-binding protein (Kutateladze, 2006), an expression study of ZFYVE27 showed that it is expressed in abundance in the ovary of Atlantic salmon smolt (Fujiki, unpublished data; Figure 3.34), and so it could be involved in gonad development in Atlantic salmon. Given this expression result, and given that ZFYVE27 is identified in the regions of all four teleost genomes that are orthologous to contig 2705 in Atlantic salmon LG 1 and is itself present in contig 2705 (found in both S0180H04 and S0429J16), this gene is a good sex-determining gene candidate. Zmat4 encodes a protein that contains a DNA-binding domain, and the expression studies revealed that it is predominantly expressed in both the brain and the ovary of Atlantic salmon smolt (Fujiki, unpublished data; Figure 3.34), indicating a possible role in gonad development in Atlantic salmon. However, Zmat4 is not found in the regions of the four teleost genomes examined that are orthologous to contig 2705 in Atlantic salmon LG 1, and it is currently considered to have a role in the maintenance of the nuclear matrix structure by attaching chromosomes through its zinc finger domains (Nakayasu and Berezney, 1991). Finally, TSG118 (identified in S0048H24), as the name suggests, might be responsible for testis development. The current understanding of this gene is that it encodes a protein that closely resembles nucleostemin, a nucleolar targeting protein involved in the control of the proliferation potential of stem cells, and it is considered to have a role in targeting other proteins to nucleoli (Larsson et al., 1999; Grasberger and Bell, 2005). However, TSG118 is also expressed in the ovary of Atlantic salmon smolt (Fujiki, unpublished data; Figure 3.34), suggesting a possible role in gonad development in Atlantic salmon. All three genes mentioned above are good candidates for future studies of sex determination in Atlantic salmon.

Figure 3.30 The region of medaka genome that is orthologous to contig 2705

[Figure: genome browser view from 31000k to 31200k showing the Medaka Transcripts track.]

TEKT4: tektin-4
LOXL4: lysyl oxidase homolog 4 precursor
C10orf28: growth inhibition and differentiation related protein (GIDR)
CRTAC1: cartilage acidic protein (acidic and secreted protein)
C10orf132: golgin subfamily
ZFYVE27: zinc finger FYVE domain containing 27
PI4K2A: phosphatidylinositol 4-kinase type 2 alpha
EXOSC1: 3'-5' exoribonuclease CSL4 homolog
NP_001013009.2: testis-specific gene 118
ENSORLG00000011617: unknown
GPRC5B: G-protein coupled receptor family C group 5 member B precursor
ENSORLG00000011630: lipopolysaccharide-induced tumor necrosis factor (LPS-induced TNF-alpha factor)

Figure 3.31 The region of stickleback genome that is orthologous to contig 2705

[Figure: genome browser view from 14800k to 15000k showing the Stickleback Transcripts track.]

EPN3: Epsin-3
B3GNTL1: UDP-GlcNAc:betaGal beta-1,3-N-acetylglucosaminyltransferase-like protein 1
METRNL: Meteorin-like protein precursor
ENSGACG00000018899: Immunoglobulin
TEKT4: tektin-4
CRTAC1: cartilage acidic protein (acidic and secreted protein)
C10orf132: golgin subfamily
ZFYVE27: zinc finger FYVE domain containing 27
PI4K2A: phosphatidylinositol 4-kinase type 2 alpha
ENSGACG00000018928: phosphoglycerate mutase
NP_001013009.2: testis-specific gene 118
ENSGACG00000018941: lipopolysaccharide-induced tumor necrosis factor-alpha factor
SSTR5: somatostatin receptor type 5
PKD1: polycystin-1 precursor

Figure 3.32 The region of zebrafish genome that is orthologous to contig 2705

[Figure: genome browser view from 51300k to 51500k showing the Zebrafish Transcripts track.]

ENSDARG00000039691: NACHT nucleoside triphosphatase
zgc:153156: zinc finger FYVE domain containing 27 (ZFYVE27)
zgc:110549: golgin subfamily
sb:cb184: cartilage acidic protein (acidic and secreted protein)
NP_001082897.1: far upstream element binding protein (K homology domain)
LOC561939: calcium binding mitochondrial carrier (solute carrier family 25)

Figure 3.33 The region of Tetraodon genome that is orthologous to contig 2705

[Figure: genome browser view of the Tetraodon transcripts in this region.]

GSTENG00026298001: DNA/RNA non-specific endonuclease
GSTENG00026297001: leucine-rich repeat
GSTENG00026296001: tektin
GSTENG00026295001: lysyl oxidase homolog precursor
GSTENG00026294001: SOX 21
GSTENG00026293001: integrin alpha chain
GSTENG00026292001: SOX 21
GSTENG00026291001: zinc finger FYVE domain containing 27 (ZFYVE27)
GSTENG00026290001: phosphatidylinositol 3- and 4-kinase
GSTENG00026289001: phosphoglycerate mutase
GSTENG00026288001: SOX 21
GSTENG00026287001: G-protein coupled receptor
GSTENG00026286001: proline-rich region
GSTENG00026285001: G-protein coupled receptor
GSTENG00026283001: Ambiguous

Table 3.16 A comparison of the genes identified in the genomes of the four teleosts that correspond to contig 2705 in Atlantic salmon LG 1. X indicates the presence of such gene in the genome.

Genes Identified    Atlantic Salmon    Medaka    Stickleback    Zebrafish    Tetraodon
B3GNTL1    X
Calcium binding mitochondrial carrier (solute carrier family 25)    X
CRTAC1    X X X X
DNA/RNA non-specific endonuclease    X
EPN3    X
EXOSC1    X X
Far upstream element binding protein (K homology domain)    X
GIDR    X X
Golgin subfamily    X X X X
GPRC5B    X X X
Integrin alpha chain    X
Immunoglobulin    X
LOXL4    X X
LPS-induced TNF-alpha factor    X X
METRNL    X
Mitochondrial ribosomal recycling factor    X
NACHT nucleoside triphosphatase    X
Phosphoglycerate mutase    X X X
PI4K2A    X X X X
PKD1    X
SOX21    X
SSTR5    X
TEKT4    X X X
TSG118    X X X
ZFYVE27    X X X X X
Zmat4    X

Figure 3.34 Expression profile of Zmat4, ZFYVE27 and TSG118 in the Atlantic salmon smolt.

[Figure: PCR expression profile across Atlantic salmon smolt tissues. PCR cycles: Zmat4, 35; ZFYVE27, 35; TSG118, 30; ubiquitin, 25.]

CHAPTER 4: DISCUSSIONS AND CONCLUSIONS

The objective of this study involved the integration of Atlantic salmon sex-linked microsatellite markers into the Atlantic salmon Linkage Group (LG) 1, chromosome walking on the Atlantic salmon chromosome 2, and the comparative genomic analysis between Atlantic salmon LG 1 and the genomes of four other teleosts: medaka, stickleback, zebrafish and Tetraodon. This allowed me to update the Atlantic salmon LG 1, to sequence portions of the Atlantic salmon chromosome 2, to identify syntenic regions between Atlantic salmon LG 1 and the four teleosts, and to predict possible sex-determining gene candidates.

4.1 A Comparison between the Previous Work in Comparative Genomics and my Results

A similar study in comparative genome analysis was previously done by Naruse et al. (2004), in which the genomes of human, medaka and zebrafish were compared with each other to look for conserved syntenic regions among these vertebrates and, at the same time, to reconstruct the possible ancestral vertebrate proto-chromosomes. Their study revealed some large syntenic regions among the three vertebrates. By reassembling the corresponding conserved regions according to the positions of the orthologous genes on the LGs of the three vertebrates, 12 pairs of ancestral vertebrate proto-chromosomes were reconstructed, and their study also supported the 3R duplication hypothesis. In my study, a similar method was applied by comparing the Atlantic salmon LG 1 with the genomes of medaka, stickleback, zebrafish, and Tetraodon. For each of the teleost genome comparisons, all the BAC-end sequences from the contigs assigned to Atlantic salmon LG 1 were subjected to a BLASTx search against the teleost genome of interest to find the orthologous genes and their corresponding chromosomes. In order to see how the syntenic regions are distributed along Atlantic salmon LG 1, the orthologous chromosomes from each of the teleosts examined were used to reconstruct the Atlantic salmon LG 1. Four Atlantic salmon LG 1 maps were reconstructed by rearranging the orthologous chromosomes from medaka, stickleback, zebrafish and Tetraodon in the order of their corresponding microsatellite markers and contigs on Atlantic salmon LG 1. The results showed that the p arm of Atlantic salmon chromosome 2 is largely conserved with medaka chromosome 16, while its q arm is conserved with medaka chromosome 1. In the case of stickleback, the p arm of Atlantic salmon chromosome 2 is conserved with stickleback LG XX, whereas the q arm is conserved with stickleback LG IX. With zebrafish and Tetraodon, however, the p arm and q arm of Atlantic salmon chromosome 2 do not each correspond to a single chromosome as observed in medaka and stickleback, and disrupted syntenies are observed. Therefore, it seems that both the zebrafish and Tetraodon genomes are less conserved with the Atlantic salmon genome than the medaka and stickleback genomes are, and hence neither zebrafish nor Tetraodon would be a good choice for comparative genome analysis with Atlantic salmon.
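The contrast between conserved and disrupted synteny described above can be illustrated by ordering the orthologue hits by their position on LG 1 and collapsing consecutive hits on the same chromosome into blocks. The Python sketch below shows that idea only. The map positions and chromosome labels are hypothetical values chosen for illustration, not data from this study, and the function name `syntenic_blocks` is an assumption.

```python
from itertools import groupby


def syntenic_blocks(hits):
    """Collapse hits into runs of the same chromosome along the linkage map.

    hits: iterable of (map_position, chromosome) pairs.
    Returns a list of (chromosome, n_hits, first_position, last_position).
    """
    ordered = sorted(hits, key=lambda hit: hit[0])
    blocks = []
    for chromosome, run in groupby(ordered, key=lambda hit: hit[1]):
        run = list(run)
        blocks.append((chromosome, len(run), run[0][0], run[-1][0]))
    return blocks


# Hypothetical hits: each arm maps to a single chromosome (conserved synteny).
conserved = [(46.2, "chr1"), (0.0, "chr16"), (12.5, "chr16"),
             (19.5, "chr16"), (29.2, "chr1"), (35.0, "chr1")]

# Hypothetical hits: chromosomes interleave along the map (disrupted synteny).
disrupted = [(0.0, "chr7"), (12.5, "chr18"), (19.5, "chr8"),
             (29.2, "chr18"), (35.0, "chr7"), (46.2, "chr18")]

conserved_blocks = syntenic_blocks(conserved)
disrupted_blocks = syntenic_blocks(disrupted)

print(conserved_blocks)
print(len(disrupted_blocks))
```

A small number of long blocks corresponds to the pattern reported for medaka and stickleback, whereas many single-hit blocks correspond to the disrupted pattern reported for zebrafish and Tetraodon.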

Such comparative genomics also helped me to predict possible sex-determining gene candidates in Atlantic salmon. It can now be seen that Atlantic salmon has adopted a different sex-determining gene from those of medaka and stickleback, since neither the sex-determining gene nor the sex-linkage group is conserved in these cases. The situation remains unknown with respect to zebrafish and Tetraodon, since neither their sex-determining genes nor their sex chromosomes have been identified. A closer inspection of contig 2705, the contig close to the sex-determining region in Atlantic salmon, and of the orthologous regions of contig 2705 in medaka, stickleback, zebrafish and Tetraodon, revealed three possible candidate genes for sex determination in Atlantic salmon: zinc finger FYVE domain containing 27 (ZFYVE27), zinc finger matrin-type 4 (Zmat4) and testis-specific gene 118 (TSG118). Although these three candidate genes are not currently known to have a role in sex determination in other organisms (see Results, section 3.7), all of them were found to be expressed in abundance in the ovary of Atlantic salmon smolt, indicating possible roles in the gonad development of Atlantic salmon. However, this needs to be further tested (see section 4.2), and it might still be a long time before we can determine the true sex-determining gene in Atlantic salmon.

4.2 Future Work

In this study, some parts of chromosome 2 have already been sequenced, such as contig 783, contig 818 and contig 2705, although there are still many gaps in these regions. The next step would be to close these gaps and, ultimately, to sequence the entire chromosome 2. As more chromosome 2 sequence becomes available, we should refine the comparative genomics maps in order to obtain better syntenic information between Atlantic salmon and the four teleosts studied in this thesis. Finally, from the candidate sex-determining genes identified in this study, we want to identify the actual master switch for sex determination in Atlantic salmon (i.e. the gene that determines maleness). In order to test this, we might need to insert these candidate genes into normal (XX) female embryos of Atlantic salmon and see whether these genes could turn the offspring into XX males.

REFERENCES

Allendorf, F. W., Danzmann, R. G. (1997). Secondary tetrasomic segregation of MDH-B and preferential pairing of homeologues in rainbow trout. Genetics 145: 1083-1092.

Allendorf, F. W., Thorgaard, G. H. (1984). Tetraploidy and the evolution of salmonid fishes. In: Turner, J.B. (ed) Evolutionary Genetics of Fishes. Plenum Press, New York, 1-53.

Altschul, S. F., Gish, W., Miller, W., Myers, E. W. and Lipman, D. J. (1990). Basic local alignment search tool. Journal of Molecular Biology 215: 403-410.

Altschul, S. F., Madden, T. L., Schaffer, A. A., Zhang, J., Zhang, Z., Miller, W. and Lipman, D. J. (1997). Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research 25: 3389-3402.

Amores, A., Force, A., Yan, Y.-L., Joly, L., Amemiya, C., Fritz, A., Ho, R. K., Langeland, J., Prince, V., Wang, Y-L., Westerfield, M., Ekker, M., Postlethwait, J. H. (1998). Zebrafish hox clusters and vertebrate genome evolution. Science 282: 1711-1714.

Artieri, C. G., Mitchell, L. A., Ng, S. H. S., Parisotto, S. E., Danzmann, R. G., Hoyheim, B., Phillips, R. B., Morasch, M., Koop, B. F., Davidson, W. S. (2006). Identification of the sex-determining locus of Atlantic salmon (Salmo salar) on chromosome 2. Cytogenetic and Genome Research 112: 152-159.

Barbazuk, W. B., Korf, I., Kadavi, C., Heyen, J., Tate, S., Wun, E., Bedell, J. A., McPherson, J. D., and Johnson, S. L. (2000). The Syntenic Relationship of the Zebrafish and Human Genomes. Genome Research 10: 1351-1358.

Brown, C. W., Houston-Hawkins, D. E., Woodruff, T. K., and Matzuk, M. M. (2000). Insertion of Inhbb into the Inhba locus rescues the Inhba-null phenotype and reveals new activin functions. Nature Genetics 25: 453-457.

Burge, C. and Karlin, S. (1997). Prediction of complete gene structures in human genomic DNA. Journal of Molecular Biology 268: 78-94.

Burtis, K. C. and Baker, B. S. (1989). Drosophila doublesex gene controls somatic sexual differentiation by producing alternatively spliced mRNAs encoding related sex-specific polypeptides. Cell 56: 997-1010.

Charlesworth, D., Charlesworth, B., and Marais, G. (2005). Steps in the evolution of heteromorphic sex chromosomes. Heredity 95: 118-128.

Chevassus, B., Devaux, A., Chourrout, D., Jalabert, B. (1988). Production of YY rainbow trout males by self-fertilization of induced hermaphrodites. Journal of Heredity 79: 89-92.

Chiang, E. F.-L., Pai, C. I., Wyatt, M., Yan, Y-L., Postlethwait, J., Chung, B. (2001). Two Sox9 genes on duplicated zebrafish chromosomes: Expression of similar transcription activators in distinct sites. Developmental Biology 231: 149-163.

Christoffels, A., Koh, E. G. L., Chia, J.-m., Brenner, S., Aparicio, S., and Venkatesh, B. (2004). Fugu Genome Analysis Provides Evidence for a Whole-Genome Duplication Early During the Evolution of Ray-Finned Fishes. Molecular Biology and Evolution 21: 1146-1151.

Cnaani, A., Lee, B.-Y., Zilberman, N., Ozouf-Costaz, C., Hulata, G., Ron, M., D'Hont, A., Baroiller, J.-F., D'Cotta, H., Penman, D. J., Tomasino, E., Coutanceau, J.-P., Pepey, E., Shirak, A., Kocher, T. D. (2008). Genetics of Sex Determination in Tilapiine Species. Sexual Development 2: 43-54.

Cunado, N., Barrios, J., Miguel, E. S., Amaro, R., Fernandez, C., Hermida, M., and Santos, J. L. (2002). Synaptonemal complex analysis in oocytes and spermatocytes of threespine stickleback Gasterosteus aculeatus (Teleostei, Gasterosteidae). Genetica 114: 53-56.

Danzmann, R. G., Davidson, E. A., Ferguson, M. M., Gharbi, K., Koop, B. F., Lien, S., Lubieniecki, K. P., Moghadam, H. K., Park, J., Phillips, R. B. and Davidson, W. S. (2008). Distribution of ancestral proto-Actinopterygian chromosome arms within the genomes of 4R-derivative salmonid fishes (Rainbow trout and Atlantic salmon). BMC Genomics 9: 557.

Davidson, W. S., Huang, K., Fujiki, K., von Schalburg, K. R., Koop, B. F. (in press). The sex determining loci and the sex chromosomes in the family Salmonidae. Sexual Development.

Devlin, R. H., Biagi, C. A., and Smailus, D. E. (2001). Genetic mapping of Y-chromosomal DNA markers in Pacific salmon. Genetica 111: 43-58.

Devlin, R. H., Nagahama, Y. (2002). Sex determination and sex differentiation in fish: an overview of genetic, physiological, and environmental influences. Aquaculture 208: 191-364.

Ehrlich, J., Sankoff, D., and Nadeau, J. H. (1997). Synteny Conservation and Chromosome Rearrangements During Mammalian Evolution. Genetics 147: 289-296.

Ezaz, M. T., Harvey, S. C., Boonphakdee, C., Teale, A. J., McAndrew, B. J., and Penman, D. J. (2004). Isolation and physical mapping of sex-linked AFLP markers in Nile Tilapia (Oreochromis niloticus L.). Marine Biotechnology 6: 435-445.

Ezaz, T., Stiglec, R., Veyrunes, F., and Graves, J. A. M. (2006). Relationships between vertebrate ZW and XY sex chromosome systems. Current Biology 16: R736-R743.

Filatov, D. A. (2005). Evolutionary history of Silene latifolia sex chromosomes revealed by genetic mapping of four genes. Genetics 170: 975-979.

Foster, J. W., Dominguez-Steglich, M. A., Guioli, S., Kowk, G., Weller, P. A., Stevanovic, M., Weissenbach, J., Mansour, S., Young, I. D., Goodfellow, P. N., Brook, J. D., and Schafer, A. J. (1994). Campomelic dysplasia and autosomal sex reversal caused by mutations in an SRY-related gene. Nature 372: 525-530.

Gates, M. A., Kim, L., Egan, E. S., Cardozo, T., Sirotkin, H. I., Dougan, S. T., Lashkari, D., Abagyan, R., Schier, A. F., and Talbot, W. S. (1999). A Genetic Linkage Map for Zebrafish: Comparative Analysis and Localization of Gene and Expressed Sequences. Genome Research 9: 334-347.

Grasberger, H., Bell, G. I. (2005). Subcellular recruitment by TSG118 and TSPYL implicates a role for zinc finger protein 106 and a novel developmental pathway. The International Journal of Biochemistry & Cell Biology 37: 1421-1437.

Graves, J. A. M. (1995). The evolution of mammalian sex chromosomes and the origin of sex determining genes. Philosophical Transactions of the Royal Society Biological Sciences 350: 305-312.

Griffiths, R., Orr, K. J., Adam, A., and Barber, I. (2000). DNA sex identification in the three-spined stickleback. Journal of Fish Biology 57: 1331-1334.

Grutzner, F., Rens, W., Tsend-Ayush, E., El-Mogharbel, N., O'Brian, P. C. M., Jones, R. C., Ferguson-Smith, M. A., and Graves, J. A. M. (2004). In the platypus a meiotic chain of ten sex chromosomes shares genes with the bird Z and mammal X chromosomes. Nature 432: 913-917.

Haaf, T., and Schmid, M. (1984). An early stage of ZW/ZZ sex chromosome differentiation in Poecilia sphenops var. melanistica (Poeciliidae, Cyprinodontiformes). Chromosoma 89: 37-41.

Handley, L.-J. L., Ceplitis, H., and Ellegren, H. (2004). Evolutionary strata on the chicken Z chromosome: Implication for sex chromosome evolution. Genetics 167: 367-376.

Hunter, G. A., Donaldson, E. M., Goetz, F. W., and Edgell, P. R. (1982). Production of all female and sterile Coho salmon, and experimental evidence for male heterogamety. Transactions of the American Fisheries Society 111: 367-372.

Hunter, G. A., Donaldson, E. M., Stoss, J., and Baker, I. I. (1983). Production of monosex female groups of chinook salmon (Oncorhynchus tshawytscha) by the fertilization of normal ova with sperm from sex-reversed females. Aquaculture 33: 355-364.

145 Jaillon, 0., Aury, J-M., Brunet, F., Petit, J-L., Stange-Thomann, N., Mauceli, E., Bouneau, L., Fischer, c., Ozouf-Costaz, c., Bernot, A, Nicaud, S., Jaffe, D., Fisher, S., Lutfalla, G., Dossat, c., Segurens, B., Dasilva, c., Salanoubat, M., Levy, M., Boudet, N., Castellano, S., Anthouard, V., Jubin, c., Castelli, V., Katinka, M., Vacherie, B., Biemont, c., Skalli, Z., Cattolico, L., Poulain, J., de Berardinis, V., Cruaud, c., Duprat, S., Brottier, P., Coutanceau, J-P., Gouzy, J., Parra, G., Lardier, G., Chapple, c., McKernan, K J., McEwan, P., Bosak, S., Kellis, M., Volff, J-N., Guigo, R, Zody, M. c., Mesirov, J., Lindblad-Toh, K, Birren, B., Nusbaum, c., Kahn, D., Robinson-Rechavi, M, Laudet, V., Schachter, V., Quetier, F., Saurin, W., Scarpelli, c., Wincker, P., Lander, E. S., Weissenbach, J., and Crollius, H. R (2004). Genome duplication in the teleost fish Teraodon nigroviridis reveals the early vertebrate proto-karyotype. Nature 431: 946-957.

Johnstone, Rand Youngson, A F. (1984). The progeny of sex-inverted female Atlantic salmon (Salmo salar L.). Aquaculture 37: 179-182.

Johnstone, R, Simpsons, T. H., Youngson, A F., and Whitehead, C. 1979. Sex reversal in salmonid culture. Part II. The progeny of sex reversed rainbow trout. Aquaculture 18: 13-19.

Kai, W., Kikuchi, K, Fujita, M., Suetake, H., Fujiwara, A, Yoshiura, Y., Ototake, M., Venkatesh, B., Miyaki, K, and Suzuki, Y. (2005). A Genetic Linkage Map for the Tiger Pufferfish, Takifugu rubripes. Genetics 171: 227-238. Kasahara, M. (2007). The 2R hypothesis: an update. Current Opinion in Immunology 19: 547-552. Kasahara, M., Naruse, K., Sasaki S., Nakatani, Y., Qu, W., Ahsan, B., Yamada, T., Nagayasu, Y., Doi, K, Kasai, Y., Jindo, T., Kobayashi, D., Shimada, A, Toyoda, A, Kuroki, Y., Fujiyama, A., Sasaki, T., Shimizu A, Asakawa, S., Shimizu, N., Hashimoto, S., Yang, J., Lee, Y., Matsushima, K., Sugano, S., Sakaizumi, M., Narita, T., Ohishi, K, Haga, S., Ohta, F., Nomoto, H., Nogata, K, Morishita, T., Endo, T., Shin-I, T., Takeda, H., Morishita, S., and Kohara, Y. (2007). The medaka draft genome and insights into vertebrate genome evolution. Nature 447: 714-719.

Kent, W. J. (2002). BLAT - The BLAST-Like Alignment Tool. Genome Research 12: 656-664.

Kikuchi, K., Kai, W., Hosokawa, A., Mizuno, N., Suetake, H., Asahina, K., and Suzuki, Y. (2007). The Sex-Determining Locus in the Tiger Pufferfish, Takifugu rubripes. Genetics 175: 2039-2042.

Kohn, M., Hogel, J., Vogel, W., Minich, P., Kehrer-Sawatzki, H., Graves, J. A. M., and Hameister, H. (2006). Reconstruction of a 450-My-old ancestral vertebrate protokaryotype. Trends in Genetics 22: 203-210.

Kondo, M., Nanda, I., Hornung, U., Asakawa, S., Shimizu, N., Mitani, H., Schmid, M., Shima, A., and Schartl, M. (2003). Absence of the Candidate Male Sex-Determining Gene dmrt1b(Y) of Medaka from Other Fish Species. Current Biology 13: 416-420.

Kondo, M., Nanda, I., Hornung, U., Schmid, M., and Schartl, M. (2004). Evolutionary origin of the medaka Y chromosome. Current Biology 14: 1664-1669.

Kondrashov, A. S. (1993). Classification of hypotheses on the advantage of amphimixis. Journal of Heredity 84: 372-387.

Koopman, P., Schepers, G., Brenner, S., Venkatesh, B. (2004). Origin and diversity of the Sox transcription factor gene family: genome-wide analysis in Fugu rubripes. Gene 328: 177-186.

Krogh, A., Larsson, B., and Sonnhammer, E. L. L. (2001). Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. Journal of Molecular Biology 305: 567-580.

Kutateladze, T. G. (2006). Phosphatidylinositol 3-phosphate recognition and membrane docking by the FYVE domain. Biochimica et Biophysica Acta 1761: 868-877.

Lahn, B. T., and Page, D. C. (1999). Four evolutionary strata on the human X chromosome. Science 286: 964-967.

Larsson, M., Brundell, E., Jorgensen, P. M., Stahl, S., Haag, C. (1999). Characterization of a novel nucleolar protein that transiently associates with the condensed chromosomes in mitotic cells. The European Journal of Cell Biology 78: 382-390.

Lee, B.-Y., Hulata, G., and Kocher, T. D. (2004). Two unlinked loci controlling the sex of blue tilapia (Oreochromis aureus). Heredity 92: 543-549.

Lee, B.-Y., Penman, D. J., Kocher, T. D. (2003). Identification of a sex-determining region in Nile tilapia (Oreochromis niloticus) using bulked segregant analysis. Animal Genetics 34: 379-383.

Liu, Z., Moore, P. H., Ma, H., Ackerman, C. M., Ragiba, M., Yu, Q., Pearl, H. M., Kim, M. S., Charlton, J. W., Stiles, J. I., Zee, F. T., Paterson, A. H., and Ming, R. (2004). A primitive Y chromosome in papaya marks incipient sex chromosome evolution. Nature 427: 348-352.

Lynch, M. and Conery, J. S. (2000). The evolutionary fate and consequences of duplicate genes. Science 290: 1151-1154.

Lynch, M. and Force, A. G. (2000). The origin of interspecific genomic incompatibility via gene duplication. American Naturalist 156: 590-605.

Mank, J. E. and Avise, J. C. (2006). Phylogenetic conservation of chromosome numbers in Actinopterygiian fishes. Genetica 127: 321-327.

Mank, J. E., Promislow, D. E., and Avise, J. C. (2006). Evolution of alternative sex-determining mechanisms in teleost fishes. Biological Journal of the Linnean Society 87: 83-93.

Mannan, A. U., Krawen, P., Sauter, S. M., Boehm, J., Chronowska, A., Paulus, W., Neesen, J., and Engel, W. (2006). ZFYVE27 (SPG33), a Novel Spastin-Binding Protein, is Mutated in Hereditary Spastic Paraplegia. The American Journal of Human Genetics 79: 351-357.

Marchand, O., Govoroun, M., D'Cotta, H., McMeel, O., Lareyre, J-J., Bernot, A., Laudet, V., Guiguen, Y. (2000). DMRT1 expression during gonadal differentiation and spermatogenesis in the rainbow trout, Oncorhynchus mykiss. Biochimica et Biophysica Acta 1493: 180-187.

Matsuda, M. (2005). Sex Determination in the Teleost Medaka, Oryzias latipes. Annual Review of Genetics 39: 293-307.

Matsuda, M., Nagahama, Y., Shinomiya, A., Sato, T., Matsuda, C., Kobayashi, T., Morrey, C. E., Shibata, N., Asakawa, S., Shimizu, N., Hori, H., Hamaguchi, S., and Sakaizumi, M. (2002). DMY is a Y-specific DM-domain gene required for male development in the medaka fish. Nature 417: 559-563.

Menkis, A., Jacobson, D. J., Gustafsson, T., Johannesson, H. (2008). The mating-type chromosome in the filamentous ascomycete Neurospora tetrasperma represents a model for early evolution of sex chromosomes. PLoS Genetics 4: 1-10.

Meyer, A., Van de Peer, Y. (2005). From 2R to 3R: evidence for a fish-specific genome duplication (FSGD). Bioessays 27: 937-945.

Moen, T., Hayes, B., Baranski, M., Berg, P. R., Kjøglum, S., Koop, B. F., Davidson, W. S., Omholt, S. W., and Lien, S. (2008). A linkage map of the Atlantic salmon (Salmo salar) based on EST-derived SNP markers. BMC Genomics 9: 223-236.

Moghadam, H. K., Ferguson, M. M., Danzmann, R. G. (2005a). Evolution of Hox Clusters in Salmonidae: A Comparative Analysis Between Atlantic Salmon (Salmo salar) and Rainbow Trout (Oncorhynchus mykiss). Journal of Molecular Evolution 61: 636-649.

Moghadam, H. K., Ferguson, M. M., Danzmann, R. G. (2005b). Evidence for Hox Gene Duplication in Rainbow Trout (Oncorhynchus mykiss): A Tetraploid Model Species. Journal of Molecular Evolution 61: 804-818.

Mulley, J. and Holland, P. (2004). Small genome, big insights. Nature 431: 916-917.

Nakatani, Y., Takeda, H., Kohara, Y., and Morishita, S. (2007). Reconstruction of the vertebrate ancestral genome reveals dynamic genome reorganization in early vertebrates. Genome Research 17: 1254-1265.

Nakayasu, H., and Berezney, R. (1991). Nuclear matrins: identification of the major nuclear matrix proteins. Proceedings of the National Academy of Sciences 88: 10312-10316.

Nanda, I., Kondo, M., Hornung, U., Asakawa, S., Winkler, C., Shimizu, A., Shan, Z., Haaf, T., Shimizu, N., Shima, A., Schmid, M., and Schartl, M. (2002). A duplicated copy of DMRT1 in the sex-determining region of the Y chromosome of the medaka, Oryzias latipes. Proceedings of the National Academy of Sciences 99: 11778-11783.

Nanda, I., Shan, Z., Schartl, M., Burt, D. W., Koehler, M., Nothwang, H.-G., Grutzner, F., Paton, I. R., Windsor, D., Dunn, I., Engel, W., Staeheli, P., Mizuno, S., Haaf, T., and Schmid, M. (1999). 300 million years of conserved synteny between chicken Z and human chromosome 9. Nature Genetics 21: 258-259.

Nanda, I., Zend-Ajusch, E., Shan, Z., Grutzner, F., Schartl, M., Burt, D. W., Koehler, M., Fowler, V. M., Goodwin, G., Schneider, W. J., Mizuno, S., Dechant, G., Haaf, T., and Schmid, M. (2000). Conserved synteny between the chicken Z sex chromosome and human chromosome 9 includes the male regulatory gene DMRT1: a comparative (re)view on avian sex determination. Cytogenetics and Cell Genetics 89: 67-78.

Naruse, K., Fukamachi, S., Mitani, H., Kondo, M., Matsuoka, T., Kondo, S., Hanamura, N., Morita, Y., Hasegawa, K., Nishigaki, R., Shimada, A., Wada, H., Kusakabe, T., Suzuki, N., Kinoshita, M., Kanamori, A., Terado, T., Kimura, H., Nonaka, M., and Shima, A. (2000). A Detailed Linkage Map of Medaka, Oryzias latipes: Comparative Genomics and Genome Evolution. Genetics 154: 1773-1784.

Naruse, K., Tanaka, M., Mita, K., Shima, A., Postlethwait, J., and Mitani, H. (2004). A Medaka Gene Map: The Trace of Ancestral Vertebrate Proto-Chromosomes Revealed by Comparative Gene Mapping. Genome Research 14: 820-828.

Ng, S. H. S., Artieri, C. G., Bosdet, I. E., Chiu, R., Danzmann, R. G., Davidson, W. S., Ferguson, M. M., Fjell, C. D., Hoyheim, B., Jones, S. J. M., de Jong, P. J., Koop, B. F., Krzywinski, M. I., Lubieniecki, K., Marra, M. A., Mitchell, L. A., Mathewson, C., Osoegawa, K., Parisotto, S. E., Phillips, R. B., Rise, M. L., von Schalburg, K. R., Schein, J. E., Shin, H., Siddiqui, A., Thorsen, J., Wye, N., Yang, G., Zhu, B. (2005). A physical map of the genome of Atlantic salmon, Salmo salar. Genomics 86: 396-404.

Nicolas, M., Marais, G., Hykelova, V., Janousek, B., Laporte, V., Vyskot, B., Mouchiroud, D., Negrutiu, I., Charlesworth, D., Monegaer, F. (2005). A gradual process of recombination restriction in the evolutionary history of the sex chromosomes in dioecious plants. PLoS Biology 3: 47-56.

Ning, Z., Cox, A. J., Mullikin, J. C. (2001). SSAHA: A Fast Search Method for Large DNA Databases. Genome Research 11: 1725-1729.

Ohno, S. (1970). Evolution by Gene Duplication. Springer Verlag: New York.

Onozato, H. (1989). Androgenesis in masu salmon. Physiology and Ecology Japan 1: 543.

Peichel, C. L., Ross, J. A., Matson, C. K., Dickson, M., Grimwood, J., Schmutz, J., Myers, R. M., Mori, S., Schluter, D., and Kingsley, D. M. (2004). The Master Sex-Determination Locus in Threespine Stickleback is on a Nascent Y Chromosome. Current Biology 14: 1416-1424.

Phillips, R. B. and Rab, P. (2001). Chromosome evolution in the Salmonidae (Pisces): An update. Biological Reviews 76: 1-25.

Phillips, R. B., DeKoning, J., Morasch, M. R., Park, L. K., Devlin, R. H. (2007). Identification of the sex chromosome pair in chum salmon (Oncorhynchus keta) and pink salmon (Oncorhynchus gorbuscha). Cytogenetic and Genome Research 116: 298-304.

Phillips, R. B., Konkol, N. R., Reed, K. M., and Stein, J. D. (2001). Chromosome painting supports lack of homology among sex chromosomes in Oncorhynchus, Salmo, and Salvelinus (Salmonidae). Genetica 111: 119-121.

Phillips, R. B., Morasch, M. R., Park, L. K., Naish, K. A., Devlin, R. H. (2005). Identification of the sex chromosome pair in coho salmon (Oncorhynchus kisutch): lack of conservation of the sex linkage group with chinook salmon (Oncorhynchus tshawytscha). Cytogenetic and Genome Research 111: 166-170.

Pires-daSilva, A. (2007). Evolution of the control of sexual identity in nematodes. Seminars in Cell & Developmental Biology 18: 362-370.

Quintana-Murci, L., Jamain, S., Fellous, M. (2001). The origin and evolution of mammalian sex chromosomes. Life Sciences 324: 1-11.

Raymond, C. S., Parker, E. D., Kettlewell, J. R., Brown, L. G., Page, D. C., Kusz, K., Jaruzelska, J., Reinberg, Y., Flejter, W., Bardwell, V. J., Hirsch, B., and Zarkower, D. (1999). A region of human chromosome 9p required for testis development contains two genes related to known sexual regulators. Human Molecular Genetics 8: 989-996.

Raymond, C. S., Shamu, C. E., Shen, M. M., Seifert, K. J., Hirsch, B., Hodgkin, J., and Zarkower, D. (1998). Evidence for evolutionary conservation of sex-determining genes. Nature 391: 691-695.

Ross, J. A. and Peichel, C. L. (2008). Molecular cytogenetic evidence of rearrangements on the Y chromosome of the threespine stickleback fish. Genetics 179: 2173-2182.

Rychlik, W. and Rhoads, R. E. (1989). A computer program for choosing optimal oligonucleotides for filter hybridization, sequencing and in vitro amplification of DNA. Nucleic Acids Research 17: 8543-8551.

Sakamoto, T., Danzmann, R. G., Gharbi, K., Howard, P., Ozaki, A., Khoo, S. K., Woram, R. A., Okamoto, N., Ferguson, M. M., Holm, L.-E., Guyomard, R., and Hoyheim, B. (2000). A microsatellite linkage map of rainbow trout (Oncorhynchus mykiss) characterized by large sex-specific differences in recombination rates. Genetics 155: 1331-1345.

Sarropoulou, E., Nousdili, D., Magoulas, A., Kotoulas, G. (2008). Linking the genomes of nonmodel teleosts through comparative genomics. Marine Biotechnology 10: 227-233.

Schartl, M. (2004). A comparative view on sex determination in medaka. Mechanisms of Development 121: 639-645.

Shetty, S., Kirby, P., Zarkower, D., and Graves, J. A. M. (2002). DMRT1 in a ratite bird: evidence for a role in sex determination and discovery of a putative regulatory element. Cytogenetic and Genome Research 99: 245-251.

Sidow, A. (1996). Gen(om)e duplications in the evolution of early vertebrates. Current Opinion in Genetics and Development 6: 715-722.

Smith, J. and Voss, S. R. (2007). Bird and Mammal Sex-Chromosome Orthologs Map to the Same Autosomal Region in a Salamander (Ambystoma). Genetics 177: 607-613.

Stein, J. D., Reed, K. M., and Phillips, R. B. (2002). Isolation and characterization of a sex-linked microsatellite locus from the lake trout Y chromosome. Environmental Biology of Fishes 64: 211-216.

Stein, J., Phillips, R. B., and Devlin, R. H. (2001). Identification of the Y chromosome in chinook salmon (Oncorhynchus tshawytscha). Cytogenetic and Genome Research 92: 108-110.

Stothard, P. and Pilgrim, D. (2003). Sex-determination gene and pathway evolution in nematodes. BioEssays 25: 221-231.

Takamatsu, N., Kanda, H., Ito, M., Yamashita, A., Yamashita, S., Shiba, T. (1997). Rainbow trout SOX9: cDNA cloning, gene structure and expression. Gene 202: 167-170.

Takehana, Y., Demiyah, D., Naruse, K., Hamaguchi, S., and Sakaizumi, M. (2007a). Evolution of Different Y Chromosomes in Two Medaka Species, Oryzias dancena and O. latipes. Genetics 175: 1335-1340.

Takehana, Y., Naruse, K., Hamaguchi, S., and Sakaizumi, M. (2007b). Evolution of ZZ/ZW and XX/XY sex-determination systems in the closely related medaka species, Oryzias hubbsi and O. dancena. Chromosoma 116: 463-470.

Thorgaard, G. H. (1977). Heteromorphic Sex Chromosomes in Male Rainbow Trout. Science 196: 900-902.

Thorsen, J., Zhu, B., Frengen, E., Osoegawa, K., de Jong, P. J., Koop, B. F., Davidson, W. S., and Hoyheim, B. (2005). A highly redundant BAC library of Atlantic salmon (Salmo salar): an important tool for salmon projects. BMC Genomics 6: 50.

Vigier, B., Forest, M. G., Eychenne, B., Bezard, J., Garrigou, O., Robel, P., and Josso, N. (1989). Anti-Mullerian hormone produces endocrine sex reversal of fetal ovaries. Proceedings of the National Academy of Sciences 86: 3684-3688.

Volff, J.-N. (2005). Genome evolution and biodiversity in teleost fish. Heredity 94: 280-294.

Volff, J.-N. and Schartl, M. (2001). Variability of genetic sex determination in poeciliid fishes. Genetica 111: 101-110.

Volff, J.-N. and Schartl, M. (2002). Sex determination and sex chromosome evolution in the medaka, Oryzias latipes, and the platyfish, Xiphophorus maculatus. Cytogenetic and Genome Research 99: 170-177.

Volff, J.-N., Nanda, I., Schmid, M., Schartl, M. (2007). Governing sex determination in fish: regulatory putsches and ephemeral dictators. Sexual Development 1: 85-99.

Wagner, T., Wirth, J., Meyer, J., Zabel, B., Held, M., Zimmer, J., Pasantes, J., Bricarelli, F. D., Keutel, J., Hustert, E., Wolf, U., Tommerup, N., Schempp, W., and Scherer, G. (1994). Autosomal sex reversal and campomelic dysplasia are caused by mutations in and around the SRY-related gene SOX9. Cell 79: 1111-1120.

Waters, P. D., Wallis, M. C., Graves, J. A. M. (2007). Mammalian sex - Origin and evolution of the Y chromosome and SRY. Seminars in Cell & Developmental Biology 18: 389-400.

Wolfe, K. H. (2001). Yesterday's polyploids and the mystery of diploidization. Nature Reviews Genetics 2: 333-341.

Woods, I. G., Kelly, P. D., Chu, F., Ngo-Hazelett, P., Yan, Y-L., Huang, H., Postlethwait, J. H., and Talbot, W. S. (2000). A Comparative Map of the Zebrafish Genome. Genome Research 10: 1903-1914.

Woram, R. A., Gharbi, K., Sakamoto, T., Hoyheim, B., Holm, L-E., Naish, K., McGowan, C., Ferguson, M. M., Phillips, R. B., Stein, J., Guyomard, R., Cairney, M., Taggart, J. B., Powell, R., Davidson, W., and Danzmann, R. G. (2003). Comparative Genome Analysis of the Primary Sex-Determining Locus in Salmonid Fishes. Genome Research 13: 272-280.

Wright, J. E., Johnson, K., Hollister, A., May, B. Meiotic models to explain classical linkage, pseudolinkage, and chromosome pairing in tetraploid derivative salmonid genomes. Isozymes: Current Topics in Biological and Medical Research 10: 239-260.

Yamaguchi, A., Lee, K. H., Fujimoto, H., Kadomura, K., Yasumoto, S., Matsuyama, M. (2006). Expression of the DMRT gene and its roles in early gonadal development of the Japanese pufferfish Takifugu rubripes. Comparative Biochemistry and Physiology, Part D 1: 59-68.

Yokoi, H., Kobayashi, T., Tanaka, M., Nagahama, Y., Wakamatsu, Y., Takeda, H., Araki, K., Morohashi, K., Ozato, K. (2002). Sox9 in a Teleost Fish, Medaka (Oryzias latipes): Evidence for diversified function of Sox9 in gonad differentiation. Molecular Reproduction and Development 63: 5-16.

Zhang, J. (2003). Evolution by gene duplication: an update. Trends in Ecology and Evolution 18: 292-298.

Zhang, J. (2004). Evolution of DMY, a Newly Emergent Male Sex-Determination Gene of Medaka Fish. Genetics 166: 1887-1895.

Zhang, J. and Madden, T. L. (1997). PowerBLAST: A new network BLAST application for interactive or automated sequence analysis and annotation. Genome Research 7: 649-656.

Zhou, R., Liu, L., Guo, Y., Yu, H., Cheng, H., Huang, X., Tiersch, T. R., and Berta, P. (2003). Similar gene structure of two Sox9a genes and their expression patterns during gonadal differentiation in a teleost fish, rice field eel (Monopterus albus). Molecular Reproduction and Development 66: 211-217.
