Evolutionary History and Whole Genome Sequence of Pejerrey (Odontesthes bonariensis): New Insights into Sex Determination in Fishes

by Daniela Campanella

B.Sc. in Biology, July 2009, Universidad Nacional de La Plata, Argentina

A Dissertation submitted to

The Faculty of The Columbian College of Arts and Sciences of The George Washington University in partial fulfillment of the requirements for the degree of Doctor of Philosophy

January 31, 2015

Dissertation co-directed by

Guillermo Ortí, Louis Weintraub Professor of Biology

Elisabet Caler, Program Director at National Heart, Lung and Blood Institute, NIH

The Columbian College of Arts and Sciences of The George Washington University certifies that Daniela Campanella has passed the Final Examination for the degree of Doctor of Philosophy as of December 12th, 2014. This is the final and approved form of the dissertation.

Evolutionary History and Whole Genome Sequence of Pejerrey (Odontesthes bonariensis): New Insights into Sex Determination in Fishes

Daniela Campanella

Dissertation Research Committee:

Guillermo Ortí, Louis Weintraub Professor of Biology, Dissertation Co-Director

Elisabet Caler, Program Director at National Heart, Lung and Blood Institute, NIH, Dissertation Co-Director

Hernán Lorenzi, Assistant Professor in Bioinformatics Department, J. Craig Venter Institute Rockville Maryland, Committee Member

Jeremy Goecks, Assistant Professor of Computational Biology, Committee Member

Copyright 2015 by Daniela Campanella. All rights reserved.

Dedication

The author wishes to dedicate this dissertation to:

My love, Ford, for his unconditional support and inspiration. For teaching me that admiration towards each other’s work is the fundamental fuel to go anywhere.

My family and friends, for being there, meaning “there” everywhere and whenever.

My grandpa Hugo, a pejerrey lover who knew how to fish, cook and enjoy the “silver arrows”. I was never a good fishing partner, but at least I caught a pejerrey genome!

Acknowledgements

The author wishes to say thank you to:

My co-advisor Lis. You were not only an incredible advisor and mentor, but also a friend.

Thank you, Emmanuel & Mila, for providing me with a home away from home.

My co-advisor Guillermo, for giving me the opportunity to move forward in my career, and for your guidance, enthusiasm and kind words during tough times.

To Hernán, for taking over and guiding me through the bioinformatics jungle. For patiently responding to my hundreds of emails with silly questions. Thank you!

To my lab-mates, Lily Hughes, Andrew Thompson, Kerry Mullaney, Roberto “Tito” Cifuentes, Ricardo Betancur-R, Dahiana Arcila and Eva Rueda, for teaching me about PCR, fishes and life as a graduate student. Some of you only stayed for a while but made a big difference, and taught me so much. Thank you.

Students, friends (Sanghi, Thiago, Chuy, Joey, Cristina), staff, and faculty from the GWU Biology Department, thank you all.

To Juani Fernandino & Gustavo Somoza from IIB-INTECH, Chascomús. For sharing your pejerrey knowledge with me, for the countless hours of interesting conversations and the inspiration to explore.

To G. Allen, R. Cifuentes, V. Cussac, A. Gosztonyi, J. Graf, E. Habit, G. Lange, M. Loureiro, D. Lumbantobing, A. Saunders, L. Smith, J. Sparks and the many additional people who assisted with collecting, curating, and providing the specimens for the Chapter 2 study. To K. Mullaney for helping with DNA amplifications, and paleontologist M. E. Raffi, who contributed observations on Argentine fossils. To R. Betancur-R, who helped with data analysis.

To the US National Science Foundation for providing grants DEB 0918073 (to Kyle R. Piller, co-author of Chapter 2), DEB-1019308 and OISE-0530267 (to Guillermo Ortí, co-author of Chapter 2).

To The George Washington University, for funding part of my work through the Weintraub Fellowship, Harlan funds, and startup funds to my co-advisor Guillermo Ortí.

To the J. Craig Venter Institute in Rockville, Maryland for providing the computational resources, training and fellowship.

Abstract of Dissertation

Evolutionary History and Whole Genome Sequence of Pejerrey (Odontesthes bonariensis): New Insights into Sex Determination in Fishes

Recent reduction in the cost of DNA sequencing has created unprecedented opportunities to obtain genomic resources for non-model organisms. The main product of this dissertation is the whole-genome sequence of the pejerrey, Odontesthes bonariensis (Atherinopsidae, Atheriniformes, Teleostei), a freshwater fish with high value for commercial and recreational fisheries and an emerging model system to study the evolution of sex determination in vertebrates. Genomic resources have the potential to quickly expand scientific knowledge by providing direct access to large genetic datasets. This dissertation introduces the first version of the pejerrey genome assembly and annotation, based on shotgun sequencing of three genomic libraries with different insert sizes on the Illumina platform.

The pejerrey is one of the few fish species known to undergo temperature-dependent sex determination (TSD). Although a major sex-determining gene has been identified recently in several species of Odontesthes, the temperature to which pejerrey eggs or larvae are exposed during the first weeks of life is a major factor determining phenotypic sex, overriding the effect of the genotype. A direct application of the newly assembled and annotated pejerrey genome provides insight into the regulatory network affecting TSD in fishes. This study reveals potential mechanisms to explain how genetic, environmental, and chemical factors interact in a sex-determining network during key developmental stages of pejerrey. A new perspective is presented on the role of steroid hormones affecting expression of a conserved genetic toolkit shared by species with different sex determination systems.

To enhance the comparative value of the new genomic resources and place the pejerrey in a well-resolved phylogenetic context, this dissertation provides a new phylogenetic hypothesis based on analyses of new sequence data collected for eight molecular markers for a representative sample of 103 atheriniform species, covering two thirds of the genera in this order. The new phylogenetic hypothesis is used to recommend some changes in the current classification and is calibrated with six carefully chosen fossil taxa to provide an explicit timeframe for the diversification of this group. Ancestral habitat reconstructions are inferred to test biogeographic hypotheses that explain the current distribution of marine and freshwater taxa. Post-Gondwanan divergence times among families are consistent with extensive marine dispersal along the margins of continents with repeated invasion of freshwater habitats.

Table of Contents

Dedication
Acknowledgements
Abstract of Dissertation
List of Figures
List of Tables

Chapter 1: Introduction and Overview
References

Chapter 2: Multi-locus fossil calibrated phylogeny of Atheriniformes
Introduction
Materials & Methods
Results
Discussion
References

Chapter 3: The draft genome of Odontesthes bonariensis
Introduction
Materials & Methods
Results
Discussion and Conclusions
References

Chapter 4: The draft genome of Odontesthes bonariensis
Introduction
Materials & Methods
Results
Discussion
Conclusions
References

List of Figures

Figure 1.1
Figure 1.2
Figure 1.3
Figure 2.1
Figure 2.2 A & B
Figure 2.3
Figure 2.4
Figure 3.1
Figure 3.2
Figure 3.3
Figure 3.4
Figure 3.5
Figure 3.6
Figure 3.7
Figure 3.8
Figure 3.9
Figure 3.10
Figure 3.11
Figure 3.12
Figure 3.13
Figure 4.1
Figure 4.2
Figure 4.3

List of Tables

Table 1.1
Table 2.1
Table 2.2
Table 2.3
Table 2.4
Table 2.5
Table 3.1
Table 3.2
Table 3.3
Table 3.4
Table 3.5
Table 3.6
Table 3.7
Table 3.8
Table 4.1
Table 4.2
Supplementary Table 4.3

Chapter 1. Introduction and Overview

1.1. The pejerrey, Odontesthes bonariensis (Valenciennes 1835)

Odontesthes bonariensis (Valenciennes 1835) is a slender, silvery fish with a wide and conspicuous lateral line and two small dorsal fins (Lahille, 1929; Ringuelet, 1975), similar to most species in its family, Atherinopsidae. It is commonly known as pejerrey or peixe-rei in South America, a vernacular name also applied to other species of the same genus. Elsewhere in the world it is often referred to as the South American silverside (Froese and Pauly, 2014; Lahille, 1929). Pejerrey is native to inland streams and lakes in Argentina, Uruguay, and southern Brazil and is most abundant in shallow lakes in Buenos Aires province, Argentina (Dyer, 1993; Dyer, 1997). Although primarily a freshwater species, O. bonariensis is euryhaline (capable of tolerating variable salt concentrations) and can also inhabit chemically complex environments (Gómez et al., 2006), especially when compared to closely related species, a trait thought to have contributed to the expansion of its native distribution (Grosman, 2002). Pejerrey specimens reach the largest recorded size (52 cm total length) of any atherinopsid or atheriniform (Dyer, 2006), but the average length of mature adults in wild populations is closer to 25 cm (Lahille, 1929). Partly for this reason, and also because of the quality and market value of its flesh, pejerrey is a highly popular species in South American commercial and sport fisheries (Grosman, 2002; Lopez et al., 2001).

Pejerrey aquaculture started around Chascomús Lake, Argentina, in 1904 (Evermann and Jenkins, 1891; Somoza et al., 2008; Valette, 1939), primarily to facilitate the distribution of juveniles to stock man-made reservoirs and lakes. Brazilian aquaculture of this species was established in dos Quadros Lagoon, RS, starting in 1943 (Kleerekoper, 1949). The species was subsequently exported to Japan, where it has been cultured since 1966 (Ohashi, 2004), and became so popular that pejerrey aquaculture is currently practiced in 21 Japanese provinces (Grosman, 2002). Italy imported pejerrey aquaculture in 1974 (Tortonese, 1985). Further introductions for stocking purposes spread this species into Chile (Riegel, 1960) and Bolivia in the late 1940s, entering Lake Titicaca around 1955 (Dyer, 2006). In Argentina, pejerrey was introduced to lakes in Patagonia, in many cases into the native range of its congener O. hatcheri (the Patagonian silverside or pejerrey patagónico). In many localities O. bonariensis seems to have replaced O. hatcheri, and in others the two species seem to form hybrid swarms (Crichigno et al., 2014). There is circumstantial evidence that artificial insemination practices carried out by aquaculturists for stocking purposes (Dyer, 2006) may have contributed to this process. Modern aquaculture efforts in Argentina include a wide variety of rearing conditions such as fish tanks, artificial ponds and cages (Berasain et al., 2002). Recently, intensive culture in tanks has achieved full production cycles (Berasain et al., 2002; Velasco et al., 2008). However, pejerrey aquaculture has not reached the desired scale for mass production, mostly because of large population losses during larval and juvenile life stages (de Souza et al., 2013). Some suggest that the lack of success in pejerrey aquaculture is due to knowledge gaps in its basic biology and scarcity of technological resources (Somoza et al., 2008). Much effort, however, has been devoted to generating new knowledge on the physiology of pejerrey that could be directly applied to aquaculture methods, including investigations of gonadal apoptosis (Yamamoto et al., 2013), germ cell transplantation (Majhi et al., 2014), stress (Dornelles Zebral et al., 2014), sperm damage from cryopreservation (Garriz and Miranda, 2013), and dietary effects on growth and gene expression (Gómez-Requeni et al., 2012, 2013). Other studies have focused on growth patterns (Llompart et al., 2013), cage aquaculture (Campanella et al., 2013; de Souza et al., 2013) and environmental toxicology (Pérez et al., 2012). Due to its economic importance, O. bonariensis has been included in the list of “Fishes of the Genome 10K” (Bernardi et al., 2012), a project aiming to assemble a collection of DNA sequences representing the genomes of 10,000 vertebrate species (www.genome10k.org) (see 1.3 for more information).

Pejerreyes typically live and feed near the surface, with a diet dominated by zooplankton during early life stages that expands to include snails and small insects in adults (Freyre et al., 2009; Ringuelet, 1975). The reproductive cycle of pejerrey is biannual: two cohorts are produced per year, one in September-October and another in March-April (Ringuelet, 1943). Sex determination in O. bonariensis is greatly influenced by the water temperature experienced by developing larvae during a thermosensitive period in the first weeks after hatching (Strüssmann et al., 1997). This peculiar biological feature is known as temperature-dependent sex determination, or TSD (Devlin and Nagahama, 2002). The reproductive physiology of O. bonariensis, including TSD, has been extensively studied but remains far from fully understood (Fernandino et al., 2013a; Fernandino et al., 2013b; Miranda et al., 2013; Ospina-Alvarez and Piferrer, 2008; Yamamoto et al., 2014). This topic is further introduced in a later section and discussed extensively in Chapter 4 of this dissertation, where I provide new genomic insights into the regulatory network associated with TSD.

In summary, the regional importance of O. bonariensis and its potential value for international aquaculture underline the need for resources that will expand our knowledge and leverage current aquaculture efforts. Genomic information has the potential to improve many different areas of study. Once the pejerrey genome becomes publicly available, researchers will have easier, faster and more cost-effective access to pejerrey genomics, transcriptomics, and proteomics. A pejerrey genomic database will provide tools to accelerate research in pejerrey physiology, ecology, and toxicology.

1.2. Phylogenetic position of O. bonariensis and diversification of atheriniform fishes

Odontesthes bonariensis is a South American species of the circumtropical order Atheriniformes (Dyer, 2006). It is classified within the family Atherinopsidae, an array of marine, estuarine, and freshwater fishes that are commonly referred to as New World silversides (Dyer, 1997). Species of Atherinopsidae are endemic to the marine and freshwater environments of Central, North and South America (Dyer, 1997). Other commonly known species in Atherinopsidae include the grunion (Leuresthes sp.), false grunion (Colpichthys regis), charal (Chirostoma sp.), topsmelt (Atherinops affinis) and jacksmelt (Atherinopsis californiensis). A phylogeny for these taxa based on morphological characters (Dyer, 1997, 1998) is shown in Figure 1.1. A more detailed taxonomic description of species richness in Atherinopsidae, as well as in the other families within the order Atheriniformes, is provided in Chapter 2.

Figure 1.1. Phylogeny of atherinopsid genera based on morphological characters taken from Dyer (2006)

Phylogenetic relationships within the order have been intensely studied but, after decades of morphological and molecular studies, much disagreement remains. Atheriniformes are distributed in both New World and Old World regions and are correspondingly classified in two suborders: Atherinopsoidei (New World) and Atherinoidei (Old World). This distribution could be explained either by a vicariant event, as suggested by previous studies, or by marine dispersal, which requires a marine common ancestor. The aim of Chapter 2 is to provide a robust phylogenetic framework to inform taxonomic decisions and test biogeographic hypotheses. A newly generated multi-locus, fossil-calibrated phylogeny is used to perform a habitat transition analysis, to estimate the time of diversification of the main atheriniform clades, and to test whether oceanic dispersal is a likely explanation of their current distribution.

1.3. Fish Genomes and the Pejerrey Genome Project

Bony fishes have inhabited the Earth since the Silurian period (Betancur et al., 2013) and are the most diverse group of vertebrates, with about 32,000 valid species (Eschmeyer, 2013). Such species richness reflects an extraordinary diversity of body forms, as well as evolutionary adaptations to all kinds of aquatic environments (Greenwood et al., 1966).

There are currently 5516 eukaryote genome projects in different stages of completion, among which 824 are for species, including 108 teleost fish species, according to the GOLD database at NCBI (Pagani et al., 2012) (ncbi.nih). The current number of

teleost genomes available by 2014 covers only about 0.3% of all recognized species. Furthermore, the stage of completeness of these genome projects is highly variable. For example, only six are fully sequenced and annotated to the chromosome level (spotted gar, zebrafish, flounder, medaka, tilapia, and green spotted pufferfish). The rest of the available genomes are either unassembled, assembled in contigs or scaffolds, or roughly annotated (Figure 1.2). A list of published fish genomes is available in Table 1.1.
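The coverage figure quoted above follows from two counts given in this section: 108 teleost genome projects and about 32,000 valid species. A minimal sketch of that arithmetic, in which the variable names are illustrative only:

```python
# Counts quoted in the text of this section.
teleost_genome_projects = 108
valid_fish_species = 32_000  # "about 32,000 valid species"

# Share of recognized species with a genome project, as a percentage.
coverage_pct = 100 * teleost_genome_projects / valid_fish_species

print(f"About {coverage_pct:.1f}% of species have a genome project")
```

Because the species count is itself approximate, the result is best read as an order of magnitude (a few tenths of one percent) rather than a precise value.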

The Japanese fugu, Takifugu rubripes (Aparicio et al., 2002), and the green-spotted pufferfish, Tetraodon nigroviridis (Jaillon et al., 2004), were the first fish species to have their genomes sequenced, owing to their compact genome sizes (Hinegardner and Rosen, 1972; Roest Crollius and Weissenbach, 2005). The advantages of sequencing small genomes include a reduction in time, workforce, and costs. However, the development and physiology of pufferfish and fugu were still largely unknown and, at the time their genomes were released, there was little information about the genes discovered and their biological function (Roest Crollius and Weissenbach, 2005). Therefore, two important biological fish models were subsequently sequenced: zebrafish (Danio rerio) and medaka (Oryzias latipes). The former is a well-known model for vertebrate development (Grunwald and Eisen, 2002), and the latter was chosen because of the existence of fertile inbred strains and embryonic stem cells (Hong et al., 1996). Notably, the medaka was the first fish species for which a single locus was found to be responsible for sex determination (Matsuda et al., 2002).

Figure 1.2. Whole genome projects of Teleost fishes in a phylogenetic context. Teleosts are shown within a purple rectangle. The degree of project completeness is shown with different colors (finished; in progress with scaffolds only; in progress with contigs only). About 30 more genome projects are in the sequencing stage and were not included in the figure. The phylogeny is based on a multi-locus data set (Betancur et al., 2013). Whole genome duplications are shown in yellow circles, as inferred by Sato and Nishida (Sato and Nishida, 2010).

Table 1.1. Whole genomes of teleost fishes that have been published as complete genomes, drafts, or genomic maps, in ascending chronological order. Assignment of taxonomic orders follows Betancur et al. (2013).

Order | Genus | Species | Common Name | Genome Publication
Tetraodontiformes | Takifugu | rubripes | Fugu | (Aparicio et al., 2002)
Tetraodontiformes | Tetraodon | nigroviridis | Green spotted puffer | (Jaillon et al., 2004)
Beloniformes | Oryzias | latipes | Medaka | (Kasahara, 2007)
Cichliformes | Labeotropheus | fuelleborni | Blue mbuna | (Loh et al., 2008)
Cichliformes | Maylandia | zebra | Zebra mbuna | (Loh et al., 2008)
Cichliformes | Mchenga | conophorus | | (Loh et al., 2008)
Cichliformes | Melanochromis | auratus | Auratus cichlid | (Loh et al., 2008)
Cichliformes | Rhamphochromis | esox | Cichlid | (Loh et al., 2008)
Cyprinodontiformes | Nothobranchius | kuhntae | Beira killifish | (Reichwald et al., 2009)
Cyprinodontiformes | Nothobranchius | furzeri | Turquoise killifish | (Reichwald et al., 2009)
incertae sedis | Dicentrarchus | labrax | European sea bass | (Kuhl et al., 2010)
Gadiformes | Gadus | morhua | Atlantic cod | (Larsen et al., 2011)
Anguilliformes | Anguilla | japonica | Japanese eel | (Henkel et al., 2012b)
Anguilliformes | Anguilla | anguilla | European eel | (Henkel et al., 2012a)
Perciformes | Gasterosteus | aculeatus | Threespine stickleback | (Jones et al., 2012)
Cichliformes | Oreochromis | niloticus | Nile tilapia | (Guyon et al., 2012)
Cypriniformes | Danio | rerio | Zebrafish | (Howe et al., 2013)
Cyprinodontiformes | Xiphophorus | maculatus | Southern platyfish | (Schartl et al., 2013b)
Scombriformes | Thunnus | orientalis | Bluefin tuna | (Nakamura et al., 2013)
Pleuronectiformes | Cynoglossus | semilaevis | Tongue sole | (Chen et al., 2014)

The generation of genomic data for non-model fish species has increased in the last decade as new sequencing technologies became available (Mehinto et al., 2012). Before the advent of next-generation sequencing (NGS) technologies, genomic information was obtained through Sanger sequencing (Sanger and Coulson, 1975; Sanger et al., 1977), which is very accurate but costly and labor-intensive (Pareek et al., 2011). Since 2005, NGS technologies have allowed the collection of massive amounts of genomic data in a cost-effective manner. These data have improved research on fish endocrinology, evolution, genetics, immunology, and physiology (Mehinto et al., 2012).

In this changing technological landscape, Bernardi et al. (2012) presented a list of 10,000 vertebrate species recommended to have their genomes sequenced and annotated. The selection criteria comprised conservation of endangered species, commercial importance, and the need to fill knowledge gaps in unexplored lineages (Bernardi et al., 2012). This inventory includes approximately 4000 fish species, and the pejerrey was targeted as one of the 100 “gold standard” species to be sequenced (Bernardi et al., 2012).

Advances in sequencing technologies and associated genomic software have facilitated in-depth studies of non-traditional model species through genome sequencing (Table 1.1). An example of this research trend is the three-spine stickleback, a recently proposed model for adaptive evolution in freshwater colonization (Hohenlohe et al., 2010; Jones et al., 2012). After the retreat of Pleistocene glaciers, marine sticklebacks colonized and adapted to newly formed freshwater environments. These adaptations included modifications in body shape, skeletal armor, trophic specialization, pigmentation, osmoregulation, and mating preferences. They provide evidence of evolution by natural selection, because the same phenotypes arise repeatedly among sticklebacks that colonized similar environments (Jones et al., 2012). Similarly, pejerrey populations and species show an array of morphological changes following invasion of freshwater environments with salinities, temperatures, and food resources not encountered in their native marine habitat. In agreement with hypotheses raised for stickleback evolution, it has been suggested that phenotypic plasticity or natural selection is a likely explanation (Baigun et al., 2013; Bamber and Henderson, 1988; Colautti, 2014). The generation of a pejerrey genomic database could facilitate further studies on the evolution of this species.

The evolutionary radiation of teleosts has been causally linked to a teleost-specific whole genome duplication, or 3R-WGD, a hypothesized event that affected the teleost ancestral genome (Amores et al., 1998; Meyer and Schartl, 1999; Taylor et al., 2001; Wittbrodt et al., 1998). The 3R event has been estimated to have occurred around 300-450 million years ago, after the divergence of lobe-finned and ray-finned fishes (Sato and Nishida, 2007, 2010). The first piece of evidence consistent with the 3R hypothesis came from a genetic linkage map of zebrafish. This map allowed the identification of large duplicated genomic regions, which were compared to other vertebrate genomes supposedly not affected by the 3R event, such as human and mouse (Postlethwait et al., 2000). More definitive evidence became available with the fully assembled genomes of the pufferfish, Tetraodon nigroviridis (Jaillon et al., 2004), and fugu, Takifugu rubripes (Christoffels et al., 2004). In the pufferfish genome, gene copies that diverged after the duplication event (paralogous genes, Fitch, 1970) and blocks of co-linear genes were used to identify pairs of chromosomes generated by the genome duplication. However, there is no consensus regarding the teleost 3R hypothesis. Duplicated genomic regions could also be the result of local duplications, instead of a single duplication event that affected the entire genome of a teleost common ancestor (Christoffels et al., 2004; Robinson-Rechavi et al., 2001; Vandepoele et al., 2004).

It has been proposed that the 3R duplication event provided genetic raw material to trigger evolutionary adaptations to a variety of aquatic environments through the acquisition of functional novelties (Roest Crollius and Weissenbach, 2005). Duplication of the entire genome creates the potential for novelty, as entire duplicated metabolic pathways become available for modification in sequence and function, a mechanism anticipated to be a major driver of evolutionary radiations (Ohno, 1999).

Following the 3R event, the tetraploid teleost genome underwent a protracted process of diploidization, a molecular process in which duplicate copies of genes are either retained or removed from the genome over evolutionary time (Jaillon et al., 2004; Kellis et al., 2004; Wolfe, 2001). Only ~20% of the duplicated gene copies are currently retained by teleosts, but evidence is accumulating to suggest that different subsets of paralogous genes may be retained in different fish lineages (Garcia de la Serrana et al., 2014). In spite of the genomic complexity arising from the 3R event, teleost genomes have a conserved number of chromosomes, a relatively small size range, and long stretches of conserved gene order. About 58% of known teleost karyotypes have 24 or 25 chromosomes (Amores et al., 2014). Similarly, genome size among teleosts ranges from 342 Mb for the spotted green pufferfish (Lamatsch et al., 2000) to 4792 Mb for Atlantic salmon (Johnson et al., 1987), with a mean value of 1134 Mb (Gregory, 2005). In comparison, genome size in batoids varies from 2406 Mb in Rhinobatos schlegelii to 11775 Mb in Crassinarke dormitor (Chang et al., 1996). Based on estimates for other species of Atherinopsidae, the pejerrey genome was expected to be between 655 and 1075 Mb. Methods and results to estimate the pejerrey genome size are described in Chapter 3.

Large regions of conserved gene order, or synteny, are common within teleosts (Amores et al., 2014; Jaillon et al., 2004; Kasahara, 2007). These syntenic regions could be the result of a low rate of chromosomal rearrangements, possibly related to the peculiar set of transposable elements (TEs) present in teleosts. Transposable elements are a well-known source of genomic variability, and genomes with fewer TEs show fewer chromosomal rearrangements (Amores et al., 2014; Chalopin et al., 2013; Schartl et al., 2013a). Among teleosts, TEs are highly diverse but low in copy number. Based on such evidence, the pejerrey genome would be expected to share many syntenic blocks with the closely related beloniform medaka (Oryzias latipes), and to present a large variety of transposable elements. These expectations are explored in Chapter 3, along with an account of the methods and results.
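Conserved gene order of the kind described above can be detected by comparing the order of shared genes along two genomes. The following sketch is illustrative only: the function name, the gene identifiers, and the strict-adjacency rule are assumptions made for this example, not the synteny method used in Chapter 3. It reports runs of genes that appear consecutively and in the same orientation in both gene lists.

```python
def collinear_blocks(order_a, order_b, min_genes=2):
    """Return runs of genes that are adjacent and in the same order in both lists.

    order_a, order_b: gene identifiers listed in chromosomal order (hypothetical data).
    min_genes: smallest run length reported as a block.
    """
    position_in_b = {gene: index for index, gene in enumerate(order_b)}
    blocks = []
    current = []

    for gene in order_a:
        if gene not in position_in_b:
            # A gene missing from genome B interrupts the current run.
            if len(current) >= min_genes:
                blocks.append(current)
            current = []
            continue
        extends_run = (
            bool(current)
            and position_in_b[gene] == position_in_b[current[-1]] + 1
        )
        if extends_run:
            current.append(gene)
        else:
            # Order is broken: close the previous run and start a new one.
            if len(current) >= min_genes:
                blocks.append(current)
            current = [gene]

    if len(current) >= min_genes:
        blocks.append(current)
    return blocks


# Made-up gene orders: genome B carries the same genes, rearranged in two blocks.
genome_a = ["g1", "g2", "g3", "g4", "g5", "g6"]
genome_b = ["g4", "g5", "g6", "x1", "g1", "g2", "g3"]
print(collinear_blocks(genome_a, genome_b))
```

Real synteny analyses typically also handle inverted blocks, gaps within a block, and orthology assignment; this sketch omits all three to keep the idea visible.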

One of the three goals of this dissertation is to generate a draft version of the pejerrey genome. For this purpose, the genomic material of a single specimen is first fragmented and amplified. Then libraries, often of various fragment sizes, are constructed for shotgun sequencing. The resulting “reads” are quality-controlled and assembled into contigs and scaffolds. Once assembled, the genome is annotated to identify coding units and assign their corresponding functions, if known (Figure 1.3).

[Figure 1.3: flowchart with the steps Extraction of genomic DNA; Construction of Libraries; Sequencing; Assembly; Annotation (Structure, Function)]

Figure 1.3. Schematic representation of the main components of a genome project.

All these steps are extremely complex and require a high degree of expertise and training. Historically, genome projects have been undertaken by large groups of researchers and by consortia of institutes and laboratories. The output from such projects has provided the scientific community with well-curated genomic databases that keep growing and improving thanks to maintenance staff and extensive budgets. The efforts presented in this dissertation do not equal those of such large projects, but they are intended to provide a much-needed, good-quality starting point for future research. This first genome version will be made publicly available for further improvement, and is the result of a collaborative agreement between the laboratory of Dr. Ortí at The George Washington University and the J. Craig Venter Institute.
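The read-to-contig step of the workflow outlined in this section can be illustrated with a toy example. The sketch below is not the assembly method used for the pejerrey genome (the methods are described in Chapter 3). It is a minimal greedy overlap merger, with hypothetical function names and a made-up 15-base sequence, intended only to show how overlapping shotgun reads can be joined into a contig.

```python
def overlap(left, right, min_len):
    """Length of the longest suffix of `left` that equals a prefix of `right`.

    Returns 0 if no overlap of at least `min_len` bases exists.
    """
    longest_possible = min(len(left), len(right))
    for size in range(longest_possible, min_len - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the longest overlap."""
    # Remove duplicate reads (keeping order) and reads contained in another read.
    contigs = list(dict.fromkeys(reads))
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]

    while True:
        best_size, best_i, best_j = 0, None, None
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i == j:
                    continue
                size = overlap(left, right, min_len)
                if size > best_size:
                    best_size, best_i, best_j = size, i, j
        if best_size == 0:
            # No remaining overlaps: what is left are the final contigs.
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


# Made-up "genome" and three overlapping reads sampled from it.
toy_genome = "ATGGCGTACGTTAGC"
toy_reads = [toy_genome[0:8], toy_genome[4:12], toy_genome[7:15]]
print(greedy_assemble(toy_reads))
```

With error-free reads and no long repeats, the three toy reads merge back into the original sequence. Real genomes violate both assumptions, which is one reason assembly requires libraries of several insert sizes and dedicated software.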

Chapter 3 includes the methods, results and conclusions reached during this initial assessment of the pejerrey genome.

1.4. Temperature-dependent sex determination in fishes (TSD)

In most vertebrates, genes and/or sex chromosomes determine the sex of an individual. However, in some reptiles, amphibians and fishes, environmental factors play a key role during the developmental stages in which sex is determined. TSD is a complex and poorly understood biological feature: the temperature to which eggs or larvae are exposed during their first weeks of life determines the phenotypic sex and can override the genotypic sex. This feature is rare among teleosts (Devlin and Nagahama, 2002; Ospina-Alvarez and Piferrer, 2008). Recently, the discovery of a master sex-determining gene in the pejerrey (amhy) has suggested the possible coexistence of, and interaction between, genetic and environmental factors in sex determination (Yamamoto et al., 2014). Many questions remain to be answered: How many genes are involved in TSD? Where are these genes located in the genome compared with species that undergo genetic sex determination? Are some (or all) of these genes clustered and/or regulated by a common element? These questions are explored in Chapter 4 using the newly generated pejerrey genome draft presented in Chapter 3.

1.5. References

Amores, A., Catchen, J., Nanda, I., Warren, W., Walter, R., Schartl, M., Postlethwait, J.H., 2014. A RAD-tag genetic map for the platyfish (Xiphophorus maculatus) reveals mechanisms of karyotype evolution among teleost fish. Genetics 197, 625- 641. Amores, A., Force, A., Yan, Y.L., Joly, L., Amemiya, C., Fritz, A., Ho, R.K., Langeland, J., Prince, V., Wang, Y.L., 1998. Zebrafish hox clusters and vertebrate genome evolution. Science 282, 4. Aparicio, S., Chapman, J., Stupka, E., Putnam, N., 2002. Whole-genome shotgun assembly and analysis of the genome of Fugu rubripes. Science 297, 10. Baigun, C., Berasain, G., Colautti, D., Grosman, F., Lozano, I., Llamazares, S., Mancini, M., Miranda, L., Salina, V., Sanzanno, P., 2013. Ubicación sistemática del pejerrey de la laguna La Salada Pedro Luro (Buenos Aires). In: Cuarto, U.N.d.R. (Ed.), Encuentro sobre manejo de Ecosistemas Acuaticos Pampeanos, Rio Cuarto, Argentina. Bamber, Henderson, 1988. Pre-adaptive plasticityin atherinids and the estuarine seat of teleost evolution. Journal of Fish Biology 33, 17-23. Berasain, G.A., Velasco, C., A. M., Colautti, D., 2002. Experiencias de cultivo intensivo de larvas, juveniles y reproductores de pejerrey Odontesthes bonariensis. In: Grosman, F. (Ed.), Fundamentos biologicos, economicos y sociales para una correcta gestion del recurso pejerrey. Editorial Astyanax, Buenos Aires, Argentina. Bernardi, G., Wiley, E.O., Mansour, H., Miller, M.R., Orti, G., Haussler, D., O'Brien, S.J., Ryder, O.A., Venkatesh, B., 2012. The fishes of Genome 10K. Marine genomics 7, 3-6. Betancur, R.R., Broughton, R.E., Wiley, E.O., Carpenter, K., Lopez, J.A., Li, C., Holcroft, N.I., Arcila, D., Sanciangco, M., Cureton Ii, J.C., Zhang, F., Buser, T., Campbell, M.A., Ballesteros, J.A., Roa-Varon, A., Willis, S., Borden, W.C., Rowley, T., Reneau, P.C., Hough, D.J., Lu, G., Grande, T., Arratia, G., Orti, G., 2013. The tree of life and a new classification of bony fishes. PLoS currents 5. 
Campanella, D., Garriz, A., Colautti, D.C., Somoza, G.M., Miranda, L.A., 2013. Osmotic induction marking with Alizarin Red S on juveniles of pejerrey, Odontesthes bonariensis (Atherinopsidae). Neotropical Ichthyology 11, 95-100.

Chalopin, D., Fan, D.S., Simakov, A., Meyer, A., Schartl, M., 2013. Evolutionary active transposable elements in the genome of the coelacanth. J Exp Zool B Mol Dev Evol. 322, 322-333.
Chang, H.Y., Sang, T.K., Jan, K.Y., Chen, C.T., 1996. Cellular DNA contents and cell volumes of batoids. Copeia, 571-576.
Chen, S., Zhang, G., Shao, C., Huang, Q., Liu, G., Zhang, P., Song, W., An, N., Chalopin, D., Volff, J.N., Hong, Y., Li, Q., Sha, Z., Zhou, H., Xie, M., Yu, Q., Liu, Y., Xiang, H., Wang, N., Wu, K., Yang, C., Zhou, Q., Liao, X., Yang, L., Hu, Q., Zhang, J., Meng, L., Jin, L., Tian, Y., Lian, J., Yang, J., Miao, G., Liu, S., Liang, Z., Yan, F., Li, Y., Sun, B., Zhang, H., Zhang, J., Zhu, Y., Du, M., Zhao, Y., Schartl, M., Tang, Q., Wang, J., 2014. Whole-genome sequence of a flatfish provides insights into ZW sex chromosome evolution and adaptation to a benthic lifestyle. Nature genetics 46, 253-260.
Christoffels, A., Koh, E.G., Chia, J.-m., Brenner, S., Aparicio, S., Venkatesh, B., 2004. Fugu genome analysis provides evidence for a whole-genome duplication early during the evolution of ray-finned fishes. Molecular biology and evolution 21, 1146-1151.
Colautti, D., 2014. Personal Communication. In: Campanella, D. (Ed.).
Crichigno, S.A., Hattori, R.S., Strüssmann, C.A., Cussac, V., 2014. Morphological comparison of wild, farmed and hybrid specimens of two South American silversides, Odontesthes bonariensis and Odontesthes hatcheri. Aquaculture Research, 1-12.
de Souza, J.R.G., Solimano, P.J., Maiztegui, T., Baigún, C.R.M., Colautti, D.C., 2013. Effects of stocking density and natural food availability on the extensive cage culture of pejerrey (Odontesthes bonariensis) in a shallow Pampean lake in Argentina. Aquaculture Research, 1-13.
Devlin, R.H., Nagahama, Y., 2002. Sex determination and sex differentiation in fish: an overview of genetic, physiological, and environmental influences. Aquaculture 208, 191-364.
Dornelles Zebral, Y., Zafalon-Silva, B., Wiegand Mascarenhas, M., Berteaux Robaldo, R., 2014. Leucocyte profile and growth rates as indicators of crowding stress in pejerrey fingerlings (Odontesthes bonariensis). Aquaculture Research, Early view, pages not specified yet.
Dyer, B., 2006. Systematic revision of the South American silversides (Teleostei, Atheriniformes). Biocell 30, 69-88.
Dyer, B.S., 1993. A phylogenetic study of Atheriniform fishes with a systematic revision of the South American silversides (, Atherinopsidae, Sorgentinini) (Volumes I and II). The University of Michigan.
Dyer, B.S., 1997. Phylogenetic Revision of Atherinopsidae (Teleostei, Atherinopsidae), with comments on the systematics of the South American freshwater fish genus Girard. Miscellaneous Publications Museum of Zoology, University of Michigan 185.
Eschmeyer, W.N. (Ed.), 2013. Catalog of Fishes: genera, species, references.
Evermann, B.W., Jenkins, O.P., 1891. Report upon a collection of fishes made at Guaymas, Sonora, Mexico, with descriptions of new species. Proc. U.S. Natl. Mus. 14, 121-165.

Fernandino, J.I., Hattori, R.S., Moreno Acosta, O.D., Strussmann, C.A., Somoza, G.M., 2013a. Environmental stress-induced testis differentiation: androgen as a by-product of cortisol inactivation. General and comparative endocrinology 192, 36-44.
Fernandino, J.I., Hattori, R.S., Strüssmann, C.A., 2013b. Atherinopsid fishes as models for the study of temperature-dependent sex determination: physiology of gonadal sex differentiation in pejerrey Odontesthes bonariensis. In: Nova Science Publishers Inc., U. (Ed.), Sexual Plasticity and Gametogenesis in Fishes. Nova Science Publishers Inc., USA., New York, USA, pp. 57-71.
Fitch, W.M., 1970. Distinguishing homologous from analogous proteins. Systematic Zoology 19, 99-113.
Freyre, L.R., Colautti, D.C., Maroñas, M.E., Sendra, E.D., Remes Lenicov, M., 2009. Seasonal changes in the somatic indices of the freshwater silverside, Odontesthes bonariensis (Teleostei, Atheriniformes) from a Neotropical shallow lake (Argentina). Brazilian Journal of Biology 69, 389-395.
Froese, R., Pauly, D., 2014. FishBase. World Wide Web electronic publication.
Garcia de la Serrana, D., Mareco, E., Johnston, I., 2014. Systematic variation in the pattern of gene paralogue retention between the teleost super-orders Ostariophysi and Acanthopterygii. Genome biology and evolution 6.4, 981-987.
Garriz, A., Miranda, L.A., 2013. Ultrastructure of fresh and post thawed sperm of pejerrey Odontesthes bonariensis (Atheriniformes). Neotropical Ichthyology 11, 831-836.
Gómez, S.E., Menni, R.C., Naya, J.G., Ramirez, L., 2006. The physical–chemical habitat of the Buenos Aires pejerrey, Odontesthes bonariensis (Teleostei, Atherinopsidae), with a proposal of a water quality index. Environmental Biology of Fishes 78, 161-171.
Gómez-Requeni, P., Bedolla-Cázares, F., Montecchia, C., Zorrilla, J., Villian, M., Toledo-Cuevas, E.M., Canosa, F., 2013. Effects of increasing the dietary lipid levels on the growth performance, body composition and digestive enzyme activities of the teleost pejerrey (Odontesthes bonariensis). Aquaculture 416–417, 15-22.
Gómez-Requeni, P., Kraemer, M., Canosa, L., 2012. Regulation of somatic growth and gene expression of the GH–IGF system and PRP-PACAP by dietary lipid level in early juveniles of a teleost fish, the pejerrey (Odontesthes bonariensis). Journal of Comparative Physiology B 182, 517-530.
Greenwood, P.H., Rosen, D.E., Weitzman, S.H., Myers, G.S., 1966. Phyletic studies of teleostean fishes, with a provisional classification of living forms. Bulletin of the AMNH; v. 131, article 4.
Gregory, T.R., 2005. The evolution of the genome. Elsevier/Academic Press.
Grosman, F. (Ed.), 2002. Fundamentos biológicos, económicos y sociales para una correcta gestión del recurso Pejerrey. Editorial Astyanax, Buenos Aires, Argentina.
Grunwald, D.J., Eisen, J.S., 2002. Headwaters of the zebrafish – emergence of a new model vertebrate. Nature reviews genetics 3, 717-724.
Guyon, R., et al., 2012. A high-resolution map of the Nile tilapia genome: a resource for studying cichlids and other percomorphs. BMC genomics 13.

Henkel, C.V., Burgerhout, E., de Wijze, D.L., Dirks, R.P., Minegishi, Y., Jansen, H.J., Spaink, H.P., Dufour, S., Weltzien, F., Tsukamoto, K., van den Thillart, G.E.E.J.M., 2012a. Primitive duplicate hox clusters in the European Eel's genome. PLoS ONE 7.
Henkel, C.V., Dirks, R.P., de Wijze, D.L., Minegishi, Y., Aoyama, J., Jansen, H.J., Turner, B., Knudsen, B., Bundgaard, M., Hvam, K.L., Boetzer, M., Pirovano, W., Weltzien, F.A., Dufour, S., Tsukamoto, K., Spaink, H.P., van den Thillart, G.E., 2012b. First draft genome sequence of the Japanese eel, Anguilla japonica. Gene 511, 195-201.
Hinegardner, R., Rosen, D.E., 1972. Cellular DNA content and the evolution of teleostean fishes. Am. Nat. 106, 621-644.
Hohenlohe, P.A., Bassham, S., Etter, P.D., Stiffler, N., Johnson, E.A., Cresko, W.A., 2010. Population genomics of parallel adaptation in threespine stickleback using sequenced RAD tags. PLoS genetics 6, e1000862.
Hong, Y., Winkler, C., Schartl, M., 1996. Pluripotency and differentiation of embryonic stem cell lines from the medakafish Oryzias latipes. Mechanisms of development 60, 33-44.
Howe, K., Clark, M.D., Torroja, C.F., Torrance, J., Berthelot, C., Muffato, M., Collins, J.E., Humphray, S., McLaren, K., Matthews, L., McLaren, S., Sealy, I., Caccamo, M., Churcher, C., Scott, C., Barrett, J.C., Koch, R., Rauch, G.J., White, S., Chow, W., Kilian, B., Quintais, L.T., Guerra-Assuncao, J.A., Zhou, Y., Gu, Y., Yen, J., Vogel, J.H., Eyre, T., Redmond, S., Banerjee, R., Chi, J., Fu, B., Langley, E., Maguire, S.F., Laird, G.K., Lloyd, D., Kenyon, E., Donaldson, S., Sehra, H., Almeida-King, J., Loveland, J., Trevanion, S., Jones, M., Quail, M., Willey, D., Hunt, A., Burton, J., Sims, S., McLay, K., Plumb, B., Davis, J., Clee, C., Oliver, K., Clark, R., Riddle, C., Elliot, D., Threadgold, G., Harden, G., Ware, D., Begum, S., Mortimore, B., Kerry, G., Heath, P., Phillimore, B., Tracey, A., Corby, N., Dunn, M., Johnson, C., Wood, J., Clark, S., Pelan, S., Griffiths, G., Smith, M., Glithero, R., Howden, P., Barker, N., Lloyd, C., Stevens, C., Harley, J., Holt, K., Panagiotidis, G., Lovell, J., Beasley, H., Henderson, C., Gordon, D., Auger, K., Wright, D., Collins, J., Raisen, C., Dyer, L., Leung, K., Robertson, L., Ambridge, K., Leongamornlert, D., McGuire, S., Gilderthorp, R., Griffiths, C., Manthravadi, D., Nichol, S., Barker, G., Whitehead, S., Kay, M., Brown, J., Murnane, C., Gray, E., Humphries, M., Sycamore, N., Barker, D., Saunders, D., Wallis, J., Babbage, A., Hammond, S., Mashreghi-Mohammadi, M., Barr, L., Martin, S., Wray, P., Ellington, A., Matthews, N., Ellwood, M., Woodmansey, R., Clark, G., Cooper, J., Tromans, A., Grafham, D., Skuce, C., Pandian, R., Andrews, R., Harrison, E., Kimberley, A., Garnett, J., Fosker, N., Hall, R., Garner, P., Kelly, D., Bird, C., Palmer, S., Gehring, I., Berger, A., Dooley, C.M., Ersan-Urun, Z., Eser, C., Geiger, H., Geisler, M., Karotki, L., Kirn, A., Konantz, J., Konantz, M., Oberlander, M., Rudolph-Geiger, S., Teucke, M., Lanz, C., Raddatz, G., Osoegawa, K., Zhu, B., Rapp, A., Widaa, S., Langford, C., Yang, F., 
Schuster, S.C., Carter, N.P., Harrow, J., Ning, Z., Herrero, J., Searle, S.M., Enright, A., Geisler, R., Plasterk, R.H., Lee, C., Westerfield, M., de Jong, P.J., Zon, L.I., Postlethwait, J.H., Nusslein-Volhard, C., Hubbard, T.J., Roest Crollius, H., Rogers, J., Stemple, D.L., 2013. The zebrafish reference genome sequence and its relationship to the human genome. Nature 496, 498-503.

Jaillon, O., et al., 2004. Genome duplication in the teleost fish Tetraodon nigroviridis reveals the early vertebrate proto-karyotypes. Nature 431.7011, 946-957.
Johnson, O.W., Utter, F.M., Rabinovitch, P.S., 1987. Interspecies differences in salmonid cellular DNA identified by flow cytometry. Copeia, 1001-1009.
Jones, F.C., Grabherr, M.G., Chan, Y.F., Russell, P., Mauceli, E., Johnson, J., Swofford, R., Pirun, M., Zody, M.C., White, S., Birney, E., Searle, S., Schmutz, J., Grimwood, J., Dickson, M.C., Myers, R.M., Miller, C.T., Summers, B.R., Knecht, A.K., Brady, S.D., Zhang, H., Pollen, A.A., Howes, T., Amemiya, C., Baldwin, J., Bloom, T., Jaffe, D.B., Nicol, R., Wilkinson, J., Lander, E.S., Di Palma, F., Lindblad-Toh, K., Kingsley, D.M., 2012. The genomic basis of adaptive evolution in threespine sticklebacks. Nature 484, 55-61.
Kasahara, M., 2007. The medaka draft genome and insights into vertebrate genome evolution. Nature 447.
Kellis, M., Birren, B.W., Lander, E.S., 2004. Proof and evolutionary analysis of ancient genome duplication in the yeast Saccharomyces cerevisiae. Nature 428, 617-624.
Kleerekoper, H., 1949. O peixe-rei. Ministerio da Agricultura, Rio de Janeiro, Brazil.
Kuhl, H., Beck, A., Wozniak, G., Canario, A.V.W., Volckaert, F.A.M., Reinhardt, R., 2010. The European sea bass Dicentrarchus labrax genome puzzle: comparative BAC-mapping and low coverage shotgun sequencing. BMC genomics 11.
Lahille, F., 1929. El pejerrey. Boletin del Ministerio de Agricultura de la Nación.
Lamatsch, D.K., Steinlein, M., Schmid, M., Schartl, M., 2000. Noninvasive determination of genome size and ploidy level in fishes by flow cytometry: detection of triploid Poecilia formosa. Cytometry 39, 91-95.
Larsen, P.F., Schulte, P.M., Nielsen, E.E., 2011. Gene expression analysis for the identification of selection and local adaptation in fishes. J Fish Biol 78, 1-22.
Llompart, F.M., Colautti, D.C., Maiztegui, T., Cruz-Jimenez, A.M., Baigun, C.R., 2013. Biological traits and growth patterns of pejerrey Odontesthes argentinensis. J Fish Biol 82, 458-474.
Loh, Y.-H.E., Katz, L.S., Mims, M.C., Kocher, T.D., Yi, S.V., Streelman, J.T., 2008. Comparative analysis reveals signatures of differentiation amid genomic polymorphism in cichlids. Genome biology 9, R113.
Lopez, H.L., Baigun, C.R.M., Iwaszkiw, J.M., Delfino, R.L., Padin, O.H., 2001. La Cuenca del Salado: Uso y posibilidades de sus recursos pesqueros. EDULP. Editorial de la Universidad de La Plata, La Plata, Buenos Aires, Argentina, p. 90.
Majhi, S.K., Hattori, R.S., Rahman, S.M., Strüssmann, C.A., 2014. Surrogate Production of Eggs and Sperm by Intrapapillary Transplantation of Germ Cells in Cytoablated Adult Fish. PLoS ONE 9, e95294.
Matsuda, M., Nagahama, Y., Shinomiya, A., Sato, T., Matsuda, C., Kobayashi, T., Morrey, C.E., Shibata, N., Asakawa, S., Shimizu, N., Hori, H., Hamaguchi, S., Sakaizumi, M., 2002. DMY is a Y-specific DM-domain gene required for male development in the medaka fish. Nature 417, 5.
Mehinto, A.C., Martyniuk, C.J., Spade, D.J., Denslow, N.D., 2012. Applications for next-generation sequencing in fish ecotoxicogenomics. Frontiers in genetics 3, 62.
Meyer, A., Schartl, M., 1999. Gene and genome duplications in vertebrates: the one-to-four (-to-eight in fish) rule and the evolution of novel gene functions. Current opinion in cell biology 11, 699-704.

Miranda, L.A., Chalde, T., Elisio, M., Strussmann, C.A., 2013. Effects of global warming on fish reproductive endocrine axis, with special emphasis in pejerrey Odontesthes bonariensis. General and comparative endocrinology 192, 45-54.
Nakamura, Y., Mori, K., Saitoh, K., Oshima, K., Mekuchi, M., Sugaya, T., Shigenobu, Y., Ojima, N., Muta, S., Fujiwara, A., Yasuike, M., Oohara, I., Hirakawa, H., Chowdhury, V.S., Kobayashi, T., Nakajima, K., Sano, M., Wada, T., Tashiro, K., Ikeo, K., Hattori, M., Kuhara, S., Gojobori, T., Inouye, K., 2013. Evolutionary changes of multiple visual pigment genes in the complete genome of Pacific bluefin tuna. Proceedings of the National Academy of Sciences of the United States of America 110, 11061-11066.
Ohashi, M., 2004. The culture of pejerrey in Japan. Jornadas del Pejerrey: aspectos básicos y acuicultura, Chascomús, Argentina.
Ohno, S., 1999. Gene duplication and the uniqueness of vertebrate genomes circa 1970-1999. Cell & Developmental Biology 10, 517-522.
Ospina-Alvarez, N., Piferrer, F., 2008. Temperature-dependent sex determination in fish revisited: prevalence, a single sex ratio response pattern, and possible effects of climate change. PLoS ONE 3, e2837.
Pagani, I., Liolios, K., Jansson, J., Chen, I.M., Smirnova, T., Nosrat, B., Markovitz, V.M., Kyrpides, N.C., 2012. The Genomes OnLine Database (GOLD) v.4: status of genomic and metagenomic projects and their associated metadata. Nucleic Acids Res. 40, D571-579.
Pareek, C.S., Smoczynski, R., Tretyn, A., 2011. Sequencing technologies and genome sequencing. Journal of applied genetics 52, 413-435.
Pérez, M.R., Fernandino, J.I., Carriquiriborde, P., Somoza, G.M., 2012. Feminization and altered gonadal gene expression profile by ethinylestradiol exposure to pejerrey, Odontesthes bonariensis, a South American teleost fish. Environmental Toxicology and Chemistry 31, 941-946.
Postlethwait, J.H., Woods, I.G., Ngo-Hazelett, P., Yan, Y.L., Kelly, P.D., Chu, F., Huang, H., Hill-Force, A., Talbot, W.S., 2000. Zebrafish comparative genomics and the origins of vertebrate chromosomes. Genome research 10, 1890-1902.
Reichwald, K., Lauber, C., Nanda, I., Kirschner, J., Hartmann, N., Schories, S., Gausmann, U., Taudien, S., Schilhabel, M.B., Szafranski, K., Glockner, G., Schmid, M., Cellerino, A., Schartl, M., Englert, C., Platzer, M., 2009. High tandem repeat content in the genome of the short-lived annual fish Nothobranchius furzeri: a new vertebrate model for aging research. Genome biology 10, R16.
Riegel, H., 1960. Observaciones sobre la fauna ictiológica de las aguas dulces chilenas. Primer Congreso Sudamericano Zoología I, pp. 141-144.
Ringuelet, R.A., 1943. Piscicultura del pejerrey o aterinicultura. Suelo Argentino, Buenos Aires.
Ringuelet, R.A., 1975. Zoogeografia y ecología de los peces de aguas continentales de la Argentina y consideraciones sobre las áreas ictiológicas de América del Sur. Ecosur 2, 1-122.
Robinson-Rechavi, M., Marchand, O., Escriva, H., Bardet, P.L., Zelus, D., Hughes, S., Laudet, V., 2001. Euteleost fish genomes are characterized by expansion of gene families. Genome research 11, 781-788.

Roest Crollius, H., Weissenbach, J., 2005. Fish genomics and biology. Genome research 15, 1675-1682.
Sanger, F., Coulson, A.R., 1975. A rapid method for determining sequences in DNA by primed synthesis with DNA polymerase. J Mol Biol 94, 441-448.
Sanger, F., Nicklen, S., Coulson, A.R., 1977. DNA sequencing with chain-terminating inhibitors. Proc Natl Acad Sci USA 74, 5463-5467.
Sato, Y., Nishida, M., 2007. Post-duplication charge evolution of phosphoglucose isomerases in teleost fishes through weak selection on many amino acid sites. BMC evolutionary biology 7, 204.
Sato, Y., Nishida, M., 2010. Teleost fish with specific genome duplication as unique models of vertebrate evolution. Environmental Biology of Fishes 88, 169-188.
Schartl, M., Walter, R.B., Shen, Y., Garcia, T., Catchen, J., 2013a. The genome of the platyfish, Xiphophorus maculatus: insights into complex traits. Nature genetics 45, 567-572.
Schartl, M., Walter, R.B., Shen, Y., Garcia, T., Catchen, J., Amores, A., Braasch, I., Chalopin, D., Volff, J.N., Lesch, K.P., Bisazza, A., Minx, P., Hillier, L., Wilson, R.K., Fuerstenberg, S., Boore, J., Searle, S., Postlethwait, J.H., Warren, W.C., 2013b. The genome of the platyfish, Xiphophorus maculatus, provides insights into evolutionary adaptation and several complex traits. Nature genetics 45, 567-572.
Somoza, G.M., Miranda, L.A., Berasain, G.E., Colautti, D., Remes Lenicov, M., Strüssmann, C.A., 2008. Historical aspects, current status and prospects of pejerrey aquaculture in South America. Aquaculture Research 39, 784-793.
Strüssmann, C.A., Saito, T., Usui, M., Yamada, H., Takashima, F., 1997. Thermal thresholds and critical period of thermolabile sex determination in two atherinid fishes, Odontesthes bonariensis and Patagonina hatcheri. The Journal of experimental biology 278, 167-177.
Taylor, J.S., Van de Peer, Y., Braasch, I., Meyer, A., 2001. Comparative genomics provides evidence for an ancient genome duplication event in fish. Philosophical transactions of the Royal Society of London. Series B, Biological sciences 356, 18.
Tortonese, E., 1985. Interesse scientifico e pratico di una famiglia di Pesci ossei: gli aterinidi. Quaderni dell'Ente Tutela Pesca, Rivista di Limnologia 10, 1-40.
Valette, L.H., 1939. Apuntes sobre el pejerrey lacustre fluvial de Buenos Aires. Memorias del Jardín Zoológico, La Plata 9, 102-124.
Vandepoele, K., De Vos, W., Taylor, J.S., Meyer, A., Van de Peer, Y., 2004. Major events in the genome evolution of vertebrates: paranome age and size differ considerably between ray-finned fishes and land vertebrates. Proceedings of the National Academy of Sciences of the United States of America 101, 1638-1643.
Velasco, C., Berasain, G., Ohashi, M., 2008. Producción intensiva de juveniles de pejerrey (Odontesthes bonariensis). Biología Acuática 24, 53-58.
Wittbrodt, J., Meyer, A., Schartl, M., 1998. More genes in fish? BioEssays 20, 511-515.
Wolfe, K.H., 2001. Yesterday's polyploids and the mystery of diploidization. Nature Reviews Genetics 2.
Yamamoto, Y., Hattori, R.S., Kitahara, A., Kimura, H., Yamashita, M., Strüssmann, C.A., 2013. Thermal and Endocrine Regulation of Gonadal Apoptosis during Sex Differentiation in Pejerrey Odontesthes bonariensis. Sexual Development 7, 316-324.

Yamamoto, Y., Zhang, Y., Sarida, M., Hattori, R.S., Strussmann, C.A., 2014. Coexistence of genotypic and temperature-dependent sex determination in pejerrey Odontesthes bonariensis. PLoS ONE 9, e102574.

Chapter 2. Multi-locus fossil-calibrated phylogeny of Atheriniformes (Teleostei, )

Daniela Campanella1, Lily C. Hughes1, Peter J. Unmack2, Devin D. Bloom3, Kyle R. Piller4, Guillermo Ortí1

1 Department of Biological Sciences, The George Washington University, Washington DC, USA

2 Institute for Applied Ecology, University of Canberra, Australia

3 Department of Biology, Willamette University, Salem, Oregon, USA

4 Department of Biological Sciences, Southeastern Louisiana University, Hammond, Louisiana, USA

2.1. Introduction

The order Atheriniformes includes about 350 fish species commonly known as silversides, rainbowfishes, and blue eyes, many of which support commercial fisheries and the aquarium trade (Eschmeyer, 2013; Nelson, 2006). They inhabit a wide range of environments from freshwater lakes, lagoons and rivers, to and coastal marine waters, and are globally distributed in tropical and temperate regions. Some atheriniform species are exclusively marine, but many others are restricted to freshwater (Nelson, 2006), and some diadromous species undertake seasonal migrations between marine and freshwater habitats (Dyer and Chernoff, 1996). Many atheriniforms exhibit a wide range of salinity tolerance typical of euryhaline species. The order is morphologically diverse, with adult body sizes ranging from 25 mm to 520 mm in length (Dyer and Chernoff, 1996). Most species are silvery in color with a prominent silvery lateral stripe, whereas rainbowfishes can be very colorful, especially males (Dyer, 1998; Dyer and Chernoff, 1996). The most bizarre morphology among atheriniforms is found in priapium fishes (family Phallostethidae), in which males bear sub-cephalic copulatory organs derived from modifications of the pelvic skeleton (Parenti, 1986).

The monophyly of Atheriniformes is supported by ten morphological synapomorphies (Dyer and Chernoff, 1996). It was also recovered by molecular analyses of mitogenomes for a small number of taxa (Setiamarga et al., 2008) and by analyses of cytochrome b and RAG-1 data for 47 ingroup taxa (Bloom et al., 2012), but not by a parsimony analysis combining morphology, mitochondrial, and nuclear gene data (Sparks and Smith, 2004). A recent molecular phylogenetic study of 1416 ray-finned fishes, based on 21 gene fragments and including 25 atheriniform taxa from 8 of the 11 recognized families, also resolved the monophyly of this order with high bootstrap support (Betancur et al., 2013). In that study, Atheriniformes was resolved within the same clade (superorder Atherinomorphae) as Beloniformes and Cyprinodontiformes, in agreement with previous hypotheses (Parenti, 1993).

The number of atheriniform families and their composition have varied over time (Nelson, 2006). A summary of current hypotheses is presented in Table 2.1. Interrelationships of Atheriniformes have been examined by several authors based on morphological characters (Aarn and Ivantsoff, 1997; Dyer and Chernoff, 1996; Ivantsoff et al., 1987; Parenti, 1984, 1993; Rosen, 1964; Rosen and Parenti, 1981; White et al., 1984) and DNA sequence data (Betancur et al., 2013; Bloom and Lovejoy, 2012; Setiamarga et al., 2008; Sparks and Smith, 2004). Within the order, extensive disagreement persists among phylogenetic studies (Figure 2.1), especially between the very different conclusions reached by Dyer and Chernoff (1996) and Aarn and Ivantsoff (1997). The recognition of two suborders, based on the cladistic analysis by Dyer and Chernoff (1996), is nevertheless supported by most studies, which place the New World silverside family Atherinopsidae (suborder Atherinopsoidei) as the sister group to the remaining families (Fig. 2.1 A, B, D, E, F). Recent molecular analyses (Bloom et al., 2012) supported previous hypotheses by Saeed et al. (1994) and Aarn and Ivantsoff (1997) that placed Notocheirus hubbsi, the surf silverside from temperate coastal waters of Chile and Argentina, closer to or nested within the family Atherinopsidae (Fig. 2.1 B and D). A second genus of surf silversides (Iso, with five Indo-Pacific species) traditionally included in the family Notocheiridae was shown to be distantly related and was assigned to Isonidae, suggesting that many of the morphological characters used to support the monophyly of Notocheiridae (Iso + Notocheirus) may be convergent (Bloom et al., 2012). The rest of the families are distributed in the Old World (Table 2.1) and included in the suborder Atherinoidei (Dyer and Chernoff, 1996; Nelson, 2006). However, this early phylogenetic split between New World and Old World lineages was not obtained by Sparks and Smith (2004), who also placed Atherionidae and Phallostethidae among outgroup taxa from the orders Beloniformes, Cyprinodontiformes, and Mugiliformes, challenging the monophyly of Atheriniformes. Previous hypotheses also disagree on, for example, the position of the morphologically peculiar Phallostethidae, the monophyly of Melanotaeniidae, and relationships among the Indo-West Pacific families Melanotaeniidae, Pseudomugilidae, Bedotiidae, and Telmatherinidae (Aarn and Ivantsoff, 1997; Aarn and Kottelat, 1998; Dyer and Chernoff, 1996; Ivantsoff et al., 1997; Saeed et al., 1994; Sparks and Smith, 2004).

Relationships within atheriniform families have been studied using morphology (Aarn and Ivantsoff, 1997; Aarn and Kottelat, 1998; Dyer, 1997; Parenti, 1993; Saeed et al., 1994), molecular markers (Bloom et al., 2009; McGuigan et al., 2000; Unmack et al., 2013; Zhu et al., 1994) or combined datasets (Dyer, 1997; McGuigan et al., 2000; Sparks and Smith, 2004). A consensus on the phylogenetic relationships for most families has yet to emerge.

The oldest atheriniform fossils are marine/brackish species of the extinct genus Hemitrichas Peters 1877 (Atherinidae) from the Miocene and Oligocene of Germany, Switzerland and Iran (Gaudant and Reichenbacher, 2005; Jost et al., 2007; Malz, 1978; Reichenbacher, 2000; Reichenbacher and Weidmann, 1992; Weiler, 1942; Weiler and Schäfer, 1963). The few molecular phylogenies that attempted to estimate divergence dates for Atheriniformes reported highly discordant results. A comprehensive phylogenetic study of bony fishes that used 60 fossil calibrations placed the origin of this order around 70 Ma (Betancur et al., 2013), but other studies focusing specifically on the age and distribution of one or a few families estimated significantly older ages (Unmack and Dowling, 2010a). Uncertainty about divergence dates for atheriniforms and their distribution in marine and freshwater environments has inspired a diversity of biogeographic scenarios to explain their cosmopolitan distribution. Vicariance hypotheses based on the fragmentation of continental masses, however, can be explicitly tested with time-calibrated phylogenies (Crisp et al., 2011). If the origin of most atheriniform families is Cenozoic, by which time the continents already occupied their modern locations (Upchurch, 2008), their current distribution must have been shaped mostly by dispersal rather than by vicariance. Marine to freshwater transitions have occurred in several atheriniform lineages (Beheregaray, 2000; Bloom et al., 2013; Pujolar et al., 2012), probably as a result of historical changes in global sea levels that led to the colonization of coastal freshwater environments by marine ancestors. Examples of these speciation events in Atheriniformes are the Atherina boyeri complex in Europe (Pujolar et al., 2012), the inland silverside Menidia beryllina (Fluker et al., 2011) in North America, and Odontesthes in Southern Brazil (Beheregaray, 2000; Beheregaray et al., 2002). A robust phylogenetic hypothesis is a necessary framework to interpret the evolution of habitat preference in such widespread species (Betancur-R. et al., 2012).

The aim of this study is to provide a multi-locus, fossil-calibrated phylogenetic hypothesis for Atheriniformes to establish a solid framework for evolutionary studies and to inform taxonomic decisions.


Figure 2.1. Phylogenetic hypotheses for families in the order Atheriniformes based on previous studies. A: Dyer and Chernoff 1996; B: Aarn and Ivantsoff 1997; C: Sparks and Smith 2004; D: Bloom et al. 2012; E: Near et al. 2012; F: Betancur-R. et al. 2013.

Table 2.1. Families of Atheriniformes according to Nelson (2006) and Eschmeyer and Fong (2014), valid genera, their geographic distribution, habitat type, and alternative taxonomic arrangements.

Family (common name): Atherinidae (Old World silversides)
Geographic distribution and habitat: Indo-West Pacific and Atlantic; freshwater, marine, and brackish
Included genera: Alepidomus, Atherina, Atherinason, Atherinomorus, Bleheratherina*, [...], Hypoatherina*, Kestratherina, Leptatherina, Sashatherina*, Stenatherina*, Teramulus*
Taxonomic observations: Number of subfamilies included varies between two [1,2], three [3], six [4], and nine [5]; 12 genera with 68 species.

Family (common name): Atherinopsidae (New World silversides)
Geographic distribution and habitat: North, Central, and South America (Atlantic and Pacific); freshwater, marine, and brackish
Included genera: [...], Atherinops, Atherinopsis, Basilichthys, Chirostoma, Colpichthys*, [...], Leuresthes, [...], [...], Menidia, Odontesthes, [...]
Taxonomic observations: Formerly a subfamily in Atherinidae [1,4,5], later recognized as a family [2]; includes two subfamilies ([...] and [...]); 13 genera with 109 species.

Family (common name): Atherionidae (Pricklenose silversides)
Geographic distribution and habitat: Indian Ocean and Western Pacific; marine
Included genera: [...]
Taxonomic observations: Previously a subfamily of Atherinidae [5,6], later elevated to family [3,7]; 1 genus with three species.

Family (common name): Bedotiidae (Madagascar rainbowfishes)
Geographic distribution and habitat: Central and eastern Madagascar; freshwater
Included genera: Bedotia, Rheocles
Taxonomic observations: Previously a subfamily of Melanotaeniidae [3], later elevated to family [8]; 2 genera with 16 species.

Family (common name): Dentatherinidae (Tusked silversides)
Geographic distribution and habitat: Tropical western Pacific; marine
Included genera: Dentatherina merceri*
Taxonomic observations: Formerly a subfamily of Atherinidae, a subfamily of Phallostethidae [9], or as a separate family [10]; monotypic.

Family (common name): Isonidae (Surf silversides)
Geographic distribution and habitat: Indo-West Pacific; marine
Included genera: Iso
Taxonomic observations: Either within Notocheiridae [3], or as its own family [2,11]; five species.

Family (common name): Melanotaeniidae (Rainbowfishes)
Geographic distribution and habitat: Australia, New Guinea, eastern Indonesia; freshwater, few brackish, rarely marine
Included genera: Cairnsichthys, Chilatherina, Glossolepis, Iriatherina, [...], Pelangia*, Rhadinocentrus
Taxonomic observations: Also as a subfamily within Melanotaeniidae [3]; seven genera with 80 species.

Family (common name): Notocheiridae (Surf silversides)
Geographic distribution and habitat: Southern South America; marine
Included genera: Notocheirus hubbsi
Taxonomic observations: Also as a subfamily in Atherinopsidae [11], sister to Atherinopsidae [7], or with Iso in a monophyletic family [3]; monotypic.

Family (common name): Phallostethidae (Priapium fishes)
Geographic distribution and habitat: Southeast Asia (Philippines to Thailand and Sumatra); freshwater and brackish
Included genera: [...]*, Neostethus, Phallostethus*, Phenacostethus*
Taxonomic observations: Proposed as a subfamily within Phallostethidae, sister to Dentatherininae [3,7]; 4 genera with 23 species.

Family (common name): Pseudomugilidae (Blue eyes)
Geographic distribution and habitat: Australia and New Guinea; freshwater and brackish, rare marine
Included genera: Kiunga, Pseudomugil, Popondichthys*, Scaturiginichthys*
Taxonomic observations: Formerly a subfamily within Melanotaeniidae [3]; 4 genera with 18 species.

Family (common name): Telmatherinidae (Sailfin silversides)
Geographic distribution and habitat: [...], Misool and Batanta Island; freshwater
Included genera: Kalyptatherina, Marosatherina, Paratherina*, Telmatherina*, Tominanga*
Taxonomic observations: Formerly a subfamily within Melanotaeniidae [3]; 5 genera with 18 species.


2.1. Materials and Methods

2.1.1. Taxonomic sampling, DNA extraction, PCR, and sequencing

We collected a comprehensive taxon sample that includes 103 species, representing 35 (out of 51) genera and 10 of the 11 families of Atheriniformes; only the monotypic family Dentatherinidae is missing (Table 2.1). For most species, two specimens were used for DNA analysis as a quality-control measure; all tissues were already available from museums or from the author’s collections. Appendix 1 contains collection locality and museum voucher information (when available). For each family, the number of genera represented in our sample (out of the total number of genera in the family, see Table 2.1) is as follows: Atherinidae (8 out of 13), Atherionidae (1 out of 1), Atherinopsidae (12 out of 13), Bedotiidae (2 out of 2), Isonidae (1 out of 1), Melanotaeniidae (6 out of 7), Notocheiridae (1 out of 1), Phallostethidae (1 out of 4), Pseudomugilidae (2 out of 4), and Telmatherinidae (2 out of 5). Five species of Beloniformes and Cyprinodontiformes were used as outgroups. A complete list of the taxa used in this study is also provided in Table 2.2.

Genomic DNA was extracted using the DNeasy® Blood & Tissue Kit (Qiagen, Valencia, CA). The molecular markers chosen for this study included seven nuclear genes commonly used for fish phylogenetic studies (Betancur et al., 2013): ficd (FIC domain-containing protein 334648), gcs1 (glucosidase 1 or mannosyl-oligosaccharide glucosidase LOC567913), kbtbd4 (Kelch repeat and BTB (POZ) domain containing 4 LOC393178), kiaa-l (leucine-rich repeat and WD repeat-containing protein, KIAA1239-like LOC562320), myh6 (cardiac muscle myosin heavy chain 6 alpha), sh3px3 (SH3 & PX domain-containing 3-like protein), and slc10a3 (solute carrier family 10, member 3; zgc:85947). Nested-PCR amplifications were performed following published protocols for each marker (Betancur-R et al., 2013b; Li et al., 2011; Li et al., 2008; Li et al., 2007). In addition to the nuclear genes, the entire mitochondrial cytochrome b gene (cytb) was amplified and sequenced as described previously (Lewallen et al., 2011; Unmack and Dowling, 2010b). The resulting amplicons were sent for purification and sequencing in both directions to High Throughput Sequencing Solutions (HTSeq.org), University of Washington, Seattle, Washington. Sequences were edited and aligned using Geneious v6 (created by Biomatters, http://www.geneious.com/).

2.1.2. Phylogenetic analysis

Multiple sequence alignments were performed for each marker separately using MAFFT with default settings (Katoh et al., 2002). After visual inspection to verify expected open reading frames and trimming of sequence ends, individual gene trees were generated with RAxML (Stamatakis, 2006; Stamatakis et al., 2008). Gene trees included duplicate sequences for each species for quality control, to verify the clustering of samples from the same species, and to check for cross-contamination. Sequences that clustered in unexpected positions of the gene tree and did not group with the corresponding duplicate sequence for the same species were re-extracted, re-amplified, and sequenced again. After all sequences passed this quality-control step, a single sequence per species (usually the longest sequence) was used for subsequent analyses. MEGA 5 (Tamura et al., 2011) and Geneious v6 were used to calculate pairwise distances, overall mean distances per marker, percent of invariant sites, and pairwise percent identity. Individual gene datasets were concatenated in Geneious v6 to infer a species tree. A first optimal partitioning scheme was obtained with PartitionFinder (Lanfear et al., 2012), starting with 24 a priori defined partitions (by gene and codon position) and using a greedy algorithm restricted to the subset of substitution models available in MrBayes. Maximum likelihood analyses of the partitioned data set (25 replicates) were implemented with RAxML on the CIPRES Science Gateway XSEDE server (Miller et al., 2010), and confidence in the resulting tree was assessed with rapid bootstrapping. Bayesian analysis was performed using MrBayes 3.2.2 (Ronquist and Huelsenbeck, 2003), with four independent runs of 150 million generations each. A second partitioning scheme was obtained by running PartitionFinder’s greedy algorithm under all available models. This partitioning scheme was fully specified for 12 independent runs using Garli 2.0 (Zwickl, 2006, 2011) and BEAST 1.8.0 (Drummond et al., 2012).
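As an illustration of how the 24 a priori data blocks (eight genes by three codon positions) can be written down, the following minimal sketch builds one block per gene and codon position, in the style of the data-block section of a PartitionFinder configuration file. The alignment lengths are those reported in Table 2.3; the order of the genes in the concatenated matrix, the block names, and the assumption that every gene begins at a first codon position are hypothetical and are not taken from the original analysis files.

```python
# Alignment lengths (bp) as reported in Table 2.3.
# ASSUMPTION: this gene order in the concatenated matrix is hypothetical.
GENE_LENGTHS = [
    ("cytb", 1104), ("ficd", 645), ("gcs1", 1185), ("kbtbd4", 675),
    ("kiaa-l", 888), ("myh6", 648), ("sh3px3", 669), ("slc10a3", 618),
]


def codon_blocks(gene_lengths):
    """Return (name, start, end, step) for every gene x codon position.

    Coordinates are 1-based and inclusive. ASSUMPTION: each gene starts
    at a first codon position in the concatenated alignment.
    """
    blocks = []
    offset = 0
    for gene, length in gene_lengths:
        for pos in (1, 2, 3):
            blocks.append((f"{gene}_pos{pos}", offset + pos, offset + length, 3))
        offset += length
    return blocks


def format_block(block):
    """Format one block as 'name = start-end\\step;'."""
    name, start, end, step = block
    return f"{name} = {start}-{end}\\{step};"


blocks = codon_blocks(GENE_LENGTHS)
total_sites = sum(length for _, length in GENE_LENGTHS)

for block in blocks:
    print(format_block(block))
print("total sites:", total_sites)
```

Under these assumptions the eight alignment lengths sum to the 6432 sites reported for the concatenated alignment.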

A time-calibrated phylogenetic hypothesis was inferred under a Bayesian framework using BEAST 1.8.0. The concatenated dataset was analyzed under an uncorrelated lognormal clock model (UCLN) with seven calibration priors (see below). To account for extinction, we used a birth-death model with an initial mean growth rate of 1 and a relative death rate of 0.1. The phylogenetic placement of the calibration points is shown in Figure 2.3 (numbered 1-7 on the tree). The placements and prior settings are supported by the evidence outlined below.

Calibration 1: the root node for Atheriniformes was defined as a secondary calibration based on a published time-tree for ray-finned fishes inferred for 202 taxa with 59 fossil calibrations (Betancur-R et al., 2013a). Absolute age estimate: 70.5 Ma; 95% soft upper bound: 77.5 Ma, based on the estimated age of Atherinomorpha. Prior setting: normal distribution, mean = 70.5, standard deviation = 4.25.

Calibration 2: Atherinidae (new crown calibration). MRCA: Craterocephalus stramineus, Alepidomus stipes. Hard minimum age: 23 Ma, †Hemitrichas stapfi (Gaudant and Reichenbacher, 2005). Diagnosis and phylogenetic placement: †H. stapfi was described as a new species in the family Atherinidae on the basis of its high vertebral number (33-34), high number of abdominal vertebrae (15-16), and otolith morphology, with an elongated shape, a pointed posterior end, and a slender and rather long rostrum. Stratigraphic horizon and locality: Quarry “Am Katzenrech”, near Dexheim, Germany. Absolute age estimate: Late Oligocene, 23 Ma, based on biostratigraphic dating of the Mainz Basin. 95% soft upper bound: 70.5 Ma (secondary calibration based on the age of Atheriniformes, see calibration 1 above). Prior setting: exponential distribution, mean = 15.86.

Calibration 3: genus Atherina (new crown calibration). MRCA: Atherina presbyter, Atherina boyeri. Hard minimum age: 10 Ma, †Atherina atropatiensis (Carnevale et al., 2011). Diagnosis and phylogenetic placement: †A. atropatiensis has been included in Atherina on the basis of the presence of a maxillary ventral shelf, disconnected preopercular and infraorbital sensory canals, the absence of a dorsally directed anterior palatine process, and a posterodorsally oriented dorsolateral process of the basipterygium. Comparative analysis with extant Atherina shows that the meristic characters of †Atherina atropatiensis most closely resemble those of Atherina boyeri, in addition to its simple and non-enlarged haemal arches and spines (Carnevale et al., 2011). These characteristics were used to place this fossil in crown group Atherina. Stratigraphic horizon and locality: Lignite beds of the Tabriz Basin, NW Iran. Absolute age estimate: 10 Ma, based on fission track dating of the lignite beds (Reichenbacher et al., 2011); 95% soft upper bound: 23 Ma (based on calibration 2, above). Prior setting: exponential distribution, mean = 4.34.

Calibration 4: Atherinomorus stipes (new stem calibration for terminal branch). Hard minimum age: 6 Ma, A. stipes (Nolf and Stringer, 1992). Diagnosis and phylogenetic placement: fossil otolith assigned to A. stipes by Nolf and Stringer (1992) based on otolith features. Stratigraphic horizon and locality: Cercado Formation, Dominican Republic. Absolute age estimate: 6 Ma, based on strontium isotope dating (McNeill et al., 2011); 95% soft upper bound: 23 Ma (based on calibration 2, above). Prior setting: exponential distribution, mean = 5.67.

Calibration 5: genus Basilichthys (new stem calibration). MRCA: Basilichthys semotilus, Basilichthys australis. Hard minimum age: 11 Ma, Basilichthys sp. (Rubilar, 1994). Diagnosis and phylogenetic placement: maxilla with condyle on ventral process supports the placement of this fossil in Basilichthys (Dyer, 1997). Placed as a stem calibration due to lack of synapomorphies to define the species. Stratigraphic horizon and locality: Cura-Mallín Formation, Cerro La Mina and El Tallón, Chile. Absolute age estimate: 11 Ma, from K-Ar isotope dating (Suarez and Emparan, 1995); 95% soft upper bound: 22 Ma (twice the age of the fossil). Prior setting: exponential distribution, mean = 3.67.

Calibration 6: genus Odontesthes (new stem calibration). MRCA: Odontesthes incisa, Odontesthes gracilis. Hard minimum age: 20 Ma, Odontesthes sp. (Bocchino, 1971). Diagnosis and phylogenetic placement: presence of opercular fenestration and no restriction in protractile premaxilla place this fossil within Odontesthes (Cione and Baez, 2007; Dyer, 1997). Stratigraphic horizon and locality: Ñirihuau Formation, Chubut, Argentina (Cione and Baez, 2007). Placed as a stem calibration due to lack of synapomorphies that define species. Absolute age estimate: 20 Ma, based on palynology and microfossils (Asensio et al., 2010); 95% soft upper bound: 40 Ma (twice the age of the fossil). Prior setting: exponential distribution, mean = 6.68.

Calibration 7: genus Membras (new stem calibration). MRCA: Membras gilberti, Membras martinica. Hard minimum age: 16 Ma, Membras sp. (Nolf and Aguilera, 1998). Diagnosis and phylogenetic placement: otolith with posterior caudal end of sulcus curved dorsally supports the placement of the fossil in Membras. Placed in the stem due to lack of distinguishing synapomorphies for species. Stratigraphic horizon and locality: Cantaure Formation, Venezuela. Absolute age estimate: 16 Ma, lower bound of Early Miocene (Gradstein, 2012); 95% soft upper bound: 32 Ma (twice the age of the fossil). Prior setting: exponential distribution, mean = 5.34.
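The prior means reported for calibrations 2-7 are numerically consistent with an offset exponential distribution whose 95% quantile falls on the soft upper bound, and the normal prior of calibration 1 is consistent with a one-sided 95% upper bound. The relationship is inferred here from the reported values rather than stated in the text, so the sketch below should be read as a check under that assumption; the function names are illustrative.

```python
import math


def exponential_mean(hard_min, soft_max, quantile=0.95):
    """Mean of an offset exponential prior whose `quantile` lies at soft_max.

    For an exponential with mean m, offset by hard_min, the q-quantile is
    hard_min - m * ln(1 - q). Solving for m gives the expression below.
    """
    return (soft_max - hard_min) / -math.log(1.0 - quantile)


def normal_upper_bound(mean, sd, z=1.645):
    """One-sided 95% upper bound of a normal prior (z = 1.645)."""
    return mean + z * sd


# (hard minimum age, 95% soft upper bound), both in Ma, for calibrations 2-7.
CALIBRATIONS = {
    2: (23, 70.5), 3: (10, 23), 4: (6, 23),
    5: (11, 22), 6: (20, 40), 7: (16, 32),
}

for label, (hard_min, soft_max) in CALIBRATIONS.items():
    mean = exponential_mean(hard_min, soft_max)
    print(f"Calibration {label}: exponential mean = {mean:.2f}")

print(f"Calibration 1: 95% upper bound = {normal_upper_bound(70.5, 4.25):.1f} Ma")
```

With these inputs the computed means round to the values listed for each calibration (15.86, 4.34, 5.67, 3.67, 6.68, and 5.34), and the normal prior gives an upper bound of 77.5 Ma.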

Four replicate searches were conducted with BEAST, each with 150 million generations. The log files were assessed for satisfactory mixing of the MCMC chains and effective sample size (ESS > 200) using Tracer v.1.5. The resulting outputs from the four independent runs were compiled with LogCombiner v1.8.0 and the maximum clade credibility tree was estimated with TreeAnnotator v1.8.0. The XML file used for this analysis is available at figshare.com as supplemental_data1 (see Appendix 2).
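To make the ESS criterion concrete, the sketch below estimates an effective sample size from a single parameter trace as n / (1 + 2 Σ ρ_k), summing autocorrelations up to the first non-positive lag. This is a generic textbook estimator written for illustration; it is an assumption that it approximates what Tracer reports, and Tracer's own estimator may differ in detail.

```python
def effective_sample_size(trace):
    """Estimate ESS as n / (1 + 2 * sum of leading positive autocorrelations)."""
    n = len(trace)
    mean = sum(trace) / n
    centered = [x - mean for x in trace]
    variance = sum(c * c for c in centered) / n
    if variance == 0:
        return float(n)  # constant trace: autocorrelation is undefined
    rho_sum = 0.0
    for lag in range(1, n):
        cov = sum(centered[t] * centered[t + lag] for t in range(n - lag)) / n
        rho = cov / variance
        if rho <= 0:  # truncate at the first non-positive autocorrelation
            break
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)


# Toy traces (not real MCMC output): one with no positive autocorrelation,
# one that stays in the same state for long stretches.
well_mixed = [0, 1] * 100
sticky = [0] * 50 + [1] * 50 + [0] * 50 + [1] * 50

print(effective_sample_size(well_mixed))
print(effective_sample_size(sticky))
```

Both toy traces contain 200 samples, but the second yields a far smaller ESS because consecutive samples are strongly correlated, which is the situation the ESS > 200 threshold is meant to detect.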

To gauge potentially misleading effects due to non-stationarity (e.g., Jermiin et al., 2004), we checked for base compositional bias using a chi-square test with the BaseFreq function implemented in PAUP* (Swofford, 2002). We also tested for substitution pattern disparity by calculating the average disparity index among taxa (Kumar and Gadagkar, 2001) for each of the 24 a priori data partitions using MEGA 5, following the approach described by Betancur-R et al. (2013b). Codon positions that significantly deviated from the homogeneity assumption were recoded for analysis in TreeFinder (Jobb, 2008) using AGY coding. TreeFinder is the only ML inference program that implements a GTR3 model to account for AGY-coded data. Since AGY coding results in loss of phylogenetic signal, only partitions with codon positions that deviated significantly from the homogeneity assumption were recoded, in order to preserve as much signal in the data as possible. Taxa for which no more than two genes could be sequenced were eventually excluded from this recoding analysis, as loss of phylogenetic information caused their position in the tree to become unstable (see Results).
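The recoding step can be sketched as follows. The sketch assumes that AGY coding keeps the purines A and G as separate states and merges the pyrimidines C and T into a single state Y, which is consistent with a three-state (GTR3) model, and that the sequence is in frame starting at a first codon position. The function name is illustrative and is not part of TreeFinder.

```python
def agy_recode_third_positions(seq):
    """Recode 3rd codon positions: A and G are kept, C and T become Y.

    ASSUMPTION: `seq` is in frame, i.e. index 0 is a first codon position.
    Gaps and ambiguity codes are left unchanged.
    """
    recode = {"C": "Y", "T": "Y"}
    out = []
    for i, base in enumerate(seq.upper()):
        if i % 3 == 2:  # third codon position
            out.append(recode.get(base, base))
        else:
            out.append(base)
    return "".join(out)


# Toy in-frame sequence: codons ATG GCC TTT AAA
print(agy_recode_third_positions("ATGGCCTTTAAA"))
```

Only third positions are altered, so first and second positions retain all four states, in line with the decision to recode only the non-stationary partitions.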

2.1.3. Ancestral Habitat Reconstruction

Habitat information was downloaded for each taxon in our data set from online compilations such as the Catalog of Fishes (Eschmeyer, 2013) and TreeBase (http://treebase.org), and was also taken from metadata associated with collected specimens. We assigned each species to one of two possible states: marine (A) or freshwater (B). Euryhaline species known to tolerate a wide range of salinities and to undertake migrations into freshwater estuaries were coded as AB, since the model used for ancestral reconstruction allows species to occupy both types of habitats. Habitat data are provided in supplemental_data2 (available online at figshare.com, see Appendix 2) and are shown in Figure 2.3. This information, along with the chronogram obtained with BEAST (available as supplemental_data3 at figshare.com, see Appendix 2), was uploaded to the Lagrange Configurator for analysis (http://www.reelab.net/lagrange/configurator) to estimate ancestral areas with Lagrange under a model with equal probability of transition between states (Ree and Smith, 2008).
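The habitat coding described above amounts to a simple mapping from occurrence records to the states A, B, or AB. The following sketch shows that mapping; the taxon labels and occurrence flags are placeholders invented for illustration and are not the habitat data used in this study.

```python
def code_habitat(occurs_marine, occurs_freshwater):
    """Return 'A' (marine), 'B' (freshwater), or 'AB' (euryhaline)."""
    if not (occurs_marine or occurs_freshwater):
        raise ValueError("a taxon must occupy at least one habitat")
    return ("A" if occurs_marine else "") + ("B" if occurs_freshwater else "")


# Hypothetical records: taxon label -> (marine, freshwater).
RECORDS = {
    "taxon_1": (True, False),
    "taxon_2": (False, True),
    "taxon_3": (True, True),
}

coded = {taxon: code_habitat(*flags) for taxon, flags in RECORDS.items()}
for taxon, state in coded.items():
    print(taxon, state)
```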


2.2. Results

2.2.1. Sequence data, alignments, genetic divergence, and partitions

All DNA sequences obtained for this study have been deposited in GenBank and are listed in Table 2.2; these sequences passed the quality-control step and were used for phylogenetic analysis. Not all loci could be amplified and sequenced for all taxa, and some sequences had to be discarded due to evidence of cross-contamination. As a consequence, each marker was successfully sequenced for about 90% of the species (Tables 2.2 and 2.3), and only 60 taxa have a complete set of sequences for all markers. The species with the least sequence data are Atherina hepsetus (only two markers, kbtbd4 and sh3px3), Pseudomugil tenellus (cytb and myh6), Pseudomugil gertrudae (cytb, kiaa-l, and myh6), and Rheocles wrightae (cytb, myh6, and sh3px3); all other taxa have sequences for four or more genes. The marker with the lowest rate of success was gcs1, missing in 23% of the taxa (Table 2.3). The average level of divergence among taxa for each marker ranged from 5.6% (sh3px3) to 23.6% (cytb), with most nuclear genes below or around 10% (Table 2.3). Therefore, the combined data set contains a diversity of rates of evolution among markers (and within markers due to codon positions) that needs to be accounted for in a partitioned analysis. The final concatenated alignment included 108 OTUs, 6432 sites, and 12.3% invariable characters (available as supplemental_data4 at figshare.com, see Appendix 2). PartitionFinder analysis of the concatenated alignment resulted in eight partitions when searching under a greedy algorithm with the reduced set of models implemented in MrBayes (Table 2.4). This scheme was implemented in phylogenetic analyses using RAxML and MrBayes. PartitionFinder analysis based on all possible models resulted in a second partitioning scheme with 10 partitions (Table 2.4), which was used for analyses with Garli and BEAST. Results from the chi-square test showed that the 3rd codon positions of cytb, ficd, gcs1, kbtbd4, myh6, and slc10a3 differed significantly from base composition stationarity. Disparity index estimates were close to zero for all 1st and 2nd codon positions but showed average values between 1.2 and 4.5 for 3rd codon positions of all genes except sh3px3 and kiaa-l, for which they were close to 0.5. Therefore, the 3rd codon positions for all genes except sh3px3 and kiaa-l were AGY-coded and analyzed under the GTR3 model with TreeFinder, with a ninth partition added to the first partitioning scheme to accommodate the stationary 3rd codon positions (Table 2.4). All stationary partitions were analyzed in TreeFinder under the GTR + Gamma model.
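For readers unfamiliar with the test of base composition stationarity, the sketch below computes a chi-square statistic of homogeneity from a table of base counts per taxon. It is a generic contingency-table calculation written for illustration; it is an assumption that it mirrors what the PAUP* function reports, and the counts are invented toy values rather than data from this study.

```python
def base_composition_chi_square(counts_by_taxon):
    """Chi-square statistic and degrees of freedom for base-count homogeneity.

    counts_by_taxon maps each taxon to a dict of counts for A, C, G, and T.
    Expected counts assume every taxon shares the pooled base frequencies.
    """
    bases = "ACGT"
    col_totals = {b: sum(c[b] for c in counts_by_taxon.values()) for b in bases}
    grand_total = sum(col_totals.values())
    chi2 = 0.0
    for counts in counts_by_taxon.values():
        row_total = sum(counts[b] for b in bases)
        for b in bases:
            expected = row_total * col_totals[b] / grand_total
            chi2 += (counts[b] - expected) ** 2 / expected
    df = (len(counts_by_taxon) - 1) * (len(bases) - 1)
    return chi2, df


# Toy counts (hypothetical): two taxa with clearly different A/C content.
TOY_COUNTS = {
    "taxon_1": {"A": 30, "C": 10, "G": 10, "T": 10},
    "taxon_2": {"A": 10, "C": 30, "G": 10, "T": 10},
}

chi2, df = base_composition_chi_square(TOY_COUNTS)
print(chi2, df)
```

A significance level would then be obtained by comparing the statistic with a chi-square distribution with the returned degrees of freedom.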

Table 2.2. Taxa and GenBank accession numbers for eight markers sequenced for this study (n/a = not available).

Family / Species  cytb  ficd  gcs1  kbtbd4  kiaa-l  myh6  sh3px3  slc10a3

Atherinidae
Alepidomus evermanni  n/a  KM400820  KM400737  KM400915  KM401014  KM401113  KM401215  KM401313
Atherina boyeri  EU036422  KM400821  KM400738  KM400916  KM401015  KM401114  KM401216  KM401314
Atherina breviceps  n/a  n/a  KM400739  KM400917  KM401016  KM401115  KM401217  n/a
Atherina hepsetus  n/a  n/a  n/a  KM401010  n/a  n/a  KM401308  n/a
Atherina presbyter  EF439188  KM400822  KM400740  KM400918  KM401017  KM401116  KM401218  KM401315
Atherinason hepsetoides  KM400684  n/a  n/a  KM400919  KM401018  KM401117  KM401219  n/a
Atherinomorus lacunosus  KM400687  KM400837  KM400756  n/a  KM401034  KM401134  KM401236  KM401332
Atherinomorus stipes  JQ282023  KM400839  KM400758  KM400936  KM401036  KM401136  KM401238  KM401334
Atherinomorus vaigiensis  KM400688  KM400838  KM400757  n/a  KM401035  KM401135  KM401237  KM401333
[...] microstoma  KM400685  KM400842  n/a  KM400939  KM401039  KM401139  KM401241  n/a
Craterocephalus capreoli  GU932792  KM400860  n/a  KM400956  KM401056  KM401157  KM401258  n/a
Craterocephalus eyresii  GU932886  n/a  KM400772  KM400957  KM401057  KM401158  KM401259  n/a
Craterocephalus honoriae  GU932765  KM400861  KM400773  KM400958  KM401058  KM401159  KM401260  n/a
Craterocephalus stercusmuscarum  KM400689  KM400862  n/a  KM400959  KM401059  KM401160  KM401261  n/a
Craterocephalus stramineus  GU932804  KM400863  KM400774  KM400960  KM401060  KM401161  KM401262  n/a
Kestratherina esox  GU932762  KM400868  KM400779  KM400965  KM401065  KM401166  KM401266  n/a
Leptatherina presbyteroides  KM400686  KM400870  n/a  KM400968  KM401068  KM401169  KM401269  n/a

Atherinopsidae
Atherinella argentea  JQ282017  KM400823  KM400741  KM400920  KM401019  KM401118  KM401220  KM401316
Atherinella balsana  KC736414  n/a  KM400742  KM400921  KM401020  KM401119  KM401221  KM401317
Atherinella blackburni  KC736357  KM400824  KM400743  KM400922  n/a  KM401120  KM401222  KM401318
Atherinella brasiliensis  KC736412  KM400825  KM400744  KM400923  KM401021  KM401121  KM401223  KM401319
[...]  KC736346  KM400826  KM400745  KM400924  KM401022  KM401122  KM401224  KM401320
Atherinella guatemalensis  KC736386  KM400827  KM400746  KM400925  KM401023  KM401123  KM401225  KM401321
Atherinella hubbsi  KC736388  KM400828  KM400747  KM400926  KM401024  KM401124  KM401226  KM401322
Atherinella marvelae  JQ282021  KM400829  KM400748  KM400927  KM401025  KM401125  KM401227  KM401323
Atherinella milleri  KC736379  KM400830  KM400749  KM400928  KM401026  KM401126  KM401228  KM401324
Atherinella panamensis  KC736362  n/a  KM400750  KM400929  KM401027  KM401127  KM401229  KM401325
Atherinella pellosemeion  KC736349  KM400831  KM400751  KM400930  KM401028  KM401128  KM401230  KM401326
Atherinella sallei  KC736383  KM400832  KM400752  KM400931  KM401029  KM401129  KM401231  KM401327
Atherinella sardina  KC736389  KM400833  KM400753  KM400932  KM401030  KM401130  KM401232  KM401328
Atherinella schultzi  KC736377  KM400834  n/a  KM400933  KM401031  KM401131  KM401233  KM401329
Atherinella serrivomer  KC736358  KM400835  KM400754  KM400934  KM401032  KM401132  KM401234  KM401330
Atherinella starksi  KC736352  KM400836  KM400755  KM400935  KM401033  KM401133  KM401235  KM401331
Atherinops affinis  KM400705  KM400840  KM400759  KM400937  KM401037  KM401137  KM401239  KM401335
Atherinopsis californiensis  JQ282018  KM400841  KM400760  KM400938  KM401038  KM401138  KM401240  KM401336
[...]  KM400706  KM400845  KM400763  KM400942  KM401041  KM401142  KM401244  KM401338
Basilichthys microlepidotus  KM400707  KM400846  KM400764  KM400943  KM401042  KM401143  KM401245  KM401339
Basilichthys semotilus  JQ282024  KM400847  KM400765  KM400944  KM401043  KM401144  KM401246  KM401340
Chirostoma attenuatum  KC736405  KM400854  n/a  KM400951  KM401050  KM401151  n/a  KM401347
Chirostoma consocium  KC736401  KM400855  n/a  KM400952  KM401051  KM401152  KM401253  KM401348
Chirostoma humboldtianum  KC736402  KM400856  n/a  KM400953  KM401052  KM401153  KM401254  KM401349
Chirostoma jordani  JQ282072  KM400857  n/a  n/a  KM401053  KM401154  KM401255  KM401350
Chirostoma labarcae  KC736399  KM400858  KM400770  KM400954  KM401054  KM401155  KM401256  KM401351
Chirostoma riojai  KC736398  KM400859  KM400771  KM400955  KM401055  KM401156  KM401257  KM401352
Labidesthes sicculus  JQ282031  KM575708  KM400781  KM400967  KM401067  KM401168  KM401268  KM401357
Leuresthes tenuis  JQ282032  KM400871  KM400782  KM400969  KM401069  KM401170  KM401270  n/a
Melanorhinus microps  KC736344  KM400873  KM400784  KM400971  KM401071  KM401172  KM401272  KM401359
Membras gilberti  JQ282034  KM400883  n/a  KM400981  KM401081  KM401182  KM401282  KM401369
Membras martinica  JQ282035  KM400884  n/a  KM400982  KM401082  KM401183  KM401283  KM401370
Menidia beryllina  KC736408  KM400885  KM400791  KM400983  KM401083  KM401184  KM401284  KM401371
Menidia colei  KC736373  KM400886  KM400792  KM400984  KM401084  KM401185  KM401285  KM401372
Menidia extensa  KC736370  KM400887  KM400793  KM400985  KM401085  KM401186  KM401286  KM401373
Menidia menidia  JQ282036  KM400888  KM400794  KM400986  KM401086  KM401187  KM401287  KM401374
Menidia peninsulae  KC736345  KM400889  KM400795  KM400987  KM401087  KM401188  KM401288  KM401375
Odontesthes argentinensis  KM400708  KM400893  KM400798  KM400991  KM401090  KM401192  KM401291  n/a
Odontesthes bonariensis  KM400709  KM400894  KM400799  KM400992  KM401091  KM401193  KM401292  KM401378
Odontesthes brevianalis  KM400713  KM400895  KM400800  KM400993  KM401092  KM401194  KM401293  KM401379
Odontesthes gracilis  KM400714  KM400896  KM400801  KM400994  KM401093  KM401195  KM401294  KM401380
Odontesthes hatcheri  KM400717  KM400897  KM400802  KM400995  KM401094  KM401196  KM401295  KM401381
Odontesthes humensis  KM400712  KM400913  KM400817  KM401011  KM401110  n/a  KM401309  KM401396
Odontesthes incisa  KM400720  n/a  KM400818  KM401012  n/a  n/a  KM401310  KM401397
Odontesthes ledae  KM400710  KM400898  KM400803  KM400996  KM401095  KM401197  KM401296  KM401382
Odontesthes mauleanum  KM400716  KM400899  KM400804  KM400997  KM401096  KM401198  KM401297  KM401383
Odontesthes nigricans  KM400719  KM400900  KM400805  KM400998  KM401097  KM401199  KM401298  KM401384
Odontesthes perugiae  KM400711  KM400901  KM400806  KM400999  KM401098  KM401200  KM401299  KM401385
[...]  KM400715  KM400902  KM400807  KM401000  KM401099  KM401201  KM401300  KM401386
Odontesthes retropinnis  n/a  KM400903  KM400808  KM401001  n/a  KM401202  KM401301  KM401387
Odontesthes smitti  KM400718  KM400904  KM400809  KM401002  KM401100  KM401203  KM401302  KM401388
Poblana alchichica  KC736395  KM400906  KM400811  KM401004  KM401102  KM401205  KM401304  KM401390
Poblana ferdebueni  KC736394  KM400907  KM400812  KM401005  KM401103  KM401206  KM401305  KM401391

Atherionidae
Atherion [...]  n/a  KM400843  KM400761  KM400940  n/a  KM401140  KM401242  n/a

Bedotiidae
Bedotia leucopteron  KM400721  KM400848  n/a  KM400945  KM401044  KM401145  KM401247  KM401341
Bedotia marojejy  KM400722  KM400849  KM400766  KM400946  KM401045  KM401146  KM401248  KM401342
Bedotia sp. Ankavia  KC133643  KM400850  KM400767  KM400947  KM401046  KM401147  KM401249  KM401343
Bedotia sp. Namorona  KM400734  KM400851  KM400768  KM400948  KM401047  KM401148  KM401250  KM401344
Rheocles vatosoa  KM400723  KM400911  n/a  KM401009  KM401109  KM401213  KM401307  n/a
Rheocles wrightae  KC133646  n/a  n/a  n/a  n/a  JX189633  JX189540  n/a

Isonidae
Iso natalensis  KM400690  KM400866  KM400777  KM400963  KM401063  KM401164  n/a  n/a
Iso sp.  JQ282011  n/a  n/a  n/a  KM401111  n/a  KM401311  KM401398

Melanotaeniidae
Cairnsichthys rhombosomoides  JQ282005  KM400852  KM400769  KM400949  KM401048  KM401149  KM401251  KM401345
Chilatherina fasciata  KC133596  KM400853  n/a  KM400950  KM401049  KM401150  KM401252  KM401346
Glossolepis incisus  GU932788  KM400864  KM400775  KM400961  KM401061  KM401162  KM401263  KM401353
Iriatherina werneri  JQ282006  KM400865  KM400776  KM400962  KM401062  KM401163  KM401264  KM401354
Melanotaenia affinis  KM400726  KM400874  n/a  KM400972  KM401072  KM401173  KM401273  KM401360
Melanotaenia australis  JQ282007  KM400875  KM400785  KM400973  KM401073  KM401174  KM401274  KM401361
Melanotaenia catherinae  KM400730  KM400876  KM400786  KM400974  KM401074  KM401175  KM401275  KM401362
Melanotaenia exquisita  KM400727  KM400877  KM400787  KM400975  KM401075  KM401176  KM401276  KM401363
Melanotaenia kokasensis  KM400731  KM400878  n/a  KM400976  KM401076  KM401177  KM401277  KM401364
Melanotaenia maccullochi  KM400729  KM400879  KM400788  KM400977  KM401077  KM401178  KM401278  KM401365
Melanotaenia nigrans  KM400728  KM400880  KM400789  KM400978  KM401078  KM401179  KM401279  KM401366
Melanotaenia splendida  KC133543  KM400881  n/a  KM400979  KM401079  KM401180  KM401280  KM401367
Melanotaenia trifasciata  KM400790  KM400882  KM400790  KM400980  KM401080  KM401181  KM401281  KM401368
Rhadinocentrus ornatus  JQ282009  n/a  KM400816  n/a  KM401108  KM401212  n/a  KM401395

Notocheiridae
Notocheirus hubbsi  JQ282012  KM400892  KM400797  KM400990  KM401089  KM401191  KM401290  n/a

Phallostethidae
Neostethus bicornis  n/a  KM400890  n/a  KM400988  n/a  KM401189  n/a  KM401376
Neostethus lankesteri  KM400735  KM400891  KM400796  KM400989  KM401088  KM401190  KM401289  KM401377

Pseudomugilidae
Kiunga ballochi  KM400724  KM400869  KM400780  KM400966  KM401066  KM401167  KM401267  KM401356
Pseudomugil gertrudae  Unmack2013  n/a  n/a  n/a  KM401105  KM401208  n/a  n/a
Pseudomugil novaeguineae  n/a  KM400909  KM400814  KM401007  KM401106  KM401209  n/a  KM401393
Pseudomugil signifer  n/a  KM400910  KM400815  KM401008  KM401107  KM401210  n/a  KM401394
Pseudomugil tenellus  JQ282014  n/a  n/a  n/a  n/a  KM401211  n/a  n/a

Telmatherinidae
Kalyptatherina helodes  KM400732  KM400867  KM400778  KM400964  KM401064  KM401165  KM401265  KM401355
Marosatherina ladigesi  KM400733  KM400872  KM400783  KM400970  KM401070  KM401171  KM401271  KM401358

Outgroup taxa

Beloniformes
Ablennes hians  AF231639  KM400819  KM400736  KM400914  KM401013  KM401112  KM401214  KM401312
Platybelone argalus  AF243874  KM400905  KM400810  KM401003  KM401101  KM401204  KM401303  KM401389
Oryzias latipes  AB480878  XM_004072598  n/a  XM_004069600.1  XM_004079893.1  EF032927  EF033005.1  n/a

Cyprinodontiformes
[...] arachan  n/a  KM400844  KM400762  KM400941  KM401040  KM401141  KM401243  KM401337
Poecilia latipinna  HQ677867  KM400908  KM400813  KM401006  KM401104  KM401207  KM401306  KM401392
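The per-taxon marker coverage discussed in Section 2.2.1 can be read directly from rows of Table 2.2. The short sketch below does this for four rows transcribed from the table; the helper name and the list-based row layout are illustrative choices, not part of the original workflow.

```python
MARKERS = ["cytb", "ficd", "gcs1", "kbtbd4", "kiaa-l", "myh6", "sh3px3", "slc10a3"]

# Rows transcribed from Table 2.2, in the column order given by MARKERS.
ROWS = {
    "Atherina hepsetus": ["n/a", "n/a", "n/a", "KM401010", "n/a", "n/a", "KM401308", "n/a"],
    "Pseudomugil tenellus": ["JQ282014", "n/a", "n/a", "n/a", "n/a", "KM401211", "n/a", "n/a"],
    "Rheocles wrightae": ["KC133646", "n/a", "n/a", "n/a", "n/a", "JX189633", "JX189540", "n/a"],
    "Atherina boyeri": ["EU036422", "KM400821", "KM400738", "KM400916",
                        "KM401015", "KM401114", "KM401216", "KM401314"],
}


def available_markers(accessions):
    """Return the markers with an accession number (anything but 'n/a')."""
    return [m for m, acc in zip(MARKERS, accessions) if acc != "n/a"]


for taxon, accessions in ROWS.items():
    found = available_markers(accessions)
    print(f"{taxon}: {len(found)} markers ({', '.join(found)})")
```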

Table 2.3. Summary of sequence data obtained, variation among taxa, and missing data.

Molecular marker          cytb         ficd         gcs1         kbtbd4       kiaa-l       myh6         sh3px3       slc10a3
Number of sequences       99           96           83           100          100          104          100          87
Alignment length (bp)     1104         645          1185         675          888          648          669          618
% Invariant sites         48           57.8         46.2         62.1         60.1         57.6         59           57.8
Inter-spp divergence [1]  23.6 (36.3)  11.1 (21.9)  12.3 (24.0)  8.1 (16.8)   6.5 (16.1)   9.7 (18.1)   5.6 (15.8)   9.3 (21.2)
Missing data [2]          8.3%         11.1%        23.1%        7.4%         7.4%         3.7%         7.4%         19.4%

[1] Divergence among species is measured as average and maximum (in parentheses) percent sequence difference (percent p-distance).
[2] Percent of taxa (out of 108) missing sequence data for each marker.
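The missing-data row of Table 2.3 follows directly from the number of sequences per marker and the 108 taxa in the data set. A minimal sketch of that arithmetic, using the counts reported in the table (the function name is illustrative):

```python
N_TAXA = 108  # total number of taxa (OTUs) in the concatenated data set

# Number of sequences obtained per marker, as reported in Table 2.3.
SEQUENCES = {
    "cytb": 99, "ficd": 96, "gcs1": 83, "kbtbd4": 100,
    "kiaa-l": 100, "myh6": 104, "sh3px3": 100, "slc10a3": 87,
}


def percent_missing(n_sequences, n_taxa=N_TAXA):
    """Percent of taxa lacking a sequence for a marker, to one decimal."""
    return round(100.0 * (n_taxa - n_sequences) / n_taxa, 1)


missing = {marker: percent_missing(n) for marker, n in SEQUENCES.items()}
for marker, pct in missing.items():
    print(f"{marker}: {pct}%")
```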

Table 2.4. Partitioning scheme and best-fit models resolved by PartitionFinder (implemented in RAxML and BEAST analyses) and TreeFinder (for AGY-coded data). Partitions in bold were found to be non-stationary and were AGY-coded for analysis with TreeFinder under the GTR3 model.

PartitionFinder (MrBayes models for MrBayes and RAxML)
Partition [1]  Data included  Model
1a  1st position ficd, gcs1, kbtbd4, kiaa-l, myh6, sh3px3, slc10a3  GTR + I + Γ
2a  2nd position ficd, gcs1, kbtbd4, kiaa-l, myh6, sh3px3  GTR + I + Γ
3a  3rd position gcs1, kbtbd4, kiaa-l, sh3px3  GTR + Γ
4a  3rd position ficd, myh6  GTR + Γ
5a  2nd position cytb, slc10a3  GTR + I + Γ
6a  3rd position slc10a3  GTR + Γ
7a  1st position cytb  GTR + I + Γ
8a  3rd position cytb  GTR + Γ

PartitionFinder (All models for Garli and BEAST)
Partition [1]  Data included  Model
1b  1st position gcs1, slc10a3  K81uf + I + Γ
2b  2nd position ficd, gcs1, kiaa-l, myh6, sh3px3  GTR + I + Γ
3b  3rd position ficd, gcs1, kbtbd4, myh6  GTR + Γ
4b  1st position ficd, kbtbd4, kiaa-l, myh6, sh3px3  TIM + I + Γ
5b  2nd position cytb, slc10a3  TVM + I + Γ
6b  3rd position slc10a3  TVM + Γ
7b  3rd position kiaa-l, sh3px3  SYM + Γ
8b  2nd position kbtbd4  K80 + I
9b  1st position cytb  TVMef + I + Γ
10b  3rd position cytb  TrN + Γ

TreeFinder
Partition [1]  Data included  Model
1c  1st position ficd, gcs1, kbtbd4, kiaa-l, myh6, sh3px3, slc10a3  GTR + Γ
2c  2nd position ficd, gcs1, kbtbd4, kiaa-l, myh6, sh3px3  GTR + Γ
3c  3rd position gcs1, kbtbd4  GTR3 + Γ
4c  3rd position ficd, myh6  GTR3 + Γ
5c  2nd position cytb, slc10a3  GTR + Γ
6c  3rd position slc10a3  GTR3 + Γ
7c  1st position cytb  GTR + Γ
8c  3rd position cytb  GTR3 + Γ
9c  3rd position kiaa-l, sh3px3  GTR + Γ

[1] 24 data blocks were defined a priori, by gene and codon position.
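A partitioning scheme is only valid if every one of the 24 a priori data blocks is assigned to exactly one partition. The sketch below checks this for the two PartitionFinder schemes. The partition memberships are transcribed from Table 2.4, with wrapped table cells reassembled, so the transcription itself should be treated as an assumption; the helper names are illustrative.

```python
from itertools import product

GENES = ["cytb", "ficd", "gcs1", "kbtbd4", "kiaa-l", "myh6", "sh3px3", "slc10a3"]
ALL_BLOCKS = set(product(GENES, (1, 2, 3)))  # 24 (gene, codon position) blocks


def blocks(position, genes):
    """Expand a space-separated gene list into (gene, position) blocks."""
    return [(gene, position) for gene in genes.split()]


# Scheme found with the MrBayes model subset (used for MrBayes and RAxML).
SCHEME_A = {
    "1a": blocks(1, "ficd gcs1 kbtbd4 kiaa-l myh6 sh3px3 slc10a3"),
    "2a": blocks(2, "ficd gcs1 kbtbd4 kiaa-l myh6 sh3px3"),
    "3a": blocks(3, "gcs1 kbtbd4 kiaa-l sh3px3"),
    "4a": blocks(3, "ficd myh6"),
    "5a": blocks(2, "cytb slc10a3"),
    "6a": blocks(3, "slc10a3"),
    "7a": blocks(1, "cytb"),
    "8a": blocks(3, "cytb"),
}

# Scheme found with all models (used for Garli and BEAST).
SCHEME_B = {
    "1b": blocks(1, "gcs1 slc10a3"),
    "2b": blocks(2, "ficd gcs1 kiaa-l myh6 sh3px3"),
    "3b": blocks(3, "ficd gcs1 kbtbd4 myh6"),
    "4b": blocks(1, "ficd kbtbd4 kiaa-l myh6 sh3px3"),
    "5b": blocks(2, "cytb slc10a3"),
    "6b": blocks(3, "slc10a3"),
    "7b": blocks(3, "kiaa-l sh3px3"),
    "8b": blocks(2, "kbtbd4"),
    "9b": blocks(1, "cytb"),
    "10b": blocks(3, "cytb"),
}


def is_exact_cover(scheme):
    """True if every data block appears in exactly one partition."""
    assigned = [b for part in scheme.values() for b in part]
    return len(assigned) == len(set(assigned)) and set(assigned) == ALL_BLOCKS


print(len(SCHEME_A), is_exact_cover(SCHEME_A))
print(len(SCHEME_B), is_exact_cover(SCHEME_B))
```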


2.2.2. Phylogenetic relationships

RAxML (Figure 2.2), Garli, and MrBayes searches all resulted in congruent topologies, but BEAST produced a slightly different result (Figure 2.3). Analyses using BEAST produced ESS values >200 for the combined runs, although some ESS values were low for partitions that contained only a single codon position. Trees sampled during the first 20 million generations of each run were discarded as burn-in. Newick files for trees obtained with BEAST and RAxML are available as supplemental_data3 and supplemental_data5, respectively, at figshare.com (see Appendix 2). Analysis with TreeFinder based on the AGY-coded data for non-stationary 3rd codon partitions (Table 2.4) resulted in almost the same topology as that obtained with RAxML when two rogue taxa (Atherina hepsetus and Pseudomugil tenellus) were excluded. Only two gene partitions were available for these two taxa; loss of information due to AGY coding was therefore the likely cause of their erroneous placement with unrelated taxa in different families across replicate runs. The only differences between the topologies obtained with RAxML and TreeFinder (when excluding the two rogue taxa) involve the branching order within the Odontesthes argentinensis clade and the relative positions of Menidia colei and Menidia menidia. The Newick file with the TreeFinder results is available as supplemental_data6 at figshare.com (see Appendix 2).
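The burn-in step can be pictured as filtering each run by generation number before pooling the runs. The sketch below is a simplified stand-in for what a tool such as LogCombiner does; the sample layout as (generation, value) pairs, the function names, and the choice to keep only samples taken strictly after the burn-in are assumptions made for illustration.

```python
def discard_burnin(samples, burnin_generations):
    """Keep (generation, value) samples taken after the burn-in period."""
    return [(g, v) for g, v in samples if g > burnin_generations]


def pool_runs(runs, burnin_generations):
    """Concatenate post-burn-in values from several independent runs."""
    pooled = []
    for run in runs:
        pooled.extend(v for _, v in discard_burnin(run, burnin_generations))
    return pooled


# Toy example with generations in millions (values are placeholder labels).
run_1 = [(0, "a0"), (10, "a1"), (20, "a2"), (30, "a3")]
run_2 = [(0, "b0"), (10, "b1"), (20, "b2"), (30, "b3"), (40, "b4")]
pooled = pool_runs([run_1, run_2], burnin_generations=20)
print(pooled)

# With the settings reported in the text: four runs of 150 million
# generations each, minus 20 million generations of burn-in per run.
retained_millions = 4 * (150 - 20)
print(retained_millions)
```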

All analyses resolve with high support the subdivision of Atheriniformes into two suborders, with Notocheirus hubbsi nested within the family Atherinopsidae. Within this family, minor differences in branching pattern between the ML and Bayesian results were observed for Poblana, some species of Chirostoma, and the position of Menidia menidia, but all analyses support the monophyly of the tribe Menidiini (Chirostoma, Labidesthes, Menidia, and Poblana), with Labidesthes sicculus as the sister group to the rest of the taxa in this clade (Figure 2.2 A and Figure 2.3). In contrast, the tribe Membradini (Atherinella, Melanorhinus, Membras) was not resolved as a monophyletic group, since Melanorhinus microps is more closely related to Menidiinae than to species of Atherinella, and Atherinella brasiliensis is the sister group to all other members of the subfamily Menidiinae (Figure 2.2 A and Figure 2.3). At the generic level, there is no support for the monophyly of Atherinella, Chirostoma, Menidia, or Poblana as currently defined, suggesting that these genera require revision. Notocheirus hubbsi was unambiguously resolved as the sister group of Menidiinae.

Less congruence between ML and Bayesian results was observed for relationships among families in the suborder Atherinoidei (Figure 2.2 B and Figure 2.3). For example,

Isonidae was resolved either as the sister group to Atherinidae (RAxML) or as the sister group to a larger clade that contains Atherinidae, Bedotiidae, Melanotaeniidae,

Pseudomugilidae, and Telmatherinidae (BEAST). All analyses failed to support the monophyly of Melanotaeniidae because Cairnsichthys is never in the same clade as the other taxa included in this family. The position of Bedotiidae in relation to

Melanotaeniidae, Pseudomugilidae and Telmatherinidae is not resolved with confidence, but there is strong support for a sister group relationship between Pseudomugilidae and

Telmatherinidae (Figure 2.2 B and Figure 2.3). RAxML placed Phallostethids and

Atherion in a well-supported clade that is sister to all other taxa in the suborder

Atherinoidei (Figure 2.2 B), but BEAST results placed Atherion as the sister taxon to

Melanotaeniidae, to the exclusion of Cairnsichthys, which is instead placed as the sister group

of Telmatherinidae plus Pseudomugilidae (Figure 2.3). Our results support the monophyly of the family Atherinidae, its subfamilies Craterocephalinae,

Atherinomorinae and two genera for which we had more than one species in our data set:

Craterocephalus and Atherina.

In spite of the discrepancies mentioned above, the data set provided significant phylogenetic signal to resolve relationships within Atheriniformes. A test of alternative hypotheses (Shimodaira and Hasegawa, 2001) rejected the topology proposed by Dyer &

Chernoff (1996), summarized in Figure 2.1 (p<0.001). Atherinopsidae and its subfamilies were well supported, with 100% bootstrap support in RAxML, and 0.9-1.0 posterior probability in BEAST. Posterior probability density values on the consensus BEAST tree were higher than 0.9, both at the ordinal and family levels. However, our dataset provided weak resolution for the relationships among families in Atherinoidei. While placement of

Phallostethids as sister to all other taxa is highly supported (posterior probability of 1), the relationships among other families are not, ranging from 0.24 for the relationship between Atherinidae and Bedotiidae, Telmatherinidae, Pseudomugilidae, Atherion and

Melanotaeniidae, to 0.55 for the relationship between Isonidae and all other families excluding Phallostethids. The phylogenetic placement of Atherion is still unresolved, as our maximum likelihood analysis placed it sister to the Phallostethids with high support, but BEAST placed it as sister to Melanotaeniidae, although with low posterior probability.

[Figure 2.2, panels A and B: maximum-likelihood trees. The image and tip labels are not recoverable from the extracted text; the node legend distinguishes 100%, 99-90%, and 89-70% bootstrap support.]

Figure 2.2 A (Atherinopsoidei) & B (Atherinoidei). Maximum-likelihood phylogenetic hypotheses for the Atheriniformes obtained with RAxML. Bootstrap support values are indicated on nodes as black (100% bootstrap support), grey (99-90%) and white circles (89-70%). A: suborder Atherinopsoidei; B: suborder Atherinoidei.

[Figure 2.3: time-calibrated tree. The image and tip labels are not recoverable from the extracted text; the time axis runs from 80 to 0 millions of years.]

Figure 2.3. Time-calibrated phylogeny obtained with BEAST. Numbers inside black circles indicate the placement for the 7 calibrations used. Bars represent the 95% highest posterior credibility intervals of divergence times.


2.2.3. Time-calibrated phylogeny

Figure 2.3 shows the time-calibrated phylogeny obtained with BEAST, indicating the position of six fossil calibration points and one secondary calibration for the root.

Divergence times and their estimated 95% highest posterior density (HPD) intervals place the origin of this order in the Late Cretaceous (72.8 Ma). Subsequent divergence between Old World and New World taxa started in the Paleogene and all currently recognized families originated during the Eocene and Oligocene (50-23 Ma).

2.2.4. Ancestral Habitat Reconstruction

Reconstruction of habitat utilization indicates, with a 37% relative probability, that the common ancestors to the New and Old World lineages, Atherinopsoidei and

Atherinoidei, were both marine. However, this analysis also suggests (with 30% probability) that the Old World ancestor could have been euryhaline, and assigns a 13% probability to a freshwater ancestor. Within Atherinopsidae, the ancestors of subfamilies

Atherinopsinae and Menidiinae were reconstructed as marine with a high probability

(76%). Highest probability values for reconstructed ancestral habitats are indicated for all nodes in Figure 2.4. Complete results of the Lagrange analysis can be found in supplemental_data7 at figshare.com (see Appendix 2). We also performed habitat reconstruction on a calibrated time-tree constrained to the RAxML topology, to account for different scenarios due to our conflicting topologies in Atherinoidei. The atheriniform ancestor is reconstructed as marine with a higher probability, 60%, under this constrained topology (complete results can be found in supplemental_data8 at figshare.com,

Appendix 2).
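As a rough illustration of how relative probabilities such as those reported above can be obtained, the sketch below normalizes the likelihoods of alternative ancestral states at a single node. It is not the Lagrange implementation: the function name `relative_probabilities` and the log-likelihood values are hypothetical and serve only to show the arithmetic.

```python
import math


def relative_probabilities(log_likelihoods):
    """Convert per-state log-likelihoods at one node into relative probabilities."""
    # Subtract the maximum before exponentiating to avoid numerical underflow.
    max_lnl = max(log_likelihoods.values())
    weights = {state: math.exp(lnl - max_lnl) for state, lnl in log_likelihoods.items()}
    total = sum(weights.values())
    return {state: weight / total for state, weight in weights.items()}


# Hypothetical log-likelihoods for three alternative ancestral habitats (not from the study).
example_lnl = {"marine": -250.0, "euryhaline": -250.5, "freshwater": -251.5}
probs = relative_probabilities(example_lnl)

for state, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{state}: {p:.2f}")
```

Small differences in log-likelihood translate into modest differences in relative probability, which is one reason why nodes with a best state below 50% should be read as weakly resolved.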

[Figure 2.4: ancestral habitat reconstruction mapped on the BEAST chronogram. The image, tip labels, and node probability values are not recoverable from the extracted text.]

Figure 2.4. Ancestral habitat reconstruction for the Atheriniformes based on the chronogram obtained with BEAST (Figure 2.3). Marine state indicated by blue lines, freshwater by red, and euryhaline by black dotted lines. Probability values on the nodes indicate the probability of the reconstructed ancestral state shown. Euryhaline taxa that do not have a euryhaline ancestor are shown with their most probable ancestral state colored at the subtending node.


2.3. Discussion

This study provides a comprehensive time-tree for the order Atheriniformes. Our dataset includes 103 atheriniform species, almost 30% of the 352 valid species in the order, a significant increase from previous molecular phylogenies where the species coverage varied from 1% to 14% (Bloom et al., 2012; Setiamarga et al., 2008; Sparks and Smith,

2004). We included two-thirds of all genera, with dense sampling in the most genera-rich families (92% for Atherinopsidae, 67% for Atherinidae, 86% for Melanotaeniidae).

Unfortunately, Dentatherina merceri (monotypic family Dentatherinidae) was not available for this study nor included in any of the published molecular phylogenies, hence its relationships remain unresolved. The amount of DNA sequence data analyzed herein also is larger than previous efforts, with a total length of 6,432 sites for eight gene fragments (one mitochondrial and seven nuclear loci), compiled into a data matrix that is

89% complete (Tables 2.2 and 2.3). The resulting molecular phylogeny was calibrated on the basis of six carefully documented fossil taxa, placed on the phylogeny with high confidence. Phylogenetic resolution afforded by our data set was significant, with high measures of support for most internal branches, and substantial convergence among different types of analyses (ML and Bayesian, but see below). Species-tree methods that account for coalescent variance to accommodate potential biases due to anomalous gene tree distributions (e.g., Huang et al., 2010) were not tested. In order to obtain a dataset with complete gene representation for all families suitable for BEAST (Drummond et al.,

2012), our matrix would have to be reduced to five markers (out of eight) and 36 ingroup species (out of 103). We preferred to emphasize taxon sampling and maximal use of our phylogenetic markers, a strategy that provides robust results for inference of deep

phylogenetic questions with concatenation approaches (Lambert et al., 2014). Potential biases originating from non-stationarity of base composition in some data blocks (most

3rd codon positions) were shown to have no effect on phylogenetic results. The time-calibrated phylogeny presented in Figure 2.3 provides a robust framework for understanding the evolution of this important group of fishes.

2.3.1. Taxonomic implications

The early split between New World and Old World silversides suggested by several authors (Figure 2.1) was resolved with confidence in this study, supporting the subdivision of Atheriniformes in two suborders (Atherinopsoidei and Atherinoidei).

Phylogenetic resolution within Atherinopsoidei afforded by our data was more robust than within Atherinoidei, as indicated by lower bootstrap and posterior probability values and by discordances in topology between ML and Bayesian results for Atherinoidei

(Figure 2.2 B and Figure 2.3). Taxonomic sampling also was more robust within

Atherinopsoidei (49% or 54 out of 110 species) than Atherinoidei (20% or 49 out of 242 species). As a consequence, taxonomic recommendations for Atherinoidei seem somewhat premature on the basis of our data.
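The sampling percentages quoted above follow directly from the species counts, and the short sketch below simply reproduces that arithmetic. The counts are taken from the text; the helper name `coverage` is introduced here only for illustration.

```python
# Species counts taken from the text: (species sampled, valid species).
sampling = {
    "Atherinopsoidei": (54, 110),
    "Atherinoidei": (49, 242),
}


def coverage(sampled, total):
    """Percentage of valid species represented in the data set."""
    return 100.0 * sampled / total


for suborder, (sampled, total) in sampling.items():
    print(f"{suborder}: {sampled}/{total} = {coverage(sampled, total):.0f}%")

# Summing both suborders gives the coverage for the whole order.
order_sampled = sum(sampled for sampled, _ in sampling.values())
order_total = sum(total for _, total in sampling.values())
print(f"Atheriniformes: {order_sampled}/{order_total} = {coverage(order_sampled, order_total):.0f}%")
```

The two suborders add up to the 103 sampled species and 352 valid species reported for the order as a whole.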

Our results for Atherinopsoidei are consistent with a previous study (Bloom et al., 2012) that resolved the position of Notocheirus hubbsi among New World taxa. In agreement with these authors, we support the designation of a monotypic subfamily Notocheirinae within Atherinopsidae, placed as the sister-group of the subfamily Menidiinae. A third subfamily (Atherinopsinae) is supported with confidence by our data.


Within Menidiinae, the monophyly of the tribe Membradini proposed by Chernoff (1986) and Dyer and Chernoff (1996) is not corroborated by our results (Figure 2.2 A and Figure

2.3). Melanorhinus microps, placed by these authors within Membradini, has closer affinities with tribe Menidiini (labeled Membradini-2 in Figure 2.2 A), a result also supported by analyses of rag1 and cytb data (Bloom et al., 2012). Two other species of

Melanorhinus (M. boekei and M. cyanellus) were not available for either of the studies, but a redefinition of Menidiini that contains the genus Melanorhinus seems necessary.

Within Menidiini, our analyses support sinking Chirostoma Swainson 1839 and Poblana de Buen 1945 as junior synonyms of Menidia Bonaparte 1836, in agreement with Miller et al. (2005). Another taxon previously assigned to Membradini (Atherinella brasiliensis) is placed with confidence as a sister group to all other taxa in the Menidiinae, a result also obtained by analysis of mtDNA (nd2 and cytb) and two nuclear genes, tmo4c4 and rag1

(Bloom et al., 2013), suggesting that a new tribe may need to be defined for this taxon and putative close relatives in the future. This result also implies that the genus

Atherinella is in need of revision. In fact, since the two species of Membras (M. gilberti and M. martinica) are deeply nested among all species of Atherinella included in this study (except A. brasiliensis) it is necessary to reassign all species in the clade labeled

Membradini-1 (Figure 2.2 A), which includes the type species for the genera Atherinella

(A. panamensis Steindachner 1875) and Membras (type species for Membras is M. martinica Bonaparte 1836), to the genus Membras. Therefore, Atherinella becomes a junior synonym of Membras. Atherinella brasiliensis (Quoy and Gaimard 1825) should be reassigned to the genus Xenomelaniris (Schultz 1948), formerly a subgenus of

Atherinella, changing its valid name to Xenomelaniris brasiliensis (Quoy and Gaimard

1825). Other species that may be included in Xenomelaniris but were not examined in this study include Atherinella robbersi from Lake Totumo, Colombia and A. venezuelae from Trinidad & Tobago and Venezuela, also placed in the subgenus Xenomelaniris by

Chernoff (1986). Until a taxonomic revision of these two latter species is completed we do not assign these taxa to any tribe and list them, together with X. brasiliensis, as incertae sedis within Menidiinae. Table 2.5 lists these proposed changes in sequential classification.

Within Atherinoidei, some of our results are congruent with previous hypotheses, such as the non-monophyly of Melanotaeniidae (Aarn and Ivantsoff, 1997; Bloom et al., 2012), supporting the notion that Cairnsichthys should be recognized as an independent lineage.

When initially described, Cairnsichthys was placed as a sister group to Pseudomugilidae based on its morphological specializations (Allen 1980); it was later resolved as sister to

Pseudomugilidae + Telmatherinidae (Bloom et al., 2012), or to the rest of

Melanotaeniidae (Unmack et al., 2013). The morphological distinctiveness of

Cairnsichthys has been attributed to its restricted distribution in a few drainages in the Wet Tropics of Queensland, to competition with sympatric Melanotaenia splendida, or to intense predation pressure (Unmack et al., 2013). We tentatively list Cairnsichthys as incertae sedis within Atherinoidei (Table 2.5). Within Melanotaeniidae, our phylogenetic hypotheses are consistent with previous results proposing the non-monophyly of

Chilatherina, Glossolepis, and Melanotaenia (Unmack et al., 2013). Geographic groups were proposed for species of Melanotaenia by these authors, distinguishing Western New

Guinea (M. catherinae and M. kokasensis), Northern New Guinea (C. fasciata, G. incisus and M. affinis), and Southern New Guinea-Australia (M. australis, M. exquisita, M.

maccullochi, M. nigrans, M. splendida, M. trifasciata). Both our RAxML and BEAST results support these groupings, though there is minor conflict between the topologies of the Southern New Guinea-Australian clade (Figure 2.2 B, Figure 2.3).

Other phylogenetic results within Atherinoidei receive consistent support and may be informative for taxonomy. For example, an earlier suggestion to sink Telmatherinidae into Pseudomugilidae (Sparks and Smith, 2004), also consistent with analysis of mtDNA data alone (Stelbrink et al., 2014), is not supported by our results. Both families are resolved as monophyletic groups with high confidence and placed as sister taxa in all our analyses (Figure 2.2 B and Figure 2.3). The revised classification for rainbowfishes proposed by these authors (Sparks and Smith, 2004: their Table 3) is based on fewer taxa for these two families or on a single molecular marker (Stelbrink et al., 2014). On the other hand, their sampling within the family Bedotiidae included 18 OTUs for Bedotia and six for Rheocles resulting in the non-monophyly of the latter. This hypothesis is consistent with our results, supporting their recommendation for genus Rheocles to be retained for R. wrightae (plus R. alaotrensis and R. lateralis) and to erect a new genus for

R. vatosoa (and R. derhami). The suborder Melanotaenioidei (containing Pseudomugilidae,

Melanotaeniidae, Bedotiidae) is not supported by any of our analyses.


Table 2.5. New sequential classification of families of Atheriniformes and subfamilies, tribes, and genera of Atherinopsidae based on phylogenetic relationships proposed herein (Figure 2.2).

Order Atheriniformes Rosen 1966
  Suborder Atherinopsoidei
    Family Atherinopsidae Fitzinger 1873
      Subfamily Atherinopsinae Fitzinger 1873
        Tribe Atherinopsini Fitzinger 1873
          Atherinops Steindachner 1876
          Atherinopsis Girard 1854
          Colpichthys Hubbs 1918 (not examined)
          Leuresthes Jordan & Gilbert 1880
        Tribe Sorgentinini Pianta de Risso & Risso 1953
          Basilichthys Girard 1855
          Odontesthes Evermann & Kendall 1906
      Subfamily Menidiinae Schultz 1948
        Incertae sedis within Menidiinae: Xenomelaniris brasiliensis (Quoy & Gaimard 1825), “Atherinella” venezuelae (not examined) and “Atherinella” robbersi (not examined)
        Tribe Menidiini Schultz 1948
          Labidesthes Cope 1870
          Melanorhinus Metzelaar 1919
          Menidia Bonaparte 1836 [includes Chirostoma Swainson 1839 and Poblana de Buen 1945]
        Tribe Membradini Chernoff 1986
          Membras Bonaparte 1836 [includes Atherinella Steindachner 1875]
      Subfamily Notocheirinae Schultz 1950
        Notocheirus Clark
  Suborder Atherinoidei
    Incertae sedis within Atherinoidei: Cairnsichthys Allen 1980
    Family Atherinidae Risso 1827
    Family Atherionidae Schultz 1948
    Family Bedotiidae Jordan & Hubbs 1919
    Family Isonidae Rosen 1964
    Family Melanotaeniidae Gill 1894
    Family Phallostethidae Regan 1916
    Family Pseudomugilidae Kner 1867
    Family Telmatherinidae Munro 1958


2.3.2. Timing of diversification

The few published time-calibrated atheriniform phylogenies available have been inferred for smaller subsets of taxa and were based on single calibration points or on estimated rates of molecular divergence. For example, the origin of five European Atherina species

(A. boyeri, A. breviceps, A. hepsetus, and A. presbyter) was placed at 19 Ma (Pujolar et al., 2012) assuming that a paleogeographic event (closure of the Gibraltar strait dated 5.6

Ma) caused vicariance between A. hepsetus and A. presbyter. This estimate, however, is remarkably close to our estimate for this node (MRCA of A. hepsetus and A. breviceps) at

~19 Ma and consistent with the 10 Ma †Atherina atropatiensis used for calibration point

3 (Carnevale et al., 2011). Bloom et al. (2013) calibrated a molecular phylogeny for New

World silversides (Atherinopsidae) using three fossil constraints and obtained a date of

37 Ma for the origin of this family and 27 Ma for the origin of Menidiinae. These dates are significantly younger than our estimates of 62 Ma and 42 Ma for Atherinopsidae and

Menidiinae, respectively (Figure 2.3). Their younger age estimates are a likely consequence of misinformative fossil priors, most critically the hard minimum bound of

5.3 Ma for the MRCA of Basilichthys and Odontesthes (Bloom et al., 2013: 2042), since stratigraphic studies place the age of Basilichthys and Odontesthes fossils at 11 and 20

Ma, respectively (Suarez and Emparan, 1995), as used for our calibration points 5 and 6

(see methods). Bloom et al. (2013) also have reduced taxonomic representation within

Atherinoidei (only Atherinomorus was used as an outgroup) precluding definition of fossil constraints within this suborder. Another study focusing only on Melanotaeniidae

(Unmack et al., 2013) assumed a “standard” pairwise divergence rate of 1% for the cytochrome b gene to date the origin of this family at ~80 Ma (95% HPD of 63.5 – 99


Ma). This age is significantly older than our estimate for the origin of Melanotaeniidae

(24 Ma, Figure 2.3), and still older than our estimated date for the origin of

Atheriniformes (72.8 Ma). Another study based on mtDNA alone (Stelbrink et al., 2014) used the same rate of evolution for cytb and three alternative calibration approaches and inferred the origin of Melanotaeniidae at 17 - 55 Ma, encompassing our estimated value of 23 Ma. Molecular rates are highly variable among taxa and therefore not reliable as

“standard yardsticks” to calibrate phylogenies, diminishing confidence in these results. It also may be argued that the age prior for the root of Atheriniformes used in our analysis

(calibration 1: 70.5 Ma, 95% soft upper bound 77.5 Ma) imposed a strong constraint on inferred maximum ages. This root prior, however, was based on a large-scale analysis of

202 taxa representing all major bony fish lineages with 60 fossil calibration points

(Betancur-R et al., 2013a). The most relevant fossils among the 60 used in that study were two 49 Ma heroine and geophagine cichlids (López-Fernández et al., 2013), phylogenetically close to atheriniforms within Ovalentaria. Therefore, the weight of evidence used to support our choice of root calibration prior and consistency with other fossils used in our study increase confidence in our results. The divergence date estimated by Unmack et al. (2013) for Melanotaeniidae is closer to the age of Ovalentaria estimated by Betancur-R et al. (2013a) and others (~100 Ma). This issue is critical to assess competing biogeographic hypotheses, for example to explain the current distribution of freshwater melanotaeniids and bedotiids (see discussion on ancestral habitat reconstruction, below).
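The sensitivity of rate-based dating discussed above can be made concrete with a strict-clock calculation, in which age equals pairwise distance divided by pairwise rate. The sketch below is a simplified illustration and not the procedure used in any of the cited studies: it assumes the 1% rate is expressed per million years, and the distance of 0.20 substitutions per site and the two alternative rates are hypothetical.

```python
def divergence_time_my(pairwise_distance, pairwise_rate_per_my):
    """Strict-clock age in millions of years: t = d / r.

    Both arguments are pairwise quantities (accumulated along both lineages),
    so no factor of two is needed.
    """
    if pairwise_rate_per_my <= 0:
        raise ValueError("rate must be positive")
    return pairwise_distance / pairwise_rate_per_my


# Hypothetical corrected pairwise distance (substitutions per site).
distance = 0.20

# 0.01 is the "standard" 1% rate mentioned in the text (assumed here to be per My);
# 0.005 and 0.02 are hypothetical alternatives.
for rate in (0.005, 0.01, 0.02):
    age = divergence_time_my(distance, rate)
    print(f"rate {rate:.1%} per My -> {age:.0f} My")
```

Under this simple model the inferred age scales inversely with the assumed rate, so halving or doubling the rate doubles or halves the age estimate.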


2.3.3. Vicariance, oceanic dispersal, and freshwater invasions in Atheriniformes

Though vicariance has long been the leading explanation for widely distributed taxa

(Parenti, 2008), improved methodologies in sequencing, molecular phylogenetics, and time-calibrated phylogenies have found support for dispersal as a more likely explanation for the distribution of many groups (Crisp et al., 2011; de Queiroz, 2005; Sanmartin,

2008). The dates of divergence among families obtained here significantly post-date the

Gondwanan continental break-up events invoked by most vicariance hypotheses (e.g.,

Sparks and Smith, 2004; Unmack et al., 2013). Therefore, the current distribution of

Atheriniformes is more likely the result of oceanic dispersal. This hypothesis is supported by the wide range of salinity tolerance displayed by atheriniform fishes, making marine dispersal physiologically plausible, and by evidence that euryhalinity has evolved multiple times in some atheriniform groups (Bloom et al., 2013). We find additional support for a marine-dispersal hypothesis in the results of ancestral habitat reconstruction, which suggests that the atherinopsoid and atherinoid ancestors were marine or euryhaline

(see Figure 2.4, and supplemental_data7 and 8 for more detail), implying that the divergence between Atherinopsoidei and Atherinoidei is the result of marine dispersal.

The marine-freshwater boundary poses a significant physiological challenge to many organisms (Lee and Bell, 1999), but silverside fishes seem to cross it with relative ease.

Our fossil-calibrated phylogeny (Figure 2.3) in combination with the ancestral habitat reconstruction (Figure 2.4, supplemental data 7 and 8) points to several instances of marine dispersal and subsequent freshwater colonization by silversides. For example, the distribution of the atherinid species Alepidomus evermanni and Atherinomorus stipes in the Caribbean can only be explained by marine dispersal, followed by freshwater

invasion by A. evermanni, given that their closest relatives inhabit the Indian and western

Pacific Oceans. Similarly, the divergence between Melanotaeniidae and Bedotiidae was hypothesized to be the result of vicariance following the break-up of Gondwana, 140-80

Ma (Sparks and Smith, 2004; Unmack et al., 2013). We do not find these families as sister taxa in either of our analyses, though our ancestral habitat reconstruction suggests a freshwater ancestor for each of the Bedotiidae, Pseudomugilidae, Telmatherinidae, and

Melanotaeniidae families. This is possibly due to bias from the exclusively freshwater modern taxa found in Bedotiidae and Melanotaeniidae, and lack of any fossil evidence that would provide data on historical ranges. The topology obtained in BEAST does suggest a possible relationship between Atherion, a marine species, and the freshwater melanotaeniids (excluding Cairnsichthys), but this relationship is poorly supported in our analyses. At 44 Ma (Figure 2.3), the ancestor of all of these families is too young to explain the current distribution of these taxa by vicariance. Instead, we hypothesize that extinct marine ancestors must have dispersed and colonized freshwater.
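The timing argument above amounts to comparing a node age with the age window of a geological event. A minimal sketch of that comparison follows, using the 44 Ma node age and the 140-80 Ma break-up window quoted in the text; the function name `compare_to_event` is hypothetical, and a fuller treatment would compare the 95% HPD interval of the node rather than a point estimate.

```python
def compare_to_event(node_age_ma, event_old_ma, event_young_ma):
    """Classify a divergence age (Ma) relative to a geological event window (Ma)."""
    if node_age_ma > event_old_ma:
        return "predates event"
    if node_age_ma < event_young_ma:
        return "postdates event"
    return "within event window"


# Values from the text: common ancestor at 44 Ma; Gondwanan break-up at 140-80 Ma.
result = compare_to_event(44.0, event_old_ma=140.0, event_young_ma=80.0)
gap = 80.0 - 44.0  # time elapsed between the end of the event and the node
print(result, f"(by {gap:.0f} My)")
```

A node that postdates the event cannot have been produced by it, which is the reasoning used here to favor dispersal over vicariance.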

In the suborder Atherinopsoidei, where taxon sampling is more complete, repeated invasions of freshwater by marine or euryhaline ancestors are relatively common, occurring in Atherinella, Chirostoma and Poblana, Basilichthys and Odontesthes.

Frequent marine dispersal followed by freshwater colonization makes Atheriniformes an interesting system in which to study the processes of speciation and diversification along the marine-freshwater barrier (Bloom et al., 2013).

In addition to marine dispersal, two vicariant events have been hypothesized to be responsible for the current distribution of atherinopsoid silversides: the rise of the

Isthmus of Panama, and equatorial warming during the middle Miocene (White, 1986).


Menidiinae is distributed in Central America in both the Atlantic and Pacific Oceans, with some species extending northwards along eastern North America. Our study and others (Bloom et al., 2013) do not find support for the hypothesis that the distribution of

Menidiinae is a result of vicariance due to the rise of the modern Isthmian link, which rose above sea level 5-7 Ma (White, 1986). However, an earlier Isthmian link that formed during the Eocene and may have persisted into the Miocene has been suggested by geological data (Montes et al., 2012). Our confidence intervals on the divergence between Pacific-distributed “Atherinella” blackburni and “Atherinella” starksi from their

Atlantic relatives extend into the upper Eocene (~33 Ma, Figure 2.3), so we cannot rule out vicariance caused by an older Isthmian link. Similarly, the anti-tropical distribution of

Atherinopsinae was proposed to be the result of unfavorable climatic warming in the

Middle Miocene (White, 1986). Our analysis dates the origin of Atherinopsinae to the

Oligocene, making the vicariance scenario unlikely. Ultimately, we do not have conclusive evidence for a vicariance hypothesis in New World silversides, but do find that oceanic dispersal was a primary driver in shaping the modern distribution of

Atheriniformes.

2.4. References

Aarn, Ivantsoff, W., 1997. Descriptive anatomy of Cairnsichthys rhombosomoides and Iriatherina werneri (Teleostei: Atheriniformes), and a phylogenetic analysis of Melanotaeniidae. Ichthyol. Explor. Freshwaters 8, 107-150.

Aarn, W.I., Kottelat, M., 1998. Phylogenetic analysis of Telmatherinidae (Teleostei: Atherinomorpha), with description of Marosatherina, a new genus from Sulawesi. Ichthyological Exploration of Freshwaters 9, 311-323.

Asensio, M.A., Cornou, M.E., Malumian, N., Martinez, M.A., Quattrocchio, M.E., 2010. Formación Rio Foyel, Oligoceno de la cuenca de Ñirihuau: la transgresión pacífica en la cordillera norpatagonica. Revista de la Asociacion Geologica Argentina 66, 399-405.

! #"! !

Beheregaray, L.B., 2000. Population Genetics of the Silverside Odontesthes argentinensis (Teleostei, Atherinopsidae): Evidence for Speciation in an of Southern Brazil. Copeia 2, 441-447. Beheregaray, L.B., Sunnucks, P., Briscoe, D.A., 2002. A rapid fish radiation associated with the last sea-level changes in southern Brazil: the silverside Odontesthes perugiae complex. Proceedings. Biological sciences / The Royal Society 269, 65-73. Betancur, R.R., Broughton, R.E., Wiley, E.O., Carpenter, K., Lopez, J.A., Li, C., Holcroft, N.I., Arcila, D., Sanciangco, M., Cureton Ii, J.C., Zhang, F., Buser, T., Campbell, M.A., Ballesteros, J.A., Roa-Varon, A., Willis, S., Borden, W.C., Rowley, T., Reneau, P.C., Hough, D.J., Lu, G., Grande, T., Arratia, G., Orti, G., 2013. The tree of life and a new classification of bony fishes. PLoS currents 5. Betancur-R, R., Broughton, R.E., Wiley, E.O., Carpenter, K., Lopez, J.A., Li, C., Holcroft, N.I., Arcila, D., Sanciangco, M., Cureton Ii, J.C., Zhang, F., Buser, T., Campbell, M.A., Ballesteros, J.A., Roa-Varon, A., Willis, S., Borden, W.C., Rowley, T., Reneau, P.C., Hough, D.J., Lu, G., Grande, T., Arratia, G., Orti, G., 2013a. The tree of life and a new classification of bony fishes. PLoS currents 2013 Apr 18. Edition 1. Betancur-R, R., Li, C., Munroe, T.A., Ballesteros, J.A., Ortí, G., 2013b. Addressing gene tree discordance and non-stationarity to resolve a multi-locus phylogeny of the flatfishes (Teleostei: Pleuronectiformes). Systematic Biology 62, 763-785. Betancur-R., R., Ortí, G., Stein, A.M., Marceniuk, A.P., Pyron, R.A., 2012. Apparent signal of competition limiting diversification after ecological transitions from marine to freshwater habitats. Ecology Letters 15, 822-830. Bloom, D.D., Lovejoy, N.R., 2012. Molecular phylogenetics reveals a pattern of biome conservatism in New World anchovies (family Engraulidae). Journal of evolutionary biology 25, 701-715. 
Bloom, D.D., Piller, K.R., Lyons, J., Mercado-Silva, N., Medina-Nava, M., 2009. Systematics and Biogeography of the Silverside Tribe Menidiini (Teleostomi: Atherinopsidae) Based on the Mitochondrial ND2 Gene. Copeia 2009, 408-417.
Bloom, D.D., Unmack, P.J., Gosztonyi, A.E., Piller, K.R., Lovejoy, N.R., 2012. It's a family matter: molecular phylogenetics of Atheriniformes and the polyphyly of the surf silversides (family: Notocheiridae). Molecular phylogenetics and evolution 62, 1025-1030.
Bloom, D.D., Weir, J.T., Piller, K.R., Lovejoy, N.R., 2013. Do Freshwater Fishes Diversify Faster Than Marine Fishes? A Test Using State-Dependent Diversification Analyses and Molecular Phylogenetics of New World Silversides (Atherinopsidae). Evolution; international journal of organic evolution, n/a-n/a.
Bocchino, A., 1971. Algunos peces fosiles del denominado Patagoniano del Oeste de Chubut, Argentina. Ameghiniana 8, 52-64.
Carnevale, G., Haghfarshi, E., Abbasi, S., Alimohammadian, H., Reichenbacher, B., 2011. A New Species of Silverside from the Late Miocene of NW Iran. Acta Palaeontologica Polonica 56, 749-756.
Chernoff, B., 1986. Phylogenetic relationships and reclassification of menidiine silverside fishes with emphasis on the tribe Membradini. Proc. Acad. Nat. Sci. Philadelphia 138, 189-249.
Cione, A.L., Baez, A.M., 2007. Peces continentales y anfibios cenozoicos de Argentina: los ultimos cincuenta años. Ameghiniana 50º aniversario, 195-220.
Crisp, M.D., Trewick, S.A., Cook, L.G., 2011. Hypothesis testing in biogeography. Trends in ecology & evolution 26, 66-72.
de Queiroz, A., 2005. The resurrection of oceanic dispersal in historical biogeography. Trends in ecology & evolution 20, 68-73.
Drummond, A.J., Suchard, M.A., Xie, D., Rambaut, A., 2012. Bayesian phylogenetics with BEAUti and the BEAST 1.7. Molecular Biology and Evolution 29, 1969-1973.
Dyer, B., 1998. Phylogenetic systematics and historical biogeography of the family Atherinopsidae (Teleostei, Atheriniformes). Phylogeny and Classification of Neotropical Fishes. LR Malabarba, RE Reis, RP Vari, ZMS Lucena, and CAS Lucena (eds.). Edipucrs, Porto Alegre, Brazil, 519-536.
Dyer, B.S., 1997. Phylogenetic Revision of Atherinopsidae (Teleostei, Atherinopsidae), with comments on the systematics of the South American freshwater fish genus Basilichthys Girard. Miscellaneous Publications Museum of Zoology, University of Michigan 185.
Dyer, B.S., Chernoff, B., 1996. Phylogenetic relationships among atheriniform fishes (Teleostei: Atherinomorpha). Zoological Journal of the Linnean Society 117, 1-69.
Eschmeyer, W.N.e., 2013. Catalog of fishes: genera, species, references.
Fluker, B.L., Pezold, F., Minton, R.L., 2011. Molecular and morphological divergence in the inland silverside (Menidia beryllina) along a freshwater-estuarine interface. Environmental Biology of Fishes 91, 311-325.
Gaudant, J., Reichenbacher, B., 2005. Hemitrichas stapfi n. sp. (Teleostei, Atherinidae) with otoliths in situ from the late Oligocene of the Mainz Basin. Zitteliana A45, 189-198.
Gradstein, 2012. The geologic time scale 2012 2-volume set. Elsevier.
Huang, H., He, Q., Kubatko, L.S., Knowles, L.L., 2010. Sources of error inherent in species-tree estimation: impact of mutational and coalescent effects on accuracy and implications for choosing among different methods. Systematic Biology 59.
Ivantsoff, W., Aarn, Shepherd, M.A., Allen, G.R., 1997. Pseudomugil reticulatus, (Pisces: Pseudomugilidae) a review of the species originally described from a single specimen, from Vogelkop Peninsula, Irian Jaya with further evaluation of the systematics of Atherinoidea. aqua Journal of Ichthyology and Aquatic Biology 2, 53-64.
Ivantsoff, W., Said, B., Williams, A., 1987. Systematic position of the family Dentatherinidae in relationship to Phallostethidae and Atherinidae. Copeia, 649-658.
Jermiin, L., Ho, S.Y., Ababneh, F., Robinson, J., Larkum, A.W., 2004. The biasing effect of compositional heterogeneity on phylogenetic estimates may be underestimated. Systematic Biology 53, 638-643.
Jost, J., Kälin, D., Schulz-Mirbach, T., Reichenbacher, B., 2007. Late Early Miocene lake deposits near Mauensee, central Switzerland: Fish fauna (otoliths, teeth), accompanying biota and palaeoecology. Eclogae Geologicae Helvetiae 99, 309-326.
Katoh, K., Misawa, K., Kuma, K., Miyata, T., 2002. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Research 30, 3059-3066.
Kumar, S., Gadagkar, S.R., 2001. Disparity index: a simple statistic to measure and test the homogeneity of substitution patterns between molecular sequences. Genetics 158, 1321-1327.

Lambert, S.M., Reeder, T.W., Wiens, J.J., 2014. When Do Species-Tree and Concatenated Estimates Disagree? An Empirical Analysis with Higher-Level Scincid Lizard Phylogeny. Molecular phylogenetics and evolution in press.
Lee, C.E., Bell, M.A., 1999. Causes and consequences of recent freshwater invasions by saltwater . TREE 14.
Lewallen, E.A., Pitman, R.L., Kjartanson, S.L., Lovejoy, N.R., 2011. Molecular systematics of flyingfishes (Teleostei: Exocoetidae): evolution in the pelagic zone. Biological Journal of the Linnean Society 192, 161-174.
Li, C., Betancur-R, R., Smith, L., Ortí, G., 2011. Monophyly and interrelationships of Snook and Barramundi (Centropomidae sensu Greenwood) and five new markers for fish phylogenetics. Molecular phylogenetics and evolution 60, 463-471.
Li, C., Lu, G., Orti, G., 2008. Optimal data partitioning and a test case for ray-finned fishes () based on ten nuclear loci. Systematic biology 57, 519-539.
Li, C., Orti, G., Zhang, G., Lu, G., 2007. A practical approach to phylogenomics: the phylogeny of ray-finned fish (Actinopterygii) as a case study. BMC evolutionary biology 7, 44.
López-Fernández, H., Arbour, J.H., Winemiller, K.O., Honeycutt, R.L., 2013. Testing for ancient adaptive radiations in Neotropical cichlid fishes. Evolution 67, 1321-1337.
Malz, H., 1978. Aquitane Otolithen-Horizonte im Untergrund von Frankfurt am Main. Senckenbergiana lethaea 58, 451-471.
McGuigan, K., Zhu, D., Allen, G.R., Moritz, C., 2000. Phylogenetic relationships and historical biogeography of melanotaeniid fishes in Australia and New Guinea. Marine Freshwater Research 51, 713-723.
McNeill, D.F., Klaus, J.S., Budd, A.F., Lutz, B.P., Ishman, S.E., 2011. Late Neogene chronology and sequence stratigraphy of mixed carbonate-siliciclastic deposits of the Cibao Basin, Dominican Republic. Geological Society of America Bulletin 124, 35-58.
Miller, M.A., Pfeiffer, W., Schwartz, T., 2010. Creating the CIPRES Science Gateway for inference of large phylogenetic trees. Proceedings of the Gateway Computing Environments Workshop (GCE) 1-8.
Montes, C., Cardona, A., McFadden, R., Moron, S.E., Silva, C.A., Restrepo-Moreno, S., Ramirez, D.A., Hoyos, N., Wilson, J., Farris, D., Bayona, G.A., Jaramillo, C.A., Valencia, V., Bryan, J., Flores, J.A., 2012. Evidence for middle Eocene and younger land emergence in central Panama: Implications for Isthmus closure. GSA Bulletin 124, 780-799.
Nelson, J.S., 2006. Fishes of the world. Fourth Edition. John Wiley & Sons, Inc., Hoboken, New Jersey.
Nolf, D., Aguilera, O., 1998. Fish otoliths from the Cantaure Formation (Early Miocene of Venezuela). Bulletin de l'Institut Royal des Sciences Naturelles de Belgique 68, 237-262.
Nolf, D., Stringer, G.L., 1992. Neogene paleontology in the Northern Dominican Republic. Bulletins of American Paleontology 102.
Parenti, L., 2008. Common cause and historical biogeography. In: Ebach, M.C., Tangney, R.S. (Eds.), Biogeography in a changing world. CRC Press, Taylor & Francis Group, Boca Raton, Florida, pp. 61-82.

Parenti, L.R., 1984. On the relationships of Phallostethid fishes (Atherinomorpha), with notes on the anatomy of Phallostethus dunckeri Regan, 1913. American Museum Novitates 2779, 1-12.
Parenti, L.R., 1986. Homology of pelvic fin structures in female phallostethid fishes (Atherinomorpha, Phallostethidae). Copeia, 305-310.
Parenti, L.R., 1993. Relationships of Atherinomorph Fishes (Teleostei). Bulletin of Marine Science 52, 170-196.
Pujolar, J.M., Zane, L., Congiu, L., 2012. Phylogenetic relationships and demographic histories of the Atherinidae in the Eastern Atlantic and Mediterranean Sea re-examined by Bayesian inference. Molecular phylogenetics and evolution 63, 857-865.
Ree, R.H., Smith, S.A., 2008. Maximum likelihood inference of geographic range evolution by dispersal, local extinction, and cladogenesis. Systematic biology 57, 4-14.
Reichenbacher, B., 2000. Das brackisch-lakustrine Oligozaen und Unter-Miozaen im Mainzer Becken und Hanauer Becken. Courier Forschungsinst. Senckenberg Frankfurt aM.
Reichenbacher, B., Alimohammadian, H., Sabouri, J., Haghfarshi, E., Faridi, M., Abbasi, S., Matzke-Karasz, R., Fellin, M.G., Carnevale, G., Schiller, W., Vasilyan, D., Scharrer, S., 2011. Late Miocene stratigraphy, palaeoecology and palaeogeography of the Tabriz Basin (NW Iran, Eastern Paratethys). Palaeogeography, Palaeoclimatology, Palaeoecology 311, 1-18.
Reichenbacher, B., Weidmann, M., 1992. Fisch-Otolithen aus der oligo-/miozänen Molasse der West-Schweiz und der Haute-Savoie (Frankreich): mit 1 Tabelle. Staatliches Museum für Naturkunde.
Ronquist, F., Huelsenbeck, J.P., 2003. MRBAYES 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19, 1572-1574.
Rosen, D.E., 1964. The relationships and taxonomic position of the halfbeaks, killifishes, silversides, and their relatives. Bulletin of the American Museum of Natural History 127, 217-268.
Rosen, D.E., Parenti, L.R., 1981. Relationships of Oryzias, and the groups of atherinomorph fishes. American Museum Novitates 2719, 1-25.
Rubilar, A., 1994. Diversidad ictiologica en depositos continentales miocenos de la Formacion Cura-Mallin, Chile (37-39ºS): implicancias paleogeograficas. Revista Geologica de Chile 21, 3-29.
Saeed, B., Ivantsoff, W., Crowley, L.E.L.M., 1994. Systematic relationships of Atheriniform families within Division I of the series Acanthomorpha (Acanthopterygii) with relevant historical perspectives. Journal of Ichthyology 34, 27-72.
Sanmartin, I., 2008. Inferring dispersal: a Bayesian approach to phylogeny-based island biogeography, with special reference to the Canary Islands. Journal of Biogeography 35, 428-449.
Setiamarga, D.H., Miya, M., Yamanoue, Y., Mabuchi, K., Satoh, T.P., Inoue, J.G., Nishida, M., 2008. Interrelationships of Atherinomorpha (medakas, flyingfishes, killifishes, silversides, and their relatives): The first evidence based on whole mitogenome sequences. Molecular phylogenetics and evolution 49, 598-605.
Shimodaira, H., Hasegawa, M., 2001. CONSEL: for assessing the confidence of phylogenetic tree selection. Bioinformatics (Oxford) 17, 1246-1247.

Sparks, J.S., Smith, W.L., 2004. Phylogeny and biogeography of the Malagasy and Australasian rainbowfishes (Teleostei: Melanotaenioidei): Gondwanan vicariance and evolution in freshwater. Molecular phylogenetics and evolution 33, 719-734.
Stamatakis, A., 2006. RAxML-VI-HPC: Maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22, 2688-2690.
Stamatakis, A., Hoover, P., Rougemont, J., 2008. A fast bootstrapping algorithm for the RAxML web-servers. Systematic biology 57, 758-771.
Stelbrink, B., Stöger, I., Hadiaty, R.K., Schliewen, U.K., Herder, F., 2014. Age estimates for an adaptive lake fish radiation, its mitochondrial introgression, and an unexpected sister group: Sailfin silversides of the Malili Lakes system in Sulawesi. BMC Evolutionary Biology 14, 94.
Suarez, M., Emparan, C., 1995. The stratigraphy, geochronology and paleophysiography of a Miocene fresh-water interarc basin, southern Chile. Journal of South American Earth Sciences 8, 17-31.
Swofford, D., 2002. PAUP 4.0 b10: Phylogenetic analysis using parsimony. Sinauer Associates, Sunderland, MA, USA.
Tamura, K., Peterson, D., Peterson, N., Stecher, G., Nei, M., Kumar, S., 2011. MEGA5: Molecular Evolutionary Genetics Analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Molecular Biology and Evolution 28, 2731-2739.
Unmack, P.J., Allen, G.R., Johnson, J.B., 2013. Phylogeny and biogeography of rainbowfishes (Melanotaeniidae) from Australia and New Guinea. Molecular phylogenetics and evolution 67, 15-27.
Unmack, P.J., Dowling, T.E., 2010a. Biogeography of the genus Craterocephalus (Teleostei: Atherinidae) in Australia. Molecular phylogenetics and evolution 55, 968-984.
Unmack, P.J., Dowling, T.E., 2010b. Biogeography of the genus Craterocephalus (Teleostei: Atherinidae) in Australia. Molecular phylogenetics and evolution 55, 968-984.
Upchurch, P., 2008. Gondwanan break-up: legacies of a lost world? Trends in ecology & evolution 23, 229-236.
Weiler, W., 1942. Die Otolithen des rheinischen und nordwestdeutschen Tertiärs. Reichsamt für Bodenforschung.
Weiler, W., Schäfer, W., 1963. Die Fischfauna des Tertiärs im oberrheinischen Graben, des Mainzer Beckens, des unteren Maintals und der Wetterau, unter besonderer Berücksichtigung des Untermiozäns. Kramer.
White, B.N., 1986. The isthmian link, antitropicality and American biogeography: distributional history of the Atherinopsinae (Pisces: Atherinidae). Systematic Zoology 35, 176-194.
White, B.N., Lavenberg, R.J., McGowen, G.E., 1984. Atheriniformes: development and relationships. Ontogeny and Systematics of Fishes. American Society of Ichthyologists and Herpetologists, pp. 355-362.
Zhu, D., Jamieson, B., Hugall, A., Moritz, C., 1994. Sequence evolution and phylogenetic signal in control-region and cytochrome b sequences of rainbow fishes (Melanotaeniidae). Mol. Biol. Evol. 11, 672-683.


Zwickl, D.J., 2006. Genetic algorithm approaches for the phylogenetic analysis of large biological sequence datasets under the maximum likelihood criterion. The University of Texas at Austin, Austin, TX. Zwickl, D.J., 2011. GARLI 2.0.

Chapter 3.

The draft genome of Odontesthes bonariensis (Teleostei, Ovalentaria, Atherinopsidae)

3.1. Introduction

Odontesthes bonariensis (Valenciennes 1835) (Atheriniformes, Teleostei), the pejerrey, is a commercially important freshwater species that has been used in aquaculture for more than a century and stocked into natural and artificial environments in several countries (Dyer, 2006; Somoza et al., 2008).

It is one of a few teleost species with temperature-dependent sex determination (TSD) (Wibbels et al., 1991), a poorly understood mechanism by which water temperature during a critical period of larval development is the major determinant of sex (Strüssmann et al., 1997).

Increased interest in physiological and genetic studies of this species has been driven by this unique developmental trait. As with any other complex character, in-depth whole-genome analyses may shed light on the structure, function, and regulation of genes involved in TSD. Recent advances in sequencing technologies allow for more manageable and cost-effective access to whole-genome studies in non-model species like the pejerrey, a species with enormous potential for novel discoveries.

Among the fish genomes currently available, the most closely related to pejerrey are those of the medaka, Oryzias latipes (Beloniformes) (Kasahara, 2007), and of two species of Cyprinodontiformes, Poecilia formosa (Schedina et al., 2014) and Xiphophorus maculatus (Schartl et al., 2013), all currently classified in the superorder Atherinomorphae (Parenti, 1993). Current estimates based on calibrated molecular phylogenies place the divergence of the pejerrey from the lineage leading to Beloniformes and Cyprinodontiformes at around 77 million years ago (Betancur et al., 2013).

This chapter presents results of the “Pejerrey Genome Project”, a joint effort involving The George Washington University, the J. Craig Venter Institute and the Children’s National Medical Center to sequence, assemble, and annotate a draft genome of the pejerrey, a species listed as one of the 100 “gold standard” fish species by the Genome 10K consortium (Bernardi et al., 2012). The strategy chosen to sequence the pejerrey genome is a modification of the shotgun method (Pop, 2009; Staden, 1979), which presents a computational challenge in assembly that is addressed in the discussion section of this chapter. Genomic reads can be assembled either by mapping the newly generated sequences to a reference genome from a closely related species (reference assembly), or under a de novo approach that does not require an existing reference (Miller et al., 2010; Yandell and Ence, 2012).

Considering the large evolutionary distance between pejerrey and its closest relatives for which genomes are available, the de novo approach was selected to assemble the pejerrey genome. In this chapter the output assemblies of two de novo assemblers, SOAPdenovo (Li et al., 2010c) and AllPathsLG (Gnerre et al., 2011), are presented and compared. The preferred assembly was structurally annotated using an automated pipeline that is evaluated in this chapter, taking into consideration the limitations of this approach (Yandell and Ence, 2012).

Downstream analyses focus on identification of repetitive elements, large protein families, and the syntenic relationship between the medaka and pejerrey genomes. Based on existing literature for other teleost genomes, the pejerrey genome is expected to have a large diversity of repetitive elements but with low copy number, and a high degree of gene order conservation with medaka, due to conserved karyotypic structure (Kai et al., 2011; Kasahara, 2007). This genome draft is intended to serve as an initial exploration of the biological complexity of the pejerrey from a genomic perspective and to provide the basis for improved and more complete versions in the near future.

3.2. Materials and Methods

3.2.1. DNA Isolation, Construction of Illumina Libraries, and Sequencing

Genomic DNA was extracted from muscle tissue of an O. bonariensis male, a descendant of a founder stock from Argentina that was bred for several generations in Japan and later re-introduced in Argentina. The stock has been maintained at the rearing facilities of IIB-INTECH, Chascomús, Argentina (Karube et al., 2007). Total DNA was extracted using a standard proteinase K/phenol/chloroform extraction method (Sambrook et al., 1989).

Paired-end (PE) and mate-pair (MP) libraries were prepared for the Illumina sequencing platform at the J. Craig Venter Institute. PE libraries were targeted for 200 and 300 bp fragments, and MP libraries for 3 Kbp fragments. The diverse insert sizes were chosen based on the requirements of the assembly software. Two rounds of sequencing were performed on a HiScan SQ system at Children’s National Medical Center in Washington, DC.

Reads were quality controlled using output statistics from the Illumina sequencing software, FastQC, and CLCbio. The parameters evaluated were number of reads per flowcell, read length distribution, per base sequence quality values and error rates.

CLCbio end trimming software was used to trim read ends, and less than 1% of the bases in each library were trimmed. For read correction, we used Mertrim, a module of the Celera Assembler that is capable of correcting mismatch and indel errors (Miller et al., 2008).

3.2.2. Assembly methods and Selection of Optimal Assembly

Reads were assembled with two different algorithms, SOAPdenovo v.1.05 63-mer (Luo et al., 2012) and AllPaths-LG (Gnerre et al., 2011), both based on Eulerian path de Bruijn graph (DBG) methods, which are more suitable for very large short-read datasets (Miller et al., 2010) and large, complex genomes (Li et al., 2010a; Li et al., 2010c). For SOAPdenovo assemblies, k-mer length (K) is a critical parameter that needs to be defined a priori. Different K values were tested (31 bp, 51 bp and 75 bp) in three independent assemblies using only 30% of the total number of reads, with equal representation of the different sequencing libraries. Reads were randomly chosen from the read files. Assemblies were tested both with raw reads and with trimmed and corrected reads to identify and account for sequencing errors and low quality base calls. The optimal K value was chosen by comparison of the three assemblies using the parameters described in the next paragraph. All trimmed and corrected reads were then assembled with the optimal K value and the “-d D” flag for removal of low frequency k-mers. The AllPaths-LG assembly was run with raw reads, as recommended by the software’s developers, and using default settings.
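The random subsampling used for the K-value tests can be illustrated with a minimal Python sketch. It assumes that "equal representation" means drawing the same fraction of reads from every library, and that each item in a library stands for one read pair so that mates stay together; the function name, library labels, and seed are hypothetical and do not reproduce the procedure actually used in this study.

```python
import random


def subsample_reads(libraries, fraction=0.30, seed=0):
    """Draw the same fraction of reads, at random, from every library.

    `libraries` maps a library label to a list of reads (or read pairs).
    Sampling is without replacement and reproducible for a given seed.
    """
    rng = random.Random(seed)
    subset = {}
    for name, reads in libraries.items():
        n_keep = int(len(reads) * fraction)  # number of reads kept from this library
        subset[name] = rng.sample(reads, n_keep)
    return subset


# Toy example with two hypothetical libraries of different sizes.
toy = {"PE_200bp": list(range(100)), "MP_3Kbp": list(range(50))}
sub = subsample_reads(toy)
print({name: len(reads) for name, reads in sub.items()})
```

For real FASTQ files a streaming approach would be preferable to holding all reads in memory; the list-based version is used here only to keep the example short.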

Resulting assemblies were evaluated based on the following parameters: total number of bases incorporated, number of scaffolds, and scaffold N50 (defined as the length of the smallest scaffold in the smallest set of largest scaffolds whose combined length represents at least 50% of the assembly; Miller et al., 2010), calculated for the set of scaffolds larger than 10,000 bp, with and without gaps.
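The N50 definition above can be made concrete with a short Python sketch. The function name and the `min_length` parameter are illustrative assumptions; the 10,000 bp threshold follows the text, and the 50% is taken relative to the retained scaffolds.

```python
def scaffold_n50(lengths, min_length=0):
    """Return the N50 of the scaffolds longer than `min_length`.

    Scaffolds are sorted from largest to smallest and accumulated until
    their combined length reaches at least half of the retained total;
    the length of the last scaffold added is the N50.
    """
    kept = sorted((n for n in lengths if n > min_length), reverse=True)
    if not kept:
        return 0
    half = sum(kept) / 2
    running = 0
    for n in kept:
        running += n
        if running >= half:
            return n


# Toy example: the total is 30, and 10 + 6 = 16 is the first sum to reach 15.
print(scaffold_n50([2, 3, 4, 5, 6, 10]))  # 6
```

To obtain the value "without gaps", the same function can be applied to scaffold lengths computed after excluding gap (N) positions.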

Sequencing gap distribution was analyzed with a custom script from JCVI. The script counts the number and length of gaps present in each scaffold.
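The JCVI script itself is not reproduced here. As a rough illustration of the same kind of count, the following Python sketch assumes that gaps are represented as runs of "N" characters in the scaffold sequences, which is the usual convention in scaffolded assemblies; all function names are hypothetical.

```python
import re


def gap_lengths(sequence):
    """Return the length of every run of N (or n) in a scaffold sequence."""
    return [len(match.group()) for match in re.finditer(r"[Nn]+", sequence)]


def gap_summary(scaffolds):
    """Map each scaffold name to its number of gaps and their lengths."""
    summary = {}
    for name, sequence in scaffolds.items():
        lengths = gap_lengths(sequence)
        summary[name] = {"n_gaps": len(lengths), "gap_lengths": lengths}
    return summary


# Toy example with one gapped and one ungapped scaffold.
print(gap_summary({"scaffold_1": "ACGTNNNNACGNNT", "scaffold_2": "ACGT"}))
```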

3.2.3. Genome Size Estimation

Estimated genome size was calculated using a custom method that surveys the fraction of high quality sequencing reads that map back to the final SOAPdenovo genome assembly. First, repeats and low-complexity regions in the assembled sequence were identified and masked using RepeatScout (Price et al., 2005) and RepeatMasker (Smit et al., 2010). Then, the first 40 nucleotides of 50,000 randomly chosen reads were used as queries for a BLAST search against the genome assembly, and hits with an e-value lower than 1e-5 were counted. The choice of 40 nucleotides is based on the drop in base quality towards the end of each read that is inherent to Illumina sequencing (Fuller et al., 2009). The BLAST cut-off value was selected based on preliminary tests with JCVI datasets for Toxoplasma sp. and simulated data. Genome size is calculated as

Estimated Genome Size = Masked Genome Size × (50,000 Reads / Number of Reads that Matched the Genome).

This value is compared with the genome size estimation provided by AllPaths-LG.
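The formula can be written as a small Python function. The input values in the example are invented for illustration only and are not results of this study.

```python
def estimate_genome_size(masked_genome_size, reads_sampled, reads_matched):
    """Scale the masked assembly size by the inverse of the matching fraction.

    If only a fraction of randomly sampled reads matches the masked
    assembly, the genome is assumed to be proportionally larger than it.
    """
    if reads_matched <= 0:
        raise ValueError("at least one read must match the assembly")
    return masked_genome_size * (reads_sampled / reads_matched)


# Hypothetical numbers: a 400 Mbp masked assembly and 40,000 of 50,000
# sampled reads matching it would give an estimate of 500 Mbp.
print(estimate_genome_size(400_000_000, 50_000, 40_000))
```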

3.2.4. Gene Prediction / Structural Annotation

An initial set of high-quality gene models was generated manually using similarity between protein sequences as evidence. Figure 3.1 shows the steps of this process. Protein alignments to the pejerrey genome were generated with the Analysis and Annotation Tool (AAT) package (Huang et al., 1997) followed by GeneWise (Birney et al., 2004), using the protein databases Pfam (Finn et al., 2013) and Uniprot (Consortium, 2014) and version 68 of the proteome of the closely related medaka (Oryzias latipes) (Flicek et al., 2013). A total of 400 gene models selected from the GeneWise output were used for training gene predictors. To select these gene models, results from the AAT and GeneWise alignments were filtered to keep models with a percent sequence identity greater than 50 and fewer than 20 amino acids of difference in the 5’ start and 3’ end regions. Also, the UTRs (untranslated regions) of medaka were used to filter the models with similar transcription start and end coordinates. All these filtering steps were performed using Microsoft Excel. The training set also included 14 previously identified Odontesthes bonariensis genes, some of which were only partial sequences (Table 3.1). The Program to Assemble Spliced Alignments (PASA), a eukaryotic genome annotation tool (http://pasa.sourceforge.net/), was used to align the 14 pejerrey gene sequences and to incorporate them as model gene structures. PASA exploits spliced alignments of expressed transcript sequences to automatically model gene structures, and to maintain gene structure annotation consistent with the most recently available experimental sequence data.
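The identity and end-difference filters applied to the alignment results were carried out in a spreadsheet. Purely as an illustration, the same two criteria can be expressed in Python as below; the record layout and field names (`percent_identity`, `start_diff_aa`, `end_diff_aa`) are hypothetical and do not correspond to actual AAT or GeneWise output columns.

```python
def passes_filter(model, min_identity=50.0, max_end_diff=20):
    """Keep a model with identity above 50% and fewer than 20 amino acids
    of difference at both the 5' start and the 3' end."""
    return (
        model["percent_identity"] > min_identity
        and model["start_diff_aa"] < max_end_diff
        and model["end_diff_aa"] < max_end_diff
    )


def filter_models(models):
    """Return only the candidate gene models that satisfy both criteria."""
    return [model for model in models if passes_filter(model)]


# Toy records; only the first satisfies both criteria.
candidates = [
    {"id": "model_a", "percent_identity": 82.0, "start_diff_aa": 3, "end_diff_aa": 0},
    {"id": "model_b", "percent_identity": 45.0, "start_diff_aa": 1, "end_diff_aa": 2},
    {"id": "model_c", "percent_identity": 90.0, "start_diff_aa": 25, "end_diff_aa": 0},
]
print([model["id"] for model in filter_models(candidates)])  # ['model_a']
```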

Although the abundance of transposable elements in fish genomes is low (Aparicio et al., 2002), these elements have a highly variable gene structure that differs from that of other protein-coding genes (Feschotte and Pritham, 2007). To avoid introducing such structural variability into the training set, Transposon PSI (transposonpsi.sourceforge.net), an application of PSI-Blast to mine (retro-)transposons, was used to identify potential transposable element-encoding genes. Four ab initio gene prediction programs, trained with the curated training set described above, were run on the pejerrey genome assembly: Genezilla (Majoros et al., 2004), Augustus (Stanke et al., 2004), SNAP (Korf, 2004), and FGENESH (Salamov and Solovyev, 2000). The resulting sets of ab initio gene predictions were combined into one consensus gene dataset using EVidenceModeler, an annotation tool that reports a weighted consensus of gene structures from all available evidence (Haas et al., 2008). The weight for each gene predictor was assigned by comparing two measures of exon prediction accuracy, sensitivity and specificity. Sensitivity is the number of correct exons over the number of actual exons, while specificity is the number of correct exons over the number of predicted exons (Haas et al., 2008). Statistics of the gene annotation were obtained with the GenomeTools package (Lee and Chen), option gt stat.
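The two accuracy measures can be sketched in Python as follows. The sketch assumes that an exon counts as correct only when its coordinates match a reference exon exactly, and it represents exons as (start, end) tuples; both choices are illustrative assumptions rather than a description of how the weights were derived in this study.

```python
def exon_accuracy(predicted, actual):
    """Return (sensitivity, specificity) at the exon level.

    sensitivity = correct exons / actual exons
    specificity = correct exons / predicted exons
    An exon is treated as correct when it matches a reference exon exactly.
    """
    predicted, actual = set(predicted), set(actual)
    correct = len(predicted & actual)
    sensitivity = correct / len(actual) if actual else 0.0
    specificity = correct / len(predicted) if predicted else 0.0
    return sensitivity, specificity


# Toy example: 2 of 4 reference exons are recovered among 3 predictions.
reference = [(1, 100), (200, 300), (400, 500), (600, 700)]
prediction = [(1, 100), (200, 300), (410, 500)]
print(exon_accuracy(prediction, reference))
```

In this toy case sensitivity is 2/4 and specificity is 2/3. A predictor scoring higher on both measures would be given more weight when predictions are combined.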

[Figure 3.1: flowchart. Alignments of the pejerrey genome against medaka gene models, Uniprot, and Pfam (AAT searches, GeneWise); top hits (1 hit/region, minimum alignment 80%); training set (386 gene models, 14 known pejerrey genes); gene prediction (Augustus, Genezilla, Fgenesh, Snap); specificity and sensitivity; EVidenceModeler (predictions are combined by weight); final gene models.]

Figure 3.1 Schematic representation of the annotation process in the pejerrey genome. See 3.2.4 for a more detailed explanation.

Table 3.1. Complete sequences of 14 pejerrey genes used in the training set for gene predictions. Common gene names are given in parentheses. Genes were also included in the pejerrey annotation using PASA software. New IDs in the pejerrey genome are given in the first column and consist of the number of the scaffold in which the gene model is located, an “m” for gene model, and a distinct number for each gene model in each scaffold.

ID in Pejerrey Genome    GenBank ID    GenBank Name
1352.m000001             EF030342.1    cytochrome P450 aromatase 19a1 (cyp19A1)
1576.m000001             GQ381267.1    cytochrome P450 family 11 subfamily b (P45011beta)
1679.m000001             EU864151.1    forkhead box L2 (Foxl2)
1723.m000001             DQ875595.1    gonadotropin-releasing hormone receptor type 2A
293.m000001              AY744689.1    prepro pejerrey-type gonadotropin-releasing hormone
2965.m000001             EU284022.2    estrogen receptor beta 1
386.m000001              EU257205.1    preproinsulin-like growth factor-I
43.m000001               AY380061.2    brain cytochrome P450 19b (cyp19b)
52.m000001               JN228384.1    PRP-PACAP precursor
60.m000001               AY744688.1    prepro salmon-type gonadotropin-releasing hormone
646.m000001              DQ382280.1    glycoprotein hormone alpha subunit
67.m000001               AY319415.4    transcription factor SOX9
7352.m000001             AY319832.2    FSH beta subunit
8998.m000001             AY744687.1    prepro chicken-type gonadotropin-releasing hormone II

3.2.5. Functional annotation

Gene-level searches were performed using a number of protein, domain, and profile databases including Pfam (Finn et al., 2013), TIGRfam (Haft et al., 2003), Cazy (Lombard et al., 2014), Uniref, CDD (Marchler-Bauer et al., 2012), Priam (Claudel‐Renard et al., 2003), and PANDA, a JCVI internal repository of non-redundant and non-identical protein data built periodically from public databases (e.g. GenBank http://www.ncbi.nlm.nih.gov, PDB http://www.rcsb.org/pdb/Welcome.do, UniProt http://www.pir2.uniprot.org, and the Comprehensive Microbial Resource database http://www.tigr.org/CMR) that include the latest sequences. Protein transmembrane domains were predicted with TMHMM (Krogh et al., 2001), signal peptides with SignalP (Nielsen et al., 1997), and cellular localization with TargetP (Petersen et al., 2011). The predicted gene models were automatically assigned informative names by computational extraction of BLASTP searches against the databases previously mentioned in this sub-section.

Annotated proteins were organized into domain-based protein families, following the method described by Lin et al. (2008). The annotated pejerrey proteome was searched against Pfam and TIGRfam HMM profiles with HMMER2. All sequence regions with scoring values above each domain cutoff value were assigned as domain representatives. Sequences that did not represent any known domain were searched against each other using BLASTP, to identify potential novel protein domain clusters. The BLASTP cutoff values required for two sequences to be clustered were 30% identity and an e-value < 0.001 over a minimum span of 50 amino acids. A Jaccard coefficient was calculated to measure similarity between each linked pair of proteins. The Jaccard coefficient is the number of BLASTP hits that match both query proteins over the number of BLASTP hits that match either of the two proteins (Haas et al., 2005). Only peptides with a Jaccard coefficient above 0.6 were clustered by domain composition, whether Pfam/TIGR or novel. All proteins with the same set of domains were classified as putative protein families.
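The Jaccard coefficient described above can be illustrated with a minimal Python sketch. It assumes that each protein is associated with the set of identifiers of its BLASTP hits; the function names and toy data are hypothetical, and the sketch covers only the pairwise linking step, not the full family-building pipeline.

```python
def jaccard(hits_a, hits_b):
    """Shared BLASTP hits divided by the hits matching either protein."""
    a, b = set(hits_a), set(hits_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def linked_pairs(hits, threshold=0.6):
    """Return the protein pairs whose Jaccard coefficient exceeds the threshold.

    `hits` maps each protein name to the collection of its BLASTP hits.
    """
    names = sorted(hits)
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if jaccard(hits[first], hits[second]) > threshold:
                pairs.append((first, second))
    return pairs


# Toy example: p1 and p2 share 3 of 4 distinct hits (coefficient 0.75).
toy_hits = {
    "p1": {"a", "b", "c", "d"},
    "p2": {"a", "b", "c"},
    "p3": {"x"},
}
print(linked_pairs(toy_hits))  # [('p1', 'p2')]
```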

3.2.6. Assessment of genome completeness and synteny with medaka

To evaluate the degree of representation of the medaka genome in the pejerrey assembly, pairwise alignments in protein space were performed with the promer utility of MUMmer 3.0 (Kurtz et al., 2004) between each medaka chromosome

(ASM31367v1 whole genome shotgun sequence) and all pejerrey scaffolds.

Alignments were visualized with Gnuplot (Williams and Kelley, 2011).

To find genomic regions where gene order between medaka and pejerrey is conserved, a synteny analysis was run on the pejerrey and medaka proteomes using DAGchainer, which identifies chains of gene pairs sharing conserved order between genomic regions (Haas et al., 2004). The medaka proteome was downloaded from Ensembl, release 68 (Flicek et al., 2013). The output file generated by DAGchainer was analyzed with Microsoft Excel to obtain the length, average length, and total number of syntenic blocks, as well as their gene annotations.
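The block statistics described above were obtained in a spreadsheet; an equivalent summary could be scripted along the following lines. This is a sketch under stated assumptions: the input is taken to be a list of (block ID, medaka gene, pejerrey gene) tuples already parsed from the DAGchainer output, and the function name, block IDs, and gene IDs are hypothetical.

```python
from collections import Counter


def summarize_blocks(gene_pairs, min_genes=1):
    """Count syntenic blocks, their mean size, and the largest block.

    gene_pairs: iterable of (block_id, medaka_gene, pejerrey_gene).
    min_genes:  blocks with fewer gene pairs than this are ignored.
    """
    sizes = Counter(block_id for block_id, _, _ in gene_pairs)
    kept = {b: n for b, n in sizes.items() if n >= min_genes}
    if not kept:
        return {"blocks": 0, "mean_genes": 0.0, "largest": None}
    largest = max(kept, key=kept.get)
    return {
        "blocks": len(kept),
        "mean_genes": sum(kept.values()) / len(kept),
        "largest": (largest, kept[largest]),
    }


# Hypothetical toy input with two blocks.
example_pairs = [
    ("b1", "m1", "p1"), ("b1", "m2", "p2"), ("b1", "m3", "p3"),
    ("b2", "m4", "p4"), ("b2", "m5", "p5"),
]

if __name__ == "__main__":
    print(summarize_blocks(example_pairs))
    print(summarize_blocks(example_pairs, min_genes=3))
```

The min_genes argument allows the same summary to be restricted to blocks above a chosen size.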

3.2.7. Repetitive elements

RepeatScout (Price et al., 2005) was used to identify families of repetitive elements in the pejerrey genome assembly. The resulting library of putative repeats was filtered by removing sequences likely to correspond to pejerrey protein-coding genes, low-complexity regions, and very short sequences. The filtered repeat library was used as a BLAST query (e-value 10e-10) against the Pfam conserved domain database (Finn et al., 2013), from which only the top hits were kept. The filtered library was also aligned with BLASTN (e-value 10e-4) against a RepBase (Jurka et al., 2005) repeat collection that included a general compilation of macrosatellites and specific repeat sets from zebrafish, pufferfish and vertebrates. All these databases were downloaded from the RepBase repeat repository (accessed in September 2014) and combined into one database file for the BLASTN search.
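The length and low-complexity filters applied to the repeat library can be sketched as follows. The thresholds (a minimum length of 50 bp and a maximum single-base fraction of 0.8) and the complexity measure itself are assumptions chosen for illustration; the criteria actually used are not specified here.

```python
from collections import Counter


def is_low_complexity(seq, max_fraction=0.8):
    """Flag a sequence dominated by a single base (assumed criterion)."""
    counts = Counter(seq.upper())
    return max(counts.values()) / len(seq) > max_fraction


def filter_repeats(library, min_length=50, max_fraction=0.8):
    """Split a repeat library into kept, too-short, and low-complexity sets.

    library: dict mapping repeat name -> nucleotide sequence.
    """
    kept, too_short, low_complexity = {}, [], []
    for name, seq in library.items():
        if len(seq) < min_length:
            too_short.append(name)
        elif is_low_complexity(seq, max_fraction):
            low_complexity.append(name)
        else:
            kept[name] = seq
    return kept, too_short, low_complexity


# Hypothetical toy library.
example_library = {
    "R1": "ACGT" * 20,            # 80 bp, balanced composition
    "R2": "ACGTAC",               # too short
    "R3": "A" * 90 + "CGT" * 5,   # 105 bp, about 86% A
}

if __name__ == "__main__":
    kept, short, low = filter_repeats(example_library)
    print(sorted(kept), short, low)
```

Reporting the discarded names separately makes it straightforward to tally how many elements each filter removed.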

It has been hypothesized that repetitive elements play a role in gene family expansions (Lorenzi et al., 2010; Zatsepina et al., 2001). If so, most members of expanded gene families should be located close to regions containing repeat elements. To assess this, pejerrey repeat families were compared with the expanded protein families identified in the Functional Annotation section.

3.3. Results

3.3.1. Sequencing and Assembly

Assembly of all sequencing reads (see Figure 3.2 for read counts) with AllPaths-LG outperformed SOAPdenovo across all parameters used for evaluation (Table 3.2). AllPaths-LG assembled more scaffolds of larger size (Figure 3.3) with a smaller number of gaps, mostly 1-100 bp long (Figure 3.4). Therefore, the AllPaths-LG assembly was preferred and was used for all downstream analyses.

[Figure 3.2 plot. X-axis: Illumina HiScan SQ runs (1, 2); Y-axis: number of reads in millions (0 to 180); series: 200 bp, 3K bp, and 300 bp libraries.]

Figure 3.2. Number of reads for the two sequencing rounds in Illumina HiScan SQ. Different color bars for each of the sequencing libraries, blue for 200 bp, red for 3K and green for 300 bp.

Table 3.2. Statistical analysis of the AllPaths-LG and SOAPdenovo assemblies. Sequences have been grouped into 4 categories by minimum size (Min size limit: 0, 1,000, 5,000, and 10,000 bp). Values under “Span Gapped” include Ns (gaps); values under “Bases Ungapped” exclude them. N50 is the length of the shortest sequence among the largest sequences that together contain 50% of the total length in each size category. Smallest and Largest are the minimum and maximum size of the sequences in each size category. Total length is the sum, in bases, of all sequences in each size category. All values in base pairs.

AllPaths-LG
Min size limit | Number of sequences | Gapped N50 | Gapped smallest | Gapped largest | Gapped total length | Ungapped N50 | Ungapped smallest | Ungapped largest | Ungapped total length
0 | 31,274 | 60,945 | 221 | 1,013,696 | 870,340,417 | 58,893 | 221 | 994,482 | 677,404,321
1,000 | 31,178 | 60,945 | 1,000 | 1,013,696 | 870,253,413 | 58,895 | 569 | 994,482 | 677,317,452
5,000 | 24,105 | 62,639 | 5,003 | 1,013,696 | 851,644,128 | 60,590 | 721 | 994,482 | 661,485,920
10,000 | 18,159 | 66,414 | 10,000 | 1,013,696 | 808,364,550 | 64,064 | 1,010 | 994,482 | 636,286,566

SOAPdenovo
Min size limit | Number of sequences | Gapped N50 | Gapped smallest | Gapped largest | Gapped total length | Ungapped N50 | Ungapped smallest | Ungapped largest | Ungapped total length
0 | 1,245,137 | 21,579 | 100 | 986,582 | 1,092,634,823 | 14,567 | 100 | 807,285 | 763,142,636
1,000 | 89,724 | 33,541 | 1,000 | 986,582 | 904,038,785 | 29,053 | 157 | 807,285 | 586,140,761
5,000 | 31,559 | 44,574 | 5,000 | 986,582 | 776,478,754 | 36,864 | 228 | 807,285 | 517,680,088
10,000 | 18,822 | 54,730 | 10,001 | 986,582 | 686,130,073 | 43,180 | 1,492 | 807,285 | 470,182,567

[Figure 3.3 plot. Stacked bars for the SOAPdenovo and AllPaths-LG assemblies; colors indicate scaffold length categories.]

Figure 3.3. Distribution of scaffold length for SOAPdenovo and AllPaths-LG assemblies. Scaffolds from all length categories are stacked by assembly software. Scaffold length categories are shown by color.

[Figure 3.4 plot. X-axis: gap number per scaffold (0 to 450); Y-axis: number of scaffolds (log scale, 1 to 1,000,000); series: SOAPdenovo and AllPaths-LG.]

Figure 3.4. Frequency distribution of number of gaps per scaffold. Numbers in the Y-axis are shown in logarithmic scale.

The AllPaths-LG assembly had a total length of 677 Mbp grouped in only 31,274 scaffolds (107,428 contigs), with a scaffold N50 of 58,893 bp, an estimated genome coverage of 40X, and an estimated repeat content of 37%. The sequence coverage differed among libraries: 12.7X for the 200 bp library, 18.3X for the 300 bp library, and 22.1X for the 3 Kb library. Only 37% of the reads were used in the assembly, with the largest contribution from the 200 bp library (43%). The final SOAPdenovo assembly, obtained with all the reads and a k-mer size of 51 bp, was very fragmented, with a total length of 763 Mbp grouped in 1.2 million scaffolds (Table 3.2). The scaffold N50 for all sequences was 14,567 bp, rising to 43,180 bp if only the largest scaffolds were considered (minimum size 10,000 bp; 18,822 sequences).
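For reference, the N50 statistic reported for each size category can be computed as in the short sketch below. It uses the standard definition (the length of the sequence at which the cumulative sum of lengths, sorted from largest to smallest, first reaches half of the total); the function names and the toy lengths are illustrative and are not taken from the assembly.

```python
def n50(lengths):
    """Length at which the sorted cumulative sum reaches half the total."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty input


def n50_by_min_size(lengths, limits=(0, 1000, 5000, 10000)):
    """N50 restricted to sequences at or above each minimum size limit."""
    return {limit: n50([x for x in lengths if x >= limit]) for limit in limits}


if __name__ == "__main__":
    toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]  # hypothetical scaffold lengths
    print(n50(toy_lengths))
```

Because short sequences are excluded as the minimum size limit rises, the N50 within a category can only stay the same or increase, which matches the trend in Table 3.2.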

3.3.2. Genome Size Estimation

According to the literature, fluorometry and densitometry assays have shown that the genome size of closely related species within the family Atherinopsidae ranges from 655 to 1,075 Mbp (Hardie and Hebert, 2003; Hinegardner and Rosen, 1972). For Odontesthes bonariensis, the genome size was estimated to be 780 Mbp, through a method that surveys the fraction of high-quality sequencing reads that map back to the final genome assembly. The genome size estimate provided by AllPaths-LG was 998 Mbp.

3.3.3. Gene Prediction and Assessment of Gene Overestimation

The optimal assembly contains 51,848 de novo predicted genes, with an average length of 3,823 bp and a median of 10,199 bp (Figure 3.5). 34.1% of the annotated genes were less than 500 bp long and 17,992 genes were annotated as mono-exonic, or without introns (Figure 3.6).

Comparison of the number of genes obtained for the pejerrey against other closely related fish genomes suggests almost twice as many predicted genes in the pejerrey genome (Table 3.3). A fraction of these genes may be unique to pejerrey, with no homology in other genomes. However, the fragmented nature of this draft assembly and the limitations of automated annotation pipelines may have led to gene overestimation. In a fragmented assembly, gene structures can end up split across different scaffolds, causing redundant annotation by the automated pipelines, a smaller average gene size, and a larger gene count than expected. To assess gene overestimation, BLASTP results from the alignments against the medaka proteome used for the automated annotation (described in 3.2.2) were further analyzed. 40.8% (21,144) of the predicted pejerrey gene models had no hits against the medaka proteome (Figure 3.7), and showed a size distribution heavily concentrated in shorter sequences (< 100 amino acids in length) (Figure 3.8). Around 70% of those 21,144 gene models with no hits against medaka represent potential ORFs with no evidence from any of the protein databases used in the annotation pipeline (Figure 3.9).
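The proportion of gene models without medaka hits can be tallied from tabular BLASTP output as sketched below. The sketch assumes tab-delimited output with the query ID in the first column; the output format actually used is not stated here, and the function name and toy IDs are hypothetical.

```python
def no_hit_fraction(all_ids, blast_tabular_lines):
    """Return (IDs without hits, fraction of all IDs without hits).

    all_ids:             every predicted gene model ID.
    blast_tabular_lines: lines of tab-delimited BLAST output
                         (assumed: query ID in column 1).
    """
    with_hits = {
        line.split("\t")[0]
        for line in blast_tabular_lines
        if line.strip() and not line.startswith("#")
    }
    missing = [gene for gene in all_ids if gene not in with_hits]
    return missing, len(missing) / len(all_ids)


if __name__ == "__main__":
    # Hypothetical toy data: three gene models, one without hits.
    ids = ["g1", "g2", "g3"]
    lines = ["g1\tmedakaA\t85.0", "g1\tmedakaB\t60.2", "g3\tmedakaC\t91.3"]
    print(no_hit_fraction(ids, lines))
    # Consistency check on the reported figures: 21,144 of 51,848 models.
    print(round(100 * 21144 / 51848, 1))
```

The final line simply reproduces the percentage quoted in the text from the two reported counts.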


Figure 3.5. Number of annotated genes by length for all gene models in the pejerrey annotated assembly. The x-axis shows arbitrarily chosen length categories for gene length in base pairs. The y-axis displays the number of genes in each length category.


Figure 3.6. Number of exons per gene for all gene models in the pejerrey annotated assembly. The x-axis shows arbitrarily chosen categories for number of exons per gene. From 1 to 10 categories increase the number by 1, from 10 to 50 the increase is by 5, and after category “50”, the increase is by 50. The y-axis displays the number of gene models within each category.


Table 3.3. Statistics of the pejerrey genome structural annotation compared with genome projects of closely related species. Genome sizes from Ensembl (www.ensembl.org) are “golden path lengths”, that is, the length of the reference assembly. All data extracted from Ensembl (Flicek et al., 2013).

Species and references | Genome Size (Mb) | Gene number (coding genes) | GC %
Odontesthes bonariensis | 870 | 51,848 | 40.43
Oryzias latipes | 869 | 19,699 | 40.7
Xiphophorus maculatus (ncbi.nlm.nih.gov/genome) | 729 | 20,379 | 39.8
Poecilia formosa (Schedina et al., 2014) | 748 | 20,954 | 46.63
Gasterosteus aculeatus (Flicek et al., 2013) | 461 | 22,456 | 44.6
Tetraodon nigroviridis (Jaillon et al., 2004) | 358 | 15,455 | 54.4 (exon), 45.4 (introns)
Oreochromis niloticus (Flicek et al., 2013) | 927 | 21,437 | 40.4

[Figure 3.7 plot. X-axis: number of BLASTP hits (1 to 145); Y-axis: number of pejerrey proteins (0 to 7,000).]

Figure 3.7. Distribution of the pejerrey proteins that align against the medaka proteome by number of BLASTP hits. Proteins with more than 145 hits contain highly conserved domains, such as those of kinases, zinc fingers, and receptors.


[Figure 3.8 plot. X-axis: protein length; Y-axis: number of proteins (0 to 5,000); series: proteins with no hits, proteins with hits.]

Figure 3.8. Distribution of pejerrey predicted proteins by length, with and without hits against the medaka proteome. Protein length is in amino acids.

[Figure 3.9 pie chart. Categories: Pfam/TIGRfam, TMHMM, UniRef, Panda, CAZy, CDD, PRIAM, and No Evidence (70%).]

Figure 3.9. Sources of evidence for predicted CDSs in the pejerrey genome with no hits against medaka proteome. See section 3.2.5 for details on the evidence sources.

3.3.4. Classification and function of protein families in Odontesthes bonariensis

Predicted genes were assigned to protein families by grouping proteins with shared domains. 39.2% of the predicted proteins were classified and assigned to one of 3,976 gene families, with an average of 5.1 members per family (Figure 3.10). The rest of the proteins are “orphans” with no shared domains. The largest 40 families (with >40 members) encode reverse transcriptases, protein kinases, integrases, WD domains, and immunoglobulins (Table 3.4). A total of 1,602 protein domains found among pejerrey genes did not match any record from the Pfam (Finn et al., 2013) or TIGR curated databases. Forty of the 575 families with novel domains include more than seven members, and three of those families have between 34 and 36 members (Figure 3.11).

The Reverse Transcriptase (PF00078) family had the largest number of members (878 predicted genes or ORFs), with 90.5% annotated as “reverse transcriptases”. Other families with large numbers of genes include kinases, cytochrome P450, transposable elements, and tetraspanins. A widely abundant domain in the pejerrey genome is claudin, found in at least 90 different predicted ORFs, 21 of which are located in adjacent positions.


Figure 3.10. Distribution of protein families by number of family members.


Table 3.4. Largest 40 protein families classified by domains present in the pejerrey genome. The ID column is the identifier generated by the script for each protein domain cluster of genes. “# members” gives the number of predicted genes that carry such domain(s). In the “Domain ID” column, domains with the PF- prefix are derived from the Pfam database (Finn et al., 2013), the TIGR- prefix is associated with the curated TIGR database (Haft et al., 2003), and the para_ prefix denotes novel domains.

ID | # members | Domain ID | Domain(s) Name(s)
TDF:1 | 878 | PF00078 | Reverse transcriptase (RNA-dependent DNA polymerase)
TDF:2 | 271 | PF09004 | Domain of unknown function (DUF1891)
TDF:3 | 234 | PF00069, PF07714 | Protein kinase domain; Protein tyrosine kinase
TDF:4 | 205 | PF00001 | 7 transmembrane receptor (rhodopsin family)
TDF:5 | 144 | PF00400 | WD domain, G-beta repeat
TDF:6 | 140 | PF00665 | Integrase core domain
TDF:7 | 118 | PF00078, PF09004 | Reverse transcriptase (RNA-dependent DNA polymerase); Domain of unknown function (DUF1891)
TDF:8 | 98 | PF13837 | Myb/SANT-like DNA-binding domain
TDF:9 | 93 | PF00046 | Homeobox domain
TDF:10 | 87 | PF07686 | Immunoglobulin V-set domain
TDF:11 | 86 | PF00078, para_133 | Reverse transcriptase (RNA-dependent DNA polymerase)
TDF:12 | 82 | PF05699 | hAT family C-terminal dimerisation region
TDF:13 | 80 | PF00023, PF12796, PF13606, PF13637, PF13857 | Ankyrin repeats
TDF:14 | 76 | PF00001, PF10320 | 7 transmembrane receptor (rhodopsin family); Serpentine type 7TM GPCR chemoreceptor Srsx
TDF:15 | 72 | PF00096, PF13465, PF13894 | Zinc finger C2H2 types
TDF:16 | 71 | PF02994 | L1 transposable element
TDF:17 | 71 | PF00067 | Cytochrome P450
TDF:18 | 69 | PF00059 | Lectin C-type domain
TDF:19 | 68 | PF00010 | Helix-loop-helix DNA-binding domain
TDF:20 | 66 | PF00595 | PDZ domain (Also known as DHR or GLGF)
TDF:21 | 65 | PF13358 | DDE superfamily endonuclease
TDF:22 | 63 | PF00168 | C2 domain
TDF:23 | 59 | PF00089 | Trypsin
TDF:24 | 57 | PF00209 | Sodium:neurotransmitter symporter family
TDF:25 | 57 | PF00153 | Mitochondrial carrier protein
TDF:26 | 55 | PF00412 | LIM domain
TDF:27 | 54 | PF00041 | Fibronectin type III domain
TDF:28 | 53 | PF03372 | Endonuclease/Exonuclease/phosphatase family
TDF:29 | 52 | PF00335 | Tetraspanin family
TDF:30 | 50 | PF00018, PF07653 | SH3 domain; SH3 variant domain
TDF:31 | 50 | PF05380 | Pao retrotransposon peptidase
TDF:32 | 49 | PF00030 | Beta/Gamma crystallin
TDF:33 | 49 | PF13873 | Myb/SANT-like DNA-binding domain
TDF:34 | 48 | PF00822, PF13903 | PMP-22/EMP/MP20/Claudin family
TDF:35 | 47 | PF00076, PF13893, PF14259 | RNA recognition motif. (a.k.a. RRM, RBD, or RNP domain)
TDF:36 | 47 | PF00520 | Ion transport protein
TDF:37 | 46 | PF13359 | DDE superfamily endonuclease
TDF:38 | 46 | PF02931, PF02932, TIGR00860 | Neurotransmitter-gated ion-channel ligand binding domains (2); Cation transporter family protein
TDF:39 | 43 | PF01391 | Collagen triple helix repeat (20 copies)
TDF:40 | 42 | PF00069 | Protein kinase domain

[Figure 3.11 plot. X-axis: number of members per family (2 to 36); Y-axis: number of families (0 to 200).]

Figure 3.11. Families with novel protein domains categorized by number of family members. These novel domains are identified and clustered by sequence similarity (see Methods in 3.2.5). In Table 3.4, these protein domains can be found with the “para_” identifying tag.

3.3.5. Repetitive elements in the pejerrey genome

Repeat elements found in the pejerrey genome included tandem and simple satellites, pseudogenes, and a wide array of TEs (Table 3.6). The unfiltered library of pejerrey repeats contained 9,489 elements, of which 825 were removed for their short length and 799 for containing low-complexity regions. The remaining repeat elements comprised 7,864 distinct sequences. Two independent searches were performed with the filtered repeat library against the Pfam (Finn et al., 2013) and RepBase (Jurka et al., 2005) databases. The Pfam search resulted in 376 matches against 44 unique Pfam domains (Table 3.5). This analysis showed five repetitive elements that include the domain PF07686 (Immunoglobulin V-set), which is present in 30 protein families, one of them among the 20 largest pejerrey protein families (Table 3.5).

Three domains showed a significantly larger presence in the repeat library: Reverse transcriptase (PF00078), Integrase core domain (PF00665) and Endonuclease/Exonuclease/Phosphatase family (PF03372). All of these domains are present in the protein families from 3.3.4, and the first two domains are also found among the largest protein families in the pejerrey.

The search against RepBase showed 220 sequences that corresponded to known vertebrate TEs from the RepBase Update database (Table 3.6) (Jurka et al., 2005).

The most common type of repeat element in pejerrey is the DNA transposon TC1, which accounted for 21.36% of the classified repeats. The pejerrey genome also hosts other DNA transposons, such as hAT elements, which code for only one transposase protein, and En/Spm elements, which encode not only transposases but also DNA-binding proteins (Jurka et al., 2007). Class I repetitive elements (retrotransposons) found in the pejerrey genome include REX types 1, 3, and 6.

Table 3.5. Pfam protein domains present in the pejerrey repeat library compared with the protein families analysis. “# in Repeat Library” shows the number of hits of each domain against the entire pejerrey repeat library, “# in Protein Families” the number of protein families that have each domain, and “# in Top 20 Families” the number of appearances of each Pfam domain in the 20 protein families with the most members. See the top 20 largest families in Table 3.4.

ID | Pfam Name | # in Repeat Library | # in Protein Families | # in Top 20 Families
PF07686 | Immunoglobulin V-set domain | 5 | 30 | 1
PF00097 | Zinc finger, C3HC4 type (RING finger) | 2 | 24 | 0
PF00046 | Homeobox domain | 1 | 22 | 1
PF00078 | Reverse transcriptase (RNA-dependent DNA polymerase) | 157 | 15 | 2
PF00622 | SPRY domain | 8 | 15 | 0
PF00001 | 7 transmembrane receptor (rhodopsin family) | 5 | 13 | 2
PF00063 | Myosin head (motor domain) | 5 | 12 | 0
PF00665 | Integrase core domain | 39 | 9 | 1
PF01094 | Receptor family ligand binding region | 4 | 9 | 0
PF07654 | Immunoglobulin C1-set domain | 3 | 8 | 0
PF00385 | Chromo (CHRromatin Organisation MOdifier) domain | 3 | 7 | 0
PF00028 | Cadherin domain | 9 | 6 | 1
PF04548 | AIG1 family | 3 | 6 | 0
PF00003 | 7 transmembrane sweet-taste receptor of 3 GCPR | 2 | 5 | 0
PF00689 | Cation transporting ATPase, C-terminus | 1 | 5 | 0
PF02023 | SCAN domain | 1 | 5 | 0
PF00125 | Core histone H2A/H2B/H3/H4 | 6 | 4 | 0
PF01498 | Transposase | 6 | 4 | 0
PF01576 | Myosin tail | 3 | 4 | 0
PF03165 | MH1 domain | 1 | 4 | 0
PF03184 | DDE superfamily endonuclease | 1 | 4 | 0
PF00822 | PMP-22/EMP/MP20/Claudin family | 1 | 3 | 0
PF02338 | OTU-like cysteine protease | 1 | 3 | 0
PF03372 | Endonuclease/Exonuclease/phosphatase family | 45 | 3 | 0
PF03953 | Tubulin C-terminal domain | 2 | 3 | 0
PF05380 | Pao retrotransposon peptidase | 4 | 3 | 0
PF05699 | hAT family C-terminal dimerisation region | 7 | 3 | 1
PF08266 | Cadherin-like | 4 | 3 | 0
PF00012 | Hsp70 protein | 1 | 2 | 0
PF00075 | RNase H | 4 | 2 | 0
PF00129 | Class I Histocompatibility antigen, domains alpha 1 and 2 | 2 | 2 | 0
PF00201 | UDP-glucoronosyl and UDP-glucosyl transferase | 1 | 2 | 0
PF00225 | Kinesin motor domain | 1 | 2 | 0
PF05485 | THAP domain | 7 | 2 | 0
PF00022 | Actin | 1 | 1 | 0
PF00030 | Beta/Gamma crystallin | 1 | 1 | 0
PF00232 | Glycosyl hydrolase family 1 | 1 | 1 | 0
PF00429 | ENV polyprotein (coat polyprotein) | 1 | 1 | 0
PF01593 | Flavin containing amine oxidoreductase | 1 | 1 | 0
PF02994 | L1 transposable element | 9 | 1 | 1
PF05049 | Interferon-inducible GTPase (IIGP) | 2 | 1 | 0
PF00589 | Phage integrase family | 8 | 0 | 0
PF01609 | Transposase DDE domain | 6 | 0 | 0
PF03175 | DNA polymerase type B, organellar and viral | 1 | 0 | 0

Table 3.6. Classification and abundance of Transposable Elements (TE) in the pejerrey genome. Abundance is relative to the total number of repeats identified using a combined approach of RepeatScout and a customized RepBase database (see Methods 3.2). Classification from Jurka et al. (2007) and Repbase Update using GIRI (Jurka et al., 2005).

TE Type | % | Class | Order/Family
TC1 | 21.36 | II | Tc1/mariner
hAT | 10.91 | II | hAT
DNA | 9.55 | II |
REX | 9.09 | I | Non-LTR
GYPSY | 5.91 | I | LTR
Mariner | 5.00 | II | Tc1/mariner
L2 | 4.55 | I | Non-LTR
CHAPLIN | 4.09 | II | hAT
Expander | 3.18 | I | Non-LTR, RTE, LINE
SINE | 3.18 | I | Non-LTR
BEL | 2.27 | I | LTR
CR1 | 2.27 | I | Non-LTR
LINE L1 | 2.27 | I | Non-LTR
TX1 | 2.27 | I | Non-LTR, L1
TZF28 | 2.27 | II | Tc1/mariner
TRILLIAN1 | 2.27 | II | hAT, Tip100/Zaphod
TC2 | 1.82 | II | Tc1/mariner
MAUI | 1.36 | I | Non-LTR, CR1, L2
DONG | 0.91 | I | Non-LTR
ACROBAT | 0.91 | II |
EnSpm | 0.91 | II |
FUROUSHA2 | 0.91 | II | hAT, Tol2
5S | 0.45 | I | Non-LTR
HEROTn | 0.45 | I | Non-LTR
Nimb | 0.45 | I | Non-LTR
RTE | 0.45 | I | Non-LTR, LINE
UnaSINE2 | 0.45 | I | Non-LTR, SINE
Penelope | 0.45 | I | Penelope
Total | 100 | |

3.3.6. Assessment of genome completeness and synteny with medaka

MUMmer alignments of each medaka chromosome (ASM31367v1) with the pejerrey scaffolds showed that the pejerrey assembly covers large spans of the medaka genome, with high similarity values. For a better visualization, each alignment has been filtered and re-aligned using only the pejerrey scaffolds that showed hits (Figure 3.12). All 24 medaka chromosomes have high (90-100%) similarity regions with pejerrey across the entire stretch of each chromosomal sequence. All interruptions in the alignment curves for each medaka chromosome (Figure 3.12) correspond to gaps (Ns) in the original medaka genome assembly.

DAGchainer identified 2,286 syntenic blocks between the genomes of medaka and pejerrey, with an average of 4.6 genes per block when considering blocks with more than 3 genes (Figure 3.13). The largest syntenic block included 29 genes located in a region of medaka chromosome 22 and pejerrey scaffold number 3. Gene functions were mostly associated with cell structure and membrane transport. For details of all genes in this syntenic block, see Table 3.7. The seventeen blocks with the highest scores (based on the distance between neighboring genes and the BLAST e-value) and more than 15 genes (Table 3.8) involved 9 out of 24 medaka chromosomes.


Figure 3.12. MUMmer alignments for all 24 medaka chromosomes against pejerrey scaffolds. Medaka chromosomes 1 to 24 are shown from top-left to bottom-right. In each graph, the medaka chromosome is on the X axis and the pejerrey scaffolds are on the Y axis. Points are colored by similarity percentage: 90%-100% in red, 80%-89% in green, 60%-79% in blue, and 20%-40% in purple.


Figure 3.13. DAGchainer results on conserved syntenic blocks with medaka. Y-axis corresponds to the number of blocks in logarithmic scale.

Table 3.7. Genes in the largest syntenic block between the pejerrey genome assembly and medaka (Oryzias latipes). Gene names were extracted from the pejerrey annotation. InterPro and UniProt functions/processes were preferably extracted from zebrafish (Danio rerio) homologs, or from human homologs (noted) if zebrafish was not available. Gene order in the table corresponds to the order in the syntenic block.

Medaka ID | Pejerrey ID | Pejerrey Annotated Gene Name | Function(s) / Biological Process(es) or Molecular Function | Source
21152 | 3.m000570 | hypothetical protein | Unknown | -
21154 | 3.m000571 | gremlin 1, cysteine knot family (xenopus laevis) protein | Development (zebrafish). Inhibitor in the TGF beta signaling pathway (mice) | (Nicoli et al., 2005)
21158 | 3.m000572 | Rho GTPase-activating protein 11A | Binds to activated G proteins and stimulates GTPase activity | Uniprot
21167 | 3.m000573 | actin, alpha 1, skeletal muscle protein | Cell motility, structure and integrity | ncbi.nlm.nih.gov/gene
21185 | 3.m000574 | eukaryotic translation initiation factor 2-alpha kinase | Inhibits protein synthesis in response to stress conditions (oxidative, osmotic, heat, heme deficiency) (human) | Uniprot
21189 | 3.m000579 | cytochrome P450 family 1 protein | Cytochrome P450 enzymes use molecular oxygen to modify substrate structure | (Nelson et al., 2013)
21194 | 3.m000581 | GTP cyclohydrolase 1 feedback regulatory protein | Mediates tetrahydrobiopterin (cofactor in amino acid degradation and synthesis) inhibition of GTP cyclohydrolase | (Thony et al., 2000)
21234 | 3.m000007 | inverted formin-2 protein, putative | Actin cytoskeleton reorganization, binds to actin and rho GTPase | InterPro
21266 | 3.m000008 | adenylosuccinate synthetase | Participates in synthesis of purine nucleotides | InterPro
21275 | 3.m000584 | zinc finger and BTB domain protein | | InterPro
21301 | 3.m000585 | V-akt murine thymoma viral oncogene-like protein | Regulates metabolism, proliferation, cell survival, growth and angiogenesis, mediated through serine and/or threonine phosphorylation | InterPro
21327 | 3.m000586 | centrosomal protein of 170 kDa protein B, putative | Main microtubule organizing center. Regulator of cell-cycle progression | Uniprot
21330 | 3.m000587 | phospholipase D4 | Unknown | Uniprot
21338 | 3.m000588 | globin X | Response to hypoxia, oxygen binding | Uniprot
21343 | 3.m000591 | serine peptidase inhibitor, kunitz type 1 protein | Epidermis development | Uniprot
21354 | 3.m000593 | zinc finger FYVE domain protein, putative | Metal ion binding | Uniprot
21372 | 3.m000596 | p21(CDKN1A)-activated kinase | Apoptosis, cytoskeleton organization | Uniprot
21374 | 3.m000597 | ankyrin-like protein | Mediates attachment of membrane proteins to spectrin-actin based membrane cytoskeleton | Uniprot
21382 | 3.m000598 | metazoan phosphoinositide-specific phospholipase C-beta2 domain protein | Taste reception signal transduction | (Aihara et al., 2007)
21386 | 3.m000002 | transmembrane protein, putative | Unknown | Uniprot
21416 | 3.m000599 | methylenetetrahydrofolate dehydrogenase (NADP+ dependent) 1 | Biosynthesis of a folic acid containing compound | Uniprot
21420 | 3.m000600 | chromosome 14 ORF protein | Unknown | Uniprot
21428 | 3.m000601 | kelch-like 5 protein (insect) | Unknown | Uniprot
21436 | 3.m000602 | FAM179B protein | Unknown | Uniprot
21442 | 3.m000603 | coiled-coil protein 28B | Bardet-Biedl syndrome-related protein, ciliary length regulation | (Cardenas-Rodriguez et al., 2013)
21455 | 3.m000604 | transmembrane protein 39B | Integral membrane component | Uniprot
21466 | 3.m000605 | KH domain protein | RNA binding and recognition | Uniprot
21467 | 3.m000606 | zinc finger and BTB domain 8 opposite strand protein | tRNA splicing, via endonucleolytic cleavage and ligation | Uniprot
21493 | 3.m000607 | histone-binding protein RBBP4 | Chromatin assembly and regulation | Uniprot

Table 3.8. Detail of best-scored syntenic blocks. A Block ID has been assigned for easy reference in the text. The score provided by DAGchainer is based on the distance between neighboring genes and the BLAST e-value score (Haas et al., 2004). Subsequent columns show the number of genes in each syntenic block and the original location of the genes in the medaka chromosomes and in the pejerrey scaffolds. The last column shows whether the medaka chromosome underwent any major chromosomal rearrangements after the teleost whole-genome duplication, and was taken from Kasahara et al. (2007). More details are provided in the discussion section. Rows have been colored by the different medaka chromosomes involved in more than one syntenic block: Ch. 8 in pink, Ch. 12 in light blue, Ch. 13 in green, and Ch. 22 in light brown. * denotes a rearrangement that consisted of two duplications of the same protochromosome, but no rearrangements with other chromosomes in medaka (Kasahara et al., 2007).

Block ID | Score | Genes in block | Medaka Chromosome | Pejerrey Scaffold | Rearrangements after 3R-WGD
1 | 291 | 29 | 22 | 3 | no
2 | 273 | 25 | 21 | 23 | no
3 | 264 | 25 | 9 | 31 | no
4 | 216 | 22 | 17 | 15 | *
5 | 216 | 21 | 22 | 1 | no
6 | 216 | 20 | 12 | 29 | no
7 | 204 | 20 | 13 | 7 | yes
8 | 189 | 19 | 6 | 43 | yes
9 | 174 | 18 | 24 | 12 | no
10 | 201 | 18 | 4 | 20 | *
11 | 174 | 18 | 22 | 65 | no
12 | 177 | 17 | 8 | 11 | no
13 | 150 | 16 | 8 | 10 | no
14 | 171 | 16 | 12 | 27 | no
15 | 174 | 16 | 12 | 34 | no
16 | 162 | 16 | 12 | 5 | no
17 | 153 | 16 | 13 | 77 | yes


3.4. Discussion and Conclusions

3.4.1. Assembly of shotgun short reads

While high-throughput next-generation sequencing with Illumina brings many advantages, one of its limitations is the large difference between the length of the short reads generated, especially by the widely used second-generation technologies, and the size of the genomes of most organisms. The pejerrey genome size falls in the middle of the range for fish genomes: approximately a third of the human genome, but more than double the size of the “compact” genomes of fugu and pufferfish (Aparicio et al., 2002; Jaillon et al., 2004). The average read size obtained in our experiments was between 98 and 100 base pairs, while the pejerrey genome is estimated to comprise 998 million bp.

The approach used to accommodate the length difference between the genome and the sequenced reads is a modification of the shotgun method (Staden, 1979), based on random shearing of the whole genome of an organism into smaller fragments, or short reads, that are sequenced separately. The assembly of these fragments constitutes a large bioinformatic challenge, not only because of their size, but also because of the large proportion of repetitive sequences that a genome can contain. For example, 17.5% of the medaka (Oryzias latipes) genome corresponds to repetitive elements, and AllPaths-LG estimated the pejerrey genome to be 37% repetitive.

Another challenge for assembly algorithms is the large amount of data to be analyzed, which requires parallelization of computer processes. Most assemblers were initially introduced for the assembly of small, and sometimes simple, bacterial genomes. However, improvements in the software algorithms, together with the use of sequencing libraries with varied insert sizes, have allowed the successful application of some assemblers to large and complex vertebrate genomes (Li et al., 2010b; Li et al., 2010c). This trend is expected to continue as sequencing costs diminish and the availability and quality of computing resources improve (Miller et al., 2010).

For the assembly of the Illumina reads obtained from the sequencing of the pejerrey genomic libraries, we used SOAPdenovo v.1.05 63-mer (Luo et al., 2012) and AllPaths-LG (Gnerre et al., 2011). AllPaths-LG first filters the reads for errors based on quality values, then builds a global k-mer graph by joining local graphs using the overlaps among reads and the information from paired ends with a divide-and-conquer approach (Gnerre et al., 2011; Miller et al., 2010). The algorithm of SOAPdenovo is a memory-efficient combination of the Overlap Layout Consensus (OLC) and De Bruijn Graph (DBG) methods (Miller et al., 2010; Nagarajan and Pop, 2013). As in AllPaths-LG, there is an initial step of read filtering and correction based on threshold values pre-established from k-mer frequencies. Once the DBG has been constructed, the ends or tips are eroded and bubbles are resolved in favor of the path with the higher coverage. Contigs are formed with the DBG method, but scaffolds are not. All reads are mapped to the contig graph, which is then cleaned of contigs with ambiguous or repeated positions in the graph. Finally, gaps in scaffolds are filled by inserting reads from mate pairs. Although the assembly algorithm is similar, AllPaths-LG brings an innovative approach that is reflected in increased performance, as it requires a specific combination of sequencing datasets of paired-end and mate-paired reads with long insert sizes. The algorithm therefore makes more efficient use of the libraries by applying them at different stages (Gnerre et al., 2011).
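The De Bruijn graph ideas mentioned above (k-mer counting, filtering of low-frequency k-mers, and choosing the higher-coverage branch of a bubble) can be illustrated with a toy Python sketch. It is a didactic simplification, not the implementation of SOAPdenovo or AllPaths-LG; the function names, the reads, and the value of k are hypothetical.

```python
from collections import Counter, defaultdict


def kmer_counts(reads, k):
    """Count every k-mer occurring in the reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def build_graph(counts, min_count=1):
    """De Bruijn graph: (k-1)-mer prefix -> {(k-1)-mer suffix: coverage}.

    k-mers seen fewer than min_count times are treated as likely errors.
    """
    graph = defaultdict(dict)
    for kmer, n in counts.items():
        if n >= min_count:
            graph[kmer[:-1]][kmer[1:]] = n
    return graph


def walk(graph, start):
    """Extend a contig from start, taking the highest-coverage edge
    at each branch and stopping at a dead end or a revisited node."""
    contig, node, seen = start, start, {start}
    while graph.get(node):
        nxt = max(graph[node], key=graph[node].get)
        if nxt in seen:
            break
        contig += nxt[-1]
        seen.add(nxt)
        node = nxt
    return contig


# Hypothetical reads: three error-free copies and one read with a
# single substitution, which creates a low-coverage bubble.
example_reads = ["ATGGCGTGCA"] * 3 + ["ATGGCCTGCA"]

if __name__ == "__main__":
    counts = kmer_counts(example_reads, k=4)
    print(walk(build_graph(counts), "ATG"))
```

In this toy case the erroneous read opens a bubble between nodes GGC and TGC; following the higher-coverage path recovers the error-free sequence, and raising min_count to 2 removes the erroneous branch before the walk begins.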

Comparative analysis of the statistics for each assembler showed higher N50 values, fewer sequences, and a lower number of smaller gaps in the AllPaths-LG assembly. This could initially be interpreted as a more contiguous assembly and, therefore, one that is more correct and closer to the true, unknown genome sequence. However, a contiguous, less fragmented assembly does not assure correctness (Nagarajan and Pop, 2013), and N50 values do not necessarily correlate with the quality of the assembly (Earl et al., 2011; Salzberg et al., 2012). Therefore, other measurements of assembly quality based on independent data should be implemented, such as manually curated localized assemblies of BAC sequences (Istrail, 2004; Myers, 2000), shotgun optical mapping (Zhou, 2002), and transcriptomes or whole genomes of closely related species (Gnerre et al., 2009; Meader et al., 2010). The latter was the method chosen for further evaluation of the pejerrey assembly, under an important restriction: assembly errors cannot be easily distinguished from true biological differences between medaka and pejerrey (Meader et al., 2010). In spite of the limitations of the evaluation methods, our results on comparative performance agree with previous studies (Nagarajan and Pop, 2013; Salzberg et al., 2012). Assembly quality assessments with real datasets performed by the Genome Assembly Gold-standard Evaluations (Salzberg et al., 2012) showed that AllPaths-LG outperformed SOAPdenovo in every evaluation parameter (Salzberg et al., 2012; Vezzi et al., 2012). However, a different evaluation effort named Assemblathon, based on simulated genomes and reads, showed equally satisfactory performances for both assemblers (Earl et al., 2011; Vezzi et al., 2012). Simulated datasets, however, might not reproduce some of the challenges found in real datasets, such as genomes with complex repetitive regions (Henson et al., 2012).

3.4.2. Annotation and gene overestimation

The optimal assembly has more than 51,000 annotated open reading frames. Predicted genes with no database evidence are expected to be artifacts of the automated annotation, and possibly make up the set of short (< 100 amino acids) predicted proteins with no hits (Figure 3.8), which corresponds to 31% (17,685) of all predicted gene models. In light of these caveats, the actual number of genes in the pejerrey genome might be close to 30,000, which is still relatively large in comparison to other fish genomes of similar size (Table 3.3). Other redundantly annotated genes could fall within larger length categories, which might decrease the final number of gene models even further. Further analyses of the annotation will surely improve the quality of the pejerrey genome draft, but they exceed the scope of this dissertation.
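The filter described above (short predicted proteins with no database hits) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used to produce Figure 3.8: the function name, the input layout (one pair of protein length and hit status per gene model), and the example values are all hypothetical.

```python
def short_no_hit_fraction(proteins, max_length=100):
    """Count gene models whose protein is shorter than max_length (aa)
    and has no database hit; return (count, fraction of all models)."""
    proteins = list(proteins)
    if not proteins:
        raise ValueError("no gene models supplied")
    flagged = sum(
        1 for length, has_hit in proteins
        if length < max_length and not has_hit
    )
    return flagged, flagged / len(proteins)


# Hypothetical gene models: (protein length in amino acids, has database hit).
models = [(80, False), (95, True), (250, False), (60, False)]
count, fraction = short_no_hit_fraction(models)
print(count, fraction)
```

Subtracting the flagged models from the total gives a rough lower estimate of supported gene models, in the spirit of the adjustment discussed in this section.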

3.4.3. Protein families in pejerrey

To assess the resulting automated annotation and estimate relationships among genes, the predicted proteins in pejerrey were grouped into protein families sharing conserved protein domains and, therefore, biochemical functions (Doolittle, 1981; Sankoff, 2001). Families defined in this way reflect shared function rather than necessarily shared ancestry, as illustrated by known proteins that share the same domains but have no homologous relationship (Buljan and Bateman, 2009). Protein domains are compact regions within proteins that have a distinct function (Buljan and Bateman, 2009) and are present in a certain order or combination (Holm and Sander, 1994). For example, four fifths of the known eukaryotic proteins have two or more domains (Chothia et al., 2003). The order and abundance of protein domains can provide more clues about the complexity of an organism than the number of genes in its genome (Babushok et al., 2007; Vogel and Chothia, 2006).
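The idea of grouping proteins by shared domains can be illustrated with a small Python sketch. It assumes, for illustration only, that a family is defined by an identical set of domains (domain composition); the grouping criteria actually applied in this chapter may differ. The function name, protein identifiers, and domain labels are hypothetical placeholders.

```python
from collections import defaultdict


def group_by_domain_composition(protein_domains):
    """Group proteins that carry exactly the same set of domains.

    protein_domains maps a protein identifier to a list of domain labels.
    Proteins without any domain are left ungrouped. Families are returned
    from largest to smallest.
    """
    families = defaultdict(list)
    for protein_id, domains in protein_domains.items():
        if domains:
            families[frozenset(domains)].append(protein_id)
    return sorted(
        families.items(),
        key=lambda item: (-len(item[1]), sorted(item[0])),
    )


# Hypothetical domain assignments, for illustration only.
hits = {
    "prot1": ["domA"],
    "prot2": ["domA"],
    "prot3": ["domA", "domB"],
    "prot4": [],
}
for composition, members in group_by_domain_composition(hits):
    print(sorted(composition), members)
```

Grouping on the full domain set, rather than on any single shared domain, keeps multi-domain proteins apart from single-domain ones; this is a design choice of the sketch and is consistent with the observation that domain combinations carry information beyond gene counts.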

The protein family with the largest number of members in pejerrey was represented by reverse transcriptase, an enzyme that generates complementary DNA from an RNA template and constitutes a fundamental component of all vertebrate retroelements (Bohne et al., 2008). Retroelements are abundant in eukaryotic genomes, and in vertebrates such elements are drivers of evolution (Bohne et al., 2008). Among the most abundant protein families in pejerrey, cytochrome P450, claudins, and tetraspanins are of known relevance in fishes. Cytochrome P450 enzymes are encoded by CYP genes and catalyze mono-oxygenase reactions (Uno et al., 2012). CYP genes in fish comprise a large variety of families and subfamilies due to their expansion through whole-genome duplication events (Nelson et al., 2013; Ohno, 1999). Claudins are a greatly expanded family of tight junction proteins, with at least 21 claudin genes specific to the fish lineage. Two types of claudin proteins participate in hydromineral balance in ionoregulatory tissues of euryhaline fishes (Bagherie-Lachidan et al., 2009). Expansion of the family within teleosts is attributed to tandem and whole-genome duplications (Bagherie-Lachidan et al., 2009), which have potentially allowed for the development of novel functions and distinct physiologies within fishes (Loh et al., 2004).

One of the largest protein families in pejerrey is the tetraspanin superfamily, membrane proteins that are present in almost all cell types and are involved in cell-cell and cell-matrix interactions, such as adhesion, migration, signal transduction, activation, proliferation, and differentiation (Hemler, 2001; Hemler, 2003). Tetraspanins might have had a key role in the evolution from unicellular to multicellular organisms (Huang et al., 2005). Comparative studies revealed that fishes are the vertebrates with the largest number of tetraspanins (Huang et al., 2005). Zebrafish and Japanese fugu have 40-47 tetraspanins, depending on the method of study (Garcia-España et al., 2008; Huang et al., 2005), while 52 members were identified in pejerrey. This difference in number could be a consequence of repeated annotations of fragmented tetraspanin genes in pejerrey, or it could reflect a true expansion of the family. An example of redundant annotation was found in TSPAN12, which was annotated as two adjacent genes of 3 and 4 exons, separated by a long stretch of bases. Visual inspection of the scaffold and comparison with TSPAN12 from zebrafish showed that the zebrafish gene has 7 coding exons and a length similar to that of the two pejerrey TSPAN12 genes combined. Many tetraspanins in fish are found in two copies and are located in paralogous regions, genomic regions known to be influenced by whole-genome duplications (Abi-Rached et al., 2002; Ohno, 1999). In pejerrey, many tetraspanins were found in two copies (TSPAN3, 7, 12, 18, CD9, and retinal outer segment membrane protein 1b), always on different scaffolds.
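Cases like TSPAN12, where one gene is annotated as two adjacent models, could be flagged automatically before visual inspection. The Python sketch below is one possible approach and was not used in this study: it reports neighboring gene models on the same scaffold and strand that carry the same product name. The record layout, identifiers, and coordinates are hypothetical.

```python
from typing import NamedTuple


class GeneModel(NamedTuple):
    gene_id: str
    scaffold: str
    start: int
    end: int
    strand: str
    product: str


def candidate_split_genes(models):
    """Return pairs of neighboring gene models (same scaffold, same strand,
    same product name) that may be fragments of a single gene."""
    by_position = sorted(models, key=lambda m: (m.scaffold, m.start))
    pairs = []
    for left, right in zip(by_position, by_position[1:]):
        if (left.scaffold == right.scaffold
                and left.strand == right.strand
                and left.product == right.product):
            pairs.append((left.gene_id, right.gene_id))
    return pairs


# Hypothetical annotation records, for illustration only.
models = [
    GeneModel("g1", "scf1", 100, 900, "+", "TSPAN12"),
    GeneModel("g2", "scf1", 5000, 6200, "+", "TSPAN12"),
    GeneModel("g3", "scf1", 9000, 9800, "+", "CD9"),
    GeneModel("g4", "scf2", 50, 700, "+", "TSPAN12"),
]
print(candidate_split_genes(models))
```

Copies on different scaffolds, such as the hypothetical g4 here, are not flagged, since they are more consistent with true duplicates than with a fragmented annotation. Flagged pairs would still require comparison against an ortholog, as was done with zebrafish TSPAN12.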

3.4.4. Synteny with medaka

Medaka (Oryzias latipes) is the species phylogenetically closest to pejerrey with a thoroughly annotated genomic database (Kasahara, 2007). Genomic conservation in the form of conserved blocks of collinear genes between both genomes is expected, mostly due to the low rate of chromosomal rearrangements among teleost fishes (Kai et al., 2011) and existing evidence of genome similarities between medaka and the evolutionarily distant zebrafish (Naruse et al., 2011).

Comparison of the results from the synteny analysis with studies of other fish genomes is challenging due to the fragmented status of the pejerrey genome. Previous studies presented comparisons at the chromosome or linkage-group scale, units that are significantly larger than the pejerrey scaffolds (Jaillon et al., 2004; Star et al., 2011). For example, the maximum number of genes in syntenic blocks between fugu and medaka ranged from 83 to 505 genes (Kai et al., 2011), while the longest syntenic stretch presented here harbors 29 genes. However, the largest syntenic blocks between medaka and pejerrey could be roughly compared with other studies to find regions conserved with other teleosts and, therefore, highly conserved gene order (Table 3.8). The 17 best-scoring syntenic blocks (Table 3.8) corresponded to 9 chromosomes in medaka (4, 6, 8, 9, 12, 13, 17, 21, and 22) and 17 scaffolds in pejerrey, of which 12 are among the 30 largest scaffolds of the pejerrey assembly. Among the nine medaka chromosomes included in the largest blocks, only two have been affected by major chromosomal rearrangements after the teleost-specific whole-genome duplication event, or 3R-WGD (Amores et al., 1998; Meyer and Schartl, 1999; Taylor et al., 2001; Wittbrodt et al., 1998). The current karyotype in known fish genomes is hypothesized to be the result of a whole-genome duplication of 13 ancestral protochromosomes that underwent 8 major rearrangements after the duplication event (Kasahara, 2007). Therefore, most of these blocks correspond to genomic regions in medaka that are expected to show a high degree of conserved gene order (Kai et al., 2011; Kasahara, 2007).
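The notion of a syntenic block as a run of collinear genes can be illustrated with a simplified Python sketch. This is a deliberately naive chaining procedure written for illustration; it is not the algorithm or software used to generate Table 3.8. The function name, the anchor layout (scaffold, gene rank on scaffold, chromosome, gene rank on chromosome), the parameters `max_gap` and `min_genes`, and the example data are all assumptions.

```python
def collinear_blocks(anchors, max_gap=2, min_genes=3):
    """Chain orthologous gene pairs into collinear blocks.

    Each anchor is (scaffold, scaffold_rank, chromosome, chromosome_rank).
    Consecutive anchors extend a block when they share scaffold and
    chromosome, advance by at most max_gap genes on both sequences, and
    keep a single orientation (forward or inverted).
    """
    blocks, current, direction = [], [], 0
    for anchor in sorted(anchors):
        if current:
            prev = current[-1]
            step_s = anchor[1] - prev[1]
            step_c = anchor[3] - prev[3]
            same_pair = anchor[0] == prev[0] and anchor[2] == prev[2]
            near = 0 < step_s <= max_gap and 0 < abs(step_c) <= max_gap
            sign = 1 if step_c > 0 else -1
            if same_pair and near and direction in (0, sign):
                current.append(anchor)
                direction = sign
                continue
            # The chain is broken: keep it only if it is long enough.
            if len(current) >= min_genes:
                blocks.append(current)
        current, direction = [anchor], 0
    if len(current) >= min_genes:
        blocks.append(current)
    return blocks


# Hypothetical ortholog anchors, for illustration only.
anchors = [
    ("scf1", 1, "chr4", 10), ("scf1", 2, "chr4", 11), ("scf1", 3, "chr4", 12),
    ("scf1", 4, "chr9", 3),
    ("scf2", 1, "chr6", 8), ("scf2", 2, "chr6", 7), ("scf2", 3, "chr6", 5),
]
for block in collinear_blocks(anchors):
    print(block[0][0], block[0][2], len(block))
```

Because a block cannot extend past the end of a scaffold, the size of the blocks recovered by any such procedure is bounded by scaffold length, which is the limitation noted above when comparing the 29-gene maximum in pejerrey with chromosome-scale studies.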

3.4.5. Repetitive elements

Repetitive sequences represent a substantial portion of eukaryotic genomes, without any correlation to genome size (Haubold and Wiehe, 2006). Repeats can be found in one copy or many copies across a genome, and copies can be closely located, as in tandem repeats (microsatellites, minisatellites, and satellites), or interspersed in the genome. Interspersed elements can be TEs (transposable elements), highly conserved sequences of DNA and RNA of variable size that are reproduced and inserted in the host genome. TEs are generally the most abundant type of repetitive element in eukaryotic genomes, but their distribution varies greatly among taxa (Bohne et al., 2008; Ussery et al., 2009). TEs are less abundant but far more diverse in fishes than in mammals (Aparicio et al., 2002), and are classified in two main groups: Class I, or retrotransposons, which require an RNA intermediate to insert in the genome, and Class II, or DNA transposons (Jurka et al., 2007).

Tc1, the most abundant element in pejerrey (Table 3.6), is a Class II element that belongs to the Tc1/mariner superfamily, the most widespread in nature, found in fungi, plants, animals, and protists (Plasterk et al., 1999). Tc1 elements have been studied in other fish genomes, such as those of Oreochromis niloticus (Harvey et al.), Gobius niger (Mandrioli et al.), and Tetraodon fluviatilis (Mandrioli and Manicardi), and in the heterochromatin of T. nigroviridis (Dasilva et al.). Tc1 elements are located in different genomic regions depending on the species, and can be either concentrated in heterochromatin and RNA-synthesis regions or dispersed throughout the genome (Ferreira et al., 2011). REX 1, 3 and 6 are Class I repetitive elements (retrotransposons) found in the pejerrey genome and were already characterized in other fish genomes (Feldberg et al., 2003; Gross et al.; Poletto et al.; Schneider et al., 2013; Teixeira et al.).

Interactions between TEs and host genomes have been extensively studied, and their effects range from suppressive and deleterious to beneficial, acting as drivers of vertebrate evolution (Jurka et al., 2007). Immunoglobulin (Ig) genes in fish are extremely varied and complex, presumably as a consequence of the action of transposable elements and genome duplication events (Hsu et al., 2006). Two Ig domains, PF00078 Immunoglobulin V-set and PF07654 Immunoglobulin C1-set (Table 3.5), were identified in 5 and 3 repetitive elements, respectively, but were also found within some of the largest protein families. The presence of repetitive elements in these loci could have helped the expansion of immunoglobulin protein families in pejerrey. Alternatively, the detection of repeats associated with the Ig domains could be an artifact derived from the genomic architecture of Ig loci in fish, which in teleosts contain clusters of repeated domains and pseudogenes (Hikima et al., 2011). For example, the IgH (immunoglobulin heavy chain) locus in teleosts has a translocon-type organization, in which various domains are adjacent and repeatedly located.

Another protein domain associated with both repetitive elements and protein families in pejerrey is PF00012, heat shock protein 70 (Hsp70). Previous studies have underlined the role of transposable elements in the evolution of this extensive family, focusing on the flanking regions near the Hsp70 promoters (Shilova et al., 2006; Zatsepina et al., 2001).

The section on repetitive elements of this dissertation merely introduces potential links between large protein families and transposable elements; more in-depth investigations are beyond the scope of this manuscript. However, the pejerrey genome draft and the identification of repetitive elements in Tables 3.5 and 3.6 will hopefully encourage further studies.

3.4.6. Final remarks

This study makes available the first draft version of the pejerrey genome along with an evaluation of its assembly and annotation, a comprehensive list of its repetitive elements, and a preliminary examination of conserved gene order with Oryzias latipes. The product of this chapter will be made publicly available in the GenBank online database, and will also be prepared as a manuscript for publication in a suitable scientific journal. This first genome draft for Odontesthes bonariensis constitutes a significant and much-needed starting point for the scientific community to use and improve, and also to help overcome current research limitations. The exploration of the first atheriniform genome provides a considerable resource for further advances in studies of this iconic South American species and closely related organisms.

3.5. References

Abi-Rached, L., Gilles, A., Shiina, T., Pontaroti, P., Inoko, H., 2002. Evidence of en bloc duplication in vertebrate genomes. Nat. Genet. 31, 100-105. Aihara, Y., Yasuoka, A., Yoshida, Y., Ohmoto, M., Shimizu-Ibuka, A., Misaka, T., . . . Abe, K., 2007. Transgenic labeling of taste receptor cells in model fish under the control of the 5'-upstream region of medaka phospholipase C-beta 2 gene. Gene expression patterns : GEP 7, 149-157. Amores, A., Force, A., Yan, Y.L., Joly, L., Amemiya, C., Fritz, A., . . . Wang, Y.L., 1998. Zebrafish hox clusters and vertebrate genome evolution. Science 282, 4. Aparicio, S., Chapman, J., Stupka, E., Putnam, N., 2002. Whole-genome shotgun assembly and analysis of the genome of Fugu rubripes. Science 297, 10.

Babushok, D.V., Ostertag, E.M., Kazazian, H.H., Jr., 2007. Current topics in genome evolution: molecular mechanisms of new gene formation. Cell. Mol. Life. Sci. 64, 542-554. Bagherie-Lachidan, M., Wright, S.I., Kelly, S.P., 2009. Claudin-8 and -27 tight junction proteins in puffer fish Tetraodon nigroviridis acclimated to freshwater and seawater. Journal of comparative physiology. B, Biochemical, systemic, and environmental physiology 179, 419-431. Bernardi, G., Wiley, E.O., Mansour, H., Miller, M.R., Orti, G., Haussler, D., . . . Venkatesh, B., 2012. The fishes of Genome 10K. Marine genomics 7, 3-6. Betancur, R.R., Broughton, R.E., Wiley, E.O., Carpenter, K., Lopez, J.A., Li, C., . . . Orti, G., 2013. The tree of life and a new classification of bony fishes. PLoS currents 5. Birney, E., Clamp, M., Durbin, R., 2004. GeneWise and genomewise. Genome research 14, 988-995. Bohne, A., Brunet, F., Galiana-Arnoux, D., Schultheis, C., Volff, J.N., 2008. Transposable elements as drivers of genomic and biological diversity in vertebrates. Chromosome research : an international journal on the molecular, supramolecular and evolutionary aspects of chromosome biology 16, 203-215. Buljan, M., Bateman, A., 2009. The evolution of protein domain families. Biochemical Society transactions 37, 751-755. Cardenas-Rodriguez, M., Irigoín, F., Osborn, D.P., Gascue, C., Katsanis, N., Beales, P.L., Badano, J.L., 2013. The Bardet-Biedl syndrome related protein CCDC28B modulates mTORC2 function and interacts with SIN1 to control cilia length independently of the mTOR complex. Human molecular genetics, ddt253. Chothia, C., Gough, J., Vogel, C., Teichmann, S.A., 2003. Evolution of the protein repertoire. Science 300, 1701-1703. Claudel-Renard, C., Chevalet, C., Faraut, T., Kahn, D., 2003. Enzyme-specific profiles for genome annotation: PRIAM. Nucleic acids research 31, 6633-6639. Consortium, U., 2014. Activities at the Universal Protein Resource (UniProt).
Nucleic acids research 42, D191-D198. Dasilva, C., Hadji H Fau - Ozouf-Costaz, C., Ozouf-Costaz C Fau - Nicaud, S., Nicaud S Fau - Jaillon, O., Jaillon O Fau - Weissenbach, J., Weissenbach J Fau - Roest Crollius, H., Roest Crollius, H., Remarkable compartmentalization of transposable elements and pseudogenes in the heterochromatin of the Tetraodon nigroviridis genome. Doolittle, R.F., 1981. Similar amino acid sequences: chance or common ancestry? Science 214, 149-159. Dyer, B., 2006. Systematic revision of the South American silversides (Teleostei, Atheriniformes). Biocell 30, 69-88. Earl, D., Bradnam, K., St John, J., 2011. Assemblathon 1: a competitive assessment of de novo short read assembly methods. Genome research 21, 2224-2241. Feldberg, E., Porto, J.I.R., Bertollo, L.A.C., 2003. Chromosomal changes and adaptation of cichlid fishes during evolution. Fish adaptations, 285-308. Ferreira, D.C., Porto-Foresti, F., Oliveira, C., Foresti, F., 2011. Transposable elements as a potential source for understanding the fish genome. Mobile genetic elements 1, 112-117.

Feschotte, C., Pritham, E.J., 2007. DNA transposons and the evolution of eukaryotic genomes. Annual review of genetics 41, 331. Finn, R.D., Bateman, A., Clements, J., Coggill, P., Eberhardt, R.Y., Eddy, S.R., . . . Mistry, J., 2013. Pfam: the protein families database. Nucleic acids research, gkt1223. Flicek, P., Ahmed, I., Amode, M.R., Barrell, D., Beal, K., Brent, S., . . . Searle, S.M.J., 2013. Ensembl 2013. Nucleic acids research 41, D48-D55. Fuller, C.W., Middendorf, L.R., Benner, S.A., Church, G.M., Harris, T., Huang, X., . . . Vezenov, D.V., 2009. The challenges of sequencing by synthesis. Nature biotechnology 27, 1013-1023. Garcia-España, A., Chung, P.-J., Sarkar, I.N., Stiner, E., Sun, T.-T., DeSalle, R., 2008. Appearance of new tetraspanin genes during vertebrate evolution. Genomics 91, 326-334. Gnerre, S., Lander, E.S., Lindblad-Toh, K., Jaffe, D.B., 2009. Assisted assembly: how to improve a de novo genome assembly by using related species. Genome biology 10, 2009. Gnerre, S., Maccallum, I., Przybylski, D., Ribeiro, F.J., Burton, J.N., Walker, B.J., . . . Jaffe, D.B., 2011. High-quality draft assemblies of mammalian genomes from massively parallel sequence data. Proceedings of the National Academy of Sciences of the United States of America 108, 1513-1518. Gross, M.C., Schneider, C.H., Valente, G.T., Porto, J.I.R., Martins, C., Feldberg, E. Comparative cytogenetic analysis of the genus Symphysodon (discus fishes, Cichlidae): chromosomal characteristics of retrotransposons and minor ribosomal DNA. Haas, B.J., Delcher, A.L., Wortman, J.R., Salzberg, S.L., 2004. DAGchainer: a tool for mining segmental genome duplications and synteny. Bioinformatics 20, 3643-3646. Haas, B.J., Salzberg, S.L., Zhu, W., Pertea, M., Allen, J.E., Orvis, J., . . . Wortman, J.R., 2008. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments.
Genome biology 9, R7. Haas, B.J., Wortman, J.R., Ronning, C.M., Hannick, L.I., Smith, R.K., Jr., Maiti, R., . . . Town, C.D., 2005. Complete reannotation of the Arabidopsis genome: methods, tools, protocols and the final release. BMC biology 3, 7. Haft, D.H., Selengut, J.D., White, O., 2003. The TIGRFAMs database of protein families. Nucleic acids research 31, 371-373. Hardie, D.C., Hebert, P.D.N., 2003. The nucleotypic effects of cellular DNA content in cartilaginous and ray-finned fishes Genome / National Research Council Canada = Genome / Conseil national de recherches Canada 46, 683-706. Harvey, S.C., Boonphakdee C Fau - Campos-Ramos, R., Campos-Ramos R Fau - Ezaz, M.T., Ezaz Mt Fau - Griffin, D.K., Griffin Dk Fau - Bromage, N.R., Bromage Nr Fau - Penman, P., Penman, P., Analysis of repetitive DNA sequences in the sex chromosomes of Oreochromis niloticus. Haubold, B., Wiehe, T., 2006. How repetitive are genomes? Bmc Bioinformatics 7, 541. Hemler, M.E., 2001. Specific tetraspanin functions. J. Cell Biol. 155, 1103-1107.

Hemler, M.E., 2003. Tetraspanin proteins mediate cellular penetration, invasion, and fusion events and define a novel type of membrane microdomain. Annu. Rev. Cell. Dev. Biol. 19, 397-422. Henson, J., Tischler, G., Ning, Z., 2012. Next-generation sequencing and large genome assemblies. Pharmacogenomics 13, 901-915. Hikima, J.-i., Jung, T.-S., Aoki, T., 2011. Immunoglobulin genes and their transcriptional control in teleosts. Developmental & Comparative Immunology 35, 924-936. Hinegardner, R., Rosen, D.E., 1972. Cellular DNA content and the evolution of teleostean fishes. Am. Nat. 106, 621-644. Holm, L., Sander, C., 1994. Parser for protein folding units. Proteins 19, 256-268. Hsu, E., Pulham, N., Rumfelt, L.L., Flajnik, M.F., 2006. The plasticity of immunoglobulin gene systems in evolution. Immunological reviews 210, 8-26. Huang, S., Yuan, S., Dong, M., Su, J., Yu, C., Shen, Y., . . . Xu, A., 2005. The phylogenetic analysis of tetraspanins projects the evolution of cell-cell interactions from unicellular to multicellular organisms. Genomics 86, 674-684. Huang, X., Adams, M.D., Zhou, H., Kerlavage, A.R., 1997. A tool for analyzing and annotating genomic sequences. Genomics 46, 37-45. Istrail, S., 2004. Whole-genome shotgun assembly and comparison of human genome assemblies. Proc. Natl. Acad. Sci. USA 101, 1916-1921. Jaillon, O., et al., 2004. Genome duplication in the teleost fish Tetraodon nigroviridis reveals the early vertebrate proto-karyotypes. Nature 431.7011, 946-957. Johnson, A.E., van Waes, M.A., 1999. THE TRANSLOCON: A Dynamic Gateway at the ER Membrane. Annual Review of Cell and Developmental Biology 15, 799-842. Jurka, J., Kapitonov, V.V., Kohany, O., Jurka, M.V., 2007. Repetitive sequences in complex genomes: structure and evolution. Annual review of genomics and human genetics 8, 241-259. Jurka, J., Kapitonov, V.V., Pavlicek, A., Klonowski, P., Kohany, O., Walichiewicz, J., 2005. Repbase Update, a database of eukaryotic repetitive elements.
Cytogenetic and genome research 110, 462-467. Kai, W., Kikuchi, K., Tohari, S., Chew, A.K., Tay, A., Fujiwara, A., . . . Venkatesh, B., 2011. Integration of the genetic map and genome assembly of fugu facilitates insights into distinct features of genome evolution in teleosts and mammals. Genome biology and evolution 3, 424-442. Karube, M., Fernandino, J.I., Strobl-Mazzulla, P., Strussmann, C.A., Yoshizaki, G., Somoza, G.M., Patino, R., 2007. Characterization and expression profile of the ovarian cytochrome P-450 aromatase (cyp19A1) gene during thermolabile sex determination in pejerrey, Odontesthes bonariensis. Journal of experimental zoology. Part A, Ecological genetics and physiology 307, 625-636. Kasahara, M., 2007. The medaka draft genome and insights into vertebrate genome evolution. Nature 447. Kasahara, M., Naruse, K., Sasaki, S., Nakatani, Y., Qu, W., Ahsan, B., . . . Kohara, Y., 2007. The medaka draft genome and insights into vertebrate genome evolution. Nature 447, 714-719. Korf, I., 2004. Gene finding in novel genomes. Bmc Bioinformatics 5, 59.

Krogh, A., Larsson, B., von Heijne, G., Sonnhammer, E.L.L., 2001. Predicting Transmembrane Protein Topology with a Hidden Markov Model: Application to Complete Genomes. Journal of Molecular Biology 305, 567-580. Kurtz, S., Phillippy, A., Delcher, A.L., Smoot, M., Shumway, M., Antonescu, C., Salzberg, S.L., 2004. Versatile and open software for comparing large genomes. Genome biology 5, R12. Lee, W., Chen, S.L., Genome-tools provides flexible tools for a simple API for genomic sequence processing on genomes published in the standard Genbank format. BioTechniques 33, 1334-1341. Li, C., Orti, G., Zhao, J., 2010a. The phylogenetic placement of sinipercid fishes ("Perciformes") revealed by 11 nuclear loci. Molecular phylogenetics and evolution 56, 1096-1104. Li, R., Fan, W., Tian, G., Zhu, H., He, L., Cai, J., 2010b. The sequence and de novo assembly of the giant panda genome. Nature 463, 311-317. Li, R., Zhu, H., Ruan, J., Qian, W., Fang, X., Shi, Z., . . . Wang, J., 2010c. De novo assembly of human genomes with massively parallel short read sequencing. Genome research 20, 265-272. Lin, H., Ouyang, S., Egan, A., Nobuta, K., Haas, B.J., Zhu, W., . . . Buell, C.R., 2008. Characterization of paralogous protein families in rice. BMC plant biology 8, 18. Loh, Y.H., Christoffels, A., Brenner, S., Hunziker, W., Venkatesh, B., 2004. Extensive expansion of the claudin gene family in the teleost fish, Fugu rubripes. Genome research 14, 1248-1257. Lombard, V., Golaconda Ramulu, H., Drula, E., Coutinho, P.M., Henrissat, B., 2014. The carbohydrate-active enzymes database (CAZy) in 2013. Nucleic acids research 42, D490-D495. Lorenzi, H.A., Puiu, D., Miller, J.R., Brinkac, L.M., Amedeo, P., Hall, N., Caler, E.V., 2010. New Assembly, Reannotation and Analysis of the Entamoeba histolytica Genome Reveal New Genomic Features and Protein Content Information. PLoS Negl Trop Dis 4, e716. Luo, R., et al., 2012.
SOAPdenovo2: an empirically improved memory-efficient short- read de novo assembler. GigaScience 1. Majoros, W., al., e., 2004. TIGRscan AND GlimmerHMM: two open-source ab initio eukaryotic gene finders. Bioinformatics 20, 2878-2879. Mandrioli, M., Manicardi, G.C., Cytogenetic and molecular analysis of the pufferfish Tetraodon fluviatilis (Osteichthyes). Mandrioli, M., Manicardi Gc Fau - Machella, N., Machella N Fau - Caputo, V., Caputo, V., Molecular and cytogenetic analysis of the goby Gobius niger (Teleostei, Gobiidae). Marchler-Bauer, A., Zheng, C., Chitsaz, F., Derbyshire, M.K., Geer, L.Y., Geer, R.C., . . . Lanczycki, C.J., 2012. CDD: conserved domains and protein three-dimensional structure. Nucleic acids research, gks1243. Meader, S., Hillier, L.W., Locke, D., Ponting, C.P., Lunter, G., 2010. Genome assembly quality: assessment and improvement using the neutral indel model. Genome research 20, 675-684.

Meyer, A., Schartl, M., 1999. Gene and genome duplications in vertebrates: the one-to-four (-to-eight in fish) rule and the evolution of novel gene functions. Current opinion in cell biology 11, 699-704. Miller, J.R., Delcher, A.L., Koren, S., Venter, E., Walenz, B., Brownley, A., . . . Sutton, G.G., 2008. Aggressive assembly of pyrosequencing reads with mates. Bioinformatics 24, 2818-2824. Miller, J.R., Koren, S., Sutton, G., 2010. Assembly algorithms for next-generation sequencing data. Genomics 95, 315-327. Myers, E.W., 2000. A Whole-Genome Assembly of Drosophila. Science 287, 2196-2204. Nagarajan, N., Pop, M., 2013. Sequence assembly demystified. Nature reviews. Genetics 14, 157-167. Naruse, K., Tanaka, M., Takeda, H., 2011. Medaka: a model for organogenesis, human disease, and evolution. Springer Science & Business Media, p. 404. Nelson, D.R., Goldstone, J.V., Stegeman, J.J., 2013. The cytochrome P450 genesis locus: the origin and evolution of cytochrome P450s. Philosophical transactions of the Royal Society of London. Series B, Biological sciences 368, 20120474. Nicoli, S., Gilardelli, C.N., Pozzoli, O., Presta, M., Cotelli, F., 2005. Regulated expression pattern of gremlin during zebrafish development. Gene Expression Patterns 5, 539-544. Nielsen, H., Engelbrecht, J., Brunak, S., von Heijne, G., 1997. Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites. Protein engineering 10, 1-6. Ohno, S., 1999. Gene duplication and the uniqueness of vertebrate genomes circa 1970-1999. Cell & Developmental Biology 10, 517-522. Parenti, L.R., 1993. Relationships of Atherinomorph Fishes (Teleostei). Bulletin of Marine Science 52, 170-196. Petersen, T.N., Brunak, S., von Heijne, G., Nielsen, H., 2011. SignalP 4.0: discriminating signal peptides from transmembrane regions. Nature methods 8, 785-786. Plasterk, R.H.A., Izsvak, Z., Ivics, Z., 1999. Resident aliens, the Tc1/mariner superfamily of transposable elements.
TIG 15. Poletto, A.B., Ferreira Ia Fau - Cabral-de-Mello, D.C., Cabral-de-Mello Dc Fau - Nakajima, R.T., Nakajima Rt Fau - Mazzuchelli, J., Mazzuchelli J Fau - Ribeiro, H.B., Ribeiro Hb Fau - Venere, P.C., . . . Martins, C., Chromosome differentiation patterns during cichlid fish evolution. Pop, M., 2009. Genome assembly reborn: recent computational challenges. Briefings in bioinformatics 10, 354-366. Price, A.L., Jones, N.C., Pevzner, P.A., 2005. De novo identification of repeat families in large genomes. Bioinformatics 21 Suppl 1, i351-358. Salamov, A.A., Solovyev, V.V., 2000. Ab initio gene finding in Drosophila genomic DNA. Genome research 10, 516-522. Salzberg, S.L., Phillippy, A.M., Zimin, A.V., 2012. GAGE: a critical evaluation of genome assemblies and assembly algorithms. Genome research 22, 557-567. Sambrook, J., Fritsch, E.F., Maniatis, T., 1989. Molecular cloning: a laboratory manual. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, USA. Sankoff, D., 2001. Gene and genome duplication. Current opinion in genetics & development 11, 681-684.

Schartl, M., Walter, R.B., Shen, Y., Garcia, T., Catchen, J., 2013. The genome of the platyfish, Xiphophorus maculatus: insights into complex traits. Nature genetics 45, 567-572. Schedina, I.M., Hartmann, S., Groth, D., Schlupp, I., Tiedemann, R., 2014. Comparative analysis of the gonadal transcriptomes of the all-female species Poecilia formosa and its maternal ancestor Poecilia mexicana. BMC research notes 7, 249. Schneider, C.H., Gross, M.C., Terencio, M.L., Artoni, R.F., Vicari, M.R., Martins, C., Feldberg, E., 2013. Chromosomal evolution of neotropical cichlids: the role of repetitive DNA sequences in the organization and structure of karyotype. Reviews in Fish Biology and Fisheries 23, 201-214. Shilova, V.Y., Garbuz, D.G., Myasyankina, E.N., Chen, B., Evgen'ev, M.B., Feder, M.E., Zatsepina, O.G., 2006. Remarkable site specificity of local transposition into the Hsp70 promoter of Drosophila melanogaster. Genetics 173, 809-820. Smit, A.F.A., Hubley, R., Green, P., 2010. RepeatMasker Open-3.0. Somoza, G.M., Miranda, L.A., Berasain, G.E., Colautti, D., Remes Lenicov, M., Strüssmann, C.A., 2008. Historical aspects, current status and prospects of pejerrey aquaculture in South America. Aquaculture Research 39, 784-793. Staden, R., 1979. A strategy of DNA sequencing employing computer programs. Nucleic Acids Research 6, 10. Stanke, M., Steinkamp, R., Waack, S., Morgenstern, B., 2004. AUGUSTUS: a web server for gene finding in eukaryotes. Nucleic acids research 32, W309-W312. Star, B., Nederbragt, A.J., Jentoft, S., Grimholt, U., Malmstrom, M., Gregers, T.F., . . . Jakobsen, K.S., 2011. The genome sequence of Atlantic cod reveals a unique immune system. Nature 477, 207-210. Strüssmann, C.A., Saito, T., Usui, M., Yamada, H., Takashima, F., 1997. Thermal thresholds and critical period of thermolabile sex determination in two atherinid fishes, Odontesthes bonariensis and Patagonina hatcheri. The Journal of experimental biology 278, 167-177.
Taylor, J.S., Van de Peer, Y., Braasch, I., Meyer, A., 2001. Comparative genomics provides evidence for an ancient genome duplication event in fish. Philosophical transactions of the Royal Society of London. Series B, Biological sciences 356, 18. Teixeira, W.G., Ferreira Ia Fau - Cabral-de-Mello, D.C., Cabral-de-Mello Dc Fau - Mazzuchelli, J., Mazzuchelli J Fau - Valente, G.T., Valente Gt Fau - Pinhal, D., Pinhal D Fau - Poletto, A.B., . . . Martins, C., Organization of repeated DNA elements in the genome of the cichlid fish Cichla kelberi and its contributions to the knowledge of fish genomes. Thony, B., Auerbach, G., Blau, N., 2000. Tetrahydrobiopterin biosynthesis, regeneration and functions. Biochem. J 347, 1-16. Uno, T., Ishizuka, M., Itakura, T., 2012. Cytochrome P450 (CYP) in fish. Environmental toxicology and pharmacology 34, 1-13. Ussery, D.W., Wassenaar, T.M., Borini, S., 2009. Word frequencies and repeats. Computing for comparative microbial genomics. Springer London, pp. 137-150. Vezzi, F., Narzisi, G., Mishra, B., 2012. Reevaluating Assembly Evaluations with Feature Response Curves: GAGE and Assemblathons. PloS ONE 7, e52210.

Vogel, C., Chothia, C., 2006. Protein family expansions and biological complexity. PLoS Comput. Biol. 2. Wibbels, T., Bull, J.J., Crews, D., 1991. Chronology and morphology of temperature-dependent sex determination. J Exp Zool 260, 371-381. Williams, T., Kelley, C., 2011. Gnuplot 4.5: an interactive plotting program. Wittbrodt, J., Meyer, A., Schartl, M., 1998. More genes in fish? BioEssays 20, 511-515. Yandell, M., Ence, D., 2012. A beginner's guide to eukaryotic genome annotation. Nature reviews. Genetics 13, 329-342. Zatsepina, O.G., Velikodvorskaia, V.V., Molodtsov, V.B., Garbuz, D., Lerman, D.N., Bettencourt, B.R., . . . Evgenev, M.B., 2001. A Drosophila melanogaster strain from sub-equatorial Africa has exceptional thermotolerance but decreased Hsp70 expression. Journal of Experimental Biology 204, 1869-1881. Zhou, S., 2002. A whole-genome shotgun optical map of Yersinia pestis strain KIM. Appl. Environ. Microbiol. 68, 6321-6331.

Chapter 4. Genomic insights into the regulatory network of sex determination in the pejerrey (Odontesthes bonariensis)

4.1. Introduction

4.1.1. Temperature-dependent sex determination and differentiation

Phenotypic sex in adult vertebrates is the final outcome of the biological processes of sex determination and differentiation (Devlin and Nagahama, 2002). Sex determination refers to the canalization of the gonad into ovaries or testes, while differentiation entails all developmental changes in morphology that affect undifferentiated gonads (Devlin and Nagahama, 2002). Although the distinction between determination and differentiation is conceptually intuitive, the two processes may be better conceived as a continuum in which chemical, environmental, and genetic factors act together in a nonhierarchical network to form a female or male phenotype (Uller and Helantera, 2011). Most vertebrates undergo genetic sex determination (GSD), but the interaction between genetic and environmental factors may also play a role. Environmental factors that affect sex determination include pH, behavior, pollution, hypoxia, growth rate, density, and temperature, the last being the most frequent and best-studied factor (Baroiller et al., 2009; Devlin and Nagahama, 2002). Temperature-dependent sex determination, or TSD, is not uncommon in turtles, crocodiles, and lizards, though it is rare among teleost fishes (Ospina-Alvarez and Piferrer, 2008). Recent estimates have found between 43 and 53 species with TSD among teleosts, distributed in 4 taxonomic families (Ospina-Alvarez and Piferrer, 2008; Valenzuela, 2008), and five of them are in the pejerrey family, Atherinopsidae (Strüssmann et al., 2010).

Dominance of either genotype or environmental temperature in sex determination has led to a dichotomous classification of organisms into GSD or TSD, respectively (Devlin and Nagahama, 2002; Ospina-Alvarez and Piferrer, 2008; Valenzuela, 2008). Recent studies, however, have challenged this extreme perspective on sex determination systems in fish (Heule et al., 2014; Uller and Helantera, 2011; Yamamoto et al., 2014). In organisms with a GSD system, a single gene plays a dominant role in the pathway that leads to gonad differentiation (Devlin and Nagahama, 2002). For example, in Oryzias latipes (medaka), dmy has been identified as the sex-determining gene (Matsuda et al., 2002) and is thought to have originated as a duplication of dmrt1, a gene that codes for the DM-related transcription factor 1 implicated in vertebrate male development (Guan et al., 2000; Kettlewell et al., 2000; Marchand et al., 2000). Under extreme conditions, however, temperature effects (TE) can override the genotypic sex of medaka. This has led to the characterization of sex determination in this species as a GSD+TE system (Ospina-Alvarez and Piferrer, 2008), even though these extreme temperature conditions are not observed in wild habitats (Hattori et al., 2007).

The dichotomous GSD-TSD perspective is further disputed by the coexistence of GSD and TSD in Odontesthes bonariensis (Yamamoto et al., 2014). The pejerrey not only has a sex-determining gene but is also an eloquent example of TSD (Yamamoto et al., 2014). O. bonariensis hatchlings reared at water temperatures of 17ºC and 29ºC during post-hatching weeks 1 to 5 develop population sex ratios of 100% females or 100% males, respectively (Strüssmann et al., 1997). The candidate sex-determining gene in pejerrey and other atherinopsid species is amhy, located on the Y chromosome (Hattori et al., 2012; Yamamoto et al., 2014), a duplicated copy of amh, the gene that encodes the anti-müllerian hormone (Josso et al., 2001). Amhy is present in at least eleven species of the genus Odontesthes, but the influence of temperature on sex determination varies among species (Hattori et al., 2012; Strüssmann et al., 1997; Yamamoto et al., 2014)(Hattori in press). The range of variation in temperature effects seen among species of Odontesthes has also been documented within single species. The best-known examples highlighting the plasticity of environmental influence on sex determination are the Atlantic silverside Menidia menidia and the tidewater silverside M. peninsulae, whose populations range from GSD to TSD along a latitudinal gradient (Duffy et al., 2010; Lagomarsino and Conover, 1993). Silverside species of Odontesthes and Menidia provide excellent models to gain further insights into the mechanisms underlying such plasticity in sex determination, and they underscore the diffuse line separating GSD and TSD as discrete categories.

4.1.2. Hypotheses on GSD and TSD: discrete categories versus a continuum

Interactions between genetic and environmental factors to determine sex and differentiate gonads have been explained under two opposing views. The “discrete categories” framework (Figure 4.1.A) implies that the major difference between systems is ontogenetic, since the temperature-sensitive period for sex determination under TSD occurs during the first few weeks after hatching, while under GSD the direction of sex determination is set at fertilization. From a phylogenetic standpoint, given the scattered record of TSD species in the teleost tree of life, this hypothesis implies that TSD is a derived condition that originates from an ancestral GSD system (Ospina-Alvarez and Piferrer, 2008). Under the “continuum” framework (Figure 4.1.B), organisms would share a common genetic toolkit but diverge along a gradient in the extent to which the environment regulates the gene set(s) involved in sex determination and differentiation. Under this view, pejerrey and medaka would represent extremes along the GSD-TSD continuum, and other Odontesthes species and different Menidia (M. menidia or M. peninsulae) populations would be placed along this temperature-sensitive regulatory gradient. The degree of environmental sensitivity would be a function of the regulatory network controlling expression of a set of conserved genes. Therefore, plasticity in environmental regulation of sex determination and differentiation may be more consistent with a flexible modular network than with a hierarchical regulatory cascade (Heule et al., 2014; Uller and Helantera, 2011). Under this new perspective, there is not a single, canalized pathway of genes and factors for each of the two gonadal fates, but many possible routes that could lead to either of the two sexes. This view predicts that GSD and TSD species would share the genetic machinery associated with sexual development, but diverge in their environmental sensitivity.

[Figure 4.1 image. Panel A: species grouped under two headings, "Genetic SD" and "Temperature SD" (Oryzias latipes, Odontesthes bonariensis, Odontesthes hatcheri, Odontesthes argentinensis, and high-latitude and low-latitude populations of Menidia menidia and Menidia peninsulae). Panel B: species placed along an axis of temperature influence running from − to +, in the order Oryzias latipes, Odontesthes hatcheri, Odontesthes argentinensis, Odontesthes bonariensis, with Menidia menidia and Menidia peninsulae also shown.]

Figure 4.1. Contrasting hypotheses about the boundaries of GSD and TSD systems in atherinomorph species. (A) GSD and TSD are two discrete categories (Valenzuela, 2008). (B) GSD and TSD are extremes of a continuum along which the effect of temperature on genotypic sex varies (Barske and Capel, 2008; Sarre et al., 2004; Yamamoto et al., 2014). The beloniform Oryzias latipes is a classic GSD species with weak thermosensitivity; the atheriniform species in Odontesthes and Menidia are in the family Atherinopsidae (see Chapter 2 for phylogenetic hypotheses). Information on thermosensitivity was taken from Lagomarsino and Conover (1993) for M. menidia, Yamahira and Conover (2003) for M. peninsulae, and Strüssmann et al. (1996a, 1996b, 1997) for all Odontesthes species. Distances between points in the continuum (B) are approximate and not proportional to the degree of thermosensitivity.

4.1.3. The role of steroids in the regulation of TSD

In addition to candidate sex-determining genes and water temperature, other factors have been shown to play a key role in pejerrey sex determination and differentiation. Steroid hormones (glucocorticoids, androgens, and estrogens) are chemical factors involved in vertebrate sex differentiation (Blazquez and Somoza, 2010; Piferrer et al., 2012). All steroid hormones are synthesized from cholesterol, a precursor that is converted by steroidogenic enzymes into different hormones in organs such as the brain, gonads, adrenal, and interrenal glands. In differentiating gonads of embryos and larvae, cells synthesize the corresponding female or male sex steroids and express the genes encoding steroidogenic enzymes (Devlin and Nagahama, 2002; Matsuda, 2003; Nakamura, 2010; Yamamoto, 1969). The influence of steroids on teleost gonadal differentiation remains unclear, and two hypotheses have been proposed: (1) differential activity of gonadal aromatase cyp19a1a or hsd11b2 determines differentiation into ovaries or testes, respectively (Baroiller et al., 2009; Guiguen et al., 2010; Nakamura, 2010), and (2) up-regulation of gonadal aromatase cyp19a1a, and the resulting increase in active estrogen, leads towards a female gonad, while down-regulation of cyp19a1a is the first necessary step for subsequent testicular differentiation (Guiguen et al., 2010). In other words, while the first hypothesis states that either androgens or estrogens could lead the way, the second proposes that the absence of estrogens (by down-regulation of cyp19a1a) is required for the synthesis of active androgens and masculinization.

Glucocorticoids, frequently associated with regulation of metabolism and immune function, are also essential for gonadal fate during masculinization by exposure to high temperatures in pejerrey, medaka, and Japanese flounder (Hattori et al., 2009; Hayashi et al., 2010; Yamaguchi et al., 2010). In all these species, high levels of cortisol (a stress-related hormone) are detected when hatchlings are reared at high, male-producing temperatures during the sensitive period (Lovejoy et al., 2006). In addition, when cortisol is administered experimentally to pejerrey larvae during this period, the percentage of males in the progeny increases (Hattori et al., 2009). The molecular effects of cortisol as a trigger of the male pathway are not clearly understood, but two main events have been hypothesized to occur: (i) cortisol down-regulates the expression of gonadal aromatase (cyp19a1a), and (ii) cortisol affects the synthesis of 11-KT, the main active androgen in fish (Fernandino et al., 2013). The former was shown experimentally in pejerrey and Japanese flounder (Hattori et al., 2009; Yamaguchi et al., 2010). Moreover, in vitro studies in the Japanese flounder (Yamaguchi et al., 2010) have shown that transcription of cyp19a1a was suppressed through binding to a glucocorticoid response element (GRE) in its promoter region.

Cortisol is tightly linked to the androgen 11-KT, owing to the shared enzymatic machinery for its synthesis and inactivation (Borg, 1994; Bury and Sturm, 2007; Fernandino et al., 2012). Cortisol in pejerrey and Japanese eels can induce the production of 11-KT (Fernandino et al., 2012; Ozaki et al., 2006). Steroids act in the cell by binding to a receptor that regulates the expression of steroid-targeted genes (Nilsson et al., 2001). Once the steroid forms a complex with the receptor in the cytoplasm, the complex enters the nucleus and binds to an enhancer sequence in the cis-regulatory region of the target gene, named the hormone response element (HRE) (Howe et al., 2013; Evans, 1988). Each receptor binds to a specific HRE depending on the type of steroid, and the elements are known accordingly as Estrogen Responsive Element (ERE), Glucocorticoid Responsive Element (GRE), or Androgen Responsive Element (ARE). These elements are structurally similar but differ in function. The common structure consists of a palindromic pair of hexameric ‘half-sites’ that differ by a few nucleotides (Evans, 1988). Presence, absence, and completeness of steroid response elements in the cis-regulatory region of TSD-relevant genes could shed light on the role of steroids in sex determination and differentiation in pejerrey.

Only a small number of candidate TSD-relevant genes in pejerrey have been cloned and sequenced (Table 4.1), but none of their cis-regulatory regions are available. The whole-genome draft presented in Chapter 3 contains a large catalog of genes and regulatory regions, including response elements in cis-regulatory regions of steroid-targeted genes in pejerrey.

The goal of this chapter is to explore the sex determination and differentiation network in pejerrey by studying two aspects: (a) whether the genes relevant to GSD in medaka and TSD in pejerrey show conserved gene order and conserved functions among adjacent genes, and (b) the potential regulation of TSD-relevant genes in pejerrey by the different steroid hormones.

4.2. Materials and Methods

4.2.1. Synteny and functions in TSD-related and adjacent genes

Candidate genes that exhibit sexually dimorphic expression profiles during early-life stages of O. bonariensis (encompassing the sensitive period for sex determination/differentiation) have been identified in experimental studies and are listed in Table 4.1 (with references to the original studies). Although the glucocorticoid receptor 2 gene (Gr2) is not known to have sexually dimorphic expression in pejerrey, it is included in the list because of its inherent sensitivity to glucocorticoids and its potential role in high-temperature induced masculinization (Fernandino et al., 2013). As a first step to explore putative mechanisms of regulation for these genes, we compared their chromosomal location (synteny) in pejerrey and medaka. Published protein sequences of the TSD-relevant pejerrey genes were downloaded from GenBank (accession numbers in Table 4.1) and used as queries to identify putative medaka orthologs with BLASTP and BLink (BLAST Link). Protein sequences of the putative medaka orthologs were downloaded and used as BLASTX queries (e-value 10e-5), along with the pejerrey genes from Table 4.1, against the newly assembled pejerrey genome. The use of both medaka and pejerrey genes as queries served as a double-check, because some of the published pejerrey sequences were partial. The output was inspected visually to verify that the putative medaka ortholog and the published pejerrey sequence had their top hit against the same gene in the new pejerrey genome. Top hits from the pejerrey genome were used as the definitive list of pejerrey TSD-relevant genes. All of these genes were located in the DagChainer output file from the synteny analysis in Chapter 3.
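The double-check described above (both queries must share the same top hit) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually used, which was a visual inspection: the function names, the tuple layout `(query_id, subject_id, evalue, bitscore)`, and the example identifiers are hypothetical, and the threshold is written as `10e-5` to mirror the text.

```python
def top_hit(rows, query_id, max_evalue=10e-5):
    """Return the subject with the highest bit score for one query, or None.

    rows: iterable of (query_id, subject_id, evalue, bitscore) tuples,
    a hypothetical layout loosely modeled on tabular BLAST output.
    """
    candidates = [r for r in rows if r[0] == query_id and r[2] <= max_evalue]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r[3])[1]


def orthologs_agree(rows, pejerrey_query, medaka_query, max_evalue=10e-5):
    """True when both queries have a top hit and it is the same genome gene."""
    pejerrey_hit = top_hit(rows, pejerrey_query, max_evalue)
    medaka_hit = top_hit(rows, medaka_query, max_evalue)
    return pejerrey_hit is not None and pejerrey_hit == medaka_hit


# Hypothetical hits against the genome annotation (identifiers are made up).
example_rows = [
    ("pejerrey_amh", "gene_A", 1e-50, 200.0),
    ("pejerrey_amh", "gene_B", 1e-10, 60.0),
    ("medaka_amh", "gene_A", 1e-80, 300.0),
    ("medaka_other", "gene_C", 1e-3, 30.0),  # fails the e-value threshold
]
print(orthologs_agree(example_rows, "pejerrey_amh", "medaka_amh"))
```

A gene would be kept in the definitive list only when the check returns `True`; queries with no hit under the threshold are treated as disagreement rather than silently accepted.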

The function of the genes in Table 4.1 and of adjacent genes in the corresponding scaffolds from the pejerrey assembly was assessed with gene ontology (Camon et al., 2004). This procedure may identify functionally integrated gene clusters containing TSD-relevant genes that might be regulated as a unit. TSD-relevant genes and all genes included in their scaffolds were loaded, blasted, and mapped with Blast2GO (Conesa et al., 2005). All GOterms were used as input for online searches with GOslimmer from AmiGO (Carbon et al., 2009), to identify parent terms or broader gene ontology categories. Main categories from GOslimmer were used to select the main GOterm for each gene.

4.2.2. Role of steroids in the TSD regulatory network

To examine the regulatory network affecting TSD-relevant genes, the potential regulatory/promoter (cis-regulatory) region of each TSD-relevant gene was identified using PromH (Solovyev, 2003) as implemented in MolQuest software v2.3.3 for Mac (Softberry, Inc., Mount Kisco, NY, USA). PromH uses pairs of orthologous genes to identify transcription start sites (TSS) by taking into account conservation features in known promoter regions (Solovyev, 2003). Because the TSS has not been identified experimentally for any of the TSD-relevant genes, an arbitrary flanking region was extracted from the pejerrey scaffolds, corresponding to 2000 bp upstream of the annotated initiation codon. For each gene, this 2000 bp pejerrey sequence was compared with PromH against a region of equal length from the orthologous medaka gene, downloaded from the Ensembl website (last accessed October 2014)(Flicek et al., 2013).
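The extraction of a fixed-length flanking region can be sketched as follows. This is a minimal Python illustration, not the tool used in the chapter; the function name `upstream_region`, the 0-based coordinate convention, and the strand argument are assumptions introduced here. When the scaffold ends before the requested length is reached, the function simply returns the shorter sequence that is available.

```python
def reverse_complement(seq):
    """Reverse-complement an uppercase DNA string (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def upstream_region(scaffold, codon_pos, strand="+", length=2000):
    """Return up to `length` bp upstream of an initiation codon.

    codon_pos is the 0-based forward-strand coordinate of the first base of
    the initiation codon as read on the gene's own strand (for a '-' strand
    gene this is the highest forward coordinate of the codon). The result is
    always given 5' to 3' on the gene's strand.
    """
    if strand == "+":
        return scaffold[max(0, codon_pos - length):codon_pos]
    if strand == "-":
        return reverse_complement(scaffold[codon_pos + 1:codon_pos + 1 + length])
    raise ValueError("strand must be '+' or '-'")


# Toy scaffolds: the real analysis used length=2000 on assembled scaffolds.
forward_gene = "AACCGGTT" + "ATG" + "CCC"   # ATG starts at index 8
reverse_gene = "GGG" + "CAT" + "AACCG"      # ATG on the minus strand, first base at index 5
print(upstream_region(forward_gene, 8, "+", length=5))
print(upstream_region(reverse_gene, 5, "-", length=4))
```

Returning the truncated sequence instead of raising an error keeps genes on short scaffolds in the analysis, at the cost of scanning less than the nominal 2000 bp for those genes.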

Subsequently, MultiTF from the Mulan online application (Ovcharenko et al., 2005) was used to search for transcription factor binding sites associated with steroids such as androgens (ARE, Androgen Response Element), estrogens (ERE), and glucocorticoids (GRE) in the vicinity of the potential regulatory/promoter regions of the TSD-relevant genes in pejerrey. The identification codes used for the targeted response elements from the TRANSFAC database (Matys et al., 2006) were: GR_Q6, GRE_C, GR_Q6_01 and GR_01 for GREs; AR_Q2, AR_01, AR_02, AR_03 and AR_Q6 for AREs; and ERR1_Q2, ER_Q6 and ER_Q6_02 for EREs. Additionally, TATA boxes were targeted with MultiTF in genes where PromH could not detect a promoter. TATA boxes are commonly associated with promoters and provide a good approximation to the TSS regions (Smale and Kadonaga, 2003). Because the original purpose of Mulan-MultiTF is the identification of conserved binding sites, it requires at least two aligned sequences (Ovcharenko et al., 2005). Therefore, the 2000 bp pejerrey sequence for each gene (obtained from the de novo assembly) was aligned against each partial pejerrey nucleotide sequence downloaded from the NCBI website.
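Section 4.1.3 describes response elements as palindromic pairs of hexameric half-sites, and the results distinguish complete elements from half-sites. The Python sketch below illustrates that distinction with exact string matching. It is not the matrix-based TRANSFAC search performed with MultiTF: the function name `classify_sites`, the 3 bp spacer, and the textbook consensus half-sites used in the example (AGGTCA for EREs, AGAACA for GREs and AREs) are assumptions introduced here for illustration.

```python
import re


def reverse_complement(seq):
    """Reverse-complement an uppercase DNA string (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def classify_sites(seq, half_site, spacer=3):
    """Return (full_sites, lone_half_sites) as lists of 0-based positions.

    A full site is half_site + `spacer` arbitrary bases + the reverse
    complement of half_site. A lone half-site is any other exact occurrence
    of the half-site or of its reverse complement.
    """
    seq = seq.upper()
    rc = reverse_complement(half_site)
    # Lookahead so that overlapping occurrences are all reported.
    full_pattern = re.compile(f"(?={half_site}[ACGT]{{{spacer}}}{rc})")
    full = [m.start() for m in full_pattern.finditer(seq)]

    # Remember which half-site occurrences already belong to a full site.
    used = set()
    for start in full:
        used.add((start, half_site))
        used.add((start + len(half_site) + spacer, rc))

    lone = []
    for motif in (half_site, rc):
        for m in re.finditer(f"(?={motif})", seq):
            if (m.start(), motif) not in used:
                lone.append(m.start())
    return full, sorted(lone)


# Toy sequence: one complete ERE-like palindrome and one isolated half-site.
toy = "TTAGGTCACAGTGACCTAA" + "CCCC" + "AGGTCA" + "GG"
full_sites, half_sites = classify_sites(toy, "AGGTCA")
print(full_sites, half_sites)
```

Exact matching is far stricter than a position weight matrix, so a scan like this would miss degenerate elements that a TRANSFAC matrix can detect; it is meant only to make the full-site versus half-site terminology concrete.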


Table 4.1. TSD-relevant genes in pejerrey. Previous experimental studies (“References” column) have shown sexually dimorphic patterns of gene expression in larvae undergoing either thermo-sensitive stages of sex determination or later gonad differentiation. *Gr2 did not show sexually dimorphic expression in pejerrey experiments, but is included as a relevant gene due to its inherent sensitivity to glucocorticoids and its potential role in high-temperature induced masculinization.

Gene Name | Gene Short Name | GenBank Accession Number | References
Anti-müllerian hormone (autosomic) | amh | AY763406.2 | (Fernandino et al., 2008a)
Anti-müllerian hormone (sex determining) | amhy | KC847082.1 | (Yamamoto et al., 2014)
Androgen receptor α | arα | HM755973.1 | (Fernandino et al., 2012)
Androgen receptor β | arβ | HM755974.1 | (Fernandino et al., 2012)
Gonadal Aromatase | cyp19a1a | EF030342.1 | (Fernandino et al., 2008b)
Doublesex and mab-3 related transcription factor 1 | dmrt1 | AY319416.3 | (Fernandino et al., 2008b)
Estrogen receptor α 1 | erα1 | EU284021.1 | (Perez et al., 2012; Strobl-Mazzulla et al., 2008)
Estrogen receptor β | erβ | EU284022.2 | (Perez et al., 2012; Strobl-Mazzulla et al., 2008)
Forkhead transcription factor L2 | foxl2 | EU864151.1 | (Hattori et al., 2012) in O. hatcheri
Follicle-stimulating hormone β subunit | fshβ | AY319832.2 | (Shinoda et al., 2010)
Follicle-stimulating hormone receptor | fshr | GQ258853.1 | (Shinoda et al., 2010)
Glucocorticoid receptor 1 | gr1 | HQ843506.1 | (Fernandino et al., 2012)
Glucocorticoid receptor 2 * | gr2 | HM755976.1 | (Fernandino et al., 2012)
Hydroxysteroid (11-β) dehydrogenase 2 | hsd11b2 | HM755972.1 | (Fernandino et al., 2012)
Luteinizing hormone β subunit | lhβ | AY319833.3 | (Shinoda et al., 2010)
Luteinizing hormone receptor (LHr) | lhr | GQ258852.1 | (Shinoda et al., 2010)


4.3. Results

4.3.1. Comparison of synteny and functions in SD-related genes in medaka and pejerrey

All 16 genes with sexually dimorphic expression profiles during sex determination and/or gonad differentiation that were experimentally identified in pejerrey (Table 4.1) are present in the genome draft assembly, and six (amh, arα, cyp19a1a, erα, fshr, lhr) are located in conserved syntenic blocks with medaka (Figure 4.2). Of the ten genes that are not in syntenic blocks with medaka, three (amhy, dmrt1, hsd11b2) are located in small scaffolds with no other annotated gene.

The most common biological functions associated with the TSD-relevant genes are transcription regulation and signal transduction, observed in eight (arα, arβ, dmrt1, erα, erβ, foxl2, gr1, gr2) and six (amh, amhy, fshβ, fshr, lhβ, lhr) genes, respectively (Figure 4.2). The two genes encoding steroidogenic enzymes (cyp19a1a, hsd11b2) are functionally associated with oxidation-reduction processes. Six of the genes involved in transcription regulation were also associated with steroid binding and the steroid-mediated signaling pathway (arα, arβ, erα, erβ, gr1, gr2)(not in Figure 4.2). C6orf211, which is not one of the TSD-related genes, is also involved in steroid binding and is located directly adjacent to erα.

Functional predictions of genes within clusters in synteny with medaka show that five of the six blocks have genes with protein binding function (syne1a and zbtb2b in the erα block, olfm4 and dmxl2 in the cyp19a1a block, fbxo11 in both the fshr and lhr blocks, erm in the arα block), three blocks have genes associated with transcription regulation (c6orf211 in the erα block, gtf2a1 and foxn2b in the lhr block, and ell and pplase in the amh block), and two blocks have genes associated with development (myct1 in the erα block, zc4h2 in the arα block). Among the thirteen genes adjacent to TSD-related genes not in synteny with medaka, no gene ontology terms were found for six genes, while three were involved in signal transduction (ophn1 in the arβ block, rho in the lhβ block, and arhgap26 in the gr2 block), two were associated with protein binding (pik3cb in the foxl2 block, syne2b in the erβ block), and two were involved in development (sh3 in the gr2 block and fgf1 in the gr1 block).

4.3.2. Role of steroids in the TSD regulatory network

Table 4.2 shows the presence of steroid hormone response elements in the potential regulatory/promoter region of TSD-relevant genes. The term “potential regulatory/promoter region” is used instead of “promoter region” because the actual promoter could not be identified with PromH (Solovyev, 2003) in seven genes (amhy, erα, fshβ, fshr, gr2, hsd11b2, lhr). Among these seven genes, Mulan-MultiTF (Ovcharenko et al., 2005) detected TATA-box motifs in four (Table 4.2); because TATA boxes are commonly associated with promoter regions, they allow an alternative estimate of the promoter location. The region used to scan for response elements was restricted to 2000 bp in the 5’ upstream direction from the first ATG codon; larger regions (if available) were examined after initial unsuccessful searches. This allowed for identification of elements in arβ, which seems to have its potential regulatory/promoter region more than 6500 bp upstream of the initiation codon (Table 4.2).

No response elements were found in the 5’ flanking region of lhβ, due to a sequencing gap affecting that particular region. Both AREs and EREs were identified in the upstream region of all other TSD-relevant genes, while GREs were identified in all but three of them (erα, foxl2, and fshβ). EREs in amh and foxl2 and AREs in gr2 were not complete sequences, but partial elements known as “half-sites”.

[Figure 4.2 image. Blocks in synteny with medaka (dashed outline): mthfd1l, zbtb2b, c6orf211, erα, syne1a, myct1, serac1; lhr, gtf2a1, ppp1r21, foxn2b, fbxo11, msh6, kcnk12; arα, erm, zc4h2; tnfaip, cyp19a1a, olfm4, dmxl2; fshr, fbxo11, mutS1; amh, dot1l, ell, pplase. Blocks not in synteny: per2, pik3cb, foxl2; syne2b, erβ; ophn1, arβ; rho, lhβ; arhgap26, sh3, gr2; fgf1, gr1; fshβ; dmrt1; hsd11b2; amhy (several of these blocks also include hypothetical proteins, HP). Color legend: protein binding; transcription regulation; development; transport (intracellular/transmembrane); oxidation-reduction process; DNA damage repair; signal transduction; methylation.]

Figure 4.2. Genomic location and Gene Ontology of TSD-relevant genes in pejerrey. Pejerrey TSD-relevant genes are shown in gray boxes. The dashed line encloses blocks of genes syntenic with medaka. The GOterm for “development” includes embryonic development, nervous system development, and embryonic hemopoiesis. Genes with no associated GOterm are shown in white rectangles. (HP) hypothetical protein, annotated with no evidence from curated databases. The full list of genes with corresponding GOterms and names is given in Supplementary Table 4.3 (see before References).


Table 4.2. Steroid hormone response elements in the upstream region of TSD-relevant genes. Presence (X), absence (-), and half-sites (half) of regulatory elements are indicated for each gene. “Distance from ATG (bp)” shows the number of base pairs that separate the putative promoter region from the predicted translation initiation codon. TATA boxes were estimated when promoters could not be predicted (see methods 4.2.2). ARE: Androgen Response Element; ERE: Estrogen Response Element; GRE: Glucocorticoid Response Element.

Gene | ARE | ERE | GRE | Distance from ATG (bp)
amh | X | (half) | X | 641
amhy | X | X | X | -
arα | X | X | X | 566
arβ | X | X | X | 6549
cyp19a1a | X | X | X | 1217
dmrt1 | X | X | X | 160
erα | X | X | - | -
erβ | X | X | X | 6197
foxl2 | X | (half) | - | 90
fshβ | X | X | - | ~400 (TATA)
fshr | X | X | X | ~1078 or ~483 (TATA)
gr1 | X | X | X | 112
gr2 | (half) | X | X | ~446 or ~1566 (TATA)
hsd11b2 | X | X | X | -
lhβ | - | - | - | 107
lhr | X | X | X | TATA in 2nd intron

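[Figure 4.3 image: schematic of the TSD regulatory network in pejerrey. Time axis labels: hatching, thermo-sensitive period, gonadal differentiation, sexual maturation. Inputs: amhy, water temperature (29ºC, 25ºC, 17ºC), cortisol. Network components: Gr, hsd11b2, 11-KT, Ar, low FSH & LH, FSHb, FSHr, LHb, LHr, foxl2, dmrt1, amh, cyp19a1a, 17β-estradiol, Er, target genes. Response elements marked: GRE, ARE, ERE. Outcomes: TESTES and OVARIES.]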

Figure 4.3. Regulation of temperature-dependent sex determination (TSD) in pejerrey. The developmental time axis shown on the left runs from hatching at the top to sexual maturation at the bottom. Thermo-sensitive and gonadal differentiation periods are indicated with dashed lines. Presence of the amhy gene or high (29ºC) water temperatures during the thermo-sensitive period lead to masculinization of undifferentiated gonads (light blue area), whereas absence of amhy or low (17ºC) temperatures lead to feminization (pink area). Incubating temperatures close to 25ºC (black split arrow) lead to a 1:1 ratio of males and females. An alternative pathway for masculinization starts with high-temperature-induced cortisol release (upper left). Genes putatively involved in TSD regulation are shown in boxes with their upstream cis-regulatory region indicated as a black line to the left. Steroid hormone response elements in cis-regulatory regions are marked with different colors in the upstream region of genes. Solid lines connecting genes point to relationships that have been experimentally confirmed. Dashed lines link elements based on experiments in other species or on incomplete evidence. Lines with arrowheads indicate up-regulation, while block ends imply down-regulation. References from the literature for each gene are included in Table 4.1. Response elements are summarized in Table 4.2.

4.4. Discussion

4.4.1. Comparison of synteny and functions in SD-related genes in medaka and pejerrey

Sex determination in GSD medaka is directed by the presence of one gene, while genetic effects in TSD pejerrey can be completely overridden by temperature. The master sex-determining genes in medaka and pejerrey are different, but interesting similarities were found in the gene order and function of the downstream genetic machinery.

Thirteen blocks of genes that contain TSD-relevant genes were identified in the pejerrey draft genome and analyzed to assess their functionality. Due to the fragmentary nature of the assembly, three genes (amhy, dmrt1, and hsd11b2) strongly associated with TSD in previous studies (Fernandino et al., 2012; Fernandino et al., 2008b; Yamamoto et al., 2014) were located in the draft genome but in short scaffolds that contained no other genes. Six of the thirteen gene blocks identified are syntenic with medaka, implying strong conservation of gene order and perhaps co-regulation over long evolutionary time. In teleost fish, gene order is highly conserved, except for well-documented chromosomal rearrangements (Jaillon et al., 2004; Kasahara, 2007; Star et al., 2011).

The most frequent functional class predicted for these genes was protein binding, followed in descending order by transcription regulation, development, signal transduction, cellular transport, DNA damage repair, and methylation (Figure 4.2). Most of these genes are transcription factors and hormone receptors, molecules known to be involved in gene transcription regulation and signaling mechanisms, respectively (Latchman, 1997). The extent to which TSD-relevant genes and adjacent genes in these syntenic blocks are integrated into functional clusters with coordinated expression during key stages of sexual development is currently not known, but co-expressed genes are commonly in close proximity, forming functional clusters (Ng et al., 2009). Our structural and functional analysis of putative regulatory networks, enabled by the draft pejerrey genome assembly, provides explicit predictions for genes that could be tested in future studies.

4.4.2. Steroid regulation of TSD-relevant genes

Sex determination and differentiation in Odontesthes bonariensis are influenced by water temperature during early sexual development (Fernandino et al., 2013; Strüssmann et al., 1997). In this early life stage, steroid hormones like androgens, estrogens and glucocorticoids have a fundamental role as signaling molecules between the brain, pituitary, adrenal, interrenal glands, and the gonads (Kumar et al., 2000; Miranda et al., 2013). Some pejerrey genes are known to show differential expression in females and males during thermo-labile stages of sex determination or gonad differentiation (Table 4.1), though specific interactions with steroids are unclear. Results of the in silico analysis provide strong evidence to support a role of steroids in modulating expression of TSD genes in the pejerrey. Experimental evidence from studies of pejerrey reproduction predicts the presence/absence of androgen response elements (AREs), estrogen response elements (EREs), and glucocorticoid response elements (GREs) in upstream regions of TSD-relevant genes (Elisio et al., 2012; Fernandino et al., 2012; Fernandino et al., 2013; Miranda et al., 2013; Pérez et al., 2012; Strobl-Mazzulla et al., 2008). Results from the in silico analysis presented here support many of those predictions.

Given the fragmented nature and draft quality of the assembly, potential regulatory regions and/or response elements for the genes amhy and lhβ could not be found. In the case of amhy, no promoter was identified, probably because only a shorter 5’ upstream region was available. Therefore, response elements in amhy may be present but could not be detected in our analysis. Regulatory analysis of lhβ was not possible due to a sequencing gap in the targeted genomic region.

Response elements were in some cases incomplete or present as half-sites. However, experimental evidence has shown half-site elements to be functional and responsive to specific ligands (Nocillado et al., 2006; Tanaka et al., 1995; Vyhlidal et al., 2000). For example, incomplete EREs in the promoter region of cyp19a1a have been characterized in four species (Nocillado et al., 2013; Tanaka et al., 1995), and were found to be responsive to estradiol (Vyhlidal et al., 2000).

As suggested by experimental evidence, our results confirm that both the ovarian-differentiation factor cyp19a1a and the male differentiation-related amh can be affected by androgens, estrogens, and glucocorticoids (Fernandino et al., 2013; Strüssmann and Nakamura, 2002). The amh gene encodes anti-müllerian hormone (amh), a member of the TGF-β superfamily secreted by Sertoli cells and implicated in testis development through regression of the müllerian ducts in tetrapods (Josso et al., 2001). Fish do not have müllerian ducts, but amh is also heavily involved in gonad differentiation through proliferation of primordial germ cells and spermatogenesis (Morinaga et al., 2007). Amh is expressed early at low levels in undifferentiated gonads of both pejerrey males and females, and at high levels after week 5 only in testes (Fernandino et al., 2008a). Similar expression profiles were observed in Japanese flounder (Yoshinaga et al., 2004), zebrafish (Rodríguez-Marí et al., 2005), and rainbow trout (Baron et al., 2005). Cyp19a1a encodes gonadal aromatase, a steroidogenic enzyme that converts androgens into estrogens and has a key role in pejerrey female gonad development (Simpson et al., 2002; Simpson et al., 1994). In the gonad, regulation of cyp19a1a expression seems to be heavily influenced by steroids, possibly due to the presence or absence of steroid response elements in the cis-regulatory region, as was observed in Japanese flounder (Yamaguchi et al., 2010). In pejerrey, cyp19a1a is expressed in early sex determination stages and is involved in later steps of ovarian formation (Karube et al., 2007).

Identification of EREs in upstream regions supports the key role of estrogen in the regulation of genes expressed before gonad differentiation, such as dmrt1 and cyp19a1a (Fernandino et al., 2008b). Estrogen response elements in pejerrey masculinization genes (amh, amhy, dmrt1) are also consistent with the renewed hypothesis that estrogens are important factors in male reproduction (Couse et al., 2001; Miura et al., 1999). The presence of EREs in arα and arβ is consistent with experimental evidence for their down-regulation by estrogen (Pérez et al., 2012); likewise, EREs in cyp19a1a, erα, and erβ are consistent with their up-regulation (Strobl-Mazzulla et al., 2008). Arα and arβ exhibit different expression profiles in response to external administration of cortisol, with arα showing no differential expression (Fernandino et al., 2012). This disparity between arα and arβ is not fully explained by the in silico evidence for response elements, as all types of elements targeted in this study were found for both receptors.

This study is also consistent with experiments measuring the effect of supplemented cortisol on hsd11b2 expression in pejerrey larvae (Fernandino et al., 2012). Hsd11b2 encodes a key enzyme in high-temperature-induced masculinization (Fernandino et al., 2012), in which water temperature acts as an environmental cue that triggers the release of stress-related cortisol. Hsd11b2 is the final enzyme in the synthesis pathway of 11-ketotestosterone (11-KT), the main androgen in fish, which has been detected at high levels when cortisol is high (Fernandino et al., 2013). Hsd11b2 expression is affected by cortisol, possibly through a GRE in the promoter region, which leads to increased synthesis of 11-KT (Fernandino et al., 2012; Fernandino et al., 2013). The in silico analysis presented here shows that hsd11b2 has a glucocorticoid response element in the vicinity of the promoter.

GREs were found in 12 genes; the exceptions were erα, foxl2, and fshβ. Erα and erβ code for estrogen receptors α and β, respectively, and both are expressed in several tissues in pejerrey (Strobl-Mazzulla et al., 2008). These paralogous genes evolved from an ancestral gene duplication but share low sequence identity (Strobl-Mazzulla et al., 2008). It has been shown that erα and erβ have different expression profiles during early sexual development (Strobl-Mazzulla et al., 2008). While erβ is differentially expressed at female- and male-producing temperatures, expression of erα is similar at every developmental stage. According to our in silico results, erα has no GREs in the potential regulatory/promoter region and may therefore have a lower sensitivity to glucocorticoids. Cortisol, a glucocorticoid released internally during stress, is known to be a key factor in pejerrey masculinization at high temperatures. In pejerrey, high temperatures and cortisol can override genetic sex determination, converting genotypic females into phenotypic males (Fernandino et al., 2013). Therefore, the absence of GREs in erα is consistent with the experimental evidence that temperature-induced changes in cortisol levels do not affect erα expression.

Foxl2 is a transcription factor with an important role in ovarian development (Baron et al., 2004; Loffler et al., 2003), regulating the expression of gonadal aromatase cyp19a1a (Wang et al., 2007). In medaka, the expression of cyp19a1a is independent of foxl2 (Herpin et al., 2013). In O. hatcheri, high expression of foxl2 and cyp19a1a has been detected in potential ovaries (Hattori et al., 2012), and temporal separation of foxl2 and cyp19a1a expression has been hypothesized but not confirmed (Hattori et al., 2012). At high masculinizing temperatures (>25°C), high stress-related cortisol levels down-regulate the expression of cyp19a1a, and supposedly foxl2, but the mechanism remains unknown. The absence of GREs in the potential regulatory/promoter vicinity of foxl2, along with the presence of GREs in the downstream gene cyp19a1a, suggests that masculinization might not result from a direct interaction between cortisol and a GRE in the potential regulatory/promoter region of foxl2. However, whether foxl2 regulates cyp19a1a under female genotypic conditions or female-producing temperatures in pejerrey remains unknown.

Fshβ codes for the β subunit of the heterodimeric glycoprotein follicle-stimulating hormone (FSH), one of the pituitary gonadotropins (GtHs) that, along with luteinizing hormone (LH), constitute key regulatory factors in vertebrate gonadal reproduction (Levavi-Sivan et al., 2010). FSH release is stimulated by GnRH from the brain, but also by feedback mechanisms triggered by gonadal steroids. Once released from the pituitary gland, FSH regulates the expression of the FSH receptor gene, fshr (Miranda et al., 2013). Specific interactions between steroids and FSH are not clear in pejerrey, as suggested by contradictory experimental evidence. While in vivo experiments rearing larvae at high temperatures showed decreased fshr expression in both females and males (Elisio et al., 2012), no difference in fshr expression was observed in in vitro experiments (Miranda et al., 2013). However, none of these experiments measured cortisol levels as a result of high temperatures. Our results only allow us to hypothesize that, given the absence of GREs in the potential regulatory/promoter region of the fshβ gene, regulation of FSH by glucocorticoids could be occurring not at the level of the hormone but at the level of the FSH receptor, as GREs were indeed detected in fshr (Table 4.2).


4.4.3. Other considerations

The pejerrey genome opens new perspectives for the study of environmental sex determination and its regulatory network. This study focuses on the regulation of genes previously identified as TSD-relevant in pejerrey through experimental evidence, but other genes, such as wnt, R-spondin, sox8, and sox9, have been targeted in reptiles and teleosts with TSD (Shoemaker and Crews, 2009; Valenzuela, 2010). However, the expression profiles of those genes, along with their potential relevance in TSD, are unknown in pejerrey. A broader perspective on pejerrey TSD using transcriptomes of key developmental stages could help identify a larger diversity of candidate genes, and potentially shared components of the genetic network between fish and reptiles with TSD.

Gene expression regulation in sex determination and differentiation also happens at the epigenetic level, through processes such as DNA methylation, histone modification, and the presence of non-coding RNAs (Piferrer, 2013). For example, methylation of certain genes is a key factor in sex determination in flatfish and sea bass (Chen et al., 2014; Navarro-Martín et al., 2011). Interestingly, this study detected dotl1, a methylation-related gene, adjacent to amh in pejerrey. Dotl1 has been associated with methylation and sex differentiation in the silkworm Bombyx mori (Suzuki et al., 2014). These genes together have been related to a highly conserved cluster of “clock genes” regulated by circadian cycles (Paibomesai et al., 2010), shared by ray-finned fishes and mammals (Rodríguez-Marí et al., 2005). However, the role of dotl1 in the clock-gene cluster was associated with cell cycling, not with methylation.


4.5. Conclusions

Since its discovery in 1997, TSD in O. bonariensis has been the subject of numerous studies that resulted in well-described morphological changes but unclear connections in the network of underlying molecular mechanisms. Individual genetic analyses have identified a number of genes with sexually dimorphic expression profiles. The genomic order of many of these genes is conserved between GSD medaka and TSD pejerrey, suggesting potential regulation and/or expression of those genes as units, and also conservation of the genetic machinery across diverging sex determination mechanisms. These genes are transcription factors and hormone receptors, molecules known to be involved in gene transcription regulation and signaling mechanisms. All these TSD-relevant genes could be regulated by androgens and estrogens, and most by glucocorticoids. The omnipresence of steroid response elements in TSD-relevant genes expressed in different stages of sexual development underlines the importance of the steroid-signaling pathway in the regulatory network of pejerrey sex determination and differentiation. According to Heule et al. (2014), much attention has been drawn towards upstream master sex-determination genes, while the role of downstream genes in the diverging plasticity of fish sex determination has been overlooked.

Also, the presence of glucocorticoid response elements in the majority of the genes opens the possibility that a larger number of genes could be affected by the alternative, cortisol-mediated masculinizing pathway (Fernandino et al., 2013).


Supplementary Table 4.3. GOterms for all TSD-related and adjacent genes. All &! genes included in the same scaffold/syntenic block are consecutive and have the same (! color. TSD-relevant genes with the GOterm shown in Figure 4.3 are in bold. GO )! categories are (F) Molecular Function, (C) Cellular Component, and (P) Biological *! Process. +! $! GO Gene ID Gene Name Goterm ID Goterm Name category zinc finger and btb domain- 89.m000002 containing protein 2-like GO:0046872 metal ion binding F zinc finger and btb domain- 89.m000002 containing protein 2-like GO:0003676 nucleic acid binding F zinc finger and btb domain- 89.m000002 containing protein 2-like GO:0005634 nucleus C zinc finger and btb domain- 89.m000002 containing protein 2-like GO:0005515 protein binding F myc target protein 1 89.m000196 homolog GO:0035162 embryonic hemopoiesis P monofunctional c1- tetrahydrofolate 89.m000192 mitochondrial-like GO:0055114 oxidation-reduction process P monofunctional c1- tetrahydrofolate methylenetetrahydrofolate 89.m000192 mitochondrial-like GO:0004488 dehydrogenase (NADP+) activity F monofunctional c1- tetrahydrofolate folic acid-containing compound 89.m000192 mitochondrial-like GO:0009396 biosynthetic process P monofunctional c1- tetrahydrofolate 89.m000192 mitochondrial-like GO:0005524 ATP binding F monofunctional c1- tetrahydrofolate 89.m000192 mitochondrial-like GO:0004329 formate-tetrahydrofolate ligase activity F monofunctional c1- tetrahydrofolate 89.m000192 mitochondrial-like GO:0046487 glyoxylate metabolic process P 89.m000197 protein serac1 GO:0006505 GPI anchor metabolic process P hydrolase activity, acting on ester 89.m000197 protein serac1 GO:0016788 bonds F 89.m000197 protein serac1 GO:0006886 intracellular protein transport P 89.m000197 protein serac1 GO:0005488 binding F upf0364 protein c6orf211 steroid hormone mediated signaling 89.m000001 homolog GO:0043401 pathway P upf0364 protein c6orf211 89.m000001 homolog GO:0008270 zinc ion binding F upf0364 
protein c6orf211 regulation of transcription, DNA- 89.m000001 homolog GO:0006355 dependent P upf0364 protein c6orf211 89.m000001 homolog GO:0005496 steroid binding F upf0364 protein c6orf211 89.m000001 homolog GO:0043565 sequence-specific DNA binding F upf0364 protein c6orf211 sequence-specific DNA binding 89.m000001 homolog GO:0003700 transcription factor activity F upf0364 protein c6orf211 89.m000001 homolog GO:0030284 estrogen receptor activity F upf0364 protein c6orf211 89.m000001 homolog GO:0005634 nucleus C upf0364 protein c6orf211 89.m000001 homolog GO:0005667 transcription factor complex C upf0364 protein c6orf211 regulation of transcription, DNA- 89.m000001 homolog GO:0045449 dependent P 89.m000195 NA GO:0016021 integral to membrane C

! ")(! 89.m000195 NA GO:0003779 actin binding F steroid hormone mediated signaling 89.m000194 estrogen receptor alpha GO:0043401 pathway P 89.m000194 estrogen receptor alpha GO:0008270 zinc ion binding F regulation of transcription, DNA- 89.m000194 estrogen receptor alpha GO:0006355 dependent P 89.m000194 estrogen receptor alpha GO:0005496 steroid binding F 89.m000194 estrogen receptor alpha GO:0043565 sequence-specific DNA binding F sequence-specific DNA binding 89.m000194 estrogen receptor alpha GO:0003700 transcription factor activity F 89.m000194 estrogen receptor alpha GO:0030284 estrogen receptor activity F 89.m000194 estrogen receptor alpha GO:0005634 nucleus C 89.m000194 estrogen receptor alpha GO:0005667 transcription factor complex C regulation of transcription, DNA- 89.m000194 estrogen receptor alpha GO:0045449 dependent P rho gtpase-activating protein positive regulation of Rho GTPase 1686.m000059 26 GO:0032321 activity P rho gtpase-activating protein 1686.m000059 26 GO:0005829 cytosol C rho gtpase-activating protein 1686.m000059 26 GO:0005100 Rho GTPase activator activity F rho gtpase-activating protein 1686.m000059 26 GO:0005856 cytoskeleton C rho gtpase-activating protein 1686.m000059 26 GO:0005543 phospholipid binding F rho gtpase-activating protein 1686.m000059 26 GO:0046847 filopodium assembly P rho gtpase-activating protein 1686.m000059 26 GO:0030036 actin cytoskeleton organization P rho gtpase-activating 1686.m000059 protein 26 GO:0007399 nervous system development P rho gtpase-activating protein 1686.m000059 26 GO:0005925 focal adhesion C rho gtpase-activating protein 1686.m000059 26 GO:0008093 cytoskeletal adaptor activity F rho gtpase-activating protein 1686.m000059 26 GO:0017124 SH3 domain binding F rho gtpase-activating protein 1686.m000058 26 GO:0005096 GTPase activator activity F rho gtpase-activating protein 1686.m000058 26 GO:0043547 positive regulation of GTPase activity P rho gtpase-activating protein 1686.m000058 26 GO:0005543 
phospholipid binding F rho gtpase-activating protein 1686.m000058 26 GO:0005622 intracellular C rho gtpase-activating 1686.m000058 protein 26 GO:0007165 signal transduction P glucocorticoid receptor-like 1686.m000060 isoform x2 GO:0016568 chromatin modification P glucocorticoid receptor-like 1686.m000060 isoform x2 GO:0008270 zinc ion binding F glucocorticoid receptor- regulation of transcription, DNA- 1686.m000060 like isoform x2 GO:0006355 dependent P glucocorticoid receptor-like 1686.m000060 isoform x2 GO:0005496 steroid binding F glucocorticoid receptor-like 1686.m000060 isoform x2 GO:0043565 sequence-specific DNA binding F glucocorticoid receptor-like glucocorticoid mediated signaling 1686.m000060 isoform x2 GO:0043402 pathway P glucocorticoid receptor-like sequence-specific DNA binding 1686.m000060 isoform x2 GO:0003700 transcription factor activity F glucocorticoid receptor-like 1686.m000060 isoform x2 GO:0005737 cytoplasm C glucocorticoid receptor-like 1686.m000060 isoform x2 GO:0004883 glucocorticoid receptor activity F glucocorticoid receptor-like 1686.m000060 isoform x2 GO:0005634 nucleus C glucocorticoid receptor-like glucocorticoid receptor signaling 1686.m000060 isoform x2 GO:0042921 pathway P

! "))! glucocorticoid receptor-like 1686.m000060 isoform x2 GO:0005667 transcription factor complex C glucocorticoid receptor-like regulation of transcription, DNA- 1686.m000060 isoform x2 GO:0045449 dependent P doublesex and mab-3 related 5921.m000016 transcription factor 1a GO:0043565 sequence-specific DNA binding F doublesex and mab-3 related transcription factor regulation of transcription, DNA- 5921.m000016 1a GO:0006355 dependent P doublesex and mab-3 related sequence-specific DNA binding 5921.m000016 transcription factor 1a GO:0003700 transcription factor activity F doublesex and mab-3 related 5921.m000016 transcription factor 1a GO:0005634 nucleus C doublesex and mab-3 related 5921.m000016 transcription factor 1a GO:0005667 transcription factor complex C doublesex and mab-3 related regulation of transcription, DNA- 5921.m000016 transcription factor 1a GO:0045449 dependent P luteinizing hormone beta 9100.m000034 subunit GO:0005576 extracellular region C luteinizing hormone beta 9100.m000034 subunit GO:0005179 hormone activity F luteinizing hormone beta 9100.m000034 subunit GO:0007165 signal transduction P g-protein coupled receptor G-protein coupled receptor signaling 9100.m000033 4-like GO:0007186 pathway P g-protein coupled receptor 4- 9100.m000033 like GO:0004930 G-protein coupled receptor activity F g-protein coupled receptor 4- 9100.m000033 like GO:0016021 integral to membrane C 6064.m000001 moesin-like isoform x1 GO:0019898 extrinsic to membrane C 6064.m000001 moesin-like isoform x1 GO:0005737 cytoplasm C 6064.m000001 moesin-like isoform x1 GO:0005856 cytoskeleton C 6064.m000001 moesin-like isoform x1 GO:0003779 actin binding F steroid hormone mediated signaling 6064.m000047 androgen receptor alpha GO:0043401 pathway P 6064.m000047 androgen receptor alpha GO:0008270 zinc ion binding F regulation of transcription, DNA- 6064.m000047 androgen receptor alpha GO:0006355 dependent P 6064.m000047 androgen receptor alpha GO:0043565 sequence-specific DNA 
binding F sequence-specific DNA binding 6064.m000047 androgen receptor alpha GO:0003700 transcription factor activity F 6064.m000047 androgen receptor alpha GO:0003707 steroid hormone receptor activity F 6064.m000047 androgen receptor alpha GO:0005634 nucleus C 6064.m000047 androgen receptor alpha GO:0005667 transcription factor complex C regulation of transcription, DNA- 6064.m000047 androgen receptor alpha GO:0045449 dependent P zinc finger c4h2 domain- 6064.m000048 containing GO:0005737 cytoplasm C zinc finger c4h2 domain- 6064.m000048 containing GO:0045211 postsynaptic membrane C zinc finger c4h2 domain- 6064.m000048 containing GO:0005634 nucleus C zinc finger c4h2 domain- 6064.m000048 containing GO:0007399 nervous system development P histone-lysine n- h3 lysine-79 histone-lysine N-methyltransferase 2002.m000058 specific-like isoform x1 GO:0018024 activity F histone-lysine n- h3 lysine- 2002.m000058 79 specific-like isoform x1 GO:0006479 protein methylation P histone-lysine n- h3 lysine-79 2002.m000058 specific-like isoform x1 GO:0006554 lysine catabolic process P peptidyl-prolyl cis-trans integral to endoplasmic reticulum 2002.m000060 isomerase fkbp8-like GO:0030176 membrane C peptidyl-prolyl cis-trans 2002.m000060 isomerase fkbp8-like GO:0006457 protein folding P peptidyl-prolyl cis-trans 2002.m000060 isomerase fkbp8-like GO:0005528 FK506 binding F peptidyl-prolyl cis-trans 2002.m000060 isomerase fkbp8-like GO:0018208 peptidyl-proline modification P 2002.m000060 peptidyl-prolyl cis-trans GO:0005740 mitochondrial envelope C

! ")*! isomerase fkbp8-like peptidyl-prolyl cis-trans peptidyl-prolyl cis-trans isomerase 2002.m000060 isomerase fkbp8-like GO:0003755 activity F peptidyl-prolyl cis-trans 2002.m000060 isomerase fkbp8-like GO:0005515 protein binding F peptidyl-prolyl cis-trans 2002.m000060 isomerase fkbp8-like GO:0008023 transcription elongation factor complex C peptidyl-prolyl cis-trans transcription elongation from RNA 2002.m000060 isomerase fkbp8-like GO:0006368 polymerase II promoter P 2002.m000056 anti-mullerian hormone GO:0008083 growth factor activity F negative regulation of androgen 2002.m000056 anti-mullerian hormone GO:2000835 secretion P 2002.m000056 anti-mullerian hormone GO:0005576 extracellular region C 2002.m000056 anti-mullerian hormone GO:0008406 gonad development P 2002.m000056 anti-mullerian hormone GO:0007165 signal transduction P 2002.m000056 anti-mullerian hormone GO:0008283 cell proliferation P 2002.m000056 anti-mullerian hormone GO:0040007 growth P rna polymerase ii elongation 2002.m000059 factor ell-like GO:0006412 translation P rna polymerase ii elongation 2002.m000059 factor ell-like GO:0005488 binding F rna polymerase ii elongation 2002.m000059 factor ell-like GO:0001570 vasculogenesis P rna polymerase ii transcription elongation from RNA 2002.m000059 elongation factor ell-like GO:0006368 polymerase II promoter P rna polymerase ii elongation 2002.m000059 factor ell-like GO:0008023 transcription elongation factor complex C follicle-stimulating hormone 7352.m000001 beta subunit GO:0005179 hormone activity F follicle-stimulating hormone 7352.m000001 beta subunit GO:0005576 extracellular region C follicle-stimulating 7352.m000001 hormone beta subunit GO:0007165 signal transduction P forkhead box protein n2-like 125.m000140 isoform x1 GO:0009888 tissue development P forkhead box protein n2-like 125.m000140 isoform x1 GO:0003690 double-stranded DNA binding F forkhead box protein n2-like 125.m000140 isoform x1 GO:0005667 transcription factor complex C forkhead 
box protein n2-like 125.m000140 isoform x1 GO:0008301 DNA binding, bending F RNA polymerase II distal enhancer forkhead box protein n2-like sequence-specific DNA binding 125.m000140 isoform x1 GO:0003705 transcription factor activity F forkhead box protein n2-like 125.m000140 isoform x1 GO:0008134 transcription factor binding F forkhead box protein n2-like 125.m000140 isoform x1 GO:0007389 pattern specification process P forkhead box protein n2-like 125.m000140 isoform x1 GO:0009790 embryo development P forkhead box protein n2-like regulation of sequence-specific DNA 125.m000140 isoform x1 GO:0051090 binding transcription factor activity P forkhead box protein n2-like 125.m000140 isoform x1 GO:0043565 sequence-specific DNA binding F forkhead box protein n2-like regulation of transcription from RNA 125.m000140 isoform x1 GO:0006357 polymerase II promoter P 125.m000141 f-box only protein 11-like GO:0008270 zinc ion binding F 125.m000141 f-box only protein 11-like GO:0005730 nucleolus C protein-arginine N-methyltransferase 125.m000141 f-box only protein 11-like GO:0016274 activity F 125.m000141 f-box only protein 11-like GO:0035246 peptidyl-arginine N-methylation P 125.m000141 f-box only protein 11-like GO:0004842 ubiquitin-protein ligase activity F 125.m000141 f-box only protein 11-like GO:0016567 protein ubiquitination P 125.m000141 f-box only protein 11-like GO:0007605 sensory perception of sound P 125.m000141 f-box only protein 11-like GO:0005737 cytoplasm C 125.m000141 f-box only protein 11-like GO:0005515 protein binding F potassium channel subfamily 125.m000142 k member 12-like GO:0005267 potassium channel activity F 125.m000142 potassium channel subfamily GO:0071805 potassium ion transmembrane transport P

! ")+! k member 12-like potassium channel subfamily 125.m000142 k member 12-like GO:0016021 integral to membrane C follicle-stimulating hormone signaling 125.m000137 luteinizing hormone receptor GO:0042699 pathway P 125.m000137 luteinizing hormone receptor GO:0016021 integral to membrane C follicle-stimulating hormone receptor 125.m000137 luteinizing hormone receptor GO:0004963 activity F 125.m000137 luteinizing hormone receptor GO:0004964 luteinizing hormone receptor activity F G-protein coupled receptor signaling luteinizing hormone pathway, coupled to cyclic 125.m000137 receptor GO:0007187 nucleotide second messenger P dna mismatch repair protein negative regulation of DNA 125.m000001 msh6-like isoform x1 GO:0045910 recombination P dna mismatch repair protein 125.m000001 msh6-like isoform x1 GO:0032138 single base insertion or deletion binding F dna mismatch repair protein 125.m000001 msh6-like isoform x1 GO:0008094 DNA-dependent ATPase activity F dna mismatch repair protein 125.m000001 msh6-like isoform x1 GO:0007131 reciprocal meiotic recombination P dna mismatch repair protein 125.m000001 msh6-like isoform x1 GO:0003684 damaged DNA binding F dna mismatch repair protein 125.m000001 msh6-like isoform x1 GO:0000400 four-way junction DNA binding F dna mismatch repair protein msh6-like isoform 125.m000001 x1 GO:0000710 meiotic mismatch repair P dna mismatch repair protein intrinsic apoptotic signaling pathway in 125.m000001 msh6-like isoform x1 GO:0008630 response to DNA damage P dna mismatch repair protein 125.m000001 msh6-like isoform x1 GO:0009411 response to UV P dna mismatch repair protein somatic hypermutation of 125.m000001 msh6-like isoform x1 GO:0016446 immunoglobulin genes P dna mismatch repair protein 125.m000001 msh6-like isoform x1 GO:0000228 nuclear chromosome C dna mismatch repair protein 125.m000001 msh6-like isoform x1 GO:0032137 guanine/thymine mispair binding F dna mismatch repair protein 125.m000001 msh6-like isoform x1 GO:0005524 ATP binding F 
dna mismatch repair protein 125.m000001 msh6-like isoform x1 GO:0032301 MutSalpha complex C dna mismatch repair protein 125.m000001 msh6-like isoform x1 GO:0043570 maintenance of DNA repeat elements P dna mismatch repair protein 125.m000001 msh6-like isoform x1 GO:0045190 isotype switching P transcription initiation from RNA 125.m000138 stonin-1-like GO:0006367 polymerase II promoter P 125.m000138 stonin-1-like GO:0005672 transcription factor TFIIA complex C 125.m000138 stonin-1-like GO:0030131 clathrin adaptor complex C 125.m000138 stonin-1-like GO:0006897 endocytosis P 125.m000138 stonin-1-like GO:0006886 intracellular protein transport P 125.m000138 stonin-1-like GO:0005515 protein binding F corticosteroid 11-beta- dehydrogenase isozyme 2- 27555.m000015 like GO:0055114 oxidation-reduction process P corticosteroid 11-beta- dehydrogenase isozyme 2- 27555.m000015 like GO:0016491 oxidoreductase activity F 7329.m000021 fibroblast growth factor 1 GO:0030097 hemopoiesis P 7329.m000021 fibroblast growth factor 1 GO:0051781 positive regulation of cell division P 7329.m000021 fibroblast growth factor 1 GO:0008083 growth factor activity F 7329.m000021 fibroblast growth factor 1 GO:0045766 positive regulation of angiogenesis P positive regulation of cholesterol 7329.m000021 fibroblast growth factor 1 GO:0045542 biosynthetic process P 7329.m000021 fibroblast growth factor 1 GO:0044548 S100 protein binding F 7329.m000021 fibroblast growth factor 1 GO:0005615 extracellular space C 7329.m000021 fibroblast growth factor 1 GO:0005829 cytosol C 7329.m000021 fibroblast growth factor 1 GO:0008201 heparin binding F

! ")$! 7329.m000021 fibroblast growth factor 1 GO:0008284 positive regulation of cell proliferation P positive regulation of intracellular 7329.m000021 fibroblast growth factor 1 GO:0010740 protein kinase cascade P branch elongation involved in ureteric 7329.m000021 fibroblast growth factor 1 GO:0060681 bud branching P positive regulation of transcription from 7329.m000021 fibroblast growth factor 1 GO:0045944 RNA polymerase II promoter P 7329.m000021 fibroblast growth factor 1 GO:0005104 fibroblast growth factor receptor binding F fibroblast growth factor receptor 7329.m000021 fibroblast growth factor 1 GO:0008543 signaling pathway P 7329.m000021 fibroblast growth factor 1 GO:0072163 mesonephric epithelium development P 7329.m000021 fibroblast growth factor 1 GO:0030335 positive regulation of cell migration P 7329.m000022 glucocorticoid receptor-like GO:0008270 zinc ion binding F glucocorticoid receptor- regulation of transcription, DNA- 7329.m000022 like GO:0006355 dependent P 7329.m000022 glucocorticoid receptor-like GO:0005496 steroid binding F 7329.m000022 glucocorticoid receptor-like GO:0043565 sequence-specific DNA binding F glucocorticoid mediated signaling 7329.m000022 glucocorticoid receptor-like GO:0043402 pathway P sequence-specific DNA binding 7329.m000022 glucocorticoid receptor-like GO:0003700 transcription factor activity F 7329.m000022 glucocorticoid receptor-like GO:0004883 glucocorticoid receptor activity F 7329.m000022 glucocorticoid receptor-like GO:0005634 nucleus C glucocorticoid receptor signaling 7329.m000022 glucocorticoid receptor-like GO:0042921 pathway P 7329.m000022 glucocorticoid receptor-like GO:0005667 transcription factor complex C regulation of transcription, DNA- 7329.m000022 glucocorticoid receptor-like GO:0045449 dependent P phosphatidylinositol - bisphosphate 3-kinase phosphatidylinositol-4,5-bisphosphate 1679.m000043 catalytic subunit beta isoform GO:0046934 3-kinase activity F phosphatidylinositol - bisphosphate 3-kinase 
1679.m000043 catalytic subunit beta isoform GO:0005942 phosphatidylinositol 3-kinase complex C phosphatidylinositol - bisphosphate 3-kinase 1679.m000043 catalytic subunit beta isoform GO:0048015 phosphatidylinositol-mediated signaling P phosphatidylinositol - bisphosphate 3-kinase 1679.m000043 catalytic subunit beta isoform GO:0016303 1-phosphatidylinositol-3-kinase activity F phosphatidylinositol - bisphosphate 3-kinase 1679.m000043 catalytic subunit beta isoform GO:0005524 ATP binding F phosphatidylinositol - bisphosphate 3-kinase 1679.m000043 catalytic subunit beta isoform GO:0046854 phosphatidylinositol phosphorylation P phosphatidylinositol - bisphosphate 3-kinase phosphatidylinositol-3-phosphate 1679.m000043 catalytic subunit beta isoform GO:0036092 biosynthetic process P phosphatidylinositol - bisphosphate 3-kinase 1-phosphatidylinositol-4-phosphate 3- 1679.m000043 catalytic subunit beta isoform GO:0035005 kinase activity F phosphatidylinositol - bisphosphate 3-kinase 1679.m000043 catalytic subunit beta isoform GO:0005886 plasma membrane C phosphatidylinositol - bisphosphate 3-kinase 1679.m000043 catalytic subunit beta isoform GO:0005515 protein binding F sequence-specific DNA binding 1679.m000002 forkhead box l2 GO:0003700 transcription factor activity F 1679.m000002 forkhead box l2 GO:0043565 sequence-specific DNA binding F 1679.m000002 forkhead box l2 GO:0005634 nucleus C regulation of transcription, DNA- 1679.m000002 forkhead box l2 GO:0006355 dependent P 1679.m000002 forkhead box l2 GO:0005667 transcription factor complex C regulation of transcription, DNA- 1679.m000002 forkhead box l2 GO:0045449 dependent P dna mismatch repair protein 3040.m000001 msh2-like GO:0032138 single base insertion or deletion binding F

3040.m000001 | dna mismatch repair protein msh2-like | GO:0008094 | DNA-dependent ATPase activity | F
3040.m000001 | dna mismatch repair protein msh2-like | GO:0010165 | response to X-ray | P
3040.m000001 | dna mismatch repair protein msh2-like | GO:0000403 | Y-form DNA binding | F
3040.m000001 | dna mismatch repair protein msh2-like | GO:0003684 | damaged DNA binding | F
3040.m000001 | dna mismatch repair protein msh2-like | GO:0000400 | four-way junction DNA binding | F
3040.m000001 | dna mismatch repair protein msh2-like | GO:0000710 | meiotic mismatch repair | P
3040.m000001 | dna mismatch repair protein msh2-like | GO:0032302 | MutSbeta complex | C
3040.m000001 | dna mismatch repair protein msh2-like | GO:0045128 | negative regulation of reciprocal meiotic recombination | P
3040.m000001 | dna mismatch repair protein msh2-like | GO:0006302 | double-strand break repair | P
3040.m000001 | dna mismatch repair protein msh2-like | GO:0010224 | response to UV-B | P
3040.m000001 | dna mismatch repair protein msh2-like | GO:0016446 | somatic hypermutation of immunoglobulin genes | P
3040.m000001 | dna mismatch repair protein msh2-like | GO:0000406 | double-strand/single-strand DNA junction binding | F
3040.m000001 | dna mismatch repair protein msh2-like | GO:0000228 | nuclear chromosome | C
3040.m000001 | dna mismatch repair protein msh2-like | GO:0000404 | loop DNA binding | F
3040.m000001 | dna mismatch repair protein msh2-like | GO:0006301 | postreplication repair | P
3040.m000001 | dna mismatch repair protein msh2-like | GO:0032137 | guanine/thymine mispair binding | F
3040.m000001 | dna mismatch repair protein msh2-like | GO:0042771 | intrinsic apoptotic signaling pathway in response to DNA damage by p53 class mediator | P
3040.m000001 | dna mismatch repair protein msh2-like | GO:0031573 | intra-S DNA damage checkpoint | P
3040.m000001 | dna mismatch repair protein msh2-like | GO:0005524 | ATP binding | F
3040.m000001 | dna mismatch repair protein msh2-like | GO:0032301 | MutSalpha complex | C
3040.m000001 | dna mismatch repair protein msh2-like | GO:0043570 | maintenance of DNA repeat elements | P
3040.m000001 | dna mismatch repair protein msh2-like | GO:0045190 | isotype switching | P
3040.m000001 | dna mismatch repair protein msh2-like | GO:0006311 | meiotic gene conversion | P
3040.m000070 | f-box only protein 11-like isoform x1 | GO:0004842 | ubiquitin-protein ligase activity | F
3040.m000070 | f-box only protein 11-like isoform x1 | GO:0016567 | protein ubiquitination | P
3040.m000070 | f-box only protein 11-like isoform x1 | GO:0008270 | zinc ion binding | F
3040.m000070 | f-box only protein 11-like isoform x1 | GO:0005515 | protein binding | F
3040.m000069 | follicle stimulating hormone receptor | GO:0042699 | follicle-stimulating hormone signaling pathway | P
3040.m000069 | follicle stimulating hormone receptor | GO:0016021 | integral to membrane | C
3040.m000069 | follicle stimulating hormone receptor | GO:0004963 | follicle-stimulating hormone receptor activity | F
3040.m000069 | follicle stimulating hormone receptor | GO:0007187 | G-protein coupled receptor signaling pathway, coupled to cyclic nucleotide second messenger | P
1352.m000001 | cytochrome p450 aromatase | GO:0020037 | heme binding | F
1352.m000001 | cytochrome p450 aromatase | GO:0005506 | iron ion binding | F
1352.m000001 | cytochrome p450 aromatase | GO:0055114 | oxidation-reduction process | P
1352.m000001 | cytochrome p450 aromatase | GO:0016705 | oxidoreductase activity, acting on paired donors, with incorporation or reduction of molecular oxygen | F
1352.m000093 | dmx-like protein 2-like isoform x1 | GO:0005515 | protein binding | F
1352.m000092 | gliomedin-like | GO:0005515 | protein binding | F
4244.m000038 | estrogen receptor beta 2 | GO:0043401 | steroid hormone mediated signaling pathway | P
4244.m000038 | estrogen receptor beta 2 | GO:0008270 | zinc ion binding | F
4244.m000038 | estrogen receptor beta 2 | GO:0006355 | regulation of transcription, DNA-dependent | P
4244.m000038 | estrogen receptor beta 2 | GO:0005496 | steroid binding | F
4244.m000038 | estrogen receptor beta 2 | GO:0043565 | sequence-specific DNA binding | F
4244.m000038 | estrogen receptor beta 2 | GO:0003700 | sequence-specific DNA binding transcription factor activity | F
4244.m000038 | estrogen receptor beta 2 | GO:0003707 | steroid hormone receptor activity | F
4244.m000038 | estrogen receptor beta 2 | GO:0005634 | nucleus | C
4244.m000038 | estrogen receptor beta 2 | GO:0005667 | transcription factor complex | C
4244.m000038 | estrogen receptor beta 2 | GO:0045449 | regulation of transcription, DNA-dependent | P
24950.m000011 | anti-mullerian hormone | GO:0008083 | growth factor activity | F
24950.m000011 | anti-mullerian hormone | GO:0008406 | gonad development | P
24950.m000011 | anti-mullerian hormone | GO:0007165 | signal transduction | P
24950.m000011 | anti-mullerian hormone | GO:0008283 | cell proliferation | P
24950.m000011 | anti-mullerian hormone | GO:0040007 | growth | P
3153.m000051 | oligophrenin-1-like | GO:0005096 | GTPase activator activity | F
3153.m000051 | oligophrenin-1-like | GO:0043547 | positive regulation of GTPase activity | P
3153.m000051 | oligophrenin-1-like | GO:0005543 | phospholipid binding | F
3153.m000051 | oligophrenin-1-like | GO:0005622 | intracellular | C
3153.m000051 | oligophrenin-1-like | GO:0007165 | signal transduction | P
3153.m000052 | androgen receptor beta | GO:0043401 | steroid hormone mediated signaling pathway | P
3153.m000052 | androgen receptor beta | GO:0008270 | zinc ion binding | F
3153.m000052 | androgen receptor beta | GO:0006355 | regulation of transcription, DNA-dependent | P
3153.m000052 | androgen receptor beta | GO:0005496 | steroid binding | F
3153.m000052 | androgen receptor beta | GO:0043565 | sequence-specific DNA binding | F
3153.m000052 | androgen receptor beta | GO:0004882 | androgen receptor activity | F
3153.m000052 | androgen receptor beta | GO:0030521 | androgen receptor signaling pathway | P
3153.m000052 | androgen receptor beta | GO:0005634 | nucleus | C
2965.m000035 | nesprin-2-like | GO:0003779 | actin binding | F
2965.m000035 | nesprin-2-like | GO:0016021 | integral to membrane | C
2965.m000001 | estrogen receptor beta | GO:0043401 | steroid hormone mediated signaling pathway | P
2965.m000001 | estrogen receptor beta | GO:0008270 | zinc ion binding | F
2965.m000001 | estrogen receptor beta | GO:0006355 | regulation of transcription, DNA-dependent | P
2965.m000001 | estrogen receptor beta | GO:0005496 | steroid binding | F
2965.m000001 | estrogen receptor beta | GO:0043565 | sequence-specific DNA binding | F
2965.m000001 | estrogen receptor beta | GO:0003700 | sequence-specific DNA binding transcription factor activity | F
2965.m000001 | estrogen receptor beta | GO:0005634 | nucleus | C
2965.m000001 | estrogen receptor beta | GO:0030520 | intracellular estrogen receptor signaling pathway | P
2965.m000001 | estrogen receptor beta | GO:0030284 | estrogen receptor activity | F
2965.m000001 | estrogen receptor beta | GO:0005667 | transcription factor complex | C
2965.m000001 | estrogen receptor beta | GO:0045449 | regulation of transcription, DNA-dependent | P

