Molecular Evolution of Dim-light Visual Pigments in Neotropical Geophagine Cichlids

by

Shannon Refvik

A thesis submitted in conformity with the requirements for the degree of Master of Science Graduate Department of Ecology and Evolutionary Biology University of Toronto

© Copyright 2012 by Shannon Refvik

Abstract

Molecular Evolution of Dim-light Visual Pigments in Neotropical Geophagine Cichlids

Shannon Refvik Master of Science Graduate Department of Ecology and Evolutionary Biology University of Toronto 2012

Neotropical fishes are highly diverse and occupy a wide range of environments. Evolution of visual pigments has been important in the diversification of the African rift lake cichlids, but relatively little is known of Neotropical cichlid visual systems. This thesis addresses the molecular evolution of the dim-light visual pigment rhodopsin in the Geophagini tribe of Neotropical cichlids. We use various likelihood-based codon models of molecular evolution and newly isolated sequences for Neotropical cichlid rhodopsin to compare patterns of selective constraint among Neotropical, African rift lake, and African riverine cichlid rhodopsin; and provide evidence for differences in selective constraint among clades with positive selection occurring in both the Neotropical and African rift lake clades. We further investigate and find evidence for variation in selective constraint within the geophagine cichlids. Comparing the results obtained from different methods suggests that Clade model C is more appropriate than branch-site models for investigating variation in selective constraint among clades. Neotropical cichlids, alone and in comparison with African cichlids, are emerging as an excellent system for investigating molecular evolution in visual pigments.

Acknowledgements

First and foremost, I would like to thank Hernán López-Fernández and Belinda Chang, my co-supervisors, for their support and advice throughout my degree. I came into this project with zero experience working either with fish or in molecular biology, but they were incredibly helpful in the ensuing learning process. I would also like to thank Hernán for providing me with opportunities to work in the field, and my committee members, Allan Baker and Nathan Lovejoy, for their helpful suggestions throughout.

I would like to acknowledge members of the López-Fernández and Chang labs for their support - particularly Jessica Arbour, who helped me wade through the finer points of graduate school administration processes; James Morrow, David Yu, and Ilke van Hazel, who patiently helped me learn lab procedures; and Cameron Weadick, who conducted some of the studies that motivated my research and helped me to implement the clade model he developed.

My family, friends, and in particular my partner Jasper Palfree have been incredibly supportive - it was helpful and motivating to share my successes and discuss my challenges with such great people. On that note, I would like to thank the members of Toronto’s Lindy Hop scene, who definitely kept me sane when the challenges seemed overwhelming.

Lastly, I would like to acknowledge my funding sources for this project, NSERC and OGS.

Contents

0.1 Statement of Contributions

1 General Introduction
    1.1 Biogeography of Cichlids
    1.2 Vertebrate Vision
    1.3 Visual systems of African and Neotropical cichlids
    1.4 Codon based models of molecular evolution
    1.5 Objectives
        1.5.1 Objective 1: Investigating differences in selective constraint between Neotropical and African dim light visual pigments
        1.5.2 Objective 2: Investigating differences in selective constraint within geophagine cichlid dim light visual pigments
    1.6 Figures

2 Molecular Evolution of Dim-light Visual Pigments in Neotropical Geophagine Cichlids: Evidence for differences in selective constraint in comparison with African cichlids
    2.1 Abstract
    2.2 Introduction
    2.3 Methods
        2.3.1 Samples and Sequences
        2.3.2 Tree building
        2.3.3 Testing for selection
    2.4 Results
        2.4.1 Molecular dataset
        2.4.2 Site models
        2.4.3 Clade model C
        2.4.4 Branch-site Models
        2.4.5 Influence of positively selected sites on rhodopsin function
    2.5 Discussion
        2.5.1 Positive selection in Neotropical and African cichlid rhodopsins
        2.5.2 High average omega values
        2.5.3 Divergent selection between clades
        2.5.4 Non-overlapping BEB sites
        2.5.5 Clade model C vs. Branch-site Results
        2.5.6 Caveats
        2.5.7 Conclusions
    2.6 Tables
    2.7 Figures
    2.8 Supplementary information

3 Patterns of Selective Constraint in Geophagine Cichlid Rhodopsin
    3.1 Introduction
    3.2 Methods
        3.2.1 Species Included and Phylogenetic Relationships
        3.2.2 Clade Model C Analyses
        3.2.3 Branch-site Analyses
    3.3 Results
        3.3.1 Clade Model C
        3.3.2 Branch-site
        3.3.3 Divergently Selected Sites
    3.4 Discussion
        3.4.1 Divergent Selection Between Clades, with Positive Selection Throughout
        3.4.2 Clade model C vs. Branch-site Results
    3.5 Tables
    3.6 Figures

4 Conclusions and Future Directions
    4.1 Conclusions
    4.2 Future Directions

5 References

List of Tables

2.1 Parameter estimates, likelihood values, likelihood ratio tests, and significance values of PAML random site models using Neotropical or African RH1 sequences.
2.2 BEB sites in Neotropical and African cichlids
2.3 Parameter estimates, likelihood values, test statistics, and p values for various data partitions in Clade Model C.
2.4 Likelihood values, test statistics, and p values for likelihood ratio tests for branch-site models.
2.1 Supplementary Table. Species list, museum catalogue numbers, and accession numbers for sequences used in this study.
2.2 Supplementary table: Parameter estimates, likelihood values, test statistics, and p values for various data partitions in Clade Model C with phylogenetically misplaced species removed.
2.3 Supplementary table: Likelihood values, test statistics, and p values for likelihood ratio tests for branch-site models with phylogenetically misplaced species removed.
2.4 Supplementary table: Detailed BEB output for Site Models, CmC, and Branch-site Models.

3.1 Supplementary Table. Species list, museum catalogue numbers, and accession numbers for sequences used in this study.
3.2 Parameter estimates, likelihood values, test statistics, and p values for CmC analysis of a tree with three partitions: The “” clade, the “Geophagus” clade, and a clade of basal outgroups.
3.3 Likelihood values, test statistics, and p values for likelihood ratio tests for branch-site models.

List of Figures

1.1 3D images of dark-state and active state rhodopsin.

2.1 Maximum likelihood tree of RH1 sequences, constrained to be reciprocally monophyletic
2.2 RH1 phylogeny and distribution of amino acid residues at positively selected sites in Neotropical and African cichlids.
2.3 Interface between rhodopsin molecules in a dimer
2.4 Openings to retinal binding pocket in the active conformation of rhodopsin.

3.1 Amino acid residues at divergently selected sites in geophagine cichlids and some Neotropical basal outgroups

0.1 Statement of Contributions

Chapters 1, 3, and 4 of this thesis were conceived of and written by Shannon Refvik. Chapter 2 of this thesis will be submitted as a paper co-authored by myself and my two co-supervisors, Belinda Chang and Hernán López-Fernández. The studies included in this chapter were designed collaboratively between myself and my supervisors. I conducted all of the data collection, performed the statistical analyses, and wrote the text of all work submitted in this thesis.

Chapter 1

General Introduction

1.1 Biogeography of Cichlids

The rivers of South and Central America harbour the most diverse freshwater fish fauna on Earth, with an estimated 7000 species that interact in a wide variety of structured communities (Reis et al. 2003). Cichlid fishes are the third largest group of Neotropical fish with approximately 600 species, and are ubiquitous throughout the ecologically varied aquatic habitats of South and Central America from southern Patagonia to Texas (Reis et al. 2003). Cichlids exhibit diverse life histories, reproductive modes, and feeding strategies (Wimberger et al. 1998, Barlow 2000), with this diversity being well represented by the geophagine clade. Geophagines are a monophyletic group (López-Fernández et al. 2010) restricted to South America and southern Panama (Reis et al. 2003), and are one of the three most species-rich tribes of Neotropical cichlids along with Cichlasomatini and Heroini (Kullander 1998, Smith et al. 2008, López-Fernández et al. 2010). Within 17 genera, this clade includes species with a diversity of feeding modes including piscivorous species, substrate sifters, and water-column feeders, as well as species that mouth brood their young (López-Fernández et al. 2005a, 2012). Diet categories within the geophagines are highly correlated with morphological characteristics, indicating that ecomorphological specialization has occurred (Winemiller et al. 1995, López-Fernández et al. 2012). Their ecological and morphological diversity, combined with the well-resolved genus-level phylogeny available for Neotropical cichlids (López-Fernández et al. 2010), make them an ideal system for investigating the ecology and evolution of the freshwater fish fauna in the Neotropics.

Neotropical cichlids make up a monophyletic clade that is sister to the African cichlids (Streelman et al. 1998, Farias et al. 1999, 2000, 2001; Sparks and Smith 2004, Smith et al. 2008, López-Fernández et al. 2010), which include the species-rich and well-studied African rift lake cichlids (reviewed in: Kocher 2004, Seehausen 2006). The African and Neotropical clades make up the majority of cichlid biodiversity, in addition to five genera including 18 species occurring in Madagascar and a single genus with three species occurring in India/Sri Lanka (Sparks 2004) that are basal to the African/Neotropical sister clades (Farias et al. 1999, 2000). The distribution of these species suggests a Gondwanan origin of cichlids, which has been supported by fossil-calibrated phylogenetic analyses (e.g. Genner et al. 2007, López-Fernández et al. in review). Cichlid diversity is distributed quite differently on the two continents with respect to geography: the majority of cichlid diversity in Africa occurs in lacustrine habitats, where in the rift lakes of Eastern Africa 1000-2000 species have evolved in just the past 5 million years (Seehausen 2006). In contrast, the majority of cichlid diversity in the Neotropics occurs in riverine habitats, with only a few species/species complexes inhabiting lacustrine habitats (see Barluenga et al. 2006), and at least some major taxonomic groups have been present since the Eocene (Malabarba et al. 2010, López-Fernández et al. in review).

The speciose African rift lake cichlids are a model system for vertebrate adaptive radiation (reviewed in Kocher 2004, Seehausen 2006), which was defined by Schluter (2000) as “evolution of ecological and phenotypic diversity within a rapidly multiplying lineage”. Schluter further suggests four characteristics which define adaptive radiation in a group: 1) monophyly, 2) rapid diversification, 3) phenotype-trait correlation, and 4) trait utility. The utility of this definition, and indeed of the term adaptive radiation, has been the subject of much controversy (reviewed in Glor 2010), but all modern definitions agree that adaptive radiation requires speciation within a clade and adaptive diversification between its species (Glor 2010). The geophagine cichlids of South America demonstrably have some characteristics of an adaptive radiation as defined by Schluter (2000), including: monophyly, as evidenced by phylogenetic analyses of molecular and morphological data (López-Fernández et al. 2005b, Farias et al. 1999, 2000, and 2001); rapid speciation, as evidenced by short basal branches that are not significantly different from zero in phylogenetic reconstructions (López-Fernández et al. 2005a, 2005b, 2010) and by lineage through time plots which show a rapid initial diversification followed by a reduction in diversification rate (López-Fernández et al. in review); and phenotype-trait correlation, as evidenced by strong correlations between feeding modes and axes of morphological variation (López-Fernández et al. 2012). While African rift lake cichlid adaptive radiations are very young, the presence of the extinct Neotropical species Gymnogeophagus eocenicus from the modern genus Gymnogeophagus in the fossil record from approximately 50 mya (Malabarba et al. 2010) suggests that the extant diversity may be the result of an ancient adaptive radiation (López-Fernández et al. 2005a, 2005b). Despite undergoing strong morphological divergence, geophagines have not diversified into as many species as the African rift lake cichlids (approx. 600 species (Reis et al. 2003) compared to 1000-2000 (Seehausen 2006)). Determining why there is unequal diversification between lineages is a major goal in evolutionary biology (Foote 1997, Sidlauskas 2008), and providing a comparison between the hyper-diverse African rift lake cichlids and the Neotropical geophagines, which are also very diverse but less speciose, may provide insight into the circumstances that have promoted diversification in the respective groups.

Neotropical cichlids, and in particular the members of the geophagine clade, are an excellent system to provide a comparison to African rift lake cichlids. The Neotropical cichlid clade has several properties which address shortcomings in the African rift lake model system: 1) Geophagines exist in complex communities with other more distantly related taxa, which is a more common ecological situation than in the African lakes, where cichlid diversity has originated mostly in the context of other closely related cichlid species (Turner et al. 2001). Conclusions drawn from the study of geophagine cichlids may therefore be more applicable to other taxa. 2) Geophagine cichlid diversity is much older than that of the African rift lake cichlids, with at least some modern genera having been present since the Eocene (Malabarba et al. 2010) and fossil-calibrated molecular clock analysis estimating an age of approximately 107 Mya for the clade (López-Fernández et al. in review). Further, many species of Neotropical cichlid heavily influence riverine community structure (e.g. Reznick and Endler 1982, Pérez et al. 2007). The relatively young age of the African rift lake cichlids makes it unclear how much these species will contribute to the long term structure of freshwater fish communities in Africa, particularly in the light of recent extinctions due to eutrophication and the associated break-down of pre-zygotic reproductive barriers (Seehausen 1997), and extinctions due to the introduction of predators such as the Nile Perch (Witte et al. 1992, but see Awiti 2011). Studies of Neotropical cichlids may therefore be more relevant for understanding freshwater fish community structure in a more general sense. 3) Lastly, well resolved, time-calibrated, genus-level phylogenies are available for Neotropical cichlids (López-Fernández et al. 2010), with work underway to provide species-level phylogenies for some genera (e.g. Willis et al. 2012). There has been a strong effort to understand phylogenetic relationships among African cichlids (e.g. Albertson et al. 1999, Salzburger et al. 2002, Schwarzer et al. 2009), and these have placed the root of the monophyletic African rift lake radiations within the context of other African cichlids (Schwarzer et al. 2009). However, species or even genus-level phylogenies may be impossible to obtain for African rift lake cichlids due to the vast number of species to be considered (Seehausen 2006), low levels of genetic differentiation between species (Zardoya et al. 1996), the persistence of ancestral polymorphisms (Moran and Kornfield 1993, van Oppen et al. 2000), and hybridization and introgression between species (Koblmüller et al. 2010). The existence of a phylogeny for geophagine cichlids allows for hypotheses to be made that explicitly take evolutionary history into account.

1.2 Vertebrate Vision

Visual system evolution has been under intensive investigation in the African rift lake cichlids, and visual system properties and evolution have been implicated in both the speciation and diversification of the group. African rift lake cichlids have therefore emerged as a model system for understanding the molecular biology and evolution of opsin proteins (Carleton 2009). However, very little is known about Neotropical cichlid visual ecology, or whether visual system evolution has contributed to speciation or diversification in Neotropical cichlids. This section introduces the molecular biology and biochemistry of the pigments that mediate vision, with a focus on the dim-light visual pigment rhodopsin.

Vertebrate vision is mediated by a class of photosensitive visual pigments situated in the rod and cone photoreceptor cells of the eye (Wald 1968). Visual pigments consist of an opsin protein and a light-absorbing chromophore (Wald 1953). Opsin proteins are members of the G protein-coupled receptor (GPCR) superfamily, and consist of seven α-helices that span the cell membrane, connected by intracellular and extracellular loops (Terakita 2005). The retinal chromophore is derived from vitamin A, and is covalently bound to the opsin protein via a Schiff base linkage at amino acid site 296. In the dark-state visual pigment, the chromophore is located in the interior of the protein, surrounded by the seven α-helices in the retinal binding pocket (Sakmar et al. 2002).

There are five major classes of vertebrate opsin, each of which absorbs a characteristic range of wavelengths of light: four classes of cone opsins mediate bright light vision, including RH2 (the “green” cone, absorbing in the 470-530nm range), SWS1 (the “UV/violet” cone, absorbing in the 355-450nm range), SWS2 (the “blue” cone, absorbing in the 415-480nm range) and LWS (the “green/red” cone, absorbing in the 495-570nm range); and a single class of rod opsin (rhodopsin) mediates dim-light vision and absorbs green light at 460-530nm (Bowmaker 2008). The opsin classes arose from a series of gene duplications that pre-date the evolution of the jaw, although one or more classes have been lost in some clades and gene duplications are relatively common, particularly within teleosts (Bowmaker 2008). The wavelength of light to which a visual pigment is maximally sensitive is referred to as the λmax, and although pigments are sensitive to a range of wavelengths the λmax is commonly used to describe the sensitivity of the visual pigment. λmax is mediated by the electrostatic conditions in the retinal binding pocket, which are dependent upon the amino acid sequence of the opsin protein and in particular the amino acid residues that have side chains near the chromophore (Kochendoerfer et al. 1999, Sakmar et al. 2002). Within these major classes of visual pigments the λmax can be finely tuned by amino acid replacements of particular residues, and in some cases a single amino acid replacement can cause a large shift in the peak wavelength absorbed by the pigment (Takenaka and Yokoyama 2007; Kochendoerfer et al. 1999, Hunt et al. 2001).
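The overlap among the opsin classes can be made concrete with a small lookup. The sketch below is illustrative only: it takes the wavelength ranges quoted above (attributed there to Bowmaker 2008) at face value, uses RH1 as the label for rhodopsin, and the names `OPSIN_RANGES_NM` and `classes_covering` are invented for this example.

```python
# Wavelength ranges in nm, copied from the text above.
# The dictionary and function names are hypothetical, chosen for illustration.
OPSIN_RANGES_NM = {
    "SWS1": (355, 450),  # "UV/violet" cone
    "SWS2": (415, 480),  # "blue" cone
    "RH2": (470, 530),   # "green" cone
    "LWS": (495, 570),   # "green/red" cone
    "RH1": (460, 530),   # rod opsin (rhodopsin)
}


def classes_covering(wavelength_nm):
    """Return the opsin classes whose quoted range includes the wavelength."""
    return [
        name
        for name, (low, high) in OPSIN_RANGES_NM.items()
        if low <= wavelength_nm <= high
    ]


print(classes_covering(500))  # ['RH2', 'LWS', 'RH1']
print(classes_covering(360))  # ['SWS1']
```

At 500nm three classes overlap, which is one reason a single λmax value is a convenient summary of a pigment's sensitivity.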

The biochemical processes that allow vision to occur begin when a visual pigment absorbs a photon of light. In the dark state, the retinal bound to the opsin protein exists in the 11-cis conformation. Absorption of a photon causes the retinal to isomerize from 11-cis-retinal to all-trans-retinal (Wald 1968), which triggers a series of conformational changes in the opsin protein. This thesis focuses on the dim-light pigment rhodopsin, which is a well-established model system for GPCR and visual pigment research and for which the biochemical pathways subsequent to photon absorption are the best understood (Menon et al. 2001, Fotiadis et al. 2006, Palczewski 2006, Hofmann et al. 2009, Smith 2010). Once triggered by retinal isomerization, rhodopsin passes through a series of intermediates (photorhodopsin, bathorhodopsin, and lumirhodopsin) within milliseconds, then exists in an equilibrium between the Meta I state and the active Meta II state (Okada et al. 2001). Several major structural changes occur during the transition to the Meta II state which distinguish the active structure from the dark-state structure: in the active structure, helices V and VI are tilted outwards via a hinge on the extracellular side of the protein, opening a crevice on the cytoplasmic face (Farrens et al. 1996, Park et al. 2008); the length of helix V is extended at the expense of intracellular loop III (Park et al. 2008); and a channel opens parallel to the cell membrane surface that links the retinal binding pocket to the inter-membrane space by two openings, one between helices I and VII and the other between helices V and VI (Park et al. 2008, Hildebrand et al. 2009). Figure 1.1 shows a comparison of the dark state and Meta II crystal structures. In the active conformation the opsin can bind to and activate the G protein transducin, which initiates a signal transduction cascade within the cell. This cascade results in a decrease in cyclic GMP concentration, which closes cGMP gated channels and hyperpolarizes the cell. This leads to a reduction in neurotransmitter release and affects neural signals to the brain (Yau and Hardie 2009). The activated phase is interrupted by phosphorylation of the opsin by rhodopsin kinase, which allows the binding of arrestin and prevents further G protein activation (Burns and Pugh 2010). The Schiff base linkage between the chromophore and opsin is subsequently hydrolyzed (Blazynski and Ostroy 1984), likely by bulk water from the intracellular face of the protein (Jastrzebska et al. 2011). All-trans-retinal then migrates out of the protein through the opened channel (Hildebrand et al. 2009), and the visual pigment is subsequently regenerated with re-constituted 11-cis-retinal, which acts as an inverse agonist, locking the visual pigment in the dark state configuration (Menon et al. 2001, Sakmar et al. 2002). Rhodopsin molecules form dimers and higher oligomers in vivo (Overton and Blumer 2000), with a dimerization interface between helices IV and V (Fotiadis et al. 2006).

Rhodopsin is one of the few GPCRs for which the 3D crystal structure has been solved, and 3D images are available for both the dark state (Palczewski et al. 2000) and active state (Park et al. 2008) of bovine rhodopsin (Figure 1.1). Extensive mutagenesis studies followed by functional assays have been performed on rhodopsin, making it the best understood GPCR in terms of the relationship between amino acid structure and visual pigment function (Hofmann et al. 2009). Mutagenesis studies have provided detailed information about how specific amino acid substitutions affect properties such as wavelength discrimination (Parry et al. 2004, Takenaka and Yokoyama 2007, Yokoyama et al. 2007, Yokoyama 2008), kinetic properties such as the equilibrium between Meta I and Meta II (Weitz and Nathans 1993, DeCaluwe 1995, Breikers et al. 2001, Sugawara et al. 2010), protein folding (Nakayama et al. 1998), the nature of the interface between dimers of rhodopsin (Kota et al. 2006), and rates of all-trans retinal release after photoactivation (Piechnick et al. 2012). The existence of this background research makes investigations into the molecular evolution of rhodopsin in cichlid fishes particularly interesting, because observations of evolutionary trends at the molecular level can be used to create hypotheses about how rhodopsin function, and hence organismal vision, may be impacted.

1.3 Visual systems of African and Neotropical cichlids

While there is very little known of the visual systems of Neotropical cichlids, the visual systems of African rift lake cichlids have been intensively studied. African rift lake cichlids exhibit a wide diversity of visual systems, with differences in opsin properties and expression among species. Rhodopsin in particular has undergone positive selection in many species, and its functional properties are often correlated to environmental characteristics. This section summarizes the extensive work that has been done on African rift lake cichlid vision, introduces what little is known of Neotropical cichlid vision, and provides a justification for extending visual system research in cichlids to Neotropical clades.

The majority of African cichlids possess seven fully intact cone opsins (SWS1, SWS2a, SWS2b, RH2aα, RH2aβ, RH2b, and LWS), including representatives from Lake Malawi (Spady et al. 2005), Lake Victoria (Carleton et al. 2005), and Lake Tanganyika (Carleton 2009). These have arisen from the five vertebrate opsin classes (Bowmaker 2008) through a series of gene duplications (Chinen et al. 2003; Matsumoto et al. 2006), at least one of which (the RH2aα/RH2aβ split) appears to be exclusive to the African cichlid lineage (Weadick et al. 2012). Although the full complement of opsins is present in the genome of most African rift lake cichlids, individuals tend to express only three opsins at any given time to produce a trichromatic visual system (Spady 2005, Carleton 2009, Carleton et al. 2010), with some species expressing a fourth opsin at low abundance (Perry et al. 2005). There are three typical combinations in which cones are expressed, yielding three general types of vision in cichlid fish: “UV” vision, where SWS1, RH2b, and RH2a are expressed; “Violet” vision, where SWS2b, RH2b, and RH2a are expressed; and “Blue” vision, where SWS2a, RH2a, and LWS are expressed (Carleton et al. 2000; Parry et al. 2005; Jordan et al. 2006; reviewed in Carleton 2009). Changes in the set of opsins that are expressed can occur throughout ontogeny, where juveniles and adults of the same species express different sets of opsins (Spady et al. 2006), and differences in visual sensitivity between closely related species can be driven primarily by changes in opsin expression (Carleton and Kocher 2001). The large palette of opsins available allows African rift lake cichlids to adapt to different visual requirements and photic environments, and is hypothesized to have contributed to the evolution of cichlid diversity in the African rift lakes (Carleton 2009).

Changes in opsin expression yield large changes in visual sensitivity, but sensitivity can also be finely tuned by the molecular evolution of individual opsin proteins (Carleton 2009). Opsin sensitivity in African rift lake cichlids is often correlated to properties of the photic environment, which includes factors such as the wavelengths of light available, light constancy, and light intensity. These properties can be affected by water depth, turbidity, and chemical properties (Lythgoe 1979). Fine scale changes in opsin sensitivity have been implicated in the process of speciation through sensory drive, as well as in the ecological diversification of closely related cichlid species. Speciation through sensory drive can occur when selection acts on sensory traits that are also involved in inter-species signalling (Boughman 2002). This applies to cichlids when putative species inhabit environments with different photic properties, which results in divergent selective pressures on their opsin genes. If selection pressure is strong enough, differences in wavelength discrimination can arise between the two populations (Terai et al. 2006, Seehausen et al. 2008). Most species of African rift lake cichlids are sexually dimorphic, with drab females and brightly coloured males (Seehausen et al. 1998). Female cichlids tend to prefer conspicuous males (Maan et al. 2004), and the degree to which male colouration is conspicuous is dependent both on the visual sensitivity of the female and on the photic environment. Differences in visual sensitivity among populations can therefore drive differences in female preference in nuptial colouration, which can in turn drive differences in male nuptial colouration and contribute to pre-zygotic isolation upon secondary contact of the speciating pair (Terai et al. 2006, Seehausen et al. 2008). This process has been implicated in the speciation of at least three pairs of cichlid fish species, either because differences in turbidity (Terai et al. 2006) or differences in depth (Seehausen et al. 2008) led to differences in wavelength availability among nearby populations. In each of these cases, this process of speciation through sensory drive involved the molecular evolution of the LWS opsin protein.

Rhodopsin proteins have frequently been targets of natural selection in aquatic organisms (e.g. Fasick and Robinson 2000, Hunt et al. 2001, Sivasundar and Palumbi 2010, Larmuseau et al. 2010), and have repeatedly evolved to complement the photic environment in the habitat of various African rift lake cichlids. This has been demonstrated most clearly by Sugawara et al. (2005), who showed that there have been repeated point mutations at amino acid site 292 from alanine to serine, which shifts the peak wavelength absorbed towards the blue end of the visible light spectrum and occurs in species that inhabit relatively blue-shifted waters. Recent ancestral reconstructions showed that this mutation has evolved independently at least four times, and that the reverse mutation from serine to alanine has occurred at least three times, in each case causing the species to be better adapted to the photic conditions in their habitat (Nagai et al. 2011). Rhodopsin proteins have also been shown to adapt to the intensity of light available in the environment in African rift lake cichlids, through a mutation at amino acid site 83 (Sugawara 2010). Aspartic acid is the most common residue at this site in African cichlids, and phylogenetic analyses indicate that there have been at least two mutations to asparagine at site 83, resulting in three species with this residue (Sugawara et al. 2005). This mutation is thought to be an adaptation for dim-light conditions, as it alters the equilibrium between the Meta I and Meta II forms of rhodopsin to favour the active Meta II state (Breikers et al. 2001, Sugawara et al. 2010). All of the African rift lake cichlids with the “dim-light” amino acid at this site (asparagine) inhabit deeper waters than their closest relatives, where less light is available (Sugawara et al. 2005). In both of the above examples, a single amino acid substitution has evolved repeatedly and has caused a measurable phenotypic change which is highly correlated to the organism's habitat, strongly suggesting that the changes are adaptive.

Prior to this study, the only Neotropical cichlid for which opsins had been characterized at the sequence level was the pike cichlid from Trinidad, the geophagine Crenicichla frenata, and very few species from the Neotropics have undergone spectrophotometric analysis (Levine and MacNichol 1979, Wagner and Kroger 2005). C. frenata was chosen for study because it is the major predator of the guppy Poecilia reticulata, and imposes selection on guppy colouration (reviewed in Houde 1997, Magurran 2005). This single species was found to possess only four cone opsins (LWS, RH2a, SWS2a, and SWS2b) compared to the 7 expressed in African cichlids due to a loss of the SWS1 pigment, pseudogenization of the RH2b pigment, and an African-specific duplication of the RH2a pigment into RH2aα and RH2aβ (Weadick et al. 2012). Intriguingly, both the SWS2b and RH1 opsins in C. frenata were found to be under positive selection using likelihood-based codon models of evolution (Weadick et al. 2012). Because the visual systems of Neotropical cichlids and African riverine cichlids are under-explored compared to African rift lake cichlids, it is unclear whether the patterns of opsin reduction and positive selection seen in C. frenata reflect differences in selection pressure between lake and river habitats, differences in evolutionary history between African and Neotropical cichlids, or a species-specific pattern.

1.4 Codon based models of molecular evolution

Genetic variation among species is ultimately caused by mutation, which can be passively distributed by forces such as genetic drift and migration or influenced by natural selection (Pages and Holmes 1998). Natural selection can be categorized into positive selection, where individuals with a particular mutation are favoured, causing the mutation to spread through the population, and purifying selection, where deleterious mutations are selected against and the original state tends to be preserved. These processes leave different patterns of variation in the DNA of extant species over evolutionary time. Models of molecular evolution attempt to mathematically describe processes that contribute to DNA or amino acid sequence variation among species, and by determining which of various models best fits a data set of aligned sequences one can infer which evolutionary processes, i.e. positive, neutral, or purifying selection, likely affected them. The development of simple yet accurate models for sequence evolution is an area of active research in molecular biology, and many commonly used methods are either the subject of intense controversy (see Nozawa et al. 2009a, 2009b, Yang and Reis 2011) or have recently been improved or extended (e.g. Yoshida et al. 2011, Chang et al. 2012, Weadick and Chang 2012). This thesis employs various likelihood-based codon models of molecular evolution to investigate differences in selective constraint on rhodopsin genes among groups of cichlids.

In the process of translation, each amino acid in a protein sequence is coded for by a set of three nucleotides at the DNA level, called a codon. Because there are only 20 amino acids and 64 possible combinations of three nucleotides, this code is degenerate: in most cases, there are several possible codons that will code for the same amino acid (Crick 1968). Some point substitutions at the nucleotide level therefore lead to a change in the amino acid produced, referred to as non-synonymous substitutions, and some do not lead to a change in the resulting amino acid, referred to as synonymous substitutions. Prior to 1994, nucleotide-based (Jukes and Cantor 1969, Felsenstein 1981, Hasegawa et al. 1985) or amino acid-based (Kishino et al. 1990) models were used to model the evolution of protein-coding DNA and protein sequences. The base units in these models (either nucleotides or amino acids) were assumed to evolve independently. In either case, these methods led to an under-use of available data: in nucleotide-based models, the different constraints on synonymous and non-synonymous nucleotide changes were not considered; and amino acid-based models ignored synonymous substitutions entirely (Goldman and Yang 1994). As statistical techniques to assess the accuracy of models of evolution were developed, both types of models were found to be increasingly inadequate (Goldman 1993).
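The distinction between synonymous and non-synonymous substitutions can be illustrated with a short sketch. It is not part of the analyses in this thesis: it assumes the standard genetic code, the function name `classify_substitution` is hypothetical, and the codons are illustrative only (GCC to TCC stands in for an alanine-to-serine change of the kind described at site 292, and is not a codon observed in any cichlid sequence).

```python
# Standard genetic code, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_substitution(codon_before, codon_after):
    """Label a codon change as synonymous or non-synonymous (hypothetical helper)."""
    aa_before = CODON_TABLE[codon_before.upper()]
    aa_after = CODON_TABLE[codon_after.upper()]
    return "synonymous" if aa_before == aa_after else "non-synonymous"


# Illustrative codons only: GCC (Ala) -> GCT (Ala) leaves the protein unchanged,
# whereas GCC (Ala) -> TCC (Ser) changes the encoded amino acid.
print(classify_substitution("GCC", "GCT"))
print(classify_substitution("GCC", "TCC"))
```

Nucleotide models would treat both changes above as a single base difference, and amino acid models would not see the first change at all, which is the loss of information that codon models were designed to avoid.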

Codon based models were introduced to bridge the gap between the two existing types of models: to simultaneously use information available in nucleotide sequences and to take into account effects caused by selection at the amino acid level (Goldman and Yang 1994, Muse and Gaut 1994). They assume that because synonymous substitutions do not affect the amino acid sequence of a protein, they will not be under evolutionary pressure. This assumption can be violated, for example when certain codons are favoured due to translational efficiency (reviewed in Duret 2002) or when certain codons are favoured to facilitate interactions between mRNAs and microRNAs, which affect protein production after transcription (Li et al. 2012). However, as long as such processes affect synonymous and non-synonymous sites equally, this violation should not affect the integrity of the models (Fay and Wu 2003, Yang 2006). If the assumption that synonymous substitutions are selectively neutral holds true or if the violation affects sites equally, the ratio of non-synonymous substitutions per non-synonymous site (dN) to synonymous substitutions per synonymous site (dS) is a useful measure of selection pressure on non-synonymous substitutions (ω = dN/dS, Kimura 1983). Under positive selection, non-synonymous substitutions are promoted by natural selection, leading to an increase in non-synonymous substitutions relative to synonymous substitutions (ω > 1); conversely, if a sequence is under purifying selection, non-synonymous substitutions will be eliminated or reduced in frequency, and the number of substitutions at non-synonymous sites will be low compared to substitutions at synonymous sites (ω < 1). Sequences where non-synonymous substitutions are not under selection are expected to have an ω approximately equal to one (Yang and Bielawski 2000, Nielsen and Yang 1998, Suzuki and Gojobori 1999, Hurst 2002).
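As a rough illustration of how ω relates to the three selective regimes, the sketch below computes a crude dN/dS from raw counts. This is an assumption-laden simplification: it applies no correction for multiple substitutions at a site, the function names and the tolerance around one are hypothetical, and the counts are invented. It is not the likelihood-based estimation used in this thesis.

```python
def omega_from_counts(nonsyn_subs, nonsyn_sites, syn_subs, syn_sites):
    """Crude dN/dS from raw counts, with no multiple-hit correction (assumption)."""
    d_n = nonsyn_subs / nonsyn_sites
    d_s = syn_subs / syn_sites
    if d_s == 0:
        raise ValueError("dS is zero; omega is undefined")
    return d_n / d_s


def selection_regime(omega, tolerance=0.05):
    """Map omega onto the three regimes described above; the tolerance is arbitrary."""
    if omega > 1 + tolerance:
        return "positive selection"
    if omega < 1 - tolerance:
        return "purifying selection"
    return "approximately neutral"


# Invented counts for illustration only.
omega = omega_from_counts(nonsyn_subs=6, nonsyn_sites=300, syn_subs=10, syn_sites=100)
print(round(omega, 3), selection_regime(omega))
```

In practice, whether an estimated ω differs from one is assessed with a likelihood ratio test rather than a fixed tolerance.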

This thesis uses the codeml program from the PAML software package, which is designed to detect signatures of positive selection in protein-coding DNA sequences (Yang 2007). Codeml includes various models which make different assumptions about the value of ω and its distribution across the phylogeny and/or amino acid sequence. Given an evolutionary model and a phylogenetic hypothesis, the program calculates a likelihood value that describes the overall fit of the model to the DNA alignment. Nested models (i.e. pairs of models such that the null model is equivalent to the alternative model when a single constraint is applied to the alternative model) can be compared via a likelihood ratio test to determine if the alternative model is a significantly better fit (Hulsenbeck and Rannala 1997). Codeml also provides estimates of various parameters relevant to the model chosen, most importantly the average value of ω (which may be estimated separately in different regions of the phylogeny or in classes of amino acid sites depending on the model).
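The likelihood ratio test for a pair of nested models can be sketched as follows. This is a minimal illustration, assuming the test statistic (twice the difference in log-likelihood) is compared to a χ² distribution; the function names are hypothetical, only one or two degrees of freedom are handled (using closed-form tail probabilities), and the log-likelihood values are invented.

```python
import math


def chi2_sf(statistic, df):
    """Upper-tail probability of a chi-square distribution for df = 1 or 2 only."""
    if statistic < 0:
        raise ValueError("statistic must be non-negative")
    if df == 1:
        return math.erfc(math.sqrt(statistic / 2.0))
    if df == 2:
        return math.exp(-statistic / 2.0)
    raise ValueError("this sketch only handles 1 or 2 degrees of freedom")


def likelihood_ratio_test(lnl_null, lnl_alt, df):
    """Return (2 * difference in lnL, p-value) for a pair of nested models."""
    statistic = 2.0 * (lnl_alt - lnl_null)
    statistic = max(statistic, 0.0)  # guard against small negative values from optimisation noise
    return statistic, chi2_sf(statistic, df)


# Hypothetical log-likelihoods, not results from this thesis.
stat, p_value = likelihood_ratio_test(lnl_null=-1005.0, lnl_alt=-1000.0, df=2)
print(stat, p_value)
```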

The simplest models in the codeml package are the site models, which are used to determine whether some sites in an amino acid sequence are undergoing positive selection in an otherwise neutrally or conservatively evolving background (Nielsen and Yang 1998, Yang et al. 2000). There are two tests which are commonly used to identify the presence of sites under positive selection: the M1a/M2a test and the M7/M8 test. Both tests compare the relative fit of a null model, which allows for classes of amino acid sites under neutral and purifying selection, to the fit of a model that allows for a class of sites to be under positive selection in addition to neutral and purifying selection classes. Codeml estimates the proportion of sites that belong to each class, as well as the average ω within each class. The M1a or neutral model incorporates two site classes: one where 0 < ω < 1 and one where ω = 1. This model is compared to the M2a selection model, which adds a third site class where ω > 1. Because M2a can be constrained to be equivalent to M1a if the proportion of sites in the third class is equal to zero, an LRT can be used to compare the relative fit of the two models. The M7/M8 test is slightly more complex. M7, or the beta model, assumes that ω follows a beta distribution restricted between 0 and 1. M8, or the beta + ω model, assumes that ω follows a beta distribution plus an additional category where ω > 1 (Yang 2006). The beta distribution can take on a variety of shapes depending on its parameters, p and q, yielding a flexible model that can adapt to many biological situations. Similarly to the M1a/M2a comparison, these models can be compared via an LRT.
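The flexibility of the beta distribution can be seen by evaluating its density for a few parameter pairs. The sketch below is illustrative only: the parameter values are arbitrary rather than estimates from any data set, and the function names are hypothetical.

```python
import math


def beta_pdf(x, p, q):
    """Density of the beta(p, q) distribution at 0 < x < 1."""
    log_norm = math.lgamma(p + q) - math.lgamma(p) - math.lgamma(q)
    return math.exp(log_norm + (p - 1) * math.log(x) + (q - 1) * math.log(1 - x))


def beta_mean(p, q):
    """Mean of the beta(p, q) distribution."""
    return p / (p + q)


# Arbitrary parameter pairs chosen to show different shapes:
# U-shaped (0.5, 0.5), flat (1, 1), and skewed towards low omega (2, 5).
for p, q in [(0.5, 0.5), (1.0, 1.0), (2.0, 5.0)]:
    densities = [round(beta_pdf(x, p, q), 3) for x in (0.1, 0.5, 0.9)]
    print((p, q), round(beta_mean(p, q), 3), densities)
```

A distribution skewed towards zero corresponds to a gene in which most sites are under strong purifying selection, while a U-shaped distribution corresponds to a mixture of strongly constrained and nearly neutral sites.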

Branch models compare a model in which ω is allowed to differ between a pre-defined foreground branch and the rest of the phylogeny (the background) with a model that estimates a single value of ω across all branches. This allows for detection of changes in average selection pressure in a particular lineage (Yang 1998, Yang and Nielsen 1998).

The branch-site models combine elements of site models and branch models, simultaneously detecting natural selection at particular residues on particular branches. They were introduced by Yang and Nielsen (2002), and subsequently improved by Zhang et al. (2005). Like the branch models, the branch-site models detect positive selection on pre-defined foreground lineages but also allow for variation in ω among amino acid sites. The alternative model for this test allows for four classes of sites: one where 0 < ω < 1 in all branches, one where ω = 1 in all branches, one where 0 < ω < 1 in the background but ω > 1 in the foreground, and one where ω = 1 in the background but ω > 1 in the foreground. This is compared to a null model where the value of ω in the foreground is constrained to be equal to one in all classes.

The clade models were designed to detect whether a gene is under different selective pressure in each of two clades (Bielawski and Yang 2004), and were later extended to consider multiple clades (Yoshida et al. 2011). The alternative model for the most commonly used clade model, Clade model C (CmC), employs three site classes: one class under purifying selection (0 < ω < 1) in all lineages, one under neutral selection (ω = 1) in all lineages, and a third site class where ω is under no constraint, and estimated separately in each clade. This allows for the detection of amino acid sites that are under divergent selective pressure in the clades pre-defined by the user. The CmC alternative models were originally compared to the M1a (neutral) model from the site models, which allows for only two site classes (one under neutral selection and the other under purifying selection). However, a recent study has shown this test to have unacceptable false positive rates, likely due to a confounding factor: because the CmC alternative model has 3 site classes while the M1a has only 2, the CmC model is better able to deal with among-site variation in ω and will therefore be a better fit to the data whether or not divergent selection has occurred among clades (Weadick and Chang 2012). The authors proposed and tested the performance of a modified null model (M1a rel), which applies the single constraint to the CmC model that the estimated ω for the divergent site class must be equal in all clades. This null model was used in all CmC analyses in this thesis.

CmC analysis results can be further tested to determine if the value of ω in the divergent class of each clade is significantly different from one. This is done by constraining the value of the “divergent” ω to be equal to one in each clade in turn, and testing whether the alternative model allowing for the divergent ω to take on any value is a significantly better fit than the model where its value is constrained (Chang et al. 2012).

After employing one of the models described above, the Bayes empirical Bayes (BEB) approach can be used to estimate which amino acid sites fall into the positively or divergently selected class (Yang et al. 2005). This allows the specific residues that are under positive selection (in the site and branch-site models) or divergent selection between clades (in CmC) to be identified.
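The core of assigning a site to a class is Bayes' rule applied over the site classes. The sketch below shows the simpler naive empirical Bayes calculation, which plugs in point estimates of the class proportions; the BEB approach additionally accounts for uncertainty in the parameter estimates (Yang et al. 2005) and is not reproduced here. The function name, the class proportions, and the per-class site likelihoods are all hypothetical.

```python
def site_class_posteriors(class_proportions, site_likelihoods):
    """Posterior probability of each site class for one site, by Bayes' rule.

    class_proportions: estimated proportion of sites in each class (the prior).
    site_likelihoods: likelihood of the data at this site under each class.
    """
    joint = [p * l for p, l in zip(class_proportions, site_likelihoods)]
    total = sum(joint)
    return [value / total for value in joint]


# Hypothetical three-class model (purifying, neutral, positive) and invented likelihoods.
posteriors = site_class_posteriors([0.80, 0.15, 0.05], [1e-6, 4e-6, 9e-5])
print([round(p, 3) for p in posteriors])
```

A site is usually reported as belonging to the selected class only when its posterior probability for that class exceeds a chosen cut-off.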

1.5 Objectives

This thesis aims to investigate the molecular evolution of rhodopsin in Neotropical cichlids using species from the tribe Geophagini as a model. This line of research has two major objectives: 1) to compare patterns of selective constraint on rhodopsin between Neotropical cichlids and African cichlids, in which evolution of opsin proteins in general and rhodopsin in particular has contributed to diversification between species, and 2) to provide a basis for further investigation into the evolution of visual systems in Neotropical cichlids by determining whether patterns of selective constraint vary within the geophagine cichlids.

1.5.1 Objective 1: Investigating differences in selective constraint between Neotropical and African dim-light visual pigments

As described in this introduction, the visual systems of African cichlids have been thoroughly studied and adaptive evolution has occurred in the rhodopsin gene in several cases (Spady et al. 2005, Sugawara et al. 2005, 2010). Although there have been no studies of a visual pigment in a phylogenetic context in the Neotropical cichlids, there is evidence for positive selection in the rhodopsin gene of the only Neotropical cichlid for which the gene has been characterized (Weadick et al. 2012). This project has five sub-objectives: 1) To determine if there are on average differences in selective constraint between Neotropical cichlids (represented by the geophagines) and African cichlids, 2) To determine if there are on average differences in selective constraint between riverine cichlids (with Neotropical and African representatives included) and lake cichlids, 3) To determine if there are differences in selective constraint among three separate clades: Neotropical cichlids, African riverine cichlids, and African rift lake cichlids, 4) To determine which amino acid sites in the rhodopsin gene are affected by differences in selective constraint among clades, and to speculate on what effects substitutions at these sites may have on rhodopsin function, and 5) To compare the results derived from two different likelihood-based codon models of molecular evolution, the branch-site models and Clade model C.

1.5.2 Objective 2: Investigating differences in selective constraint within geophagine cichlid dim-light visual pigments

Geophagine cichlids are extraordinarily diverse in terms of morphology, ecology, and reproductive mode (Barlow 2000, Wimberger et al. 1998, López-Fernández et al. 2012), and occur in a variety of habitats (Reis et al. 2003) with different photic properties (Sioli 1984). Differences in both visual requirements (Sabbah et al. 2010) and photic environment (Bowmaker 1995) may cause divergent selective pressures on visual system genes, including rhodopsin. This project has three sub-objectives: 1) To determine if there are differences in selective constraint between the two major clades of geophagine cichlids, 2) To determine which amino acid sites in the rhodopsin gene are affected by differences in selective constraint, and to speculate on what effects substitutions at these sites may have on rhodopsin function, and 3) To provide a second system for comparing the results of Clade model C and branch-site analyses.

1.6 Figures

Figure 1.1: 3D images of dark-state and active-state rhodopsin. Panel A shows dark-state rhodopsin (PDB ID 1U19), panel B shows active-state rhodopsin (PDB ID 2PX0). The retinal chromophore is shown in red.

Chapter 2

Molecular Evolution of Dim-light Visual Pigments in Neotropical Geophagine Cichlids: Evidence for differences in selective constraint in comparison with African cichlids

2.1 Abstract

Neotropical cichlid fishes are highly diverse and occupy a wide range of environments. Evolution of visual pigments has been important in the speciation and diversification of their sister group, the African rift lake cichlids, but relatively little is known of Neotropical cichlid visual systems. We sequenced the rhodopsin gene from 28 species of the highly diverse Geophagini clade of cichlids from South America and 3 basal Neotropical cichlids, and combined them with an available Geophagini cichlid sequence to provide the first comparative study of a visual protein between the well-studied African clade and their Neotropical sister group. Using a combination of likelihood-based codon models of evolution including site models, branch-site models, and clade models, we investigated differences in selective constraint in rhodopsin among the Geophagini tribe of Neotropical cichlids, African rift lake cichlids, and African riverine cichlids. We report evidence for significant positive selection in Neotropical cichlid rhodopsins. We also found evidence of positive selection in African rift lake cichlid rhodopsins, a finding consistent with previous studies, but no evidence of positive selection in African riverine cichlid rhodopsins. Clade based analyses indicated that selection pressures are divergent among these three groups, and site models indicated that the amino acid sites under positive selection in African rift lake and Neotropical cichlids are largely non-overlapping, strongly suggesting that selection pressures on rhodopsin are indeed divergent between these clades. Based on prior studies of rhodopsin structure and function, we hypothesize that substitutions at divergently and positively selected sites may be influencing non-spectral properties of rhodopsin function. Our analyses include a direct comparison of two methods for inferring functional divergence among genes: the branch-site method, which detects amino acid sites that are under positive selection in a particular clade or lineage in an otherwise neutrally evolving background; and the Clade model C method, which detects amino acid sites that are under different selection regimes in each clade.

2.2 Introduction

Aquatic organisms contend with complex photic environments, where incident brightness, depth, and water chemistry affect the type of light available for vision (Lythgoe 1979). In fish species, visual ability is often correlated with properties of the photic environment, suggesting that the photic environment imposes selective pressure on visual systems (Bowmaker 1995). The cichlid fishes of South and Central America are ubiquitous throughout the ecologically varied riverine habitats of the Neotropics (Reis et al. 2003) and have diverse life histories (López-Fernández et al. 2012). Although the evolution of visual systems has been important in the diversification of their sister group, the African cichlids (e.g. Spady et al. 2005, Carleton et al. 2010), there is very little known about visual systems in Neotropical cichlid taxa (Weadick et al. 2012) and no comparisons between African and Neotropical clades have been attempted.

Vision is mediated by the visual pigments, which consist of a light-absorbing chromophore (retinal) covalently bound to an opsin protein (Wald 1968), a member of the G protein-coupled receptor (GPCR) superfamily (Hofmann et al. 2009). Absorption of a photon by the retinal causes it to isomerize from the dark-state 11-cis-retinal to all-trans-retinal, resulting in a series of conformational changes in the opsin protein that leads to the Meta II state, which binds to and activates the G protein transducin (Hoffmann et al. 2008, Smith 2010). Activation of transducin initiates a signal transduction cascade within the cell, resulting in a reduction in neurotransmitter release which affects neural signals to the brain (Yau and Hardie 2009). The bond between the chromophore and opsin is subsequently hydrolyzed, all-trans-retinal migrates out of the protein, and the visual pigment is regenerated with re-constituted 11-cis-retinal (Menon et al. 2001, Sakmar et al. 2002, Yau and Hardie 2009). There are five major classes of opsins in vertebrates, each of which absorbs a characteristic wavelength of light: the four cone opsins (LWS, RH2, SWS1, and SWS2) mediate bright light vision, and a single class of rod opsin (rhodopsin or RH1) mediates dim-light vision (Bowmaker 2008). One or more classes have been lost in some clades, and gene duplication within classes is relatively common, especially in teleosts (Bowmaker 2008).

Neotropical cichlids make up a monophyletic clade that is sister to the African cichlids (Stiassny 1991; Farias et al. 2000; Sparks and Smith 2004), including the African rift lake cichlids which are well known for their rapid speciation and diversification (reviewed in Seehausen et al. 2006, Kocher 2004). The African rift lake cichlids have an unusually large complement of opsin proteins, with up to 8 functional opsins expressed in the retina of a single individual over the course of its lifespan (Spady et al. 2006), and natural selection on opsin proteins has been implicated in diversification between species: for example, rhodopsin has repeatedly evolved to complement the photic environment by tuning the peak wavelength absorbed to shorter wavelengths in blue-shifted environments (e.g. Sugawara et al. 2005, Nagai et al. 2011), by responding to natural selection imposed by water turbidity (Spady et al. 2005), or by becoming more responsive to low levels of light in dim-light environments (Sugawara et al. 2010).

Neotropical cichlids are less speciose than the African rift lake cichlids, but are also characterized by high levels of morphological, ecological, and reproductive diversity (Barlow 2000, Wimberger et al. 1998, López-Fernández et al. 2012). This diversity is well represented by the tribe Geophagini: within 18 genera, this clade includes piscivorous species, substrate sifters, and water-column feeders, as well as species that mouth brood their young (López-Fernández et al. 2005). Prior to this study, the only Neotropical cichlid in which visual pigment genes had been sequenced was the geophagine Crenicichla frenata, due to its relevance as a guppy predator (reviewed in Houde 1997, Magurran 2005). C. frenata was found to express only five opsins compared to the 8 expressed in African cichlids, and both rhodopsin and one cone opsin were found to be under positive selection (Weadick et al. 2012). Because the visual systems of Neotropical cichlids and African riverine cichlids are under-explored compared to those of African rift lake cichlids, it is unclear whether the patterns of opsin reduction and positive selection seen in C. frenata are due to differences in selection pressure between lake and river habitats, differences in evolutionary history between African and Neotropical cichlids, or a species-specific pattern.

To begin clarifying the differences in evolutionary history between African and Neotropical cichlid visual pigments, we sequenced the gene for the rhodopsin protein (RH1) in 31 species of Neotropical cichlids and two African species, and compared them to available sequences for African cichlids and C. frenata. We hypothesize that the differences in biogeographic history and evolutionary processes among Neotropical riverine cichlids, African riverine cichlids, and African rift lake cichlids have resulted in divergent selective pressure on the rhodopsin gene. We used codon based models of molecular evolution to compare patterns of selective constraint among these groups, using the popular branch-site models as well as the less widely used clade models as implemented in PAML v.4.5 (Yang 2007). We incorporated newly developed multi-clade models (Yoshida et al. 2011) and recently implemented improvements to existing models (Weadick and Chang 2012; Chang et al. 2012) in our analysis, and compared the results from the various methods. To our knowledge, this is the first study of a Neotropical cichlid visual pigment spanning an entire clade, and it provides the first comparative study of a visual protein between the well-studied African clade and their poorly known Neotropical sister group in a broad phylogenetic context.

2.3 Methods

2.3.1 Samples and Sequences

A 756 bp fragment (representing 73% of the gene, including the seven transmembrane helices) of RH1 was amplified from 1-3 individuals from each of 33 species, depending on the number of tissue samples available. This included at least one species from each genus in the tribe Geophagini except Acarichthys, three Neotropical species basal to Geophagini (Retroculus xinguensis, Cichla temensis, and Chaetobranchus flavescens), and the basal African riverine cichlids Heterochromis multidens and guntheri (López-Fernández et al. 2010).

Tissue samples (muscle or fin) were obtained from the collection at the Royal Ontario Museum. DNA was extracted using standard phenol/chloroform extraction protocols and amplified using the primers PminRH1F (GCGCCTACATGTTCTTCCT) and Rh1039R (TGCTTGTTCATGCAGATGTAGA) (Chen et al. 2003). PCR was performed using standard cycling conditions. Fragments were visualized on agarose gels and extracted using a QIAquick Gel Extraction Kit (QIAGEN). Fragments were cloned into the pJET 1.2 cloning vector (Fermentas), cultured in liquid media, and miniprepped using the GeneJET Plasmid Miniprep Kit (Fermentas). Three to four clones were sequenced per individual. Sequencing was performed in the forward and reverse directions using a 3730 Analyzer (Applied Biosystems).

Sequences were assembled, then manually trimmed and edited in Sequencher 5.0.4.9 (Genecodes) to produce a consensus sequence for each species. Additional sequences were downloaded from GenBank and include all RH1 sequences available from African riverine cichlids (nine species) as well as representatives from Lakes Malawi, Tanganyika, and Victoria (16 species). Sequences were aligned using Clustal W (Thompson et al. 1994) as implemented in MEGA 5.0 (Tamura et al. 2011) and manually verified to ensure an open reading frame. Species list and accession numbers for all sequences used in the study are provided in Supplementary Table 3.1.

2.3.2 Tree building

A maximum likelihood tree for all RH1 sequences was constructed using RAxML-III (Stamatakis et al. 2005) with the GTR + γ nucleotide substitution model, selected based on AIC comparisons carried out in Findmodel, a web implementation of the program MODELTEST (Posada and Crandall 1998). To avoid local optima, 50 trees were created independently from the same data. The three most likely trees were each bootstrapped with 1000 replicates and summarized using RAxML-III. Branches with less than 20% bootstrap support were collapsed. All trees had the same topology after this step. This tree placed the African cichlid Heterochromis multidens at the base of the Neotropical cichlid assemblage, and the Neotropical species Retroculus xinguensis at the base of the African cichlid assemblage, which is contrary to molecular (Farias et al. 1999, Sparks and Smith 2004, Smith et al. 2008, López-Fernández et al. 2010) and total evidence analyses (Farias et al. 2000, 2001) that consistently resolve Neotropical and African cichlids as monophyletic sister clades. Although much less resolved, all other relationships were consistent with previously published trees of Neotropical cichlids (López-Fernández et al. 2010), suggesting that there is phylogenetically informative data in the RH1 sequences we obtained.

Our study focuses on the evolution of RH1 in the context of biogeographical differences among Neotropical cichlids, African riverine cichlids, and African rift lake cichlids, and we assume that the phylogenetic misplacement of these species is an artefact of our single gene data set. We therefore used Mesquite to switch the basal branches on each clade to reflect the widely accepted reciprocal monophyly of Neotropical and African cichlid assemblages.

All analyses presented here use this modified tree. All analyses were repeated on a tree with the two misplaced taxa removed. Results from these additional analyses are included as supplementary data (Supplementary Tables 2.2 and 2.3), and do not change the conclusions presented here. The tree used in this study is shown in Figure 2.1.

2.3.3 Testing for selection

Patterns of selection in RH1 sequences were analyzed using the maximum likelihood framework of PAML v.4 (Yang 2007). These analyses estimate the ratio of non-synonymous substitutions per non-synonymous site to the synonymous substitutions per synonymous site (dN/dS or ω) (Yang and Bielawski 2000). Neutrally evolving sequences are expected to accumulate non-synonymous substitutions at the same rate as synonymous substitutions, resulting in an ω value of approximately one. Values of ω greater than one indicate positive selection (non-synonymous substitutions are accumulating faster than synonymous substitutions), and values of ω less than one indicate purifying selection (non-synonymous substitutions are selected against and therefore accumulate at a slower rate than synonymous substitutions) (Nielsen and Yang 1998, Suzuki and Gojobori 1999, Hurst 2002).

Site models

Tests based on comparisons between models M1a/M2a and M7/M8 from the site models in the codeml package of PAML were used to identify codons under positive selection in separate alignments of African cichlids and Neotropical cichlids, and M0 was used to estimate the average ω in each alignment (Nielsen and Yang 1998; Yang et al. 2000). M0 assumes all sites evolve under the same selective pressure, and estimates a single ω value for each alignment. M1a assumes two classes of sites, under purifying and neutral selection respectively (0 < ω < 1 and ω = 1), and is compared to M2a, which adds an additional class of sites under positive selection (ω > 1). M7 allows ω to vary continuously between 0 and 1 according to a beta distribution, and is compared to M8, which adds an additional class of sites under positive selection (ω > 1). Model M8a was applied to test if the ω value estimated to be under positive selection in M8 is significantly greater than one. All analyses were run starting with the branch lengths estimated by RAxML and repeated four times with varying initial starting points of κ and ω. The model pairs M1a-M2a and M7-M8 were compared using a likelihood ratio test (LRT) with a χ² distribution and two d.f., model pair M8a-M8 was compared with one d.f. (Wong et al. 2004), and sites under positive selection were identified by the Bayes Empirical Bayes (BEB) posterior probabilities (Yang et al. 2005).
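One way such a set of runs could be organised is sketched below: a small helper that writes the text of a codeml control file for each site model and starting point. This is not the script used in this study. The helper name, file names, and starting values are hypothetical; the option names follow the codeml control file as documented in the PAML manual; and the `CodonFreq` and `fix_blength` values are assumptions, since the text does not state them.

```python
# (NSsites value, whether omega is held fixed) for each site model used here.
SITE_MODELS = {
    "M0": (0, False), "M1a": (1, False), "M2a": (2, False),
    "M7": (7, False), "M8": (8, False), "M8a": (8, True),
}


def codeml_site_control(model_name, kappa, omega,
                        seqfile="rh1_alignment.phy", treefile="rh1_tree.nwk"):
    """Build control-file text for one codeml site-model run (hypothetical helper)."""
    ns_sites, fix_omega = SITE_MODELS[model_name]
    if fix_omega:
        omega = 1.0  # M8a: the additional site class is constrained to omega = 1
    settings = [
        ("seqfile", seqfile),      # placeholder alignment file name
        ("treefile", treefile),    # placeholder tree file name
        ("outfile", f"{model_name}_k{kappa}_w{omega}.out"),
        ("runmode", 0),            # user-supplied tree
        ("seqtype", 1),            # codon sequences
        ("CodonFreq", 2),          # F3x4; an assumption, not stated in the text
        ("model", 0),              # one omega ratio across branches
        ("NSsites", ns_sites),
        ("fix_blength", 1),        # assumed: start from branch lengths in the tree file
        ("fix_kappa", 0),
        ("kappa", kappa),          # initial value of kappa
        ("fix_omega", int(fix_omega)),
        ("omega", omega),          # initial (or fixed) value of omega
    ]
    return "\n".join(f"{key} = {value}" for key, value in settings) + "\n"


# Four illustrative (kappa, omega) starting points per model; the values are invented.
starts = [(1.0, 0.5), (2.0, 1.0), (3.0, 2.0), (4.0, 0.1)]
control_files = {
    (model, kappa, omega): codeml_site_control(model, kappa, omega)
    for model in SITE_MODELS
    for kappa, omega in starts
}
print(len(control_files))
```

Generating the control files programmatically keeps the starting values for each replicate explicit, which makes it easier to confirm that replicate runs converged on the same likelihood.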

Clade Model C

Clade Model C (CmC) (Bielawski and Yang 2004) was used to test whether ω is divergent among major cichlid clades, using an alignment including both African and Neotropical cichlids. CmC assumes that some sites evolve conservatively across the phylogeny (allowing for one site class where 0 < ω < 1 and one where ω = 1), while other sites are free to evolve differently among clades (a single site class where ω can take on different values, ω2 and ω3, in each clade). CmC models were recently extended to allow for more than two clades (Yoshida et al. 2011), allowing us to define clades in three different ways to address different aspects of the evolutionary history of cichlids: 1) African vs. Neotropical cichlids, 2) Lake cichlids vs. river cichlids, and 3) A model with three partitions: African lake cichlids, Neotropical river cichlids, and African river cichlids. All analyses included an additional outgroup partition containing the Indian cichlid Etroplus maculatus.

The null model for these analyses was created using the methods of Weadick and Chang (2012), which apply a constraint to the CmC so that the value of ω in the divergent site class no longer varies among clades. The LRT using this model has a significantly lower false positive rate than previous tests, which compared the divergent model to the M1a model. All models were run starting with the branch lengths from RAxML and a κ value of two. CmC analyses are prone to local optima (Bielawski and Yang 2004, Weadick and Chang 2012), so all models were run 20 times with varying initial ω values. In each set, the three runs with the highest maximum likelihood scores were re-run using random starting branch lengths, and the highest likelihood value overall is reported. Likelihood Ratio Tests (LRTs) were performed between each pair of corresponding alternative and null models with two d.f. (Weadick and Chang 2012). Sites in the divergently selected class were identified by the Bayes Empirical Bayes (BEB) posterior probabilities, which identify residues that are likely to be in the divergently selected site class (Yang et al. 2005).
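The strategy for avoiding local optima amounts to a two-pass selection over repeated runs, which can be sketched as follows. The helper name and all log-likelihood values are invented for illustration, and only five first-pass runs are shown for brevity.

```python
def best_runs(runs, n_best=3):
    """Return the n_best runs with the highest log-likelihood.

    runs: list of (initial_omega, log_likelihood) pairs from repeated fits.
    """
    return sorted(runs, key=lambda run: run[1], reverse=True)[:n_best]


# Invented log-likelihoods from repeated fits of one model with different starting omegas.
first_pass = [(0.1, -2310.42), (0.5, -2308.97), (1.0, -2308.97), (2.0, -2312.80), (5.0, -2309.15)]
shortlist = best_runs(first_pass)

# Invented results of re-running the shortlisted starts from random branch lengths.
second_pass = [(0.5, -2308.95), (1.0, -2308.97), (5.0, -2308.96)]
reported = max(lnl for _, lnl in first_pass + second_pass)
print(shortlist, reported)
```

Runs that stop at clearly lower likelihoods than the rest, such as the fourth run above, are the signature of a local optimum and are discarded by this procedure.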

The models in all statistically significant LRTs were further analyzed to test if the ω value in the divergent class was significantly different from one. This was done by specifying (fix_omega = 1, omega = 1) in the control file, which has the result of constraining ω in the branches labelled with the highest number to be equal to one. LRTs were performed between the original model and this constrained model with two d.f., as recommended by the authors (Chang et al. 2012).

Branch-site models

Branch-site models were employed to test for positive selection in particular lineages (Zhang et al. 2005). These models allow for ω to vary among amino acid sites and between “foreground” and “background” branch types specified by the user, based on a priori hypotheses of where adaptive evolution may have occurred. These models include four site classes: 1) 0 < ω0 < 1 in all branches; 2) ω1 = 1 in all branches; 3) ω2 > 1 in the foreground and 0 < ω0 < 1 in the background; and 4) ω2 > 1 in the foreground and ω1 = 1 in the background. These models were used to determine if significant differences in selection among clades highlighted by the CmC models are driven by a burst of selection in the lineage leading to each of the main clades. Three analyses were conducted, with 1) the lineage leading to all African cichlids designated as the foreground, 2) the lineage leading to all Neotropical cichlids designated as the foreground, and 3) the lineage leading to all African lake cichlids designated as the foreground. Some studies have used branch-site models to highlight multiple lineages or entire clades (Spady et al. 2005, Ramm et al. 2008, Yoshida et al. 2011), and although this method can lose power if selection pressures are different among foreground branches (Zhang et al. 2005), we performed two tests to compare to our Clade model results: 1) with all Neotropical cichlid lineages as the foreground, to compare to our African vs. Neotropical clade model, and 2) with all African cichlids as the foreground, to compare to our Lakes vs. Rivers clade model.

The branch-site models were compared to a null model where ω2 is constrained to be equal to one. To avoid local optima, each analysis was run 11 times with the initial value of κ ranging from 0 to 5 in increments of 0.5. LRTs between models were performed with 2 d.f.
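The multiple-start strategy can be sketched in a few lines of Python. The grid of 11 initial κ values follows the text. The log-likelihood values below are invented placeholders that stand in for the output of real runs, and the function names are hypothetical.

```python
def kappa_starts(low=0.0, high=5.0, step=0.5):
    """Initial kappa values from low to high (inclusive) in fixed increments."""
    n = int(round((high - low) / step)) + 1
    return [low + i * step for i in range(n)]

def best_runs(results, k=1):
    """Return the k (start, lnL) pairs with the highest log-likelihood."""
    return sorted(results, key=lambda run: run[1], reverse=True)[:k]

starts = kappa_starts()

# Placeholder log-likelihoods, invented for illustration only; in practice
# each value would come from one model run started at that kappa.
placeholder_lnl = [-2140.0 - abs(start - 2.5) for start in starts]

results = list(zip(starts, placeholder_lnl))
best_start, best_lnl = best_runs(results)[0]
print(len(starts), best_start, best_lnl)
```

The same `best_runs` helper with `k=3` would pick out the three best runs for re-running, as was done for the CmC analyses.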

Location of positive selection

We used the Bayes Empirical Bayes (BEB) method to determine which sites in the amino acid sequence are under positive selection in the rhodopsins of Neotropical and African cichlids, respectively. Sites estimated to be in the positively (or divergently) selected site classes were mapped onto the dark-state (Palczewski et al. 2000) and light-activated (Park et al. 2008) 3D structures of rhodopsin (PDB accession numbers 1U19 and 3DQB, respectively) using PyMOL v. 1.5.0.4 (DeLano 2002). Bovine rhodopsin numbering is used throughout.
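Mapping sites onto a structure in PyMOL comes down to building a residue selection. The sketch below only constructs the selection string, so it runs without PyMOL installed. The chain identifier "A" is an assumption, and the example sites are the African dimerization-interface sites listed in Table 2.2.

```python
def pymol_resi_selection(sites, chain="A"):
    """Build a PyMOL selection string for a set of residue numbers."""
    resi = "+".join(str(site) for site in sorted(set(sites)))
    return f"chain {chain} and resi {resi}"

# Bovine rhodopsin numbering; chain "A" is assumed for illustration.
dimer_interface_sites = [162, 163, 165, 166, 213, 218]
selection = pymol_resi_selection(dimer_interface_sites)
print(selection)

# Inside a PyMOL session the string could then be used as, for example:
#   cmd.select("beb_sites", selection)
#   cmd.show("spheres", "beb_sites")
```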

2.4 Results

2.4.1 Molecular dataset

Our alignment did not contain any stop codons, and all sequences had characteristics integral to rhodopsin function such as lysine at site 296. A total of 214 nucleotide sites were variable in our dataset, with 149 variable sites in the Neotropical cichlids, 71 in the African riverine cichlids, and 55 in the African lake cichlids. At the amino acid level, 105 amino acids varied among Neotropical cichlids, 58 in the African riverine cichlids, and 41 in the African rift lake cichlids.
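The two checks described above, screening for stop codons and counting variable sites, can be sketched as follows. The three-sequence alignment is a toy example and not data from this study. Applied to translated sequences, the same counting function gives the number of variable amino acid sites.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def has_in_frame_stop(seq):
    """Return True if any complete in-frame codon is a stop codon."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return any(seq[i:i + 3] in STOP_CODONS for i in range(0, usable, 3))

def count_variable_sites(seqs):
    """Count alignment columns that contain more than one distinct character."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)

# Toy alignment for illustration only.
toy_alignment = ["ATGAACGGC", "ATGAATGGC", "ATGAACGGA"]
n_variable = count_variable_sites(toy_alignment)
any_stops = any(has_in_frame_stop(s) for s in toy_alignment)
print(n_variable, any_stops)
```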

2.4.2 Site models

We used the site models in PAML v4.5 (Yang, 2007) on separate alignments of RH1 from African and Neotropical cichlids to determine which amino acid sites are under positive selection in each group. We found strong evidence for positive selection in both groups using both the M1/M2 test and the M7/M8 test (p < 0.0001 in all tests). 4-5% of sites were estimated to be under positive selection in both the Neotropical cichlids and the African cichlids, with an average ω of 4.05 (M8) to 4.17 (M2) in Neotropical cichlids and 6.4 (M8) to 6.9 (M2) in African cichlids. These values are all significantly greater than one (p < 0.001 for all M8/M8a tests) (Table 2.1). The BEB sites highlighted by the M8 and M2 tests were consistent (Supplementary Table 2.4). Interestingly, the BEB sites in these two groups are largely non-overlapping, with 14 positively selected sites in Neotropical cichlids and 9 positively selected sites in African cichlids, only two of which are common to both analyses (Table 2.2).
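For reference, the likelihood ratio tests reported here can be reproduced from the log-likelihoods in Table 2.1. The sketch below uses only the Python standard library and supports the chi-square distribution with one or two d.f. It is applied to the African M1 vs. M2 comparison (lnL = -2161.9 and -2133.3).

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution with 1 or 2 d.f."""
    if x <= 0:
        return 1.0
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("only df = 1 or df = 2 are supported in this sketch")

def likelihood_ratio_test(lnl_null, lnl_alt, df):
    """Return the LRT statistic 2 * (lnL_alt - lnL_null) and its p value."""
    statistic = 2.0 * (lnl_alt - lnl_null)
    return statistic, chi2_sf(statistic, df)

# African cichlids, M1 (null) vs. M2 (alternative), values from Table 2.1.
statistic, p_value = likelihood_ratio_test(-2161.9, -2133.3, df=2)
print(round(statistic, 1), p_value)
```

The statistic matches the value of 57.2 given in Table 2.1, and the p value is far below 0.0001.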

2.4.3 Clade model C

We used Clade Model C in PAML v. 4.5 (Bielawski and Yang 2004) on our entire data set to determine if there is divergent selection between ecologically and geographically distinct cichlid lineages, using the newly implemented multi-clade models (Yoshida et al. 2011), a newly derived null model (Weadick and Chang 2012), and a new method to determine if omega values in the divergent site class are significantly different from one (Chang et al. 2012). We partitioned our data to reflect three hypotheses about which phylogenetic groups may have divergent selection pressure on their opsins: 1) Neotropical cichlids vs. African cichlids; 2) lake cichlids vs. riverine cichlids (including Neotropical and African representatives); and 3) a three-way test between Neotropical cichlids, African lake cichlids, and African riverine cichlids. All models also included a partition for the outgroup species. Allowing for a divergent site class significantly improved the fit of all models (p < 0.05 in all tests), indicating that there is divergent selection pressure in each clade. The Neotropical vs. African Lake vs. African River test indicates that the divergent site class is on average under significant positive selection in Neotropical cichlids and African Lake cichlids (ω = 2.2 and 7.3 respectively), but under neutral or slightly purifying selection in African riverine cichlids with an ω value that is not significantly different from one (ω = 0.81). This is corroborated by our results in the Neotropical vs. African and Lakes vs. Rivers tests: grouping the African lake and African riverine cichlids together reduces the estimate of omega from 7.3 in the lake cichlids to 5.3 in all African cichlids; and grouping the African riverine cichlids with the Neotropical cichlids reduces the value of omega from 2.2 in just the Neotropical cichlids to 1.9 in all riverine cichlids (Table 2.3). 10-11% of sites were estimated to be under divergent selection pressure in all of the models (Table 2.3), which is consistent with the approx. 5% of sites found to be under positive selection in the African and Neotropical clades separately (Table 2.1). Sites estimated to be in the divergent site class correspond to sites that are under positive selection in either Neotropical or African cichlids according to the site models. Detailed BEB site results are available in the supplementary material (Supplementary Table 2.4).

2.4.4 Branch-site Models

We used branch-site tests to determine whether the patterns of divergent selection in our clade model tests are driven by a burst of selection following divergence between major clades, by designating the lineage leading to all Neotropical cichlids, all African cichlids, or all African lake cichlids as the foreground in three separate tests. All tests were non-significant (Table 2.4), indicating that the divergent selection pressure found using the clade models was not driven by selection as each group invaded a new environment, but rather by processes affecting the molecular evolution of rhodopsin across the clade. We further applied this test with all of the Neotropical cichlid lineages as the foreground and with all of the African cichlid lineages as the foreground, as this method has been used as an alternative to using clade models. Evidence for positive selection in the foreground clade was found in both tests (p < .001, Table 2.4).

2.4.5 Influence of positively selected sites on rhodopsin function

We mapped our positively and divergently selected sites onto the crystal structure of both the dark-state and the active conformations of rhodopsin (Palczewski et al. 2000; Park et al. 2008), and found that they map to regions in rhodopsin that are associated with non-spectral properties. These include the dimerization interface between monomers and the entry/exit channels for retinal.

Rhodopsin forms dimers and higher order oligomeric interactions in vivo, with the closest contact between monomers occurring between transmembrane helices IV and V (Fotiadis et al. 2006). Five of the six BEB sites exclusive to the African lineage that fall on the dimerization interface (Sites 162, 163, 165, 166, 213, and 218; Table 2.3) are characterized by hydrophobic residues in the Neotropical cichlids, but smaller hydrophobic residues (site 218) or a combination of smaller hydrophobic and non-hydrophobic sites, including nucleophiles (sites 162, 163, 165, and 218), in the African cichlids (Figure 2.2). The precise nature of the dimeric interface is not known (Morris et al. 2009, Lohse 2010), but it is possible that these substitutions affect the affinity between members of a rhodopsin dimer or the density of rhodopsin packing. BEB sites from Neotropical cichlids along this interface do not show a consistent pattern of amino acid substitution. However, two adjacent positively selected sites (172 and 173) show opposite patterns of substitution (larger hydrophobic in African vs. smaller hydrophobic in Neotropical at site 172; small hydrophobic in African vs. larger hydrophobic in Neotropical at 173), which could be the result of compensatory mutations to maintain an overall similar level of dimeric contact. The location of these sites with respect to the dimerization interface is shown in Figure 2.3.

The structure of the activated opsin (Park et al. 2008) shows a channel through the protein that provides access to the chromophore pocket, with openings into the lipid bi-layer between helices I and VII and between helices V and VI (Hildebrand et al. 2009). Current theories suggest that retinal traverses through this channel unidirectionally (Schadel et al. 2003, Hildebrand et al. 2009), but despite extensive mutagenesis studies the direction of travel has not been established (Piechnick et al. 2012). The BEB sites 213 in African cichlids and 270 and 274 in Neotropical cichlids from this study are adjacent to the opening between helices V and VI, and BEB site 286 in Neotropical cichlids is adjacent to the opening between helices I and VII. The side chain of residue 286 in particular points directly into the helices I/VII channel, and has repeatedly evolved from valine to isoleucine in Neotropical cichlids (Figure 2.2). The additional methyl group in isoleucine compared to valine could potentially hinder the passage of retinal through steric effects, and may be a good target for future mutagenesis studies aiming to determine the direction of retinal passage. The location of these sites with respect to the channel openings is shown in Figure 2.4.

Although not identified as being under positive selection, site 83 was found to be divergent between African and Neotropical cichlids based on a visual inspection of our alignment. Surprisingly, all Neotropical cichlids with the exception of the basal Retroculus xinguensis have asparagine (N) at this residue. The residue aspartic acid (D) is the most conserved at this site across GPCRs (Iismaa et al. 1995), with the natural variant asparagine often being associated with deep water organisms (Hope et al. 1997, Hunt et al. 1996, Fasick and Robinson 2000, Hunt et al. 2001) including cichlids from the deepest parts of the African rift lakes (Sugawara et al. 2005). This substitution shifts the peak wavelength absorbed towards longer wavelengths in some species, but the shift is context dependent and minor compared to other spectral tuning sites in African deep water cichlids (Sugawara et al. 2005). The acidic side chain in aspartic acid is known to stabilize the inactive form of rhodopsin by participating in a hydrogen bond network (Breikers et al. 2001), and the substitution to the non-acidic asparagine increases the speed of production of Meta II upon photo-activation in cichlids, causing them to be more sensitive to dim light. This indicates that this substitution is likely related to dim-light adaptation rather than adaptation to wavelength discrimination in cichlids (Sugawara et al. 2010).

2.5 Discussion

2.5.1 Positive selection in Neotropical and African cichlid rhodopsins

We show strong positive selection (ω = 4.1) in approximately 5% of amino acid residues in the RH1 protein of Neotropical cichlids (Table 2.1), using an alignment containing only Neotropical cichlids. Positive selection on rhodopsin was predicted given the wide variety of niches and environments that Neotropical cichlids occupy, but the strength of the evidence is remarkable given that positive selection in African rift lake opsin genes is closely linked to both sexual dimorphism (Terai et al. 2006; Miyagi et al. 2012) and very recent adaptive radiation (Spady et al. 2005), neither of which is the case in Neotropical cichlids. This suggests that ecological selection over long time scales is sufficient to drive detectable positive selection in the RH1 gene of cichlid fishes, and provides evidence that the evolution of visual systems may be important for cichlid diversification outside the African rift lakes.

We also used site models on an alignment including only African cichlids, and show evidence for strong positive selection (ω = 6.4 using M8 and 6.9 using M2) in approximately 4-5% of amino acid residues in the RH1 protein in African cichlids (Table 2.1). Previous studies including African rift lake cichlids and a single African riverine outgroup reported 6% of sites under positive selection, with an average omega value of 14.07 (M8) to 17.54 (M2) (Spady et al. 2005). These values are higher than those reported here, likely because our analysis included more riverine cichlids, which do not appear to have positively selected sites in their rhodopsin genes (Table 2.3; ω = .811 in the divergent site class for African riverine cichlids). The ω values we report approach or exceed the values reported for genes known to be under strong positive selection using site models, including viral coat proteins (ω = 5.6 − 6.7, Moury and Simon 2011), the influenza A virus (ω = 5.3−6.7, Yang 2000), and the mammalian immune system protein p53 (ω = 1.3, Khan et al. 2011).

2.5.2 High average omega values

In addition to high values of omega in the positively selected class, we report a high average value of omega across all sites in both Neotropical and African rhodopsin. Our values for average omega using the M0 model (ω = .28 in Neotropical cichlids and ω = .31 in African cichlids) are comparable to the value found in goby rhodopsin (ω = 0.28), but are substantially higher than typical values in protein coding genes (ω = 0.08−0.18 Fay and Wu 2003), and in a broad analysis of ray-finned fish rhodopsin (ω = 0.07 − 0.08, Rennison et al. 2012), and are instead closer to the values found in genes coding for highly diverse proteins with sites under strong positive selection such as human MHC proteins (ω = .5) and human reproductive proteins (ω = .27 − .93) (Swanson et al. 2001). This suggests that the genes coding for cichlid rhodopsins are not as highly conserved as those coding for rhodopsin in other ray-finned fish, or for protein-coding genes in general.

2.5.3 Divergent selection between clades

We used Clade model C on a data set including both Neotropical and African cichlids to identify sites where the selection regime is divergent among clades. We found strong evidence for divergent selection pressure at 10-11% of amino acid sites between Neotropical and African cichlids; between lake cichlids and riverine cichlids; and among all three clades when Neotropical cichlids, African rift lake cichlids, and African riverine cichlids were treated as separate partitions. Clade-based analyses are ideal to assess variation in selection pressure among clades that have become geographically or ecologically distinct after a vicariance event, because some sites in a protein are essential to function and are expected to be strongly conserved, while sites that are less constrained may evolve differently depending on selection pressures experienced by species in each clade (Forsberg and Christiansen 2003). This could be relevant to our results as the division between Neotropical and African riverine cichlids is likely due to the break-up of Gondwana (Genner et al. 2007). Our results suggest that rhodopsin proteins in the three clades are evolving under divergent selection pressures. Consistent with our site model analysis, the divergently selected site class was on average under positive selection in the Neotropical and the African rift lake cichlids. However, the divergently selected site class was on average under neutral or weakly purifying selection in the African riverine cichlids. It is not obvious why Neotropical cichlids should have positive selection on their rhodopsins when African riverine cichlids do not, because both Neotropical and African riverine cichlids are geographically widespread and occupy many different niches that would be expected to exert selective pressure on opsins.
Low rates of diversification in African riverine cichlids have been explained by the temporal instability of African riverine habitats, which may have prevented speciation via niche partitioning (Joyce et al. 2005) and resulted in diversification driven more by vicariance and drift than by selective processes (Joyce et al. 2005, Katongo et al. 2005, 2007; but see Kobmuller 2008). Aquatic systems in the Neotropics have also been very unstable throughout history (e.g. Lundberg et al. 1999, Bloom and Lovejoy 2011), but this does not appear to have hindered diversification: the geophagine cichlids, which represent the majority of Neotropical lineages in our study, have diversified widely in morphology and feeding ecology (López-Fernández et al. 2012, López-Fernández et al. submitted), and these divergent life histories may have driven positive selection on rhodopsin. Although we used all rhodopsin sequences from African riverine cichlids currently available on Genbank, limited phylogenetic sampling may have hindered our ability to detect positive selection in African riverine cichlids.

We further used branch-site tests with the lineage leading to each clade as the foreground to test if this divergent selection was the result of a burst of selection after speciation, but these tests were all non-significant, indicating that divergent selection patterns are acting on each clade as a whole. This is consistent with the phylogenetic pattern of substitutions seen at positively selected sites, as positively selected sites often have variants distributed throughout the entire clade in which they are positively selected (Figure 2.2).

To date, very few studies have employed the clade-based methods used here, but what little data is available suggests that the values of omega in the divergent class found to be significantly greater than one in this study (ranging from ω = 2.2 in Neotropical riverine cichlids to ω = 7.3 in African rift lake cichlids) are exceptionally high. Analysis of mammalian rhodopsin, including species that inhabit dim light environments expected to exert selection pressure on rhodopsin genes, shows omega values in the divergent site class of no more than 1.19 (Weadick and Chang 2012).

2.5.4 Non-overlapping BEB sites

After detecting significant positive selection in African rift lake cichlids and in Neotropical cichlids and showing that selection is divergent between African rift lake, African riverine, and Neotropical cichlids, we used the Bayes Empirical Bayes (BEB) method to predict which amino acid sites are driving positive selection in African rift lake cichlids and Neotropical cichlids respectively. Because the structure of rhodopsin has been thoroughly characterized, this method can give insight into how positive selection is affecting functional characteristics of rhodopsin (e.g. Weadick and Chang 2007, Larmuseau et al. 2010). Intriguingly, the set of sites that are under positive selection are almost entirely non-overlapping between African and Neotropical cichlids (Table 2.2), and the pattern of substitution at each of the non-overlapping BEB sites clearly favours different amounts of amino acid variation or different residues in each clade: many sites have unique residues in each clade, or are more variable in one clade than the other (Figure 2.2). The only study we are aware of that uses site models to detect BEB sites on various clades independently also found non-overlapping positively selected sites between clades. Although this was conducted in a viral protein considered by the authors to be an ideal system in which to detect this type of divergence, only 0-3 sites were found to be under positive selection in each clade (Moury and Simon 2011). In conjunction with our clade model results showing divergent selective pressure among clades, this provides very strong evidence that the groups of cichlids defined in this study are experiencing different selective pressures on their rhodopsin genes.
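The degree of overlap between the two sets of BEB sites is a simple set comparison. The sketch below uses only those sites that are named explicitly for each clade in the text of this chapter, so both lists are partial and serve as an illustration only. The full lists are given in Table 2.2.

```python
# Partial site lists (bovine rhodopsin numbering), restricted to sites that
# the text names explicitly for each clade; see Table 2.2 for the full lists.
african_sites = {124, 162, 163, 165, 166, 169, 213, 218}
neotropical_sites = {124, 169, 217, 270, 274, 286}

shared = african_sites & neotropical_sites
african_only = african_sites - neotropical_sites
neotropical_only = neotropical_sites - african_sites

print(sorted(shared))
print(sorted(african_only))
print(sorted(neotropical_only))
```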

It is tempting to conclude that because Neotropical riverine cichlids and African rift lake cichlids have different rhodopsin residues under positive selection, natural selection is selecting for different functional characteristics in the rhodopsins of each clade (Yang and Bielawski 2000). However, further studies linking BEB sites to function, and function to fitness, are necessary to confirm this (Yokoyama et al. 2008; MacCallum and Hill 2008; Nozawa et al. 2009a). In some systems, physiological experiments have been conducted to bridge this gap (e.g. Yuan et al. 2010 in Heliconius butterfly opsins; Moury and Simon 2011 in potato virus coat proteins). In rhodopsin, the extensive mutagenesis studies that have been performed provide information that allows predictions to be made about the possible effects of substitutions at BEB sites. The location of BEB sites and the amino acid substitutions at those sites in the current study suggest that positively selected sites may be influencing non-spectral properties of rhodopsin such as the dimerization interface between rhodopsin monomers (Figure 2.3) and the passage of retinal through the protein (Figure 2.4).

Even if substitutions are shown to be adaptive, rhodopsin proteins are not necessarily under different environmental pressures in each clade. Alternatively, adaptation to similar environmental pressures may be occurring via different substitutions at the molecular level. Amino acid substitutions can produce general, non-local effects on rhodopsin function (Piechnick et al. 2012), and different substitutions or substitutions at different sites can often have convergent effects on function (e.g. Hunt et al. 2001, Takenaka et al. 2007). The location of BEB sites on the crystal structure of rhodopsin and the chemical properties of the amino acids substituted at these sites can provide insight into which of these processes may be occurring. The sites along the dimerization interface that are positively selected in African cichlids only show a consistent pattern of substitutions towards reduced hydrophobicity, which suggests that these sites may be under positive selection due to environmental pressures unique to African rift lakes. Other sites, such as sites 213 and 217, are one helix turn away from each other and face the same direction, and are under positive selection in African and Neotropical cichlids respectively: substitutions at these sites could be causing similar functional changes in each clade.

2.5.5 Clade model C vs. Branch-site Results

In this study, we addressed differences in selective constraint between African and Neotropical cichlids in four ways: 1) By applying site models to each data set individually, 2) by applying branch-site models with either the Neotropical or the African clade as the foreground, 3) by applying branch-site models with the lineage leading to either the Neotropical or the African cichlids as the foreground, and 4) by applying Clade model C with various partitions. We argue that the combination of these models uncovers patterns of variation not apparent when the models are used in isolation; and that the inclusion of the under-used CmC method provides important additional information (Weadick and Chang 2012).

Clade models are less widely used than branch-site models, and although they are both designed to detect functional divergence among genes, they make different assumptions and address slightly different patterns of substitution. Branch-site models are used much more commonly, and test for an episode of positive selection along particular branches in an otherwise conservatively evolving background (Yang et al. 2005; Zhang et al. 2005). These tests assume that there is a category of sites that switches from neutral or purifying selection to positive selection in a specific branch or clade. Clade models have been used less frequently, and detect sites that vary in the strength and form of selection among clades (Weadick and Chang 2012). If the assumption in the branch-site test that there is no positive selection in the background is violated, the alternative model allowing positive selection in the foreground may fit the data better even if there are positively selected sites throughout the phylogeny, leading to false positive results (Zhang et al. 2005, Suzuki 2008; Yoshida et al. 2011). Similarly to other studies comparing the two methods (Yoshida et al. 2011), the results from our branch-site tests where we designated the entire Neotropical or African clade as the foreground were consistent with our CmC results, insofar as the branch-site tests indicated significant positive selection at some sites in each clade independently, and the CmC test indicated a divergent site class that is on average under positive selection in each clade. However, the BEB sites from the site models in our analysis suggest that the branch-site test has low power to detect sites that are under positive selection in both the foreground and the background. Sites 169 and 124 were found to be under positive selection in both Neotropical and African cichlids using the site models on each clade independently, and are in the divergent site class using CmC, but were not highlighted as BEB sites when the entire clade of Neotropical cichlids was designated as the foreground using the branch-site test. Site 169 (but not 124) was highlighted as a BEB site when the entire clade of African cichlids was designated as the foreground, but with lower support than in the site models or CmC (Supplementary Table 2.4). The full interpretation of our results therefore depended on using clade models to detect divergent selection pressure, branch-site models to determine whether differences in the clade models are driven by particular lineages, and site models on each clade independently to determine which sites are under positive selection in each clade.

There are two other possible drawbacks of using entire-clade branch-site models instead of Clade models. First, although Zhang et al. (2005) found that branch-site models are statistically well-behaved, Weadick and Chang (2012) showed that the inclusion of an additional site class can make an alternative model fit better even if no positive selection is occurring, because the additional site class allows for the alternative model to better deal with among-site variation. The branch-site model only allows one value of omega to be estimated in the background (ω0, which must be between 0 and 1), but allows two values of ω to be estimated in the foreground (ω0, which must be between zero and one, and ω2, which must be above one). To our knowledge there has been no critical evaluation of the reliability or power of the branch-site test when multiple branches are designated as the foreground. Secondly, branch-site tests are specifically designed to distinguish positive selection as distinct from relaxed selective constraint, which in many cases is a desired outcome of the test. However, relaxed selective constraint at a particular site in one clade but not another is an inherently interesting evolutionary pattern, and is better addressed using Clade models. Three sites in our study (sites 297, 299, and 304) were not estimated to be under positive selection using site models or branch-site models, but were placed in the CmC divergent class with strong support. Why these sites are evolving divergently without being under positive selection is an interesting question, and this pattern would not have been uncovered using branch-site models.

2.5.6 Caveats

dN/dS based methods have been extraordinarily useful to evolutionary biologists, but there are some caveats associated with their use. In some systems, positive selection at synonymous sites due to processes such as selection for translational efficiency (e.g. Kreitman et al. 1995; Duret 2002) could inflate the value of omega by increasing dS (Hirsh et al. 2005). However, Zhang and Li (2004) found no trend for increased omega at lower values of dS, and as long as this selection pressure is equal between synonymous and non-synonymous sites the integrity of the dN/dS based methods should not be affected (Fay and Wu 2003, Yang 2006). Nozawa et al. (2009a, 2009b) have criticized the branch-site models for having a high rate of false positives, but their concerns are largely addressed by Yang et al. (2009, 2011), who showed that the false positive rate falls well within an acceptable 5% margin of error. The CmC method was also recently found to have a high false positive rate, but a new null model was proposed and rigorously tested that reduces the rate to an acceptable level (Weadick and Chang 2012), and this was used in the present study. In general, dN/dS based methods are limited because they consider just point mutations, and ignore deletions or insertions which can also be under positive selection (Kamneva et al. 2010), but as rhodopsin lacks indels and there were no gaps in our alignment, this could not have influenced the present study.

2.5.7 Conclusions

We have shown that positive selection is acting on the rhodopsins of Neotropical and African rift lake cichlids in a divergent manner, with strong positive selection occurring in each clade. In this study we only speculate about the functional and ecological consequences of this pattern, but site-directed mutagenesis and laboratory analysis could clarify the functional relevance of these substitutions. Environmental data could be collected to test for correlations between rhodopsin phenotype and environmental variables, which could provide a link from substitutions at the molecular level to functional divergence and organismal fitness.

2.6 Tables

Table 2.1: Parameter estimates, likelihood values, likelihood ratio tests, and significance values of PAML random site models using Neotropical or African RH1 sequences. The analysis on Neotropical cichlids was based on the phylogeny proposed by López-Fernández et al. (2010); the analysis on African cichlids was based on the RH1 gene tree created in this study. Significant LRTs are highlighted in bold.

Model                   np  Tree length  κ     Parameter estimates                                              lnL      Test statistic  P value

Neotropical cichlids
M0: One ratio           61  1.25         2.78  ω0 = 0.28                                                        -2966.7
M1: Nearly Neutral      62  1.30         2.41  ω0 = 0.01, p0 = 0.84; ω1 = 1, p1 = 0.16                          -2829.8
M2: Positive Selection  64  1.34         2.70  ω0 = 0.02, p0 = 0.84; ω1 = 1, p1 = 0.11; ω3 = 4.17, p3 = 0.05    -2803.5  273.8           <.0001 (vs. M1)
M7: Beta                62  1.35         2.45  p = 0.01, q = 0.03                                               -2830.8
M8: Beta + ω            64  1.34         2.77  p0 = 0.95, p = 0.02, q = 0.149 (p1 = 0.05), ω = 4.05             -2803.8  54.08           <.0001 (vs. M7)
M8a                     62  1.23         2.46  p0 = 0.84, p = 0.03, q = 0.35 (p1 = 0.16), ω = 1.00              -3124.9  642.25          <.0001 (vs. M8)

African cichlids
M0: One ratio           49  0.92         3.34  ω0 = 0.31                                                        -2221.4
M1: Nearly Neutral      50  0.96         2.99  ω0 = 0.04, p0 = 0.85; ω1 = 1, p1 = 0.15                          -2161.9
M2: Positive Selection  52  1.05         3.47  ω0 = 0.07, p0 = 0.87; ω1 = 1, p1 = 0.09; ω3 = 6.9, p3 = 0.04     -2133.3  57.2            <.0001 (vs. M1)
M7: Beta                50  0.95         3.08  p = 0.01, q = 0.03                                               -2164.3
M8: Beta + ω            52  1.05         3.48  p0 = 0.95, p = 0.02, q = 0.149 (p1 = 0.05), ω = 6.4              -2134.2  60.2            <.0001 (vs. M7)
M8a                     50  1.05         2.50  p0 = 0.81, p = 3.94, q = 99.0 (p1 = 0.19), ω = 1.00              -3030.3  1792.2          <.0001 (vs. M8)

Table 2.2: BEB sites in Neotropical and African cichlids. All sites found to be under positive selection with p > 0.9 are listed in the first column. ** indicates p > 0.90 and * indicates p > 0.85 in BEB results from the M8 model (all sites were also indicated to be under positive selection in the M3 model, although in some cases with lower probability). Site numbers in bold were also found to be under positive selection in African cichlid rhodopsin by Spady et al. 2005, as were sites 22, 41, 42, 50, 95, 104, 158, 159, 255, 256, 263, and 297.

Bovine rhodopsin site; Location in rhodopsin; African cichlids only; Neotropical cichlids only; Possible effect on rhodopsin function (References)

49 TM1 ** ?
124 TM3 * * Possible spectral tuning (Hunt et al. 2001)
133 TM3 ** Adjacent to “ionic lock” at Glu 134 (Hoffmann et al. 2009; review)
156 TM4 ** Dimerization interface (Guo et al. 2005, Fotiadis et al. 2006)
162 TM4 ** Dimerization interface (Guo et al. 2005, Fotiadis et al. 2006)
163 TM4 ** Dimerization interface (Guo et al. 2005, Fotiadis et al. 2006)
165 TM4 ** Dimerization interface (Guo et al. 2005, Fotiadis et al. 2006)
166 TM4 * Dimerization interface (Guo et al. 2005, Fotiadis et al. 2006)
169 TM4 ** ** Dimerization interface (Guo et al. 2005, Fotiadis et al. 2006)
172 TM4 * Dimerization interface (Guo et al. 2005, Fotiadis et al. 2006)
173 TM4/E3 ** Dimerization interface (Guo et al. 2005, Fotiadis et al. 2006)
213 TM5 ** Dimerization interface, near retinal channel B (Guo et al. 2005, Fotiadis et al. 2006, Hildebrand et al. 2009)
217 TM5 ** Dimerization interface (Guo et al. 2005, Fotiadis et al. 2006)
218 TM5 ** Dimerization interface (Guo et al. 2005, Fotiadis et al. 2006)
248 TM6 ** Adjacent to “ionic lock” at Glu 247 (Hoffmann et al. 2009; review)
270 TM6 ** Near retinal channel B (Hildebrand et al. 2009)
274 TM6 ** Near retinal channel B (Hildebrand et al. 2009)
281 E3 ** Affects ability to form 3D structure (Anukanth & Khorana 1994)
282 E3 * Forms H bond with C terminus; affects stability (Standfuss et al. 2007)
286 TM7 ** Near retinal channel A (Hildebrand et al. 2009)

Table 2.3: Parameter estimates, likelihood values, test statistics, and p values for various data partitions in Clade Model C. Omega estimates in the alternative models that are significantly different from 1 are highlighted in bold.

Columns: partition and test; Np; tree length; kappa; proportions and ω for site class 0, site class 1, and site class 2 (divergent); lnL; test statistic; P value.

African vs. Neotropical, alternative: Np 109; tree length 2.83; kappa 2.66; lnL -5145.7
Proportions: p0=0.77 p1=0.13 p2=0.10
African ω: 0.027 1.000 5.330
Neotropical ω: 0.027 1.000 2.380
Outgroup ω: 0.027 1.000 2.226

African vs. Neotropical, null: Np 107; tree length 2.82; kappa 2.70; lnL -5149.7; test statistic 7.94; P value 0.0188
Proportions: p0=0.773 p1=0.116 p2=0.111
Average ω: 0.028 1.000 3.336

Lakes vs. Rivers, alternative: Np 109; tree length 2.83; kappa 2.66; lnL -5137.6
Proportions: p0=0.773 p1=0.122 p2=0.105
Lakes ω: 0.027 1.000 7.568
River ω: 0.027 1.000 1.912
Outgroup ω: 0.027 1.000 2.196

Lakes vs. Rivers, null: Np 107; tree length 2.82; kappa 2.70; lnL -5149.7; test statistic 24.08; P value <.0001
Proportions: p0=0.773 p1=0.116 p2=0.111
Average ω: 0.028 1.000 3.336

African rivers vs. African lakes vs. Neotropical, alternative: Np 110; tree length 2.82; kappa 2.68; lnL -5136.2
Proportions: p0=0.774 p1=0.115 p2=0.111
African lakes ω: 0.028 1.000 7.262
Neotropical ω: 0.028 1.000 2.151
Afr. River ω: 0.028 1.000 0.811
Outgroup ω: 0.028 1.000 2.340

African rivers vs. African lakes vs. Neotropical, null: Np 107; tree length 2.82; kappa 2.70; lnL -5149.7; test statistic 26.83; P value <.0001
Proportions: p0=0.773 p1=0.116 p2=0.111
Average ω: 0.028 1.000 3.336
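As a check on the likelihood ratio tests in Table 2.3, the P values can be recomputed from the reported test statistics. The sketch below assumes that each statistic is compared against a chi-square distribution whose degrees of freedom equal the difference in parameter counts (Np) between the alternative and null models. The function name `chi2_sf` is illustrative and is not part of PAML.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: finite sum exp(-x/2) * sum_k (x/2)^k / k!
        term = math.exp(-half)
        total = term
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return total
    # Odd df: start from the df = 1 case (erfc) and add one term per extra 2 df
    total = math.erfc(math.sqrt(half))
    term = math.exp(-half) * math.sqrt(half) / math.gamma(1.5)
    for k in range(1, (df - 1) // 2 + 1):
        total += term
        term *= half / (k + 0.5)
    return total

# (test statistic, Np alternative, Np null) as reported in Table 2.3
tests = {
    "African vs. Neotropical": (7.94, 109, 107),
    "Lakes vs. Rivers": (24.08, 109, 107),
    "African rivers vs. African lakes vs. Neotropical": (26.83, 110, 107),
}

# Degrees of freedom are assumed to be the difference in parameter counts
p_values = {name: chi2_sf(stat, np_alt - np_null)
            for name, (stat, np_alt, np_null) in tests.items()}

for name, p in p_values.items():
    print(f"{name}: p = {p:.4g}")
```

Under this assumption the first comparison gives a value close to the reported 0.0188, and the other two fall well below .0001.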

Table 2.4: Likelihood values, test statistics, and p values for likelihood ratio tests for branch-site models.

Columns: partition; test; Np; tree length; kappa; lnL; test statistic; P value.

African vs. Neotropical (single lineage leading to African cichlids as foreground)
Alternative 107 2.56302 2.3539 -5198.112
Null 106 2.56303 2.354 -5198.112 0 1.00

African vs. Neotropical (single lineage leading to Neotropical cichlids as foreground)
Alternative 107 2.35393 2.3539 -5198.112
Null 106 2.56302 2.3539 -5198.112 0 1.00

Lakes vs. Rivers (single lineage leading to lake cichlids as foreground)
Alternative 106 2.28511 2.3429 -5000.866
Null 105 2.28519 2.3451 -5001.511 1.29 0.16

African vs. Neotropical (entire Neotropical lineage as foreground)
Alternative 106 2.46722 2.4783 -4981.517
Null 105 2.2852 2.34508 -5001.511 40.0 <.001

African vs. Neotropical (entire African lineage as foreground)
Alternative 106 2.5312 2.50866 -4957.082
Null 105 2.2852 2.34508 -5001.511 88.9 <.001
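The test statistics in Table 2.4 follow from the log-likelihoods in the same table. A minimal sketch of that arithmetic is given below, assuming the usual definition of the likelihood ratio statistic as twice the difference in log-likelihood between the alternative and null models. The function and dictionary names are illustrative only.

```python
def lrt_statistic(lnl_alt, lnl_null):
    """Likelihood ratio test statistic: twice the log-likelihood difference."""
    return 2.0 * (lnl_alt - lnl_null)

# (lnL alternative, lnL null) for three of the tests in Table 2.4
branch_site = {
    "Lakes vs. Rivers (single lineage)": (-5000.866, -5001.511),
    "Entire Neotropical lineage": (-4981.517, -5001.511),
    "Entire African lineage": (-4957.082, -5001.511),
}

stats = {name: lrt_statistic(alt, null)
         for name, (alt, null) in branch_site.items()}

for name, value in stats.items():
    print(f"{name}: 2*delta lnL = {value:.2f}")
```

The three values agree with the reported statistics (1.29, 40.0, and 88.9) to within rounding.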

2.7 Figures

[Figure 2.1 image: phylogenetic tree of the sampled Neotropical and African cichlid species; scale bar 0.02.]

Figure 2.1: Maximum likelihood tree of RH1 sequences, constrained to be reciprocally monophyletic.

Columns are grouped in the figure as BEB site in Neotropical cichlids, BEB site in African cichlids, BEB site in both, and divergent. Rows are grouped in the figure as Afr. River, Afr. Lake, and Neotropical.

Site 49 156 173 217 248 270 274 281 282 286 133 162 213 163 165 166 172 210 218 256 124 169 83 210

Etroplus maculatus L G V C R G W A E V I T I M L S L C V I S G N C
Heterochromis multidens L G I T R G W S E I I I L M L S V V V V S G N V
Hemichromis fasciatus L G L T R G Y S E V I A L A L S L C I I G A D C
Chromidotilapia guntheri L G L T R G Y S D V V A L A L S L C V I G A D C
Steatocranus casuarius L G V T R G Y S E V I L L A L S L C V I G G D C
Tilapia buttikoferi L G V T R G Y S E V I L L A L S L C V I G G D C
Oreochromis niloticus L G V T R G Y S E V V L L A L S L C V I G G D C
Sarotherodon melanotheron L G V T R G Y S E V V L L A L A L C V I G G D C
Tilapia rendalli I G V T R G Y S E V V L L A L A L C V I G G D C
Spathodus erythrodon L G V T R G Y S E V I V M A S A L C V M G V D C
Neolamprologus leleupi L G V T R G Y S E V I V M A C A L C I M G V D C
Xenotilapia spiloptera L G V T R G Y S E V I L T A S A L C T M G V D C
Haplotaxodon microlepis L G V T R G Y S E V I V M A L S L C V M G A D C
Trematocara unimaculatum L G V V R G Y S E V I V A A L S L C V M G V D C
Limnochromis staneri L G V T R G Y S E V I V S A L S L C V M G A D C
Baileychromis centropomoides L G V T R G Y S E V I V S A L S L C V M G A N C
Cyphotilapia frontosa L G V T R G F S E V I V M A L S L C V M G A D C
Cyprichromis leptosoma L G V T R G F S E V V V M G L S L C V M G A D C
Pallidochromis tokolosh L G V T R G Y S E V V V T A L A L C V M G A N C
Diplotaxodon macrops L G V T R G Y S E V V V T A L A L C V M G A N C
Rhamphochromis longiceps L G V - R G Y S E V V V T A L A L C V M G A D C
Tropheus duboisi L G V T R G Y S E V V V T A L S L C V M G A D C
Metriaclima zebra L G V T R G Y S E V I V T A L S L C V M G A D C
Haplochromis brownae L G L T R G Y S E V V I L G L S L C I M G A D C
Tyrannochromis maculatus L G V T R G Y S E V I V I A L A L C V M G A D C
Aulonocara stuartgranti L G V T R G Y S E V I V T A L A L C V M G A D C
Retroculus xinguensis L F V T R G W S E V I I L M L T L C I I G G D C
Cichla temensis I F I I K Y W A E V I V I M L S V C I I S G N C
Biotoecus dicentrarchus L G V A R G W S D V I I L M L S V V I I S G N V
Chaetobranchus flavescens L G I T R G W S E I I I L M L S V V V I S G N V
Mazarunia sp. 1 L G I T R G W S E I I I L M L S V V I I S G N V
Mazarunia sp. 2 L G I T R G W S E V I I L M L S V V I I S A N V
Guianacara owroewefi L G I A R G W A E I I I L M L S V V I I S G N V
Guianacara stergiosi L G I A R G W S E I I I L M P S V V I I S G N V
Crenicichla 'Orinoco lugubris' I G I T R G W A E I I I L M L S V V I I S G N V
Crenicichla geayi I G I I R G W A E I I I F M L S I V I I S G N V
Teleocichla n sp. preta I G I A R G W A E V I I L M L S V V I I S G N V
Crenicichla frenata I G I T R G W A E I I I L M L S V V I I S G N V
Crenicichla 'Orinoco wallaci' I G I T K G W A E I I I L M L S V V I I S G N V
Taeniacara candidi L G L T R G W A E V I I L M L S V V I I S A N V
Apistogramma agassizi L F I T R G W A E V I I L M L S V V I I S A N V
Apistogramma hoignei L F I T R G W A E V I I L M L S V V I I S A N V
Satanoperca daemon L G V T R G W S E I I I L M L S V V V I S G N V
Satanoperca leucosticta L G V A R G W S E V I I L M L S V V V I S G N V
Satanoperca mapiritensis L G V T R G W S E I I I L M L S V V V I S G N V
Satanoperca jurupari L G V T R G W S E I I I L M L S V V V I S G N V
Crenicara punctulatum L F I I R G W A E I I I L M L S L L I I S A N L
Dicrossus filamentosus L G I T R G W A E I I I L M L S L L I I S G N L
Biotodoma cupido L G I T R G W S E I I I L M L S L L V I S G N L
Biotodoma wavrini P G I T R G W S E V I I L M L S L L V I S G N L
Mikrogeophagus ramirezi I G V T R G Y S E I I I L M L T L V I I S A N V
'Geophagus' brasiliensis I G I T R G Y S E I I I L M L S L V I I S G N V
'Geophagus' steindachneri I G V T R G W S E I I I L M L S L V I I A G N V
Gymnogeophagus setequedas I F I F R G Y S E V V I L M L T L V I I G G N V
Geophagus abalios I F I F R S Y S E V I V L M L S V V I I S G N V
Geophagus dicrozoster I F I V K Y W A D V I V L M L S V V I I S G N V
Geophagus harreri I F I F K Y W A D V I V L M L S V V I I S G N V

Figure 2.2: RH1 phylogeny and distribution of amino acid residues at positively selected sites in Neotropical and African cichlids. Amino acids with hydrophobic side chains are in shades of blue, aromatic in red, acidic in green, basic in orange, amides in white, small in yellow, and nucleophilic in purple. Within each group, residues with larger side chains are darker. “Divergent” sites contain different residues between Neotropical and African cichlids based on a visual inspection of the alignment, but are not under positive selection.

Figure 2.3: Interface between rhodopsin molecules in a dimer. Sites thought to be on the dimeric interface are highlighted in yellow: helices IV and V (Fotiadis et al. 2004; Guo et al. 2005), cytoplasmic loop II, and parts of the C-terminal region (Fotiadis et al. 2004). Residues in blue are BEB sites on helix IV or V in African cichlids, residues in red are BEB sites on helix IV or V in Neotropical cichlids, and the residue highlighted in purple is the only BEB site on helix IV or V in both African and Neotropical cichlids. Panels show ribbon and space-filling diagrams of the same structure, based on PDB ID 1U19.

Figure 2.4: Openings to the retinal binding pocket in the active conformation of rhodopsin. The left panel shows the opening between helices I and VII; the right panel shows the opening between helices V and VI. Residues around the openings are highlighted in yellow. Residues in blue are BEB sites near the openings in African cichlids, and residues in red are BEB sites near the openings in Neotropical cichlids. Sites are mapped onto PDB ID 3DQB.

2.8 Supplementary information

Table 2.1: Supplementary Table. Species list, museum catalogue numbers, and accession numbers for sequences used in this study. Sequences isolated for this study are from the Royal Ontario Museum Ichthyology collection in Toronto, Canada, and their museum catalogue numbers are listed here. Species names follow López-Fernández et al. 2010.

Partition; Species; Catalogue numbers; Accession number; Reference

Neotropical Retroculus xinguensis HLF1230 JX576463 This study
Neotropical Cichla temensis HLF61, HLF80 JX576464 This study
Neotropical Biotoecus dicentrarchus HLF75 JX576465 This study
Neotropical Chaetobranchus flavescens HLF517 JX576466 This study
Neotropical Mazarunia sp. 1 T06044 JX576467 This study
Neotropical Mazarunia sp. 2 T06235 JX576468 This study
Neotropical Guianacara owroewefi HLF485 JX576469 This study
Neotropical Guianacara stergiosi HLF125 JX576470 This study
Neotropical Crenicichla ’Orinoco lugubris’ HLF667 JX576471 This study
Neotropical Crenicichla geayi HLF18 JX576472 This study
Neotropical Crenicichla ’Orinoco wallacii’ HLF68 JX576474 This study
Neotropical Crenicichla frenata na JN990736.1 Weadick et al. 2012
Neotropical Teleocichla sp. HLF1358 JX576473 This study
Neotropical Taeniacara candidi HLF152 JX576475 This study
Neotropical Apistogramma agassizi HLF5, HLF7 JX576476 This study
Neotropical Apistogramma hoignei HLF23, HLF25, HLF42 JX576477 This study
Neotropical Satanoperca daemon HLF64, HLF90 JX576478 This study
Neotropical Satanoperca leucosticta HLF498 JX576479 This study
Neotropical Satanoperca mapiritensis HLF117, HLF132 JX576480 This study
Neotropical Satanoperca jurupari HLF184 JX576481 This study
Neotropical Crenicara punctulatum HLF282 JX576482 This study
Neotropical Dicrossus filamentosus HLF143 JX576483 This study
Neotropical Biotodoma cupido HLF1, HLF3 JX576484 This study
Neotropical Biotodoma wavrini HLF13, HLF55 JX576485 This study
Neotropical Mikrogeophagus ramirezi HLF37 JX576486 This study
Neotropical ’Geophagus’ brasiliensis HLF145, HLF727 JX576487 This study
Neotropical ’Geophagus’ steindachneri HLF726 JX576488 This study
Neotropical Gymnogeophagus setequedas HLF302 JX576489 This study
Neotropical Geophagus abalios HLF88, TO8707 JX576490 This study
Neotropical Geophagus dicrozoster HLF83, HLF84 JX576491 This study
Neotropical Geophagus harreri HLF277 JX576492 This study
Neotropical Oreochromis niloticus na AB084938.1 Sugawara et al. 2005
African lake Xenotilapia spiloptera na AB185242.1 Sugawara et al. 2005
African lake Cyphotilapia frontosa na AB084929.1 Sugawara et al. 2005
African lake Diplotaxodon macrops na AB185220.1 Sugawara et al. 2005
African lake Pallidochromis tokolosh na AB185229.1 Sugawara et al. 2005
African lake Haplotaxodon microlepis na AB185390.1 Sugawara et al. 2005
African lake Limnochromis staneri na AB185225.1 Sugawara et al. 2005
African lake Metriaclima zebra na AB185235.1 Sugawara et al. 2005
African lake Neolamprologus leleupi na AB084937.1 Sugawara et al. 2005
African lake Rhamphochromis longiceps na AB196147.1 Sugawara et al. 2005
African lake Trematocara unimaculatum na AB185238.1 Sugawara et al. 2005
African lake Tropheus duboisi na AB084946.1 Sugawara et al. 2005
African lake Aulonocara stuartgranti na AB185215.1 Sugawara et al. 2005
African lake Tyrannochromis maculatus na AY775117.1 Spady et al. 2005
African lake Baileychromis centropomoides na AB185217.1 Spady et al. 2005
African river Heterochromis multidens T07177 JX576460 This study
African river Hemichromis fasciatus HLF177, HLF178 JX576461 This study
African river Chromidotilapia guntheri HLF156, HLF156, HLF158 JX576462 This study
African river Sarotherodon melanotheron na AB084940.1 Sugawara et al. 2005
African river Spathodus erythrodon na AB084941.1 Sugawara et al. 2005
African river Steatocranus casuarius na AB084942.1 Sugawara et al. 2005
African river Tilapia buttikoferi na AB084943.1 Sugawara et al. 2005
African river Tilapia rendalli na AB084944.1 Sugawara et al. 2005
Indian Etroplus maculatus na EF095630.1 Chen et al. 2007

Table 2.2: Supplementary table: Parameter estimates, likelihood values, test statistics, and p values for various data partitions in Clade Model C with phylogenetically misplaced species removed. Omega estimates in the alternative models that are significantly different from one are highlighted in bold. All analyses were conducted using a tree with Heterochromis multidens and Retroculus xinguensis removed.

Table 2.3: Supplementary table: Likelihood values, test statistics, and p values for likelihood ratio tests for branch-site models with phylogenetically misplaced species removed. All analyses were conducted using a tree with Heterochromis multidens and Retroculus xinguensis removed.

Table 2.4: Supplementary table: Detailed BEB output for site models, CmC, and branch-site models. Values indicate p (ω > 1) from all Bayes Empirical Bayes (BEB) analyses. Sites with p > .70 in any analysis are listed in the first column. *: P > 95%; **: P > 99%. Alternative analyses for the same evolutionary scenario are within dark black borders (i.e. analyses within the borders address the same question, but use different trees or are from M8 vs. M3 runs). Darkly shaded entries indicate strong support that the site belongs in the positively selected class across all alternative analyses (defined as P > 95% in at least one analysis with support from at least one other analysis with P > 90%); lightly shaded entries indicate moderate support (defined as P > 80% in at least one analysis with support from at least one other analysis with P > 50%, or P > 95% with no other support). Site numbers in bold were also found to be under positive selection in African cichlid rhodopsin by Spady et al. 2005, as were sites 22, 41, 42, 50, 95, 104, 158, 159, 255, 256, and 263.
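The shading rule in the caption of Supplementary Table 2.4 can be written as a small decision function. The sketch below is one reading of that rule, taking "support from at least one other analysis" to mean the second-highest posterior probability among the alternative analyses. The function name and the example probabilities are hypothetical and are not values from the thesis.

```python
def support_level(probs):
    """Classify a site from its BEB posterior probabilities across analyses."""
    if not probs:
        return "none"
    ranked = sorted(probs, reverse=True)
    best = ranked[0]
    # Highest probability among the remaining analyses (0 if there are none)
    runner_up = ranked[1] if len(ranked) > 1 else 0.0
    if best > 0.95 and runner_up > 0.90:
        return "strong"      # darkly shaded
    if (best > 0.80 and runner_up > 0.50) or best > 0.95:
        return "moderate"    # lightly shaded
    return "none"

# Hypothetical posterior probabilities for four sites, one list per site
examples = {
    "site A": [0.97, 0.93, 0.40],
    "site B": [0.97, 0.40],
    "site C": [0.85, 0.60],
    "site D": [0.85, 0.30],
}
levels = {site: support_level(p) for site, p in examples.items()}
print(levels)
```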

Chapter 3

Patterns of Selective Constraint in Geophagine Cichlid Rhodopsin

3.1 Introduction

This chapter explores patterns of selective constraint within the Neotropical cichlids, focusing on the Geophagini clade and including several basal Neotropical cichlid taxa. Geophagine cichlids are extraordinarily diverse in terms of morphology, ecology, and reproductive mode (Barlow 2000, Wimberger et al. 1998, López-Fernández et al. 2012), and as we have shown in Chapter 2 of this thesis, positive selection has influenced the evolution of rhodopsin in this clade. However, the large-scale approach we employed in Chapter 2 did not investigate whether positive selection is uniform throughout the geophagine cichlids, or whether there are patterns of selective constraint at finer scales. Amino acid substitutions and levels of positive selection can often be correlated with characteristics of the photic environment in aquatic organisms (e.g. Hunt et al. 2001, Spady et al. 2005, Sugawara et al. 2005, 2010, Yokoyama 2008), and there are many ways in which the photic environment varies among the habitats of geophagine cichlids. For example, species of geophagines are found in three distinct water types in the Neotropics: “white water” (characterized by a high sediment load, high pH, and a high nutrient load; Albernaz et al. 2012), “black water” (characterized by high transparency, but strong staining by tannins, low pH and negligible amounts of solutes), and “clear water” (characterized by relatively high transparency, slightly acid pH, and moderate amounts of dissolved organic matter) (Sioli 1984). Although all geophagines are riverine, there are substantial differences in maximum depth among rivers, both currently and over history (Lundberg et al. 1998). Both water type and depth affect light intensity as well as the available wavelengths (Lythgoe 1979), and hence may impose different selective pressures on visual system genes.

The tribe Geophagini is divided into two large sister clades (López-Fernández et al. 2005a, 2005b, López-Fernández et al. 2010), described here as the “Geophagus” clade and the “Satanoperca” clade (referred to as the “B” clade and the “Satanoperca” clade respectively in López-Fernández et al. 2005a, 2005b, López-Fernández et al. 2010). We employed Clade model C and branch-site models to determine if there is divergent selection pressure between these two groups, and also included a “basal” clade of three Neotropical cichlids which are basal to the “Satanoperca”/“Geophagus” split.

These analyses provide a second system for comparing results from the branch-site models and Clade model C, as we did in Chapter 2 of this thesis. The results in this chapter provide additional support for the conclusion that Clade model C is better suited to detecting among-clade divergence than are branch-site models.

3.2 Methods

3.2.1 Species Included and Phylogenetic Relationships

A subset of the RH1 gene fragments used in Chapter 2 was selected for this study, including all species from the tribe Geophagini (including at least one member of each genus) and three Neotropical species basal to Geophagini (Retroculus xinguensis, Chaetobranchus flavescens, and Cichla temensis). Species included and accession numbers are listed in Table 3.1. There is a well-resolved, genus-level phylogeny available for Neotropical cichlids based on information from three mitochondrial genes and two nuclear genes (López-Fernández et al. 2010), which includes all of the species considered in the present study. All analyses conducted in this study use this tree.

3.2.2 Clade Model C Analyses

We used Clade Model C (CmC) in PAML v4.5 (Bielawski and Yang 2004) to determine if there is divergent selection between the two major clades within Geophagini and a group of Neotropical cichlids basal to the Geophagini. CmC analyses were set up with three partitions: 1) The “Satanoperca” clade, which includes the genera Apistogramma, Taeniacara, Guianacara, Mazarunia, Crenicichla, Teleocichla, Acarichthys, Biotoecus, and Satanoperca; 2) The “Geophagus” clade, which includes the genera Geophagus, Mikrogeophagus, Dicrossus, Crenicara, Biotodoma, Gymnogeophagus, and Geophagus; and 3) The outgroup clade, which includes the genera Chaetobranchus, Cichla, and Retroculus (López-Fernández et al. 2010). To conduct these analyses we used the newly implemented multi-clade models (Yoshida et al. 2011), a newly derived null model (Weadick and Chang 2012), and a new method to determine if omega values in the divergent site class are significantly different from one (Chang et al. 2012). All CmC analyses were carried out according to the methods described in Chapter 2.
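The three partitions listed above can be written down as a genus-level lookup, which makes it easy to check which partition each sequence falls into. This is only an illustrative sketch: the names `PARTITIONS` and `partition_of` are hypothetical, and placing the quoted ’Geophagus’ species in the “Geophagus” partition is an assumption rather than something stated in the text.

```python
# Genera per partition, as listed in Section 3.2.2
PARTITIONS = {
    "Satanoperca clade": ["Apistogramma", "Taeniacara", "Guianacara", "Mazarunia",
                          "Crenicichla", "Teleocichla", "Acarichthys", "Biotoecus",
                          "Satanoperca"],
    "Geophagus clade": ["Geophagus", "Mikrogeophagus", "Dicrossus", "Crenicara",
                        "Biotodoma", "Gymnogeophagus"],
    "outgroup": ["Chaetobranchus", "Cichla", "Retroculus"],
}

GENUS_TO_PARTITION = {genus: name
                      for name, genera in PARTITIONS.items()
                      for genus in genera}

def partition_of(species):
    """Return the partition for a species name, keyed on its genus."""
    # Strip quotation marks so that 'Geophagus' brasiliensis maps to Geophagus
    # (assumption: quoted 'Geophagus' belongs to the Geophagus partition)
    genus = species.split()[0].strip("'’‘\"")
    return GENUS_TO_PARTITION[genus]

for name in ["Satanoperca daemon", "Geophagus harreri",
             "’Geophagus’ brasiliensis", "Cichla temensis"]:
    print(name, "->", partition_of(name))
```

A species whose genus is not in the lookup raises a `KeyError`, which is a convenient way to catch a taxon left out of all three partitions.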

3.2.3 Branch-site Analyses

Branch-site models allow omega to vary among amino acid sites and between “foreground” and “background” branch types specified by the user, based on a priori hypotheses of where adaptive evolution may have occurred (Zhang et al. 2005). They were employed in four ways: 1) with the entire “Geophagus” clade as the foreground, 2) with the entire “Satanoperca” clade as the foreground, 3) with the single lineage leading to the “Geophagus” clade as the foreground, and 4) with the single lineage leading to the “Satanoperca” clade as the foreground. All branch-site analyses were carried out according to the methods in Chapter 2.
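The difference between designating a single lineage and an entire clade as foreground can be illustrated with PAML's tree-label notation, in which `#1` after a node marks only the branch leading to it and `$1` marks that branch together with all branches inside the clade. The sketch below uses a toy tree with hypothetical tip names (G1, G2, S1, S2, O1, O2), not the tree used in this study, and the helper `mark_foreground` is illustrative only.

```python
def mark_foreground(newick, clade, whole_clade):
    """Append a PAML-style foreground label to one clade in a Newick string.

    '#1' marks only the branch leading to the clade (the stem lineage);
    '$1' marks the stem and every branch within the clade.
    """
    if newick.count(clade) != 1:
        raise ValueError("clade must occur exactly once in the tree string")
    label = "$1" if whole_clade else "#1"
    return newick.replace(clade, f"{clade} {label}")

# Toy tree: a 'Geophagus'-like clade, a 'Satanoperca'-like clade, and outgroups
toy_tree = "((G1,G2),(S1,S2),(O1,O2));"

designs = {
    "entire Geophagus clade": mark_foreground(toy_tree, "(G1,G2)", True),
    "entire Satanoperca clade": mark_foreground(toy_tree, "(S1,S2)", True),
    "lineage leading to Geophagus": mark_foreground(toy_tree, "(G1,G2)", False),
    "lineage leading to Satanoperca": mark_foreground(toy_tree, "(S1,S2)", False),
}

for name, tree in designs.items():
    print(f"{name}: {tree}")
```

Each of the four labelled trees corresponds to one of the four foreground designations listed above; everything not labelled is treated as background.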

3.3 Results

3.3.1 Clade Model C

When Clade model C was employed with each of “Geophagus”, “Satanoperca”, and the outgroups as separate partitions, allowing for a divergently selected site class significantly improved the fit of the model (p = .041). This indicates that there are amino acid sites under different selective constraint among clades. The estimated value of omega is significantly greater than one in all three data partitions, indicating that the divergently selected class is, on average, under positive selection in all clades. However, the value of omega was not uniform throughout the phylogeny: the highest values occur in the “Geophagus” clade (ω = 5.24) and the basal clade (ω = 5.29), whereas the “Satanoperca” clade has an omega value of 2.48 in the divergently selected site class. Approximately 6-7% of amino acid sites are in the divergently selected class (Table 3.2).

3.3.2 Branch-site

We used branch-site tests to determine whether the patterns of divergent selection in our clade model tests are driven by a burst of selection following divergence of major clades, by designating the lineage leading to the “Geophagus” clade and the lineage leading to the “Satanoperca” clade as the foreground in two separate tests. Both tests were non-significant (Table 3.3), indicating that the divergent selection pressure found using the clade models was not driven by selection as each group invaded a new environment, but rather by processes affecting the molecular evolution of rhodopsin across each of the clades within Geophagini. We also applied branch-site models with the entire “Geophagus” or “Satanoperca” clade as the foreground, respectively. This approach has been used to detect divergent selection between clades (e.g. Ramm et al. 2008), and has been used as an alternative to CmC models (Yoshida et al. 2011). Despite finding evidence for positive selection in both “Geophagus” and “Satanoperca” using Clade model C, our branch-site test was significant when the entire “Geophagus” clade was designated as the foreground (p < .001) but non-significant when the entire “Satanoperca” clade was designated as the foreground (p = 0.567) (Table 3.3).

3.3.3 Divergently Selected Sites

We used the BEB method to estimate which amino acid sites belong in the divergently selected site class from the CmC analyses (Yang et al. 2005). These sites, and their amino acid distribution with respect to the phylogeny, are shown in Figure 3.1. Two sites (270 and 274) are variable in the “Geophagus” clade but not the “Satanoperca” clade, and one site (site 217) has unique residues in both clades (alanine in the “Satanoperca” clade and phenylalanine or valine in the “Geophagus” clade), as well as some residues common to both clades (threonine and isoleucine). Divergent selection was detected at three sites that include only amino acid residues that are functionally similar to each other: sites 173 and 286 both contain only hydrophobic residues, and site 169 includes only very small, hydrophobic residues (Figure 3.1).
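The statements above about which sites vary in which clade, and which sites carry only functionally similar residues, can be checked mechanically against an alignment. The sketch below does this for six species whose residues were transcribed from the alignment in Figure 2.2, assuming that the column order of site numbers in that figure matches the residue columns. The residue classes `HYDROPHOBIC` and `SMALL` are assumptions made for illustration and are not the exact colour groups used in the thesis.

```python
HYDROPHOBIC = set("AVILMFWC")  # assumed membership of the hydrophobic class
SMALL = set("GA")              # assumed "very small" residues

# site -> residue for three species per clade (subset of Figure 2.2)
ALIGNMENT = {
    "Satanoperca clade": {
        "Biotoecus dicentrarchus": {169: "G", 173: "V", 217: "A", 270: "G", 274: "W", 286: "V"},
        "Apistogramma agassizi":   {169: "A", 173: "I", 217: "T", 270: "G", 274: "W", 286: "V"},
        "Satanoperca daemon":      {169: "G", 173: "V", 217: "T", 270: "G", 274: "W", 286: "I"},
    },
    "Geophagus clade": {
        "Geophagus abalios":       {169: "G", 173: "I", 217: "F", 270: "S", 274: "Y", 286: "V"},
        "Geophagus harreri":       {169: "G", 173: "I", 217: "F", 270: "Y", 274: "W", 286: "V"},
        "Mikrogeophagus ramirezi": {169: "A", 173: "V", 217: "T", 270: "G", 274: "Y", 286: "I"},
    },
}

def residues_at(site, clade=None):
    """Set of residues observed at a site, in one clade or in both."""
    clades = [clade] if clade else list(ALIGNMENT)
    return {seq[site] for c in clades for seq in ALIGNMENT[c].values()}

def is_variable(site, clade):
    """True if more than one residue is observed at the site within the clade."""
    return len(residues_at(site, clade)) > 1

def all_in_class(site, residue_class):
    """True if every residue observed at the site belongs to residue_class."""
    return residues_at(site) <= residue_class

for site in (270, 274):
    print(site, {c: is_variable(site, c) for c in ALIGNMENT})
print("173 hydrophobic only:", all_in_class(173, HYDROPHOBIC))
print("286 hydrophobic only:", all_in_class(286, HYDROPHOBIC))
print("169 small only:", all_in_class(169, SMALL))
```

With only six species this is a spot check rather than a reproduction of the full figure, but the same functions apply unchanged to a complete alignment.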

The structure of the activated opsin (Park et al. 2008) shows a channel through the protein that provides access to the chromophore pocket, with openings into the lipid bilayer between helices I and VII and between helices V and VI (Hildebrand et al. 2009). Current theories suggest that retinal traverses this channel unidirectionally (Schadel et al. 2003, Hildebrand et al. 2009), but despite extensive mutagenesis studies the direction of travel has not been established (Piechnick et al. 2012). Sites 270 and 274 were both identified as being under divergent selection pressure between African and Neotropical cichlids in Chapter 2, and are adjacent to the opening between helices V and VI. Substitutions at these sites may influence the rate of retinal migration through the channel. The distribution of amino acid substitutions within Geophagini and the higher value of omega in the divergently selected class of the “Geophagus” clade suggest that differences in selective constraint at these sites between African and Neotropical cichlids are likely driven by the “Geophagus” clade of Neotropical cichlids (Figure 3.1).

3.4 Discussion

3.4.1 Divergent Selection Between Clades, with Positive Selection Throughout

We used Clade model C to determine whether the selection regime is divergent among the two major clades of geophagine cichlids, and found evidence for divergent selection pressure at 6-7% of amino acid sites. The omega value in each clade was found to be significantly greater than 1, indicating that the divergent class is on average under positive selection in all three partitions considered (Table 3.2). Previous Clade model C analyses had lumped all Neotropical cichlids together, and found that the divergent class in this group was on average under positive selection, with an average omega value of 2.15 (Chapter 2). The present analysis suggests a higher average omega value of 4.15 (Table 3.2), likely because sites under positive selection in African cichlids but not in Neotropical cichlids would have contributed to the average omega value in the divergent class of the Neotropical vs. African tests. Our results indicate that the distribution of positive selection is non-uniform within the Neotropical cichlids. This is demonstrated by the distribution of amino acid residues in the two clades within Geophagini, in that some sites are variable in one clade but not the other, and some substitutions are unique to a particular clade (Figure 3.1).

It is presently unclear why selective constraint on rhodopsin should differ between these clades, or why different substitutions should be favoured in each clade, as there are no obvious distinctions between the clades in terms of habitat, range, or photic environment, and important morphological and life history innovations (such as the presence of the epibranchial lobe and behaviours such as mouth-brooding young) occur in members of each clade (López-Fernández et al. 2012). However, geophagine genera are very old (Malabarba et al. 2010, López-Fernández et al. in review), and members of many genera are morphologically distinct from each other (López-Fernández et al. 2012), indicating that ecomorphological specialization has occurred among genera. It is possible that positive selection has acted on the rhodopsin genes of each genus independently, and that dividing the phylogeny into the two clades is artificial in terms of how selective constraint is distributed with respect to the phylogeny. If this is the case, the higher omega values within the “Geophagus” clade could be due to either stronger positive selection in particular lineages, or positive selection occurring on more lineages, than in the “Satanoperca” clade, rather than to differences in selective constraint between the two clades as a whole. This hypothesis is tentatively supported by the observation that there are no amino acid sites where all members of each clade share a residue that differs from the residue found in the other clade.

3.4.2 Clade model C vs. Branch-site Results

We used branch-site models with the single lineage leading to each of the major clades as the foreground, to determine if positive selection in these clades was the result of a burst of selection following the divergence of the ancestor of the two groups. We found non-significant results, with no difference in likelihood between the null and alternative models. This is perhaps unsurprising, because although these nodes are very strongly supported in the phylogeny, the branches leading to these groups are very short (López-Fernández et al. 2010). We did not perform ancestral reconstructions of these sequences, but given the short length of these branches it is possible that no amino acid substitutions occurred along them.

We then used branch-site models with the entire “Satanoperca” and the entire “Geophagus” group designated as the foreground respectively. We found evidence for positive selection in the “Geophagus” group when it was designated as the foreground, but no evidence for positive selection in the “Satanoperca” group when it was designated as the foreground. This is in conflict with our Clade model C results, which found evidence for positive selection in both of these clades (Table 3.2). In Chapter 2, we suggested that because branch-site models are designed to detect positive selection in particular lineages in an otherwise neutrally or conservatively evolving background (Yang et al. 2005; Zhang et al. 2005), they may have low power to detect positively selected sites in the foreground clade when the same sites are also under positive selection in the background. The results presented here further support this conclusion: the branch-site models were able to detect positive selection when the “Geophagus” clade, which has an omega value of 5.24 according to Clade model C, was designated as the foreground, but not when the “Satanoperca” clade, which has a lower omega value of 2.48 that is still significantly greater than one, was designated as the foreground (Table 3.2).

Overall, the discrepancies between the results of the branch-site models and CmC in Chapter 2 and in this chapter suggest that applying branch-site models to an entire clade may be inappropriate for assessing differences in selective constraint among clades, especially if one is interested in determining whether positive selection has acted on residues within a clade. The branch-site models were designed to detect episodes of positive selection against a background of otherwise purifying or neutral selection (Yang et al. 2005; Zhang et al. 2005), and are usually applied to a single branch in a phylogeny. To our knowledge, the situation where the test is applied to multiple lineages simultaneously or to an entire clade has not undergone statistical review, but the test is used in this way fairly commonly (e.g., Spady et al. 2005, Ramm et al. 2008, Yoshinda 2011). Based on the results presented here, we hypothesize that the branch-site method has low power to detect positive selection in the foreground clade if there is also positive selection in the background clade, and suggest that the power of the branch-site test used in this manner should be tested using simulation studies.

3.5 Tables

Table 3.1: Supplementary Table. Species list, museum catalogue numbers, and accession numbers for sequences used in this study. Tissues are from the Royal Ontario Museum Ichthyology collection in Toronto, Canada. Species names follow López-Fernández et al. 2010.

Partition    Species                         Catalogue numbers    Accession number  Reference
Neotropical  Retroculus xinguensis           HLF1230              JX576463          This study
Neotropical  Cichla temensis                 HLF61, HLF80         JX576464          This study
Neotropical  Biotoecus dicentrarchus         HLF75                JX576465          This study
Neotropical  Chaetobranchus flavescens       HLF517               JX576466          This study
Neotropical  Mazarunia sp. 1                 T06044               JX576467          This study
Neotropical  Mazarunia sp. 2                 T06235               JX576468          This study
Neotropical  Guianacara owroewefi            HLF485               JX576469          This study
Neotropical  Guianacara stergiosi            HLF125               JX576470          This study
Neotropical  Crenicichla ‘Orinoco lugubris’  HLF667               JX576471          This study
Neotropical  Crenicichla geayi               HLF18                JX576472          This study
Neotropical  Crenicichla ‘Orinoco wallacii’  HLF68                JX576474          This study
Neotropical  Crenicichla frenata             na                   JN990736.1        Weadick et al. 2012
Neotropical  Teleocichla sp.                 HLF1358              JX576473          This study
Neotropical  Taeniacara candidi              HLF152               JX576475          This study
Neotropical  Apistogramma agassizi           HLF5, HLF7           JX576476          This study
Neotropical  Apistogramma hoignei            HLF23, HLF25, HLF42  JX576477          This study
Neotropical  Satanoperca daemon              HLF64, HLF90         JX576478          This study
Neotropical  Satanoperca leucosticta         HLF498               JX576479          This study
Neotropical  Satanoperca mapiritensis        HLF117, HLF132       JX576480          This study
Neotropical  Satanoperca jurupari            HLF184               JX576481          This study
Neotropical  Crenicara punctulatum           HLF282               JX576482          This study
Neotropical  Dicrossus filamentosus          HLF143               JX576483          This study
Neotropical  Biotodoma cupido                HLF1, HLF3           JX576484          This study
Neotropical  Biotodoma wavrini               HLF13, HLF55         JX576485          This study
Neotropical  Mikrogeophagus ramirezi         HLF37                JX576486          This study
Neotropical  ‘Geophagus’ brasiliensis        HLF145, HLF727       JX576487          This study
Neotropical  ‘Geophagus’ steindachneri       HLF726               JX576488          This study
Neotropical  Gymnogeophagus setequedas       HLF302               JX576489          This study
Neotropical  Geophagus abalios               HLF88, TO8707        JX576490          This study
Neotropical  Geophagus dicrozoster           HLF83, HLF84         JX576491          This study
Neotropical  Geophagus harreri               HLF277               JX576492          This study
Neotropical  Oreochromis niloticus           na                   AB084938.1        Sugawara et al. 2005
Indian       Etroplus maculatus              na                   EF095630.1        Chen et al. 2007

Table 3.2: Parameter estimates, likelihood values, test statistics, and p values for CmC analysis of a tree with three partitions: the “Satanoperca” clade, the “Geophagus” clade, and a clade of basal outgroups.

Table 3.3: Likelihood values, test statistics, and p values for likelihood ratio tests for branch-site models.

Partition                                      Test         Np  Tree length  kappa    LNL      Test statistic  P value
Geophagini as foreground                       Alternative  63  1.27071      2.6474   -3114.1
                                               null         62  1.22962      2.45476  -3124.8  21.4            <.001
Satanoperca as foreground                      Alternative  63  2.65         2.65     -3124.0
                                               null         62  1.23         2.45     -3124.8  1.67            0.567
Single lineage to Geophagini as foreground     Alternative  63  1.22962      2.45472  -3124.8
                                               null         62  1.22962      2.45469  -3124.8  0               1
Single lineage to Satanoperca as foreground    Alternative  63  1.22961      2.45472  -3124.8
                                               null         62  1.22962      2.45473  -3124.8  0               1
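The test statistics in Table 3.3 follow the usual likelihood ratio construction: twice the difference in log-likelihood between the alternative and null models. A minimal Python sketch of that arithmetic follows, using the log-likelihoods reported for the first comparison. The function names are illustrative, and the chi-square reference distribution with one degree of freedom (the difference in Np between the two models) is an assumption, since this section does not state which null distribution was used to obtain the reported p values.

```python
import math


def lrt_statistic(lnl_alternative, lnl_null):
    """Likelihood ratio statistic: twice the log-likelihood difference."""
    return 2.0 * (lnl_alternative - lnl_null)


def chi2_sf_df1(statistic):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom.

    Assumed null distribution; for df = 1 the tail area equals erfc(sqrt(x / 2)).
    """
    if statistic <= 0:
        return 1.0
    return math.erfc(math.sqrt(statistic / 2.0))


# Log-likelihoods from the first comparison in Table 3.3.
stat = lrt_statistic(-3114.1, -3124.8)
p_value = chi2_sf_df1(stat)
print(round(stat, 1), p_value < 0.001)
```

Under these assumptions the statistic rounds to the reported value of 21.4, and the tail probability falls below 0.001.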

3.6 Figures

Figure 3.1: Amino acid residues at divergently selected sites in geophagine cichlids and some Neotropical basal outgroups. Amino acid residues with hydrophobic chains are in shades of blue, aromatic in red, acidic in green, basic in orange, amides in white, small in yellow, and nucleophilic in purple. Within each category, residues with larger side chains are darker.

Chapter 4

Conclusions and Future Directions

4.1 Conclusions

This thesis describes aspects of the molecular evolution of rhodopsin in Neotropical cichlids, with a focus on the tribe Geophagini. Before the work described in this thesis was conducted, visual system genes had been sequenced for only a single species of Neotropical cichlid, the geophagine Crenicichla frenata from Trinidad. Based on some surprising findings within this single species (Weadick and Chang 2012), and because the evolution of visual systems has been important in the speciation and diversification of African rift lake cichlids (Carleton 2009), we set out to investigate the molecular evolution of the dim-light visual pigment, rhodopsin, in a wider phylogenetic context in Neotropical cichlids.

In Chapter 2, a fragment of the rhodopsin gene was sequenced for 31 species of Neotropical cichlid and two species of African riverine cichlid. These sequences were combined with publicly available sequences for C. frenata, 20 African species, and one species from India to compare patterns of selective constraint 1) between Neotropical and African cichlids, 2) between riverine cichlids and lake cichlids, and 3) in a three-way comparison of Neotropical cichlids, African rift lake cichlids, and African riverine cichlids. In Chapter 3, patterns of selective constraint in rhodopsin were compared between the two clades of geophagine cichlids, the “Geophagus” clade and the “Satanoperca” clade. Possible effects of positively and divergently selected amino acid substitutions on rhodopsin function were discussed in both chapters, and both chapters included a comparison of two likelihood-based codon models of molecular evolution: the branch-site models and Clade model C (Yang 2007).

We were able to show very high levels of positive selection in the rhodopsin gene of Neotropical geophagine cichlids using our new sequence data, and were also able to confirm previous reports of positive selection in African rift lake cichlid rhodopsin (Spady et al. 2005), but we found no evidence for positive selection in African riverine cichlid rhodopsin (Chapter 2). On average, selective constraint differed between Neotropical cichlids and African cichlids, between lake cichlids and riverine cichlids, and among Neotropical cichlids, African rift lake cichlids, and African riverine cichlids. Intriguingly, the sets of amino acid sites under positive selection in African rift lake cichlids and in Neotropical cichlids are almost entirely non-overlapping, suggesting that selective pressure is divergent between these clades. Based on their location in the 3D structure, substitutions at these sites may be influencing non-spectral properties of rhodopsin such as rates of retinal release or the dimerization interface between rhodopsin monomers. However, not all substitutions have clear functional correlates, and further experiments would need to be conducted to investigate their potential effects on rhodopsin function (Chapter 2).

Given the high levels of positive selection found in our dataset, the question remains as to what effect these substitutions may have on visual ability in an ecological context, and how ecology or habitat may have affected the molecular evolution of the rhodopsin pigment. Why should selective constraint on rhodopsin be so different on the two continents?

Although not estimated to be under positive selection, an interesting pattern of substitution was found at amino acid site 83, which has bearing on how positive selection at other sites is interpreted. As described in Chapter 2, all Neotropical cichlids with the exception of the basal Retroculus xinguensis have an asparagine (Asn) residue at this site. Most African cichlids have aspartic acid (Asp) at this site, with Asn83 occurring only in deep-water cichlids. Asp83 forms hydrogen bonds with asparagine residues at sites 55 and 302 in the dark state of bovine rhodopsin, and also with a glycine at residue 120 via a structural water molecule (Palczewski et al. 2000, Okada et al. 2002). These interactions are thought to stabilize the Meta I state; when aspartic acid is replaced by asparagine at site 83, the equilibrium between Meta I and Meta II is thought to shift towards Meta II, increasing the efficiency of photic signal transduction (Sugawara et al. 2010). This has been interpreted as an adaptation for vision in dim light. Meta II formation times are variable in vitro among African lake cichlids (Sugawara et al. 2010), suggesting that other residues also have an effect on this property. We suggest that the Asn83 substitution common to most Neotropical cichlids studied thus far may not be adaptive, given that there is currently no evidence that rapid Meta II formation is adaptive outside of deep-water (or dim-light) habitats. This substitution could affect other aspects of rhodopsin function, but at least in African cichlids the Asn residue appears to be primarily an adaptation for dim-light vision (Sugawara et al. 2010). It is possible that the Asn83 residue is present in Neotropical cichlids either because it was adaptive early in evolutionary history or because of a selectively neutral substitution that swept through the ancestral species.
Making the parsimonious assumption that this residue did arise early in geophagine evolutionary history, it likely arose between 118.5 mya (the estimated date at which the lineages leading to (Geophagini + Chaetobranchini) and (Astronotini + Cichlasomatini + Heroini) diverged) and 124 mya (the estimated date at which Cichlini and Retroculini separated from all other cichlids) (López-Fernández et al. in review). South America has experienced multiple marine incursions since the separation from Africa, resulting in deep-water habitat throughout much of the continent (Lundberg et al. 1998, Bloom and Lovejoy 2011). If cichlid diversification occurred in ancient deep lake habitats, Asn83 may have been an adaptive substitution early in geophagine evolutionary history. Although this is highly speculative, the presence of Asn83 in riverine cichlids today, many of which are not in dim-light habitats, could have driven positive selection at other amino acid sites for increased Meta I stability, to reverse its effect. This could account for some of the positive selection we observe in Neotropical cichlid rhodopsin, and help explain some of the large differences in selective constraint between Neotropical and African cichlid rhodopsins.

In Chapter 3 we showed that positive selection on rhodopsin is pervasive throughout the Neotropical cichlid species sampled, and that patterns of selective constraint are distributed non-uniformly within the group, with higher levels of positive selection in the “Geophagus” clade than in the “Satanoperca” clade. However, many amino acid sites in the divergently selected class have similar patterns of substitution in the two clades, with the same residues occurring in both. Only sites 217, 270, and 274 show substitution patterns with unique residues in each of the clades within the tribe Geophagini. Given the lack of broad differences in substitution patterns among clades, we suggest that the difference in the average level of positive selection between clades is not the result of broad differences in ecology or habitat, but rather the combined result of positive selection acting on specific species, which happens to be stronger on average in species from the “Geophagus” group. Interpreting these results in terms of ecology should therefore be done in a species-specific (or possibly genus-specific) manner.
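The idea of a site carrying unique residues in each clade can be made concrete as a check that the sets of residues observed in the two clades do not overlap at that site. The following Python sketch illustrates that check only. The sequences and site numbers are hypothetical toy values, not data from this thesis, and the function name is illustrative.

```python
def clade_unique_sites(clade_a, clade_b, site_numbers):
    """Return the site numbers at which the two clades share no residue.

    clade_a, clade_b: lists of equal-length aligned amino acid strings.
    site_numbers: labels for the alignment columns, in column order.
    """
    unique = []
    for idx, site in enumerate(site_numbers):
        residues_a = {seq[idx] for seq in clade_a}
        residues_b = {seq[idx] for seq in clade_b}
        # A site qualifies only when no residue occurs in both clades.
        if residues_a.isdisjoint(residues_b):
            unique.append(site)
    return unique


# Hypothetical three-column alignment for two clades (toy data).
clade_one = ["AVT", "AVS"]
clade_two = ["AIT", "ALT"]
print(clade_unique_sites(clade_one, clade_two, [1, 2, 3]))  # prints [2]
```

In the toy alignment only the second column qualifies, because the first and third columns each contain a residue found in both clades.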

Both Chapters 2 and 3 use branch-site models and Clade model C to assess among-clade divergence in rhodopsin. In Chapter 2 we show that branch-site tests are unable to detect (or detect with much lower significance) positively selected sites identified by the site models if the site in question is under positive selection in the background as well as the foreground. In Chapter 3, we found that the branch-site test was not significant when the “Satanoperca” clade was designated as the foreground, even though Clade model C suggested that this clade contained a divergently selected site class under positive selection, with a lower average value of omega than in the “Geophagus” clade that formed the background of the branch-site test. Both of these results suggest that branch-site models have low power to detect positive selection in a foreground clade if there is also positive selection in the background. There are some biological questions for which this is not a problem, i.e., if one is interested in knowing whether there is positive selection above the background level in a particular clade or lineage. However, we conclude that Clade model C is more appropriate for assessing among-clade divergence in protein-coding genes.

4.2 Future Directions

Questions addressed in this thesis were inspired by the plethora of studies conducted in the African rift lake cichlids (see Seehausen 2006, Carleton 2009 for notable reviews), as well as by work done in C. frenata by Weadick et al. (2012) and by increasing evidence for ecomorphological specialization within Neotropical cichlids, and in Geophagini in particular (López-Fernández et al. 2012). The results presented in this thesis build on this foundation, providing further evidence that investigating the molecular evolution of visual systems in Neotropical cichlids is interesting both as a comparison to the African rift lake cichlids and as a means of understanding evolutionary and ecological processes within the Neotropical cichlids. This system has just begun to be explored, and there are many avenues of potential future research.

The most obvious future direction would be to expand the analyses to more lineages within the Neotropical cichlids, to determine whether positive selection on rhodopsin acts in other Neotropical clades. The geophagines are one of seven tribes of Neotropical cichlids (Retroculini, Cichlini, Chaetobranchini, Geophagini, Astronotini, Cichlasomatini, and Heroini), two of which (Heroini and Cichlasomatini) are also characterized by short branch lengths at the root of the radiation (López-Fernández et al. 2010) and by declining lineage accumulation over time (López-Fernández et al. in review), traits that are characteristic of adaptive radiation. The Heroini in particular are also very morphologically diverse, and very important to riverine community structure, as they make up to 25% of the ichthyofauna of Mesoamerica (Perez et al. 2007). They would therefore be a good candidate clade in which to further pursue studies on the molecular evolution of rhodopsin in Neotropical cichlids.

A second major avenue of research would be to begin investigating the molecular evolution of the cone opsins of Neotropical cichlids. One of the most interesting findings from the C. frenata study was that this species has three fewer cone opsins than the African cichlids, due to a loss of the SWS1 pigment, pseudogenization of the RH2b pigment, and an African-specific duplication of the RH2a pigment into RH2aα and RH2aβ (Weadick et al. 2012). The genus Crenicichla has the fastest and most heterogeneous rates of molecular evolution within the geophagines (Farias et al. 1999, López-Fernández et al. 2005a), and is the only group to feed primarily on fish (López-Fernández et al. 2012). They are therefore somewhat atypical among geophagines, and it is unclear whether other geophagines may also have a reduced opsin complement compared to African cichlids. The phylogenetic extent of the SWS1 loss and the RH2b pseudogenization could be pursued in future studies. Although this line of research is mostly inspired by Weadick et al. (2012), the existence of positive selection on rhodopsin (Chapter 2) and variation in selective constraint (Chapter 3) within the geophagines provide further motivation, as the visual systems of these fishes in general appear to be evolving under the influence of natural selection. Related to this line of research, it is possible that the positive selection found in the SWS2 gene of C. frenata (Weadick et al. 2012) is related to the loss of the SWS1 pigment, for example if the selection on SWS2 in some way compensates for SWS1 loss. Investigations into patterns of selective constraint on the SWS2 pigment could be conducted to determine whether positive selection in SWS2 is correlated with SWS1 loss.

A third avenue of future research involves comparing the branch-site and Clade model C methods more thoroughly. We compare the results of branch-site models and Clade model C in both Chapters 2 and 3, but this was done in a post-hoc manner: we noticed discrepancies in the results of the two models that occurred consistently across analyses, and formulated the hypothesis that these inconsistencies are due to the branch-site test having low power to detect positive selection in the foreground when there is also positive selection occurring in the background. However, it is unclear whether these discrepancies are related to the specific circumstances of these analyses, and under what conditions they are likely to arise. This hypothesis could be rigorously tested through simulation studies, and clarifying this methodological point could be immensely useful to future studies investigating patterns of selective constraint across clades.
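One way the outcome of such a simulation study could be summarized is as the fraction of simulated replicates in which the branch-site likelihood ratio test rejects the null model. The Python sketch below assumes that the likelihood ratio statistics have already been obtained by fitting the models to simulated datasets. The values shown are hypothetical placeholders, not results, and the threshold of 3.84 (the 5% critical value of a chi-square distribution with one degree of freedom) is an assumed choice of null distribution.

```python
def estimate_power(lrt_statistics, critical_value=3.84):
    """Proportion of simulated replicates in which the test rejects the null.

    lrt_statistics: likelihood ratio statistics, one per simulated dataset
    generated with positive selection in the foreground.
    critical_value: rejection threshold (assumed: chi-square, df = 1, 5% level).
    """
    if not lrt_statistics:
        raise ValueError("at least one replicate is required")
    rejections = sum(1 for stat in lrt_statistics if stat > critical_value)
    return rejections / len(lrt_statistics)


# Hypothetical statistics from four simulated replicates (placeholders only).
simulated_statistics = [0.0, 1.2, 5.0, 21.4]
print(estimate_power(simulated_statistics))  # prints 0.5
```

Comparing this proportion between simulations with and without positive selection in the background would address the hypothesis directly.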

Whether the analysis presented in this thesis is extended to include more species or more opsins, the question remains whether amino acid substitutions highlighted as being under positive or divergent selection are adaptive, and what effect they have on organismal vision. In their seminal 1979 paper, Gould and Lewontin warned against what they saw as an adaptationist bias, reminding the scientific community that functional observations do not always have adaptive explanations. Although the original implementation of PAML was described as a method for detecting molecular adaptation (Yang and Bielawski 2000), this interpretation of positively selected sites has been found inadequate without subsequent studies linking BEB sites to function, and function to fitness (Hughes 2008; Yokoyama et al. 2008; MacCallum and Hill 2008; Nozawa et al. 2009). However, the extensive mutagenesis studies performed in rhodopsin do provide a basis for making predictions about the possible effects of substitutions at BEB sites, and correlation of function with habitat characteristics may indicate whether they are adaptive (Hughes 2008). Even so, substitutions at positively selected sites may not be adaptive in a way that is easily interpretable, such as if selection has acted on some pleiotropic effect of multiple substitutions to produce signatures of positive selection (Anisimova and Liberles 2012), or if there is no way to predict what effect a positively selected amino acid substitution may have on protein function due to a lack of relevant mutagenesis studies. In some cases, positively selected sites may not be adaptive at all, such as if a particular site has a dN/dS ratio greater than one due to stochasticity in the substitution process (Hughes 2008). Many authors have performed mutagenesis experiments to determine whether amino acid substitutions at positively selected sites have an effect on protein function that is relevant to organismal fitness (e.g., Ivarsson et al. 2003, Sawyer et al. 2005, Levasseur et al. 2006, Yuan et al. 2010, Loughran et al. 2012, Patel et al. 2012), and in some cases a direct link between positively selected amino acid substitutions and fitness has been established (Moury and Simon 2011, in a potato coat virus). To determine whether positively selected amino acid substitutions in Neotropical cichlid rhodopsin are adaptive, one could use mutagenesis to create the relevant proteins in the lab and measure aspects of their function. Shifts in amino acid residues that directly correlate with functional shifts have been taken as preliminary evidence, and if functional shifts could be shown to have a selective advantage in the context of the environment, this would be good evidence that the amino acid substitution was adaptive (Levasseur et al. 2007).
Correlating functional change or amino acid substitutions with environmental characteristics may be a very challenging problem. Visual pigments are most likely to undergo natural selection imposed by properties of the photic environment, which may be difficult to measure. Properties of the photic environment can change substantially over short time scales in the Neotropics; for example, the Amazon river and its tributaries undergo extensive flooding on a yearly basis (Junk 1997), and turbidity and sediment load can change rapidly due to anthropogenic activities. In the case of Neotropical cichlids, if the amino acid substitutions identified as being under positive selection confer functional differences in the protein, and these can be correlated with aspects of cichlid habitat diversity (for example, if inhabiting blue-shifted waters were correlated with blue-shifted peak wavelength absorbance of opsin proteins), this would provide evidence that the elevated levels of non-synonymous substitutions compared to synonymous substitutions were driven by Darwinian natural selection.

Chapter 5

References

Albernaz AL, Pressey RL, Costa LRF, Moreira MP, Ramos JF, Assunção PA, Franciscon CH. 2012. Tree species compositional change and conservation implications in the white-water flooded forests of the Brazilian Amazon. Journal of biogeography. 39(5):869-883.

Albert JS, Reis RE. 2011. Historical Biogeography of Neotropical Freshwater Fishes. Berkeley (CA): University of California Press.

Albertson RC, Markert JA, Danley P, Kocher TD. 1999. Phylogeny of a rapidly evolving clade: The cichlid fishes of Lake Malawi, East Africa. PNAS. 96(9):5107-5110.

Anisimova M, Liberles D. 2012. Detecting and understanding natural selection. In: Codon Evolution: mechanisms and models. Cannarozzi G, Schneider A, editors. Oxford: Oxford University Press.

Awiti AO. 2011. Biological Diversity and Resilience: Lessons from the Recovery of Cichlid Species in Lake Victoria. Ecology and society. 16:e9.

Barlow GW. 2000. The Cichlid Fishes: Nature’s Grand Experiment in Evolution. Cambridge: Perseus Publishing.

Barluenga M, Stölting KN, Salzburger W, Muschick M, Meyer A. 2006. Sympatric speciation in Nicaraguan crater lake cichlid fish. Nature. 439(7077):719-23.

Bielawski J, Yang Z. 2004. A Maximum Likelihood Method for Detecting Functional Divergence at Individual Codon Sites, with Application to Gene Family Evolution. Journal of molecular evolution. 59(1):121-132.

Bloom DD, Lovejoy NR. 2011. The biogeography of marine incursions in South America. In: Albert JS, Reis RE, editors. Historical Biogeography of Neotropical Freshwater Fishes. Berkeley and Los Angeles (CA): University of California Press.

Boughman J. 2002. How sensory drive can promote speciation. Trends in ecology & evolution. 17(12):571-577.

Bowmaker JK. 1995. The visual pigments of fish. Progress in retinal and eye research. 15(1):1-31.

Bowmaker JK. 2008. Evolution of vertebrate visual pigments. Vision research. 48(20):2022-2041.

Breikers G, Bovee-Geurts PH, DeCaluwé GL, DeGrip WJ. 2001. A structural role for Asp83 in the photoactivation of rhodopsin. Biological chemistry. 382(8):1263-70.

Burns ME, Pugh EN. 2010. Lessons from photoreceptors: Turning off G-protein signalling in living cells. Physiology. 25(2):72-84.

Carleton KL, Hárosi FI, Kocher TD. 2000. Visual pigments of African cichlid fishes: evidence for ultraviolet vision from microspectrophotometry and DNA sequences. Vision research. 40(8):879-90.

Carleton KL, Kocher TD. 2001. Cone opsin genes of African cichlid fishes: tuning spectral sensitivity by differential gene expression. Molecular biology and evolution. 18(8):1540-50.

Carleton KL, Parry JW, Bowmaker JK, Hunt DM, Seehausen O. 2005. Colour vision and speciation in Lake Victoria cichlids of the genus Pundamilia. Molecular ecology. 14(14):4341-53.

Carleton KL. 2009. Cichlid fish visual systems: mechanisms of spectral tuning. Integrative zoology. 4(1):75-86.

Carleton KL, Hofmann CM, Klisz C, Patel Z, Chircus LM, Simenauer LH, Soodoo N, Albertson RC, Ser JR. 2010. Genetic basis of differential opsin gene expression in cichlid fishes. Journal of evolutionary biology. 23(4):840-53.

Chang B, Du J, Weadick CJ, Müller J, Bickleman C, Yu D, Morrow J. 2012. The Future of Codon Models in Studies of Molecular Function: Ancestral Reconstruction, and Clade Models of Functional Divergence. In: Cannarozzi G, Schneider A, editors. Codon Evolution: Mechanisms and Models. Oxford: Oxford University Press.

Chen W-J, Bonillo C, Lecointre G. 2003. Repeatability of clades as a criterion of reliability: a case study for molecular phylogeny of Acanthomorpha (Teleostei) with larger number of taxa. Molecular phylogenetics and evolution. 26(2):262-288.

Chen W-J, Ruiz-Carus R, Ortí G. 2007. Relationships among four genera of mojarras (Teleostei: Perciformes: Gerreidae) from the western Atlantic and their tentative placement among percomorph fishes. Journal of fish biology. 70(Supplement B):202-218.

Chinen A, Hamaoka T, Yamada Y, Kawamura S. 2003. Gene duplication and spectral diversification of cone visual pigments of zebrafish. Genetics. 163(2):663-75.

Crick FHC. 1968. Origin of Genetic Code. Journal of molecular biology. 38(3):367-379.

DeCaluwé GL, Bovee-Geurts PH, Rath P, Rothschild KJ, de Grip WJ. 1995. Effect of carboxyl mutations on functional properties of bovine rhodopsin. Biophysical chemistry. 56(1-2):79-87.

DeLano WL. 2002. The PyMOL Molecular Graphics System. http://www.pymol.org.

Duret L. 2002. Evolution of synonymous codon usage in metazoans. Current opinion in genetics & development. 12(6):640-649.

Farias IP, Ortí G, Sampaio I, Schneider H, Meyer A. 1999. Mitochondrial DNA phylogeny of the family Cichlidae: Monophyly and fast molecular evolution of the neotropical assemblage. Journal of molecular evolution. 48(6):703-11.

Farias IP, Ortí G, Meyer A. 2000. Total evidence: molecules, morphology, and the phylogenetics of cichlid fishes. The Journal of experimental zoology. 288(1):76-92.

Farias IP, Ortí G, Sampaio I, Schneider H, Meyer A. 2001. The Cytochrome b gene as a phylogenetic marker: the limits of resolution for analyzing relationships among cichlid fishes. Journal of molecular evolution. 53(2):89-103.

Farrens DL, Altenbach C, Yang K, Hubbell WL, Khorana HG. 1996. Requirement of rigid-body motion of transmembrane helices for light activation of rhodopsin. Science. 274(5288):768-70.

Fasick JI, Robinson PR. 2000. Spectral-tuning mechanisms of marine mammal rhodopsins and correlations with foraging depth. Visual neuroscience. 17(5):781-8.

Fay JC, Wu C. 2003. Sequence divergence, functional constraint, and selection in protein evolution. Annual review of genomics and human genetics. 4:213-35.

Felsenstein J. 1981. Evolutionary trees from DNA sequences: a maximum likelihood approach. Journal of molecular evolution. 17(6):368-376.

Forsberg R, Christiansen FB. 2003. A Codon-Based Model of Host-Specific Selection in Parasites, with an Application to the Influenza A Virus. Molecular biology and evolution. 20(8):1252-1259.

Fotiadis D, Jastrzebska B, Philippsen A, Müller DJ, Palczewski K, Engel A. 2006. Structure of the rhodopsin dimer: a working model for G protein-coupled receptors. Current opinion in structural biology. 16(2):252-259.

Glor RE. 2010. Phylogenetic Insights on Adaptive Radiation. Annual review of ecology, evolution, and systematics. 41:251-270.

Goldman N. 1993. Statistical tests of models of DNA substitution. Journal of molecular evolution. 36(2):182-198.

Goldman N, Yang Z. 1994. A codon-based model of nucleotide substitution for protein-coding DNA sequences. Molecular biology and evolution. 11(5):725-36.

Guo W, Shi L, Filizola M, Weinstein H, Javitch JA, Karlin A. 2005. Cross-talk in G protein-coupled receptors: Changes at the transmembrane homodimer interface determine activation. PNAS. 102(48):17495-17500.

Hasegawa M, Kishino H, Yano T. 1985. Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. Journal of molecular evolution. 22(2):160-174.

Hildebrand PW, Scheerer P, Park JH, Choe H-W, Piechnick R, Ernst OP, Hofmann KP, Heck M. 2009. A ligand channel through the G protein coupled receptor opsin. PloS one. 4:e4382.

Hirsh AE, Fraser HB, Wall DP. 2005. Adjusting for selection on synonymous sites in estimates of evolutionary distance. Molecular biology and evolution. 22(1):174-177.

Hoffmann C, Zürn A, Bünemann M, Lohse MJ. 2008. Conformational changes in G-protein-coupled receptors - the quest for functionally selective conformations is open. British journal of pharmacology. 53(S1):S358-66.

Hofmann KP, Scheerer P, Hildebrand PW, Choe HW, Park JH, Heck M, Ernst OP. 2009. A G protein-coupled receptor at work: the rhodopsin model. Trends in biochemical sciences. 34(11):540-52.

Hope AJ, Partridge JC, Dulai KS, Hunt DM. 1997. Mechanisms of wavelength tuning in the rod opsins of deep-sea fishes. Proceedings biological sciences. 264(1379):155-63.

Houde AE. 1997. Sex, Color, and Mate Choice in Guppies. Princeton (NJ): Princeton University Press.

Huelsenbeck JP, Rannala B. 1997. Phylogenetic methods come of age: Testing hypotheses in an evolutionary context. Science. 276(5310):227-232.

Hughes AL. 2008. The origin of adaptive phenotypes. PNAS. 105:13193-4.

Hunt DM, Fitzgibbon J, Slobodyanyuk SJ, Bowmaker JK. 1996. Spectral tuning and molecular evolution of rod visual pigments in the species flock of cottoid fish in Lake Baikal. Vision research. 36(9):1217-24.

Hunt DM, Dulai KS, Partridge JC, Cottrill P, Bowmaker JK. 2001. The molecular basis for spectral tuning of rod visual pigments in deep-sea fish. Journal of experimental biology. 204(19):3333-3344.

Hurst LD. 2002. The Ka/Ks ratio: diagnosing the form of sequence evolution. Trends in genetics. 18(9):486.

Iismaa TP, Biden TJ, Shine J. 1995. G protein-coupled receptors. Austin (TX): R.G. Landes Company.

Ivarsson Y, Mackey AJ, Edalat M, Pearson WR, Mannervik B. 2003. Identification of residues in glutathione transferase capable of driving functional diversification in evolution, a novel approach to protein redesign. The Journal of biological chemistry. 278(10):8733-8.

Jordan R, Kellogg K, Howe D, Juanes F, Stauffer JR, Loew ER. 2006. Photopigment spectral absorbance of Lake Malawi cichlids. Journal of fish biology. 68(4):1291-9.

Jukes TH, Cantor CR. 1969. Evolution of protein molecules. In: Munro HN, editor. Mammalian protein metabolism. New York (NY): Academic Press. p. 21-123.

Junk WJ. 1997. General aspects of floodplain ecology with special reference to Amazonian floodplains. In: Junk WJ, editor. The Central Amazon floodplain: ecology of a pulsing system. Berlin: Springer-Verlag. p. 3-20.

Kamnoeva O, Liberles DA, Ward NL. 2010. Genome-wide influence of indel substitutions on evolution of bacteria of the PVC superphylum, revealed using a novel computational method. Genome biology and evolution. 2:870-886.

Katongo C, Koblmüller S, Duftner N, Makasa L, Sturmbauer C. 2005. Phylogeography and speciation in the Pseudocrenilabrus philander species complex in Zambian Rivers. Hydrobiologia. 542(1):221-233.

Katongo C, Koblmüller S, Duftner N, Mumba L, Sturmbauer C. 2007. Evolutionary history and biogeographic affinities of the serranochromine cichlids in Zambian rivers. Molecular phylogenetics and evolution. 45(1):326-38.

Khan MMG, Rydén A-M, Chowdhury MS, Hasan MA, Kazi JU. 2011. Maximum likelihood analysis of mammalian p53 indicates the presence of positively selected sites and higher tumorigenic mutations in purifying sites. Gene. 483(1-2):29-35.

Kimura M. 1983. The Neutral Theory of Molecular Evolution. Cambridge: Cambridge University Press.

Kishino H, Miyata ZT, Hasegawa M. 1990. Maximum likelihood inference of protein phylogeny and the origin of chloroplasts. Journal of molecular evolution. 31(2):151-160.

Koblmüller S, Sefc KM, Duftner N, Katongo C, Tomljanovic T, Sturmbauer C. 2008. A single mitochondrial haplotype and nuclear genetic differentiation in sympatric colour morphs of a riverine cichlid fish. Journal of evolutionary biology. 21(1):362-367.

Koblmüller S, Egger B, Sturmbauer C, Sefc KM. 2010. Rapid radiation, ancient incomplete lineage sorting and ancient hybridization in the endemic Lake Tanganyika cichlid tribe Tropheini. Molecular phylogenetics and evolution. 55(1):318-34.

Kochendoerfer GG, Lin SW, Sakmar TP, Mathies RA. 1999. How color visual pigments are tuned. Trends in biochemical sciences. 24(8):300-5.

Kocher TD. 2004. Adaptive evolution and explosive speciation: the cichlid fish model. Nature reviews: Genetics. 5(4):288-98.

Kreitman M, Akashi H. 1995. Molecular evidence for natural selection. Annual review of ecology and systematics. 26:403-422.

Kullander SO. 1998. A phylogeny and classification of the Neotropical Cichlidae (Teleostei: Perciformes). In: Malabarba LR, Reis RE, Vari RP, Lucena ZM, Lucena CAS, editors. Phylogeny and Classification of Neotropical Fishes. Editora Universitaria, Pontificia Universidad Catolica do Rio Grande do Sul, Porto Alegre. p. 461-498.

Larmuseau MHD, Huyse T, Vancampenhout K, Van Houdt JKJ, Volckaert FAM. 2010. High molecular diversity in the rhodopsin gene in closely related goby fishes: A role for visual pigments in adaptive speciation? Molecular phylogenetics and evolution. 55(2):689-98.

Levasseur A, Gouret P, Lesage-Meessen L, Asther Michèle, Asther Marcel, Record E, Pontarotti P. 2006. Tracking the connection between evolutionary and functional shifts using the fungal lipase/feruloyl esterase A family. BMC evolutionary biology. 6:92.

Levasseur A, Orlando L, Bailly X, Milinkovitch MC, Danchin EGJ, Pontarotti P. 2007. Conceptual bases for quantifying the role of the environment on gene evolution: the participation of positive selection and neutral evolution. Biological reviews of the Cambridge Philosophical Society. 82(4):551-72.

Levine JS, Macnichol EF. 1979. Visual Pigments in Teleost Fishes - Effects of Habitat, Microhabitat, and Behavior on Visual-System Evolution. Sensory processes. 3(2):95-131.

Li J, Liu Y, Xin X, Kim TS, Cabeza EA, Ren J, Nielsen R, Wrana JL, Zhang Z. 2012. Evidence for positive selection on a number of MicroRNA regulatory interactions during recent human evolution. PLoS genetics. 8:e1002578.

Lohse MJ. 2010. Dimerization in GPCR mobility and signaling. Current opinion in pharmacology. 10(1):53-8.

López-Fernández H, Honeycutt RL, Winemiller KO. 2005a. Molecular phylogeny and evidence for an adaptive radiation of geophagine cichlids from South America (Perciformes: Labroidei). Molecular phylogenetics and evolution. 34(1):227-44.

López-Fernández H, Honeycutt RL, Stiassny MLJ, Winemiller KO. 2005b. Morphology, molecules, and character congruence in the phylogeny of South American geophagine cichlids (Perciformes, Labroidei). Zoologica Scripta. 34(6):627-651.

López-Fernández H, Winemiller KO, Honeycutt RL. 2010. Multilocus phylogeny and rapid radiations in Neotropical cichlid fishes (Perciformes: Cichlidae: Cichlinae). Molecular phylogenetics and evolution. 55(3):1070-86.

López-Fernández H, Winemiller KO, Montaña C, Honeycutt RL. 2012. Diet-morphology correlations in the radiation of South American geophagine cichlids (Perciformes: Cichlidae: Cichlinae). PLoS ONE. 7:4.

López-Fernández H, Arbour JH, Winemiller KO, Honeycutt RL. In review. Testing for ancient adaptive radiations in Neotropical cichlid fishes.

Loughran NB, Hinde S, McCormick-Hill S, Leidal KG, Bloomberg S, Loughran ST, O’Connor B, O’Fagan C, Nauseef WM, O’Connell MJ. 2012. Functional Consequence of Positive Selection Revealed Through Rational Mutagenesis of Human Myeloperoxidase. Molecular biology and evolution. 29(8):2039-2046.

Lythgoe JN. 1979. The ecology of vision. Oxford: Oxford University Press.

MacCallum C, Hill E. 2006. Being positive about selection. PLoS biology. 4(3):e87.

Magurran AE. 2005. Evolutionary Ecology: The Trinidadian Guppy. Oxford: Oxford University Press.

Malabarba MC, Malabarba LR, Papa CD. 2010. Gymnogeophagus eocenicus, n. sp. (Perciformes: Cichlidae), an Eocene cichlid from the Lumbrera Formation in Argentina. Journal of vertebrate paleontology. 30(2):341-350.

Maan ME, Seehausen O, Söderberg L, Johnson L, Ripmeester EAP, Mrosso HDJ, Taylor MI, van Dooren TJM, van Alphen JJM. 2004. Intraspecific sexual selection on a speciation trait, male coloration, in the Lake Victoria cichlid Pundamilia nyererei. Proceedings biological sciences. 271(1556):2445-52.

Matsumoto Y, Fukamachi S, Mitani H, Kawamura S. 2006. Functional characterization of visual opsin repertoire in Medaka (Oryzias latipes). Gene. 371(2):268-78.

Menon ST, Han M, Sakmar TP. 2001. Rhodopsin: structural basis of molecular physiology. Physiological reviews. 81(4):1659-88.

Miyagi R, Terai Y, Aibara M, Sugawara T, Imai H, Tachida H, Mzighani SI, Okitsu T, Wada A, Okada N. 2012. Correlation between Nuptial Colors and Visual Sensitivities Tuned by Opsins Leads to Species Richness in Sympatric Lake Victoria Cichlid Fishes. Molecular biology and evolution. Online: corrected proof.

Moran P, Kornfield I. 1993. Retention of an Ancestral Polymorphism in the Mbuna Species Flock (Teleostei: Cichlidae) of Lake Malawi. Molecular biology and evolution. 10(5):1015-1029.

Morris MB, Dastmalchi S, Church WB. 2009. Rhodopsin: structure, signal transduction and oligomerisation. The international journal of biochemistry & cell biology. 41(4):721-4.

Moury B, Simon V. 2011. dN/dS-Based Methods Detect Positive Selection Linked to Trade-Offs between Different Fitness Traits in the Coat Protein of Potato virus Y. Molecular biology and evolution. 28(9):2707-17.

Muse SV, Gaut BS. 1994. A likelihood approach for comparing synonymous and nonsynonymous nucleotide substitution rates, with application to the chloroplast genome. Molecular biology and evolution. 11(5):715-24.

Nagai H, Terai Y, Sugawara T, Imai H, Nishihara H, Hori M, Okada N. 2011. Reverse evolution in RH1 for adaptation of cichlids to water depth in Lake Tanganyika. Molecular biology and evolution. 28(6):1769-76.

Nakayama TA, Zhang W, Cowan A, Kung M. 1998. Mutagenesis Studies of Human Red Opsin: Trp-281 Is Essential for Proper Folding and Protein-Retinal Interactions. Biochemistry. 37(50):17487-17494.

Nielsen R, Yang Z. 1998. Likelihood models for detecting positively selected amino acid sites and applications to the HIV-1 envelope gene. Genetics. 148(3):929-936.

Nozawa M, Suzuki Y, Nei M. 2009a. Reliabilities of identifying positive selection by the branch-site and the site-prediction methods. PNAS. 106(16):6700-5.

Nozawa M, Suzuki Y, Nei M. 2009b. Response to Yang et al.: Problems with Bayesian methods of detecting positive selection at the DNA sequence level. PNAS. 106(36):e96.

Okada TK, Ernst OP, Palczewski K, Hofmann KP. 2001. Activation of rhodopsin: new insights from structural and biochemical studies. Trends in biochemical sciences. 26(5):318-24.

Page RDM, Holmes EC. 1998. Molecular Evolution: A Phylogenetic Approach. Oxford: Blackwell Scientific.

Palczewski K. 2006. G protein-coupled receptor rhodopsin. Annual review of biochemistry. 75:743-67.

Palczewski K, Kumasaka T, Hori T, et al. (12 co-authors). 2000. Crystal structure of rhodopsin: A G Protein-Coupled receptor. Science. 289(5480):739-745.

Park JH, Scheerer P, Hofmann KP, Choe H, Ernst OP. 2008. Crystal structure of the ligand-free G-protein-coupled receptor opsin. Nature. 454(7201):183-189.

Parry JWL, Poopalasundaram S, Bowmaker JK, Hunt DM. 2004. A novel amino acid substitution is responsible for spectral tuning in a rodent violet-sensitive visual pigment. Biochemistry. 43(25):8014-20.

Parry JWL, Carleton KL, Spady TC, Carboo A, Hunt DM, Bowmaker JK. 2005. Mix and Match Color Vision: Tuning Spectral Sensitivity by Differential Opsin Gene Expression in Lake Malawi Cichlids. Current Biology. 15(19):1734-1739.

Perez PA, Malabarba MC, del Papa C. 2010. A new genus and species of Heroini (Perciformes: Cichlidae) from the early Eocene of southern South America. Neotropical ichthyology. 8(3):631-642.

Piechnick R, Ritter E, Hildebrand PW, Ernst OP, Scheerer P, Hofmann KP, Heck M. 2012. Effect of channel mutations on the uptake and release of the retinal ligand in opsin. PNAS. 109(14):5247-52.

Posada D, Crandall KA. 1998. MODELTEST: testing the model of DNA substitution. Bioinformatics. 14(9):817-818.

Ramm SA, Oliver PL, Ponting CP, Stockley P, Emes RD. 2008. Sexual selection and the adaptive evolution of mammalian ejaculate proteins. Molecular biology and evolution. 25(1):207-19.

Reis R, Kullander S, Ferraris C, Jr. 2003. Check list of the freshwater fishes of South and Central America. Porto Alegre: Pontifícia Universidade Católica do Rio Grande do Sul.

Rennison DJ, Owens GL, Taylor JS. 2012. Opsin gene duplication and divergence in ray-finned fish. Molecular phylogenetics and evolution. 62(3):986-1008.

Reznick DN, Endler JA. 1982. The Impact of Predation on Life History Evolution in Trinidadian Guppies (Poecilia reticulata). Evolution. 36(1):160-177.

Sabbah S, Laria RL, Gray SM, Hawryshyn CW. 2010. Functional diversity in the color vision of cichlid fishes. BMC biology. 8:133.

Salzburger W, Meyer A, Baric S, Verheyen E, Sturmbauer C. 2002. Phylogeny of the Lake Tanganyika cichlid species flock and its relationship to the Central and East African haplochromine cichlid fish faunas. Systematic biology. 51(1):113-35.

Sakmar TP, Menon ST, Marin EP, Awad ES. 2002. Rhodopsin: insights from recent structural studies. Annual review of biophysics and biomolecular structure. 31:443-84.

Sawyer SL, Wu LI, Emerman M, Malik HS. 2005. Positive selection of primate TRIM5alpha identifies a critical species-specific retroviral restriction domain. PNAS. 102(8):2832-7.

Schadel SA, Heck M, Maretzki D, Filipek S, Teller DC, Palczewski K, Hofmann KP. 2003. Ligand Channeling within a G-protein-coupled Receptor. Journal of biological chemistry. 278(27):24896-24903.

Schwarzer J, Misof B, Tautz D, Schliewen UK. 2009. The root of the East African cichlid radiations. BMC evolutionary biology. 9:186.

Schluter D. 2000. The Ecology of Adaptive Radiation. Oxford: Oxford University Press.

Seehausen O, van Alphen JJM. 1998. The effect of male coloration on female mate choice in closely related Lake Victoria cichlids (Haplochromis nyererei complex). Behavioural ecology and sociobiology. 42(1):1-8.

Seehausen O. 1997. Cichlid Fish Diversity Threatened by Eutrophication That Curbs Sexual Selection. Science. 277(5333):1808-1811.

Seehausen O. 2006. African cichlid fish: a model system in adaptive radiation research. Proceedings biological sciences. 273(1597):1987-98.

Seehausen O, Terai Y, Magalhaes IS, Carleton KL, Mrosso HDJ, Miyagi R, van der Sluijs I, Schneider MV, Maan ME, Tachida H, et al. 2008. Speciation through sensory drive in cichlid fish. Nature. 455(7213):620-6.

Sioli H. 1984. The Amazon and its main affluents: hydrography, morphology of the river courses, and river types. In: Sioli H, editor. The Amazon: limnology and landscape ecology of a mighty tropical river and its basin. Dordrecht: Dr W. Junk Publishers. p. 127-165.

Sivasundar A, Palumbi SR. 2010. Parallel amino acid replacements in the rhodopsins of the rockfishes (Sebastes spp.) associated with shifts in habitat depth. Journal of evolutionary biology. 23(6):1159-69.

Smith SO. 2010. Structure and Activation of the Visual Pigment Rhodopsin. Annual review of biophysics. 39:309-328.

Smith LW, Chakrabarty P, Sparks JS. 2008. Phylogeny, taxonomy, and evolution of Neotropical cichlids (Teleostei: Cichlidae: Cichlinae). Cladistics. 24(5):625-641.

Spady TC, Seehausen O, Loew ER, Jordan RC, Kocher TD, Carleton KL. 2005. Adaptive molecular evolution in the opsin genes of rapidly speciating cichlid species. Molecular biology and evolution. 22(6):1412-22.

Spady TC, Parry JWL, Robinson PR, Hunt DM, Bowmaker JK, Carleton KL. 2006. Evolution of the cichlid visual palette through ontogenetic subfunctionalization of the opsin gene arrays. Molecular biology and evolution. 23(8):1538-47.

Sparks JS. 2004. Molecular phylogeny and biogeography of the Malagasy and South Asian cichlids (Teleostei: Perciformes: Cichlidae). Molecular phylogenetics and evolution. 30(3):599-614.

Sparks JS, Smith LW. 2004. Phylogeny and biogeography of cichlid fishes (Teleostei: Perciformes: Cichlidae). Cladistics. 20(6):501-517.

Stamatakis A, Ludwig T, Meier H. 2005. RAxML-III: a fast program for maximum likelihood-based inference of large phylogenetic trees. Bioinformatics. 21(4):456-63.

Stiassny MLJ. 1991. Phylogenetic intrarelationships of the family Cichlidae: an overview. In: Keenleyside MH, editor. Cichlid Fishes: Behaviour, ecology and evolution. London: Chapman Hall. p. 1-35.

Streelman JT, Zardoya R, Meyer A, Karl S. 1998. Multilocus phylogeny of cichlid fishes (Pisces: Perciformes): evolutionary comparison of microsatellite and single-copy nuclear loci. Molecular biology and evolution. 15(7):798-808.

Sugawara T, Terai Y, Imai H, Turner GF, Koblmüller S, Sturmbauer C, Shichida Y, Okada N. 2005. Parallelism of amino acid changes at the RH1 affecting spectral sensitivity among deep-water cichlids from Lakes Tanganyika and Malawi. PNAS. 102(15):5448-53.

Sugawara T, Imai H, Nikaido M, Imamoto Y, Okada N. 2010. Vertebrate rhodopsin adaptation to dim light via rapid meta-II intermediate formation. Molecular biology and evolution. 27(3):506-19.

Suzuki Y, Gojobori T. 1999. A method for detecting positive selection at single amino acid sites. Molecular biology and evolution. 16(10):1315-1328.

Swanson WJ, Yang Z, Wolfner MF, Aquadro CF. 2001. Positive Darwinian selection drives the evolution of female reproductive proteins in mammals. PNAS. 98(5):2509-2514.

Takenaka N, Yokoyama S. 2007. Mechanisms of spectral tuning in the RH2 pigments of Tokay gecko and American chameleon. Gene. 399(1):26-32.

Tamura K, Peterson D, Peterson N, Stecher G, Nei M, Kumar S. 2011. MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Molecular biology and evolution. 28(10):2731-9.

Terai Y, Seehausen O, Sasaki T et al. (14 co-authors). 2006. Divergent selection on opsins drives incipient speciation in Lake Victoria cichlids. PLoS biology. 4:e433.

Terakita A. 2005. The opsins. Genome biology. 6(3):213.

Thompson JD, Higgins DG, Gibson TJ. 1994. Clustal-W: Improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic acids research. 22(22):4673-4680.

Turner GF, Seehausen O, Knight ME, Allender CJ, Robinson RL. 2001. How many species of cichlid fishes are there in African lakes? Molecular ecology. 10(3):793-806.

van Oppen MJ, Rico C, Turner GF, Hewitt GM. 2000. Extensive homoplasy, nonstepwise mutations, and shared ancestral polymorphism at a complex microsatellite locus in Lake Malawi cichlids. Molecular biology and evolution. 17(4):489-98.

Wagner HJ, Kroger RHH. 2005. Adaptive plasticity during the development of colour vision. Progress in retinal and eye research. 24(4):521-536.

Wald G. 1951. The Chemistry of Rod Vision. Science. 113(2933):287-291.

Wald G. 1968. Molecular basis of visual excitation. Science. 162(3850):230-239.

Wang T, Duan Y. 2011. Retinal release from opsin in molecular dynamics simulations. Journal of molecular recognition. 24(2):350-8.

Weadick CJ, Chang BSW. 2007. Long-wavelength sensitive visual pigments of the guppy (Poecilia reticulata): six opsins expressed in a single individual. BMC evolutionary biology. 7 Suppl 1:S11.

Weadick CJ, Chang BSW. 2012. An improved likelihood ratio test for detecting site-specific functional divergence among clades of protein-coding genes. Molecular biology and evolution. 29(5):1297-1300.

Weadick CJ, Loew E, Rodd H, Chang BSW. 2012. Visual pigment molecular evolution in the Trinidadian pike cichlid (Crenicichla frenata): A less colorful world for Neotropical cichlids? Molecular biology and evolution. Online: corrected proof.

Weitz CJ, Nathans J. 1993. Rhodopsin activation: effects on the metarhodopsin I-metarhodopsin II equilibrium of neutralization or introduction of charged amino acids within putative transmembrane segments. Biochemistry. 32(51):14176-82.

Willis SC, López-Fernández H, Montaña CG, Farias IP, Ortí G. 2012. Species-level phylogeny of “Satan’s perches” based on discordant gene trees (Teleostei: Cichlidae: Satanoperca Günther 1862). Molecular phylogenetics and evolution. 63(3):798-808.

Wimberger PH, Reis RE, Thornton KR. 1998. Mitochondrial phylogenetics, biogeography, and evolution of parental care and mating systems in Gymnogeophagus (Perciformes: Cichlidae). In: Malabarba LR, Reis RE, Vari RP, Lucena ZM, Lucena CAS, editors. Phylogeny and Classification of Neotropical Fishes. Editora Universitaria, Pontificia Universidad Catolica do Rio Grande do Sul, Porto Alegre. p. 509-518.

Winemiller KO, Kelso-Winemiller LC, Brenkert AL. 1995. Ecomorphological diversification and convergence in fluvial cichlid fishes. Environmental biology of fishes. 44(1-3):235-261.

Witte F, Goldschmidt T, Wanink J, Oijen MV, Goudswaard K, Witte-Maas E, Bouton N. 1992. The destruction of an endemic species flock: quantitative data on the decline of the haplochromine cichlids of Lake Victoria. Environmental biology of fishes. 34(1):1-28.

Wong WSW, Yang Z, Goldman N, Nielsen R. 2004. Accuracy and power of statistical methods for detecting adaptive evolution in protein coding sequences and for identifying positively selected sites. Genetics. 168(2):1041-51.

Yang Z. 1998. Likelihood ratio tests for detecting positive selection and application to primate lysozyme evolution. Molecular biology and evolution. 15(5):568-573.

Yang Z. 2000. Maximum Likelihood Estimation on Large Phylogenies and Analysis of Adaptive Evolution in Human Influenza Virus A. Journal of molecular evolution. 51(5):423-432.

Yang Z. 2006. Computational molecular evolution. Oxford: Oxford University Press.

Yang Z. 2007. PAML 4: A program package for phylogenetic analysis by maximum likelihood. Molecular biology and evolution. 24(8):1586-1591.

Yang Z, Bielawski JP. 2000. Statistical methods for detecting molecular adaptation. Trends in ecology & evolution. 15(12):496-503.

Yang Z, Nielsen R. 1998. Synonymous and nonsynonymous rate variation in nuclear genes of mammals. Journal of molecular evolution. 46(4):409-418.

Yang Z, Nielsen R. 2002. Codon-substitution models for detecting molecular adaptation at individual sites along specific lineages. Molecular biology and evolution. 19(6):908-917.

Yang Z, Nielsen R, Goldman N, Pedersen AMK. 2000. Codon-substitution models for heterogeneous selection pressure at amino acid sites. Genetics. 155(1):431-449.

Yang Z, Reis M. 2011. Statistical Properties of the Branch-Site Test of Positive Selection. Molecular biology and evolution. 28(3):1217-1228.

Yang Z, Wong WSW, Nielsen R. 2005. Bayes empirical Bayes inference of amino acid sites under positive selection. Molecular biology and evolution. 22(4):1107-1118.

Yau KW, Hardie RC. 2009. Phototransduction Motifs and Variations. Cell. 139(2):246-264.

Yokoyama S, Tada T, Yamato T. 2007. Modulation of the absorption maximum of rhodopsin by amino acids in the C-terminus. Photochemistry and photobiology. 83(2):236-41.

Yokoyama S, Tada T, Zhang H, Britt L. 2008. Elucidation of phenotypic adaptations: Molecular analyses of dim-light vision proteins in vertebrates. PNAS. 105(36):13480-5.

Yokoyama S. 2008. Evolution of dim-light and color vision pigments. Annual review of genomics and human genetics. 9:259-82.

Yokoyama S. 2000. Molecular evolution of vertebrate visual pigments. Progress in retinal and eye research. 19(4):385-419.

Yoshida I, Sugiura W, Shibata J, Ren F, Yang Z, Tanaka H. 2011. Change of positive selection pressure on HIV-1 envelope gene inferred by early and recent samples. PLoS ONE. 6:e18630.

Yuan F, Bernard GD, Le J, Briscoe AD. 2010. Contrasting modes of evolution of the visual pigments in Heliconius butterflies. Molecular biology and evolution. 27(10):2392-405.

Zardoya R, Vollmer DM, Craddock C, Streelman JT, Karl S, Meyer A. 1996. Evolutionary conservation of microsatellite flanking regions and their use in resolving the phylogeny of cichlid fishes (Pisces: Perciformes). Proceedings biological sciences. 263(1376):1589-98.

Zhang L, Li W-H. 2004. Mammalian housekeeping genes evolve more slowly than tissue-specific genes. Molecular biology and evolution. 21(2):236-9.

Zhang J, Nielsen R, Yang Z. 2005. Evaluation of an improved branch-site likelihood method for detecting positive selection at the molecular level. Molecular biology and evolution. 22(12):2472-2479.
