Relaxed selection in erythropoietic gene hemogen among high-latitude Antarctic notothenioids

by Carmen M. Elenberger

B.A. in Anthropology, University of Florida

A thesis submitted to

The Faculty of the College of Science of Northeastern University in partial fulfillment of the requirements for the degree of Master of Science

December 12, 2018

Thesis directed by

H. William Detrich Professor of Biochemistry and Marine Biology

Copyright 2018 Carmen Elenberger

Acknowledgements

First and foremost, I would like to thank my advisor, Dr. H. William Detrich, for his guidance and his support over the past four years. He challenged me to broaden my horizons and gave me the opportunity to travel to the ends of the earth in order to do so. I would also like to thank Dr. Thomas Desvignes, as well as Laura Goetz and Sierra Smith, for their assistance in conducting field work for this project. I would like to extend further thanks to Dr. Jacob Daane for permitting me to use his unpublished data to expand my analyses. Many thanks to Biology Open for allowing me to reproduce their figure with permission [1].

I would like to thank my committee members, Dr. A. Randall Hughes and Dr. Steve Vollmer, for their interest in my research and their advice in analyzing and framing the results of my research. I would also like to thank my labmate, Dr. Michael Peters, and our lab manager, Sandra Parker, for their advice, assistance, and encouragement over the years. Additionally, I would like to thank the faculty and staff of the Marine Science Center, as well as the funding sources for this research. Special thanks to the staff of Palmer Station and the crew of the Laurence M. Gould for a productive and memorable field season. Finally, I would like to thank my friends and family for their unwavering support and encouragement, now and always.

Abstract of Thesis

Antarctic icefish (Channichthyidae) are the only vertebrate taxon with an erythrocyte-null phenotype and present an interesting model for studying the evolution and regulation of erythropoiesis. The gene hemogen encodes a protein that plays a role in regulating erythropoietic processes in vertebrates, and it may have been affected by the loss of globin expression. I investigated possible relaxed selection at the hemogen locus by looking for evolutionary change to the regulatory elements or segments encoding the Hemogen protein, and assessed the evolutionary processes that drove hemogen variation among Antarctic notothenioids. While regulatory mechanisms remain intact, icefish show a significant 90bp indel in exon 3 of hemogen that would disrupt conserved modules in the Hemogen protein that are critical for erythropoiesis. Despite this, hemogen remains expressed at low levels in adult icefish and possesses a novel splice variant that encodes a truncated protein possibly serving as a dominant negative for wild-type Hemogen. I conclude that while hemogen has undergone relaxed selection and accumulated mutations that would impact erythropoietic function in non-Antarctic species, the observed mutations may be tolerated due to erythrocyte and hematocrit modifications in notothenioid blood phenotypes. hemogen may have a decreased but still important role to play in icefish, possibly functioning as a dominant negative for hemogen’s role in erythropoiesis.

Table of Contents

Acknowledgements

Abstract of Thesis

Table of Contents

List of Tables

List of Figures

List of Abbreviations

Introduction

Methods

Results

Discussion

Tables and Figures

References

List of Tables

1 Primers used in PCR and qRT-PCR reactions to amplify hemogen gDNA and cDNA in Antarctic notothenioids

2 Species sequenced and included in study of Antarctic notothenioid hemogen

3 Codon usage bias for hemogen (total coding sequence) among Antarctic notothenioids

4 Mean pairwise dN/dS for within-family comparisons of Antarctic notothenioid families

5 Mean pairwise dN/dS for between-family comparisons of Antarctic notothenioid families

6 Results of codon-based site tests conducted in CodeML on the Antarctic radiation

List of Figures

1 Zebrafish Si:dkey-25o16.2 and human Hemogen are orthologous and encode related proteins that differ in size

2 Icefish transcript variants for hemogen and their putative effects on translation illustrated in representative species Champsocephalus gunnari

3 Maximum likelihood tree used to test for positive selection on the branch leading to the Antarctic notothenioid clade

4 Maximum likelihood tree used in site-tests for positive/pervasive selection among Antarctic notothenioids

5 RELAX tree shows relaxed selection on the branches containing Bathydraconidae and Channichthyidae, demonstrating a trend of relaxed selection in hemogen on the way to the erythrocyte-null phenotype

6 Gene structure and size remain conserved among red-blooded and white-blooded notothenioids, including regulatory regions conserved among teleost fish

7 Conservation of conserved non-coding elements CNE1 and CNE2 in Antarctic notothenioids relative to Gasterosteus aculeatus and Danio rerio

8 hemogen exon 3 deletions in representative species from Channichthyidae relative to a red-blooded notothenioid, and their predicted effects on transcription and translation

9 Variant forms of hemogen “exon 3” deletion mapped onto the Channichthyidae species tree

10 hemogen indels in Antarctic notothenioids mapped onto a maximum parsimony tree

11 Pairwise dN/dS comparisons plotting total dN/dS of whole Hemogen-encoding sequence with the dN/dS values for the N-terminus and C-terminus of notothenioid Hemogen, within families (A & B) and Channichthyidae (C & D)

12 Pairwise dN/dS trends between families Nototheniidae and Channichthyidae, plotting whole-Hemogen dN/dS vs the N-terminus (A) or C-terminus (B)

13 qPCR quantification of hemogen transcript variants in representative icefish species C. aceratus and C. gunnari, comparing adult head kidney hemogen expression with N. coriiceps adult head kidney for both hemgn-L and hemgn-s splice variants

14 Changes to the bipartite nuclear localization signal in icefish (Champsocephalus gunnari) relative to red-blooded notothens (N. coriiceps)

List of Abbreviations

aa amino acid

bp base pair

CAI Codon Adaptation Index

cDNA complementary DNA

CNE conserved non-coding element

dN nonsynonymous mutation rate

DNA deoxyribonucleic acid

dN/dS ratio of nonsynonymous to synonymous mutation rates

dS synonymous mutation rate

EDAG erythroid differentiation-associated gene

GATA1 GATA-binding protein 1

gDNA genomic deoxyribonucleic acid

HoxB4 homeobox B4

KLF4 Krueppel-like Factor 4

-lnL negative log likelihood

MMCT Middle Miocene Climate Transition

MRCA most recent common ancestor

Mya million years ago

Myb MYB Proto-Oncogene, Transcription Factor

NLS nuclear localization signal

p300 histone acetyltransferase p300

PCR polymerase chain reaction

qPCR quantitative polymerase chain reaction

RNA ribonucleic acid

Sox9 transcription factor SOX-9

UTR untranslated region

INTRODUCTION

Cold-driven evolution of the Antarctic notothenioid lineage began roughly 46 Mya [2], concurrent with the emergence of the Drake Passage (55-41 Ma) [3] and the initial formation of the Antarctic Circumpolar Current [4]. The development of antifreeze glycoproteins [5, 6] permitted colonization and persistence in the [7] and set the stage for further diversification during successive cooling periods and accompanying geological events. The radiation of the high-latitude Antarctic notothenioids (Cryonotothenioidea) occurred during a period of diversification driven by intensified cooling of the Southern Ocean during the Middle Miocene Climate Transition (MMCT) [7, 8], with species diversification beginning ~14 Mya and accelerating ~11 Mya during the Late Miocene [7, 9-11]. Cooling during the MMCT led to contemporary Antarctic conditions (-2℃ to +2℃) and resulted in the scouring of continental shelves by ice [12, 13]. This opened ecological niches for colonization by removing more temperate-adapted competitors [14] and led to rapid morphological and ecological diversification [15]. Present-day Antarctic notothenioids comprise 77% of Antarctic teleost diversity and constitute a marine species flock [16] derived via adaptive radiation [17-19]. High levels of morphological diversity and intense speciation make Antarctic notothenioids a useful evolutionary model for studying cold adaptation.

Antarctic notothenioids possess a number of remarkable changes to erythropoiesis and the oxygen-transport system at large that resulted in the evolution of the only known vertebrate clade devoid of erythrocytes—the family Channichthyidae, characterized by a “white-blooded” phenotype [20]. It has been hypothesized that the high oxygen concentration in polar seawater could lead to potential relaxed selection on erythrocytes and other oxygen-binding pigments, as hypoxic stress becomes less of a relevant factor with oxygen in such high abundance [21].

Evidence for such relaxed selection can be seen in changes to blood content: a study of “red-blooded” Antarctic species from McMurdo Sound showed decreased numbers of erythrocytes, lowered hematocrit, and lowered hemoglobin concentrations when compared with temperate fish [22]. General trends throughout the radiation show that the more derived the family, the fewer erythrocytes present in circulating blood and the lower the hemoglobin content [21]. Both red-blooded and white-blooded notothenioid fish show reduced hematocrit, which is potentially an adaptive feature to contend with the increased viscosity of blood under low temperatures [23, 24]. Hemoglobin multiplicity is reduced among notothenioids relative to temperate fish [25-27] and cold anemia responses became genetically assimilated [28-30]. At some point notothenioid dependence on hemoglobin for respiration became so reduced that even red-blooded fish could continue to absorb and utilize oxygen effectively in the presence of carbon monoxide [31], suggesting that the stage had been set for hemoglobin loss well before it disappeared.

Channichthyidae are characterized by loss of the vertebrate oxygen-transport molecule, the α2β2 hemoglobin tetramer carried within erythrocytes. This occurred in the most recent common ancestor (MRCA) of all icefish via large genomic lesions within the respective loci [25, 32-36]. Furthermore, there have been multiple, independent losses of myoglobin during diversification [36]. Icefish possess few erythroblasts, and their blood contains mostly leukocytes and plasma [35]. The evolution of the white-blooded phenotype is unique among vertebrates and has far-reaching consequences for the cardiovascular system and key globin partners. As a result of hemoglobin loss, we would anticipate changes to the genetic machinery involved in red blood cell production and maintenance, as selective constraints on this machinery may relax in the absence of key globin partners. It is also possible that this relaxation began somewhere within the red-blooded families, as oxygen transport molecules became less necessary for survival. Relaxed selection in the regulatory regions of globin has been detected among dragonfish, prior to the emergence of a white-blooded phenotype [37].

The gene hemogen has been identified as an interesting candidate for further study in notothenioid fish, given evidence based on subtraction libraries that expression may be impaired or entirely absent in icefish. The hemogen gene encodes the transcription factor Hemogen (Fig 1), which acts as a regulator in hematopoietic development by stimulating the differentiation of hematopoietic cells into both the erythroid and megakaryocytic lineages [38-43]. In teleost fish, Hemogen is encoded by four exons and contains domains similar to those predicted in the human ortholog: a coiled-coil domain, a bipartite nuclear localization signal, a series of tandem repeats and an acidic domain (Fig 1) [1, 38]. Its expression is promoted by two conserved non-coding elements, one proximal and one distal, both critical for primitive erythropoiesis (Fig 1) [1].

Hemogen also plays a role in cell apoptosis [39] and has been implicated in the regulation of tumor cells in acute myeloid leukemia [44]. Other possible roles include spermatogenesis [45], sex-determination [46], and osteoblast recruitment and bone calcification [47-49]. Research shows that Hemogen’s role in hematopoiesis takes place via interactions with a number of key proteins involved in erythropoiesis and development, including GATA1 and p300. GATA1 is critical for erythroid differentiation [50-52] and functions in both primitive and definitive hematopoiesis [53]. Nonsense mutations in GATA1 lead to a “bloodless” phenotype [54]. GATA1 recruitment is crucial for hemogen function and downregulation of hemogen expression inhibits GATA1 activity [40, 43], while GATA1 recruits hemogen to the beta-globin locus [55]. p300 is crucial for cell differentiation [56, 57] and inhibition of p300 binding to Hemogen causes decreased production of erythroid cells. Hemogen facilitates the interaction between GATA1 and p300, making it a critical part of the erythroid differentiation process [55].

Decreased hemogen expression in white-blooded fish may indicate functional loss. Given the decreasing importance of red blood cells to the notothenioid lineage, selective constraints on known regulators of erythrocyte production may have relaxed prior to complete globin loss. Hemogen interacts with Beta-globin and regulates erythroid production, raising the possibility that erythropoietic features may be aberrant in icefish. However, hemogen demonstrates pleiotropy, as described in the previous paragraph, and lists of potential partners implicate it in a number of important cellular processes beyond erythropoiesis. Therefore, at least some features must remain conserved in order to carry out non-erythropoietic roles.

In this thesis, I characterize hemogen genes in both red-blooded and white-blooded Antarctic notothenioids and compare them with sub-Antarctic perciform outgroups to establish hemogen’s history within this clade. I hypothesize that the hemogen locus is undergoing relaxed selection among the icefish, and that relaxation of selective constraints began prior to the emergence of Channichthyidae. I investigated partial conservation of the hemogen gene, hypothesizing that pleiotropy would protect against total pseudogenization of hemogen. Features under relaxed selection would be implicated in erythropoietic function and could be considered targets for further study of hemogen in erythropoiesis. I hypothesize some level of differential expression between white-blooded and red-blooded fish; if not complete loss of expression, then loss in certain tissues or of certain key isoforms in Channichthyidae.

My results show a strong trend towards relaxed selection in high-latitude Antarctic notothenioids relative to Sub-Antarctic relatives, with icefish showing intensified relaxation. Confirmation of relaxed selection among red-blooded fish supports the theory that the decreased dependence on erythrocytes in notothenioid fish also correlates with larger-scale changes in the erythropoietic paradigm on the genomic level. Three out of four key functional domains show some form of degradation, either via nonsynonymous mutation or through the transcriptional/translational impacts of indels on icefish hemogen. Three key evolutionary events took place in the MRCA of all extant icefish: the degradation of the bipartite NLS, a 30aa loss in a proline-rich region of tandem repeats, and the development of a novel splice form, hemgn-s, which excludes all functional domains encoded by exons 3 and 4 and theoretically results in a frameshifted and truncated Hemogen protein. However, key promoter regions remain conserved in icefish, and while expression is down-regulated in adult tissues relative to red-blooded species, hemogen is still expressed in adult tissues of some icefish. This suggests that while the decreased importance of erythropoietic functions may have significantly relaxed pressure on hemogen and resulted in mutations impacting domains critical for erythropoietic function, the gene is not necessarily non-functional and may still be playing a decreased but critical role in other processes.

METHODS

Sample collection & sequencing of notothenioid hemogen gDNA

The primary source of genomic material came from tissues obtained by the Detrich Lab during the 2012, 2014 and 2016 winter fishing cruises conducted by the Research Vessel Laurence M. Gould near Palmer Station, Antarctica. Tissues were flash-frozen in liquid nitrogen and then stored at -80℃. I generated sequences from 1-5 individual fish per species.

gDNA was extracted from tissues as specified for the Quick-gDNA miniprep kit (Zymo Research, D3024). Full notothenioid hemogen, from the start codon to the 3’ UTR, was amplified by PCR from gDNA samples using 1 µM primers (Table 1) designed from previously obtained sequences. The amplification protocol was as follows: 35 cycles of 98°C for 10 s, 59°C for 10 s, 72°C for 1 min. PCR products were cloned into the pGEM T-easy vector (Promega, A1360), and recombinant clones were transformed into DH5α competent cells (New England Biolabs, C2987H). Recombinant plasmids were identified using blue/white screening, purified via the Wizard Plus Miniprep DNA Purification System (Promega, A7500), and sequenced by GeneWiz. I obtained full genomic sequences for 18 notothenioid species (Table 2, Figure S1).

Cloning and sequencing of notothenioid cDNAs

I isolated total RNA from flash-frozen tissues of adult N. coriiceps and C. aceratus using the RNEasy Mini Kit (Qiagen, 74104). Several potential hemogen transcripts had been previously identified by other Detrich Lab members (Figure 2). To expand upon these results, RNA samples were prepared from ten tissues: liver, spleen, head kidney, trunk kidney, white muscle, pectoral red muscle, testes, brain, heart ventricle and gill. Total cDNA was produced from the mRNA using M-MuLV reverse transcriptase and an oligo(dT)23 primer according to the protocol outlined in the Protoscript II First Strand cDNA Synthesis kit (NEB, E6560S). cDNA was amplified via PCR using the same primers as gDNA PCR (Table 1) according to the following protocol: 35 cycles of 98°C for 10 s, 59°C for 10 s, and 72°C for 45 s. cDNA was then cloned into pGEM T-easy vector and subsequently transformed and purified as outlined for gDNA sequences.

Construction of genomic, coding and protein alignments for gene characterization, phylogenies and evolutionary analysis

Nucleic acid alignments were constructed using MUSCLE [58] as implemented in MEGA7 [59], with a gap opening penalty of 15 and gap extension penalty of 6.66. Alignments were subsequently inspected and adjusted by eye in BioEdit [60]. Construction of gene trees and evolutionary analysis relied primarily on three alignments: a gDNA alignment with all exons + introns; a coding alignment based on cDNA sequences, transcriptome data and concatenated exome data; and a protein alignment, translated from the coding sequences in MEGA7 [59].

The cDNA sequences that I generated were supplemented with hemogen cDNAs from transcriptomic analyses of Pseudochaenichthys georgianus [unpublished data from Detrich lab] and Parachaenichthys charcoti [unpublished data from Detrich lab], and aligned with my genomic sequences to generate coding sequences for other notothenioids. The cDNA and transcriptome sequences also served as a basis for alignment and quality control for sequences obtained via an exome-capture analysis [unpublished] conducted by Dr. Jacob Daane of the Detrich lab. A full list of all species included and the sequence sources can be found in Table 2. In total, 43 species representing Antarctic notothenioids from all high-latitude families (Artedidraconidae, Bathydraconidae, Channichthyidae, Harpagiferidae, Nototheniidae) as well as 7 Sub-Antarctic outgroups were included in evolutionary analyses.

Analysis of positive, pervasive and relaxed selection on Antarctic notothenioid hemogen

All trees were constructed in RAxML [61, 62] using nucleotide substitution model GTRGAMMA to conduct an initial tree search of 20 trees and select the best tree from this pool. No outgroups were specified. Branch tests were conducted using the CodeML module included in PAML 4.0 [63, 64].

Two tests for positive selection were run. The first was conducted using branch-site models [65, 66] on a subset of coding sequences (SFigure 2) to search for possible positive adaptation in the Antarctic clade relative to several notothenioid outgroups. The branch leading to the representative Antarctic notothenioids was specified a priori as the foreground branch (Figure 3). The null model set NSsites = 2, fix_omega = 1, and omega = 1. This assumes two categories of sites (purifying and neutral selection) and looks for a difference in proportions of sites undergoing neutral selection on the foreground branch relative to the background. The positive/alternative model set NSsites = 2, fix_omega = 0, and omega = 1, which allows for three categories of sites (purifying, neutral, and positive selection) and looks to identify sites undergoing positive selection on the foreground relative to the background branch. If the alternative model is accepted over the null, this indicates a site has undergone episodic positive selection (changed once, then retained in the clade).

The second test relied on codon-substitution site models [67, 68] to detect pervasive positive selection among Antarctic notothenioids using the coding sequences (SFigure 2). This would identify any possible sites which changed repeatedly throughout diversification of the clade, possibly as a result of differing adaptive challenges related to the modification of the hematic system. Models M0, M1a, M2a, M3, M7, and M8 were run by setting NSsites = 0 1 2 3 7 8 (respectively), fix_omega = 0, and omega = 1. Model M8a set NSsites = 8 but set fix_omega = 1 and omega = 1. The submitted gene tree for the site tests can be found in Figure 4.
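The settings above could be written out as CodeML control-file fragments along the following lines. This is a minimal sketch rather than the control files used in this study: the file names are hypothetical, seqtype = 1 (codon data) and model = 0 (one set of site classes for all branches) are assumed for the site models, and every other CodeML option is omitted.

```python
def codeml_ctl(seqfile, treefile, outfile, nssites, fix_omega, omega):
    """Render a partial codeml control file for site models.

    Only the options named in the text plus the input/output paths are
    written; a real run would need the remaining options as well.
    """
    lines = [
        f"seqfile = {seqfile}",
        f"treefile = {treefile}",
        f"outfile = {outfile}",
        "seqtype = 1",   # assumption: codon sequences
        "model = 0",     # assumption: no branch-specific omega for site tests
        "NSsites = " + " ".join(str(n) for n in nssites),
        f"fix_omega = {fix_omega}",
        f"omega = {omega}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical file names, used for illustration only.
site_models_ctl = codeml_ctl(
    "hemogen_cds.phy", "hemogen_ml.tre", "site_models.out",
    nssites=[0, 1, 2, 3, 7, 8], fix_omega=0, omega=1,
)
m8a_ctl = codeml_ctl(
    "hemogen_cds.phy", "hemogen_ml.tre", "m8a.out",
    nssites=[8], fix_omega=1, omega=1,
)
print(site_models_ctl)
```

Keeping M8a in its own control file reflects the description above, where it differs from M8 only in fixing omega at 1.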

A test for relaxed selection in the branch leading to Channichthyidae was conducted using RELAX [69] as part of the HyPhy suite of hypothesis testing software [70]. RELAX conducts a comparative test of whether an a priori specified branch or subset of branches has undergone relaxed or diversifying selection relative to the rest of the tree. This makes it useful for identifying trends and/or shifts in the stringency of natural selection on a given gene, provided one has an idea of where that should occur. The branches ending in Channichthyidae and Bathydraconidae were selected as the test branches, with all others used as reference branches. The reference tree used was the putative species tree of Daane [unpublished]. The test was run on the Datamonkey server [71, 72].

Bioinformatic comparison of notothenioid hemogen promoters and coding domains

Regulatory regions from Eleginops maclovinus, N. coriiceps and Chaenocephalus aceratus were sequenced based on the annotations for the N. coriiceps genome (NCBI Accession: PRJNA66471, ID: 66471) [73]. gDNA sequences were aligned to N. coriiceps and C. aceratus scaffolds via BLAST in Geneious (v. 10.0.5) [74] to determine whether notothenioid species possess conserved synteny around the hemogen locus as observed in other vertebrate species [1]. Scaffold sequences were confirmed by sequencing from the upstream (anp32b) and downstream (TRMO) genes towards hemogen. Promoter alignments for hemogen were obtained using the whole genome alignments for D. rerio and Gasterosteus aculeatus (ENSEMBL v94) [75]. Transcription factor binding sites were predicted with ConTra v2 using a similarity matrix of 0.75 [76]. Protein domains were identified based on annotations from human [38] and zebrafish hemogen [1].

Parsimony gene tree and deletion mapping

A hemogen gene tree was built using coding sequences and the maximum parsimony method [77] in MEGA7. Gaps were treated as partial deletions with site coverage set to 90%. This allowed for the inclusion of sites where a majority of species possessed sequence data but one species (or ) possessed a phylogenetically informative indel. The tree included 1st, 2nd, and 3rd codon positions and was computed using the Subtree-Pruning-Regrafting method, beginning with 10 trees and retaining 100 trees. Following 1000 bootstrap iterations the best tree was selected based on comparison with known species phylogenies. The phylogeny was edited to include indel information using ggtree in R [78] and the Interactive Tree of Life (iTOL) v3 [79].

I ran the tree topology in CodeML [63, 64] using the M0 model (model = 0, NSsites = 0) [67] to obtain the number of nucleotide substitutions per codon (dN+dS) as well as dN, dS, and dN/dS for the whole tree.

An icefish species tree was constructed based on the species tree built from the exome data of Daane [unpublished] with modifications derived from available Channichthyidae phylogenies [80, 81].

Pairwise dN/dS comparisons

Pairwise dN/dS values were generated using a subset of the coding alignment (SFigure 2) and were run in PAML4 using yn00 [64]. yn00 calculates rates based on the method outlined in Nielsen & Yang 2000 [82] and allows for codon usage bias as well as transition-transversion rate differences. To assess codon usage bias in notothenioid hemogen, I used DnaSP v5 [83, 84] to measure codon usage bias via the codon adaptation index (CAI/CBI) [85, 86]. Values for CAI are shown in Table 3; the values fall within a range of 0.3-0.4 for all species, which represents moderate codon usage bias (low bias < 0.3 and high > 0.5).

All forty-three high-latitude notothenioid species were included in this analysis (Table 2). I examined two kinds of evolutionary relationships: within-family comparisons (ex: icefish vs icefish) and between-family pairwise comparisons (ex: Channichthyidae vs Nototheniidae). For each kind of comparison I ran the data with three partitions: the total protein coding sequence, the coding sequence for the N-terminus only (1-79 aa, which represents the end of the bipartite NLS), and the coding sequence for the C-terminus only (80 aa to the end). This allowed for a more nuanced analysis of the selective forces at work on different parts of the gene as well as within different clades and is derived from work done parsing geographic effects on cichlids and positive selection in notothenioids [87, 88].
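The partitioning can be illustrated with a short sketch that splits a coding sequence at the residue 79/80 boundary described above. It assumes an ungapped, in-frame coding sequence whose residue numbering matches the reference; in a gapped alignment the boundary column would first have to be located on the reference sequence. The function name and toy sequence are hypothetical.

```python
N_TERMINUS_CODONS = 79  # residues 1-79, ending at the bipartite NLS


def split_partitions(cds):
    """Return (N-terminal, C-terminal) coding partitions of an in-frame CDS."""
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length is not a multiple of 3")
    boundary = N_TERMINUS_CODONS * 3  # nucleotide index where codon 80 starts
    return cds[:boundary], cds[boundary:]


# Toy 100-codon sequence, for illustration only.
toy_cds = "ATG" + "GCT" * 99
n_term, c_term = split_partitions(toy_cds)
print(len(n_term) // 3, len(c_term) // 3)
```

Each partition, together with the full-length sequence, would then be analyzed separately in the pairwise comparisons.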

qPCR

Previous qualitative PCR I conducted on C. aceratus cDNA established general presence/absence of hemogen expression in several adult tissues (liver, head kidney, trunk kidney, spleen, and brain) and isolated the predominant isoforms of hemogen expression in Channichthyidae (Figure 2). qPCR experiments utilized cDNA samples (outlined in the preceding section) taken from tissues of adult icefish (Champsocephalus gunnari and C. aceratus). The experiment was designed to verify and quantify hemogen expression in multiple icefish species and compare isoform expression in adult tissues. Target transcripts were amplified from cDNA using 1 µM primers (Table 1). Targets were amplified in triplicate. Expression was normalized to beta-actin as the endogenous control for the ΔΔCt method [89]. Standard curves were generated to assess the primer efficiencies. qPCR was performed on a QuantStudio3 ThermoCycler using QuantStudio Design and Analysis Software.
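The ΔΔCt calculation can be sketched as follows. The Ct values are hypothetical placeholders, not measurements from this study, and the 2^(-ΔΔCt) form assumes amplification efficiencies close to 100% for both the target and the beta-actin control, which is what the standard curves are meant to check.

```python
from statistics import mean


def relative_expression(target_sample, ref_sample, target_calibrator, ref_calibrator):
    """Fold change by the delta-delta-Ct method from replicate Ct values."""
    delta_sample = mean(target_sample) - mean(ref_sample)
    delta_calibrator = mean(target_calibrator) - mean(ref_calibrator)
    delta_delta_ct = delta_sample - delta_calibrator
    return 2.0 ** (-delta_delta_ct)


# Hypothetical triplicate Ct values: icefish sample vs. a red-blooded calibrator.
fold_change = relative_expression(
    target_sample=[28.1, 27.9, 28.0],      # hemogen, icefish head kidney
    ref_sample=[18.0, 18.1, 17.9],         # beta-actin, icefish head kidney
    target_calibrator=[24.0, 24.1, 23.9],  # hemogen, calibrator tissue
    ref_calibrator=[18.0, 18.0, 18.0],     # beta-actin, calibrator tissue
)
print(round(fold_change, 4))
```

A fold change below 1 indicates lower normalized expression in the sample than in the calibrator.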

RESULTS

Branch test does not detect positive selection in Antarctic notothenioids

The test attempted to determine whether adaptive positive selection has occurred in Antarctic notothenioid Hemogen in comparison to Sub-Antarctic relatives/perciform outgroups (Figure 3) using the branch-site method. The alternative model testing for positive selection returned lnL = -3135.34323 with np = 20. The null model (no positive selection) gave lnL = -3135.34325 with np = 19. The likelihood ratio test yielded a value of 0.00039 with df = 1, with p = 0.9842. The test for positive selection was not significant, and the evolutionary change observed in Antarctic notothenioids relative to other teleosts is not likely to be adaptive change driven by positive selection.
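The p-value above can be checked from the reported test statistic. The sketch below is a standard-library calculation, not CodeML output: it uses the closed form for the upper tail of a chi-square distribution with one degree of freedom, and it takes the reported statistic of 0.00039 as given rather than recomputing it from the rounded log-likelihoods.

```python
import math


def lrt_statistic(lnl_alt, lnl_null):
    """Likelihood ratio statistic: 2 * (lnL_alternative - lnL_null)."""
    return 2.0 * (lnl_alt - lnl_null)


def chi2_sf_df1(stat):
    """Upper-tail probability of a chi-square distribution with df = 1."""
    if stat <= 0:
        return 1.0
    return math.erfc(math.sqrt(stat / 2.0))


p_branch_site = chi2_sf_df1(0.00039)
print(round(p_branch_site, 4))
```

The one degree of freedom corresponds to the single extra parameter in the alternative model (np = 20 versus np = 19).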

Site test results do not detect pervasive adaptive change within the high-latitude Antarctic notothenioid radiation

Model M0 gives fundamental statistics about the base composition of the tree, as well as measures mutational rates over all sequences. The tree used for site tests was generated from hemogen coding sequences via the maximum likelihood method (Figure 4). dN summed over the entire tree = 0.3485, while dS = 0.2974, giving omega dN/dS = 1.17184. While these mutational rates are low, this ratio would be consistent with relaxed purifying selection on hemogen, although it cannot definitively differentiate between relaxed and positive selection.

Site tests yielded three tests to detect positive selection (M1a-M2a, M7-M8, M8a-M8) with p < 0.05 (Table 6). All of these models measure positive selection, with M8a-M8 being the most robust and reliable. That all returned significant p-values would be indicative of pervasive positive selection at specific sites throughout the diversification of Antarctic notothenioids.

However, the data violate a critical assumption of the site test of positive selection. dS summed over all branches is < 0.5, which indicates insufficient sequence divergence among the species tested. As a result of this low sequence divergence, the codon-based test is insufficiently robust and cannot reliably measure changes to selective pressure at different sites. No pervasive positive selection can be inferred from these results.
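The two quantities used in this reasoning, the tree-wide dN/dS ratio and the divergence check on dS, can be reproduced from the M0 values reported above. This is a sketch of the arithmetic only; the 0.5 threshold is the one stated in the text, and the variable names are illustrative.

```python
MIN_TREE_DS = 0.5  # divergence threshold on tree-wide dS, as stated in the text


def omega(dn, ds):
    """Ratio of nonsynonymous to synonymous rates (dN/dS)."""
    return dn / ds


tree_dn, tree_ds = 0.3485, 0.2974  # M0 values summed over the site-test tree
tree_omega = omega(tree_dn, tree_ds)
divergence_sufficient = tree_ds >= MIN_TREE_DS

print(round(tree_omega, 3), divergence_sufficient)
```

The ratio recomputed from the rounded dN and dS agrees with the reported omega to about three decimal places.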

Relaxed selection in Channichthyidae relative to other Antarctic notothenioid families

RELAX confirmed a trend towards relaxed selection in the branches leading to Channichthyidae and Bathydraconidae relative to Artedidraconidae, Nototheniidae, and E. maclovinus (Figure 5). The test for selection relaxation (K = 0.25) was significant (p = 0.002, LR = 9.77).
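To give a sense of what K = 0.25 means: RELAX models the ω classes on test branches as the reference-branch ω values raised to the power K, so K < 1 pulls every class toward neutrality (ω = 1). The sketch below applies the reported K to hypothetical reference ω classes; those ω values are placeholders for illustration and are not the fitted values from this analysis.

```python
import math

K = 0.25  # relaxation parameter reported by RELAX

# Hypothetical reference-branch omega classes (purifying, near-neutral, positive).
reference_omegas = [0.05, 0.5, 4.0]

# Under the RELAX parameterization, test-branch omega = reference omega ** K.
test_omegas = [w ** K for w in reference_omegas]

for w_ref, w_test in zip(reference_omegas, test_omegas):
    print(f"reference {w_ref:.2f} -> test {w_test:.3f}")

# With 0 < K < 1, every class moves closer to omega = 1 on a log scale.
moved_toward_neutral = all(
    abs(math.log(w_test)) < abs(math.log(w_ref))
    for w_ref, w_test in zip(reference_omegas, test_omegas)
)
```

Both strongly constrained and positively selected classes are compressed toward 1, which is why K < 1 is read as relaxation rather than as a shift toward positive selection.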

Key promoters remain conserved in Antarctic notothenioids

Both conserved non-coding regulatory elements described in previous work on human and teleost hemogen by Peters et al. 2018 [1], CNE1 and CNE2, were identified in E. maclovinus, N. coriiceps and C. aceratus (Figure 6). While the intergenic region between the CNEs is reduced in C. aceratus relative to D. rerio (Figures 1/6), both elements remain intact in all three species examined. Preliminary comparisons of key transcription factor binding sites show no significant departures or losses in C. aceratus compared with N. coriiceps or E. maclovinus. This includes putative binding sites for important co-factors like p300, GATA1, Sox9 and HoxB4 (Figure 7).

Gene size and structure remain largely conserved in icefish relative to red-blooded fish but show a large genetic lesion in exon 3

I used two species as representatives for assessing differences in the gene based on erythroid presence/absence: the red-blooded notothen N. coriiceps and the white-blooded icefish C. aceratus. Notothenioid hemogen is composed of 4 exons and 3 introns, similar to the previously described D. rerio [1]. hemogen is approximately the same size in both species: 1762bp in N. coriiceps, and 1701bp in C. aceratus as measured from the start codon to the stop codon. I observed no size change in exons 1, 2 & 4, and only small indels in each intron (1-12bp) (Figure 6). However, C. aceratus showed a significant deletion in exon 3: the loss of 89bp, which occurs within the tandem-repeat region (Figure 6/SFigure 3). This prompted further exploration of this region among icefish to: 1) determine its prevalence throughout the clade, and 2) assess its potential significance on transcription and translation.

Indels in exon 3 serve as the primary source of sequence divergence and evolutionary change among Antarctic notothenioids

I surveyed 14 of the 16 extant icefish species to determine if the deletion observed in C. aceratus is a species-specific feature or evidence of an evolutionary event in the Channichthyidae MRCA. All species examined showed evidence of an evolutionary loss at this locus, but in three different variants: an 89 bp deletion, a 90 bp deletion, and a 99 bp deletion (Figure 8). The deletions are not distributed evenly throughout the radiation and do not neatly correspond with the known phylogeny for icefish speciation (Figure 9). By far the most prevalent deletion was the 90 bp loss, found in a majority of species from the most ancestral (genus Champsocephalus) to the most recently derived (genus Chionodraco) (Figure 9). The 99 bp/33 aa loss emerged more recently and is contained within the clade consisting of the genera Chionobathyscus, Cryodraco, Chaenodraco, and Chionodraco (Figure 9). The 89 bp deletion was only present in species that did not form a monophyletic group. In addition, some species possess multiple deletion alleles: within the derived clade where the 99 bp deletion first emerged, 4 of 6 species were found to carry alleles for both the 90 bp and 99 bp deletion.

Analysis of the coding sequence (all exons) showed high sequence conservation regardless of the breadth of speciation among Antarctic notothenioids. The majority of evolutionary change centers on repeated insertion or deletion events, which are notable both for their frequency and for a tendency to recur independently in different species or clades within the same, often overlapping, region of the gene. CodeML model M0 showed low mutation rates even when E. maclovinus was included in the analysis: a tree length for dN of 0.3156, and a tree length for dS of 0.2284.

The majority of indels are concentrated in two specific regions within the hemogen protein: at the beginning of the region encoding the C-terminus and within the segment encoding the proline-rich region of the C-terminus, both of which are contained within exon 3. Within the species surveyed, I identified 24 unique indels in the coding regions of high-latitude Antarctic notothenioids: 5 insertions and 19 deletions (Figure 10). Of these 24 events, 23 occur within the segment encoding the C-terminus, and only one occurs in the segment encoding the N-terminus, in a single species (SFigure 3). In addition to the disproportionate prevalence of deletions over insertions, there is considerable variability among deletions and their occurrence within the tree when compared with insertions. Insertions are exclusively 1-2 amino acids in size and are almost solely contained within the Nototheniidae (with one exception among icefish). Deletions range from 1-33 amino acids. These indels are also the most phylogenetically distinct features of notothenioid hemogen; a full 22 of 24 indels are parsimony informative, while only 2 are homoplastic.

dN/dS shows disparate selection pressure in hemogen segments encoding the N-terminus vs the C-terminus

Pairwise comparisons show similar patterns: neutral mutation rate (as measured by dS) is low/non-existent in all comparisons. A significant number of pairwise comparisons were excluded from measurements of dN/dS because of a lack of synonymous mutations between the two sequences, resulting in N/A values (Table 4/5). Both within- and between-family comparisons show a preference for accumulating non-synonymous mutations, primarily within the segments encoding the C-terminus (Figure 11/12). The segments encoding the N-terminus are under purifying selection within families, and most between-family comparisons also display this trend (Table 4). A trend away from purifying selection, possibly relaxed to the point of coming under drift, is shown in between-family comparisons with Channichthyidae (Table 5), with all comparisons showing dN/dS > 1. The N-terminus still remains largely conserved in Channichthyidae (Figure 11), but given low neutral change, any nonsynonymous changes to the coding sequence will result in high dN/dS.
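The handling of N/A values described above can be sketched as follows. This is a minimal illustration, not the analysis pipeline used in this thesis: the function names and the example dN and dS values are hypothetical, chosen only to show how a comparison with no synonymous change drops out of the mean.

```python
from typing import Iterable, Optional, Tuple


def omega(dn: float, ds: float) -> Optional[float]:
    """Return dN/dS, or None (reported as N/A) when no synonymous change exists."""
    if ds == 0:
        return None
    return dn / ds


def mean_omega(pairs: Iterable[Tuple[float, float]]) -> Optional[float]:
    """Mean dN/dS over the pairwise comparisons where the ratio is defined."""
    defined = [w for w in (omega(dn, ds) for dn, ds in pairs) if w is not None]
    if not defined:
        return None  # every comparison lacked synonymous change
    return sum(defined) / len(defined)


# Hypothetical (dN, dS) values for three pairwise comparisons.
example_pairs = [(0.010, 0.020), (0.004, 0.0), (0.006, 0.004)]
print(mean_omega(example_pairs))
```

Because dS sits in the denominator, a single nonsynonymous change against a near-zero dS is enough to push the ratio above 1, which is why high values in this dataset should be read with caution.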

Discovery of novel splice variation in Channichthyidae that excludes key domains from translation

Two transcripts have been detected in surveyed icefish. The first transcript is hemgn-L, which includes all exons (Figure 2). The tissue survey for C. aceratus uncovered a novel splice variant not previously detected in zebrafish, N. coriiceps, or P. charcoti [unpublished transcriptome]. This transcript (hemgn-s) (Figure 2) splices from the end of exon 2 to a point beyond the “icefish deletion” in exon 3. Curiously, hemgn-s splices into another frameshift, resulting in the same truncation observed in hemgn-L from C. aceratus: a missense mutation and premature stop preventing translation of the acidic domain and exon 4 (Figure 2). This transcript has been detected in other icefish which do not show a frameshift in hemgn-L; in each, it yields a similarly truncated protein (Figure 2). As it does not include the beginning of exon 3, this transcript would also exclude the bipartite nuclear localization signal from any translated protein. If translated, it would produce a protein of approximately 78 aa with only one functional domain intact: the coiled-coil domain. The evolution of the hemgn-s splice form corresponds with an amino acid change at the end of exon 2 that results in the gain of a basic residue (SFigure 3, position 52 in alignment). Surveying the coding and protein sequences (SFigure 2/3) shows this change is unique to the emergence of Channichthyidae.

qPCR shows low levels of hemogen expression in adult tissues of icefish

Primers were designed to capture two types of hemogen expression in icefish. One is the hemgn-s variant, and the other is all “whole” hemgn. In icefish, this captures only the hemgn-L variant, while in N. coriiceps this captures all transcripts (excluding a potential hemgn-s transcript). Given the high standard deviations for icefish hemogen expression, I chose to analyze my results as dCT rather than fold change. I observed some hemogen expression in adult icefish, primarily in a non-frameshifted species, C. gunnari, with little to no expression in C. aceratus. Expression of “whole” hemogen is significantly reduced in icefish relative to red-blooded fish (Figure 13). The hemgn-s transcript is not expressed in N. coriiceps, but does appear to be expressed at levels roughly equivalent to hemgn-L in C. gunnari (Figure 13). Sequencing the qPCR product confirms that hemgn-s was detected by the primers in C. aceratus, albeit at a nearly undetectable and unreliably quantifiable level. Sequencing the N. coriiceps product for hemgn-s qPCR showed that an off-target fragment was amplified using the hemgn-s primers; N. coriiceps does not genuinely express the hemgn-s splice variant.
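The difference between reporting dCT and reporting fold change can be illustrated with a short sketch. It assumes the common convention that dCT is the target Ct minus the reference-gene Ct, and that fold change follows the 2^(-ddCT) form; the function names and all Ct values below are hypothetical and are not measurements from this study.

```python
def delta_ct(ct_target: float, ct_reference: float) -> float:
    """dCT: target Ct normalized to a reference gene (higher means less transcript)."""
    return ct_target - ct_reference


def fold_change(dct_sample: float, dct_calibrator: float) -> float:
    """Relative expression by the 2^(-ddCT) convention (assumed here)."""
    return 2.0 ** -(dct_sample - dct_calibrator)


# Hypothetical Ct values for illustration only.
dct_icefish = delta_ct(ct_target=30.0, ct_reference=18.0)      # 12.0
dct_red_blooded = delta_ct(ct_target=28.0, ct_reference=18.0)  # 10.0

print("dCT (icefish):", dct_icefish)
print("dCT (red-blooded):", dct_red_blooded)
print("fold change (icefish vs red-blooded):", fold_change(dct_icefish, dct_red_blooded))
```

Because fold change is an exponential transform of dCT, replicate-to-replicate noise in Ct is magnified after conversion, which is one reason to stay on the dCT scale when standard deviations are high.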

Conservation of functional domains & degradation of nuclear localization domain in Channichthyidae

Yang et al. [38] identified several structural features of the mouse ortholog: a coiled-coil domain and bipartite nuclear localization signal in the N-terminus, and an acidic domain in the C-terminus. I find these features conserved in red-blooded notothenioids, but partially lost or degraded in Channichthyidae. The coiled-coil domain is located at 25-39 aa; the bipartite NLS at 57-74 aa; and the acidic domain at 240-261 aa (SFigure 3). Exons 1, 2, and part of 3 encode the N-terminus. Exon 3 encodes the bulk of the C-terminus, and contains the bipartite NLS as well as a series of tandem repeats [1]. The icefish deletion appears to eliminate at least one tandem repeat from icefish relative to red-blooded notothens (Figure 6, 8) in those species with an in-frame deletion (Figure 8). The out-of-frame deletion causes a missense mutation impacting the tandem repeat structure further (Figure 8).

The bipartite nuclear localization signal has undergone degradation in all observed icefish species (Figure 14), indicating that this feature likely evolved in the MRCA of Channichthyidae. It has been previously established that excising the NLS from hemogen prevents nuclear localization [90], and simply mutating the initial amino acid of a bipartite NLS has been shown to be sufficient to inhibit nuclear localization in some genes [91]. Furthermore, the bipartite NLS is excised from the hemgn-s splice form. The degradation of the NLS coupled with the evolution of an isoform encoding a protein which excludes the NLS outright suggests this feature may be a victim of relaxed selection in icefish.


DISCUSSION

hemogen is under relaxed selection in Channichthyidae and potentially notothenioid fish at large

Relaxed selection plays an important role in evolution by permitting novel traits and functions to evolve (evolutionary innovation) and sometimes permitting expanded phenotypic plasticity [92, 93]. Alternatively, relaxed selection can lead to loss-of-function and possibly pseudogenization [94]. RELAX does show that changes observed in Channichthyidae relative to other Antarctic notothenioids are indicative of a trend towards relaxed selection on these branches. This method is beneficial for detecting subtle selective pressure but comes with a serious drawback: it cannot identify specific sites under relaxed selection, or substantiate more than a trend between a foreground and background set of branches, which requires some previously existing insight on where relaxed selection is most likely to occur [69]. Codon-based site tests are therefore still needed to fully confirm, or refute, whether changes at the level of individual amino acids can be attributed to relaxed selection rather than positive selection.

I conducted the branch-site test, which examined N. coriiceps and C. aceratus vs other teleosts, to specifically identify changes shared among Antarctic notothenioids which were not present in any of the teleost outgroups. Such changes might be indicative of adaptive changes associated with the hematic challenges of polar living, and would be good candidates for positive selection. Distinguishing between relaxed and positive selection can be difficult, as both can show a similar signal of increased ω when a gene should be under purifying selection [95]. The ideal scenario would be 1) to confirm relaxed selection, or 2) exclude the possibility that observed amino-acid changes might be due to positive selection. The branch-site test did not yield any sites under possible positive selection, let alone any that were considered significant under the most-robust Bayes Empirical Bayes analysis implemented in CodeML [96]. This would seem to support the conclusion that evolutionary change among Antarctic notothenioids is due to relaxed selection rather than positive selection. However, this test cannot assess changes observed within Antarctic notothenioids but not shared in common amongst all of them; that is, evolutionary change that emerged during speciation but is not shared in kind amongst the whole radiation.

When trying to assess change within the Antarctic clade, several factors make it difficult to directly test hypotheses of positive selection vs genetic drift on hemogen. Evolutionary hypothesis testing using dN/dS (ω) relies on codon-based models for best practices, where the numbers of silent and replacement changes are used to “decide” if there has been an excess of nonsynonymous change relative to synonymous change, and thus to make inferences about selective pressure. Large dN/dS estimates for extremely short branch lengths from model M0 indicate that any estimate of selective pressure should be considered unreliable as proof positive of positive selection, and that fundamental assumptions of the test may be violated. The current dataset fails to meet the sequence diversity requirement (dS over branches > 0.5) for any site test of selection to accurately test hypotheses about positive selection [97-101].
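The diversity check described in this paragraph amounts to simple arithmetic on the M0 tree lengths reported in the Results. The sketch below uses those two reported values and the dS > 0.5 threshold stated above. Treating the ratio of tree lengths as a crude tree-wide summary of dN/dS is my own simplification for illustration and is not a CodeML output.

```python
# Tree lengths reported for CodeML model M0 in the Results.
TREE_LENGTH_DN = 0.3156
TREE_LENGTH_DS = 0.2284

# Sequence diversity requirement as stated in the text (dS over branches > 0.5).
MIN_DS_FOR_SITE_TESTS = 0.5

# Crude tree-wide ratio (an illustrative simplification, not a model estimate).
tree_wide_ratio = TREE_LENGTH_DN / TREE_LENGTH_DS
meets_requirement = TREE_LENGTH_DS > MIN_DS_FOR_SITE_TESTS

print(f"dN / dS over the tree: {tree_wide_ratio:.3f}")
print(f"dS tree length {TREE_LENGTH_DS} exceeds {MIN_DS_FOR_SITE_TESTS}: {meets_requirement}")
```

The dS tree length falls below half of the stated threshold, so a ratio above 1 here reflects a very small denominator rather than strong evidence of positive selection.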

I had intended to use similar methods to explicitly test for relaxed selection, which rely on similar models and assumptions in order to assess selective pressure [102]. Further exploration of these models and their practical applications showed they would be no more robust than the tests for positive selection had been—due to either low levels of sequence divergence among my dataset, or lack of a comparison to assess potentially accelerated mutation rates and thus establish relaxed selection [103-106].


This illustrates an interesting problem in evolutionary biology: if the selective changes are subtle, or in recently diverged species, it can be difficult to pinpoint the selective cause behind the mutational effects. Among adaptive radiations, if the gene of interest is 1) recently under positive or relaxed selection, or 2) only under slight positive or relaxed selection, or 3) not a gene responsible for a strong phenotypic effect of adaptive change, potentially driving speciation—then it is unlikely that many current methods will be able to detect that selective pressure. Given the low mutational rate of change, there’s no evidence to suggest that notothenioid preference for non-synonymous change in this region is associated with positive selection or novel interactions at this time.

Potentially the best evidence for relaxed selection is post hoc—with repeated mutations that impair proper functioning of key domains, one could argue that this could only be due to relaxed selection. Frameshift mutations tend to have strongly adverse effects and have been linked to many cancers and diseases, but have occurred independently at least twice within the icefish clade. Frequent indels occurring within exon 3 are consistent with the hypothesis that genes associated with erythropoiesis underwent a change in the intensity of purifying selection as a result of Southern Ocean colonization [21]. To substantiate this, it is necessary to outline what the functional impacts of these indels might be and how they would tie into larger trends of evolutionary change among notothenioids, as I will do in the following sections.

The most conclusive evidence for relaxed selection would be establishing that icefish hemogen has undergone pseudogenization, complete loss-of-function, or relaxation of selectional constraints leading to deterioration of some functional domains. This could be accomplished either through observation (only a fragment remains, e.g., icefish alpha- & beta-globin) or via prediction of pseudogenization via comparative genomics [107]. While it is undeniable that hemogen has been impacted in icefish, I argue that it does not reach the level of complete pseudogenization, and that the functional picture is more complicated than simply: is hemogen nonfunctional in white-blooded fish?

Observed mutations in hemogen are not due to gene duplication or chromosomal rearrangement

If hemogen is under relaxed selection in notothenioids, it becomes important to determine whether or not the observed changes in the hemogen sequence represent changes to the true hemogen orthologue of zebrafish and other vertebrates, and to rule out some alternative possibilities for the evolutionary processes behind them. Two evolutionary processes could be in play: gene duplication in the icefish MRCA, which would result in a hemogen paralog that could be deteriorating; or chromosomal rearrangement/localized chromosomal breakage, resulting in the loss of part of the hemogen gene. Either could result in a signal of relaxed selection, but the evolutionary conclusions surrounding hemogen and its current role would differ depending on which process, if either, is in effect.

Gene duplications play a critical role in the evolutionary process, either as a contributor to phenotypic plasticity via evolution of new roles through neofunctionalization and subfunctionalization [108, 109], or by development of pseudogenes [94]. I ruled out ancestral gene duplication related to the whole-genome duplication event in teleosts [110-112], as analysis conducted with zebrafish ortholog Si:dkey-25o16.2 showed that most teleost genomes retain only a single hemogen ortholog [1]. However, a duplication event somewhere within the Antarctic radiation, or prior to the diversification of Channichthyidae, could be possible [113, 114]. If the hemogen first identified in C. aceratus were a paralog, or the product of a duplication event in the icefish MRCA, then subfunctionalization of non-erythropoietic processes could explain the C-terminus deterioration in said duplicate [115, 116] as well as some of the reduced expression of icefish hemogen [117]. Thus, it was critical for me to make sure I was analyzing the true hemogen ortholog in the notothenioids, and not a duplicate which has either subfunctionalized, neofunctionalized, or deteriorated into fragmentation and pseudogenization.

Alternatively, the large indels in Channichthyidae could be attributed to chromosomal breakage or rearrangement [118]. Studies with human ortholog EDAG identify it as a putative oncogene located at a chromosomal region (9q22) linked to leukemia-associated chromosomal breakpoints [119, 120]. If the hemogen locus has been disrupted by chromosomal breakage, the gene seen in icefish might be a fragment rather than a whole (but mutated) gene. Given the high sequence conservation of the hemogen gene, significant disruption around the locus seemed unlikely.

My experimental evidence already suggested a low possibility that either gene duplication or chromosomal rearrangement had taken place, given the high sequence conservation between red-blooded and white-blooded notothenioids (> 90%), and the lack of double bands following gel electrophoresis of PCR. To be entirely sure, I ruled out both possibilities by assessing the conservation of synteny surrounding hemogen in Antarctic notothenioids and by using available genomes to survey for similar sequences which might represent hemogen paralogs. I was able to check my C. aceratus sequence against the available C. aceratus [unpublished results] and N. coriiceps genomes (NCBI RefSeq NC_015653.1) [121] to confirm its chromosomal location and assess possible sequencing errors or duplicate genes. The sequence I obtained, with the observed 89 bp deletion, was present on the genome scaffold, and BLAST failed to obtain multiple hits for the whole-gene (exon + introns), coding sequence only (all exons), or partial coding (individual exons) sequences.

Thanks to my work sequencing the promoter region of hemogen, I was also able to assess the conservation of synteny among three representative species: C. aceratus (white-blooded), N. coriiceps (red-blooded), and E. maclovinus (Sub-Antarctic outgroup). Previous work by the Detrich lab established the conservation of synteny between zebrafish and other vertebrates [1] with anp32b upstream of hemogen, and TRMO downstream. I sequenced towards hemogen from both of those genes, and sequenced upstream and downstream from hemogen, in order to rule out a small, localized chromosomal rearrangement that might impact hemogen. The Eleginops sequence confirmed this localized synteny existed prior to the diversification of Antarctic notothenioids, and expanding my search to other teleosts, lobe-finned fish, and cartilaginous fish substantiated conservation of this ~15 kb region despite > 400 Ma of evolutionary distance [122-125].

I therefore conclude that the sequence I obtained is not a hemogen paralog in C. aceratus, and that it is unlikely that any other icefish sequences represent paralogs rather than true orthologs. I also conclude that the indels observed in both red-blooded and white-blooded fish are likely further evidence of relaxed selection acting upon hemogen, as opposed to larger-scale chromosomal factors that could cause significant deletion or genomic rearrangement.

Mutation is preferentially accumulated in areas known to be conserved and vital for erythropoiesis

The majority of evolutionary change across the Antarctic radiation occurs within the C-terminal region encoded in exon 3, and in particular within the proline-rich region composed of tandem repeats. This includes nonsynonymous amino acid changes, as illustrated in the pairwise dN/dS comparisons, but primarily occurs as indels. The C-terminus is critical for binding of p300, and the recruitment and binding to p300 is critical for Hemogen to promote erythroid differentiation in the human ortholog [55].

These indels also overlap or occur adjacent to a conserved C-terminal area identified as essential for erythropoiesis in zebrafish [1]. Previous studies used CRISPR/Cas9 to induce indels in zebrafish hemogen: a frameshift mutant deleting 5 aa, and an in-frame mutant deleting 12 aa and part of a conserved acidic motif (EEED). Analysis of hemoglobin concentration and numbers of circulating blood cells shows that indels in this region lead to reduced erythrocyte levels and hypochromatic blood in embryos of in-frame mutants, and that the proportion of anemic individuals increases even in the heterozygous condition in both mutant strains [1]. Frameshifted mutants did not appear to be translationally successful, whereas the non-frameshifted mutants did produce a slightly smaller Hemogen protein [1]. Indels in this region also impacted development via notochord and trunk defects in both frameshifted and in-frame indel mutant zebrafish, and increased cellular apoptosis was identified within frameshifted mutants throughout the embryo. Adult fish were statistically divergent in size from wild-type, especially within homozygous in-frame mutant strains [1].

The implications of this research for interpreting the prevalence of indels suggest several possibilities about relaxed selective pressures on erythropoiesis and hematopoietic traits. Zebrafish mutations targeting the C-terminus of Hemogen reduced erythrocyte levels in adults and decreased expression of Embryonic beta-globin regardless of in-frame or frameshift condition. Therefore, deletions around or within this area of the C-terminus should impair erythropoiesis in notothenioids even in the absence of frameshifts. The permissibility of hemogen indels in red-blooded fish supports the conclusion that Hemogen cannot be absolutely essential for erythropoiesis. Importantly, given that even heterozygous mutants show impaired erythropoiesis, “one good copy” of the gene would not be sufficient to prevent such an indel from having an effect on erythrocyte production and concentration.

Given that red-blooded notothenioids demonstrate reduced hematocrit and lowered erythrocyte levels, the observed in-frame indels in red-blooded fish may represent a response to the necessary adaptation to polar conditions. At this time it is not possible to resolve the chicken-and-egg nature of this situation: do the indels in hemogen represent a part of the adaptive changes needed to lower hematocrit/decrease blood viscosity, or did they occur from relaxed selection on erythrocyte-regulators as the importance of hemoglobin decreased throughout the radiation? Given the lack of a strong phylogenetic pattern in the distribution of indels among red-blooded species, I cannot distinguish between these two possibilities at this time. It is also unclear specifically how the mutations in red-blooded notothens impact binding with p300, or if they impair or inhibit access to a TAD. Similarly, it is unknown whether or not the nonsynonymous mutations have a strong functional impact (or any functional impact) on binding or protein structure in red-blooded notothenioids.

The adaptive value of frameshifts among icefish is another matter. The large size of the deletion (89-99 bp) and the fixation of a frameshift variant are most likely the result of relaxed selection, given that they should have significant adverse effects on erythropoiesis. Furthermore, during diversification of Channichthyidae two independent frameshift mutations occurred in this C-terminal region, resulting in truncated proteins that eliminate the acidic domain and the segment encoded by exon 4. If erythropoiesis is no longer required, then there is no longer any reason to selectively maintain the domains responsible for that function, and frameshift mutations may be tolerated to the point of fixation. Thus, the large indels may be indicative of ongoing subfunctionalization in the hemogen ortholog, paring down the gene to only those functional domains most essential to non-erythropoietic roles for hemogen.

Evolution of a novel splice variant missing majority of functional domains required for erythropoiesis might be a natural dominant negative

Regardless of which isoform an icefish may be expressing at any given time, at least two functional domains show evidence of relaxed selection. All icefish display a degraded bipartite NLS and some loss to the proline-rich/tandem repeat region implicated in the GATA1/p300/EDAG complex [1, 55]. All icefish can theoretically express the hemgn-s isoform that excludes everything but the coiled-coil domain from its truncated protein product. This elimination of functional domains from a key isoform could be a dominant negative mutation that would interfere with the functions of the Hemogen protein encoded by hemgn-L. This could be accomplished either by binding to key partners (such as GATA1 or p300) or potentially through oligomerization of Hemogen itself. Dominant negatives have an important role to play in inhibiting wild-type expression, and have particularly been implicated in disease and promotion of cancer formation. Additionally, dominant negatives in erythropoietic genes have been known to lead to anemic conditions [126, 127] and development of a dominant negative in hemogen could promote the erythrocyte-null condition. Dominant negatives have also been shown to increase favorable outcomes in acute myeloid leukemia at other loci [128], suggesting that dominant negative mutations in other putative oncogenes associated with leukemia (such as hemogen) might also be beneficial in decreasing proliferation of this particular type of tumor.


Key transcription factor binding domains for hematopoiesis- and/or erythropoiesis-promoting genes appear to be conserved in icefish despite the erythrocyte-null phenotype, including GATA1 [129], KLF4 [130], and Myb [57, 131]. With the promoter region and necessary CNEs for primitive erythropoiesis still potentially intact, hemogen could still be recruited to these processes in icefish. If hemgn-s is a dominant negative, it might interact with some complexes in such a way as to prohibit their functions to promote erythropoiesis, while still permitting non-erythropoietic function to occur.

Several caveats to this theory need to be stated. At this time, it is unclear how expression of hemgn-s is regulated and what promoters might be behind it, or if it is controlled by a promoter region further upstream that has yet to be discovered. It is unclear how it is differentially expressed relative to wild-type hemogen, or if promoting expression of wild-type hemogen means that hemgn-s is automatically expressed as well. However, if hemgn-s could be conclusively proven to encode a dominant negative (or, alternatively, if other forms of icefish hemogen could function as a dominant negative in some processes), the implications could be significant for non-erythropoietic research as well, given that the C-terminus is also implicated in cell apoptosis and developmental defects and delays in zebrafish [1].

Large-scale deletions and degradation of functional domains occur concurrent with loss of alpha- & beta-globin expression

The majority of functional change in Channichthyidae appears to be traceable to events that occurred within the MRCA of icefish, prior to the diversification of the clade. The major deletion in exon 3, the degradation of the bipartite NLS, and the evolution of a novel splice variant are all shared among extant icefish, and the most parsimonious explanation is that these characters developed prior to diversification and speciation.

Additionally, the icefish-specific deletions provide insight into the evolutionary history of the icefish clade that implicate globin-loss in relaxed selection on hemogen. Based on analysis of hemogen allelic variation, the most likely evolutionary scenario for deletions and deteriorations is that the 90bp/30aa deletion occurred in the MRCA of icefish, possibly as a consequence of relaxed constraints following the loss of hemoglobin expression. From there two independent mutations occurred: 1) a 1 bp insertion leading to the “frameshift” allele observed in C. aceratus and Neopagetopsis ionah, which became fixed in some species, and 2) a secondary 9 bp deletion, which has fixed in some but not all of the most recently speciated/most derived lineages.
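The proposed sequence of events can be checked against the three observed deletion sizes with simple arithmetic. The sketch below encodes the scenario described above (an ancestral 90 bp deletion, followed independently by a 1 bp insertion or a further 9 bp deletion) and classifies each resulting net loss by its effect on the reading frame. The dictionary labels and the function name are illustrative only.

```python
ANCESTRAL_DELETION_BP = 90  # deletion proposed for the icefish MRCA

# Net loss relative to red-blooded notothenioids under the proposed scenario.
net_loss_bp = {
    "ancestral deletion only": ANCESTRAL_DELETION_BP,
    "ancestral deletion + 1 bp insertion": ANCESTRAL_DELETION_BP - 1,
    "ancestral deletion + 9 bp deletion": ANCESTRAL_DELETION_BP + 9,
}


def reading_frame_effect(loss_bp: int) -> str:
    """A net loss divisible by 3 removes whole codons; anything else shifts the frame."""
    return "in-frame" if loss_bp % 3 == 0 else "frameshift"


for label, loss in net_loss_bp.items():
    effect = reading_frame_effect(loss)
    codons = f", {loss // 3} aa removed" if effect == "in-frame" else ""
    print(f"{label}: {loss} bp, {effect}{codons}")
```

The scenario reproduces the observed 89, 90, and 99 bp variants, with only the 89 bp allele disrupting the reading frame.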

Distribution of these deletion variants is consistent with incomplete lineage sorting, which is common in adaptive radiations due to rapid speciation [132], but also consistent with possible introgression. Past introgression events have been detected in some icefish species [133]. The evolution of these traits suggests relaxed selection permitting further deterioration following the removal of a functional constraint: i.e., the non-expression of erythrocytes.


Table 1. Primers used in PCR and qRT-PCR reactions to amplify hemogen gDNA and cDNA in Antarctic notothenioids

PRIMER NAME       SEQUENCE                        PURPOSE     SPECIES
Ncor130for        5'-TGGAGGAGACATTTCAACA-3'       gDNA, cDNA  Antarctic notothenioids
NcHemRev2         5'-ACTAACAGGATGCACACTAACC-3'    gDNA, cDNA  Antarctic notothenioids
QP_CA500SpliceF2  5'-GACTAACCAGTGGGTTTAAGCC-3'    qPCR        C. aceratus, C. gunnari
NcHemRev1         5'-TTGTGGAGGAGGTGTCGAG-3'       qPCR        Antarctic notothenioids
hemAllqPCRFor     5'-AGAATGGAGGAGACATTTCAACA-3'   qPCR        C. aceratus, C. gunnari, N. coriiceps
hemAllqPCRRev1    5'-TTCCTCAGAAGATCCCTGTC-3'      qPCR        C. aceratus, C. gunnari
hemAllqPCRRev2B   5'-CTTGTCTTCTGCTTCAGCTT-3'      qPCR        N. coriiceps
RTBactF           5'-CAGATCATGTTCGAGACCTTCAAC-3'  qPCR        C. aceratus, C. gunnari, N. coriiceps
RTBactR           5'-TCACCRGARTCCATGACGATA-3'     qPCR        C. aceratus, C. gunnari, N. coriiceps


Table 2. Species sequenced and included in study of Antarctic notothenioid hemogen

ORGANISM                          FAMILY            LOCALE
Champsocephalus esox*             Channichthyidae   Sub-Antarctic
Champsocephalus gunnari*          Channichthyidae   Antarctic
Neopagetopsis ionah*              Channichthyidae   Antarctic
Pagetopsis macropterus*           Channichthyidae   Antarctic
Pseudochaenichthys georgianus*†   Channichthyidae   Antarctic
Dacodraco hunteri                 Channichthyidae   Antarctic
Channichthys rhinoceratus*        Channichthyidae   Antarctic
Chaenocephalus aceratus*†         Channichthyidae   Antarctic
Chionobathyscus dewitti*          Channichthyidae   Antarctic
Cryodraco antarcticus*            Channichthyidae   Antarctic
Chaenodraco wilsoni*              Channichthyidae   Antarctic
Chionodraco myersi*               Channichthyidae   Antarctic
Chionodraco hamatus*              Channichthyidae   Antarctic
Chionodraco rastrospinosus*       Channichthyidae   Antarctic
Parachaenichthys charcoti†        Bathydraconidae   Antarctic
Gerlachea australis               Bathydraconidae   Antarctic
Bathydraco marri                  Bathydraconidae   Antarctic
Akarotaxis nudiceps               Bathydraconidae   Antarctic
Vomeridens infuscipinnis          Bathydraconidae   Antarctic
Racovitzia glacialis              Bathydraconidae   Antarctic
Pogonophryne barsukovi            Artedidraconidae  Antarctic
Pogonophryne scotti               Artedidraconidae  Antarctic
Dolloidraco longedorsalis         Artedidraconidae  Antarctic
Histiodraco velifer               Artedidraconidae  Antarctic
Artedidraco skottsbergi           Artedidraconidae  Antarctic
Harpagifer antarcticus            Harpagiferidae    Antarctic
Notothenia rossii                 Nototheniidae     Antarctic
Notothenia coriiceps*             Nototheniidae     Antarctic
[organism name missing]           Nototheniidae     Sub-Antarctic
Gobionotothen gibberifrons        Nototheniidae     Antarctic
Pleuragramma antarctica           Nototheniidae     Antarctic
Trematomus hansoni*               Nototheniidae     Antarctic
Trematomus bernacchii*            Nototheniidae     Antarctic
Trematomus eulepidotus            Nototheniidae     Antarctic
Trematomus borchgrevinki          Nototheniidae     Antarctic
Trematomus newnesi*               Nototheniidae     Antarctic
Trematomus scotti                 Nototheniidae     Antarctic
[organism name missing]           Nototheniidae     Antarctic
Patagonotothen cornucola          Nototheniidae     Antarctic
Lepidonotothen nudifrons          Nototheniidae     Antarctic
Dissostichus mawsoni              Nototheniidae     Antarctic
Dissostichus eleginoides          Nototheniidae     Antarctic
Aethotaxis mitopteryx             Nototheniidae     Antarctic
Eleginops maclovinus*             Eleginopsidae     Sub-Antarctic
Pseudaphritis urvilli             Pseudaphritidae   Eastern Australia
Cottoperca gobio                  Bovichtidae       Sub-Antarctic
Bovichtus diacanthus              Bovichtidae       Sub-Antarctic
Percophis brasiliensis            Percophidae       South America
Etheostoma nigrum                 Percidae          North America
Gasterosteus aculeatus            Gasterosteidae    Northern Hemisphere

* sequenced gDNA † transcriptome available


Table 3. Codon usage bias for hemogen (total coding sequence) among Antarctic notothenioids

SPECIES                          CAI
Eleginops maclovinus             0.34
Aethotaxis mitopteryx            0.352
Dissostichus mawsoni             0.37
Dissostichus eleginoides         0.345
Lepidonotothen nudifrons         0.357
Patagonotothen cornucola         0.378
Patagonotothen guntheri          0.377
Trematomus bernacchii            0.356
Trematomus borchgrevinki         0.363
Trematomus eulepidotus           0.36
Trematomus hansoni               0.376
Trematomus newnesi               0.35
Trematomus scotti                0.368
Gobionotothen gibberifrons       0.324
Notothenia angustata             0.359
Notothenia rossii                0.367
Notothenia coriiceps             0.358
Harpagifer antarcticus           0.339
Artedidraco skottsbergi          0.332
Histiodraco velifer              0.333
Dolloidraco longedorsalis        0.327
Pogonophryne barsukovi           0.329
Pogonophryne scotti              0.329
Gerlachea australis              0.334
Parachaenichthys charcoti        0.332
Racovitzia glacialis             0.346
Vomeridens infuscipinnis         0.343
Akarotaxis nudiceps              0.349
Bathydraco marri                 0.337
Chaenocephalus aceratus          0.309
Dacodraco hunteri                0.342
Neopagetopsis ionah              0.338
Cryodraco antarcticus            0.316
Chionodraco hamatus              0.332
Chionodraco rastrospinosus       0.329
Champsocephalus esox             0.343
Chionodraco myersi               0.304
Pagetopsis macropterus           0.344
Champsocephalus gunnari          0.337
Chaenodraco wilsoni              0.309
Channichthys rhinoceratus        0.337
Chionobathyscus dewitti          0.313
Pseudochaenichthys georgianus    0.327
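
The codon adaptation index of Sharp and Li [85] is the geometric mean of the relative adaptiveness values (w) of the codons in a gene, where w is a codon's frequency divided by that of the most used synonymous codon in a reference set. The following is a minimal Python sketch of that calculation, not the procedure used to produce Table 3; the reference codon counts, the test sequence, and the function names are hypothetical and chosen only for illustration.

```python
import math

def relative_adaptiveness(reference_counts):
    """w = codon count / count of the most used synonymous codon, per amino acid."""
    weights = {}
    for codon_counts in reference_counts.values():
        top = max(codon_counts.values())
        for codon, count in codon_counts.items():
            weights[codon] = count / top
    return weights

def cai(sequence, weights):
    """Geometric mean of w over the codons of `sequence` that have a weight."""
    usable = len(sequence) - len(sequence) % 3
    codons = [sequence[i:i + 3] for i in range(0, usable, 3)]
    logs = [math.log(weights[c]) for c in codons if c in weights]
    return math.exp(sum(logs) / len(logs))

# Hypothetical reference codon counts for two amino acids (illustration only).
reference_counts = {
    "Lys": {"AAA": 20, "AAG": 80},
    "Glu": {"GAA": 30, "GAG": 60},
}
weights = relative_adaptiveness(reference_counts)

# Hypothetical three-codon sequence: AAG (w = 1), GAA (w = 0.5), AAA (w = 0.25).
score = cai("AAGGAAAAA", weights)
print(f"CAI = {score:.3f}")
```

A value near 1 indicates that a gene uses mostly the preferred codons of the reference set; lower values indicate weaker codon usage bias.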


Table 4. Mean pairwise dN/dS for within-family comparisons of Antarctic notothenioid families

family              total    N-terminus   C-terminus
Artedidraconidae    1.381    N/A          N/A
Bathydraconidae     0.807    0.880        0.866
Nototheniidae       1.320    0.716        1.191
Channichthyidae     1.077    0.435        1.190


Table 5. Mean pairwise dN/dS for between-family comparisons of Antarctic notothenioid families

family pair                         total    N-terminus   C-terminus
Harpagiferidae-Nototheniidae        5.889    0.656        5.793
Harpagiferidae-Artedidraconidae     5.889    0.394        N/A
Harpagiferidae-Bathydraconidae      1.980    1.172        2.225
Harpagiferidae-Channichthyidae      4.737    2.171        4.723
Bathydraconidae-Nototheniidae       3.010    0.839        2.991
Bathydraconidae-Artedidraconidae    1.754    0.557        1.184
Bathydraconidae-Channichthyidae     2.225    1.844        1.653
Artedidraconidae-Nototheniidae      3.158    0.516        3.708
Artedidraconidae-Channichthyidae    2.601    1.772        1.963
Channichthyidae-Nototheniidae       2.865    1.895        2.597


Table 6. Results of codon-based site tests conducted in CodeML on the Antarctic radiation

TEST       ΔLRT      DF   P-VALUE
M0-M3      9.08037   4    p = 0.0591
M1a-M2a    8.3575    2    p = 0.0153
M7-M8      10.1192   2    p = 0.0063
M8-M8a     8.07556   1    p = 0.0045
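
The p-values in Table 6 are consistent with comparing each likelihood-ratio statistic against a chi-square distribution with the listed degrees of freedom. The Python sketch below illustrates that check, assuming the ΔLRT column holds the usual statistic 2(lnL1 - lnL0). The helper `chi2_sf` is a hypothetical name and uses closed forms that hold only for df = 1 or even df, which covers every row of the table.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable (df = 1 or even df only)."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df % 2 == 0:
        half = x / 2.0
        # exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        terms = (half ** k / math.factorial(k) for k in range(df // 2))
        return math.exp(-half) * sum(terms)
    raise ValueError("closed form implemented only for df = 1 or even df")

# Statistics and degrees of freedom as listed in Table 6.
table6 = {
    "M0-M3": (9.08037, 4),
    "M1a-M2a": (8.3575, 2),
    "M7-M8": (10.1192, 2),
    "M8-M8a": (8.07556, 1),
}
p_values = {test: chi2_sf(stat, df) for test, (stat, df) in table6.items()}
for test, p in p_values.items():
    print(f"{test}: p = {p:.4f}")
```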


Figure 1. Zebrafish Si:dkey-25o16.2 and human Hemogen are orthologous and encode related proteins that differ in size. (A) Structure of the zebrafish Hemogen-like gene, Si:dkey-25o16.2. Two conserved noncoding elements (C1 and C2, black boxes) were identified in a 2 kb segment proximal to the start codon (see Results, Figs 4-6). Coding exons, white boxes; noncoding exons, gray boxes. Numbers indicate length in bp. (B) Synteny of loci for zebrafish Si:dkey-25o16.2 on chromosome 1 and Hemogen on human chromosome 9 (region q22). Transcriptional orientations indicated by arrows. (C) Alternative splicing of zebrafish Hemogen-like transcripts showing sequenced regions. Introns are shown as chevrons. Transcripts 1 and 2 differ by retention of 12 bp of intron (red). (D) Modular structures of zebrafish and human Hemogen proteins each encoded by four exons (numbered boxes). Locations of truncating mutations found in some human cancers (Forbes et al., 2017) are indicated by asterisks. Predicted regions and motifs: green, coiled coil; blue, nuclear localization signal; red, four residues introduced by alternative splicing; yellow, tandem peptide repeats; brown, acidic repeat with transactivation domain (TAD) motif; gray, no prediction. (E) Three-dimensional ab initio models of Hemogens. The ribbon diagram of the zebrafish protein, color-coded as in panel D, is superimposed on the gray, space-filling model for the human protein. (Reproduced with permission from Biology Open.)


Figure 2. Icefish transcript variants for hemogen and their putative effects on translation, illustrated in the representative species Champsocephalus gunnari. Although several transcripts are possible, only two have been confirmed for hemogen in icefish. The first transcript, hemgn-L, includes all exons. The novel transcript, hemgn-s, splices from the end of exon 2 to a site downstream of the 90 bp deletion region; it also splices into a frameshift similar to that observed in C. aceratus, so the end of exon 3 and all of exon 4 would not be translated. The hemgn-s transcript would therefore exclude the bipartite NLS, the proline-rich tandem repeat domain, and the acidic region. It has been detected in all icefish surveyed, regardless of their exon 3 deletion allele. Additionally, all transcripts detected in surveyed icefish (C. aceratus, C. gunnari, P. georgianus and C. rastrospinosus) show that a potential splice variant feature at the end of exon 2 (+/- 4AA), present in teleost fish, splices only with the +4AA form (blue) in icefish. (C) Illustration of the functional domains that would be included in the Hemogen protein encoded by hemgn-s. If translated, only the coiled-coil domain (green) would be present in the Hemogen protein; the splice form would exclude the functional domains encoded within exon 3: the bipartite nuclear localization signal, the tandem repeats, and the acidic region. The light-grey regions on the protein illustration represent areas with no predicted functional domain.


Figure 3. Maximum likelihood tree used to test for positive selection on the branch leading to the Antarctic notothenioid clade. Tree was constructed based on the coding sequences derived from each species. The background branch includes all non-Antarctic outgroups ranging from as far as the three-spine stickleback (G. aculeatus) to close Sub-Antarctic relative E. maclovinus. The foreground branch is marked in red and includes two representative species: one red-blooded (N. coriiceps) and one white-blooded (C. gunnari).


Figure 4. Maximum likelihood tree used in site-tests for positive/pervasive selection among Antarctic notothenioids. Tree was constructed in RAxML (see Methods) and used for calculating changes to site dN/dS for all models, in order to detect pervasive selection among the high-latitude Antarctic notothenioids. No such selection was detected.


Figure 5. RELAX tree shows relaxed selection on the branches containing Bathydraconidae and Channichthyidae, demonstrating a trend of relaxed selection in hemogen on the way to the erythrocyte-null phenotype. The test for selection relaxation (K = 0.25) was significant (p = 0.002, LR = 9.77). The test branch is indicated in light teal and represents the K value (K = 0.25) for this branch relative to the background (K = 1, not colored).
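
In the RELAX framework [69], the selection intensity parameter K acts as an exponent on the ω (dN/dS) rate classes of the reference branches, so that each test-branch class equals the reference class raised to the power K. With K < 1, every class moves toward ω = 1, which is the signature of relaxation. The Python sketch below illustrates this with the K reported above; the reference ω classes are hypothetical values chosen for illustration, not estimates from this analysis. It also checks the reported likelihood ratio against a chi-square distribution with one degree of freedom.

```python
import math

K = 0.25  # selection intensity parameter reported in Figure 5

# Hypothetical reference-branch omega classes (not estimates from the thesis).
reference_omegas = [0.05, 0.5, 4.0]
test_omegas = [w ** K for w in reference_omegas]

for ref, test in zip(reference_omegas, test_omegas):
    print(f"reference omega = {ref:.2f} -> test omega = {test:.3f}")

# Likelihood-ratio statistic reported in Figure 5, one degree of freedom.
LR = 9.77
p_value = math.erfc(math.sqrt(LR / 2.0))
print(f"p = {p_value:.4f}")
```

Both purifying (ω < 1) and diversifying (ω > 1) classes are compressed toward neutrality under K = 0.25, and the resulting p-value agrees with the reported p = 0.002 at three decimal places.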


Figure 6. Gene structure and size remain conserved among red-blooded and white-blooded notothenioids, including regulatory regions conserved among teleost fish. Structure of the coding region in both Notothenia coriiceps (A) and Chaenocephalus aceratus (B) matches that observed in other teleosts, as well as in other vertebrates, and shows strong size conservation for exons 1, 2 and 4, as well as all introns. Two conserved non-coding elements described in Peters et al. 2018 for D. rerio are also still present in both species. The hemogen regulatory structure shown (C) is derived from C. aceratus intergenic sequencing, but the general spacing approximates that of the regulatory regions of both N. coriiceps and the Sub-Antarctic relative Eleginops maclovinus. While the 5’ intergenic region of notothenioid hemogen is decreased in size relative to D. rerio (Peters et al. 2018), both CNEs are intact and show no significant genetic lesions relative to other vertebrate species. (D) shows the partitioning of functional domains among exons in a representative red-blooded species, Trematomus scotti. There are four primary domains: the coiled-coil domain (exon 2), the bipartite nuclear localization signal (exon 3), a proline-rich region composed of a variable number of tandem repeats (exon 3) and an acidic domain (exon 3). (E) shows these functional domains in a white-blooded fish, Champsocephalus gunnari. Preliminary analysis of tandem repeats in icefish shows that the large deletion in exon 3 may have resulted in the loss of at least one repeat.


Figure 7. Conservation of conserved non-coding elements CNE1 and CNE2 in Antarctic notothenioids relative to Gasterosteus aculeatus and Danio rerio. Transcription factor binding sites were predicted using ConTra v2 (Broos et al. 2011) based on previously identified key binding factors in both human hemogen and D. rerio. Relative to Antarctic notothenioids and representative teleost species, C. aceratus does not show significant deterioration of either CNE1 (A) or CNE2 (B) and still possesses many putative binding sites for key co-factors such as p300, GATA1 and Sox9. Binding sites are colored as follows: Foxl2 (orange), GFI1 (light blue), KLF4 (bright green), HNF1 (pink), HOXB4 (light brown), MYB (cyan/lavender), P300 (grey), Sox9 (red), GATA1 (dark pink/dark orange).


Figure 8. hemogen exon 3 deletions in representative species from Channichthyidae relative to a red-blooded notothenioid, and their predicted effects on transcription and translation. The genetic lesion representing the most significant mutation to icefish hemogen takes three key forms: a 90 bp deletion, a 99 bp deletion, and an 89 bp deletion. (A) shows a representative red-blooded notothenioid, Trematomus scotti, which does not possess any lesions in exon 3. (B) shows representative C. gunnari, which possesses the 90 bp form of the deletion. Putative translation shows this would result in a 30AA deletion but does not produce a frameshift, and the rest of the gene should be translated normally. Similarly, (C) shows that with the 99 bp deletion of C. rastrospinosus the protein would be somewhat reduced in length but still translated normally in frame. However, the 89 bp deletion first observed in C. aceratus (D) would result in a frameshift leading to a premature stop codon and a truncated protein. The transcript possessing all exons is illustrated here in red; grey regions indicate sequence that would be excluded from translation as a result of the premature stop codon. The Hemogen protein structure is illustrated underneath each transcript to show which features would be only partly translated, or not translated at all, because of the exon 3 deletions in icefish. Functional domains are colored as follows: coiled-coil, green; nuclear localization signal, blue; tandem repeats, yellow; acidic domain, red; light grey, no predicted domain.
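
Whether each deletion preserves the reading frame follows from simple arithmetic: a deletion whose length is a multiple of three removes whole codons, while any other length shifts the frame of everything downstream. A minimal Python sketch of that check for the three deletion lengths named in the caption (the function name is hypothetical):

```python
def deletion_effect(deleted_bp):
    """Classify a coding-sequence deletion by whether it preserves the reading frame."""
    codons_lost, remainder = divmod(deleted_bp, 3)
    if remainder == 0:
        return f"in-frame: {codons_lost} codons lost"
    return f"frameshift: length is not a multiple of 3 (remainder {remainder})"

# Deletion lengths described in Figure 8.
effects = {bp: deletion_effect(bp) for bp in (90, 99, 89)}
for bp, effect in effects.items():
    print(f"{bp} bp deletion -> {effect}")
```

The 90 bp deletion corresponds to the 30 amino acid loss described for C. gunnari, and only the 89 bp form disrupts the frame.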


Figure 9. Variant forms of hemogen “exon 3” deletion mapped onto the Channichthyidae species tree. The deletion forms are not distributed evenly throughout the tree and follow a loose evolutionary pattern at best. The most common form of the deletion is the 90 bp loss (green), which may be the ancestral form of this lesion, as it is present throughout the entire tree. Second most common is the 99 bp deletion (blue), which evolved within the more recently speciated lineages (< 4 Mya) and does not appear to be fully fixed in most species, as it frequently appears as a second allele alongside the 90 bp deletion. The 89 bp deletion, resulting in a frameshift, follows no clear evolutionary pattern. This mutation appears to be fixed in species such as C. aceratus, and surveying multiple individuals failed to detect any non-frameshifted hemogen variants in these species. Tree topology was recreated using the mitochondrial tree of Near et al. 2006 and the putative species tree built by Dr. Jacob Daane [unpublished results].


Figure 10. hemogen indels in Antarctic notothenioids mapped onto a maximum parsimony tree. Maximum parsimony tree was constructed in MEGA7 based on putative protein sequences (see Methods). All indels marked appear in the coding sequence and would have an impact on the protein composition. Purple triangles indicate a deletion event, while red triangles indicate an insertion. Indels marked with a colored star are homoplastic and appear to have occurred independently several times throughout Antarctic notothenioid diversification. However, the indel marked with a green star is an artifact introduced by the parsimony construction process; due to the high sequence similarity among icefish, the parsimony method cannot adequately model correct species relationships in this clade.


Figure 11. Pairwise dN/dS comparisons plotting total dN/dS of whole Hemogen-encoding sequence with the dN/dS values for the N-terminus and C-terminus of notothenioid Hemogen, within families Nototheniidae (A & B) and Channichthyidae (C & D). This demonstrates the relative changes in selective pressure of the N-terminus vs the C-terminus, relative to the overall selective pressure on the whole Hemogen protein. Values were obtained by conducting pairwise-comparisons in the module yn00 of PAML 4 (see Methods). Each dot represents a unique species pair and the calculated dN/dS values for the N-terminus, C-terminus, and total coding sequence for that specific species comparison.


Figure 12. Pairwise dN/dS trends between families Nototheniidae and Channichthyidae, plotting whole-Hemogen dN/dS vs the N-terminus (A) or C-terminus (B). This demonstrates the relative changes in selective pressure of the N-terminus vs the C-terminus, relative to the overall selective pressure on the whole Hemogen protein. Values were obtained by conducting pairwise-comparisons in the module yn00 of PAML 4 (see Methods). Each dot represents a unique species pair and the calculated dN/dS values for the N-terminus, C-terminus, and total coding sequence for that specific species comparison.


Figure 13. qPCR quantification of hemogen transcript variants in representative icefish species C. aceratus and C. gunnari, comparing adult head kidney hemogen expression with N. coriiceps adult head kidney for both hemgn-L and hemgn-s splice variants. Primers were designed to capture two types of hemogen expression in icefish. One is the hemgn-s variant and the other is referred to as “whole” hemogen. In icefish, this captures only the hemgn-L variant, while in N. coriiceps this captures all transcripts (excluding a potential hemgn-s transcript). When normalized to beta-actin expression, we see some hemogen expression in adult icefish, particularly in the non-frameshifted species C. gunnari, but no amplification of hemgn-L in the species with the frameshift indel, C. aceratus. In C. gunnari, hemgn-s is expressed at relatively similar levels to hemgn-L, whereas in C. aceratus the variant hemgn-s is expressed at such low levels that it is nearly undetectable. While it appears that N. coriiceps may also express hemgn-s at low levels, sequencing of qPCR product shows this is off-target binding and not legitimate amplification of the targeted splice variant. N. coriiceps does not express hemgn-s.
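
Normalization to a reference gene is commonly done with the comparative CT method [89], in which expression of a target relative to the reference is 2 raised to the negative difference of their threshold cycles. The Python sketch below assumes that method applies here; the Ct values and function name are hypothetical and are not measurements from this study.

```python
def relative_expression(ct_target, ct_reference):
    """Expression of a target relative to a reference gene via 2^-(dCt)."""
    return 2.0 ** -(ct_target - ct_reference)

# Hypothetical Ct values for illustration only (not measurements from the thesis).
ct_beta_actin = 18.0
ct_values = {"hemgn-L": 26.0, "hemgn-s": 27.0}

normalized = {
    name: relative_expression(ct, ct_beta_actin) for name, ct in ct_values.items()
}
for name, value in normalized.items():
    print(f"{name}: {value:.6f} relative to beta-actin")
```

Each additional cycle needed to reach threshold halves the estimated relative expression, which is why a one-cycle difference between two variants corresponds to a two-fold difference in transcript abundance.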


Figure 14. Changes to the bipartite nuclear localization signal in icefish (Champsocephalus gunnari) relative to red-blooded notothens (Notothenia coriiceps). The bipartite nuclear localization signal consists of two clusters of positively charged amino acids (typically lysine and arginine) separated by a spacer sequence. In C. gunnari and other icefish, the first portion of the bipartite signal has been degraded at the first (R -> M) and fourth (R -> S) positions. This would result in a decreased positive charge and negatively impact the nuclear localization process.
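
The loss of positive charge described above can be illustrated by counting basic residues before and after the two substitutions. The Python sketch below uses a hypothetical four-residue cluster as a placeholder, since the actual notothenioid NLS sequence is not reproduced in this caption; only the substitution positions and residues (R to M at position 1, R to S at position 4) come from the text, and the function names are hypothetical.

```python
def basic_residue_count(peptide):
    """Count lysine (K) and arginine (R) residues in a peptide."""
    return sum(residue in "KR" for residue in peptide.upper())

def substitute(peptide, changes):
    """Apply substitutions given as {1-based position: new residue}."""
    residues = list(peptide)
    for position, new_residue in changes.items():
        residues[position - 1] = new_residue
    return "".join(residues)

# Hypothetical placeholder cluster, not the N. coriiceps sequence.
red_blooded_cluster = "RKKR"
# Substitutions described in Figure 14: R -> M at position 1, R -> S at position 4.
icefish_cluster = substitute(red_blooded_cluster, {1: "M", 4: "S"})

print(red_blooded_cluster, basic_residue_count(red_blooded_cluster))
print(icefish_cluster, basic_residue_count(icefish_cluster))
```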


REFERENCES

1. Peters, M.J., et al., Divergent Hemogen genes of teleosts and mammals share conserved roles in erythropoiesis: analysis using transgenic and mutant zebrafish. Biol Open, 2018. 7(8).
2. Near, T.J., et al., Identification of the notothenioid sister lineage illuminates the biogeographic history of an Antarctic adaptive radiation. BMC Evol Biol, 2015. 15(109): p. 1-14.
3. Scher, H.D. and E.E. Martin, Timing and Climatic Consequences of the Opening of Drake Passage. Science, 2006. 312: p. 428-430.
4. Barker, P.F., et al., Onset and role of the Antarctic Circumpolar Current. Deep Sea Research Part II: Topical Studies in Oceanography, 2007. 54(21-22): p. 2388-2398.
5. Chen, L., A. Devries, and C.H. Cheng, Evolution of antifreeze glycoprotein gene from a trypsinogen gene in Antarctic notothenioid fish. Proc Natl Acad Sci U S A, 1997. 94: p. 3811-3816.
6. Cheng, C.H. and L. Chen, Evolution of an antifreeze glycoprotein: a blood protein that keeps Antarctic fish from freezing arose from a digestive enzyme. Nature, 1999. 401: p. 443-444.
7. Near, T.J., et al., Ancient climate change, antifreeze, and the evolutionary diversification of Antarctic fishes. Proc Natl Acad Sci U S A, 2012. 109(9): p. 3434-3439.
8. Shevenell, A.E., J.P. Kennett, and D.W. Lea, Middle Miocene Southern Ocean Cooling and Antarctic Cryosphere Expansion. Science, 2004. 305: p. 1766-1770.
9. Clarke, A., D.K. Barnes, and D.A. Hodgson, How isolated is Antarctica? Trends Ecol Evol, 2005. 20(1): p. 1-3.
10. Clarke, A. and I.A. Johnston, Evolution and adaptive radiation of Antarctic fishes. Trends Ecol Evol, 1996. 11(5): p. 212-218.
11. Dornburg, A., et al., Cradles and museums of Antarctic teleost biodiversity. Nat Ecol Evol, 2017. 1(9): p. 1379-1384.
12. Tripati, A.K., C.D. Roberts, and R.A. Eagle, Coupling of CO2 and ice sheet stability over major climate transitions of the last 20 million years. Science, 2009. 326(5958): p. 1394-7.
13. Pollard, D. and R.M. DeConto, Modelling West Antarctic ice sheet growth and collapse through the past five million years. Nature, 2009. 458(7236): p. 329-32.
14. Thatje, S., et al., Life hung by a thread: endurance of Antarctic fauna in glacial periods. Ecology, 2008. 89(3): p. 682-692.
15. Cheng, C.H. and H.W. Detrich, 3rd, Molecular ecophysiology of Antarctic notothenioid fishes. Philos Trans R Soc Lond B Biol Sci, 2007. 362(1488): p. 2215-32.
16. Johns, G.C. and J.C. Avise, Tests for ancient species flocks based on molecular phylogenetic appraisals of Sebastes rockfishes and other marine fishes. Evolution, 1998. 52(4): p. 1135-1146.
17. Eastman, J.T., The nature of the diversity of Antarctic fishes. Polar Biology, 2005. 28(2): p. 93-107.
18. Schluter, D., The Ecology of Adaptive Radiation. 2000, Oxford: OUP.
19. Rutschmann, S., et al., Parallel ecological diversification in Antarctic notothenioid fishes as evidence for adaptive radiation. Mol Ecol, 2011. 20(22): p. 4707-21.


20. Ruud, J.T., Vertebrates without erythrocytes and blood pigment. Nature, 1954. 173: p. 848-850.
21. Eastman, J.T., Antarctic fish biology: evolution in a unique environment. 1993: Academic Press.
22. Wells, R.M.G., et al., Comparative study of the erythrocytes and haemoglobins in nototheniid fishes from Antarctica. Journal of Fish Biology, 1980. 17(5): p. 517-527.
23. Wells, R.M.G., J.A. Macdonald, and G. di Prisco, Thin-blooded Antarctic fishes- a rheological comparison of the haemoglobin-free icefishes Chionodraco kathleenae and Cryodraco antarcticus with a red-blooded nototheniid, Pagothenia bernacchii. Journal of Fish Biology, 1990. 36(4): p. 595-609.
24. Macdonald, J.A. and R.M.G. Wells, Viscosity of Body Fluids From Antarctic Notothenioid Fish, in Biology of Antarctic Fish, G. Di Prisco, B. Maresca, and B. Tota, Editors. 1991, Springer-Verlag: Berlin. p. 163-178.
25. Cocca, E., et al., Genomic remnants of alpha-globin genes in the hemoglobinless antarctic icefishes. Proc Natl Acad Sci U S A, 1995. 92: p. 1817-1821.
26. di Prisco, G., Molecular Adaptations of Antarctic Fish Hemoglobins. 1998: p. 339-353.
27. di Prisco, G., et al., Biogeography and adaptation of Notothenioid fish: hemoglobin function and globin-gene evolution. Gene, 2007. 398(1-2): p. 143-55.
28. Xu, Q., et al., Adaptive evolution of hepcidin genes in antarctic notothenioid fishes. Mol Biol Evol, 2008. 25(6): p. 1099-112.
29. Beers, J.M. and N. Jayasundara, Antarctic notothenioid fish: what are the future consequences of 'losses' and 'gains' acquired during long-term evolution at cold and stable temperatures? J Exp Biol, 2015. 218(Pt 12): p. 1834-1845.
30. Beers, J.M., K.A. Borley, and B.D. Sidell, Relationship among circulating hemoglobin, nitric oxide synthase activities and angiogenic poise in red- and white-blooded Antarctic notothenioid fishes. Comp Biochem Physiol A Mol Integr Physiol, 2010. 156(4): p. 422-9.
31. di Prisco, G., J.A. MacDonald, and M. Brunori, Antarctic fishes survive exposure to carbon monoxide. Experientia, 1992. 48(5): p. 473-475.
32. Cocca, E., et al., Do the hemoglobinless icefishes have globin genes? Comp Biochem Physiol, 1997. 118A(4): p. 1027-1030.
33. Zhao, Y., et al., The Major Adult α-Globin Gene of Antarctic Teleosts and Its Remnants in the Hemoglobinless Icefishes: CALIBRATION OF THE MUTATIONAL CLOCK FOR NUCLEAR GENES. Journal of Biological Chemistry, 1998. 273(24): p. 14745-14752.
34. Near, T.J., S.K. Parker, and H.W. Detrich, 3rd, A genomic fossil reveals key steps in hemoglobin loss by the antarctic icefishes. Mol Biol Evol, 2006. 23(11): p. 2008-16.
35. Barber, D.L., The blood cells of the Antarctic icefish Chaenocephalus aceratus Lönnberg: light and electron microscopic observations. Journal of Fish Biology, 1981. 19(1): p. 11-28.
36. Sidell, B.D. and K.M. O'Brien, When bad things happen to good fish: the loss of hemoglobin and myoglobin expression in Antarctic icefishes. J Exp Biol, 2006. 209(Pt 10): p. 1791-802.
37. Lau, Y.T., et al., Evolution and function of the globin intergenic regulatory regions of the antarctic dragonfishes (Notothenioidei: Bathydraconidae). Mol Biol Evol, 2012. 29(3): p. 1071-80.


38. Yang, L.V., et al., Hemogen is a novel nuclear factor specifically expressed in mouse hematopoietic development and its human homologue EDAG maps to chromosome 9q22, a region containing breakpoints of hematological neoplasms. Mechanisms of Development, 2001. 104: p. 105-111.
39. Li, C.Y., et al., EDAG regulates the proliferation and differentiation of hematopoietic cells and resists cell apoptosis through the activation of nuclear factor-κB. Cell Death and Differentiation, 2004. 11: p. 1299-1308.
40. Li, C.-Y., et al., Suppression of EDAG gene expression by phorbol 12-myristate 13-acetate is mediated through down-regulation of GATA-1. Biochimica et Biophysica Acta, 2008. 2008(1779): p. 606-615.
41. Li, C.-Y., et al., Overexpression of a hematopoietic transcriptional regulator EDAG induces myelopoiesis and suppresses lymphopoiesis in transgenic mice. Leukemia, 2007. 21: p. 2277-2286.
42. Jiang, J., et al., Hemgn is a direct transcriptional target of HOXB4 and induces expansion of murine myeloid progenitor cells. Blood, 2010. 116(5): p. 711-719.
43. Ding, Y.L., et al., Over-expression of EDAG in the myeloid cell line 32D: induction of GATA-1 expression and erythroid/megakaryocytic phenotype. J Cell Biochem, 2010. 110(4): p. 866-74.
44. An, L.-L., et al., High expression of EDAG and its significance in AML. Leukemia, 2005. 19: p. 1499-1502.
45. Yang, L.V., et al., Alternative promoters and polyadenylation regulate tissue-specific expression of Hemogen isoforms during hematopoiesis and spermatogenesis. Dev Dyn, 2003. 228(4): p. 606-16.
46. Nakata, T., et al., Chicken hemogen homolog is involved in the chicken-specific sex-determining mechanism. PNAS, 2013. 110(9): p. 3417-3422.
47. Kruger, A., et al., RP59, a marker for osteoblast recruitment, is also detected in primitive mesenchymal cells, erythroid cells, and megakaryocytes. Dev Dyn, 2002. 223(3): p. 414-8.
48. Wurtz, T., et al., A new protein expressed in bone marrow cells and osteoblasts with implication in osteoblast recruitment. Exp Cell Res, 2001. 263(2): p. 236-42.
49. Shao, J., et al., Sequencing and bioinformatics analysis of the differentially expressed genes in herniated discs with or without calcification. Int J Mol Med, 2017. 39(1): p. 81-90.
50. Iwasaki, H., et al., GATA-1 Converts Lymphoid and Myelomonocytic Progenitors into the Megakaryocyte/Erythrocyte Lineages. Immunity, 2003. 19: p. 451-462.
51. Pevny, L., et al., Development of hematopoietic cells lacking transcription factor GATA-1. Development, 1995. 121: p. 163-172.
52. Galloway, J.L., et al., Loss of gata1 but not gata2 converts erythropoiesis to myelopoiesis in zebrafish embryos. Dev Cell, 2005. 8(1): p. 109-16.
53. Belele, C.L., et al., Differential requirement for Gata1 DNA binding and transactivation between primitive and definitive stages of hematopoiesis in zebrafish. Blood, 2009. 114(25): p. 5162-72.
54. Lyons, S.E., et al., A nonsense mutation in zebrafish gata1 causes the bloodless phenotype in vlad tepes. Proc Natl Acad Sci U S A, 2002. 99(8): p. 5454-9.
55. Zheng, W.W., et al., EDAG positively regulates erythroid differentiation and modifies GATA1 acetylation through recruiting p300. Stem Cells, 2014. 32(8): p. 2278-89.


56. Blobel, G.A., CREB-binding protein and p300: molecular integrators of hematopoietic transcription. Blood, 2000. 95(3): p. 745-755.
57. Sandberg, M.L., et al., c-Myb and p300 regulate hematopoietic stem cell proliferation and differentiation. Dev Cell, 2005. 8(2): p. 153-66.
58. Edgar, R., MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res, 2004. 32(5): p. 1792-1797.
59. Kumar, S., G. Stecher, and K. Tamura, MEGA7: Molecular Evolutionary Genetics Analysis Version 7.0 for Bigger Datasets. Mol Biol Evol, 2016. 33(7): p. 1870-4.
60. Hall, T.A., BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symposium Series, 1999. 41: p. 95-98.
61. Stamatakis, A., RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics, 2006. 22(21): p. 2688-90.
62. Stamatakis, A., RAxML Version 8: A tool for Phylogenetic Analysis and Post-Analysis of Large Phylogenies. Bioinformatics, 2014.
63. Yang, Z., PAML: a program package for phylogenetic analysis by maximum likelihood. Comput Appl Biosci, 1997. 13(5): p. 555-556.
64. Yang, Z., PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol, 2007. 24(8): p. 1586-91.
65. Zhang, J., R. Nielsen, and Z. Yang, Evaluation of an improved branch-site likelihood method for detecting positive selection at the molecular level. Mol Biol Evol, 2005. 22(12): p. 2472-9.
66. Burri, R., et al., Adaptive divergence of ancient gene duplicates in the avian MHC class II beta. Mol Biol Evol, 2010. 27(10): p. 2360-74.
67. Yang, Z. and R. Nielsen, Codon-Substitution Models for Detecting Molecular Adaptation at Individual Sites Along Specific Lineages. Mol Biol Evol, 2002. 19(6): p. 908-917.
68. Yang, Z. and W.J. Swanson, Codon-Substitution Models to Detect Adaptive Evolution that Account for Heterogeneous Selective Pressures Among Site Classes. Mol Biol Evol, 2002. 19(1): p. 49-57.
69. Wertheim, J.O., et al., RELAX: detecting relaxed selection in a phylogenetic framework. Mol Biol Evol, 2015. 32(3): p. 820-32.
70. Pond, S.L.K. and S.V. Muse, HyPhy: Hypothesis Testing Using Phylogenies, in Statistical methods in molecular evolution. 2005, Springer: New York, NY. p. 125-181.
71. Delport, W., et al., Datamonkey 2010: a suite of phylogenetic analysis tools for evolutionary biology. Bioinformatics, 2010. 26(19): p. 2455-7.
72. Weaver, S., et al., Datamonkey 2.0: a modern web application for characterizing selective and other evolutionary processes. Mol Biol Evol, 2018.
73. Shin, S.C., et al., The genome sequence of the Antarctic bullhead notothen reveals evolutionary adaptations to a cold environment. Genome Biol, 2014. 15(468).
74. Kearse, M., et al., Geneious Basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics, 2012. 28(12): p. 1647-9.
75. Zerbino, D.R., et al., Ensembl 2018. Nucleic Acids Res, 2018. 46(D1): p. D754-D761.
76. Broos, S., et al., ConTra v2: a tool to identify transcription factor binding sites across species, update 2011. Nucleic Acids Res, 2011. 39(Web Server issue): p. W74-8.
77. Nei, M. and S. Kumar, Molecular Evolution and Phylogenetics. 2000, Oxford: Oxford University Press.


78. Yu, G., et al., ggtree: an R package for visualization and annotation of phylogenetic trees with their covariates and other associated data. Methods in Ecology and Evolution, 2017. 8(1): p. 28-36.
79. Letunic, I. and P. Bork, Interactive tree of life (iTOL) v3: an online tool for the display and annotation of phylogenetic and other trees. Nucleic Acids Res, 2016. 44(W1): p. W242-5.
80. Near, T.J. and C.H. Cheng, Phylogenetics of notothenioid fishes (Teleostei: Acanthomorpha): inferences from mitochondrial and nuclear gene sequences. Mol Phylogenet Evol, 2008. 47(2): p. 832-40.
81. Near, T.J., J.J. Pesavento, and C.-H.C. Cheng, Mitochondrial DNA, morphology, and the phylogenetic relationships of Antarctic icefishes (Notothenioidei: Channichthyidae). Molecular Phylogenetics and Evolution, 2003. 28(1): p. 87-98.
82. Yang, Z. and R. Nielsen, Estimating Synonymous and Nonsynonymous Substitution Rates Under Realistic Evolutionary Models. Mol Biol Evol, 2000. 17(1): p. 32-43.
83. Librado, P. and J. Rozas, DnaSP v5: a software for comprehensive analysis of DNA polymorphism data. Bioinformatics, 2009. 25(11): p. 1451-2.
84. Rozas, J., et al., DnaSP, DNA polymorphism analyses by the coalescent and other methods. Bioinformatics, 2003. 19(18): p. 2496-2497.
85. Sharp, P.M. and W.-H. Li, The codon adaptation index- a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res, 1987. 15(3): p. 1281-1295.
86. Morton, B.R., Codon Use and the Rate of Divergence of Land Plant Chloroplast Genes. Mol Biol Evol, 1994. 11(2): p. 231-238.
87. Sugawara, T., Y. Terai, and N. Okada, Natural Selection of the Rhodopsin Gene During the Adaptive Radiation of East African Great Lakes Cichlid Fishes. Mol Biol Evol, 2002. 19(10): p. 1807-1811.
88. Ota, T., et al., Positive Darwinian Selection Operating on the Immunoglobulin Heavy Chain of Antarctic Fishes. Journal of Experimental Zoology (Mol Dev Evol), 2003. 295B: p. 45-58.
89. Schmittgen, T.D. and K.J. Livak, Analyzing real-time PCR data by the comparative CT method. Nature Protocols, 2008. 3(6): p. 1101-1108.
90. Gao, P., Functional Study of Hemogen Knockout Mouse Model. Theses and Dissertations (ETD), 2013. Paper 92.
91. Boulikas, T., Putative Nuclear Localization Signals (NLS) in Protein Transcription Factors. 1994, 1994. 55: p. 32-58.
92. Lahti, D.C., et al., Relaxed selection in the wild. Trends Ecol Evol, 2009. 24(9): p. 487-96.
93. Hunt, B.G., et al., Relaxed selection is a precursor to the evolution of phenotypic plasticity. PNAS, 2011. 108(38): p. 15936-15941.
94. Go, Y., et al., Lineage-specific loss of function of bitter taste receptor genes in humans and nonhuman primates. Genetics, 2005. 170(1): p. 313-26.
95. Murrell, B., et al., Detecting individual sites subject to episodic diversifying selection. PLoS Genet, 2012. 8(7): p. e1002764.
96. Yang, Z., W.S. Wong, and R. Nielsen, Bayes empirical bayes inference of amino acid sites under positive selection. Mol Biol Evol, 2005. 22(4): p. 1107-18.


97. Anisimova, M., J.P. Bielawski, and Z. Yang, Accuracy and power of the likelihood ratio test in detecting adaptive molecular evolution. Mol Biol Evol, 2001. 18(8): p. 1585-1592.
98. Yang, Z., Adaptive molecular evolution, in Handbook of statistical genetics, D.J. Balding, M. Bishop, and C. Cannings, Editors. 2001, Wiley: New York. p. 327-350.
99. Anisimova, M., J.P. Bielawski, and Z. Yang, Accuracy and power of Bayes prediction of amino acid sites under positive selection. Mol Biol Evol, 2002. 19(6): p. 950-958.
100. Yang, Z. and J.P. Bielawski, Statistical methods for detecting molecular adaptation. TREE, 2000. 15(12): p. 496-503.
101. Yang, Z., Inference of selection from multiple species alignments. Current Opinion in Genetics and Development, 2002. 12: p. 688-694.
102. Bielawski, J.P. and Z. Yang, Maximum likelihood methods for detecting adaptive evolution after gene duplication, in Genome Evolution, A. Meyer and Y. Van de Peer, Editors. 2003, Kluwer Academic Publishers: Netherlands. p. 201-212.
103. Zhao, H., et al., Rhodopsin molecular evolution in mammals inhabiting low light environments. PLoS One, 2009. 4(12): p. e8326.
104. Veilleux, C.C., E.E. Louis, Jr., and D.A. Bolnick, Nocturnal light environments influence color vision and signatures of selection on the OPN1SW opsin gene in nocturnal lemurs. Mol Biol Evol, 2013. 30(6): p. 1420-37.
105. Markova, S., J.B. Searle, and P. Kotlik, Relaxed functional constraints on triplicate alpha-globin gene in the bank vole suggest a different evolutionary history from other rodents. Heredity (Edinb), 2014. 113(1): p. 64-73.
106. Feng, P., et al., Massive losses of taste receptor genes in toothed and baleen whales. Genome Biol Evol, 2014. 6(6): p. 1254-65.
107. Dainat, J., et al., GLADX: an automated approach to analyze the lineage-specific loss and pseudogenization of genes. PLoS One, 2012. 7(6): p. e38792.
108. Ohno, S., Evolution by gene duplication. 1970, New York: Allen & Unwin.
109. Lynch, M. and J.S. Conery, The Evolutionary Fate and Consequences of Duplicate Genes. Science, 2000. 290(5494): p. 1151-1155.
110. Christoffels, A., et al., Fugu genome analysis provides evidence for a whole-genome duplication early during the evolution of ray-finned fishes. Mol Biol Evol, 2004. 21(6): p. 1146-51.
111. Hoegg, S., et al., Phylogenetic timing of the fish-specific genome duplication correlates with the diversification of teleost fish. J Mol Evol, 2004. 59(2): p. 190-203.
112. Postlethwait, J.H., et al., Zebrafish comparative genomics and the origins of vertebrate chromosomes. Genome Res, 2000. 10(1): p. 1890-1902.
113. Brunet, F.G., et al., Gene loss and evolutionary rates following whole-genome duplication in teleost fishes. Mol Biol Evol, 2006. 23(9): p. 1808-16.
114. Glasauer, S.M.K. and S.C.F. Neuhauss, Whole-genome duplication in teleost fishes and its evolutionary consequences. Mol Genet Genomics, 2014. 289: p. 1045-1060.
115. Rastogi, S. and D.A. Liberles, Subfunctionalization of duplicated genes as a transition state to neofunctionalization. BMC Evol Biol, 2005. 5: p. 28.
116. Amoutzias, G.D., et al., Posttranslational regulation impacts the fate of duplicated genes. Proc Natl Acad Sci U S A, 2010. 107(7): p. 2967-71.
117. Qian, W., et al., Maintenance of duplicate genes and their functional redundancy by reduced expression. Trends in Genetics, 2010. 26(10).

118. Amores, A., et al., Cold Fusion: Massive Karyotype Evolution in the Antarctic Bullhead Notothen Notothenia coriiceps. G3 (Bethesda), 2017. 7(7): p. 2195-2207.
119. Chen, D.L., et al., EDAG-1 promotes proliferation and invasion of human thyroid cancer cells by activating MAPK/Erk and AKT signal pathways. Cancer Biol Ther, 2016. 17(4): p. 414-21.
120. Lü, J., et al., Overexpression of EDAG-1 in NIH3T3 cells leads to malignant transformation. Sheng wu hua xue yu sheng wu wu li xue bao (Acta Biochimica et Biophysica Sinica), 2002. 34(1): p. 95-98.
121. O'Leary, N.A., et al., Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic Acids Res, 2016. 44(D1): p. D733-45.
122. Jones, F.C., et al., The genomic basis of adaptive evolution in threespine sticklebacks. Nature, 2012. 484(7392): p. 55-61.
123. Betancur, R.R., et al., The tree of life and a new classification of bony fishes. PLoS Curr, 2013. 5.
124. Read, T.D., et al., Draft sequencing and assembly of the genome of the world's largest fish, the whale shark: Rhincodon typus Smith 1828. BMC Genomics, 2017. 18(1): p. 532.
125. Venkatesh, B., et al., Elephant shark genome provides unique insights into gnathostome evolution. Nature, 2014. 505(7482): p. 174-9.
126. Arnaud, L., et al., A dominant mutation in the gene encoding the erythroid transcription factor KLF1 causes a congenital dyserythropoietic anemia. Am J Hum Genet, 2010. 87(5): p. 721-7.
127. Devlin, E.E., et al., A transgenic mouse model demonstrates a dominant negative effect of a point mutation in the RPS19 gene associated with Diamond-Blackfan anemia. Blood, 2010. 116(15): p. 2826-35.
128. Paz-Priel, I. and A.D. Friedman, C/EBPα Dysregulation in AML and ALL. Crit Rev Oncog, 2011. 16(1-2): p. 93-102.
129. Yang, L.V., et al., The GATA site-dependent hemogen promoter is transcriptionally regulated by GATA1 in hematopoietic and leukemia cells. Leukemia, 2006. 20(3): p. 417-25.
130. Gardiner, M.R., et al., A global role for zebrafish klf4 in embryonic erythropoiesis. Mech Dev, 2007. 124(9-10): p. 762-74.
131. Soza-Ried, C., et al., Essential role of c-myb in definitive hematopoiesis is evolutionarily conserved. Proc Natl Acad Sci U S A, 2010. 107(40): p. 17304-17308.
132. Takahashi, K., et al., Phylogenetic relationships and ancient incomplete lineage sorting among cichlid fishes in Lake Tanganyika as revealed by analysis of the insertion of retroposons. Mol Biol Evol, 2001. 18(11): p. 2056-2066.
133. Marino, I.A., et al., Evidence for past and present hybridization in three Antarctic icefish species provides new perspectives on an evolutionary radiation. Mol Ecol, 2013. 22(20): p. 5148-61.
