<<

Atypicat molecular evolution of afrotherian and xenarthran

B-globin cluster genes with insights into the

B-globin cluster gene organization of stem eutherians.

By

ANGELA M. SLOAN

A thesis submitted to the Faculty of Graduate Studies in partial fulfillment of the requirements for the degree of

MASTER OF SCIENCE

Department of Zoology University of Manitoba Winnipeg, Manitoba, Canada

@ Angela M. Sloan, July 2005 TIIE I]MVERSITY OF' MANITOBA

FACULTY OF GRADUATE STT]DIES ***** - COPYRIGHTPERMISSION ] .

Atypical molecular evolution of afrotherian and xenarthran fslobin cluster genes with insights into thefglobin cluster gene organization òf stem eutherians.

BY

Angela M. Sloan

A ThesislPracticum submitted to the Faculty of Graduate Studies of The University of

Manitoba in partial fulfill¡nent of the requirement of the degree of

Master of Science

Angela M. Sloan @ 2005

Permission has been granted to the Library of the University of Manitoba to lend or sell copies of this thesis/practicum, to the National Library of Canada to microfilm this thesis and to lend or sell copies of the fiIm, and to University Microfïlms Inc. to publish an abstract of this thesis/practicum.

This reproduction or copy of this thesis has been made available by authority of the copyright owner solely for the purpose of private study and research, and may only be reproduced and copied as permitted by copyright laws or with express written authorization from the copyright ownér. ABSTRACT

Our understanding of p-globin gene cluster evolutionlwithin eutherian

.is based solely upon data collected from in the two most derived eutherian

superorders, Laurasiatheria and Euarchontoglires. Ifence, nothing is known regarding_the

gene composition and evolution of this cluster within afrotherian (, sea cows,

, , shrews, tenrecs and golden moles) and xenarthran (,

and armadillos) mammals. To address this shortcoming, I sequenced and

classified atotal of 24 'þ-llke' globin genes from 1 I afrotherian and Z xenarthran species,

and conducted a comprehensive analysis on the molecular evolution of p-globin cluster

genes within these two basal eutherian clades.

Analyses of the 'adult expressed' F- atrd ò-globin products suggested that the ô-

globin locus of paenungulate (and presumably all afrotherian) mammals encodes the only

'f3-type' chain component of their post-natal hemoglobin. This finding challenges recent

views on the evolution of the eutherian p-globin cluster, and provides the first

documented evidence of a dispensable plocus within Mammalia. It appears that the ò-

locus of stem afrotherians gained a functional advantage by means of a capable promoter

region, donated by the neighbouring p gene via gene conversion, after which p was

silenced through nonfunctionalization. Hyraxes were also found to possess a second putatively functional ô-globin gene (òH), which may or may not be expressed. A ML analysis of ò-globin IVS2 sequences grouped elephants and sirenians into a monophyly to the exclusion of hyraxes supporting the '' hypothesis, and placed the tree as the most basal hyrax species. Analyses of the 'embronically expressed' r-, y- and q-globin genes revealed the

presence of both Y- and e-loci within members of the *p"T.9:: Afrotheri4 providing

compelling evidence that the ancestral eutherian p-globin cluster ðontained at least four

genes (u, ò, p) T, by the time this clade diverged from other eutherians -107 mya. Earlier

studies have revealed that hemoglobin of 5-month old elephant fetuses and

calves contains two 'F-type' chains. My data suggests these distinct p-chain components

are encoded by the Y- and ôloci in these species, thus providing only the second example

of developmentally delayed expression of a y locus within mammals. Sequence analyses

of the armadillo B-globin cluster further indicated that the eutherian p-cluster contained

five (e, ò, p) loci Y, ï, by the time xenarthrans diverged from eutherians -102 mya. Remarkably, of the five armadillo loci, only the two outermost genes (e and p) appeared

capable of transcribing a polypeptide. Together with earlier studies that demonstrated adult armadillos express two distinct p-type chains, this finding implies that these

xenarthrans may uniquely extend expression of the e-locus into post-natal growth stages

or, like marsupials, possess an unlinked 'pJike' gene outside their p-globin cluster.

A comparative assessment of evolution rates indicated that, with few exceptions, descendants of the'e-like' loci diverged more slowly than the 'p-like' loci in both coding and non-coding gene regions within species of all four eutherian superorders. These findings signify that rates of evolution for the parlogous genes within eutherian p-clusters are a function of selection pressure, contradicting the molecular clock theory that all genes evolve at an equal rate. ACKNOWLEDGEMENTS

First and foremost, I wpuld like to thank mV acad-e]1i,c advijor, Dr. Kevin

Campbell, for giving me the opporhrnity to gain invaluable research experience, and

devising a thesis project to which I am proud to attach my name. I am also very grateful

for all your adVice, patience, and good-natured kidding, which provided me with a

pleasant work environment for the entirety of my graduate work. You took me on as a

novice to the field of molecular phylogenetics and I thank you for your faith in me - it

will forever be appreciated.

I also thank my other committee members, Dr. Gunnar Valdimarsson who not

only donated essential lab equipment but plentiful advice concerning laboratory

procedures, and Dr. Ivan Oresnik who allowed me to partake in an intriguing research

project which ultimately boosted my career interest in the flreld of site-directed

mutagenesis. I additionally appreciate the helpful feedback they both supplied throughout the course of my thesis project, which proved invaluable to the efficiency of data collection and analyses. Thanks to both for your kind and generous natures, as well as your eyes for detail!

For allowing me to work in his lab for more than two , I extensively thank

Dr. Nate Lovejoy, who was more patient and giving than I could have ever hoped for. I also thank the "Lovejoy Lab" graduate students, Kristie Lester, Stuart Willis, and Tommy

Sheldon for their friendship and moral support over the years.

I must also graciously thank the providers of blood, tissue, and DNA samples.

These include Andrew Taylor at the University of Pretoria in South Africa, Brenda

McDonald at the University of Adelaide in Australia, Christophe Douady at Université

111 Claude Bernard Lyon I in France, Gabi fürlach at the Marine Biological Laboratory in

Woods Hole, Massachusetts, Gary Bronner at the University of Cape Town in South

Africa, Galen Rathbun at the California Academy of Sciencei irr Cam6ri4 California,

Gordon Glover at the Assiniboine Park Zoo in Winnipeg, Manitob4 Robert Bonde at the

Florida Integrated Science Centre in Gainseville, , Sarit¿ Maree at the University

of Pretoria in South Africa, Terry Robinson at the University of Stellenbosch in Cape

Town, South Africa, and Wendy Korver at the Bowmanville Zoo in Ontario, Canada.

Without the generous donations of these skilled academics, this study could not have

been the complete molecular analyses I dreamed it would be.

I could not forget my mother, Merle, and father, Gary, to whom I thank for

everything they have ever given me, be it advice, support, friendship, and love. These are

the best parents anyone could ask for and I owe them everything I also thank my sister,

Jennifer, for a precious friendship and an immense amount of moral support and

encouragement when I needed it most. Thanks as well to Amrit Singh for all the talks

and advice on lab procedures, and to Octavio Orellana for providing me with analytical

equipment and helping me become a stronger and more confident individual. I love you

all.

Last but not leas! I would like to thank the organizations that funded my research, including the University of Manitoba Graduate Fellowship association (providing financial support to myself), and the Natural Sciences and Engineering Research Council of Canada (NSERC), The Winnipeg Foundation and the University of Manitoba Research

Grants Program (tlRGP) (providing supporr to Kevin Campbell)

1V TABLE OF CONTENTS

Page

Abstract

Acknowledgements 111

Table of Contents

List of Figures

List of Tables xll

Abbreviations xlv

General Introduction

Chapter I

Atypical molecular evolution of afrotherian and xenarthran 'B-like' globin genes ...... 7

Introduction 8

Materials and Methods l3

Results 19

Discussion 38

Chapter II

Molecular evolution of the eutherian p-globin cluster infened from afrotherian and xenarthran gene sequences . 45

Introduction ...... 46

Materials and Methods ... .. 49

Results ...... 52

Discussion ...... 69 Literature Cited 77

Appendix I

t-1 95 Primers used in lhe (A) amplification and @) sequencing reactions of "p-like'globin genes.

T-2 96 DNA sequences of the 'BJike' gfobin genes amplifred in this study.

1-3 125 Lengths of exons and introns for sequenced 'p-like' globin genes.

r-4 726 Lengths of additional sequence data obtained for the 5' external region of sequenced 'p-like' globin genes, and positions of upstream features relative to the putative CAp site.

1-5 127 Lengths of additional sequence data obtained for the 3' external region of sequenced 'p-like' globin genes, and positions of poly- adenylation signals (AATAAA) relative to the termination codon.

Appendix 2

2-l 128 Primers used in the (A) amplification and (B) sequencing reactions of "e-like' globin genes.

2-2 tze DNA sequences of the 'e-like' sì"ui' g"n", ;*piir,; ir rr,ir' sh:dy.

2-3 t42 Lengths of exons and intons for 'e-like' globin genes

2-4 t43 Lengths of sequence data obtained for the 5' external region of 'e-like' globin genes, and positions of upstream features relative to the putative Cap site.

vl 2-5 -- .....,...r44 Lengths of sequence data obtained for the 3' external region of 'e-like' globin genes, and positions of poly-adenylation signals (AATAAA) relative to the termination codon.

vll LIST OF FIGI]RES

Figure Page

1-1. 9 The p-globin gene clusters of select mammals (see text for details; Konkel är al.1979; Schon et al. l98l; Lingrel et al. 1983; shapiro et at. r9t3; Townes et al. 1984; Schimenti and Duncan 1985b; cooper et al. 1996; satoh et al. 1999) mapped onto a phylogram adapted from the maximum likelihood analysis (concatenated 17,736 bp data set) of Amrine-Madsen et at. (2003).

l-2. T2 Hypothesized relationships among the three extant paeunungulate orders based on 19 recent molecular phylogenetic studies.

1-3. 23 Comparison of the'p-globin' chain amino acid sequences of four afrotherian and two xenarthran species (top row in each case) with those deduced from the ò- and p-globin gene sequences determined for these species (bottom row; dots represent identity with the top sequence).

1-4. 24 Dotplot comparisons (window size : 100; threshold : 80) of the putative Asian elephant ò-globin gene (Y-axis) with the ò- and p-globin genes of humans, rabbits and mice (X-axis).

1-5. 25 The 5' external sequence data and features of ô- and p-globin genes amplified in this study and those of armadillo, human, tarsier and galago.

1-6. 26 Dotplot comparisons (window size : 100; threshold : 80) of the putative ô-globin gene (Y-axis) with the ò- and p-globin genes of humans, rabbits and mice (X-axis).

viii LIST OF FIGURES ...cont'd

Figure Page

1-7 27 Dotplot comparisons (window size : 100; threshold : g0) of the putative ò-globin gene (Y-axis) with the ò- and p-globin genes of humans, rabbits and mice (X-axis).

1-8 28 Dotplot comparisons (window size : 100; threshold : g0) of the putative rock hyrax òH-globin gene (Y-axis) with the ö- and p-globin genes of humans, rabbits and mice (X-axis).

1-9. 30 Dotplot comparisons (window size : 100; threshold : g0) of the putative p-globin gene (y-axis) with the ò- and p-globin genes of humans, rabbits and mice (X-axis).

1-10. 3T Dotplot comparisons (window size : 100; threshold : s0) of the putative armadillo r¡rô-globin gene (Y-axis) with the ô- and p-globin genes of humans, rabbits and mice (X-axis).

1-11. 32 Dotplot comparisons (window size : 100; threshold : g0) of the putative armadillo p-globin gene (Y-axis) with the ô- and p-globin genes of humans, rabbits and mice (X-axis).

r-12. 34 Doþlot comparisons (window size : 100; threshold : 80) of the putative qò-globin gene (y-axiÐ with the ò- and p-globin genes of humans, rabbits and mice (X-axis).

IX LIST OF FIGIIRES ...cont'd

Figure Page

1- 13. 35 Dotplôt comparisons (window size : 100; threshold : 80) of the putative p-globin sloth gene (y-axis) with the ò- and B-globin genes of humans, rabbits and mice (X-axis).

t-14. 36 Xenarthran-rooted maximum likelihood tree constructed from homologous ò- globin IVS2 gene sequences (773 bp; see text for details) illustrating the inter-relationships among paenungulate mammals.

2-t. 56 Dotplot comparisons (window size : 100; threshold : 80) of the putative Asian elephant y-globin gene (y-axis) with the e-, y- and r¡-globin genes of humans, rabbits, mice and goats (X-axis). an 57 lotplot comparisons (window size : 100; threshold : g0) of the putative dugong y-globin gene (y-axis) with the e-, y- and r¡-globin g"n", of humans, rabbits, mice and goats (X-axis)

2-3. 59 The 5' external sequence data and features of 'e-like' globin genes sequenced in this studl', plus those of the armadillo r-, ey- and rpq-globin genes, the human r-, yA- and r¡rq-globin genes and the goai r¡-globin gene.

2-4. 60 Dotplot comparisons (window size : 100; threshold : 80) of the putative elephant shrew e-globin gene (y-axis) with the e-, y- and r¡-globin genes of humans, rabbits, mice and goats (X-axis). LIST OF FIGURES ...cont'd

Figure Page

2-5. 67 Dotplot comparisons (window size : 100; threshold : 80) of the putative sloth e-globin gene (y-axis) with the e-, y- and r¡-globin genes of humans, rabbits, mice and goats (X-axis).

2-6. 62 Dotplot comparisons (window size : 100; threshold : 80) of the putative armadillo e-globin gene (y-axis) with the e-, y_ and r¡-globin genes of humans, rabbits, mice and goats (X-axis).

2-7. 64 Dotplot comparisons (window size : 100; threshold : 80) of the putative armadillo rþy-globin gene (y-axi$ with the e_, y_ and r¡-globin genes of humans, rabbits, mice and goats (X-axis).

2-8. 65 Dotplot comparisons (window size : 100; threshold : 80) of the putative armadillo rpq-globin gene (y-axis) with the e_, y_ and r¡-globin genes of humans, rabbits, mice and goats (X-axis).

2_9. 70 p-globin The gene clusters of select mammals, updated using data collected for this study (see Fig. l-l legend for details).

XI LIST OF TABLES

Table

1-1. T4 List of taxa examined in this study

T-2. 15 Genes included in the sequence analyses of 'p-like' globin genes with corresponding GenBank accession numbers.

1-3. 20 List of 'p-like' globin gene products amplified in this study.

t-4. 22 Percent identity matrix comparing the IVSI and tVS2 sequences of 'p-like' globin genes sequenced in this those of human, rabbit and mouse.

2-1. 50 Genes included in the sequence analyses of 'eJike' globin genes with corresponding GenBank accession numbers.

t _') 53 List of 'elike' globin gene products amplified in this study.

2-3. 54 Percent identity matrix comparing the IVSI and tVS2 external sequences of 'e-like' globin genes sequenced in this study with those of human, rabbit, mouse and goat.

2-4. 55 Percent identity matix comparing the 5' and 3' external sequences of 'e-like' globin genes amplified in this study with those of human, rabbit, mouse and goat.

xtl LIST OF TABLES ...cont'd

Table Page

2-5. 66 Percent of synonymous (silent) and nonsynonymous (replacement) substitutions between the 'e-like' and 'pJike' globin genes amplified in this study (plus those of armadillo, rabbiq mouse, goat and tarsier) and their ortholog in humans. Silent and non-silent evolution rates are also shown.

2-6. 68 Percent of nucleotide substitutions between the IVS2 sequences of 'e-like' and pJike' globin genes amplified in this study (plus those of armadillo, rabbit, mouse, goat and tarsier) and their ortholog in humans. The corresponding non-coding DNA mutation rates are also shown.

xlil ABBREVIATIONS

A adenine -: ct alpha _- p beta bp(s) base pair(s) ô delta ddH2O double distilled water dNTP deoxynucleotide triphosphate E epsilon E. coli Escherichia coli EMBOSS The European Molecular Biology Open Software Suite rl eta (F) forward primer Ï gamma G guanine Hb hemoglobin HI(Y85 Haegawa-Kishino-Yano (1985) model HS hypersensitive site(s) IPTG isopropyl-B-D-thiogalactoside IVS intervening sequence LB Luria-Bertani LCR locus control region MatGAT Matrix Global Alignment Tool MgCl2 magnesium chloride ML maximum likelihood mya million years ago NCBI National Centre for Biotechnology Information NISC NIH (National Institute of Health) Intramural Sequencing Centre NSERC Natural Sciences and Engineering Research Council of Canada ú) omega PAUP* Phylogenetic Analysis Using Parsimony (*and other methods PCR(s) polymerase chain reaction(s) tþ psi (pseudo) (R) reverse primer RPM revolutions per minute T thymidine Ta annealing temperature TAE tris-acetate EDTA Taq Thermus aquaticus TBR tree bisection and reconnection U unit of enzyme activity

xtv URGP University of Manitoba Research Grants program V volts vlv volume per volume X-GAL 5=bromo-4-chloro-3-indolyl-B-D-garactoside

XV GENERAL INTRODUCTION

The crown-group Mammalia is defined as the most rec-ent common ancestor of

the existing mammalian clades and includes the egg-laying platypuses and echidnas

(Monotremata (or Prototheria)), the pouched marsupials (Marsupialia (or Metatheria))

and the piacental mammals @utheria) (McKenna and Bell 1997; Nowak 1999). Both the

Monotremeta and Marsupialia are relatively small assemblages, containing one and seven

orders respectively (McKenna and Bell 1997; Nowak 1999), whereas the placental mammals a¡e a diverse gfoup, consisting of eighteen orders recently divided into four

superorders: (1) Euarchontoglires (or Supraprimates) [rodents, lagomorphs, ]; (2)

Laurasiatheria [carnivores, cetaceans, ungulates, bats, shrews, moles]; (3) Xenarthra

[armadillos, anteaters, sloths]; and (a) (Eizirik et al. 20Q7; Madsen et al. 2001;

Murphy et at. 2001b; Lin et al. 2002; Springer et at. 2003). Afrotherians represent a clade of placentals thought to have arisen on the African continent and comprise six orders: (elephants), (sea cows), Hyracoidea (hyraxes), Tubulidentata

(aardvarks), Macroscelidea (elephant shrews), and Afrosoricida (golden moles and tenrecs) (Stanhope er ø/. 1998). Although they do not share a single morphological synapomorphy, the members of this group were first grouped into a monophyly by

Springer (1997) and Stanhope et al. (1998) based on molecul ar datz. Further support for this clade comes from many more recent comparative sfudies utilizing mitochondrial, nuclear and concatenated gene sequences (Liu and Miyamoto 1999; Springer et al- 1999;

Eizirik et al.2O0l; Murphy et al.200la,b; Lin et al.2O02) and from a common nine base pair deletion in exon i I of fhe BRCAI gene found only in afrotherians (Madsen e/ a/.

2001). Recently, van Dijk et al. (2001) also provided independent support for the clade by identiffing unique amino acid substitutions in three proteins of these relative

to other eutherians, while Nikaido et al. (2003) isolated and charactenzed a novel family

of SINEs (short interspersed repetitive elements) termed "Afro-Sb[8s", sp-ecific to species

within Afrotheria.

Molecular clock data based on gene studies indicate that eutherians have been present and evolving into present-day clades since the mid (-100 mya)

(Eizirik et al. 2001; Murphy et al. 2001b; Springer et al. 2OO3). The deepest split is thought to have occurred either between afrotherians * xenarthrans (making them 'sister groups') and other placental mammals (Madsen et at.2007; Delsuc et at. 2002;Lin et al.

20OZ) or simply between afrotherians and other placentals (Stanhope et al. 1998, Liu and

Myamoto 1999; F;izink et al. 2001; Murphy et al. Z}ola,b; van Dijk et al. z00l;

Springer et al. 2003). Both hypotheses place this divergence at -107 mya @izirik et al.

2001; Murphy et al. 2001b; Madsen et al. 2002, Springer et al. 2003), a date coinciding with a major plate tectonic separation event (Smith. et al. 1994). At roughly this time,

Africa and South America separated from the southern hemispheric super-continent

Gondwana, which also incorporated Antarctica, Australia, India and Madagascar before it broke apart (Smith et al. 1994). The 'Afrotheria hypothesis' suggests that six of the twenty placental orders evolved on Africa during the period when the continent was isolated from others via plate tectonics (-1ol to 40 mya) (Eizirik et at. 2001; Murphy e/ al. 2001b Springer et at. 2003). After differentiating into various ecological niches, these mammals were able to distribute themselves throughout Europe and Asi4 as Africa began to collide with these continents -30 mya (tledges 2001). Xenarthran mammals are thought to have evolved solely on South America (Murphy et at. 2001b). Laurasiatheria

2 and Euarchontoglires have been deemed sister taxa (constituting the clade Boreoeutheria)

(Murphy et al. 2001b) with a northern hemispheric origin on the ancient supercontinent

Laurasia (Murphy et al. 200la,b). Undoubtedly, the early divergence of Afrotheria and

Xenarthra from other placentals provides a framework for ancient eutherian evolution studies, while the collection of molecular dat¿ from these basal superorders could have many important implications on the early evolution of eutherian genomes.

An ideal candidate to address these intriguing possibilities is provided by the respiratory pigment hemoglobin (FIb). Hemoglobin is one of four classes of functional porphyrin-containing proteins capable of reversible oxygen-binding @erutz 1979;

Dickerson and Geis 1983). Myoglobin (supplying Oz to mitochondria within muscle tissues), neuroglobin (supplying Oz to the brain; Burmester et al. 2000) and the ubiquitous cytoglobin (whose function is unknown; Burmester et al. 2002; Trent and

Hargrove 2002) are monomeric proteins comprising one protein chain. However, Hb

(binding oxygen in the lung capillaries and facilitating its delivery to respiring cells) is a tetrameric protein comprising two ct- and two p-helices (Dickerson and füis 1983; 'Weber and Wells 1989; Poyart et al. 7992; Weber 1995). It is hypothesized that the cr- and P- globin genes encoding hemoglobin arose from the duplication of a primordial globin gene roughly 600-800 mya (Antoine and Niessing 19S4). This pair of globins then underwent further duplication events over time to produce the various a-like and p- like globin genes found in extant vertebrate species (Goodman and Moore 1975;

Czelusniak et al. 1982; Proudfoot et al. 1982; Goodman et al. 1987). Both the a- and p- globin gene clusters of derived eutherians have undergone extensive scrutiny. However, it is the p-globin cluster, comprising the genes encoding the'p-type' polypeptide chains of hemoglobin, that is the focus of this study.

Functional p-globin cluster genes are composed of thràcoding rãgions separated by two introns (intervening sequences or IVS). Exon sequences are highly conserved in length, (normally comprising 89,223 and126 bp in exons 7,2 and 3, respectively) and share strong similarity between orthologs of different species (>60%) and within paralogs of the same species @awson and Yamamoto 1998) These findings are most likely the consequence of puriffing selection, as conserved sequences have important implications for the structure and function of the protein. The two fVS regions are variable in size, normally ranging ftom 722 to 130 bp in the first intron and from 850 to 914 bp in the second @ickerson and Geis 1983). Based on sequence alignments, it is assumed that the vast amount of variability in both size and composition is the result of numerous base substitution and insertior/deletion events over time (Konkel et al. 1979). Exon-infon boundaries show a high degree of conservation, however, as both introns almost always possess the 5'GT - AG 3' sequence normally found at splice junctions (Nishioka e/ a/.

1980). It is therefore quite possible that introns are necessary for proper transcription, especially since one mouse pseudogene, containing only exon sequences, is not expressed under normal circumstances (Dickerson and Geis 1983).

Mammalian genomes often employ regulating mechanisms to ensure that the genes within multi-gene families are expressed at appropriate times, including proximal promoters and distal regulatory elements working in conjunction with environmental cues

(Fritsch et al. 1980;Iltll et al. 1984). The locus control region (LCR) is a large segment of DNA found in mammals whose activity has extreme effects on the expression of genes within a multigene family (Tuan et al. 1985; Forrester et al. 1987; Grosveld et al. 1987).

The B-globin LCR is roughly 12 kb in length, located from -6 to 20 kb upstream region of the cluster, and is divided into three sites (200-300 bp toþl'hypersensitive (HS) to

DNase I digestion (Tuan et al. 1985; Forrester et al. 1987; Grosveld et al. 7987). These sites act as recognition sequences for DNA-binding proteins, which directly control the chromatin structure of the 60 kb functional p-globin domain. The mechanisms by which these proteins regulate chromatin structure, however, are currently unknown.

Experiments in which the LCR of humans was fully deleted resulted in failure to express any of the p-globins, even though the genes themselves were structurally intact @risocoll et al. L989; Feng et al. 2005). Interestingly however, the presence of only one HS site within the LCR is sufficient for full gene expression in transgenic mice @yan er ø/.

1989). The results of these studies confirm the utmost importance of the LCR in the expression of each globin gene within the cluster. Certain sequences of non-coding

DNA, located both upstream and downstream of the 5' (CACCC, CCAAT, ATA) and 3'

(AATAAA) flanking end of each globin gene, respectively, also play important roles in the control of both transcription and translation @roudfoot and Brownleee 1976;

Efstratiadis et al. 1980; Grosveld et al. 1982; Dierks et al. 1983; Kosche et al. 1985;

Myers et al- 1986), and are discussed in det¿il within the upcoming chapters.

Surprisingly, our knowledge of eutherian p-globin clusters has been deduced from data collected on only the two most derived orders, Laursiatheria and

Euarochontoglires, and thereforg the goal of this study was to obt¿in genetic information on the p-clusters of the two most basal clades, Afrotheria and Xenarthra. This information should provide much needed insight into the p-cluster gene organization of stem eutherians and ultimately lead to a better understanding of molecular evolution of the p-cluster within the whole of .

-=-.. CHAPTER I

ATYPICAL MOLECULAR EVOLUTION OF

AF'ROTHERIAN AND XENARTHRAN 'P.LIKE' GLOBIN GENES. INTRODUCTION

The mammalian p-globin gene cluster is thought to.frl,_t_ originated from the tandem duplication of a single proto-B-globin gene about 200 mya @fstratiadis et al.

1980; Czelusniak et al. 1982). Differential selection on the upstream regulatory elements of these paralogous loci subsequently confined expression of the 5' e-globin gene to early embryonic stages, while the 3' p-globin gene became developmentally suppressed

@fstratiadis et al. 1980; Czelusniak et al. 1982). This gene cluster orgamzation and expression pattern is still found in marsupials (Koop and Goodman 1988; Cooper and

Hope 1993; Cooper et al. 1996; but see Wheeler et al. 2001), and presumably monotremes (Lee et at. 1999). However, as indicated by the four-to-twelve gene clusters of rodents, lagomorphs, artiodactyls and primates (Fig. 1-l), the composition and regulation of the ancestral mammalian p-cluster has been substantively altered over the eutherian radiation (Efstratiadis et al. 1980; Jeffreys et al. 1982; Martin et al. 1983;

Goodman et al. 1984; Hardies et al. 1984; Scott et al. 1984; Townes et al. 1984;

Schimenti and Duncan 1985a,b, Hardison and Gelinas 1986; Tagle et al. 1988). Despite their compositional differences, remarkably congruent (though independent) gene conversion and inactivation events are seen within the B-clusters of these four orders, leading to the development of several widely accepted paradigms pertaining to the evolution of this cluster (Jeffreys et al. 1982; Goodman et al. 1984; Hardison 1984;

Hardison and Margot 1984; Harris et al. 1984,' Tagle et al. l99I; Satoh et al. 7999, Koop et al. 1989a;Prychitko et aL.2005).

For inst¿nce, the presence of a ô-globin locus in each of the four examined placental orders (Rodentia, Lagomorpha, Artiodactyla, Primates; Fig. 1-l) has led to the Putative coÍÌmon eutherian ancestor

E'r q ô p eyvqôB al # Þ ÉIl

vßhO phl yBh2 ryph3 pl Bz

VPr" "P

i-r EII pc

¡ €rrr ìyru

?

f ------Elephant shrew I r-l i i L------Golden I T-t ttl I I L------ lt -'1 LJ< ? I ll ------HYrax l.l !-- j l---- Elephanr I

eÊ ot Marsupials H H H eP Monotremes {t}-+ general acceptance that this gene arose via duplication of the adult-expressed pJocus

@fstratiadis et al. 1980; Czelusniak et al. 1982) soon **]n: divergence of Eurheria from Marsupialia (148-159 mya; Archibald 2003). Since thirevent, the orthologous ô genes from each of these four orders \ryere converted (in some cases repeatedly) by the B- locus, with p preferentially converting ô in its upstream external and 5' exonic regions

(Jefireys et al. 1982; Martin et al. T983; Hardison and Margot 1984; Hardies et al. 1984).

The ò gene thus proved to be largely expendable, with p-globin becoming the major (and generally only) functional adult gene in the cluster (Martin et al. 1983; Hardies et al.

1984; Koop et al. 1989a; Prychitko et aL.2005). Indeed, õ-globin is only known to be expressed in New World monkeys, hominoids, tarsiers and galagos (Martin et al. 7983;

Hardies et al. 1984; Prychitko et al. 2005) and,' with the exception of the laner two groups (Tagle et al. 1988; Koop et al. I989a), encodes for only a small fraction of the hemoglobin molecules of post-natal erythrocytes (Boyer et at. l97l; Martin et at. 1983;

Ross and Ptzarro 1983). The distribution of expressed ò-globin genes among primates suggests these functional loci arose (possibly via the resurrection of an anciently silenced

ô-globin gene; Martin et al. 1983) independently in stem anthropoids, tarsiiformes and loriformes via recombination with the 5' p-globin promoter region necessary for transcription (Koop et al. 1989a; Tagle et al. 7991; Prychitko et a|.2005). The ò-locus was subsequently silenced in the lineage leading to Old \Morld monkeys (Martin et al.

1983), or hybridized by an unequal crossover event with the rþr¡-locus in the ancestry of lemurs (Jeffreys et al. 1982). Accordingly, ò-globins have been litle more than vestigial genes since their inception, except where their integrity has been maintained through sporadic, fortuitous events of gene conversion (Hardies et al.l9B4).

10 Recent molecular studies have provided compelling support for the presence of

four placental 'superorders' (Madsen et al. 2001; Murphy et at.200la,b; Amrine-Madsen

et at. 2003). Because these (and other) studies place the iop".rd.rs Xenarthra and

Afrotheria at the base of the eutherian clade (Fig. 1-1), members of these two groups

provide ideal model systems to test hypotheses regarding the molecular evolution of the

B-globin gene cluster. While protein sequences have been documented for the 'F-type'

chains ofseveral afrotherian and xenarthran species, priorto the initiation ofthis study no

nucleotide data was available. Consequently, the identity (i.e. Þ or ô) and evolutionary

history of the genes encoding these polypeptide chains have not been established. To fill

this void, the primary aim of this study was to determine whether the p/ô duplication

event predates the eutherian radiation. If so, a second aim was to explicate the

evolutionary events leading to the present-day post-natal p-like globin genes of both

superorders to test whether the p-globin locus exhibited a competitive advantage over the

5' ò-locus in members of these basal clades.

The final goal of this study was to examine the phylogenetic relationships among

the afrotherian orders Proboscidea, Hyracoidea and Sirenia. Since their grouping into the

superorder (Simpson 1945), the inter-relationships among these orders have

remained controversial. For example, studies examining morphological characters

consistently place sirenians and elephants as sister taxa ('Tethytheria hyothesis';

McKenna 1975; Shoshani and McKenna 1998; Liu and Miyamoto 1999; Liu et al.200l), while molecular analyses recover only limited support for this relationship (Fig. 1-2). To help resolve this issue, phylogenetic relationships amongst all seven ext¿nt paenungulate genera were exarnined using non-coding, homologous gene sequences.

l1 Fig. 1-2. Hypothesized relationships among the three extant paeunungulate orders based on 19 recent molecular phylogenetic studies. Letters below each hypothesis correspond to studies providing support for the above topology.

12 (d () CS ñ rcJ 8E E 'rJ t E o Þ o() E Ø Clq o() o U) Íúo U) p i{ _Ò ñ a Jà .A ;r.j Êi Ë E

A B c

Lavergne et al. 1996 Stanhope et al. 1998 Springeref al. 1997 Eizirik et al. 2001 Liu and Miyamoto 1999 Madsen et al. 2001 Murphy et al. 200Ia Amrine and Springer 1999 Amrin+Madsen et al. 2003 Madsen et aL.2002 Murphy et ql. 2001b Fleming et qL.2003 Murata et al. 2003 Delsuc et al. 2002 Douady et a|.2002 Maliaet al. 2002 Douady et aL.2003 Springeret al. 2003 Waddell and Shelley 2003 MATERIALS AND METHODS

Data collection

DNA was extracted from whole blood (100 pl) and tissue (lO-25 mg) samples from 11 afrotherian and two xenathran species (Table 1-1) using a QIAGEN DNeasy@

Tissue Kit. The optical density of eluates was measured with an flltrosp ec 3100 pro spectrophotometer @iochrom), the concentration and purity of the DNA was calculated from absorbance readings at260 and 280 nm, and 50 ngl¡rl working solutions prepared.

To design primers for polymerase chain reactions (PCRs), degenerate nucleotide sequences were first deduced from the 'p-globin' polypeptide chains of the African

(Loxodonta africana) and Asian elephant (Etephas maximus), Amazon manatee

(Trichechus inunguis), rock hyrax (Procøvia capensis), nine banded armadillo (Dasypus novemcinctus) and three-toed sloth (Bradypus tridactylas) (see Fig l-3 for references).

Resulting sequences were aligned with , rodent, lagomorph and artiodactyls p- globin gene sequences (see Table l-2), and highly conserved regions used for primer design (Primer Premier 5.0; Premier Biosoft International). PCRs were conducted on l00ngof templateDNA(2¡rl) in 0.2m1 tubescontaining43¡rlofreactionmixture

(1.0 ¡rl of each dNTP (2.5 mM), 5.0 ¡,rl lOx pCR Reaction Buffer, 3.H.5 pl of 50 mM

MgCl2, 5.0 ¡rl of each primer (10.0 pmol/¡rl), 0.5 ¡rl Taq polymerase (5 U/¡^r,l) and 24.0-

25-5 ¡il ddH2O), using PTC-0220 DNA Engine Dyad@ (MJ Research) and Mastercycler@

Gradient @ppendorf) thermocyclers. Initiat amplifications entailed a 30 cycle protocol

(94"C for 30 s; 50oC for 15 s;72oC for 30-75 s) followed by a 10 min interval at72oC.

PCR products were isolated using Montageru PCR Centrifugal Filter Devices

t3 Table 1-1. List of taxa examined in this studv.

Afrotheria Loxodonta africanø blood not specified female

Elephas maximus Asian elephant blood not specified male

Dugongidae Dugong dugon dugong DNA Mabuiag Island, female Tones Straits

Trichechidae Trichechus manatus skin , Florida female

Procaviidae Procsvia capensis rock hyrax blood not specified not specified

Heterohyrax brucei À yellow-spotted hyrax eaf not specified not specified Dendrohyrax dorsalis western DNA not specified not specified

Orycteropodidae qÍer aardvark liver Karoo, SouthAfrica female

Macroscelididae inturt bushveld elephant shrew liver not specified not specified

'I Cþsochloridae Chrysochloris asiatica cape golden mole Cape Town, SouthAfrica L ¡nale hottentotus Hottentot golden mole spleen PaIm Springs, South Africa male

Xenarthra Bradypodidae Bradypus tridøctylus pale-throated not not specified nod specified three-toed sloth specified

Dasypodidae Dasypus novemcinctus nine-banded not not specified not specified armadillo specified Table 1-2. Genes included in the sequence analyses of 'p-like' globin genes with corresponding GenBank accession numbers. Hyrax-specific=ò.globin genà denotedbyasterisks(t). - '"'."-- (ôH) are

Eutherian Globin Accession superorder Species Common Name Reference gene No.

Afrotheria Loxodonta africana African elephant ò DQ09l20l This study

Elephas maximus Asian elçhant ô DQ091202 This study

West Indian Trichechus manqtus ô manatee DQ091203 This study

Dugong dugon dugong ò DQ091204 This study

ô DQ091208 Procavia capensis rock hyrax This study ôHx DQ091205 yellow-spotted Heterohyrax brucei +' hyrax ôH DQ091206 This study Dendrohyræ. dorsalis western tree hyrax òHx DQ091207 This study

Elephantulus intufi bushveld elephant ô DQ0912l1 strrdy sh¡ew p DQoe1212 This Dasypus Xenarthra nine-banded vô DQ091209 novemcinctus This study armadillo 13 DQ09l2l3

pale-throated vò DQ091210 Bradypus tridactylus This study three-toed sloth 13 DQ0912l4 Dasypus nine-banded Green 2004, .lrò, p 4C151518 novemcinctus armadillo unpublished Efst¡atiadis Euarchotoglires Homo sapiens human ô,p u01317 et al. 7980

ph2, ph3, Shehee et al. Mus musculus mouse x14061 þr,þ2 I 989

Oryctolagus Margof et al. cuniculus rabbit ,þþ2,þL Ml8818 1 989

Otolemur Tagle et galago ô,p u60902 al. crqssicaudatus 1992 õ Tarsius syrichta tarsier r04428 Koop et al. p J04429 I 989a

Schon et al. Laurasiatheria Capra hircus goat po M15387 t 98l

l5 (Millipore), and small samples of purified template (0.8 ¡rl) pipetted into 19.2 ¡rl aliquots of reaction mixture for subsequent amplification experiments. To increase template- primer binding specificity, a nested PCR protocol utilizing an adnèaling Gmperature (Ta) gradient of 50-60'C was employed. Nucleotide sequences of the resulting products were used to design species-specific primers to amplify the flanking regions of each gene via a walking reaction (APAgeneru Genome Walking Kit, Bio S&T Inc.).

A 15 ¡"tl sample of each final PCR product was mixed with 3.5 ¡"rl loading buffer and 1.85 ¡rl SYBR Green (Fisher Scientific), pipetted into gels prepared from 0.75 g of low melting temperature agarose dissolved in 50 ml lx TAE buffer, and electrophoresed at 100 V for I hr. Target bands were excised, and purified using a MinEluterM Gel

Extraction Kit (QIAGEN). These extracts were ligated using a eIAGEN@ pCR

Cloningnlu" Kit and transformations performed. Transformation products were incubated on agar plates containing O.lYo vlv ampicillin (100 mglml), X-GAL (40 mg/ml) aod

IPTG (100 mM) for l8 hr at37oC. Positive colonies were transferred to culture medium containing 0.lYo vlv ampicillin (100 mg/ml), and incubated (37'C for 18 hr) in 50 ml LB culture tubes placed on a platform shaker set to 225 RPM. Finally, a 3 ml subsample of this bacterial culture was pelleted by centrifugation (4 min at 5,000 RPM) and plasmid

DNA extracted (QIAprep@ Spin Miniprep Kit; eIAGEN).

To verify that the plasmids contained target bands, 20 ¡*l (350-400 ng) of purified plasmid DNA was digested in a restriction enzymemixh:re (l 5 ¡rl 10x REact@ 3

Buffer, 0.5 ¡rl EcoRl (10 U/¡rl) and 11.0 ul ddl{zo) for l-z hr at 37oC, then electrophoresed (80 V; t hr) on a lYo agarose gel. Positive samples \¡/ere sequenced in both directions with a 3739 ABI PRISM Genetic Analyzer (University of Calgary DNA

16 core laboratory) and BigDye Sequencing Kit using the universal sequencing primers, M-

13(F)-40 and M-13@) (Appendix 1-1).

Data analyses

Consensus gene alignments were constructed for each species using

Sequencherru (Versi on 4.2.2) software. Overlapping gene fragments incorporated between 3 and 14 cloned fragments amplified from 2 to 7 individual PCRs, respectively.

Dotmatcher (EMBOSS) software was used to identify homology, and MatGAT (Version

2,02) employed to calculate percent identity, between sequences of the 'pJike' genes amplified in this study and the ò- and p-globin genes of human, rabbit and mouse (see

Table 1-2 for GenBank accession numbers). For comparative purposes, the ô- and p- globin gene flanking regions of the nine-banded armadillo (Dasypus novemcinctus) were first identified in a working draft BAC sequence available on GenBank (accession number 4C151518, representing clone VMRC5-69F10 from the NISC Comparative

Vertebrate Sequencing Project) and included in these analyses. To assess promoter regions, the 5' flanking sequences of sequenced genes were aligned and compared with the 'pJike' globin genes of armadillq human, galago and tarsier (Table 1-2).

To examine the inter-relationships amongst paenungulates, a phylogenetic analysis was conducted on intron 2 (IVS2) sequences of the ò-globin genes ampliflred from these species, with homologous xenarthran sequences serving as outgroups. A

773 bp alignment was created using Clust¿l X (Version 1.8) (Thompson e/. al 1994) using gap opening and extension penalty values of 16 and 6.66, respectively, and refined by eye. A ML analysis was conducted on the alignment using PAUP* (Version 4.0b10)

T7 software, applying the HKY85 two-parameter model variant for unequal base

frequencies. AII positions were weighted equally and the transition:transversion ratio set

to 2.00. A heuristic search was executed with 1000 bootst.u¡r"pti"ut"* The final tree was constructed using random stepwise addition with weighted least-squares and TBR

algorithms.

18 RESULTS

A total of 10 complete and 3 incomplete 'p-like' gene products were amplified

and sequenced - from the lt afrotherian species (Table f 4r 'Complete' products

possessed gene regions running from at least the initiation to termination codons, while

_'incomplete' products were missing the gene region containing either one or both signals.

putative AII gene products possessed the three exon/two intron affangement (Appendix 1-

3) characteristic of other p-globin cluster genes. For genes with no frameshift mutations,

IVSI divided codon 30 between the second and third nucleotide, while IVS2 separated

codons 104 and 105. Comparative sequence examinations revealed that both gene

products obtained from the a¿rdvark and the single product from each ofthe golden and

Hottentot moles were artificial 'ò/p-globin' chimeras, with the conceptual point of

conversion occurring at exactly the same base position within each "gene"; these pCR

artifacts were thus excluded from all subsequent analyses. MatGAT and dotplot analyses

(see below) suggested that all but one gene product from each of the remaining

afrotherian species shared the highest degree of sequence homology with the ô-globin

genes of human, rabbit and mouse (Table I -2). Accordingly, these genes were classified

as ô-globins. Conversely, one of the amplified gene fragments from the elephant shrew was shown to share highest sequence similarity with other placental p-globin

sequences Table @ig.1-9, 1-4) and classified as p-globin. Complete ù and B-globin gene sequences were obtained from both xenarthran species included in this study.

Classification of afrotherian 'þlike' globin genes

Dotplot analyses of the Asian (Fig. la) and African elephant (data not shown) genes (which shared 99.2% sequence identity from initiation to termination codons)

19 Table 1-3. List of 'BJike' globin gene products amplified in t[is_study.

Number of base Taxon Gene identity Amplified gene pairs sequenced region

Afücan elephant ò I,744 complete

Asian elephant ò 2,lrg complete

dugong ô 2,639 complete

West Indian manatee .ô L,696 complete

rock hyrax ô 1,180 partial

öH 1,609 complete

yellow-spotted hyrax òH r,604 complete

western tree hyrax òH 1,591 complete

aardvark ò/p r,177 complete, artifact

ò/p I,192 complete, artifact

bushveld elephant shrew p 1,840 partial

ò 180 partial

cape golden mole ò/p I,7L3 complete, artifact

Hottentotgolden mole òorp? 426 partial

p pale-throated sloth 1,3 18 complete

q,ò 1,590 complete

nine-banded armadillo p 1,383 complete

+ò 1,296 complete

20 recovered the highest sequence similarity with ò-globin genes of human and rabbit in the

IVS2 and the 3' flanking regions, while 5' flanking regions revealed homology with other ' 1.-- B-globins. Examination of the Asian elephant 5' promoter regioñ further revealed the

presence of all transcriptional control motifs characteristic of functional B-globin genes

(Fig. 1-5; Appendix 1-4). The translated protein chains of the African and Asian elephant

ò-globin genes were identical to the 'p-globin' protein chain determined for these species

(Fig.1-3), suggesting these genes are expressed.

Dotplot analyses of the dugong (Fig. l-6) and manatee (data not shown) gene

products (which were 96.2%o identical from initiation to termination codons) revealed

homology with other mammalian ô-globin genes in the IVS2 and 3' flanking regions.

Like the elephant gene products, the 5' flankig regions of both sirenian ò-globin genes

were p-like. However, both species possessed ATA --* GTA mutations at the ATA

transcriptional control motif (Fig. 1-5, Appendix l-4). While these base substitutions

may have an effect on the transcriptional regulation of these genes, the conceptual protein

chain of the Florida manatee ô-globin gene precisely matched the amino acid sequence of

the 'p-globin' protein chain determined from the Amazon manatee, Trichechus inunguis

(Fig. 1-3).

Dotplot comparisons of the first rock hyrax gene fragment (Fig. 1-7) also

suggested this gene was ò-like '\¡/ithin the IVS2 and 3' flanking region. This gene fragment (designated ô-globin) encoded for a protein identical to codons 82-146 of the

'p-globin' protein chain documented for the Abyssinian hyrax (Procavia capensis habessinica) (Fig. l-3). Notably, the second rock hyrax gene product was also òlike in the IVS2 and 3' flanking region (Fig. l-8). However, the translated protein sequence \ryas

21 Table 1-4. Percent identity matrix comparing the IVSI and IVS2 sequences of 'p-like'globin genes sequenced in this study and those of human, rabbit and mouse. Gap opening and extension penalties were set to 16 and 4, respectively. Numbers in gray represent IVSI comparisons while values in black denote fVS2 comparisons.

(õ' (ô' (ô' L(ô 2(E' 3 4 s tô' 6 7 rô' 8(ô 9 (ùô 10 ({'ô ll lß' 12 (B', t3 (ß 14 lô 15 tô) t6 lô L7 (ô 18 (B t9 (B' 20 (B', 2t (B'

1. Africa¡l elephant ô 98.9 84.0 84.5 73.d 75.0 76.1 77.2 67.f Á(6 44.7 49.5 47. 54.! 61.5 <¿. ¿ <ÉÁ 47.( 46.4 48.6 48.4 2. Asian ele¡hant ô 99.2 7aÁ 7(O AK? 83.8 84.2 77.2 67.Z 65. 49.7 48.2 47.1 54.! 61.2 5¿.0 (6.f1 ¿ß.n 46.t ß.2 4E.2 |n.3 3. dueons ò 9r.5 97.4 17.r 16¡ 76.8 77.8 67.1 65.f 46.9 50.4 48. 6t.', 1^¿ 48.5 46.2 48.2 47.9 4. manatee ô 91.5 9|.8 96.1. 77.1 78.2 69.0 66. 47.0 40 ¡< áRA s7.Á 62.t 54.t 57.( 49.3 46.8 47.l 47.5 ô 1^ á 5. rockhvrax 72.3 73.1 #î 60.3 45.8 49.6 47.',, ar. a (t{ s2.3 53.8 ß.2 47.1 41.1 47.2 J "i1.6 6. rockhyrax ôß i; ,1J 97.7 95.s 63.6 62. 47.2 49.7 47.1 s2-6 57.! 52.t <<¡ 48.', 47.t 41.5 ¿R(

7. vellow-sootted hwax ôs 7'J.4 7i. 8rl.fi Ii0.0 )5,5 95.9 64.2 62.! 48.1 49.8 48.', \17 l/.r 111 <Ã1 &1 47.4 b..) 48.6 48.0 t\) ôH 'li. 8. tree hrrax 7ô. fr0..l 3ri.8 9-ì..i -1,-6 65.1 627 48.t 50.3 dR{ <1 t (Pn

14. ô (i9,1 ,. human 68.t 68.t 68. í;.t. ÍJ li l. ,:tj 11 63..1 66.t 70.5 58.2 s2.4 51,3 49.4 41.( ß.4 44.! (ôlike) 15. ¡abbit rllß2 5+.2 5J.2 s6. i1 1 itl 19.{ ô1. 59.¡ ,t8.8 s8.J (il s6.2 55.t (63 ß.7 46( tt!'t ¿ 47.'. 16. 4.J. J .!i.1 mouse th2 lò-like) "t3.J 1i.1 l.i ¿.1.1 .rí. J4.l .11.7

mouse (öJike) ,.t(;. t'l Fh3 51 51 ! -l-<. I .t -r.t.B .19.( s6.-i 53.Í 45.3 46.5 4b.o 47.! 18. human ß tt) 68.9 r;fi.1 69.t 68. 6i h1 it {tJ.l 66.9 ii).5 i (l(i.{ì ll 18.5 47.1 aRÁ 47.3

19. rabbit ß1 (tJ.1 1;.3.1 ó 1..1 ót.r 69.0 id.{i ó(r,'I 65.1 7ti.t 57_ 1 -13. -17 . il).n 49.È s2.1

20. mouse 81 6Z 6t.tj (t..\. (¡2.1 ,l ()2 iÌf ,s.1 5i.{) 65.9 6:ì. 51.1 ,tB.: 5i (i-ì. 1 9.2 6^Á

.I 5,J r 1 21. mouse 82 ¡.1. t il.3 5l 5-1.5 ¡1).7 .!5. 5.]. t { -r5 5t lti Fig. 1-3. Comparison of the'B-globin' chain amino acid sequences of four afrotherian and two xenarthran species (top row in each case) with those deduced from the ò- and

p-globin gene seqüences determined for these species (bottom row; dots represent -Florida identity with the top sequence). manatee (7. manatu^s); Irock hyrax (P. capensis); $conceptual amino acid sequence of the armadillo p-globin gene (GenBank accession number 4C151518). The degenerate amino acid symbol B corresponds to D or N, while Z represents E or Q.

23 African elephant (Loxodonta africana); Braunitzer et al. (1982)

(ô-globin)

Asian elephant(Elephas maximus); Braunitzer et øt. (1984)

Amazon manatee (Trichechus inunguis); Kleinschmidt et al. (1996)

(ô-globin)

Rock (Abyssinian) hyrax (Procøvia cøpensis habessinica); Kleinschmidt and Braunitzer (1983)

r (ô-globin) .R...S...8..S,.S.,DEKII.S...... r....R T T....r¡K..R.Q... (ôH-globh)

Pale-throated tfuee-toed sloth (Bradypus tridactylus); Kleiruchmi dt et al. (19g9)

l,l

f

Nine-ba¡ded armadillo (Dasypus novemcinctus); de Jong et al. (l9gl) I (ftchain') ..D.EDC..E (p-globin) (ftglobín) Fig. 1-4. Dotplot comparisons (window size : 100; threshold : 80) of the putative

Asian elephant ô-globin gene (Y-axis) with the ò- and p-globin genes of humans, rabbits and mice (X-axis). Sequences run from the 5' to 3' direction. Exons l, 2 and 3 for each gene are denoted by black boxes.

24 Asian elephant ô-globin gene com parisons

ô-type genes

/'/ ,r/

ca c (g E o- o rabbit yB2 qJ rabbit Bl c .(o 2000

1500

I000

sdo loòo ls'oo 2ooo mouse Bh2 mouse B1

mouse Bh3 mouse B2 Fig. 1-5. The 5' flanking sequence data and features of ò- and p-globin genes amplified

in this study and those of armadillo, human, tarsier and galago. Data runs from seven bases upstream of the distal "GACCC" site to the initiation ("ATG') codon. The putative cap site ("CAp"¡ is underlined and lies 52 bp (51 bp in the sloth) upstream of the initiation codon. Sequences considered important for transcriptional regulation

(CACCC, CCAAT, and ATA) are denoted by bold CAPITAL letters. Nucleotides surrounding the ATA motif correspond to a consensus sequence ("GGGCATAfuartr{G') found in many postnatally expressed p-globin genes (flardison 1983) and are shaded in gray. The consensus sequence "CTTPyTG' @aralle and Brownlee 1978), found seven bases downstream of the cap site, is also marked in gray. Colons (:) denote base deletions. Underlined bases represent those which deviate from the consensus sequence normally recovered in p-globin genes, but should not be considered significant unless they compose a sequence element involved in tanscriptional regulation.

25 " cAccc * ,,cAccc,, "ccÀÀT' +1L0 +100 +90 +80 +70 +60 +50 +40 Asian elephant ô caggcctCACCCtgcagaaccaCACCCtggc: : : : : cttggCC.AÀlctgctcacaagagcaaaaagggcagg: accagggtt dugong ô cagacctCAccCtgcagaaccaCACCCtggc!:!::ctyggCCAATctgytcccaggagcaagaagggcagaaaccagggtt manatee ô cagacctCACCCtgcagáaccaCACCCtggcr::::cttggCCAÀÍIctgctcccaggagca"gaagggåagaaaccagggtt armadillo r¡rô tcttacaaatcaatgaacatggatatatgctatgccttaggAcAÀIctgctùcaaggagcagagagggaagggaccaaggct human ò tttcattctcacaaactaatga.AAcCCtg : cttatcttaaaCCÀÀgctgct,cac 3 : 3 : 3 : : ! t,ggagcagggaggacaggac tarsier ô ttttattctcacaaaccaatcaatttct gccatatcctaggTEAÀTctgctcacatgagcagggagggcâagagtcagggct galago ô / galago tcagccctcctccAcacagccaCACCCtgag:::::ttcagCCTÀTctcctcataggtacagggagggcaggaaccagggca

elephant shrew B ctccAggctctttctgcagagtCAclct9gc!3:::ctgggCCA.ã,Tctgcttgcagaagcaccatgggcaggacccagggct sloth B AC TCCCTGGAAGGA.AGGGGÀGGGCCCAGGGCT armadillo B cag ac ctCACCCtatggaaccaCACCCaggt, : : : ! : tgtgaCCAÀTctgctc agaggagcagagagggc agggaccagggct human B tagacctCACCCtgtggagccaCACCCt.agg : : : : : gttggCC.AÀlctactcccaggagcagggagggcaggagccagggct tarsier B cag gcctCACCCtgc agagcccCACCCt ggg : : ! : : cttggccAÀlctgctcccaggagcagtaagggc agt.agccaaggct

"ÀTA" ,,ATc" +30 +20 +L0 0 _10 _20 _30 _40 _50 Asian elephant ô gggcArÀtaaggaagagtagtgcc agctgctgtttAcactcacttctgacacaactgtgttgactagcaactacccaatcagacaccatg dugong ô gg E cGlAaaar'fgaagrgcaggactagctgctgcttå'ctctLa.,:t-t:ct-ggcacaactgtgttgact,agcaactacacaatcagacvrcc manatee ô gggcGTÀaaaggaagagcagggctagctgctgcttAcacttacì:tctggcacaactgtgt.tgactaacaactacacaatcagacacca!g armadillo rpô gggcÀlAaaag: !::agcaaagccagctgttgcttAtacttgcr-Ltggacaaaact,gtgctct.ctagcaaccatacaaag'agacac4a{g human ô qêgCATAaaâggcâgggcagagtcAactgttgcttAcactttci:-.tsig¿sâtaacagtgttcact,agcaacctcaaa::cagacaccqùg tarsier ô gggcATAaaa'JgtagggcagagtcAactgctgctcAcgcttgc.htctgacacaaccgtgttcaccagcaaccacaaa: : cagacacJatg galago ô / galago g'ggcATAaaaEtgcagggcagggaactgctgct,gctAtacttgct-tt-cqacacaactgtgttcactagcaacca::: ! !:cagacatcatg

I elephant shrew B gggcArAEaag : aag99ca99'acc agcaaaggcttAcactLqcL¿t c'r-qacacaagt,gtgttcactagcaactactcaaacagtcaagatg sloth B tgg cATAaaaggaggagaagggcc agctgctactcAcacttgcttcigacacagctgtgttcactagcaaccacaaaa: cagacaccatg armadillo B gggcATAaaagtaagagtaaggccagctgctgcttAcaattgct'Ectgacacatttgtgctcactagcaaccatataaagagacaccatg human B gggcATAaaagtcagggcagagccatctattgctt.Acatttgclrtctgacacaactgtgttcactagcaacctcaaa::cagacaccatg tarsier B Fig. l-6. Doþlot comparisons (window size : 100; tfueshold : 80) of the putative dugong ò-globin gene (Y-axis) with the ò- and p-globin genes of humans, rabbits and mice (X-axis). Sequences run from the 5'to 3' direction. Exons 7,2 and 3 for each gene are denoted by black boxes.

26 Dugong ô-globin gene comparisons

ô-type genes _Þ_+yp" genes

,/

human ô human B

oa cOl o rabbit yB2 (') rabbit B1 E=

mouse BhZ mouse B1

mouse Bh3 mouse B2 Fig. 1-7. Dotplot comparisons (window size: 100; threshold: 80) of the putative rock hyrax ò-globin gene (Y-axis) with the ò- and p-globin genes of humans, rabbits and mice (X-axis). Sequences run from the 5' to 3' direction. Exons 2 (parttù, sequence) and 3 for each gene are denoted by blackboxes.

27 Rock hyrax ô-globin partial gene comparisons

ô-type genes

human ô human B

t'O X (o rabbit yB2 rabbit B1 -c -\¿U

LOl

I

mouse Bh2 mouse B1

ph3 mouse mouse B2 Fig. 1-8- Dotplot comparisons (window size: 100; threshold: 80) of the putative rock hyrax ôH-globin gene (Y-axis) with the ò-, and p-globin genes of humans, rabbits and mice (X-axis). Sequences run from the 5' to 3' direction. Exons 1,2 and 3 for each gene are denoted by black boxes.

28 Rock hyrax ôH-globin gene comparisons

ô-type genes

human õ human B

r oO (gX ! rabbit yB2 rabbit Bl -c -VU o

mouse Bh2 mouse Bl

mouse Bh3 mouse B2 different at 31 of 146 residues (2I.2%) from the 'p-chain' of this species (Fig. 1-3). The amplified gene products of the yellow-spotted and tree hyraxes Sh¿red 9T9Yo and 95.8o/ø sequence identity with the second rock hyrax ô-like gene, respectively, and grouped closely together on the ML tree (Fig. l-la) suggesting the three loci are orthologous.

These apparent hyrax-specific loci were thus designated ôH-globins.

The first gene fragment amplified from the elephant shrew ran from +670 bp upstream of the initiation codon to the end of IVS2. Dotplot analyses recovered the highest degree of sequence similarity between this fragment and the p-globin genes of human, rabbit and mouse @ig. 1-9). Mutations at both CACCC transcriptional control motifs in the 5' promoter region of this gene were detected (Fig. l-5, Appendix 1-4). The second gene fragment, which ran from codon position 136 in exon 3 to +180 downstream from the stop codon, shared the highest degree of sequence similarity with eutherian ò- globin genes (data not shown).

Classifcation of xenarthran 'þlike' globin genes

Dotplot (Figs. 1-10) and MatGAT (Table 14) analyses suggested that one armadillo 'p-like' gene product was homologous with eutherian ô-globin genes in IVS2.

This gene was presumed a pseudogene (r¡rò) due to a premature termination signal at codon 22. The second armadillo gene product showed highest sequence similarity with other mammalian p-globin genes in the IVS2 and 3' regions (Fig. 1-11). Conspicuously, the deduced protein sequence ofthis gene differed from the p-globin protein sequence at

4 of 146 positions (2.1%) throughout exons 1,2 and 3 (Tig. 1-3). From the initiation to

29 Fig. 1-9. Dotplot comparisons (window size : 100; tlreshold : 80) of the putative elephant shrew p-globin gene (Y-axis) with the ò- and p-globin genes of humans, rabbits and mice (X-axis). Sequences run from the 5' to 3' direction. Exons I and 2 for each gene are denoted by black boxes.

30 Elephant shrew B-globin partial gene comparisons

ô-type genes Þ-type genes

I 500

500 lo'oo t 500 2000 human ð human p

B (U -c r rabbit yB2 P rabbit Bl c(g -c o o

,/

mouse Bh2 mouse B1

mouse Bh3 mouse B2 Fig. 1-10. Dotplot comparisons (window size : 100; threshold : 80) of the putative armadillo rþô-globin gene (Y-axis) with the ò- and B-globin genes of humans, rabbits and mice (X-axis). Sequences run from the 5' to 3' direction. Exons 1,2 and 3 for each gene are denoted by black boxes.

3t Armadillo ryô-globin gene comparisons

ô-type genes Þtype genes

o sdo robo I 500 human ô human p

CrO s00 :9 rabbit yp2 -o rabbit B1 (U E L (u

mouse Bh2 mouse B1

mouse Bh3 mouse B2 Fig. 1-11. Dotplot comparisons (window size: 100; threshold:80) of the putative armadillo p-globin gene (Y-axis) with the ô- and p-globin genes of humans, rabbits and mice (X-axis). Sequences run from the 5' to 3' direction. Exons 7,2 and 3 for each gene are denoted by black boxes.

32 Armadillo B-g lobin gene com parisons

ô-type genes Þ-type genes

500 1000

human ð human B

o E ro rabbit r¡p2 rabbit E B1 L (o

500 1000

mouse Bh2 mouse B1

5Ci0 ph3 mouse mouse B2 conceptual stop codons, nucleotide sequences of the armadillo rþô-globin and B-globin genes were found to share a high'degree (99.4Yo and 99.3%o) oEjeQgenceidentity with the two genes identified at the 3'-end of the Dasypus novemcinclzs B-globin cluster. This 3'- most pJike gene seemingly possessed all the 5' and 3' control elements necessary for transcription. Conversely, the ôlike gene was missing both CACCC and CCAAT 5' transcriptional control motifs and contained no 3' poly-adenylation signal (Fig. 1-5;

Appendices 1-4 and 1-5). Dotplot analyses showed the ô-globin gene to be ôlike and the p-globin gene to be B-like in their respective 5' and 3' flanking regions (data not shown).

Dotplot (Figs. 1-12, l-13) and MatGAT (Table 1-4) analyses suggested that one sloth 'pJike' gene product shared homology with eutherian ô-globin genes, while the second gene product was homologous with other p-globin genes. The ô-globin gene was presumed a pseudogene (tpò) due to a two-base deletion at codon 77 resultin g in a premature termination signal at codon 87 in exon 2. The conceptual protein chain of the sloth p-globin gene was identical to the amino acid sequence of the 'p-globin' chain (Fig. l-3)

Phylogenetic analysis of ùglobin IVS2 regions

The ML analysis of ò-globin IVS2 sequences (773 bp) grouped elephants and sirenians together to the exclusion of hyraxes (Fig. 1-14). However, bootstrap support for this assemblage was relatively low (56%). Bootstrap support was strong (37-100%) for all other groupings. The three òH-globin genes were closely clustered together and placed

J3 Fig. 1-12. Dotplot comparisons (window size : 100; threshold : 80) of the putative sloth r¡rò-globin gene (Y-axis) with the ò- and p-globin genes of humans, rabbits and mice (X-axis). Sequences run from the 5' to 3' direction. Exons 1,2 and 3 for each gene are denoted by black boxes.

34 Sloth tyô-globin gene comparisons

õ-type genes Ê-typ" genes

1000 ,/

0 500 l5oo 1000 o söo toilo rsöo human õ human B

500

I'O 0 0 500 1000 1500 0 500 1000 -c rabbit yB2 rabbit B1 Po t^

500

0 o 5oo tooo rsöo 0 500 1000 1500 mouse Bh2 mouse Bl

1

t 000

500

o 5oo rcbo 0 500 to00 ts00 mouse Bh3 mouse B2 Fig. 1-13. Doþlot comparisons (window size : 100; threshold : S0) of the putative sloth p-globin gene (Y-axis) with the ò- and p-globin genes of humans, rabbits and mice

(X-a*iÐ. Sequences run from the 5' to 3' direction. Exons 1,2 and 3 for each gene are denoted by black boxes.

35 Sloth p-globin gene comparisons

ô-type genes F-type genes

,/ ,/

1000 r 500 human ô human B

./

-Ò co- s00 sdo -c rabbit ryõ2 rabbit Po B 6

500 500 1000 mouse Bh2 mouse B1

OF 500 l0¡0 0 500 mouse Bh3 mouse p2 Fig. 1-14. Xenarthran-rooted maximum likelihood tree constructed from homologous

ô-globin fVS2 gene sequences (773 bp; see text for details) illustrating the inter- relationships ¿tmong paenungulate mammals. Values along each branch denote the degree of sequence divergence. Bootstrap support (1000 replicates) for each node are indicated in large whole numbers.

36 armadillo tyð

0.09 4

o-o1 0 rock hyrax ôH

yellow-spotted hyrax ðH

0.01 1 - western üee hyrax ôH

rock hyrax ð

o-007 manatee ô

0.02 1 dugong ò -

0.005 African elephant ô

Asian elephant ô 0. 05 substitutions/site as a sister group to the rock hyrax ò-globin gene with high bootstrap supporl (57%).

Finally, the tree hyrax was placed as the most basal branch of the hydracoidean lineage. -=.-

37 DISCUSSION

The discovery of a ô-like globin gene within members-of, the superordinal clades,

Afrotheria and Xenarthra, supports the theory that the ancestral mammalian proto-p- globin gene duplicated prior to the eutherian radiation (Ilardies et al. L984; Hardison

1984; Hardison and Margot 1984; Hutchison er at. 1984) However, the data moreover suggest that, contrary to other placental mammals, ô-like genes are the only post-natúly expressed loci of the p-globin cluster found \¡/ithin the red blood cells of paenungulate species. This finding challenges the widely perceived notion (Martin et ø1. 1983; Hardies et al. 1984; Hardison 1984; Hardison and Margot 1984; Koop et al. 1989;Prychitko et al.

2005) that ò-globins have been little more than relic genes (i.e. 'failed experiments') since their inception. It is further proposed that modifications within the p-cluster leading to the sole expression of the ôlocus were already present in stem afrotherians prior to the radiation of the paenungulate clade.

These assertions are well supported by several lines of evidence. The first is based on the observation that the 5' promoter (Fig. 1-5) and 3' poly-A signals (Appendix

1-5) necessary for transcription were detected flanking the ô-globin gene of all paenungulate species for which this region was obtained. However, it is noteworthy that the 5' external ATA box of both sirenian species possessed an ATA ) GTA substitution

@ig. 1-5). This site is thought to be crucial for the transcription of eukaryotic genes by

RNA polymerase II (Dierks et ø1. 1983; Kosche et al. 1985} leading Satoh et al. (1999) to speculate that a similarly altered ATA box might diminish or prevent activity of the rat yl-globin gene. In contras! data collected for this study suggest that the mutated 5' GTA sequence does not adversely affect expression of the ôJocus in sirenians. Support for

38 this contention is provided by the observation that the conceptual polypeptide chain encoded by the ô-globin gene of the Florida manatee (Trichechus manatus) precisely

.l . matches the 'p-globin' chain of the closely related Amazoi manatee, T. ínunguis

(Kleinschmidt et al. 1986). In addition, putative protein chains translated from the õ- globin gene products of the African elephant, Asian elephant and rock hyrax corresponded to the respective 'p-globin' protein chain sequences of these species

(Braunitzer et al. 1982; Kleinschmidt and Braunitzer 1983; Braunitzer et al. l9S4).

Because the hemoglobin molecules of adult proboscideans (Kleihauer et al. 1965;

Braunitzer et ql. 1982; Braunitzer et al. 1984) and sirenians (White et al. 1976;

Kleinschmidt et al. 1986) examined to date possess a single component (i.e. are composed of only one type of 'a-like' and one type of 'p-like' chain), this implies that the pJocus is either non-functional or absent in these groups.

Interestingly, hemoglobin of the rock hyrax possesses two chromatographically distinct (i.e. HbI and tIbII) components (Kleinschmidt and Braunitzer 1983). However, amino acid analysis suggested that like the two 'a-like' chains, the sequences of the two

'pJike' chains of this species were identical, a finding the authors could not explain. The discovery ofa second highly conserved ôlike (ôH) gene in each ofthe three genera raises the possibility that this locus is expressed in hyraxes. Clearly, studies aimed at detecting

ôH mRNA transcripts in these species are required to resolve this issue.

Because so many ô-globin pseudogenes have been reported to date, the expression of a fully functional òlocus in paenungulate (and likely other afrotherian) mammals is intriguing, and raises the question: how did the ôlocus gain an upper hand over the largely dominant pJocus in this clade? Traditionally, inactivation or diminished

39 expression of the ölocus has been attributed to the absence of intragenic enhancers

(Antoniou et al. 1988) and unstable mRNA transcripts (Wood et al. 1978; Ross and ' -1-. Pizano 1983; Kosche et al. 1985; Steinberg and Adams l991r). - However, it is now primarily thought that an ineffrcient promoter region, typically lacking (or possessing mutated versions of) regulatory elements crucial for proper transcription has prevented this locus from attaining the transcriptional success of its predecessor (Vincent and

Wilson 1989; Koop et al. 1989; Tagle et al. 1991; Prychitko et al.20O5). Indeed, the region immediately upstream of the òloci generally contains mutations in two important structural motifs (the CCAAT and proximal CACCC boxes) upstream of the Cap site that interact with specific proteins integral for gene activation (Efstratiadis et al. 1980;

Grosveld et al. 1982; Dierks et al. 1983; Kosche et al. 1985; Myers et al. 1986). The ò- locus of galagos notably contains an int¿ct CACCC control sequence and is expressed at much higher levels (40% of erythrocytic'B-like' chains; Tagle et al. I99l) compared to other hominoids (2-18Yo;Boyer et al. l97l; Ross and Pizarro 1983; Tagle et al. 1988).

In addition, otherwise non-expressed ò-globin genes can be activated in both erythroid and non-erythroid cells by introduction of intact CCAAT and CACCC control sequences into the 5' flanking region of this locus @onze et al. 1996; Tang et al. 1997; Ristaldi er al. 1999). Normal expression of 'pJike' globins thus appears to largely reside in these motifs. Significantly, these important transcriptional control elements, plus a second distal CACCC box (a feature characteristic of functional 'p-like' globin genes; Grosveld et al. 1982; Dierks et al. 1983; Myers et al. 1986) (Fig. 1-5), were detected in the upstream region ofboth sirenian and probscidean ô-globin genes.

40 Hints into the mechanism and probable scenario of the evolutionary events leading to this never before documented attribute can be inferred from the p-clusters of other eutherian mammals (see Fig. l-1; Konkel et al. 1979; Sðnon ¿, at.1OU;Lingrel et al. 1983; Shapiro et al. T983; Hardison and Margot 1984; Townes et al. 1984; Schimenti and Duncan 1985b; Cooper et al. 1996; Satoh et al. 1999), which have demonstrated that the òJocus has been independently converted by the neighbouring p-locus in most species examined (see Fig. 1-1). However, with the exception of the galago (Tagle et al.

1988), these recombination events were insufficient to alter the promoter regions of the ò- locus. Dotplot analyses of the paenungulate ò-globin genes, which indicated their 5' transcriptional control regions are B-like, suggest their presumably capable promoters may have arose via conversion by the 5' region of the p-locus.

An alternate scenario is that the region downstream of exon 2 of the p-globin locus was converted by ô. This contention seems unlikely, however, since the amplified pJike globin gene of the bushveld elephant shrew (Etephantutus intufi) shows no evidence of conversion by a ò-like locus (Fig. l-9). Unforhrnately, the hemoglobin of E intufi has not been examined, though that of the East African elephant shrews,

Petrodromus sultani and chrysopygus> possess two distinct IIb components expressed in relatively equivalent proportions (Buettner-Janusch and Buettner-Janusch

1963). While this observation could signi$ the presence of two different 'pJike' protein chains in these species, the p-locus of E intufi possesses a mutated 5' CACCC transcriptional control motif () CACTC) and lacks the second distal 5' CACCC box common to transcribed 'Blike' genes (Grosveld et al. 1982, Dierks et al. 1983; Myers ø/ al. 1986; Fig. l-5), and is therefore probably not expressed. Together with dat¿ collected

41 from the paenungulates, this finding implies that, within afrotherians, up-regulation of the

ò-locus preceded silencing of p, and that both events early in the evolution of this basal

eutherian clade. Following gene duplication, the failure of t Oa."og to adopt a ne\ry function or expression pattern often leads to its nonfunctionalization (Ohno lgTO) Thus, once the equally apt ò-locus was co-expressed with its predecessor (B-globin) in stem afrotherians and structurally configured for the sarne physiological role (oxygen transport), natural selection was likely too weak to prevent silencing mutations from accumulating in one of these superfluous genes and randomly, the p-locus was inactivated.

The hemoglobin of adult nine-banded armadillos consists of one type of 'c-chain' and two types of electrophoretically distinct'p-chains' (de Jong et al. l98l). While the amino acid sequence of the minor 'B-globin' component is unknown, the major 'p- globin' chain differed from the translated protein sequence of the amplified armadillo p- globin gene at four residues (Fig. 1-3). However, because the amplified p gene shared

99.4% sequence identity with the 3'-most B-globin cluster gene (identified in the

Dasypus novemcinclzs B-globin cluster; Genbank accession number ACl5l5l8) which encoded a protein sequence 99.3% identical (145 of 146 residues) to the major'B-like' polypeptide sequence, these residue differences possibly reflect nucleotide misincorporations or, more likely, the occurrence of two alleles within the population.

Additionally- both sequences differ from the major "p-globin" chain at position 23

(histidine vs. cysteine), suggesting that the published amino acid sequence is incorrect at this site. Because the armadillo ô-globin locus appears non-functional, the gene encoding the minor 'B-like' globin component of this species is currently unknown, but speculated

42 to either be e-globin or represent a locus situated outside of the p-cluster (see Chapter II)

Unlike other eutherians, the rþò-locus of armadillos does nol to have undergone lpïar conversion by p-globin (Fig. 1-10). The only other eutherian ô-gtobin gene thought to

have remained unaltered by conversion is the ph2 pseudogene of mice (Phtllips et at.

1984; Hardies et al. 1984). However, these loci show little, if any, sequence homology at

the 5' flanking region (data not shown), possibly signifuing a high degree of sequence

differentiation or lack of orthology between the two loci.

The hemoglobin of three-toed sloths also possesses two chromatographically

distinct components (Kleinschmidt et al. 1989). While the analysis indicated the major

'p-chain' is encoded by the p-locus, the amplified rpô-globin gene possessed premature

termination signals within the coding region and appeared to have undergone conversion

of IVSI and exon 2by the pJocus. This recombination event was evident from the high

sequence identity shared by these two gene regions (96.4% and, 97.3Yo, respectively),

compared to 79.3o/o for exon 1 and 85.3% for exon 3. It seems unlikely that the

frameshift mutation in exon 2ledto the silencing of this gene since the rþò-globin gene of

armadillos appears to have been silenced in the absence of conversion events. Hence, the

òlocus was likely already transcriptionally silent prior to the divergence of these two

species.

Paenungulate inter-relationships have been extremely diffrcult to resolve using

molecular data (Amrine and Springer 1999; Waddell and Shelley 2lI3;Nishihara e/ a/.

2005). The present analysis of ò-globin IVS2 sequences is the fïrst to incorporate

molecular data from all T extant genera to address this question. The ML tree (Fig. 1-la) placed elephants and sirenians as sister taxa to the exclusion of hyraxes; however, this

43 clade was resolved with only 56% bootstrap support. While providing support for the

traditional 'Tethytheria hypothesis' (McKenna 1975), the meager support value adds to a

growing body of literature suggesting that the three orders diiverged o*, u short time

interval (Murata et al. 2003; Waddell and Shelley 2003; Nishihara et al. 2005). Finally,

the treg hyrax was placed as the most basal hydracoidean (Fig. l-14), a result concordant

with the conclusion of Prinsloo (1993) based on restriction fragment length

polymorphism (RFLP) and phylogenetic analyses of the mitochondrial genes,

cytochrome b and 12S rRNA.

In conclusion, data from this study reveal that the F -* ò duplication event within

the mammalian p-globin cluster occurred prior to the eutherian radiation. The dat¿

additionally provide evidence that xenarthrans, like euarchontoglire and laurasiatherian

mammals, express the p-globin gene while suppressing the öJocus. However, it appears

that the ô-locus v/as converted in its 5' promoter region by F, then uniquely gained a

functional advantage over the normally dominant 3' locus early in the evolution of

afrotherian mammals. The subsequent silencing of the pJocus in this clade thus

challenges the widely accepted view that this locus is indispensable and has remained unaltered throughout eutherian evolution Qlardison and Margo t 1984;Tagle et al. l99L;

Prychitko et at.2005).

44 CHAPTER tr

MOLECULAR EVOLUTION OF THE EUTHERIAN

P-GLOBIN CLUSTER INFERRED FROM

AFROTHERIAN AND XENARTHRAN GENE SEQUENCES.

45 INTRODUCTION

The respiratory pigment hemoglobin, which binds in the lungs and .olV-q"tt delivers it to the tissues, consists of four heme groups bound ã fou. globin polypeptide strands: two 'oJike' and two 'p-like' chains, comprising 141 and 146 amino acid residues respectively (Weber and Wells 1989; Poyart et al. 1992; Weber 1995).

Phylogenetic studies based on nucleotide and protein sequence comparisons have suggested that the progenitor genes encoding the a- and p-globin components arose from the duplication of a single ancestral vertebrate globin gene roughly 450 mya (Czelusniak et al. 1982; Goodman et al. 1987). The proto-p-globin gene is thought to have duplicated prior to the radiation of mammals (-200 mya), producing a two-gene (proto-e and proto- p) cluster @fstratiadis et al. 1980, Goodman 1981; Czelusniak e/ at. 1982). Comparative analyses examining the p-globin clusters from four of l8 extant placental orders (i.e. the orders Rodentia, Lagomorpha, Artiodactyla, Primates) further suggested these paralogs underwent additional duplication events early in the evolution of eutherians, with the last common placental ancestor likely possessing either a four- (Hardison 1983; Hardison

1984) or five-gene cluster (F{ardies et al. 1984; Harris et al. 7984. Goodman et al. 1984;

Hardison and Gelinas 1986; Tagle et al. 1988) in the following orientation: 5'-e-y-(r¡)-ô- p-3'. The 'e-like' globin genes (e, 1 and r¡) are thought to have arisen from the proto-e- globin locus, while ô and p are descendents of the proto-p-globin locus (Goodman et al.

1987; Koop and Goodman 1988; Cooper and Hope 1993; Cooper et al. 1996).

Chromosome mapping studies have further revealed that these genes are typically arrayed

5' to 3' on the chromosome in the order which they are expressed (Tuan et al. 1985;

Forrester et al. 1987). Hence, expression of the 'e-like' genes are generally limited to

46 embryonic erythrocytes (Hardies et al. 1984; Hardison 1984; Hill et at. l9B4), with y- locus expression extending to the fet¿l red blood cells only wiithin anthropoid primates

(Fitch et al. 1991).

Compared to post-natal stages of eutherian development, embryonic and fetal growth takes place in a relatively static gestational environment and consequently, the genes encoding post-natal hemoglobin chains are generally less conserved than those comprising pre-natal hemoglobins (Shapiro et al. 1983; Hardies et al. 1984;Harns et al.

1986; Satoh et al. 1999; Koop and Goodman 1988) Oddly enough, comparative studies have suggested the non-coding regions of 'e-like' globin genes diverge more slowly than those of their adult-expressed counterparts as well (Shapiro et al. 1983; Hardison 1984;

Hardies et al. 1984; Gebel et al. 1985; Slightom et at. 1985; Koop et al. t9ï9b). While the non-coding evolution rates of p-globin cluster genes are comparable to the neufal mutation rate of pseudogenes (4 - 5 xl0-e substitutions/site/; Hayashida and Miyata

1983; Kimura L983;Li et al. 1985) for members of Laurasiatheria and Euarchontoglires, they are markedly low in primates (^2..9 xl}-e substitutionslsitelyear; Giebel et at. l9B5;

Harris et al. 1986; Koop et al. 1989b). This latter observation, known as the 'hominid slowdown' (Li et al. 1987; Hasegawa et at. 7989; Bailey et at. L997), has been attributed to the longer generation time of primates relative to other eutherians (Gebel et al. l9B5;

Wu and Li 1985).

As noted earlier, it is generally recognized that the progenitor of placental mammals possessed either a four- or five-gene p-globin cluster. However, comprehensive molecular phylogenetic studies (Madsen et al. 2007; Murphy et al.

200la,b; Amrine-Madsen et al. 2003) suggest this hypothesis is based on data from only

47 the two most derived (Euarchontoglires and Laurasiatheria) of the four major placental clades. It is therefore unknown whether the four- or five-gene cluster arose before or after the divergence of the superorders Afrotheria and Xenartnø=(Èig l-Ð Accordingly, the primary goal of this study was to obtain sequence data on the'e-like' globin genes from members of the superorders Xenarthra and Afrotheria and, together with nucleotide data obtained from their 'BJike' globin genes (see Chapter I), derive the composition of the ancestral p-globin cluster of stem eutherians. Furthermorg because nothing is known regarding the rates of evolution of the ' e-like' and 'pJike' globin genes of xenarthran and afrotherian mammals, the second goal was to compare the rates of DNA evolution for the coding and non-coding domains of the globin genes found within their p-clusters.

Finally, the above noted mutation rates were contrasted with those of other eutherian B- globin cluster genes in order to gain insight into the postulated link between generation time and the rate of 'neutral' evolution.

48 MATERIALS AND METHODS

Data collection

PCR, cloning and sequencing protocols were identical to those outlined in

Chapter I, with the exception that PCR primers used for initial amplifïcation reactions were designed from a 411 bp 'e-like' globin genefragment sequenced from both African and Asian elephants (Singh 2002). In additior¡ 128,121 bp of consensus sequence containing the p-globin gene cluster of the nine-banded armadillo (clone VMRC5-69F10;

NISC Comparative Vertebrate Sequencing Project), together with the complete e-, y- and r¡-globin genes of human, rabbig mouse and goat (Table 2-1) were downloaded from

GenBank and included in the analyses.

Data analyses

Consensus gene alignments were constructed from each species using

Sequencherru (Version 4.2.2). Overlapping gene fragments incorporated between 4 and,

7 cloned fragments amplified from 3 to 5 individual PCRs, respectively. Dotmatcher

(EMBOSS) was used to determine homology between the sequenced 'elike' genes and those identified from the armadillo p-cluster (see above; Table 2-1) with the e-, y- and q- globin genes of human, rabbiq goat and mouse (Table 2-1). Additionatly, MatGAT

(Version 2 02) software was employed to calculate the percent identity between the non- coding regions (tVS, 5' and 3' external) of the above noted genes. To assess promoter regions, the 5' external sequences of sequenced genes were aligned with the 'e-like' globin genes of armadillo, human and goat (Table 2-1).

49 Table 2-1. Genes included in,the sequence analyses of :eJike'globin genes with corresponding GenBank accession numbers.

Eutherian Globin Accession Species Common Name Reference superorder gene No.

Afrotheria Loxodonta africana African elephant DQ09l2ls This study

Elephas maximus Asian elephant ¡ DQ0el2l6 This shrdy

West Indian Trichechus manatus DQ09r2t7 This study manatee

Dugong dugon dugong DQ091218 This study

bushveld Elephantulus intuf DQ091219 This shrdy elephant shrew

pale-throated Xenarth¡a Bradypus tridactylus e DQ091220 This study three-toed sloth

Dasypus nine-banded Green 2004, t, tÞT, qrq ACls15lS novemcinctus armadillo unpublished

Efstratiadis ¿l Euarchotoglires Homo sapiens human s, q,lï u0l3l7 t', a\.1980

Shehee et al. Mus musculus mouse y, ph1 xt406l 1 989

Oryctolagus Margot et al. rabbit Ê'T MI8818 cuniculus 1989

€, et al. Tørsius syrichta ta¡sier Y, M8141l Koop Vn l\/133973 1989a

eI Shapiro er a/. Laurasiatheria Capra hircus goat x019t2 eII x01913 1 983

50 The coding regions of sequenced and representative eutherian 'e-like' and 'pJike'

globin genes devoid of frameshift mut¿tions (Table 2-5) were analyzed at synonymous

(silent) and nonsynonymous (replacement) sites. the percentage of synonymous and

nonsynonymous substitutions between the sequenced genes and their human ortholog,

was calculated using the Nei and Gojobori (1986, equations 1-3) method with DnaSP

software (Version 4.0). Corrections for superimposed (e.g. A--T-C) and back

mutations (e.g. A--*T*A) occurring at the same site were determined according to the method of Jukes and Cantor (1969). Mutation rates were calculated by dividing the percent substitution values by the estimated time of divergence between each of the compared species (with afrotherians, xenarthrans, laurasiatherians, atriodactylans and prosimian primates splitting from the main eutherian tree at 107, IO2, 87,85 and 77 mya, respectively; Springer et al. 2003).

The evolution rates of non-coding DNA were calculated for eutherian 'e-like' and

'p-like' globin genes by first subtracting the percent similarity between the IVS2 regions of the sequenced gene and its human ortholog (taken from the MatGAT analysis of IVS2 sequences, see Tables 1-4 and 2-3) from 100. The resulting value (representing % divergence) was divided by the estimated time of divergence between the sequenced and compared (human) species to infer the non-coding mutation rate.

5l RESTILTS

A single complete 'e-likel globin gene was amplified.from_the African elephant,

Asian elephant, dugong, manatee and sloth, while a partial'e-like' gene product was amplified from the elephant shrew (Table 2-2). The genes were classified as e-globin or y-globin based on homology with other eutherian 'e-like' globin genes (see below).

Classifcation of y-globin genes

Dotplot analyses of the Asian (Fig 2-l) and African (data not shown) elephant 'e- like' genes (which were 98.7Yo similar from initiation to termination codons) suggested they were homologous with eutherian y-globin genes in the IVS2 regions. Moreover,

MatGAT (Tables 2-3, 2-4) analyses recovered highest sequence similarity with other placental y-globin genes in all non-coding regions. Within the Asian elephant's 3' gene flanking region, two putative poly-adenylation signals (AATAAA) were found at +195 bp and +213 bp from the termination codon (Appendix 2-5), -T40 bp downstream from the site typical of functional eutherian y-globin genes (i.e. +62 bp from the stop codon;

Efstradiatis et al. 1980; Margot et al. 1989; Shehee et al. 1989). The absence of frameshift and splice junction mutations suggested both elephant 1-globin genes are capable of transcribing polypeptides.

Doþlot analyses of the dugong (Fig.2-2) and manatee (data not shown) 'e-like' globin gene products (which were 95.9Yo identical from initiation to termination codons) revealed homology with the y-globin genes of other eutherians in the IVS2 region.

Similarly, MatGAT (Tables 2-3,2-4) analyses of the dugong and manatee'e-like'genes recovered highest sequence similarity with placental 1-globin genes in all non-coding

52 --- Table 2-2. List of 'elike' globin gene products amplified in this study.

Number of base Täxon Gene identity Amplified gene region pairs sequenced

African elephant r,455 complete

Asian elephant 1,798 complete

dugong 2.041 complete

manatee 1,578 complete

elephant shrew 1,r29 partial

sloth 2,122 complete

53 Table 2-3 Percent identity ' matrix comparing the IVS I and IVS2 external sequences of 'e-like' globin genes sequenced in this study with those human, of rabbit, mouse and goat. Gap opening and extension penalties were set to l6 and 4, respeotively. Numbers in gray represent IVS I comparisons while values in black denote [VS2 comparisons.

1(e) 2( e 4G s(e õ{ a 7(r 8tu th IO lv\ 11 (v.) 12ht:v 73 tuA 74tu 150 16û¡n l'/(rn lR l'm\ l. elephant shrew e ss.2 56.3 5lß <{n d8.ß A1 \ ffi 48;, lnî 48.3 ¿sn 47.4 47.t 47.6 tnl AqK 48.3 48.( 2. sloth e 75.5 K.2 4SÃ W 61.7 51.J 47.8 47.7 49.7 44.1 50.1 ¿s_fl 4e ¿, 51.t ,18,5 ¿ßs 48.6 l. armadillo e 8{',t.2 57.7 K3R a1 1l 50.4 49.7 49.s 49.S sr;1 50.i 19. ¿ 49.2 51.J 50.8 49.6 50.4 4. human e '70.1 6t-.i 60.4 4R.6 a1 . 48.1 49-O 49.5 49.r 44.3 47.a 47,3 s0.4 {t.5 48.7 48.0 5. rabbit e 7't.8 6-1.,( 72 I 52.7 50.7 dRR ffiå 51.6 51.2 49.9 49.1 âR./l 47.9 51.1 49.4 A9A 49.1 (e-like) 6. mouse v +9.(t t7 .1 5{J.J (/ì -16.: 42.5 &.3 47.3 47.3 46.8 {t.î 46.6 47.f asJ 47.4 47.8 47.5 À 7. eoat el (t(t,4 6l.J 62.1 6'2 46.5 47 ,2 48.3 48.2 47.6 {t,1 47.( 48.( 45.8 47.1 46.3 8. Aûican eleohant v 65.ú rí(1. {, 59.2 59.2 J6..ì {.4 I 98.( u.2 85.0 63.5 59.9 2 (t2.1 :ì.2 .9 6{¡.1 {;6.1 1ì I t1 t 47.1 44.2 I ,49,0 48.1 q7 15. mouse bh1 (,y-like) I s().J f{,.11 n(,.2 '| l:.Ì.-.r ¿\),( -lft,3 i2.3 5i.5 J3 ,{, S{ .( 52.( 49.6 49.4 49.0 16. armadillo rlm ìlr Ìf¡ i L) +1, ì. 11.' J1.-i 19.ft -1fi.{i il)0.0 +5.3 +3.J :i.t 57.0 c¿q 17. huma¡l r¡n i1 il 51.9 5{) .t iq I !! 5S.1 59. -l Sri.{i -.t6.9 52.t: 5{).f 56.0 1 18. soat eII ft1-like ) :-iJ 56.9 lç -: í-{.: -x,1;_! {7 3 1.-ì 56.J 5J_{} 5f () 11" t! I Table 2-4' Percent identity matrix comparing the 5' and 3' extemal sequences of 'elike'gtobin genes amplified in this study with those of human, rabbit, mouse and goat. Gap opening and extension penalties were set to 16 and.4,respectively. Numbers in gray represent 3' comparisons while values in black denote represent 5' comparisons.

,7 1 (e) Ê 3 (e) (e 2t 4 s( t o( € (e 8 (v) 9 lvl 10 (rpv) 11 (yA LZ (y\t 13tu 14(rUrl) i5(Un L6 h 1. elephant sh¡ew e ffi 2. sloth s 5J 80.6 1It 7 74.8 69.3 13.9 61.8 44.7 54.i ¡,t.t s7.3 59.0 61.3 3. armadillo e 52.6 71.3 73.4 70.9 72.4 73.4 64,0 47.2 14Ã 5R.fl 57.4 57.5 63.4 4. human e 59.8 tf). I 66.1) 74.3 77.8 73.9 c7 1 43.2 49.1 595 53.9 5d.s ß). /t 5. rabbit e qf () q() () (t(i 58,9 7l.l 1?7 60.J ¿5 I <(n 60,6 s7.ß (R( (/¡l 63.3 6. mouse v fe-like) 53.2 51.5 s{}_l it', LA -t().() 65.8 tl6-6 54.4 c?q {K1 <1 I 67 I 7 soat ¡ 50.( 61.0 á3.2 {;'.t.6 :i5. I :¡]. 45.1 45.6 s9.0 60.J 59.s 60.0 64.4 8. Asian elephant v 19.1 J.7.t t'7.'1 á'tt.7 f-i.2 .J3.9 J0.3

9. dusone v -17.3 49.7 19.6 ¿8.2 lta .l .ll i.í- t.:'l 45.1 61.6 58.4 r<1 q Ë1 Â 60.5 10. armadillo ruv 36.5 JJ.iI .1,!:,.ß ¿:\.4 5i.0 5¿.3 ¿Á*, 47.6 47.{ 48.3 11. h uman vA 56.¡i ,16.3 Ji.l¡ -t5.-¿ ri.i r;.-i Jif ,U 52.6 :(r.6 69.3 59.t 58.1 s7.7 t2 rabbit v 50.1) J6.¡ 46. -{5 il r ¡l (t Jl¡.-.1 ,¡ì -ìti 51.7 62.1 59.6 60.1 57.s

I3. motrse hh l r'vJilcel 5{r.6 5J.1 .-i5.1 .q6.fl :-:{. -1 i{ì I f T I 5¡.i ¿5.9 46.'l) 15.2 ,51.8 60.7 14. armadillo rtrtr 43.8 -f5.1 -15.6 1{) l .tiJ. ì J{1..ì tÉ fì {) J6.3 "tJ.{ -18.1 Jfq.5 l,''r_- 15. human r¡n ,43.1 -t9.9 5{). .ts.l r'ì ì :-1 46.1 {-r.9 I8.7 i] 62.2 16. soat eII ln-like) 52 49.(_t 1t.t 5ti.{ -il; J-l.ó t5 13.9 -18.t JJ :{1. i t..i 5a.3 Fig. 2-1. Dotplot comparisons (window size : 100; threshold : 80) of the putative

Asian elephant y-globin gene (Y-axis) with the s-, y- and r¡-globin genes of humans, rabbits, mice and goats (X-axis). Sequences run from the 5' to 3' direction. Exons l, 2 and 3 for each gene are denoted by black boxes.

56 Asian elephant y-globin gene comparisons

e{ype genes T-type genes q-type genes

I 000 l5m

human e human o1 human yr1

Pc o -c 1000 o- U u rabbit e rabbit 1 c 'ñfo

500 l0o0

mouse y mouse ph1

goat eI goat ell Fig. 2-2. Dotplot comparisons (window size : 100; threshold : 80) of the putative dugong y-globin gene (Y-axis) with the e-, y- and r¡-globin genes of humans, rabbits, mice and goats (X-axis). Sequences run from the 5' to 3' direction. Exons 7,2 and 3 for each gene are denoted by black boxes.

57 Dugong lglobín gene comparisons

e-type genes T-type genes q-type genes

,,/

human e human Ay human r¡n1

,,,/

ny-0 Otc o rabbit e rabbit y o) :l !

t/ t,/- 0l 0 500 1000 1500 mouse y mouse Bh1

goat eI goat eII regions. The dugong y-globin gene did not possess the 5' external CACCC transcriptional control motif typically located between -145 and -112 bp upstream from the eutherian y-globin gene Cap site @fstrad iatts et al. 1980;Margot et al. lg89; Shehee et al. 1989), though a CACCG sequence was detected -120 bp upstream of the Cap (Fig

2-3, Appendix 2-4). Two putative poly-adenylation signals were also found in the dugong gene's 3' gene flanking region - the first -20 bp before the usual site (see above) and the second at the appropriate position (+62bp from the termination codon; Appendix

2-5). Finally, a GT --- AT splice junction mutation was found at the exon 2/IVS2 boundary of both sirenian y-globin genes. Nonetheless, the absence of frameshift mutations indicate the two genes are still capable of transcribing polypeptides.

Classification of e-globin genes

Dotplot (Fig. 2-q and MatGAT (Tables 2-3,2-4) analyses indicated the elephant shrew gene fragment, which ran from codon 60 in exon 2 to +70 bp downstream of the stop codon, shared highest sequence similarity with eutherian e-globins in the IVS2 and

3' flanking region. The sloth gene product was also classified as e-globin based on sequence homology with eutherian e-globin genes in the IVS2 and gene flanking regions

(Fig. 2-5, Tables 2-3, 2-4). This xenarthran gene possessed all the external 5' and 3' transcriptional control motifs characteristic of functional 'e-like' globin genes @ig. 2-3,

Appendices 2-4 and 2-5).

Examination of the Dasypus novemcincløs p-globin cluster revealed the presence of three 'e-like' globin genes. The 5'-most gene was homologous with other eutherian e- globin genes (Fig. 2-6, Tables 2-3,24) and was found to possess all the control elements

58 : _ _-_

Fig. 2-3. The 5' flanking sequence data and features of 'e-like' globin genes amplified in this shrdy, plus those of the armadillo e-, Qy- and rþq-globin genes, the human r-, yA- and rþr¡-globin genes, and the goat r¡-globin gene. Data runs from seven bases upstream of the "CACCC" site to the initiation ("ATG') codon. The putative cap site ("CAP") is underlined and lies 52 bp (53 bp in the sloth) upstream of the initiation codon.

Sequences considered important for transcriptional regulation (CACCC, CCAAT, and

ATA) are denoted by bold CAPITAL letters. Nucleotides surrounding the ATA motif correspond to a consensus sequence ("AAGAATAAAAG') found in many pre-natally expressed p-globin genes (Flardison 1983) and are shaded in gray. The consensus sequence "CTTPyTG' @aralle and Brownlee 1978), found seven bases downstream of the cap site, is also marked in gray. Colons (:) denote base deletions. Underlined bases represent those which deviate from the consensus sequence normally recovered in p- globin genes, but should not be considered significant unless they compose a sequence element involved in tanscriptional'regulation. Two large inserts were omitted from the gene sequences of human yo-globin 15'- ctggagfcacaacccacccltgaccaat-3' containing a second CCAAT box, positioned immediately after the CCATT sequence) and lrn- globin (5'-actgggagaggçaaagggctgggggcca gagagg-3',located immediately before the

"ATA" consensus sequence).

59 "cÀccc" "ccAAl" +L20 +i.1_0 +L00 +90 +80 +70 +60 +50 +40 sloth e ccagctcCACCCctgagggacacagctcagccctgaCCAAT9actgtg : aagtaccaggggaacaaggggcc : : ! : agaggtacacagtg armadillo e ctagctcCACCCCtgaaggacacagctcagccttgaCCAATeactgca: aagtaacagtgtaacaagaggcc : : : : agaatgÈccacagt, human e ctgactcCÀCCCctgagg: acacaggtcagcctfgaCCAÀlgactttt I aagtaccatggagaacagggggcca: : gaacttcggcagta dugong y ccgcccccccgccccggcctgatagcctagcgttgaCCAATagcctcatag'caaaaggaagaacaaaggg"., r r ragtggccagggata manatee y ICATAGCA.AAAGGÀAGAACAAAGGGCC : ! ! : AGTGGCCAGGGATÀ armadillo qy atccttccgT'ilCgtgaactccacaacttagctttgaccaatagccttt ! ! : : : : : : : : : : tggaaagaggcc ! ! ! : aggggacag : aaat humanAy caaacccCACCCatgggttggccagccttgccttgaCCAÀTagattcatttcactgagggaggcaaagggctggtcaatagattcatttc humanqrr¡ caagttcCACCCCtggcagtgaccacctagctttgaCCÀÀTagtctcâttttaftgggggaaggaagggcct:!!!!ggggcagcagatg goat n taaactccÀCcc: : agcctfg'acaaggcaaacttgaCCÀATagtcttagagtatccagtg aggccêqgggccggcggctggctagggatg

'ATÀ' "CÀP" "ATG' +30 +20 +10 0 -10 -20 -30 -40 -50 sloth e aagaATAaaaagccaca: tcttctagaagcagcacAgctct,gci1:ctgacacgtctgtgatcaccagccagcabttagatcctacatcatg armadillo e aagaÀIAaaaggccaca: ccttftagaagtagcacAgacttgc LtctEacaccactgtatcaccagcaag ! ctctctgaatcttcatcatg human e aaqiaATAaaaggccagacagagaggca: gcagcac4tatctgci-tccgacacagctgcaatcactagcaagctctcaggcctggcatcatg dugong y aagaATÀaaaagccaca: tgttccagttgcagcacÀtacatccttctgacâcatctgLgatcaccacaaa: ctccaagac.gg"..çc{tg manatee Y aagaÀTAaaaagccaca ! tgttccagttgcagcacðtacatcc aLctgacacatctgtgatcaccacaaa: ctccaagaccggacacqatg armadillo t¡ry gaagAÎAaaaggccaca:aatttcagtagcagt,acAaatctgcttctgaca::: !::: ! i 3 !::::: ! ! !:: ¡: ¡:: !:::: ! !:::tatg humanAy aagaÀlAaaaqgaagcacccttcagca ! gttccacAcactcgcttctggaacAtctgaggttatcaataagctcctagtccagacgccatg

I humanr¡r1 agaaGTAaaaagccaca: catgaagca: gcaatgcAggcatgc Ltctggctcatctgtgatcaccaggaaactcccagatctgacactatg goat n ar¡gaAlÀaaa.Egccatg : cagtgaagcagcggcacAgacttg.j'b1:Çtggcccattatggatcaccagtaagctcccaqa 3 : ! ! : caccatg Fig. 2-4. Dotplot comparisons (window size : 100; threshold : 80) of the putative elephant shrew e-globin gene (y-axis) with the e-, y- and r¡-globin genes of humans, rabbits, mice and goats (X-axis). Sequences run from the 5' to 3' direction. Exons 2

þartial sequence) and 3 for each gene are denoted by black boxes.

60 Elephant shrew e-globin partial gene comparisons

e-type genes T-type genes Tì-type genes

500 500 human e human A1 human r¡rq

c¡) B o L 500 -c r¡ rabbit e rabbity Pc ro -ç. o_ o q)

0 500 1000 mouse y mouse ph1

500 500 goat eI goat eII Fig.2-5. Dotplot comparisons (window size: 100; threshold: 80) of the putative sloth e-globin gene (Y-axis) with the e-, y- and r¡-globin genes of humans, rabbits, mice and goats (X-a*iÐ Sequences run from the 5'to 3'direction. Exons 1,2 and 3 for each gene are denoted by blackboxes.

61 Sloth e-globin gene comparisons

e-type genes T-type genes tl .]p. genes

human e human oy human r¡n¡

1000 r (¡) -c rabbit e Po

mouse Bh1

goat el goat eII Fig. 2-6. Dotplot comparisons (window size : 100; threshold : 80) of the putative armadillo e-globin gene (Y-axis) with the e-, y- and ï-globin genes of humans, rabbits, mice and goats (X-axis). Sequences run from the 5' to 3' direction. Exons 1,2 and 3 for each gene are denoted by black boxes.

62 Armadillo e-globin gene comparisons

e-type genes T-type genes n-type genes

,/

human e human o1 human r¡n¡

q) s rabbit e rabbity õrt' E ro

mouse y mouse Bh1

goat eI goat eII necessary for transcription (Fig. 2-3, Appendices 24,2-5)- A yJike gene (Fig. 2-7,

Tables 2-2, 2-3), was located 16,667 bp downstream of the e-globin stop codon, the intergenic dist¿nce representing the number of nucleotides Ue¡reèn-termination codon of the e-locus and the initiation codon of the y-locus. This yJike globin gene contained frameshift mutations in both exons I and 2 causing premature termination signals in each coding block. Examination of the 5' gene flanking region of this gene also revealed a

CACCC box mutation (--+ CCTTC) and a 36-bp deletion immediately upstream of the initiation codon (Fig. 2-3). Additionally, the 3' end contained a putative poly- adenylation signal +32 bp from the stop codon, roughly 30 bp before the usual site of functional y-globin genes (Appendix 2'5). Ã2,163-bp 'q-like' gene fragment was found

1,753 bp downstream from the putative rÞy-globin termination codon, the intergenic distance corresponding to the number of bases between the rþy-gene's final stop signal and point where homolory began with other eutherian r¡-globin genes. This fragment was found to share highest sequence homology within only the IVS2, exon 3 and 3' flanking regions of the human rþr¡- and goat r¡-globin genes (Fig. 2-8, Tables 2-3,2-4).

Examination of mutation rates

Nonsynonymous replacement rates amongst the eutherian p-globin cluster genes were typically found to be lowest for the 'e-like' loci and highest for the 'pJike' loci within each species (Table 2-5). Compared to the average nonsynonymous rate of mammalian p-globin cluster genes (1.0 x lOe substitutions/siteþear; Efstratiadis et aI.

1980), notably high replacement rates were observed for the lJoci of rabbit and mouse.

Replacement rates for 'BJike' globin genes of all examined species were similar to the

63 Fig.2-7. Dotplot comparisons (window size:100; threshold:80) of the putative armadillo rfy-globin gene (Y-axis) with the e-, y- and n-globin genes of humans, rabbits, mice and goats (X-axis). Sequences nrn from the 5' to 3' direction. Exons 1, 2 and 3 for each gene are denoted by black boxes.

64 Armadillo ryyglobin gene comparisons

e-type genes T-type genes I-!!Re genes

human e human a.y human yr¡

0 500 1000 1500 2000 o rabbÍt y ! rO E L o

0 500 1000 i500 2000 y mouse mouse Bhl

goat eI goat eII Fig. 2-8. Dotplot comparisons (window size : 100; threshold : 80) of the putative

armadillo rþr¡-globin gene (Y-axis) with the e-, y- and r¡-globin genes of humans, rabbits, mice, and goats (X-axis). Sequences run from the 5' to 3' direction. Exons 1, 2

and 3 for each gene are denoted by blackboxes.

65 Armadillo y¡-globin partial gene comparisons

e-type genes T-type genes l-type genes

-0 500 r obo 15'oo 0 500 ¡000 1500 0 500 I 000 1 500 human e human AT human yr1

0 500 1000 1500 0 500 1000 1500 o rabbít e rabbit y E ro E oL

0 500 1000 0 500 1000 1500 mouse y mouse ph1

-o 500 looo 0 500 1000 goat eI goat r¡ Table 2'5 Percent of synonymous (silent) and nonsynonymous (replacement) ' substitutions between the 'e -like' and 'B-like' globin genes amplified in this study (plus those of armadillo, rabbit, mouse, goat, and tarsier) and their ortholog in humans. Silent and non-silent evolution rates are also shown.

%o synonymous Taxon Silent mutation râte %o nonsynonymous Non-silent mutation rate subsitutions ldsl (no. silent mutations/site/vr) subsitutions ldNl (no. non-silent mutati ons/si1e/vr\ dN/dS

Africar/Asian elephant .y 43.37 4.05 x 10 -9 8.07 7.54 x l0 -70 0.186 African/Asian elephaat ô 31.40 | 3.50 x 10 -9 14.71 7.37 x 10 -9 I 0.393

dugong y 41.39 3.87 x i0 -9 I0.93 1.02 x 10 -9 0.264 dugong ò 38.25 3.57 x 10 -9 ls.34 1.43 x 10 -9 0.401

manatee y 37.47 3.50 x 10 -9 10.75 1.00 x 10 -9 0.287 manatee ò 36.31 3.39 x 10 -9 72.58 1.18 x 10 -9 0.346

rock hyrax ôH 56.89 5.32 x 10 -9 19.73 1.84 x 10 -9 0.347 yellow-spotted hyrax ðE 57.77 5.40 x 10 -9 18.88 1.76 x 10 -9 0.327 tree hyrax òE 55.56 5,19 x i0 -9 19.29 1.80 x l0 -9 0.347

slolh e 49.83 4.89 10 -9 o\ x 7.70 7.55 x 10 -10 0.1 55 o\ sloth p 51..34 5,03 x 10 -9 t5.32 1.50 x 10 -9 0.298

armadillo e 60.82 5.96 x 10 -9 7.01 6.87 x 10 -10 0.1 15 p armadillo 37.80 3.71 -9 x 10 14.77 1.45 x 10 -9 0.391

rabbit e 55.14 6.34 -9 x 10 7.98 9.17 x 10 -10 0.145 rabbit y {5 70 6.41 -9 x 10 13.14 1.51 x 10.6 0.236 rabbit p1 34.39 3.95 x -9 l0 s.35 6.15 x 10 -10 0.1 56

mouse y (e-like) 57.79 6.64 I x 10 -9 9.16 1.05 x 10 -9 0.159 mouse Y 66.93 7.69 x I0 -9 14.63 1.68 x 10 -6 0.279 ¡ mouse B2 48.4t 5.56 -9 x 10 13.28 1.53 x 10 -9 0-274

goat el 42.t8 4.96 x 10-9 5.rI 6.01 x 10-10 0.t21 goat pA 34.t0 4.01 x 10-9 11.94 1.40 x 10-9 0.350

Îa¡sier e 33.79 4.39 x l0-9 5.25 6.82 x 10-10 0.155 tarsier y 40.73 5.30 x 10-9 12.t4 1.58 x 10-9 0.298 t¿¡sier ô J1 1A 3.54 x 10-9 7.46 9.69 x 10-10 0.273 tarsier p 40.91 5.31 x 10-9 4.69 6.09 x 10-10 0.1 15 average replacement rate (1.0 x 10-e substitutions/site/year) in each species except rabbit and tarsier, whose 'p-like' loci showed markedly low rates (Table 2-5). For each p- globin cluster gene examined, the rate of synonymous (silenQlevolution was found to be higher than the nonsynonymous mutation rate (Table 2-5).

With few exceptions, non-coding evolution rates were lowest for the 'e-like' loci and highest for the 'p-like' loci (Table 2-6). These 'neutral' rates were also found to be relatively similar between the orthologs of different species (Table 2-6). The non-coding rates of examined genes were comparable to the non-coding mutation rates of other eutherian c¿- and B-globin cluster genes (4 - 5 x 10-e substitutions/siteþear; Li et al.

1981; Hayashida and Miyata 1983; Kimura 1983) (Table 2-6).

67 Table 2-6. Percent of nucleotide substitutions between the IVS2 sequenc€s of 'e-like' and pJike'globin genes amplified in this study (plus those of armadillo, rabbit, mouse, goat and tarsier) and their ortholog in humans. The correspondingno.n.eodingpNA mutation rates a-re also shown.

o% Taxon Non-coding DNA Non-coding DNA (lVS2) substitrfiors substitrfion rate

A-ùica¡¡,/Asian elephant y 40.1 3.75 x lO -9 Aûi ca¡r,/Asian elephant ô 45.1 4.21x lO -9

dugong'y 39.8 3.72 x lO -9 dugong ô 43.2 4.04 x l0 -9

manatee y 39.9 3-73 x l0 -9 manatee ò 42.4 3.96 x 10 -9

rock hyrax ð 47.6 4.45 x l0 -9 rock h¡nax ôE 47.4 4.43 x l0 -9 yellow-spotted hyrari ôtr 46.8 4.37 x l0 -9 tree hyrax ôE 46.8 4.37 x l0 -9

sloth e 43.8 4.29 x l0 -9 sloth çô 44.5 4.36 x 10 -9 sloth þ 53.4 5.24 x lO -9

armadillo e 42.3 4.15 x l0-9 armadillo 1 41.5 4.07 x l0-9 armadillo Qr¡ 43,0 4.22xlO-9 armadillo rpô 46.9 4.60 x 10 -9 armadillo þ 5t.4 5.04 x 10 -9

¡abbit e 39.6 4.55 x l0-9 rabbil y 42.2 4.85 x l0-9 rabbit Bl 52.8 6.07 x l0-9

mouse y 51.4 5.91 x l0-9

mouse Y 52.2 6.00 x 10-9 mouse p2 52.7 6.06 x 10-9

goat eI 42.1 4.95 x l0-9 goat eII (r¡) 44.0 5.18 x 10-9 goat p" 49.8 5.86 x lG9

t¿¡sier e 35.2 4-57 x lO-9 tarsier.y 46.4 6.03 x l0-9 t¿rsier {rl 29.6 3.84 x l0-9 ta¡sier ð 4t.9 5.44 x l0-9 tarsier p 29.8 3.87x l0-9

68 DISCUSSION

The discovery of e- and y loci (this study) together withlîd B-loci (see Chapter

I) within members of the basal eutherian clade Afrotheria (Fig. 2-9) provides compelling

support that at least a four-gene cluster pre-dates the divergence of this group from the

placental mammal tree (-107 mya; Springer et a|.2003). In addition, the présence of five

globin gene loci (e,.Þy, rlrq, rlrò, p) within the p-globin cluster of the nine-banded

armadillo (Fig.2-9) further suggests that afive-gene set situated 5'-e-y-n-ò-p-3, on the chromosome was already est¿blished prior to the radiation of the three remaining superorders (-102 mya; Springer et al. 2003). It is noteworthy that molecular phylogenetic studies have been unable to reject a basal Afrotheria,/Xenarthra clade

(Madsen et al. 2001; Delsuc et al. 2002; Lin et at. 2002). Hence if these two clades diverged concurrently from other placentals, the data suggests that the flrve-gene p-globin cluster does indeed predate the radiation of eutherian mammals.

Although no r¡-globin gene products were amplified from the afrotherian templates, a e-locus was amplified from the elephant shrew, and y-globin gene products were obtained from the African elephant, Asian elephant, dugong and manatee. The absence of frameshift mutations and high sequence similarity between the two elephant

(98.7%) and sirenian (95.9%) y-globin genes, respectively, suggested the yJocus is under conservation and possibly expressed within these paenungulate species. Support for this contention arises frorn the observation that blood from a 5-month old African elephant fetus possesses two distinct hemoglobin components, while blood from a l2-month old fetus and an adult each contain a single hemoglobin component which migrate at the same rate (Kleihauer et al. T965; Braunitzer et al. 1984). These studies thus suggest that

69 Fig. 2-9. The p-globin gene clusters of select mammals, updated using data collected for this study (see Fig. 1-1 legend for details). Genes that presumably exist within each cluster are denoted by grey shading and broken-lined edges.

70 Hrmm

Galago

[.emur

Rabbit

Mouse

Rat

Goat Þ rIV tr J sr grr rI r[I Y Cow

rtlì? tÞô Sloth

vn 1rô Armadillo

¡iirì? ô Elephant sbrew ¿-\ r¡li? ôE ô Hyrax i..i uJ uJ

litri? ô Sirenim

Elephant

eÊ 6l Marsupials H +- r0 Monotremes N, the minor (<15%) and major hemoglobin components of early fetal elephants are encoded by the y- and òJoci, respectively, with ò-globin assuming sole expression prior to 12 months gestation (the gestational period being 21 months). *l¿nouu io elephants, the blood of (Trichechus inunguis) calves possess one major and one minor (<5%) band (Farmer et al. 1979), while adult blood contains a single chain constituent (Kleinschmidt et al. 1986; Kleinschmidt and Braunitzer 1988). Thus within sirenians, fetal hemoglobin presumably also contains two components encoded by the y-

(minor) and ò- (major) loci, with y-globin expression extending into the early stages of post-natal life. Support for this inference is provided by the 5' flanking region of the dugong y-globin gene, which contained no CACCC transcriptional control motif

(typically located -90 bases upstream from the Cap site of transcribed 'pJike' globin genes; Myers et al. 1986). This y-globin promoter sequence has been found to be crucial for suppressing p-globin expression during pre-natal development in vitro (Sargent et al.

1999) and during fetal development of transgenic mice (Perez-Stable and Constantini

1990). Hence, the 'lingering existence' of y-globin in the hemoglobin of Amazonian manatee calves may occur because y-expression is not immediately suppressed after birth as it does not significantly affect expression of the globin gene encoding the major post- natal hemoglobin component (i.e. ô-globin; see Chapter I). Because anthropoid primates are the only eutherians known to express the 1-locus during post-embryonic stages of development, these findings are of considerable interest.

The tandem duplication events producing the four or five gene B-globin cluster of placental mammals is thought to have enabled the evolution of specialized forms of hemoglobin physiologically-adapted for different developmental stages (Barrie et al.

7t 1981; Goodman et al. 1984). For instance, as an adaptive strategy to an extended fetal

life (Koop and Goodman 1988) mice and rabbits express their y- and e-loci during early

and late embryogenesis (Hansen et a|.7982;Hardison 1981; H'árdison tg8¡; Hlll et al.

1984), respectively. Similarly, higher primates have extended their 1 expression to fetal

Srowth stages (Shen et al. 1987), while artiodactyls have recruited their 3'-most BJocus

for fetal development (Townes et al. 7984). Within the armadillo, however, only the two

outermost loci (e and p) appear to be capable of encoding a complete polypeptide (Fig. 2-

9), and thus possess a functional arrangement similar to the ancestral B-cluster of

marsupials and (presumably) monotremes (Lee 1997; Lee et al. lggg). Notably,

hemoglobin of adult armadillos contains two different 'p-like' polypeptide chains

expressed atratios of 3.7 (de Jong et al. 1981) to l:9 (Dementi and Burke 1972), while

only one 'B-like' gene capable of encoding a protein exists within its p-cluster. This

consequently raises the possibility that the second chain is encoded by the e-locus, though

significantly, no e-globin gene of any eutherian is known to be expressed after the

completion of embryogenesis. Interestingly, marsupials possess a 'p-like' globin gene

(o-globin) within their o-globin cluster, which constitutes <25o/o of both their late pre- and early post-natal hemoglobin (Wheeler et al. 2001; Wheeler et at. 2004). Thus, the possibility that armadillos may express an unlinked 'B-like' globin gene, which may be orthologous with marsupial o-globin, can not be discounted.

The discovery of a putatively functional s-globin gene within the pale-throated sloth suggests that orthologs of the ancient e-globin gene have been maintained and expressed in all members of Xenarthra. lVhile I have documented ,r¡lô- and p-globin loci

72 from this species (see Chapter I), it is still unknown whether they have retained functional .¿- and rlJoci.

The comparative analyses of evolution rates amongs-Ëthè 'e-like' and 'pJike' globin genes ofthe four eutherian superorders supports the contention that descendants of the ancestral p-globin gene family have evolved at different rates throughout evolutionary time. Furthermore, these differential rates can be primarily attributed to the different selective pressures placed upon the genes therein (Aguileta et al.2O04). Nonsynonymous

(replacement) evolution rates were found to be lower in the 'e-like genes" compared to the'p-like' genes within each species (Table 2-5), reflecting stabilizing selection as little room exists for variation of ambient conditions during gestational development (Koop and Goodman 1988). Replacement rates for the y-loci of dugong and manatee (1.02 and

1.00 x 10-e, respectively, Table 2-5) were nearly identical to the averagenonsynonymous replacement rate determined for p-globin cluster genes (1 x 10-e substitutions/site/year;

Efstradiatis et al. 7980), while replacement rates were slightly lower for the y-orthologs

10; of both elephant species (7 .54 x 1 0- Table 2-5). The former finding supports the earlier conjecture that sirenian y-globin genes are expressed both pre- and post-natally, as these genes would have diverged faster in order to encode a protein structure suited for two

(pre- and early post-natal) developmental stages, while the latter suggests stabilizing selection. The high degree of sequence divergence observed between the y-globin gene sequences of humans and laurasiatherians (Table 2-5) is likely due to the structural differences allowing the orthologs to function at different developmental times (i.e. during fetal and embryonic development, respectively; Hardison 1984). Thus, it is significant that replacement rates observed between the 1-loci of paenungulates and

73 humans are relatively similar to the average replacement rate of orthologous (and

typically analogous) gene sequences (1 x 10-e substitutions/site/year), lending further

support to the notion that paenungulate yJoci are expressed dùirg fetal growth stages.

Because the ò- and p-globin genes of paenungulates and xenarthrans encode their major post-natal hemoglobin components, respectively (see Chapter I), it is not surprising the mutation rates among these genes were fairly similar. However, these rates (which were also similar to those of the mouse and goat 'pJike' loci) were slightly higher than the average p-globin cluster gene replacement rate (Efstradiatis et at. l9B0) while those of rabbit and tarsier were much lower (Table 2-5). This large degree of variation most likely reflects lineage-specific factors such as population structure and generation time

(Wu and Li 1985; Britten 1986; Li and TanimuratggT).

The discovery of differing evolution rates for each gene in the p-globin family lends credence to the notion that phylogenetic studies using molecular clocks (which assume all hemoglobin genes evolve at equal rates) are based upon unsound assumptions

(Goodman l98l; Hardison 1984; Gebel i985, Li et al. 1987; Bailey et at. I99l). I speculate that, although the branching patterns derived from clock-based analyses of mammalian p-globin gene families are most likely in the correct order, dates estimating divergence times may need to be revised as information is gained on these various rates of evolution.

In accordance with earlier findings (Shapiro et al. 1983; Hardison 1984; Hardies et al- 1984; Gebel et al. 1985; Slightom et al. 1985; Koop et al. l9}9b), the dat¿ further demonstrate marked heterogeneity between the non-coding evolution rates of sequenced 'e-like' and 'p-like' globin genes of the same species. Specifically, descendants of the

74 'e-like'loci were shown to diverge at neutral sites more slowly than the'pJike, loci

within every eutherian species but the tarsier (Table 2-6). The slow rate of neutral

evolution previously observed in various primate p-globin gene-s (G,ebel et al. "to¡re. 1985; Harris et al. 1986; Koop et al. 1989b) was not fully supported by the analyses, as

only the rÞq- and pJoci of tarsier diverged at a rate comparatively lowgr than the neutral

mutation rate of pseudogenes (4 - 5 x 10-e substitutions/site/year, Hayashida and Miyata

1983; Kimura 1983; Li et al. 1981). Further challenging the 'hominid slowdown, theory

(Li et al. 1987; Hasegawa et al. 1989; Bailey et at. l99l) is the observation that no links

were found between species generation time and rate of neutral evolution for any loci but

Y As the 'neutral' mutation rates were relatively similar between orthologs of different

species (with the exception of tarsier), it is notably apparent that underlying mechanisms

(possibly owing to species-specific differences in DNA mutation rates, localized structural constraints, etc., Wu and Li 1985; Koop et at. I9B9b) are acting on these 'unconstrained' gene sequences within each species. Consequently, the referral to non- coding mutation rates as'paradigms of neutral evolution' Q,i et al. lg}l;Kimura 19g3) should clearly be reconsidered.

In conclusion, the results of this study confirm that the B-globin cluster of stem eutherians consisted of at least four, and probably five genes. And, with the exception of primates, descendants of the 5'-most e-locus have typically evolved at a slower rate than genes derived from the p-locus in both coding and non-coding regions throughout evolutionary time. Furthermore, the data suggest that the y-globin gene is expressed until at least I year of age in sirenians, but silenced relatively early (<5 months) in elephant development. Finally, the progenitor of modern day nine-banded armadillos atavistically

75 regressed to a two-gene cluster (5'-e-B-3'), suggesting the potential flexibility afforded by the five-gene expansion did not provide any selective advantages within this lineage.

76 LITERATURE CITED

Aguileta, G., J. P. Bielawski, anciz. Y-g. 2004. Gene converBiÕn_,and functional

divergence in the p-globin gene family. J. Mol. Evol. 59:177-199.

Amrine, H., and M. Springer. 1999. Maximum-likelihood of the tethythere hypothesis

based on a multigene data set and a comparison of different models of sequence

evolution. J. Mamm. Evol. 6:161-176.

Amrine-Madsen, H., K. P. Koepfli, R. K. Wayne, and M. Springer.2003. A new

phylogenetic marker, apolipoprotein B, provides compelling evidence for eutherian

relationships. Mol. Phylogenet. Evol. 28.225-240.

Antoine, M., and J. Niessing.lgS4.Intronless globin genes in the insect Chironomus

thumi thumi. Nature 310.195-798.

Antoniou, M., E. deBoer, G. Habets, and F. Grosveld. 1988. The human B-globin gene

contains multiple regulatory regions: identification of one promoter and two

downstream enhancers. EMBO I. 7 .377-384

Archibald, D.2003. Timing and biogeography of the eutherian radiation: fossils and

molecules compared. Mol. Phylogenet. Evol. 28:350-359.

Bailey, W. J., D. H. A. Fitch, D. A Tagle, J. Czelusniak, J. L. Slightom, and M.

Goodman. 1991. Molecular evolution of the rþT-globin gene locus: gibbon

phylogeny and the hominid slowdown. Mol. Biol Evol. B:155-184.

Ba¡alle, F. E,. and G. G. Brownlee. 1978. AUG is the only recognizable signal sequence

in the 5'non-coding regions of eukaryotic mRNA. Nature 274.84-81.

Barrie, P, 4., A. J. Jeffreys, and A. F. Scott. lgSl.Evolution of the p-globin gene cluster

in man and the primates. J. Biol. Evol. 149:319-336.

77 Boyer, s. H., E. F. crosby, A.N-Noyes, G. F. Fuller, S. E. Leslie, L. J. Donaldson, G. R.

vrablik, E. w. Schaefer Jr., and T. F. Thurmon. 1971. nnT"]" hemoglobins: some

sequences and some proposals concerning the charact", of *oiotion and mutation.

Biochem. Genet. 5:405-448.

Braunitzer, G., w. Jelkmann, A. Stangl, B. schranlq and c. K¡ombach. lggz.

Hemoglobins, XLMII: the primary structure of hemoglobin of the Indian elephant

(Elephas mæimus, Proboscidea). þz: Asn. Hoppe-seyler's z. phsiol. chem.

363:683-691.

Braunitzer, G., A- Stangl, B. Schrank, C, Krombach, and H. Wiesner. 19g4. phosphate-

hemoglobin interaction. The primary structure of the hemoglobin of the African

elephant (Loxodonta africana, Probiscidea): asparagine in position 2 of the p-chain.

Hoppe-Seyler' s Z. Phsiol. Chem. 365:7 43 -7 49.

Britten, R. J. 1986. Rates of DNA sequence evolution differ between taxonomic groups.

Science 231 1393-1398.

Buettner-Janusch, J., and V. Buettner-Janusch. 1963. Hemoglobins of two elephant

shrews. Nature 199:91 8-919.

Burmester, T., B. Ebner, B. weich, and T. Hankeln. 2002. cytoglobin: a novel globin

type ubiquitously expressed in vertebrate tissues. Mol. Biol. Evol. 19:416-421.

Burmester, T., B. weich, S. Reinhardl and T. Hankeln. 2000. A vertebrate globin

expressed in the brain. Nature 407 .520-523.

Cooper, S., and R. Hope. 1993. Evolution and expression of a p-like globin gene of the

Australian marsupial Sminthopsis crassicaudata.Proc. Natl. Acad. Sci. USA

90.11777-rT78I

78 Cooper, S., R. Murphy, G.Dolman, D. Hussey, and R. Hope. 1996. A molecular and

evolutionary study of the þ-globin gene family of the Australiàn marsupial

Sminthopsis uassicaudnta. I. Mol. Biol. 17 .1012-1022.

Czelusniak, J., D. Goodman, M. Hewett-Emmett, M. Weiss, P. Venta, and R. Tashian.

1982. Phylogenetic origins and adaptive evolution of avian and mammalian

hemoglobin genes. Nature 298.297 -300. de Jong, W., A. Zweers, and M. Goodman. 1981 . Relationship of aardvark to elephants,

hyraxes, and sea cows from cr-crystallin sequences. Nature 292.538-540.

Delsuc, F., M. Scally, O. Madsen, M. Stanhope, W. de Jong, F. Catzeflis, M. Springer,

and E. Douzery. 2002. Molecular phylogeny of living xenarthans and the impact of

character and taxon sampling on the placental tree rooting. Mol. Biol. Evol. 19:1656-

167T.

Dementi, P. L., and J. D. Burke. 1972. Oxyhemoglobin affînity in the armadillo. Amer. J.

Anat. 134:509-514.

Dickerson, R.8., and I. Geis. 1983. Hemoglobin: structure, function, evolution, and

pathology. Benj amin/Cummings, Menl o Park, Californi a.

Dierks, P., A. van ooyen, M. D. Cochran, C. Dobkin, J. Reiser, and C.'Weissmann. 1983.

Three regions upstream from the cap site are required for efficient and accurate

transcription of the rabbit B-globin gene in mouse 3T6 cells. Cell23 695-7Q6.

Donze, D., P. H. Jeancake, and T. M. Townes. 1996. Activation of ô-globin gene

expression by erythroid Krupple-like factor: a potential approach for gene therapy of

sickle cell disease. Blood 88:4051-4057.

t9 Douady, C.,F. Catzeflis, D. Kao, M. Springer, and M. Stanhope. 2002. Molecular

evidence for the monophyly of Tenrecidae (Mammali{ anJthe timiñg of the

colonization of Madagascar by Malagasy tenrecs. Mol. Phylogenet. 8vo1.22.357-

363.

Douady, c.,F. Catzeflis, J. Raman, M. Springer, and M. stanhope. 2003. The Sahara as a

vicariant agent, and the role of climatic events, in the diversification of the

mammalian order Macroscelidea (elephant shrews). Proc. Natl. Acad. Sci. USA

100:8325-8330

Efstratiadis A, J. W. Posakony, T. Maniatis et al. (15 co-authors). 1980. The structure and

evolution of the human B-globin gene family . Cell2t.654-668.

Eîz:.nk, E., w. Murphy, and s. o'Brien. 2001. Molecular dating and biogeog:aphy of the

early placental mammal radiation. J. Hered. 92:212-219.

Farmer, M., R. weber, J. Bonaventura, R. Best, and D. Domning.1979. Functional

properties of hemoglobin and whole blood in an , the Amazonian

manatee (Trichechus inunguis). Comp. Biochem. Physiol. 62.231-238.

Feng, Y. Q., R. Warin, T. Li, E. Olivier, A. Besse, A. Lobell, H. Fu, C. M. Lin, M. I.,

Aladjem, and E. E. Bouhassira. 2005. The human p-globin locus control region can

silence as well as activate gene expression. Mol. Cell Biol. 25:3864-3874.

Fitch, D. H., w. J. Bailey, D. A. Tagle, M. Goodman,L. sieu, and J. L. slightom. 1991.

Duplication of the y-globin gene mediated by Ll long interspersed repetitive

elements in an early ancestor of simian primates. Proc. Natl. Acad. Sci. USA

88.7396-7400

80 Fleming, M., J. Potter, C. Ramirez, G. Ostrander, and E. Ostrander.2003. Understanding

missense mutations in the BRCAI gene: an evolutionary Jp¡.oa"h. Proc. Natl. Acad.

Sci. USA 100.1 151-1156.

Forrester, w. c., S. Takegawa, T. Papayannopoulou, G. Stamatoyannopoulos, and M.

Groudine. 1987. Evidence for a locus activation region: the formation of

developmentally stable hypersensitive sites in globin-expressing hybrids. Nucl.

Acids Res. 15: 10159-10177.

Fritsch, E.F., R. M. Lawn, and T. Maniatis. 1980. Molecular cloning and

characterization of the human p-like globin gene cluster. cell 19.959-972.

Gebel, L. 8., v. L. van Santen, v. L., Slightom, J. L-, and R. A. Spritz. 1985. Nucleotide

sequence, evolution, and expression of the fetal globin gene of the spider monkey,

Ateles geolfroyi. Proc. Natl. Acad. Sci. USA 82:6985-6989.

Goodman, M. 1981. Globin evolution was apparently very rapid in early vertebrates: a

reasonable case against the rate-constancy hypothesis. J. Mol. Evol. 17:114-120.

Goodman, M., B. Czelusniak, D. Koop, D. Tagle, and J. slightom. 1987. Globins: a case

study in molecular phylogeny. cold spring Harb. symp. Quant. Biol. 52:875-890.

Goodman, M., B. Koop, J. Czelusniak, M. Weiss, and J. Slightom. 1984. The r¡-globin

gene: its long evolutionary history in the p-globin gene family of mammals. J. Mol.

Biol. 180.803-823.

Goodman, M., and G. Moore. l975.Darwinian evolution in the genealogy of

hemoglobin. Nature 253 : 603 -608.

81 Grosveld, G. C., A. Rosenthal, and R. A. Flavell. 1982. SeWel;l requirements for the :' transcription of the rabbit p-globin gene in vivo: the -80 region. Nucl. Acids Res.

to.495t-4971.

Grosveld, F., G. B. van Assendelft, D._R. Greaves, and G. Kollias. 1987. Position-

independeng highJevel expression of the human p-globin gene in transgenic mice.

Cell 51:975-985.

Hansen, J. N., D. A. Konkel, and P. Leder. 1982. The sequence of a mouse embryonic p-

globin gene: evolution of the gene and its signal region. J. Biol. Chem. 257:1048-1052

Hardies, S., H. Edgell, and C. Hutchison ltr. 1984 Evolution of the mammalian B-globin

gene cluster. J. Biol. Chem. 259.3748-3756.

Hardison, R. C. 1981. The nucleotide sequence of rabbit embryonic globin gene p3. J.

Biol. Chem. 256:11780- 11786

Hardison, R. C. 1983. The nucleotide sequence of the rabbit embryonic globin gene p4. J.

Biol. Chem . 258:87 39 -87 44.

Hardison, R. C. 1984. Comparison of the B-like globin gene families of rabbits and

human indicates that the gene cluster 5'-e-y-ô-p-3' predates the mammalian

radiation. Mol. Biol. Evol. 1:390-410.

Hardison, R. C, and R. Gelinas. 1986. Assignment of orthologous relationships among

mammalian cr-globin genes by examining flanking regions reveals a rate of

evolution. J. Mol. Biol. 3.243-261.

Hardison, R., C., and J. B. Margot. 1984. Rabbit globin pseudogene r¡rp2 is a hybrid of ò-

and B-globin gene sequences. Mol. Biol. Evol. 1:302-316.

82 Harris, S., P. Barrie, M. Weiss, and A. Jeffreys. 1984 The primate qrpl gene: an ancient

p-globin pseudogene. J. Mol. Biol 180.785-801 j '

Harris, S., J. R. Thackeray, A. J. Jeffreys, and M. L. Weiss. 1986. Nucleotide sequence

- analysis of the lemur p-globin gene family: evidence for major rate fluctuations in

globin polypeptide evolution. Mol Biol. Evol. 3:465-484.

Hasegawa, M. H., H. Kishino, and T. Yano. 1989. Estimation of branching dates among

primates by molecular clocks of nuclear DNA which slowed down in Hominidae. J.

Hum. Evol. 18:461476.

Hayashida, H., and T. Miyata. 1983. Unusual evolutionary conservation and frequent

DNA segment exchange in class I genes of the major histocompatibility complex.

Proc. Natl. Acad. Sci. USA 80.2671-2675.

Hedges, S. 2001. Afrotheria. plate tectonics meets genomics. Proc. Natl. Acad. Sci. USA

98:1-2

Hill, 4., S. C. Hardies, S. J. Phillips, M. G. Davis, C. A. Hutchison III, and M. H. Edgell.

1984. Two mouse early embryonic p-globin gene sequences: evolution of the

nonadult B-globins. J. Biol. Chem. 259.3739-3747.

Hutchison III, C. 4., S. C. Hardies, R. W. PadgetÇ S. Weaver, and M. H. Edgell. 1984.

The mouse globin pseudogene ph3 is descended from a premammalian ò-globin

gene. J. Biol. Chem . 259.12881-12889.

Jeffreys, A. J., P. A. Barrie, S. Harris, D. H. Fawcett, Z. J. Nugen! and A. C. Boyd. 1982.

Isolation and sequence analysis of a hybrid d-globin pseudogene from the brown

lemur. J. Mol. Biol. 156:487-503

83 Jukes, T. H. and C. R. Cantor.1969. Evolution of protein moleeu=let. Pp.2l-132in H. N.

Munro, ed. Mammalian protein metabolism. Academic Press, New York.

Kimura, M. 1983. Rate variant alleles in the light of the neutral theory. Mol. Biol. Evol.

1.84-93.

Kleihauer, E., I. O. Buss, C. P. Luck, and P. G. Wright. 1965. Hemoglobins of adult and

fetal African elephants. Nature 207.424425.

Kleinschmidt, T., and G. Braunitzer. 1983. The primary structure of hemoglobins of the

rock hyrax (Procavia habessinica, Hyracoidia): insertion of glutamine in the c-

chains. Hoppe Seylers Z. Physiol. Chem. 364:7303-13

Kleinschmidt, T., and G. Braunitzer. 1988. The primary structure of the hemoglobin of

the Brazilian manatee (Trichechus inunguis, Sirenia) Hoppe-Seyler'sZ. Physiol.

Chem. 389:427-435.

Kleinschmidt, T., J. Czelusniak, M. Goodman, and G. Braunitzer. 1986. Paenungulata: a

comparison of the hemoglobin sequences from elephant, hyra<" and manatee. Mol.

Biol. Evol. 3.427-435

Kleinschmidt, T., J.Marz, and G. Braunitzer. 1989. The primary structure of pale-

throated three-toed sloth(Bradypus tridnctyluq Xenarthra) hemoglobin. Biol. Chem

Hoppe-Seyler 370.3 03 -3 08.

Konkel, D. 4., J. V. Maizel Jr., and P. Leder. 7979. The evolution and sequence

comparison of two recently diverged mouse chromosomal B-globin genes. Cell

18:865-873.

84 Koop, B., and M- Goodman. 1988. Evolutionary and developmental aspects of two

hemoglobin B-chain genes (eM and pM) of opposum. proc. Natl. Acad. Sci uSA ''.'-''-- 85:3893-3987

Koop,8., F., D. siemienia( J. L slightom, M. Goodman, J. Dunbar, p. c. wright, and

E. L' Simons. 1989a. Tarsius ô- and B-globin genes: conversions, evolution, and

systematic implications. J. Biol. Chem. 265:69_79

Koop, B. F', D. A. Tagle, M. Goodman, and J. L. Slightom, 1989b. A molecular view of

primate phylogeny and important systematic and evolutionary questions. Mol. Bioi

Evol. 6:580-612.

Kosche, K. 4., c. Dobkin, and A. Bank. l9g5 DNA sequences regulating human p_

globin gene expression. Nucl. Acids Res. 13:77g1-77g3.

Lacy, E., R. c. Hardison, D. euon, and T. Maniatis. TgTg.Thelinkage arrangement of

four rabbit B-like globin genes. Cell lB:1273-IZB3

Lavergne, 4., E. Douzery, T. Stichler, F. Catzeflis, and M Springer. lggl.Interordinal

mammalian relationships: evidence for the paenungulate monophyly is provided by

complete mitochondrial 12S rRNA sequences. Mol. Phylogenet. 8vo1.6.245-25g.

Lee, M--H. 1997. Molecular biology and evolution of p-globin genes in monotremes.

Doctoral thesis, University of Adelaide, Australia.

Lee, M.-H., R. scrofl S. cooper, and R. Hope. 1999. Evolution and molecular

charactenzation of a p-globin gene from the Australian echidna Tachyglossus

aculeatus (Monotremata). Mol. phylogenet. Evol. 12:205 -21 4.

pseudogenes Li, w.-H., Gojobori, T., and M. Nei. r98r. as a paradigm of neutral

evolution. Nature 292.237 -239.

85 Li, W.-H., and M. Tanimura. 1987. The molecular clock runs in man T:.:.to*ty than in - .-- apes and monkeys. Nature 326.93-96.

Li, W.-H., M. Tanimura, and P. M. Sharp. 1987. An evaluation of the molecular clock

hypothesis using mammalian DNA sequences. J. Mor. Evol. 25:33 0-342

Lin, Y-H., P. Mclenachan, A. Gore, M. phillips, R. ota, M. Hendy, and D. penny. 2002.

Four new mitochondrial genomes and the increased stability of evolutionary trees of

mammals from improved taxon sampling. Mol. Biol. Evol. 19:20 60-2070.

Lingrel, J. 8., T. M. Townes, s. G. Shapiro, s. E. Spence, p. A. Liberator, and S. M.

Wernke. 1983 . Organization, structure, and expression of the goat globin genes.

Prog. Clin. Biol. Res. 134:131-139.

Liu, R., and M. Miyamoto. 1999. Phylogenetic assessment of molecular and

morphological data for eutherian mammals. Syst. Biol. 4g.54-64 p. Liu, F. G., M. M. Miyamoto, N. P. Freire, e. ong, M. R. Tennant, T. s. young, and K.

F. Gugel. 2001. Molecular and morphological supertrees for eutherian (placental)

mammals. Science 291.I7 86-1789.

Madsen, O., M. Scally, C. Douady, D. Kao, R. DeBry, R. Adkins, H. Amrine, M.

Stanhope, W. de Jong, and M. Springer.20Ol. Parallel adaptive radiation in two

major clades of placental mammals. Nature 409:610-614.

Madsen, o., D. willemsen, B. ursing, IJ- Arnason, and w. de long.2002. Molecular

evolution of the mammalian alpha 28 adrenergic receptor. Mol. Biol. Evol. 19:2150-

2160.

86 Malia, M. J. Jr., R. M. Adkins, and M. W. Alla¡d. 2002. Molecular support for Afrotheria

and the polyphyly of Lipotyphla based on analyses of the growth hormone receptor

.l - gene. Mol. Phylogenet. Evol. 24:91-I0L.

Margot, J. 8., G. W. Demers, and R. C. Hardison. 1989. Complete nucleotide sequence

of the rabbit p-like globin gene cluster. Analysis of intergenic sequences and

comparison with the human p-like globin gene cluster. J. Mol. Biol. 205:15-40.

Martin, S., K. Vincent, and A. Wilson. 1983. Rise and fall of the ò-globin gene. J. Mol.

Biol. 164:513-528.

McKenna, M.C. 1975. Toward a phylogenetic classification of the Mammalia.Pp. l-4 in

W.P. Luckett and F S. Szalay, eds. Phylogeny of the primates. Plenum Press, New

York.

McKenna, M., and S. Bell 1997. Classification of Mammals Above the Species Level.

Columbia Univ. Press, New York.

Murata, Y., M. Nikaido, T. sasaki,, Y. cao, y. Fukumoto, M. Hasegawa, and N. okada.

2003. Afrotherian phylogeny as infened from complete mitochondrial genomes.

Mol. Phylogenet. Evol. 28:253 -260.

Murphy, W., E. Eizirik, W. Johnson,Y. Zhang, O. Ryder, and S. O'Brien. 2}0la.

Molecular phylogenetics and the origins of placental mammals. Nature 409:614-6T8.

Murphy, w., E.Eizirik, s. o'Brien, o. Madsen, M. scally, c. Douady, E. Teeling o.

Ryder, M. Stanhope, W. de Jong, and M. Springer. 200lb. Resolution of the early

placental mammal radiation using Bayesian phylogenetics. Science 294.2348-2351.

Myers, R., M., K. Tilly, and T. Maniatis. 1986. Fine structure genetic analysis of a p-

globin promoter. Science 232.613 -618.

87 Nei, M., and T. Gojobori. 1986 Simple methods for estimating the numbers of

synonymous and nonsynonymous nucleotide substitutionr. Moi Biol- Evol. 11:613-

619.

Nikaido, M., H. Nishihar4 Y. Hukumoto, and N. okada. 2003. Ancient SINEs from

African endemic mammals. Mol Biol. Evol.2O:522-527.

Nishihara, H., Y. Satt4 M. Nkaido, J. G. M. Thewissen, M. J. Stanhope, and N. okada.

2005. A retroposon analysis of afrotherian phylogeny. Mol. Biol. Evol. (in press). 'walker's Nowak, R. 1999. mammals of the world, Sixth Edition (vol.2). The Johns

Hopkins University Press. Baltimore.

ohno, S. 1970. Evolution by gene duplication. springer-verlag,Berlin, New york.

Gy-globin Perez-Stable, C., and F. Constantini. 1990. Roles of fetal promoter elements

and the adult B-globin 3' enhancer in the stage-specific expression of globin genes.

Mol. Cell Biol 10:1116-1125.

Pesce, 4., M. Bolognesi, and A. Bocedi. 2002. Neuroglobin and cytoglobin EMBO

3.1 146-1 151

Phillips, s., J., s. c. Hardies, c.L. Jahn, M. H. Edgell, and c. A. Hutchison trI. 1984.

The complete nucleotide sequence of a p-globinJike structure, þb2, from the [Hbb]o

mouse BALB/c. J. Biol. Chem. 259.7947-7954.

Prinsloo, P. 1993. Molecular and chromosomal phylogeny of the Hyracoidea. Doctoral

thesis, University of Pretori4 South A_frica.

Poyart, c., H. wajcman, and J. Kister. 1992 Molecular adaptation of hemoglobin

function in mammals. Resp. Physiol. 90:3-17.

88 Proudfoot N. J., and G. G. Brownlee.1976.3'non-coding region sequences in eukaryotic -- messengerRNA.Nature 263:2Il-214. -

Proudfoot N., A. Gil, and T. Maniatis. 1982. The structure of the human (-globin gene

and a closely linked, nearly identical pseudogene. Cell 31:553-563.

Prychitko, T., R. M. Johnson, D- E. Wildmar¡ D. Gumucio, and M. Goodman. 2005. The

phylogenetic history of New World monkey p-globin reveals a platyrrhine p to ò

gene conversion in the atelid ancestry. Mol. Phylogenet. Evol. 35.225-234.

Ristaldi, M. S., S. Casula, S. Porcu, M. F. Marongiu, M. Pirastu, and A. Cao. 1999.

Activation of the ò-globin gene by the p-globin gene CACCC motif. Blood Cells

Mol. Dis. 25:193-209.

Ross, J., and A. Pizano. 1983, Human B and ò globin messenger RNAs turn over at

different rates. J. Mol. Biol, 167:607-617.

Sargent, T. G., C. C. DuBois, A. M. Buller, and J. A. Lloyd. 1999. The roles of 5'-HS2,

5'-HS3, and the 1-globin TATA, CACCC, and stage selector elements in suppression

of p-globin expression in early development. J. Biol. Chem. 274:11229-11236.

Satoh, H., N. Inokuchi, N. Yasuhiro, and T. Okazaki. 1999. Organization, structure, and

evolution of the nonadult rat p-globin gene cluster. J. Mol. Evol. 49:122-129.

Schimenti, J. C., and C. H. Duncan. 1985a. Concerted evolution of the cow u'and ea p-

globin genes. Mol. Biol. Evol. 2.505-513.

Schimenti, J. C., and C. H. Duncan. 1985b. Structure and organization of the bovine p-

globin genes. Mol Biol Evol. 2:514-525.

89 Schon, E. ,{., M. L. Cleary, J. R. Haynes, and J. B. Lingrel. 1981. Structure and evolution

of goat y-, Pc- and pA-globin genes: three developmentally tegtrtut"agenes contain

inserted elements. Cell 27 .3 59-3 69.

scolt, A. F., P- Heath, s. Trusko, s. H. Boyer, w. Prass, M. Goodman, J. czelusniak, L.

Y. Chang, and J. L. Slightom. 1984. The sequence of the gorilla fetal globin genes:

evidence for multiple gene conversions in human evolution. Mol. Biol. Evol. 1:371-

389.

Shapiro, S. G., E. A. Schon, T. M. Townes, and J. B. Lingrel. 1983. Sequence and

linkage of the goat etand eII p-globin genes. J. Mol. Biol 169:31-52.

Shehee, W. R., D. D. Loeb, N B. Adey, F. H. Burton, N. C. Casavan! p Cole, C. J.

Davies, R. A. McGraw, S. A. schichman, and D. M. severynse. 1989. Nucleotide

sequence of the BALB/c mouse B-globin complex. J. Mol. B,iol.2os.4T-62.

Shen, S. H., J. L. Slightom, and O. Smithies. 1981. A history of the human fetal globin

gene duplication. Cell 26.191 -203.

Shoshani, J., and M. McKenna. 1998. Higher taxonomic relationships among extant

mammals based on morphology, with selected comparisons of results from

molecular data. Mol. Phylogenet. Evol. 9.572-584.

Simpson, G.G. 1945. Principles of classification and a classification of mammals. Bull.

Am. Mus. Nat. Hist. 85:1-350.

Singh, A.2002. Phylogenetic relationships of paenungulate mammals inferred from a-

and p-globin gene sequences. Honours thesis, B.sc., university of Manitoba,

Canada.

90 slightom, I.L.,L. Y. chang, B. F. Koop, and M. Goodman. 1985. chimpanzeefetal cy- Ay-globin and gene nucleotide sequences provide further evidence of gene :' conversions in hominid evolution. Mol. Biol. Evol. 2.370-389.-

Smith, 4., D. Smittr, and B. Funnell. 1994. Atlas of Mesozoic and Cenozoic coastlines.

Cambridge Univ. Press. Cambridge, U.K.

Springer, M. 1997. Molecular clocks and the timing of the placental and marsupial

radiations in relation to the Cretaceous-Tertiary boundary. J. Mammal. Evol. 4.285-

302.

springer, M., H. Amring A. Burþ and M. stanhope. 1999 Additional support for

Afrotheria and Paenungulata, the performance of mitochondrial versus nuclear

genes, and the impact of dat¿ partitions with heterogeneous base composition. Syst.

Biol. 48:65-75.

Springer, M., G. Cleven, O. Madsen, W. de Jong, V. Waddell, H. Amrine, and M.

Stanhope. 1997. Endemic African mammals shake the phylogenetic tree. Nature

388:61-64. springer, M., w. Murphy, E. Eiziri( and s. o'Brien. zoo3. placental mammal

diversification and the Cretaceous-Tertiary boundary. Proc. Natl. Acad. Sci. USA

100. 1056- 1061

Stanhope, M., o. Madsen, v. waddell, G. cleven, rv. de Jong, and M. springer. 199g.

Highly congruent molecular support for a diverse superordinal clade of endemic

African mammals. Mol. Phylogenet. Evol. 9:501-508.

Steinberg, M. H., and J. G., Adams Itr. 1991. Hemoglobin A2. origin, evolutior¡ and

aftermath. Blood 7 8.2165-2177 .

9t Tagle, D. 4., B. F. Koop, M. Goodman, J. L. Slightom, D. Hess, and R. Jones. 1988.

Embryonic e- and y-globin genes of a prosimian primate (Galago crassi caudatus).

nucleotide and amino acid sequences, developmental regulaìion and phylogenetic

footprints. J. Mol. Biol. 203:439-455.

Taglg D. 4., J. L. Slightom, R. T. Jones, and M. Goodman. 1991. Concerted evolution

led to high expression of a prosimian primate ô-globin gene locus. J. Biol. Chem.

266:7649-7480.

Tagle, D. 4., M. J. St¿nhope, D. R siemienia( p. Benson, M. Goodman, and J. L.

Slightom. 1992. The p-globin gene cluster of the prosimian primate Galago

crassicaudafzs: nucleotide sequence determination of the 41-kb cluster and

comparative sequence analyses. Genomics lS.7 4l -7 60

Tang, D. c., D. Ebb, R. c. Hardison, and G. p. Rodgers. 1997. Restoration of the

CCAAT box or insertion of the CACCC motif activates ô-globin gene expression.

Blood 90. 42I-427.

Thompson, J. D., D. G. Higgins, and r. J. Gibson. lgg4. 1LTJSTAL w. improving rhe

sensitivity of progressive multiple sequence alignment through sequence weighting,

position-specific gap penalties and weight matrix choice. Nucleic Acids Res.

22:4673-4680.

Townes, T. M., M. C. Fitzgerald, and J. B. Lingrel. 1984. Triplication of a four-gene set

during evolution of the goat p-globin locus produced three genes now expressed

differentially during development. Proc. Natl. Acad. Sci. USA 8l:6589-6593.

Trent III, J.T., and M. s. Hargrove.2002. A ubiquitously expressed human

hexacoordinate hemoglobin. J. Biol. Chem. 277 .19539-19545.

92 Tuan, D., solomon, w. e.Li, and I. M. London. 19g5. The,.B-like-globin,,gene domain

in human erythroid cells. proc. Narl. Acad. sci. usA sz,63gaõggs -

van Dijk, M., o. Madsen, F. catzeflis, M. Stanhope, w. de Jong, and M. page. zo0r. Protein sequence signatures support the African clade of mammals. proc. Natl.

Acad. Sci. USA 98:188-193.

Vincent, K' 4., and A. C. V/ilson. 1989. Evolution and transcription of old world monkey

globin genes. J. Mol. Bliol. Z07:465-480

Waddell, P-, and S. Shelley. 2003. Evaluating placental inter-ordinal phylogenies with novel sequences including RAGI, -fibrinogen, ND6, and mt-tRNA, plus MCMC_

driven nucleotide, amino acid, and codon models. Mol. phylogenet. Evol. 2g:197-

224.

Weber, R. 1995. Hemoglobin adaptations to hypoxia and altitude - the phylogenetic perspective,Pp 3r-44in J. sutton, c. Houston, and G. coates, eds. proceedings

from the ninth international hypoxia symposium, Lake Louise, Canada, Burlington,

Vermont.

Webeq R., and R. Wells. 1989. Hemoglobin structure and function.pp. 279-300 in S

wood, ed. comparative pulmonary physiology. Marcel Dekker, Inc. New york. wheeler, D., R. M. Hope, s. J. B. cooper, G. Dorman, G. c. webb, c. D. K. Bottema, A. A. Gooley, M. Goodman, and R. A. B. Holland. 2001. An orphaned mammalian p_

globin gene of ancient evolutionary origin. Proc. Natl. Acad. Sci. USA 9g:ll0l-

I 106.

93 wheeler, D., R. M. Hope, s. J. B cooper, A. A. Gooley, and R A. B Holland. 2004.

Linkage of the p-like omega-globin gene to a-like globin genes in an Australian

marsupial supports the chromosome duplication model foi*epu*tion of globin gene

clusters. J. Mol. Evol. 56:642-652.

White, J., D. Harkness, R. Isaacks, and D. Duffield. 1976. Some studies on blood of the

Florida manatee, TrÌchechus manatus latÌrostris. Comp. Biochem. Physiol. 55:413-

417. wood, w. G., J. M. old, A. v. Roberts, J. B. clegg, and D. J. weatherall. TgTg.Human

globin gene expression: control of B, ò and ò p chain production. Cell 15:43 7-446.

Wu, C.-I., and W.-H. Li 1985 Evidence for higher rates of nucleotide substitution in

rodents than in man- Proc. Natl. Acad. sci. usA gz:r741-1745.

94 Appendix l-1. Primers used in the (A) amplification and (B) sequencing reactions of 'B-like'globin genes.

(A) Primers used in pol5rmerase chain reactions.

Primer name Sequence IIBB-3(F) 5' TTG CTT CTGACA CAA CTG TGT T 3' IIBB4(F) 51 4üqÁCAGACACCATGG TGC 3' ÉrBB-5(R) 5'AGT GGTACT TGT GC'c CC 3' - lrBB-6(F) 5' GGT TGT CTA CCC CTG GAC 3' HBB-7(R) 5'TTC TCG GGATCCACG TGC 3' }IBB-9 5' CTGACT GCT CrCT GAG AAGACT CAc c 3' lrBB-9(FIARM 5' CTGACT TCT GAT GAGAAGACT GCG G 3' TIBB-10 5' CCC TTGAGG TTG TCC AGG TGC 3' ÉrBB-11(F) 5' CTG TCC TGC ACAACG CTAAAG TGC 3' rrBB-12(R) 5' GCC ACC ACC TTC TCA TAG GCA GC 3' HBB-13(R) 5' GGG CTT CCGAIG GATACC 3' rrBB-15(F) 5' CAC CCT GC'c CTC GGC CAA TCT 3' I{BB-16(R) 5' TTA TGC CTT TGAAGTAGAATT GTT C 3' HBB-18(F) 5' TTC TGA CAC AAC TGT GTT CAC TAG CAA CTA 3' HBB-le(R) 5' GAG CTG Aú4ü{ GTT CTT TAT TAG GCA 3' HBB-20(FISTH 5' GCT GAC GAT GAGAAG CTCT GGT GTA 3' lrBB-ARM-Bß.) 5' TAC ACC CAA GCAACG TAG TAG TGAACC 3' LIBB-ARM-B/Dß) 5' Crcc CTT TTA CCT TAG CAT TTG CGAAC 3' HBB-ARM-D(R) 5'ATG TGGAGG TAAAGC CAA CAG GTTA 3' rrBB-cGM-l(F) 5' CAG CCA GAC AUAú{ TGA AAC TCA CTG 3' FIBB-AI-1I16 5' CAGACTG AAG TCA GTG CCT GTC AGA AAC TGG 3' HBB-Bl-1i4s 5' ACT GTT CTG TCT CCA CAT GCC CAG CC 3' HYR-B1-R 5'ACC GTT TTG TCT CCA CAG CrCT TGG CC 3' HYR-C1-R 5'CCTAGATAC A,AÁ CCT GCCAAGTGC CTCA 3' HBB-C1-1I74 5' CCC ACT CAA CCC TCC TTAAGT CTA CCT TGC 3' ELE.A2-27 5'TGT GTG GGC AAC ATAAAT CTCIG TTT C 3' ELE-82-27 5' CCCATT GTC TCAGAG GCCAAG CT 3' HYF.2BZ-21 5' ATG TCT GTG TAT CTG TCT GTC TCT TCT CCA 3' ELE-C2-21 5' CTC CTG GGCAAT GTG CTG GTGA 3' ESH-A(R) 5' TCAATA CCT GTGAGAAGC CTG GAA AAC GG 3' ESH-B(R) 5' CTC CATATT CCA GTC CCA TTCAAG CAT CC 3' ESH-C(R) 5' CCT GAC CAC CAA GTT TAf CTA CGT CCA CC 3' ESH-AG'>2 5' TGC ACAAGA GTC AAG TGT GAG TGC AGC TTG 3' ESH-A(F) 5' TGA TGC CCTAAT CTC GAT CAC TCT GCC 3' ESH-C(F) 5'AGG CTG CTTTCCAGAAGGTGGTGACT 3'

(B) Primers used in sequencing reactions at the Unversity of Calgary DNA core laborafory.

Primername Sequence M-13(F)40 5' GTT TTC CCA GTC ACGAC 3' M-13(R) 5'AACAC'C TAT GAC CAT G 3'

95 Appendix 1-2. DNA sequences of the 'p-like' globin genes amplified in this study. Exons are identified by CAPITAL letters, while introns and exlemal g.q.r sequences are represented by lower case letters. The 5' upstream transcriptional control motifs (CACCC, CCAAT, ATA) and 3' poly-adenylation signal (AATAAA) are underlined within the 5' and 3'flanking regions of each gene sequence, respectively Within each coding block, translated amino acids are listed below their corresponding base triplet.

Appendix Page

r-21- African elephant ò-globin gene sequence 97 t-28 Asian elephant ô-globin gene sequence 99

1-2C Dugong ô-globin gene sequence 101 t-2D West Indian manatee ô-globin gene sequence 104 t-28 Rock hyrax ò-globin partial gene sequence 106 t-2F Rock hyrax ðH-globin gene sequence 108 r-2G Yellow-spotted hyrax òH-globin gene sequence 1i0

T-2}l Western tree ôH-globin gene sequence TL2

T-21 Bushveld elephant shrew p-globin partial gene sequence 114

1-21 Bushveld elephant shrew ò-globin partial gene sequence 116 t-2K Ni ne-b anded armadi I lo rþ ò-gl obin gene sequence 117 t-2L Nine-banded armadillo p-globin gene sequence t19

L-2}/d Pale-throated sloth qrù.globin gene sequence 721

1-2N Pale-throated sloth p-globin gene sequence 123

96 Appendix 1-24 Africon elephant delto-globin gene sequence

goctogcooc tacccqqtch gococcATGG TGAATCTGAC,l$gegfçre AAGACACAAG V NLT AAE KTA TCACCMCCT GTGGGGCAAG GTGAATGTGA AAGAGCTTGG TGGTGAGGCC CTGAGCAGgT VTNL rfGK VNVK ELG GEA LSR 127 ttgtotctog gttgcoo ggt ogacttaogg ogggttgagt ggggctgggc atgtggagoc

ogoocagtct cccogtttct gocoggcoct gocttcctct gcoccctgtg gtgctttcoc

241 CttCOgGCTG CTGGTG TCT ACCCATGGAC CCGGAGGTTC TTTGAACACT TTGGGGACCT L LVVY PtllT RRF FEHF GDL GTCCACTGCT GAAGCTGTCC TGCACAACGC TAAAGTGCTG GCCCATGGCG AGAÄAGTGTT STA EAVL HNA KVL AHGE KVL 361 GACCTCCTIT GGTGAGGGCY TGAAGCACCT GGACAACCTC AAGGGCACCT TTGCCGATCT TSF GEGL KHL DNL KGTF ADL 42r GAGcGAGcTc cAcTGTGAcA AGcrccAcGT GGATccrcAc MTTTCAGGg tgosrctagg SEL HCDK LHV DPE NFR ogacottcto ttttttcttt tcoctttgto gtctttcact gtgottottt tgcttotttg

54r. aotttcctct gtotctcttt ttqctcAact atgtttcotc otttogtgct ttttcooctt

otoccotttt gtottocttt tctttcooto ttcttccttt tttcctgoct cocottcttg

ctttqtqtco tgctctttat ttootttcct gcgtttttgc tcttgctctc cctttctcct

qqtttccttc cctctgooco gtocccogat tgtgcatocc occtctcotc cqctatttct

781 gcoctggggc oqqtccccoc ccctcctccq totgogggtt ggaaaggoct gootc oaoga

ggagaggatc otcgtgctgt tctogogtct gtgattcatt tcogacttga aggataactt

goatootot o ooatcaggag taaatggago ggaaagtcag totct go gaa tgaaagotco

961 googgtcat a gacgogatgg ggagcagaog ttoctoogoa octgoccott gtggctotoo

1021 ttaotcoctt oottogttoa ttoototgtt tgttotttot tcocgttttt cottttggtg Appendix 1-ZA Africon elephant delto-globin gene sequence

1081 ggagtooatt t g ggct o gtg tgtggg cqo c oto oot g ggt ttcs c c c cot t gt ct cagog

1141 gccoogctgg ottgctttgt tooccotgtc tgtgtotgto tctocctctt ccccotog0

L CCTGGGCMT GTGCTGGTGA TTGTCCTGGC CCGCCACTTT GGCAAGGAAT IEACC{EAGA LGN VLVI VLA RHF GKEF TPD TGTTCAGGCT GCCTATGAGA AGGTTGTGGC TGGTGTGGCG AATGCCCTGG CTCACAAATA VAA AYEK VVA GVA NALA HKY 1321 eeKTGAgot cctggcctgt tcctggtotc catcggaogc cccotttccc gogotgctot H 1381 ctctgoottt gggaaaotao tgccooctct caagggcotc tcttctgcct ootoaogtac

744r tttcogctcq octttctgot tcotttottt ttttctcogt cqctcttgtg gtgggggaog

ttcccoaggc tctotggaca gagagctctt gtgccttoto ggaaaagttc aogggqoott

ggaaoataao gggaoccoto cocogotott aatgggooco ottctocttc aooggcotoq

agottgggot ggtttggcoa ataggotoct ggtoctocog ggottccotg ggcct coggc

ctoogocato gccccogggc tooctttcog ottcqqttcc ogoaattcct cocqoqotqo

1741 tgga

98 Appendix 1-28 Asion elephont delto-globin gene sequence

ttctgggcct cagtttcctc otttgtqtqo toocogoott gllogggtoøq ttctt oagog

gcttaccagg ctgtoottct ooooooaotg cotooqtooo cttgccoagg cagotgtttt

togcagcaat tcctgoo qga aacgggoc co ggagotaagt agagaaagag tgooggtctg

181 ooqtcqoqct aataagacog tcccogoctg tcoaggagog gtotggctgt catcottcog

gccteceeet gcogooccaç gcçetggcct tggeeqqtct gctcoc aaga gcaaaaaggg

coggaccogg gttgggcoto toaggaagag togtgccogc tgctgtttoc octcocttct

361 gococaoctg tgttgactog caoctoccco otcogocacc ATGGÏGAAIC TGACTGCTGC VN L TAA 421 TGAGAAGACA CAAGTCACCA ACCTGTGGGG CAAGGTGAAT GTGAAAGAGC TTGGIGGTGA EKT AVTN LITIG KVN VKEL GGE 481 GGCCCTGAGC AGgtttgtot ctoggttgca aggtagoctt aaggagggtt gogtggggct ALS R gggcotgtgg ogacagaoca gtctcccogt ttctgocogg cactgocttc ctctgcocci,,

tgtggtgctt tcoccttcog GCTGCTGGTG GTCTACCCAT GGACCCGGAG GTTCTTTGAA LLV VYPW TRR FFE 66L CACTTTGGGG ACCTGTCCAC TGCTGACGCT GTCCTGCACA ACGCTAAAGT GCTGGCCCAT HFGD LST ADA VLHN AKV LAH 72L GGCGAGAAAG TGTTGACCTC CTTTGGTGAG GGCCTGAAGC ACCTGGACAA CCTCAAGGGC GEKV LTS FGE GLKH LDN LKG ACCITTGCCG ATCTGAGCGA GCTGCACTGT GACAAGCTGC ACGTGGATCC TGAGAATTTC TFAD LSE LHC DKLH VDP ENF 841 AGGgtgogtc taggagococ tctotttttt cttttcoctt tgtogtcttt coctgtgatt R 901 ottttgctto tttgootttc ctctgtstct ctttttoctc goctotgttt cotcqtttog

tgttttttco octtotocca ttttgtotto cttttctttc qotottcttc cttttttcct

1021 gactcocott cttgctttat stcotgctct ttqtttqott tcctocgttt ttgctcttgc Appendix 1-28 Asiqn elephont delta-globin gene sequence

tctccctttc tcctogtttc cttccctctg oocogtoccc-ooottgtgco toccocctct

1141 cgtccoctot ttctgcoctg gggcoootcc ccocccctcc tccototgag ggttggoaag

goctgootca aogaggogog gatcatggtg ctgttctogo gtotgtgqtt cotttcqgqc

ttgooggoto octtgootoq totqqootco ggagtaaatg gagoggaoog tcogtotctg

agaotgoaog otcagaaggt cotog ocgag øtggggogca gaagttacto agaoactgoc

cottgtggct otoottootc qcttoottog ttoottooto tgtttgttot ttottcocgt

1441 ttttcotttt ggtcgggogto oottt gggct agtgtgtggg coacotooot gggtttcocc

ccottgtctc agaggccaag ctggattgct ttgttoocco tgtctgtgto tgtatctocc

1561 tcttccccot ogCTCCTGGG CAATGTGCTG GTGATTGTCC TGGCCCGCCA CTTTGGCMG LLG NVL VIVL ARH FGK t62t GAATTCACCC CAGATGTTCA GGCTGCCTAT GAGAAGGTTG TGGCAGGTGT GGCGMTGCC EFTP DVA AAY EKVV AGV ANA 1681 CTGGCTCACA A.ATACCACTG Agotcctggc ctgttcctgg totccotcgg oogccccott LAHK YH I74T tcccaogotg ctotctctgo otttgggaoo otootgccoq ctctcoqggg cotctcttct

1801 gcctqotooo gtactttcog ctcooctttc tgottcqttt ottttttttc tcogtcoctc r861 ttgtggtgg g ggaagttccc ooggctctot ggacagagag ctcttgtgcc ttoto ggooo

L92t ogttcoogg g aaattggaoa otaoagggoa ccoto cocog otottoo tgg gaacaqttct

1981 octtcooagg cotaoagott gggooggttt ggcaaotogg otoctggtoc tocogggott

ccotgggcct coggcctoog ocotogcccc ogggctooct ttcogattcq ottccagaao

ttoctcocqa aøtaotgga

i00 þpendix 1-2C Dugong delto-globin gene sequence

tcccctttgo ooccoctgct ttgooacaoa ogtgttottc ge-aqcttgtg cttotoottg

otttttoctt gtttggaccc ccaoogccto otatggtgoc cttgcototo gcottcttcq

tttotattto ttcogttttc ogcgotcotg totttggtto oocgoccaaa oaaagaaoça

181 aacaolc1oo coaatgaotg oatottoqtt ttcttocco g gagaatttoo tccaqotcag

241 gagaoagaag aootgtttog oottgtcqcq gotgtttcot tcottctgtc cttgtootto

tcttggotot tctggooøca caggootogc tccotcctoo toctcctaaa ctgaacccrco

361 gtgggcaatt cctttctgcc ctctgggcct tggtctcctc ottcgtgtoc taogogooct

42r ggagogtago tgtctoagog accaggcagt tottctooqo oqaccctcot oootooqctt

481 gccaogggag otgttttcag aagtoottcc tgoaagcaat gggoccoooo gataagtaga

541 ggoogagtga gggtctgaao tcoaoctoct aagørtggtgc cogoctg cta aggacaggta

cggctgtcot cottcogucc tcqeçet gco gaoccoçqee etggcctygg ccoatctgirt

66r cccoggagcq aga^gggcc¡g ooqccagggt tgggcgtqoo oggaagrgco ggoctogctg

ctgcttoctc ttocttctgg cacaactgtg ttgoctogco octococoot cogccwccAT

78r GGIGEAIE]G ACTGCTGATG AGACGGCTTT GGTCACCGGC CTGTGGGCCA AGGTGAACGT VHL TADE TAL VTG LITAK VNV 841 GAAAGAATAC GGTGGTGAßG cccrccccAc gtttgtgtct aggttggaqg gcoggcttqo KEY GGEA LGR ggogggttga ttggggctgg gcatgtggog CIcogaocagt ctcccagttt ctgocogtco

ctgoctccct ctgtcccctg tggtgctttc occcttcogQ CTGTTGGTTG TCTACCCATG LLVV YPlll 1021 GACCCAGAGG TTCTTTGAGC ACTTTGGGGA CCTGTCCTCT GCTTCGGCTG TCATGCACAÄ TAR FFEH FGD LSS ASAV MHN

101 Appendix 1-2C Dugong delto-globin gene sequence

cccTAAAGTG AAGGCCCATG GCGAGAAAGT GTTGGCCTCCEüSTGACG GCCTGAAGCA PKV KAHG EKV LAS FGDG LKH CCTGGACGAC CTCMGGGCG CCTTTGCTGA GCTGAGTGCG CTGCACTGTG AG&\ËTEGEA LDD LKGA FAE LSA LHCE KSH CGTGGATCCT CAGAACTTCA AGgtgogtct aggogactct ctoctttttt cttttcactt VDP ANFK tgtogtcttt coctgtggtt ottttoctto cttgcotttc ctccocotct ctttocttgo

ctgtotttgo gootttggtg ctttttcooc ttococcott ttgtottott ctttctttco

1381 otgttcttcc ttttctcttg tctcocogtc ttgctttoto tcotgccctt totttoottt

cctgcctttt gcctctgtct ctctctcgat ctcctttttt cctqotttcc ttccccttoo

ocoqtoccco ogttotgcat occocctctc otccoctoct tctgcActtg ggcaaotcct

actgctcttc cototgoggg tLagooagga ctgootcoao gaggggaggo tcatggtgtt

7621 gttctogogt ctgogactcg ttttgqoott gaoggøtoat ttgootoato tqoootcagg

ogtooatggg aaggaaogtc tototctgog aatggaagat cagaoggoco tattggogct

1741 goggagcago ogttoctaag agscsgocco ttqttoctct qattaotcaq ttoqttogtt

1801 ootgogtotg cttgctoatt totttocttg ttttttoott ttogtgggao taogtttggg

1861 ccgtcogttt gggctoctgt gtgggcooco toootgggtg tcogcccott ttctcogagg

1921 ccoogctggo ttgctttgtt ooccotgtct gtgtotctot ctocctcttc cccqcqgeTe L

1981 CTGGGCAATA TGCTGGTGTG TGI-GCTGTCC CGCCACTTGG GCAAGGAATT CTCTCCACAG LGNM LVC VLS RHLG KEF SPA GCACAGGCTG CCTATGAGAA GGTTGTGGCT GGGGTGGCGA ACGCCTTGGC TCACMATAC AAAA YEK VVA GVAN ALA HKY 2101 eAeTGAgott ctggcctott tcctggtotc cactggaagc cctgtttccc tagatgctot

H

t02 Appendix l-ZC Dugong delto-globin gene sequence

2161 ctctgoottt ggggaaotao tgcccoct c t caogggtota gettctgcct oqtooqgooc

2221 tttcogctco octttctgot tcqtttcott tottttuttt cttqctcttg aggtotcgaa

ogttccccao ggctotatgg otoaggagtt ottgtgtctc otogg aagog ttcoogtgoo

234r gttogoaoot gaogggoccc otocacogct ottootg gga otoottctoc ttctoaggco

tootggttgg ggaggtttgg caaatagago ggatgctggc cctac atgga ttccatogoc

2467 cttqqoccto ogocotogcc ccogggctoo cttccogatt ctqttccgac oqttactcoc

aoaotootgg gctcoottog ogoogtcot g otgottgoao qcottgtttt gttttcttgc

ttcctocttc tcttgcctgt ocottttogc ccocootcto toccctttct ogtctttct

103 Appendix 1-2D West Indion monotee delto-globin gene sequence

oooacagtgc cagoctgctc oggoc gggto cggctgtcot -cottcogocc tcoccctgco

gooccococc etggccttgg ccootctgct cccog gagca agoogggcag oaoccøgggt

121 tgggcgúss oggaogagco gggctogctg ctgcttococ ttqcttc tgg cocøact gtg

181 ttgoctooco octocqcoot cogococcAT GGIGCATCIG AcTccTGAAG AGAAGGCTI-T VHL TPEE KAL 24L GGTCATCGGC CTGTGGGCCA AGGTGAACGT GMAGAATAT GGTGGTGAGG CCCTGGGCAG VIG LITAK VNV KEY GGEA LGR gtttgtotct oggttgtaag gcaggcttoq ggagggttgo ttggggctgg gcotgtggog

acagaacogc ctcccogttt ctgacogtco ctgoctccct ctgtcccctg tggtgctttc

OCCCgTCOEQ CTGTTGGTTG TCTACCCATG GACCCAGAGG TTCN-TGAGC ACTTTGGGGA LLVV YPÏII TAR FFEH FGD CCTGTCCTCT GCTTCGGCTA TCATGAACAA CCCTAAAGTG AAGGCCCATG GCGAGAAAGT LSS ASAI MNN PKV KAHG EKV GTTCACCTCC TITGGTGACG GCCTGAAGCA CCTGGAAGAC CTCAAGGGTG CTTTTGCTGA FTS FGDG LKH LED LKGA FAE 601 GCTGAGTGAG cTccAcrcrG ACAAGTTGCA cGTcGATccr cAGAAcrrcA GGgtgogtct LSE LHCD KLH VDP ENFR aggagocgct ctoctttttt cttttcqctt tgtogtcttt coctgtggtt ottttoctto

cttgcotttc ctccqcotct ctttqcttgo ctgtotttgo goatttogtg ctttttcooc

ttococcott ttctottott ttttctttco atqttcttcc ttttttcctg tctcocogtc

841 ttgctttoto tcotgccctt totttoqttt cctgcctttt gcctctgtct ctctctccat

ctcctttttt tcctoqtttt ccttcccctt aoocqotocc coaqttqtgc atoccocctc

tcotccqcto cttctgcgct tgggcooqtc ctoctgctct tccqtotgag ggttagaaog

goctgootco aagqggggog gatcatggtg ttgttctogo gtctgogoct cqttttgaaa

104 þpendix 1-20 West Indion monotee delto-globin gene sequence

ttgooggota atttgaotoo tttooootco ggagtoaatg gggqggaoog tctototctg

1141 agaatgaaag atcagaoggo cotattgcgc tgaggogcag oagttoctoo gagacogocc

1201 qttqttoctc toottootcq ottqottogt tootgootot gcttgctoot ttotttoctt

gttttttoot tttogtgggo atoagtttgg gccctcogtt tgggctoctg tgtgggcooc

otaootgggt gtcaccccqt tttctcagag gccoogctgg ottgctttgt tooccotgtc

1381 tgtgtotcto tctocctctt ccccocogef CCTGGGCAAT GTGCTGGTGT GTGTGCTGGC L LGN VLVC VLA CCGCCACI-IT GGCAAGGAAT TCTCTCCAGA GGCACAGGCT GCCTATCAGA AGGTTGTGGC RHF GKEF SPE AAA AYAK VVA 1s01 TGGTGTGGCG AACGCCTTGG CTCACAAATA eeAeTGAgot tctggcctot rtcctggtot GVA NALA HKY H ccottggoog ccctgtttcc ctagotgcto tctctgoott cggggoooto otgcccactc

L62t tcaagggtat ogcttctgcc tgatoswoa ctttcogctc ogctttctgo ttcctctcot

ttattttqtt cct

105 Appendix 1-2E Rock hyrox delto-globin portiol gene sequence

AAGGGCACCT TTGCCCAGCT GAGCGAGCTG CACTGTGACA.4.GCTGCAIGI GGACCCTGAG KGTF AAL SEL HCDK LHV DPE AACTTCAGGg tgogtccogg ggatgctcta ctcttttcoc tttgttttco ttccttgtog NFR ttotttooct tccttgcott tcctctgtot ctctqcttoc ttgottgcot tgcotcottt

18r ogtocttttc coocttotoc ogcttgcott otttttcatt coaacttcct ctttctttcc

tgoctcocoo tcttgcttto cotcotgccc tttgtotooo cttatocctt tcgcctctct

ctgtctctct ctctctctct ctcctttttt cctoqtttcc ttttctccot totgcotggt

ccctctcact goctgcttct gcocttoggc cootcctoct tgtcctccot otcaggtttg

ggaoggactg aatcaoaggg gacacaatco tgotgttott ttogogtctg tgococottt

481 tgtooctgao gaotoattgg ootoooacoo aotcotgggt aaotggaogc gctootgcot

541 atctgaggat gaaogatcag aaggtcotot aacogotggg gagcagaagt tttto agqga

cogatogcto ttoctctogt cooctaatgo gtttottoto tgtttgctto tttatttoco

661 tgtttttoat tttggtggga otoagtttgg gctgtcogtt tgogctagtg tgctgtcogc

72L atagatggot gtcoccccot ttctqcogot gccaagctgg otcgctttgt taaccotgcc

tctgtotctg tctotccctt ccccocogef CCTGGGCAAT GTCTIGGTGG TTGTCCTGGC L LGN VLVV VLA 841 CCGGCACTTC CGTGAGGAGT TCACCCCAGA TGTTCAGGCT GCCTTTCAGA AGGTTGTGAC RHF REEF TPD VAA AFQK VVT TGGIGTGGCA AATGCTTTGG cTcAcAAATA CCACTGAgot cctgttttct ggtatctotc GVA NALA HKY H 961 ogoagcoctg tttccct agc tgtgatct ct gaottcaggg oaotootgcc cactc tcogg

1021 ggcotcoctt ctgcctootq aogooctttc ogctcoactt tatgottcot tttotttott

106 Appendix 1-2E Rock hyrox delta-globin portiol gene sequence

ttottttatt tctoctcoct tttgoootqt gggogtgtcc-tgaoaggclt ocagotoggg

1141 oocg cttgtt t ccooco gga ogagtt coog ggaoattgag oaatgooggg tqccttococ

ogttgttoot ttootgggoo toocttgoct cgtotgocot ooogottggg oootttggco

L267 aotogagagg otgctgctoc tcctgooatt ccotggaogt coooccctta cococogcot

1321 gaaggctact tttcqoottc atqqtcooco ottotttoca aaatoatagg ttcootoagg

1381 ggogtcatgo ttattgoooc tottgttttc cttccttcct tcctqcttct cttgtocogt

1441 ttoggtcoto ogctatgtcc ttgtaogctt tcctqctcto tggoaccotc ctcttctgtt

1501 gtt0gtccto oooctgttct gooootoatc gtttctcctg ttgttttcot ttctctgggo

gttcogcgct cotgtotcoc ctgtaocttc tcttocctco tttcootggo tttcctogto

L62t gtgttttoao oototttcc

I07 Appendix 1-ZF Rock hyrox deltoH-globin gene sequence

goctogcooc tocotgctct ggcoccATGG TGCGCCTGAC TúAIAGIGñ AAAGCTGAGG V RLT DSE KAE TCGTCTCCTT GTGGAGCAAA GTGGATGAGA AAATMTCGG CAGTGAGGCA CTTGGCAGgT VVSL lrSK VDEK IIG SEA LGR L?l ttgtotctog gccgtocgto aggcaggctg oacgaggalt gcccogggcc aogcctgtgg

181 agacaoaocg gtctcccogt ttctgacaga coctgogtcc ctctgcccoc tctgtotttt

cocccctcog GCTGCTGGTC ATCTACCCAT GGACTCGGAG GITCTTTGM CACTTTGGGG LLV IYPIT TRR FFE HFG ACCTTTCCAC TGCTGACGCT ATCCTGAAAA ACCCTCGGGT ACAGGCCCAT GGAGAGAAAG DLST ADA ILKN PRV AAH GEK TGCTGTCCTC CTTTGGGGAG GGCCTGAATC ACCTGGACAA CCTCAGGGGC ACCTTTGCCC VLSS FGE GLNH LDN LRG TFA 42L AGCTGAGCGA GcrGcAcrGT GAcGAGcTGc ATGTGcAccc rcAcAAcrrc AGGgtgogtc ALSE LHC DELH VDP ENF R caggggatgc tctoctcttt tcttctcact ttgttttcot tcogtgtggt totttgocct s41 acttocttgo otttctttco tgtcoctttt tttcttgoct gtotctcotc otttogtgct

601 ttttcrgctt otqctocttt gcoctgtttt tttctttcao tottctttot ttttgtctto

661 tootcctgct ttototcoog ccctttogct ttctacottt tgcctctctc tcttttccto

ocqqttccct ttcccctoaa cagtoctcqo ottocgootg ttocctctco tccogtgttc

otgtocttag gcoaotcctt ctggtcctcc gtatgogogg tggcaoggoo tgootcaaag

atgcaoagog tgttotgttg ttctogogtt tgtgoctcot tttgoootto ooogotoqtt

tgootootot oooat co gga gtaaaaggoa aggaaaatca gtotctg ggg otggaoogot

cogoogotco totaggt got ggagagcogo ogtttctoo g aaacogacca ctottgctgc

aactaogcoo ttogttogtt aototocttg cttotttgtt tccotgtttt tocttttgot

108 Appendix 1-2F Rock hyrax deltoH-globi.n gene sequence

gggortaaat tcgggccotc cgttt ggggc aocagaoatg g€rlqtcot cc cqtttt coco

1141 gatgccaogc tggottgttt tqttaoccqt gtctgtotat ctotctgtct cttctcccrcq

1201 gcrccl'GGGC AATATCCTGG TGGTTGTCCT GGCCCGCCAT TATGGCAAGG AATTCACCCT LLG NILV VVL ARH YGKE FTL L26L AGAGGTICAG GCTGCCTGTC AGAAGTITGT GGCTGGTATG GCAAATGCCT TGGCTCACAA EVA AACA KFV AGM ANAL AHK L32L ATACCACTGA grtcctggoc tgtttcctgg totccoctgg oogccctgtt tccctagotg YH 1381 tgocctctgo gtttgtoaot ogtgctcatt cgcaagggco ttgcttctgc ctsqtqqqgo

1441 occttcogct cgactttctg ottcotttto tttottttot ttcttotgco ctttoggogt

1501 olgggagtgt cctgoooogc ttocc gatag cgotctcttg tgtccco cog gcogagtccc

1.561. ogggaaottg gaaacggocg gatoccttot gcogttgtto otttootgg

109 Appendix 1-ZG Yellow-spotted hyrox deltoH-globin gene sequence

gactagcoac tocotgctct ggcoccATGG TGCGCCTGAC IGAilAGIGAG AAAGCTGAGG V RLT DSE KAE 61 TCGICTCCTT GTGGAGCAAA GTGGATGAGA AAATAATCGG CAG|-GAGGCA CTTGGCAGgT VVSL lrSK VDEK IIG SEA LGR .LzL ttgtotctog gccgcaoggc oggctgaocg oggottgcc c agggccaogc ctgtg gagac

oaaocggtct cccogtttct gocogacoct gogtccctct gcccoctttg tottttcocc

247 CCTCO9GCTG CTGGTCATCT ACCCATGGAC TCAGAGGTTC TTTGAACACT TTGGGGACCT L LVIY PtrT ARF FEHF GDL 301 TTCCACTGCT GACGCTATCA TGAAAAACCC TCGAGTACAG GCCCATGGAG AGAAAGTGCT STA DAIM KNP RVA AHGE KVL GTCTTCCTTT GGGGAGGGCC TGAATCACCT GGACMCCTC AGGGGCACCT TTGCCCAGCT SSF GEGL NHL DNL RGTF AAL 42r GAGCGAGCTG cAcTGTGAcG AGCTGCATGT GGAcccrcAG AACTTCAGGg tgogtccocg SEL HCDE LHV DPE NFR 481 ggatgctctq ctcttttctt ctcoctttgt tttcottcoc tgtggttott tgocctactt

octtgoottt ctttcotgtc actttttttc ttgoctgtot ttcstcattt ogtgcttttt

601 cogcttotoc toctttgcoc tgtttttttc tttcqotqtt ctttqttttt gtcttotqqt

661 cctgctttot otcaogccct ttogctttct ocottttgcc tctctctctt ttcctoocqo

ttcccttccc cctooocagt cctcoqattq cgootgttoc ctctcqtcca gtgtttotgt

781 octtoggcoo otccttctgg tcctccgtot gagøggtggc oaggaotgoo tcooggotgc

aoagagtatt otgttgttct ogogtttgtg octcottttg ooottoooog atootttgao

toototooq a tcoggagtaa aoggaaagga oootcootqt ctggggo tgo oogatcagaa

ggtcototo g gtgacggogs gcogoogttt ctoogoooco goccoctott gctgcooctq

agcoattogt togttoatot octtgcttqt ttgtttocot gtttttoctt ttgotgggao

110 Appendix 1-2G Yellow-spotted hyrox deltoH-globin gene sequence

1081 toqottcggg ccot cogttt ggggt aocog oaotgggtat cÕtc-ccottt t coco gatgc

1141 caøgctggøt tgttttottq accotgtctg tgtotctgtc tgtctcttct ccocqgeleg

L TGGGCAATAT CCTGGTGGTT GTCCTGGCCC GCCATTATGG CAAGGAATTC ACCCTAGAGG LGNI LVV VLAR HYG KEF TLE 1267 TTCAGGCTGC CTGTCAGAAG TTTGTGGCTG GTATGGCAAA TGCCTTGGCT CACAAATACC VAAA CAK FVAG MAN ALA HKY AelGAgotcc tggoctgttt cctggtotcc ottggaogcc ctgtttccct ogatgtgocc

H 1381 tctgogtttg toaatagtgc tcottcgcqo gggcottgct tctgcctOO! qaogoocctt

cagctcgoct ttctgattco ttttqtttat tttotttctt otgtocttto ggogtotggg

ogtgtcctgo ooogcctocc gotagcgotc tcctgtgtcc cqcoggcogo gtcccoggga

aattggoooc ggocggotqc cttotgcogt tgttaottto atgg

111 Appendix 1-2H Western tree hyrox deltoH-globin gene sequence

catgctctgg coccATGGTG CGCCTGACTG ATAGTGAGAA 4çCTGAGGTC GTCTCCTTGT V R L T D S E K -A-E V V S L GGAGCAAAGT GGATGAGAAA ATAATCGGCA GTGAGGCACT TGGCAGgttt gtotctoggc Y{SKV DEK IIGS EAL GR cgtaaggcag gctgaacgag goctgccagg gccoogcct g tggogocaoa acagtctccc

ogtttctgoc ogacocttog tccctctgcc coctctggto ttttcqcccc tcogGCTGCT LL GGTCATCTAC CCATGGACTC AGAGGTTCTT TGAACACTTT GGGGACCTTT CCACTGCTGA VIY PlrTA RFF EHF GDLS TAD CGCTATCATG AAAAACCCTC GAGTACAGGC CCATGGCGAG AAAGTGCTGT CCTCCTTTGG AIM KNPR VAA HGE KVLS SFG 361 GGAGGGCCTG AATCACCTGG GCGACCTCAA GGGCACCilT GCCCAGCTGA GCGAGCTGCA EGL NHLG DLK GTF AALS ELH CTGIGACGAG CTGCATGIGG ATCCTGAGAA CTTCMGgtg ogtccogggg otgctctoct CDE LHVD PEN FK ctcttcttct coctttgttt tcqttcgctg tggttotttg octtocttgo otttctttco

541 tgtcoctttt ttcttgoctg totttco'tco tttogtgctt tttcogcttq tqctoctttg

cqctottttt ttctttcoot ottctttgtt tttgtcttot ootcctgctt totqtcaagc

cctttogctt tctocotttt gcctctctct ttctcttttc ctqocoottc ccttccccct

721 oaotogtoct cqqqttocgo otgttqcctc tcotccootg tttotgtcct taggcoaotc

781 cttctggtcc tccotqtgog oggl.ggcaag gogtgootca oogatgcoaa gagtattotg

841 ttgttctogo gcttgtgact cottttgaoo ttoooogato otttgootqo totooootco

901 ggagtoøaog gaoaggaooo t cogtotctg gggatgaaag ot cogooggt cotoc aggtg

atggagagco googtttcto agaaacsgoc coctottgct gcooctoogc oottogttog

1021 ttqqtotqct tgcttotttg tttocotgtt tttqqttttg atgggoatoa gtttgggcco

rt2 Appendix 1-2H lYestern tree hyrox deltoH-globi.n gene sequence

tcogtttggg gcaocogooa tgggaotcot cccottttco aiþn_tgccoo gctggottgt

1141 tttattoocc otgtctgtot otctqtctgt ctcttctcco cogCTCCTGG GCAATATCCT LLG NTL GGTGGTTGTC CTGGCCCGCC ATTATGGCAA GGAATTCACC CTAGAGGTTC AGGE]GEEIA VVV LARH YGK EFT LEVA AAY TCAGMGTTT crcccrccrA TGGCAAATGC cTTGGcrcAc AAATAccAcT GAgotcctgg AKF VAGM ANA LAH KYH L32L cctgtttcct ggtatccott ggaagccctg tttccctogo cgtgocctct gogtctgtoo

otogtgctco ttcgcooggg cotLgcttct gcctcctcca gaoccttcog ctcooctttc l-441 tgottcottt tgtttotttt otttcttotg coctttogoo gtotgggogt gtcctgaoaa

gcttaccgot ogcgatctct tgtgtcccat oggcagagtc caogggaaac tggaaaalcga

1561 cgggtacctt atocogttgt tootttootg g

113 Appendix 1-ZI Bushveld elephont shrew beto-globin portiol gene sequence

ctttooatgt oootttttgC tcqgtoøctg tgoctgtoot JcoCtotcct tgttgttgoo

61 gogtcootgc cooggotqtq qocoaotoaa tcootggoto tttcoggooo tttttotcog

72t ggotctttoa tataagocot ggtooaggoa goototttg g aatagtggct gottocttgt

tcottttotc tttgaootoo ttttgtotog tcccoagcot ogtocttoct ggctctgtcc

24t oaocottgtt oaoctgccco otagtggtao agtotctccc ocogtcoogo tctcootgtt

otcotttott toctaootgo otgocataag oaotocctqo gogccttogc ogcttgtttt

361 ttttttooot otocotoctt gctooaaago tgtttttcgt ttcqotttct gagaggootg

421 tgottogoga taogtoggag ogtgagggcc tgoootcooo ctcqootgoc ogtgccogcc

481 tgccootgoc ogtcogogct gtgotcactc cgggctcttt ctgcogagtc actctggcct

gggccaøtct gcttgcogqq gcoccatggg coggqcccag ggctgggcot øgaagaøggg

coggaccagc aaoggcttoc acttgcttct gocacaogtg tgttcactog coactqctcq

OOCOgtCoog ATGGIGEAIE TTACTGACGG CGAGAAGGCT CTGGTCAATG GCATTTGGTC VHL TDG EKA LVNG lylls 72L CMGGTGGAC GIAGATAÂAC TTGGTGGTCA GccrcrGGcc rGgtgagtot tttgottoog KVD VDKL GGA ALG tT gcoggtttoo ggotgcttga otgggactgg aotatggago cogccgtttt ccoggcttct

cocoggtott goctctttct ggcccctgtg tttcttttoc ccotcogfel GCTGATTGTA L LIV TACCCACGGA CTCAGAGGTT CTTTGAATCC TTTGGGGACC TGTCCTCTGC TGATGCTATC YPRT ARF FES FGDL SSA DAI 961 ATAAAGAATC CCAAGGTGGC AGCCCATGGC AAGAAAGTAG TGAACTCCTT CAGTGAGGGC IKNP KVA AHG KKVV NSF SEG 1021 ATGAAGCATC TGGATGACCT CAAGGGCACC TTTGCCCAGC TGAGTGAGCT GCACTGTGAC MKHL DDL KGT FAAL SEL HCD

lt4 Appendix 1-ZI Bushveld elephont shrew beto-globin portiol gene sequence

1081 AAGCTCCATG TGGATCCTGA GAACTTCAGG gtgogtctot gggaotooog ogtttccttc KLHV DPE NFR 1141 tgctctctot ttttgtcooo ttctogcagg gggagggggt tccoggtcag oattgoaaca

oagtgttctg octacoaagt aggaactctc cogggccttt ttaggaogto tootctcttc

L26t cttctocooo tctttottgc ccctgttgco tggttcogtg ttcctttgtc ttcttcttct

1321 ooctocttcc tcctctoctt octttttqtc octtcqoooo tgotttaooo oottoqocct

1381 tcttctttaa otctoctagg ggtttcccct tctqttctto goctttcoga attcatottt

1441 ttooogcotg gggoatoaoo tggttctttt tgocottoca tocgtttttt ooooggtcoc

taotooootc tgooctoago tggotggagg ooootottco catgcotooo ttcogactga

1561 totgocooca ctgocatccg togtogooct oogtgtcoot totcqcocco ctcotottto

1621 gtgcocaogo gtcoogtgtg ogtgcogctt gtgatgccct ootctcgotc octctgccag

ccoaottqot gtgtcttatc tttctcctco g

115 Appendix l-ZJ Bushveld elephont shrew detto-globin portiol gene sequence

t AATGTGGCCA ACGCCCTGGC'TCACAAGTAC CACTAAgotc ctgcc_ctttt ccttgggotc cotg

NVAN ALA HKY H 7t ctctgtctcc ctogttctga cccctooott gggagoogaa ccoooogtgt oototttgct tqot

141 ctttcogctc-ooatttttta ttoatttctt tttttttctt ottcotccta ctgtotagga goga zLL aggaca

116 Appendix 1--2K Nine-bonded onnodillo pseudo-delto-globin gene sequence

tacaaagogo coccATGGrG AATCTGACTT CTGTGGAGAA- geIGeIGIe ACAGGTCTGT

V N L T S V E K-S-A V T G L GGAGAAAGGT GAACGTGAGA IAAtgtggtg gtggctctgg gcaggtttgt gtggaggttø WRKV NVR 121 ccroggcogct tooggtggga ogatgcoogc tgggcotgtg gagocatogc ogtctcctgg

gtttotgoco ggtoctgoco gctgtcccct ttgctgtttt cqcccctcog GCTGCTGGTT LLV 241 GTGAACCCTT GGGCCGAAAG TTTCTTTGAA ACCTTGGGGA ACTTGTCCTC TCCTTCTGCC VNPW AES FFE TLGN LSS PSA ATAÏNGGCA ATAGTAAAGT GAAGGCTCAT TTCAAGAAGG TTCTGACTTC CTTTAGTGAT IFGN SKV KAH FKKV LTS FSD GGT TGATGA AGCACCTGGA CAACCTCAAG GGTACCTATG CTCATCTGAG TGAACTGCAC GVMK HLD NLK GTYA HLS ELH 42t TGTGACAAGC TGCACGTGGA TccrcAGAAc rrCAÄAgtgo gtctoggooo tottccqott CDKL HVD PEN FK ttttcqtttc tootttttgo ctgtgcctct ttoocctgtt ggctttacct ccocot,rrcco

541 ttttoctttt tctototctt ocatucttout gctttctoao ctttotgtco otctttcctt

tcoatottct cotgogctto tqtttqqtco agctctttoa ttqotttcct ogcttttgcc

cctctctccc tttttcttoq ttttctttcc ottooccogt octcqoqtto tgcctqccog

721 cttogotctc otctgctoct tctgcctttq ogooootoct tqttttcctt cooqtgtagg

ttggtoaogt ttgoottaoa goaoagoggc ogocootgtt tttctogooo ttgtgcqtco

ttttoagatt tgoottgttt aoogtcagog oggoaoatto ototctgagc otgoaacotc

ogoagotcot ataggoggtg ggatagcoao agttootog g agacagccco totcocatto

ootootcqqt gtotcogtto ottootgttt otttotagot ttttootttt tqttttggtc

1021 ggootogcct goggotctgt tgtggctoog ggotagttog ootoootggg gaocacccco

TT7 Appendix L-ZK Nine-bqnded qrmodillo pseudo-delto-gLobin gene sequence

10Br gtgtctcogg agtcaogctg gogtcctttc tttoctotgt occtocctct -ctctgtgtcc

1141 tcccttcage TCCTTAGCAA CATGCTGGTG ATTGTGCTGG CCTGTCACTT TGGCAAGGAC L LSN MLV IVLA CHF GKD 1201 TTCACCCTGG AGTTGCACGC TGCCTTTCAG AAGGTCGTAG TTGGAGTGGC GAATGCCCTG FTLE LHA AFQ KVVV GVA NAL L26L GCTCACAACI- AeIA[GAgo ttctggtctg tttgct AHNY Y

118 Appendix l-ZL Nine-bonded qrmoidirro beto-globin gene sequence

tatoaagago coccATGGrG MCCTGACCT CTGACGAGAtr GAçTGCCGTC CTTGCCCIGT V N L T S D E K-T A V L A L GGAACAAGGT GGAccTcGAA GACTGTGGTG GTGAGGcccT GGGCAGgttt gtotggoggr WNKV DVE DCGG EAL GR 121 tocoaggct g cttaoggagg gaggatggaø gctgggcotg tggagacogo ccqcctcctg

181 gotttotgoc aggooctgat tgctgtctcc tgtgctgctt tcocccctco gGCTGCTGGT LLV 241 CGTGIATCCC TGGACCCAGA GGTTCTTTGA AAGCTTTGGG GACTTGTCCA CTCCTGCTGC VYP lllTAR FFE SFG DLST PAA 301 ÏGTGTTCGCA MTGCTAAGG TAAAAGCCCA TGGCAAGAAG GTGCTAACTT CCTTTGGTGA VFA NAKV KAH GKK VLTS FGE 361 AGGTATGAAT CACCTGGACA ACCTCAAGGG CACCTTTGCT AAACTGAGTG AGCTGCACTG GMN HLDN LKG TFA KLSE LHC 42L ïGACAAGCTG cAcGTGGATc CTGAGAATTT cAAGgtgagt cootottctt cttcttcctt DKL HVDP ENF K ctttctotgg tcoogctcot gtcotgggaa aoggacotao gogtcogttt ccogttctco

541 otagaooaoo ooottctgtt tgcotcoctg tggoctcctt gggaccattc otttctttca

601 cctgctttgc ttotogttot tgtttcctct ttttcctttt tctcttcttc ttcotoogtt

tttctctctg tottttttts qcocaqtctt ttaottttgt gcctttooot totttttoog

72t ctttcttctt ttoottocto ctcgtttcct ttcotttcto toctttctot ctqqtcttct

cctttcoogo gaaggogtgg ttcoctocto ctttgcttgg gtgtoao gaa taocagcoot

841 ogcttooott ctggcotaat gtgaotaggg oggocoottt ctcotot aag ttgaggctga

tottggaggo tttgcot tag togtagaggt tocotccogt toccgtcttg ctcotoottt

gtgggcacao cacagggcot qtctt ggaoc aaggctagoo tottctg oat gcaaactggg

7021 gocctgtgtt ooctatgttc otgcctgttg tctcttcctc ttcogcrccr GGGCAATATG LL GNM

119 Appendi.x 1-21 Nine-bonded ormoidillo beto-globin gene sequence

1081 CTGGTGGTTG TGCTGGCTCG CCACTTTGGC AAGGAATTCG -A€IGGEAEAI GCACGCTTGT LVVV LAR HFG KEFD rÚHM HAC 1141 ÏI-TCAGAAGG TGGTGGCTGG TGTGGCTAAT GCCCTGGCTC ACAAGTACCA rTGAgctcct FQKV VAG VAN ALAH KYH ctcccooctt tccogttcct qcoooaggtg cttttgtcct cogogtccoo ctqctgootg

tgggaaoatt ototogogcc ttgggootct ggttgtgcct ootooogooc otttotttcc

1321 octgcatttg tgtotttsoo ttotttctgc qtqtctcqct cogotgggco totgggaggc

aag

r20 Appendix 1-2M Pole-throoted sloth pseudo-delto-globin gene sequence

tcoctagcoa ctacaosoco gococcATGG TGTATCTGTT.SGCAIGAG GTGtcTGccG V Y L F-A-H E V S A TCTCCGGCCT GTGGGGCAAG GTGAATGTGC AACAACTTGG TGGCGAGGCC CTGGGCAGgT VSGL IIGK VNVA ALG GEA LGR tggtoctgog gttotooagc ogottaagga gggoggttgg aagctgggct cctggagoca

gogcogtctc ctgggtttct gocoggcoct goctccttct gtcccctgtg ctoctttcoc

ccttcogGCT GCTGGICGTG TACCCCTGGA CCCAGAGGTT CTTTGAAAGC TTTGGGGACT L LVV YPITT ARF FES FGD TGTCCTCTGC TGATGCTGTG TTTTCCAATG CTGAAGTGAA GGCCCACGGC AAGAAGGTGC LSSA DAV FSNA EVK AHG KKV TGACCTCCTT CGGTGAGGGT CTGAGCACCT GGACAATCTC AAGGGCACCT ATGCTCACCT LTSF GEG LSTltJ TIS RAP MLT GAgcgogctg coctgtgaco agctgcocgt ggotcctgog oocttcoogg cgogttcogg

ogotgttcca otttttttcq ttttcttgtc tttcactgto cttctttooc ttqctogctt

541 tgcctccoca ttccttttco ctttttctqt ottttqtcco ttoootgctt tttagoottt

601 acotccttgt tttctttcao tottctttgt ttcctatctc otgoccttqt tttocotcqq

661 gctcttactt toctogcctt tgcctctctc tccctqtttc ttootgtttt ttccoctooc

cootoctcoo ottotgcaga ccagctotto tctgctoott ctqcccttgg gaoottcttc

ottttctcco aottggggtt ggtooogcct goatcaaogo gaogaggcoc otgotoctct

841 agoaoctgtg cotoottttg oogtttgaot tgccagaogt caggaotooo ttgggaggag

agtcogtot c tgogcat gaa agat"cagaog gtcatotog g oagtgtgoga cagcccotot

961 cqcactaoct qotcootgao tttgttoott ootttotagq ttcocttttt taaogtttgg

L021 tgggoataoo attgggotcc gttttggcto gggtgtgggt ogoatoootg ggcottaccc

LzI Appendix 1-2M Pqle-throqted sloth pseudo-delto-globin gene sequence

1081 cogtttctc o gaagtcaagc t ggattcgtc tgtgooccot -gft c_tot gtgt ctocctocct

1141 CtgCcctcog CTCCTGGGTA ATGTACAGGT GATTGTGCTG GCCTGCTGCT TTGGCAGGGA LLGN VAV IVL ACCF GRE 1201 ATTCACCCCG CAGTTGCAGG CTTCCTGTCA GAAGATGGTG ACTGCTGTGG CTATTGCCCT FTP QLQA SCQ KMV TAVA IAL L26L GGCTCACAAG TAccATTGAc otcctggcct gtttgctggc otccotcggo ogtccctgtt AHK YH 1321 tccgtotott ctgtcctcog oocttgggga oooaotgttc occqtcoogo ocottqcttc

1381 tgcctttttt ttaqttctct ttotttooot ttotttttgt totttctgcc tqqtqqqgot

1441 ctttcotctc oacctgotga ttcotctcoc ttottttctt tcotttttgc tcogtctogt

1501 ggtal.gggao gatcccttog ggtct ocoaa togggoactc otgtgtctto tgooo ogotg

1s61 tcaagggaaa tggaaagotg oaggaggtct

r22 Appendix 1-2N Pole-throated sloth beto-globin gene sequence

o ctccctgg a aggoaggggo gggcccsggg cttgg cCtoo _aagpr1ggaga agggccogct

gctoctcoco cttgcttctg occcogctgt gttcoctogc ooccocoaoo cagococcAT

GG]GEAEEIG GCTGACGATG AGAAGGCTGC TGTATCCGCC CTGTGGCACA AGGTGCATGT VHL ADDE KAA VSA LtrHK VHV GGAAGAATTT GGTGGCGAGG CCCIGGGCAG gttggtoctg aggttataog gcagattoca EEF GGEA LGR gogggatgat ggoogctggg ctcctggoga cogagcogtc tcctggtttc tgocaggcoc

tgoctccttc tgtcccctgt gctoctttcq cccttcogGC TGCTGGTCGT GIAccccTGG L LVV YP$I ACCAGCAGAT TCTTTGAAAG CTTTGGGGAC TTGTCCTCTG CTGATGCTGT GTTTTCCAAI TSRF FES IGD LSSA DAV FSN GCTAAAGTGA AGGCCCACGG CAAGAAGGTG CTGACCTCCT TCGGTGAGGG TCTGMGCAC AKVK AHG KKV LTSF GEG LKH CTGGATGATC TCAAGGGCAC CTATGCTCAC CTGAGCGAGC TGCACTGTGA CAAGCTGCAC LDDL KGT YAH LSEL HCD KLH 541 GTGGATCCTG AGAACTTCAA Ggtgogcctg cgggcacttc ogtgttctcc ttcctccttc VDPE NFK tttctatgot coogcttgtg tcotgggoaa ogggcocagt otccogggtc cogtttggao

oagaaaaoat cttctggttt tntocctotg goctccttgg ogctotttot tttctttocc

tgctttgttc qcqatcottg ttttctcqtt tcottcttct ttttcttttc cotootttct

tctgccttto tttttottto oocttttcot tttgtgcctt tooqttqttt ttoooctttc

841 ttcttttqqt ccqctatttt tcttqtctct otqttttctt tctctqotgt tttctttcct

ogcgoaggoa cggocagctg ctoctttgca taggtctooo gootcoctot gttooogctc

961 toogttgag c tgggtgggga gsgccotttc tgcotgtoco ctcoggc tgg tgtgggaggo

gcogcgtcag tggcagaggt oocatctggc tqtcotcctg ctctngottt gtgggcoooc

t23 Appendix 1-2N Pqle-throoted sloth beto-globin gene sequence

1081 cctogncoco gtttonotgo tgctggoata ttctgottcc-ogtltgggqc cctctgttoo

ctotgttctt gcctcttttc tcttcccctc ogeIeTTGGG TAACGTACTG GIGATTGTGC LLG NVL VIV 1201 TGGCCCGCCA CTITGGMAG GAATTCACTC CGCAGTTGCA GGCTGCCI'AT CAGAAGGTGA LARH FGK EFTP ALQ AAY AKV CGACTGGTGT GTcTACTGcc crcccccAcA AGrAccAcrG Agcoccoctt tctgcttt TTGV STA LAHK YH

t24 Appendix 1-3. Lengths of exons and introns for sequenced 'pJike' globin genes. Values do not include those of the initiation and termination =cqdons. "--,, denotes missing data. '- l-

Taxon Exon I Intron I Exon 2 Int¡on 2 Exon 3 Gene length

African elephant ô 89 bp 128 bp 223bp 729 bp 126bp l.295bp

Asian elephant ð 89 bp 128 bp 223bp 729bp 126bp t,295 bp dugong ô 89 bp t29bp 223bp 755bp 126bp 1,322bp

manatee ô 89 bp 129 bp 223bp 756bp 126bp 1.323 bp

rock hyrax ô 739 bp 126bp

rock hyrax ôH 89 bp 132bp 223bp 728bp 126bp 1,298 bp yellow-spotted hyrax ôH 89 bp 128 bp 223bp 727 bp 126bp 1,293 bp

tree hyrax òH 89 bp 128 bp 223bp 726bp 126bp t,292bp

elephant shrew p 89 bp 125 bp 223bp 601 bp

sloth r¡rò 89 bp 129bp 223bp? 680 bp 126bp 1,247 bp sloth p 89 bp 128 bp 223 bp 6llbp 126bp 1,177 bp

armadillo rþð 87 bp 126bp 226bp 693 bp 126bp i,258 bp p armadillo 89 bp l25bp 223bp 6l I bp 126bp t,t7 4bp

125 Appendix I-4. Lenglhs of additional sequence dat¿ obtained for the 5' external region of sequenced'pJike'globin genes, and positions of upstrearl features relative to the putative CAP site. The positions of putative cap sites fronì-ihe initiation codon ("ATG') are also shown. Lengths of collected 5' end sequence data do not include that of the initiation codon. "--" denotes missing data.

Length of CCAAT Putative Taxon CACCC site ATA site obtained 5' site CAP site external data

African elephant ò -26bp

-104 bp, -52bp Asian elephant ò -75 bp -31 bp from -400 bp -89 bp .,ATGU

-31 -105 bp, bp -52bp dugong ô -76bp mutated into from -778bp -90 bp UGTA" ''ATGU

-31 bp -105 bp, -52bp manatee ð -76bp mutated -90 bp into from -208 bp UGTA" "ATG"

rock hyrax ôH -26bp yellow-spotted hyrax òH -26bp

tree hyrax òH I4 bp

distal absent; proximal at -52bp elephant shrew p -89 bp, -75 bp -30 bp from -670 bp mutated into UATG" ''CACTC"

sloth r¡rô -26bp

-51 bp sloth B -31 bp from l18bp "ATG" armadillo rþô l4 bp

p armadillo l4 bp

126 Appendix l-5. Lengths of additional sequence data obtained &ryhe 3' external region of sequenced 'BJike' globin genes, and positions or põþ-adenylailon signals (AATAAA) relative to the termination codon. Lengths of collected 3' end sequence data do not include that of the termination codon. "--" denotes missing data.

Length of obtained Taxon AATAAA signal 3'extemal data

African elephant ô +103 bp 417 bp

Asian elephant ô +103 bp 418 bp

dugong ò +104 bp 533 bp

manatee ô +104 bp 156 bp

rock hyrax ò +99 bp 702bp

rock hyrax ðH +102 bp 279bp yellow-spotted +102 bp hyrax ôH 279bp

tree hyrax òH +102 bp 279 bp

elephant shrew ô +95 bp 180 bp

sloth r¡ò +l52bp 3ll bp

sloth p 17 bp

armadillo {rô 18 bp

armadillo p +106 bp 189 bp

127 Appendix 2-1. Primers used in the (A) amplification and (B) sequencing'reactiqns of 'e-like'globin genes.

(A) Primers used in poll'rnerase chain reactions.

Primer Name Sequence HBB-12(R) 5' GCC ACC ACC TTC TCA TAG C'cA GC 3' HBG-1(F) 5' CCA TCA TGG GCAACC CCA GG 3' IIBG4(F) 5' CTG TCA TCA CCA CGAACT 3' IIBG-s(F) 5' CCAAGA CCG GACACC ATG 3' HBG6(R) 5' TTAATT GTG CTG TCA TGT GAA 3' IIBG-7(R) 5' AAC A AG CCT TCT CCC TAT T 3' MNT-GA(F) 5' CCA TGG CGA GAA TGT GCT GAA C 3' DUG-GA(R) 5'GAT GGT CCTACTTAT GCT CAGATCAGT GCT 3' DUG-GB(R) 5' TGA GGC TAAACA CAA GAAAGC ACA GAC ATA 3' ELE-GBß,) 5I TGA GAC AJAU{ACA CAA GAGAGC ACA AAC ATA 3' DUG-GC(R) 5' GTG CAG CTC ACT CAG CTT ACG AJd{ CrcT 3' DUG-GA(F) 5' ACT GGG TAA GAC CACAGC CTT TGA G 3' DUG-cB(F) 5' CIGC CAGACT GGAACT CTC TGC TCA C 3'

DUG-GC(F) 5' C'GG AAA CGT GATAGT GAT TGT CTT CTGC 3'

GGW-A(F)-RH 5'CTA CAGAIGCCAAGC TGGATC GCTTTGTTA 3' GGW-A(FISTH 5'AGGAGT GGG TAG GCA GTAATG GTA TTTAGC 3' GGw-B(F) 5' CCC AAC AGC TCC TGG GCAATG T 3' GGW-C(FIESTySTH 5'TGG CTT CACATTTTG GCAAGGAGTT 3' GGw-C(F)-RH 5' GCT CGG CAC TTC CGT GAG GAG TTC 3'

GGW-A(R)-RH 5'AAT GAC GCAACG CAA TCAAGTAAG TAGAGA 3' GGW-A(RISTH 5' CTC CTAACA CTGAAC TTGACC CAI TAI T 3' GGW-B(R)-RH/ESH 5' TGT CCA TGT TCT TAA CAG CTT CTC CA 3' GGW-B(R)-STH 5'TGAGGT CAT CCATGTGCTTAACAG CATCT 3' GGw-C(R) 5' GTC AGC ACC TTC TTG CCA TGA GCC TTG 3'

(B) Primers used in sequencing reactions at the Unversity of Calgary DNA core laborafory.

PrimerName

M-13(F)40 5' GTT TTC CCA GTC ACGAC 3' M-13(R) 5'AAC AGC 'IA-I GAC CAI G 3'

r28 Appendix 2-2. DNA sequences of the 'e-like' globin genes-amplifiel in this study. Exons are identified by CAPITAL letters, while introns and external gene sequences are represented by lower case letters. The 5' flanking transcriptional control motifs (CACCC, CCAAT, ATA) and 3' poly-adenylation signal (AATAAA) are underlined within the 5' and 3' flanking regions of each gene sequence, respectively. Within each coding block, fanslated amino acids are listed below their conesponding base triplet.

Appendix Page

2-24 African elephant y-globin gene sequence 130

2-28 Asian elephant y-globin gene sequence t32

2-2C Dugong y-globin gene sequence 134

2-2D West Indian manatee y-globin gene sequence t36

2-28 Bushveld elephant shrew e-globin partial gene sequence 138

2-2F Pale-throated sloth e-globin gene sequence 140

r29 Appendix 2-24 Africon elephont gommo-globin gene sequence

GTGCATTTTA CTGCCGAAGA GAAGGCTGCT ATCGCAAGCC-I4TGGGGCCA GGTGAATGTG VHFT AEE KAA IASL-tTGA VNV EETG GEA LGR L2L ataoaggaag ctgogcctgg coogaotgco gaccogttoc coogttttgt goctoggtct

gottttccot ctgttotggt tccotcccot ogGeTTeIGG TTGTCTACCC TTGGACCCAG LLV VYP yllTa 24t AGGTTTTTTG ACACCTTTGG CAACCTATCC TCTGCCTCTG CCATCATGGG CAACCCCAGG RFFD TFG NLS SASA IMG NPR GTCAAGGCCC AIGGY¿AGM GGTGCTGACC TCCTTTGGAG ATGCTGTTAA GAACCTGGAC VKAH GKK VLT SFGD AVK NLD AACCTCAAGG GCACCTTTGC TAAGCTGAGC GAGCTGCACT GTGACAAGCT GCACGTGGAT NLKG TFA KLS ELHC DKL HVD 42r ccrGAGAAcr rcAGGgtgog tccoggatot gtttgtgctc tcttgtgttt tgtctcaggc PENF R oocttggoco qccqogcoct gocctgagca caagtaggoc cotctot ggg ctgtgoggto

541 tttggogotc ttgggttogL aotooogcaa ctccagogog ctgoottcoo octoggtgtt

601 ocogtoocoo ggctcoogoo gtgcttgtog oogtctocog lggtgacttg tctooqctct

gttggctttt ootggogoct tgtgtacsga tgagaaoaat gtcttttggt oottggoott

72L ccgtggooog gagaactttg cttttctttc ctccctcqtc totcoccgcc tgcccctcqt

otgttggogg tagaaoaogo otgtcotggg tgocgotaco tgcogocotg tottggttoo

ttotoocooa gooocctggg tggtcotgtg totottocto oggctottcc totqcttggt

ggccotttgt ocoooqotgg tootocctcc tttcoqqctt ttttccgaoa octcgaggog

taagacoaao t gtgtct gag ggagggaaac ooctggooto tgtgacc oag gagggtot gg

aotagaaogc tgtgoagttt tqtttttgct tttttcct c c caaeaoagca totgg agato

130 Appendi.x 2-24 Afri.con elephont gamma-globin gene sequence

1081 gtgctggtag googotcaoo ooaotgoaac cgtgogtgto glgcogtoeu o000gogaag

cooottagct otgttootoo ctgggcaago ccotogcctt tgoggaogct oocotogoct

togttoggt c tgagtggcag ctgggtggcc ogotoctcg g aggccagact ggogctctcg

gctcocooco tgaotccota ggaatgcotc tgttgtctgt ttctacctcc ocogCTCCTG LL GGAAATGTGC TAGTGATTGT CI-TGGCAAAC CACTTTGGCA ARGAAIIEAE CCCCCAGGTG GNVL VIV LAN HFGK EFT PQV CAGGCTGCCT GGCAGAAGAT GGTGACGGGC GTGGCCAATG CCCTGGCCTA CAAGTATCAC AAAlT AKM VTG VANA LAY KYH TGAgctcctc gt

131 Appendix 2-28 Asion elephont gomno-globin gene sequence

gtggtcocco cgooctccco ogcccagaca ccATGGIGeA lT[=-fAeIGee GAAGAGMGG VH FTA EEK CTGCTATCAC AAGCCTATGG GGCCAGGTGA ATGTGGAAGA GACCGGAGGC GAGGCCCTGG AAIT SLIT GAVN VEE TGG EAL r2r GCAGgtocot tctg ggo agc agcggagggg agggootaoo ggaagctgog cctggcaoga GR otgcogocco gttaccoagt tttgtgocto ggtctgottt tccqtctgtt atgottccot

24L CCCAtogGCT TCTGGTTGIC TACCCTTGGA CCCAGAGGTT TTTTGACACC TTTGGCAACC L LVV YPWT ARF FDT FGN TATCCTCTGC CTCTGCCATC ATGGGCAACC CCAGGGTCAA GGCCCATGGC AAGAAGGTGC LSSA SAI MGNP RVK AHG KKV TGACCTCCTT TGGAGATGCT GTTAAGAACC TGGACAACCT CAAGGGCACC TTTGCTAAGC LTSF GDA VKNL DNL KGT FAK TGAGCGAGCT GcAcTGTGAc AAGCTGCACG TGcATccTGA GAACTTCAGG gtgogtccog LSEL HCD KLHV DPE NFR 481 gatotgtttg tgctctcttg tgttttgtct caggcooctt ggacooccoo gcoctgotct

541 gogcacoagt oggoccotct otgggctgtg ogototttgg ogotcttggg ttogtaatos

ogcooctcca gogogctgoa ttcoooctog gtgttocogt aocooggctc oogoogtgct

tgtogoogtc tocogtggtg octtgtctoo octctgttgg cttttootgg ogocttgtgt

721 acagatgoga oaoatgtctt ttggtoottg goottccgt g gaaaggagoo ctttgctttt

ctttcctccc tcotctqtco ccgcctgccc ctcototgtt ggaggtagoa aqogaotgtc

atgggtgacA otocotgcag ocotgtottg gttoottota acaoagaooc ctgggtggtc

otgtgtotot toctocggct ottcctotot ctggtggcco tttgtocoaa oøtogtooto

tctcctttca qocttttttc tgooooctcg aggogtoago caoootgtgt ctgog ggogg

gooocooctg goototgtgo ccooggaggg totggaotog ooagctgtgo ogttttottt

132 Appendix Z-ZB Asion elephont gormo-globin gene sequence

ttgctttttt cctcccoaaa cogcgtotgg agotggtgct ggtpggaogo tcaoooooot

1141 gaooccgtgg gtgtagggco gtocooaoog agaagcaoot togctotgtt ootoa ctggg

coogaccoto gcctttgagg oagctoocot ogocttogtt oggtctgagt ggcagttggg

tggccagoto ctcggoggcc ogoctggsgc tctcggctco coocotgoqt ccotoggoot

1321 gcotctgttg tctgtttcto cctccocogC TCCI.GGGAAA TGTGCTAGTG ATTGTCTTGG L LGN VLV IVL 1381 CAAACCACTT TGGCAAAGM TTCACCCCCC AGGTGCAGGC TGCCTGGCAG AAGATGGTGA ANHF GKE FTPA VAA AITA KMV 144l cGGGcGTGGc cMTGcccrG GccrAcMGT ATCACTGAgc tcctcgcoot aggoagaqga TGVA NAL AYKY H 1s01 cttatttttc ocotgocooc qcoottqqtt oooogtottc tgttoogaoa toogatgtoo

1561 tggoctcctg ttgtttcttt ttcotgtggc cttoagtooo cgaotttcco ggggttttot

gttggggtgt gtgtgtgctc cctottcoct tttggcaooo ggtaagoatt ttgotaataa

1681 aqgoacaagg caatggq{/gc ocotoc actg ggagttctga aqggaasago ooootctctg

1741 ggatogtttt ggtgggagag aaagggcttt ottggocogg goctcctcog ooctogot

133 þpendix 2-2C Dugong gormo-globin gene sequence

otcocogotg tgootgtcco ttttqqctct tccotgcctc¡-oqcoccgccc ccccgccccg

gcctgotogc ctogcgttgo ccoqtogcct cotogcoaaa ggaagsocoa ogggccagtg

gccogggoto oagoglaaoo ogccocotgt tccogttgco gcocotocot ccttctgoco

181 cotctgtgot cqccocooac tccaogoccg gococcATGG TGTATTTTAC TGCTGAAGAG V YFT AEE 24t AAGGCI-GCTA TCACAAGCCT GTGGGGCAAG GTGAATGI-GG AAGAGGCTGG AGGCAAGGCC KAAI TSL ITGK VNVE EAG GKA CTAGGCAGgT ogattct ggo gggtoggggo agggagggao taaaggoagc tgagcttggc LGR 361 aggoatgcag gtctgttocc aogtcttgtg ocaogctctg otttoccotc tgctotgott

ctotcctgto gGeIeelGAI TGTCTACCCT TGGACCCAGA GGTTTfl-TGA CAAATTTGGC LLI VYP IVTAR FFD KFG 481 AACCTATCCT CTGCCTCTGC TATCATGGGC AACCCCAAGA TCMGGCCCA TGGCAAGAAG NLSS ASA IMG NPKI KAH GKK 541 GTGCTGAACT CCTTTGGCGA TGCCGTTGAG AACCCGGACA ACCTCAAGGG TACCTTTGCT VLNS FGD AVE NPDN LKG TFA 601 AAGCTGAGTG AccTccAcTG TcAcAAGcTG CTTGTGGATC cTGAcGAcTT CAGGotgogt KLSE LHC DKL LVDP EDF R 661 ctoggototg tctgtgcttt cttgtgttto gcctcogaco gcttogocoo ctgogcoctg

atctgagcot oagtaggocc otctqtggcc tgtgogotot ttggogotct tgggttogto

ocaaCIocoot tccogogggc tgoottcqoo ctoagtgtto tggtoocaag gc|ucgagaog

841 tgcttctogg ogtctggogt gocttgtcto ooctctgttg gctttto gtg gøgactcgtg

togogatgag gocgtctttt ggtoottggo otcctgtoga aagqagaott ttgcctttct

ttcctcocgt otctqtcoct gcctgccccc cototgtogg aggtogaaoa agootgtcco

gtgggacaat ocacgcogoc otgtottggt tqqttqtqac aaagaaactg ggtggtcotg

t34 Appendix Z-ZC Dugong gomno-globin gene sequence

1081 cotqtottac tooggctott cctgtoctgg gcoggcottt-grtg_coooq[c tactaotgtc

tcctttcoqq cttotttcct qqoqctctog gagtaaggco oogtgtctct gagggaaggg

qooccooct g gaatotgtgg cogaggaggg tatggooto g oaagctgtgo gggttttott

tttgtccccc occcccoccc cccoc coaoo aooaaaagta tgtggagotg gtgct ggtag

1321 goagatagaa agagtgaoot cotggot gtg ggt co gto c a gaaagggaag caaattogct

otgtttooto octgggtaag accacogcct ttgagaaagc toacotogoc ttaggcctto

1441 gtggcogttg ggtggccogo ttcttggagg ccogoctggo octctctgct coctototgo

1501 otccotoggo otgtgtctgt totctctttc tocccctoco geIeeIGGGA AACGTGATAG LLG NVI 1561 TGATTGTCTT GGCAÄACCAC TTTGGCAAAG AATI'TACCCC CCAGGTGCAG GCTGCCTGGC VIVL ANH FGKE FTP QVA AAI'J AGAAGATGGT GACTGGTGTG GCCAGTGCCC TGGCCCGCAA GTATCACTGA gtotctcato AKMV TGV ASAL ARK YH otagggagao gccttgttct t cocs tgoca gcocaattoo tggqattqtt ctgttAAtøl

I74l øtaogottto ctogactcct ottgtttctt tttcatgtoo ctttoogtaa øtgaattccc

oggg0tttta tgttgggggg gtgtgtgtgt gctccccott coctgtttgc aaoogatgao

gottttgaco agaaaatsag agsacaggac gataaaggoq cotccct ggg ogttctgaao l92r gggoaagaao oatctct ggg agagttttgg tgggaoaggø agggctttot gggocoaggo

1981 ctcctcogaq ctoaotattt ttgogcogtg ggaaaogact tcogggoaog tagcagoggc

0

135 þpendix 2-20 West Indicn manqtee gormo-globin gene sequence

tcotogcoo a oggoagaaca aagggccogt ggccagggat -ueogao.Lq ry aagccocatg

61 ttccagttgc ogcocotoco tccotctgoc ocotctgtgo tcoccqcooq ctccoagacc

Lzt ggococcATc GTGGATTTTA CIGCTGAAGA GAAGGCTGCT ATCACMGGC TGTGGGGCAA VDFT AEE KAA ITRL ITGK 181 GATGAATGTG GAAGAGGCTG GAGGCAAGGC CCTGGGCAGg togottctgg gggotogggg MNV EEAG GKA LGR 241 sagggaggga otoaaggaag ctgagctagg cøggaotgco ggtctgttac coogttttgt

301 gocoogctct gottttccot ctgctotgot tctatcctgt ogGeleelGA TTGTcrAccc LLI VYP 361 TTGGACCCAG AGGTTTTTTG ACAACTTTGG CAACCTATCC TCTGCCTCTG CCATCATGGG lrTA RFFD NFG NLS SASA IMG 42L CAACCCCAAG GTCAAGGCCC ATGGCAAGAA GGTGCTGAAC TCCTTTGGAG ATGCCGTTAA NPK VKAH GKK VLN SFGD AVK 481 GAACCCGGAC AACCTCAAGG GTACCTI-TGC TAAGCTGAGT GAGCTGCACT GTGACMGCT NPD NLKG TFA KLS ELHC DKL 541 GCTTGTGGAT TCTGAGAACT rCAGGotgog tctogggtot gtttgtgctt tcttgtgttt LVD SENF R 601 ogcctcogoc ogcttogoco octgogcoct gotctgogca taagtaggoc cotctatggc

661 ctgtgogoto tttggogotc ttgggttogt gaaaaoocao ttccogoagg ctgoattcoa

72r octoogtgtt otggtoocoo ggctcaogoa gtgcttctog gagtclogog tgacttgtct

781 oaoctctgtt ggcttttaat ggogactcat gtagagatgo goocootgtc ttttggtoot

8{1 tggaatcctg tagaaggoog aattttgcct ttctttcctc ocgcotctot coctgcctgo

901 cccccototg toggaggtog oaaaogootg tccogtgggo cAatococgc aggcatgtot

961 tggttootto toocooogoa actgggtggt cotgcototg tcoctooggc tottcctgto

1021 ctgggcaggc otttgtgcqa qotctoctaq tgcctccttt cqoatttotq tcctooooct

136 Appendix 2-2D lïest Indion monotee gormo-globin gene sequence

ctgggagtag gacaaagtgrt ctctg ogggo gggaaacaac -tggaototgt ggcagaggag

1141 ggtatggaot ogaaogctot goggttttat ttttgttccc cccoccqeea aaoaeoogta

tgcggagatg gtgctggtog goagatagaa agagtgaaat catggot gtg ggtcogtoco

goaagggaag caoottogct otgtttootq octgggtoog occacagctt ttgogaocgc

toacotagac ttoggcctto gtggcogttg ggtggccago ttcttggagg ccagactgga

1381 octctctgct coctotatgo otccata$ga otgtgtctgt totctctttc tocccctoca

7447 gEIEE]GGGA AACGTGCTAG TGATTGTCTT GGCAAACCAC TTTGGCAAAG AATTTACCCC LLG NVLV IVL ANH FGKE FTP 1501 CCAGGTGCAG GCTGCCTGGC AGAAGATGGC GACTGGTGTG GCCAGTGCCG TGGCCCGCAA AVA AAlTA KMA TGV ASAV ARK 1561 GTATCACTGA gtocctco YH

137 Appendix 2-2E Bushveld elephont shrew epsilon-globin portiol gene sequence

GTCAAGGCTC ATGGAAAGAA GGTGCTGACC TCCTTTGGAG -AIGEAGTI$ GAACATAGAC

V K A H G K K V L T S F G D-A V K N I D AATCTCAAGG GAGCTTTTGC TAAACTAAGT GAACTGCACT GTGACAAGTT ACATGTGGAT NLKG AFA KLS ELHC DKL HVD CCTGAAAACT TCCGGgtoog tccagoogot gttcocttgg tttggttttt tttttoottt PENF R ttottotggo atootoqtto otootootao toocaocoat ootogotooc cctttgocto

247 gaaagccaaa occoacoooo ctctqasata otootttggt gttoooo gag toagcttctg

gttggcatao ttotogccto ttaggctcog aggtgaogtc aotoagggot gttogotocc

tcogtttooa oaagaaccco oqcotttooc ogaotcogat gtottcotcc tooogootoc

tgaggaøgao caggtaaact ttottctctg tgootctooo ogotttcogt otgtcoataa

tatggaggao ootgotooot ottttgtttc ttgaogaaoo agotootaco otttotoqoo

tqtottaoot qotctotcoo totcqtatcc ooogogoctt ttttctttct gcottgottg

otgoctttoo coooqttotg cctooggagc tcttgtgtcc ctgoaotcct qcoottotoq

661 gcotoaggtq otttqtoaat tøgøaagggo ggcaaocooo otogtoc agt ggatogaaaa

ggatgtagct aogaggaggc otootgttgo acogøgogtt tooqataotg ttgoototog

otgctatctt octtgagtot ootggagaag tttgogotog gaootgotco oggototggg

841 tottcogttt toccccctcc cccqoaaoaa ottqttttto gooctctcto ctcqototgt

ctctttgtco ttttgtcttt tccccaotog CI-CCTGGGCA ATGIGATCGT TATCATCATG LLGN VIV IIM GCTTCGCATT TTGGCAAGGA GTTCACCCCT GAAGTGCAGG CTGCTTGGCA GAAACTGGTG ASHF GKE FTP EVAA AITA KLV 1021 GCTGGTGTTG ccAcrccrcr cGcrcAcAAG TAccATTGAg rcctcttgct ctcot gcaog AGVA TAL AHK YH

138 Bushveld elephont sh rew Tírtålï-rî;Xl n partiol sene sequence r0gr tgccccgtgt tcccaccocc tttottcttc acatgacagè_ãcgottaeo

139 þpendix 2-2F Pole-throated sloth epsilon-globin gene sequence

caaoggcggt gtotoccctt coctgctgoc cctctgctga-cçc_qgctcçg cccct goggg

ococogctco gccctgocco otgoctgtgo agtaccoggg goocoagggg ccagaggtoc

acagtgaøgo qtsoooogcc ocotcttcto gaagcogcsc ogctctgctt ctgocacgtc

tgtgotcocc ogccqgcott togotcctoc otcATGGfGe ATTTTACTGC TGAGGAGAAG VH FTA EEK GCTACTGTGG CGAGCCIGTG GGGCAAGGTG AACGTGGAGG AGGCTGGCGG CGAGGTGCTG ATVA SLIT GKV NVEE AGG EVL GGCAGgtogg cotttgggtc tcoocgcagg ggaaogoogg tgootgcoog cctggoaoat GR tgacaagaao togccooogg otttctgtgt ctctgotttt ccottttcto tggtctcotc

tCOtOgACTT CTGGTTGTCT ACCCCTGGAC CCAGAGATTT TTTGACAACT TTGGTAACCT L LVVY PITT ARF FDNF GNL GTCTTCTTCC TCTGCAATCA TGGGCAACCC CAAGGTCAAG GCCCACGGCA AGAAGGTGCI SS5 SAIM GNP KVK AHGK KVL 541 GACCTCCTTT GGAGAICEIG TTAAGCACAT GGATGACCTC AAGGGCACCT TTGCCCATCT TSF GDAV KHM DDL KGTF AHL 601 GAGCGAGCTG cAcTGTGAcA AGcTGcAccT GGATccccAG MCTTCCGGg tgogtccogg SEL HCDK LHV DPE NFR 661 aaaggagcat gtgcccottt ttgcttttta cttottctgo ootootgggt caagttcogt

gttoggaggo cogooctcto gttggcotoa ccootoccoo ctogtctcag aagtggqocc

781 ogtatgggtc tqctaccogc ctgottttcc ctoooocctt ttogggoctc togccootot

tqgttgtgtt totc ctg gag ogtattggtg agtagcagøa ctgtgco agg agoaatcqoc

tttggctoct totagotgao ggoctatoto aoagotggoo toogccttot tttgttgagt

octooggott gotttggogo aoaotttgog ocogottttt tcooogoooo qtocooaoot

ttcctaqtqt otattaaott ccctgtcogt attgtgttca agaaaaggct tgtccctoto

t40 Appendíx 2-2F Pqle-throoted sloth epsilon-globin gene sequence

ttggctggog ottttooctc cctct gagoa occttgcagc-cct_gooctqc tttgottqcq

1141 ggagtgogot ooctggt aoo tgogtgaagg oogtoootoo oacogct gao ttgggcaaog

gaggtgggao tatocotagg cagootoctg goctgotggc tcatqaoogt ootottgoac

t267 ccoggotcto gctootttgo gcocottggo toggcocooc tcttggogco gtttgaggtg

L32t oggogtgggt aggcogtoot ggtotttogc cottttotct gooaottctt tttggoaact

1381 tctgttcaco tgcctgtgtg ttgtctgcct ttcccctaoc qgeIeelGGG CAATGTGATG LLG NVM 1441 GTGGTTGTTT TGGCTTCTCA TTTTGGCAAG GAATTCACTC CCGAAGTGCA GGCTGCTTGG VVVL ASH FGK EFTP EVA AAI'J CAGAAGTTGG TGGGTGGTGT TGccAATccr CTGGcTcAcA AGTAccAcTG Agccttctgt AKLV GGV ANA LAHK YH 1561 ccocctcgtc agggcccctg tgtccccogc cotccttctg cacgtgttto otgggttttg

gctttgogog cocogcttct ccttqotooo gtgcottcta ttcogtoqtt aotqttttot

ttccttcatc ttttgttctt gttttaaogg aggaagggtt cottggctgo ggggtgggoo

1741 gagacotoag cotooocoog tgctcttttg aggagggttt gogttct cat coaaggoagg

1801 tagoogttaa cgggoacgtt ctoggoggcc aaggggcato ctoagotgca goøggagttt

1861 tctooggcat ggoaaoggct tctgt gaogt ggacoggtac ttgtggggcg gtgtg gcaao

7921 oocttococ a aaggaaggaa ggggatggto ogcogtgtt t gagooagoog aoagcagott

tgtggtoaot ocotgcogoc otocotqcot aaøotaatoa aaaataagct ttgto gggat

ccoogttcto gcctogogct octatttcqo tttgctctoa cocqtotttt taogcogcco

gcttgtcago otgtttgtgo gg

t4l Appendix 2-3. Lenglhs of exons and introns for'e-like' globin genes. Values do not include those of the initiation and termination codons. "--" denoJes missing data.

Taxon Exon 1 Intron I Exon 2 Intron 2 Exon 3 Gene length

African ele-phant y 89 bp l23bp 223bp 879 bp 126bp 1,440 bp

Asian elephant y 89 bp l23bp 223bp 879 bp 126bp 1,440bp

dugong 1 89 bp 123 bp 223bp 887 bp t26bp 1,448 bp

manatee y 89 bp l23bp 223bp 876 bp 126bp r.437 bp

elephant sh¡ew e 795bp 126bp

sloth e 89 bp t2l bp 223bp 773 ï:p 126bp 1,332bp

armadillo e 89 bp l25bp 223bp 819 bp 126bp 1.382 bp

armadillo r¡.'1 l16bp 122bp 200 bp 854 bp 126bp 1,4r8 bp

armadillo r¡r¡ absent absent absent -817 bp t25bp nJa

142 Appendix 2-4. Lenglhs of sequence data obtained for the 5' external region of 'e-like, globin genes, and positions of upstream features relative to the pu-tative Cap site. The positions of putative Cap sites from the initiation codon ("ATG') are ¿so shown. Lengths of collected 5' end sequence data do not includethat of the initiation codon. "--" denotes missing data.

CCAAT Putative Length of Taxon CACCC site ATA site obtained site Cap site 5' external data

Asian elephant y -32bp

dugong 1 absent -84 bp -30 bp -52bp -216bp

manatee T -30 bp -52bp 127 bp

sloth e l12bp -83 bp -30 bp -53 bp -213 bp

armadillo s -lll bp -82 bp -32bp -54 bp n/a

-103 bp, armadillo rpy mutated into -69 bp -28 bp -9 bp nla CTCCC

armadillo rþr1 absent absent absent absent nla

143 Appendix 2-5. Lenglhs of sequence data obtained for the 3' external region of ,e-like, globin genes, and positions of polyadenylation signals (AA?AJ\â) reTative to the termination codon. Lengths of collected 3' end sequence data do not include that of the termination codon. "--" denotes missing data.

Length of obtained Taxon AATAAA signal 3'extemal data

African elephant 1 +9 bp

Asian elephant y +197 bp, +213 bp +320 bp

dugong y +48 bp, +65 bp +371 bp

manatee r¡ +8 bp

elephant shrew e +70 bp

sloth e +93 bp +571 bp

armadillo e +82 bp nla

armadillo {.r1 +29bp nla

armadillo r.[n1 +118 bp nla

t44