Atypicat molecular evolution of afrotherian and xenarthran
B-globin cluster genes with insights into the
B-globin cluster gene organization of stem eutherians.
By
ANGELA M. SLOAN
A thesis submitted to the Faculty of Graduate Studies in partial fulfillment of the requirements for the degree of
MASTER OF SCIENCE
Department of Zoology University of Manitoba Winnipeg, Manitoba, Canada
@ Angela M. Sloan, July 2005 TIIE I]MVERSITY OF' MANITOBA
FACULTY OF GRADUATE STT]DIES ***** - COPYRIGHTPERMISSION ] .
Atypical molecular evolution of afrotherian and xenarthran fslobin cluster genes with insights into thefglobin cluster gene organization òf stem eutherians.
BY
Angela M. Sloan
A ThesislPracticum submitted to the Faculty of Graduate Studies of The University of
Manitoba in partial fulfill¡nent of the requirement of the degree of
Master of Science
Angela M. Sloan @ 2005
Permission has been granted to the Library of the University of Manitoba to lend or sell copies of this thesis/practicum, to the National Library of Canada to microfilm this thesis and to lend or sell copies of the fiIm, and to University Microfïlms Inc. to publish an abstract of this thesis/practicum.
This reproduction or copy of this thesis has been made available by authority of the copyright owner solely for the purpose of private study and research, and may only be reproduced and copied as permitted by copyright laws or with express written authorization from the copyright ownér. ABSTRACT
Our understanding of p-globin gene cluster evolutionlwithin eutherian mammals
.is based solely upon data collected from species in the two most derived eutherian
superorders, Laurasiatheria and Euarchontoglires. Ifence, nothing is known regarding_the
gene composition and evolution of this cluster within afrotherian (elephants, sea cows,
hyraxes, aardvarks, elephant shrews, tenrecs and golden moles) and xenarthran (sloths,
anteaters and armadillos) mammals. To address this shortcoming, I sequenced and
classified atotal of 24 'þ-llke' globin genes from 1 I afrotherian and Z xenarthran species,
and conducted a comprehensive analysis on the molecular evolution of p-globin cluster
genes within these two basal eutherian clades.
Analyses of the 'adult expressed' F- atrd ò-globin products suggested that the ô-
globin locus of paenungulate (and presumably all afrotherian) mammals encodes the only
'f3-type' chain component of their post-natal hemoglobin. This finding challenges recent
views on the evolution of the eutherian p-globin cluster, and provides the first
documented evidence of a dispensable plocus within Mammalia. It appears that the ò-
locus of stem afrotherians gained a functional advantage by means of a capable promoter
region, donated by the neighbouring p gene via gene conversion, after which p was
silenced through nonfunctionalization. Hyraxes were also found to possess a second putatively functional ô-globin gene (òH), which may or may not be expressed. A ML analysis of ò-globin IVS2 sequences grouped elephants and sirenians into a monophyly to the exclusion of hyraxes supporting the 'Tethytheria' hypothesis, and placed the tree hyrax as the most basal hyrax species. Analyses of the 'embronically expressed' r-, y- and q-globin genes revealed the
presence of both Y- and e-loci within members of the *p"T.9:: Afrotheri4 providing
compelling evidence that the ancestral eutherian p-globin cluster ðontained at least four
genes (u, ò, p) T, by the time this clade diverged from other eutherians -107 mya. Earlier
studies have revealed that hemoglobin of 5-month old elephant fetuses and manatee
calves contains two 'F-type' chains. My data suggests these distinct p-chain components
are encoded by the Y- and ôloci in these species, thus providing only the second example
of developmentally delayed expression of a y locus within mammals. Sequence analyses
of the armadillo B-globin cluster further indicated that the eutherian p-cluster contained
five (e, ò, p) loci Y, ï, by the time xenarthrans diverged from eutherians -102 mya. Remarkably, of the five armadillo loci, only the two outermost genes (e and p) appeared
capable of transcribing a polypeptide. Together with earlier studies that demonstrated adult armadillos express two distinct p-type chains, this finding implies that these
xenarthrans may uniquely extend expression of the e-locus into post-natal growth stages
or, like marsupials, possess an unlinked 'pJike' gene outside their p-globin cluster.
A comparative assessment of evolution rates indicated that, with few exceptions, descendants of the'e-like' loci diverged more slowly than the 'p-like' loci in both coding and non-coding gene regions within species of all four eutherian superorders. These findings signify that rates of evolution for the parlogous genes within eutherian p-clusters are a function of selection pressure, contradicting the molecular clock theory that all genes evolve at an equal rate. ACKNOWLEDGEMENTS
First and foremost, I wpuld like to thank mV acad-e]1i,c advijor, Dr. Kevin
Campbell, for giving me the opporhrnity to gain invaluable research experience, and
devising a thesis project to which I am proud to attach my name. I am also very grateful
for all your adVice, patience, and good-natured kidding, which provided me with a
pleasant work environment for the entirety of my graduate work. You took me on as a
novice to the field of molecular phylogenetics and I thank you for your faith in me - it
will forever be appreciated.
I also thank my other committee members, Dr. Gunnar Valdimarsson who not
only donated essential lab equipment but plentiful advice concerning laboratory
procedures, and Dr. Ivan Oresnik who allowed me to partake in an intriguing research
project which ultimately boosted my career interest in the flreld of site-directed
mutagenesis. I additionally appreciate the helpful feedback they both supplied throughout the course of my thesis project, which proved invaluable to the efficiency of data collection and analyses. Thanks to both for your kind and generous natures, as well as your eyes for detail!
For allowing me to work in his lab for more than two years, I extensively thank
Dr. Nate Lovejoy, who was more patient and giving than I could have ever hoped for. I also thank the "Lovejoy Lab" graduate students, Kristie Lester, Stuart Willis, and Tommy
Sheldon for their friendship and moral support over the years.
I must also graciously thank the providers of blood, tissue, and DNA samples.
These include Andrew Taylor at the University of Pretoria in South Africa, Brenda
McDonald at the University of Adelaide in Australia, Christophe Douady at Université
111 Claude Bernard Lyon I in France, Gabi fürlach at the Marine Biological Laboratory in
Woods Hole, Massachusetts, Gary Bronner at the University of Cape Town in South
Africa, Galen Rathbun at the California Academy of Sciencei irr Cam6ri4 California,
Gordon Glover at the Assiniboine Park Zoo in Winnipeg, Manitob4 Robert Bonde at the
Florida Integrated Science Centre in Gainseville, Florida, Sarit¿ Maree at the University
of Pretoria in South Africa, Terry Robinson at the University of Stellenbosch in Cape
Town, South Africa, and Wendy Korver at the Bowmanville Zoo in Ontario, Canada.
Without the generous donations of these skilled academics, this study could not have
been the complete molecular analyses I dreamed it would be.
I could not forget my mother, Merle, and father, Gary, to whom I thank for
everything they have ever given me, be it advice, support, friendship, and love. These are
the best parents anyone could ask for and I owe them everything I also thank my sister,
Jennifer, for a precious friendship and an immense amount of moral support and
encouragement when I needed it most. Thanks as well to Amrit Singh for all the talks
and advice on lab procedures, and to Octavio Orellana for providing me with analytical
equipment and helping me become a stronger and more confident individual. I love you
all.
Last but not leas! I would like to thank the organizations that funded my research, including the University of Manitoba Graduate Fellowship association (providing financial support to myself), and the Natural Sciences and Engineering Research Council of Canada (NSERC), The Winnipeg Foundation and the University of Manitoba Research
Grants Program (tlRGP) (providing supporr to Kevin Campbell)
1V TABLE OF CONTENTS
Page
Abstract
Acknowledgements 111
Table of Contents
List of Figures
List of Tables xll
Abbreviations xlv
General Introduction
Chapter I
Atypical molecular evolution of afrotherian and xenarthran 'B-like' globin genes ...... 7
Introduction 8
Materials and Methods l3
Results 19
Discussion 38
Chapter II
Molecular evolution of the eutherian p-globin cluster infened from afrotherian and xenarthran gene sequences . 45
Introduction ...... 46
Materials and Methods ... .. 49
Results ...... 52
Discussion ...... 69 Literature Cited 77
Appendix I
t-1 95 Primers used in lhe (A) amplification and @) sequencing reactions of "p-like'globin genes.
T-2 96 DNA sequences of the 'BJike' gfobin genes amplifred in this study.
1-3 125 Lengths of exons and introns for sequenced 'p-like' globin genes.
r-4 726 Lengths of additional sequence data obtained for the 5' external region of sequenced 'p-like' globin genes, and positions of upstream features relative to the putative CAp site.
1-5 127 Lengths of additional sequence data obtained for the 3' external region of sequenced 'p-like' globin genes, and positions of poly- adenylation signals (AATAAA) relative to the termination codon.
Appendix 2
2-l 128 Primers used in the (A) amplification and (B) sequencing reactions of "e-like' globin genes.
2-2 tze DNA sequences of the 'e-like' sì"ui' g"n", ;*piir,; ir rr,ir' sh:dy.
2-3 t42 Lengths of exons and intons for 'e-like' globin genes
2-4 t43 Lengths of sequence data obtained for the 5' external region of 'e-like' globin genes, and positions of upstream features relative to the putative Cap site.
vl 2-5 -- .....,...r44 Lengths of sequence data obtained for the 3' external region of 'e-like' globin genes, and positions of poly-adenylation signals (AATAAA) relative to the termination codon.
vll LIST OF FIGI]RES
Figure Page
1-1. 9 The p-globin gene clusters of select mammals (see text for details; Konkel är al.1979; Schon et al. l98l; Lingrel et al. 1983; shapiro et at. r9t3; Townes et al. 1984; Schimenti and Duncan 1985b; cooper et al. 1996; satoh et al. 1999) mapped onto a phylogram adapted from the maximum likelihood analysis (concatenated 17,736 bp data set) of Amrine-Madsen et at. (2003).
l-2. T2 Hypothesized relationships among the three extant paeunungulate orders based on 19 recent molecular phylogenetic studies.
1-3. 23 Comparison of the'p-globin' chain amino acid sequences of four afrotherian and two xenarthran species (top row in each case) with those deduced from the ò- and p-globin gene sequences determined for these species (bottom row; dots represent identity with the top sequence).
1-4. 24 Dotplot comparisons (window size : 100; threshold : 80) of the putative Asian elephant ò-globin gene (Y-axis) with the ò- and p-globin genes of humans, rabbits and mice (X-axis).
1-5. 25 The 5' external sequence data and features of ô- and p-globin genes amplified in this study and those of armadillo, human, tarsier and galago.
1-6. 26 Dotplot comparisons (window size : 100; threshold : 80) of the putative dugong ô-globin gene (Y-axis) with the ò- and p-globin genes of humans, rabbits and mice (X-axis).
viii LIST OF FIGURES ...cont'd
Figure Page
1-7 27 Dotplot comparisons (window size : 100; threshold : g0) of the putative rock hyrax ò-globin gene (Y-axis) with the ò- and p-globin genes of humans, rabbits and mice (X-axis).
1-8 28 Dotplot comparisons (window size : 100; threshold : g0) of the putative rock hyrax òH-globin gene (Y-axis) with the ö- and p-globin genes of humans, rabbits and mice (X-axis).
1-9. 30 Dotplot comparisons (window size : 100; threshold : g0) of the putative elephant shrew p-globin gene (y-axis) with the ò- and p-globin genes of humans, rabbits and mice (X-axis).
1-10. 3T Dotplot comparisons (window size : 100; threshold : s0) of the putative armadillo r¡rô-globin gene (Y-axis) with the ô- and p-globin genes of humans, rabbits and mice (X-axis).
1-11. 32 Dotplot comparisons (window size : 100; threshold : g0) of the putative armadillo p-globin gene (Y-axis) with the ô- and p-globin genes of humans, rabbits and mice (X-axis).
r-12. 34 Doþlot comparisons (window size : 100; threshold : 80) of the putative sloth qò-globin gene (y-axiÐ with the ò- and p-globin genes of humans, rabbits and mice (X-axis).
IX LIST OF FIGIIRES ...cont'd
Figure Page
1- 13. 35 Dotplôt comparisons (window size : 100; threshold : 80) of the putative p-globin sloth gene (y-axis) with the ò- and B-globin genes of humans, rabbits and mice (X-axis).
t-14. 36 Xenarthran-rooted maximum likelihood tree constructed from homologous ò- globin IVS2 gene sequences (773 bp; see text for details) illustrating the inter-relationships among paenungulate mammals.
2-t. 56 Dotplot comparisons (window size : 100; threshold : 80) of the putative Asian elephant y-globin gene (y-axis) with the e-, y- and r¡-globin genes of humans, rabbits, mice and goats (X-axis). an 57 lotplot comparisons (window size : 100; threshold : g0) of the putative dugong y-globin gene (y-axis) with the e-, y- and r¡-globin g"n", of humans, rabbits, mice and goats (X-axis)
2-3. 59 The 5' external sequence data and features of 'e-like' globin genes sequenced in this studl', plus those of the armadillo r-, ey- and rpq-globin genes, the human r-, yA- and r¡rq-globin genes and the goai r¡-globin gene.
2-4. 60 Dotplot comparisons (window size : 100; threshold : 80) of the putative elephant shrew e-globin gene (y-axis) with the e-, y- and r¡-globin genes of humans, rabbits, mice and goats (X-axis). LIST OF FIGURES ...cont'd
Figure Page
2-5. 67 Dotplot comparisons (window size : 100; threshold : 80) of the putative sloth e-globin gene (y-axis) with the e-, y- and r¡-globin genes of humans, rabbits, mice and goats (X-axis).
2-6. 62 Dotplot comparisons (window size : 100; threshold : 80) of the putative armadillo e-globin gene (y-axis) with the e-, y_ and r¡-globin genes of humans, rabbits, mice and goats (X-axis).
2-7. 64 Dotplot comparisons (window size : 100; threshold : 80) of the putative armadillo rþy-globin gene (y-axi$ with the e_, y_ and r¡-globin genes of humans, rabbits, mice and goats (X-axis).
2-8. 65 Dotplot comparisons (window size : 100; threshold : 80) of the putative armadillo rpq-globin gene (y-axis) with the e_, y_ and r¡-globin genes of humans, rabbits, mice and goats (X-axis).
2_9. 70 p-globin The gene clusters of select mammals, updated using data collected for this study (see Fig. l-l legend for details).
XI LIST OF TABLES
Table
1-1. T4 List of taxa examined in this study
T-2. 15 Genes included in the sequence analyses of 'p-like' globin genes with corresponding GenBank accession numbers.
1-3. 20 List of 'p-like' globin gene products amplified in this study.
t-4. 22 Percent identity matrix comparing the IVSI and tVS2 sequences of 'p-like' globin genes sequenced in this those of human, rabbit and mouse.
2-1. 50 Genes included in the sequence analyses of 'eJike' globin genes with corresponding GenBank accession numbers.
t _') 53 List of 'elike' globin gene products amplified in this study.
2-3. 54 Percent identity matrix comparing the IVSI and tVS2 external sequences of 'e-like' globin genes sequenced in this study with those of human, rabbit, mouse and goat.
2-4. 55 Percent identity matix comparing the 5' and 3' external sequences of 'e-like' globin genes amplified in this study with those of human, rabbit, mouse and goat.
xtl LIST OF TABLES ...cont'd
Table Page
2-5. 66 Percent of synonymous (silent) and nonsynonymous (replacement) substitutions between the 'e-like' and 'pJike' globin genes amplified in this study (plus those of armadillo, rabbiq mouse, goat and tarsier) and their ortholog in humans. Silent and non-silent evolution rates are also shown.
2-6. 68 Percent of nucleotide substitutions between the IVS2 sequences of 'e-like' and pJike' globin genes amplified in this study (plus those of armadillo, rabbit, mouse, goat and tarsier) and their ortholog in humans. The corresponding non-coding DNA mutation rates are also shown.
xlil ABBREVIATIONS
A adenine -: ct alpha _- p beta bp(s) base pair(s) ô delta ddH2O double distilled water dNTP deoxynucleotide triphosphate E epsilon E. coli Escherichia coli EMBOSS The European Molecular Biology Open Software Suite rl eta (F) forward primer Ï gamma G guanine Hb hemoglobin HI(Y85 Haegawa-Kishino-Yano (1985) model HS hypersensitive site(s) IPTG isopropyl-B-D-thiogalactoside IVS intervening sequence LB Luria-Bertani LCR locus control region MatGAT Matrix Global Alignment Tool MgCl2 magnesium chloride ML maximum likelihood mya million years ago NCBI National Centre for Biotechnology Information NISC NIH (National Institute of Health) Intramural Sequencing Centre NSERC Natural Sciences and Engineering Research Council of Canada ú) omega PAUP* Phylogenetic Analysis Using Parsimony (*and other methods PCR(s) polymerase chain reaction(s) tþ psi (pseudo) (R) reverse primer RPM revolutions per minute T thymidine Ta annealing temperature TAE tris-acetate EDTA Taq Thermus aquaticus TBR tree bisection and reconnection U unit of enzyme activity
xtv URGP University of Manitoba Research Grants program V volts vlv volume per volume X-GAL 5=bromo-4-chloro-3-indolyl-B-D-garactoside
XV GENERAL INTRODUCTION
The crown-group Mammalia is defined as the most rec-ent common ancestor of
the existing mammalian clades and includes the egg-laying platypuses and echidnas
(Monotremata (or Prototheria)), the pouched marsupials (Marsupialia (or Metatheria))
and the piacental mammals @utheria) (McKenna and Bell 1997; Nowak 1999). Both the
Monotremeta and Marsupialia are relatively small assemblages, containing one and seven
orders respectively (McKenna and Bell 1997; Nowak 1999), whereas the placental mammals a¡e a diverse gfoup, consisting of eighteen orders recently divided into four
superorders: (1) Euarchontoglires (or Supraprimates) [rodents, lagomorphs, primates]; (2)
Laurasiatheria [carnivores, cetaceans, ungulates, bats, shrews, moles]; (3) Xenarthra
[armadillos, anteaters, sloths]; and (a) Afrotheria (Eizirik et al. 20Q7; Madsen et al. 2001;
Murphy et at. 2001b; Lin et al. 2002; Springer et at. 2003). Afrotherians represent a clade of placentals thought to have arisen on the African continent and comprise six orders: Proboscidea (elephants), Sirenia (sea cows), Hyracoidea (hyraxes), Tubulidentata
(aardvarks), Macroscelidea (elephant shrews), and Afrosoricida (golden moles and tenrecs) (Stanhope er ø/. 1998). Although they do not share a single morphological synapomorphy, the members of this group were first grouped into a monophyly by
Springer (1997) and Stanhope et al. (1998) based on molecul ar datz. Further support for this clade comes from many more recent comparative sfudies utilizing mitochondrial, nuclear and concatenated gene sequences (Liu and Miyamoto 1999; Springer et al- 1999;
Eizirik et al.2O0l; Murphy et al.200la,b; Lin et al.2O02) and from a common nine base pair deletion in exon i I of fhe BRCAI gene found only in afrotherians (Madsen e/ a/.
2001). Recently, van Dijk et al. (2001) also provided independent support for the clade by identiffing unique amino acid substitutions in three proteins of these animals relative
to other eutherians, while Nikaido et al. (2003) isolated and charactenzed a novel family
of SINEs (short interspersed repetitive elements) termed "Afro-Sb[8s", sp-ecific to species
within Afrotheria.
Molecular clock data based on gene studies indicate that eutherians have been present and evolving into present-day clades since the mid Cretaceous (-100 mya)
(Eizirik et al. 2001; Murphy et al. 2001b; Springer et al. 2OO3). The deepest split is thought to have occurred either between afrotherians * xenarthrans (making them 'sister groups') and other placental mammals (Madsen et at.2007; Delsuc et at. 2002;Lin et al.
20OZ) or simply between afrotherians and other placentals (Stanhope et al. 1998, Liu and
Myamoto 1999; F;izink et al. 2001; Murphy et al. Z}ola,b; van Dijk et al. z00l;
Springer et al. 2003). Both hypotheses place this divergence at -107 mya @izirik et al.
2001; Murphy et al. 2001b; Madsen et al. 2002, Springer et al. 2003), a date coinciding with a major plate tectonic separation event (Smith. et al. 1994). At roughly this time,
Africa and South America separated from the southern hemispheric super-continent
Gondwana, which also incorporated Antarctica, Australia, India and Madagascar before it broke apart (Smith et al. 1994). The 'Afrotheria hypothesis' suggests that six of the twenty placental orders evolved on Africa during the period when the continent was isolated from others via plate tectonics (-1ol to 40 mya) (Eizirik et at. 2001; Murphy e/ al. 2001b Springer et at. 2003). After differentiating into various ecological niches, these mammals were able to distribute themselves throughout Europe and Asi4 as Africa began to collide with these continents -30 mya (tledges 2001). Xenarthran mammals are thought to have evolved solely on South America (Murphy et at. 2001b). Laurasiatheria
2 and Euarchontoglires have been deemed sister taxa (constituting the clade Boreoeutheria)
(Murphy et al. 2001b) with a northern hemispheric origin on the ancient supercontinent
Laurasia (Murphy et al. 200la,b). Undoubtedly, the early divergence of Afrotheria and
Xenarthra from other placentals provides a framework for ancient eutherian evolution studies, while the collection of molecular dat¿ from these basal superorders could have many important implications on the early evolution of eutherian genomes.
An ideal candidate to address these intriguing possibilities is provided by the respiratory pigment hemoglobin (FIb). Hemoglobin is one of four classes of functional porphyrin-containing proteins capable of reversible oxygen-binding @erutz 1979;
Dickerson and Geis 1983). Myoglobin (supplying Oz to mitochondria within muscle tissues), neuroglobin (supplying Oz to the brain; Burmester et al. 2000) and the ubiquitous cytoglobin (whose function is unknown; Burmester et al. 2002; Trent and
Hargrove 2002) are monomeric proteins comprising one protein chain. However, Hb
(binding oxygen in the lung capillaries and facilitating its delivery to respiring cells) is a tetrameric protein comprising two ct- and two p-helices (Dickerson and füis 1983; 'Weber and Wells 1989; Poyart et al. 7992; Weber 1995). It is hypothesized that the cr- and P- globin genes encoding hemoglobin arose from the duplication of a primordial globin gene roughly 600-800 mya (Antoine and Niessing 19S4). This pair of globins then underwent further duplication events over time to produce the various a-like and p- like globin genes found in extant vertebrate species (Goodman and Moore 1975;
Czelusniak et al. 1982; Proudfoot et al. 1982; Goodman et al. 1987). Both the a- and p- globin gene clusters of derived eutherians have undergone extensive scrutiny. However, it is the p-globin cluster, comprising the genes encoding the'p-type' polypeptide chains of hemoglobin, that is the focus of this study.
Functional p-globin cluster genes are composed of thràcoding rãgions separated by two introns (intervening sequences or IVS). Exon sequences are highly conserved in length, (normally comprising 89,223 and126 bp in exons 7,2 and 3, respectively) and share strong similarity between orthologs of different species (>60%) and within paralogs of the same species @awson and Yamamoto 1998) These findings are most likely the consequence of puriffing selection, as conserved sequences have important implications for the structure and function of the protein. The two fVS regions are variable in size, normally ranging ftom 722 to 130 bp in the first intron and from 850 to 914 bp in the second @ickerson and Geis 1983). Based on sequence alignments, it is assumed that the vast amount of variability in both size and composition is the result of numerous base substitution and insertior/deletion events over time (Konkel et al. 1979). Exon-infon boundaries show a high degree of conservation, however, as both introns almost always possess the 5'GT - AG 3' sequence normally found at splice junctions (Nishioka e/ a/.
1980). It is therefore quite possible that introns are necessary for proper transcription, especially since one mouse pseudogene, containing only exon sequences, is not expressed under normal circumstances (Dickerson and Geis 1983).
Mammalian genomes often employ regulating mechanisms to ensure that the genes within multi-gene families are expressed at appropriate times, including proximal promoters and distal regulatory elements working in conjunction with environmental cues
(Fritsch et al. 1980;Iltll et al. 1984). The locus control region (LCR) is a large segment of DNA found in mammals whose activity has extreme effects on the expression of genes within a multigene family (Tuan et al. 1985; Forrester et al. 1987; Grosveld et al. 1987).
The B-globin LCR is roughly 12 kb in length, located from -6 to 20 kb upstream region of the cluster, and is divided into three sites (200-300 bp toþl'hypersensitive (HS) to
DNase I digestion (Tuan et al. 1985; Forrester et al. 1987; Grosveld et al. 7987). These sites act as recognition sequences for DNA-binding proteins, which directly control the chromatin structure of the 60 kb functional p-globin domain. The mechanisms by which these proteins regulate chromatin structure, however, are currently unknown.
Experiments in which the LCR of humans was fully deleted resulted in failure to express any of the p-globins, even though the genes themselves were structurally intact @risocoll et al. L989; Feng et al. 2005). Interestingly however, the presence of only one HS site within the LCR is sufficient for full gene expression in transgenic mice @yan er ø/.
1989). The results of these studies confirm the utmost importance of the LCR in the expression of each globin gene within the cluster. Certain sequences of non-coding
DNA, located both upstream and downstream of the 5' (CACCC, CCAAT, ATA) and 3'
(AATAAA) flanking end of each globin gene, respectively, also play important roles in the control of both transcription and translation @roudfoot and Brownleee 1976;
Efstratiadis et al. 1980; Grosveld et al. 1982; Dierks et al. 1983; Kosche et al. 1985;
Myers et al- 1986), and are discussed in det¿il within the upcoming chapters.
Surprisingly, our knowledge of eutherian p-globin clusters has been deduced from data collected on only the two most derived orders, Laursiatheria and
Euarochontoglires, and thereforg the goal of this study was to obt¿in genetic information on the p-clusters of the two most basal clades, Afrotheria and Xenarthra. This information should provide much needed insight into the p-cluster gene organization of stem eutherians and ultimately lead to a better understanding of molecular evolution of the p-cluster within the whole of Eutheria.
-=-.. CHAPTER I
ATYPICAL MOLECULAR EVOLUTION OF
AF'ROTHERIAN AND XENARTHRAN 'P.LIKE' GLOBIN GENES. INTRODUCTION
The mammalian p-globin gene cluster is thought to.frl,_t_ originated from the tandem duplication of a single proto-B-globin gene about 200 mya @fstratiadis et al.
1980; Czelusniak et al. 1982). Differential selection on the upstream regulatory elements of these paralogous loci subsequently confined expression of the 5' e-globin gene to early embryonic stages, while the 3' p-globin gene became developmentally suppressed
@fstratiadis et al. 1980; Czelusniak et al. 1982). This gene cluster orgamzation and expression pattern is still found in marsupials (Koop and Goodman 1988; Cooper and
Hope 1993; Cooper et al. 1996; but see Wheeler et al. 2001), and presumably monotremes (Lee et at. 1999). However, as indicated by the four-to-twelve gene clusters of rodents, lagomorphs, artiodactyls and primates (Fig. 1-l), the composition and regulation of the ancestral mammalian p-cluster has been substantively altered over the eutherian radiation (Efstratiadis et al. 1980; Jeffreys et al. 1982; Martin et al. 1983;
Goodman et al. 1984; Hardies et al. 1984; Scott et al. 1984; Townes et al. 1984;
Schimenti and Duncan 1985a,b, Hardison and Gelinas 1986; Tagle et al. 1988). Despite their compositional differences, remarkably congruent (though independent) gene conversion and inactivation events are seen within the B-clusters of these four orders, leading to the development of several widely accepted paradigms pertaining to the evolution of this cluster (Jeffreys et al. 1982; Goodman et al. 1984; Hardison 1984;
Hardison and Margot 1984; Harris et al. 1984,' Tagle et al. l99I; Satoh et al. 7999, Koop et al. 1989a;Prychitko et aL.2005).
For inst¿nce, the presence of a ô-globin locus in each of the four examined placental orders (Rodentia, Lagomorpha, Artiodactyla, Primates; Fig. 1-l) has led to the Putative coÍÌmon eutherian ancestor
E'r q ô p eyvqôB al # Þ ÉIl
vßhO phl yBh2 ryph3 pl Bz
VPr" "P
i-r EII pc
¡ €rrr ìyru
?
f ------Elephant shrew I r-l i i L------Golden mole I T-t ttl I I L------Aardvark lt -'1 LJ< ? I ll ------HYrax l.l !-- j l---- Elephanr I
eÊ ot Marsupials H H H eP Monotremes {t}-+ general acceptance that this gene arose via duplication of the adult-expressed pJocus
@fstratiadis et al. 1980; Czelusniak et al. 1982) soon **]n: divergence of Eurheria from Marsupialia (148-159 mya; Archibald 2003). Since thirevent, the orthologous ô genes from each of these four orders \ryere converted (in some cases repeatedly) by the B- locus, with p preferentially converting ô in its upstream external and 5' exonic regions
(Jefireys et al. 1982; Martin et al. T983; Hardison and Margot 1984; Hardies et al. 1984).
The ò gene thus proved to be largely expendable, with p-globin becoming the major (and generally only) functional adult gene in the cluster (Martin et al. 1983; Hardies et al.
1984; Koop et al. 1989a; Prychitko et aL.2005). Indeed, õ-globin is only known to be expressed in New World monkeys, hominoids, tarsiers and galagos (Martin et al. 7983;
Hardies et al. 1984; Prychitko et al. 2005) and,' with the exception of the laner two groups (Tagle et al. 1988; Koop et al. I989a), encodes for only a small fraction of the hemoglobin molecules of post-natal erythrocytes (Boyer et at. l97l; Martin et at. 1983;
Ross and Ptzarro 1983). The distribution of expressed ò-globin genes among primates suggests these functional loci arose (possibly via the resurrection of an anciently silenced
ô-globin gene; Martin et al. 1983) independently in stem anthropoids, tarsiiformes and loriformes via recombination with the 5' p-globin promoter region necessary for transcription (Koop et al. 1989a; Tagle et al. 7991; Prychitko et a|.2005). The ò-locus was subsequently silenced in the lineage leading to Old \Morld monkeys (Martin et al.
1983), or hybridized by an unequal crossover event with the rþr¡-locus in the ancestry of lemurs (Jeffreys et al. 1982). Accordingly, ò-globins have been litle more than vestigial genes since their inception, except where their integrity has been maintained through sporadic, fortuitous events of gene conversion (Hardies et al.l9B4).
10 Recent molecular studies have provided compelling support for the presence of
four placental 'superorders' (Madsen et al. 2001; Murphy et at.200la,b; Amrine-Madsen
et at. 2003). Because these (and other) studies place the iop".rd.rs Xenarthra and
Afrotheria at the base of the eutherian clade (Fig. 1-1), members of these two groups
provide ideal model systems to test hypotheses regarding the molecular evolution of the
B-globin gene cluster. While protein sequences have been documented for the 'F-type'
chains ofseveral afrotherian and xenarthran species, priorto the initiation ofthis study no
nucleotide data was available. Consequently, the identity (i.e. Þ or ô) and evolutionary
history of the genes encoding these polypeptide chains have not been established. To fill
this void, the primary aim of this study was to determine whether the p/ô duplication
event predates the eutherian radiation. If so, a second aim was to explicate the
evolutionary events leading to the present-day post-natal p-like globin genes of both
superorders to test whether the p-globin locus exhibited a competitive advantage over the
5' ò-locus in members of these basal clades.
The final goal of this study was to examine the phylogenetic relationships among
the afrotherian orders Proboscidea, Hyracoidea and Sirenia. Since their grouping into the
superorder Paenungulata (Simpson 1945), the inter-relationships among these orders have
remained controversial. For example, studies examining morphological characters
consistently place sirenians and elephants as sister taxa ('Tethytheria hyothesis';
McKenna 1975; Shoshani and McKenna 1998; Liu and Miyamoto 1999; Liu et al.200l), while molecular analyses recover only limited support for this relationship (Fig. 1-2). To help resolve this issue, phylogenetic relationships amongst all seven ext¿nt paenungulate genera were exarnined using non-coding, homologous gene sequences.
l1 Fig. 1-2. Hypothesized relationships among the three extant paeunungulate orders based on 19 recent molecular phylogenetic studies. Letters below each hypothesis correspond to studies providing support for the above topology.
12 (d () CS ñ rcJ 8E E 'rJ t E o Þ o() E Ø Clq o() o U) Íúo U) p i{ _Ò ñ a Jà .A ;r.j Êi Ë E
A B c
Lavergne et al. 1996 Stanhope et al. 1998 Springeref al. 1997 Eizirik et al. 2001 Liu and Miyamoto 1999 Madsen et al. 2001 Murphy et al. 200Ia Amrine and Springer 1999 Amrin+Madsen et al. 2003 Madsen et aL.2002 Murphy et ql. 2001b Fleming et qL.2003 Murata et al. 2003 Delsuc et al. 2002 Douady et a|.2002 Maliaet al. 2002 Douady et aL.2003 Springeret al. 2003 Waddell and Shelley 2003 MATERIALS AND METHODS
Data collection
DNA was extracted from whole blood (100 pl) and tissue (lO-25 mg) samples from 11 afrotherian and two xenathran species (Table 1-1) using a QIAGEN DNeasy@
Tissue Kit. The optical density of eluates was measured with an flltrosp ec 3100 pro spectrophotometer @iochrom), the concentration and purity of the DNA was calculated from absorbance readings at260 and 280 nm, and 50 ngl¡rl working solutions prepared.
To design primers for polymerase chain reactions (PCRs), degenerate nucleotide sequences were first deduced from the 'p-globin' polypeptide chains of the African
(Loxodonta africana) and Asian elephant (Etephas maximus), Amazon manatee
(Trichechus inunguis), rock hyrax (Procøvia capensis), nine banded armadillo (Dasypus novemcinctus) and three-toed sloth (Bradypus tridactylas) (see Fig l-3 for references).
Resulting sequences were aligned with primate, rodent, lagomorph and artiodactyls p- globin gene sequences (see Table l-2), and highly conserved regions used for primer design (Primer Premier 5.0; Premier Biosoft International). PCRs were conducted on l00ngof templateDNA(2¡rl) in 0.2m1 tubescontaining43¡rlofreactionmixture
(1.0 ¡rl of each dNTP (2.5 mM), 5.0 ¡,rl lOx pCR Reaction Buffer, 3.H.5 pl of 50 mM
MgCl2, 5.0 ¡rl of each primer (10.0 pmol/¡rl), 0.5 ¡rl Taq polymerase (5 U/¡^r,l) and 24.0-
25-5 ¡il ddH2O), using PTC-0220 DNA Engine Dyad@ (MJ Research) and Mastercycler@
Gradient @ppendorf) thermocyclers. Initiat amplifications entailed a 30 cycle protocol
(94"C for 30 s; 50oC for 15 s;72oC for 30-75 s) followed by a 10 min interval at72oC.
PCR products were isolated using Montageru PCR Centrifugal Filter Devices
t3 Table 1-1. List of taxa examined in this studv.
Afrotheria Elephantidae Loxodonta africanø African elephant blood not specified female
Elephas maximus Asian elephant blood not specified male
Dugongidae Dugong dugon dugong DNA Mabuiag Island, female Tones Straits
Trichechidae Trichechus manatus West Indian manatee skin Crystal River, Florida female
Procaviidae Procsvia capensis rock hyrax blood not specified not specified
Heterohyrax brucei À yellow-spotted hyrax eaf not specified not specified Dendrohyrax dorsalis western tree hyrax DNA not specified not specified
Orycteropodidae Orycteropus qÍer aardvark liver Karoo, SouthAfrica female
Macroscelididae Elephantulus inturt bushveld elephant shrew liver not specified not specified
'I Cþsochloridae Chrysochloris asiatica cape golden mole spleen Cape Town, SouthAfrica L ¡nale Amblysomus hottentotus Hottentot golden mole spleen PaIm Springs, South Africa male
Xenarthra Bradypodidae Bradypus tridøctylus pale-throated not not specified nod specified three-toed sloth specified
Dasypodidae Dasypus novemcinctus nine-banded not not specified not specified armadillo specified Table 1-2. Genes included in the sequence analyses of 'p-like' globin genes with corresponding GenBank accession numbers. Hyrax-specific=ò.globin genà denotedbyasterisks(t). - '"'."-- (ôH) are
Eutherian Globin Accession superorder Species Common Name Reference gene No.
Afrotheria Loxodonta africana African elephant ò DQ09l20l This study
Elephas maximus Asian elçhant ô DQ091202 This study
West Indian Trichechus manqtus ô manatee DQ091203 This study
Dugong dugon dugong ò DQ091204 This study
ô DQ091208 Procavia capensis rock hyrax This study ôHx DQ091205 yellow-spotted Heterohyrax brucei +' hyrax ôH DQ091206 This study Dendrohyræ. dorsalis western tree hyrax òHx DQ091207 This study
Elephantulus intufi bushveld elephant ô DQ0912l1 strrdy sh¡ew p DQoe1212 This Dasypus Xenarthra nine-banded vô DQ091209 novemcinctus This study armadillo 13 DQ09l2l3
pale-throated vò DQ091210 Bradypus tridactylus This study three-toed sloth 13 DQ0912l4 Dasypus nine-banded Green 2004, .lrò, p 4C151518 novemcinctus armadillo unpublished Efst¡atiadis Euarchotoglires Homo sapiens human ô,p u01317 et al. 7980
ph2, ph3, Shehee et al. Mus musculus mouse x14061 þr,þ2 I 989
Oryctolagus Margof et al. cuniculus rabbit ,þþ2,þL Ml8818 1 989
Otolemur Tagle et galago ô,p u60902 al. crqssicaudatus 1992 õ Tarsius syrichta tarsier r04428 Koop et al. p J04429 I 989a
Schon et al. Laurasiatheria Capra hircus goat po M15387 t 98l
l5 (Millipore), and small samples of purified template (0.8 ¡rl) pipetted into 19.2 ¡rl aliquots of reaction mixture for subsequent amplification experiments. To increase template- primer binding specificity, a nested PCR protocol utilizing an adnèaling Gmperature (Ta) gradient of 50-60'C was employed. Nucleotide sequences of the resulting products were used to design species-specific primers to amplify the flanking regions of each gene via a walking reaction (APAgeneru Genome Walking Kit, Bio S&T Inc.).
A 15 ¡"tl sample of each final PCR product was mixed with 3.5 ¡"rl loading buffer and 1.85 ¡rl SYBR Green (Fisher Scientific), pipetted into gels prepared from 0.75 g of low melting temperature agarose dissolved in 50 ml lx TAE buffer, and electrophoresed at 100 V for I hr. Target bands were excised, and purified using a MinEluterM Gel
Extraction Kit (QIAGEN). These extracts were ligated using a eIAGEN@ pCR
Cloningnlu" Kit and transformations performed. Transformation products were incubated on agar plates containing O.lYo vlv ampicillin (100 mglml), X-GAL (40 mg/ml) aod
IPTG (100 mM) for l8 hr at37oC. Positive colonies were transferred to culture medium containing 0.lYo vlv ampicillin (100 mg/ml), and incubated (37'C for 18 hr) in 50 ml LB culture tubes placed on a platform shaker set to 225 RPM. Finally, a 3 ml subsample of this bacterial culture was pelleted by centrifugation (4 min at 5,000 RPM) and plasmid
DNA extracted (QIAprep@ Spin Miniprep Kit; eIAGEN).
To verify that the plasmids contained target bands, 20 ¡*l (350-400 ng) of purified plasmid DNA was digested in a restriction enzymemixh:re (l 5 ¡rl 10x REact@ 3
Buffer, 0.5 ¡rl EcoRl (10 U/¡rl) and 11.0 ul ddl{zo) for l-z hr at 37oC, then electrophoresed (80 V; t hr) on a lYo agarose gel. Positive samples \¡/ere sequenced in both directions with a 3739 ABI PRISM Genetic Analyzer (University of Calgary DNA
16 core laboratory) and BigDye Sequencing Kit using the universal sequencing primers, M-
13(F)-40 and M-13@) (Appendix 1-1).
Data analyses
Consensus gene alignments were constructed for each species using
Sequencherru (Versi on 4.2.2) software. Overlapping gene fragments incorporated between 3 and 14 cloned fragments amplified from 2 to 7 individual PCRs, respectively.
Dotmatcher (EMBOSS) software was used to identify homology, and MatGAT (Version
2,02) employed to calculate percent identity, between sequences of the 'pJike' genes amplified in this study and the ò- and p-globin genes of human, rabbit and mouse (see
Table 1-2 for GenBank accession numbers). For comparative purposes, the ô- and p- globin gene flanking regions of the nine-banded armadillo (Dasypus novemcinctus) were first identified in a working draft BAC sequence available on GenBank (accession number 4C151518, representing clone VMRC5-69F10 from the NISC Comparative
Vertebrate Sequencing Project) and included in these analyses. To assess promoter regions, the 5' flanking sequences of sequenced genes were aligned and compared with the 'pJike' globin genes of armadillq human, galago and tarsier (Table 1-2).
To examine the inter-relationships amongst paenungulates, a phylogenetic analysis was conducted on intron 2 (IVS2) sequences of the ò-globin genes ampliflred from these species, with homologous xenarthran sequences serving as outgroups. A
773 bp alignment was created using Clust¿l X (Version 1.8) (Thompson e/. al 1994) using gap opening and extension penalty values of 16 and 6.66, respectively, and refined by eye. A ML analysis was conducted on the alignment using PAUP* (Version 4.0b10)
T7 software, applying the HKY85 two-parameter model variant for unequal base
frequencies. AII positions were weighted equally and the transition:transversion ratio set
to 2.00. A heuristic search was executed with 1000 bootst.u¡r"pti"ut"* The final tree was constructed using random stepwise addition with weighted least-squares and TBR
algorithms.
18 RESULTS
A total of 10 complete and 3 incomplete 'p-like' gene products were amplified
and sequenced - from the lt afrotherian species (Table f 4r 'Complete' products
possessed gene regions running from at least the initiation to termination codons, while
_'incomplete' products were missing the gene region containing either one or both signals.
putative AII gene products possessed the three exon/two intron affangement (Appendix 1-
3) characteristic of other p-globin cluster genes. For genes with no frameshift mutations,
IVSI divided codon 30 between the second and third nucleotide, while IVS2 separated
codons 104 and 105. Comparative sequence examinations revealed that both gene
products obtained from the a¿rdvark and the single product from each ofthe golden and
Hottentot moles were artificial 'ò/p-globin' chimeras, with the conceptual point of
conversion occurring at exactly the same base position within each "gene"; these pCR
artifacts were thus excluded from all subsequent analyses. MatGAT and dotplot analyses
(see below) suggested that all but one gene product from each of the remaining
afrotherian species shared the highest degree of sequence homology with the ô-globin
genes of human, rabbit and mouse (Table I -2). Accordingly, these genes were classified
as ô-globins. Conversely, one of the amplified gene fragments from the elephant shrew was shown to share highest sequence similarity with other placental mammal p-globin
sequences Table @ig.1-9, 1-4) and classified as p-globin. Complete ù and B-globin gene sequences were obtained from both xenarthran species included in this study.
Classification of afrotherian 'þlike' globin genes
Dotplot analyses of the Asian (Fig. la) and African elephant (data not shown) genes (which shared 99.2% sequence identity from initiation to termination codons)
19 Table 1-3. List of 'BJike' globin gene products amplified in t[is_study.
Number of base Taxon Gene identity Amplified gene pairs sequenced region
Afücan elephant ò I,744 complete
Asian elephant ò 2,lrg complete
dugong ô 2,639 complete
West Indian manatee .ô L,696 complete
rock hyrax ô 1,180 partial
öH 1,609 complete
yellow-spotted hyrax òH r,604 complete
western tree hyrax òH 1,591 complete
aardvark ò/p r,177 complete, artifact
ò/p I,192 complete, artifact
bushveld elephant shrew p 1,840 partial
ò 180 partial
cape golden mole ò/p I,7L3 complete, artifact
Hottentotgolden mole òorp? 426 partial
p pale-throated sloth 1,3 18 complete
q,ò 1,590 complete
nine-banded armadillo p 1,383 complete
+ò 1,296 complete
20 recovered the highest sequence similarity with ò-globin genes of human and rabbit in the
IVS2 and the 3' flanking regions, while 5' flanking regions revealed homology with other ' 1.-- B-globins. Examination of the Asian elephant 5' promoter regioñ further revealed the
presence of all transcriptional control motifs characteristic of functional B-globin genes
(Fig. 1-5; Appendix 1-4). The translated protein chains of the African and Asian elephant
ò-globin genes were identical to the 'p-globin' protein chain determined for these species
(Fig.1-3), suggesting these genes are expressed.
Dotplot analyses of the dugong (Fig. l-6) and manatee (data not shown) gene
products (which were 96.2%o identical from initiation to termination codons) revealed
homology with other mammalian ô-globin genes in the IVS2 and 3' flanking regions.
Like the elephant gene products, the 5' flankig regions of both sirenian ò-globin genes
were p-like. However, both species possessed ATA --* GTA mutations at the ATA
transcriptional control motif (Fig. 1-5, Appendix l-4). While these base substitutions
may have an effect on the transcriptional regulation of these genes, the conceptual protein
chain of the Florida manatee ô-globin gene precisely matched the amino acid sequence of
the 'p-globin' protein chain determined from the Amazon manatee, Trichechus inunguis
(Fig. 1-3).
Dotplot comparisons of the first rock hyrax gene fragment (Fig. 1-7) also
suggested this gene was ò-like '\¡/ithin the IVS2 and 3' flanking region. This gene fragment (designated ô-globin) encoded for a protein identical to codons 82-146 of the
'p-globin' protein chain documented for the Abyssinian hyrax (Procavia capensis habessinica) (Fig. l-3). Notably, the second rock hyrax gene product was also òlike in the IVS2 and 3' flanking region (Fig. l-8). However, the translated protein sequence \ryas
21 Table 1-4. Percent identity matrix comparing the IVSI and IVS2 sequences of 'p-like'globin genes sequenced in this study and those of human, rabbit and mouse. Gap opening and extension penalties were set to 16 and 4, respectively. Numbers in gray represent IVSI comparisons while values in black denote fVS2 comparisons.
(õ' (ô' (ô' L(ô 2(E' 3 4 s tô' 6 7 rô' 8(ô 9 (ùô 10 ({'ô ll lß' 12 (B', t3 (ß 14 lô 15 tô) t6 lô L7 (ô 18 (B t9 (B' 20 (B', 2t (B'
1. Africa¡l elephant ô 98.9 84.0 84.5 73.d 75.0 76.1 77.2 67.f Á(6 44.7 49.5 47. 54.! 61.5 <¿. ¿ <ÉÁ 47.( 46.4 48.6 48.4 2. Asian ele¡hant ô 99.2 7aÁ 7(O AK? 83.8 84.2 77.2 67.Z 65. 49.7 48.2 47.1 54.! 61.2 5¿.0 (6.f1 ¿ß.n 46.t ß.2 4E.2 |n.3 3. dueons ò 9r.5 97.4 17.r 16¡ 76.8 77.8 67.1 65.f 46.9 50.4 48. 6t.', 1^¿ 48.5 46.2 48.2 47.9 4. manatee ô 91.5 9|.8 96.1. 77.1 78.2 69.0 66. 47.0 40 ¡< áRA s7.Á 62.t 54.t 57.( 49.3 46.8 47.l 47.5 ô 1^ á 5. rockhvrax 72.3 73.1 #î 60.3 45.8 49.6 47.',, ar. a (t{ s2.3 53.8 ß.2 47.1 41.1 47.2 J "i1.6 6. rockhyrax ôß i; ,1J 97.7 95.s 63.6 62. 47.2 49.7 47.1 s2-6 57.! 52.t <<¡ 48.', 47.t 41.5 ¿R(
7. vellow-sootted hwax ôs 7'J.4 7i. 8rl.fi Ii0.0 )5,5 95.9 64.2 62.! 48.1 49.8 48.', \17 l/.r 111 <Ã1 &1 47.4 b..) 48.6 48.0 t\) ôH 'li. 8. tree hrrax 7ô. fr0..l 3ri.8 9-ì..i -1,-6 65.1 627 48.t 50.3 dR{ <1 t (Pn 14. ô (i9,1 ,. human 68.t 68.t 68. í;.t. ÍJ li l. ,:tj 11 63..1 66.t 70.5 58.2 s2.4 51,3 49.4 41.( ß.4 44.! (ôlike) 15. ¡abbit rllß2 5+.2 5J.2 s6. i1 1 itl 19.{ ô1. 59.¡ ,t8.8 s8.J (il s6.2 55.t (63 ß.7 46( tt!'t ¿ 47.'. 16. 4.J. J .!i.1 mouse th2 lò-like) "t3.J 1i.1 l.i ¿.1.1 .rí. J4.l .11.7 mouse (öJike) ,.t(;. t'l Fh3 51 51 ! -l-<. I .t -r.t.B .19.( s6.-i 53.Í 45.3 46.5 4b.o 47.! 18. human ß tt) 68.9 r;fi.1 69.t 68. 6i h1 it {tJ.l 66.9 ii).5 i (l(i.{ì ll 18.5 47.1 aRÁ 47.3 19. rabbit ß1 (tJ.1 1;.3.1 ó 1..1 ót.r 69.0 id.{i ó(r,'I 65.1 7ti.t 57_ 1 -13. -17 . il).n 49.È s2.1 20. mouse 81 6Z 6t.tj (t..\. (¡2.1 ,l ()2 iÌf ,s.1 5i.{) 65.9 6:ì. 51.1 ,tB.: 5i (i-ì. 1 9.2 6^Á .I 5,J r 1 21. mouse 82 ¡.1. t il.3 5l 5-1.5 ¡1).7 .!5. 5.]. t { -r5 5t lti Fig. 1-3. Comparison of the'B-globin' chain amino acid sequences of four afrotherian and two xenarthran species (top row in each case) with those deduced from the ò- and p-globin gene seqüences determined for these species (bottom row; dots represent -Florida identity with the top sequence). manatee (7. manatu^s); Irock hyrax (P. capensis); $conceptual amino acid sequence of the armadillo p-globin gene (GenBank accession number 4C151518). The degenerate amino acid symbol B corresponds to D or N, while Z represents E or Q. 23 African elephant (Loxodonta africana); Braunitzer et al. (1982) (ô-globin) Asian elephant(Elephas maximus); Braunitzer et øt. (1984) Amazon manatee (Trichechus inunguis); Kleinschmidt et al. (1996) (ô-globin) Rock (Abyssinian) hyrax (Procøvia cøpensis habessinica); Kleinschmidt and Braunitzer (1983) r (ô-globin) .R...S...8..S,.S.,DEKII.S...... r....R T T....r¡K..R.Q... (ôH-globh) Pale-throated tfuee-toed sloth (Bradypus tridactylus); Kleiruchmi dt et al. (19g9) l,l f Nine-ba¡ded armadillo (Dasypus novemcinctus); de Jong et al. (l9gl) I (ftchain') ..D.EDC..E (p-globin) (ftglobín) Fig. 1-4. Dotplot comparisons (window size : 100; threshold : 80) of the putative Asian elephant ô-globin gene (Y-axis) with the ò- and p-globin genes of humans, rabbits and mice (X-axis). Sequences run from the 5' to 3' direction. Exons l, 2 and 3 for each gene are denoted by black boxes. 24 Asian elephant ô-globin gene com parisons ô-type genes /'/ ,r/ ca c (g E o- o rabbit yB2 qJ rabbit Bl c .(o 2000 1500 I000 sdo loòo ls'oo 2ooo mouse Bh2 mouse B1 mouse Bh3 mouse B2 Fig. 1-5. The 5' flanking sequence data and features of ò- and p-globin genes amplified in this study and those of armadillo, human, tarsier and galago. Data runs from seven bases upstream of the distal "GACCC" site to the initiation ("ATG') codon. The putative cap site ("CAp"¡ is underlined and lies 52 bp (51 bp in the sloth) upstream of the initiation codon. Sequences considered important for transcriptional regulation (CACCC, CCAAT, and ATA) are denoted by bold CAPITAL letters. Nucleotides surrounding the ATA motif correspond to a consensus sequence ("GGGCATAfuartr{G') found in many postnatally expressed p-globin genes (flardison 1983) and are shaded in gray. The consensus sequence "CTTPyTG' @aralle and Brownlee 1978), found seven bases downstream of the cap site, is also marked in gray. Colons (:) denote base deletions. Underlined bases represent those which deviate from the consensus sequence normally recovered in p-globin genes, but should not be considered significant unless they compose a sequence element involved in tanscriptional regulation. 25 " cAccc * ,,cAccc,, "ccÀÀT' +1L0 +100 +90 +80 +70 +60 +50 +40 Asian elephant ô caggcctCACCCtgcagaaccaCACCCtggc: : : : : cttggCC.AÀlctgctcacaagagcaaaaagggcagg: accagggtt dugong ô cagacctCAccCtgcagaaccaCACCCtggc!:!::ctyggCCAATctgytcccaggagcaagaagggcagaaaccagggtt manatee ô cagacctCACCCtgcagáaccaCACCCtggcr::::cttggCCAÀÍIctgctcccaggagca"gaagggåagaaaccagggtt armadillo r¡rô tcttacaaatcaatgaacatggatatatgctatgccttaggAcAÀIctgctùcaaggagcagagagggaagggaccaaggct human ò tttcattctcacaaactaatga.AAcCCtg : cttatcttaaaCCÀÀgctgct,cac 3 : 3 : 3 : : ! t,ggagcagggaggacaggac tarsier ô ttttattctcacaaaccaatcaatttct gccatatcctaggTEAÀTctgctcacatgagcagggagggcâagagtcagggct galago ô / galago tcagccctcctccAcacagccaCACCCtgag:::::ttcagCCTÀTctcctcataggtacagggagggcaggaaccagggca elephant shrew B ctccAggctctttctgcagagtCAclct9gc!3:::ctgggCCA.ã,Tctgcttgcagaagcaccatgggcaggacccagggct sloth B AC TCCCTGGAAGGA.AGGGGÀGGGCCCAGGGCT armadillo B cag ac ctCACCCtatggaaccaCACCCaggt, : : : ! : tgtgaCCAÀTctgctc agaggagcagagagggc agggaccagggct human B tagacctCACCCtgtggagccaCACCCt.agg : : : : : gttggCC.AÀlctactcccaggagcagggagggcaggagccagggct tarsier B cag gcctCACCCtgc agagcccCACCCt ggg : : ! : : cttggccAÀlctgctcccaggagcagtaagggc agt.agccaaggct "ÀTA" ,,ATc" +30 +20 +L0 0 _10 _20 _30 _40 _50 Asian elephant ô gggcArÀtaaggaagagtagtgcc agctgctgtttAcactcacttctgacacaactgtgttgactagcaactacccaatcagacaccatg dugong ô gg E cGlAaaar'fgaagrgcaggactagctgctgcttå'ctctLa.,:t-t:ct-ggcacaactgtgttgact,agcaactacacaatcagacvrcc manatee ô gggcGTÀaaaggaagagcagggctagctgctgcttAcacttacì:tctggcacaactgtgt.tgactaacaactacacaatcagacacca!g armadillo rpô gggcÀlAaaag: !::agcaaagccagctgttgcttAtacttgcr-Ltggacaaaact,gtgctct.ctagcaaccatacaaag'agacac4a{g human ô qêgCATAaaâggcâgggcagagtcAactgttgcttAcactttci:-.tsig¿sâtaacagtgttcact,agcaacctcaaa::cagacaccqùg tarsier ô gggcATAaaa'JgtagggcagagtcAactgctgctcAcgcttgc.htctgacacaaccgtgttcaccagcaaccacaaa: : cagacacJatg galago ô / galago g'ggcATAaaaEtgcagggcagggaactgctgct,gctAtacttgct-tt-cqacacaactgtgttcactagcaacca::: ! !:cagacatcatg I elephant shrew B gggcArAEaag : aag99ca99'acc agcaaaggcttAcactLqcL¿t c'r-qacacaagt,gtgttcactagcaactactcaaacagtcaagatg sloth B tgg cATAaaaggaggagaagggcc agctgctactcAcacttgcttcigacacagctgtgttcactagcaaccacaaaa: cagacaccatg armadillo B gggcATAaaagtaagagtaaggccagctgctgcttAcaattgct'Ectgacacatttgtgctcactagcaaccatataaagagacaccatg human B gggcATAaaagtcagggcagagccatctattgctt.Acatttgclrtctgacacaactgtgttcactagcaacctcaaa::cagacaccatg tarsier B Fig. l-6. Doþlot comparisons (window size : 100; tfueshold : 80) of the putative dugong ò-globin gene (Y-axis) with the ò- and p-globin genes of humans, rabbits and mice (X-axis). Sequences run from the 5'to 3' direction. Exons 7,2 and 3 for each gene are denoted by black boxes. 26 Dugong ô-globin gene comparisons ô-type genes _Þ_+yp" genes ,/ human ô human B oa cOl o rabbit yB2 (') rabbit B1 E= mouse BhZ mouse B1 mouse Bh3 mouse B2 Fig. 1-7. Dotplot comparisons (window size: 100; threshold: 80) of the putative rock hyrax ò-globin gene (Y-axis) with the ò- and p-globin genes of humans, rabbits and mice (X-axis). Sequences run from the 5' to 3' direction. Exons 2 (parttù, sequence) and 3 for each gene are denoted by blackboxes. 27 Rock hyrax ô-globin partial gene comparisons ô-type genes human ô human B t'O X (o rabbit yB2 rabbit B1 -c -\¿U LOl I mouse Bh2 mouse B1 ph3 mouse mouse B2 Fig. 1-8- Dotplot comparisons (window size: 100; threshold: 80) of the putative rock hyrax ôH-globin gene (Y-axis) with the ò-, and p-globin genes of humans, rabbits and mice (X-axis). Sequences run from the 5' to 3' direction. Exons 1,2 and 3 for each gene are denoted by black boxes. 28 Rock hyrax ôH-globin gene comparisons ô-type genes human õ human B r oO (gX ! rabbit yB2 rabbit Bl -c -VU o mouse Bh2 mouse Bl mouse Bh3 mouse B2 different at 31 of 146 residues (2I.2%) from the 'p-chain' of this species (Fig. 1-3). The amplified gene products of the yellow-spotted and tree hyraxes Sh¿red 9T9Yo and 95.8o/ø sequence identity with the second rock hyrax ô-like gene, respectively, and grouped closely together on the ML tree (Fig. l-la) suggesting the three loci are orthologous. These apparent hyrax-specific loci were thus designated ôH-globins. The first gene fragment amplified from the elephant shrew ran from +670 bp upstream of the initiation codon to the end of IVS2. Dotplot analyses recovered the highest degree of sequence similarity between this fragment and the p-globin genes of human, rabbit and mouse @ig. 1-9). Mutations at both CACCC transcriptional control motifs in the 5' promoter region of this gene were detected (Fig. l-5, Appendix 1-4). The second gene fragment, which ran from codon position 136 in exon 3 to +180 downstream from the stop codon, shared the highest degree of sequence similarity with eutherian ò- globin genes (data not shown). Classifcation of xenarthran 'þlike' globin genes Dotplot (Figs. 1-10) and MatGAT (Table 14) analyses suggested that one armadillo 'p-like' gene product was homologous with eutherian ô-globin genes in IVS2. This gene was presumed a pseudogene (r¡rò) due to a premature termination signal at codon 22. The second armadillo gene product showed highest sequence similarity with other mammalian p-globin genes in the IVS2 and 3' regions (Fig. 1-11). Conspicuously, the deduced protein sequence ofthis gene differed from the p-globin protein sequence at 4 of 146 positions (2.1%) throughout exons 1,2 and 3 (Tig. 1-3). From the initiation to 29 Fig. 1-9. Dotplot comparisons (window size : 100; tlreshold : 80) of the putative elephant shrew p-globin gene (Y-axis) with the ò- and p-globin genes of humans, rabbits and mice (X-axis). Sequences run from the 5' to 3' direction. Exons I and 2 for each gene are denoted by black boxes. 30 Elephant shrew B-globin partial gene comparisons ô-type genes Þ-type genes I 500 500 lo'oo t 500 2000 human ð human p B (U -c r rabbit yB2 P rabbit Bl c(g -c o o ,/ mouse Bh2 mouse B1 mouse Bh3 mouse B2 Fig. 1-10. Dotplot comparisons (window size : 100; threshold : 80) of the putative armadillo rþô-globin gene (Y-axis) with the ò- and B-globin genes of humans, rabbits and mice (X-axis). Sequences run from the 5' to 3' direction. Exons 1,2 and 3 for each gene are denoted by black boxes. 3t Armadillo ryô-globin gene comparisons ô-type genes Þtype genes o sdo robo I 500 human ô human p CrO s00 :9 rabbit yp2 -o rabbit B1 (U E L (u mouse Bh2 mouse B1 mouse Bh3 mouse B2 Fig. 1-11. Dotplot comparisons (window size: 100; threshold:80) of the putative armadillo p-globin gene (Y-axis) with the ô- and p-globin genes of humans, rabbits and mice (X-axis). Sequences run from the 5' to 3' direction. Exons 7,2 and 3 for each gene are denoted by black boxes. 32 Armadillo B-g lobin gene com parisons ô-type genes Þ-type genes 500 1000 human ð human B o E ro rabbit r¡p2 rabbit E B1 L (o 500 1000 mouse Bh2 mouse B1 5Ci0 ph3 mouse mouse B2 conceptual stop codons, nucleotide sequences of the armadillo rþô-globin and B-globin genes were found to share a high'degree (99.4Yo and 99.3%o) oEjeQgenceidentity with the two genes identified at the 3'-end of the Dasypus novemcinclzs B-globin cluster. This 3'- most pJike gene seemingly possessed all the 5' and 3' control elements necessary for transcription. Conversely, the ôlike gene was missing both CACCC and CCAAT 5' transcriptional control motifs and contained no 3' poly-adenylation signal (Fig. 1-5; Appendices 1-4 and 1-5). Dotplot analyses showed the ô-globin gene to be ôlike and the p-globin gene to be B-like in their respective 5' and 3' flanking regions (data not shown). Dotplot (Figs. 1-12, l-13) and MatGAT (Table 1-4) analyses suggested that one sloth 'pJike' gene product shared homology with eutherian ô-globin genes, while the second gene product was homologous with other p-globin genes. The ô-globin gene was presumed a pseudogene (tpò) due to a two-base deletion at codon 77 resultin g in a premature termination signal at codon 87 in exon 2. The conceptual protein chain of the sloth p-globin gene was identical to the amino acid sequence of the 'p-globin' chain (Fig. l-3) Phylogenetic analysis of ùglobin IVS2 regions The ML analysis of ò-globin IVS2 sequences (773 bp) grouped elephants and sirenians together to the exclusion of hyraxes (Fig. 1-14). However, bootstrap support for this assemblage was relatively low (56%). Bootstrap support was strong (37-100%) for all other groupings. The three òH-globin genes were closely clustered together and placed J3 Fig. 1-12. Dotplot comparisons (window size : 100; threshold : 80) of the putative sloth r¡rò-globin gene (Y-axis) with the ò- and p-globin genes of humans, rabbits and mice (X-axis). Sequences run from the 5' to 3' direction. Exons 1,2 and 3 for each gene are denoted by black boxes. 34 Sloth tyô-globin gene comparisons õ-type genes Ê-typ" genes 1000 ,/ 0 500 l5oo 1000 o söo toilo rsöo human õ human B 500 I'O 0 0 500 1000 1500 0 500 1000 -c rabbit yB2 rabbit B1 Po t^ 500 0 o 5oo tooo rsöo 0 500 1000 1500 mouse Bh2 mouse Bl 1 t 000 500 o 5oo rcbo 0 500 to00 ts00 mouse Bh3 mouse B2 Fig. 1-13. Doþlot comparisons (window size : 100; threshold : S0) of the putative sloth p-globin gene (Y-axis) with the ò- and p-globin genes of humans, rabbits and mice (X-a*iÐ. Sequences run from the 5' to 3' direction. Exons 1,2 and 3 for each gene are denoted by black boxes. 35 Sloth p-globin gene comparisons ô-type genes F-type genes ,/ ,/ 1000 r 500 human ô human B ./ -Ò co- s00 sdo -c rabbit ryõ2 rabbit Po B 6 500 500 1000 mouse Bh2 mouse B1 OF 500 l0¡0 0 500 mouse Bh3 mouse p2 Fig. 1-14. Xenarthran-rooted maximum likelihood tree constructed from homologous ô-globin fVS2 gene sequences (773 bp; see text for details) illustrating the inter- relationships ¿tmong paenungulate mammals. Values along each branch denote the degree of sequence divergence. Bootstrap support (1000 replicates) for each node are indicated in large whole numbers. 36 armadillo tyð 0.09 4 o-o1 0 rock hyrax ôH yellow-spotted hyrax ðH 0.01 1 - western üee hyrax ôH rock hyrax ð o-007 manatee ô 0.02 1 dugong ò - 0.005 African elephant ô Asian elephant ô 0. 05 substitutions/site as a sister group to the rock hyrax ò-globin gene with high bootstrap supporl (57%). Finally, the tree hyrax was placed as the most basal branch of the hydracoidean lineage. -=.- 37 DISCUSSION The discovery of a ô-like globin gene within members-of, the superordinal clades, Afrotheria and Xenarthra, supports the theory that the ancestral mammalian proto-p- globin gene duplicated prior to the eutherian radiation (Ilardies et al. L984; Hardison 1984; Hardison and Margot 1984; Hutchison er at. 1984) However, the data moreover suggest that, contrary to other placental mammals, ô-like genes are the only post-natúly expressed loci of the p-globin cluster found \¡/ithin the red blood cells of paenungulate species. This finding challenges the widely perceived notion (Martin et ø1. 1983; Hardies et al. 1984; Hardison 1984; Hardison and Margot 1984; Koop et al. 1989;Prychitko et al. 2005) that ò-globins have been little more than relic genes (i.e. 'failed experiments') since their inception. It is further proposed that modifications within the p-cluster leading to the sole expression of the ôlocus were already present in stem afrotherians prior to the radiation of the paenungulate clade. These assertions are well supported by several lines of evidence. The first is based on the observation that the 5' promoter (Fig. 1-5) and 3' poly-A signals (Appendix 1-5) necessary for transcription were detected flanking the ô-globin gene of all paenungulate species for which this region was obtained. However, it is noteworthy that the 5' external ATA box of both sirenian species possessed an ATA ) GTA substitution @ig. 1-5). This site is thought to be crucial for the transcription of eukaryotic genes by RNA polymerase II (Dierks et ø1. 1983; Kosche et al. 1985} leading Satoh et al. (1999) to speculate that a similarly altered ATA box might diminish or prevent activity of the rat yl-globin gene. In contras! data collected for this study suggest that the mutated 5' GTA sequence does not adversely affect expression of the ôJocus in sirenians. Support for 38 this contention is provided by the observation that the conceptual polypeptide chain encoded by the ô-globin gene of the Florida manatee (Trichechus manatus) precisely .l . matches the 'p-globin' chain of the closely related Amazoi manatee, T. ínunguis (Kleinschmidt et al. 1986). In addition, putative protein chains translated from the õ- globin gene products of the African elephant, Asian elephant and rock hyrax corresponded to the respective 'p-globin' protein chain sequences of these species (Braunitzer et al. 1982; Kleinschmidt and Braunitzer 1983; Braunitzer et al. l9S4). Because the hemoglobin molecules of adult proboscideans (Kleihauer et al. 1965; Braunitzer et ql. 1982; Braunitzer et al. 1984) and sirenians (White et al. 1976; Kleinschmidt et al. 1986) examined to date possess a single component (i.e. are composed of only one type of 'a-like' and one type of 'p-like' chain), this implies that the pJocus is either non-functional or absent in these groups. Interestingly, hemoglobin of the rock hyrax possesses two chromatographically distinct (i.e. HbI and tIbII) components (Kleinschmidt and Braunitzer 1983). However, amino acid analysis suggested that like the two 'a-like' chains, the sequences of the two 'pJike' chains of this species were identical, a finding the authors could not explain. The discovery ofa second highly conserved ôlike (ôH) gene in each ofthe three genera raises the possibility that this locus is expressed in hyraxes. Clearly, studies aimed at detecting ôH mRNA transcripts in these species are required to resolve this issue. Because so many ô-globin pseudogenes have been reported to date, the expression of a fully functional òlocus in paenungulate (and likely other afrotherian) mammals is intriguing, and raises the question: how did the ôlocus gain an upper hand over the largely dominant pJocus in this clade? Traditionally, inactivation or diminished 39 expression of the ölocus has been attributed to the absence of intragenic enhancers (Antoniou et al. 1988) and unstable mRNA transcripts (Wood et al. 1978; Ross and ' -1-. Pizano 1983; Kosche et al. 1985; Steinberg and Adams l991r). - However, it is now primarily thought that an ineffrcient promoter region, typically lacking (or possessing mutated versions of) regulatory elements crucial for proper transcription has prevented this locus from attaining the transcriptional success of its predecessor (Vincent and Wilson 1989; Koop et al. 1989; Tagle et al. 1991; Prychitko et al.20O5). Indeed, the region immediately upstream of the òloci generally contains mutations in two important structural motifs (the CCAAT and proximal CACCC boxes) upstream of the Cap site that interact with specific proteins integral for gene activation (Efstratiadis et al. 1980; Grosveld et al. 1982; Dierks et al. 1983; Kosche et al. 1985; Myers et al. 1986). The ò- locus of galagos notably contains an int¿ct CACCC control sequence and is expressed at much higher levels (40% of erythrocytic'B-like' chains; Tagle et al. I99l) compared to other hominoids (2-18Yo;Boyer et al. l97l; Ross and Pizarro 1983; Tagle et al. 1988). In addition, otherwise non-expressed ò-globin genes can be activated in both erythroid and non-erythroid cells by introduction of intact CCAAT and CACCC control sequences into the 5' flanking region of this locus @onze et al. 1996; Tang et al. 1997; Ristaldi er al. 1999). Normal expression of 'pJike' globins thus appears to largely reside in these motifs. Significantly, these important transcriptional control elements, plus a second distal CACCC box (a feature characteristic of functional 'p-like' globin genes; Grosveld et al. 1982; Dierks et al. 1983; Myers et al. 1986) (Fig. 1-5), were detected in the upstream region ofboth sirenian and probscidean ô-globin genes. 40 Hints into the mechanism and probable scenario of the evolutionary events leading to this never before documented attribute can be inferred from the p-clusters of other eutherian mammals (see Fig. l-1; Konkel et al. 1979; Sðnon ¿, at.1OU;Lingrel et al. 1983; Shapiro et al. T983; Hardison and Margot 1984; Townes et al. 1984; Schimenti and Duncan 1985b; Cooper et al. 1996; Satoh et al. 1999), which have demonstrated that the òJocus has been independently converted by the neighbouring p-locus in most species examined (see Fig. 1-1). However, with the exception of the galago (Tagle et al. 1988), these recombination events were insufficient to alter the promoter regions of the ò- locus. Dotplot analyses of the paenungulate ò-globin genes, which indicated their 5' transcriptional control regions are B-like, suggest their presumably capable promoters may have arose via conversion by the 5' region of the p-locus. An alternate scenario is that the region downstream of exon 2 of the p-globin locus was converted by ô. This contention seems unlikely, however, since the amplified pJike globin gene of the bushveld elephant shrew (Etephantutus intufi) shows no evidence of conversion by a ò-like locus (Fig. l-9). Unforhrnately, the hemoglobin of E intufi has not been examined, though that of the East African elephant shrews, Petrodromus sultani and Rhynchocyon chrysopygus> possess two distinct IIb components expressed in relatively equivalent proportions (Buettner-Janusch and Buettner-Janusch 1963). While this observation could signi$ the presence of two different 'pJike' protein chains in these species, the p-locus of E intufi possesses a mutated 5' CACCC transcriptional control motif () CACTC) and lacks the second distal 5' CACCC box common to transcribed 'Blike' genes (Grosveld et al. 1982, Dierks et al. 1983; Myers ø/ al. 1986; Fig. l-5), and is therefore probably not expressed. Together with dat¿ collected 41 from the paenungulates, this finding implies that, within afrotherians, up-regulation of the ò-locus preceded silencing of p, and that both events early in the evolution of this basal eutherian clade. Following gene duplication, the failure of t Oa."og to adopt a ne\ry function or expression pattern often leads to its nonfunctionalization (Ohno lgTO) Thus, once the equally apt ò-locus was co-expressed with its predecessor (B-globin) in stem afrotherians and structurally configured for the sarne physiological role (oxygen transport), natural selection was likely too weak to prevent silencing mutations from accumulating in one of these superfluous genes and randomly, the p-locus was inactivated. The hemoglobin of adult nine-banded armadillos consists of one type of 'c-chain' and two types of electrophoretically distinct'p-chains' (de Jong et al. l98l). While the amino acid sequence of the minor 'B-globin' component is unknown, the major 'p- globin' chain differed from the translated protein sequence of the amplified armadillo p- globin gene at four residues (Fig. 1-3). However, because the amplified p gene shared 99.4% sequence identity with the 3'-most B-globin cluster gene (identified in the Dasypus novemcinclzs B-globin cluster; Genbank accession number ACl5l5l8) which encoded a protein sequence 99.3% identical (145 of 146 residues) to the major'B-like' polypeptide sequence, these residue differences possibly reflect nucleotide misincorporations or, more likely, the occurrence of two alleles within the population. Additionally- both sequences differ from the major "p-globin" chain at position 23 (histidine vs. cysteine), suggesting that the published amino acid sequence is incorrect at this site. Because the armadillo ô-globin locus appears non-functional, the gene encoding the minor 'B-like' globin component of this species is currently unknown, but speculated 42 to either be e-globin or represent a locus situated outside of the p-cluster (see Chapter II) Unlike other eutherians, the rþò-locus of armadillos does nol to have undergone lpïar conversion by p-globin (Fig. 1-10). The only other eutherian ô-gtobin gene thought to have remained unaltered by conversion is the ph2 pseudogene of mice (Phtllips et at. 1984; Hardies et al. 1984). However, these loci show little, if any, sequence homology at the 5' flanking region (data not shown), possibly signifuing a high degree of sequence differentiation or lack of orthology between the two loci. The hemoglobin of three-toed sloths also possesses two chromatographically distinct components (Kleinschmidt et al. 1989). While the analysis indicated the major 'p-chain' is encoded by the p-locus, the amplified rpô-globin gene possessed premature termination signals within the coding region and appeared to have undergone conversion of IVSI and exon 2by the pJocus. This recombination event was evident from the high sequence identity shared by these two gene regions (96.4% and, 97.3Yo, respectively), compared to 79.3o/o for exon 1 and 85.3% for exon 3. It seems unlikely that the frameshift mutation in exon 2ledto the silencing of this gene since the rþò-globin gene of armadillos appears to have been silenced in the absence of conversion events. Hence, the òlocus was likely already transcriptionally silent prior to the divergence of these two species. Paenungulate inter-relationships have been extremely diffrcult to resolve using molecular data (Amrine and Springer 1999; Waddell and Shelley 2lI3;Nishihara e/ a/. 2005). The present analysis of ò-globin IVS2 sequences is the fïrst to incorporate molecular data from all T extant genera to address this question. The ML tree (Fig. 1-la) placed elephants and sirenians as sister taxa to the exclusion of hyraxes; however, this 43 clade was resolved with only 56% bootstrap support. While providing support for the traditional 'Tethytheria hypothesis' (McKenna 1975), the meager support value adds to a growing body of literature suggesting that the three orders diiverged o*, u short time interval (Murata et al. 2003; Waddell and Shelley 2003; Nishihara et al. 2005). Finally, the treg hyrax was placed as the most basal hydracoidean (Fig. l-14), a result concordant with the conclusion of Prinsloo (1993) based on restriction fragment length polymorphism (RFLP) and phylogenetic analyses of the mitochondrial genes, cytochrome b and 12S rRNA. In conclusion, data from this study reveal that the F -* ò duplication event within the mammalian p-globin cluster occurred prior to the eutherian radiation. The dat¿ additionally provide evidence that xenarthrans, like euarchontoglire and laurasiatherian mammals, express the p-globin gene while suppressing the öJocus. However, it appears that the ô-locus v/as converted in its 5' promoter region by F, then uniquely gained a functional advantage over the normally dominant 3' locus early in the evolution of afrotherian mammals. The subsequent silencing of the pJocus in this clade thus challenges the widely accepted view that this locus is indispensable and has remained unaltered throughout eutherian evolution Qlardison and Margo t 1984;Tagle et al. l99L; Prychitko et at.2005). 44 CHAPTER tr MOLECULAR EVOLUTION OF THE EUTHERIAN P-GLOBIN CLUSTER INFERRED FROM AFROTHERIAN AND XENARTHRAN GENE SEQUENCES. 45 INTRODUCTION The respiratory pigment hemoglobin, which binds in the lungs and .olV-q"tt delivers it to the tissues, consists of four heme groups bound ã fou. globin polypeptide strands: two 'oJike' and two 'p-like' chains, comprising 141 and 146 amino acid residues respectively (Weber and Wells 1989; Poyart et al. 1992; Weber 1995). Phylogenetic studies based on nucleotide and protein sequence comparisons have suggested that the progenitor genes encoding the a- and p-globin components arose from the duplication of a single ancestral vertebrate globin gene roughly 450 mya (Czelusniak et al. 1982; Goodman et al. 1987). The proto-p-globin gene is thought to have duplicated prior to the radiation of mammals (-200 mya), producing a two-gene (proto-e and proto- p) cluster @fstratiadis et al. 1980, Goodman 1981; Czelusniak e/ at. 1982). Comparative analyses examining the p-globin clusters from four of l8 extant placental orders (i.e. the orders Rodentia, Lagomorpha, Artiodactyla, Primates) further suggested these paralogs underwent additional duplication events early in the evolution of eutherians, with the last common placental ancestor likely possessing either a four- (Hardison 1983; Hardison 1984) or five-gene cluster (F{ardies et al. 1984; Harris et al. 7984. Goodman et al. 1984; Hardison and Gelinas 1986; Tagle et al. 1988) in the following orientation: 5'-e-y-(r¡)-ô- p-3'. The 'e-like' globin genes (e, 1 and r¡) are thought to have arisen from the proto-e- globin locus, while ô and p are descendents of the proto-p-globin locus (Goodman et al. 1987; Koop and Goodman 1988; Cooper and Hope 1993; Cooper et al. 1996). Chromosome mapping studies have further revealed that these genes are typically arrayed 5' to 3' on the chromosome in the order which they are expressed (Tuan et al. 1985; Forrester et al. 1987). Hence, expression of the 'e-like' genes are generally limited to 46 embryonic erythrocytes (Hardies et al. 1984; Hardison 1984; Hill et at. l9B4), with y- locus expression extending to the fet¿l red blood cells only wiithin anthropoid primates (Fitch et al. 1991). Compared to post-natal stages of eutherian development, embryonic and fetal growth takes place in a relatively static gestational environment and consequently, the genes encoding post-natal hemoglobin chains are generally less conserved than those comprising pre-natal hemoglobins (Shapiro et al. 1983; Hardies et al. 1984;Harns et al. 1986; Satoh et al. 1999; Koop and Goodman 1988) Oddly enough, comparative studies have suggested the non-coding regions of 'e-like' globin genes diverge more slowly than those of their adult-expressed counterparts as well (Shapiro et al. 1983; Hardison 1984; Hardies et al. 1984; Gebel et al. 1985; Slightom et at. 1985; Koop et al. t9ï9b). While the non-coding evolution rates of p-globin cluster genes are comparable to the neufal mutation rate of pseudogenes (4 - 5 xl0-e substitutions/site/year; Hayashida and Miyata 1983; Kimura L983;Li et al. 1985) for members of Laurasiatheria and Euarchontoglires, they are markedly low in primates (^2..9 xl}-e substitutionslsitelyear; Giebel et at. l9B5; Harris et al. 1986; Koop et al. 1989b). This latter observation, known as the 'hominid slowdown' (Li et al. 1987; Hasegawa et at. 7989; Bailey et at. L997), has been attributed to the longer generation time of primates relative to other eutherians (Gebel et al. l9B5; Wu and Li 1985). As noted earlier, it is generally recognized that the progenitor of placental mammals possessed either a four- or five-gene p-globin cluster. However, comprehensive molecular phylogenetic studies (Madsen et al. 2007; Murphy et al. 200la,b; Amrine-Madsen et al. 2003) suggest this hypothesis is based on data from only 47 the two most derived (Euarchontoglires and Laurasiatheria) of the four major placental clades. It is therefore unknown whether the four- or five-gene cluster arose before or after the divergence of the superorders Afrotheria and Xenartnø=(Èig l-Ð Accordingly, the primary goal of this study was to obtain sequence data on the'e-like' globin genes from members of the superorders Xenarthra and Afrotheria and, together with nucleotide data obtained from their 'BJike' globin genes (see Chapter I), derive the composition of the ancestral p-globin cluster of stem eutherians. Furthermorg because nothing is known regarding the rates of evolution of the ' e-like' and 'pJike' globin genes of xenarthran and afrotherian mammals, the second goal was to compare the rates of DNA evolution for the coding and non-coding domains of the globin genes found within their p-clusters. Finally, the above noted mutation rates were contrasted with those of other eutherian B- globin cluster genes in order to gain insight into the postulated link between generation time and the rate of 'neutral' evolution. 48 MATERIALS AND METHODS Data collection PCR, cloning and sequencing protocols were identical to those outlined in Chapter I, with the exception that PCR primers used for initial amplifïcation reactions were designed from a 411 bp 'e-like' globin genefragment sequenced from both African and Asian elephants (Singh 2002). In additior¡ 128,121 bp of consensus sequence containing the p-globin gene cluster of the nine-banded armadillo (clone VMRC5-69F10; NISC Comparative Vertebrate Sequencing Project), together with the complete e-, y- and r¡-globin genes of human, rabbig mouse and goat (Table 2-1) were downloaded from GenBank and included in the analyses. Data analyses Consensus gene alignments were constructed from each species using Sequencherru (Version 4.2.2). Overlapping gene fragments incorporated between 4 and, 7 cloned fragments amplified from 3 to 5 individual PCRs, respectively. Dotmatcher (EMBOSS) was used to determine homology between the sequenced 'elike' genes and those identified from the armadillo p-cluster (see above; Table 2-1) with the e-, y- and q- globin genes of human, rabbiq goat and mouse (Table 2-1). Additionatly, MatGAT (Version 2 02) software was employed to calculate the percent identity between the non- coding regions (tVS, 5' and 3' external) of the above noted genes. To assess promoter regions, the 5' external sequences of sequenced genes were aligned with the 'e-like' globin genes of armadillo, human and goat (Table 2-1). 49 Table 2-1. Genes included in,the sequence analyses of :eJike'globin genes with corresponding GenBank accession numbers. Eutherian Globin Accession Species Common Name Reference superorder gene No. Afrotheria Loxodonta africana African elephant DQ09l2ls This study Elephas maximus Asian elephant ¡ DQ0el2l6 This shrdy West Indian Trichechus manatus DQ09r2t7 This study manatee Dugong dugon dugong DQ091218 This study bushveld Elephantulus intuf DQ091219 This shrdy elephant shrew pale-throated Xenarth¡a Bradypus tridactylus e DQ091220 This study three-toed sloth Dasypus nine-banded Green 2004, t, tÞT, qrq ACls15lS novemcinctus armadillo unpublished Efstratiadis ¿l Euarchotoglires Homo sapiens human s, q,lï u0l3l7 t', a\.1980 Shehee et al. Mus musculus mouse y, ph1 xt406l 1 989 Oryctolagus Margot et al. rabbit Ê'T MI8818 cuniculus 1989 €, et al. Tørsius syrichta ta¡sier Y, M8141l Koop Vn l\/133973 1989a eI Shapiro er a/. Laurasiatheria Capra hircus goat x019t2 eII x01913 1 983 50 The coding regions of sequenced and representative eutherian 'e-like' and 'pJike' globin genes devoid of frameshift mut¿tions (Table 2-5) were analyzed at synonymous (silent) and nonsynonymous (replacement) sites. the percentage of synonymous and nonsynonymous substitutions between the sequenced genes and their human ortholog, was calculated using the Nei and Gojobori (1986, equations 1-3) method with DnaSP software (Version 4.0). Corrections for superimposed (e.g. A--T-C) and back mutations (e.g. A--*T*A) occurring at the same site were determined according to the method of Jukes and Cantor (1969). Mutation rates were calculated by dividing the percent substitution values by the estimated time of divergence between each of the compared species (with afrotherians, xenarthrans, laurasiatherians, atriodactylans and prosimian primates splitting from the main eutherian tree at 107, IO2, 87,85 and 77 mya, respectively; Springer et al. 2003). The evolution rates of non-coding DNA were calculated for eutherian 'e-like' and 'p-like' globin genes by first subtracting the percent similarity between the IVS2 regions of the sequenced gene and its human ortholog (taken from the MatGAT analysis of IVS2 sequences, see Tables 1-4 and 2-3) from 100. The resulting value (representing % divergence) was divided by the estimated time of divergence between the sequenced and compared (human) species to infer the non-coding mutation rate. 5l RESTILTS A single complete 'e-likel globin gene was amplified.from_the African elephant, Asian elephant, dugong, manatee and sloth, while a partial'e-like' gene product was amplified from the elephant shrew (Table 2-2). The genes were classified as e-globin or y-globin based on homology with other eutherian 'e-like' globin genes (see below). Classifcation of y-globin genes Dotplot analyses of the Asian (Fig 2-l) and African (data not shown) elephant 'e- like' genes (which were 98.7Yo similar from initiation to termination codons) suggested they were homologous with eutherian y-globin genes in the IVS2 regions. Moreover, MatGAT (Tables 2-3, 2-4) analyses recovered highest sequence similarity with other placental y-globin genes in all non-coding regions. Within the Asian elephant's 3' gene flanking region, two putative poly-adenylation signals (AATAAA) were found at +195 bp and +213 bp from the termination codon (Appendix 2-5), -T40 bp downstream from the site typical of functional eutherian y-globin genes (i.e. +62 bp from the stop codon; Efstradiatis et al. 1980; Margot et al. 1989; Shehee et al. 1989). The absence of frameshift and splice junction mutations suggested both elephant 1-globin genes are capable of transcribing polypeptides. Doþlot analyses of the dugong (Fig.2-2) and manatee (data not shown) 'e-like' globin gene products (which were 95.9Yo identical from initiation to termination codons) revealed homology with the y-globin genes of other eutherians in the IVS2 region. Similarly, MatGAT (Tables 2-3,2-4) analyses of the dugong and manatee'e-like'genes recovered highest sequence similarity with placental 1-globin genes in all non-coding 52 --- Table 2-2. List of 'elike' globin gene products amplified in this study. Number of base Täxon Gene identity Amplified gene region pairs sequenced African elephant r,455 complete Asian elephant 1,798 complete dugong 2.041 complete manatee 1,578 complete elephant shrew 1,r29 partial sloth 2,122 complete 53 Table 2-3 Percent identity ' matrix comparing the IVS I and IVS2 external sequences of 'e-like' globin genes sequenced in this study with those human, of rabbit, mouse and goat. Gap opening and extension penalties were set to l6 and 4, respeotively. Numbers in gray represent IVS I comparisons while values in black denote [VS2 comparisons. 1(e) 2( e 4G s(e õ{ a 7(r 8tu th IO lv\ 11 (v.) 12ht:v 73 tuA 74tu 150 16û¡n l'/(rn lR l'm\ l. elephant shrew e ss.2 56.3 5lß <{n d8.ß A1 \ ffi 48;, lnî 48.3 ¿sn 47.4 47.t 47.6 tnl AqK 48.3 48.( 2. sloth e 75.5 K.2 4SÃ W 61.7 51.J 47.8 47.7 49.7 44.1 50.1 ¿s_fl 4e ¿, 51.t ,18,5 ¿ßs 48.6 l. armadillo e 8{',t.2 57.7 K3R a1 1l 50.4 49.7 49.s 49.S sr;1 50.i 19. ¿ 49.2 51.J 50.8 49.6 50.4 4. human e '70.1 6t-.i 60.4 4R.6 a1 . 48.1 49-O 49.5 49.r 44.3 47.a 47,3 s0.4 {t.5 48.7 48.0 5. rabbit e 7't.8 6-1.,( 72 I 52.7 50.7 dRR ffiå 51.6 51.2 49.9 49.1 âR./l 47.9 51.1 49.4 A9A 49.1 (e-like) 6. mouse v +9.(t t7 .1 5{J.J (/ì -16.: 42.5 &.3 47.3 47.3 46.8 {t.î 46.6 47.f asJ 47.4 47.8 47.5 À 7. eoat el (t(t,4 6l.J 62.1 6'2 46.5 47 ,2 48.3 48.2 47.6 {t,1 47.( 48.( 45.8 47.1 46.3 8. Aûican eleohant v 65.ú rí(1. {, 59.2 59.2 J6..ì {.4 I 98.( u.2 85.0 63.5 59.9 2 (t2.1 :ì.2 .9 6{¡.1 {;6.1 1ì I t1 t 47.1 44.2 I ,49,0 48.1 q7 15. mouse bh1 (,y-like) I s().J f{,.11 n(,.2 '| l:.Ì.-.r ¿\),( -lft,3 i2.3 5i.5 J3 ,{, S{ .( 52.( 49.6 49.4 49.0 16. armadillo rlm ìlr Ìf¡ i L) +1, ì. 11.' J1.-i 19.ft -1fi.{i il)0.0 +5.3 +3.J :i.t 57.0 c¿q 17. huma¡l r¡n i1 il 51.9 5{) .t iq I !! 5S.1 59. -l Sri.{i -.t6.9 52.t: 5{).f 56.0 1 18. soat eII ft1-like ) :-iJ 56.9 lç -: í-{.: -x,1;_! {7 3 1.-ì 56.J 5J_{} 5f () 11" t! I Table 2-4' Percent identity matrix comparing the 5' and 3' extemal sequences of 'elike'gtobin genes amplified in this study with those of human, rabbit, mouse and goat. Gap opening and extension penalties were set to 16 and.4,respectively. Numbers in gray represent 3' comparisons while values in black denote represent 5' comparisons. ,7 1 (e) Ê 3 (e) (e 2t 4 s( t o( € (e 8 (v) 9 lvl 10 (rpv) 11 (yA LZ (y\t 13tu 14(rUrl) i5(Un L6 h 1. elephant sh¡ew e ffi 2. sloth s 5J 80.6 1It 7 74.8 69.3 13.9 61.8 44.7 54.i ¡,t.t s7.3 59.0 61.3 3. armadillo e 52.6 71.3 73.4 70.9 72.4 73.4 64,0 47.2 14Ã 5R.fl 57.4 57.5 63.4 4. human e 59.8 tf). I 66.1) 74.3 77.8 73.9 c7 1 43.2 49.1 595 53.9 5d.s ß). /t 5. rabbit e qf () q() () (t(i 58,9 7l.l 1?7 60.J ¿5 I <(n 60,6 s7.ß (R( (/¡l 63.3 6. mouse v fe-like) 53.2 51.5 s{}_l it', LA -t().() 65.8 tl6-6 54.4 c?q {K1 <1 I 67 I 7 soat ¡ 50.( 61.0 á3.2 {;'.t.6 :i5. I :¡]. 45.1 45.6 s9.0 60.J 59.s 60.0 64.4 8. Asian elephant v 19.1 J.7.t t'7.'1 á'tt.7 f-i.2 .J3.9 J0.3 9. dusone v -17.3 49.7 19.6 ¿8.2 lta .l .ll i.í- t.:'l 45.1 61.6 58.4 r<1 q Ë1 Â 60.5 10. armadillo ruv 36.5 JJ.iI .1,!:,.ß ¿:\.4 5i.0 5¿.3 ¿Á*, 47.6 47.{ 48.3 11. h uman vA 56.¡i ,16.3 Ji.l¡ -t5.-¿ ri.i r;.-i Jif ,U 52.6 :(r.6 69.3 59.t 58.1 s7.7 t2 rabbit v 50.1) J6.¡ 46. -{5 il r ¡l (t Jl¡.-.1 ,¡ì -ìti 51.7 62.1 59.6 60.1 57.s I3. motrse hh l r'vJilcel 5{r.6 5J.1 .-i5.1 .q6.fl :-:{. -1 i{ì I f T I 5¡.i ¿5.9 46.'l) 15.2 ,51.8 60.7 14. armadillo rtrtr 43.8 -f5.1 -15.6 1{) l .tiJ. ì J{1..ì tÉ fì {) J6.3 "tJ.{ -18.1 Jfq.5 l,''r_- 15. human r¡n ,43.1 -t9.9 5{). .ts.l r'ì ì :-1 46.1 {-r.9 I8.7 i] 62.2 16. soat eII ln-like) 52 49.(_t 1t.t 5ti.{ -il; J-l.ó t5 13.9 -18.t JJ :{1. i t..i 5a.3 Fig. 2-1. Dotplot comparisons (window size : 100; threshold : 80) of the putative Asian elephant y-globin gene (Y-axis) with the s-, y- and r¡-globin genes of humans, rabbits, mice and goats (X-axis). Sequences run from the 5' to 3' direction. Exons l, 2 and 3 for each gene are denoted by black boxes. 56 Asian elephant y-globin gene comparisons e{ype genes T-type genes q-type genes I 000 l5m human e human o1 human yr1 Pc o -c 1000 o- U u rabbit e rabbit 1 c 'ñfo 500 l0o0 mouse y mouse ph1 goat eI goat ell Fig. 2-2. Dotplot comparisons (window size : 100; threshold : 80) of the putative dugong y-globin gene (Y-axis) with the e-, y- and r¡-globin genes of humans, rabbits, mice and goats (X-axis). Sequences run from the 5' to 3' direction. Exons 7,2 and 3 for each gene are denoted by black boxes. 57 Dugong lglobín gene comparisons e-type genes T-type genes q-type genes ,,/ human e human Ay human r¡n1 ,,,/ ny-0 Otc o rabbit e rabbit y o) :l ! t/ t,/- 0l 0 500 1000 1500 mouse y mouse Bh1 goat eI goat eII regions. The dugong y-globin gene did not possess the 5' external CACCC transcriptional control motif typically located between -145 and -112 bp upstream from the eutherian y-globin gene Cap site @fstrad iatts et al. 1980;Margot et al. lg89; Shehee et al. 1989), though a CACCG sequence was detected -120 bp upstream of the Cap (Fig 2-3, Appendix 2-4). Two putative poly-adenylation signals were also found in the dugong gene's 3' gene flanking region - the first -20 bp before the usual site (see above) and the second at the appropriate position (+62bp from the termination codon; Appendix 2-5). Finally, a GT --- AT splice junction mutation was found at the exon 2/IVS2 boundary of both sirenian y-globin genes. Nonetheless, the absence of frameshift mutations indicate the two genes are still capable of transcribing polypeptides. Classification of e-globin genes Dotplot (Fig. 2-q and MatGAT (Tables 2-3,2-4) analyses indicated the elephant shrew gene fragment, which ran from codon 60 in exon 2 to +70 bp downstream of the stop codon, shared highest sequence similarity with eutherian e-globins in the IVS2 and 3' flanking region. The sloth gene product was also classified as e-globin based on sequence homology with eutherian e-globin genes in the IVS2 and gene flanking regions (Fig. 2-5, Tables 2-3, 2-4). This xenarthran gene possessed all the external 5' and 3' transcriptional control motifs characteristic of functional 'e-like' globin genes @ig. 2-3, Appendices 2-4 and 2-5). Examination of the Dasypus novemcincløs p-globin cluster revealed the presence of three 'e-like' globin genes. The 5'-most gene was homologous with other eutherian e- globin genes (Fig. 2-6, Tables 2-3,24) and was found to possess all the control elements 58 : _ _-_ Fig. 2-3. The 5' flanking sequence data and features of 'e-like' globin genes amplified in this shrdy, plus those of the armadillo e-, Qy- and rþq-globin genes, the human r-, yA- and rþr¡-globin genes, and the goat r¡-globin gene. Data runs from seven bases upstream of the "CACCC" site to the initiation ("ATG') codon. The putative cap site ("CAP") is underlined and lies 52 bp (53 bp in the sloth) upstream of the initiation codon. Sequences considered important for transcriptional regulation (CACCC, CCAAT, and ATA) are denoted by bold CAPITAL letters. Nucleotides surrounding the ATA motif correspond to a consensus sequence ("AAGAATAAAAG') found in many pre-natally expressed p-globin genes (Flardison 1983) and are shaded in gray. The consensus sequence "CTTPyTG' @aralle and Brownlee 1978), found seven bases downstream of the cap site, is also marked in gray. Colons (:) denote base deletions. Underlined bases represent those which deviate from the consensus sequence normally recovered in p- globin genes, but should not be considered significant unless they compose a sequence element involved in tanscriptional'regulation. Two large inserts were omitted from the gene sequences of human yo-globin 15'- ctggagfcacaacccacccltgaccaat-3' containing a second CCAAT box, positioned immediately after the CCATT sequence) and lrn- globin (5'-actgggagaggçaaagggctgggggcca gagagg-3',located immediately before the "ATA" consensus sequence). 59 "cÀccc" "ccAAl" +L20 +i.1_0 +L00 +90 +80 +70 +60 +50 +40 sloth e ccagctcCACCCctgagggacacagctcagccctgaCCAAT9actgtg : aagtaccaggggaacaaggggcc : : ! : agaggtacacagtg armadillo e ctagctcCACCCCtgaaggacacagctcagccttgaCCAATeactgca: aagtaacagtgtaacaagaggcc : : : : agaatgÈccacagt, human e ctgactcCÀCCCctgagg: acacaggtcagcctfgaCCAÀlgactttt I aagtaccatggagaacagggggcca: : gaacttcggcagta dugong y ccgcccccccgccccggcctgatagcctagcgttgaCCAATagcctcatag'caaaaggaagaacaaaggg"., r r ragtggccagggata manatee y ICATAGCA.AAAGGÀAGAACAAAGGGCC : ! ! : AGTGGCCAGGGATÀ armadillo qy atccttccgT'ilCgtgaactccacaacttagctttgaccaatagccttt ! ! : : : : : : : : : : tggaaagaggcc ! ! ! : aggggacag : aaat humanAy caaacccCACCCatgggttggccagccttgccttgaCCAÀTagattcatttcactgagggaggcaaagggctggtcaatagattcatttc humanqrr¡ caagttcCACCCCtggcagtgaccacctagctttgaCCÀÀTagtctcâttttaftgggggaaggaagggcct:!!!!ggggcagcagatg goat n taaactccÀCcc: : agcctfg'acaaggcaaacttgaCCÀATagtcttagagtatccagtg aggccêqgggccggcggctggctagggatg 'ATÀ' "CÀP" "ATG' +30 +20 +10 0 -10 -20 -30 -40 -50 sloth e aagaATAaaaagccaca: tcttctagaagcagcacAgctct,gci1:ctgacacgtctgtgatcaccagccagcabttagatcctacatcatg armadillo e aagaÀIAaaaggccaca: ccttftagaagtagcacAgacttgc LtctEacaccactgtatcaccagcaag ! ctctctgaatcttcatcatg human e aaqiaATAaaaggccagacagagaggca: gcagcac4tatctgci-tccgacacagctgcaatcactagcaagctctcaggcctggcatcatg dugong y aagaATÀaaaagccaca: tgttccagttgcagcacÀtacatccttctgacâcatctgLgatcaccacaaa: ctccaagac.gg"..çc{tg manatee Y aagaÀTAaaaagccaca ! tgttccagttgcagcacðtacatcc aLctgacacatctgtgatcaccacaaa: ctccaagaccggacacqatg armadillo t¡ry gaagAÎAaaaggccaca:aatttcagtagcagt,acAaatctgcttctgaca::: !::: ! i 3 !::::: ! ! !:: ¡: ¡:: !:::: ! !:::tatg humanAy aagaÀlAaaaqgaagcacccttcagca ! gttccacAcactcgcttctggaacAtctgaggttatcaataagctcctagtccagacgccatg I humanr¡r1 agaaGTAaaaagccaca: catgaagca: gcaatgcAggcatgc Ltctggctcatctgtgatcaccaggaaactcccagatctgacactatg goat n ar¡gaAlÀaaa.Egccatg : cagtgaagcagcggcacAgacttg.j'b1:Çtggcccattatggatcaccagtaagctcccaqa 3 : ! ! : caccatg Fig. 2-4. Dotplot comparisons (window size : 100; threshold : 80) of the putative elephant shrew e-globin gene (y-axis) with the e-, y- and r¡-globin genes of humans, rabbits, mice and goats (X-axis). Sequences run from the 5' to 3' direction. Exons 2 þartial sequence) and 3 for each gene are denoted by black boxes. 60 Elephant shrew e-globin partial gene comparisons e-type genes T-type genes Tì-type genes 500 500 human e human A1 human r¡rq c¡) B o L 500 -c r¡ rabbit e rabbity Pc ro -ç. o_ o q) 0 500 1000 mouse y mouse ph1 500 500 goat eI goat eII Fig.2-5. Dotplot comparisons (window size: 100; threshold: 80) of the putative sloth e-globin gene (Y-axis) with the e-, y- and r¡-globin genes of humans, rabbits, mice and goats (X-a*iÐ Sequences run from the 5'to 3'direction. Exons 1,2 and 3 for each gene are denoted by blackboxes. 61 Sloth e-globin gene comparisons e-type genes T-type genes tl .]p. genes human e human oy human r¡n¡ 1000 r (¡) -c rabbit e Po mouse Bh1 goat el goat eII Fig. 2-6. Dotplot comparisons (window size : 100; threshold : 80) of the putative armadillo e-globin gene (Y-axis) with the e-, y- and ï-globin genes of humans, rabbits, mice and goats (X-axis). Sequences run from the 5' to 3' direction. Exons 1,2 and 3 for each gene are denoted by black boxes. 62 Armadillo e-globin gene comparisons e-type genes T-type genes n-type genes ,/ human e human o1 human r¡n¡ q) s rabbit e rabbity õrt' E ro mouse y mouse Bh1 goat eI goat eII necessary for transcription (Fig. 2-3, Appendices 24,2-5)- A yJike gene (Fig. 2-7, Tables 2-2, 2-3), was located 16,667 bp downstream of the e-globin stop codon, the intergenic dist¿nce representing the number of nucleotides Ue¡reèn-termination codon of the e-locus and the initiation codon of the y-locus. This yJike globin gene contained frameshift mutations in both exons I and 2 causing premature termination signals in each coding block. Examination of the 5' gene flanking region of this gene also revealed a CACCC box mutation (--+ CCTTC) and a 36-bp deletion immediately upstream of the initiation codon (Fig. 2-3). Additionally, the 3' end contained a putative poly- adenylation signal +32 bp from the stop codon, roughly 30 bp before the usual site of functional y-globin genes (Appendix 2'5). Ã2,163-bp 'q-like' gene fragment was found 1,753 bp downstream from the putative rÞy-globin termination codon, the intergenic distance corresponding to the number of bases between the rþy-gene's final stop signal and point where homolory began with other eutherian r¡-globin genes. This fragment was found to share highest sequence homology within only the IVS2, exon 3 and 3' flanking regions of the human rþr¡- and goat r¡-globin genes (Fig. 2-8, Tables 2-3,2-4). Examination of mutation rates Nonsynonymous replacement rates amongst the eutherian p-globin cluster genes were typically found to be lowest for the 'e-like' loci and highest for the 'pJike' loci within each species (Table 2-5). Compared to the average nonsynonymous rate of mammalian p-globin cluster genes (1.0 x lOe substitutions/siteþear; Efstratiadis et aI. 1980), notably high replacement rates were observed for the lJoci of rabbit and mouse. Replacement rates for 'BJike' globin genes of all examined species were similar to the 63 Fig.2-7. Dotplot comparisons (window size:100; threshold:80) of the putative armadillo rfy-globin gene (Y-axis) with the e-, y- and n-globin genes of humans, rabbits, mice and goats (X-axis). Sequences nrn from the 5' to 3' direction. Exons 1, 2 and 3 for each gene are denoted by black boxes. 64 Armadillo ryyglobin gene comparisons e-type genes T-type genes I-!!Re genes human e human a.y human yr¡ 0 500 1000 1500 2000 o rabbÍt y ! rO E L o 0 500 1000 i500 2000 y mouse mouse Bhl goat eI goat eII Fig. 2-8. Dotplot comparisons (window size : 100; threshold : 80) of the putative armadillo rþr¡-globin gene (Y-axis) with the e-, y- and r¡-globin genes of humans, rabbits, mice, and goats (X-axis). Sequences run from the 5' to 3' direction. Exons 1, 2 and 3 for each gene are denoted by blackboxes. 65 Armadillo y¡-globin partial gene comparisons e-type genes T-type genes l-type genes -0 500 r obo 15'oo 0 500 ¡000 1500 0 500 I 000 1 500 human e human AT human yr1 0 500 1000 1500 0 500 1000 1500 o rabbít e rabbit y E ro E oL 0 500 1000 0 500 1000 1500 mouse y mouse ph1 -o 500 looo 0 500 1000 goat eI goat r¡ Table 2'5 Percent of synonymous (silent) and nonsynonymous (replacement) ' substitutions between the 'e -like' and 'B-like' globin genes amplified in this study (plus those of armadillo, rabbit, mouse, goat, and tarsier) and their ortholog in humans. Silent and non-silent evolution rates are also shown. %o synonymous Taxon Silent mutation râte %o nonsynonymous Non-silent mutation rate subsitutions ldsl (no. silent mutations/site/vr) subsitutions ldNl (no. non-silent mutati ons/si1e/vr\ dN/dS Africar/Asian elephant .y 43.37 4.05 x 10 -9 8.07 7.54 x l0 -70 0.186 African/Asian elephaat ô 31.40 | 3.50 x 10 -9 14.71 7.37 x 10 -9 I 0.393 dugong y 41.39 3.87 x i0 -9 I0.93 1.02 x 10 -9 0.264 dugong ò 38.25 3.57 x 10 -9 ls.34 1.43 x 10 -9 0.401 manatee y 37.47 3.50 x 10 -9 10.75 1.00 x 10 -9 0.287 manatee ò 36.31 3.39 x 10 -9 72.58 1.18 x 10 -9 0.346 rock hyrax ôH 56.89 5.32 x 10 -9 19.73 1.84 x 10 -9 0.347 yellow-spotted hyrax ðE 57.77 5.40 x 10 -9 18.88 1.76 x 10 -9 0.327 tree hyrax òE 55.56 5,19 x i0 -9 19.29 1.80 x l0 -9 0.347 slolh e 49.83 4.89 10 -9 o\ x 7.70 7.55 x 10 -10 0.1 55 o\ sloth p 51..34 5,03 x 10 -9 t5.32 1.50 x 10 -9 0.298 armadillo e 60.82 5.96 x 10 -9 7.01 6.87 x 10 -10 0.1 15 p armadillo 37.80 3.71 -9 x 10 14.77 1.45 x 10 -9 0.391 rabbit e 55.14 6.34 -9 x 10 7.98 9.17 x 10 -10 0.145 rabbit y {5 70 6.41 -9 x 10 13.14 1.51 x 10.6 0.236 rabbit p1 34.39 3.95 x -9 l0 s.35 6.15 x 10 -10 0.1 56 mouse y (e-like) 57.79 6.64 I x 10 -9 9.16 1.05 x 10 -9 0.159 mouse Y 66.93 7.69 x I0 -9 14.63 1.68 x 10 -6 0.279 ¡ mouse B2 48.4t 5.56 -9 x 10 13.28 1.53 x 10 -9 0-274 goat el 42.t8 4.96 x 10-9 5.rI 6.01 x 10-10 0.t21 goat pA 34.t0 4.01 x 10-9 11.94 1.40 x 10-9 0.350 Îa¡sier e 33.79 4.39 x l0-9 5.25 6.82 x 10-10 0.155 tarsier y 40.73 5.30 x 10-9 12.t4 1.58 x 10-9 0.298 t¿¡sier ô J1 1A 3.54 x 10-9 7.46 9.69 x 10-10 0.273 tarsier p 40.91 5.31 x 10-9 4.69 6.09 x 10-10 0.1 15 average replacement rate (1.0 x 10-e substitutions/site/year) in each species except rabbit and tarsier, whose 'p-like' loci showed markedly low rates (Table 2-5). For each p- globin cluster gene examined, the rate of synonymous (silenQlevolution was found to be higher than the nonsynonymous mutation rate (Table 2-5). With few exceptions, non-coding evolution rates were lowest for the 'e-like' loci and highest for the 'p-like' loci (Table 2-6). These 'neutral' rates were also found to be relatively similar between the orthologs of different species (Table 2-6). The non-coding rates of examined genes were comparable to the non-coding mutation rates of other eutherian c¿- and B-globin cluster genes (4 - 5 x 10-e substitutions/siteþear; Li et al. 1981; Hayashida and Miyata 1983; Kimura 1983) (Table 2-6). 67 Table 2-6. Percent of nucleotide substitutions between the IVS2 sequenc€s of 'e-like' and pJike'globin genes amplified in this study (plus those of armadillo, rabbit, mouse, goat and tarsier) and their ortholog in humans. The correspondingno.n.eodingpNA mutation rates a-re also shown. o% Taxon Non-coding DNA Non-coding DNA (lVS2) substitrfiors substitrfion rate A-ùica¡¡,/Asian elephant y 40.1 3.75 x lO -9 Aûi ca¡r,/Asian elephant ô 45.1 4.21x lO -9 dugong'y 39.8 3.72 x lO -9 dugong ô 43.2 4.04 x l0 -9 manatee y 39.9 3-73 x l0 -9 manatee ò 42.4 3.96 x 10 -9 rock hyrax ð 47.6 4.45 x l0 -9 rock h¡nax ôE 47.4 4.43 x l0 -9 yellow-spotted hyrari ôtr 46.8 4.37 x l0 -9 tree hyrax ôE 46.8 4.37 x l0 -9 sloth e 43.8 4.29 x l0 -9 sloth çô 44.5 4.36 x 10 -9 sloth þ 53.4 5.24 x lO -9 armadillo e 42.3 4.15 x l0-9 armadillo 1 41.5 4.07 x l0-9 armadillo Qr¡ 43,0 4.22xlO-9 armadillo rpô 46.9 4.60 x 10 -9 armadillo þ 5t.4 5.04 x 10 -9 ¡abbit e 39.6 4.55 x l0-9 rabbil y 42.2 4.85 x l0-9 rabbit Bl 52.8 6.07 x l0-9 mouse y 51.4 5.91 x l0-9 mouse Y 52.2 6.00 x 10-9 mouse p2 52.7 6.06 x 10-9 goat eI 42.1 4.95 x l0-9 goat eII (r¡) 44.0 5.18 x 10-9 goat p" 49.8 5.86 x lG9 t¿¡sier e 35.2 4-57 x lO-9 tarsier.y 46.4 6.03 x l0-9 t¿rsier {rl 29.6 3.84 x l0-9 ta¡sier ð 4t.9 5.44 x l0-9 tarsier p 29.8 3.87x l0-9 68 DISCUSSION The discovery of e- and y loci (this study) together withlîd B-loci (see Chapter I) within members of the basal eutherian clade Afrotheria (Fig. 2-9) provides compelling support that at least a four-gene cluster pre-dates the divergence of this group from the placental mammal tree (-107 mya; Springer et a|.2003). In addition, the présence of five globin gene loci (e,.Þy, rlrq, rlrò, p) within the p-globin cluster of the nine-banded armadillo (Fig.2-9) further suggests that afive-gene set situated 5'-e-y-n-ò-p-3, on the chromosome was already est¿blished prior to the radiation of the three remaining superorders (-102 mya; Springer et al. 2003). It is noteworthy that molecular phylogenetic studies have been unable to reject a basal Afrotheria,/Xenarthra clade (Madsen et al. 2001; Delsuc et al. 2002; Lin et at. 2002). Hence if these two clades diverged concurrently from other placentals, the data suggests that the flrve-gene p-globin cluster does indeed predate the radiation of eutherian mammals. Although no r¡-globin gene products were amplified from the afrotherian templates, a e-locus was amplified from the elephant shrew, and y-globin gene products were obtained from the African elephant, Asian elephant, dugong and manatee. The absence of frameshift mutations and high sequence similarity between the two elephant (98.7%) and sirenian (95.9%) y-globin genes, respectively, suggested the yJocus is under conservation and possibly expressed within these paenungulate species. Support for this contention arises frorn the observation that blood from a 5-month old African elephant fetus possesses two distinct hemoglobin components, while blood from a l2-month old fetus and an adult each contain a single hemoglobin component which migrate at the same rate (Kleihauer et al. T965; Braunitzer et al. 1984). These studies thus suggest that 69 Fig. 2-9. The p-globin gene clusters of select mammals, updated using data collected for this study (see Fig. 1-1 legend for details). Genes that presumably exist within each cluster are denoted by grey shading and broken-lined edges. 70 Hrmm Galago [.emur Rabbit Mouse Rat Goat Þ rIV tr J sr grr rI r[I Y Cow rtlì? tÞô Sloth vn 1rô Armadillo ¡iirì? ô Elephant sbrew ¿-\ r¡li? ôE ô Hyrax i..i uJ uJ litri? ô Sirenim Elephant eÊ 6l Marsupials H +- r0 Monotremes N, the minor (<15%) and major hemoglobin components of early fetal elephants are encoded by the y- and òJoci, respectively, with ò-globin assuming sole expression prior to 12 months gestation (the gestational period being 21 months). *l¿nouu io elephants, the blood of Amazonian manatee (Trichechus inunguis) calves possess one major and one minor (<5%) band (Farmer et al. 1979), while adult blood contains a single chain constituent (Kleinschmidt et al. 1986; Kleinschmidt and Braunitzer 1988). Thus within sirenians, fetal hemoglobin presumably also contains two components encoded by the y- (minor) and ò- (major) loci, with y-globin expression extending into the early stages of post-natal life. Support for this inference is provided by the 5' flanking region of the dugong y-globin gene, which contained no CACCC transcriptional control motif (typically located -90 bases upstream from the Cap site of transcribed 'pJike' globin genes; Myers et al. 1986). This y-globin promoter sequence has been found to be crucial for suppressing p-globin expression during pre-natal development in vitro (Sargent et al. 1999) and during fetal development of transgenic mice (Perez-Stable and Constantini 1990). Hence, the 'lingering existence' of y-globin in the hemoglobin of Amazonian manatee calves may occur because y-expression is not immediately suppressed after birth as it does not significantly affect expression of the globin gene encoding the major post- natal hemoglobin component (i.e. ô-globin; see Chapter I). Because anthropoid primates are the only eutherians known to express the 1-locus during post-embryonic stages of development, these findings are of considerable interest. The tandem duplication events producing the four or five gene B-globin cluster of placental mammals is thought to have enabled the evolution of specialized forms of hemoglobin physiologically-adapted for different developmental stages (Barrie et al. 7t 1981; Goodman et al. 1984). For instance, as an adaptive strategy to an extended fetal life (Koop and Goodman 1988) mice and rabbits express their y- and e-loci during early and late embryogenesis (Hansen et a|.7982;Hardison 1981; H'árdison tg8¡; Hlll et al. 1984), respectively. Similarly, higher primates have extended their 1 expression to fetal Srowth stages (Shen et al. 1987), while artiodactyls have recruited their 3'-most BJocus for fetal development (Townes et al. 7984). Within the armadillo, however, only the two outermost loci (e and p) appear to be capable of encoding a complete polypeptide (Fig. 2- 9), and thus possess a functional arrangement similar to the ancestral B-cluster of marsupials and (presumably) monotremes (Lee 1997; Lee et al. lggg). Notably, hemoglobin of adult armadillos contains two different 'p-like' polypeptide chains expressed atratios of 3.7 (de Jong et al. 1981) to l:9 (Dementi and Burke 1972), while only one 'B-like' gene capable of encoding a protein exists within its p-cluster. This consequently raises the possibility that the second chain is encoded by the e-locus, though significantly, no e-globin gene of any eutherian is known to be expressed after the completion of embryogenesis. Interestingly, marsupials possess a 'p-like' globin gene (o-globin) within their o-globin cluster, which constitutes <25o/o of both their late pre- and early post-natal hemoglobin (Wheeler et al. 2001; Wheeler et at. 2004). Thus, the possibility that armadillos may express an unlinked 'B-like' globin gene, which may be orthologous with marsupial o-globin, can not be discounted. The discovery of a putatively functional s-globin gene within the pale-throated sloth suggests that orthologs of the ancient e-globin gene have been maintained and expressed in all members of Xenarthra. lVhile I have documented ,r¡lô- and p-globin loci 72 from this species (see Chapter I), it is still unknown whether they have retained functional .¿- and rlJoci. The comparative analyses of evolution rates amongs-Ëthè 'e-like' and 'pJike' globin genes ofthe four eutherian superorders supports the contention that descendants of the ancestral p-globin gene family have evolved at different rates throughout evolutionary time. Furthermore, these differential rates can be primarily attributed to the different selective pressures placed upon the genes therein (Aguileta et al.2O04). Nonsynonymous (replacement) evolution rates were found to be lower in the 'e-like genes" compared to the'p-like' genes within each species (Table 2-5), reflecting stabilizing selection as little room exists for variation of ambient conditions during gestational development (Koop and Goodman 1988). Replacement rates for the y-loci of dugong and manatee (1.02 and 1.00 x 10-e, respectively, Table 2-5) were nearly identical to the averagenonsynonymous replacement rate determined for p-globin cluster genes (1 x 10-e substitutions/site/year; Efstradiatis et al. 7980), while replacement rates were slightly lower for the y-orthologs 10; of both elephant species (7 .54 x 1 0- Table 2-5). The former finding supports the earlier conjecture that sirenian y-globin genes are expressed both pre- and post-natally, as these genes would have diverged faster in order to encode a protein structure suited for two (pre- and early post-natal) developmental stages, while the latter suggests stabilizing selection. The high degree of sequence divergence observed between the y-globin gene sequences of humans and laurasiatherians (Table 2-5) is likely due to the structural differences allowing the orthologs to function at different developmental times (i.e. during fetal and embryonic development, respectively; Hardison 1984). Thus, it is significant that replacement rates observed between the 1-loci of paenungulates and 73 humans are relatively similar to the average replacement rate of orthologous (and typically analogous) gene sequences (1 x 10-e substitutions/site/year), lending further support to the notion that paenungulate yJoci are expressed dùirg fetal growth stages. Because the ò- and p-globin genes of paenungulates and xenarthrans encode their major post-natal hemoglobin components, respectively (see Chapter I), it is not surprising the mutation rates among these genes were fairly similar. However, these rates (which were also similar to those of the mouse and goat 'pJike' loci) were slightly higher than the average p-globin cluster gene replacement rate (Efstradiatis et at. l9B0) while those of rabbit and tarsier were much lower (Table 2-5). This large degree of variation most likely reflects lineage-specific factors such as population structure and generation time (Wu and Li 1985; Britten 1986; Li and TanimuratggT). The discovery of differing evolution rates for each gene in the p-globin family lends credence to the notion that phylogenetic studies using molecular clocks (which assume all hemoglobin genes evolve at equal rates) are based upon unsound assumptions (Goodman l98l; Hardison 1984; Gebel i985, Li et al. 1987; Bailey et at. I99l). I speculate that, although the branching patterns derived from clock-based analyses of mammalian p-globin gene families are most likely in the correct order, dates estimating divergence times may need to be revised as information is gained on these various rates of evolution. In accordance with earlier findings (Shapiro et al. 1983; Hardison 1984; Hardies et al- 1984; Gebel et al. 1985; Slightom et al. 1985; Koop et al. l9}9b), the dat¿ further demonstrate marked heterogeneity between the non-coding evolution rates of sequenced 'e-like' and 'p-like' globin genes of the same species. Specifically, descendants of the 74 'e-like'loci were shown to diverge at neutral sites more slowly than the'pJike, loci within every eutherian species but the tarsier (Table 2-6). The slow rate of neutral evolution previously observed in various primate p-globin gene-s (G,ebel et al. "to¡re. 1985; Harris et al. 1986; Koop et al. 1989b) was not fully supported by the analyses, as only the rÞq- and pJoci of tarsier diverged at a rate comparatively lowgr than the neutral mutation rate of pseudogenes (4 - 5 x 10-e substitutions/site/year, Hayashida and Miyata 1983; Kimura 1983; Li et al. 1981). Further challenging the 'hominid slowdown, theory (Li et al. 1987; Hasegawa et al. 1989; Bailey et at. l99l) is the observation that no links were found between species generation time and rate of neutral evolution for any loci but Y As the 'neutral' mutation rates were relatively similar between orthologs of different species (with the exception of tarsier), it is notably apparent that underlying mechanisms (possibly owing to species-specific differences in DNA mutation rates, localized structural constraints, etc., Wu and Li 1985; Koop et at. I9B9b) are acting on these 'unconstrained' gene sequences within each species. Consequently, the referral to non- coding mutation rates as'paradigms of neutral evolution' Q,i et al. lg}l;Kimura 19g3) should clearly be reconsidered. In conclusion, the results of this study confirm that the B-globin cluster of stem eutherians consisted of at least four, and probably five genes. And, with the exception of primates, descendants of the 5'-most e-locus have typically evolved at a slower rate than genes derived from the p-locus in both coding and non-coding regions throughout evolutionary time. Furthermore, the data suggest that the y-globin gene is expressed until at least I year of age in sirenians, but silenced relatively early (<5 months) in elephant development. Finally, the progenitor of modern day nine-banded armadillos atavistically 75 regressed to a two-gene cluster (5'-e-B-3'), suggesting the potential flexibility afforded by the five-gene expansion did not provide any selective advantages within this lineage. 76 LITERATURE CITED Aguileta, G., J. P. Bielawski, anciz. Y-g. 2004. Gene converBiÕn_,and functional divergence in the p-globin gene family. J. Mol. Evol. 59:177-199. Amrine, H., and M. Springer. 1999. Maximum-likelihood of the tethythere hypothesis based on a multigene data set and a comparison of different models of sequence evolution. J. Mamm. Evol. 6:161-176. Amrine-Madsen, H., K. P. Koepfli, R. K. Wayne, and M. Springer.2003. A new phylogenetic marker, apolipoprotein B, provides compelling evidence for eutherian relationships. Mol. Phylogenet. Evol. 28.225-240. Antoine, M., and J. Niessing.lgS4.Intronless globin genes in the insect Chironomus thumi thumi. Nature 310.195-798. Antoniou, M., E. deBoer, G. Habets, and F. Grosveld. 1988. The human B-globin gene contains multiple regulatory regions: identification of one promoter and two downstream enhancers. EMBO I. 7 .377-384 Archibald, D.2003. Timing and biogeography of the eutherian radiation: fossils and molecules compared. Mol. Phylogenet. Evol. 28:350-359. Bailey, W. J., D. H. A. Fitch, D. A Tagle, J. Czelusniak, J. L. Slightom, and M. Goodman. 1991. Molecular evolution of the rþT-globin gene locus: gibbon phylogeny and the hominid slowdown. Mol. Biol Evol. B:155-184. Ba¡alle, F. E,. and G. G. Brownlee. 1978. AUG is the only recognizable signal sequence in the 5'non-coding regions of eukaryotic mRNA. Nature 274.84-81. Barrie, P, 4., A. J. Jeffreys, and A. F. Scott. lgSl.Evolution of the p-globin gene cluster in man and the primates. J. Biol. Evol. 149:319-336. 77 Boyer, s. H., E. F. crosby, A.N-Noyes, G. F. Fuller, S. E. Leslie, L. J. Donaldson, G. R. vrablik, E. w. Schaefer Jr., and T. F. Thurmon. 1971. nnT"]" hemoglobins: some sequences and some proposals concerning the charact", of *oiotion and mutation. Biochem. Genet. 5:405-448. Braunitzer, G., w. Jelkmann, A. Stangl, B. schranlq and c. K¡ombach. lggz. Hemoglobins, XLMII: the primary structure of hemoglobin of the Indian elephant (Elephas mæimus, Proboscidea). þz: Asn. Hoppe-seyler's z. phsiol. chem. 363:683-691. Braunitzer, G., A- Stangl, B. Schrank, C, Krombach, and H. Wiesner. 19g4. phosphate- hemoglobin interaction. The primary structure of the hemoglobin of the African elephant (Loxodonta africana, Probiscidea): asparagine in position 2 of the p-chain. Hoppe-Seyler' s Z. Phsiol. Chem. 365:7 43 -7 49. Britten, R. J. 1986. Rates of DNA sequence evolution differ between taxonomic groups. Science 231 1393-1398. Buettner-Janusch, J., and V. Buettner-Janusch. 1963. Hemoglobins of two elephant shrews. Nature 199:91 8-919. Burmester, T., B. Ebner, B. weich, and T. Hankeln. 2002. cytoglobin: a novel globin type ubiquitously expressed in vertebrate tissues. Mol. Biol. Evol. 19:416-421. Burmester, T., B. weich, S. Reinhardl and T. Hankeln. 2000. A vertebrate globin expressed in the brain. Nature 407 .520-523. Cooper, S., and R. Hope. 1993. Evolution and expression of a p-like globin gene of the Australian marsupial Sminthopsis crassicaudata.Proc. Natl. Acad. Sci. USA 90.11777-rT78I 78 Cooper, S., R. Murphy, G.Dolman, D. Hussey, and R. Hope. 1996. A molecular and evolutionary study of the þ-globin gene family of the Australiàn marsupial Sminthopsis uassicaudnta. I. Mol. Biol. 17 .1012-1022. Czelusniak, J., D. Goodman, M. Hewett-Emmett, M. Weiss, P. Venta, and R. Tashian. 1982. Phylogenetic origins and adaptive evolution of avian and mammalian hemoglobin genes. Nature 298.297 -300. de Jong, W., A. Zweers, and M. Goodman. 1981 . Relationship of aardvark to elephants, hyraxes, and sea cows from cr-crystallin sequences. Nature 292.538-540. Delsuc, F., M. Scally, O. Madsen, M. Stanhope, W. de Jong, F. Catzeflis, M. Springer, and E. Douzery. 2002. Molecular phylogeny of living xenarthans and the impact of character and taxon sampling on the placental tree rooting. Mol. Biol. Evol. 19:1656- 167T. Dementi, P. L., and J. D. Burke. 1972. Oxyhemoglobin affînity in the armadillo. Amer. J. Anat. 134:509-514. Dickerson, R.8., and I. Geis. 1983. Hemoglobin: structure, function, evolution, and pathology. Benj amin/Cummings, Menl o Park, Californi a. Dierks, P., A. van ooyen, M. D. Cochran, C. Dobkin, J. Reiser, and C.'Weissmann. 1983. Three regions upstream from the cap site are required for efficient and accurate transcription of the rabbit B-globin gene in mouse 3T6 cells. Cell23 695-7Q6. Donze, D., P. H. Jeancake, and T. M. Townes. 1996. Activation of ô-globin gene expression by erythroid Krupple-like factor: a potential approach for gene therapy of sickle cell disease. Blood 88:4051-4057. t9 Douady, C.,F. Catzeflis, D. Kao, M. Springer, and M. Stanhope. 2002. Molecular evidence for the monophyly of Tenrecidae (Mammali{ anJthe timiñg of the colonization of Madagascar by Malagasy tenrecs. Mol. Phylogenet. 8vo1.22.357- 363. Douady, c.,F. Catzeflis, J. Raman, M. Springer, and M. stanhope. 2003. The Sahara as a vicariant agent, and the role of Miocene climatic events, in the diversification of the mammalian order Macroscelidea (elephant shrews). Proc. Natl. Acad. Sci. USA 100:8325-8330 Efstratiadis A, J. W. Posakony, T. Maniatis et al. (15 co-authors). 1980. The structure and evolution of the human B-globin gene family . Cell2t.654-668. Eîz:.nk, E., w. Murphy, and s. o'Brien. 2001. Molecular dating and biogeog:aphy of the early placental mammal radiation. J. Hered. 92:212-219. Farmer, M., R. weber, J. Bonaventura, R. Best, and D. Domning.1979. Functional properties of hemoglobin and whole blood in an aquatic mammal, the Amazonian manatee (Trichechus inunguis). Comp. Biochem. Physiol. 62.231-238. Feng, Y. Q., R. Warin, T. Li, E. Olivier, A. Besse, A. Lobell, H. Fu, C. M. Lin, M. I., Aladjem, and E. E. Bouhassira. 2005. The human p-globin locus control region can silence as well as activate gene expression. Mol. Cell Biol. 25:3864-3874. Fitch, D. H., w. J. Bailey, D. A. Tagle, M. Goodman,L. sieu, and J. L. slightom. 1991. Duplication of the y-globin gene mediated by Ll long interspersed repetitive elements in an early ancestor of simian primates. Proc. Natl. Acad. Sci. USA 88.7396-7400 80 Fleming, M., J. Potter, C. Ramirez, G. Ostrander, and E. Ostrander.2003. Understanding missense mutations in the BRCAI gene: an evolutionary Jp¡.oa"h. Proc. Natl. Acad. Sci. USA 100.1 151-1156. Forrester, w. c., S. Takegawa, T. Papayannopoulou, G. Stamatoyannopoulos, and M. Groudine. 1987. Evidence for a locus activation region: the formation of developmentally stable hypersensitive sites in globin-expressing hybrids. Nucl. Acids Res. 15: 10159-10177. Fritsch, E.F., R. M. Lawn, and T. Maniatis. 1980. Molecular cloning and characterization of the human p-like globin gene cluster. cell 19.959-972. Gebel, L. 8., v. L. van Santen, v. L., Slightom, J. L-, and R. A. Spritz. 1985. Nucleotide sequence, evolution, and expression of the fetal globin gene of the spider monkey, Ateles geolfroyi. Proc. Natl. Acad. Sci. USA 82:6985-6989. Goodman, M. 1981. Globin evolution was apparently very rapid in early vertebrates: a reasonable case against the rate-constancy hypothesis. J. Mol. Evol. 17:114-120. Goodman, M., B. Czelusniak, D. Koop, D. Tagle, and J. slightom. 1987. Globins: a case study in molecular phylogeny. cold spring Harb. symp. Quant. Biol. 52:875-890. Goodman, M., B. Koop, J. Czelusniak, M. Weiss, and J. Slightom. 1984. The r¡-globin gene: its long evolutionary history in the p-globin gene family of mammals. J. Mol. Biol. 180.803-823. Goodman, M., and G. Moore. l975.Darwinian evolution in the genealogy of hemoglobin. Nature 253 : 603 -608. 81 Grosveld, G. C., A. Rosenthal, and R. A. Flavell. 1982. SeWel;l requirements for the :' transcription of the rabbit p-globin gene in vivo: the -80 region. Nucl. Acids Res. to.495t-4971. Grosveld, F., G. B. van Assendelft, D._R. Greaves, and G. Kollias. 1987. Position- independeng highJevel expression of the human p-globin gene in transgenic mice. Cell 51:975-985. Hansen, J. N., D. A. Konkel, and P. Leder. 1982. The sequence of a mouse embryonic p- globin gene: evolution of the gene and its signal region. J. Biol. Chem. 257:1048-1052 Hardies, S., H. Edgell, and C. Hutchison ltr. 1984 Evolution of the mammalian B-globin gene cluster. J. Biol. Chem. 259.3748-3756. Hardison, R. C. 1981. The nucleotide sequence of rabbit embryonic globin gene p3. J. Biol. Chem. 256:11780- 11786 Hardison, R. C. 1983. The nucleotide sequence of the rabbit embryonic globin gene p4. J. Biol. Chem . 258:87 39 -87 44. Hardison, R. C. 1984. Comparison of the B-like globin gene families of rabbits and human indicates that the gene cluster 5'-e-y-ô-p-3' predates the mammalian radiation. Mol. Biol. Evol. 1:390-410. Hardison, R. C, and R. Gelinas. 1986. Assignment of orthologous relationships among mammalian cr-globin genes by examining flanking regions reveals a rate of evolution. J. Mol. Biol. 3.243-261. Hardison, R., C., and J. B. Margot. 1984. Rabbit globin pseudogene r¡rp2 is a hybrid of ò- and B-globin gene sequences. Mol. Biol. Evol. 1:302-316. 82 Harris, S., P. Barrie, M. Weiss, and A. Jeffreys. 1984 The primate qrpl gene: an ancient p-globin pseudogene. J. Mol. Biol 180.785-801 j ' Harris, S., J. R. Thackeray, A. J. Jeffreys, and M. L. Weiss. 1986. Nucleotide sequence - analysis of the lemur p-globin gene family: evidence for major rate fluctuations in globin polypeptide evolution. Mol Biol. Evol. 3:465-484. Hasegawa, M. H., H. Kishino, and T. Yano. 1989. Estimation of branching dates among primates by molecular clocks of nuclear DNA which slowed down in Hominidae. J. Hum. Evol. 18:461476. Hayashida, H., and T. Miyata. 1983. Unusual evolutionary conservation and frequent DNA segment exchange in class I genes of the major histocompatibility complex. Proc. Natl. Acad. Sci. USA 80.2671-2675. Hedges, S. 2001. Afrotheria. plate tectonics meets genomics. Proc. Natl. Acad. Sci. USA 98:1-2 Hill, 4., S. C. Hardies, S. J. Phillips, M. G. Davis, C. A. Hutchison III, and M. H. Edgell. 1984. Two mouse early embryonic p-globin gene sequences: evolution of the nonadult B-globins. J. Biol. Chem. 259.3739-3747. Hutchison III, C. 4., S. C. Hardies, R. W. PadgetÇ S. Weaver, and M. H. Edgell. 1984. The mouse globin pseudogene ph3 is descended from a premammalian ò-globin gene. J. Biol. Chem . 259.12881-12889. Jeffreys, A. J., P. A. Barrie, S. Harris, D. H. Fawcett, Z. J. Nugen! and A. C. Boyd. 1982. Isolation and sequence analysis of a hybrid d-globin pseudogene from the brown lemur. J. Mol. Biol. 156:487-503 83 Jukes, T. H. and C. R. Cantor.1969. Evolution of protein moleeu=let. Pp.2l-132in H. N. Munro, ed. Mammalian protein metabolism. Academic Press, New York. Kimura, M. 1983. Rate variant alleles in the light of the neutral theory. Mol. Biol. Evol. 1.84-93. Kleihauer, E., I. O. Buss, C. P. Luck, and P. G. Wright. 1965. Hemoglobins of adult and fetal African elephants. Nature 207.424425. Kleinschmidt, T., and G. Braunitzer. 1983. The primary structure of hemoglobins of the rock hyrax (Procavia habessinica, Hyracoidia): insertion of glutamine in the c- chains. Hoppe Seylers Z. Physiol. Chem. 364:7303-13 Kleinschmidt, T., and G. Braunitzer. 1988. The primary structure of the hemoglobin of the Brazilian manatee (Trichechus inunguis, Sirenia) Hoppe-Seyler'sZ. Physiol. Chem. 389:427-435. Kleinschmidt, T., J. Czelusniak, M. Goodman, and G. Braunitzer. 1986. Paenungulata: a comparison of the hemoglobin sequences from elephant, hyra<" and manatee. Mol. Biol. Evol. 3.427-435 Kleinschmidt, T., J.Marz, and G. Braunitzer. 1989. The primary structure of pale- throated three-toed sloth(Bradypus tridnctyluq Xenarthra) hemoglobin. Biol. Chem Hoppe-Seyler 370.3 03 -3 08. Konkel, D. 4., J. V. Maizel Jr., and P. Leder. 7979. The evolution and sequence comparison of two recently diverged mouse chromosomal B-globin genes. Cell 18:865-873. 84 Koop, B., and M- Goodman. 1988. Evolutionary and developmental aspects of two hemoglobin B-chain genes (eM and pM) of opposum. proc. Natl. Acad. Sci uSA ''.'-''-- 85:3893-3987 Koop,8., F., D. siemienia( J. L slightom, M. Goodman, J. Dunbar, p. c. wright, and E. L' Simons. 1989a. Tarsius ô- and B-globin genes: conversions, evolution, and systematic implications. J. Biol. Chem. 265:69_79 Koop, B. F', D. A. Tagle, M. Goodman, and J. L. Slightom, 1989b. A molecular view of primate phylogeny and important systematic and evolutionary questions. Mol. Bioi Evol. 6:580-612. Kosche, K. 4., c. Dobkin, and A. Bank. l9g5 DNA sequences regulating human p_ globin gene expression. Nucl. Acids Res. 13:77g1-77g3. Lacy, E., R. c. Hardison, D. euon, and T. Maniatis. TgTg.Thelinkage arrangement of four rabbit B-like globin genes. Cell lB:1273-IZB3 Lavergne, 4., E. Douzery, T. Stichler, F. Catzeflis, and M Springer. lggl.Interordinal mammalian relationships: evidence for the paenungulate monophyly is provided by complete mitochondrial 12S rRNA sequences. Mol. Phylogenet. 8vo1.6.245-25g. Lee, M--H. 1997. Molecular biology and evolution of p-globin genes in monotremes. Doctoral thesis, University of Adelaide, Australia. Lee, M.-H., R. scrofl S. cooper, and R. Hope. 1999. Evolution and molecular charactenzation of a p-globin gene from the Australian echidna Tachyglossus aculeatus (Monotremata). Mol. phylogenet. Evol. 12:205 -21 4. pseudogenes Li, w.-H., Gojobori, T., and M. Nei. r98r. as a paradigm of neutral evolution. Nature 292.237 -239. 85 Li, W.-H., and M. Tanimura. 1987. The molecular clock runs in man T:.:.to*ty than in - .-- apes and monkeys. Nature 326.93-96. Li, W.-H., M. Tanimura, and P. M. Sharp. 1987. An evaluation of the molecular clock hypothesis using mammalian DNA sequences. J. Mor. Evol. 25:33 0-342 Lin, Y-H., P. Mclenachan, A. Gore, M. phillips, R. ota, M. Hendy, and D. penny. 2002. Four new mitochondrial genomes and the increased stability of evolutionary trees of mammals from improved taxon sampling. Mol. Biol. Evol. 19:20 60-2070. Lingrel, J. 8., T. M. Townes, s. G. Shapiro, s. E. Spence, p. A. Liberator, and S. M. Wernke. 1983 . Organization, structure, and expression of the goat globin genes. Prog. Clin. Biol. Res. 134:131-139. Liu, R., and M. Miyamoto. 1999. Phylogenetic assessment of molecular and morphological data for eutherian mammals. Syst. Biol. 4g.54-64 p. Liu, F. G., M. M. Miyamoto, N. P. Freire, e. ong, M. R. Tennant, T. s. young, and K. F. Gugel. 2001. Molecular and morphological supertrees for eutherian (placental) mammals. Science 291.I7 86-1789. Madsen, O., M. Scally, C. Douady, D. Kao, R. DeBry, R. Adkins, H. Amrine, M. Stanhope, W. de Jong, and M. Springer.20Ol. Parallel adaptive radiation in two major clades of placental mammals. Nature 409:610-614. Madsen, o., D. willemsen, B. ursing, IJ- Arnason, and w. de long.2002. Molecular evolution of the mammalian alpha 28 adrenergic receptor. Mol. Biol. Evol. 19:2150- 2160. 86 Malia, M. J. Jr., R. M. Adkins, and M. W. Alla¡d. 2002. Molecular support for Afrotheria and the polyphyly of Lipotyphla based on analyses of the growth hormone receptor .l - gene. Mol. Phylogenet. Evol. 24:91-I0L. Margot, J. 8., G. W. Demers, and R. C. Hardison. 1989. Complete nucleotide sequence of the rabbit p-like globin gene cluster. Analysis of intergenic sequences and comparison with the human p-like globin gene cluster. J. Mol. Biol. 205:15-40. Martin, S., K. Vincent, and A. Wilson. 1983. Rise and fall of the ò-globin gene. J. Mol. Biol. 164:513-528. McKenna, M.C. 1975. Toward a phylogenetic classification of the Mammalia.Pp. l-4 in W.P. Luckett and F S. Szalay, eds. Phylogeny of the primates. Plenum Press, New York. McKenna, M., and S. Bell 1997. Classification of Mammals Above the Species Level. Columbia Univ. Press, New York. Murata, Y., M. Nikaido, T. sasaki,, Y. cao, y. Fukumoto, M. Hasegawa, and N. okada. 2003. Afrotherian phylogeny as infened from complete mitochondrial genomes. Mol. Phylogenet. Evol. 28:253 -260. Murphy, W., E. Eizirik, W. Johnson,Y. Zhang, O. Ryder, and S. O'Brien. 2}0la. Molecular phylogenetics and the origins of placental mammals. Nature 409:614-6T8. Murphy, w., E.Eizirik, s. o'Brien, o. Madsen, M. scally, c. Douady, E. Teeling o. Ryder, M. Stanhope, W. de Jong, and M. Springer. 200lb. Resolution of the early placental mammal radiation using Bayesian phylogenetics. Science 294.2348-2351. Myers, R., M., K. Tilly, and T. Maniatis. 1986. Fine structure genetic analysis of a p- globin promoter. Science 232.613 -618. 87 Nei, M., and T. Gojobori. 1986 Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutionr. Moi Biol- Evol. 11:613- 619. Nikaido, M., H. Nishihar4 Y. Hukumoto, and N. okada. 2003. Ancient SINEs from African endemic mammals. Mol Biol. Evol.2O:522-527. Nishihara, H., Y. Satt4 M. Nkaido, J. G. M. Thewissen, M. J. Stanhope, and N. okada. 2005. A retroposon analysis of afrotherian phylogeny. Mol. Biol. Evol. (in press). 'walker's Nowak, R. 1999. mammals of the world, Sixth Edition (vol.2). The Johns Hopkins University Press. Baltimore. ohno, S. 1970. Evolution by gene duplication. springer-verlag,Berlin, New york. Gy-globin Perez-Stable, C., and F. Constantini. 1990. Roles of fetal promoter elements and the adult B-globin 3' enhancer in the stage-specific expression of globin genes. Mol. Cell Biol 10:1116-1125. Pesce, 4., M. Bolognesi, and A. Bocedi. 2002. Neuroglobin and cytoglobin EMBO 3.1 146-1 151 Phillips, s., J., s. c. Hardies, c.L. Jahn, M. H. Edgell, and c. A. Hutchison trI. 1984. The complete nucleotide sequence of a p-globinJike structure, þb2, from the [Hbb]o mouse BALB/c. J. Biol. Chem. 259.7947-7954. Prinsloo, P. 1993. Molecular and chromosomal phylogeny of the Hyracoidea. Doctoral thesis, University of Pretori4 South A_frica. Poyart, c., H. wajcman, and J. Kister. 1992 Molecular adaptation of hemoglobin function in mammals. Resp. Physiol. 90:3-17. 88 Proudfoot N. J., and G. G. Brownlee.1976.3'non-coding region sequences in eukaryotic -- messengerRNA.Nature 263:2Il-214. - Proudfoot N., A. Gil, and T. Maniatis. 1982. The structure of the human (-globin gene and a closely linked, nearly identical pseudogene. Cell 31:553-563. Prychitko, T., R. M. Johnson, D- E. Wildmar¡ D. Gumucio, and M. Goodman. 2005. The phylogenetic history of New World monkey p-globin reveals a platyrrhine p to ò gene conversion in the atelid ancestry. Mol. Phylogenet. Evol. 35.225-234. Ristaldi, M. S., S. Casula, S. Porcu, M. F. Marongiu, M. Pirastu, and A. Cao. 1999. Activation of the ò-globin gene by the p-globin gene CACCC motif. Blood Cells Mol. Dis. 25:193-209. Ross, J., and A. Pizano. 1983, Human B and ò globin messenger RNAs turn over at different rates. J. Mol. Biol, 167:607-617. Sargent, T. G., C. C. DuBois, A. M. Buller, and J. A. Lloyd. 1999. The roles of 5'-HS2, 5'-HS3, and the 1-globin TATA, CACCC, and stage selector elements in suppression of p-globin expression in early development. J. Biol. Chem. 274:11229-11236. Satoh, H., N. Inokuchi, N. Yasuhiro, and T. Okazaki. 1999. Organization, structure, and evolution of the nonadult rat p-globin gene cluster. J. Mol. Evol. 49:122-129. Schimenti, J. C., and C. H. Duncan. 1985a. Concerted evolution of the cow u'and ea p- globin genes. Mol. Biol. Evol. 2.505-513. Schimenti, J. C., and C. H. Duncan. 1985b. Structure and organization of the bovine p- globin genes. Mol Biol Evol. 2:514-525. 89 Schon, E. ,{., M. L. Cleary, J. R. Haynes, and J. B. Lingrel. 1981. Structure and evolution of goat y-, Pc- and pA-globin genes: three developmentally tegtrtut"agenes contain inserted elements. Cell 27 .3 59-3 69. scolt, A. F., P- Heath, s. Trusko, s. H. Boyer, w. Prass, M. Goodman, J. czelusniak, L. Y. Chang, and J. L. Slightom. 1984. The sequence of the gorilla fetal globin genes: evidence for multiple gene conversions in human evolution. Mol. Biol. Evol. 1:371- 389. Shapiro, S. G., E. A. Schon, T. M. Townes, and J. B. Lingrel. 1983. Sequence and linkage of the goat etand eII p-globin genes. J. Mol. Biol 169:31-52. Shehee, W. R., D. D. Loeb, N B. Adey, F. H. Burton, N. C. Casavan! p Cole, C. J. Davies, R. A. McGraw, S. A. schichman, and D. M. severynse. 1989. Nucleotide sequence of the BALB/c mouse B-globin complex. J. Mol. B,iol.2os.4T-62. Shen, S. H., J. L. Slightom, and O. Smithies. 1981. A history of the human fetal globin gene duplication. Cell 26.191 -203. Shoshani, J., and M. McKenna. 1998. Higher taxonomic relationships among extant mammals based on morphology, with selected comparisons of results from molecular data. Mol. Phylogenet. Evol. 9.572-584. Simpson, G.G. 1945. Principles of classification and a classification of mammals. Bull. Am. Mus. Nat. Hist. 85:1-350. Singh, A.2002. Phylogenetic relationships of paenungulate mammals inferred from a- and p-globin gene sequences. Honours thesis, B.sc., university of Manitoba, Canada. 90 slightom, I.L.,L. Y. chang, B. F. Koop, and M. Goodman. 1985. chimpanzeefetal cy- Ay-globin and gene nucleotide sequences provide further evidence of gene :' conversions in hominid evolution. Mol. Biol. Evol. 2.370-389.- Smith, 4., D. Smittr, and B. Funnell. 1994. Atlas of Mesozoic and Cenozoic coastlines. Cambridge Univ. Press. Cambridge, U.K. Springer, M. 1997. Molecular clocks and the timing of the placental and marsupial radiations in relation to the Cretaceous-Tertiary boundary. J. Mammal. Evol. 4.285- 302. springer, M., H. Amring A. Burþ and M. stanhope. 1999 Additional support for Afrotheria and Paenungulata, the performance of mitochondrial versus nuclear genes, and the impact of dat¿ partitions with heterogeneous base composition. Syst. Biol. 48:65-75. Springer, M., G. Cleven, O. Madsen, W. de Jong, V. Waddell, H. Amrine, and M. Stanhope. 1997. Endemic African mammals shake the phylogenetic tree. Nature 388:61-64. springer, M., w. Murphy, E. Eiziri( and s. o'Brien. zoo3. placental mammal diversification and the Cretaceous-Tertiary boundary. Proc. Natl. Acad. Sci. USA 100. 1056- 1061 Stanhope, M., o. Madsen, v. waddell, G. cleven, rv. de Jong, and M. springer. 199g. Highly congruent molecular support for a diverse superordinal clade of endemic African mammals. Mol. Phylogenet. Evol. 9:501-508. Steinberg, M. H., and J. G., Adams Itr. 1991. Hemoglobin A2. origin, evolutior¡ and aftermath. Blood 7 8.2165-2177 . 9t Tagle, D. 4., B. F. Koop, M. Goodman, J. L. Slightom, D. Hess, and R. Jones. 1988. Embryonic e- and y-globin genes of a prosimian primate (Galago crassi caudatus). nucleotide and amino acid sequences, developmental regulaìion and phylogenetic footprints. J. Mol. Biol. 203:439-455. Taglg D. 4., J. L. Slightom, R. T. Jones, and M. Goodman. 1991. Concerted evolution led to high expression of a prosimian primate ô-globin gene locus. J. Biol. Chem. 266:7649-7480. Tagle, D. 4., M. J. St¿nhope, D. R siemienia( p. Benson, M. Goodman, and J. L. Slightom. 1992. The p-globin gene cluster of the prosimian primate Galago crassicaudafzs: nucleotide sequence determination of the 41-kb cluster and comparative sequence analyses. Genomics lS.7 4l -7 60 Tang, D. c., D. Ebb, R. c. Hardison, and G. p. Rodgers. 1997. Restoration of the CCAAT box or insertion of the CACCC motif activates ô-globin gene expression. Blood 90. 42I-427. Thompson, J. D., D. G. Higgins, and r. J. Gibson. lgg4. 1LTJSTAL w. improving rhe sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673-4680. Townes, T. M., M. C. Fitzgerald, and J. B. Lingrel. 1984. Triplication of a four-gene set during evolution of the goat p-globin locus produced three genes now expressed differentially during development. Proc. Natl. Acad. Sci. USA 8l:6589-6593. Trent III, J.T., and M. s. Hargrove.2002. A ubiquitously expressed human hexacoordinate hemoglobin. J. Biol. Chem. 277 .19539-19545. 92 Tuan, D., solomon, w. e.Li, and I. M. London. 19g5. The,.B-like-globin,,gene domain in human erythroid cells. proc. Narl. Acad. sci. usA sz,63gaõggs - van Dijk, M., o. Madsen, F. catzeflis, M. Stanhope, w. de Jong, and M. page. zo0r. Protein sequence signatures support the African clade of mammals. proc. Natl. Acad. Sci. USA 98:188-193. Vincent, K' 4., and A. C. V/ilson. 1989. Evolution and transcription of old world monkey globin genes. J. Mol. Bliol. Z07:465-480 Waddell, P-, and S. Shelley. 2003. Evaluating placental inter-ordinal phylogenies with novel sequences including RAGI, -fibrinogen, ND6, and mt-tRNA, plus MCMC_ driven nucleotide, amino acid, and codon models. Mol. phylogenet. Evol. 2g:197- 224. Weber, R. 1995. Hemoglobin adaptations to hypoxia and altitude - the phylogenetic perspective,Pp 3r-44in J. sutton, c. Houston, and G. coates, eds. proceedings from the ninth international hypoxia symposium, Lake Louise, Canada, Burlington, Vermont. Webeq R., and R. Wells. 1989. Hemoglobin structure and function.pp. 279-300 in S wood, ed. comparative pulmonary physiology. Marcel Dekker, Inc. New york. wheeler, D., R. M. Hope, s. J. B. cooper, G. Dorman, G. c. webb, c. D. K. Bottema, A. A. Gooley, M. Goodman, and R. A. B. Holland. 2001. An orphaned mammalian p_ globin gene of ancient evolutionary origin. Proc. Natl. Acad. Sci. USA 9g:ll0l- I 106. 93 wheeler, D., R. M. Hope, s. J. B cooper, A. A. Gooley, and R A. B Holland. 2004. Linkage of the p-like omega-globin gene to a-like globin genes in an Australian marsupial supports the chromosome duplication model foi*epu*tion of globin gene clusters. J. Mol. Evol. 56:642-652. White, J., D. Harkness, R. Isaacks, and D. Duffield. 1976. Some studies on blood of the Florida manatee, TrÌchechus manatus latÌrostris. Comp. Biochem. Physiol. 55:413- 417. wood, w. G., J. M. old, A. v. Roberts, J. B. clegg, and D. J. weatherall. TgTg.Human globin gene expression: control of B, ò and ò p chain production. Cell 15:43 7-446. Wu, C.-I., and W.-H. Li 1985 Evidence for higher rates of nucleotide substitution in rodents than in man- Proc. Natl. Acad. sci. usA gz:r741-1745. 94 Appendix l-1. Primers used in the (A) amplification and (B) sequencing reactions of 'B-like'globin genes. (A) Primers used in pol5rmerase chain reactions. Primer name Sequence IIBB-3(F) 5' TTG CTT CTGACA CAA CTG TGT T 3' IIBB4(F) 51 4üqÁCAGACACCATGG TGC 3' ÉrBB-5(R) 5'AGT GGTACT TGT GC'c CC 3' - lrBB-6(F) 5' GGT TGT CTA CCC CTG GAC 3' HBB-7(R) 5'TTC TCG GGATCCACG TGC 3' }IBB-9 5' CTGACT GCT CrCT GAG AAGACT CAc c 3' lrBB-9(FIARM 5' CTGACT TCT GAT GAGAAGACT GCG G 3' TIBB-10 5' CCC TTGAGG TTG TCC AGG TGC 3' ÉrBB-11(F) 5' CTG TCC TGC ACAACG CTAAAG TGC 3' rrBB-12(R) 5' GCC ACC ACC TTC TCA TAG GCA GC 3' HBB-13(R) 5' GGG CTT CCGAIG GATACC 3' rrBB-15(F) 5' CAC CCT GC'c CTC GGC CAA TCT 3' I{BB-16(R) 5' TTA TGC CTT TGAAGTAGAATT GTT C 3' HBB-18(F) 5' TTC TGA CAC AAC TGT GTT CAC TAG CAA CTA 3' HBB-le(R) 5' GAG CTG Aú4ü{ GTT CTT TAT TAG GCA 3' HBB-20(FISTH 5' GCT GAC GAT GAGAAG CTCT GGT GTA 3' lrBB-ARM-Bß.) 5' TAC ACC CAA GCAACG TAG TAG TGAACC 3' LIBB-ARM-B/Dß) 5' Crcc CTT TTA CCT TAG CAT TTG CGAAC 3' HBB-ARM-D(R) 5'ATG TGGAGG TAAAGC CAA CAG GTTA 3' rrBB-cGM-l(F) 5' CAG CCA GAC AUAú{ TGA AAC TCA CTG 3' FIBB-AI-1I16 5' CAGACTG AAG TCA GTG CCT GTC AGA AAC TGG 3' HBB-Bl-1i4s 5' ACT GTT CTG TCT CCA CAT GCC CAG CC 3' HYR-B1-R 5'ACC GTT TTG TCT CCA CAG CrCT TGG CC 3' HYR-C1-R 5'CCTAGATAC A,AÁ CCT GCCAAGTGC CTCA 3' HBB-C1-1I74 5' CCC ACT CAA CCC TCC TTAAGT CTA CCT TGC 3' ELE.A2-27 5'TGT GTG GGC AAC ATAAAT CTCIG TTT C 3' ELE-82-27 5' CCCATT GTC TCAGAG GCCAAG CT 3' HYF.2BZ-21 5' ATG TCT GTG TAT CTG TCT GTC TCT TCT CCA 3' ELE-C2-21 5' CTC CTG GGCAAT GTG CTG GTGA 3' ESH-A(R) 5' TCAATA CCT GTGAGAAGC CTG GAA AAC GG 3' ESH-B(R) 5' CTC CATATT CCA GTC CCA TTCAAG CAT CC 3' ESH-C(R) 5' CCT GAC CAC CAA GTT TAf CTA CGT CCA CC 3' ESH-AG'>2 5' TGC ACAAGA GTC AAG TGT GAG TGC AGC TTG 3' ESH-A(F) 5' TGA TGC CCTAAT CTC GAT CAC TCT GCC 3' ESH-C(F) 5'AGG CTG CTTTCCAGAAGGTGGTGACT 3' (B) Primers used in sequencing reactions at the Unversity of Calgary DNA core laborafory. Primername Sequence M-13(F)40 5' GTT TTC CCA GTC ACGAC 3' M-13(R) 5'AACAC'C TAT GAC CAT G 3' 95 Appendix 1-2. DNA sequences of the 'p-like' globin genes amplified in this study. Exons are identified by CAPITAL letters, while introns and exlemal g.q.r sequences are represented by lower case letters. The 5' upstream transcriptional control motifs (CACCC, CCAAT, ATA) and 3' poly-adenylation signal (AATAAA) are underlined within the 5' and 3'flanking regions of each gene sequence, respectively Within each coding block, translated amino acids are listed below their corresponding base triplet. Appendix Page r-21- African elephant ò-globin gene sequence 97 t-28 Asian elephant ô-globin gene sequence 99 1-2C Dugong ô-globin gene sequence 101 t-2D West Indian manatee ô-globin gene sequence 104 t-28 Rock hyrax ò-globin partial gene sequence 106 t-2F Rock hyrax ðH-globin gene sequence 108 r-2G Yellow-spotted hyrax òH-globin gene sequence 1i0 T-2}l Western tree ôH-globin gene sequence TL2 T-21 Bushveld elephant shrew p-globin partial gene sequence 114 1-21 Bushveld elephant shrew ò-globin partial gene sequence 116 t-2K Ni ne-b anded armadi I lo rþ ò-gl obin gene sequence 117 t-2L Nine-banded armadillo p-globin gene sequence t19 L-2}/d Pale-throated sloth qrù.globin gene sequence 721 1-2N Pale-throated sloth p-globin gene sequence 123 96 Appendix 1-24 Africon elephant delto-globin gene sequence goctogcooc tacccqqtch gococcATGG TGAATCTGAC,l$gegfçre AAGACACAAG V NLT AAE KTA TCACCMCCT GTGGGGCAAG GTGAATGTGA AAGAGCTTGG TGGTGAGGCC CTGAGCAGgT VTNL rfGK VNVK ELG GEA LSR 127 ttgtotctog gttgcoo ggt ogacttaogg ogggttgagt ggggctgggc atgtggagoc ogoocagtct cccogtttct gocoggcoct gocttcctct gcoccctgtg gtgctttcoc 241 CttCOgGCTG CTGGTG TCT ACCCATGGAC CCGGAGGTTC TTTGAACACT TTGGGGACCT L LVVY PtllT RRF FEHF GDL GTCCACTGCT GAAGCTGTCC TGCACAACGC TAAAGTGCTG GCCCATGGCG AGAÄAGTGTT STA EAVL HNA KVL AHGE KVL 361 GACCTCCTIT GGTGAGGGCY TGAAGCACCT GGACAACCTC AAGGGCACCT TTGCCGATCT TSF GEGL KHL DNL KGTF ADL 42r GAGcGAGcTc cAcTGTGAcA AGcrccAcGT GGATccrcAc MTTTCAGGg tgosrctagg SEL HCDK LHV DPE NFR ogacottcto ttttttcttt tcoctttgto gtctttcact gtgottottt tgcttotttg 54r. aotttcctct gtotctcttt ttqctcAact atgtttcotc otttogtgct ttttcooctt otoccotttt gtottocttt tctttcooto ttcttccttt tttcctgoct cocottcttg ctttqtqtco tgctctttat ttootttcct gcgtttttgc tcttgctctc cctttctcct qqtttccttc cctctgooco gtocccogat tgtgcatocc occtctcotc cqctatttct 781 gcoctggggc oqqtccccoc ccctcctccq totgogggtt ggaaaggoct gootc oaoga ggagaggatc otcgtgctgt tctogogtct gtgattcatt tcogacttga aggataactt goatootot o ooatcaggag taaatggago ggaaagtcag totct go gaa tgaaagotco 961 googgtcat a gacgogatgg ggagcagaog ttoctoogoa octgoccott gtggctotoo 1021 ttaotcoctt oottogttoa ttoototgtt tgttotttot tcocgttttt cottttggtg Appendix 1-ZA Africon elephant delto-globin gene sequence 1081 ggagtooatt t g ggct o gtg tgtggg cqo c oto oot g ggt ttcs c c c cot t gt ct cagog 1141 gccoogctgg ottgctttgt tooccotgtc tgtgtotgto tctocctctt ccccotog0 L CCTGGGCMT GTGCTGGTGA TTGTCCTGGC CCGCCACTTT GGCAAGGAAT IEACC{EAGA LGN VLVI VLA RHF GKEF TPD TGTTCAGGCT GCCTATGAGA AGGTTGTGGC TGGTGTGGCG AATGCCCTGG CTCACAAATA VAA AYEK VVA GVA NALA HKY 1321 eeKTGAgot cctggcctgt tcctggtotc catcggaogc cccotttccc gogotgctot H 1381 ctctgoottt gggaaaotao tgccooctct caagggcotc tcttctgcct ootoaogtac 744r tttcogctcq octttctgot tcotttottt ttttctcogt cqctcttgtg gtgggggaog ttcccoaggc tctotggaca gagagctctt gtgccttoto ggaaaagttc aogggqoott ggaaoataao gggaoccoto cocogotott aatgggooco ottctocttc aooggcotoq agottgggot ggtttggcoa ataggotoct ggtoctocog ggottccotg ggcct coggc ctoogocato gccccogggc tooctttcog ottcqqttcc ogoaattcct cocqoqotqo 1741 tgga 98 Appendix 1-28 Asion elephont delto-globin gene sequence ttctgggcct cagtttcctc otttgtqtqo toocogoott gllogggtoøq ttctt oagog gcttaccagg ctgtoottct ooooooaotg cotooqtooo cttgccoagg cagotgtttt togcagcaat tcctgoo qga aacgggoc co ggagotaagt agagaaagag tgooggtctg 181 ooqtcqoqct aataagacog tcccogoctg tcoaggagog gtotggctgt catcottcog gccteceeet gcogooccaç gcçetggcct tggeeqqtct gctcoc aaga gcaaaaaggg coggaccogg gttgggcoto toaggaagag togtgccogc tgctgtttoc octcocttct 361 gococaoctg tgttgactog caoctoccco otcogocacc ATGGÏGAAIC TGACTGCTGC VN L TAA 421 TGAGAAGACA CAAGTCACCA ACCTGTGGGG CAAGGTGAAT GTGAAAGAGC TTGGIGGTGA EKT AVTN LITIG KVN VKEL GGE 481 GGCCCTGAGC AGgtttgtot ctoggttgca aggtagoctt aaggagggtt gogtggggct ALS R gggcotgtgg ogacagaoca gtctcccogt ttctgocogg cactgocttc ctctgcocci,, tgtggtgctt tcoccttcog GCTGCTGGTG GTCTACCCAT GGACCCGGAG GTTCTTTGAA LLV VYPW TRR FFE 66L CACTTTGGGG ACCTGTCCAC TGCTGACGCT GTCCTGCACA ACGCTAAAGT GCTGGCCCAT HFGD LST ADA VLHN AKV LAH 72L GGCGAGAAAG TGTTGACCTC CTTTGGTGAG GGCCTGAAGC ACCTGGACAA CCTCAAGGGC GEKV LTS FGE GLKH LDN LKG ACCITTGCCG ATCTGAGCGA GCTGCACTGT GACAAGCTGC ACGTGGATCC TGAGAATTTC TFAD LSE LHC DKLH VDP ENF 841 AGGgtgogtc taggagococ tctotttttt cttttcoctt tgtogtcttt coctgtgatt R 901 ottttgctto tttgootttc ctctgtstct ctttttoctc goctotgttt cotcqtttog tgttttttco octtotocca ttttgtotto cttttctttc qotottcttc cttttttcct 1021 gactcocott cttgctttat stcotgctct ttqtttqott tcctocgttt ttgctcttgc Appendix 1-28 Asiqn elephont delta-globin gene sequence tctccctttc tcctogtttc cttccctctg oocogtoccc-ooottgtgco toccocctct 1141 cgtccoctot ttctgcoctg gggcoootcc ccocccctcc tccototgag ggttggoaag goctgootca aogaggogog gatcatggtg ctgttctogo gtotgtgqtt cotttcqgqc ttgooggoto octtgootoq totqqootco ggagtaaatg gagoggaoog tcogtotctg agaotgoaog otcagaaggt cotog ocgag øtggggogca gaagttacto agaoactgoc cottgtggct otoottootc qcttoottog ttoottooto tgtttgttot ttottcocgt 1441 ttttcotttt ggtcgggogto oottt gggct agtgtgtggg coacotooot gggtttcocc ccottgtctc agaggccaag ctggattgct ttgttoocco tgtctgtgto tgtatctocc 1561 tcttccccot ogCTCCTGGG CAATGTGCTG GTGATTGTCC TGGCCCGCCA CTTTGGCMG LLG NVL VIVL ARH FGK t62t GAATTCACCC CAGATGTTCA GGCTGCCTAT GAGAAGGTTG TGGCAGGTGT GGCGMTGCC EFTP DVA AAY EKVV AGV ANA 1681 CTGGCTCACA A.ATACCACTG Agotcctggc ctgttcctgg totccotcgg oogccccott LAHK YH I74T tcccaogotg ctotctctgo otttgggaoo otootgccoq ctctcoqggg cotctcttct 1801 gcctqotooo gtactttcog ctcooctttc tgottcqttt ottttttttc tcogtcoctc r861 ttgtggtgg g ggaagttccc ooggctctot ggacagagag ctcttgtgcc ttoto ggooo L92t ogttcoogg g aaattggaoa otaoagggoa ccoto cocog otottoo tgg gaacaqttct 1981 octtcooagg cotaoagott gggooggttt ggcaaotogg otoctggtoc tocogggott ccotgggcct coggcctoog ocotogcccc ogggctooct ttcogattcq ottccagaao ttoctcocqa aøtaotgga i00 þpendix 1-2C Dugong delto-globin gene sequence tcccctttgo ooccoctgct ttgooacaoa ogtgttottc ge-aqcttgtg cttotoottg otttttoctt gtttggaccc ccaoogccto otatggtgoc cttgcototo gcottcttcq tttotattto ttcogttttc ogcgotcotg totttggtto oocgoccaaa oaaagaaoça 181 aacaolc1oo coaatgaotg oatottoqtt ttcttocco g gagaatttoo tccaqotcag 241 gagaoagaag aootgtttog oottgtcqcq gotgtttcot tcottctgtc cttgtootto tcttggotot tctggooøca caggootogc tccotcctoo toctcctaaa ctgaacccrco 361 gtgggcaatt cctttctgcc ctctgggcct tggtctcctc ottcgtgtoc taogogooct 42r ggagogtago tgtctoagog accaggcagt tottctooqo oqaccctcot oootooqctt 481 gccaogggag otgttttcag aagtoottcc tgoaagcaat gggoccoooo gataagtaga 541 ggoogagtga gggtctgaao tcoaoctoct aagørtggtgc cogoctg cta aggacaggta cggctgtcot cottcogucc tcqeçet gco gaoccoçqee etggcctygg ccoatctgirt 66r cccoggagcq aga^gggcc¡g ooqccagggt tgggcgtqoo oggaagrgco ggoctogctg ctgcttoctc ttocttctgg cacaactgtg ttgoctogco octococoot cogccwccAT 78r GGIGEAIE]G ACTGCTGATG AGACGGCTTT GGTCACCGGC CTGTGGGCCA AGGTGAACGT VHL TADE TAL VTG LITAK VNV 841 GAAAGAATAC GGTGGTGAßG cccrccccAc gtttgtgtct aggttggaqg gcoggcttqo KEY GGEA LGR ggogggttga ttggggctgg gcatgtggog CIcogaocagt ctcccagttt ctgocogtco ctgoctccct ctgtcccctg tggtgctttc occcttcogQ CTGTTGGTTG TCTACCCATG LLVV YPlll 1021 GACCCAGAGG TTCTTTGAGC ACTTTGGGGA CCTGTCCTCT GCTTCGGCTG TCATGCACAÄ TAR FFEH FGD LSS ASAV MHN 101 Appendix 1-2C Dugong delto-globin gene sequence cccTAAAGTG AAGGCCCATG GCGAGAAAGT GTTGGCCTCCEüSTGACG GCCTGAAGCA PKV KAHG EKV LAS FGDG LKH CCTGGACGAC CTCMGGGCG CCTTTGCTGA GCTGAGTGCG CTGCACTGTG AG&\ËTEGEA LDD LKGA FAE LSA LHCE KSH CGTGGATCCT CAGAACTTCA AGgtgogtct aggogactct ctoctttttt cttttcactt VDP ANFK tgtogtcttt coctgtggtt ottttoctto cttgcotttc ctccocotct ctttocttgo ctgtotttgo gootttggtg ctttttcooc ttococcott ttgtottott ctttctttco 1381 otgttcttcc ttttctcttg tctcocogtc ttgctttoto tcotgccctt totttoottt cctgcctttt gcctctgtct ctctctcgat ctcctttttt cctqotttcc ttccccttoo ocoqtoccco ogttotgcat occocctctc otccoctoct tctgcActtg ggcaaotcct actgctcttc cototgoggg tLagooagga ctgootcoao gaggggaggo tcatggtgtt 7621 gttctogogt ctgogactcg ttttgqoott gaoggøtoat ttgootoato tqoootcagg ogtooatggg aaggaaogtc tototctgog aatggaagat cagaoggoco tattggogct 1741 goggagcago ogttoctaag agscsgocco ttqttoctct qattaotcaq ttoqttogtt 1801 ootgogtotg cttgctoatt totttocttg ttttttoott ttogtgggao taogtttggg 1861 ccgtcogttt gggctoctgt gtgggcooco toootgggtg tcogcccott ttctcogagg 1921 ccoogctggo ttgctttgtt ooccotgtct gtgtotctot ctocctcttc cccqcqgeTe L 1981 CTGGGCAATA TGCTGGTGTG TGI-GCTGTCC CGCCACTTGG GCAAGGAATT CTCTCCACAG LGNM LVC VLS RHLG KEF SPA GCACAGGCTG CCTATGAGAA GGTTGTGGCT GGGGTGGCGA ACGCCTTGGC TCACMATAC AAAA YEK VVA GVAN ALA HKY 2101 eAeTGAgott ctggcctott tcctggtotc cactggaagc cctgtttccc tagatgctot H t02 Appendix l-ZC Dugong delto-globin gene sequence 2161 ctctgoottt ggggaaotao tgcccoct c t caogggtota gettctgcct oqtooqgooc 2221 tttcogctco octttctgot tcqtttcott tottttuttt cttqctcttg aggtotcgaa ogttccccao ggctotatgg otoaggagtt ottgtgtctc otogg aagog ttcoogtgoo 234r gttogoaoot gaogggoccc otocacogct ottootg gga otoottctoc ttctoaggco tootggttgg ggaggtttgg caaatagago ggatgctggc cctac atgga ttccatogoc 2467 cttqqoccto ogocotogcc ccogggctoo cttccogatt ctqttccgac oqttactcoc aoaotootgg gctcoottog ogoogtcot g otgottgoao qcottgtttt gttttcttgc ttcctocttc tcttgcctgt ocottttogc ccocootcto toccctttct ogtctttct 103 Appendix 1-2D West Indion monotee delto-globin gene sequence oooacagtgc cagoctgctc oggoc gggto cggctgtcot -cottcogocc tcoccctgco gooccococc etggccttgg ccootctgct cccog gagca agoogggcag oaoccøgggt 121 tgggcgúss oggaogagco gggctogctg ctgcttococ ttqcttc tgg cocøact gtg 181 ttgoctooco octocqcoot cogococcAT GGIGCATCIG AcTccTGAAG AGAAGGCTI-T VHL TPEE KAL 24L GGTCATCGGC CTGTGGGCCA AGGTGAACGT GMAGAATAT GGTGGTGAGG CCCTGGGCAG VIG LITAK VNV KEY GGEA LGR gtttgtotct oggttgtaag gcaggcttoq ggagggttgo ttggggctgg gcotgtggog acagaacogc ctcccogttt ctgacogtco ctgoctccct ctgtcccctg tggtgctttc OCCCgTCOEQ CTGTTGGTTG TCTACCCATG GACCCAGAGG TTCN-TGAGC ACTTTGGGGA LLVV YPÏII TAR FFEH FGD CCTGTCCTCT GCTTCGGCTA TCATGAACAA CCCTAAAGTG AAGGCCCATG GCGAGAAAGT LSS ASAI MNN PKV KAHG EKV GTTCACCTCC TITGGTGACG GCCTGAAGCA CCTGGAAGAC CTCAAGGGTG CTTTTGCTGA FTS FGDG LKH LED LKGA FAE 601 GCTGAGTGAG cTccAcrcrG ACAAGTTGCA cGTcGATccr cAGAAcrrcA GGgtgogtct LSE LHCD KLH VDP ENFR aggagocgct ctoctttttt cttttcqctt tgtogtcttt coctgtggtt ottttoctto cttgcotttc ctccqcotct ctttqcttgo ctgtotttgo goatttogtg ctttttcooc ttococcott ttctottott ttttctttco atqttcttcc ttttttcctg tctcocogtc 841 ttgctttoto tcotgccctt totttoqttt cctgcctttt gcctctgtct ctctctccat ctcctttttt tcctoqtttt ccttcccctt aoocqotocc coaqttqtgc atoccocctc tcotccqcto cttctgcgct tgggcooqtc ctoctgctct tccqtotgag ggttagaaog goctgootco aagqggggog gatcatggtg ttgttctogo gtctgogoct cqttttgaaa 104 þpendix 1-20 West Indion monotee delto-globin gene sequence ttgooggota atttgaotoo tttooootco ggagtoaatg gggqggaoog tctototctg 1141 agaatgaaag atcagaoggo cotattgcgc tgaggogcag oagttoctoo gagacogocc 1201 qttqttoctc toottootcq ottqottogt tootgootot gcttgctoot ttotttoctt gttttttoot tttogtgggo atoagtttgg gccctcogtt tgggctoctg tgtgggcooc otaootgggt gtcaccccqt tttctcagag gccoogctgg ottgctttgt tooccotgtc 1381 tgtgtotcto tctocctctt ccccocogef CCTGGGCAAT GTGCTGGTGT GTGTGCTGGC L LGN VLVC VLA CCGCCACI-IT GGCAAGGAAT TCTCTCCAGA GGCACAGGCT GCCTATCAGA AGGTTGTGGC RHF GKEF SPE AAA AYAK VVA 1s01 TGGTGTGGCG AACGCCTTGG CTCACAAATA eeAeTGAgot tctggcctot rtcctggtot GVA NALA HKY H ccottggoog ccctgtttcc ctagotgcto tctctgoott cggggoooto otgcccactc L62t tcaagggtat ogcttctgcc tgatoswoa ctttcogctc ogctttctgo ttcctctcot ttattttqtt cct 105 Appendix 1-2E Rock hyrox delto-globin portiol gene sequence AAGGGCACCT TTGCCCAGCT GAGCGAGCTG CACTGTGACA.4.GCTGCAIGI GGACCCTGAG KGTF AAL SEL HCDK LHV DPE AACTTCAGGg tgogtccogg ggatgctcta ctcttttcoc tttgttttco ttccttgtog NFR ttotttooct tccttgcott tcctctgtot ctctqcttoc ttgottgcot tgcotcottt 18r ogtocttttc coocttotoc ogcttgcott otttttcatt coaacttcct ctttctttcc tgoctcocoo tcttgcttto cotcotgccc tttgtotooo cttatocctt tcgcctctct ctgtctctct ctctctctct ctcctttttt cctoqtttcc ttttctccot totgcotggt ccctctcact goctgcttct gcocttoggc cootcctoct tgtcctccot otcaggtttg ggaoggactg aatcaoaggg gacacaatco tgotgttott ttogogtctg tgococottt 481 tgtooctgao gaotoattgg ootoooacoo aotcotgggt aaotggaogc gctootgcot 541 atctgaggat gaaogatcag aaggtcotot aacogotggg gagcagaagt tttto agqga cogatogcto ttoctctogt cooctaatgo gtttottoto tgtttgctto tttatttoco 661 tgtttttoat tttggtggga otoagtttgg gctgtcogtt tgogctagtg tgctgtcogc 72L atagatggot gtcoccccot ttctqcogot gccaagctgg otcgctttgt taaccotgcc tctgtotctg tctotccctt ccccocogef CCTGGGCAAT GTCTIGGTGG TTGTCCTGGC L LGN VLVV VLA 841 CCGGCACTTC CGTGAGGAGT TCACCCCAGA TGTTCAGGCT GCCTTTCAGA AGGTTGTGAC RHF REEF TPD VAA AFQK VVT TGGIGTGGCA AATGCTTTGG cTcAcAAATA CCACTGAgot cctgttttct ggtatctotc GVA NALA HKY H 961 ogoagcoctg tttccct agc tgtgatct ct gaottcaggg oaotootgcc cactc tcogg 1021 ggcotcoctt ctgcctootq aogooctttc ogctcoactt tatgottcot tttotttott 106 Appendix 1-2E Rock hyrox delta-globin portiol gene sequence ttottttatt tctoctcoct tttgoootqt gggogtgtcc-tgaoaggclt ocagotoggg 1141 oocg cttgtt t ccooco gga ogagtt coog ggaoattgag oaatgooggg tqccttococ ogttgttoot ttootgggoo toocttgoct cgtotgocot ooogottggg oootttggco L267 aotogagagg otgctgctoc tcctgooatt ccotggaogt coooccctta cococogcot 1321 gaaggctact tttcqoottc atqqtcooco ottotttoca aaatoatagg ttcootoagg 1381 ggogtcatgo ttattgoooc tottgttttc cttccttcct tcctqcttct cttgtocogt 1441 ttoggtcoto ogctatgtcc ttgtaogctt tcctqctcto tggoaccotc ctcttctgtt 1501 gtt0gtccto oooctgttct gooootoatc gtttctcctg ttgttttcot ttctctgggo gttcogcgct cotgtotcoc ctgtaocttc tcttocctco tttcootggo tttcctogto L62t gtgttttoao oototttcc I07 Appendix 1-ZF Rock hyrox deltoH-globin gene sequence goctogcooc tocotgctct ggcoccATGG TGCGCCTGAC TúAIAGIGñ AAAGCTGAGG V RLT DSE KAE TCGTCTCCTT GTGGAGCAAA GTGGATGAGA AAATMTCGG CAGTGAGGCA CTTGGCAGgT VVSL lrSK VDEK IIG SEA LGR L?l ttgtotctog gccgtocgto aggcaggctg oacgaggalt gcccogggcc aogcctgtgg 181 agacaoaocg gtctcccogt ttctgacaga coctgogtcc ctctgcccoc tctgtotttt cocccctcog GCTGCTGGTC ATCTACCCAT GGACTCGGAG GITCTTTGM CACTTTGGGG LLV IYPIT TRR FFE HFG ACCTTTCCAC TGCTGACGCT ATCCTGAAAA ACCCTCGGGT ACAGGCCCAT GGAGAGAAAG DLST ADA ILKN PRV AAH GEK TGCTGTCCTC CTTTGGGGAG GGCCTGAATC ACCTGGACAA CCTCAGGGGC ACCTTTGCCC VLSS FGE GLNH LDN LRG TFA 42L AGCTGAGCGA GcrGcAcrGT GAcGAGcTGc ATGTGcAccc rcAcAAcrrc AGGgtgogtc ALSE LHC DELH VDP ENF R caggggatgc tctoctcttt tcttctcact ttgttttcot tcogtgtggt totttgocct s41 acttocttgo otttctttco tgtcoctttt tttcttgoct gtotctcotc otttogtgct 601 ttttcrgctt otqctocttt gcoctgtttt tttctttcao tottctttot ttttgtctto 661 tootcctgct ttototcoog ccctttogct ttctacottt tgcctctctc tcttttccto ocqqttccct ttcccctoaa cagtoctcqo ottocgootg ttocctctco tccogtgttc otgtocttag gcoaotcctt ctggtcctcc gtatgogogg tggcaoggoo tgootcaaag atgcaoagog tgttotgttg ttctogogtt tgtgoctcot tttgoootto ooogotoqtt tgootootot oooat co gga gtaaaaggoa aggaaaatca gtotctg ggg otggaoogot cogoogotco totaggt got ggagagcogo ogtttctoo g aaacogacca ctottgctgc aactaogcoo ttogttogtt aototocttg cttotttgtt tccotgtttt tocttttgot 108 Appendix 1-2F Rock hyrax deltoH-globi.n gene sequence gggortaaat tcgggccotc cgttt ggggc aocagaoatg g€rlqtcot cc cqtttt coco 1141 gatgccaogc tggottgttt tqttaoccqt gtctgtotat ctotctgtct cttctcccrcq 1201 gcrccl'GGGC AATATCCTGG TGGTTGTCCT GGCCCGCCAT TATGGCAAGG AATTCACCCT LLG NILV VVL ARH YGKE FTL L26L AGAGGTICAG GCTGCCTGTC AGAAGTITGT GGCTGGTATG GCAAATGCCT TGGCTCACAA EVA AACA KFV AGM ANAL AHK L32L ATACCACTGA grtcctggoc tgtttcctgg totccoctgg oogccctgtt tccctagotg YH 1381 tgocctctgo gtttgtoaot ogtgctcatt cgcaagggco ttgcttctgc ctsqtqqqgo 1441 occttcogct cgactttctg ottcotttto tttottttot ttcttotgco ctttoggogt 1501 olgggagtgt cctgoooogc ttocc gatag cgotctcttg tgtccco cog gcogagtccc 1.561. ogggaaottg gaaacggocg gatoccttot gcogttgtto otttootgg 109 Appendix 1-ZG Yellow-spotted hyrox deltoH-globin gene sequence gactagcoac tocotgctct ggcoccATGG TGCGCCTGAC IGAilAGIGAG AAAGCTGAGG V RLT DSE KAE 61 TCGICTCCTT GTGGAGCAAA GTGGATGAGA AAATAATCGG CAG|-GAGGCA CTTGGCAGgT VVSL lrSK VDEK IIG SEA LGR .LzL ttgtotctog gccgcaoggc oggctgaocg oggottgcc c agggccaogc ctgtg gagac oaaocggtct cccogtttct gocogacoct gogtccctct gcccoctttg tottttcocc 247 CCTCO9GCTG CTGGTCATCT ACCCATGGAC TCAGAGGTTC TTTGAACACT TTGGGGACCT L LVIY PtrT ARF FEHF GDL 301 TTCCACTGCT GACGCTATCA TGAAAAACCC TCGAGTACAG GCCCATGGAG AGAAAGTGCT STA DAIM KNP RVA AHGE KVL GTCTTCCTTT GGGGAGGGCC TGAATCACCT GGACMCCTC AGGGGCACCT TTGCCCAGCT SSF GEGL NHL DNL RGTF AAL 42r GAGCGAGCTG cAcTGTGAcG AGCTGCATGT GGAcccrcAG AACTTCAGGg tgogtccocg SEL HCDE LHV DPE NFR 481 ggatgctctq ctcttttctt ctcoctttgt tttcottcoc tgtggttott tgocctactt octtgoottt ctttcotgtc actttttttc ttgoctgtot ttcstcattt ogtgcttttt 601 cogcttotoc toctttgcoc tgtttttttc tttcqotqtt ctttqttttt gtcttotqqt 661 cctgctttot otcaogccct ttogctttct ocottttgcc tctctctctt ttcctoocqo ttcccttccc cctooocagt cctcoqattq cgootgttoc ctctcqtcca gtgtttotgt 781 octtoggcoo otccttctgg tcctccgtot gagøggtggc oaggaotgoo tcooggotgc aoagagtatt otgttgttct ogogtttgtg octcottttg ooottoooog atootttgao toototooq a tcoggagtaa aoggaaagga oootcootqt ctggggo tgo oogatcagaa ggtcototo g gtgacggogs gcogoogttt ctoogoooco goccoctott gctgcooctq agcoattogt togttoatot octtgcttqt ttgtttocot gtttttoctt ttgotgggao 110 Appendix 1-2G Yellow-spotted hyrox deltoH-globin gene sequence 1081 toqottcggg ccot cogttt ggggt aocog oaotgggtat cÕtc-ccottt t coco gatgc 1141 caøgctggøt tgttttottq accotgtctg tgtotctgtc tgtctcttct ccocqgeleg L TGGGCAATAT CCTGGTGGTT GTCCTGGCCC GCCATTATGG CAAGGAATTC ACCCTAGAGG LGNI LVV VLAR HYG KEF TLE 1267 TTCAGGCTGC CTGTCAGAAG TTTGTGGCTG GTATGGCAAA TGCCTTGGCT CACAAATACC VAAA CAK FVAG MAN ALA HKY AelGAgotcc tggoctgttt cctggtotcc ottggaogcc ctgtttccct ogatgtgocc H 1381 tctgogtttg toaatagtgc tcottcgcqo gggcottgct tctgcctOO! qaogoocctt cagctcgoct ttctgattco ttttqtttat tttotttctt otgtocttto ggogtotggg ogtgtcctgo ooogcctocc gotagcgotc tcctgtgtcc cqcoggcogo gtcccoggga aattggoooc ggocggotqc cttotgcogt tgttaottto atgg 111 Appendix 1-2H Western tree hyrox deltoH-globin gene sequence catgctctgg coccATGGTG CGCCTGACTG ATAGTGAGAA 4çCTGAGGTC GTCTCCTTGT V R L T D S E K -A-E V V S L GGAGCAAAGT GGATGAGAAA ATAATCGGCA GTGAGGCACT TGGCAGgttt gtotctoggc Y{SKV DEK IIGS EAL GR cgtaaggcag gctgaacgag goctgccagg gccoogcct g tggogocaoa acagtctccc ogtttctgoc ogacocttog tccctctgcc coctctggto ttttcqcccc tcogGCTGCT LL GGTCATCTAC CCATGGACTC AGAGGTTCTT TGAACACTTT GGGGACCTTT CCACTGCTGA VIY PlrTA RFF EHF GDLS TAD CGCTATCATG AAAAACCCTC GAGTACAGGC CCATGGCGAG AAAGTGCTGT CCTCCTTTGG AIM KNPR VAA HGE KVLS SFG 361 GGAGGGCCTG AATCACCTGG GCGACCTCAA GGGCACCilT GCCCAGCTGA GCGAGCTGCA EGL NHLG DLK GTF AALS ELH CTGIGACGAG CTGCATGIGG ATCCTGAGAA CTTCMGgtg ogtccogggg otgctctoct CDE LHVD PEN FK ctcttcttct coctttgttt tcqttcgctg tggttotttg octtocttgo otttctttco 541 tgtcoctttt ttcttgoctg totttco'tco tttogtgctt tttcogcttq tqctoctttg cqctottttt ttctttcoot ottctttgtt tttgtcttot ootcctgctt totqtcaagc cctttogctt tctocotttt gcctctctct ttctcttttc ctqocoottc ccttccccct 721 oaotogtoct cqqqttocgo otgttqcctc tcotccootg tttotgtcct taggcoaotc 781 cttctggtcc tccotqtgog oggl.ggcaag gogtgootca oogatgcoaa gagtattotg 841 ttgttctogo gcttgtgact cottttgaoo ttoooogato otttgootqo totooootco 901 ggagtoøaog gaoaggaooo t cogtotctg gggatgaaag ot cogooggt cotoc aggtg atggagagco googtttcto agaaacsgoc coctottgct gcooctoogc oottogttog 1021 ttqqtotqct tgcttotttg tttocotgtt tttqqttttg atgggoatoa gtttgggcco rt2 Appendix 1-2H lYestern tree hyrox deltoH-globi.n gene sequence tcogtttggg gcaocogooa tgggaotcot cccottttco aiþn_tgccoo gctggottgt 1141 tttattoocc otgtctgtot otctqtctgt ctcttctcco cogCTCCTGG GCAATATCCT LLG NTL GGTGGTTGTC CTGGCCCGCC ATTATGGCAA GGAATTCACC CTAGAGGTTC AGGE]GEEIA VVV LARH YGK EFT LEVA AAY TCAGMGTTT crcccrccrA TGGCAAATGC cTTGGcrcAc AAATAccAcT GAgotcctgg AKF VAGM ANA LAH KYH L32L cctgtttcct ggtatccott ggaagccctg tttccctogo cgtgocctct gogtctgtoo otogtgctco ttcgcooggg cotLgcttct gcctcctcca gaoccttcog ctcooctttc l-441 tgottcottt tgtttotttt otttcttotg coctttogoo gtotgggogt gtcctgaoaa gcttaccgot ogcgatctct tgtgtcccat oggcagagtc caogggaaac tggaaaalcga 1561 cgggtacctt atocogttgt tootttootg g 113 Appendix 1-ZI Bushveld elephont shrew beto-globin portiol gene sequence ctttooatgt oootttttgC tcqgtoøctg tgoctgtoot JcoCtotcct tgttgttgoo 61 gogtcootgc cooggotqtq qocoaotoaa tcootggoto tttcoggooo tttttotcog 72t ggotctttoa tataagocot ggtooaggoa goototttg g aatagtggct gottocttgt tcottttotc tttgaootoo ttttgtotog tcccoagcot ogtocttoct ggctctgtcc 24t oaocottgtt oaoctgccco otagtggtao agtotctccc ocogtcoogo tctcootgtt otcotttott toctaootgo otgocataag oaotocctqo gogccttogc ogcttgtttt 361 ttttttooot otocotoctt gctooaaago tgtttttcgt ttcqotttct gagaggootg 421 tgottogoga taogtoggag ogtgagggcc tgoootcooo ctcqootgoc ogtgccogcc 481 tgccootgoc ogtcogogct gtgotcactc cgggctcttt ctgcogagtc actctggcct gggccaøtct gcttgcogqq gcoccatggg coggqcccag ggctgggcot øgaagaøggg coggaccagc aaoggcttoc acttgcttct gocacaogtg tgttcactog coactqctcq OOCOgtCoog ATGGIGEAIE TTACTGACGG CGAGAAGGCT CTGGTCAATG GCATTTGGTC VHL TDG EKA LVNG lylls 72L CMGGTGGAC GIAGATAÂAC TTGGTGGTCA GccrcrGGcc rGgtgagtot tttgottoog KVD VDKL GGA ALG tT gcoggtttoo ggotgcttga otgggactgg aotatggago cogccgtttt ccoggcttct cocoggtott goctctttct ggcccctgtg tttcttttoc ccotcogfel GCTGATTGTA L LIV TACCCACGGA CTCAGAGGTT CTTTGAATCC TTTGGGGACC TGTCCTCTGC TGATGCTATC YPRT ARF FES FGDL SSA DAI 961 ATAAAGAATC CCAAGGTGGC AGCCCATGGC AAGAAAGTAG TGAACTCCTT CAGTGAGGGC IKNP KVA AHG KKVV NSF SEG 1021 ATGAAGCATC TGGATGACCT CAAGGGCACC TTTGCCCAGC TGAGTGAGCT GCACTGTGAC MKHL DDL KGT FAAL SEL HCD lt4 Appendix 1-ZI Bushveld elephont shrew beto-globin portiol gene sequence 1081 AAGCTCCATG TGGATCCTGA GAACTTCAGG gtgogtctot gggaotooog ogtttccttc KLHV DPE NFR 1141 tgctctctot ttttgtcooo ttctogcagg gggagggggt tccoggtcag oattgoaaca oagtgttctg octacoaagt aggaactctc cogggccttt ttaggaogto tootctcttc L26t cttctocooo tctttottgc ccctgttgco tggttcogtg ttcctttgtc ttcttcttct 1321 ooctocttcc tcctctoctt octttttqtc octtcqoooo tgotttaooo oottoqocct 1381 tcttctttaa otctoctagg ggtttcccct tctqttctto goctttcoga attcatottt 1441 ttooogcotg gggoatoaoo tggttctttt tgocottoca tocgtttttt ooooggtcoc taotooootc tgooctoago tggotggagg ooootottco catgcotooo ttcogactga 1561 totgocooca ctgocatccg togtogooct oogtgtcoot totcqcocco ctcotottto 1621 gtgcocaogo gtcoogtgtg ogtgcogctt gtgatgccct ootctcgotc octctgccag ccoaottqot gtgtcttatc tttctcctco g 115 Appendix l-ZJ Bushveld elephont shrew detto-globin portiol gene sequence t AATGTGGCCA ACGCCCTGGC'TCACAAGTAC CACTAAgotc ctgcc_ctttt ccttgggotc cotg NVAN ALA HKY H 7t ctctgtctcc ctogttctga cccctooott gggagoogaa ccoooogtgt oototttgct tqot 141 ctttcogctc-ooatttttta ttoatttctt tttttttctt ottcotccta ctgtotagga goga zLL aggaca 116 Appendix 1--2K Nine-bonded onnodillo pseudo-delto-globin gene sequence tacaaagogo coccATGGrG AATCTGACTT CTGTGGAGAA- geIGeIGIe ACAGGTCTGT V N L T S V E K-S-A V T G L GGAGAAAGGT GAACGTGAGA IAAtgtggtg gtggctctgg gcaggtttgt gtggaggttø WRKV NVR 121 ccroggcogct tooggtggga ogatgcoogc tgggcotgtg gagocatogc ogtctcctgg gtttotgoco ggtoctgoco gctgtcccct ttgctgtttt cqcccctcog GCTGCTGGTT LLV 241 GTGAACCCTT GGGCCGAAAG TTTCTTTGAA ACCTTGGGGA ACTTGTCCTC TCCTTCTGCC VNPW AES FFE TLGN LSS PSA ATAÏNGGCA ATAGTAAAGT GAAGGCTCAT TTCAAGAAGG TTCTGACTTC CTTTAGTGAT IFGN SKV KAH FKKV LTS FSD GGT TGATGA AGCACCTGGA CAACCTCAAG GGTACCTATG CTCATCTGAG TGAACTGCAC GVMK HLD NLK GTYA HLS ELH 42t TGTGACAAGC TGCACGTGGA TccrcAGAAc rrCAÄAgtgo gtctoggooo tottccqott CDKL HVD PEN FK ttttcqtttc tootttttgo ctgtgcctct ttoocctgtt ggctttacct ccocot,rrcco 541 ttttoctttt tctototctt ocatucttout gctttctoao ctttotgtco otctttcctt tcoatottct cotgogctto tqtttqqtco agctctttoa ttqotttcct ogcttttgcc cctctctccc tttttcttoq ttttctttcc ottooccogt octcqoqtto tgcctqccog 721 cttogotctc otctgctoct tctgcctttq ogooootoct tqttttcctt cooqtgtagg ttggtoaogt ttgoottaoa goaoagoggc ogocootgtt tttctogooo ttgtgcqtco ttttoagatt tgoottgttt aoogtcagog oggoaoatto ototctgagc otgoaacotc ogoagotcot ataggoggtg ggatagcoao agttootog g agacagccco totcocatto ootootcqqt gtotcogtto ottootgttt otttotagot ttttootttt tqttttggtc 1021 ggootogcct goggotctgt tgtggctoog ggotagttog ootoootggg gaocacccco TT7 Appendix L-ZK Nine-bqnded qrmodillo pseudo-delto-gLobin gene sequence 10Br gtgtctcogg agtcaogctg gogtcctttc tttoctotgt occtocctct -ctctgtgtcc 1141 tcccttcage TCCTTAGCAA CATGCTGGTG ATTGTGCTGG CCTGTCACTT TGGCAAGGAC L LSN MLV IVLA CHF GKD 1201 TTCACCCTGG AGTTGCACGC TGCCTTTCAG AAGGTCGTAG TTGGAGTGGC GAATGCCCTG FTLE LHA AFQ KVVV GVA NAL L26L GCTCACAACI- AeIA[GAgo ttctggtctg tttgct AHNY Y 118 Appendix l-ZL Nine-bonded qrmoidirro beto-globin gene sequence tatoaagago coccATGGrG MCCTGACCT CTGACGAGAtr GAçTGCCGTC CTTGCCCIGT V N L T S D E K-T A V L A L GGAACAAGGT GGAccTcGAA GACTGTGGTG GTGAGGcccT GGGCAGgttt gtotggoggr WNKV DVE DCGG EAL GR 121 tocoaggct g cttaoggagg gaggatggaø gctgggcotg tggagacogo ccqcctcctg 181 gotttotgoc aggooctgat tgctgtctcc tgtgctgctt tcocccctco gGCTGCTGGT LLV 241 CGTGIATCCC TGGACCCAGA GGTTCTTTGA AAGCTTTGGG GACTTGTCCA CTCCTGCTGC VYP lllTAR FFE SFG DLST PAA 301 ÏGTGTTCGCA MTGCTAAGG TAAAAGCCCA TGGCAAGAAG GTGCTAACTT CCTTTGGTGA VFA NAKV KAH GKK VLTS FGE 361 AGGTATGAAT CACCTGGACA ACCTCAAGGG CACCTTTGCT AAACTGAGTG AGCTGCACTG GMN HLDN LKG TFA KLSE LHC 42L ïGACAAGCTG cAcGTGGATc CTGAGAATTT cAAGgtgagt cootottctt cttcttcctt DKL HVDP ENF K ctttctotgg tcoogctcot gtcotgggaa aoggacotao gogtcogttt ccogttctco 541 otagaooaoo ooottctgtt tgcotcoctg tggoctcctt gggaccattc otttctttca 601 cctgctttgc ttotogttot tgtttcctct ttttcctttt tctcttcttc ttcotoogtt tttctctctg tottttttts qcocaqtctt ttaottttgt gcctttooot totttttoog 72t ctttcttctt ttoottocto ctcgtttcct ttcotttcto toctttctot ctqqtcttct cctttcoogo gaaggogtgg ttcoctocto ctttgcttgg gtgtoao gaa taocagcoot 841 ogcttooott ctggcotaat gtgaotaggg oggocoottt ctcotot aag ttgaggctga tottggaggo tttgcot tag togtagaggt tocotccogt toccgtcttg ctcotoottt gtgggcacao cacagggcot qtctt ggaoc aaggctagoo tottctg oat gcaaactggg 7021 gocctgtgtt ooctatgttc otgcctgttg tctcttcctc ttcogcrccr GGGCAATATG LL GNM 119 Appendi.x 1-21 Nine-bonded ormoidillo beto-globin gene sequence 1081 CTGGTGGTTG TGCTGGCTCG CCACTTTGGC AAGGAATTCG -A€IGGEAEAI GCACGCTTGT LVVV LAR HFG KEFD rÚHM HAC 1141 ÏI-TCAGAAGG TGGTGGCTGG TGTGGCTAAT GCCCTGGCTC ACAAGTACCA rTGAgctcct FQKV VAG VAN ALAH KYH ctcccooctt tccogttcct qcoooaggtg cttttgtcct cogogtccoo ctqctgootg tgggaaoatt ototogogcc ttgggootct ggttgtgcct ootooogooc otttotttcc 1321 octgcatttg tgtotttsoo ttotttctgc qtqtctcqct cogotgggco totgggaggc aag r20 Appendix 1-2M Pole-throoted sloth pseudo-delto-globin gene sequence tcoctagcoa ctacaosoco gococcATGG TGTATCTGTT.SGCAIGAG GTGtcTGccG V Y L F-A-H E V S A TCTCCGGCCT GTGGGGCAAG GTGAATGTGC AACAACTTGG TGGCGAGGCC CTGGGCAGgT VSGL IIGK VNVA ALG GEA LGR tggtoctgog gttotooagc ogottaagga gggoggttgg aagctgggct cctggagoca gogcogtctc ctgggtttct gocoggcoct goctccttct gtcccctgtg ctoctttcoc ccttcogGCT GCTGGICGTG TACCCCTGGA CCCAGAGGTT CTTTGAAAGC TTTGGGGACT L LVV YPITT ARF FES FGD TGTCCTCTGC TGATGCTGTG TTTTCCAATG CTGAAGTGAA GGCCCACGGC AAGAAGGTGC LSSA DAV FSNA EVK AHG KKV TGACCTCCTT CGGTGAGGGT CTGAGCACCT GGACAATCTC AAGGGCACCT ATGCTCACCT LTSF GEG LSTltJ TIS RAP MLT GAgcgogctg coctgtgaco agctgcocgt ggotcctgog oocttcoogg cgogttcogg ogotgttcca otttttttcq ttttcttgtc tttcactgto cttctttooc ttqctogctt 541 tgcctccoca ttccttttco ctttttctqt ottttqtcco ttoootgctt tttagoottt 601 acotccttgt tttctttcao tottctttgt ttcctatctc otgoccttqt tttocotcqq 661 gctcttactt toctogcctt tgcctctctc tccctqtttc ttootgtttt ttccoctooc cootoctcoo ottotgcaga ccagctotto tctgctoott ctqcccttgg gaoottcttc ottttctcco aottggggtt ggtooogcct goatcaaogo gaogaggcoc otgotoctct 841 agoaoctgtg cotoottttg oogtttgaot tgccagaogt caggaotooo ttgggaggag agtcogtot c tgogcat gaa agat"cagaog gtcatotog g oagtgtgoga cagcccotot 961 cqcactaoct qotcootgao tttgttoott ootttotagq ttcocttttt taaogtttgg L021 tgggoataoo attgggotcc gttttggcto gggtgtgggt ogoatoootg ggcottaccc LzI Appendix 1-2M Pqle-throqted sloth pseudo-delto-globin gene sequence 1081 cogtttctc o gaagtcaagc t ggattcgtc tgtgooccot -gft c_tot gtgt ctocctocct 1141 CtgCcctcog CTCCTGGGTA ATGTACAGGT GATTGTGCTG GCCTGCTGCT TTGGCAGGGA LLGN VAV IVL ACCF GRE 1201 ATTCACCCCG CAGTTGCAGG CTTCCTGTCA GAAGATGGTG ACTGCTGTGG CTATTGCCCT FTP QLQA SCQ KMV TAVA IAL L26L GGCTCACAAG TAccATTGAc otcctggcct gtttgctggc otccotcggo ogtccctgtt AHK YH 1321 tccgtotott ctgtcctcog oocttgggga oooaotgttc occqtcoogo ocottqcttc 1381 tgcctttttt ttaqttctct ttotttooot ttotttttgt totttctgcc tqqtqqqgot 1441 ctttcotctc oacctgotga ttcotctcoc ttottttctt tcotttttgc tcogtctogt 1501 ggtal.gggao gatcccttog ggtct ocoaa togggoactc otgtgtctto tgooo ogotg 1s61 tcaagggaaa tggaaagotg oaggaggtct r22 Appendix 1-2N Pole-throated sloth beto-globin gene sequence o ctccctgg a aggoaggggo gggcccsggg cttgg cCtoo _aagpr1ggaga agggccogct gctoctcoco cttgcttctg occcogctgt gttcoctogc ooccocoaoo cagococcAT GG]GEAEEIG GCTGACGATG AGAAGGCTGC TGTATCCGCC CTGTGGCACA AGGTGCATGT VHL ADDE KAA VSA LtrHK VHV GGAAGAATTT GGTGGCGAGG CCCIGGGCAG gttggtoctg aggttataog gcagattoca EEF GGEA LGR gogggatgat ggoogctggg ctcctggoga cogagcogtc tcctggtttc tgocaggcoc tgoctccttc tgtcccctgt gctoctttcq cccttcogGC TGCTGGTCGT GIAccccTGG L LVV YP$I ACCAGCAGAT TCTTTGAAAG CTTTGGGGAC TTGTCCTCTG CTGATGCTGT GTTTTCCAAI TSRF FES IGD LSSA DAV FSN GCTAAAGTGA AGGCCCACGG CAAGAAGGTG CTGACCTCCT TCGGTGAGGG TCTGMGCAC AKVK AHG KKV LTSF GEG LKH CTGGATGATC TCAAGGGCAC CTATGCTCAC CTGAGCGAGC TGCACTGTGA CAAGCTGCAC LDDL KGT YAH LSEL HCD KLH 541 GTGGATCCTG AGAACTTCAA Ggtgogcctg cgggcacttc ogtgttctcc ttcctccttc VDPE NFK tttctatgot coogcttgtg tcotgggoaa ogggcocagt otccogggtc cogtttggao oagaaaaoat cttctggttt tntocctotg goctccttgg ogctotttot tttctttocc tgctttgttc qcqatcottg ttttctcqtt tcottcttct ttttcttttc cotootttct tctgccttto tttttottto oocttttcot tttgtgcctt tooqttqttt ttoooctttc 841 ttcttttqqt ccqctatttt tcttqtctct otqttttctt tctctqotgt tttctttcct ogcgoaggoa cggocagctg ctoctttgca taggtctooo gootcoctot gttooogctc 961 toogttgag c tgggtgggga gsgccotttc tgcotgtoco ctcoggc tgg tgtgggaggo gcogcgtcag tggcagaggt oocatctggc tqtcotcctg ctctngottt gtgggcoooc t23 Appendix 1-2N Pqle-throoted sloth beto-globin gene sequence 1081 cctogncoco gtttonotgo tgctggoata ttctgottcc-ogtltgggqc cctctgttoo ctotgttctt gcctcttttc tcttcccctc ogeIeTTGGG TAACGTACTG GIGATTGTGC LLG NVL VIV 1201 TGGCCCGCCA CTITGGMAG GAATTCACTC CGCAGTTGCA GGCTGCCI'AT CAGAAGGTGA LARH FGK EFTP ALQ AAY AKV CGACTGGTGT GTcTACTGcc crcccccAcA AGrAccAcrG Agcoccoctt tctgcttt TTGV STA LAHK YH t24 Appendix 1-3. Lengths of exons and introns for sequenced 'pJike' globin genes. Values do not include those of the initiation and termination =cqdons. "--,, denotes missing data. '- l- Taxon Exon I Intron I Exon 2 Int¡on 2 Exon 3 Gene length African elephant ô 89 bp 128 bp 223bp 729 bp 126bp l.295bp Asian elephant ð 89 bp 128 bp 223bp 729bp 126bp t,295 bp dugong ô 89 bp t29bp 223bp 755bp 126bp 1,322bp manatee ô 89 bp 129 bp 223bp 756bp 126bp 1.323 bp rock hyrax ô 739 bp 126bp rock hyrax ôH 89 bp 132bp 223bp 728bp 126bp 1,298 bp yellow-spotted hyrax ôH 89 bp 128 bp 223bp 727 bp 126bp 1,293 bp tree hyrax òH 89 bp 128 bp 223bp 726bp 126bp t,292bp elephant shrew p 89 bp 125 bp 223bp 601 bp sloth r¡rò 89 bp 129bp 223bp? 680 bp 126bp 1,247 bp sloth p 89 bp 128 bp 223 bp 6llbp 126bp 1,177 bp armadillo rþð 87 bp 126bp 226bp 693 bp 126bp i,258 bp p armadillo 89 bp l25bp 223bp 6l I bp 126bp t,t7 4bp 125 Appendix I-4. Lenglhs of additional sequence dat¿ obtained for the 5' external region of sequenced'pJike'globin genes, and positions of upstrearl features relative to the putative CAP site. The positions of putative cap sites fronì-ihe initiation codon ("ATG') are also shown. Lengths of collected 5' end sequence data do not include that of the initiation codon. "--" denotes missing data. Length of CCAAT Putative Taxon CACCC site ATA site obtained 5' site CAP site external data African elephant ò -26bp -104 bp, -52bp Asian elephant ò -75 bp -31 bp from -400 bp -89 bp .,ATGU -31 -105 bp, bp -52bp dugong ô -76bp mutated into from -778bp -90 bp UGTA" ''ATGU -31 bp -105 bp, -52bp manatee ð -76bp mutated -90 bp into from -208 bp UGTA" "ATG" rock hyrax ôH -26bp yellow-spotted hyrax òH -26bp tree hyrax òH I4 bp distal absent; proximal at -52bp elephant shrew p -89 bp, -75 bp -30 bp from -670 bp mutated into UATG" ''CACTC" sloth r¡rô -26bp -51 bp sloth B -31 bp from l18bp "ATG" armadillo rþô l4 bp p armadillo l4 bp 126 Appendix l-5. Lengths of additional sequence data obtained &ryhe 3' external region of sequenced 'BJike' globin genes, and positions or põþ-adenylailon signals (AATAAA) relative to the termination codon. Lengths of collected 3' end sequence data do not include that of the termination codon. "--" denotes missing data. Length of obtained Taxon AATAAA signal 3'extemal data African elephant ô +103 bp 417 bp Asian elephant ô +103 bp 418 bp dugong ò +104 bp 533 bp manatee ô +104 bp 156 bp rock hyrax ò +99 bp 702bp rock hyrax ðH +102 bp 279bp yellow-spotted +102 bp hyrax ôH 279bp tree hyrax òH +102 bp 279 bp elephant shrew ô +95 bp 180 bp sloth r¡ò +l52bp 3ll bp sloth p 17 bp armadillo {rô 18 bp armadillo p +106 bp 189 bp 127 Appendix 2-1. Primers used in the (A) amplification and (B) sequencing'reactiqns of 'e-like'globin genes. (A) Primers used in poll'rnerase chain reactions. Primer Name Sequence HBB-12(R) 5' GCC ACC ACC TTC TCA TAG C'cA GC 3' HBG-1(F) 5' CCA TCA TGG GCAACC CCA GG 3' IIBG4(F) 5' CTG TCA TCA CCA CGAACT 3' IIBG-s(F) 5' CCAAGA CCG GACACC ATG 3' HBG6(R) 5' TTAATT GTG CTG TCA TGT GAA 3' IIBG-7(R) 5' AAC A AG CCT TCT CCC TAT T 3' MNT-GA(F) 5' CCA TGG CGA GAA TGT GCT GAA C 3' DUG-GA(R) 5'GAT GGT CCTACTTAT GCT CAGATCAGT GCT 3' DUG-GB(R) 5' TGA GGC TAAACA CAA GAAAGC ACA GAC ATA 3' ELE-GBß,) 5I TGA GAC AJAU{ACA CAA GAGAGC ACA AAC ATA 3' DUG-GC(R) 5' GTG CAG CTC ACT CAG CTT ACG AJd{ CrcT 3' DUG-GA(F) 5' ACT GGG TAA GAC CACAGC CTT TGA G 3' DUG-cB(F) 5' CIGC CAGACT GGAACT CTC TGC TCA C 3' DUG-GC(F) 5' C'GG AAA CGT GATAGT GAT TGT CTT CTGC 3' GGW-A(F)-RH 5'CTA CAGAIGCCAAGC TGGATC GCTTTGTTA 3' GGW-A(FISTH 5'AGGAGT GGG TAG GCA GTAATG GTA TTTAGC 3' GGw-B(F) 5' CCC AAC AGC TCC TGG GCAATG T 3' GGW-C(FIESTySTH 5'TGG CTT CACATTTTG GCAAGGAGTT 3' GGw-C(F)-RH 5' GCT CGG CAC TTC CGT GAG GAG TTC 3' GGW-A(R)-RH 5'AAT GAC GCAACG CAA TCAAGTAAG TAGAGA 3' GGW-A(RISTH 5' CTC CTAACA CTGAAC TTGACC CAI TAI T 3' GGW-B(R)-RH/ESH 5' TGT CCA TGT TCT TAA CAG CTT CTC CA 3' GGW-B(R)-STH 5'TGAGGT CAT CCATGTGCTTAACAG CATCT 3' GGw-C(R) 5' GTC AGC ACC TTC TTG CCA TGA GCC TTG 3' (B) Primers used in sequencing reactions at the Unversity of Calgary DNA core laborafory. PrimerName M-13(F)40 5' GTT TTC CCA GTC ACGAC 3' M-13(R) 5'AAC AGC 'IA-I GAC CAI G 3' r28 Appendix 2-2. DNA sequences of the 'e-like' globin genes-amplifiel in this study. Exons are identified by CAPITAL letters, while introns and external gene sequences are represented by lower case letters. The 5' flanking transcriptional control motifs (CACCC, CCAAT, ATA) and 3' poly-adenylation signal (AATAAA) are underlined within the 5' and 3' flanking regions of each gene sequence, respectively. Within each coding block, fanslated amino acids are listed below their conesponding base triplet. Appendix Page 2-24 African elephant y-globin gene sequence 130 2-28 Asian elephant y-globin gene sequence t32 2-2C Dugong y-globin gene sequence 134 2-2D West Indian manatee y-globin gene sequence t36 2-28 Bushveld elephant shrew e-globin partial gene sequence 138 2-2F Pale-throated sloth e-globin gene sequence 140 r29 Appendix 2-24 Africon elephont gommo-globin gene sequence GTGCATTTTA CTGCCGAAGA GAAGGCTGCT ATCGCAAGCC-I4TGGGGCCA GGTGAATGTG VHFT AEE KAA IASL-tTGA VNV EETG GEA LGR L2L ataoaggaag ctgogcctgg coogaotgco gaccogttoc coogttttgt goctoggtct gottttccot ctgttotggt tccotcccot ogGeTTeIGG TTGTCTACCC TTGGACCCAG LLV VYP yllTa 24t AGGTTTTTTG ACACCTTTGG CAACCTATCC TCTGCCTCTG CCATCATGGG CAACCCCAGG RFFD TFG NLS SASA IMG NPR GTCAAGGCCC AIGGY¿AGM GGTGCTGACC TCCTTTGGAG ATGCTGTTAA GAACCTGGAC VKAH GKK VLT SFGD AVK NLD AACCTCAAGG GCACCTTTGC TAAGCTGAGC GAGCTGCACT GTGACAAGCT GCACGTGGAT NLKG TFA KLS ELHC DKL HVD 42r ccrGAGAAcr rcAGGgtgog tccoggatot gtttgtgctc tcttgtgttt tgtctcaggc PENF R oocttggoco qccqogcoct gocctgagca caagtaggoc cotctot ggg ctgtgoggto 541 tttggogotc ttgggttogL aotooogcaa ctccagogog ctgoottcoo octoggtgtt 601 ocogtoocoo ggctcoogoo gtgcttgtog oogtctocog lggtgacttg tctooqctct gttggctttt ootggogoct tgtgtacsga tgagaaoaat gtcttttggt oottggoott 72L ccgtggooog gagaactttg cttttctttc ctccctcqtc totcoccgcc tgcccctcqt otgttggogg tagaaoaogo otgtcotggg tgocgotaco tgcogocotg tottggttoo ttotoocooa gooocctggg tggtcotgtg totottocto oggctottcc totqcttggt ggccotttgt ocoooqotgg tootocctcc tttcoqqctt ttttccgaoa octcgaggog taagacoaao t gtgtct gag ggagggaaac ooctggooto tgtgacc oag gagggtot gg aotagaaogc tgtgoagttt tqtttttgct tttttcct c c caaeaoagca totgg agato 130 Appendi.x 2-24 Afri.con elephont gamma-globin gene sequence 1081 gtgctggtag googotcaoo ooaotgoaac cgtgogtgto glgcogtoeu o000gogaag cooottagct otgttootoo ctgggcaago ccotogcctt tgoggaogct oocotogoct togttoggt c tgagtggcag ctgggtggcc ogotoctcg g aggccagact ggogctctcg gctcocooco tgaotccota ggaatgcotc tgttgtctgt ttctacctcc ocogCTCCTG LL GGAAATGTGC TAGTGATTGT CI-TGGCAAAC CACTTTGGCA ARGAAIIEAE CCCCCAGGTG GNVL VIV LAN HFGK EFT PQV CAGGCTGCCT GGCAGAAGAT GGTGACGGGC GTGGCCAATG CCCTGGCCTA CAAGTATCAC AAAlT AKM VTG VANA LAY KYH TGAgctcctc gt 131 Appendix 2-28 Asion elephont gomno-globin gene sequence gtggtcocco cgooctccco ogcccagaca ccATGGIGeA lT[=-fAeIGee GAAGAGMGG VH FTA EEK CTGCTATCAC AAGCCTATGG GGCCAGGTGA ATGTGGAAGA GACCGGAGGC GAGGCCCTGG AAIT SLIT GAVN VEE TGG EAL r2r GCAGgtocot tctg ggo agc agcggagggg agggootaoo ggaagctgog cctggcaoga GR otgcogocco gttaccoagt tttgtgocto ggtctgottt tccqtctgtt atgottccot 24L CCCAtogGCT TCTGGTTGIC TACCCTTGGA CCCAGAGGTT TTTTGACACC TTTGGCAACC L LVV YPWT ARF FDT FGN TATCCTCTGC CTCTGCCATC ATGGGCAACC CCAGGGTCAA GGCCCATGGC AAGAAGGTGC LSSA SAI MGNP RVK AHG KKV TGACCTCCTT TGGAGATGCT GTTAAGAACC TGGACAACCT CAAGGGCACC TTTGCTAAGC LTSF GDA VKNL DNL KGT FAK TGAGCGAGCT GcAcTGTGAc AAGCTGCACG TGcATccTGA GAACTTCAGG gtgogtccog LSEL HCD KLHV DPE NFR 481 gatotgtttg tgctctcttg tgttttgtct caggcooctt ggacooccoo gcoctgotct 541 gogcacoagt oggoccotct otgggctgtg ogototttgg ogotcttggg ttogtaatos ogcooctcca gogogctgoa ttcoooctog gtgttocogt aocooggctc oogoogtgct tgtogoogtc tocogtggtg octtgtctoo octctgttgg cttttootgg ogocttgtgt 721 acagatgoga oaoatgtctt ttggtoottg goottccgt g gaaaggagoo ctttgctttt ctttcctccc tcotctqtco ccgcctgccc ctcototgtt ggaggtagoa aqogaotgtc atgggtgacA otocotgcag ocotgtottg gttoottota acaoagaooc ctgggtggtc otgtgtotot toctocggct ottcctotot ctggtggcco tttgtocoaa oøtogtooto tctcctttca qocttttttc tgooooctcg aggogtoago caoootgtgt ctgog ggogg gooocooctg goototgtgo ccooggaggg totggaotog ooagctgtgo ogttttottt 132 Appendix Z-ZB Asion elephont gormo-globin gene sequence ttgctttttt cctcccoaaa cogcgtotgg agotggtgct ggtpggaogo tcaoooooot 1141 gaooccgtgg gtgtagggco gtocooaoog agaagcaoot togctotgtt ootoa ctggg coogaccoto gcctttgagg oagctoocot ogocttogtt oggtctgagt ggcagttggg tggccagoto ctcggoggcc ogoctggsgc tctcggctco coocotgoqt ccotoggoot 1321 gcotctgttg tctgtttcto cctccocogC TCCI.GGGAAA TGTGCTAGTG ATTGTCTTGG L LGN VLV IVL 1381 CAAACCACTT TGGCAAAGM TTCACCCCCC AGGTGCAGGC TGCCTGGCAG AAGATGGTGA ANHF GKE FTPA VAA AITA KMV 144l cGGGcGTGGc cMTGcccrG GccrAcMGT ATCACTGAgc tcctcgcoot aggoagaqga TGVA NAL AYKY H 1s01 cttatttttc ocotgocooc qcoottqqtt oooogtottc tgttoogaoa toogatgtoo 1561 tggoctcctg ttgtttcttt ttcotgtggc cttoagtooo cgaotttcco ggggttttot gttggggtgt gtgtgtgctc cctottcoct tttggcaooo ggtaagoatt ttgotaataa 1681 aqgoacaagg caatggq{/gc ocotoc actg ggagttctga aqggaasago ooootctctg 1741 ggatogtttt ggtgggagag aaagggcttt ottggocogg goctcctcog ooctogot 133 þpendix 2-2C Dugong gormo-globin gene sequence otcocogotg tgootgtcco ttttqqctct tccotgcctc¡-oqcoccgccc ccccgccccg gcctgotogc ctogcgttgo ccoqtogcct cotogcoaaa ggaagsocoa ogggccagtg gccogggoto oagoglaaoo ogccocotgt tccogttgco gcocotocot ccttctgoco 181 cotctgtgot cqccocooac tccaogoccg gococcATGG TGTATTTTAC TGCTGAAGAG V YFT AEE 24t AAGGCI-GCTA TCACAAGCCT GTGGGGCAAG GTGAATGI-GG AAGAGGCTGG AGGCAAGGCC KAAI TSL ITGK VNVE EAG GKA CTAGGCAGgT ogattct ggo gggtoggggo agggagggao taaaggoagc tgagcttggc LGR 361 aggoatgcag gtctgttocc aogtcttgtg ocaogctctg otttoccotc tgctotgott ctotcctgto gGeIeelGAI TGTCTACCCT TGGACCCAGA GGTTTfl-TGA CAAATTTGGC LLI VYP IVTAR FFD KFG 481 AACCTATCCT CTGCCTCTGC TATCATGGGC AACCCCAAGA TCMGGCCCA TGGCAAGAAG NLSS ASA IMG NPKI KAH GKK 541 GTGCTGAACT CCTTTGGCGA TGCCGTTGAG AACCCGGACA ACCTCAAGGG TACCTTTGCT VLNS FGD AVE NPDN LKG TFA 601 AAGCTGAGTG AccTccAcTG TcAcAAGcTG CTTGTGGATC cTGAcGAcTT CAGGotgogt KLSE LHC DKL LVDP EDF R 661 ctoggototg tctgtgcttt cttgtgttto gcctcogaco gcttogocoo ctgogcoctg atctgagcot oagtaggocc otctqtggcc tgtgogotot ttggogotct tgggttogto ocaaCIocoot tccogogggc tgoottcqoo ctoagtgtto tggtoocaag gc|ucgagaog 841 tgcttctogg ogtctggogt gocttgtcto ooctctgttg gctttto gtg gøgactcgtg togogatgag gocgtctttt ggtoottggo otcctgtoga aagqagaott ttgcctttct ttcctcocgt otctqtcoct gcctgccccc cototgtogg aggtogaaoa agootgtcco gtgggacaat ocacgcogoc otgtottggt tqqttqtqac aaagaaactg ggtggtcotg t34 Appendix Z-ZC Dugong gomno-globin gene sequence 1081 cotqtottac tooggctott cctgtoctgg gcoggcottt-grtg_coooq[c tactaotgtc tcctttcoqq cttotttcct qqoqctctog gagtaaggco oogtgtctct gagggaaggg qooccooct g gaatotgtgg cogaggaggg tatggooto g oaagctgtgo gggttttott tttgtccccc occcccoccc cccoc coaoo aooaaaagta tgtggagotg gtgct ggtag 1321 goagatagaa agagtgaoot cotggot gtg ggt co gto c a gaaagggaag caaattogct otgtttooto octgggtaag accacogcct ttgagaaagc toacotogoc ttaggcctto 1441 gtggcogttg ggtggccogo ttcttggagg ccogoctggo octctctgct coctototgo 1501 otccotoggo otgtgtctgt totctctttc tocccctoco geIeeIGGGA AACGTGATAG LLG NVI 1561 TGATTGTCTT GGCAÄACCAC TTTGGCAAAG AATI'TACCCC CCAGGTGCAG GCTGCCTGGC VIVL ANH FGKE FTP QVA AAI'J AGAAGATGGT GACTGGTGTG GCCAGTGCCC TGGCCCGCAA GTATCACTGA gtotctcato AKMV TGV ASAL ARK YH otagggagao gccttgttct t cocs tgoca gcocaattoo tggqattqtt ctgttAAtøl I74l øtaogottto ctogactcct ottgtttctt tttcatgtoo ctttoogtaa øtgaattccc oggg0tttta tgttgggggg gtgtgtgtgt gctccccott coctgtttgc aaoogatgao gottttgaco agaaaatsag agsacaggac gataaaggoq cotccct ggg ogttctgaao l92r gggoaagaao oatctct ggg agagttttgg tgggaoaggø agggctttot gggocoaggo 1981 ctcctcogaq ctoaotattt ttgogcogtg ggaaaogact tcogggoaog tagcagoggc 0 135 þpendix 2-20 West Indicn manqtee gormo-globin gene sequence tcotogcoo a oggoagaaca aagggccogt ggccagggat -ueogao.Lq ry aagccocatg 61 ttccagttgc ogcocotoco tccotctgoc ocotctgtgo tcoccqcooq ctccoagacc Lzt ggococcATc GTGGATTTTA CIGCTGAAGA GAAGGCTGCT ATCACMGGC TGTGGGGCAA VDFT AEE KAA ITRL ITGK 181 GATGAATGTG GAAGAGGCTG GAGGCAAGGC CCTGGGCAGg togottctgg gggotogggg MNV EEAG GKA LGR 241 sagggaggga otoaaggaag ctgagctagg cøggaotgco ggtctgttac coogttttgt 301 gocoogctct gottttccot ctgctotgot tctatcctgt ogGeleelGA TTGTcrAccc LLI VYP 361 TTGGACCCAG AGGTTTTTTG ACAACTTTGG CAACCTATCC TCTGCCTCTG CCATCATGGG lrTA RFFD NFG NLS SASA IMG 42L CAACCCCAAG GTCAAGGCCC ATGGCAAGAA GGTGCTGAAC TCCTTTGGAG ATGCCGTTAA NPK VKAH GKK VLN SFGD AVK 481 GAACCCGGAC AACCTCAAGG GTACCTI-TGC TAAGCTGAGT GAGCTGCACT GTGACMGCT NPD NLKG TFA KLS ELHC DKL 541 GCTTGTGGAT TCTGAGAACT rCAGGotgog tctogggtot gtttgtgctt tcttgtgttt LVD SENF R 601 ogcctcogoc ogcttogoco octgogcoct gotctgogca taagtaggoc cotctatggc 661 ctgtgogoto tttggogotc ttgggttogt gaaaaoocao ttccogoagg ctgoattcoa 72r octoogtgtt otggtoocoo ggctcaogoa gtgcttctog gagtclogog tgacttgtct 781 oaoctctgtt ggcttttaat ggogactcat gtagagatgo goocootgtc ttttggtoot 8{1 tggaatcctg tagaaggoog aattttgcct ttctttcctc ocgcotctot coctgcctgo 901 cccccototg toggaggtog oaaaogootg tccogtgggo cAatococgc aggcatgtot 961 tggttootto toocooogoa actgggtggt cotgcototg tcoctooggc tottcctgto 1021 ctgggcaggc otttgtgcqa qotctoctaq tgcctccttt cqoatttotq tcctooooct 136 Appendix 2-2D lïest Indion monotee gormo-globin gene sequence ctgggagtag gacaaagtgrt ctctg ogggo gggaaacaac -tggaototgt ggcagaggag 1141 ggtatggaot ogaaogctot goggttttat ttttgttccc cccoccqeea aaoaeoogta tgcggagatg gtgctggtog goagatagaa agagtgaaat catggot gtg ggtcogtoco goaagggaag caoottogct otgtttootq octgggtoog occacagctt ttgogaocgc toacotagac ttoggcctto gtggcogttg ggtggccago ttcttggagg ccagactgga 1381 octctctgct coctotatgo otccata$ga otgtgtctgt totctctttc tocccctoca 7447 gEIEE]GGGA AACGTGCTAG TGATTGTCTT GGCAAACCAC TTTGGCAAAG AATTTACCCC LLG NVLV IVL ANH FGKE FTP 1501 CCAGGTGCAG GCTGCCTGGC AGAAGATGGC GACTGGTGTG GCCAGTGCCG TGGCCCGCAA AVA AAlTA KMA TGV ASAV ARK 1561 GTATCACTGA gtocctco YH 137 Appendix 2-2E Bushveld elephont shrew epsilon-globin portiol gene sequence GTCAAGGCTC ATGGAAAGAA GGTGCTGACC TCCTTTGGAG -AIGEAGTI$ GAACATAGAC V K A H G K K V L T S F G D-A V K N I D AATCTCAAGG GAGCTTTTGC TAAACTAAGT GAACTGCACT GTGACAAGTT ACATGTGGAT NLKG AFA KLS ELHC DKL HVD CCTGAAAACT TCCGGgtoog tccagoogot gttcocttgg tttggttttt tttttoottt PENF R ttottotggo atootoqtto otootootao toocaocoat ootogotooc cctttgocto 247 gaaagccaaa occoacoooo ctctqasata otootttggt gttoooo gag toagcttctg gttggcatao ttotogccto ttaggctcog aggtgaogtc aotoagggot gttogotocc tcogtttooa oaagaaccco oqcotttooc ogaotcogat gtottcotcc tooogootoc tgaggaøgao caggtaaact ttottctctg tgootctooo ogotttcogt otgtcoataa tatggaggao ootgotooot ottttgtttc ttgaogaaoo agotootaco otttotoqoo tqtottaoot qotctotcoo totcqtatcc ooogogoctt ttttctttct gcottgottg otgoctttoo coooqttotg cctooggagc tcttgtgtcc ctgoaotcct qcoottotoq 661 gcotoaggtq otttqtoaat tøgøaagggo ggcaaocooo otogtoc agt ggatogaaaa ggatgtagct aogaggaggc otootgttgo acogøgogtt tooqataotg ttgoototog otgctatctt octtgagtot ootggagaag tttgogotog gaootgotco oggototggg 841 tottcogttt toccccctcc cccqoaaoaa ottqttttto gooctctcto ctcqototgt ctctttgtco ttttgtcttt tccccaotog CI-CCTGGGCA ATGIGATCGT TATCATCATG LLGN VIV IIM GCTTCGCATT TTGGCAAGGA GTTCACCCCT GAAGTGCAGG CTGCTTGGCA GAAACTGGTG ASHF GKE FTP EVAA AITA KLV 1021 GCTGGTGTTG ccAcrccrcr cGcrcAcAAG TAccATTGAg rcctcttgct ctcot gcaog AGVA TAL AHK YH 138 Bushveld elephont sh rew Tírtålï-rî;Xl n partiol sene sequence r0gr tgccccgtgt tcccaccocc tttottcttc acatgacagè_ãcgottaeo 139 þpendix 2-2F Pole-throated sloth epsilon-globin gene sequence caaoggcggt gtotoccctt coctgctgoc cctctgctga-cçc_qgctcçg cccct goggg ococogctco gccctgocco otgoctgtgo agtaccoggg goocoagggg ccagaggtoc acagtgaøgo qtsoooogcc ocotcttcto gaagcogcsc ogctctgctt ctgocacgtc tgtgotcocc ogccqgcott togotcctoc otcATGGfGe ATTTTACTGC TGAGGAGAAG VH FTA EEK GCTACTGTGG CGAGCCIGTG GGGCAAGGTG AACGTGGAGG AGGCTGGCGG CGAGGTGCTG ATVA SLIT GKV NVEE AGG EVL GGCAGgtogg cotttgggtc tcoocgcagg ggaaogoogg tgootgcoog cctggoaoat GR tgacaagaao togccooogg otttctgtgt ctctgotttt ccottttcto tggtctcotc tCOtOgACTT CTGGTTGTCT ACCCCTGGAC CCAGAGATTT TTTGACAACT TTGGTAACCT L LVVY PITT ARF FDNF GNL GTCTTCTTCC TCTGCAATCA TGGGCAACCC CAAGGTCAAG GCCCACGGCA AGAAGGTGCI SS5 SAIM GNP KVK AHGK KVL 541 GACCTCCTTT GGAGAICEIG TTAAGCACAT GGATGACCTC AAGGGCACCT TTGCCCATCT TSF GDAV KHM DDL KGTF AHL 601 GAGCGAGCTG cAcTGTGAcA AGcTGcAccT GGATccccAG MCTTCCGGg tgogtccogg SEL HCDK LHV DPE NFR 661 aaaggagcat gtgcccottt ttgcttttta cttottctgo ootootgggt caagttcogt gttoggaggo cogooctcto gttggcotoa ccootoccoo ctogtctcag aagtggqocc 781 ogtatgggtc tqctaccogc ctgottttcc ctoooocctt ttogggoctc togccootot tqgttgtgtt totc ctg gag ogtattggtg agtagcagøa ctgtgco agg agoaatcqoc tttggctoct totagotgao ggoctatoto aoagotggoo toogccttot tttgttgagt octooggott gotttggogo aoaotttgog ocogottttt tcooogoooo qtocooaoot ttcctaqtqt otattaaott ccctgtcogt attgtgttca agaaaaggct tgtccctoto t40 Appendíx 2-2F Pqle-throoted sloth epsilon-globin gene sequence ttggctggog ottttooctc cctct gagoa occttgcagc-cct_gooctqc tttgottqcq 1141 ggagtgogot ooctggt aoo tgogtgaagg oogtoootoo oacogct gao ttgggcaaog gaggtgggao tatocotagg cagootoctg goctgotggc tcatqaoogt ootottgoac t267 ccoggotcto gctootttgo gcocottggo toggcocooc tcttggogco gtttgaggtg L32t oggogtgggt aggcogtoot ggtotttogc cottttotct gooaottctt tttggoaact 1381 tctgttcaco tgcctgtgtg ttgtctgcct ttcccctaoc qgeIeelGGG CAATGTGATG LLG NVM 1441 GTGGTTGTTT TGGCTTCTCA TTTTGGCAAG GAATTCACTC CCGAAGTGCA GGCTGCTTGG VVVL ASH FGK EFTP EVA AAI'J CAGAAGTTGG TGGGTGGTGT TGccAATccr CTGGcTcAcA AGTAccAcTG Agccttctgt AKLV GGV ANA LAHK YH 1561 ccocctcgtc agggcccctg tgtccccogc cotccttctg cacgtgttto otgggttttg gctttgogog cocogcttct ccttqotooo gtgcottcta ttcogtoqtt aotqttttot ttccttcatc ttttgttctt gttttaaogg aggaagggtt cottggctgo ggggtgggoo 1741 gagacotoag cotooocoog tgctcttttg aggagggttt gogttct cat coaaggoagg 1801 tagoogttaa cgggoacgtt ctoggoggcc aaggggcato ctoagotgca goøggagttt 1861 tctooggcat ggoaaoggct tctgt gaogt ggacoggtac ttgtggggcg gtgtg gcaao 7921 oocttococ a aaggaaggaa ggggatggto ogcogtgtt t gagooagoog aoagcagott tgtggtoaot ocotgcogoc otocotqcot aaøotaatoa aaaataagct ttgto gggat ccoogttcto gcctogogct octatttcqo tttgctctoa cocqtotttt taogcogcco gcttgtcago otgtttgtgo gg t4l Appendix 2-3. Lenglhs of exons and introns for'e-like' globin genes. Values do not include those of the initiation and termination codons. "--" denoJes missing data. Taxon Exon 1 Intron I Exon 2 Intron 2 Exon 3 Gene length African ele-phant y 89 bp l23bp 223bp 879 bp 126bp 1,440 bp Asian elephant y 89 bp l23bp 223bp 879 bp 126bp 1,440bp dugong 1 89 bp 123 bp 223bp 887 bp t26bp 1,448 bp manatee y 89 bp l23bp 223bp 876 bp 126bp r.437 bp elephant sh¡ew e 795bp 126bp sloth e 89 bp t2l bp 223bp 773 ï:p 126bp 1,332bp armadillo e 89 bp l25bp 223bp 819 bp 126bp 1.382 bp armadillo r¡.'1 l16bp 122bp 200 bp 854 bp 126bp 1,4r8 bp armadillo r¡r¡ absent absent absent -817 bp t25bp nJa 142 Appendix 2-4. Lenglhs of sequence data obtained for the 5' external region of 'e-like, globin genes, and positions of upstream features relative to the pu-tative Cap site. The positions of putative Cap sites from the initiation codon ("ATG') are ¿so shown. Lengths of collected 5' end sequence data do not includethat of the initiation codon. "--" denotes missing data. CCAAT Putative Length of Taxon CACCC site ATA site obtained site Cap site 5' external data Asian elephant y -32bp dugong 1 absent -84 bp -30 bp -52bp -216bp manatee T -30 bp -52bp 127 bp sloth e l12bp -83 bp -30 bp -53 bp -213 bp armadillo s -lll bp -82 bp -32bp -54 bp n/a -103 bp, armadillo rpy mutated into -69 bp -28 bp -9 bp nla CTCCC armadillo rþr1 absent absent absent absent nla 143 Appendix 2-5. Lenglhs of sequence data obtained for the 3' external region of ,e-like, globin genes, and positions of polyadenylation signals (AA?AJ\â) reTative to the termination codon. Lengths of collected 3' end sequence data do not include that of the termination codon. "--" denotes missing data. Length of obtained Taxon AATAAA signal 3'extemal data African elephant 1 +9 bp Asian elephant y +197 bp, +213 bp +320 bp dugong y +48 bp, +65 bp +371 bp manatee r¡ +8 bp elephant shrew e +70 bp sloth e +93 bp +571 bp armadillo e +82 bp nla armadillo {.r1 +29bp nla armadillo r.[n1 +118 bp nla t44