
RATITE MOLECULAR EVOLUTION, PHYLOGENY AND BIOGEOGRAPHY INFERRED FROM COMPLETE MITOCHONDRIAL GENOMES

by Oliver Haddrath

A thesis submitted in conformity with the requirements for the Degree of Masters of Science, Graduate Department of Zoology, University of Toronto

© Copyright by Oliver Haddrath 2000

National Library of Canada, Acquisitions and Bibliographic Services, 395 Wellington Street, Ottawa ON K1A 0N4, Canada

The author has granted a non-exclusive licence allowing the National Library of Canada to reproduce, loan, distribute or sell copies of this thesis in microform, paper or electronic formats.

The author retains ownership of the copyright in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.

Abstract

Ratite Molecular Evolution, Phylogeny and Biogeography Inferred from Complete Mitochondrial Genomes. Masters of Science. 2000. Oliver Haddrath. Department of Zoology, University of Toronto.

The relationships within the ratite birds and their biogeographic history have been debated for over a century. While the monophyly of the ratites has been established, consensus on the branching pattern within the ratite tree has not yet been reached. To examine this problem, I sequenced the complete mitochondrial genomes from a representative of each extant ratite lineage, one extinct lineage (moas) and two species of tinamou. Based on concatenated sequences, the moas were found to be the most basal ratite and the kiwis were found to form a clade with the emu and cassowary. It was less clear whether among the extant ratites the ostrich or the rheas were more basal. Application of a molecular clock found most of the speciation events were older than or very close to the dates of the breakup of Gondwana, and so a vicariant origin for these birds could not be rejected.

Acknowledgements

Any large project undertaken has the input, directly or indirectly, of many people. This thesis certainly falls into that category. I have many people to thank for comments, ideas and insights. First and foremost I would like to thank Madeleine, my wife, for all her loving patience, indulgence of my many late nights in the lab and assistance over the last few years. This thesis and degree are shared with her. I would also like to thank my family, particularly my parents, who instilled a curiosity about the world and an appreciation of science at an early age.

Over the course of my years at the Royal Ontario Museum, I have worked with many talented people who have all had an input on this work. They include Cathy Ayley, Chris Pankewicz, Sue Chopra, Brad Millen, Mark Peck, Jon Barlow, Ross James, Glenn Murphy, and James Dick. Also included in this list are the graduate students with whom I have worked: Mike Demison, Dawn Marshall, Oksana Borowik, Jaime Alvarado, Kevin Seymour, Vicki Friesen, Tim Birt, Paul Wenink, Don Stewart, Anna Toline, Dilara Ally, Tracey Monehan, Carol Ritland, Colette Baril, Marlene Walker, Alessandro Grapputo, Jan Hughes, Tara Paton, Annette Greenslade, Maryann Burbidge, Andrew Given, Nicola Wade, Carol Cooke and Gustavo Ybazeta. I would also like to thank Alejandro Lynch and Pavneet Arora, whose help with statistical tests and computer programming was indispensable.

During this project many researchers with whom I corresponded were generous with their assistance and computer programs. I would like to thank Masami Hasegawa, Lars Jermiin, Wen-Hsiung Li, Sudhir Kumar, Nicolas Galtier, Ziheng Yang, Spencer Muse, Peter Lockhart, David Mindell, Peter De Rijk, and Eka Hagelberg.

For David Irwin, my adjunct supervisor, I am grateful for the insights you provided and the depth of your knowledge. I reserve my final acknowledgement for Allan Baker, my supervisor. My thanks for giving me such an interesting project and for all your support and guidance. The project proved to be challenging in ways that we did not anticipate, and you were generous in your advice, your time and in giving me the latitude to overcome the problems. You set a high standard for yourself and your students, and you constantly made me defend my ideas and theories; hopefully I am a better student for it.

Table of Contents

Abstract
Acknowledgments
Table of Contents
List of Tables
List of Figures
List of Appendices

Chapter 1: Introduction and Overview of Ratite Systematics
  Introduction
  The Objectives of this Thesis
  Organization of the Thesis
  References

Chapter 2: Characterization of the Complete Mitochondrial Genomes of Nine Paleognathous Birds
  Introduction
  Materials & Methods
    Specimen Information
    DNA Extraction and Amplification
    Sequence Analysis
  Results
    Structural Features of Paleognathous Mitochondrial Genomes
    Codon Usage and Base Composition
    Amino Acid Composition
    RNA Editing
    tRNAs
    Control Region
    Levels of Variation Between Genes
  Discussion
    Genome Organization
    Codon Usage and Sequence Composition
    RNA Editing
  References

Chapter 3: Rates and Patterns of Mitochondrial Genome Evolution in Paleognathous Birds
  Introduction
  Materials & Methods
    DNA Isolation, Amplification and Sequencing
    Data Analysis
    Estimation and Pattern of Nucleotide Substitution
    Substitutional Rate Variation Among Sites and the Rate Constancy Test
    Model of Substitution
  Results
    Patterns of Nucleotide Substitution for the Protein-Coding, rRNA and tRNA Genes
    Substitutional Biases Among the Four Nucleotides
    Stationarity
    Rate Variation Among Sites and Among Genes
    Model of Nucleotide Substitution
  Discussion
    Pattern of Substitutions
    Rates of Evolution
  References

Chapter 4: Ratite Phylogenetics and Biogeography
  Introduction
  Materials & Methods
    Sequence Data
    Data Analysis
    Models
  Results
    Analyses of the Concatenated Data Sets
    Analyses of the rRNA and Protein-Coding Data Sets
    Spectral Analysis
    Relative Rates Tests
    Divergence Time Estimation
  Discussion
    Compositional Biases
    Ratite Phylogeny
    Historical Biogeography of the Ratites
  References

Chapter 5: General Summary
  Characterization of the Paleognath Mitochondrial Genomes
  Rates and Patterns of Nucleotide and Amino Acid Evolution
  Inferring Ratite Phylogenetic Relationships
  Ratite Historical Biogeography

Appendices

List of Tables

Table 2.1. Length and base composition of the L strand mitochondrial DNA genomes of paleognathous birds relative to

Table 2.2. Listing of gene , gene size and strand encoded for paleognathous birds.

Table 2.3. Base composition of the protein-coding genes (excluding ND6).

Table 2.4. Average codon usage in mitochondrial protein-coding genes (excluding ND6).

Table 2.5. Levels of variation between genes using mean pairwise comparisons among the paleognathous birds.

Table 3.1. Results of T-test comparing observed substitution saturation with that of half full substitution saturation.

Table 3.2. Observed substitution rates for the protein-coding (1st and 2nd codon positions) and RNA genes among paleognathous birds.

Table 3.3. Directional mutation pressure for the 12 H-strand encoded genes.

Table 3.4. Results of chi-square tests comparing nucleotide and amino acid compositions for each sequence to the average sequence composition.

Table 3.5. Results of chi-square tests comparing nucleotide and amino acid compositions across variable sites only for each sequence to the average sequence composition.

Table 3.6. Substitution rate variation among sites for both protein-coding and RNA genes.

Table 3.7. The hypotheses tested and the associated P values as calculated using the program Modeltest.

Table 3.8. The models that best fit the pattern of nucleotide substitution for the data sets as determined by the program Modeltest.

Table 4.1. Log-likelihood scores for different ratite phylogenetic hypotheses calculated using the program NHML and the total concatenated data set. Five different proposed ratite phylogenetic hypotheses were examined along with the NJ LogDet tree produced.

Table 4.2. Three taxon relative rates tests using LogDet corrected distances on total concatenated sequences excluding 3rd codon positions.

List of Figures

Figure 1.1. Representatives of the extant ratite lineages. Also shown is one of the recently extinct species of moa and the closest living relative to the ratites, the tinamous.

Figure 1.2. The present global distribution of the living paleognathous birds. The historical range of the extinct moas of New Zealand is also shown.

Figure 1.3. Nine proposed hypotheses for the relationships among the ratite birds.

Figure 2.1. Maps of the ratite and elegans mitochondrial gene organizations.

Figure 2.2. Amino acid frequencies averaged across the protein-coding genes (excluding ND6).

Figure 2.3. Amino acid frequencies for ATPase 8 and ND6.

Figure 2.4. Amino acid frequencies for each of the protein-coding genes averaged across the paleognathous birds.

Figure 2.5. Stylized representation of the cloverleaf secondary structure of mitochondrial tRNAs based upon Kumazawa and Nishida (1993).

Figure 2.6. Aligned tRNA sequences for paleognathous birds and chicken.

Figure 2.7. Secondary structure model for the paleognath 12S rRNA molecule (consensus sequence).

Figure 3.1. The relation between the number of nucleotide substitutions and p-distance calculated from pairwise comparisons among the paleognaths, chicken, duck, alligator and turtle for the three codon positions.

Figure 3.2. The relation between the number of nucleotide substitutions and p-distance calculated from pairwise comparisons among the paleognaths, chicken, duck, alligator and turtle for 12S rRNA and 16S rRNA genes.

Figure 3.3. The relation between the number of nucleotide substitutions and p-distance calculated from pairwise comparisons among the paleognaths, chicken, duck, alligator and turtle for the 22 tRNAs.

Figure 3.4. The distribution of birds, mammals, alligator and turtle as a function of G+C content at the four-fold degenerate sites.

Figure 3.5. The distribution of birds, mammals, alligator and turtle as a function of directional mutation pressure and the G+C content at nonsynonymous sites.

Figure 4.1. Phylogenetic hypotheses based on the concatenated sequences of the rRNA, tRNA and 12 of 13 protein-coding genes (all codon positions). A) The single most parsimonious tree, as well as the neighbour-joining tree using the model GTR+I+Γ. B) The tree topology with the maximum likelihood using the same model. The central column shows the percentage of G+C content at synonymous sites for each taxon. Values shown within the trees indicate the level of bootstrap support (only values > 50 are shown).

Figure 4.2. Phylogenetic hypotheses based on the concatenated sequences of the rRNA, tRNA and 12 of 13 protein-coding genes (3rd codon positions excluded). A) The single most parsimonious tree. The topology is essentially the same as in Figure 4.1 with the exception that the ostrich and the are now sister taxa. B) The neighbour-joining tree using the model GTR+I+Γ. C) The tree topology with the maximum likelihood using the same model. Values shown within the trees indicate the level of bootstrap support (only values > 50 are shown).

Figure 4.3. Nucleotide compositional tree based on the total concatenated sequence.

Figure 4.4. Neighbour-joining using a LogDet transform with an estimate of the proportion of invariable sites. A) The tree for the total concatenated sequences. B) The tree for the total concatenated sequence with 3rd codon positions excluded. Values shown within the trees indicate the level of bootstrap support (only values > 50 are shown).

Figure 4.5. Neighbour-joining using a LogDet transform with an estimate of the proportion of invariable sites. A) The tree for the concatenated rRNA sequences. B) The tree for the concatenated protein-coding genes (all positions). C) The tree for the concatenated protein-coding genes excluding 3rd codon positions. Values shown within the trees indicate the level of bootstrap support (only values > 50 are shown).

Figure 4.6. Lento plot of the support and conflict for the splits in the LogDet transformed sequence data set. Thirty of the most supported splits are shown out of a total of 16,384 splits. The splits that appear in the LogDet NJ tree are shown. The conflict and support for splits in alternative phylogenetic ratite hypotheses are also shown along with their ranking. All conflict values have been normalized.

Figure 4.7. Molecular clock

Figure 4.8.

List of Appendices

Appendix 2.2. Bar graphs of the amino acid frequencies for each of the protein-coding genes for the birds sequenced here.

Appendix 2.3. Listing of birds sequenced for ND3; ± indicates presence or absence of single base insertion at position 174 bp in the gene.

Appendix 3.2. Symmetrical directional mutation pressure estimates for the mitochondrial protein-coding genes for various taxa.

Appendix 4.1. Models of substitution estimated for the individual gene data sets (all codon positions) using the program Modeltest v3. The model and parameters shown were those satisfying the Akaike Information Criterion.

Appendix 4.2. Neighbour-joining (LogDet) for the individual protein-coding genes.

Chapter 1

Introduction and Overview of Ratite Systematics

The ratites are an assemblage of flightless birds with a southern hemisphere distribution. Living taxa comprise 11 species including the ostrich (Struthio camelus) of Africa and formerly Arabia; the greater rhea (Rhea americana) and the lesser rhea (Pterocnemia pennata) of South America; the emu (Dromaius novaehollandiae) of Australia; three species of cassowary (Casuarius) of Northern Australia and New Guinea; and four species of kiwi (Apteryx) found in New Zealand. Additionally, a large number of extinct ratite species have been identified from fossil and sub-fossil remains, including approximately 11 species of moas from New Zealand and an undetermined number of species of elephant birds from Madagascar (Fig. 1.1). The phylogeny and evolution of these birds have been controversial for over a century (reviewed in Sibley and Ahlquist 1990). The debate centres on three questions: (1) did the ratites descend from a flying ancestor, (2) are the ratites a monophyletic group, and if so what are their interrelationships, and (3) if they are monophyletic, do they represent one of the earliest avian lineages?

The first question, while controversial in the early part of the 20th century, has largely been resolved. That ratites descended from a flying ancestor is now generally accepted, based largely on their possession of many adaptations associated with flight. Ratites possess hollow bones as well as the fused tail vertebrae which form the pygostyle, both of which are adaptations to reduce weight. Their cerebellum is similar to that found in flying birds. Ratite wings share the same skeletal plan as flying birds and contain vestigial elements such as flight quills in and the alula in rheas (Feduccia 1999).

Figure 1.1. Representative species of each of the extant ratite lineages. Also shown is one of the recently extinct moa species of New Zealand and the closest living relative to ratites, the tinamous. Panels: Emu (Dromaius novaehollandiae); Southern Cassowary (Casuarius casuarius); Ostrich (Struthio camelus); Sir Richard Owen and a moa, c. 1850; Great Spotted Kiwi (Apteryx haastii); Elegant Crested Tinamou (Eudromia elegans); Greater Rhea (Rhea americana).

The second question, whether the ratites represent a monophyletic group, has been hotly contested for the greater part of the last 150 years. Similarities between the ratite birds were recognized early on by researchers such as Merrem (1813) and Lesson (1831), who grouped the ratites together based upon the shared trait of a keel-less sternum. Huxley (1867) made the first comprehensive attempt to unravel the evolutionary relationships of the ratites. His description of the dromaeognathous palate (paleognathous palate), a feature unique to ratites and tinamous, was central in the debate on ratite monophyly in the next century. Huxley believed that the ratites were the sole survivors of an ancient radiation. He proposed that the class Aves be divided into three orders: the Saururae, whose representative was Archaeopteryx; the Ratitae, comprising the large flightless birds including kiwis; and the Carinatae, which included all other birds. Huxley had difficulty classifying tinamous, which have both a paleognathous palate and a keeled sternum. He finally placed them in the Carinatae but began his account of that order by noting that the tinamous shared characteristics with the ratites. For many early researchers, living in a time when the continents were considered stationary, the largely disjunct distribution of these flightless birds (Fig. 1.2) posed a conceptual barrier to accepting the monophyly of the group. The presence of a paleognathous palate and a keel-less sternum was considered by many insufficient to conclusively demonstrate ratite monophyly. These traits were often dismissed as products of convergence (Fürbringer 1888, 1902).

Throughout the first half of the twentieth century, a large number of studies using a vast suite of traits were carried out to try to resolve ratite relationships (see Sibley and Ahlquist 1990). Unequivocal evidence of ratite monophyly, however, would wait until the late 1960s with the first use of molecular markers and cladistic analyses of ratite skeletal features. Studies comparing egg-white proteins (Osuga and Feeney 1968; Sibley and Frelin 1972), the electrophoretic properties of eye-lens proteins (Gysels 1970), as well as comparisons of avian transferrins (Prager et al. 1976) were the first to clearly demonstrate that the ratite birds were more closely related to each other than to any other bird. This was later supported by DNA-DNA hybridization studies (Sibley and Ahlquist 1981, 1990) and protein sequencing (Stapel et al. 1984; Casper et al. 1994). Remarkable uniformity was found among the ratite chromosome complements, which were also unique among birds for the apparent absence of the sex chromosomes Z and W (de Boer 1980).

Cracraft (1974) performed a cladistic analysis of the hind limb and pelvic skeletal characters of the ratites, and affirmed the findings of molecular studies. He demonstrated that the ratites were monophyletic, and attempted to resolve the relationships among their constituent taxa. Cracraft also addressed the historical biogeography of the ratites by incorporating the theory of continental drift to account for their disjunct distribution. He proposed that flightlessness arose once in the common ancestor of ratites in Gondwanaland. As the supercontinent broke up, the descendants became isolated and rafted with their respective continental fragments to their current geographic locations. Thus ratite biogeography was a product of vicariance. The tree illustrating Cracraft's hypothesis of ratite interrelationships is shown in Figure 1.3. The order of the divergences of the major lineages coincides roughly with the pattern of fragmentation of Gondwana.

Comparisons of avian transferrins, including those of the ratites, were largely consistent with Cracraft's tree topology and the vicariant origins hypothesis (Prager et al. 1976; see Fig. 1.3). De Boer's (1980) examination of ratite chromosome complements and banding patterns, while supporting ratite monophyly, was more difficult to interpret in light of Cracraft's proposed relationships. In Cracraft's tree, the kiwis were a basal lineage and the emu far more derived. De Boer, however, found the kiwi chromosomes he examined were indistinguishable from those of the emu. De Boer accepted Cracraft's tree and interpreted his findings as evidence that the ratite karyotype was highly conserved and may have remained virtually unchanged for over 100 million years. An alternative hypothesis of ratite relationships, one more consistent with de Boer's observations, was put forward a year later with the publication of the DNA-DNA hybridization work of Sibley and Ahlquist (1981). Their hypothesized topology for the relationships within ratites differed from Cracraft's morphological tree (Figure 1.3) in the placement of the kiwis. Sibley and Ahlquist (1981) argued that the kiwis were not basal among ratites but rather were a sister group to the emus and cassowaries. This alternate topology was inconsistent with the vicariance model, as the pattern of speciation differed from the pattern of Gondwana fragmentation. Sibley and Ahlquist's placement of the ostrich and rhea was less conclusive.

Figure 1.3. Nine proposed hypotheses for the relationship among ratite birds. Panels: Morphology (Cracraft 1974, 1997); Immunological distance (Prager et al. 1976); DNA-DNA hybridization (Sibley & Ahlquist 1981); Morphology (Bledsoe 1988); DNA-DNA hybridization (Sibley & Ahlquist 1990, Fitch & Margoliash); DNA-DNA hybridization (Sibley & Ahlquist 1990, UPGMA); 12S rRNA sequences (Cooper et al. 1992, 394 bp) and c-mos sequences (Cooper 1997, 657 bp); COI, II & III, Cyt b, 16S and 12S rRNA sequences (Lee et al. 1997, 5219 bp); 12S and 16S rRNA sequences (van Tuinen et al. 1998, 2900 bp).

Another cladistic analysis of ratite morphological characters was carried out by Bledsoe (1988). Using 83 characters, he estimated a tree of ratite relationships that was in agreement with that produced from the 1981 DNA-DNA hybridization study (Figure 1.3). Sibley and Ahlquist (1990) again performed DNA-DNA hybridization comparisons among the ratites and many other avian taxa, and produced a similar tree with the emu, cassowaries and kiwis forming an Australasian clade. The placement of the ostrich and rheas differed depending on the method of phylogenetic reconstruction used: one method (Fitch-Margoliash) placed the ostrich as the basal lineage and the other (UPGMA) held that the ostrich and rheas were sister taxa (Figure 1.3).
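The sensitivity to reconstruction method noted above is easier to see with a concrete algorithm in hand. The following is a minimal sketch of UPGMA clustering, which repeatedly joins the two closest clusters and averages their distances to everything else, and which therefore implicitly assumes roughly clock-like rates. The function name `upgma` and all distance values are hypothetical illustrations, not data or code from Sibley and Ahlquist (1990).

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster taxa by UPGMA and return the tree as nested tuples.

    names: list of taxon labels.
    dist:  dict mapping (a, b) label pairs to pairwise distances.
    """
    sizes = {name: 1 for name in names}             # cluster -> number of taxa
    d = {frozenset(k): v for k, v in dist.items()}  # unordered pair -> distance
    while len(sizes) > 1:
        # Join the pair of clusters separated by the smallest distance.
        a, b = min(combinations(sizes, 2), key=lambda p: d[frozenset(p)])
        merged = (a, b)
        na, nb = sizes.pop(a), sizes.pop(b)
        # Distance to the new cluster is the size-weighted average.
        for c in sizes:
            d[frozenset((merged, c))] = (
                na * d[frozenset((a, c))] + nb * d[frozenset((b, c))]
            ) / (na + nb)
        sizes[merged] = na + nb
    return next(iter(sizes))


# Hypothetical distances, chosen only to illustrate the mechanics.
taxa = ["emu", "cassowary", "kiwi", "ostrich"]
distances = {
    ("emu", "cassowary"): 2.0,
    ("emu", "kiwi"): 6.0,
    ("cassowary", "kiwi"): 6.0,
    ("emu", "ostrich"): 10.0,
    ("cassowary", "ostrich"): 10.0,
    ("kiwi", "ostrich"): 10.0,
}
tree = upgma(taxa, distances)
```

With these made-up distances, `tree` groups the emu and cassowary first, then adds the kiwi, then the ostrich. A least-squares method such as Fitch-Margoliash does not assume equal rates across lineages, which is one reason the two methods can disagree about deep nodes when rates vary.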

Taking advantage of the development of the polymerase chain reaction and technological improvements in sequencing, Cooper et al. (1992) were able to sequence approximately 400 base pairs (bp) of the mitochondrial 12S rRNA gene for eight extant ratite species, a tinamou and four species of extinct moas. Along with being the first ratite study involving DNA sequencing, this study was unique because of the inclusion of several extinct moa species, which had not been possible in any of the previous molecular studies. The phylogenetic tree produced from this data set, while sharing many topological features with earlier work, presented a new hypothesis for the placement of the moas (Figure 1.3). In agreement with the findings of Sibley and Ahlquist (1981, 1990) and Bledsoe (1988), Cooper et al. (1992) placed the kiwis as sister taxa to the other Australasian ratites. Conversely, rheas were basal in the 12S sequence tree, unlike in the morphological trees where moas were the basal lineage (Cracraft 1974; Bledsoe 1988). Using the four cluster test on the 12S sequences, Rzhetsky, Kumar and Nei (1995) determined that this data set did not resolve the position of the ostrich. Cooper et al. (1992) recognized the potential inconsistency between their tree topology and the vicariance model. They reasoned that as the kiwis were placed higher up in the ratite tree, it would be unlikely that they branched off as long ago as 70-80 million years, when New Zealand fragmented from Gondwana. The presence of kiwis in New Zealand was therefore the result of a secondary ratite invasion after the islands had separated from Gondwana. This was likely accomplished by either island hopping (Cooper pers. comm.) or by a flying ancestor, possibly related to a lineage of volant paleognaths described by Houde (1986).

Lee et al. (1997) reported the results of a "total evidence" approach which combined 5,444 bp of mtDNA and 58 osteological characters. The DNA sequence data set consisted of the complete mitochondrial genes 16S rRNA, cytochrome b and tRNA-Val and portions of the genes 12S rRNA, cytochrome oxidase I and cytochrome oxidase II, for a kiwi, rhea, ostrich, emu, cassowary and two species of tinamou. Analysis of the morphological data set with maximum parsimony produced a tree that was identical to that of Cracraft (1974). Furthermore, they re-examined Bledsoe's (1988) study and dismissed 42 of the 83 characters he used, and concluded that the remaining characters provided weak support for his proposed phylogeny. The analysis of the DNA sequence data set produced a tree identical with that of Cooper et al. (1992), but did not include any moas. The morphological and molecular trees produced in this study were clearly in conflict. A maximum parsimony analysis of the combined DNA/osteological data sets produced a tree that was identical to Cracraft's (1974) morphological tree. They concluded from this that the ability of the smaller osteological data set to outweigh the much larger molecular data set may imply that the molecular tree is incorrect and likely suffering from a systematic error such as long branch attraction or variation in rates of evolution between lineages. They further suggested that the only discrepancy between the morphological and molecular trees is the placement of the root, which is true if moas are not considered.

Cooper (1997) sequenced a 657 bp portion of a proto-oncogene (c-mos) for the living ratites. Analysis of this gene sequence resulted in a tree topology identical to the one produced using the 12S rRNA data set alone (Figure 1.3).

Finally, van Tuinen et al. (1998) sequenced the complete 12S rRNA, tRNA-Val and 16S rRNA genes for an emu, rhea, ostrich and tinamou. This data set had a considerable amount of sequence overlap with that of Lee et al. (1997), but resulted in a different tree topology (Figure 1.3). To resolve the basal nodes of the ratite tree, van Tuinen et al. (1998) sampled fewer taxa, using only an emu to represent the Australasian ratites, which they assumed to be monophyletic. They found the ostrich to be basal and the rhea to be the closest living relative to the Australasian ratites, and concluded that the discrepancy between their results and those of Lee et al. (1997) was due to conflicting phylogenetic signals contained in the protein-coding genes compared with the ribosomal genes. They noted that their tree was concordant with the trees constructed from immunological distance and DNA-DNA hybridization data. Additionally, they argued that their topology is consistent with vicariance if some dispersal is assumed, and that the dating of these divergences should be possible with additional gene sequences.

The ratites are now generally considered to be monophyletic, and the tinamous are their closest living relatives. The biogeography and the relationships among the ratites have attracted a considerable amount of attention, as can be seen by the number of studies that have been carried out on this group. Consensus has been reached on the close relationship between the emu and cassowary, but the placement of the moa, the rhea and the ostrich has not been determined conclusively. The majority of studies support the placement of the kiwi as the sister group to the other extant Australasian ratites, but this conclusion needs further testing given that it is not consistent with the separation of New Zealand from Gondwana 70-80 million years ago.

Regardless of the topology of the ratite tree, consensus has also been reached on the overall shape of the tree itself, which has been characterized as having long terminal branches and short internodal distances (Sibley and Ahlquist 1990; Cooper et al. 1992; Lee et al. 1997; van Tuinen et al. 1998). Assuming that the accumulation of DNA mutations has occurred in a roughly clock-like fashion, this would indicate that the extant lineages are very old and that the radiation within the ratites may have occurred relatively rapidly.
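The clock-like assumption in the preceding paragraph can be illustrated with a short sketch. Under a strict molecular clock, two lineages that split T million years ago are separated by a distance d = 2rT, where r is the substitution rate per lineage, so one dated split calibrates r and any other distance then converts to a time. The function names and every number below are hypothetical and chosen only for illustration; they are not estimates from this thesis.

```python
def calibrate_rate(distance, calibration_age_myr):
    """Rate per lineage (substitutions/site/Myr) from one dated divergence."""
    # Both lineages accumulate change, hence the factor of two.
    return distance / (2.0 * calibration_age_myr)


def divergence_time(distance, rate):
    """Divergence time in Myr implied by a distance under a strict clock."""
    return distance / (2.0 * rate)


# Hypothetical calibration: a split dated at 100 Myr with distance 0.20.
rate = calibrate_rate(0.20, 100.0)

# Hypothetical ingroup distances: similar values imply splits that are
# both old and closely spaced, i.e. long terminal branches joined by
# short internodes.
age_first_split = divergence_time(0.16, rate)
age_second_split = divergence_time(0.15, rate)
internode_myr = age_first_split - age_second_split
```

In this toy case the two splits fall about 80 and 75 Myr ago, leaving an internode of only about 5 Myr against terminal branches of about 75 Myr, which is the tree shape described above.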

The third major question regards the antiquity of the ratite lineage. Traditionally, ratites have been considered very primitive birds that likely occupy a very basal node in the avian tree. Recently this view has been challenged with the publication of three papers using mitochondrial DNA sequence evidence to argue for a basal position of passerines, with the ratites forming a more derived clade (Harlid et al. 1998; Harlid and Arnason 1999; Mindell et al. 1999). The conclusions of these papers have been contested by several recent studies: one using largely the same mitochondrial DNA sequences (Paton and Baker ms), a second using only mitochondrial ribosomal genes with a larger sampling of taxa (van Tuinen et al. 2000), and a third using nuclear DNA sequences (Groth and Barrowclough 1999). These studies conclude that ratites are basal in the avian tree, and that published trees in which passerines are basal are incorrectly rooted.

The Objectives of this Thesis:

In view of the continuing controversies, the objective of this thesis is to further elucidate the relationships among the ratite birds using complete mitochondrial genome sequences. The choice of complete mitochondrial genome sequences follows from the understanding that large sequence data sets are more likely to accurately recover the phylogenetic relationships than are small data sets, especially when resolving ancient divergences (Cao et al. 1994; Cummings, Otto and Wakeley 1995; Russo, Takezaki and Nei 1996). As short internodal distances are generally found in all the proposed ratite trees, suggesting the radiation may have occurred fairly quickly, mitochondrial genes were chosen because of their shorter coalescence times (Moore 1995). The use of complete mitochondrial genomes also addresses the concern that different classes of genes, such as protein-coding or ribosomal genes, may contain conflicting phylogenetic signal (van Tuinen et al. 1998).

In this thesis I report the complete mitochondrial genomes of a kiwi, emu, southern cassowary, ostrich, lesser rhea, two tinamous, and two species of extinct moa: Anomalopteryx didiformis and Emeus crassus. Inclusion of the moas is necessary to resolve their position in the ratite tree, and by increasing taxon sampling they should reduce branch lengths and so minimize any long branch attraction effects (Lee et al. 1997). The inclusion of the moas should also assist in determining whether the discrepancies observed between the molecular and morphological trees are exclusively a product of rooting problems.

Organization of the Thesis:

Chapter 2 describes how the genome sequencing was accomplished, and characterizes genome organization, base composition, codon usage and levels of variation among the different gene sequences. In chapter 3, the patterns of nucleotide and amino acid substitution are examined. The sequences are tested for saturation, as well as for rate variation among sites and stationarity. The purpose of this examination is to determine the most appropriate sites and the best models of substitution to estimate how the sequences have evolved. In chapter 4 this information is used to estimate the relationships among the ratite birds and assess the support for the resulting phylogeny. Using gene sequences in conjunction with fossil calibration dates, the divergence times between these lineages are then determined. These dates are used in conjunction with the tree topology to test the antiquity of the group and whether the divergences accord with the fragmentation of Gondwana and support a vicariant origin for the ratites, or whether some taxa subsequently dispersed to their current geographic distributions.

References

Bledsoe, A.H. 1988. A phylogenetic analysis of postcranial skeletal characters of the ratite birds. Ann. Carnegie Mus. 57:73-90.

Cao, Y., Adachi, J., Janke, A., Pääbo, S. and Hasegawa, M. 1994. Phylogenetic relationships among eutherian orders estimated from inferred sequences of mitochondrial proteins: Instability of a tree based on a single gene. J. Mol. Evol. 39:519-527.

Casper, G., Wattel, J. and de Jong, W.W. 1994. αA-crystallin sequences group tinamou with ratites. Mol. Biol. Evol. 11:711-713.

Cooper, A. 1997. Studies of avian ancient DNA: from Jurassic Park to modern island extinctions. In "Avian Molecular Evolution and Systematics" (D.P. Mindell, Ed.), Academic Press, San Diego, U.S.A. pp. 345-373.

Cooper, A., Mourer-Chauviré, C., Chambers, G.K., von Haeseler, A., Wilson, A.C. and Pääbo, S. 1992. Independent origins of New Zealand moas and kiwis. Proc. Natl. Acad. Sci. 89:8741-8744.

Cracraft, J. 1974. Phylogeny and evolution of the ratite birds. Ibis 116:494-521.

Cummings, M.P., Otto, S.P. and Wakeley, J. 1995. Sampling properties of DNA sequence data in phylogenetic analysis. Mol. Biol. Evol. 12:814-822.

de Boer, L.E.M. 1980. Do the chromosomes of the kiwi provide evidence for a monophyletic origin of the ratites? Nature 287:84-85.

Feduccia, A. 1999. The origin and evolution of birds. 2nd Edition. Yale Univ. Press, New Haven, Connecticut.

Fürbringer, M. 1888. Untersuchungen zur Morphologie und Systematik der Vögel. Vols. 1 and 2. Von Holkema, Amsterdam.

Fürbringer, M. 1902. Zur vergleichenden Anatomie des Brustschulterapparates und der Schultermuskeln. Part 5. Jena. Z. für Naturwiss. 36:289-736.

Groth, J.G. and Barrowclough, G.F. 1999. Basal divergences in birds and the phylogenetic utility of the nuclear RAG-1 gene. Mol. Phylogenet. Evol. 12:115-123.

Gysels, H. 1970. Some ideas about the phylogenetic relationships of the Tinamiformes, based on protein characters. Acta Zool. Pathol. Antverp. 50:3-13.

Härlid, A., Janke, A. and Arnason, U. 1998. The complete mitochondrial genome of Rhea americana and early avian divergences. J. Mol. Evol. 46:669-679.

Härlid, A. and Arnason, U. 1999. Analyses of mitochondrial DNA nest ratite birds within the Neognathae: supporting a neotenous origin of ratite morphological characters. Proc. R. Soc. Lond., B, Biol. Sci. 266:305-309.

Houde, P. 1986. Ostrich ancestors found in the northern hemisphere suggest new hypothesis of ratite origins. Nature 324:563-565.

Huxley, T.H. 1867. On the classification of birds; on the taxonomic value of the modifications of certain cranial bones observable in that class. Proc. Zool. Soc. Lond. 1867:415-472.

Lee, K., Feinstein, J. and Cracraft, J. 1997. The phylogeny of ratite birds: resolving conflicts between molecular and morphological data sets. Avian Molecular Evolution and Systematics, Academic Press, Toronto, pp. 173-209.

Lesson, R.-P. 1831. Traité d'Ornithologie, Vol. 1. Levrault, Paris.

Merrem, B. 1813. Tentamen systematis naturalis avium. Abh. Königl. (Preussische) Akad. Wiss. Berlin 237-259.

Mindell, D.P., Sorenson, M.D., Dimcheff, D.E., Hasegawa, M., Ast, J.C. and Yuri, T. 1999. Interordinal relationships of birds and other reptiles based on whole mitochondrial genomes. Syst. Biol. 48:138-152.

Moore, W.S. 1995. Inferring phylogenies from mtDNA variation: mitochondrial gene trees versus nuclear gene trees. Evolution 49:718-726.

Osuga, D.T. and Feeney, R.E. 1968. Biochemistry of the egg-white proteins of the ratite group. Arch. Biochem. Biophys. 124:560-574.

Paton, T. and Baker, A.J. 2000. Shorebirds aren't basal! Turnstones and bird origins.

Prager, E.M., Wilson, A.C., Osuga, D.T. and Feeney, R.E. 1976. Evolution of flightless land birds on southern continents: transferrin comparison shows monophyletic origin of ratites. J. Mol. Evol. 8:283-294.

Russo, C.A., Takezaki, N. and Nei, M. 1996. Efficiencies of different genes and different tree-building methods in recovering a known vertebrate phylogeny. Mol. Biol. Evol. 13:525-536.

Rzhetsky, A., Kumar, S. and Nei, M. 1995. Four-cluster analysis: a simple method to test phylogenetic hypotheses. Mol. Biol. Evol. 12:163-167.

Sibley, C.G. and Frelin, C. 1972. The egg white protein evidence of ratite affinities. Ibis 114:377-387.

Sibley, C.G. and Ahlquist, J.E. 1981. The phylogeny and relationships of the ratite birds as indicated by DNA-DNA hybridization. In "Evolution Today, Proc. 2nd Congr. for Systematic Evolutionary Biology" (G.G.E. Scudder and J.L. Reveal, Eds.). Hunt Inst. Botanic Document, Pittsburgh, pp. 301-335.

Sibley, C.G. and Ahlquist, J.E. 1990. Phylogeny and classification of birds. Yale Univ. Press, New Haven, Connecticut.

Stapel, S.O., Leunissen, J.A.M., Versteeg, M., Wattel, J. and de Jong, W.W. 1984. Ratites as the oldest offshoot of avian stem: evidence from α-crystallin A sequences. Nature 311:257-259.

van Tuinen, M., Sibley, C.G. and Hedges, S.B. 1998. Phylogeny and biogeography of ratite birds inferred from DNA sequences of the mitochondrial ribosomal genes. Mol. Biol. Evol. 15:370-376.

van Tuinen, M., Sibley, C.G. and Hedges, S.B. 2000. The early history of modern birds inferred from DNA sequences of nuclear and mitochondrial ribosomal genes. Mol. Biol. Evol. 17:451-457.

Chapter 2

Characterization of the Complete Mitochondrial Genomes of

Nine Paleognathous Birds

Mitochondrial DNA (mtDNA) is a small circular genome located in the mitochondria, separate from the bulk of the organismal DNA found in the nucleus. The size of this organellar genome is highly conserved across organisms with an average length of approximately 17 kilobases (kb), although there are a few exceptions where it may be as large as 368.8 kb (Kubo et al. 2000). mtDNA has attracted a considerable amount of attention over the last two decades as a useful marker for studies in evolutionary biology. The traits that have made it so appealing are its abundance, high rate of evolution, apparent lack of recombination and its predominantly maternal inheritance (Wilson et al. 1985). The mitochondrial genome is also haploid and, unlike nuclear DNA, has no gene families, thus eliminating the risk of comparing paralogous genes.

To date more than 90 complete mitochondrial genomes have been sequenced and the number of genome sequencing projects is increasing rapidly. This growing interest in recovering large amounts of sequence is aided by technological advancements, and is fuelled by the realization that large amounts of sequence data are required to resolve deep branches in molecular phylogenies, as well as to reliably recover species trees rather than individual gene trees (Cao et al. 1994; Cummings et al. 1995; Russo et al. 1996; Takezaki and Gojobori 1999).

Additionally, the determination of longer stretches of continuous sequence provides the opportunity to map gene organization within genomes. Due to the highly conserved nature of

mitochondrial genome organization across taxa, it has been proposed that gene rearrangements

can provide an additional source of phylogenetically informative characters (Sankoff et al.

1992).

To date, only eight of the published mitochondrial genomes are avian: the chicken Gallus gallus (Desjardins and Morais 1990), the ostrich Struthio camelus (Härlid et al. 1997), the greater rhea Rhea americana (Härlid et al. 1998), the rook Corvus frugilegus (Härlid and Arnason 1999), the redhead Aythya americana, the peregrine falcon Falco peregrinus, the grey-headed broadbill Smithornis sharpei, and the village indigobird Vidua chalybeata (Mindell et al. 1999). For this study I sequenced the entire mitochondrial genomes of nine additional birds: the southern cassowary Casuarius casuarius, the emu Dromaius novaehollandiae, the great spotted kiwi Apteryx haastii, the lesser rhea Pterocnemia pennata, the ostrich Struthio camelus, two extinct species of moa, Anomalopteryx didiformes and Emeus crassus, and two species of tinamou: the elegant-crested tinamou Eudromia elegans and the great tinamou Tinamus major. These birds represent all the major extant and one of the two recently extinct lineages that make up the paleognathous birds. The history and phylogenetic relationships of paleognaths, including their status as a monophyletic group, have been debated for over a century. In the last 30 years several molecular studies have been undertaken to try to resolve these major issues (Prager et al. 1976; Sibley and Ahlquist 1981, 1990; Cooper et al. 1992; Cooper 1997; Lee et al. 1997; van Tuinen et al. 1998), but no clear consensus has emerged from DNA hybridization or analysis of relatively short DNA sequences. In this thesis my approach is to employ a much larger data set, complete mitochondrial genome sequences from representative species of all the major paleognath lineages, with the objective of resolving the phylogenetic relationships in such an ancient lineage and examining the support for the various competing hypotheses.

In this chapter, the methodology by which these mitochondrial genome sequences were obtained is presented. Structural features of the genomes such as gene order and organization, as well as base composition, levels of variation among genes, codon usage, amino acid frequencies for the encoded proteins, and secondary structures for both ribosomal and transfer RNAs are reported for each of the major paleognath lineages. Paleognaths have traditionally been considered the most basal lineage within birds, and this antiquity along with their flightlessness invites comparative genomic studies with flying birds (carinates). By noting both the commonalities and particularly the unique features, these genomes provide an opportunity to gain further insight into the constraints acting on mitochondrial genes, their phylogenetic utility, and ultimately the underlying mechanisms governing mtDNA evolution in birds.

Materials & Methods

Specimen Information

Tissue samples of emus were donated by the University of Western Australia and by Sylvan's Farm, located near Guelph, Ontario. A southern cassowary sample came from a zoo specimen in Papua New Guinea. The blood sample from the great spotted kiwi was donated from the Mt. Bruce Collection, New Zealand. The ostrich tissue sample originated from an ostrich farm in Ontario. The elegant-crested tinamou sample was collected in , and the lesser rhea tissue sample came from a road casualty in . Tissues for two additional tinamou species, Crypturellus variegatus and Tinamus major, were provided by Leo Joseph from the Academy of Natural Sciences, Philadelphia. A third tinamou species, Nothoprocta perdicaria sanborni, was donated from a captive population near Vancouver, B.C. Samples from bones of the extinct moas, uncovered from a cave site on the South Island of New Zealand, were provided by Trevor Worthy. One of the bones from which DNA was recovered, a phalanx, was initially identified as Megalapteryx didinus based on morphological characters, but upon comparison with the published moa 12S rRNA sequences (Cooper et al. 1992) was found to be identical with the moa Anomalopteryx didiformes. The second species of moa presented in this study is Emeus crassus. For this specimen the DNA was recovered from a tarsometatarsus bone.

DNA Extraction and Amplification

DNA was isolated from blood, tissue and feather samples using standard proteinase K/SDS and phenol/chloroform extractions (Sambrook et al. 1989). Enriched mitochondrial DNA was also isolated from an emu tissue sample by separating the mitochondria from other cellular organelles using differential centrifugal sedimentation followed by cesium chloride gradient centrifugation of the DNA as described by Carr and Griffith (1987). DNA extraction and amplification from the moa bone samples followed the stringent protocol of Hagelberg (1994). All moa DNA extractions were performed in a separate room reserved for this purpose.

The moa DNA was typical of ancient DNA recovered from preserved materials such as bone in that it was degraded, limiting the maximum size of amplification products to between 300 and 500 bp in length. The crude DNA extracts were used as templates in polymerase chain reactions. More than a hundred primers were utilized to amplify the entire mitochondrial genome (Appendix 2.1). The initial primers were designed from conserved regions found by aligning vertebrate mtDNA sequences and later from the ratite sequences generated. Segments of the genomes were amplified in >1 kb fragments and sequenced in both directions with >99% overlap. The amplified segments were designed to have large regions of overlap to confirm sequences from separate amplifications and thus identify any fragments that may potentially have non-mitochondrial origins. To ensure the sequences were of mitochondrial origin, all the autoradiographs produced were examined for evidence of multiple bands indicating more than one sequence. The sequences for the protein-coding genes were translated to confirm that the reading frames were free of indels that were not a multiple of three nucleotides. Variation across the three codon positions was examined to ensure it followed the expected pattern of decreasing variation from third to first to second positions, which is a by-product of the degeneracy of the genetic code. For RNA-coding genes, the sequences were mapped onto secondary structure models and checked for any substitutions that would be incompatible with the predicted structures.
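The reading-frame check described above can be illustrated with a short Python sketch. This is only an illustration under stated assumptions, not the procedure used in the thesis: the function names `translate` and `frame_is_open` and the toy sequences are hypothetical, and the codon table is the vertebrate mitochondrial code (NCBI translation table 2), in which TGA encodes tryptophan, ATA encodes methionine, and AGA and AGG are stops.

```python
from itertools import product

# Vertebrate mitochondrial genetic code (NCBI translation table 2).
# Codons are enumerated in TCAG order with the third position varying fastest.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq):
    """Translate a DNA sequence in frame 1; unknown codons become 'X'.

    Trailing bases that do not fill a codon (for example the incomplete
    stop codon 'T--') are ignored.
    """
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3)
    )


def frame_is_open(seq):
    """Return True if no stop codon occurs before the final codon."""
    protein = translate(seq)
    return "*" not in protein[:-1]


# Hypothetical toy sequences: the second carries one extra base (a C after
# the start codon), which shifts the frame and exposes a premature stop.
intact = "ATGCTAGGATAA"
shifted = "ATGCCTAGGATAA"
print(translate(intact), frame_is_open(intact))
print(translate(shifted), frame_is_open(shifted))
```

An indel that is not a multiple of three nucleotides usually shows up in such a check as a premature stop codon shortly downstream of the shift.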

For DNA amplification of samples from extant species, the following cycle profile was used: 94°C for 45 s, 50°C for 45 s, 72°C for 90 s, repeated 36 times using a Perkin Elmer 480 DNA thermal cycler. The moa samples required a greater number of cycles due to inhibition and a lower number of starting template molecules: 48 cycles were routinely used and often a second round of amplification was required. Amplifications of the moa DNAs were performed separately from the other ratite samples to minimize the risk of contamination. The amplified products were separated by gel electrophoresis and purified using either Gene Clean kits (Bio 101) or filter pipet-tip centrifugation (Dean and Greenwald 1995). The products were then sequenced with Sequenase 2.0 kits (United States Biochemical), Amplicycle Sequencing kits (Perkin Elmer Cetus) or ThermoSequenase (Amersham). DNA sequences were entered manually into a computer using the program XESEE3 (Cabot 1997). A portion of both moa genomes was sequenced at the Department of Ornithology at the American Museum of Natural History for independent confirmation. Portions of the mitochondrial genomes of the great tinamou and the moa E. crassus were sequenced using LI-COR 4200 and ABI 377 automated DNA sequencers (for aligned sequences, see Appendix 2.1).

Sequence Analysis

Complete mitochondrial DNA sequences of a chicken, X52392 (Desjardins and Morais 1990), and an alligator, Y13113 (Janke and Arnason 1997), were downloaded from GenBank.

Sequences were aligned initially using the program Clustal X (Thompson et al. 1997), and then improved upon manually, in the case of the tRNAs and rRNAs using models of secondary structure. The protein-coding sequences were aligned using the predicted amino acids, and the nucleotide sequences were made to conform to this alignment. ND6 is the only protein-coding gene transcribed from the L strand. Because of the very different base composition between the two strands, ND6 was analysed separately for comparisons of codon usage, amino acid frequency and nucleotide frequency (Hasegawa and Kishino 1989; Adachi et al. 1993). These comparisons were carried out using the programs MEGA (Kumar et al. 1993), MOLPHY (version 2.3; Adachi and Hasegawa 1996a) and PAUP* (version 4.0b2; Swofford 1998). Due to the extremely high sequence similarity between the two moa species and the relatively high similarity between the two tinamous, only one representative for each lineage (Anomalopteryx didiformes and the elegant-crested tinamou) was used in the analysis.
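Making the nucleotide sequences conform to an amino acid alignment amounts to threading each gene's codons onto its aligned protein. The sketch below shows one way this could be done; it is a generic illustration rather than the author's workflow, and the function name `thread_codons` and the toy input are hypothetical.

```python
def thread_codons(aligned_protein, nucleotides, gap="-"):
    """Insert three-base gaps into a coding sequence to match a protein alignment.

    aligned_protein: one row of an amino acid alignment, with gap characters.
    nucleotides: the unaligned coding sequence for the same gene, one codon
        per residue (any stop codon should be trimmed beforehand).
    """
    codons = [nucleotides[i:i + 3] for i in range(0, len(nucleotides), 3)]
    n_residues = sum(1 for residue in aligned_protein if residue != gap)
    if n_residues != len(codons):
        raise ValueError(
            f"{n_residues} residues but {len(codons)} codons; "
            "sequence and translation do not correspond"
        )
    codon_iter = iter(codons)
    return "".join(
        gap * 3 if residue == gap else next(codon_iter)
        for residue in aligned_protein
    )


# Hypothetical toy example: a one-residue gap becomes a three-base gap.
print(thread_codons("M-T", "ATGACC"))
```

Because gaps are only ever introduced in multiples of three, the reading frame of every sequence is preserved in the resulting nucleotide alignment.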

Results

Structural Features of Paleognathous Mitochondrial Genomes

The mitochondrial genomes of the paleognathous birds are similar in size (16.6-16.8 kb) and base composition to other vertebrate genomes sequenced to date (Anderson et al. 1981; Anderson et al. 1982; Bibb et al. 1981; Roe et al. 1985; Janke and Arnason 1997; Desjardins and Morais 1990; Härlid et al. 1997, 1998; Mindell et al. 1999) (Table 2.1).

Table 2.1. Length and base composition of the L strand of the mitochondrial DNA genomes of paleognathous birds relative to chicken.

Species                    Length (bp)   %A     %C     %G     %T
Chicken                    16,775        30.3   32.5   13.5   23.8
Elegant-crested Tinamou
Great Tinamou
Emu
Southern Cassowary
Great-spotted Kiwi
Lesser Rhea
Ostrich
Anomalopteryx (moa 1)
Emeus (moa 2)              16,643*       30.7   31.7   13.3   24.3

* pending final confirmation of control region sequence

Figure 2.1. Maps of the mitochondrial gene organizations for the nine genomes sequenced here (panel title: Ratite + Great Tinamou Mitochondrial Genome).

As in other vertebrates, the bases A and C are represented at slightly greater than the 25 percent frequency expected for a random sequence, while G is significantly below this value and the frequency of T is close to the random expectation.

The overall genome organizations for the ratites and the great tinamou were the same as that reported for the chicken (Figure 2.1) (Desjardins and Morais 1990). The gene ND6 was found adjacent to the control region, and the origin of light strand replication, normally found in the WANCY region in non-avian vertebrates, was absent. The gene order found in the elegant-crested tinamou was, however, different. In this tinamou, the gene order upstream from the control region resembles that found in the majority of vertebrate mitochondrial genomes, with cytochrome b separated from the control region by the tRNAs threonine and proline. Unlike the general vertebrate order, however, ND6 is not adjacent to ND5, but has been translocated to the other side of the control region. To ascertain which gene order was likely common to most tinamous, this region was sequenced for two additional tinamou species: Crypturellus variegatus and Nothoprocta perdicaria sanborni. In these tinamou species, the gene order was the same as that in the great tinamou, indicating that this gene order may be unique to Eudromia elegans. Subsequently, similar kinds of gene rearrangement around the control region have been reported in birds (Mindell et al. 1998; Bensch and Härlid 2000), although the arrangement of the tRNAs in the elegant-crested tinamou remains unique. The paleognath mitochondrial genomes each contain 37 genes, of which 13 code for proteins, 22 for tRNAs, and 2 for rRNAs. As in other vertebrates, the majority of the genes are encoded on the H strand, with only the protein-coding gene ND6 and eight tRNAs transcribed from the L strand (Table 2.2).

Table 2.2. Mitochondrial genes for the nine paleognathous birds, listed in the L strand gene order along with their range in size, the strand on which each gene is encoded and the associated start and stop codons for the protein-coding genes.

Gene              Size range   Strand encoded   Start codon   Stop codon
Control Region
tRNA-Phe
12S rRNA
tRNA-Val
16S rRNA
tRNA-Leu (UUA)
ND1                                             ATG           TAA, AGG, AGA
tRNA-Ile
tRNA-Gln
tRNA-Met
ND2                                             ATG           TAG
tRNA-Trp
tRNA-Ala                       L
tRNA-Asn                       L
tRNA-Cys                       L
tRNA-Tyr                       L
COI                                             GTG           AGG
tRNA-Ser (UCA)                 L
tRNA-Asp
COII                                            GTG           T--
tRNA-Lys
ATPase 8                                        ATG           TAA
ATPase 6                                        ATG           TAA
COIII                                           ATG           T--
tRNA-Gly
ND3                                             ATA           TAA
tRNA-Arg
ND4L                                            ATG           TAA
ND4                                             ATG           TAA, AGA
tRNA-His
tRNA-Ser (AGC)
tRNA-Leu (CUA)
ND5                                             ATG, GTG      TAA, AGA
Cyt b                                           ATG           TAA
tRNA-Thr
tRNA-Pro                       L
ND6                            L                ATG           TAG
tRNA-Glu                       L

* including the edited site

The mitochondrial genome is extremely compact, in stark contrast to the nuclear genome. Nonfunctional sequences, such as the spacer DNA between genes, are short in length and few in number. The single exception to this rule is the spacer DNA between the ~RNA~~ and ~RNA'". This region is highly heterogeneous in the length of the spacer and in base composition among the paleognaths. The spacer ranges in size from 4 bp in the elegant-crested tinamou to over 80 bp in emu, southern cassowary and great spotted kiwi. In the cassowary and kiwi over 20 bp of the spacer is composed of a monobasic string of Gs.

At several sites in the paleognath genomes, compactness has reached its ultimate state with the overlapping of gene sequences. The genes ND1 and tRNA-Ile, which are butt-joined in chicken, have a 2 bp overlap in the great spotted kiwi and ostrich. In all the paleognaths and the chicken, the genes ND2 and tRNA-Trp overlap by 2 bp. tRNA-Cys and tRNA-Tyr overlap by a single base pair, as do tRNA-Met and tRNA-Gln. In the latter case, the tRNAs are encoded on opposite strands. The relatively large 9 bp overlap in sequence between COI and tRNA-Ser (UCN), found in all the avian genomes sequenced to date, appears unique to avian mitochondrial sequences (Desjardins and Morais 1990). The overlaps observed between the protein-coding genes ATPase 8 and ATPase 6 (10 bp), ATPase 6 and COIII (1 bp) and ND4L and ND4 (7 bp) have all been seen in other vertebrate genomes, although the extent of the overlap varies.
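Overlaps and spacers of this kind can be read directly off gene coordinates. The following Python sketch shows the arithmetic; it is a hedged illustration only, the function name `junctions` is hypothetical, and the coordinates are invented so as to reproduce the 10 bp and 1 bp overlaps quoted above rather than taken from any of the sequenced genomes.

```python
def junctions(genes):
    """Classify the junction between each pair of consecutive genes.

    genes: list of (name, start, end) tuples with 1-based inclusive
        coordinates, sorted by start position.
    Returns (upstream, downstream, distance) tuples, where distance > 0 is
    the length of an intergenic spacer, 0 means the genes are butt-joined,
    and distance < 0 means the genes overlap by -distance bases.
    """
    result = []
    for (name_a, _, end_a), (name_b, start_b, _) in zip(genes, genes[1:]):
        result.append((name_a, name_b, start_b - end_a - 1))
    return result


# Hypothetical coordinates, chosen only to illustrate the calculation.
toy_annotation = [
    ("ATPase 8", 1, 165),
    ("ATPase 6", 156, 839),
    ("COIII", 839, 1622),
    ("tRNA-Gly", 1623, 1691),
]
for upstream, downstream, distance in junctions(toy_annotation):
    print(upstream, downstream, distance)
```

Applied to a full annotation, the same function would also report spacer lengths as positive distances.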

Codon Usage and Base Composition

The use of initiation and termination codons in the paleognath genomes follows the pattern observed in other vertebrates (Gadaleta et al. 1989; Desjardins and Morais 1990; Härlid et al. 1997, 1998). For the protein-coding genes, ATG functions as the predominant initiation codon. The exceptions are ND5 in the ostrich and COI and COII in all the paleognaths, where GTG is the initiation codon. Incomplete termination codons were observed in COII and COIII for all the paleognaths (Table 2.2). The paucity of G in the sequences is largely due to a bias against G in third codon positions of protein-coding genes. In contrast, T is at higher frequency at second positions, as has been noted in other mitochondrial gene sequencing studies (Irwin et al. 1991) (Table 2.3). The bias against having G in the third position is reflected in codon usage. When several codons code for the same amino acid, those ending in G are less frequently used (Table 2.4).
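Base composition by codon position, of the kind summarized in Table 2.3, can be tallied with a few lines of Python. The sketch below is an assumption-laden illustration and not the software used in the thesis (which cites MEGA, MOLPHY and PAUP*); the function name `composition_by_codon_position` and the toy sequence are hypothetical.

```python
from collections import Counter


def composition_by_codon_position(seq):
    """Return base frequencies at first, second and third codon positions.

    seq: an in-frame coding sequence. Trailing bases that do not complete a
    codon are ignored, and characters other than A, C, G and T are excluded
    from the totals.
    """
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    frequencies = []
    for position in range(3):
        counts = Counter(seq[position:usable:3])
        total = sum(counts[base] for base in "ACGT")
        frequencies.append(
            {base: (counts[base] / total if total else 0.0) for base in "ACGT"}
        )
    return frequencies


# Hypothetical toy sequence of three codons: ATG ACC CTA.
first, second, third = composition_by_codon_position("ATGACCCTA")
print(first)
print(second)
print(third)
```

Run over the concatenated protein-coding genes of a genome, a low G frequency at the third position would reflect the bias described above.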

Amino Acid Composition

Along with codon usage, Table 2.4 also provides an indication of which amino acids are used predominantly. This is more clearly depicted in Figure 2.2, which shows the amino acid composition of the protein-coding genes for the ratites, the elegant-crested tinamou and the chicken. Averaging the amino acid composition across 12 genes may obscure the unique properties of any particular gene, and so the composition of each individual gene was also examined (Appendix 2.2). The results for 11 of the 12 genes show a distribution similar to the averaged results in Figure 2.2. Individual genes show a high level of concordance among birds in the frequencies of the individual amino acids, with a strong predilection for hydrophobic residues. The genes that differed in amino acid composition from the average were ATPase 8 and ND6 (Figure 2.3). ND6 codes for a protein with a greater representation of valine and glycine and with a decreased frequency of isoleucine, lysine and histidine compared with other mitochondrial proteins. ATPase 8 codes for a protein richer in proline and tryptophan with a reduced glycine and alanine content. In part, some of the deviation in composition for ATPase 8 may be a stochastic artifact of its short sequence length. The unique composition of ND6 has been reported previously for other mitochondrial genomes, and is thought to be in part a by-product of ND6 being encoded by the L strand, which is experiencing different substitutional biases. The uniqueness of ATPase 8 and ND6 is further illustrated by contrasting the amino acid composition among the 13 protein-coding genes (Figure 2.4).

Table 2.3. Base composition of protein-coding genes, excluding ND6.

Taxa         Codon position   %A   %C   %G   %T
Chicken
Tinamou
Moa
Rhea
Emu
Cassowary
Kiwi
Ostrich

Table 2.4. Average codon usage in mitochondrial protein-coding genes for the paleognathous birds (excluding ND6).

L     UUA UUG CUU CUC CUA CUG
I     AUU AUC
C     UGU UGC
M     AUA AUG
V     GUU GUC GUA GUG
F     UUU UUC
Y     UAU UAC
W     UGA UGG
P     CCU CCC CCA CCG
A     GCU GCC GCA GCG
G     GGU GGC GGA GGG
S     UCU UCC UCA UCG AGU AGC
T     ACU ACC ACA ACG
N     AAU AAC
Q     CAA CAG
H     CAU CAC
K     AAA AAG
R     CGU CGC CGA CGG
D     GAU GAC
E     GAA GAG
Stop  AGA AGG UAA UAG

Figure 2.2. Amino acid frequencies averaged across the protein-coding genes (excluding ND6). Horizontal axis: amino acid (L, I, C, M, V, F, Y, W, P, A, G, S, T, N, Q, H, K, R, D, E). Legend: chicken, tinamou, emu, cassowary, kiwi, rhea, ostrich, moa.

Figure 2.3. Amino acid frequencies for ATPase 8 and ND6.

Figure 2.4. Amino acid frequency for each of the protein-coding genes averaged across paleognathous birds.

RNA Editing

In the paleognath species I sequenced, the direct translation of the ND3 gene would result in a gene product considerably shorter than that found in other vertebrates. Comparisons of ND3 sequences revealed that the ratite sequences could be best aligned to the published chicken sequence by deleting a base 174 bases from the start of the gene. Deletion of this base would result in a translation product of comparable length and amino acid composition to that found in other vertebrates. If the translated mRNA does not have this base edited out or ignored during translation, the resulting frame shift produces nonsense coding, followed shortly thereafter by a termination codon. To ensure the sequences were authentic and not those of a nuclear pseudogene, the emu sample was extracted by CsCl gradient purification (Carr and Griffith 1987). The 'purified' mtDNA was used as template to amplify several fragments with different primer combinations, and they were sequenced for ND3. The resulting sequences were identical to the ones derived using genomic DNA. While this work was in progress the ostrich mitochondrial genome was published (Härlid et al. 1997), and these authors also found the additional base in ND3. Their methodology involved using clones of purified mitochondrial DNA rather than a PCR-based approach, which adds further support to the mitochondrial origin of this sequence.

To further investigate the insertion in ND3, a sample of chicken DNA was extracted as described in the Materials and Methods. The ND3 gene was amplified, sequenced and found to be identical to the published sequence (Desjardins and Morais 1990), with the one difference that it too contained an additional base, a C, at position 174.
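The effect of a single extra base on the reading frame can be explored with a small scan over candidate deletions. The sketch below is a hypothetical illustration, not the analysis performed in the thesis: the names `first_stop` and `frame_restoring_deletions` are invented, the toy sequence is not ND3, and the stop codon set assumes the vertebrate mitochondrial code (TAA, TAG, AGA, AGG).

```python
# Stop codons in the vertebrate mitochondrial code (an assumption of this sketch).
STOPS = {"TAA", "TAG", "AGA", "AGG"}


def first_stop(seq):
    """Return the 0-based codon index of the first in-frame stop, or None."""
    seq = seq.upper()
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in STOPS:
            return i // 3
    return None


def frame_restoring_deletions(seq):
    """List 0-based positions whose deletion yields an open reading frame.

    A deletion counts as restoring the frame when the edited sequence is a
    whole number of codons and its only in-frame stop is the final codon.
    """
    hits = []
    for position in range(len(seq)):
        edited = seq[:position] + seq[position + 1:]
        if len(edited) % 3 != 0:
            continue
        if first_stop(edited) == len(edited) // 3 - 1:
            hits.append(position)
    return hits


# Hypothetical toy gene with one extra base, giving a premature AGG stop.
shifted = "ATGCCTAGGATAA"
print(first_stop(shifted))
print(frame_restoring_deletions(shifted))
```

In this toy case several different upstream deletions restore an open frame, so the reading frame alone does not pinpoint the extra base; this is consistent with locating it by alignment to a homologous sequence, as was done here with the published chicken sequence.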

tRNAs

The 22 tRNAs were identified and show no rearrangements relative to the gene order found in the chicken genome except in the elegant-crested tinamou. The tRNAs are found interspersed between the protein-coding and ribosomal genes and have been suggested to have a role as recognition sites for the cleavage of the transcripts between the mRNA or rRNA and the tRNA genes (Ojala et al. 1980; Gadaleta et al. 1989) (Fig. 2.5). Each of the tRNAs can be folded into a cloverleaf secondary structure assuming G-U wobble and tolerance of some non-Watson-Crick pairings. Mitochondrial tRNAs lack several invariant features found in nuclear and prokaryotic tRNAs and are believed to have reduced tertiary structural constraints (Kumazawa and Nishida 1993). Each of the tRNAs has a similar pattern of base substitution. Variation in the stem structures was almost exclusively limited to transitional differences, but loop structures had both transitional and transversional differences, as well as length variation (Figure 2.6). As observed in Härlid et al. (1997), the four tRNA genes tRNA-Ser (UCN), ~RNA~~, tRNA-Leu (CUN) and ~RNA*'" for the ostrich and chicken all have an adenine at position 8 adjacent to the amino-acyl stem, unlike all other vertebrates, which have a thymidine at this position.

Figure 2.5. Stylized representation of the cloverleaf secondary structure of mitochondrial tRNAs, based upon Kumazawa and Nishida (1993). Labelled features include the D loop and the anticodon loop.

This was also found to be the case for the other 4 published avian genomes and was observed in all the genomes sequenced here, adding weight to Härlid et al.'s (1997) speculation that this condition may be universal for birds.

rRNAs

The 12S and 16S rRNA genes have been partly or completely sequenced for several ratite species (Cooper et al. 1992; Lee et al. 1997; van Tuinen et al. 1998, 2000). The sequences they produced are all consistent with the sequences generated here, with the exception (first noted by Härlid et al. 1997) of the three base insertions/deletions in the Cooper et al. (1992) sequences. The ribosomal genes are highly conserved and are only surpassed in conservation by the tRNAs among mitochondrial genes. The ribosomal genes are evolving under very different constraints from those of the protein-coding genes. Ribosomal genes have a much higher proportion of indels, largely limited to loop structures as they are in tRNAs. The aligned 12S and 16S rRNA sequences are shown in Appendix 2.1. The paleognath consensus sequence conformed closely with published secondary structure models for 12S rRNA (Gutell 1994; Van de Peer et al. 1994; Sullivan et al. 1995; Hickson et al. 1996) and for 16S rRNA (De Rijk et al. 2000) in the presence and size of stem and loop structures (Figure 2.7). The pattern of base substitutions in the ribosomal RNA genes is similar to that observed in the transfer RNAs. Most of the transversional differences and indels are located in loop structures, whereas variation in the helices is predominantly limited to transitional differences with compensatory substitutions.
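The distinction between transitional and transversional differences used above can be made concrete with a short counting routine. This is a minimal sketch under stated assumptions: the function name `count_ts_tv` and the toy sequences are hypothetical, and the routine simply compares two pre-aligned sequences without any correction for multiple substitutions.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_ts_tv(seq_a, seq_b):
    """Count transitions and transversions between two aligned sequences.

    Transitions are purine-purine (A<->G) or pyrimidine-pyrimidine (C<->T)
    differences; all other base differences are transversions. Sites with a
    gap or an ambiguous character in either sequence are skipped, and U is
    treated as T so RNA sequences can be compared as well.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    seq_a = seq_a.upper().replace("U", "T")
    seq_b = seq_b.upper().replace("U", "T")
    transitions = transversions = 0
    for a, b in zip(seq_a, seq_b):
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    return transitions, transversions


# Hypothetical toy alignment: one A/G transition, one G/T transversion,
# and one gapped site that is ignored.
print(count_ts_tv("ACGT-A", "GCTTAA"))
```

Tallying these counts separately for stem and loop sites of a secondary structure model would give the kind of comparison described in the text.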

Conirol Region

The control region is a non-coding sequence of variable length, usually in the order of

1,100 bp in birds (Baker and Marshall 1997). It contains the origin of replication for the H strand as well as the promoter for the transcription of both the H and L strands. The control region is bounded by the RNA=" and ~RNA'~in the ratites, and by ~RNA'" and ~RNA'~in the elegant crested tinarnou. The region can be subdivided into three domains: a variable 5' and 3' flanking regions (L-strand) flanking a conserved central core. Repeat motifs in the 3' region are present in al1 of the paleognath sequences except ostrich. There are also monobasic strings at the beginning of the 5' and 3' regions. Paleognath 12s rRNA

Figure 2.7. Secondary structure model for the paleognath 12S rRNA molecule (variation across sites).

Levels of Variation between Genes

The mitochondrial genes code for protein and ribosomal subunits that are under different functional constraints, and as a result they exhibit different rates of substitution. The levels of variation have traditionally been represented by pairwise comparisons of percentage sequence divergence. With several genomes this becomes more cumbersome to represent. To provide a rough measure of the overall sequence divergence among genes, the average maximum likelihood distances under the GTR model of substitution, for all nucleotide positions in the RNA-coding genes and for first and second codon positions in the protein-coding genes, were calculated for all possible pairwise comparisons for each gene among the paleognathous birds. This was repeated for the predicted amino acids using the mtREV model (Adachi and Hasegawa 1996b). The genes are listed in Table 2.5 from the slowest evolving to the fastest. The pattern of variation between the genes is similar to that observed in the mitochondrial genomes of other vertebrates. The transfer RNAs are the most conserved, followed by the ribosomal RNA genes and then the protein-coding genes. Among the protein-coding genes, there is a slight discrepancy in the order of conservation depending on whether the nucleotide sequences or the amino acid sequences are used. For both nucleotides and amino acids, the cytochrome oxidase genes are the most conserved, followed by cytochrome b, several ND genes, ATPase 6, the remaining ND genes and finally ATPase 8. Regions of sequence overlap between genes show the highest level of conservation, with no variation at these sites within the paleognathous birds or between paleognathous birds and chicken. This conservation is expected, as these sites are under the combined functional constraints of two genes.

Table 2.5. Levels of variation between genes using pairwise comparisons among the paleognathous birds.

Gene            Mean corrected ML distance (mean ± S.D.)
                Nucleotide        Amino acid
22 tRNAs*
16S rRNA*
12S rRNA*
CO I
CO II
CO III
Cytochrome b
ND 1
ND 3
ND 4L
ATPase 6
ND 4
ND 5
ND 2
ND 6
ATPase 8

*Ribosomal and tRNA sequences were aligned and analysed with gaps and ambiguous sites removed.

The variation observed in the protein-coding genes predominantly involved single base substitutions. Insertion/deletion (indel) mutations were limited to a six-base (or two-amino acid) deletion near the 5' end of the ND1 gene in tinamou relative to the ratites, a three-base or single amino acid deletion at the 3' terminal end of ND5 in the ratites (with the great spotted kiwi having an additional amino acid deletion at this site), and a deletion of three amino acids in the great spotted kiwi at the 3' end of ND4. At the sequence level, variation across the protein-coding genes follows the pattern of third codon positions having the greatest level of variation, followed by first positions and then second positions. The ratio of the observed variation in base substitution among the codon positions was 2.9:1:12 (first:second:third), similar to that found in other vertebrate genomes (Árnason et al. 1993).
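As a rough illustration of how variation can be tallied by codon position, the sketch below counts variable alignment columns at first, second and third positions. It is a simplified stand-in (it counts variable sites rather than inferred substitutions), and the function name and toy sequences are hypothetical, not taken from this study.

```python
def variable_sites_by_codon_position(aligned_seqs):
    """Count alignment columns holding more than one base, per codon position.

    Assumes gap-free, in-frame coding sequences of equal length.
    """
    length = len(aligned_seqs[0])
    if length % 3 or any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned, in frame and of equal length")
    counts = {1: 0, 2: 0, 3: 0}
    for i in range(length):
        column = {s[i] for s in aligned_seqs}
        if len(column) > 1:           # more than one base observed: variable site
            counts[i % 3 + 1] += 1    # codon position 1, 2 or 3
    return counts


# Hypothetical toy alignment (two codons, three sequences).
toy = ["ATGGCA", "ATGGCC", "ATAGTC"]
print(variable_sites_by_codon_position(toy))
```

Dividing the three counts by the second-position count would give a first:second:third ratio comparable in form to the one reported above.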

Discussion

Genome Organization

The mitochondrial genome sequences I obtained from the nine paleognathous birds share many of the features observed in other genome sequencing projects. They are similar in size to other vertebrate genomes. The ratites share the unique avian mitochondrial gene order first described in the chicken. Because the paleognaths have traditionally been considered the most basal of the avian lineages, the observation of a rearrangement in the gene order of the tinamou was initially puzzling. Superficially, elements of the tinamou gene order resemble those found for the majority of vertebrates, with cytochrome b, tRNA-Thr and tRNA-Pro adjacent to the control region. If this order was a remnant of an earlier or intermediate gene order in birds, the basal position of the tinamou in the paleognaths would imply that the avian gene order arose independently in the ratites and chicken. The sequencing of other tinamou species demonstrated that the gene order in Eudromia elegans was unique and likely a derived feature in this species. This finding was further supported by the publications of Mindell et al. (1998) and Bensch and Härlid (2000) describing the independent translocation of ND6 around the control region in many avian orders. From these observations it is clear that the genes adjacent to the control region are prone to translocation, the tinamou being an additional example of this type of rearrangement. The labile nature of the gene order surrounding the control region may also explain the unusually large and heterogeneous spacer DNA located between the two tRNAs adjacent to ND6 (or the control region in the case of the elegant crested tinamou). The independent origins of these similar gene rearrangements cast doubt on their utility in phylogenetic studies.

Codon Usage and Sequence Composition

The genomes of paleognaths are similar to those of other birds and vertebrates generally in codon usage, and in base and amino acid composition. The bias against G in third codon positions, and the corresponding reduction in codon usage, is strongly evident, as is the higher than expected abundance of T in second codon positions. There is considerable conservation in the amino acid composition encoded in the protein-coding genes. Hydrophobic and weakly hydrophobic residues constitute the majority of these amino acids. This observation is consistent with the expected requirement for hydrophobic (L, I, C, M, V, F, Y, W) or weakly hydrophobic (P, A, G, S, T) amino acids in membrane-bound proteins (Attardi et al. 1986). The strong bias toward hydrophobic residues has a correlated effect on the nucleotide base composition of the protein-coding genes. Since 24 of the 26 triplets that encode hydrophobic residues have the sequence NYN (Naylor et al. 1995), the high frequency of hydrophobic residues results in a correspondingly high frequency of pyrimidines in second codon positions. The combined frequency of T and C at this position averages 70 % in paleognath birds. The great abundance of leucine (mean value 18 %), which is coded by YTN, also contributes to the high frequency of T in second codon positions. Levels of variation across genes conform with the pattern seen in other vertebrate mitochondrial genomes, indicating that the paleognaths are experiencing similar functional constraints.
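The second-position pyrimidine frequency discussed above can be computed directly from a coding sequence. The following is a minimal sketch under the assumption of an in-frame, gap-free sequence; the function name and the toy input are hypothetical and are not data from this study.

```python
def second_position_pyrimidine_fraction(cds):
    """Fraction of complete codons whose second base is a pyrimidine (C or T)."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3      # ignore a trailing partial codon
    second_bases = cds[1:usable:3]        # second base of every complete codon
    if not second_bases:
        raise ValueError("sequence contains no complete codon")
    pyrimidines = sum(base in "CT" for base in second_bases)
    return pyrimidines / len(second_bases)


# Hypothetical example: codons ATG, CTT, GCA have second bases T, T, C.
print(second_position_pyrimidine_fraction("ATGCTTGCA"))
```

Applied to each concatenated protein-coding alignment, a value near 0.70 would correspond to the average reported for the paleognaths.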

RNA Editing

The single example of potential RNA editing found in ND3 in the paleognath genomes has been demonstrated to be widespread in birds, particularly nonpasserines (Appendix 2.3), and has been reported in a turtle genome (Zardoya and Meyer 1999). Evidence that this single base is not read during the translation process can be seen indirectly by comparing variation at the three codon positions. Degeneracy in the genetic code results in third positions being the most variable of the codon positions, with first positions being less variable and second positions the most conserved. In the paleognathous ND3 gene sequences, this pattern of variation can be seen across the full length of the gene only if this single base at position 174 is removed. Furthermore, attaining a full-length protein product that is similar in size and amino acid composition to that found in other vertebrates again relies upon this base being edited out.

One final argument in favour of a post-transcriptional modification is that if the gene product were shorter, then the sequence beyond the termination codon would not be under functional constraints and would thus be free to vary. This was not observed when these sites were compared across several orders of birds (Haddrath and Baker unpublished). Similar post-transcriptional modifications have been described in invertebrates (Alfonzo et al. 1997) and in a mitochondrial tRNA anticodon (Janke and Pääbo 1993).
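The reading-frame argument can be illustrated with a small sketch: an extra base shifts every downstream codon and may introduce a premature stop, while removing it restores the frame. The sequences below are hypothetical toy strings, not ND3 data, and the helper names are assumptions; the stop codons are those of the vertebrate mitochondrial genetic code.

```python
# Stop codons of the vertebrate mitochondrial genetic code.
MITO_STOPS = {"TAA", "TAG", "AGA", "AGG"}


def remove_base(seq, position):
    """Return seq with the base at a 1-based position removed."""
    if not 1 <= position <= len(seq):
        raise ValueError("position outside the sequence")
    return seq[:position - 1] + seq[position:]


def first_stop_index(seq):
    """Index (0-based, in codons) of the first in-frame stop, or None."""
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3].upper() in MITO_STOPS:
            return i // 3
    return None


# Hypothetical toy gene with one extra base (a C) at position 7.
unedited = "ATGGCCCTTAGGA"           # read as ATG GCC CTT AGG: premature stop
edited = remove_base(unedited, 7)    # ATG GCC TTA GGA: frame restored
print(first_stop_index(unedited), first_stop_index(edited))
```

For the real gene, the same comparison would be made with the extra base at position 174, checking that only the edited frame yields a full-length product.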

In summary, the nine paleognathous bird genomes sequenced here exhibited the same constraints and patterns of variation observed in other avian and non-avian genomes. No features unique to paleognaths that might provide insight into their evolutionary history were uncovered. The two unique properties discovered, the potential RNA editing site and the gene rearrangement in the tinamou, have subsequently been found to be widespread in birds and also occur in other vertebrates.

References

Adachi, J., Cao, Y. and Hasegawa, M. 1993. Tempo and mode of mitochondrial DNA evolution in vertebrates at the amino acid sequence level: rapid evolution in warm-blooded vertebrates. J. Mol. Evol. 36:270-281.

Adachi, J. and Hasegawa, M. 1996a. MOLPHY, version 2.3: Programs for molecular phylogenetics based on maximum-likelihood. Comput. Sci. Monogr. No. 28. Institute of Statistical Mathematics, Tokyo.

Adachi, J. and Hasegawa, M. 1996b. Model of amino acid substitution in proteins encoded by mitochondrial DNA. J. Mol. Evol. 42:459-468.

Alfonzo, J. D., Thiemann, O. and Simpson, L. 1997. The mechanism of U insertion/deletion RNA editing in kinetoplastid mitochondria. Nucleic Acids Res. 25:3751-3759.

Anderson, S., Bankier, A. T., Barrell, B. G., de Bruijn, M. H. L., Coulson, A. R., Drouin, J., Eperon, I. C., Nierlich, D. P., Roe, B. A., Sanger, F., Schreier, P. H., Smith, A. J. H., Staden, R. and Young, I. G. 1981. Sequence and organization of the human mitochondrial genome. Nature 290:457-464.

Anderson, S., de Bruijn, M. H. L., Coulson, A. R., Eperon, I. C., Sanger, F. and Young, I. G. 1982. Complete sequence of bovine mitochondrial DNA. J. Mol. Biol. 156:683-717.

Árnason, U., Gullberg, A., Johnsson, E. and Ledje, C. 1993. The nucleotide sequence of the mitochondrial DNA molecule of the grey seal, Halichoerus grypus, and a comparison with mitochondrial sequences of other true seals. J. Mol. Evol. 37:323-330.

Attardi, G., Chomyn, A., Doolittle, R. F., Mariottini, P. and Ragan, C. I. 1986. Seven unidentified reading frames of human mitochondrial DNA encode subunits of the respiratory chain NADH dehydrogenase. Cold Spring Harbor Symp. Quant. Biol. 51:103-114.

Baker, A. J. and Marshall, H. D. 1997. Mitochondrial control region sequences as tools for understanding evolution. In "Avian Molecular Evolution and Systematics" (D. P. Mindell, Ed.), Academic Press, San Diego, U.S.A. pp 351-79.

Bensch, S. and Härlid, A. 2000. Mitochondrial genomic rearrangements in songbirds. Mol. Biol. Evol. 17:107-113.

Bibb, M. J., Van Etten, R. A., Wright, C. T., Walberg, M. W. and Clayton, D. A. 1981. Sequence and gene organization of mouse mitochondrial DNA. Cell 26:167-180.

Cabot, E. 1997. XESEE 3.1. Eyeball Sequence Editor.

Cao, Y., Adachi, J., Janke, A., Pääbo, S. and Hasegawa, M. 1994. Phylogenetic relationships among eutherian orders estimated from inferred sequences of mitochondrial proteins: instability of a tree based on a single gene. J. Mol. Evol. 39:519-527.

Carr, S. M. and Griffith, O. M. 1987. Rapid isolation of mitochondrial DNA in a small fixed-angle rotor at ultrahigh speed. Biochem. Genet. 25:389-391.

Cooper, A. 1997. Studies of avian ancient DNA: from Jurassic Park to modern island extinctions. In "Avian Molecular Evolution and Systematics" (D. P. Mindell, Ed.), Academic Press, San Diego, U.S.A. pp 345-373.

Cooper, A., Mourer-Chauviré, C., Chambers, G. K., von Haeseler, A., Wilson, A. C. and Pääbo, S. 1992. Independent origins of New Zealand moas and kiwis. Proc. Natl. Acad. Sci. U.S.A. 89:8741-8744.

Cummings, M. P., Otto, S. P. and Wakeley, J. 1995. Sampling properties of DNA sequence data in phylogenetic analysis. Mol. Biol. Evol. 12:814-822.

Dean, A. D. and Greenwald, J. E. 1995. Use of filtered pipet tips to elute DNA from agarose gels. Biotechniques 18:980.

De Rijk, P., Wuyts, J., Van de Peer, Y., Winkelmans, T. and De Wachter, R. 2000. The European Large Subunit Ribosomal RNA database. Nucleic Acids Res. 28:177-178.

Desjardins, P. and Morais, R. 1990. Sequence and gene organization of the chicken mitochondrial genome: a novel gene order in higher vertebrates. J. Mol. Biol. 212:599-634.

Foury, F., Roganti, T., Lecrenier, N. and Purnelle, B. 1998. The complete sequence of the mitochondrial genome of Saccharomyces cerevisiae. FEBS Lett. (3):325-331.

Gadaleta, G., Pepe, G., De Candia, G., Quagliariello, C., Sbisà, E. and Saccone, C. 1989. The complete nucleotide sequence of the Rattus norvegicus mitochondrial genome: cryptic signals revealed by comparative analysis between vertebrates. J. Mol. Evol. 28:497-516.

Gutell, R. R. 1994. Collection of small subunit (16S- and 16S-like) ribosomal RNA structures: 1994. Nucleic Acids Res. 22:3502-3507.

Hagelberg, E. 1994. Mitochondrial DNA from ancient bones. In "Ancient DNA" (B. Herrmann and S. Hummel, Eds.), Springer-Verlag, New York, U.S.A.

Härlid, A., Janke, A. and Árnason, U. 1997. The mtDNA sequence of the ostrich and the divergence between paleognathous and neognathous birds. Mol. Biol. Evol. 14:754-761.

Härlid, A., Janke, A. and Árnason, U. 1998. The complete mitochondrial genome of Rhea americana and early avian divergences. J. Mol. Evol. 46:669-679.

Härlid, A. and Árnason, U. 1999. Analyses of mitochondrial DNA nest ratite birds within the Neognathae: supporting a neotenous origin of ratite morphological characters. Proc. R. Soc. Lond. B Biol. Sci. 266:305-309.

Hasegawa, M. and Kishino, H. 1989. Heterogeneity of tempo and mode of mitochondrial DNA evolution among mammalian orders. Jpn. J. Genet. 64:243-258.

Hickson, R. E., Simon, C., Cooper, A., Spicer, G. S., Sullivan, J. and Penny, D. 1996. Conserved sequence motifs, alignment, and secondary structure for the third domain of animal 12S rRNA. Mol. Biol. Evol. 13:150-169.

Irwin, D., Kocher, T. D. and Wilson, A. C. 1991. Evolution of the cytochrome b gene of mammals. J. Mol. Evol. 32:128-144.

Janke, A. and Pääbo, S. 1993. Editing of a tRNA anticodon in marsupial mitochondria changes its codon recognition. Nucleic Acids Res. 21:1523-1525.

Janke, A. and Árnason, U. 1997. The complete mitochondrial genome of Alligator mississippiensis and the separation between recent archosauria (birds and crocodiles). Mol. Biol. Evol. 14:1266-1272.

Kubo, T., Nishizawa, S., Sugawara, A., Itchoda, N., Estiati, A. and Mikami, T. 2000. The complete nucleotide sequence of the mitochondrial genome of sugar beet (Beta vulgaris L.) reveals a novel gene for tRNA(Cys)(GCA). Nucleic Acids Res. 28:2571-2576.

Kumar, S., Tamura, K. and Nei, M. 1993. MEGA, Molecular Evolutionary Genetics Analysis, Version 1.0 (Pennsylvania State Univ., University Park).

Kumazawa, Y. and Nishida, M. 1993. Sequence evolution of mitochondrial tRNA genes and deep-branch animal phylogenetics. J. Mol. Evol. 37:380-398.

Lee, K., Feinstein, J. and Cracraft, J. 1997. The phylogeny of ratite birds: resolving conflicts between molecular and morphological data sets. Avian Molecular Evolution and Systematics, Academic Press, Toronto pp 173-209.

Mindell, D. P., Sorenson, M. D. and Dimcheff, D. E. 1998. Multiple independent origins of mitochondrial gene orders in birds. Proc. Natl. Acad. Sci. U.S.A. 95:10693-10697.

Mindell, D. P., Sorenson, M. D., Dimcheff, D. E., Hasegawa, M., Ast, J. C. and Yuri, T. 1999. Interordinal relationships of birds and other reptiles based on whole mitochondrial genomes. Syst. Biol. 48:138-152.

Naylor, G. J. P., Collins, T. M. and Brown, W. M. 1995. Hydrophobicity and phylogeny. Nature 373:565-566.

Ojala, D., Merkel, C., Gelfand, R. and Attardi, G. 1980. The tRNA genes punctuate the reading of genetic information in human mitochondrial DNA. Cell 22:393-403.

Prager, E. M., Wilson, A. C., Osuga, D. T. and Feeney, R. E. 1976. Evolution of flightless land birds on southern continents: transferrin comparison shows monophyletic origin of ratites. J. Mol. Evol. 8:283-294.

Roe, B. A., Ma, D. P., Wilson, R. K. and Wong, J. F. 1985. The complete nucleotide sequence of the Xenopus laevis mitochondrial genome. J. Biol. Chem. 260:9759-9774.

Russo, C. A., Takezaki, N. and Nei, M. 1996. Efficiencies of different genes and different tree-building methods in recovering a known vertebrate phylogeny. Mol. Biol. Evol. 13:525-536.

Sambrook, J., Fritsch, E. F. and Maniatis, T. 1989. Molecular Cloning: A Laboratory Manual, 2nd ed. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York.

Sankoff, D., Leduc, G., Antoine, N., Paquin, B., Lang, B. F. and Cedergren, R. 1992. Gene order for phylogenetic inference: evolution of the mitochondrial genome. Proc. Natl. Acad. Sci. U.S.A. 89:6575-6579.

Sibley, C. G. and Ahlquist, J. E. 1981. The phylogeny and relationships of the ratite birds as indicated by DNA-DNA hybridization. In "Evolution Today, Proc. 2nd Congr. for Systematic Evolutionary Biology" (G. G. E. Scudder and L. L. Reveal, Eds.). Hunt Inst. Botanic Document, Pittsburgh, pp 301-335.

Sibley, C. G. and Ahlquist, J. E. 1990. Phylogeny and classification of birds. Yale Univ. Press, New Haven, Connecticut.

Sullivan, J., Holsinger, K. E. and Simon, C. 1995. Among-site rate variation and phylogenetic analysis of 12S rRNA in sigmodontine rodents. Mol. Biol. Evol. 12:988-1001.

Swofford, D. L. 1998. PAUP*: Phylogenetic analysis using parsimony (* and other methods), version 4.0.b2. Sinauer, Sunderland, Mass.

Takezaki, N. and Gojobori, T. 1999. Correct and incorrect vertebrate phylogenies obtained by the entire mitochondrial DNA sequences. Mol. Biol. Evol. 16:590-601.

Thompson, J. D., Gibson, T. J., Plewniak, F., Jeanmougin, F. and Higgins, D. G. 1997. The Clustal X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 24:4876-4882.

Van de Peer, Y., Van den Broeck, I., De Rijk, P. and De Wachter, R. 1994. Database on the structure of small ribosomal subunit RNA. Nucleic Acids Res. 22:3488-3494.

van Tuinen, M., Sibley, C. G. and Hedges, S. B. 1998. Phylogeny and biogeography of ratite birds inferred from DNA sequences of the mitochondrial ribosomal genes. Mol. Biol. Evol. 15:370-376.

van Tuinen, M., Sibley, C. G. and Hedges, S. B. 2000. The early history of modern birds inferred from DNA sequences of nuclear and mitochondrial ribosomal genes. Mol. Biol. Evol. 17:451-457.

Wilson, A. C., Cann, R. L., Carr, S. M., George, M., Gyllensten, U. B., Helm-Bychowski, K. M., Higuchi, R. G., Palumbi, S. R., Prager, E. M., Sage, R. D. and Stoneking, M. 1985. Mitochondrial DNA and two perspectives on evolutionary genetics. Biol. J. Linn. Soc. 26:375-400.

Zardoya, R. and Meyer, A. 1999. Complete mitochondrial genome suggests diapsid affinities of turtles. Proc. Natl. Acad. Sci. U.S.A. 95:14226-14231.

Chapter 3

Patterns and Rates of Nucleotide and Amino Acid Substitution

in the Mitochondrial Genes of Paleognathous Birds

Over the last decade, the use of nucleotide and amino acid sequences to make inferences about phylogenetic relationships and to estimate divergence times has become commonplace. The reliability of these estimates, however, depends on an accurate understanding of the probability of substitution among the four nucleotides, as well as corrections for variation in rates of substitution among sites (Lockhart et al. 1994; Yang 1994; Rzhetsky and Nei 1995; Yang and Kumar 1996; Takezaki and Gojobori 1999).

The pattern of nucleotide substitution within and among taxa is influenced by several factors such as the substitutional biases among the four nucleotides, and the effects of purifying selection and directional mutation pressure (Sueoka 1988; Jermiin et al. 1994, 1996). The bias favouring transitional over transversional nucleotide substitutions in mitochondrial DNA is possibly the best-known example of a substitutional bias (Brown and Simpson 1982; Brown et al. 1982). However, even within transitional or transversional substitution pathways not all substitutions are necessarily equiprobable. Preferences have been observed for certain kinds of base changes (Kumar 1996). Since many genes code for functional units, purifying selection also adds constraints which influence nucleotide selection. Substitutional biases that are unique to a lineage can ultimately lead to an overall bias in its base composition. Sueoka (1962) first described this "directional mutation pressure", a phenomenon which leads to an overall increase in G+C or A+T content in a sequence. Directional mutation pressure has even been suggested to have a role in shaping amino acid composition (Jermiin et al. 1994). The identification of base compositional biases is crucial, as most phylogenetic methods assume uniformity in the base compositions of the sequences being examined. Analyses of sequences whose composition significantly deviates from the average can result in tree topologies that reflect compositional similarities and not necessarily phylogenetic relationships (Lockhart et al. 1994; Galtier and Gouy 1995; Foster and Hickey 1999; Xia 2000).

Rate variation among sites is also a common and important phenomenon of sequence evolution and must be incorporated in models of sequence evolution to accurately estimate branch lengths and transition/transversion ratios (Gillespie 1986; Takahata 1991). Variation in the rate of substitution between sites can result in different sites having different phylogenetic utility. Invariable sites are uninformative, while fast-evolving sites may be prone to saturation effects, especially when estimating deep-branch phylogenies. Third codon position transitions, for example, are often excluded from phylogenetic analyses of groups whose age is greater than several million years, due to the likelihood that their high rate of substitution has left them saturated with multiple substitutions (Irwin et al. 1991). Variation in the rates of substitution among sites is not limited to the three codon positions, but also exists among codons due to selectional differences at the amino acid level. If all sites were able to change at the same rate, the number of substitutions occurring per site for a series of sequences would approximate a Poisson distribution. Early work by Fitch and Margoliash (1967) with cytochrome c demonstrated that the substitution pattern did not fit a Poisson distribution without the removal of invariable and hyper-variable sites. The presence of sites that are free to vary or are variously constrained is the hallmark of a sequence which codes for a protein product that is under functional constraints. When the rate of substitution does vary among sites, the substitutional pattern can be characterized by a gamma distribution whose shape parameter is defined as α. As the value of α approaches infinity, rate variation between sites approaches equality. Values of α less than one indicate strong among-site rate variation, where many sites remain invariable and a minority of sites are free to vary (see review, Yang 1996). Kumar (1996) examined the variation in substitution rates among sites for the mitochondrial genes of 11 divergent taxa representing most of the major vertebrate lineages. The values of α for the mitochondrial genes he examined were all less than one, indicating substantial rate variation among sites.
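The effect of the gamma shape parameter can be seen in a small simulation. The sketch below draws per-site relative rates from a gamma distribution with mean one and compares a small and a large shape value; the chosen values (0.3 and 50), the 0.1 threshold and the function names are illustrative assumptions only.

```python
import random
import statistics


def simulate_site_rates(alpha, n_sites=20000, seed=1):
    """Draw relative substitution rates from a gamma distribution with mean 1.

    random.gammavariate takes (shape, scale); a scale of 1/alpha fixes the
    mean at 1, so the variance of the rates is 1/alpha.
    """
    rng = random.Random(seed)
    return [rng.gammavariate(alpha, 1.0 / alpha) for _ in range(n_sites)]


def fraction_nearly_invariable(rates, threshold=0.1):
    """Share of sites evolving at less than `threshold` times the mean rate."""
    return sum(r < threshold for r in rates) / len(rates)


for alpha in (0.3, 50.0):   # illustrative shape values, not estimates
    rates = simulate_site_rates(alpha)
    print(alpha,
          round(statistics.pvariance(rates), 2),
          round(fraction_nearly_invariable(rates), 3))
```

With a shape value below one, a large share of sites is nearly invariable while a few evolve very quickly; with a large shape value, the rates cluster tightly around the mean.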

In this chapter, the sequences of the mitochondrial genes of the paleognathous birds are examined to estimate the pattern of nucleotide substitution and rate variation among sites. The goal is to choose the appropriate sites and model of evolution, which will then be used in the following chapter for phylogenetic reconstruction and the calculation of dates of divergence of taxa.

DNA Isolation, Amplification and Sequencing

Details of the samples and of the DNA isolation, amplification and sequencing procedures are as presented previously in the Materials and Methods in Chapter 2.

Data Analysis

For comparative analyses, in addition to the nine complete mtDNA genomes of paleognathous birds sequenced in this study, 22 additional mtDNA genome sequences were downloaded from the GenBank and EMBL databases. The data set included the chicken Gallus gallus GenBank Accession Number X52392 (Desjardins and Morais 1990), rook Corvus frugilegus Y18522 (Härlid and Árnason 1999), redheaded duck Aythya americana AF090337, peregrine falcon Falco peregrinus AF090338, grey-headed broadbill Smithornis sharpei AF090340, village indigobird Vidua chalybeata AF090341 (Mindell et al. 1999), alligator Alligator mississippiensis Y13113 (Janke and Árnason 1997), turtle Pelomedusa subrufa AF039066 (Zardoya and Meyer 1998), blue whale Balaenoptera musculus X72204 (Árnason and Gullberg 1993), cat Felis catus U20753 (Lopez et al. 1996), cow Bos taurus J01394 (Anderson et al. 1982), dog Canis familiaris U96639 (Kim et al. 1998), hedgehog Erinaceus europaeus X88898 (Krettek et al. 1999), hippopotamus Hippopotamus amphibius AJ010957 (Ursing and Árnason 1998), horse Equus caballus X79547 (Xu and Árnason 1994), human Homo sapiens M12548 (Anderson et al. 1981), mouse Mus musculus J01420 (Bibb et al. 1981), rabbit Oryctolagus cuniculus AJ001588 (Gissi et al. 1998), rhinoceros Ceratotherium simum Y07726 (Xu and Árnason 1997), grey seal Halichoerus grypus X72004 (Árnason et al. 1993), wallaroo Macropus robustus Y10524 (Janke et al. 1997) and frog Xenopus laevis X02890 (Roe et al. 1985).

For each of the protein-coding genes, the amino acid sequences predicted from the nucleotide sequences were aligned using the default options in Clustal X (Thompson et al. 1997). Nucleotide sequences were then made to conform to this alignment. Sites containing gaps were eliminated and analysis was carried out at the nucleotide and amino acid levels. For most of the tests done here, only 1st and 2nd codon positions were used, as 3rd codon positions were observed to have high levels of sequence divergence and thus were likely saturated with multiple transitional substitutions. For tests using the concatenated protein-coding genes, ND6 was excluded, as it is encoded on the L-strand and may obscure the patterns of substitution due to the added complication of asymmetrical (or strand-specific) directional mutation pressure. Ribosomal RNA (rRNA) and transfer RNA (tRNA) genes were also aligned using the program Clustal X, and the alignments were improved upon using models of secondary structure. Gaps and sites where the alignment was ambiguous were removed (see Appendix 3.1 for aligned sequences). For the tRNA genes, the sequences were subdivided into regions containing loops and those containing stems, and examined separately.
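The steps of conforming nucleotide sequences to an amino acid alignment, removing gapped sites and retaining 1st and 2nd codon positions can be sketched as follows. This is a minimal illustration assuming in-frame coding sequences and an existing protein alignment; the function names and toy sequences are hypothetical, and it does not reproduce the Clustal X step itself.

```python
def thread_codons(aligned_protein, cds):
    """Place codons from cds under an aligned protein, '---' under each gap."""
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    residues = sum(aa != "-" for aa in aligned_protein)
    if len(cds) % 3 or residues != len(codons):
        raise ValueError("coding sequence does not match the aligned protein")
    codon_iter = iter(codons)
    return "".join("---" if aa == "-" else next(codon_iter)
                   for aa in aligned_protein)


def drop_gap_columns(aligned_seqs):
    """Remove every alignment column in which any sequence has a gap."""
    keep = [i for i in range(len(aligned_seqs[0]))
            if all(seq[i] != "-" for seq in aligned_seqs)]
    return ["".join(seq[i] for i in keep) for seq in aligned_seqs]


def first_and_second_positions(codon_alignment):
    """Keep codon positions 1 and 2, discarding every third position."""
    return "".join(codon_alignment[i:i + 2]
                   for i in range(0, len(codon_alignment), 3))


# Hypothetical toy data: two proteins aligned with one gap in the first.
nuc = [thread_codons("M-A", "ATGGCA"), thread_codons("MKA", "ATGAAAGCC")]
ungapped = drop_gap_columns(nuc)
print([first_and_second_positions(seq) for seq in ungapped])
```

Because gaps are threaded in as whole codons, removing gapped columns keeps the remaining sequence in frame, so the codon-position filter stays valid.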

Estimation of the Pattern of Nucleotide Substitution

For the paleognathous birds, chicken, duck, alligator and turtle, the observed levels of transitional and transversional base substitutions for all the coding genes, along with the p-distance for all possible pairwise comparisons, were calculated using the program MEGA (Kumar et al. 1993). MEGA was also used to compute the observed frequencies of the six potential nucleotide substitutions for the 1st and 2nd codon positions of the protein-coding genes and for all the aligned positions of the rRNA and tRNA genes. The program DAMBE (version 3.7; Xia 2000) was used to perform a t-test of whether the observed substitution saturation differed from half of full substitution saturation.
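For a single pair of aligned sequences, the quantities described above (observed transitions, transversions and the p-distance) can be computed with a few lines of code. The sketch below is an illustration only, not the MEGA implementation; the function name and toy sequences are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def pairwise_counts(seq_a, seq_b):
    """Return (transitions, transversions, p-distance) for two aligned sequences.

    Columns containing anything other than A, C, G or T are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue                      # gap or ambiguity code
        compared += 1
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1              # A<->G or C<->T
        else:
            transversions += 1            # purine <-> pyrimidine
    p_distance = (transitions + transversions) / compared if compared else 0.0
    return transitions, transversions, p_distance


# Hypothetical toy pair: two transitions and two transversions in eight sites.
print(pairwise_counts("AACCGGTT", "AGCTGCTA"))
```

Plotting the two counts against the p-distance for every pair of taxa gives the kind of saturation plot described in the Results: counts that stop rising linearly with distance suggest multiple substitutions at the same sites.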

Symmetrical directional mutation pressure was estimated for the protein-coding genes of the paleognathous birds, as well as for 22 additional mitochondrial genomes, using the program DMP (version 2.0; Jermiin et al. 1996), which examines G+C content at synonymous and nonsynonymous sites, and also using Martin's (1995) methodology, which compares base frequencies at fourfold degenerate sites.
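Base frequencies at fourfold degenerate sites can be tallied directly from a coding sequence. The sketch below is a simplified illustration of that idea, not a reimplementation of DMP or of Martin's (1995) method; it assumes an in-frame sequence and uses the eight codon families that are fourfold degenerate in the vertebrate mitochondrial code. The function name and toy input are hypothetical.

```python
# First two bases of the codon families in which any third base encodes the
# same amino acid (Leu CTN, Val GTN, Ser TCN, Pro CCN, Thr ACN, Ala GCN,
# Arg CGN, Gly GGN).
FOURFOLD_PREFIXES = {"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG"}


def fourfold_base_frequencies(cds):
    """Base frequencies at third positions of fourfold degenerate codons."""
    cds = cds.upper()
    counts = {base: 0 for base in "ACGT"}
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3]
        if codon[:2] in FOURFOLD_PREFIXES and codon[2] in counts:
            counts[codon[2]] += 1
    total = sum(counts.values())
    if total == 0:
        return {base: 0.0 for base in counts}
    return {base: n / total for base, n in counts.items()}


# Hypothetical toy sequence: CTA, GTC and GGG are fourfold degenerate; ATG is not.
freqs = fourfold_base_frequencies("CTAGTCATGGGG")
print(freqs, "G+C =", freqs["G"] + freqs["C"])
```

Because changes at these sites are synonymous, their base composition is often taken to reflect mutational bias more directly than composition at constrained sites.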

Tests for the stationarity of the nucleotide and amino acid compositions among the avian sequences were carried out using the program PUZZLE (version 4.0.2; Strimmer and von Haeseler 1996). This program performed a χ² test of whether the composition of each sequence was identical to the average composition for all the sequences. In addition to testing for violations of stationarity across all sites, a more stringent test using only the variable positions was also performed (Waddell et al. 1999; van Tuinen et al. 2000).
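The logic of a compositional χ² test can be sketched as below: each sequence's base counts are compared with the counts expected from the average composition of the data set. This is an illustration in the spirit of the test described above, not the PUZZLE code; the function name and toy sequences are hypothetical, and 7.815 is the standard 5% critical value of χ² with 3 degrees of freedom.

```python
CRITICAL_5PCT_DF3 = 7.815   # chi-square critical value, 3 d.f., alpha = 0.05


def composition_chi_square(seqs):
    """Chi-square statistic of each sequence against the average composition."""
    bases = "ACGT"
    counts = [[seq.upper().count(b) for b in bases] for seq in seqs]
    totals = [sum(row) for row in counts]
    grand_total = sum(totals)
    average = [sum(row[j] for row in counts) / grand_total for j in range(4)]
    statistics_per_seq = []
    for row, n in zip(counts, totals):
        stat = sum((row[j] - n * average[j]) ** 2 / (n * average[j])
                   for j in range(4) if average[j] > 0)
        statistics_per_seq.append(stat)
    return statistics_per_seq


# Hypothetical toy data: two balanced sequences and one made only of A.
toy = ["ACGT" * 25, "ACGT" * 25, "A" * 100]
for stat in composition_chi_square(toy):
    print(round(stat, 2), "deviates" if stat > CRITICAL_5PCT_DF3 else "homogeneous")
```

Restricting the input to variable alignment columns, as in the more stringent test mentioned above, only requires filtering the columns before calling the function.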

Substitutional Rate Variation among Sites and the Rate Constancy Test

Estimates were made of the among-site rate variation for the paleognath rRNA and tRNA genes and for each of the protein-coding genes (1st and 2nd codon positions combined, as well as each codon position individually). Substitution rate variation was assumed to follow a gamma distribution with a shape parameter of α. Due to computational restrictions, a discrete gamma model using eight categories was used in a maximum likelihood analysis. The program PAML (version 2.0; Yang 1997) with the general time reversible model (GTR; Yang 1994; Adachi and Hasegawa 1996a) was used to estimate α and likelihood values for the nucleotides. For amino acids, PUZZLE was used with the mtREV model of amino acid substitution (Adachi and Hasegawa 1996b). To test the variability of the substitution rates among sites, the log-likelihoods of observing the data were calculated by running the analysis with (l₁) and without (l₀) the discrete gamma model. The log-likelihood ratio tests were performed with the statistic 2Δl (with Δl = l₁ − l₀) following a χ² distribution with 1 degree of freedom (Kumar 1996).
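The log-likelihood ratio test described above reduces to a short calculation once the two log-likelihoods are available. The sketch below uses the fact that, for 1 degree of freedom, the χ² upper-tail probability equals erfc(√(x/2)); the function name and the example log-likelihoods are hypothetical and are not results from this study.

```python
import math


def likelihood_ratio_test(lnl_with_gamma, lnl_without_gamma):
    """Return (2 * delta lnL, p-value) for a test with 1 degree of freedom."""
    statistic = 2.0 * (lnl_with_gamma - lnl_without_gamma)
    statistic = max(statistic, 0.0)      # guard against tiny negative values
    # Chi-square with 1 d.f. is the square of a standard normal variate, so
    # P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical log-likelihoods for one gene, with and without the gamma model.
stat, p = likelihood_ratio_test(-5230.4, -5301.9)
print(round(stat, 1), p)
```

A statistic above 3.84 rejects rate homogeneity among sites at the 5% level.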

Model of Substitution

The model assumed for amino acid substitution was mtREV, a model which has been optimized for mitochondrial protein-coding genes. The appropriate model of nucleotide substitution was estimated using the program Modeltest (version 3; Posada and Crandall 1998).

Results

Pattern of Nucleotide Substitution for the Protein-Coding, rRNA and tRNA Genes

As in other vertebrate genomes, transitional substitutions are occurring at a higher rate than transversional substitutions in the 12 H-strand encoded protein-coding genes for the avian, alligator and turtle genomes (Figure 3.1). The linear relationship between the number of observed transversional substitutions and the p-distance for 1st and 2nd codon positions indicates that these sites have not become saturated with multiple substitutions. The slope of the line for 3rd codon position transversional substitutions, however, does begin to deviate from linearity for pairwise comparisons between birds and the distant outgroups of the alligator and turtle. For transitional substitutions across the three codon positions, only 2nd positions are relatively unaffected by multiple hits. At the first position in codons, however, transitional substitutions accumulate linearly with increasing distances, and transversional substitutions begin to outnumber transitional changes for any ratite/non-ratite or non-ratite/non-ratite comparisons. Third codon position transitional changes exhibit clear signs of saturation, with the curve reaching an asymptote and then declining. This is likely a result of transitional sites being slowly overwritten by transversional substitutions.

The rRNA and tRNA genes show a similar pattern of higher rates of transitional over transversional substitutions (Figures 3.2 and 3.3). For the rRNA and the stem portions of the tRNA genes, transitional and transversional changes accumulate linearly even in comparisons that include the more distal outgroups. In the tRNA loops, however, transversional changes are linear with distance, whereas transitional substitutions appear to deviate from linearity for more distal comparisons, suggesting saturation effects. Simulation studies have shown that phylogenetic information is lost when the observed level of saturation (observed H) is equal to or greater than half of full substitution saturation (Xia 2000). The results of testing whether the three codon positions and the ribosomal and transfer RNA genes are experiencing levels of substitution saturation significantly different from half of full saturation are shown in Table 3.1.
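As a worked illustration of this test, the T statistic is the difference between the observed mean and half of full saturation, divided by the standard error (as stated in the note to Table 3.1). The sketch below uses the 1st and 3rd codon position values as read from Table 3.1; the function name is hypothetical.

```python
def saturation_t(observed_mean, half_full_saturation, standard_error):
    """T statistic: (observed H - half of full saturation) / standard error."""
    return (observed_mean - half_full_saturation) / standard_error


# Values as read from Table 3.1 (column assignment follows the table as printed).
t_first = saturation_t(0.3897, 0.8989, 0.0467)   # 1st codon positions
t_third = saturation_t(1.0378, 0.7550, 0.0665)   # 3rd codon positions

# A negative T means the observed saturation lies below half of full saturation;
# a positive T means it lies above it.
print(round(t_first, 2), round(t_third, 2))
```

On these inputs the 1st codon positions fall well below the threshold, whereas the 3rd codon positions exceed it, in line with the saturation seen in Figure 3.1.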

Substitution Biases among the Four Nucleotides

The observed rates of the six kinds of nucleotide substitution for the protein-coding (1st and 2nd codon positions), rRNA and tRNA genes are shown in Table 3.2. Among transitional substitutions, T↔C changes occur at a consistently higher frequency than A↔G for all the genes. Transversional substitutions also appear to occur at uneven rates, with A↔T and A↔C substitutions favoured over G↔T and G↔C. The results are consistent with comparisons made among protein-coding sequences in other vertebrates (Kumar 1996).

Figure 3.1 The relation between the number of nucleotide substitutions and p-distance calculated from pairwise comparisons among the paleognaths, chicken, alligator and turtle for the three codon positions. (Panels: 1st, 2nd and 3rd codon positions; transitions and transversions plotted separately.)

Figure 3.2 The relation between the number of nucleotide substitutions and p-distance calculated from pairwise comparisons among paleognaths, chicken, alligator and turtle for the 12S rRNA and 16S rRNA genes. (Transitions and transversions plotted separately against p-distance.)

Figure 3.3 The relation between the number of nucleotide substitutions and p-distance calculated from pairwise comparisons among paleognaths, chicken, alligator, duck and turtle for the 22 tRNA genes. (Panels include tRNA stems.)

Unequal substitutional biases among lineages can lead to biases in nucleotide composition. Unbiased estimates of symmetrical directional mutation pressure (μD), along with the observed frequencies of G+C content for synonymous and nonsynonymous sites for the protein-coding genes of avian, alligator, turtle and 13 mammalian mitochondrial genomes, were calculated (Table 3.3; values for each individual protein-coding gene are shown in Appendix 3.2). The average value of μD for the birds examined here (0.504 ± 0.041) was higher than the average value for mammals (0.423 ± 0.065). This may suggest a slight reduction in directional mutation pressure toward A+T content in birds relative to mammals, but the ranges of the values for birds and mammals overlap considerably. Martin (1995) proposed a simpler method of examining directional mutation pressure by contrasting the base composition at fourfold degenerate sites. Figure 3.4 shows the comparison of the G+C content at these sites for the genomes. Again the mean value for birds (0.409 ± 0.052) is higher than that for mammals (0.346 ± 0.059), with a similar overlap in the ranges of the two groups. An examination of the correlation between μD and the frequency of G+C at nonsynonymous sites, to determine whether directional mutation pressure may be influencing amino acid selection, is shown in Figure 3.5. Regression analysis for both mammals and birds supports Jermiin et al.'s (1994) finding of a positive correlation between directional mutation pressure and G+C content at nonsynonymous sites. This graph also reveals a consistently higher frequency of G+C content at nonsynonymous sites in birds relative to mammals.
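The G+C content at fourfold degenerate sites can be illustrated with a short sketch. It assumes an in-frame coding sequence and relies on the eight codon families (TCN, CTN, CCN, CGN, ACN, GTN, GCN, GGN) that are fourfold degenerate in the vertebrate mitochondrial code; the function name is hypothetical and this is not necessarily the procedure used to produce Figure 3.4.

```python
# First two codon bases of the eight fourfold degenerate families
# (Ser, Leu, Pro, Arg, Thr, Val, Ala, Gly) in the vertebrate mitochondrial code.
FOURFOLD_PREFIXES = {"TC", "CT", "CC", "CG", "AC", "GT", "GC", "GG"}


def gc_at_fourfold_sites(coding_seq):
    """Return the G+C fraction at third positions of fourfold degenerate codons."""
    seq = coding_seq.upper()
    gc = total = 0
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        # Only count unambiguous third positions in fourfold degenerate families.
        if codon[:2] in FOURFOLD_PREFIXES and codon[2] in "ACGT":
            total += 1
            if codon[2] in "GC":
                gc += 1
    return gc / total if total else float("nan")
```

Because any base at these positions encodes the same amino acid, their composition is expected to reflect mutation pressure more directly than composition at constrained sites.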

Variation in G+C content at synonymous sites was also observed within birds. Among the paleognaths, the rhea was found to be experiencing the strongest G+C directional mutational pressure. The chicken and duck were the only other birds examined that were also experiencing significant G+C directional mutation pressure (see Table 3.3).

Table 3.1 Results of T-tests comparing the observed substitution saturation (observed H) with that of half of full substitution saturation.

T-test                    Protein-Coding                             rRNA     tRNA
                          1st      1st no Syn.  2nd      3rd                  Stems    Loops
Mean (observed H)         0.3897   0.3208       0.1886   1.0378     0.3838   0.3002   0.4091
Variance (observed H)     0.2148   0.1828       0.1061   0.2031     0.2378   0.1390   0.2266
Standard Error            0.0467   0.0490       0.0621   0.0665     0.0505   0.0449   0.0709
Half of Full Sub. Sat.    0.8989   0.8914       0.8397   0.7550     0.8897   0.9069   0.7975
Probability (T=0)         0.0000   0.0000       0.0000   0.0011     0.0000   0.0000   0.0001

Note: The T statistic in the T-test is the difference between the two means divided by the standard error. "T=0" represents the null hypothesis that the two means are equal (Xia 2000).

Table 3.2 Observed substitution rates for the protein-coding (1st and 2nd codon positions) and RNA genes among paleognathous birds.

Gene                               Transitions        Transversions
                                   A↔G      T↔C       A↔T      A↔C      G↔T      G↔C
Protein-Coding: ATPase 6, ATPase 8, COI, COII, COIII, Cyt b, ND1, ND2, ND3, ND4L, ND4, ND5, ND6
Average for Protein-Coding Genes   0.195    0.510     0.106    0.122    0.030    0.038
± SD                               ±0.051   ±0.084    ±0.033   ±0.069   ±0.017   ±0.022
rRNA & tRNA Genes
12S rRNA                           0.330    0.433     0.087    0.124    0.012    0.014
16S rRNA                           0.267    0.422     0.109    0.158    0.026    0.017
tRNAs stems                        0.387    0.473     0.053    0.063    0.007    0.018
tRNAs loops                        0.212    0.503     0.144    0.112    0.014    0.016

Table 3.3 Directional mutation pressure (μD) for the 12 H-strand encoded genes.

Archosauria and Turtle    Mammals and Frog
rook                      human
indigo                    cow
falcon                    mouse
broadbill                 seal
duck                      hedgehog
chicken                   blue whale
tinamou                   cat
moa                       rhinoceros
rhea                      hippopotamus
emu                       rabbit
cassowary                 dog
kiwi                      horse
ostrich                   wallaroo
alligator                 frog
turtle  0.393  0.396

*, **, ***: 5%, 1%, 0.1% significance indices (one symbol set marks pressure toward greater A+T, the other toward greater G+C).

Stationarity

Deviations from stationarity of nucleotide and amino acid compositions were tested among the paleognath, duck and chicken mitochondrial genes (Table 3.4). The assumption of stationarity across all sites of the nucleotide sequences for the ingroup (ratites) was not violated for the rRNA, tRNA and protein-coding genes (1st and 2nd codon positions). Stationarity was also not rejected for the encoded amino acid sequences. Analysis of sequences that included third codon positions, however, consistently rejected stationarity for most of the protein-coding genes, indicating significant deviation in base composition across sequences. The taxa that deviated consistently from the compositional average included the chicken, duck, tinamou and rhea. Among the birds that violated stationarity, the rhea deviated from stationarity with the greatest frequency (8 out of the 13 protein-coding genes). More stringent tests of stationarity across sites using only variable positions (Table 3.5) also fail to demonstrate significant violations of stationarity in the amino acid sequences. However, stationarity is rejected for the distal outgroups in the ribosomal and transfer RNA genes and for all positions in the protein-coding nucleotide sequences.

Rates of Variation among Sites and among Genes

Figure 3.4 The distribution of birds, mammals, alligator and turtle as a function of G+C content at four-fold degenerate sites (contribution of each base separately is also shown). (Legend: birds, mammals, alligator, turtle; C contribution, G contribution.)

Figure 3.5 The distribution of birds, mammals, alligator and turtle as a function of the directional mutation pressure (μD) and the G+C content at nonsynonymous sites (Pnon).

Table 3.4 Results of chi-square tests comparing nucleotide and amino acid compositions across all sites for each sequence to the average sequence composition.

Gene                  Nucleotide                                            Amino Acid
Protein-Coding        1st+2nd      1st          2nd      3rd
ATPase 6              passed       passed       passed   failed (ch, r)     passed
ATPase 8              passed       passed       passed   failed (?)         passed
COI                   passed       passed       passed   failed (?)         passed
COII                  passed       passed       passed   failed (ch, o)     passed
COIII                 passed       passed       passed   failed (?)         passed
Cytochrome b          passed       passed       passed   failed (?)         passed
ND1                   passed       passed       passed   failed (?)         passed
ND2                   passed       passed       passed   failed (?)         passed
ND3                   passed       passed       passed   passed             passed
ND4L                  passed       passed       passed   failed (?)         passed
ND4                   passed       passed       passed   failed (ch, t, r)  passed
ND5                   passed       passed       passed   failed (?)         passed
ND6                   passed       passed       passed   passed             passed
Concatenated Genes    failed (?)   failed (?)   passed   failed (?)         passed

rRNA & tRNA Genes     All Nucleotide Positions   Amino Acid
12S rRNA              passed                     N/A
16S rRNA              passed                     N/A
(label missing)       failed (tu)                N/A
tRNAs stems           passed                     N/A
tRNAs loops           passed                     N/A

Taxa violating stationarity are indicated: ch-chicken, t-tinamou, r-rhea, o-ostrich, c-cassowary, e-emu, k-kiwi, m-moa, d-duck, a-alligator, tu-turtle. (?) marks taxon codes that are illegible in this copy.

Table 3.5 Results of chi-square tests comparing nucleotide and amino acid compositions across variable sites only for each sequence to the average sequence composition.

Genes                                                          Nucleotide         Amino Acid
Concatenated Protein-Coding Genes, All Nucleotide Positions
Concatenated Ribosomal RNA Genes                               failed (d, tu)     N/A
Concatenated Transfer RNA Genes                                failed (a)         N/A
All Genes Concatenated (1st & 2nd codon positions only)        failed (ch, d, r)  N/A

Taxa violating stationarity are indicated: ch-chicken, t-tinamou, r-rhea, o-ostrich, c-cassowary, e-emu, k-kiwi, m-moa, d-duck, a-alligator, tu-turtle.

An examination of the variation in substitution rates among sites within the paleognaths produced α values that were less than one for all the protein-coding (1st and 2nd codon positions and amino acids), rRNA and tRNA genes (Table 3.6). A likelihood ratio test of the uniformity of substitution rates among sites demonstrated that rate constancy could be rejected for all the mitochondrial genes. The α values for each codon position varied, with 2nd codon positions generally having the lowest values, reflecting the extreme functional constraints acting on these sites, where all substitutions are nonsynonymous. First codon positions had α values slightly higher than 2nd positions for all the genes except ATPase 6 and 8, but still well below one. With the exception of cytochrome b, α values for 3rd codon positions were all greater than one. This indicates reduced functional constraints, with correspondingly greater homogeneity in the rate of substitution across these sites.
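To illustrate why small α values indicate strong rate heterogeneity: when site rates follow a gamma distribution scaled to a mean of one, the variance of the rates is 1/α, so the coefficient of variation is 1/√α. The short sketch below applies this to the two α estimates that survive in Table 3.6 (tRNA stems and loops); the function name is hypothetical.

```python
import math


def rate_cv(alpha):
    """Coefficient of variation of gamma-distributed site rates with mean 1."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    return 1.0 / math.sqrt(alpha)


for label, alpha in [("tRNA stems", 0.11), ("tRNA loops", 0.33), ("alpha = 1", 1.0)]:
    # Smaller alpha gives a larger spread of rates across sites.
    print(label, round(rate_cv(alpha), 3))
```

Values of α above one therefore correspond to rates that are more nearly uniform across sites, which is the sense in which 3rd codon positions are described as more homogeneous.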

Model of Nucleotide Substitution

The program Modeltest was used to ascertain the best model to estimate the pattern of nucleotide substitution for the three data sets: the protein-coding, the rRNA and the tRNA genes. To determine the most appropriate model, Modeltest estimates a tree of the phylogenetic relationships from the data using the neighbour-joining algorithm and then calculates which model has the maximum likelihood value for this tree amongst 56 different models of substitution. In the process of determining the most appropriate model, the program tests the following hypotheses: equality of base frequencies, equality of transitional and transversional substitutions, equality of transitional substitution rates, equality of transversional substitution rates, equality of rates among sites, and the proportion of invariable sites. These results are summarized in Table 3.7, and confirm the findings already described (which did not rely on an estimated tree topology). The model chosen for the protein-coding, rRNA and tRNA (stems) genes that best fits the data was the GTR model with a correction for invariant sites and rate variation among sites (GTR+I+Γ) (Table 3.8).

Table 3.6 Substitution rate variation among sites for both the protein-coding and RNA genes.

Gene                  2Δl        α ± S.E.
Protein-Coding: ATPase 6, ATPase 8, COI, COII, COIII, Cytochrome b, ND1, ND2, ND3, ND4L, ND4, ND5, ND6
rRNA & tRNA (All Positions)
tRNAs stems           87.42*     0.11 ± 0.01
tRNAs loops           251.54*    0.33 ± 0.03

* rate constancy rejected at the 1% level (2Δl > 6.6). A second footnote symbol marks estimates beyond the internal upper bound.
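The likelihood ratio test behind the 2Δl column can be sketched as follows: twice the difference in log-likelihood between the model with rate variation and the rate-constant model is compared with a chi-square critical value (6.635 for one degree of freedom at the 1% level, matching the threshold of about 6.6 quoted in Table 3.6). The log-likelihoods below are hypothetical values chosen only to reproduce the reported statistic for tRNA stems.

```python
CHI2_1DF_1PCT = 6.635  # chi-square critical value, 1 degree of freedom, 1% level


def lrt_statistic(lnl_null, lnl_alternative):
    """Return 2 * (lnL of the richer model - lnL of the simpler model)."""
    return 2.0 * (lnl_alternative - lnl_null)


def rejects_null(statistic, critical=CHI2_1DF_1PCT):
    """True when the simpler (null) model is rejected at the chosen level."""
    return statistic > critical


# Hypothetical log-likelihoods, for illustration only.
stat = lrt_statistic(lnl_null=-3700.00, lnl_alternative=-3656.29)
print(round(stat, 2), rejects_null(stat))
```

Both statistics legible in Table 3.6 (87.42 and 251.54) lie far above the critical value, so rate constancy is rejected for the tRNA stems and loops.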

Discussion

Pattern of Substitutions

The analyses of the nucleotide and amino acid sequences for the complete mitochondrial genomes described here revealed patterns similar to those published for mitochondrial genes and genomes of other taxa (Irwin 1991; Kumar 1996; Kocher and Carleton 1997). The same pattern of differing substitution rates among the three codon positions was observed, with 2nd positions being the most conserved, followed by 1st and then 3rd positions. This pattern is a by-product of selectional constraints. Transitional changes occur more frequently than transversional changes overall, but considerable heterogeneity exists in the frequency of each of the six possible types of nucleotide substitutions. This was seen in the empirical observation of nucleotide substitution frequencies and confirmed by Modeltest in its assignment of GTR as the most appropriate model of nucleotide substitution (a model which assumes different rates for each of the six possible types of nucleotide substitutions).
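The structure of the GTR model can be made concrete with a small sketch that builds its instantaneous rate matrix from six exchangeability parameters and four base frequencies. The numerical values below are hypothetical (chosen only to echo the T↔C > A↔G pattern noted above) and are not the estimates obtained in this study; the function name is also an assumption.

```python
def gtr_rate_matrix(exchangeabilities, base_freqs, order="ACGT"):
    """Build the GTR rate matrix Q with q[x][y] = r_xy * pi_y for x != y."""
    q = {x: {y: 0.0 for y in order} for x in order}
    for x in order:
        for y in order:
            if x != y:
                # Exchangeabilities are symmetric, so use an unordered pair key.
                key = "".join(sorted(x + y))
                q[x][y] = exchangeabilities[key] * base_freqs[y]
        # Diagonal entries make each row sum to zero.
        q[x][x] = -sum(q[x][y] for y in order if y != x)
    return q


# Hypothetical parameter values, for illustration only.
example_rates = {"AC": 1.0, "AG": 4.0, "AT": 1.0, "CG": 1.0, "CT": 6.0, "GT": 1.0}
example_freqs = {"A": 0.30, "C": 0.30, "G": 0.15, "T": 0.25}
example_q = gtr_rate_matrix(example_rates, example_freqs)
```

Simpler models such as K2P arise as special cases in which the transition exchangeabilities are constrained to be equal, the transversion exchangeabilities are constrained to be equal, and the base frequencies are all 0.25.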

Tests of stationarity rejected compositional homogeneity among the paleognath genomes. The observation of compositional biases prompted an examination of the directional mutation pressure in the paleognath genomes sequenced here as well as in several additional vertebrate genomes. Among all but one of the 13 mammalian genomes examined, directional mutation pressure was driving the sequences toward greater A+T content. For the 15 avian genomes tested, six were experiencing directional mutation pressure toward greater A+T content, while five were experiencing no directional pressure and four were driven toward greater G+C content.

Table 3.7 The hypotheses tested and the associated P values as calculated using the program Modeltest.

Test                                       rRNAs    Protein-Coding (1st+2nd codon)    tRNAs stems
Equal base frequencies hypothesis
Transitions=Transversions hypothesis
Equal Transitions rates hypothesis
Equal Transversions rates hypothesis
Equal rates among sites hypothesis
Invariable sites hypothesis                0.032941

Table 3.8

Genes                      Hierarchical LRT   AIC        Proportion of Invariant Sites   α        -lnL
rRNAs                      GTR+I+Γ            GTR+I+Γ    0.1280                          0.3420   13361.12
tRNA stems                 K2P+Γ              GTR+I+Γ    0.1797                          0.4863   3610.75
Protein-Coding: 1st+2nd    GTR+I+Γ            GTR+I+Γ    0.3141                          0.4544   30496.68
3 combined data sets       GTR+I+Γ            GTR+I+Γ    0.2891                          0.4070   48018.36

If the avian genomes are divided phylogenetically between paleognaths and neognaths, it is interesting to note that of all the birds experiencing greater A+T pressure, five of the six are paleognaths. Among neognaths, only the rook is experiencing A+T pressure, while the other six neognaths are experiencing either no pressure or pressure toward G+C content. One proposed reason that selection may favour an increased G+C content is the stronger bond afforded by GC pairs, which would ensure "better protection against DNA breathing and mutability" (Bernardi and Bernardi 1986). If such a selective force existed, it would be reasonable to assume that it might be present in flying birds due to their higher body temperatures. Paleognaths, with the single exception of the tinamou, are flightless and have reduced metabolic rates and body temperatures relative to flying birds. Thus, it is possible that the overall reduction in directional mutation pressure toward A+T in neognath sequences may be a by-product of the unique physiological adaptations required for flight. As more mammalian and neognath mitochondrial genome sequences become available, it will be possible to determine whether this pattern is maintained or is merely a stochastic artifact of a small sample size. No further sampling of paleognath birds is possible, as the genomes sequenced here represent all the major extant lineages.

It is also notable that there appears to be a higher frequency of G+C content in all the birds relative to all the mammals at nonsynonymous sites. This may reflect a slight difference in amino acid composition that could potentially be a product of different adaptive functional requirements. Jermiin et al. (1994) characterized directional mutation pressure in cytochrome b for 110 taxa of vertebrates, and found a correlation between directional mutation pressure and G+C content at nonsynonymous sites. Their argument that directional mutation pressure was introducing a compositional bias at nonsynonymous sites, and thus a bias in amino acid selection, was supported by correlation and regression analyses of sequence composition at nonsynonymous sites in the genomes reviewed here.

Examination of the pattern of nucleotide substitution identified several elements which may influence the phylogenetic utility of these sequences. Comparisons of the number of substitutions and the genetic distance between taxa for the protein-coding genes clearly indicate that 3rd codon position transitions are saturated with multiple substitutions. While transversions at this codon position only exhibited saturation effects for pairwise comparisons that included non-avian taxa, their inclusion would prove problematic due to base compositional biases when analysed using conventional methods. Within the ratites, the rheas repeatedly violated stationarity at 3rd codon positions in the test across all sites. The more stringent test using only the variable sites indicated that all the nucleotide gene sequences violated stationarity. For the ribosomal and transfer RNA genes, the taxa that differed in composition were the duck and the non-avian outgroups. Among the protein-coding genes, all three codon positions were rejected as being homogeneous in base composition. The greater stringency required to detect violations in the 1st and 2nd codon positions and RNA genes is likely a result of the greater constraints on these sites limiting the amount and kind of substitutions possible. It is interesting to note that stationarity among the amino acid sequences cannot be rejected, in spite of the fact that the nucleotide sequences which encode them are clearly experiencing directional mutation pressure.

This result may be more indicative of the limitations of the simple χ² test in discriminating compositional biases in a sequence with 20 possible amino acids at each site, in contrast with simpler four-base nucleotide sequences, which are three times as long. The inclusion of taxa with unequal base compositions violates the assumptions of most phylogenetic methods and often produces trees that reflect compositional similarities rather than phylogenetic relationships (Lockhart et al. 1994; Foster and Hickey 1999; Xia 2000). Since chicken and duck, like rheas, exhibit relatively high G+C content, the use of 3rd codon position transversions could potentially attract the rhea toward the chicken, and thus to the base of the ratite tree. For these reasons, 3rd codon positions should be excluded from the estimation of the patterns and rates of nucleotide substitution. Additionally, sites which are nonhomogeneous in base composition should be included only in methods of phylogenetic analysis that utilize models of substitution which compensate for variation in base composition between taxa.

While the regression of the number of transitional substitutions for 1st codon positions versus p-distance appears linear, the greater number of observed transversional changes compared to transitional changes for pairwise comparisons involving the outgroups would suggest that these sites have begun to experience saturation effects. A T-test of the level of substitution saturation at these sites indicates that the overall observed level of substitution saturation is well below that of the predicted half of full saturation. This would indicate that the inclusion of 1st codon position transitional substitutions in phylogenetic analyses would be warranted. The rRNA genes and the tRNA stems, which were shown in Chapter 2 to be under greater functional constraints than the protein-coding genes, do not exhibit saturation for either the transitional or transversional changes when gaps and ambiguous alignments are removed. As such, the aligned sequences of rRNA genes and tRNA stems should prove phylogenetically informative. tRNA loops exhibited signs of saturation for transitional substitutions but not for transversional changes. This, combined with the high frequency of indels making alignments problematic and the small amount of sequence they can contribute to any analysis, would argue that these sites should be excluded.

Rates of Evolution

The rate constancy test for rate variation among sites demonstrated that significant rate variation was occurring for each of the genes. Generally, the values of α estimated for the paleognath genomes were lower than the values found by Kumar (1996) for vertebrates in the same genes. The difference between these α values and Kumar's likely lies in the greater antiquity of the lineages in his studies and the greater diversity of modes of living, which may have contributed to selective differences in their mitochondrial genes.

The requirement for the use of the gamma correction was additionally confirmed by the program Modeltest. This test performs a likelihood ratio test for the data set, comparing simple models versus more complex models. The ideal situation when performing phylogenetic reconstruction is to have the simplest model possible, as it provides distances with the smallest variances. Using the Akaike information criterion (AIC) (Akaike 1974), which rewards models for goodness of fit but imposes a penalty for unnecessary parameters, the model chosen by Modeltest for the rRNA, tRNA (stems) and protein-coding genes was GTR+I+Γ.

The conclusions drawn here from examining the pattern of nucleotide and amino acid sequence substitution will be used in the following chapter to apply appropriate models of evolution to the various potential data sets. These will then be used to reconstruct the phylogenetic relationships among the ratite birds and to examine the implications of the phylogeny for the biogeographic history of this group.
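The AIC comparison described above can be sketched in a few lines. The criterion is AIC = 2k − 2 lnL, where k is the number of free parameters, and the model with the lowest AIC is preferred. The -lnL values below are illustrative assumptions rather than results from this study, and the parameter counts (2 for K2P+Γ, 10 for GTR+I+Γ) exclude branch lengths, which are common to both models.

```python
def aic(neg_lnl, n_params):
    """Akaike information criterion from a negative log-likelihood."""
    return 2.0 * neg_lnl + 2.0 * n_params


# Illustrative values only: (negative log-likelihood, free substitution parameters).
candidates = {
    "K2P+G": (3640.00, 2),     # ts/tv ratio + gamma shape
    "GTR+I+G": (3610.75, 10),  # 5 rates + 3 frequencies + invariant sites + gamma shape
}

scores = {name: aic(nll, k) for name, (nll, k) in candidates.items()}
best_model = min(scores, key=scores.get)
print(scores, best_model)
```

With these numbers the improvement in fit of the richer model outweighs its penalty of eight additional parameters; with a smaller likelihood gain the simpler model would be retained.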

References

Adachi, J. and Hasegawa, M. 1996a. Tempo and mode of synonymous substitutions in mitochondrial DNA of primates. Mol. Biol. Evol. 13:200-208.

Adachi, J. and Hasegawa, M. 1996b. Model of amino acid substitution in proteins encoded by mitochondrial DNA. J. Mol. Evol. 42:459-468.

Akaike, H. 1974. A new look at the statistical model identification. IEEE Trans. Autom. Contr. 19:716-723.

Anderson, S., Bankier, A.T., Barrell, B.G., de Bruijn, M.H.L., Coulson, A.R., Drouin, J., Eperon, I.C., Nierlich, D.P., Roe, B.A., Sanger, F., Schreier, P.H., Smith, A.J.H., Staden, R. and Young, I.G. 1981. Sequence and organization of the human mitochondrial genome. Nature 290:457-465.

Anderson, S., de Bruijn, M.H., Coulson, A.R., Eperon, I.C., Sanger, F. and Young, I.G. 1982. Complete sequence of bovine mitochondrial DNA. Conserved features of the mammalian mitochondrial genome. J. Mol. Biol. 156:683-717.

Árnason, Ú., Gullberg, A., Johnsson, E. and Ledje, C. 1993. The nucleotide sequence of the mitochondrial DNA molecule of the grey seal, Halichoerus grypus, and a comparison with mitochondrial sequences of other true seals. J. Mol. Evol. 37:323-330.

Árnason, Ú. and Gullberg, A. 1993. Comparison between the complete mtDNA sequences of the blue and the fin whale, two species that can hybridize in nature. J. Mol. Evol. 37:312-322.

Bernardi, G. and Bernardi, G. 1986. Compositional constraints and genome evolution. J. Mol. Evol. 24:1-11.

Bibb, M.J., Van Etten, R.A., Wright, C.T., Walberg, M.W. and Clayton, D.A. 1981. Sequence and gene organization of mouse mitochondrial DNA. Cell 26:167-180.

Brown, W.M., Prager, E.M., Wang, A. and Wilson, A.C. 1982. Mitochondrial DNA sequences of primates: tempo and mode of evolution. J. Mol. Evol. 18:225-239.

Brown, G.G. and Simpson, M.V. 1982. Novel features of animal mtDNA evolution as shown by sequences of two rat cytochrome oxidase subunit II genes. Proc. Nat. Acad. Sci. 79:3246-3250.

Desjardins, P. and Morais, R. 1990. Sequence and gene organization of the chicken mitochondrial genome: a novel gene order in higher vertebrates. J. Mol. Biol. 212:599-634.

Fitch, W.M. and Margoliash, E. 1967. A method for estimating the number of invariant amino acid coding positions in a gene using cytochrome c as a model case. Biochem. Genet. 1:65-71.

Foster, P.G. and Hickey, D.A. 1999. Compositional bias may affect both DNA-based and protein-based phylogenetic reconstructions. J. Mol. Evol. 48:284-290.

Galtier, N. and Gouy, M. 1995. Inferring phylogenies from DNA sequences of unequal base compositions. Proc. Nat. Acad. Sci. 92:11317-11321.

Gillespie, J.H. 1986. Rates of molecular evolution. Ann. Rev. Ecol. Syst. 17:637-665.

Gissi, C., Gullberg, A. and Árnason, Ú. 1998. The complete mitochondrial DNA sequence of the rabbit, Oryctolagus cuniculus. Genomics 50:161-169.

Härlid, A., Janke, A. and Árnason, Ú. 1998. The complete mitochondrial genome of Rhea americana and early avian divergences. J. Mol. Evol. 46:669-679.

Härlid, A. and Árnason, Ú. 1999. Analyses of mitochondrial DNA nest ratite birds within the Neognathae: supporting a neotenous origin of ratite morphological characters. Proc. R. Soc. Lond., B, Biol. Sci. 266:305-309.

Irwin, D.M., Kocher, T.D. and Wilson, A.C. 1991. Evolution of the cytochrome b gene of mammals. J. Mol. Evol. 32:128-144.

Janke, A. and Árnason, Ú. 1997. The complete mitochondrial genome of Alligator mississippiensis and the separation between recent archosauria (birds and crocodiles). Mol. Biol. Evol. 14:1266-1273.

Janke, A., Xu, X. and Árnason, Ú. 1997. The complete mitochondrial genome of the wallaroo, Macropus robustus, and the phylogenetic relationship among Monotremata, Marsupialia, and Eutheria. Proc. Natl. Acad. Sci. U.S.A. 94:1276-1281.

Jermiin, L.S., Graur, D., Lowe, R.M. and Crozier, R.H. 1994. Analysis of directional mutation pressure and nucleotide content in mitochondrial cytochrome b genes. J. Mol. Evol. 39:160-173.

Jermiin, L.S., Foster, P.G., Graur, D., Lowe, R.M. and Crozier, R.H. 1996. Unbiased estimation of symmetrical directional mutation pressure from protein-coding DNA. J. Mol. Evol. 42:476-480.

Kim, K.S., Lee, S.E., Jeong, H.W. and Ha, J.H. 1998. The complete nucleotide sequence of the domestic dog, Canis familiaris, mitochondrial genome. Mol. Phylogenet. Evol. 10:210-220.

Kocher, T.D. and Carleton, K.L. 1997. Base substitution in fish mitochondrial DNA: patterns and rates. In "Molecular Systematics of Fishes" (T.D. Kocher and C.A. Stepien, eds.). Academic Press, San Diego.

Krettek, A., Gullberg, A. and Árnason, Ú. 1995. Sequence analysis of the complete mitochondrial DNA molecule of the hedgehog, Erinaceus europaeus, and the phylogenetic position of the Lipotyphla. J. Mol. Evol. 41:952-957.

Kumar, S., Tamura, S. and Nei, M. 1993. MEGA, Molecular Evolutionary Genetics Analysis, Version 1.0 (Pennsylvania State Univ., University Park).

Kumar, S. 1996. Patterns of nucleotide substitution in mitochondrial protein-coding genes of vertebrates. Genetics 143:537-548.

Lockhart, P.J., Steel, M.A., Hendy, M.D. and Penny, D. 1994. Recovering evolutionary trees under a more realistic model of sequence evolution. Mol. Biol. Evol. 11:605-612.

Lopez, J.V., Cevario, S. and O'Brien, S.J. 1996. Complete nucleotide sequences of the domestic cat, Felis catus, mitochondrial genome and a transposed mtDNA tandem repeat (Numt) in the nuclear genome. Genomics 33:229-246.

Martin, A.P. 1995. Metabolic rate and directional nucleotide substitution in animal mitochondrial DNA. Mol. Biol. Evol. 12:1124-1131.

Mindell, D.P., Sorenson, M.D., Dimcheff, D.E., Hasegawa, M., Ast, J.C. and Yuri, T. 1999. Interordinal relationships of birds and other reptiles based on whole mitochondrial genomes. Syst. Biol. 48:138-152.

Posada, D. and Crandall, K.A. 1998. Modeltest: testing the model of DNA substitution. Bioinformatics 14(9):817-818.

Roe, B.A., Ma, D.P., Wilson, R.K. and Wong, J.F. 1985. The complete nucleotide sequence of the Xenopus laevis mitochondrial genome. J. Biol. Chem. 260:9759-9774.

Rzhetsky, A. and Nei, M. 1995. Tests of applicability of several substitution models for DNA sequence data. Mol. Biol. Evol. 12:131-151.

Strimmer, K. and von Haeseler, A. 1996. Quartet puzzling: a quartet maximum likelihood method for reconstructing tree topologies. Mol. Biol. Evol. 13:964-969.

Sueoka, N. 1962. On the genetic basis of variation and heterogeneity of DNA base composition. Proc. Natl. Acad. Sci. U.S.A. 48:582-592.

Sueoka, N. 1988. Directional mutation pressure and neutral molecular evolution. Proc. Natl. Acad. Sci. U.S.A. 85:2653-2657.

Takahata, N. 1991. Overdispersed molecular clock at the major histocompatibility complex loci. Proc. R. Soc. Lond. B 243:13-18.

Takezaki, N. and Gojobori, T. 1999. Correct and incorrect vertebrate phylogenies obtained by the entire mitochondrial DNA sequences. Mol. Biol. Evol. 16:590-601.

Thompson, J.D., Gibson, T.J., Plewniak, F., Jeanmougin, F. and Higgins, D.G. 1997. The Clustal X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 25:4876-4882.

Ursing, B.M. and Árnason, Ú. 1998. Analyses of mitochondrial genomes strongly support a hippopotamus-whale clade. Unpublished.

Van Tuinen, M., Sibley, C.G. and Hedges, S.B. 2000. The early history of modern birds inferred from DNA sequences of nuclear and mitochondrial ribosomal genes. Mol. Biol. Evol. 17:451-457.

Waddell, P.J., Cao, Y., Hauf, J. and Hasegawa, M. 1999. Using novel phylogenetic methods to evaluate mammalian mtDNA, including amino acid-invariant sites-LogDet plus site stripping, to detect internal conflicts in the data, with special reference to the position of hedgehog, armadillo, and elephant. Syst. Biol. 48:31-53.

Xia, X. 2000. Data analysis in molecular biology and evolution. Kluwer Academic Publishers, Boston/Dordrecht/London.

Xu, X. and Árnason, Ú. 1994. The complete mitochondrial DNA sequence of the horse, Equus caballus, extensive heteroplasmy of the control region. Gene 148:357-362.

Xu, X. and Árnason, Ú. 1997. The complete mitochondrial DNA sequence of the white rhinoceros, Ceratotherium simum, and comparison with the mtDNA sequence of the Indian rhinoceros, Rhinoceros unicornis. Mol. Phylogenet. Evol. 7:189-194.

Yang, Z. 1994. Estimating the pattern of nucleotide substitution. J. Mol. Evol. 39:105-111.

Yang, Z. 1996. Among-site rate variation and its impact on phylogenetic analyses. TREE 11:367-

Yang, Z. and Kumar, S. 1996. Approximate methods for estimating the pattern of nucleotide substitution and the variation of substitution rates among sites. Mol. Biol. Evol. 13:650-659.

Yang, Z. 1997. PAML: A program package for phylogenetic analysis by maximum likelihood. Comput. Appl. Biosci. 13:555-556.

Zardoya, R. and Meyer, A. 1998. Complete mitochondrial genome suggests diapsid affinities of turtles. Proc. Natl. Acad. Sci. U.S.A. 95:14226-14231.

Chapter 4

Ratite Phylogenetic Relationships and Biogeographic History based on Mitochondrial Genome Sequences

For well over a century, the phylogeny of the ratites has remained controversial despite a considerable number of studies published on this group (for review see Sibley and Ahlquist 1990). In the last three decades, several comparative studies based on morphology, immunological distances, chromosome banding as well as protein and DNA sequencing have clearly demonstrated the close relationships within the ratites (Cracraft 1974; Prager et al. 1976; de Boer 1980; Sibley and Ahlquist 1981, 1990; Stapel et al. 1984; Bledsoe 1988; Caspers et al. 1994; Cooper et al. 1992; Cooper 1997; Lee et al. 1997 and van Tuinen et al. 1998). This has led to the general acceptance of the monophyly of the ratites, with the tinamous as their closest living relatives. There is also a consensus among the molecular studies that the living Australasian ratites form a monophyletic clade.

For the ordering of the basal nodes of the ratite tree, agreement is lacking both within and between osteological and molecular analyses (see Fig. 1.3). Different methodologies have produced a variety of topologies, and different data partitions of the sarne mitochondrial DNA sequence data sets have produced codlicting resuits Lee et al. (1997) sequenced over 5,000 bp

of mitochondrial protein-coding and ribosomal RNA genes. Their molecular tree placed the

rheas as basal among the ratites. Van Tuinen et al. (1 998) sequenced just the mtDNA ribosomal

genes and found strong support for the ostrich as the basal taxon. They attributed the

discrepancy between their findings and those of Lee et al. (1997) to differing phylogenetic

signals between protein-coding and ribosomal RNA genes. Discrepancy also existed between

the molecular tree and the morphological tree produced in the study by Lee et al. (1997). They

explained this discrepancy as the result of improper rooting of the molecular tree and argued that the two were effectively the same tree. When the data sets were combined, they found that the kiwi and moa lineages were basal within ratites. A competing phylogenetic hypothesis, which placed moas at

the base of the ratite tree, with kiwis being more derived, was constructed using another set of

morphological characters by Bledsoe (1988). However, the choice and polarization of characters employed in this analysis were disputed by Lee et al. (1997). The only published molecular data set which included sequences from the extinct moas is that of Cooper et al. (1992), and these were short sequences of mtDNA from 12S rRNA (390 bp). This tree placed the rheas basal among ratites, with the moas branching off next.

To help resolve the uncertainty in the phylogenetic relationships among the ratite birds, I sequenced the complete mitochondrial genomes for seven ratites: the ostrich, emu, southern cassowary, lesser rhea, great spotted kiwi and two extinct species of moa (Anomalopteryx didiformis and Emeus crassus). I also sequenced mitochondrial genomes for two close relatives: the elegant crested tinamou and the great tinamou. Earlier molecular studies acknowledged the apparent antiquity of this group (Sibley and Ahlquist 1990; Lee et al. 1997; van Tuinen et al. 1998) and stressed the need for greater sequence lengths to resolve the

conflicts among the proposed ratite trees (van Tuinen et al. 1998). The use of the protein-

coding, ribosomal, and transfer RNA genes to reconstruct the phylogenetic relationships within

the ratites provides the opportunity to examine whether conflicting signals within the

mitochondrion are present. The use of complete mitochondrial genomes has been shown to

successfully infer phylogenetic relationships at the ordinal and class levels (Cao et al. 1994;

Cummings et al. 1995; Russo et al. 1996; Zardoya and Meyer 1998) and should hopefully prove

appropriate for resolving the relationships among the ratites. The inclusion of moas along with

increased taxon sampling of ratites allows a more thorough examination of whether the apparent

conflicts in the previously proposed tree topologies arise simply from incorrect rooting of the

tree or from inadequate phylogenetic signal in previous sequence data sets.

The estimation of the ratite phylogenetic relationships is also critical in testing

competing theories of their biogeography. If the assumption of equal rates of evolution across

lineages is not violated, a molecular clock can be invoked to calculate divergence dates in the

ratite tree. The pattern of divergence along with their dates can be compared to the dates and order of the fragmentation of Gondwana. This information can then be used to evaluate how well the radiation of the ratites conforms with the vicariance hypothesis or, conversely, whether some dispersal events are required to explain their current distribution.

DNA Isolation, Amplification and Sequencing

For the details on the samples and the DNA isolation, amplification and sequencing

procedures, please refer to the Materials & Methods in Chapter 2.

Sequence Data

Complete mitochondrial DNA sequences of the chicken, Gallus gallus, X52392 (Desjardins and Morais 1990), the greater rhea, Rhea americana, Y16884 (Härlid et al. 1998), the redheaded duck, Aythya americana, AF090337 (Mindell et al. 1999), the alligator, Alligator mississippiensis, Y13113 (Janke and Arnason 1997), and the African side-necked turtle, Pelomedusa subrufa, AF039066 (Zardoya and Meyer 1998) were downloaded from GenBank.

Sequences were initially aligned using the program Clustal X (Thompson et al. 1997) and

refined by eye. For ribosomal and transfer RNA genes, the alignments were adjusted based on the available secondary structure models for the 12S ribosomal gene (Gutell 1994; Van de Peer et al. 1994; Sullivan et al. 1995 and Hickson et al. 1996), the 16S ribosomal gene (De Rijk et al. 2000) and the 22 transfer RNAs (Desjardins and Morais 1990; Sprinzl et al. 1991; Kumazawa and Nishida 1993). In the case of the protein-coding genes, alignments were performed using the predicted amino acids and the nucleotides were made to conform to this. Sites containing gaps, ambiguous alignments, and regions of sequence overlap were eliminated. For analyses involving the concatenated sequences of the protein-coding genes, only 12 of the 13 protein-coding genes were used. ND6, which is the only protein-coding gene transcribed from the L strand, was analysed separately due to the strong difference in base composition and pattern of substitution between the H and the L strands (Adachi et al. 1993). The gene ND3 was included; however, the observed extra base at position 174 found in all the genomes sequenced here (Chapter 2), as well as in other avian genomes (Härlid et al. 1998; Mindell et al. 1998, 1999), was excluded.
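The site-filtering step described above can be illustrated with a minimal Python sketch that drops every alignment column containing a gap or an ambiguity code. The function name and the toy alignment are hypothetical and are not taken from this study; regions judged ambiguous by eye, or regions of gene overlap, would still have to be removed by hand.

```python
def drop_gapped_columns(alignment):
    """Return a copy of the alignment in which every column that contains
    anything other than an unambiguous base (A, C, G or T) is removed."""
    names = list(alignment)
    seqs = [alignment[name].upper() for name in names]
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    # Indices of columns in which every taxon has an unambiguous base.
    keep = [i for i in range(len(seqs[0]))
            if all(s[i] in "ACGT" for s in seqs)]
    return {name: "".join(s[i] for i in keep)
            for name, s in zip(names, seqs)}


# Hypothetical toy alignment (not real data).
toy = {
    "taxon_a": "ATG-CCNTA",
    "taxon_b": "ATGACCGTA",
    "taxon_c": "ATGAC-GTA",
}
filtered = drop_gapped_columns(toy)
```

In this toy case the three columns holding a gap or an N are removed from all taxa at once, so the filtered sequences stay aligned.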

Data Analysis

Several concatenated data sets were assembled for phylogenetic reconstruction as

follows: (1) 36 of the 37 genes, (2) the protein-coding genes minus ND6, (3) both ribosomal

genes, (4) the transfer RNA genes, and (5) each of the protein-coding genes individually.

Concatenated sequence data sets which include the protein-coding genes were analysed with and

without 3rd codon positions to assess the influence of saturation effects. Phylogenetic analyses were carried out using maximum parsimony (MP), maximum likelihood (ML) and neighbour-joining (NJ) with confidence levels for the internal tree topology assessed using nonparametric bootstrapping. For the MP method, branch-and-bound searches were performed for all the nucleotide data sets with equal weighting using the computer package PAUP* (version 4.0b4; Swofford 1998). For NJ analysis, the program PAUP* was also used. For the ML analysis of the sequences, in addition to PAUP*, the program NHML (Galtier and Gouy 1995; Galtier and Gouy 1998; Galtier, Tourasse and Gouy 2000; modified and recompiled for Linux by Pavneet Arora) was used.
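Two of the data-handling steps described above, the exclusion of 3rd codon positions and nonparametric bootstrapping, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and toy sequences are hypothetical, the coding sequences are assumed to be in frame, and the analyses in this chapter were run in PAUP* rather than with code of this kind.

```python
import random


def strip_third_positions(coding_seq):
    """Keep the 1st and 2nd codon positions of an in-frame coding sequence."""
    return "".join(base for i, base in enumerate(coding_seq) if i % 3 != 2)


def bootstrap_replicate(alignment, rng):
    """Build one bootstrap pseudoreplicate by resampling alignment columns
    with replacement; the same columns are drawn for every taxon."""
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in cols)
            for name, seq in alignment.items()}


# Hypothetical toy data (not from this study).
toy = {"taxon_a": "ATGGCCAAA", "taxon_b": "ATGGCTAAG"}
no_third = {name: strip_third_positions(seq) for name, seq in toy.items()}
replicate = bootstrap_replicate(no_third, random.Random(1))
```

A bootstrap analysis would repeat the resampling many times, rebuild a tree from each pseudoreplicate, and report how often each node is recovered.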

All of the above methods of phylogenetic reconstruction assume uniformity of base composition across taxa. The violation of this assumption can result in erroneous tree topologies (Lockhart et al. 1994; Galtier and Gouy 1995; Foster and Hickey 1999; Xia 2000) and this error is not alleviated by the use of longer sequences (Penny et al. 1999). Due to the varied base compositions across these taxa (Chapter 3), additional phylogenetic analyses were carried out using methods which correct for compositional biases, such as the LogDeterminant (LogDet) transform (Lockhart et al. 1994). To perform ML analyses using a correction for compositional biases, the program NHML (nonhomogeneous maximum likelihood) was used.
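To make the LogDet idea concrete, the following Python sketch computes a pairwise distance from the 4 x 4 matrix of joint base frequencies between two aligned sequences. It uses one commonly cited normalized form, d = -(1/4)[ln det F - 0.5 ln(product of the marginal base frequencies of both sequences)], which gives zero for identical sequences. Whether this matches the exact variant and scaling implemented in the programs used here is an assumption, and all function names are hypothetical.

```python
import math

BASES = "ACGT"


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n = len(m)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-15:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det


def logdet_distance(seq_x, seq_y):
    """Normalized LogDet distance between two aligned sequences.
    Sites where either sequence lacks an unambiguous base are skipped."""
    counts = [[0.0] * 4 for _ in range(4)]
    total = 0
    for x, y in zip(seq_x.upper(), seq_y.upper()):
        if x in BASES and y in BASES:
            counts[BASES.index(x)][BASES.index(y)] += 1
            total += 1
    if total == 0:
        raise ValueError("no comparable sites")
    f = [[c / total for c in row] for row in counts]        # joint frequencies
    freq_x = [sum(row) for row in f]                        # marginals of x
    freq_y = [sum(f[i][j] for i in range(4)) for j in range(4)]
    det = determinant(f)
    if det <= 0 or min(freq_x) == 0 or min(freq_y) == 0:
        raise ValueError("LogDet distance is undefined for these sequences")
    log_norm = 0.5 * sum(math.log(p) for p in freq_x + freq_y)
    return -(math.log(det) - log_norm) / 4.0
```

Because the distance depends on the determinant of the joint frequency matrix rather than on a single shared substitution model, it remains additive on the tree even when base composition differs between the two sequences, which is the property exploited in this chapter.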

The level of support for each taxon grouping (or split) was evaluated for the data sets by

means of spectral analysis (Hendy and Penny 1993; Hendy et al. 1994; Lento et al. 1995) using the program Spectrum (version 2.3.0; Charleston 1995).

Models

In Chapter 3, rate variation among sites was clearly shown to be present for both the nucleotide and amino acid sequences. The nucleotide substitution model determined for the concatenated rRNA, tRNA and protein-coding genes was the general time-reversible model (GTR) (Yang 1994) with a gamma correction (Γ) and a correction for invariant sites (I) (see Appendix 4.1 for the models of substitution for each individual data set). The substitution model was chosen using the program Modeltest (version 3; Posada and Crandall 1998) which

generates a neighbour-joining tree based on the data set and then compares the likelihood values

for this tree across 56 different models of substitution with and without gamma and invariant

sites corrections. The other trend observed in the sequences was the heterogeneous base composition across taxa. While the use of LogDet can correct for nonstationarity across sequences, it has the drawback that it is incompatible with measures to account for rate heterogeneity among sites, such as a Γ distribution. In lieu of this, rate heterogeneity can be partially corrected for by removing proportions of constant sites (Hasegawa et al. 1985; Peter Lockhart pers. comm.). It has been suggested that this approach often fits the data better than a discrete Γ distributed model (Waddell and Penny 1996; Waddell et al. 1997). The values used for the proportion of invariant sites were estimated using the programs PAUP* and Modeltest. The correction for compositional biases used in the program NHML is computationally compatible with the sites being fitted to a discrete Γ distribution. To accurately assess the difference in G+C content between taxa, the program DMP (version 2.0; Jermiin et al. 1996) was used to calculate the G+C content at synonymous sites. These sites were chosen as they are generally free to vary and, as such, are more likely to show signs of compositional biases.

Rate variations between lineages were assessed using 3 taxon relative rates tests with the total concatenated sequence (excluding 3rd codon positions due to saturation effects). Unambiguous parsimony sites were mapped onto 3 taxon trees by means of the program MacClade (version 3.08; Maddison and Maddison 1992). The equality of the rates of two lineages was tested using the Pearson χ² test. The test statistic used was $\chi^2 = (a-b)^2/(a+b)$, where a is the number of changes along the branch from the first ingroup to the outgroup, and b is the number of changes from the second ingroup to the outgroup. The χ² test assumes independence of sites and has 1 degree of freedom. Values greater than 3.84 are considered significant at the 5% level. The test statistic derivation and the methodology used here follow Waddell et al. (1999b). As repeated testing reduces the reliability of the tests, a sequential Bonferroni correction (Rice 1989) was applied.
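As a worked illustration of the relative rates test just described, the Python sketch below computes the statistic (a - b)²/(a + b), converts it to a P value for 1 degree of freedom, and applies a sequential Bonferroni procedure. The two pairs of counts are taken from Table 4.2; the function names are hypothetical, and the correction is applied here to only these two comparisons, so its thresholds differ from those that apply to the full table.

```python
import math


def relative_rate_chi2(a, b):
    """Chi-square statistic (1 d.f.) for the numbers of changes a and b
    on the two ingroup branches of a 3 taxon tree."""
    return (a - b) ** 2 / (a + b)


def chi2_p_value_1df(stat):
    """Upper-tail probability of a chi-square variate with 1 d.f."""
    return math.erfc(math.sqrt(stat / 2.0))


def sequential_bonferroni(p_values, alpha=0.05):
    """Sequential Bonferroni: test the smallest P value against alpha/k,
    the next against alpha/(k-1), and so on, stopping at the first failure.
    Returns a list of booleans in the original order of the tests."""
    k = len(p_values)
    order = sorted(range(k), key=lambda i: p_values[i])
    significant = [False] * k
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (k - rank):
            significant[idx] = True
        else:
            break
    return significant


# (changes to ingroup 1, changes to ingroup 2), both rows from Table 4.2:
# great tinamou vs southern cassowary (outgroup chicken),
# chicken vs duck (outgroup alligator).
pairs = [(896, 720), (1965, 2033)]
stats = [relative_rate_chi2(a, b) for a, b in pairs]
flags = sequential_bonferroni([chi2_p_value_1df(s) for s in stats])
```

The first comparison gives a statistic of about 19.17 and remains significant after correction, whereas the second gives about 1.16 and does not, in line with the values reported in Table 4.2.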

Sequences which did not fail the 3 taxon relative rates test were used to make a ML clock constrained tree. The model of substitution used was HKY85 with rate variation across sites and a correction for nonstationary sequences (Yang and Roberts 1995) as implemented in the computer package PAML (version 3.0; Yang 2000).

Results

Analysis of the Concatenated Data Sets

A combined data set of all the ribosomal RNA and transfer RNA genes along with 12 of the 13 protein-coding genes was analysed for the ratites, tinamous, two neognath outgroups (chicken and duck), as well as the two non-avian outgroups (alligator and turtle). The methods used were MP, NJ and ML. In Chapter 3, the 3rd codon positions were shown to have high rates of substitution and are likely experiencing saturation effects for transitional changes. As these sites make up a large proportion of the variable sites, the sequences were analysed both with and without 3rd codon positions (± 3rd positions) to assess their influence on the resulting tree topology. The trees are shown with their bootstrap support in Figures 4.1 and 4.2. Analyses which included 3rd codon positions yielded trees with similar topologies regardless of the method of phylogenetic reconstruction used. The MP tree and NJ tree were identical and differed from the ML tree only in the placement of the ostrich. Bootstrap support was high for nodes which define the paleognaths as a monophyletic group, and rheas as the most basal lineage among the paleognaths. The support for these nodes was equally high when the other sequenced neognath mitochondrial genomes (rook, peregrine falcon, village indigobird, grey-headed broadbill) were included (results not shown). The sister status of the emu and cassowary, with the kiwi as their next closest kin, also had strong bootstrap support. All three trees grouped the tinamous with the moas, which is contrary to traditional taxonomy and the findings of previous studies.

The sister group relationship of the tinamous and moas, while not strongly supported, was also present in the three trees generated when 3rd codon positions were excluded. Exclusion of these

sites reduced the bootstrap support for several of the nodes in the tree topologies to less than

50%.

When only 1st and 2nd codon positions are included, the topology of the MP tree is changed. The kiwi switched from forming a clade with the emu and cassowary to being the sister group of the ostrich. In the NJ tree, the rheas moved to a more derived position in the tree, making the moa/tinamou clade the most basal lineage in the paleognaths. In the ML tree, the kiwi joined the moa and tinamou clade. Results of the analyses of the individual genes are shown in Appendix 4.2. Generally, the topologies of the trees generated from the analyses of the individual genes are more variable, and support for individual nodes is lower than that of the total concatenated sequence.

Figure 4.1 Phylogenetic hypotheses based on the concatenated sequences of the rRNA, tRNA and 12 of the 13 protein-coding genes (all codon positions). A) The single most parsimonious tree, which is also the neighbour-joining tree using the model GTR + I + Γ. B) The tree topology with the maximum likelihood using the same model. The central column shows the percentage of G+C content at synonymous sites for each taxon. Values shown within the tree indicate the level of bootstrap support (only values > 50 are shown).

Figure 4.2 Phylogenetic hypotheses based on the concatenated sequences of the rRNA, tRNA and 12 of the 13 protein-coding genes (3rd codon positions excluded). A) The single most parsimonious tree. The topology is essentially the same as the MP tree in Figure 4.1 with the exception that the ostrich and kiwi are now sister taxa. B) The neighbour-joining tree using the model GTR + I + Γ. C) The tree topology with the maximum likelihood using the same model. Values shown within the tree indicate the level of bootstrap support (only values > 50 are shown).

The above analyses were carried out using methods that assume that the pattern of nucleotide substitution is the same across all lineages, and that the overall base composition has not changed with time. Examination of these sequences has shown this is not the case. The 3rd codon positions have been shown to deviate significantly from stationarity. While the remaining sites do pass a stationarity test across all sites, they fail a more stringent test when all constant sites are removed (Chapter 3). As uneven base frequencies can result in erroneous trees with taxa clustering according to base composition, it is necessary to examine these trees for evidence of this effect. In Figure 4.1, the base compositions of the synonymous sites for each taxon have been mapped alongside the three trees generated. Notably, with a couple of exceptions, base composition follows a smooth gradient from low G+C content in the most derived branches of the tree to a cluster of high G+C content sequences at the base of the avian tree. This artifact is present in all the trees that included 3rd positions and is also predominant amongst the trees generated when these sites were excluded. The influence of unequal base frequencies was further examined by calculating a distance matrix based solely on nucleotide composition. This matrix was used to produce a Manhattan tree (computed with the program Spectrum) (Figure 4.3). The resultant tree shares one notable feature with 5 out of the 6 trees generated from the total concatenated data set: the rheas, duck and chicken all cluster together. This clustering is seen in all the trees generated from the data sets that included the protein-coding genes, with 3rd codon positions present and in the MP and ML trees when they were excluded.
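A distance based solely on nucleotide composition, of the kind used for the compositional tree, can be sketched as the Manhattan (city-block) distance between the base frequency vectors of two sequences. The Python below is a minimal illustration; the function names are hypothetical, and the exact calculation performed by the program Spectrum is assumed rather than known to follow this form.

```python
def base_frequencies(seq):
    """Frequencies of A, C, G and T, ignoring gaps and ambiguity codes."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    return [bases.count(b) / len(bases) for b in "ACGT"]


def gc_content(seq):
    """Proportion of G and C among the unambiguous bases."""
    freqs = base_frequencies(seq)
    return freqs[1] + freqs[2]


def composition_distance(seq_x, seq_y):
    """Manhattan distance between two base frequency vectors. It ignores
    site-by-site differences and reflects composition alone."""
    fx, fy = base_frequencies(seq_x), base_frequencies(seq_y)
    return sum(abs(p - q) for p, q in zip(fx, fy))
```

Two sequences with the same base composition have a distance of zero however many sites differ between them, so any tree built from such distances carries compositional rather than historical signal.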

To remove the compositional effects, a neighbour joining analysis was performed using

LogDet, with the proportional removal of some of the invariant sites (± 3rd positions, Figure 4.4). The use of LogDet did have an effect on the resulting tree topologies when compared with methods that assume stationarity. In the two trees produced, the moas and tinamous no longer form a clade and the tinamous are the basal paleognath lineage. The rheas no longer cluster with duck and chicken and move to a more derived position in the tree so that the ratites become monophyletic. The mapping of G+C content on the taxa in the tree no longer forms a gradient of increasing G+C content. The two tree topologies which resulted (± 3rd positions) are almost identical, differing only in the position of the ostrich relative to the rhea. The ostrich becomes basal among the extant ratites when 3rd codon positions are removed. Otherwise the rhea is basal. The two trees are also well supported with most of the nodes having high bootstrap support.

Figure 4.3 Nucleotide compositional tree based on the total concatenated sequence.

Analyses of the rRNA and the Protein-Coding Data Sets

As the total concatenated sequence is composed of rRNA, tRNA and protein-coding

genes, it is important to ascertain whether the resultant tree topology is independently supported

by the constituent groups of genes which make up this data set, or whether they contain

conflicting phylogenetic signals, as has been suggested (van Tuinen et al. 1998). Analyses of

the ribosomal genes and the protein-coding genes using LogDet correction are shown in Figure

4.5. For the relationships within the paleognaths, the ribosomal genes and the protein-coding genes (all sites) produced the same tree. A discrepancy does however arise when 3rd codon positions are excluded from the protein-coding genes data set. Without these sites, the relationships within the extant ratites change. The rheas form a clade with the emu and cassowary, followed by the kiwi and then the ostrich, although the bootstrap support for these nodes is weak. For the relationships outside of the ratites, there was a conflict between the placement of the tinamous relative to the duck and chicken and also a discrepancy in the relationship of the duck to the chicken. The tree generated from the ribosomal genes placed the tinamous as the most basal avian lineage, with the ratites more closely affiliated with the chicken and duck, implying that the paleognaths are paraphyletic. By contrast, the tree constructed from protein-coding genes (± 3rd positions) retained the tinamous basal to the ratites and the paleognaths were monophyletic, as in the tree from the total concatenated data set. Unlike the trees from the ribosomal genes or the total concatenated data set, the two trees generated from the protein-coding genes did not place the duck and chicken as sister groups, but rather showed them as branching sequentially off the trunk of the avian tree.

Figure 4.4 Neighbour-joining trees using a LogDet correction with an estimate of the proportion of invariable sites. A) The tree for the total concatenated sequences of the rRNA, tRNA and 12 of the 13 protein-coding genes. B) The tree for the total concatenated data set with 3rd codon positions excluded. Values shown within the tree indicate the level of bootstrap support (only values > 50 are shown).

Figure 4.5 Neighbour-joining trees using a LogDet correction with an estimate of the proportion of invariable sites. A) The tree for the concatenated ribosomal genes. B) The tree for the concatenated protein-coding genes (including all codon positions). C) The tree for the concatenated protein-coding genes excluding 3rd codon positions. Values shown within the tree indicate the level of bootstrap support (only values > 50 are shown).

Due to the computational limitations of the program NHML and the search options it

provides, the use of ML was limited to testing hypothesis trees. In Table 4.1, the ML scores for

five different proposed ratite phylogenetic hypotheses are examined. Along with these

hypotheses is the tree generated here using the total concatenated data set, and LogDet corrected

distances. Based on the total data set, the NJ LogDet tree was found to be the optimal tree

amongst these hypotheses.

Spectral Analysis

The representation of phylogenetic information as a tree has the limitation that it presents

only the splits which are compatible without supplying information on suboptimal signals and

the degree of conflict. Spectral analysis provides a means of assessing the degree of support and/or conflict for each possible split in a data set, without the need to infer a phylogenetic tree. Spectral analysis was performed on a LogDet corrected distance matrix generated from the total concatenated sequence. The results are presented in the form of a Lento plot which shows the support and conflict for the highest ranked 30 splits out of a total of 16,384 possible splits

(Figure 4.6). Splits which appear in the NJ LogDet tree have been indicated on this plot.

Among the splits, the most strongly supported ones are those that define clearly monophyletic groups such as the moas, rheas, tinamous, and the emu and cassowary. The critical splits which define the relationship of these lineages to one another within the ratites (such as splits i, j, k and l) are more weakly supported. The use of spectral analysis also allowed the comparison of the support/conflict ranking for splits that are consistent with other proposed phylogenetic hypotheses (Table 4.1).

Table 4.1 Log-likelihood scores for different ratite phylogenetic hypotheses calculated using the program NHML and the total concatenated data set. Five different proposed ratite phylogenetic hypotheses were examined along with the NJ LogDet tree produced.

Tree    rRNA+tRNA+Protein Genes 13535 bp

Ch=chicken, and A=alligator

1. Cracraft 1974, 1997 Morphology tree.
2. Prager et al. 1976 Immunological distance tree.
3. Sibley and Ahlquist 1990 (UPGMA) DNA-DNA Hybridization tree; van Tuinen et al. 1998 rRNA tree.
4. Sibley and Ahlquist 1990 (Fitch) DNA-DNA Hybridization tree; Bledsoe 1988 Morphology tree.
5. Cooper et al. 1992 12S rRNA tree; Cooper 1997 c-mos tree; Lee et al. 1997 Mitochondrial tree.
6. A variant of the Cooper et al./Lee et al. tree with moas basal.

Maximum-likelihood score marked by "*".

Splits m through r are definitive splits chosen from alternate tree topologies. All these

splits have a much lower ranking than the splits which define the NJ LogDet tree recovered

here, indicating little support for these groupings.

Relative Rates Tests

The results of the relative rates tests are shown in Table 4.2. Initially the alligator was

chosen as the outgroup when comparing the rates of evolution between the neognaths and the

paleognaths. The outgroup was then moved progressively inward to test for rate differences

between the ratites and the tinamous. From this test it is clear, even after the results were corrected with a sequential Bonferroni, that the tinamous are experiencing significantly faster rates of evolution. For further tests within the ratites, the NJ LogDet tree was used as a guide in choosing the appropriate outgroup. As the ostrich and the rhea can both appear basal among the extant ratites, depending on whether 3rd codon positions are included or not, rate tests with these

taxa used one of the moas as the outgroup. After correcting for repeated testing, the relative rates

tests within the ratites were unable to find any statistically significant differences in the rates of evolution among the remaining taxa.

Table 4.2 Three taxon relative rates tests using unambiguous parsimony sites on total concatenated sequences excluding 3rd codon positions.

Outgroup        Ingroup 1            No. of Subs.   Ingroup 2            No. of Subs.   χ²
alligator       chicken              1965           duck                 2033           1.16
                                     1980           southern cassowary   2001           0.11
                                     1984           emu                  1966           0.08
                                     1956           great spotted kiwi   1973           0.07
                                     1997           ostrich              1979           0.08
                                     1978           lesser rhea          1972           0.00
                                     1989           moa 1                1985           0.00
                                     1994           great tinamou        2058           1.01
chicken         great tinamou        896            southern cassowary   720            19.17*
                                     906            emu                  718            21.76*
                                     903            great spotted kiwi   783            8.54*
                                     909            ostrich              755            14.25*
                                     906            lesser rhea          742            16.32*
                                     881            moa 1                728            14.55*
                                     926            lesser tinamou       882            1.07
great tinamou   moa 1                810            southern cassowary   812            0.00
                                     817            emu                  817            0.00
                                     782            great spotted kiwi   863            3.99
                                     806            ostrich              834            0.48
                                     789            lesser rhea          837            1.42
moa 1           ostrich              533            southern cassowary   486            2.17
                                     540            emu                  503            1.31
                                     528            great spotted kiwi   560            0.94
moa 1           lesser rhea          553            southern cassowary   499            2.77
                                     547            emu                  503            1.84
                                     533            great spotted kiwi   558            0.57
ostrich         great spotted kiwi   586            southern cassowary   489            8.76
                                     589            emu                  512            5.39

* statistically significant after a Bonferroni correction

Divergence Time Estimation

To estimate divergence times among taxa, a calibration point, usually provided from fossil dates, is required. For birds, these dates have often been chosen from events such as the split between diapsids and synapsids at 310 million years ago (mya) (Kumar and Hedges 1998), or the split of birds from crocodilians at 254 mya (Härlid et al. 1998). The choice to use ancient dates prior to the origins of birds is one of necessity as the avian fossil record is very

poor. The light weight bodies and hollow bones of birds do not lend themselves well to

fossilization and as a result few reliable avian fossil dates exist. The use of older calibration

dates has inherent dangers as the assumption of similar or constant rates of evolution has to be

made over very long periods of time, between increasingly divergent taxa. It also relies on

models of evolution that compensate for greater numbers of multiple substitutions. Waddell et

al. (1999b) recently attempted to date avian divergences using a calibration point within birds.

They used a Presbyornis fossil date of 55 mya to calibrate the divergence between and

geese in a ML clock constrained tree generated from a 12S rRNA data set. This calibration

point was then used to calculate the divergence date between and ,

which was estimated at 68 mya ± 5 my. The accuracy of calibration dates can hinge on many factors. One of them is the certainty that the fossil represents the oldest known specimen, which

for this Presbyornis specimen is in question as older dates have been reported (Olson 1994).

Another difficulty with this calibration date is the huge discrepancy between it and the date

calculated for the same divergence from another molecular data set. Kumar and Hedges (1998) calculated the split to have occurred 112 mya ± 11.7 my based on 658 nuclear genes. The extreme discrepancy between these two divergence time estimates casts doubt on their utility as unambiguous calibration dates.

Ratites by merit of their terrestrial adaptations (large size and big bones) would be expected to be better represented in the fossil record. The earliest reported ratite fossil, a portion of the tarsometatarsus, was found on a sub-Antarctic island and dates back to the middle to late

Eocene (50-40 mya) (Tambussi et al. 1994). The identification of this fossil has been questioned

as it is fragmentary and has few diagnostic characters linking it with ratites (Waddell et al.

1999b). Aside from this single fossil, no unambiguous ratite fossils are known until the

Oligocene. It is from this time onwards that many of the recognized ratite forms begin to appear. An important fossil ratite, Emuarius gidju, was one such form alive at this time. Bones of this species have been found at Riversleigh Station in northwestern Queensland, Australia and date back to the late Oligocene (25 mya). This species was described as being close to, but still divergent from, the common ancestor of the emus and cassowaries (Boles 1992). Its hindlimb characters suggested that this form post-dated the divergence event and should be included in the Dromaiinae. This fossil provides the best candidate for a calibration date within the ratites. However, additional time still needs to be added to reach the divergence event; an extra 20% would seem reasonable and adds 5 my to make the calibration point 30 mya.

A clock constrained ML tree was constructed from the total concatenated sequence data set (without 3rd codon positions) for the taxa that passed the relative rates tests. This included all the birds except the tinamous. A divergence date of 30 mya for the emu/cassowary split was used to calibrate the clock. The tree is shown in Figure 4.7 along with the calculated divergence dates and the associated standard errors. Using this calibration date, the chicken/duck split was calculated to be 110 mya. This is remarkably close to the estimate of 112 mya made by Kumar and Hedges (1998) using nuclear genes and a calibration date of 310 mya for the diapsid/synapsid split. If the calibration date used by Waddell et al. (1999b) of 68 mya for this split is used to calibrate the tree, then the date of the emu/cassowary split is reduced to 18.5 mya. This date is unreasonable as it postdates the fossil evidence (Emuarius gidju, 25 mya), which itself postdates the emu/cassowary divergence. The divergence dates within the ratites show rapid speciation of the basal ratite lineages. Between 67 mya and 59 mya, the moas, ostriches, rheas, kiwis and emu/cassowary common ancestor branched off from the trunk of the ratite tree.
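The arithmetic behind the calibration can be reproduced with a short Python sketch. Under a strict clock, all node dates scale in proportion to the calibration date, so moving the calibration from one node to another simply rescales the tree. The function names are hypothetical; the numbers (25 mya fossil, 20% added time, 110 mya and 68 mya for the chicken/duck split) are those given in the text.

```python
def calibration_age(fossil_age_mya, extra_fraction=0.20):
    """Fossil age plus an allowance for the time between the divergence
    event and the fossil (20% in the text)."""
    return fossil_age_mya * (1.0 + extra_fraction)


def rescale_date(node_date_mya, old_reference_mya, new_reference_mya):
    """Rescale a node date when a reference node that was dated at
    old_reference_mya is instead fixed at new_reference_mya.
    Assumes a strict molecular clock, so all dates scale proportionally."""
    return node_date_mya * new_reference_mya / old_reference_mya


# Emu/cassowary calibration: 25 mya fossil plus 20% gives 30 mya.
emu_cassowary = calibration_age(25.0)

# With that calibration the chicken/duck split is dated at 110 mya.
# Fixing the chicken/duck split at 68 mya instead rescales the
# emu/cassowary split to roughly 18.5 mya.
recalibrated = rescale_date(emu_cassowary, 110.0, 68.0)
```

The rescaled emu/cassowary date of about 18.5 mya is younger than the 25 mya Emuarius gidju fossil, which is the inconsistency noted above.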

Discussion

Compositional Biases

Compositional biases among sequences being used in phylogenetic reconstruction have ofien been show to produce erroneous tree topologies. In the analyses done here, it was readily apparent that the third codon positions of the protein-coding genes had strong compositional biases between taxa. Removal of these sites was sufficient for the sequences to pass a X'

(P>0.05)test of base frequency stationarity. Analyses of the resulting sequences using MP, NJ and ML still returned trees with certain taxa clustering according to their base compositions, such as the chicken, duck and rheas, even when more complex models of substitution were used.

A stricter stationarity test, in which constant sites were removed, revealed that the differences in base compositions extended to all sites. The change in the resultant tree topologies when using methods that correct for heterogeneous base compositions, such as LogDet and the program NHML, supported the idea that the original tree topologies were being influenced by heterogeneous base frequencies. The use of these corrections resulted in more highly resolved trees with larger bootstrap support and less conflict between the phylogenetic signals of the different data partitions. These results are consistent with the removal of conflicting signals due to compositional effects.
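To make the idea of a composition-tolerant distance concrete, the sketch below computes a LogDet/paralinear-style distance between two aligned sequences from their 4x4 joint base frequency matrix. It uses one common formulation (normalising by the base frequencies of each sequence and dividing by four); this is an illustrative assumption, not necessarily the variant implemented in the programs used here, and the function names are hypothetical.

```python
import math


def det(matrix):
    """Determinant by Laplace expansion (adequate for a 4x4 matrix)."""
    if len(matrix) == 1:
        return matrix[0][0]
    total = 0.0
    for j, value in enumerate(matrix[0]):
        minor = [row[:j] + row[j + 1:] for row in matrix[1:]]
        total += (-1) ** j * value * det(minor)
    return total


def logdet_distance(seq_x, seq_y):
    """LogDet/paralinear-style distance between two aligned DNA sequences."""
    bases = "ACGT"
    pairs = [(a, b) for a, b in zip(seq_x.upper(), seq_y.upper())
             if a in bases and b in bases]
    n = len(pairs)

    # Joint frequency matrix F: F[i][j] is the fraction of sites with
    # base i in sequence x and base j in sequence y.
    F = [[0.0] * 4 for _ in range(4)]
    for a, b in pairs:
        F[bases.index(a)][bases.index(b)] += 1.0 / n

    freq_x = [sum(row) for row in F]
    freq_y = [sum(F[i][j] for i in range(4)) for j in range(4)]

    det_f = det(F)
    freq_product = math.prod(freq_x) * math.prod(freq_y)
    if det_f <= 0 or freq_product <= 0:
        raise ValueError("distance undefined for these sequences")
    return -0.25 * (math.log(det_f) - 0.5 * math.log(freq_product))


print(logdet_distance("AAAACCCCGGGGTTTT", "AAACCCCGGGGTTTTA"))
```

Because the distance is built from the full joint frequency matrix rather than from a single stationary substitution model, it does not require the two sequences to share the same base composition.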

[Figure 4.7: tree image not reproduced. Node dates shown: chicken/duck 110 mya; moa 71.1 mya; rheas 68 mya; kiwi 59.3 mya; emu/cassowary 30 mya.]

Figure 4.7. Clock constrained ML tree. Dates are based on the emu/cassowary divergence and standard errors are calculated using the standard errors of the ML branch lengths combined with a fossil date error of ±5 my using the method of Waddell and Penny (1996).

Ratite Phylogeny

A considerable number of studies, using a variety of markers, have been carried out over the last thirty years in an effort to resolve the relationships of the ratite birds. Except for establishing the monophyly of this group, agreement is generally lacking on the ordering of ratite relationships. The analyses of the mitochondrial genomes have revealed at least some of the reasons for the general difficulties in obtaining congruent trees, as well as specific difficulties when using mitochondrial sequences. From the tree topologies produced here and in other DNA comparisons (Sibley and Ahlquist 1990; Lee et al. 1997; van Tuinen et al. 1998) it is clear that the radiation within the ratites occurred very rapidly. When the tree was calibrated with a molecular clock (Figure 4.7), I estimated that a mere 12 million years separated the first ratite speciation event, leading to moas, from the last major divergence, between the lineages leading to kiwis and to emus and cassowaries. The radiation of the lineage leading to rheas and that of the ostrich was almost contemporaneous, as these events were only about 50,000 years apart. For most of the ratites this rapid speciation was followed by long periods, on the order of tens of millions of years, without further record of speciation. The ratite tree is thus characterized by short internodal distances and long terminal branches. This poses a challenge for methods of phylogenetic reconstruction. With short internodal distances, little time is available between speciation events to accumulate diagnostic substitutions which can later be used to deduce the pattern of bifurcation. When followed by long terminal branches, the signal that has accumulated along the internodal branches begins to be obscured or erased by multiple substitutions. The choice of mitochondrial genome sequences was originally made based on the prediction of this overall tree topology. mtDNA, with its faster rate of evolution, would have a greater chance of accumulating signal over these short periods of time between radiation events. The drawback of using these genes is that their high rate of evolution also makes them prone to saturation effects, especially over such long spans of time.
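The effect of multiple substitutions on long branches can be illustrated with the Jukes-Cantor (JC69) correction, the simplest substitution model. It is used here only as an illustration and is not one of the models fitted in the analyses above; the function names are assumptions.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of compared sites that differ between two aligned sequences."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b)
             if a in "ACGT" and b in "ACGT"]
    return sum(a != b for a, b in pairs) / len(pairs)


def jc69_distance(p):
    """Jukes-Cantor corrected distance (substitutions per site) from p."""
    if p >= 0.75:
        raise ValueError("p >= 0.75: sequences are saturated under JC69")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# As the observed difference grows, the corrected distance grows much
# faster, showing how many substitutions are hidden by multiple hits.
for p in (0.05, 0.30, 0.60):
    print(p, round(jc69_distance(p), 3))
```

At small observed differences the correction is negligible, while near saturation a small change in the observed proportion corresponds to a large change in the inferred number of substitutions. This is why signal from short internodes is easily lost along long terminal branches.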

Several ratite phylogenetic hypotheses (Figure 1.3) have been proposed and among these one of the most contentious issues is the placement of the ostrich and the rhea (Sibley and Ahlquist 1990; van Tuinen et al. 1998). Three topologies have been put forth for their placement among the extant ratites: 1) the ostrich is basal, 2) the rhea is basal, or 3) the ostrich and rhea form a clade at the base of the tree. The tree topologies produced here using complete mitochondrial sequences, totalling 13,535 bp, found no support for the third option and could not resolve between the first two with any strong degree of confidence. For reconstructions using the total data set, the ribosomal genes or the protein-coding genes, the rhea was basal. However, for each data set the situation could be altered by the inclusion or exclusion of different outgroups (other neognaths, results not shown) or sites (3rd codon positions). Lee et al. (1997) reported a similar finding. Even with a given topology the support for either the ostrich or rhea being basal was low, with bootstrap values less than 72%. The tree which had the highest bootstrap value for rheas being basal was based on the ribosomal genes alone. This was the opposite finding to van Tuinen et al. (1998), who argued that the ribosomal genes supported the ostrich being basal and the protein-coding genes supported the rheas being basal. The likely explanation for the discrepancy between our ribosomal tree and that of van Tuinen et al. (1998) was taxon sampling. This situation may be changed when an elephant bird mitochondrial genome is sequenced. The elephant bird is believed to be most closely related to the ostrich and its inclusion would help to break up the long branch leading to the ostrich and might increase the resolution. Even if this is the case, the internodal branch length between the ostrich and the rhea is extremely short, reflecting a brief time interval between the divergence of the ostrich and rhea from their common ancestor. Which of these taxa is basal may remain unknowable with any degree of certainty.

Another uncertainty in ratite systematics has been the placement of the moa relative to the kiwi. All the molecular sequence based studies have placed the kiwi as the closest living relative to the emu and cassowary. Among the four morphological studies of the ratite birds, the kiwi has been placed either with the emu and cassowary (Bledsoe 1988), or at the base of the ratite tree with the moa (Cracraft 1974; Kurochkin 1995; Lee et al. 1997). Based on the results I have obtained, the kiwi is the sister group to the emu and cassowary. There was no evidence that it grouped with the moa or resided at the base of the ratite tree. The bootstrap support was generally high for an Australasian clade of ratites composed of the kiwis, emu and cassowary.

Lee et al. (1997) attempted to reconcile the discrepancy between their molecular tree based on 5,000 bp of mtDNA sequence, which supported a kiwi, emu and cassowary clade, and their morphological tree, which held that the kiwi and moa were sister taxa located at the base of the ratite tree. They argued that both topologies were in fact the same and the discrepancy between them was a result of improper rooting. This argument holds as long as moas are not included in the analysis, as was the case in their molecular study. The inclusion of moas in my study yielded trees which generally support the molecular tree of Lee et al. (1997), and are inconsistent with the morphological tree which has the kiwi as basal sister group to the moa. Most morphological studies place moas at the base of the ratite tree (Cracraft 1974; Bledsoe 1988; Kurochkin 1995). Due to the difficulties in getting organic macromolecules from extinct animal remains, only one evolutionary study involving moas has been published using molecular markers (Cooper et al. 1992). Based on 366 bp of the mtDNA gene 12S rRNA, they found the rhea to be basal among all the ratites rather than the moa. The moa occupied a position in the tree between rhea and ostrich. Considering the difficulties described here with compositional biases among sequences, this result might be expected, but Cooper (1997) later re-analysed these data using a LogDet transform and reported the same result. At no point in the LogDet analyses done here, with any data set (even the ribosomal genes), does the rhea come out as basal to the moa. The discrepancy between my finding of the moa being basal and that of Cooper (1997) of the rhea being basal can most likely be explained by the limited signal available when only 366 bp of sequence are used to resolve relationships in such an ancient group.

Historical Biogeography of the Ratites

The geographic distribution of the ratites has been one of the impediments to recognizing the ratites as a monophyletic group. For many biologists of the 19th and early 20th centuries, it was difficult to conceive how these flightless birds could have such a disjunct distribution across the southern hemisphere, unless they arose independently. With the acceptance of the theory of continental drift and the recognition that all the land masses occupied by the ratites were once part of a larger landmass called Gondwanaland, Cracraft (1973, 1974) proposed the vicariance hypothesis of ratite origins. He proposed that the ratites all descended from a flightless ancestor on this supercontinent. As the landmass fragmented, the descendant ratites all passively rafted with their continental fragments to their current positions. This idea was supported by his cladistic analysis of ratite morphological characters, which produced a phylogeny whose branching pattern was largely in agreement with the continental fragmentation pattern (Figure 4.8). Later analyses of the ratites based on molecular markers and morphological characters disputed the basal placement of the kiwi in the ratite tree (Sibley and Ahlquist 1981, 1990; Bledsoe 1988; Cooper et al. 1992; Cooper 1997; van Tuinen et al. 1998). These groups placed the kiwi in a more derived position (as found in this study), which appears to be at odds with a vicariance origin unless the ratites are even older than currently thought. New Zealand is the only place where kiwis are found and it split off early in the history of the break-up of Gondwanaland (70-85 mya). This has led some researchers to propose that the kiwis may have secondarily invaded New Zealand. This later invasion was suggested to have occurred by either swimming or flight (Cooper et al. 1992). The discovery of volant birds with archaic palates in North America and Europe approximately 50-60 mya (Houde 1986) offered the scenario that perhaps flightlessness arose independently among the ratite lineages and dispersal played a role in their biogeography. Van Tuinen et al. (1998) proposed two alternate views on ratite biogeography, both of which contain the original vicariance hypothesis with some elements of dispersal. Clocking the age of the rhea/ostrich split using an alligator/bird divergence date, Härlid et al. (1998) proposed a maximum age of the rhea/ostrich divergence at 51 mya (± value illegible). Van Tuinen et al. (1998) recognized that the separation of Africa from South America 100 mya (Smith, Smith and Funnell 1994) was too early a date for the separation of rheas from ostriches to be explained by vicariance alone, and that some dispersal either out of Africa, through the northern hemisphere into South America, or the reverse was required. Dispersal was also required in their hypothesis to reach Australia and New Zealand.

The method of calibrating the ratite clock used by Härlid et al. (1998) assumed an instantaneous slowdown in the rate of evolution in birds at the divergence between birds and crocodilians. A more biologically reasonable assumption would be that the slowdown occurred along the lineage leading to birds, which would lower the bird rate and make all the avian divergence dates older. In my clocked tree (Figure 4.7) an internal ratite calibration date was used for the divergence of emu from cassowary at 30 mya. This date gives a duck/chicken split estimate which is in good agreement with Kumar and Hedges (1998), suggesting this calibration may be reasonable. With this calibration, the split between the rhea and ostrich is 68 mya. While this estimate is older than that of Härlid et al. (1998), they likely would have obtained a similar date if they had applied a mid-branch correction method (Penny et al. 1998) along the branch leading to birds. The date of 68 mya is clearly too young for the ostrich to be a product of vicariance in the traditional view of the separation between Africa and South America. However, vicariance cannot be ruled out for the ostrich, as new evidence of land links from Antarctica through India to Madagascar (Krause et al. 1997) lasting until the late Cretaceous suggests Africa could have been connected to Antarctica to a much later date. The date for the divergence of the kiwi from the emu and cassowary was calculated to be 59.3 mya. This date appears to be too young for kiwis to have split off from the ratite tree when New Zealand split from Gondwanaland. However, considering the errors associated with these estimates and with the fossil dating, and recognizing that an increase in the date of the emu/cassowary common ancestor from 30 mya to 35 mya would be sufficient to increase the age of the kiwi lineage from 59.3 mya to 70 mya, a vicariant origin for the kiwis cannot be ruled out. This divergence date of 59.3 mya for the emu/cassowary common ancestor is clearly close to the separation date of Australia from Antarctica, 64 mya (Smith, Smith and Funnell 1994). The moa lineage branched off from the other ratites 71 mya. This is clearly old enough that its presence in New Zealand could be explained as vicariance. The divergence of the rheas, 68 mya, predates the fragmentation of South America from Gondwanaland (34 mya).

To summarize these molecular clock results based on the tree I obtained with the large mtDNA sequence datasets, all the divergence dates of ratite taxa except the ostrich are close to the fragmentation dates of the landmasses on which they are currently found, and thus are consistent with the vicariance hypothesis of Cracraft (1974).
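The comparison summarized above can be laid out as a small tabulation. The sketch below uses only the dates quoted in this section and pairs each lineage with the separation event the text compares it to; the function name and the sign convention are assumptions made for illustration, and no standard errors are applied.

```python
def gap_to_window(divergence_mya, window_mya):
    """Signed distance (my) from a divergence date to a separation window.

    Zero means the date falls inside the window, a negative value means
    the divergence is younger than the separation, and a positive value
    means it is older.
    """
    young, old = window_mya
    if divergence_mya < young:
        return divergence_mya - young
    if divergence_mya > old:
        return divergence_mya - old
    return 0.0


# (lineage, divergence date in mya, separation date or range in mya),
# all values as quoted in the text.
comparisons = [
    ("moa", 71.0, (70.0, 85.0)),                     # New Zealand
    ("kiwi", 59.3, (70.0, 85.0)),                    # New Zealand
    ("emu/cassowary ancestor", 59.3, (64.0, 64.0)),  # Australia from Antarctica
    ("rheas", 68.0, (34.0, 34.0)),                   # South America
    ("ostrich", 68.0, (100.0, 100.0)),               # Africa from South America
]

for name, date, window in comparisons:
    gap = round(gap_to_window(date, window), 1)
    print(f"{name:24s} {gap:+.1f} my")
```

The raw gaps do not by themselves decide the question. The argument in the text rests on whether each gap is small relative to the errors of the clock estimates and of the fossil calibration.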

References

Adachi, J., Cao, Y. and Hasegawa, M. 1993. Tempo and mode of mitochondrial DNA evolution in vertebrates at the amino acid sequence level: rapid evolution in warm-blooded vertebrates. J. Mol. Evol. 36:270-281.

Adachi, J. and Hasegawa, M. 1996a. MOLPHY, version 2.3: Programs for molecular phylogenetics based on maximum-likelihood. Comput. Sci. Monogr. No. 28. Institute of Statistical Mathematics, Tokyo.

Adachi, J. and Hasegawa, M. 1996b. Model of amino acid substitution in proteins encoded by mitochondrial DNA. J. Mol. Evol. 42:459-468.

Boles, W.E. 1992. Revision of Dromaius gidju Patterson and Rich 1987 from Riversleigh, northwestern Queensland, Australia, with a reassessment of its generic position. Natural History Museum Los Angeles County, Science Series 36:195-208.

Cabot, E. 1997. XESEE version 3.1, Eyeball Sequence Editor.

Cao, Y., Adachi, J., Janke, A., Pääbo, S. and Hasegawa, M. 1994. Phylogenetic relationships among eutherian orders estimated from inferred sequences of mitochondrial proteins: instability of a tree based on a single gene. J. Mol. Evol. 39:519-527.

Charleston, M. 1995. SPECTRUM program: http://taxonomy.zoo1ogy.gla.ac.uk/- mac/mike/spectnim/spectrurn.html, Glasgow.

Cooper, A. 1997. Studies of avian ancient DNA: from Jurassic Park to modern island extinctions. In "Avian Molecular Evolution and Systematics" (D.P. Mindell, Ed.), Academic Press, San Diego. pp. 345-373.

Cooper, A., Mourer-Chauviré, C., Chambers, G.K., von Haeseler, A., Wilson, A. and Pääbo, S. 1992. Independent origins of New Zealand moas and kiwis. Proc. Nat. Acad. Sci. U.S.A. 89:8741-8744.

Caspers, G., Wattel, J. and de Jong, W.W. 1994. αA-crystallin sequences group tinamou with ratites. Mol. Biol. Evol. 11:711-713.

Cracraft, J. 1974. Phylogeny and evolution of the ratite birds. Ibis 116:494-521.

Cummings, M.P., Otto, S.P. and Wakeley, J. 1995. Sampling properties of DNA sequence data in phylogenetic analysis. Mol. Biol. Evol. 12:814-822.

De Boer, L.E.M. 1980. Do the chromosomes of the kiwi provide evidence for a monophyletic origin of the ratites? Nature 287:84-85.

De Rijk, P., Wuyts, J., Van de Peer, Y., Winkelmans, T. and De Wachter, R. 2000. The European Large Subunit Ribosomal RNA database. Nucleic Acids Res. 28:177-178.

Desjardins, P. and Morais, R. 1990. Sequence and gene organization of the chicken mitochondrial genome: a novel gene order in higher vertebrates. J. Mol. Biol. 212:599-634.

Feduccia, A. 1999. The origin and evolution of birds. 2nd Edition. Yale Univ. Press, New Haven, Connecticut.

Felsenstein, J. 1993. PHYLIP (Phylogeny Inference Package) version 3.52.

Distributed by the author. Department of Genetics, University of Washington, Seattle.

Foster, P.G. and Hickey, D.A. 1999. Compositional bias may affect both DNA-based and protein-based phylogenetic reconstructions. J. Mol. Evol. 48:284-290.

Galtier, N. and Gouy, M. 1995. Inferring phylogenies from DNA sequences of unequal base compositions. Proc. Nat. Acad. Sci. 92:11317-11321.

Galtier, N., Tourasse, N. and Gouy, M. 2000. A nonhyperthermophilic common ancestor to extant life forms. Science 283:220-221.

Gutell, R.R. 1994. Collection of small subunit (16S- and 16S-like) ribosomal RNA structures: 1994. Nucleic Acids Res. 22:3502-3507.

Hagelberg, E. 1994. Mitochondrial DNA from ancient bones. In "Ancient DNA" (B. Herrmann and S. Hummel, eds.), Springer-Verlag, New York.

Härlid, A., Janke, A. and Arnason, U. 1997. The mtDNA sequence of the ostrich and the divergence between paleognathous and neognathous birds. Mol. Biol. Evol. 14:754-761.

Härlid, A., Janke, A. and Arnason, U. 1998. The complete mitochondrial genome of Rhea americana and early avian divergences. J. Mol. Evol. 46:669-679.

Hasegawa, M., Kishino, H. and Yano, T. 1985. Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J. Mol. Evol. 21:160-174.

Hay, W.W., DeConto, R., Wold, C.N., Wilson, K.M., Voigt, S., Schulz, M., Wold-Rossby, A., Dullo, W.-C., Ronov, A.B., Balukhovsky, A.N. and Soeding, E. 1999. Alternative global Cretaceous paleogeography. In "The Evolution of Cretaceous Ocean/Climate Systems" (E. Barrera and C. Johnson, eds.), Geological Society of America Special Paper 332, pp. 1-47.

Hendy, M.D. and Penny, D. 1993. Spectral analysis of phylogenetic data. J. Classif. 10:5-24.

Hendy, M.D., Steel, M.A. and Penny, D. 1994. A discrete Fourier analysis for evolutionary trees. Proc. Natl. Acad. Sci. USA 91:3339-3343.

Hickson, R.E., Simon, C., Cooper, A., Spicer, G.S., Sullivan, J. and Penny, D. 1996. Conserved sequence motifs, alignment and secondary structure for the third domain of animal 12S rRNA. Mol. Biol. Evol. 13:150-169.

Houde, P. 1986. Ostrich ancestors found in the northern hemisphere suggest new hypothesis of ratite origins. Nature 324:563-565.

Irwin, D., Kocher, T.D. and Wilson, A.C. 1991. Evolution of the cytochrome b gene of mammals. J. Mol. Evol. 32:128-144.

Janke, A. and Arnason, U. 1997. The complete mitochondrial genome of Alligator mississippiensis and the separation between recent Archosauria (birds and crocodiles). Mol. Biol. Evol. 14:1266-1272.

Jermiin, L.S., Foster, P.G., Graur, D., Lowe, R.M. and Crozier, R.H. 1996. Unbiased estimation of symmetrical directional mutation pressure from protein-coding DNA. J. Mol. Evol. 42:476-480.

Krause, D.W., Prasad, G.V.R., von Koenigswald, W., Sahni, A. and Grine, F. 1997. Cosmopolitanism among Gondwanan mammals. Nature 390:504-507.

Kumar, S., Tamura, S. and Nei, M. 1993. MEGA, Molecular Evolutionary Genetics Analysis, Version 1.0 (Pennsylvania State Univ., University Park).

Kumar, S. and Hedges, S.B. 1998. A molecular timescale for vertebrate evolution. Nature 392:917-920.

Kumazawa, Y. and Nishida, M. 1993. Sequence evolution of mitochondrial tRNA genes and deep-branch animal phylogenetics. J. Mol. Evol. 37:380-398.

Kurochkin, E.N. 1995. Morphological differentiation of paleognathous and neognathous birds.

Courier Forschungsinst. Senckenberg 181:79-88.

Lee, K., Feinstein, J. and Cracraft, J. 1997. The phylogeny of ratite birds: resolving conflicts between molecular and morphological data sets. In "Avian Molecular Evolution and Systematics". Academic Press, San Diego. pp. 173-209.

Lento, G.M., Hickson, R.E., Chambers, G.K. and Penny, D. 1995. Use of spectral analysis to test hypotheses of the origin of pinnipeds. Mol. Biol. Evol. 12:28-52.

Lake, J.A. 1994. Reconstructing evolutionary trees from DNA and protein sequences: paralinear distances. Proc. Natl. Acad. Sci. USA 91:1455-1459.

Lockhart, P.J., Steel, M.A., Hendy, M.D. and Penny, D. 1994. Recovering evolutionary trees under a more realistic model of sequence evolution. Mol. Biol. Evol. 11:605-612.

Maddison, W.P. and Maddison, D.R. 1992. MacClade version 3. Sinauer Associates, Sunderland, Massachusetts.

Mindell, D.P., Sorenson, M.D. and Dimcheff, D.E. 1998. An extra nucleotide is not translated in mitochondrial ND3 of some birds and turtles. Mol. Biol. Evol. 15:1568-1571.

Mindell, D.P., Sorenson, M.D., Dimcheff, D.E., Hasegawa, M., Ast, J.C. and Yuri, T. 1999. Interordinal relationships of birds and other reptiles based on whole mitochondrial genomes. Syst. Biol. 48:138-152.

ODSN 2000. n \t 11 .owi 11.ci r, Map Generator.

Olson, S.L. 1994. A giant Presbyornis (Aves: Anseriformes) and other birds from the Aquia Formation of Maryland and Virginia. Proc. Biol. Soc. of Washington 107:429-435.

Penny, D. and Hasegawa, M. 1997. The platypus put in its place. Nature 387:549-550.

Penny, D., Hasegawa, M., Waddell, P.J. and Hendy, M.D. 1999. Mammalian evolution: timing and implications from using the LogDeterminant transform for proteins of differing amino acid composition. Syst. Biol. 48:76-93.

Posada, D. and Crandall, K.A. 1998. Modeltest: testing the model of DNA substitution. Bioinformatics 14:817-818.

Prager, E.M., Wilson, A.C., Osuga, D.T. and Feeney, R.E. 1976. Evolution of flightless land birds on southern continents: transferrin comparison shows monophyletic origin of ratites. J. Mol. Evol. 8:283-294.

Russo, C.A., Takezaki, N. and Nei, M. 1996. Efficiencies of different genes and different tree-building methods in recovering a known vertebrate phylogeny. Mol. Biol. Evol. 13:525-536.

Sambrook, J., Fritsch, E.F. and Maniatis, T. 1989. Molecular Cloning: A Laboratory Manual, 2nd ed. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York.

Sankoff, D., Leduc, G., Antoine, N., Paquin, B., Lang, B.F. and Cedergren, R. 1992. Gene order for phylogenetic inference: evolution of the mitochondrial genome. Proc. Nat. Acad. Sci. U.S.A. 89:6575-6579.

Sibley, C.G. and Ahlquist, J.E. 1981. The phylogeny and relationships of the ratite birds as indicated by DNA-DNA hybridization. In "Evolution Today, Proc. 2nd Congr. for Systematic Evolutionary Biology" (G.G.E. Scudder and L.L. Reveal, Eds). Hunt Inst. Botanic Document, Pittsburgh, pp. 301-335.

Sibley, C.G. and Ahlquist, J.E. 1990. Phylogeny and classification of birds. Yale Univ Press,

New Haven Connecticut.

Smith, A.G., Smith, D.G. and Funnell, B.M. 1994. Atlas of Mesozoic and Cenozoic coastlines. Cambridge University Press, Cambridge, England.

Sprinzl, M., Dank, N., Nock, S. and Schon, A. 1991. Compilation of tRNA sequences and sequences of tRNA genes. Nuc. Acids Res. 19:2127-2171.

Stapel, S.O., Leunissen, S.A.M., Versteeg, M., Wattel, J. and de Jong, W.W. 1984. Ratites as the oldest offshoot of avian stem: evidence from α-crystallin A sequences. Nature 311:257-259.

Steel, M. 1994. Recovering a tree from the Markov leaf colourations it generates under a Markov model. Appl. Math. Lett. 7:19-23.

Strimmer, K. and von Haeseler, A. 1996. Quartet puzzling: a quartet maximum likelihood method for reconstructing tree topologies. Mol. Biol. Evol. 13:964-969.

Sullivan, J., Holsinger, K.E. and Simon, C. 1995. Among-site rate variation and phylogenetic analysis of 12S rRNA in sigmodontine rodents. Mol. Biol. Evol. 12:988-1001.

Swofford, D.L. 1998. PAUP*: Phylogenetic analysis using parsimony (* and other methods), ver 4.0.bS. Sinauer, Sunderland, Mass.

Tambussi, C.P., Noriega, J.I., Gazdzicki, A., Tatur, A., Reguero, M.A. and Vizcaino, S.F. 1994. Ratite bird from the Paleogene La Meseta Formation, Seymour Island, Antarctica. Polish Polar Res. 15:15-20.

Thompson, J.D., Gibson, T.J., Plewniak, F., Jeanmougin, F. and Higgins, D.G. 1997. The Clustal X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 24:4876-4882.

van de Peer, Y., Van den Broeck, I., De Rijk, P. and De Wachter, R. 1994. Database on the structure of small ribosomal subunit RNA. Nucleic Acids Res. 22:3488-3494.

van Tuinen, M., Sibley, C.G. and Hedges, S.B. 1998. Phylogeny and biogeography of ratite birds inferred from DNA sequences of the mitochondrial ribosomal genes. Mol. Biol. Evol. 15:370-376.

Waddell, P.J. and Penny, D. 1996. Evolutionary trees of apes and humans from DNA sequences. Handbook of human symbolic evolution. Oxford University Press. pp. 53-73.

Waddell, P.J., Penny, D. and Moore, T. 1997. Hadamard conjugation and modeling sequence evolution with unequal rates across sites. Mol. Phylogenet. Evol. 8:33-50.

Waddell, P.J., Cao, Y., Hauf, J. and Hasegawa, M. 1999a. Using novel phylogenetic methods to evaluate mammalian mtDNA, including amino acid-invariant sites-LogDet plus site stripping, to detect internal conflicts in the data, with special reference to the position of hedgehog, armadillo, and elephant. Syst. Biol. 48:31-53.

Waddell, P.J., Cao, Y., Hasegawa, M. and Mindell, D.P. 1999b. Assessing the Cretaceous superordinal divergence times within birds and placental mammals by using whole mitochondrial protein sequences and an extended statistical framework. Syst. Biol. 48:119-137.

Xia, X. 2000. Data analysis in molecular biology and evolution. Kluwer Academic Publishers. Boston/Dordrecht/London.

Yang, Z. 1994. Estimating the pattern of nucleotide substitution. J. Mol. Evol. 39:105-111.

Yang, Z. 2000. Phylogenetic Analysis by Maximum Likelihood (PAML), Version 3.0. University College, London, London, England.

Yang, Z. and Roberts, D. 1995. On the use of nucleic acid sequences to infer early branchings in the tree of life. Mol. Biol. Evol. 12:451-458.

Zardoya, R. and Meyer, A. 1999. Complete mitochondrial genome suggests diapsid affinities of turtles. Proc. Nat. Acad. Sci. U.S.A. 95:14226-14231.

Chapter 5

Summary

Characterization of the Paleognath Mitochondrial Genomes

The nine paleognath mitochondrial genomes sequenced here are similar in size and gene complement to other vertebrate genomes. Bird mitochondrial genomes that have been sequenced to date differ from other vertebrate genomes in their gene order: ND6 is separated from the control region by tRNA-Glu, and the origin of light strand replication found in the WANCY region in other vertebrates is absent. As paleognaths likely represent one of the basal avian lineages, it was of interest to find that eight of the nine genomes share this avian gene order. The ninth, the elegant crested tinamou, has a rearrangement in which ND6 was translocated to the 3' end of the control region. Similar rearrangements to that found in this tinamou have subsequently been found in many independent bird lineages. This region appears to be a "hot spot" for gene rearrangements in bird mtDNA. A large number of the translocations observed occur between two tRNA genes, one of which is tRNA-Glu. Among the ratites, this region was the most variable in the entire mitochondrion. With the mitochondrial genome under strong constraints for maintaining a small size, spacer sequences are limited to a few base pairs or are simply absent between genes. However, the highest level of size variation was found between these two tRNAs, with the spacer ranging from over 80 bp in kiwis and the great tinamous down to 8 bp in the lesser rhea and 4 bp in the elegant crested tinamou. Another feature that appears to be unique to virtually all non-passerine birds observed here was an additional base found at position 174 in the gene ND3. A single base insertion in a protein-coding gene results in a frameshift. Such an event should be strongly selected against, yet this extra base was found in all the paleognaths sequenced here and in representatives of all the non-passerine orders except one. It was also not present in the passerines surveyed, which included oscines and suboscines. The maintenance of this insertion across almost all the diverse avian taxa sampled here, representing all bird orders, would seem to indicate that a mechanism exists which edits this base out prior to the translation of the mRNA. This conclusion can be made as the sites beyond the insertion maintain a reading frame, albeit shifted by one base, which codes for a conserved protein product. Furthermore, the codon sites maintain the ratio of decreasing variation in base substitution rates of 3rd:1st:2nd typically observed for protein-coding genes under selection. Generally, the features observed for these paleognath genomes were consistent with those found in other avian genomes.
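The frameshift argument can be illustrated with a toy sequence. The sketch below is not real ND3 sequence: it inserts one extra base into a short made-up coding sequence and shows that the downstream codons are restored once that base is skipped. The function names and the example sequence are assumptions for illustration only.

```python
def codons(seq):
    """Split a coding sequence into complete codons (trailing bases dropped)."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def skip_base(seq, position):
    """Return seq with the base at 1-based `position` removed."""
    return seq[:position - 1] + seq[position:]


# Toy reference reading frame (hypothetical, not ND3).
reference = "ATGGCCAAATTTGGG"

# Insert one extra base after position 6, so it sits at position 7.
with_extra = reference[:6] + "C" + reference[6:]

print(codons(with_extra))                # downstream codons are shifted
print(codons(skip_base(with_extra, 7)))  # original codons are recovered
```

Read straight through, every codon after the insertion changes. If the extra base is ignored, the downstream codons are identical to the reference, which is the pattern of conservation described above for sites beyond the ND3 insertion.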

Rates and Patterns of Nucleotide and Amino Acid Sequence Evolution

I compared the rates and patterns of nucleotide and amino acid sequence evolution within the paleognaths, as well as between the paleognaths and other avian and non-avian genomes. Sequence divergence within the paleognaths, excluding confamilial comparisons, revealed a considerable amount of genetic divergence. Substitutional saturation effects were already apparent at third codon positions for transitional substitutions. This indicated either a group with high rates of evolution or that the paleognaths constitute a very ancient lineage within birds. Examination of the base composition across these taxa showed a considerable amount of heterogeneity. This was most clearly evident at third codon positions, which are likely to be more susceptible as they are under reduced selection. Stringent tests of stationarity examining only variable positions demonstrated that all sites within the protein-coding genes are experiencing directional mutation pressure. This is also the case for the ribosomal RNA and the transfer RNA genes, although to a much lesser degree. Within the extant ratites, only the rheas were experiencing directional mutation pressure driving them toward greater G+C content. The others were driven towards greater A+T content. The moas were effectively neutral. In comparisons of the bird genomes to those of the mammals, it is clear that the mammals are predominantly responding to A+T pressure, while all but one of the neognaths are neutral or increasing in G+C content. Interestingly, most of the ratites, with their mammal-like body temperatures and metabolic rates, share this mammal-like A+T directional mutation pressure, with the rheas being the glaring exception. The bias in G+C or A+T base content is not limited to sites under reduced selection. An examination of nonsynonymous sites reveals that birds overall also have higher G+C content at these sites than mammals.
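A first look at this kind of compositional bias is often taken by tabulating G+C content separately for each codon position. The sketch below shows one minimal way to do that for an in-frame coding sequence; it is an illustration with a hypothetical function name, not the procedure used to estimate directional mutation pressure in the thesis.

```python
def gc_by_codon_position(cds):
    """G+C fraction at codon positions 1, 2 and 3 of an in-frame coding sequence.

    Characters other than A, C, G and T are ignored. Returns a tuple of
    three fractions (NaN for a position with no usable sites).
    """
    cds = cds.upper()
    fractions = []
    for offset in range(3):
        sites = [base for base in cds[offset::3] if base in "ACGT"]
        gc_count = sum(base in "GC" for base in sites)
        fractions.append(gc_count / len(sites) if sites else float("nan"))
    return tuple(fractions)


# Toy sequence (hypothetical): codons ATG, GCC, AAC.
first, second, third = gc_by_codon_position("ATGGCCAAC")
print(first, second, third)
```

Comparing the third-position values across taxa gives a quick indication of which lineages are drifting toward G+C or A+T, since third positions are under the least selective constraint.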

The failure to pass tests of stationarity has ramifications for the use of these sequences in inferring relationships. Most methods of phylogenetic reconstruction assume uniform base frequencies across the sequences being analysed. Violation of this assumption can lead to erroneous trees.

Analyses of these sequences also demonstrated that a considerable amount of rate variation across sites is present.

Inferring Ratite Phylogenetic Relationships

Phylogenetic reconstructions using the concatenated sequences of the rRNA, tRNA and 12 of the 13 protein-coding genes produced trees that were largely resolved and well supported when the heterogeneity in the base composition across taxa was corrected. The tree topologies produced by ribosomal genes or protein-coding genes alone agreed with the tree produced using the total concatenated sequence with respect to the arrangements of the ratites. However, subsets of the total concatenated data set were more weakly supported. Exclusion of 3rd codon positions in the protein-coding genes resulted in a failure to fully resolve the tree, suggesting that despite high levels of saturation at these sites, they still retained some phylogenetic signal. Individual genes had even less resolving power and were not reliable at recovering the tree.

The tree constructed from the concatenated sequences placed the moas basal among the ratites, in agreement with all the morphological studies, but in contrast with the single molecular analysis involving moas. The kiwis were not found to be the sister group to the moas, but were far more derived and were most closely related to the emu and cassowary. This agrees with all the molecular studies performed on this group to date but disagrees with two of the three morphological studies. The order of branching of the rhea and ostrich could not be resolved unequivocally.

Ratite Historical Biogeography

Relative rates tests showed the ratites and the neognath outgroups to have similar rates of evolution, and a molecular clock was applied. The date used to calibrate the clock was that of a fossil ratite close to the divergence of emus from cassowaries. This calibration placed the divergence between Anseriformes and Galliformes at 110 mya, which is very close to an independent estimate using nuclear genes. The basal divergences of ratite lineages all occurred within a short period of about 12 myr, which has likely been responsible for the difficulty in resolving the relationships within this group. A vicariant origin for the ratites associated with the fragmentation of Gondwana could not be rejected for the major ratite lineages. The moas speciated sufficiently long ago that they likely were in New Zealand when it drifted away from Antarctica. The divergence of the emu/cassowary common ancestor from the kiwi coincides approximately with the separation between Australia and Antarctica. This same date is very close for the kiwis to have separated with New Zealand. However, considering the associated errors, this hypothesis cannot be ruled out. The ostrich, which is estimated to have diverged 68 mya, would appear too young for the split between Africa and South America, which occurred 100 mya. However, recent evidence for land bridges between Antarctica, India and Madagascar in the late Cretaceous allows for an alternate route by which Africa was potentially connected to Gondwana to a much later date, and thus does not rule out a vicariant origin for the ostrich.
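The arithmetic behind a strict molecular clock of the kind applied here can be sketched as follows. Under a constant rate, the distance between two lineages accumulates along both branches, so a calibration point of known age gives a rate of d / (2t), and any other divergence is dated as d / (2r). The function names and all numbers below are hypothetical placeholders; they are not the distances or the calibration age used in the thesis.

```python
def substitution_rate(calibration_distance, calibration_age_myr):
    """Rate per lineage per million years from one calibration point.

    The pairwise distance accumulates along two branches, hence the factor 2.
    """
    return calibration_distance / (2.0 * calibration_age_myr)


def divergence_time(pairwise_distance, rate):
    """Estimated divergence time (myr) under a strict molecular clock."""
    return pairwise_distance / (2.0 * rate)


if __name__ == "__main__":
    # Hypothetical values: a pair separated by 0.10 substitutions/site
    # whose split is dated at 25 mya by a fossil.
    rate = substitution_rate(0.10, 25.0)
    # Date a second, deeper split with a hypothetical distance of 0.30.
    print(divergence_time(0.30, rate))
```

In practice the distances would be corrected for multiple substitutions and the estimate reported with its error, which is why the dates discussed above are treated as approximate.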

Complete mitochondrial genomes have proven to be a powerful tool in resolving the deeper relationships among mammals, and in this thesis I have shown that these sequences have successfully clarified the relationships among the ratite birds. However, the antiquity of basal divergences among birds coupled with the loss of flight in ratites has been associated with the evolution of compositional biases and saturation in third codon positions. Methods which correct for these problems are essential in recovering well supported trees free of topological instability.

Appendices

Appendix 2.2 Amino acid frequencies for the mitochondrial protein-coding genes.

[Figure: bar charts of amino acid frequencies, one panel per gene (panel titles legible in the extraction: ATPase 6, Cyt b). X-axis: amino acid (L I C V M F Y W P A G S T N Q H K R D E). Legend entries legible in the extraction: chicken, tinamou, emu, cassowary, rhea, ostrich, moa.]

Appendix 2.3 Listing of birds sequenced for ND3, indicating the presence or absence of a single base insertion at position 174 bp in the gene.

Order               Species                     Common Name                  Extra Base

Struthioniformes    Struthio camelus            Ostrich
Rheiformes          Pterocnemia pennata         Lesser Rhea
Casuariiformes      Casuarius casuarius         Southern Cassowary
                    Dromaius novaehollandiae    Emu
Apterygiformes      Apteryx haastii             Great Spotted Kiwi
Tinamiformes        Eudromia elegans            Elegant Crested Tinamou
Sphenisciformes     Pygoscelis adeliae          Adelie Penguin
                    Eudyptula minor             Little Blue Penguin
Gaviiformes         Gavia immer                 Common Loon
Podicipediformes    Podiceps auritus            Slavonian Grebe
Procellariiformes   Diomedea exulans            Wandering Albatross
                    Diomedea chrysostoma        Grey-headed Albatross
Pelecaniformes      Sula bassana                Northern Gannet
Ciconiiformes       Ardea herodias              Great Blue Heron
Anseriformes        Branta canadensis           Canada Goose
Falconiformes       Aquila chrysaetos           Golden Eagle
Galliformes         Gallus gallus               Chicken
Gruiformes          Grus canadensis             Sandhill Crane
Charadriiformes     Sterna paradisaea           Arctic Tern
                    Arenaria interpres          Ruddy Turnstone
Columbiformes       Columba livia               Rock Dove
Psittaciformes      Amazona tucumana            Tucuman Amazon
                    Agapornis roseicollis       Peach-faced Lovebird
Cuculiformes        Chalcites lucidus           Golden Bronze Cuckoo
                    Cuculus canorus
Strigiformes        Nyctea scandiaca            Snowy Owl
Caprimulgiformes    Chordeiles minor            Common Nighthawk
                    Caprimulgus vociferus       Whip-poor-will
Apodiformes         Chaetura pelagica           Chimney Swift
                    Archilochus colubris        Ruby-throated Hummingbird
Coliiformes         Colius striatus             Speckled Mousebird
Trogoniformes       Apaloderma narina           Narina's Trogon
Coraciiformes       Dacelo gigas                Kookaburra
Piciformes          Picoides pubescens          Downy Woodpecker
Passeriformes       Parus atricapillus          Black-capped Chickadee
                    Cardinalis cardinalis       Northern Cardinal
                    Sturnella neglecta          Western Meadowlark
                    Agelaius phoeniceus         Red-winged Blackbird
                    Carduelis chloris           Greenfinch
                    Zonotrichia leucophrys      White-crowned Sparrow
                    Seiurus aurocapillus        Ovenbird
                    Plectrophenax nivalis       Snow Bunting
                    Pitta versicolor            Noisy Pitta

Appendix 3.2 Symmetrical directional mutation pressure estimates for the mitochondrial protein-coding genes for various taxa.

            rook      chicken   tinamou   moa     rhea     emu       cassowary  kiwi      ostrich
ATPase6
ATPase8
COI
COII
COIII
Cyt b
ND1
ND2
ND3
ND4L
ND4
ND5
ND6
12 genes    0.483***  0.5362    0.477***  0.509   0.567E   0.447***  0.475***   0.454***  0.493*

12 genes: concatenated sequences of the 12 H-strand encoded protein-coding genes.

Appendix 3.2 continued.

            alligator  turtle    human   cow       mouse     cat       wallaroo  frog
ATPase6     0.499      0.354***  0.516   0.360***  0.320***  0.380***  0.401**   0.299***
COI         0.454*     0.371***  0.507   0.384***  0.323***  0.381***  0.368***  0.305***
12 genes    0.467***   0.385***  0.5282  0.415***  0.363***  0.432***  0.422***  0.312***

12 genes: concatenated sequences of the 12 H-strand encoded protein-coding genes.

Appendix 4.1 Models of substitution estimated for the individual gene data sets (all codon positions) using the program Modeltest v3. The model and parameters shown were those satisfying the Akaike Information Criterion.
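The AIC values listed in this appendix can be checked against the reported likelihoods with AIC = 2(-lnL) + 2k, where k is the number of free model parameters. The sketch below is an illustration rather than Modeltest's own code; the function names are hypothetical, and the parameter counts are an assumption (free substitution rates relative to G-T = 1, plus three free base frequencies, plus one each for I and G, with branch lengths not counted).

```python
# Assumed free parameters per base model: relative substitution rates
# plus three free base frequencies. Branch lengths are not counted.
BASE_MODEL_PARAMS = {
    "K81uf": 2 + 3,
    "TrN": 2 + 3,
    "TVM": 4 + 3,
    "GTR": 5 + 3,
}


def count_parameters(model):
    """Number of free parameters for a model string such as 'GTR+I+G'."""
    parts = model.split("+")
    k = BASE_MODEL_PARAMS[parts[0]]
    # +I adds the proportion of invariable sites, +G the gamma shape.
    k += sum(1 for extra in parts[1:] if extra in ("I", "G"))
    return k


def aic(neg_log_likelihood, model):
    """Akaike Information Criterion from -lnL and the model name."""
    return 2.0 * neg_log_likelihood + 2.0 * count_parameters(model)


if __name__ == "__main__":
    # -lnL values as listed in Appendix 4.1 for ND1 and ND3.
    print(aic(6987.9131, "TrN+I+G"))
    print(aic(2406.1897, "TVM+G"))
```

Under these assumed counts the two calls agree with the AIC values listed for ND1 (13989.826172) and ND3 (4828.379395) to within rounding.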

ATPase 6
Model selected: TrN+G
-lnL = 4867.7456
Rate matrix: R(a) [A-C] = 1.0000, R(b) [A-G] = 12.7723, R(c) [A-T] = 1.0000, R(d) [C-G] = 1.0000, R(e) [C-T] = 8.8433, R(f) [G-T] = 1.0000
Base frequencies: freqA = 0.3409, freqC = 0.3826, freqG = 0.0607, freqT = 0.2158
Among-site rate variation: Proportion of invariable sites = 0
Variable sites (G): Gamma distribution shape parameter = 0.
AIC = 9747.491211

ATPase 8
Model selected: GTR+I+G
-lnL = 1287.9136
Rate matrix: R(a) [A-C] = 0.0627, R(b) [A-G] = 2.7400, R(c) [A-T] = 0.0000, R(d) [C-G] = 0.3767, R(e) [C-T] = 0.4003, R(f) [G-T] = 1.0000
Base frequencies: freqA = 0.3738, freqC = 0.3309, freqG = 0.0357, freqT = 0.2596
Among-site rate variation: Proportion of invariable sites (I) = 0.2826
Variable sites (G): Gamma distribution shape parameter = 2.
AIC = 2595.827148

COI
Model selected: GTR+I+G
-lnL = 9450.2764
Rate matrix: R(a) [A-C] = 1.8055, R(b) [A-G] = 9.3414, R(c) [A-T] = 2.9061, R(d) [C-G] = 0.5790, R(e) [C-T] = 27.6002, R(f) [G-T] = 1.0000
Base frequencies: freqA = 0.3345, freqC = 0.3133, freqG = 0.1170, freqT = 0.2352
Among-site rate variation: Proportion of invariable sites (I) = 0.3446
Variable sites (G): Gamma distribution shape parameter = 0.3694
AIC = 18920.552734

COII
Model selected: GTR+I+G
-lnL = 4411.7798
Rate matrix: R(a) [A-C] = 9887.5576, R(b) [A-G] = 97838.5078, R(c) [A-T] = 26279.9766, R(d) [C-G] = 4293.6738, R(e) [C-T] = 213306.2969, R(f) [G-T] = 1.0000
Base frequencies: freqA = 0.3521, freqC = 0.3233, freqG = 0.0999, freqT = 0.2247
Among-site rate variation: Proportion of invariable sites (I) = 0.1554
Variable sites (G): Gamma distribution shape parameter = 0.2598
AIC = 8843.559570

COIII
Model selected: GTR+I+G
-lnL = 4945.2021
Rate matrix: R(a) [A-C] = 208355.8135, R(b) [A-G] = 1140900.7500, R(c) [A-T] = 761954.8125, R(d) [C-G] = 23075.2949, R(e) [C-T] = 3904817.7500, R(f) [G-T] = 1.0000
Base frequencies: freqA = 0.3310, freqC = 0.3423, freqG = 0.1161, freqT = 0.2105
Among-site rate variation: Proportion of invariable sites (I) = 0.3915
Variable sites (G): Gamma distribution shape parameter = 0.6082
AIC = 9910.404297

Cytochrome b
Model selected: TVM+I+G
-lnL = 7489.0952
Rate matrix: R(a) [A-C] = 3.5337, R(b) [A-G] = 17.5761, R(c) [A-T] = 2.1807, R(d) [C-G] = 1.4535, R(e) [C-T] = 17.5761, R(f) [G-T] = 1.0000
Base frequencies: freqA = 0.3089, freqC = 0.3802, freqG = 0.0872, freqT = 0.2237
Among-site rate variation: Proportion of invariable sites (I) = 0.1482
Variable sites (G): Gamma distribution shape parameter = 0.3293
AIC = 14996.190430

ND1
Model selected: TrN+I+G
-lnL = 6987.9131
Rate matrix: R(a) [A-C] = 1.0000, R(b) [A-G] = 15.2058, R(c) [A-T] = 1.0000, R(d) [C-G] = 1.0000, R(e) [C-T] = 9.7817, R(f) [G-T] = 1.0000
Base frequencies: freqA = 0.3302, freqC = 0.3495, freqG = 0.0809, freqT = 0.2394
Among-site rate variation: Proportion of invariable sites (I) = 0.3317
Variable sites (G): Gamma distribution shape parameter = 0.6092
AIC = 13989.826172

ND2
Model selected: GTR+I+G
-lnL = 8459.5430
Rate matrix: R(a) [A-C] = 0.2419, R(b) [A-G] = 2.8587, R(c) [A-T] = 0.1056, R(d) [C-G] = 0.2917, R(e) [C-T] = 1.3457, R(f) [G-T] = 1.0000
Base frequencies: freqA = 0.3641, freqC = 0.3512, freqG = 0.0658, freqT = 0.2190
Among-site rate variation: Proportion of invariable sites (I) = 0.1626
Variable sites (G): Gamma distribution shape parameter = 0.5335
AIC = 16939.085938

ND3
Model selected: TVM+G
-lnL = 2406.1897
Rate matrix: R(a) [A-C] = 285056.7188, R(b) [A-G] = 952813.2500, R(c) [A-T] = 143547.8750, R(d) [C-G] = 0.0000, R(e) [C-T] = 952813.2500, R(f) [G-T] = 1.0000
Base frequencies: freqA = 0.3294, freqC = 0.3467, freqG = 0.0841, freqT = 0.2398
Among-site rate variation: Proportion of invariable sites = 0
Variable sites (G): Gamma distribution shape parameter = 0.2820
AIC = 4828.379395

ND4L
Model selected: TVM+G
-lnL = 2233.3955
Rate matrix: R(a) [A-C] = 0.3031, R(b) [A-G] = 4.4596, R(c) [A-T] = 1.3265, R(d) [C-G] = 0.3833, R(e) [C-T] = 4.4596, R(f) [G-T] = 1.0000
Base frequencies: freqA = 0.3100, freqC = 0.3687, freqG = 0.0949, freqT = 0.2265
Among-site rate variation: Proportion of invariable sites = 0
Variable sites (G): Gamma distribution shape parameter = 0.2893
AIC = 4482.791016

ND4
Model selected: TVM+I+G
-lnL = 10202.2100
Rate matrix: R(a) [A-C] = 0.3054, R(b) [A-G] = 5.2430, R(c) [A-T] = 0.6489, R(d) [C-G] = 0.3094, R(e) [C-T] = 5.2430, R(f) [G-T] = 1.0000
Base frequencies: freqA = 0.3573, freqC = 0.3708, freqG = 0.0675, freqT = 0.2044
Among-site rate variation: Proportion of invariable sites (I) = 0.2452
Variable sites (G): Gamma distribution shape parameter = 0.4835
AIC = 20422.419922

ND5
Model selected: GTR+I+G
-lnL = 13944.2666
Rate matrix: R(a) [A-C] = 0.7286, R(b) [A-G] = 4.5392, R(c) [A-T] = 0.7835, R(d) [C-G] = 0.0000, R(e) [C-T] = 5.9852, R(f) [G-T] = 1.0000
Base frequencies: freqA = 0.3492, freqC = 0.3624, freqG = 0.0759, freqT = 0.2125
Among-site rate variation: Proportion of invariable sites (I) = 0.1704
Variable sites (G): Gamma distribution shape parameter = 0.4517
AIC = 27908.533203

ND6
Model selected: K81uf+G
-lnL = 4151.5171
Rate matrix: R(a) [A-C] = 1.0000, R(b) [A-G] = 35.5698, R(c) [A-T] = 6.5528, R(d) [C-G] = 6.5528, R(e) [C-T] = 35.5698, R(f) [G-T] = 1.0000
Base frequencies: freqA = 0.1380, freqC = 0.0768, freqG = 0.3761, freqT = 0.4090
Among-site rate variation: Proportion of invariable sites = 0
Variable sites (G): Gamma distribution shape parameter = 0.3801

Appendix 4.2 Neighbour joining (LogDet) for the individual protein-coding genes. Bootstrapped trees are also shown.

[Figure: for each protein-coding gene, a neighbour-joining tree built from LogDet distances alongside its bootstrap tree. Panel titles legible in the extraction: A6-LogDet, COI-LogDet, COII-LogDet, COII-boot, COIII-boot, Cytb-boot, ND2-LogDet, ND2-boot, ND6-LogDet. Tip labels: turtle, alligator, chicken, great tinamou, elegant crested tinamou, moa 1, moa 2, ostrich, great spotted kiwi, greater rhea, lesser rhea, emu, southern cassowary.]