
EXPLORING THE EFFICACY, UTILITY, AND LIMITATIONS OF DNA BARCODING WITHIN THE CLASS AVES

A Thesis

Presented to The Faculty of Graduate Studies of The University of Guelph

by KEVIN CHARLES ROBERT KERR

In partial fulfilment of requirements for the degree of Doctor of Philosophy

April, 2010

© Kevin C. R. Kerr, 2010

Library and Archives Canada, Published Heritage Branch, 395 Wellington Street, Ottawa ON K1A 0N4, Canada

ISBN: 978-0-494-64533-8

NOTICE:

The author has granted a non-exclusive license allowing Library and Archives Canada to reproduce, publish, archive, preserve, conserve, communicate to the public by telecommunication or on the Internet, loan, distribute and sell theses worldwide, for commercial or non-commercial purposes, in microform, paper, electronic and/or any other formats.

The author retains copyright ownership and moral rights in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.

In compliance with the Canadian Privacy Act some supporting forms may have been removed from this thesis.

While these forms may be included in the document page count, their removal does not represent any loss of content from the thesis.

ABSTRACT

EXPLORING THE EFFICACY, UTILITY AND LIMITATIONS OF DNA BARCODING WITHIN THE CLASS AVES

Kevin C. R. Kerr, University of Guelph, 2010. Advisors: Professor P. D. N. Hebert, Professor A. J. Baker

This thesis investigates the efficacy of a recently proposed molecular bioidentification system known as "DNA barcoding". This system employs a short, standardized gene region (648 bp of the mitochondrial gene cytochrome c oxidase I, in the case of animals) as a unique species identifier. To test species-level resolution, I constructed a library of DNA barcode sequences for birds from three regions: the Nearctic (North America), the southern Neotropics (Argentina), and the eastern Palearctic (Russia, Mongolia, and Kazakhstan).

The accuracy of barcode-based species identification was assessed against the currently accepted avian taxonomy, which is the most robust of any taxonomic group. I also tested the use of DNA barcodes for species discovery via the detection of large intraspecific divergences. Common intraspecific and interspecific phylogeographic trends were compared within and between biogeographical realms. Using the avian barcode library, I also compared the performance of several methods for species delimitation, including distance-based thresholds, tree-based methods, and character-based methods (wherein each nucleotide of the sequence is treated as a unique character). Finally, I used the abundance of sequence data to test for signs of selection in cytochrome c oxidase I. Whole mitochondrial genomes available from GenBank were used to review the consistency of selective pressure throughout the genome. This largely confirmed the role of purifying selection in the evolution of the mitochondrial genome in birds. Overall, this study substantiates the utility of DNA barcoding as a reliable tool for species identification and for highlighting taxa in need of further taxonomic review.

ACKNOWLEDGEMENTS

This research was funded by an Ontario Graduate Scholarship and an Elgin Card terrestrial zoology scholarship awarded to me and, to a greater extent, by formidable grants to Paul Hebert from Genome Canada, the Natural Sciences and Engineering Research Council of Canada, and the Gordon and Betty Moore Foundation.

In its entirety, the research presented in this thesis is truly the culmination of the efforts of many, to all of whom I owe a debt of gratitude. First and foremost, I must thank Paul Hebert for seeing my potential and for providing me with so many tremendous opportunities, some beyond anything I could have imagined. Without him, none of this would have been possible.

I have been incredibly fortunate to have an extensive support network and the opportunity to collaborate with so many wonderful people. My co-advisor, Allan Baker, and Mark Peck of the Royal Ontario Museum provided samples, insightful discussion, and behind-the-scenes tours. Dr. Mark Stoeckle of the Rockefeller University was always willing to offer thoughtful commentary and, when in a pinch, sound medical advice.

Carla Dove and her many technicians at the Smithsonian Institution helped tremendously by wrangling specimens and contributing sequences. Sharon Birks and staff at the Burke Museum of Natural History (including, but not limited to, Rob Faucett, Chris Wood, and Sievert Rohwer) were kind enough to host me at their museum and provide extensive access to their remarkable tissue collection. Charles Francis and Michel Gendron of the Canadian Wildlife Service also offered valuable collaboration.

I owe un fuerte abrazo (a big hug) to Pablo Tubaro for having the foresight to get involved with DNA barcoding. Our ongoing student exchange has been one of the best parts of my graduate experience. Accordingly, I thank Dario Lijtmaer for being such an excellent collaborator and for occasionally serving as my translator, I thank Ana Barreira and Pilar Benites for getting me home from the jungle in one piece, and I thank Cecilia Kopuchian, Leonardo Campagna, and everyone else at the museum for all of our interactions.

The Hebert Lab has been a great academic home. I must thank all lab mates, past and present: Tyler Zemlak, Elizabeth Clare, John Wilson, Taika von Konigslow, Vazrick Nazari, Christina Carr, and Erin Corstorphine. I thank Sujeevan Ratnasingham, who on account of his genius has been the backbone of this whole endeavour (and occasionally stopped to have fun too). I also thank all of the other brains that have helped me out along the way: Jeremy deWaard, Jonathan Witt, Alex Smith, Mehrdad Hajibabaei, Nataly Ivanova, Alex Borisenko, Dirk Steinke, Rob Dooh, and Justin Schonfeld. Lab support came from Chris Grainger, Janet Topan, Constantine Christopoulos, Isabelle Meusnier, and Liuqiong Lu, as well as, notably, from Angela Hollis of the Genomics Facility.

Jinzhong Fu and Teri Crease have both served very important roles on my committee, and I have relied on them both for support. The Department of Integrative Biology is filled with terrific faculty, of whom I would especially like to thank Jim Bogart, Elizabeth Boulding, Bob Hanner, Brian Husband, Ryan Gregory, Denis Lynn, Steve Newmaster, Beren Robinson, and Don Stevens. The students in this department (Darren Sleep, Marc Freeman, Nathalie Newby, Mark Sherrard, Kate Crosby, John Urquhuart, Joe Crowley, Martin Brummell, and Sarah Alderman, to name just a few) have been great friends, and I could not imagine the experience without them.

This experience has taught me the essential value of museum collections, without which a study of this magnitude would never be possible, as this laundry list of collaborating institutions should make evident. For providing tissue samples I thank the Royal Ontario Museum (including volunteers of Toronto's Fatal Light Awareness Program), Burke Museum of Natural History and Culture, Zoological Museum of Moscow Lomonosov University, Museo Argentino de Ciencias Naturales, and the Canadian Wildlife Service. For additional samples processed via the Smithsonian I would like to thank the Academy of Natural Sciences Philadelphia, American Museum of Natural History, Field Museum, Museum of Comparative Zoology, Louisiana State University, Museum of Southwestern Biology, Museum of Vertebrate Zoology, National Museum of Natural History, University of Alaska-Fairbanks, and the University of Kansas. Bob Montgomerie (Queen's University) and Chris Earley (The Arboretum, University of Guelph) provided additional specimens, and Irby Lovette (Cornell University) shared unpublished sequences. Feather samples were generously provided by the following stations: Albert Creek Observatory, Appalachian Highlands Science Learning Center of the US National Park Service, Brier Island Bird Migration Research Station, Gros Morne National Park Migration Monitoring Station, Haldimand Bird Observatory, Innis Point Bird Observatory, Inglewood Bird Observatory, Long Point Bird Observatory, Mackenzie Nature Observatory, McGill Bird Observatory, Rock Point Bird Banding Station, Rocky Point Bird Observatory, St. Andrews Banding Station, Tommy Thompson Park Bird Research Station, and Vaseux Lake Migration Monitoring Station.

Lastly, but most importantly, I thank my parents, Gary and Sandy, and my sister, Jenny, for always being supportive and believing in me. I am extremely lucky to have such a wonderful family. I also thank Andrea Nesbitt for enduring my workaholic stints, long absences, and all of the chaos, with love and support.

TABLE OF CONTENTS

List of Tables
List of Figures

PROLOGUE
General introduction

CHAPTER I
Comprehensive DNA barcode coverage of North American birds
Abstract
Introduction
Methods
Results
Discussion

CHAPTER II
Probing evolutionary patterns in Neotropical birds through DNA barcodes
Abstract
Introduction
Methods
Results
Discussion
Barcodes in Argentinean birds
Barcoding the avifaunas of Argentina and North America
Conclusions

CHAPTER III
Filling the gap - COI barcode resolution in eastern Palearctic birds
Abstract
Introduction
Methods
Sampling
Laboratory methods
Data analysis
Results
Neighbour-joining clusters
Distance-based assignment
Character-based assignment
Discussion
Species boundaries in Palearctic birds
Methods comparison
Conclusions

CHAPTER IV
Searching for selection using avian DNA barcodes
Abstract
Introduction
Methods
Data collection
Assessing genetic diversity
Neutrality tests of COI variation
Amino acid variation
COI versus other mitochondrial genes
Results
Genetic diversity
Neutrality tests
Amino acid diversity
Genomic comparisons
Discussion
Conclusions

EPILOGUE
General conclusions

REFERENCES

APPENDICES (see accompanying CD)

LIST OF TABLES

Table 1.1: Species with overlapping barcode clusters
Table 1.2: Provisional splits of recognized species
Table 2.1: Bird species from Argentina with deeply divergent COI groups
Table 3.1: Groups of species that failed recognition via MOTU analysis
Table 3.2: List of all species containing divergent COI lineages
Table 4.1: Species included in McDonald-Kreitman tests for neutrality
Table 4.2: Summary of amino acid variation for twelve avian orders

LIST OF FIGURES

Figure 1.1: Comparing barcode sequence clusters with species-level taxonomy
Figure 1.2: Mean intraspecific variation according to number of individuals
Figure 1.3: Intraspecific distance, population size, and apparent species age
Figure 1.4: Intraspecific and nearest neighbour distances in North American birds
Figure 2.1: Frequency histograms of COI variation for Argentinean birds
Figure 2.2: Maps of distributional patterns of divergent barcode lineages
Figure 3.1: Map of the eastern Palearctic region with collecting sites highlighted
Figure 3.2: Cumulative type I and type II error plots for divergence thresholds
Figure 3.3: Divergence patterns between closely related species
Figure 4.1: Intraspecific diversity, interspecific divergence, and sampling effort
Figure 4.2: Predicted secondary structure for the avian consensus sequence
Figure 4.3: Mean dN, dS, and dN/dS for thirteen protein-coding mitochondrial genes

PROLOGUE

GENERAL INTRODUCTION

The study of biological diversity has emerged as a cornerstone of the life sciences.

Looking back, it is apparent that the field has grown with great momentum; the popular contraction "biodiversity" was only proposed in 1988 (Wilson and Peter 1988), yet this year, 2010, has been declared the International Year of Biodiversity (Martens 2010). Conservation efforts are typically the focus of biodiversity studies, but we cannot protect what we do not know. The author David Quammen astutely described the classification of species as "...a crucial, basic enterprise, prerequisite to deeper ecological understanding" (Quammen 2003, p. 68). Accordingly, the systematic classification of all life should be at the forefront of biodiversity goals.

In 1995, the term "taxonomic impediment" was coined to refer to a deficiency in the expert taxonomic knowledge necessary to identify and describe the Earth's biodiversity (Evenhuis 2007). Although this terminology was new, the problem was not, as evidenced by commentary written forty-five years earlier (Sabrosky 1950):

In many groups, the lack of adequate support for taxonomic work and the relatively few students entering the field, among other factors, postpone indefinitely the time when careful and comprehensive studies can be carried out. The situation is particularly intriguing, for it is contemporaneous with the great interest in such fields as evolution, zoogeography, and speciation, which depend so greatly for their raw materials on the data published by the taxonomists!

The more recent impetus to solve this issue was spurred by increasing extinction rates. The possibility of losing species before they could even be described suggested that timing was critical. Unfortunately, the identification of the Earth's biota is not a trivial matter. Despite over 250 years of effort commencing with Linnaeus' Systema Naturae (1758), it is estimated that only 10 percent of all life forms on the planet have been described (Wilson 2003). Clearly, a new catalyst is needed to accelerate the rate of species discovery and invigorate the taxonomic community.

This need for faster taxonomic progress motivated a proposal by Hebert et al. (2003a) to use a single standardized genetic marker as a global bioidentification tool. This method was dubbed "DNA barcoding" in reference to the analogous Universal Product Code, the familiar black-and-white bars that adorn virtually every modern commercial product. Commercial barcodes use a string of 11 digits, each of which can take one of 10 character states (the numbers 0 through 9), to create one hundred billion (10^11) unique combinations, enough variation for each commercial product to bear its own code. Characters in the DNA barcode are limited to four possible character states (the four nucleotides: adenine, guanine, cytosine, and thymine), but even with some functional constraints, the inclusion of 648 nucleotide positions allows for an enormous amount of variation (Hebert et al. 2003a). The idea put forth was that if mutations occur at even a modest rate (e.g. 2% per million years), then reproductively isolated species would accumulate enough differences in their DNA that each could be readily distinguished and identified by a unique sequence.
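The arithmetic behind this analogy can be checked directly. The short Python sketch below counts the possible strings for an 11-digit decimal code and for a 648-position nucleotide sequence, then converts the example rate of 2% per million years into an approximate number of differing sites. It assumes that every position can vary freely (ignoring the functional constraints mentioned above) and a simple linear clock with no repeated substitutions at the same site; the helper names are illustrative only.

```python
import math


def n_combinations(states: int, positions: int) -> int:
    """Number of distinct strings of the given length over an alphabet of `states` symbols."""
    return states ** positions


def expected_differing_sites(rate_per_my: float, million_years: float, length: int) -> int:
    """Rough number of differing sites between two lineages under a linear clock.

    Hypothetical helper: treats `rate_per_my` as pairwise divergence per million
    years and ignores multiple substitutions at the same site.
    """
    return round(rate_per_my * million_years * length)


upc_codes = n_combinations(10, 11)        # 11 digits, each 0-9
barcode_codes = n_combinations(4, 648)    # 648 positions, each A, C, G, or T
barcode_log10 = 648 * math.log10(4)       # order of magnitude of 4**648

print(f"UPC combinations: {upc_codes:,}")
print(f"Barcode combinations: about 10^{barcode_log10:.0f}")
print("Sites differing after 1 million years at 2%:",
      expected_differing_sites(0.02, 1.0, 648))
```

Under these simplifying assumptions, the 648-bp region admits roughly 10^390 possible sequences, and two lineages separated for one million years would differ at about 13 of the 648 positions.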

Hebert et al. (2003a) selected the mitochondrial gene cytochrome c oxidase I, or simply COI, as their candidate gene: the so-called "DNA barcode". A mitochondrial gene was a natural choice on account of the haploid mode of inheritance of the mitochondrial genome, its lack of recombination, and its lack of introns (Ballard and Whitlock 2004). The specific choice of COI was based partly on the availability of universal primers (Folmer et al. 1994) and partly on its phylogenetic signal (Hebert et al. 2003a). However, opinions vary on the phylogenetic utility of COI (Feldman and Omland 2005; Remigio and Hebert 2003; Simmons and Weller 2001; Zardoya and Meyer 1996). Other proposals for DNA-based identification systems have promoted the use of alternative genes, which implies that gene choice is somewhat arbitrary (e.g. Lee et al. 2008; Vences et al. 2005); but for broad success, standardization is key (Frezal and Leblois 2008).

The inclusion of DNA evidence in taxonomic practice is not in itself novel (e.g. Busse et al. 1996; Karp et al. 1996; McManus and Bowles 1996; Navajas and Fenton 2000; Sibley and Ahlquist 1983). However, previous treatments have not given it a central role. The promotion of "DNA taxonomy", wherein DNA would serve a primary role in systematic studies, was concurrent with the seminal barcoding publication (Blaxter 2004; Tautz et al. 2003). The two ideas differ subtly, but fundamental differences do exist. As noted above, the novelty of DNA barcoding lies primarily in the use of a standardized gene or combination of genes, and it is ultimately meant to complement traditional methods; DNA taxonomy intends to install DNA at the core of a taxonomic reference system (Tautz et al. 2003). At most, the barcoding endeavour could perhaps be viewed as a derivative of the more ambiguously defined DNA taxonomy, but the two should not be confused.

DNA barcoding may be further broken down into two distinct objectives: species identification and species discovery (DeSalle 2006; Moritz and Cicero 2004). Understanding the difference between these two purported uses is essential if the method is to be properly evaluated. Species identification concerns the assignment of an unknown specimen to a known species based on comparison of its DNA barcode to a reference database of vouchered specimens. Species discovery concerns the recognition of overlooked cryptic taxa, which are very similar morphologically but possess divergent genetic signatures. It has long been common for new species boundaries to be identified when taxa are carefully scrutinized with newer technologies, as is also articulated in Sabrosky's commentary (1950):

In all too many cases we are still merely quoting the names and conclusions of workers of the last generation, or even earlier, because there has been no one available to study groups in the light of modern techniques and concepts. When they are studied, we often find that old records of "common species" mask a whole complex of forms.

Those words were penned three years prior to the discovery of the structure of DNA (Watson and Crick 1953). With the modern incorporation of molecular methods, Sabrosky's words echo louder than ever. The validity of species boundaries may be argued back and forth, particularly in a group such as birds, where species are frequently described based on our ability to recognize them in the field (Watson 2005). Genetic evidence is often viewed as more concrete since it reflects evolutionary history (Zink 2004). But the rate at which DNA-led species discoveries are progressing is still insufficient, as evidenced by the still unresolved taxonomic impediment. Broadly applying the standardized barcoding approach across many taxonomic groups could potentially unearth a plethora of undiscovered species in a relatively short period of time (Tautz et al. 2003).

Early studies accompanying the initial proposal for DNA barcoding offered a preliminary view of COI sequence variation within an assortment of eukaryotic taxa (Barrett and Hebert 2005; Hebert et al. 2003b; Hogg and Hebert 2004; Remigio and Hebert 2003). The results of these early studies showed DNA barcoding's potential by demonstrating that COI sequence variation was generally low within species and significantly greater between them. However, the DNA barcoding proposal was not unanimously accepted within the taxonomic community, and critics were swift to identify perceived weaknesses, both in theory and in application (Prendini 2005; Will and Rubinoff 2004). The practice was proclaimed unscientific (Lipscomb et al. 2003), and fears were voiced that it would starve traditional taxonomy of needed funds (Ebach and Holdrege 2005). Detractors of DNA barcoding published critiques with fanaticism (Cameron et al. 2006), yet more grounded concerns did arise (DeSalle et al. 2005; Nielsen and Matz 2006). Can DNA barcodes distinguish young radiations of species? Can they delimit diverse tropical taxa? Will introgression be a major stumbling block? How similar must DNA barcodes be to confirm the identity of a species? A more thorough test of the DNA barcoding method was still needed to help clarify these points of contention.

The core of many biological disciplines, such as physiology, genetics, or developmental biology, is almost entirely dependent on the extrapolation of trends observed in model organisms (Hedges 2002). Similarly, it is reasonable to expect that a few well-chosen groups of organisms may be used to evaluate the DNA barcoding paradigm more rigorously. This may be accomplished through dense taxonomic sampling of well-studied groups. The mature taxonomy of birds makes them an ideal benchmark. Their ubiquity and relatively low total number of species, combined with centuries of human fascination, have led to birds being the most taxonomically robust group of virtually all life forms (Gill 2003).

A pilot study examined intraspecific and interspecific COI distances in a select number of North American birds (Hebert et al. 2004). The initial results were encouraging, generally showing limited variation within species and much greater variation between them. Exceptions to the former trend (i.e. large intraspecific differences) occurred in four species and led researchers to suggest that cryptic lineages had been uncovered. Increased sampling within those species (Solitary Sandpiper, Tringa solitaria; Marsh Wren, Cistothorus palustris; Warbling Vireo, Vireo gilvus; Eastern Meadowlark, Sturnella magna) revealed phylogeographic patterns wherein most of the lineages divided into eastern and western populations. However, critics were again quick to announce perceived shortcomings in the analysis, such as the exclusion of closely related sister species and of tropical species (Moritz and Cicero 2004).

The preliminary investigation into the effectiveness of DNA barcode-based identification in birds paved the way for my thesis, which begins where the former left off. While mitochondrial markers are frequently incorporated into studies of avian phylogenetics and phylogeography, COI is a virtual stranger to ornithology, having been employed in only a handful of studies (Feldman and Omland 2005). Hence, this thesis details the development of a large and comprehensive library of COI DNA barcodes for the avian class. Using the currently accepted taxonomy of birds as a gold standard, I put the efficacy of DNA barcodes as a tool for both species identification and species discovery to the test.

In Chapter One, I expand the COI barcode library to include 93% of North American resident breeding and visiting pelagic bird species, using vouchered tissue samples from a large assembly of museums plus feather samples from bird banding stations. This comprehensive analysis provides a more complete picture of COI divergences within and between avian species and is one of the largest-scale studies of barcoding in vertebrates. Neighbour-joining trees with bootstrap support are used to diagnose 'barcode clusters', which closely correspond to species boundaries. Large intraspecific gaps suggestive of species-level differences are identified in fifteen currently recognized species. Problematic cases, in which resolution between recognized species is poorly defined, are also highlighted among young species pairs. I discuss the hypothesis of selective sweeps as a mechanism that maintains low intraspecific diversity.

In Chapter Two, I explore barcode divergence in Neotropical birds using vouchered museum specimens from Argentina. As in Chapter One, I compare intra- and interspecific distances to assess the efficacy of barcode-based species delimitation. The phylogeographic structure of species with large intraspecific differences is compared to that observed in North American species. Furthermore, intraspecific variation is explored in species with ranges expansive enough to include both North America and Argentina. The different patterns observed are discussed in an ecological and biogeographical context.

Chapter Three presents the third regional analysis, which details COI variation in eastern Palearctic birds. The eastern Palearctic provides a northern temperate region analogous to the Nearctic, but one that differs greatly in landmass and glacial history. I compare patterns of species diversity to those in the North American avifauna. Alternative methods for species delimitation are also explored in this chapter, including tree-based, distance-based, and character-based methods. Ultimately, I show that tree- and distance-based methods share most of the same strengths, while character-based methods perform poorly on large datasets but provide additional resolution when used in conjunction with one of the other methods.
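Character-based methods treat each aligned nucleotide position as a character and look for states that diagnose a species. The Python sketch below uses a deliberately simplified, hypothetical criterion: a position is diagnostic for a species if all of its individuals share one nucleotide that no individual of any other species carries. It is not the specific character-based procedure used in Chapter Three, only an illustration of the idea, and the record format and function name are assumptions.

```python
from collections import defaultdict


def diagnostic_sites(records):
    """Find simple 'pure' diagnostic characters in an alignment.

    records: list of (species, aligned_sequence) tuples.
    Returns {species: [(position, nucleotide), ...]} listing 0-based positions
    where every individual of the species shares one nucleotide that no
    individual of any other species carries.
    """
    by_species = defaultdict(list)
    for species, seq in records:
        by_species[species].append(seq.upper())
    length = min(len(seq) for _, seq in records)

    result = {}
    for species, seqs in by_species.items():
        others = [s for sp, group in by_species.items() if sp != species for s in group]
        sites = []
        for i in range(length):
            states = {s[i] for s in seqs}
            if len(states) != 1:
                continue                  # variable within the species
            state = states.pop()
            if state not in "ACGT":
                continue                  # skip gaps and ambiguity codes
            if all(other[i] != state for other in others):
                sites.append((i, state))
        result[species] = sites
    return result


alignment = [
    ("species_A", "ACGT"),
    ("species_A", "ACGA"),
    ("species_B", "TCGT"),
    ("species_C", "GCCT"),
]
print(diagnostic_sites(alignment))
```

Because every additional species can only remove positions from a species' diagnostic list under this strict criterion, the sketch also hints at one plausible reason why purely character-based approaches struggle as datasets grow.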

In Chapter Four, I turn the focus to the COI gene itself and explore more carefully the role of selection in shaping the observed divergence patterns. The idea that selective sweeps are responsible for the low variation observed in the barcode region is put to the test. Finally, I make use of the growing number of complete mitochondrial genomes available from GenBank to compare the types of substitutions observed in COI to those in the other 12 protein-coding mitochondrial genes.
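Comparing types of substitutions usually begins by classifying codon differences as synonymous (no amino acid change) or nonsynonymous. The Python sketch below does this for two aligned, in-frame coding sequences using the vertebrate mitochondrial genetic code (NCBI translation table 2, in which TGA codes for tryptophan, ATA for methionine, and AGA/AGG are stops). It is a deliberately simplified raw tally: codons that differ at more than one position are set aside rather than resolved along mutational pathways, and the counts are not normalized by the numbers of synonymous and nonsynonymous sites, as a dN/dS estimate would require. The function and table names are illustrative.

```python
BASES = "TCAG"
# NCBI translation table 2 (vertebrate mitochondrial), codons ordered
# TTT, TTC, TTA, TTG, TCT, ... with "*" marking stop codons.
VERT_MITO_AA = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: VERT_MITO_AA[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_codon_differences(seq1: str, seq2: str) -> dict:
    """Tally codon differences between two aligned, in-frame coding sequences."""
    counts = {"synonymous": 0, "nonsynonymous": 0, "multiple": 0, "skipped": 0}
    usable = min(len(seq1), len(seq2)) // 3 * 3   # ignore any trailing partial codon
    for pos in range(0, usable, 3):
        c1, c2 = seq1[pos:pos + 3].upper(), seq2[pos:pos + 3].upper()
        if c1 == c2:
            continue
        if c1 not in CODON_TABLE or c2 not in CODON_TABLE:
            counts["skipped"] += 1          # gaps or ambiguity codes
        elif sum(x != y for x, y in zip(c1, c2)) > 1:
            counts["multiple"] += 1         # not resolved in this sketch
        elif CODON_TABLE[c1] == CODON_TABLE[c2]:
            counts["synonymous"] += 1
        else:
            counts["nonsynonymous"] += 1
    return counts


print(classify_codon_differences("CTTCTCATG", "CTCATCATT"))
```

In the toy example, CTT to CTC is synonymous (both leucine), while CTC to ATC (leucine to isoleucine) and ATG to ATT (methionine to isoleucine) are nonsynonymous.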

CHAPTER I

Comprehensive DNA barcode coverage of North American birds

Published in Molecular Ecology Notes:

Kerr, K.C.R., Stoeckle, M.Y., Dove, C.J., Weigt, L.A., Francis, C.M., and Hebert, P.D.N. (2007) Comprehensive DNA barcode coverage of North American birds. Molecular Ecology Notes 7: 535-543. doi: 10.1111/j.1471-8286.2006.01670.x

ABSTRACT

DNA barcoding seeks to assemble a standardized reference library for DNA-based identification of eukaryotic species. The utility and limitations of this approach need to be tested on well-characterized taxonomic assemblages. Here we provide a comprehensive DNA barcode analysis for North American birds including 643 species representing 93% of the breeding and pelagic avifauna of the United States and Canada.

Most species (94%) possess distinct barcode clusters, with average neighbor-joining bootstrap support of 98%. In the remaining 6%, barcode clusters correspond to small sets of closely related species, most of which hybridize regularly. Fifteen currently recognized species (2%) are composed of two distinct barcode clusters, many of which may represent cryptic species. Intraspecific variation is weakly related to census population size and species age. This study confirms that DNA barcoding can be effectively applied across the geographic and taxonomic expanse of North American birds. The consistent finding of constrained intraspecific mitochondrial variation in this large assemblage of species supports the emerging view that selective sweeps limit mitochondrial diversity.

INTRODUCTION

Mitochondrial DNA analysis has been employed in the evolutionary study of species for more than 30 years (Avise and Walker 1999; Brown et al. 1979; Mindell et al. 1997). Its higher mutation rate and smaller effective population size relative to nuclear DNA make it a powerful tool to probe for evidence of reproductive isolation between lineages. This fact provoked a proposal to standardize DNA-based species identification by analyzing a uniform segment of the mitochondrial genome. With this approach, a library of sequences from taxonomically verified voucher specimens serves as a set of DNA identifiers for species, in short, DNA barcodes (Hebert et al. 2003a). For animals, research has focused on a 648 bp segment of the mitochondrial gene cytochrome c oxidase I (COI), which can be readily recovered from diverse species with a limited set of primers. DNA barcoding translates expert taxonomic knowledge of diagnostic morphological characters into a widely accessible format, DNA sequences, enabling more people to identify specimens. In addition to assigning specimens to known species, DNA barcoding can speed the discovery of new species, as large sequence differences in animal mitochondrial DNA generally signal species status.

For this approach to be effective, it must be possible to distinguish intraspecific from interspecific mitochondrial DNA variation. Pseudogenes, retention of ancestral polymorphisms, hybridization, and the idiosyncrasies of mitochondrial DNA inheritance pose potential difficulties (Bensasson et al. 2001; Moritz and Cicero 2004; Thalmann et al. 2004; Will et al. 2005). The simplest test is whether genetic distances within species are less than those between species. Surprisingly, 23% of 2,319 animal species failed this test in one review (Funk and Omland 2003), implying that mitochondrial gene sequences do not reliably capture species boundaries. However, the published studies that formed the basis for this estimate may be biased towards exceptional situations and groups in need of taxonomic revision, as further investigations on several vertebrate and invertebrate groups have shown that COI barcodes distinguish more than 95% of species (Hajibabaei et al. 2006; Ward et al. 2005).
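To make this test concrete, the Python sketch below computes pairwise Kimura 2-parameter (K2P) distances, the metric adopted in the Methods of this chapter, and then checks, for each species, whether its largest intraspecific distance is smaller than its smallest distance to any other species. The per-species form of the test, the record format, and the function names are assumptions for illustration; gaps and ambiguity codes are simply skipped.

```python
import math
from itertools import combinations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura 2-parameter distance between two aligned sequences.

    P = proportion of transitions, Q = proportion of transversions;
    d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q).
    Sites with a gap or ambiguity code in either sequence are skipped.
    """
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the sequences are too divergent (saturated).
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


def gap_test(records):
    """records: list of (species, aligned_sequence) tuples.

    Returns {species: bool}: True when the species' largest intraspecific
    distance is smaller than its smallest distance to any other species.
    """
    max_intra = {species: 0.0 for species, _ in records}
    min_inter = {species: math.inf for species, _ in records}
    for (sp1, s1), (sp2, s2) in combinations(records, 2):
        d = k2p_distance(s1, s2)
        if sp1 == sp2:
            max_intra[sp1] = max(max_intra[sp1], d)
        else:
            min_inter[sp1] = min(min_inter[sp1], d)
            min_inter[sp2] = min(min_inter[sp2], d)
    return {sp: max_intra[sp] < min_inter[sp] for sp in max_intra}


records = [
    ("species_A", "ACGTACGTAC"),
    ("species_A", "ACGTACGTAT"),
    ("species_B", "GCGTATGTCC"),
    ("species_B", "GCGTATGTCT"),
]
print(gap_test(records))
```

Testing each species against its nearest neighbour, rather than pooling all distances, keeps one problematic pair from masking the behaviour of every other species in the dataset.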

Because birds have been the subjects of particularly intensive taxonomic analysis, they provide an excellent opportunity to test the efficacy of barcode-based species delimitation. With most recent species splits stemming from genetic studies, avian taxonomy could, in turn, benefit from a broad-scale genetic survey. In a preliminary survey of 260 North American bird species, COI sequence variation between species was generally much greater than that within species, and no two species shared barcodes (Hebert et al. 2004). As a result, COI sequence information enabled assignment of specimens to known species. Four of 120 species (3%) studied in greater detail contained two distinct barcode clusters, which appeared to reflect cryptic species, a conclusion supported by observations of subtle differences in song and morphology for three of the four cases (Kroodsma 1989; Rohwer 1976; Sibley and Monroe 1990). To test these results more stringently, we increase taxon coverage and sample sizes in this study, applying DNA barcoding to examine the taxonomic status of 643 species, representing 93% of the breeding and pelagic bird species from the United States and Canada (Figure 1.1).
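The assignment of specimens to known species can be sketched as a nearest-neighbour search against a reference library of vouchered sequences. The Python example below is a simplified illustration, not necessarily the procedure used in this study: it uses the uncorrected p-distance rather than K2P for brevity, and the `max_distance` cut-off that leaves a query unassigned is a hypothetical parameter, not a value proposed in this chapter.

```python
def p_distance(seq1: str, seq2: str) -> float:
    """Uncorrected proportion of differing sites, skipping gaps and ambiguity codes."""
    compared = differences = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a in "ACGT" and b in "ACGT":
            compared += 1
            if a != b:
                differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def assign_to_species(query: str, reference, max_distance: float = 0.02):
    """Nearest-neighbour assignment of a query barcode.

    reference: list of (species, sequence) tuples from vouchered specimens.
    Returns (species, distance), or (None, distance) when even the closest
    reference exceeds the hypothetical `max_distance` cut-off.
    """
    best_species, best_distance = None, float("inf")
    for species, seq in reference:
        d = p_distance(query, seq)
        if d < best_distance:
            best_species, best_distance = species, d
    if best_distance > max_distance:
        return None, best_distance
    return best_species, best_distance


library = [("species_A", "ACGTACGTAC"), ("species_B", "TTTTACGTAC")]
print(assign_to_species("ACGTACGTAT", library, max_distance=0.15))
```

Returning the distance alongside the label lets a user see how close the match was, which matters when the nearest reference is itself only loosely similar.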

METHODS

Most analytic methods followed those described in the earlier study (Hebert et al. 2004). DNA sources for this study included frozen tissue samples (muscle, liver, or blood), most of which were obtained from specimens with vouchers housed in museum collections. In addition to tissue samples, feathers (breast feathers or rectrices) freshly collected at bird banding stations in six locales (Ontario, New Brunswick, Nova Scotia, Yukon, North Carolina, Tennessee) were analyzed. Feather samples were stored in a dark, dry location at room temperature.

DNA extraction, PCR, and sequencing reactions were performed at either the University of Guelph or the Smithsonian Institution. DNA was isolated using DryRelease (see Hajibabaei et al. 2005), the Qiagen DNeasy tissue extraction kit (Qiagen Inc., Valencia, California), or the NucleoSpin96 tissue kit (Macherey-Nagel, Düren, Germany). Feather samples were processed exclusively with the first of these methods (DryRelease). PCR predominantly used a single primer pair: BirdF1 (TTC TCC AAC CAC AAA GAC ATT GGC AC) and BirdR1 (ACG TGG GAG ATA ATT CCA AAT CCT G). If amplification was unsuccessful, an alternate forward primer, FalcoFA (TCA ACA AAC CAC AAA GAC ATC GGC AC), or the reverse primers BirdR2 (ACT ACA TGT GAG ATG ATT CCG AAT CCA G) and VertebrateR1 (TAG ACT TCT GGG TGG CCA AAG AAT CA) were employed. All reactions were run under the following thermal cycling program: 1 min at 94°C; 6 cycles of 1 min at 94°C, 1.5 min at 45°C, and 1.5 min at 72°C; 35 cycles of 1 min at 94°C, 1.5 min at 55°C, and 1.5 min at 72°C; and a final 5 min at 72°C. For DNA extracted from feather samples, 45 cycles were run in place of 35 to compensate for lower DNA yields. PCR products were visualized on precast 2% agarose gels using the E-gel 96 system (Invitrogen, Carlsbad, CA, USA) and bidirectionally sequenced on an ABI 3100, 3130, or 3730. Contigs were assembled from forward and reverse reads using Sequencher, version 4.5 (Gene Codes Corporation, Ann Arbor, MI, USA).
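Because the feather protocol differs from the tissue protocol only in the number of main cycles, the program can be written out as data to see how much longer a feather run takes. The following is a minimal Python sketch. The block layout and the names `PROGRAM_35`, `with_main_cycles`, and `hold_seconds` are hypothetical, and ramp times between temperatures, which depend on the thermocycler, are ignored.

```python
# Each block: (number of cycles, [(temperature in °C, hold time in seconds), ...]).
# Hypothetical encoding of the program described in the text; ramp times are ignored.
PROGRAM_35 = [
    (1, [(94, 60)]),                       # initial denaturation
    (6, [(94, 60), (45, 90), (72, 90)]),   # 6 low-stringency cycles (45 °C annealing)
    (35, [(94, 60), (55, 90), (72, 90)]),  # main cycles (55 °C annealing)
    (1, [(72, 300)]),                      # final extension
]


def with_main_cycles(program, n):
    """Return a copy of the program with the main (third) block run for n cycles."""
    blocks = list(program)
    blocks[2] = (n, blocks[2][1])
    return blocks


def hold_seconds(program):
    """Sum of all hold times in seconds, excluding temperature ramps."""
    return sum(cycles * sum(sec for _, sec in steps) for cycles, steps in program)


tissue = hold_seconds(PROGRAM_35)                          # standard tissue protocol
feather = hold_seconds(with_main_cycles(PROGRAM_35, 45))   # feather protocol
print(tissue / 60, feather / 60)  # 170.0 210.0
```

Under these assumptions, the ten extra cycles used for feather extracts add 40 minutes of hold time per run.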

Specimen and collection data, sequences, and trace files are provided in the container project "Birds of North America Phase 2" at http://www.barcodinglife.org. Sequence comparisons employed the Kimura 2-Parameter (K2P) distance metric (Kimura 1980). Genetic distances were calculated using the BOLD Management & Analysis System (www.barcodinglife.org), bootstrap analysis was performed with 1000 replicates using MEGA, version 3.1 (Kumar et al. 2004), and scatter and box plots were generated with SigmaPlot 8.02 (SPSS, Inc.). All new sequences have been deposited in GenBank under accession numbers DQ432694 to DQ433261, DQ433274 to DQ433846, and DQ434243 to DQ434805, while sequences from the earlier study (Hebert et al. 2004) are deposited in GenBank under accession numbers AY666171 to AY666596.
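The K2P metric distinguishes transitional differences (A↔G, C↔T) from transversional ones. With P and Q the proportions of compared sites showing transitions and transversions, d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q). The Python sketch below illustrates that formula only; it is not the BOLD or MEGA implementation, and its treatment of gaps and ambiguous bases (skipping any site where either sequence lacks an unambiguous A, C, G, or T) is an assumption.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura (1980) two-parameter distance between two aligned sequences.

    Sites with a gap or ambiguous base in either sequence are skipped
    (pairwise deletion, an assumption of this sketch). Very divergent
    sequences (1 - 2P - Q <= 0) make math.log raise ValueError.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared    # proportion of transitional differences
    q = transversions / compared  # proportion of transversional differences
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# One transition (A->G) in ten sites: slightly larger than the raw p-distance of 0.1.
print(round(k2p_distance("ACGTACGTAC", "GCGTACGTAC"), 4))  # 0.1116
```

Multiplying the result by 100 gives distances on the percentage scale used throughout this chapter.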

RESULTS

A standard set of primers amplified the target region of COI from all but 1 of 643 species. These taxa included representatives of 19 (70%) of the 27 extant orders of birds, distributed among 71 families and 286 genera (Appendix 1.1). Together with the 438 specimens analyzed in the earlier study, we obtained COI sequences from 2,590 individuals, 70% of them from vouchered specimens held in museum collections. The mean length of the sequenced products was 658 bp. We analyzed multiple individuals (average = 4.1, range = 2-125) from 546 (85%) of the 642 species, including 5 or more individuals from 211 species (33%). In most cases, conspecific specimens derived from widely separated sites (Birds of North America Phase 2 project at www.barcodinglife.org).

We detected presumptive pseudogenes in approximately 5% of specimens. Because these were generally short (approximately 100-200 nucleotides), complete barcode sequences could be recovered with bidirectional sequencing. One presumptive pseudogene corresponding to the full-length barcode sequence was detected in 3 tyrannid flycatcher specimens (0.1%). Overall, pseudogenes were not an important limit to recovery of COI sequences.

Average intraspecific variation was unrelated to the number of individuals analyzed, suggesting that sampling was representative (Figure 1.2). Within the low and narrow band of intraspecific variation, there was a weak relationship to census population size, which ranges from a few thousand to over 300 million individuals (Figure 1.3; Wetlands International 2002; Rich et al. 2004). Intraspecific mitochondrial variation was only weakly associated with apparent species age (Figure 1.3).

The earlier North American bird study measured mean congeneric distance, the average distance among all congeneric relatives. To test the discriminatory power of COI barcodes more stringently, the present study examined nearest neighbor distance, the minimum genetic distance between a species and its closest congeneric relative. Nearest neighbor distance averaged 4.3%, 19-fold higher than the mean within species and 11-fold higher than the average maximum intraspecific distance (Figure 1.4). Including all species, not only congeners, may give a more representative picture, as generic assignments may be incorrect and 10% of birds are the sole members of their genus; in this case, average nearest neighbor distance was higher, at 5.9% (Figure 1.4).
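The difference between the two summaries can be made concrete with a small Python sketch. It assumes pairwise interspecific distances are already available (for example, from a K2P calculation) and uses hypothetical species names and invented distances purely for illustration; the final line only rechecks the fold-differences quoted above from the Figure 1.4 means.

```python
# Hypothetical pairwise interspecific distances (%) keyed by unordered species pair.
PAIRWISE = {
    frozenset({"Xus alpha", "Xus beta"}): 3.0,
    frozenset({"Xus alpha", "Xus gamma"}): 6.0,
    frozenset({"Xus beta", "Xus gamma"}): 5.0,
    frozenset({"Xus alpha", "Yus delta"}): 12.0,  # different genera: ignored below
}


def genus(species):
    """Genus is the first word of a binomial."""
    return species.split()[0]


def congeneric_pairs(pairwise):
    """Yield (species_a, species_b, distance) for pairs in the same genus."""
    for pair, dist in pairwise.items():
        a, b = sorted(pair)
        if genus(a) == genus(b):
            yield a, b, dist


def mean_congeneric_distance(pairwise):
    """Average over all congeneric pairs (the measure used in the earlier study)."""
    dists = [d for _, _, d in congeneric_pairs(pairwise)]
    return sum(dists) / len(dists)


def nearest_neighbor_distances(pairwise):
    """Minimum distance from each species to its closest congener."""
    nn = {}
    for a, b, dist in congeneric_pairs(pairwise):
        for sp in (a, b):
            nn[sp] = min(dist, nn.get(sp, float("inf")))
    return nn


nn = nearest_neighbor_distances(PAIRWISE)
print(nn)  # {'Xus alpha': 3.0, 'Xus beta': 3.0, 'Xus gamma': 5.0}
print(round(mean_congeneric_distance(PAIRWISE), 2))  # 4.67
print(round(sum(nn.values()) / len(nn), 2))           # 3.67

# Fold-differences from the Figure 1.4 means (4.30% versus 0.23% and 0.40%).
print(round(4.30 / 0.23), round(4.30 / 0.40))  # 19 11
```

Because each species contributes only its smallest congeneric distance, the nearest neighbor mean can never exceed the mean congeneric distance, which is why it is the more stringent test.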

Levels of sequence difference varied across families: 35% of ducks, geese, and swans (Anatidae) showed nearest neighbor differences of 1% or less, whereas all sandpipers (Scolopacidae), plovers (Charadriidae), and owls (Strigidae) had nearest neighbor distances greater than 1%. COI barcodes separated 20 of the 23 taxonomic splits recognized in North American birds over the past 25 years, with nearest neighbor distances ranging from 0.3% to 6.0% (Appendix 1.2). Average bootstrap support for species nodes with multiple individuals was 97.8%. As expected, bootstrap values were lower among the most closely related species, averaging 79.8% for species with nearest neighbor distances below 1% but 99.5% for species with distances above 1%.
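Bootstrap support of this kind is obtained by resampling alignment columns with replacement, rebuilding the tree from each pseudo-replicate, and recording how often a node reappears. The Python sketch below shows only the resampling step, assuming an aligned, equal-length set of sequences held in a dictionary; tree building and node counting (performed in MEGA in this study) are deliberately left out, and the function name `bootstrap_replicate` and the toy sequences are hypothetical.

```python
import random


def bootstrap_replicate(alignment, rng):
    """Build one nonparametric bootstrap pseudo-replicate.

    alignment: dict of sequence name -> aligned sequence (all the same length).
    Columns are drawn with replacement, so each replicate keeps the original
    length, but some sites appear several times and others not at all.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to equal length")
    (length,) = lengths
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}


toy = {
    "specimen_1": "ACGTACGTAC",
    "specimen_2": "ACGTACGTTC",
    "specimen_3": "GCGAACGTTC",
}
rng = random.Random(1)  # fixed seed so the replicates are reproducible
replicates = [bootstrap_replicate(toy, rng) for _ in range(1000)]  # 1000 replicates, as in the text
print(len(replicates), len(replicates[0]["specimen_1"]))  # 1000 10
```

Each replicate would then be passed to the same tree-building method as the original alignment; support for a node is the percentage of the 1000 replicate trees that contain it.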

Forty-two species (6.4%) shared sequences or had clusters of sequences overlapping with those of another species, comprising 14 pairs, 2 triplets, and 1 set of 8 species (Table 1.1). The pattern of COI variation within these sets of overlapping species was indistinguishable from variation within single species, with the exception of Mallards and Black Ducks, which are known to harbor two distinct mitochondrial lineages (Avise et al. 1990). By contrast, we detected 15 other species with intraspecific distances greater than 2.5% (Table 1.2); each contained two distinct sequence clusters, typically composed of individuals from different geographic areas. These clusters may represent cryptic species. When these provisional species were treated as distinct, average within-species variation for the COI barcode region was 0.23%.

DISCUSSION

The present study reaffirms that most North American bird species correspond to a single, tightly cohesive array of barcode sequences that is distinct from those of any other species. However, 15 species include two distinct barcode clusters, while 42 other species possess barcode sequences that are shared or overlap with those of other species. What explains these exceptional cases?

Cases of deep barcode divergence within what are thought to be single species generally indicate cryptic taxa (Meyer and Paulay 2005; Moritz 1994). Our screen for provisional splits, employing a threshold 10 times higher than the mean intraspecific variation, revealed 15 cases. Results from a thresholding approach must be interpreted with caution and are best used to flag species in need of further research.

Significantly, most of these hypothesized splits are supported by prior taxonomic work (Table 1.2). In total, 9 of our 15 cases have been noted previously; 8 of these have been proposed to represent species pairs, the exception being the northern raven (Omland et al. 2000). Some species yield additional lineages when non-North American populations are included; for example, 6 lineages in total are suggested for the winter wren (Drovetski et al. 2004b).

Regarding the 17 sets of species with overlapping barcodes, three processes may account for these findings. First, some may be recently diverged sister taxa in which COI has not yet accumulated sequence differences; in such cases, more extensive sequence information might allow resolution. Second, these taxa may share mitochondrial DNA because of hybridization. Most of our species sets with overlapping barcodes hybridize at least occasionally; many show extensive hybridization and produce fertile F1 hybrid offspring. Examples include Snow Goose and Ross's Goose (Cooke et al. 1995); Blue-winged and Cinnamon Teal (Bolen 1979); Mallard, Mottled, and Black Ducks (McCracken et al. 2001); Sharp-tailed Grouse and Greater Prairie-Chicken (Sparling Jr. 1980); Red-naped and Red-breasted Sapsuckers (Johnson and Johnson 1985); Townsend's and Hermit Warblers (Morrison and Hardy 1983); and the eight species of large white-headed gulls (California, Glaucous, Glaucous-winged, Herring, Iceland, Lesser Black-backed, Western, Thayer's) (Olsen 2004). These taxa may lie in the indeterminate zone between differentiated populations and distinct species (de Queiroz 2005), or they may be well-formed species that are losing their genetic identity through secondary contact and hybridization. Third, some of the pairs with overlapping barcodes may represent a single species (Johnston 1961).

Although North American birds carry an abundance of subspecific assignments (approximately 5.5 per species according to one survey), many subspecies show no evidence of genetic divergence (Zink 2004). Barcode analyses can serve as a quick screening tool for lineages with deep genetic divergence, aiding the detection of overlooked species. In fact, all past barcode surveys have identified new taxonomic units, whether named species, provisional species, evolutionarily significant units (ESUs), or molecular operational taxonomic units (MOTUs), in 4-40% of the species examined (Hajibabaei et al. 2006; Meyer and Paulay 2005; Monaghan et al. 2005; Saunders 2005; Scheffer et al. 2006; Smith et al. 2005). These results suggest that "an iterative process of DNA barcoding... followed by taxonomic study" will be a productive path to cataloging biodiversity (Barber and Boyce 2006). In the present study, most provisional species were small to medium-sized, plainly colored birds, whereas most species with overlapping barcodes were large and/or brightly colored. This pattern might reflect a taxonomic tendency toward undersplitting inconspicuous birds and/or oversplitting more conspicuous species.

Over 30 years ago, Richard Lewontin concluded that intraspecific variation is tightly constrained and recognized that both genetic drift and natural selection offer possible explanations (Lewontin 1974). Under genetic drift, recent population bottlenecks could account for low intraspecific variation. It might be argued that the low levels of mitochondrial variation detected in our study reflect the unique history of North American birds, most of which expanded into their present ranges from smaller populations following the retreat of glaciers. However, restricted intraspecific mitochondrial variation also exists in many vertebrate and invertebrate species from tropical, temperate, marine, and terrestrial environments (Barrowclough and Shields 1984; Bucklin and Wiebe 1998; Hajibabaei et al. 2006; Meyer and Paulay 2005; Saunders 2005), implying a more general explanation. The effective population size for nuclear genes can reach an asymptotic limit due to linkage; this effect is strongest in organisms with large genomes, with the result that the effective population size of vertebrates might not exceed 10⁴ (Gillespie 2000; Lynch et al. 2006). Although not directly applicable to mitochondria, this effect illustrates the complexity of estimating effective population sizes and predicting the role of drift in purging variation.

Low mitochondrial variation might alternatively (or additionally) reflect recurrent selective sweeps: the repeated spread of new, selectively favored variants across the breeding range of a species could purge mitochondrial diversity. Although 98% of the nucleotide differences between nearest neighbors in our COI barcode sequences were synonymous, selection on any nucleotide position in the mitochondrial genome would remove variation in the barcode region, because mitochondrial DNA is transmitted asexually and is therefore inherited as a single linkage group. Mutations in nuclear or mitochondrial loci involved in nuclear-mitochondrial co-adaptation might be particularly important (Catalano et al. 2006). A recent analysis of patterns of substitution in nuclear and mitochondrial DNA concluded that reduced mitochondrial diversity in animals is due to selective sweeps (Bazin et al. 2006). Although these authors found no correlation between census population size and intraspecific mitochondrial variation, the range of variation was narrower than expected given census population sizes. This latter finding, together with our results showing trends toward increased diversity in larger populations and older species, implies that genetic drift does influence mitochondrial variation, but only weakly.

Most researchers agree that species are a key unit of biological systems, but disagree about how best to define them. Hence, theoretical and operational species concepts proliferate, each emphasizing different aspects of present-day biology and evolutionary history (Wheeler 2000). Some believe that a basic taxonomic unit does not exist, instead viewing species as a convenient taxonomic construct, "an arbitrary cut-off somewhere along a branch in the tree of life" (Mishler and Shapley 2004). The tight clustering of mtDNA sequences within species observed in our study not only bolsters the view that species are fundamental biological units, but also reveals that their identification is usually straightforward.

In summary, most North American bird species appear to have a similar genetic structure, each consisting of a single tight cluster of mitochondrial DNA variants distinct from the clusters of closely related species. High bootstrap support for species nodes in this study and in other animal groups suggests that neighbor-joining analysis of COI barcode sequences will be widely effective (Hajibabaei et al. 2006; Ward et al. 2005). The few species with higher intraspecific diversity consisted of two such clusters, many of which appear to represent cryptic species. Further study will likely reveal additional lineages within some species but leave unchanged the underlying pattern of segregation of mitochondrial diversity into distinct clusters (Zink 2004). Together, these observations imply a general constraint on mitochondrial diversity in birds.

Table 1.1. Species with overlapping barcode clusters. The percent similarity between related species (calculated using a K2P distance metric) is provided.

No. | Order | Common Name | Scientific Name | n | Similarity (%)
1 | Anseriformes | Snow Goose | Chen caerulescens | 5 | 99.8
2 | | Ross's Goose | Chen rossii | 2 |

3 | | Black Duck | Anas rubripes | 8 |
4 | | Mallard | Anas platyrhynchos | 8 | 99.4
5 | | Mottled Duck | Anas fulvigula | 1 |

6 | | Blue-winged Teal | Anas discors | 8 | 100.0
7 | | Cinnamon Teal | Anas cyanoptera | 2 |

8 | | King Eider | Somateria spectabilis | 5 | 99.7
9 | | Common Eider | Somateria mollissima | 1 |

10 | Galliformes | Sharp-tailed Grouse | Tympanuchus phasianellus | 3 | 99.7
11 | | Greater Prairie-Chicken | Tympanuchus cupido | 1 |
12 | | Lesser Prairie-Chicken | Tympanuchus pallidicinctus | 5 |

13 | Podicipediformes | Western Grebe | Aechmophorus occidentalis | 2 | 99.7
14 | | Clark's Grebe | Aechmophorus clarkii | 2 |

15 | Charadriiformes | Laughing Gull | Larus atricilla | 8 | 99.3
16 | | Franklin's Gull | Larus pipixcan | 4 |

17 | | California Gull | Larus californicus | 5 | 99.8
18 | | Herring Gull | Larus argentatus | 7 |
19 | | Thayer's Gull | Larus thayeri | 4 |
20 | | Iceland Gull | Larus glaucoides | 1 |
21 | | Lesser Black-backed Gull | Larus fuscus | 5 |
22 | | Western Gull | Larus occidentalis | 4 |
23 | | Glaucous-winged Gull | Larus glaucescens | 4 |
24 | | Glaucous Gull | Larus hyperboreus | 4 |

25 | Piciformes | Red-naped Sapsucker | Sphyrapicus nuchalis | 5 | 99.4
26 | | Red-breasted Sapsucker | Sphyrapicus ruber | 6 |

27 | Passeriformes | Black-billed Magpie | Pica hudsonia | 3 | 99.6
28 | | Yellow-billed Magpie | Pica nuttalli | 3 |

29 | | American Crow | Corvus brachyrhynchos | 3 | 99.5
30 | | Northwestern Crow | Corvus caurinus | 4 |

31 | | Townsend's Warbler | Dendroica townsendi | 6 | 99.5
32 | | Hermit Warbler | Dendroica occidentalis | 5 |

33 | | Golden-crowned Sparrow | Zonotrichia atricapilla | 8 | 99.7
34 | | White-crowned Sparrow | Zonotrichia leucophrys | 3 |

35 | | Dark-eyed Junco | Junco hyemalis | 24 | 99.7
36 | | Yellow-eyed Junco | Junco phaeonotus | 3 |

37 | | Snow Bunting | Plectrophenax nivalis | 2 | 99.9
38 | | McKay's Bunting | Plectrophenax hyperboreus | 1 |

39 | | Great-tailed Grackle | Quiscalus mexicanus | 11 | 99.2
40 | | Boat-tailed Grackle | Quiscalus major | 6 |

41 | | Common Redpoll | Carduelis flammea | 2 | 99.7
42 | | Hoary Redpoll | Carduelis hornemanni | 5 |

Table 1.2. Provisional species: provisional splits of recognized species with maximum intraspecific distances above the 2.5% threshold. * Identified in the earlier study (Hebert et al. 2004); † prior research supports split; ‡ prior research cites genetic division but does not support species split (citations provided in table). Bootstrap support for the provisional species clusters is shown.

No. | Common Name | Scientific Name | Max. Intrasp. Dist. (%) | Bootstrap | Citation (if applicable)
1 | Northern Fulmar | Fulmarus glacialis | 3.1 | 100/- |
2 | Solitary Sandpiper* | Tringa solitaria | 5.4 | 100/100 |
3 | Western Screech Owl | Megascops kennicottii | 3.1 | 100/100 |
4 | Warbling Vireo*† | Vireo gilvus | 4.0 | 100/99 | Sibley and Monroe 1990
5 | Mexican Jay† | Aphelocoma ultramarina | 5.3 | 100/- | Rice et al. 2003
6 | Western Scrub-Jay† | Aphelocoma californica | 3.2 | [illegible]/- | Rice et al. 2003
7 | Common Raven‡ | Corvus corax | 4.3 | 100/92 | Omland et al. 2000
8 | Mountain Chickadee† | Poecile gambeli | 3.7 | 100/100 | Gill et al. 1993
9 | Bushtit | Psaltriparus minimus | 3.6 | 100/100 |
10 | Winter Wren† | Troglodytes troglodytes | 6.4 | 100/100 | Drovetski et al. 2004b
11 | Marsh Wren*† | Cistothorus palustris | 7.9 | 100/100 | Kroodsma 1989
12 | Bewick's Wren | Thryomanes bewickii | 4.8 | 100/100 |
13 | Hermit Thrush | Catharus guttatus | 3.2 | 100/100 |
14 | Curve-billed Thrasher† | Toxostoma curvirostre | 7.4 | 100/- | Zink and Blackwell-Rago 2000
15 | Eastern Meadowlark*† | Sturnella magna | 4.6 | 100/100 | Rohwer 1976

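[Figure 1.1: 2 × 2 grid. Columns: recognized species (yes / no); rows: distinct barcode clusters (yes / no). A (recognized species, distinct clusters): single clusters match species-level taxonomy. B (not recognized species, distinct clusters): multiple clusters suggest cryptic species. C (recognized species, no distinct clusters): overlapping clusters suggest young species, hybridization, or synonymy. D (not recognized species, no distinct clusters): blank.]

Figure 1.1. Comparing barcode sequence clusters with species-level taxonomy. Categories A-C are described in the figure; by definition, all potential splits recognized by barcoding have distinct barcodes, so D is blank.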

[Figure 1.2 plot: mean intraspecific distance versus number of individuals analyzed (categories up to >10).]

Figure 1.2. Mean intraspecific variation according to number of individuals analyzed.

Boxes indicate mean, 25th and 75th percentile; bars, 10th and 90th percentile; and dots, values above or below 90th or 10th percentile, respectively.

[Figure 1.3 plots: (A) mean intraspecific distance versus census population size (n = 411, r = .26, p < .0001); (B) mean intraspecific distance versus minimum interspecific distance (n = 391, r = .16, p = .002).]

Figure 1.3. Intraspecific distance, population size, and apparent species age. A) Linear regression of mean intraspecific distance on log10 census population size. For illustration purposes, a box plot was generated as described in the legend to Figure 1.2. B) Linear regression of mean intraspecific COI distance on apparent species age, as indicated by minimum interspecific K2P distance to the nearest congeneric relative.

[Figure 1.4 histograms (distances in %). A. Mean intraspecific distance: min 0, max 1.59, mean 0.23, SE 0.01. B. Maximum intraspecific distance: min 0, max 2.24, mean 0.40, SE 0.02. C. Nearest neighbor distance: min 0, max 16.99, mean 5.91, SE 0.10. D. Nearest neighbor distance, congeners only: min 0, max 14.18, mean 4.30, SE 0.15. Annotations: provisional species (n = 30); overlapping species (n = 42).]

Figure 1.4. Intraspecific and nearest neighbor distances in North American birds. Applying these measures to the data set of the preliminary study (Hebert et al. 2004) gave mean values of 0.24, 0.27, 8.02, and 5.86 for A-D, respectively.

CHAPTER II

Probing evolutionary patterns in Neotropical birds through DNA barcodes

Published in PLoS ONE:

Kerr KCR, Lijtmaer DA, Barreira AS, Hebert PDN, Tubaro PL. (2009) Probing evolutionary patterns in Neotropical birds through DNA barcodes. PLoS ONE 4(2): e4379. doi:10.1371/journal.pone.0004379

ABSTRACT

The Neotropical avifauna is more diverse than that of any other biogeographic region, but our understanding of patterns of regional divergence is limited. Critical examination of this issue is currently constrained by the limited genetic information available. This study begins to address this gap by assembling a library of mitochondrial COI sequences, or DNA barcodes, for Argentinian birds and comparing their patterns of genetic diversity to those of North American birds.

Five hundred Argentinian species were examined, making this the first major examination of DNA barcodes for South American birds. Our results indicate that most southern Neotropical bird species show deep sequence divergence from their nearest neighbour, confirming that the high diversity of this fauna is not based on an elevated incidence of young species radiations. Although species ages appear similar in the temperate North and South American avifaunas, patterns of regional divergence are more complex in the Neotropics, suggesting that the high diversity of the Neotropical avifauna has been fueled by greater opportunities for regional divergence. Deep genetic splits were observed in at least 21 species, though the distribution patterns of these lineages were variable. The lack of shared polymorphisms between species, even those with less than 0.5 million years of reproductive isolation, further suggests that selective sweeps could regularly excise ancestral mitochondrial polymorphisms.

These findings confirm the efficacy of species delimitation in birds via DNA barcodes, even when tested on a global scale. Further, they demonstrate how large libraries of a standardized gene region provide insight into evolutionary processes.

INTRODUCTION

DNA barcoding, the survey of sequence diversity in a standard gene region (the 5' segment of mitochondrial cytochrome c oxidase I, or COI, for animals), has a strong track record for identifying species in varied taxonomic groups (Hajibabaei et al. 2006; Smith et al. 2008; Ward et al. 2005). One particularly comprehensive study of DNA barcodes revealed that 94% of 643 North American bird species possess diagnostic barcode sequences (Hebert et al. 2004; Kerr et al. 2007). Moreover, the few cases where barcode sharing limited taxonomic resolution in this fauna involved closely allied species that often hybridize. Similar results have been obtained from the Palearctic: Yoo et al. (2006) reported that barcodes reliably identify Korean birds (92 of 450 species were examined).

There remains a need for similar investigations in the Neotropics, the hotspot of avian diversity, with a fauna of 3,370 breeding species including 3,121 endemics (Newton 2000). Aside from this high taxon diversity, tropical species often possess greater genetic structure than their temperate-zone counterparts (Bates et al. 1999; Garcia-Moreno et al. 1998; Garcia-Moreno et al. 1999; Hackett 1996). For both of these reasons, it has been argued that the Neotropical avifauna will challenge DNA barcoding (Moritz and Cicero 2004). Yet the only previous test of barcoding in Neotropical birds contradicted this prediction, finding that 16 species of the endemic family Thamnophilidae could be discriminated (Vilaça et al. 2006). Clearly, a larger-scale investigation is needed.

Moreover, a broad survey of sequence diversity at COI in Neotropical birds permits the analysis of patterns of genetic divergence and of the geographic distributions of distinct lineages, as well as comparisons with other geographic areas, particularly the Nearctic, where most avian species have already been barcoded (Hebert et al. 2004; Kerr et al. 2007). Such a survey would be highly valuable for studying diverse aspects of avian evolution and for detecting species, or groups of species, that require more detailed investigation of their taxonomic status.

The Argentinian avifauna includes 980 species, approximately 25% of Neotropical bird species (Mazar Barnett and Pearman 2001; Narosky and Yzurieta 2003). The present study examines patterns of barcode divergence in over half of the bird species native to Argentina. In addition to testing the effectiveness of DNA barcodes for species identification, we explore cases where different species share COI sequences and those where single species include two or more divergent lineages. Finally, and more critically, we analyze patterns of sequence divergence in the birds of Argentina and compare them with those in North America, with a view to understanding the origins of the diversity of the South American avifauna and to obtaining a broader perspective on mitochondrial genetic variation in New World birds.

METHODS

Most specimens (88%) were collected by the Ornithology Division of the Museo Argentino de Ciencias Naturales "Bernardino Rivadavia" (MACN) between 2003 and 2007, sometimes in collaboration with other institutions. A few additional specimens were donated (6%), confiscated from illegal traders (4%), or obtained from skins of birds collected after 1995 (2%). DNA was usually extracted from frozen samples of pectoral muscle, liver, or heart, but a few extractions were from blood (1%) or small pieces of skin or toe pads from museum skins (2%). All samples derive from the tissue collection at the MACN.

A voucher is present in the MACN or in another collaborating institution for 99% of the specimens that provided a tissue sample for analysis. While most of these vouchers were study skins, a few were skeletons or specimens in ethanol. In the case of blood samples, birds were photographed prior to release to provide an e-voucher. All specimens were identified in the field and validated after preparation; taxonomic assignments follow Clements (2007). Only specimens with confirmed species identities were included.

Adults were preferred over juveniles and, in the case of species with sexual dimorphism, males were chosen over females. Specimens were examined from all localities with representatives of a species, but no more than three individuals were analyzed from a single location, except for a few species with particularly high genetic divergence.

DNA extracts were obtained using glass fibre columns with vertebrate lysis buffer and an automated protocol on a Biomek FXP liquid handler (Ivanova et al. 2006). Extracts were eluted in 50 µl of molecular-grade water. COI sequences were amplified using the primer pair BirdF1 (5'-TTCTCCAACCACAAAGACATTGGCAC-3') and COIbirdR2 (5'-ACGTGGGAGATAATTCCAAATCCTGG-3'). When PCR failed and degraded DNA was the suspected cause, internal primers were used in conjunction with those above: AvMiR1 (5'-ACTGAAGCTCCGGCATGGGC-3') and AvMiF1 (5'-CCCCCGACATAGCATTCC-3'). PCR reactions were initially run following the thermal cycling program in Kerr et al. (2007). Later samples used a shorter program that was equally effective: one cycle at 94°C for 1 min; five cycles of 94°C for 1 min, 45°C for 40 s, and 72°C for 1 min; 35 cycles of 94°C for 1 min, 51°C for 40 s, and 72°C for 1 min; and lastly one cycle of 72°C for 5 min. PCR products were visualized on 2% agarose gels (E-gel 96, Invitrogen) and bidirectionally sequenced on an ABI 3730xl DNA Analyzer. Sequence records were assembled from forward and reverse reads using SEQUENCHER, version 4.5 (Gene Codes), and aligned by eye using BioEdit version 7.0.5.3 (Hall 1999).
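To compare the two cycling programs, both can be written as lists of temperature holds. This is a minimal Python sketch. The encoding is hypothetical, the earlier program is assumed to be the Kerr et al. (2007) program as given in Chapter I (35 main cycles), and ramp times, which depend on the thermocycler, are ignored.

```python
# Each block: (cycles, [(temperature °C, hold seconds), ...]); ramp times ignored.
EARLIER = [  # assumed: the Kerr et al. (2007) program with 35 main cycles
    (1, [(94, 60)]),
    (6, [(94, 60), (45, 90), (72, 90)]),
    (35, [(94, 60), (55, 90), (72, 90)]),
    (1, [(72, 300)]),
]
SHORTER = [  # program described in the text
    (1, [(94, 60)]),
    (5, [(94, 60), (45, 40), (72, 60)]),
    (35, [(94, 60), (51, 40), (72, 60)]),
    (1, [(72, 300)]),
]


def hold_seconds(program):
    """Total hold time in seconds across all blocks and cycles."""
    return sum(cycles * sum(sec for _, sec in steps) for cycles, steps in program)


def amplification_cycles(program):
    """Number of three-step (denature/anneal/extend) cycles."""
    return sum(cycles for cycles, steps in program if len(steps) == 3)


for name, prog in (("earlier", EARLIER), ("shorter", SHORTER)):
    print(name, amplification_cycles(prog), round(hold_seconds(prog) / 60, 1))
# earlier 41 170.0
# shorter 40 112.7
```

Under these assumptions, the shorter program cuts hold time by roughly a third (about 57 minutes per run), mainly by shortening the annealing and extension steps rather than dropping cycles.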

Specimen and collection data, sequences, and trace files are available in the project 'Birds of Argentina - Phase I' in BOLD (http://www.barcodinglife.org). Sequences have been deposited in GenBank under accession numbers FJ027014-FJ028607. BOLD process IDs, museum numbers, and GenBank accession numbers for each specimen analyzed are listed in Appendix 2.1. Comparisons with COI sequences for North American birds employed data available from BOLD in the container project 'Birds of North America - Phase II' (sequences are also available from GenBank under accession numbers AY666171-AY666596, DQ432694-DQ433261, DQ433274-DQ433846, and DQ434243-DQ434805).

Sequences were compared using the Kimura 2-parameter distance option (Kimura 1980) in the BOLD Management & Analysis System (Ratnasingham and Hebert 2007). Linear regression was performed using R version 2.5.0 (R Development Core Team 2007). Intra- and interspecific variation were examined visually with neighbour-joining trees generated using the 'Taxon ID Tree' option on BOLD. Bootstrap support using 1000 replicates was calculated using MEGA version 4.0 (Tamura et al. 2007). Sequences of pairs or trios of species with low divergence were examined by eye to identify diagnostic nucleotides (positions fixed within each species but different between them), an approach that has previously proven robust in other species (Kerr et al., unpublished data).
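The diagnostic-nucleotide criterion (positions fixed within each species but different between them) is simple to state programmatically. The Python sketch below is an illustration, not the by-eye procedure used in the study; it assumes aligned, equal-length sequences, reports 0-based positions, and ignores any site where a gap or ambiguous base occurs. The sequences in the example are invented.

```python
UNAMBIGUOUS = set("ACGT")


def diagnostic_positions(species_a, species_b):
    """Return (position, base_in_a, base_in_b) for sites that are fixed within
    each species but differ between them.

    species_a, species_b: lists of aligned sequences of equal length.
    Positions are 0-based.
    """
    seqs = species_a + species_b
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    sites = []
    for i in range(length):
        bases_a = {s[i].upper() for s in species_a}
        bases_b = {s[i].upper() for s in species_b}
        if not (bases_a | bases_b) <= UNAMBIGUOUS:
            continue  # skip sites with gaps or ambiguity codes
        if len(bases_a) == 1 and len(bases_b) == 1 and bases_a != bases_b:
            sites.append((i, next(iter(bases_a)), next(iter(bases_b))))
    return sites


# Invented example: two individuals per species.
sp_a = ["ACGTACGTAC", "ACGTGCGTAC"]  # position 4 varies within species A
sp_b = ["ACCTACGTAT", "ACCTACGTAT"]
print(diagnostic_positions(sp_a, sp_b))  # [(2, 'G', 'C'), (9, 'C', 'T')]
```

Position 4 is excluded because it is polymorphic within the first species, which is exactly the situation that makes a site unreliable for identification.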

RESULTS

In total, 1,594 sequences were obtained from 500 species representing 51% of the bird species known from Argentina, including 22 of 23 orders and 68 of 81 families (Appendix 2.2 provides a species list). On average, 3.2 individuals (range 1-19) were analyzed per species, with 389 taxa represented by multiple specimens. Only sequences longer than 550 bp with less than 1% ambiguous base calls were included (average sequence length was 692 bp), except for four sequences with 1-3% ambiguous calls that represented the sole records for their species.
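The inclusion rule above can be expressed as a small filter. The Python sketch below rests on several assumptions: any character other than A, C, G, T, or an alignment gap ('-') counts as an ambiguous call; length is measured without gaps; and a record is treated as the "sole record" for its species by counting all submitted records of that species. The function name `select_records` and the example records are hypothetical.

```python
from collections import Counter


def select_records(records, min_len=550, max_ambig=0.01, sole_max_ambig=0.03):
    """Keep (species, sequence) records longer than min_len bases with fewer than
    max_ambig ambiguous calls; a species' only record may have up to
    sole_max_ambig ambiguous calls."""
    n_per_species = Counter(species for species, _ in records)
    kept = []
    for species, seq in records:
        bases = seq.upper().replace("-", "")
        if len(bases) <= min_len:
            continue  # must be longer than 550 bp
        ambig = sum(base not in "ACGT" for base in bases) / len(bases)
        sole_record = n_per_species[species] == 1
        if ambig < max_ambig or (sole_record and ambig <= sole_max_ambig):
            kept.append((species, seq))
    return kept


records_example = [
    ("A", "ACGT" * 173),             # 692 bp, no ambiguity: kept
    ("B", "ACGT" * 147 + "N" * 12),  # 600 bp, 2% ambiguous, sole record: kept
    ("C", "ACGT" * 150),             # 600 bp, clean: kept
    ("C", "ACGT" * 147 + "N" * 12),  # 2% ambiguous, not the sole record: dropped
    ("D", "ACGT" * 125),             # 500 bp: too short, dropped
]
print([species for species, _ in select_records(records_example)])  # ['A', 'B', 'C']
```

Relaxing the ambiguity limit only for singletons keeps species coverage high without admitting lower-quality reads for species that already have clean records.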

The mean sequence distance among congeneric taxa was 7.6%, while the distance to the nearest congener averaged 6.2%, based on 282 comparisons. In genera represented by multiple species, 92% of species were more than 1.0% divergent from their nearest congener and 83% were more than 2.5% divergent (Figure 2.1). COI delivered a species identification for 98.8% of species, using either a distance-based criterion or, in cases of very low divergence (less than 0.5%), diagnostic nucleotide substitutions. Mean intraspecific distance was 0.24% based on the 389 species represented by multiple specimens (weighted equally regardless of the number of individuals). There was a weak association between intraspecific variation and sample size (linear regression, p < 0.01, R² = 0.12).
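Weighting species equally means averaging within each species first and then across species, so that a heavily sampled species counts no more than one represented by two specimens. A minimal Python sketch with invented distances shows how much this can differ from pooling all pairwise comparisons; the function names are hypothetical.

```python
from statistics import mean


def equal_weight_mean(distances_by_species):
    """Mean of per-species mean intraspecific distances (each species counts once)."""
    return mean(mean(d) for d in distances_by_species.values() if d)


def pooled_mean(distances_by_species):
    """Mean over all pairwise comparisons (heavily sampled species dominate)."""
    return mean(x for d in distances_by_species.values() for x in d)


# Invented example: 10 specimens (45 pairs) of one species, 2 specimens (1 pair) of another.
distances = {"species_1": [0.1] * 45, "species_2": [0.5]}
print(round(equal_weight_mean(distances), 3))  # 0.3
print(round(pooled_mean(distances), 3))        # 0.109
```

Equal weighting prevents a few intensively sampled species from driving the overall mean, which matters here because sample sizes ranged from 1 to 19 individuals per species.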

A few cases of low (<1%) divergence between congeners were detected. In most cases, discrimination by COI was possible because each species formed a distinct cluster in a neighbour-joining tree or showed diagnostic sequence differences. For example, sequences of the mockingbirds Mimus dorsalis and M. triurus showed three diagnostic substitutions, and the woodpeckers Veniliornis frontalis and V. passerinus differed by two nucleotides. Two goldfinch species (Carduelis atrata and C. crassirostris) and three ground-tyrants (Muscisaxicola capistratus, M. frontalis, and M. maclovianus) also showed low divergences but were separable. Two ducks, Anas puna and A. versicolor, possessed five diagnostic substitutions, although the latter species showed considerable within-species variation at other nucleotide positions. In all these cases, barcodes delivered reliable identifications because the sequences of each species formed a single cluster (bootstrap support exceeded 70% in all cases except A. versicolor, which had 57% support). Three other ground-tyrants (Muscisaxicola flavinucha, M. cinereus, M. rufivertex) with even lower genetic divergences were paraphyletic, impeding straightforward identification. Additionally, barcode resolution was clearly compromised in one species complex: six species of Sporophila (S. cinnamomea, S. hypochroma, S. hypoxantha, S. palustris, S. ruficollis, S. zelichi) all shared barcodes.

Hebert et al. (2004) suggested that a sequence threshold of ten times the average intraspecific variation could be used to identify cases where a currently recognized species might represent more than one taxon. For the birds of Argentina, this threshold lies at 2.4% (10 × 0.24%), and its application flags 13 species as possessing unusually high sequence variation (Table 2.1). Another way to identify species in need of taxonomic scrutiny is to search for taxa whose specimens form two or more distinct clusters with high bootstrap support (i.e. >98%) in a neighbour-joining tree. Applied to Argentinian birds, this criterion flags eight more species, all showing maximum intraspecific distances higher than 1.5% (Table 2.1).
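The 10X rule reduces to one multiplication followed by a filter. The sketch below uses a hypothetical helper, not published code, and applies it to the maximum intraspecific distances transcribed from Table 2.1. It also lists the table's species whose maximum distances fall between 1.5% and 2.4%, which are the eight species flagged only by the neighbour-joining cluster criterion.

```python
# Maximum intraspecific distances (%) transcribed from Table 2.1.
TABLE_2_1_MAX_DISTANCE = {
    "Vanellus chilensis": 1.54,
    "Athene cunicularia": 1.60,
    "Sittasomus griseicapillus": 3.25,
    "Geositta cunicularia": 3.41,
    "Leptasthenura aegithaloides": 3.72,
    "Cinclodes fuscus": 4.65,
    "Upucerthia dumetaria": 5.41,
    "Cranioleuca pyrrhophya": 1.53,
    "Thamnophilus caerulescens": 2.44,
    "Thamnophilus ruficapillus": 4.03,
    "Manacus manacus": 3.56,
    "Serpophaga subcristata": 2.04,
    "Myiophobus fasciatus": 4.67,
    "Knipolegus aterrimus": 1.9,
    "Cistothorus platensis": 4.95,
    "Troglodytes aedon": 4.99,
    "Vireo olivaceus": 3.09,
    "Thraupis bonariensis": 3.29,
    "Cyanocompsa brissonii": 2.04,
    "Saltator aurantiirostris": 1.52,
    "Arremon flavirostris": 1.75,
}


def ten_x_threshold(mean_intraspecific_pct: float, factor: float = 10.0) -> float:
    """Threshold of Hebert et al. (2004): factor times mean intraspecific distance (%)."""
    return factor * mean_intraspecific_pct


threshold = ten_x_threshold(0.24)  # 2.4% for the Argentinian dataset
above_threshold = sorted(
    sp for sp, d in TABLE_2_1_MAX_DISTANCE.items() if d > threshold
)
between_1_5_and_threshold = sorted(
    sp for sp, d in TABLE_2_1_MAX_DISTANCE.items() if 1.5 < d <= threshold
)
print(len(above_threshold), len(between_1_5_and_threshold))  # 13 8
```

The threshold depends only on the dataset-wide mean, so it shifts whenever new species or specimens change that mean.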

More than 10% of the Argentine avifauna (111 of 980 species) also occurs in North America, but 45 of these species are migrants that do not breed in Argentina, five are pelagic visitors to both regions, and six are introduced to one or both areas. Barcode data are available for 42 of the remaining 55 species, whose natural breeding ranges extend from Argentina to North America (see Appendix 2.3). Seven of these widely distributed species displayed substantial genetic divergence (>2.4%) between North American and Argentinian populations, three displayed smaller divergences (1.5-2.4%), and the remaining 32 showed limited or no divergence.

DISCUSSION

Barcodes in Argentinian Birds

Just nine of the 500 bird species included in our study cannot be distinguished using COI sequences. Three of these are Muscisaxicola ground-tyrants, which have low interspecific divergence and appear to be paraphyletic. The remaining six species, which all share barcodes, are members of the southern capuchinos, a sub-group within the genus Sporophila that includes nine species (seven of them present in Argentina) and shows little mitochondrial sequence variation (Lijtmaer et al. 2004). Members of this group are believed to have diverged within the past 0.5 million years, driven by sexual selection and a fragmented landscape, and they are known to hybridize (Lijtmaer et al. 2004). Although shared mitochondrial sequences have also been reported in white-headed gulls (Kerr et al. 2007), in Darwin's finches (Sato et al. 1999), and possibly in crossbills (Edelaar et al. 2003), the present study reinforces earlier evidence that such cases are exceptional in both Nearctic and Neotropical settings, even though hybridization is known to involve around 10% of the world's bird species (McCarthy 2006).

Five additional genera with very low interspecific divergences each included a pair or triad of species; none of these cases was a surprise. Mimus dorsalis and M. triurus are regarded as sister taxa (Arbogast et al. 2006), while Veniliornis frontalis and V. passerinus are so similar morphologically that Nores (1992) designated them as allospecies, and Moore et al. (2006) suggested that they diverged only 0.35 Mya. Anas puna and A. versicolor (Johnson and Sorenson 1999) have sometimes been considered conspecific (Rodriguez Mata et al. 2006), but the present results support the conclusion that they are young species. Arnaiz-Villena et al. (1998) suggested a very recent expansion of Carduelis in South America to explain the low genetic divergence between C. atrata and C. crassirostris. Finally, Chesser (2000) proposed a middle-late Pleistocene diversification for Muscisaxicola because of the shallow divergences between its member species. Further instances of low divergence between species will undoubtedly be revealed as taxonomic coverage builds for Argentina, but there is no evidence that young species radiations are any more common in this region than in North America. Studies in the more tropical areas of northern South America are needed to establish whether this similarity is a consequence of comparing two mainly temperate regions in opposite hemispheres or whether the same trend holds throughout the Neotropics.

Two evolutionary inferences derive from the present results. First, the relative paucity of very closely related species implies that the high species diversity of southern Neotropical birds does not owe its origin to an elevated incidence of young species radiations, a finding consistent with recent proposals (Weir and Schluter 2004; Weir and Schluter 2007). Second, and more generally, the low variation within species and the fixation of diagnostic COI sequences even in young species groups conflict with expectations from stochastic models of mitochondrial variation, which predict that ancestral polymorphisms will persist for millions of generations (Hickerson et al. 2006). Instead, the rapid emergence of fixed differences is compatible with growing evidence that selective sweeps recurrently strip variation from mitochondrial gene pools (Bazin et al. 2006), although demographic factors have not been ruled out as an explanation.

Tropical taxa are generally thought to show more genetic structure than their temperate-zone counterparts, even in the absence of geographical barriers (Francisco et al. 2007). Work on North American birds revealed deep barcode divergences in 2.7% of species (15/546), all involving allopatric lineages, usually east-west splits. The incidence of deep splits was slightly higher in Argentina, where 3.3% of species with multiple records (13/389) showed divergences greater than the 2.4% threshold. Another eight species possessed distinct barcode clusters with 1.5-2.4% divergence, giving a total of 21 species with marked population structure (5.4% of the species examined). These cases included allopatric, parapatric and sympatric divergence.

Fourteen of the 21 species showed allopatric divergences, although there was no simple pattern of geographic structuring. Some cases involved north-south divergence. For example, Patagonian populations of Cistothorus platensis and Cinclodes fuscus showed almost 5% divergence from those in northwestern Argentina (Figure 2.2b), which is consistent with previous findings (Traylor 1988). Other barcode splits coincided with environmental gradients or known barriers to gene flow. For example, specimens of Upucerthia dumetaria from different elevations in the Andes diverged by as much as 5.4%, while the 4% divergence between lineages of Thamnophilus ruficapillus (see Figure 2.2a) coincided with isolation caused by the Chaco woodland (Nores 1992). Some cases of allopatric divergence seem to represent overlooked species: specimens of Serpophaga subcristata from northeastern Argentina and those from Patagonia and Buenos Aires province exhibit not only 2% COI divergence but also differences in morphology and vocalizations (Straneck 1993). Likewise, Troglodytes aedon possessed three COI lineages with divergences as high as 5%, and its Neotropical populations are thought to include several species (J. Klicka, unpublished data). In other cases, the situation is unclear. For example, two subspecies of Vanellus chilensis (northern V. c. chilensis, southern V. c. fretensis) show just 1.5% divergence, but this matches the divergence between other closely allied species in the same family (e.g. Charadrius alticola and C. falklandicus).

Five of the 21 species with deep divergences involved cases of parapatry. For example, populations of Vireo olivaceus in northeastern and northwestern Argentina possess up to 3.1% sequence divergence, but both COI lineages occurred at one northeastern site (Figure 2.2c). Does this area represent a region of sympatry between reproductively isolated species or a contact zone between phylogeographic groups? More specimens from this location need to be examined for variation at nuclear loci to resolve this uncertainty (Brumfield 2005).

Interestingly, two of the 21 species possessed divergent mitochondrial lineages in sympatry. One of these species, Manacus manacus, includes four colour forms that are sometimes regarded as different species (Snow 2004). Specimens from Parque Nacional Iguazú included two COI groups with 3.5% divergence, and males of both lineages were collected from a single lek (Figure 2.2d), suggesting that the divergent COI groups in M. manacus represent a rare case of deep intraspecific divergence. However, further study is required to determine the origin of this genetic variation and the taxonomic status of the species. Furthermore, this sympatric distribution of lineages could prove to be parapatric with increased sampling (the same could in principle be true for any of the above examples of allopatry if a region of overlap has been left unsampled). This emphasizes the need to collect several specimens per locality and to sample the entire distribution of each species.

Barcoding the avifaunas of Argentina and North America

Although species coverage is higher for North America (643 species, 93% of the fauna) than for Argentina (500 species, 51% of the fauna), sample sizes in both regions are large enough to provide a good sense of overall patterns of COI variation. Mean intraspecific divergences are nearly identical: 0.23% in North America and 0.24% in Argentina. The nearest-neighbour distance for congeneric taxa is lower in North America (4.3%) than in Argentina (6.2%), but this regional difference will undoubtedly lessen as species coverage builds for Argentina. Barcode sequences are effective for species identification in both settings; 94% of North American birds and 98% of birds from Argentina can be identified to species level. The incidence of deep intraspecific divergences is similar in the two regions (2.7% versus 3.3%), but distributional patterns vary. Most of the genetically divergent groups in North America reflect east-west allopatry (Kerr et al. 2007), while divergences in Argentina are more complex; some are north-south, others are east-west, and yet others occur along altitudinal gradients or in response to specific habitat barriers. Moreover, some cases of deep barcode divergence in Argentinian species involve parapatric or sympatric lineages.

Aside from testing the congruence of barcode patterns, this study provided information on sequence divergences for 42 species whose breeding ranges extend from Argentina to North America. Fifteen of the 32 species with low divergence are waterbirds (e.g. herons, rails, cormorants, ducks), whose use of coastal habitats facilitates gene flow (Friesen 1997). Seven other species are tropical raptors that have limited ranges in North America and whose long-distance movements ensure gene flow (Bildstein 2004). The few small species in this group may represent recent range extensions into the southernmost United States. The 10 species with deeper genetic divergences (>1.5%) were largely plain-coloured passerines (Troglodytes aedon, Vireo olivaceus) and birds with cryptic lifestyles (Nyctidromus albicollis, Glaucidium brasilianum). Most possessed a disjunct range, typically with northern migratory and southern non-migratory populations (e.g. Athene cunicularia, Troglodytes aedon, Vireo olivaceus). Some groups, such as the vireos, are thought to have evolved migratory behaviour on multiple occasions (Cicero and Johnson 1998); such switches might provoke rapid speciation because they isolate breeding populations (Bearhop et al. 2005). The status of all 10 species with deep splits requires further evaluation, and the need for taxonomic revision has already been suggested in some cases (e.g. Brumfield and Capparella 1996).

In addition to cases of geographic divergence, combining the North American and Argentinian results revealed two cases of barcode sharing. Parula americana, a species ranging from eastern North America to Central America, shares barcodes with P. pitiayumi, a tropical species whose range extends north to Texas. A recent range expansion from a common ancestor has been proposed as the most likely cause of the low divergence between these species (Lovette and Bermingham 2001). The second case of sequence overlap involves Anas americana, restricted to North and Central America, and A. sibilatrix, confined to the southern cone of South America. Although ducks generally exhibit low genetic divergences, these two species possess striking plumage differences. Peters et al. (2005) proposed that these rapid phenotypic changes were driven by divergent selective pressures in the northern and southern hemispheres.

Conclusions

The taxonomy of Neotropical birds remains largely reliant on dated morphological studies (Prum 1994), but molecular data promise to accelerate a more detailed understanding of this fauna (Fjeldsa et al. 2007). Although levels of genetic differentiation do not dictate taxonomic status (Zink et al. 1995a), barcode analysis highlights the taxa, and the segments of their ranges, where further research is justified. Taxonomic decisions cannot be based simply on COI sequences, but barcode surveys are a powerful tool for rapidly identifying species in need of further investigation. The occurrence of limited variation between well-known sister taxa suggests that more cryptic species may persist than a liberal thresholding approach, such as the 10X rule, might indicate (Tavares and Baker 2008). The present study shows how a broad-ranging analysis of sequence diversity in a single gene region can also deliver insights into the diversification of whole faunas rather than of small groups of species. Argentinian and North American birds showed similar incidences of deep intraspecific divergences and of barcode sharing, and the underlying causes of both situations are of great importance to our understanding of avian speciation. We expect that follow-up investigations of sequence variation at other loci, together with studies of morphology, behaviour, vocalization and distributions (e.g. Toews and Irwin 2008), will rapidly advance understanding of the diversity and diversification of the Neotropical avifauna.

Table 2.1. Bird species from Argentina with two or three deeply divergent groups at COI. Species showing more than 2.4% sequence divergence between groups are marked with an asterisk (*). Maximum distances are reported in percent divergence. Patterns represent allopatry (A), parapatry (P), or sympatry (S).

| # | Family | Species | Max. distance (%) | Individuals per lineage | Pattern |
|---|---|---|---|---|---|
| 1 | Charadriidae | Vanellus chilensis | 1.54 | 4/4 | A |
| 2 | Strigidae | Athene cunicularia | 1.60 | 1/5 | A |
| 3 | Dendrocolaptidae | Sittasomus griseicapillus* | 3.25 | 1/6 | A |
| 4 | Furnariidae | Geositta cunicularia* | 3.41 | 1/2 | S |
| 5 | | Leptasthenura aegithaloides* | 3.72 | 2/8 | A |
| 6 | | Cinclodes fuscus* | 4.65 | 3/5 | A |
| 7 | | Upucerthia dumetaria* | 5.41 | 2/10 | A |
| 8 | | Cranioleuca pyrrhophya | 1.53 | 2/4 | P |
| 9 | Thamnophilidae | Thamnophilus caerulescens* | 2.44 | 3/4 | P |
| 10 | | Thamnophilus ruficapillus* | 4.03 | 2/2 | A |
| 11 | Pipridae | Manacus manacus* | 3.56 | 3/3 | S |
| 12 | Tyrannidae | Serpophaga subcristata | 2.04 | 2/3 | A |
| 13 | | Myiophobus fasciatus* | 4.67 | 1/3 | A |
| 14 | | Knipolegus aterrimus | 1.9 | 3/4 | P |
| 15 | Troglodytidae | Cistothorus platensis* | 4.95 | 1/2 | A |
| 16 | | Troglodytes aedon* | 4.99 | 3/8/8 | A/P |
| 17 | Vireonidae | Vireo olivaceus* | 3.09 | 2/3 | P |
| 18 | Thraupidae | Thraupis bonariensis* | 3.29 | 1/6 | A |
| 19 | Cardinalidae | Cyanocompsa brissonii | 2.04 | 2/6 | P |
| 20 | | Saltator aurantiirostris | 1.52 | 1/6 | A |
| 21 | Emberizidae | Arremon flavirostris | 1.75 | 3/4 | A |

[Figure 2.1: frequency histograms; y-axis: frequency (%); x-axis: distance (%)]

Figure 2.1. Frequency histograms of COI sequence variation for birds of Argentina.

Distance to nearest congeneric neighbour for 282 species from genera represented by multiple taxa (black) and mean intraspecific diversity for the 389 species of birds with two or more sequence records (white).

[Figure 2.2: maps with superimposed neighbour-joining trees, panels (a) to (d)]

Figure 2.2. Maps detailing the different distributional patterns of divergent barcode lineages. Species ranges are highlighted in green and circles indicate collection sites. Hollow and filled circles correspond to lineages represented on superimposed neighbour-joining trees (shaded circles represent sites with overlap). (a) Barcode lineages are allopatric and coincide with disjunctions in the distribution of populations (e.g. Thamnophilus ruficapillus). (b) Barcode lineages are allopatric, but the species distribution appears continuous (e.g. Cinclodes fuscus). (c) Barcode lineages are parapatric (e.g. Vireo olivaceus). (d) Barcode lineages are sympatric (e.g. Manacus manacus).

CHAPTER III

Filling the gap - COI barcode resolution in eastern Palearctic birds

Published in Frontiers in Zoology:

Kerr, K.C.R., Birks, S.M., Kalyakin, M., Red'kin, Y., Koblik, E., and Hebert, P.D.N. (2010) Filling the gap - COI barcode resolution in eastern Palearctic birds. Frontiers in Zoology 6: 29. doi: 10.1186/1742-9994-6-29

ABSTRACT

The Palearctic region supports relatively few avian species, yet recent molecular studies have revealed that cryptic lineages likely still persist unrecognized. A broad survey of cytochrome c oxidase I (COI) sequences, or DNA barcodes, can aid this effort by providing molecular diagnostics for species assignment. Barcodes have already been surveyed extensively in the Nearctic, which offers an interesting comparison with this region because faunal interchange between the two regions has been very dynamic. We explored COI sequence divergence within and between species of Palearctic birds, including samples from Russia, Kazakhstan, and Mongolia. As yet, there is no consensus on the best method for analyzing barcode data, so we used this opportunity to compare and contrast three methods routinely employed in barcoding studies: clustering-based, distance-based, and character-based methods.

We produced COI sequences from 1,674 specimens representing 398 Palearctic species. These were merged with published COI sequences from North American congeners, creating a final dataset of 2,523 sequences for 559 species. Ninety-six percent of the species analyzed could be accurately identified using one or a combination of the methods employed. Most species could be rapidly assigned using the cluster-based or distance-based approach alone. For a few select groups of species, the character-based method offered an additional level of resolution. Of the five groups of indistinguishable species, most were pairs, save for a larger group comprising the herring gull complex. Up to 44 species exhibited deep intraspecific divergences, many of which corresponded to previously described phylogeographic patterns and endemism hotspots.

COI sequence divergence within eastern Palearctic birds is largely consistent with that observed in birds from other temperate regions. Sequence variation is primarily congruent with taxonomic boundaries; deviations from this trend reveal overlooked biological patterns and, in some cases, overlooked species. More research is needed to refine the taxonomic status of some Palearctic birds, and large genetic surveys such as this one may facilitate that effort. DNA barcodes are a practical means for rapid species assignment, although efficient analytical methods will likely require a two-tiered approach to differentiate closely related pairs of species.

INTRODUCTION

DNA barcoding employs sequences from a short, standardized gene region to identify species (Hebert et al. 2003a). The mitochondrial gene cytochrome c oxidase I (COI) has been firmly established as the core barcode region for animals (Frezal and Leblois 2008), and its performance has been evaluated in birds from several regions, including North America (Kerr et al. 2007), Brazil (Chaves et al. 2008; Vilaca et al. 2006), Argentina (Kerr et al. 2009b), and Korea (Yoo et al. 2006). Although most bird species are readily identifiable by morphological traits (Watson 2005), their well-developed taxonomy makes birds a valuable group for testing the efficacy of barcoding.

Additionally, avian taxonomy is not immune to change, and in recent decades DNA evidence has clarified many species boundaries. Broad surveys, such as DNA barcoding, can expedite this process by quickly spotlighting species that merit further taxonomic investigation (Elias-Gutierrez and Valdez-Moreno 2008; Gibbs 2009; Yassin 2008). This capacity is illustrated by several recently described species that were first revealed as divergent lineages in barcode surveys (Areta and Pearman 2009; Barker et al. 2008; Toews and Irwin 2008).

Although the avian diversity of the Palearctic is relatively depauperate (Newton 2000) and its taxonomy was stable for decades, modern molecular techniques have spurred the recognition of overlooked species (Knox et al. 2002). These new species were often hidden within morphologically cryptic assemblages, which impeded their discovery (e.g. Illera et al. 2008; Li and Zhang 2004). In other cases, biological species hypotheses could not be tested because divergent populations had allopatric distributions (Friesen et al. 1996; Zink et al. 1995b; Zink et al. 2002b). Molecular analyses continue to illuminate the phylogeographic structure of birds in this region (Drovetski et al. 2004a; Drovetski et al. 2004b; Pavlova et al. 2006; Pavlova et al. 2003; Pavlova et al. 2008; Zink et al. 2006a; Zink et al. 2008; Zink et al. 2002b). A recent barcoding survey of Scandinavian birds by Johnsen et al. (2010) revealed high species resolution plus a few divergent lineages, including some between European and North American populations of trans-Atlantic species. The Atlantic Ocean serves as a relatively impermeable barrier to dispersal for non-pelagic birds (Newton 2000; but see Voelker et al. 2009), but the situation is very different in the eastern Palearctic, where intercontinental exchange across the Bering Strait is more frequent (Pavlova et al. 2008; Reeves et al. 2008; Zink et al. 1995b).

Johnsen et al. (2010) also highlighted sequence divergences within a few species that failed to correspond to known subspecies or logical geographical patterns, a pattern not observed in a comprehensive survey of Nearctic birds (Kerr et al. 2007). To determine whether this pattern is recurrent, to highlight further cases of cryptic divergence, and to explore general patterns of sequence divergence, we extend COI barcode coverage in this study to the breeding birds of the eastern Palearctic region, including Russia, Ukraine, Kazakhstan, and Mongolia.

Despite the growth of DNA barcode libraries, no consensus has yet emerged on the best method for analyzing DNA barcode data (Ferri et al. 2009). Some of the original tools proposed to delimit species using COI sequences, such as neighbour-joining profiles (Barrett and Hebert 2005) and distance thresholds (Hebert et al. 2004), have been criticized by several authors for not realistically addressing the complexity of species boundaries (Baker et al. 2009; Moritz and Cicero 2004; Wiemers and Fiedler 2007; Zhang et al. 2008). More recent tools have gained complexity, incorporating coalescent theory and more elaborate statistical methods, though at a cost in computational time and power (Abdo and Golding 2007; Matz and Nielsen 2005; Zhang et al. 2008). The situation is further complicated by the dual purposes proposed for barcoding: species identification and species discovery (DeSalle 2006). Most new-generation tools require pre-defined species designations and consequently cannot be used to identify divergent genetic lineages within known groups. Although the use of DNA barcodes to "discover" species is contentious, it is generally accepted that barcode data can be used to flag potentially distinct taxa for further hypothesis testing (Rach et al. 2008). Because the taxonomy of Holarctic birds is relatively mature (Baker et al. 2009), we take this opportunity to compare and contrast some of the more commonly used analytical methods.

METHODS

Sampling

We examined 1,674 individuals representing 398 Palearctic species, with 83% of these taxa represented by multiple individuals. Species coverage was not uniform across orders and families because of specimen availability; nearly two-thirds of resident passerines were represented, versus less than 38% of non-passerine birds. We used frozen tissue (typically pectoral muscle) from museum specimens; all but six tissues were linked to vouchered specimens. All tissue samples originated from either the ornithology collection at the Burke Museum of Natural History and Culture (87.5%) or the Zoological Museum of Moscow University (12.5%), and were collected in the field during the past 20 years. To capture geographical variation, individuals collected from widely dispersed sites were preferentially sampled for each species whenever possible (see Figure 3.1 for the distribution of collecting sites). Sequences from North American congeners were also added (see below). As a taxonomic reference, we followed Clements (2007), including corrections and updates up to 8 October 2007, with the exception of treating Corvus cornix as conspecific with C. corone (sensu Haring et al. 2007).

Laboratory methods

DNA extraction, PCR, and sequencing reactions followed the procedures described in Kerr et al. (2009b). Only sequences longer than 500 bp and containing fewer than 10 ambiguous base calls were included in analyses. The sequence from one Anas crecca specimen was omitted from analysis because morphology and molecular results suggested that it was actually an A. crecca × A. carolinensis hybrid. Collection data, sequences, and trace files are available from the project 'Birds of the eastern Palearctic' at http://www.barcodinglife.org. All sequences have also been deposited in GenBank (Accession nos GQ481247-GQ482920). A complete list of the museum catalog numbers, BOLD process identification numbers, and GenBank accession numbers for each specimen analyzed is included in Appendix 3.1.

We supplemented the data gathered in this study with sequences from North American congeners (accessible from the "Birds of North America - Phase II" project folder at www.barcodinglife.org) to examine divergences within transcontinental species and between sister species pairs. This added 849 sequences from 227 species, of which 66 species were shared with the Palearctic dataset. BOLD process identification numbers and GenBank accession numbers for these sequences are listed in Appendix 3.2. In total, 2,523 sequences from 559 species were included in the analyses.
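The dataset totals follow directly from the counts above, with the 66 species present in both sources counted only once:

$$
N_{\text{species}} = 398 + 227 - 66 = 559, \qquad N_{\text{sequences}} = 1{,}674 + 849 = 2{,}523
$$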

Data analysis

To assess the discriminatory power of COI barcodes, we compared three methods commonly deployed in DNA barcoding studies: neighbour-joining (NJ) clusters, distance-based thresholds, and character-based assignment. We avoided more computationally intensive methods in favour of programs that could be executed in real time. For the clustering method, we used MEGA version 3.1 (Kumar et al. 2004) to construct an NJ tree using the Kimura 2-parameter (K2P) distance model. More sophisticated tree-building methods exist, but because we are concerned with terminal branches rather than deeper branching patterns, this method is sufficient. Support for monophyletic clusters was determined using 500 bootstrap replicates. Species were accepted as monophyletic provided they comprised the smallest diagnosable cluster with greater than 95% bootstrap support (Felsenstein 1985). Although bootstrap support cannot be determined for species represented by a single sequence, these species were included in the analysis to check whether they created paraphyly in neighbouring taxa. Species that could be divided into two or more well-supported clusters were flagged as potentially cryptic taxa.
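The K2P model underlying the NJ tree corrects the raw proportion of differences separately for transitions and transversions. A minimal Python sketch of the distance itself is given below; it is not MEGA's implementation, and skipping sites with gaps or ambiguous bases (pairwise deletion) is an assumption of this sketch.

```python
import math

PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")
VALID = PURINES | PYRIMIDINES


def k2p_distance(a: str, b: str) -> float:
    """Kimura 2-parameter distance between two aligned DNA sequences.

    d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q), where P and Q are the
    proportions of transitions and transversions among compared sites.
    Sites with a gap or ambiguous base in either sequence are skipped.
    """
    n = transitions = transversions = 0
    for x, y in zip(a.upper(), b.upper()):
        if x not in VALID or y not in VALID:
            continue
        n += 1
        if x == y:
            continue
        pair = {x, y}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if n == 0:
        raise ValueError("no comparable sites")
    p = transitions / n
    q = transversions / n
    w1 = 1 - 2 * p - q
    w2 = 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)
```

A full distance matrix of such values is what an NJ algorithm consumes; at the shallow divergences typical of barcodes, K2P and uncorrected p-distances differ only slightly.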

For the threshold-based approach, we blindly grouped sequences into provisional species clusters using a molecular operational taxonomic unit (MOTU) assignment program originally developed for nematodes (Floyd et al. 2002). The program, 'MOTU_define.pl' v2.07 (R. Floyd and M. Blaxter, unpublished; available from http://www.nematodes.org/bioinformatics/MOTU/index.shtml), clusters sequences together based on BLAST similarity using a user-defined base-difference cut-off. Rather than use an arbitrary cut-off value, we determined the optimum threshold, or OT (Wiemers and Fiedler 2007), by pooling our new data with the published North American bird dataset (Kerr et al. 2007) and generating a cumulative error plot using all species with multiple representatives (see Figure 3.2). Based on this result, we adopted a liberal threshold of 11 base differences, which equates to approximately 1.6% divergence.
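The optimum-threshold search can be sketched as a scan over candidate cut-offs that counts both kinds of error at each one. This is an illustrative sketch, not the exact procedure of Wiemers and Fiedler (2007): pairwise base differences are assumed to be precomputed, every cut-off that reaches the minimum error is returned (so a liberal value can be taken from the upper end of a tied range), and the conversion to percent assumes a 694 bp barcode, under which 11 bases corresponds to about 1.6%. The function names are hypothetical.

```python
def cumulative_error(intra, inter, threshold):
    """Errors made by a base-difference cut-off.

    intra: base differences between pairs of specimens of the same species.
    inter: base differences between pairs of specimens of different species.
    Pairs differing by <= threshold bases are treated as conspecific.
    """
    false_splits = sum(d > threshold for d in intra)  # conspecifics kept apart
    false_lumps = sum(d <= threshold for d in inter)  # heterospecifics merged
    return false_splits + false_lumps


def optimum_thresholds(intra, inter, max_threshold=40):
    """All cut-offs in 0..max_threshold that minimise the cumulative error."""
    errors = {t: cumulative_error(intra, inter, t) for t in range(max_threshold + 1)}
    best = min(errors.values())
    return [t for t, e in errors.items() if e == best], best


def bases_to_percent(bases, barcode_length=694):
    """Convert a base-difference cut-off to percent divergence."""
    return 100 * bases / barcode_length
```

With toy inputs `intra = [0, 1, 2, 3, 12]` and `inter = [8, 15, 20, 30]`, the minimum error of 1 is reached at cut-offs 3 to 7 and 12 to 14, illustrating how a plateau of equally good thresholds leaves room for a liberal choice.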

Program parameters restricted analysis to sequences longer than 500 bp with a minimum alignment overlap of 400 bp; however, these settings did not exclude any sequences from analysis.

For the character-based identification method, we used the character assignment system CAOS, which automates the identification of conserved character states (in this case, nucleotides) from a cladogram of pre-defined species (Sarkar et al. 2008). The system comprises two programs, P-Gnome and P-Elf (Sarkar et al. 2008). P-Gnome identifies the diagnostic sequence characters that separate species and uses them to generate a rule set for species identification; P-Elf classifies new sequences to species using that rule set. We used PAUP v4.0b10 (Swofford 2002) and MESQUITE v2.6 (Maddison and Maddison 2009), respectively, to produce the input NJ trees and nexus files for P-Gnome in accordance with the CAOS manual. We executed P-Gnome on several subsets of our data. First, we included all of the Palearctic species in this study to determine whether diagnostic characters could be identified that separate a wide range of species. The input tree for P-Gnome requires that all species nodes be collapsed to single polytomies, which is an arduous task for large numbers of species. To circumvent this issue, we used only a single representative from each species, with the drawback that intraspecific variation is ignored during rule generation. To test the character-based method on a finer scale, we ran the program independently on the three largest genera sampled: Emberiza (n=23), Phylloscopus (n=13), and Turdus (n=13). For species with multiple representatives, the shortest sequence was omitted from rule generation and used later to test species assignment.
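The idea behind rule generation and assignment can be illustrated with a simplified stand-in for P-Gnome and P-Elf. This sketch is not CAOS: it ignores the guide tree and looks only for single "pure" diagnostic sites, meaning positions where every training sequence of a species shares a nucleotide that no other species carries. Assignment by the fraction of matched sites is likewise an assumption, and the function names are hypothetical.

```python
from typing import Optional


def diagnostic_sites(training: dict[str, list[str]]) -> dict[str, dict[int, str]]:
    """Pure diagnostic characters for each species (a simplified rule set).

    A site is diagnostic for a species when all of its training sequences share
    one nucleotide there and no sequence of any other species carries it.
    Sequences are assumed to be aligned; only sites present in all are scanned.
    """
    length = min(len(s) for seqs in training.values() for s in seqs)
    rules: dict[str, dict[int, str]] = {}
    for species, seqs in training.items():
        others = [s for other, ss in training.items() if other != species for s in ss]
        sites: dict[int, str] = {}
        for pos in range(length):
            states = {s[pos] for s in seqs}
            if len(states) != 1:
                continue  # variable within the species: not diagnostic
            (base,) = states
            if base in "ACGT" and all(o[pos] != base for o in others):
                sites[pos] = base
        rules[species] = sites
    return rules


def classify(query: str, rules: dict[str, dict[int, str]]) -> Optional[str]:
    """Assign a query to the species whose diagnostic sites it matches best.

    The score is the fraction of a species' diagnostic sites matched; a query
    matching none returns None. Ties go to the first species encountered.
    """
    best, best_score = None, 0.0
    for species, sites in rules.items():
        if not sites:
            continue  # no rules: this species cannot be assigned
        hits = sum(1 for pos, base in sites.items() if pos < len(query) and query[pos] == base)
        score = hits / len(sites)
        if score > best_score:
            best, best_score = species, score
    return best
```

A species whose sequences are identical to those of another yields an empty rule set here, so it can never be assigned. This mirrors, in simplified form, why nearly identical sister species defeat a character-based approach.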

For the first two tests (NJ and MOTU), we compiled all species exhibiting type I error, in which a single species produced two or more discernible clusters of sequences. Additional lines of evidence (e.g. alternative genes, morphological differences, song differences) were sought from previous studies to support or refute the likelihood of species differences in such cases; however, no formal recommendations are made here. We also performed the two-cluster test using Lintre (Takezaki et al. 1995) to determine whether sequences from these species had evolved in a clock-like manner. For type II errors, in which multiple species grouped together to form one well-supported cluster, sequences from each cluster were run through P-Gnome to ascertain whether diagnostic characters could be identified that distinguish these closely related species.
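Once specimens have been grouped, by NJ or MOTU, the two error types can be compiled mechanically by comparing cluster membership with current species labels. The sketch below uses a hypothetical helper and assumes the clusters are supplied as sets of specimen IDs.

```python
def flag_errors(clusters: list[set[str]], species_of: dict[str, str]) -> tuple[list[str], list[list[str]]]:
    """Flag type I and type II cases from a clustering of specimens.

    clusters: specimen IDs grouped by NJ or MOTU.
    species_of: specimen ID -> currently recognised species.
    Returns (species split across clusters, sorted species lists of mixed clusters).
    """
    clusters_by_species: dict[str, set[int]] = {}
    for idx, cluster in enumerate(clusters):
        for specimen in cluster:
            clusters_by_species.setdefault(species_of[specimen], set()).add(idx)
    # Type I: one species spread over two or more clusters.
    type_i = sorted(sp for sp, idxs in clusters_by_species.items() if len(idxs) > 1)
    # Type II: one cluster containing two or more species.
    type_ii = []
    for cluster in clusters:
        members = sorted({species_of[s] for s in cluster})
        if len(members) > 1:
            type_ii.append(members)
    return type_i, type_ii
```

A species can appear in both lists at once, as when one lineage is divergent while another is lumped with a close relative.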

RESULTS

Neighbour-joining clusters

Of the 559 species analyzed, 72 had only a single representative, so no bootstrap support could be calculated for them. However, all of these formed independent branches on the NJ tree that did not compromise the identification of other species. The remaining species were categorized into four patterns (Figure 3.3). Ninety percent formed well-supported (>95% bootstrap) monophyletic groups (Figure 3.3a), and an additional 4% were monophyletic but with less than 95% bootstrap support (Figure 3.3b). Ten species, 2% of the total, were paraphyletic (Larus canus, Thalasseus sandviciensis, Motacilla citreola, M. flava, Saxicola maurus, Sitta europaea, Certhia familiaris, Lanius collurio, L. excubitor, and Pica pica) (Figure 3.3c). The remaining taxa (4%) formed monophyletic clusters that contained two or more species (Figure 3.3d; Table 3.1). These were mostly limited to pairs of sister taxa, with the notable exception of one cluster containing 10 species in the herring gull complex (Larus californicus, L. fuscus, L. glaucescens, L. glaucoides, L. heuglini, L. hyperboreus, L. occidentalis, L. smithsonianus, L. thayeri, and L. vegae).

Forty-two species showed evidence of having divergent lineages (Table 3.2). Twenty-two species formed two or more well-supported (>95% bootstrap) monophyletic clusters. Another four species formed two distinct clusters, but with one cluster receiving only 90-94% bootstrap support. These cases included 7 of the 10 paraphyletic species. In an additional 16 species, a single specimen was divergent from the rest, but further sampling is necessary to evaluate these cases adequately. Table 3.2 lists all species with divergent lineages. The total number of species recognized via this method is difficult to gauge, because some species were represented by a single specimen and others contained divergent lineages.

Distance-based assignment

The MOTU analysis identified 570 clusters, or taxonomic units, versus the 559 species recognized by traditional taxonomy. The similarity of these numbers disguises discrepancies in species assignment. Poor resolution occurred in 22 groups representing 61 species (Table 3.1). As with the NJ clustering method, these lumped taxa were mostly pairs of species, save for two triplets (Somateria spp. and Turdus spp.) and thirteen large white-headed gulls (Larus canus, L. delawarensis, L. marinus, and the aforementioned members of the Herring gull complex). Divergent groups were recognized in 42 species (Table 3.2); 95% of these overlapped with those recognized via NJ. Most were divided into two clusters, though three or more clusters were detected in five species. In two of the paraphyletic species (Motacilla flava and Lanius collurio), one lineage was lumped with a closely related species while the other lineage was divergent.
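The MOTU approach groups sequences whose divergence falls below a threshold. A minimal sketch of that idea is shown below, assuming uncorrected p-distances, the 1.6% threshold used in this chapter, and single-linkage grouping. The helper names are hypothetical, and this is not the program of Floyd et al. (2002): the text notes that the program's output can depend on the input order of sequences, whereas this union-find version is order-independent by construction.

```python
from typing import Dict, List

def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance over sites where neither sequence has a gap or N."""
    compared = diffs = 0
    for x, y in zip(a, b):
        if x in "-N" or y in "-N":
            continue
        compared += 1
        diffs += x != y
    # No comparable sites: treat the pair as maximally distant so it is never linked.
    return diffs / compared if compared else 1.0


def threshold_clusters(seqs: List[str], threshold: float = 0.016) -> List[int]:
    """Single-linkage clustering: two sequences share a unit if they are
    connected by a chain of pairwise distances <= threshold."""
    parent = list(range(len(seqs)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if p_distance(seqs[i], seqs[j]) <= threshold:
                parent[find(i)] = find(j)

    # Relabel cluster roots as 0, 1, 2, ... in order of first appearance.
    labels: Dict[int, int] = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(len(seqs))]
```

Because linkage is transitive, two sequences more than 1.6% apart can still fall into one unit through an intermediate sequence. This chaining is one way closely related species can end up lumped.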

Character-based assignment

P-Gnome failed to produce a diagnostic rule set that could distinguish all 398 species sequenced in this study. Results using subsets of the data were more successful.

Complete diagnostic rule sets were generated and successfully tested for both Phylloscopus and Turdus. The rule set for Emberiza could not distinguish between sequences of E. leucocephalos and E. citrinella because their sequences were nearly identical. In addition, P-Elf failed to correctly identify single sequences from E. chrysophrys and E. elegans. The former sequence was short (594 bp) and might have lacked important diagnostic characters. However, the latter sequence was of typical length (694 bp) and was exceptional only in that it differed at 5 polymorphic sites from the sequence used to generate the rule set. Both of these species were incorrectly identified as E. aureola, though this identification would vary if the input tree were altered.

Of the 22 groups of lumped species, all but five could be resolved using diagnostic characters (see Table 3.1). For example, the species pair Coturnix coturnix and C. japonica possessed 10 diagnostic nucleotide sites, two short of recognition by the MOTU threshold but still easily distinguishable. More complex rule sets were required when more species were involved (e.g. Aythya ducks). The remaining groups featured virtually no variation between species. These include 10 members of the Herring gull complex (Larus spp.) and the species pairs Gallinago gallinago/G. delicata, Cuculus canorus/C. optatus, Carduelis flammea/C. hornemanni, and Emberiza citrinella/E. leucocephalos.

DISCUSSION

Species boundaries in Palearctic birds

Divergence levels between closely related species were highly variable, ranging from approximately 0 to 16%; however, some of these values may be inflated for under-sampled genera and families. Recent studies have dissociated rate variation in the mitochondrial genome from factors such as population size, body size, and other life-history traits (Bazin et al. 2006; Nabholz et al. 2009; Nabholz et al. 2008). While some authors contend that rate variation in birds is highly irregular (Nabholz et al. 2009), a recent thorough review demonstrated relatively minor variation and upheld the occurrence of clock-like evolution (Weir and Schluter 2008). Consequently, we attribute the limited divergence between some sister species to recent speciation events. Studies documenting recent and rapid diversifications often address subspecific variants rather than full species (Mila et al. 2007a; Mila et al. 2007b). Still, low sequence divergence does not necessarily indicate that species should be synonymised (Joseph and Omland 2009). Low sequence divergence is particularly common in superspecies complexes, including those divided between continents, but the species within them remain valid units for both ecological studies and conservation.

Four species pairs and the large white-headed gulls included in this study showed virtually no COI variation and could not be distinguished by any of the approaches employed here. Low divergence in mitochondrial markers has previously been demonstrated in each of these cases. Lumping has been considered for some, including Carduelis flammea/hornemanni (Marthinsen et al. 2008) and the recently split Gallinago gallinago/delicata (Baker et al. 2009), but more evidence is required. The cause of shared mitochondrial haplotypes between Cuculus canorus and C. optatus has not been resolved: hybrids have never been documented (Sorenson and Payne 2005), but their taxonomic distinction has been asserted based on song differences (Payne 2005). Emberiza citrinella and E. leucocephalos are exceptionally interesting in that they are the most phenotypically distinct of these pairs, and a survey of nuclear markers revealed genetic divergence between them (Irwin et al. 2009). They are known to hybridize extensively, and introgression is a likely explanation (Irwin et al. 2009). Species boundaries in the large white-headed gulls may also have been obscured by contemporary hybridization, though a shallow history and slowed rates of evolution have also been implicated (Crochet and Desmarais 2000; Liebers et al. 2004).

Roughly 7.5% of the species analyzed in this study contained divergent mitochondrial lineages, with divergences averaging 3.6%. While divergence at a single mitochondrial gene is insufficient evidence to define new species boundaries, it is cause for new hypothesis testing. Several recently split species that are morphologically similar to their nearest relative, such as the swallow Riparia diluta or the warbler Locustella amnicola, represent taxa that barcodes would flag for closer scrutiny.

Distributions of most of the divergent lineages in this study conform to one of four previously documented phylogeographic trends (summarized in Table 3.2): a unique lineage in the Caucasus region (Hewitt 2000); a unique lineage in the Sakhalin region (Zink et al. 2002a); divergent lineages divided into eastern and western populations (Zink et al. 2008); and divergent lineages on either side of the Bering Strait (Zink et al. 1995b).

Species with multiple lineages can display more than one of these patterns. A few lineages appear to be parapatric, which could indicate areas of overlap or hybrid zones (Aliabadian et al. 2005). Past climate change and its effect on historical habitat distribution are likely responsible for shaping patterns of genetic divergence in modern populations, but whether these populations were divided by the same historical events is difficult to determine without dating divergence times. While the COI sequences mostly appear to be evolving in a clock-like fashion, dating is risky given the absence of adequate calibration points and the reliance on various assumptions (Pavlova et al. 2008; Weir and Schluter 2008).

Most species exhibited surprisingly limited variation between Old World and New World populations. Of the approximately 140 species with Holarctic distributions, 43% are represented in this study. Only 11 of these 61 species (18%) possessed intraspecific divergences great enough to signal likely species-level differences by either the NJ or the MOTU method. The Bering Sea has played a variable but clear role as a barrier to gene flow for birds, particularly non-marine species. Several trans-Beringian species have already been split in recent years, due partly to molecular evidence (e.g. Brachyramphus marmoratus/B. perdix (Friesen et al. 1996), Picoides tridactylus/P. dorsalis (Zink et al. 1995b), and Pica pica/P. hudsoni (Banks et al. 2000)). Still, caution must be exercised when identifying species boundaries between allopatric populations. For example, one of the Palearctic Lanius excubitor specimens from this study appears to belong to the North American clade, suggesting that some modern exchange might occur between the continents. Though it is more common for Palearctic species to invade the Nearctic, the reverse pattern has also been observed (Zink et al. 2006b). Correct interpretation of this result requires further study with additional specimens.

This survey has identified a number of species that demand further taxonomic scrutiny (see Table 3.2). It is likely that some of the divergent lineages identified here represent distinct species. Of course, genetic distances do not always correspond to species limits (Zink et al. 2006b; Zink et al. 1995b). Alternative explanations for the divergent lineages observed include historical phylogeographic isolation, female-restricted dispersal, or male-biased gene flow (Baker et al. 2009). The common phylogeographic patterns observed in many of the divergent lineages support the idea of historical isolation. Areas of secondary contact must be studied further to evaluate gene flow between lineages (Moritz et al. 2009). In a few exceptional cases, genetic lineages appear largely sympatric, including within Alauda arvensis, Delichon dasypus, and Phoenicurus phoenicurus. Nuclear copies of mitochondrial sequences (numts) are an unlikely explanation given the absence of stop codons and heterozygous peaks.

Phoenicurus phoenicurus was also noted by Johnsen et al. (2010), who attributed the aberrant phylogeographic pattern to admixture of historically separated lineages. This situation is paradoxical when compared with the introgressed genomes suspected to explain limited divergence between some sister species. Selective sweeps are frequently invoked to explain the limited variation observed in mitochondrial markers (Irwin et al. 2009; Kerr et al. 2009b), which raises the question of how two mtDNA lineages manage to persist in one species but not in another. Ongoing research on species limits and evolutionary histories is clearly still necessary in the Palearctic.

Methods comparison

The MOTU assignment program used in this study was originally developed for meiofauna with few morphological characters (Floyd et al. 2002). Applying it to a group with better-established taxonomy allows more conclusive tests of its performance. Our results indicated a type II error rate of 10.9%, but this figure is inflated by the diversity of named white-headed gull species (Larus spp.); with these species eliminated, the error rate falls to 8.8%. We do not currently consider type I errors a fault of this method, since these cases are biologically interesting, do not necessarily impair identification, and may represent overlooked species (Baker et al. 2009; Hebert et al. 2004). The major drawback of the program in its current form is the difficulty of associating any level of statistical support with species assignments, which may differ slightly depending on the input order of sequences. Although the program does allow a random re-sampling scheme, the output is not summarized, making statistical inference on the stability of taxonomic units virtually impossible. For biologists applying this method to microscopic invertebrates, the major impediment still lies in determining an operational threshold.
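The two error rates can be reproduced from counts given in the Results, assuming the type II rate is the 61 species in lumped MOTU groups divided by the 559 species analyzed, and that eliminating the gulls removes the 13 large white-headed gull species from both counts. This reading is an inference that matches the reported values, not a formula stated in the text:

$$
\text{type II error rate} = \frac{61}{559} \approx 0.109 = 10.9\%, \qquad
\text{without gulls: } \frac{61 - 13}{559 - 13} = \frac{48}{546} \approx 0.088 = 8.8\%
$$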

The use of a distance-based threshold technique has been a major point of contention in the DNA barcoding endeavour (Hickerson et al. 2006; Meyer and Paulay 2005; Moritz and Cicero 2004). While COI variation is a product of evolution, an arbitrary cut-off value does not reflect what is known about the evolutionary processes responsible for this variation. The threshold approach depends on the existence of a gap between levels of intraspecific variation and interspecific divergence, which opponents argue does not exist. Early success in identifying a "barcoding gap" in North American birds was attributed to insufficient sampling of closely related species (Baker et al. 2009; Moritz and Cicero 2004). We found the original "10x rule" proposed by Hebert et al. (2004) to be too conservative to recognize recently diverged species and opted for a more liberal threshold of 1.6%. While this value was more effective for species identification, some sister species exhibited little or no variation, which eliminates the possibility of identifying a gap. However, invalidating distance-based methods because fixed thresholds fail might be going too far. Identifying the nearest matches to a query sequence is still useful, even if a conclusive assignment is not provided (Ratnasingham and Hebert 2007).
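Figure 3.2 reports type I and type II errors across candidate thresholds, with an optimum at 1.6%. The sketch below shows one simple way such counts and an optimum could be computed, assuming each species is summarized by its maximum intraspecific distance and its distance to the nearest heterospecific sequence. This per-species scoring is an assumption for illustration and not necessarily how the plot was produced; the function names are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple

def error_counts(species: List[Tuple[float, float]], t: float) -> Tuple[int, int]:
    """species holds (max intraspecific %, nearest interspecific %) per species.
    Type I (false positive): the species would be split because some
    intraspecific distance exceeds t. Type II (false negative): the species
    would be lumped because a heterospecific sequence lies within t."""
    type_1 = sum(1 for intra, _ in species if intra > t)
    type_2 = sum(1 for _, inter in species if inter <= t)
    return type_1, type_2


def best_threshold(species: List[Tuple[float, float]],
                   candidates: Iterable[float]) -> Optional[Tuple[float, int]]:
    """Return (threshold, total errors) for the candidate with the fewest
    combined errors; ties go to the first candidate scanned (the smallest,
    if candidates are in ascending order)."""
    best: Optional[Tuple[float, int]] = None
    for t in candidates:
        total = sum(error_counts(species, t))
        if best is None or total < best[1]:
            best = (t, total)
    return best
```

Scanning a fine grid of candidates (for example 0.1% steps) and recording both counts at each step yields two curves like those in Figure 3.2: type I errors fall as the threshold rises, while type II errors climb.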

The development of an NJ profile for identification depends on the coalescence of species and not on an arbitrary level of divergence (Wiemers and Fiedler 2007); in theory, species that failed recognition via the threshold approach may still be recognized. However, we found that the same species were typically problematic for both approaches (see Table 3.1). This is not surprising: high bootstrap support is unlikely when a slight aberration in the data would alter the results (Holmes 2003), which is the case when sequences are highly similar. Critics have argued that the bootstrap test for monophyly is simply too conservative and incorrectly rejects monophyly in too many cases (Rodrigo 1993). This is apparent from the 4% of species that appear monophyletic but with limited support. Alternative forms of statistical support based on coalescent theory suggest that increased sampling decreases the probability that observed monophyly arises by chance, which would support the reality of these patterns despite low bootstrap values (Rosenberg 2007). A modified NJ algorithm with non-parametric bootstrapping has been proposed to offer fast barcode-based identifications, but success still depends on the completeness of the reference database, and weakly divergent species remain problematic (Munch et al. 2008).
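Nonparametric bootstrap support, as discussed above, comes from resampling alignment columns with replacement and asking how often a grouping reappears. The sketch below shows the resampling step; to keep it short, it swaps in a simpler distance-based criterion (every within-species distance smaller than every distance to other sequences) in place of rebuilding an NJ tree and checking for a clade. That substitution, the function names, and the defaults are assumptions for illustration only.

```python
import random
from typing import List, Sequence

def resample_columns(alignment: Sequence[str], rng: random.Random) -> List[str]:
    """One nonparametric bootstrap pseudo-replicate: sample alignment columns with replacement."""
    length = len(alignment[0])
    cols = [rng.randrange(length) for _ in range(length)]
    return ["".join(seq[c] for c in cols) for seq in alignment]


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites (sequences assumed aligned, equal length, gap-free)."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def separation_support(group: List[str], others: List[str],
                       replicates: int = 1000, seed: int = 1) -> float:
    """Fraction of bootstrap replicates in which every within-group distance is
    smaller than every distance from the group to the other sequences."""
    rng = random.Random(seed)
    n, hits = len(group), 0
    for _ in range(replicates):
        rep = resample_columns(group + others, rng)
        g, o = rep[:n], rep[n:]
        within = max((p_distance(a, b) for i, a in enumerate(g) for b in g[i + 1:]), default=0.0)
        between = min(p_distance(a, b) for a in g for b in o)
        if within < between:
            hits += 1
    return hits / replicates
```

When a species differs from its relative at only a handful of sites, many replicates omit those sites entirely, so support drops even if the species is genuinely distinct; this is the sensitivity to "slight aberration" described above.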

The character-based method was effective, but did not scale as well as the previous two methods. We found that the CAOS system was severely constrained by limits on the number of species that could be included for rule generation. Those limits remain unclear, and more thorough benchmarking is necessary to determine them. We also found that comprehensive sampling for each taxon is vital for accurate rules that account for intraspecific polymorphisms. When operating with smaller sets of taxa, the programs were successful both in identifying diagnostic characters and in subsequently assigning new sequences to species.

However, we found P-Elf to be highly susceptible to erroneous identifications for unrepresented species, counter to previous claims (Kelly et al. 2007). When using smaller datasets, sequences introduced from novel taxa were typically given a species-level identification, even when those taxa belonged to a different order (data not shown).

Both distance-based and clustering-based methods appear to share the same computational strengths, handling even large datasets quickly. However, both methods are also impaired by the same issue: limited divergence between sister taxa. The character-based method appears to complement the other two methods. While it is precise and able to detect minor differences between closely related taxa (Wong et al. 2009), it is unable to handle large numbers of sequences. It is also susceptible to errors when the appropriate taxa have not been comprehensively sampled. For species identification, we propose that the best method might actually be a multi-tiered approach, in which an initial method narrows the identification to a select group of taxa and an alternative method differentiates among similar taxa. Similarly, Munch et al. (2008) recommend incorporating methods that model population-level variation to distinguish between closely allied species. For cases of limited divergence, sampling a longer stretch of COI or even alternative genes would increase support for identifications.
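A multi-tiered identification of the kind proposed here can be sketched as follows. The sketch assumes a reference library keyed by species name, a hypothetical distance cap (`max_dist`, set here to 3%) for the first tier, and a simple diagnostic-character count for the second tier. Function names are illustrative and not part of CAOS, BOLD, or any published tool. Returning `None` when no reference is close enough, rather than forcing a species-level call, is meant to avoid the misassignment of unrepresented taxa reported above for P-Elf.

```python
from typing import Dict, List, Set, Union

def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance, skipping sites with a gap or N in either sequence."""
    pairs = [(x, y) for x, y in zip(a, b) if x not in "-N" and y not in "-N"]
    if not pairs:
        return 1.0  # nothing comparable: treat as maximally distant
    return sum(x != y for x, y in pairs) / len(pairs)


def diagnostic_score(query: str, target: List[str], rivals: List[List[str]]) -> int:
    """Count sites where the query carries a base that is fixed in `target`
    and absent from every rival species."""
    score = 0
    for pos, q in enumerate(query):
        if q in "-N":
            continue
        t_bases = {s[pos] for s in target if pos < len(s)}
        r_bases = {s[pos] for seqs in rivals for s in seqs if pos < len(s)}
        if t_bases == {q} and q not in r_bases:
            score += 1
    return score


def identify(query: str, library: Dict[str, List[str]],
             max_dist: float = 0.03) -> Union[str, Set[str], None]:
    # Tier 1: distance screen against every reference species.
    nearest = {sp: min(p_distance(query, s) for s in seqs) for sp, seqs in library.items()}
    candidates = sorted(sp for sp, d in nearest.items() if d <= max_dist)
    if not candidates:
        return None  # no reference close enough: report "no match" rather than guess
    if len(candidates) == 1:
        return candidates[0]
    # Tier 2: diagnostic characters, computed only among the close candidates.
    scores = {sp: diagnostic_score(query, library[sp],
                                   [library[o] for o in candidates if o != sp])
              for sp in candidates}
    top = max(scores.values())
    winners = [sp for sp, s in scores.items() if s == top]
    if top > 0 and len(winners) == 1:
        return winners[0]
    return set(candidates)  # unresolved: report the candidate set
```

Restricting the character search to the handful of tier-1 candidates keeps the second tier small, which sidesteps the scaling limits noted for the character-based method.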

Conclusions

The utility of DNA barcodes in avian research is two-fold. First, preliminary investigations such as this one offer fresh insight to aid the ongoing effort to refine avian taxonomy. Second, a comprehensive library of COI sequences provides an invaluable tool for species assignment when differences in morphology are difficult to measure or otherwise assess. This includes species with cryptic morphological differences (e.g. Phylloscopus warblers, Calandrella larks, and Empidonax flycatchers), but also scenarios where identification is desired but only fragmentary remains are available (e.g. bird strikes, nest contents, diet analysis). This study reaffirms these possibilities, demonstrating that COI sequence variation is largely congruent with species boundaries. Departures from this congruence typically indicate overlooked biological processes: historically separated lineages in the case of within-species divergence, and recent or historical gene flow in the case of haplotypes shared between species. Molecular analysis is novel for some of these taxonomic groups or geographic areas, and the resulting observations highlight areas in need of further taxonomic study.

The efficacy of DNA barcodes for species assignment depends on two factors: the construction of thorough COI libraries and efficient tools to assign sequences to species. This study substantiates the need for dense taxonomic sampling. It further demonstrates that standardized gene libraries are easily amalgamated to examine geographically broad areas or taxonomically diverse groups. Current analytical methods for barcode data appear insufficient for handling recently evolved species. Though this is less of a problem for known cases of shallow divergence, where pairs of species may be further scrutinized using a multi-tiered approach, these cases may be more problematic for those who wish to use barcodes as a tool to accelerate species discovery.

Table 3.1. List of all groups of species that failed recognition via MOTU analysis. Additionally, species with aberrant NJ profiles are indicated; profile designations (a-d) refer to Figure 3.3. Bootstrap support is given for each species ("nm" denotes that the species is not monophyletic) and the average interspecific distance is given for each group of species, both as percentages. Whether groups could be distinguished via CAOS is also indicated.

No. | Family | Species | n | NJ | Bootstrap | Inter sp | CAOS
1 | Gaviidae | Gavia adamsii | 6 | b | 38 | 0.77 | Yes
 | | Gavia immer | 3 | | 67 | |
2 | Phalacrocoracidae | Phalacrocorax pelagicus | 9 | b | 61 | 0.78 | Yes
 | | Phalacrocorax urile | 1 | | n/a | |
3 | Ardeidae | Ardea cinerea | 1 | b | n/a | 1.90 | Yes
 | | Ardea herodias | 4 | | 99 | |
4 | Anatidae | Anas falcata | 1 | b | n/a | 1.46 | Yes
 | | Anas strepera | 9 | | 50 | |
5 | | Aythya affinis | 9 | b | 24 | 1.58 | Yes
 | | Aythya americana | 10 | | 61 | |
 | | Aythya collaris | 10 | | 81 | |
 | | Aythya fuligula | 3 | | 90 | |
 | | Aythya marila | 11 | | 12 | |
 | | Aythya valisineria | 6 | | 87 | |
6 | | Bucephala clangula | 1 | b | 55 | 1.58 | Yes
 | | Bucephala islandica | 10 | | 87 | |
7 | | Somateria fischeri | 7 | b | 94 | 0.96 | Yes
 | | Somateria mollissima | 10 | d | nm | |
 | | Somateria spectabilis | 3 | | nm | |
8 | Phasianidae | Coturnix coturnix | 2 | a | 99 | 1.50 | Yes
 | | Coturnix japonica | 4 | | 99 | |
9 | Accipitridae | Buteo buteo | 3 | b | 85 | 1.92 | Yes
 | | Buteo lagopus | 2 | | 92 | |
10 | Scolopacidae | Gallinago delicata | 6 | d | nm | 0.15 | No
 | | Gallinago gallinago | 4 | | nm | |
11 | | Gallinago megala | 2 | b | 93 | 0.61 | Yes
 | | Gallinago stenura | 5 | | 98 | |
12 | Glareolidae | Glareola pratincola | 2 | a | 99 | 1.61 | Yes
 | | Glareola nordmanni | 3 | | 99 | |
13 | Laridae | Larus canus | 5 | | 89 | 0.65 | Yes
 | | Larus canus "brachyrhynchus" | 4 | | 77 | |
 | | Larus delawarensis | 3 | | 50 | |
 | | Larus marinus | 3 | | 87 | |
 | | Larus spp.† | 34 | | nm | 0.24 | No
14 | Alcidae | Cepphus carbo | 3 | | 99 | 0.97 | Yes
 | | Cepphus columba | 2 | | 99 | |
15 | Cuculidae | Cuculus canorus | 5 | d | nm | 0.71 | No
 | | Cuculus optatus | 5 | | nm | |
16 | Motacillidae | Motacilla flava "taivana" | 2 | b | 99 | 1.16 | Yes
 | | Motacilla citreola "citreola" | 2 | | 87 | |
 | | Motacilla citreola "werae" | 4 | | 98 | |
17 | Turdidae | Turdus naumanni | 9 | b | 75 | 1.10 | Yes
 | | Turdus ruficollis | 8 | | 67 | |
18 | | Turdus chrysolaus | 9 | b | 97 | 1.35 | Yes
 | | Turdus obscurus | 5 | | 67 | |
 | | Turdus pallidus | 4 | | 51 | |
19 | Laniidae | Lanius isabellinus | 3 | b | 99 | 1.71 | Yes
 | | Lanius collurio‡ | 2 | | 93 | |
20 | Fringillidae | Carduelis flammea | 10 | d | nm | 0.40 | No
 | | Carduelis hornemanni | 6 | | nm | |
21 | | Carduelis pinus | 6 | a | 99 | 2.01 | Yes
 | | Carduelis spinus | 15 | | 99 | |
22 | Emberizidae | Emberiza citrinella | 5 | d | nm | 0.09 | No
 | | Emberiza leucocephalos | 5 | | nm | |

† Represents the ten members of the Herring gull complex listed in the text.

‡ Only two of four specimens of the paraphyletic Lanius collurio exhibited limited divergence from L. isabellinus.

Table 3.2. List of all species containing divergent COI lineages. An asterisk in the respective column indicates that lineages were supported via the NJ or MOTU method (a question mark indicates undetermined cases). The number of specimens and bootstrap support (%) for each cluster is indicated, as is the mean distance (%) between all clusters within each species.

Species | NJ | MOTU | n | Bootstrap | Dist | Phyl | Bio | Ref
Falco columbarius | ? | * | 1/4 | -/99 | 2.29 | P/N | A |
Gallinula chloropus | ? | * | 1/6 | -/99 | 3.45 | P/N | A | 1
Charadrius alexandrinus | * | * | 4/3 | 99/99 | 7.53 | P/N | A |
Tringa totanus | * | | 3/3 | 99/99 | 0.87 | E/W | A | 2
Numenius phaeopus | ? | * | 5/1 | 99/- | 3.57 | P/N | A | 3
Limosa limosa | ? | * | 4/1 | 99/- | 2.27 | E/W | P |
Thalasseus sandvicensis | * | * | 2/6 | 98/99 | 3.78 | P/N | A | 4
Streptopelia orientalis | ? | * | 5/2 | 99/94 | 2.14 | Sak | P |
Asio otus | ? | | 4/5 | 99/94 | 1.10 | P/N | A |
Aegolius funereus | ? | * | 1/3 | -/99 | 4.13 | P/N | A | 5
Caprimulgus europaeus | ? | * | 3/1 | 99/- | 2.97 | Cau | A |
Dendrocopos major | ? | * | 4/1 | 99/- | 2.71 | Sak | A | 6
Alauda arvensis | * | * | 1/4/5 | 99/99/99 | 6.02 | E/W, Sak | A/P |
Delichon dasypus | ? | * | 1/1/2 | [illegible] | 3.58 | | S |
Anthus rubescens | * | * | 6/2 | 99/99 | 2.46 | P/N | A | 3
Motacilla flava | | * | 2/1 | 87/- | 5.57 | E/W | A | 7
Troglodytes troglodytes | * | * | 3/8/1/[illegible] | 99/99/-/99/99 | 3.70 | E/W, Cau, P/N | A | 8
Erithacus rubecula | ? | * | 6/1 | 99/- | 4.66 | Cau | A |
Luscinia megarhynchos | ? | * | 1/2 | -/99 | 2.56 | Cau | A |
Muscicapa sibirica | ? | * | 6/1 | 99/- | 2.85 | Sak | A |
Phoenicurus auroreus | * | * | 2/3 | 99/99 | 2.36 | E/W | A |
Phoenicurus ochruros | * | * | 3/2/1 | 99/99/- | 3.66 | E/W, Cau | A |
Phoenicurus phoenicurus | * | * | 2/4 | 99/99 | 5.20 | | S | 1
Saxicola maurus | ? | * | 7/1 | 99/- | 7.91 | E/W | A | 9
Cettia diphone | * | * | 10/2 | 99/97 | 3.03 | Sak | A |
Phylloscopus borealis | * | * | 8/6 | 99/99 | 3.59 | Sak | A | 10
Phylloscopus trochiloides | * | * | 4/4 | 99/99 | 4.39 | E/W | A | 11
Sylvia curruca | * | * | 6/3 | 99/99 | 5.56 | E/W | A |
Urosphena squameiceps | ? | * | 4/1 | 99/- | 2.09 | Sak | A |
Regulus regulus | * | * | 7/3 | 99/99 | 3.69 | E/W | A | 12
Parus major | * | * | 6/7 | 99/99 | 2.59 | E/W | A | 13, 14
Periparus ater | * | * | 8/3 | 99/99 | 4.43 | Cri | A | 1
Sitta europaea | ? | * | 1/10/1/1 | -/99/-/- | 2.91 | E/W, Cau, Yak | A | 15
Certhia familiaris | ? | * | 6/3 | 93/99 | 1.93 | E/W | A |
Lanius excubitor | * | * | 2/4 | 99/99 | 3.60 | P/N | P |
Lanius collurio | ? | * | 2/2 | 93/98 | 2.29 | E/W | A |
Corvus corone | | * | 1/7 | -/83 | 2.15 | E/W | A |
Corvus frugilegus | * | * | 2/2 | 99/99 | 2.94 | E/W | A |
Garrulus glandarius | * | * | 4/3 | 99/99 | 2.63 | E/W | A |
Pica pica | ? | * | 1/9 | -/99 | 3.59 | E/W | A |
Sturnus vulgaris | ? | * | 5/1 | -/96 | 1.85 | Kaz | A |
Pinicola enucleator | * | * | 12/2 | 99/99 | 4.54 | P/N | A |
Emberiza pallasi | * | * | 4/2 | 99/99 | 3.10 | Mog | A |
Emberiza spodocephala | * | * | 8/6 | 99/99 | 3.36 | Sak | A |

Phyl: Phylogeographic patterns (P/N = Palearctic/Nearctic, E/W = east/west, Sak = Sakhalin region, Cau = Caucasus region, Cri = Crimean region, Kaz = Kazakhstan, Mog = Mongolia, Yak = Sakha (Yakutia) region)

Bio: Biogeographic patterns (A = allopatric, P = parapatric, S = sympatric)

Additional references detailing more comprehensive studies are supplied where available:

1 (Johnsen et al. 2010), 2 (Baker et al. 2009), 3 (Zink et al. 1995b), 4 (Efe et al. 2009), 5 (Koopman et al. 2005), 6 (Zink et al. 2002a), 7 (Pavlova et al. 2003), 8 (Drovetski et al. 2004b), 9 (Zink et al. 2009), 10 (Reeves et al. 2008), 11 (Irwin et al. 2001), 12 (Packert et al. 2009), 13 (Kvist et al. 2003), 14 (Pavlova et al. 2006), 15 (Zink et al. 2006a), 16 (Haring et al. 2007), 17 (Akimova et al. 2007).

Figure 3.1. Map of the eastern Palearctic region detailing the collecting sites for all specimens used in this study. Red circles indicate sampling sites. Sampling intensity is indicated by the brightness of each circle.

[Figure 3.2: plot of false negatives and false positives against threshold (% divergence); y-axis to 100%]

Figure 3.2. Cumulative error plots of type I (false positive) and type II (false negative) errors for different divergence thresholds. Plot is based on 979 Holarctic bird species. The optimum threshold occurs at 1.6% divergence.

[Figure 3.3: NJ tree excerpts. a) Carduelis flavirostris and C. cannabina; b) Turdus torquatus, T. ruficollis, and T. naumanni; c) Lanius isabellinus and L. collurio; d) Emberiza citrinella, E. leucocephalos, E. cia, E. godlewskii, and E. cioides. Scale bar: 0.02.]

Figure 3.3. Examples of divergence patterns between closely related species illustrated in the NJ tree: a) species are monophyletic with >95% bootstrap support; b) species are monophyletic, but support is weak; c) species are not monophyletic (i.e. paraphyly occurs); d) multiple species form a single monophyletic group.

CHAPTER IV

Searching for selection using avian DNA barcodes

ABSTRACT

The barcode of life project has assembled a very large number of mitochondrial cytochrome c oxidase I (COI) sequences. Although these sequences were gathered to develop a DNA-based system for species identification, biological inferences may also be derived from the wealth of data. Recurrent selective sweeps have been invoked as an evolutionary mechanism to explain limited intraspecific COI diversity, particularly in birds, but this hypothesis has not been formally tested. In this study, I collated COI sequences from previous barcoding studies on birds and tested them for evidence of selection. Using this expanded data set, I re-examined the relationship of intraspecific diversity to interspecific divergence and to sampling effort. I used the McDonald-Kreitman test to test for neutral sequence evolution between closely related pairs of species. Because amino acid sequences were generally constrained between closely related pairs, I also included broader intra-order comparisons to quantify patterns of protein variation in avian COI sequences. Lastly, using 22 published whole mitochondrial genomes, I compared the evolutionary rate of COI against the other 12 protein-coding mitochondrial genes to assess intra-genomic variability. I found no evidence of selective sweeps between closely related species. Contrary to prior studies, I did uncover a weak relationship between intraspecific variation and interspecific divergence. However, most evidence pointed to an overall trend of strong purifying selection and functional constraint. The COI protein did vary across the class Aves, but only to a very limited extent. COI was the least variable gene in the mitochondrial genome, suggesting that other genes might be more informative for probing factors constraining mitochondrial variation within species.

INTRODUCTION

The role of selection in the evolution of the mitochondrial genome is the subject of ongoing debate (Ballard and Whitlock 2004; Gerber et al. 2001; Meiklejohn et al. 2007). Variation at mitochondrial genes was long regarded as largely neutral and, on that assumption, has frequently been used to infer effective population size and historical demography. Consequently, mitochondrial DNA (mtDNA) variation has become a mainstay of molecular ecology and phylogeographic studies (Ballard and Whitlock 2004). More recently, this paradigm has shifted, as an ever-increasing role for selection has been recognized in the evolution of mitochondrial genes (Gerber et al. 2001).

The earliest studies to test the expectations of neutrally-evolving mtDNA instead found evidence of selection against mildly deleterious mutations in varied groups of animals including Drosophila (Rand and Kann 1996), mice (Nachman et al. 1994), humans (Hasegawa et al. 1998; Wise et al. 1998), and birds (Fry 1999). In such cases, the observed trend was toward an excess of amino acid polymorphisms within species as compared to amino acid substitutions between species. In contrast, more recent studies have cited evidence of positive selection in the mitochondrial genome, which has been attributed to cyto-nuclear interactions (Ballard and Whitlock 2004). Looking across 26 mammalian taxa, Schmidt et al. (2001) found that the nonsynonymous substitution rate was much greater in gene regions that coded for close-contact residues (i.e. those interacting with nuclear-encoded residues), suggesting positive selection acting at those sites.
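The McDonald-Kreitman (MK) test compares the ratio of nonsynonymous to synonymous changes fixed between species (Dn/Ds) with the same ratio among polymorphisms within species (Pn/Ps). A neutrality index NI = (Pn/Ps)/(Dn/Ds) greater than 1 corresponds to the excess of amino acid polymorphism described above, while NI below 1 points toward adaptive fixation. The sketch below is a generic illustration, not this chapter's implementation: it assumes the four counts have already been tallied, and it uses a two-sided Fisher's exact test on the 2x2 table as one common choice of significance test. Function names are hypothetical.

```python
from math import comb
from typing import Tuple

def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]:
    sums the probabilities of all tables with the same margins that are
    no more probable than the observed one."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x: int) -> float:
        # Hypergeometric probability that the top-left cell equals x.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def mcdonald_kreitman(dn: int, ds: int, pn: int, ps: int) -> Tuple[float, float]:
    """Return (neutrality index, Fisher p-value) for fixed (D) and polymorphic (P)
    nonsynonymous (n) and synonymous (s) counts. NI = (Pn/Ps) / (Dn/Ds);
    NI is undefined (NaN) when Dn, Ds, or Ps is zero."""
    ni = (pn / ps) / (dn / ds) if dn and ds and ps else float("nan")
    return ni, fisher_exact_two_sided(dn, ds, pn, ps)
```

For example, `mcdonald_kreitman(7, 17, 2, 42)` (illustrative counts) gives NI of about 0.12 with a p-value well below 0.05. Because closely related species share few fixed differences, the counts in such tables are often small, which is one reason an exact test is preferred over a chi-square approximation.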

The above example also illustrates the mitochondrial genome's susceptibility to indirect selection. The mitochondrial genome generally lacks recombination and thus behaves as a single linkage group. In some cases additional genes may also be linked, as in birds, where females are the heterogametic sex and linkage extends to the W chromosome (Berlin et al. 2007). This linkage forms the foundation for an emerging view that mitochondrial evolution is governed by recurrent "selective sweeps". A selective sweep, also known as "genetic hitchhiking", occurs when selection acting on one site results in loss of variation from linked sites (Barton 2000). Unfortunately, the selective sweep hypothesis is difficult to test in the mitochondrial genome because it forms a single linkage group; most tests depend on the comparison of multiple loci (Galtier et al. 2000).

Demonstration of selective sweeps occurring in mitochondrial genes has typically been indirect. For example, in a landmark study that examined nearly 3,000 animal species, Bazin et al. (2006) concluded that the genetic diversity of mitochondrial markers was independent of population size, contradicting prior assumptions. They contended that purifying selection could not explain the pattern and that recurrent fixation of beneficial mutations was the most parsimonious explanation. The study generated much debate (Meiklejohn et al. 2007; Mulligan et al. 2006) and was perhaps most soundly criticized for assessing neutrality between distantly related taxa (Wares et al. 2006). The supposed sweeps were actually detected at deep phylogenetic levels, not necessarily between closely related species (Berry 2006).

Large-scale analyses of the mitochondrial gene cytochrome c oxidase I (COI) for DNA barcoding studies have invoked routine selective sweeps to explain the consistently low variation observed within species, regardless of species age (e.g. Kerr et al. 2007). This pattern violates a basic tenet of neutral theory, namely that intraspecific polymorphism is correlated with interspecific divergence (Hudson et al. 1987). Simulations based on neutral models predicted that more than 4 million generations would be necessary to achieve the degree of COI differentiation observed between closely allied species (Hickerson et al. 2006), but barcode data suggested that intraspecific variation is independent of species age (Kerr et al. 2009b; Kerr et al. 2007). Baker et al. (2009) offered an alternative explanation, arguing that low intraspecific diversity is an artifact of the small number of individuals examined in most barcoding studies and that denser intraspecific sampling would erase this pattern. The problem is compounded by the varying methods that can be used to measure intraspecific diversity (e.g. genetic distance, haplotype number, etc.).
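The paragraph above notes that intraspecific diversity can be measured in different ways. The sketch below computes two of them, mean pairwise p-distance (nucleotide diversity) and haplotype number, plus a Pearson correlation that could relate per-species diversity to divergence from a sister species. The function names are hypothetical, and the use of uncorrected p-distances is an assumption rather than the chapter's stated method.

```python
from itertools import combinations
from math import sqrt
from typing import List, Sequence

def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance, skipping sites with a gap or N in either sequence;
    a pair with no comparable sites contributes 0."""
    pairs = [(x, y) for x, y in zip(a, b) if x not in "-N" and y not in "-N"]
    return sum(x != y for x, y in pairs) / len(pairs) if pairs else 0.0


def mean_pairwise_distance(seqs: List[str]) -> float:
    """Average p-distance over all pairs of conspecific sequences (nucleotide diversity)."""
    pairs = list(combinations(seqs, 2))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0


def haplotype_count(seqs: List[str]) -> int:
    """Number of distinct sequences; gaps or Ns make otherwise identical reads count separately."""
    return len(set(seqs))


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation, e.g. between intraspecific diversity and divergence from a sister species."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)
```

Under the standard neutral model, the expected mean pairwise difference does not depend on how many individuals are sampled, whereas the number of distinct haplotypes tends to rise with sample size. The two measures can therefore respond differently to the sampling-effort argument of Baker et al. (2009).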

The number of available COI sequences has increased dramatically with the success of DNA barcoding, particularly for taxa such as birds (Frezal and Leblois 2008).

In this study, I take advantage of the expanded avian COI barcode data to test more rigorously for evidence of selection. This includes a reassessment of how intraspecific variation relates to interspecific divergence and to sampling effort, tests for neutrality, and a cross-genome comparison of genetic variation.

METHODS

Data collection

Published data were accessed from three public projects in the Barcode of Life Database (BOLD; Ratnasingham and Hebert 2007): "Birds of North America - Phase II" (Kerr et al. 2007), "Birds of Argentina - Phase I" (Kerr et al. 2009b), and "Birds of the eastern Palearctic" (Kerr et al. 2009a). Public data were supplemented with new sequences acquired from 826 specimens representing 113 species of North American birds. The majority of specimens (98%) were represented by feather samples collected from banding stations, including several across Canada (Atlantic Bird Observatory and Brier Island Bird Migration Research Station, Nova Scotia; St. Andrews Banding Station, New Brunswick; Gros Morne National Park Migration Monitoring Station, Newfoundland; McGill Bird Observatory, Quebec; Haldimand Bird Observatory, Long Point Bird Observatory, Prince Edward Point Bird Observatory, and Tommy Thompson Park Bird Research Station, Ontario; Inglewood Bird Sanctuary, Alberta; Mackenzie Nature Observatory, Rocky Point Bird Observatory, and Vaseux Lake Bird Observatory, British Columbia; Albert Creek Bird Banding Station and Teslin Lake Bird Banding Station, Yukon) and a single station in North Carolina, U.S.A. (Appalachian Highlands Science Learning Center of the US National Park Service). The remaining specimens consisted of muscle tissue samples from curated collections (including the Canadian Wildlife Service, Royal Ontario Museum, Burke Museum of Natural History and Culture, Museum of Comparative Zoology, Museum of Southwestern Biology, and the Smithsonian Institution National Museum of Natural History). All DNA extraction, PCR, and sequencing methods followed those reported by Kerr et al. (2007) and were performed at the Biodiversity Institute of Ontario, University of Guelph.

Assessing genetic diversity

Only species that were represented in BOLD by 12 or more COI sequences were included in this analysis. Additionally, only sequences longer than 650 bp and with fewer than 7 (approximately 1%) ambiguous base calls were included. In total, 55 species were used in the analysis; they are listed in Appendix 4.1. By nature of the original sampling scheme, specimens were sampled broadly from across the range of each species. To quantify genetic diversity, the number of unique haplotypes (h), haplotype diversity (H), and nucleotide diversity (π) were calculated using DnaSP version 5.0 (Librado and Rozas 2009). The minimum nearest-neighbour distance was used to measure interspecific divergence and was calculated using the Kimura 2-parameter (K2P) metric in the BOLD Management and Analysis System version 2.5 (Ratnasingham and Hebert 2007).
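The filtering criteria and the diversity and distance statistics named above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the DnaSP or BOLD implementation: sequences are assumed to be pre-aligned and of equal length, any character other than A, C, G, or T is treated as ambiguous and skipped pairwise, haplotypes are distinguished by exact string identity, and the function names (`passes_filter`, `haplotype_stats`, `k2p`, `nearest_neighbour`) are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

BASES = set("ACGT")
PURINES = set("AG")

def passes_filter(seq):
    """Hypothetical filter: longer than 650 bp with fewer than 7 ambiguous calls."""
    seq = seq.upper().replace("-", "")  # ignore alignment gaps
    n_ambiguous = sum(1 for base in seq if base not in BASES)
    return len(seq) > 650 and n_ambiguous < 7

def comparable_sites(a, b):
    """Aligned positions where both sequences carry an unambiguous base."""
    return [(x, y) for x, y in zip(a.upper(), b.upper()) if x in BASES and y in BASES]

def haplotype_stats(seqs):
    """Return (h, H, pi) for the aligned sequences of one species (n >= 2)."""
    n = len(seqs)
    counts = Counter(s.upper() for s in seqs)
    h = len(counts)  # number of unique haplotypes
    # Nei's haplotype diversity with the n / (n - 1) sample-size correction
    H = n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts.values()))
    # Nucleotide diversity as the mean pairwise proportion of differing sites
    p_distances = []
    for a, b in combinations(seqs, 2):
        sites = comparable_sites(a, b)
        p_distances.append(sum(x != y for x, y in sites) / len(sites))
    pi = sum(p_distances) / len(p_distances)
    return h, H, pi

def k2p(a, b):
    """Kimura 2-parameter distance from transition (P) and transversion (Q) proportions."""
    sites = comparable_sites(a, b)
    L = len(sites)
    transitions = sum(1 for x, y in sites if x != y and (x in PURINES) == (y in PURINES))
    transversions = sum(1 for x, y in sites if (x in PURINES) != (y in PURINES))
    P, Q = transitions / L, transversions / L
    return -0.5 * math.log(1 - 2 * P - Q) - 0.25 * math.log(1 - 2 * Q)

def nearest_neighbour(focal_seqs, other_seqs):
    """Minimum K2P distance between any focal sequence and any other-species sequence."""
    return min(k2p(a, b) for a in focal_seqs for b in other_seqs)
```

Under these conventions, π is the mean pairwise p-distance within a species, and the nearest-neighbour distance is simply the smallest K2P value between any member of the focal species and any member of another species.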

Linear regression was used to test the relationship between sampling effort (i.e. the number of specimens included in the analysis) and each of h, H, and π, using R version 2.5.0 (R Development Core Team 2007). The Pearson product-moment correlation coefficient was used to test for a relationship between nearest-neighbour distance and each of h, H, and π, also using R version 2.5.0 (R Development Core Team 2007).
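The regression and correlation statistics can be reproduced by hand. The sketch below uses plain Python as a stand-in for R, with hypothetical function names and toy data; it computes Pearson's r, the r² and F statistic of a simple linear regression, and the t statistic used to test r against zero. With a single predictor the two tests are equivalent (F = t², with 1 and n - 2 degrees of freedom); p-values would additionally require an F or t distribution function, which is omitted here.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def regression_f(x, y):
    """r squared and F (df = 1, n - 2) for a simple linear regression of y on x."""
    n = len(x)
    r2 = pearson_r(x, y) ** 2
    return r2, r2 * (n - 2) / (1 - r2)

def correlation_t(x, y):
    """t statistic (df = n - 2) for testing whether Pearson's r differs from zero."""
    n = len(x)
    r = pearson_r(x, y)
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Toy data, not values from this study
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
r2, F = regression_f(x, y)
t = correlation_t(x, y)
print(round(r2, 2), round(F, 3), round(t ** 2, 3))  # prints 0.64 3.556 3.556
```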

Neutrality tests of COI variation

To test COI variation for evidence of neutrality, I selected pairs of closely related species that were well represented in BOLD. Congeneric pairs of sister taxa were identified using a neighbour-joining tree generated from BOLD (Ratnasingham and Hebert 2007). Species pairs were selected only when one member of the pair was represented by at least 7 specimens and the other by at least 2 specimens. In total, this included 34 pairs of species representing 10 orders and 29 families of birds (see Table 4.1). The standard McDonald-Kreitman test (available at http://mkt.uab.es/mkt/) was run on each species pair using the vertebrate mitochondrial code (Egea et al. 2008). This test produces a 2 × 2 contingency table of nonsynonymous intraspecific polymorphisms (Pn), synonymous intraspecific polymorphisms (Ps), nonsynonymous fixed differences (Dn), and synonymous fixed differences (Ds). The program also provides the neutrality index (NI), calculated as (Pn/Ps)/(Dn/Ds) (Rand and Kann 1996). Neutrality is supported when NI = 1, whereas NI < 1 implies an excess of amino acid divergence between species, and NI > 1 implies an excess of amino acid polymorphism within species.
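The counts in the McDonald-Kreitman table lead directly to the neutrality index and to a 1-df test of independence. The sketch below uses hypothetical Python helpers, not the code behind the MKT web server. It applies a Pearson chi-square without continuity correction, which reproduces the χ² values reported in Table 4.1 for the pairs used in the example, and it returns None wherever NI is undefined because Ps, Dn, or Ds is zero.

```python
import math

def neutrality_index(pn, ps, dn, ds):
    """NI = (Pn/Ps)/(Dn/Ds); None when a zero denominator makes it undefined."""
    if ps == 0 or dn == 0 or ds == 0:
        return None
    return (pn / ps) / (dn / ds)

def mk_chi2(pn, ps, dn, ds):
    """Pearson chi-square (1 df, no continuity correction) on the 2 x 2 MK table."""
    n = pn + ps + dn + ds
    denominator = (pn + ps) * (dn + ds) * (pn + dn) * (ps + ds)
    if denominator == 0:
        return None, None  # an empty row or column: the test cannot be computed
    chi2 = n * (pn * ds - ps * dn) ** 2 / denominator
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p

# Counts (Pn, Ps, Dn, Ds) taken from Table 4.1
pairs = {
    "Strix occidentalis / S. varia": (1, 3, 1, 51),
    "Empidonax alnorum / E. traillii": (2, 3, 2, 13),
    "Lagopus muta / L. leucura": (0, 6, 2, 38),
    "Phalaropus lobatus / P. fulicarius": (1, 2, 0, 34),
}
for name, counts in pairs.items():
    ni = neutrality_index(*counts)
    chi2, p = mk_chi2(*counts)
    print(name, "undefined" if ni is None else round(ni, 3), round(chi2, 2))
```

Because NI is a ratio of ratios, a single zero count in Ps, Dn, or Ds leaves it undefined, which is why so many entries in Table 4.1 are null even when χ² can still be computed.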

Amino acid variation

Because amino acid variation tends to be low between pairs of avian sister species, I also examined amino acid variation in COI at the ordinal level. I selected the 12 best-represented orders from the database (Apodiformes, Anseriformes, Charadriiformes, Ciconiiformes, Columbiformes, Coraciiformes, Falconiformes, Galliformes, Passeriformes, Piciformes, Psittaciformes, and Strigiformes) and then trimmed the database to include only species with at least two full-length sequences (i.e. 694 bp). Species with polymorphisms (n = 43) were removed from the analysis so that only species with fixed differences were included. A single individual was randomly selected from each of the remaining species to populate the final working dataset (n = 623). Nucleotide sequences were translated to amino acid sequences using Geneious version 3.5 (Drummond et al. 2007). The number of amino acid sequence "types" was tallied for each order (denoted h′, analogous to haplotype number). To capture the diversity of amino acid sequence types within each order (i.e. the frequency of each type), I calculated a diversity value (denoted H′) using a modified version of Nei's haplotype diversity equation (equation 8.5, Nei 1987):

$$H' = \frac{n\left(1 - \sum_i x_i^2\right)}{n - 1}$$

where $n$ equals the number of species in each order and $x_i$ represents the frequency of the $i$th amino acid sequence type within each order. The number of amino acid types unique to each order was identified using a neighbour-joining tree generated through BOLD (Ratnasingham and Hebert 2007). To assess the level of intra-order amino acid divergence, I calculated the mean PAM1 matrix score for each order using MEGA version 4.0 (Tamura et al. 2007).
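Translation and the tally of amino acid sequence types can be illustrated together. The sketch below builds NCBI translation table 2 (the vertebrate mitochondrial code, in which TGA encodes tryptophan, ATA encodes methionine, and AGA and AGG are stop codons) from its standard 64-character string, translates in-frame nucleotide sequences, and then computes h′ and H′ with Nei's equation as modified in the text. It is a stand-in for Geneious, not its implementation: it assumes sequences start in frame, codons containing ambiguous bases become X, and the function names are hypothetical.

```python
from collections import Counter

# NCBI translation table 2 (vertebrate mitochondrial); codons are enumerated
# in TCAG order for the first, second and third positions.
BASES = "TCAG"
TABLE_2 = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: TABLE_2[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

def translate(nt):
    """Translate an in-frame nucleotide sequence; unknown codons become 'X'."""
    nt = nt.upper()
    return "".join(CODON_TABLE.get(nt[i:i + 3], "X") for i in range(0, len(nt) - 2, 3))

def type_diversity(aa_seqs):
    """Return (h', H') for one order, given one amino acid sequence per species."""
    n = len(aa_seqs)
    counts = Counter(aa_seqs)
    h_prime = len(counts)  # number of distinct amino acid sequence types
    H_prime = n * (1 - sum((c / n) ** 2 for c in counts.values())) / (n - 1)
    return h_prime, H_prime

print(translate("ATGTGAATAAGA"))  # prints MWM*
```

For instance, a hypothetical order of nine species in which two share one type and the other seven each carry a unique type gives h′ = 8 and H′ = 35/36, or about 0.972.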

To approximate the position of amino acid substitutions within the COI protein, a consensus sequence was constructed from the 623 amino acid sequences described above.
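A majority-rule consensus is one simple way to build such a sequence; whether it matches the exact procedure used here is an assumption. The hypothetical helper below takes the most common residue in each column of equal-length aligned sequences, breaking ties in favour of the residue encountered first.

```python
from collections import Counter

def majority_consensus(aligned):
    """Column-wise majority-rule consensus of equal-length aligned sequences."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("all sequences must be aligned to the same length")
    # Counter.most_common orders tied counts by first occurrence
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*aligned))

print(majority_consensus(["MAVL", "MAIL", "LAVL"]))  # prints MAVL
```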

The consensus sequence was aligned to the bovine sequence using an ends-free local alignment and the BLOSUM62 substitution matrix. Using this alignment as a guide, the consensus sequence was positioned on the bovine secondary structure, which has been determined via crystallography (Tsukihara et al. 1996). Residues were identified as either loop or helix sites based on their location in the bovine structure. A chi-square test with Yates correction was run in R version 2.5.0 (R Development Core Team 2007) to test whether variable sites were distributed between the two regions in proportion to the number of residues in each.
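For a 2 × 2 table of variable versus conserved residues in helix versus loop sites, the Yates-corrected statistic has a closed form. The Python function below is a hypothetical sketch, not R's code; clamping the corrected difference at zero keeps the correction from overshooting when observed and expected counts are already within 0.5 of each other, which is similar to the cap that R's `chisq.test` applies. The counts in the example are toy values, not the thesis data.

```python
import math

def yates_chi2(a, b, c, d):
    """Yates-corrected chi-square (1 df) for the 2 x 2 table [[a, b], [c, d]].

    Rows might be variable/conserved residues and columns helix/loop sites."""
    n = a + b + c + d
    corrected = max(0.0, abs(a * d - b * c) - n / 2)
    chi2 = n * corrected ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p

chi2, p = yates_chi2(20, 5, 5, 20)  # toy counts
print(round(chi2, 2))  # prints 15.68
```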

COI versus other mitochondrial genes

Whole avian mitochondrial genomes published on GenBank as of 14 December 2009 were surveyed for pairs of congeneric taxa. Two genera, Gallus and Syrmaticus, were represented by more than two species, so the two most closely related species in each were selected for analysis. In total, 22 complete mitochondrial genomes were downloaded for the 11 available congeneric pairs (see Appendix 4.2). Each of the 13 protein-coding genes was placed in a separate FASTA file and aligned using ClustalW in MEGA version 4.0 (Tamura et al. 2007). An extra base pair was removed from the ND3 sequence of several species at position 9768 to maintain the reading frame. ND6 was analyzed as the reverse complement for all species. Pairwise dN/dS ratios were calculated for all genes from all congeneric pairs using codeml in PAML version 4.3 (Yang 2007).
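ND6 is the one protein-coding gene encoded on the opposite strand to the other twelve, which is why it must be reverse-complemented before codon-based analysis. The small sketch below shows the two sequence edits described above under stated assumptions: `remove_base` and `in_frame` are hypothetical helpers, positions are 1-based within the gene sequence string (not genome coordinates such as 9768), and IUPAC ambiguity codes are complemented rather than discarded.

```python
# IUPAC-aware complement table (ambiguity codes map to their complements)
COMPLEMENT = str.maketrans("ACGTRYKMSWBDHVN", "TGCAYRMKSWVHDBN")

def reverse_complement(seq):
    """Reverse complement of a nucleotide sequence, preserving ambiguity codes."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def remove_base(seq, position):
    """Delete the base at a 1-based position, e.g. an extra base that breaks the frame."""
    return seq[:position - 1] + seq[position:]

def in_frame(seq):
    """True when the sequence length is a whole number of codons."""
    return len(seq) % 3 == 0

print(reverse_complement("ATGCRN"))  # prints NYGCAT
```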

The dN/dS ratios were transformed prior to statistical analysis using an arcsine square-root transformation. A one-way ANOVA was used to test for differences in dN, dS, and the dN/dS ratios among genes. Tukey's honestly significant difference (HSD) test was then used to identify which genes differed significantly. Both tests were performed using R version 2.5.0 (R Development Core Team 2007).
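The transformation and the ANOVA F statistic can be sketched as follows. This hypothetical Python version computes a standard one-way ANOVA rather than reproducing R's `aov` output; it assumes every dN/dS value lies between 0 and 1 (the arcsine square-root transform is undefined above 1), and it omits the Tukey HSD step, which needs the studentized range distribution. The gene values are toy numbers.

```python
import math

def arcsine_sqrt(values):
    """Arcsine square-root transform; every value must lie in [0, 1]."""
    return [math.asin(math.sqrt(v)) for v in values]

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of samples (one per gene)."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    F = (ss_between / df_between) / (ss_within / df_within)
    return F, df_between, df_within

# Toy dN/dS values for three hypothetical genes, transformed before testing
genes = [[0.02, 0.03, 0.04], [0.05, 0.06, 0.07], [0.20, 0.25, 0.30]]
F, df1, df2 = one_way_anova([arcsine_sqrt(g) for g in genes])
print(df1, df2)  # prints 2 6
```

With 13 genes and the 10 species pairs retained for analysis, df_between = 12 and df_within = 130 - 13 = 117, which matches the degrees of freedom reported in the Results.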

RESULTS

Genetic diversity

The sampling effort ranged from 12 to 34 COI sequences per species. The nearest-neighbour distance varied dramatically, from 0 to 12.88% (K2P-corrected distance). Figure 4.1 provides scatter plots for all six comparisons. Only haplotype number was significantly correlated with sampling effort, although the relationship was weak (r² = 0.20, F1,53 = 13.16, p < 0.001). There was no significant relationship with haplotype diversity (r² = 0.09, F1,53 = 0.42, p = 0.518) or nucleotide diversity (r² = 0.01, F1,53 = 0.01, p = 0.923). However, there was a weak but significant correlation between nearest-neighbour distance and nucleotide diversity (r53 = 0.46, p < 0.001), as well as haplotype diversity (r53 = 0.31, p = 0.019), but there was no significant correlation with haplotype number (r53 = 0.22, p = 0.103).

Neutrality tests

Both polymorphic and fixed nonsynonymous differences were rare between species pairs (see Table 4.1). Consequently, 15% of NI values were zero and 79% were undefined. Only two pairwise comparisons (Empidonax alnorum/E. traillii and Strix occidentalis/S. varia) possessed at least one polymorphic and one fixed nonsynonymous difference, and only Strix had significantly different ratios of polymorphic to fixed differences (χ² = 5.74, p < 0.05; Table 4.1). Overall, Ds was significantly greater than Ps (7.5-fold on average; t39 = -10.49, p < 0.001), whereas Dn did not differ significantly from Pn (t59 = 1.09, p = 0.279). Interestingly, Ps was not correlated with Ds (r32 = 0.14, p = 0.404) but was correlated with sample size (r32 = 0.49, p < 0.01).

Amino acid diversity

Table 4.2 summarizes the results of the diversity tests. Of the 231 residues, 41 were variable among the species examined. Amino acid sequence diversity was relatively high within orders, but the degree of variation (i.e. the number of substitutions) was limited. Highly divergent taxa often shared the same sequence. For example, the same amino acid sequence was recovered from members of the Charadriiformes, Columbiformes, Coraciiformes, and Falconiformes.

The predicted secondary structure of the consensus amino acid sequence is illustrated in Figure 4.2. According to the predicted structure, 78% of residues occur in helix sites and 73% of variable positions were in that region, revealing no association between variation and position in the secondary structure (χ² = 0.46, p = 0.497). Most amino acid substitutions were of types known to occur commonly (e.g. isoleucine ↔ valine, alanine ↔ serine, and isoleucine ↔ leucine; Betts and Russell 2003). Approximately 20% of the amino acid substitutions were observed only within a single species. Confidence in these rare amino acid sequences is increased by the sampling strategy, wherein only fixed amino acid substitutions were included in the analysis. However, two unusual substitutions were observed, both occurring in the last residue and both in Anseriformes: leucine → serine in Netta peposaca, and leucine → phenylalanine in Amazonetta brasiliensis.

Genomic comparisons

Syrmaticus ellioti and S. humiae were too narrowly divergent to be informative (the dN/dS ratio for every gene except ND5 was either 0 or 1), so this species pair was removed from further analyses. The mean values for dN, dS, and the dN/dS ratio are depicted for each gene in Figure 4.3. There was a significant difference in the dN/dS ratios recorded for the thirteen protein-coding genes (F12,117 = 6.40, p < 0.001). A significant difference was also recorded for dN (F12,117 = 3.12, p < 0.001), though not for dS (F12,117 = 0.75, p = 0.699). Post-hoc Tukey HSD comparisons revealed that the ANOVA result for the dN/dS ratios was due to ATP8, which differed significantly from all other genes (p < 0.01), and COI, which differed from ND3 (p < 0.05). Similarly, the difference in dN occurred between ATP8 and each of COI, COII, COIII, and CytB (p < 0.01).

DISCUSSION

The present results confirm the fairly intuitive prediction of Baker et al. (2009) that the number of rare haplotypes encountered will increase with sampling effort. However, because intraspecific divergence remains relatively low between most haplotypes, mean intraspecific variation is nearly unaffected by sampling effort.

There was a relationship between haplotype diversity and interspecific divergence and, in contrast to Kerr et al. (2007), I also found a weak relationship between intraspecific variation and interspecific divergence. This discrepancy may be partly due to their treatment of divergent mitochondrial lineages as "provisional species", which would reduce both intraspecific variation and interspecific divergence in some of the species included in this study, such as Vireo gilvus and Troglodytes troglodytes. This difference aside, the relationship is suggestive of demographic effects (e.g. bottlenecks) rather than selection.

Attempts to test neutrality were impeded by the lack of amino acid sequence variation both within and between species, a common problem with this method (Meiklejohn et al. 2007). The McDonald-Kreitman test is susceptible to error when taxa are distantly related and multiple substitutions at single sites are likely, but amino acid variation is rarer when taxa are closely related (Ballard and Whitlock 2004). A lack of amino acid variation undermines the neutrality index, because it yields either a zero value (when nonsynonymous polymorphisms are absent) or infinity (when nonsynonymous divergence is absent). Other studies have circumvented this issue with a simple yet questionable practice of substituting zeros with arbitrary values (e.g. Bazin et al. 2006; Rand and Kann 1998).

Previous studies examining the neutrality of variation in avian mitochondrial genes have generally proposed a model of mildly deleterious mutations (e.g. Fry 1999). Zink (2005) found that members of the passerine genus Parus exhibited an excess of nonsynonymous polymorphisms in closely related species and cited purifying selection as the cause after ruling out demographic effects. However, Zink (2006a) could not reject neutrality when examining phylogroups of the polymorphic species Sitta europaea, suggesting that drift was largely responsible for genetic differences between budding species. The taxonomic scope of the present study was much broader than that of its predecessors, and the general pattern appeared to be one of functional constraint. While the only calculable NI values were greater than one, it would be misleading to describe the sequences as bearing excess amino acid polymorphisms, since amino acid variation was generally rare. In either case, the pattern is suggestive of purifying selection.

Variation in the amino acid sequence was rare between closely related species, but there was substantial variation when broader taxonomic comparisons were made. Most positions in the amino acid sequence (82%) are conserved across all birds. While variable sites were more numerous within helix sites, they were proportionately equal between helix and loop sites. This is inconsistent with previous studies that have observed differing selective pressures on surface (primarily loop) and transmembrane sites, with substitutions in transmembrane sites being more heavily constrained by interaction effects with other residues (Wang and Pollock 2007) and surface sites showing a greater tendency towards neutral evolution (Wise et al. 1998). However, these inferences assume that the predicted secondary structure is accurate, which unfortunately is difficult to verify.

Mapping these changes onto a phylogeny is challenging, as our current understanding of evolutionary relationships among the major avian orders is in flux (Hackett et al. 2008). However, looking at variability within orders, it is clear that some amino acid substitutions are recurrent (e.g. isoleucine ↔ valine at position 12), whereas others have a single origin (e.g. glycine → serine at position 117 in Strigiformes). Small changes to proteins can have an adaptive impact. For example, adaptive changes to haemoglobin in high-altitude geese have been attributed to four mutations in the bar-headed goose, Anser indicus (Liang et al. 2001), and a single mutation in the Andean goose, Chloephaga melanoptera (Hiebl et al. 1987). For the majority of the amino acid substitutions observed here, an adaptive explanation seems unlikely, especially when amino acid sequences are shared between very divergent taxa. Co-evolution within the gene has been demonstrated for proximal residues in vertebrates (Wang and Pollock 2007), but this too seems an unlikely explanation given the independent origins of the sequences. It is more likely that most of the observed amino acid substitutions are low-impact changes that escape purifying selection, particularly since linkage between genetic loci is known to reduce the effectiveness of purifying selection (Paland and Lynch 2006).

Selection in mitochondrial genes has been attributed to co-evolution with nuclear-encoded genes. In marine copepods, inter-population hybrids have shown reduced mitochondrial function and, consequently, reduced fitness (Burton et al. 2006). Reduced function has also been demonstrated in cybrid cells that combine human nuclear DNA with mtDNA from other primates (Kenyon and Moraes 1997). In birds, the fitness costs to hybrids are less clear. Empirical studies have revealed an effect on the metabolic rate of hybrids between divergent populations of an Old World passerine, Saxicola torquata spp. (Tieleman et al. 2009). Despite this measurable impact on metabolic rate, the overall fitness cost is uncertain and could depend on the magnitude of mitochondrial mismatch between taxa. Complete mitochondrial introgression has been demonstrated between certain avian sister species, such as the Palearctic buntings Emberiza citrinella and Emberiza leucocephalos (Irwin et al. 2009), which is one situation that does lend support to the selective sweep hypothesis. Introgression has also been ascribed to selective processes in non-avian species, including salmonids (Wilson and Bernatchez 1998) and Drosophila (Bachtrog et al. 2006). Conversely, introgressed mitochondrial haplotypes from gray wolves, Canis lupus (Lehman et al. 1991), and domestic dogs, Canis familiaris (Adams et al. 2003), have been recovered from populations of coyote, Canis latrans, but neither has led to fixation. The consequences of mitochondrial substitutions between recently diverged species appear erratic, but current data suggest that divergence is mostly driven by drift and only occasionally by selection.

An important consideration for this study is how well DNA barcode data reflect general trends in the mitochondrial genome. The barcode region has in fact been used previously as a predictor of variation in nucleotide composition across the mitochondrial genome (Clare et al. 2008; Min and Hickey 2007), but that is not to say that mutation rates cannot vary between mitochondrial genes. The number of replacement sites occurring in the "barcoding region" of COI was not significantly different from that of the rest of the COI gene (K. C. R. Kerr, personal observation), which confirms that DNA barcodes provide an overall representation of COI. Across the genome, COI is known for its conservative substitution rate (Lynch and Jarrell 1993). A comprehensive summary of substitution rates in vertebrate mitochondrial genomes suggested that the rate increases with distance from the origin of replication (Broughton and Reneau 2006). Observing this pattern within birds is hindered because the origin of replication for the light strand has yet to be determined in the avian mitochondrial genome (Desjardins and Morais 1990). Regardless, this pattern was not apparent in this study, given that divergence rates between genes did not differ significantly.

Conclusions

Overall, I found no clear evidence for recurrent selective sweeps. While barcode data do not match neutral predictions, the impression is that evolution in mitochondrial genes, and COI in particular, is largely governed by purifying selection. Nucleotide divergence between closely related species appears mostly attributable to drift. Because the entire mitochondrial genome is linked, it is challenging to study its evolution without examining the whole genome. As large-scale DNA sequencing becomes more accessible, the number of sequenced whole mitochondrial genomes will grow. Currently, no avian species is represented by more than one mitochondrial genome in GenBank, but intraspecific mitochondrial genomic variation has yielded insights into the evolutionary process in other organisms, such as gadine fish (Marshall et al. 2009) and humans (Mishmar et al. 2003).

Table 4.1. Thirty-four species pairs included in McDonald-Kreitman tests for neutrality of COI variation in birds. Sample size for each species is indicated in parentheses. Acronyms are for nonsynonymous intraspecific polymorphisms (Pn), synonymous intraspecific polymorphisms (Ps), nonsynonymous interspecific fixed differences (Dn), synonymous interspecific fixed differences (Ds), and neutrality index (NI). Ps, Ds, Pn, and Dn are uncorrected values.

Species 1 (n)                     Species 2 (n)         Pn  Ps  Dn  Ds  NI     χ²     p
Lagopus muta (21)                 L. leucura (5)        0   6   2   38  0      0.31   0.575
Phalaropus lobatus (11)           P. fulicarius (2)     1   2   0   34  null   11.65  0
Actitis macularius (8)            A. hypoleucos (5)     0   0   0   64  null   null   null
Brachyramphus brevirostris (7)    B. marmoratus (2)     0   7   0   31  null   null   null
Gallinago gallinago/delicata (9)  G. paraguaiae (3)     1   4   0   21  null   4.38   0.036
Larus ridibundus (8)              L. philadelphia (4)   0   1   0   15  null   null   null
Stercorarius longicaudus (7)      S. parasiticus (4)    0   5   0   37  null   null   null
Thalasseus sandvicensis (8)       T. elegans (5)        0   6   0   10  null   null   null
Phalacrocorax pelagicus (11)      P. penicillatus (5)   0   3   0   37  null   null   null
Puffinus pacificus (7)            P. few/Zen (6)        0   4   0   22  null   null   null
Strix occidentalis (7)            S. varia (4)          1   3   1   51  17     5.74   0.016
Megascops asio (9)                M. kennicottii (5)    0   4   1   37  0      0.11   0.742
Chaetura vauxi (8)                C. pelagica (2)       0   3   0   14  null   null   null
Falco sparverius (7)              F. tinnunculus (3)    0   8   1   53  0      0.15   0.697
Picoides villosus (10)            P. albolarvatus (?)   0   14  0   18  null   null   null
Zenaida macroura (8)              Z. auriculata (8)     0   8   0   15  null   null   null
Columbina passerina (8)           C. talpacoti (4)      0   3   0   35  null   null   null
Empidonax alnorum (8)             E. traillii (4)       2   3   2   13  4.333  1.67   0.196
Leptasthenura aegithaloides (8)   L. fuliginiceps (3)   0   5   1   37  0      0.13   0.713
Phacellodomus ruber (8)           P. striaticollis (3)  0   2   0   21  null   null   null
Phytotoma rutila (7)              P. rara (3)           2   3   0   59  null   24.36  0
Nucifraga caryocatactes (9)       N. columbiana (8)     0   10  0   35  null   null   null
Poecile montana (20)              P. palustris (11)     1   12  0   38  null   2.98   0.084
Cinclus mexicanus (7)             C. cinclus (5)        0   7   2   51  0      0.27   0.601
Locustella certhiola (7)          L. ochotensis (7)     0   10  0   30  null   null   null
Turdus viscivorus (8)             T. philomelos (4)     0   14  0   54  null   null   null
Seiurus noveboracensis (24)       S. aurocapillus (14)  0   21  0   47  null   null   null
Sicalis luteola (7)               S. flaveola (6)       3   7   0   55  null   17.30  0
Melospiza melodia (27)            M. lincolnii (26)     2   7   0   18  null   4.32   0.037
Emberiza aureola (11)             E. rustica (9)        1   11  0   34  null   2.90   0.088
Paroaria capitata (7)             P. coronata (5)       0   1   1   31  0      0.03   0.857
Molothrus bonariensis (10)        M. ater (9)           3   4   0   16  null   7.89   0.004
Fringilla montifringilla (12)     F. coelebs (4)        0   7   0   49  null   null   null
Passer domesticus (17)            P. montanus (11)      1   11  0   37  null   3.15   0.076

Table 4.2. Summary of COI amino acid variation for the 12 orders of birds examined. The number of amino acid sequence types (h′), the diversity of amino acid sequence types (H′), and the percentage of types unique to each order are outlined below. The mean intra-order PAM score indicates the level of amino acid divergence.

Order            n    h′   H′     Uniqueness  PAM (±s.d.)
Ciconiiformes    9    6    0.833  50%         0.00778 (±0.00355)
Anseriformes     30   9    0.662  100%        0.00507 (±0.00215)
Falconiformes    15   9    0.848  93%         0.01199 (±0.00443)
Galliformes      11   8    0.891  100%        0.01113 (±0.00488)
Charadriiformes  84   10   0.361  50%         0.00196 (±0.00076)
Columbiformes    14   8    0.901  50%         0.00689 (±0.00301)
Psittaciformes   9    7    0.944  71%         0.00817 (±0.00331)
Strigiformes     9    8    0.972  100%        0.01817 (±0.00562)
Apodiformes      15   9    0.924  89%         0.01084 (±0.00341)
Coraciiformes    7    6    0.952  71%         0.01724 (±0.00526)
Piciformes       22   4    0.403  75%         0.00194 (±0.00112)
Passeriformes    398  78   0.903  95%         0.01458 (±0.00490)
Total            623  148  0.944


Figure 4.1. Scatter plots relating sampling effort of avian COI sequences (n) to A) haplotype number (h), B) haplotype diversity (H), and C) nucleotide diversity (π), and K2P nearest-neighbour distance (NN) to D) haplotype number, E) haplotype diversity, and F) nucleotide diversity.

Figure 4.2. Predicted secondary structure for the avian consensus sequence of the "barcoding region" of COI based on the structure derived from bovine cytochrome c oxidase. Letters indicate the consensus amino acid sequence based on the one-letter code. Black circles are conserved sites. Variable sites vary from white to gray based on the percentage of sequences containing the consensus amino acid. The number of different amino acids occurring at a single site is represented by the thickness of the outline.


Figure 4.3. Box plots of dN (yellow), dS (blue), and the dN/dS ratios (green) for each of the thirteen protein-coding mitochondrial genes from the 22 species listed in Appendix 4.2. Genes are ordered to match the arrangement in the avian mitochondrial genome.

EPILOGUE

GENERAL CONCLUSIONS

In the introduction of this thesis, I presented two potential uses for DNA barcoding: species identification and species discovery. In many groups of organisms, species identification is a daunting task, often requiring expert opinion to decipher otherwise unintelligible keys (hence the appeal of DNA barcoding). Thanks to an abundance of meticulously illustrated field guides and monographic tomes, species identification in birds is far more approachable. Complete checklists of the birds of the world are tended competitively (e.g. Clements 2007; Gill and Wright 2006), providing a reasonably accurate view of total species numbers. Armed with this knowledge, I have demonstrated the efficacy and the limitations of the DNA barcode-based approach to species identification in three regional, mostly temperate bird faunas.

The comprehensive library of COI sequences amassed for North American birds revealed that most species (94%) could be accurately identified. By expanding the database to include the birds of the southern Neotropics and the eastern Palearctic, where climatic histories differ, I was able to demonstrate that the success with North American birds was not a simple artifact of glacial bottlenecks (Hughes and Hughes 2007), as similarly strong performance was revealed in the Palearctic and Neotropical regions.

Some underlying intraspecific variation does exist in COI, so it is important to clearly define a method by which species may be delimited so that they accurately correspond with taxonomic boundaries. The threshold approach, often represented by the "10× rule" (Hebert et al. 2004), oversimplifies this process. In Chapter 3, I demonstrated that, while distance-based and tree-based methods performed equally well, a multi-tiered approach is likely the best solution to this problem.
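For comparison, the threshold approach criticized here can be written in a few lines, which also shows why it is a blunt instrument: a single cutoff is applied to every species. The sketch assumes, as a simplification, that the threshold is 10 times the mean of the per-species mean intraspecific distances, and it flags species whose nearest-neighbour distance falls below that cutoff; the helper names and distances are hypothetical.

```python
def ten_x_threshold(mean_intra_by_species, multiplier=10.0):
    """Threshold = multiplier x the mean of per-species mean intraspecific distances."""
    values = list(mean_intra_by_species.values())
    return multiplier * sum(values) / len(values)

def flag_below_threshold(nn_by_species, threshold):
    """Species whose nearest-neighbour distance falls below the threshold."""
    return sorted(sp for sp, d in nn_by_species.items() if d < threshold)

# Hypothetical K2P distances (proportions), not data from this thesis
intra = {"species_a": 0.002, "species_b": 0.004, "species_c": 0.003}
nn = {"species_a": 0.050, "species_b": 0.012, "species_c": 0.031}
threshold = ten_x_threshold(intra)
print(round(threshold, 3), flag_below_threshold(nn, threshold))  # prints 0.03 ['species_b']
```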

The final COI dataset provided substantial representation of a large number of closely related species, including several pairs of sister taxa, where conflicts in resolution were most likely to occur (Moritz and Cicero 2004). Groups that were indistinguishable using DNA barcodes alone most frequently involved pairs of closely related species for which insufficient time is suspected to have passed for COI sequences to diverge. That these occurrences were not uniformly distributed across the avian class (e.g. they occurred frequently within Anseriformes) suggests that rate variation is important. Introgression has also been pinpointed as an agent for reducing variation between species (Irwin et al. 2009). However, if gene flow does not continue between species, differences should accumulate over time, as seen in Stercorarius pomarinus (Andersson 1999). In a few cases, species limits may genuinely require revision (e.g. Gallinago delicata, Baker et al. 2009). In two exceptional genera, Sporophila and Larus, a plethora of species appear as a single genetic mixing pot. Attempts to distinguish the Sporophila species using alternative genes have also failed (Campagna et al. 2009). Similar assemblages in other taxa are often united as single species (Mila et al. 2007a; Mila et al. 2007b); oddly, in the gulls, systematic effort appears to be working in the reverse direction, with new species being recognized (Banks et al. 2008). However, a slower mutation rate has been proposed as a cause of the limited genetic diversity of gulls (Crochet and Desmarais 2000). While these two genera may pose a seemingly insurmountable challenge to molecular identification methods, some of the other narrowly divergent taxa could be separated using a character-based method where thresholds failed, although this requires an a priori definition of species limits (DeSalle et al. 2005).
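A character-based method of this kind starts from predefined groups and searches for diagnostic nucleotide states. The minimal Python sketch below finds only "pure" diagnostic sites, meaning alignment columns that are fixed within each of two predefined species but differ between them. It is a simplified illustration with a hypothetical function name and toy sequences, not the specific procedure of DeSalle et al. (2005), and it does not handle ambiguous bases or diagnostic combinations of sites.

```python
def pure_diagnostic_sites(group_a, group_b):
    """Return (1-based position, state in A, state in B) for columns fixed within
    each predefined group but differing between the two groups."""
    sites = []
    columns = zip(zip(*group_a), zip(*group_b))
    for position, (col_a, col_b) in enumerate(columns, start=1):
        states_a, states_b = set(col_a), set(col_b)
        if len(states_a) == 1 and len(states_b) == 1 and states_a != states_b:
            sites.append((position, next(iter(states_a)), next(iter(states_b))))
    return sites

# Hypothetical aligned fragments for two predefined species
species_a = ["ACGTTA", "ACGTCA"]
species_b = ["ATGTTG", "ATGTTG"]
print(pure_diagnostic_sites(species_a, species_b))  # prints [(2, 'C', 'T'), (6, 'A', 'G')]
```

Because the groups must be supplied in advance, such a sketch can confirm whether existing species limits are diagnosable, but it cannot by itself propose new ones.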

The topic of species limits segues to the second use of DNA barcodes: species discovery. This has been by far the most contentious application of DNA barcoding (DeSalle 2006; DeSalle et al. 2005; Hickerson et al. 2006; Moritz and Cicero 2004; Will et al. 2005). First, it must be clarified that DNA barcoding is not intended to replace traditional taxonomy (Gregory 2005); rather, its benefit to the taxonomic enterprise is additive (Padial and de la Riva 2007). For example, divergent lineages noted within species in Chapters One and Two have since been recognized as different species in more thorough taxonomic assessments (Areta and Pearman 2009; Barker et al. 2008; Toews and Irwin 2008). While not every divergent mitochondrial lineage will lead to the naming of new species, the findings are sufficient to warrant further investigation and to test new hypotheses about species boundaries. This "first-pass" approach has been supported elsewhere, even where molecular evidence has not been applied, and its benefit to conservation efforts has been acknowledged (Peterson 2006). So, while a single mitochondrial gene may not provide adequate evidence to describe new species, the application of DNA barcodes to flag taxa in need of further review may still expedite the species discovery process, alleviating the taxonomic impediment.

Divergent mitochondrial genes have recently been proposed as a cause of speciation rather than as a consequence thereof (Gershoni et al. 2009), but others, such as Jerry Coyne, have dismissed this notion (Lane 2009). In Chapter Four, I showed that COI protein differences between closely related species were very slight, suggesting that drift is more likely responsible for interspecific divergence. However, my data were also concordant with previous findings that COI diversity is lower than that of other mitochondrial genes. While cyto-nuclear interactions have been implicated in the evolution of cytochrome c oxidase (Schmidt et al. 2001), my data suggest that residues in genes other than COI might contribute more to such compatibility issues.

The avian DNA barcode library has already served in forensic applications, including the identification of food products, feathers from anthropological artifacts, nest host species, and birds involved in collisions with aircraft ("birdstrikes"; Dove et al. 2008). While this demonstrates practical applications of the barcode library for bird identification, the method is arguably most germane to microscopic taxa and those with cryptic life stages.

The question remains: how well do the results of this study extrapolate to other groups of organisms? The time lag between speciation and hybrid inviability is known to be longer in birds than in other vertebrates (Fitzpatrick 2004). This could be associated with a deceleration of the mitochondrial DNA substitution rate, which is in fact well documented in birds (Kessler and Avise 1985). A slower substitution rate would also explain the susceptibility of avian species to mitochondrial introgression. However, alternative explanations for this reduced genetic variability exist, including Hill-Robertson effects associated with the maternally inherited W chromosome (Berlin et al. 2007), a lower output of reactive oxygen species (Hickey 2008), a decreased tolerance of amino acid substitutions (Stanley and Harrison 1999), and an accelerated rate of morphological evolution (Johns and Avise 1998).

Watson (2005) contended that birds were more likely than other animals to be described on the basis of attributes that allow field diagnosis, which could result in evolutionarily significant units being overlooked. While this thesis demonstrates that this is occasionally true, molecular features are now routinely employed in avian taxonomy (Collar and Spottiswoode 2005), and the rate of 'cryptic species' discovery in birds is considered on par with that observed in other organisms (Pfenninger and Schwenk 2007).

Given these known differences, it is reasonable to extrapolate the findings of this study to other vertebrate groups, with the caveat that low sequence divergence between sister species may be less of a problem in other groups, whereas higher intraspecific diversity may also be observed. Extending these findings to invertebrate taxa could prove more challenging, given their greater differences in life history.

Proponents of using nuclear loci for avian phylogenetics and other molecular investigations now have a profusion of primer pairs at their disposal for multilocus studies (Edwards 2008). While this is an exciting advance, particularly for population-level studies, mitochondrial genes are still touted as superior for species delimitation (Zink and Barrowclough 2008). Although many former beliefs about the evolutionary properties of mitochondrial markers have been falsified, these markers continue to serve a valued purpose as our understanding is refined (Galtier et al. 2009). Additionally, molecular data are on the brink of a new era of accessibility, thanks to advances in sequencing technology (Ellegren 2008). Consequently, large-scale genomic studies will proliferate and whole mitogenome sequences will become increasingly available, enabling a greater understanding of mitochondrial evolution (Zardoya and Suarez 2008). This will allow more accurate biological inferences to be drawn from the growing pool of mitochondrial genetic data, such as that generated by the Barcode of Life project.

Ultimately, the development of a comprehensive library of avian DNA barcodes has been reciprocally illuminating. Avian taxonomy has provided a gold standard against which to test the efficacy of DNA barcoding; in exchange, departures from expected patterns have yielded new insight into taxonomic boundaries. Birds cannot be the sole test of the DNA barcoding paradigm, but together with other global taxonomic campaigns such as FISH-BOL (Ward et al. 2009) and the All-Leps campaign, they can provide an unbiased appreciation of the power and limitations of this taxonomic tool.

REFERENCES

Abdo Z, Golding GB (2007) A step toward barcoding life: A model-based, decision-

theoretic method to assign genes to preexisting species groups. Systematic

Biology 56:44-56

Adams JR, Leonard JA, Waits LP (2003) Widespread occurrence of a domestic dog

mitochondrial DNA haplotype in southeastern US coyotes. Molecular Ecology

12:541-546

Akimova A, Haring E, Kryukov S, Kryukov A (2007) First insights into a DNA

sequence based phylogeny of the Eurasian Jay Garrulus glandarius. Russian

Journal of Ornithology 16:567-575

Aliabadian M, Roselaar CS, Nijman V, Sluys R, Vences M (2005) Identifying contact

zone hotspots of passerine birds in the Palearctic region. Biology Letters 1:21-23

Andersson M (1999) Hybridization and skua phylogeny. Proceedings of the Royal

Society of London Series B-Biological Sciences 266:1579-1585

Arbogast BS, Drovetski SV, Curry RL, Boag PT, Seutin G, Grant PR, Grant BR,

Anderson DJ (2006) The origin and diversification of Galapagos mockingbirds.

Evolution 60:370-382

Areta JI, Pearman M (2009) Natural history, morphology, evolution, and taxonomic

status of the earthcreeper Upucerthia saturatior (Furnariidae) from the Patagonian

forests of South America. Condor 111:135-149

Arnaiz-Villena A, Alvarez-Tejado M, Ruiz-del-Valle V, Garcia-de-la-Torre C, Varela P,

Recio MJ, Ferre S, Martinez-Laso J (1998) Phylogeny and rapid Northern and

Southern Hemisphere speciation of goldfinches during the Miocene and Pliocene

Epochs. Cellular and Molecular Life Sciences 54:1031-1041

Avise JC, Ankney CD, Nelson WS (1990) Mitochondrial gene trees and the

evolutionary relationships of mallard and black ducks. Evolution 44:1109-1119

Avise JC, Walker DE (1999) Species realities and numbers in sexual vertebrates:

Perspectives from an asexually transmitted genome. Proceedings of the National

Academy of Sciences of the United States of America 96:992-995

Bachtrog D, Thornton K, Clark A, Andolfatto P (2006) Extensive introgression of

mitochondrial DNA relative to nuclear genes in the Drosophila yakuba species

group. Evolution 60:292-302

Baker AJ, Tavares ES, Elbourne RF (2009) Countering criticisms of single

mitochondrial DNA gene barcoding in birds. Molecular Ecology Resources

9:257-267

Ballard JWO, Whitlock MC (2004) The incomplete natural history of mitochondria.

Molecular Ecology 13:729-744

Banks RC, Chesser RT, Cicero C, Dunn JL, Kratter AW, Lovette IJ, Rasmussen PC,

Remsen JV, Rising JD, Stotz DF, Winker K (2008) Forty-ninth supplement to

the American Ornithologists' Union Check-list of North American birds. Auk

125:756-766

Banks RC, Cicero C, Dunn JL, Kratter AW, Ouellet H, Rasmussen PC, Remsen JV,

Rising JD, Stotz DF (2000) Forty-second supplement to the American

Ornithologists' Union check-list of North American birds. Auk 117:847-858

Barber P, Boyce SL (2006) Estimating diversity of Indo-Pacific coral reef stomatopods

through DNA barcoding of stomatopod larvae. Proceedings of the Royal Society

B-Biological Sciences 273:2053-2061

Barker FK, Vandergon AJ, Lanyon SM (2008) Assessment of species limits among

yellow-breasted meadowlarks (Sturnella spp.) using mitochondrial and sex-linked

markers. Auk 125:869-879

Barrett RDH, Hebert PDN (2005) Identifying spiders through DNA barcodes. Canadian

Journal of Zoology 83:481-491

Barrowclough GF, Shields GF (1984) Karyotypic evolution and long-term effective

population sizes of birds. Auk 101:99-102

Barton NH (2000) Genetic hitchhiking. Philosophical Transactions of the Royal Society

of London Series B-Biological Sciences 355:1553-1562

Bates JM, Hackett SJ, Goerck JM (1999) High levels of mitochondrial DNA

differentiation in two lineages of antbirds (Drymophila and Hypocnemis). Auk

116:1093-1106

Bazin E, Glemin S, Galtier N (2006) Population size does not influence mitochondrial

genetic diversity in animals. Science 312:570-572

Bearhop S, Fiedler W, Furness RW, Votier SC, Waldron S, Newton J, Bowen GJ,

Berthold P, Farnsworth K (2005) Assortative mating as a mechanism for rapid

evolution of a migratory divide. Science 310:502-504

Bensasson D, Zhang DX, Hartl DL, Hewitt GM (2001) Mitochondrial pseudogenes:

Evolution's misplaced witnesses. Trends in Ecology & Evolution 16:314-321

Berlin S, Tomaras D, Charlesworth B (2007) Low mitochondrial variability in birds

may indicate Hill-Robertson effects on the W chromosome. Heredity 99:389-396

Berry OF (2006) Mitochondrial DNA and population size. Science 314:1388-1388

Betts MJ, Russell RB (2003) Amino acid properties and consequences of substitutions.

In: Barnes MR, Gray IC (eds) Bioinformatics for Geneticists. Wiley, Chichester,

p408

Bildstein KL (2004) Raptor migration in the neotropics: Patterns, processes, and

consequences. Ornitologia Neotropical 15:83-99

Blaxter ML (2004) The promise of a DNA taxonomy. Philosophical Transactions of the

Royal Society of London Series B-Biological Sciences 359:669-679

Bolen EG (1979) Blue-winged x cinnamon teal hybrid (Anas discors x Anas clypeata)

from Oklahoma, USA. Wilson Bulletin:367-370

Broughton RE, Reneau PC (2006) Spatial covariation of mutation and nonsynonymous

substitution rates in vertebrate mitochondrial genomes. Molecular Biology and

Evolution 23:1516-1524

Brown WM, George M, Wilson AC (1979) Rapid evolution of animal mitochondrial

DNA. Proceedings of the National Academy of Sciences of the United States of

America 76:1967-1971

Brumfield RT (2005) Mitochondrial variation in Bolivian populations of the variable

antshrike (Thamnophilus caerulescens). Auk 122:414-432

Brumfield RT, Capparella AP (1996) Genetic differentiation and taxonomy in the House

Wren species group. Condor 98:547-556

Bucklin A, Wiebe PH (1998) Low mitochondrial diversity and small effective

population sizes of the copepods Calanus finmarchicus and Nannocalanus minor.

Possible impact of climatic variation during recent glaciation. Journal of Heredity

89:383-392

Burton RS, Ellison CK, Harrison JS (2006) The sorry state of F-2 hybrids:

Consequences of rapid mitochondrial DNA evolution in allopatric populations.

American Naturalist 168:S14-S24

Busse HJ, Denner EBM, Lubitz W (1996) Classification and identification of bacteria:

Current approaches to an old problem. Overview of methods used in bacterial

systematics. Journal of Biotechnology 47:3-38

Cameron S, Rubinoff D, Will K (2006) Who will actually use DNA barcoding and what

will it cost? Systematic Biology 55:844-847

Campagna L, Lijtmaer DA, Kerr KCR, Barreira AS, Hebert PDN, Lougheed SC, Tubaro

PL (2009) DNA barcodes provide new evidence of a recent radiation in the

genus Sporophila (Aves: Passeriformes). Molecular Ecology Resources Online

Early: 10.1111/J.1755-0998.2009.02799.X

Catalano D, Licciulli F, Turi A, Grillo G, Saccone C, D'Elia D (2006) MitoRes: a

resource of nuclear-encoded mitochondrial genes and their products in Metazoa.

BMC Bioinformatics 7

Chaves AV, Clozato CL, Lacerda DR, Sari EHR, Santos FR (2008) Molecular

taxonomy of Brazilian tyrant-flycatchers (Passeriformes: Tyrannidae). Molecular

Ecology Resources 8:1169-1177

Chesser RT (2000) Evolution in the high Andes: The phylogenetics of Muscisaxicola

ground-tyrants. Molecular Phylogenetics and Evolution 15:369-380

Cicero C, Johnson NK (1998) Molecular phylogeny and ecological diversification in a

clade of New World songbirds (genus Vireo). Molecular Ecology 7:1359-1370

Clare EL, Kerr KCR, von Konigslow TE, Wilson JJ, Hebert PDN (2008) Diagnosing

mitochondrial DNA diversity: Applications of a sentinel gene approach. Journal

of Molecular Evolution 66:362-367

Clements JF (2007) The Clements checklist of the birds of the world. Cornell University

Press, New York

Collar NJ, Spottiswoode CN (2005) Species limits in birds: A response to Watson.

Bioscience 55:388-389

Cooke F, Rockwell RF, Lank DB (1995) The Snow Geese of La Perouse Bay: natural

selection in the wild. Oxford University Press, Oxford, U.K.

Crochet PA, Desmarais E (2000) Slow rate of evolution in the mitochondrial control

region of gulls (Aves: Laridae). Molecular Biology and Evolution 17:1797-1806

de Queiroz K (2005) Ernst Mayr and the modern concept of species. Proceedings of the

National Academy of Sciences of the United States of America 102:6600-6607

DeSalle R (2006) Species discovery versus species identification in DNA barcoding

efforts: response to Rubinoff. Conservation Biology 20:1545-1547

DeSalle R, Egan MG, Siddall M (2005) The unholy trinity: taxonomy, species

delimitation and DNA barcoding. Philosophical Transactions of the Royal Society

B-Biological Sciences 360:1905-1916

Desjardins P, Morais R (1990) Sequence and gene organization of the chicken

mitochondrial genome - a novel gene order in higher vertebrates. Journal of

Molecular Biology 212:599-634

Dove CJ, Rotzel NC, Heacker M, Weigt LA (2008) Using DNA barcodes to identify

bird species involved in birdstrikes. Journal of Wildlife Management 72:1231-

1236

Drovetski SV, Zink RM, Fadeev IV, Nesterov EV, Koblik EA, Red'kin YA, Rohwer S

(2004a) Mitochondrial phylogeny of Locustella and related genera. Journal of

Avian Biology 35:105-110

Drovetski SV, Zink RM, Rohwer S, Fadeev IV, Nesterov EV, Karagodin I, Koblik EA,

Red'kin YA (2004b) Complex biogeographic history of a Holarctic passerine.

Proceedings of the Royal Society of London Series B-Biological Sciences

271:545-551

Drummond AJ, Ashton B, Cheung M, Heled J, Kearse M, Moir R, Stones-Havas S,

Thierer T, Wilson A (2007) Geneious v3.0, Available from

http://www.geneious.com/

Ebach MC, Holdrege C (2005) DNA barcoding is no substitute for taxonomy. Nature

434:697-697

Edelaar P, Summers R, Iovchenko NP (2003) Ecology and evolution of the crossbills-

complex (Loxia). Vogelwarte 42:113-114

Edwards SV (2008) A smorgasbord of markers for avian ecology and evolution.

Molecular Ecology 17:945-946

Efe MA, Tavares ES, Baker AJ, Bonatto SL (2009) Multigene phylogeny and DNA

barcoding indicate that the Sandwich tern complex (Thalasseus sandvicensis,

Laridae, Sternini) comprises two species. Molecular Phylogenetics and Evolution

52:263-267

Egea R, Casillas S, Barbadilla A (2008) Standard and generalized McDonald-Kreitman

test: a website to detect selection by comparing different classes of DNA sites.

Nucleic Acids Research 36:W157-W162

Elias-Gutierrez M, Valdez-Moreno M (2008) A new cryptic species of Leberis Smirnov,

1989 (Crustacea, Cladocera, Chydoridae) from the Mexican semi-desert region,

highlighted by DNA barcoding. Hidrobiologica 18:63-74

Ellegren H (2008) Sequencing goes 454 and takes large-scale genomics into the wild.

Molecular Ecology 17:1629-1631

Evenhuis NL (2007) Helping solve the "other" taxonomic impediment: Completing the

eight steps to total enlightenment and taxonomic nirvana. Zootaxa:3-12

Feldman CR, Omland KE (2005) Phylogenetics of the common raven complex (Corvus:

Corvidae) and the utility of ND4, COI and intron 7 of the β-fibrinogen gene in

avian molecular systematics. Zoologica Scripta 34:145-156

Felsenstein J (1985) Confidence limits on phylogenies - an approach using the

bootstrap. Evolution 39:783-791

Ferri E, Barbuto M, Bain O, Galimberti A, Uni S, Guerrero R, Ferte H, Bandi C, Martin

C, Casiraghi M (2009) Integrated taxonomy: traditional approach and DNA

barcoding for the identification of filarioid worms and related parasites

(Nematoda). Frontiers in Zoology 6

Fitzpatrick BM (2004) Rates of evolution of hybrid inviability in birds and mammals.

Evolution 58:1865-1870

Fjeldsa J, Irestedt M, Jonsson KA, Ohlson JI, Ericson PGP (2007) Phylogeny of the

genus Upucerthia: a case of independent adaptations for terrestrial life.

Zoologica Scripta 36:133-141

Floyd R, Abebe E, Papert A, Blaxter M (2002) Molecular barcodes for soil nematode

identification. Molecular Ecology 11:839-850

Folmer O, Black M, Hoeh W, Lutz R, Vrijenhoek R (1994) DNA primers for

amplification of mitochondrial cytochrome c oxidase subunit I from diverse

metazoan invertebrates. Molecular Marine Biology and Biotechnology 3:294-299

Francisco MR, Gibbs HL, Galetti M, Lunardi VO, Galetti PM (2007) Genetic structure

in a tropical lek-breeding bird, the blue manakin (Chiroxiphia caudata) in the

Brazilian Atlantic Forest. Molecular Ecology 16:4908-4918

Frezal L, Leblois R (2008) Four years of DNA barcoding: Current advances and

prospects. Infection Genetics and Evolution 8:727-736

Friesen VL (1997) Population genetics and the spatial scale of conservation of colonial

waterbirds. Colonial Waterbirds 20:353-368

Friesen VL, Piatt JF, Baker AJ (1996) Evidence from cytochrome b sequences and

allozymes for a 'new' species of alcid: The long-billed Murrelet (Brachyramphus

perdix). Condor 98:681-690

Fry AJ (1999) Mildly deleterious mutations in avian mitochondrial DNA: evidence from

neutrality tests. Evolution 53:1617-1620

Funk DJ, Omland KE (2003) Species-level paraphyly and polyphyly: Frequency,

causes, and consequences, with insights from animal mitochondrial DNA. Annual

Review of Ecology Evolution and Systematics 34:397-423

Galtier N, Depaulis F, Barton NH (2000) Detecting bottlenecks and selective sweeps

from DNA sequence polymorphism. Genetics 155:981-987

Galtier N, Nabholz B, Glemin S, Hurst GDD (2009) Mitochondrial DNA as a marker of

molecular diversity: a reappraisal. Molecular Ecology 18:4541-4550

Garcia-Moreno J, Arctander P, Fjeldsa J (1998) Pre-Pleistocene differentiation among

chat-tyrants. Condor 100:629-640

Garcia-Moreno J, Arctander P, Fjeldsa J (1999) A case of rapid diversification in the

neotropics: Phylogenetic relationships among Cranioleuca spinetails (Aves,

Furnariidae). Molecular Phylogenetics and Evolution 12:273-281

Gerber AS, Loggins R, Kumar S, Dowling TE (2001) Does nonneutral evolution shape

observed patterns of DNA variation in animal mitochondrial genomes? Annual

Review of Genetics 35:539-566

Gershoni M, Templeton AR, Mishmar D (2009) Mitochondrial bioenergetics as a major

motive force of speciation. Bioessays 31:642-650

Gibbs J (2009) Integrative taxonomy identifies new (and old) species in the

Lasioglossum (Dialictus) tegulare (Robertson) species group (Hymenoptera,

Halictidae). Zootaxa:1-38

Gill FB (2003) Ornithology. Freeman, New York

Gill FB, Mostrom AM, Mack AL (1993) Speciation in North American chickadees:

Patterns of mtDNA genetic divergence. Evolution 47:195-212

Gill FB, Wright MT (2006) Birds of the World. Princeton University Press, New Jersey

Gillespie JH (2000) The neutral theory in an infinite population. Gene 261:11-18

Gregory TR (2005) DNA barcoding does not compete with taxonomy. Nature 434:1067

Hackett SJ (1996) Molecular phylogenetics and biogeography of tanagers in the genus

Ramphocelus (Aves). Molecular Phylogenetics and Evolution 5:368-382

Hackett SJ, Kimball RT, Reddy S, Bowie RCK, Braun EL, Braun MJ, Chojnowski JL,

Cox WA, Han KL, Harshman J, Huddleston CJ, Marks BD, Miglia KJ, Moore

WS, Sheldon FH, Steadman DW, Witt CC, Yuri T (2008) A phylogenomic study

of birds reveals their evolutionary history. Science 320:1763-1768

Hajibabaei M, DeWaard JR, Ivanova NV, Ratnasingham S, Dooh RT, Kirk SL, Mackie

PM, Hebert PDN (2005) Critical factors for assembling a high volume of DNA

barcodes. Philosophical Transactions of the Royal Society B-Biological Sciences

360:1959-1967

Hajibabaei M, Janzen DH, Burns JM, Hallwachs W, Hebert PDN (2006) DNA barcodes

distinguish species of tropical Lepidoptera. Proceedings of the National Academy

of Sciences of the United States of America 103:968-971

Hall TA (1999) BioEdit: a user-friendly biological sequence alignment editor and

analysis program for Windows 95/98/NT. Nucleic Acids Symposium Series

41:95-98

Haring E, Gamauf A, Kryukov A (2007) Phylogeographic patterns in widespread corvid

birds. Molecular Phylogenetics and Evolution 45:840-862

Hasegawa M, Cao Y, Yang ZH (1998) Preponderance of slightly deleterious

polymorphism in mitochondrial DNA: Nonsynonymous/synonymous rate ratio is

much higher within species than between species. Molecular Biology and

Evolution 15:1499-1505

Hebert PDN, Cywinska A, Ball SL, DeWaard JR (2003a) Biological identifications

through DNA barcodes. Proceedings of the Royal Society of London Series B-

Biological Sciences 270:313-321

Hebert PDN, Ratnasingham S, deWaard JR (2003b) Barcoding animal life: cytochrome

c oxidase subunit 1 divergences among closely related species. Proceedings of the

Royal Society of London Series B-Biological Sciences 270:S96-S99

Hebert PDN, Stoeckle MY, Zemlak TS, Francis CM (2004) Identification of birds

through DNA barcodes. PLoS Biology 2:1657-1663

Hedges SB (2002) The origin and evolution of model organisms. Nature Reviews

Genetics 3:838-849

Hewitt G (2000) The genetic legacy of the Quaternary ice ages. Nature 405:907-913

Hickerson MJ, Meyer CP, Moritz C (2006) DNA barcoding will often fail to discover

new animal species over broad parameter space. Systematic Biology 55:729-739

Hickey AJR (2008) An alternate explanation for low mtDNA diversity in birds: an age-

old solution? Heredity 100:443-443

Hiebl I, Braunitzer G, Schneeganss D (1987) The primary structures of the major and

minor hemoglobin components of adult Andean goose (Chloephaga melanoptera,

Anatidae) - the mutation leu-ser in position 55 of the beta-chains. Biological

Chemistry Hoppe-Seyler 368:1559-1569

Hogg ID, Hebert PDN (2004) Biological identification of springtails (Hexapoda:

Collembola) from the Canadian Arctic, using mitochondrial DNA barcodes.

Canadian Journal of Zoology 82:749-754

Holmes S (2003) Bootstrapping phylogenetic trees: theory and methods. Statistical

Science 18:241-255

Hudson RR, Kreitman M, Aguade M (1987) A test of neutral molecular evolution based

on nucleotide data. Genetics 116:153-159

Hughes AL, Hughes MAK (2007) Coding sequence polymorphism in avian

mitochondrial genomes reflects population histories. Molecular Ecology 16:1369-

1376

Illera JC, Richardson DS, Helm B, Atienza JC, Emerson BC (2008) Phylogenetic

relationships, biogeography and speciation in the avian genus Saxicola. Molecular

Phylogenetics and Evolution 48:1145-1154

Wetlands International (2002) Waterbird population estimates. Wageningen, The Netherlands

Irwin DE, Bensch S, Price TD (2001) Speciation in a ring. Nature 409:333-337

Irwin DE, Rubtsov AS, Panov EN (2009) Mitochondrial introgression and replacement

between yellowhammers (Emberiza citrinella) and pine buntings (E.

leucocephalos; Aves, Passeriformes). Biological Journal of the Linnean Society

98:422-438

Ivanova NV, Dewaard JR, Hebert PDN (2006) An inexpensive, automation-friendly

protocol for recovering high-quality DNA. Molecular Ecology Notes 6:998-1002

Johns GC, Avise JC (1998) A comparative summary of genetic distances in the

vertebrates from the mitochondrial cytochrome b gene. Molecular Biology and

Evolution 15:1481-1490

Johnsen A, Rindal E, Ericson PGP, Zuccon D, Kerr KCR, Stoeckle MY, Lifjeld JT

(2010) DNA barcoding of Scandinavian birds reveals divergent lineages in trans-

Atlantic species. Journal of Ornithology Online early

Johnson KP, Sorenson MD (1999) Phylogeny and biogeography of dabbling ducks

(genus: Anas): a comparison of molecular and morphological evidence. Auk

116:792-805

Johnson NK, Johnson CB (1985) Speciation in sapsuckers (Sphyrapicus): II. Sympatry,

hybridization, and mate preference in S. ruber daggetti and S. nuchalis. Auk

102:1-15

Johnston DW (1961) The biosystematics of American Crows. University of Washington

Press, Seattle

Joseph L, Omland KE (2009) Phylogeography: its development and impact in Australo-

Papuan ornithology with special reference to paraphyly in Australian birds. Emu

109:1-23

Karp A, Seberg O, Buiatti M (1996) Molecular techniques in the assessment of

botanical diversity. Annals of Botany 78:143-149

Kelly RP, Sarkar IN, Eernisse DJ, Desalle R (2007) DNA barcoding using chitons

(genus Mopalia). Molecular Ecology Notes 7:177-183

Kenyon L, Moraes CT (1997) Expanding the functional human mitochondrial DNA

database by the establishment of primate xenomitochondrial cybrids. Proceedings

of the National Academy of Sciences of the United States of America 94:9131-

9135

Kerr KCR, Birks SM, Kalyakin MV, Red'kin YA, Koblik EA, Hebert PDN (2009a)

Filling the gap - COI barcode resolution in eastern Palearctic birds. Frontiers in

Zoology 6

Kerr KCR, Lijtmaer DA, Barreira AS, Hebert PDN, Tubaro PL (2009b) Probing

evolutionary patterns in Neotropical birds through DNA barcodes. PLoS ONE 4:6

Kerr KCR, Stoeckle MY, Dove CJ, Weigt LA, Francis CM, Hebert PDN (2007)

Comprehensive DNA barcoding coverage of North American birds. Molecular

Ecology Notes 7:535-543

Kessler LG, Avise JC (1985) A comparative description of mitochondrial DNA

differentiation in selected avian and other vertebrate genera. Molecular Biology

and Evolution 2:109-125

Kimura M (1980) A simple method for estimating evolutionary rates of base

substitutions through comparative studies of nucleotide sequences. Journal of

Molecular Evolution 16:111-120

Knox AG, Collinson M, Helbig AJ, Parkin DT, Sangster G (2002) Taxonomic

recommendations for British birds. Ibis 144:707-710

Koopman NE, McDonald DB, Hayward GD, Eldegard K, Sonerud GA, Sermach SG

(2005) Genetic similarity among Eurasian subspecies of boreal owls Aegolius

funereus. Journal of Avian Biology 36:179-183

Kroodsma DE (1989) Two North American song populations of the marsh wren reach

distributional limits in the Central Great Plains. Condor 91:332-340

Kumar S, Tamura K, Nei M (2004) MEGA3: Integrated software for molecular

evolutionary genetics analysis and sequence alignment. Briefings in

Bioinformatics 5:150-163

Kvist L, Martens J, Higuchi H, Nazarenko AA, Valchuk OP, Orell M (2003) Evolution

and genetic structure of the great tit (Parus major) complex. Proceedings of the

Royal Society of London Series B-Biological Sciences 270:1447-1454

Lane N (2009) On the origin of bar codes. Nature 462:272-274

Lee JCI, Tsai LC, Huang MT, Jhuang JA, Yao CT, Chin SC, Wang LC, Linacre A, Hsieh

HM (2008) A novel strategy for avian species identification by cytochrome b

gene. Electrophoresis 29:2413-2418

Lehman N, Eisenhawer A, Hansen K, Mech LD, Peterson RO, Gogan PJP, Wayne RK

(1991) Introgression of coyote mitochondrial DNA into sympatric North

American gray wolf populations. Evolution 45:104-119

Lewontin RC (1974) The genetic basis of evolutionary change. Columbia University

Press, New York

Li W, Zhang Y-y (2004) Subspecific taxonomy of Ficedula parva based on sequences

of mitochondrial cytochrome b gene. Zoological Research 25:127-131

Liang YH, Liu XZ, Liu SH, Lu GY (2001) The structure of greylag goose oxy

haemoglobin: the roles of four mutations compared with bar-headed goose

haemoglobin. Acta Crystallographica Section D-Biological Crystallography

57:1850-1856

Librado P, Rozas J (2009) DnaSP v5: a software for comprehensive analysis of DNA

polymorphism data. Bioinformatics 25:1451-1452

Liebers D, de Knijff P, Helbig AJ (2004) The herring gull complex is not a ring species.

Proceedings of the Royal Society of London Series B-Biological Sciences

271:893-901

Lijtmaer DA, Sharpe NMM, Tubaro PL, Lougheed SC (2004) Molecular phylogenetics

and diversification of the genus Sporophila (Aves: Passeriformes). Molecular

Phylogenetics and Evolution 33:562-579

Linnaeus C (1758) Systema Naturae, Editio decima. Laurentii Salvii, Holmiae

Lipscomb D, Platnick N, Wheeler Q (2003) The intellectual content of taxonomy: a

comment on DNA taxonomy. Trends in Ecology & Evolution 18:65-66

Lovette IJ, Bermingham E (2001) Mitochondrial perspective on the phylogenetic

relationships of the Parula wood-warblers. Auk 118:211-215

Lynch M, Jarrell PE (1993) A method for calibrating molecular clocks and its

application to animal mitochondrial DNA. Genetics 135:1197-1208

Lynch M, Koskella B, Schaack S (2006) Mutation pressure and the evolution of

organelle genomic architecture. Science 311:1727-1730

Maddison WP, Maddison DR (2009) Mesquite: a modular system for evolutionary

analysis, http://mesquiteproject.org

Marshall HD, Coulson MW, Carr SM (2009) Near neutrality, rate heterogeneity, and

linkage govern mitochondrial genome evolution in Atlantic Cod (Gadus morhua)

and other gadine fish. Molecular Biology and Evolution 26:579-589

Martens K (2010) The International Year of Biodiversity. Hydrobiologia 637:1-2

Marthinsen G, Wennerberg L, Lifjeld JT (2008) Low support for separate species within

the redpoll complex (Carduelis flammea-hornemanni-cabaret) from analyses of

mtDNA and microsatellite markers. Molecular Phylogenetics and Evolution

47:1005-1017

Matz MV, Nielsen R (2005) A likelihood ratio test for species membership based on

DNA sequence data. Philosophical Transactions of the Royal Society B-

Biological Sciences 360:1969-1974

Mazar Barnett J, Pearman M (2001) Annotated checklist of the birds of Argentina. Lynx

Edicions, Barcelona

McCarthy E (2006) The handbook of avian hybrids of the world. Oxford University

Press, New York

McCracken KG, Johnson WP, Sheldon FH (2001) Molecular population genetics,

phylogeography, and conservation biology of the mottled duck (Anas fulvigula).

Conservation Genetics 2:87-102

McManus DP, Bowles J (1996) Molecular genetic approaches to parasite identification:

Their value in diagnostic parasitology and systematics. International Journal for

Parasitology 26:687-704

Meiklejohn CD, Montooth KL, Rand DM (2007) Positive and negative selection on the

mitochondrial genome. Trends in Genetics 23:259-263

Meyer CP, Paulay G (2005) DNA barcoding: Error rates based on comprehensive

sampling. PLoS Biology 3:2229-2238

Mila B, McCormack JE, Castaneda G, Wayne RK, Smith TB (2007a) Recent postglacial

range expansion drives the rapid diversification of a songbird lineage in the genus

Junco. Proceedings of the Royal Society B-Biological Sciences 274:2653-2660

Mila B, Smith TB, Wayne RK (2007b) Speciation and rapid phenotypic differentiation

in the yellow-rumped warbler Dendroica coronata complex. Molecular Ecology

16:159-173

Min XJ, Hickey DA (2007) DNA barcodes provide a quick preview of mitochondrial

genome composition. PLoS ONE 2:5

Mindell DP, Sorenson MD, Huddleston CJ, Miranda HC, Knight A, Sawchuk SJ, Yuri T

(1997) Phylogenetic relationships among and within select avian orders based on

mitochondrial DNA. In: Mindell DP (ed) Avian molecular evolution and

systematics. Academic Press, New York, p 214-247

Mishler BD, Shapley RL (2004) Presentation at Assembling the Tree of Life - PI

meeting, National Science Foundation, Nov. 19-21

Mishmar D, Ruiz-Pesini E, Golik P, Macaulay V, Clark AG, Hosseini S, Brandon M,

Easley K, Chen E, Brown MD, Sukernik RI, Olckers A, Wallace DC (2003)

Natural selection shaped regional mtDNA variation in humans. Proceedings of the

National Academy of Sciences of the United States of America 100:171-176

Monaghan MT, Balke M, Gregory TR, Vogler AP (2005) DNA-based species

delineation in tropical beetles using mitochondrial and nuclear markers.

Philosophical Transactions of the Royal Society B-Biological Sciences 360:1925-

1933

Moore WS, Weibel AC, Agius A (2006) Mitochondrial DNA phylogeny of the

woodpecker genus Veniliornis (Picidae, Picinae) and related genera implies

convergent evolution of plumage patterns. Biological Journal of the Linnean

Society 87:611-624

Moritz C (1994) Defining "evolutionarily significant units" for conservation. Trends in

Ecology & Evolution 9:373-375

Moritz C, Cicero C (2004) DNA barcoding: Promise and pitfalls. PLoS Biology

2:1529-1531

Moritz C, Hoskin CJ, MacKenzie JB, Phillips BL, Tonione M, Silva N, VanDerWal J,

Williams SE, Graham CH (2009) Identification and dynamics of a cryptic suture

zone in tropical rainforest. Proceedings of the Royal Society B-Biological

Sciences 276:1235-1244

Morrison ML, Hardy JW (1983) Hybridization between Hermit and Townsend's

Warblers. Murrelet 64:65-72

Mulligan CJ, Kitchen A, Miyamoto MM (2006) Comment on "Population size does not

influence mitochondrial genetic diversity in animals". Science 314:1390

Munch K, Boomsma W, Willerslev E, Nielsen R (2008) Fast phylogenetic DNA

barcoding. Philosophical Transactions of the Royal Society B-Biological Sciences

363:3997-4002

Nabholz B, Glemin S, Galtier N (2009) The erratic mitochondrial clock: variations of

mutation rate, not population size, affect mtDNA diversity across birds and

mammals. BMC Evolutionary Biology 9

Nabholz B, Mauffrey JF, Bazin E, Galtier N, Glemin S (2008) Determination of

mitochondrial genetic diversity in mammals. Genetics 178:351-361

Nachman MW, Boyer SN, Aquadro CF (1994) Nonneutral evolution at the

mitochondrial NADH dehydrogenase subunit 3-gene in mice. Proceedings of the

National Academy of Sciences of the United States of America 91:6364-6368

Narosky T, Yzurieta D (2003) Birds of Argentina and Uruguay: a field guide. Vazquez Mazzini Editores, Buenos Aires

Navajas M, Fenton B (2000) The application of molecular markers in the study of diversity in acarology: A review. Experimental and Applied Acarology 24:751-774

Nei M (1987) Molecular Evolutionary Genetics. Columbia University Press, New York

Newton I (2000) The Speciation and Biogeography of Birds. Academic Press, New York

Nielsen R, Matz M (2006) Statistical approaches for DNA barcoding. Systematic Biology 55:162-169

Nores M (1992) Bird speciation in subtropical South America in relation to forest expansion and retraction. Auk 109:346-357

Olsen KM (2004) Gulls of Europe, Asia, and North America. Princeton University Press, Princeton, New Jersey

Omland KE, Tarr CL, Boarman WI, Marzluff JM, Fleischer RC (2000) Cryptic genetic variation and paraphyly in ravens. Proceedings of the Royal Society of London Series B-Biological Sciences 267:2475-2482

Packert M, Martens J, Severinghaus LL (2009) The Taiwan Firecrest (Regulus goodfellowi) belongs to the Goldcrest assemblage (Regulus regulus s. l.): evidence from mitochondrial DNA and the territorial song of the Regulidae. Journal of Ornithology 150:205-220

Padial JM, de la Riva I (2007) Integrative taxonomists should use and produce DNA barcodes. Zootaxa:67-68

Paland S, Lynch M (2006) Transitions to asexuality result in excess amino acid substitutions. Science 311:990-992

Pavlova A, Rohwer S, Drovetski SV, Zink RM (2006) Different post-Pleistocene histories of Eurasian parids. Journal of Heredity 97:389-402

Pavlova A, Zink RM, Drovetski SV, Red'kin Y, Rohwer S (2003) Phylogeographic patterns in Motacilla flava and Motacilla citreola: Species limits and population history. Auk 120:744-758

Pavlova A, Zink RM, Drovetski SV, Rohwer S (2008) Pleistocene evolution of closely related sand martins Riparia riparia and R. diluta. Molecular Phylogenetics and Evolution 48:61-73

Payne RB (2005) Bird Families of the World: Cuckoos. Oxford University Press, New York

Peters JL, McCracken KG, Zhuravlev YN, Lu Y, Wilson RE, Johnson KP, Omland KE (2005) Phylogenetics of wigeons and allies (Anatidae: Anas): the importance of sampling multiple loci and multiple individuals. Molecular Phylogenetics and Evolution 35:209-224

Peterson AT (2006) Taxonomy is important in conservation: a preliminary reassessment of Philippine species-level bird taxonomy. Bird Conservation International 16:155-173

Prendini L (2005) Comment on "Identifying spiders through DNA barcodes". Canadian Journal of Zoology 83:498-504

Prum RO (1994) Species status of the white-fronted manakin, Lepidothrix serena (Pipridae), with comments on conservation biology. Condor 96:692-702

Quammen D (2003) Monster of God: the man-eating predator in the jungles of history and the mind. Norton, New York

Rach J, DeSalle R, Sarkar IN, Schierwater B, Hadrys H (2008) Character-based DNA barcoding allows discrimination of genera, species and populations in Odonata. Proceedings of the Royal Society B-Biological Sciences 275:237-247

Rand DM, Kann LM (1996) Excess amino acid polymorphism in mitochondrial DNA: Contrasts among genes from Drosophila, mice, and humans. Molecular Biology and Evolution 13:735-748

Rand DM, Kann LM (1998) Mutation and selection at silent and replacement sites in the evolution of animal mitochondrial DNA. Genetica 102-3:393-407

Ratnasingham S, Hebert PDN (2007) BOLD: The Barcode of Life Data System (www.barcodinglife.org). Molecular Ecology Notes 7:355-364

Reeves AB, Drovetski SV, Fadeev IV (2008) Mitochondrial DNA data imply a stepping-stone colonization of Beringia by arctic warbler Phylloscopus borealis. Journal of Avian Biology 39:567-575

Remigio EA, Hebert PDN (2003) Testing the utility of partial COI sequences for phylogenetic estimates of gastropod relationships. Molecular Phylogenetics and Evolution 29:641-647

Rice NH, Martinez-Meyer E, Peterson AT (2003) Ecological niche differentiation in the Aphelocoma jays: a phylogenetic perspective. Biological Journal of the Linnean Society 80:369-383

Rich TD, Beardmore CJ, Berlanga H, Blancher PJ, Bradstreet MSW, Butcher GS, Demarest DW, Dunn EH, Hunter WC, Inigo-Elias EE, Kennedy JA, Martell AM, Panjabi AO, Pashley DN, Rosenberg KV, Rustay CM, Wendt JS, Will TC (2004) Partners in flight: North American landbird conservation plan. Cornell Lab of Ornithology, Ithaca, New York

Rodrigo AG (1993) Calibrating the bootstrap test of monophyly. International Journal for Parasitology 23:507-514

Rodriguez Mata JR, Erize F, Rumboll M (2006) A field guide to the birds of South America. Collins, London

Rohwer S (1976) Specific distinctness and adaptive differences in southwestern meadowlarks. Occasional Papers of the Museum of Natural History University of Kansas:1-13

Rosenberg NA (2007) Statistical tests for taxonomic distinctiveness from observations of monophyly. Evolution 61:317-323

Sabrosky CW (1950) Taxonomy and ecology. Ecology 31:151-152

Sarkar IN, Planet PJ, DeSalle R (2008) CAOS software for use in character-based DNA barcoding. Molecular Ecology Resources 8:1256-1259

Sato A, O'hUigin C, Figueroa F, Grant PR, Grant BR, Tichy H, Klein J (1999) Phylogeny of Darwin's finches as revealed by mtDNA sequences. Proceedings of the National Academy of Sciences of the United States of America 96:5101-5106

Saunders GW (2005) Applying DNA barcoding to red macroalgae: a preliminary appraisal holds promise for future applications. Philosophical Transactions of the Royal Society B-Biological Sciences 360:1879-1888

Scheffer SJ, Lewis ML, Joshi RC (2006) DNA barcoding applied to invasive leafminers (Diptera: Agromyzidae) in the Philippines. Annals of the Entomological Society of America 99:204-210

Schmidt TR, Wu W, Goodman M, Grossman LI (2001) Evolution of nuclear- and mitochondrial-encoded subunit interaction in cytochrome c oxidase. Molecular Biology and Evolution 18:563-569

Sibley CG, Ahlquist JE (1983) Phylogeny and classification of birds based on the data of DNA-DNA hybridization. Current Ornithology:245-292

Sibley CG, Monroe BL (1990) Distribution and taxonomy of birds of the world. Yale University Press, New Haven, CT

Simmons RB, Weller SJ (2001) Utility and evolution of cytochrome b in . Molecular Phylogenetics and Evolution 20:196-210

Smith MA, Fisher BL, Hebert PDN (2005) DNA barcoding for effective biodiversity assessment of a hyperdiverse group: the ants of Madagascar. Philosophical Transactions of the Royal Society B-Biological Sciences 360:1825-1834

Smith MA, Poyarkov NA, Hebert PDN (2008) COI DNA barcoding amphibians: take the chance, meet the challenge. Molecular Ecology Resources 8:235-246

Snow DW (2004) Family Pipridae (Manakins). In: del Hoyo J, Elliot A, Sargatal J (eds) Handbook of the Birds of the World. Lynx Edicions, Barcelona, p 110-169

Sorenson MD, Payne RB (2005) A molecular genetic analysis of cuckoo phylogeny. In: Payne RB (ed) Bird Families of the World: Cuckoos. Oxford University Press, New York, p 68-94

Sparling Jr. DW (1980) Hybridization and taxonomic status of Greater Prairie-chickens and Sharp-tailed Grouse. Prairie Naturalist 12:92-101

Stanley SE, Harrison RG (1999) Cytochrome b evolution in birds and mammals: An evaluation of the avian constraint hypothesis. Molecular Biology and Evolution 16:1575-1585

Straneck RJ (1993) Aportes para la unificación de Serpophaga subcristata y Serpophaga munda, y la revalidación de Serpophaga griseiceps (Aves: Tyrannidae). Revista del Museo Argentino de Ciencias Naturales Zoologia 16:51-63

Swofford DL (2002) PAUP*: Phylogenetic analysis using parsimony (*and other methods). Sinauer Associates, Sunderland, Massachusetts

Takezaki N, Rzhetsky A, Nei M (1995) Phylogenetic test of the molecular clock and linearized trees. Molecular Biology and Evolution 12:823-833

Tamura K, Dudley J, Nei M, Kumar S (2007) MEGA4: Molecular evolutionary genetics analysis (MEGA) software version 4.0. Molecular Biology and Evolution 24:1596-1599

Tautz D, Arctander P, Minelli A, Thomas RH, Vogler AP (2003) A plea for DNA taxonomy. Trends in Ecology & Evolution 18:70-74

Tavares ES, Baker AJ (2008) Single mitochondrial gene barcodes reliably identify sister-species in diverse clades of birds. BMC Evolutionary Biology 8

R Development Core Team (2007) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna

Thalmann O, Hebler J, Poinar HN, Paabo S, Vigilant L (2004) Unreliable mtDNA data due to nuclear insertions: a cautionary tale from analysis of humans and other great apes. Molecular Ecology 13:321-335

Tieleman BI, Versteegh MA, Fries A, Helm B, Dingemanse NJ, Gibbs HL, Williams JB (2009) Genetic modulation of energy metabolism in birds through mitochondrial function. Proceedings of the Royal Society B-Biological Sciences 276:1685-1693

Toews DPL, Irwin DE (2008) Cryptic speciation in a Holarctic passerine revealed by genetic and bioacoustic analyses. Molecular Ecology 17:2691-2705

Traylor MA (1988) Geographic variation and evolution in South American Cistothorus platensis (Aves: Troglodytidae). Fieldiana Zoology:1-35

Tsukihara T, Aoyama H, Yamashita E, Tomizaki T, Yamaguchi H, Shinzawa-Itoh K, Nakashima R, Yaono R, Yoshikawa S (1996) The whole structure of the 13-subunit oxidized cytochrome c oxidase at 2.8 Å. Science 272:1136-1144

Vences M, Thomas M, van der Meijden A, Chiari Y, Vieites D (2005) Comparative performance of the 16S rRNA gene in DNA barcoding of amphibians. Frontiers in Zoology 2

Vilaca ST, Lacerda DR, Sari EHR, Santos FR (2006) DNA-based identification applied to Thamnophilidae (Passeriformes) species: the first barcodes of Neotropical birds. Revista Brasileira de Ornitologia 14:7-13

Voelker G, Rohwer S, Outlaw DC, Bowie RCK (2009) Repeated trans-Atlantic dispersal catalysed a global songbird radiation. Global Ecology and Biogeography 18:41-49

Wang ZO, Pollock DD (2007) Coevolutionary patterns in cytochrome c oxidase subunit I depend on structural and functional context. Journal of Molecular Evolution 65:485-495

Ward RD, Hanner R, Hebert PDN (2009) The campaign to DNA barcode all fishes, FISH-BOL. Journal of Fish Biology 74:329-356

Ward RD, Zemlak TS, Innes BH, Last PR, Hebert PDN (2005) DNA barcoding Australia's fish species. Philosophical Transactions of the Royal Society B-Biological Sciences 360:1847-1857

Wares JP, Barber PH, Ross-Ibarra J, Sotka EE, Toonen RJ (2006) Mitochondrial DNA and population size. Science 314:1388-1389

Watson DM (2005) Diagnosable versus distinct: Evaluating species limits in birds. Bioscience 55:60-68

Watson JD, Crick FHC (1953) Molecular structure of nucleic acids: a structure for deoxyribose nucleic acid. Nature 171:737-738

Weir JT, Schluter D (2004) Ice sheets promote speciation in boreal birds. Proceedings of the Royal Society of London Series B-Biological Sciences 271:1881-1887

Weir JT, Schluter D (2007) The latitudinal gradient in recent speciation and extinction rates of birds and mammals. Science 315:1574-1576

Weir JT, Schluter D (2008) Calibrating the avian molecular clock. Molecular Ecology 17:2321-2328

Wheeler QD (2000) Species concepts and phylogenetic theory. Columbia University Press, New York

Wiemers M, Fiedler K (2007) Does the DNA barcoding gap exist? - a case study in blue butterflies (Lepidoptera: Lycaenidae). Frontiers in Zoology 4

Will KW, Mishler BD, Wheeler QD (2005) The perils of DNA barcoding and the need for integrative taxonomy. Systematic Biology 54:844-851

Will KW, Rubinoff D (2004) Myth of the molecule: DNA barcodes for species cannot replace morphology for identification and classification. Cladistics-the International Journal of the Willi Hennig Society 20:47-55

Wilson CC, Bernatchez L (1998) The ghost of hybrids past: fixation of arctic charr (Salvelinus alpinus) mitochondrial DNA in an introgressed population of lake trout (S. namaycush). Molecular Ecology 7:127-132

Wilson EO (2003) The encyclopedia of life. Trends in Ecology & Evolution 18:77-80

Wilson EO, Peter FM (1988) Biodiversity. National Academy Press, Washington, D.C.

Wise CA, Sraml M, Easteal S (1998) Departure from neutrality at the mitochondrial NADH dehydrogenase subunit 2 gene in humans, but not in chimpanzees. Genetics 148:409-421

Wong EHK, Shivji MS, Hanner RH (2009) Identifying sharks with DNA barcodes: assessing the utility of a nucleotide diagnostic approach. Molecular Ecology Resources 9:243-256

Yang ZH (2007) PAML 4: Phylogenetic analysis by maximum likelihood. Molecular Biology and Evolution 24:1586-1591

Yassin A (2008) Molecular and morphometrical revision of the tuberculatus species subgroup (Diptera: ), with descriptions of two cryptic species. Annals of the Entomological Society of America 101:978-988

Yoo HS, Eah JY, Kim JS, Kim YJ, Min MS, Paek WK, Lee H, Kim CB (2006) DNA barcoding Korean birds. Molecules and Cells 22:323-327

Zardoya R, Meyer A (1996) Phylogenetic performance of mitochondrial protein-coding genes in resolving relationships among vertebrates. Molecular Biology and Evolution 13:933-942

Zardoya R, Suarez M (2008) Sequencing and phylogenomic analysis of whole mitochondrial genomes of animals. Methods in Molecular Biology:185-200

Zhang AB, Sikes DS, Muster C, Li SQ (2008) Inferring species membership using DNA sequences with back-propagation neural networks. Systematic Biology 57:202-215

Zink RM (2004) The role of subspecies in obscuring avian biological diversity and misleading conservation policy. Proceedings of the Royal Society of London Series B-Biological Sciences 271:561-564

Zink RM (2005) Natural selection on mitochondrial DNA in Parus and its relevance for phylogeographic studies. Proceedings of the Royal Society of London Series B-Biological Sciences 272:71-78

Zink RM, Barrowclough GF (2008) Mitochondrial DNA under siege in avian phylogeography. Molecular Ecology 17:2107-2121

Zink RM, Blackwell-Rago RC (2000) Species limits and recent population history in the Curve-billed Thrasher. Condor 102:881-886

Zink RM, Drovetski SV, Rohwer S (2002a) Phylogeographic patterns in the great spotted woodpecker Dendrocopos major across Eurasia. Journal of Avian Biology 33:175-178

Zink RM, Drovetski SV, Rohwer S (2006a) Selective neutrality of mitochondrial ND2 sequences, phylogeography and species limits in Sitta europaea. Molecular Phylogenetics and Evolution 40:679-686

Zink RM, Pavlova A, Drovetski S, Rohwer S (2008) Mitochondrial phylogeographies of five widespread Eurasian bird species. Journal of Ornithology 149:399-413

Zink RM, Pavlova A, Drovetski S, Wink M, Rohwer S (2009) Taxonomic status and evolutionary history of the Saxicola torquata complex. Molecular Phylogenetics and Evolution 52:769-773

Zink RM, Pavlova A, Rohwer S, Drovetski SV (2006b) Barn swallows before barns: population histories and intercontinental colonization. Proceedings of the Royal Society B-Biological Sciences 273:1245-1251

Zink RM, Rohwer S, Andreev AV, Dittmann DL (1995a) Trans-Beringia comparisons of mitochondrial DNA differentiation in birds. Condor 97:639-649

Zink RM, Rohwer S, Andreev AV, Dittmann DL (1995b) Trans-Beringia comparisons of mitochondrial DNA differentiation in birds. Condor 97:639-649

Zink RM, Rohwer S, Drovetski S, Blackwell-Rago RC, Farrell SL (2002b) Holarctic phylogeography and species limits of Three-toed Woodpeckers. Condor 104:167-170
