EXPLORING THE EFFICACY, UTILITY, AND LIMITATIONS
OF DNA BARCODING WITHIN THE CLASS AVES
A Thesis
Presented to The Faculty of Graduate Studies
of
The University of Guelph
by KEVIN CHARLES ROBERT KERR
In partial fulfilment of requirements for the degree of
Doctor of Philosophy April, 2010
© Kevin C. R. Kerr, 2010
Library and Archives Canada, Published Heritage Branch
395 Wellington Street, Ottawa ON K1A 0N4, Canada
ISBN: 978-0-494-64533-8
NOTICE:
The author has granted a non-exclusive license allowing Library and Archives Canada to reproduce, publish, archive, preserve, conserve, communicate to the public by telecommunication or on the Internet, loan, distribute and sell theses worldwide, for commercial or non-commercial purposes, in microform, paper, electronic and/or any other formats.
The author retains copyright ownership and moral rights in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.
In compliance with the Canadian Privacy Act some supporting forms may have been removed from this thesis.
While these forms may be included in the document page count, their removal does not represent any loss of content from the thesis.
ABSTRACT
EXPLORING THE EFFICACY, UTILITY AND LIMITATIONS OF DNA BARCODING WITHIN THE CLASS AVES
Kevin C. R. Kerr
University of Guelph, 2010
Advisors: Professor P. D. N. Hebert, Professor A. J. Baker
This thesis investigates the efficacy of a recently proposed molecular bioidentification system known as "DNA barcoding". This system employs a short, standardized gene region
(648bp of the mitochondrial gene cytochrome c oxidase I, in the case of animals) as a unique species identifier. To test species-level resolution, I constructed a library of DNA barcode sequences for birds from three regions: the Nearctic (North America), the southern
Neotropics (Argentina), and the eastern Palearctic (Russia, Mongolia, and Kazakhstan).
The accuracy of barcode-based species identification was assessed using the currently accepted avian taxonomy, which is the most robust of any taxonomic group. I also tested the
use of DNA barcodes for species discovery via detection of large intraspecific divergences.
Common intraspecific and interspecific trends in phylogeography were compared within
and between biogeographical realms. Using the avian barcode library, I also compared the performance of several different methods for species delimitation, including distance-based thresholds, tree-based methods, and character-based methods (wherein each nucleotide of the sequence is treated as a unique character). Finally, I used the abundance of sequence data to test for signs of selection in cytochrome c oxidase I. Whole mitochondrial genomes
available from GenBank were used to review the consistency of selective pressure throughout the genome. This largely confirmed the role of purifying selection in the evolution of the mitochondrial genome in birds. Overall, this study substantiates the utility of DNA barcoding as a reliable tool for the purposes of species identification and for highlighting taxa in need of further taxonomic review.
ACKNOWLEDGEMENTS
This research was funded by an Ontario Graduate Scholarship and an Elgin Card terrestrial zoology scholarship to myself, and, to a greater extent, by formidable grants to
Paul Hebert from Genome Canada, the Natural Sciences and Engineering Research
Council of Canada, and the Gordon and Betty Moore Foundation.
In its entirety, the research presented in this thesis is truly a culmination of the efforts of many, to all of whom I owe a debt of gratitude. First and foremost, I must thank Paul Hebert for seeing my potential and for providing me with so many tremendous opportunities, some beyond anything I could ever have imagined. Without him, none of this would have been possible.
I have been incredibly fortunate to have an extensive support network and the opportunity to collaborate with so many wonderful people. My co-advisor, Allan Baker, and Mark Peck of the Royal Ontario Museum provided samples, insightful discussion, and behind-the-scenes tours. Dr. Mark Stoeckle of the Rockefeller University was always willing to offer thoughtful commentary and, when in a pinch, sound medical advice.
Carla Dove and her myriad of technicians at the Smithsonian Institution helped tremendously in wrangling up specimens and by contributing sequences. Sharon Birks and staff at the Burke Museum of Natural History (including, but not limited to, Rob
Faucett, Chris Wood, and Sievert Rohwer) were kind enough to host me at their museum and provide extensive access to their remarkable tissue collection. Charles Francis and
Michel Gendron of the Canadian Wildlife Service also offered valuable collaboration.
I owe un fuerte abrazo to Pablo Tubaro for having the foresight to get involved with DNA barcoding. Our ongoing student exchange has been one of the best parts of my graduate experience. Accordingly, I thank Dario Lijtmaer for being such an excellent collaborator and for occasionally serving as my translator; I thank Ana Barreira and Pilar
Benites for getting me home from the jungle in one piece, and I thank Cecilia Kopuchian,
Leonardo Campagna, and everyone else at the museum for all of our interactions.
The Hebert Lab has been a great academic home. I must thank all lab mates, past and present: Tyler Zemlak, Elizabeth Clare, John Wilson, Taika von Konigslow, Vazrick
Nazari, Christina Carr, and Erin Corstorphine. I thank Sujeevan Ratnasingham, who on account of his genius has been the backbone of this whole endeavour (and occasionally stopped to have fun too). I also thank all of the other brains that have helped me out along the way: Jeremy deWaard, Jonathan Witt, Alex Smith, Mehrdad Hajibabaei, Nataly
Ivanova, Alex Borisenko, Dirk Steinke, Rob Dooh, and Justin Schonfeld. Lab support came from Chris Grainger, Janet Topan, Constantine Christopoulos, Isabelle Meusnier and Liuqiong Lu, as well as notably from Angela Hollis of the Genomics Facility.
Jinzhong Fu and Teri Crease have both served very important roles on my committee and I have relied on them both for support. The Department of Integrative
Biology is generally filled with terrific faculty, of whom I would like to especially thank
Jim Bogart, Elizabeth Boulding, Bob Hanner, Brian Husband, Ryan Gregory, Denis
Lynn, Steve Newmaster, Beren Robinson, and Don Stevens. The students in this department (Darren Sleep, Marc Freeman, Nathalie Newby, Mark Sherrard, Kate Crosby,
John Urquhuart, Joe Crowley, Martin Brummell, and Sarah Alderman, to name just a few) have been great friends and I could not imagine the experience without them.
This experience has taught me the essential value of museum collections, without which a study of this magnitude would never be possible, as should be evidenced by this
laundry list of collaborating institutions. For providing tissue samples I thank the Royal
Ontario Museum (including volunteers of Toronto's Fatal Light Awareness Program),
Burke Museum of Natural History and Culture, Zoological Museum of Moscow
Lomonosov University, Museo Argentino de Ciencias Naturales, and the Canadian
Wildlife Service. For additional samples processed via the Smithsonian I would like to thank the Academy of Natural Sciences Philadelphia, American Museum of Natural
History, Field Museum, Museum of Comparative Zoology, Louisiana State University,
Museum of Southwestern Biology, Museum of Vertebrate Zoology, National Museum of
Natural History, University of Alaska-Fairbanks, and the University of Kansas. Bob
Montgomerie (Queen's University) and Chris Earley (The Arboretum, University of
Guelph) provided additional specimens and Irby Lovette (Cornell University) shared unpublished sequences. Feather samples were generously provided by the following stations: Albert Creek Bird Observatory, Appalachian Highlands Science Learning
Center of the US National Park Service, Brier Island Bird Migration Research Station,
Gros Morne National Park Migration Monitoring Station, Haldimand Bird Observatory,
Innis Point Bird Observatory, Inglewood Bird Observatory, Long Point Bird
Observatory, Mackenzie Nature Observatory, McGill Bird Observatory, Rock Point Bird
Banding Station, Rocky Point Bird Observatory, St. Andrews Banding Station, Tommy
Thompson Park Bird Research Station, and Vaseux Lake Migration Monitoring Station.
Lastly, but most importantly, I thank my parents, Gary and Sandy, and my sister,
Jenny, for always being supportive and believing in me. I am extremely lucky to have such a wonderful family. I also thank Andrea Nesbitt for enduring my workaholic stints, long absences, and all of the chaos, with love and support.
TABLE OF CONTENTS
List of Tables
List of Figures
PROLOGUE
General introduction
CHAPTER I
Comprehensive DNA barcode coverage of North American birds
Abstract
Introduction
Methods
Results
Discussion
CHAPTER II
Probing evolutionary patterns in Neotropical birds through DNA barcodes
Abstract
Introduction
Methods
Results
Discussion
Barcodes in Argentinean birds
Barcoding the avifaunas of Argentina and North America
Conclusions
CHAPTER III
Filling the gap - COI barcode resolution in eastern Palearctic birds
Abstract
Introduction
Methods
Sampling
Laboratory methods
Data analysis
Results
Neighbour-joining clusters
Distance-based assignment
Character-based assignment
Discussion
Species boundaries in Palearctic birds
Methods comparison
Conclusions
CHAPTER IV
Searching for selection using avian DNA barcodes
Abstract
Introduction
Methods
Data collection
Assessing genetic diversity
Neutrality tests of COI variation
Amino acid variation
COI versus other mitochondrial genes
Results
Genetic diversity
Neutrality tests
Amino acid diversity
Genomic comparisons
Discussion
Conclusions
EPILOGUE
General conclusions
REFERENCES
APPENDICES see accompanying CD
LIST OF TABLES
Table 1.1: Species with overlapping barcode clusters
Table 1.2: Provisional splits of recognized species
Table 2.1: Bird species from Argentina with deeply divergent COI groups
Table 3.1: Groups of species that failed recognition via MOTU analysis
Table 3.2: List of all species containing divergent COI lineages
Table 4.1: Species included in McDonald-Kreitman tests for neutrality
Table 4.2: Summary of amino acid variation for twelve avian orders
LIST OF FIGURES
Figure 1.1: Comparing barcode sequence clusters with species-level taxonomy
Figure 1.2: Mean intraspecific variation according to number of individuals
Figure 1.3: Intraspecific distance, population size, and apparent species age
Figure 1.4: Intraspecific and nearest neighbour distances in North American birds
Figure 2.1: Frequency histograms of COI variation for Argentinean birds
Figure 2.2: Maps of distributional patterns of divergent barcode lineages
Figure 3.1: Map of the eastern Palearctic region with collecting sites highlighted
Figure 3.2: Cumulative type I and type II error plots for divergence thresholds
Figure 3.3: Divergence patterns between closely related species
Figure 4.1: Intraspecific diversity, interspecific divergence, and sampling effort
Figure 4.2: Predicted secondary structure for the avian consensus sequence
Figure 4.3: Mean dN, dS, and dN/dS for thirteen protein-coding mitochondrial genes
PROLOGUE
GENERAL INTRODUCTION
The study of biological diversity has emerged as a cornerstone of the life sciences.
Looking backward, it is apparent that the field has grown with great momentum; the popular contraction "biodiversity" was only proposed in 1988 (Wilson and Peter 1988), while this year, 2010, has been declared the International Year of Biodiversity (Martens
2010). Conservation efforts are typically the focus of biodiversity studies, but we cannot protect what we do not know. The author David Quammen astutely described the classification of species as "...a crucial, basic enterprise, prerequisite to deeper ecological understanding" (Quammen 2003, p.68). Accordingly, the systematic classification of all life should be at the forefront of biodiversity goals.
In 1995, the term "taxonomic impediment" was coined referring to a deficiency in the expert taxonomic knowledge that would be necessary to identify and describe the
Earth's biodiversity (Evenhuis 2007). Although this terminology was new, the problem was not. This is evidenced by commentary written forty-five years earlier (Sabrosky
1950):
In many groups, the lack of adequate support for taxonomic work and the
relatively few students entering the field, among other factors, postpone
indefinitely the time when careful and comprehensive studies can be carried
out. The situation is particularly intriguing, for it is contemporaneous with
the great interest in such fields as evolution, zoogeography, and speciation,
which depend so greatly for their raw materials on the data published by the
taxonomists!
The more recent impetus to solve this issue was spurred by increasing extinction rates. The possibility of losing species before they could even be described suggested that timing was critical. Unfortunately, the identification of the Earth's biota is not a trivial matter. Despite over 250 years of effort commencing with Linnaeus' Systema Naturae
(1758), it is estimated that only 10 percent of all life forms on the planet have been described (Wilson 2003). Clearly, a new catalyst is needed to accelerate the rate of species discovery and invigorate the taxonomic community.
This need for faster taxonomic progress motivated a proposal by Hebert et al.
(2003a) to use a single standardized genetic marker as a global bioidentification tool.
This method was dubbed "DNA barcoding" in reference to the analogous Universal
Product Code - the familiar black-and-white bars that adorn virtually every modern commercial product. Commercial barcodes utilize a string of 11 digits that can vary between 10 character states (the numbers 0 through 9) to create billions of unique combinations, enough variation for each commercial product to bear its own code.
Characters in the DNA barcode are limited to four possible character states (the four nucleotides: adenine, guanine, cytosine, and thymine), but even with some functional constraints the inclusion of 648 nucleotide positions allows for an enormity of variation
(Hebert et al. 2003a). The idea put forth was that if mutations occur at even a modest rate
(i.e. 2% per million years), then reproductively isolated species would accumulate enough differences in their DNA so that each could be readily distinguished and identified by a unique sequence.
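The arithmetic behind this comparison can be made concrete with a short sketch. The figures (11 digits, 10 states, 648 positions, 4 states, 2% per million years) come from the text above; treating the 2% figure as pairwise sequence divergence between two lineages, and ignoring both functional constraints and repeated substitutions at a site, are simplifying assumptions made only for illustration.

```python
# Upper bounds on the number of distinct codes, using the figures quoted in the text.
UPC_DIGITS = 11          # digits in a commercial barcode
UPC_STATES = 10          # character states per digit (0 through 9)
BARCODE_LENGTH = 648     # nucleotide positions in the COI barcode
NUCLEOTIDE_STATES = 4    # A, G, C, T

upc_combinations = UPC_STATES ** UPC_DIGITS
# Theoretical ceiling only: functional constraints on COI reduce the usable space.
barcode_combinations = NUCLEOTIDE_STATES ** BARCODE_LENGTH
barcode_digits = len(str(barcode_combinations))

# Assumption: 2% per million years is read as pairwise divergence.
DIVERGENCE_PER_MYR = 0.02


def expected_differences(million_years: float) -> float:
    """Expected differing sites between two lineages, ignoring multiple hits."""
    return BARCODE_LENGTH * DIVERGENCE_PER_MYR * million_years


print(f"UPC combinations: {upc_combinations:,}")
print(f"Barcode ceiling is a number with {barcode_digits} decimal digits")
print(f"Expected differences after 1 million years: {expected_differences(1):.2f}")
```

Under these assumptions, two lineages isolated for one million years would be expected to differ at roughly 13 of the 648 positions.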
Hebert et al. (2003a) selected the mitochondrial gene cytochrome c oxidase I, or simply COI, as their candidate gene - the so-called "DNA barcode". A mitochondrial gene was a natural choice on account of the haploid mode of inheritance of the mitochondrial genome, its lack of recombination, and its lack of introns (Ballard and Whitlock
2004). The specific choice of COI was based partly on the availability of universal primers (Folmer et al. 1994), but also for its phylogenetic signal (Hebert et al. 2003a).
However, opinions vary on the phylogenetic utility of COI (Feldman and Omland 2005;
Remigio and Hebert 2003; Simmons and Weller 2001; Zardoya and Meyer 1996). Other proposals for DNA-based identification systems have promoted the use of alternative genes, which implies that gene choice is somewhat arbitrary (e.g. Lee et al. 2008; Vences et al. 2005); but for broad success, standardization is key (Frezal and Leblois 2008).
The inclusion of DNA evidence in taxonomic practice is not in itself novel (e.g.
Busse et al. 1996; Karp et al. 1996; McManus and Bowles 1996; Navajas and Fenton
2000; Sibley and Ahlquist 1983). However, previous treatments have not given it a central role. The promotion of "DNA taxonomy", wherein DNA would serve a primary role in systematic studies, was concurrent with the seminal barcoding publication
(Blaxter 2004; Tautz et al. 2003). The two ideas vary subtly, but fundamental differences do exist. As noted above, the novelty of DNA barcoding lies primarily in the use of a standardized gene or combination of genes but is ultimately meant to complement traditional methods; DNA taxonomy intends to install DNA at the core of a taxonomic reference system (Tautz et al. 2003). At most, the barcoding endeavour could perhaps be viewed as a derivative of the more ambiguously defined DNA taxonomy, but the two should not be confused.
DNA barcoding may be further broken down into two distinct objectives: species identification and species discovery (DeSalle 2006; Moritz and Cicero 2004).
Understanding the difference between these two purported uses is essential if the method is to be properly evaluated. Species identification concerns the assignment of an unknown specimen to a known species based on comparison of its DNA barcode to a reference database of vouchered specimens. Species discovery concerns the recognition of overlooked cryptic taxa, which are very similar morphologically but possess divergent genetic signatures. It has long been common for new species boundaries to be identified when taxa are carefully scrutinized with newer technologies, as is also articulated in
Sabrosky's commentary (1950):
In all too many cases we are still merely quoting the names and
conclusions of workers of the last generation, or even earlier, because
there has been no one available to study groups in the light of modern
techniques and concepts. When they are studied, we often find that old
records of "common species" mask a whole complex of forms.
Those words were penned three years prior to the discovery of the structure of
DNA (Watson and Crick 1953). With the modern incorporation of molecular methods,
Sabrosky's words echo louder than ever. The validity of species boundaries may be argued back and forth, particularly in a group such as birds where species are frequently described based on our ability to recognize them in the field (Watson 2005). Genetic evidence is often viewed as being more concrete since it is reflective of evolutionary
history (Zink 2004). But the rate at which DNA-led species discoveries are progressing is still insufficient, as is evidenced by the still unresolved taxonomic impediment. Broadly applying the standardized barcoding approach across many taxonomic groups could potentially unearth a plethora of undiscovered species in a relatively short period of time
(Tautz et al. 2003).
Early studies accompanying the initial proposal for DNA barcoding offered a precursory view of COI sequence variation within an assortment of Eukaryotic taxa
(Barrett and Hebert 2005; Hebert et al. 2003b; Hogg and Hebert 2004; Remigio and
Hebert 2003). The results of these early studies showed DNA barcoding's potential by demonstrating that COI sequence variation was generally low within species and significantly greater between them. However, the DNA barcoding proposal was not unanimously accepted within the taxonomic community and critics were swift to identify perceived weaknesses, both in theory and in applications (Prendini 2005; Will and
Rubinoff 2004). The practice was proclaimed unscientific (Lipscomb et al. 2003) and fears were voiced that it would starve traditional taxonomy of needed funds (Ebach and
Holdrege 2005). Detractors of DNA barcoding published critiques with fanaticism
(Cameron et al. 2006), yet more grounded concerns did arise (DeSalle et al. 2005;
Nielsen and Matz 2006). Can DNA barcodes distinguish young radiations of species?
Can they delimit diverse tropical taxa? Will introgression be a major stumbling point?
How similar must DNA barcodes be to confirm the identity of a species? A more thorough test of the DNA barcoding method was still needed to help clarify these points of contention.
The core of many biological disciplines, such as physiology, genetics, or developmental biology, is almost entirely dependent on the extrapolation of trends observed in model organisms (Hedges 2002). Equivalently, it is reasonable that a few choice groups of organisms may be used to more properly evaluate the DNA barcoding paradigm. This may be accomplished through dense taxonomic sampling of well-studied groups. The mature taxonomy of birds makes them an ideal benchmark. Their ubiquitous nature and relatively low total number of species, combined with centuries of human fascination, have led to birds being the most taxonomically robust group of virtually all life forms (Gill 2003).
A pilot study examined intraspecific and interspecific COI distances in a select number of North American birds (Hebert et al. 2004). The initial results were encouraging, generally showing limited variation within species and much greater variation between them. Exceptions to the former trend occurred in four species (i.e. large intraspecific differences) and led researchers to suggest that cryptic lineages had been uncovered. Increased sampling within those species (Solitary Sandpiper, Tringa solitaria;
Marsh Wren, Cistothorus palustris; Warbling Vireo, Vireo gilvus; Eastern Meadowlark,
Sturnella magna) revealed phylogeographic patterns wherein most of the lineages divided into eastern and western populations. However, critics were again quick to announce perceived shortcomings in the analysis, such as the exclusion of closely-related sister species and of tropical species (Moritz and Cicero 2004).
The preliminary investigation into the effectiveness of DNA barcode-based identification in birds paved the way for my thesis, which begins where the former left off. While mitochondrial markers are frequently incorporated into studies of avian
phylogenetics and phylogeography, COI is a virtual stranger to ornithology, having been employed in only a few remote studies (Feldman and Omland 2005). Hence, this thesis details the development of a large and comprehensive library of COI DNA barcodes for the avian class. Using the currently accepted taxonomy of birds as a gold standard, the efficacy of DNA barcodes as a tool for both species identification and species discovery is put to the test.
In Chapter One, I expand the COI barcode library to include 93% of North
American resident breeding and visiting pelagic bird species using vouchered tissue samples from a large assembly of museums, plus feather samples from bird banding stations. This comprehensive analysis provides a more complete picture of COI divergences within and between avian species and forms one of the largest scale studies of barcoding in vertebrates. Neighbour-joining trees with bootstrap support are used to diagnose 'barcode clusters', which closely correspond to species boundaries. Large intraspecific gaps suggestive of species-level differences are identified in fifteen currently recognized species. Problematic cases, wherein resolution between recognized species is poorly defined, are also highlighted between young species pairs. I discuss the hypothesis of selective sweeps as a mechanism to maintain low intraspecific diversity.
In Chapter Two, I explore barcode divergence in Neotropical birds by using vouchered museum specimens from Argentina. Similar to Chapter One, I compare intra- and interspecific distances to assess the efficacy of barcode-based species delimitation.
The phylogeographic structure of species with large intraspecific differences is compared to that observed in North American species. Furthermore, intraspecific variation is explored in species with ranges expansive enough to include North America and
Argentina. The different patterns observed are discussed in an ecological and biogeographical context.
Chapter Three presents the third regional analysis, which details COI variation in eastern Palearctic birds. The eastern Palearctic provides a northern temperate region
analogous to the Nearctic, but one that differs greatly in landmass and glacial history. I compare patterns of species diversity to those in the North American avifauna.
Alternative methods for species delimitation are also explored in this chapter, including tree-based, distance-based, and character-based methods. Ultimately, I show that tree- and distance-based methods share most of the same strengths, while character-based
methods perform poorly on large datasets but provide additional resolution when used in conjunction with one of the other methods.
In Chapter Four, I turn the focus to the COI gene itself and explore more carefully the role of selection in shaping the observed divergence patterns. The idea that
selective sweeps are responsible for the low variation observed in the barcode region is put to the test. Finally, I make use of the growing number of complete mitochondrial
genomes available from GenBank to compare the types of substitutions observed in COI
to the other 12 protein-coding mitochondrial genes.
CHAPTER I
Comprehensive DNA barcode coverage of North American birds
Published in Molecular Ecology Notes:
Kerr, K.C.R., Stoeckle, M.Y., Dove, C.J., Weigt, L.A., Francis, C.M., and Hebert, P.D.N. (2007)
Comprehensive DNA barcode coverage of North American birds. Molecular Ecology Notes 7: 535-543. doi: 10.1111/j.1471-8286.2006.01670.x
ABSTRACT
DNA barcoding seeks to assemble a standardized reference library for DNA-based identification of eukaryotic species. The utility and limitations of this approach need to be tested on well-characterized taxonomic assemblages. Here we provide a comprehensive DNA barcode analysis for North American birds including 643 species representing 93% of the breeding and pelagic avifauna of the United States and Canada.
Most (94%) species possess distinct barcode clusters, with average neighbor-joining bootstrap support of 98%. In the remaining 6%, barcode clusters correspond to small sets of closely-related species, most of which hybridize regularly. Fifteen (2%) currently recognized species are comprised of two distinct barcode clusters, many of which may represent cryptic species. Intraspecific variation is weakly related to census population size and species age. This study confirms that DNA barcoding can be effectively applied across the geographic and taxonomic expanse of North American birds. The consistent finding of constrained intraspecific mitochondrial variation in this large assemblage of species supports the emerging view that selective sweeps limit mitochondrial diversity.
INTRODUCTION
Mitochondrial DNA analysis has been employed in the evolutionary study of animal species for more than 30 years (Avise and Walker 1999; Brown et al. 1979;
Mindell et al. 1997). Its higher mutation rate and lower effective population size relative to nuclear DNA mean that it provides a powerful tool to probe for evidence of reproductive isolation between lineages. This fact provoked a proposal to standardize DNA-based species identification by analyzing a uniform segment of the mitochondrial genome. With this approach, a library of sequences from taxonomically verified voucher specimens serves as a set of DNA identifiers for species, in short, DNA barcodes (Hebert et al. 2003a). For animals, research has focused on a 648 bp segment of the mitochondrial gene cytochrome c oxidase I (COI), which can be readily recovered from diverse species with a limited set of primers. DNA barcoding translates expert taxonomic knowledge of diagnostic morphologic characters into a widely accessible format, DNA sequences, enabling more people to identify specimens. In addition to assigning specimens to known species, DNA barcoding can speed the discovery of new species, as large sequence differences in animal mitochondrial DNA generally signal species status.
For this approach to be effective, it must be possible to distinguish intraspecific and interspecific mitochondrial DNA variation. Pseudogenes, retention of ancestral polymorphisms, hybridization, and the idiosyncrasies of mitochondrial DNA inheritance pose potential difficulties (Benasson et al. 2001; Moritz and Cicero 2004; Thalmann et al.
2004; Will et al. 2005). The simplest test is whether genetic distances within species are less than those between species. Surprisingly, 23% of 2,319 animal species failed this test in one review (Funk and Omland 2003), implying that mitochondrial gene sequences do
not reliably capture species boundaries. However, the published studies that formed the basis for this estimate may be biased towards exceptional situations and groups in need of taxonomic revision, as further investigations on several vertebrate and invertebrate groups have shown that COI barcodes distinguish more than 95% of species (Hajibabaei et al. 2006; Ward et al. 2005).
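The "simplest test" described above, whether distances within a species are smaller than distances to other species, can be sketched as follows. This is a minimal illustration and not the procedure used in the study: the function name, the data layout, the specimen labels, and the distance values are all hypothetical.

```python
from itertools import combinations


def barcode_gaps(distances, species_of):
    """Return {species: (max intraspecific, min interspecific distance)}.

    distances:  dict mapping frozenset({id_a, id_b}) -> pairwise distance
    species_of: dict mapping specimen id -> species name
    """
    max_within, min_between = {}, {}
    for a, b in combinations(sorted(species_of), 2):
        d = distances[frozenset((a, b))]
        sp_a, sp_b = species_of[a], species_of[b]
        if sp_a == sp_b:
            max_within[sp_a] = max(max_within.get(sp_a, 0.0), d)
        else:
            for sp in (sp_a, sp_b):
                min_between[sp] = min(min_between.get(sp, float("inf")), d)
    return {
        sp: (max_within.get(sp, 0.0), min_between.get(sp, float("inf")))
        for sp in set(species_of.values())
    }


# Hypothetical toy data: five specimens from three species.
species_of = {"a1": "A", "a2": "A", "b1": "B", "b2": "B", "c1": "C"}
pairs = [
    ("a1", "a2", 0.002), ("a1", "b1", 0.06), ("a1", "b2", 0.07),
    ("a1", "c1", 0.07), ("a2", "b1", 0.06), ("a2", "b2", 0.07),
    ("a2", "c1", 0.07), ("b1", "b2", 0.03), ("b1", "c1", 0.03),
    ("b2", "c1", 0.01),
]
distances = {frozenset((x, y)): d for x, y, d in pairs}

gaps = barcode_gaps(distances, species_of)
# A species fails when its deepest internal split is at least as large as
# its distance to the nearest specimen of another species.
failing = sorted(sp for sp, (within, between) in gaps.items() if within >= between)
print(failing)
```

In this toy example only species B fails, because its two specimens differ from each other more than one of them differs from the single specimen of species C. A species represented by one specimen has no intraspecific distance and so passes trivially, which is one reason sample size matters when applying such a test.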
Because birds have been the subjects of particularly intensive taxonomic analysis, they provide an excellent opportunity to test the efficacy of barcode-based species delimitation. With most recent species splits stemming from genetic studies, avian taxonomy could, in turn, benefit from a broad-scale genetic survey. In a preliminary survey of 260 North American bird species, COI sequence variation between species was generally much greater than that within species, and no two species shared barcodes
(Hebert et al. 2004). As a result, COI sequence information enabled assignment of specimens to known species. Four of 120 species (3%) studied in greater detail contained two distinct barcode clusters, which appeared to reflect cryptic species; a conclusion supported by observations of subtle differences in song and morphology for 3 of the 4 cases (Kroodsma 1989; Rohwer 1976; Sibley and Monroe 1990). To test these results more stringently, we increase taxon coverage and sample sizes in this study, applying
DNA barcoding to examine the taxonomic status of 643 species, representing 93% of the breeding and pelagic bird species from the United States and Canada (Figure 1.1).
METHODS
Most analytic methods followed those described in the earlier study (Hebert et al.
2004). DNA sources for this study included frozen tissue samples (muscle, liver, or
blood), most of which were obtained from specimens with vouchers housed in museum collections. In addition to tissue samples, feathers (breast feathers or rectrices) freshly collected at bird banding stations at 6 locales (Ontario, New Brunswick, Nova Scotia,
Yukon, North Carolina, Tennessee) were analyzed. Feather samples were stored in a dark, dry location at room temperature.
DNA extraction, PCR, and sequencing reactions were performed at either the
University of Guelph or the Smithsonian Institution. DNA was isolated using DryRelease
(see Hajibabaei et al. 2005), Qiagen DNEasy tissue extraction kit (Qiagen Inc., Valencia,
California), or the NucleoSpin96 tissue kit (Machery-Nagel, Diiren, Germany). Feather samples were processed using the former method exclusively. PCR predominantly utilized a single primer pair: BirdFl (TTC TCC A AC CAC AAA GAC ATT GGC AC) and BirdRl (ACG TGG GAG ATA ATT CCA AAT CCT G). If amplification was unsuccessful, an alternate forward primer, FalcoFA (TCA ACA AAC CAC AAA GAC
ATC GGC AC), or reverse primers, BirdR2 (ACT ACA TGT GAG ATG ATT CCG
AAT CCA G) and VertebrateR1 (TAG ACT TCT GGG TGG CCA AAG AAT CA), were employed. All reactions were run under the following thermal cycle program: 1 min at 94°C followed by 6 cycles of 1 min at 94°C, 1.5 min at 45°C, and 1.5 min at 72°C, followed in turn by 35 cycles of 1 min at 94°C, 1.5 min at 55°C, and 1.5 min at 72°C, and finally 5 min at 72°C. Forty-five cycles were run in place of 35 for DNA extracted from feather samples to compensate for lower yields of DNA. PCR products were visualized on pre-cast 2% agarose gels using the E-gel 96 system (Invitrogen, Carlsbad, CA, USA). PCR products were bidirectionally sequenced on an ABI 3100, 3130, or 3730. Contigs were
assembled from forward and reverse reads using Sequencher, version 4.5 (Gene Codes
Corporation, Ann Arbor, MI, USA).
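The thermal cycle program described above can be written out as a small data structure to check its total hold time. This is a minimal sketch for illustration only: the names `Step`, `Stage`, `build_program`, and `program_minutes` are hypothetical, and the totals ignore ramp times between temperatures, which depend on the instrument.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    temp_c: float   # hold temperature in degrees Celsius
    minutes: float  # hold time in minutes


@dataclass
class Stage:
    steps: List[Step]
    cycles: int = 1  # number of times the steps are repeated


def build_program(main_cycles: int = 35) -> List[Stage]:
    """Program as stated in the text; the text uses 45 main cycles for feathers."""
    return [
        Stage([Step(94, 1.0)]),
        Stage([Step(94, 1.0), Step(45, 1.5), Step(72, 1.5)], cycles=6),
        Stage([Step(94, 1.0), Step(55, 1.5), Step(72, 1.5)], cycles=main_cycles),
        Stage([Step(72, 5.0)]),
    ]


def program_minutes(program: List[Stage]) -> float:
    """Total hold time in minutes, excluding ramping (an assumption of this sketch)."""
    return sum(
        stage.cycles * sum(step.minutes for step in stage.steps)
        for stage in program
    )


if __name__ == "__main__":
    print(program_minutes(build_program()))    # tissue samples
    print(program_minutes(build_program(45)))  # feather samples
```

Under these assumptions the standard program holds for 170 minutes and the feather program for 210 minutes, so the ten extra cycles add 40 minutes of hold time.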
Specimen and collection data, sequences, and trace files are provided in the container project "Birds of North America Phase 2" at http://www.barcodinglife.org. A
Kimura 2-parameter distance metric was employed for sequence comparisons (Kimura
1980); genetic distances were calculated using the BOLD Management & Analysis
System (www.barcodinglife.org); bootstrap analysis was performed with 1000 replicates using MEGA, version 3.1 (Kumar et al. 2004); and scatter and box plots were generated with SigmaPlot 8.02 (SPSS, Inc.). All new sequences have been deposited in GenBank under accession numbers DQ432694 to DQ433261, DQ433274 to DQ433846, and
DQ434243 to DQ434805, while sequences from the earlier study (Hebert et al. 2004) are deposited in GenBank under accession numbers AY666171 to AY666596.
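For readers unfamiliar with the Kimura 2-parameter (K2P) metric, the distance between two aligned sequences is d = -(1/2) ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of sites differing by a transition and by a transversion, respectively. The sketch below illustrates that formula only; it is not the BOLD implementation, the function name `k2p_distance` is hypothetical, and skipping sites with gaps or ambiguous bases is an assumption made here.

```python
import math

# Purine-purine and pyrimidine-pyrimidine changes are transitions.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def k2p_distance(seq_a: str, seq_b: str) -> float:
    """Kimura (1980) two-parameter distance between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # assumption: skip gaps and ambiguous base calls
        compared += 1
        if a == b:
            continue
        if (a, b) in TRANSITIONS:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the sequences are too divergent (saturated).
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


if __name__ == "__main__":
    # Hypothetical 20-site sequences differing by a single A/G transition.
    print(100 * k2p_distance("ACGTACGTACGTACGTACGT", "GCGTACGTACGTACGTACGT"))
```

With one transition in 20 sites, P = 0.05 and Q = 0, so d = -(1/2) ln(0.9), or about 5.27%, slightly above the uncorrected 5% difference.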
RESULTS
A standard set of primers amplified the target region of COI from all but 1 of 643 species. These taxa included representatives from 19 (70%) of the 27 extant orders of birds, distributed among 71 families and 286 genera (Appendix 1.1). Together with the
438 specimens analyzed in the earlier study, we obtained COI sequences from 2,590 individuals, 70% from vouchered specimens held in museum collections. The mean length of the products sequenced was 658 bp. We analyzed multiple individuals (average
= 4.1, range = 2-125) from 546 (85%) of the 642 species, including 5 or more individuals from 211 species (33%). In most cases, conspecific specimens derived from widely separated sites (Birds of North America Phase 2 project at www.barcodinglife.org).
We detected presumptive pseudogenes in approximately 5% of specimens. Because these were generally short, approximately 100-200 nucleotides, complete barcode sequences could be recovered with bidirectional sequencing. One presumptive pseudogene corresponding to the full-length barcode sequence was detected in 3 tyrannid flycatcher specimens (0.1%). Overall, pseudogenes were not an important limit to recovery of COI sequences.
Average intraspecific variation was unrelated to the number of individuals analyzed, suggesting there was representative sampling (Figure 1.2). Within the low and narrow band of intraspecific variation, there was a weak relationship to census population size, which ranges from a few thousand to over 300 million individuals (Figure 1.3)
(Wetlands International 2002; Rich et al. 2004). Intraspecific mitochondrial variation was only weakly associated with apparent species age (Figure 1.3). The earlier North
American bird study measured mean congeneric distance, the average distance among all congeneric relatives. To more stringently test the discriminatory power of COI barcodes, the present study examined nearest neighbor distance, the minimum genetic distance between a species and its closest congeneric relative. Nearest neighbor distance averaged
4.3%, 19-fold higher than the mean within species and 11-fold higher than the average maximum intraspecific distance (Figure 1.4). Including all species may give a more representative picture, as generic assignments may be incorrect, and 10% of birds are the sole members of their genus; in this case average nearest neighbor distance was higher at
5.9% (Figure 1.4).
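The nearest neighbor measure described above can be sketched as a minimum over pairwise species distances, either restricted to congeners or taken over all species. The function name, the dictionary layout, and the toy taxa below are hypothetical; a complete pairwise distance table is assumed.

```python
from typing import Dict, FrozenSet


def nearest_neighbor_distances(
    genus_of: Dict[str, str],
    pair_dist: Dict[FrozenSet[str], float],
    congeners_only: bool = True,
) -> Dict[str, float]:
    """Minimum distance (%) from each species to any other species.

    With congeners_only=True, only species in the same genus are candidates,
    so species that are the sole sampled member of their genus are omitted.
    """
    nearest: Dict[str, float] = {}
    for species, genus in genus_of.items():
        candidates = [
            pair_dist[frozenset((species, other))]
            for other, other_genus in genus_of.items()
            if other != species and (not congeners_only or other_genus == genus)
        ]
        if candidates:
            nearest[species] = min(candidates)
    return nearest


if __name__ == "__main__":
    # Hypothetical toy data, not values from this study.
    genus_of = {"Aus alpha": "Aus", "Aus beta": "Aus", "Bus gamma": "Bus"}
    pair_dist = {
        frozenset(("Aus alpha", "Aus beta")): 4.0,
        frozenset(("Aus alpha", "Bus gamma")): 9.0,
        frozenset(("Aus beta", "Bus gamma")): 8.0,
    }
    print(nearest_neighbor_distances(genus_of, pair_dist))
    print(nearest_neighbor_distances(genus_of, pair_dist, congeners_only=False))
    # Fold differences implied by the means reported in Figure 1.4.
    print(round(4.30 / 0.23), round(4.30 / 0.40))
```

The last line reproduces the reported ratios from the Figure 1.4 means: 4.30 / 0.23 is about 18.7 (19-fold) and 4.30 / 0.40 is about 10.8 (11-fold).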
Levels of sequence difference varied across families: 35% of ducks, geese, and swans (Anatidae) showed nearest neighbor differences of 1% or less, whereas all
sandpipers (Scolopacidae), plovers (Charadriidae), and owls (Strigidae) had nearest neighbor distances greater than 1%. COI barcodes separated 20 of the 23 taxonomic splits recognized in North American birds over the past 25 years, with nearest neighbor distances ranging from 0.3% to 6.0% (Appendix 1.2). Average bootstrap support for species nodes with multiple individuals was 97.8%. As expected, bootstrap values were lower among the most closely related species, averaging 79.8% for species with nearest neighbor distances less than 1%, but 99.5% for species with distances above 1%.
Forty-two species (6.4%) shared sequences or had clusters of sequences overlapping with those of another species, including 14 pairs, 2 triplets, and 1 set of 8 species (Table 1.1). The pattern of COI variation within these sets of overlapping species was indistinguishable from variation within single species, with the exception of Mallards and Black Ducks, which are known to harbor two distinct mitochondrial lineages (Avise et al. 1990). By contrast, we detected 15 other species with intraspecific distances greater than 2.5% (Table 1.2); each contained two distinct sequence clusters typically comprised of individuals from different geographic areas. These clusters may represent cryptic species. Treating these provisional species as distinct, average within-species variation for the COI barcode region was 0.23%.
DISCUSSION
The present study has reaffirmed that most North American bird species correspond to a single, tightly cohesive array of barcode sequences that are distinct from those of any other species. However, 15 species include two distinct barcode clusters,
while 42 other species possess barcode sequences that are shared or overlap with those of other species. What explains these exceptional cases?
Cases of deep barcode divergence within what are thought to be single species generally indicate cryptic taxa (Meyer and Paulay 2005; Moritz 1994). Our screen for provisional splits in species, employing a threshold that was 10 times higher than the mean intraspecific variation, revealed 15 cases. Results from a thresholding approach must be interpreted with caution and are best used to flag species in need of further research.
Significantly, most of these hypothesized splits are supported by prior taxonomic work
(Table 1.2). In total, 9 of our 15 cases have been previously cited; 8 have been proposed to represent species pairs, the exceptional case being the northern raven (Omland et al.
2000). Some of the species yield additional lineages when non-North American populations are included; for example, 6 lineages in total are suggested for the winter wren (Drovetski et al. 2004b).
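The threshold screen discussed above amounts to a simple filter on maximum intraspecific distance. The sketch below assumes the 2.5% threshold stated in the text; the function name `flag_for_review` is hypothetical, and one invented entry is included only to show a species that is not flagged.

```python
from typing import Dict, List


def flag_for_review(
    max_intraspecific: Dict[str, float], threshold: float = 2.5
) -> List[str]:
    """Return species whose maximum intraspecific distance (%) exceeds the threshold."""
    return sorted(
        species for species, dist in max_intraspecific.items() if dist > threshold
    )


if __name__ == "__main__":
    distances = {
        "Marsh Wren": 7.9,            # Table 1.2
        "Hermit Thrush": 3.2,         # Table 1.2
        "Northern Fulmar": 3.1,       # Table 1.2
        "Hypothetical species": 0.4,  # invented value for contrast
    }
    print(flag_for_review(distances))
```

As the text cautions, a flag from such a screen only marks a species for further taxonomic study; it does not by itself establish a split.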
Regarding the 17 sets of species with overlapping barcodes, three processes may account for these findings. First, some may be recently diverged sister taxa where COI has not yet accumulated sequence differences. In such cases more extensive sequence information might allow resolution. Second, these taxa may share mitochondrial DNA because of hybridization. Most of our species sets with overlapping barcodes hybridize at least occasionally; many show extensive hybridization and produce fertile F1 hybrid offspring. Examples include Snow Goose and Ross's Goose (Cooke et al. 1995); Blue-winged and Cinnamon Teal (Bolen 1979); Mallard, Mottled, and Black Ducks
(McCracken et al. 2001); Sharp-tailed Grouse and Greater Prairie-Chicken (Sparling Jr.
1980); Red-naped and Red-breasted Sapsuckers (Johnson and Johnson 1985);
Townsend's and Hermit Warblers (Morrison and Hardy 1983); and the eight species of large white-headed gulls (California, Glaucous, Glaucous-winged, Herring, Iceland,
Lesser Black-backed, Western, Thayer's) (Olsen 2004). These taxa may be in the indeterminate zone between differentiated populations and distinct species (de Queiroz
2005), or well-formed species that are losing genetic identity due to secondary contact and hybridization. Third, some of the pairs with overlapping barcodes may be a single species (Johnston 1961).
Although there is an abundance of subspecific assignments in North American birds (5.5 per species according to one survey), many do not show any evidence of genetic divergence (Zink 2004). Barcode analyses can serve as a quick screening tool for those lineages with deep genetic divergence, aiding detection of overlooked species. In fact, all past barcode surveys have identified new taxonomic units, either as named species, provisional species, evolutionarily significant units (ESUs), or molecular operational taxonomic units (MOTUs) in 4-40% of the species examined (Hajibabaei et al. 2006; Meyer and Paulay 2005; Monaghan et al. 2005; Saunders 2005; Scheffer et al.
2006; Smith et al. 2005). These results suggest that "an iterative process of DNA barcoding... followed by taxonomic study" will be a productive path to cataloging biodiversity (Barber and Boyce 2006). In the present study most provisional species were small to medium-sized, plainly-colored birds, whereas most species with overlapping barcodes were large and/or brightly-colored, which might reflect a natural taxonomic tendency toward undersplitting inconspicuous birds and/or oversplitting more conspicuous species.
Over 30 years ago, Richard Lewontin concluded that intraspecific variation is tightly constrained and recognized that both genetic drift and natural selection offer possible explanations for this fact (Lewontin 1974). Under genetic drift, recent population bottlenecks could account for low intraspecific variation. It might be argued that the low levels of mitochondrial variation detected in our study reflect the unique history of North American birds, most of which have expanded into their present ranges from smaller populations following retreat of glaciers. However, restricted intraspecific mitochondrial variation also exists in many vertebrate and invertebrate species from tropical, temperate, marine, and terrestrial environments (Barrowclough and Shields
1984; Bucklin and Wiebe 1998; Hajibabaei et al. 2006; Meyer and Paulay 2005;
Saunders 2005), implying a more general explanation. Effective population size for nuclear genes can reach an asymptotic limit due to linkage; this effect is strongest for organisms with large genomes, with the result that the effective population size of vertebrates might not exceed 10^4 (Gillespie 2000; Lynch et al. 2006). Although not directly applicable to mitochondria, this effect does reveal the complexities of estimating effective population sizes and predicting the role of drift in scouring variation.
Low mitochondrial variation might alternatively (or additionally) reflect recurrent selective sweeps; repeated diffusions of new, selectively favored variants across the breeding range of a species could purge mitochondrial diversity. Although 98% of the nucleotide differences in COI barcode sequences in our study between nearest neighbors were synonymous, selection on any nucleotide position in the mitochondrial genome would result in the loss of variation in the barcode region because mitochondrial DNA is inherited as a single linkage group, due to its asexual transmission. Mutations in nuclear
or mitochondrial loci important in nuclear-mitochondrial co-adaptation might be particularly important (Catalano et al. 2006). A recent analysis of patterns of substitution in nuclear and mitochondrial DNA concluded that reduced mitochondrial diversity in animals is due to selective sweeps (Bazin et al. 2006). Although these authors found no correlation between census population size and intraspecific mitochondrial variation, the range of variation was less than expected given census population sizes. This latter finding, together with our results showing trends toward increased diversity in larger populations and older species, implies that genetic drift does influence mitochondrial variation, but only weakly.
Most researchers agree that species are a key unit of biological systems, but quarrel about how best to define them. Hence, theoretical and operational species concepts proliferate, each emphasizing different aspects of present-day biology and evolutionary history (Wheeler 2000). Some believe that a basic taxonomic unit does not exist, instead viewing species as a convenient taxonomic construct, "an arbitrary cut-off somewhere along a branch in the tree of life" (Mishler and Shapley 2004). The tight clustering of mtDNA sequences within species observed in our study not only bolsters the view that species are fundamental biological units, but also reveals that their identification is usually uncomplicated.
In summary, most North American bird species appear to have a similar genetic structure, each being a single tight cluster of mitochondrial DNA variants distinct from the clusters of closely-related species. High bootstrap support for species nodes in this study and in other animal groups suggests neighbor-joining analysis of COI barcode sequences will be widely effective (Hajibabaei et al. 2006; Ward et al. 2005). The few
species with higher intraspecific diversity were comprised of two such clusters, many of which appear to represent cryptic species. It seems likely that further study will reveal additional lineages within some species, but leave unchanged the underlying pattern of segregation of mitochondrial diversity into distinct clusters (Zink 2004). Together these observations imply a general constraint on mitochondrial diversity in birds.
Table 1.1. Species with overlapping barcode clusters. The percent similarity between related species (calculated using a K2P distance metric) is provided.

No. Order             Common Name               Scientific Name             n   Similarity (%)
1   Anseriformes      Snow Goose                Chen caerulescens           5   99.8
2                     Ross's Goose              Chen rossii                 2

3                     Black Duck                Anas rubripes               8
4                     Mallard                   Anas platyrhynchos          8   99.4
5                     Mottled Duck              Anas fulvigula              1

6                     Blue-winged Teal          Anas discors                8   100.0
7                     Cinnamon Teal             Anas cyanoptera             2

8                     King Eider                Somateria spectabilis       5   99.7
9                     Common Eider              Somateria mollissima        1

10  Galliformes       Sharp-tailed Grouse       Tympanuchus phasianellus    3   99.7
11                    Greater Prairie-Chicken   Tympanuchus cupido          1
12                    Lesser Prairie-Chicken    Tympanuchus pallidicinctus  5

13  Podicipediformes  Western Grebe             Aechmophorus occidentalis   2   99.7
14                    Clark's Grebe             Aechmophorus clarkii        2

15  Charadriiformes   Laughing Gull             Larus atricilla             8   99.3
16                    Franklin's Gull           Larus pipixcan              4

17                    California Gull           Larus californicus          5   99.8
18                    Herring Gull              Larus argentatus            7
19                    Thayer's Gull             Larus thayeri               4
20                    Iceland Gull              Larus glaucoides            1
21                    Lesser Black-backed Gull  Larus fuscus                5
22                    Western Gull              Larus occidentalis          4
23                    Glaucous-winged Gull      Larus glaucescens           4
24                    Glaucous Gull             Larus hyperboreus           4

25  Piciformes        Red-naped Sapsucker       Sphyrapicus nuchalis        5   99.4
26                    Red-breasted Sapsucker    Sphyrapicus ruber           6

27  Passeriformes     Black-billed Magpie       Pica hudsonia               3   99.6
28                    Yellow-billed Magpie      Pica nuttalli               3

29                    American Crow             Corvus brachyrhynchos       3   99.5
30                    Northwestern Crow         Corvus caurinus             4

31                    Townsend's Warbler        Dendroica townsendi         6   99.5
32                    Hermit Warbler            Dendroica occidentalis      5

33                    Golden-crowned Sparrow    Zonotrichia leucophrys      8   99.7
34                    White-crowned Sparrow     Zonotrichia atricapilla     3

35                    Dark-eyed Junco           Junco hyemalis              24  99.7
36                    Yellow-eyed Junco         Junco phaeonotus            3

37                    Snow Bunting              Plectrophenax nivalis       2   99.9
38                    McKay's Bunting           Plectrophenax hyperboreus   1

39                    Great-tailed Grackle      Quiscalus mexicanus         11  99.2
40                    Boat-tailed Grackle       Quiscalus major             6

41                    Common Redpoll            Carduelis flammea           2   99.7
42                    Hoary Redpoll             Carduelis hornemanni        5
Table 1.2. Provisional species. Provisional splits of recognized species with intraspecific distances above the 2.5% threshold. (*) Identified in earlier study (Hebert et al. 2004); (†) prior research supports split; (‡) prior research cites genetic division but does not support species split (citations provided in table). Bootstrap support for provisional species clusters is shown.

No. Common Name               Scientific Name          Max. Intrasp. Dist. (%)  Bootstrap  Citation (if applicable)
1   Northern Fulmar           Fulmarus glacialis       3.1                      100/-
2   Solitary Sandpiper*       Tringa solitaria         5.4                      100/100
3   Western Screech Owl       Megascops kennicottii    3.1                      100/100
4   Warbling Vireo*†          Vireo gilvus             4.0                      100/99     (Sibley and Monroe 1990)
5   Mexican Jay†              Aphelocoma ultramarina   5.3                      100/-      (Rice et al. 2003)
6   Western Scrub-Jay†        Aphelocoma californica   3.2                      111-       (Rice et al. 2003)
7   Common Raven‡             Corvus corax             4.3                      100/92     (Omland et al. 2000)
8   Mountain Chickadee†       Poecile gambeli          3.7                      100/100    (Gill et al. 1993)
9   Bushtit                   Psaltriparus minimus     3.6                      100/100
10  Winter Wren†              Troglodytes troglodytes  6.4                      100/100    (Drovetski et al. 2004b)
11  Marsh Wren*†              Cistothorus palustris    7.9                      100/100    (Kroodsma 1989)
12  Bewick's Wren             Thryomanes bewickii      4.8                      100/100
13  Hermit Thrush             Catharus guttatus        3.2                      100/100
14  Curve-billed Thrasher†    Toxostoma curvirostre    7.4                      100/-      (Zink and Blackwell-Rago 2000)
15  Eastern Meadowlark*†      Sturnella magna          4.6                      100/100    (Rohwer 1976)
[Figure 1.1: two-by-two matrix crossing RECOGNIZED SPECIES (YES/NO) with DISTINCT BARCODE CLUSTERS (YES/NO). A. Single clusters match species-level taxonomy. B. Multiple clusters suggest cryptic species. C. Overlapping clusters suggest young species, hybridization, or synonymy. D. (Blank).]
Figure 1.1. Comparing barcode sequence clusters with species-level taxonomy. Categories A-C are described in figure; by definition, all potential splits recognized by barcoding have distinct barcodes, so D is blank.
[Figure 1.2: box plot; x-axis: No. individuals analyzed.]
Figure 1.2. Mean intraspecific variation according to number of individuals analyzed. Boxes indicate mean, 25th and 75th percentile; bars, 10th and 90th percentile; and dots, values above or below 90th or 10th percentile, respectively.

[Figure 1.3: two panels. A) x-axis: Census population size; n = 411, r = .26, p < .0001. B) x-axis: Minimum interspecific distance; n = 391, r = .16, p = .002.]
Figure 1.3. Intraspecific distance, population size, and apparent species age. A) Linear regression of mean intraspecific distance and log10 census population size. For illustration purposes, a box plot was generated as described in legend to Figure 1.2. B) Linear regression of mean intraspecific COI distance compared with apparent species age, as indicated by minimum interspecific K2P distance to nearest congeneric relative.

[Figure 1.4: four histograms. A. Mean intraspecific distance: min 0, max 1.59, mean .23, se .01. B. Maximum intraspecific distance: min 0, max 2.24, mean .40, se .02. C. Nearest neighbor distance: min 0, max 16.99, mean 5.91, se .10. D. Nearest neighbor distance, congeners only: min 0, max 14.18, mean 4.30, se .15. Annotations mark provisional species (n=30) and overlapping species (n=42).]
Figure 1.4. Intraspecific and nearest neighbor distances in North American birds. Applying these measures to the data set in the preliminary study (Hebert et al. 2004) gave mean values of 0.24, 0.27, 8.02, and 5.86 for A-D, respectively.

CHAPTER II
Probing evolutionary patterns in Neotropical birds through DNA barcodes
Published in PLoS ONE: Kerr KCR, Lijtmaer DA, Barreira AS, Hebert PDN, Tubaro PL. (2009) Probing evolutionary patterns in Neotropical birds through DNA barcodes. PLoS ONE 4(2): e4379. doi:10.1371/journal.pone.0004379

ABSTRACT
The Neotropical avifauna is more diverse than that of any other biogeographic region, but our understanding of patterns of regional divergence is limited. Critical examination of this issue is currently constrained by the limited genetic information available. This study begins to address this gap by assembling a library of mitochondrial COI sequences, or DNA barcodes, for Argentinian birds and comparing their patterns of genetic diversity to those of North American birds. Five hundred Argentinian species were examined, making this the first major examination of DNA barcodes for South American birds. Our results indicate that most southern Neotropical bird species show deep sequence divergence from their nearest neighbour, corroborating that the high diversity of this fauna is not based on an elevated incidence of young species radiations. Although species ages appear similar in temperate North and South American avifaunas, patterns of regional divergence are more complex in the Neotropics, suggesting that the high diversity of the Neotropical avifauna has been fueled by greater opportunities for regional divergence. Deep genetic splits were observed in at least 21 species, though distribution patterns of these lineages were variable. The lack of shared polymorphisms in species, even in species with less than 0.5M years of reproductive isolation, further suggests that selective sweeps could regularly excise ancestral mitochondrial polymorphisms. These findings confirm the efficacy of species delimitation in birds via DNA barcodes, even when tested on a global scale. Further, they demonstrate how large libraries of a standardized gene region provide insight into evolutionary processes.
INTRODUCTION
DNA barcoding, the survey of sequence diversity in a standard gene region (5' segment of mitochondrial cytochrome c oxidase I, or COI, for animals), has a strong track record for identifying species in varied taxonomic groups (Hajibabaei et al. 2006; Smith et al. 2008; Ward et al. 2005). One particularly comprehensive study of DNA barcodes revealed that 94% of 643 North American bird species possess diagnostic barcode sequences (Hebert et al. 2004; Kerr et al. 2007). Moreover, the few cases where barcode sharing limited taxonomic resolution in this fauna involved closely allied species that often hybridize. Similar results have been obtained from the Palearctic; Yoo et al. (2006) reported that barcodes reliably identify Korean birds (92 of 450 species were examined). There remains a need for similar investigations in the Neotropics, the hotspot for avian diversity with a fauna of 3,370 breeding species including 3,121 endemics (Newton 2000). Aside from this high taxon diversity, tropical species often possess greater genetic structure than their temperate zone counterparts (Bates et al. 1999; Garcia-Moreno et al. 1998; Garcia-Moreno et al. 1999; Hackett 1996). For both these reasons, it has been argued that the Neotropical avifauna will challenge DNA barcoding (Moritz and Cicero 2004). Yet, the only previous test of barcoding in Neotropical birds disagreed with this conclusion as it found that 16 species of the endemic family Thamnophilidae could be discriminated (Vilaça et al. 2006). Clearly, a larger-scale investigation is needed. Moreover, a broad survey of sequence diversity at COI in Neotropical birds permits the analysis of patterns of genetic divergence and geographic distributions of distinct lineages, as well as comparisons with other geographic areas (particularly the Nearctic, where most avian species have already been barcoded (Hebert et al. 2004; Kerr et al. 2007)).
This contribution would be highly valuable for studying diverse aspects of evolution in birds and for detecting species, or groups of species, requiring more detailed investigations of taxonomic status. The Argentinian avifauna includes 980 species, approximately 25% of Neotropical bird species (Mazar Barnett and Pearman 2001; Narosky and Yzurieta 2003). The present study examines patterns of barcode divergence in over half of the bird species native to Argentina. In addition to testing the effectiveness of DNA barcodes for species identification, we explore cases where different species share COI sequences and those where single species include two or more divergent lineages. Finally, and more critically, we analyze patterns of sequence divergence in the birds of Argentina and compare them with those in North America with a view towards understanding the origins of the diversity in South American avifauna and obtaining a broader perspective on mitochondrial genetic variation in New World birds.

METHODS
Most specimens (88%) were collected by the Ornithology Division of the Museo Argentino de Ciencias Naturales "Bernardino Rivadavia" (MACN) between 2003 and 2007, sometimes in collaboration with other institutions. A few additional specimens were donated (6%), confiscated from illegal traders (4%), or obtained from skins of birds collected after 1995 (2%). DNA was usually extracted from frozen samples of pectoral muscle, liver or heart, but a few extractions were from blood (1%) or small pieces of skin/toe pads from museum skins (2%). All samples derive from the tissue collection at the MACN. A voucher is present in the MACN or in another collaborating institution for 99% of the specimens that provided a tissue sample for analysis. While most of these vouchers were study skins, a few were skeletons or specimens in ethanol. In the case of blood samples, birds were photographed prior to release to provide an e-voucher.
All specimens were identified in the field and validated after preparation; taxonomic assignments follow Clements (2007). Only specimens with confirmed species identities were included. Adults were preferred over juveniles and, in the case of species with sexual dimorphism, males were chosen over females. Specimens were examined from all localities with representatives of a species, but no more than three individuals were analyzed from a single location, excepting a few species with particularly high genetic divergence. DNA extracts were obtained using glass fibre columns with vertebrate lysis buffer, and an automated protocol using a Biomek FXP liquid handler (Ivanova et al. 2006). Extracts were eluted in 50 µl of molecular grade water. COI sequences were amplified using the primer pair of BirdF1 (5'-TTCTCCAACCACAAAGACATTGGCAC-3') and COIbirdR2 (5'-ACGTGGGAGATAATTCCAAATCCTGG-3'). When PCR failed and degraded DNA was the suspected cause, internal primers were used in conjunction with those above: AvMiR1 (5'-ACTGAAGCTCCGGCATGGGC-3') and AvMiF1 (5'-CCCCCGACATAGCATTCC-3'). PCR reactions were initially run following the thermal cycling program in Kerr et al. (2007). Later samples used a shorter program which was equally effective: one cycle at 94°C for 1 min; five cycles of 94°C for 1 min, 45°C for 40 s, and 72°C for 1 min; 35 cycles of 94°C for 1 min, 51°C for 40 s, and 72°C for 1 min; and lastly one cycle of 72°C for 5 min. PCR products were visualized on 2% agarose gels (E-gel 96, Invitrogen) and were bi-directionally sequenced on an ABI 3730xl DNA Analyzer. Sequence records were assembled from forward and reverse reads using SEQUENCHER, version 4.5 (Gene Codes) and aligned by eye using BioEdit version 7.0.5.3 (Hall 1999). Specimen and collection data, sequences, and trace files are available in the project 'Birds of Argentina - Phase I' in BOLD (http://www.barcodinglife.org).
Sequences have been deposited in GenBank under accession numbers FJ027014-FJ028607. BOLD process IDs, museum numbers, and GenBank accession numbers for each specimen analyzed are outlined in Appendix 2.1. Comparisons to COI sequences for North American birds employed data available from BOLD in the container project 'Birds of North America - Phase II' (sequences are also available from GenBank under accession numbers AY666171-AY666596, DQ432694-DQ433261, DQ433274-DQ433846, and DQ434243-DQ434805). Sequences were compared using the Kimura 2-parameter distance option (Kimura 1980) in the BOLD Management & Analysis System (Ratnasingham and Hebert 2007). Linear regression was performed using R version 2.5.0 (R Development Core Team 2007). Intra- and interspecific variation were examined visually with neighbour-joining trees generated using the 'Taxon ID Tree' option on BOLD. Bootstrap support using 1000 replicates was calculated using MEGA version 4.0 (Tamura et al. 2007). Sequences of pairs or trios of species with low divergence were analyzed by eye for the identification of diagnostic nucleotides (positions fixed within each species but different between them), which have previously proven to be robust in other species (Kerr et al., unpublished data).

RESULTS
In total, 1,594 sequences were obtained from 500 species representing 51% of the bird species known from Argentina, including 22 of 23 orders and 68 of 81 families (Appendix 2.2 provides a species list). On average, 3.2 individuals (range 1-19) were analyzed per species, with 389 taxa represented by multiple specimens. Only sequences longer than 550 bp with less than 1% ambiguous base calls were included (average sequence length was 692 bp), except four sequences possessing 1-3% ambiguous calls that represented the sole records for their species. The mean sequence distance among congeneric taxa was 7.6%, while the distance to the nearest congener averaged 6.2%, based on 282 comparisons.
In genera represented by multiple species, 92% of species were more than 1.0% divergent from their nearest congener and 83% were more than 2.5% divergent (Figure 2.1). COI delivered a species identification for 98.8% of species, using either a distance-based criterion or, in cases of very low divergence (less than 0.5%), using diagnostic nucleotide substitutions. Mean intraspecific distance was 0.24% based on the 389 species represented by multiple specimens (weighted equally regardless of the number of individuals). There was a weak association between intraspecific variation and sample size (linear regression, p < 0.01, R^2 = 0.12). A few cases of low (<1%) divergence between congeners were detected. In most cases, discrimination by COI was possible because each species formed a distinct cluster in a neighbour-joining tree or showed diagnostic sequence differences. For example, sequences of the mockingbird species Mimus dorsalis and M. triurus showed three diagnostic substitutions, and the woodpeckers Veniliornis frontalis and V. passerinus differed by two nucleotides. Two goldfinch species (Carduelis atrata and C. crassirostris) and three ground-tyrants (Muscisaxicola capistratus, M. frontalis and M. maclovianus) also showed low divergences but were separable. Two ducks, Anas puna and A. versicolor, possessed five diagnostic substitutions, although the latter species showed considerable within-species variation at other nucleotide positions. However, in all these cases, barcodes delivered reliable identifications because sequences for each species formed a single cluster (bootstrap support exceeded 70% in all cases, with the exception of A. versicolor, which had 57% support). Three other ground-tyrants (Muscisaxicola flavinucha, M. cinereus, M. rufivertex) with even lower genetic divergences were paraphyletic, impeding straightforward identification.
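The diagnostic-nucleotide criterion applied to these low-divergence pairs (positions fixed within each species but different between them) can be sketched as a column-by-column scan of an alignment. In this study the comparison was made by eye; the function name `diagnostic_positions` and the toy sequences below are hypothetical, and treating every character (including any ambiguity codes) as a distinct state is an assumption of this sketch.

```python
from typing import List, Sequence


def diagnostic_positions(group_a: Sequence[str], group_b: Sequence[str]) -> List[int]:
    """Return 0-based alignment columns fixed within each group but different between them."""
    if not group_a or not group_b:
        raise ValueError("both groups need at least one sequence")
    length = len(group_a[0])
    if any(len(seq) != length for seq in list(group_a) + list(group_b)):
        raise ValueError("all sequences must be aligned to the same length")
    positions = []
    for i in range(length):
        bases_a = {seq[i] for seq in group_a}
        bases_b = {seq[i] for seq in group_b}
        # Diagnostic: one state per group, and the two states differ.
        if len(bases_a) == 1 and len(bases_b) == 1 and bases_a != bases_b:
            positions.append(i)
    return positions


if __name__ == "__main__":
    # Hypothetical six-site alignments for two species.
    species_a = ["ACGTAC", "ACGTAC", "ACGTAT"]
    species_b = ["ATGTGC", "ATGTGC"]
    print(diagnostic_positions(species_a, species_b))
```

In the toy data, columns 1 and 4 are diagnostic; column 5 is not, because it varies within the first species. Note that with small samples a position can appear fixed by chance, so the reliability of such sites depends on how many individuals are compared.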
Additionally, there was one species complex where barcode resolution was clearly compromised: six species of Sporophila (S. cinnamomea, S. hypochroma, S. hypoxantha, S. palustris, S. ruficollis, S. zelichi) all shared barcodes. Hebert et al. (2004) suggested that a sequence threshold of ten times the average intraspecific variation could be used to identify those cases where a current species might represent more than one taxon. For the birds of Argentina, this threshold lies at 2.4% and its application flags 13 species as possessing unusually high sequence variation (Table 2.1). Another way of identifying species in need of taxonomic scrutiny involves the search for taxa whose specimens form two or more distinct clusters with high bootstrap support (i.e. >98%) in a neighbour-joining tree. If applied to Argentinian birds, eight more species are flagged, all showing maximum intraspecific distances higher than 1.5% (Table 2.1). More than 10% of the Argentine avifauna (111 of 980 species) also occurs in North America, but 45 of these species are migrants that do not breed in Argentina, five are pelagic visitors to both regions, and six are introduced to one or both areas. However, barcode data are available for 42 of the remaining 55 species, which possess natural breeding ranges extending from Argentina to North America (see Appendix 2.3). Seven of these widely distributed species displayed substantial genetic divergence (>2.4%) between North American and Argentinian populations, three displayed smaller divergences (1.5-2.4%), while the remaining 32 showed limited or no divergence.

DISCUSSION
Barcodes in Argentinian Birds
Just nine of the 500 bird species included in our study cannot be distinguished using COI sequences. Three of these are Muscisaxicola ground-tyrants, which have low interspecific divergence and appear to be paraphyletic.
The remaining six species, which all share barcodes, are members of the southern capuchinos, a sub-group within the genus Sporophila that includes nine species (seven of them present in Argentina) and shows little mitochondrial sequence variation (Lijtmaer et al. 2004). Members of this group are believed to have diverged within the past 0.5 million years, fueled by sexual selection and a fragmented landscape, and they are known to hybridize (Lijtmaer et al. 2004). Although shared mitochondrial sequences have also been reported in white-headed gulls (Kerr et al. 2007), in Darwin's finches (Sato et al. 1999), and possibly in crossbills (Edelaar et al. 2003), the present study reinforces earlier evidence that such cases are exceptional in both Nearctic and Neotropical locales despite the known existence of hybridization in birds, which involves around 10% of the world's species (McCarthy 2006). Five additional genera with very low interspecific divergences included a pair or triad of species; none of these cases was a surprise. Mimus dorsalis and M. triurus are regarded as sister taxa (Arbogast et al. 2006), while Veniliornis frontalis and V. passerinus are so similar morphologically that Nores (1992) designated them as allospecies, and Moore et al. (2006) suggested that they diverged only 0.35 Mya. Anas puna and A. versicolor (Johnson and Sorenson 1999) have sometimes been considered conspecific (Rodriguez Mata et al. 2006), but the present results support the conclusion that they are young species. Arnaiz-Villena et al. (1998) suggested a very recent expansion of Carduelis in South America to explain the low genetic divergence between C. atrata and C. crassirostris. Finally, Chesser (2000) proposed a middle-late Pleistocene diversification for Muscisaxicola because of the shallow divergences between its member species.
Further instances of low divergence between species will undoubtedly be revealed as taxonomic coverage builds for Argentina, but there is no evidence that young species radiations are any more common in this region than in North America. Further studies in northern, more tropical areas of South America are needed to establish if this similarity is a consequence of the comparison of two mainly temperate regions in opposite hemispheres or if the same trend is present throughout the Neotropics. Two evolutionary inferences derive from the present results. First, the relative paucity of very closely related species implies that high species diversity in southern Neotropical birds does not owe its origin to an elevated incidence of young species radiations, a finding that is consistent with recent proposals (Weir and Schluter 2004; Weir and Schluter 2007). Second, and more generally, the low variation within species and the fixation of diagnostic COI sequences even in young species groups conflicts with expectations based on stochastic models of mitochondrial variation, which argue that ancestral polymorphisms will persist for millions of generations (Hickerson et al. 2006). Instead, the rapid emergence of fixed differences is compatible with the growing evidence that selective sweeps recurrently strip variation from mitochondrial gene pools (Bazin et al. 2006), although demographic factors have not been ruled out as a possible explanation. Tropical taxa are generally thought to show more genetic structure than their temperate zone counterparts, even in the absence of geographical barriers (Francisco et al. 2007). Work on North American birds revealed deep barcode divergences in 2.7% of species (15/546), all involving allopatric lineages, usually east-west splits. The incidence of deep splits was slightly higher in Argentina with 3.3% of species with multiple records (13/389) showing divergences greater than the 2.4% threshold.
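As a worked illustration of the threshold screen applied above, the following Python sketch computes the 10× threshold from the mean intraspecific distance reported for Argentina (0.24%) and flags species whose maximum intraspecific distance exceeds it. The function names are hypothetical, and the four maximum distances are taken from Table 2.1 purely as sample input; this is a minimal sketch of the rule under those assumptions, not the procedure used to generate the results.

```python
def ten_times_threshold(mean_intraspecific_pct, multiplier=10):
    """Return the screening threshold (%) as a multiple of the mean intraspecific distance."""
    return multiplier * mean_intraspecific_pct


def flag_deep_divergence(max_distance_pct, threshold_pct):
    """Return, alphabetically, the species whose maximum intraspecific distance exceeds the threshold."""
    return sorted(sp for sp, d in max_distance_pct.items() if d > threshold_pct)


# Sample input (assumption: four rows copied from Table 2.1 for illustration only).
max_distance_pct = {
    "Vanellus chilensis": 1.54,
    "Serpophaga subcristata": 2.04,
    "Manacus manacus": 3.56,
    "Cinclodes fuscus": 4.65,
}

threshold_pct = ten_times_threshold(0.24)  # approximately 2.4%
flagged = flag_deep_divergence(max_distance_pct, threshold_pct)
print(flagged)
```

Species falling below the threshold, such as the two with 1.5-2.4% divergence in this sample, are not flagged by this rule and would need the bootstrap-supported cluster criterion to be detected.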
Another eight species possessed distinct barcode clusters with 1.5-2.4% divergence, producing a total of 21 species with marked population structure (5.4% of the species examined). Interestingly, these cases of divergence included situations of allopatric, parapatric and sympatric divergence. Fourteen of the 21 species showed allopatric divergences although there was no simple pattern of geographic structuring. Some cases involved north-south divergence. For example, Patagonian populations of Cistothorus platensis and Cinclodes fuscus possessed almost 5% divergence from those in northwestern Argentina (Figure 2.2b), which is consistent with previous findings (Traylor 1988). Other barcode splits coincided with environmental gradients or known barriers to gene flow. For example, specimens of Upucerthia dumetaria from different elevations in the Andes diverged by as much as 5.4%, while 4% divergence between lineages of Thamnophilus ruficapillus (see Figure 2.2a) coincided with isolation caused by the Chaco woodland (Nores 1992). Some cases of allopatric divergence seem to represent overlooked species; specimens of Serpophaga subcristata from northeastern Argentina and those from Patagonia and Buenos Aires province exhibit not only 2% COI divergence but also differences in morphology and vocalizations (Straneck 1993). Likewise, Troglodytes aedon possessed three COI lineages with divergences as high as 5% and its Neotropical populations are thought to include several species (J. Klicka, unpublished data). In other cases, the situation is unclear. For example, two subspecies of Vanellus chilensis (northern V. c. chilensis, southern V. c. fretensis) show just 1.5% divergence, but this matches the divergence between other closely allied species in the same family (e.g. Charadrius alticola and C. falklandicus). Five of the 21 species with deep divergences involved cases of parapatry.
For example, populations of Vireo olivaceus in northeastern and northwestern Argentina possess up to 3.1% sequence divergence, but both COI lineages occurred at one northeastern site (Figure 2.2c). Does this area represent a region of sympatry between reproductively isolated species or a contact zone between phylogeographic groups? More specimens from this location need to be examined for variation at nuclear loci to resolve this uncertainty (Brumfield 2005). Interestingly, two of the 21 species possessed divergent mitochondrial lineages in sympatry. One of these species, Manacus manacus, includes four colour forms that are sometimes regarded as different species (Snow 2004). Specimens from Parque Nacional Iguazú included two COI groups with 3.5% divergence and males of both lineages were collected from a single lek (Figure 2.2d), suggesting that the divergent COI groups in M. manacus represent a rare case of deep intra-specific divergence. However, further study is required to determine the origin of such genetic variation and the taxonomic status of this species. Furthermore, this sympatric distribution of lineages could prove to be parapatric with increased sampling (the same could theoretically be true for any of the above examples of allopatry if a region of overlap has been left unsampled). This emphasizes the need to collect several specimens per locality, as well as to sample the entire distribution of each species.

Barcoding the avifaunas of Argentina and North America

Although species coverage is higher for North America (643 species, 93% of fauna) than for Argentina (500 species, 51% of fauna), sample sizes are high enough in both regions to provide a good sense of overall patterns of COI variation. Mean intraspecific divergences are congruent: 0.23% in North America and 0.24% in Argentina.
The nearest neighbour distance for congeneric taxa is lower in North America (4.3%) than in Argentina (6.2%), but this regional difference will undoubtedly lessen as species coverage builds for Argentina. Barcode sequences are effective for species identification in both settings; 94% of North American birds and 98% of birds from Argentina can be identified to a species level. The incidence of deep intraspecific divergences is similar in the two regions (2.7% versus 3.3%), but distributional patterns vary. Most of the genetically divergent groups in North America reflect east-west allopatry (Kerr et al. 2007), while divergences in Argentina are more complex; some are north-south, others are east-west, and yet others occur along altitudinal gradients or in response to specific habitat barriers. Moreover, some cases of deep barcode divergence in Argentinian species involve parapatric or sympatric lineages. Aside from a test of congruence in barcode patterns, this study provided information on sequence divergences for 42 species whose breeding range extends from Argentina to North America. Fifteen of the 32 species with low divergence are waterbirds (e.g. herons, rails, cormorants, ducks); their use of coastal habitats facilitates gene flow (Friesen 1997). Seven other species are tropical raptors with limited ranges in North America and whose long-distance movements ensure gene flow (Bildstein 2004). The few small passerines in this group may represent recent range extensions into the southernmost United States. The 10 species with deeper genetic divergences (>1.5%) were largely plain-coloured passerines (Troglodytes aedon, Vireo olivaceus) and birds with cryptic lifestyles (Nyctidromus albicollis, Glaucidium brasilianum). Most possessed a disjunct range, typically with northern migratory and southern non-migratory populations (e.g. Athene cunicularia, Troglodytes aedon, Vireo olivaceus).
Some groups, such as the vireos, are thought to have evolved migratory behaviour on multiple occasions (Cicero and Johnson 1998), switches that might provoke rapid speciation because they isolate breeding populations (Bearhop et al. 2005). The status of all 10 species with deep splits requires further evaluation, but the need for taxonomic revisions has already been suggested in some cases (e.g. Brumfield and Capparella 1996). Aside from revealing cases of geographic divergence, the coupling of North American and Argentinian results revealed two cases of barcode sharing. Parula americana, a species ranging from eastern North America to Central America, shares barcodes with P. pitiayumi, a tropical species whose range extends north to Texas. A recent range expansion from a common ancestor has been proposed as the most likely cause for the low divergence between these species (Lovette and Bermingham 2001). The second case of sequence overlap involves Anas americana, restricted to North/Central America, and A. sibilatrix, confined to the southern cone of South America. While ducks generally exhibit low genetic divergences, these two species possess striking plumage differences. Peters et al. (2005) proposed that rapid phenotypic changes have been provoked by divergent selective pressures in the northern and southern hemispheres.

Conclusions

The taxonomy of Neotropical birds remains largely reliant on dated morphological studies (Prum 1994), but molecular data promise to expedite a newly detailed understanding of this fauna (Fjeldsa et al. 2007). Although levels of genetic differentiation do not dictate taxonomic status (Zink et al. 1995a), barcode analysis illuminates those taxa and those segments of their ranges where further research is justified. Taxonomic decisions cannot be based simply on COI sequences, but barcode surveys are a powerful tool for rapidly identifying those species in need of further investigation.
The occurrence of limited variation between well-known sister taxa suggests that even more cryptic species may persist than a liberal thresholding approach, such as the 10X rule, might indicate (Tavares and Baker 2008). The present study shows the way in which a broad-ranging analysis of sequence diversity in a single gene region can also deliver insights concerning the diversification of faunas as opposed to small groups of species. Interestingly, Argentinian and North American birds showed similar incidences of deep intraspecific divergences and of barcode sharing. The underlying causes of both these situations are of great importance to our understanding of avian speciation. We expect that follow-up investigations of sequence variation at other loci, and studies on morphology, behaviour, vocalization and distributions (e.g. Toews and Irwin 2008) will rapidly advance understanding of the diversity and diversification of the Neotropical avifauna.

Table 2.1. Bird species from Argentina with two or three deeply divergent groups at COI. Species showing more than 2.4% sequence divergence between groups are marked with an asterisk. Maximum distances are reported in percent divergence. Patterns represent allopatry (A), parapatry (P), or sympatry (S).

 #  Family            Species                        Max. distance  Individuals per lineage  Pattern
 1  Charadriidae      Vanellus chilensis             1.54           4/4                      A
 2  Strigidae         Athene cunicularia             1.60           1/5                      A
 3  Dendrocolaptidae  Sittasomus griseicapillus*     3.25           1/6                      A
 4  Furnariidae       Geositta cunicularia*          3.41           1/2                      S
 5                    Leptasthenura aegithaloides*   3.72           2/8                      A
 6                    Cinclodes fuscus*              4.65           3/5                      A
 7                    Upucerthia dumetaria*          5.41           2/10                     A
 8                    Cranioleuca pyrrhophia         1.53           2/4                      P
 9  Thamnophilidae    Thamnophilus caerulescens*     2.44           3/4                      P
10                    Thamnophilus ruficapillus*     4.03           2/2                      A
11  Pipridae          Manacus manacus*               3.56           3/3                      S
12  Tyrannidae        Serpophaga subcristata         2.04           2/3                      A
13                    Myiophobus fasciatus*          4.67           1/3                      A
14                    Knipolegus aterrimus           1.9            3/4                      P
15  Troglodytidae     Cistothorus platensis*         4.95           1/2                      A
16                    Troglodytes aedon*             4.99           3/8/8                    A/P
17  Vireonidae        Vireo olivaceus*               3.09           2/3                      P
18  Thraupidae        Thraupis bonariensis*          3.29           1/6                      A
19  Cardinalidae      Cyanocompsa brissonii          2.04           2/6                      P
20                    Saltator aurantiirostris       1.52           1/6                      A
21  Emberizidae       Arremon flavirostris           1.75           3/4                      A

Figure 2.1. Frequency histograms of COI sequence variation for birds of Argentina. Distance to nearest congeneric neighbour for 282 species from genera represented by multiple taxa (black) and mean intraspecific diversity for the 389 species of birds with two or more sequence records (white).

Figure 2.2. Maps detailing the different distributional patterns of divergent barcode lineages. Species ranges are highlighted in green and circles indicate collection sites. Hollow and filled in circles correspond to lineages represented on superimposed neighbour-joining trees (shaded circles represent sites with overlap). (a) Barcode lineages are allopatric and coincide with disjunctions in the distribution of populations (e.g. Thamnophilus ruficapillus). (b) Barcode lineages are allopatric, but species distribution appears continuous (e.g. Cinclodes fuscus). (c) Barcode lineages are parapatric (e.g. Vireo olivaceus).
(d) Barcode lineages are sympatric (e.g. Manacus manacus).

CHAPTER III

Filling the gap - COI barcode resolution in eastern Palearctic birds

Published in Frontiers in Zoology: Kerr, K.C.R., Birks, S.M., Kalyakin, M., Red'kin, Y., Koblik, E., and Hebert, P.D.N. (2010) Filling the gap - COI barcode resolution in eastern Palearctic birds. Frontiers in Zoology 6: 29. doi: 10.1186/1742-9994-6-29

ABSTRACT

The Palearctic region supports relatively few avian species, yet recent molecular studies have revealed that cryptic lineages likely still persist unrecognized. A broad survey of cytochrome c oxidase I (COI) sequences, or DNA barcodes, can aid on this front by providing molecular diagnostics for species assignment. Barcodes have already been extensively surveyed in the Nearctic, which provides an interesting comparison to this region; faunal interchange between these regions has been very dynamic. We explored COI sequence divergence within and between species of Palearctic birds, including samples from Russia, Kazakhstan, and Mongolia. As of yet, there is no consensus on the best method to analyze barcode data. We used this opportunity to compare and contrast three different methods routinely employed in barcoding studies: clustering-based, distance-based, and character-based methods.

We produced COI sequences from 1,674 specimens representing 398 Palearctic species. These were merged with published COI sequences from North American congeners, creating a final dataset of 2,523 sequences for 599 species. Ninety-six percent of the species analyzed could be accurately identified using one or a combination of the methods employed. Most species could be rapidly assigned using the cluster-based or distance-based approach alone. For a few select groups of species, the character-based method offered an additional level of resolution. Of the five groups of indistinguishable species, most were pairs, save for a larger group comprising the herring gull complex.
Up to 44 species exhibited deep intraspecific divergences, many of which corresponded to previously described phylogeographic patterns and endemism hotspots.

COI sequence divergence within eastern Palearctic birds is largely consistent with that observed in birds from other temperate regions. Sequence variation is primarily congruent with taxonomic boundaries; deviations from this trend reveal overlooked biological patterns, and in some cases, overlooked species. More research is needed to further refine the taxonomic status of some Palearctic birds, but large genetic surveys such as this may facilitate this effort. DNA barcodes are a practical means for rapid species assignment, although efficient analytical methods will likely require a two-tiered approach to differentiate closely related pairs of species.

INTRODUCTION

DNA barcoding employs sequences from a short standardized gene region to identify species (Hebert et al. 2003a). The mitochondrial gene cytochrome c oxidase I (COI) has been firmly established as the core barcode region for animals (Frezal and Leblois 2008) and its performance has been evaluated in birds from several regions, including North America (Kerr et al. 2007), Brazil (Chaves et al. 2008; Vilaca et al. 2006), Argentina (Kerr et al. 2009b), and Korea (Yoo et al. 2006). While most bird species are readily identifiable through morphological traits (Watson 2005), their well-developed taxonomy makes them a valuable group to test the efficacy of barcoding. Additionally, avian taxonomy is not immune to change, and in recent decades DNA evidence has clarified many species boundaries. Broad surveys, such as DNA barcoding, can expedite this process by quickly spotlighting species that merit further taxonomic investigation (Elias-Gutierrez and Valdez-Moreno 2008; Gibbs 2009; Yassin 2008).
This capacity is illustrated by several recently described species that were earlier revealed as divergent lineages during barcode surveys (Areta and Pearman 2009; Barker et al. 2008; Toews and Irwin 2008). Although the avian diversity of the Palearctic is relatively depauperate (Newton 2000) and its taxonomy was stable for decades, modern molecular techniques have spurred the recognition of overlooked species (Knox et al. 2002). These new species were often hidden within morphologically cryptic assemblages, which impeded their discovery (e.g. Illera et al. 2008; Li and Zhang 2004). In other cases, biological species hypotheses could not be tested because divergent populations had allopatric distributions (Friesen et al. 1996; Zink et al. 1995b; Zink et al. 2002b). Molecular analyses continue to illuminate the phylogeographic structure of birds in this region (Drovetski et al. 2004a; Drovetski et al. 2004b; Pavlova et al. 2006; Pavlova et al. 2003; Pavlova et al. 2008; Zink et al. 2006a; Zink et al. 2008; Zink et al. 2002b). A recent barcoding survey of Scandinavian birds by Johnsen et al. (2010) revealed high species resolution plus a few divergent lineages, including some between European and North American populations of trans-Atlantic species. The Atlantic Ocean serves as a relatively impermeable barrier to dispersal for non-pelagic birds (Newton 2000; but see Voelker et al. 2009), but the situation is very different in the eastern Palearctic where intercontinental exchange across the Bering Strait is more frequent (Pavlova et al. 2008; Reeves et al. 2008; Zink et al. 1995b). Johnsen et al. (2010) also highlighted sequence divergences within a few species that failed to correspond to known subspecies or logical geographical patterns - a pattern not observed in a comprehensive survey of Nearctic birds (Kerr et al. 2007).
To determine if this pattern is recurrent, to highlight further cases of cryptic divergences, and to explore general patterns in sequence divergence, we advance COI barcode coverage in this study to include the breeding birds of the eastern Palearctic region, including Russia, Ukraine, Kazakhstan, and Mongolia. Despite the growth of DNA barcode libraries, no consensus has yet emerged on the best method to analyze DNA barcode data (Ferri et al. 2009). Some of the original tools proposed to delimit species using COI sequences, such as neighbour-joining profiles (Barrett and Hebert 2005) and distance thresholds (Hebert et al. 2004), have been criticized by several authors for not realistically addressing the complexity of species boundaries (Baker et al. 2009; Moritz and Cicero 2004; Wiemers and Fiedler 2007; Zhang et al. 2008). More recent tools have gained complexity, incorporating coalescent theory and more elaborate statistical methods, though at the cost of computational time and power (Abdo and Golding 2007; Matz and Nielsen 2005; Zhang et al. 2008). The situation is further complicated by the dual purposes proposed for barcoding: species identification and species discovery (DeSalle 2006). The majority of new generation tools require pre-defined species designations and consequently cannot be used to identify divergent genetic lineages within known groups. Although the use of DNA barcodes to "discover" species is contentious, it is generally accepted that barcode data can be used to flag potentially distinct taxa for further hypothesis testing (Rach et al. 2008). Because the taxonomy of Holarctic birds is relatively mature (Baker et al. 2009), we take this opportunity to compare and contrast some of the more commonly used analytical methods.

METHODS

Sampling

We examined 1,674 individuals representing 398 Palearctic species, with 83% of these taxa represented by multiple individuals.
Species coverage was not uniformly distributed across orders and families due to specimen availability; nearly two-thirds of resident passerines were represented, versus less than 38% of non-passerine birds. We used frozen tissue (typically pectoral muscle) from museum specimens; all but six tissues were linked to vouchered specimens. All tissue specimens originated from either the ornithology collection at the Burke Museum of Natural History and Culture (87.5%) or from the Zoological Museum of Moscow University (12.5%), and were collected in the field during the past 20 years. To capture geographical variation, individuals collected from widely dispersed sites were preferentially sampled for each species whenever possible (see Figure 3.1 for distribution of collecting sites). Additional sequences from North American congeners were also contributed (see below). As a taxonomic reference, we followed Clements (2007), including corrections and updates up to 8 October 2007 with the exception of treating Corvus cornix as conspecific with C. corone (sensu Haring et al. 2007).

Laboratory methods

DNA extraction, PCR, and sequencing reactions follow the procedures described in Kerr et al. (2009b). Only sequences greater than 500 bp and containing fewer than 10 ambiguous base calls were included in analyses. The sequence from one Anas crecca specimen was omitted from analysis due to suspicion that it was actually an A. crecca x A. carolinensis hybrid based on morphology and molecular results. Collection data, sequences, and trace files are available from the project 'Birds of the eastern Palearctic' at http://www.barcodinglife.org. All sequences have also been deposited in GenBank (Accession nos GQ481247 - GQ482920). A complete list of the museum catalog numbers, BOLD process identification numbers, and GenBank accession numbers for each specimen analyzed is included in Appendix 3.1.
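The inclusion criteria stated above (length greater than 500 bp, fewer than 10 ambiguous base calls) can be expressed as a short Python check. This is a minimal sketch: the function name is hypothetical, and it assumes that ambiguous calls are recorded as IUPAC ambiguity codes and that alignment gaps do not count towards sequence length, neither of which is specified in the text.

```python
# IUPAC nucleotide ambiguity codes (anything other than A, C, G, T).
AMBIGUOUS_CODES = set("NRYKMSWBDHV")


def passes_quality_filter(sequence, min_length=500, max_ambiguous=10):
    """Return True if the sequence is longer than min_length bp and has
    fewer than max_ambiguous ambiguous base calls (both limits exclusive)."""
    bases = sequence.upper().replace("-", "")  # assumption: gaps are ignored
    n_ambiguous = sum(base in AMBIGUOUS_CODES for base in bases)
    return len(bases) > min_length and n_ambiguous < max_ambiguous


# Toy sequences for illustration only (not real barcode records).
records = {
    "long_clean": "ACGT" * 150,            # 600 bp, no ambiguous calls
    "too_short": "ACGT" * 125,             # exactly 500 bp, so excluded
    "too_ambiguous": "N" * 10 + "ACGT" * 150,
}
retained = [name for name, seq in records.items() if passes_quality_filter(seq)]
print(retained)
```

Both limits are treated as strict inequalities, following the wording "greater than" and "fewer than".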
We supplemented the data gathered in this study with sequences from North American congeners (accessible from the "Birds of North America - Phase II" project folder at www.barcodinglife.org) to examine divergences within transcontinental species and between sister species pairs. This added 849 sequences from 227 species, of which 66 species were shared with the Palearctic dataset. BOLD process identification numbers and GenBank accession numbers for these sequences are listed in Appendix 3.2. In total, 2,523 sequences from 559 species were included in the analyses.

Data analysis

To assess the discriminatory power of COI barcodes, we compared three different methods commonly deployed in DNA barcoding studies: neighbour-joining (NJ) clusters, distance-based thresholds, and character-based assignment. We avoided more computationally intensive methods in favour of programs that could be executed in real time. For the clustering method, we used MEGA version 3.1 (Kumar et al. 2004) to construct an NJ tree using the Kimura 2 parameter distance model (K2P). More sophisticated tree-building methods exist, but since we are concerned with terminal branches, not deeper branching patterns, this method is sufficient. Support for monophyletic clusters was determined using 500 bootstrap replicates. Species were accepted as being monophyletic provided they comprised the smallest diagnosable cluster with greater than 95% bootstrap support (Felsenstein 1985). Though bootstrap support cannot be determined for species represented by a single sequence, they were included in the analysis to observe if they created paraphyly in neighbouring taxa. Species that could be divided into two or more well-supported clusters were flagged as potentially cryptic taxa. For the threshold-based approach, we blindly grouped sequences into provisional species clusters using a molecular operational taxonomic unit (MOTU) assignment program originally developed for nematodes (Floyd et al.
2002). The program, 'MOTU_define.pl' v2.07 (R. Floyd and M. Blaxter, unpublished; available from http://www.nematodes.org/bioinformatics/MOTU/index.shtml), clusters sequences together based on BLAST similarity using a user-defined base difference cut-off. Rather than use an arbitrary cut-off value, we determined the optimum threshold, or OT (Wiemers and Fiedler 2007), by pooling our new data with the published North American bird dataset (Kerr et al. 2007) and generating a cumulative error plot using all species with multiple representatives (see Figure 3.2). We adopted a liberal threshold of 11 base differences based on this result, which approximately equates to 1.6% divergence. Program parameters only included sequences greater than 500 bp with a minimum alignment overlap of 400 bp; however, this did not exclude any sequences from analysis. For the character-based identification method, we used the character assignment system CAOS, which automates the identification of conserved character states (in this case, different nucleotides) from a cladogram of pre-defined species (Sarkar et al. 2008). The system comprises two programs: P-Gnome and P-Elf (Sarkar et al. 2008). P-Gnome is used to identify the diagnostic sequence characters that separate species and uses them to generate a rule set for species identification; P-Elf classifies new sequences to species using the rule set. We used the programs PAUP v4.0b10 (Swofford 2002) and MESQUITE v2.6 (Maddison and Maddison 2009) respectively to produce the input NJ trees and nexus files for P-Gnome in accordance with the CAOS manual. We executed P-Gnome using several subsets of our data. First, we tried all of the Palearctic species included in this study to determine if diagnostic characters could be identified to separate a wide range of species. The input tree for P-Gnome requires that all species nodes be collapsed to single polytomies, which is an arduous task for large numbers of species.
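To make the notion of a diagnostic character concrete, the Python sketch below scans an alignment for positions at which all sequences of one species share a single nucleotide that occurs in no other species. It is a deliberately simplified illustration and not the CAOS algorithm: it ignores the guide tree, does not build compound rules from several positions, and the function name and toy sequences are hypothetical.

```python
def diagnostic_sites(alignment):
    """Find simple diagnostic positions for each species.

    alignment: dict mapping species name -> list of aligned, equal-length sequences.
    Returns a dict mapping species -> {position (0-based): diagnostic base}.
    """
    length = len(next(iter(alignment.values()))[0])
    result = {species: {} for species in alignment}
    for pos in range(length):
        # Set of bases observed at this position within each species.
        states = {sp: {seq[pos] for seq in seqs} for sp, seqs in alignment.items()}
        for sp, own in states.items():
            if len(own) != 1:
                continue  # variable within the species, so not diagnostic
            others = set().union(*(s for other, s in states.items() if other != sp))
            if own.isdisjoint(others):
                result[sp][pos] = next(iter(own))
    return result


# Toy alignment for illustration only.
toy_alignment = {
    "species_A": ["ACGTAC", "ACGTAC"],
    "species_B": ["ACGTGC", "ACTTGC"],
}
sites = diagnostic_sites(toy_alignment)
print(sites)
```

In the toy alignment only the fifth position (index 4) separates the two species; the third position varies within species_B and shares a state with species_A, so it is not diagnostic.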
We only used a single representative from each species to circumvent this issue, with the drawback that intraspecific variation is ignored during rule generation. To test the character-based method on a finer scale, we ran the program independently on the three largest genera sampled: Emberiza (n=23), Phylloscopus (n=13), and Turdus (n=13). For species with multiple representatives, the shortest sequence was omitted from rule generation and used later to test species assignment. For the first two tests (NJ and MOTU), all species exhibiting type I error, wherein a single species produced two or more discernible clusters of sequences, were compiled. Additional lines of evidence (e.g. alternative genes, morphological differences, song differences, etc.) were sought from previous studies to support or refute the likelihood of species differences in such cases. However, no formal recommendations are made here. We also performed the two-cluster test using Lintre (Takezaki et al. 1995) to determine if sequences from these species had evolved in a clock-like manner. For type II errors, wherein multiple species grouped together to form one well-supported cluster, sequences from each cluster were run through P-Gnome to ascertain if diagnostic characters could be identified that distinguish these close species.

RESULTS

Neighbour-joining clusters

Of the 559 species analyzed, 72 had only a single representative and thus no bootstrap support could be calculated. However, all of these formed independent branches on the NJ tree that did not compromise the identification of other species. The remaining species were categorized into four patterns (Figure 3.3). Ninety percent formed well-supported (>95% bootstrap) monophyletic groups (Figure 3.3a), and an additional 4% were monophyletic but with less than 95% bootstrap support (Figure 3.3b).
Ten species, 2% of the total, were paraphyletic (Larus canus, Thalasseus sandviciensis, Motacilla citreola, M. flava, Saxicola maurus, Sitta europaea, Certhia familiaris, Lanius collurio, L. excubitor, and Pica pica) (Figure 3.3c). The remaining taxa (4%) formed monophyletic clusters that contained two or more species (Figure 3.3d; Table 3.1). These were mostly limited to pairs of sister taxa, with the notable exception of one cluster containing 10 species in the Herring gull complex (Larus californicus, L. fuscus, L. glaucescens, L. glaucoides, L. heuglini, L. hyperboreus, L. occidentalis, L. smithsonianus, L. thayeri, and L. vegae). Forty-two species showed evidence of having divergent lineages (Table 3.2). Twenty-two species formed two or more well-supported (>95% bootstrap) monophyletic clusters. Another four species formed two distinct clusters, but with one cluster possessing only 90-94% bootstrap support. These cases included 7 of the 10 paraphyletic species. In an additional 16 species, a single specimen was divergent from the rest, but further sampling is necessary to adequately evaluate these cases. Table 3.2 lists all species with divergent lineages. The total number of species recognized via this method is difficult to gauge due to inclusion of single representatives for some species and divergent lineages.

Distance-based assignment

The MOTU analysis identified 570 clusters, or taxonomic units, versus the 559 recognized by traditional taxonomy. The similarity of these numbers disguises discrepancies in species assignment. Poor resolution occurred in 22 groups representing 61 species (Table 3.1). These lumped taxa, as with the NJ clustering method, were mostly limited to pairs of species, save for two triplets (Somateria spp. and Turdus spp.) and thirteen large white-headed gulls (Larus canus, L. delawarensis, L. marinus, and the aforementioned members of the Herring gull complex).
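The lumping and splitting reported for the distance-based method follow from how threshold clustering behaves, which the Python sketch below illustrates. It is a hedged approximation, not the MOTU_define.pl program: that program works from BLAST similarity, whereas this sketch counts mismatches between pre-aligned sequences and joins any pair differing by no more than the cut-off (single linkage). The function names and toy sequences are hypothetical, and the inclusive treatment of the cut-off is an assumption.

```python
from itertools import combinations


def base_differences(seq_a, seq_b):
    """Count mismatches at aligned positions where both sequences have A, C, G or T."""
    return sum(a != b for a, b in zip(seq_a, seq_b) if a in "ACGT" and b in "ACGT")


def cluster_by_threshold(sequences, cutoff=11):
    """Single-linkage clustering: sequences differing by <= cutoff bases are joined.

    sequences: dict mapping sequence ID -> aligned sequence.
    Returns a sorted list of clusters, each a sorted list of sequence IDs.
    """
    parent = {name: name for name in sequences}

    def find(name):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(sequences, 2):
        if base_differences(sequences[a], sequences[b]) <= cutoff:
            parent[find(a)] = find(b)

    clusters = {}
    for name in sequences:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Toy 20 bp sequences and a cut-off of 2, for illustration only.
toy = {
    "s1": "A" * 20,
    "s2": "C" + "A" * 19,    # 1 difference from s1
    "s3": "CCC" + "A" * 17,  # 3 differences from s1, 2 from s2
    "s4": "G" * 20,          # 20 differences from every other sequence
}
clusters = cluster_by_threshold(toy, cutoff=2)
print(clusters)
```

In the toy data s3 exceeds the cut-off relative to s1 but is still placed in the same cluster through s2. This chaining is a property of single-linkage clustering and is one way in which closely related species can be lumped into a single unit.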
Divergent groups were recognized in 42 species (Table 3.2); 95% of these overlapped with those recognized via NJ. Most were divided into two clusters, though three or more clusters were detected in five species. In two of the paraphyletic species (Motacilla flava, Lanius collurio), one lineage was lumped with a closely related species while the other lineage was divergent.

Character-based assignment

P-Gnome failed to produce a diagnostic rule set that could distinguish all 398 species sequenced in this study. Results using subsets of the data were more successful. Complete diagnostic rule sets were generated and successfully tested for both Phylloscopus and Turdus. The rule set for Emberiza could not distinguish between sequences of E. leucocephalos and E. citrinella due to their close congruence. In addition, P-Elf failed to correctly identify single sequences from the species E. chrysophrys and E. elegans. The former sequence was short (594 bp) and might have lacked important diagnostic characters. However, the latter sequence was of typical length (694 bp) and only exceptional in that it contained 5 polymorphic sites relative to the sequence used to generate the rule set. Both of these species were incorrectly identified as E. aureola, though this identification would vary if the input tree were altered. Of 22 groups of lumped species, all but five could be resolved using diagnostic characters (see Table 3.1). For example, the species pair Coturnix coturnix and C. japonica possessed 10 diagnostic nucleotide sites, two short of recognition by the MOTU threshold but still easily distinguishable. More complex rule sets were required when more species were involved (e.g. Aythya ducks). The remaining groups featured virtually no variation between species. These include 10 members of the herring gull complex (Larus spp.) and the species pairs Gallinago gallinago/G. delicata, Cuculus canorus/C. optatus, Carduelis flammea/C. hornemanni, and Emberiza citrinella/E.
leucocephalos.

DISCUSSION

Species boundaries in Palearctic birds

Divergence levels between closely related species were highly variable, ranging from approximately 0-16%; however, some of these values may be inflated for under-sampled genera and families. Recent studies have decoupled rate variation in the mitochondrial genome from factors such as population size, body size, and other life-history traits (Bazin et al. 2006; Nabholz et al. 2009; Nabholz et al. 2008). While some authors contend that rate variation in birds is highly irregular (Nabholz et al. 2009), a recent thorough review demonstrated relatively minor variation and upheld the occurrence of clock-like evolution (Weir and Schluter 2008). Consequently, we attribute the limited divergence between some sister species to recent speciation events. Studies documenting recent and rapid diversifications often address subspecific variants rather than full species (Mila et al. 2007a; Mila et al. 2007b). Still, low sequence divergence does not necessarily indicate that species should be synonymised (Joseph and Omland 2009). Low sequence divergence is particularly common in superspecies complexes, including those divided between continents, but the species within them remain valid units for both ecological studies and conservation. Four species pairs and the large white-headed gulls included in this study featured virtually no variation for COI and could not be distinguished using any of the approaches employed in this study. Low divergence in mitochondrial markers had been previously demonstrated in each of these cases. Lumping has been considered for some, including Carduelis flammea/hornemanni (Marthinsen et al. 2008) and the recently split Gallinago gallinago/delicata (Baker et al. 2009), but more evidence is required. The cause of shared mitochondrial haplotypes between Cuculus canorus and C.
optatus has not been resolved (hybrids have never been documented; Sorenson and Payne 2005), but their taxonomic distinction has been asserted based on song differences (Payne 2005). Emberiza citrinella and E. leucocephalos are exceptionally interesting in that they are the most phenotypically distinct of these pairs and a survey of nuclear markers revealed genetic divergence (Irwin et al. 2009). They are known to hybridize extensively and introgression is a likely explanation (Irwin et al. 2009). Species boundaries in the large white-headed gulls may have also been confused by contemporary hybridization, though shallow history and slowed rates of evolution have also been implicated (Crochet and Desmarais 2000; Liebers et al. 2004). Nearly one tenth of the species (7.5%) analyzed in this study contained divergent mitochondrial lineages, with divergences averaging 3.6%. While divergence at a single mitochondrial gene alone is insufficient evidence to define new species boundaries, it is cause for new hypothesis testing. Several recently split species that are morphologically similar to their nearest relative, such as the swallow Riparia diluta or the warbler Locustella amnicola, represent taxa that barcodes would flag for closer scrutiny. Distributions of most of the divergent lineages in this study conform to one of four previously documented phylogeographic trends (summarized in Table 3.2): a unique lineage in the Caucasus region (Hewitt 2000); a unique lineage in the Sakhalin region (Zink et al. 2002a); divergent lineages divided into eastern and western populations (Zink et al. 2008); divergent lineages on either side of the Bering Strait (Zink et al. 1995b). Species with multiple lineages can display more than one of these patterns. A few lineages appear to be parapatric, which could indicate areas of overlap or hybrid zones (Aliabadian et al. 2005).
Past climate change and its effect on historical habitat distribution is likely responsible for shaping patterns of genetic divergence in modern populations, but whether or not these populations were divided by the same historical events is difficult to determine without dating divergence times. While the COI sequences mostly appear to be evolving in a clocklike fashion, dating is risky given the absence of adequate calibration points and the reliance on various assumptions (Pavlova et al. 2008; Weir and Schluter 2008). Most species exhibited surprisingly limited variation between Old World and New World populations. Of the approximately 140 species with Holarctic distributions, 43% are represented in this study. Only 11 of these 61 species (18%) possessed intraspecific divergences great enough to signal likely species-level differences by either the NJ or MOTU method. The Bering Sea has served a variable but clear role as a barrier to gene flow for birds, particularly non-marine species. Several trans-Beringian species have already been split in recent years, due partly to molecular evidence (e.g. Brachyramphus marmoratus/B. perdix (Friesen et al. 1996), Picoides tridactylus/P. dorsalis (Zink et al. 1995b), Pica pica/P. hudsoni (Banks et al. 2000)). Still, caution must be exercised when identifying species boundaries between allopatric populations. For example, one of the Palearctic Lanius excubitor specimens from this study appears to belong to the North American clade, suggesting that some modern exchange might occur between the continents. Though it is more common for Palearctic species to invade the Nearctic, the reverse pattern has also been observed (Zink et al. 2006b). Correct interpretation of this result requires further study with additional specimens. This survey has identified a number of species that demand further taxonomic scrutiny (see Table 3.2). It is likely that some of the divergent lineages identified here represent distinct species.
Of course, genetic distances do not always correspond to species limits (Zink et al. 2006b; Zink et al. 1995b). Alternative explanations for the divergent lineages observed include historical phylogeographic isolation, female-restricted dispersal, or male-biased gene flow (Baker et al. 2009). The common phylogeographic patterns observed in many of the divergent lineages support the idea of historical isolation. Areas of secondary contact must be further studied to evaluate the gene flow between lineages (Moritz et al. 2009). In a few exceptional cases genetic lineages appear largely sympatric, including within Alauda arvensis, Delichon dasypus, and Phoenicurus phoenicurus. Nuclear copies of mitochondrial sequences (numts) are an unlikely explanation given the absence of stop codons and heterozygous peaks. Phoenicurus phoenicurus was also noted by Johnsen et al. (2010), who attributed the aberrant phylogeographic pattern to admixture of historically separated lineages. This situation is paradoxical compared to suspected introgressed genomes used to explain limited divergence in sister species. Selective sweeps are frequently invoked to explain the limited variation observed in mitochondrial markers (Irwin et al. 2009; Kerr et al. 2009b), which raises the question of how two mtDNA lineages manage to persist in one species but not another. Ongoing research into species limits and evolutionary histories is clearly still necessary in the Palearctic.

Methods comparison

The MOTU assignment program used in this study was originally developed for meiofauna with few morphological characters (Floyd et al. 2002). Applying it to a group with better-established taxonomy allows more conclusive tests of its performance. Our results indicated a type II error rate of 10.9%, but this is inflated by the diversity of named white-headed gull species (Larus spp.); with these species eliminated, error is reduced to 8.8%.
At this point, we do not consider type I errors a fault of this method since these cases are biologically interesting, do not necessarily impair identification, and may represent overlooked species (Baker et al. 2009; Hebert et al. 2004). The major drawback to the program in its current form is the difficulty in associating any level of statistical support with species assignments, which may differ slightly depending on the input order of sequences. Although the program does allow a random re-sampling scheme, the output is not summarized, making statistical inference on the stability of taxonomic units virtually impossible. The major impediment now for biologists applying this method to microscopic invertebrates still lies in determining an operational threshold. The use of a distance-based threshold technique has been a major point of contention in the DNA barcoding endeavour (Hickerson et al. 2006; Meyer and Paulay 2005; Moritz and Cicero 2004). While COI variation represents a product of evolution, an arbitrary cut-off value does not reflect what is known about the evolutionary processes responsible for this variation. The threshold approach depends on the existence of a gap between levels of intraspecific variation and interspecific divergence, which opponents argue does not exist. Early success in identifying a "barcoding gap" in North American birds was attributed to insufficient sampling of closely related species (Baker et al. 2009; Moritz and Cicero 2004). We found the original "10x rule" proposed by Hebert et al. (2004) to be too conservative to recognize recently diverged species and opted for a more liberal threshold of 1.6%. While this value was more effective at species identification, some sister species exhibited little or no variation, which eliminates the possibility of identifying a gap. However, invalidating the use of distance-based methods based on the failure of thresholds might be going too far.
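The contrast between the two thresholds discussed above can be sketched in a few lines of Python. This is only a minimal illustration and not the MOTU program itself: the function names and the toy distances are hypothetical, the "10x rule" is assumed here to mean ten times the mean intraspecific distance, and 1.6% is the fixed threshold adopted in this chapter.

```python
from statistics import mean

# Fixed threshold adopted in this chapter (percent sequence divergence).
FIXED_THRESHOLD = 1.6


def ten_x_threshold(intraspecific_distances):
    """Assumed form of the 10x rule: ten times the mean intraspecific distance."""
    return 10 * mean(intraspecific_distances)


def exceeds_threshold(distance, threshold):
    """True when a pairwise distance (percent) is greater than the threshold."""
    return distance > threshold


# Hypothetical mean intraspecific distances (percent) for four species.
toy_intraspecific = [0.25, 0.25, 0.125, 0.375]
strict = ten_x_threshold(toy_intraspecific)

# Hypothetical distances (percent) between three pairs of sister species.
for dist in (1.0, 2.0, 3.0):
    print(dist, exceeds_threshold(dist, FIXED_THRESHOLD), exceeds_threshold(dist, strict))
```

With these toy values, a pair separated by 2.0% is flagged under the fixed threshold but not under the stricter 10x value, which is the sense in which the 10x rule is described above as too conservative for recently diverged species.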
Identifying the nearest matches to a query sequence is still useful, even if a conclusive assignment is not provided (Ratnasingham and Hebert 2007). The development of an NJ profile for identification depends on the coalescence of species and not an arbitrary level of divergence (Wiemers and Fiedler 2007); in theory, species that failed recognition via the threshold approach may still be recognized. However, we found that the same species were typically problematic for both approaches (see Table 3.1). This is not surprising: high bootstrap support is unlikely when a slight aberration in the data would alter the results (Holmes 2003), which is the case when sequences are highly similar. Critics have argued that the bootstrap test for monophyly is simply too conservative and incorrectly rejects monophyly in too many cases (Rodrigo 1993). This is apparent from the 4% of species that appear monophyletic but with limited support. Alternative forms of statistical support based on coalescent theory suggest that increased sampling decreases the risk of monophyly by chance, which would support the reality of these patterns despite low bootstrap values (Rosenberg 2007). A modified NJ algorithm with non-parametric bootstrapping has been proposed to offer fast barcode-based identifications, but success still depends on the completeness of the reference database and weakly divergent species remain problematic (Munch et al. 2008). The character-based method was effective, but did not feature the same scalability as the previous two methods. We found that the CAOS system was severely constrained by limits on the number of species that could be included for rule generation. More thorough benchmarking is necessary to determine the upper limits of the program, but at this point in time they are unclear. We also found that comprehensive sampling for each taxon is vital for accurate rules that account for intraspecific polymorphisms.
When operating with smaller sets of taxa, the programs were successful in both identifying diagnostic characters and in subsequently identifying new sequences to species. However, we did find P-Elf to be highly susceptible to erroneous identifications for unrepresented species, counter to previous claims (Kelly et al. 2007). When using smaller datasets, sequences introduced from novel taxa were typically given a species level identification, even when those taxa derived from a different order (data not shown). Both distance-based and clustering-based methods appear to share the same computational strengths, handling even large datasets quickly. However, both methods are also impaired by the same issues: limited divergence between sister taxa. The results of the character-based method appear to complement the former two methods. While it is precise and able to detect minor differences in closely related taxa (Wong et al. 2009), it is unable to handle large numbers of sequences. It is also susceptible to errors when the appropriate taxa have not been comprehensively sampled. When it comes to species identification, we propose that the best method might actually be a multi-tiered approach, where an initial method is used to narrow the identification to a select group of taxa and an alternate method is used to differentiate similar taxa. Similarly, Munch et al. (2008) recommend incorporating methods that model population level variation to distinguish between closely allied species. For cases of limited divergence, sampling a longer stretch of COI or even alternative genes would increase support for identifications.

Conclusions

The utility of DNA barcodes in avian research is two-fold. Preliminary investigations, such as this, offer fresh insight to aid the ongoing effort to refine avian taxonomy.
Secondly, a comprehensive library of COI sequences provides an invaluable tool for species assignment when differences in morphology are difficult to measure or otherwise assess. This includes species with cryptic morphological differences (e.g. Phylloscopus warblers, Calandrella larks, and Empidonax flycatchers) but also scenarios where identification is desired but only fragmentary remains are available (e.g. air strikes, nest contents, diet analysis, etc.). This study reaffirms these possibilities, demonstrating that COI sequence variation is largely congruent with species boundaries. Departures from this congruence are typically indicative of overlooked biological processes: historically separated lineages in the case of within-species divergence, and recent or historical gene flow in the case of shared haplotypes between species. Molecular analysis is novel for some of these taxonomic groups or geographic areas, and the resultant observations highlight areas in need of further taxonomic study.

The efficacy of DNA barcodes for use in species assignment is dependent on two factors: the construction of thorough COI libraries and efficient tools to assign sequences to species. This study substantiates the need for dense taxonomic sampling. It further demonstrates that standardized gene libraries are easily amalgamated to examine geographically broad areas or taxonomically diverse groups. Current analytical methods for barcode data appear insufficient for handling recently evolved species. Though less of a problem for known cases of shallow divergence, where pairs of species may be further scrutinized using a multi-tiered approach, these cases may be more problematic for those who wish to use barcodes as a tool to accelerate species discovery.

Table 3.1. List of all groups of species that failed recognition via MOTU analysis. Additionally, species with aberrant NJ profiles are indicated; profile designations (a-d) refer to Figure 3.3.
Bootstrap support is given for each species ("nm" denotes that the species is not monophyletic) and the average interspecific distance is given for each group of species, both as percentages. Whether groups could be distinguished via CAOS is also indicated.

No.  Family             Species                         n   NJ  Bootstrap  Inter sp  CAOS
1    Gaviidae           Gavia adamsii                   6   b   38         0.77      Yes
                        Gavia immer                     3       67
2    Phalacrocoracidae  Phalacrocorax pelagicus         9   b   61         0.78      Yes
                        Phalacrocorax urile             1       n/a
3    Ardeidae           Ardea cinerea                   1   b   n/a        1.90      Yes
                        Ardea herodias                  4       99
4    Anatidae           Anas falcata                    1   b   n/a        1.46      Yes
                        Anas strepera                   9       50
5                       Aythya affinis                  9   b   24         1.58      Yes
                        Aythya americana                10      61
                        Aythya collaris                 10      81
                        Aythya fuligula                 3       90
                        Aythya marila                   11      12
                        Aythya valisineria              6       87
6                       Bucephala clangula              1   b   55         1.58      Yes
                        Bucephala islandica             10      87
7                       Somateria fischeri              7   b   94         0.96      Yes
                        Somateria mollissima            10  d   nm
                        Somateria spectabilis           3       nm
8    Phasianidae        Coturnix coturnix               2   a   99         1.50      Yes
                        Coturnix japonica               4       99
9    Accipitridae       Buteo buteo                     3   b   85         1.92      Yes
                        Buteo lagopus                   2       92
10   Scolopacidae       Gallinago delicata              6   d   nm         0.15      No
                        Gallinago gallinago             4       nm
11                      Gallinago megala                2   b   93         0.61      Yes
                        Gallinago stenura               5       98
12   Glareolidae        Glareola pratincola             2   a   99         1.61      Yes
                        Glareola nordmanni              3       99
13   Laridae            Larus canus                     5       89         0.65      Yes
                        Larus canus "brachyrhynchus"    4       77
                        Larus delawarensis              3       50
                        Larus marinus                   3       87
                        Larus spp.†                     34      nm         0.24      No
14   Alcidae            Cepphus carbo                   3       99         0.97      Yes
                        Cepphus columba                 2       99
15   Cuculidae          Cuculus canorus                 5   d   nm         0.71      No
                        Cuculus optatus                 5       nm
16   Motacillidae       Motacilla flava "taivana"       2   b   99         1.16      Yes
                        Motacilla citreola "citreola"   2       87
                        Motacilla citreola "werae"      4       98
17   Turdidae           Turdus naumanni                 9   b   75         1.10      Yes
                        Turdus ruficollis               8       67
18                      Turdus chrysolaus               9   b   97         1.35      Yes
                        Turdus obscurus                 5       67
                        Turdus pallidus                 4       51
19   Laniidae           Lanius isabellinus              3   b   99         1.71      Yes
                        Lanius collurio‡                2       93
20   Fringillidae       Carduelis flammea               10  d   nm         0.40      No
                        Carduelis hornemanni            6       nm
21                      Carduelis pinus                 6   a   99         2.01      Yes
                        Carduelis spinus                15      99
22   Emberizidae        Emberiza citrinella             5   d   nm         0.09      No
                        Emberiza leucocephalos          5       nm

† Represents the ten members of the Herring gull complex listed in the text
‡ Only two of four specimens of the paraphyletic Lanius collurio exhibited limited divergence from L. isabellinus

Table 3.2. List of all species containing divergent COI lineages. An asterisk in the respective column indicates that lineages were supported via the NJ or MOTU method (a question mark indicates undetermined cases). The number of specimens and bootstrap support (%) for each cluster is indicated, as is the mean distance (%) between all clusters within each species.

Species                    NJ  MOTU  n          Bootstrap        Dist  Phyl           Bio  Ref
Falco columbarius          ?   *     1/4        -/99             2.29  P/N            A
Gallinula chloropus        ?   *     1/6        -/99             3.45  P/N            A    1
Charadrius alexandrinus    *   *     4/3        99/99            7.53  P/N            A
Tringa totanus             *         3/3        99/99            0.87  E/W            A    2
Numenius phaeopus          ?   *     5/1        99/-             3.57  P/N            A    3
Limosa limosa              ?   *     4/1        99/-             2.27  E/W            P
Thalasseus sandvicensis    *   *     2/6        98/99            3.78  P/N            A    4
Streptopelia orientalis    ?   *     5/2        99/94            2.14  Sak            P
Asio otus                  ?         4/5        99/94            1.10  P/N            A
Aegolius funereus          ?   *     1/3        -/99             4.13  P/N            A    5
Caprimulgus europaeus      ?   *     3/1        99/-             2.97  Cau            A
Dendrocopos major          ?   *     4/1        99/-             2.71  Sak            A    6
Alauda arvensis            *   *     1/4/5      99/99/99         6.02  E/W, Sak       A/P
Delichon dasypus           ?   *     1/1/2      -/-/[illegible]  3.58                 S
Anthus rubescens           *   *     6/2        99/99            2.46  P/N            A    3
Motacilla flava                *     2/1        87/-             5.57  E/W            A    7
Troglodytes troglodytes    *   *     3/8/1/5/2  99/99/-/99/99    3.70  E/W, Cau, P/N  A    8
Erithacus rubecula         ?   *     6/1        99/-             4.66  Cau            A
Luscinia megarhynchos      ?   *     1/2        -/99             2.56  Cau            A
Muscicapa sibirica         ?   *     6/1        99/-             2.85  Sak            A
Phoenicurus auroreus       *   *     2/3        99/99            2.36  E/W            A
Phoenicurus ochruros       *   *     3/2/1      99/99/-          3.66  E/W, Cau       A
Phoenicurus phoenicurus    *   *     2/4        99/99            5.20                 S    1
Saxicola maurus            ?   *     7/1        99/-             7.91  E/W            A    9
Cettia diphone             *   *     10/2       99/97            3.03  Sak            A
Phylloscopus borealis      *   *     8/6        99/99            3.59  Sak            A    10
Phylloscopus trochiloides  *   *     4/4        99/99            4.39  E/W            A    11
Sylvia curruca             *   *     6/3        99/99            5.56  E/W            A
Urosphena squameiceps      ?   *     4/1        99/-             2.09  Sak            A
Regulus regulus            *   *     7/3        99/99            3.69  E/W            A    12
Parus major                *   *     6/7        99/99            2.59  E/W            A    13, 14
Periparus ater             *   *     8/3        99/99            4.43  Cri            A    1
Sitta europaea             ?   *     1/10/1/1   -/99/-/-         2.91  E/W, Cau, Yak  A    15
Certhia familiaris         ?   *     6/3        93/99            1.93  E/W            A
Lanius excubitor           *   *     2/4        99/99            3.60  P/N            P
Lanius collurio            ?   *     2/2        93/98            2.29  E/W            A
Corvus corone                  *     1/7        -/83             2.15  E/W            A
Corvus frugilegus          *   *     2/2        99/99            2.94  E/W            A
Garrulus glandarius        *   *     4/3        99/99            2.63  E/W            A
Pica pica                  ?   *     1/9        -/99             3.59  E/W            A
Sturnus vulgaris           ?   *     5/1        -/96             1.85  Kaz            A
Pinicola enucleator        *   *     12/2       99/99            4.54  P/N            A
Emberiza pallasi           *   *     4/2        99/99            3.10  Mog            A
Emberiza spodocephala      *   *     8/6        99/99            3.36  Sak            A

Phyl: Phylogeographic patterns (P/N = Palearctic/Nearctic, E/W = east/west, Sak = Sakhalin region, Cau = Caucasus region, Cri = Crimean region, Kaz = Kazakhstan, Mog = Mongolia, Yak = Sakha (Yakutia) region)
Bio: Biogeographic patterns (A = allopatric, P = parapatric, S = sympatric)
Additional references detailing more comprehensive studies are supplied where available: 1 (Johnsen et al. 2010), 2 (Baker et al. 2009), 3 (Zink et al. 1995b), 4 (Efe et al. 2009), 5 (Koopman et al. 2005), 6 (Zink et al. 2002a), 7 (Pavlova et al. 2003), 8 (Drovetski et al. 2004b), 9 (Zink et al. 2009), 10 (Reeves et al. 2008), 11 (Irwin et al. 2001), 12 (Packert et al. 2009), 13 (Kvist et al. 2003), 14 (Pavlova et al. 2006), 15 (Zink et al. 2006a), 16 (Haring et al. 2007), 17 (Akimova et al. 2007).

Figure 3.1. Map of the eastern Palearctic region detailing the collecting sites for all specimens used in this study. Red circles indicate sampling sites. Sampling intensity is indicated by the brightness of each circle.

[Figure 3.2 plot. Vertical axis: cumulative error, up to 100%. Horizontal axis: threshold (% divergence). Series: false negatives, false positives.]

Figure 3.2. Cumulative error plots of type I (false positive) and type II (false negative) errors for different divergence thresholds. Plot is based on 979 Holarctic bird species. The optimum threshold occurs at 1.6% divergence.

[Figure 3.3 tree panels. a) Carduelis flavirostris, Carduelis cannabina. b) Turdus torquatus, Turdus ruficollis, Turdus naumanni. c) Lanius isabellinus, Lanius collurio. d) Emberiza leucocephalos, Emberiza citrinella, Emberiza cia, Emberiza godlewskii, Emberiza cioides. Scale bar: 0.02.]

Figure 3.3. Examples of divergence patterns between closely related species illustrated in the NJ tree: a) Species are monophyletic with >95% bootstrap support, b) Species are monophyletic, but support is weak, c) Species are not monophyletic (i.e. paraphyly occurs), d) Multiple species form a single monophyletic group.

CHAPTER IV

Searching for selection using avian DNA barcodes

ABSTRACT

The barcode of life project has assembled a very large number of mitochondrial cytochrome c oxidase I (COI) sequences. Although these sequences were gathered to develop a DNA-based system for species identification, biological inferences may also be derived from the wealth of data.
Recurrent selective sweeps have been invoked as an evolutionary mechanism to explain limited intraspecific COI diversity, particularly in birds, but this hypothesis has not been formally tested. In this study, I collated COI sequences from previous barcoding studies on birds and tested them for evidence of selection. Using this expanded data set, I re-examined the relationship of intraspecific diversity to both interspecific divergence and sampling effort. I employed the McDonald-Kreitman test to test for neutrality in sequence evolution between closely related pairs of species. Because amino acid sequences were generally constrained between closely related pairs, I also included broader intra-order comparisons to quantify patterns of protein variation in avian COI sequences. Lastly, using 22 published whole mitochondrial genomes, I compared the evolutionary rate of COI against the other 12 protein-coding mitochondrial genes to assess intra-genomic variability. I found no evidence of selective sweeps between closely related species. Contrary to prior studies, I did uncover a weak relationship between intraspecific variation and interspecific divergence. However, most evidence pointed to an overall trend of strong purifying selection and functional constraint. The COI protein did vary across the class Aves, but to a very limited extent. COI was the least variable gene in the mitochondrial genome, suggesting that other genes might be more informative for probing factors constraining mitochondrial variation within species.

INTRODUCTION

The role of selection in the evolution of the mitochondrial genome is the subject of ongoing debate (Ballard and Whitlock 2004; Gerber et al. 2001; Meiklejohn et al. 2007). Variation at mitochondrial genes was long regarded as largely neutral and has been frequently used to infer effective population size and historical demographies based on that assumption.
Consequently, mitochondrial DNA (mtDNA) variation has become a mainstay of molecular ecology and phylogeographic studies (Ballard and Whitlock 2004). More recently, this paradigm has shifted, as an ever-increasing role of selection has been recognized in the evolution of mitochondrial genes (Gerber et al. 2001). The earliest studies to test the expectations of neutrally-evolving mtDNA instead found evidence of selection against mildly deleterious mutations in varied groups of animals including Drosophila (Rand and Kann 1996), mice (Nachman et al. 1994), humans (Hasegawa et al. 1998; Wise et al. 1998), and birds (Fry 1999). In such cases, the observed trend was toward an excess of amino acid polymorphisms within species as compared to amino acid substitutions between species. In contrast, more recent studies have cited evidence of positive selection in the mitochondrial genome, which has been attributed to cyto-nuclear interactions (Ballard and Whitlock 2004). Looking across 26 mammalian taxa, Schmidt et al. (2001) found that the nonsynonymous substitution rate was much greater in gene regions that coded for close-contact residues (i.e. those interacting with nuclear-encoded residues), suggesting positive selection acting at those sites. The above example also illustrates the mitochondrial genome's susceptibility to indirect selection. The mitochondrial genome generally lacks recombination and thus behaves as a single linkage group. In some cases additional genes may also be linked, as in birds where females are the heterogametic sex and linkage extends to the W chromosome (Berlin et al. 2007). This linkage forms the foundation for an emerging view that mitochondrial evolution is governed by recurrent "selective sweeps". A selective sweep, also known as "genetic hitchhiking", occurs when selection acting on one site results in loss of variation from linked sites (Barton 2000).
Unfortunately, the selective sweep hypothesis is difficult to test in the mitochondrial genome because it forms a single linkage group; most tests depend on the comparison of multiple loci (Galtier et al. 2000). Demonstration of selective sweeps occurring in mitochondrial genes has typically been indirect. For example, in a landmark study which examined nearly 3,000 animal species, Bazin et al. (2006) concluded that the genetic diversity of mitochondrial markers was independent of population size, contradicting prior assumptions. They contended that purifying selection could not explain the pattern and that recurrent fixation of beneficial mutations was the most parsimonious explanation. The study generated much debate (Meiklejohn et al. 2007; Mulligan et al. 2006) and was perhaps most soundly criticized for assessing neutrality between distantly related taxa (Wares et al. 2006). The supposed sweeps were actually detected at deep phylogenetic levels, not necessarily between closely related species (Berry 2006). Large-scale analyses of the mitochondrial gene cytochrome c oxidase I (COI) for DNA barcoding studies have invoked routine selective sweeps to explain the consistently low variation observed within species, despite the varying age of species (e.g. Kerr et al. 2007). This violates a rudimentary tenet of neutral theory that intraspecific polymorphism is correlated with interspecific divergence (Hudson et al. 1987). Simulations based on neutral models predicted that more than 4 million generations would be necessary to achieve the degree of COI differentiation observed between closely allied species (Hickerson et al. 2006), but barcode data suggested independence of intraspecific variation and species age (Kerr et al. 2009b; Kerr et al. 2007). Baker et al.
(2009) offered an alternative explanation, arguing that low intraspecific diversity is an artifact of the small number of individuals examined in most barcoding studies and that denser intraspecific sampling would erase this pattern. The problem is compounded by varying methods that can be used to measure intraspecific diversity (e.g. genetic distance, haplotype number, etc.). The number of available COI sequences has increased dramatically with the success of DNA barcoding, particularly for taxa such as birds (Frezal and Leblois 2008). In this study, I take advantage of the expanded avian COI barcode data to more rigorously test for evidence of selection. This includes a reassessment of the relationship of intraspecific variation to both interspecific divergence and sampling effort, tests for neutrality, and a cross-genome comparison of genetic variation.

METHODS

Data collection

Published data were accessed from three public projects in the Barcode of Life Database (BOLD, Ratnasingham and Hebert 2007): "Birds of North America - Phase II" (Kerr et al. 2007), "Birds of Argentina - Phase I" (Kerr et al. 2009b), and "Birds of the eastern Palearctic" (Kerr et al. 2009a). Public data were supplemented with new sequences acquired from 826 specimens representing 113 species of North American birds. The majority of specimens (98%) were represented by feather samples collected from banding stations including several across Canada (Atlantic Bird Observatory and Brier Island Bird Migration Research Station, Nova Scotia; St.
Andrews Banding Station, New Brunswick; Gros Morne National Park Migration Monitoring Station, Newfoundland; McGill Bird Observatory, Quebec; Haldimand Bird Observatory, Long Point Bird Observatory, Prince Edward Point Bird Observatory, and Tommy Thompson Park Bird Research Station, Ontario; Inglewood Bird Sanctuary, Alberta; Mackenzie Nature Observatory, Rocky Point Bird Observatory, and Vaseux Lake Bird Observatory, British Columbia; Albert Creek Bird Banding Station and Teslin Lake Bird Banding Station, Yukon) and a single station in North Carolina, U.S.A. (Appalachian Highlands Science Learning Center of the US National Park Service). The remaining specimens comprised muscle tissue samples from curated collections (including the Canadian Wildlife Service, Royal Ontario Museum, Burke Museum of Natural History and Culture, Museum of Comparative Zoology, Museum of Southwestern Biology, and the Smithsonian Institution National Museum of Natural History). All DNA extraction, PCR, and sequencing methods follow those reported by Kerr et al. (2007) and were performed at the Biodiversity Institute of Ontario, University of Guelph.

Assessing genetic diversity

Only species that were represented in BOLD by 12 or more COI sequences were included for this analysis. Additionally, only sequences greater than 650 bp and with fewer than 7 (= approximately 1%) ambiguous base calls were included. In total, 55 species were used in the analysis and are listed in Appendix 4.1. By nature of the original sampling scheme, specimens were sampled broadly from across the respective range of each species. To quantify genetic diversity, the number of unique haplotypes (h), haplotype diversity (H), and nucleotide diversity (π) were calculated using DnaSP version 5.0 (Librado and Rozas 2009).
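The three diversity measures named above can be illustrated with a short Python sketch. It is not a reproduction of DnaSP: it assumes equal-length aligned sequences with no gaps or ambiguous bases, uses Nei's (1987) estimator for haplotype diversity, and takes nucleotide diversity as the mean proportion of differing sites over all pairs of sequences. The function name and the toy sequences are hypothetical.

```python
from collections import Counter
from itertools import combinations


def diversity_stats(seqs):
    """Return (h, H, pi) for equal-length aligned sequences without gaps."""
    n = len(seqs)
    length = len(seqs[0])
    counts = Counter(seqs)

    # h: number of unique haplotypes.
    h = len(counts)

    # H: Nei's haplotype diversity, n / (n - 1) * (1 - sum of squared frequencies).
    H = n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts.values()))

    # pi: mean proportion of differing sites over all pairs of sequences.
    pair_diffs = [
        sum(a != b for a, b in zip(s, t)) / length
        for s, t in combinations(seqs, 2)
    ]
    pi = sum(pair_diffs) / len(pair_diffs)
    return h, H, pi


# Toy alignment of four hypothetical 4-bp sequences.
toy = ["AAAA", "AAAA", "AAAT", "AATT"]
h, H, pi = diversity_stats(toy)
print(h, round(H, 4), round(pi, 4))
```

Because H and π are both computed from the same sample, they respond differently to sampling effort than the raw haplotype count h, which is one reason all three measures are examined in this chapter.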
The minimum nearest neighbour distance was used to measure interspecific divergence and was calculated using the Kimura 2-parameter metric in the BOLD Management and Analysis System version 2.5 (Ratnasingham and Hebert 2007). Linear regression was used to test the relationship between sampling effort (i.e. the number of specimens included in the analysis) and h, H, and π, respectively, using R version 2.5.0 (R Development Core Team 2007). The Pearson product-moment correlation coefficient was used to test for a relationship between nearest-neighbour distance and h, H, and π, respectively, also using R version 2.5.0 (R Development Core Team 2007).

Neutrality tests of COI variation

To test COI variation for evidence of neutrality, I selected pairs of closely related species that were well populated in BOLD. Congeneric pairs of sister taxa were identified using a neighbour-joining tree generated from BOLD (Ratnasingham and Hebert 2007). Species pairs were only selected when one member of the pair was represented by at least 7 specimens and the other was represented by at least 2 specimens. In total, this included 34 pairs of species representing 10 orders and 29 families of birds (see Table 4.1). The standard McDonald-Kreitman test (available at http://mkt.uab.es/mkt/) was run on each species pair using the vertebrate mitochondrial code (Egea et al. 2008). This test produces a 2 x 2 contingency table of nonsynonymous intraspecific polymorphisms (Pn), synonymous intraspecific polymorphisms (Ps), nonsynonymous fixed differences (Dn), and synonymous fixed differences (Ds). The program also provides the neutrality index, or NI, which is calculated as (Pn/Ps)/(Dn/Ds) (Rand and Kann 1996). Neutrality is supported when NI = 1, whereas NI < 1 implies an excess of amino acid divergence between species, and NI > 1 implies an excess of amino acid polymorphism within species.
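The neutrality index defined above reduces to a single ratio of ratios, sketched here in Python. The function names and the example counts are hypothetical, and returning None when a denominator is zero is an assumption of this sketch rather than the behaviour of the online MKT program.

```python
def neutrality_index(pn, ps, dn, ds):
    """NI = (Pn/Ps) / (Dn/Ds); returns None when the ratio is undefined."""
    if ps == 0 or dn == 0 or ds == 0:
        return None
    return (pn / ps) / (dn / ds)


def interpret(ni):
    """Map an NI value onto the three outcomes described in the text."""
    if ni is None:
        return "undefined"
    if ni > 1:
        return "excess of amino acid polymorphism within species"
    if ni < 1:
        return "excess of amino acid divergence between species"
    return "consistent with neutrality"


# Hypothetical counts for one species pair: Pn = 4, Ps = 8, Dn = 1, Ds = 4.
ni = neutrality_index(4, 8, 1, 4)
print(ni, interpret(ni))
```

In avian COI, where amino acid changes between sister species are rare, Dn is often zero, so an explicit "undefined" outcome is useful when tabulating results across many species pairs.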
Amino acid variation

Because amino acid variation tends to be low between pairs of avian sister species, I also examined amino acid variation in COI at the ordinal level. I selected the 12 best-represented orders from the database (Apodiformes, Anseriformes, Charadriiformes, Ciconiiformes, Columbiformes, Coraciiformes, Falconiformes, Galliformes, Passeriformes, Piciformes, Psittaciformes, and Strigiformes) and then trimmed the database to include only species with at least two full-length sequences (i.e. 694 bp). Species with polymorphisms (n = 43) were removed from the analysis so that only species with fixed differences were included. Single individuals were randomly selected from each of the remaining species pairs to populate the final working dataset (n = 623). Nucleotide sequences were translated to amino acid sequences using Geneious version 3.5 (Drummond et al. 2007). The number of amino acid sequence "types" was tallied for each order (denoted h', analogous to haplotype number). To capture the diversity of amino acid sequence types within each order (i.e. the frequency of each type), I calculated a diversity value (denoted H') using a modified version of Nei's haplotype diversity equation (equation 8.5, Nei 1987),

$H' = \dfrac{n\left(1 - \sum_i x_i^2\right)}{n - 1}$

where n equals the number of species in each order and $x_i$ represents the frequency of the ith amino acid sequence type within each order. The number of amino acid types unique to each order was identified using a neighbour-joining tree generated through BOLD (Ratnasingham and Hebert 2007). To assess the level of intra-order amino acid divergence, I calculated the mean PAM1 matrix scores for each order using MEGA version 4.0 (Tamura et al. 2007). To approximate the position of amino acid substitutions within the COI protein, a consensus sequence was constructed from the 623 amino acid sequences described above.
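As a worked check of the H' equation, the Python sketch below evaluates it for Strigiformes as summarized in Table 4.2 (n = 9 species, h' = 8 sequence types). Eight types among nine species can only arise as one type shared by two species plus seven types found in a single species each, so the type frequencies are fully determined. The function name `sequence_type_diversity` and the type labels are hypothetical.

```python
from collections import Counter


def sequence_type_diversity(type_labels):
    """Return (h', H') given one amino acid sequence-type label per species.

    H' = n * (1 - sum(x_i ** 2)) / (n - 1), where x_i is the frequency of
    the i-th sequence type among the n species of an order.
    """
    n = len(type_labels)
    if n < 2:
        raise ValueError("need at least two species")
    counts = Counter(type_labels)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return len(counts), n * (1 - sum_sq) / (n - 1)


# Strigiformes: one type shared by two species, seven singleton types.
strigiformes = ["t1", "t1"] + [f"t{i}" for i in range(2, 9)]
h_prime, H_prime = sequence_type_diversity(strigiformes)
print(h_prime, round(H_prime, 3))  # 8 0.972, as in Table 4.2
```

The same reasoning applies to Coraciiformes (n = 7, h' = 6), where the equation gives 0.952, in agreement with the tabulated value.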
The consensus sequence was aligned to the bovine sequence using an ends-free local alignment and the BLOSUM62 substitution matrix. The consensus sequence was positioned on the bovine secondary structure, which has been determined via crystallography (Tsukihara et al. 1996), using the alignment as a guide. Residues were identified as either loop or helix sites based on their location in the bovine structure. A Chi-square test with Yates correction was run in R version 2.5.0 (R Development Core Team 2007) to test whether variable sites were equally distributed between the two regions.

COI versus other mitochondrial genes

Whole avian mitochondrial genomes published on GenBank as of 14 December 2009 were surveyed for pairs of congeneric taxa. Two genera (Gallus and Syrmaticus) were represented by more than two species, so the two most closely related species were selected for analysis. In total, 22 complete mitochondrial genomes were downloaded for the 11 available congeneric pairs (see Appendix 4.2). Each of the 13 protein-coding genes was segregated into a separate fasta file and aligned using ClustalW in MEGA version 4.0 (Tamura et al. 2007). An extra base pair was removed from the ND3 sequence at position 9768 from several species to maintain the reading frame. ND6 was analyzed in reverse complement for all species. Pairwise dN/dS ratios were calculated for all genes from all congeneric pairs using the codeml package in PAML version 4.3 (Yang 2007). The dN/dS ratios were transformed prior to statistical analysis using an arcsine square root transformation. A one-way ANOVA was used to test for a difference in dN, dS, and the dN/dS ratios between the different genes. Tukey's honestly significant difference (HSD) test was used subsequently to identify which genes differed significantly. Both tests were performed using R version 2.5.0 (R Development Core Team 2007).

RESULTS

Genetic diversity

The sampling effort ranged from 12 to 34 COI sequences per species.
The nearest neighbour distance varied widely, from 0 to 12.88% (K2P-corrected distance). Figure 4.1 provides scatter plots for all 6 comparisons. Only haplotype number was significantly correlated with sampling effort, although the relationship was weak (r² = 0.20, F1,53 = 13.16, p < 0.001). There was no significant relationship with haplotype diversity (r² = 0.09, F1,53 = 0.42, p = 0.518) or nucleotide diversity (r² = 0.01, F1,53 = 0.01, p = 0.923). However, there was a weak but significant correlation between nearest-neighbour distance and nucleotide diversity (r53 = 0.46, p < 0.001), as well as haplotype diversity (r53 = 0.31, p = 0.019), but there was no significant correlation with haplotype number (r53 = 0.22, p = 0.103).

Neutrality tests

Both polymorphic and fixed nonsynonymous differences were rare between species pairs (see Table 4.1). Consequently, 15% of NI values were zero and 79% were undefined. Only two pairwise comparisons (Empidonax alnorum/E. traillii and Strix occidentalis/S. varia) possessed at least one polymorphic and one fixed nonsynonymous difference, but only Strix had significantly different ratios of polymorphic to fixed differences (χ² = 5.74, p < 0.05, Table 4.1). Overall, Ds was significantly greater than Ps (7.5-fold on average; t39 = -10.49, p < 0.001), whereas Dn did not differ significantly from Pn (t59 = 1.09, p = 0.279). Interestingly, Ps was not correlated with Ds (r32 = 0.14, p = 0.404) but was correlated with sample size (r32 = 0.49, p < 0.01).

Amino acid diversity

Table 4.2 summarizes the results of the diversity tests. Of the 231 residues, 41 were variable among the species examined. Amino acid sequence diversity was relatively high within orders, but the degree of variation (i.e. the number of substitutions) was limited. Highly divergent taxa often shared the same sequence. For example, the same amino acid sequence was recovered from members of the Charadriiformes, Columbiformes, Coraciiformes, and Falconiformes.
The predicted secondary structure of the consensus amino acid sequence is illustrated in Figure 4.2. According to the predicted structure, 78% of residues occur in helix sites and 73% of variable positions were in that region, revealing no association between variation and position in the secondary structure (χ² = 0.46, p = 0.497). Most amino acid substitutions were those known to occur commonly (e.g. isoleucine ↔ valine, alanine ↔ serine, and isoleucine ↔ leucine) (Betts and Russell 2003). Approximately 20% of the amino acid substitutions were observed only within a single species. Confidence in these rare amino acid sequences is increased by the sampling strategy, wherein only fixed amino acid substitutions were included in the analysis. However, two unusual substitutions were observed, both occurring in the last residue and both in Anseriformes: leucine → serine in Netta peposaca, and leucine → phenylalanine in Amazonetta brasiliensis.

Genomic comparisons

Syrmaticus ellioti and S. humiae were too narrowly divergent to be informative (the dN/dS ratio for every gene except ND5 was either 0 or 1), so this species pair was removed from further analyses. The mean values for dN, dS, and the dN/dS ratio are depicted for each gene in Figure 4.3. There was a significant difference in the dN/dS ratios recorded for the thirteen protein-coding genes (F12,117 = 6.40, p < 0.001). A significant difference was also recorded for dN (F12,117 = 3.12, p < 0.001), though not for dS (F12,117 = 0.75, p = 0.699). Post-hoc Tukey HSD comparisons revealed that the ANOVA result for the dN/dS ratios was due to ATP8, which differed significantly from all other genes (p < 0.01), and COI, which differed from ND3 (p < 0.05). Similarly, the difference in dN occurred between ATP8 and each of COI, COII, COIII, and Cyt B (p < 0.01).

DISCUSSION

The present results confirm the prediction of Baker et al.
(2009) that the number of rare haplotypes encountered will increase with sampling effort, as is fairly intuitive. However, because intraspecific divergence remains relatively low between most haplotypes, the mean intraspecific variation is nearly unaffected by sampling effort. There was a relationship between haplotype diversity and interspecific divergence and, in contrast to Kerr et al. (2007), I did find a weak relationship between intraspecific variation and interspecific divergence as well. This discrepancy may be partly due to their treatment of divergent mitochondrial lineages as "provisional species", which would reduce both intraspecific variation and interspecific divergence in some of the species included in this study, such as Vireo gilvus or Troglodytes troglodytes, among others. This difference aside, the relationship is suggestive of demographic effects (i.e. bottlenecks), rather than selection.

Attempts to test neutrality were impeded by the lack of amino acid sequence variation both within and between species, which is a common problem with this method (Meiklejohn et al. 2007). The McDonald-Kreitman test is susceptible to error when taxa are distantly related and multiple substitutions at single sites are likely, but amino acid variation is rarer when taxa are closely related (Ballard and Whitlock 2004). Lack of amino acid variation undermines the utility of the neutrality index, as it results in either a zero value (when polymorphisms are absent) or infinity (when divergence is absent). Other studies have circumvented this issue with the simple, yet questionable, practice of substituting zeros with arbitrary values (e.g. Bazin et al. 2006; Rand and Kann 1998).

Previous studies examining the neutrality of variation in avian mitochondrial genes have generally proposed a model of mildly deleterious mutations (e.g. Fry 1999).
Zink (2005) found that members of the passerine genus Parus exhibited an excess of nonsynonymous polymorphisms in closely related species and cited purifying selection as the cause after ruling out demographic effects. However, the same author (2006a) could not reject neutrality when examining phylogroups of the polymorphic species Sitta europaea, suggesting that drift was largely responsible for genetic differences between budding species. The taxonomic scope of this study was much broader than that of its predecessors, and the general pattern appeared to be one of functional constraint. While the only calculable NI values were greater than one, it would be misleading to describe the sequences as bearing excess amino acid polymorphisms since amino acid variation was generally rare. In either case, the pattern is suggestive of purifying selection.

Variation in the amino acid sequence was rare between closely related species, but there was substantial variation when broader taxonomic comparisons were made. Most positions in the amino acid sequence (82%) are conserved across all birds. While variable sites were more numerous within helix sites, they were proportionately equal between helix and loop sites. This is inconsistent with previous studies that have observed differing selective pressures on surface (primarily loops) and transmembrane sites, with substitutions in transmembrane sites being more heavily constrained by interaction effects with other residues (Wang and Pollock 2007), and a greater tendency towards neutral evolution within surface sites (Wise et al. 1998). However, these inferences presuppose the accuracy of the predicted secondary structure, which unfortunately is difficult to verify.

Mapping these changes onto a phylogeny is challenging as our current understanding of evolutionary relationships amongst the major avian orders is in flux (Hackett et al. 2008).
However, looking at variability within orders, it is clear that some amino acid substitutions are recurrent (e.g. isoleucine ↔ valine at position 12), whereas other substitutions have a single origin (e.g. glycine → serine at position 117 in Strigiformes). Small changes to proteins can have an adaptive impact. For example, adaptive changes to haemoglobin proteins in high-altitude geese have been attributed to four mutations in the bar-headed goose, Anser indicus (Liang et al. 2001), and a single mutation in the Andean goose, Chloephaga melanoptera (Hiebl et al. 1987). For the majority of the amino acid substitutions observed here, an adaptive explanation seems unlikely, especially when amino acid sequences are shared between very divergent taxa. Co-evolution within the gene has been demonstrated for proximal residues in vertebrates (Wang and Pollock 2007), but this too seems an unlikely explanation given the independent origins of the sequences. It is more likely that most of the observed amino acid substitutions are low-impact changes that escape purifying selection, particularly since linkage between genetic loci is known to reduce the effectiveness of purifying selection (Paland and Lynch 2006).

Selection in mitochondrial genes has been attributed to co-evolution with nuclear-encoded genes. In marine copepods, inter-population hybrids have shown reduced mitochondrial function and, subsequently, reduced fitness (Burton et al. 2006). Reduced function has also been demonstrated in cybrid cells that combine human nuclear DNA with mtDNA from other primates (Kenyon and Moraes 1997). In birds, the fitness costs to hybrids are less clear. Empirical studies have revealed an effect on the metabolic rate of hybrids between divergent populations of an Old World passerine, Saxicola torquata spp. (Tieleman et al. 2009).
Despite a measurable impact on metabolic rate, the overall fitness cost is uncertain and could still depend on the magnitude of mitochondrial mismatch between taxa. Complete mitochondrial introgression has been demonstrated between certain avian sister species, such as the Palearctic buntings Emberiza citrinella and Emberiza leucocephalos (Irwin et al. 2009), which is one situation that does lend support to the selective sweep hypothesis. Introgression has also been ascribed to selective processes in non-avian species, including salmonids (Wilson and Bernatchez 1998) and Drosophila (Bachtrog et al. 2006). Conversely, introgressed mitochondrial haplotypes from gray wolves, Canis lupus (Lehman et al. 1991), and domestic dogs, Canis familiaris (Adams et al. 2003), have been recovered from populations of coyote, Canis latrans, but neither of these has led to fixation. The consequence of mitochondrial substitutions between recently diverged species appears erratic, but current data suggest that divergence is driven mostly by drift and less often by selection.

An important consideration for this study is how well DNA barcode data reflect general trends in the mitochondrial genome. The barcode region has in fact been previously used as a predictor of variation in nucleotide composition across the mitochondrial genome (Clare et al. 2008; Min and Hickey 2007), but that is not to say that mutation rates cannot vary between mitochondrial genes. The number of replacement sites occurring in the "barcoding region" of COI was not significantly different from that of the rest of the COI gene (K. C. R. Kerr, personal observation), which confirms that DNA barcodes provide an overall representation of COI. Across the genome, COI has been known for its conservative substitution rate (Lynch and Jarrell 1993).
A comprehensive summary of substitution rates in vertebrate mitochondrial genomes suggested that the rate increases with distance from the origin (Broughton and Reneau 2006). Observing this pattern within birds is hindered because the origin of replication for the light strand is as yet undetermined in the avian mitochondrial genome (Desjardins and Morais 1990). Regardless, this pattern was not apparent in this study, given that divergence rates between genes did not differ significantly.

Conclusions

Overall, I found no clear evidence for recurrent selective sweeps. While barcode data do not match neutral predictions, the impression is that evolution in mitochondrial genes, and COI in particular, is largely governed by purifying selection. Nucleotide divergence between closely related species appears mostly attributable to drift. Because of the linkage of the entire mitochondrial genome, it is challenging to study evolution without examining the entire genome. As large-scale DNA sequencing becomes more accessible, there will be growth in the number of sequenced whole mitochondrial genomes. Currently, no avian species is represented by more than one mitochondrial genome in GenBank, but intraspecific mitochondrial genomic variation has yielded insights into the evolutionary process for other organisms, such as gadine fish (Marshall et al. 2009) and humans (Mishmar et al. 2003).

Table 4.1. Thirty-four species pairs included in McDonald-Kreitman tests for neutrality of COI variation in birds. Sample size for each species is indicated in parentheses. Acronyms are for nonsynonymous intraspecific polymorphisms (Pn), synonymous intraspecific polymorphisms (Ps), nonsynonymous interspecific fixed differences (Dn), synonymous interspecific fixed differences (Ds), and neutrality index (NI). Ps, Ds, Pn, and Dn are uncorrected values. A sample size shown as (?) is illegible in the source copy.

Species 1 (n)                     Species 2 (n)         Pn  Ps  Dn  Ds  NI     χ²     p
Lagopus muta (21)                 L. leucura (5)        0   6   2   38  0      0.31   0.575
Phalaropus lobatus (11)           P. fulicarus (2)      1   2   0   34  null   11.65  0
Actitis macularius (8)            A. hypoleucos (5)     0   0   0   64  null   null   null
Brachyramphus brevirostris (7)    B. marmoratus (2)     0   7   0   31  null   null   null
Gallinago gallinago/delicata (9)  G. paraguaiae (3)     1   4   0   21  null   4.38   0.036
Larus ridibundus (8)              L. philadelphia (4)   0   1   0   15  null   null   null
Stercorarius longicaudus (7)      S. parasiticus (4)    0   5   0   37  null   null   null
Thalasseus sandvicensis (8)       T. elegans (5)        0   6   0   10  null   null   null
Phalacrocorax pelagicus (11)      P. penicillatus (5)   0   3   0   37  null   null   null
Puffinus pacificus (7)            P. bulleri (6)        0   4   0   22  null   null   null
Strix occidentalis (7)            S. varia (4)          1   3   1   51  17     5.74   0.016
Megascops asio (9)                M. kennicottii (5)    0   4   1   37  0      0.11   0.742
Chaetura vauxi (8)                C. pelagica (2)       0   3   0   14  null   null   null
Falco sparverius (7)              F. tinnunculus (3)    0   8   1   53  0      0.15   0.697
Picoides villosus (10)            P. albolarvatus (?)   0   14  0   18  null   null   null
Zenaida macroura (8)              Z. auriculata (8)     0   8   0   15  null   null   null
Columbina passerina (8)           C. talpacoti (4)      0   3   0   35  null   null   null
Empidonax alnorum (8)             E. traillii (4)       2   3   2   13  4.333  1.67   0.196
Leptasthenura aegithaloides (8)   L. fuliginiceps (3)   0   5   1   37  0      0.13   0.713
Phacellodomus ruber (8)           P. striaticollis (3)  0   2   0   21  null   null   null
Phytotoma rutila (7)              P. rara (3)           2   3   0   59  null   24.36  0
Nucifraga caryocatactes (9)       N. columbiana (8)     0   10  0   35  null   null   null
Poecile montana (20)              P. palustris (11)     1   12  0   38  null   2.98   0.084
Cinclus mexicanus (7)             C. cinclus (5)        0   7   2   51  0      0.27   0.601
Locustella certhiola (7)          L. ochotensis (7)     0   10  0   30  null   null   null
Turdus viscivorus (8)             T. philomelos (4)     0   14  0   54  null   null   null
Seiurus noveboracensis (24)       S. aurocapillus (14)  0   21  0   47  null   null   null
Sicalis luteola (7)               S. flaveola (6)       3   7   0   55  null   17.30  0
Melospiza melodia (27)            M. lincolnii (26)     2   7   0   18  null   4.32   0.037
Emberiza aureola (11)             E. rustica (9)        1   11  0   34  null   2.90   0.088
Paroaria capitata (7)             P. coronata (5)       0   1   1   31  0      0.03   0.857
Molothrus bonariensis (10)        M. ater (9)           3   4   0   16  null   7.89   0.004
Fringilla montifringilla (12)     F. coelebs (4)        0   7   0   49  null   null   null
Passer domesticus (17)            P. montanus (11)      1   11  0   37  null   3.15   0.076

Table 4.2. Summary of COI amino acid variation for the 12 orders of birds examined. The number of amino acid sequence types (h'), the diversity of amino acid sequence types (H'), and the percentage of types unique to each order are outlined below. The mean intra-order PAM score indicates the level of amino acid divergence.

Order            n    h'   H'     Uniqueness  PAM (±s.d.)
Ciconiiformes    9    6    0.833  50%         0.00778 (±0.00355)
Anseriformes     30   9    0.662  100%        0.00507 (±0.00215)
Falconiformes    15   9    0.848  93%         0.01199 (±0.00443)
Galliformes      11   8    0.891  100%        0.01113 (±0.00488)
Charadriiformes  84   10   0.361  50%         0.00196 (±0.00076)
Columbiformes    14   8    0.901  50%         0.00689 (±0.00301)
Psittaciformes   9    7    0.944  71%         0.00817 (±0.00331)
Strigiformes     9    8    0.972  100%        0.01817 (±0.00562)
Apodiformes      15   9    0.924  89%         0.01084 (±0.00341)
Coraciiformes    7    6    0.952  71%         0.01724 (±0.00526)
Piciformes       22   4    0.403  75%         0.00194 (±0.00112)
Passeriformes    398  78   0.903  95%         0.01458 (±0.00490)
Total            623  148  0.944

Figure 4.1. Scatter plots relating sampling effort of avian COI sequences (n) to A) haplotype number (h), B) haplotype diversity (H), and C) nucleotide diversity (π), and K2P nearest neighbour distance (NN) to D) haplotype number, E) haplotype diversity, and F) nucleotide diversity.

Figure 4.2. Predicted secondary structure for the avian consensus sequence of the "barcoding region" of COI based on the structure derived from bovine cytochrome c oxidase. Letters indicate the consensus amino acid sequence based on the one-letter code. Black circles are conserved sites. Variable sites vary from white to gray based on the percentage of sequences containing the consensus amino acid.
The number of different amino acids occurring at a single site is represented by the thickness of the outline.

Figure 4.3. Box plots of dN (yellow), dS (blue), and the dN/dS ratios (green) for each of the thirteen protein-coding mitochondrial genes from the 22 species listed in Appendix 4.2. Genes are ordered to match the arrangement in the avian mitochondrial genome (ND1, ND2, COI, COII, ATP8, ATP6, COIII, ND3, ND4L, ND4, ND5, CytB, ND6).

EPILOGUE

GENERAL CONCLUSIONS

In the introduction of this thesis, I presented two potential uses for DNA barcoding: species identification and species discovery. In many groups of organisms, species identification is a daunting task, often requiring expert opinion to decipher otherwise unintelligible keys (hence the appeal of DNA barcoding). Thanks to an abundance of meticulously illustrated field guides and monographic tomes, species identification in birds is far more approachable. Complete checklists of the birds of the world are tended competitively (e.g. Clements 2007; Gill and Wright 2006), providing a reasonably accurate view of total species numbers. Armed with this knowledge, I have demonstrated the efficacy and the limitations of the DNA barcode-based approach to species identification in three regional, mostly temperate bird faunas. The comprehensive library of COI sequences amassed for North American birds revealed that most species (94%) could be accurately identified. By expanding the database to include the birds of the southern Neotropics and the eastern Palearctic, where climatic histories differ, I was able to demonstrate that the success with North American birds was not a simple artifact of glacial bottlenecks (Hughes and Hughes 2007), as similarly strong performance was revealed in the Palearctic and Neotropical regions.
Some underlying level of intraspecific variation does exist in COI, so it is important to clearly define a method by which species may be delimited so that they correspond accurately with taxonomic boundaries. The threshold approach, often represented by the "10x rule" (Hebert et al. 2004), oversimplifies this process. In Chapter 3, while distance-based and tree-based methods performed equally well, I demonstrated that a multi-tiered approach is likely the best solution to this problem.

The final COI dataset provided significant representation of a large number of closely related species, including several pairs of sister taxa, wherein conflicts in resolution were most likely to occur (Moritz and Cicero 2004). Groups that were indistinguishable using DNA barcodes alone most frequently involved pairs of closely related species where it is suspected that insufficient time had passed to allow COI sequences to diverge. That these occurrences were not uniformly distributed across the avian class (e.g. frequently occurring within Anseriformes) suggests that rate variation is important. Introgression has also been pinpointed as an agent for reducing variation between species (Irwin et al. 2009). However, if gene flow does not continue between species, differences should accumulate over time, as seen in Stercorarius pomarinus (Andersson 1999). In a few cases, species limits may genuinely require revision (e.g. Gallinago delicata, Baker et al. 2009).

In two exceptional genera, Sporophila and Larus, a plethora of species appear as a single genetic mixing pot. Attempts to distinguish the Sporophila species using alternative genes have also failed (Campagna et al. 2009). Similar assemblages in other taxa are often united as single species (Mila et al. 2007a; Mila et al. 2007b); oddly, in the gulls, systematic effort appears to be working in the reverse direction, with new species being recognized (Banks et al. 2008).
However, a slower mutation rate has been proposed as a cause for the limited genetic diversity of gulls (Crochet and Desmarais 2000). While the gulls may pose a seemingly insurmountable challenge to molecular identification methods, some of the other narrowly divergent taxa could be separated using a character-based method where thresholds failed, but this does require an a priori definition of species limits (DeSalle et al. 2005).

The topic of species limits segues to the second use of DNA barcodes: species discovery. This has been by far the most contentious application of DNA barcoding (DeSalle 2006; DeSalle et al. 2005; Hickerson et al. 2006; Moritz and Cicero 2004; Will et al. 2005). First, it must be clarified that DNA barcoding is not intended to replace traditional taxonomy (Gregory 2005); rather, its benefit to the taxonomic enterprise is additive (Padial and de la Riva 2007). For example, divergent lineages noted within species in Chapters One and Two have since been acknowledged as different species in more thorough taxonomic assessments (Areta and Pearman 2009; Barker et al. 2008; Toews and Irwin 2008). While not every divergent mitochondrial lineage will lead to the naming of new species, the findings are sufficient to warrant further investigation and to test new hypotheses about species boundaries. This "first-pass" approach has been bolstered elsewhere, even where molecular evidence has not been applied, and its benefit to conservation efforts has been acknowledged (Peterson 2006). So, while a single mitochondrial gene may not provide adequate evidence to describe new species, the application of DNA barcodes to flag taxa in need of further review may still expedite the species discovery process, alleviating the taxonomic impediment.

Divergent mitochondrial genes have recently been proposed as a cause of speciation rather than as a consequence thereof (Gershoni et al. 2009), but others, such as Jerry Coyne, have dismissed this notion (Lane 2009).
In Chapter Four, I revealed that COI protein differences between closely related species were very slight, suggesting that drift is more likely responsible for interspecific divergence. However, my data were also concordant with previous findings that COI diversity is lower than that of other mitochondrial genes. While cyto-nuclear interactions have been implicated in the evolution of cytochrome c oxidase (Schmidt et al. 2001), my data suggest that residues in genes other than COI might contribute more to such compatibility issues.

The avian DNA barcode library has already served in forensic applications, including the identification of food products, feathers from anthropological artifacts, nest host species, and birds involved in "airstrike" collisions (Dove et al. 2008). While this demonstrates practical applications of the barcode library for bird identification, the method is arguably most germane to microscopic taxa and those with cryptic life stages. The question remains: how well can the results of this study be extrapolated to other groups of organisms? The time lag between speciation and hybrid inviability is known to persist longer in birds than in other vertebrates (Fitzpatrick 2004). This could result in a deceleration of the mitochondrial DNA substitution rate, which is in fact well documented in birds (Kessler and Avise 1985). This would also explain the susceptibility of avian species to mitochondrial introgression. However, alternative explanations for this reduced genetic variability exist, including Hill-Robertson effects associated with the maternally inherited W chromosome (Berlin et al. 2007), a lower output of reactive oxygen species (Hickey 2008), a decreased tolerance of amino acid substitutions (Stanley and Harrison 1999), and an accelerated rate of morphological evolution (Johns and Avise 1998).
Watson (2005) contended that birds were more likely than other animal species to be described based on attributes that allow field diagnosis, which could result in the oversight of evolutionarily significant units. While this thesis demonstrates that this is occasionally true, molecular features are now routinely employed in avian taxonomy (Collar and Spottiswoode 2005) and the rate of 'cryptic species' discovery in birds is considered on par with that observed in other organisms (Pfenninger and Schwenk 2007). Given these known differences, it is reasonable to extrapolate the findings of this study to other vertebrate groups, with the caveat that low sequence divergence between sister species might be less severe in other groups, but higher intraspecific diversity might also be observed. Extending these findings to invertebrate taxa could provide a greater challenge given the larger degree of life history differences.

Proponents of using nuclear loci for avian phylogenetics and other molecular investigations now have a profusion of primer pairs at their disposal for use in multilocus studies (Edwards 2008). While this is an exciting advancement, particularly for population-level studies, supremacy is still touted for mitochondrial genes when it comes to species delimitation (Zink and Barrowclough 2008). While many former beliefs about the evolutionary properties of mitochondrial markers have been falsified, they continue to serve a valued purpose as our understanding is refined (Galtier et al. 2009). Additionally, molecular data are on the brink of a new era of accessibility, thanks to advances in sequencing technology (Ellegren 2008). Consequently, large-scale genomic studies will proliferate and whole mito-genomic sequences will become increasingly available, enabling a greater understanding of mitochondrial evolution (Zardoya and Suárez 2008).
This will allow more accurate biological inferences to be made from the growing pool of mitochondrial genetic data, such as that resulting from the Barcode of Life project.

Ultimately, the development of a comprehensive library of avian DNA barcodes has been reciprocally illuminating. Avian taxonomy has provided a gold standard against which to test the efficacy of DNA barcoding. In exchange, departures from expected patterns have yielded new insight into taxonomic boundaries. Birds cannot be the sole test of the DNA barcoding paradigm, but alongside other global taxonomic campaigns such as FISH-BOL (Ward et al. 2009) and the All-Leps campaign, we can garner an unbiased appreciation of the power and limitations of this taxonomic tool.

REFERENCES

Abdo Z, Golding GB (2007) A step toward barcoding life: A model-based, decision-theoretic method to assign genes to preexisting species groups. Systematic Biology 56:44-56
Adams JR, Leonard JA, Waits LP (2003) Widespread occurrence of a domestic dog mitochondrial DNA haplotype in southeastern US coyotes. Molecular Ecology 12:541-546
Akimova A, Haring E, Kryukov S, Kryukov A (2007) First insights into a DNA sequence based phylogeny of the Eurasian Jay Garrulus glandarius. Russian Journal of Ornithology 16:567-575
Aliabadian M, Roselaar CS, Nijman V, Sluys R, Vences M (2005) Identifying contact zone hotspots of passerine birds in the Palearctic region. Biology Letters 1:21-23
Andersson M (1999) Hybridization and skua phylogeny. Proceedings of the Royal Society of London Series B-Biological Sciences 266:1579-1585
Arbogast BS, Drovetski SV, Curry RL, Boag PT, Seutin G, Grant PR, Grant BR, Anderson DJ (2006) The origin and diversification of Galapagos mockingbirds. Evolution 60:370-382
Areta JI, Pearman M (2009) Natural history, morphology, evolution, and taxonomic status of the earthcreeper Upucerthia saturatior (Furnariidae) from the Patagonian forests of South America. Condor 111:135-149
Arnaiz-Villena A, Alvarez-Tejado M, Ruiz-del-Valle V, Garcia-de-la-Torre C, Varela P, Recio MJ, Ferre S, Martinez-Laso J (1998) Phylogeny and rapid Northern and Southern Hemisphere speciation of goldfinches during the Miocene and Pliocene Epochs. Cellular and Molecular Life Sciences 54:1031-1041
Avise JC, Ankney CD, Nelson WS (1990) Mitochondrial gene trees and the evolutionary relationships of mallard and black ducks. Evolution 44:1109-1119
Avise JC, Walker DE (1999) Species realities and numbers in sexual vertebrates: Perspectives from an asexually transmitted genome. Proceedings of the National Academy of Sciences of the United States of America 96:992-995
Bachtrog D, Thornton K, Clark A, Andolfatto P (2006) Extensive introgression of mitochondrial DNA relative to nuclear genes in the Drosophila yakuba species group. Evolution 60:292-302
Baker AJ, Tavares ES, Elbourne RF (2009) Countering criticisms of single mitochondrial DNA gene barcoding in birds. Molecular Ecology Resources 9:257-267
Ballard JWO, Whitlock MC (2004) The incomplete natural history of mitochondria. Molecular Ecology 13:729-744
Banks RC, Chesser RT, Cicero C, Dunn JL, Kratter AW, Lovette IJ, Rasmussen PC, Remsen JV, Rising JD, Stotz DF, Winker K (2008) Forty-ninth supplement to the American Ornithologists' Union Check-list of North American Birds. Auk 125:756-766
Banks RC, Cicero C, Dunn JL, Kratter AW, Ouellet H, Rasmussen PC, Remsen JV, Rising JA, Stotz DF (2000) Forty-second supplement to the American Ornithologists' Union Check-list of North American Birds. Auk 117:847-858
Barber P, Boyce SL (2006) Estimating diversity of Indo-Pacific coral reef stomatopods through DNA barcoding of stomatopod larvae. Proceedings of the Royal Society B-Biological Sciences 273:2053-2061
Barker FK, Vandergon AJ, Lanyon SM (2008) Assessment of species limits among yellow-breasted meadowlarks (Sturnella spp.) using mitochondrial and sex-linked markers. Auk 125:869-879
Barrett RDH, Hebert PDN (2005) Identifying spiders through DNA barcodes. Canadian Journal of Zoology 83:481-491
Barrowclough GF, Shields GF (1984) Karyotypic evolution and long-term effective population sizes of birds. Auk 101:99-102
Barton NH (2000) Genetic hitchhiking. Philosophical Transactions of the Royal Society of London Series B-Biological Sciences 355:1553-1562
Bates JM, Hackett SJ, Goerck JM (1999) High levels of mitochondrial DNA differentiation in two lineages of antbirds (Drymophila and Hypocnemis). Auk 116:1093-1106
Bazin E, Glemin S, Galtier N (2006) Population size does not influence mitochondrial genetic diversity in animals. Science 312:570-572
Bearhop S, Fiedler W, Furness RW, Votier SC, Waldron S, Newton J, Bowen GJ, Berthold P, Farnsworth K (2005) Assortative mating as a mechanism for rapid evolution of a migratory divide. Science 310:502-504
Bensasson D, Zhang D, Hartl DL, Hewitt GM (2001) Mitochondrial pseudogenes: Evolution's misplaced witnesses. Trends in Ecology & Evolution 16
Berlin S, Tomaras D, Charlesworth B (2007) Low mitochondrial variability in birds may indicate Hill-Robertson effects on the W chromosome. Heredity 99:389-396
Berry OF (2006) Mitochondrial DNA and population size. Science 314:1388-1388
Betts MJ, Russell RB (2003) Amino acid properties and consequences of substitutions. In: Barnes MR, Gray IC (eds) Bioinformatics for Geneticists. Wiley, Chichester, p 408
Bildstein KL (2004) Raptor migration in the neotropics: Patterns, processes, and consequences. Ornitologia Neotropical 15:83-99
Blaxter ML (2004) The promise of a DNA taxonomy. Philosophical Transactions of the Royal Society of London Series B-Biological Sciences 359:669-679
Bolen EG (1979) Blue-winged x cinnamon teal hybrid (Anas discors x Anas clypeata) from Oklahoma, USA. Wilson Bulletin:367-370
Broughton RE, Reneau PC (2006) Spatial covariation of mutation and nonsynonymous substitution rates in vertebrate mitochondrial genomes.
Molecular Biology and Evolution 23:1516-1524 Brown WM, George M, Wilson AC (1979) Rapid evolution of animal mitochondrial DNA Proceedings of the National Academy of Sciences of the United States of America 76:1967-1971 Brumfield RT (2005) Mitochondrial variation in Bolivian populations of the variable antshrike (Thamnophilus caerulescens). Auk 122:414-432 Brumfield RT, Capparella AP (1996) Genetic differentiation and taxonomy in the House Wren species group. Condor 98:547-556 108 Bucklin A, Wiebe PH (1998) Low mitochondrial diversity and small effective population sizes of the copepods Calanus finmarchicus and Nannocalanus minor. Possible impact of climatic variation during recent glaciation. Journal of Heredity 89:383-392 Burton RS, Ellison CK, Harrison JS (2006) The sorry state of F-2 hybrids: Consequences of rapid mitochondrial DNA evolution in allopatric populations. American Naturalist 168:S14-S24 Busse HJ, Denner EBM, Lubitz W (1996) Classification and identification of bacteria: Current approaches to an old problem. Overview of methods used in bacterial systematics. Journal of Biotechnology 47:3-38 Cameron S, Rubinoff D, Will K (2006) Who will actually use DNA barcoding and what will it cost? Systematic Biology 55:844-847 Campagna L, Lijtmaer DA, Kerr KCR, Barreira AS, Hebert PDN, Lougheed SC, Tubaro PL (2009) DNA barcodes provide new evidence of a recent radiation in the genus Sporophila (Aves: Passeriformes). Molecular Ecology Resources Online Early: 10.1111/J.1755-0998.2009.02799.X Catalano D, Licciulli F, Turi A, Grillo G, Saccone C, D'Elia D (2006) MitoRes: a resource of nuclear-encoded mitochondrial genes and their products in Metazoa. BMC Bioinformatics 7 Chaves AV, Clozato CL, Lacerda DR, Sari EHR, Santos FR (2008) Molecular taxonomy of brazilian tyrant-flycatchers (Passeriformes: Tyrannidae). Molecular Ecology Resources 8:1169-1177 109 Chesser RT (2000) Evolution in the high Andes: The phylogenetics of Muscisaxicola ground-tyrants. 
Molecular Phylogenetics and Evolution 15:369-380 Cicero C, Johnson NK (1998) Molecular phylogeny and ecological diversification in a clade of New World songbirds (genus Vireo). Molecular Ecology 7:1359-1370 Clare EL, Kerr KCR, von Konigslow TE, Wilson JJ, Hebert PDN (2008) Diagnosing mitochondrial DNA diversity: Applications of a sentinel gene approach. Journal of Molecular Evolution 66:362-367 Clements JF (2007) The Clements checklist of the birds of the world. Cornell University Press, New York Collar NJ, Spottiswoode CN (2005) Species limits in birds: A response to Watson. Bioscience 55:388-389 Cooke F, Rockwell RF, Lank DB (1995) The Snow Geese of La Perouse Bay: natural selection in the wild. Oxford University Press, Oxford, U.K. Crochet PA, Desmarais E (2000) Slow rate of evolution in the mitochondrial control region of gulls (Aves : Laridae). Molecular Biology and Evolution 17:1797-1806 de Queiroz K (2005) Ernst Mayr and the modern concept of species. Proceedings of the National Academy of Sciences of the United States of America 102:6600-6607 DeSalle R (2006) Species discovery versus species identification in DNA barcoding efforts: response to Rubinoff. Conservation Biology 20:1545-1547 DeSalle R, Egan MG, Siddall M (2005) The unholy trinity: taxonomy, species delimitation and DNA barcoding. Philosophical Transactions of the Royal Society B-Biological Sciences 360:1905-1916 110 Desjardins P, Morais R (1990) Sequence and gene organization of the chicken mitochondrial genome - a novel gene order in higher vertebrates. Journal of Molecular Biology 212:599-634 Dove CJ, Rotzel NC, Heacker M, Weigt LA (2008) Using DNA barcodes to identify bird species involved in birdstrikes. Journal of Wildlife Management 72:1231- 1236 Drovetski SV, Zink RM, Fadeev IV, Nesterov EV, Koblik EA, Red'kin YA, Rohwer S (2004a) Mitochondrial phylogeny of Locustella and related genera. 
Journal of Avian Biology 35:105-110 Drovetski SV, Zink RM, Rohwer S, Fadeev IV, Nesterov EV, Karagodin I, Koblik EA, Red'kin YA (2004b) Complex biogeographic history of a Holarctic passerine. Proceedings of the Royal Society of London Series B-Biological Sciences 271:545-551 Drummond AJ, Ashton B, Cheung M, Heled J, Kearse M, Moir R, Stones-Havas S, Thierer T, Wilson A (2007) Geneious v3.0, Available from http://www.geneious.com/ Ebach MC, Holdrege C (2005) DNA barcoding is no substitute for taxonomy. Nature 434:697-697 Edelaar P, Summers R, Iovchenko NP (2003) Ecology and evolution of the crossbills- complex (Loxid). Vogelwarte 42:113-114 Edwards SV (2008) A smorgasbord of markers for avian ecology and evolution. Molecular Ecology 17:945-946 111 Efe MA, Tavares ES, Baker A J, Bonatto SL (2009) Multigene phylogeny and DNA barcoding indicate that the Sandwich tern complex (Thalasseus sandvicensis, Laridae, Sternini) comprises two species. Molecular Phylogenetics and Evolution 52:263-267 Egea R, Casillas S, Barbadilla A (2008) Standard and generalized McDonald-Kreitman test: a website to detect selection by comparing different classes of DNA sites. Nucleic Acids Research 36:W157-W162 Elias-Gutierrez M, Valdez-Moreno M (2008) A new cryptic species of Leberis Smirnov, 1989 (Crustacea, Cladocera, Chydoridae) from the Mexican semi-desert region, highlighted by DNA barcoding. Hidrobiologica 18:63-74 Ellegren H (2008) Sequencing goes 454 and takes large-scale genomics into the wild. Molecular Ecology 17:1629-1631 Evenhuis NL (2007) Helping solve the "other" taxonomic impediment: Completing the eight steps to total enlightenment and taxonomic nirvana. Zootaxa:3-12 Feldman CR, Omland KE (2005) Phylogenetics of the common raven complex (Corvus: Corvidae) and the utility of ND4, COI and intron 7 of the p-fibrinogen gene in avian molecular systematics. Zoologica Scripta 34:145-156 Felsenstein J (1985) Confidence limits on phylogenies - an approach using the bootstrap. 
Evolution 39:783-791 Ferri E, Barbuto M, Bain O, Galimberti A, Uni S, Guerrero R, Ferte H, Bandi C, Martin C, Casiraghi M (2009) Integrated taxonomy: traditional approach and DNA barcoding for the identification of filarioid worms and related parasites (Nematoda). Frontiers in Zoology 6 112 Fitzpatrick BM (2004) Rates of evolution of hybrid inviability in birds and mammals. Evolution 58:1865-1870 Fjeldsa J, Irestedt M, Jonsson KA, Ohlson JI, Ericson PGP (2007) Phylogeny of the ovenbird genus Upucerthia: a case of independent adaptations for terrestrial life. Zoologica Scripta 36:133-141 Floyd R, Abebe E, Papert A, Blaxter M (2002) Molecular barcodes for soil nematode identification. Molecular Ecology 11:839-850 Folmer O, Black M, Hoeh W, Lutz R, Vrijenhoek R (1994) DNA primers for amplification of mitochondrial cytochrome c oxidase subunit I from diverse metazoan invertebrates. Molecular Marine Biology and Biotechnology 3:294-299 Francisco MR, Gibbs HL, Galetti M, Lunardi VO, Galetti PM (2007) Genetic structure in a tropical lek-breeding bird, the blue manakin (Chiroxiphia caudata) in the Brazilian Atlantic Forest. Molecular Ecology 16:4908-4918 Frezal L, Leblois R (2008) Four years of DNA barcoding: Current advances and prospects. Infection Genetics and Evolution 8:727-736 Friesen VL (1997) Population genetics and the spatial scale of conservation of colonial waterbirds. Colonial Waterbirds 20:353-368 Friesen VL, Piatt JF, Baker AJ (1996) Evidence from cytochrome b sequences and allozymes for a 'new' species of alcid: The long-billed Murrelet (Brachyramphus perdix). Condor 98:681-690 Fry AJ (1999) Mildly deleterious mutations in avian mitochondrial dna: evidence from neutrally tests. Evolution 53:1617-1620 113 Funk DJ, Omland KE (2003) Species-level paraphyly and polyphyly: Frequency, causes, and consequences, with insights from animal mitochondrial DNA. 
Annual Review of Ecology Evolution and Systematics 34:397-423 Galtier N, Depaulis F, Barton NH (2000) Detecting bottlenecks and selective sweeps from DNA sequence polymorphism. Genetics 155:981-987 Galtier N, Nabholz B, Glemin S, Hurst GDD (2009) Mitochondrial DNA as a marker of molecular diversity: a reappraisal. Molecular Ecology 18:4541-4550 Garcfa-Moreno J, Arctander P, Fjeldsa J (1998) Pre-Pleistocene differentiation among chat-tyrants. Condor 100:629-640 Garcfa-Moreno J, Arctander P, Fjeldsa J (1999) A case of rapid diversification in the neotropics: Phylogenetic relationships among Cranioleuca spinetails (Aves, Furnariidae). Molecular Phylogenetics and Evolution 12:273-281 Gerber AS, Loggins R, Kumar S, Dowling TE (2001) Does nonneutral evolution shape observed patterns of DNA variation in animal mitochondrial genomes? Annual Review of Genetics 35:539-566 Gershoni M, Templeton AR, Mishmar D (2009) Mitochondrial bioenergetics as a major motive force of speciation. Bioessays 31:642-650 Gibbs J (2009) Integrative taxonomy identifies new (and old) species in the Lasioglossum (Dialictus) tegulare (Robertson) species group (Hymenoptera, Halictidae). Zootaxa:l-38 GillFB (2003) Ornithology. Freeman, New York Gill FB, Mostrom AM, Mack AL (1993) Speciation in North American chickadees: Patterns of mtDNA genetic divergence. Evolution 47:195-212 114 Gill FB, Wright MT (2006) Birds of the World. Princeton University Press, New Jersey Gillespie JH (2000) The neutral theory in an infinite population. Gene 261:11-18 Gregory TR (2005) DNA barcoding does not compete with taxonomy. Nature 434:1067 Hackett SJ (1996) Molecular phylogenetics and biogeography of tanagers in the genus Ramphocelus (Aves). 
Molecular Phylogenetics and Evolution 5:368-382 Hackett SJ, Kimball RT, Reddy S, Bowie RCK, Braun EL, Braun MJ, Chojnowski JL, Cox WA, Han KL, Harshman J, Huddleston CJ, Marks BD, Miglia KJ, Moore WS, Sheldon FH, Steadman DW, Witt CC, Yuri T (2008) A phylogenomic study of birds reveals their evolutionary history. Science 320:1763-1768 Hajibabaei M, DeWaard JR, Ivanova NV, Ratnasingham S, Dooh RT, Kirk SL, Mackie PM, Hebert PDN (2005) Critical factors for assembling a high volume of DNA barcodes. Philosophical Transactions of the Royal Society B-Biological Sciences 360:1959-1967 Hajibabaei M, Janzen DH, Burns JM, Hallwachs W, Hebert PDN (2006) DNA barcodes distinguish species of tropical Lepidoptera. Proceedings of the National Academy of Sciences of the United States of America 103:968-971 Hall TA (1999) BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symposium Series 41:95-98 Haring E, Gamauf A, Kryukov A (2007) Phylogeographic patterns in widespread corvid birds. Molecular Phylogenetics and Evolution 45:840-862 Hasegawa M, Cao Y, Yang ZH (1998) Preponderance of slightly deleterious polymorphism in mitochondrial DNA: Nonsynonymous/synonymous rate ratio is 115 much higher within species than between species. Molecular Biology and Evolution 15:1499-1505 Hebert PDN, Cywinska A, Ball SL, DeWaard JR (2003a) Biological identifications through DNA barcodes. Proceedings of the Royal Society of London Series B- Biological Sciences 270:313-321 Hebert PDN, Ratnasingham S, deWaard JR (2003b) Barcoding animal life: cytochrome c oxidase subunit 1 divergences among closely related species. Proceedings of the Royal Society of London Series B-Biological Sciences 270:S96-S99 Hebert PDN, Stoeckle MY, Zemlak TS, Francis CM (2004) Identification of birds through DNA barcodes. PLoS BIOLOGY 2:1657-1663 Hedges SB (2002) The origin and evolution of model organisms. 
Nature Reviews Genetics 3:838-849 Hewitt G (2000) The genetic legacy of the Quaternary ice ages. Nature 405:907-913 Hickerson MJ, Meyer CP, Moritz C (2006) DNA barcoding will often fail to discover new animal species over broad parameter space. Systematic Biology 55:729-739 Hickey AJR (2008) An alternate explanation for low mtDNA diversity in birds: an age- old solution? Heredity 100:443-443 Hiebl I, Braunitzer G, Schneeganss D (1987) The primary structures of the major and minor hemoglobin components of adult Andean goose (Chloephaga melanoptera, Anatidae) - the mutation leu-ser in position 55 of the beta-chains. Biological Chemistry Hoppe-Seyler 368:1559-1569 116 Hogg ID, Hebert PDN (2004) Biological identification of springtails (Hexapoda: Collembola) from the Canadian Arctic, using mitochondrial DNA barcodes. Canadian Journal of Zoology 82:749-754 Holmes S (2003) Bootstrapping phylogenetic trees: theory and methods. Statistical Science 18:241-255 Hudson RR, Kreitman M, Aguade M (1987) A test of neutral molecular evolution based on nucleotide data. Genetics 116:153-159 Hughes AL, Hughes MAK (2007) Coding sequence polymorphism in avian mitochondrial genomes reflects population histories. Molecular Ecology 16:1369- 1376 Illera JC, Richardson DS, Helm B, Atienza JC, Emerson BC (2008) Phylogenetic relationships, biogeography and speciation in the avian genus Saxicola. Molecular Phylogenetics and Evolution 48:1145-1154 International W (2002) Waterbird population estimates, Wageningen, The Netherlands Irwin DE, Bensch S, Price TD (2001) Speciation in a ring. Nature 409:333-337 Irwin DE, Rubstov AS, Panov EV (2009) Mitochondrial introgression and replacement between yellowhammers (Emberiza citrinella) and pine buntings (E. leucocephalos; Aves, Passeriformes). Biological Journal of the Linnean Society 98:422-438 Ivanova NV, Dewaard JR, Hebert PDN (2006) An inexpensive, automation-friendly protocol for recovering high-quality DNA. 
Molecular Ecology Notes 6:998-1002 117 Johns GC, Avise JC (1998) A comparative summary of genetic distances in the vertebrates from the mitochondrial cytochrome b gene. Molecular Biology and Evolution 15:1481-1490 Johnsen A, Rindal E, Ericson PGP, Zuccon D, Kerr KCR, Stoeckle MY, Lifjeld JT (2010) DNA barcoding of Scandinavian birds reveals divergent lineages in trans- Atlantic species. Journal of Ornithology Online early Johnson KP, Sorenson MD (1999) Phylogeny and biogeography of dabbling ducks (genus: Anas): a comparison of molecular and morphological evidence. Auk 116:792-805 Johnson NK, Johnson CB (1985) Speciation in sapsuckers (Sphyrapicus): II. Sympatry, hybridization, and mate preference in S. ruber daggetti and S. nuchalis. Auk 102:1-15 Johnston DW (1961) The biosystematics of American Crows. University of Washington Press, Seattle Joseph L, Omland KE (2009) Phylogeography: its development and impact in Australo- Papuan ornithology with special reference to paraphyly in Australian birds. Emu 109:1-23 Karp A, Seberg O, Buiatti M (1996) Molecular techniques in the assessment of botanical diversity. Annals of Botany 78:143-149 Kelly RP, Sarkar IN, Eernisse DJ, Desalle R (2007) DNA barcoding using chitons (genus Mopalia). Molecular Ecology Notes 7:177-183 Kenyon L, Moraes CT (1997) Expanding the functional human mitochondrial DNA database by the establishment of primate xenomitochondrial cybrids. Proceedings 118 of the National Academy of Sciences of the United States of America 94:9131- 9135 Kerr KCR, Birks SM, Kalyakin MV, Red'kin YA, Koblik EA, Hebert PDN (2009a) Filling the gap - COI barcode resolution in eastern Palearctic birds. Frontiers in Zoology 6 Kerr KCR, Lijtmaer DA, Barreira AS, Hebert PDN, Tubaro PL (2009b) Probing evolutionary patterns in Neotropical birds through DNA barcodes. PLoS ONE 4:6 Kerr KCR, Stoeckle MY, Dove CJ, Weigt LA, Francis CM, Hebert PDN (2007) Comprehensive DNA barcoding coverage of North American birds. 
Molecular Ecology Notes 7:535-543 Kessler LG, Avise JC (1985) A comparative description of mitochondrial DNA differentiation in selected avian and other vertebrate genera. Molecular Biology and Evolution 2:109-125 Kimura M (1980) A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. Journal of Molecular Evolution 16:111-120 Knox AG, Collinson M, Helbig AJ, Parkin DT, Sangster G (2002) Taxonomic recommendations for British birds. Ibis 144:707-710 Koopman NE, McDonald DB, Hay ward GD, Eldegard K, Sonerud GA, Sermach SG (2005) Genetic similarity among Eurasian subspecies of boreal owls Aegolius funereus. Journal of Avian Biology 36:179-183 Kroodsma DE (1989) Two North American song populations of the marsh wren reach distributional limits in the Central Great Plains. Condor 91:332-340 119 Kumar S, Tamura K, Nei M (2004) MEGA3: Integrated software for molecular evolutionary genetics analysis and sequence alignment. Briefings in Bioinformatics 5:150-163 Kvist L, Martens J, Higuchi H, Nazarenko AA, Valchuk OP, Orell M (2003) Evolution and genetic structure of the great tit (Parus major) complex. Proceedings of the Royal Society of London Series B-Biological Sciences 270:1447-1454 LaneN (2009) On the origin of bar codes. Nature 462:272-274 Lee JCI, Tsai LC, Huang MT, Jhuang JA, Yao CT, Chin SC, Wang LC, Linacre A, Hsieh HM (2008) A novel strategy for avian species identification by cytochrome b gene. Electrophoresis 29:2413-2418 Lehman N, Eisenhawer A, Hansen K, Mech LD, Peterson RO, Gogan PJP, Wayne RK (1991) Introgression of coyote mitochondrial DNA into sympatric North American gray wolf populations. Evolution 45:104-119 Lewontin RC (1974) The genetic basis of evolutionary change. Columbia University Press, New York Li W, Zhang Y-y (2004) Subspecific taxonomy of Ficedula parva based on sequences of mitochondrial cytochrome b gene. 
Zoological Research 25:127-131 Liang YH, Liu XZ, Liu SH, Lu GY (2001) The structure of greylag goose oxy haemoglobin: the roles of four mutations compared with bar-headed goose haemoglobin. Acta Crystallographica Section D-Biological Crystallography 57:1850-1856 Librado P, Rozas J (2009) DnaSP v5: a software for comprehensive analysis of DNA polymorphism data. Bioinformatics 25:1451-1452 120 Liebers D, de Knijff P, Helbig AJ (2004) The herring gull complex is not a ring species. Proceedings of the Royal Society of London Series B-Biological Sciences 271:893-901 Lijtmaer DA, Sharpe NMM, Tubaro PL, Lougheed SC (2004) Molecular phylogenetics and diversification of the genus Sporophila (Aves: Passeriformes). Molecular Phylogenetics and Evolution 33:562-579 Linnaeus C (1758) Systema Naturae, Editio decima. Laurentii Salvii, Holmiae Lipscomb D, Platnick N, Wheeler Q (2003) The intellectual content of taxonomy: a comment on DNA taxonomy. Trends in Ecology & Evolution 18:65-66 Lovette IJ, Bermingham E (2001) Mitochondrial perspective on the phylogenetic relationships of the Parula wood-warblers. Auk 118:211-215 Lynch M, Jarrell PE (1993) A method for calibrating molecular clocks and its application to animal mitochondrial DNA. Genetics 135:1197-1208 Lynch M, Koskella B, Schaack S (2006) Mutation pressure and the evolution of organelle genomic architecture. Science 311:1727-1730 Maddison WP, Maddison DR (2009) Mesquite: a modular system for evolutionary analysis, http://mesquiteproject.org Marshall HD, Coulson MW, Carr SM (2009) Near neutrality, rate heterogeneity, and linkage govern mitochondrial genome evolution in Atlantic Cod (Gadus morhua) and other gadine fish. Molecular Biology and Evolution 26:579-589 Martens K (2010) The International Year of Biodiversity. 
Hydrobiologia 637:1-2 Marthinsen G, Wennerberg L, Lifjeld JT (2008) Low support for separate species within the redpoll complex (Carduelisflammea-hornemanni-cabaret) from analyses of 121 mtDNA and microsatellite markers. Molecular Phylogenetics and Evolution 47:1005-1017 Matz MV, Nielsen R (2005) A likelihood ratio test for species membership based on DNA sequence data. Philosophical Transactions of the Royal Society B- Biological Sciences 360:1969-1974 Mazar Barnett J, Pearman M (2001) Annotated checklist of the birds of Argentina. Lynx Edicions, Barcelona McCarthy E (2006) The handbook of avian hybrids of the world. Oxford University Press, New York McCracken KG, Johnson WP, Sheldon FH (2001) Molecular population genetics, phylogeography, and conservation biology of the mottled duck (Anas fulvigula). Conservation Genetics 2:87-102 McManus DP, Bowles J (1996) Molecular genetic approaches to parasite identification: Their value in diagnostic parasitology and systematics. International Journal for Parasitology 26:687-704 Meiklejohn CD, Montooth KL, Rand DM (2007) Positive and negative selection on the mitochondrial genome. Trends in Genetics 23:259-263 Meyer CP, Paulay G (2005) DNA barcoding: Error rates based on comprehensive sampling. PLoS BIOLOGY 3:2229-2238 Mila B, McCormack JE, Castaneda G, Wayne RK, Smith TB (2007a) Recent postglacial range expansion drives the rapid diversification of a songbird lineage in the genus Junco. Proceedings of the Royal Society B-Biological Sciences 274:2653-2660 122 Mila B, Smith TB, Wayne RK (2007b) Speciation and rapid phenotypic differentiation in the yellow-rumped warbler Dendroica coronata complex. Molecular Ecology 16:159-173 Min XJ, Hickey DA (2007) DNA barcodes provide a quick preview of mitochondrial genome composition. PLoS ONE 2:5 Mindell DP, Sorenson MD, Huddleston CJ, Miranda HC, Knight A, Sawchuk SJ, Yuri T (1997) Phylogenetic relationships among and within select avian orders based on mitochondrial DNA. 
In: Mindell DP (ed) Avian molecular evolution and systematics. Academic Press, New York, p 214-247 Mishler BD, Shapley RL (2004) Presentation at Assembling the Tree of Life - PI meeting, National Science Foundation, Nov. 19-21 Mishmar D, Ruiz-Pesini E, Golik P, Macaulay V, Clark AG, Hosseini S, Brandon M, Easley K, Chen E, Brown MD, Sukernik RI, Olckers A, Wallace DC (2003) Natural selection shaped regional mtDNA variation in humans. Proceedings of the National Academy of Sciences of the United States of America 100:171-176 Monaghan MT, Balke M, Gregory TR, Vogler AP (2005) DNA-based species delineation in tropical beetles using mitochondrial and nuclear markers. Philosophical Transactions of the Royal Society B-Biological Sciences 360:1925- 1933 Moore WS, Weibel AC, Agius A (2006) Mitochondrial DNA phylogeny of the woodpecker genus Veniliornis (Picidae, Picinae) and related genera implies convergent evolution of plumage patterns. Biological Journal of the Linnean Society 87:611-624 123 Moritz C (1994) Defining "evolutionarily significant units" for conservation. Trends in Ecology & Evolution 9:373-375 Moritz C, Cicero C (2004) DNA barcoding: Promise and pitfalls. PLoS BIOLOGY 2:1529-1531 Moritz C, Hoskin CJ, MacKenzie JB, Phillips BL, Tonione M, Silva N, VanDerWal J, Williams SE, Graham CH (2009) Identification and dynamics of a cryptic suture zone in tropical rainforest. Proceedings of the Royal Society B-Biological Sciences 276:1235-1244 Morrison ML, Hardy JW (1983) Hybridization between Hermit and Townsend's Warblers. Murrelet 64:65-72 Mulligan CJ, Kitchen A, Miyamoto MM (2006) Comment on "Population size does not influence mitochondrial genetic diversity in animals". Science 314:1390 Munch K, Boomsma W, Willerslev E, Nielsen R (2008) Fast phylogenetic DNA barcoding. 
Philosophical Transactions of the Royal Society B-Biological Sciences 363:3997-4002 Nabholz B, Glemin S, Galtier N (2009) The erratic mitochondrial clock: variations of mutation rate, not population size, affect mtDNA diversity across birds and mammals. BMC Evolutionary Biology 9 Nabholz B, Mauffrey JF, Bazin E, Galtier N, Glemin S (2008) Determination of mitochondrial genetic diversity in mammals. Genetics 178:351-361 Nachman MW, Boyer SN, Aquadro CF (1994) Nonneutral evolution at the mitochondrial NADH dehydrogenase subunit 3-gene in mice. Proceedings of the National Academy of Sciences of the United States of America 91:6364-6368 124 Narosky T, Yzurieta D (2003) Birds of Argentina and Uruguay: a field guide. Vazquez Mazzini Editores, Buenos Aires Navajas M, Fenton B (2000) The application of molecular markers in the study of diversity in acarology: A review. Experimental and Applied Acarology 24:751- 774 Nei M (1987) Molecular Evolutionary Genetics. Columbia University Press, New York Newton I (2000) The Speciation and Biogeography of Birds. Academic Press, New York Nielsen R, Matz M (2006) Statistical approaches for DNA barcoding. Systematic Biology 55:162-169 Nores M (1992) Bird speciation in subtropical South America in relation to forest expansion and retraction. Auk 109:346-357 Olsen KM (2004) Gulls of Europe, Asia, and North America. Princeton University Press, Princeton, New Jersey Omland KE, Tarr CL, Boarman WI, Marzluff JM, Fleischer RC (2000) Cryptic genetic variation and paraphyly in ravens. Proceedings of the Royal Society of London Series B-Biological Sciences 267:2475-2482 Packert M, Martens J, Severinghaus LL (2009) The Taiwan Firecrest (Regulus goodfellowi) belongs to the Goldcrest assemblage {Regulus regulus s. /.): evidence from mitochondrial DNA and the territorial song of the Regulidae. Journal of Ornithology 150:205-220 Padial JM, de la Riva I (2007) Integrative taxonomists should use and produce DNA barcodes. 