Conservation Genomics of the Endangered Mexican Wolf and De Novo SNP Marker Development in Pumas using Next-Generation Sequencing

Item Type text; Electronic Dissertation

Authors Fitak, Robert Rodgers

Publisher The University of Arizona.

Rights Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author.

Download date 10/10/2021 17:53:18

Link to Item http://hdl.handle.net/10150/316929 CONSERVATION GENOMICS OF THE ENDANGERED MEXICAN WOLF AND

DE NOVO SNP MARKER DEVELOPMENT IN PUMAS USING NEXT-

GENERATION SEQUENCING

by

Robert Rodgers Fitak

______Copyright @ Robert R. Fitak 2014

A Dissertation Submitted to the Faculty of the

GRADUATE INTERDISCIPLINARY PROGRAM IN GENETICS

In Partial Fulfillment of the Requirements For the Degree of

DOCTOR OF PHILOSOSPHY

In the Graduate College

THE UNIVERSITY OF ARIZONA

2014 2

THE UNIVERSITY OF ARIZONA GRADUATE COLLEGE

As members of the Dissertation Committee, we certify that we have read the dissertation prepared by Robert R. Fitak titled “Conservation Genomics of the endangered Mexican wolf and de novo SNP marker development in pumas using next-generation sequencing” and recommend that it be accepted as fulfilling the dissertation requirement for the Degree of Doctor of Philosophy

______Date: May 10, 2013 Melanie Culver

______Date: May 10, 2013 Giovanni Bosco

______Date: May 10, 2013 Michael Nachman

______Date: May 10, 2013 Philip Hedrick

______Date: May 10, 2013 Steven M. Chambers

Final approval and acceptance of this dissertation is contingent upon the candidate’s submission of the final copies of the dissertation to the Graduate College.

I hereby certify that I have read this dissertation prepared under my direction and recommend that it be accepted as fulfilling the dissertation requirement.

______Date: May 10, 2013 Dissertation Director: Melanie Culver 3

STATEMENT BY AUTHOR

This dissertation has been submitted in partial fulfillment of requirements for an advanced degree at the University of Arizona and is deposited in the University Library to be made available to borrowers under rules of the Library.

Brief quotations from this dissertation are allowable without special permission, provided that accurate acknowledgment of source is made. Requests for permission for extended quotation from or reproduction of this manuscript in whole or in part may be granted by the copyright holder.

SIGNED: Robert R. Fitak

4

ACKNOWLEDGEMENTS

First and foremost, I would like to thank my advisor, Dr. Melanie Culver, for her guidance and support for the past 5+ years. Melanie and her family, Dave and Sierra Bognar, have been so very helpful in so many ways. I am very grateful for the additional support and guidance of my committee members, Dr. Gio Bosco, Dr. Michael Nachman, Dr. Steve Chambers, and Dr. Phil Hedrick. I give many thanks to everyone in the Conservation Genetics Laboratory at the University of Arizona, especially Ashwin Naidu, Alex Ochoa-Hein, Dr. Adrian Munguia-Vega, Sophia Amirsultan, Taylor Edwards, and Dr. Sarah Rinkevich. I thank Dr. Daryl Kelly, Dr. Paul Fuerst, and Dr. Greg Booton at the Ohio State University for giving me the chance to begin my scientific career.

For my puma project, I would like to thank Dr. Scott Bender and the Navajo Nation Zoo for teaching me how to draw blood from pumas, and Linda Searles and staff at the Southwest Wildlife Conservation Center for providing many of the samples. I also thank Ron Thompson and Andy Salazar for teaching me so much about pumas in the field. I am grateful to Bianca Judy for finding SNPs, and Ashwin Naidu and the Arizona Game and Fish Department for collecting and organizing all the harvested puma samples.

For my study of Mexican wolves I thank Dr. Sarah Rinkevich for without her support this study would not have been possible. I also thank Dr. Joe Cook at the Museum of Southwestern Biology, and Dr. Maegan Harden and staff at the Broad Institute for their genotyping services.

At the University of Arizona, I am especially thankful for all the students, staff, and faculty of the IGERT program in Comparative Genomics, notably Dr. Michael Nachman, Dr. Matt Sullivan, and Penny Liebig for their training and financial support. Additionally, I thank the Science Foundation Arizona, U.S. Fish and Wildlife Service, Arizona Game and Fish Department, the Graduate College, and the Genetics Interdisciplinary Program for their financial support. I am grateful for all the work performed by the University of Arizona Genetics Core. I thank Dr. Murray Brilliant and the Genetics IDP for inviting me to join the University of Arizona. I am indebted to all the students and staff of the Genetics IDP for their support and companionship over the years, especially Lori Taylor and Cindy Cowen.

Finally, I am very grateful for the support from my loving family, especially my mother and father Madge and Mike Fitak, sisters Michelle and Melissa and their families, and my ever-so-beautiful wife Sophia. 5

DEDICATION

This dissertation is dedicated to my loving parents, Madge and Mike, and to my wife Sophia. 6

TABLE OF CONTENTS

LIST OF TABLES ...... 9 LIST OF FIGURES ...... 11

ABSTRACT ...... 15

INTRODUCTION ...... 16 Conservation genetics and genomics ...... 16 Dissertation goals ...... 19 Description of the dissertation format ...... 20

PRESENT STUDIES ...... 23 Focal ...... 23 Pumas (Puma concolor) ...... 24 Mexican wolves (Canis lupus baileyi) ...... 26 PumaPlex: A new panel of SNP markers for effective conservation genetic analysis and management of pumas ...... 28 -wide analysis of SNPs reveals pedigree errors and a lack of domestic dog ancestry in the endangered Mexican wolf ...... 31 Measuring inbreeding using genomic data: endangered Mexican wolves as an example ...... 33 Evidence for selection and an association study of involved in fitness in the endangered Mexican wolf (Canis lupus baileyi): implications for management ...35 Conclusions ...... 37

REFERENCES ...... 40

APPENDIX A: CONSERVATION GENETICS AND THE TRANSITION TO CONSERVATION GENOMICS ...... 46 Abstract ...... 47 Introduction ...... 48 Next-generation sequencing ...... 49 Microarrays ...... 53 Expression Microarrays ...... 54 Genotyping Microarrays ...... 57 Expanded genetic marker surveys ...... 59 Whole genome sequencing ...... 63 Conclusion ...... 65 References ...... 67

APPENDIX B: PUMAPLEX: A NEW PANEL OF SNP MARKERS FOR EFFECTIVE CONSERVATION GENETIC ANALYSIS AND MANAGEMENT OF PUMAS ...... 88 Abstract ...... 89 7

Introduction ...... 91 Materials and Methods ...... 95 Sample collection and nucleic acid extraction ...... 95 cDNA library construction and sequencing ...... 96 Sequence assembly and SNP identification ...... 96 Sequenom genotyping, validation, and analyses ...... 98 Comparison with microsatellite data ...... 101 Individual identity and relatedness ...... 102 Fecal DNA samples and analysis ...... 102 Results ...... 103 Sequencing and SNP identification ...... 103 PumaPlex ...... 104 Comparison with microsatellite data ...... 106 Discussion ...... 109 PumaPlex ...... 109 Comparison with microsatellite data ...... 112 Conclusions ...... 115 References ...... 117

APPENDIX C: GENOME-WIDE ANALYSIS OF SNPS REVEAL PEDIGREE ERRORS AND A LACK OF DOMESTIC DOG ANCESTRY IN THE ENDANGERED MEXICAN WOLF ...... 147 Abstract ...... 148 Introduction ...... 149 Materials and Methods ...... 152 CanineHD BeadChip ...... 152 Sample collection and genotyping ...... 153 Quality control and pedigree errors ...... 154 Population admixture ...... 155 Results ...... 157 Quality control and pedigree errors ...... 157 Population admixture ...... 159 Discussion ...... 161 Pedigree errors ...... 161 Population admixture ...... 163 Conclusions ...... 164 References ...... 166

APPENDIX D: MEASURING INBREEDING USING GENOMIC DATA: THE ENDANGERED MEXICAN WOLF AS AN EXAMPLE ...... 180 Abstract ...... 181 Introduction ...... 182 Materials and Methods ...... 186 Inbreeding in Mexican wolves ...... 186 Population simulations ...... 189 Results ...... 190 8

Mexican wolf pedigree ...... 190 Simulated pedigrees ...... 192 Discussion ...... 192 References ...... 197

APPENDIX E: A GENOME-WIDE SCAN FOR SELECTION AND AN ASSOCIATION STUDY OF GENES INVOLVED IN FITNESS IN THE ENDANGERED MEXICAN WOLF (CANIS LUPUS BAILEYI): IMPLICATIONS FOR MANAGEMENT ...... 211 Abstract ...... 212 Introduction ...... 213 Materials and Methods ...... 216 Genetic and phenotypic data ...... 216 Scan for FST outlier loci ...... 217 Genome-wide association mapping of fitness traits ...... 217 identification and comparison ...... 218 Results ...... 218 Scan for FST outlier loci ...... 218 Genome-wide association mapping of fitness traits ...... 219 Discussion ...... 220 References ...... 225

COMPLETE LIST OF LITERATURE CITED ...... 242

9

LIST OF TABLES

Table A.1: Summary of the commonly used next-generation sequencing technologies. Each company normally offers a variety of commercial platforms and variations, but the information listed below is for the platform with the highest yield. Visit the websites given or see Glenn et al. (2011) for more information. I have excluded certain technologies such as Helicos Biosciences because the company has filed bankruptcy and will no longer offer their service, and those offered by Oxford Nanopore Technologies because their sequencing systems are not yet commercially available ...... 84

Table B.1: General summary of pyrosequencing results, assembly, and SNP verification success ...... 129

Table B.2: Table of the 27 SNP loci reported in PumaPlex and their best annotations using BLAST (www.blast.ncbi.nlm.nih.gov) ...... 130

Table B.3: Shown for each in P. c. couguar are the number of individuals with genotypes (n), minor allele frequency (MAF), observed (HO) and expected (HE) heterozygosities, FIS according to Weir and Cockerham (1984) and test for significant deviation from Hardy-Weinberg equilibrium (Prob) ...... 131

Table B.4: Summary of genetic variation statistics (standard deviation within parentheses) within each subspecies of puma: genotypes (n), minor allele frequency (MAF), observed (HO) and expected (HE) heterozygosities, FIS according to Weir and Cockerham (1984) and percent polymorphic loci (P) ...... 132

Table B.5: General indices of genetic variation averaged over 15 microsatellite (MS) and 23 SNP (SNP) loci for three putative populations of pumas from Arizona. N = number of individuals, NA = mean number of alleles, HO = observed heterozygosity, HE = expected heterozygosity, FIS = inbreeding coefficient ...... 133

Table B.6: Pairwise FST measures calculated using the microsatellite (above diagonal) and SNP (below diagonal) datasets. All FST values were significantly greater than zero (P < 0.017) ...... 134

Table C.1: Reproducibility of genotypes across six replicated samples ...... 172

Table C.2: Mendelian disagreements between genotypes of Parent-offspring trios. Wolf IDs are given according to the Mexican wolf official studbook (Siminski 2011) ....173

Table D.1: Parameters used in VORTEX for population simulations. The asterisk indicates the parameter that varied between MW-RND (0) and MW-MK (1) ...... 204

Table D.2: Summary of the comparisons between the two genetic estimators (FHET and FROH) of inbreeding with that calculated from the pedigree (FPED) for the different 10

datasets. Each comparison is represented by the sample size (n), the Pearson product moment correlation (r) and its p-value (p), and the root mean squared error (RMSE) 206

Table E.1: A list of the genes found within the regions covered by the 22 statistical FST outlier loci, order by number (CHR). Gene symbols indicate names given in UNIPROT to usually the homolog. RNA abbreviations are rRNA = ribosomal RNA, snRNA = small nuclear RNA, sno RNA = small nucleolar RNA, misc RNA = uncharacterized RNA ...... 233

Table E.2: List of GO term annotations for the 31 coding genes found in the chromosomal regions containing SNPs under selection. The categories of GO annotation are biological processes (BP), molecular function (MF) and cellular component (CC). OBS = observed number, EXP = expected number, FDR = false discovery rate ...... 235

Table E.3: A list of the genes found within chromosome (CHR) 13 containing SNPs associated with the two traits studied. Gene symbols indicate names given in UNIPROT to usually the human homolog. RNA abbreviations are miRNA = micro RNA, sno RNA = small nucleolar RNA ...... 236 11

LIST OF FIGURES

Figure A.1: Cost (in U.S. dollars) per megabase (Mb) of DNA sequencing from September 2001 until January 2013. Notice the Y-axis is on a logarithmic scale. The graph was created from data available publically at http://www.genome.gov/sequencingcosts/ ...... 85

Figure A.2: Diagram depicting the basic procedure followed when comparing the expression levels of two samples (usually a control and treatment) using a microarray hybridization method (A) and when performing an RNA-seq analysis (B). Adapted from Figure 2 of Forêt et al. (2007) ...... 86

Figure A.3: Line graphs depicting the relationship between the time to the most recent common ancestor (TMRCA) between two species and the proportion of SNPs amplified (solid line) or amplified and polymorphic (dashed line). These graphs are recreated from those of Miller et al. (2012a) comparing domestic and wild species on three different Illumina genotyping chips: BovineSNP50, OvineSNP50, and EquineSNP50 ...... 87

Figure B.1: Map of puma samples used in this study. The number within each colored region indicates the number of samples obtained. The colors represent a region with at least one puma sample of subspecies P. c couguar (yellow), P. c. puma (red), P c. costaricensis (green), P. c. capricornensis (orange), P. c. cabrerae/P. c. puma (light blue), and P. c. concolor (purple). One puma of unknown origin is not shown ...... 135

Figure B.2: Distribution of the top BLASTN hit for 1,000 randomly chosen pyrosequence reads. Only hits with an e-value less than 1e-30 were retained ...... 136

Figure B.3: Box and whiskers plot of A) the FREEBAYES quality score metric for each SNP and B) the count of each allele. Monomorphic loci are represented in gray, polymorphic loci in blue, and failed loci in red ...... 137

Figure B.4: Mean FIS (black line) for varying levels of heterozygosity predicted after 100,000 simulations following the method of Girard (2011). The upper and lower 95% confidence intervals are shown as dashed lines. Each black circle represents a locus and the names of two loci outside the 95% boundaries are given. Loci falling above the upper boundary (95% confidence interval) have an excess of homozygosity, suggesting a potential for a null alleles, whereas those loci below the boundary display an excess of heterozygosity compared with the predicted distribution ...... 138

Figure B.5: Principle coordinates analysis for the first two axis of variation of the PumaPlex data. The six different subspecies of puma are shown according to the same colors given in Figure B.1. UNK indicates the single, captive pet puma of unknown geographic origin ...... 139

12

Figure B.6: Results of the STRUCTURE analysis for the entire puma dataset. A) Plot of the ΔK for each value of K, B) Ancestry plot for each puma for the optimal K = 3 solution. Each color represents the proportion of ancestry in the three clusters. Individuals to the left of the vertical black line are from Central (CA) and South American (SA) subspecies (ssp). The remaining individuals are all P. c. couguar. C) Map of the sampling location of each puma, colored according to the cluster that contains a majority of the individual’s ancestry, including a more detailed view of the southwestern United States (D) where a majority of the samples were collected ....140

Figure B.7: Map of putative population assignments for hunter-harvested pumas based upon location of registration of the carcass. N = dark gray, SW = light gray, SE = white. Dark black lines indicate major interstate highways; light black lines represent smaller highways. Map adapted from http://en.wikipedia.org/wiki/File:USA_Arizona_location_map.svg under the Creative Commons Attribution-Share Alike 3.0 Unported license ...... 141

Figure B.8: Principle coordinates analysis of both the A) microsatellite and B) SNP datasets for the first two principle coordinates (component 1 and component 2). Putative puma populations are represented as N = black circles, SE = red circles, SW = blue circles ...... 142

Figure B.9: Results from the Bayesian clustering software STRUCTURE for A) microsatellites and B) SNPs. Each individual is represented as a vertical bar with colors corresponding the individual’s proportion of ancestry in the given number of clusters (K) ...... 143

Figure B.10: Probability of identity (PI) and probability of identity among siblings (PISIB) calculated for increasing combinations of loci for both the SNP and microsatellite (MS) datasets ...... 144

Figure B.11: Mean relatedness within each population calculated according to the method of Lynch and Ritland (1999) for both the microsatellite (blue) and SNP (red) datasets. Significant differences between datasets within a population are indicated with an asterisk (P < 0.01) ...... 145

Figure B.12: Genotyping success (call rate) for three microsatellite loci (blue) compared with 23 SNP loci (red) in 10 puma fecal samples. Microsatellite analyses described in Naidu et al. (2011) ...... 146

Figure C.1: Call rate over all 173,662 SNPs for each of 96 samples ordered according to increasing call rate along the X-axis ...... 174

Figure C.2: Multidimensional scaling plots for the first two dimensions of the LD-pruned data (6,794 SNPs). Each individual Mexican wolf is shown as a pie chart with slices representing each wolf’s proportion of ancestry in the three captive lineages as predicted from the pedigree. MB = McBride (blue), AR = Aragón (red), GR = Ghost 13

Ranch (green). Numbers indicate IDs (Siminski 2011) of wolves who appear abnormal in the clustering according to their ancestry predicted from the pedigree 175

Figure C.3: Multidimensional scaling of the combined dataset containing 80 Mexican wolves (blue circles) and 31 domestic dogs (black circles). Only the first two dimensions are shown ...... 176

Figure C.4: Results of the population clustering analysis in ADMIXTURE. A) Cross- validation error for different potential numbers of clusters, K, between 1 and 10. B) Ancestry plot for K = 3 and 6. Each vertical bar represents an individual with colors corresponding to the proportion of ancestry in the predefined number of clusters K (y axis). MB = McBride, AR = Aragón, GR = Ghost Ranch, CL = cross-lineage ...... 177

Figure C.5: Results from ADMIXTURE for K = 2 population clusters for both the A) unsupervised and B) supervised analyses. MB = McBride, AR = Aragón, GR = Ghost Ranch, CL = cross lineage. The single domestic dog genotyped, Hershey, is shown separately on the far right. In the supervised analysis, B, only non-MB Mexican wolves and Hershey had ancestry estimated; domestic dogs and MB wolves were assigned to clusters 1 and 2 a priori ...... 178

Figure C.6: Line graph depicting the hybrid index h (proportion of ancestry derived from MB wolves; black line), the proportion of the genome containing haplotypes derived from MB wolves (gray line), and the proportion of ancestry from MB predicted from the pedigree (dotted line). Non-MB Mexican wolves are listed on the horizontal axis according to studbook ID (Siminski 2011) and are ordered by birth year ...... 179

Figure D.1: A) Comparison of the proportion of total ROHs in different length categories and B) number vs. total length of ROHs. The horizontal and vertical dashed lines indicate the maximum values for known unrelated CL wolves with FPED = 0. The regression line was generated from these same CL wolves. MB = McBride, GR = Ghost Ranch, AR = Aragón, CL = cross-lineage ...... 207

Figure D.2: Observed measures of FHET (black line) and FROH (red line) for individual wolves sorted according to increasing FPED (colored circles). MB = McBride, GR = Ghost Ranch, AR = Aragón, CL = cross-lineage ...... 208

Figure D.3: Comparison of 95% confidence intervals for FHET (black region) and FROH (gray region) derived from 200 iterations of simulated genotype data with the observed FPED (red line). Individuals are sorted along the x-axis according to increasing FPED ...... 209

Figure D.4: Density plots of the genetic estimators of inbreeding FHET and FROH compared with true inbreeding known from the pedigree, FPED, for individuals in the simulated populations bred under a minimizing mean kinship approach (MW-MK; A and B) and using random mating approach (MW-RND; C and D). The solid black line represents the hypothetical 1:1 ratio between measures, whereas the dashed line 14

indicates the observed regression through all points. The scale bars on the right indicate the relative density of points on the plot ...... 210

Figure E.1: FST value for each locus plotted according to decreasing false discovery rate (FDR). The vertical black line indicates a FDR cutoff of 0.05. Outlier loci are shown in red ...... 237

Figure E.2: A) Manhattan plot displaying the results of the association mapping for juvenile survival. Not all are labeled for the sake of space. B) The same plot just for chromosome 13, containing the most significant SNPs. C) QQ Plot for the expected and observed P-values. The red line indicates the expected relationship, with the gray-shaded region representing the 95% confidence interval 238

Figure E.3: A) Manhattan plot displaying the results of the association mapping for litter size. Not all chromosomes are labeled for the sake of space. B) The same plot just for chromosome 13, containing the most significant SNPs. C) QQ Plot for the expected and observed P-values. The red line indicates the expected relationship, with the gray-shaded region representing the 95% confidence interval ...... 240 15

ABSTRACT

Traditionally, conservation genetics has examined neutral-marker (e.g microsatellite) surveys to inform the conservation and management of species. The field expanded together with the expansion of molecular biology, primarily enabled by polymerase chain reaction (PCR) and DNA sequencing technologies. Recently, advances in genomics and bioinformatics, notably next-generation sequencing (NGS), have demonstrated the ability to further enhance conservation genetic assessments. As a result, conservation genetics is rapidly transforming into a field of conservation genomics. Although complete genome sequencing and analysis is still beyond the reach of many conservation genetic projects, researchers can benefit by producing large amounts of genetic data quickly for their species of interest, or by exploiting existing genomic data for a closely related species. The research presented below serves as an example of these two different approaches. First, I review the current state of conservation genomics, utilizing examples when appropriate to illustrate different techniques and approaches. Next, I describe the development of a tool using NGS that is useful for the rapid genetic analysis of pumas (Puma concolor) called PumaPlex. This work details the methods involved and will be useful for anyone interested in working with a species where little genomic data is available. The last three chapters focus on using an existing genomic tool for the domestic dog to analyze admixture, quantify inbreeding, and identify potential adaptive variation in the endangered Mexican wolf

(Canis lupus baileyi). The results demonstrated the Mexican wolf has no significant recent ancestry from domestic dogs, and that several loci may potentially be effective in increasing fitness in the reintroduced population. 16

INTRODUCTION

Conservation genetics and genomics

It is evident that the earth’s biodiversity is being lost at an alarming rate and this pace does not appear to be diminishing (Butchart et al. 2010). Many threats, including human consumption of earth’s ecological resources, pollution, over-harvesting, invasive species, and global climate change are contributing to this decline, albeit their location and impact are distributed unevenly (Brooks et al. 2006). Being a “crisis” discipline, conservation biology often reacts to this biodiversity loss through protecting species and ecosystems, often with little available empirical data (Soulé 1985). Like other crisis- related fields, conservation biology incorporates a heterogeneous mixture of disciplines, notably those in the biological and social sciences (Soulé 1985). Of the biological sciences, genetics has been recognized as a fundamental component. Frankham (1995) suggested that there were seven key genetic concerns in conservation: 1) inbreeding depression, 2) accumulation and loss of deleterious alleles, 3) loss of genetic variation in small populations, 4) genetic adaptation to captivity and its effect on population reintroductions, 5) outbreeding depression, 6) population fragmentation, and 7) taxonomic uncertainty and introgression. The growth in the application of these genetic concepts and tools in conservation biology led to the emergence of the sub-discipline conservation genetics.

Traditionally, conservation genetics has examined relatively few neutral genetic markers (e.g. microsatellites) to inform the conservation and management of species.

The field expanded together with the expansion of molecular biology, primarily enabled by polymerase chain reaction (PCR) and DNA sequencing technologies, and now 17 includes a variety of genetic markers and methods. A recent meta-analysis of molecular studies of captive wildlife populations found that (55%) used microsatellites, while the remaining 45% of studies were distributed among the use of mitochondrial DNA, allozymes, restriction fragment-length polymorphisms (RFLPs), amplified fragment- length polymorphisms (AFLPs), randomly-amplified polymorphic DNA (RAPDs), inter simple sequence repeats (ISSRs), MHC loci, and karyotype analysis (Witzenberger and

Hochkirch 2011). Recently, technological advances in genetics have facilitated the investigation of many more molecular markers, often many orders of magnitude more. In some cases, it is now feasible to examine every single base of DNA in an organism. This new era of large-scale genetics is collectively termed genomics. When applied by conservation geneticists, genomics has the potential to enhance genetic assessments of endangered and threatened species, and has transformed the field into one of conservation genomics (Kohn et al. 2006). Although complete genome sequencing and analysis is still beyond the reach of many conservation genetic projects, researchers can benefit by producing large amounts of genetic data quickly for their species of interest, or by exploiting existing genomic data for a closely related species. The emergence of conservation genomics has been enabled by two key technologies: 1) high-throughput genotyping of single nucleotide polymorphisms (SNPs) (Tsuchihashi and Dracopoli

2002) and 2) next-generation sequencing (NGS) technologies (Metzker 2009). Although other aspects of genomics are useful, these technologies have been the most widely applied and hold the most promise for the field of conservation genomics.

Originally, conservation genomics applied the massive amounts of genomic information from model organisms to closely related species of conservation interest. 18

For example, von Holdt et al. (2011) used a high-throughput genotyping assay to analyze

~48,000 SNPs from domestic dogs in a variety of wild populations of wolves and other canids. The drastic reduction in NGS costs has now made possible the ability to sequence the entire genome of an endangered species (e.g. the giant panda; Li et al. 2010 and the wild camel; Jirimutu et al. 2012), or even multiple from an endangered species (e.g. giant pandas; Zhao et al. 2013). This data will provide conservation biologists with the opportunity to directly observe the sequences of DNA responsible for adaptation to certain environments, the loci that have negative, or detrimental, effects on fitness, and/or regions of chromosomes that are the result of inbreeding or introgression

(Allendorf et al. 2010). Importantly, estimating genetic diversity may no longer be necessary, as complete genomes allow the absolute measurement of genetic variation, which can be monitored and managed accordingly in imperiled species. Although still in its infancy, conservation genomics holds much promise for the preservation and maintenance of biodiversity. This new era of genomic data will not only provide more detailed answers to traditional conservation genetic questions, but may facilitate a change in the canonical view of conservation genetics through the ability to ask entirely new questions not previously possible.

As mentioned above, conservation genomic studies generally follow one of two approaches 1) the “genome-enabled” approach and 2) the “ab initio” approach. The genome-enabled approach applies a variety of genetic data, technologies, or other available information from one species (usually, but not limited to, model-organisms) to another, closely related species (Kohn et al. 2006). The potential for this type of approach will depend largely upon the phylogenetic relationships between the species 19 and the questions addressed (Kohn et al. 2006). The ab initio approach directly generates novel genetic or genomic data for the species of interest. Recently, with the advent of

NGS technologies, the latter approach is becoming much more common. In the following dissertation I demonstrate the use of each approach for conservation genomic studies of two, heavily managed carnivore species in Arizona: the Mexican wolf (Canis lupus baileyi) and the puma (Puma concolor). In the endangered Mexican wolf, my coauthors and I employ a large panel of SNPs (>173,000) developed for domestic dogs to answer several important questions regarding the captive management and reintroduction efforts. For pumas, we sequence the transcriptome (the collection of expressed genes in a given space and time) using 454 pyrosequencing technology, and create a custom SNP genotyping array for their genetic analysis. The results of this dissertation have important implications for the management of both carnivore species, and are discussed at length within each chapter.

Dissertation goals

1. Review how conservation geneticists have been utilizing the most recent

technological advances in genomics and applying them to species

conservation, notably through the use of next-generation sequencing and

genotyping of single nucleotide polymorphisms

2. Develop a high-through SNP genotyping assay for pumas (Puma concolor)

from next-generation sequencing of the puma transcriptome to:

a. Investigate population structure of pumas in the southwestern United

States 20

b. Examine the effectiveness of SNP genotyping using this assay with

that of traditional microsatellite analysis

3. Apply the extensive genomic information available for the domestic dog to its

closely related counterpart, the endangered Mexican wolf (Canis lupus

baileyi) to:

a. Quantify the extent of introgression from domestic dogs into the

Mexican wolf population

b. Compare statistical methods to estimate inbreeding from genomic data

in Mexican wolves for future studies of inbreeding depression and to

facilitate population management

c. Identify genetic loci associated with adaptive and detrimental effects

on reproductive fitness

Description of the dissertation format

This dissertation has been arranged as five independent appendices (Appendices

A – E). Each appendix has been formatted as a scientific manuscript to facilitate future publication in peer-reviewed journals. Although future publications will have multiple co-authors for their assistance throughout these studies (see Acknowledgements), the work presented in the following appendices represents my own original ideas, research, data collection, analyses, and interpretations.

Appendix A: CONSERVATION GENETICS AND THE TRANSITION TO

CONSERVATION GENOMICS. This section introduces the topics of genomics, next- generation sequencing, and SNPs and how they are being applied to species conservation. 21

Appendix B: PUMAPLEX: A NEW PANEL OF SNP MARKERS FOR

EFFECTIVE CONSERVATION GENETIC ANALYSIS AND MANAGEMENT

OF PUMAS (Puma concolor). This work was presented at 46th Joint Annual Meeting of the AZ/NM Chapters of the Wildlife Society in Albuquerque, New Mexico in February

2013. Several parts of this work are also given in two final reports to the Arizona Game and Fish Department - Habitat Partnership Committee (HPC-09-406 and HPC-10-705) who funded this project. Participating co-authors include Ron Thompson, Ashwin Naidu, and Melanie Culver

Appendix C: GENOME-WIDE ANALYSIS OF SNPS REVEAL PEDIGREE

ERRORS AND A LACK OF DOMESTIC DOG ANCESTRY IN THE

ENDANGERED MEXICAN WOLF (Canis lupus baileyi). These results were most recently presented at The Wildlife Society Annual Conference in Portland, Oregon in

October 2012, and were also described in a final report submitted to the U.S. Fish and

Wildlife Service Quick Response Program (Project ID: 10-R2-09) in October 2013. The co-authors of this study include Sarah Rinkevich and Melanie Culver.

Appendix D: MEASURING INBREEDING USING GENOMIC DATA: THE

ENDANGERED MEXICAN WOLF AS AN EXAMPLE. This work was also funded by the U.S. Fish and Wildlife Service Quick Response Program (10-R2-09) but is not included in the final report. The co-authors of this study include Sarah Rinkevich and

Melanie Culver. 22

Appendix E: EVIDENCE FOR SELECTION AND AN ASSOCIATION STUDY OF

GENES INVOLVED IN FITNESS IN THE ENDANGERED MEXICAN WOLF

(Canis lupus baileyi): IMPLICATIONS FOR MANAGEMENT. This study was also funded by the U.S. Fish and Wildlife Service Quick Response Program (10-R2-09) and can be found in the final report submitted in October 2013. The co-authors of this study include Sarah Rinkevich and Melanie Culver.

23

PRESENT STUDIES Focal Species

Despite the charismatic and iconic regard for much of the world’s large carnivores species, many are declining in population and face a variety of threats. It has been documented that the persistence of carnivores is negatively correlated with human population density (Cardillo et al. 2004), and this result may be a valuable indicator of anthropogenic impacts on the environment (Morrison et al. 2007). The loss of large carnivore density is known to cascade down through the food web and have enormous effects on the communities of other mammals, herpetofuana, insects, and plants (Ripple et al. 2014). Furthermore, extinctions of large carnivores may severely perturb the distribution of zoonotic parasites and their relationship with their hosts, potentially increasing infectious disease risk to (Harris and Dunn 2013). In the United

States, Arizona has five native large carnivores; more than any other state. These large carnivores include pumas (Puma concolor), jaguars (Panthera onca), gray wolves (Canis lupus), American black bears (Ursus americanus), and grizzly bears (Ursus arctus).

Because of the diversity of carnivores, Arizona may experience increasingly catastrophic ecological effects from the loss of these top predators compared with other states.

Grizzly bears and gray wolves were both extirpated from Arizona, although a small population of Mexican gray wolves (Canis lupus baileyi) has been reintroduced (see below), and jaguars are quire rare. This leaves pumas and black bears as controlling a majority of the top-predator dynamics in Arizona. Therefore, the conservation and management of these large carnivores, of which genetics plays an important role, is essential for maintaining the long-term integrity of the ecological communities in 24

Arizona. Below I describe the two large carnivore species analyzed in this dissertation: the puma and the Mexican wolf.

Pumas (Puma concolor)

The puma (Puma concolor) is a large felid with the most widespread range of any terrestrial mammal in the western hemisphere; from Alaska and northern Canada to the southernmost extent of Argentina and Chile (Sunquist and Sunquist 2002). The puma is not restricted to any certain habitat, and can be found in all forest types at nearly any elevation, scrublands, and deserts (Sunquist and Sunquist 2002). Its widespread distribution is likely the reason why the puma holds the Guinness World Record for most

English names of any animal, more than 40, including common names like “catamount”,

“painter”, “panther”, “mountain lion”, and “cougar” (Guinness World Records 2004).

Pumas can be quite large, with males typically attaining weights anywhere between 53 –

72 kg (116 – 160 lb) and females normally between 34 – 48 kg (75 – 106 lb) (see Nowell and Jackson 1996). Puma diet is largely composed of a variety of ungulates, including both wild (e.g. deer [Odocoileus sp.], bighorn sheep [Ovis canadensis], elk [Cervus canadensis]) and domestic (e.g. cattle [Bos taurus], horses [Equus sp.]) species, but often feed upon other small- to medium-sized mammals, birds, and reptiles (reviewed by

Nowell and Jackson 1996). In North America, ungulates composed a larger proportion of puma diet (68 %) than in Central and South America (35 %) where smaller prey was more prevalent (Iriarte et al. 1990). The versatile habitat and dietary requirements, secretive nature, large size, and charismatic nature of pumas corresponds well with their extensive distribution and revered nature in many native’s folklore (Lake-Thom 1997). 25

Despite their large size, pumas taxonomically share a common ancestor with other small , including domestic cats (Felis catus), more recently (~6.7 MYA) than with large cats (Panthera sp.) (~10.8 MYA). Additionally, they belong to the Puma Lineage, whom they share with their closest relative, the jaguarondi (P. yagouaroundi), and also with the cheetah (Acinonyx jubatus) (Johnson et al. 2006). P. concolor was originally divided into 32 subspecies, but Culver et al. (2000) used both mitochondrial and nuclear microsatellite DNA to condense these 32 subspecies into the currently recognized six subspecies. These six subspecies include four in South America (P. c. puma, P. c. cabrerae, P. c. capricornensis, P. c. concolor), one in Central America (P. c. costaricensis) and one in North America (P. c. couguar). An additional subspecies of puma, P. c. coryi, is sometimes recognized in Florida, but this is often strictly for political purposes and for legal protection according to the United States Endangered Species Act.

In North America, the single subspecies is likely the result of a post-glaciation re- colonization of puma populations from Central and South America approximately

10,000-12,000 years ago (Culver et al. 2000).

Although listed as a species of “least concern” according to the International

Union for the Conservation of Nature’s (IUCN) Red List (www.iucnredlist.org), puma populations are generally decreasing but are legally hunted in Canada, the western United

States (with the exception California), El Salvador, Ecuador, and Guyana (Caso et al.

2008). Pumas were extirpated from eastern North America (with the exception of the isolated Florida population described above) during European colonization, however a growing number of confirmed sightings are being reported from areas east of the 26

Mississippi River, including Illinois, Wisconsin, Indiana, and Connecticut (USFWS

2011).

For conservation purposes, pumas have often been targeted as an “umbrella species” to target the protection of entire landscapes (Weber and Rabinowitz 1996) because, as mentioned above for many large carnivores, their presence can have substantial impacts on dynamics and species composition of prey (Logan and Irwin 1985;

Turner et al. 1992) and even plant communities (Schmitz et al. 2000). Management of pumas is also directly related to their use as a game species (Logan and Sweanor 2001), for predating upon (at times endangered) wildlife (Turner et al. 1992; Wehausen 1996;

Hayes et al. 2000; Schaefer et al. 2000), and for human-animal conflicts such as predation upon livestock, pets, and even humans (Beier 1991; Torres et al. 1996). As human encroachment upon puma habitat increases, the identification and maintenance of migration corridors is becoming of interest to maintain connectivity between populations, especially for dispersing males who are have been known to travel very long distances

(Elbroch et al. 2009). Therefore, in order to preserve the ecological communities in which pumas reside and to minimize the conflicts between humans and pumas, research into the appropriate conservation and management actions to take is necessary.

Mexican wolves (Canis lupus baileyi)

Until recently, gray wolves (Canis lupus) were the most widely distributed terrestrial mammal aside from humans, inhabiting a majority of the Northern Hemisphere from 15° N in North America and 12° N in India (Mech and Boitani 2003). In North

America, Hall and Kelson (1959) recognized 24 different subspecies of gray wolf, but 27 later work by Nowak (1995) reduced this to five subspecies: C. l. arctos (Arctic wolf), C. l. lycaon (Eastern timber wolf), C. l. nubilis (Great Plains wolf), C. l. occidentalis (Rocky

Mountain wolf), and C. l. baileyi (Mexican wolf). However, the most recent synthesis of

North American gray wolf taxonomy supports that C. l. lycaon may represent a distinct species, C. lycaon, that C. l. nubilis, C. l. occidentalis, and C. l. baileyi are different subspecies, and that C. l. arctos remains inconclusive due to lack of information

(Chambers et al. 2012). Nonetheless, morphological and genetic evidence suggest that all extant North American gray wolves are descended from Eurasian gray wolves, which migrated across the Bering Strait in the middle Pleistocene (Nowak 1995; Vilá et al.

1999). Both Nowak (1995) and Vilá et al. (1999) argued that multiple waves of migration into North America occurred, the earliest of which was the Mexican wolf.

Nelson and Goldman (1929) were the first to describe the Mexican wolf (C. l. baileyi) from a type specimen from Chihuahua, Mexico. Its historical distribution is thought to have included much of southern Arizona and New Mexico, extreme western

Texas, and extended south through central Mexico. The Mexican wolf is the smallest subspecies of gray wolf, reaching an adult size of 1.5 – 1.8 m (5 – 6 ft) in length, 63 – 81 cm (25 – 32 in) in height, and a mass of 23 – 41 kg (50 – 90 lb) (Young and Goldman

1944). Cranially morphology is also quite distinct from other currently recognized subspecies, although there was considerable overlap between C. l. baileyi and two other previously recognized subspecies of wolves in the southwest (C. l. mogollonensis and C. l. youngi) (Bogan and Mehlhop 1983). This is consistent with the analysis of mitochondrial DNA from historic wolf specimens, which reported that what was formerly known as C. l. mogollonensis was likely an intergrade between Mexican wolves and 28 more northerly C. lupus (Leonard et al. 2005). This study also reported that a single C. l. nubilus-associated haplotype was found in Arizona, and C. l. baileyi-associated haplotypes were found in Nebraska, further documenting the potential for overlap between gray wolf subspecies. Nonetheless, both mitochondrial (Vilá et al. 1999;

Leonard et al. 2005; Hailer and Leonard 2008) and nuclear DNA (García-Moreno et al.

1996; Hailer and Leonard 2008; von Holdt et al. 2011) support that C. l. bailey are a distinct subspecies of gray wolf (Chambers et al. 2012).

As a result of European colonization of western North America and extensive predator removal campaigns in the early 20th century, Mexican wolf abundance was drastically reduced (reviewed by Rinkevich 2012). By 1980, the Mexican wolf was considered extirpated from the U.S. (Brown 1983) and the remaining population in

Mexico likely followed soon thereafter. During this time, however, three independent captive breeding programs were started, founded by an effective total of seven individuals (Hedrick et al. 1997). Overtime, these three captive lineages, Ghost Ranch,

Aragón, and McBride (Certified), accumulated high levels of inbreeding, and were subsequently merged in 1995. In 1998, the first reintroduction of Mexican wolves into east-central Arizona and western New Mexico took place, and has continued periodically until the present (USFWS 2010). Currently, the free-ranging Mexican wolf population numbers 75 individuals (USFWS 2012), far below initial expectations.

PumaPlex: A new panel of SNP markers for effective conservation genetic analysis and management of pumas 29

Pumas (Puma concolor) are one of the most widely studied species of wildlife in the western hemisphere, and are often the focus of a variety of wildlife conservation and management programs. Decisions made regarding puma management, however, can be quite complicated and have a variety of effects including direct impacts on other game species like bighorn sheep (Ovis canadensis) and deer (Odocoileus sp.), and potential consequences on livestock production, local ranchers, and/or other human-puma conflicts. Therefore, it is imperative that wildlife managers utilize the most robust scientific techniques that can rapidly produce accurate results in order to translate them into effective conservation strategies.

Recently, genetic tools have been incorporated into a variety of puma conservation programs, and have provided useful results regarding their demographic history, levels of genetic variation, familial relationships, individual identification, and diet analysis. Nearly all of these studies have focused on the use of either maternally inherited mitochondrial DNA or codominant, nuclear repetitive elements called microsatellites. Microsatellites have become popular lately, and a variety of loci isolated either from domestic cats or directly from pumas are currently being examined by researchers. Microsatellites, however, can be rather time consuming and expensive to analyze, and are notoriously difficult to compare across studies. A new genetic marker, called single nucleotide polymorphisms (SNPs), has recently grown in popularity and appears promising for a variety of wildlife-related studies.

In this study, my coauthors and I developed and validated a novel SNP- genotyping assay for pumas called PumaPlex. We first sequenced the transcriptome of a pool of RNA extracted from 12 wild-born pumas using 454 pyrosequencing. The 30 sequencing generated >358,000 sequences and >108 million bases (Mb) of DNA. We assembled these sequences into 2,073 contigs and used this assembly to identify SNPs.

After quality control and filtering, an initial set of 434 SNPs was produced. In order to verify a SNP was real, polymorphic, and not a false positive we analyzed 129 SNPs using the Sequenom MassARRAY system. A total of 30/129 (23.3 %) SNPs were found to fit our criteria, and a final genotyping multiplex, PumaPlex, was produced and contained 27

SNPs. PumaPlex was then validated on more than 500 pumas, including samples representing their entire geographic range and all six subspecies.

In P. c. couguar, the mean minimum allele frequency (MAF) was 0.274 ± 0.025 and ranged from a low of 0.060 to a highest MAF of 0.449. Observed heterozygosity was 0.340 ± 0.024 and FIS was 0.074 and did not differ significantly from zero. A principle coordinates (PCoA) analysis was able to suggest differences between P. c. couguar and the other subspecies of puma, and no specific clustering within the North

American pumas was present. When compared with a dataset of microsatellite for the same individuals in Arizona, the SNP dataset produced relative levels of heterozygosity,

FIS, and FST comparable to that of the microsatellite dataset for the hypothetical populations examined. Overall pairwise relatedness between individuals was 0.091 and

0.096 for microsatellites and SNPs, respectively) and significantly greater than zero (t- test, P < 0.001). Both datasets are able to distinguish population structure between northern and southern Arizona using both PCoA and a Bayesian analysis implemented in the software STRUCTURE. However, the microsatellite dataset detects additional, subtle structure between southeast and southwest Arizona not detected by the SNP loci.

The probabilities of identity (PI) for microsatellites and SNPs were 2.4x10-12 and 3.6x10- 31

8, respectively. PumaPlex had approximately the same PI as nine microsatellite loci (PI =

2.5x10-8). Nonetheless, both datasets were well below the recommended threshold for estimates of population size (PI < 0.01) and for forensic applications (PI < 0.0001).

When genotyping call rates were compared in a small sample of 10 scat samples, SNPs had a higher call rate (0.692) than that of microsatellites (0.633), although this difference was not significantly different (paired t-test, P = 0.426). SNPs, however, usually outperformed microsatellites when microsatellite call rate was very low (0.33).

These results demonstrate the PumaPlex is an effective tool for the genetic analysis of puma populations. PumaPlex is able to process 384 samples for all loci simultaneously, saving much time and costs whilst producing similar results to traditional microsatellite analysis. Additionally, the resulting SNP genotypes produced from

PumaPlex are easily stored in databases and compared across studies, making it valuable for large-scale studies and conservation efforts of pumas across their range.

Genome-wide analysis of SNPs reveals pedigree errors and a lack of domestic dog ancestry in the endangered Mexican wolf

As a result of extensive predator persecution during the European expansion and colonization of the western Unites States and Mexico, the Mexican wolf (Canis lupus baileyi) was eventually extirpated by the mid 1900’s. During this time, however, three independent captive lineages were established. These captive populations, McBride,

Ghost Ranch, and Aragón are descended from three, two, and two founders, respectively.

Over time, however, these captive populations accumulated substantial levels of inbreeding. To mitigate the effects of inbreeding, the three captive wolf lineages were 32 merged in 1995. In 1998, Mexican wolves were first introduced into Arizona and New

Mexico, and the current population of free-ranging wolves numbers approximately 75 individuals. Numerous questions remain, however, regarding the extent of pedigree errors in the Mexican wolf captive population as well as potential admixture from domestic dogs.

In this study, 87 different Mexican wolf samples and a single domestic dog were genotyped for >173,000 SNPs using the Illumina CanineHD BeadArray. After removing

6 samples with low genotyping success (< 0.96), a total of 80 Mexican wolves and the domestic dog sample remained. Across all SNPs, the reproducibility of genotypes was quite high (~ 100 %). When examining 13 parent-offspring trios, three trios were found to likely contain a misidentified parent. The mean frequency of Mendelian errors in these three trios was 3.2 %, 80 times greater than the frequency of Mendelian errors in the remaining 10 trios (0.04 %). After quality control and trimming of SNP loci, 68,147 polymorphic SNPs remained. A multidimensional scaling analysis of this dataset revealed clustering associated with the three captive lineages. Individual wolves of cross-lineage ancestry were placed intermediate between the three population clusters, correlating well with the pedigree-predicted ancestry. Multidimensional scaling also identified two additional wolves whose genetic ancestry did not correspond with the pedigree prediction. One of these individuals was also identified as a likely misidentified parent.

The Mexican wolf dataset was then combined with a dataset of 446 domestic dogs from 30 breeds to identify potential admixture. The total dataset was quality filtered as done previously and also for strong linkage disequilibrium resulting in a final dataset of 33

106,373 SNPs. Both multidimensional scaling and a Bayesian analysis implemented in the software ADMIXTURE found little to no evidence of introgression from domestic dogs. The results from ADMIXTURE indicated that only two individuals from the

Aragón lineage had a small amount of potential domestic dog ancestry, 2.4 % and 3.4 %.

A maximum likelihood and principle components analysis of the hybrid index, h, were inconclusive since Mexican wolf ancestry only correlated with pedigree estimates in the

McBride lineage, and no “true” Mexican wolves could be used as a proper ancestral population.

These data demonstrate that the extensive inbreeding that occurred in the three captive lineages quickly generated substantial allele frequency divergence. This resulted in the ability to identify several pedigree errors, which needs further investigation to be properly corrected in the official pedigree. Finally, comparison with a large dataset of domestic dogs supports an almost non-existent level of introgression in Mexican wolves.

This has important implications for the long-term conservation and management of this endangered subspecies.

Measuring inbreeding using genomic data: endangered Mexican wolves as an example

Inbreeding occurs when an individual’s parents share a recent common ancestor.

In many endangered species, or otherwise small populations, inbreeding is an important concern because it increases homozygosity. The increase in homozygosity is assumed to result in a decrease in the fitness of individuals and the population, called inbreeding depression. The ability to statistically correlate inbreeding with decreased fitness is 34 necessary for detecting inbreeding depression, and the entire field of “heterozygosity- fitness correlations” is devoted to this. However, these correlations depend on a number of factors, and are susceptible to a variety of conditions such as Mendelian inheritance variation, inbreeding variance, the number of genetic markers, and method of calculating inbreeding. These correlations are often rather weak, especially in circumstances where the expectation of inbreeding depression is high. Therefore, examination of new methods, especially in the era of genome-wide sets of genetic markers, is necessary to attempt to improve the ability to detect inbreeding depression, and thereby potentially reduce this effect in threatened or endangered populations.

Using the Mexican wolf population as an example and the genome-wide set of

SNPs described in Appendix C, three different measures of inbreeding were compared: 1) inbreeding predicted from the pedigree (FPED), 2) inbreeding calculated from a commonly used moment estimator (FHET) and 3) inbreeding derived from the proportion of the genome in runs of homozygosity (FROH). FROH was a better estimator of FPED (r =

0.897, p = < 0.001) than FHET (r = 0.072, p = 0.528). Inbreeding calculated using FROH was also consistently higher than FPED, suggesting the founders of the Mexican wolf population may have been inbred to some degree. Furthermore, to avoid the effect of potential pedigree errors, sampling variance, and founder inbreeding, the above correlations were repeated on simulated SNP data generated according to the Mexican wolf pedigree. Again, FROH (r ~ 1.00, p = < 0.001) outperformed FHET (r = 0.134, p =

0.235) when predicting FPED (r = 0.897, p = < 0.001). The high correlation of FROH with

FPED was indicative of the effect of pedigree errors on the observed dataset. Finally, to the assess the effect of different breeding strategies on inbreeding calculations we 35 simulated 500 populations bred using a method to minimize mean kinship, commonplace in captive populations of endangered species, and another 500 populations bred using random mating. Both sets of simulations utilized parameters characteristic of the

Mexican wolf captive populations. FROH continued to outperform FHET in all simulations, and, rather unexpectedly, was a slightly better predictor in populations bred using mean kinship than populations mated at random.

These results support that large, genome-wide sets of SNPs can be very useful for predicting inbreeding, and that FROH, in particular, is a more efficient estimator than methods (such as FPED) that utilize allele frequencies. This has important implications in detecting inbreeding depression in many endangered species, where sample sizes can be quite low and deviations from Hardy-Weinberg expectations can have drastic effects on allele frequencies, thereby reducing the ability to accurately infer inbreeding.

Evidence for selection and an association study of genes involved in fitness in the endangered Mexican wolf (Canis lupus baileyi): implications for management

Programs that attempt to minimize the loss of biodiversity often practice in situ conservation, however ex situ conservation is also important when a species can no longer be managed in its natural habitat. Many ex situ, or captive populations, can be quite challenging because these populations commonly start with relatively few founders, lack preliminary genetic data, adapt to captivity, become inbred, and/or fail to reproduce.

Therefore genetics, or more recently, genomics, is a valuable tool to aid in the management in these populations. 36

The conservation program of the endangered Mexican wolf (Canis lupus baileyi) suffers from a variety of the consequences often encountered after a severe bottleneck and subsequent inbreeding (see above, or Appendices C and D). The current Mexican wolf population is susceptible to a variety of low fitness traits, such as small litter size and juvenile survival to name a few. The identification of the underlying genetic basis for these traits is critical for future population management to establish a persistent and prolific wild population.

In this study a genome-wide scan for selection was performed using the 68,147

SNP dataset described in previous chapters of this dissertation. Additionally, each SNP was tested for a potential association with two quantitative fitness phenotypes: litter size and juvenile survival. The tests for selection, performed using the 80 high quality individual’s genotypes and the software BAYESCAN, detected 54 loci whose posterior odds for a model accounting for positive selection as the best explanation of the observed differentiation were greater than models without selection. After multiple comparison correction using a false discovery rate (FDR) cutoff of 0.05, only 22 loci remained. A total of 48 genes were found within the genomic regions of these SNPs. Four of these genes have been previously implicated in human disease, and one gene, ZAR1L, a homolog of ZAR1, is expressed in vertebrate ovaries and certain alleles are known to cause infertility in mice. analysis revealed that these regions under selection were enriched for genes involved in signal transduction in response to DNA damage (FDR = 0.057).

For the association study, a total of 3,790 and 3,618 SNPs were found associated

(p < 0.05) with juvenile survival and litter size, respectively. However, no loci remained 37 after correction for multiple testing. The effect size was quite small for juvenile survival

(0.4 %) and much larger for litter size (3.7 %). For both phenotypes, the most significant hit was to a region of chromosome 13. For juvenile survival, this region of chromosome

13 contained the canine homologs of four genes: RNF139, TATDN1, NDUFB9, and

MTSS1. For litter size, nine protein-coding genes were found in the region. One gene,

RSPO2, had previously been implicated as responsible for moustache and eyebrow

‘furnishings’ in wire-haired domestic dogs, and a second gene, EBAG9, has been implicated in a variety of cancers and the estrogen response pathway. Reduced expression of EBAG9 has also been known to increase the rejection of fetuses in humans.

Overall, this study lacked statistical power to generate definitive conclusions regarding loci under selection and those associated with juvenile survival and litter size.

Nonetheless, these results do suggest that genome-wide analysis of SNPs may be important for detecting these influential genetic regions. Future work is needed to confirm these finding using a much larger sample size. However, once found, genetic marker-assisted selection of individuals to pair and release into the wild may have critical implications in the recovery of the reintroduced Mexican wolf population.

Conclusions

As mentioned in the introduction, two separate approaches were utilized in these studies to apply genomic techniques and technologies to wildlife conservation. Using the ab initio approach in pumas, we were able to develop a high-throughput genotyping array that appears to be nearly as powerful as current microsatellite analysis. This array provides rapid results with little expense of DNA, is comparable across laboratories, and 38 outperforms microsatellites in fecal DNA genotyping success. However, the ab initio method required extensive work up front and produced relatively few markers compared with the genome-enabled approach. Using this latter approach in Mexican wolves, we demonstrated extensive evidence for pedigree errors and a lack of recent admixture with domestic dogs. Considering that gray wolf reintroduction is a current “hot topic” in politics and wildlife management, these results will be important for defining the future of the Mexican wolf recovery program. Additionally, the Mexican wolf population and associated pedigree provided an excellent opportunity to test the performance of different estimators of inbreeding from genome-wide SNP data. We discovered that for quantifying inbreeding and testing for subsequent inbreeding depression, using runs of homozygosity is more accurate and precise, especially when population sizes are small and deviations from Hardy-Weinberg expectations large. Finally, we were able to identify potential genetic loci linked to litter size and juvenile survival in the Mexican wolf. However, we also highlight how we lack the statistical power to support robust conclusions, and more work will be needed to verify these findings.

Conservation genetics has been rapidly transformed into a field of conservation genomics, enabled primarily by next-generation sequencing technologies. Although modern genomic markers, notably SNPs, will not completely replace traditional markers like microsatellites, their use will continue to expand along with technology.

Conservation biologists and wildlife managers should be aware of the available techniques and questions that can be addressed, in addition to being prepared to ask new questions that were once not feasible. With the decreasing costs and increasing availability of complete genome sequencing, these new questions, along with the 39 canonical approaches and questions in conservation genetics, should improve our ability to conserve and manage the planet’s biodiversity. 40

REFERENCES

Allendorf FW, Hohenlohe PA & Luikart G. (2010). Genomics and the future of

conservation genetics. Nature Reviews Genetics 11(10): 697-709.

Beier P. (1991). Cougar attacks on humans in the United States and Canada. Wildlife

Society Bulletin 19(4): 403-412.

Bogan MA & Mehlhop P. (1983). Systematic relationship of gray wolves (Canis lupus)

in southwestern North America. Occasional Papers in the Museum of

Southwestern Biology 1: 1-21.

Brooks TM, Mittermeier RA, da Fonseca GAB, et al. (2006). Global biodiversity

conservation priorities. Science 313(5783): 58-61.

Brown DE. (1983). The wolf in the Southwest: the making of an endangered species, The

University of Arizona Press, Tucson, Arizona, USA.

Butchart SHM, Walpole M, Collen B, et al. (2010). Global biodiversity: indicators of

recent declines. Science 328(5982): 1164-1168.

Cardillo M, Purvis A, Sechrest W, et al. (2004). Human population density and extinction

risk in the world's carnivores. Plos Biology 2(7): 909-914.

Caso A, Lopez-Gonzalez C, Payan E, et al. (2008). Puma concolor. In IUCN 2013.

IUCN Red List of Threatened Species. Version 2013.2. www.iucnredlist.org.

Chambers SM, Fain SR, Fazio B, et al. (2012). An account of the taxonomy of North

American wolves from morphological and genetic analyses. North American

Fauna 77: 1-67.

Culver M, Johnson WE, Pecon-Slattery J, et al. (2000). Genomic ancestry of the

American puma (Puma concolor). Journal of Heredity 91(3): 186-197. 41

Elbroch M, Wittmer HU, Saucedo C, et al. (2009). Long-distance dispersal of a male

puma (Puma concolor puma) in Patagonia. Revista Chilena de Historia Natural

82: 459-461.

Frankham R. (1995). Conservation genetics. Annual Review of Genetics 29: 305-327.

Garcia-Moreno J, Matocq MD, Roy MS, et al. (1996). Relationships and genetic purity of

the endangered Mexican wolf based on analysis of microsatellite loci.

Conservation Biology 10(2): 376-389.

Guinness World Records 2004. (2004). Guinness World Records LTD. p. 49.

Hailer F & Leonard JA. (2008). Hybridization among three native North American canis

species in a region of natural sympatry. Plos One 3(10): e3333.

Hall ER & Kelson KR. (1959). The Mammals of North America, Vol. II, The Ronald

Press, New York, New York, USA.

Harris NC & Dunn RR. (2013). Species loss on spatial patterns and composition of

zoonotic parasites. Proceedings of the Royal Society B 280(1771): 20131847.

Hayes CL, Rubin ES, Jorgensen MC, et al. (2000). Mountain lion predation of bighorn

sheep in the Peninsular Ranges, California. Journal of Wildlife Management

64(4): 954-959.

Hedrick PW, Miller PS, Geffen E, et al. (1997). Genetic evaluation of the three captive

Mexican wolf lineages. Zoo Biology 16(1): 47-69.

Iriarte JA, Franklin WL, Johnson WE, et al. (1990). Biogeographic variation of food

habits and body size of the American puma. Oecologia 85: 185-190.

Jirimutu WZ, Ding G, Chen G, et al. (2012). Genome sequences of wild and domestic

bactrian camels. Nature Communications 3: 1202. 42

Johnson WE, Eizirik E, Pecon-Slattery J, et al. (2006). The Late Miocene radiation of

modern Felidae: A genetic assessment. Science 311(5757): 73-77.

Kohn MH, Murphy WJ, Ostrander EA, et al. (2006). Genomics and conservation

genetics. Trends in Ecology & Evolution 21(11): 629-637.

Lake-Thom B. (1997). Spirits of the earth: a guide to Native American nature symbols,

stories, and ceremonies. Penguin Books, New York, USA.

Leonard JA, Vilá C & Wayne RK. (2005). Legacy lost: genetic variability and population

size of extirpated US grey wolves (Canis lupus). Molecular Ecology 14(1): 9-17.

Li R, Fan W, Tian G, et al. (2010). The sequence and de novo assembly of the giant

panda genome. Nature 463(7279): 311-317.

Logan KA & Irwin LL. (1985). Mountain lion habitats in the Big Horn Mountains,

Wyoming. Wildlife Society Bulletin 13(3): 257-262.

Logan KA & Sweanor LL. (2001). Desert Puma: Evolutionary Ecology and

Conservation of an Enduring Carnivore, Island Press, Washington D.C., USA.

Mech LD & Boitani L. (2003). Wolves: behavior, ecology, and conservation. University

of Chicago Press, Chicago, Illinois, USA.

Metzker ML. (2009). Sequencing technologies - the next generation. Nature Reviews

Genetics 11(1): 31-46.

Morrison JC, Sechrest W, Dinerstein E, et al. (2007). Persistence of large mammal faunas

as indicators of global human impacts. Journal of Mammalogy 88(6): 1363-1380.

Nelson EW & Goldman EA. (1929). A new wolf from Mexico. Journal of Mammology

10: 165-166. 43

Nowak RM. (1995). Ecology and conservation of wolves in a changing world,

Proceedings of the Second North American Wolf Symposium, Canadian

Circumpolar Institute, University of Alberta, Edmonton, Alberta, Canada.

Nowell K & Jackson P. (1996). Wild cats. Status survey and conservation action plan.

IUCN/SSC Specialist Group, Gland, Switzerland and Cambridge, UK.

Rinkevich S. (2012). An assessment of abundance, diet, and cultural significance of

Mexican gray wolves in Arizona. Dissertation, University of Arizona.

Ripple WJ, Estes JA, Beschta RL, et al. (2014). Status and ecological effects of the

world's largest carnivores. Science 343(6167): 151-+.

Schaefer RJ, Torres SG & Bleich VC. (2000). Survivorship and cause-specific mortality

in sympatric populations of mountain sheep and mule deer. California Fish and

Game 86(2): 127-135.

Schmitz OJ, Hamback PA & Beckerman AP. (2000). Trophic cascades in terrestrial

systems: a review of the effects of carnivore removals on plants. American

Naturalist 155(2): 141-153.

Soulé, ME. (1985). What is conservation biology?. BioScience 35(11): 727-734.

Sunquist ME & Sunquist F. (2002). The essence of cats. In Wild cats of the world

(Sunquist, M. E. & Sunquist, F., eds.), pp. 11-13. University of Chicago Press,

Chicago, Illinois, USA.

Torres SG, Mansfield TM, Foley JE, et al. (1996). Mountain lion and human activity in

California: testing speculations. Wildlife Society Bulletin 24(3): 451-460.

Tsuchihashi Z & Dracopoli NC. (2002). Progress in high throughput SNP genotyping

methods. Pharmacogenomics Journal 2(2): 103-110. 44

Turner JW, Wolfe ML & Kirkpatrick JF. (1992). Seasonal mountain lion predation on a

feral horse population. Canadian Journal of Zoology 70(5): 929-934.

U.S. Fish and Wildlife Service [USFWS]. (2010). Mexican wolf conservation

assessment. U.S. Fish and Wildlife Service Southwest Region, Albuquerque, New

Mexico, USA.

U.S. Fish and Wildlife Service [USFWS]. (2011). Eastern puma (=cougar) (Puma

concolor couguar) 5-year review: summary and evaluation. U.S. Fish and

Wildlife Service, Maine Field Office, Orono, Maine, USA.

U.S. Fish and Wildlife Service [USFWS]. (2012). Mexican wolf Blue Range

reintroduction project statistics. U.S. Fish and Wildlife Service, Downloaded

from: http://www.fws.gov/southwest/es/mexicanwolf/.

Vilá C, Amorim IR, Leonard JA, et al. (1999). Mitochondrial DNA phylogeography and

population history of the grey wolf Canis lupus. Molecular Ecology 8(12): 2089-

2103. vonHoldt BM, Pollinger JP, Earl DA, et al. (2011). A genome-wide perspective on the

evolutionary history of enigmatic wolf-like canids. Genome Research 21(8):

1294-1305.

Weber W & Rabinowitz A. (1996). A global perspective on large carnivore conservation.

Conservation Biology 10(4): 1046-1054.

Wehausen JD. (1996). Effects of mountain lion predation on bighorn sheep in the Sierra

Nevada and Granite Mountains of California. Wildlife Society Bulletin 24(3): 471-

479. 45

Witzenberger KA & Hochkirch A. (2011). Ex situ conservation genetics: a review of

molecular studies on the genetic consequences of captive breeding programmes

for endangered animal species. Biodiversity and Conservation 20(9): 1843-1861.

Young SP & Goldman EA. (1944). The Wolves of North America: Part I, Dover

Publications Inc., New York, New York, USA.

Zhao S, Zheng P, Dong S, et al. (2013). Whole-genome sequencing of giant pandas

provides insights into demographic history and local adaptation. Nature Genetics

45(1): 67-71. 46

APPENDIX A

CONSERVATION GENETICS AND THE TRANSITION TO CONSERVATION GENOMICS

FITAK, ROBERT R.

Genetics Graduate Interdisciplinary Program, The University of Arizona, Tucson, AZ

85721, USA

This paper was prepared to submit to a journal.

Corresponding Address:

Robert R. Fitak

317 Biosciences East

1311 E 4th St

Tucson, AZ, 85721, USA

Tel: 520-626-1636

Fax: 520-621-8801

Email: [email protected]

Keywords:

Population genomics, next-generation sequencing, genome enabling, single nucleotide polymorphisms 47

ABSTRACT

The field of conservation genetics has continually aimed at applying the latest techniques in molecular genetics to the conservation and management of species. In recent years, substantial improvements in genomic technologies and DNA sequencing have revolutionized the applications of genetics in conservation studies. As a result, conservation genetics has progressed into a field of conservation genomics. Through the ability to simultaneously analyze the expression of thousands of genes, genotype individuals for hundreds of thousands of markers, or even sequence entire genomes, conservation genetics has generated more statistically robust conclusions and even begun asking new questions entirely. For instance, captive breeding programs can now implement more specific “marker-assisted breeding” strategies, and the once-elusive adaptive variation can now be identified and managed accordingly in threatened or endangered populations. Unlike other reviews suggesting the potential for conservation genomics, we acknowledge conservation genomics as realized and here we use existing conservation genomic studies to illustrate the various applications of genomics and present the reader with various techniques currently being employed. Conservation biologists and managers alike should find this article a useful introduction to tools and techniques available in the rapidly changing field of conservation genomics. 48

INTRODUCTION

The field of conservation genetics applies genetic and other molecular techniques to preserve species as dynamic entities capable of tolerating environmental change

(Frankham 2009). Traditionally, conservation genetics has employed neutral-marker

(e.g., microsatellite) surveys to examine the loss of genetic diversity, inbreeding, population structure, kinship, selection, and effects of captivity (Frankham 1995).

Conservation genetics developed together with the expansion of molecular biology, primarily enabled by the polymerase chain reaction (PCR) and DNA sequencing.

Recently, new technologies that can rapidly produce many orders of magnitude more sequences, collectively termed next-generation sequencing (NGS), have pioneered the growth in the field of genomics (Hawkins et al. 2010; Metzker 2010; Zhou et al. 2010).

Adopted by conservation geneticists, genomics has the potential to enhance conservation genetic assessments, and a field of conservation genomics has been proposed (Luikart et al. 2003; Ryder 2005; Kohn et al. 2006; Avise 2009; Ouborg 2009; Allendorf et al. 2010;

Ouborg et al. 2010).

Genomics, or the study of genomes, is revolutionizing our understanding of biology. Since the first publication of a genome sequence (phage Φ-X174, Sanger et al.

1977a), the field has grown to include nearly 3,000 complete genomes and ~8,500 ongoing projects (Pagani et al. 2012). Endeavors such as the 1,000 Genomes Project

(Altshuler et al. 2012) and Genome 10K Project (Haussler et al. 2009) aim to sequence as many genomes to elucidate the evolutionary history and extent of genetic variation in humans and vertebrates, respectively. Although the cost and bioinformatic requirements of resequencing complete genomes remains beyond the financial reach of most 49 conservation genetic projects, the benefits from genomics are still evident. For example, genomic-level analyses are not limited to the taxon sequenced but genome-enable numerous related taxa depending on phylogenetic distance and questions addressed

(Kohn et al. 2006). Considering nearly all model organisms have had their genome sequenced, next the number of sequences from non-model organisms, such as species of conservation priority, will begin to increase.

Previous reviews have examined the need for and potential applications of genome analyses in conservation (Luikart et al. 2003; Ryder 2005; Kohn et al. 2006;

Avise 2009; Ouborg 2009; Allendorf et al. 2010; Ouborg et al. 2010), but often present limited discussion regarding how conservation geneticists have already implemented genomics. In this review, I discuss existing studies that have adopted genomic techniques to benefit species conservation. I begin with an introduction to sequencing and the explosion of NGS technologies. Then, I describe various microarray assays, including heterologous (cross-species) studies, which require very little or no a priori sequence data for the species of interest. Finally, I discuss the various applications of genomics to expanded surveys for genetic markers and complete genome sequencing.

Due to the large number of recent studies employing genomics in a variety of non-model species, this review is not exhaustive and I primarily focus on studies of endangered, threatened, or conservation-relevant species of wild vertebrates. Although I occasionally reference evolutionary studies of species not in the interest of conservation, usually the techniques and questions can be directly extrapolated to conservation programs.

NEXT-GENERATION SEQUENCING 50

Since 1977 DNA sequencing has generally been performed by two techniques:

Sanger dideoxy chemistry (Sanger et al. 1977b) and Maxam-Gilbert chemical sequencing

(Maxam and Gilbert 1977). The Sanger method rapidly became the technique of choice for its ease of automation and was drastically improved as a result of the international effort to sequence the (Lander et al. 2001). During the Human Genome

Project, automated dideoxy sequencing achieved improved accuracy, increased sequence length, greater parallelization, and reduced cost. Sanger sequencing, however, is not without its limitations. It requires gels and polymers to separate individual fluorescently labeled DNA fragments, can only be partially automated, and analyzes a limited number of samples in parallel. After the Human Genome Project, the growing interest in genomics and increased demand for sequences required several orders of magnitude more sequence production. And despite its innumerable applications in biology and medicine, automated Sanger-sequencing was unable to meet this requirement (Ansorge 2009).

In the early 2000's, large amounts of resources were devoted to the development of new sequencing technologies that did not use gels and could sequence DNA on a much larger scale. As a result, three new sequencing platforms were commercialized and released between 2005 and 2007. These techniques, developed by 454 Life Sciences

(www.454.com), Solexa (www.illumina.com), and Applied Biosystems

(www.appliedbiosystems.com), were able to sequence DNA on the order of millions

(Mb) to billions (Gb) of base-pairs per run (Table A.1; for detailed descriptions of NGS technologies, see Shendure and Ji 2008; Ansorge 2009; Metzker 2010; Zhou 2010; Glenn

2011). The first NGS platform was released by 454 Life Sciences in 2005 and was later purchased by Roche (http://www.roche.com). The technique used pyrosequencing of 51

DNA attached to beads in picoliter plates (Margulies et al. 2005). The first release, the

GS-20, could produce roughly 25 Mb per machine-run (Margulies et al. 2005). The latest release, the GS-FLX Titanium XL+, is capable of producing 700 Mb per run with read lengths up to 1,000 bases (Table A.1; www.454.com). The long read lengths of 454 sequencing are an advantage over other technologies, however, it is a challenge to sequence homopolymer stretches (Ansorge 2009; Metzker 2010). The other two sequencing platforms currently produce much shorter read lengths (50 – 150 bp at present), but orders of magnitude more reads (Table A.1). The Solexa platform was first released in 2006, and purchased by Illumina in 2007. It used a sequencing-by-synthesis method, and the latest version of the technology is capable of generating up to 600 Gb per run (HiSeq 2500; http://www.illumina.com). In 2007, Applied Biosystems released their SOLiD sequencing system. This method used a ligation-based sequencing approach and the current model can generate up to 180 Gb per run (5500xl; www.appliedbiosystems.com). In addition to these canonical NGS technologies, a plethora of other techniques have become available. These techniques can sequence single molecules of DNA (Helicos Biosciences - www.helicosbio.com; Pacific

Biosciences - www.pacificbiosciences.com), utilize proton gradients on semiconductor chips (Ion Torrent – www.iontorrent.com), and pass strands of DNA through protein nanopores (Oxford Nanopore Technologies – www.nanoporetech.com). Overall, NGS has led to drastic reductions in sequencing costs (Figure A.1), and as a result has become affordable and amenable to the budgets of conservation geneticists.

NGS has and will continue to revolutionize our understanding of genomics.

Despite its many advantages, NGS will likely never replace Sanger sequencing due to its 52 low cost (when only a few sequences are desired), long reads, and high quality. As new sequencing technologies emerge and current technologies advance, however, one needs to be aware of the advantages and disadvantages of each NGS system. Each NGS technology has specific error types and rates, which can vary depending on a variety of conditions (see Gilles et al. 2011; Glenn 2011; Minoche et al. 2011; Luo et al. 2012;

Quail et al. 2012; Wang et al. 2012 for more detailed descriptions of this topic). For example, Ion Torrent sequencing has difficulties sequencing AT-rich samples (Quail et al. 2012), and Roche 454 pyrosequencing with homopolymer stretches (Gilles et al.

2011). Regardless, the NGS platform chosen will depend upon whether one is sequencing PCR amplicons, genomic DNA, or cDNA, the costs, the read length desired, etc. Aside from these drawbacks the principle limitation of NGS thus far has been the bioinformatic challenges and data storage concerns associated with the massive amounts of sequence data (Pop and Salzberg 2008). NGS generates very large data files, often requiring extensive storage capabilities in the many terabytes. Assembling the many, and often short, sequencing reads also requires processing capabilities beyond what is routinely available to consumers. Therefore, researchers often have access to shared, high-performance computer resources and spend a majority of costs on bionformatic support. Despite the drawbacks, NGS remains instrumental to studies of genomics.

Perhaps soon it will even be possible to sequence entire single chromosomes in one pass with low error. Nonetheless, as I discuss below, NGS and its role in genomics provide incredible opportunities for conservation geneticists to improve their studies of threatened and endangered species.

53

MICROARRAYS

For years conservation geneticists have used genetic markers, such as DNA sequences and microsatellite loci, from one species to examine other, closely related species (see Barbará et al. 2007 for a review and references therein). The success of these cross-species, or heterologous, genetic markers can vary depending upon the questions addressed and phylogenetic distance between the species (Kohn et al. 2006).

The first genomic tools and resources were initially developed for model organisms, but as analyses became more affordable they found their way into the hands of conservation geneticists researching closely related species. Prior to the development of NGS, these genomic tools were focused on the use of various types of arrays, or chips, called microarrays.

The microarray has been instrumental for studies of genomics. Microarrays are small chips or glass slides containing hundreds to millions of oligonucleotide probes that hybridize with DNA and/or RNA. Microarrays can simultaneously analyze the expression pattern of thousands of genes or genotype individuals for millions of single- nucleotide polymorphisms (SNPs) and copy number variants (Heller 2002; Hoheisel

2006). More recently, microarrays have been developed to examine other biological content such as short RNA molecules (Thomson et al. 2004; Li and Ruan 2009), total protein content (proteome; Talapatra et al. 2002; Stoevesandt et al. 2009; Lee et al. 2013), and sites of DNA methylation (epigenome; Hoheisel 2006; Laird 2010). Gibson (2002) and Chismar et al. (2002) were the first to propose the heterologous use of microarrays as a valuable tool for ecologists and population biologists to examine species of conservation interest. Gibson (2002) suggested that the use of expression microarrays 54 has important applications in understanding adaptive variation, quantitative trait variation, complex biochemical pathways and networks underlying phenotypic variation, symbiosis and parasitism, and population structure and divergence.

Expression microarrays

The early use of microarrays focused on the simultaneous analysis of gene expression. Expression microarrays primarily identify groups of genes that respond differently to varying sets of conditions or treatments (Schena et al. 1995) (Figure A.2A).

In order to construct an expression microarray, one must have previous knowledge of the sequences of a large number of genes. These sequences can be found in a published genome sequence, gathered from databases of expressed sequence tags (ESTs; fragments of expressed genes that have been sequenced), or more recently, identified from transcriptome sequencing (see section below on EXPANDED MARKER SURVEYS).

Initially, human gene expression microarrays were used to analyze gene expression in non-human primates. For example, Vahey et al. (2003) analyzed gene expression in macaques with and without a simian immunodeficiency virus to investigate the progression of disease. Another study by Uddin (2004) used human microarrays to phylogenetically place human and chimpanzees as sister taxa based upon gene expression profiles. The use of human expression microarrays also expanded to include more distantly related species such as red wolves (Canis rufus; Kennerly et al. 2008) and even

Atlantic salmon (Salmo salar; Tsoi et al. 2003). In the former study, Kennerly et al. investigated the regulation of genes associated with confinement of red wolves.

Although only 12% of the 24,254 gene probes hybridized above background to red wolf 55 cDNA, the authors still found 482 genes that statistically responded to captivity. The results indicated that in confined animals biological pathways associated with pro- inflammatory and stress responses were activated. The authors suggested these genes might serve as biomarkers useful for studies of animal management and evaluating the consequences of habitat changes on population health. In the latter study of Atlantic salmon by Tsoi et al., 241 (6%) of the 4,131 genes on the human microarray produced detectable signals. Of these 241 genes, the authors reported four that responded differently between healthy fish and those experimentally infected with Aeromonas salmonicida. The authors highlighted the relevance of these genes in immune response, in addition to the utility of the array for gene discovery in salmon. As expression microarrays became more readily available for other model species, studies on an increasing number of taxa were performed. For example, Kassahn et al. (2007) examined heat stress in coral reef fish (Pomacentrus moluccensis) using an array developed for the model fish species Danio rerio, and Magnanou et al. (2009) used an expression array developed for laboratory rats (Rattus novegicus) to identify genes responsible for dietary shifts in woodrats. In birds, Naurin et al. (2008) demonstrated the feasibility of analyzing passerine bird species using zebra finch (Taeniopygia guttata) microarrays. The authors found that 96% of the gene probes for the zebra finch were accurately analyzed in the common whitethroat (Sylvia communis).

The heterologous use of expression microarrays, however, does not come without caveats (see Oshlack et al. 2007; Buckley 2009). The success of such a study will depend primarily upon the ability of the heterologous DNA to bind to the probes on the microarray. Generally, successful binding decreases with increasing sequence 56 divergence, with ~80% sequence similarity necessary for proper binding (Evertsz et al.

2001; Chismar et al. 2002; Renn et al. 2004). Therefore it must be taken into consideration that successful heterologous hybridization will occur in genes more highly conserved between the species. Researchers must also be cautious when making comparisons in gene regulation between different species, because sequence divergence between each species studied and the probes on the array can vary widely. This variation can potentially confound the results because differences in gene expression can be attributed to differential hybridization success and not true biological differences in gene regulation (see Forêt et al. 2007 and Oshlack et al. 2007 for a complete discussion).

Consequently, the heterologous use of expression microarrays for intra-specific studies are much more effective, because sequence divergence will equally impact hybridization in all samples. More recently, as more sequence data accumulated for non-model organisms, a variety of studies have designed customized, species-specific or multi- species microarrays (e.g., Oshlack et al. 2007; Vera et al. 2008; Tymchuk et al. 2010;

Bonneaud et al. 2012). A hybrid approach can also be used, in which one designs a custom microarray using sequences from a different species (Mancia et al. 2012). This approach requires previous knowledge of which genes reliably cross-hybridize. The study by Mancia et al. was able to design a microarray using domestic dog sequences previously known to hybridize with California sea lion (Zalophus californianus) sequences. The authors used the resulting gene expression profiles to differentiate the individual sea lions based upon disease status and toxin exposure. The authors suggested the potential for this approach in improving our understanding of the health status of wildlife. Despite the utility of expression microarrays, the frequency of their use is 57 waning. This decline can be directly attributed to the growth of NGS technologies.

Nowadays, it is much easier to compare gene expression on a genomic scale by simply sequencing the transcripts produced (“RNA-seq”; Figure A.2B) (see Wang et al. 2009;

Wilhelm and Landry 2009; Marguerat and Bähler 2010 for reviews). RNA-seq has several advantages over expression microarrays, including 1) little or no previous sequence data is necessary, 2) less RNA starting material is required, 3) more sensitive to differences in expression, and 4) the ability to detect differences in expression of isoforms and alleles (Wang et al. 2009). Several studies in conservation have already begun employing the RNA-seq technique (Goetz et al. 2010; Jeukens et al. 2010; Wolf et al. 2010; Fu and He 2012; Perry et al. 2012a). As a result of the increased capabilities of

NGS, it is arguable that RNA-seq will replace a majority of array-based expression studies. Different techniques aside, it is clear that studies of gene expression have important implications for studies in conservation.

Genotyping microarrays

As discussed earlier, microarrays are not limited to the investigation of gene expression, but are commonly used to genotype individuals for a large number of genetic markers, usually SNPs (Decker et al. 2009; Allendorf et al. 2010; Seeb et al. 2011;

Angeloni et al. 2012). The most prevalent heterologous use of genotyping microarrays has been in studies of wild relatives of domestic species, such as bovids (Decker et al.

2009; Pertoldi et al. 2010; Michelizzi et al. 2011; Haynes and Latch 2012; Ogden et al.

2012), ovids (Miller et al. 2011; Miller et al. 2012a), equids (McCue et al. 2012), suids

(Ramos et al. 2009), canids (vonHoldt et al. 2011; Knowles 2010), and felids (Mullikin et 58 al. 2010). For example, Pertoldi et al. (2010) genotyped American (Bison bison) and

European (B. bonasus) bison on an array designed with 53,000 SNPs from domestic cattle. The authors discovered lower polymorphism and heterozygosity in addition to longer haplotype block structure in the European bison compared with the American bison. These results were consistent with the more extreme bottleneck of population size experienced in European bison at the turn of the 20th century. The authors also proposed the use of “marker-assisted” breeding strategies in the bison to minimize inbreeding depression and maximize diversity in captive populations of bison. The study by vonHoldt et al. (2011) utilized a genotyping array for domestic dogs (Canis lupus familiaris) to analyze a large number of dogs and other wolf-like canids. The authors suggested that the highly controversial ancestry of Great Lakes and red wolves is actually a result of recent introgression between canids derived from gray wolves (C. lupus) and coyotes (C. latrans), respectively. Rutledge et al. (2012) and Chambers et al. (2012), however, pointed out several inconsistencies in the conclusions presented by vonHoldt et al.. Both Rutledge et al. and Chambers et al. note that the results were not considered in a larger context of other genetic, ecological, and fossil evidence, and that the comparisons across distinct taxa introduced substantial ascertainment bias (see below for brief discussion of ascertainment bias). As a result of these misinterpretations, vonHoldt et al. have certainly obscured endangered species conservation and management plans

(Rutledge et al. 2012).

Similar to the case with expression microarrays, genotyping microarrays are not without faults. The effectiveness of these assays also decays with increasing phylogenetic distance from the species for which the array was developed. For instance, 59 when genotyping microarrays are compared across species, Miller et al. (2012) observed that the call rate (proportion of successfully genotyped loci) reduced linearly by a rate of

~1.5% per million years of divergence, and the retention of polymorphic SNPs decreased exponentially (Figure A.3). Another concern when utilizing genetic markers, like SNPs, developed in a separate species is ascertainment bias (Morin et al. 2004; Albrechtsen et al. 2010; Garvin et al. 2010). Ascertainment bias refers to the systematic deviations of allele frequencies in the SNP discovery panel to that in the genotyping panel (Kuhner et al. 2000, Nielsen and Signorovitch 2003; Nielsen 2004; Clark et al. 2005), generally resulting in fewer alleles observed in species with high divergence from the species the markers originated from. Similar to expression data, the ascertainment bias can be minimized or reduced by making comparisons only across similar samples or populations. We have already seen how comparing allele frequencies for SNPs from domestic dogs across many species of canids can lead to highly controversial, potentially biased, results. Therefore, conservation geneticists need to assess the costs and benefits between utilizing a heterologous assay (whether expression or genotyping) versus developing an array specific for a particular species.

EXPANDED GENETIC MARKER SURVEYS

A genetic marker is a region of DNA that can be used to make inferences regarding the ecological or evolutionary history of an individual or population. Since the development of PCR in the 1980’s, short, tandem repeats of DNA called microsatellites have been the primary genetic marker for studies of population and conservation genetics

(Ellegren 2004; Selkoe and Toonen 2006). Generally, the most challenging component 60 of any conservation genetic study is the isolation and characterization of the microsatellite markers. The process of isolating, cloning, sequencing, and testing microsatellites in a population is expensive, laborious, and may require many months to complete. Typically, between 10-30 polymorphic loci are characterized. Also, the frequency of microsatellites is not equal in all taxa, and in some cases it can be very difficult to isolate multiple polymorphic loci (Lagercrantz et al. 1993; Nève and Meglécz

2000). NGS, however, has reduced the cost, labor, and time involved in isolating microsatellite loci (Ekblom and Galindo 2011; Gardner et al. 2011; Malausa et al. 2011).

As of 2011, at least 54 studies have isolated microsatellites using NGS (Gardner et al.

2011), and this number has likely increased dramatically since. To isolate microsatellites researchers often use 454 pyrosequencing, which offers the longer read lengths necessary to contain the repetitive unit and potential primer locations (Ekblom and Galindo 2011;

Gardner et al. 2011; Malausa et al. 2011). For example, a small 454 sequencing run (¼ and ⅛ of a plate) yielded over 14,612 microsatellite loci in the copperhead snake

(Agkistrodon contortrix), 4,564 of which contained high-quality priming sites (Castoe et al. 2010). In the endangered bream (Megalobrama pellegrini) Wang et al. (2012) produced 49,811 unique sequences from ¼ plate of a 454 run and identified microsatellite repeats in nearly half of them (24,522). Although the increased availability of microsatellites will benefit many species of conservation priority, genomics has yet to generate a reliable, high-throughput technique to analyze individuals for large numbers of microsatellites (see Highnam et al. 2013 for an exception). On the other hand, another genetic marker, SNPs, have become popular in population genetic studies, and are 61 amenable to genome-scale, high-throughput analyses (Vignal et al. 2002; Morin et al.

2004, Garvin et al. 2010).

A SNP is simply a single base mutation that has risen to appreciable frequency in a population and is readily detectable by scanning individuals. Although four alleles are possible for any given SNP, only two alleles are generally observed as a result of the low mutation rate per nucleotide site and the mutational bias favoring transitions (Vignal et al.

2002). Compared with microsatellites, the advantages of SNPs include: 1) a well- characterized mutational pattern (infinite sites model), 2) increased abundance in the genome, 3) are more amenable to high-throughput technologies, 4) genotypes can be easily stored in databases and successfully compared across studies, and 5) better adapted for forensic applications (Vignal et al. 2002; Brumfield et al. 2003; Morin et al. 2004;

Schlötterer 2004). However, as a result of their low information content per site, a higher number of SNPs relative to microsatellites need to be analyzed. The actual number of

SNPs required can vary depending upon the information content of the SNPs (Rosenberg et al. 2003), the question addressed, sample size, and evolutionary or demographic history of the target organism (Morin et al. 2004; Morin et al. 2009).

Until recently, SNPs were rarely used in conservation genetics because the process of SNP discovery was quite laborious and expensive. The few examples included studies where the species of conservation interest were closely related to model species with a genome sequence, termed genome-enabled species (Kohn et al. 2006).

NGS, however, has made this process much easier, and has allowed for SNPs to be used in species that are not genome-enabled. There are many different methods for the identification of SNPs, and the method chosen will depend on a variety of factors, 62 including costs, availability of technical resources, amount of existing genetic data for the investigated species, and the questions addressed (for an exhaustive review of methods and examples, see Garvin et al. 2010). To identify a SNP, one must sequence the same region multiple times from a pool of one or more individuals. The number of times a particular base has been sequenced is called coverage (e.g., 2X). To minimize the recognition of PCR artifacts or sequencing errors as SNPs, the coverage must be relatively high. Because high coverage of loci is required and most endangered and/or threatened species have large genomes (>1 Gb), the proportion of the genome sequenced must be greatly reduced to make sequencing affordable and data analyses computationally feasible.

Several techniques exist to generate a genomic “reduced-representation” library for sequencing (Garvin et al. 2010; Davey et al. 2011). The most popular approach has been transcriptome sequencing (Vera et al. 2008; Renaut et al. 2010; Malik et al. 2011;

Margam et al. 2011; Van Belleghem et al. 2012). This method sequences complementary

DNA reverse-transcribed from messenger RNA from a pool of individuals. A genome can also be reduced to smaller fragments through digestion with particular combinations of restriction enzymes. These fragments are then size-fractionated and excised from a gel prior to sequencing (Weidman et al. 2008; van Tassell et al. 2008; Seabury et al. 2011).

This technique can also be combined with the ligation of adapters containing unique identifiers or primer sequences, similar to amplified fragment length polymorphism

(AFLP) procedures (van Orsouw et al. 2007; Houston et al. 2012). A similar approach, using restriction-site associated DNA (RAD tags), digests DNA with a restriction enzyme then sequences short regions surrounding the enzyme’s cut site (Baird et al. 2008). This 63 method not only identifies thousands of SNPs (the actual number dependent upon the choice of restriction enzyme(s)) but simultaneously genotypes individuals as well (Baird et al. 2008; Davey et al. 2011). For example, sequencing RAD tags, or RAD-seq, has been performed in poplar (Populus sp.; Stölting et al. 2013), the pitcher plant mosquito

(Wyeomyia smithii; Emerson et al. 2010), and trout (Oncorhynchus sp.; Hohenlohe et al.

2011).

WHOLE GENOME SEQUENCING

The ability to interrogate larger numbers of molecular markers (whether SNPs, microsatellites, or other markers) is critical to have the resolution and statistical robustness necessary to make important conservation and management decisions.

However, genotyping molecular markers still only represent a portion of variability within a genome. Therefore, it may be beneficial for conservation biologists to assess all the genetic variation in a genome. Because of NGS, the process of sequencing complete genomes has become a reality.

The first complete genomes to be sequenced in non-model vertebrates were mitochondrial genomes (mitogenomes; see Curole and Kocher 1999). Mitogenomes have been sequenced to examine phylogeography of cetaceans (Morin et al. 2010), taxonomy of Ursidae (Delisle and Strobeck 2002), evolutionary history of extinct species

(Cooper et al. 2001; Gilbert et al. 2008; Green et al. 2008), and species designations of

“living fossil” coelacanths (Nikaido et al. 2011). These studies were possible because the mitochondria of most extant vertebrates is rather small (~0.016 Mb) compared to the nuclear genome (~2-4 Gb). The first nuclear genome to be sequenced with a 64 conservation application was that of the giant panda (Ailuropoda melanoleuca; Li et al.

2009). The authors were the first to assemble an entire eukaryotic genome strictly from short read DNA sequences (Illumina). In contrast with the panda’s primary diet of bamboo, Li et al. identified genes conferring a potential to digest meat and a lack of known genes associated with the ability to digest cellulose. The analyses did reveal that the panda might not express a functional umami taste receptor, a receptor known to detect components of meat. In addition to the assessment of several unique evolutionary traits, the authors identified >2.7 million SNPs potentially useful for population genetic studies.

Since the panda genome was published, other complete genomes from endangered and threatened species have emerged including the European eel (Anguilla anguilla; Henkel et al. 2012), peregrine falcon (Falco peregrinus; Zhan et al. 2013), saker falcon (F. cherrug; Zhan et al. 2013), aye-aye (Daubentonia madagascariensis; Perry et al. 2012b;

2013), polar bear (Ursus maritimus; Miller et al. 2012b), and the African elephant

(Loxodonta africana; www.broadinstitute.org).

Genetic variation is generally defined in three categories: neutral, detrimental, and adaptive. Neutral variation has no measurable effect on fitness, whereas detrimental and adaptive variation have negative and beneficial effects, respectively. These categories are not discrete, however, and selection coefficients can vary continuously. Unlike neutral and detrimental variation, adaptive variation has remained the most challenging to evaluate in conservation genetic studies (Hedrick 2009; Manel et al. 2010; Schoville et al.

2012). The role of adaptive genetic variation in free-ranging populations, in a context of global change, has even been considered “one of the greatest challenges of this century”

(Manel et al. 2010). As described above for the panda, complete genomes are beginning 65 to uncover important instances of adaptive variation. For another example, Zhan et al.

(2013) identified variation potentially contributing to specific beak morphology in both

F. peregrinus and F. cherrug. Additionally, the authors found duplications in two genes involved in a water conservation pathway in F. cherrug, which is considered an arid environment-adapted species. Adaptive variation is critical for the long-term survival of any species because a deficiency in adaptive variation corresponds with a reduced long- term evolutionary potential and may compromise a population’s ability to respond to new environmental challenges. In this situation, the only potential source of adaptation would be from mutation or gene flow from other populations (Hedrick 2009). Additionally, studies of adaptive variation are beginning to demonstrate that adaptive markers are better estimates of evolutionary differentiation along ecological gradients and useful for identifying and prioritizing conservation units (Bonin et al. 2007; Gebremedhin et al.

2009). Both the increase in complete genome sequencing and expanded surveys of neutral genetic markers will assist in predicting the levels of adaptive variation in a population (Hedrick 2009; Manel et al. 2010). Regardless, adaptive variation remains an elusive, yet important, figure in conservation genetics.

CONCLUSION

Similar to the manner in which genomics revolutionized studies of model organisms, the field of conservation genetics is being rapidly transformed into conservation genomics. Heterologous assays, expanded surveys for genetic markers, complete genomes, and investigations of adaptive variation are all facilitated by genomics and have been directly applied to endangered and free-ranging species by 66 conservation geneticists. Initiatives such as the Genome 10K Project will continue to advance our understanding of the mechanisms underlying the generation and evolution of animal biodiversity (Haussler et al. 2009). In the future, conservation genomics may reveal detrimental variation in critically endangered species like the California condor

(Romanov et al. 2009), unravel the mysteries behind the worldwide decline in amphibians (Storfer et al. 2009), or even assist in the restoration of extinct biodiversity

(Nicholls 2008, Folch et al. 2009, Piña-Alguilar 2009). Conservation geneticists should be prepared to welcome the era of genomics, because the inevitable transformation has already begun. 67

REFERENCES

Albrechtsen A, Nielsen FC & Nielsen R. (2010). Ascertainment biases in SNP chips

affect measures of population divergence. Molecular Biology and Evolution 27:

2534-2547.

Allendorf FW, Hohenlohe PA & Luikart G. (2010). Genomics and the future of

conservation genetics. Nature Reviews Genetics 11(10): 697-709.

Altshuler DM, Durbin RM, Abecasis GR, et al. (2012). An integrated map of genetic

variation from 1,092 human genomes. Nature 491(7422): 56-65.

Angeloni F, Wagemaker N, Vergeer P, et al. (2012). Genomic toolboxes for conservation

biologists. Evolutionary Applications 5(2): 130-143.

Ansorge WJ. (2009). Next-generation DNA sequencing techniques. New Biotechnology

25(4): 195-203.

Avise JC. (2009). Perspective: conservation genetics enters the genomics era.

Conservation Genetics 11(2): 665-669.

Baird NA, Etter PD, Atwood TS, et al. (2008). Rapid SNP discovery and genetic

mapping using sequenced RAD markers. Plos One 3(10): e3376.

Barbara T, Palma-Silva C, Paggi GM, et al. (2007). Cross-species transfer of nuclear

microsatellite markers: potential and limitations. Molecular Ecology 16(18):

3759-3767.

Bonin A, Nicole F, Pompanon F, et al. (2007). Population Adaptive Index: a new method

to help measure intraspecific genetic diversity and prioritize populations for

conservation. Conservation Biology 21(3): 697-708.

Bonneaud C, Balenger SL, Zhang J, et al. (2012). Innate immunity and the evolution of 68

resistance to an emerging infectious disease in a wild bird. Molecular Ecology

21(11): 2628-2639.

Brumfield RT, Beerli P, Nickerson DA, et al. (2003). The utility of single nucleotide

polymorphisms in inferences of population history. Trends in Ecology &

Evolution 18(5): 249-256.

Castoe TA, Poole AW, Gu W, et al. (2010). Rapid identification of thousands of

copperhead snake (Agkistrodon contortrix) microsatellite loci from modest

amounts of 454 shotgun genome sequence. Molecular Ecology Resources 10(2):

341-347.

Chambers SM, Fain SR, Fazio B, et al. (2012). An account of the taxonomy of North

American wolves from morphological and genetic analyses. North American

Fauna 77: 1-67.

Chismar JD, Mondala T, Fox HS, et al. (2002). Analysis of result variability from high-

density oligonucleotide arrays comparing same-species and cross-species

hybridizations. BioTechniques 33(3): 516-518.

Clark AG, Hubisz MJ, Bustamante CD, et al. (2005). Ascertainment bias in studies of

human genome-wide polymorphism. Genome Research 15: 1496-1502.

Cooper A, Lalueza-Fox C, Anderson S, et al. (2001). Complete mitochondrial genome

sequences of two extinct moas clarify ratite evolution. Nature 409(6821): 704-

707.

Curole JP & Kocher TD. (1999). Mitogenomics: digging deeper with complete

mitochondrial genomes. Trends in Ecology & Evolution 14(10): 394-398.

Davey JW, Hohenlohe PA, Etter PD, et al. (2011). Genome-wide genetic marker 69

discovery and genotyping using next-generation sequencing. Nature Reviews

Genetics 12(7): 499-510.

Decker JE, Pires JC, Conant GC, et al. (2009). Resolving the evolution of extant and

extinct ruminants with high-throughput phylogenomics. Proceedings of the

National Academy of Sciences of the United States of America 106(44): 18644-

18649.

Delisle I & Strobeck C. (2002). Conserved primers for rapid sequencing of the complete

mitochondrial genome from carnivores, applied to three species of bears.

Molecular Biology and Evolution 19(3): 357-361.

Ekblom R & Galindo J. (2011). Applications of next generation sequencing in molecular

ecology of non-model organisms. Heredity 107(1): 1-15.

Ellegren H. (2004). Microsatellites: simple sequences with complex evolution. Nature

Reviews Genetics 5(6): 435-445.

Emerson KJ, Merz CR, Catchen JM, et al. (2010). Resolving postglacial phylogeography

using high-throughput sequencing. Proceedings of the National Academy of

Sciences of the United States of America 107(37): 16196-16200.

Evertsz EM, Au-Young J, Ruvolo MV, et al. (2001). Hybridization cross-reactivity

within homologous gene families on glass cDNA microarrays. BioTechniques

31(5): 1182, 1184, 1186, 1188, 1190-1192.

Folch J, Cocero MJ, Chesne P, et al. (2009). First birth of an animal from an extinct

subspecies (Capra pyrenaica pyrenaica) by cloning. Theriogenology 71(6): 1026-

1034.

Forêt S, Kassahn KS, Grasso LC, et al. (2007) Genomic and microarray approaches to 70

coral reef conservation biology. Coral Reefs 26: 475-486.

Frankham R. (1995). Conservation genetics. Annual Review of Genetics 29: 305-327.

Frankham R, Ballou JD & Briscoe DA. (2009). Introduction to conservation genetics,

Cambridge University Press.

Fu B & He S. (2012). Transcriptome analysis of silver carp (Hypophthalmichthys

molitrix) by paired-end RNA sequencing. DNA Research 19(2): 131-142.

Gardner MG, Fitch AJ, Bertozzi T, et al. (2011). Rise of the machines - recommendations

for ecologists when using next generation sequencing for microsatellite

development. Molecular Ecology Resources 11(6): 1093-1101.

Garvin MR, Saitoh K & Gharrett AJ. (2010). Application of single nucleotide

polymorphisms to non-model species: a technical review. Molecular Ecology

Resources 10(6): 915-934.

Gebremedhin B, Ficetola GF, Naderi S, et al. (2009). Frontiers in identifying

conservation units: from neutral markers to adaptive genetic variation. Animal

Conservation 12(2): 107-109.

Gibson G. (2002). Microarrays in ecology and evolution: a preview. Molecular Ecology

11(1): 17-24.

Gilbert MTP, Drautz DI, Lesk AM, et al. (2008). Intraspecific phylogenetic analysis of

Siberian woolly mammoths using complete mitochondrial genomes. Proceedings

of the National Academy of Sciences of the United States of America 105(24):

8327-8332.

Gilles A, Meglécz E, Pech N, et al. (2011). Accuracy and quality assessment of 454 GS-

FLX Titanium pyrosequencing. BMC Genomics 12: 245. 71

Glenn TC. (2011). Field guide to next-generation DNA sequencers. Molecular Ecology

Resources 11(5): 759-769.

Goetz F, Rosauer D, Sitar S, et al. (2010). A genetic basis for the phenotypic

differentiation between siscowet and lean lake trout (Salvelinus namaycush).

Molecular Ecology 19(Supp1): 176-196.

Green RE, Malaspinas AS, Krause J, et al. (2008). A complete neandertal mitochondrial

genome sequence determined by high-throughput sequencing. Cell 134(3): 416-

426.

Haussler D, O'Brien SJ, Ryder OA, et al. (2009). Genome 10K: a proposal to obtain

whole-genome sequence for 10,000 vertebrate species. Journal of Heredity

100(6): 659-674.

Hawkins RD, Hon GC & Ren B. (2010). Next-generation genomics: an integrative

approach. Nature Reviews Genetics. 11: 476-486.

Haynes GD & Latch EK. (2012). Identification of novel single nucleotide polymorphisms

(SNPs) in deer (Odocoileus spp.) using the BovineSNP50 BeadChip. Plos One

7(5): e36536.

Hedrick PW. (2009). Neutral, detrimental, and adaptive variation in conservation

genetics. In Conservation genetics in the age of genomics (Amato, G., DeSalle,

R., Ryder, O. A. & Rosenbaum, H. C., eds.), pp. 41-57. Columbia University

Press.

Heller MJ. (2002). DNA microarray technology: devices, systems, and applications.

Annual Review of Biomedical Engineering 4(1): 129-153.

Henkel CV, Burgerhout E, de Wijze DL, et al. (2012). Primitive duplicate hox clusters in 72

the European eel's genome. Plos One 7(2): e32231.

Highnam G, Franck C, Martin A, et al. (2013). Accurate human microsatellite genotypes

from high-throughput resequencing data using informed error profiles. Nucleic

Acids Research 41(1): 1-7.

Hoheisel JD. (2006). Microarray technology: beyond transcript profiling and genotype

analysis. Nature Reviews Microbiology 7(3): 200-210.

Hohenlohe PA, Amish SJ, Catchen JM, et al. (2011). Next-generation RAD sequencing

identifies thousands of SNPs for assessing hybridization between rainbow and

westslope cutthroat trout. Molecular Ecology Resources 11(Suppl 1): 117-122.

Houston DD, Elzinga DB, Maughan PJ, et al. (2012). Single nucleotide polymorphism

discovery in cutthroat trout subspecies using genome reduction, barcoding, and

454 pyro-sequencing. BMC Genomics 13: 724.

Jeukens J, Renaut S, St-Cyr J, et al. (2010). The transcriptomics of sympatric dwarf and

normal lake whitefish (Coregonus clupeaformis spp., Salmonidae) divergence as

revealed by next-generation sequencing. Molecular Ecology 19(24): 5389-5403.

Kassahn KS, Caley MJ, Ward AC, et al. (2007). Heterologous microarray experiments

used to identify the early gene response to heat stress in a coral reef fish.

Molecular Ecology 16(8): 1749-1763.

Kennerly E, Ballmann A, Martin S, et al. (2008). A gene expression signature of

confinement in peripheral blood of red wolves (Canis rufus). Molecular Ecology

17(11): 2782-2791.

Knowles JC. (2010). Population genomics of North American grey wolves (Canis lupus),

Masters Thesis, University of Alberta. 73

Kohn MH, Murphy WJ, Ostrander EA, et al. (2006). Genomics and conservation

genetics. Trends in Ecology & Evolution 21(11): 629-637.

Kuhner MK, Beerli P, Yamato J, et al. (2000). Usefulness of single nucleotide

polymorphism data for estimating population parameters. Genetics 156: 439-447.

Lagercrantz U, Ellegren H & Andersson L. (1993). The abundance of various

polymorphic microsatellite motifs differs between plants and vertebrates. Nucleic

Acids Research 21(5): 1111-1115.

Laird PW. (2010). Principles and challenges of genome-wide DNA methylation analysis.

Nature Reviews Genetics 11(3): 191-203.

Lander ES, Int Human Genome Sequencing C, Linton LM, et al. (2001). Initial

sequencing and analysis of the human genome. Nature 409(6822): 860-921.

Lee JR, Magee DM, Gaster RS, et al. (2013). Emerging protein array technologies for

proteomics. Expert Review of Proteomics 10(1): 65-75.

Li R, Fan W, Tian G, et al. (2009). The sequence and de novo assembly of the giant

panda genome. Nature 463(7279): 311-317.

Li W & Ruan K. (2009). MicroRNA detection by microarray. Analytical and

Bioanalytical Chemistry 394(4): 1117-1124.

Luikart G, England PR, Tallmon D, et al. (2003). The power and promise of population

genomics: from genotyping to genome typing. Nature Reviews Genetics 4(12):

981-994.

Luo C, Tsementzi D, Kyrpides N, et al. (2012). Comparisons of Illumina and Roche 454

sequencing technologies on the same microbial community DNA sample. PLoS

One 7(2): e30087. 74

Magnanou E, Malenke JR, & Dearing MD (2009) Expression of biotransformation genes

in woodrat (Neotoma) herbivores on novel and ancestral diets: identification of

candidate genes responsible for dietary shifts. Molecular Ecology 18(11): 2401-

2414.

Malausa T, Gilles A, Meglecz E, et al. (2011). High-throughput microsatellite isolation

through 454 GS-FLX Titanium pyrosequencing of enriched DNA libraries.

Molecular Ecology Resources 11(4): 638-644.

Malik A, Korol A, Hubner S, et al. (2011). Transcriptome sequencing of the blind

subterranean mole rat, Spalax galili: utility and potential for the discovery of

novel evolutionary patterns. Plos One 6(8): e21227.

Mancia A, Ryan JC, Chapman RW, et al. (2012). Health status, infection and disease in

California sea lions (Zalophus californianus) studied using a canine microarray

platform and machine-learning approaches. Developmental and Comparative

Immunology 36(4): 629-637.

Manel S, Joost S, Epperson BK, et al. (2010). Perspectives on the use of landscape

genetics to detect genetic adaptive variation in the field. Molecular Ecology

19(17): 3760-3772.

Margam VM, Coates BS, Bayles DO, et al. (2011). Transcriptome sequencing, and rapid

development and application of SNP markers for the legume pod borer Maruca

vitrata (Lepidoptera: Crambidae). Plos One 6(7): e21388.

Marguerat S & Bähler J. (2010). RNA-seq: from technology to biology. Cellular and

Molecular Life Sciences 67(4): 569-579.

Margulies M, Egholm M, Altman WE, et al. (2005). Genome sequencing in 75

microfabricated high-density picolitre reactors. Nature 437(7057): 376-380.

Maxam AM & Gilbert W. (1977). New method for sequencing DNA. Proceedings of the

National Academy of Sciences of the United States of America 74(2): 560-564.

McCue ME, Bannasch DL, Petersen JL, et al. (2012). A high density SNP array for the

domestic horse and extant Perissodactyla: utility for association mapping, genetic

diversity, and phylogeny studies. PloS Genetics 8(1): e1002451.

Metzker ML. (2009). Sequencing technologies — the next generation. Nature Reviews

Genetics 11(1): 31-46.

Michelizzi VN, Wu XL, Dodson MV, et al. (2011). A global view of 54,001 single

nucleotide polymorphisms (SNPs) on the Illumina BovineSNP50 BeadChip and

their transferability to water buffalo. International Journal of Biological Sciences

7(1): 18-27.

Miller JM, Kijas JW, Heaton MP, et al. (2012a). Consistent divergence times and allele

sharing measured from cross-species application of SNP chips developed for three

domestic species. Molecular Ecology Resources 12(6): 1145-1150.

Miller JM, Poissant J, Kijas JW, et al. (2011). A genome-wide set of SNPs detects

population substructure and long range linkage disequilibrium in wild sheep.

Molecular Ecology Resources 11(2): 314-322.

Miller W, Schuster SC, Welch AJ, et al. (2012b). Polar and brown bear genomes reveal

ancient admixture and demographic footprints of past climate change.

Proceedings of the National Academy of Sciences of the United States of America

109(36): E2382-E2390.

Minoche AE, Dohm JC, & Himmelbauer H. (2011). Evaluation of genomic high- 76

throughput sequencing data generated on Illumina HiSeq and Genome Analyzer

systems. Genome Biology 12: R112.

Morin PA, Archer FI, Foote AD, et al. (2010). Complete mitochondrial genome

phylogeographic analysis of killer whales (Orcinus orca) indicates multiple

species. Genome Research 20(7): 908-916.

Morin PA, Luikart G, Wayne RK, et al. (2004). SNPs in ecology, evolution and

conservation. Trends in Ecology & Evolution 19(4): 208-216.

Morin PA, Martien KK & Taylor BL. (2009). Assessing statistical power of SNPs for

population structure and conservation studies. Molecular Ecology Resources 9(1):

66-73.

Mullikin JC, Hansen NF, Shen L, et al. (2010). Light whole genome sequence for SNP

discovery across domestic cat breeds. BMC Genomics 11: 406.

Naurin S, Bensch S, Hansson B, et al. (2008). A microarray for large-scale genomic and

transcriptional analyses of the zebra finch (Taeniopygia guttata) and other

passerines. Molecular Ecology Resources 8(2): 275-281.

Nève G & Meglécz E. (2000). Microsatellite frequencies in different taxa. Trends in

Ecology & Evolution 15(9): 376-377.

Nicholls H. (2008). Darwin 200: let's make a mammoth. Nature 456(7220): 310-314.

Nielsen R & Signorovitch J. (2003). Correcting for ascertainment biases when analyzing

SNP data: applications to the estimation of linkage disequilibrium. Theoretical

Population Biology 63: 245-255.

Nielsen R. (2004). Population genetic analysis of ascertained SNP data. Human

Genomics 1: 218-224. 77

Nikaido M, Sasaki T, Emerson JJ, et al. (2011). Genetically distinct coelacanth

population off the northern Tanzanian coast. Proceedings of the National

Academy of Sciences of the United States of America 108(44): 18009-18013.

Ogden R, Baird J, Senn H, et al. (2012). The use of cross-species genome-wide arrays to

discover SNP markers for conservation genetics: a case study from Arabian and

scimitar-horned oryx. Conservation Genetics Resources 4(2): 471-473.

Oshlack A, Chabot AE, Smyth GK, et al. (2007). Using DNA microarrays to study gene

expression in closely related species. Bioinformatics 23(10): 1235-1242.

Ouborg NJ. (2009). Integrating population genetics and conservation biology in the era of

genomics. Biology Letters 6(1): 3-6.

Ouborg NJ, Angeloni F & Vergeer P. (2010). An essay on the necessity and feasibility of

conservation genomics. Conservation Genetics 11(2): 643-653.

Pagani I, Liolios K, Jansson J, et al. (2012). The Genomes OnLine Database (GOLD) v.4:

status of genomic and metagenomic projects and their associated metadata.

Nucleic Acids Research 40(D1): D571-D579.

Perry GH, Louis EE, Ratan A, et al. (2013). Aye-aye population genomic analyses

highlight an important center of endemism in northern Madagascar. Proceedings

of the National Academy of Sciences of the United States of America. (ahead of

print).

Perry GH, Melsted P, Marioni JC, et al. (2012a). Comparative RNA sequencing reveals

substantial genetic variation in endangered primates. Genome Research 22: 602-

610.

Perry GH, Reeves D, Melsted P, et al. (2012b). A genome sequence resource for the aye- 78

aye (Daubentonia madagascariensis), a nocturnal lemur from Madagascar.

Genome Biology and Evolution 4(2): 126-135.

Pertoldi C, Wójcik JM, Tokarska M, et al. (2010). Genome variability in European and

American bison detected using the BovineSNP50 BeadChip. Conservation

Genetics 11(2): 627-634.

Piña-Aguilar RE, Lopez-Saucedo J, Sheffield R, et al. (2009). Revival of extinct species

using nuclear transfer: hope for the mammoth, true for the Pyrenean ibex, but is it

time for "conservation cloning''? Cloning and Stem Cells 11(3): 341-346.

Pop M & Salzberg SL. (2008). Bioinformatic challenges of new sequencing technology.

Trends in Genetics 24(3): 142-149.

Quail MA, Smith M, Coupland P, et al. (2012). A tale of three next generation

sequencing platforms: comparison of Ion Torrent, Pacific Biosciences, and

Illumina MiSeq sequencers. BMC Genomics 13: 341.

Ramos AM, Crooijmans RPMA, Affara NA, et al. (2009). Design of a high density SNP

genotyping assay in the pig using SNPs identified and characterized by next

generation sequencing technology. Plos One 4(8): e6524.

Renaut S, Nolte AW & Bernatchez L. (2010). Mining transcriptome sequences towards

identifying adaptive single nucleotide polymorphisms in lake whitefish species

pairs (Coregonus spp. Salmonidae). Molecular Ecology 19(Suppl 1): 115-131.

Renn SC, Aubin-Horth N, & Hofmann HA (2004). Biologically meaningful expression

profiling across species using heterologous hybridization to a cDNA microarray.

BMC Genomics 5: 42.

Romanov MN, Tuttle EM, Houck ML, et al. (2009). The value of avian genomics to the 79

conservation of wildlife. BMC Genomics 10(Suppl 2): S10.

Rosenberg NA, Li LM, Ward R, et al. (2003). Informativeness of genetic markers for

inference of ancestry. American Journal of Human Genetics 73(6): 1402-1422.

Rutledge LY, Wilson PJ, Klütsch CFC, et al. (2012). Conservation genomics in

perspective: A holistic approach to understanding Canis evolution in North

America. Biological Conservation 155: 186-192.

Ryder OA. (2005). Conservation genomics: applying whole genome studies to species

conservation efforts. Cytogenetic and Genome Research 108(1-3): 6-15.

Sanger F, Air GM, Barrell BG, et al. (1977). Nucleotide sequence of bacteriophage

phiX174 DNA. Nature 265(5596): 687-695.

Sanger F, Nicklen S & Coulson AR. (1977). DNA sequencing with chain-terminating

inhibitors. Proceedings of the National Academy of Sciences of the United States

of America 74(12): 5463-5467.

Schena M, Shalon D, Davis RW, et al. (1995) Quantitative monitoring of gene expression

patterns with a cDNA microarray. Science 270(5235): 467-470.

Schlotterer C. (2004). The evolution of molecular markers - just a matter of fashion?

Nature Reviews Genetics 5(1): 63-69.

Schoville SD, Bonin A, François O, et al. (2012). Adaptive genetic variation on the

landscape: methods and cases. Annual Review of Ecology, Evolution, and

Systematics 43(1): 23-43.

Seabury CM, Bhattarai EK, Taylor JF, et al. (2011). Genome-wide polymorphism and

comparative analyses in the white-tailed deer (Odocoileus virginianus): a model

for conservation genomics. Plos One 6(1): e15811. 80

Seeb JE, Carvalho G, Hauser L, et al. (2011). Single-nucleotide polymorphism (SNP)

discovery and applications of SNP genotyping in nonmodel organisms. Molecular

Ecology Resources 11(Suppl 1): 1-8.

Selkoe KA & Toonen RJ. (2006). Microsatellites for ecologists: a practical guide to using

and evaluating microsatellite markers. Ecology Letters 9(5): 615-629.

Shendure J & Ji HL. (2008). Next-generation DNA sequencing. Nature Biotechnology

26(10): 1135-1145.

Stoevesandt O, Taussig MJ & He MY. (2009). Protein microarrays: high-throughput

tools for proteomics. Expert Review of Proteomics 6(2): 145-157.

Stölting KN, Nipper R, Lindtke D, et al. (2013). Genomic scan for single nucleotide

polymorphisms reveals patterns of divergence and gene flow between

ecologically divergent species. Molecular Ecology 22(3): 842-855.

Storfer A, Eastman JM & Spear SF. (2009). Modern molecular methods for amphibian

conservation. BioScience 59(7): 559-571.

Talapatra A, Rouse R & Hardiman G. (2002). Protein microarrays: challenges and

promises. Pharmacogenomics 3(4): 527-536.

Thomson JM, Parker J, Perou CM, et al. (2004). A custom microarray platform for

analysis of microRNA gene expression. Nature Methods 1(1): 47-53.

Tsoi SCM, Cale JM, Bird IM, et al. (2003). Use of human cDNA microarrays for

identification of differentially expressed genes in Atlantic salmon liver during

Aeromonas salmonicida infection. Marine Biotechnology 5: 545–554.

Tymchuk WV, O’Reilly P, Bittman J, et al. (2010). Conservation genomics of Atlantic

salmon: variation in gene expression between and within regions of the Bay of 81

Fundy. Molecular Ecology 19(9): 1842-1859.

Uddin M, Wildman DE, Liu GZ, et al. (2004). Sister grouping of chimpanzees and

humans as revealed by genome-wide phylogenetic analysis of brain gene

expression profiles. Proceedings of the National Academy of Sciences of the

United States of America 101(9): 2957-2962.

Vahey MT, Nau ME, Taubman M, et al. (2003). Patterns of gene expression in peripheral

blood mononuclear cells of rhesus macaques infected with SIVmac251 and

exhibiting differential rates of disease progression. AIDS Research and Human

Retroviruses 19(5): 369-387.

Van Belleghem SM, Roelofs D, Van Houdt J, et al. (2012). De novo transcriptome

assembly and SNP discovery in the wing polymorphic salt marsh beetle Pogonus

chalceus (Coleoptera, Carabidae). Plos One 7(8): e42605. van Orsouw NJ, Hogers RCJ, Janssen A, et al. (2007). Complexity reduction of

polymorphic sequences (CRoPS (TM)): a novel approach for large-scale

polymorphism discovery in complex genomes. Plos One 2(11): e1172.

Van Tassell CP, Smith TPL, Matukumalli LK, et al. (2008). SNP discovery and allele

frequency estimation by deep sequencing of reduced representation libraries.

Nature Methods 5(3): 247-252.

Vera JC, Wheat CW, Fescemyer HW, et al. (2008). Rapid transcriptome characterization

for a nonmodel organism using 454 pyrosequencing. Molecular Ecology 17(7):

1636-1647.

Vignal A, Milan D, SanCristobal M, et al. (2002). A review on SNP and other types of

molecular markers and their use in animal genetics. Genetics Selection Evolution 82

34(3): 275-305. vonHoldt BM, Pollinger JP, Earl DA, et al. (2011). A genome-wide perspective on the

evolutionary history of enigmatic wolf-like canids. Genome Research 21(8):

1294-1305.

Wang JJ, Yu XM, Zhao K, et al. (2012). Microsatellite development for an endangered

Bream Megalobrama pellegrini (Teleostei, Cyprinidae) using 454 sequencing.

International Journal of Molecular Sciences 13(3): 3009-3021.

Wang XV, Blades N, Ding J, et al. (2012). Estimation of sequencing error rates in short

reads. BMC Bioinformatics 13: 185.

Wang Z, Gerstein M, & Snyder M. (2009). RNA-Seq: a revolutionary tool for

transcriptomics. Nature Reviews Genetics 10: 57-63.

Wiedmann RT, Smith TPL & Nonneman DJ. (2008). SNP discovery in swine by reduced

representation and high throughput pyrosequencing. BMC Genetics 9: 81.

Wilhelm BT & Landry J. (2009). RNA-Seq - quantitative measurement of expression

through massively parallel RNA-sequencing. Methods 48(3): 249-257.

Wolf JB, Bayer T, Haubold B, et al. (2010). Nucleotide divergence vs. gene expression

differentiation: comparative transcriptome sequencing in natural isolates from the

carrion crow and its hybrid zone with the hooded crow. Molecular Ecology

19(Supp1): 162-175.

Zhan X, Pan S, Wang J, et al. (2013). Peregrine and saker falcon genome sequences

provide insights into evolution of a predatory lifestyle. Nature Genetics (ahead of

print).

Zhou X, Ren L, Meng Q, et al. (2010). The next-generation sequencing technology and 83

application. Protein & Cell 1(6): 520-536.

84

Table A.1: Summary of the commonly used next-generation sequencing technologies. Each company normally offers a variety of commercial platforms and variations, but the information listed below is for the platform with the highest yield. Visit the websites given or see Glenn et al. (2011) for more information. I have excluded certain technologies such as Helicos Biosciences because the company has filed bankruptcy and will no longer offer their service, and those offered by Oxford Nanopore Technologies because their sequencing systems are not yet commercially available.

Number of Read Typical Instrument Reads/Run Length Yield/Run Time Website (millions) (bases) (Mb)

ABI 3730xl 0.000096 650 0.06 2 hrs www.appliedbiosystems.com (Sanger)

454 FLX 1 1,000 700 23 hrs www.454.com Titanium XL+ a a

Illumina 6,000 PE 2 x 100 600,000 11 days www.illumina.com HiSeq 2500

Ion Torrent 60-80 200 ≤10,000 2-4 hrs www.iontorrent.com Proton

PacBio RSII 0.047 4,606 217 10 hrs www.pacbio.com Solid 5500 2,800 PEa 75 x 35a 180,000 7 days www.appliedbiosystems.com aPE = paired-end reads. Normally bases are sequenced from either end of a circularized DNA fragment. 85

Figure A.1: Cost (in U.S. dollars) per megabase (Mb) of DNA sequencing from September 2001 until January 2013. Notice the Y-axis is on a logarithmic scale. The graph was created from data available publically at http://www.genome.gov/sequencingcosts/.

#!!!!$

#!!!$

#!!$

#!$

#$ !"#$%&'(%)*%+,-.-%/"001(#2%

!"#$

!"!#$ )*+(!,$ )*+(!/$ )*+(!0$ )*+(!1$ )*+(!2$ )*+(!3$ )*+(!4$ )*+(!5$ )*+(#!$ )*+(##$ )*+(#,$ )*+(#/$ %&'(!#$ %&'(!,$ %&'(!/$ %&'(!0$ %&'(!1$ %&'(!2$ %&'(!3$ %&'(!4$ %&'(!5$ %&'(#!$ %&'(##$ %&'(#,$ -*.(!1$ -*.(!3$ -*.(!4$ -*.(!5$ -*.(##$ -*.(!,$ -*.(!/$ -*.(!0$ -*.(!2$ -*.(#!$ -*.(#,$

86

Figure A.2: Diagram depicting the basic procedure followed when comparing the expression levels of two samples (usually a control and treatment) using a microarray hybridization method (A) and when performing an RNA-seq analysis (B). Adapted from Figure 2 of Forêt et al. (2007).

+     (((     

       

  !  '  & "            

#  #  )!   *&

    

$% " )!   

 

 &&!  

 

87

Figure A.3: Line graphs depicting the relationship between the time to the most recent common ancestor (TMRCA) between two species and the proportion of SNPs amplified (solid line) or amplified and polymorphic (dashed line). These graphs are recreated from those of Miller et al. (2012a) comparing domestic and wild species on three different Illumina genotyping chips: BovineSNP50, OvineSNP50, and EquineSNP50.

$"

!#,"

!#+"

!#*"

!#)"

!#("

!#'" !"#$#"%#&'#(')*!+'

!#&"

!#%"

!#$"

!" !" (" $!" $(" %!" %(" &!" &(" '!" ,-./0'1-203'

88

APPENDIX B

PUMAPLEX: A NEW PANEL OF SNP MARKERS FOR EFFECTIVE CONSERVATION GENETIC ANALYSIS AND MANAGEMENT OF PUMAS

FITAK, ROBERT R.

Genetics Graduate Interdisciplinary Program, The University of Arizona, Tucson, AZ

85721, USA

COAUTHORS: THOMPSON, R., NAIDU, A., and CULVER, M.

This paper was prepared to submit to a journal.

Corresponding Address:

Robert R. Fitak

317 Biosciences East

1311 E 4th St

Tucson, AZ, 85721, USA

Tel: 520-626-1636

Fax: 520-621-8801

Email: [email protected]

Keywords: single nucleotide polymorphisms, next-generation sequencing, microsatellites, non- invasive DNA

89

ABSTRACT

Pumas (Puma concolor) are a captivating wildlife species that draw much attention from managers for their use as a game species, for predating upon (at times endangered) wildlife, and for human-animal conflicts. In response to this attention, managers must make informed decisions when managing puma populations, often utilizing the most recent and robust scientific techniques. These techniques include the use of genetics, which are employed to discern such questions as population structure, diet, individual identification, population size, inbreeding, migration, parentage, gene flow, etc. Often, nuclear microsatellites are used, however, these data suffer from disadvantages such as difficulty in being compared across studies (laboratories), the high cost of examination, and homoplasy (the same allele present by different mutational events). Therefore, new genetic markers, such as single nucleotide polymorphisms

(SNPs), that can circumvent these concerns would be beneficial. SNPs are rapidly becoming the marker of choice for ecological and conservation studies of wildlife. SNPs have several advantages over their more commonly used counterpart, microsatellites, including reduced homoplasy, increased abundance, computational simplicity, high- throughout capabilities, and suitability for data storage and comparison. Additionally, most SNP genotyping technologies require shorter DNA fragments than microsatellite genotyping, potentially improving amplification success in ancient and non-invasively collected samples. Here we describe the development of PumaPlex, an assay to rapidly examine single nucleotide polymorphisms (SNPs) in a large number of pumas.

PumaPlex can simultaneously analyze 27 SNPs in 384 samples using the SequenomTM 90

MassARRAY system. In this article we highlight the assay design and its experimental validation in over 500 pumas samples. We compared genotypes from PumaPlex and 15 microsatellite markers in 237 pumas collected in Arizona. Overall, conclusions drawn from both datasets were quite similar. Additionally, we compared genotyping success between marker datasets in 10 fecal DNA samples, and found that SNPs may outperform microsatellites, especially when microsatellite genotyping has poor success. As a result,

SNPs provide an excellent method for the genetic analysis of pumas, and their use can facilitate improved genetic monitoring of pumas from both tissue and non-invasively collected samples.

91

INTRODUCTION

The puma (Puma concolor) is a captivating wildlife species that draws large amounts of attention for its use as a game species (Logan and Sweanor 2001), for predating upon (at times endangered) wildlife (Turner et al. 1992; Wehausen 1996;

Hayes et al. 2000; Schaefer et al. 2000), and for human-animal conflicts such as predation upon livestock, pets, and even humans (Beier 1991; Torres et a. 1996).

Additionally, large carnivores, like pumas, often have substantial impacts on the dynamics of both prey (Logan and Irwin 1985; Turner et al. 1992) and plant communities

(Schmitz et al. 2000), and are frequently the center of conservation programs targeting entire landscapes (Weber and Rabinowitz 1996). In response to both the anthropological and ecological consequences of puma biology, wildlife officials must continue to research and monitor puma populations to make informed decisions regarding their conservation and management (Culver and Schwartz 2011).

As part of effective conservation and management programs, genetics studies are often initiated to investigate many aspects of species’ biology (Frankham 2003). For the past 20 years, the genetic marker of choice has been the microsatellite. Microsatellites are short, tandemly repeated, DNA sequences commonly found throughout the genome.

The effectiveness of microsatellites in population genetic studies can be attributed to their amplification by relatively simple PCR methods, high degree of polymorphism, and abundance (Schlötterer 2004). Microsatellites, however, are not without their disadvantages. The high mutation rate for microsatellites, which generates their high level of polymorphism, can also result in high levels of homoplasy ( the same allele present by different mutational events; reviewed by Estoup et al. 2002). Homoplasy in 92 microsatellites makes deciphering between identity by descent and identity by state a challenge, potentially biasing such parameters as divergence and coancestry.

Additionally, differences in amplicon size due to variation in the Taq polymerase, fluorescent dye, and analytical software used render the comparison of microsatellites across studies a challenge and their ability to be stored in comparative databases difficult

(Hahn et al. 2001; Vignal et al. 2002, Schlötterer 2004).

Due to the large interest in pumas for the reasons mentioned above, many genetic studies have been performed. Culver et al. (2000) used both microsatellites and mitochondrial DNA to identify a recent founding event among all North American pumas, reducing 15 previously described subspecies into a single subspecies, Puma concolor couguar. The authors also reported that a majority of genetic variation among pumas could be attributed to Central and South American populations, where five other subspecies were described. On a finer scale, studies employing nuclear microsatellite markers have found both significant barriers to gene flow (Walker et al. 2000; Ernest et al

2003; McRae et al. 2005; Loxterman 2011; Andreasen et al. 2012) and a lack of population structure (Sinclair et al. 2001; Anderson et al. 2004; Castilho et al. 2011;

Miotto et al. 2012). Other studies have also employed genetic analysis of non-invasive samples to document the presence of pumas (Haag et al. 2009; Miotto et al. 2011), minimum number of individuals (Miotto et al. 2007; Naidu et al. 2011), and diet (Farrell et al. 2000; Ernest et al. 2002; Naidu 2009). Despite large amounts of genetic research on pumas using microsatellites, comparisons across studies are difficult, if not impossible, due to a lack of consistency in the loci chosen and the challenge of standardizing allele sizes without extensive calibration or sharing of positive control 93 samples among studies. Therefore, in order to facilitate long-term genetic monitoring of puma populations and comparisons across laboratories, a panel of genetic markers that can be analyzed in a high-throughput fashion while minimizing cross-laboratory variation is necessary.

Recently, advances in genomic techniques, notably next-generation sequencing, have facilitated a potential role for a new molecular marker, single nucleotide polymorphisms (SNPs; Vignal et al. 2002; Morin et al. 2004; Garvin et al. 2010). A SNP is simply a single base change in a DNA sequence. Although any site in a DNA sequence could potentially have four alleles, a SNP generally has two (biallelic) as a result of the low mutation rate per nucleotide site and the mutational bias favoring transitions (Vignal et al. 2002). SNPs have many advantages over microsatellites: they are more numerous in the genome, analytically simpler, and can be analyzed in a high-throughput fashion.

Additionally, unlike microsatellites SNPs are more amenable to high-throughput technologies and genotypes can be easily stored in databases and successfully compared across studies (Vignal et al. 2002; Morin et al. 2004; Schlötterer 2004; Garvin et al.

2010). Although SNPs have many advantages over microsatellites, they can be limited by strong ascertainment bias (Morin et al. 2004; Albrechtsen et al. 2010; Garvin et al.

2010). Ascertainment bias can be avoided, or minimized, through a proper SNP discovery strategy (Clark et al. 2005), investigating questions not susceptible to the allele frequency spectrum like parentage or individual identification (Morin et al. 2004), using

SNPs to mark only haplotype blocks (tag SNPs, Sabeti et al. 2007), or using statistical corrections (Nielsen et al. 2004; Albrechtsen et al. 2010). When using SNPs, the primary concern is their low information content per locus relative to their microsatellite 94 counterparts. To compensate for their low information content, more SNPs (2-6x as many) than microsatellites must be used, with the actual number dependent upon the question addressed, sample size, and evolutionary history of the target organism (Vignal et al. 2002; Brumfield et al. 2003; Morin et al. 2004; Schlötterer 2004). For example, estimates of population structure require ~30 SNPs to detect moderate levels of differentiation (FST ~ 0.01), while as many as 80 SNPs are necessary to detect lower levels of differentiation (Morin et al. 2009). For questions regarding paternity exclusion and discriminating between related and unrelated individuals, 40 – 100 SNPs may be necessary (Blouin et al. 1996; Morin et al. 2004). When comparing the results of SNPs with microsatellites, several studies have reported that ~1 – 4 times as many SNPs perform equally, if not better, than microsatellites (Liu et al. 2005; Seddon et al. 2005;

Smith et al. 2007; Coates et al. 2009). This number can vary, however, depending on a variety of factors including the parameter in question, levels of genetic variation, and sample size (Morin et al. 2004; 2009).

In this study, we report the development of a high-throughput assay containing 27

SNPs for genotyping pumas. We illustrate the methods used for identifying the SNPs in addition to their validation in more than 500 pumas from throughout their range. We used this genotyping assay to estimate levels of genetic variation, population structure, probability of identity and relatedness in a sample of hunter-harvested pumas collected in

Arizona. We compared our results with those of 15 microsatellite loci analyzed in the same individuals (Naidu et al., unpublished). We focused a majority of analyses on population structure and individual identification, since these are priorities for puma conservation and management (Logan and Sweanor 2001; Culver and Schwartz 2011). 95

Additionally, because SNPs are expected to outperform microsatellites in studies of ancient or non-invasively collected DNA samples (Seddon et al. 2005; Morin and

McCarthy 2007), we included a set of 10 fecal samples previously identified as puma

(Naidu et al. 2011) and compared the genotyping success of SNPs with that of microsatellites. Our results indicated that the assay is robust to most common population genetic analyses, and it produced genotypes that facilitate long-term data storage and comparison among studies.

MATERIALS AND METHODS

Sample collection and nucleic acid extraction

We collected fresh, whole blood samples from a combination of 12 wild and captive pumas via saphenous venipuncture. All pumas were of wild origin from different locations in Arizona, with the exception of a single puma from Bosque del Apache, New

Mexico. Approximately 500 µL of fresh, whole blood was placed immediately into RNA

Protect Animal Blood Tubes (Qiagen Inc., Germantown, MD, USA) and stored in liquid nitrogen or on ice for <24 hours then in liquid nitrogen until RNA extraction could be performed. Remaining whole blood was stored at -20 °C for DNA extraction. We extracted total RNA using the RNA Protect Animal Blood System (Qiagen Inc.) according to manufacturer’s recommendations. We collected and extracted two RNA samples per individual, and pooled these together for the final sample. All DNA extractions were performed with the DNeasy Blood and Tissue Kit (Qiagen Inc.) as outlined by the manufacturer. For downstream genotyping we collected an additional

444 tissue samples from hunter-harvested pumas as part of a monitoring program by the 96

Arizona Game and Fish Department (including one puma from New Mexico and a puma reported from Connecticut). We had DNA extracted using a BioSprint 96 automated system (Qiagen Inc.) with Dynabeads magnetic bead (Invitrogen, Life Technologies,

Grand Island, NY, USA) technology and quantified using the Quant-iT PicoGreen reagent (Invitrogen) at the University of Arizona Genetics Core (UAGC).

cDNA library construction and sequencing

We assessed total RNA quality and quantity on a 2100 Bioanalyzer (Agilent

Technologies Inc., Santa Clara, CA, USA) for each sample using the RNA 6000 Nano chip. We pooled each of the 12 samples in equimolar ratios to obtain 534 ng of total

RNA in 50 µL of water. We synthesized cDNA using the SMARTer Pico PCR cDNA

Synthesis Kit (Clontech Laboratories Inc., Mountain View, CA, USA) following the manufacturer’s recommendations. This kit targets polyadenylated RNA and minimizes contamination from ribosomal and other RNAs. To avoid biases in cDNA library construction and amplification, we prepared six independent cDNA libraries as described above, quantified each on a 2100 Bioanalyzer using the DNA High Sensitivity chip, and pooled them in equimolar concentrations for the final cDNA library. We sequenced the final cDNA library using half of a picotiter plate on a GS-FLX Titanium platform (454

Life Sciences, A Roche Company, Branford, CT, USA) at the UAGC.

Sequence assembly and SNP identification

We assessed the specificity of the sequencing by randomly selecting 1,000 raw sequence reads and performing a search against NCBI’s nr database using BLASTN 97

(http://blast.ncbi.nlm.nih.gov/Blast.cgi). We implemented an e-value cutoff of 10-30 and only kept the top hit. We assembled all the sequences into contigs using the

GSASSEMBLER software (Roche), employing parameters to screen reads against

NCBI’s vector contaminant database, to trim the reads of the SMARTer oligonucleotide primer (Clontech) used in cDNA library preparation, and to remove contigs <100 bases in length.

To identify SNPs, we first cleaned the raw sequence reads by removing the cDNA library primer, trimming the 3’ ends of each read to a minimum phred quality score of 20, and filtering reads shorter than 100 bases using CUTADAPT (Martin 2011). Next, we mapped the cleaned sequencing reads back to the assembled contigs using BWA (Li and

Durbin 2009). All polymorphisms were identified and scored using FREEBAYES

(Garrison and Marth 2012). The resulting polymorphisms were filtered to include only biallelic SNPs with a FREEBAYES calculated quality score ≥ 20 and a minimum allele count ≥ 2.

For each SNP, we designed primers using PRIMER3 (Rozen and Skaletsky 2000) to target a fragment between 150-250 bp in length and to include at least 50 bp of flanking sequence on either side of the SNP. In cases where PRIMER3 could not successfully generate primers for a desired SNP, we removed these SNPs from downstream analyses.

Because the alignment of paralogous sequences can result in the detection of false-positive SNPs, we attempted to remove those primer sets amplifying genes potentially containing paralogs. Briefly, for each target sequence we performed a

BLASTX search against a custom database composed of from the Canis lupus 98 familiaris, Bos taurus, and Ailuropoda melanoleuca genomes. For each query we selected the top hit and performed a nucleotide BLAST search using the protein’s coding sequence against its genome of origin. We omitted fragments from further analyses if they had multiple hits within its genome with an e-value <1x10-30 and similarity >90%.

Furthermore, the BLAST results were manually compared against the Duplicated Genes

Database (http://dgd.genouest.org) for Canis lupus familiaris for further evidence of paralogous sequences, and all sequences with best matches to mitochondrial genes were removed.

Sequenom genotyping, validation, and analyses

We designed 4 primer multiplexes for the simultaneous analyses of 28, 29, 36, and 36 putative SNPs using the Assay Design Suite v1.0 (Sequenom Inc., San Diego,

CA, USA). We screened each multiplex for polymorphic SNPs using DNA from the 12 pumas used to construct the initial cDNA library and 179 of the hunter-harvested pumas on the Sequenom MassARRAY system (Sequenom) at the UAGC. We compared success rate of validated SNPs to FREEBAYES quality scores and allele counts using a

Welch’s T-test to look for patterns in successful validation of SNPs.

To further validate SNPs, we selected eight SNP loci at random for verification using traditional PCR and Sanger sequencing in the 12 original puma samples. We amplified each SNP using ~20 ng template DNA, 0.5 µM each of forward and reverse primer described above, 1X PCR buffer (USB, Cleveland, OH, USA), 1.5 mM MgCl2,

0.25 mM each dNTP, 0.05% BSA, and 0.5 U Taq DNA polymerase (USB) in a final volume of 20 µL. Cycling conditions consisted of an initial melting step of 5 min at 95 99

°C, then 40 cycles of 95 °C for 45 sec, 52-62 °C for 45 sec, and 72 °C for 1 min, with a final extension step at 72 °C for 7 min. We purified all PCR products using the ExoSAP-

IT PCR Clean-up Kit (USB) according to manufacturer’s specifications. PCR products were sequenced on an ABI3730 DNA Sequencer (Applied Biosystems, Foster City, CA,

USA) at the UAGC in both forward and reverse directions. Sequences were cleaned, visually inspected for errors, and assembled using Sequencher v5.1 (Gene Codes

Corporation, Ann Arbor, MI USA http://www.genecodes.com). The genotypes at each

SNP locus were called manually after visual inspection of the chromatograms.

A final multiplex was developed using the polymorphic SNPs identified from the four initial Sequenom multiplexes. Using this final multiplex, we analyzed an additional

265 of the hunter- harvested Arizona pumas and 75 pumas collected by Culver et al.

(2000) from other parts of their range for a total of 531 individuals including at least one individual from each of the six described subspecies (Figure B.1): P. c couguar (n =

512), P. c. puma (n = 6), P c. costaricensis (n = 1), P. c. capricornensis (n = 2), P. c. cabrerae (n = 5), and P. c. concolor (n = 4). One additional individual was a captive pet puma from an unknown location. Within P. c. couguar, we collected 501 individuals from 13 states, four from Mexico, and seven from Canada (Figure B.1). A total of 13 samples were replicated across the different Sequenom analyses, and negative controls were used for each assay. For final analyses we removed individuals missing genotypes at more than four loci and then removed SNP loci with a call rate <95%.

We annotated each SNP using BLAST against the known coding sequences in the

Felis catus 6.2 assembly (www.ensembl.org/Felis_catus). We used approximately 200 bp flanking sequence on each side of the SNP and only retained the top hit with a 100 maximum identity >90%. In situations where no significant match was found, we queried the NCBI RefSeq database (www.ncbi.nlm.nih.gov/refseq) instead. For each subspecies of puma, we calculated the minor allele frequency (MAF), observed and expected heterozygosities at each locus then tested for significant deviations from Hardy-

Weinberg equilibrium (HWE) using exact tests and complete enumeration of the P-value in GENEPOP v4.2 (Rousset 2008). We tested for statistically linked loci using pairwise calculations of linkage disequilibrium (LD) in GENEPOP and assessed its significance using a Markov chain algorithm with a 10,000 iteration burn-in period and 20 batches of

5,000 iterations each. We calculated the inbreeding coefficient FIS, using FSTAT v2.9

(Goudet 2001) and tested for values significantly different from zero using 3640 permutations in FSTAT. We screened our dataset for the presence of null alleles using a procedure described by Girard (2011) using the mean FIS (calculated in FSTAT) and

100,000 simulations. We assessed initial population structuring using a principle coordinates analysis (PCoA) in the software Genalex v6.5 (Peakall and Smouse 2012).

PCoA summarizes major patterns in a multivariate dataset on several primary axes of variation. We further investigated population structure using the Bayesian clustering algorithm implemented in STRUCTURE v2.3.2.1 (Pritchard et al. 2000; Falush et al.

2003; Hubisz et al. 2009). STRUCTURE is used to determine the number of clusters

(populations) without any a priori knowledge of population substructure. The estimated number of populations (K) is given as the largest value of ΔK calculated in the software

STRUCTURE HARVESTER (Earl and vonHoldt 2012) using the method presented by

Evanno et al. (2005). We tested for K = 1–15 with 15 iterations of each possible K. We assumed admixture and correlated allele frequencies and ran each analysis for 30,000 101 burn-in generations and 300,000 generations thereafter. For the most probable number of populations, K, we combined the ten replicates using the full-search algorithm in the software CLUMPP v1.1.2 (Jakobsson and Rosenberg 2007) and plotted the results using the software DISTRUCT v1.1 (Rosenberg 2004).

Comparison with microsatellite data

We compared the results of SNP genotyping using PumaPlex with a subset of these samples (n = 237) that had already been genotyped for 15 Felis catus-derived microsatellite markers – FCA026, FCA035, FCA043, FCA052, FCA057, FCA077,

FCA082, FCA090, FCA096, FCA132, FCA144, FCA176, FCA221, FCA229, and

FCA290 (Menotti-Raymond et al. 1999) – as part of a separate study in our laboratory

(Naidu et al. unpublished data). For downstream analyses we only retained samples that were genotyped in both datasets. For both the SNP and microsatellite datasets we removed loci with more than 10% missing data, and removed individuals from both datasets if they were missing data at more than two SNP loci and/or at more than three microsatellite loci. Because the exact site of collection for each hunter-harvested puma is unknown, we assigned each puma to a putative subpopulation of origin as shown in

Figure B.7 (N, SW, and SE) according to where each hunter registered the carcass.

These three regions are defined by several interstate highways, which may serve as barriers to gene flow in pumas (Beier 1995; McRae et al. 2005).

For each marker dataset within each putative population we tested for deviations from HWE, for LD between pairs of markers, calculated the number of alleles, observed

(HO) and expected (HE) heterozygosities, and the inbreeding coefficient FIS as described 102 previously. We compared the potential structuring of puma populations in Arizona between each marker dataset using the following three analyses. First, to investigate if any population structure existed between the three putative regions we calculated FST between each pair of subpopulations and assessed its significance using exact G-tests and

Markov chain search parameters as described above in GENEPOP. Then, we used the

PCoA analysis in GENALEX and STRUCTURE software as described above. The parameters for STRUCTURE were similar to those described previously with the exception of 10 independent runs for each K and a 100,000 step burn-in period followed by 500,000 iterations. For each potential value of K, we summarized and plotted the results as mentioned previously.

Individual identity and relatedness

We compared the ability of both datasets to accurately identify individuals by calculating the theoretical probability of identity (PI) and the probability of identity among siblings (PISIBS) in GENALEX. The microsatellite loci were ordered according to their description above, and the SNP dataset was ordered as given in Table B.2. Finally, we estimated pairwise coefficients of relatedness for each dataset using the method of

Lynch and Ritland (1999) in the software COANCESTRY v1.0 (Wang 2011). We assessed statistical differences between datasets and subpopulations using paired and two sample T-tests, respectively.

Fecal DNA samples and analysis 103

We included 10 fecal DNA samples that were previously identified as puma and that were genotyped for 12 microsatellite loci by Naidu et al. (2011)—FCA043, FCA057,

FCA082, FCA090, FCA096, and FCA166 (Menotti-Raymond et al. 1999), and PcoB105,

PcoA2, PcoD8, PcoD301, and PcoD329 (Rodzen et al. 2007), and PcoB010w

(Kurushima et al. 2006). According to Naidu et al. (2011), only three loci amplified consistently (FCA043, FCA057, FCA090) after repeating the genotyping three times for each locus in each sample. We genotyped these 10 fecal DNA samples on PumaPlex as described above. We genotyped each fecal DNA sample three times to minimize potential genotyping errors associated with ancient and non-invasively collected DNA samples (Bonin et al. 2004). Across replicates, a genotype was scored when at least two replicates matched, and when no disagreements existed across all three replicates, the same criteria that were applied to the microsatellite genotype scoring.

RESULTS

Sequencing and SNP identification

A general summary of the sequencing and SNP verification results can be viewed in Table B.1. We generated 358,049 sequence reads from the cDNA library with a mean length of 509.6 bases. Initial comparison of 1,000 randomly selected sequences to the

NCBI ‘nr’ database indicated nearly all sequences were of mammalian, non-human origin

(Figure B.2), suggesting they originated from pumas. A majority of sequences had high- scoring matches to the domestic dog (Canis lupus familiaris), panda (Ailuropoda melanoleuca), and domestic cat (Felis catus) genomes. Additionally, another large proportion of reads had no significant matches to the database, and the remaining 104 matches were distributed among many different mammalian and non-mammalian genomes.

Sequence assembly produced 2,073 contigs with an N50 length of 673 bases. Of the 434 putative SNPs identified, we screened 129 using the Sequenom MassArray system and found 30 (23.3%) that were polymorphic. When we compared validated SNP loci to monomorphic loci and loci that failed the Sequenom genotyping we found no significant differences in FREEBAYES quality scores and allele frequencies (Figure B.3,

Welches t-test p > 0.05 in all comparisons). Further validation of eight SNP loci in the

12 original puma individuals using traditional Sanger sequencing revealed 100% agreement between the Sanger and Sequenom genotypes. For the final multiplex, hereafter referred to as PumaPlex, we were able to include 27 of the 30 polymorphic

SNPs. A total of 21 of 27 SNPS were annotated using a BLAST search against the F. catus genome, with sequence identity between the puma sequence and the F. catus sequence ranging between 94.23% and 100% (Table B.2). The remaining six SNP loci were annotated using the Bos taurus, Pongo abelii, Odobenus rosmarus, Mustela putorius, or Canis lupus genomes, albeit with much lower identities (between 74.5% with

B. taurus and 82.5% with Mustela putorius) (Table B.2).

PumaPlex

We analyzed a total of 531 pumas (12 pumas collected for cDNA library construction, 444 hunter harvested AZ pumas, and 75 pumas from Culver et. al. 2000) for the 27 SNPs included on PumaPlex. Across 13 replicated samples, we found 100% agreement in genotype calls. We removed one locus, PCCNI, because in the final 105

PumaPlex this locus did not amplify in any individuals. We removed a total of 15 individuals that were missing genotypes at more than four loci. The final dataset therefore contained 516 individual pumas, and missingness (failed genotypes) was only

2.7%. We reported LD between pairwise locus combinations and HWE for each locus within P. c. couguar individuals only because the SNPs were ascertained in a panel of 12 individuals of this subspecies. After Bonferroni correction we detected only four instances of pairwise LD, and no consistent deviations were present. Testing for HWE revealed eight of 26 (30.8%) loci differed from expectations after correction for multiple tests (Table B.3). A test for null alleles revealed an excess of individual homozygosity at locus PCFHL1 (Figure B.4). Analyses were performed both with and without this locus without any significant differences therefore we retained this locus in the following calculations. In P. c. couguar, the mean MAF was 0.274 ± 0.025 and ranged from a low of 0.060 to a highest MAF of 0.449 (Table B.3). Observed heterozygosity was 0.340 ±

0.024 and FIS was 0.074 and did not differ significantly from zero (Table B.3). In the other five puma subspecies, MAF, observed heterozygosity, expected heterozygosity, and proportion of polymorphic loci were all reduced, but this may be a result of small sample sizes and/or ascertainment bias when making comparisons across these subspecies (Table

B.4). Therefore, we simply report these values and do not make further comparisons across subspecies.

We summarized our dataset using a PCoA and found that the first two components account for 22.4% and 18.7% of the variation. When combined graphically, it appeared there were moderate amounts of overlap between subspecies (Figure B.5), but generally the first component was able to produce some separation between P. c. couguar 106 of North America and the remaining subspecies of Central and South America. We didn’t identify any clustering within North American pumas. The results from

STRUCTURE, however, suggested three clusters as the best possible explanation, although evidence also existed for seven populations (Figure B.6A). All of the Central and South American pumas assigned to a single cluster, shown as green in Figure B.6, with the exception of a single P. c. cabrerae individual which assigned to a separate cluster shown in orange. The single, captive pet puma clustered with this green group as well (Figure B.5, B.6C), indicating a similar ancestry. In general, pumas of the green cluster were found through the entire range. In Arizona, however, it appears the northern portion of the state contains a majority of pumas from a single cluster (green, Figure

B.6D), whereas the southern portion of the state contains a mix of the other two clusters

(orange and blue, Figure B.6D).

Comparison with microsatellite data

We compared genotypes for 237 pumas from both PumaPlex and analysis of 15 microsatellite loci. We removed no microsatellite loci and four SNP loci for low genotyping success (<90%). We removed a total of six individuals from both datasets for missing genotypes at either more than two microsatellite or three SNP loci. After

Bonferroni correction, a single microsatellite locus, FCA043, demonstrated consistent deviations from HWE and instances of LD. This locus was removed from subsequent analyses. Using this dataset no SNP loci showed significant departure from HWE or LD between loci. For the microsatellite dataset, HE was greatest in SE (0.663) and least in N

(0.629), however both HO and HE were similar in all three subpopulations (Table B.5). 107

The level of inbreeding, FIS, was highest for N (0.041), but was generally quite low for each subpopulation, and did not differ significantly from zero in all cases (P > 0.05).

Measures of heterozygosity for the SNP dataset were also similar in each subpopulation, albeit consistently about half that of the microsatellite dataset (Table B.5). This is expected considering the lower polymorphism of SNPs compared with microsatellites.

For SNPs, HE was highest in SE (0.366) and lowest in SW (0.345). Like microsatellites,

FIS was highest in N (0.082), but no subpopulations displayed a significant amount of inbreeding.

We estimated levels of differentiation for each dataset using pairwise FST between subpopulations (Table B.5). Both datasets revealed modest (FST < 0.1) but significant (P

< 0.017), subpopulation structure. In the microsatellite dataset, the largest levels of differentiation were between N and SW (FST = 0.068, P < 0.00001) and N and SE (FST =

0.061, P < 0.00001), whereas differentiation between the two southern groups, SE and

SE, was approximately half (FST = 0.030, P < 0.00001). Results for the SNP dataset were quite similar, with the largest level of differentiation occurring between N and SE (FST =

0.056, P < 0.00001). A slightly lower amount was observed between N and SW (FST =

0.042, P < 0.00001), and the least amount between SE and SW (FST = 0.020, P <

0.00001).

Using principle coordinates analysis in the microsatellite dataset we identified component 1 that accounted for the largest amount of variation in the data (22.45%;

Figure B.8A). This axis was able to moderately distinguish N from SE and SW. The second component accounted for 20.18% of the variation but did not further discriminate between populations. For the SNP dataset, component 1 (20.76%) weakly separated SW 108 from N and SE, whereas component 2 (19.17%) was able to somewhat discriminate N from SE and SW (Figure B.8B).

The Bayesian analysis of population structure showed that ΔK was highest at K =

3 for the microsatellite dataset, and K = 4 for the SNP dataset. Figure B.9 illustrates the proportion of ancestry for each individual in K = 2, 3, 4, and 5 potential clusters for both datasets. For microsatellites (Figure B.9A), the first major division, K = 2, separates N from SE and SW, much as demonstrated by FST. At K = 3, the most probable number of subpopulations, each region, N, SE, and SW generally forms their own unique cluster, with a small number of individuals within each cluster having the largest proportion of ancestry from one of the other regions. For SNPs (Figure B.9B) the most probable number of subpopulations is K = 4, however the graph does not represent any pattern of four clusters. Rather, it appears the SNP dataset at K = 3 separated most individuals in N as a unique cluster, but no distinction was made between SE and SW as evidenced with microsatellites.

As expected, we found PI and PISIBS for 14 microsatellites to be much lower than

- for the 23 SNPs (Figure B.10). The PI and PISIBS for the 14 microsatellites were 2.4x10

12 and 1.3x10-5, respectively, and 3.6x10-8 and 1.6x10-4 for 23 SNPs. The 23 SNP loci on

PumaPlex had approximately the same PI as nine microsatellite loci (PI = 2.5x10-8).

Overall, values for PI and PISIBS for both datasets were well below the recommended threshold for estimates of population size (PI < 0.01) and for forensic applications (PI <

0.0001) given by Waits et al. (2001).

We calculated coefficients of relatedness for each pairwise combination of individuals within each population for each marker dataset (Figure B.11) and found 109 average relatedness was highest in N for both datasets (0.091 and 0.096 for microsatellites and SNPs, respectively) and significantly greater than zero (t-test, P <

0.001) in all cases. In N and SE, no significant differences in relatedness between genetic markers were observed, but in SW relatedness calculated with microsatellites was significantly greater than that calculated with SNPs (paired t-test, P = 0.00086). Mean relatedness between subpopulations was different in all comparisons (Welch’s t-test, P <

0.001), with the exception of between SW and SE in the SNP dataset, where this difference was less substantial (P = 0.011).

We genotyped 10 puma fecal samples using PumaPlex, and compared the genotyping success (call rate) with that of three microsatellite loci (Naidu et al. 2011).

Call rate for SNPs (0.692) was greater than that of microsatellites (0.633), although not significantly different (paired t-test, P = 0.426). Fecal samples that produced a high call rate for one type of marker usually performed well for the other marker as well (Figure

B.12). However, when a scat performed poorly for microsatellites (call rate = 0.333),

SNPs usually outperformed microsatellites in these samples (Figure B.12).

DISCUSSION

PumaPlex

We sequenced and assembled the transcriptome of a pool of 12 pumas in order to identify a panel of SNP markers useful for genetic analyses. We found no patterns between the ability to predict SNPs from quality scores and read depth and actual validation of the SNP. Therefore, although our conservative criteria predicted 434 SNPs, the actual number of potential SNPs in our dataset may be much larger. When using the 110

Sequenom MassARRAY system, we found genotype estimates to be reliable, reproducible, and with a rapid turn around compared with microsatellite genotype data.

Therefore, Sequenom is an effective method to screen predicted SNPs when compared with Sanger sequencing

We used 129 of our predicted SNPs to develop PumaPlex, which is a panel of 27

SNPs analyzed simultaneously in 384 samples as a single multiplex on the Sequenom

MassARRAY system. We experimentally validated PumaPlex in 531 individuals, including 513 individuals from the North American subspecies of puma, P. c. couguar.

To minimize ascertainment bias, we used a large ascertainment sample set (n = 12), reported all polymorphic SNPs (MAF > 0) and only reported HWE and LD for individuals from the same population as the ascertainment sample set (Brumfield et al.

2003; Morin et al. 2004). We also examined the applicability of PumaPlex to pumas from throughout their range, and found the assay to still be effective, although we acknowledge the potential for ascertainment bias to exist on samples from other parts of the species’ range. Therefore, we suggest other users of PumaPlex take this into account when calculating parameters that use allele frequencies to model demographic changes like effective population size (Wakeley et al. 2001; Brumfield et al. 2003; Morin et al.

2004). The potential for ascertainment bias is likely to be greatest when using PumaPlex on South American pumas and least when analyzing North American pumas, considering the recent founding event that likely occurred in North American pumas approximately

10,000 – 12,000 years ago (Culver et al. 2000). We did find a reduction in polymorphism in Central and South American subspecies of puma consistent with ascertainment bias, however the sample sizes for these subspecies were quite low. A much larger sample 111

size is necessary corroborate these results. Other parameters, such as PID, assignment tests, and paternity tests are less sensitive to ascertainment bias and these questions may be more applicable for studies using PumaPlex outside of North America (Krawczak

1999; Manel et al. 2000; Morin et al. 2004).

We did identify eight loci that deviated from HWE, and although statistically significant, these deviations were usually quite small and may be a function of a large sample size (Martínez-Abraín 2008), or the effect of a subdivided population (Wahlund effect). Although we found no consistent patterns of LD, other PumaPlex users who detect LD may want to remove one locus from each pair or combine the two loci into a single haplotype. By combining into haplotypes using software such as PHASE

(Stephens et al. 2001; Stephens and Donnelly 2003), the result is a single locus with up to

4 alleles (haplotypes). Morin et al. (2009) demonstrated that inferring haplotypes actually increases statistical power in estimates of population differentiation. Ignoring the Central and South American subspecies of puma due to very small sample sizes, analysis of population structure revealed strong evidence for three genetic clusters.

Geographically, one of these clusters was composed of primarily pumas from northern

Arizona and north. The remaining two clusters were rather evenly distributed throughout the southwestern United States. However, as evidenced by both the PCoA and the

STRUCTURE analysis, we did find individuals from the southwestern clusters in the northern portion of their range and vice versa. This could be the result of long migration events of individual pumas, likely young males. Young male pumas have been known to disperse quite far, at times over 700 km (Elbroch et al. 2009). The large potential for 112 dispersal is likely maintaining genetic heterogeneity across the range of P. c. couguar, minimizing any observable population structure.

Comparison with microsatellite data

We also made comparisons between the results of PumaPlex and that of 15 traditional microsatellite loci in a sample of pumas from Arizona. FST, PCoA, and

STRUCTURE results further support, using both types of markers, that pumas in northern Arizona, N, are the most genetically divergent from pumas in the two southern regions (SE and SW). These analyses also suggest potential subpopulation structure between SE and SW, but this differentiation is much less and not necessarily supported by STRUCTURE with SNPs. It is quite possible that PumaPlex may lack the power necessary to distinguish between subpopulations using STRUCTURE when the signal is very weak. This can be overcome by increasing the number of loci, the number of individuals, or both (Morin et al. 2009). Although increasing the number of SNPs in

PumaPlex would be beneficial, the exact number of SNPs necessary can be difficult to estimate. In gray wolves, Knowles (2010) measured discordance between estimates of subpopulation structure between marker sets and found that between 56 and 140 SNPs are necessary to infer the same structure as 14 microsatellites. However, the author stated that these numbers may be specific to wolves and don’t necessarily represent the number required for any species.

Despite the potential for less resolution in determining population substructure using SNPs, the general conclusions drawn from each dataset did not change. Both datasets revealed modest population structure between N and the southern regions, 113 although this weak structure may not be biologically meaningful or may be the result of isolation by distance. The primary differentiation between N and the other regions had been reported elsewhere (McRae et al. 2005). McRae et al. (2005) suggested this structuring of puma populations might be a consequence of restricted movements across

Interstate-40. Similar reports of highways serving as barriers to movement and gene flow have been reported in pumas from California (Beier 1995) in addition to other carnivores, including grizzly bears (Ursus arctos; Mace et al. 1996) and coyotes (Canis latrans;

Riley et al. 2006). However, we cannot exclude the possibility of other barriers, such as geography, restricting gene flow in pumas (see McRae et al. 2005 for a detailed discussion). For instance, much of northern Arizona is composed of high-elevation coniferous forests and cold temperate grasslands, while much of southern Arizona is

Sonoran desertscrub (Brown et al. 2007). In addition to subpopulation structure, mean pairwise relatedness was much greater in N than in SE and SW for both datasets. The higher level of relatedness in N could be a result of kin discrimination or limited dispersal of individuals (Hamilton 1964a; 1964b; Cornwallis et al. 2009), especially females (Biek et al. 2006) in this region. We suggest the latter as a more plausible explanation due in part to the solitary nature of pumas that limit their interactions (Hornocker 1969;

Seidensticker et al. 1973; Sunquist and Sunquist 2002) and to larger, more contiguous habitat available in northern Arizona that may result in smaller home range sizes, limiting dispersal (Laundre and Loxterman 2007). A smaller effective population size in N may also inflate estimates of pairwise relatedness, especially considering that observed heterozygosity was less in N. 114

Our results also demonstrated that PumaPlex is a useful tool for identifying individuals. Although the PID is much lower for the microsatellite dataset than for SNPs, the PID for the SNP dataset is still able to distinguish individuals with high confidence

(PID = 3.6x10-8, Waits et al. 2001). Therefore, PumaPlex provides an excellent alternative to microsatellite analysis for studies that require individual identification, such as estimating and monitoring population sizes, home range sizes, and for forensic applications (Taberlet et al. 1997; 1999; Ernest et al. 2000; Ogden et al. 2009; Ogden

2011). Recently, the ability to identify individuals has become popular with the expansion of population surveys using non-invasive DNA samples (see Piggot and

Taylor 2003; Waits and Paetkau 2005 for reviews). In pumas, non-invasive DNA studies have focused on the amplification of microsatellites from fecal DNA, although often encountering poor genotyping success (Ernest et al. 2000; Ernest et al. 2002; Miotto et al.

2007; Naidu et al. 2011). Because SNPs interrogate much shorter fragments and are amenable to high-throughput analysis, it has been suggested that they may outperform microsatellites in amplification success from non-invasively collected samples (Morin et al. 2004; Seddon et al. 2005; Morin and McCarthy 2007; Fabbri et al. 2012). Fabbri et al.

(2012) explicitly compared genotyping success between SNPs and microsatellites in 318

Italian wolf scats, and did not find evidence for a significant increase in genotyping success with SNPs. The authors only amplified nine SNP loci, and determined that more loci were needed to accurately identify individual wolves. To our knowledge, this study is the first to observe the genotyping success of SNPs in wildlife feces using the

Sequenom system. Our results suggest that the genotyping performance of SNPs in

PumaPlex may actually outperform that of microsatellites in puma feces, especially in 115 samples with very low microsatellite amplification success. However, a more thorough study is necessary since the sample size of scats were quite low. Additionally, PumaPlex only requires ~10 ng of total DNA per reaction (1 reaction = 27 SNPs), whereas using microsatellites this number may be much larger, potentially exhausting a limited supply of DNA.

Conclusions

SNPs are rapidly becoming the marker of choice for conservation genetic studies, and understanding their performance relative to more commonly used genetic markers is necessary when determining which marker to use (Morin et al. 2004). Although

PumaPlex contained 27 SNPs, it successfully estimated levels of genetic variation, population differentiation, and PID and produced results that are not substantially different from conclusions drawn from 14 microsatellite loci. Microsatellites in pumas, however, were able to detect subtle population structure, an issue that can be compensated for by increasing the number of SNPs (Morin et al. 2009). Because we only screened 129

SNPs, the potential for a large number of polymorphic SNPs still exists in the transcriptome dataset. Therefore, the number of SNPs on PumaPlex can be increased at future times. Unfortunately, this will come at a cost of sample size per run of the

Sequenom machine. Instead of 384 samples genotyped for 27 SNPs, the number of SNPs could be approximately doubled but only 192 samples can be processed. It may be useful to identify additional SNPs from a larger ascertainment sample set that includes pumas from more distant geographic areas to minimize ascertainment bias, and to include SNPs from non-coding regions to minimize effects from selection. Because PumaPlex is 116 analyzed on the Sequenom MassARRAY system, the multiplex can be used at any institution with the instrument and produces data coded simply as base identities. This results in data easily stored in databases and facilitates comparison across studies without the need for extensive cross-calibration of allele sizes such as for microsatellites (Vignal et al. 2002; Morin et al. 2004; Coates et al. 2009). Also, once sample collection is complete, the use of high-throughput DNA extraction robots and PumaPlex can produce genetic data ready for analysis in a matter of days, which previously would take anywhere from several months to years to complete using microsatellites. PumaPlex also required very small amounts of DNA (~10ng/sample; www.Sequenom.com) thus providing an efficient and economical use of samples containing limited DNA quantity such as in non-invasive and museum-collected samples. Puma managers and researchers can employ PumaPlex to generate results quickly and of comparable performance with microsatellites. Therefore, we encourage researchers and managers to consider

PumaPlex as a valuable tool for large-scale studies and continued monitoring of pumas to facilitate their conservation and management.

117

REFERENCES

Albrechtsen A, Nielsen FC & Nielsen R. (2010). Ascertainment biases in SNP chips

affect measures of population divergence. Molecular Biology and Evolution

27(11): 2534-2547.

Anderson CR, Lindzey FG & McDonald DB. (2004). Genetic structure of cougar

populations across the Wyoming Basin: metapopulation or megapopulation.

Journal of Mammalogy 85(6): 1207-1214.

Andreasen AM, Stewart KM, Longland WS, et al. (2012). Identification of source-sink

dynamics in mountain lions of the Great Basin. Molecular Ecology 21(23): 5689-

5701.

Beier P. (1991). Cougar attacks on humans in the United States and Canada. Wildlife

Society Bulletin 19(4): 403-412.

Beier P. (1995). Dispersal of juvenile cougars in a fragmented habitat. Journal of Wildlife

Management 59(2): 228-237.

Biek R, Akamine N, Schwartz MK, et al. (2006). Genetic consequences of sex-biased

dispersal in a solitary carnivore: Yellowstone cougars. Biology Letters 2(2): 312-

315.

Blouin MS, Parsons M, Lacaille V, et al. (1996). Use of microsatellite loci to classify

individuals by relatedness. Molecular Ecology 5(3): 393-401.

Bonin A, Bellemain E, Eidesen PB, et al. (2004). How to track and assess genotyping

errors in population genetics studies. Molecular Ecology 13(11): 3261-3273. 118

Brown DE, Unmack PJ & Brennan TC. (2007). Digitized map of biotic communities for

plotting and comparing distributions of North American animals. Southwestern

Naturalist 52(4): 610-616.

Brumfield RT, Beerli P, Nickerson DA, et al. (2003). The utility of single nucleotide

polymorphisms in inferences of population history. Trends in Ecology &

Evolution 18(5): 249-256.

Castilho CS, Marins-Sá LG, Benedet RC, et al. (2011). Landscape genetics of mountain

lions (Puma concolor) in southern Brazil. Mammalian Biology - Zeitschrift für

Säugetierkunde 76(4): 476-483.

Clark AG, Hubisz MJ, Bustamante CD, et al. (2005). Ascertainment bias in studies of

human genome-wide polymorphism. Genome Research 15(11): 1496-1502.

Coates BS, Sumerford DV, Miller NJ, et al. (2009). Comparative performance of single

nucleotide polymorphism and microsatellite markers for population genetic

analysis. Journal of Heredity 100(5): 556-564.

Cornwallis CK, West SA & Griffin AS. (2009). Routes to indirect fitness in

cooperatively breeding vertebrates: kin discrimination and limited dispersal.

Journal of Evolutionary Biology 22(12): 2445-2457.

Culver M, Johnson WE, Pecon-Slattery J, et al. (2000). Genomic ancestry of the

American puma (Puma concolor). Journal of Heredity 91(3): 186-197.

Culver M & Schwartz MK. (2011). Conservation genetics and cougar management. In

Managing cougars in North America (Jenks, J. A., ed.), pp. 91-110. Jack H.

Berryman Institute, Utah State University, Logan, Utah, USA. 119

Earl DA & vonHoldt BM. (2012). STRUCTURE HARVESTER: a website and program

for visualizing STRUCTURE output and implementing the Evanno method.

Conservation Genetics Resources 4(2): 359-361.

Elbroch M, Wittmer HU, Saucedo C, et al. (2009). Long-distance dispersal of a male

puma (Puma concolor puma) in Patagonia. Revista Chilena de Historia Natural

82: 459-461.

Ernest HB, Boyce WM, Bleich VC, et al. (2003). Genetic structure of mountain lion

(Puma concolor) populations in California. Conservation Genetics 4(3): 353-366.

Ernest HB, Penedo MCT, May BP, et al. (2000). Molecular tracking of mountain lions in

the Yosemite Valley region in California: genetic analysis using microsatellites

and faecal DNA. Molecular Ecology 9(4): 433-441.

Ernest HB, Rubin ES & Boyce WM. (2002). Fecal DNA analysis and risk assessment of

mountain lion predation of bighorn sheep. Journal of Wildlife Management 66:

75-85.

Estoup A, Jarne P & Cornuet JM. (2002). Homoplasy and mutation model at

microsatellite loci and their consequences for population genetics analysis.

Molecular Ecology 11(9): 1591-1604.

Evanno G, Regnaut S & Goudet J. (2005). Detecting the number of clusters of

individuals using the software structure: a simulation study. Molecular Ecology

14(8): 2611-2620.

Fabbri E, Caniglia R, Mucci N, et al. (2012). Comparison of single nucleotide

polymorphisms and microsatellites in non-invasive genetic monitoring of a wolf

population. Archives of Biological Sciences 64(1): 321-335. 120

Falush D, Stephens M & Pritchard JK. (2003). Inference of population structure using

multilocus genotype data: linked loci and correlated allele frequencies. Genetics

164(4): 1567-1587.

Farrell LE, Romant J & Sunquist ME. (2000). Dietary separation of sympatric carnivores

identified by molecular analysis of scats. Molecular Ecology 9(10): 1583-1590.

Frankham R. (2003). Genetics and conservation biology. Comptes Rendus Biologies 326:

22-29.

Garrison E & Marth G. (2012). Haplotype-based variant detection from short-read

sequencing. ArXiv e-prints 1207.3907.

Garvin MR, Saitoh K & Gharrett AJ. (2010). Application of single nucleotide

polymorphisms to non-model species: a technical review. Molecular Ecology

Resources 10(6): 915-934.

Girard P. (2011). A robust statistical method to detect null alleles in microsatellite and

SNP datasets in both panmictic and inbred populations. Statistical Applications in

Genetics and Molecular Biology 10(1): 1-10.

Goudet J. (2001). FSTAT, a program to estimate and test gene diversities and fixation

indices (version 2.9.3), Vol. 2009.

Haag T, Santos AS, De Angelo C, et al. (2009). Development and testing of an optimized

method for DNA-based identification of jaguar (Panthera onca) and puma (Puma

concolor) faecal samples for use in ecological and genetic studies. Genetica

136(3): 505-512. 121

Hahn M, Wilhelm J & Pingoud A. (2001). Influence of fluorophor dye labels on the

migration behavior of polymerase chain reaction amplified short tandem repeats

during denaturing capillary electrophoresis. Electrophoresis 22(13): 2691-2700.

Hamilton WD. (1964). Genetical evolution of social behavior 1. Journal of Theoretical

Biology 7(1): 1-16.

Hamilton WD. (1964). Genetical evolution of social behavior 2. Journal of Theoretical

Biology 7(1): 17-52.

Hayes CL, Rubin ES, Jorgensen MC, et al. (2000). Mountain lion predation of bighorn

sheep in the Peninsular Ranges, California. Journal of Wildlife Management

64(4): 954-959.

Hornocker MG. (1969). Winter territoriality in mountain lions. Journal of Wildlife

Management 33(3): 457-464.

Hubisz MJ, Falush D, Stephens M, et al. (2009). Inferring weak population structure with

the assistance of sample group information. Molecular Ecology Resources 9(5):

1322-1332.

Jakobsson M & Rosenberg NA. (2007). CLUMPP: a cluster matching and permutation

program for dealing with label switching and multimodality in analysis of

population structure. Bioinformatics 23(14): 1801-1806.

Knowles JC. (2010). Population genomics of North American grey wolves (Canis lupus),

Masters Thesis, University of Alberta.

Krawczak M. (1999). Informativity assessment for biallelic single nucleotide

polymorphisms. Electrophoresis 20(8): 1676-1681. 122

Kurushima JD, Collins JA, Well JA, et al. (2006). Development of 21 microsatellite loci

for puma (Puma concolor) ecology and forensics. Molecular Ecology Notes 6(4):

1260-1262.

Laundre JW & Loxterman J. (2007). Impact of edge habitat on summer home range size

in female pumas. American Midland Naturalist 157(1): 221-229.

Li H & Durbin R. (2009). Fast and accurate short read alignment with Burrows-Wheeler

transform. Bioinformatics 25(14): 1754-1760.

Liu NJ, Chen L, Wang S, et al. (2005). Comparison of single-nucleotide polymorphisms

and microsatellites in inference of population structure. BMC Genetics 6: S26.

Logan KA & Irwin LL. (1985). Mountain lion habitats in the Big Horn Mountains,

Wyoming. Wildlife Society Bulletin 13(3): 257-262.

Logan KA & Sweanor LL. (2001). Desert Puma: Evolutionary Ecology and

Conservation of an Enduring Carnivore, Island Press, Washington D.C., USA.

Loxterman JL. (2011). Fine scale population genetic structure of pumas in the

Intermountain West. Conservation Genetics 12(4): 1049-1059.

Lynch M & Ritland K. (1999). Estimation of pairwise relatedness with molecular

markers. Genetics 152(4): 1753-1766.

Mace RD, Waller JS, Manley TL, et al. (1996). Relationships among grizzly bears, roads

and habitat in the Swan Mountains, Montana. Journal of Applied Ecology 33(6):

1395-1404.

Manel S, Berthier P & Luikart G. (2000). Detecting wildlife poaching: identifying the

origin of individuals with Bayesian assignment tests and multilocus genotypes.

Conservation Biology 16(3): 650-659. 123

Martin M. (2011). Cutadapt removes adapter sequences from high-throughput sequencing

reads. EMBnet.journal 17(1): 10-12.

Martínez-Abraín A. (2008). Statistical significance and biological relevance: a call for a

more cautious interpretation of results in ecology. Acta Oecologica 34(1): 9-11.

McRae BH, Beier P, Dewald LE, et al. (2005). Habitat barriers limit gene flow and

illuminate historical events in a wide-ranging carnivore, the American puma.

Molecular Ecology 14(7): 1965-1977.

Menotti-Raymond M, David VA, Lyons LA, et al. (1999). A genetic linkage map of

microsatellites in the domestic cat (Felis catus). Genomics 57(1): 9-23.

Miotto RA, Cervini M, Begotti RA, et al. (2012). Monitoring a puma (Puma concolor)

population in a fragmented landscape in southeast Brazil. Biotropica 44(1): 98-

104.

Miotto RA, Cervini M, Figueiredo MG, et al. (2011). Genetic diversity and population

structure of pumas (Puma concolor) in southeastern Brazil: implications for

conservation in a human-dominated landscape. Conservation Genetics 12(6):

1447-1455.

Miotto RA, Rodrigues FP, Ciocheti G, et al. (2007). Determination of the minimum

population size of pumas (Puma concolor) through fecal DNA analysis in two

protected cerrado areas in the Brazilian Southeast. Biotropica 39(5): 647-654.

Morin PA, Luikart G, Wayne RK, et al. (2004). SNPs in ecology, evolution and

conservation. Trends in Ecology & Evolution 19(4): 208-216. 124

Morin PA, Martien KK & Taylor BL. (2009). Assessing statistical power of SNPs for

population structure and conservation studies. Molecular Ecology Resources 9(1):

66-73.

Morin PA & McCarthy M. (2007). Highly accurate SNP genotyping from historical and

low-quality samples. Molecular Ecology Notes 7(6): 937-946.

Naidu A. (2009). Genetics analysis of mountain lion (Puma concolor) feces from KOFA

National Wildlife Refuge, Arizona. Masters Thesis, University of Arizona.

Naidu A, Smythe LA, Thompson RW, et al. (2011). Genetic analysis of scats reveals

minimum number and sex of recently documented mountain lions. Journal of

Fish and Wildlife Management 2(1): 106-111.

Nielsen R. (2004). Population genetic analysis of ascertained SNP data. Human

Genomics 1: 218-224.

Ogden R. (2011). Unlocking the potential of genomic technologies for wildlife forensics.

Molecular Ecology Resources 11: 109-116.

Ogden R, Dawnay N & McEwing R. (2009). Wildlife DNA forensics—bridging the gap

between conservation genetics and law enforcement. Endangered Species

Research 9: 179-195.

Peakall R & Smouse PE. (2012). GenAlEx 6.5: genetic analysis in Excel. Population

genetic software for teaching and research - an update. Bioinformatics 28(19):

2537-2539.

Piggott MP & Taylor AC. (2003). Remote collection of animal DNA and its applications

in conservation management and understanding the population biology of rare and

cryptic species. Wildlife Research 30(1): 1-13. 125

Pritchard JK, Stephens M & Donnelly P. (2000). Inference of population structure using

multilocus genotype data. Genetics 155(2): 945-959.

Riley SPD, Pollinger JP, Sauvajot RM, et al. (2006). A southern California freeway is a

physical and social barrier to gene flow in carnivores. Molecular Ecology 15(7):

1733-1741.

Rodzen JA, Banks JD, Meredith EP, et al. (2007). Characterization of 37 microsatellite

loci in mountain lions (Puma concolor) for use in forensic and population

applications. Conservation Genetics 8(5): 1239-1241.

Rosenberg NA. (2004). Distruct: a program for the graphical display of population

structure. Molecular Ecology Notes 4(1): 137-138.

Rousset F. (2008). Genepop’007: a complete re-implementation of the genepop software

for Windows and Linux. Molecular Ecology Resources 8(1): 103-106.

Rozen S & Skaletsky HJ. (2000). Primer3 on the WWW for general users and for

biologist programmers. In Bioinformatics Methods and Protocols: Methods in

Molecular Biology (Krawetz, S. & Misener, S., eds.), pp. 365-386. Humana Press,

Totowa, New Jersey, USA.

Sabeti PC, Varilly P, Fry B, et al. (2007). Genome-wide detection and characterization of

positive selection in human populations. Nature 449(7164): 913-U912.

Schaefer RJ, Torres SG & Bleich VC. (2000). Survivorship and cause-specific mortality

in sympatric populations of mountain sheep and mule deer. California Fish and

Game 86(2): 127-135.

Schlotterer C. (2004). The evolution of molecular markers - just a matter of fashion?

Nature Reviews Genetics 5(1): 63-69. 126

Schmitz OJ, Hamback PA & Beckerman AP. (2000). Trophic cascades in terrestrial

systems: a review of the effects of carnivore removals on plants. American

Naturalist 155(2): 141-153.

Seddon JM, Parker HG, Ostrander EA, et al. (2005). SNPs in ecological and conservation

studies: a test in the Scandinavian wolf population. Molecular Ecology 14(2):

503-511.

Seidensticker J, Hornocker MG, Wiles WV, et al. (1973). Mountain lion social

organization in the Idaho Primitive Area. Wildlife Monographs 35: 1-60.

Sinclair EA, Swenson EL, Wolfe ML, et al. (2001). Gene flow estimates in Utah's

cougars imply management beyond Utah. Animal Conservation 4: 257-264.

Stephens M & Donnelly P. (2003). A comparison of Bayesian methods for haplotype

reconstruction from population genotype data. American Journal of Human

Genetics 73(5): 1162-1169.

Stephens M, Smith NJ & Donnelly P. (2001). A new statistical method for haplotype

reconstruction from population data. American Journal of Human Genetics 68(4):

978-989.

Sunquist ME & Sunquist F. (2002). The essence of cats. In Wild cats of the world

(Sunquist, M. E. & Sunquist, F., eds.), pp. 11-13. University of Chicago Press,

Chicago, Illinois, USA.

Taberlet P, Camarra JJ, Griffin S, et al. (1997). Noninvasive genetic tracking of the

endangered Pyrenean brown bear population. Molecular Ecology 6(9): 869-876.

Taberlet P, Waits LP & Luikart G. (1999). Noninvasive genetic sampling: look before

you leap. Trends in Ecology & Evolution 14(8): 323-327. 127

Torres SG, Mansfield TM, Foley JE, et al. (1996). Mountain lion and human activity in

California: testing speculations. Wildlife Society Bulletin 24(3): 451-460.

Turner JW, Wolfe ML & Kirkpatrick JF. (1992). Seasonal mountain lion predation on a

feral horse population. Canadian Journal of Zoology 70(5): 929-934.

Vignal A, Milan D, SanCristobal M, et al. (2002). A review on SNP and other types of

molecular markers and their use in animal genetics. Genetics Selection Evolution

34(3): 275-305.

Waits LP, Luikart G & Taberlet P. (2001). Estimating the probability of identity among

genotypes in natural populations: cautions and guidelines. Molecular Ecology

10(1): 249-256.

Waits LP & Paetkau D. (2005). Noninvasive genetic sampling tools for wildlife

biologists: a review of applications and recommendations for accurate data

collection. Journal of Wildlife Management 69(4): 1419-1433.

Wakeley J, Nielsen R, Liu-Cordero SN, et al. (2001). The discovery of single-nucleotide

polymorphisms - and inferences about human demographic history. American

Journal of Human Genetics 69(6): 1332-1347.

Walker CW, Harveson LA, Pittman MT, et al. (2000). Microsatellite variation in two

populations of mountain lions (Puma concolor) in Texas. Southwestern Naturalist

45(2): 196-203.

Wang J. (2011). Coancestry: a program for simulating, estimating and analysing

relatedness and inbreeding coefficients. Molecular Ecology Resources 11(1): 141-

145. 128

Weber W & Rabinowitz A. (1996). A global perspective on large carnivore conservation.

Conservation Biology 10(4): 1046-1054.

Wehausen JD. (1996). Effects of mountain lion predation on bighorn sheep in the Sierra

Nevada and Granite Mountains of California. Wildlife Society Bulletin 24(3): 471-

479.

Weir BS & Cockerham CC. (1984). Estimating F-statistics for the analysis of population-

structure. Evolution 38(6): 1358-1370.

129

Table B.1: General summary of pyrosequencing results, assembly, and SNP verification success.

Number of Reads 358,049 Number of Bases 108,014,736 Number of Contigs 2,073 N50 Contig Length 673 SNPs Identified 434 SNPs tested on Sequenom 129 SNPs Verified using Sequenom 30 SNPs on PumaPlex 27 130

Table B.2: Table of the 27 SNP loci reported in PumaPlex and their best annotations using BLAST (www.blast.ncbi.nlm.nih.gov)

% Locus ID Gene ID Database Reference Length Genome Similarity PCFAS* FAS ENSBTAG00000010785 74.5 145 Bos taurus PCZYX ZYX ENSFCAT00000029805 97.66 214 Felis catus PCITGB3A* ITGB3 ENSPPYG00000008948 80.3 356 Pongo abelii PCTKT TKT ENSFCAT00000031688 99.5 400 Felis catus PCTAL1* TAL1 XM_004413164.1 82 401 Odobenus rosmarus PCPRDX6 PRDX6 ENSFCAT00000032574 98.25 400 Felis catus PCFAM213A FAM213A ENSFCAT00000028388 97.24 398 Felis catus PCBACTIN BETA-ACTIN ENSFCAT00000006790 100 400 Felis catus PCPPBP* PPBP XM_004766276.1 82.5 401 Mustela putorius PCAP2M1 AP2M1 ENSFCAT00000007791 99.75 400 Felis catus PCPDLIM1 PDLIM1 ENSFCAT00000001420 98.5 400 Felis catus PCCOPE COPE ENSFCAT00000009760 100 183 Felis catus PCCCDC1098 CCDC109B ENSFCAT00000018939 99.75 400 Felis catus PCLAPTM5 LAPTM5 ENSFCAT00000028452 96.23 398 Felis catus PCFHL1 FHL1 ENSFCAT00000030378 99 400 Felis catus PCLOC797* LOC100685797 XR_134672.1 77.8 404 Canis lupus PCVIM VIM ENSFCAT00000013566 99.5 400 Felis catus PCSDCBP SDCBP ENSFCAT00000031049 96.25 400 Felis catus PCMARCH2 MARCH2 ENSFCAT00000015221 99.25 400 Felis catus PCIFI16 IFI16 ENSFCAT00000031928 94.23 416 Felis catus PCPSMB8 PSMB8 ENSFCAT00000031681 99 400 Felis catus PCPTPRC PTPRC ENSFCAT00000015694 98.5 400 Felis catus PCITGB3B* ITGB3 ENSPPYG00000008948 80.3 356 Pongo abelii PCMYL12A MYL12A ENSFCAT00000031607 99 400 Felis catus PCARHGDIB ARHGDIB ENSFCAT00000031411 98.75 400 Felis catus PCMGAT1 MGAT1 ENSFCAT00000015348 99.3 400 Felis catus PCCNI CCNI ENSFCAT00000003637 94.9 412 Felis catus * Indicates no high-scoring Blast hits to the F. catus genome were found; therefore the best overall hit is shown. 131

Table B.3: Shown for each locus in P. c. couguar are the number of individuals with genotypes (n), minor allele frequency (MAF), observed (HO) and expected (HE) heterozygosities, FIS according to Weir and Cockerham (1984) and test for significant deviation from Hardy-Weinberg equilibrium (Prob).

** Locus N MAF HO HE FIS Prob PCFAS 498 0.445 0.476 0.494 0.0374 0.4067 PCZYX 498 0.060 0.116 0.113 -0.0276 1.000 PCITGB3A 376 0.449 0.527 0.495 -0.0627 0.2562 PCTKT 497 0.234 0.296 0.359 0.1769 0.0001* PCTAL1 498 0.436 0.418 0.492 0.1516 0.0007* PCPRDX6 498 0.245 0.341 0.370 0.0782 0.0924 PCFAM213A 498 0.174 0.295 0.287 -0.0273 0.6414 PCBACTIN 489 0.162 0.229 0.271 0.1556 0.0014* PCPPBP 496 0.276 0.383 0.400 0.043 0.369 PCAP2M1 498 0.307 0.390 0.426 0.0859 0.0579 PCPDLIM1 497 0.341 0.433 0.449 0.0385 0.417 PCCOPE 492 0.309 0.386 0.427 0.0966 0.0352 PCCCDC1098 494 0.305 0.419 0.424 0.012 0.8329 PCLAPTM5 497 0.202 0.280 0.323 0.1342 0.0052 PCFHL1 497 0.087 0.064 0.158 0.5933 0.0000* PCLOC797 498 0.239 0.349 0.364 0.0404 0.3912 PCVIM 363 0.398 0.394 0.479 0.1793 0.0006* PCSDCBP 498 0.217 0.317 0.340 0.067 0.1499 PCMARCH2 498 0.11 0.181 0.196 0.0812 0.1037 PCIFI16 498 0.473 0.456 0.499 0.0867 0.0595 PCPSMB8 498 0.425 0.412 0.489 0.1586 0.0004* PCPTPRC 434 0.094 0.138 0.171 0.1931 0.0005* PCITGB3B 496 0.408 0.554 0.483 -0.1465 0.0012* PCMYL12A 495 0.191 0.293 0.309 0.0528 0.2498 PCARHGDIB 498 0.151 0.245 0.256 0.0435 0.3780 PCMGAT1 497 0.375 0.457 0.469 0.0269 0.5625 Mean 484.462 0.274 0.340 0.367 0.074 SE 7.081 0.025 0.024 0.023 * Significant departure from HWE after Bonferroni correction (P < 0.0019) ** No loci were significantly different from zero after 3640 permutations 132

Table B.4: Summary of genetic variation statistics (standard deviations within parentheses) within each subspecies of puma: genotypes (n), minor allele frequency (MAF), observed (HO) and expected (HE) heterozygosities, FIS according to Weir and Cockerham (1984) and percent polymorphic loci (P).

Subspecies N MAF HO HE FIS P 0.274 0.340 0.367 P. c. couguar 498 0.074 100.00% (0.025) (0.024) (0.023) 0.159 0.21 0.222 P. c. cabrerae 5 0.168 69.23% (0.068) (0.038) (0.035) 0.144 0.212 0.188 P. c. capricornensis 2 0.214 46.15% (0.124) (0.057) (0.041) 0.159 0.260 0.220 P. c. concolor 4 -0.038 61.54% (0.078) (0.056) (0.038) 0.135 0.269 0.135 P. c. costaricensis 1 N/A* 26.92% (0.226) (0.089) (0.044) 0.138 0.177 0.183 P. c. puma 5 0.151 50.00% (0.076) (0.041) (0.04) * Could not be calculated due to a sample size of 1 133

Table B.5: General indices of genetic variation averaged over 15 microsatellite (MS) and 23 SNP (SNP) loci for three putative populations of pumas from Arizona. N = number of individuals, NA = mean number of alleles, HO = observed heterozygosity, HE = expected heterozygosity, FIS = inbreeding coefficient.

Marker Population N NA HO HE FIS* Type N 43 4.857 0.611 0.629 0.041 MS SW 65 6.429 0.674 0.653 -0.024 SE 123 5.357 0.654 0.663 0.017 N 43 2.000 0.342 0.354 0.082 SNP SW 65 2.000 0.353 0.345 -0.014 SE 123 2.000 0.359 0.366 0.023

* FIS values did not differ significantly from zero (P > 0.05) 134

Table B.6: Pairwise FST measures calculated using the microsatellite (above diagonal) and SNP (below diagonal) datasets. All FST values were significantly greater than zero (P < 0.017).

N SE SW N - 0.061 0.068 SE 0.056 - 0.030 SW 0.042 0.020 -

135

Figure B.1: Map of puma samples used in this study. The number within each colored region indicates the number of samples obtained. The colors represent a region with at least one puma sample of subspecies P. c couguar (yellow), P. c. puma (red), P c. costaricensis (green), P. c. capricornensis (orange), P. c. cabrerae/P. c. puma (light blue), and P. c. concolor (purple). One puma of unknown origin is not shown.

 

           



 



 

136

Figure B.2: Distribution of the top BLASTN hit for 1,000 randomly chosen pyrosequence reads. Only hits with an e-value less than 1e-30 were retained.

   

   

137

Figure B.3: Box and whiskers plot of A) the FREEBAYES quality score metric for each SNP and B) the count of each allele. Monomorphic loci are represented in gray, polymorphic loci in blue, and failed loci in red. 

%  

%

% %   

% % % % 

         "      $ 

   !  #   $

138

Figure B.4: Mean FIS (black line) for varying levels of heterozygosity predicted after 100,000 simulations following the method of Girard (2011). The upper and lower 95% confidence intervals are shown as dashed lines. Each black circle represents a locus and the names of two loci outside the 95% boundaries are given. Loci falling above the upper boundary (95% confidence interval) have an excess of homozygosity, suggesting a potential for a null alleles, whereas those loci below the boundary display an excess of heterozygosity compared with the predicted distribution.

139

Figure B.5: Principle coordinates analysis for the first two axis of variation of the PumaPlex data. The six different subspecies of puma are shown according to the same colors given in Figure B.1. UNK indicates the single, captive pet puma of unknown geographic origin.

!"#$%$"%& !"'$(!)$*%*+(+& !)*!),)$& !)+-"$(!%*+(+& !""#$%&'& !)./."$& 0.1"& 234&

!""#$%&(&

140

Figure B.6: Results of the STRUCTURE analysis for the entire puma dataset. A) Plot of the ΔK for each value of K, B) Ancestry plot for each puma for the optimal K = 3 solution. Each color represents the proportion of ancestry in the three clusters. Individuals to the left of the vertical black line are from Central (CA) and South American (SA) subspecies (ssp). The remaining individuals are all P. c. couguar. C) Map of the sampling location of each puma, colored according to the cluster that contains a majority of the individual’s ancestry, including a more detailed view of the southwestern United States (D) where a majority of the samples were collected.

141

Figure B.7: Map of putative population assignments for hunter-harvested pumas based upon location of registration of the carcass. N = dark gray, SW = light gray, SE = white. Dark black lines indicate major interstate highways; light black lines represent smaller highways. Map adapted from http://en.wikipedia.org/wiki/File:USA_Arizona_location_map.svg under the Creative Commons Attribution-Share Alike 3.0 Unported license.



 

142

Figure B.8: Principle coordinates analysis of both the A) microsatellite and B) SNP datasets for the first two principle coordinates (component 1 and component 2). Putative puma populations are represented as N = black circles, SE = red circles, SW = blue circles.

     

     

      

     



143

Figure B.9: Results from the Bayesian clustering software STRUCTURE for A) microsatellites and B) SNPs. Each individual is represented as a vertical bar with colors corresponding the individual’s proportion of ancestry in the given number of clusters (K).







 









 144

Figure B.10: Probability of identity (PI) and probability of identity among siblings (PISIB) calculated for increasing combinations of loci for both the SNP and microsatellite (MS) datasets.

6.0E-01

5.0E-01

4.0E-01

3.0E-01 PI

2.0E-01

1.0E-01

0.0E+00 1 3 5 7 9 11 13 15 17 19 21 23 Number of Loci

PI-SNP PISIB-SNP PI-MS PISIB-MS

145

Figure B.11: Mean relatedness within each population calculated according to the method of Lynch and Ritland (1999) for both the microsatellite (blue) and SNP (red) datasets. Significant differences between datasets within a population are indicated with an asterisk (P < 0.01).





  

  



    

146

Figure B.12: Genotyping success (call rate) for three microsatellite loci (blue) compared with 23 SNP loci (red) in 10 puma fecal samples. Microsatellite analyses described in Naidu et al. (2011).

1

0.9

0.8

0.7

0.6

0.5

Call Rate 0.4

0.3

0.2

0.1

0

SNP Microsatellite

147

APPENDIX C

GENOME-WIDE ANALYSIS OF SNPS REVEALS PEDIGREE ERRORS AND A LACK OF DOMESTIC DOG ANCESTRY IN THE ENDANGERED MEXICAN WOLF

FITAK, ROBERT R.

Genetics Graduate Interdisciplinary Program, The University of Arizona, Tucson, AZ

85721, USA

COAUTHORS: RINKEVICH, S. and CULVER, M.

This paper was prepared to submit to a journal.

Corresponding Address:

Robert R. Fitak

317 Biosciences East

1311 E 4th St

Tucson, AZ, 85721, USA

Tel: 520-626-1636

Fax: 520-621-8801

Email: [email protected]

Keywords: single nucleotide polymorphisms, gray wolves, conservation genomics, Arizona, New

Mexico, hybridization

148

ABSTRACT

The Mexican wolf (Canis lupus baileyi) was extirpated by 1980 and existed in three separate captive lineages. These lineages, McBride, Ghost Ranch, and Aragón are descended from three, two, and two founders, respectively. Over time, however, the small number of founders from each population resulted in a high level of inbreeding.

Inbreeding causes individuals to express slightly deleterious traits, thereby reducing the fitness of the population. To mitigate this loss, the three captive wolf lineages were merged in 1995. The merger led to an increase in measured fitness, known as genetic rescue, in cross-lineage wolves. In 1998, the first reintroduction of wolves began.

Currently, the free-ranging wolf population numbers approximately 75 individuals.

Studies using microsatellites have estimated levels of relatedness and genetic variation among wolves. Despite these studies, numerous concerns remain, including: the accuracy of the pedigree and the purity of founders (the proportion of Mexican wolf ancestry derived from domestic dogs). We examined these concerns using genomic technologies developed for the domestic dog. We genotyped 80 Mexican wolves for more than 172,000 genetic markers called SNPs (single nucleotide polymorphisms). We identified several potential pedigree errors using the genotypes from family trios.

Additionally, comparison with 446 domestic dogs revealed a lack of biologically significant ancestry from domestic dogs. Overall, our results can be used to improve the captive production and reintroduction program for the Mexican wolf.

149

INTRODUCTION

The Mexican gray wolf (Canis lupus baileyi; Nelson and Goldman 1929;

Goldman 1937) was historically distributed throughout the southwestern United States and northern Mexico, although its exact distribution and taxonomic status has been extensively debated (Young and Goldman 1944; Hall and Kelson 1959; Bogan and

Mehlhop 1983; Hoffmeister 1986; Nowak 1995; Parsons 1996). The Mexican wolf is considered the smallest (Young and Goldman 1944; Brown 1983) and one of the most genetically divergent (Wayne et al. 1992; García-Moreno et al. 1996; Vilá et al. 1999; vonHoldt et al. 2011; Chambers et al. 2012) of North American gray wolves.

Mitochondrial DNA analysis showed that the Mexican wolf haplotype was more similar to a Eurasian gray wolf haplotype than North American gray wolf haplotypes, suggesting that Mexican wolves represent one of the earliest waves of migration of Canis lupus into the New World (Vilá et al. 1999; Wayne and Vilá 2003). Nonetheless, Leonard et al.

(2005) and Hailer and Leonard (2008) analyzed mitochondrial DNA from historical gray wolf specimens and found an ubiquitous North American gray wolf haplotype in two

Mexican wolves, and a southern haplotype as far north as Nebraska, suggesting gene flow was common in the region or the absorption of southern haplotypes occurred into northern, invading populations. The authors also reported a Mexican wolf haplotype that differed by one and two base pairs from a Mexican and Texas coyote (C. latrans), respectively, and may have represented interspecific hybridization.

Since the settlement of the southwestern United States, Mexican wolves experienced a rapid decline in numbers as a result of extensive predator removal campaigns. In 1976, the Mexican wolf was first protected as an endangered subspecies 150

(USFWS 1976). Despite this protection, by 1980 the Mexican wolf was considered extirpated from the U.S. and fewer than 50 wild individuals likely remained in Mexico

(McBride 1980; Brown 1983). To mitigate this decline in the Mexican wolf, the U.S.

Fish and Wildlife Service established a captive population during the 1970’s called the

McBride, or Certified, population. A total of six individuals (five males and a female) were captured in Mexico, but only three contributed to subsequent generations and are effectively considered founders (Hedrick et al. 1997; Siminski 2011). Two additional captive lineages had also been established by this time: one in the U.S. called Ghost

Ranch, and a second one at the San Juan de Aragón Zoo in Mexico called the Aragón population. Each of these two lineages was descended from two founders (Hedrick et al.

1997). Eventually in 1998, 11 Mexican wolves were reintroduced into a recovery zone in

Arizona and New Mexico (USFWS 2010). As of 2012, there are 75 free-ranging wolves in Arizona and New Mexico (USFWS 2012).

Because of population bottlenecks and small population sizes often encountered in captivity, intensive genetic management is necessary (Lacy 1994; Laikre 1999; Leberg and Firmin 2008). The most common practice is to minimize the mean kinship of individuals in the population. This practice not only minimizes the loss of genetic variation over time, but also minimizes inbreeding and is effective in conserving allelic diversity (Ballou and Lacy 1995; Fernández et al. 2004; Lacy 2009). This strategy is most effective when the complete pedigree is known and the founders are unrelated

(Frankham et al. 2009). This is often not the case as pedigrees may contain errors, unknown individuals, and founders with unknown relationships to each other. The

Mexican wolf pedigree, for instance, assumes the founders are unrelated, but this is 151 unlikely considering three of the seven founders were some of the last remaining

Mexican wolves and are likely related to some degree (USFWS 2010). Therefore, molecular markers, especially large numbers of single nucleotide polymorphisms (SNPs), are valuable for pedigree reconstruction (Santure et al. 2010; Hauser et al. 2011).

Through the identification of pedigree errors, managers may be able to further curtail the genetic deterioration in current captive populations.

The potential for introgression from other canids (i.e coyotes and domestic dogs) has been a concern for wolf management programs (USFWS 2007). For the Mexican wolf, the purity of the captive populations, especially Aragón and Ghost Ranch, has been questioned (Hedrick et al. 1997). Carley (1979) suspected the male founder of Ghost

Ranch to not be a pure Mexican wolf but rather a wolf admixed with domestic dog, and further elaborated on the poorly documented management of the population. In Aragón, two female wolves were bred with a wolf-dog hybrid, but the resulting offspring were supposedly eliminated from the population (García-Moreno et al. 1996; Hedrick et al.

1997). Furthermore, the reduced density of Mexican wolves prior to extirpation may have facilitated hybridization (Rhymer and Simberloff 1996). In the wild, Mexican wolves have been known to hybridize with coyotes (Leonard et al. 2005) and domestic dogs (USFWS 2010). The former occurred prior to any conservation management, is considered a very rare occurrence, and was not found in captive Mexican wolves (Hailer and Leonard 2008). The latter occurred more recently in the introduced population with the resulting pups removed and euthanized. Using skull morphometrics, there was no detectable hybridization between Mexican wolves and other species of canids in the

McBride (see Hedrick et al. 1997), Ghost Ranch (Bogan and Mehlhop 1983), and Aragón 152

(Weber 1989; Lopez and Vasquez 1991) lineages. Hedrick et al. (1997) do provide a personal communication from R. M. Nowak suggesting other morphological characteristics consistent with introgression from domestic dogs, however it was not possible to exclude effects from rearing and maintenance in captivity. Genetically, examination of the captive populations had demonstrated a lack of introgression from other canids using both mitochondrial DNA (Wayne et al. 1992) and microsatellite markers (García-Moreno et al. 1996), although the possibility of a small amount of non- wolf ancestry cannot be entirely ruled out (Hedrick et al. 1997).

In this study we performed a genome-wide screen of Mexican wolves from each of the three captive populations and captive and wild wolves of mixed ancestry. We utilized a high-density SNP genotyping array developed for domestic dogs to assay

>172,000 SNPs in the Mexican wolf genome. We used the results to determine the potential for pedigree errors, and included a comparison with 446 domestic dogs to evaluate the potential for admixture between these two canids. This work is the largest genetic study of Mexican wolves to date and provides extensive evidence for a lack of any biologically significant ancestry from domestic dogs. These results, along with the identification of several pedigree errors, have important implications for the conservation and management of this endangered wolf.

MATERIALS AND METHODS

CanineHD BeadChip

In this study we used the CanineHD BeadChip (Illumina Inc., San Diego, CA,

USA). The CanineHD BeadChip was developed by Vaysse et al. (2011) and included 153 more than 172,000 evenly spaced SNPs (mean spacing = 13 kb). These SNPs are a combination of SNPs from the existing domestic dog genome project and an additional

4,353 SNPs ascertained from resequencing gaps in a panel of four domestic dog breeds

(Irish Wolfhounds, West Highland White Terrier, Belgian Shepherds and Shar-Pei) and a pool of wolves (Vaysee et al. 2011). The authors evaluated the BeadChip using 450 samples of domestic dogs from 26 breeds, and also demonstrated its performance in 15 wolves (species not reported). The average minor allele frequency (MAF) and call rate across all loci and samples are 0.23, and 0.998, respectively (www.illumina.com). The call rate is the proportion of useable SNPs based upon whether or not an individual’s genotype can be found within the three existing genotype clusters.

Sample collection and genotyping

We collected a total of 95 Mexican wolf whole blood and tissue samples from the

Museum of Southwestern Biology’s Division of Mammals and the Division of Genomic

Resources (www.msb.unm.edu) or from the Mexican Wolf Recovery Team as part of the captive breeding and monitoring program. Samples included individuals with pure ancestry in each of the three original captive lineages (McBride = MB, Aragón = AR,

Ghost Ranch = GR) and individuals of mixed ancestry (cross-lineage = CL) according to the Mexican wolf pedigree (Siminski 2011). The sample collection included 14 family trios (sire, dam, and offspring) to assess Mendelian inconsistencies in the markers.

Additionally, we collected whole blood from a domestic dog of unknown breed ancestry

(Hershey) to serve as a genotyping control. We extracted DNA from each sample using the DNeasy Blood and Tissue Kit (Qiagen Inc., Germantown, MD, USA) according to 154 the manufacturer’s specifications. Purified DNA was concentrated using ethanol precipitation and quantified using the Quant-iT PicoGreen dsDNA Assay Kit (Invitrogen

Corp., Calsbad, CA, USA). In cases where DNA concentrations were less than that required for downstream genotyping (~30 ng/uL), multiple DNA extractions were performed for the sample and subsequently combined to achieve the appropriate concentrations. All samples were genotyped for ~172,000 SNPs using the CanineHD

BeadChip at the Broad Institute (www.broadinstitute.org; Cambridge, MA, USA). We replicated a total of eight samples within and between different BeadChips to assess the reproducibility of the results.

Quality control and pedigree errors

All signal intensities were imported into the GenomeStudio Software (Illumina) and genotypes called according to cluster profiles for each SNP using the canineHD.egt file from Illumina. Genotypes with a GenTrain score less than 0.35 were removed from subsequent analyses. To ensure reliability for each sample we calculated the call rate and removed individuals with a call rate less than 0.96 (approximately less than one standard deviation from the mean call rate across all samples). After removing low quality samples we assessed reproducibility of SNP genotypes across replicated samples and removed the replicate with the lower GC10 score. We searched for inconsistencies in

Mendelian inheritance of SNPs among the family trios in GenomeStudio. Using the software PLINK v1.07 (Purcell et al. 2007) we first removed all SNPs from the X- and

Y-chromosomes to avoid biases between sexes. We looked for patterns in missing genotypes among individuals by clustering individuals according to their “identity by 155 missingness” (IBM) using the “--cluster-missing” command in PLINK. Strong patterns in IBM were visualized using multidimensional scaling. The IBM analyses were repeated after the data cleaning process as well, following the process described below.

We cleaned the genotype data by first removing the domestic dog sample to not bias allele frequency data. Next, we calculated allele frequencies and Hardy-Weinberg equilibrium (HWE) using an exact test by Wigginton et al. (2005) in PLINK. We constructed a cleaned dataset of 68,147 SNPs (68K dataset) after filtering the genotype data to only include SNPs with a genotype frequency ≥90%, MAF ≥1%, and SNPs not significantly deviating from HWE (P > 0.001). We generated a set of 6,794 unlinked

SNPs (7K dataset) after pruning the genotype data in PLINK using a conservative approach employing a sliding window of 50 SNPs, pruning one SNP from each pair with an r2 value ≥0.5, then moving the window five SNPs and repeating (PLINK command: -- indep-pairwise 50 5 0.5). We calculated genetic distance between pairs of individuals using the 7K dataset and an identity by state method in PLINK and summarized the results using multidimensional scaling.

Population admixture

We combined the Mexican wolf dataset with results from 446 domestic dogs from

30 breeds genotyped also on the CanineHD BeadChip (Lequarré et al. 2011; Vaysee et al.

2011; www.eurolupa.org) to investigate the potential for recent admixture. The genotypes were combined and filtered for autosomal SNPs, missingness, MAF, and LD as described above to produce a final dataset containing 106,373 SNPs (106K dataset).

We repeated each of the following analyses using 106K dataset with all samples (446 156 domestic dogs + Hershey + 80 Mexican wolves; n = 527) in addition to a reduced dataset that contained a dog sample chosen randomly from each of 30 breeds, the domestic dog sample, Hershey, and the 80 Mexican wolf individuals (n = 111) to control for the very large differences in sample sizes of analysis with n = 527 (similar to that performed by vonHoldt et al. 2011).

We first assessed the potential for admixture with domestic dogs using multidimensional scaling in the software PLINK. Next, we used a Bayesian clustering algorithm implemented in the program ADMIXTURE (Alexander et al. 2009) to determine the most likely number of genetic clusters (K). We tested for a K between 1 –

10, assumed no a priori hypotheses regarding population structure (unsupervised), and employed a cross-validation procedure (--cv=10) to determine the most likely number of clusters, K. The most likely K is the value when the standard error of the cross-validation calculations is smallest. We also repeated the ADMIXTURE analysis at just K = 2 using the supervised option (--supervised). This analysis used the prior information that all domestic dogs (with the exception of Hershey which we used to control for detecting

100% domestic dog ancestry) had an assumed ancestry of 100% in one cluster together, and that all pure MB Mexican wolves had 100% assumed ancestry in the second cluster.

Therefore, the program used this prior information to estimate the ancestry from these two populations in all non-MB Mexican wolves. We used MB to represent pure Mexican wolves, since it is the best documented of the Mexican wolf lineages. This particular analysis, however, will not likely reveal admixture in other Mexican wolves if the MB lineage itself was admixed. 157

Next, we explored the potential for two software programs, INTROGRESS

(Gompert and Buerkle 2010) and PCADMIX (Brisbin 2010), to assess potential admixture with domestic dogs using the 106K dataset. Both programs were used to estimate the hybrid index, h, defined as the proportion of ancestry in an individual derived from the second parental population. We used the genotypes from the 31 domestic dogs (one from each breed and Hershey) described above as the first parental population and the 31 pure MB wolves as the second parental population.

INTROGRESS used a maximum likelihood estimation procedure whereas PCADMIX used a principle components analysis of phased haplotypes. We inferred phased haplotypes using the software BEAGLE v3.3 (Browning and Browning 2007). We used otherwise default parameters with the exception of increasing the “-nsamples” parameter to 20 to improve accuracy. We compared the results from both INTROGRESS and

PCADMIX to the predicted ancestry in MB using a Pearson’s correlation test available in the statistical software package R (cor.test; www.R-project.org) since we did not have a parental wolf population to improve the accuracy of the analyses.

RESULTS

Quality control and pedigree errors

We genotyped a total of 96 samples (95 Mexican wolves and 1 domestic dog) on the canineHD array for 173,662 SNPs. We refer to all wolves hereafter by their official studbook number (Siminski 2011). The average call rate across all samples was 0.983.

The lowest call rate was 0.863 and the highest 0.996. We removed nine individuals that had a call rate below 0.96 (Figure C.1). From the remaining 87 individuals six replicated 158 samples remained, with a mean number of errors across replicates of 8.17 for an overall reproducibility of 0.99998 (Table C.1). We calculated Mendelian inconsistencies among

13 family trios (one trio failed quality control) according to the pedigree. We found a very low frequency of Mendelian errors, except in three cases (offspring-sire-dam: 820-

520-547, 886-520-547, and 1008-732-797; Table C.2). The former two instances were trios consisting of the same two parents at the same institution, whereas the latter instance was from an individual born free in the recovery area. When we calculated inconsistencies between the offspring and each parent independently for these three trios, we found in the former two a majority of errors can be attributed to the dam, 547 (1,460 and 1,386 errors, respectively) compared with the sire, 520 (44 and 42 errors, respectively). In the latter trio, comparisons between both sire and dam resulted in a similar number of errors in each case (2,650 and 2,588 errors, respectively).

We found no significant patterns of IBM among samples both before and after cleaning the genotype data (data not shown). After removing SNPs on the X- and Y- chromosomes (5,532 SNPs), that deviated from HWE (P < 0.001, 995 SNPs), with >10% missing genotypes (3,161 SNPs), and with a MAF > 0.01 (95,848 SNPs) we were left with a dataset containing 68,147 SNPs (68K). After pruning for significant LD, a dataset containing 6,794 unlinked SNPs (7K) remained. Multidimensional scaling of the pairwise genetic distances revealed distinct clustering according to an individual wolf’s ancestry in each of the three captive original populations (Figure C.2). The first dimension separated MB from the other lineages and accounted for ~35% of variation in the dataset, whereas the second dimension separated AR from the other two lineages and accounted for ~30% of total variation. CL wolves appeared to be placed at locations 159 intermediate between the three lineage clusters as predicted by the pedigree. There were two exceptions, wolves 858 and 547 (Figure C.2). These two individuals are both predicted to have pure MB ancestry from the pedigree, yet the multidimensional scaling suggested that individual 858 had some ancestry from AR and potentially a small amount from GR, whereas individual 547 was likely a first generation hybrid between MB and

GR. Individual 547 was also the dam that accounted for a majority of Mendelian inheritance errors in two family trios.

Population admixture

We used several methods to examine the potential for admixture between

Mexican wolves and domestic dogs. In each method, we compared the Mexican wolves with a dataset containing 446 domestic dogs from 30 breeds and one domestic dog of unknown ancestry (Hershey). We also made comparisons with a reduced set of 31 domestic dogs containing one randomly chosen individual from each of 30 breeds and

Hershey. In both comparisons, the results were similar so we only discuss the results from the reduced dataset for convenience (when constructing figures).

Multidimensional scaling showed little indication of shared ancestry (Figure C.3).

The first dimension (~30% of variation) was able to strongly differentiate between different Mexican wolves, whereas the second dimension (~25% variation) separated domestic dogs from Mexican wolves. The nearest dog individual to the Mexican wolf was a Greenland sled dog, a type of husky considered to be an ‘ancient’ breed (vonHoldt et al. 2010), thus more likely indicative of ancient shared ancestry between wolves and dogs and not reflective of recent admixture. 160

We used a Bayesian clustering method in ADMIXTURE to assess the probability of K different subpopulations within the individuals examined. The lowest cross- validation error between different values of K was observed at K = 6, although another local minimum was observed at K = 3 (Figure C.4A). At K = 3, Mexican wolves and domestic dogs appeared independent, with only a small amount of Mexican wolf ancestry found in dogs (Figure C.4B). Within Mexican wolves, MB and GR formed separate clusters, with both AR and CL wolves being admixed between these two clusters. At K =

6, both domestic dogs and Mexican wolves again lacked any evidence of admixture

(Figure C.4B). Within Mexican wolves, AR and GR formed distinct clusters, and the remaining individuals were distributed among the remaining three potential clusters. At

K = 2, both the unsupervised and supervised analyses yielded very similar results (Figure

C.5). Both place all Mexican wolves into a separate cluster from that of domestic dogs.

The two AR individuals, however, were the only two individuals that showed a very small fraction of ancestry derived from domestic dogs (2.4% and 3.4% respectively). No other evidence of domestic dog ancestry in Mexican wolves was observed. In domestic dogs, on the other hand, the unsupervised analysis suggests a small amount of wolf ancestry in several individuals (Figure C.5A).

We estimated the hybrid index, h, using both a maximum likelihood and principle components method and both produced similar results. The maximum likelihood method calculated a mean h of 0.882 ± 0.014 with a minimum of 0.662 in an AR wolf and a high of 0.985 in a CL wolf. The mean of the principle components method was 0.859 ± 0.009 for the proportion of MB-derived haplotypes, with range between 0.714 and 0.933 for the same individuals. However, tests for correlation between the two estimates of h and the 161 pedigree predicted ancestry in MB was high and significant in both cases (0.991 and

0.985; p < 2.2e-16, for the maximum likelihood and principle components methods, respectively).

DISCUSSION

We were able to generate high-quality SNP genotypes for 80 Mexican wolves using the CanineHD BeadChip. As expected, a large number of SNPs (> 50%) were monomorphic across all samples; likely the result of the small number of founders, extensive inbreeding, and possibility of historical small population size. Mexican wolves have previously been shown to have the lowest observed heterozygosity (0.12) among the gray wolves and coyotes sampled by vonHoldt et al. (2011). The data were also highly reproducible (> 99.99%), consistent with the results of Hong et al. (2012), who reported a genotype concordance of 99.87% for the Illumina human 1M Duo Chip across technical replicates within the same laboratory. Therefore, this data is useful for genome-wide association studies in Mexican wolves where replication of the associations is necessary for confidence in the results.

Pedigree errors

We also identified three trios where one or both parents may be incorrectly identified. In two cases, the dam 547 produced a substantial number of Mendelian inheritance inconsistencies in two offspring from different litters separated by one year.

Additionally, the pedigree predicted that wolf 547 has 100% ancestry from MB, whereas multidimensional scaling of the genetic data suggested that wolf 547 most likely derived 162

50% ancestry from MB and 50% from GR. This is quite possible, considering the particular facility where wolf 547 was born had several pure MB and GR individuals present at that time. However, we cannot exclude the possibility that an error had occurred in the proper identification of the tissue sample collected. Our results also suggested that extensive inbreeding within each of the captive lineages resulted in rapid and substantial subpopulation structure (Anderson and Dunham 2008). However it remains possible that this substructure may have existed in Mexican wolves prior to their extirpation. For instance, the founders of MB were from the states of Chihuahua and

Durango, Mexico, whereas the GR founders were from Tumacacori, Arizona, U.S.A and

Sonora, Mexico. The founders of AR were collected from an unknown location

(Siminski 2011). As a result of this substructure, the use of multidimensional scaling or other population structure software (like ADMIXTURE) is valuable for estimating the contribution from different captive lineages in wild-born or otherwise unknown individuals. The same analysis can also identify potential pedigree errors like individuals

547 and 858 as reported in this study. It is important to minimize pedigree errors through proper documentation and the addition of genetic data because even a modest number of pedigree errors (> 15%) can reduce the benefit of commonly used captive breeding strategies in conserving genetic variation (Oliehoek and Bijma 2009). Although the exact rate of pedigree errors is yet to be assessed in Mexican wolves, our small sample size of

13 family trios indicated mistakes might be relatively common (23.1%). Future work examining additional Mexican wolves needs to be performed to substantiate the pedigree error rate. Once corrected, an accurate pedigree will improve the selection of individuals and mating pairs to release into the wild, and will be useful for investigating quantitative 163 traits, inbreeding depression, evolution of cooperation, and inbreeding avoidance in the wild Mexican wolf population (Pemberton 2008).

Population admixture

As reported elsewhere for other genetic markers and SNPs (Wayne et al. 1992;

García-Moreno et al. 1996; Vilá et al. 1999; vonHoldt et al. 2011), we found little evidence of admixture with domestic dogs. The multidimensional scaling of genetic distances and Bayesian clustering both demonstrate the integrity of the Mexican wolf genome has remained intact, despite the potential for hybridization. Results from

ADMIXTURE detect an extremely small level of shared ancestry (<3.4%) with domestic dogs in AR. This could have been a result of a single hybridization within the captive population, however, in the few number of generations between the founders and these two AR individuals one would expect a higher proportion of shared ancestry. It is more likely this admixture resulted from either a hybridization event in the wild population prior to founding AR or is simply ancient shared ancestry since the domestication of dogs was relatively recent. Either way, this small amount of dog ancestry in AR, and its absence in the current captive and wild population (including released individuals tested in this dataset), lack biological significance (Hedrick et al. 1997). Although the potential for ascertainment bias existed because the SNPs were developed for polymorphisms in dogs, methods such as multidimensional scaling of genetic distances are not susceptible to this bias (vonHoldt et al. 2011). Results for model-based approaches such as those from ADMIXTURE should be treated with caution because of this bias, although both methods supported the same conclusion. 164

We also attempted to use a maximum likelihood method and a method that identifies parental haplotypes and their distribution in a potentially admixed population to assess potential hybridization with domestic dogs. When using MB and domestic dogs as parental populations, we found a majority of ancestry derived from MB from both methods. These analyses do assume that MB wolves do not have domestic dog ancestry and represent all the variation found in non-MB wolves. Although these analyses at first appeared to indicate a larger proportion of ancestry from domestic dogs in non-MB wolves than ADMIXTURE or multidimensional scaling, further comparison with pedigree predictions of MB ancestry suggest otherwise. In other words, the analyses were only identifying MB ancestry in non-MB wolves, and assigning the remaining ancestry to domestic dogs. To circumvent this issue, we will need to repeat the analyses and substitute the MB parental population with a population of gray wolves representing as much of their genetic variation as possible. This method would improve the ability to assign Mexican wolf haplotypes to either parental population and predict introgression.

This result does indicate that in both AR and GR wolves, approximately 30% of their genome is not shared with MB. Since ADMIXTURE and multidimensional scaling suggest a lack of domestic dog ancestry, we can assume that GR and AR founders contained substantial genetic variation not found in MB. Unfortunately, in the current wild population (as depicted in the right hand side of Figure C.6), a majority of genetic contribution can be attributed to MB-(like) haplotypes.

Conclusions 165

In this study we reported the potential for dense SNP datasets to improve our ability to reconstruct and verify pedigrees, which in turn may help optimize the breeding strategy and maximize the retention of genetic diversity in the captive population

(Pertoldi et al. 2010). We also reported the lack of biologically significant introgression from domestic dogs in Mexican wolves. It remains, possible, however, that the SNP genotypes from the 30 breeds of dogs used in this study do not represent the variation found in another breed of dog that potentially hybridized with a Mexican wolf in the past.

With this data we cannot exclude the possibility for introgression from other canids, such as northern gray wolves or coyotes, which has been reported previously and of rare occurrence (Leonard et al. 2005) and is of less concern in the recovery program for the

Mexican wolf (USFWS 2010). Hopefully, through continued and expanded monitoring of Mexican wolves using a genome-wide set of SNPs, managers will be able to conserve the integrity of the Mexican wolf’s genome and sufficient genetic variation necessary to maintain a large population with future evolutionary potential.

166

REFERENCES

Alexander DH, Novembre J & Lange K. (2009). Fast model-based estimation of ancestry

in unrelated individuals. Genome Research 19(9): 1655-1664.

Anderson EC & Dunham KK. (2008). The influence of family groups on inferences made

with the program Structure. Molecular Ecology Resources 8(6): 1219-1229.

Ballou JD & Lacy RC. (1995). Identifying genetically important individuals for

management of genetic variation in pedigreed populations. In Population

management for survival and recovery: analytical methods and strategies in small

population conservation (Ballou, J. D., Gilpin, M. E. & Foose, T. J., eds.), pp. 76-

111. Columbia University Press, New York, New York, USA.

Bogan MA & Mehlhop P. (1983). Systematic relationship of gray wolves (Canis lupus)

in southwestern North America. Occasional Papers in the Museum of

Southwestern Biology 1: 1-21.

Brisbin A. (2010). Linkage analysis for categorical traits and ancestry assignment in

admixed individuals, Ph.D Dissertation; Cornell University.

Brown DE. (1983). The wolf in the Southwest: the making of an endangered species, The

University of Arizona Press, Tucson, Arizona, USA.

Browning SR & Browning BL. (2007). Rapid and accurate haplotype phasing and

missing data inference for whole-genome association studies by use of localized

haplotype clustering. American Journal of Human Genetics 81(5): 1084-1097.

Carley CJ. (1979). Narrative report on alleged Mexican gray wolf (Canis lupus baileyi)

lineages held in captivity. U.S Fish and Wildlife Service Southwest Region,

Albuquerque, New Mexico, USA. 167

Chambers SM, Fain SR, Fazio B, et al. (2012). An account of the taxonomy of North

American wolves from morphological and genetic analyses. North American

Fauna 77: 1-67.

Fernandez J, Toro MA & Caballero A. (2004). Managing individuals' contributions to

maximize the allelic diversity maintained in small, conserved populations.

Conservation Biology 18(5): 1358-1367.

Frankham R, Ballou JD & Briscoe DA. (2009). Introduction to conservation genetics,

Cambridge University Press, Cambridge, UK.

Garcia-Moreno J, Matocq MD, Roy MS, et al. (1996). Relationships and genetic purity of

the endangered Mexican wolf based on analysis of microsatellite loci.

Conservation Biology 10(2): 376-389.

Goldman EA. (1937). The Wolves of North America. Journal of Mammology 18: 37-45.

Gompert Z & Buerkle CA. (2010). INTROGRESS: a software package for mapping

components of isolation in hybrids. Molecular Ecology Resources 10(2): 378-384.

Hailer F & Leonard JA. (2008). Hybridization among three native North American canis

species in a region of natural sympatry. Plos One 3(10): e3333.

Hall ER & Kelson KR. (1959). The Mammals of North America, Vol. II, The Ronald

Press, New York, New York, USA.

Hauser L, Baird M, Hilborn RAY, et al. (2011). An empirical comparison of SNPs and

microsatellites for parentage and kinship assignment in a wild sockeye salmon

(Oncorhynchus nerka) population. Molecular Ecology Resources 11: 150-161.

Hedrick PW, Miller PS, Geffen E, et al. (1997). Genetic evaluation of the three captive

Mexican wolf lineages. Zoo Biology 16(1): 47-69. 168

Hoffmeister DF. (1986). Mammals of Arizona, University of Arizona Press, Tucson,

Arizona, USA.

Hong HX, Xu L, Liu J, et al. (2012). Technical reproducibility of genotyping SNP arrays

used in genome-wide association studies. Plos One 7(9): e44483.

Lacy RC. (1994). Managing genetic diversity in captive populations of animals. In

Restoration and recovery of endangered plants and animals (Bowles, M. L. &

Whelan, C. J., eds.), pp. 63-89. Cambridge University Press, Cambridge, UK.

Lacy RC. (2009). Stopping evolution. In Conservation genetics in the age of genomics

(Amato, G., DeSalle, R., Ryder, O. A. & Rosenbaum, H. C., eds.), pp. 58-81.

Columbia University Press, New York, New York, USA.

Laikre L. (1999). Hereditary defects and conservation genetic management of captive

populations. Zoo Biology 18(2): 81-99.

Leberg PL & Firmin BD. (2008). Role of inbreeding depression and purging in captive

breeding and restoration programmes. Molecular Ecology 17(1): 334-343.

Leonard JA, Vilá C & Wayne RK. (2005). Legacy lost: genetic variability and population

size of extirpated US grey wolves (Canis lupus). Molecular Ecology 14(1): 9-17.

Lequarré AS, Andersson L, Andre C, et al. (2011). LUPA: A European initiative taking

advantage of the canine genome architecture for unraveling complex disorders in

both human and dogs. Veterinary Journal 189(2): 155-159.

Lopez G & Vazquez CB. (1991). Linaje de lobos Mexicanos "San Juan de Aragón":

historia, evidencia de su autenticidad y posibilidad de certificacion. Unpublished

report for the Zoologico San Juan de Aragón. 169

McBride, RT. (1980). The Mexican wolf (Canis lupus baileyi): a historical review and

observations on it status and distribution. U.S. Fish and Wildlife Service

Southwest Region, Albuquerque, New Mexico, USA.

Nelson EW & Goldman EA. (1929). A new wolf from Mexico. Journal of Mammology

10: 165-166.

Nowak RM. (1995). Ecology and conservation of wolves in a changing world,

Proceedings of the Second North American Wolf Symposium, Canadian

Circumpolar Institute, University of Alberta, Edmonton, Alberta, Canada.

Oliehoek PA & Bijma P. (2009). Effects of pedigree errors on the efficiency of

conservation decisions. Genetics Selection Evolution 41(1): 9.

Parsons D. (1996). Case study: the Mexican wolf. New Mexico Journal of Science 34:

101-123.

Pemberton JM. (2008). Wild pedigrees: the way forward. Proceedings of the Royal

Society B-Biological Sciences 275(1635): 613-621.

Pertoldi C, Wójcik JM, Tokarska M, et al. (2010). Genome variability in European and

American bison detected using the BovineSNP50 BeadChip. Conservation

Genetics 11(2): 627-634.

Purcell S, Neale B, Todd-Brown K, et al. (2007). PLINK: A tool set for whole-genome

association and population-based linkage analyses. American Journal of Human

Genetics 81(3): 559-575.

Rhymer JM & Simberloff D. (1996). Extinction by hybridization and introgression.

Annual Review of Ecology and Systematics 27: 83-109. 170

Santure AW, Stapley J, Ball AD, et al. (2010). On the use of large marker panels to

estimate inbreeding and relatedness: empirical and simulation studies of a

pedigreed zebra finch population typed at 771 SNPs. Molecular Ecology 19(7):

1439-1451.

Siminski DP. (2011). Mexican wolf, Canis lupus baileyi, international studbook, 2011.

The Living Desert, Palm Desert, California, USA.

U.S. Fish and Wildlife Service [USFWS]. (1976). Determination that two species of

butterflies are threatened species and two species of mammals are endangered

species. Federal Register 41(83): 17736-17740.

U.S. Fish and Wildlife Service [USFWS]. (2007). Red wolf (Canis rufus) 5-year status

review: summary and evaluation. U.S. Fish and Wildlife Service Southeast

Region, Manteo, North Carolina, USA.

U.S. Fish and Wildlife Service [USFWS]. (2010). Mexican wolf conservation

assessment. U.S. Fish and Wildlife Service Southwest Region, Albuquerque, New

Mexico, USA.

U.S. Fish and Wildlife Service [USFWS]. (2012). Mexican wolf Blue Range

reintroduction project statistics. U.S. Fish and Wildlife Service, Downloaded

from: http://www.fws.gov/southwest/es/mexicanwolf/.

Vaysse A, Ratnakumar A, Derrien T, et al. (2011). Identification of genomic regions

associated with phenotypic variation between dog breeds using selection

mapping. PLoS Genetics 7(10): e1002316. 171

Vilá C, Amorim IR, Leonard JA, et al. (1999). Mitochondrial DNA phylogeography and

population history of the grey wolf Canis lupus. Molecular Ecology 8(12): 2089-

2103. vonHoldt BM, Pollinger JP, Earl DA, et al. (2011). A genome-wide perspective on the

evolutionary history of enigmatic wolf-like canids. Genome Research 21(8):

1294-1305. vonHoldt BM, Pollinger JP, Lohmueller KE, et al. (2010). Genome-wide SNP and

haplotype analyses reveal a rich history underlying dog domestication. Nature

464(7290): 898-902.

Wayne RK, Lehman N, Allard MW, et al. (1992). Mitochondrial DNA variability of the

gray wolf - genetic consequences of population decline and habitat fragmentation.

Conservation Biology 6(4): 559-569.

Wayne RK & Vilá C. (2003). Molecular genetic studies of wolves. In Wolves: behavior,

ecology, and conservation (Mech, L. D. & Boitani, L., eds.). The University of

Chicago Press, Chicago, Illinois, USA.

Weber M. (1989). La pureza racial del lobo gris Mexicano (Canis lupus baileyi) en

cautiverio en Mexico: estudios preliminares. Memoria del VI Simposio Sobre

Fauna Silvestre; F.M.V.Z. UNAM.

Wigginton JE, Cutler DJ & Abecasis GR. (2005). A note on exact tests of Hardy-

Weinberg equilibrium. American Journal of Human Genetics 76(5): 887-893.

Young SP & Goldman EA. (1944). The Wolves of North America: Part I, Dover

Publications Inc., New York, New York, USA.

172

Table C.1: Reproducibility of genotypes across six replicated samples.

ID # Correct # Errors Total Frequency 10 171002 3 171005 0.999991 29 171016 7 171023 0.999980 72 171147 12 171159 0.999965 412 170857 7 170864 0.999980 547 170350 3 170353 0.999991 1038 170667 17 170684 0.999950 Mean 170838.8 8.17 170848.0 0.999976

173

Table C.2: Mendelian disagreements between genotypes of Parent-offspring trios. Wolf IDs are given according to the Mexican wolf official studbook (Siminski 2011).

Offspring Sire Dam # Correct # Errors Total Frequency 105 60 37 170586 62 170648 0.9996 170 60 37 167326 231 167557 0.9986 171 60 37 170223 100 170323 0.9994 518 411 84 169233 64 169297 0.9996 658 105 569 169671 75 169746 0.9996 820 520 547 165098 4565 169663 0.9731 861 574 797 169526 58 169584 0.9997 886 520 547 165972 3645 169617 0.9785 919 904 511 169789 31 169820 0.9998 922 904 511 169858 26 169884 0.9998 1008 732 797 161328 8072 169400 0.9523 1054 732 797 169491 53 169544 0.9997 1055 732 797 169550 45 169595 0.9997 Mean 168280.8 1309.8 169590.6 0.9923

174

Figure C.1: Call rate over all 173,662 SNPs for each of 96 samples ordered according to increasing call rate along the X-axis.                   175

Figure C.2: Multidimensional scaling plots for the first two dimensions of the LD- pruned data (6,794 SNPs). Each individual Mexican wolf is shown as a pie chart with slices representing each wolf’s proportion of ancestry in the three captive lineages as predicted from the pedigree. MB = McBride (blue), AR = Aragón (red), GR = Ghost Ranch (green). Numbers indicate IDs (Siminski 2011) of wolves who appear abnormal in the clustering according to their ancestry predicted from the pedigree.

  



 

176

Figure C.3: Multidimensional scaling of the combined dataset containing 80 Mexican wolves (blue circles) and 31 domestic dogs (black circles). Only the first two dimensions are shown.

●● ●●●●●●● ●●●●●● ● ●● 0.05 ● ●● ● ●

●● ●●● ●●● ● ●● ● ●●●●● ● ●●●●● ●●●●● ● ● ●● ● ●● ● ● ● 0.00 ● ●● ● ●● ●●● ● ● ● ● ●

●●

●●

− 0.05 ● ● Dimension 2

● ● − 0.10

●●● ● ● − 0.15 −0.20 −0.15 −0.10 −0.05 0.00 0.05

Dimension 1

177

Figure C.4: Results of the population clustering analysis in ADMIXTURE. A) Cross- validation error for different potential numbers of clusters, K, between 1 and 10. B) Ancestry plot for K = 3 and 6. Each vertical bar represents an individual with colors corresponding to the proportion of ancestry in the predefined number of clusters K (y axis). MB = McBride, AR = Aragón, GR = Ghost Ranch, CL = cross-lineage.

       





#$

        

#

    ! "  178

Figure C.5: Results from ADMIXTURE for K = 2 population clusters for both the A) unsupervised and B) supervised analyses. MB = McBride, AR = Aragón, GR = Ghost Ranch, CL = cross lineage. The single domestic dog genotyped, Hershey, is shown separately on the far right. In the supervised analysis, B, only non-MB Mexican wolves and Hershey had ancestry estimated; domestic dogs and MB wolves were assigned to clusters 1 and 2 a priori.

     



     

     



     

179

Figure C.6: Line graph depicting the hybrid index h (proportion of ancestry derived from MB wolves; black line), the proportion of the genome containing haplotypes derived from MB wolves (gray line), and the proportion of ancestry from MB predicted from the pedigree (dotted line). Non-MB Mexican wolves are listed on the horizontal axis according to studbook ID (Siminski 2011) and are ordered by birth year.

$"

!#,"

!#+"

!#*"

!#)"

!#("

!#'"

!#&" !"#$#"%#&'#(')&*+,-".'/&'01'

!#%"

!#$"

!" (&)" ($+" (%!" (%$" ()," (*'" (,&" ))*" )&(" )'," )(+" *&!" *&%" ,,%" +!)" +%!" +)$" ,,!" ++)" -%(+" -%+'" $!!+" $!%+" $!&$" $!&%" $!&'" $!&+" $!&," $!'%" $!''" $$$!" $$$$" $$$(" $$!(" $$!)" $$!+" $!('" $!((" $$$)" $$$*" $$&!" $$&&" $$)$" --'$$" --'$%" --'%!" --'&$" --'&%" --'%)"

*Aragón individuals **Ghost Ranch individuals

180

APPENDIX D

MEASURING INBREEDING USING GENOMIC DATA: THE ENDANGERED MEXICAN WOLF AS AN EXAMPLE

FITAK, ROBERT R.

Genetics Graduate Interdisciplinary Program, The University of Arizona, Tucson, AZ

85721, USA

COAUTHORS: RINKEVICH, S. and CULVER, M.

This paper was prepared to submit to a journal.

Corresponding Address:

Robert R. Fitak

317 Biosciences East

1311 E 4th St

Tucson, AZ, 85721, USA

Tel: 520-626-1636

Fax: 520-621-8801

Email: [email protected]

Keywords: single nucleotide polymorphisms, Mexican wolf, captive breeding, pedigree, simulation

181

ABSTRACT

In order to understand the extent of inbreeding depression in a population, one must first quantify the level of inbreeding, F. The most traditional methods to measure F are based on information obtained from a pedigree. When an accurate pedigree is unavailable, molecular methods provide an alternative method to estimate F. Normally between 10-50 microsatellite loci are employed to provide a genome-wide estimate of F.

However, recent advances in sequencing and genotyping technologies have now made it possible to interrogate orders of magnitude more molecular markers, primarily single nucleotide polymorphisms (SNPs), or even complete genomes to estimate F. In this study we evaluated three different measures of inbreeding in the endangered Mexican wolf (Canis lupus baileyi) population: the pedigree estimate and two estimates derived from genome-wide SNP genotyping, one based on allele frequencies FHET, and the other based on runs of homozygosity (ROHs) in the genome FROH. FROH consistently outperformed FHET using both the observed pedigree and simulated data. These results demonstrate that genome-wide analysis of SNPs is very promising for calculating F, especially in small and/or managed populations. This has important implications for detecting inbreeding depression in wildlife conservation programs. 182

INTRODUCTION

Inbreeding is a result of mating between individuals that are related. In other words, an individual is inbred when its parents share a common ancestor. The consequence of inbreeding is the increase in homozygosity, which has often been associated with a reduction in fitness traits (see Table 1 of Charlesworth and Willis 2009 for examples). This reduction in fitness as a result of inbreeding, termed inbreeding depression, occurs through two distinct genetic mechanisms: 1) the increased homozygosity of recessive deleterious alleles and 2) increased homozygosity at loci where balancing selection maintains heterozygosity (Charlesworth and Charlesworth

1987; Charlesworth and Willis 2009). Understandably, inbreeding is more frequent when population sizes are small, such as in endangered species or otherwise bottlenecked populations, as individuals are more likely to encounter mates that are closely related.

Therefore, the identification and avoidance of inbreeding is of great concern in many captive breeding and conservation programs, where the reduction in fitness as a result of inbreeding can greatly inflate the extinction risk of the population (Hedrick and

Kalinowski 2000).

In order to understand the extent of inbreeding depression in a population, one must first quantify the level of inbreeding. Inbreeding is measured using the coefficient

F and is defined as the probability that two alleles at a locus within an individual are identical by descent (IBD, or autozygous) (Wright 1921; Malécot 1948). The most traditional methods to measure F are based on information obtained from a pedigree

(Wright 1922). However, in wild populations the pedigree is usually never known, or only for a few generations, and the pedigrees of captive populations often suffer from 183 errors, unknown individuals, and unknown relationships between the founders

(Pemberton 2008; Olihoek and Bijma 2009). When an accurate pedigree is unavailable, molecular methods provide an alternative method to estimate F. For many years microsatellites have been the molecular marker of choice, and a variety of methods have been developed to infer F from these data. These methods include the use of multilocus heterozygosity (e.gs Slate et al. 2004; Overall et al. 2005), variation in microsatellite repeat length (Coulson et al. 1998), moment estimators (Ritland 1996; Carothers et al.

2006), maximum likelihood estimators (Leuteneggar et al. 2003; Wang 2011; Hall et al.

2012), and a Bayesian estimator (Vogl et al. 2002). The ability of these methods to accurately estimate true inbreeding depends upon a variety of factors such as the number of markers (Balloux et al. 2004; Santure et al. 2010), sample size (Coltman and Slate

2003), variance in Mendelian segregation (Hill and Weir 2011), variance in F (Slate et al.

2004; Chapman et al. 2009), and number of generations of inbreeding (Alves et al. 2008).

Nevertheless, the accuracy of these methods in estimating F has often been questioned

(see Hedrick et al. 2001; Fernández et al. 2005; Overall et al. 2005; Chapman et al. 2009;

Santure et al. 2010).

Normally between 10-50 microsatellite loci are employed to provide a genome- wide estimate of F. However, recent advances in sequencing and genotyping technologies have now made it possible to interrogate orders of magnitude more molecular markers or even complete genomes to estimate F (Allendorf et al. 2010). A larger number of markers, usually single nucleotide polymorphisms (SNPs), is believed to provide more precise measures of inbreeding. However, when calculating a moment estimator of F (Purcell et al. 2007) from ~50,000 SNPs in Finnsheep, Li et al. (2011) 184 observed only a moderate correlation (r = 0.533) with the pedigree measure. Worse yet, an analysis of more than 60,000 SNPs in an ancient Iberian pig strain found only a weak correlation (r = 0.22) between a traditional molecular estimate (Malecot 1948) and the pedigree (Saura et al. 2013).

Another method developed by McQuillan et al. (2008) takes advantage of genome-level molecular data and appears promising. This method takes into account that autozygous regions (those regions IBD) are not distributed evenly across the genome, but occur in long, contiguous runs of homozygosity (ROHs) (Broman and Weber 1999). It has been shown that ROHs occur in a variety of lengths depending upon the time since the most recent common ancestor and the degree of shared parental ancestry (Kirin et al.

2010). For instance, small ROHs could be the result of locally reduced recombination rates, while very long ROHs are generally the result of recent inbreeding because recombination has had little chance to break apart the segments (Kirin et al. 2010).

McQuillan et al. (2008) demonstrated that the proportion of the autosomal genome covered by the genetic markers that occur in ROHs could estimate F. Using this method the authors found a relatively high correlation (r = 0.86) with the pedigree estimate in humans. In Iberian pigs and cattle both Silió et al. (2013) and Hamzić (2011) reported strong correlations (r = 0.814 – 0.919 and r = 0.619 - 0.705 respectively) with pedigree inbreeding. Simulations have also shown that the use of ROHs in a management setting can provide the optimal balance between maintaining diversity and fitness in an inbred population (de Cara et al. 2013). Therefore, the use of ROHs to estimate inbreeding may have important implications in the genetic management of both captive and wild populations. 185

In this study we evaluated three different measures of inbreeding in the endangered Mexican wolf (Canis lupus baileyi) population: the pedigree estimate and two estimates derived from genome-wide SNP genotyping. After being extirpated from the wild around 1980 (McBride 1980; Brown 1983), the Mexican wolf only existed in three independent captive lineages. Derived from a total of seven founders, these lineages accumulated high levels of inbreeding and were subsequently merged in 1995

(see Hedrick and Fredrickson 2008 for a review). In 1998 Mexican wolves were reintroduced into Arizona and New Mexico and the current population is estimated to contain 75 individuals (USFWS 2012). In contrast to the high levels of inbreeding in wolves from the three captive lineages (mean F = 0.184 – 0.608), the mean inbreeding coefficient of cross-lineage wolves as of 2005 was only 0.085 (Hedrick and Fredrickson

2008). Although evidence of inbreeding depression has not been identified in the original captive lineages, this might be partly due to the lack of statistical power from low amounts of variation in inbreeding resulting from a breeding strategy to minimize mean kinship (Kalinowski et al. 1999; Fredrickson et al. 2007). More so, the pedigree may contain errors from some unclear matings in the original captive lineages (Hedrick et al.

1997; Carley 1979) and the assumption that the founders, some of which may have been the last remaining wild Mexican wolves, are unrelated. In addition to comparing estimates of inbreeding in the observed Mexican wolf population, we measured inbreeding in simulated populations to control for the potential pedigree errors described above. Because of the increased statistical power and decreasing cost, genome-wide molecular analyses are rapidly growing in popularity. Therefore, it is important to understand whether or not estimates of F from genomic data can benefit the genetic 186 management of the Mexican wolf population and other captive and wild populations of endangered species

MATERIALS AND METHODS

Inbreeding in Mexican wolves

We measured and compared three different estimates of the inbreeding coefficient, F, in 80 Mexican wolves. These individuals included wolves with pure ancestry in each of the three original captive lineages (McBride [MB] n = 31, Aragón

[AR] n = 2, and Ghost Ranch [GR] n = 6) and individuals of mixed ancestry (cross- lineage [CL] n = 41). Unrelated wolves (n = 6) were defined as first generation CL wolves (offspring of a mating between parents of pure ancestry from separate captive lineages). We first calculated F using the official pedigree (Siminski 2011) in the software GENES (http://www.vortex9.org/genes.html), hereafter referred to as FPED.

GENES is an accessory program to the software SPARKS and is extensively used in many species survival plans connected to the International Species Information System

(ISIS; http://www.isis.org). GENES produces an unbiased estimator of F even when a parent(s) is unknown and assumes all founders are unrelated.

For the remaining two measures of F we used SNP genotype data described previously (see Appendix C). Briefly, DNA was extracted from blood and tissue samples of 89 Mexican wolves using the DNeasy Blood and Tissue Kit (Qiagen, Germantown,

MD, USA) and genotyped for ~172,000 SNPs on the Illumina CanineHD BeadChip. A total of 9 individuals with a call rate less than 96% were removed because a low call rate may be indicative of poor confidence in genotypes. After filtering and quality control in 187

PLINK (Purcell et al. 2007) (see Appendix C) 68,147 high quality SNP loci remained

(68K dataset). A second dataset of 6,794 unlinked SNPs (7K dataset) was also generated after pruning the 68K dataset for SNPs in linkage disequilibrium using the “–indep- pairwise” command in PLINK.

Using the 7K dataset, we calculated the second estimator of inbreeding, FHET, using the “–het” option in PLINK. This method used the equation:

!! − !! !! = !! − !! where the inbreeding coefficient F for individual i was a function of the observed number of homozygote loci O, the number of homozygote loci expected by chance E, and the total number of autosomal loci L. The expected number of homozygote loci Ei was calculated from the data using an unbiased estimator:

!! 1 − 2! ! ! ! ! !" !!! !!" − 1 where at each SNP j, p and q are the two allele frequencies and TA is twice the number of non-missing genotypes (see Purcell et al. 2007). We calculated FHET separately for each captive lineage and cross-lineage wolf.

The final measure of F was derived using runs of homozygosity (FROH). ROHs are contiguous stretches of homozygous regions in the genome that occur in a variety of lengths depending upon the time since the most recent common ancestor and the degree of inbreeding (Kirin et al. 2010). The inbreeding coefficient FROH was estimated from

ROHs using the equation:

!!"#! !!"# = !!"#$ 188

where !!"#! represents the total length of ROHs in the genome above a threshold length i and !!"#$ is the total length of the autosomal genome covered by the genetic markers (McQuillan et al. 2008). We used !!"#$ = 2318230000 bp, the autosomal genome length of the dog (Canis lupus familiaris) excluding unplaced scaffolds from the

CanFam2.0 genome assembly (Genbank assession: GCA_000002285.1); the same assembly used for development of the CanineHD BeadChip. We identified ROHs in the

68K dataset using a detection method implemented in PLINK. An ROH was defined as a region of homozygosity containing at least 100 SNPs and >1Mb in length. Each ROH also must have at least one SNP per 50 kb, and if a gap >1Mb was present the ROH was split into two separate ROHs. We accounted for potential error by allowing one heterozygous genotype and up to five missing genotypes per ROH. We assessed the correlations and their significance between the three different measures of F using the root mean squared error (RMSE), Pearson’s product moment correlation, r, and a

Welch’s t-test to assess significance in the R statistical package (www.r-project.org).

In order to assess the distribution of FHET and FROH expected from the official pedigree, we simulated genotypes for the 68K dataset across the official Mexican wolf pedigree (Siminski 2011) using the software SIMPED (Leal et al. 2005). SIMPED uses initial allele frequencies to simulate genetic data in founders and then tracks the alleles according to the pedigree. We performed 200 simulations and initiated each simulation using the observed minimum allele frequency for each SNP calculated over all individual

Mexican wolves. Additionally, we included the sex-averaged recombination rate in domestic dogs of 0.97 cM/Mb (Wong et al. 2010). For each simulation FHET and FROH were calculated and compared with FPED as described above. 189

Population simulations

Because captive populations often undergo controlled mating that may impact the ability to measure inbreeding, we investigated the performance of the two genetic estimators of inbreeding, FHET and FROH, in two simulated captive scenarios: 1) a population based upon demographic parameters characteristic of the Mexican wolf captive population (Seal 1990) with mates chosen to minimize mean kinship (MW-MK), and 2) the same Mexican wolf population but with pairings determined at random (MW-

RND). We performed 500 iterations for each population in VORTEX 9.99c (Lacy et al.

2009). We used similar conditions to those described by Seal (1990) with a few exceptions to mirror parameters of the captive Mexican wolf population: an initial population of 7 founders, a first age of reproduction for females of 2 and males of 1, and excluded catastrophes. Mating pairs were selected either using an approach to minimize mean kinship (MW-MK) without updating the prioritized list of potential breeders (“a static MK list”) (Miller and Lacy 2005) or selected at random (MW-RND). A detailed list of the parameters used can be found in Table D.1. Each iteration lasted 35 years or was capped at 1,000 individuals. Additionally, we included the simulation of 10 unlinked nuclear loci in each individual. We used the resulting genotypes to reconstruct the population pedigree in FRANZ (Riester et al. 2009). Because the populations were simulated, we used a genotyping error rate of 0 and shortened the number of iterations to

1,000,000 with a 200,000 burn-in period. We generated the 68K SNP dataset across each of the simulated populations and calculated FHET and FROH as described above for the observed Mexican wolf population. The only exception was that for each population we 190 used 100 replications of SNP data using SIMPED instead of 200 as done previously. We compared the genetic estimates FHET and FROH with the true value of inbreeding, FPED, reported for each individual in VORTEX using the statistical procedures described above.

RESULTS

Mexican wolf pedigree

We calculated three different measures of inbreeding in 80 Mexican wolves: inbreeding according to the pedigree (FPED), a moment estimator (FHET) and the proportion of the genome composed of ROHs (FROH). Mean FPED for the 80 Mexican wolves was 0.192 (SD 0.138) and ranged between 0 in the offspring of founders and first generation CL wolves to a maximum of 0.609 in a GR wolf. For the observed SNP dataset, mean FHET was -0.117 (SD 0.144). A majority of FHET values were less than zero, indicating an excess of heterozygosity compared with that expected by chance. We found the mean FROH across all wolves was 0.251 (SD 0.135), significantly greater than

FPED (t-test, p = 0.007). In known unrelated wolves, FPED was 0 by definition, whereas mean FHET was (-0.236, SD 0.078) and mean FROH was (0.031, SD 0.013). Using a minimum length of 1 Mb, we found a mean of 43.5 (SD 14.4) ROHs per Mexican wolf genome. As expected, in unrelated wolves mean number of ROHs was much smaller

(14.3, SD 4.4). GR wolves had the highest proportion of ROHs >30 Mb (Figure D.1A), and two GR wolves shared the longest ROH overall (91.6 Mb, chromosome 3). MB and

CL, which accumulated less inbreeding than AR and GR according to the studbook, generally had a larger proportion of ROHs in the smaller categories as predicted (i.e. <5

Mb; Figure D.1A). The shortest ROH was 1.9 Mb on chromosome 33 in two CL wolves. 191

Using unrelated wolves as a reference, wolves with related parents had both more numerous and longer ROHs (Figure D.1B). It was also apparent that the length of the genome in ROHs increased faster than the number of ROHs (Figure D.1B).

When comparing the performance of the two genetic estimators with that from the pedigree, FROH consistently outperformed FHET (Table D.2). FROH had a more than 4-fold less RMSE and much higher correlation (r = 0.897, p < 0.001, RMSE = 0.086) with FPED than FHET (r = 0.072, p = 0.528, RMSE = 0.363). FROH was often larger than FPED, but in several cases large differences were observed (Figure D.2). For example, two wolves

(studbook numbers 9 and 10, the first two individuals to the left of Figure D.2) are first generation offspring of the founders and therefore FPED = 0, however, FROH for these individuals (0.279 and 0.191, respectively) was much higher than expected. On the other hand, in two instances (studbook numbers 547 and 858) FROH (0.023 and 0.123, respectively) was much lower than the expected FPED (0.188 and 0.281, respectively)

(Figure D.2). These individuals were identified previously (see appendix C) as having misidentified parentage resulting in an incorrect calculation of FPED.

To assess the potential for sampling error when calculating FROH and FHET and to minimize the effects of inbreeding in the founders, pedigree errors, and genotype errors, we simulated a 68K SNP dataset across the Mexican wolf pedigree 200 times (Figure

D.3). As expected, the performance of both genetic estimators improved (Table D.2).

Using the simulated genotypes, FROH again was a much better estimator of FPED (r ~ 1.0, p < 0.001, RMSE = 0.021) than FHET (r = 0.134, p < 0.235, RMSE = 0.357). In this simulated dataset FHET was a better predictor than in the observed dataset, but again showed a consistent excess of heterozygosity. 192

Simulated pedigrees

To investigate the potential effect of captive breeding strategies on the ability to calculate F from SNP data, we simulated 500 captive populations in each of two scenarios. Simulations averaged 442.4 individuals/population for MW-MK and 463.8 individuals/population for MK-RND. In both cases, again, FROH was again a more efficient estimator of inbreeding than FHET (Table D.2). Unlike in the analyses using the observed Mexican wolf pedigree, the overall performance of FHET was much improved and the correlation significant in both types of simulated populations (Table D.2), demonstrating that sample size and the ability to accurately calculate expected allele frequencies are important when calculating FHET. Rather unexpectedly, both predictors performed better in MW-MK compared with MK-RND despite the systematic deviation from random mating implemented in MW-MK (Figure D.4). Although the differences in

RMSE were quite small between scenarios, they were significant for both FHET and FROH

(t-test, p < 0.001).

DISCUSSION

In many threatened and endangered species, where small population sizes are the norm, accurately measuring the level of inbreeding is critical for long-term management and persistence of the populations. When estimating F, it is generally considered that estimates derived from genealogical relationships outperform those using molecular methods (Fernández et al. 2005; Pemberton 2008). However, because the ability to accurately infer F increases as the number of molecular markers is increased (Santure et 193 al. 2010; Miller et al. 2013), modern advances in high throughput sequencing and genotyping technologies (notably SNPs) provide an excellent means to improve inbreeding estimation in species of conservation concern. In this study we reported that a genome-wide scan of SNPs in the endangered Mexican wolf could accurately estimate pedigree inbreeding. Notably, that inbreeding, measured using the proportion of the genome in contiguous stretches of homozygosity (FROH), consistently outperformed a method based upon allele frequencies (FHET) (Purcell et al. 2007). The strong correlation of FROH with FPED (0.897) was similar to that reported in pigs for ROH > 1 Mb (0.775;

Silió et al. 2013) and humans for ROH > 1.5 Mb (McQuillan et al. 2008). The performance of FROH improved using simulated SNP genotype data, further demonstrating its accuracy and precision while suggesting potential errors that can be attributed to factors such as pedigree errors, founder inbreeding, and genotype error. For example, in two cases we discovered large decreases in FROH relative to FPED (Figure

D.2), which corresponded with individuals previously shown to likely have misidentified parentage. Otherwise, we found that FROH was consistently greater than FPED, suggesting that some inbreeding can be attributed prior to that relative to the founder generation.

The consistent overestimate of FROH could also, however, be a function of the genome size used. We calculated FROH using the autosomal genome size of the domestic dog, but the actual genome size covered by the 68K SNP dataset may be smaller. For example, repetitive regions like centromeres are not covered by SNPs but are included in the genome assembly. Therefore, when accounting for these regions, FROH may actually be slightly larger than that observed in this study. We also chose to define a ROH as being at least 1 Mb in length to avoid inflation of FROH due to local reductions in recombination 194 rate which has been shown previously to be an effective cutoff (McQuillan et al. 2008;

Silió et al 2013). This minimum length was considered effective because in first generation cross lineage wolves, where the expected FPED is 0, mean FROH was only

0.031, indicating low background levels of FROH as a result of more ancient autozygosity and small effective population size. Since the shortest ROH was 1.9 Mb, increasing the minimum length to 1.5 Mb would not change the results, and should account for a majority of LD (vonHoldt et al. 2011).

Although higher correlations of FHET with FPED have been observed in Finnsheep

(Li et al. 2011), the much larger sample size and potentially more reliable genealogical records facilitated reduced error and improved estimates of allele frequencies in the

Finnsheep population. In this study, our sample size was rather small, especially in GR and AR, and may not reflect accurate measurements of the allele frequencies. If we used allele frequencies across all populations, the extensive structuring of the captive lineages shown previously (Appendix C) likely would further confound FHET measurements. In endangered, threatened, or otherwise small populations (like the Mexican wolf), small sample sizes and unusual demographic histories are relatively common therefore a molecular measure of inbreeding independent of allele frequencies is important.

Furthermore, many captive breeding strategies, such as pairing individuals to minimize mean kinship, potentially directly influence Hardy-Weinberg expectations necessary to calculate many molecular estimates of F. Therefore, FROH, which is not biased by sample size and allele frequency estimation, can be measured in as little as a single individual, making it a very useful tool for conservation programs. FROH can also be calculated very quickly, whereas the large numbers of markers can make certain methods, such as 195 maximum likelihood and Bayesian approaches, very time consuming and computationally demanding.

Another interesting result was that both FHET and FROH were significantly better estimators of pedigree inbreeding, albeit slightly, in simulated populations bred under mean kinship than the same populations with random mating. This is likely a result of the pedigree reconstruction method. Using genetic data and IBD, it is more difficult to reconstruct an inbred pedigree than an outbred pedigree (Kirkpatrick et al. 2011). Since randomly mated populations will contain, on average, individuals more closely related to all other individuals compared with a population bred under mean kinship, reconstructing the relationships in randomly mating populations from genetic data will be more difficult and contain a higher frequency of errors. Therefore, the observation that FHET and FROH are better estimators in the population paired using mean kinship may simply be due to more accurate pedigree reconstruction. Nonetheless, the differences between measures of inbreeding in both types of simulated populations were quite small, indicating that the pedigree reconstruction was rather accurate.

The measurement of inbreeding and inbreeding depression is an important component of many conservation programs. In the endangered Mexican wolf significant inbreeding depression has not been identified in the original captive lineages (Kalinowski et al. 1999), yet is evident when cross-lineage wolves are included (Fredrickson et al.

2007). Fredrickson et al. (2007) suggested that the combination of small population sizes and substantial inbreeding in the captive populations has led to a high load of recessive deleterious alleles, notably in MB, and may be responsible for the reduction in fitness relative to other populations of North American gray wolves. This is congruent with the 196 long ROHs in MB reported in this study and evidence that long ROHs are enriched for deleterious variation (Szpiech et al. 2013). If there exists recessive deleterious variants of large effect, it may be possible to use ROHs to identify the genomic regions harboring these variants through homozygosity mapping (e.g. Wang et al. 2009; Collin et al. 2011;

Papić et al. 2011). Homozygosity mapping is most effective in closed, inbred populations (e.g. the Mexican wolf) but has been successful in outbred populations as well (Hildebrandt et al. 2009). If an ROH is discovered that is significantly associated with reduction of fitness, managers can take this into consideration when pairing or releasing wolves. Methods already exist to manage single genetic loci while simultaneously minimizing inbreeding (Sonesson et al. 2003; Fernandez et al. 2006).

Nonetheless, in light of decreasing sequencing costs, genomic tools will become increasingly available for managed and/or endangered species. Our results demonstrated that ROHs inferred from a genome-wide set of SNPs may have important applications in the conservation and management of endangered species. 197

REFERENCES

Allendorf FW, Hohenlohe PA & Luikart G. (2010). Genomics and the future of

conservation genetics. Nature Reviews Genetics 11(10): 697-709.

Alves E, Barragan C & Toro MA. (2008). Inbreeding and homozygosity in Iberian pigs.

Spanish Journal of Agricultural Research 6(2): 248-251.

Balloux F, Amos W & Coulson T. (2004). Does heterozygosity estimate inbreeding in

real populations? Molecular Ecology 13(10): 3021-3031.

Broman KW & Weber JL. (1999). Long homozygous chromosomal segments in

reference families from the Centre d'Étude du Polymorphisme Humain. The

American Journal of Human Genetics 65(6): 1493-1500.

Brown DE. (1983). The wolf in the Southwest: the making of an endangered species, The

University of Arizona Press, Tucson, Arizona, USA.

Carley, CJ. (1979). Narrative report on alleged Mexican gray wolf (Canis lupus baileyi)

lineages held in captivity. U.S. Fish and Wildlife Service Southwest Region,

Albuquerque, New Mexico, USA.

Carothers AD, Rudan I, Kolcic I, et al. (2006). Estimating human inbreeding coefficients:

comparison of genealogical and marker heterozygosity approaches. Annals of

Human Genetics 70: 666-676.

Chapman JR, Nakagawa S, Coltman DW, et al. (2009). A quantitative review of

heterozygosity-fitness correlations in animal populations. Molecular Ecology

18(13): 2746-2765.

Charlesworth D & Charlesworth B. (1987). Inbreeding depression and its evolutionary

consequences. Annual Review of Ecology and Systematics 18: 237-268. 198

Charlesworth D & Willis JH. (2009). The genetics of inbreeding depression. Nature

Reviews Genetics 10(11): 783-796.

Collin RWJ, van den Born LI, Klevering BJ, et al. (2011). High-resolution homozygosity

mapping is a powerful tool to detect novel mutations causative of autosomal

recessive rp in the Dutch population. Investigative Ophthalmology & Visual

Science 52(5): 2227-2239.

Coltman DW & Slate J. (2003). Microsatellite measures of inbreeding: a meta-analysis.

Evolution 57(5): 971-983.

Coulson TN, Pemberton JM, Albon SD, et al. (1998). Microsatellites reveal heterosis in

red deer. Proceedings of the Royal Society B-Biological Sciences 265(1395): 489-

495. de Cara MÁR, Villanueva B, Toro MÁ, et al. (2013). Using genomic tools to maintain

diversity and fitness in conservation programmes. Molecular Ecology 22(24):

6091-6099.

Fernandez J, Roughsedge T, Woolliams JA, et al. (2006). Optimization of the sampling

strategy for establishing a gene bank: storing PrP alleles following a scrapie

eradication plan as a case study. Animal Science 82: 813-821.

Fernandez J, Villanueva B, Pong-Wong R, et al. (2005). Efficiency of the use of pedigree

and molecular marker information in conservation programs. Genetics 170(3):

1313-1321.

Fredrickson RJ, Siminski P, Woolf M, et al. (2007). Genetic rescue and inbreeding

depression in Mexican wolves. Proceedings of the Royal Society B: Biological

Sciences 274(1623): 2365-2371. 199

Hall N, Mercer L, Phillips D, et al. (2012). Maximum likelihood estimation of individual

inbreeding coefficients and null allele frequencies. Genetics Research 94(3): 151-

161.

Hamzić E. (2011). Levels of inbreeding derived from runs of homozygosity: a

comparison of Austrian and Norwegian cattle breeds. Master's Thesis; University

of Natural Resources and Life Sciences, Vienna, Austria.

Hedrick P, Fredrickson R & Ellegren H. (2001). Evaluation of ∂2, a microsatellite

measure of inbreeding and outbreeding, in wolves with a known pedigree.

Evolution 55(6): 1256-1260.

Hedrick PW & Fredrickson RJ. (2008). Captive breeding and the reintroduction of

Mexican and red wolves. Molecular Ecology 17(1): 344-350.

Hedrick PW & Kalinowski ST. (2000). Inbreeding depression in conservation biology.

Annual Review of Ecology and Systematics 31: 139-162.

Hedrick PW, Miller PS, Geffen E, et al. (1997). Genetic evaluation of the three captive

Mexican wolf lineages. Zoo Biology 16(1): 47-69.

Hildebrandt F, Heeringa SF, Ruschendorf F, et al. (2009). A Systematic Approach to

Mapping Recessive Disease Genes in Individuals from Outbred Populations. Plos

Genetics 5(1): e1000353.

Hill WG & Weir BS. (2011). Variation in actual relationship as a consequence of

Mendelian sampling and linkage. Genetics Research 93(1): 47-64.

Kalinowski ST, Hedrick PW & Miller PS. (1999). No inbreeding depression observed in

Mexican and red wolf captive breeding programs. Conservation Biology 13(6):

1371-1377. 200

Kirin M, McQuillan R, Franklin CS, et al. (2010). Genomic runs of homozygosity record

population history and consanguinity. Plos One 5(11): e139996.

Kirkpatrick B, Li SC, Karp RM, et al. (2011). Pedigree reconstruction using identity by

descent. Journal of Computational Biology 18(11): 1481-1493.

Lacy RC, Borbat M & Pollak JP. (2009). Vortex: a stochastic simulation of the extinction

process. Version 9.95. Chicago Zoological Society, Brookfield, Illinois, USA.

Leal SM, Yan K & Muller-Myhsok B. (2005). SimPed: a simulation program to generate

haplotype and genotype data for pedigree structures. Human Heredity 60(2): 119-

122.

Leutenegger AL, Prum B, Genin E, et al. (2003). Estimation of the inbreeding coefficient

through use of genomic data. American Journal of Human Genetics 73(3): 516-

523.

Li MH, Stranden I, Tiirikka T, et al. (2011). A comparison of approaches to estimate the

inbreeding coefficient and pairwise relatedness using genomic and pedigree data

in a sheep population. Plos One 6(11): e26256.

Malécot G. (1948). Les mathématiques de i’hérédité. Paris: Masson & Cie.

McBride, RT. (1980). The Mexican wolf (Canis lupus baileyi): a historical review and

observations on it status and distribution. U.S. Fish and Wildlife Service

Southwest Region, Albuquerque, New Mexico, USA.

McQuillan R, Leutenegger A-L, Abdel-Rahman R, et al. (2008). Runs of homozygosity

in European populations. The American Journal of Human Genetics 83(3): 359-

372.

Miller JM, Malenfant RM, David P, et al. (2013). Estimating genome-wide 201

heterozygosity: effects of demographic history and marker type. Heredity (Ahead

of Print) 10.1038/hdy.2013.99.

Miller PS & Lacy RC. (2005). VORTEX: a stochastic simulation of the extinction

process. version 9.50 user’s manual. Apple Valley, MN: Conservation Breeding

Specialist Group (SSC/IUCN).

Oliehoek PA & Bijma P. (2009). Effects of pedigree errors on the efficiency of

conservation decisions. Genetics Selection Evolution 41(1): 9.

Overall ADJ, Byrne KA, Pilkington JG, et al. (2005). Heterozygosity, inbreeding and

neonatal traits in Soay sheep on St Kilda. Molecular Ecology 14(11): 3383-3393.

Papic L, Fischer D, Trajanoski S, et al. (2011). SNP-array based whole genome

homozygosity mapping: a quick and powerful tool to achieve an accurate

diagnosis in LGMD2 patients. European Journal of Medical Genetics 54(3): 214-

219.

Pemberton JM. (2008). Wild pedigrees: the way forward. Proceedings of the Royal

Society B-Biological Sciences 275(1635): 613-621.

Purcell S, Neale B, Todd-Brown K, et al. (2007). PLINK: A tool set for whole-genome

association and population-based linkage analyses. American Journal of Human

Genetics 81(3): 559-575.

Riester M, Stadler PF & Klemm K. (2009). FRANz: reconstruction of wild multi-

generation pedigrees. Bioinformatics 25(16): 2134-2139.

Ritland K. (1996). Estimators for pairwise relatedness and individual inbreeding

coefficients. Genetical Research 67(2): 175-185.

Santure AW, Stapley J, Ball AD, et al. (2010). On the use of large marker panels to 202

estimate inbreeding and relatedness: empirical and simulation studies of a

pedigreed zebra finch population typed at 771 SNPs. Molecular Ecology 19(7):

1439-1451.

Saura MA, Fernández A, Rodríguez MC, et al. (2013). Genome-wide estimates of

coancestry and inbreeding in a closed herd of ancient Iberian pigs. Plos One

8(10): e78314.

Seal US. (1990). Mexican wolf (Canis lupus baileyi) population viability assessment.

Review draft report of workshop held 22-24 October, 1990, at Fossil Rim

Wildlife Center, Texas, USA.

Silió L, Rodríguez MC, Fernández A, et al. (2013). Measuring inbreeding and inbreeding

depression on pig growth from pedigree or SNP-derived metrics. Journal of

Animal Breeding and Genetics 130(5): 349-360.

Siminski, DP. (2011). Mexican wolf, Canis lupus baileyi, international studbook, 2011.

The Living Desert, Palm Desert, California, USA.

Slate J, David P, Dodds KG, et al. (2004). Understanding the relationship between the

inbreeding coefficient and multilocus heterozygosity: theoretical expectations and

empirical data. Heredity 93(3): 255-265.

Sonesson AK, Janss LLG & Meuwissen THE. (2003). Selection against genetic defects

in conservation schemes while controlling inbreeding. Genetics Selection

Evolution 35(4): 353-368.

Szpiech ZA, Xu JS, Pemberton TJ, et al. (2013). Long runs of homozygosity are enriched

for deleterious variation. American Journal of Human Genetics 93(1): 90-102. 203

U.S. Fish and Wildlife Service [USFWS]. (2012). Mexican wolf Blue Range

reintroduction project statistics. U.S. Fish and Wildlife Service, Downloaded

from: http://www.fws.gov/southwest/es/mexicanwolf/.

Vogl C, Karhu A, Moran G, et al. (2002). High resolution analysis of mating systems:

inbreeding in natural populations of Pinus radiata. Journal of Evolutionary

Biology 15(3): 433-439. vonHoldt BM, Pollinger JP, Earl DA, et al. (2011). A genome-wide perspective on the

evolutionary history of enigmatic wolf-like canids. Genome Research 21(8):

1294-1305.

Wang J. (2011). A new likelihood estimator and its comparison with moment estimators

of individual genome-wide diversity. Heredity 107(5): 433-443.

Wang S, Haynes C, Barany F, et al. (2009). Genome-wide autozygosity mapping in

human populations. Genetic Epidemiology 33(2): 172-180.

Wong AK, Ruhe AL, Dumont BL, et al. (2010). A comprehensive linkage map of the dog

genome. Genetics 184(2): 595-U436.

Wright S. (1921). Systems of mating. I. The biometric relations between parent and

offspring. Genetics 6(2): 111-123.

Wright S. (1922). Coefficients of inbreeding and relationship. American Naturalist 56:

330-338. 204

Table D.1: Parameters used in VORTEX for population simulations. The asterisk indicates the parameter that varied between MW-RND (0) and MW-MK (1).

Simulation Settings Number of iterations/simulations 500 Number of years 35 Number of days 365 Number of populations 1 Species Description Inbreeding depression? TRUE Lethal equivalents 1.7 Percent due to recessive lethals 50 EV concordance? 1 Number of types of catastrophes 0 EV correlation 0.5 Reproductive System Polygamous? TRUE Hermaphroditic? FALSE Long term monogamous? FALSE Long term polygamous? FALSE Age of first reproduction for females 2 Age of first reproduction for males 1 Maximum age of reproduction 12 Maximum number litters per year 1 Maximum number progeny per litter 10 Sex ratio 50 Density dependent reproduction FALSE Reproductive Rates % Adult females breeding / EV in % breeding 50,12 Use mean and SD? TRUE Mean/standard deviation 5,2 Distribution of litters 50,50 Mortality Mortality Females {"40": "10": "20": "5": "15": "4"} Mortality Males {"40": "10": "20": "5"} Monopolization % Males in Breeding Pool / % Males Successfully Siring Offspring / % Mean Offspring/Successful Sire {"50": "28.3": "1.5"} Initial Population Size Initial population size 7 Female ages {"1": "1": "0": "1": "0": "0": "0": "0": "0": "1": "0": "0"} Male ages {"1": "1": "0": "0": "1": "0": "0": "0": "0": "0": "0": "0"} Carrying Capacity Carrying capacity (k) / SD in K Due to EV 1000,0 Trend in K? / Over how many years? / % annual increase decrease? FALSE Genetic Management NumNeutralLoci? 10 BreedMaintainK? 0 PreventMatingsF? 0 205

PairMeanKinship? 0/1* PairMeanKinshipDynamic? FALSE PairMeanKinshipStatic? TRUE

206

Table D.2: Summary of the comparisons between the two genetic estimators (FHET and FROH) of inbreeding with that estimated from the pedigree (FPED) for the different datasets. Each comparison is represented by the sample size (n), the Pearson product moment correlation (r) and its p-value (p), and the root mean squared error (RMSE).

Population Estimator n r p RMSE

MW FHET 80 0.072 0.528 0.363 observed data FROH 80 0.897 < 0.001 0.086 MW FHET 80 0.134 0.235 0.357 simulated data FROH 80 ~1.00 < 0.001 0.021 F 221199 0.610 < 0.001 0.244 MW-MK HET FROH 221199 0.929 < 0.001 0.045 F 231889 0.591 < 0.001 0.292 MW-RND HET FROH 231889 0.911 < 0.001 0.060

207

Figure D.1: A) Comparison of the proportion of total ROHs in different length categories and B) number vs. total length of ROHs for individual wolves. The horizontal and vertical dashed lines indicate the maximum values for known unrelated CL wolves with FPED = 0. The regression line was generated from these same CL wolves. MB = McBride, GR = Ghost Ranch, AR = Aragón, CL = cross-lineage

 +         

                                           





                    

 " #                                 

%*%&     

   

       +          ! 

 

$%&  '  ()

208

Figure D.2: Observed measures of FHET (black line) and FROH (red line) for individual wolves sorted according to increasing FPED (colored circles). MB = McBride, GR = Ghost Ranch, AR = Aragón, CL = cross-lineage

●●●● ● MB ● GR ●● 0.5 ● AR ● CL

● ●●●●●●●● ●●●● ●●●●●● ●●● ●●●●●●●●●● ●●●● ●●●●●●●●● ●●●●●●●●●●●●●●● ● ●●●● ●

●●●●●●●● 0.0 F − 0.5

0 20 40 60 80

Individuals

209

Figure D.3: Comparison of 95% confidence intervals for FHET (black region) and FROH (gray region) derived from 200 iterations of simulated genotype data with the observed FPED (red line). Individuals are sorted along the x-axis according to increasing FPED.







 

  

  

  

  

    

210

Figure D.4: Density plots of the genetic estimators of inbreeding FHET and FROH compared with true inbreeding known from the pedigree, FPED, for individuals in the simulated populations bred under a minimizing mean kinship approach (MW-MK; A and B) and using random mating approach (MW-RND; C and D). The solid black line represents the hypothetical 1:1 ratio between measures, whereas the dashed line indicates the observed regression through all points. The scale bars on the right indicate the relative density of points on the plot.

211

APPENDIX E

A GENOME-WIDE SCAN FOR SELECTION AND AN ASSOCIATION STUDY OF GENES INVOLVED IN FITNESS IN THE ENDANGERED MEXICAN WOLF (Canis lupus baileyi): IMPLICATIONS FOR MANAGEMENT

FITAK, ROBERT R.

Genetics Graduate Interdisciplinary Program, The University of Arizona, Tucson, AZ

85721, USA

COAUTHORS: RINKEVICH, S. and CULVER, M.

This paper was prepared to submit to a journal.

Corresponding Address:

Robert R. Fitak

317 Biosciences East

1311 E 4th St

Tucson, AZ, 85721, USA

Tel: 520-626-1636

Fax: 520-621-8801

Email: [email protected]

Keywords: single nucleotide polymorphisms, gray wolves, conservation genomics, Arizona, New

Mexico

212

ABSTRACT

As a result of declining biodiversity, ex situ, or captive breeding, has become an important component of conserving endangered species. Properly managing captive populations can be difficult, however, because they commonly start with relatively few founders, lack preliminary genetic data, adapt to captivity, become inbred, and/or fail to reproduce. Genetic methods often are employed in captive breeding programs to discern relationships between individuals and monitor the loss of genetic variation. Recently, the ability to select individuals to breed based upon surveys of genetic markers has been proposed. In this study, we used a dataset containing 68,147 single nucleotide polymorphisms (SNPs) to detect loci under selection in the Mexican wolf population.

We also performed an association study to identify SNPs potentially linked to genes responsible for variation in fitness. Our results found 22 loci under selection, potentially linked to genes known to cause disease in humans. One locus in particular was known to cause infertility in mice. We also identified an association of SNPs on chromosome 13 to juvenile survival and litter size. One of the genes identified in this study had previously been implicated in the rejection of fetuses in humans during pregnancy; however, neither association was statistically significant after correction for multiple testing. Our results suggest that using genome-wide surveys of genetic markers has the potential to be useful for identifying adaptive and detrimental variation in the endangered Mexican wolf, which could be managed accordingly to promote growth of the reintroduced population.

213

INTRODUCTION

It is evident that the earth’s biodiversity is being lost at an alarming rate and this pace does not appear to be diminishing (Butchart et al. 2010). Often, in situ conservation is practiced as the most effective means of conserving biodiversity, however, species are frequently encountered that can no longer be conserved in their natural habitat.

Therefore, ex situ conservation (captive breeding programs) has also become an important component of conservation biology (Frankham et al. 2009, Witzenberger and

Hochkirch 2011). For captive breeding programs, the maintenance of genetic diversity and demographic integrity are the primary concerns (Lacy 1994; Ballou and Lacy 1995).

This can be challenging because captive populations commonly start with relatively few founders, lack preliminary genetic data, adapt to captivity, become inbred, and/or fail to reproduce (Leberg and Firmin 2008; Frankham et al. 2009). Consequently, it is critical to incorporate genetic data from pedigrees and molecular markers to strengthen the effectiveness of captive breeding programs (Lacy 1994; Ballou and Lacy 1995).

The primary goal of any captive breeding program is essentially to ‘stop evolution’, or rather minimize the genetic change from that of their wild ancestors (Lacy

2009). Breeding strategies that minimize kinship between individuals in the population are effective in maximizing the retention of genetic variation and minimizing inbreeding

(Ballou and Lacy 1995; Montgomery et al. 1997). Unfortunately, individuals in managed populations can accumulate levels of inbreeding potentially reducing fitness (inbreeding depression; Hedrick and Kalinowski 2000; Seymour et al. 2001; Leberg and Firmin

2008) and can adapt to captivity in as little as one generation (Christie et al. 2012).

Therefore, in addition to minimizing kinship, identifying genetic components of reduced 214 fitness and adaptation may be beneficial when selecting breeding pairs and individuals to release in captive breeding programs.

The process through which managers choose individuals to mate on the basis of molecular genetic markers is termed marker-assisted selection (MAS; Wang and Hill

2000; Pertoldi et al. 2010). MAS is often employed in agricultural settings to promote crops and/or livestock lineages with certain favorable characteristics (Ribaut and

Hoisington 1998; Dekkers 2004; Williams 2005). In livestock, the application of MAS has improved a variety of desired characteristics, including reproduction, growth, meat quality and yield, muscle development, and milk production (see Goddard and Hayes

2009 for an extensive review). MAS may also be important in captive populations of wild species to select individuals that minimize the average probability of identity by descent, potentially increasing effective population size and genetic variation (Pertoldi et al. 2010). Captive populations of endangered species often encounter high frequencies of inherited deleterious conditions (see Laikre 1999 for a review) such as blindness in captive gray wolves (Canis lupus) in Scandinavian zoos (Laikre et al. 1993) and chondrodystrophy in the California condor (Gymnogyps californianus; Romanov et al.

2006; 2009). Using MAS, especially genome-wide markers like single nucleotide polymorphisms (SNPs), it is possible to identify such detrimental traits and eventually reduce their frequency or remove them from the population (Kohn et al. 2006). There are important trade-offs, however, when implementing MAS. For instance, selecting individuals using MAS may not maximize effective population size or minimize the loss of neutral genetic variation. Nonetheless, genome-wide surveys may help identify loci under selection, which contribute to adaptation in a captive population. Because these 215 selected loci can often reduce fitness in a wild population (Lynch and O’Hely 2001), their identification and subsequent reduction is beneficial.

One endangered gray wolf subspecies that may be susceptible to the aforementioned concerns is the Mexican wolf (Canis lupus baileyi). By the mid 1980’s the Mexican wolf was extirpated from its historical range and only existed in three separate captive populations (Hedrick et al. 1997). These populations were maintained as three separate lineages, McBride (MB), Aragón (AR), and Ghost Ranch (GR), each accumulating substantial levels of inbreeding and were subsequently merged in 1995

(although a portion of MB was kept independent). In 1998, the first reintroductions of

Mexican wolves into a recovery area in Arizona and New Mexico began. Unfortunately, the reintroduced population size has remained below expectations (USFWS 2010). A variety of factors, notably high mortality through disease, managed removal, illegal shooting, and vehicle collisions have contributed to this low growth rate (USFWS 2012).

Another concern is low fitness, measured by small litter size and low juvenile survival.

For example, litter size in the reintroduced Mexican wolf population averaged 2.1 pups, much lower than that of captive Mexican wolves (4.6 pups) and historical wild Mexican wolves (4.5 pups; McBride 1980; AMOC and IFT 2005). Furthermore, wolves descended from cross-lineage parents have a higher fitness than wolves from the captive

McBride population (Fredrickson et al. 2007). Increasing fitness, such as through increased litter size and juvenile survival, is likely to have important consequences for the success of the wild, reintroduced population (USFWS 2010).

In this study, we used a dataset containing >68,000 SNPs to scan the Mexican wolf’s genome for signals of selection. Additionally, we gathered fitness traits for each 216 wolf in the study and performed an association study to identify loci contributing to differences in fitness between individuals. We used our results to suggest the potential for such loci to contribute to increasing fitness in the reintroduced population, thus providing an important component to establishing a robust wild population.

MATERIALS AND METHODS

Genetic and phenotypic data

We used the 68K dataset described in Appendix C. This dataset contained 80

Mexican wolves and their genotypes for 68,147 high quality SNPs after quality control and filtering. The 80 samples were composed of wolves from MB (n = 31), AR (n = 2),

GR (n = 6), and of mixed ancestry (n = 41). We calculated two fitness traits for each wolf using the official species studbook (Siminski 2011): i) juvenile survival and ii) litter size. Juvenile survival was defined as the proportion of an individual’s offspring that survive to one year of age and litter size was defined as the average number of offspring per litter per individual. We were able to measure fitness traits in 48 of the 80 wolves using the pedigree. It must be mentioned that for wolves born in the wild, reintroduced population parentage is often determined using microsatellite markers by the U.S. Fish and Wildlife Service and litter size is often measured after pups emerge from the den. It is possible the number of offspring at the time of birth may differ from the number that emerged from the den. The calculation of these traits was adapted from those presented by Fredrickson et al. (2007). We controlled for independence of the two fitness traits by testing for significant correlation using the cor.test function in R (www.r-project.org).

217

Scan for FST outlier loci

We used a Bayesian approach implemented in BAYESCAN v2.1 (Foll and

Gaggiotti 2008; Fiscer et al. 2011) to detect loci under selection, also known as an FST outlier test. When compared to other software for detecting FST outlier loci,

BAYESCAN had the lowest number of type I and type II errors (false positives and false negatives, respectively; Narum and Hess 2011). The approach in BAYESCAN examines the components of FST that are shared among all loci within a population and those of each specific locus shared among all populations. The Bayesian implementation determines the posterior probability for each locus under a model with and without selection. Selection is inferred when the posterior odds for the model with selection is greater than 10 (Fischer et al. 2011). Our approach used 20 pilot runs of a Markov Chain

Monte Carlo method with 5,000 iterations each to adjust the proposal distribution, and a final run of a 50,000 iteration burn-in followed by an additional 50,000 iterations. For significance, we calculated the false discovery rate (FDR; Benjamini and Hochberg 1995) and report loci with an FDR < 0.05 (recommendation by BAYESCAN). In other words,

5% of the significant loci are expected to be false positives.

Genome-wide association mapping of fitness traits

We used the genotype data to test each locus for associations with the two phenotypes described above using the software GAPIT (Lipka et al. 2012). GAPIT incorporates a method called efficient mixed-model association (EMMA), designed to perform association mapping of quantitative traits in model organisms to account for high population substructure and relatedness between individuals (Kang et al. 2008). The 218 software uses a kinship matrix (matrix of identity by descent values) to correct for closely related individuals, and then implements a principle components algorithm to account for population substructure within the samples. The resulting P values were corrected for multiple tests using the FDR procedure described above.

Gene identification and comparison

For both loci under selection and those mapping to our traits of interest, we identified the known genes in that region using the ENSEMBL dog genome browser

(www.ensembl.org). Briefly, for a single locus we included all genes found within 100 kb up and downstream, and for multiple loci within the same region we included all genes within the region and those within 100 kb of flanking sequence. Descriptions for known genes were supplemented with information from the ENTREZ

(www.ncbi.nlm.nig.gov/entrez) and UNIPROT (www.uniprot.org) databases. We determined enrichment for particular functional classes of genes using gene ontologies

(GO terms) in the web-based tool WEBGESTALT (Zhang et al. 2005). We tested for statistical significance using a hypergeometric test and corrected for multiple comparisons using the FDR.

RESULTS

Scan for FST outlier loci

We used a Bayesian analysis to identify loci significantly more differentiated than expected across four putative ‘populations’ of Mexican wolves (MB, AR, GR, and mixed wolves). From the 68K dataset we discovered 54 loci whose posterior odds for a model 219 accounting for positive selection as the best explanation of the observed differentiation were >10. FST for these loci ranged between 0.51 and 0.53. After correction for multiple testing using a FDR < 0.05 only 22 loci remained (mean FDR = 0.040; Figure E.1). We mapped these 22 loci to the genome of the domestic dog and found that seven different chromosomal regions were represented. The longest region contained 10 SNPs considered to be outliers and covered ~2.1 Mb. Using the seven chromosomal regions covered by these SNPs and their 100 kb flanking regions we identified 48 genes as potential candidates for selection (Table E.1). In four cases the gene had been implicated in human disease. One gene, ZAR1L, is similar to ZAR1, which is expressed in all vertebrate ovaries examined and was described as important for fertility in mice (Wu et al. 2003a, b). The 48 genes contained 17 (35.4%) coding for various RNA molecules and

31 (64.6%) protein coding genes. An analysis of ontology found 29 GO terms associated with the 31 protein coding genes (Table E.2). The most significant enrichment of genes under selection was for genes involved in signal transduction in response to DNA damage (first three rows of Table E.2; FDR = 0.057).

Genome-wide association mapping of fitness traits

We measured two fitness traits in 48 Mexican wolves using the species studbook

(Siminski 2011): juvenile survival and litter size. Mean juvenile survival and litter size were 0.73 ± 0.037 and 3.76 ± 0.25, respectively. We observed no statistical correlation between the two traits (Pearson’s correlation coefficient r = -0.088; P = 0.55). For juvenile survival, 3,790 SNPs were associated with the trait (Figure E.2A, P < 0.05), however, after correction for multiple comparisons using FDR no loci remained (Figure 220

E.2C). The most significant loci mapping to the trait were three SNPs in a ~270 kb region of chromosome 13 (P = 0.00095; FDR = 0.83; Figure E.2B). The correlation coefficient, r2, for a model with these three SNPs is 0.30, much greater than the model excluding these SNPs (r2 = 0.024). The effect size for these loci, however, was quite small (~0.4%). This particular region of chromosome 13 contained the canine homologs of four genes: RNF139, TATDN1, NDUFB9, and MTSS1 (Table E.3).

Similar to the juvenile survival trait, we also identified a large number of SNPs associated with litter size in Mexican wolves (3,618 SNPs, P < 0.05; Figure E.3A).

However, none of these were significant after correction for multiple comparisons (FDR

> 0.92; Figure E.3C). Surprisingly, the most significant region that mapped to the trait was also on chromosome 13 (P = 0.00035; Figure E.3B). The r2 was also much greater for a model including these SNPs (r2 = 0.32) than excluding them (r2 = -0.031).

Compared with the juvenile mortality phenotype, the effect size was much larger

(~3.7%). This region of chromosome 13 contained 23 SNPs and covered ~1.5 Mb. The

ENSEMBL browser had 11 genes annotated in this region, nine of which coded for proteins (Table E.3). One gene, RSPO2, had previously been implicated as responsible for moustache and eyebrow ‘furnishings’ in wire-haired domestic dogs (Cadieu et al.

2009; Boyko et al. 2010). A second gene, EBAG9 (or RCAS1), was described in the estrogen-response pathway (Watanabe et al. 1998) and has been implicated in a variety of cancers (Oizumi et al. 2002; Rousseau et al. 2002).

DISCUSSION: 221

In this study we performed scans for selection and association for two phenotypic traits related to fitness using a genome-wide panel of SNPs. We were able to identify several loci under positive, directional selection containing genes enriched for pathways responding to DNA damage. Because linkage disequilibrium (LD) is considerably high

2 in the Mexican wolf (r 0.5 > 1 Mb; vonHoldt et al. 2011), it is possible that the selection of 100 kb of flanking sequence may not capture the locus under selection. More so, calculating FST between populations when sample size is quite low (AR = 2, GR = 6) can be problematic (Holsinger and Weir 2009). However, the method implemented in

BAYESCAN is able to incorporate fluctuations in allele frequency due to small sample size with little or no bias, at the cost of reduced power (Foll and Gaggiotti 2008; Fischer et al. 2011). Additionally, recent work has demonstrated that large sample sizes, e.g. <

20, are not necessary when FST > 0.05 (Kalinowski 2005), and small sample sizes can be compensated for by using large numbers of markers (>1,000; Willing et al. 2012; also see

Morin et al. 2009).

A large number of the genes found in the positively selected regions were enriched for processes involved in the generation of energy and implicated in human diseases like cancer and Down’s syndrome. These could be loci with highly deleterious alleles, where inbreeding and differential selection are efficient at purging the detrimental variants, inflating FST estimates (see García-Dorado 2008). One gene we identified on chromosome 25, ZAR1L, is such a candidate. ZAR1L, and its homolog ZAR1 are known to be important for the proper transition of an oocyte to an embryo and mice homozygous for a particular ZAR1L allele are infertile (Wu et al. 2003a; Hu et al. 2010). Our study also indicated positive selection in a region containing the gene BRCA2, which is 222 important in DNA repair and can harbor mutations greatly increasing the risk of breast cancer in humans (Duncan et al. 1998). In humans, inbreeding has been shown to increase the risk of breast cancer (Rudan 1999), most likely by doubling the dose of a cancer-susceptibility allele. Our results demonstrated the potential to discover loci under selection independent of a known phenotype, however extensive sequencing of these candidate loci and surveys of these diseases in Mexican wolves relative to their more outbred counterparts would be necessary to validate these findings. It must be noted, that the extensive bottleneck experienced by Mexican wolves can result in an allele frequency pattern similar to that of selection. Therefore, differentiating between the effects of selection and drift can be difficult. We suggest that even if FST is inflated for a particular locus as a result of drift, these loci may also serve as candidates for future study. The addition of other, independent tests such as Tajima’s D may corroborate these findings if the same regions are detected.

Using two quantitative phenotypes related to fitness, we were unable to detect any significant associations. The lack of significant findings is likely a result of two factors: i) the small sample size for which phenotypes are available and ii) the potential for a large number of loci each contributing to a small amount of variance in the trait. In domestic dogs, Karlsson et al. (2007) were able to map candidate loci involved in two morphological phenotypes using only 20 individuals, however these phenotypes were considered to be Mendelian traits and a result of a single locus. Traits like fitness, which are quantitative in nature, are more likely influenced by a large number of rare alleles

(Eyre-Walker 2010). Nonetheless, for both traits we did detect strong associations between regions of chromosome 13 and fitness. One candidate gene, RSPO2, had 223 previously been associated with eyebrow and moustache ‘furnishings’ in domestic dogs

(Cadieu et al. 2009; Boyko et al. 2010). A second candidate, EBAG9, contains a motif that binds the estrogen receptor (Watanabe et al. 1998). In addition to being associated with several types of tumors (Umeoka et al. 2001; Oizumi et al. 2002; Rousseau et al.

2002; Tsuneizumi et al. 2002; Sonoda et al. 2003; Takahashi et al. 2003), reduced expression of this gene can increase activation of natural killer cells, thereby destroying uterine glands and rejecting the fetus during human pregnancies (Ohshima et al. 2001). It is possible an allele affecting the activity or expression of this gene could impact the retention of fetuses in Mexican wolves. Low pup recruitment has been of large concern in the reintroduced Mexican wolf population (USFWS 2010). Our results suggested a potential to use MAS to select captive wolves for reintroduction that increase the frequency of the advantageous allele; potentially increasing litter size and pup recruitment.

In this study we identified candidate regions in the Mexican wolf’s genome associated with both positive selection and fitness traits. These regions are currently the only candidates, and more focused genotyping of select SNPs in a larger number of individuals and resequencing of candidate regions will assist in proper validation of these loci. This study does indicate the potential to discover loci associated with traits important for the conservation and management of Mexican wolves. Although the sample size made uncovering significant associations very difficult unless the association was substantial, the most significant candidate region did contain a result with a quite fitting biological context (EBAG9). Therefore, despite a lack of statistically significant results, this study merely identified candidates that serve as targets for future studies. 224

Targeted genotyping of candidate genes in a larger sample of Mexican wolves will provide a cost-effective means of further investigating whether or not the associations were spurious results. Any significant associations can then be examined in the context of MAS. Because the Mexican wolf is an extensive, well-managed captive population, it provides an excellent opportunity to empirically test the potential for MAS. Also, another method, called genome prediction, may be useful for captive populations like the

Mexican wolf. Genome prediction utilizes dense SNP data to predict quantitative traits in offspring (Goddard and Hayes 2009). This method may facilitate the selection of

Mexican wolf offspring to breed or release into the wild that are statistically likely to benefit a particular trait. By continuing to genotype individuals and collect additional reproductive and other phenotype information, managers eventually may be able to shift the frequency of certain alleles in favor of promoting increased growth and survival in the wild population. However, the potential benefits of MAS need to be assessed in the context of crossing individuals that may not maximize the effective population size nor minimize the loss of genetic variation, because it represents a separate strategy to selecting mating pairs for captive breeding programs.

225

REFERENCES

Adaptive Management Oversight Committee and Interagency Field Team [AMOC and

IFT]. (2005). Mexican wolf Blue Range reintroduction project 5-year review.

U.S. Fish and Wildlife Service Southwest Region, Albuquerque, New Mexico,

USA.

Ballou JD & Lacy RC. (1995). Identifying genetically important individuals for

management of genetic variation in pedigreed populations. In Population

management for survival and recovery: analytical methods and strategies in small

population conservation (Ballou, J. D., Gilpin, M. E. & Foose, T. J., eds.), pp. 76-

111. Columbia University Press, New York, New York, USA.

Benjamini Y & Hochberg Y. (1995). Controlling the false discovery rate - a practical and

powerful approach to multiple testing. Journal of the Royal Statistical Society

Series B-Methodological 57(1): 289-300.

Boyko AR, Quignon P, Li L, et al. (2010). A simple genetic architecture underlies

morphological variation in dogs. PLoS Biology 8(8): e1000451.

Butchart SHM, Walpole M, Collen B, et al. (2010). Global biodiversity: Indicators of

recent declines. Science 328(5982): 1164-1168.

Cadieu E, Neff MW, Quignon P, et al. (2009). Coat variation in the domestic dog is

governed by variants in three genes. Science 326(5949): 150-153.

Christie MR, Marine ML, French RA, et al. (2012). Genetic adaptation to captivity can

occur in a single generation. Proceedings of the National Academy of Sciences of

the United States of America 109(1): 238-242. 226

Dekkers JCM. (2004). Commercial application of marker- and gene-assisted selection in

livestock: strategies and lessons. Journal of Animal Science 82(13): E313-E328.

Duncan JA, Reeves JR & Cooke TG. (1998). BRCA1 and BRCA2 proteins: roles in

health and disease. Journal of Clinical Pathology-Molecular Pathology 51(5):

237-247.

Eyre-Walker A. (2010). Genetic architecture of a complex trait and its implications for

fitness and genome-wide association studies. Proceedings of the National

Academy of Sciences of the United States of America 107(Suppl 1): 1752-1756.

Fischer MC, Foll M, Excoffier L, et al. (2011). Enhanced AFLP genome scans detect

local adaptation in high-altitude populations of a small rodent (Microtus arvalis).

Molecular Ecology 20(7): 1450-1462.

Foll M & Gaggiotti O. (2008). A genome-scan method to identify selected loci

appropriate for both dominant and codominant markers: a Bayesian perspective.

Genetics 180(2): 977-993.

Frankham R, Ballou JD & Briscoe DA. (2009). Introduction to conservation genetics,

Cambridge University Press, Cambridge, UK.

Fredrickson RJ, Siminski P, Woolf M, et al. (2007). Genetic rescue and inbreeding

depression in Mexican wolves. Proceedings of the Royal Society B: Biological

Sciences 274(1623): 2365-2371.

García-Dorado A. (2008). A simple method to account for natural selection when

predicting inbreeding depression. Genetics 180(3): 1559-1566.

Hedrick PW & Kalinowski ST. (2000). Inbreeding depression in conservation biology.

Annual Review of Ecology and Systematics 31: 139-162. 227

Hedrick PW, Miller PS, Geffen E, et al. (1997). Genetic evaluation of the three captive

Mexican wolf lineages. Zoo Biology 16(1): 47-69.

Holsinger KE & Weir BS. (2009). Genetics in geographically structured populations:

defining, estimating and interpreting FST. Nature Reviews Genetics 10(9): 639-

650.

Hu J, Wang F, Zhu X, et al. (2010). Mouse ZAR1-like (XM_359149) colocalizes with

mRNA processing components and its dominant-negative mutant caused two-cell-

stage embryonic arrest. Developmental Dynamics 239(2): 407-424.

Kalinowski ST. (2005). Do polymorphic loci require large sample sizes to estimate

genetic distances? Heredity 94(1): 33-36.

Kang HM, Zaitlen NA, Wade CM, et al. (2008). Efficient control of population structure

in model organism association mapping. Genetics 178(3): 1709-1723.

Karlsson EK, Baranowska I, Wade CM, et al. (2007). Efficient mapping of mendelian

traits in dogs through genome-wide association. Nature Genetics 39(11): 1321-

1328.

Kohn MH, Murphy WJ, Ostrander EA, et al. (2006). Genomics and conservation

genetics. Trends in Ecology & Evolution 21(11): 629-637.

Lacy RC. (1994). Managing genetic diversity in captive populations of animals. In

Restoration and recovery of endangered plants and animals (Bowles, M. L. &

Whelan, C. J., eds.), pp. 63-89. Cambridge University Press, Cambridge, UK.

Lacy RC. (2009). Stopping evolution. In Conservation genetics in the age of genomics

(Amato, G., DeSalle, R., Ryder, O. A. & Rosenbaum, H. C., eds.), pp. 58-81.

Columbia University Press, New York, New York, USA. 228

Laikre L. (1999). Hereditary defects and conservation genetic management of captive

populations. Zoo Biology 18(2): 81-99.

Laikre L, Ryman N & Thompson EA. (1993). Hereditary blindness in a captive wolf

(Canis lupus) population - frequency reduction of a deleterious allele in relation to

gene conservation. Conservation Biology 7(3): 592-601.

Leberg PL & Firmin BD. (2008). Role of inbreeding depression and purging in captive

breeding and restoration programmes. Molecular Ecology 17(1): 334-343.

Lipka AE, Tian F, Wang QS, et al. (2012). GAPIT: genome association and prediction

integrated tool. Bioinformatics 28(18): 2397-2399.

Lynch M & O'Hely M. (2001). Captive breeding and the genetic fitness of natural

populations. Conservation Genetics 2(4): 363-378.

McBride RT. (1980). The Mexican wolf (Canis lupus baileyi): a historical review and

observations on it status and distribution. U.S. Fish and Wildlife Service

Southwest Region, Albuquerque, New Mexico, USA.

Montgomery ME, Ballou JD, Nurthen RK, et al. (1997). Minimizing kinship in captive

breeding programs. Zoo Biology 16(5): 377-389.

Morin PA, Martien KK & Taylor BL. (2009). Assessing statistical power of SNPs for

population structure and conservation studies. Molecular Ecology Resources 9(1):

66-73.

Narum SR & Hess JE. (2011). Comparison of FST outlier tests for SNP loci under

selection. Molecular Ecology Resources 11(Suppl 1): 184-194. 229

Ohshima K, Nakashima M, Sonoda K, et al. (2001). Expression of RCAS1 and FasL in

human trophoblasts and uterine glands during pregnancy: the possible role in

immune privilege. Clinical and Experimental Immunology 123(3): 481-486.

Oizumi S, Yamazaki K, Nakashima M, et al. (2002). RCAS1 expression: A potential

prognostic marker for adenocarcinomas of the lung. Oncology 62(4): 333-339.

Pertoldi C, Wójcik JM, Tokarska M, et al. (2010). Genome variability in European and

American bison detected using the BovineSNP50 BeadChip. Conservation

Genetics 11(2): 627-634.

Ribaut JM & Hoisington D. (1998). Marker-assisted selection: new tools and strategies.

Trends in Plant Science 3(6): 236-239.

Romanov MN, Koriabine M, Nefedov M, et al. (2006). Construction of a California

condor BAC library and first-generation chicken-condor comparative physical

map as an endangered species conservation genomics resource. Genomics 88(6):

711-718.

Romanov MN, Tuttle EM, Houck ML, et al. (2009). The value of avian genomics to the

conservation of wildlife. BMC Genomics 10(Suppl 2): S10.

Rousseau J, Tetu B, Caron D, et al. (2002). RCAS1 is associated with ductal breast

cancer progression. Biochemical and Biophysical Research Communications

293(5): 1544-1549.

Rudan I. (1999). Inbreeding and cancer incidence in human isolates. Human Biology

71(2): 173-187. 230

U.S. Fish and Wildlife Service [USFWS]. (2010). Mexican wolf conservation

assessment. U.S. Fish and Wildlife Service Southwest Region, Albuquerque, New

Mexico, USA.

U.S. Fish and Wildlife Service [USFWS]. (2012). Mexican wolf Blue Range

reintroduction project statistics. U.S. Fish and Wildlife Service, Downloaded

from: http://www.fws.gov/southwest/es/mexicanwolf/.

Seymour AM, Montgomery ME, Costello BH, et al. (2001). High effective inbreeding

coefficients correlate with morphological abnormalities in populations of South

Australian koalas (Phascolarctos cinereus). Animal Conservation 4(3): 211-219.

Siminski, DP. (2011). Mexican wolf, Canis lupus baileyi, international studbook, 2011.

The Living Desert, Palm Desert, California, USA.

Sonoda K, Miyamoto S, Hirakawa T, et al. (2003). Association between RCAS1

expression and clinical outcome in uterine endometrial cancer. British Journal of

Cancer 89(3): 546-551.

Takahashi S, Urano T, Tsuchiya F, et al. (2003). EBAG9/RCAS1 expression and its

prognostic significance in prostatic cancer. International Journal of Cancer

106(3): 310-315.

Tsuneizumi MK, Nagai H, Harada H, et al. (2002). A highly polymorphic CA repeat

marker at the EBAG9/RCAS1 locus on 8q23 that detected frequent multiplication

in breast cancer. Annals of Human Biology 29(4): 457-460.

Umeoka K, Sanno N, Oyama K, et al. (2001). Immunohistochemical analysis of RCAS1

in human pituitary adenomas. Modern Pathology 14(12): 1232-1236. 231 vonHoldt BM, Pollinger JP, Earl DA, et al. (2011). A genome-wide perspective on the

evolutionary history of enigmatic wolf-like canids. Genome Research 21(8):

1294-1305.

Wang JL & Hill WG. (2000). Marker-assisted selection to increase effective population

size by reducing Mendelian segregation variance. Genetics 154(1): 475-489.

Watanabe T, Inoue S, Hiroi H, et al. (1998). Isolation of estrogen-responsive genes with a

CpG island library. Molecular and Cellular Biology 18(1): 442-449.

Williams JL. (2005). The use of marker-assisted selection in animal breeding and

biotechnology. Revue Scientifique Et Technique-Office International Des

Epizooties 24(1): 379-391.

Willing EM, Dreyer C & van Oosterhout C. (2012). Estimates of genetic differentiation

measured by FST do not necessarily require large sample sizes when using many

SNP markers. Plos One 7(8): e42649.

Witzenberger KA & Hochkirch A. (2011). Ex situ conservation genetics: a review of

molecular studies on the genetic consequences of captive breeding programmes

for endangered animal species. Biodiversity and Conservation 20(9): 1843-1861.

Wu XM, Viveiros MM, Eppig JJ, et al. (2003). Zygote arrest 1 (Zar1) is a novel

maternal-effect gene critical for the oocyte-to-embryo transition. Nature Genetics

33(2): 187-191.

Wu XM, Wang P, Brown CA, et al. (2003). Zygote arrest 1 (Zar1) is an evolutionarily

conserved gene expressed in vertebrate ovaries. Biology of Reproduction 69(3):

861-867. 232

Zhang B, Kirov S & Snoddy J. (2005). WebGestalt: an integrated system for exploring

gene sets in various biological contexts. Nucleic Acids Research 33(Suppl 2):

W741-W748.

233

Table E.1: A list of the genes found within the regions covered by the 22 statistical FST outlier loci, ordered by chromosome number (CHR). Gene symbols indicate names given in UNIPROT to usually the human homolog. RNA abbreviations are rRNA = ribosomal RNA, snRNA = small nuclear RNA, sno RNA = small nucleolar RNA, misc RNA = uncharacterized RNA.

CHR ENSEMBL ID GENE SYMBOL 8 ENSCAFG00000014595 Q9N0W1_CANFA 8 ENSCAFG00000014594 ABHD12B 8 ENSCAFG00000014603 TRIM9 12 ENSCAFG00000001571 LRFN2 12 ENSCAFG00000029736 snRNA 23 ENSCAFG00000025791 rRNA 23 ENSCAFG00000027451 snRNA 23 ENSCAFG00000008387 TSC22D2 23 ENSCAFG00000028582 uncharacterized 23 ENSCAFG00000008407 EIF2A 23 ENSCAFG00000008429 SELT 23 ENSCAFG00000008436 FAM194A 23 ENSCAFG00000029264 FAM188B2 23 ENSCAFG00000008447 CLRN1a 23 ENSCAFG00000008485 MED12L 25 ENSCAFG00000006379 N4BP2L1 25 ENSCAFG00000006383 BRCA2b 25 ENSCAFG00000031981 ZAR1L 25 ENSCAFG00000025580 FRY 31 ENSCAFG00000009702 PIGPc 31 ENSCAFG00000009771 TTC3 31 ENSCAFG00000026294 snRNA 31 ENSCAFG00000021739 snRNA 31 ENSCAFG00000009792 DSCR3c 31 ENSCAFG00000025862 snRNA 31 ENSCAFG00000026922 snRNA 31 ENSCAFG00000009801 DYRK1A 31 ENSCAFG00000004266 uncharacterized 31 ENSCAFG00000009812 KCNJ6 31 ENSCAFG00000020842 misc_RNA 31 ENSCAFG00000028162 snoRNA 32 ENSCAFG00000010215 UNC5C 32 ENSCAFG00000027073 snRNA 32 ENSCAFG00000025783 rRNA 37 ENSCAFG00000012880 ICOS 37 ENSCAFG00000027164 rRNA 234

37 ENSCAFG00000026079 snRNA 37 ENSCAFG00000012891 PARD3B 37 ENSCAFG00000022999 snRNA 37 ENSCAFG00000012907 uncharacterized 37 ENSCAFG00000029423 snoRNA 37 ENSCAFG00000013191 NRP2 37 ENSCAFG00000013309 INO80D 37 ENSCAFG00000013333 NDUFS1d 37 ENSCAFG00000029635 uncharacterized 37 ENSCAFG00000026348 snoRNA 37 ENSCAFG00000021390 snoRNA 37 ENSCAFG00000013362 GPR1 aassociated with Usher’s syndrome (Joensuu et al. 2001) bassociated with early onset breast cancer (Duncan et al. 1998) cassociated with Down’s syndrome (trisomy 21; Nakamura et al. 1997) dassociated with complex I deficiency (Bénit et al. 2001)

235

Table E.2: List of GO term annotations for the 31 protein coding genes found in the chromosomal regions containing SNPs under selection. The categories of GO annotation are biological processes (BP), molecular function (MF) and cellular component (CC). OBS = observed number, EXP = expected number, FDR = false discovery rate

CATEGORY DESCRIPTION OBS EXP FDR BP signal transduction by p53 class mediator 2 0.05 0.0568 signal transduction in response to DNA BP 2 0.04 0.0568 damage DNA damage response, signal transduction BP 2 0.04 0.0568 by p53 class mediator energy derivation by oxidation of organic BP 2 0.1 0.1527 compounds generation of precursor metabolites and BP 2 0.15 0.2613 energy BP nervous system development 4 0.97 0.2674 BP positive regulation of gene expression 2 0.86 0.4231 BP response to abiotic stimulus 2 0.44 0.4231 BP locomotion 2 0.74 0.4231 BP system development 4 2.27 0.4231 MF receptor activity 2 0.92 0.4071 MF transferase activity 3 1.23 0.4071 MF ATP binding 2 0.76 0.4071 MF purine nucleotide binding 2 1 0.4071 MF signal transducer activity 2 0.96 0.4071 MF adenyl ribonucleotide binding 2 0.76 0.4071 MF protein binding 8 6.27 0.4071 MF molecular transducer activity 2 0.96 0.4071 MF ribonucleoside binding 2 0.99 0.4071 MF purine nucleoside binding 2 0.99 0.4071 CC membrane 5 3.26 0.6547 CC protein complex 3 1.67 0.6547 CC plasma membrane 3 1.45 0.6547 CC cell periphery 3 1.53 0.6547 CC cell 9 7.69 0.6547 CC macromolecular complex 4 2.06 0.6547 CC cytoplasmic part 4 3.22 0.9761 CC intracellular organelle 3 5.64 1 CC membrane-bounded organelle 3 5 1

236

Table E.3: A list of the genes found within chromosome (CHR) 13 containing SNPs associated with the two traits studied. Gene symbols indicate names given in UNIPROT to usually the human homolog. RNA abbreviations are miRNA = micro RNA, sno RNA = small nucleolar RNA.

TRAIT CHR ENSEMBL ID GENE SYMBOL 13 ENSCAFG00000029985 RNF139 Juvenile 13 ENSCAFG00000001044 TATDN1 Survival 13 ENSCAFG00000001045 NDUFB9 13 ENSCAFG00000001050 MTSS1 13 ENSCAFG00000000696 RSPO2a 13 ENSCAFG00000000702 EIF3E 13 ENSCAFG00000029583 miRNA 13 ENSCAFG00000000713 uncharacterized 13 ENSCAFG00000000716 TMEM74 Litter Size 13 ENSCAFG00000000717 TRHR 13 ENSCAFG00000000719 NUDCD1 13 ENSCAFG00000024786 PKHD1L1 13 ENSCAFG00000000738 EBAG9 13 ENSCAFG00000000741 SYBU 13 ENSCAFG00000027907 snoRNA aPreviously associated with coat morphology in domestic dogs (Cadieu et al. 2009; Boyko et al. 2010) 237

Figure E.1: FST value for each locus plotted according to decreasing false discovery rate (FDR). The vertical black line indicates a FDR cutoff of 0.05. Outlier loci are shown in red. 



238

Figure E.2: A) Manhattan plot displaying the results of the association mapping for juvenile survival. Not all chromosomes are labeled for the sake of space. B) The same plot just for chromosome 13, containing the most significant SNPs. C) QQ Plot for the expected and observed P-values. The red line indicates the expected relationship, with the gray-shaded region representing the 95% confidence interval.

239

240

Figure E.3: A) Manhattan plot displaying the results of the association mapping for litter size. Not all chromosomes are labeled for the sake of space. B) The same plot just for chromosome 13, containing the most significant SNPs. C) QQ Plot for the expected and observed P-values. The red line indicates the expected relationship, with the gray-shaded region representing the 95% confidence interval.

241

242

COMPLETE LIST OF LITERATURE CITED

Adaptive Management Oversight Committee and Interagency Field Team [AMOC and

IFT]. (2005). Mexican wolf Blue Range reintroduction project 5-year review.

U.S. Fish and Wildlife Service Southwest Region, Albuquerque, New Mexico,

USA.

Albrechtsen A, Nielsen FC & Nielsen R. (2010). Ascertainment biases in SNP chips

affect measures of population divergence. Molecular Biology and Evolution

27(11): 2534-2547.

Alexander DH, Novembre J & Lange K. (2009). Fast model-based estimation of ancestry

in unrelated individuals. Genome Research 19(9): 1655-1664.

Allendorf FW, Hohenlohe PA & Luikart G. (2010). Genomics and the future of

conservation genetics. Nature Reviews Genetics 11(10): 697-709.

Altshuler DM, Durbin RM, Abecasis GR, et al. (2012). An integrated map of genetic

variation from 1,092 human genomes. Nature 491(7422): 56-65.

Alves E, Barragan C & Toro MA. (2008). Inbreeding and homozygosity in Iberian pigs.

Spanish Journal of Agricultural Research 6(2): 248-251.

Anderson CR, Lindzey FG & McDonald DB. (2004). Genetic structure of cougar

populations across the Wyoming Basin: metapopulation or megapopulation.

Journal of Mammalogy 85(6): 1207-1214.

Anderson EC & Dunham KK. (2008). The influence of family groups on inferences made

with the program Structure. Molecular Ecology Resources 8(6): 1219-1229. 243

Andreasen AM, Stewart KM, Longland WS, et al. (2012). Identification of source-sink

dynamics in mountain lions of the Great Basin. Molecular Ecology 21(23): 5689-

5701.

Angeloni F, Wagemaker N, Vergeer P, et al. (2012). Genomic toolboxes for conservation

biologists. Evolutionary Applications 5(2): 130-143.

Ansorge WJ. (2009). Next-generation DNA sequencing techniques. New Biotechnology

25(4): 195-203.

Avise JC. (2009). Perspective: conservation genetics enters the genomics era.

Conservation Genetics 11(2): 665-669.

Baird NA, Etter PD, Atwood TS, et al. (2008). Rapid SNP discovery and genetic

mapping using sequenced RAD markers. Plos One 3(10): e3376.

Ballou JD & Lacy RC. (1995). Identifying genetically important individuals for

management of genetic variation in pedigreed populations. In Population

management for survival and recovery: analytical methods and strategies in small

population conservation (Ballou, J. D., Gilpin, M. E. & Foose, T. J., eds.), pp. 76-

111. Columbia University Press, New York, New York, USA.

Balloux F, Amos W & Coulson T. (2004). Does heterozygosity estimate inbreeding in

real populations? Molecular Ecology 13(10): 3021-3031.

Barbara T, Palma-Silva C, Paggi GM, et al. (2007). Cross-species transfer of nuclear

microsatellite markers: potential and limitations. Molecular Ecology 16(18):

3759-3767.

Beier P. (1991). Cougar attacks on humans in the United States and Canada. Wildlife

Society Bulletin 19(4): 403-412. 244

Beier P. (1995). Dispersal of juvenile cougars in a fragmented habitat. Journal of Wildlife

Management 59(2): 228-237.

Benjamini Y & Hochberg Y. (1995). Controlling the false discovery rate - a practical and

powerful approach to multiple testing. Journal of the Royal Statistical Society

Series B-Methodological 57(1): 289-300.

Biek R, Akamine N, Schwartz MK, et al. (2006). Genetic consequences of sex-biased

dispersal in a solitary carnivore: Yellowstone cougars. Biology Letters 2(2): 312-

315.

Blouin MS, Parsons M, Lacaille V, et al. (1996). Use of microsatellite loci to classify

individuals by relatedness. Molecular Ecology 5(3): 393-401.

Bogan MA & Mehlhop P. (1983). Systematic relationship of gray wolves (Canis lupus)

in southwestern North America. Occasional Papers in the Museum of

Southwestern Biology 1: 1-21.

Bonin A, Bellemain E, Eidesen PB, et al. (2004). How to track and assess genotyping

errors in population genetics studies. Molecular Ecology 13(11): 3261-3273.

Bonin A, Nicole F, Pompanon F, et al. (2007). Population Adaptive Index: a new method

to help measure intraspecific genetic diversity and prioritize populations for

conservation. Conservation Biology 21(3): 697-708.

Bonneaud C, Balenger SL, Zhang J, et al. (2012). Innate immunity and the evolution of

resistance to an emerging infectious disease in a wild bird. Molecular Ecology

21(11): 2628-2639.

Boyko AR, Quignon P, Li L, et al. (2010). A simple genetic architecture underlies

morphological variation in dogs. PLoS Biology 8(8): e1000451. 245

Brisbin A. (2010). Linkage analysis for categorical traits and ancestry assignment in

admixed individuals, Ph.D Dissertation; Cornell University.

Broman KW & Weber JL. (1999). Long homozygous chromosomal segments in

reference families from the Centre d'Étude du Polymorphisme Humain. The

American Journal of Human Genetics 65(6): 1493-1500.

Brooks TM, Mittermeier RA, da Fonseca GAB, et al. (2006). Global biodiversity

conservation priorities. Science 313(5783): 58-61.

Brown DE, Unmack PJ & Brennan TC. (2007). Digitized map of biotic communities for

plotting and comparing distributions of North American animals. Southwestern

Naturalist 52(4): 610-616.

Brown DE. (1983). The wolf in the Southwest: the making of an endangered species, The

University of Arizona Press, Tucson, Arizona, USA.

Browning SR & Browning BL. (2007). Rapid and accurate haplotype phasing and

missing data inference for whole-genome association studies by use of localized

haplotype clustering. American Journal of Human Genetics 81(5): 1084-1097.

Brumfield RT, Beerli P, Nickerson DA, et al. (2003). The utility of single nucleotide

polymorphisms in inferences of population history. Trends in Ecology &

Evolution 18(5): 249-256.

Butchart SHM, Walpole M, Collen B, et al. (2010). Global biodiversity: indicators of

recent declines. Science 328(5982): 1164-1168.

Cadieu E, Neff MW, Quignon P, et al. (2009). Coat variation in the domestic dog is

governed by variants in three genes. Science 326(5949): 150-153. 246

Cardillo M, Purvis A, Sechrest W, et al. (2004). Human population density and extinction

risk in the world's carnivores. Plos Biology 2(7): 909-914.

Carley, CJ. (1979). Narrative report on alleged Mexican gray wolf (Canis lupus baileyi)

lineages held in captivity. U.S. Fish and Wildlife Service Southwest Region,

Albuquerque, New Mexico, USA.

Carothers AD, Rudan I, Kolcic I, et al. (2006). Estimating human inbreeding coefficients:

comparison of genealogical and marker heterozygosity approaches. Annals of

Human Genetics 70: 666-676.

Caso A, Lopez-Gonzalez C, Payan E, et al. (2008). Puma concolor. In IUCN 2013.

IUCN Red List of Threatened Species. Version 2013.2. www.iucnredlist.org.

Castilho CS, Marins-Sá LG, Benedet RC, et al. (2011). Landscape genetics of mountain

lions (Puma concolor) in southern Brazil. Mammalian Biology - Zeitschrift für

Säugetierkunde 76(4): 476-483.

Castoe TA, Poole AW, Gu W, et al. (2010). Rapid identification of thousands of

copperhead snake (Agkistrodon contortrix) microsatellite loci from modest

amounts of 454 shotgun genome sequence. Molecular Ecology Resources 10(2):

341-347.

Chambers SM, Fain SR, Fazio B, et al. (2012). An account of the taxonomy of North

American wolves from morphological and genetic analyses. North American

Fauna 77: 1-67.

Chapman JR, Nakagawa S, Coltman DW, et al. (2009). A quantitative review of

heterozygosity-fitness correlations in animal populations. Molecular Ecology

18(13): 2746-2765. 247

Charlesworth D & Charlesworth B. (1987). Inbreeding depression and its evolutionary

consequences. Annual Review of Ecology and Systematics 18: 237-268.

Charlesworth D & Willis JH. (2009). The genetics of inbreeding depression. Nature

Reviews Genetics 10(11): 783-796.

Chismar JD, Mondala T, Fox HS, et al. (2002). Analysis of result variability from high-

density oligonucleotide arrays comparing same-species and cross-species

hybridizations. BioTechniques 33(3): 516-518.

Christie MR, Marine ML, French RA, et al. (2012). Genetic adaptation to captivity can

occur in a single generation. Proceedings of the National Academy of Sciences of

the United States of America 109(1): 238-242.

Clark AG, Hubisz MJ, Bustamante CD, et al. (2005). Ascertainment bias in studies of

human genome-wide polymorphism. Genome Research 15(11): 1496-1502.

Coates BS, Sumerford DV, Miller NJ, et al. (2009). Comparative performance of single

nucleotide polymorphism and microsatellite markers for population genetic

analysis. Journal of Heredity 100(5): 556-564.

Collin RWJ, van den Born LI, Klevering BJ, et al. (2011). High-resolution homozygosity

mapping is a powerful tool to detect novel mutations causative of autosomal

recessive rp in the Dutch population. Investigative Ophthalmology & Visual

Science 52(5): 2227-2239.

Coltman DW & Slate J. (2003). Microsatellite measures of inbreeding: a meta-analysis.

Evolution 57(5): 971-983.

Cooper A, Lalueza-Fox C, Anderson S, et al. (2001). Complete mitochondrial genome

sequences of two extinct moas clarify ratite evolution. Nature 409(6821): 704- 248

707.

Cornwallis CK, West SA & Griffin AS. (2009). Routes to indirect fitness in

cooperatively breeding vertebrates: kin discrimination and limited dispersal.

Journal of Evolutionary Biology 22(12): 2445-2457.

Coulson TN, Pemberton JM, Albon SD, et al. (1998). Microsatellites reveal heterosis in

red deer. Proceedings of the Royal Society B-Biological Sciences 265(1395): 489-

495.

Culver M & Schwartz MK. (2011). Conservation genetics and cougar management. In

Managing cougars in North America (Jenks, J. A., ed.), pp. 91-110. Jack H.

Berryman Institute, Utah State University, Logan, Utah, USA.

Culver M, Johnson WE, Pecon-Slattery J, et al. (2000). Genomic ancestry of the

American puma (Puma concolor). Journal of Heredity 91(3): 186-197.

Curole JP & Kocher TD. (1999). Mitogenomics: digging deeper with complete

mitochondrial genomes. Trends in Ecology & Evolution 14(10): 394-398.

Davey JW, Hohenlohe PA, Etter PD, et al. (2011). Genome-wide genetic marker

discovery and genotyping using next-generation sequencing. Nature Reviews

Genetics 12(7): 499-510. de Cara MÁR, Villanueva B, Toro MÁ, et al. (2013). Using genomic tools to maintain

diversity and fitness in conservation programmes. Molecular Ecology 22(24):

6091-6099.

Decker JE, Pires JC, Conant GC, et al. (2009). Resolving the evolution of extant and

extinct ruminants with high-throughput phylogenomics. Proceedings of the

National Academy of Sciences of the United States of America 106(44): 18644- 249

18649.

Dekkers JCM. (2004). Commercial application of marker- and gene-assisted selection in

livestock: strategies and lessons. Journal of Animal Science 82(13): E313-E328.

Delisle I & Strobeck C. (2002). Conserved primers for rapid sequencing of the complete

mitochondrial genome from carnivores, applied to three species of bears.

Molecular Biology and Evolution 19(3): 357-361.

Duncan JA, Reeves JR & Cooke TG. (1998). BRCA1 and BRCA2 proteins: roles in

health and disease. Journal of Clinical Pathology-Molecular Pathology 51(5):

237-247.

Earl DA & vonHoldt BM. (2012). STRUCTURE HARVESTER: a website and program

for visualizing STRUCTURE output and implementing the Evanno method.

Conservation Genetics Resources 4(2): 359-361.

Ekblom R & Galindo J. (2011). Applications of next generation sequencing in molecular

ecology of non-model organisms. Heredity 107(1): 1-15.

Elbroch M, Wittmer HU, Saucedo C, et al. (2009). Long-distance dispersal of a male

puma (Puma concolor puma) in Patagonia. Revista Chilena de Historia Natural

82: 459-461.

Ellegren H. (2004). Microsatellites: simple sequences with complex evolution. Nature

Reviews Genetics 5(6): 435-445.

Emerson KJ, Merz CR, Catchen JM, et al. (2010). Resolving postglacial phylogeography

using high-throughput sequencing. Proceedings of the National Academy of

Sciences of the United States of America 107(37): 16196-16200. 250

Ernest HB, Boyce WM, Bleich VC, et al. (2003). Genetic structure of mountain lion

(Puma concolor) populations in California. Conservation Genetics 4(3): 353-366.

Ernest HB, Penedo MCT, May BP, et al. (2000). Molecular tracking of mountain lions in

the Yosemite Valley region in California: genetic analysis using microsatellites

and faecal DNA. Molecular Ecology 9(4): 433-441.

Ernest HB, Rubin ES & Boyce WM. (2002). Fecal DNA analysis and risk assessment of

mountain lion predation of bighorn sheep. Journal of Wildlife Management 66:

75-85.

Estoup A, Jarne P & Cornuet JM. (2002). Homoplasy and mutation model at

microsatellite loci and their consequences for population genetics analysis.

Molecular Ecology 11(9): 1591-1604.

Evanno G, Regnaut S & Goudet J. (2005). Detecting the number of clusters of

individuals using the software structure: a simulation study. Molecular Ecology

14(8): 2611-2620.

Evertsz EM, Au-Young J, Ruvolo MV, et al. (2001). Hybridization cross-reactivity

within homologous gene families on glass cDNA microarrays. BioTechniques

31(5): 1182, 1184, 1186, 1188, 1190-1192.

Eyre-Walker A. (2010). Genetic architecture of a complex trait and its implications for

fitness and genome-wide association studies. Proceedings of the National

Academy of Sciences of the United States of America 107(Suppl 1): 1752-1756.

Fabbri E, Caniglia R, Mucci N, et al. (2012). Comparison of single nucleotide

polymorphisms and microsatellites in non-invasive genetic monitoring of a wolf

population. Archives of Biological Sciences 64(1): 321-335. 251

Falush D, Stephens M & Pritchard JK. (2003). Inference of population structure using

multilocus genotype data: linked loci and correlated allele frequencies. Genetics

164(4): 1567-1587.

Farrell LE, Romant J & Sunquist ME. (2000). Dietary separation of sympatric carnivores

identified by molecular analysis of scats. Molecular Ecology 9(10): 1583-1590.

Fernandez J, Roughsedge T, Woolliams JA, et al. (2006). Optimization of the sampling

strategy for establishing a gene bank: storing PrP alleles following a scrapie

eradication plan as a case study. Animal Science 82: 813-821.

Fernandez J, Toro MA & Caballero A. (2004). Managing individuals' contributions to

maximize the allelic diversity maintained in small, conserved populations.

Conservation Biology 18(5): 1358-1367.

Fernandez J, Villanueva B, Pong-Wong R, et al. (2005). Efficiency of the use of pedigree

and molecular marker information in conservation programs. Genetics 170(3):

1313-1321.

Fischer MC, Foll M, Excoffier L, et al. (2011). Enhanced AFLP genome scans detect

local adaptation in high-altitude populations of a small rodent (Microtus arvalis).

Molecular Ecology 20(7): 1450-1462.

Folch J, Cocero MJ, Chesne P, et al. (2009). First birth of an animal from an extinct

subspecies (Capra pyrenaica pyrenaica) by cloning. Theriogenology 71(6): 1026-

1034.

Foll M & Gaggiotti O. (2008). A genome-scan method to identify selected loci

appropriate for both dominant and codominant markers: a Bayesian perspective.

Genetics 180(2): 977-993. 252

Forêt S, Kassahn KS, Grasso LC, et al. (2007) Genomic and microarray approaches to

coral reef conservation biology. Coral Reefs 26: 475-486.

Frankham R, Ballou JD & Briscoe DA. (2009). Introduction to conservation genetics,

Cambridge University Press, Cambridge, UK.

Frankham R. (1995). Conservation genetics. Annual Review of Genetics 29: 305-327.

Frankham R. (2003). Genetics and conservation biology. Comptes Rendus Biologies 326:

22-29.

Fredrickson RJ, Siminski P, Woolf M, et al. (2007). Genetic rescue and inbreeding

depression in Mexican wolves. Proceedings of the Royal Society B: Biological

Sciences 274(1623): 2365-2371.

Fu B & He S. (2012). Transcriptome analysis of silver carp (Hypophthalmichthys

molitrix) by paired-end RNA sequencing. DNA Research 19(2): 131-142.

García-Dorado A. (2008). A simple method to account for natural selection when

predicting inbreeding depression. Genetics 180(3): 1559-1566.

Garcia-Moreno J, Matocq MD, Roy MS, et al. (1996). Relationships and genetic purity of

the endangered Mexican wolf based on analysis of microsatellite loci.

Conservation Biology 10(2): 376-389.

Gardner MG, Fitch AJ, Bertozzi T, et al. (2011). Rise of the machines - recommendations

for ecologists when using next generation sequencing for microsatellite

development. Molecular Ecology Resources 11(6): 1093-1101.

Garrison E & Marth G. (2012). Haplotype-based variant detection from short-read

sequencing. ArXiv e-prints 1207.3907.

Garvin MR, Saitoh K & Gharrett AJ. (2010). Application of single nucleotide 253

polymorphisms to non-model species: a technical review. Molecular Ecology

Resources 10(6): 915-934.

Gebremedhin B, Ficetola GF, Naderi S, et al. (2009). Frontiers in identifying

conservation units: from neutral markers to adaptive genetic variation. Animal

Conservation 12(2): 107-109.

Gibson G. (2002). Microarrays in ecology and evolution: a preview. Molecular Ecology

11(1): 17-24.

Gilbert MTP, Drautz DI, Lesk AM, et al. (2008). Intraspecific phylogenetic analysis of

Siberian woolly mammoths using complete mitochondrial genomes. Proceedings

of the National Academy of Sciences of the United States of America 105(24):

8327-8332.

Gilles A, Meglécz E, Pech N, et al. (2011). Accuracy and quality assessment of 454 GS-

FLX Titanium pyrosequencing. BMC Genomics 12: 245.

Girard P. (2011). A robust statistical method to detect null alleles in microsatellite and

SNP datasets in both panmictic and inbred populations. Statistical Applications in

Genetics and Molecular Biology 10(1): 1-10.

Glenn TC. (2011). Field guide to next-generation DNA sequencers. Molecular Ecology

Resources 11(5): 759-769.

Goetz F, Rosauer D, Sitar S, et al. (2010). A genetic basis for the phenotypic

differentiation between siscowet and lean lake trout (Salvelinus namaycush).

Molecular Ecology 19(Supp1): 176-196.

Goldman EA. (1937). The Wolves of North America. Journal of Mammology 18: 37-45. 254

Gompert Z & Buerkle CA. (2010). INTROGRESS: a software package for mapping

components of isolation in hybrids. Molecular Ecology Resources 10(2): 378-384.

Goudet J. (2001). FSTAT, a program to estimate and test gene diversities and fixation

indices (version 2.9.3), Vol. 2009.

Green RE, Malaspinas AS, Krause J, et al. (2008). A complete neandertal mitochondrial

genome sequence determined by high-throughput sequencing. Cell 134(3): 416-

426.

Guinness World Records 2004. (2004). Guinness World Records LTD. p. 49.

Haag T, Santos AS, De Angelo C, et al. (2009). Development and testing of an optimized

method for DNA-based identification of jaguar (Panthera onca) and puma (Puma

concolor) faecal samples for use in ecological and genetic studies. Genetica

136(3): 505-512.

Hahn M, Wilhelm J & Pingoud A. (2001). Influence of fluorophor dye labels on the

migration behavior of polymerase chain reaction amplified short tandem repeats

during denaturing capillary electrophoresis. Electrophoresis 22(13): 2691-2700.

Hailer F & Leonard JA. (2008). Hybridization among three native North American canis

species in a region of natural sympatry. Plos One 3(10): e3333.

Hall ER & Kelson KR. (1959). The Mammals of North America, Vol. II, The Ronald

Press, New York, New York, USA.

Hall N, Mercer L, Phillips D, et al. (2012). Maximum likelihood estimation of individual

inbreeding coefficients and null allele frequencies. Genetics Research 94(3): 151-

161. 255

Hamilton WD. (1964). Genetical evolution of social behavior 1. Journal of Theoretical

Biology 7(1): 1-16.

Hamilton WD. (1964). Genetical evolution of social behavior 2. Journal of Theoretical

Biology 7(1): 17-52.

Hamzić E. (2011). Levels of inbreeding derived from runs of homozygosity: a

comparison of Austrian and Norwegian cattle breeds. Master's Thesis; University

of Natural Resources and Life Sciences, Vienna, Austria.

Harris NC & Dunn RR. (2013). Species loss on spatial patterns and composition of

zoonotic parasites. Proceedings of the Royal Society B 280(1771): 20131847.

Hauser L, Baird M, Hilborn RAY, et al. (2011). An empirical comparison of SNPs and

microsatellites for parentage and kinship assignment in a wild sockeye salmon

(Oncorhynchus nerka) population. Molecular Ecology Resources 11: 150-161.

Haussler D, O'Brien SJ, Ryder OA, et al. (2009). Genome 10K: a proposal to obtain

whole-genome sequence for 10,000 vertebrate species. Journal of Heredity

100(6): 659-674.

Hawkins RD, Hon GC & Ren B. (2010). Next-generation genomics: an integrative

approach. Nature Reviews Genetics. 11: 476-486.

Hayes CL, Rubin ES, Jorgensen MC, et al. (2000). Mountain lion predation of bighorn

sheep in the Peninsular Ranges, California. Journal of Wildlife Management

64(4): 954-959.

Haynes GD & Latch EK. (2012). Identification of novel single nucleotide polymorphisms

(SNPs) in deer (Odocoileus spp.) using the BovineSNP50 BeadChip. Plos One

7(5): e36536. 256

Hedrick P, Fredrickson R & Ellegren H. (2001). Evaluation of ∂2, a microsatellite

measure of inbreeding and outbreeding, in wolves with a known pedigree.

Evolution 55(6): 1256-1260.

Hedrick PW & Fredrickson RJ. (2008). Captive breeding and the reintroduction of

Mexican and red wolves. Molecular Ecology 17(1): 344-350.

Hedrick PW & Kalinowski ST. (2000). Inbreeding depression in conservation biology.

Annual Review of Ecology and Systematics 31: 139-162.

Hedrick PW, Miller PS, Geffen E, et al. (1997). Genetic evaluation of the three captive

Mexican wolf lineages. Zoo Biology 16(1): 47-69.

Hedrick PW. (2009). Neutral, detrimental, and adaptive variation in conservation

genetics. In Conservation genetics in the age of genomics (Amato, G., DeSalle,

R., Ryder, O. A. & Rosenbaum, H. C., eds.), pp. 41-57. Columbia University

Press.

Heller MJ. (2002). DNA microarray technology: devices, systems, and applications.

Annual Review of Biomedical Engineering 4(1): 129-153.

Henkel CV, Burgerhout E, de Wijze DL, et al. (2012). Primitive duplicate hox clusters in

the European eel's genome. Plos One 7(2): e32231.

Highnam G, Franck C, Martin A, et al. (2013). Accurate human microsatellite genotypes

from high-throughput resequencing data using informed error profiles. Nucleic

Acids Research 41(1): 1-7.

Hildebrandt F, Heeringa SF, Ruschendorf F, et al. (2009). A Systematic Approach to

Mapping Recessive Disease Genes in Individuals from Outbred Populations. Plos

Genetics 5(1): e1000353. 257

Hill WG & Weir BS. (2011). Variation in actual relationship as a consequence of

Mendelian sampling and linkage. Genetics Research 93(1): 47-64.

Hoffmeister DF. (1986). Mammals of Arizona, University of Arizona Press, Tucson,

Arizona, USA.

Hoheisel JD. (2006). Microarray technology: beyond transcript profiling and genotype

analysis. Nature Reviews Microbiology 7(3): 200-210.

Hohenlohe PA, Amish SJ, Catchen JM, et al. (2011). Next-generation RAD sequencing

identifies thousands of SNPs for assessing hybridization between rainbow and

westslope cutthroat trout. Molecular Ecology Resources 11(Suppl 1): 117-122.

Holsinger KE & Weir BS. (2009). Genetics in geographically structured populations:

defining, estimating and interpreting FST. Nature Reviews Genetics 10(9): 639-

650.

Hong HX, Xu L, Liu J, et al. (2012). Technical reproducibility of genotyping SNP arrays

used in genome-wide association studies. Plos One 7(9): e44483.

Hornocker MG. (1969). Winter territoriality in mountain lions. Journal of Wildlife

Management 33(3): 457-464.

Houston DD, Elzinga DB, Maughan PJ, et al. (2012). Single nucleotide polymorphism

discovery in cutthroat trout subspecies using genome reduction, barcoding, and

454 pyro-sequencing. BMC Genomics 13: 724.

Hu J, Wang F, Zhu X, et al. (2010). Mouse ZAR1-like (XM_359149) colocalizes with

mRNA processing components and its dominant-negative mutant caused two-cell-

stage embryonic arrest. Developmental Dynamics 239(2): 407-424. 258

Hubisz MJ, Falush D, Stephens M, et al. (2009). Inferring weak population structure with

the assistance of sample group information. Molecular Ecology Resources 9(5):

1322-1332.

Iriarte JA, Franklin WL, Johnson WE, et al. (1990). Biogeographic variation of food

habits and body size of the American puma. Oecologia 85: 185-190.

Jakobsson M & Rosenberg NA. (2007). CLUMPP: a cluster matching and permutation

program for dealing with label switching and multimodality in analysis of

population structure. Bioinformatics 23(14): 1801-1806.

Jeukens J, Renaut S, St-Cyr J, et al. (2010). The transcriptomics of sympatric dwarf and

normal lake whitefish (Coregonus clupeaformis spp., Salmonidae) divergence as

revealed by next-generation sequencing. Molecular Ecology 19(24): 5389-5403.

Jirimutu WZ, Ding G, Chen G, et al. (2012). Genome sequences of wild and domestic

bactrian camels. Nature Communications 3: 1202.

Johnson WE, Eizirik E, Pecon-Slattery J, et al. (2006). The Late Miocene radiation of

modern Felidae: A genetic assessment. Science 311(5757): 73-77.

Kalinowski ST, Hedrick PW & Miller PS. (1999). No inbreeding depression observed in

Mexican and red wolf captive breeding programs. Conservation Biology 13(6):

1371-1377.

Kalinowski ST. (2005). Do polymorphic loci require large sample sizes to estimate

genetic distances? Heredity 94(1): 33-36.

Kang HM, Zaitlen NA, Wade CM, et al. (2008). Efficient control of population structure

in model organism association mapping. Genetics 178(3): 1709-1723. 259

Karlsson EK, Baranowska I, Wade CM, et al. (2007). Efficient mapping of mendelian

traits in dogs through genome-wide association. Nature Genetics 39(11): 1321-

1328.

Kassahn KS, Caley MJ, Ward AC, et al. (2007). Heterologous microarray experiments

used to identify the early gene response to heat stress in a coral reef fish.

Molecular Ecology 16(8): 1749-1763.

Kennerly E, Ballmann A, Martin S, et al. (2008). A gene expression signature of

confinement in peripheral blood of red wolves (Canis rufus). Molecular Ecology

17(11): 2782-2791.

Kirin M, McQuillan R, Franklin CS, et al. (2010). Genomic runs of homozygosity record

population history and consanguinity. Plos One 5(11): e139996.

Kirkpatrick B, Li SC, Karp RM, et al. (2011). Pedigree reconstruction using identity by

descent. Journal of Computational Biology 18(11): 1481-1493.

Knowles JC. (2010). Population genomics of North American grey wolves (Canis lupus),

Masters Thesis, University of Alberta.

Kohn MH, Murphy WJ, Ostrander EA, et al. (2006). Genomics and conservation

genetics. Trends in Ecology & Evolution 21(11): 629-637.

Krawczak M. (1999). Informativity assessment for biallelic single nucleotide

polymorphisms. Electrophoresis 20(8): 1676-1681.

Kuhner MK, Beerli P, Yamato J, et al. (2000). Usefulness of single nucleotide

polymorphism data for estimating population parameters. Genetics 156: 439-447. 260

Kurushima JD, Collins JA, Well JA, et al. (2006). Development of 21 microsatellite loci

for puma (Puma concolor) ecology and forensics. Molecular Ecology Notes 6(4):

1260-1262.

Lacy RC, Borbat M & Pollak JP. (2009). Vortex: a stochastic simulation of the extinction

process. Version 9.95. Chicago Zoological Society, Brookfield, Illinois.

Lacy RC. (1994). Managing genetic diversity in captive populations of animals. In

Restoration and recovery of endangered plants and animals (Bowles, M. L. &

Whelan, C. J., eds.), pp. 63-89. Cambridge University Press, Cambridge, UK.

Lacy RC. (2009). Stopping evolution. In Conservation genetics in the age of genomics

(Amato, G., DeSalle, R., Ryder, O. A. & Rosenbaum, H. C., eds.), pp. 58-81.

Columbia University Press, New York, New York, USA.

Lagercrantz U, Ellegren H & Andersson L. (1993). The abundance of various

polymorphic microsatellite motifs differs between plants and vertebrates. Nucleic

Acids Research 21(5): 1111-1115.

Laikre L, Ryman N & Thompson EA. (1993). Hereditary blindness in a captive wolf

(Canis lupus) population - frequency reduction of a deleterious allele in relation to

gene conservation. Conservation Biology 7(3): 592-601.

Laikre L. (1999). Hereditary defects and conservation genetic management of captive

populations. Zoo Biology 18(2): 81-99.

Laird PW. (2010). Principles and challenges of genome-wide DNA methylation analysis.

Nature Reviews Genetics 11(3): 191-203.

Lake-Thom B. (1997). Spirits of the earth: a guide to Native American nature symbols,

stories, and ceremonies. Penguin Books, New York, USA. 261

Lander ES, Int Human Genome Sequencing C, Linton LM, et al. (2001). Initial

sequencing and analysis of the human genome. Nature 409(6822): 860-921.

Laundre JW & Loxterman J. (2007). Impact of edge habitat on summer home range size

in female pumas. American Midland Naturalist 157(1): 221-229.

Leal SM, Yan K & Muller-Myhsok B. (2005). SimPed: a simulation program to generate

haplotype and genotype data for pedigree structures. Human Heredity 60(2): 119-

122.

Leberg PL & Firmin BD. (2008). Role of inbreeding depression and purging in captive

breeding and restoration programmes. Molecular Ecology 17(1): 334-343.

Lee JR, Magee DM, Gaster RS, et al. (2013). Emerging protein array technologies for

proteomics. Expert Review of Proteomics 10(1): 65-75.

Leonard JA, Vilá C & Wayne RK. (2005). Legacy lost: genetic variability and population

size of extirpated US grey wolves (Canis lupus). Molecular Ecology 14(1): 9-17.

Lequarré AS, Andersson L, Andre C, et al. (2011). LUPA: A European initiative taking

advantage of the canine genome architecture for unraveling complex disorders in

both human and dogs. Veterinary Journal 189(2): 155-159.

Leutenegger AL, Prum B, Genin E, et al. (2003). Estimation of the inbreeding coefficient

through use of genomic data. American Journal of Human Genetics 73(3): 516-

523.

Li H & Durbin R. (2009). Fast and accurate short read alignment with Burrows-Wheeler

transform. Bioinformatics 25(14): 1754-1760. 262

Li MH, Stranden I, Tiirikka T, et al. (2011). A comparison of approaches to estimate the

inbreeding coefficient and pairwise relatedness using genomic and pedigree data

in a sheep population. Plos One 6(11): e26256.

Li R, Fan W, Tian G, et al. (2009). The sequence and de novo assembly of the giant

panda genome. Nature 463(7279): 311-317.

Li W & Ruan K. (2009). MicroRNA detection by microarray. Analytical and

Bioanalytical Chemistry 394(4): 1117-1124.

Lipka AE, Tian F, Wang QS, et al. (2012). GAPIT: genome association and prediction

integrated tool. Bioinformatics 28(18): 2397-2399.

Liu NJ, Chen L, Wang S, et al. (2005). Comparison of single-nucleotide polymorphisms

and microsatellites in inference of population structure. BMC Genetics 6: S26.

Logan KA & Irwin LL. (1985). Mountain lion habitats in the Big Horn Mountains,

Wyoming. Wildlife Society Bulletin 13(3): 257-262.

Logan KA & Sweanor LL. (2001). Desert Puma: Evolutionary Ecology and

Conservation of an Enduring Carnivore, Island Press, Washington D.C., USA.

Lopez G & Vazquez CB. (1991). Linaje de lobos Mexicanos "San Juan de Aragón":

historia, evidencia de su autenticidad y posibilidad de certificacion. Unpublished

report for the Zoologico San Juan de Aragón.

Loxterman JL. (2011). Fine scale population genetic structure of pumas in the

Intermountain West. Conservation Genetics 12(4): 1049-1059.

Luikart G, England PR, Tallmon D, et al. (2003). The power and promise of population

genomics: from genotyping to genome typing. Nature Reviews Genetics 4(12):

981-994. 263

Luo C, Tsementzi D, Kyrpides N, et al. (2012). Comparisons of Illumina and Roche 454

sequencing technologies on the same microbial community DNA sample. PLoS

One 7(2): e30087.

Lynch M & O'Hely M. (2001). Captive breeding and the genetic fitness of natural

populations. Conservation Genetics 2(4): 363-378.

Lynch M & Ritland K. (1999). Estimation of pairwise relatedness with molecular

markers. Genetics 152(4): 1753-1766.

Mace RD, Waller JS, Manley TL, et al. (1996). Relationships among grizzly bears, roads

and habitat in the Swan Mountains, Montana. Journal of Applied Ecology 33(6):

1395-1404.

Magnanou E, Malenke JR, & Dearing MD (2009) Expression of biotransformation genes

in woodrat (Neotoma) herbivores on novel and ancestral diets: identification of

candidate genes responsible for dietary shifts. Molecular Ecology 18(11): 2401-

2414.

Malausa T, Gilles A, Meglecz E, et al. (2011). High-throughput microsatellite isolation

through 454 GS-FLX Titanium pyrosequencing of enriched DNA libraries.

Molecular Ecology Resources 11(4): 638-644.

Malécot G. (1948). Les mathématiques de i’hérédité. Paris: Masson & Cie.

Malik A, Korol A, Hubner S, et al. (2011). Transcriptome sequencing of the blind

subterranean mole rat, Spalax galili: utility and potential for the discovery of

novel evolutionary patterns. Plos One 6(8): e21227.

Mancia A, Ryan JC, Chapman RW, et al. (2012). Health status, infection and disease in

California sea lions (Zalophus californianus) studied using a canine microarray 264

platform and machine-learning approaches. Developmental and Comparative

Immunology 36(4): 629-637.

Manel S, Berthier P & Luikart G. (2000). Detecting wildlife poaching: identifying the

origin of individuals with Bayesian assignment tests and multilocus genotypes.

Conservation Biology 16(3): 650-659.

Manel S, Joost S, Epperson BK, et al. (2010). Perspectives on the use of landscape

genetics to detect genetic adaptive variation in the field. Molecular Ecology

19(17): 3760-3772.

Margam VM, Coates BS, Bayles DO, et al. (2011). Transcriptome sequencing, and rapid

development and application of SNP markers for the legume pod borer Maruca

vitrata (Lepidoptera: Crambidae). Plos One 6(7): e21388.

Marguerat S & Bähler J. (2010). RNA-seq: from technology to biology. Cellular and

Molecular Life Sciences 67(4): 569-579.

Margulies M, Egholm M, Altman WE, et al. (2005). Genome sequencing in

microfabricated high-density picolitre reactors. Nature 437(7057): 376-380.

Martin M. (2011). Cutadapt removes adapter sequences from high-throughput sequencing

reads. EMBnet.journal 17(1): 10-12.

Martínez-Abraín A. (2008). Statistical significance and biological relevance: a call for a

more cautious interpretation of results in ecology. Acta Oecologica 34(1): 9-11.

Maxam AM & Gilbert W. (1977). New method for sequencing DNA. Proceedings of the

National Academy of Sciences of the United States of America 74(2): 560-564. 265

McBride RT. (1980). The Mexican wolf (Canis lupus baileyi): a historical review and

observations on it status and distribution. U.S. Fish and Wildlife Service

Southwest Region, Albuquerque, New Mexico, USA.

McCue ME, Bannasch DL, Petersen JL, et al. (2012). A high density SNP array for the

domestic horse and extant Perissodactyla: utility for association mapping, genetic

diversity, and phylogeny studies. PloS Genetics 8(1): e1002451.

McQuillan R, Leutenegger A-L, Abdel-Rahman R, et al. (2008). Runs of homozygosity

in European populations. The American Journal of Human Genetics 83(3): 359-

372.

McRae BH, Beier P, Dewald LE, et al. (2005). Habitat barriers limit gene flow and

illuminate historical events in a wide-ranging carnivore, the American puma.

Molecular Ecology 14(7): 1965-1977.

Mech LD & Boitani L. (2003). Wolves: behavior, ecology, and conservation. University

of Chicago Press, Chicago, Illinois, USA.

Menotti-Raymond M, David VA, Lyons LA, et al. (1999). A genetic linkage map of

microsatellites in the domestic cat (Felis catus). Genomics 57(1): 9-23.

Metzker ML. (2009). Sequencing technologies - the next generation. Nature Reviews

Genetics 11(1): 31-46.

Michelizzi VN, Wu XL, Dodson MV, et al. (2011). A global view of 54,001 single

nucleotide polymorphisms (SNPs) on the Illumina BovineSNP50 BeadChip and

their transferability to water buffalo. International Journal of Biological Sciences

7(1): 18-27.

Miller JM, Kijas JW, Heaton MP, et al. (2012a). Consistent divergence times and allele 266

sharing measured from cross-species application of SNP chips developed for three

domestic species. Molecular Ecology Resources 12(6): 1145-1150.

Miller JM, Malenfant RM, David P, et al. (2013). Estimating genome-wide

heterozygosity: effects of demographic history and marker type. Heredity (Ahead

of Print) 10.1038/hdy.2013.99.

Miller JM, Poissant J, Kijas JW, et al. (2011). A genome-wide set of SNPs detects

population substructure and long range linkage disequilibrium in wild sheep.

Molecular Ecology Resources 11(2): 314-322.

Miller PS & Lacy RC. (2005). VORTEX: a stochastic simulation of the extinction

process. version 9.50 user’s manual. Apple Valley, MN: Conservation Breeding

Specialist Group (SSC/IUCN).

Miller W, Schuster SC, Welch AJ, et al. (2012b). Polar and brown bear genomes reveal

ancient admixture and demographic footprints of past climate change.

Proceedings of the National Academy of Sciences of the United States of America

109(36): E2382-E2390.

Minoche AE, Dohm JC, & Himmelbauer H. (2011). Evaluation of genomic high-

throughput sequencing data generated on Illumina HiSeq and Genome Analyzer

systems. Genome Biology 12: R112.

Miotto RA, Cervini M, Begotti RA, et al. (2012). Monitoring a puma (Puma concolor)

population in a fragmented landscape in southeast Brazil. Biotropica 44(1): 98-

104.

Miotto RA, Cervini M, Figueiredo MG, et al. (2011). Genetic diversity and population

structure of pumas (Puma concolor) in southeastern Brazil: implications for 267

conservation in a human-dominated landscape. Conservation Genetics 12(6):

1447-1455.

Miotto RA, Rodrigues FP, Ciocheti G, et al. (2007). Determination of the minimum

population size of pumas (Puma concolor) through fecal DNA analysis in two

protected cerrado areas in the Brazilian Southeast. Biotropica 39(5): 647-654.

Montgomery ME, Ballou JD, Nurthen RK, et al. (1997). Minimizing kinship in captive

breeding programs. Zoo Biology 16(5): 377-389.

Morin PA & McCarthy M. (2007). Highly accurate SNP genotyping from historical and

low-quality samples. Molecular Ecology Notes 7(6): 937-946.

Morin PA, Archer FI, Foote AD, et al. (2010). Complete mitochondrial genome

phylogeographic analysis of killer whales (Orcinus orca) indicates multiple

species. Genome Research 20(7): 908-916.

Morin PA, Luikart G, Wayne RK, et al. (2004). SNPs in ecology, evolution and

conservation. Trends in Ecology & Evolution 19(4): 208-216.

Morin PA, Martien KK & Taylor BL. (2009). Assessing statistical power of SNPs for

population structure and conservation studies. Molecular Ecology Resources 9(1):

66-73.

Morrison JC, Sechrest W, Dinerstein E, et al. (2007). Persistence of large mammal faunas

as indicators of global human impacts. Journal of Mammalogy 88(6): 1363-1380.

Mullikin JC, Hansen NF, Shen L, et al. (2010). Light whole genome sequence for SNP

discovery across domestic cat breeds. BMC Genomics 11: 406. 268

Naidu A, Smythe LA, Thompson RW, et al. (2011). Genetic analysis of scats reveals

minimum number and sex of recently documented mountain lions. Journal of

Fish and Wildlife Management 2(1): 106-111.

Naidu A. (2009). Genetics analysis of mountain lion (Puma concolor) feces from KOFA

National Wildlife Refuge, Arizona. Masters Thesis, University of Arizona.

Narum SR & Hess JE. (2011). Comparison of FST outlier tests for SNP loci under

selection. Molecular Ecology Resources 11(Suppl 1): 184-194.

Naurin S, Bensch S, Hansson B, et al. (2008). A microarray for large-scale genomic and

transcriptional analyses of the zebra finch (Taeniopygia guttata) and other

passerines. Molecular Ecology Resources 8(2): 275-281.

Nelson EW & Goldman EA. (1929). A new wolf from Mexico. Journal of Mammology

10: 165-166.

Nève G & Meglécz E. (2000). Microsatellite frequencies in different taxa. Trends in

Ecology & Evolution 15(9): 376-377.

Nicholls H. (2008). Darwin 200: let's make a mammoth. Nature 456(7220): 310-314.

Nielsen R & Signorovitch J. (2003). Correcting for ascertainment biases when analyzing

SNP data: applications to the estimation of linkage disequilibrium. Theoretical

Population Biology 63: 245-255.

Nielsen R. (2004). Population genetic analysis of ascertained SNP data. Human

Genomics 1: 218-224.

Nikaido M, Sasaki T, Emerson JJ, et al. (2011). Genetically distinct coelacanth

population off the northern Tanzanian coast. Proceedings of the National

Academy of Sciences of the United States of America 108(44): 18009-18013. 269

Nowak RM. (1995). Ecology and conservation of wolves in a changing world,

Proceedings of the Second North American Wolf Symposium, Canadian

Circumpolar Institute, University of Alberta, Edmonton, Alberta, Canada.

Nowell K & Jackson P. (1996). Wild cats. Status survey and conservation action plan.

IUCN/SSC Cat Specialist Group, Gland, Switzerland and Cambridge, UK.

Ogden R, Baird J, Senn H, et al. (2012). The use of cross-species genome-wide arrays to

discover SNP markers for conservation genetics: a case study from Arabian and

scimitar-horned oryx. Conservation Genetics Resources 4(2): 471-473.

Ogden R, Dawnay N & McEwing R. (2009). Wildlife DNA forensics—bridging the gap

between conservation genetics and law enforcement. Endangered Species

Research 9: 179-195.

Ogden R. (2011). Unlocking the potential of genomic technologies for wildlife forensics.

Molecular Ecology Resources 11: 109-116.

Ohshima K, Nakashima M, Sonoda K, et al. (2001). Expression of RCAS1 and FasL in

human trophoblasts and uterine glands during pregnancy: the possible role in

immune privilege. Clinical and Experimental Immunology 123(3): 481-486.

Oizumi S, Yamazaki K, Nakashima M, et al. (2002). RCAS1 expression: A potential

prognostic marker for adenocarcinomas of the lung. Oncology 62(4): 333-339.

Oliehoek PA & Bijma P. (2009). Effects of pedigree errors on the efficiency of

conservation decisions. Genetics Selection Evolution 41(1): 9.

Oshlack A, Chabot AE, Smyth GK, et al. (2007). Using DNA microarrays to study gene

expression in closely related species. Bioinformatics 23(10): 1235-1242.

Ouborg NJ, Angeloni F & Vergeer P. (2010). An essay on the necessity and feasibility of 270

conservation genomics. Conservation Genetics 11(2): 643-653.

Ouborg NJ. (2009). Integrating population genetics and conservation biology in the era of

genomics. Biology Letters 6(1): 3-6.

Overall ADJ, Byrne KA, Pilkington JG, et al. (2005). Heterozygosity, inbreeding and

neonatal traits in Soay sheep on St Kilda. Molecular Ecology 14(11): 3383-3393.

Pagani I, Liolios K, Jansson J, et al. (2012). The Genomes OnLine Database (GOLD) v.4:

status of genomic and metagenomic projects and their associated metadata.

Nucleic Acids Research 40(D1): D571-D579.

Papic L, Fischer D, Trajanoski S, et al. (2011). SNP-array based whole genome

homozygosity mapping: a quick and powerful tool to achieve an accurate

diagnosis in LGMD2 patients. European Journal of Medical Genetics 54(3): 214-

219.

Parsons D. (1996). Case study: the Mexican wolf. New Mexico Journal of Science 34:

101-123.

Peakall R & Smouse PE. (2012). GenAlEx 6.5: genetic analysis in Excel. Population

genetic software for teaching and research - an update. Bioinformatics 28(19):

2537-2539.

Pemberton JM. (2008). Wild pedigrees: the way forward. Proceedings of the Royal

Society B-Biological Sciences 275(1635): 613-621.

Perry GH, Louis EE, Ratan A, et al. (2013). Aye-aye population genomic analyses

highlight an important center of endemism in northern Madagascar. Proceedings

of the National Academy of Sciences of the United States of America. (ahead of

print). 271

Perry GH, Melsted P, Marioni JC, et al. (2012a). Comparative RNA sequencing reveals

substantial genetic variation in endangered primates. Genome Research 22: 602-

610.

Perry GH, Reeves D, Melsted P, et al. (2012b). A genome sequence resource for the aye-

aye (Daubentonia madagascariensis), a nocturnal lemur from Madagascar.

Genome Biology and Evolution 4(2): 126-135.

Pertoldi C, Wójcik JM, Tokarska M, et al. (2010). Genome variability in European and

American bison detected using the BovineSNP50 BeadChip. Conservation

Genetics 11(2): 627-634.

Piggott MP & Taylor AC. (2003). Remote collection of animal DNA and its applications

in conservation management and understanding the population biology of rare and

cryptic species. Wildlife Research 30(1): 1-13.

Piña-Aguilar RE, Lopez-Saucedo J, Sheffield R, et al. (2009). Revival of extinct species

using nuclear transfer: hope for the mammoth, true for the Pyrenean ibex, but is it

time for "conservation cloning''? Cloning and Stem Cells 11(3): 341-346.

Pop M & Salzberg SL. (2008). Bioinformatic challenges of new sequencing technology.

Trends in Genetics 24(3): 142-149.

Pritchard JK, Stephens M & Donnelly P. (2000). Inference of population structure using

multilocus genotype data. Genetics 155(2): 945-959.

Purcell S, Neale B, Todd-Brown K, et al. (2007). PLINK: A tool set for whole-genome

association and population-based linkage analyses. American Journal of Human

Genetics 81(3): 559-575.

Quail MA, Smith M, Coupland P, et al. (2012). A tale of three next generation 272

sequencing platforms: comparison of Ion Torrent, Pacific Biosciences, and

Illumina MiSeq sequencers. BMC Genomics 13: 341.

Ramos AM, Crooijmans RPMA, Affara NA, et al. (2009). Design of a high density SNP

genotyping assay in the pig using SNPs identified and characterized by next

generation sequencing technology. Plos One 4(8): e6524.

Renaut S, Nolte AW & Bernatchez L. (2010). Mining transcriptome sequences towards

identifying adaptive single nucleotide polymorphisms in lake whitefish species

pairs (Coregonus spp. Salmonidae). Molecular Ecology 19(Suppl 1): 115-131.

Renn SC, Aubin-Horth N, & Hofmann HA (2004). Biologically meaningful expression

profiling across species using heterologous hybridization to a cDNA microarray.

BMC Genomics 5: 42.

Rhymer JM & Simberloff D. (1996). Extinction by hybridization and introgression.

Annual Review of Ecology and Systematics 27: 83-109.

Ribaut JM & Hoisington D. (1998). Marker-assisted selection: new tools and strategies.

Trends in Plant Science 3(6): 236-239.

Riester M, Stadler PF & Klemm K. (2009). FRANz: reconstruction of wild multi-

generation pedigrees. Bioinformatics 25(16): 2134-2139.

Riley SPD, Pollinger JP, Sauvajot RM, et al. (2006). A southern California freeway is a

physical and social barrier to gene flow in carnivores. Molecular Ecology 15(7):

1733-1741.

Rinkevich S. (2012). An assessment of abundance, diet, and cultural significance of

Mexican gray wolves in Arizona. Dissertation, University of Arizona. 273

Ripple WJ, Estes JA, Beschta RL, et al. (2014). Status and ecological effects of the

world's largest carnivores. Science 343(6167): 151-+.

Ritland K. (1996). Estimators for pairwise relatedness and individual inbreeding

coefficients. Genetical Research 67(2): 175-185.

Rodzen JA, Banks JD, Meredith EP, et al. (2007). Characterization of 37 microsatellite

loci in mountain lions (Puma concolor) for use in forensic and population

applications. Conservation Genetics 8(5): 1239-1241.

Romanov MN, Koriabine M, Nefedov M, et al. (2006). Construction of a California

condor BAC library and first-generation chicken-condor comparative physical

map as an endangered species conservation genomics resource. Genomics 88(6):

711-718.

Romanov MN, Tuttle EM, Houck ML, et al. (2009). The value of avian genomics to the

conservation of wildlife. BMC Genomics 10(Suppl 2): S10.

Rosenberg NA, Li LM, Ward R, et al. (2003). Informativeness of genetic markers for

inference of ancestry. American Journal of Human Genetics 73(6): 1402-1422.

Rosenberg NA. (2004). Distruct: a program for the graphical display of population

structure. Molecular Ecology Notes 4(1): 137-138.

Rousseau J, Tetu B, Caron D, et al. (2002). RCAS1 is associated with ductal breast

cancer progression. Biochemical and Biophysical Research Communications

293(5): 1544-1549.

Rousset F. (2008). Genepop’007: a complete re-implementation of the genepop software

for Windows and Linux. Molecular Ecology Resources 8(1): 103-106. 274

Rozen S & Skaletsky HJ. (2000). Primer3 on the WWW for general users and for

biologist programmers. In Bioinformatics Methods and Protocols: Methods in

Molecular Biology (Krawetz, S. & Misener, S., eds.), pp. 365-386. Humana Press,

Totowa, New Jersey.

Rudan I. (1999). Inbreeding and cancer incidence in human isolates. Human Biology

71(2): 173-187.

Rutledge LY, Wilson PJ, Klütsch CFC, et al. (2012). Conservation genomics in

perspective: A holistic approach to understanding Canis evolution in North

America. Biological Conservation 155: 186-192.

Ryder OA. (2005). Conservation genomics: applying whole genome studies to species

conservation efforts. Cytogenetic and Genome Research 108(1-3): 6-15.

Sabeti PC, Varilly P, Fry B, et al. (2007). Genome-wide detection and characterization of

positive selection in human populations. Nature 449(7164): 913-U912.

Sanger F, Air GM, Barrell BG, et al. (1977). Nucleotide sequence of bacteriophage

phiX174 DNA. Nature 265(5596): 687-695.

Sanger F, Nicklen S & Coulson AR. (1977). DNA sequencing with chain-terminating

inhibitors. Proceedings of the National Academy of Sciences of the United States

of America 74(12): 5463-5467.

Santure AW, Stapley J, Ball AD, et al. (2010). On the use of large marker panels to

estimate inbreeding and relatedness: empirical and simulation studies of a

pedigreed zebra finch population typed at 771 SNPs. Molecular Ecology 19(7):

1439-1451. 275

Saura MA, Fernández A, Rodríguez MC, et al. (2013). Genome-wide estimates of

coancestry and inbreeding in a closed herd of ancient Iberian pigs. Plos One

8(10): e78314.

Schaefer RJ, Torres SG & Bleich VC. (2000). Survivorship and cause-specific mortality

in sympatric populations of mountain sheep and mule deer. California Fish and

Game 86(2): 127-135.

Schena M, Shalon D, Davis RW, et al. (1995) Quantitative monitoring of gene expression

patterns with a cDNA microarray. Science 270(5235): 467-470.

Schlotterer C. (2004). The evolution of molecular markers - just a matter of fashion?

Nature Reviews Genetics 5(1): 63-69.

Schmitz OJ, Hamback PA & Beckerman AP. (2000). Trophic cascades in terrestrial

systems: a review of the effects of carnivore removals on plants. American

Naturalist 155(2): 141-153.

Schmitz OJ, Hamback PA & Beckerman AP. (2000). Trophic cascades in terrestrial

systems: a review of the effects of carnivore removals on plants. American

Naturalist 155(2): 141-153.

Schoville SD, Bonin A, François O, et al. (2012). Adaptive genetic variation on the

landscape: methods and cases. Annual Review of Ecology, Evolution, and

Systematics 43(1): 23-43.

Seabury CM, Bhattarai EK, Taylor JF, et al. (2011). Genome-wide polymorphism and

comparative analyses in the white-tailed deer (Odocoileus virginianus): a model

for conservation genomics. Plos One 6(1): e15811. 276

Seal US. (1990). Mexican wolf (Canis lupus baileyi) population viability assessment.

Review draft report of workshop held 22-24 October, 1990, at Fossil Rim

Wildlife Center, Texas, USA.

Seddon JM, Parker HG, Ostrander EA, et al. (2005). SNPs in ecological and conservation

studies: a test in the Scandinavian wolf population. Molecular Ecology 14(2):

503-511.

Seeb JE, Carvalho G, Hauser L, et al. (2011). Single-nucleotide polymorphism (SNP)

discovery and applications of SNP genotyping in nonmodel organisms. Molecular

Ecology Resources 11(Suppl 1): 1-8.

Seidensticker J, Hornocker MG, Wiles WV, et al. (1973). Mountain lion social

organization in the Idaho Primitive Area. Wildlife Monographs 35: 1-60.

Selkoe KA & Toonen RJ. (2006). Microsatellites for ecologists: a practical guide to using

and evaluating microsatellite markers. Ecology Letters 9(5): 615-629.

Seymour AM, Montgomery ME, Costello BH, et al. (2001). High effective inbreeding

coefficients correlate with morphological abnormalities in populations of South

Australian koalas (Phascolarctos cinereus). Animal Conservation 4(3): 211-219.

Shendure J & Ji HL. (2008). Next-generation DNA sequencing. Nature Biotechnology

26(10): 1135-1145.

Silió L, Rodríguez MC, Fernández A, et al. (2013). Measuring inbreeding and inbreeding

depression on pig growth from pedigree or SNP-derived metrics. Journal of

Animal Breeding and Genetics 130(5): 349-360.

Siminski DP. (2011). Mexican wolf, Canis lupus baileyi, international studbook, 2011.

The Living Desert, Palm Desert, California, USA. 277

Sinclair EA, Swenson EL, Wolfe ML, et al. (2001). Gene flow estimates in Utah's

cougars imply management beyond Utah. Animal Conservation 4: 257-264.

Slate J, David P, Dodds KG, et al. (2004). Understanding the relationship between the

inbreeding coefficient and multilocus heterozygosity: theoretical expectations and

empirical data. Heredity 93(3): 255-265.

Sonesson AK, Janss LLG & Meuwissen THE. (2003). Selection against genetic defects

in conservation schemes while controlling inbreeding. Genetics Selection

Evolution 35(4): 353-368.

Sonoda K, Miyamoto S, Hirakawa T, et al. (2003). Association between RCAS1

expression and clinical outcome in uterine endometrial cancer. British Journal of

Cancer 89(3): 546-551.

Soulé, ME. (1985). What is conservation biology?. BioScience 35(11): 727-734.

Stephens M & Donnelly P. (2003). A comparison of Bayesian methods for haplotype

reconstruction from population genotype data. American Journal of Human

Genetics 73(5): 1162-1169.

Stephens M, Smith NJ & Donnelly P. (2001). A new statistical method for haplotype

reconstruction from population data. American Journal of Human Genetics 68(4):

978-989.

Stoevesandt O, Taussig MJ & He MY. (2009). Protein microarrays: high-throughput

tools for proteomics. Expert Review of Proteomics 6(2): 145-157.

Stölting KN, Nipper R, Lindtke D, et al. (2013). Genomic scan for single nucleotide

polymorphisms reveals patterns of divergence and gene flow between

ecologically divergent species. Molecular Ecology 22(3): 842-855. 278

Storfer A, Eastman JM & Spear SF. (2009). Modern molecular methods for amphibian

conservation. BioScience 59(7): 559-571.

Sunquist ME & Sunquist F. (2002). The essence of cats. In Wild cats of the world

(Sunquist, M. E. & Sunquist, F., eds.), pp. 11-13. University of Chicago Press,

Chicago, Illinois, USA.

Szpiech ZA, Xu JS, Pemberton TJ, et al. (2013). Long runs of homozygosity are enriched

for deleterious variation. American Journal of Human Genetics 93(1): 90-102.

Taberlet P, Camarra JJ, Griffin S, et al. (1997). Noninvasive genetic tracking of the

endangered Pyrenean brown bear population. Molecular Ecology 6(9): 869-876.

Taberlet P, Waits LP & Luikart G. (1999). Noninvasive genetic sampling: look before

you leap. Trends in Ecology & Evolution 14(8): 323-327.

Takahashi S, Urano T, Tsuchiya F, et al. (2003). EBAG9/RCAS1 expression and its

prognostic significance in prostatic cancer. International Journal of Cancer

106(3): 310-315.

Talapatra A, Rouse R & Hardiman G. (2002). Protein microarrays: challenges and

promises. Pharmacogenomics 3(4): 527-536.

Thomson JM, Parker J, Perou CM, et al. (2004). A custom microarray platform for

analysis of microRNA gene expression. Nature Methods 1(1): 47-53.

Torres SG, Mansfield TM, Foley JE, et al. (1996). Mountain lion and human activity in

California: testing speculations. Wildlife Society Bulletin 24(3): 451-460.

Tsoi SCM, Cale JM, Bird IM, et al. (2003). Use of human cDNA microarrays for

identification of differentially expressed genes in Atlantic salmon liver during

Aeromonas salmonicida infection. Marine Biotechnology 5: 545–554. 279

Tsuchihashi Z & Dracopoli NC. (2002). Progress in high throughput SNP genotyping

methods. Pharmacogenomics Journal 2(2): 103-110.

Tsuneizumi MK, Nagai H, Harada H, et al. (2002). A highly polymorphic CA repeat

marker at the EBAG9/RCAS1 locus on 8q23 that detected frequent multiplication

in breast cancer. Annals of Human Biology 29(4): 457-460.

Turner JW, Wolfe ML & Kirkpatrick JF. (1992). Seasonal mountain lion predation on a

feral horse population. Canadian Journal of Zoology 70(5): 929-934.

Tymchuk WV, O’Reilly P, Bittman J, et al. (2010). Conservation genomics of Atlantic

salmon: variation in gene expression between and within regions of the Bay of

Fundy. Molecular Ecology 19(9): 1842-1859.

U.S. Fish and Wildlife Service [USFWS]. (1976). Determination that two species of

butterflies are threatened species and two species of mammals are endangered

species. Federal Register 41(83): 17736-17740.

U.S. Fish and Wildlife Service [USFWS]. (2007). Red wolf (Canis rufus) 5-year status

review: summary and evaluation. U.S. Fish and Wildlife Service Southeast

Region, Manteo, North Carolina, USA.

U.S. Fish and Wildlife Service [USFWS]. (2010). Mexican wolf conservation

assessment. U.S. Fish and Wildlife Service Southwest Region, Albuquerque, New

Mexico, USA.

U.S. Fish and Wildlife Service [USFWS]. (2011). Eastern puma (=cougar) (Puma

concolor couguar) 5-year review: summary and evaluation. U.S. Fish and

Wildlife Service, Maine Field Office, Orono, Maine, USA. 280

U.S. Fish and Wildlife Service [USFWS]. (2012). Mexican wolf Blue Range

reintroduction project statistics. U.S. Fish and Wildlife Service, Downloaded

from: http://www.fws.gov/southwest/es/mexicanwolf/.

Uddin M, Wildman DE, Liu GZ, et al. (2004). Sister grouping of chimpanzees and

humans as revealed by genome-wide phylogenetic analysis of brain gene

expression profiles. Proceedings of the National Academy of Sciences of the

United States of America 101(9): 2957-2962.

Umeoka K, Sanno N, Oyama K, et al. (2001). Immunohistochemical analysis of RCAS1

in human pituitary adenomas. Modern Pathology 14(12): 1232-1236.

Vahey MT, Nau ME, Taubman M, et al. (2003). Patterns of gene expression in peripheral

blood mononuclear cells of rhesus macaques infected with SIVmac251 and

exhibiting differential rates of disease progression. AIDS Research and Human

Retroviruses 19(5): 369-387.

Van Belleghem SM, Roelofs D, Van Houdt J, et al. (2012). De novo transcriptome

assembly and SNP discovery in the wing polymorphic salt marsh beetle Pogonus

chalceus (Coleoptera, Carabidae). Plos One 7(8): e42605. van Orsouw NJ, Hogers RCJ, Janssen A, et al. (2007). Complexity reduction of

polymorphic sequences (CRoPS (TM)): a novel approach for large-scale

polymorphism discovery in complex genomes. Plos One 2(11): e1172.

Van Tassell CP, Smith TPL, Matukumalli LK, et al. (2008). SNP discovery and allele

frequency estimation by deep sequencing of reduced representation libraries.

Nature Methods 5(3): 247-252. 281

Vaysse A, Ratnakumar A, Derrien T, et al. (2011). Identification of genomic regions

associated with phenotypic variation between dog breeds using selection

mapping. PLoS Genetics 7(10): e1002316.

Vera JC, Wheat CW, Fescemyer HW, et al. (2008). Rapid transcriptome characterization

for a nonmodel organism using 454 pyrosequencing. Molecular Ecology 17(7):

1636-1647.

Vignal A, Milan D, SanCristobal M, et al. (2002). A review on SNP and other types of

molecular markers and their use in animal genetics. Genetics Selection Evolution

34(3): 275-305.

Vilá C, Amorim IR, Leonard JA, et al. (1999). Mitochondrial DNA phylogeography and

population history of the grey wolf Canis lupus. Molecular Ecology 8(12): 2089-

2103.

Vogl C, Karhu A, Moran G, et al. (2002). High resolution analysis of mating systems:

inbreeding in natural populations of Pinus radiata. Journal of Evolutionary

Biology 15(3): 433-439. vonHoldt BM, Pollinger JP, Earl DA, et al. (2011). A genome-wide perspective on the

evolutionary history of enigmatic wolf-like canids. Genome Research 21(8):

1294-1305. vonHoldt BM, Pollinger JP, Lohmueller KE, et al. (2010). Genome-wide SNP and

haplotype analyses reveal a rich history underlying dog domestication. Nature

464(7290): 898-902. 282

Waits LP & Paetkau D. (2005). Noninvasive genetic sampling tools for wildlife

biologists: a review of applications and recommendations for accurate data

collection. Journal of Wildlife Management 69(4): 1419-1433.

Waits LP, Luikart G & Taberlet P. (2001). Estimating the probability of identity among

genotypes in natural populations: cautions and guidelines. Molecular Ecology

10(1): 249-256.

Wakeley J, Nielsen R, Liu-Cordero SN, et al. (2001). The discovery of single-nucleotide

polymorphisms - and inferences about human demographic history. American

Journal of Human Genetics 69(6): 1332-1347.

Walker CW, Harveson LA, Pittman MT, et al. (2000). Microsatellite variation in two

populations of mountain lions (Puma concolor) in Texas. Southwestern Naturalist

45(2): 196-203.

Wang J. (2011). A new likelihood estimator and its comparison with moment estimators

of individual genome-wide diversity. Heredity 107(5): 433-443.

Wang J. (2011). Coancestry: a program for simulating, estimating and analysing

relatedness and inbreeding coefficients. Molecular Ecology Resources 11(1): 141-

145.

Wang JJ, Yu XM, Zhao K, et al. (2012). Microsatellite development for an endangered

Bream Megalobrama pellegrini (Teleostei, Cyprinidae) using 454 sequencing.

International Journal of Molecular Sciences 13(3): 3009-3021.

Wang JL & Hill WG. (2000). Marker-assisted selection to increase effective population

size by reducing Mendelian segregation variance. Genetics 154(1): 475-489. 283

Wang S, Haynes C, Barany F, et al. (2009). Genome-wide autozygosity mapping in

human populations. Genetic Epidemiology 33(2): 172-180.

Wang XV, Blades N, Ding J, et al. (2012). Estimation of sequencing error rates in short

reads. BMC Bioinformatics 13: 185.

Wang Z, Gerstein M, & Snyder M. (2009). RNA-Seq: a revolutionary tool for

transcriptomics. Nature Reviews Genetics 10: 57-63.

Watanabe T, Inoue S, Hiroi H, et al. (1998). Isolation of estrogen-responsive genes with a

CpG island library. Molecular and Cellular Biology 18(1): 442-449.

Wayne RK & Vilá C. (2003). Molecular genetic studies of wolves. In Wolves: behavior,

ecology, and conservation (Mech, L. D. & Boitani, L., eds.). The University of

Chicago Press, Chicago, Illinois, USA.

Wayne RK, Lehman N, Allard MW, et al. (1992). Mitochondrial DNA variability of the

gray wolf - genetic consequences of population decline and habitat fragmentation.

Conservation Biology 6(4): 559-569.

Weber M. (1989). La pureza racial del lobo gris Mexicano (Canis lupus baileyi) en

cautiverio en Mexico: estudios preliminares. Memoria del VI Simposio Sobre

Fauna Silvestre; F.M.V.Z. UNAM.

Weber W & Rabinowitz A. (1996). A global perspective on large carnivore conservation.

Conservation Biology 10(4): 1046-1054.

Wehausen JD. (1996). Effects of mountain lion predation on bighorn sheep in the Sierra

Nevada and Granite Mountains of California. Wildlife Society Bulletin 24(3): 471-

479. 284

Weir BS & Cockerham CC. (1984). Estimating F-statistics for the analysis of population-

structure. Evolution 38(6): 1358-1370.

Wiedmann RT, Smith TPL & Nonneman DJ. (2008). SNP discovery in swine by reduced

representation and high throughput pyrosequencing. BMC Genetics 9: 81.

Wigginton JE, Cutler DJ & Abecasis GR. (2005). A note on exact tests of Hardy-

Weinberg equilibrium. American Journal of Human Genetics 76(5): 887-893.

Wilhelm BT & Landry J. (2009). RNA-Seq - quantitative measurement of expression

through massively parallel RNA-sequencing. Methods 48(3): 249-257.

Williams JL. (2005). The use of marker-assisted selection in animal breeding and

biotechnology. Revue Scientifique Et Technique-Office International Des

Epizooties 24(1): 379-391.

Willing EM, Dreyer C & van Oosterhout C. (2012). Estimates of genetic differentiation

measured by FST do not necessarily require large sample sizes when using many

SNP markers. Plos One 7(8): e42649.

Witzenberger KA & Hochkirch A. (2011). Ex situ conservation genetics: a review of

molecular studies on the genetic consequences of captive breeding programmes

for endangered animal species. Biodiversity and Conservation 20(9): 1843-1861.

Wolf JB, Bayer T, Haubold B, et al. (2010). Nucleotide divergence vs. gene expression

differentiation: comparative transcriptome sequencing in natural isolates from the

carrion crow and its hybrid zone with the hooded crow. Molecular Ecology

19(Supp1): 162-175.

Wong AK, Ruhe AL, Dumont BL, et al. (2010). A comprehensive linkage map of the dog

genome. Genetics 184(2): 595-U436. 285

Wright S. (1921). Systems of mating. I. The biometric relations between parent and

offspring. Genetics 6(2): 111-123.

Wright S. (1922). Coefficients of inbreeding and relationship. American Naturalist 56:

330-338.

Wu XM, Viveiros MM, Eppig JJ, et al. (2003a). Zygote arrest 1 (Zar1) is a novel

maternal-effect gene critical for the oocyte-to-embryo transition. Nature Genetics

33(2): 187-191.

Wu XM, Wang P, Brown CA, et al. (2003b). Zygote arrest 1 (Zar1) is an evolutionarily

conserved gene expressed in vertebrate ovaries. Biology of Reproduction 69(3):

861-867.

Young SP & Goldman EA. (1944). The Wolves of North America: Part I, Dover

Publications Inc., New York, New York, USA.

Zhan X, Pan S, Wang J, et al. (2013). Peregrine and saker falcon genome sequences

provide insights into evolution of a predatory lifestyle. Nature Genetics (ahead of

print).

Zhang B, Kirov S & Snoddy J. (2005). WebGestalt: an integrated system for exploring

gene sets in various biological contexts. Nucleic Acids Research 33(Suppl 2):

W741-W748.

Zhao S, Zheng P, Dong S, et al. (2013). Whole-genome sequencing of giant pandas

provides insights into demographic history and local adaptation. Nature Genetics

45(1): 67-71.

Zhou X, Ren L, Meng Q, et al. (2010). The next-generation sequencing technology and

application. Protein & Cell 1(6): 520-536.