
The Pennsylvania State University

The Graduate School

Eberly College of Science

DNA PRESERVATION UNDER EXTREME CONDITIONS

A Dissertation in

Biology and Astrobiology

by

Kristine Korzow Richter

© 2014 Kristine Korzow Richter

Submitted in Partial Fulfillment of the Requirements for the Degree of

Doctor of Philosophy

May 2014

The dissertation of Kristine Korzow Richter was reviewed and approved* by the following:

Beth Shapiro
Associate Professor, Ecology and Evolutionary Biology, UC Santa Cruz
Dissertation Co-Adviser

Jennifer L. Macalady
Associate Professor of Geosciences
Dissertation Co-Adviser, Co-Chair of Committee

James H. Marden
Professor of Biology
Co-Chair of Committee

Stephen W. Schaeffer
Professor of Biology

Christopher House
Professor of Geosciences, Director PSARC, Director PA Space Grant Consortium

Douglas R. Cavener
Professor of Biology
Head of Department of Biology

*Signatures are on file in the Graduate School


Abstract

Over the past thirty years the field of ancient DNA (aDNA) has expanded from examining short DNA sequences from historic specimens in museums to reconstructing entire genomes of extinct organisms. The preservation conditions of DNA in bone are well researched; however, ancient DNA is found in more places than bone. The research herein discusses several of these other environments: microbial DNA from cells in the deep subsurface, microbial DNA trapped in salt crystals, and DNA transferred to objects during use. In addition, extreme conditions of preservation in bones are briefly addressed. Possible recovery of DNA biomarkers of a 34.5-million-year-old hydrothermal system indicates that ancient environments may leave traces for millennia in microbial communities. Discovery of Native American DNA preserved in moccasins and cordage for 700 years points to a potentially large source of information on the genetic composition of ancient populations. Development and analysis of the Depressed Rate Test (DRT) shows the potential for authenticating ancient DNA claims using sequence data alone when isotopic dating is unavailable. Finally, unsuccessful recovery of DNA using traditional methods from bones of animals that lived 100,000-200,000 years ago illustrates the limitations of those methods. However, new methods may allow DNA recovery from these samples. Examining DNA preservation under different conditions helps to inform sampling strategies, analytical techniques, and authentication of DNA data from extreme environments here on Earth and on other planets or moons in the Solar System.


Table of Contents

List of Figures
List of Tables
Acknowledgments

Chapter 1: Introduction
  DNA as a window to the past
  Concerns with aDNA research
  DNA degradation through time
  Co-extracted compounds inhibit reactions
  DNA contamination
  Characterizing DNA under extreme conditions
  Determining the authenticity of geologically ancient DNA
  Detecting biomarkers from an ancient hydrothermal system
  Recovering aDNA from touched objects
  The limit of recoverable DNA in bones
  Ancient DNA and astrobiology implications
  References

Chapter 2: A computational test of the authenticity of geologically ancient DNA
  Introduction
  Materials and Methods
  Data Sets
  Analysis
  Results
  Simulated Data
  Horse Data
  Bacterial and Archaeal ‘Test’ Data
  Discussion
  References

Chapter 3: Biosignatures of thermophilic microorganisms from a temporary hydrothermal system 34.5 million years ago in the Chesapeake Bay Impact Structure
  Introduction
  Impact craters and biomarkers
  Chesapeake Bay impact structure
  Materials and Methods
  Coring and contamination control
  DNA extraction
  PCR amplification, library preparation and sequencing
  Data processing and analysis
  Results
  DNA extraction, amplification and library preparation
  Sequencing results and metagenomic composition
  analysis
  Genes Indicative of Metabolic Potential
  Discussion
  Low biomass samples – difficulties resulting in high fragmentation
  Community composition and biomarker identification
  Biomarkers of an ancient hydrothermal system
  Astrobiology and DNA biomarkers
  References

Chapter 4: Identifying human haplotypes from DNA transferred from objects during use
  Introduction
  DNA transferred to objects
  Political and cultural challenges of working with human remains
  Challenges of DNA recovery from transferred objects
  Cave Site
  Materials and Methods
  Sampling
  Extraction
  Amplification
  SNP amplification design
  Library preparation and sequencing
  Sequence analysis and haplotype identification
  Results
  Quality of Data
  Damage
  Contamination
  SNPs and haplotypes from ancient samples
  Discussion
  References

Chapter 5: The limit of recoverable aDNA in bones – DNA recovered from bones from the fossil bearing sites of the Yukon Territory, Canada and Snowmass, Colorado USA
  Introduction
  The fossil bearing sites of Snowmass, Colorado (USA) and the Yukon, Canada
  Bison latifrons
  Cervidae family
  Camelops hesternus
  Materials and Methods
  Sampling and extraction
  DNA amplification, library preparation, and sequencing
  Analysis
  Results
  Snowmass bison and deer
  Yukon camels
  Discussion
  Snowmass bison and deer
  Yukon camels
  Traditional PCR vs. library preparation and capture for very old sequences
  Too old, and not too cold
  References

Chapter 6: Conclusion
  DNA Preservation under extreme conditions
  Determining the authenticity of geologically ancient DNA
  Detecting biomarkers from an ancient hydrothermal system
  Recovering aDNA from touched objects
  The limit of recoverable DNA in bones
  DNA in extreme conditions
  Ancient DNA and Astrobiology
  Analog environments and preservation conditions
  Remote DNA detection and analysis
  Future Directions
  References


List of Figures

Figure 1-1: Post-mortem DNA modification from ancient remains. (a) Formation of strand breaks through hydrolytic cleavage from either (i) direct cleavage of the phosphodiester bond or (ii) depurination followed by breakage of the backbone through beta elimination. (b) Different types of crosslink formation including (i) interstrand crosslinks by alkylation and (ii) intermolecular crosslinks through the Maillard reaction. (c) Oxidative and hydrolytic cleavage result in modification of bases leading to (i) blocking lesions or (ii) miscoding lesions. Figure from Willerslev and Cooper (2005).

Figure 1-2: On the left, fragmentation of DNA through time showing lesion frequency or single and double stranded breaks. Double stranded breaks occur more frequently in older samples. While there is a possibility for correcting single stranded breaks, double stranded breaks fragment the DNA permanently. On the right, interstrand crosslink formation through time. Crosslink formation results in blocking the polymerase and rendering the DNA not amplifiable and effectively shortening the fragment size. Figure from Mitchell et al. (2005).

Figure 1-3: Long term survival of 100 bp of DNA as a function of temperature. Calculations were based upon a genome size of 3.0x10^6 bp, the Arrhenius equation, and the depurination kinetics of Lindahl (1993). Figure from Willerslev et al. (2004a).

Figure 1-4: One example of a method for multiple subsampling events to remove contamination. Figure from Juck et al. (2005).

Figure 2-1: Within a family of organisms there is a distribution of relative molecular clock rates that can be determined (gray box). Additional sequences can be added and their position relative to the modern distribution of rates can be estimated. Modern sequences assumed to be ancient will have an artificially increased rate (blue box) due to the fact that there are more mutations than would be expected, while ancient sequences assumed to be modern will have an artificially depressed rate (red box) due to the fact that there are fewer mutations than expected. The proposed method makes use of the ancient sequences assumed to be modern and looks for a depressed rate compared to the modern distribution. Image adapted from Hebsgaard et al. (2005).

Figure 2-2: For each ancient sequence at each rate there are three trees, each of which has three ancient sequences at each time point. Each of the trees has three replicate sequence sets generated for each rate. The control sequence set contains 50 modern sequences for each tree. For DS1 this results in a total of 684 sequence sets and for DS2 this results in 360 sequence sets that were then used to test the power of DRT.

Figure 2-3: The distribution of calculated mutation rates for the external branches in the trees. All data are from simulated data sets with a root height of 1 Mya. Row one has a rate of 5x10^-7, row two a rate of 5x10^-8, and row three a rate of 5x10^-9. The ancient sequences, which are shaded in red, have fixed ages of 1 Mya (column 1), 100 kya (column 2), and 10 kya (column 3). The ancient sequences with a rate of 5x10^-7 have significantly depressed rates for sequences aged 1 Mya (p-value 0.00000±0.00000) and 100 kya (p-value 0.00000±0.00000) but not for 10 kya (p-value 0.62071±0.01002). Ancient sequences with a rate of 5x10^-8 have significantly depressed rates only for the sequence aged 1 Mya (p-value 0.00000±0.00000). Ancient sequences with rates of 5x10^-9 have no sequences with significantly depressed rates.

Figure 2-4: Relative rate distribution for horse data. The ancient samples are highlighted in red for a 700 kya bone, an 80-120 kya bone, and a 25 kya bone. Based upon simulation the 700 kya bone is within the range of detection for the DRT; however, a significantly depressed rate was not detected.

Figure 3-1: Location of the drill site within the Chesapeake Bay Impact Structure. The location is also shown in relation to the eastern USA. Image from Cockell et al. (2012).

Figure 3-2: Core depth in relation to composition of the core, impact processes from Gohn et al. (2008), and cell counts from Cockell et al. (2009). The locations of the three samples in this study are indicated with blue arrows on the far left.

Figure 3-3: A) Rarefaction plot showing a curve of the species richness from the taxonomic data. The curve is a plot of the total number of distinct species annotations as a function of the number of sequences sampled. The flattening of the curve to the right indicates that more intensive sampling is likely to yield only a few additional species. The curves represent each of the four different barcodes for Biocore 2 (red), Biocore 3 (orange), Biocore 4 (yellow), and the EN (black). B) The first and second axes of the PCoA plot generated by MG-RAST for the 16 individual barcodes. The barcodes cluster by sample and are separate from one another. The Extraction Negative clusters closest to Biocore 29, which is expected since Biocore 29 would have the least amount of template DNA, making it the most prone to contamination. C) PCA plot of the Biocores and several different environments (Chesapeake Bay Water (CB Water), subsurface samples from the Gulf of Mexico (Gulf of Mexico) and the Peru Margin (Peru Margin)). Most of the variation between the samples separates Biocore 29 from the rest of the samples. This is consistent with Biocore 29 being far deeper than all of the other samples and thus representing a different environment. Biocore 3 is also deeper than the other samples, but represents a different geochemical environment than Biocore 29, which is consistent with the separation from both Biocore 29 and the other samples. The EN represents a clean room environment which is different from the sediment and water environments of the other samples. By removing the sequences which represented a large part of the contamination in the Biocores, the EN no longer groups closely with Biocore 29.

Figure 3-4: Graphs of the taxonomic composition of the 16S rRNA sequence data from the CBIS samples.

Figure 3-5: Graphs of the taxonomic composition of the protein sequence data from the CBIS samples.


Figure 3-6: Graphs of the taxonomic composition of from protein sequence data from the CBIS samples...... 83

Figure 4-1: Moccasin (42B01 10241) with porcupine quill applique and piping decorating the toe and apron seam (Source: Utah Museum of Natural History).

Figure 4-2: Haplogroup map including key Native American haplogroups X2a, A2a, B4a and B2. The SNPs which determine each haplogroup are indicated. When possible more than one SNP was targeted for each branching point. In order to confirm a haplotype present in the sample at least one SNP from each branching point to the haplogroup had to be present.

Figure 5-1: Map of the locations where the bones analyzed in this paper were recovered (Map made with www.zeemaps.com).

Figure 5-2: Phylogeny of the Camelidae and dietary assignment from dental microwear analysis. The tribes Lamini and Camelini are marked on the tree which shows the relationships between Camelops and modern camelid genera: Lama, Vicugna, and Camelus. Figure from Semprebon and Rivals (2010).


List of Tables

Table 2-1: Horses with sample numbers and ages.

Table 2-2: Ancient bacterial and archaeal sequences tested including reported age and ingroup and outgroup genera or order used in the analysis. Data come from the following sources: a) Panieri et al. (2010), b) Fish et al. (2002), c) Park et al. (2009), and d) Vreeland et al. (2007).

Table 3-1: Sample depth, location in the core, and date collected for the samples analyzed.

Table 3-2: PCR primers including fragment length and annealing temperature in degrees Celsius. a) Primers from Baker et al. (2003).

Table 3-3: Genes and proteins corresponding to metabolisms which were investigated.

Table 3-4: Protocols tested on the Biocore samples, which extractions were used to make Illumina libraries, and which libraries were sequenced.

Table 3-5: Information about sequences from quality control and protein and rRNA identification from MG-RAST. ADR is the number of duplicate reads.

Table 3-6: Compiled families from bacteria and archaea were compared between samples. The second column shows the number of families that are only present in the Extraction Negative. The third column shows the number of families shared between the core sample and the Extraction Negative. The fourth column shows the families which are shared only between the Biocores. The fifth column shows the number of families which are unique to each Biocore. Based upon family composition the samples clearly differ, with all Biocores containing families unique from each other and from the Extraction Negative.

Table 3-7: Families of bacteria and archaea unique to the biocores. Some of these are clearly shown in Figures 3-4 to 3-6 (pie charts) and some are families with low percentages which are not clearly indicated in the pie charts.

Table 3-8: The metagenomes were searched for key metabolism genes first using nucleotide sequence (columns 2-5) and then, for confirmation, those entire reads were translated and compared to a protein database (columns 6-9).

Table 4-1: List of samples used in aDNA analysis, the object type, the type of sampling used, and the radiocarbon dates for the item (when available). Items marked with a * are from a private collection; other moccasins from 1937 are from the Utah Museum of Natural History. Samples from 2011 were taken during excavation.

Table 4-2: Table of primers used for analysis including length of product, annealing temperature in degrees Celsius, location in the genome, and SNPs amplified on the fragment. a – primers from Álvarez-Iglesias et al. (2007), b – primers from van Oven and Kayser (2009), c – primers from Vallone et al. (2004), d – primers from Ripan Malhi (pers. com.), e – primers for the HVS1 region which were used to test for presence of Native American DNA.

Table 4-3: Samples of items tested for Native American DNA indicating which samples were targeted for SNP analysis (presence of Native American clones). For those 18 samples the number of SNP locations which amplified is reported (maximum 39), along with the number of those locations where SNPs were detected. Finally the haplotype or haplotypes present are reported.

Table 5-1: Samples which were extracted for aDNA analysis, including the sample location and morphological species identification. DMNS is the Denver Museum of Nature and Science.

Table 5-2: Primer sequences used in this experiment, along with the fragment length for the amplified product. * indicates primers from Shapiro et al. (2004) and ** indicates primers from Weinstock et al. (2009).


Acknowledgments

This required a lot of work and persistence to complete. First, I want to thank those who worked in the lab with me – sometimes for many, many hours – especially Mathias Stiller. Second, I send a big thanks to those who spent hours editing my thesis for grammar and spelling and doing practical things like helping clean and cook – Nick and Janet Korzow, Jeannie and Joe Richter, and Didem Ikis. Third, I could not have gotten through all the failures in these projects without the support of my boss, Dr. Beth Shapiro, and my husband, Stephen Richter. Thanks!


Chapter 1: Introduction

DNA as a window to the past

Deoxyribonucleic acid (DNA) encodes the genes that make up every living creature on Earth. DNA from living organisms can be used to determine evolutionary relationships between organisms, metabolisms, character traits, and more. From organisms that are long dead, DNA can provide a window into the organism’s diet, disease history, relationship to extant (still living) organisms, character traits, and metabolic processes. Thirty years ago the first attempts were made to recover ancient DNA (aDNA) sequences from the skin of a quagga, an extinct member of the horse family (Higuchi et al., 1984), and from human mummies (Pääbo, 1985, 1989). These early attempts used molecular cloning to amplify small sequence fragments and revealed several important characteristics of aDNA. First, DNA recovered from ancient samples was principally microbial or fungal in origin. Second, endogenous DNA was often limited to very low concentrations of short, damaged fragments of multi-copy loci (such as mitochondrial DNA (mtDNA)).

With the invention of the polymerase chain reaction (PCR), it became possible to amplify and study samples with low concentrations of DNA, thereby allowing the number and range of aDNA studies to increase (Pääbo, 1989; Pääbo et al., 1989; Thomas et al., 1989). PCR, however, also created a problem. Its enormous amplifying power increased sensitivity to contamination from modern DNA, especially previously amplified PCR products. This resulted in a number of false positives from intra-laboratory contamination, yielding spectacular claims of DNA sequences surviving millions of years in plant remains (Golenberg et al., 1990; Soltis et al., 1992), dinosaur bones (Woodward et al., 1994), and amber inclusions (Cano et al., 1992a, 1992b, 1993; DeSalle, 1994; DeSalle et al., 1992, 1993; Poinar and Cano, 1993). Most of these claims have either been disproven, having been shown to originate from human or microbial contamination (Gutiérrez and Marin, 1998; Zischler et al., 1995), or have proved impossible to replicate independently (Austin et al., 1997; Sidow et al., 1991). Some of the others have been disregarded because a lack of appropriate methods renders them effectively meaningless (Adcock et al., 2013; Cooper et al., 2001).

With the publication of standards for aDNA research which addressed issues of contamination and template damage (Cooper and Poinar, 2000; Willerslev and Cooper, 2005), aDNA research moved through its troubled past into a viable scientific discipline. Using PCR based techniques, relationships between living and extinct animals were elucidated. For example, woolly mammoths were found to be more closely related to Asian than African elephants (Krause et al., 2006). Information about phenotypic traits of extinct animals has also been recovered. In mammoths, the single-exon nuclear melanocortin type I receptor (Mc1r) gene was sequenced and found to be polymorphic, implying that populations of mammoths had both dark and light haired individuals (Römpler et al., 2006).

Although most aDNA work has focused on bones, teeth, and occasionally hair from the target individual, aDNA can be collected from the environment as well. Animals and plants shed their DNA into the environment during their lives and after their death. In sediments which can be securely dated and have little DNA migration through the sediment, aDNA has been sequenced from plants and animals which were present at the age of the sediment and used to infer the fauna and flora in the environment as well as the climate (Haile et al., 2009; Willerslev et al., 2003).

Also present in the environment is DNA from living and dead microorganisms. Environmental samples can provide information on the microbial community living in the environment, including the taxonomic composition of the community, the metabolic pathways found in it, and the evolutionary history and closest relatives of its members (Gilbert and Dupont, 2011; Handelsman, 2004; Temperton and Giovannoni, 2012). This approach is useful not only for modern samples: in places where there is very little migration of microorganisms through the environmental matrix, it can also give insight into ancient microbial communities (Inagaki et al., 2005). Characterizing ancient microbial communities in this way has included finding thermophiles in Lake Vostok (Bulat et al., 2004) and identifying the ancient origin of antibiotic resistance genes (D’Costa et al., 2011).

In 2005, the first of the next-generation sequencing (NGS) technologies was described (Margulies et al., 2005) and was almost immediately used by aDNA researchers to generate 13 million bp of the woolly mammoth nuclear genome (Poinar et al., 2006). Drafts of the nuclear genomes of the woolly mammoth (Miller et al., 2008) and Neanderthal (Green et al., 2010), as well as high quality coverage of the nuclear genome of a 4,500 year old Palaeo-Eskimo (Rasmussen et al., 2010), followed. NGS is ideal for aDNA research because it eliminates some of the competition problems that occur between target and contaminant sequences and it produces large amounts of sequence data (von Bubnoff, 2008).


A more recent advance, hybridization capture, allows enrichment of the target DNA without using PCR. Hybridization capture was initially used on modern DNA to enrich for large parts of the genome, for example the complete exome (Ng et al., 2009). In aDNA research it is used to enrich the target species DNA over contaminant DNA such as that from bacteria and fungi (Briggs et al., 2009; Burbano et al., 2010). Very recent advances in extraction, library preparation, and hybridization capture have allowed for the retrieval of highly fragmented and damaged aDNA (Dabney et al., 2013; Meyer et al., 2012; Orlando et al., 2013).

While technological advances have made the recovery of aDNA easier, aDNA research is still limited by DNA degradation and contamination. These issues have been quantified and are well studied in aDNA from bones. However, many other types of samples can be used for aDNA research, including environmental samples and items to which DNA is transferred during use. I test data from geologically ancient DNA (gaDNA) originating from microorganisms trapped in salt crystals for authenticity. I extract and sequence DNA from difficult environmental samples (up to 840 m below the surface) in order to test for the presence of biomarkers of microorganisms which lived in an ancient hydrothermal system. I also extract and sequence human DNA from mittens, moccasins, and cordage used 700 years ago and then buried. Finally, I examine some very difficult bones from ancient bison, deer, and camelids to assess the condition of DNA preservation in these samples.

Concerns with aDNA research

DNA degradation through time

A number of things can cause damage – chemical modifications – to DNA, including UV radiation, free radicals, and even water. During the lifetime of an organism, DNA is actively protected from harm, checked for damage, and repaired by multiple cellular mechanisms (Lindahl, 1993). After the organism dies or DNA is shed into the environment, these cellular mechanisms no longer operate and damage therefore accumulates. Ultimately this damage renders the DNA unrecoverable. The pattern of DNA damage and degradation that occurs is well-characterized (Briggs et al., 2007; Brotherton et al., 2007; Gilbert et al., 2003a, 2007; Heyn et al., 2010; Ho et al., 2007; Hofreiter et al., 2001a; Stiller et al., 2006).

Typically after death DNA is degraded by endogenous nucleases; however, this process is slowed or stopped when environmental conditions deactivate or destroy the nucleases. This happens under conditions of rapid desiccation, low temperatures, high salt concentration, and lack of oxygen (Hofreiter et al., 2001b). Even if the nucleases are degraded, damage still slowly accrues through oxidation, hydrolysis and radiation, as shown in Figure 1-1 (Willerslev and Cooper, 2005). Primarily, damage occurs as continued fragmentation of DNA into shorter and shorter fragments, either by direct hydrolytic cleavage of the DNA phosphodiester backbone or by hydrolytic depurination resulting in an abasic site followed by cleavage of the backbone by beta elimination. Interstrand crosslinks by alkylation and intermolecular crosslinks by Maillard reactions prevent amplification by polymerases (Figure 1-2). In addition to shortened or unamplifiable fragments, specific chemical modifications accumulate, mostly deamination of cytosine to uracil, which results in C to T and G to A miscoding lesions. Additional oxidative damage can result in blocking lesions which promote chimeric sequences through ‘jumping PCR’ (Hansen et al., 2006; Heyn et al., 2010; Hofreiter et al., 2001b; Mitchell et al., 2005; Stiller et al., 2006; Willerslev and Cooper, 2005).
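
The miscoding pattern described above can be illustrated with a minimal sketch that counts substitution types between a reference and an aligned read and reports the share attributable to cytosine deamination (C to T, or G to A when the complementary strand is read). The function names and the two toy sequences are hypothetical and serve only as an illustration; they are not data or software from this work.

```python
from collections import Counter


def count_mismatches(reference: str, read: str) -> Counter:
    """Count substitution types (e.g. 'C>T') between two aligned, equal-length sequences."""
    if len(reference) != len(read):
        raise ValueError("sequences must be aligned to the same length")
    counts = Counter()
    for ref_base, read_base in zip(reference.upper(), read.upper()):
        # Only compare unambiguous bases; gaps and N are skipped.
        if ref_base in "ACGT" and read_base in "ACGT" and ref_base != read_base:
            counts[f"{ref_base}>{read_base}"] += 1
    return counts


def deamination_fraction(counts: Counter) -> float:
    """Fraction of all mismatches that are C>T or G>A."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return (counts["C>T"] + counts["G>A"]) / total


# Toy sequences (hypothetical): two C>T, one G>A, and one A>G mismatch.
toy_reference = "ACGTCCGATCGGACTC"
toy_read = "ATGTCTGATCAGGCTC"

toy_counts = count_mismatches(toy_reference, toy_read)
print(dict(toy_counts))
print(deamination_fraction(toy_counts))
```

An excess of C to T and G to A changes relative to other substitution types is the kind of signal this sketch would surface; it makes no attempt to separate damage from true sequence variation.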


Figure 1-1: Post-mortem DNA modification from ancient remains. (a) Formation of strand breaks through hydrolytic cleavage from either (i) direct cleavage of the phosphodiester bond or (ii) depurination followed by breakage of the backbone through beta elimination. (b) Different types of crosslink formation including (i) interstrand crosslinks by alkylation and (ii) intermolecular crosslinks through the Maillard reaction. (c) Oxidative and hydrolytic cleavage result in modification of bases leading to (i) blocking lesions or (ii) miscoding lesions. Figure from Willerslev and Cooper (2005).


Damage and degradation of ancient DNA are dependent upon the preservation environment, most importantly temperature, but also pH, presence of heavy metals, and oxygen content (Hofreiter et al., 2001b). Continuously cold temperatures provide the best preservation conditions. Figure 1-3 shows the theoretical calculations for DNA survival based upon temperature. At warmer temperatures depurination happens more rapidly, fragmenting the DNA more quickly (Lindahl, 1993; Willerslev et al., 2004a). Measurements of decay kinetics confirm the theoretical calculations, although with a longer half-life, and show that nuclear DNA degrades at least twice as fast as mitochondrial DNA (Allentoft et al., 2012). Other factors that affect the rates of depurination are pH and heavy metal ion chelation, making areas with higher concentrations of heavy metals and soil humic acids more susceptible to degradation (Willerslev et al., 2004b). Crosslinks can accumulate even faster than strand breaks, resulting in a period when DNA is still present in the sample but is not amplifiable (Hansen et al., 2006). Ultimately, the consensus for the upper limit of aDNA survival is between 400 thousand and 1.5 million years, beyond which DNA is either severely crosslinked, completely degraded, or otherwise non-detectable (Allentoft et al., 2012; Hansen et al., 2006; Willerslev et al., 2004a).
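
The temperature dependence discussed above can be sketched with the Arrhenius equation. The example below is a minimal illustration and not a reproduction of the calculations behind Figure 1-3: the activation energy and the per-site break rate are placeholder assumptions chosen only to show the shape of the relationship, and the function names are hypothetical. It further assumes that strand breaks occur independently at a constant per-site rate k, so that the fraction of fragments of length L still intact after time t is exp(-k L t).

```python
import math

GAS_CONSTANT = 8.314  # J / (mol K)

# Placeholder values for illustration only; not taken from the cited studies.
ASSUMED_ACTIVATION_ENERGY = 130_000.0  # J / mol (assumption)
ASSUMED_RATE_AT_REFERENCE = 1e-6       # breaks per site per year at the reference temperature (assumption)
REFERENCE_TEMP_C = 10.0                # reference temperature in degrees Celsius (assumption)


def rate_ratio(temp_c: float,
               reference_temp_c: float = REFERENCE_TEMP_C,
               activation_energy: float = ASSUMED_ACTIVATION_ENERGY) -> float:
    """Arrhenius ratio k(temp) / k(reference) for a fixed activation energy."""
    temp_k = temp_c + 273.15
    reference_k = reference_temp_c + 273.15
    return math.exp(-activation_energy / GAS_CONSTANT * (1.0 / temp_k - 1.0 / reference_k))


def intact_fraction(temp_c: float, fragment_length_bp: int, years: float) -> float:
    """Fraction of fragments of the given length with no break after the given time."""
    rate = ASSUMED_RATE_AT_REFERENCE * rate_ratio(temp_c)
    return math.exp(-rate * fragment_length_bp * years)


if __name__ == "__main__":
    for temp_c in (-10.0, 0.0, 10.0, 20.0):
        print(f"{temp_c:6.1f} C  relative rate {rate_ratio(temp_c):10.4f}  "
              f"intact 100 bp after 10,000 years: {intact_fraction(temp_c, 100, 10_000):.4f}")
```

Under these assumptions the rate rises steeply with temperature, which is why cold sites dominate the oldest authenticated aDNA; the absolute numbers printed depend entirely on the placeholder parameters.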

Figure 1-2: On the left, fragmentation of DNA through time showing lesion frequency or single and double stranded breaks. Double stranded breaks occur more frequently in older samples. While there is a possibility for correcting single stranded breaks, double stranded breaks fragment the DNA permanently. On the right, interstrand crosslink formation through time. Crosslink formation results in blocking the polymerase and rendering the DNA not amplifiable and effectively shortening the fragment size. Figure from Mitchell et al. (2005).


There are several methods which can be used to detect damage and increase the quality and quantity of amplifiable aDNA. Uracil-N-glycosylase (UNG) removes deamination products of cytosine and therefore can test the origins of C to T changes in ancient samples (Gilbert et al., 2003b; Hofreiter et al., 2001a). N-phenacylthiazolium bromide (PTB) can break intermolecular cross-links, although it is not widely used (Poinar et al., 1998). The most commonly used method to amplify damaged DNA is to use a high fidelity polymerase enzyme such as Taq HiFi or TaqGold which will minimize sequence error rates and/or increase amplification efficiency (Cooper et al., 2001; Willerslev et al., 1999).

Figure 1-3: Long term survival of 100 bp of DNA as a function of temperature. Calculations were based upon a genome size of 3.0x10^6 bp, the Arrhenius equation, and the depurination kinetics of Lindahl (1993). Figure from Willerslev et al. (2004a).

Co-extracted compounds inhibit reactions

Other compounds can be co-extracted with DNA during aDNA extractions and can inhibit amplification by PCR (Rossen et al., 1992; Wilson, 1997). The biggest problem comes from humic acids in soil (Tebbe and Vahjen, 1993; Tsai and Olson, 1992a), but high heavy metal content (Tsai and Olson, 1992b) and polysaccharides and phenolic compounds from plants (Lee and Cooper, 1995) can also inhibit Taq polymerases used in PCR. Humic acids are a complex mixture of different acids containing carboxyl and phenolate groups; they are produced by biodegradation of dead organic matter and are found in soil, peat, coal, and bodies of water (Stevenson, 1994). They are frequently co-extracted with DNA and inhibit enzymatic reactions, specifically PCR (Tebbe and Vahjen, 1993; Tsai and Olson, 1992a). Plants and fungi contain high quantities of polysaccharides and phenolic compounds which also inhibit PCR (Lee and Cooper, 1995; Ziegenhagen et al., 1993). Heavy metal concentration is not a common problem in most aDNA studies, but can be a problem in some sediment environments (Tsai and Olson, 1992b). In addition to PCR, library preparation reactions use different enzymes, and the effects of different inhibitors have not been fully explored for these reactions.

There are several proposed ways to get around these inhibitors. The easiest solution is to simply dilute the extract in the PCR reaction. This lowers the concentration of inhibitors and thus allows for amplification. However, this solution can be undesirable for aDNA and deep biosphere sediment work because the already low concentration of DNA is also diluted (King et al.; Kormas et al., 2003; Webster et al., 2003). Gel purification has been shown to remove humic acids; however, this method has the potential to introduce contamination and lose large quantities of DNA, and so is only useful on samples with high quantities of DNA (Sørensen et al., 2004). Various extraction protocols can help remove inhibitors, including carrier mediated ethanol precipitation (Kalmár et al., 2000), isopropanol precipitation (Hanni et al., 1995), and the now standard aDNA silica extraction protocol (Rohland and Hofreiter, 2007a). Several additives to extractions can increase the amount of inhibitors removed. These include cetyltrimethylammonium bromide (CTAB) (Cubero et al., 1999) and polyvinylpolypyrrolidone (PVPP) (Barns et al., 1994; Holben et al., 1988) for general use and polyvinylpyrrolidone (PVP) for use in plant extractions (Rachmayanti et al., 2006). For sediments with extreme heavy metal inhibition, dialysis or ion exchange chromatography may be needed (Tebbe and Vahjen, 1993). Finally, bovine serum albumin (BSA) has been found to block inhibitors when added to PCR reactions for a wide range of samples including sediments and aDNA extracts (Kreader, 1996; McKeown, 1994; Rohland and Hofreiter, 2007b; V. Wintzingerode et al., 1997).
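The trade-off in the dilution approach described above can be made concrete with simple arithmetic. The sketch below assumes that template and inhibitor are diluted by the same factor; the starting values (50 template copies per microliter, an inhibitor concentration of 1.0 in arbitrary units) and the function name are hypothetical and serve only as an illustration.

```python
def dilute(template_copies_per_ul, inhibitor_conc, dilution_factor, volume_ul=1.0):
    """Return (template copies added to the reaction, inhibitor concentration)
    after diluting an extract by the given factor."""
    copies = template_copies_per_ul * volume_ul / dilution_factor
    inhibitor = inhibitor_conc / dilution_factor
    return copies, inhibitor


# Hypothetical low-template extract: 50 copies per microliter, inhibitor at 1.0 (arbitrary units)
for factor in (1, 10, 100):
    copies, inhibitor = dilute(50, 1.0, factor)
    print(f"1:{factor} dilution -> {copies} template copies, inhibitor {inhibitor}")
```

In this example a 1:100 dilution leaves, on average, less than one template copy per microliter of diluted extract, which is why dilution alone is often unsuitable for low-template samples.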

DNA contamination

In addition to accumulating damage through time, aDNA is also prone to contamination by modern DNA. aDNA is fragmented, damaged, and often present in low quantities in tissue or sediments. Orders of magnitude more unfragmented and undamaged modern DNA can be present in the sample (Hofreiter et al., 2001b). When the aDNA is amplified, the polymerase will preferentially amplify the undamaged modern contaminant DNA over the damaged and fragmented aDNA (Cooper and Poinar, 2000; Gilbert et al., 2005a; Willerslev and Cooper, 2005). Modern contamination can come from three main sources: the environment, excavation and/or storage, and laboratory analysis.

Contaminant DNA can come from the environment. As most samples examined using aDNA are buried in the soil or sediment, environmental bacteria from the soil will thus be present in the sample (Willerslev et al., 2004a). In addition, depending upon the environment, DNA can leach from the surface potentially contaminating a buried sample with modern DNA (Andersen et al., 2012; Hebsgaard et al., 2009).

During excavation, samples are at the greatest risk of contamination by the excavators or by the excavation equipment. This is a particularly large problem with human samples (Cooper, 1997; Serre et al., 2004). Also problematic are environmental samples taken with coring equipment, as the drilling or coring process can introduce new microorganisms or spread surface microorganisms, contaminating samples (Rummel, 2001), especially in environments with low microbial biomass (Bulat et al., 2004; Navarro-Gonzalez, 2003). Methods have been developed to monitor potential contamination from drilling equipment by coating the drill and spiking the drilling fluid with a modern DNA vector (Willerslev et al., 2003), microfluorescent beads (Gronstal et al., 2009), or other tracers (Smith et al., 2000). The samples are then subsampled through several steps (Figure 1-4). If either the DNA or the beads are present in the sample, the sample should be considered contaminated. In addition, studies have been done on drilling fluid in order to see which microorganisms are present and thus likely contaminants (Alekhina et al., 2007; Gronstal et al., 2009). Contamination can also occur during storage before aDNA samples are taken. This is a particular problem for museum specimens where cross contamination or contamination from small , insects, and humans can be common (Cooper, 1997; Serre et al., 2004).


Figure 1-4: One example of a method for multiple subsampling events to remove contamination. Figure from Juck et al. (2005).

Finally, samples can be contaminated during the extraction and amplification process. A common source of contamination is amplified products in the laboratory. One PCR amplification can create 10^12-10^15 copies of the desired fragment of DNA (Kwok and Higuchi, 1989). This is many orders of magnitude larger than what is in the aDNA sample (Cooper et al., 2001; Handt et al., 1996). Thus, working in an isolated area and making sure that modern PCR products are not transported into that isolated area is very important (Willerslev and Cooper, 2005). As with the excavation, contamination can occur from the laboratory environment and tools. With most samples, this type of contamination can be easily identified; however, it is particularly a concern with human and microbial work because of the ubiquitous presence of both humans and microorganisms in the laboratory. DNA from microorganisms is especially problematic because most aDNA analysis processes remove all microbial DNA without identifying if the microbial DNA is environmental or from laboratory contamination (Willerslev and Cooper, 2005).
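The scale of the amplified-product problem follows from the exponential nature of PCR. The sketch below uses an idealized model in which every template is copied with a fixed efficiency in every cycle; this is an assumption made for illustration only, as real reactions plateau when reagents are depleted. The function name and the cycle numbers are illustrative choices and are not taken from Kwok and Higuchi (1989).

```python
def amplified_copies(starting_templates, cycles, efficiency=1.0):
    """Idealized PCR yield: each cycle multiplies the copy number by (1 + efficiency)."""
    return starting_templates * (1.0 + efficiency) ** cycles


# Idealized yield from a single starting template at perfect efficiency
for cycles in (30, 35, 40):
    print(cycles, f"{amplified_copies(1, cycles):.2e}")
```

Under this idealized model a single template exceeds 10^12 copies after 40 cycles, so even a minute carry-over of amplified product can outnumber the templates in an aDNA extract.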

The aDNA community has a checkered past of spectacular claims, including those of dinosaur DNA, that were later shown to be contaminants (Wang et al., 1997; Woodward et al., 1994; Young et al., 1995; Zischler et al., 1995). This led to the development and widespread adoption of a series of standard protocols to help ensure the authenticity of ancient sequences (Cooper and Poinar, 2000; Gilbert et al., 2005a; Hofreiter et al., 2001b; Pääbo et al., 1989; Wayne et al., 1999). These protocols involve both protecting the ancient samples and extracts from contamination by modern sequences and detecting patterns of contamination and damage in the aDNA data. These criteria, as stated by Gilbert et al. (2005a), are:

1. Isolation of work areas – Because the risk of contamination from modern, amplified DNA is quite high, extractions and the set-up of amplification reactions should be conducted in an isolated laboratory space.
2. Negative control extractions and amplifications – To screen for contaminants at all stages, negative controls should be completed and when necessary sequenced. If possible, extractions and amplifications should be redone if the corresponding negative control is contaminated.
3. Appropriate molecular behavior – As DNA degradation occurs through time, older or more damaged samples should amplify smaller fragments in comparison to younger or more well preserved samples.
4. Reproducibility – Multiple extractions and amplifications should yield consistent results. However, with very low template samples amplifications from the same extract may not always amplify due to stochastic variation.
5. Cloning of products – Cloning of products can assess for damage, contamination, and jumping PCR. In well preserved samples cloning may not be necessary, but in samples with high risk of contamination or in samples with contribution from multiple organisms, amplification products should be cloned prior to sequencing.
6. Independent replication – The generation of consistent results by different research groups can help increase the authenticity of the results. This is especially true with samples that are poorly preserved or are very old.
7. Biochemical preservation – The preservation of other biomolecules that correlate with DNA survival should indicate good sample preservation, although this is not always the case.
8. Quantification – Real-Time or qPCR can give an indication of the number of starting templates in the reaction and thus the amount of DNA which has been extracted from a sample.
9. Associated remains – Other items and remains associated with the sample should show similar preservation and contamination conditions. Multiple samples from the same area which are well preserved indicate a higher level of security in all the results.

Not all of these criteria are applicable to every aDNA study, but they provide a good framework for avoiding and identifying contamination. During the course of the following experiments, proper practices for working with ancient sequences were followed, including using an isolated, dedicated aDNA laboratory, wearing protective garments, working up the contamination gradient, working with sterilized equipment in a laminar flow hood when necessary, and including many negative controls throughout the process. In addition, we looked for patterns of damage, most commonly seen as C to T and G to A changes, and attempted amplification of only short fragments. When necessary, we compared the aDNA data to DNA from common contaminants, from individuals who worked with the samples, and from extraction negatives. In each of the following chapters, the precautions that were taken to ensure the authenticity of the ancient sequences are discussed in more detail.
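As an illustration of what looking for C to T and G to A changes can involve, the sketch below tallies substitution types between a reference and a read that are assumed to be already aligned and of equal length. The sequences and function names are invented for the example; this is not the analysis pipeline used in the following chapters.

```python
from collections import Counter


def substitution_counts(reference, read):
    """Count mismatches by type between two pre-aligned, equal-length sequences."""
    if len(reference) != len(read):
        raise ValueError("sequences must be aligned to the same length")
    counts = Counter()
    for ref_base, read_base in zip(reference.upper(), read.upper()):
        # Skip matches and any non-ACGT symbols such as gaps or N
        if ref_base != read_base and ref_base in "ACGT" and read_base in "ACGT":
            counts[f"{ref_base}>{read_base}"] += 1
    return counts


def damage_fraction(counts):
    """Fraction of all mismatches that are C>T or G>A."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return (counts["C>T"] + counts["G>A"]) / total


# Invented example sequences
reference = "ACGTCCGGAT"
read = "ATGTCTGAAA"
counts = substitution_counts(reference, read)
print(dict(counts), damage_fraction(counts))
```

A high proportion of C>T and G>A mismatches is consistent with cytosine deamination, although a tally like this cannot by itself distinguish damage from true sequence differences.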

Characterizing DNA under extreme conditions

My thesis addresses characterizing DNA preservation under extreme conditions. Scientists working in the aDNA field have continually pushed the boundaries of extreme conditions. Improvements in sequencing technology have led to the ability to reconstruct entire ancient genomes (for example Miller et al., 2008; Noonan et al., 2006) and better detect contamination (Allentoft et al., 2012; Stiller et al., 2006). Optimization of aDNA protocols has improved the ability to detect and analyze highly fragmented or damaged DNA (Meyer et al., 2012; Orlando et al., 2013). My work explores generally non-bone examples of DNA in extreme environments to better understand preservation conditions and the ability to authenticate these difficult samples. Chapter two describes a way to authenticate geologically ancient DNA and applies it to ancient halophilic bacteria from salt crystals. Chapter three provides results from a deep subsurface sample about DNA biomarkers from ancient thermophilic microbial communities. Chapter four includes evidence of authentic Native American DNA derived from moccasins, mittens, and cordage buried for 700 years. Chapter five describes results and lack of results from DNA analysis of several ancient bones. Chapter six discusses the common themes found in my projects and relates the work to astrobiology.

Determining the authenticity of geologically ancient DNA

Claims of an extraordinary nature are common to aDNA research, but such extraordinary claims require extraordinary evidence and analysis. Early extreme claims about DNA surviving for millions of years in plants (Golenberg et al., 1990; Kim et al., 2004; Soltis et al., 1992), amber-preserved samples (Cano et al., 1992a, 1992b, 1993; DeSalle, 1994; DeSalle et al., 1992, 1993; Poinar and Cano, 1993), and dinosaur bones (Woodward et al., 1994) were either shown to be obvious human or microbial contamination or could not be independently replicated (Brotherton et al., 2007; Gilbert et al., 2005b; Gutiérrez and Marin, 1998; Sidow et al., 1991; Zischler et al., 1995). Some of the most extreme claims come from microbial DNA preserved in halite crystals for many millions of years. Claims of halophilic microorganisms isolated from ancient rock salt began in the 1960s (Bibo, 1983; Dombrowski, 1963; Norton et al., 1993; Stan-Lotter et al., 2000; Vreeland et al., 1998), although they did not come without skeptics (Bien and Schwartz, 1965; De Ley et al., 1966). These early studies generally used bulk crystal. This means that while they may have shown that were present in salt deposits, they did not prove that the halophiles were as old as the formation. As there is both brine movement and secondary crystallization of salt in salt brines, these isolates could be contamination (McGenity et al., 2000).

To address this issue, researchers began to isolate microorganisms directly from fluid inclusions in single, primary crystals (Fish et al., 2002; Radax et al., 2001; Vreeland et al., 2000). Although the aDNA community has expressed repeated concerns about the authenticity of these claims (Graur and Pupko, 2001; Hebsgaard et al., 2005; Willerslev and Hebsgaard, 2005), they continue to be published in mainstream scientific literature (Coolen and Overmann, 2007; Gramain et al., 2011; Panieri et al., 2010; Park et al., 2009; Satterfield et al., 2005; Schubert et al., 2010; Vreeland et al., 2007). The main concerns raised about these claims are the failure to follow widely adopted best practices for aDNA retrieval (Cooper and Poinar, 2000; Gilbert et al., 2005a); the lack of characteristic aDNA damage and degradation patterns, which suggests a modern, not ancient, origin; and the reputed age of the samples, which is 100-fold greater than the theoretical limits of post-mortem DNA survival (Lindahl, 1993). Due to the small quantities of biological material that are recovered, it is often not possible to verify the age of the samples by isotopic measurements, so the age determinations must often be made solely from stratigraphy. We therefore developed a method, the Depressed Rate Test, which uses molecular clock rate estimates to test the authenticity of the age estimates of these geologically ancient DNA sequences. In Chapter Two, I show the sensitivity of the test through simulation and then apply the test to previously published geologically ancient DNA sequences of fungi, bacteria, and archaea.

Detecting biomarkers from an ancient hydrothermal system

In the early 1990s Thomas Gold hypothesized about the deep, dark energy biosphere, the multiple ecosystems which are physically located in environments that exist in permanent darkness (Gold, 1992). This dark biosphere, extending downward from about 1 m below the continental surface or seafloor, contains the largest collection of habitats for biological systems on Earth and is predominantly inhabited by microbial populations: archaea, bacteria, , and viruses (reviewed in Colwell and D’Hondt, 2013; Edwards et al., 2012). Some early estimates suggested that up to 95% of reside in the deep subsurface of Earth, accounting for over one-third of the Earth’s total biomass carbon (Whitman et al., 1998). Although these estimates have been decreased based upon new data (Kallmeyer et al., 2012), the deep biosphere still represents a number of relatively unknown ecosystems which are important for “driving carbon and nutrient cycling and catalyzing a multitude of reactions between rocks, sediments and fluids” (Jørgensen, 2012). These varied ecosystems all share the fact that they are one or more steps removed from the photosynthetic world; yet, nearly all the current knowledge of microbiological processes and biogeochemical cycles in the microbial world derives from studies of light ecosystems (Edwards et al., 2012). However, increasing interest in both the marine and terrestrial biosphere has resulted in an upswing in published research papers on deep subsurface microbiology in the past ten years (Colwell and D’Hondt, 2013).

The deep subsurface is a challenging environment to study for a number of reasons. First, sampling is often expensive and difficult; many methods require coring, which poses a high contamination risk (Colwell and D’Hondt, 2013), even with standardized protocols that decrease contamination risk and increase the ability to detect when a sample is contaminated (Kieft et al., 2007). Because of the high cost per quantity of sample material recovered and the difficulty of sampling, samples often do not have replicates and the location of sampling is often based upon factors other than biology (Colwell and D’Hondt, 2013; Jørgensen, 2012). For example, the Ocean Drilling Program (ODP) was primarily focused on plate tectonics, paleoceanography, and other processes that were expressed along ocean margins. This bias led to high cell counts because these regions have high organic productivity and high sedimentation rates (Jørgensen, 2012). In the terrestrial subsurface, studies are biased toward areas which are linked to human reliance on the deep geological provinces – water, minerals, hydrocarbons, or repositories for waste (Edwards et al., 2012).

Second, much of the deep biosphere has low biomass (D’Hondt et al., 2009; Kallmeyer et al., 2008, 2012), which creates difficulties with DNA extraction and increases the potential for contamination (Santelli et al., 2010). To combat this problem, multiple displacement amplification can be used, but this leads to bias in the relative genome coverage of community members (Yilmaz et al., 2010).

Third, between 90% and 99% of bacteria and archaea have proven resistant to current culturing methods (Amann et al., 1995; Torsvik et al., 1990a, 1990b). These microorganisms are likely to be different phenotypically and genetically from those characterized in culture (Rondon et al., 2000) meaning that PCR primers and genome mapping software may be biased in favor of the small percentage of cultured microorganisms. Additionally, there is a bias against archaea in many extraction methods (Lipp et al., 2008), compounding the problem.

However, in the past few years the situation has improved. There has been an increase in the diversity of sampling sites, including low biomass marine sites (D’Hondt et al., 2009). Also, the gold mines of South Africa have yielded previously uncharacterized microorganisms from all three domains of life (Borgonie et al., 2011; Chivian et al., 2008; Lin et al., 2006; Takai et al., 2001; Wanger et al., 2008). Approximately 25 CORK (Circulation Obviation Retrofit Kit) observatories provide opportunities for long term microbial studies (Edwards et al., 2012), including one at Juan de Fuca Ridge where multiple in situ experiments have been conducted (Fisher et al., 2011; Orcutt et al., 2010a, 2010b; Smith et al., 2011). Finally, improved technology allows for better metagenomic analysis of samples, including some studies on mRNA (Orsi et al., 2013).


One pressing question remaining about the deep subsurface is the status of the cells that reside there (Hoehler and Jørgensen, 2013). Are they living and reproducing, in a state of low metabolic activity to repair damage, in a dormant or vegetative state, or dead? Multiple lines of evidence indicate that at least some of the microbial cells are living. Respiration rates for some deep subsurface communities have been measured, indicating life (D’Hondt et al., 2009; Røy et al., 2012). FISH, CARD-FISH, and qPCR, detection methods which rely on intact rRNA genes or rRNA, have detected bacterial and archaeal cells (Biddle et al., 2006; Schippers and Neretin, 2006; Schippers et al., 2005). Structural and isotopic analyses of intact polar membrane lipids indicate living microorganisms (Biddle et al., 2006; Lipp et al., 2008; Sturt et al., 2004). Also, cells were found to assimilate labeled tracer atoms after incubation (Morono et al., 2011). Even though all of these methods indicate that some of the microbial population is alive, the cells have extremely long doubling times, as long as 200-2,000 years (Biddle et al., 2006), and biomass turnover times ranging from years to millennia (Jørgensen, 2011; Jørgensen and Boetius, 2007; Lomstein et al., 2012). This leads to the question, “How old are these deep subsurface bacteria?” The lowest rates of biomass turnover that have been calculated are still orders of magnitude higher than the energy required to repair DNA depurination and protein amino acid racemization, but are not high enough to realistically maintain other functions including reproduction and even mobility (Hoehler and Jørgensen, 2013). Therefore, the age of individual members of the community in the deep subsurface remains unknown, but could potentially be counted in the hundreds of thousands of years.

One interesting deep subsurface location that has been drilled with contamination controls for microbial sampling is the Chesapeake Bay Impact Structure (CBIS). The CBIS is a buried, 85-90 km late Eocene (34.5 million years ago) impact crater formed when a 2-3 km diameter impactor hit a shallow marine environment (Johnson et al., 1998; Koeberl et al., 1996). The sequence of events during and after the impact has been reconstructed. It includes a hydrothermal system lasting for at least 10,000 years followed by deposition of marine sediments (Collins and Wünnemann, 2005; Kenkmann et al., 2009; Kulpecz et al., 2009; Malinconico et al., 2009). Such temporary hydrothermal systems have been shown to be colonized by microbial communities (Lindgren et al., 2010; Parnell et al., 2004, 2010). This thermophilic community would have remained in the crater during cooling and burial. Given the slow microbial rates of turnover in the deep subsurface, the question then is, “Can a DNA signal from this thermophilic community be detected after millions of years?”
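A rough sense of scale for this question comes from dividing the age of the impact by the doubling times quoted earlier. The sketch below assumes, purely for illustration, that the 200-2,000 year doubling times reported for other deep subsurface communities (Biddle et al., 2006) also applied in the CBIS and stayed constant over time; neither assumption has been established here, and the function name is illustrative.

```python
def generations_elapsed(age_years, doubling_time_years):
    """Number of doublings that fit into a time span at a constant doubling time."""
    return age_years / doubling_time_years


IMPACT_AGE_YEARS = 34.5e6  # late Eocene impact age given in the text

for doubling_time in (200, 2_000):
    print(doubling_time, generations_elapsed(IMPACT_AGE_YEARS, doubling_time))
```

Under these assumptions only on the order of 10^4 to 10^5 generations would separate the hydrothermal community from any present-day descendants.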

My project aimed to extract and sequence DNA from multiple layers between 150 and 840 m in the deep subsurface of the CBIS, including sediments from both the impact layers and the post impact sediments, in order to examine the possibility of detecting DNA from microorganisms which lived during the time of the temporary hydrothermal system or descended from microorganisms living at that time. Due to the low biomass of the samples and the possibility of contamination, techniques optimized for aDNA extraction and sequencing were used in conjunction with techniques used in deep subsurface research in order to retrieve DNA. In chapter three I discuss the results of the metagenome sequencing of three layers from the CBIS and the ability to detect potential DNA biomarkers of the ancient thermophilic community.

Recovering aDNA from touched objects

DNA can easily be transferred from a human or to an object during production or use. The possibility of retrieving DNA from the original users of historic artifacts is intriguing. This sort of DNA analysis is routinely done in forensic studies (van Oorschot et al., 2010). Recovery of this transferred DNA was first reported in 1997 (van Oorschot and Jones, 1997) and has since been extended to a wide range of other objects including tools, clothing, shoes, knives, vehicles, firearms, food, bedding, condoms, lip cosmetics, wallets, jewelry, glass, skin, paper, cables, windows, doors, and stones (Bright and Petricevic, 2004; Herber and Herold, 1998; Van Hoofstat et al., 1999; Horsman-Hall et al., 2009; Oz et al., 1999; Petricevic et al., 2006; Polley et al., 2006; Rutty, 2002; Sewell et al., 2008; Webb et al., 2001; Wickenheiser, 2002).

Analysis of transferred lipids is routinely conducted on artifacts, typically on the interior of jars or pots to determine the contents (for a review: Brown and Brown, 2013; for two recent examples of lipid use: Craig et al., 2011, 2013), but has also been successfully used on a variety of items including slab-lined pits, blubber lamps, beeswax, and adhesives (Baeten et al., 2010; Charters et al., 1993, 1995; Heron et al., 2010, 2013). Although DNA has been recovered from a variety of objects including parchment, textiles, leather or hide, and bone artifacts, these studies have all attempted to recover DNA from the organism from which the object derived (examples include Burger et al., 2000; Campana et al., 2010; Murphy et al., 2011; Poulakakis et al., 2007; Schlumbaum et al., 2010; Sinding et al., 2012; Vuissoz et al., 2007). Only a few studies have actually worked with aDNA transferred onto objects. These include transfer of DNA onto the interior of a fossil egg shell from the gestating bird (Oskam and Bunce, 2012; Oskam et al., 2010), transfer into pots and vessels (Hansson and Foley, 2008; McGovern et al., 2009), and quids (chewed and discarded wads of plant material) and aprons (fiber material that contains menstrual blood) from North America (LeBlanc et al., 2007).

My aim was to analyze moccasins, mittens, and cordage (made by repeatedly rolling plant fibers on the legs with application of saliva) from a North American archaeological site in order to identify human aDNA from the individuals who wore or made these objects. Chapter four discusses the difficulties of working with human remains, transferred DNA, and mixed haplotype samples. Despite the challenges, authentic Native American DNA was recovered from some of the excavated items.

The limit of recoverable DNA in bones

The majority of aDNA research has been conducted on bones. In chapter five I discuss in brief the outcomes from several animal bone projects where the bone samples are expected to have poor DNA preservation. Extinct from the Late Pleistocene and Early Holocene make good subjects for aDNA research because many of their remains have been recovered from continually cold permafrost conditions (Shapiro and Cooper, 2003). Some of these extinct megafauna are poorly represented in the fossil record (Jiménez-Hidalgo and Carranza-Castañeda, 2010; Merino and Rossi, 2010) or have no published DNA sequences. I attempt extractions on the bones of three of these species: Giant bison (Bison latifrons), Western North American camel (Camelops hesternus), and North American deer (tentative Odocoileus sp.).

The Yukon permafrost has yielded a large number of well-preserved bones (for example Barnes et al., 2002; Campos et al., 2010; Orlando et al., 2013; Weinstock et al., 2005), and the recent discovery of bone from a C. hesternus provides a new sample to use for exploring the genetic relationships between these extinct North American camelids and their living relatives, the llamas and alpacas of South America (Zazula et al., 2011). Samples from the Lower 48 States are generally less well preserved than those from the Yukon because of the freeze-thaw cycles that occur. However, bones from lower latitudes have successfully yielded aDNA (for example Arndt et al., 2003; Speller et al., 2010; Storey et al., 2007; Valentine et al., 2008; Weinstock et al., 2009). The recently discovered high elevation Pleistocene fossil bearing site near Snowmass Village, Colorado has yielded over 5,000 bones from at least 26 different spanning the period 150,000-46,000 years ago (Johnson et al., 2011). Although this area does go through freeze-thaw cycles and is quite old for a lower latitude site, making preservation conditions most likely poor, the wide variety of bones provides a unique opportunity to study a series of high altitude ecosystems. I examined bones from B. latifrons and Odocoileus sp. recovered from the Snowmass fossil site.

As these three species lack published DNA data, taxonomic classification is based solely on morphological features, which is challenging when remains are scarce. In addition, even when looking at only extant members of a species, morphological and genetic classifications do not always agree (for example the Odocoileinae family of deer; Fernández and Vrba, 2005; Randi et al., 1998). The aim of this project was to determine the phylogenetic relationships of B. latifrons and C. hesternus with their nearest relatives and to resolve the taxonomic relationship of the Odocoileus sp. with the extant members of the family. However, both traditional PCR and NGS yielded only one fragmented sequence from one C. hesternus bone. In chapter five, I discuss why the preservation of the Yukon C. hesternus samples is likely slightly better than that of the Snowmass samples and conclude with a discussion of some very recently published methods that may allow recovery of the fragmented DNA from these samples.

Ancient DNA and astrobiology implications

As more planets are detected and more is learned about the planets and moons in our own Solar System, the likelihood that we might discover life elsewhere in the Universe is increasing. A central question in astrobiology is, “If life is present, how do we detect it?” Advances in technology have improved detection and analysis of remote planets and now there are over 700 confirmed and several thousand more candidates (Impey, 2013). While initially only large planets were detectable, the number of smaller planets detected has increased drastically (Borucki et al., 2011; Mayor et al., 2011). By early 2012 NASA’s Kepler mission reported over 200 candidate planets that are less than 1.25 times Earth’s size (Batalha et al., 2013). However, we are limited to studying these exoplanets remotely through means such as determining if the planet could have liquid water from its location in relation to its star (i.e. the Habitable Zone; described in Abe et al., 2011; Forget and Pierrehumbert, 1997; Kasting et al., 1993) and ultimately looking at the spectra which show remote characteristics such as gasses in the atmosphere (Kaltenegger et al., 2010; Schulze-Makuch et al., 2011).

However, if we look for life on planets and moons in our Solar System there are possibilities for sampling to detect life. Potentially habitable moons and planets in our Solar System include Mars, Europa, Titan, and Enceladus (for a review of the evidence see McKay, 2011; Shapiro and Schulze-Makuch, 2009). Mars is the easiest of these bodies to study and has had liquid water on the surface in the past (Bibring et al., 2006; Ehlmann et al., 2011; Hoehler and Jørgensen, 2013; Mustard et al., 2008; Poulet et al., 2005). NASA has sent two remote probes to detect life on Mars: the Viking landers in 1976 and the recent rover Curiosity. Although neither of these missions has yet found definitive proof of life on Mars, they have only sampled a small section of the planet at the surface (Levin, 2013, 2014). As conditions on the surface are poor for life now, there is a strong chance that if life was present on Mars in the past it may be surviving only in the subsurface (Clark, 1998; Michalski et al., 2013). Where it might be present and how it could be detected and authenticated are major questions of astrobiology research.

A number of Mars and Europa analog sites have been investigated on Earth, most of which are what we would consider extreme environments for life (review of a number of sites in Fairén et al., 2010). Researchers have found life surviving and even thriving in these extreme environments including, but not limited to, the Antarctic Dry Valleys (Heldmann et al., 2013), continually ice covered lakes such as Lake Vostok (Bulat et al., 2011), super cooled brines within permafrost (Gilichinsky et al., 2003), deserts (Foing et al., 2011), hot springs (Sarkisova et al., 2010), volcanic sites (Hynek et al., 2011; Marcucci et al., 2013), areas with mineral evaporites (Barbieri and Stivaletta, 2012; Gómez et al., 2012), multiple types of caves (Azúa-Bustos et al., 2010; Jones et al., 2010), and the deep subsurface (Sherwood Lollar et al., 2009). Using information on extreme environments and improvements in biotechnology, a number of methods to improve the detection of life (as we know it, with carbon based nucleic or amino acids) are being proposed and tested (Beegle et al., 2011; Lui et al., 2011; Parro et al., 2008, 2011). As with the Viking landers and Curiosity, employing these methods would be expensive, and therefore it is crucial to understand what the expected results might be and how to optimize and authenticate results from these life detection mechanisms.

There are two main concerns with these DNA life detection mechanisms addressed here. First, these methods will have limited sampling capability and limited geographical range. Thus, better understanding the analog environments and preservation conditions of DNA here on Earth will help identify the optimal sampling locations when such a mission is sent to Mars. Second, extraordinary claims of finding DNA based life on Mars (or other planets/moons in our Solar System) will require extraordinary proof of authentication, as contamination from Earth microorganisms poses a major problem (Bargoma et al., 2013; Ghosh et al., 2010; Kerney and Schuerger, 2011). In chapter six I discuss the interdisciplinary nature of astrobiology and how my research applies to better addressing these two concerns with life detection, and finally propose some future directions.


References

Abe, Y., Abe-Ouchi, A., Sleep, N.H., and Zahnle, K.J. (2011). Habitable zone limits for dry planets. Astrobiology 11, 443–460.

Adcock, C.T., Hausrath, E.M., and Forster, P.M. (2013). Readily available phosphate from minerals in early aqueous environments on Mars. Nat. Geosci. 6, 824–827.

Alekhina, I.A., Marie, D., Petit, J.R., Lukin, V.V., Zubkov, V.M., and Bulat, S.A. (2007). Molecular analysis of bacterial diversity in kerosene-based drilling fluid from the deep ice borehole at Vostok, East Antarctica. FEMS Microbiol. Ecol. 59, 289–299.

Allentoft, M.E., Collins, M., Harker, D., Haile, J., Oskam, C.L., Hale, M.L., Campos, P.F., Samaniego, J.A., Gilbert, M.T.P., Willerslev, E., et al. (2012). The half-life of DNA in bone: measuring decay kinetics in 158 dated fossils. Proc. R. Soc. B Biol. Sci. 279, 4724–4733.

Amann, R.I., Ludwig, W., and Schleifer, K.-H. (1995). Phylogenetic identification and in situ detection of individual microbial cells without cultivation. Microbiol. Rev. 59, 143–169.

Andersen, K., Bird, K.L., Rasmussen, M., Haile, J., Breuning-Madsen, H., Kjaer, K.H., Orlando, L., Gilbert, M.T.P., and Willerslev, E. (2012). Meta-barcoding of “dirt” DNA from soil reflects vertebrate biodiversity. Mol. Ecol. 21, 1966–1979.

Arndt, A., Van Neer, W., Hellemans, B., Robben, J., Volckaert, F., and Waelkens, M. (2003). Roman trade relationships at Sagalassos (Turkey) elucidated by ancient DNA of fish remains. J. Archaeol. Sci. 30, 1095–1105.

Austin, J.J., Ross, A.J., Smith, A.B., Fortey, R.A., and Thomas, R.H. (1997). Problems of reproducibility – does geologically ancient DNA survive in amber-preserved insects? Proc. R. Soc. Lond. B Biol. Sci. 264, 467–474.

Azúa-Bustos, A., González-Silva, C., Salas, L., Wynne, J.J., McKay, C.P., Palma, R.E., and Vicuña, R. (2010). Atacama Desert Caves as Analog Models of Habitability for Microbial Life on the Surface of Mars. LPI Contrib. 1538, 5030.

Baeten, J., Romanus, K., Degryse, P., De Clercq, W., Poelman, H., Verbeke, K., Luypaerts, A., Walton, M., Jacobs, P., and De Vos, D. (2010). Application of a multi-analytical toolset to a 16th century ointment: identification as lead plaster mixed with beeswax. Microchem. J. 95, 227–234.

Barbieri, R., and Stivaletta, N. (2012). Halophiles, Continental Evaporites and the Search for Biosignatures in Environmental Analogues for Mars. In Life on Earth and Other Planetary Bodies, (Springer), pp. 13–26.

Bargoma, E., La Duc, M.T., Kwan, K., Vaishampayan, P., and Venkateswaran, K. (2013). Differential Recovery of Phylogenetically Disparate Microbes from Spacecraft-Qualified Metal Surfaces. Astrobiology 13, 189–202.

Barnes, I., Matheus, P., Shapiro, B., Jensen, D., and Cooper, A. (2002). Dynamics of Pleistocene population extinctions in Beringian brown bears. Science 295, 2267–2270.


Barns, S.M., Fundyga, R.E., Jeffries, M.W., and Pace, N.R. (1994). Remarkable archaeal diversity detected in a Yellowstone National Park hot spring environment. Proc. Natl. Acad. Sci. 91, 1609–1613.

Batalha, N.M., Rowe, J.F., Bryson, S.T., Barclay, T., Burke, C.J., Caldwell, D.A., Christiansen, J.L., Mullally, F., Thompson, S.E., and Brown, T.M. (2013). Planetary Candidates Observed by Kepler. III. Analysis of the First 16 Months of Data. Astrophys. J. Suppl. Ser. 204, 24.

Beegle, L., Kirby, J.P., Fisher, A., Hodyss, R., Saltzman, A., Soto, J., Lasnik, J., and Roark, S. (2011). Sample handling and processing on Mars for future astrobiology missions. In 2011 IEEE Aerospace Conference, pp. 1–10.

Bibo, F.-J. (1983). Vermehrungsfähige Mikroorganismen in Steinsalz aus primären Lagerstätten. Kali U Steinsalz 8, 367–373.

Bibring, J.-P., Langevin, Y., Mustard, J.F., Poulet, F., Arvidson, R., Gendrin, A., Gondet, B., Mangold, N., Pinet, P., Forget, F., et al. (2006). Global Mineralogical and Aqueous Mars History Derived from OMEGA/Mars Express Data. Science 312, 400–404.

Biddle, J.F., Lipp, J.S., Lever, M.A., Lloyd, K.G., Sørensen, K.B., Anderson, R., Fredricks, H.F., Elvert, M., Kelly, T.J., and Schrag, D.P. (2006). Heterotrophic Archaea dominate sedimentary subsurface ecosystems off Peru. Proc. Natl. Acad. Sci. U. S. A. 103, 3846–3851.

Bien, E., and Schwartz, W. (1965). [Geomicrobiological studies. VI. On the presence of dead and living bacterial cells in salt deposits]. Z. Für Allg. Mikrobiol. 5, 185–205.

Borgonie, G., García-Moyano, A., Litthauer, D., Bert, W., Bester, A., van Heerden, E., Möller, C., Erasmus, M., and Onstott, T.C. (2011). Nematoda from the terrestrial deep subsurface of South Africa. Nature 474, 79–82.

Borucki, W.J., Koch, D.G., Basri, G., Batalha, N., Brown, T.M., Bryson, S.T., Caldwell, D., Christensen-Dalsgaard, J., Cochran, W.D., and DeVore, E. (2011). Characteristics of planetary candidates observed by Kepler. II. Analysis of the first four months of data. Astrophys. J. 736, 19.

Briggs, A.W., Stenzel, U., Johnson, P.L.F., Green, R.E., Kelso, J., Prüfer, K., Meyer, M., Krause, J., Ronan, M.T., Lachmann, M., et al. (2007). Patterns of damage in genomic DNA sequences from a Neandertal. Proc. Natl. Acad. Sci. 104, 14616–14621.

Briggs, A.W., Good, J.M., Green, R.E., Krause, J., Maricic, T., Stenzel, U., Lalueza-Fox, C., Rudan, P., Brajkovic, D., Kucan, Z., et al. (2009). Targeted retrieval and analysis of five Neandertal mtDNA genomes. Science 325, 318–321.

Bright, J.-A., and Petricevic, S.F. (2004). Recovery of trace DNA and its application to DNA profiling of shoe insoles. Forensic Sci. Int. 145, 7–12.

Brotherton, P., Endicott, P., Sanchez, J.J., Beaumont, M., Barnett, R., Austin, J., and Cooper, A. (2007). Novel high-resolution characterization of ancient DNA reveals C > U-type base modification events as the sole cause of post mortem miscoding lesions. Nucleic Acids Res. 35, 5717–5728.


Brown, K.A., and Brown, T.A. (2013). Biomolecular Archaeology. Annu. Rev. Anthropol. 42, 159–174.

Von Bubnoff, A. (2008). Next-Generation Sequencing: The Race Is On. Cell 132, 721–723.

Bulat, S.A., Alekhina, I.A., Blot, M., Petit, J.-R., de Angelis, M., Wagenbach, D., Lipenkov, V.Y., Vasilyeva, L.P., Wloch, D.M., Raynaud, D., et al. (2004). DNA signature of thermophilic bacteria from the aged accretion ice of Lake Vostok, Antarctica: implications for searching for life in extreme icy environments. Int. J. Astrobiol. 3, 1–12.

Bulat, S.A., Alekhina, I.A., Marie, D., Martins, J., and Petit, J.R. (2011). Searching for life in extreme environments relevant to Jovian’s Europa: Lessons from subglacial ice studies at Lake Vostok (East Antarctica). Adv. Space Res. 48, 697–701.

Burbano, H.A., Hodges, E., Green, R.E., Briggs, A.W., Krause, J., Meyer, M., Good, J.M., Maricic, T., Johnson, P.L., and Xuan, Z. (2010). Targeted investigation of the Neandertal genome by array-based sequence capture. Science 328, 723–725.

Burger, J., Hummel, S., Pfeiffer, I., and Herrmann, B. (2000). Palaeogenetic analysis of (pre)historic artifacts and its significance for anthropology. Anthropol. Anz. 69–76.

Campana, M.G., Bower, M.A., Bailey, M.J., Stock, F., O’Connell, T.C., Edwards, C.J., Checkley-Scott, C., Knight, B., Spencer, M., and Howe, C.J. (2010). A flock of sheep, goats and cattle: ancient DNA analysis reveals complexities of historical parchment manufacture. J. Archaeol. Sci. 37, 1317–1325.

Campos, P.F., Willerslev, E., Sher, A., Orlando, L., Axelsson, E., Tikhonov, A., Aaris-Sørensen, K., Greenwood, A.D., Kahlke, R.-D., and Kosintsev, P. (2010). Ancient DNA analyses exclude humans as the driving force behind late Pleistocene musk ox (Ovibos moschatus) population dynamics. Proc. Natl. Acad. Sci. 107, 5675–5680.

Cano, R.J., Poinar, H., and Poinar Jr, G.O. (1992a). Isolation and partial characterization of DNA from the bee Proplebeia dominicana (Apidae: Hymenoptera) in 25-40 million year old amber. Med. Sci. Res. 20, 249–251.

Cano, R.J., Poinar, H.N., Roublik, D.W., and Poinar Jr, G.O. (1992b). Enzymatic amplification and nucleotide sequencing of portions of the 18S rRNA gene of the bee Proplebeia dominicana (Apidae: Hymenoptera) isolated from 25-40 million year old Dominican amber. Med. Sci. Res. 20, 619–622.

Cano, R.J., Poinar, H.N., Pieniezak, N.S., and Poinar Jr, G.O. (1993). Enzymatic amplification and nucleotide sequencing of DNA from 120-135 million year old weevil. Nature 363, 536–538.

Charters, S., Evershed, R.P., Goad, L.J., Heron, C., and Blinkhorn, P. (1993). Identification of an adhesive used to repair a Roman jar. Archaeometry 35, 91–101.

Charters, S., Evershed, R.P., Blinkhorn, P.W., and Denham, V. (1995). Evidence for the mixing of fats and waxes in archaeological ceramics. Archaeometry 37, 113–127.


Chivian, D., Brodie, E.L., Alm, E.J., Culley, D.E., Dehal, P.S., DeSantis, T.Z., Gihring, T.M., Lapidus, A., Lin, L.-H., and Lowry, S.R. (2008). Environmental genomics reveals a single-species ecosystem deep within Earth. Science 322, 275–278.

Clark, B.C. (1998). Surviving the limits to life at the surface of Mars. J. Geophys. Res. Planets 103, 28545–28555.

Collins, G.S., and Wünnemann, K. (2005). How big was the Chesapeake Bay impact? Insight from numerical modeling. Geology 33, 925–928.

Colwell, F.S., and D’Hondt, S. (2013). Nature and extent of the deep biosphere. Rev. Mineral. Geochem. 75, 547–574.

Coolen, M.J.L., and Overmann, J. (2007). 217 000-year-old DNA sequences of green sulfur bacteria in Mediterranean sapropels and their implications for the reconstruction of the paleoenvironment. Environ. Microbiol. 9, 238–249.

Cooper, A. (1997). Reply to Stoneking: ancient DNA--how do you really know when you have it? Am. J. Hum. Genet. 60, 1001.

Cooper, A., and Poinar, H.N. (2000). Ancient DNA: do it right or not at all. Science 289, 1139.

Cooper, A., Lalueza-Fox, C., Anderson, S., Rambaut, A., Austin, J., and Ward, R. (2001). Complete mitochondrial genome sequences of two extinct moas clarify ratite evolution. Nature 409, 704–707.

Craig, O.E., Steele, V.J., Fischer, A., Hartz, S., Andersen, S.H., Donohoe, P., Glykou, A., Saul, H., Jones, D.M., Koch, E., et al. (2011). Ancient lipids reveal continuity in culinary practices across the transition to agriculture in Northern Europe. Proc. Natl. Acad. Sci. 108, 17910–17915.

Craig, O.E., Saul, H., Lucquin, A., Nishida, Y., Taché, K., Clarke, L., Thompson, A., Altoft, D.T., Uchiyama, J., Ajimoto, M., et al. (2013). Earliest evidence for the use of pottery. Nature 496, 351–354.

Cubero, O.F., Crespo, A., Fatehi, J., and Bridge, P.D. (1999). DNA extraction and PCR amplification method suitable for fresh, herbarium-stored, lichenized, and other fungi. Plant Syst. Evol. 216, 243–249.

D’Costa, V.M., King, C.E., Kalan, L., Morar, M., Sung, W.W.L., Schwarz, C., Froese, D., Zazula, G., Calmels, F., Debruyne, R., et al. (2011). Antibiotic resistance is ancient. Nature 477, 457–461.

D’Hondt, S., Spivack, A.J., Pockalny, R., Ferdelman, T.G., Fischer, J.P., Kallmeyer, J., Abrams, L.J., Smith, D.C., Graham, D., and Hasiuk, F. (2009). Subseafloor sedimentary life in the South Pacific Gyre. Proc. Natl. Acad. Sci. 106, 11651–11656.

Dabney, J., Knapp, M., Glocke, I., Gansauge, M.-T., Weihmann, A., Nickel, B., Valdiosera, C., García, N., Pääbo, S., Arsuaga, J.-L., et al. (2013). Complete mitochondrial genome sequence of a Middle Pleistocene cave bear reconstructed from ultrashort DNA fragments. Proc. Natl. Acad. Sci. 110, 15758–15763.


David C. Smith, Arthur J. Spivack, Martin R. Fisk, Shelley A. Haveman, H.S. (2000). Tracer-Based Estimates of Drilling-Induced Microbial Contamination of Deep Sea Crust. Geomicrobiol. J. 17, 207–219.

DeSalle, R. (1994). Implications of ancient DNA for phylogenetic studies. Experientia 50, 543–550.

DeSalle, R., Gatesy, J., Wheeler, W., and Grimaldi, D. (1992). DNA sequences from a fossil termite in Oligo-Miocene amber and their phylogenetic implications. Science 257, 1933–1936.

DeSalle, R., Barcia, M., and Wray, C. (1993). PCR jumping in clones of 30-million-year-old DNA fragments from amber preserved termites (Mastotermes electrodominicus). Experientia 49, 906–909.

Dombrowski, H. (1963). Bacteria from Paleozoic salt deposits. Ann. N. Y. Acad. Sci. 108, 453–460.

Edwards, K.J., Becker, K., and Colwell, F. (2012). The Deep, Dark Energy Biosphere: Intraterrestrial Life on Earth. Annu. Rev. Earth Planet. Sci. 40, 551–568.

Ehlmann, B.L., Mustard, J.F., Murchie, S.L., Bibring, J.-P., Meunier, A., Fraeman, A.A., and Langevin, Y. (2011). Subsurface water and clay mineral formation during the early history of Mars. Nature 479, 53–60.

Fairén, A.G., Chevrier, V., Abramov, O., Marzo, G.A., Gavin, P., Davila, A.F., Tornabene, L.L., Bishop, J.L., Roush, T.L., and Gross, C. (2010). Noachian and more recent phyllosilicates in impact craters on Mars. Proc. Natl. Acad. Sci. 107, 12095–12100.

Fernández, M.H., and Vrba, E.S. (2005). A complete estimate of the phylogenetic relationships in Ruminantia: a dated species-level supertree of the extant ruminants. Biol. Rev. 80, 269–302.

Fish, S.A., Shepherd, T.J., McGenity, T.J., and Grant, W.D. (2002). Recovery of 16S ribosomal RNA gene fragments from ancient halite. Nature 417, 432–436.

Foing, B.H., Stoker, C., and Ehrenfreund, P. (2011). Astrobiology field research in Moon/Mars analogue environments. Int. J. Astrobiol. 10, 137–139.

Forget, F., and Pierrehumbert, R.T. (1997). Warming early Mars with carbon dioxide clouds that scatter infrared radiation. Science 278, 1273–1276.

Ghosh, S., Osman, S., Vaishampayan, P., and Venkateswaran, K. (2010). Recurrent Isolation of Extremotolerant Bacteria from the Clean Room Where Phoenix Spacecraft Components Were Assembled. Astrobiology 10, 325–335.

Gilbert, J.A., and Dupont, C.L. (2011). Microbial Metagenomics: Beyond the Genome. Annu. Rev. Mar. Sci. 3, 347–371.

Gilbert, M.T.P., Hansen, A.J., Willerslev, E., Rudbeck, L., Barnes, I., Lynnerup, N., and Cooper, A. (2003a). Characterization of Genetic Miscoding Lesions Caused by Postmortem Damage. Am. J. Hum. Genet. 72, 48–61.


Gilbert, M.T.P., Willerslev, E., Hansen, A.J., Barnes, I., Rudbeck, L., Lynnerup, N., and Cooper, A. (2003b). Distribution Patterns of Postmortem Damage in Human Mitochondrial DNA. Am. J. Hum. Genet. 72, 32–47.

Gilbert, M.T.P., Bandelt, H.-J., Hofreiter, M., and Barnes, I. (2005a). Assessing ancient DNA studies. Trends Ecol. Evol. 20, 541–544.

Gilbert, M.T.P., Barnes, I., Collins, M.J., Smith, C., Eklund, J., Goudsmit, J., Poinar, H., and Cooper, A. (2005b). Long-term survival of ancient DNA in Egypt: Response to Zink and Nerlich (2003). Am. J. Phys. Anthropol. 128, 110–114.

Gilbert, M.T.P., Binladen, J., Miller, W., Wiuf, C., Willerslev, E., Poinar, H., Carlson, J.E., Leebens-Mack, J.H., and Schuster, S.C. (2007). Recharacterization of ancient DNA miscoding lesions: insights in the era of sequencing-by-synthesis. Nucleic Acids Res. 35, 1–10.

Gilichinsky, D., Rivkina, E., Shcherbakova, V., Laurinavichuis, K., and Tiedje, J. (2003). Supercooled Water Brines Within Permafrost—An Unknown Ecological Niche for Microorganisms: A Model for Astrobiology. Astrobiology 3, 331–341.

Gold, T. (1992). The deep, hot biosphere. Proc. Natl. Acad. Sci. 89, 6045–6049.

Golenberg, E.M., Giannasi, D.E., Clegg, M.T., Smiley, C.J., Durbin, M., Henderson, D., and Zurawski, G. (1990). Chloroplast DNA sequence from a Miocene Magnolia species. Nature 344, 656–658.

Gómez, F., Rodríguez-Manfredi, J.A., Rodríguez, N., Fernández-Sampedro, M., Caballero-Castrejón, F.J., and Amils, R. (2012). Habitability: Where to look for life? Halophilic habitats: Earth analogs to study Mars habitability. Planet. Space Sci. 68, 48–55.

Gramain, A., Díaz, G.C., Demergasso, C., Lowenstein, T.K., and McGenity, T.J. (2011). Archaeal diversity along a subterranean salt core from the Salar Grande (Chile). Environ. Microbiol. 13, 2105–2121.

Graur, D., and Pupko, T. (2001). The Permian Bacterium that Isn’t. Mol. Biol. Evol. 18, 1143–1146.

Green, R.E., Krause, J., Briggs, A.W., Maricic, T., Stenzel, U., Kircher, M., Patterson, N., Li, H., Zhai, W., and Fritz, M.H.-Y. (2010). A draft sequence of the Neandertal genome. Science 328, 710–722.

Gronstal, A.L., Voytek, M.A., Kirshtein, J.D., Heyde, N.M. von der, Lowit, M.D., and Cockell, C.S. (2009). Contamination assessment in microbiological sampling of the Eyreville core, Chesapeake Bay impact structure. Geol. Soc. Am. Spec. Pap. 458, 951–964.

Gutiérrez, G., and Marin, A. (1998). The most ancient DNA recovered from an amber-preserved specimen may not be as ancient as it seems. Mol. Biol. Evol. 15, 926–929.

Haile, J., Froese, D.G., MacPhee, R.D.E., Roberts, R.G., Arnold, L.J., Reyes, A.V., Rasmussen, M., Nielsen, R., Brook, B.W., Robinson, S., et al. (2009). Ancient DNA reveals late survival of mammoth and horse in interior Alaska. Proc. Natl. Acad. Sci. 106, 22352–22357.


Handelsman, J. (2004). Metagenomics: Application of Genomics to Uncultured Microorganisms. Microbiol. Mol. Biol. Rev. 68, 669–685.

Handt, O., Krings, M., Ward, R.H., and Pääbo, S. (1996). The retrieval of ancient human DNA sequences. Am. J. Hum. Genet. 59, 368.

Hanni, C., Brousseau, T., Laudet, V., and Stehelin, D. (1995). Isopropanol precipitation removes PCR inhibitors from ancient bone extracts. Nucleic Acids Res. 23, 881–882.

Hansen, A.J., Mitchell, D.L., Wiuf, C., Paniker, L., Brand, T.B., Binladen, J., Gilichinsky, D.A., Rønn, R., and Willerslev, E. (2006). Crosslinks Rather Than Strand Breaks Determine Access to Ancient DNA Sequences From Frozen Sediments. Genetics 173, 1175–1179.

Hansson, M.C., and Foley, B.P. (2008). Ancient DNA fragments inside Classical Greek amphoras reveal cargo of 2400-year-old shipwreck. J. Archaeol. Sci. 35, 1169–1176.

Hebsgaard, M.B., Phillips, M.J., and Willerslev, E. (2005). Geologically ancient DNA: fact or artefact? Trends Microbiol. 13, 212–220.

Hebsgaard, M.B., Gilbert, M.T.P., Arneborg, J., Heyn, P., Allentoft, M.E., Bunce, M., Munch, K., Schweger, C., and Willerslev, E. (2009). The farm beneath the sand—An archaeological case study on ancient “dirt” DNA. Antiquity 83, 430–444.

Heldmann, J.L., Pollard, W., McKay, C.P., Marinova, M.M., Davila, A., Williams, K.E., Lacelle, D., and Andersen, D.T. (2013). The high elevation Dry Valleys in Antarctica as analog sites for subsurface ice on Mars. Planet. Space Sci. 85, 53–58.

Herber, B., and Herold, K. (1998). DNA typing of human dandruff. J. Forensic Sci. 43, 648–656.

Heron, C., Nilsen, G., Stern, B., Craig, O., and Nordby, C. (2010). Application of lipid biomarker analysis to evaluate the function of “slab-lined pits” in Arctic Norway. J. Archaeol. Sci. 37, 2188–2197.

Heron, C., Andersen, S., Fischer, A., Glykou, A., Hartz, S., Saul, H., Steele, V., and Craig, O. (2013). Illuminating the Late Mesolithic: residue analysis of ‘blubber’ lamps from Northern Europe. Antiquity 87, 178–188.

Heyn, P., Stenzel, U., Briggs, A.W., Kircher, M., Hofreiter, M., and Meyer, M. (2010). Road blocks on paleogenomes—polymerase extension profiling reveals the frequency of blocking lesions in ancient DNA. Nucleic Acids Res. 38, e161–e161.

Higuchi, R., Bowman, B., Freiberger, M., Ryder, O.A., and Wilson, A.C. (1984). DNA sequences from the quagga, an extinct member of the horse family.

Ho, S.Y.W., Heupink, T.H., Rambaut, A., and Shapiro, B. (2007). Bayesian Estimation of Sequence Damage in Ancient DNA. Mol. Biol. Evol. 24, 1416–1422.

Hoehler, T.M., and Jørgensen, B.B. (2013). Microbial life under extreme energy limitation. Nat. Rev. Microbiol. 11, 83–94.


Hofreiter, M., Jaenicke, V., Serre, D., Haeseler, A. von, and Pääbo, S. (2001a). DNA sequences from multiple amplifications reveal artifacts induced by cytosine deamination in ancient DNA. Nucleic Acids Res. 29, 4793–4799.

Hofreiter, M., Serre, D., Poinar, H.N., Kuch, M., and Pääbo, S. (2001b). Ancient DNA. Nat. Rev. Genet. 2, 353–359.

Holben, W.E., Jansson, J.K., Chelm, B.K., and Tiedje, J.M. (1988). DNA Probe Method for the Detection of Specific Microorganisms in the Soil Bacterial Community. Appl. Environ. Microbiol. 54, 703–711.

Van Hoofstat, D.E., Deforce, D.L., Hubert De Pauw, I.P., and Van den Eeckhout, E.G. (1999). DNA typing of fingerprints using capillary electrophoresis: effect of dactyloscopic powders. Electrophoresis 20, 2870–2876.

Horsman-Hall, K.M., Orihuela, Y., Karczynski, S.L., Davis, A.L., Ban, J.D., and Greenspoon, S.A. (2009). Development of STR profiles from firearms and fired cartridge cases. Forensic Sci. Int. Genet. 3, 242–250.

Hynek, B.M., McCollom, T.M., and Rogers, K.L. (2011). Cerro Negro volcano, Nicaragua: An assessment of geological and potential biological systems on early Mars. Analogs Planet. Explor. 279–285.

Impey, C. (2013). The First Thousand Exoplanets: Twenty Years of Excitement and Discovery. In Astrobiology, History, and Society, D.A. Vakoch, ed. (Springer Berlin Heidelberg), pp. 201–212.

Inagaki, F., Okada, H., Tsapin, A.I., and Nealson, K.H. (2005). MICROBIAL SURVIVAL: The Paleome: A Sedimentary Genetic Record of Past Microbial Communities. Astrobiology 5, 141–153.

Jiménez-Hidalgo, E., and Carranza-Castañeda, O. (2010). Blancan Camelids from San Miguel de Allende, Guanajuato, Central México. J. Paleontol. 84, 51–65.

Johnson, G.H., Kruse, S.E., Vaughn, A.W., Lucey, J.K., Hobbs, C.H., and Powars, D.S. (1998). Postimpact deformation associated with the late Eocene Chesapeake Bay impact structure in southeastern Virginia. Geology 26, 507–510.

Johnson, K.R., Miller, I.M., Sertich, J., Stucky, R., Fisher, D.C., and Pigati, J.S. (2011). Ziegler Reservoir and the Snowmastodon Project: Overview and Geologic Setting of a Recently Discovered Series of High-Elevation Pleistocene Ecosystems near Snowmass Village, Colorado.

Jones, D.S., Tobler, D.J., Schaperdoth, I., Mainiero, M., and Macalady, J.L. (2010). Community structure of subsurface biofilms in the thermal sulfidic caves of Acquasanta Terme, Italy. Appl. Environ. Microbiol. 76, 5902–5910.

Jørgensen, B.B. (2011). Deep subseafloor microbial cells on physiological standby. Proc. Natl. Acad. Sci. 108, 18193–18194.

Jørgensen, B.B. (2012). Shrinking majority of the deep biosphere. Proc. Natl. Acad. Sci. 109, 15976–15977.


Jørgensen, B.B., and Boetius, A. (2007). Feast and famine—microbial life in the deep-sea bed. Nat. Rev. Microbiol. 5, 770–781.

Kallmeyer, J., Smith, D.C., Spivack, A.J., and D’Hondt, S. (2008). New cell extraction procedure applied to deep subsurface sediments. Limnol. Oceanogr. Methods 6, 236–245.

Kallmeyer, J., Pockalny, R., Adhikari, R.R., Smith, D.C., and D’Hondt, S. (2012). Global distribution of microbial abundance and biomass in subseafloor sediment. Proc. Natl. Acad. Sci. 109, 16213–16216.

Kalmár, T., Bachrati, C.Z., Marcsik, A., and Raskó, I. (2000). A simple and efficient method for PCR amplifiable DNA extraction from ancient bones. Nucleic Acids Res. 28, e67–e67.

Kaltenegger, L., Selsis, F., Fridlund, M., Lammer, H., Beichman, C., Danchi, W., Eiroa, C., Henning, T., Herbst, T., and Léger, A. (2010). Deciphering spectral fingerprints of habitable exoplanets. Astrobiology 10, 89–102.

Kasting, J.F., Whitmire, D.P., and Reynolds, R.T. (1993). Habitable zones around main sequence stars. Icarus 101, 108–128.

Kenkmann, T., Collins, G.S., Wittmann, A., Wünnemann, K., Reimold, W.U., and Melosh, H.J. (2009). A model for the formation of the Chesapeake Bay impact crater as revealed by drilling and numerical simulation. Geol. Soc. Am. Spec. Pap. 458, 571–585.

Kerney, K.R., and Schuerger, A.C. (2011). Survival of Bacillus subtilis Endospores on Ultraviolet-Irradiated Rover Wheels and Mars Regolith under Simulated Martian Conditions. Astrobiology 11, 477–485.

Kieft, T.L., Phelps, T.J., Fredrickson, J.K., Hurst, C.J., Crawford, R.L., Garland, J.L., Lipson, D.A., Mills, A.L., and Stetzenbach, L.D. (2007). Drilling, coring, and sampling subsurface environments. Man. Environ. Microbiol. 799–817.

Kim, S., Soltis, D.E., Soltis, P.S., and Suh, Y. (2004). DNA sequences from Miocene fossils: an ndhF sequence of Magnolia latahensis (Magnoliaceae) and an rbcL sequence of Persea pseudocarolinensis (Lauraceae). Am. J. Bot. 91, 615–620.

King, C.E., Debruyne, R., Kuch, M., Schwarz, C., and Poinar, H.N. A quantitative approach to detect and overcome PCR inhibition in ancient DNA extracts. BioTechniques 47.

Koeberl, C., Poag, C.W., Reimold, W.U., and Brandt, D. (1996). Impact Origin of the Chesapeake Bay Structure and the Source of the North American Tektites. Science 271, 1263–1266.

Kormas, K.A., Smith, D.C., Edgcomb, V., and Teske, A. (2003). Molecular analysis of deep subsurface microbial communities in Nankai Trough sediments (ODP Leg 190, Site 1176). FEMS Microbiol. Ecol. 45, 115–125.

Krause, J., Dear, P.H., Pollack, J.L., Slatkin, M., Spriggs, H., Barnes, I., Lister, A.M., Ebersberger, I., Pääbo, S., and Hofreiter, M. (2006). Multiplex amplification of the mammoth mitochondrial genome and the evolution of Elephantidae. Nature 439, 724–727.


Kreader, C.A. (1996). Relief of amplification inhibition in PCR with bovine serum albumin or T4 gene 32 protein. Appl. Environ. Microbiol. 62, 1102–1106.

Kulpecz, A.A., Miller, K.G., Browning, J.V., Edwards, L.E., Powars, D.S., McLaughlin, P.P., Harris, A.D., and Feigenson, M.D. (2009). Postimpact deposition in the Chesapeake Bay impact structure: Variations in eustasy, compaction, sediment supply, and passive-aggressive tectonism. Geol. Soc. Am. Spec. Pap. 458, 811–837.

Kwok, S., and Higuchi, R. (1989). Avoiding false positives with PCR. Nature 339, 237–238.

LeBlanc, S.A., Kreisman, L.S.C., Kemp, B.M., Smiley, F.E., Carlyle, S.W., Dhody, A.N., and Benjamin, T. (2007). Quids and Aprons: Ancient DNA from Artifacts from the American Southwest. J. Field Archaeol. 32, 161–175.

Lee, A.B., and Cooper, T.A. (1995). Improved direct PCR screen for bacterial colonies: wooden toothpicks inhibit PCR amplification. BioTechniques 18, 225–226.

Levin, G. (2014). Evidence for microbial life on Mars? J. Med. Imaging.

Levin, G.V. (2013). Implications of Curiosity’s findings for the Viking labeled release experiment and life on Mars. In Proc. SPIE, pp. 8865–2.

De Ley, J., Kersters, K., and Park, I.W. (1966). Molecular-biological and taxonomic studies on Pseudomonas halocrenaea, a bacterium from Permian salt deposits. Antonie Van Leeuwenhoek 32, 315–331.

Lin, L.-H., Wang, P.-L., Rumble, D., Lippmann-Pipke, J., Boice, E., Pratt, L.M., Lollar, B.S., Brodie, E.L., Hazen, T.C., and Andersen, G.L. (2006). Long-term sustainability of a high-energy, low-diversity crustal biome. Science 314, 479–482.

Lindahl, T. (1993). Instability and decay of the primary structure of DNA. Nature 362, 709–715.

Lindgren, P., Ivarsson, M., Neubeck, A., Broman, C., Henkel, H., and Holm, N.G. (2010). Putative fossil life in a hydrothermal system of the Dellen impact structure, Sweden. Int. J. Astrobiol. 9, 137–146.

Lipp, J.S., Morono, Y., Inagaki, F., and Hinrichs, K.-U. (2008). Significant contribution of Archaea to extant biomass in marine subsurface sediments. Nature 454, 991–994.

Lomstein, B.A., Langerhuus, A.T., D’Hondt, S., Jørgensen, B.B., and Spivack, A.J. (2012). Endospore abundance, microbial growth and necromass turnover in deep sub-seafloor sediment. Nature 484, 101–104.

Lui, C., Carr, C.E., Rowedder, H., Ruvkun, G., and Zuber, M. (2011). SETG: An instrument for detection of life on Mars ancestrally related to life on Earth. In 2011 IEEE Aerospace Conference, pp. 1–12.

Malinconico, M.L., Sanford, W.E., and Horton, J.W. (2009). Postimpact heat conduction and compaction-driven fluid flow in the Chesapeake Bay impact structure based on downhole vitrinite reflectance data, ICDP-USGS Eyreville deep core holes and Cape Charles test holes. Geol. Soc. Am. Spec. Pap. 458, 905–930.


Marcucci, E.C., Hynek, B.M., Kierein-Young, K.S., and Rogers, K.L. (2013). Visible to Near-Infrared Spectroscopy of Acid-Sulfate Weathering Sites in Nicaraguan Volcanic Systems: An Early Mars Analog. LPI Contrib. 1719, 1677.

Margulies, M., Egholm, M., Altman, W.E., Attiya, S., Bader, J.S., Bemben, L.A., Berka, J., Braverman, M.S., Chen, Y.-J., and Chen, Z. (2005). Genome sequencing in microfabricated high-density picolitre reactors. Nature 437, 376–380.

Mayor, M., Marmier, M., Lovis, C., Udry, S., Ségransan, D., Pepe, F., Benz, W., Bertaux, J.-L., Bouchy, F., and Dumusque, X. (2011). The HARPS search for southern extra-solar planets XXXIV. Occurrence, mass distribution and orbital properties of super-Earths and Neptune-mass planets. arXiv preprint arXiv:1109.2497.

McGenity, T.J., Gemmell, R.T., Grant, W.D., and Stan-Lotter, H. (2000). Origins of halophilic microorganisms in ancient salt deposits. Environ. Microbiol. 2, 243–250.

McGovern, P.E., Mirzoian, A., and Hall, G.R. (2009). Ancient Egyptian herbal wines. Proc. Natl. Acad. Sci. 106, 7361–7366.

McKay, C.P. (2011). The search for life in our Solar System and the implications for science and society. Philos. Trans. R. Soc. Math. Phys. Eng. Sci. 369, 594–606.

McKeown, B.J. (1994). An acetylated (nuclease-free) bovine serum albumin in a PCR buffer inhibits amplification. BioTechniques 2, 247–248.

Merino, M.L., and Rossi, R.V. (2010). Origin, systematics, and morphological radiation. In Neotropical Cervidology: Biology and Medicine of Latin American Deer, J.M.B. Duarte and S. González, eds. (Jaboticabal, Brazil and Gland, Switzerland: Funep in collaboration with IUCN), pp. 2–11.

Meyer, M., Kircher, M., Gansauge, M.-T., Li, H., Racimo, F., Mallick, S., Schraiber, J.G., Jay, F., Prüfer, K., Filippo, C. de, et al. (2012). A High-Coverage Genome Sequence from an Archaic Denisovan Individual. Science 338, 222–226.

Michalski, J.R., Cuadros, J., Niles, P.B., Parnell, J., Deanne Rogers, A., and Wright, S.P. (2013). Groundwater activity on Mars and implications for a deep biosphere. Nat. Geosci. 6, 133–138.

Miller, W., Drautz, D.I., Ratan, A., Pusey, B., Qi, J., Lesk, A.M., Tomsho, L.P., Packard, M.D., Zhao, F., Sher, A., et al. (2008). Sequencing the nuclear genome of the extinct woolly mammoth. Nature 456, 387–390.

Mitchell, D., Willerslev, E., and Hansen, A. (2005). Damage and repair of ancient DNA. Mutat. Res. Mol. Mech. Mutagen. 571, 265–276.

Morono, Y., Terada, T., Nishizawa, M., Ito, M., Hillion, F., Takahata, N., Sano, Y., and Inagaki, F. (2011). Carbon and nitrogen assimilation in deep subseafloor microbial cells. Proc. Natl. Acad. Sci. 108, 18295–18300.

Murphy, T.M., Ben-Yehuda, N., Taylor, R.E., and Southon, J.R. (2011). Hemp in ancient rope and fabric from the Christmas Cave in Israel: talmudic background and DNA sequence identification. J. Archaeol. Sci. 38, 2579–2588.


Mustard, J.F., Murchie, S.L., Pelkey, S.M., Ehlmann, B.L., Milliken, R.E., Grant, J.A., Bibring, J.-P., Poulet, F., Bishop, J., Dobrea, E.N., et al. (2008). Hydrated silicate minerals on Mars observed by the Mars Reconnaissance Orbiter CRISM instrument. Nature 454, 305–309.

Navarro-Gonzalez, R. (2003). Mars-Like Soils in the Atacama Desert, Chile, and the Dry Limit of Microbial Life. Science 302, 1018–1021.

Ng, S.B., Turner, E.H., Robertson, P.D., Flygare, S.D., Bigham, A.W., Lee, C., Shaffer, T., Wong, M., Bhattacharjee, A., and Eichler, E.E. (2009). Targeted capture and massively parallel sequencing of 12 human exomes. Nature 461, 272–276.

Noonan, J.P., Coop, G., Kudaravalli, S., Smith, D., Krause, J., Alessi, J., Chen, F., Platt, D., Pääbo, S., Pritchard, J.K., et al. (2006). Sequencing and Analysis of Neanderthal Genomic DNA. Science 314, 1113–1118.

Norton, C.F., McGenity, T.J., and Grant, W.D. (1993). Archaeal halophiles (halobacteria) from two British salt mines. J. Gen. Microbiol. 139, 1077–1081.

Van Oorschot, R.A., Ballantyne, K.N., and Mitchell, R.J. (2010). Forensic trace DNA: a review. Investig. Genet. 1, 14.

Van Oorschot, R.A., and Jones, M.K. (1997). DNA fingerprints from fingerprints. Nature 387, 767.

Orlando, L., Ginolhac, A., Zhang, G., Froese, D., Albrechtsen, A., Stiller, M., Schubert, M., Cappellini, E., Petersen, B., Moltke, I., et al. (2013). Recalibrating Equus evolution using the genome sequence of an early Middle Pleistocene horse. Nature 499, 74–78.

Orsi, W.D., Edgcomb, V.P., Christman, G.D., and Biddle, J.F. (2013). Gene expression in the deep biosphere. Nature 499, 205–208.

Oskam, C.L., and Bunce, M. (2012). DNA Extraction from Fossil Eggshell. In Ancient DNA, (Springer), pp. 65–70.

Oskam, C.L., Haile, J., McLay, E., Rigby, P., Allentoft, M.E., Olsen, M.E., Bengtsson, C., Miller, G.H., Schwenninger, J.-L., and Jacomb, C. (2010). Fossil avian eggshell preserves ancient DNA. Proc. R. Soc. B Biol. Sci. 277, 1991–2000.

Oz, C., Levi, J.A., Novoselski, Y., Volkov, N., and Motro, U. (1999). Forensic identification of a rapist using unusual evidence. J. Forensic Sci. 44, 860–862.

Pääbo, S. (1985). Molecular cloning of ancient Egyptian mummy DNA.

Pääbo, S. (1989). Ancient DNA: extraction, characterization, molecular cloning, and enzymatic amplification. Proc. Natl. Acad. Sci. 86, 1939–1943.

Pääbo, S., Higuchi, R.G., and Wilson, A.C. (1989). Ancient DNA and the polymerase chain reaction. The emerging field of molecular archaeology. J. Biol. Chem. 264, 9709.

Panieri, G., Lugli, S., Manzi, V., Roveri, M., Schreiber, B.C., and Palinska, K.A. (2010). Ribosomal RNA gene fragments from fossilized cyanobacteria identified in primary gypsum from the late Miocene, Italy. Geobiology 8, 101–111.

Park, J.S., Vreeland, R.H., Cho, B.C., Lowenstein, T.K., Timofeeff, M.N., and Rosenzweig, W.D. (2009). Haloarchaeal diversity in 23, 121 and 419 MYA salts. Geobiology 7, 515–523.

Parnell, J., Lee, P., Cockell, C.S., and Osinski, G.R. (2004). Microbial colonization in impact-generated hydrothermal sulphate deposits, Haughton impact structure, and implications for sulphates on Mars. Int. J. Astrobiol. 3, 247–256.

Parnell, J., Boyce, A., Thackrey, S., Muirhead, D., Lindgren, P., Mason, C., Taylor, C., Still, J., Bowden, S., Osinski, G.R., et al. (2010). Sulfur isotope signatures for rapid colonization of an impact crater by thermophilic microbes. Geology 38, 271–274.

Parro, V., Rivas, L.A., and Gómez-Elvira, J. (2008). Protein Microarrays-Based Strategies for Life Detection in Astrobiology. Space Sci. Rev. 135, 293–311.

Parro, V., de Diego-Castilla, G., Rodríguez-Manfredi, J.A., Rivas, L.A., Blanco-López, Y., Sebastián, E., Romeral, J., Compostizo, C., Herrero, P.L., García-Marín, A., et al. (2011). SOLID3: A Multiplex Antibody Microarray-Based Optical Sensor Instrument for In Situ Life Detection in Planetary Exploration. Astrobiology 11, 15–28.

Petricevic, S.F., Bright, J.-A., and Cockerton, S.L. (2006). DNA profiling of trace DNA recovered from bedding. Forensic Sci. Int. 159, 21–26.

Poinar, H.N., and Cano, R.J. (1993). DNA from an extinct plant. Nature 363, 677.

Poinar, H.N., Hofreiter, M., Spaulding, W.G., Martin, P.S., Stankiewicz, B.A., Bland, H., Evershed, R.P., Possnert, G., and Pääbo, S. (1998). Molecular coproscopy: dung and diet of the extinct ground sloth Nothrotheriops shastensis. Science 281, 402–406.

Poinar, H.N., Schwarz, C., Qi, J., Shapiro, B., MacPhee, R.D., Buigues, B., Tikhonov, A., Huson, D.H., Tomsho, L.P., and Auch, A. (2006). Metagenomics to paleogenomics: large-scale sequencing of mammoth DNA. Science 311, 392–394.

Polley, D., Mickiewicz, P., Vaughn, M., Miller, T., Warburton, R., Komonski, D., Kantautas, C., Reid, B., Frappier, R., and Newman, J. (2006). An investigation of DNA recovery from firearms and cartridge cases. J. Can. Soc. Forensic Sci. 39, 217.

Poulakakis, N., Tselikas, A., Bitsakis, I., Mylonas, M., and Lymberakis, P. (2007). Ancient DNA and the genetic signature of ancient Greek manuscripts. J. Archaeol. Sci. 34, 675–680.

Poulet, F., Bibring, J.-P., Mustard, J.F., Gendrin, A., Mangold, N., Langevin, Y., Arvidson, R.E., Gondet, B., Gomez, C., Berthé, M., et al. (2005). Phyllosilicates on Mars and implications for early Martian climate. Nature 438, 623–627.

Rachmayanti, Y., Leinemann, L., Gailing, O., and Finkeldey, R. (2006). Extraction, amplification and characterization of DNA from Dipterocarpaceae. Plant Mol. Biol. Report. 24, 45–55.

Radax, C., Gruber, C., and Stan-Lotter, H. (2001). Novel haloarchaeal 16S rRNA gene sequences from Alpine Permo-Triassic rock salt. Extremophiles 5, 221–228.

Randi, E., Mucci, N., Pierpaoli, M., and Douzery, E. (1998). New phylogenetic perspectives on the Cervidae (Artiodactyla) are provided by the mitochondrial cytochrome b gene. Proc. R. Soc. Lond. B Biol. Sci. 265, 793–801.

Rasmussen, M., Li, Y., Lindgreen, S., Pedersen, J.S., Albrechtsen, A., Moltke, I., Metspalu, M., Metspalu, E., Kivisild, T., and Gupta, R. (2010). Ancient human genome sequence of an extinct Palaeo-Eskimo. Nature 463, 757–762.

Rohland, N., and Hofreiter, M. (2007a). Ancient DNA extraction from bones and teeth. Nat. Protoc. 2, 1756–1762.

Rohland, N., and Hofreiter, M. (2007b). Comparison and optimization of ancient DNA extraction. BioTechniques 42, 343–352.

Römpler, H., Rohland, N., Lalueza-Fox, C., Willerslev, E., Kuznetsova, T., Rabeder, G., Bertranpetit, J., Schöneberg, T., and Hofreiter, M. (2006). Nuclear Gene Indicates Coat-Color Polymorphism in Mammoths. Science 313, 62–62.

Rondon, M.R., August, P.R., Bettermann, A.D., Brady, S.F., Grossman, T.H., Liles, M.R., Loiacono, K.A., Lynch, B.A., MacNeil, I.A., and Minor, C. (2000). Cloning the soil metagenome: a strategy for accessing the genetic and functional diversity of uncultured microorganisms. Appl. Environ. Microbiol. 66, 2541–2547.

Rossen, L., Nørskov, P., Holmstrøm, K., and Rasmussen, O.F. (1992). Inhibition of PCR by components of food samples, microbial diagnostic assays and DNA-extraction solutions. Int. J. Food Microbiol. 17, 37–45.

Røy, H., Kallmeyer, J., Adhikari, R.R., Pockalny, R., Jørgensen, B.B., and D’Hondt, S. (2012). Aerobic microbial respiration in 86-million-year-old deep-sea red clay. Science 336, 922–925.

Rummel, J.D. (2001). Special Feature: Planetary exploration in the time of astrobiology: Protecting against biological contamination. Proc. Natl. Acad. Sci. 98, 2128–2131.

Rutty, G.N. (2002). An investigation into the transference and survivability of human DNA following simulated manual strangulation with consideration of the problem of third party contamination. Int. J. Legal Med. 116, 170–173.

Santelli, C.M., Banerjee, N., Bach, W., and Edwards, K.J. (2010). Tapping the subsurface ocean crust biosphere: Low biomass and drilling-related contamination calls for improved quality controls. Geomicrobiol. J. 27, 158–169.

Sarkisova, S.A., Tringe, S.G., Thomas-Keprta, K.L., Garrison, D.H., McKay, D.S., and Brown, I.I. (2010). Studying Prokaryotic Communities in Iron Depositing Hot Springs (IDHS): Implication for Early Mars Habitability.

Satterfield, C.L., Lowenstein, T.K., Vreeland, R.H., Rosenzweig, W.D., and Powers, D.W. (2005). New evidence for 250 Ma age of halotolerant bacterium from a Permian salt crystal. Geology 33, 265–268.

Schippers, A., and Neretin, L.N. (2006). Quantification of microbial communities in near-surface and deeply buried marine sediments on the Peru continental margin using real-time PCR. Environ. Microbiol. 8, 1251–1260.

Schippers, A., Neretin, L.N., Kallmeyer, J., Ferdelman, T.G., Cragg, B.A., Parkes, R.J., and Jørgensen, B.B. (2005). Prokaryotic cells of the deep sub-seafloor biosphere identified as living bacteria. Nature 433, 861–864.

Schlumbaum, A., Campos, P.F., Volken, S., Volken, M., Hafner, A., and Schibler, J. (2010). Ancient DNA, a Neolithic legging from the Swiss Alps and the early history of goat. J. Archaeol. Sci. 37, 1247–1251.

Schubert, B.A., Lowenstein, T.K., Timofeeff, M.N., and Parker, M.A. (2010). Halophilic Archaea cultured from ancient halite, Death Valley, California. Environ. Microbiol. 12, 440–454.

Schulze-Makuch, D., Méndez, A., Fairén, A.G., von Paris, P., Turse, C., Boyer, G., Davila, A.F., António, M.R. de S., Catling, D., and Irwin, L.N. (2011). A two-tiered approach to assessing the habitability of exoplanets. Astrobiology 11, 1041–1052.

Serre, D., Langaney, A., Chech, M., Teschler-Nicola, M., Paunovic, M., Mennecier, P., Hofreiter, M., Possnert, G., and Pääbo, S. (2004). No evidence of Neandertal mtDNA contribution to early modern humans. PLoS Biol. 2, e57.

Sewell, J., Quinones, I., Ames, C., Multaney, B., Curtis, S., Seeboruth, H., Moore, S., and Daniel, B. (2008). Recovery of DNA and fingerprints from touched documents. Forensic Sci. Int. Genet. 2, 281–285.

Shapiro, B., and Cooper, A. (2003). Beringia as an Ice Age genetic museum. Quat. Res. 60, 94–100.

Shapiro, R., and Schulze-Makuch, D. (2009). The search for alien life in our solar system: strategies and priorities. Astrobiology 9, 335–343.

Sherwood Lollar, B., Moran, J., Tille, S., Voglesonger, K., Lacrampe-Couloume, G., Onstott, T., Pratt, L., and Slater, G. (2009). Hydrogeologic Controls on the Deep Terrestrial Biosphere-Chemolithotrophic Energy for Subsurface Life on Earth and Mars. In AGU Spring Meeting Abstracts, p. 01.

Sidow, A., Wilson, A.C., Pääbo, S., Hummel, S., Bada, J.L., Westbroek, P., Hagelberg, E., and Curry, G.B. (1991). Bacterial DNA in Clarkia Fossils [and Discussion]. Philos. Trans. Biol. Sci. 333, 429–433.

Sinding, M.-H.S., Gilbert, M.T.P., Grønnow, B., Gulløv, H.C., Toft, P.A., and Foote, A.D. (2012). Minimally destructive DNA extraction from archaeological artefacts made from whale baleen. J. Archaeol. Sci. 39, 3750–3753.

Soltis, P.S., Soltis, D.E., and Smiley, C.J. (1992). An rbcL sequence from a Miocene Taxodium (bald cypress). Proc. Natl. Acad. Sci. 89, 449–451.

Sørensen, K.B., Lauer, A., and Teske, A. (2004). Archaeal phylotypes in a metal-rich and low-activity deep subsurface sediment of the Peru Basin, ODP Leg 201, Site 1231. Geobiology 2, 151–161.

Speller, C.F., Kemp, B.M., Wyatt, S.D., Monroe, C., Lipe, W.D., Arndt, U.M., and Yang, D.Y. (2010). Ancient mitochondrial DNA analysis reveals complexity of indigenous North American turkey domestication. Proc. Natl. Acad. Sci. 107, 2807–2812.

Stan-Lotter, H., Radax, C., Gruber, C., McGenity, T.J., Legat, A., Wanner, G., and Denner, E.B.M. (2000). The distribution of viable microorganisms in Permo-Triassic rock salt. In SALT, p. 8th.

Stevenson, F.J. (1994). Humus Chemistry: Genesis, Composition, Reactions (New York: John Wiley and Sons).

Stiller, M., Green, R.E., Ronan, M., Simons, J.F., Du, L., He, W., Egholm, M., Rothberg, J.M., Keates, S.G., Ovodov, N.D., et al. (2006). Patterns of nucleotide misincorporations during enzymatic amplification and direct large-scale sequencing of ancient DNA. Proc. Natl. Acad. Sci. 103, 13578–13584.

Storey, A.A., Ramírez, J.M., Quiroz, D., Burley, D.V., Addison, D.J., Walter, R., Anderson, A.J., Hunt, T.L., Athens, J.S., Huynen, L., et al. (2007). Radiocarbon and DNA evidence for a pre-Columbian introduction of Polynesian chickens to Chile. Proc. Natl. Acad. Sci. 104, 10335–10339.

Sturt, H.F., Summons, R.E., Smith, K., Elvert, M., and Hinrichs, K.-U. (2004). Intact polar membrane lipids in prokaryotes and sediments deciphered by high-performance liquid chromatography/electrospray ionization multistage mass spectrometry: new biomarkers for biogeochemistry and microbial ecology. Rapid Commun. Mass Spectrom. 18, 617–628.

Takai, K., Komatsu, T., Inagaki, F., and Horikoshi, K. (2001). Distribution of archaea in a black smoker chimney structure. Appl. Environ. Microbiol. 67, 3618–3629.

Tebbe, C.C., and Vahjen, W. (1993). Interference of humic acids and DNA extracted directly from soil in detection and transformation of recombinant DNA from bacteria and a yeast. Appl. Environ. Microbiol. 59, 2657–2665.

Temperton, B., and Giovannoni, S.J. (2012). Metagenomics: microbial diversity through a scratched lens. Curr. Opin. Microbiol. 15, 605–612.

Thomas, R.H., Schaffner, W., Wilson, A.C., and Pääbo, S. (1989). DNA phylogeny of the extinct marsupial wolf. Nature 340, 465–467.

Torsvik, V., Goksøyr, J., and Daae, F.L. (1990a). High diversity in DNA of soil bacteria. Appl. Environ. Microbiol. 56, 782–787.

Torsvik, V., Salte, K., Sørheim, R., and Goksøyr, J. (1990b). Comparison of phenotypic diversity and DNA heterogeneity in a population of soil bacteria. Appl. Environ. Microbiol. 56, 776–781.

Tsai, Y.-L., and Olson, B.H. (1992a). Detection of low numbers of bacterial cells in soils and sediments by polymerase chain reaction. Appl. Environ. Microbiol. 58, 754–757.

Tsai, Y.-L., and Olson, B.H. (1992b). Rapid method for separation of bacterial DNA from humic substances in sediments for polymerase chain reaction. Appl. Environ. Microbiol. 58, 2292–2295.

Valentine, K., Duffield, D.A., Patrick, L.E., Hatch, D.R., Butler, V.L., Hall, R.L., and Lehman, N. (2008). Ancient DNA reveals genotypic relationships among Oregon populations of the sea otter (Enhydra lutris). Conserv. Genet. 9, 933–938.

Vreeland, R.H., Piselli, A.F., Jr., McDonnough, S., and Meyers, S.S. (1998). Distribution and diversity of halophilic bacteria in a subsurface salt formation. Extremophiles 2, 321–331.

Vreeland, R.H., Rosenzweig, W.D., and Powers, D.W. (2000). Isolation of a 250 million-year-old halotolerant bacterium from a primary salt crystal. Nature 407, 897–900.

Vreeland, R.H., Jones, J., Monson, A., Rosenzweig, W.D., Lowenstein, T.K., Timofeeff, M., Satterfield, C., Cho, B.C., Park, J.S., Wallace, A., et al. (2007). Isolation of Live Cretaceous (121–112 Million Years Old) Halophilic Archaea from Primary Salt Crystals. Geomicrobiol. J. 24, 275–282.

Vuissoz, A., Worobey, M., Odegaard, N., Bunce, M., Machado, C.A., Lynnerup, N., Peacock, E.E., and Gilbert, M.T.P. (2007). The survival of PCR-amplifiable DNA in cow leather. J. Archaeol. Sci. 34, 823–829.

Wang, H.L., Yan, Z.Y., and Jin, D.Y. (1997). Reanalysis of published DNA sequence amplified from cretaceous dinosaur egg fossil. Mol. Biol. Evol. 14, 589–591.

Wanger, G., Onstott, T.C., and Southam, G. (2008). Stars of the terrestrial deep subsurface: A novel “star-shaped” bacterial morphotype from a South African platinum mine. Geobiology 6, 325–330.

Wayne, R.K., Leonard, J.A., and Cooper, A. (1999). Full of sound and fury: the recent history of ancient DNA. Annu. Rev. Ecol. Syst. 457–477.

Webb, L.G., Egan, S.E., and Turbett, G.R. (2001). Recovery of DNA for forensic analysis from lip cosmetics. J. Forensic Sci. 46, 1474–1479.

Webster, G., Newberry, C.J., Fry, J.C., and Weightman, A.J. (2003). Assessment of bacterial community structure in the deep sub-seafloor biosphere by 16S rDNA-based techniques: a cautionary tale. J. Microbiol. Methods 55, 155–164.

Weinstock, J., Willerslev, E., Sher, A., Tong, W., Ho, S.Y., Rubenstein, D., Storer, J., Burns, J., Martin, L., and Bravi, C. (2005). Evolution, systematics, and phylogeography of Pleistocene horses in the New World: a molecular perspective. PLoS Biol. 3, e241.

Weinstock, J., Shapiro, B., Prieto, A., Marín, J.C., González, B.A., Gilbert, M.T.P., and Willerslev, E. (2009). The Late Pleistocene distribution of vicuñas (Vicugna vicugna) and the “extinction” of the gracile llama (“Lama gracilis”): New molecular data. Quat. Sci. Rev. 28, 1369–1373.

Whitman, W.B., Coleman, D.C., and Wiebe, W.J. (1998). Prokaryotes: the unseen majority. Proc. Natl. Acad. Sci. 95, 6578–6583.

Wickenheiser, R.A. (2002). Trace DNA: a review, discussion of theory, and application of the transfer of trace quantities of DNA through skin contact. J. Forensic Sci. 47, 442–450.

Willerslev, E., and Cooper, A. (2005). Review Paper. Ancient DNA. Proc. R. Soc. B Biol. Sci. 272, 3–16.

Willerslev, E., and Hebsgaard, M.B. (2005). New evidence for 250 Ma age of halotolerant bacterium from a Permian salt crystal: Comment and reply COMMENT. Geology 33, e93–e93.

Willerslev, E., Hansen, A.J., Christensen, B., Steffensen, J.P., and Arctander, P. (1999). Diversity of Holocene life forms in fossil glacier ice. Proc. Natl. Acad. Sci. 96, 8017–8021.

Willerslev, E., Hansen, A.J., Binladen, J., Brand, T.B., Gilbert, M.T.P., Shapiro, B., Bunce, M., Wiuf, C., Gilichinsky, D.A., and Cooper, A. (2003). Diverse Plant and Animal Genetic Records from Holocene and Pleistocene Sediments. Science 300, 791–795.

Willerslev, E., Hansen, A.J., and Poinar, H.N. (2004a). Isolation of nucleic acids and cultures from fossil ice and permafrost. Trends Ecol. Evol. 19, 141–147.

Willerslev, E., Hansen, A.J., Rønn, R., Brand, T.B., Barnes, I., Wiuf, C., Gilichinsky, D., Mitchell, D., and Cooper, A. (2004b). Long-term persistence of bacterial DNA. Curr. Biol. CB 14, R9–10.

Wilson, I.G. (1997). Inhibition and facilitation of nucleic acid amplification. Appl. Environ. Microbiol. 63, 3741.

von Wintzingerode, F., Göbel, U.B., and Stackebrandt, E. (1997). Determination of microbial diversity in environmental samples: pitfalls of PCR-based rRNA analysis. FEMS Microbiol. Rev. 21, 213–229.

Woodward, S.R., Weyand, N.J., and Bunnell, M. (1994). DNA sequence from Cretaceous period bone fragments. Science 266, 1229–1232.

Yilmaz, S., Allgaier, M., and Hugenholtz, P. (2010). Multiple displacement amplification compromises quantitative analysis of metagenomes. Nat. Methods 7, 943–944.

Young, D.L., Huyen, Y., and Allard, M.W. (1995). Testing the Validity of the Cytochrome B Sequence from Cretaceous Period Bone Fragments as Dinosaur DNA. Cladistics 11, 199–209.

Zazula, G.D., Turner, D.G., Ward, B.C., and Bond, J. (2011). Last interglacial western camel (Camelops hesternus) from eastern Beringia. Quat. Sci. Rev. 30, 2355–2360.

Ziegenhagen, B., Guillemaut, P., and Scholz, F. (1993). A procedure for mini-preparations of genomic DNA from needles of silver fir (Abies alba Mill.). Plant Mol. Biol. Report. 11, 117–121.

Zischler, H., Hoss, M., Handt, O., Haeseler, A. von, Kuyl, A. van der, and Goudsmit, J. (1995). Detecting dinosaur DNA. Science 268, 1192–1193.

Chapter 2: A computational test of the authenticity of geologically ancient DNA

Abstract: Over the past decade, claims of geologically ancient DNA sequenced from bacteria and archaea have continued to be published in the mainstream scientific literature despite repeated concerns about the authenticity of these data. As most of these geologically ancient sequences are isolated from very small quantities of biological material, it is often not possible to verify the age of these samples by isotopic measurements. Here, we develop a method that we call the Depressed Rate Test, or DRT, to test the authenticity of age estimates of geologically ancient DNA sequences using molecular clock rate estimates. Through simulation, we show that the sensitivity of the DRT to distinguish very old sequences from present-day sequences depends on a combination of the age of the ancient sample and the evolutionary rate of the locus being analyzed. To test the utility of the DRT with real data, we evaluate the authenticity of previously published geologically ancient DNA sequences of bacteria and archaea, including sequences claimed to be up to 419 million years old. The DRT does not support the authenticity of any sequences reported to be older than 10 million years in age. We conclude that the DRT, when added to existing good practices for ancient DNA work, can help authenticate claims of geologically ancient DNA. However, for many of the genetic loci most commonly isolated from geologically ancient samples, such as 16S rRNA, the evolutionary rate is too slow to distinguish between geologically ancient and present-day samples. We propose that future analyses of geologically ancient bacteria and archaea target genes with faster mutation rates, thereby making it possible to use phylogenetic methods as a means to authenticate the sample age.

Introduction

Over the past decade, claims of recovering ancient bacteria and archaea from amber, ice cores, and salt crystals have been published, ranging in reported age from tens of thousands to hundreds of millions of years (Bellemain et al., 2013; Coolen and Overmann, 2007; Fish et al., 2002; Gramain et al., 2011; Lewis et al., 2008; Miteva and Brenchley, 2005; Panieri et al., 2010; Park et al., 2009; Satterfield et al., 2005; Schubert et al., 2010; Vreeland et al., 2000, 2007). Unlike bones or other materials with large amounts of organic compounds, these microorganisms do not provide enough material to isotopically date the samples, so age determinations must often be made solely from stratigraphy. However, the authenticity of these geologically ancient bacterial and archaeal sequences is frequently questioned by the ancient DNA community (Graur and Pupko, 2001; Hebsgaard et al., 2005; Willerslev and Hebsgaard, 2005). Three general concerns are often raised: (1) the widely adopted procedures for avoiding contamination during ancient DNA retrieval (Cooper and Poinar, 2000; Gilbert et al., 2005) are not always followed for geologically ancient DNA (gaDNA); (2) the purportedly ancient sequences sometimes do not contain damage and degradation patterns characteristic of ancient DNA, which can suggest a modern, not ancient, origin; and (3) the reputed age of many samples is up to 100-fold greater than the theoretical limits of post-mortem DNA survival (Lindahl, 1993), underscoring the importance of authenticating these extraordinary results.

The checkered past of the field of ancient DNA, including spectacular claims of dinosaur DNA that were later shown to be contaminants (Wang et al., 1997; Woodward et al., 1994; Young et al., 1995; Zischler et al., 1995), led to the development and widespread adoption of a standard suite of protocols to help ensure the authenticity of ancient sequences. One such protocol is to perform ancient DNA research in a facility that is geographically isolated from other molecular biology research (Cooper and Poinar, 2000; Gilbert et al., 2005). This protects the ancient samples and extracts from contamination by modern DNA sequences, which are preferentially amplified over damaged fragments of ancient DNA. However, several studies reporting gaDNA sequences used modern bacterial samples as positive controls during the extraction of their ancient DNA (Fish et al., 2002; Panieri et al., 2010; Park et al., 2009). In at least one of these cases, the positive control was identical to nine of the reported 23 Mya sequences and was the most closely related modern sequence to another five (Park et al., 2009). Another widely adopted protocol is to replicate the production of a new or extraordinary result in an independent ancient DNA laboratory (Cooper and Poinar, 2000). This helps to eliminate the possibility that the ancient DNA sequence is simply a laboratory-derived contaminant. Most reported gaDNA sequences have not been replicated (Fish et al., 2002; Panieri et al., 2010; Park et al., 2009; Vreeland et al., 2000, 2007).

Ancient DNA is expected to be fragmented and to contain damage – chemical modifications to the DNA helix that accumulate post-mortem. The pattern of DNA damage and degradation that occurs is well-characterized (Briggs et al., 2007; Brotherton et al., 2007; Gilbert et al., 2003, 2007; Heyn et al., 2010; Ho et al., 2007; Hofreiter et al., 2001; Stiller et al., 2006). Primarily, damage occurs as continued fragmentation of DNA into shorter and shorter fragments, and as the accumulation of specific chemical modifications: mostly deamination of cytosine residues that results in C to T and G to A miscoding lesions, oxidative damage that creates blocking lesions, depurination, and crosslinking between strands (Willerslev and Cooper, 2005). Even in exceptional circumstances that might promote the long survival of DNA, damage and fragmentation will still occur. Therefore, gaDNA should only be amplifiable in small fragments, and should show the typical spectrum of DNA damage for ancient sequences (Hansen et al., 2006; Hebsgaard et al., 2005). This is often not the case in claims of gaDNA: sequences of up to 1000 base pairs (bp) have been published from bacteria and archaea claimed to be between 11 and 419 My in age (Fish et al., 2002; Park et al., 2009; Vreeland et al., 2007). In contrast, the oldest authenticated DNA samples published are 400–800 ky old bacteria from Arctic ice cores and Greenland permafrost (Willerslev et al., 2003, 2007), and a recent draft genome assembled from a 700 ky old horse bone (Orlando et al., 2013). In both cases, the DNA fragments that could be recovered were very short (averaging 130 and 78 bp, respectively) and showed evidence of considerable damage. In the ice core, the bacterial sequences from the older samples (lower stratigraphic levels) represented a much less taxonomically diverse community than did those at the surface levels. All of these samples were also recovered directly from permafrost environments, which are known to be ideal environments for preservation of DNA due to their near-continual low temperatures and lack of freeze-thaw cycles, which slow the processes of damage and decay (Willerslev et al., 2004).

It is on the basis of these DNA decay observations that the theoretical limit for PCR-amplifiable DNA is estimated to be between 400 kya and 1.5 Mya, beyond which DNA is either severely crosslinked, completely degraded or otherwise non-detectable (Hansen et al., 2006; Willerslev et al., 2004). In contrast, some gaDNA claims are up to 419 Mya (Park et al., 2009), over 400-fold greater than this limit. However, these estimates are based on rates and patterns of post-mortem decay; it is possible that some of the microorganisms from which gaDNA has been reported may have gone into a low metabolic state while trapped in ice and salt crystals (Fendrihan et al., 2012; Schubert et al., 2009). In this case, the reported sequences would actually come from very slowly evolving but living organisms, which would explain why the typical patterns of damage and decay expected for gaDNA are not observed. In bacterial community composition surveys from different levels down a sediment core, spore-forming bacteria were shown to be lost more quickly over time than non-spore-forming bacteria (Johnson et al., 2007), possibly indicating that microorganisms that can maintain a low metabolic state may survive, while spore-forming bacteria that cannot repair DNA in their spore state do not (Maughan, 2007). In deep subsurface environments, biomass turnover can range between years and millennia (Jørgensen, 2011; Jørgensen and Boetius, 2007; Lomstein et al., 2012) and metabolic rates are extremely low (D’Hondt et al., 2009; Røy et al., 2012) but still orders of magnitude higher than the energy required to repair DNA depurination and protein amino acid racemization (Hoehler and Jørgensen, 2013). However, even if these trapped microorganisms can live for millennia in a low metabolic state, claims of hundreds of millions of years are still extreme and would benefit from additional evidence of their authenticity, particularly in the absence of isotopic dating.

Here, we propose a version of a relative rates test, based on theory from Hebsgaard et al. (2005), that we call the Depressed Rate Test, or DRT. In this test we estimate relative rates of evolution of terminal branches across a genealogy to detect whether the terminal branches that lead to ancient sequences have statistically different rates of evolution from those that lead to modern samples. If the geologically ancient sequences are actually very old, that branch should be “missing” an amount of evolutionary change that is proportional to the time that has passed since the organism was alive or actively growing and reproducing. If we then assume that the old sequence is the same age as the younger sequences, this “missing” evolution should manifest as a severely depressed rate of evolution along that branch compared to the other terminal branches in the tree (Figure 2-1).

Figure 2-1: Within a family of organisms there is a distribution of relative molecular clock rates that can be determined (gray box). Additional sequences can be added and their position relative to the modern distribution of rates can be estimated. Modern sequences assumed to be ancient will have an artificially increased rate (blue box) due to the fact that there are more mutations than would be expected, while ancient sequences assumed to be modern will have an artificially depressed rate (red box) due to the fact that there are fewer mutations than expected. The proposed method makes use of the ancient sequences assumed to be modern and looks for a depressed rate compared to the modern distribution. Image adapted from Hebsgaard et al. (2005).
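
The logic of Figure 2-1 can be illustrated numerically. The following minimal Python sketch is not part of the DRT implementation; the function name `apparent_rate` and its parameters are hypothetical, and a strict clock along a single terminal branch is assumed. A branch that starts at a node of age `parent_age` and ends at a tip of true age `tip_age` accumulates substitutions only over `parent_age - tip_age` years. If the tip is instead assumed to be present-day, the same substitutions are spread over the full `parent_age`, which lowers the inferred rate.

```python
def apparent_rate(true_rate, parent_age, tip_age):
    """Rate inferred for a terminal branch whose tip is wrongly treated as modern.

    true_rate  : substitutions/site/year actually experienced along the branch
    parent_age : age in years of the node where the terminal branch starts
    tip_age    : true age in years of the sampled sequence (0 = present day)
    """
    if parent_age <= 0 or not 0 <= tip_age <= parent_age:
        raise ValueError("require parent_age > 0 and 0 <= tip_age <= parent_age")
    # Substitutions per site that really accumulated before the organism stopped evolving.
    substitutions = true_rate * (parent_age - tip_age)
    # Treating the tip as modern stretches the branch to the full parent_age.
    return substitutions / parent_age


# Hypothetical example: a 100 My branch evolving at 5e-9 substitutions/site/year.
modern_tip = apparent_rate(5e-9, 100e6, 0)       # no depression
ancient_tip = apparent_rate(5e-9, 100e6, 50e6)   # tip is half as old as its parent node
```

Under these assumptions the apparent rate equals the true rate multiplied by (1 - tip_age / parent_age), so a tip half as old as its parent node appears to evolve at half the true rate, whereas a tip that is young relative to its branch is barely depressed.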

We use a series of simulated data sets to test whether the DRT can, in fact, be used to authenticate purportedly geologically ancient DNA sequences. Published ancient DNA sequences vary widely both in the reported age and in the rate of evolution along the sequenced locus. We therefore design our simulation framework to test a broad range of sequence age and mutation rate parameters. We then apply the DRT to real data sets of published geologically ancient bacteria and archaea. We conclude that, for many of the most commonly isolated genetic loci, and in particular those that tend to be targeted for ancient bacteria, the rate of evolution is too slow to distinguish authentic geologically ancient DNA from modern, contaminating sequences.

Materials and Methods

Data Sets

Simulated Data

To test the efficacy of the DRT to distinguish geologically ancient sequences from present-day sequences under a variety of experimental scenarios, our simulation design focused on variation of three major parameters: the age of the ancient sequence, the total depth or age of the tree, and the rate of evolution of the locus sequenced. Figure 2-2 provides a schematic of our simulation workflow. Briefly, we generated genealogies comprising 50 present-day sequences and up to 18 ancient sequences. We then simulated DNA sequence alignments from each genealogy using up to five different mutation rates. Next, we subsampled these alignments so that only one ancient sequence was included in each. Finally, we used a relaxed molecular clock approach to infer trees for each of these subsampled alignments, and estimated the evolutionary rates along the branches in each genealogy. Our goal was to identify external branches whose evolutionary rate was significantly depressed compared to the rate along other external branches in the tree, which we hypothesize to be evidence of a significant difference in age.
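
One simple way to flag a depressed external branch, shown here only as an illustrative sketch, is to compare the rate estimated for the candidate branch with the lower tail of the rates estimated for the modern external branches. The function name `is_depressed` and the 2.5% cutoff are assumptions made for this example and are not taken from the analysis described here.

```python
def is_depressed(candidate_rate, modern_rates, quantile=0.025):
    """Return True if candidate_rate falls below the lower-tail cutoff of modern_rates.

    The cutoff is the empirical value at the given quantile of the modern
    external-branch rates (a hypothetical criterion chosen for illustration).
    """
    if not modern_rates:
        raise ValueError("modern_rates must not be empty")
    ranked = sorted(modern_rates)
    # Index of the empirical lower-tail cutoff within the sorted rates.
    cutoff_index = int(quantile * len(ranked))
    return candidate_rate < ranked[cutoff_index]


# Hypothetical relative rates for 49 modern external branches.
modern = [1.0 + 0.01 * i for i in range(49)]
flag_low = is_depressed(0.5, modern)    # well below the modern distribution
flag_typical = is_depressed(1.2, modern)  # inside the modern distribution
```

A rank-based cutoff of this kind makes no assumption about the shape of the rate distribution, which may be convenient when only about fifty external branches are available per tree.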

While most ancient DNA analyses to date have focused on population samples with very recent common ancestors, many experiments involving geologically ancient DNA sample from taxonomic units that are much older. To test the effect of tree depth on the ability of the DRT to identify geologically ancient samples, we subdivided our experiment into two data sets. Data Set 1 (DS1) comprised trees with a root height between 500 and 600 Mya. Using BEAST v1.7 (Drummond et al., 2012), we generated from a null alignment a series of trees with 51 present-day sequences and 18 ancient sequences. Of the modern sequences, one belonged to an outgroup species and the remainder were assumed to be from the same taxonomic unit. For the 18 ancient sequences, three were assigned to each of six different ages: 10 kya, 100 kya, 1 Mya, 10 Mya, 100 Mya and 500 Mya. Data Set 2 (DS2) comprised trees with a root height between 1 and 1.5 Mya, and again included 50 modern ingroup sequences and one modern outgroup sequence, together with nine ancient sequences, three assigned to each of three different ages: 10 kya, 100 kya, and 1 Mya. The three oldest ancient sequence age categories used in DS1 were excluded from DS2 as they exceeded the age of the root of the tree. For each data set, we selected three tree topologies for further analysis.

Figure 2-2: For each ancient sequence at each rate there are three trees, each of which has three ancient sequences at each time point. Each of the trees has three replicate sequence sets generated for each rate. The control sequence set contains 50 modern sequences for each tree. For DS1 this results in a total of 684 sequence sets and for DS2 this results in 360 sequence sets that were then used to test the power of the DRT.

Next, we simulated sequence data (1.4 kb) down each of the trees using Seq-Gen (Rambaut and Grassly, 1997), assuming an HKY substitution model with a transition/transversion ratio of 5. For each of the three trees selected from DS1, we used four different molecular clock rates: 5 × 10⁻¹¹, 5 × 10⁻¹⁰, 5 × 10⁻⁹, and 5 × 10⁻⁸ mutations/site/year. For each tree in DS2, we used four different molecular clock rates: 5 × 10⁻⁹, 5 × 10⁻⁸, 5 × 10⁻⁷, and 5 × 10⁻⁶ mutations/site/year. The mutation rates represent the range of rates for the most commonly isolated loci in ancient DNA data sets, ranging from the slowly evolving bacterial 16S rRNA, with rates around 10⁻¹⁰ (Foster et al., 2009; Lenski et al., 2003), to the more quickly evolving mitochondrial hypervariable region, where rates as fast as 10⁻⁶ have been estimated for some animal species (Lambert et al., 2002; Nabholz et al., 2008). The rates selected for each data set reflect those that are informative given the root height (the slowest evolutionary rates did not produce an observable number of mutations in the younger trees). Each combination of molecular clock rate and tree topology was replicated three times, for a total of 3 (number of trees selected from each data set) x 4 (number of mutation rates) x 3 (replicates for each mutation rate) = 36 alignments for each data set.
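
The remark that the slowest rates are uninformative on the younger trees follows from simple arithmetic. As a rough sketch (the helper `expected_substitutions` is hypothetical, and a strict clock is assumed), the expected number of substitutions along a single lineage is the product of rate, elapsed time, and alignment length:

```python
def expected_substitutions(rate, years, sites=1400):
    """Expected substitutions along one lineage under a strict clock.

    rate  : substitutions/site/year
    years : time elapsed along the lineage
    sites : alignment length in base pairs (1.4 kb in the simulations)
    """
    return rate * years * sites


# Slowest DS1 rate applied over the deepest DS2 root height (1.5 My).
slow_on_young_tree = expected_substitutions(5e-11, 1.5e6)
# Slowest DS2 rate over the same time span.
ds2_slowest = expected_substitutions(5e-9, 1.5e6)
```

With a 1.4 kb alignment, a rate of 5 × 10⁻¹¹ over 1.5 My gives roughly 0.1 expected substitutions per lineage, while 5 × 10⁻⁹ gives roughly 10, which is consistent with using only the faster rates for the younger trees.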

For each of the 36 alignments of 69 sequences each in DS1, we then created separate 50-sequence alignments, each containing only one of the 18 ancient sequences. This was done to facilitate estimation of the distribution of evolutionary rates along external branches. Each of the 50-sequence alignments comprised one ancient sequence, 48 modern ingroup sequences (the sequences to be removed were randomly selected in each sub-alignment), and the outgroup sequence. In addition, we created one control alignment comprising the outgroup and 49 ingroup sequences. This procedure produced a total of 684 alignments of simulated data for DS1. We performed a similar subsampling and alignment creation procedure for DS2, resulting in a total of 360 simulated alignments (Figure 2-2).
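The alignment totals above follow from simple bookkeeping, which can be sketched in Python. The function names and the use of `random.sample` to choose which modern ingroup sequences to keep are illustrative assumptions, not the scripts used in this study.

```python
import random

def count_alignments(n_trees, n_rates, n_replicates, n_ancient):
    """Total sub-alignments: one per ancient sequence plus one modern-only control."""
    full_alignments = n_trees * n_rates * n_replicates
    return full_alignments * (n_ancient + 1)

def subsample(ancient_id, ingroup_ids, outgroup_id, n_ingroup=48, rng=None):
    """Build one 50-sequence sub-alignment: 1 ancient + 48 modern ingroup + 1 outgroup.

    Hypothetical helper: the modern ingroup sequences that are kept are drawn at random.
    """
    rng = rng or random.Random()
    kept = rng.sample(ingroup_ids, n_ingroup)
    return [ancient_id] + kept + [outgroup_id]

# 3 trees x 4 rates x 3 replicates = 36 full alignments per data set
ds1_total = count_alignments(3, 4, 3, n_ancient=18)
ds2_total = count_alignments(3, 4, 3, n_ancient=9)
print(ds1_total, ds2_total)
```

The two totals reproduce the 684 (DS1) and 360 (DS2) sequence sets reported in Figure 2-2.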

Horse Control Region Data Sets

Recently, the mitochondrial sequence of a 700,000-year-old horse was published (Orlando et al., 2013). This sequence, along with five other horse control region sequences (mutation rate 1.088×10⁻⁷) dated between 20,000 and 120,000 years (unpublished data, Shapiro Lab), was used to test the DRT (Table 2-1). The data set consisted of 49 modern horse sequences from GenBank, 6 ancient horse sequences, and a donkey sequence as an outgroup, making the tree height around 4 Mya. The sequences were aligned using MAFFT version 7 (Katoh and Standley, 2013; Katoh et al., 2002). Then, to match the simulated data sets and facilitate detection of depressed rates, we downsampled the alignment into sub-alignments of 50 sequences each, where each sub-alignment comprised 48 modern sequences, one outgroup sequence and one ancient sequence. The control sequence set consisted of only present-day samples. We analyzed a total of seven alignments, six of which included an ancient sequence.

Sample | Age (years)
MS363 | 25,377
MS364 | 30,784
MS365 | >52,400 (radiocarbon infinite)
MS291 | 80,000-120,000
MS292 | 80,000-120,000
TC21c | 680,000±20,000

Table 2-1: Horses with sample numbers and ages.

Bacterial and Archaeal Data Sets

Next, we used the DRT to directly test the authenticity of several published sequences of geologically ancient bacteria and archaea. We analyzed sequences from four different publications: Fish et al. (2002), Panieri et al. (2010), Park et al. (2009), and Vreeland et al. (2007), all of which targeted the 16S rRNA gene. We generated a different data set for each Order that was represented by at least one ancient sequence. As above, each data set comprised published ancient sequences, present-day sequences used in the analysis in each original publication, and additional present-day sequences downloaded from the public database. Our aim in compiling these data sets was to span the range of taxonomic diversity within the bacterial or archaeal Order into which the ancient sequence is classified (Table 2-2).

Because alignment of bacterial 16S sequences can be difficult due to highly variable sections, we aligned all the sequences within each Order together using MAFFT, inspected the alignments by eye, and trimmed them to between 500 and 600 bp, depending upon the length of the ancient sequences. We then subsampled the resulting alignments as above, so that each alignment comprised 49 present-day sequences, one outgroup sequence, and one ancient sequence. Control alignments consisted of only modern samples. This resulted in 76 alignments, 70 of which tested the authenticity of a single, reported, geologically ancient sequence.

# Seq | Acc Number | Reported Age | Ingroup Genus | Outgroup
1 | FJ809905 (a) | 5.816 Mya | Order: Oscillatoriales | Chroococcidiopsis
8 | FJ809895-FJ809903 (a) | 5.901 Mya | Order: Oscillatoriales | Chroococcidiopsis
12 | AJ319550-AJ319561 (b) | 11-15.8 Mya | Halosimplex, Haloarcula | Halobacterium, Halarchaeum
14 | EF535047, EF535049-EF535055, EF535059, EF535061-EF535083 (c) | 23 Mya | Halorubrum | Haloferax
7 | EF535041, EF535046, EF535066-EF535068, EF535077, EF535083 (c) | 121 Mya | Haloferax, Haloquadratum, Halogeometricum, Halosarcina | Natronobacterium
1 | EF535073 (c) | 121 Mya | Halococcus | Halobacterium
1 | AJ878084 (d) | 121 Mya | Natronomonas | Halopiger
13 | EF535040, EF535042, EF535044, EF535045, EF535067, EF535070, EF535075, EF535078, EF535081, AJ878078, AJ878079, AJ878081, AJ878082 (c, d) | 121 Mya | Halosimplex, Haloarcula | Halobacterium, Halarchaeum
5 | EF535084-EF535086, EF535096, EF535102 (c) | 419 Mya | Haloferax, Haloquadratum, Halogeometricum, Halosarcina | Natronobacterium
1 | EF535091 (c) | 419 Mya | Natronomonas | Halopiger
7 | EF535087, EF535088, EF535093, EF535095, EF535097, EF535098, EF535100 (c) | 419 Mya | Halosimplex, Haloarcula | Halobacterium, Halarchaeum

Table 2-2: Ancient bacterial and archaeal sequences tested, including reported age and the ingroup and outgroup genera or order used in the analysis. Data come from the following sources: a) Panieri et al. (2010), b) Fish et al. (2002), c) Park et al. (2009), and d) Vreeland et al. (2007).

Analysis

For each alignment, we inferred the tree topology and relative rates along the branches of the tree using BEAST v1.7.5, constraining all ingroup sequences to be monophyletic. We assumed sequences evolved according to an HKY substitution model with estimated base frequencies and with rates that followed an uncorrelated lognormal relaxed clock model, which allows branches in the phylogeny to evolve at different rates (Drummond et al., 2006). Tree topology was assumed to follow a birth/death tree model (Gernhard, 2008). A normal prior was placed on treeHeight, with a mean of 1 and a standard deviation of 0.001. The rates estimated are therefore not absolute estimates of the rate along each branch, but estimates of rates relative to each other within the tree. Both the ucld.mean and birth/death rate priors were set to uniform distributions with a lower bound of 0 and an upper bound of infinity.

Due to the variation in mutation rates and tree heights, the alignments analyzed required different numbers of iterations in order to achieve effective sample size (ESS) values over 200 for each parameter. These run statistics were computed by Tracer version 1.5 (Rambaut and Drummond, 2007). Analyses were run for between 100 million and 100 billion iterations, and trees and parameter values were sampled from the posterior sufficiently frequently to sample 10,000 states. The first 10% of the sampled states were removed from each run as burn-in, leaving 9,000 sampled states for downstream analysis.
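The relationship between chain length, sampling interval, and burn-in described above can be sketched as follows. This is only an arithmetic illustration: the function name is hypothetical, and in practice the logging interval is set in the BEAST configuration rather than computed this way.

```python
def sampling_plan(chain_length, n_samples=10_000, burnin_percent=10):
    """Return (logging interval, states discarded as burn-in, states kept)."""
    log_every = chain_length // n_samples          # iterations between logged states
    n_burnin = n_samples * burnin_percent // 100   # first 10% of the logged states
    return log_every, n_burnin, n_samples - n_burnin

# Shortest and longest chain lengths mentioned in the text
for chain_length in (100_000_000, 100_000_000_000):
    print(chain_length, sampling_plan(chain_length))
```

For any chain length in the stated range, 10,000 logged states with a 10% burn-in leave 9,000 states for downstream analysis.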

The resulting BEAST output tree files were processed with TreeStat (in the BEAST suite of programs) to retrieve the external branch rates, or the rates along the terminal branches in the tree, for each sampled tree. The external branch rate output from TreeStat was then run through a script that estimates the significance value for each external branch rate being less than the mean branch rate (script is available upon request). Because the MCMC method allows us to sample the Bayesian posterior probability distribution, we can directly test our hypothesis on each sample by counting the number of times the null hypothesis is true to estimate the p-value. In this case, the null hypothesis is that the ancient sample is not ancient, and therefore will not fall significantly outside the distribution of external branch rates observed for known modern samples. Thus, a small p-value indicates that the rate along the branch leading to a particular sample is significantly lower than the rates of modern samples (i.e. that the sample is ancient). To obtain this distribution, we estimate the mean branch rate across all tip branches for each sampled tree. Then, we estimate the p-value by counting the number of sampled trees for which the rate of a particular branch is greater than or equal to the mean of all the tip branches in that tree, divided by the total number of trees sampled from that BEAST run (in this case the total number was 9,000). Because the number of MCMC samples is finite, there will be some uncertainty in the estimated p-value. This uncertainty can be reduced by increasing the number of MCMC samples. We computed the uncertainty of our p-values using the standard Wald method (Agresti and Coull, 1998).
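The counting procedure described above can be sketched in a few lines of Python. This is a minimal illustration and not the script used in this study: the input format (one dictionary of tip rates per sampled tree), the function names, and the choice of z = 1.96 (a 95% interval) for the Wald half-width are all assumptions.

```python
import math

def drt_p_value(sampled_tip_rates, target):
    """Fraction of sampled trees in which the target tip rate is >= the mean tip rate.

    sampled_tip_rates: list of dicts, one per post-burn-in tree, mapping tip name -> rate.
    """
    hits = 0
    for rates in sampled_tip_rates:
        mean_rate = sum(rates.values()) / len(rates)
        if rates[target] >= mean_rate:
            hits += 1
    return hits / len(sampled_tip_rates)

def wald_half_width(p, n, z=1.959964):
    """Half-width of the Wald interval for a proportion p estimated from n samples."""
    return z * math.sqrt(p * (1.0 - p) / n)

# Toy example with four sampled trees (real runs here used 9,000)
toy_trees = [
    {"ancient": 0.2, "modern_1": 1.0, "modern_2": 1.1},
    {"ancient": 1.5, "modern_1": 1.0, "modern_2": 0.9},
    {"ancient": 0.3, "modern_1": 1.2, "modern_2": 0.8},
    {"ancient": 0.1, "modern_1": 0.9, "modern_2": 1.0},
]
p = drt_p_value(toy_trees, "ancient")
print(p, wald_half_width(p, len(toy_trees)))
```

Under these assumptions, a p-value of 0.62071 from 9,000 sampled trees has a half-width of about 0.010, in line with the uncertainty reported in the caption of Figure 2-3. The Wald half-width is exactly zero when the estimated p-value is zero, as in the 0.00000±0.00000 entries of that caption.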

The median and mean rates along each tip branch over all the sampled trees were estimated from TreeAnnotator (in the BEAST suite of programs) and visualized in FigTree version 1.4 (Rambaut, 2007). R was used to make a histogram of the distribution of external branch rates (script is available upon request). The rate distributions between all the modern samples in the alignments were compared in order to make sure that there were no modern taxa with rates that would skew the analysis.

Results

Simulated Data

Figure 2-3 shows an example of how ancient sequences can be detected based upon age of sample. Three main factors determined whether the DRT was able to detect a significantly depressed rate along the branch leading to the ancient sample (Table 2-3): the age of ancient sample, the evolutionary rate of the genomic locus, and the total age of the tree. Older samples (above 1 Mya) are more easily detected as authentically ancient via a depressed rate than are younger samples (10-100 kya). When the evolutionary rate is faster, the younger ancient samples are more often detected as having a significantly depressed rate. Younger samples are also more frequently detected as authentically ancient when the tree itself is younger.

As shown in Table 2-3, for the younger tree height (DS2), authentically ancient samples could not be detected as having significantly depressed rates when the evolutionary rate was equal to or slower than 5×10⁻⁹. For faster evolutionary rates, sequences near the root of the tree (1 Mya) were detected as ancient in all replicates. For ancient sequences towards the middle of the tree (100 kya), depressed rates could be detected for some, but not all, of the replicates at an evolutionary rate of 5×10⁻⁸, but at faster evolutionary rates (5×10⁻⁷), all of the replicates had significantly depressed rates and were therefore identified as ancient. For sequences near the tips of the tree (10 kya), none of the replicates at any evolutionary rate were detected as ancient.

Figure 2-3: The distribution of calculated mutation rates for the external branches in the trees. All data are from simulated data sets with a root height of 1 Mya. Row one has a rate of 5×10⁻⁷, row two a rate of 5×10⁻⁸, and row three a rate of 5×10⁻⁹. The ancient sequences, which are shaded in red, have fixed ages of 1 Mya (column 1), 100 kya (column 2), and 10 kya (column 3). The ancient sequences with a rate of 5×10⁻⁷ have significantly depressed rates for sequences aged 1 Mya (p-value 0.00000±0.00000) and 100 kya (p-value 0.00000±0.00000) but not for 10 kya (p-value 0.62071±0.01002). Ancient sequences with a rate of 5×10⁻⁸ have significantly depressed rates only for the sequence aged 1 Mya (p-value 0.00000±0.00000). At a rate of 5×10⁻⁹, no ancient sequences have significantly depressed rates.

For the older tree height (DS1), samples near the root of the tree were detected at the slower rates. Specifically, all replicates of the 500 Mya samples were detected as having significantly depressed rates even with the slowest mutation rate, 5×10⁻¹¹. Sequences towards the middle of the tree (1–100 Mya) were detected in a relationship of one order of magnitude difference in mutation rate to one order of magnitude difference in age of the ancient sequence. In other words, some, but not all, of the replicates were detected as ancient for the 100 Mya sample at an evolutionary rate of 5×10⁻¹⁰, and all replicates were detected as ancient when the evolutionary rate was increased to 5×10⁻⁹. Similarly, for the 10 Mya sample, some replicates had a significantly depressed rate at an evolutionary rate of 5×10⁻⁹ and all replicates were detected at an evolutionary rate of 5×10⁻⁸. The same is true for the 1 Mya sample. For samples near the tips of the tree (10–100 kya), no replicates at any evolutionary rate were detected as ancient.

Mutation Rate (mutations/site/year)
Data set | Age | 5×10⁻¹¹ | 5×10⁻¹⁰ | 5×10⁻⁹ | 5×10⁻⁸ | 5×10⁻⁷ | 5×10⁻⁶
A | 10 kya | | | | | |
A | 100 kya | | | | | |
A | 1 Mya | | | | 0.00-0.32 | |
A | 10 Mya | | | 0.00-0.29 | 0.00-0.03* | |
A | 100 Mya | 0.11-0.42 | 0.00-0.28 | 0.00-0.03* | 0.00-0.03* | |
A | 500 Mya | 0.00-0.01* | 0.00** | 0.00** | 0.00** | |
B | 10 kya | | | | | 0.13-0.48 | 0.12-0.50
B | 100 kya | | | | 0.02-0.37 | 0.00-0.02* | 0.00**
B | 1 Mya | | | 0.10-0.19 | 0.00** | 0.00** | 0.00**

Table 2-3: Sensitivity of the test in simulation is based upon the mutation rate, age of sample, and tree height. The range of significance values from the replicates is given based upon the age and mutation rate for A) DS1 (>500 Mya root height) and B) DS2 (~1 Mya root height). * indicates where all replicates are significant at the p=0.05 level and ** indicates where all replicates are significant at the p=0.01 level. Gray boxes indicate where no tests were performed because the rate and tree height were unable to be tested together.

It is easier to detect authentically ancient sequences when they are close to the root of the tree, because the difference between the number of changes in the ancient sequence and the number of changes that have occurred on the branches leading to the modern sequences is then largest. However, ancient sequences are still detected as ancient in trees with deeper roots as long as the evolutionary rate of the gene locus is relatively fast and the age of the sequence is toward the middle of the tree. With deep-rooted trees (root greater than 100 Mya), sequences near the tips (under 1 Mya) are not going to be detected as ancient at any evolutionary rate. Decreasing the root height by approximately two orders of magnitude increases the detection ability of the test by about one order of magnitude in either evolutionary rate or sequence age. For example, with a root height greater than 500 Mya, a 1 Mya sequence is detected as ancient in all replicates when the evolutionary rate is 5×10⁻⁷. If the root height is decreased two orders of magnitude to around 1.5 Mya, a 1 Mya sequence can be detected when the evolutionary rate is 5×10⁻⁸, one order of magnitude slower than the rate required with the greater root height. Equivalently, at the same evolutionary rate, the detectable sequence is one order of magnitude more recent when the root height is two orders of magnitude smaller.

Horse Data

The horse samples had very little variation in their rates. With a root height of 4 Mya and a rate of 1.088×10⁻⁷, ancient samples older than 100 kya should be detected as ancient. However, while the 700 kya horse bone had a lower mean rate than the modern samples, the rate distribution was not significantly depressed (Figure 2-4). None of the younger horse bones had depressed rates.

Figure 2-4: Relative rate distribution for horse data. The ancient samples are highlighted in red for a 700 kya bone, an 80-120 kya bone, and a 25 kya bone. Based upon simulation, the 700 kya bone is within the range of detection for the DRT; however, a significantly depressed rate was not detected.

Bacterial and Archaeal ‘Test’ Data

The bacterial and archaeal samples had more variation in their rates than the simulated data. In addition, some of the modern samples had significantly depressed rates (Table 2-4). This is most likely because the bacterial and archaeal data sets have variation in their rates along each branch. Because only 16S rRNA data were available for these geologically ancient samples, a sample would have to be more than ten million years old to have a depressed rate detected by this method. Therefore, many of the available gaDNA claims are outside the limit of detection. However, of the 71 bacterial and archaeal samples that were within the time frame and were tested, none had significantly depressed rates.

Ingroup Genus | Outgroup Genus | # Ancient Sequences Tested | # Modern Sequences with Depressed Rates
Order: Oscillatoriales | Chroococcidiopsis | 9 | 0
Halosimplex, Haloarcula | Halobacterium, Halarchaeum | 33 | 3
Halorubrum | Haloferax | 14 | 2
Haloferax, Haloquadratum, Halogeometricum, Halosarcina | Natronobacterium | 12 | 0
Halococcus | Halobacterium | 1 | 2
Natronomonas | Halopiger | 2 | 1

Table 2-4: Summary of the six Data Sets for the bacterial and archaeal data including the ingroups, outgroups, number of ancient sequences tested in each Data Set and the number of modern sequences with depressed rates that were found in each Data Set. None of the 71 reported ancient sequences showed depressed rates.

Discussion

Although ancient DNA sequences up to 500 million years old have been reported, the authenticity of these sequences remains a matter of debate. Such results require extraordinary proof, which is very difficult to obtain in almost every situation in which gaDNA has been reported. Sample sizes, for example, tend to be very small, making it impossible to directly date the ancient sample via isotopic means. DNA experiments involving micro-organisms are also highly susceptible to contamination, as the potential contaminants are ubiquitous in most environments including clean rooms (Duc et al., 2007; Moissl et al., 2007). Despite these challenges, the potential for gaDNA to inform studies of long-term evolutionary processes highlights a real need for some way to test and confirm the authenticity of these data.

Here, we provide the first statistical method for authentication of gaDNA, which we term the Depressed Rate Test, or DRT. The DRT estimates the relative evolutionary rates of all terminal branches in a genealogy, searching for depressed rates along those terminal branches leading to the ancient sequence as confirmation of authenticity. Many gaDNA claims come from bacteria and archaea, where evolutionary rates are not well established (Kuo and Ochman, 2009; Ochman et al., 1999); therefore, the DRT uses relative rates to avoid this problem, as rates within a bacterial family are generally consistent (Kuo and Ochman, 2009). The test is based on the assumption that authentically ancient DNA sequences will be “missing” an amount of evolutionary change relative to present-day sequences. The amount of missing evolution will depend on both the amount of time that has passed since the death of the ancient organism and the rate of evolution along the targeted genetic locus. The more time since organismal death and the faster the evolutionary rate, the more changes will be missing, and the more likely a depressed rate along that lineage will be observed. Therefore, if we compare a truly ancient sequence to its present-day relatives, but constrain the ancient sample to have a modern date, the branch leading to that ancient sample will have a depressed evolutionary rate relative to the other terminal branches.
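The dependence of the “missing” evolution on rate and age can be made concrete with a back-of-the-envelope calculation. The sketch below assumes a simple linear approximation (rate × age × sequence length) that ignores multiple substitutions at the same site and rate variation among sites; the function name is hypothetical, and the 1.4 kb length is the simulated alignment length used in this chapter.

```python
def expected_missing_substitutions(rate_per_site_per_year, sample_age_years, n_sites):
    """Approximate number of substitutions an ancient sequence is 'missing'.

    Linear approximation: ignores back-mutation and saturation.
    """
    return rate_per_site_per_year * sample_age_years * n_sites

SEQ_LENGTH = 1400  # sites, as in the simulated data

for rate in (5e-9, 5e-8, 5e-7):
    for age in (1e4, 1e5, 1e6):
        missing = expected_missing_substitutions(rate, age, SEQ_LENGTH)
        print(f"rate={rate:.0e}  age={age:.0e} years  missing ~ {missing:.2f}")
```

Under this approximation, a 100 kya sample evolving at 5×10⁻⁷ mutations/site/year is missing roughly 70 substitutions over 1.4 kb, whereas a 10 kya sample at 5×10⁻⁹ is missing fewer than 0.1, which is in keeping with the pattern of detection seen in the simulations.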

Through simulation, we show that the DRT is sufficiently sensitive to detect a sequence older than 100 kya when the mutation rate is 5×10⁻⁷, or older than 10 Mya when the mutation rate is 5×10⁻⁹. This combination of mutation rate and sample age limits testing of the DRT on authenticated aDNA samples. One sample that fits into this range is the recently published 700 kya horse mitochondrial control region sequence (Orlando et al., 2013). However, when testing this sequence, the DRT was unable to detect a significantly depressed rate. This is most likely because mammalian mitochondrial DNA has an elevated transition to transversion ratio (Nabholz et al., 2008), leading to a high back-mutation rate and therefore masking the signal of the DRT. This indicates that the transition/transversion ratio must also be evaluated before using the DRT.

Through simulation, we show that the DRT is sufficiently sensitive to detect sequences as authentically ancient when samples are older than 100 kya and evolutionary rates are greater than 5×10⁻⁷ mutations/site/year. Interestingly, this same combination of sequence age and evolutionary rate is not sufficient to detect authentically ancient sequences when the total age of the tree is more than 1 Mya. When the total age of the tree is increased, the signal to noise ratio goes down because there are more mutations, and the detection ability of the test decreases. This suggests that the choice of outgroup sequence is critical; outgroups that are too evolutionarily distant from the ingroup will decrease the ability to detect authentically ancient sequences. For ancient sequences older than 1 Mya, the age of the root should be no more than 500 times the age of the sample. For ancient sequences less than 1 Mya, the age of the root should be no more than 20 times the age of the sample. Therefore, a sample with an age of 500 kya should be tested with an outgroup that diverged no more than 10 Mya for the DRT to be effective.
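The outgroup guideline above can be written as a small helper. This is only a sketch of the rule of thumb stated in the text: the function name is hypothetical, and because the text does not say which ratio applies to a sample of exactly 1 Mya, the stricter 1-to-20 limit is assumed for that case.

```python
def max_root_age(sample_age_years):
    """Oldest root (outgroup divergence) age, in years, suggested for a given sample age.

    Assumption: samples of exactly 1 Mya use the stricter 1:20 limit.
    """
    if sample_age_years > 1_000_000:
        return 500 * sample_age_years   # samples older than 1 Mya: ratio up to 1:500
    return 20 * sample_age_years        # samples younger than 1 Mya: ratio up to 1:20

for age in (100_000, 500_000, 2_000_000):
    print(age, max_root_age(age))
```

For a 500 kya sample this gives a maximum root age of 10 Mya, the example used in the text.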

Another important consideration is the amount of rate variation among ingroup lineages. In the analysis of the bacterial and archaeal data sets, we observed that modern samples occasionally have depressed rates. Thus it is important to run a control with only the modern samples to make sure that the gene mutation rate is stable within the timeframe and data set of the modern samples being used and identify any modern samples with depressed rates. Modern sequences with depressed rates should either be removed from the sample set as outliers or should be recorded and kept in the sample set if there are not many known modern sequences related to the sample. If there are a large number of modern samples with depressed rates, attempts should be made to narrow the taxonomic class of the modern sequences as more closely related species generally have less diverse rates. Lastly, it might be advisable to analyze a different gene or locus for the taxonomic group at hand as mutation rate varies by gene.

As most published sequence data from gaDNA are from the slowly mutating 16S rRNA gene, the DRT was only sensitive enough to test samples older than several million years. For these very old gaDNA samples (over 5 Mya), we find no evidence of a depressed rate in any of the samples tested, despite conditions where the detection power is strong, indicating that they are not truly ancient samples. Even if the halophiles were trapped with algal cells as a food source or reverted to a dormant state with low metabolic activity, they would be either growing very slowly or in a state where the cells are actively repairing their DNA but not growing. This reduced rate would mean that the cells would have fewer mutations over time, and therefore we would still expect to see depressed rates with the DRT.

More likely, these gaDNA sequences derive from contamination. The samples could have been contaminated by modern microorganisms during processing. Many of the gaDNA claims over 1 Mya do not follow the published aDNA guidelines, most especially in that several studies have used positive controls. Unsurprisingly, some of the million-year-old sequences are identical to the positive control, which is more likely due to modern contamination than convergent evolution. However, the contamination does not have to be modern contamination from the sampling and extraction process. The contaminating cells could be old, from a contamination event that happened thousands of years ago. This would still account for the small size of the observed cells, as they would have been trapped for an extended period of time, but is more consistent with a timeframe where aDNA can be retrieved.

Although the 16S rRNA gene is a powerful locus for genus identification, its slow mutation rate limits the ability to detect differences in rate between modern and ancient samples to samples older than 10 Mya. Therefore, the published data for many gaDNA claims, which are limited to 16S, cannot be tested by this method. However, 16S is not the only useful gene that is stable over the periods of time needed to use this test. Other genes must have relatively stable rates over the last 10 million years, be relatively free from horizontal transfer, and have 50-100 modern samples spanning the bacterial or archaeal family sequences. Several other genes fit the criteria for authenticating sequences less than 10 million years old, although they may or may not be useful depending upon the amount of available sequence data from closely related taxa and the availability of primers that are conserved at least at the family level. Examples of these genes are the photolyase genes (Lucas-Lledó and Lynch, 2009), histidine biosynthesis gene HisH, leucine transport gene brnQ, modular polyketide synthases: ketosynthase (KS) and acyltransferase (AT) (Wang et al., 2010), nitrogen fixation gene nifH, and other bacterial core genes (Gil et al., 2004).

In summary, it is important to consider the results from analysis of gaDNA in light of findings from the aDNA community. First, aDNA guidelines must be followed, particularly with regard to progression up the contamination gradient, amplification of small overlapping fragments, and replication in independent laboratories (Gilbert et al., 2005). Second, it is important to target multiple genes for analysis, as the 16S rRNA gene alone is not sufficient to authenticate gaDNA. Finally, following DNA sequencing, it is equally important to perform careful analyses such as identifying common laboratory bacterial contaminants and strains used as controls, analyzing the sequence for damage, creating trees to see the placement of the ancient sequence within modern sequence diversity, and applying the DRT, which will allow more secure authentication of geologically ancient samples.

References

Agresti, A., and Coull, B.A. (1998). Approximate is Better than “Exact” for Interval Estimation of Binomial Proportions. Am. Stat. 52, 119–126.
Bellemain, E., Davey, M.L., Kauserud, H., Epp, L.S., Boessenkool, S., Coissac, E., Geml, J., Edwards, M., Willerslev, E., Gussarova, G., et al. (2013). Fungal palaeodiversity revealed using high-throughput metabarcoding of ancient DNA from arctic permafrost. Environ. Microbiol. 15, 1176–1189.
Briggs, A.W., Stenzel, U., Johnson, P.L.F., Green, R.E., Kelso, J., Prüfer, K., Meyer, M., Krause, J., Ronan, M.T., Lachmann, M., et al. (2007). Patterns of damage in genomic DNA sequences from a Neandertal. Proc. Natl. Acad. Sci. 104, 14616–14621.
Brotherton, P., Endicott, P., Sanchez, J.J., Beaumont, M., Barnett, R., Austin, J., and Cooper, A. (2007). Novel high-resolution characterization of ancient DNA reveals C > U-type base modification events as the sole cause of post mortem miscoding lesions. Nucleic Acids Res. 35, 5717–5728.
Coolen, M.J.L., and Overmann, J. (2007). 217 000-year-old DNA sequences of green sulfur bacteria in Mediterranean sapropels and their implications for the reconstruction of the paleoenvironment. Environ. Microbiol. 9, 238–249.
Cooper, A., and Poinar, H.N. (2000). Ancient DNA: do it right or not at all. Science 289, 1139.
D’Hondt, S., Spivack, A.J., Pockalny, R., Ferdelman, T.G., Fischer, J.P., Kallmeyer, J., Abrams, L.J., Smith, D.C., Graham, D., and Hasiuk, F. (2009). Subseafloor sedimentary life in the South Pacific Gyre. Proc. Natl. Acad. Sci. 106, 11651–11656.
Drummond, A.J., Ho, S.Y.W., Phillips, M.J., and Rambaut, A. (2006). Relaxed Phylogenetics and Dating with Confidence. PLoS Biol 4, e88.
Drummond, A.J., Suchard, M.A., Xie, D., and Rambaut, A. (2012). Bayesian phylogenetics with BEAUti and the BEAST 1.7. Mol. Biol. Evol.
Duc, M.T.L., Dekas, A., Osman, S., Moissl, C., Newcombe, D., and Venkateswaran, K. (2007). Isolation and Characterization of Bacteria Capable of Tolerating the Extreme Conditions of Clean Room Environments. Appl. Environ. Microbiol. 73, 2600–2611.
Fendrihan, S., Dornmayr-Pfaffenhuemer, M., Gerbl, F.W., Holzinger, A., Grösbacher, M., Briza, P., Erler, A., Gruber, C., Plätzer, K., and Stan-Lotter, H. (2012). Spherical particles of halophilic archaea correlate with exposure to low water activity – implications for microbial survival in fluid inclusions of ancient halite. Geobiology 10, 424–433.
Fish, S.A., Shepherd, T.J., McGenity, T.J., and Grant, W.D. (2002). Recovery of 16S ribosomal RNA gene fragments from ancient halite. Nature 417, 432–436.
Foster, J.T., Beckstrom-Sternberg, S.M., Pearson, T., Beckstrom-Sternberg, J.S., Chain, P.S.G., Roberto, F.F., Hnath, J., Brettin, T., and Keim, P. (2009). Whole-Genome-Based Phylogeny and Divergence of the Genus Brucella. J. Bacteriol. 191, 2864–2870.
Gernhard, T. (2008). The conditioned reconstructed process. J. Theor. Biol. 253, 769–778.
Gil, R., Silva, F.J., Peretó, J., and Moya, A. (2004). Determination of the Core of a Minimal Bacterial Gene Set. Microbiol. Mol. Biol. Rev. 68, 518–537.
Gilbert, M.T.P., Hansen, A.J., Willerslev, E., Rudbeck, L., Barnes, I., Lynnerup, N., and Cooper, A. (2003). Characterization of Genetic Miscoding Lesions Caused by Postmortem Damage. Am. J. Hum. Genet. 72, 48–61.
Gilbert, M.T.P., Bandelt, H.-J., Hofreiter, M., and Barnes, I. (2005). Assessing ancient DNA studies. Trends Ecol. Evol. 20, 541–544.
Gilbert, M.T.P., Binladen, J., Miller, W., Wiuf, C., Willerslev, E., Poinar, H., Carlson, J.E., Leebens-Mack, J.H., and Schuster, S.C. (2007). Recharacterization of ancient DNA miscoding lesions: insights in the era of sequencing-by-synthesis. Nucleic Acids Res. 35, 1–10.
Gramain, A., Díaz, G.C., Demergasso, C., Lowenstein, T.K., and McGenity, T.J. (2011). Archaeal diversity along a subterranean salt core from the Salar Grande (Chile). Environ. Microbiol. 13, 2105–2121.
Graur, D., and Pupko, T. (2001). The Permian Bacterium that Isn’t. Mol. Biol. Evol. 18, 1143–1146.
Hansen, A.J., Mitchell, D.L., Wiuf, C., Paniker, L., Brand, T.B., Binladen, J., Gilichinsky, D.A., Rønn, R., and Willerslev, E. (2006). Crosslinks Rather Than Strand Breaks Determine Access to Ancient DNA Sequences From Frozen Sediments. Genetics 173, 1175–1179.
Hebsgaard, M.B., Phillips, M.J., and Willerslev, E. (2005). Geologically ancient DNA: fact or artefact? Trends Microbiol. 13, 212–220.
Heyn, P., Stenzel, U., Briggs, A.W., Kircher, M., Hofreiter, M., and Meyer, M. (2010). Road blocks on paleogenomes—polymerase extension profiling reveals the frequency of blocking lesions in ancient DNA. Nucleic Acids Res. 38, e161–e161.
Ho, S.Y.W., Heupink, T.H., Rambaut, A., and Shapiro, B. (2007). Bayesian Estimation of Sequence Damage in Ancient DNA. Mol. Biol. Evol. 24, 1416–1422.
Hoehler, T.M., and Jørgensen, B.B. (2013). Microbial life under extreme energy limitation. Nat. Rev. Microbiol. 11, 83–94.
Hofreiter, M., Jaenicke, V., Serre, D., Haeseler, A. von, and Pääbo, S. (2001). DNA sequences from multiple amplifications reveal artifacts induced by cytosine deamination in ancient DNA. Nucleic Acids Res. 29, 4793–4799.
Johnson, S.S., Hebsgaard, M.B., Christensen, T.R., Mastepanov, M., Nielsen, R., Munch, K., Brand, T., Gilbert, M.T.P., Zuber, M.T., Bunce, M., et al. (2007). Ancient bacteria show evidence of DNA repair. Proc. Natl. Acad. Sci. 104, 14401–14405.
Jørgensen, B.B. (2011). Deep subseafloor microbial cells on physiological standby. Proc. Natl. Acad. Sci. 108, 18193–18194.
Jørgensen, B.B., and Boetius, A. (2007). Feast and famine—microbial life in the deep-sea bed. Nat. Rev. Microbiol. 5, 770–781.
Katoh, K., and Standley, D.M. (2013). MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability. Mol. Biol. Evol. 30, 772–780.
Katoh, K., Misawa, K., Kuma, K., and Miyata, T. (2002). MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res. 30, 3059–3066.
Kuo, C.-H., and Ochman, H. (2009). Inferring clocks when lacking rocks: the variable rates of molecular evolution in bacteria. Biol. Direct 4, 35.
Lambert, D.M., Ritchie, P.A., Millar, C.D., Holland, B., Drummond, A.J., and Baroni, C. (2002). Rates of Evolution in Ancient DNA from Adélie Penguins. Science 295, 2270–2273.
Lenski, R.E., Winkworth, C.L., and Riley, M.A. (2003). Rates of DNA Sequence Evolution in Experimental Populations of Escherichia coli During 20,000 Generations. J. Mol. Evol. 56, 498–508.
Lewis, K., Epstein, S., Godoy, V.G., and Hong, S.-H. (2008). Intact DNA in ancient permafrost. Trends Microbiol. 16, 92–94.
Lindahl, T. (1993). Instability and decay of the primary structure of DNA. Nature 362, 709–715.
Lomstein, B.A., Langerhuus, A.T., D’Hondt, S., Jørgensen, B.B., and Spivack, A.J. (2012). Endospore abundance, microbial growth and necromass turnover in deep sub-seafloor sediment. Nature 484, 101–104.
Lucas-Lledó, J.I., and Lynch, M. (2009). Evolution of Mutation Rates: Phylogenomic Analysis of the Photolyase/Cryptochrome Family. Mol. Biol. Evol. 26, 1143–1153.
Maughan, H. (2007). Rates of molecular evolution in bacteria are relatively constant despite spore dormancy. Evol. Int. J. Org. Evol. 61, 280–288.
Miteva, V.I., and Brenchley, J.E. (2005). Detection and isolation of ultrasmall microorganisms from a 120,000-year-old Greenland glacier ice core. Appl. Environ. Microbiol. 71, 7806–7818.
Moissl, C., Osman, S., La Duc, M.T., Dekas, A., Brodie, E., DeSantis, T., and Venkateswaran, K. (2007). Molecular bacterial community analysis of clean rooms where spacecraft are assembled. FEMS Microbiol. Ecol. 61, 509–521.
Nabholz, B., Glémin, S., and Galtier, N. (2008). Strong Variations of Mitochondrial Mutation Rate across Mammals—the Longevity Hypothesis. Mol. Biol. Evol. 25, 120–130.

Ochman, H., Elwyn, S., and Moran, N.A. (1999). Calibrating bacterial evolution. Proc. Natl. Acad. Sci. U. S. A. 96, 12638–12643.

Orlando, L., Ginolhac, A., Zhang, G., Froese, D., Albrechtsen, A., Stiller, M., Schubert, M., Cappellini, E., Petersen, B., Moltke, I., et al. (2013). Recalibrating Equus evolution using the genome sequence of an early Middle Pleistocene horse. Nature 499, 74–78.

Panieri, G., Lugli, S., Manzi, V., Roveri, M., Schreiber, B.C., and Palinska, K.A. (2010). Ribosomal RNA gene fragments from fossilized cyanobacteria identified in primary gypsum from the late Miocene, Italy. Geobiology 8, 101–111.

Park, J.S., Vreeland, R.H., Cho, B.C., Lowenstein, T.K., Timofeeff, M.N., and Rosenzweig, W.D. (2009). Haloarchaeal diversity in 23, 121 and 419 MYA salts. Geobiology 7, 515–523.

Rambaut, A. (2007). FigTree v.1.4. http://tree.bio.ed.ac.uk/software/figtree/

Rambaut, A., and Drummond, A.J. (2007). Tracer v1.4. http://tree.bio.ed.ac.uk/software/tracer/

Rambaut, A., and Grassly, N.C. (1997). Seq-Gen: an application for the Monte Carlo simulation of DNA sequence evolution along phylogenetic trees. Comput. Appl. Biosci. CABIOS 13, 235–238.

Røy, H., Kallmeyer, J., Adhikari, R.R., Pockalny, R., Jørgensen, B.B., and D’Hondt, S. (2012). Aerobic microbial respiration in 86-million-year-old deep-sea red clay. Science 336, 922–925.

Satterfield, C.L., Lowenstein, T.K., Vreeland, R.H., Rosenzweig, W.D., and Powers, D.W. (2005). New evidence for 250 Ma age of halotolerant bacterium from a Permian salt crystal. Geology 33, 265–268.

Schubert, B.A., Lowenstein, T.K., Timofeeff, M.N., and Parker, M.A. (2009). How do prokaryotes survive in fluid inclusions in halite for 30 k.y.? Geology 37, 1059–1062.

Schubert, B.A., Lowenstein, T.K., Timofeeff, M.N., and Parker, M.A. (2010). Halophilic Archaea cultured from ancient halite, Death Valley, California. Environ. Microbiol. 12, 440–454.

Stiller, M., Green, R.E., Ronan, M., Simons, J.F., Du, L., He, W., Egholm, M., Rothberg, J.M., Keates, S.G., Ovodov, N.D., et al. (2006). Patterns of nucleotide misincorporations during enzymatic amplification and direct large-scale sequencing of ancient DNA. Proc. Natl. Acad. Sci. 103, 13578–13584.

Vreeland, R.H., Rosenzweig, W.D., and Powers, D.W. (2000). Isolation of a 250 million-year-old halotolerant bacterium from a primary salt crystal. Nature 407, 897–900.

Vreeland, R.H., Jones, J., Monson, A., Rosenzweig, W.D., Lowenstein, T.K., Timofeeff, M., Satterfield, C., Cho, B.C., Park, J.S., Wallace, A., et al. (2007). Isolation of Live Cretaceous (121–112 Million Years Old) Halophilic Archaea from Primary Salt Crystals. Geomicrobiol. J. 24, 275–282.

Wang, H., Liu, N., and Huang, Y. (2010). [Phylogenomic analysis of modular polyketide synthases in actinomycetes and its application in product screening]. Wei Sheng Wu Xue Bao 50, 1293–1304.


Wang, H.L., Yan, Z.Y., and Jin, D.Y. (1997). Reanalysis of published DNA sequence amplified from cretaceous dinosaur egg fossil. Mol. Biol. Evol. 14, 589–591.

Willerslev, E., and Cooper, A. (2005). Review Paper. Ancient DNA. Proc. R. Soc. B Biol. Sci. 272, 3–16.

Willerslev, E., and Hebsgaard, M.B. (2005). New evidence for 250 Ma age of halotolerant bacterium from a Permian salt crystal: Comment and reply COMMENT. Geology 33, e93–e93.

Willerslev, E., Hansen, A.J., Binladen, J., Brand, T.B., Gilbert, M.T.P., Shapiro, B., Bunce, M., Wiuf, C., Gilichinsky, D.A., and Cooper, A. (2003). Diverse Plant and Animal Genetic Records from Holocene and Pleistocene Sediments. Science 300, 791–795.

Willerslev, E., Hansen, A.J., and Poinar, H.N. (2004). Isolation of nucleic acids and cultures from fossil ice and permafrost. Trends Ecol. Evol. 19, 141–147.

Willerslev, E., Cappellini, E., Boomsma, W., Nielsen, R., Hebsgaard, M.B., Brand, T.B., Hofreiter, M., Bunce, M., Poinar, H.N., Dahl-Jensen, D., et al. (2007). Ancient Biomolecules from Deep Ice Cores Reveal a Forested Southern Greenland. Science 317, 111–114.

Woodward, S.R., Weyand, N.J., and Bunnell, M. (1994). DNA sequence from Cretaceous period bone fragments. Science 266, 1229–1232.

Young, D.L., Huyen, Y., and Allard, M.W. (1995). Testing the Validity of the Cytochrome B Sequence from Cretaceous Period Bone Fragments as Dinosaur DNA. Cladistics 11, 199–209.

Zischler, H., Hoss, M., Handt, O., Haeseler, A. von, Kuyl, A. van der, and Goudsmit, J. (1995). Detecting dinosaur DNA. Science 268, 1192–1193.


Chapter 3: Biosignatures of thermophilic microorganisms from a temporary hydrothermal system 34.5 million years ago in the Chesapeake Bay Impact Structure

Abstract: Impact events shape the geology and biology of the Earth by transporting volatile and organic compounds to Earth and delivering a localized pulse of energy to the lithosphere. This energy often initially sterilizes the area and then creates a temporary hydrothermal system, often with a new influx of nutrients. This hydrothermal system can quickly be colonized by thermophilic bacteria and/or archaea, which ultimately become trapped when the hydrothermal system cools and sedimentary layers accumulate. A major question is whether DNA biosignatures of the thermophiles are recoverable after thousands or millions of years. The Chesapeake Bay Impact Structure (CBIS) is a well-studied site of one such system, formed by an impact event 34.5 million years ago. DNA was extracted from three subsamples of a two-kilometer core into the CBIS, including one sample from the crater where the hydrothermal system occurred and two samples from the sedimentary layers above the crater. Metagenomes from the samples share some resemblance to other deep subsurface samples. Metabolisms present include sulfate reduction, fermentation, and autotrophy. An increase in thermophilic bacteria and archaea was found in the sample from the crater compared to the sedimentary layers.


Introduction

Impact craters and biomarkers

Asteroids and comets shape both the geology and biology of the Earth. These impactors can, for example, instigate mass extinction events (Alvarez et al., 1980; Schulte et al., 2010; Toon et al., 1997), transport volatile and organic compounds to Earth (Albarède, 2009), and deliver a localized pulse of energy to the lithosphere, which fractures and displaces rocks and creates a temporary hydrothermal system (Kieffer, 1971; Kirsimäe et al., 2002; Osinski et al., 2001). The thermal excursion caused by kinetic energy and disruptive shock waves can sterilize the area near the impact of microbial life (Kring, 1997). However, impact events can have long-term benefits for deep subsurface microbial communities, including an increase in the permeability and porosity of target materials, the mixing of lithologies, the formation of a hydrothermal system, and the introduction of new nutrients into the crater cavity. These effects can create new habitats, increase the flow of nutrients and redox couples, and improve water availability (Cockell et al., 2002, 2012). New opportunities for microbial life emerge once the hydrothermal system cools below 121°C, the upper temperature limit for life (Kashefi and Lovley, 2003). Microbial communities have been shown to colonize these hydrothermal systems in fewer than 10,000 years following the impact event (Lindgren et al., 2010; Parnell et al., 2004, 2010). Impact craters are therefore important places for life on early Earth and Mars and are targets for current and proposed scientific missions to Mars (Parnell et al., 2004; Rathbun and Squyres, 2002; Schwenzer and Kring, 2009; Squyres et al., 2012).

Biomarkers are often used to identify and characterize life in these extreme environments. DNA is one of the most informative biomarkers and can be found in living or dormant cells, in spores, or as small fragments in the environment. Until recently, specific DNA sequences had to be targeted for analysis. The advent of next-generation sequencing methods has allowed metagenomics, i.e., sequencing and analysis of all the DNA in a sample, overcoming the need for specific targets and many of the biases associated with those analyses. These DNA metagenomes, isolated directly from environmental samples, can be used to explore microbial community composition, understand metabolic pathways found in the community, and determine the evolutionary history of community members (Gilbert and Dupont, 2011; Handelsman, 2004; Temperton and Giovannoni, 2012). While useful, DNA is also one of the more difficult biomarkers to recover. Biomolecules such as hydrocarbon lipids can be preserved in ancient sediments on billion-year time scales (reviewed in Brocks and Pearson, 2005; Sachse et al., 2012; Schouten et al., 2013). In contrast, DNA degrades very quickly outside the protective environment of a cell, with the theoretical limit for PCR-amplifiable DNA estimated at between 400 thousand and 1.5 million years (Allentoft et al., 2012; Lindahl, 1993). Beyond that point, the DNA is either severely crosslinked, completely degraded, or otherwise non-detectable (Hansen et al., 2006; Willerslev et al., 2004). However, DNA signatures from past events could still be present in the extant microbial community for thousands of years (Bulat et al., 2004; Inagaki et al., 2005). Here, we use the latest ancient DNA technologies to recover DNA from several sediment layers isolated from a soil core taken from a 34.5 million year old impact crater in the Chesapeake Bay, eastern North America. Our goal is to recover DNA biomarkers of a hydrothermal system at this impact site.

Chesapeake Bay impact structure

The Chesapeake Bay Impact Structure (CBIS) is a buried, 85-90 km diameter impact structure formed in the late Eocene (34.5 million years ago) when a 2-3 km diameter impactor hit a shallow marine environment (water depth less than 350 m). The marine environment was underlain by a few hundred meters of lower Tertiary and Cretaceous sediments above Neoproterozoic and Paleozoic crystalline basement rocks (Figure 3-1) (Johnson et al., 1998; Koeberl et al., 1996; Poag et al., 1994; Powars and Bruce, 1999). The resulting impact structure is one of the world’s largest and best preserved marine impact features. The sequence of events during and after the impact has been reconstructed and includes the formation of a crater cavity, the deposition of impact breccias that include suevites and airfall deposits, the displacement of a large granite megablock from the edge of the crater, the deposition of avalanched sediments and ocean-resurge (tsunami) deposits formed during the impact, a temporary hydrothermal system lasting at least 10,000 years, and the subsequent deposition of marine sediments (Collins and Wünnemann, 2005; Kenkmann et al., 2009; Kulpecz et al., 2009; Malinconico et al., 2009).

Figure 3-1: Location of the drill site within the Chesapeake Bay Impact Structure. The location is also shown in relation to the eastern USA. Image from Cockell et al. (2012).

Within the crater, thermal modeling shows that temperatures reached over 390°C in the hot rock layer, with an upward, vertical flow of cooling suevitic fluids and deeper basement brines (120°C) through the sediment breccias. This would have been sufficient to sterilize the sediment breccias within 450 m above the suevite. In addition, the basement brines would have replaced any marine water trapped in the crater, creating the saline brine that currently fills the crater (Malinconico et al., 2009). Due to the regional hydraulic gradients and the hydraulic conductivities of the CBIS materials, the rate of groundwater flushing is slow enough that much of the original brine that entered the crater after the impact is still in the crater today (Sanford, 2003). The pore water chemistry shows an environment capable of supporting microbial growth, containing ample carbon (both organic and inorganic), electron acceptors (especially sulfate), and bioavailable iron (Sanford et al., 2009).

In 2005 and 2006 the International Continental Scientific Drilling Program (ICDP) and the U.S. Geological Survey (USGS) drilled three cores into the CBIS down to a composite depth of 1.76 km. These cores span from impact-altered basement rock through the impact deposits and through the 444 m of post-impact Eocene to Pleistocene sediments (Figure 3-2) (Gohn et al., 2006). Biological samples were taken from the core under strict decontamination conditions for cell counting, culturing, and DNA extraction. Cell counts and enrichment cultures indicate the presence of viable microorganisms throughout the core (Cockell et al., 2009; Finster et al., 2009). Biological samples from the top 140 m yielded data on the microbial community composition, and functional genes for carbon fixation and sulfate reduction were detected in these layers (Breuker et al., 2011).

In the present study, metagenomic data from the CBIS post-impact sedimentary layers (150 m and 240 m) and impact layers (840 m) were analyzed to assess the community composition and metabolic potential of the microbial communities. We search for evidence of key genes involved in sulfate reduction, methanotrophy, methanogenesis, and carbon fixation to gain insight into community physiology and biogeochemical cycling. After establishing distinct deep biosphere communities, we then look for the presence of thermophiles in the impact layers to see if we can detect biomarkers specific to the hydrothermal community from the time following the impact. These metagenomes will provide an opportunity to 1) observe deep sea sediment ecosystems, 2) examine a community isolated for millions of years, and 3) improve the understanding of the colonization of impact craters through identification of biomarkers.

Figure 3-2: Core depth in relation to composition of the core, impact processes from Gohn et al. (2008), and cell counts from Cockell et al. (2009). Locations of the three samples in this study are indicated with blue arrows on the far left.


Materials and Methods

Coring and contamination control

Three coreholes were drilled into the CBIS in 2005 and 2006 by the International Continental Scientific Drilling Program (ICDP) and the U.S. Geological Survey (USGS) to a composite depth of 1.76 km at Eyreville Farm, in Northampton County, Virginia, USA. The drill site is located ~9 km from the structure’s center, in the central crater (Gohn et al., 2006). Subsectioning of the core was performed for microbiological analysis using autoclaved equipment in a laminar flow hood before the samples were washed for geological analysis (Table 3-1). Due to the high risk of contamination, four methods were used for contamination monitoring. First, the drill and drilling fluid contained fluorescent microspheres, which were used to mimic the ability of contaminant cells to enter the core through fractures during coring. Second, a chemical tracer (Halon 1211) was infused into the drilling mud to monitor the presence of drilling mud in the cores. Subsamples were tested for the presence of microspheres (by epifluorescence microscopy) and Halon 1211 (by gas chromatography) at the US Geological Survey in Reston, Virginia. Third, pore water and drilling mud samples were collected and examined by excitation-emission matrix fluorescence spectroscopy to characterize the dissolved organic carbon (DOC) and identify potential contamination. Finally, microbial contaminants in the drilling fluid were identified through 16S rRNA clone libraries (Gronstal et al., 2009). The subsections were then stored at the USGS in Reston at -80°C until they were shipped to the ancient DNA facility at The Pennsylvania State University, where they were stored at -20°C until DNA extraction.

Sample       Depth (m)   Location            Date Collected
Biocore 2    159.3       Sedimentary Layer   16 Sept. 2005
Biocore 3    241.5       Sedimentary Layer   17 Sept. 2005
Biocore 29   842.1       Impact Structure    17 Oct. 2005

Table 3-1: Sample depth, location in the core, and date collected for the samples analyzed.

DNA extraction

Sediment samples were taken from the subcores in a sterile laboratory in which no PCR amplification had taken place. Subsampling was performed in a laminar flow hood while wearing protective clothing, including face masks and double layers of gloves, to prevent contamination of the samples with modern human DNA and bacterial DNA from organisms on human skin. All equipment and surfaces were washed with bleach and ethanol prior to use, and all equipment and solutions used in the extractions were UV irradiated in the laminar flow hood for 1 hour prior to use. Samples from the core subsamples were taken using a sterile scalpel, placed into 1.5 ml or 50 ml tubes, and weighed before proceeding to DNA extraction. Negative controls (extractions performed without sediment samples) were included at a proportion of 1:3 for all extractions.

Nine protocols for DNA extraction were used for each layer of the core that was processed. Six different commercial kits were used: 1) PowerSoil Kit from MoBio using 0.5 g of sediment, 2) PowerLyzer Kit from MoBio using 0.5 g of sediment, 3) FastPrep Kit using 0.5 g of sediment, 4) Qiagen Stool Kit using 0.5 g of sediment, 5) RNA Soil Kit from MoBio using 2 g of sediment, and 6) PowerMax Soil Kit from MoBio using 5 g of sediment with the following alterations. In the lysis step, solution C1 and 5 ml of phenol:chloroform:isoamyl alcohol were added before bead beating on a vortex for 5 min. After centrifugation, the aqueous phase was removed and 4 ml of C2 solution and 4 ml of C3 solution were added. The solution was then mixed, incubated, and pelleted according to the protocol. Instead of C4 solution, binding salt CB3 from the PowerWater Kit was used. After the final step in the protocol, a carrier (glycol blue from TaKaRa) was added, followed by an ethanol precipitation. For all the kits, the final elution or resuspension was in 50 µl of TE with 0.05% Tween (TET).

The other three protocols are described below. 1) 2 g of sediment were extracted using the ancient DNA silica bone protocol (Rohland and Hofreiter, 2007). 2) A bead-beating protocol modified from Carrigg et al. (2007) was used on both 0.25-0.5 g and 5 g of sediment. The details provided are for 0.25-0.5 g of sediment. For the lysis phase, sediment, 0.5 ml of Griffiths’ CTAB solution (1 volume 10% CTAB in 0.7 M NaCl and 1 volume 240 mM potassium phosphate buffer, pH=8.0), 0.5 ml of phenol-chloroform-isoamyl alcohol (25 parts phenol, 24 parts chloroform, 1 part isoamyl alcohol, pH=8.0), and 250 mg of zirconia/silica beads (equal weight 1.0, 0.5, and 1.0 mm diameter beads) were added to a 2 ml screw cap microcentrifuge tube. The tube was subjected to bead beating on a vortex for 30 sec and centrifuged at 10,000 x g for 5 min. The aqueous phase was transferred to a new 2 ml screw cap microcentrifuge tube, and an equal volume of chloroform-isoamyl alcohol was added, mixed by inversion, and separated by centrifugation at 10,000 x g for 5 min. The aqueous phase was transferred to a new microcentrifuge tube. 0.6 volume of isopropanol was added to the aqueous phase, and the samples were incubated overnight at 4°C.


DNA was pelleted by centrifugation at 16,000 x g for 20 min, and the pellet was then washed with ice-cold 70% ethanol, centrifuged at 16,000 x g for 5 min at 4°C, and allowed to dry in the laminar flow hood. The resulting pellet was resuspended in 50 µl of TET.

3) A chemical and bead beating protocol modified from Sheu et al. (2008) and Zhou et al. (1996) was used on 0.25-0.5 g and 5 g of sediment. Details are provided for 0.25-0.5 g of sediment. For the lysis phase, sediment, 270 µl of Extraction Buffer (100 mM Tris-HCl, pH=8.0, 100 mM EDTA, pH=8.0, 100 mM sodium phosphate, pH=8.0, 1.5 M NaCl, and 1% CTAB), 1 µl of Proteinase K (20 mg/ml), and 250 mg of zirconia/silica beads (as above) were added to a 2 ml screw cap microcentrifuge tube. The solution was incubated at room temperature for 30 min on a rotator. After the incubation, 30 µl of 20% SDS was added to the tube, followed by bead beating on a vortex for 1 min. The solution was then incubated for 2 hours at 65°C with gentle inversions every 15-20 min and centrifuged at 6,000 x g for 10 min, after which the supernatant was collected in a new microcentrifuge tube. To the remaining pellet, 90 µl of Extraction Buffer and 10 µl of 20% SDS were added, followed by bead beating on a vortex for 30 sec, incubation at 65°C for 10 min, and centrifugation at 6,000 x g for 10 min. The resulting supernatant was combined with the one previously obtained, and an equal volume of phenol-chloroform-isoamyl alcohol was added. After mixing for 2 min and centrifugation at 10,000 x g for 5 min, the aqueous phase was transferred to a new screw cap microcentrifuge tube. An equal volume of chloroform-isoamyl alcohol was added, followed by mixing, centrifugation, and transfer of the aqueous phase to a new microcentrifuge tube. 0.6 volume of isopropanol was added to the aqueous phase, and the samples were incubated overnight at 4°C. DNA was pelleted by centrifugation at 16,000 x g for 20 min, and the pellet was then washed with ice-cold 70% ethanol, centrifuged at 16,000 x g for 5 min at 4°C, and allowed to dry in the laminar flow hood. The resulting pellet was resuspended in 50 µl of TET. Extracts from all methods were stored at -20°C until amplification or library preparation.

PCR amplification, library preparation and sequencing

To test each extraction for the presence of DNA suitable for PCR amplification, amplification of the 16S rRNA gene from bacteria and archaea was attempted using four different PCR primer sets that amplified fragments of the 16S rRNA gene between 100 bp and 1.4 kb in length (Table 3-2). PCR amplifications were performed in 25 µl reactions in duplicate using 1 µl of extracted DNA or 1 µl of a 1:10 dilution of the extract. Final reagent concentrations were as follows: 1.25 U Platinum Taq Hi-Fidelity and 1X buffer (Invitrogen Ltd., UK), 2 mg/ml rabbit serum albumin (RSA; Sigma, fraction V), 2 mM MgSO4, 250 µM of each dNTP, and 1 µM of each primer. PCR thermal cycling consisted of the following steps: 94°C for 10 min, followed by 35 cycles of denaturation at 94°C for 25-30 sec, annealing for 45 sec at the optimal Tm for each primer set (Table 3-2), and extension at 68°C for 1-2 min depending on fragment size. Negative extraction controls as well as negative PCR controls (no extract) were amplified with each primer set.

Forward Primer   Forward Sequence       Reverse Primer   Reverse Sequence       Frag. Length   Annealing (°C)
UniF(a)          AAACTYAAAKGAATTGRCGG   UniR             ACGGGCGGTGTGTRC        500 bp         59
E786F(a)         GATTAGATACCCTGGTAG     U926             CCGTCAATTCCTTTRAGTTT   140 bp         56
957(a)           AATTGGAKTCAACGCCGGR    1057             GGCCATGCACCWCCTCTC     100 bp         58
BactF1           GGTGSTGCAYGGYTGTCGT    BactR1           AGGCCCGRGAACGTATTCAC   342 bp         63
27F              AGAGTTTGATCCTGGCTCAG   1492R            GGTTACCTTGTTACGACTT    1.4 kb         50

Table 3-2: PCR primers including fragment length and annealing temperature in degrees Celsius. a) Primers from Baker et al. (2003).
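The final reagent concentrations given above for the 25 µl reactions can be turned into pipetting volumes with the dilution relation C1V1 = C2V2. The following is a minimal sketch only: the final concentrations are those stated in the text, but every stock concentration (10X buffer, 50 mM MgSO4, 10 mM each dNTP, 10 µM primers, 20 mg/ml RSA, 5 U/µl polymerase) is an assumption made for illustration and is not reported in this chapter.

```python
from typing import Dict

# name: (final concentration, ASSUMED stock concentration), same unit per pair.
# Final values are taken from the text; all stock values are hypothetical.
COMPONENTS = {
    "buffer (X)": (1.0, 10.0),
    "MgSO4 (mM)": (2.0, 50.0),
    "each dNTP (uM)": (250.0, 10_000.0),
    "forward primer (uM)": (1.0, 10.0),
    "reverse primer (uM)": (1.0, 10.0),
    "RSA (mg/ml)": (2.0, 20.0),
}
TAQ_UNITS = 1.25          # units per reaction, from the text
TAQ_STOCK_U_PER_UL = 5.0  # assumed stock activity


def reaction_volumes(reaction_ul: float = 25.0,
                     template_ul: float = 1.0) -> Dict[str, float]:
    """Return the volume (in microliters) of each component for one reaction."""
    # C1 * V1 = C2 * V2  ->  V_stock = V_reaction * C_final / C_stock
    volumes = {
        name: reaction_ul * final / stock
        for name, (final, stock) in COMPONENTS.items()
    }
    volumes["polymerase"] = TAQ_UNITS / TAQ_STOCK_U_PER_UL
    volumes["template"] = template_ul
    # Water makes up the remainder of the reaction volume.
    volumes["water"] = reaction_ul - sum(volumes.values())
    return volumes


if __name__ == "__main__":
    for name, vol in reaction_volumes().items():
        print(f"{name:22s} {vol:7.3f} ul")
```

With different stock concentrations only the values in COMPONENTS need to change; the water volume is always computed as the remainder.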

A subset of the samples was selected for additional metagenomic analysis using a shotgun sequencing approach on the Illumina platform. Eight genomic libraries were made from 6 sediment DNA extracts (3 extracts from the PowerMax Soil Kit and 3 extracts from the chemical and bead beating protocol) and the corresponding 2 negative control extractions. 15 µl of each extract was used for Illumina library generation according to the protocol of Meyer and Kircher (2010) with two alterations. First, after the first blunt-ending step, the purification was done with a QIAquick Nucleotide Removal Kit (Qiagen) to ensure the retention of very short DNA fragments. Second, two rounds of indexing PCR were run with 20 cycles each in order to ensure amplification of the low-biomass samples. After the indexing PCR, 5 µl of each library was run on an agarose gel to detect the presence of DNA. Samples from which libraries were successfully prepared, as determined by the presence of DNA on the agarose gel, were then pooled and sequenced at the QB3 sequencing facility at The University of California, Berkeley on a HiSeq2000 Illumina platform with 100 bp paired-end chemistry.


Data processing and analysis

Forward and reverse Illumina read pairs were first merged using the MergeReadsFastQ_cc script according to Kircher (2012). Unassembled reads were then uploaded to the MG-RAST web-based analysis package V3.3 (Meyer et al., 2008) for processing and sequence identification. The quality control options were set to 1) use DynamicTrim to remove reads that had more than 5 bases with Phred quality scores lower than 15 (Cox et al., 2010), 2) screen for and remove human sequences using DNA-level matching with bowtie (Langmead et al., 2009), and 3) remove reads that were artificial duplicates created by the sequencing process (identified as reads which started at the same base and which had exactly the same sequence for the first 50 bases) (Gomez-Alvarez et al., 2009). The quality-controlled database was analyzed to provide functional and phylogenetic information using the MG-RAST pipeline, which uses the BLAST-like algorithm BLAT (Kent, 2002). Because the Illumina sequences were short (average length 80 bp), thresholds of a maximum e-value of 10^-5 and a minimum percentage identity of 60% were employed for taxonomic best hit classification against the M5NR database for proteins and against the M5RNA database for rRNA (Wilke et al., 2012). For functional analysis, a maximum e-value of 10^-5 and a minimum percentage identity of 60% were used for hierarchical classification against the Subsystems database (Overbeek et al., 2005). Both taxonomic and functional data were visualized in pie charts using Krona (Ondov et al., 2011).
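Two of the quality control rules described above, the low-quality base filter and the artificial duplicate filter, can be illustrated with a short sketch. This is not the MG-RAST implementation; it only mirrors the rules as worded in the text (more than 5 bases with Phred score below 15; identical first 50 bases). The Phred+33 quality encoding and all function names are assumptions introduced for the example.

```python
from typing import Iterable, Iterator, List, Tuple

# A read is (identifier, sequence, quality string); Phred+33 encoding is ASSUMED.
Read = Tuple[str, str, str]


def low_quality_bases(quality: str, min_phred: int = 15, offset: int = 33) -> int:
    """Count bases whose Phred score is below min_phred."""
    return sum(1 for ch in quality if ord(ch) - offset < min_phred)


def passes_quality(read: Read, max_low: int = 5, min_phred: int = 15) -> bool:
    """Keep a read only if it has at most max_low low-quality bases."""
    return low_quality_bases(read[2], min_phred) <= max_low


def drop_artificial_duplicates(reads: Iterable[Read],
                               prefix_len: int = 50) -> Iterator[Read]:
    """Keep the first read seen for each distinct prefix of prefix_len bases."""
    seen = set()
    for read in reads:
        key = read[1][:prefix_len]
        if key in seen:
            continue  # same first 50 bases as an earlier read: treated as duplicate
        seen.add(key)
        yield read


def quality_control(reads: Iterable[Read]) -> List[Read]:
    """Apply the quality filter first, then remove artificial duplicates."""
    kept = (r for r in reads if passes_quality(r))
    return list(drop_artificial_duplicates(kept))
```

Because the duplicate key is only the read prefix, two reads that diverge after base 50 are still collapsed, which is the behavior the text describes.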

To examine species richness in the samples from the protein taxonomic annotations, rarefaction curves were generated with MG-RAST’s annotation tools. To assess whether the CBIS samples were significantly different from the CBIS extraction negative, Principal Coordinates Analysis (PCoA) plots were calculated for two different data sets. First, PCoA was conducted on the 16 CBIS barcoded samples using taxonomic (representative hit classification) and functional (hierarchical classification) classifications with MG-RAST analysis tools. Second, to assess the similarity of the CBIS to other deep subsurface samples as well as Chesapeake Bay water samples, taxonomy tables were exported from MG-RAST for the CBIS samples as well as metagenomes from the Peru Margin Subseafloor Biosphere (MG-RAST IDs: 4440960, 4440961, 4440973, 4459940, 4459941), Gulf of Mexico Deep Sea Sediment (MG-RAST IDs: 4465489, 4465490, and 4465491), and the Chesapeake Bay metagenomes from the Global Ocean Sampling Expedition (MG-RAST IDs: 4441132 and 4441584). Comparison of the reads identified as eukaryotic in the samples indicated that all of the CBIS samples and the extraction negative were contaminated by DNA from terrestrial mammals (especially from the genera Homo and Mus) and C4 grasses, which are ubiquitous in the environment and most likely derive from laboratory contamination (Ii, 2012; Leonard et al., 2007). Due to this contamination, all reads identified as eukaryotic were removed from all the samples before PCA analysis. The CBIS samples and the additional metagenomes were analyzed using PCA in STAMP (Parks and Beiko, 2010) with the parent level the and the profile level the family. Finally, the Biocore samples were compared to determine the number of genera shared between samples.

To assess the potential activity of community members, key genes necessary for sulfate reduction, methanogenesis, methanotrophy, and carbon fixation were searched for in the CBIS metagenomes. Sequences for each metagenome were downloaded from the MG-RAST database prior to gene calling and made into BLAST databases using the stand-alone BLAST+ applications (Camacho et al., 2009). Genes coding for key proteins (Berg et al., 2010; Hügler and Sievert, 2011; Teske et al., 2003) in several metabolisms from 4-6 different bacteria and archaea were downloaded from NCBI (Table 3-3). The four databases were then searched for matches to these key genes using BLASTn. For matches with an e-value smaller than 1, reads were translated and compared to the non-redundant protein database using BLASTx (Altschul et al., 1997). Matches with e-values lower than 10^-5 were identified as a match to the gene of interest.
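The two-stage screen described above (a permissive BLASTn pass with e-value below 1, followed by a stricter BLASTx pass with e-value below 10^-5) can be sketched as a filter over tabular BLAST results. This sketch assumes the standard 12-column tabular output of BLAST+ (`-outfmt 6`, with the e-value in the eleventh column); the text does not state which output format was used, and the function names are hypothetical. Checking that the BLASTx hit is actually annotated as the gene of interest is left out.

```python
from typing import Iterable, List, NamedTuple, Set


class Hit(NamedTuple):
    query: str
    subject: str
    evalue: float


def parse_tabular(lines: Iterable[str]) -> List[Hit]:
    """Parse BLAST+ tabular rows (ASSUMED -outfmt 6 default column order)."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        hits.append(Hit(fields[0], fields[1], float(fields[10])))
    return hits


def confirmed_reads(blastn_hits: Iterable[Hit],
                    blastx_hits: Iterable[Hit],
                    loose: float = 1.0,
                    strict: float = 1e-5) -> Set[str]:
    """Reads passing both stages of the screen.

    Stage 1: key genes are the queries and metagenome reads are the subjects,
             so candidate reads are subjects with e-value < loose.
    Stage 2: candidate reads are the queries against the protein database,
             and are kept when any hit has e-value < strict.
    """
    candidates = {h.subject for h in blastn_hits if h.evalue < loose}
    return {
        h.query
        for h in blastx_hits
        if h.query in candidates and h.evalue < strict
    }
```

Keeping the thresholds as parameters makes it easy to see how many candidate reads survive under stricter or looser cutoffs.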

Process                                   Protein                                       Gene
sulfate reduction                         adenosine-5-phosphosulfate reductase enzyme   aps
                                          dissimilatory sulfite reductase               dsrAB
methanogenesis                            methyl-coenzyme M reductase                   mcrA
methanotrophy                             particulate methane monooxygenase             pmoA
reductive citric acid cycle               2-oxoglutarate synthase                       korAB
                                          ATP-citrate lyase                             aclBA
reductive acetyl-coenzyme A pathway       acetyl-CoA synthase-CO dehydrogenase          CLJU
3-hydroxypropionate bicycle               malonyl-CoA reductase                         mcr
                                          malyl-CoA lyase                               mclA
dicarboxylate-hydroxybutyrate cycle       4-hydroxybutyryl-CoA dehydratase              AbfD

Table 3-3: Genes and proteins corresponding to metabolisms which were investigated.


Results

DNA extraction, amplification and library preparation

There was no visible amplification with any of the 16S rRNA primers used, in either the CBIS samples or the corresponding negative controls, for any of the extraction methods and dilutions. This is most likely because the DNA was very fragmented and/or present in concentrations too low for amplification.

Protocol                             Biocore     Libraries   Metagenome
PowerSoil Kit                        2, 3, 29
PowerLyzer Kit                       2, 3, 29
FastPrep Kit                         2, 3, 29
Qiagen Stool Kit                     2, 3, 29
RNA Soil Kit                         2, 3, 29
PowerMax Soil Kit                    2, 3, 29    X           X
Silica Bone Protocol                 2, 3, 29
Bead Beating Protocol                2, 3, 29
Chemical and Bead Beating Protocol   2, 3, 29    X

Table 3-4: Protocols tested on the Biocore samples, which extractions were used to make Illumina libraries, and which libraries were sequenced.

Although genomic libraries were prepared from 6 specimens and 2 extraction methods, only the PowerMax Soil Kit from MoBio produced libraries of quality and concentration necessary for sequencing (Table 3-4). DNA was visible after the amplification step of the library preparation in all three CBIS Biocore samples and the CBIS negative extraction (EN) that were extracted using this protocol. The amplification negative control was clean and the library preparation negative control contained only barcode dimers without inserts (data not shown). For the CBIS samples (both Biocores and EN), the estimated size of the inserts was around 60-80 bp. These very short fragments are most likely due to extensive fragmentation of DNA in the sample or shearing caused by the extraction process.


Sequencing results and metagenomic composition

Following quality control and adaptor trimming, the metagenomes contained ~3.5-5.7 million sequences (Table 3-5). A large percentage (88%-94%) of the uploaded sequences were removed during quality control, most of them as artificial duplicate reads (ADR). Between 7% and 24% of the quality-controlled sequences were identified as either proteins or rRNA, with proteins outnumbering rRNA by roughly 100 to 1, consistent with genome content. Due to the short fragment length, only family-level identification is feasible.

             Number of Sequences                   Avg. Seq. Length  GC Content        Predicted   Predicted   Protein      rRNA
Sample       Upload       ADR          Post        Upload   Post     Upload   Post     Protein     rRNA        Identified   Identified
Biocore 2    45,666,291   39,356,134   5,281,273   77±27    84±29    57±9%    56±10%   1,223,726   1,180,845   592,653      8,782
Biocore 3    61,799,047   38,977,187   3,570,558   74±25    77±25    59±8%    59±8%    447,598     635,229     853,882      4,397
Biocore 29   75,828,483   41,860,209   5,795,416   67±22    74±23    61±7%    60±7%    1,631,899   2,242,513   853,882      6,729
EN           64,247,408   41,375,710   4,213,289   78±28    81±28    57±9%    57±9%    640,040     768,979     284,543      7,928

Table 3-5: Information about sequences from quality control and protein and rRNA identification from MG-RAST. ADR is the number of duplicate reads.
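The share of reads lost between upload and the final quality-controlled set can be recomputed directly from Table 3-5. The sketch below is only an illustration: it assumes that the "Post" column counts the reads remaining after all quality control steps, and the function and variable names are hypothetical.

```python
# Read counts transcribed from Table 3-5: (uploaded reads, reads in the "Post" column).
# Assumption: "Post" is the number of reads left after all quality control steps.
TABLE_3_5 = {
    "Biocore 2": (45_666_291, 5_281_273),
    "Biocore 3": (61_799_047, 3_570_558),
    "Biocore 29": (75_828_483, 5_795_416),
    "EN": (64_247_408, 4_213_289),
}


def percent_removed(uploaded: int, retained: int) -> float:
    """Percentage of uploaded reads that are absent from the retained set."""
    return 100.0 * (uploaded - retained) / uploaded


removed = {name: percent_removed(up, post) for name, (up, post) in TABLE_3_5.items()}

for name, pct in removed.items():
    print(f"{name}: {pct:.1f}% of uploaded reads removed")
```

Under that assumption, the per-sample values fall inside the 88%-94% range quoted in the text.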

The rarefaction curves of the CBIS samples indicate that the diversity has been adequately sampled (Figure 3-3A). Having sampled the extent of diversity in each sample, we examined the genera in each sample to establish differences between the EN and Biocores 2, 3, and 29 (Table 3-6). Although all Biocores share some genera with the EN, the microbial communities in the CBIS samples are distinct, resulting in separation of each metagenome using PCoA (Figure 3-3B). The deepest sample, Biocore 29, clusters most closely with the EN, which has been seen in other deep subsurface samples where the low biomass of the deep samples leads to higher contamination (pers. comm. Chris House). When compared with other deep sea sediments and the water from the Chesapeake Bay (Figure 3-3C), the first principal component axis separates Biocore 29 from all the rest of the samples. The second principal component axis separates Biocore 3 and the EN from each other and from the rest of the samples. Biocore 2 groups most closely with the other samples from the Chesapeake Bay and the deep sea sediments. The separation between the EN (which represents a clean room environment) and the sediment or water samples is consistent with the fact that the largest source of contamination that was grouping the EN with Biocore 29 in the PCoA plot of the Biocores was from Eukaryotes.
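PCoA operates on a matrix of pairwise dissimilarities between samples. The text does not state which distance measure was selected in MG-RAST, so the sketch below assumes Bray-Curtis dissimilarity purely for illustration. The taxon counts are hypothetical and are not taken from the CBIS data.

```python
def bray_curtis(a: dict[str, int], b: dict[str, int]) -> float:
    """Bray-Curtis dissimilarity between two taxon-count profiles (0 = identical, 1 = disjoint)."""
    taxa = set(a) | set(b)
    # Sum of the smaller count for every taxon: the abundance the two samples share.
    shared = sum(min(a.get(t, 0), b.get(t, 0)) for t in taxa)
    total = sum(a.values()) + sum(b.values())
    return 1.0 - 2.0 * shared / total


# Hypothetical family-level read counts for two samples.
sample_a = {"Pseudomonadaceae": 2, "Geobacteraceae": 2}
sample_b = {"Pseudomonadaceae": 2, "Thermococcaceae": 2}

distance = bray_curtis(sample_a, sample_b)
print(f"Bray-Curtis dissimilarity: {distance:.2f}")
```

Computing this value for every pair of samples gives the dissimilarity matrix that a PCoA then projects onto its principal axes.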


Number of Bacterial and Archaeal Families

| Sample | EN only | Shared with EN | Shared with other Biocores | Unique to Biocore |
|---|---|---|---|---|
| Biocore 2 | | 82 | 12 | 14 |
| Biocore 3 | | 48 | 10 | 7 |
| Biocore 29 | | 79 | 13 | 12 |
| EN | 14 | 98 | | |

Table 3-6: Compiled families from bacteria and archaea were compared between samples. The second column shows the number of families that are present only in the Extraction Negative. The third column shows the number of families shared between the core sample and the Extraction Negative. The fourth column shows the number of families that are shared only between the Biocores. The fifth column shows the number of families that are unique to each Biocore. Based upon family composition, the samples clearly differ, with each Biocore containing families that are absent from the other Biocores and from the Extraction Negative.
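The column definitions in Table 3-6 amount to set operations on the family lists of each sample. The following is a minimal sketch of that bookkeeping. It assumes that "shared with other Biocores" means families absent from the EN but present in at least one other Biocore. The sample names and family sets are hypothetical placeholders, not the study data.

```python
def compare_families(biocores: dict[str, set[str]], negative: set[str]) -> dict[str, dict[str, int]]:
    """Count, per Biocore, the families shared with the EN, shared with other Biocores, or unique."""
    summary = {}
    for name, families in biocores.items():
        # Union of the families found in every other Biocore.
        others = set().union(*(f for n, f in biocores.items() if n != name))
        not_in_en = families - negative
        summary[name] = {
            "shared_with_en": len(families & negative),
            "shared_with_other_biocores": len(not_in_en & others),
            "unique": len(not_in_en - others),
        }
    return summary


# Hypothetical family lists for two cores and the extraction negative.
biocores = {
    "core_A": {"Pseudomonadaceae", "Geobacteraceae", "Dietziaceae"},
    "core_B": {"Pseudomonadaceae", "Geobacteraceae", "Waddliaceae"},
}
negative = {"Pseudomonadaceae", "Moraxellaceae"}

summary = compare_families(biocores, negative)
en_only = len(negative - set().union(*biocores.values()))
print(summary, en_only)
```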

Taxonomy analysis

Figures 3-4 to 3-6 show the taxonomic distribution of the CBIS samples based on translated gene product identifications. All the samples are dominated by bacteria, with archaea being only a minor component (less than 2% per sample). Eukaryotic sequences are largely contaminants, mostly from the families Poaceae (flowering plants with more than 11,000 domestic and wild species), Muridae, and Hominidae, and are therefore excluded from further analysis. The distribution of bacteria and archaea in the EN contains bacterial families which include species associated with humans (Propionibacteriaceae and ), common environmental species (Moraxellaceae, Pseudomonadaceae, Corynebacteriaceae, and ), and species found in clean rooms (, , Bradyrhizobiaceae, Pseudomonadaceae, Methanocaldococcaceae, Methanococcaceae, and Thermococcaceae) (La Duc et al., 2003; Duc et al., 2007; Moissl et al., 2007; Venkateswaran et al., 2001). As most of these families contain species which are also found in the deep subsurface, they are not removed from further analysis, but the EN data is provided and used to carefully interpret the Biocore data. Although most families are shared with the EN, some are unique to the Biocores (Table 3-7).


Figure 3-3: A) Rarefaction plot showing a curve of the species richness from the taxonomic data. The curve is a plot of the total number of distinct species annotations as a function of the number of sequences sampled. The flattening of the curve to the right indicates that more intensive sampling is likely to yield only a few additional species. The curves represent each of the four different barcodes for Biocore 2 (red), Biocore 3 (orange), Biocore 29 (yellow), and the EN (black). B) The first and second axes of the PCoA plot generated by MG-RAST for the 16 individual barcodes. The barcodes cluster by sample and are separate from one another. The Extraction Negative clusters closest to Biocore 29, which is expected since Biocore 29 would have the least amount of template DNA, making it the most prone to contamination. C) PCA plot of the Biocores and several different environments (Chesapeake Bay Water (CB Water), subsurface samples from the Gulf of Mexico (Gulf of Mexico) and the Peru Margin (Peru Margin)). Most of the variation between the samples separates Biocore 29 from the rest of the samples. This is consistent with Biocore 29 being far deeper than all of the other samples and thus representing a different environment. Biocore 3 is also deeper than the other samples, but represents a different geochemical environment than Biocore 29, which is consistent with the separation from both Biocore 29 and the other samples. The EN represents a clean room environment which is different from the sediment and water environments of the other samples. By removing the Eukaryote sequences, which represented a large part of the contamination in the Biocores, the EN no longer groups closely with Biocore 29.
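The rarefaction curve described in panel A can be approximated by counting distinct annotations as reads are drawn in random order. The sketch below is a simplified illustration with hypothetical annotations. It uses a single random ordering, whereas published tools may average over many subsamples or use an analytical estimate.

```python
import random


def rarefaction_curve(annotations: list[str], step: int, seed: int = 0) -> list[tuple[int, int]]:
    """Return (reads sampled, distinct annotations seen) at every `step` reads."""
    rng = random.Random(seed)
    shuffled = list(annotations)
    rng.shuffle(shuffled)  # one random sampling order
    seen: set[str] = set()
    curve = []
    for i, taxon in enumerate(shuffled, start=1):
        seen.add(taxon)
        if i % step == 0 or i == len(shuffled):
            curve.append((i, len(seen)))
    return curve


# Hypothetical annotations: one label per read.
reads = ["taxon_A"] * 50 + ["taxon_B"] * 30 + ["taxon_C"] * 20
curve = rarefaction_curve(reads, step=10)
for sampled, richness in curve:
    print(sampled, richness)
```

A curve that stops rising well before the last read has been drawn corresponds to the flattening described in the caption.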


MG-RAST taxonomic classification for Biocore 2 detected 5 dominant bacterial phyla and an additional 14 phyla which combined account for less than 1% of the sequences. These taxonomic lineages encompassed 28 classes and 79 orders of bacteria. Five archaeal phyla were detected, composed of 9 classes and 16 orders. Biocore 3 is composed of 5 dominant bacterial phyla with an additional 10 phyla that account for around 2% of the total sequences. These are composed of 27 classes and 74 orders of bacteria. Four archaeal phyla were detected, composed of 5 classes and 7 orders. Biocore 29 consists of 4 dominant phyla and 11 phyla accounting for less than 1% of the sequences. These encompass 22 classes and 57 orders of bacteria. Three archaeal phyla were detected, consisting of 3 classes and 8 orders.

The dominant bacterial taxon was in all three Biocores followed by , , and . Small amounts of were present in Biocores 2 and 3, but not in Biocore 29. Dominant bacterial families are Moraxellaceae, Pseudomonadaceae, Enterobacteriaceae, Sphingomonadaceae, Comamonadaceae, Oxalobacteraceae, and Streptococcaceae for Biocore 2; Pseudomonadaceae, Bradyrhizobiaceae, Sphingomonadaceae, Streptococcaceae, and Veillonellaceae for Biocore 3; and Pseudomonadaceae, Enterobacteriaceae, Bradyrhizobiaceae, Rhizobiaceae, Sphingomonadaceae, and Caulobacteraceae for Biocore 29.

Archaea varied more between samples, with Biocore 2 composed of roughly equal amounts of the phyla , , , and Thaumarchaeota. Biocore 3 is primarily composed of Crenarchaeota and Euryarchaeota with small amounts of Korarchaeota and Thaumarchaeota. Biocore 29 contains Crenarchaeota and Euryarchaeota with a small amount of Thaumarchaeota. The dominant archaeal families for Biocores 2 and 3 are Sulfolobaceae, Desulfurococcaceae, Methanosarcinaceae, and Cenarchaeaceae. Biocore 29 is composed almost entirely of Sulfolobaceae and Thermococcaceae with a small amount of Desulfurococcaceae and .


Figure 3-4: Graphs of the taxonomic composition of the 16S rRNA sequence data from the CBIS samples.


Figure 3-5: Graphs of the taxonomic composition of the protein sequence data from the CBIS samples.


Figure 3-6: Graphs of the taxonomic composition of archaea from protein sequence data from the CBIS samples.


Families Unique to the Biocores Biocore 2 Biocore 3 Biocore 29 Bacteria Nocardiopsaceae Dietziaceae Herpetosiphonaceae Tsukamurellaceae Rubrobacteraceae Prochlorotrichaceae Cytophagaceae Hydrogenothermaceae Colwelliaceae Rhodothermaceae Waddliaceae Synergistaceae Planctomycetaceae Chlorobiaceae Hyphomicrobiaceae Nitrospinaceae Acidaminococcaceae Hydrogenophilaceae Geobacteraceae Methylococcaceae Pelobacteraceae Pseudoalteromonadaceae Nocardioidaceae Chromatiaceae Oceanospirillaceae Paenibacillaceae Halomonadaceae Verrucomicrobiaceae Methylophilaceae Piscirickettsiaceae Dermacoccaceae Synergistaceae Blattabacteriaceae Dermacoccaceae Rhodocyclaceae Blattabacteriaceae Cystobacteraceae Rhodocyclaceae Nocardioidaceae Cystobacteraceae Paenibacillaceae Chlorobiaceae Methylophilaceae Acidaminococcaceae Methylococcaceae Nocardioidaceae Paenibacillaceae Methylophilaceae

Archaea Fervidicoccaceae Thermoplasmataceae Halobacteriaceae Cenarchaeaceae Nitrosopumilaceae Desulfurococcaceae Nitrososphaeraceae Methanosaetaceae Sulfolobaceae Methanosaetaceae Methanosarcinaceae Methanosarcinaceae Desulfurococcaceae Halobacteriaceae Sulfolobaceae Desulfurococcaceae Sulfolobaceae

Table 3-7: Families of bacteria and archaea unique to the Biocores. Some of these are clearly shown in Figures 3-4 to 3-6 (pie charts), and some are families with low percentages which are not clearly indicated in the pie charts.


Genes Indicative of Metabolic Potential

Table 3-8 shows the number of reads which were hits to the key metabolic genes and the ones that were confirmed through the BLASTx search. Unfortunately, many of the hits to genes could not be confirmed through translated searching of the non-redundant protein database. However, some genes were identified at both the gene and the translated gene level. asp was identified via BLASTx in Biocores 2 and 29. dsrAB, which encodes proteins necessary for dissimilatory sulfate reduction and oxidation, was found in all the Biocores, although no sequences were verified through the BLASTx search. None of the samples had any hits to mcrA, a gene that is essential for methanogenesis, but Biocores 2 and 29 had confirmed hits to pmoA, involved in methane oxidation in methanotrophs.

The carbon fixation genes encoding the proteins 2-oxoglutarate synthase, ATP-citrate lyase, acetyl-CoA synthase, malyl-CoA lyase, and 4-hydroxybutyryl-CoA dehydratase were identified by multiple hits in all four metagenomes. However, the BLASTx search with the translated sequences was only able to identify these sequences as belonging to more general families of proteins. 2-oxoglutarate synthase was identified in BLASTn and BLASTx searches of both Biocore 2 and 29, while malyl-CoA lyase was identified in Biocore 2. Given the difficulty of accurately identifying proteins based on sequences of short read lengths and the high stringency BLAST parameters employed, these results provide evidence that organisms capable of primary productivity are present in the deep subsurface.
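The two-stage logic described above (a nucleotide hit that counts as confirmed only when the translated read also matches the expected protein) can be sketched as a comparison of two result tables. The example assumes results in the standard 12-column BLAST+ tabular layout (`-outfmt 6`), which the text does not say was used. All read identifiers, subject names, scores, and the E-value cutoff are hypothetical.

```python
import csv
import io


def best_hits(tabular_text: str, max_evalue: float) -> dict[str, str]:
    """Map each read to its best-scoring subject in 12-column BLAST tabular text."""
    best: dict[str, tuple[str, float]] = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        read_id, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue > max_evalue:
            continue  # hit too weak to keep
        if read_id not in best or bitscore > best[read_id][1]:
            best[read_id] = (subject, bitscore)
    return {read_id: subject for read_id, (subject, _) in best.items()}


def confirmed_reads(nucleotide_hits: dict[str, str], protein_hits: dict[str, str],
                    accepted_proteins: set[str]) -> list[str]:
    """Reads with a nucleotide hit whose translated best hit is an accepted protein."""
    return sorted(r for r in nucleotide_hits if protein_hits.get(r) in accepted_proteins)


def row(*fields) -> str:
    """Build one tab-separated result line (helper for the hypothetical data)."""
    return "\t".join(str(f) for f in fields)


# Hypothetical results: qseqid, sseqid, pident, length, mismatch, gapopen,
# qstart, qend, sstart, send, evalue, bitscore.
blastn_text = "\n".join([
    row("read1", "pmoA_ref", 98.5, 70, 1, 0, 1, 70, 100, 169, 1e-20, 120),
    row("read2", "korA_ref", 97.0, 65, 2, 0, 1, 65, 10, 74, 1e-15, 100),
    row("read3", "pmoA_ref", 96.0, 60, 2, 0, 1, 60, 40, 99, 1e-12, 90),
])
blastx_text = "\n".join([
    row("read1", "methane_monooxygenase_subunit_A", 90.0, 23, 2, 0, 1, 69, 5, 27, 1e-8, 55),
    row("read2", "hypothetical_protein", 80.0, 21, 4, 0, 1, 63, 9, 29, 1e-6, 48),
])

nucleotide_hits = best_hits(blastn_text, max_evalue=1e-5)
protein_hits = best_hits(blastx_text, max_evalue=1e-5)
confirmed = confirmed_reads(nucleotide_hits, protein_hits, {"methane_monooxygenase_subunit_A"})
print(confirmed)
```

In this toy data, one read is confirmed, one matches only a generic protein, and one has no translated hit at all, mirroring the three outcomes discussed in the text.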


Table 3-8: The metagenomes were searched for key metabolism genes first using nucleotide sequence (columns 2-5) and then, for confirmation, those entire reads were translated and compared to a protein database (columns 6-9). Rows: asp, dsrAB, mcrA, pmoA, korAB, aclBA, CLJU, mclA, AbfD. Columns 2-5: hits from BLAST databases for Biocore 2, Biocore 3, Biocore 29, and Ex Neg. Columns 6-9: hits from BLASTx against protein database for Biocore 2, Biocore 3, Biocore 29, and Ex Neg.


Discussion

This paper describes the community composition of three different layers of the CBIS in order to identify the possibility of DNA biomarkers existing from the impact-induced hydrothermal system. Because these deep subsurface sediments are generally low biomass and highly prone to contamination, we must approach the analysis of these samples carefully before we identify biomarkers. First, we discuss the reasons for the short fragments in these samples in comparison to other sediment metagenomes. Then, we discuss the community composition of the core and identify potential biomarkers in the layer associated with the hydrothermal system. Finally, we eliminate hypotheses other than a living remnant of a past microbial community for the presence of hydrothermal biomarkers in Biocore 29.

Low biomass samples – difficulties resulting in high fragmentation

The difficulty with extracting and amplifying the samples is most likely due to three factors. First, the samples most likely have very low biomass. The original cell counts for the layers between 140 and 1766 m in the core showed that the upper part of the sample (between 140 and 444 m) had an average cell count between 10^6 and 10^8 cells/g with a high variation, decreasing at 444 m to below 10^6 (Cockell et al., 2009). However, the cell counts average 10^6 from 80 to 140 m (Breuker et al., 2011). Even this number is slightly high compared with other cell counts in terrestrial sediments for a similar depth range (Balkwill, 1989; Fredrickson et al., 1991; Fry et al., 2009; Kieft et al., 1995; Takai et al., 2003), while marine sediments decrease even further with depth (Parkes et al., 1994, 2000; Schippers et al., 2005). Finally, cell counting methods produce higher cell counts than DNA-based quantification methods for deep biosphere samples (Engelen et al., 2008). The lower cell counts are more compatible with the low biomass recovered from the Biocores.

Second, the microbial cells may have been embedded in the mineral matrix or difficult to lyse (Lipp et al., 2008). For these reasons, multiple extractions were attempted using both long incubations at warm temperatures and sometimes extensive bead-beating. This leads to the third problem, which is the highly fragmented nature of the DNA in the extracts. Active cultures have been obtained from this core, marking the presence of living cells (Finster et al., 2009). Although there is most likely also fragmented DNA from decaying cells, we expected large sections of DNA from living cells, as most deep subsurface samples have been found to be active, albeit at a much slower respiration rate than the surface world (Chapelle and Lovley, 1990; D’Hondt et al., 2002, 2004; Lin et al., 2006; Lomstein et al., 2012; Onstott et al., 1999; Phelps et al., 1994; Price and Sowers, 2004; Røy et al., 2012). As evidenced by the negative amplification reactions and short fragment size in the library preparation, this was not the case. It is possible that the extensive bead-beating necessary to lyse the cells in the mineral matrix resulted in shearing of the DNA (Bürgmann et al., 2001; Miller et al., 1999; Yu and Morrison, 2004).

Community composition and biomarker identification

The composition of the Biocore samples is comparable with other deep biosphere samples as well as with the analysis done on the top 140 m of the core. The top 140 m of the core had a dominance of bacteria over archaea and was lacking the mcrA gene, necessary for methanogenesis (Breuker et al., 2011), similar to the lower layers of the core analyzed here. Despite the lack of the mcrA gene in Biocores 2 and 3, sequences most closely related to methanogens are present, indicating a possibly low level of methanogenesis in these layers. In addition, the top 100 m contained aprA, common in sulfate reducing bacteria, in low copy numbers, and this gene is also present in Biocores 2 and 29. Breuker et al. (2011) found that the main families of bacteria and archaea recovered from the top 140 m include Geobacteraceae, Pseudomonadaceae, Moraxellaceae, Methanococcaceae, Thermoproteaceae, and Desulfurococcaceae, which are all present in Biocores 2 and 3. In addition, the top 140 m contained taxa which clustered with other unclassified deep subsurface sediments in the phyla Euryarchaeota and Crenarchaeota. Biocores 2 and 3 contain at least 15% unclassified reads affiliated with these phyla and may be similar to the unclassified fraction of archaeal reads in the upper 140 m.

Although subsurface ecosystems in general have slow biomass turnover and the extent to which the cells are actually reproducing is unknown (Hoehler and Jørgensen, 2013; Jørgensen, 2011; Jørgensen and Boetius, 2007; Lomstein et al., 2012), marine subsurface sediments often have a higher biodiversity and are more dynamic than the extremely slow growing microbial biome of the terrestrial deep biosphere sediments (Edwards et al., 2012). The Biocores studied here have a much higher biodiversity than what was recovered from the top 140 m. This is consistent with using more sensitive next generation sequencing techniques (i.e. Illumina), but also could represent an actual increase in biodiversity due to the change from a purely terrestrial deposition environment of the top 140 m to a marine deposition environment at lower depths. Although there are no “keystone species” identified for the deep biosphere, bacterial phyla consistently recovered from the deep subsurface are Proteobacteria, Actinobacteria, Firmicutes, and Chloroflexi (Batzke et al., 2007; Biddle et al., 2005; Fry et al., 2009; Rastogi et al., 2010; Wilms et al., 2006). Reads affiliated with Proteobacteria, Actinobacteria, and Firmicutes are found in high quantities in all three of the Biocores, with Proteobacteria as the most abundant phylum. Reads affiliated with Chloroflexi were found in Biocores 2 and 3, but not Biocore 29, which is in contrast to reports of Chloroflexi abundance increasing with depth (Fry et al., 2009; Wilms et al., 2006).

Crenarchaeota and Euryarchaeota are typically found in the deep subsurface (Biddle et al., 2008; Fry et al., 2009; Rastogi et al., 2010) and reads affiliated with both phyla were abundant in all Biocores. In addition, Biocores 2 and 3 also have reads affiliated with Korarchaeota and Thaumarchaeota which are relatively uncharacterized phyla of archaea with few cultured representatives (Auchtung et al., 2006; Brochier-Armanet et al., 2011; Dawson et al., 2006; Spang et al., 2010).

Several deep subsurface studies have identified bacteria from the genera Bacillus and Rhizobium (Batzke et al., 2007; Engelen et al., 2008). Sequences affiliated with these genera are present in all three Biocores; however, they are also present in the EN. Bacillus is a common genus found in both the deep subsurface and in clean rooms (Duc et al., 2007; Moissl et al., 2007). Rhizobium species fix nitrogen and are best known for their symbiosis with legumes and plants (Long, 1996, 2001; Young and Johnston, 1989). Although generally aerobic or facultatively aerobic, they have been found in the deep subsurface at several sites growing under anoxic conditions and most likely represent the presence of nitrogen fixation in the ecosystem (Batzke et al., 2007; Engelen et al., 2008). Nitrogen fixation is often found in deep biospheres with sulfate reduction, methane oxidation, and fermentation (Engelen et al., 2008; Rastogi et al., 2010). Biocores 2 and 3 contained evidence of methane oxidation, and all three Biocores contain taxa which are capable of sulfate reduction and fermentation. All three of the Biocores contained genes identified as carbon fixation genes. Several of the families present only in the Biocores contain lithoautotrophs, including Hydrogenothermaceae, Nitrospinaceae, Geobacteraceae, and Hydrogenophilaceae. Therefore, like other deep subsurface environments, there is evidence for lithoautotrophy as a potential source of primary productivity in this deep biosphere ecosystem where the dominant taxa are anaerobic, heterotrophic fermenters.

Salinity increases with depth in the CBIS, as do biosignatures of halophilic archaea and bacteria, which are present in Biocore 29 (Halomonadaceae, Rhodothermaceae, and Halobacteriaceae) but not in the other Biocores. This agrees well with the geochemistry data, where the salinity in Biocore 29 is much higher than in the upper layers.

Biocore 29 comes from within the impact crater and could potentially carry, in the currently surviving microbial population, biomarkers of the temporary hydrothermal system that existed there 34.5 million years ago. Compared to the other Biocores, Biocore 29 contains a small increase in bacterial thermophile families (Rhodothermaceae) and a large increase in the archaeal families whose characterized isolates are thermophilic: Sulfolobus, Staphylothermus, Pyrococcus, and Thermococcus. These thermophilic families of archaea compose over 75% of the archaeal population in Biocore 29, whereas they account for less than 20% of the total archaea in the other Biocores. In addition to being thermophiles, members of these families include species which are tolerant of high levels of salt (Rhodothermaceae) and use sulfur (Sulfolobus and Staphylothermus). Biocore 29 also had an increased abundance of reads affiliated with the autotrophic deep-sea thermophilic family Pyrococcus. These observations are consistent with the geochemical reconstruction of the temporary hydrothermal system, in which a warm salt brine filled the crater along with high levels of sulfur species. Because the impact was a sterilizing event, microbial communities colonizing the temporary hydrothermal system would have benefitted from the presence of primary producers.

Biomarkers of an ancient hydrothermal system

In order to determine if DNA biomarkers are identifiable in the CBIS samples, we must first determine if the data represents the sampled sediment or a contaminant from the drilling and/or laboratory analysis. To identify potential laboratory contaminants, an EN was sequenced with the CBIS samples. The taxonomic composition of the EN was consistent with other clean rooms which have been analyzed (Duc et al., 2007; Moissl et al., 2007). Although the Biocores share families with the EN, each Biocore is composed of distinct populations, as shown by PCoA (Figure 3-3) analysis and a taxonomic analysis of the families present in the samples (Table 3-7). The Biocore taxonomic and functional data are consistent with the geochemistry and show ample support for a self-sustaining microbial community including a variety of autotrophic organisms. Based on these independent analyses and observations, we conclude that this data potentially represents the environment from which it was sampled.

However, even if the data is authentic, it may not necessarily represent biomarkers from the hydrothermal system after the impact event. We consider three hypotheses to address this discrepancy. First, the CBIS is not a closed system and therefore the data may represent a more recent colonization of the sediment. Second, the hydrothermal signal in Biocore 29 is from organisms which died after the hydrothermal system cooled and have been preserved for 34 million years. Finally, the hydrothermal signal in Biocore 29 represents a low level of hydrothermal microorganisms from the temporary impact-induced hydrothermal system that are still metabolically active in the current microbial community.

First, the sample might be contaminated from nearby environments. If the CBIS is not a closed system, microbial life may have migrated from nearby communities contaminating the sediments with more recent microbes. We reject this hypothesis for two reasons: first, the hydrological studies strongly suggest that the system has been closed since the cooling of the hydrothermal system (Sanford, 2003); and second, there are no nearby thermophilic systems to contaminate the CBIS.

Second, the thermophilic signature could represent dead microorganisms or free DNA from the hydrothermal system 34.5 Mya. This explanation is extremely unlikely because the environment has been almost continually aqueous for the past 34 million years and DNA is unstable in water. DNA has a maximum lifetime of 400 kyr to 1.5 Myr before it is either severely cross-linked, completely degraded, or otherwise non-detectable (Hansen et al., 2006; Willerslev and Cooper, 2005; Willerslev et al., 2004). Therefore, dead microorganisms or free DNA are inconsistent with the recovery timeframe.
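The size of the mismatch between the age of the hydrothermal system and the cited limits on DNA survival can be made explicit with a short calculation. The sketch below uses only the figures quoted in the paragraph above. The variable names are illustrative.

```python
# Figures quoted in the text, expressed in millions of years (Myr).
HYDROTHERMAL_AGE_MYR = 34.5
MAX_DNA_LIFETIME_MYR = {"lower estimate": 0.4, "upper estimate": 1.5}

# How many times older the hydrothermal system is than each survival limit.
age_to_lifetime = {
    label: HYDROTHERMAL_AGE_MYR / limit for label, limit in MAX_DNA_LIFETIME_MYR.items()
}

for label, ratio in age_to_lifetime.items():
    print(f"{label}: hydrothermal system is about {ratio:.0f} times older than the limit")
```

Even under the most generous estimate, the hydrothermal system predates the survival limit by more than an order of magnitude.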

Finally, the thermophilic signature in the sediment represents living thermophiles present in the community. It has been shown that microorganisms are capable of surviving long periods of time with low metabolic function in order to maintain DNA integrity (D’Hondt et al., 2002; Fichtel et al., 2012; Hoehler and Jørgensen, 2013; Johnson et al., 2007; Jørgensen, 2011; Teske, 2005).


They have been observed directly from salt crystals where they are miniaturized in comparison to actively growing communities of the same species (Lowenstein et al., 2011; Schubert et al., 2009). Organisms have been cultured from extreme environments (Gales et al., 2011; Rastogi et al., 2010; Roh et al., 2002; Sass and Cypionka, 2004; Takai et al., 2002). Also, low metabolic function has been observed in deep subsurface communities and other extreme environments (Chapelle and Lovley, 1990; Johnson et al., 2007; Price and Sowers, 2004; Røy et al., 2012). After culturing, some organisms from these communities have been found to have rapid metabolisms under favorable conditions and are not unique to extreme environments. However, it appears that when they are relocated to these extreme environments, they are capable of reducing their metabolisms in order to wait for more favorable conditions (Getliff et al., 1992; Morono et al., 2011).

Although we cannot conclusively prove that hydrothermal microorganisms are alive in the CBIS, the data we present here could be explained by their presence as a living remnant from the 34.5 Mya hydrothermal system.

Astrobiology and DNA biomarkers

The Martian surface was inhospitable by the time eukaryotic life evolved on Earth; however, the subsurface of Mars is most likely capable of supporting chemolithoautotrophic microorganisms similar to those living below the seafloor on Earth (Fisk and Giovannoni, 1999). Current research has targeted areas where upwelling may allow for sampling of this potential deep biosphere (Michalski et al., 2013), but the question remains, “What can we look for?” The Chesapeake Bay Impact Crater provides an analog site for temporary hydrothermal systems created on Mars by impact events, where microbial colonization during a time of high energy (hydrothermal system) resulted in a long-term colonization of the deep biosphere, isolated from the photosynthetic surface for over 30 million years. Detecting biomarkers from microorganisms which lived during the hydrothermal system provides insight into the evolution of such systems as well as support for long-term survival of microbial DNA biomarkers from the deep subsurface.


References

Albarède, F. (2009). Volatile accretion history of the terrestrial planets and dynamic implications. Nature 461, 1227–1233.
Allentoft, M.E., Collins, M., Harker, D., Haile, J., Oskam, C.L., Hale, M.L., Campos, P.F., Samaniego, J.A., Gilbert, M.T.P., Willerslev, E., et al. (2012). The half-life of DNA in bone: measuring decay kinetics in 158 dated fossils. Proc. R. Soc. B Biol. Sci. 279, 4724–4733.
Altschul, S.F., Madden, T.L., Schäffer, A.A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D.J. (1997). Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402.
Alvarez, L.W., Alvarez, W., Asaro, F., and Michel, H.V. (1980). Extraterrestrial Cause for the Cretaceous-Tertiary Extinction. Science 208, 1095–1108.
Auchtung, T.A., Takacs-Vesbach, C.D., and Cavanaugh, C.M. (2006). 16S rRNA Phylogenetic Investigation of the Candidate Division “Korarchaeota.” Appl. Environ. Microbiol. 72, 5077–5082.
Balkwill, D.L. (1989). Numbers, diversity, and morphological characteristics of aerobic, chemoheterotrophic bacteria in deep subsurface sediments from a site in South Carolina. Geomicrobiol. J. 7, 33–52.
Batzke, A., Engelen, B., Sass, H., and Cypionka, H. (2007). Phylogenetic and Physiological Diversity of Cultured Deep-Biosphere Bacteria from Equatorial Pacific Ocean and Peru Margin Sediments. Geomicrobiol. J. 24, 261–273.
Berg, I.A., Kockelkorn, D., Ramos-Vera, W.H., Say, R.F., Zarzycki, J., Hügler, M., Alber, B.E., and Fuchs, G. (2010). Autotrophic carbon fixation in archaea. Nat. Rev. Microbiol. 8, 447–460.
Biddle, J.F., House, C.H., and Brenchley, J.E. (2005). Microbial stratification in deeply buried marine sediment reflects changes in sulfate/methane profiles. Geobiology 3, 287–295.
Biddle, J.F., Fitz-Gibbon, S., Schuster, S.C., Brenchley, J.E., and House, C.H. (2008). Metagenomic signatures of the Peru Margin subseafloor biosphere show a genetically distinct environment. Proc. Natl. Acad. Sci. 105, 10583–10588.
Breuker, A., Koweker, G., Blazejak, A., and Schippers, A. (2011). The Deep Biosphere in Terrestrial Sediments in the Chesapeake Bay Area, Virginia, USA. Front. Microbiol. 2.
Brochier-Armanet, C., Gribaldo, S., and Forterre, P. (2011). Spotlight on the Thaumarchaeota. ISME J. 6, 227–230.
Brocks, J.J., and Pearson, A. (2005). Building the Biomarker Tree of Life. Rev. Mineral. Geochem. 59, 233–258.
Bulat, S.A., Alekhina, I.A., Blot, M., Petit, J.-R., de Angelis, M., Wagenbach, D., Lipenkov, V.Y., Vasilyeva, L.P., Wloch, D.M., Raynaud, D., et al. (2004). DNA signature of thermophilic bacteria from the aged accretion ice of Lake Vostok, Antarctica: implications for searching for life in extreme icy environments. Int. J. Astrobiol. 3, 1–12.


Bürgmann, H., Pesaro, M., Widmer, F., and Zeyer, J. (2001). A strategy for optimizing quality and quantity of DNA extracted from soil. J. Microbiol. Methods 45, 7–20.
Camacho, C., Coulouris, G., Avagyan, V., Ma, N., Papadopoulos, J., Bealer, K., and Madden, T.L. (2009). BLAST+: architecture and applications. BMC Bioinformatics 10, 421.
Carrigg, C., Rice, O., Kavanagh, S., Collins, G., and O’Flaherty, V. (2007). DNA extraction method affects microbial community profiles from soils and sediment. Appl. Microbiol. Biotechnol. 77, 955–964.
Chapelle, F.H., and Lovley, D.R. (1990). Rates of microbial metabolism in deep coastal plain aquifers. Appl. Environ. Microbiol. 56, 1865–1874.
Cockell, C.S., Lee, P., Osinski, G., Horneck, G., and Broady, P. (2002). Impact-induced microbial endolithic habitats. Meteorit. Planet. Sci. 37, 1287–1298.
Cockell, C.S., Gronstal, A.L., Voytek, M.A., Kirshtein, J.D., Finster, K., Sanford, W.E., Glamoclija, M., Gohn, G.S., Powars, D.S., and Horton, J.W. (2009). Microbial abundance in the deep subsurface of the Chesapeake Bay impact crater: Relationship to lithology and impact processes. Geol. Soc. Am. Spec. Pap. 458, 941–950.
Cockell, C.S., Voytek, M.A., Gronstal, A.L., Finster, K., Kirshtein, J.D., Howard, K., Reitner, J., Gohn, G.S., Sanford, W.E., Horton, J.W., et al. (2012). Impact Disruption and Recovery of the Deep Subsurface Biosphere. Astrobiology 12, 231–246.
Collins, G.S., and Wünnemann, K. (2005). How big was the Chesapeake Bay impact? Insight from numerical modeling. Geology 33, 925–928.
Cox, M.P., Peterson, D.A., and Biggs, P.J. (2010). SolexaQA: At-a-glance quality assessment of Illumina second-generation sequencing data. BMC Bioinformatics 11, 485.
D’Hondt, S., Rutherford, S., and Spivack, A.J. (2002). Metabolic activity of subsurface life in deep-sea sediments. Science 295, 2067–2070.
D’Hondt, S., Jørgensen, B.B., Miller, D.J., Batzke, A., Blake, R., Cragg, B.A., Cypionka, H., Dickens, G.R., Ferdelman, T., and Hinrichs, K.-U. (2004). Distributions of microbial activities in deep subseafloor sediments. Science 306, 2216–2221.
Dawson, S., Delong, E.F., and Pace, N.R. (2006). Phylogenetic and ecological perspectives on uncultured Crenarchaeota and Korarchaeota. The Prokaryotes 3.
Duc, M.T.L., Dekas, A., Osman, S., Moissl, C., Newcombe, D., and Venkateswaran, K. (2007). Isolation and Characterization of Bacteria Capable of Tolerating the Extreme Conditions of Clean Room Environments. Appl. Environ. Microbiol. 73, 2600–2611.
La Duc, M.T., Nicholson, W., Kern, R., and Venkateswaran, K. (2003). Microbial characterization of the Mars Odyssey spacecraft and its encapsulation facility. Environ. Microbiol. 5, 977–985.
Edwards, K.J., Becker, K., and Colwell, F. (2012). The Deep, Dark Energy Biosphere: Intraterrestrial Life on Earth. Annu. Rev. Earth Planet. Sci. 40, 551–568.

Engelen, B., Ziegelmüller, K., Wolf, L., Köpke, B., Gittel, A., Cypionka, H., Treude, T., Nakagawa, S., Inagaki, F., Lever, M.A., et al. (2008). Fluids from the Oceanic Crust Support Microbial Activities within the Deep Biosphere. Geomicrobiol. J. 25, 56–66. Fichtel, K., Mathes, F., Konneke, M., Cypionka, H., and Engelen, B. (2012). Isolation of Sulfate-Reducing Bacteria from Sediments Above the Deep-Subseafloor Aquifer. Front. Microbiol. 3. Finster, K.W., Cockell, C.S., Voytek, M.A., Gronstal, A.L., and Kjeldsen, K.U. (2009). Description of Tessaracoccus profundi sp. nov., a deep-subsurface actinobacterium isolated from a Chesapeake impact crater drill core (940 m depth). Antonie Van Leeuwenhoek 96, 515–526. Fisk, M.R., and Giovannoni, S.J. (1999). Sources of nutrients and energy for a deep biosphere on Mars. J. Geophys. Res. Planets 104, 11805–11815. Fredrickson, J.K., Balkwill, D.L., Zachara, J.M., Li, S.-M.W., Brockman, F.J., and Simmons, M.A. (1991). Physiological diversity and distributions of heterotrophic bacteria in deep cretaceous sediments of the Atlantic coastal plain. Appl. Environ. Microbiol. 57, 402–411. Fry, J.C., Horsfield, B., Sykes, R., Cragg, B.A., Heywood, C., Kim, G.T., Mangelsdorf, K., Mildenhall, D.C., Rinna, J., and Vieth, A. (2009). Prokaryotic populations and activities in an interbedded coal deposit, including a previously deeply buried section (1.6–2.3 km) above ∼150 Ma basement rock. Geomicrobiol. J. 26, 163–178. Gales, G., Chehider, N., Joulian, C., Battaglia-Brunet, F., Cayol, J.-L., Postec, A., Borgomano, J., Neria-Gonzalez, I., Lomans, B.P., Ollivier, B., et al. (2011). Characterization of Halanaerocella petrolearia gen. nov., sp. nov., a new anaerobic moderately halophilic fermentative bacterium isolated from a deep subsurface hypersaline oil reservoir. Extremophiles 15, 565–571. Getliff, J.M., Fry, J.C., Cragg, B.A., and Parkes, R.J. (1992). 45. 
THE POTENTIAL FOR BACTERIA GROWTH IN DEEP SEDIMENT LAYERS OF THE JAPAN SEA, HOLE 798B—LEG 128. Gilbert, J.A., and Dupont, C.L. (2011). Microbial Metagenomics: Beyond the Genome. Annu. Rev. Mar. Sci. 3, 347–371. Gohn, G.S., Koeberl, C., Miller, K.G., Reimold, W.U., Cockell, C.S., Horton, J.W., Sanford, W.E., and Voytek, M.A. (2006). Chesapeake Bay Impact Structure Drilled. EOS Trans. 87, 349–355. Gomez-Alvarez, V., Teal, T.K., and Schmidt, T.M. (2009). Systematic artifacts in metagenomes from complex microbial communities. ISME J. 3, 1314–1317. Gronstal, A.L., Voytek, M.A., Kirshtein, J.D., Heyde, N.M. von der, Lowit, M.D., and Cockell, C.S. (2009). Contamination assessment in microbiological sampling of the Eyreville core, Chesapeake Bay impact structure. Geol. Soc. Am. Spec. Pap. 458, 951–964. Handelsman, J. (2004). Metagenomics: Application of Genomics to Uncultured Microorganisms. Microbiol. Mol. Biol. Rev. 68, 669–685. Hansen, A.J., Mitchell, D.L., Wiuf, C., Paniker, L., Brand, T.B., Binladen, J., Gilichinsky, D.A., Rønn, R., and Willerslev, E. (2006). Crosslinks Rather Than Strand Breaks Determine Access to Ancient DNA Sequences From Frozen Sediments. Genetics 173, 1175–1179.

Hoehler, T.M., and Jørgensen, B.B. (2013). Microbial life under extreme energy limitation. Nat. Rev. Microbiol. 11, 83–94. Hügler, M., and Sievert, S.M. (2011). Beyond the Calvin Cycle: Autotrophic Carbon Fixation in the Ocean. Annu. Rev. Mar. Sci. 3, 261–289. Grass Phylogeny Working Group II (2012). New grass phylogeny resolves deep evolutionary relationships and discovers C4 origins. New Phytol. 193, 304–312. Inagaki, F., Okada, H., Tsapin, A.I., and Nealson, K.H. (2005). MICROBIAL SURVIVAL: The Paleome: A Sedimentary Genetic Record of Past Microbial Communities. Astrobiology 5, 141–153. Johnson, G.H., Kruse, S.E., Vaughn, A.W., Lucey, J.K., Hobbs, C.H., and Powars, D.S. (1998). Postimpact deformation associated with the late Eocene Chesapeake Bay impact structure in southeastern Virginia. Geology 26, 507–510. Johnson, S.S., Hebsgaard, M.B., Christensen, T.R., Mastepanov, M., Nielsen, R., Munch, K., Brand, T., Gilbert, M.T.P., Zuber, M.T., Bunce, M., et al. (2007). Ancient bacteria show evidence of DNA repair. Proc. Natl. Acad. Sci. 104, 14401–14405. Jørgensen, B.B. (2011). Deep subseafloor microbial cells on physiological standby. Proc. Natl. Acad. Sci. 108, 18193–18194. Jørgensen, B.B., and Boetius, A. (2007). Feast and famine—microbial life in the deep-sea bed. Nat. Rev. Microbiol. 5, 770–781. Kashefi, K., and Lovley, D.R. (2003). Extending the Upper Temperature Limit for Life. Science 301, 934–934. Kenkmann, T., Collins, G.S., Wittmann, A., Wünnemann, K., Reimold, W.U., and Melosh, H.J. (2009). A model for the formation of the Chesapeake Bay impact crater as revealed by drilling and numerical simulation. Geol. Soc. Am. Spec. Pap. 458, 571–585. Kent, W.J. (2002). BLAT—The BLAST-Like Alignment Tool. Genome Res. 12, 656–664. Kieffer, S.W. (1971). Shock metamorphism of the Coconino Sandstone at Meteor Crater, Arizona. J. Geophys. Res. 76, 5449–5473. Kieft, T.L., Fredrickson, J.K., McKinley, J.P., Bjornstad, B.N., Rawson, S.A., Phelps, T.J., Brockman, F.J., and Pfiffner, S.M. 
(1995). Microbiological comparisons within and across contiguous lacustrine, paleosol, and fluvial subsurface sediments. Appl. Environ. Microbiol. 61, 749–757. Kircher, M. (2012). Analysis of high-throughput ancient DNA sequencing data. In Ancient DNA, (Springer), pp. 197–228. Kirsimäe, K., Suuroja, S., Kirs, J., Kärki, A., Polikarpus, M., Puura, V., and Suuroja, K. (2002). Hornblende alteration and fluid inclusions in Kärdla impact crater, Estonia: Evidence for impact-induced hydrothermal activity. Meteorit. Planet. Sci. 37, 449–457.

Koeberl, C., Poag, C.W., Reimold, W.U., and Brandt, D. (1996). Impact Origin of the Chesapeake Bay Structure and the Source of the North American Tektites. Science 271, 1263–1266. Kring, D.A. (1997). Air blast produced by the Meteor Crater impact event and a reconstruction of the affected environment. Meteorit. Planet. Sci. 32, 517–530. Kulpecz, A.A., Miller, K.G., Browning, J.V., Edwards, L.E., Powars, D.S., McLaughlin, P.P., Harris, A.D., and Feigenson, M.D. (2009). Postimpact deposition in the Chesapeake Bay impact structure: Variations in eustasy, compaction, sediment supply, and passive-aggressive tectonism. Geol. Soc. Am. Spec. Pap. 458, 811–837. Langmead, B., Trapnell, C., Pop, M., and Salzberg, S.L. (2009). Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25. Leonard, J.A., Shanks, O., Hofreiter, M., Kreuz, E., Hodges, L., Ream, W., Wayne, R.K., and Fleischer, R.C. (2007). Animal DNA in PCR reagents plagues ancient DNA research. J. Archaeol. Sci. 34, 1361–1366. Lin, L.-H., Wang, P.-L., Rumble, D., Lippmann-Pipke, J., Boice, E., Pratt, L.M., Lollar, B.S., Brodie, E.L., Hazen, T.C., and Andersen, G.L. (2006). Long-term sustainability of a high-energy, low-diversity crustal biome. Science 314, 479–482. Lindahl, T. (1993). Instability and decay of the primary structure of DNA. Nature 362, 709–715. Lindgren, P., Ivarsson, M., Neubeck, A., Broman, C., Henkel, H., and Holm, N.G. (2010). Putative fossil life in a hydrothermal system of the Dellen impact structure, Sweden. Int. J. Astrobiol. 9, 137–146. Lipp, J.S., Morono, Y., Inagaki, F., and Hinrichs, K.-U. (2008). Significant contribution of Archaea to extant biomass in marine subsurface sediments. Nature 454, 991–994. Lomstein, B.A., Langerhuus, A.T., D’Hondt, S., Jørgensen, B.B., and Spivack, A.J. (2012). Endospore abundance, microbial growth and necromass turnover in deep sub-seafloor sediment. Nature 484, 101–104. Long, S.R. (1996). 
Rhizobium symbiosis: nod factors in perspective. Plant Cell 8, 1885. Long, S.R. (2001). Genes and signals in the Rhizobium-legume symbiosis. Plant Physiol. 125, 69–72. Lowenstein, T.K., Schubert, B.A., and Timofeeff, M.N. (2011). Microbial communities in fluid inclusions and long-term survival in halite. GSA Today 21, 4–9. Malinconico, M.L., Sanford, W.E., and Horton, J.W. (2009). Postimpact heat conduction and compaction-driven fluid flow in the Chesapeake Bay impact structure based on downhole vitrinite reflectance data, ICDP-USGS Eyreville deep core holes and Cape Charles test holes. Geol. Soc. Am. Spec. Pap. 458, 905–930. Meyer, M., and Kircher, M. (2010). Illumina Sequencing Library Preparation for Highly Multiplexed Target Capture and Sequencing. Cold Spring Harb. Protoc. 2010, pdb.prot5448.

Meyer, F., Paarmann, D., D’Souza, M., Olson, R., Glass, E.M., Kubal, M., Paczian, T., Rodriguez, A., Stevens, R., Wilke, A., et al. (2008). The metagenomics RAST server – a public resource for the automatic phylogenetic and functional analysis of metagenomes. BMC Bioinformatics 9, 386. Michalski, J.R., Cuadros, J., Niles, P.B., Parnell, J., Deanne Rogers, A., and Wright, S.P. (2013). Groundwater activity on Mars and implications for a deep biosphere. Nat. Geosci. 6, 133–138. Miller, D.N., Bryant, J.E., Madsen, E.L., and Ghiorse, W.C. (1999). Evaluation and Optimization of DNA Extraction and Purification Procedures for Soil and Sediment Samples. Appl. Environ. Microbiol. 65, 4715–4724. Moissl, C., Osman, S., La Duc, M.T., Dekas, A., Brodie, E., DeSantis, T., and Venkateswaran, K. (2007). Molecular bacterial community analysis of clean rooms where spacecraft are assembled. FEMS Microbiol. Ecol. 61, 509–521. Morono, Y., Terada, T., Nishizawa, M., Ito, M., Hillion, F., Takahata, N., Sano, Y., and Inagaki, F. (2011). Carbon and nitrogen assimilation in deep subseafloor microbial cells. Proc. Natl. Acad. Sci. 108, 18295–18300. Ondov, B.D., Bergman, N.H., and Phillippy, A.M. (2011). Interactive metagenomic visualization in a Web browser. BMC Bioinformatics 12, 385. Onstott, T.C., Phelps, T.J., Kieft, T., Colwell, F.S., Balkwill, D.L., Fredrickson, J.K., and Brockman, F.J. (1999). A global perspective on the microbial abundance and activity in the deep subsurface. Enigmatic Microorg. Life Extreme Environ. Kluwer Acad. Publ. Dordr. Neth. 487–500. Osinski, G.R., Spray, J.G., and Lee, P. (2001). Impact-induced hydrothermal activity within the Haughton impact structure, arctic Canada: Generation of a transient, warm, wet oasis. Meteorit. Planet. Sci. 36, 731–745. Overbeek, R., Begley, T., Butler, R.M., Choudhuri, J.V., Chuang, H.-Y., Cohoon, M., de Crecy-Lagard, V., Diaz, N., Disz, T., Edwards, R., et al. (2005). 
The Subsystems Approach to Genome Annotation and its Use in the Project to Annotate 1000 Genomes. Nucleic Acids Res. 33, 5691–5702. Parkes, R.J., Cragg, B.A., Bale, S.J., Getliff, J.M., Goodman, K., Rochelle, P.A., Fry, J.C., Weightman, A.J., and Harvey, S.M. (1994). Deep bacterial biosphere in Pacific Ocean sediments. Nature 371, 410–413. Parkes, R.J., Cragg, B.A., and Wellsbury, P. (2000). Recent studies on bacterial populations and processes in subseafloor sediments: a review. Hydrogeol. J. 8, 11–28. Parks, D.H., and Beiko, R.G. (2010). Identifying biologically relevant differences between metagenomic communities. Bioinformatics 26, 715–721. Parnell, J., Lee, P., Cockell, C.S., and Osinski, G.R. (2004). Microbial colonization in impact-generated hydrothermal sulphate deposits, Haughton impact structure, and implications for sulphates on Mars. Int. J. Astrobiol. 3, 247–256.

Parnell, J., Boyce, A., Thackrey, S., Muirhead, D., Lindgren, P., Mason, C., Taylor, C., Still, J., Bowden, S., Osinski, G.R., et al. (2010). Sulfur isotope signatures for rapid colonization of an impact crater by thermophilic microbes. Geology 38, 271–274. Phelps, T.J., Murphy, E.M., Pfiffner, S.M., and White, D.C. (1994). Comparison between geochemical and biological estimates of subsurface microbial activities. Microb. Ecol. 28, 335–349. Poag, C.W., Powars, D.S., Poppe, L.J., and Mixon, R.B. (1994). Meteoroid mayhem in Ole Virginny: Source of the North American tektite strewn field. Geology 22, 691–694. Powars, D.S., and Bruce, T.S. (1999). The effects of the Chesapeake Bay impact crater on the geologic framework and the correlation of hydrogeologic units of southeastern Virginia, south of the James River. US Geol. Surv. Prof. Pap. 1612. Price, P.B., and Sowers, T. (2004). Temperature dependence of metabolic rates for microbial growth, maintenance, and survival. Proc. Natl. Acad. Sci. U. S. A. 101, 4631–4636. Rastogi, G., Osman, S., Kukkadapu, R., Engelhard, M., Vaishampayan, P.A., Andersen, G.L., and Sani, R.K. (2010). Microbial and Mineralogical Characterizations of Soils Collected from the Deep Biosphere of the Former Homestake Gold Mine, South Dakota. Microb. Ecol. 60, 539–550. Rathbun, J.A., and Squyres, S.W. (2002). Hydrothermal Systems Associated with Martian Impact Craters. Icarus 157, 362–372. Roh, Y., Liu, S.V., Li, G., Huang, H., Phelps, T.J., and Zhou, J. (2002). Isolation and characterization of metal-reducing Thermoanaerobacter strains from deep subsurface environments of the Piceance Basin, Colorado. Appl. Environ. Microbiol. 68, 6013–6020. Rohland, N., and Hofreiter, M. (2007). Ancient DNA extraction from bones and teeth. Nat. Protoc. 2, 1756–1762. Røy, H., Kallmeyer, J., Adhikari, R.R., Pockalny, R., Jørgensen, B.B., and D’Hondt, S. (2012). Aerobic microbial respiration in 86-million-year-old deep-sea red clay. Science 336, 922–925. 
Sachse, D., Billault, I., Bowen, G.J., Chikaraishi, Y., Dawson, T.E., Feakins, S.J., Freeman, K.H., Magill, C.R., McInerney, F.A., van der Meer, M.T.J., et al. (2012). Molecular Paleohydrology: Interpreting the Hydrogen-Isotopic Composition of Lipid Biomarkers from Photosynthesizing Organisms. Annu. Rev. Earth Planet. Sci. 40, 221–249. Sanford, W. (2003). Heat flow and brine generation following the Chesapeake Bay bolide impact. J. Geochem. Explor. 78–79, 243–247. Sanford, W.E., Voytek, M.A., Powars, D.S., Jones, B.F., Cozzarelli, I.M., Cockell, C.S., and Eganhouse, R.P. (2009). Pore-water chemistry from the ICDP-USGS core hole in the Chesapeake Bay impact structure—Implications for paleohydrology, microbial habitat, and water resources. Geol. Soc. Am. Spec. Pap. 458, 867–890. Sass, H., and Cypionka, H. (2004). Isolation of Sulfate-Reducing Bacteria from the Terrestrial Deep Subsurface and Description of Desulfovibrio cavernae sp. nov. Syst. Appl. Microbiol. 27, 541–548.

Schippers, A., Neretin, L.N., Kallmeyer, J., Ferdelman, T.G., Cragg, B.A., Parkes, R.J., and Jørgensen, B.B. (2005). Prokaryotic cells of the deep sub-seafloor biosphere identified as living bacteria. Nature 433, 861–864. Schouten, S., Hopmans, E.C., and Sinninghe Damsté, J.S. (2013). The organic geochemistry of glycerol dialkyl glycerol tetraether lipids: A review. Org. Geochem. 54, 19–61. Schubert, B.A., Lowenstein, T.K., and Timofeeff, M.N. (2009). Microscopic identification of prokaryotes in modern and ancient halite, Saline Valley and Death Valley, California. Astrobiology 9, 467–482. Schulte, P., Alegret, L., Arenillas, I., Arz, J.A., Barton, P.J., Bown, P.R., Bralower, T.J., Christeson, G.L., Claeys, P., Cockell, C.S., et al. (2010). The Chicxulub Asteroid Impact and Mass Extinction at the Cretaceous-Paleogene Boundary. Science 327, 1214–1218. Schwenzer, S.P., and Kring, D.A. (2009). Impact-generated hydrothermal systems capable of forming phyllosilicates on Noachian Mars. Geology 37, 1091–1094. Sheu, C., Wu, C.-Y., Chen, S.-C., and Lo, C.-C. (2008). Extraction of DNA from soil for analysis of bacterial diversity in transgenic and nontransgenic papaya sites. J. Agric. Food Chem. 56, 11969–11975. Spang, A., Hatzenpichler, R., Brochier-Armanet, C., Rattei, T., Tischler, P., Spieck, E., Streit, W., Stahl, D.A., Wagner, M., and Schleper, C. (2010). Distinct gene set in two different lineages of ammonia-oxidizing archaea supports the Thaumarchaeota. Trends Microbiol. 18, 331–340. Squyres, S.W., Arvidson, R.E., Bell, J.F., Calef, F., Clark, B.C., Cohen, B.A., Crumpler, L.A., Souza, P.A. de, Farrand, W.H., Gellert, R., et al. (2012). Ancient Impact and Aqueous Processes at Endeavour Crater, Mars. Science 336, 570–576. Takai, K., Hirayama, H., Sakihama, Y., Inagaki, F., Yamato, Y., and Horikoshi, K. (2002). Isolation and metabolic characteristics of previously uncultured members of the order Aquificales in a subsurface gold mine. Appl. Environ. Microbiol. 
68, 3046–3054. Takai, K., Mormile, M.R., McKinley, J.P., Brockman, F.J., Holben, W.E., Kovacik, W.P., and Fredrickson, J.K. (2003). Shifts in archaeal communities associated with lithological and geochemical variations in subsurface Cretaceous rock. Environ. Microbiol. 5, 309–320. Temperton, B., and Giovannoni, S.J. (2012). Metagenomics: microbial diversity through a scratched lens. Curr. Opin. Microbiol. 15, 605–612. Teske, A.P. (2005). The deep subsurface biosphere is alive and well. Trends Microbiol. 13, 402–404. Teske, A., Dhillon, A., and Sogin, M.L. (2003). Genomic Markers of Ancient Anaerobic Microbial Pathways: Sulfate Reduction, Methanogenesis, and Methane Oxidation. Biol. Bull. 204, 186–191. Toon, O.B., Zahnle, K., Morrison, D., Turco, R.P., and Covey, C. (1997). Environmental perturbations caused by the impacts of asteroids and comets. Rev. Geophys. 35, 41–78.

Venkateswaran, K., Satomi, M., Chung, S., Kern, R., Koukol, R., Basic, C., and White, D. (2001). Molecular Microbial Diversity of a Spacecraft Assembly Facility. Syst. Appl. Microbiol. 24, 311–320. Wilke, A., Harrison, T., Wilkening, J., Field, D., Glass, E.M., Kyrpides, N., Mavrommatis, K., and Meyer, F. (2012). The M5nr: a novel non-redundant database containing protein sequences and annotations from multiple sources and associated tools. BMC Bioinformatics 13, 141. Willerslev, E., and Cooper, A. (2005). Review Paper. Ancient DNA. Proc. R. Soc. B Biol. Sci. 272, 3–16. Willerslev, E., Hansen, A.J., and Poinar, H.N. (2004). Isolation of nucleic acids and cultures from fossil ice and permafrost. Trends Ecol. Evol. 19, 141–147. Wilms, R., Köpke, B., Sass, H., Chang, T.S., Cypionka, H., and Engelen, B. (2006). Deep biosphere-related bacteria within the subsurface of tidal flat sediments. Environ. Microbiol. 8, 709–719. Young, J.P.W., and Johnston, A.W.B. (1989). The evolution of specificity in the legume-Rhizobium symbiosis. Trends Ecol. Evol. 4, 341–349. Yu, Z., and Morrison, M. (2004). Improved extraction of PCR-quality community DNA from digesta and fecal samples. Biotechniques 36, 808–813. Zhou, J., Bruns, M.A., and Tiedje, J.M. (1996). DNA recovery from soils of diverse composition. Appl. Environ. Microbiol. 62, 316–322.

Chapter 4: Identifying human haplotypes from DNA transferred from objects during use

Abstract: DNA transferred from humans to objects is routinely recovered and analyzed by forensic scientists. However, such transfer DNA has very rarely been recovered from historic or ancient samples. Items that were well used and in frequent contact with skin or body fluids are ideally suited for such analysis. A cache of Native American moccasins and other human artifacts found in a cave on the Great Salt Lake provides a useful sample set on which to test the ability to recover transfer ancient DNA. In this paper, we report authentic Native American haplotypes found in 700-year-old moccasins, mittens, and cordage. Transfer ancient DNA potentially provides a large sample set of items that could aid in reconstructing the genetics of ancient populations.

Introduction

DNA transferred to objects

DNA is easily transferred from humans to objects through production and use. This transferred DNA is routinely recovered by forensic scientists from a wide range of objects including, but not limited to, tools, clothing, shoes, knives, vehicles, firearms, food, bedding, condoms, lip cosmetics, wallets, jewelry, glass, skin, paper, cables, windows, doors, and stone (see review in van Oorschot et al., 2010). In archaeology, analysis of transferred lipids from a variety of items including vessels, slab-lined pits, lamps, and seals is also routinely conducted (see review in Brown and Brown, 2013). Only a small number of studies have been conducted on transfer DNA from ancient samples. These studies examine the DNA transferred from a gestating bird to a fossil eggshell (Oskam and Bunce, 2012; Oskam et al., 2010); the DNA transferred from plants to cooking or storage vessels (Hansson and Foley, 2008; McGovern et al., 2009); and the DNA transferred from humans to quids (chewed and discarded wads of plant material) and aprons (fiber material that contains menstrual blood) from Native North Americans (LeBlanc et al., 2007). Therefore, although ancient DNA (aDNA) can in principle be recovered from objects to which it was transferred, only one study has examined human DNA transferred to objects, which are potentially a very large source of information about past populations.

Political and cultural challenges of working with human remains

Working with human remains poses a unique set of political and cultural concerns, especially when dealing with artifacts or remains from Native Americans. Because many artifacts were stolen or seized during the last two centuries without regard for Native American beliefs, practices, or cultural history, the Native American Graves Protection and Repatriation Act (NAGPRA) was passed in 1990 to protect Native American rights to their cultural heritage (Gunn, 2009). Scientific study of such artifacts and remains depends upon permission from the indigenous people with responsibility for them, and research must be discussed and shared openly with “those with a direct cultural and/or religious interest in the material” (Jones and Harris, 1998). Each Native American culture and nation has different beliefs about the treatment of human remains and associated artifacts, which range from allowing no studies, to allowing no invasive studies (such as most DNA analyses), to allowing all types of study (Mayes, 2010).

In addition, different tribes will allow genetic material to be used for some research purposes but not for others. This can be a particular problem for questions of tribal origins (Reardon and TallBear, 2012), which are often a research goal when extracting aDNA. Stuart Kirsch (2011) suggests that the best approach is negotiation that recognizes the conflicting domains of property, kinship, and science involved, a process that takes time and dedication from the anthropologists, archaeologists, and others who work with human remains and artifacts.

We obtained permission and cooperation for genetic analysis of transferred DNA on recovered artifacts from both the owners of the site and the local Native American tribal council. Permission was given to sample and sequence mitochondrial DNA from living individuals, individuals associated with the excavation, and living members of the local Native American tribe. Permission was limited to mitochondrial DNA analysis, and we agreed to consult the tribal council at each stage of the research process and prior to publication, to ensure that tribal members clearly understood how the data were obtained and analyzed and how the results were interpreted.

Challenges of DNA recovery from transferred objects

In addition to the challenges of working with human artifacts, transfer aDNA poses a number of problems, including difficult conditions for preservation, ease of contamination, and multiple DNA profiles on the items. First, items to which DNA has been transferred from humans will have less initial DNA than a direct product of the human body such as bone or tissue. In addition, the DNA transferred to the item is generally found only on the surface and is therefore not protected by outer layers, as the DNA in human tissues such as teeth is (Adler et al., 2011). Finally, these samples come from a warm environment, which is not ideal for DNA preservation (Smith et al., 2003), although some cave microenvironments offer good preservation conditions because of their stable temperature and humidity (Letts and Shapiro, 2012). Transfer aDNA is therefore subject to a difficult preservation environment and is expected to be fragmented and damaged.

Because of the expected low number of starting cells and the lack of a removable outer layer, these samples are particularly prone to contamination during excavation and extraction. This problem is complicated by the fact that the samples are human, and thus sources of contamination are inherently present during excavation and extraction (Pilli et al., 2013). Therefore, precautions to avoid contamination (such as wearing protective clothing and sampling quickly after excavation) must be taken, and the analysis must take into account that contamination is likely to be present.

Beyond preservation and contamination, transfer aDNA poses an additional problem in that the original item was often used by more than one individual. If these individuals were closely related, it will be difficult to distinguish between them, especially with fragmented, damaged samples. If the individuals were not closely related, multiple haplotypes will be present. With short, fragmented DNA, individual SNPs cannot be confirmed to come from the same individual, which further complicates the analysis.

In order to better assess the possibilities of recovering transfer aDNA from items, we investigated moccasins, mittens, and cordage from a cave site on the Great Salt Lake. These items show evidence of frequent use and would have been in contact with skin, making them ideal candidates for recovery of transfer aDNA. We tested 35 items for the presence of Native American DNA. Samples in which Native American DNA was detected were further examined to identify the haplogroup or haplogroups present using 39 mitochondrial single nucleotide polymorphisms (SNPs).
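The screening logic described above, a panel of diagnostic SNPs applied to a sample that may carry more than one haplotype, can be illustrated with a minimal sketch. The positions, alleles, and haplogroup labels in `PANEL` are hypothetical placeholders and are not the 39 SNPs used in this study; the function names and the `min_reads` threshold are likewise assumptions made only for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical diagnostic panel: position -> (diagnostic allele, haplogroup label).
# Placeholder values for illustration only; NOT the 39 SNPs used in this study.
PANEL = {
    101: ("G", "hgA"),
    202: ("T", "hgB"),
    303: ("A", "hgC"),
}


def tally_alleles(observations):
    """Count observed alleles at each panel position.

    observations: iterable of (position, allele) pairs, one pair per
    sequence read (or clone) covering a panel SNP.
    """
    counts = defaultdict(Counter)
    for pos, allele in observations:
        if pos in PANEL:
            counts[pos][allele.upper()] += 1
    return counts


def call_haplogroups(counts, min_reads=2):
    """Return haplogroups whose diagnostic allele has at least min_reads support."""
    found = set()
    for pos, (allele, hg) in PANEL.items():
        if counts.get(pos, Counter())[allele] >= min_reads:
            found.add(hg)
    return sorted(found)


def mixed_sites(counts, min_reads=2):
    """Return positions where two or more alleles each have min_reads support."""
    return sorted(
        pos for pos, c in counts.items()
        if sum(1 for n in c.values() if n >= min_reads) >= 2
    )
```

A position returned by `mixed_sites` would be consistent with either multiple users of an item or contamination; the sketch cannot distinguish the two, and, as noted above, short fragments do not allow alleles at different positions to be linked to the same individual.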

Cave Site

Thirty-eight samples were collected from a cave on the north shore of the Great Salt Lake, originally excavated in the 1930s and subsequently excavated over two field seasons in 2010 and 2011. The original excavations revealed signs of occupation around 700 years ago (radiocarbon-dated materials). Items recovered from the site included animal and plant remains: bison, antelope, deer, and elk bones, horn, hair, and hide, as well as marsh plants. Evidence of hunting and hide preparation was also present, including arrow points, cane arrow shafts, hardwood foreshafts, bow fragments, knife handles, scrapers, and bone fleshers. Other domestic items found were slate knives, basketry, cordage, mittens, and matting of various plant materials. In addition, pieces from a game and an etched slate were found. The most numerous objects found were moccasins. Over 250 were excavated in the original dig; most were made from bison leather with fur lining, and some from deer or antelope leather or with plant fiber linings. Many of these artifacts were stored at the Utah Museum of Natural History, and some of the moccasins were reconstructed using conservation techniques.

Additional excavations over two field seasons in 2011 and 2013 were conducted by John W. Ives in order to assess, using modern standards, some of the remaining deposits and cave environments for stratigraphic, zooarchaeological, macrobotanical, and ancient DNA research. These excavations recovered additional material remains, including more moccasins (Figure 4-1), mittens, and cordage, which were sampled during excavation for aDNA analysis. The moccasins and mittens are well worn, and some contain wear holes; they are therefore prime targets for DNA transferred from the individuals who wore them. Cordage is made by rubbing plant fibers between the hands and thighs, occasionally with the application of saliva, so it is also a prime target for the transfer of a large amount of DNA. In addition, some of the cordage samples had hair twisted in, although the origin of the hair is unknown.

Figure 4-1: Moccasin (42B01 10241) with porcupine quill applique and piping decorating the toe and apron seam (Source: Utah Museum of Natural History).

Materials and Methods

Sampling

Two different types of samples were obtained from the items. The first set of samples was taken by Beth Shapiro from moccasins excavated in the 1930s. Some of these moccasins had been curated with different techniques. Samples were taken with sterile swabs by swabbing the interior of the moccasin in the toe area, where there would be the least chance of contamination, and around any holes made from wear, which would potentially be the areas with the most endogenous DNA.

A second set of samples was taken by Beth Shapiro and John W. Ives, both from moccasins in the 1930s collection and from the fresh excavation in 2011. Freshly excavated items were sampled before processing by personnel wearing protective clothing, including hair nets, face masks, and gloves, in order to introduce the least amount of contamination. Samples were cut from the foot pad, toe, or areas near wear holes of moccasins, from the interior of mittens, and from parts of cordage. All samples were then double bagged and shipped to the Ancient DNA Laboratory at The Pennsylvania State University, where they were stored at room temperature until subsampling and extraction. Under a flow hood, and wearing a body suit, face mask, and double gloves, we subsampled the moccasins and mittens to pieces of approximately 1 cm × 1 cm and the cordage to lengths of approximately 1 cm for extraction. These subsamples were then cut into smaller pieces and placed into a sterile 1.5 ml Eppendorf tube for extraction. Between sample preparations, the flow hood and the scissors used for subsampling were washed with bleach, ethanol, and sterile water. In addition, the outer layer of gloves was replaced between samples.

Extraction

In order to decrease contamination from humans, all extraction work was conducted in a flow hood that was cleaned with bleach and ethanol and UV irradiated before use. All reagents were made from freshly opened bottles in small batches and, when possible, were UV irradiated for 1 hour prior to use. In addition, all plasticware and consumables were UV irradiated for 1 hour prior to use. Negative extraction controls (no sample) were processed before the samples and again between every five samples in order to distinguish contamination introduced by handling from contamination introduced by reagents.
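The placement of negative extraction controls described above (one before the first sample, then one between every five samples) can be sketched as a simple ordering function. The function name, the `NEG` labels, and the `block` parameter are hypothetical and are used only to make the scheme explicit; they are not part of any laboratory protocol cited here.

```python
def processing_order(samples, block=5, blank="NEG"):
    """Interleave negative extraction controls with samples.

    A blank is placed before the first sample and again before each
    subsequent group of `block` samples, so that blanks fall between
    every five samples when block=5. Blank labels are hypothetical.
    """
    order = []
    n_blank = 0
    for i, sample in enumerate(samples):
        if i % block == 0:
            n_blank += 1
            order.append(f"{blank}{n_blank}")
        order.append(sample)
    return order


if __name__ == "__main__":
    # Example with seven placeholder sample names.
    print(processing_order([f"S{i}" for i in range(1, 8)]))
```

Under this scheme a batch of 14 samples would include three blanks, placed before samples 1, 6, and 11.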

The first set of 14 swab samples (Table 4-1) was extracted using a protocol for low copy number and touch-type biological evidence from the Penn State University Forensic Science Program (PSU 0026 V2). This protocol was modified from Schiffner et al. (2005) and Tamariz et al. (2006). The swab was cut into a 1.5 ml Eppendorf tube containing 250 µl of LCN Stain Extraction Buffer (0.01% SDS; 10 mM EDTA; 100 mM NaCl; 7.6 mM Tris-HCl, pH 8.0) and 10 µl of Proteinase K (10 mg/ml). The sample was incubated at 56°C for 30 min and then heated to 99°C for 10 min to inactivate the Proteinase K. The extract and swab were then transferred to a Micron-50 column in a fresh 1.5 ml tube. Between transfers the forceps were cleaned with bleach, ethanol, and water. The Micron column was centrifuged at 1000 × g for 15 min (until the volume left in the column was approximately 25 µl). Then 250 µl of TE buffer (10 mM Tris-HCl; 0.1 mM EDTA, pH 7.5) was added to the column, followed by additional centrifugation at 1000 × g for 15 min until the volume left in the column was again approximately 25 µl. The Micron column was inverted into a clean 1.5 ml tube and centrifuged at 1000 × g for 3 min. The resulting extract was stored at -20°C until amplification.
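As a rough check on the volumes in the swab protocol above, the dilution arithmetic can be sketched as follows. The calculation assumes ideal mixing, ignores liquid absorbed by the swab, and assumes that small buffer solutes pass freely through the column membrane; the variable and function names are ours and are not part of the PSU protocol.

```python
def final_concentration(stock_conc, stock_vol, total_vol):
    """Solve C1 * V1 = C2 * V2 for C2 (units follow the inputs)."""
    return stock_conc * stock_vol / total_vol


# Volumes and stock concentration as stated in the protocol text.
buffer_ul = 250.0            # LCN Stain Extraction Buffer
prot_k_ul = 10.0             # Proteinase K added
prot_k_stock_mg_ml = 10.0    # Proteinase K stock concentration

total_ul = buffer_ul + prot_k_ul
prot_k_final_mg_ml = final_concentration(prot_k_stock_mg_ml, prot_k_ul, total_ul)

# After the first spin about 25 µl remains; 250 µl of TE is then added.
# Assumed: the fraction of original buffer solutes left after the wash
# equals retained volume / (retained volume + wash volume).
retentate_ul = 25.0
te_wash_ul = 250.0
carryover_fraction = retentate_ul / (retentate_ul + te_wash_ul)

if __name__ == "__main__":
    print(f"Proteinase K during digestion: {prot_k_final_mg_ml:.2f} mg/ml")
    print(f"Extraction buffer carried over after TE wash: {carryover_fraction:.1%}")
```

Under these assumptions the Proteinase K concentration during digestion is about 0.38 mg/ml, and roughly 9% of the original extraction buffer solutes remain after the single TE wash.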

21 small fragments of moccasin, mitten, or cordage from the 2011 excavation and 3 small fragments from moccasins excavated in 1937 (Table 4-1) were extracted using a silica based extraction method tested for aDNA samples (Rohland and Hofreiter, 2007). Although usually used for bone samples, the silica based method preserves highly damaged and fragmented DNA. Since the human DNA was at the surface of the sample, the leather or hemp did not need to be degraded to retrieve the desired DNA and therefore a stronger extraction buffer was not necessary. The final elution volume was 50 µl of TE with 0.05% Tween.


Sample | Excavation Number | Sample Type | Material | Excavation | Location | Age (14C date)
UP10.KK.001 | Drawer A14 | Swab | moccasin | 1937 | Cave 1 |
UP10.KK.002 | 42B01-16146 | Swab | moccasin | 1937 | Cave 1 |
UP10.KK.003 | 42B01-11574 | Swab | moccasin | 1937 | Cave 1 |
UP10.KK.004 | 42B01-10145 | Swab | moccasin | 1937 | Cave 1 |
UP10.KK.005 | 42B01-10104 | Swab | moccasin | 1937 | Cave 1 |
UP10.KK.006 | 42B01-16146 | Swab | moccasin | 1937 | Cave 1 |
UP10.KK.007 | 42B01-9748 | Swab | moccasin | 1937 | Cave 1 |
UP10.KK.008 | 42B01-9753 | Swab | moccasin | 1937 | Cave 1 |
UP10.KK.009 | 42B01-9748 | Swab | moccasin | 1937 | Cave 1 |
UP10.KK.010 | 42B01-10346 | Swab | moccasin | 1937 | Cave 1 |
UP10.KK.011 | 42B01-11582.59 | Swab | moccasin | 1937 | Cave 1 |
UP10.KK.012 | 42B01-9748 | Swab | moccasin | 1937 | Cave 1 |
UP10.KK.013 | 42B01-10102 | Swab | moccasin | 1937 | Cave 1 |
UP10.KK.014 | 42B01-115822 | Swab | moccasin | 1937 | Cave 1 |
UP11.KK001 | Cave #1 Moc #1 | Cut material | moccasin* | 1937 | Cave 1 |
UP11.KK002 | FS116 | Cut material | moccasin | 2011 | Cave 1 | 730±26
UP11.KK003 | Cave #3 | Cut material | moccasin* | 1937 | Cave 3 |
UP11.KK004 | FS126 | Cut material | moccasin | 2011 | Cave 1 |
UP11.KK005 | 42 B01 FS168 | Cut material | moccasin | 2011 | Cave 1 | 747±22
UP11.KK006 | FS130 | Cut material | moccasin | 2011 | Cave 1 |
UP11.KK007 | Cave #1 Moc #2 | Cut material | moccasin* | 1937 | Cave 1 |
UP11.KK008 | FS131 | Cut material | moccasin | 2011 | Cave 1 | 700±22
UP11.KK009 | FS173 | Cut material | moccasin | 2011 | Cave 1 | 706±25
UP11.KK010 | FS291 | Cut material | moccasin | 2011 | Cave 1 |
UP11.KK013 | 42 B01 FS801 | Cut material | moccasin | 2011 | Cave 1 | 777±22
UP11.KK014 | 42 B01 FS 799 | Cut material | moccasin | 2011 | Cave 1 |
UP11.KK015 | 42 B01 FS225 | Cut material | leather - possible mitten or mitten lining | 2011 | Cave 1 |
UP11.KK016 | 42 B01 FS235 | Cut material | leather - moccasin | 2011 | Cave 1 |
UP11.KK017 | 42 B01 FS291 | Cut material | plant fibre cordage | 2011 | Cave 1 |
UP11.KK018 | 42 B01 FS117 | Cut material | twisted sinew | 2011 | Cave 1 |
UP11.KK019 | 42 B01 FS71 | Cut material | plant fibre cordage | 2011 | Cave 1 |
UP11.KK020 | 42 B01 FS265 | Cut material | plant fibre cordage | 2011 | Cave 1 |
UP11.KK021 | 42 B01 FS263 | Cut material | thicker cordage | 2011 | Cave 1 |
UP11.KK022 | 42 B01 FS14 | Cut material | plant fibre cordage | 2011 | Cave 1 |
UP11.KK023 | 42 B01 FS615 | Cut material | plant fibre cordage | 2011 | Cave 1 |
UP11.KK024 | 42 B01 FS303 | Cut material | plant fibre cordage | 2011 | Cave 1 |
UP11.KK025 | 42 B01 FS568 | Cut material | bark cordage | 2011 | Cave 1 |
UP12.KK010 | 42 B01 FS305 | Cut material | moccasin | 2011 | Cave 1 | 725±24

Table 4-1: List of samples used in aDNA analysis, the object type, the type of sampling used, and the radiocarbon dates for the item (when available). Items marked with a * are from a private collection; other moccasins from 1937 are from the Utah Museum of Natural History. Samples from 2011 were taken during excavation.


Amplification

Extracts from the 14 swabs and 24 cut samples were amplified with three overlapping primer pairs from the HVS1 region of the mitochondrial genome (Table 4-2) to test for Native American haplogroups. PCR amplifications were performed in 25 µl reactions using 1 µl of extract, 1.25 U Platinum Taq Hi-Fidelity and 1X buffer (Invitrogen Ltd., UK), 2 mg/ml rabbit serum albumin (RSA; Sigma, fraction V), 2 mM MgSO4, 250 µM of each dNTP, and 1 µM of each primer. Thermal cycling consisted of 94°C for 2 min, followed by 35 cycles of denaturation at 94°C for 25-30 sec, annealing for 45 sec at a primer-specific temperature (Table 4-2), and extension at 68°C for 45 sec. Negative extraction controls as well as negative PCR controls (no extract) were amplified with each PCR reaction. Amplification products were purified and cloned using the TOPO TA Cloning Kit (Invitrogen) in 1/5 cloning reactions. For each amplification, 10-20 clones were sequenced at the Genomics Core Facility (The Huck Institutes of the Life Sciences, The Pennsylvania State University) on an Applied Biosystems 3730XL machine.
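The reaction composition above can be turned into pipetting volumes with the dilution relation C1·V1 = C2·V2. The sketch below is only an illustration: the final concentrations, the 25 µl reaction volume, the 1 µl of extract, and the 1.25 U of polymerase come from the text, whereas every stock concentration (10X buffer, 50 mM MgSO4, 10 mM dNTPs, 10 µM primers, 20 mg/ml RSA, 5 U/µl polymerase) is an assumption chosen for the example and not a value reported here.

```python
# Per-reaction volumes for a 25 ul PCR, using V_stock = C_final / C_stock * V_total.
# Final concentrations are from the text; all stock concentrations are ASSUMED.

REACTION_UL = 25.0   # from the text
EXTRACT_UL = 1.0     # from the text
TAQ_UNITS = 1.25     # from the text

# name: (assumed stock concentration, final concentration from the text)
DILUTED = {
    "10X buffer": (10.0, 1.0),
    "MgSO4 (mM)": (50.0, 2.0),
    "dNTPs (uM each)": (10_000.0, 250.0),
    "forward primer (uM)": (10.0, 1.0),
    "reverse primer (uM)": (10.0, 1.0),
    "RSA (mg/ml)": (20.0, 2.0),
}
TAQ_STOCK_U_PER_UL = 5.0  # assumed


def reaction_volumes():
    """Return a dict of component -> volume in ul for one reaction."""
    vols = {
        name: final / stock * REACTION_UL
        for name, (stock, final) in DILUTED.items()
    }
    vols["Taq"] = TAQ_UNITS / TAQ_STOCK_U_PER_UL
    vols["extract"] = EXTRACT_UL
    # Water makes up whatever volume is left.
    vols["water"] = REACTION_UL - sum(vols.values())
    return vols


if __name__ == "__main__":
    for name, ul in reaction_volumes().items():
        print(f"{name:22s} {ul:7.3f} ul")
```

With different stock concentrations only the `DILUTED` table and `TAQ_STOCK_U_PER_UL` need to change; the water volume is always computed last so that the total stays at 25 µl.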

SNP amplification design

Eighteen samples had clones which were positive for Native American DNA. These samples were chosen for further single nucleotide polymorphism (SNP) analysis to identify more conclusively the haplogroup(s) present. SNPs were chosen to identify several of the key Native American haplotypes (Álvarez-Iglesias et al., 2007; van Oven and Kayser, 2009; Vallone et al., 2004). As the DNA from the samples was expected to be damaged and fragmented, primers were designed to amplify short fragments. As the copy number was expected to be low, so that not every primer set would necessarily amplify in every sample, each haplotype distinction was designed to rest on multiple SNPs covered by different primer sets when possible (Figure 4-2). Amplification was attempted for 39 mitochondrial DNA (mtDNA) SNPs, one nuclear SNP, and 3 Y SNPs using 30 primer sets. When necessary, primers were designed using PrimerBLAST from NCBI with the human reference sequence (Table 4-2). All primers were also checked for cross amplification from the human genome using PrimerBLAST. Amplification was carried out according to the same protocol as for the HVS1 region primers above.

For each amplification, the corresponding negative extraction controls and a negative PCR control (no template) were also amplified. For most reactions, if any of these negative controls were positive for amplification, the entire PCR reaction was discarded and a new amplification was completed. For four amplifications, there was very slight contamination repeatedly in only one of the negative extraction controls. For these amplifications, the negative extraction controls that were positive were cloned and sequenced (as above). As none of these controls contained Native American DNA, the amplification products from the samples were used.

Library preparation and sequencing

As the samples were contaminated, a next generation sequencing method (Illumina) was chosen to provide greater depth of coverage of the PCR products than could be provided by cloning alone. PCR amplification products corresponding to each sample were pooled in roughly equal proportions. To concentrate them, the pooled products were then purified again before library preparation. For each of the 18 samples, 15 µl of the pooled PCR products was used to make an Illumina library according to the protocol of Meyer and Kircher (2010). These libraries were pooled and then sequenced at the Ancient DNA Laboratory at The University of California, Santa Cruz on an Illumina MiSeq platform with 150 bp paired end chemistry.

Sequence analysis and haplotype identification

Illumina data were merged using SeqPrep (https://github.com/jstjohn/SeqPrep). As all the PCR products were short enough to merge, all unmerged data were discarded. Primers were removed using a python script (Shapiro Lab). Using BWA (Li and Durbin, 2009) and Samtools (Li et al., 2009), the resulting reads were then mapped to the human reference sequence and duplicate reads were removed. As these samples could contain multiple Native American haplotypes from usage as well as modern human contamination, a consensus sequence could not be used to call the SNPs. Therefore each file containing the aligned, duplicate-removed reads from one sample was opened in Geneious R4 (http://www.geneious.com/) and each SNP location was inspected by eye for presence of the Native American SNP, contamination, and damage.
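The primer-removal script itself is not reproduced in this chapter. As a rough illustration of what such a step involves for merged amplicon reads, the following sketch strips a forward primer from the start of a read and the reverse complement of the reverse primer from its end, tolerating a few mismatches. The function names, the mismatch threshold, and the toy insert used in the usage note are all hypothetical; only the primer sequences (hum_1415F and hum_1415R, Table 4-2) come from this document.

```python
# Hypothetical sketch of primer trimming for merged amplicon reads.
# It is NOT the script used in this study; names and thresholds are assumptions.

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def mismatches(a, b):
    """Number of differing positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def trim_primers(read, fwd, rev, max_mm=2):
    """Return the insert between the primers, or None if either primer is missing."""
    rev_rc = revcomp(rev)
    if len(read) < len(fwd) + len(rev_rc):
        return None
    head = read[:len(fwd)]
    tail = read[-len(rev_rc):]
    if mismatches(head, fwd) > max_mm or mismatches(tail, rev_rc) > max_mm:
        return None
    return read[len(fwd):len(read) - len(rev_rc)]


def trim_either_strand(read, fwd, rev, max_mm=2):
    """Try the read as given, then its reverse complement."""
    insert = trim_primers(read, fwd, rev, max_mm)
    if insert is None:
        insert = trim_primers(revcomp(read), fwd, rev, max_mm)
    return insert


# Primer pair hum_1415F / hum_1415R from Table 4-2.
FWD_1415 = "ACCCCAGAAAACTACGATAGCCC"
REV_1415 = "GGGCCCTGTTCAACTAAGCACTCT"
```

For example, a read built as `FWD_1415 + "TTACGGA" + revcomp(REV_1415)` (with a made-up insert) is returned by `trim_primers` as `"TTACGGA"`, and `trim_either_strand` gives the same result when the read is supplied in the opposite orientation.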


Forward | Reverse | Length | Tm | SNP(s)
L611 ACCTCCTCAAAGCAATACACTG | H743 GTGCTTGATGCTTGTTCCTTTTG | 176 | 56 | mt 663 (d)
hum_1415F ACCCCAGAAAACTACGATAGCCC | hum_1415R GGGCCCTGTTCAACTAAGCACTCT | 94 | 60 | mt 1415
hum_1719F CCAAACCCACTCCACCTTACT | hum_1719R GCGCCAGGTTTCAATTTCTA | 88 | 56 | mt 1719/1736 (a)
hum_3330F CAGTCAGAGGTTCAATTCCTCTT | hum_3330R GGGCCTTTGCGTAGTTGTAT | 142 | 60 | mt 3330 (b)
hum_3552F CATCTACCATCACCCTCTACATCA | hum_3552R ACCAGGGGGTTGGGTATG | 94 | 60 | mt 3547/3552 (a)
hum_4820F TAGCCCCCTTTCACTTCTGA | hum_4820R GGCTTACGTTTAGTGAGGGAGA | 135 | 56 | mt 4820/4824 (a)
hum_4977F ACGTAAGCCTTCTCCTCACTCTCTC | hum_4977R AGATTTTGCGTAGCTGGGTTTGGTT | 87 | 56 | mt 4977
hum_5176F TAACTACTACCGCATTCCTA | hum_5176R AAAGCCGGTTAGCGGCGGCA | | 56 | mt 5176
hum_8027F GAACCAGGCGACCTGCGACT | hum_8027R CCCGGTCGTGTAGCGGTGAA | 185 | 56 | mt 8027
hum_8281F TAGGGCCCGTATTTACCCTAT | hum_8281R AAGAGGTGTTGGTTCTCTTAATCTTT | 110 | 60 | mt 8281-8289 del (a)
hum_8913F CCCCTTATGAGCGGGCACAGT | hum_8913R ACGGCCAGGGCTATTGGTTGA | 156 | 60 | mt 8913
hum_10238F GCGTCCCTTTCTCCATAAAA | hum_10238R GGGTAAAAGGAGGGCAATTT | 80 | 58 | mt 10283 (b)
hum_10256F TATATCCCCCGCCCGCGTCC | hum_10256R TGTTTGTAGGGCTCATGGTAGGGG | 115 | 62 | mt 10256
hum_11177F TTCACAGCCACAGAACTAATCA | hum_11177R AGTGCGATGAGTAGGGGAAG | 164 | 60 | mt 11177 (b)
hum_11365F CACCCTAGGCTCACTAAACATTC | hum_11365R TTCGACATGGGCTTTAGGG | 160 | 60 | mt 11365 (b)
hum_12007F AACCACGTTTCTCCTGATCAAA | hum_12007R AATGTGGTGGGTGAGTGAGC | 121 | 60 | mt 11969/12007 (a)
hum_12705F CCCAAACATTAATCAGTTCTTCAA | hum_12705R TCTCAGCCGATGAACAGTTG | 102 | 60 | mt 12705 (b)
hum_13259F CGCCCTTACACAAAATGACATCAA | hum_13259R TCCTATTTTTCGAATATCTTGTTC | 208 | 56 | mt 13259
hum_13966F ACCGCACAATCCCCTATCTA | hum_13966R AGGTGATGATGGAGGTGGAG | 132 | 60 | mt 13966 (b)
hum_14470F CAAGACCTCAACCCCTGACC | hum_14470R GGGGGAGGTTATATGGGTTT | 129 | 56 | mt 14470 (c)
hum_14502F GACAACCATCATTCCCCCTA | hum_14502R CTATTTATGGGGGTTTAGTATTGATT | 125 | 56 | mt 14502 (a)
hum_15535F CGGCTTACTTCTCTTCCTTCTCT | hum_15535R ATATCATTCGGGCTTGATGTG | 130 | 56 | mt 15335 (a)
15986F (e) GCACCCAAAGCTAAGATTCTAATTT | 16153R CAGGTGGTCAAGTATTTATGGT | 168 | 56 | mt 16093/16111 (d)
16106F (e) GCCAGCCACCATGAATATTGT | 16251R GGAGTTGCAGTTGATGTGTGAT | 146 | 56 | mt 16140/16189/16213/16217/16223 (d)
16190F (e) CCCCATGCTTACAAGCAAGT | 16355R GGGATTTGACTGTAATGTGCTATGT | 166 | 56 | mt 16256/16265/16278/16290/16319 (d)
hum_16483F TCGCTCCGGGCCCATAACAC | hum_16483R GGGGAACGTGTGGGCTATTTAGG | 102 | 58 | mt 16483
Hum_ALNas2F GCATCCTGATTACTCTGTCGT | Hum_ALNas2R TAAATTCATCGAACACTTTGGCAT | | 56 | nuc
Hum_YM9F GGACCCTGAAATACAGAACTGC | Hum_YM9R CGTTTGAACATGTCTAAATTAAAGAAAA | | 56 | Y
Hum_YRPS4YF CAGGGCAATAAACCTTGGAT | Hum_YRPS4YR GTGGCCAGCCTCTTATCTCTC | | 60 | Y
Hum_YP39F GTTCGAAAGGGGATCCCTGG | Hum_YP39R CCCGGGAGGTGGAGGTTATA | | 58 | Y

Table 4-2: Table of primers used for analysis including length of product, annealing temperature in degrees Celsius, location in the genome, and SNPs amplified on the fragment. a – primers from Álvarez-Iglesias et al. (2007), b – primers from van Oven and Kayser (2009), c – primers from Vallone et al. (2004), d – primers from Ripan Malhi (pers. com.), e – primers for the HVS1 region which were used to test for presence of Native American DNA.

After the SNPs were called for each sample, the haplotype tree was used to analyze each set of SNPs. Because the low copy number of the starting material made it possible that a particular SNP location in a particular sample would be covered only by reads from contamination, we did not require every characteristic SNP to be present to identify a sample as containing a particular haplotype. Rather, a haplotype was identified as present in a sample when at least one SNP was detected at each branching point of the tree leading to that haplotype.
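The identification rule described above (at least one detected SNP at every branching point leading to a haplotype) can be written down compactly. The sketch below is a minimal illustration under stated assumptions: the tree, the haplotype labels `hap1` and `hap2`, and the SNP labels are placeholders invented for the example and do not correspond to the SNP assignments in Figure 4-2; the SNP calls in this study were made by eye, not by code.

```python
# Minimal sketch of the "one SNP per branching point" rule.
# The tree below is a PLACEHOLDER: haplotype and SNP labels are hypothetical.

# haplotype -> ordered list of branching points; each branching point is the
# set of SNPs that define it (more than one SNP where redundancy was designed in).
TOY_TREE = {
    "hap1": [{"snp_a", "snp_b"}, {"snp_c"}],
    "hap2": [{"snp_a", "snp_b"}, {"snp_d", "snp_e"}],
}


def call_haplotypes(observed_snps, tree):
    """Return the haplotypes for which every branching point has >= 1 observed SNP."""
    observed = set(observed_snps)
    return sorted(
        hap for hap, path in tree.items()
        if all(branch & observed for branch in path)
    )


def branch_support(observed_snps, tree):
    """For each haplotype, count how many of its branching points are supported."""
    observed = set(observed_snps)
    return {
        hap: (sum(1 for branch in path if branch & observed), len(path))
        for hap, path in tree.items()
    }
```

A sample with `snp_a` and `snp_d` would be called as `hap2` only, while a sample with `snp_b`, `snp_c`, and `snp_e` satisfies the rule for both placeholder haplotypes, which is the situation expected for an object used by more than one person. `branch_support` reports partial support for cases where the rule is not fully met.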


Results

Quality of Data

DNA recovered from the first batch of moccasin swabs suffered from extreme contamination from European haplotypes. A few Native American haplotypes were recovered from two swabs, but these Native American sequences were at a very low ratio in the recovered clones (1:20). As these samples were excavated and preserved without thought of aDNA recovery, it was unsurprising that they were highly contaminated (Eklund, 2012). On the basis of the potential preservation of Native American haplogroups in the swabbed samples, aDNA sampling was undertaken at the recent excavation of the site in 2011 using methods to decrease contamination (the items were sampled quickly after excavation by individuals wearing protective clothing) and to increase the amount of transfer aDNA available as starting material (cut samples of the object rather than swabs of the object were used). In addition, three of the previously excavated moccasins were sampled. These new samples had a much lower rate of contamination detected in the cloning reactions, but European contaminant sequences were still present. Eighteen of the samples contained Native American sequences and were therefore chosen for deeper analysis (Table 4-3). Use of an Illumina library allowed for deeper sequence analysis than cloning, making it possible to better assess contamination and to identify haplotypes which might be present at very low copy number.

Damage

Damage was observed in both cloned sequence data and Illumina reads. As the reads were from PCR products and not from shotgun library preparation, the pattern of increasing damage near the ends of the reads was not expected in these samples. C to T changes, and the corresponding G to A changes, resulting from deamination of cytosine to uracil are the most commonly seen damage in aDNA (Hofreiter et al., 2001), and these were frequently observed in the samples, especially in the reads containing the Native American SNPs. This is expected, as the Native American SNPs are on the damaged, ancient fragments while the contaminant DNA is on modern fragments and thus would not be damaged.
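In this study damage was assessed by eye. For readers who want a quantitative version of the same check, the sketch below tallies substitution types between a reference segment and an aligned read and reports the share that are C to T or G to A. It assumes a gapless alignment of equal length and ignores positions containing N or a gap character; the function names are hypothetical and the sequences in the usage note are toy strings, not data from these samples.

```python
# Sketch: tally substitution types between a reference and an aligned read.
# Assumes a gapless, equal-length alignment; names are hypothetical.
from collections import Counter

_SKIP = {"N", "-"}


def substitution_counts(reference, read):
    """Count mismatches as 'ref>read' labels, e.g. 'C>T'."""
    if len(reference) != len(read):
        raise ValueError("reference and read must be the same length")
    counts = Counter()
    for ref_base, read_base in zip(reference.upper(), read.upper()):
        if ref_base in _SKIP or read_base in _SKIP:
            continue
        if ref_base != read_base:
            counts[f"{ref_base}>{read_base}"] += 1
    return counts


def deamination_fraction(counts):
    """Share of all mismatches that are C>T or G>A (0.0 if there are none)."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return (counts["C>T"] + counts["G>A"]) / total
```

For the toy pair `"CGA"` and `"TAC"` the counts are one each of C>T, G>A, and A>C, so two thirds of the mismatches are of the deamination type. Comparing this fraction between reads that carry a Native American SNP and reads that do not would mirror the comparison made qualitatively above.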


Contamination

Non-Native American haplotypes were identified in all of the samples. Contamination from modern humans is a common problem when working on human aDNA (Pilli et al., 2013) and therefore this was expected. The contamination could come from 1) the excavation, 2) the reagents, or 3) the handling of the samples during extraction and amplification. Decreasing the handling time during the excavation will decrease the potential for contamination. Using negative extraction and amplification controls both before and after the samples allowed us to monitor contamination from reagents or handling. However, even with these precautions, all of the samples had some level of contamination present.

SNPs and haplotypes from ancient samples

Thirty-nine SNPs were targeted from the mitochondrial genome. These SNPs distinguished between the X, A, and B haplogroups. Specifically, the X2a1, X2a2, A2a, A2b, B4d, B4f, B4a1, and B2a haplogroups were targeted. For an identification, at least one of the SNPs at each fork on the haplogroup tree (Figure 4-2) had to be present. Three of the samples had one haplotype confirmed to be present by this criterion. Four of the samples met this criterion for multiple haplotypes, and all of the haplotypes present are reported. All four of these samples contained fragments covering two SNPs, each diagnostic of a different haplotype. The recovered reads carried either one SNP or the other, but never both on the same read, indicating a mixture of two different haplotypes and not a product of damage.
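The argument about fragments carrying two diagnostic SNPs rests on how the derived alleles are distributed across individual reads. The sketch below illustrates that bookkeeping. It assumes reads already trimmed to a common amplicon coordinate system (0-based positions, no indels); the positions, alleles, and reads in the usage note are toy values, and the function names are hypothetical.

```python
# Sketch: do two diagnostic alleles ever occur together on one read?
# Assumes reads share one amplicon coordinate system; all names are hypothetical.
from collections import Counter


def classify_reads(reads, pos_a, allele_a, pos_b, allele_b):
    """Tally reads by which of the two diagnostic alleles they carry."""
    tally = Counter()
    for read in reads:
        has_a = read[pos_a] == allele_a
        has_b = read[pos_b] == allele_b
        if has_a and has_b:
            tally["both"] += 1
        elif has_a:
            tally["a_only"] += 1
        elif has_b:
            tally["b_only"] += 1
        else:
            tally["neither"] += 1
    return tally


def consistent_with_mixture(tally):
    """True when each allele is seen, but never together on the same read."""
    return tally["a_only"] > 0 and tally["b_only"] > 0 and tally["both"] == 0
```

With the toy reads `["ACGTA", "ACGTA", "TCGTG", "TCGTG", "TCGTA"]`, allele `A` at position 0 and allele `G` at position 4, the tally is two reads with the first allele only, two with the second only, one with neither, and none with both, which is the pattern described above. The check is only a summary of read counts and does not by itself exclude other explanations.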

Five of the samples had some Native American SNPs, but not enough to make a confirmed haplotype identification, because some of the PCR fragments either did not amplify or did not contain the expected SNP. These samples are reported as having presumptive haplotypes based upon what the majority of the recorded SNPs represented. Four of the samples had Native American SNPs in unusual patterns which did not match any haplotype, and therefore a haplotype could not be determined. This is most likely because these samples contained DNA from several individuals of different haplotypes, none of which was completely recovered, producing an unusual pattern.


Finally, two of the samples contained no confirmed Native American sequences. One of these samples had no SNPs present in any of the recovered fragments. The other had one SNP present, but that SNP also occurs in some European and Asian haplotypes; without any other Native American SNPs, the sample cannot be confirmed as Native American.

Figure 4-2: Haplogroup map including the key Native American haplogroups X2a, A2a, B4a and B2. The SNPs which determine each haplogroup are indicated. When possible, more than one SNP was targeted for each branching point. In order to confirm a haplotype as present in a sample, at least one SNP from each branching point leading to the haplogroup had to be present.

Recovery varied between the moccasin leather and the cordage. Although a similar proportion of cordage and leather samples yielded Native American DNA (7/9 and 11/15 respectively), more of the cordage samples could be identified to haplotype. In addition, multiple haplotypes were clearly present in the cordage at a greater rate than in the leather. This is possibly because the cordage may include hair or saliva rather than just transfer from skin contact, so that more transfer DNA was deposited on the sample during use.


Sample | Material | NA Clones | SNPs Amplified | SNPs detected | Haplotype
KK75 UP11.KK001 | moccasin* | | | |
KK76 UP11.KK002 | moccasin | X | 32 | 10 | B2a
KK77 UP11.KK003 | moccasin* | X | 18 | 5 | presumptive B2
KK78 UP11.KK004 | moccasin | | | |
KK79 UP11.KK005 | moccasin | X | 20 | 0 | no NA DNA
KK80 UP11.KK006 | moccasin | | | |
KK81 UP11.KK007 | moccasin* | X | 31 | 10 | B4f, B2a
KK82 UP11.KK008 | moccasin | X | 20 | 4 | presumptive B2
KK83 UP11.KK009 | moccasin | X | 26 | 7 | not determined
KK84 UP11.KK010 | moccasin | X | 26 | 6 | presumptive B2
KK126 UP11.KK013 | moccasin | X | 21 | 5 | not determined
KK127 UP11.KK014 | moccasin | | | |
KK128 UP11.KK015 | leather | X | 18 | 1 | no NA DNA
KK129 UP11.KK016 | leather | X | 22 | 6 | B4f
KK130 UP11.KK017 | cordage | | | |
KK132 UP11.KK018 | cordage | X | 31 | 10 | B4a1, B2a
KK133 UP11.KK019 | cordage | X | 30 | 7 | B2a
KK134 UP11.KK020 | cordage | X | 32 | 17 | A2a, B4a1, B2a
KK135 UP11.KK021 | cordage | X | 32 | 17 | A2a, B4f, B4a1, B2a
KK136 UP11.KK022 | cordage | X | 30 | 8 | presumptive B2
KK137 UP11.KK023 | cordage | X | 24 | 2 | not determined
KK138 UP11.KK024 | cordage | X | 23 | 6 | not determined
KK139 UP11.KK025 | cordage | | | |
KK240 UP12.KK010 | moccasin | X | 26 | 5 | presumptive B2

Table 4-3: Samples of items tested for Native American DNA, indicating which samples were targeted for SNP analysis (presence of Native American clones). For those 18 samples, the number of SNP locations which amplified (maximum 39) and the number of those locations where SNPs were detected are reported. Finally, the haplotype or haplotypes present are reported.

Discussion

The results provide evidence that it is possible to retrieve DNA transferred to objects during use. This opens up a wider range of objects from which to retrieve ancient human DNA in order to look at the evolutionary history of ancient and historic human populations. However, transfer DNA comes with its own concerns. First, unlike bones or teeth, objects are likely to be used by more than one individual. This means that mixed haplotypes are potentially present, which can be difficult to parse from damaged and fragmented aDNA. In addition, multiple related individuals using an object may not be detected because all share the same haplotype. Items used by a single person would be ideal, as multiple haplotypes would not be present, but as most items are handled by multiple people, items used by a single person are unlikely to be recovered.

Second, low copy number is an inherent problem, as the DNA was transferred to the object through use. Samples should be chosen according to the amount of DNA likely to have been transferred to the object. Samples which involve saliva, such as the cordage, have more potential to reveal multiple haplotypes from mixed usage than samples with only skin transfer, such as the moccasins and mitten leather. In addition, items which are frequently used, such as clothing and shoes, are more likely to contain transfer DNA than items which are seldom touched or used.

Third, contamination poses a problem, as the DNA is human in origin and only on the surface of the object, which prevents the sterilization procedures commonly used on bone and teeth from being applied. Therefore, additional precautions must be taken during excavation and processing in order to detect any contamination that is present. Ideally, the individuals undertaking excavation and DNA processing should also be from a different ethnic group than the individuals who used the items, as this allows for easier identification of contaminating DNA sequences. Finally, in order to combat all these problems, sequencing the samples in depth is necessary both to detect haplotypes that may be present at low copy number and to assess the contamination.

Working with human bones often raises ethical issues, and destructive sampling is sometimes not allowed. Using objects with transfer DNA to study ancient and historic human populations not only provides an alternative to destructive sampling of human remains, but also allows a larger cross section of individuals from a population to be sampled. This has the potential to greatly increase the amount of genetic information available from past human populations in order to address questions of evolution and migration.


References

Adler, C.J., Haak, W., Donlon, D., and Cooper, A. (2011). Survival and recovery of DNA from ancient teeth and bones. J. Archaeol. Sci. 38, 956–964.
Álvarez-Iglesias, V., Jaime, J.C., Carracedo, Á., and Salas, A. (2007). Coding region mitochondrial DNA SNPs: Targeting East Asian and Native American haplogroups. Forensic Sci. Int. Genet. 1, 44–55.
Brown, K.A., and Brown, T.A. (2013). Biomolecular Archaeology. Annu. Rev. Anthropol. 42, 159–174.
Eklund, J.A. (2012). The possible effects of preparation and conservation treatments on DNA research, using treatments for basketry as a case study. J. Inst. Conserv. 35, 92–102.
Gunn, S.J. (2009). The Native American Graves Protection and Repatriation Act at Twenty: Reaching the Limits Is Our National Consensus. William Mitchell Law Rev. 36, 503.
Hansson, M.C., and Foley, B.P. (2008). Ancient DNA fragments inside Classical Greek amphoras reveal cargo of 2400-year-old shipwreck. J. Archaeol. Sci. 35, 1169–1176.
Hofreiter, M., Jaenicke, V., Serre, D., Haeseler, A. von, and Pääbo, S. (2001). DNA sequences from multiple amplifications reveal artifacts induced by cytosine deamination in ancient DNA. Nucleic Acids Res. 29, 4793–4799.
Jones, D.G., and Harris, R.J. (1998). Archeological Human Remains: Scientific, Cultural, and Ethical Considerations. Curr. Anthropol. 39, 253–264.
Kirsch, S. (2011). Science, Property, and Kinship in Repatriation Debates. Mus. Anthropol. 34, 91–96.
LeBlanc, S.A., Kreisman, L.S.C., Kemp, B.M., Smiley, F.E., Carlyle, S.W., Dhody, A.N., and Benjamin, T. (2007). Quids and Aprons: Ancient DNA from Artifacts from the American Southwest. J. Field Archaeol. 32, 161–175.
Letts, B., and Shapiro, B. (2012). Case Study: Ancient DNA Recovered from Pleistocene-Age Remains of a Florida Armadillo. In Ancient DNA, B. Shapiro, and M. Hofreiter, eds. (Humana Press), pp. 87–92.
Li, H., and Durbin, R. (2009). Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760.
Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., Abecasis, G., and Durbin, R. (2009). The sequence alignment/map format and SAMtools. Bioinformatics 25, 2078–2079.
Mayes, A.T. (2010). These Bones Are Read: The Science and Politics of Ancient Native America. Am. Indian Q. 34, 131–156.
McGovern, P.E., Mirzoian, A., and Hall, G.R. (2009). Ancient Egyptian herbal wines. Proc. Natl. Acad. Sci. 106, 7361–7366.
Van Oorschot, R.A., Ballantyne, K.N., and Mitchell, R.J. (2010). Forensic trace DNA: a review. Investig Genet 1, 14.
Oskam, C.L., and Bunce, M. (2012). DNA Extraction from Fossil Eggshell. In Ancient DNA, (Springer), pp. 65–70.
Oskam, C.L., Haile, J., McLay, E., Rigby, P., Allentoft, M.E., Olsen, M.E., Bengtsson, C., Miller, G.H., Schwenninger, J.-L., and Jacomb, C. (2010). Fossil avian eggshell preserves ancient DNA. Proc. R. Soc. B Biol. Sci. 277, 1991–2000.
Van Oven, M., and Kayser, M. (2009). Updated comprehensive phylogenetic tree of global human mitochondrial DNA variation. Hum. Mutat. 30, E386–E394.
Pilli, E., Modi, A., Serpico, C., Achilli, A., Lancioni, H., Lippi, B., Bertoldi, F., Gelichi, S., Lari, M., and Caramelli, D. (2013). Monitoring DNA Contamination in Handled vs. Directly Excavated Ancient Human Skeletal Remains. PloS One 8, e52524.
Reardon, J., and TallBear, K. (2012). “Your DNA Is Our History”: Genomics, Anthropology, and the Construction of Whiteness as Property. Curr. Anthropol. 53, S233–S245.
Rohland, N., and Hofreiter, M. (2007). Ancient DNA extraction from bones and teeth. Nat. Protoc. 2, 1756–1762.
Schiffner, L.A., Bajda, E.J., Prinz, M., Sebestyen, J., Shaler, R., and Caragine, T.A. (2005). Optimization of a simple, automatable extraction method to recover sufficient DNA from low copy number DNA samples for generation of short tandem repeat profiles. Croat. Med. J. 46, 578.
Smith, C.I., Chamberlain, A.T., Riley, M.S., Stringer, C., and Collins, M.J. (2003). The thermal history of human fossils and the likelihood of successful DNA amplification. J. Hum. Evol. 45, 203–217.
Tamariz, J., Voynarovska, K., Prinz, M., and Caragine, T. (2006). The application of ultraviolet irradiation to exogenous sources of DNA in plasticware and water for the amplification of low copy number DNA. J. Forensic Sci. 51, 790–794.
Vallone, P.M., Just, R.S., Coble, M.D., Butler, J.M., and Parsons, T.J. (2004). A multiplex allele-specific primer extension assay for forensically informative SNPs distributed throughout the mitochondrial genome. Int. J. Legal Med. 118, 147–157.


Chapter 5: The limit of recoverable aDNA in bones – DNA recovered from bones from the fossil bearing sites of the Yukon Territory, Canada and Snowmass, Colorado USA

Abstract: Although the theoretical limit of aDNA survival is between 400,000 and 1.5 million years, very few samples have approached this age limit. Recently, DNA has been recovered from two bones which push this limit: a 700,000 year old horse bone from permafrost and a 300,000 year old cave bear bone from a cave. In this chapter I examine the possibility of recovering DNA from several bones dated to the last interglacial period from two different sites, one in the Yukon Territory, Canada and one in Snowmass, Colorado, USA. These bones come from three extinct species for which no DNA has been published: the giant bison (Bison latifrons), the western camel (Camelops hesternus) and a putatively new species of North American deer (Odocoileus sp.). DNA from these species would clarify the relationships between extinct and extant members of their respective genera. Unfortunately, only one fragment of DNA, from one C. hesternus bone, was recovered, indicating poor preservation in all of the samples. Although DNA recovery was poor with traditional methods, recently published protocols for extraction and sequencing of short, damaged DNA may prove successful, particularly on the C. hesternus bone from which DNA was recovered.


Introduction

The preceding chapters explored preservation conditions of mixed samples of DNA, including sediment samples and objects onto which DNA has been transferred through wearing or use. In this chapter, I discuss the results of several projects involving DNA extraction directly from bones. Bones have some of the most well developed protocols for the recovery of aDNA; I apply these existing methods in an effort to extract DNA from bones collected from environments that are less than ideal for preservation of DNA, either because of the environmental conditions themselves or simply because of the age of the sample.

The oldest permafrost sample that has yielded DNA is a 700,000 year old horse bone (Orlando et al., 2013). This horse metapodial fossil was found at the Thistle Creek site in west-central Yukon Territory, Canada and has yielded approximately 1.12x genomic coverage when mapped against the horse genome. Outside of permafrost environments, fewer very old specimens have produced amplifiable DNA. However, DNA does survive in some very old non-permafrost bones, including a 100,000 year old Neanderthal bone (Orlando et al., 2006) and a 120,000 year old polar bear jaw bone that yielded mitochondrial DNA (Lindqvist et al., 2010). In 2006, very short primers targeting fragments of around 50 bp were used to successfully amplify single nucleotide polymorphisms from cave bear samples recovered from Sima de los Huesos Cave in Spain, which were believed to be between 120,000 and 300,000 years old (Valdiosera et al., 2006). Recently the complete mitochondrial genome of one of these specimens was sequenced, thanks to improved protocols for ancient DNA isolation and genomic library preparation (Dabney et al., 2013).

DNA degradation occurs post mortem in animal remains. Although DNA can survive thousands of years after the death of the organism, it becomes highly fragmented and damaged. Initially DNA is degraded by nucleases. However, if the nucleases are deactivated or destroyed by environmental conditions such as rapid desiccation, low temperature, high salt concentration, or lack of oxygen, the DNA can survive for extended periods of time (Hofreiter et al., 2001). The DNA is still subject to damage through oxidation, hydrolysis, and radiation (Willerslev and Cooper, 2005). Damage that shortens the length of amplifiable DNA is driven by several effects, including depurination of bases (Briggs et al., 2007; Lindahl, 1993), crosslinking between molecules (Hansen et al., 2006), and hydrolysis of the DNA backbone (Willerslev and Cooper, 2005). Together these processes lead to increasing fragmentation of the DNA through time.


Theoretical (Lindahl, 1993) and empirical (Allentoft et al., 2012; Willerslev et al., 2007) evidence indicates that the upper limit of DNA survival is between 400,000 and 1.5 million years, largely due to fragmentation. As constantly frozen temperatures decrease chemical reaction rates, bones found in permafrost or permanently frozen conditions are more likely to contain amplifiable DNA than bones of the same age from more temperate climates. This is consistent with existing data. In addition, DNA at the extreme of preservation ages has been shown to be highly fragmented. The 700,000 year old horse bone had an average fragment size of 77 bp (Orlando et al., 2013) and 94% of the 300,000 year old cave bear inserts were shorter than 50 bp (Dabney et al., 2013). Thus we expect highly fragmented and degraded DNA from these very old or poorly preserved bone samples.

I examined bone samples from three extinct species of North American megafauna: the giant bison (Bison latifrons), the western camel (Camelops hesternus), and a putatively new species of deer (tentatively Odocoileus sp.). Bones from these animals have been found in North America, but as yet no DNA sequences from any of these species have been published. Without genetic information, taxonomic classification is based solely on morphological features, which can be challenging for extinct species whose remains are scarce. Investigations of several families of both modern and extinct mammals, including bison (Shapiro et al., 2004); elephants, mammoths, and mastodons (Krause et al., 2006; Rohland et al., 2010); and the Odocoileinae family of deer (Fernández and Vrba, 2005; Randi et al., 1998), have shown that morphological and genetic classifications do not always lead to the same conclusions. Understanding the phylogenetic relationships between extinct species and other members of the same family (both living and extinct) will allow better understanding of changes in biodiversity over time. Ultimately these insights can help inform management practices for living large mammals during the current period of climate change and warming (de Bruyn et al., 2011; Dawson et al., 2011; Lorenzen et al., 2011; Moritz and Agudo, 2013).


The fossil bearing sites of Snowmass, Colorado (USA) and the Yukon, Canada

The Snowmass Village fossil site (also known as the “Snowmastodon” site), near Snowmass Village, CO (Figure 5-1), is a high elevation Pleistocene fossil bearing site, located 2,705 m above sea level. Excavated by the Denver Museum of Nature and Science in 2010 and 2011, the site consists of 10 m of fossil-bearing sediments spanning around 100,000 years (Johnson et al., 2011). This location was the site of a small glacial lake formed during the Bull Lake glaciation between 150,000 and 130,000 years ago (Carrara et al., 2011). The lake experienced alternating warm forest and colder tundra climates until it was filled in between 100,000 and 50,000 years ago, creating an alpine meadow (Stucky et al., 2011). Over 5,000 bones from 26 different Ice Age vertebrates, including seven extinct megafauna species, along with logs, peat, wood, , rocks, and have been unearthed, providing an unprecedented record for a high altitude Pleistocene site (Johnson et al., 2011; Lubeck, 2010). The American mastodon (Mammut americanum), giant bison (Bison latifrons), and ground sloth (Megalonyx jeffersonii) were found in the lower layers, deposited when the lake was present, while the Columbian mammoth (Mammuthus columbi), deer (Odocoileus sp.), horse (Equus sp.), and giant camel (Camelops sp.) were found in the upper layers, deposited after the lake had been filled in (dmns.org, 2011; Stucky et al., 2011).

Figure 5-1: Map of the locations where the bones analyzed in this paper were recovered (Map made with www.zeemaps.com).


A Camelops bone was found in 2010 at White River in the Yukon Territory, Canada (Figure 5-1), at the top of an 8 m high sedimentary exposure. Pollen and macrofossil data suggest that the fossil was deposited in a pond with fluctuating water levels in a shrub-tundra region. This ecosystem was not as forested as under true interglacial conditions nor as cold as under full glacial conditions, which, given the stratigraphy, dates the bone to between 115 and 87 thousand years ago (Zazula et al., 2011). Also analyzed were Camelops bones from Canyon Creek, Alaska, and Hunker Creek, Yukon, Canada, which are also dated to radiocarbon infinity and likely come from a similar period as the White River bone.

Bison latifrons

The first bison entered North America from Asia during the middle Pleistocene, spreading southward through the continent (MacDonald, 1981). The largest bison found in the midcontinent, Bison latifrons, reached a shoulder height of 2.5 m, weighed over 2,000 kg, and had a horn span of greater than 2 m from tip to tip. These dimensions make B. latifrons 25%-50% larger than its living relatives (Harlan, 1825; Kurtén, 1980). The evolutionary relationships among the extant and extinct bison species, including B. latifrons, B. priscus, B. antiquus, B. occidentalis, and B. bison, are not securely resolved (Scott, 2010; Shapiro et al., 2004; Wilson, 1996; Woodburne, 2004). B. latifrons most likely lived in small family groups in the woodlands of North America, preferring warm climates (MacDonald, 1981), and went extinct 21,000-30,000 years ago (Kurtén, 1980). Recent excavations of a high-altitude site near Snowmass, Colorado, have recovered B. latifrons bones dating to 150,000-100,000 years ago (Stucky et al., 2011).

My aim in this project was to determine the phylogenetic relationship between B. latifrons and other species of bison that lived in North America during the Late Pleistocene, for which aDNA data has been previously published (Shapiro et al., 2004).


Cervidae family

The first fossils from the family Cervidae appear in the early Miocene in Eurasia, with cervids most likely entering North America during the early Pliocene (Eisenberg and Wemmer, 1987). Modern Cervidae is divided into 3 subfamilies: Hydropotinae (Old World deer with no antlers), Cervinae (Old World deer), and Odocoileinae (New World deer). Although most studies show that the latter two groups of antlered deer are monophyletic, this result is not universal. In addition, the relationships within the Odocoileinae are not well established (Fernández and Vrba, 2005; Merino and Rossi, 2010; Randi et al., 1998). Odocoileinae genera likely migrated into North America in the late Miocene (4.2-5.7 million years ago) (Webb, 2000), although they may have diverged as a group before that (Fernández and Vrba, 2005). Extinct deer have been little studied; as a result, most ancient bones have not been identified, and the validity of some of the identifications that have been made is questioned (Merino and Rossi, 2010).

Cervid bones have been recovered from the Snowmass site from layers dating to 100,000-46,000 years ago (Stucky et al., 2011). As the taxonomy of the bones is unclear from morphology, my aim for this project was to use genetic data from this ancient deer to resolve its taxonomic relationship with extant species.

Camelops hesternus

The family Camelidae appeared in North America around 40-45 million years ago (Honey et al., 1998; Stanley et al., 1994). They evolved from the small Poebrodon of the middle Eocene into the larger Poebrotherium of the late Eocene-Oligocene. They then diversified in the early Miocene into at least 20 genera with varying traits, including the gazelle-like Stenomylines, the short-legged Miolabines, and the long-necked Aepycamelines (Semprebon and Rivals, 2010). The largest of these camels (such as Titanotylopus and Gigantocamelus) stood taller than modern giraffes. Between 25 and 11 million years ago, the two modern tribes, Lamini and Camelini, diverged (Cui et al., 2007; Harrison, 1979; Webb, 1974) (Figure 5-2). The tribe Camelini had dispersed into Eurasia by 7 million years ago, where they eventually evolved into the modern Bactrian and dromedary species. Around 3 million years ago, the tribe Lamini dispersed into South America, eventually evolving into the extant llamas, alpacas, vicuñas, and guanacos (Honey et al., 1998; Stanley et al., 1994).


Bones from the genus Camelops first appear in central Mexico (Jiménez-Hidalgo and Carranza-Castañeda, 2010) and the southern Plains (Morgan and Lucas, 2003; Morgan and White, 2005; Thompson and White Jr., 2004) between 3.2 and 4.0 million years ago, evolving into as many as six distinct species by the early Pleistocene (Dalquest, 1992; Jiménez-Hidalgo and Carranza-Castañeda, 2010; Webb, 1977). The species C. hesternus is best known in the fossil record from across western North America, where it may have survived until around 11,000 years ago (Harington, 1997; Haynes, 2009; Kurtén, 1980; Pinsof, 1996); however, redating of some of the bones with these younger dates using AMS radiocarbon has resulted in no Camelops remains with finite radiocarbon ages, suggesting they may have gone extinct prior to the last glacial interval (unpublished data, G. Zazula). With a mixed feeding strategy, Camelops spread across a wide range of environments, including both temperate and arctic environments (Dompierre and Churcher, 1996; Semprebon and Rivals, 2010). The sequence of the protein osteocalcin was determined for one C. hesternus bone from New Mexico and compared with that of modern camelids. The sequences were indistinguishable (Humpula et al., 2007).

Figure 5-2: Phylogeny of the Camelidae and dietary assignment from dental microwear analysis. The tribes Lamini and Camelini are marked on the tree, which shows the relationships between Camelops and modern camelid genera: Lama, Vicugna, and Camelus. Figure from Semprebon and Rivals (2010).

Although rare, Camelops bones have been recovered in Eastern Beringia from central Alaska (Guthrie, 1968; Harington, 1977, 1997; Repenning and Haas, 1981), Old Crow River in northern Yukon (Harington, 1977), and the mining districts of west-central Yukon (Harington, 1977, 1997; Zazula et al., 2011). While these bones are in a good preservation environment, they are likely to be very old, and thus extracting DNA is likely to be difficult. My aim in this project was to determine the phylogenetic relationship between C. hesternus and extant species in the family Camelidae.

Materials and Methods

Sampling and extraction

After excavation, 12 B. latifrons and 2 Odocoileus bones from the Snowmass site were subsampled by Beth Shapiro and Mathias Stiller at the Denver Museum of Nature and Science. North American camelid samples were shipped by collaborators to The Pennsylvania State University, where 8 bones from the Yukon Territory and 1 bone from Alaska were subsampled for DNA extraction in a sterile laboratory dedicated to ancient DNA research (Table 5-1). All DNA extractions, PCR reactions, and library preparation were performed in the sterile laboratory following standard protocols for working with ancient DNA (Cooper and Poinar, 2000; Gilbert et al., 2005). All post-PCR molecular work was performed in a separate laboratory that was geographically isolated from the ancient DNA lab.

DNA was extracted according to the following protocol. First, the external layers of the bone were removed using a Dremel tool. A 0.25-0.5 g piece of bone was excised and powdered using a microdismembrator (Braun Mikrodismembrator U with an 8 mm tungsten ball bearing – B. Braun Biotech International, Germany). DNA was then extracted from the bone powder using a silica-based extraction method for aDNA (Rohland and Hofreiter, 2007). An extraction negative (no bone powder) was included for every 6 samples processed. The final elution volume was 50 µl of TE containing 0.05% Tween, and the samples were stored at -20°C until amplification and/or library preparation.


Sample        Museum             Museum Number    Sample Location   Species
UP11.KK.011   DMNS               DMNS 59.15       Snowmass          B. latifrons
UP11.KK.012   DMNS               DMNS 59.104      Snowmass          B. latifrons
UP11.MS.429   DMNS               Field# 13.9b     Snowmass          B. latifrons
UP11.MS.430   DMNS               Field# 21.7      Snowmass          B. latifrons
UP11.MS.431   DMNS               Field# 23.13     Snowmass          B. latifrons
UP11.MS.432   DMNS               Field# 25.50     Snowmass          B. latifrons
UP11.BS001    DMNS               DMNS 54.258      Snowmass          B. latifrons
UP11.BS002    DMNS               DMNS 64.002      Snowmass          B. latifrons
UP11.BS003    DMNS               DMNS 48.31       Snowmass          B. latifrons
UP11.BS004    DMNS               DMNS 54.261      Snowmass          B. latifrons
UP11.BS005    DMNS               DMNS 53.002      Snowmass          B. latifrons
UP11.BS006    DMNS               DMNH 13.9        Snowmass          B. latifrons
UP11.MS.424   DMNS               Field# A2.5      Snowmass          Odocoileus sp.
UP11.MS.425   DMNS               Field# A2.6      Snowmass          Odocoileus sp.
UP12.KK001    Yukon Government   YG328.259        Hunker Creek      Camelops sp.
UP12.KK002    Yukon Government   YG400.6          White River       Camelops sp.
UP12.KK003    Yukon Government   YG328.21         Hunker Creek      Camelops sp.
UP12.KK004    Yukon Government   YG328.22         Hunker Creek      Camelops sp.
UP12.KK005    Yukon Government   YG328.281        Hunker Creek      Camelops sp.
UP12.KK006    Yukon Government   YG328.287        Hunker Creek      Camelops sp.
UP12.KK007    Yukon Government   YG29.199         Hunker Creek      Camelops sp.
UP12.KK008    Yukon Government   YG328.23         Hunker Creek      Camelops sp.
UP12.KK009    Yukon Government   74AWR14-M1330    Canyon Creek      Camelops sp.

Table 5-1: Samples that were extracted for aDNA analysis, including the sample location and morphological species identification. DMNS is the Denver Museum of Nature and Science.

DNA amplification, library preparation, and sequencing

Primers were designed using Lasergene v.10 from the mitochondrial DNA sequences of closely related species: B. bison for B. latifrons, a consensus sequence of the New World deer for Odocoileus, and a consensus sequence of the four South American camelids for Camelops. Primers were chosen to target short conserved fragments of the mitochondrial cytochrome b gene (cytb), the mitochondrial control region (CR), or the mitochondrial D-loop (Table 5-2). Primers were checked for uniqueness using Primer-BLAST from NCBI. PCR amplifications were performed in 25 µl reactions using 1 µl of extract, 1.25 U Platinum Taq Hi-Fidelity and 1X buffer (Invitrogen Ltd., UK), 2 mg/ml rabbit serum albumin (RSA; Sigma, fraction V), 2 mM MgSO4, 250 µM of each dNTP, and 1 µM of each primer. Thermal cycling consisted of 94°C for 2 min, followed by 35 cycles of denaturation at 94°C for 25-30 sec, annealing for 45 sec at a primer-dependent temperature (information available upon request), and extension at 68°C for 45 sec. Negative extraction controls as well as negative PCR controls (no extract) were amplified with each PCR reaction. Amplification products were purified and cloned using the TOPO TA Cloning Kit (Invitrogen) in 1/10 cloning reactions. Between 3 and 10 clones for each amplification were sequenced at the PSU Genomics Core Facility (The Huck Institutes of the Life Sciences, The Pennsylvania State University) on an Applied Biosystems 3730XL machine.

Species    Gene    Direction  Primer       Sequence                           Length
Cervidae   cytb    Forward    cervshort1F  TCT GTY ACY CAY ATY TGY CGA GA     86 bp
                   Reverse    cervshort1R  CAR ATR AAR AAT ATD GAK GCT CC
Cervidae   cytb    Forward    cervshort2F  CGA TTY TTY GCY TTY CAC TTY AT     97 bp
                   Reverse    cervshort2R  TYG GRT TRT TDG AYC CTG TTT CRT
Bison      D loop  Forward    bisshortF    TTA TTA ATC GTA CAT AGC ACA TT     95 bp
                   Reverse    bisshortR    TCA CGC GGC ATG GTA ATT A
Bison*     CR      Forward    178F         GCCCCATGCATATAAGCAAG               132 bp
                   Reverse    309R         GCCTAGCGGGTTGCTGGTTTCACGC
Camelid**  cytb    Forward    camcytb4L    CAACAGCCTTCTCTTCAGTCG              149 bp
                   Reverse    camcytb4H    GGAAGGCGTAGGAACCGTAG
Camelid    cytb    Forward    camcytb5F    CGA AAR TCC CAC CCR CTA C          150 bp
                   Reverse    camcytb5R    ATG TAT TGC TAG AAA TAG TCC TGT C
Camelid    cytb    Forward    camcytb6F    ATG AAA YTT CGG CTC CCT            137 bp
                   Reverse    camcytb6R    CGT AAT TYA CGT CTC GGC A
Camelid    cytb    Forward    camcytb7F    TTACGGCTGAATTATTCGYTAYCT           218 bp
                   Reverse    camcytb7R    TAATTACTGTTGCCCCTCAAAAT
Camelid    cytb    Forward    camcytb8F    TACGCCTTCCTAGARACTTGAAAY           211 bp
                   Reverse    camcytb8R    GTGTAAGGGTGGCTTTRTCTACG
Camelid    cytb    Forward    camcytb9F    CATTTTGAGGGGCAACAGTAAT             194 bp
                   Reverse    camcytb9R    TGTTTCGTGTAAAAATAGTAGATG
Camelid    cytb    Forward    camcytb10F   TCTTCGCCTTYCAYTTTATCTTAC           131 bp
                   Reverse    camcytb10R   RTGGAAKGGRATTTTGTCTAT
Camelid    cytb    Forward    camcytb11F   CTCCAACAAYCCAACAGGAA               214 bp
                   Reverse    camcytb11R   YAGGAARTATCAYTCTGGTTTA
Camelid**  CR      Forward    camCR2R      GGCCTGGTGATTAAGCTCGT               159 bp
                   Reverse    camCR2F      ACGCGTTGCGTGCTATATGT

Table 5-2: Primer sequences used in this experiment, along with the fragment length for the amplified product. * indicates primers from Shapiro et al. (2004) and ** indicates primers from Weinstock et al. (2009).
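Several of the primers in Table 5-2 contain IUPAC ambiguity codes (for example Y, R, D, and K), so each such primer is in effect a pool of related oligonucleotides. As an illustration only, the following Python sketch counts and lists the sequences encoded by a degenerate primer. It is not part of the original primer design workflow, which used Lasergene; the function names are hypothetical, and the only inputs taken from this chapter are the primer sequences themselves.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def clean(primer):
    """Remove the spacing used in the table and upper-case the sequence."""
    return "".join(primer.split()).upper()


def degeneracy(primer):
    """Number of distinct oligonucleotides encoded by a degenerate primer."""
    count = 1
    for base in clean(primer):
        count *= len(IUPAC[base])
    return count


def expand(primer):
    """List every concrete sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in clean(primer)]
    return ["".join(combo) for combo in product(*choices)]


# Primer sequences copied from Table 5-2.
cervshort1F = "TCT GTY ACY CAY ATY TGY CGA GA"
bisshortF = "TTA TTA ATC GTA CAT AGC ACA TT"

print(len(clean(cervshort1F)), degeneracy(cervshort1F))  # 23 32
print(degeneracy(bisshortF))                             # 1
```

For cervshort1F, the five Y positions give 2^5 = 32 possible sequences, whereas bisshortF contains no ambiguity codes and encodes a single sequence.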

In addition to amplification by PCR, several of the B. latifrons samples were further analyzed using target enrichment followed by high-throughput sequencing on the Illumina platform. For 5 samples (UP11.KK.011, UP11.KK.012, UP11.MS.429, UP11.MS.430, and UP11.MS.431), 15 µl of DNA extract was used to make a genomic library following the protocol of Meyer and Kircher (2010). After the indexing PCR, the samples were enriched for bison mitochondrial DNA using a commercial, RNA-based hybridization capture method (MYcroarray, Ann Arbor, MI), with baits designed from mitochondrial sequences of B. priscus using a 10 bp tiling density. Enriched libraries were sequenced at the QB3 sequencing facility at The University of California, Berkeley, on an Illumina HiSeq2000 platform with 100 bp paired-end chemistry.
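The idea of tiling capture baits across a reference can be sketched in a few lines of Python. This is an illustrative sketch, not the manufacturer's design procedure: the bait length of 80 nt and the toy reference are assumptions made here, and reading the 10 bp tiling density as a 10 bp offset between consecutive bait start positions is also an assumption. The function name tile_baits is hypothetical.

```python
def tile_baits(reference, bait_length=80, step=10):
    """Return (start, bait) pairs tiled across a linear reference sequence.

    bait_length is a hypothetical value; step=10 reflects one reading of
    the 10 bp tiling density described in the text.
    """
    if bait_length > len(reference):
        raise ValueError("reference is shorter than one bait")
    last_start = len(reference) - bait_length
    return [
        (start, reference[start:start + bait_length])
        for start in range(0, last_start + 1, step)
    ]


# Toy 200 bp reference standing in for a real mitochondrial sequence.
toy_reference = "ACGT" * 50
baits = tile_baits(toy_reference)

print(len(baits))   # 13 baits, starting at positions 0, 10, ..., 120
```

With these assumed values, each position in the interior of the reference is covered by bait_length / step = 8 overlapping baits, which is the redundancy that makes tiled designs tolerant of local mismatches.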


Analysis

Sanger sequencing data were trimmed and aligned to the reference genomes using SeqMan from the Lasergene 10 Suite. Sequences were then inspected by eye for their relationship to the reference genomes. Forward and reverse Illumina read pairs were first merged using the MergeReadsFastQ_cc script according to Kircher (2012). All reads were mapped to the mitochondrial genome sequence of B. bison (GenBank accession NC_012346) using the iterative mapping software mia (Briggs et al., 2009).

Results

Snowmass bison and deer

Amplification of small fragments (less than 100 bp) from the Snowmass bison and deer samples was repeatedly unsuccessful. None of the deer samples amplified successfully, and the sequences from the two bison samples that yielded amplification products were identified as cow contamination. None of the reads from the Illumina sequencing aligned to the bison mitochondrial genome.

Yukon camels

Amplification with one primer set (camCR2) of one of the Yukon camel bones (UP12.KK006) was successful. The product was cloned and sequenced. One of the cloned sequences contained 40 bp (after primer removal) that aligned to the camel sequence and differed at one position from all extant camel sequences. While this is not enough to obtain a secure phylogeny for Camelops, it is enough to indicate the presence of endogenous DNA.


Discussion

Snowmass bison and deer

No authentic DNA sequences could be recovered from any samples excavated from the Snowmass site. The samples at Snowmass underwent freeze-thaw conditions for at least 100,000 years, which increases the rate at which DNA degrades in comparison to permafrost environments. For the bison samples, modern bison genomes are available, but it is very difficult to design primers that are specific to bison rather than cow and that would also be informative about differences between extant bison and B. latifrons. The observed cow contamination in the PCR amplifications most likely came from reagents contaminated with DNA from cow and other domesticated animals, which is a continuing problem in aDNA research (Leonard et al., 2007).

Later, a colleague, Mathias Stiller, re-extracted two of these samples (UP11.KK.011 and UP11.MS.429) following new protocols for DNA extraction (Dabney et al., 2013) and single-stranded library preparation (Meyer et al., 2012). These protocols are reported to provide better coverage for highly damaged samples, and in particular for samples dominated by fragments less than 50 bp in length. The resulting library was spiked into a run on an Illumina MiSeq and sequenced to low coverage. Of the nearly one million reads recovered, only three mapped to the cow mitochondrial genome, and most of the rest were attributed to unidentified environmental microorganisms. This suggests that if DNA does survive in these remains, it does so in extremely low quantities and will be very challenging to recover. A future aim is to sequence this newly enriched library more deeply so as to observe the complexity of the bison DNA within the sample.

For the deer samples, the problem is even worse. Unlike the bison bones, the morphological identification of the cervid bones is not secure and the closest extant relative is unknown. Therefore, designing primers that amplify an informative, very short fragment and are highly conserved between cervid species is very challenging. In addition, the lack of knowledge of the closest species complicates the design of probes for a capture experiment. However, capture experiments tolerate up to 10% mismatches over 60 bp, and capture can be designed with low stringency using multiple probes for divergent regions (Shapiro lab, unpublished results); capture therefore allows for more divergence than traditional PCR primer amplification.


Yukon camels

The recently excavated Yukon samples provide the first indication of authentic aDNA from a Camelops bone. While more work needs to be done to ensure that the amplification represents endogenous DNA, this bone suggests that DNA can be recovered from some of these samples. The lack of amplification for the other Yukon samples is most likely due to fragmentation. Therefore, the best method to retrieve sequence data for a phylogenetic reconstruction is to sequence a camelid-enriched Illumina library created from the extracted Camelops bone.

Traditional PCR vs. library preparation and capture for very old sequences

PCR amplification has a lower limit on the fragment length from which it can reconstruct sequences. Only fragments long enough to bind two PCR primers flanking a stretch of informative sequence can be detected. If the fragment size decreases to 50 bp, as was seen in the 300,000 year old cave bear (Dabney et al., 2013), then only around 10 bp of informative sequence between the two priming sites can be recovered. At best, this allows single nucleotide polymorphisms or short insertions/deletions to be sequenced. Library preparation, however, adds artificial adaptor sequences to the ends of the molecule, thereby allowing the entire fragment to be sequenced and yielding more continuous informative sequence data. Single-stranded library preparation (Meyer et al., 2012) improves on this by not removing overhanging bases to create double-stranded blunt ends. Rather, it allows all of these overhangs to be sequenced.
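The arithmetic behind the 50 bp example can be checked with a short Python sketch. It assumes the best case, in which the surviving fragment spans exactly the region from the start of one priming site to the end of the other, and it uses the bisshort primer pair from Table 5-2 purely as an example; the function name informative_length is hypothetical.

```python
def informative_length(fragment_length, forward_primer, reverse_primer):
    """Bases left between two priming sites on a template of a given length.

    Best-case estimate: the template is assumed to begin and end exactly at
    the two priming sites. Returns 0 when the template cannot hold both primers.
    """
    fwd = len("".join(forward_primer.split()))
    rev = len("".join(reverse_primer.split()))
    return max(0, fragment_length - fwd - rev)


# Primer pair 'bisshort' from Table 5-2 (23 nt and 19 nt).
bisshortF = "TTA TTA ATC GTA CAT AGC ACA TT"
bisshortR = "TCA CGC GGC ATG GTA ATT A"

for size in (95, 50, 40):
    print(size, informative_length(size, bisshortF, bisshortR))
# 95 53
# 50 8
# 40 0
```

With two 20 nt primers the same calculation gives 50 - 20 - 20 = 10 bp, which corresponds to the figure of around 10 bp quoted above. A sequencing library has no such penalty, because the adaptors are added outside the template rather than annealing within it.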

In addition, PCR primers must be specific in order to amplify the targeted region, and each primer pair must be independently designed and tested. Problems with using a reference genome that is too divergent from the ancient sequence include failure of the primers to bind, leading to no amplification, and preferential amplification of contaminant DNA. Capture protocols allow for more divergence between the reference and ancient sequences and even allow for multiple probes for the same location in very divergent areas of the genome when the closest relative is unknown. Single-stranded library preparation combined with library enrichment through capture has proven successful in recovering both mitochondrial and nuclear genomic data from highly fragmented samples (Dabney et al., 2013; Orlando et al., 2013).


Therefore, although DNA in the Snowmass bison and deer bones and the Yukon camel bones is highly fragmented or cross-linked, there is a possibility that authentic DNA can be recovered with high-coverage sequencing of enriched libraries. For the Snowmass samples, if DNA is recovered from the bison-enriched libraries, further work on the cervid bones may be able to resolve the taxonomy of these species using a low-stringency hybridization capture experiment. The presence of DNA in a Yukon camel bone indicates that these bones likely contain fragmented DNA, which can also be recovered through sequencing of enriched libraries.

Too old, and not too cold

DNA preservation is much better understood for bones than for environmental samples; however, even in bones, DNA preservation is not always a linear function of age. Under some conditions a relatively constant decay rate has been shown (Allentoft et al., 2012), but successful extractions depend on depositional circumstances, especially freeze-thaw cycles and general fluctuations in temperature and humidity. Although DNA has been recovered from samples that are older than those at the Snowmass site, the freeze-thaw cycles at Snowmass likely degraded the DNA much more quickly than conditions in continually cold permafrost environments.
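As a rough illustration of what a constant decay rate implies, the sketch below treats strand breaks as independent events occurring at a fixed rate per site per year, so that the fraction of intact templates of a given length falls off exponentially with time. This first-order model is an assumption made for illustration, and the two rates used are hypothetical placeholders, not estimates for the Snowmass or Yukon samples or values taken from Allentoft et al. (2012).

```python
import math


def surviving_fraction(length_bp, years, rate_per_site_per_year):
    """Fraction of templates of a given length still intact after `years`,
    assuming independent, constant-rate strand breaks at every site."""
    return math.exp(-rate_per_site_per_year * length_bp * years)


def half_life(length_bp, rate_per_site_per_year):
    """Time after which half of the templates of a given length are broken."""
    return math.log(2) / (rate_per_site_per_year * length_bp)


# Hypothetical per-site break rates (per year), chosen only to contrast a
# slower and a faster decay regime; they are not measured values.
SLOW_RATE = 1e-7
FAST_RATE = 1e-6

for label, rate in (("slow", SLOW_RATE), ("fast", FAST_RATE)):
    for length in (40, 100):
        frac = surviving_fraction(length, 100_000, rate)
        print(f"{label} rate, {length} bp: {frac:.2e} intact, "
              f"half-life {half_life(length, rate):,.0f} years")
```

Under these placeholder rates, a tenfold higher break rate, standing in for less favorable conditions such as freeze-thaw cycling, reduces the surviving fraction of 100 bp templates after 100,000 years from about 37% to well under 0.01%, while shorter templates fare better in both cases.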

The 700,000 year old horse bone was deposited in a glacial environment and most likely had been frozen since that time, making this bone a prime candidate for optimal DNA preservation (Orlando et al., 2013). While the Camelops bones from the Yukon were found in a permafrost environment, they were most likely deposited during an interglacial period (Turner et al., 2013). Camelops bones have been found in the southern part of North America, including Texas (Crook and Harris, 1958) and Mexico (Jiménez-Hidalgo and Carranza-Castañeda, 2010), indicating a more interglacial-adapted species. Therefore, the Camelops bones are from a cold location, but were most likely deposited in a warmer period for that location. The Snowmass bison and deer bones were also deposited in an interglacial period, but rather than being frozen since that time, they have been in a temperate environment with freeze-thaw cycles, which is even worse for DNA preservation.


Rapid advances in aDNA technology, including protocols designed specifically for highly fragmented samples, have significantly expanded the range of samples from which it is possible to extract DNA. However, there is a physical limit to DNA survival. For frozen samples this limit ranges between 400,000 and 1.5 million years. DNA from the 700,000 year old horse bone has reached into this range. Under less than ideal conditions, as seen for the Snowmass and Yukon samples, the limit to DNA survival is lower than this optimal range. Based upon the likely fragmentation of the Snowmass samples, they represent the edge of DNA survival under temperate conditions.


References

Allentoft, M.E., Collins, M., Harker, D., Haile, J., Oskam, C.L., Hale, M.L., Campos, P.F., Samaniego, J.A., Gilbert, M.T.P., Willerslev, E., et al. (2012). The half-life of DNA in bone: measuring decay kinetics in 158 dated fossils. Proc. R. Soc. B Biol. Sci. 279, 4724–4733.
Briggs, A.W., Stenzel, U., Johnson, P.L.F., Green, R.E., Kelso, J., Prüfer, K., Meyer, M., Krause, J., Ronan, M.T., Lachmann, M., et al. (2007). Patterns of damage in genomic DNA sequences from a Neandertal. Proc. Natl. Acad. Sci. 104, 14616–14621.
Briggs, A.W., Good, J.M., Green, R.E., Krause, J., Maricic, T., Stenzel, U., Lalueza-Fox, C., Rudan, P., Brajkovic, D., Kucan, Z., et al. (2009). Targeted retrieval and analysis of five Neandertal mtDNA genomes. Science 325, 318–321.
De Bruyn, M., Hoelzel, A.R., Carvalho, G.R., and Hofreiter, M. (2011). Faunal histories from Holocene ancient DNA. Trends Ecol. Evol. 26, 405–413.
Carrara, P., Pigati, J.S., and Bryant, B. (2011). Formation of the “Snowmastodon” Site - A Death Trap for Late Pleistocene Animals, Snowmass Village, Colorado.
Cooper, A., and Poinar, H.N. (2000). Ancient DNA: do it right or not at all. Science 289, 1139.
Cui, P., Ji, R., Ding, F., Qi, D., Gao, H., Meng, H., Yu, J., Hu, S., and Zhang, H. (2007). A complete mitochondrial genome sequence of the wild two-humped camel (Camelus bactrianus ferus): an evolutionary history of camelidae. BMC Genomics 8, 241.
Dabney, J., Knapp, M., Glocke, I., Gansauge, M.-T., Weihmann, A., Nickel, B., Valdiosera, C., García, N., Pääbo, S., Arsuaga, J.-L., et al. (2013). Complete mitochondrial genome sequence of a Middle Pleistocene cave bear reconstructed from ultrashort DNA fragments. Proc. Natl. Acad. Sci. U. S. A. 110, 15758–15763.
Dalquest, W.W. (1992). Problems in the nomenclature of North American Pleistocene camelids. Ann. Zool. Fenn. 28, 291–299.
Dawson, T.P., Jackson, S.T., House, J.I., Prentice, I.C., and Mace, G.M. (2011). Beyond Predictions: Biodiversity Conservation in a Changing Climate. Science 332, 53–58.
dmns.org (2011). The Snowmastodon Project Updates (Denver Museum of Nature and Science).
Dompierre, H., and Churcher, C.S. (1996). Premaxillary shape as an indicator of the diet of seven extinct late Cenozoic new world camels. J. Vertebr. Paleontol. 16, 141–148.
Eisenberg, J.F., and Wemmer, C.M. (1987). The evolutionary history of the Cervidae with special reference to the South American radiation. Biol. Manag. Cervidae CM Wemmer Ed Smithson. Inst. Press Wash. DC 77, 60–64.
Fernández, M.H., and Vrba, E.S. (2005). A complete estimate of the phylogenetic relationships in Ruminantia: a dated species-level supertree of the extant ruminants. Biol. Rev. 80, 269–302.
Gilbert, M.T.P., Bandelt, H.-J., Hofreiter, M., and Barnes, I. (2005). Assessing ancient DNA studies. Trends Ecol. Evol. 20, 541–544.


Guthrie, R.D. (1968). Paleoecology of the Large-Mammal Community in Interior Alaska during the Late Pleistocene. Am. Midl. Nat. 79, 346–363.
Hansen, A.J., Mitchell, D.L., Wiuf, C., Paniker, L., Brand, T.B., Binladen, J., Gilichinsky, D.A., Rønn, R., and Willerslev, E. (2006). Crosslinks Rather Than Strand Breaks Determine Access to Ancient DNA Sequences From Frozen Sediments. Genetics 173, 1175–1179.
Harington, C.R. (1977). Pleistocene mammals of the Yukon Territory. PhD Thesis, Department of Zoology. University of Alberta.
Harington, C.R. (1997). Ice Age Yukon and Alaskan Camels. Beringian Research Notes, Government of Yukon. Dep. Tour. Cult. 1–4.
Harlan, R. (1825). Fauna Americana: Being a Description of the Mammiferous Animals Inhabiting North America (Anthony Finley).
Harrison, J.A. (1979). Revision of the Camelidae (Artiodactyla, Tylopoda) and description of the new genus Alforjas. PalaeontContr 95, 1–20.
Haynes, G. (2009). Estimates of Clovis-Era Megafaunal Populations and Their Extinction Risks. In American Megafaunal Extinctions at the End of the Pleistocene, G. Haynes, ed. (Springer Netherlands), pp. 39–53.
Hofreiter, M., Serre, D., Poinar, H.N., Kuch, M., and Pääbo, S. (2001). Ancient DNA. Nat. Rev. Genet. 2, 353–359.
Honey, J.G., Harrison, J.A., Prothero, D.R., and Stevens, M.S. (1998). Camelidae. In Evolution of Tertiary Mammals of North America, C.M. Janis, K.M. Scott, and L.J. Jacobs, eds. (Cambridge University Press), pp. 439–462.
Humpula, J.F., Ostrom, P.H., Gandhi, H., Strahler, J.R., Walker, A.K., Stafford Jr, T.W., Smith, J.J., Voorhies, M.R., George Corner, R., and Andrews, P.C. (2007). Investigation of the protein osteocalcin of Camelops hesternus: Sequence, structure and phylogenetic implications. Geochim. Cosmochim. Acta 71, 5956–5967.
Jiménez-Hidalgo, E., and Carranza-Castañeda, O. (2010). Blancan Camelids from San Miguel de Allende, Guanajuato, Central México. J. Paleontol. 84, 51–65.
Johnson, K.R., Miller, I.M., Sertich, J., Stucky, R., Fisher, D.C., and Pigati, J.S. (2011). Ziegler Reservoir and the Snowmastodon Project: Overview and Geologic Setting of a Recently Discovered Series of High-Elevation Pleistocene Ecosystems near Snowmass Village, Colorado.
Kircher, M. (2012). Analysis of high-throughput ancient DNA sequencing data. In Ancient DNA, (Springer), pp. 197–228.
Krause, J., Dear, P.H., Pollack, J.L., Slatkin, M., Spriggs, H., Barnes, I., Lister, A.M., Ebersberger, I., Pääbo, S., and Hofreiter, M. (2006). Multiplex amplification of the mammoth mitochondrial genome and the evolution of Elephantidae. Nature 439, 724–727.
Kurtén, B. (1980). Pleistocene Mammals of North America (Columbia University Press).


Leonard, J.A., Shanks, O., Hofreiter, M., Kreuz, E., Hodges, L., Ream, W., Wayne, R.K., and Fleischer, R.C. (2007). Animal DNA in PCR reagents plagues ancient DNA research. J. Archaeol. Sci. 34, 1361–1366.
Lindahl, T. (1993). Instability and decay of the primary structure of DNA. Nature 362, 709–715.
Lindqvist, C., Schuster, S.C., Sun, Y., Talbot, S.L., Qi, J., Ratan, A., Tomsho, L.P., Kasson, L., Zeyl, E., Aars, J., et al. (2010). Complete mitochondrial genome of a Pleistocene jawbone unveils the origin of polar bear. Proc. Natl. Acad. Sci. 107, 5053–5057.
Lorenzen, E.D., Nogués-Bravo, D., Orlando, L., Weinstock, J., Binladen, J., Marske, K.A., Ugan, A., Borregaard, M.K., Gilbert, M.T.P., Nielsen, R., et al. (2011). Species-specific responses of Late Quaternary megafauna to climate and humans. Nature 479, 359–364.
Lubeck, M. (2010). Snowmass Fossil Site Provides Opportunity to Study Past Vegetation and Climate in Colorado (USGS).
MacDonald, J.N. (1981). North American bison: their classification and evolution (Berkeley, University of California Press).
Merino, M.L., and Rossi, R.V. (2010). Origin, systematics, and morphological radiation. In J MB Duarte González Eds Neotropical Cervidology Biol. Med. Lat. Am. Deer Jaboticabal Braz. Gland Switz. Funep Collab. IUCN 2–11.
Meyer, M., and Kircher, M. (2010). Illumina Sequencing Library Preparation for Highly Multiplexed Target Capture and Sequencing. Cold Spring Harb. Protoc. 2010, pdb.prot5448.
Meyer, M., Kircher, M., Gansauge, M.-T., Li, H., Racimo, F., Mallick, S., Schraiber, J.G., Jay, F., Prüfer, K., Filippo, C. de, et al. (2012). A High-Coverage Genome Sequence from an Archaic Denisovan Individual. Science 338, 222–226.
Morgan, G.S., and White, R.S. (2005). Miocene and Pliocene vertebrates from Arizona. In Vertebrate Paleontology in Arizona, A.B. Heckert, and S.G. Lucas, eds. pp. 115–136.
Morgan, G.S., and Lucas, S.G. (2003). Chapter 12. Bull. Am. Mus. Nat. Hist. 269–320.
Moritz, C., and Agudo, R. (2013). The Future of Species Under Climate Change: Resilience or Decline? Science 341, 504–508.
Orlando, L., Darlu, P., Toussaint, M., Bonjean, D., Otte, M., and Hänni, C. (2006). Revisiting Neandertal diversity with a 100,000 year old mtDNA sequence. Curr. Biol. 16, R400–R402.
Orlando, L., Ginolhac, A., Zhang, G., Froese, D., Albrechtsen, A., Stiller, M., Schubert, M., Cappellini, E., Petersen, B., Moltke, I., et al. (2013). Recalibrating Equus evolution using the genome sequence of an early Middle Pleistocene horse. Nature 499, 74–78.
Pinsof, J.D. (1996). Current status of North American Sangamonian local faunas and vertebrate taxa. In Palaeoecology and Palaeoenvironments of Late Cenozoic Mammals: Tribute to the Career of C.S. (Rufus) Churcher, K.M. Stewart, and K.L. Seymour, eds. (Toronto: University of Toronto Press),


Randi, E., Mucci, N., Pierpaoli, M., and Douzery, E. (1998). New phylogenetic perspectives on the Cervidae (Artiodactyla) are provided by the mitochondrial cytochrome b gene. Proc. R. Soc. Lond. B Biol. Sci. 265, 793–801. Repenning, C.A., and Haas, H. (1981). Canyon creek: a late Pleistocene vertebrate locality in interior Alaska. Quat. Res. 16, 1967–1980. Rohland, N., and Hofreiter, M. (2007). Ancient DNA extraction from bones and teeth. Nat. Protoc. 2, 1756–1762. Rohland, N., Reich, D., Mallick, S., Meyer, M., Green, R.E., Georgiadis, N.J., Roca, A.L., and Hofreiter, M. (2010). Genomic DNA Sequences from Mastodon and Woolly Mammoth Reveal Deep Speciation of Forest and Savanna Elephants. PLoS Biol 8, e1000564. Scott, E. (2010). Extinctions, scenarios, and assumptions: Changes in latest Pleistocene large herbivore abundance and distribution in western North America. Quat. Int. 217, 225–239. Semprebon, G.M., and Rivals, F. (2010). Trends in the paleodietary habits of fossil camels from the Tertiary and Quaternary of North America. Palaeogeogr. Palaeoclimatol. Palaeoecol. 295, 131–145. Shapiro, B., Drummond, A.J., Rambaut, A., Wilson, M.C., Matheus, P.E., Sher, A.V., Pybus, O.G., Gilbert, M.T.P., Barnes, I., Binladen, J., et al. (2004). Rise and Fall of the Beringian Steppe Bison. Science 306, 1561–1565. Stanley, H.F., Kadwell, M., and Wheeler, J.C. (1994). Molecular evolution of the family Camelidae: a mitochondrial DNA study. Proc. Biol. Sci. 256, 1–6. Stucky, R., Johnson, K.R., Miller, I.M., Fisher, D.C., Graham, R.W., McDonald, H.G., and Pigati, J.S. (2011). Zigler Reservoir and the Snowmastodon Project: New High-Elevation Fossil Vertebrate Faunas from Snowmass Village, Colorado. Thompson, M.A., and White Jr., R.S. (2004). Getting over the hump: Blancan records of Camelops from North America, with special reference to Hagerman, Idaho and the 111 Ranch, Arizona. Geol. Soc. Am. Abstr. Programs 36, 54. 
Turner, D.G., Ward, B.C., Bond, J.D., Jensen, B.J.L., Froese, D.G., Telka, A.M., Zazula, G.D., and Bigelow, N.H. (2013). Middle to Late Pleistocene ice extents, tephrochronology and paleoenvironments of the White River area, southwest Yukon. Quat. Sci. Rev. 75, 59–77. Valdiosera, C., García, N., Dalén, L., Smith, C., Kahlke, R.-D., Lidén, K., Angerbjörn, A., Arsuaga, J.L., and Götherström, A. (2006). Typing single polymorphic nucleotides in mitochondrial DNA as a way to access Middle Pleistocene DNA. Biol. Lett. 2, 601–603. Webb, S.D. (1974). Pleistocene Llamas of Florida, with a brief review of the Lamini. In Pleistocene Mammals of Florida, (Gainesville: University Press of Florida), pp. 170–214. Webb, S.D. (1977). A history of savannah vertebrates in the New World, Part I: North America. Annu. Rev. Ecol. Syst. 8, 355–380.

Webb, S.D. (2000). Evolutionary history of New World Cervidae. In Antelopes, Deer, and Relatives: Fossil Record, Behavioral Ecology, Systematics, and Conservation, E.S. Vrba, and G.B. Schaller, eds. (New Haven: Yale University Press), pp. 38–64. Willerslev, E., and Cooper, A. (2005). Review Paper. Ancient DNA. Proc. R. Soc. B Biol. Sci. 272, 3–16. Willerslev, E., Cappellini, E., Boomsma, W., Nielsen, R., Hebsgaard, M.B., Brand, T.B., Hofreiter, M., Bunce, M., Poinar, H.N., Dahl-Jensen, D., et al. (2007). Ancient Biomolecules from Deep Ice Cores Reveal a Forested Southern Greenland. Science 317, 111–114. Wilson, M.C. (1996). Late Quaternary vertebrates and the opening of the ice-free corridor, with special reference to the genus Bison. Quat. Int. 32, 97–105. Crook, W.W., Jr., and Harris, R.K. (1958). A Pleistocene Campsite near Lewisville, Texas. Am. Antiq. 23, 233–246. Woodburne, M.O. (2004). Late Cretaceous and Cenozoic Mammals of North America: Biostratigraphy and Geochronology (Columbia University Press). Zazula, G.D., Turner, D.G., Ward, B.C., and Bond, J. (2011). Last interglacial western camel (Camelops hesternus) from eastern Beringia. Quat. Sci. Rev. 30, 2355–2360.

Chapter 6: Conclusion

DNA Preservation under extreme conditions

Over the past thirty years the field of ancient DNA has prospered from new technological advances, standardization of methods, and increased research focus. DNA damage in bones has been well quantified, and very recently published methods have increased the ability to detect extremely damaged and fragmented DNA. However, ancient DNA in contexts other than bone is not well studied. In the preceding chapters, I have addressed DNA preservation in several environments: the deep subsurface, fluid inclusions in salt crystals, objects to which DNA was transferred during use, and ancient bones that are close to the limit of DNA preservation given their temperature history and environment. In all of these environments, authentication of the DNA is a problem because of the ease of contamination and the low copy number of authentic DNA in the sample. When authenticated, DNA from these environments pushes the limits of what is known about DNA preservation and opens new doors for pursuing aDNA in contexts other than bone.

Determining the authenticity of geologically ancient DNA

The recovery of ancient DNA that is millions of years old from salt crystals is inconsistent with currently accepted limits of DNA survival. Because the ages of these bacteria cannot be obtained through isotopic dating, a means of using sequence data alone to evaluate such claims would be highly beneficial. Because DNA mutations accumulate in a clock-like manner, these geologically ancient sequences should be ‘missing’ evolution, and their apparent mutation rates should therefore be depressed relative to those of their modern relatives. In chapter two, I proposed and evaluated a method to test for these depressed rates, the Depressed Rate Test (DRT). Although the test is limited to samples older than 100,000 years, gene sequences with low back mutation rates, and a ratio of total tree height to sample age of 100:1, it should allow claims of geologically ancient DNA to be evaluated. None of the currently published geologically ancient sequences that fall within the detection limits of the test were found to have depressed rates.
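
The reasoning behind the DRT can be illustrated with a small numerical sketch. This is not the implementation evaluated in chapter two; it only shows the arithmetic of ‘missing’ evolution under a strict clock. The function names, the example substitution rate, and the reading of the 100:1 limit as a maximum ratio of tree height to sample age are all assumptions made for illustration.

```python
def expected_missing_divergence(rate_per_site_per_year, sample_age_years):
    """Substitutions per site that a dormant lineage would fail to accumulate
    under a strict molecular clock (rate multiplied by time)."""
    return rate_per_site_per_year * sample_age_years


def within_stated_limits(sample_age_years, tree_height_years,
                         min_age_years=100_000, max_height_to_age=100.0):
    """Check the two numeric limits quoted in the text.

    Assumption: the 100:1 limit is read as 'tree height may be at most
    100 times the sample age'.
    """
    return (sample_age_years > min_age_years
            and tree_height_years / sample_age_years <= max_height_to_age)


def relative_tip_shortfall(ancient_root_to_tip, modern_root_to_tips):
    """Fraction by which the ancient tip is shorter than the mean modern tip.

    Root-to-tip distances are in substitutions per site. A value near zero
    means no evidence of a depressed rate.
    """
    mean_modern = sum(modern_root_to_tips) / len(modern_root_to_tips)
    return (mean_modern - ancient_root_to_tip) / mean_modern


# Hypothetical numbers: a 250-million-year-old isolate, a tree 1 billion years
# deep, and an illustrative rate of 1e-9 substitutions per site per year.
missing = expected_missing_divergence(1e-9, 250e6)
testable = within_stated_limits(250e6, 1e9)
shortfall = relative_tip_shortfall(0.75, [1.0, 1.0])
```

In this toy case the ancient tip would be expected to sit about a quarter closer to the root than its modern relatives. A sequence that is claimed to be geologically ancient but shows no such shortfall is what the test is designed to flag.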

Detecting biomarkers from an ancient hydrothermal system

The deep subsurface provides a plethora of different environments where microbial life survives, and in some cases thrives, although at very low metabolic rates. Biomarkers of ancient ecosystems could therefore be present in these systems in the form of DNA in cells with very low metabolic activity that were trapped in the deep subsurface after the original ecosystem was buried. A well-studied system where it would be possible to detect such biomarkers is the Chesapeake Bay Impact Structure (CBIS). An impact event 34.5 Mya created a temporary hydrothermal system on the continental shelf which lasted between 10,000 and 100,000 years. Most likely this hydrothermal system was colonized by microbial life, which would have been trapped when the hydrothermal system cooled and the crater was filled with sediment.

In chapter three, I reported metagenome sequences from three different depths of the CBIS, representing the impact crater (840 m) and the sedimentary layers above the impact crater (150 and 240 m). These metagenomes were dominated by the bacterial phyla Proteobacteria, Firmicutes, Actinobacteria, and Bacteroidetes. Archaea, primarily from the phyla Crenarchaeota, Euryarchaeota, Korarchaeota, and Thaumarchaeota, composed only a small fraction of the recovered taxa. Both the functional gene analysis and the taxonomic composition indicate that the ecosystem contains bacteria which are autotrophic and capable of sulfate reduction and fermentation. These metagenomes were similar to those from other deep subsurface communities, containing evidence of lithoautotrophy as a source of primary productivity for an ecosystem dominated by anaerobic, heterotrophic fermenters.

In addition, I may have detected DNA biomarkers of an ancient hydrothermal system in samples associated with the impact structure, but not in the sedimentary layers above. These biomarkers may derive from bacteria whose ancestors were trapped after the impact-induced hydrothermal system cooled. They make up a small percentage of the community and are most likely in a state of low metabolic activity. The presence of biomarkers from past environmental conditions is important for work done in other deep subsurface communities. It can affect both the understanding of modern community composition and ecology in subsurface ecosystems and the reconstruction of the ecology of past communities, particularly ones with temporary high-energy environments such as those created by impact events.

Recovering aDNA from touched objects

Although it is a potentially rich source of DNA, little work has been done on human DNA transferred to objects during use. In a cave near the Great Salt Lake, a stash of Native American objects including moccasins, mittens, and cordage was excavated in the 1930s and again in 2011 and 2013. Samples of cordage and of leather from moccasins and mittens were taken with particular care to remove potential sources of contamination. In chapter four, I reported human DNA sequences recovered from moccasins and cordage. Because these objects were used by multiple individuals, several of the samples contain multiple haplotypes, which complicates the analysis. In addition, contamination from modern human sequences is still present even with the precautions taken. As a result, cloning of amplified products was insufficient to detect all the haplotypes present. Taking advantage of next generation sequencing technology, we were able to sequence each amplified product more deeply and therefore detect multiple Native American haplotypes even with the contamination. Using transferred DNA could potentially open the door to a greater number of samples through which the genetic make-up of ancient and historic populations can be studied.
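
The advantage of deeper sequencing over cloning can be sketched in a few lines. The example below is a simplified illustration rather than the pipeline used in chapter four. The function names, the 5% reporting threshold, and the read counts are hypothetical, and reads are assumed to have already been reduced to haplotype labels.

```python
from collections import Counter


def haplotype_frequencies(reads):
    """Return the fraction of reads assigned to each haplotype label."""
    counts = Counter(reads)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {hap: n / total for hap, n in counts.items()}


def detected_haplotypes(reads, min_fraction=0.05):
    """Haplotypes at or above a minimum read fraction (threshold is assumed)."""
    freqs = haplotype_frequencies(reads)
    return sorted(hap for hap, f in freqs.items() if f >= min_fraction)


def probability_missed(fraction, n_sampled):
    """Chance that a haplotype at the given fraction is absent from
    n_sampled independently drawn clones or reads."""
    return (1.0 - fraction) ** n_sampled


# Hypothetical amplicon: one dominant haplotype, one minor haplotype, and a
# rare sequence that falls below the reporting threshold.
reads = ["hapA"] * 90 + ["hapB"] * 8 + ["hapC"] * 2
found = detected_haplotypes(reads)

# A haplotype at 5% is missed about 60% of the time with 10 clones,
# but almost never with 1,000 reads.
missed_by_cloning = probability_missed(0.05, 10)
missed_by_deep_sequencing = probability_missed(0.05, 1000)
```

The same calculation shows why a mixture of users and modern contaminants on a single object is difficult to resolve from a handful of clones: minor haplotypes are simply unlikely to be sampled.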

The limit of recoverable DNA in bones

Although it was not the main focus of my dissertation, in chapter five I reported the failure of traditional methods to amplify DNA from ancient bison (Bison latifrons), deer (Odocoileus sp.), and camels (Camelops hesternus). One fragmented camel sequence was recovered from an interglacial bone preserved in permafrost, while the bison and deer from lower latitudes yielded no sequences. These samples are most likely very near the limit of recoverable DNA for their temperature profile and environment, so any DNA present is highly damaged and extremely fragmented. Very recently, methods have been developed that recovered such fragmented and damaged DNA from a continuously frozen sample 700,000 years old. Based upon the results from the one C. hesternus bone, these samples are likely candidates for recovery of fragmented DNA using the new methods.

DNA in extreme conditions

All of these projects combine to form a better understanding of the conditions under which DNA is recoverable and of the steps needed to authenticate ancient samples that do not come from bones. As detection methods improve, very low quantities of DNA can be detected. This has led to discoveries of life in many places once thought to be uninhabitable and to the recovery of DNA from single cells. The research above advances work in some of these areas, including obtaining biomarkers from the deep subsurface and recovering DNA that was transferred to objects through use. In addition, new extreme claims require means of authentication such as the DRT. The results of this work can inform research not only in the fields directly involved (geomicrobiology, anthropology, archaeology, and biology) but also in the newly emerging field of astrobiology, which seeks to understand life on early Earth and to develop methods to detect life on other planets and moons, both in our Solar System and elsewhere in the Universe.

Ancient DNA and Astrobiology

According to the NASA Astrobiology Roadmap, astrobiology addresses three basic questions: “How does life begin and evolve? Does life exist elsewhere in the universe? What is the future of life on Earth and beyond?” (Des Marais et al., 2008). These questions are inherently multidisciplinary, bringing together diverse teams of physicists, astronomers, geologists, biologists, and chemists to answer them. Although these collaborations are generally beneficial, the interdisciplinary nature occasionally leads to conflicts that “arise from different standards that different disciplines require for ‘proof’” (Benner et al., 2013). One recent conflict of this kind concerns the claim of a bacterium alleged to substitute arsenic for phosphorus in its DNA (Wolfe-Simon et al., 2011). Benner et al. (2013) describe how this conflict over proof unfolded. The field of aDNA has faced similar problems with authentication of results because its samples are degraded and prone to contamination. Astrobiology has come to include a component of DNA research, particularly in extreme environments. Because many of these samples contain little biomass and/or degraded DNA and are prone to contamination, the knowledge, methods, and standards of “proof” of the ancient DNA community can help improve DNA research in astrobiology.

As missions to planets and moons within our Solar System become more accessible and biotechnology advances, advanced biomolecular life detection systems are being tested with the goal of using them on other planets and moons, specifically Mars, Europa, Enceladus, and Titan (Beegle et al., 2011; Lui et al., 2011; Parro et al., 2008, 2011). Two main concerns with these DNA life detection mechanisms can benefit from collaborations with aDNA researchers. First, these methods will have limited sampling capability and limited geographical range. A better understanding of analog environments and of the preservation conditions of DNA here on Earth will therefore help identify the optimal sampling locations for such a mission. Second, extraordinary claims of finding DNA-based life elsewhere in our Solar System will require extraordinary proof of authenticity, as contamination from Earth microorganisms poses a major problem (Bargoma et al., 2013; Ghosh et al., 2010; Kerney and Schuerger, 2011). Ancient DNA research has confronted the problem of determining the authenticity of extreme DNA claims for thirty years and can provide insight for astrobiologists.

Analog environments and preservation conditions

A number of Mars and Europa analog sites on Earth are being investigated (Azúa-Bustos et al., 2010; Barbieri and Stivaletta, 2012; Bulat et al., 2011; Foing et al., 2011; Gilichinsky et al., 2003; Gómez et al., 2011; Heldmann et al., 2013; Hynek et al., 2011; Jones et al., 2010; Marcucci et al., 2013; Sarkisova et al., 2010; Sherwood Lollar et al., 2009). These sites often have what we normally think of as extreme conditions, yet life has been found to survive and even thrive in places ranging from permanently ice-covered lakes (Bulat et al., 2011) to hot springs (Sarkisova et al., 2010). However, investigating such extreme locations comes with its own set of problems, which can benefit from the knowledge and methods used in aDNA research.

Two problems present in some analog sites are that 1) they yield low biomass samples and 2) the samples can easily be contaminated. The CBIS is an example of a Mars analog site where cratering could create a temporary hydrothermal system (Horton et al., 2006). Working with low biomass samples poses problems that have been addressed in aDNA research. One simple and relatively small detail is that DNA adheres to the walls of plasticware, and with low biomass samples this can negatively affect amplification and library preparation. To avoid this non-specific binding of DNA to tube walls, 0.05% Tween-20 is added to solutions in which DNA is stored. The aDNA community has also optimized protocols for low biomass samples, including extraction and library preparation methods, so geobiologists working at analog sites can benefit from collaborations on protocol development.

Criteria for decreasing contamination used in aDNA research include isolating extractions from post-amplification steps (Gilbert et al., 2005). This criterion was applied to the CBIS samples, and most of the families of microorganisms detected in the negative extraction control were identified as bacteria commonly found on humans or in clean rooms. When such isolation protocols are not in place, as with the claims of geologically ancient halophiles, there is less certainty that the results are not from laboratory contamination. This can be seen in Park et al. (2009), where some of the ancient sequences were clearly the same as the modern positive control. Incorporating the knowledge of how to reduce contamination already used in the aDNA community can decrease contamination, especially from modern extremophiles that might be in culture or amplified in the laboratory.
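
One simple way to use a negative extraction control is to flag every taxon that also appears in the control before interpreting a sample. The sketch below illustrates that bookkeeping only and is not the analysis performed on the CBIS metagenomes. The function name, the family names chosen as examples, and the read counts are hypothetical.

```python
def partition_by_negative_control(sample_counts, control_taxa):
    """Split a sample's taxon counts into retained and flagged groups.

    sample_counts maps taxon name to read count. Any taxon that was also
    observed in the negative control is flagged as a possible contaminant.
    """
    retained = {t: n for t, n in sample_counts.items() if t not in control_taxa}
    flagged = {t: n for t, n in sample_counts.items() if t in control_taxa}
    return retained, flagged


# Hypothetical counts for one sample and the taxa seen in its extraction blank.
sample = {
    "Desulfobacteraceae": 120,
    "Staphylococcaceae": 15,
    "Propionibacteriaceae": 9,
    "Clostridiaceae": 40,
}
negative_control = {"Staphylococcaceae", "Propionibacteriaceae"}

retained, flagged = partition_by_negative_control(sample, negative_control)
flagged_fraction = sum(flagged.values()) / sum(sample.values())
```

Flagging rather than silently deleting keeps the decision visible: a taxon shared with the control may still be genuinely present in the sample, so the flagged list is a prompt for closer inspection and not proof of contamination.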

In addition to investigating different analog environments where life is currently active on Earth, aDNA research can provide insight into how DNA survives when it is not actively being repaired. Ideally, DNA would be found in a living microbial community on another planet; however, this might not be the case. Life may have gone extinct, leaving only dead cells with damaged DNA or DNA shed into the environment. Understanding the conditions that favor DNA preservation is important for choosing a landing location for any sampling mission to another planet or moon. Bones from Snowmass, CO, dating to between 100,000 and 150,000 years ago, are at the edge of preservation for temperate environments. Colder temperatures improve DNA preservation, and generally colder sites such as Hunker Creek, Yukon, Canada, offer slightly better preservation conditions. The best preservation is in permanently frozen conditions such as those which yielded the 700,000-year-old horse bone (Orlando et al., 2013). In addition to temperature, DNA preservation also depends on pH, heavy metal concentration, oxygen content, and exposure to UV radiation (Hofreiter et al., 2001). Understanding DNA preservation on Earth will aid in identifying landing sites for remote detection devices on other planets or moons.

Remote DNA detection and analysis

Ancient DNA researchers continually face difficulties with contamination. Extraordinary claims of finding biomolecules on other planets or moons would require extraordinary proof that such evidence is authentic. As methods of detecting microorganisms increase in sensitivity, microorganisms have been found to survive in a large number of extreme conditions, including clean rooms used in making space equipment (Bargoma et al., 2013; La Duc et al., 2003; Ghosh et al., 2010; Kerney and Schuerger, 2011; Venkateswaran et al., 2001) and trips into space (Horneck et al., 1994; Onofri et al., 2012; Saffary et al., 2002). These terrestrial microorganisms are potential contaminants of any remote detection assay.

Any claim of microorganisms from a remote detection assay would require thoughtful use of negative controls, assessment of potential contaminants, inclusion of methods to detect extremely fragmented molecules (in case only remains, and not currently living organisms, are present), and assessment of the results with regard to independent evolution. Microbial and human contamination pose problems for any ancient DNA analysis because DNA from these organisms is ubiquitous in nature. Contamination was observed both in the negative control for the CBIS extraction and in the human extracts from the moccasins. Understanding and assessing the possible sources of contamination is essential to authenticating results, and such assessment can lend support to authentic results, as seen with the CBIS core and the buried Native American moccasins.

As is seen in all the previous chapters, fragmentation of DNA occurs easily after the death of an organism. Because any sample return mission or remote detection device would be expensive, it would be beneficial to include detection of fragmented DNA in order to assess whether DNA ‘fossils’ are present in the environment as well.

Future Directions

Continued collaboration between aDNA and astrobiology researchers will help strengthen claims of authenticity and improve detection of ‘fossil’ or low biomass microorganisms in extreme environments. Future research directions for collaboration include the following. First, increased research on the terrestrial and marine subsurface. Contamination introduced by sample retrieval methods, particularly coring, continues to cause problems for researchers, and detecting and preventing such contamination is essential to research in the deep subsurface as well as to missions to detect microorganisms on other planets and moons. Second, investigation into authentication of ancient DNA when only sequence data are available. Claims of geologically ancient DNA up to millions of years old are controversial. Collaborations could increase understanding of DNA preservation in extreme environments, such as those with high salt concentrations, and of how long bacteria can survive in low metabolic states, both of which will aid in authenticating such geologically ancient data. In addition, when little information is available, ways of authenticating ancient samples from sequence data alone could help in determining the age of microorganisms found in the deep subsurface and in understanding how active these communities are. Finally, ancient DNA researchers often work with samples which are too fragmented for traditional PCR-based assays. Collaborations using the methods developed for such samples will help to improve detection techniques for organisms here on Earth or elsewhere in our Solar System which are too divergent to be detected by traditional methods.

References

Azúa-Bustos, A., González-Silva, C., Salas, L., Wynne, J.J., McKay, C.P., Palma, R.E., and Vicuña, R. (2010). Atacama Desert Caves as Analog Models of Habitability for Microbial Life on the Surface of Mars. LPI Contrib. 1538, 5030. Barbieri, R., and Stivaletta, N. (2012). Halophiles, Continental Evaporites and the Search for Biosignatures in Environmental Analogues for Mars. In Life on Earth and Other Planetary Bodies, (Springer), pp. 13–26. Bargoma, E., La Duc, M.T., Kwan, K., Vaishampayan, P., and Venkateswaran, K. (2013). Differential Recovery of Phylogenetically Disparate Microbes from Spacecraft-Qualified Metal Surfaces. Astrobiology 13, 189–202. Beegle, L., Kirby, J.P., Fisher, A., Hodyss, R., Saltzman, A., Soto, J., Lasnik, J., and Roark, S. (2011). Sample handling and processing on Mars for future astrobiology missions. In 2011 IEEE Aerospace Conference, pp. 1–10. Benner, S.A., Bains, W., and Seager, S. (2013). Models and Standards of Proof in Cross-Disciplinary Science: The Case of Arsenic DNA. Astrobiology. Bulat, S.A., Alekhina, I.A., Marie, D., Martins, J., and Petit, J.R. (2011). Searching for life in extreme environments relevant to Jovian’s Europa: Lessons from subglacial ice studies at Lake Vostok (East Antarctica). Adv. Space Res. 48, 697–701. La Duc, M.T., Nicholson, W., Kern, R., and Venkateswaran, K. (2003). Microbial characterization of the Mars Odyssey spacecraft and its encapsulation facility. Environ. Microbiol. 5, 977–985. Foing, B.H., Stoker, C., and Ehrenfreund, P. (2011). Astrobiology field research in Moon/Mars analogue environments. Int. J. Astrobiol. 10, 137–139. Ghosh, S., Osman, S., Vaishampayan, P., and Venkateswaran, K. (2010). Recurrent Isolation of Extremotolerant Bacteria from the Clean Room Where Phoenix Spacecraft Components Were Assembled. Astrobiology 10, 325–335. Gilbert, M.T.P., Bandelt, H.-J., Hofreiter, M., and Barnes, I. (2005). Assessing ancient DNA studies. Trends Ecol. Evol. 20, 541–544.
Gilichinsky, D., Rivkina, E., Shcherbakova, V., Laurinavichuis, K., and Tiedje, J. (2003). Supercooled Water Brines Within Permafrost—An Unknown Ecological Niche for Microorganisms: A Model for Astrobiology. Astrobiology 3, 331–341. Gómez, F., Walter, N., Amils, R., Rull, F., Klingelhöfer, A.K., Kviderova, J., Sarrazin, P., Foing, B., Behar, A., and Fleischer, I. (2011). Multidisciplinary integrated field campaign to an acidic Martian Earth analogue with astrobiological interest: Rio Tinto. Int. J. Astrobiol. 10, 291.

Heldmann, J.L., Pollard, W., McKay, C.P., Marinova, M.M., Davila, A., Williams, K.E., Lacelle, D., and Andersen, D.T. (2013). The high elevation Dry Valleys in Antarctica as analog sites for subsurface ice on Mars. Planet. Space Sci. 85, 53–58. Hofreiter, M., Serre, D., Poinar, H.N., Kuch, M., and Pääbo, S. (2001). Ancient DNA. Nat. Rev. Genet. 2, 353–359. Horneck, G., Bücker, H., and Reitz, G. (1994). Long-term survival of bacterial spores in space. Adv. Space Res. 14, 41–45. Horton, J.W., Ormö, J., Powars, D.S., and Gohn, G.S. (2006). Chesapeake Bay impact structure: Morphology, crater fill, and relevance for impact structures on Mars. Meteorit. Planet. Sci. 41, 1613–1624. Hynek, B.M., McCollom, T.M., and Rogers, K.L. (2011). Cerro Negro volcano, Nicaragua: An assessment of geological and potential biological systems on early Mars. Analogs Planet. Explor. 279–285. Jones, D.S., Tobler, D.J., Schaperdoth, I., Mainiero, M., and Macalady, J.L. (2010). Community structure of subsurface biofilms in the thermal sulfidic caves of Acquasanta Terme, Italy. Appl. Environ. Microbiol. 76, 5902–5910. Kerney, K.R., and Schuerger, A.C. (2011). Survival of Bacillus subtilis Endospores on Ultraviolet-Irradiated Rover Wheels and Mars Regolith under Simulated Martian Conditions. Astrobiology 11, 477–485. Lui, C., Carr, C.E., Rowedder, H., Ruvkun, G., and Zuber, M. (2011). SETG: An instrument for detection of life on Mars ancestrally related to life on Earth. In 2011 IEEE Aerospace Conference, pp. 1–12. Des Marais, D.J., Nuth, J.A., Allamandola, L.J., Boss, A.P., Farmer, J.D., Hoehler, T.M., Jakosky, B.M., Meadows, V.S., Pohorille, A., Runnegar, B., et al. (2008). The NASA Astrobiology Roadmap. Astrobiology 8, 715–730. Marcucci, E.C., Hynek, B.M., Kierein-Young, K.S., and Rogers, K.L. (2013). Visible to Near-Infrared Spectroscopy of Acid-Sulfate Weathering Sites in Nicaraguan Volcanic Systems: An Early Mars Analog. LPI Contrib. 1719, 1677.
Onofri, S., de la Torre, R., de Vera, J.-P., Ott, S., Zucconi, L., Selbmann, L., Scalzi, G., Venkateswaran, K.J., Rabbow, E., Sánchez Iñigo, F.J., et al. (2012). Survival of Rock-Colonizing Organisms After 1.5 Years in Outer Space. Astrobiology 12, 508–516. Orlando, L., Ginolhac, A., Zhang, G., Froese, D., Albrechtsen, A., Stiller, M., Schubert, M., Cappellini, E., Petersen, B., Moltke, I., et al. (2013). Recalibrating Equus evolution using the genome sequence of an early Middle Pleistocene horse. Nature 499, 74–78. Park, J.S., Vreeland, R.H., Cho, B.C., Lowenstein, T.K., Timofeeff, M.N., and Rosenzweig, W.D. (2009). Haloarchaeal diversity in 23, 121 and 419 MYA salts. Geobiology 7, 515–523.

Parro, V., Rivas, L.A., and Gómez-Elvira, J. (2008). Protein Microarrays-Based Strategies for Life Detection in Astrobiology. Space Sci. Rev. 135, 293–311. Parro, V., de Diego-Castilla, G., Rodríguez-Manfredi, J.A., Rivas, L.A., Blanco-López, Y., Sebastián, E., Romeral, J., Compostizo, C., Herrero, P.L., García-Marín, A., et al. (2011). SOLID3: A Multiplex Antibody Microarray-Based Optical Sensor Instrument for In Situ Life Detection in Planetary Exploration. Astrobiology 11, 15–28. Saffary, R., Nandakumar, R., Spencer, D., Robb, F.T., Davila, J.M., Swartz, M., Ofman, L., Thomas, R.J., and DiRuggiero, J. (2002). Microbial survival of space vacuum and extreme ultraviolet irradiation: strain isolation and analysis during a rocket flight. FEMS Microbiol. Lett. 215, 163–168. Sarkisova, S.A., Tringe, S.G., Thomas-Keprta, K.L., Garrison, D.H., McKay, D.S., and Brown, I.I. (2010). Studying Prokaryotic Communities in Iron Depositing Hot Springs (IDHS): Implication for Early Mars Habitability. Sherwood Lollar, B., Moran, J., Tille, S., Voglesonger, K., Lacrampe-Couloume, G., Onstott, T., Pratt, L., and Slater, G. (2009). Hydrogeologic Controls on the Deep Terrestrial Biosphere-Chemolithotrophic Energy for Subsurface Life on Earth and Mars. In AGU Spring Meeting Abstracts, p. 01. Venkateswaran, K., Satomi, M., Chung, S., Kern, R., Koukol, R., Basic, C., and White, D. (2001). Molecular Microbial Diversity of a Spacecraft Assembly Facility. Syst. Appl. Microbiol. 24, 311–320. Wolfe-Simon, F., Blum, J.S., Kulp, T.R., Gordon, G.W., Hoeft, S.E., Pett-Ridge, J., Stolz, J.F., Webb, S.M., Weber, P.K., and Davies, P.C. (2011). A bacterium that can grow by using arsenic instead of phosphorus. Science 332, 1163–1166.

Vita

Kristine Korzow Richter

Kristine was born and raised in Valparaiso, Indiana. She received a B.A. in Biochemistry and Near Eastern Languages and Civilizations, with a minor in Mathematics, as well as her M.S. in Chemistry from The University of Pennsylvania in 2006. During this time she worked on microarray chemistry in a corporate collaboration with Boekel Microhybe under Don Baldwin and Ponzy Lu. She received a second master’s degree (M.Sc.) in Archaeological Science from the University of Bradford (UK) in 2009, where she worked on proteomics of archaeological fish bones under Matthew Collins and Andrew Jones. Her work on ancient biomolecules led her to pursue a graduate degree in the Department of Biology of The Pennsylvania State University, where she worked under Beth Shapiro in the Ancient DNA Laboratory.