
CHARACTERIZATION OF IN () GLACIAL, BASAL, AND ACCRETION

Colby J. Gura

A Thesis

Submitted to the Graduate College of Bowling Green State University in partial fulfillment of the requirements for the degree of

MASTER OF SCIENCE

December 2019

Committee:

Scott O. Rogers, Advisor

Helen Michaels

Paul Morris

© 2019

Colby Gura

All Rights Reserved

ABSTRACT

Scott O. Rogers, Advisor

Chapter 1: Lake Vostok is named for the nearby Vostok Station, located at 78°28’S, 106°48’E and at an elevation of 3,488 m. The lake is covered by a glacier that is approximately 4 km thick and comprised of four different types of ice: meteoric, basal, type 1 accretion ice, and type 2 accretion ice. Six samples were derived from the glacial, basal, and accretion ice of the 5G ice core (depths of 2,149 m; 3,501 m; 3,520 m; 3,540 m; 3,569 m; and 3,585 m) and prepared through several processes. The RNA and DNA were extracted from ultracentrifugally concentrated samples. cDNA was synthesized from the extracted RNA so that the samples could be further manipulated. Both the cDNA and the DNA were amplified by polymerase chain reaction. Ion Torrent primers were attached to the DNA and cDNA, which were then prepared for sequencing. Following sequencing, the sequences were analyzed using BLAST. Python and Biopython were then used to collect more data and to organize the data for manual curation and analysis.

Chapter 2: As a result of the glacier and its geographic location, Lake Vostok is an extreme and unique environment that is often compared to Jupiter’s ice-covered moon, Europa.

Lake Vostok was originally thought to be sterile, but multiple studies have suggested that not only is there a variety of bacterial and eukaryotic organisms living in the lake, but it may contain a complex . The results of this analysis yielded metagenomic and metatranscriptomic data that aligned with a wide variety of organisms from 30 different phyla. The associated organisms were capable of many metabolic pathways, such as the cycle and fixation, as well as oxidation and/or reduction pathways for , , arsenic, hydrogen, , phosphorus, uranium, and chromium compounds. The number of organisms unique to each sample was quite high for all samples except the layered meteoric ice sample, which only contained sequences similar to one . These results, combined with previous research, indicate that Lake Vostok is a transitory repository of DNA and organisms from the glacier, and also contains a much larger dynamic and ecosystem.

TABLE OF CONTENTS

CHAPTER 1: INTRODUCTORY REVIEW

1.1 Lake Vostok

1.2 Glacial Ice

1.3 Ice Core Drilling

1.4 RNA Extraction

1.5 DNA Extraction

1.6 cDNA Synthesis

1.7 Polymerase Chain Reaction (PCR)

1.8 Ion Torrent Sequencing

1.9 Basic Local Alignment Search Tool

1.10 Python and Biopython

1.11 Chapter 1 References

CHAPTER 2: INFLUX OF ORGANISMS INTO VOSTOK FROM GLACIAL AND BASAL ICE

2.1 Abstract

2.2 Introduction

2.2.1 Lake Vostok

2.2.2 Glacial Ice Conditions

2.2.3 Previous Research

2.2.4 Purpose of Research

2.3 Materials and Methods

2.3.1 Sample Preparation

2.3.2 cDNA Synthesis

2.3.3 Adapter Ligation

2.3.4 Fractionation

2.3.5 DNA Rehydration

2.3.6 Polymerase Chain Reaction

2.3.7 EcoRI MID Primer Polymerization and Purification

2.3.8 Sample Purification

2.3.9 Ion Torrent Primer Polymerization

2.3.10 Ion Torrent Sample Preparation

2.3.11 Sequence Clean Up

2.3.12 Organism Determinations

2.3.13 and Physiology

2.3.14 Comparison of Data

2.4 Results and Discussion

2.4.1 Sequence Data – Overall

2.4.2 Glacial Ice (2,149 m)

2.4.2.1 Results Summary

2.4.2.2 Organism Overlap

2.4.3 Basal Ice (3,501 m + 3,520 m)

2.4.3.1 Results Summary

2.4.3.2 Organism Overlap

2.4.3.3 Unshared Organism Summary

2.4.4 Shallow Embayment Type 1 Accretion Ice (3,540 m + 3,569 m)

2.4.4.1 Results Summary

2.4.4.2 Organism Overlap

2.4.4.3 Unshared Organism Summary

2.4.5 Shallow Embayment Type 2 Accretion Ice (3,585 m)

2.4.5.1 Results Summary

2.4.5.2 Organism Overlap

2.4.5.3 Unshared Organism Summary

2.4.6 Summary

2.4.7

2.4.7.1 Glacial Ice (2,149 m)

2.4.7.2 Basal Ice (3,501 m + 3,520 m)

2.4.7.3 Type 1 Accretion Ice (3,540 m + 3,569 m)

2.4.7.4 Type 2 Accretion Ice (3,585 m)

2.4.8 Metabolic Analysis

2.4.8.1 Glacial Ice (2,149 m)

2.4.8.2 Basal Ice (3,501 m + 3,520 m)

2.4.8.3 Type 1 Accretion Ice (3,540 m + 3,569 m)

2.4.8.4 Type 2 Accretion Ice (3,585 m)

2.5 Conclusions of Findings

2.5.1 Meteoric Ice (2,149 m)

2.5.2 Basal Ice (3,501 m + 3,520 m)

2.5.2.1 Organism Overlap

2.5.2.2 Organism Contents

2.5.3 Shallow Embayment Type 1 Accretion Ice (3,540 m + 3,569 m)

2.5.3.1 Organism Overlap

2.5.3.2 Organism Contents

2.5.4 Shallow Embayment Type Two Accretion Ice (3,585 m)

2.5.4.1 Organism Overlap

2.5.4.2 Organism Contents

2.5.5 Summary

2.5.6 Biochemical Ecosystem

2.5.6.1 Basal Ice

2.5.6.2 Accretion Ice

2.5.6.3 Basal Versus Accretion Samples

2.5.7 Possible and Sources

2.5.8 Zones in Lake Vostok

2.5.9 Multicellular Organisms

2.5.10 Lake Vostok Ecosystem

2.5.11 Conclusions

2.6 Chapter 2 References

LIST OF FIGURES

Figure

1 Three maps that illustrate where Lake Vostok is positioned, details about the lake’s shape and geography

2 Top graph - Number of viable, non-viable, and total cells/ml isolated from glacial meltwater. Lower graph - Number of unique sequences per sample and count of colonies/ml isolated from the same samples

3 The process for reverse transcribing RNA into cDNA

4 Stages, temperatures, and times for the thermocycler protocol utilized during PCR

5 Layout of the CMOS chip for Ion Torrent

6 Cross section along the pathway of the glacier to the ice core drill site, indicating the probable composition of the ice that is over Lake Vostok as well as the lake’s layout

7 Detailed view of where all samples investigated in this study and previous study by Shtarkman et al. (2013) are located in the glacier above Lake Vostok

8 The overlap of 3,501 m + 3,520 m, 3,540 m + 3,569 m, and 3,585 m samples is illustrated using a Venn diagram

9 The number of organisms from a variety of phyla found in the 3,501 m + 3,520 m (basal ice) sample, the 3,540 m + 3,569 m (type 1 accretion ice) sample, and the overlaps between them

10 Comparison of the number of organisms from a variety of phyla found in the 3,501 m + 3,520 m (basal ice) sample and the 3,585 m (type 2 accretion ice) sample and the overlap between them

11 Organisms unique to 3,501 m + 3,520 m (blue circle with 513 unique organisms; basal ice) and the V5 sample (yellow circle with 1224 unique organisms; type 1/2 accretion ice), as well as the overlaps between them

12 Organisms unique to and shared between 3,501 m + 3,520 m (blue circle with 513 unique organisms; basal ice) and V6 (3,606 m + 3,621 m; type 1/2 accretion ice from the main lake basin; purple circle with 93 unique organisms)

13 Organisms unique to the 3,540 m + 3,569 m sample (green circle; type 1 accretion ice) and unique to the 3,585 m sample (red circle; type 2 accretion ice)

14 Organisms unique to the 3,540 m + 3,569 m sample (green circle; type 1 accretion ice) and unique to the V5 sample (yellow circle; type 1/2 accretion ice), both from the shallow embayment

15 Organisms unique to the 3,540 m + 3,569 m sample (green circle; type 1 ice from shallow embayment) and unique to the V6 sample (purple circle; type 1/2 ice from main lake basin), as well as the overlap between them

16 Number of unique organisms that were unique to the 3,585 m sample (red circle; type 2 accretion ice) and V5 (yellow circle; type 1/2 accretion ice) and the overlap between them

17 Organisms unique to both the 3,585 m sample (red circle; type 2 accretion ice from the shallow embayment) and V6 (purple circle; 3,606 + 3,621 m; type 1/2 accretion ice from the main lake basin), as well as the overlap between them

18 The distribution of unique organisms that are shared between the samples analyzed in this study and a previous study by Shtarkman et al., 2013

19 The proportions of the various extremophiles found in ice samples from contemporary and previous research by Shtarkman et al., (2013)

20 The quantity of unique organisms from the three samples that were found to contain extremophiles

21 Phyla that have been found in both the 3,540 m + 3,569 m sample and the 3,585 m sample that have been shown to be involved in the nitrogen cycle

LIST OF TABLES

Table

1 A list of each of the 3 parts of the synthesized forward and reverse Ion Torrent primer sequences

2 Summary of results from the 3,501 m + 3,520 m sample with organisms classified to the phylum level, unique rRNA gene sequences, their ecology, their physiology, and any important notes

3 Summary of results from the 3,540 m and 3,569 m sample with organisms classified to the phylum level, unique rRNA gene sequences, their ecology, their physiology, and any important notes

4 Summary of results from the 3,585 m sample with organisms classified to the phylum level, unique rRNA gene sequences, their ecology, their physiology, and any important notes

CHAPTER 1: INTRODUCTORY REVIEW

1.1 Lake Vostok

Lake Vostok is the largest of the 379 subglacial lakes discovered to date in Antarctica (Fig. 1), with a maximum depth of 1,067 m and a surface area of 14,000 km² (Wright & Siegert, 2012). Named for the nearby Vostok Station, it is located at 78°28’S, 106°48’E, at an elevation of 3,488 m. Having existed for 15 million years under the East Antarctic Ice Sheet, Lake Vostok was initially discovered in 1977 using aerial radio echo sounding (RES), a form of ground-penetrating radar (Robin et al., 1977; Siegert et al., 1996; Siegert et al., 2001). The lake was confirmed in 1996 by ERS-1, a satellite designed expressly for taking altimetry measurements of polar ice sheets. By accurately measuring elevation changes of the glacial ice, locations identified in the RES studies were reanalyzed and the proposed borders of Lake Vostok were confirmed (Siegert et al., 1996). Further analysis with aerial RES found that the average thickness of the glacier was approximately 4,000 m (Siegert et al., 1996).

Figure 1. Three maps that illustrate where Lake Vostok is positioned, details about the lake’s shape and geography. a) Map of Antarctica showing the location of Lake Vostok. b) Outline of Lake Vostok based upon radio echo sounding and satellite. c) Area of focus for this analysis, the 5G drill site, and flow of glacier across the lake (dashed line with directional arrow).

Adapted from: Rogers et al. 2013

Lake Vostok is considered to be an extreme environment and is often compared to conditions on Jupiter’s icy moon, Europa (Petit et al., 2005). Lake Vostok exists in complete darkness because it resides under 4,000 m of overlying glacial ice that exerts approximately 350 atmospheres of pressure and has temperatures that vary greatly based upon their proximity to activity (Christner et al., 2006; Shtarkman et al., 2013). The dissolved levels are estimated to be a maximum of 50 times above atmospheric levels, due to melting of glacial ice depositing trapped atmospheric gases into the lake (Lipenkov & Istomin, 2001).

Consequently, the comparison to Europa is not without merit as any organism living in such an environment faces a myriad of astonishing challenges.
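The pressure figure quoted above can be checked with a simple hydrostatic estimate, P = ρgh. The following is a minimal sketch only: the ice density (917 kg/m³) and gravitational acceleration (9.81 m/s²) are assumed textbook values that do not come from the cited studies, and the function name is hypothetical.

```python
# Hydrostatic estimate of the pressure under the ice covering Lake Vostok.
# ASSUMPTIONS (not from the cited studies): uniform ice density and
# standard gravity; the real ice column varies in density with depth.
ICE_DENSITY_KG_M3 = 917.0
GRAVITY_M_S2 = 9.81
PASCALS_PER_ATM = 101_325.0


def overburden_pressure_atm(ice_thickness_m: float) -> float:
    """Return the pressure (atm) exerted by an ice column of the given thickness."""
    if ice_thickness_m < 0:
        raise ValueError("ice thickness must be non-negative")
    pressure_pa = ICE_DENSITY_KG_M3 * GRAVITY_M_S2 * ice_thickness_m
    return pressure_pa / PASCALS_PER_ATM


# About 355 atm for a 4,000 m ice column, close to the ~350 atm quoted above.
print(f"{overburden_pressure_atm(4000):.0f} atm")
```

Under these assumptions, 4,000 m of ice gives roughly 355 atmospheres, which is consistent with the approximate value reported in the literature.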

Though the harsh conditions could lead one to believe that Lake Vostok is a sterile environment, evidence suggests this is not the case. This research serves to confirm previous findings from the laboratory of Dr. Scott Rogers that indicated there may be an ecosystem of organisms living in Lake Vostok (D’Elia et al., 2008; D’Elia et al., 2009; Rogers et al., 2013; Shtarkman et al., 2013). Furthermore, if evidence of organisms living in the lake can be found, the source of these organisms will be determined. To accomplish this, metagenomic and metatranscriptomic techniques will be employed to taxonomically identify DNA and RNA sequences isolated from ice core samples. These taxonomic identifications will be used to analyze the niches and metabolic capabilities associated with the organisms, as well as to compare ice core sections to determine whether organisms are shared between samples.

1.2 Glacial Ice

The glacier above Lake Vostok is comprised of four distinct regions of ice: glacial ice, basal ice, type 1 accretion ice, and type 2 accretion ice. Glacial ice, or meteoric ice, is the thickest portion of the glacier, extending from the upper surface to a depth of 3,310 m (Petit et al., 1999). At its deepest, the ice is estimated to be between 410,000 and 420,000 years old (Petit et al., 1999; Salamatin et al., 2004). It is comprised of compacted snowfall, which accumulates at a rate of 2.7 g/cm² per year (Kapitsa et al., 1996). This undisturbed glacial ice provides a temporal paleoclimate history dating back 420,000 years (Christner et al., 2006; Petit et al., 1999).

Bulat et al. (2009) investigated three samples derived from Vostok glacial ice and found

the samples to contain on average between 1 and 3 cells/ml, with a peak of 24 cells/ml. Another

study by Christner et al. (2006) on 13 different glacial ice samples between depths of 171 m and

3,196 m examined a variety of characteristics of the ice. From those samples, cellular

concentrations between 34 ± 10 and 380 ± 53 cells/ml were reported based upon SYBR Gold

staining of cell suspensions. Both studies provide evidence that there is a great degree of

variation in cellular concentration between samples from glacial ice. Additionally, it should be

noted that fine and dust can skew readings in some instances, though in glacial ice this

problem is not as prevalent.

Basal ice lies between depths of 3,310 m and 3,538 m, which corresponds to glacial ice that is between 1 and 2 million years old at its deepest (Salamatin et al., 2004). The basal ice is meteoric ice that has interacted with the bedrock as the glacier moves eastward at about 2 m per year over the ridge on the west side of the lake. It is characterized by disorganization of the ice layers and the presence of sediment inclusions. As a result of this turbulence, the temporal record is not discernible in basal ice (Petit et al., 2005). As with meteoric ice, the concentration of cells is extremely low, falling between 1 and 2 cells per ml with almost none being viable, based upon fluorescence microscopy with live/dead staining (Fig. 2, Top; D’Elia et al., 2009). As with the glacial ice, culturing of basal ice meltwater yielded <5 colonies per ml (Fig. 2, Bottom). When the cells were analyzed using electron microscopy, many exhibited various degrees of damage, supporting the live/dead fluorescence staining evidence (D’Elia et al., 2008).

Figure 2. Top graph - Number of viable, non-viable, and total cells/ml isolated from glacial meltwater. Lower graph - Number of unique sequences per sample and count of colonies/ml isolated from the same samples.

Source: D’Elia et al., 2009.


Type 1 accretion ice, found between depths of 3,538 m and 3,585 m and between 3,595 m and 3,608 m, is the result of lake water freezing to the bottom of the glacier in the shallow embayment and over a peninsula, respectively, as the glacier traverses the lake (Fig. 1c; Salamatin et al., 2004; Castello & Rogers, 2005; Rogers et al., 2013). The accretion ice closest to the bottom of the glacial ice (oldest accretion ice) is approximately 10,000 to 15,000 years old. There are several notable characteristics of type 1 accretion ice, including visible sediment or bedrock inclusions, a change in ice crystal size from a few millimeters to up to one meter, and a radical reduction in electrical conductivity compared to glacial and basal ice (Petit et al., 2005). Furthermore, unlike the glacial ice, there is a higher cell count, with a mean of 7-8 cells per ml (with moderate viability), and there are no trapped gases present in the accretion ice (Petit et al., 2005; D’Elia et al., 2009; Rogers et al., 2013). In addition, previous analysis found that accretion ice, especially at the border between the first layer of type 1 and type 2 ice (approximate depth of 3,585 m), had the highest concentration of cells, unique sequences, and colonies (Fig. 2; D’Elia et al., 2009). Further investigation of accretion ice has found genetic evidence of higher richness and diversity of organisms, including within the , , Cyanobacteria, and ; as well as about 6% composed of a variety of (Shtarkman et al., 2013).

Type 2 accretion ice is found between depths of 3,585 m and 3,595 m and between 3,609 m and 3,769 m, and is comprised of water that has accreted to the glacier over the lake water of the shallow embayment and over most of the southern basin of Lake Vostok, respectively (Castello & Rogers, 2005; D’Elia et al., 2008, 2009; Rogers et al., 2013; Shtarkman et al., 2013). Unlike type 1 accretion ice, type 2 accretion ice is free of inclusions and below 3,623 m is almost entirely clear, with some monocrystals approaching 2 m in length (Petit et al., 2005; Bulat et al., 2009). Furthermore, there is a lower cell count, at 0-4 cells per ml, and fewer viable cells. A spike in cell concentration at approximately 3,620 m, just after the peninsula, was reported, and the was low (D’Elia et al., 2009). Genetic analysis of type 2 accretion ice has shown not only fewer organisms, but also less and diversity (Shtarkman et al., 2013).

1.3 Ice Core Drilling

There are three categories of drilling methods: electromechanical, thermal, and hot water.

Electromechanical drilling utilizes an electric motor to spin a large tube tipped with hardened

steel teeth around the circumference of the drill, leaving a cavity in the center that holds the ice

core sections. Additionally, the outside of the tube has a protruding thread that carries the ice

chips away from the teeth as the drill cuts (National Science Foundation, 2019). Drilling fluids,

such as kerosene, are often used, which reduce vibration, lubricate the drill, and prevent

refreezing of water (Rogers & Castello, 2020, in press). Thermal drilling uses a heated ring at the

tip of the drill instead of teeth to melt through the ice. Refreezing of the borehole is a big

concern, so antifreeze must be added. While thermal drilling can be simpler due to

fewer moving parts and lower stress on the components, the bore holes produced are often

smaller and the heat can damage glacial ice due to temperature differential (National Science

Foundation, https://icecores.org/about-ice-cores). Hot water drills are used for creating a

borehole, without collecting a core. They are used for rapidly reaching a certain depth in the ice,

where an ice coring drill is then used to retrieve a core from subglacial features or to directly drill

to subglacial lakes (Rogers & Castello, 2020, in press).

The drilling of ice cores for scientific purposes is a relatively recent endeavor that began in the late 1950s in Greenland. The earliest cores drilled by the U.S. Army were fairly shallow, from a few hundred meters to 1,390 m deep. The cores were analyzed for isotope content to gain greater insight into the stability of the climate of the past (Maries Vej, 2019). By the 1970s, drilling techniques had become more refined, and soon ice cores reaching 2 km were retrieved from Greenland. In particular, the DYE 3 ice core from Greenland was studied by many people, including S.O. Rogers and colleagues, who isolated viable bacteria and fungi, as well as virus sequences, that were preserved in ice for up to 100,000 years (Rogers & Castello, 2020, in press; Castello & Rogers, 2005; Castello et al., 1999; Ma et al., 2000).

Drilling at Vostok began in the 1970s and ended in 2015, although active drilling was not ongoing the entire time (Litvinenko et al., 2014). The first large-scale drilling project, known as the 3G Vostok core, ran from 1980 until 1985. After some setbacks, the borehole eventually reached a depth of 2,085 m, corresponding to an age of 150,000 years. Analysis proved fruitful when Professor Sabit Abyzov and his team discovered bacteria, fungi, and microalgae, among other things, within the ice core sections (Rogers & Castello, 2020, in press; Castello & Rogers, 2005; Abyzov et al., 2004). Drilling of the 4G borehole began in 1985 to probe further into the glacial ice. The borehole eventually reached 2,546 m, corresponding to ice over 220,000 years old. Analysis found that the number of viable organisms decreased with depth, but some bacteria from the oldest ice were found to be culturable (Rogers & Castello, 2020, in press).

Drilling of the 5G borehole began in 1990, starting with 5G-1, which ultimately reached a depth of 3,523 m by February 1998. Because the drill became stuck at the 3,523 m depth, a deviation was made, creating a branch in the borehole that was named 5G-2. Drilling continued into the accretion ice (which started at 3,538 m), but was then stopped at 3,623 m to discuss how best to sample the lake water. After an 8-year break, drilling resumed, reaching a depth of 3,666 m before the drill became stuck, which again required deviation drilling around the problem area. The 5G-2 borehole eventually struck lake water at 3,769 m in February 2012, flooding the borehole. The drill was brought to the surface, the lake water in the borehole was allowed to freeze, and a core was retrieved from this frozen lake water the following year. Unfortunately, the core section was heavily contaminated with (that contained bacteria) and could not be used for study. A final borehole, 5G-3, was drilled in January 2015 using the 5G-1 hole as a head start. The borehole deviated at 3,458 m from the 5G-1 hole and eventually reached a depth of 3,724 m (Litvinenko et al., 2014). To date, no results from this lake sample have been reported. Nonetheless, studies of the 5G cores have proven fruitful, with ever more detailed analysis being performed (D’Elia et al., 2008, 2009; Rogers et al., 2013; Shtarkman et al., 2013).

1.4 RNA Extraction

In order to examine the contents of the ice core sections in more detail, nucleic acids (DNA and RNA) were extracted, amplified, and sequenced, followed by metagenomic and metatranscriptomic analyses. RNA extraction was performed utilizing TRIzol reagent. The TRIzol reagent is comprised of two aqueous compounds, guanidinium isothiocyanate and phenol, to which chloroform is added. The first component, guanidinium isothiocyanate, is a chaotropic salt (it disrupts hydrogen bonds between water molecules) that dissociates into two ions: guanidinium (Gdm+) and thiocyanate (SCN-). Gdm+ has been shown to be one of the strongest protein denaturing agents commonly in use, although its exact mechanism is not yet resolved. The protein denaturing power of the Gdm+ ions is essential for denaturing the RNases ubiquitously found in the environment (Mason et al., 2003). Additionally, SCN- is responsible for neutralizing excessive positive charges, denaturing proteins, and reducing the density of the . All of these qualities promote the salting out of proteins in solution once they are denatured (Mason et al., 2003).

The second component of TRIzol is phenol, an aromatic organic compound that is partially water soluble due to its polar hydroxyl group. Phenol cannot dissolve nucleic acids because phenol is overall non-polar and nucleic acids are highly polar. However, the density of phenol (1.07 g/ml) is marginally higher than that of water, which pushes the aqueous layer to the top while dissolving potential contaminants away. Additionally, when exposed to phenol, proteins are denatured through interactions with amino acid side chains based upon polarity (Oswald, 2015).

Lastly, chloroform is added to the solution to further separate the phenol layer from the aqueous layer. Because chloroform has a density of 1.49 g/ml and is non-polar, the solution separates into three phases once the sample is centrifuged. The bottom phase is the organic chloroform layer (often dyed red from the TRIzol solution), which carries with it the denatured proteins and non-polar contaminants. The second is the aqueous phenol phase that contains the suspended DNA, which can be isolated following a different procedure, although yields are generally low. Finally, a larger aqueous phase is found on top, which contains the suspended RNA (Rio et al., 2012).

1.5 DNA Extraction

The DNA extraction technique utilized was pioneered by Scott O. Rogers and Arnold J. Bendich to effectively extract DNA from milligram amounts of fresh, preserved, and mummified tissues (Rogers & Bendich, 1985). This technique uses cetyltrimethylammonium bromide (CTAB; also known as hexadecyltrimethylammonium bromide) to exploit the differential solubilities of polysaccharides, proteins, and nucleic acids when complexed with CTAB. The DNA-CTAB complex forms a CTA+/DNA- salt with a long hydrophobic tail, which reduces the solubility of the DNA. In addition, proteins in solution are denatured using a chloroform-isoamyl alcohol treatment to separate them out of solution prior to precipitation of the nucleic acids. The CTAB and NaCl concentrations can then be manipulated to precipitate the DNA and maintain the polysaccharides in solution (Azmat et al., 2012). A Na+/DNA- salt is eventually obtained in the final steps by raising the sodium ion concentration, which replaces the non-polar CTAB on the DNA, followed by precipitation of the sodium DNA salt using . This method has been shown to be capable of extracting DNA from sources containing low concentrations of DNA. The resulting DNA has been shown to be of high quality and usable in PCR and other molecular biology protocols (Rogers & Bendich, 1985).

1.6 cDNA Synthesis

Complementary DNA (cDNA) synthesis (Fig. 3) is a procedure in which RNA is reverse transcribed into double stranded DNA in a few steps (Croy, 1998). The first step is to isolate RNA from the target organisms. It should be noted that DNA can also be included in the sample along with the RNA because it will not be degraded during the process. For the combined metagenomic and metatranscriptomic procedures that we used, the RNA, extracted using TRIzol, and the DNA, isolated using CTAB, were combined in order to maintain a higher overall concentration, which improves recovery of the nucleic acids.


Figure 3. The process for reverse transcribing RNA into cDNA.

The second step is to attach DNA primers onto the sample RNA, allowing reverse transcriptase to bind to the template RNA (Fig. 3a). To target mRNA, oligo-dT primers are used to bind specifically to the polyA tails of mRNA. The disadvantage of this approach is that not only is less cDNA produced, but the cDNA is also not as useful for organism identification and cannot be used in some NGS (next generation sequencing) protocols. Alternatively, random hexamers can be used to prime mRNA, tRNA, rRNA, and other RNAs. This results in a higher yield of cDNA, because the relative proportions of mRNA, tRNA (and other small RNAs), and rRNA are approximately 5, 15, and 80%, respectively (Sambrook et al., 2001).
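The effect of primer choice on the amount of available template can be illustrated with a short calculation. This is a rough sketch under simplifying assumptions that are not part of the cited protocol: oligo-dT is treated as priming only mRNA, random hexamers are treated as priming every RNA class, and priming efficiency is ignored. The function and dictionary names are hypothetical.

```python
# Approximate composition of total RNA quoted in the text (Sambrook et al., 2001).
RNA_FRACTIONS = {
    "mRNA": 0.05,
    "tRNA and other small RNAs": 0.15,
    "rRNA": 0.80,
}


def primable_rna_ng(total_rna_ng: float, primer: str) -> float:
    """Estimate the mass of RNA (ng) that can serve as template for a primer type.

    ASSUMPTION: oligo-dT primes only polyadenylated mRNA, while random
    hexamers prime all RNA classes; real priming efficiencies are ignored.
    """
    if primer == "oligo-dT":
        classes = ["mRNA"]
    elif primer == "random hexamers":
        classes = list(RNA_FRACTIONS)
    else:
        raise ValueError(f"unknown primer type: {primer!r}")
    return total_rna_ng * sum(RNA_FRACTIONS[name] for name in classes)


for primer in ("oligo-dT", "random hexamers"):
    print(primer, round(primable_rna_ng(100.0, primer), 1), "ng of 100 ng total RNA")
```

With these proportions, only about one twentieth of the total RNA is available as template when oligo-dT primers are used, which is why random hexamers give a higher cDNA yield.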

Step 3 is the addition of reverse transcriptase (RT) to begin reverse transcribing the primed RNA, forming a cDNA:RNA hybrid (Fig. 3b). The RT is genetically engineered from an RT originally isolated from Moloney Murine Leukemia Virus. Older RTs possessed RNase H activity, which would damage the template RNA, reducing not only the maximum cDNA size but the quantity as well. However, the engineered RT enzymes do not have RNase H activity, which protects the template RNA and increases cDNA yield. Additionally, the engineered RTs are more resistant to heat, which enables cDNA synthesis to be performed at a higher temperature in order to denature secondary structures in the template RNA (Sambrook et al., 2001).

Next, the RNA template is nicked in multiple locations with RNase H, leaving it disrupted but still mostly intact (Fig. 3c). DNA polymerase I, originally isolated from Escherichia coli, is then added, which uses the remaining RNA as primers to begin synthesis of the complementary strand of DNA through a process called nick translation (Fig. 3d). The second strand synthesis reaction is incubated at 16°C to allow the DNA polymerase I to synthesize by nick translation. At higher temperatures, DNA polymerase I can cause strand displacement, resulting in the removal of previously synthesized DNA. If conditions are appropriate, the result of the second strand synthesis is a double stranded DNA sequence comprised of a copy of the original RNA sequence and its complement (Sambrook et al., 2001; SuperScript Choice, 2019).

1.7 Polymerase Chain Reaction (PCR)

H. Gobind Khorana and Kjell Kleppe are often credited with first describing a scientific process closely resembling PCR. They called the process repair replication, and it was capable of synthesizing short double stranded and single stranded DNA fragments. Without easy access to oligonucleotide primers and heat stable DNA polymerase, the process proved impractical (Kaunitz, 2015). It was not until the mid-1980s that it was modified for commercial applications by Kary Mullis, then at Cetus, when regions of single genes were replicated using DNA polymerase I from E. coli (Sambrook et al., 2001; Kaunitz, 2015). Because the reaction was heated to 95°C during each cycle to denature the strands of DNA, the E. coli DNA polymerase was also denatured during each cycle, meaning that the DNA polymerase had to be replenished each cycle. However, the isolation and purification of thermostable DNA polymerase from Thermus aquaticus (Taq DNA polymerase) made it possible to use a single application of polymerase in PCR, which led to its commercialization. Taq DNA polymerase is thermostable at the temperature required to denature DNA (95°C), removing the requirement to add new polymerase during each cycle of PCR (although after 30-40 cycles, it has lost a large proportion of its original activity). Because of this, the process could be automated, resulting in a straightforward procedure that can produce millions of amplicons from a small amount of input DNA in a matter of hours (Rogers, 2017).
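The claim that millions of amplicons can be produced from a small amount of input DNA follows from the exponential nature of PCR. The sketch below uses the idealized relationship N = N0 × (1 + E)^n, where E is the per-cycle efficiency. It is an illustration only: the efficiency parameter and function name are hypothetical, and the model ignores the plateau phase and the gradual loss of polymerase activity mentioned above.

```python
def amplicon_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized PCR yield: N = N0 * (1 + E)^n.

    ASSUMPTION: a constant per-cycle efficiency E between 0 and 1
    (1.0 means perfect doubling); plateau effects are not modeled.
    """
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


# A single template molecule after 20 and 30 cycles of perfect doubling.
for n in (20, 30):
    print(n, "cycles:", f"{amplicon_copies(1, n):,.0f}", "copies")
```

Under perfect doubling, a single template molecule exceeds one million copies after 20 cycles and one billion after 30 cycles; real reactions fall short of this because the efficiency is below 1 and declines in later cycles.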

A total of six components are required for successful and accurate replication. The first is DNA polymerase. Second, primers are required; these are short lengths of synthetically produced DNA, usually designed for a specific binding site in the target DNA (although primers that amplify random lengths of DNA can also be used), at a final concentration of 0.1-0.5 µM. The success of PCR often depends on properly designed primers. Third, deoxynucleoside triphosphates (dNTPs) supply the nucleotides needed to synthesize complementary DNA and are usually supplied as equimolar amounts of dATP, dTTP, dCTP, and dGTP, each at 200-250 µM (Sambrook et al., 2001). Fourth, magnesium ions (Mg2+) are required as cofactors for the enzyme and are most commonly used at a concentration of 1.5 mM (Rogers, 2017). Fifth, a buffer, commonly Tris (room temperature pH of 8.3-8.8), is needed to stabilize the pH of the reaction. Last, and most obvious, template DNA is required because PCR requires some input DNA to amplify. Typically, 10 ng of template DNA is used in a 25 µl PCR reaction (Sambrook et al., 2001). Although not required, monovalent cations, usually K+, at a concentration of 50 mM are recommended for longer DNA segments because they stabilize the backbone of single-stranded DNA (Lorenz, 2012). Additionally, the concentrations of both the monovalent and divalent cations can influence the efficiency and specificity of DNA polymerase (Raj, 2014).
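As a rough illustration of how the final concentrations listed above translate into pipetting volumes, the following Python sketch applies the dilution relationship C1V1 = C2V2 to a 25 µl reaction. The stock concentrations, the reagent names, and the function name are assumptions chosen for illustration only; they are not the values used in this study.

```python
# Sketch only: stock concentrations are hypothetical examples.
# Final concentrations follow the ranges quoted in the text
# (primers 0.1-0.5 uM, each dNTP 200 uM, Mg2+ 1.5 mM).
REACTION_VOLUME_UL = 25.0

# reagent name: (stock concentration, final concentration), same unit within a pair
REAGENTS = {
    "forward_primer_uM": (10.0, 0.5),
    "reverse_primer_uM": (10.0, 0.5),
    "dNTP_each_uM": (10000.0, 200.0),
    "MgCl2_mM": (25.0, 1.5),
}


def reagent_volumes(reaction_volume_ul=REACTION_VOLUME_UL, reagents=REAGENTS):
    """Return the stock volume (in ul) needed for each reagent using C1*V1 = C2*V2."""
    volumes = {}
    for name, (stock, final) in reagents.items():
        volumes[name] = final * reaction_volume_ul / stock
    return volumes


if __name__ == "__main__":
    vols = reagent_volumes()
    for name, vol in vols.items():
        print(f"{name}: {vol:.2f} ul")
    # Buffer, polymerase, template DNA, and water make up the remaining volume.
    print(f"remaining volume: {REACTION_VOLUME_UL - sum(vols.values()):.2f} ul")
```

With these assumed stocks, most of the reaction volume is left for buffer, enzyme, template, and water, which is why reagents are usually combined into a master mix before being dispensed.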

PCR begins with an initial denaturation step, followed by a three-step cycle repeated up to 45 times, and ends with a final extension phase (Fig. 4). In the initial denaturation step, the sample is heated to between 94 and 95°C for 1 to 3 minutes to separate the double-stranded DNA in the sample into single-stranded DNA. Heating for a period greater than 3 minutes has been shown to inactivate the Taq DNA polymerase (Lorenz, 2012). After the first step, denaturation is performed during each cycle by heating the reaction to 94 to 95°C for 30 seconds. The exact temperature and length of time depend on the G+C content and the length of the template DNA. As the length of the DNA strand increases, so does the total number of hydrogen bonds, which require more energy and a longer period of time to break. Additionally, because a G-C base pair has three hydrogen bonds, it takes more energy to break than an A-T base pair, which has two (Sambrook et al., 2001).

Figure 4. Stages, temperatures, and times for the thermocycler protocol utilized during PCR.


The second step in the cycle is to anneal the primers to the denatured template DNA. For this step, the reaction temperature is reduced to between 37°C and 65°C, depending on the annealing temperature of the primers, for 0.5 to 4 minutes (Lorenz, 2012; Rogers, 2017). If the temperature is too high, the primers will anneal poorly or not at all. If it is too low, non-specific annealing will occur, meaning the primer may anneal to non-complementary template DNA. This step is essential to optimize the accuracy and yield of PCR (Sambrook et al., 2001). Short sequences require shorter annealing times, while longer sequences require longer times.

The goal of primer design is specificity, which means the primer needs to bind accurately, consistently, and stably to its target DNA sequence. Several factors must be considered when designing primers. First, each primer should be between 18 and 25 nucleotides long, and the two primers should be close to the same length, so that they are long enough to be specific but short enough to bind efficiently. Second, a primer should be free of repetitive elements and self-complementary regions to prevent the formation of hairpin loops and dimers. Third, the melting temperatures of the two primers should be within 5°C of each other so that the DNA product denatures evenly and the primers anneal uniformly (Sambrook et al., 2001).
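The design rules above can be checked programmatically. The sketch below is a minimal illustration, not the procedure used in this study. It estimates melting temperature with the Wallace rule (2°C per A or T, 4°C per G or C), which is a common rough approximation swapped in here because the text does not name a specific formula. The function names, the window size used to flag self-complementary stretches, and the example sequences are all hypothetical.

```python
def wallace_tm(primer):
    """Rough melting temperature (deg C) by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence containing only A, C, G, T."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def has_self_complement(primer, window=4):
    """True if any stretch of `window` bases could pair with another part of the primer.

    A stretch can pair with the primer itself when it also occurs in the
    primer's reverse complement (possible hairpin or primer dimer).
    """
    primer = primer.upper()
    rc = reverse_complement(primer)
    return any(primer[i:i + window] in rc for i in range(len(primer) - window + 1))


def check_primer_pair(forward, reverse, min_len=18, max_len=25, max_tm_diff=5):
    """Apply the length and melting temperature rules from the text to a primer pair."""
    return {
        "length_ok": all(min_len <= len(p) <= max_len for p in (forward, reverse)),
        "tm_ok": abs(wallace_tm(forward) - wallace_tm(reverse)) <= max_tm_diff,
    }


if __name__ == "__main__":
    # Hypothetical sequences used only to exercise the checks.
    fwd = "ATGCATGCATGCATGCATGC"
    rev = "ATGCATGCATGCATGCAT"
    print(wallace_tm(fwd), wallace_tm(rev), check_primer_pair(fwd, rev))
```

A short window will flag many primers, so in practice the window size would need to be tuned, or a dedicated primer design tool used instead.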

The third step is extension from the primers, which begins the synthesis of the complementary DNA strand. A ramp rate of 0.1°C/sec to the extension temperature of 72°C to 78°C (depending on the enzyme) is often used to prevent the primer from dissociating from the template DNA before the DNA polymerase has extended it. This slow ramp has been demonstrated to increase DNA yield (Ellinghaus et al., 1999). A general rule is to allow 2 minutes of extension for every 500 bp of product. Following the extension step, the denaturing step begins again to separate the newly synthesized DNA. These three steps are repeated in each cycle. At the completion of all the PCR cycles, a final extension step at 72°C, approximately three times longer than the extension step within each cycle, is used to ensure that the polymerase is able to complete extension to the ends of all the templates (Sambrook et al., 2001).
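The overall structure of the thermocycler program described above can be summarized in a short Python sketch. It is an illustration under stated assumptions rather than the protocol used in this study: the 2-minute initial denaturation, the 94°C denaturation temperature, and the default annealing settings are values picked from within the ranges given in the text, ramp times are ignored, and the names `Step`, `build_program`, `total_seconds`, and `max_copies` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    temp_c: float
    seconds: float


def build_program(cycles=30, anneal_c=55.0, anneal_s=30, product_bp=500):
    """Assemble a PCR program from the rules of thumb quoted in the text."""
    if not 1 <= cycles <= 45:
        raise ValueError("the text describes a cycle repeated up to 45 times")
    if not 37.0 <= anneal_c <= 65.0:
        raise ValueError("annealing is described as between 37 and 65 deg C")
    # Rule from the text: 2 minutes of extension per 500 bp of product.
    extension_s = 120.0 * product_bp / 500.0
    return {
        "initial": Step("initial denaturation", 94.0, 120),  # assumed 2 min
        "cycle": [
            Step("denaturation", 94.0, 30),
            Step("annealing", anneal_c, anneal_s),
            Step("extension", 72.0, extension_s),
        ],
        "cycles": cycles,
        # Final extension is roughly three times the per-cycle extension.
        "final": Step("final extension", 72.0, 3 * extension_s),
    }


def total_seconds(program):
    """Total hold time of the program, ignoring temperature ramps."""
    per_cycle = sum(step.seconds for step in program["cycle"])
    return program["initial"].seconds + program["cycles"] * per_cycle + program["final"].seconds


def max_copies(starting_molecules, cycles):
    """Theoretical upper bound on copies if every cycle doubled the template."""
    return starting_molecules * 2 ** cycles


if __name__ == "__main__":
    program = build_program()
    print(f"hold time: {total_seconds(program) / 60:.0f} min")
    print(f"upper bound from one template: {max_copies(1, program['cycles'])} copies")
```

The doubling bound is an idealization; real yields are lower because efficiency falls as reagents are consumed and the polymerase loses activity.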

1.8 Ion Torrent Sequencing

Ion Torrent sequencing is a next generation sequencing (NGS) technique that relies on the pH change that results from a hydrogen ion being released each time a nucleotide is added to the single-stranded template DNA. The technique uses a chip comprised of a complementary metal-oxide semiconductor (CMOS) that is attached to ion-sensitive field-effect transistors (ISFETs), which are ideal for measuring changes in pH. The ISFETs are located under small wells that house all components necessary for reading pH changes, including an acrylamide bead covered with template DNA, a sensor plate, and a gate to open and close a drain to remove the dNTP-containing solution and any released protons (Fig. 5; Rothberg et al., 2011).


Figure 5. Layout of the CMOS chip for Ion Torrent. a) Schematic of how an individual well is laid out. When a nucleotide is added to the growing DNA strand, the H+ ions cause a change in the metal oxide sensing layer, which is converted into an electrical signal, and recorded on a computer. b) Electron micrograph showing the well alignment with the ISFET metal sensor and underlying electronics. c) Arrangement of the wells on the chip in a 2D array. Each of the rows are cycled through for each column to retrieve and measure their individual locations and outputs.

Source: Rothberg et al., 2011.


The process begins by binding the template DNA that is to be sequenced to DNA linking

sequences attached to the 2 µm microbeads that are isolated in individual wells. This is facilitated

by adapters that have been attached to the template DNA, which are complementary to the DNA

sequences on the microbeads. Each of the dNTPs (dATPs, dTTPs, dCTPs, and dGTPs) are then

spread across the chip in a stepwise fashion. When a nucleotide is added by DNA polymerase, a

proton is released in the process, reducing the pH by 0.02 units per proton. If there are sections

of repeated nucleotides present, a proportional increase in the number of protons will be released

and thus a greater reduction in the pH. The pH change triggers the sensor at the bottom of each

well and this is converted into electrical signals, which are sent to a computer, wherein the type

of nucleotide, and the number of nucleotides, added is recorded for each position in the well

array. Between nucleotide solutions, each well is washed to remove the previous solution. No

enzymes are required to ensure complete removal of dNTPs due to the small size of the wells.

This process is repeated until synthesis is complete (Rothberg et al., 2011). The accuracy of the

process has been demonstrated to be 98.9% for the first 100 bp of a given sequence. To ensure

the highest degree of accuracy, each sequence is synthesized multiple times and a consensus sequence is produced, based upon all the readings. The consensus sequences have been shown to be 99.99% accurate when assembled and compared to a reference genome (Rothberg et al.,

2011).
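The flow-by-flow logic described above can be illustrated with a small simulation. The sketch below is a simplified model, not the instrument's actual signal processing: it assumes an ideal, noise-free well, uses the 0.02 pH units per proton quoted in the text, and cycles the nucleotides in a hypothetical order ("TACG"). The function names are illustrative.

```python
PH_DROP_PER_PROTON = 0.02  # value quoted in the text


def simulate_flows(new_strand, flow_order="TACG", max_flows=400):
    """Simulate nucleotide flows over one well.

    `new_strand` is the strand being synthesized (5' to 3'). For each flow,
    every consecutive matching base is incorporated at once, so a homopolymer
    run releases proportionally more protons. Returns a list of
    (nucleotide, bases_incorporated, pH_drop) tuples.
    """
    position = 0
    flows = []
    flow_index = 0
    while position < len(new_strand) and flow_index < max_flows:
        base = flow_order[flow_index % len(flow_order)]
        incorporated = 0
        while position < len(new_strand) and new_strand[position] == base:
            incorporated += 1
            position += 1
        flows.append((base, incorporated, round(incorporated * PH_DROP_PER_PROTON, 4)))
        flow_index += 1
    return flows


def call_bases(flows):
    """Reconstruct the sequence from the per-flow incorporation counts."""
    return "".join(base * count for base, count, _ in flows)


if __name__ == "__main__":
    for flow in simulate_flows("TTAGC"):
        print(flow)
```

In this idealized model the sequence is recovered exactly. On a real chip, the difficulty of telling long homopolymer runs apart from the signal alone is one reason consensus sequences are built from multiple reads.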

1.9 Basic Local Alignment Search Tool

The Basic Local Alignment Search Tool (BLAST) was developed in 1990 as a quick and efficient method of searching large DNA or protein databases, and it remains extremely popular. The algorithm uses heuristics to quickly search a given database with a high degree of accuracy (Madden, 2013). Briefly, the process is broken down into three algorithmic steps: compiling a list of high-scoring words, scanning the database for matches to the list, and extending out from the matches. The exact details of the search process vary based on whether a protein or DNA dataset is being used as the query (Altschul et al., 1990).

During the list compiling step for DNA, the algorithm compiles a list of contiguous k-mers (commonly 12-mers, although the size can be configured), known as “words”, which consist of consecutive nucleotides in the sequence. For a given query sequence, the list contains n-k+1 words, where n is the length of the query sequence and k is the word size. It has been demonstrated that a smaller k-mer leads to better results, at the cost of speed, because the higher chance of matching allows more possible alignments to be discovered (Altschul et al., 1997). This system creates a list that is often thousands of words long for each DNA sequence. The list is then converted into “bytes”, each comprised of 4 nucleotides, using a table. Compressing the data into bytes makes the scanning process considerably faster (Altschul et al., 1990).
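The word list and byte compression described above can be sketched in a few lines of Python. This is a simplified illustration rather than the BLAST source code: the 2-bit encoding table (A=0, C=1, G=2, T=3) and the function names are assumptions made for the example.

```python
# Hypothetical 2-bit encoding table; four nucleotides fit in one byte.
ENCODING = {"A": 0, "C": 1, "G": 2, "T": 3}


def query_words(query, k=12):
    """Return every contiguous k-mer in the query; there are n - k + 1 of them."""
    if k <= 0 or k > len(query):
        return []
    return [query[i:i + k] for i in range(len(query) - k + 1)]


def pack_word(word):
    """Pack a word whose length is a multiple of 4 into bytes, 4 nucleotides per byte."""
    if len(word) % 4 != 0:
        raise ValueError("word length must be a multiple of 4")
    packed = bytearray()
    for i in range(0, len(word), 4):
        value = 0
        for base in word[i:i + 4]:
            value = (value << 2) | ENCODING[base]
        packed.append(value)
    return bytes(packed)


if __name__ == "__main__":
    query = "ACGTACGTACGTACGT"
    words = query_words(query, k=12)
    print(len(query), len(words))       # n and n - k + 1
    print(list(pack_word(words[0])))    # three bytes for a 12-mer
```

A 12-nucleotide word becomes three bytes, so comparing words against the database reduces to comparing a few integers instead of twelve characters.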

In the next step, the database is searched using each of the bytes compiled from the query. When a high-quality match is found, it is extended in both directions without gaps to determine how much of the query aligns around the original match. More than one match within a certain distance of one another is given more weight in scoring than a single match (Altschul et al., 1997). Once the gap-free extension is complete, the alignment is given a score based upon its quality. If the alignment scores high enough (usually 38 bits), the extension phase continues with gaps being introduced. The purpose of the gaps is to connect several high-scoring segment pairs (HSPs) into a single long alignment. Therefore, several lower-scoring segments can be linked together into a single HSP (Altschul et al., 1997). The best HSPs are then analyzed to calculate final scores that account for the introduced gaps, which likely reflect insertions and deletions (Madden, 2013). A list of the top 500 HSPs is compiled and displayed to the user for manual curation (Altschul et al., 1997).
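The scanning and gap-free extension steps can be sketched as follows. This is a deliberately simplified model and not the BLAST implementation: it works on plain strings rather than packed bytes, the scores (+1 for a match, -3 for a mismatch) and the drop-off threshold are assumed values, and the gapped extension and statistical scoring are omitted. The function names are hypothetical.

```python
def find_seeds(query, subject, k):
    """Return (query_pos, subject_pos) for every exact k-mer shared by both sequences."""
    positions = {}
    for j in range(len(subject) - k + 1):
        positions.setdefault(subject[j:j + k], []).append(j)
    seeds = []
    for i in range(len(query) - k + 1):
        for j in positions.get(query[i:i + k], []):
            seeds.append((i, j))
    return seeds


def extend_ungapped(query, subject, q_start, s_start, k, match=1, mismatch=-3, x_drop=5):
    """Extend an exact seed in both directions without gaps.

    Extension in each direction stops when the running score falls more than
    `x_drop` below the best score seen. Returns (score, q_begin, q_end, s_begin),
    where query[q_begin:q_end] aligns with the subject starting at s_begin.
    """
    best = k * match
    running = best
    best_right = 0
    offset = 0
    while q_start + k + offset < len(query) and s_start + k + offset < len(subject):
        same = query[q_start + k + offset] == subject[s_start + k + offset]
        running += match if same else mismatch
        offset += 1
        if running > best:
            best, best_right = running, offset
        elif best - running > x_drop:
            break
    running = best
    best_left = 0
    offset = 0
    while q_start - 1 - offset >= 0 and s_start - 1 - offset >= 0:
        same = query[q_start - 1 - offset] == subject[s_start - 1 - offset]
        running += match if same else mismatch
        offset += 1
        if running > best:
            best, best_left = running, offset
        elif best - running > x_drop:
            break
    return best, q_start - best_left, q_start + k + best_right, s_start - best_left


if __name__ == "__main__":
    query = "ACGTACGTTTGGCCAA"
    subject = "ACGTACGTTAGGCCAA"
    for q_pos, s_pos in find_seeds(query, subject, 8):
        print(extend_ungapped(query, subject, q_pos, s_pos, 8))
```

In the example, a single mismatch lowers the running score but does not exceed the drop-off threshold, so the extension continues through it and the final segment covers the whole query.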

The BLAST program has evolved over the years, beginning in 1990 with the first program, which lacked any gap analysis. Today, it is a free service comprised of twelve related programs that allow for highly customizable searches of genomic, transcriptomic, and proteomic data. All BLAST databases and programs are freely available for download to run locally, known as standalone BLAST or BLAST+, allowing for even more customization of the protocol (Madden, 2013). Unlike the online service, BLAST+ allows for configuration of databases, more complex filtering protocols, highly customized output formats, and parallel processing, which can dramatically reduce processing time (Camacho et al., 2008). Although the program is freely available, running BLAST+ locally requires at least 100 gigabytes of storage for the databases and a high-end computer if large amounts of data are being processed.
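As an illustration of the kind of customization available with standalone BLAST+, the sketch below assembles a `blastn` command line from Python. The options shown (`-query`, `-db`, `-out`, `-outfmt`, `-evalue`, `-num_threads`, `-max_target_seqs`) are standard BLAST+ options, but the file names, database name, and parameter values are placeholders and do not represent the settings used in this study. Running the command requires a local BLAST+ installation and database, so the example only prints it.

```python
import shlex


def build_blastn_command(query_fasta, database, out_path,
                         threads=4, evalue=1e-5, max_target_seqs=10):
    """Build a blastn command as an argument list suitable for subprocess.run()."""
    return [
        "blastn",
        "-query", query_fasta,       # FASTA file of query sequences
        "-db", database,             # name or path of a local BLAST database
        "-out", out_path,            # where to write the results
        "-outfmt", "6",              # tabular output, one hit per line
        "-evalue", str(evalue),      # expectation value cutoff
        "-num_threads", str(threads),
        "-max_target_seqs", str(max_target_seqs),
    ]


if __name__ == "__main__":
    # Placeholder file and database names.
    command = build_blastn_command("reads.fasta", "nt", "hits.tsv")
    print(shlex.join(command))
    # To execute (requires BLAST+ and the database to be installed locally):
    # import subprocess
    # subprocess.run(command, check=True)
```

Passing the arguments as a list rather than a single string avoids shell quoting problems when file paths contain spaces.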

1.10 Python and Biopython

Named after “Monty Python’s Flying Circus”, Python is an object-oriented programming language originally invented in 1991 by Guido van Rossum. It was developed to overcome barriers with older programming languages, such as exception handling, while improving on readability, ease of use, and ease of modification. These improvements resulted in a programming language that is applicable to almost any project, and its use has grown worldwide since its introduction. The Python Software Foundation is a nonprofit organization that currently holds the Python copyright and handles all Python-related business. The organization ensures that the software is freely available across the world. Users are encouraged to experiment and create new modules, and many do; these modules are almost always open source and free of charge (Python Software Foundation, 2019).

Biopython, one of these freely available modules for Python, was originally created in 1999 by the Open Bioinformatics Foundation (OBF). Since its inception, it has been built upon and improved by integrating high-quality plotting libraries and statistical analysis modules. OBF lists over 100 publications that either cite or use Biopython as part of the research (Cock et al., 2009).

Biopython is capable of an extraordinary range of functionality related to almost any bioinformatics process. Some examples of its functionality include parsing data in most common formats (FASTA, GenBank, EMBL, etc.) into Python-usable datasets, direct interaction with online services supported by NCBI (National Center for Biotechnology Information) and ExPASy (Expert Protein Analysis System), interaction with local bioinformatics software, statistical analysis, and basic supervised machine learning (Chang et al., 2018).

Standard functions used in Python (Boolean operations, truth value testing, iterators, and data structure manipulation, to name a few) are paired with Biopython’s specialized library to simplify and automate jobs that would otherwise take weeks to months. This is possible because Biopython specializes in biological data management and hides the large, complicated scripts in the background, allowing the end user to execute very complex commands with only a few lines. As a result, combining standard Python functions with Biopython, in conjunction with Biopython’s well-written tutorial, is easy and intuitive to learn and execute (Python Software Foundation, 2019).

For example, Python can read through a spreadsheet and pick out specific columns, rows, or individual cells based upon almost any criteria. The chosen data can then be removed, changed, moved, or saved so that it can be used in another application. For instance, an accession number could be used to retrieve taxonomic information from GenBank. When combined with standard functions and even other Python modules, a standard protocol can be created and automated. This enables hands-off processing of vast amounts of data (4 million DNA sequences in this study) instead of a laborious multistep procedure requiring multiple programs and file format conversions. Ultimately, the power of Python is its ease of use and seamless integration of open source modules like Biopython, combined with the imagination of not only the user but the entire Python community.
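The spreadsheet-style filtering described above can be sketched with the Python standard library alone. The example below reads BLAST tabular output (the twelve default columns of output format 6), keeps hits that pass identity and length thresholds, and records the best-scoring hit for each query. It is a minimal illustration, not the script used in this study: the thresholds, the function name, and the sample rows (including the accession numbers) are hypothetical. The resulting accession numbers could then be passed to a service such as GenBank, for example through Biopython's Entrez module, to retrieve taxonomic information.

```python
import csv
import io

# Default column order of BLAST tabular output (-outfmt 6).
BLAST_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def best_hits(handle, min_identity=97.0, min_length=50):
    """Return {query id: best hit row} for hits passing the thresholds.

    "Best" means the highest bit score among the rows that pass the filters.
    """
    best = {}
    reader = csv.DictReader(handle, fieldnames=BLAST_COLUMNS, delimiter="\t")
    for row in reader:
        if float(row["pident"]) < min_identity or int(row["length"]) < min_length:
            continue
        current = best.get(row["qseqid"])
        if current is None or float(row["bitscore"]) > float(current["bitscore"]):
            best[row["qseqid"]] = row
    return best


if __name__ == "__main__":
    # Made-up rows standing in for a real results file.
    sample = (
        "read1\tACC0001.1\t99.0\t100\t1\t0\t1\t100\t5\t104\t1e-45\t180\n"
        "read1\tACC0002.1\t98.0\t100\t2\t0\t1\t100\t5\t104\t1e-43\t172\n"
        "read2\tACC0003.1\t90.0\t80\t8\t0\t1\t80\t1\t80\t1e-20\t95\n"
    )
    for query_id, hit in best_hits(io.StringIO(sample)).items():
        print(query_id, hit["sseqid"], hit["pident"], hit["bitscore"])
```

Because the function accepts any file-like object, the same code works on an open results file, which keeps the filtering step easy to chain with later retrieval and curation steps.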

1.11 Chapter 1 References

Abyzov, S. S., Hoover, R. B., Imura, S., Mitskevich, I. N., Naganuma, T., Poglazova, M. N.,

Ivanov, M. V. (2004). Use of Different Methods for of Ice-Entrapped

Microorganisms in Ancient Layers of the Antarctic Glacier. Advances in Space Research,

33(8), 1222-1230.

Altschul, S. F., Gish, W., Miller, W., Myers, E. W., Lipman, D. J. (1990). Basic Local

Alignment Search Tool. Journal of Molecular Biology, 215(3), 403-410.

Altschul, S., Madden, T., Schaffer, A., Zhang, J., Zhang, Z., Miller, W., Lipman, D. (1997).

Gapped BLAST and PSI-BLAST: A New Generation of Protein Database Search

Programs. Nucleic Acids Research, 25(17), 3389-3402.

Azmat, M. A., Khan, I. A., Cheema, H. M., Rajwana, I. A., Khan, A. S., & Khan, A. A. (2012).

Extraction of DNA Suitable for PCR Applications from Mature Leaves of Mangifera

indica L. Journal of Zhejiang University. Science. B, 13(4), 239–243.

Bulat, S. A., Alekhina, I. A., Lipenkov, V. Y., Lukin, V. V., Marie, D., & Petit, J. R. (2009).

Cell Concentrations of in Glacial and Lake Ice of the Vostok Ice Core,

East Antarctica. , 78(6), 808-810.

Camacho, C., Madden, T., Tao, T., Agarwala, R., & Morgulis, A. (2008). BLAST® Command

Line Applications User Manual. Retrieved from

https://www.ncbi.nlm.nih.gov/books/NBK279674/

Castello, J. D., Rogers, S. O. (Eds.). (2005). Life in Ancient Ice. Princeton University Press,

Princeton, NJ.

Chang, J., Chapman, B., Friedberg, I., Hamelryck, T., De Hoon, M., Cock, P., Antao, T.,

Talevich, E., Wilczyński, B. (2018). Biopython Tutorial and Cookbook. Retrieved from

http://biopython.org/DIST/docs/tutorial/Tutorial.html

Christner, B., Royston-Bishop, G., Foreman, C., Arnold, B., Tranter, M., Welch, K., Lyons, B.,

Tsapin, A., Studinger, M., Priscu, J. (2006). Limnological Conditions in Subglacial Lake

Vostok, Antarctica. Limnology and Oceanography, 51(6), 2485-2501.

Cock, P. J. A., Antao, T., Chang, J. T., Chapman, B. A., Cox, C. J., Dalke, A., Friedberg, I.,

Hamelryck, T., Kauff, F., Wilczynski, B., de Hoon, Michiel J. L. (2009). Biopython:

Freely Available Python Tools for Molecular Biology and Bioinformatics.

Bioinformatics, 25(11), 1422-1423.

Croy, R. (1998). Molecular Genetics II - Genetic Engineering Course (Supplementary notes).

Durham University. Retrieved from

https://web.archive.org/web/20020824023822/http://www.dur.ac.uk/~dbl0www/Staff/Croy/cDNAfigs.htm

D'Elia, T., Veerapaneni, R., Rogers, S. O. (2008). Isolation of Microbes from Lake Vostok

Accretion Ice. Appl. Environ. Microbiol., 74(15), 4962-4965.

D'Elia, T., Veerapaneni, R., Theraisnathan, V., Rogers, S. O. (2009). Isolation of Fungi from

Lake Vostok Accretion Ice. Mycologia, 101(6), 751-763.

Ellinghaus, P., Badehorn, D., Blümer, R., Becker, K., Seedorf, U. (1999). Increased Efficiency

of Arbitrarily Primed PCR by Prolonged Ramp Times. Biotechniques, 26(4), 626-630.

Kapitsa, A. P., Ridley, J. K., Robin, G. D. Q., Siegert, M. J., Zotikov, I. A. (1996). A Large

Deep Freshwater Lake Beneath the Ice of Central East Antarctica. Nature, 381(6584),

684.

Kaunitz J. D. (2015). The Discovery of PCR: ProCuRement of Divine Power. Digestive

Diseases and Sciences, 60(8), 2230–2231.

Lipenkov, V. Y., Istomin, V. A. (2001). On the Stability of Air Clathrate-Hydrate Crystals in

Subglacial Lake Vostok, Antarctica. Mater. Glyatsiol. Issled, 91, 129-133.

Litvinenko, V. S., Vasiliev, N. I., Lipenkov, V. Y., Dmitriev, A. N., Podoliak, A. V. (2014).

Special Aspects of and Results of 5G Hole Drilling at Vostok Station,

Antarctica. Annals of Glaciology, 55(68), 173-178.

Lorenz T. C. (2012). Polymerase Chain Reaction: Basic Protocol Plus Troubleshooting and

Optimization Strategies. Journal of Visualized Experiments, (63).

Madden, T. (2013). The BLAST Sequence Analysis Tool (2nd ed.). Retrieved from

https://www.ncbi.nlm.nih.gov/books/NBK153387/

Ma, L., Rogers, S. O., Catranis, C. M., Starmer, W. T. (2000). Detection and Characterization of

Ancient Fungi Entrapped in Glacial Ice. Mycologia, 92(2), 286-295.

Maries Vej, J. (2019). Drilling to bedrock – University of Copenhagen. Retrieved from

http://www.iceandclimate.nbi.ku.dk/research/drill_analysing/history_drilling/drill_bedrock/

Mason, P. E., Neilson, G. W., Dempsey, C. E., Barnes, A. C., Cruickshank, J. M. (2003). The

Hydration Structure of Guanidinium and Thiocyanate Ions: Implications for Protein

Stability in Aqueous Solution. Proc. Natl. Acad. of Sci. USA, 100(8), 4557-4561.

National Science Foundation (2019) About Ice Cores | NSF Ice Core Facility. Retrieved from

https://icecores.org/about-ice-cores

Oswald, N. (2015). The Basics: How Phenol Extraction of DNA Works. Retrieved from

https://bitesizebio.com/384/the-basics-how-phenol-extraction-works/

Petit, J. R., Alekhina, I., Bulat, S. (2005). Lake Vostok, Antarctica: Exploring a Subglacial Lake

and Searching for Life in an Extreme Environment. In M. Gargaud, H. Martin (Eds.),

Lectures in (pp. 227-288). Berlin, Heidelberg: Springer.

Petit, J. R., Jouzel, J., Raynaud, D., Barkov, N. I., Barnola, J. M., Basile, I., Benders, M.,

Chappellaz, J., Davis, M., Delaygue, G., Delmotte, M. (1999). Climate and Atmospheric

History of the Past 420,000 Years from the Vostok Ice Core, Antarctica. Nature,

399(6735), 429.

Python Software Foundation, (2019). General Python FAQ — Python 3.7.3 documentation.

Retrieved from https://docs.python.org/3/faq/general.html

Raj, S. (2014). Role of KCl and MgCl2 in PCR. Retrieved from

https://www.biotecharticles.com/Biotech-Research-Article/Role-of-KCl-and-MgCl2-in-PCR-3271.html

Rio, D. C., Ares, M., Hannon, G. J., Nilsen, T. W. (2010). Purification of RNA Using TRIzol

(TRI reagent). Cold Spring Harbor Protocols, 2010(6). doi: 10.1101/pdb.prot5439

Robin, G. D. Q., Drewry, D. J., Meldrum, D. T. (1977). International Studies of Ice Sheet and

Bedrock. Philosophical Transactions of the Royal Society of London. Series B,

Biological Sciences, 16(3), 185-196.

Rogers, S. O. (2017). Integrated Molecular Evolution (2nd ed.). Boca Raton, FL, United States:

CRC Press.

Rogers, S. O., & Bendich, A. J. (1985). Extraction of DNA from Milligram Amounts of Fresh,

Herbarium and Mummified Plant Tissues. Plant Molecular Biology, 5(2), 69-76.

Rogers, S. O., Castello, J. D. (2020, in press). Defrosting Ancient Microbes: Emerging

Genomes in a Warmer World. Boca Raton, FL, United States: CRC Press.

Rogers, S. O., Shtarkman, Y. M., Koçer, Z. A., Edgar, R., Veerapaneni, R. S., D'Elia, T., Morris,

P. F. (2013). Subglacial Lake Vostok (Antarctica) Accretion Ice contains a Diverse Set of

Sequences from Aquatic, Marine and Sediment-Inhabiting Bacteria and Eukarya.

Biology, (2), 206-232.

Rothberg, J. M., Hinz, W., Rearick, T. M., Schultz, J., Mileski, W., Davey, M., Leamon, J.,

Johnson, K., Milgrew, M., Edwards, M., Hoon, J., Simons, J., Marran, D., Myers, J.,

Davidson, J., Branting, A., Nobile, J., Puc, B., Light, D., Clark, T., Huber, M.,

Branciforte, J., Stoner, I., Cawley, S., Lyons, M., Fu, Y., Homer, N., Sedova, M., Miao,

X., Reed, B., Sabina, J., Feierstein, E., Schorn, M., Alanjary, M., Dimalanta, E.,

Dressman, D., Kasinskas, R., Sokolsky, T., Fidanza, J., Namsaraev, E., McKernan, K.,

Williams, A., Roth, G.T., Bustillo, J. (2011). An integrated Semiconductor Device

Enabling Non-Optical Genome Sequencing. Nature, 475(7356), 348-352.

Salamatin, A. N., Tsyganova, E. A., Lipenkov, V. Y., Petit, J. R. (2004). Vostok (Antarctica)

Ice-Core Time-Scale from Datings of Different Origins. Annals of Glaciology, 39(1),

283-292.

Sambrook, J., Russell, D. W. (2001). Molecular Cloning: A Laboratory Manual

(Vol. 999). New York: Cold Spring Harbor Laboratory Press.

Shtarkman, Y., Koçer, Z., Edgar, R., Veerapaneni, R., D'Elia, T., Morris, P., Rogers, S. (2013).

Subglacial Lake Vostok (Antarctica) Accretion Ice Contains a Diverse set of Sequences

from Aquatic, Marine and Sediment-Inhabiting Bacteria and Eukarya. PloS ONE, 8(7),

e67221.

Siegert, M. J., Ridley, J. K., Kapitsa, A. P., de Q. Robin, G., Zotikov, I. A. (1996). A Large

Deep Freshwater Lake Beneath the Ice of Central East Antarctica. Nature, 381(6584),

684-686.

Siegert, M. J., Ellis-Evans, J. C., Tranter, M., Mayer, C., Petit, J. R., Salamatin, A., Priscu, J. C.

(2001). Physical, Chemical and Biological Processes in Lake Vostok and Other Antarctic

Subglacial Lakes. Nature, 414(6864), 603.

SuperScript Choice System for cDNA Synthesis. (2019). Invitrogen User Guide. Retrieved from

https://www.thermofisher.com/order/catalog/product/18090019

Wright, A., Siegert, M. (2012). A Fourth Inventory of Antarctic Subglacial Lakes. Antarctic

Science, 24(6), 659-664.


CHAPTER 2: INFLUX OF ORGANISMS INTO SUBGLACIAL LAKE VOSTOK FROM GLACIAL AND BASAL ICE

2.1 Abstract

With a maximum depth of 1,067 m and a surface area of 14,000 km², subglacial Lake Vostok is one of the largest lakes in the world. It was discovered under a glacier approximately 4,000 m thick in the East Antarctic Ice Sheet. As a result of the glacier and geographic location, Lake Vostok is an extreme and unique environment that is often compared to Jupiter’s ice-covered moon, Europa. Any organism living in the lake would be subjected to a multitude of challenges, including 350 atmospheres of pressure, 50 times the atmospheric oxygen concentration, and dramatic temperature differences depending on proximity to hydrothermal vent activity. Because of these challenges, Lake Vostok was originally thought to be sterile, but multiple studies have suggested that not only is there a variety of bacterial and eukaryotic organisms living in the lake, but it may contain a complex ecosystem. It is hypothesized that Lake Vostok not only contains a wide variety of organisms but that there is indeed an ecosystem existing in the lake, and that few organisms are being added to the lake from the glacial and basal ice. The results of this analysis yielded metagenomic and metatranscriptomic data that aligned with a wide variety of organisms from 30 different phyla. The associated organisms were capable of numerous metabolic pathways, such as the nitrogen cycle and carbon fixation, as well as oxidation and/or reduction pathways for sulfur, iron, arsenic, hydrogen, hydrocarbon, phosphorus, uranium, and chromium compounds. The number of organisms unique to each sample was quite high, although sequences for only one species were found in the layered meteoric ice. However, sequences similar to 513 unique organisms were isolated from the basal ice, which is meteoric ice that has been disrupted through contact with bedrock as the glacier moves across the landscape. The type 1 accretion ice yielded sequences similar to 133 unique organisms, while the type 2 accretion ice sample contained sequences similar to 681 unique organisms. These results, combined with previous research, indicate that Lake Vostok is a transitory repository of DNA and organisms from the glacier, but contains a much larger dynamic community and ecosystem.

2.2 Introduction

2.2.1 Lake Vostok

Subglacial Lake Vostok has a maximum depth of 1,067 m and a surface area of 14,000 km², making it the largest of the 379 lakes currently known in Antarctica (Wright & Siegert,

2012). Lake Vostok was only a theory until the early 1970s when scientists first detected the

presence of liquid water under the ice using ground-penetrating radar (Robin et al., 1979). The

subglacial lake is situated under the East Antarctica Ice Sheet beneath approximately 4,000 m of

ice (Siegert et al., 1996). It has been isolated from the atmosphere for approximately 15 million

years (Denton et al., 1993). The sum of these factors has produced a lake that presents a myriad of unique challenges to any organisms that live in the lake. Organisms face extreme pressure from the weight of the ice above, the complete lack of sunlight, extreme high and low temperatures that vary depending on proximity to the hydrothermal activity, and an oxygen concentration about 50 times higher than the atmosphere (Lipenkov et al., 2001). Due to the lake’s isolation, little was known until recent years about the ecology that exists in the of

the lake.

2.2.2 Glacial Ice Conditions

To further understand the dynamics present in the glacier above Lake Vostok, as well as in the lake itself, the types of ice in the glacier must be understood. The glacial ice that covers Lake Vostok consists of four different types of ice. Starting at the surface, glacial ice, also known as meteoric ice, is 3,538 m thick and makes up the majority of the ice in the glacier (Fig. 6). The glacial ice is comprised of compacted snowfall and contains trapped atmospheric gases, as well as dust particles and organic inclusions, providing a temporal paleoclimate record dating back 420,000 years (Christner et al., 2006; Petit et al., 1999). Specifically, the meteoric ice sample (2,149 m) in this analysis has been calculated to be approximately 156,000 years old (Siegert, 2003; Salamatin et al., 2004). The upper 3,310 m consists mainly of undisturbed layers of ice, while the next 228 m, known as basal ice, is meteoric ice that has been deformed through interaction with the bedrock below (patterned gray, Figs. 6 and 7). This interaction causes turbulence in the ice and a notable number of inclusions (Petit et al., 2005). It is estimated that the oldest basal ice is between one and two million years old based upon computational models (Salamatin et al., 2004), although parts of the basal ice could be as old as 15 million years (Salamatin, personal communication).

The third and fourth types of ice are two types of accretion ice. Unlike meteoric and basal ice, accretion ice is formed by lake water freezing onto the bottom of the glacier. Type 1 accretion ice (dark gray, Figs. 6 and 7) has fine silty inclusions throughout and is found between depths of 3,539 m and 3,585 m and between 3,595 m and 3,608 m. It is the result of lake water freezing to the bottom of the glacier over parts of a shallow embayment and over the peninsula, respectively, as the glacier traverses the lake (Salamatin et al., 2004; Castello & Rogers, 2005; Rogers et al., 2013). In addition to the granular inclusions, it contains ice crystals up to 1 decimeter in size and has low electrical conductivity (Petit et al., 2005). It is estimated that accretion ice is formed at a rate of 3.8 cm/year in the shallow embayment. Based upon this rate, the 3,540 m and 3,569 m type 1 accretion ice samples are estimated to be about 5,100 and 4,400 years old, respectively (Christner et al., 2006). However, Bell et al. (2002) estimated the age of the accretion ice to be upwards of 13,000 years.

Type 2 accretion ice is found between depths of 3,585 m and 3,595 m and between 3,608 m and 3,769 m (white, in Figs. 6 and 7). The type 2 ice sample analyzed in this study was from a depth of 3,585 m, which is crystal clear, with few inclusions. It formed over the shallow embayment and is estimated to be about 4,000 years old (Christner et al., 2006). Unlike type 1 ice, it is comprised of water that has accreted to the glacier over the lake water of the shallow embayment and over most of the southern basin of Lake Vostok, respectively (Castello & Rogers, 2005; D'Elia et al., 2008, 2009). Beyond a depth of 3,608 m, the type 2 accretion ice is generally free from inclusions, has ice crystals up to 2 m in length, and is thought to accrete at a slightly lower rate of between 1.4 and 2.5 cm/year (Bell et al., 2002; Petit et al., 2005; Bulat et al., 2009).

Figure 6. Cross section along the pathway of the glacier to the ice core drill site, indicating the probable composition of the ice that is over Lake Vostok as well as the lake’s layout. It shows different layers of the glacier and the location of the borehole used to sample the ice.

Source: Shtarkman et al., 2013.

Figure 7. Detailed view of where all samples investigated in this study and in the previous study by Shtarkman et al. (2013) are located in the glacier above Lake Vostok. The 2,149 m sample (a) is about 175,000 years old. Both the 3,501 m and 3,520 m samples (b) are at least 500,000 years old. The 3,540 m sample (c) is about 20,000 years old and formed about 400 m from shore. The 3,563 m sample (d) is about 18,500 years old and formed 5,500 m from shore. The 3,569 m sample (c) is about 18,000 years old and formed 6,500 m from the shore. The 3,585 m sample (d) is 17,500 years old and formed 10,000 m from the shore. The 3,606 m and 3,621 m samples (e) are 16,000 and 14,000 years old and formed 14,500 m and 18,000 m from the shore, respectively. HT = possible hydrothermal region.

2.2.3 Previous Research

Previous studies have investigated the cellular content of the 5G ice core from Lake Vostok. Ice core samples corresponding to all four ice types contain cells, but with high variation in concentration. An assay of ice core samples for cellular concentration was performed using flow cytofluorimetry. Concentrations ranging from 1.9 cells/ml to 3 cells/ml, with a peak of 24 cells/ml, were reported in glacial ice samples (Bulat et al., 2009). Additionally, Christner et al. (2006) investigated the concentration of cells using SYBR Gold staining in 15 different ice core samples ranging from 171 m to 3,537 m. Based upon the staining, the cell concentrations in glacial ice samples ranged from as low as 34 ± 10 cells/ml at ~2,100 m to as high as 380 ± 53 cells/ml at ~1,600 m, which appears to be an outlier compared to other studies.

Studies on basal ice samples have generally found lower cellular concentrations than glacial ice. Based upon fluorescence microscopy with live/dead staining the concentration of cells was approximately 2-8 cells/ml with few being viable. Attempts to culture organisms from basal meltwater corroborate the fluorescence microscopy with from zero to 18 colonies/ml reported (D’Elia et al., 2008; D’Elia et al., 2009). However, the region between 3,501 m and

3,520 m has been shown to have higher cell concentrations. Based upon the SYBR Gold assay, the cell concentration in basal ice was determined to be about 30 cells/ml, but with little of the variation seen in glacial ice samples (Christner et al., 2006). Additionally, Bulat et al. (2009), using flow cytofluorimetry, found no cells in basal ice samples, although the researchers had trouble distinguishing between cells and the fine particles present in the sample.

Therefore, it is possible that some cells were present in the basal ice (Bulat, personal communication).

Accretion ice has been the subject of much greater attention because currently it is the

best method of studying the waters of Lake Vostok. Analysis of type 1 accretion ice has

demonstrated an average cell concentration of 7-8 cells/ml with moderate viability (D’Elia et al.,

2008). The border between type 1 and type 2 ice (approximate depth of 3,585 m) had some of the

highest concentrations of cells, unique DNA sequences, and colonies (D’Elia et al., 2008).

Further studies performed on the border of type 1 and type 2 accretion ice found genetic

evidence for high species richness and diversity of both bacterial and eukaryotic organisms

(Christner et al., 2006; Shtarkman et al., 2013).

Type 2 accretion ice has yielded fewer cells, with concentrations of 0-4 cells/ml, as well as fewer viable cells. A spike in cellular concentration was found at 3,620 m just after the peninsula, although it was significantly lower than the one at 3,585 m (D’Elia et al., 2008, 2009).

As with the 3,585 m accretion ice from the shallow embayment, the 3,606 m and 3,621 m samples from the main lake basin were near the transition from type 1 to type 2 accretion ice. Further analysis found that their genetic richness and diversity were also much lower than those of the 3,585 m sample

(Shtarkman et al., 2013).

Some of the sequences obtained from the type 1 and 2 accretion ice include a wide variety of bacteria, including psychrophilic, psychrotolerant, mesophilic, thermotolerant, and thermophilic species (Rogers et al., 2013; Shtarkman et al., 2013). Sequences most similar to those of autotrophic organisms indicate the presence of microbes capable of carbon fixation.

Also, other sequences indicating autotrophic and heterotrophic organisms that are capable of and nutrient cycling have been reported from the accretion ice. Finally, sequences from species common to aquatic and marine environments have been reported (Rogers

et al., 2013). Combined, these findings indicate that there is a dynamic and unique ecosystem in

Lake Vostok.

2.2.4 Purpose of Research

It is hypothesized that the genetic diversity found in accretion ice samples is not derived

from the overlying glacier, but instead is the result of an active community of organisms living in

the lake. To determine the source of the organisms, meltwater samples taken from 6 different ice

core sections encompassing glacial, basal, and accretion ice were analyzed. A combination of

metagenomic and metatranscriptomic data, and bioinformatic analyses, were used to evaluate the

species in the ice sample in order to determine the source of biological diversity found in Lake

Vostok.

It was expected that the glacial ice would contain a low quantity of organisms based upon previous research (Christner et al., 2006; Bulat et al., 2009). The next highest quantity of potential organisms was hypothesized to be from the 3,540 m + 3,569 m sample isolated from type 1 accretion ice, based upon previous cell counts (Abyzov et al., 2005; Christner et al., 2006; D’Elia et al., 2008, 2009). The basal ice was expected to contain higher numbers of organisms, because subglacial environments usually contain higher numbers of organisms than the overriding . Microbial life is commonly found at the interface between glaciers and bedrock because of the presence of liquid water, biologically important , and, on occasion, geothermal activity (Christner et al., 2006; Christner et al., 2008; D’Elia et al., 2008; D’Elia et al., 2009). Finally, the 3,585 m sample (type 2 ice) was hypothesized to have the highest quantity of organisms among all samples based upon previous research in the Dr. Rogers Lab (D’Elia et al., 2008; D’Elia et al., 2009; Rogers et al., 2013; Shtarkman et al., 2013).

2.3 Materials and Methods

2.3.1 Sample Preparation

To assay the RNA and DNA present in each type of ice in the glacier, sections from each of four Vostok 5G ice core regions were obtained from the NSF-ICL (National Science Foundation - Ice Core Laboratory, Lakewood, Colorado): glacial ice (2,149 m), basal ice (3,501 m and 3,520 m), type 1 accretion ice from the shallow embayment (3,540 m and 3,569 m), and type 2 ice from the shallow embayment (3,585 m). The core sections were shipped frozen to our laboratory at Bowling Green State University, Bowling Green, Ohio, and were stored at -20°C. Sections were chosen based upon desired depths and absence of cracks. Some had previously been cut lengthwise into quarter core sections by the staff at NSF-ICL. The selected sections were cut into lengths of 6-16 cm, which would yield between 250 and 400 ml of meltwater, and were warmed at 4°C for 30 min prior to melting. All sterilization and melting procedures were conducted in a sterile laminar flow biosafety hood. The hood was inside a positive pressure room supplied with HEPA (high efficiency particulate air) filtered air. All lab bench and hood surfaces were sterilized with a 5.25% sodium hypochlorite solution and 70% ethanol, and were treated with UV irradiation for 1 hour prior to sterilization of ice core sections. Inside the sterile laminar flow hood, the ice core sections were completely immersed in a 5.25% sodium hypochlorite solution (chilled to 4°C) for 10 sec, followed by 3 rinses with 800 ml of autoclaved sterile reverse osmosis (RO) water (4°C, 18.2 MΩ, <1 parts per billion [ppb] total organic carbon [toc]). Each treated ice core section was then allowed to melt at room temperature inside the hood using a sterile funnel, with aliquots of between 25 and 50 ml being collected in sterile 50 ml screw cap tubes placed on ice (frozen sterile RO water).

In total, five ice core section meltwater samples were prepared: 2,149 m (glacial), a

mixture of 3,501 and 3,520 m (basal), a mixture of 3,540 and 3,569 m (type 1 accretion), 3,585

m (type 2 accretion), and a negative control (sterile RO water, 18.2 MΩ, <1 ppb toc). Some

samples were mixed because there was insufficient sample from each to be used individually.

Each of the samples was then concentrated from 250 ml of meltwater by ultracentrifugation at

100,000 xg for 14 h, followed by resuspension in 50 ul of sterile RO water. The nucleic acids

were purified using TRIzol reagent for the RNAs and CTAB for the DNA (Rogers and Bendich

1985; Rogers et al. 1989; Shtarkman et al., 2013). For each depth, 2.5 ul from the

ultracentrifuged sample and 2.5 ul from the purified nucleic acid sample were combined into one

tube, resulting in a 5 ul volume for each sample that was used for cDNA synthesis.

2.3.2 cDNA Synthesis

To analyze the metatranscriptome of each sample, the RNA sequences present in each

sample were converted to cDNA using a SuperScript II kit (Invitrogen, Carlsbad, California).

First, random hexamer primers were added to each 5 ul sample to a concentration of 6.5 ng/ul.

Each sample was then mixed gently, incubated at 70°C for 10 min, and then chilled on ice. The first strand synthesis reaction consisted of 33.3 mM Tris-HCl (pH 8.3), 50 mM KCl, 2 mM MgCl2, 6 mM DTT (dithiothreitol), 0.33 mM dNTPs (0.0825 mM each of dATP, dCTP, dGTP, and dTTP), and 100 units of SuperScript II reverse transcriptase, mixed gently to a final volume of 15.5 ul for each sample. All samples were then incubated at 37°C for 90 min and then placed on ice. Second strand synthesis consisted of 25 mM Tris-HCl (pH 6.9), 93 mM KCl, 4.6 mM MgCl2, 0.14 mM β-NAD+, 9.3 mM (NH4)2SO4, 0.5 mM dNTP mix (as above), 1.2 mM DTT, 5 U E. coli DNA ligase, 20 U E. coli DNA polymerase I, and 1 U E. coli RNase H, for a total of 80 ul. Again, all reagents were mixed gently before being incubated for 2.5 h at 15°C. Next, 5 U T4 DNA polymerase was added to each sample and incubated at 15°C for 7 min. Upon completion, the samples were put on ice. The reaction was stopped by the addition of EDTA to a concentration of 30 mM. Then, 75 ul of chloroform/isoamyl alcohol (24:1) was added, and the mixture was thoroughly agitated to form an emulsion. The samples were then centrifuged at 14,000 xg for 2 min to separate the layers. The aqueous layer (top layer) was then transferred into a new tube. To precipitate the DNA, NaCl was added to a final concentration of 0.5 M (after addition of ethanol) and gently mixed, followed by 250 ul of 100% ethanol (-20°C), which was mixed gently, and the samples were placed in a -20°C freezer for 15 min. The samples were then centrifuged at 14,000 xg for 20 min to pellet the precipitated DNA, and the supernatant was carefully decanted. Next, the pelleted DNA was washed with -20°C 80% ethanol, centrifuged at 14,000 xg for 2 min, and then the ethanol was carefully poured off before drying the DNA pellet in a vacuum centrifuge (Eppendorf Vacufuge, Hauppauge, NY) at 45°C for 10 min. The pellet was rehydrated in 9 ul of DEPC-treated water. Each sample contained both the original DNA and cDNA. Both were used to ensure that there was a sufficient quantity of nucleic acids for sequencing.

2.3.3 Adapter Ligation

In order to later amplify the cDNA and DNA using PCR (polymerase chain reaction), a known sequence was added to each end of the unknown sequences. The reaction to attach the

EcoRI (NotI) adapters consisted of 66 mM Tris-HCl (pH 7.6), 10 mM MgCl2, 1 mM ATP, 100

pmols EcoRI (NotI) adapters (AATTCGCGGCCGCGTCGAC), 14 mM DTT, and 2.5 U T4

DNA ligase in a total volume of 25 ul. The reactions were incubated at 15°C for 16 h. Then, they

were heated to 70°C to stop the reaction. Next, 15 U T4 polynucleotide kinase was added,

mixed, and incubated at 37°C for 30 min. To stop the reaction, the samples were heated to 70°C.

2.3.4 Fractionation

To assess the variety and quantity of cDNA and DNA fragments present in each sample, fractionation was employed to separate the nucleic acid fragments based upon length. Invitrogen cDNA Size Fractionation Columns (Sephacryl, supplied with the SuperScript II kit) were rinsed with TNE (10 mM Tris (pH 7.5), 0.1 mM EDTA, 25 mM NaCl). Each sample was mixed with TNE to a total volume of 122 ul and loaded into the Sephacryl columns. Then, 100 ul of TNE was added to the same column and drained into a second microfuge tube. Following this, 23 fractions (35 ul each) were collected with the addition of 100 ul of TNE to the columns, as needed.

2.3.5 DNA Rehydration

To purify each fraction, the DNA was precipitated overnight and rehydrated. To each fraction, NaCl was added to a final concentration of 0.5 M and mixed gently. Next, 2 volumes (100 ul) of -20°C 100% ethanol were added, mixed gently, and incubated at -20°C overnight. Samples were then centrifuged at 14,000 xg for 15 min to pellet the DNA. The supernatant was decanted and the pellet was washed with -20°C 80% ethanol, then centrifuged at 14,000 xg for 5 min. The supernatant was decanted, and the pellet was dried by vacuum centrifuge (Eppendorf Vacufuge, Hauppauge, NY) for 15 min at 45°C. The samples were then rehydrated using 50 ul of 0.1 X TE (1 mM Tris (pH 8.0), 0.1 mM EDTA). Each sample contained a mix of both DNA and cDNA representing the metagenome and metatranscriptome, respectively, with EcoRI (NotI) sequences added to both ends.

2.3.6 Polymerase Chain Reaction

Due to the low concentrations of DNA and cDNA present in each fraction, amplification was needed to create enough DNA for sequencing. Rehydrated fractions were subjected to polymerase chain reaction (PCR) amplification using a PTC-100 thermal cycler (Bio-Rad, Hercules, California). Each reaction consisted of 0.8 mM dNTPs, 10 mM KCl, 20 mM Tris (pH 8.3), 10 mM (NH4)2SO4, 2 mM MgCl2, 0.4% Triton X-100, 10 mg bovine serum albumin, 1 U Taq DNA polymerase (AmpliTaq Gold DNA polymerase with supplied buffer, Applied Biosystems, Foster City, California), 1 uM EcoRI (NotI) primer (AATTCGCGGCCGCGTCGA), and 5 ul of the DNA/cDNA sample, in a total volume of 25 ul. The thermocycling program was 94°C for 5 min, followed by 45 cycles of 94°C for 1 min, 55°C for 2 min, a 0.1°C/sec ramp to 72°C, and 72°C for 2 min. Once the cycles were finished, the samples were incubated at 72°C for 10 min before being held at 4°C.
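As a rough check on run time, the thermocycling program above can be written out as data and summed. This is a minimal sketch: the constant and function names are illustrative, and the estimate ignores the instrument's unspecified ramp times between the other steps as well as the final 4°C hold.

```python
# Thermocycling program from the text, expressed in seconds.
INITIAL_DENATURATION_S = 5 * 60   # 94°C for 5 min
FINAL_EXTENSION_S = 10 * 60       # 72°C for 10 min
CYCLES = 45

DENATURE_S = 1 * 60               # 94°C for 1 min
ANNEAL_S = 2 * 60                 # 55°C for 2 min
EXTEND_S = 2 * 60                 # 72°C for 2 min
RAMP_START_C = 55
RAMP_END_C = 72
RAMP_TENTHS_C_PER_S = 1           # 0.1°C/sec, stored in tenths to avoid float rounding


def ramp_seconds() -> int:
    """Time for the slow ramp from annealing to extension temperature."""
    return (RAMP_END_C - RAMP_START_C) * 10 // RAMP_TENTHS_C_PER_S


def program_seconds() -> int:
    """Total programmed time, excluding unspecified ramps and the 4°C hold."""
    per_cycle = DENATURE_S + ANNEAL_S + ramp_seconds() + EXTEND_S
    return INITIAL_DENATURATION_S + CYCLES * per_cycle + FINAL_EXTENSION_S


total = program_seconds()
print(f"ramp: {ramp_seconds()} s per cycle, total: {total} s")
```

Under these assumptions the slow ramp adds 170 s to each cycle, and the whole program runs for a little over six hours.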

All amplified fractions were subjected to electrophoresis on 2% agarose gels (Type 1, low EEO, Sigma-Aldrich, St Louis, Missouri) at 5 V/cm in TBE (89 mM Tris, 89 mM borate, 2 mM EDTA) with 0.5 ug/ml ethidium bromide, and visualized with UV irradiation to determine concentrations and size distributions. Once all samples were evaluated, fractions 3-13 from each sample were selected as the best candidates for further processing, based on having DNA distributions between approximately 100 and 1,000 bp.

2.3.7 EcoRI MID Primer Polymerization and Purification

Ion Torrent requires specially designed primers to enable the sequencing process. To enable this process, a total of five pairs of primers were synthesized (Table 1). The forward primers were named MID2A A-key through MID6A A-key (skipping MID3A) and MID10A A-key. The reverse primers were named in a similar fashion, MID2A P1-key through MID6A P1-key (skipping MID3A), and MID10A P1-key (Table 1). Each A-key primer was paired to its corresponding P1-key primer. For example, MID2A A-key was paired with MID2A P1-key.

Each primer was composed of three components (Table 1). The A-key sequences (blue sequences, Table 1) allow the DNA fragment to properly bind to the specific DNA strands attached to the microbeads embedded in the Ion Torrent microarray, while the P1-key sequences are used to initiate (i.e., prime) DNA sequencing from the opposite end of the DNA fragments.

Unique EcoRI MID (multiplex identifier) tag sequences (red sequences, Table 1) were synthesized into the DNA primers. The unique MID sequences were used for later identification of samples in silico. The EcoRI MID10A, EcoRI MID2A, EcoRI MID4A, EcoRI MID5A, and EcoRI MID6A sequences were incorporated into the DNAs/cDNAs from the negative control; 2,149 m; 3,501 + 3,520 m; 3,540 + 3,569 m; and 3,585 m samples, respectively. Finally, the primers contained EcoRI/NotI sequences (black sequences, Table 1) to anneal to the sample DNAs/cDNAs. The sequences were added using the same master mix as above, with 1 uM of the corresponding MID EcoRI primer and 1 ul of DNA sample in a total volume of 25 ul, and the same PCR protocol was used.

Table 1. A list of each of the 3 parts of the synthesized forward and reverse Ion Torrent primer sequences. The blue portion illustrates the A-key sequences needed for annealing to the Ion

Torrent microbeads, while P1-key sequences are necessary for priming the DNA synthesis reactions in the Ion Torrent system. The red portion represents the indexing (MID - multiplex identifier) sequence used to separate the different sequence sources in silico following sequencing. The black portion illustrates the EcoRI/NotI site that was used during PCR amplification.

2.3.8 Sample Purification

To remove extraneous dNTPs or very small DNA fragments present in the samples

following amplification, the samples were purified using a QIAquick PCR purification kit

(Qiagen, Venlo, Netherlands). The maximum capacity of the purification columns is 10 ug of

DNA. To ensure all samples were below this maximum threshold for purification columns, DNA

quantity was estimated using agarose gels (as above) by comparing a band of known DNA

quantity in the 100 base pair ladder (New England Biolabs, Ipswich, MA) to the DNA

fluorescence present from each sample fraction.

For each sample, fractions 3-7 and 8-13 were combined into separate tubes. Five volumes of PB buffer were added to the volume of DNA present in each sample microfuge tube. Each required QIAquick spin column was loaded into a 2 ml collection tube. Each sample was then loaded into a QIAquick spin column-microfuge tube apparatus and centrifuged at 14,000 xg for 45 sec. The resulting flow-through liquid was discarded and the 2 ml collection tube was reinstalled over the column. The column was then washed using 750 ul of PE buffer and centrifuged for 45 sec. The flow-through liquid was again discarded and the same 2 ml collection

tube was returned to the QIAquick spin column. The column was centrifuged for another minute

to completely remove any remaining buffer. The column was then inserted into a clean 1.5 ml

microfuge tube and 30 ul of EB buffer (10 mM Tris-HCl, pH 8.5) was added to the column,

incubated at room temperature for 1 min and then centrifuged for 1 min. Once all samples were

purified, they were subjected to electrophoresis (as described above) on 2% agarose gels to

confirm that purification was successful.

2.3.9 Ion Torrent Primer Polymerization

The Ion Torrent adapters (which are required for Ion Torrent sequencing procedures) were added to the sample sequences using a PCR procedure (described above), where the Ion

Torrent sequence had been synthesized as part of the primers (Table 1). Next, the ends of the amplified products were repaired using a NEBNext for Ion Torrent Library Preparation Kit (New

England Biolabs, Ipswich, MA). Each sample was prepared as follows: sample DNA (8.3 ng/ul), end repair buffer (50 mM Tris-HCl, 10 mM MgCl2, 10 mM DTT, 1 mM ATP, 0.4 mM dATP,

0.4 mM dCTP, 0.4 mM dGTP, 0.4 mM dTTP), end repair enzyme mix (NEBNext system).

Samples were then incubated for 20 min at 25°C then 70°C for 10 min.

To precipitate the DNA, NaCl (0.5 M after addition of ethanol) and 2.5 volumes (250 ul) of -20°C 100% ethanol were added to each sample and mixed gently by rocking. The samples were then incubated at -20°C for 1 hr. Then, the samples were centrifuged at 14,000 xg for 15 min. The supernatant was decanted and discarded. Each sample was washed with 100 ul -20°C

80% ethanol then centrifuged at 14,000 xg for 5 min. The supernatant was again decanted and the remaining pelleted DNA was dried in a vacuum centrifuge (Eppendorf Vacufuge,

Hauppauge, NY) for 15 min at 45°C. The samples were then rehydrated in 100 ul of 0.1 X TE (1 mM Tris (pH 8.0), 0.1 mM EDTA).

To add the final Ion Torrent primers (Table 1), and to increase the amount of DNA, a PCR reaction was prepared. This reaction consisted of 200 ng DNA from each sample, 1 mM Ion Torrent Primers, and 50 ul NEBNext Q5 hot start high fidelity PCR master mix in a total volume of 100 ul. The thermocycler program was 98°C for 30 sec, followed by 20 cycles of 98°C for 10 sec, 58°C for 30 sec, and 65°C for 30 sec; once complete, the samples were incubated at 65°C for 5 minutes. Once polymerization was complete, the samples were purified with QIAquick columns using the previous procedure.

2.3.10 Ion Torrent Sample Preparation

Two samples were prepared for sequencing at the University of Pennsylvania Core

Sequencing Lab (Philadelphia, PA). Sample one consisted of a mix of the negative control, 2,149

m, and 3,501 + 3,520 m, each consisting of 150 ng of DNA for a total of 450 ng in a volume of

50 ul. The second sample consisted of a mixture of 225 ng of DNA from 3,540 + 3,569 m and

225 ng of DNA from 3,585 m, for a total of 450 ng and a final volume of 45 ul. The specific

mixes assured an equal contribution of DNA from each sample, based on quantitation from the

agarose gels.

2.3.11 Sequence Clean Up

To enable efficient and effective processing in silico, the received files were subjected to several treatments. The sequences were received from the University of Pennsylvania Core Sequencing Lab as two files in FASTQ format and were converted to FASTA format using Biopython (Open Bioinformatics Foundation). The files were then divided back into their original five groups based upon the MID index sequences (red sequences in Table 1). Next, the primer sequences were clipped off of all of the reads in silico in each of the five groups. Finally, any reads shorter than 50 base pairs were removed from all of the groups of sample DNA.
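The clean-up steps above can be sketched in plain Python. This is a minimal illustration, not the scripts used in the study: the MID tags shown are placeholders (the real ones are in Table 1), the function names are hypothetical, and it assumes each read begins with its MID tag followed directly by the EcoRI/NotI primer. In practice, Biopython's sequence parsers would replace the small FASTQ reader used here.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Read = Tuple[str, str]  # (read identifier, nucleotide sequence)

# Placeholder MID tags for illustration only; see Table 1 for the real ones.
MID_TAGS: Dict[str, str] = {"AAAACCCC": "sample_A", "GGGGTTTT": "sample_B"}
# EcoRI (NotI) primer sequence given in section 2.3.6.
PRIMER = "AATTCGCGGCCGCGTCGA"
MIN_LENGTH = 50  # reads shorter than this are discarded


def parse_fastq(text: str) -> Iterator[Read]:
    """Yield (id, sequence) pairs from simple four-line FASTQ records."""
    lines = text.strip().splitlines()
    for i in range(0, len(lines) - 3, 4):
        yield lines[i][1:].strip(), lines[i + 1].strip()


def demultiplex_and_clip(reads: Iterable[Read]) -> Dict[str, List[Read]]:
    """Group reads by MID tag, clip tag and primer, drop short inserts."""
    groups: Dict[str, List[Read]] = {name: [] for name in MID_TAGS.values()}
    for read_id, seq in reads:
        for tag, name in MID_TAGS.items():
            prefix = tag + PRIMER
            if seq.startswith(prefix):
                insert = seq[len(prefix):]
                if len(insert) >= MIN_LENGTH:
                    groups[name].append((read_id, insert))
                break  # a read belongs to at most one sample
    return groups


def to_fasta(reads: Iterable[Read]) -> str:
    """Format reads as FASTA text."""
    return "".join(f">{read_id}\n{seq}\n" for read_id, seq in reads)


# Tiny synthetic example: one long read, one short read, one untagged read.
long_insert = "ACGT" * 15  # 60 bp
fastq = "\n".join([
    "@r1", "AAAACCCC" + PRIMER + long_insert, "+", "I" * 86,
    "@r2", "AAAACCCC" + PRIMER + "ACGTACGT", "+", "I" * 34,
    "@r3", "TTTTTTTT" + long_insert, "+", "I" * 68,
])
groups = demultiplex_and_clip(parse_fastq(fastq))
print(to_fasta(groups["sample_A"]))
```

Only the first read survives in this example: the second is too short once the tag and primer are clipped, and the third carries no recognized MID tag.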

2.3.12 Organism Determinations

To achieve quick and accurate organism affiliations from the sequence data, standalone BLAST+ and the NCBI BLAST nucleotide and databases were downloaded. BLAST searches were performed locally on each of the prepared files using the command: “blastn -db nt -query input file -out output file.csv -outfmt "10 qseqid sacc sgi length sstart send pident evalue staxid sskingdom ssciname" -max_target_seqs 20”. Performing the BLAST search locally enabled a more customized and significantly faster process than using BLAST online.

Approximately 33% of the sequences from each ice core section depth failed to return any similar sequences during the database searches.
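Because output format 10 is comma-separated with no header row, the column names have to be supplied when the results are read back. The sketch below shows one way to do that with the Python standard library; the function names and the two example rows are invented for illustration and are not results from this study.

```python
import csv
import io
from typing import Dict, List, Set

# Column order copied from the -outfmt string in the blastn command.
BLAST_COLUMNS = [
    "qseqid", "sacc", "sgi", "length", "sstart", "send",
    "pident", "evalue", "staxid", "sskingdom", "ssciname",
]


def read_blast_csv(handle) -> List[Dict[str, object]]:
    """Parse headerless BLAST CSV rows into dictionaries with typed fields."""
    hits: List[Dict[str, object]] = []
    for row in csv.DictReader(handle, fieldnames=BLAST_COLUMNS):
        hit: Dict[str, object] = dict(row)
        for key in ("length", "sstart", "send"):
            hit[key] = int(row[key])
        hit["pident"] = float(row["pident"])
        hit["evalue"] = float(row["evalue"])
        hits.append(hit)
    return hits


def fraction_without_hits(query_ids: Set[str], hits: List[Dict[str, object]]) -> float:
    """Fraction of submitted reads that returned no BLAST hit at all."""
    matched = {hit["qseqid"] for hit in hits}
    return len(query_ids - matched) / len(query_ids)


# Invented example rows in the same column order.
example = io.StringIO(
    "read1,XX000001,111,60,1,60,100.000,1e-25,1,Bacteria,Example bacterium\n"
    "read2,XX000002,222,55,10,64,98.182,3e-20,2,Eukaryota,Example eukaryote\n"
)
hits = read_blast_csv(example)
print(fraction_without_hits({"read1", "read2", "read3"}, hits))
```

In this toy case one of three submitted reads has no hit. The same calculation applied to the real query and result files would give the fraction of unmatched reads at each depth.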

Using custom Biopython scripts, sequences that were found in the negative control were removed from the other datasets based upon GI numbers with a percent identity ≥99%. The top 5 removed were Macaca fascicularis, Pongo pygmaeus, Homo sapiens, Burkholderia multivorans, and Burkholderia cenocepacia. Duplicate GI numbers were eliminated by keeping the GI number with the lowest e-value and highest percent identity. Then, the gene that the query sequence matched during BLAST was retrieved and put into three categories: rRNA, mRNA, and other sequences. The rRNA genes were primarily used to confirm taxonomic affiliation. The mRNA genes were used to determine taxonomic affiliations and to analyze metabolic pathways.

The phyla corresponding to each genus and species were determined and associated with each sequence.
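The control subtraction and de-duplication described above might look like the following. This is a sketch under stated assumptions rather than the scripts used in the study: hits are dictionaries keyed by the BLAST column names, the ≥99% identity cutoff is applied to the negative control hits, and the helper names are hypothetical.

```python
from typing import Dict, List, Set

Hit = Dict[str, object]


def control_gi_numbers(control_hits: List[Hit], min_identity: float = 99.0) -> Set[str]:
    """GI numbers matched in the negative control at or above the cutoff."""
    return {str(h["sgi"]) for h in control_hits if float(h["pident"]) >= min_identity}


def remove_control(sample_hits: List[Hit], blocked: Set[str]) -> List[Hit]:
    """Drop sample hits whose GI number also appeared in the control."""
    return [h for h in sample_hits if str(h["sgi"]) not in blocked]


def best_hit_per_gi(hits: List[Hit]) -> List[Hit]:
    """Keep one hit per GI: lowest e-value, ties broken by highest identity."""
    best: Dict[str, Hit] = {}
    for h in hits:
        gi = str(h["sgi"])
        rank = (float(h["evalue"]), -float(h["pident"]))
        kept = best.get(gi)
        if kept is None or rank < (float(kept["evalue"]), -float(kept["pident"])):
            best[gi] = h
    return list(best.values())


# Invented example data.
control = [{"sgi": "111", "pident": 99.5, "evalue": 1e-30}]
sample = [
    {"sgi": "111", "pident": 100.0, "evalue": 1e-28},  # also in the control
    {"sgi": "222", "pident": 97.0, "evalue": 1e-20},
    {"sgi": "222", "pident": 99.0, "evalue": 1e-20},   # same e-value, higher identity
    {"sgi": "333", "pident": 95.0, "evalue": 1e-10},
]
cleaned = best_hit_per_gi(remove_control(sample, control_gi_numbers(control)))
print([(h["sgi"], h["pident"]) for h in cleaned])
```

Ranking on the pair (e-value, negative identity) lets a single comparison express both criteria, with the e-value taking priority.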

2.3.13 Ecology and Physiology

To obtain more detailed information about each organism identified, the ecological niches and metabolic characteristics were retrieved for each species, or strain if possible, with

NCBI information, publications, and other online resources. Organisms were then categorized into groups based upon gene similarities from BLAST, ecology, physiology, or other notable species characteristics. Additionally, the data were used to compile tables of organisms with potentially important metabolic pathways such as nitrogen cycling and carbon fixation.

2.3.14 Comparison of Data

To gather more data for additional comparisons, metagenomic and metatranscriptomic

data from Shtarkman et al. (2013) and Rogers et al. (2013) were retrieved. In those studies, the

data pertaining to glacial depths of 3,563 m and 3,585 m, respectively corresponding to type 1

and type 2 accretion ice from the shallow embayment, had been combined into one sample,

named V5. In addition, data from glacial depths of 3,606 m and 3,621 m, corresponding to type 1

and type 2 accretion ice from part of the main lake basin, were combined together, named V6.

Samples V5 and V6 were originally analyzed in a similar fashion to those being analyzed

currently. Sequencing data from the 2013 studies and the data presented here were compared to

determine the overlaps in taxa within the glacial, basal, and accretion ice using a custom Python

script. The matching organisms along with their corresponding phylum and were saved

as an Excel file for analysis.
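A comparison of this kind reduces to grouping organism names by the samples in which they occur. The sketch below is illustrative only: the sample contents are placeholders, the function name is hypothetical, and CSV output from the standard library stands in for the Excel file used in the study.

```python
import csv
import io
from typing import Dict, List, Tuple

# sample name -> {organism name: phylum}; placeholder content for illustration.
Samples = Dict[str, Dict[str, str]]


def shared_organisms(samples: Samples) -> List[Tuple[str, str, str]]:
    """Return (organism, phylum, samples) rows for organisms in 2+ samples."""
    found_in: Dict[str, List[str]] = {}
    phylum_of: Dict[str, str] = {}
    for sample_name, organisms in samples.items():
        for organism, phylum in organisms.items():
            found_in.setdefault(organism, []).append(sample_name)
            phylum_of[organism] = phylum
    return [
        (organism, phylum_of[organism], "; ".join(names))
        for organism, names in sorted(found_in.items())
        if len(names) >= 2
    ]


samples: Samples = {
    "glacial": {"organism A": "phylum X"},
    "basal": {"organism A": "phylum X", "organism B": "phylum Y"},
    "accretion": {"organism B": "phylum Y", "organism C": "phylum Z"},
}
rows = shared_organisms(samples)

# Write the overlap table as CSV text.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["organism", "phylum", "samples"])
writer.writerows(rows)
print(buffer.getvalue())
```

Matching on exact organism names is the simplest choice; matching on accession or taxonomy identifiers would be stricter, if those were carried through from the BLAST output.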

2.4 Results and Discussion

2.4.1 Sequence Data - Overall

Sequence data NCBI accession numbers are as follows: 2,149 m sample -

SAMN12175388; 3,501 + 3,520 m sample - SAMN12175389; 3,540 + 3,569 m sample -

SAMN12175390; 3,585 m sample - SAMN12175391. The BioProject that contains information

related to the samples listed above is located at the NCBI Sequence Read Archive and can be

retrieved using the accession number PRJNA552298.

2.4.2 Glacial Ice (2,149 m)

2.4.2.1 Results Summary

For the 2,149 m sample (glacial ice), 4,571 bp were recovered with a total of 97 reads and an average length of 47 bp. Once duplicate sequences and low-quality reads were removed, the 2,149 m sample had 38 unique reads with an average length of 56 bp. Of the 38 unique reads, 12 sequences were identified as belonging to the same uncultured cyanobacterium GenInfo Identifier (GI) and accession number, all of which aligned with 16S ribosomal RNA with between 97 and 100% identity. The other 26 sequences were either removed as contamination, based upon the negative control, or did not align with anything in the BLAST database. The low quantity of reads in the 2,149 m sample is consistent with research by D’Elia et al. (2008) that showed glacial (basal) ice to have low cellular concentrations (a mean of 6 cells per ml of sample). In addition, Bulat et al. (2009) investigated glacial ice samples close to the

2,149 m sample and found between 3 and 24 cells per ml. The team noted that 24 cells per ml was at the extreme end of their measurements. Based upon this, it is not unanticipated that very few reads were found during sequencing.

2.4.2.2 Organism Overlap

To understand whether the DNA sequences recovered from each sample were unique or shared among multiple samples, an analysis of the overlap between samples was performed. This process provided insight into the origin of the potential organisms in each sample. All of the samples (2,149 m, 3,501 m + 3,520 m, 3,540 m + 3,569 m, and 3,585 m) contained sequences that aligned with an uncultured cyanobacterium, all with the same accession and GenInfo identifier number. Beyond the 2,149 m sample, which aligned with 16S rRNA, the other three samples (3,501 m + 3,520 m, 3,540 m + 3,569 m, and 3,585 m) aligned (100% identity) with a gene of unknown function. However, because the regions used were from different parts of the genome, it is impossible to determine whether it is the same species in each sample.

Additionally, overlap analysis between the 2,149 m sample and the V5 (3,563 m + 3,585 m; type

1/2 accretion ice) sample (Shtarkman & Koçer et al., 2013) again showed they both shared an

uncultured cyanobacterium. Finally, comparison between the 2,149 m sample and the V6 (3,606

+ 3,621 m; type 1/2 accretion ice) sample indicated no shared organisms, as the V6 sample did not contain any , uncultured or otherwise.

2.4.3 Basal Ice (3,501 m + 3,520 m)

2.4.3.1 Results Summary

To understand the metagenomic and metatranscriptomic data available, the total amount

of data was analyzed using some basic techniques. A total of 46,421,725 bp of sequence data

was found for the 3,501 m + 3,520 m sample (basal ice) with 1,079,575 total reads with an

average length of 43 bp. Following removal of redundant and <50 nt sequences, there were

33,139 unique reads, with an average read length of 68 bp. Following BLAST analysis, 513

unique organisms based on sequence similarities were determined (471 to species level) of which

388 aligned with rRNA gene sequences (Table 2). Of these unique organisms, 77.5% (365) were

bacteria, 19.7% (93) were eukaryotes, and 0.19% (1) was an . The remainder were

unidentified uncultured organisms.
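The average read lengths quoted in these summaries follow directly from the reported totals. As a quick check, using only the figures given in the text (the helper name is illustrative):

```python
def mean_read_length(total_bp: int, total_reads: int) -> float:
    """Average read length in base pairs."""
    return total_bp / total_reads


# Totals reported for the glacial (2,149 m) and basal (3,501 m + 3,520 m) samples.
glacial_mean = mean_read_length(4_571, 97)
basal_mean = mean_read_length(46_421_725, 1_079_575)

print(f"glacial: {glacial_mean:.1f} bp, basal: {basal_mean:.1f} bp")
```

Both values round to the 47 bp and 43 bp averages stated for the raw reads.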

Table 2. Summary of results from the 3,501 m + 3,520 m sample with organisms classified to the phylum level, unique rRNA gene sequences, their ecology, their physiology, and any important notes.

| Taxon | Unique Organisms | Unique rRNA Sequences | Ecology | Physiology | Species Characteristics | Notes |
| Archaea | 1 | 0 | | | | |
| Euryarchaeota | 1 | 0 | freshwater | | | |
| Bacteria | 398 | 314 | | | | |
| Actinobacteria | 48 | 10 | intestines, , plant roots, plant surfaces, larval arthropods, soil, freshwater, and marine | halotolerant, gamma radiation resistant, desiccation resistant | heterotroph, saprotroph, chemoorganotroph, carbon, nitrogen, , and sulfur cycling, antibiotic producing, able to produce silver nano-particles | |
| | 110 | 104 | animal intestines, , animal , pathogens, plant aerial surfaces and roots, ice, soil, freshwater, marine, sediment | halotolerant, halophile, thermotolerant, mesophile | PCB, PAH, PHE, PYR, pyrene degrading, heterotroph, chemoheterotroph, saprotroph, chemoorganotroph, hydrocarbon degrading, polysaccharides degrading, carbon cycling, | |
| Candidatus | 3 | 2 | animal intestines, soils, , wastewater | no information | heterotroph | organisms hard to culture in laboratories (Has never been cultured) |
| Cyanobacteria | 2 | 0 | freshwater | alkaliphile | photosynthetic, , triglyceride producing, | |
| | 1 | 1 | uncultured species | uncultured species | uncultured species | |
| Firmicutes | 34 | 32 | animal intestines, animal symbiont, plant aerial surfaces and roots, plant symbiont, soil | psychrotolerant, mesophile | heterotroph, alpha-hemolytic, beta-hemolytic, asaccharolytic, ammonifying^, nitrogen fixing, carbon cycling*, acid producing | ^converts peptone into and then ammonium; *conversion of 2-phosphoglycerate into phosphoenolpyruvate |
| | 1 | 1 | uncultured species | uncultured species | uncultured species | |
| Nitrospinae | 1 | 1 | uncultured species | uncultured species | uncultured species | |
| | 1 | 0 | soil | mesophile | chemoorganotroph | |
| | 11 | 1 | rhizosphere, air, soil, freshwater, marine | | phototrophic iron reducing, arsenite oxidizing, nitrogen fixing, methylamine degrading, methanesulfonate-degrading | |
| | 114 | 98 | soil, aquatic, sediment, rhizosphere, glacial ice, air, animal intestines, | psychrophile, mesophile, acidophile, phytopathogen | diazotrophic, methylotrophic, chemolithoautotrophic, chemoorganotroph, heterotroph, hydrocarbon metabolizing, denitrifying, nitrite-oxidizing, nitrogen fixing, benzene degradation, iron oxidizing, iron reducing, dimethyl disulfide producing, heavy metal resistant | |
| | 13 | 11 | sediment, freshwater, marine, soil, hydrothermal vents | thermophile, alkaliphile, mesophile | iron reducing, reducing, crude oil degrading, -oxidizing, | |
| | 44 | 39 | soil, freshwater, brackish water, plant surfaces, aquatic , arthropod symbiont, | psychrophile, halotolerant, phytopathogen, mesophile | heterotroph, chemoorganotroph, nitrogen fixing, denitrifying, nitrifying, nitrite reducing, arsenic oxidizing, hydrocarbon degrading | |
| Epsilonproteobacteria | 5 | 4 | animal intestines | mesophile | sulfate reducing | |
| Hydrogenophilalia | 1 | 1 | freshwater | thermophile | chemolithoautotroph, hydrogen oxidizing, | |
| Oligoflexia | 6 | 6 | freshwater | mesophile | chemo-organoheterotroph | |
| | 3 | 3 | can cause disease in animals | mesophile | heterotroph, obligate parasite | |
| Eukaryota | 101 | 65 | | | | |
| Apicomplexa | 4 | 0 | can cause disease in animals | mesophile | heterotroph, obligate parasite | |
| Arthropoda | 3 | 1 | Argentinian ants, field | no information | mesophile | |
| Ascomycota | 21 | 4 | soil, arthropod guts, plant pathogen, surface of fruits, animal pathogen | mesophile | heterotroph, saprotroph, ammonia metabolizing, fermenting | |
| Bacillariophyta | 17 | 15 | marine, freshwater, brackish water, plant pathogen | acidophile, halotolerant, stenothermic | photosynthetic, photoautotrophic, triglyceride accumulating, carbon fixing | |
| Chlorophyta | 10 | 8 | freshwater, marine | psychrophile, acidophile, halophile, halotolerant, mesophile, desiccation resistant | autotroph, arsenic detoxifying, triglyceride accumulating | |
| Euglenozoa | 22 | 20 | amphibian parasite, fish parasite, arthropod parasite, freshwater, marine, soil | no information | heterotroph, mesophile, pathogenic, obligate parasite | |
| Haptophyta | 1 | 0 | marine | mesophile | photosynthetic carbon fixing, dimethylsulfide producing | |
| Streptophyta | 23 | 17 | pollen in glacial ice | mesophile | photosynthetic, carbon fixing | cottonwood tree, super-xerophytic broadleaf evergreen, mustard plant, rapeseed, cotton, sunflower, wild barley, henbane, sweet potato, morning glory, tropical pitcher plant, radish, fava bean, Japanese pagoda tree |
| Unknown | 13 | 10 | | | | |
| Uncultured bacteria | 9 | 6 | no information | no information | no information | isolated from: sediment, female adult beetle, enrichment culture, coal bed core, rumen solid digests, Hawaii Oceanographic Time-series study site Aloha, Zoige Plateau peat bog soil, 1.714-1.722 g/ml fraction of 15N-monoammonium phosphate treated uncontaminated high soil |
| Uncultured | 2 | 2 | no information | no information | no information | isolated from: krill guts, 25m from DHARMA 32 station in the South Atlantic |
| Uncultured unknown | 2 | 2 | no information | no information | no information | isolated from: Guerrero Negro Hypersaline Mat 04; altitude 310m AMSL; sample depth 1m below water level, barren control OBD up to 20 cm depth |

2.4.3.2 Organism Overlap

Overlap between 3,501 m + 3,520 m (basal ice), 3,540 m + 3,569 m (type 1 accretion ice), and 3,585 m (type 2 accretion ice)

To further understand how each sample potentially interacts with the others and to identify the source of the biodiversity found in the lake, further analysis of the overlap between samples was performed. When the 2,149 m sample was factored out, there were 18 unique species found to be common to the 3,501 m + 3,520 m, 3,540 m + 3,569 m, and 3,585 m samples (Fig. 8).
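The region-by-region bookkeeping behind this kind of three-way comparison can be sketched with Python sets. This is a minimal illustration under stated assumptions, not the script used for this work: the function name `venn_regions` and the variable names are hypothetical, and only a handful of organisms named in this section are included, each placed in the samples in which the text reports it.

```python
# Hypothetical helper: split three sample sets into the regions of a Venn diagram.
def venn_regions(a, b, c):
    core = a & b & c  # organisms found in all three samples
    return {
        "all_three": core,
        "a_b_only": (a & b) - core,  # shared pairwise once the core is factored out
        "a_c_only": (a & c) - core,
        "b_c_only": (b & c) - core,
        "a_only": a - b - c,
        "b_only": b - a - c,
        "c_only": c - a - b,
    }

# Illustrative subsets only; the full organism lists are not reproduced here.
basal = {"Monoraphidium neglectum", "Linepithema humile",
         "Gryllus bimaculatus", "Halorubrum trapanicum"}   # 3,501 m + 3,520 m
type1 = {"Monoraphidium neglectum", "Linepithema humile",
         "Gemmatirosa kalamazoonesis"}                     # 3,540 m + 3,569 m
type2 = {"Monoraphidium neglectum", "Gryllus bimaculatus",
         "Gemmatirosa kalamazoonesis"}                     # 3,585 m

regions = venn_regions(basal, type1, type2)
for name, members in regions.items():
    print(name, sorted(members))
```

Subtracting the three-way core from each pairwise intersection mirrors the step in which organisms shared among all samples are removed before each pairwise overlap is described.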

Figure 8. The overlap of the 3,501 m + 3,520 m, 3,540 m + 3,569 m, and 3,585 m samples is illustrated using a Venn diagram. The blue circle illustrates the 3,501 m + 3,520 m (basal ice) sample with N indicating a total of 513 unique organisms. The red circle illustrates the 3,585 m (type 2 accretion ice) sample with N indicating a total of 681 unique organisms. The green circle illustrates the 3,540 m + 3,569 m (type 1 accretion ice) sample with N indicating 133 unique organisms. Each circle’s area is proportional to the number of unique organisms from each depth. The abbreviations for the phyla in each sample are listed below.

Of the 17 unique species common to 3,501 m + 3,520 m, 3,540 m + 3,569 m, and 3,585 m, 15 were bacteria, almost all of which were soil and freshwater dwelling bacteria. Among them were nine Actinobacteria, including three strains of Mycobacterium kansasii (all 100% identity to a gene for acetyl-CoA acetyltransferase), a species found predominantly in aquatic conditions that has been isolated from soil on rare occasions; Tsukamurella tyrosinosolvens

(97% identity to homoserine O-succinyltransferase for 3,501 m + 3,520 m and 3,540 m + 3,569 m; 100% identity to AMP-dependent synthetase in 3,585 m), which is both soil dwelling and aquatic; Clavibacter michiganensis subsp. insidiosus (100% identity to 23S rRNA in 3,501 m +

3,520 m; 100% identity to ABC transporter in 3,540 m + 3,569 m; 97% identity to ABC transporter in 3,585 m), which is psychrotolerant; and Streptomyces Sge12 (100% identity to betaine-aldehyde dehydrogenase in 3,501 m + 3,520 m and 3,540 m + 3,569 m; 85% identity to

ATP-dependent RNA helicase HrpA in 3,585 m), a soil dwelling bacterium. The only cyanobacterium found in all three samples was Arthrospira sp. PCC 8005

(all with 97% identity to a gene of unknown function), which is able to survive alkaline and depleted nitrogen conditions.

Five soil dwelling Betaproteobacteria were also recovered, including three Burkholderia species: B. pseudomallei (100% identity to gene of unknown function in 3,501 m + 3,520 m; 100% identity to translational GTPase TypA mRNA in 3,540 m + 3,569 m; 100% identity to 23S ribosomal RNA in 3,585 m), B. cenocepacia H111 (96% identity to D-aminoacylase mRNA in 3,501 m + 3,520 m; 100% identity to GTP-binding protein TypA/BipA mRNA in 3,540 m + 3,569 m; 98% identity to D-aminoacylase mRNA in 3,585 m), and B. cenocepacia J2315 (96% identity to D-aminoacylase mRNA in 3,501 m + 3,520 m; 100% identity to GTP-binding protein TypA/BipA mRNA in 3,540 m + 3,569 m; 98% identity to D-aminoacylase mRNA in 3,585 m).

Also present were sequences from Massilia putida (94% identity in 3,501 m + 3,520 m; 97%

identity in 3,540 m + 3,569 m; and 86% identity in 3,585 m; all aligning with genes of unknown

function), an aerobic mesophile, which is resistant to heavy metal pollution and can produce

dimethyl disulfide; and Achromobacter xylosoxidans (95% identity to Maltose/maltodextrin

transport ATP-binding protein MalK mRNA in 3,501 m + 3,520 m; 97% identity to

Maltose/maltodextrin transport ATP-binding protein MalK mRNA in 3,540 m + 3,569 m; 100%

identity to 23S rRNA in 3,585 m), which is a wet soil dwelling aerobic bacterium.

The remaining two organisms were both eukaryotes. One of the eukaryotes was an

uncultured specimen (98% identity to 18S rRNA in 3,501 m + 3,520 m; 98% identity to 18S

rRNA in 3,540 m + 3,569 m; 100% identity to 28S rRNA in 3,585 m) and the other was

Monoraphidium neglectum (100% identity to a hypothetical protein in 3,501 m + 3,520 m; 3,540 m + 3,569 m; and 3,585 m), an aquatic oleaginous alga.

Overlap between 3,501 m + 3,520 m (basal ice) vs 3,540 m + 3,569 m (type 1 accretion ice): Sequences most closely matching a total of 27 unique organisms were shared between samples from 3,501 m + 3,520 m and 3,540 m + 3,569 m. Of the 27 organisms, 24 were of bacterial origin, including members of Actinobacteria, Proteobacteria (Betaproteobacteria and Gammaproteobacteria), and Cyanobacteria (Fig. 9). Of the 24 unique Bacteria, 20 (83.3%) were most similar to comparable mRNA sequences and the other four (16.7%) failed to align with a gene that has a known function in the NCBI database. In addition, 18 had an identity above 97% and all were above 88% identity. The other three were from the eukaryotic phyla Arthropoda and Chlorophyta. This overlap was approximately 5.26% of all the unique organisms found in the 3,501 m + 3,520 m sample and 20.3% of those from the 3,540 m + 3,569 m sample.
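The two percentages above follow directly from the shared count and the per-sample totals of unique organisms (513 for basal ice and 133 for type 1 accretion ice). A short sketch of that arithmetic, using a hypothetical helper name, is:

```python
# Hypothetical helper: share of a sample's unique organisms that fall in an overlap.
def percent_shared(shared, sample_total):
    return round(100 * shared / sample_total, 2)

shared = 27        # organisms shared between basal ice and type 1 accretion ice
basal_total = 513  # unique organisms, 3,501 m + 3,520 m
type1_total = 133  # unique organisms, 3,540 m + 3,569 m

basal_share = percent_shared(shared, basal_total)
type1_share = percent_shared(shared, type1_total)
print(basal_share, type1_share)
```

The same shared count is a much larger fraction of the smaller sample, which is why the overlap looks minor from the basal ice side and substantial from the accretion ice side.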

Figure 9. The number of organisms from a variety of phyla found in the 3,501 m + 3,520 m (basal ice) sample, the 3,540 m + 3,569 m (type 1 accretion ice) sample, and the overlaps between them.

Once organisms that were shared among the 3,501 m + 3,520 m, 3,540 m + 3,569 m, and

3,585 m samples were removed, eight Bacteria remained consisting of seven Betaproteobacteria and one Gammaproteobacteria. All organisms were found based upon sequence alignments with mRNAs, and had percent identities of 88% or greater. Some notable Betaproteobacteria included

Achromobacter ruhlandii (95% identity in 3,501 m + 3,520 m, and 100% identity in 3,540 m + 3,569 m, both aligning with glycerol-3-phosphate ABC transporter ATP-binding protein mRNA), a bacterium naturally found inhabiting soil; Burkholderia ubonensis MSMB22 (97% identity to CoA-transferase III family protein mRNA in 3,501 m + 3,520 m, and 97% identity to GTP-binding protein TypA/BipA mRNA in 3,540 m + 3,569 m), another soil dwelling bacterium; and Pandoraea oxalativorans (88% identity to sulfonate ABC transporter ATP-binding protein mRNA in both 3,501 m + 3,520 m and 3,540 m + 3,569 m), an aerobic mesophile known to thrive in soil or rich in oxalates. In addition, sequences from two Paraburkholderia species were present in both samples: P. phymatum STM815 (94% identity to ABC transporter related mRNA in both 3,501 m + 3,520 m and 3,540 m + 3,569 m) and P. sprentiae WSM5005 (95% identity to sulfonate ABC transporter ATP-binding protein for 3,501 m + 3,520 m, and 97.4% identity to the same mRNA in 3,540 m + 3,569 m). Both organisms fix nitrogen and have been found in rhizosphere environments.

For the three unique eukaryotes, two were sequences most closely matching mRNA and one aligned with an 18S rRNA sequence. The organisms were associated with two phyla,

Arthropoda and Chlorophyta. Monoraphidium neglectum (100% identity to a gene of unknown function in 3,501 m + 3,520 m and 3,540 m + 3,569 m) and an uncultured eukaryote (98% identity to 18S rRNA in 3,501 m + 3,520 m and 3,540 m + 3,569 m) were shared among all samples, in addition to Linepithema humile (100% identity to mRNA for an uncharacterized protein), a South American arthropod species.

Overlap between 3,501 m + 3,520 m (basal ice) and 3,585 m (type 2 accretion ice): A total of 70 unique organisms were shared between the 3,501 m + 3,520 m sample and the 3,585 m sample (Fig. 10). Of these, 61 were Bacteria from four phyla (Actinobacteria, Bacteroidetes, Cyanobacteria, and Proteobacteria [Betaproteobacteria and Gammaproteobacteria]), eight were Eukaryota from three phyla (Arthropoda, Chlorophyta, and Streptophyta), and one was an uncultured unidentified organism. This overlap constituted approximately 13.5% of the total unique organisms that made up the 3,501 m + 3,520 m sample and 10.1% of the organisms in the 3,585 m sample.

Figure 10. Comparison of the number of organisms from a variety of phyla found in the 3,501 m + 3,520 m (basal ice) sample and the 3,585 m (type 2 accretion ice) sample and the overlap between them.

As with the previous sample, 16 organisms that were shared among all samples have

been factored out for the following section. This left a total of 45 unique Bacteria, consisting of

15 Actinobacteria, one Bacteroidetes, 24 Betaproteobacteria, and five Gammaproteobacteria. For

the Actinobacteria, sequences for 14 of the organisms aligned with mRNA sequences and the

other was a sequence that aligned with an uncultured Actinobacteria. Most notable of these

included Kineococcus radiotolerans SRS30216 = ATCC BAA-149 (97% identity to ATP-dependent DNA helicase mRNA in 3,501 m + 3,520 m, and 91% identity to GTP-binding

protein LepA mRNA in 3,585 m), Cellulomonas fimi ATCC 484 (92% identity to saccharopine dehydrogenase mRNA in 3,501 m + 3,520 m, and 94% identity to GTP-binding protein TypA mRNA in 3,585 m), a soil dwelling bacterium, and Rhodococcus jostii RHA1 (97% identity to inositol 2-dehydrogenase mRNA in 3,501 m + 3,520 m, and 88% identity to acyl-CoA synthetase mRNA in 3,585 m). Rhodococcus species are common in polar soils, and have been frequently found in ice core samples (Castello and Rogers 2005; D'Elia et al. 2008, 2009; Shtarkman et al. 2013; Rogers et al. 2013). The only Bacteroidetes shared was Hymenobacter sp. PAMC 26554 (96% identity to 23S rRNA in 3,501 m + 3,520 m, and 99% identity to sorbosone dehydrogenase mRNA in 3,585 m), isolated from an Antarctic . Next, 24 Betaproteobacteria were shared between samples, with 23 (all except 1 above 97% identity) having sequences aligned to ribosomal sequences. Notable Betaproteobacteria included Candidatus Methylopumilus planktonicus (100% identity to 23S rRNA in 3,501 m + 3,520 m, and 97% identity to 16S rRNA in 3,585 m), a marine ; Aquabacterium olei (100% identity to 23S rRNA in 3,501 m + 3,520 m and 3,585 m), a soil inhabiting microbe; and Azoarcus sp. DN11 (100% identity to 23S rRNA in 3,501 m + 3,520 m and 3,585 m), a nitrogen-fixing aquatic bacterium. Finally, the five Gammaproteobacteria, three of which were identified to the species level, included Pseudoxanthomonas suwonensis (100% identity to 23S rRNA in 3,501 m +

3,520 m, and 100% identity to thymidylate synthase mRNA in 3,585 m), a mesophile originally

isolated from cotton waste compost; and Xanthomonas oryzae pv. oryzae (100% identity to 23S

rRNA in 3,501 m + 3,520 m, and 93% identity to GTP-binding protein mRNA in 3,585 m), which

is sometimes found living freely in soil.

Five unique Eukaryotes were shared among the samples as well. The first was Gryllus bimaculatus (93% identity to mRNA for an unknown protein in 3,501 m + 3,520 m, and 97% identity with a gene of unknown function in 3,585 m), commonly known as the field cricket. The

remaining four organisms were within the Streptophyta, all with an identity greater than 97%.

Sequences closest to Vicia faba (99% identity to 28S rRNA in 3,501 m + 3,520 m, and 100%

identity with an ATP synthase CF1 epsilon subunit in 3,585 m) were found. The remaining

Streptophyta did not have sequences that aligned with genes with any documented function on

GenBank for those organisms.

Overlap between V5 (3,563 m + 3,585 m; type 1/2 accretion ice) and 3,501 m + 3,520 m (basal ice): Between V5 and the 3,501 m + 3,520 m sample, there were 30 unique sequences (25 bacterial and 5 eukaryotic) in common, spanning ten phyla (Fig. 11). Three were

Actinobacteria (all at least 97% identity): oris (95% identity in V5 and 97% identity

to inositol 2-dehydrogenase mRNA in 3,501 m + 3,520 m), Mycobacterium kansasii (92%

identity in V5 and 100% identity to acetyl-CoA acetyltransferase mRNA in 3,501 m + 3,520 m),

and an uncultured Actinobacterium sp. (96% to 98% identity in V5 and 97% identity to 16S

rRNA in the basal ice sample). Another seven were from the Bacteroidetes, and all were similar sequences. The unique species all aligned with ribosomal RNA sequences and four had percent identities of at least 97%. These included Flavobacterium johnsoniae (100% identity in V5 and 100% identity to 23S rRNA in the basal ice data), a soil bacterium; Prevotella denticola (99% identity in V5 and 100% identity to 23S rRNA in the basal ice), an anaerobic mesophile; and Pedobacter steynii (99% identity in V5 and 99% identity to 23S rRNA in the basal ice), a psychrotolerant soil dwelling organism. A single uncultured cyanobacterium was found in all samples except V6 (3,606 m + 3,621 m; type 1/2 accretion ice from the main lake basin). Five unique species from the phylum Firmicutes were found to be shared between the previously analyzed V5 sample (Shtarkman et al. 2013) and the basal ice sample. Of these, three have been reported from soil and animal intestines and the other two are uncultured and Staphylococcus. Eight were Proteobacteria, consisting of five Betaproteobacteria and three Gammaproteobacteria. All Betaproteobacteria were uncultured organisms, except Delftia acidovorans (98% identity in V5 and 100% identity to 23S rRNA in basal ice data), a soil, sediment, and freshwater dwelling organism. The Gammaproteobacteria consisted of Haemophilus haemolyticus (99% identity in V5 and 100% identity to a gene of unknown function in basal ice sample data); Pseudomonas xanthomarina (99% identity in V5 and 95.4% identity to 16S rRNA in the 3,501 m + 3,520 m sample data), an organism associated with tunicates; and an uncultured bacterium.

Figure 11. Organisms unique to 3,501 m + 3,520 m (blue circle with 513 unique organisms; basal ice) and the V5 sample (yellow circle with 1,224 unique organisms; type 1/2 accretion ice), as well as the overlaps between them.

For the eukaryotes, sequences shared between V5 and 3,501 m + 3,520 m fell into three

phyla (Arthropoda, Ascomycota, and Streptophyta) (Fig. 11). The first was similar to that from

Gryllus bimaculatus (88% identity in V5 and 93% identity to a gene of unknown function in basal ice data). Second, Saccharomyces cerevisiae (100% identity in V5 and 100% identity to

26S rRNA in 3,501 m + 3,520 m data), a yeast ubiquitous in nature, was found. Next, two

Streptophyta were shared: Medicago truncatula (93% identity in V5 and 100% identity to 60S rRNA in 3,501 m + 3,520 m data), a Mediterranean legume; and Vicia faba (96% identity in V5 and 100% identity to 26S rRNA in 3,501 m + 3,520 m data), a common agricultural plant with many wild relatives. It is unlikely that these sequences are the result of horizontal gene transfer or another anomaly, because the sequences were 114 nucleotides in length and aligned with ribosomal sequences.

Finally, an uncultured eukaryote of an unknown phylum was found.

Overlap between V6 (3,606 m + 3,621 m; type 1/2 accretion ice from the main lake

basin) and 3,501 m + 3,520 m (basal ice): Between V6 and the 3,501 m + 3,520 m sample, there

were a total of eight unique organisms in common to both (Fig. 12), of which four were

uncultured organisms. The uncultured organisms consisted of one Gammaproteobacteria, one

Betaproteobacteria and one Actinobacteria, all with a >97% identity and aligning most closely to

16S ribosomal sequences. The remaining four organisms, all of which aligned with ribosomal

RNA genes, had sequence identities of at least 97% with those in the NCBI database. These included Dyadobacter fermentans (98.4% identity to 23S rRNA), a glucose fermenting aerobic organism; Burkholderia pseudomallei (92% identity in V6 and 100% identity to a gene of unknown function in the basal ice sample), a soil dwelling bacterium; and Delftia acidovorans (91% identity to 16S rRNA in V6 and 100% identity to 23S rRNA in 3,501 m + 3,520 m data), a member of Betaproteobacteria isolated from soil, sediment, activated sludge, crude oil, and water.

Figure 12. Organisms unique to and shared between 3,501 m + 3,520 m (blue circle with 513

unique organisms; basal ice) and V6 (3,606 m + 3,621 m; type 1/2 accretion ice from the main

lake basin; purple circle with 93 unique organisms).

2.4.3.3 Unshared Organism Summary

To further understand the source of the DNA and RNA as well as the potential present in the basal ice the unique organisms were investigated. A total of 410 out of

513 unique organisms from the 3,501 m + 3,520 m sample (basal ice) were unique among all of

the samples analyzed. These organisms included one Archaea from the phylum Euryarchaeota,

316 Bacteria from 16 different phyla, the most numerous of which was Proteobacteria followed by Bacteroidetes, and 93 Eukaryotes from 8 phyla, with Euglenozoa being the most numerous,

followed by Ascomycota.

Archaea: A sequence similar to that of Halorubrum trapanicum (100% identity to gene of unknown function) was found to be unique to the 3,501 m + 3,520 m (basal ice) sample. It was previously described as an extremely halophilic organism that was originally isolated from salt produced by the Trapani salt flats in Italy (Grant et al., 1998).

Bacteria: Of the 316 unique bacterial species identified, a total of 239 had a percent identity of at least 97%. It should be noted that many of the organisms were extremophiles or had important metabolic capabilities, which will be discussed later. Among all unique species, 145 were classified as Proteobacteria. Specifically, 11 were Alphaproteobacteria, including taiwanensis (100% identity to molybdopterin biosynthesis protein mRNA); and four species of Methylobacterium, all with 100% identity to mRNA sequences for oxidoreductase. Seventy-four organisms identified were Betaproteobacteria, 57 of which had an identity of 97% or greater, including Burkholderiales GJ-E1013 (100% identity to 23S ribosomal RNA); Ferriphaselus amnicola (100% identity to 16S ribosomal RNA); and Ralstonia pickettii (100% identity to 16S ribosomal RNA). Sequences similar to thirteen Deltaproteobacteria were found, of which 11 had an identity of at least 97%, including Anaeromyxobacter sp. Fw109-5 (97% identity to ATPase AAA-2 domain protein mRNA). Next, sequences were most similar to five Epsilonproteobacteria, all of which had an identity of 100% (four aligning with 23S rRNA) to organisms from the genus Helicobacter. Thirty-five organisms were from the class Gammaproteobacteria, with 25 having an identity of at least 97%, such as Aeromonas hydrophila (16S rRNA), Serratia marcescens (23S rRNA), and Raoultella planticola (ATP-binding protein mRNA). From the class Hydrogenophilalia, only Hydrogenophilus thermoluteolus (100% identity to 23S rRNA), an aquatic thermophile, was found. Finally, six Oligoflexia, namely Silvanigrella aquatica (97.5% identity to 16S rRNA) and five strains of Spirobacillus, were present.

There were 101 unique organisms from Bacteroidetes identified, with 81 having a percent

identity of at least 97%. Some of the most notable included Fluviicola taffensis (100% identity to

23S rRNA), a freshwater organism; Flavobacterium kingsejongi (100% identity to 23S rRNA),

which has been isolated from Antarctic penguin feces; and Sediminicola sp. YIK13 (100%

identity to 23S rRNA), which is a marine bacterium. It should be noted that many organisms from Bacteroidetes were extremophilic or had important metabolic pathways.

There were 29 organisms from the Firmicutes that were not found in any other sample.

Of those, 14 had an identity of 97% or greater. The most notable organisms were those

commonly found in the intestines of animals, such as Blautia coccoides (100% identity to 23S

rRNA) and Lactobacillus plantarum (100% identity to 23S rRNA). Others included Paenibacillus polymyxa ATCC 842 (100% identity to 23S rRNA), which is a free-living soil dwelling bacterium, and Clostridium pasteurianum (98% identity to 16S rRNA), which is a soil

borne organism able to oxidize hydrogen to produce protons.

Sequences most similar to 23 unique organisms from the Actinobacteria were found. Of

the sequences recovered, 13 resulted in an identity of at least 97%. Some of the most notable

included Acidipropionibacterium virtanenii (100% identity to 23S rRNA); Euzebya sp. DY32-46

(100% identity to 23S rRNA), which has been isolated from sea water and is involved in several nutrient cycles; and Actinoplanes derwentensis (97% identity to mRNA), often recovered from river sediments.

Finally, there were a few sequences that were allied with phyla that were minor components of the total, and only four of these could be classified to the species level. The first phylum is Candidatus Saccharibacteria. Three sequences aligned most closely with organisms within this candidate phylum, but none could be classified to the species level. One sequence each in Elusimicrobia, Gemmatimonadetes, and Nitrospinae was found, but none could be classified to species. For Planctomycetes, a sequence closest to Paludisphaera borealis (94% identity to a gene of unknown function) was found. This species is a chemoorganotrophic mesophile, originally isolated from a . Finally, sequences closest to three Spirochaetes were found, all of which were subspecies of Treponema pallidum. All were identified at 100% identity to 23S ribosomal RNA.

Eukaryotes: Sequences from a total of 93 unique eukaryotes were found to be exclusive to the 3,501 m + 3,520 m sample (basal ice), of which 75 had an identity of 97% or greater. The eukaryotes were from 8 phyla, which were Apicomplexa, Arthropoda, Ascomycota, Bacillariophyta, Chlorophyta, Euglenozoa, Haptophyta, and Streptophyta. Each phylum had at least one species with an identity of 97% or greater.

The most numerous phylum represented was Euglenozoa, with a total of 21 organisms indicated, all of which had an identity of at least 97%. Some of the most notable examples include Dimastigella trypaniformis (100% identity to 60S rRNA), which is a terrestrial kinetoplastid isolated from sandy soil; Parabodo caudatus (100% identity to the 24S subunit RNA), which is a free living aquatic kinetoplastid flagellate; and Trypanosoma triglae (100% identity to 24S alpha ribosomal RNA), which is associated with marine teleosts.

Next, 20 Ascomycota were indicated, with 14 having an identity of at least 97%. Fifteen of the 20 organisms were closest to strains of Saccharomyces cerevisiae, which is a common environmental organism, and is overrepresented in the NCBI database. Additionally,

Saccharomyces pastorianus (100% identity to 60S rRNA) was indicated. The remaining organisms were either classified to the species level or had a low identity.

A total of 19 organisms were found to belong to the phylum Streptophyta, all of which had sequence identities of at least 97%. Additionally, all but five were identified based upon ribosomal sequences. Of the 19, four organisms were closest to agricultural species, such as

Raphanus sativus (16S rRNA), Brassica napus (26S rRNA), Helianthus annuus (26S rRNA), and Gossypium herbaceum (gene of unknown function). The remaining organisms were a variety of wild species ranging from Styphnolobium japonicum (Japanese pagoda tree) to Silene vulgaris

(an American wildflower).

The next most common phylum found was Bacillariophyta, which is comprised of diatoms that are primarily photosynthetic. A total of 17 unique bacillariophytes were indicated, of which 10 had sequence identities of at least 97%. Some examples include Chaetoceros decipiens (100% identity to 28S rRNA), an organism considered to be important in many aquatic food chains; Skeletonema potamos (100% identity to a gene of unknown function), an organism common in warm eutrophic waters; and an unidentified organism belonging to the genus Nitzschia, which is associated with cold waters, specifically those in polar regions.

Nine organisms belonging to Chlorophyta were found, with five having a sequence identity of at least 97%. Three were species of Chlamydomonas, each with at least 97% identity to 23S ribosomal RNA, or a gene of unknown function. Two were extremophiles and the third,

Chlamydomonas noctigama (16S rRNA), is a phototrophic freshwater and terrestrial species.

Outside of the Chlamydomonas, Dunaliella salina, a halotolerant alga, is the only other organism that could be classified to the species level, with an identity of at least 97% to 23S ribosomal

RNA.

Finally, only a few sequences were found from Apicomplexa, Haptophyta, and

Arthropoda. Four organisms, belonging to Apicomplexa, were from the genus Plasmodium, some species of which cause diseases in animals. None had sequences that aligned with a gene of known function. The Haptophyta that was indicated by a sequence was Emiliania huxleyi

CCMP1516 (100% identity to gene of unknown function), which is a coccolithophore, ubiquitous throughout ecosystems. One member of the Arthropoda was indicated. It was

Euryischia sp. RDB-1999 (94% identity to 28S rRNA), which belongs to a genus of parasitic wasps.
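Throughout this summary, hits are tallied against a 97% identity cutoff. A minimal sketch of that tally is given below; the helper name and the tuple layout are assumptions made for illustration, and the five entries are simply organisms and percent identities quoted in this section rather than the full set of results.

```python
# Cutoff used in the text when reporting "an identity of at least 97%".
THRESHOLD = 97.0

# (organism, percent identity) pairs quoted in this section; illustrative subset only.
hits = [
    ("Halorubrum trapanicum", 100.0),
    ("Silvanigrella aquatica", 97.5),
    ("Actinoplanes derwentensis", 97.0),
    ("Paludisphaera borealis", 94.0),
    ("Euryischia sp. RDB-1999", 94.0),
]

# Hypothetical helper: count hits whose identity meets or exceeds the cutoff.
def count_at_or_above(hits, threshold=THRESHOLD):
    return sum(1 for _, identity in hits if identity >= threshold)

high_identity = count_at_or_above(hits)
print(high_identity, "of", len(hits), "hits at or above", THRESHOLD, "% identity")
```

The comparison is inclusive, so a hit at exactly 97% identity is counted, matching the "at least 97%" wording.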

2.4.4 Shallow Embayment Type 1 Accretion Ice (3,540 m + 3,569 m)

2.4.4.1 Results Summary

To understand the metagenomic and metatranscriptomic data available, the sequencing data was analyzed and broken down by domain. For the sample from 3,540 m + 3,569 m (shallow embayment type 1 accretion ice) a total of 50,689,858 bp was recovered with a total of 1,236,338 reads and an average read length of 41 bp. Once redundant and <50 nt sequences were removed, there were 38,635 unique reads with an average length of 65 bp. A total of 133 unique organisms were indicated from sequence similarity results (131 to species level), of which 2 were aligned with rRNA genes (Table 3). In total, 76.7% of unique organisms were bacteria, 18.8% were eukaryotes, 1.5% were archaea, 2.25% most closely matched viruses, and the remainder were uncultured unidentified organisms.
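The summary figures for this sample can be recomputed from the totals reported in the text and the domain-level counts in Table 3. The sketch below is only a consistency check written for illustration; the variable names are hypothetical.

```python
# Read statistics reported for the 3,540 m + 3,569 m sample.
total_bp = 50_689_858
total_reads = 1_236_338
mean_read_length = total_bp / total_reads  # average read length in bp

# Unique organisms per domain, taken from the domain rows of Table 3.
counts = {"Archaea": 2, "Bacteria": 102, "Eukaryota": 25, "Virus": 3, "Unknown": 1}
total = sum(counts.values())

# Percentage of all unique organisms contributed by each domain.
shares = {domain: 100 * n / total for domain, n in counts.items()}

print("mean read length:", mean_read_length)
for domain, share in shares.items():
    print(domain, round(share, 2))
```

With these inputs the viral share works out to about 2.26% when rounded to two decimals, which the text gives as 2.25%.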

Table 3. Summary of results from the 3,540 m and 3,569 m sample with organisms classified to the phylum level, unique rRNA gene sequences, their ecology, their physiology, and any important notes.

Taxon | Unique Organisms | Unique rRNA Sequences | Ecology | Physiology | Species Characteristics | Notes
Archaea | 2 | 0 | | | |
Euryarchaeota | 2 | 0 | marine | halophile, radiation resistant, heavy metal resistant | heterotrophic, autotrophic | can use light to power ATP synthesizing proton pumps. Has effective DNA repair mechanisms
Bacteria | 102 | 1 | | | |
Actinobacteria | 41 | 0 | animal intestines, plant surfaces, marine, freshwater, soil, sediment | thermotolerant, psychrotolerant, halophile, halotolerant, mesophile | heterotroph, chemoorganotroph, , saprotroph, synthetic organic pollutant degrading, hydrocarbon metabolizing, nitrate oxidizing, antibiotic producing | photochromic species
Cyanobacteria | 2 | 0 | freshwater, ice | alkaliphilic | autotroph, photosynthetic |
Gemmatimonadetes | 1 | 0 | soil | no information | | phylum not very well characterized
Acidithiobacillia | 3 | 0 | mineral ores, soil | psychrotolerant, acidophile, mesophile | chemolithoautotroph, iron oxidation, sulfur oxidizing | can fix carbon from
Alphaproteobacteria | 4 | 1 | animal cell cultures, animal intestines, marine, freshwater, waste water, sediment | mesophile | photoautotrophic, photoheterotrophic, chemoautotrophic, chemoheterotrophic, NOx producing, nitrogen fixing, carbon fixing |
Betaproteobacteria | 34 | 0 | animal respiratory tract, animal intestines, soil, sediment | mesophile | chemoorganotroph, chemolithoautotrophic, heterotroph, dimethyl-disulfide producing, hydrocarbon degrading, nitrogen fixing, denitrifying, sulfur oxidation, iron oxidizing, uranium oxidizing |
Deltaproteobacteria | 1 | 0 | animal intestines, soil, tree bark | mesophile | saprotroph, fungicide producing, antibiotic producing |
Gammaproteobacteria | 16 | 0 | animal pathogen, soil, freshwater, feces, animal intestines, rhizosphere, sediment | thermophile, halophile, acidophile, mesophile | chemoheterotroph, chemolithoautotrophic, thiosulfate oxidizing, sulfur oxidizing, tetrathionate oxidizing |
Eukaryota | 25 | 0 | | | |
Arthropoda | 1 | 0 | no information | mesophile | heterotroph | Argentinian ant species
Ascomycota | 1 | 0 | animal pathogen, soil | mesophile | heterotroph |
Chlorophyta | 1 | 0 | freshwater | mesophile | photosynthetic, carbon fixing, triglyceride accumulating |
Chordata | 16 | 0 | no information | mesophile | heterotroph | possible contamination: cow, monkeys, goat, horse, mole rat, mouse, and crocodile
Platyhelminthes | 1 | 0 | mammal parasite | mesophile | obligate parasite |
Streptophyta | 4 | 0 | pollen in glacier ice | mesophile | photosynthetic, carbon fixing | pineapple, peanut, tobacco, date palm
Euglenozoa | 1 | 0 | tsetse fly gut | mesophile | obligate parasite | natural in Crocodilus niloticus
Virus | 3 | 0 | | | |
 | 1 | 0 | bacteriophage | no information | no information | properties of bacteriophage and plasmid
Phagemid | 2 | 0 | phagemid | no information | no information | Properties of bacteriophage and plasmid
Unknown | 1 | 1 | | | |
Uncultured eukaryote | 1 | 1 | no information | no information | no information | isolated from: , Adventfjorden (Norway)

2.4.4.2 Organism Overlap

Overlap between 3,540 m + 3,569 m (type 1 accretion ice) and 3,585 m (type 2 accretion ice), both from the shallow embayment: As with the previous overlap analyses, the organisms overlapping between samples were analyzed to provide evidence of the source of biodiversity in the lake. A total of 58 unique organisms were shared between the 3,540 m + 3,569 m and 3,585 m samples. Forty-two were from sequences most similar to bacteria from the phyla Actinobacteria, Cyanobacteria, Gemmatimonadetes, and Proteobacteria (Alphaproteobacteria and Betaproteobacteria). Additionally, 16 organisms were of eukaryotic origin from the phyla Ascomycota, Chlorophyta, Chordata, Platyhelminthes, and Streptophyta (Fig. 13). These organisms constituted about 43.6% of all organisms in the 3,540 m + 3,569 m sample and 8.5% of the 3,585 m sample. After removal of the aforementioned organisms shared among all samples, 26 bacteria, from four phyla, and 14 eukaryotes, also from four phyla, remained. This corresponded to 26.1% of organisms from 3,540 m + 3,569 m and 5.8% of the 3,585 m sample.

Figure 13. Organisms unique to the 3,540 m + 3,569 m sample (green circle; type 1 accretion ice) and unique to the 3,585 m sample (red circle; type 2 accretion ice). The intersection indicates the organisms shared between the samples. The circles’ areas are proportional to the number of organisms present.

The majority of the bacteria were Actinobacteria which constituted 21 of the 26 unique

bacteria. This is a deviation from the trend of the other samples where the majority of shared

bacteria were Betaproteobacteria. Twenty of the Actinobacteria were based upon mRNA

alignments all with a percent identity above 81%, and 11 were greater than 97% identity. Most

notable Actinobacteria included Gordonia sp. 1D (100% identity to amidohydrolase mRNA in

3,540 m + 3,569 m, and 100% identity to glutamate-5-semialdehyde dehydrogenase mRNA in

3,585 m), a halotolerant, thermotolerant, hydrocarbon metabolizing bacterium isolated from

Antarctic soil and water. Aeromicrobium sp. A1-2 (93% identity to a hypothetical protein mRNA

in 3,540 m + 3,569 m, and 97% identity to AMP-dependent synthetase mRNA in 3,585 m) is a

psychrotolerant and thermotolerant bacterium isolated from marine sediment in Ardley Cove in

West Antarctica. A final notable Actinobacteria was Rhodococcus opacus (97% identity to

peptide synthetase mRNA in 3,540 m + 3,569 m, and 100% identity to bifunctional (p)ppGpp

synthase/hydrolase relA mRNA in 3,585 m), a halotolerant, soil dwelling, lithotroph.

A single Gemmatimonadetes, Gemmatirosa kalamazoonesis (100% identity to a LVIVD

repeat-containing protein mRNA in 3,540 m + 3,569 m and 3,585 m) was shared between both

samples. This organism is an aerobic chemoheterotroph previously isolated from agricultural

soil. In addition, three Alphaproteobacteria were shared between samples. Most notable among

these was Azospirillum brasilense Sp245 (100% identity to a hypothetical protein mRNA in

3,540 m + 3,569 m, and 100% identity to acetyl/propionyl-CoA carboxylase subunit alpha

mRNA in 3,585 m), a rhizosphere dwelling bacterium able to fix nitrogen. Finally, Burkholderia

metallica (100% identity to GTP-binding protein TypA mRNA in 3,540 m + 3,569 m, and 98%

identity to D-aminoacylase mRNA in 3,585 m) was the only member of the Betaproteobacteria, aside from those found in all samples. It is part of the Burkholderia cepacia complex, a group of

20 Burkholderia species that share similar attributes, known for living in both soil and fresh water.

The 14 eukaryotes consisted of one Ascomycota, 10 Chordata, one Platyhelminthes, and

two Streptophyta. The Ascomycota was Metarhizium brunneum ARSEF 3297 (100% identity to

hexose transporter-like protein mRNA in 3,540 m + 3,569 m and 3,585 m), a soil-borne

mesophile. Of the 10 chordates, five were monkeys, of which four had a percent identity of 98% or greater, and all were above 95%. However, the sequences for three of the monkeys aligned with genes of unknown function in GenBank, and many primate sequences were found in the negative

control sample. It is likely the sequences similar to monkeys are the result of misclassified

human sequences due to the close evolutionary relationship between humans and monkeys as

well as the relatively short read lengths achievable with Ion Torrent. The only Platyhelminthes

found was a sequence that aligned to a gene of unknown function most similar to Spirometra

erinaceieuropaei (100% identity), an obligate parasite of animals, commonly known as a tapeworm. Finally, the two Streptophyta were from sequences most similar to mRNA sequences in

Ananas comosus (94% identity to zinc transporter 10-like isoform X2 in 3,540 m + 3,569 m and

3,585 m), and Phoenix dactylifera (100% identity to PRA1 family protein F3-like in 3,540 m +

3,569 m and 3,585 m). The presence of these sequences is likely the result of pollen being

trapped in the glacier that was originally released by ancestors of modern species.

Overlap between 3,540 m + 3,569 m (type 1 accretion ice) and V5 (3,563 m + 3,585 m; type 1/2 accretion ice), both from the shallow embayment: The V5 and 3,540 m + 3,569 m

samples shared three unique organisms (Fig. 14). The first organism was an uncultured

cyanobacterium that was found in all samples, with the exception of V6 (3,606 m + 3,621 m; type 1/2 ice from the main lake basin). The second was Mycobacterium kansasii (92% identity in V5 and 100% identity to acetyl-CoA acetyltransferase in the 3,540 m + 3,569 m data), which was isolated from fresh water and soil. The final shared organism was another uncultured

eukaryote, of unknown taxonomic affiliation.

Figure 14. Organisms unique to the 3,540 m + 3,569 m sample (green circle; type 1 accretion ice) and unique to the V5 sample (yellow circle; type 1/2 accretion ice), both from the shallow

embayment.

Overlap between 3,540 m + 3,569 m (type 1 accretion ice from the shallow embayment)

and V6 (3,606 m + 3,621 m; type 1/2 accretion ice from the main lake basin): Between V6 and

the 3,540 m + 3,569 m sample, two unique organisms were shared (Fig. 15). The first was Burkholderia pseudomallei (92% identity to 16S rRNA in V6, and 100% identity to GTPase TypA mRNA in the 3,540 m + 3,569 m data). The second, Escherichia coli (99% identity to 16S rRNA in V6), aligned with 100% identity to a phage minor tail protein in the 3,540 m + 3,569 m sample.

Figure 15. Organisms unique to the 3,540 m + 3,569 m sample (green circle; type 1 ice from

shallow embayment) and unique to the V6 sample (purple circle; type 1/2 ice from main lake

basin), as well as the overlap between them.
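The pairwise comparisons summarized in the overlap figures reduce to set operations on organism names. The following is a minimal sketch, assuming each sample is available as a collection of organism names; the function name `venn_counts` and the abbreviated example lists are hypothetical and are not the thesis data or the author's script.

```python
def venn_counts(sample_a, sample_b):
    """Return (only_a, shared, only_b) counts for two collections of organism names."""
    a, b = set(sample_a), set(sample_b)
    return len(a - b), len(a & b), len(b - a)


# Hypothetical, abbreviated organism lists for illustration only.
type1_ice = {"Burkholderia pseudomallei", "Escherichia coli", "Gordonia sp. 1D"}
v6 = {"Burkholderia pseudomallei", "Escherichia coli", "Pseudomonas aeruginosa"}

only_type1, shared, only_v6 = venn_counts(type1_ice, v6)
print(only_type1, shared, only_v6)
```

The three counts correspond to the left circle, the intersection, and the right circle of a two-sample diagram.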

2.4.4.3 Unshared Organism Summary

To uncover more evidence needed to determine the possible influx of organisms into

Lake Vostok from the glacial or basal ice, the organisms unique to the 3,540 m + 3,569 m sample were analyzed. In total, there were 65 unique organisms that were not shared with any other samples, with 41 having a percent identity greater than 97%. Of the 65 organisms, two

Archaea from the phylum Euryarchaeota were unshared. For bacteria, 51 belonging to the phyla

Actinobacteria and Proteobacteria (Acidithiobacillia, Alphaproteobacteria, Betaproteobacteria,

Deltaproteobacteria, and Gammaproteobacteria) were identified. Nine of the unshared organisms were eukaryotes from the phyla Chordata, Streptophyta, and Euglenozoa. Finally, the taxonomic affiliation of the remaining three was undetermined.

Archaea: Two potential Euryarchaeota, Halobacterium salinarum R1 (100% identity to formyltetrahydrofolate deformylase mRNA) and Halobacterium salinarum NRC-1 (100% identity to the same mRNA), were found only in the 3,540 m + 3,569 m ice core sample. Both organisms are known to tolerate high levels of radiation and possess complex DNA repair mechanisms.

Bacteria: A total of 51 bacteria were unique to the 3,540 m + 3,569 m sample. Of the 51, a total of 31 organisms had an identity of 97% or greater. Additionally, all the bacteria were from three phyla: Actinobacteria, Acidithiobacillia, and Proteobacteria (Alphaproteobacteria, Betaproteobacteria, Deltaproteobacteria, and Gammaproteobacteria).

Among all phyla, Proteobacteria comprised 37 of the 51 unique organisms.

Rhodopseudomonas palustris BisA53 (100% identity to 40S rRNA) was the only

Alphaproteobacteria that was recovered. This organism is ubiquitously found in soil, marine

sediments, and freshwater sediments. A total of 21 unique Betaproteobacteria were recovered of which 15 had an identity of at least 97%. Fourteen of the 21 organisms were from the genus

Burkholderia, including B. gladioli pv. gladioli (100% identity to translational GTPase TypA

mRNA) which is found in soil, water, and in many animals; and Burkholderia thailandensis

(100% identity to translational GTPase TypA mRNA) which is another soil dwelling organism.

Beyond Burkholderia, another organism identified was Bordetella petrii (100% identity to ABC transporter ATP-binding protein mRNA), which was originally recovered from river sediment. A

single soil dwelling Deltaproteobacteria, Sorangium cellulosum So ce56 (97% identity to protein

kinase mRNA), was unique to the 3,540 m + 3,569 m sample. Finally, 14 Gammaproteobacteria

were not shared with other samples, with 11 having an identity of 97% or greater. The

majority of these were strains of Escherichia coli. Beyond E. coli, several extremophiles, such as

Shigella sonnei (100% identity to phage minor tail protein L mRNA) and Sulfurifustis variabilis

(100% identity to Ankyrin mRNA), were recovered.

A total of 11 Actinobacteria were unique to the 3,540 m + 3,569 m sample, with two having an identity of at least 97%. These organisms were Kitasatospora setae KM-6054 (100% identity

to mRNA that produces a hypothetical protein) and Auraticoccus monumenti (100% identity to

CAAX protease self-immunity mRNA), both of which are environmental bacteria found in soil.

The identity of the other organisms did not drop below 84%, with all aligning with mRNA

sequences.

Finally, Acidithiobacillus ferrivorans SS3 (100% identity to phosphate ABC transporter mRNA), and two strains of Acidithiobacillus ferrooxidans (both 100% identity to phosphate

ABC transporter mRNA), were unique to this sample. All organisms isolated from the order

Acidithiobacillia have been shown to be chemolithoautotrophs that gain the majority of their

energy from oxidizing iron and sulfur compounds. Additionally, A. ferrivorans SS3 has been

demonstrated to fix carbon from carbon dioxide.

Eukaryotes: Sequences most similar to nine eukaryotes were unique to this sample. Of the nine organisms, only three had an identity of at least 97%. Additionally, the nine organisms fell into the phyla Chordata, Streptophyta, and Euglenozoa, each having one organism at 97% identity or greater.

Sequences from six chordates were found only in the 3,540 + 3,569 m sample.

Additionally, none of the sequences found aligned with genes of known function. The only

chordate to have an identity of at least 97% was Equus caballus. However, often these animals

are rendered and parts end up as contaminants in PCR reagents, so this may be the source of

these sequences. More data would be needed to determine the accuracy of these results.

Two Streptophyta were absent from any other samples. As with the chordates, the sequences that aligned with both Streptophyta sequences matched genes of unknown function.

The first unshared organism was Arachis hypogaea (100% identity to a gene of unknown function), and the second was Nicotiana tabacum (91% identity to a gene of unknown function). The most likely source of these sequences is pollen or airborne plant fragments.

Three bacteriophages were also identified and found to be absent from any other samples. The only one to align with an identified gene was Enterobacteria phage O276 (100% identity to mRNA for a minor tail protein sequence), which infects strains of Escherichia coli. The other two sequences were similar to those of phagemid vector pScaf-7560.2 and phagemid vector pScaf-8064.5, both with 100% identity to an unknown gene.

2.4.5 Shallow Embayment Type 2 Accretion Ice (3,585 m)

2.4.5.1 Results Summary

To understand the total amount of data retrieved and how it related to the other samples, summary statistics were compiled. The 3,585 m sample (shallow embayment type 2 accretion ice) had 1,656,787 reads containing a total of 71,241,841 bp, with an average length of 43 bp. Of those, 64,626 unique reads with a mean length of 80 bp remained once redundant and <50 nt sequences were removed. The BLAST results indicated 681 unique organisms (637 to species level), of which 153 aligned with rRNA genes (Table 4). Overall, 81.2% of the unique organisms were bacteria, 16.1% were eukaryotes, 0.3% were viruses, and the remainder were uncultured and unidentified organisms.
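The read filtering described above can be sketched in a few lines of Python. This is an illustrative sketch only: it assumes "redundant" means exact duplicate sequences, the helper names `unique_long_reads` and `mean_length` are hypothetical, and the toy reads are invented for demonstration. Only the read and base-pair totals are taken from the text.

```python
def unique_long_reads(reads, min_len=50):
    """Drop reads shorter than min_len, then drop exact duplicates (first copy kept)."""
    seen = set()
    kept = []
    for read in reads:
        if len(read) < min_len or read in seen:
            continue
        seen.add(read)
        kept.append(read)
    return kept


def mean_length(reads):
    """Mean read length in nucleotides; 0.0 for an empty list."""
    return sum(len(r) for r in reads) / len(reads) if reads else 0.0


# Totals reported for the 3,585 m sample.
total_reads, total_bp = 1_656_787, 71_241_841
print(total_bp / total_reads)  # average raw read length in bp

# Toy reads (hypothetical): one duplicate and one read under 50 nt.
toy_reads = ["A" * 80, "A" * 80, "C" * 30, "G" * 60]
kept = unique_long_reads(toy_reads)
print(len(kept), mean_length(kept))
```

Removing short reads raises the mean length of what remains, which is consistent with the increase from 43 bp to 80 bp reported for this sample.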

Table 4. Summary of results from the 3,585 m sample with organisms classified to the phylum level, unique rRNA gene sequences, their ecology, their physiology, and any important notes.

Taxon | Unique Organisms | Unique rRNA Sequences | Ecology | Physiology | Species Characteristics | Notes
Bacteria | 266 | 19
Actinobacteria | 237 | 3 | animals, plants, fungi, freshwater, marine, sediment, soil, ice, air | thermophile, thermotolerant, psychrophile, psychrotolerant, mesophile, halophile, halotolerant, desiccation resistant, acidotolerant, alkaliphile, alkalitolerant, radiation resistant*, heavy metal resistant, endosymbiont | heterotroph, chemoorganotroph, lithotroph, saprotroph, methylotroph, alcohol metabolizing, hydrocarbon metabolizing, polysaccharide degrading, nitrogen fixing, nitrogen reducing, nitrate oxidizing, antibiotic producing, iron reducing, phosphorous cycling, triglyceride accumulating, hexavalent chromium reducing | *Kineococcus radiotolerans so radiation resistant it grows in nuclear waste
Aquificae | 1 | 1 | no information | no information | no information | uncultured
Bacteroidetes | 8 | 5 | lichen, freshwater, activated sludge | thermophile, halophile, radiation resistant | heterotroph |
Cyanobacteria | 4 | 2 | freshwater | alkaliphile | autotroph, photosynthetic, triglyceride accumulating, carbon fixing, oxygen producing |
Deinococcus-Thermus | 3 | 0 | soil | mesophile, gamma radiation resistant | heterotroph, nitrate reducing |
Firmicutes | 11 | 7 | animal intestines, freshwater, soil | thermophile, halotolerant, mesophile | heterotroph, saprotroph, nitrogen fixing, biosurfactant producing |
Gemmatimonadetes | 1 | 0 | soil | mesophile | oligotroph, chemoheterotroph |
Planctomycetes | 1 | 1 | no information | no information | no information | uncultured
Alphaproteobacteria | 126 | 1 | soil, sediment, freshwater, marine, hydrothermal vents | thermophile, halotolerant, mesophile, alkalitolerant, acidotolerant, phytopathogen, radiation resistant | heterotroph, oligotroph, phototrophic, photosynthetic, chemoorganotrophic, metabolize ethanol/methanol, nitrogen fixing, nitrate reducing, NOx producing, aromatic monomer degrading, heavy metal resistant, antibiotic resistant, antibiotic producing, crude oil degrading, phosphate mobilizing, iron reducing, hexavalent chromium reducing |
Betaproteobacteria | 90 | 70 | animal, fungi, soil, freshwater, marine, sediment, ice | psychrophile, psychrotolerant, halotolerant, halophile, alkaliphile, mesophile, phytopathogen, desiccation resistant | phototroph, methylotroph, hydrocarbon metabolizing, nitrate reducing, nitrite dissimilating, denitrifying, thiosulfate oxidizing, dimethyl disulfide producing |
Deltaproteobacteria | 5 | 1 | sediment, soil | mesophile | chemoheterotrophic, nitrate reducing, uranium oxide reducing, iron oxide reducing, manganese oxide reducing |
Gammaproteobacteria | 65 | 26 | plant surfaces, insect guts, rhizosphere, soil, aquatic, marine | halophile, halotolerant, alkaliphile, alkalitolerant, mesophile, phytopathogen | saprotroph, thiosulfate-oxidizing, hydrocarbon metabolizing, antibiotic resistant, nitrate reducing, denitrifying |
Eukaryota | 110 | 20
Arthropoda | 2 | 0 | no information | mesophile | heterotroph | honey bee and field cricket
Ascomycota | 20 | 13 | animal, plant surfaces, soil | mesophile, pathogen, phytopathogen | heterotroph, saprotroph, hydrocarbon degrading, hexavalent chromium reducing, statin producing |
Basidiomycota | 2 | 0 | plant surfaces, rhizosphere, soil | mesophile | heterotroph |
Chlorophyta | 1 | 0 | freshwater | no information | photosynthetic, autotroph, carbon fixing, triglyceride accumulating |
Chordata | 28 | 0 | no information | mesophile | heterotroph | monkeys, mouse, snake, treeshrew, domestic yak, birds, antelope, cod, tarsier
Nematoda | 1 | 0 | animals, soil | mesophile, free living parasite | heterotroph |
Platyhelminthes | 1 | 0 | animals | mesophile, obligate parasite | heterotroph | tapeworm
Streptophyta | 46 | 0 | pollen in glacial ice | mesophile | photosynthetic, autotroph, carbon fixing | evergreen, pineapple, cannabis, Indian paintbrush, Arabica and Robusta coffee, tobacco, olive, fava bean, mung bean, adzuki bean
Amoebozoa | 5 | 4 | moss, soil, peat, freshwater | mesophile | heterotroph |
Ochrophyta | 1 | 1 | marine | no information | photosynthetic, autotroph, carbon fixing |
Ciliophora | 2 | 2 | marine, brackish water, sediment | no information | heterotroph |
Heterokontophyta | 1 | 0 | animal parasite, freshwater | mesophile | heterotroph, metabolizing |
Virus | 2 | 0
Rhizobium phage RHEph10 | 1 | 0 | bacteriophage | no information | no information |
Stealth virus | 1 | 0 | no information | no information | no information |
Unknown | 16 | 15
Uncultured Bacteria | 4 | 4 | no information | no information | no information | isolated from: tap water from tap 4, rice straw anaerobic digester, activated wastewater, creosote polluted soil, winter wheat forage-fed steer #169
Uncultured Eukaryote | 6 | 5 | | | | isolated from: granite surface, Lake Tyrrell, marine surface water, activated sludge, 35m, deep anoxic and hypersulfidic water column sample, Svalbard: Adventfjorden, freshwater lake, oligotrophic freshwater lake, euphotic portion of the water column, Monterey Bay oligotrophic Pacific Ocean waters, micro-oxic water column sample
Uncultured Unknown | 5 | 6 | | | | isolated from: Guerrero Negro Hypersaline Mat 03; altitude 310m AMSL, Guerrero Negro Hypersaline Mat 06; altitude 310m AMSL, Guerrero Negro Hypersaline Mat 02; altitude 310m AMSL, Guerrero Negro Hypersaline Mat 04; altitude 310m AMSL, Guerrero Negro Hypersaline Mat 05; altitude 310m AMSL, Guerrero Negro Hypersaline Mat 01; altitude 310m AMSL, gastrointestinal specimens, oxygenic photogranule, thermogenic travertine, anaerobic digester, rice straw anaerobic digester, Lake Lugano water, oil field
Synthetic Construct | 1 | 0 | | | | Homo sapiens; clones donated by Kazusa DNA Research Institute

2.4.5.2 Organism Overlap

Overlap between V5 (3,563 m + 3,585 m; type 1/2 accretion ice from the shallow

embayment) and 3,585 m (type 2 accretion ice from the shallow embayment): To gain insight

into the overlap between samples, the remaining combinations were explored. Between V5 and

the 3,585 m sample a total of 23 unique organisms (19 bacteria and four eukaryotes) spanning ten phyla were in common (Fig. 16). Of the bacteria, two unique organisms were Actinobacteria:

Mycobacterium kansasii and an uncultured Actinobacterium were found in both samples.

Among the Bacteroidetes, only an uncultured bacterium was shared between samples. For the

cyanobacteria, only an uncultured cyanobacterium was shared. The samples shared four Firmicutes: one (93% identity to 16S rRNA in V5, and 100% identity to 16S rRNA in the 3,585 m data), an aerobic mesophile isolated from soil and animal digestive tracts; Flavonifractor plautii (96% identity to 16S rRNA in V5 and 100% identity to GTP-binding protein TypA mRNA in the 3,585 m data), an anaerobe known to be part of the gut microbiome of humans and other animals; an uncultured Carnobacterium, a genus of environmental bacteria often isolated from ice; and an uncultured Veillonella, a genus of bacteria commonly found in animal intestines. Several types of Proteobacteria were shared between samples, consisting of four Alphaproteobacteria, one uncultured Betaproteobacterium, three Gammaproteobacteria, one uncultured Pelobacter (Deltaproteobacteria), and one uncultured bacterium (only able to be classified as Proteobacteria). The Alphaproteobacteria consisted of

three nitrogen fixing soil dwelling organisms, Mesorhizobium loti (98% identity to 23S rRNA in

V5, and 88% identity to imidazole glycerol phosphate synthase subunit HisF mRNA in the 3,585 m data),

Rhizobium gallicum (87% identity to 23S rRNA in V5, and 100% identity to propionyl-CoA

carboxylase subunit alpha 2 mRNA in 3,585 m data), and Sinorhizobium meliloti (90% identity to plasmid pSymA in V5, and 98% identity to phosphoglycerate kinase mRNA in the 3,585 m data), as well as one aquatic organism, Brevundimonas diminuta (94% identity to 16S rRNA in

V5 and 84% identity to phosphoglycerate kinase mRNA in 3,585 m). The Gammaproteobacteria consisted of one uncultured Gammaproteobacteria, one uncultured Pseudomonas sp., and one

Psychrobacter sp. Of the eukaryotic organisms, one Arthropod, Gryllus bimaculatus; two

Streptophyta, Medicago truncatula and Vicia faba; and one uncultured eukaryote were shared.

Figure 16. Number of organisms unique to the 3,585 m sample (red circle; type 2 accretion ice) and V5 (yellow circle; type 1/2 accretion ice) and the overlap between them.

Overlap between V6 (3,606 m + 3,621 m; type 1/2 accretion ice) and 3,585 m (type 2 accretion ice): Comparisons between the 3,585 m sample and V6 resulted in a total of ten unique organisms shared (Fig. 17). Of these, all had at least 97% identity, eight aligned with ribosomal sequences, and two aligned with mRNA sequences. The most notable of these are Azospirillum sp. B510 (96% identity to 16S rRNA in V6 and 100% identity to phosphoglycerate kinase mRNA in the 3,585 m data), which is a soil dwelling nitrogen fixing organism; Burkholderia pseudomallei (92% identity to 16S rRNA in V6 and 100% identity to 23S rRNA in the 3,585 m data), a soil dwelling Betaproteobacterium; and Pseudomonas aeruginosa (98% identity to NIH-1 tail fiber genes in V6, and 99% identity to CusA/CzcA family heavy metal efflux RND transporter mRNA in the 3,585 m data), which is a soil dwelling and aquatic Gammaproteobacterium capable of metabolizing . The remaining seven organisms consisted of five uncultured Proteobacteria, one uncultured Actinobacteria, and one unknown organism.

Figure 17. Organisms unique to both the 3,585 m sample (red circle; type 2 accretion ice from the shallow embayment) and V6 (purple circle; 3,606 + 3,621 m; type 1/2 accretion ice from the main lake basin), as well as the overlap between them.

2.4.5.3 Unshared Organism Summary

The quantity and taxonomic affiliation of the organisms unique to the 3,585 m sample were investigated. A total of 554 of the 681 unique organisms did not overlap with any other sample. These were primarily bacteria, with 453 organisms unique to the 3,585 m ice core section. Next, a total of 90 eukaryotic organisms were not shared with any other sample. Two viruses were not found in any other sample. Finally, taxonomic affiliation was undetermined for three organisms.

Bacteria: A total of 453 bacteria were found only in the 3,585 m sample. Of these, 173 organisms had sequence identities of 97% or greater. The organisms were from eight phyla: Actinobacteria, Aquificae, Bacteroidetes, Cyanobacteria, Deinococcus-Thermus, Firmicutes, Planctomycetes, and Proteobacteria (Alphaproteobacteria, Betaproteobacteria, Deltaproteobacteria, and Gammaproteobacteria). As with the 3,501 m + 3,520 m sample, many organisms were extremophiles and will be discussed in a later section.

The phylum with the most representatives was Proteobacteria. A total of 238 Proteobacteria were unique to the 3,585 m sample, 106 of which had sequence identities of at least 97%. The majority of the organisms, 117, were classified as Alphaproteobacteria, but only 37 had sequence identities of at least 97%. Some notable examples include Altererythrobacter sp. B11 (100% identity to mRNA for phosphoglycerate kinase), Celeribacter ethanolicus (100% identity to phosphoglycerate kinase mRNA), and Pelagibaca abyssi (97% identity to phosphoglycerate kinase mRNA), all of which have been isolated from deep sea water or sediment samples. Others have been isolated from freshwater or freshwater sediment sources, such as Magnetospirillum gryphiswaldense (100% identity to phosphoglycerate kinase mRNA), Rhodopseudomonas palustris CGA009 (100% identity to mRNA for a hypothetical protein), and Magnetospirillum sp. ME-1 (100% identity to acetyl/propionyl-CoA carboxylase subunit alpha mRNA). Many others have been shown to have important metabolic capabilities or are extremophiles.

Betaproteobacteria comprised 60 of the 238 Proteobacteria, of which 43 had a sequence

identity of at least 97%. Some of the most notable examples include Achromobacter sp. B7

(100% identity to 23S ribosomal RNA), Burkholderia oklahomensis (100% identity to YihY

family inner membrane domain protein mRNA), and Ralstonia pseudosolanacearum (100%

identity to translational GTPase TypA mRNA), all of which have been isolated from soil. Others

have been isolated from aquatic conditions, including Candidatus Methylopumilus turicensis

(97% identity to 23S rRNA), Methylophilus sp. TWE2 (94% identity to 23S rRNA), and

Alcaligenes aquatilis (100% identity to 23S rRNA).

Four Deltaproteobacteria were unique to this sample. The only organism to

have an identity of at least 97% was Vulgatibacter incomptus (100% sequence identity to GTP-

binding protein TypA/BipA mRNA). The others were two strains of Anaeromyxobacter

dehalogenans and Anaeromyxobacter sp. K, all aligning with excinuclease ABC mRNA at approximately 94% identity.
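The 97% identity cutoff used throughout these summaries amounts to a simple tally over each group's best hits. The sketch below illustrates this with the four Deltaproteobacteria described above; the function name `count_high_identity` is hypothetical, the strain labels are placeholders because the strain names are not given in the text, and the 94% values are the approximate figure quoted above.

```python
def count_high_identity(hits, threshold=97.0):
    """Count (organism, percent identity) pairs at or above the threshold."""
    return sum(1 for _organism, identity in hits if identity >= threshold)


# Deltaproteobacteria unique to the 3,585 m sample, as described in the text.
# Strain labels are placeholders; identities of ~94% are approximate.
delta_hits = [
    ("Vulgatibacter incomptus", 100.0),
    ("Anaeromyxobacter dehalogenans (strain 1)", 94.0),
    ("Anaeromyxobacter dehalogenans (strain 2)", 94.0),
    ("Anaeromyxobacter sp. K", 94.0),
]

print(count_high_identity(delta_hits))
```

With the default threshold, only Vulgatibacter incomptus is counted, matching the statement that it was the only organism in this group with an identity of at least 97%.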

Finally, 57 unique organisms were categorized as Gammaproteobacteria, with 25 having sequence identities of at least 97%. Of the 25 high identity organisms, sequences similar to 13 species of Pseudomonas were found. Twelve had 100% identity to 16S ribosomal RNA, while the others aligned with 100% identity to several mRNAs. Sequences similar to other organisms, such as

Lysobacter maris (100% identity to thymidylate synthase mRNA) and Nitrincola alkalilacustris

(100% identity to 16S rRNA), also were found.

The phylum with the second most unique organisms was Actinobacteria, with 192 unique organisms, 47 of which had sequence identities of 97% or greater. Furthermore, sequences from Actinobacteria aligned almost exclusively with mRNAs. Only one uncultured

Frigoribacterium sp. aligned with 16S ribosomal RNA. Approximately 50 of the organisms have been isolated from soils. Included among these are Saccharothrix espanaensis DSM 44229 (98% identity to Gamma-glutamyl phosphate reductase mRNA), Actinosynnema mirum DSM 43827

(100% identity to ATP-dependent helicase HrpA mRNA), and Luteipulveratus mongoliensis (100% identity to thymidylate synthase mRNA). Several marine species were also indicated,

including Micromonospora rifamycinica (97% identity to GTP-binding protein LepA mRNA),

Micromonospora krabiensis (97% identity to GTP-binding protein LepA mRNA), and

Verrucosispora maris AB-18-032 (94% identity to gamma-glutamyl phosphate reductase

mRNA). Other organisms such as Pseudonocardia hydrocarbonoxydans (97% identity to GTP-

binding protein LepA mRNA), Intrasporangium calvum (100% identity to thymidylate synthase

mRNA), and Kocuria flava (97% identity to elongation factor 4 mRNA) have all been isolated

from air samples.

Sequences most similar to a total of seven organisms were categorized as Firmicutes, all

of which had sequence identities of at least 97%, and aligned with both rRNA and mRNA. Some

examples included Bacillus mojavensis (98% identity to 16S rRNA), originally isolated from soil

sampled from the Mojave Desert; Heliobacillus mobilis (100% identity to phosphoglycerate

kinase mRNA); and Thermobacillus composti KWC4 (100% identity to ABC-type antimicrobial

peptide transport system mRNA).

Finally, there were several phyla that only had a few organisms that were categorized.

Sequences closest to three organisms belonging to Deinococcus-Thermus were found. These were Deinococcus actinosclerus, Deinococcus soli, and Deinococcus swuensis, all of which

aligned with either thymidylate synthase or GTP-binding protein TypA mRNA sequences with at

least 97% identity. Next, sequences closest to two uncultured Cyanobacteria were unique to this

sample. The first was an uncultured Phormidium sp. (100% identity to 16S rRNA) and the

second was uncultured Oscillatoria sp. (95% identity to 16S rRNA). A sequence closest to a

single uncultured Planctomyces sp. (100% identity to 16S rRNA), a genus shown to be capable of anammox (reduction of nitrite with ammonia to yield nitrogen gas), was found. Last, a sequence closest to an uncultured Sulfurihydrogenibium sp. was present, which had an identity of 93% to 16S rRNA.

Eukaryotes: Sequences similar to a total of 90 eukaryotes were found to be unique to the 3,585 m sample. These organisms were categorized into ten phyla: Arthropoda, Ascomycota, Basidiomycota, Chordata, Nematoda, Streptophyta, Amoebozoa, Ciliophora, Heterokontophyta, and Ochrophyta. Of these 90 eukaryotes, 80 were found to have sequence identities of at least 97%.

A total of 40 unique organisms were grouped into Streptophyta, 37 of which had sequence identities of at least 97%. Additionally, sequences aligned with 28 of the 40 organisms failed to match with genes of known function. Of the 40 organisms, 14 are common agricultural plants, such as Lens culinaris (100% identity to ATP synthase CF1 epsilon subunit mRNA),

Coffea arabica (98% identity to ATP synthase CF1 epsilon subunit mRNA), and Olea europaea subsp. cuspidata (100% identity to a gene of unknown function). The remaining organisms were a variety of wild species, of which eight had sequences that aligned with mRNA sequences. These included Vicia sativa (100% identity to ATP synthase CF1 epsilon subunit mRNA), Pyrenacantha thomsoniana (100% identity to AtpE mRNA), and Iodes cirrhosa (100% identity to ATP synthase CF1 epsilon subunit mRNA).

A total of 19 organisms were classified as Ascomycota, with 15 having sequence

identities of at least 97%. These organisms included Boeremia exigua var. linicola (100% identity to 28S rRNA), Arthrocatena tenebrio (100% identity to 60S rRNA), and

Phaeodactylium stadleri (100% identity to 60S rRNA), all of which are soil dwelling organisms.

Following Ascomycota, Chordata was the next most numerous, with 18 unique

organisms not present in any other sample. Of these, 16 had sequence identities of at least 97%.

It should be noted that of the 18, only five had sequences that aligned with genes of known

function. These included organisms such as Pantholops hodgsonii (100% identity to imidazole

glycerol phosphate synthase hisHF mRNA), Tupaia chinensis (100% identity to lysine-specific

demethylase PHF2 mRNA), and Carlito syrichta (100% identity to lysine-specific demethylase

PHF2 isoform X2 mRNA). Twenty out of the 45 unique chordate sequences were similar to species of monkeys. Due to the short read lengths and the close relationship to humans for many of the recovered sequences, it is most likely that these sequences are derived from humans but were misclassified, resulting in them being kept in the sample after removal of sequences found in the negative control. Five sequences were also similar to common livestock, and sequences similar to six avian species were found, all of which may have come from reagents in the PCR kit used.

A total of five Amoebozoa were present, all of which had sequence identities of at least

97%. Four of the organisms were from the genus Difflugia, including D. bacillariarum, D. hiraethogii, and D. lanceolata, all of which are shelled amoebas found in freshwater environments and aligned with 40S rRNA. The other organism was Cavenderia fasciculata (97% identity to COBW domain-containing protein mRNA), a soil dwelling slime mold.

The remaining phyla were represented by few organisms that could be categorized. Two organisms in the Basidiomycota that were present were Malassezia globosa CBS 7966 and

Rhodotorula graminis WP1, whose sequences aligned with genes for hypothetical proteins. One

Nematoda, Parastrongyloides trichosuri (96% identity to a gene of unknown function), was unique to this sample. Next, two Ciliophora were unique to this sample. They were Corlissina maricaensis and Trachelolophos quadrinucleatus, both of which had a 100% sequence identity to 40S rRNA. A single Heterokontophyta, Saprolegnia parasitica CBS 223.65 (100% identity to a gene for a hypothetical protein), which is an aquatic organism capable of infecting fish, was also unique to this sample. Finally, Chrysopodocystis socialis (100% identity to 18S rRNA), a member of Ochrophyta, was found to be unique to the 3,585 m sample.

Viruses: Two viruses were found only in the 3,585 m sample. The first was Rhizobium phage RHEph10, which had an 87% sequence identity to a gene for thymidylate synthase. The second was an unidentified stealth virus, a type of virus known for its ability to avoid provoking an inflammatory immune response, which had a 91% sequence identity to a gene of unknown function (Martin & Anderson, 1997).

2.4.6 Summary

Overall, there was some overlap found between all samples analyzed. However, even for

the samples with the highest amount of overlap (3,501 m + 3,520 m and 3,585 m) the amount is

quite low. This indicates that while there could be some contribution from one location to another in Lake Vostok, it is likely quite low compared to the organisms unique to each sample.

This trend is further illustrated in Fig. 18.

Figure 18. The distribution of unique organisms that are shared between the samples analyzed in this study and a previous study by Shtarkman et al. (2013). Each sample is color coded, with overlapping regions indicating the overlap between two or three samples representing regions of Lake Vostok. Abbreviations: A = Archaea, B = Bacteria, and E = Eukarya.
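When more than two samples are compared, the organisms unshared by any other sample can be obtained by subtracting the union of all other samples from each sample in turn. The following is a minimal sketch of that calculation; the function name `unshared` and the single-letter organism labels are hypothetical placeholders, and only the 554 and 681 counts come from the text.

```python
def unshared(samples):
    """Map each sample name to the organisms found in that sample and no other."""
    result = {}
    for name, organisms in samples.items():
        # Union of every other sample's organisms.
        others = set().union(*(o for n, o in samples.items() if n != name))
        result[name] = set(organisms) - others
    return result


# Placeholder organism labels for illustration only.
samples = {
    "3,540 m + 3,569 m": {"A", "B", "C"},
    "3,585 m": {"B", "D", "E"},
    "V6": {"B", "C", "F"},
}
unique_by_sample = unshared(samples)
print({name: sorted(orgs) for name, orgs in unique_by_sample.items()})

# Share of the 3,585 m organisms not found in any other sample (counts from the text).
print(f"{554 / 681:.1%}")
```

For the 3,585 m sample, the reported counts correspond to roughly four fifths of its organisms being unshared, which is consistent with the low overlap described in this summary.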

2.4.7 Extremophiles

To understand the types of that may exist in the lake, the extremophiles present in each sample were identified. The presence of organisms tolerant of extreme environments supports the hypothesis that hydrothermal habitats may exist.

2.4.7.1 Glacial Ice (2,149 m)

No sequences were found in the 2,149 m sample that aligned with extremophiles.

2.4.7.2 Basal Ice (3,501 m + 3,520 m)

The 3,501 m + 3,520 m sample had a wide variety of extremophiles (Fig. 19).

Approximately 36% were most similar to psychrophilic and psychrotolerant species, all of which had sequence identities of >97% except for one, Marinifilaceae bacterium SPP2, at 95% identity to 23S rRNA. Notably, sequences matching two strains of Aequorivita sublithincola

(both 100% identity to 23S rRNA) were found, an organism commonly found in Antarctic sea water. Another extremophile found was a eukaryote Chlamydomonas sp. UWO241 (100% identity to a gene of unknown function), a psychrophilic green alga originally isolated from

Antarctica. In contrast, 17% of the sequences most similar to extremophiles were related to organisms that were thermophilic or thermotolerant. For instance, Sulfuriferula thiophila (100% identity to 16S rRNA) is a Betaproteobacteria found in hot springs and known to oxidize sulfur.

Sequences most similar to eight unique species of halophilic or halotolerant organisms, although only two were >97% similar. One was closest to the Archaea, Halorubrum trapanicum (100% identity to a gene of unknown function), known to be extremely halophilic; and the other was

Dunaliella salina (97% identity to 50S rRNA), a green alga that is highly tolerant to saline conditions. Sequences most similar to three unique desiccation resistant organisms were 113

Rhodococcus jostii RHA1 (97% identity to inositol 2-dehydrogenase mRNA), Kineococcus

radiotolerans SRS30216 = ATCC BAA-149 (97% identity to mRNA for ATP-dependent DNA

helicase) and Haematococcus lacustris (95% identity to 23S rRNA). Five unique sequences were

most similar to extremophiles that were acidophilic or acidotolerant. All but one had a sequence

identity greater than 97%. Finally, three unique organisms were known to be alkaliphilic, all of

which had percent sequence identities of 97% or greater, such as Alkalitalea saponilacus (100%

identity to 23S rRNA) and Anaeromyxobacter sp. Fw109-5 (97% identity to ATPase AAA-2

domain protein mRNA).

Figure 19. The proportions of the various extremophiles found in ice samples from the current study and from previous research by Shtarkman et al. (2013). Abbreviations: p: psychrophilic/psychrotolerant, t: thermophilic/thermotolerant, h: halophilic/halotolerant, l: alkaliphilic/alkalitolerant, d: desiccation resistant, and c: acidophilic/acidotolerant.

2.4.7.3 Type 1 Accretion Ice (3,540 m + 3,569 m)

The 3,540 m + 3,569 m sample had a total of 21 unique organisms that were classified as extremophiles (Fig. 19). This low quantity is consistent with what would be expected based upon previous research by D'Elia et al. (2008, 2009). Of the 21, three were closest to thermophilic or thermotolerant species. Sulfurifustis variabilis (100% identity to ankyrin mRNA) was originally isolated from a lake, thrives at temperatures of 42-45°C, and is able to oxidize several sulfur compounds. In addition, sequences closest to Gordonia sp. 1D (100% identity to amidohydrolase mRNA) were also found. This species was originally isolated from Antarctic soil and water and is thought to be thermotolerant, halotolerant, and hydrocarbon metabolizing. Sequences closest to two psychrotolerant organisms were discovered. Acidithiobacillus ferrivorans SS3 (100% identity to phosphate ABC transporter mRNA) grows at temperatures as low as 5°C and oxidizes both ferrous iron and inorganic sulfur compounds. The most numerous category of extremophiles was that of halotolerant and halophilic organisms, with 11 in total, of which four had sequence identities greater than 97%, including Rhodococcus opacus (97% identity to peptide synthetase mRNA), a soil dwelling chemolithotroph. Sequences matching two strains of the archaeon Halobacterium salinarum (NRC-1 and R1, both 100% identity to formyltetrahydrofolate deformylase mRNA) were found. These strains are known to be extremely halophilic, able to tolerate low oxygen, and exceptionally resistant to UV irradiation, X-rays, and gamma radiation. Closest matches to three acidophilic or acidotolerant organisms were discovered: Acidithiobacillus ferrivorans SS3 (100% identity to mRNA for phosphate ABC transporter), Salinisphaera sp. LB1 (95% identity to GTP-binding protein TypA/BipA mRNA), and Shigella sonnei (100% identity to phage minor tail protein L mRNA). Finally, one unique alkalitolerant bacterium, Sulfurifustis variabilis (100% identity to ankyrin mRNA), and one alkaliphilic bacterium, Arthrospira sp. PCC 8005 (97% identity to a gene of unknown function), a cyanobacterium found in alkaline lakes, were found.

2.4.7.4 Type 2 Accretion Ice (3,585 m)

Finally, the 3,585 m sample had sequences closest to 88 unique extremophiles (Fig. 19). Of these, 16 were thermophilic or thermotolerant, six above 97% identity, including Gordonia sp. 1D (100% identity to glutamate-5-semialdehyde dehydrogenase mRNA), Streptomyces leeuwenhoekii (100% identity to gamma-glutamyl phosphate reductase mRNA), and Thermobacillus composti KWC4 (100% identity to ABC-type antimicrobial peptide transport system mRNA). T. composti KWC4 was originally isolated from a composting reactor, and S. leeuwenhoekii was isolated from the hyper-arid Atacama Desert and is not only thermophilic but also highly desiccation resistant, and it produces antibiotics. Sequences closest to 12 unique organisms previously shown to be psychrotolerant or psychrophilic were found. Four have >97% sequence identities with recognized species. Polaromonas glacialis (97% identity to 16S rRNA) was isolated from an alpine glacier and shown to be psychrophilic. Candidatus Methylopumilus planktonicus (97% identity to 16S rRNA) grows well at low temperatures and plays an important role in carbon cycling via the turnover of single carbon compounds. Forty unique organisms were found during analysis that were halotolerant or halophilic. Sixteen had percent identities greater than 97%, including Salinibacter ruber (97% identity to a gene of unknown function), which has been shown to need at least 15% salinity to grow; Pelagibaca abyssi (97% identity to phosphoglycerate kinase mRNA), which was originally isolated from a deep sea water sample; and Rhodococcus opacus (100% identity to mRNA), which was isolated from soil and produces high amounts of triglycerides. Sequences closest to three unique acidotolerant or acidophilic species were found, but none were above the 97% sequence identity threshold. Microbacterium sp. CGR1 was at 96% identity to GTP-binding protein TypA mRNA and is an extraordinary organism also isolated from the Atacama Desert, able to tolerate a wide range of pH, salinity, and temperature. In addition, it can reduce iron and is resistant to high levels of arsenic. Finally, 12 unique organisms were found to be alkalitolerant or alkaliphilic, three of which were above 97% sequence identity. For example, Nitrincola alkalilacustris and Nitrincola schmidtii, both matched at 100% identity to 16S rRNA, were originally isolated from water of soda pans in Hungary and shown to be alkaliphilic as well as halotolerant. The counts for the organisms in each category discussed are summarized below in Figure 20.

Figure 20. The quantity of unique organisms from the three samples that were found to contain extremophiles. Thermo denotes thermophilic or thermotolerant organisms. Psychro denotes psychrophilic or psychrotolerant organisms. Halo denotes halophilic or halotolerant organisms. Desiccation denotes desiccation resistant organisms. Acid denotes acidophilic or acidotolerant organisms. Alkali denotes alkaliphilic or alkalitolerant organisms.
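A tally of this kind, counting unique organisms per extremophile category and noting how many meet the 97% identity threshold, might be sketched as follows. This is a minimal illustration under stated assumptions rather than the scripts used in this study: the function tally_extremophiles and the (organism, category, identity) record layout are hypothetical, and treating the threshold as "97% or greater" is an assumption, since the text uses both ">97%" and "97% or greater". The example records reuse identities reported above for the 3,585 m sample.

```python
from collections import Counter

IDENTITY_THRESHOLD = 97.0  # percent; assumed here to mean "97% or greater"


def tally_extremophiles(hits, threshold=IDENTITY_THRESHOLD):
    """Count unique organisms per category, overall and at/above threshold.

    hits: iterable of (organism, category, percent_identity) tuples.
    An organism may appear in more than one category; within a category
    it is counted once, using its best (highest) identity.
    """
    best = {}
    for organism, category, identity in hits:
        key = (organism, category)
        best[key] = max(identity, best.get(key, 0.0))
    totals = Counter(category for _, category in best)
    above = Counter(
        category for (_, category), identity in best.items()
        if identity >= threshold
    )
    return totals, above


# A few matches reported in the text for the 3,585 m sample.
hits = [
    ("Gordonia sp. 1D", "thermo", 100.0),
    ("Polaromonas glacialis", "psychro", 97.0),
    ("Salinibacter ruber", "halo", 97.0),
    ("Microbacterium sp. CGR1", "acid", 96.0),
    ("Nitrincola alkalilacustris", "alkali", 100.0),
    ("Nitrincola schmidtii", "alkali", 100.0),
]
totals, above = tally_extremophiles(hits)
for category in totals:
    print(category, totals[category], above[category])
```

Changing the comparison to a strict "greater than" would exclude the matches at exactly 97%, which is why the threshold convention is worth stating explicitly.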

2.4.8 Metabolic Analysis

To further determine whether Lake Vostok contains an active and dynamic ecosystem, potential metabolic cycles and pathways were investigated within each sample. If the samples contain organisms capable of performing vital processes, then it is likely that an ecosystem exists in the lake. The most important metabolic cycle investigated in this study was the nitrogen cycle. Additionally, several pathways were also investigated, including carbon fixation, iron metabolism, sulfur metabolism, and hydrocarbon metabolism, among many others.

2.4.8.1 Glacial Ice (2,149 m)

As previously stated, the only sequences found in the 2,149 m sample were most similar to an uncultured cyanobacterium. Sequences similar to uncultured cyanobacteria have been isolated throughout the glacier, and some cyanobacteria may end up in the lake while others will remain trapped in the glacier. If the cyanobacterium survived, it could contribute to the proposed ecosystem in Lake Vostok, or its sequences could simply be remnants from dead cyanobacteria. It has been shown that cyanobacteria are capable of fixing nitrogen in environments lacking sunlight, thus providing an important nutrient to an isolated ecosystem. However, they can only fix carbon using photosynthesis (Sohm et al., 2011). Without more information on the uncultured cyanobacterium, it is impossible to know its metabolic capabilities.

2.4.8.2 Basal Ice (3,501 m + 3,520 m)

Forty-two unique organisms from seven phyla were identified from the 3,501 m + 3,520 m sample, which were put into seven metabolic categories related to nitrogen, iron, sulfur, arsenic, phosphorus, carbon (hydrocarbon metabolizing and carbon fixing), and hydrogen. Categorization was based upon known physiological capabilities derived from various online resources.

Organisms from the phyla Actinobacteria, Alphaproteobacteria, Betaproteobacteria, Deltaproteobacteria, Firmicutes, and Gammaproteobacteria have all been shown to be involved in parts of nitrogen cycling. Sequences were closest to two Alphaproteobacteria, Rhizobium gallicum bv. gallicum R602 (97% identity to myo-inositol 2-dehydrogenase 2 mRNA) and Azorhizobium caulinodans ORS 571 (97% identity to short-chain dehydrogenase/reductase SDR mRNA), as well as a betaproteobacterium, Paraburkholderia phymatum STM815 (93.8% identity to mRNA), and a member of the Firmicutes, Clostridium pasteurianum (98% identity to 16S rRNA), all of which are capable of nitrogen fixation. One actinobacterium, Euzebya sp. DY32-46 (100% identity to 23S rRNA), was originally isolated from sea water and reduces nitrite and transports ammonium. In addition, Thauera aromatica K172 (100% identity to rRNA), Azoarcus sp. DN11 (100% identity to 23S rRNA), Serratia marcescens (100% identity to 23S rRNA), Pseudomonas stutzeri (95.3% identity to 16S rRNA), and Pseudomonas xanthomarina (95.3% identity to 16S rRNA) have all been shown to have denitrifying capabilities. Finally, one sequence most similar to a nitrifying organism, Candidatus Nitrotoga sp. (100% identity to 16S rRNA), was also found. Additionally, Planctomyces spp. were found. Many are capable of anammox reactions, which use ammonium and nitrite to produce nitrogen gas.

Eighteen unique organisms from seven phyla can fix carbon from carbon dioxide. All of the organisms are able to use the reductive pentose phosphate cycle to fix carbon. These included Cylindrotheca closterium (100% identity to 28S rRNA), Chlamydomonas noctigama (97.3% identity to rRNA), Acidiphilium multivorum AIU301 (97.4% identity to mRNA), Arthrospira sp. PCC 8005 (97.1% identity to gene of unknown function), and Emiliania huxleyi CCMP1516 (100% identity to mRNA for a hypothetical protein), as well as others. However, due to the absence of sunlight under 3.5 km of ice, these organisms are not utilizing sunlight to fix carbon. Instead, either they would have to survive using a heterotrophic lifestyle, or they are obtaining the ATP and NADPH necessary for the CO2 fixing reactions from other processes.

Sequences were similar to five unique organisms, spanning five phyla, that are involved in sulfur metabolism. Euzebya sp. DY32-46 (100% identity to 23S rRNA) cleaves dimethylsulfoniopropionate, which is an extremely abundant sulfur metabolite. Massilia putida (94% identity to a gene of unknown function) is capable of producing dimethyl disulfide, which is a sulfur metabolite important for sulfur cycling. Two Deltaproteobacteria, Desulfococcus multivorans (100% identity to 23S rRNA) and Desulfurella acetivorans (100% identity to 23S rRNA), are both able to reduce sulfate. Finally, Emiliania huxleyi CCMP1516 (100% identity to mRNA for a hypothetical protein) is a coccolithophore, which produces dimethyl sulfide and plays an important role in sulfur cycling in the ocean.

Sequences were closest to 24 organisms from six phyla that have been shown to metabolize a wide range of hydrocarbons. The most important of these are seven unique organisms that all aligned with 23S ribosomal RNA, including Candidatus Filomicrobium marinum (100% identity), Sphingobacterium sp. 21 (98% identity), Candidatus Methylopumilus planktonicus (100% identity), Methylotenera versatilis (100% identity), and Methylovorus glucosotrophus (100% identity). These organisms are capable of metabolizing single carbon compounds such as methanol. The ability to metabolize single carbon compounds is important because hydrothermal vents have been shown to release C1 to C4 hydrocarbon compounds (Proskurowski et al., 2008).

Additionally, it has been shown that some hydrothermal vents produce a wide variety of aromatic hydrocarbons from the pyrolysis of organic matter (Kawka & Simoneit, 1990). Another seven organisms were found to metabolize an assortment of aromatic hydrocarbons, including three Actinobacteria, Mycobacterium sp. PYR15 (86% identity to gene of unknown function), Mycobacterium sp. WY10 (85% identity to ATP-dependent DNA helicase mRNA), and Rhodococcus jostii RHA1 (97% identity to inositol 2-dehydrogenase mRNA), as well as the gammaproteobacterium Pseudomonas xanthomarina (95% identity to 16S rRNA), all of which can degrade polycyclic aromatic hydrocarbons. Additionally, Arenibacter algicola (100% identity to 23S rRNA), Thauera aromatica K172 (100% identity to 50S rRNA), and Azoarcus sp. DN11 (100% identity to 23S rRNA) were present, all of which can degrade aromatic hydrocarbons.

Next, sequences similar to four unique organisms from the phyla Alphaproteobacteria and Betaproteobacteria were found to have iron metabolizing capabilities. Acidiphilium cryptum JF-5 (97% identity to mRNA) is capable of dissimilatory iron reduction, and Ferribacterium limneticum (98% identity to 16S rRNA) can reduce iron(III). Burkholderiales bacterium GJ-E10 (100% identity to 23S rRNA) and Ferriphaselus amnicola (100% identity to 16S rRNA) can oxidize iron.

Other metabolic capabilities of interest include the oxidation of hydrogen, by organisms such as Clostridium pasteurianum (98% identity to 16S rRNA) and Hydrogenophilus thermoluteolus (100% identity to 23S rRNA). Organisms capable of phosphorus metabolism were Serratia marcescens (100% identity to 23S rRNA), which is capable of metabolizing several inorganic phosphorus compounds; Euzebya sp. DY32-46 (100% identity to 23S rRNA), involved in marine phosphorus cycling; and Microlunatus phosphovorus NM-1 (96% identity to cadmium-transporting ATPase mRNA), a bioaccumulator of phosphorus. Three organisms, Acidiphilium multivorum AIU301 (97% identity to inositol 2-dehydrogenase mRNA), Pseudomonas xanthomarina (95.3% identity to 16S rRNA), and Chlamydomonas eustigma (100% identity to 23S rRNA), were all capable of detoxifying arsenic through oxidative or other biological processes.

2.4.8.3 Type 1 Accretion Ice (3,540 m + 3,569 m)

Sequences similar to a total of 21 unique organisms from eight phyla were identified from the 3,540 m + 3,569 m sample. These organisms are capable of metabolizing nitrogen, iron, uranium, sulfur, and/or hydrocarbons. In addition, some potentially have important or interesting abilities such as carbon fixation, resistance to radiation, and resistance to heavy metals.

Five unique organisms from three phyla were found to be nitrogen metabolizing (Fig. 21). Included in these are three nitrogen fixing bacteria: Azospirillum brasilense Sp245 (100% sequence identity to mRNA for hypothetical protein), Paraburkholderia phymatum STM815 (94% identity to ABC transporter related mRNA), and Paraburkholderia sprentiae WSM5005 (97.4% identity to sulfonate ABC transporter ATP-binding protein mRNA). The remaining organisms, Tessaracoccus sp. T2.5-30 (100% identity to co-chaperone YbbN mRNA) and Thiobacillus denitrificans ATCC 25259 (98.0% identity to GTP-binding protein TypA mRNA), are capable of reducing oxidized nitrogen compounds, such as nitrate (NO3-) and nitrite (NO2-), to nitrogen gas.

Figure 21. Phyla found in both the 3,540 m + 3,569 m sample and the 3,585 m sample that have been shown to be involved in the nitrogen cycle. The key shows the abbreviations used for the phyla included. Other metabolic processes important to an ecosystem are also noted. Organisms found in the 3,540 m + 3,569 m and 3,585 m samples were capable of four different carbon fixation pathways. The * denotes the possible, but unconfirmed, presence of a metabolic pathway. Abbreviations: Actino: Actinobacteria, Asco: Ascomycota, Alpha: Alphaproteobacteria, Beta: Betaproteobacteria, Delta: Deltaproteobacteria, DT: Deinococcus-Thermus, Firmi: Firmicutes, Gamma: Gammaproteobacteria, Acid: Acidithiobacillia, Planct: Planctomycetes.

From the sample, a total of seven unique organisms are capable of fixing carbon from carbon dioxide. They are from the phyla Chlorophyta, Cyanobacteria, and Proteobacteria (Acidithiobacillia, Alphaproteobacteria, Betaproteobacteria, and Gammaproteobacteria). All organisms use the reductive pentose phosphate cycle, including Arthrospira sp. PCC 8005 (97% identity to gene of unknown function), Rhodopseudomonas palustris BisA53 (100% identity to mRNA for a S3P SSU protein), Sulfurifustis variabilis (100% identity to mRNA for ankyrin), and Monoraphidium neglectum (100% identity to mRNA for hypothetical protein).

Additionally, sequences were similar to four unique organisms capable of metabolizing sulfur. Of these, Thiobacillus denitrificans ATCC 25259 (98% identity to GTP-binding protein TypA mRNA) couples denitrification to the oxidation of inorganic sulfur compounds, such as thiosulfate. Two other organisms, Acidithiobacillus ferrivorans SS3 (100% identity to phosphate ABC transporter mRNA) and Acidithiobacillus ferrooxidans ATCC 53993 (100% identity to phosphate ABC transporter mRNA), oxidize inorganic sulfur in a process not tied to denitrification. The final organism, Massilia putida (97% identity to gene of unknown function), is able to reduce sulfur into dimethyl disulfide, an important compound in sulfur cycling.

Three unique organisms have iron oxidation abilities: Thiobacillus denitrificans ATCC 25259 (98% identity to GTP-binding protein TypA mRNA), which is able to oxidize iron(II) in anaerobic conditions through a nitrate dependent process; Acidithiobacillus ferrivorans SS3 (100% identity to phosphate ABC transporter mRNA); and Acidithiobacillus ferrooxidans ATCC 53993 (100% identity to phosphate ABC transporter mRNA). The latter two both oxidize ferrous iron.

2.4.8.4 Type 2 Accretion Ice (3,585 m)

The sequences from 3,585 m revealed organisms with a high diversity in their metabolic capabilities. A total of 11 phyla containing 97 unique organisms were identified that are capable of metabolic processes involving nitrogen, iron, hexavalent chromium, uranium, phosphorus, perchlorate, hydrocarbons, and alcohols. Additionally, other organisms were capable of fixing carbon, had saprotrophic capabilities, and were able to metabolize environmental pollutants.

Of the 97 organisms, 44 are involved in nitrogen cycling, the majority of which are able to fix nitrogen (Fig. 21). The nitrogen fixing organisms include an actinobacterium, Frankia alni ACN14a (92.1% identity to mRNA for tyrosine phosphorylated protein A); a member of the Firmicutes, Heliobacillus mobilis (100% identity to mRNA for phosphoglycerate kinase); and 26 unique Alphaproteobacteria, including Azospirillum brasilense (100% identity to mRNA for acetyl/propionyl-CoA carboxylase subunit alpha), Hartmannibacter diazotrophicus (100% identity to mRNA for phosphoglycerate kinase), Magnetospirillum sp. ME-1 (100% identity to mRNA for acetyl/propionyl-CoA carboxylase subunit alpha), and Sinorhizobium meliloti (100% identity to mRNA for phosphoglycerate kinase). The remaining 15 organisms, spanning six phyla, are all capable of reducing nitrates, including Tessaracoccus sp. T2.5-30 (100% identity to mRNA for co-chaperone YbbN), Pannonibacter phragmitetus (96% identity to mRNA for 4-hydroxytetrahydrobiopterin dehydratase), Alcaligenes aquatilis (100% identity to 23S rRNA), Deinococcus actinosclerus (97% identity to mRNA for thymidylate synthase), and Pseudomonas brassicacearum (100% identity to mRNA for imidazole glycerol phosphate synthase subunit hisF).

Sequences most similar to 30 unique organisms were found that are capable of metabolizing a wide array of hydrocarbons or hydrocarbon-based compounds. Most notable among these is Azoarcus sp. DN11 (100% identity to 23S ribosomal RNA), which is capable of metabolizing benzene, a compound that has been detected at hydrothermal vents (Venturi et al., 2017). Other organisms have a much broader range of metabolic processes, such as Mycolicibacterium chubuense NBB4 (100% identity to mRNA for glycine dehydrogenase), which can metabolize C2 to C4 alkenes and C2 to C16 alkanes, some of which are released by hydrothermal vents (Welhan & Lupton, 1987).

Six organisms are capable of metabolizing single carbon compounds, such as methanol, methane, methylamine, and formate, some of which are released by hydrothermal vents. Notable examples include Amycolatopsis methanolica 239 (100% identity to mRNA for O-succinylbenzoate-CoA ligase), which metabolizes methanol; Methylotenera mobilis (97% identity to 23S ribosomal RNA), which metabolizes methylamine; and Methylomonas methanica MC09 (91% identity to mRNA for cobalamin synthesis protein P47K), an organism that metabolizes methane.

Next, sequences representing five organisms capable of metabolizing iron were found in this sample. Only Mycobacteroides abscessus subsp. massiliense (89% identity to mRNA for UDP-glucose 6-dehydrogenase) is able to oxidize iron(II) and iron(III). Magnetospirillum magneticum AMB-1 (100% identity to mRNA for 3-phosphoglycerate kinase), Anaeromyxobacter dehalogenans 2CP-1 (94% identity to mRNA for excinuclease ABC), A. dehalogenans 2CP-C (95% identity to mRNA for excinuclease ABC subunit A), and Anaeromyxobacter sp. K (94% identity to mRNA for excinuclease ABC) are all capable of iron reduction.

Two strains of A. dehalogenans, 2CP-1 (94% identity to mRNA for excinuclease ABC) and 2CP-C (95% identity to mRNA for excinuclease ABC subunit A), are able to reduce uranium. Microlunatus phosphovorus NM-1 (100% identity to mRNA for phosphoserine aminotransferase) accumulates high amounts of polyphosphate, Dechloromonas hortensis (99% identity to 16S ribosomal RNA) reduces perchlorate, Pannonibacter phragmitetus BB (94% identity to an mRNA of unknown function) reduces chromium, and Acetobacter persici (100% identity to mRNA for 4-hydroxytetrahydrobiopterin dehydratase) can convert ethanol into acetic acid. Additionally, sequences were similar to a gammaproteobacterium, Dyella thiooxydans (100% identity to mRNA for thymidylate synthase), and a betaproteobacterium, Hydrogenophaga crassostreae (94.7% identity to 23S rRNA), both of which can oxidize thiosulfate to sulfate.

Finally, sequences similar to 16 unique organisms from five phyla were found to have carbon fixing capabilities. Two organisms utilize only the reductive pentose phosphate cycle: Arthrospira sp. PCC 8005 (97% identity to gene of unknown function) and Rhodoferax antarcticus (95% identity to 23S rRNA). Seven are able to use the reductive pentose phosphate cycle or the C4-dicarboxylic acid pathway to fix carbon, including six strains of Rhodopseudomonas palustris (90% to 100% identity to mRNA for imidazole glycerol phosphate synthase cyclase subunit) and Monoraphidium neglectum (100% identity to gene of unknown function). One organism, Dinoroseobacter shibae DFL 12 = DSM 16493 (91% identity to mRNA), is able to use the ethylmalonyl-CoA pathway to fix carbon.

2.5 Conclusions of Findings

2.5.1 Meteoric Ice (2,149 m)

Previous research demonstrated that the cellular concentration in glacial ice samples is generally low, ranging from 1 to 2 cells/ml, but could potentially reach upwards of 380 cells/ml in certain circumstances (Christner et al., 2006; Bulat et al., 2009). Results for the glacial ice (2,149 m) sample are lower than previous findings, but this is not unexpected. Twelve different sequences were found that all aligned to the same uncultured cyanobacterium. This amounts to about one unique sequence for every 20-30 ml of sample meltwater. When compared to the data gathered from other samples (3,501 m + 3,520 m; 3,540 m + 3,569 m; 3,585 m; V5 [3,563 m + 3,585 m]; and V6 [3,606 m + 3,621 m]), an uncultured cyanobacterium with the same GI and accession number was identified in all samples except V6. Based on this, it appears that the glacial ice is contributing very little to the genetic diversity and richness in the accretion ice (and therefore, in Lake Vostok) reported by other researchers. Instead, the organisms reported from the accretion ice samples must be entering the lake through other means or, more likely, are residents of the lake.

2.5.2 Basal Ice (3,501 m + 3,520 m)

2.5.2.1 Organism Overlap

The metagenomic and metatranscriptomic analysis of the basal ice sample (3,501 m + 3,520 m) returned 33,139 unique sequences corresponding to 513 unique organisms, of which 410 were found only in this sample. The percentage of organisms shared between 3,501 m + 3,520 m and 3,540 m + 3,569 m (type 1 accretion ice from the shallow embayment) was 4% (Fig. 9). Additionally, the percentage of organisms shared between 3,501 m + 3,520 m and 3,585 m (type 2 accretion ice from the shallow embayment) was 5.8% (Fig. 10). The number shared among 3,501 m + 3,520 m, 3,540 m + 3,569 m, and 3,585 m (the latter two being type 1 and type 2 accretion ice from the shallow embayment) was 18 organisms (1.3%) (Fig. 8). Considering that the basal ice sample is composed of disturbed glacial ice and is estimated to be from 500,000 to more than one million years old, while the accretion ice samples are formed from lake water and are 16 to 20 thousand years old, the differences are not surprising (Bell et al., 2002). Therefore, the differences are likely due to the spatial and temporal origins of the samples. Based upon this, the majority of the organisms present in basal ice are unique to that sample, most of which were either already present in the ice or were picked up by the glacier as it traversed the bedrock. Additionally, a large proportion of the cells found are often not viable based upon live/dead fluorescence staining and culturing assays (D’Elia et al., 2008, 2009). However, there still appears to be a very small contribution from the basal ice to the shallow embayment. The sharing may be a temporary event, potentially the result of a geological event that causes disruption in the area (e.g., hydrothermal activity). If the small amount of sharing were happening consistently, it would be expected that the overlap between depths would be substantial and consistent between basal ice and the accretion ice from the shallow embayment.

2.5.2.2 Organism Contents

Sequences similar to 15 organisms known to be psychrophilic or psychrotolerant were found in the basal ice sample. These organisms have been isolated from freshwater, marine, sedimentary, and soil sources. It is likely that some of the soil dwelling organisms are picked up as the glacier rubs against the bedrock. Additionally, the presence of thermophilic and thermotolerant species in conjunction with the psychrophiles indicates there may be extremes of hot and cold. However, analysis of helium isotope levels in glacier ice samples has indicated that geothermal activity in the ridge leading into the shallow embayment is unlikely (Jean-Baptiste et al., 2001). Therefore, the organisms represented by sequences similar to thermophilic and thermotolerant species are likely not thriving in the basal ice region. Alternatively, the organisms may only be related to thermophilic species. However, the meltwater at the basal ice-bedrock interface is likely home to many of the aquatic psychrophiles. Due to the turbulence found in the basal ice, it is hard to determine the time and location where these thermophiles and psychrophiles were picked up by the glacier.

There are 87 organisms found in the sample that are normally found in soil and sediment and are neither psychrophilic nor thermophilic. Specifically, 74 are soil dwelling while 13 are sediment dwelling, many of which have been shown to be mesophilic. The relatively high percentage of mesophiles is somewhat expected due to the erosive forces that form the basal ice and the areas of moderate temperature present between extremes. Alternatively, many species that have been found in polar ice samples are psychrotolerant, so some of these organisms might actually be psychrotolerant, and not simply mesophilic. The majority of the soil and sediment dwelling organisms are likely picked up by the basal ice as the glacier rubs against the bedrock, causing inclusions and the associated organisms to become incorporated. The interface between the bedrock and the glacial ice in many locales is a unique ecosystem, often containing many unique species that thrive in those zones.

The presence of sequences similar to a total of 102 aquatic and marine organisms indicates that there may be freshwater and saltwater conditions present at the glacier-bedrock interface. The pressure from the glacier on the bedrock, friction from basal sliding, and geothermal activity can all cause melting of the basal ice, leading to temporary aquatic conditions (Persson, 2018). There may also be small subglacial lakes and waterways that contain aquatic and marine organisms. The sequences corresponded primarily to organisms found in aquatic conditions, with about half as many from marine environments. As with the organisms from soil and sediment sources, most of the aquatic and marine organisms are mesophilic, but many are adaptable to a variety of conditions. Additionally, only three halophilic organisms were found, two of which are marine.

2.5.3 Shallow Embayment Type 1 Accretion Ice (3,540 m + 3,569 m)

2.5.3.1 Organism Overlap

Overlap between the type 1 accretion ice (3,540 m + 3,569 m) and type 2 accretion ice (3,585 m) in the shallow embayment was the highest of any pair of samples analyzed (Fig. 13). Fifty-one organisms were shared between the samples, amounting to 7.1% of the total organisms present. The temporal and spatial distances between the closest samples (3,569 m and 3,585 m) are approximately 400 years and 1,200 m, respectively. The temporal and geographic distances between the furthest samples (3,540 m and 3,585 m) are approximately 1,100 years and 3,500 m, respectively (Bell et al., 2002; Christner et al., 2006). Considering the amount of time and distance between sample locations, it would appear that some contributions may be made from one area of the lake to another. The lack of physical barriers between samples would further enable contribution, but even without barriers, it would be challenging for bacteria to traverse thousands of meters. One factor that might present a barrier is a hot plume of water originating from hydrothermal activity that appears to be near the middle of the shallow embayment. This is indicated by the high proportion of thermophiles between the 3,563 m and 3,585 m levels, and lower proportions of thermophiles above and below those ice core sections. The number of halophiles also increases in both adjacent regions. Therefore, there might be a hot plume roughly midway into the shallow embayment. Cooling on either side could cause precipitation of some of the dissolved salts, which over millennia would concentrate the salts in regions peripheral to the hot water plume.
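The age and distance separations quoted above for the two sample pairs can be checked for internal consistency with a short calculation. The sketch below simply divides distance by time; interpreting the quotient as a horizontal rate of ice movement across the embayment is an assumption made here for illustration, and the function name implied_rate_m_per_year is hypothetical.

```python
def implied_rate_m_per_year(distance_m, years):
    """Horizontal distance divided by age difference, in meters per year."""
    if years <= 0:
        raise ValueError("years must be positive")
    return distance_m / years


# Separations quoted in the text (Bell et al., 2002; Christner et al., 2006).
closest = implied_rate_m_per_year(1200, 400)    # 3,569 m vs 3,585 m
furthest = implied_rate_m_per_year(3500, 1100)  # 3,540 m vs 3,585 m

print(f"closest pair:  {closest:.1f} m/yr")
print(f"furthest pair: {furthest:.1f} m/yr")
```

Both pairs give roughly 3 m per year, so the two sets of approximate separations are consistent with one another.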

Interestingly, the overlap between the type 1 accretion ice (3,540 m + 3,569 m) and V5 (a mixture of type 1 and type 2 accretion ice; 3,563 m + 3,585 m) was significantly lower, with only three organisms being shared between samples (Fig. 14). Even with the slight overlap of these samples, the low number of organisms common to both indicates that the microbial communities differ spatially, temporally, or both. It is possible that nutrients are released from hydrothermal activity, and changes in the activity might result in blooms of life that are incorporated into the accretion ice and spread across the shallow embayment before slowly dying off. As stated above, the prevalence of thermophiles midway across the shallow embayment, and the lower concentrations of similar organisms on either side of that region, suggest a different set of conditions in the middle of the embayment, which may be a plume of hot water.

2.5.3.2 Organism Contents

The type 1 accretion ice samples from the western side of the shallow embayment had the second lowest number of unique organisms (133) of all samples tested (second only to the glacial ice). There appears to be some commonality in the organisms between the type 1 accretion ice and the basal ice, but in general, the species overlap between regions appears to be small.

In addition to low numbers of organisms present as a whole, the type 1 accretion ice from the 3,540 m + 3,569 m sample has lower levels of extremophiles. Sequences that aligned with a total of one thermophilic, two thermotolerant, zero psychrophilic, two psychrotolerant, three halophilic, and eight halotolerant species were identified from the sample, while 35 unique mesophiles were identified. Based upon this, it would appear that the locations that form the 3,540 m + 3,569 m sample could be moderate in temperature relative to other locations. This is the region where some of the basal ice melts into the lake water, which indicates that the temperatures are above freezing, and a temperature gradient might exist from a hydrothermal area to the melting zone. This region may contain water of moderate temperatures.
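The reasoning above, that the sample is dominated by organisms of moderate temperature preference, can be checked with a short tally. The following Python snippet is only a minimal sketch: the counts are those reported in the paragraph above, but the grouping into "warm", "cold", and "moderate" classes and the function name `temperature_profile` are assumptions made for illustration, and the sketch assumes that no organism is counted in more than one temperature class.

```python
# Counts reported in the text for the 3,540 m + 3,569 m type 1 accretion ice sample.
trait_counts = {
    "thermophilic": 1,
    "thermotolerant": 2,
    "psychrophilic": 0,
    "psychrotolerant": 2,
    "halophilic": 3,
    "halotolerant": 8,
    "mesophilic": 35,
}


def temperature_profile(counts):
    """Return the share of each temperature class (hypothetical grouping).

    Assumes each organism falls into exactly one temperature class.
    """
    groups = {
        "warm": counts["thermophilic"] + counts["thermotolerant"],
        "cold": counts["psychrophilic"] + counts["psychrotolerant"],
        "moderate": counts["mesophilic"],
    }
    total = sum(groups.values())
    return {name: n / total for name, n in groups.items()}


profile = temperature_profile(trait_counts)
for name, share in profile.items():
    print(f"{name}: {share:.1%}")
```

Under these assumptions, mesophiles make up the large majority of the temperature-classified organisms, which is the basis for describing this region as moderate in temperature.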

The 3,540 m and 3,569 m depths correspond to the grounding line where the glacier enters into the lake and scrapes along the lakebed. Combined with the soil and sedimentary inclusions in the accretion ice samples it is no surprise that of the 133 organisms, 45 are species associated with soil or sedimentary habitats. As the glacier enters the lake, the sediment is disturbed resulting in it being suspended in the water where the mineral particles subsequently freeze to the glacier along with any associated organisms.

The number of aquatic organisms is ten times higher than that of marine dwelling organisms. Analyses of type 1 accretion ice samples have found the levels of dissolved sodium to be a maximum of 22 mmol/l (Christner et al., 2006). However, due to the fractionation of ions from the water as it transitions to ice, the actual sodium ion concentration in the shallow embayment is likely much closer to 10,000 mmol/l. This is closer to seawater (48,000 mmol/l) than it is to freshwater (26 mmol/l) (Christner et al., 2006). Sequences similar to eight halotolerant and three halophilic organisms were isolated, which is more than in the meteoric and basal samples but significantly fewer than in the type 2 accretion ice. The three halophilic organisms present are known to thrive in extremely acidic and saline water conditions. Furthermore, the halotolerant organisms primarily inhabit soil or sedimentary environments. It is likely that a hydrothermal vent is contributing water with high ion concentrations near the region of the 3,540 m + 3,569 m sample, while an input of freshwater from basal ice melt or another source is supplying the myriad of aquatic organisms. Therefore, the western portion of the shallow embayment appears to be a region of several gradients, including a temperature gradient from cold to warm (moving west to east), a salinity gradient from low to high, and a silt gradient from high to low. This causes an organism gradient from low to high.

2.5.4 Shallow Embayment Type 2 Accretion Ice (3,585 m)

2.5.4.1 Organism Overlap

The species overlap between the 3,585 m sample and V5 (3,563 m + 3,585 m; type 1 and type 2 accretion ice) was 24 unique organisms, amounting to about 1.2% of the total number of organisms in both samples (Fig. 16). It was initially expected that the overlap would be higher, considering one depth was common to both samples. While the same depth was used, even the small differences in depth between the beginning and the end of the 3,585 m sample can amount to a difference of 25-70 years. Based on the larger amount of metagenomic and transcriptomic information in V5, the 3,563 m sample is likely contributing a large number of sequences to the V5 sample. This is consistent with other studies that reported larger numbers of organisms in the 3,560 m region than in the 3,585 m region (Christner et al. 2006). For the V5 sample, there were contributions from two different regions (and two types of ice) of the shallow embayment. So, in that sense, the differences between V5 (3,563 m + 3,585 m core sections) and the 3,585 m sample are not a surprise.
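The overlap percentages quoted in this section can be reproduced from two organism lists with a simple set calculation. The Python sketch below is illustrative only: the function name `overlap_percent` and the placeholder organism labels are hypothetical, and it assumes that "the total number of organisms in both samples" means the number of distinct organisms in the union of the two samples, which the text does not state explicitly.

```python
def overlap_percent(sample_a, sample_b):
    """Shared organisms as a percentage of all distinct organisms in both samples.

    Assumption: the denominator is the union of the two organism sets.
    Returns 0.0 when both samples are empty.
    """
    set_a, set_b = set(sample_a), set(sample_b)
    union = set_a | set_b
    if not union:
        return 0.0
    return 100.0 * len(set_a & set_b) / len(union)


# Placeholder labels, not organisms reported in this study.
sample_one = ["organism_1", "organism_2", "organism_3"]
sample_two = ["organism_3", "organism_4"]

print(f"{overlap_percent(sample_one, sample_two):.1f}% overlap")
```

If the denominator were instead the sum of the two sample sizes, the percentages would be somewhat lower, so the definition should be fixed before comparing values across sample pairs.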

The overlap between the 3,585 m sample and V6 (3,606 m + 3,621 m; type 1 and 2 accretion ice from the main lake basin) amounted to a total of 9 unique organisms, or 1.1% of all organisms present in both samples (Fig. 17). It is likely that the overall contribution V6 makes to the shallow embayment, or vice versa, is extremely low, as the lowest amounts of organism overlap all occurred in comparisons with V6. The low level of overlap is consistent with the presence of a physical barrier, a peninsula partially separating the embayment from the southern main basin, as well as the colder temperature, deeper water, and much lower concentrations of ions in the main basin (Christner et al. 2006; Rogers et al. 2013; Shtarkman et al., 2013).

2.5.4.2 Organism Contents

Although the 3,585 m sample is classified as type 2 ice, it is more accurately placed in the transition zone between type 1 and type 2 accretion ice. As such, the location that forms the 3,585 m sample contains fewer inclusions than type 1 ice and has lower concentrations of ions than type 1 ice (Christner et al., 2006). Sequences similar to a total of 175 soil and 30 sediment dwelling organisms were found in this sample. Some of the soil organisms are possibly a result of contribution from the other samples (3,501 m + 3,520 m and 3,540 m + 3,569 m). Between 3,585 m and 3,501 m + 3,520 m, 19 of the overlapping organisms have been isolated from soil or sediment. Between 3,585 m and 3,540 m + 3,569 m, 24 of the overlapping organisms have been isolated from soil or sediment. Additionally, 19 organisms that have been isolated from soil or sediment are halophilic or halotolerant.

Aquatic and marine organisms are present in high numbers in the 3,585 m sample, although in lower total numbers than soil and sediment dwelling organisms. With 66 aquatic organisms and 30 marine organisms, the proportions are in line with results from basal and type 1 accretion ice samples. Of the aquatic and marine organisms, 14 were also halophilic or halotolerant. In conjunction with sequences similar to several thermophilic or thermotolerant and psychrophilic or psychrotolerant species, it is possible that the proposed hydrothermal vents in the shallow embayment are releasing plumes of hot water that carry dissolved ions with them as they rise, causing increases in salinity in that region (Bulat et al., 2004; Christner et al., 2008; Shtarkman et al., 2013). The presence of sequences similar to acidophilic and acidotolerant as well as alkaliphilic and alkalitolerant organisms further indicates possible hydrothermal activity. Hydrothermal vents have been shown to release both acidic and alkaline water depending on the exact chemistry of the water within and surrounding the vents (Tivey, 2007).

2.5.5 Summary

Overall, the overlap of sequences similar to organisms found in the analyzed regions of the shallow embayment appears to be minimal compared to the ones unique to each region. Additionally, the presence of sequences similar to thermophilic, psychrophilic, halophilic, acidophilic, and alkaliphilic organisms that have been found in soil, sedimentary, freshwater, and marine environments indicates a diverse set of environmental conditions. These environments are geographically separated by up to 9 km, with each region having dynamic conditions that likely differ greatly from one another. Specifically, the basal ice and the type 2 accretion ice of the 3,585 m sample have both a higher quantity of organisms and a functionally diverse population, while type 2 ice formed over the main basin has a much lower quantity of organisms but a similarly functionally diverse population.

2.5.6 Biochemical Ecosystem

Acquiring nutrients and the cycling of nutrients are of the utmost importance for all the organisms in a particular ecosystem. Because no sunlight reaches the lake, other energy and chemical sources are necessary for organism and ecosystem viability. In addition to carbon fixation and cycling, nitrogen fixation and cycling are vital for the formation of biological molecules, conversion of energy, and prevention of buildup of toxic compounds (Bernhard, 2010). Several other metabolic processes are potentially present in the basal and accretion ice samples, including those related to sulfur, iron, arsenic, hydrogen, phosphorous, and chromium metabolism.

2.5.6.1 Basal Ice

In the basal and accretion ice samples, organisms known to perform almost all parts of the nitrogen cycle were present. In the basal ice, sequences similar to organisms capable of fixing atmospheric nitrogen into ammonia in processes not tied to phototrophic means were found. Additionally, only one of the three nitrogen fixing organisms was associated with plants, and none were obligately symbiotic at a stringency of at least 97% sequence identity. One organism, Candidatus Nitrotoga sp. (100% identity to 16S rRNA), was capable of oxidation of nitrite to nitrate, but no organisms were found that could oxidize ammonia into nitrate. In a previous study, a few organisms were found that could accomplish this conversion (Shtarkman et al. 2013). Some organisms capable of nitrification tie it with carbon fixation, allowing them to fix CO2 during nitrification. However, the energy generated from this process is low (Bernhard, 2010). No sequences similar to any organisms capable of performing anammox were found. The process is linked to Planctomycetes, and the only one that was found in the basal ice sample is not known to perform anammox. A sequence similar to one Planctomycetes, Paludisphaera borealis, was isolated, but it belongs to the order Planctomycetales, which does not contain any organisms shown to perform anammox. Organisms capable of reducing nitrate back into nitrogen gas (denitrification) were prevalent in basal ice. Additionally, one organism that has been noted for its capabilities was present, which would assist in the release of nitrogen back into the ecosystem. Most fungi are capable of decomposing nitrogenous compounds back to ammonia. In fact, they are one of the most common groups of organisms that are active in decomposition to recycle nutrients.

The majority of organisms capable of carbon fixation in the basal ice do so through the reductive pentose phosphate cycle. In cyanobacteria and plant chloroplasts, this pathway requires sunlight to produce the ATP and NADH for carbon fixation from carbon dioxide. However, any process that produces the required ATP and NADH can be used to power this process. Therefore, many of these organisms have the potential to fix carbon from inorganic sources. Additionally, there were three chemolithoautotrophic organisms that were found in the basal ice sample that can

fix inorganic carbon through alternative pathways. Emiliania huxleyi CCMP1516 was able to fix

CO2 into while Ferriphaselus amnicola and Burkholderiales bacterium GJ-E10 fix carbon by oxidizing iron.

Hydrogen oxidation is closely associated with hydrothermal vents because hydrogen is often released in large quantities, produces more energy than methane or sulfur oxidation, and can be used to fix carbon (Petersen et al., 2011). Two organisms were found that oxidize hydrogen in order to fix carbon chemolithoautotrophically: Hydrogenophilus thermoluteolus, which is an extremely thermophilic organism, and Clostridium pasteurianum, which also fixes nitrogen. Another group (Bulat et al. 2004) also found H. thermoluteolus in Vostok accretion ice from the shallow embayment.

Sequences similar to organisms that are known to possess pathways that can metabolize sulfur, iron, arsenic, phosphorous, and hydrocarbons were found in the basal ice. Sulfur is an essential component of biological life, used for forming disulfide bridges in proteins and iron-sulfur clusters in metalloproteins, and needed to form bridging ligands in some enzymes. Sulfur is also used as an electron acceptor and electron donor to assist in metabolic reactions (Sievert et al., 2007). It is no surprise that organisms capable of metabolizing and releasing various forms of sulfur are present in the basal ice, including several organisms that use sulfate as an electron acceptor, one that can cleave dimethylsulfoniopropionate (a ubiquitous compound produced by ), and two that release volatile sulfur compounds. The sulfur cycle is also involved in other nutrient cycles including carbon, nitrogen, phosphorous, and iron (Sievert et al., 2007).

Sequences similar to organisms capable of both oxidizing and reducing iron were found. The iron oxidizing organisms are extremely important as they both tie iron oxidation to carbon fixation in a chemolithoautotrophic lifestyle. Two other organisms are capable of reducing iron, which is used as a replacement for oxygen in the oxidation of carbon compounds (Erbs & Spain, 2002).

Two organisms from the basal ice were found to be capable of oxidizing arsenic, which is frequently released by hydrothermal vents (Stolz et al., 2006). Arsenic can be oxidized to produce energy, and can be an electron donor or acceptor depending on the chemical process. Additionally, it can be used to replace phosphorous in some chemical reactions (Stolz et al., 2006).

Phosphorous is introduced into an ecosystem through the weathering of rocks and hydrothermal vents. No hydrothermal vent activity has been detected under the basal ice, and therefore it is possible that phosphorous is picked up as the glacier erodes the bedrock, from where it can later be introduced into water (Froelich et al., 1982). Three organisms were found to be involved with phosphorous cycling, including Serratia marcescens, which is capable of utilizing inorganic phosphorous, and Microlunatus phosphovorus NM-1, which accumulates poly-phosphorous from the environment. These are both important components of cycling phosphorous in an ecosystem.

Finally, sequences similar to organisms capable of hydrocarbon metabolism were also found in the basal ice. This metabolic capability is not unusual in bacteria. Hydrothermal activity, the main source of hydrocarbons, has not been reported in the regions close to where basal ice is formed, but other sources for hydrocarbons may exist under the ice. And, of course, as organisms die, they release large amounts of hydrocarbons.

2.5.6.2 Accretion Ice

Organisms that are capable of carrying out one or more steps of the nitrogen cycle are all represented in the accretion ice samples. The accretion ice was populated with multiple phyla involved in all parts of the nitrogen cycle. A planctomycete was found in the 3,585 m sample, but no further taxonomic affiliation could be identified. Therefore, it is uncertain whether anammox reactions are taking place in Lake Vostok. However, previous research of the V5 accretion ice sample found Kuenenia stuttgartiensis, a planctomycete capable of performing anammox (Shtarkman et al., 2013). Also, there were several organisms (within the

Alphaproteobacteria, Betaproteobacteria, and Gammaproteobacteria) in the V5 sample that were

(Rogers et al. 2013; Shtarkman et al., 2013). However, only a small subset of

bacteria was capable of nitrification. Therefore, in this study, they may have been overlooked or are underreported taxa, due to small numbers of organisms and sequences. Sequences similar to seventeen organisms that are capable of nitrogen fixation and are known to be free-living in soil and aquatic conditions are present in the accretion ice samples. Sequences similar to a large number of organisms capable of denitrification are also present in the accretion ice samples. These organisms are capable of reducing nitrate, nitrite, ammonia, and ammonium to N2. Reducing nitrate and nitrite into ammonia instead of N2 has been shown to have important consequences for nitrogen availability in an ecosystem because it allows for much faster re-oxidation, enabling the nitrogen to remain in the system longer (Sievert et al., 2007).

Accretion ice samples followed a similar trend as the basal ice. The majority of organisms classified as carbon fixers use the reductive pentose phosphate cycle. Three organisms found in the 3,540 m + 3,569 m sample use alternative carbon fixing pathways. Acidithiobacillus ferrivorans and Thiobacillus denitrificans ATCC 25259 are chemolithoautotrophic, and oxidize sulfur and iron to fix carbon. Previous analysis of the V5 sample found organisms that belong to the phyla Proteobacteria, Chlorobi, Archaeplastida, Chromalveolates, and Cyanobacteria are capable of using either the reductive TCA or the reductive pentose phosphate cycles. In addition, and two Archaea were found that may use the reductive acetyl-CoA pathway to fix carbon

(Rogers et al., 2013; Shtarkman et al., 2013).

Some hydrothermal vents release a wide variety of hydrocarbon compounds. Fischer-Tropsch type (FTT) synthesis is thought to be, at least in part, responsible for abiotic production of simple hydrocarbons such as methane, ethane, and propane. This reaction occurs at temperatures of 390°C and pressures of 400 bars (conditions similar to those that would be found around a hydrothermal vent in Lake Vostok) and is catalyzed by various minerals, including chromite, which has been shown to be an effective catalyst and has been found in both meteoric and accretion ice (Foustoukos & Seyfried, 2004; Christner et al., 2006). The presence of chromite is not unreasonable because chromium is very prevalent in the Earth’s crust, and sequences similar to an organism capable of reducing chromium were found in accretion ice (Emsley, 2001). More elaborate hydrocarbons, reaching gasoline range aliphatic and aromatic hydrocarbons, can be synthesized through the pyrolysis of hemipelagic sediment at high temperatures (Bazylinski et al., 1989). With this in mind, the 35 organisms documented to be capable of metabolizing hydrocarbons, to which sequences in the samples were similar, may be utilizing this ability if hydrothermal vent activity is present in the lake.

Hydrothermal vents are known to release large quantities of sulfur that can provide energy for entire ecosystems by supplying a myriad of sulfur compounds and enabling carbon fixing through the Calvin-Benson cycle (Sievert et al., 2007). Beyond the two strains of Acidithiobacillus ferrivorans and Thiobacillus denitrificans ATCC 25259, which oxidize sulfur to fix carbon, another organism that can reduce sulfur compounds was also found. Massilia putida reduces sulfur to oxidize other organic matter in metabolic reactions. Although hydrocarbon metabolism was not noted for M. putida, sulfur reduction has been shown to be used in metabolizing hydrocarbons that are released by hydrothermal vents to obtain organic carbon (Sievert et al., 2007).

Accretion ice samples from the shallow embayment were found to have sequences similar to both iron oxidizing and iron reducing organisms. The oxidation of ferrous iron into iron oxide, catalyzed by iron oxidizing bacteria such as the two strains of Acidithiobacillus ferrivorans and Thiobacillus denitrificans ATCC 25259, releases energy (primarily as NADH) that the cell can use for metabolic processes. The oxidation of iron has also been reported to occur in organisms that live in hydrothermal ecosystems (Erbs & Spain, 2002). Four iron reducing organisms were also found in the accretion ice samples. Even though the total amount of energy produced from the reduction of iron coupled to the oxidation of carbon is quite low compared to iron oxidation, it is thought to account for a large portion of the carbon oxidation in environments with high levels of reactive iron. The advantage in the reduction of iron is that it is an anaerobic process, which can be used to oxidize many organic carbon compounds (Erbs & Spain, 2002). This process would enable the iron reducing organisms to utilize iron that was oxidized by other bacteria, or to directly use iron oxide released from a hydrothermal vent or deposited by glacial melt.

Sequences most similar to Nocardia farcinica IFM 10152 (86% identity to mRNA for putative acyl-CoA synthetase), Pseudomonas putida (100% identity to 16S rRNA), and Epicoccum nigrum (100% identity to 60S rRNA) were isolated. All three organisms have saprotrophic capabilities that would make them an important part of any potential ecosystem, breaking down organic matter such as proteins, carbohydrates, and lipids into their building blocks. These components will then eventually be returned to the ecosystem to be used by other organisms (Clegg & Mackean, 2006). Many fungi also are important in these processes, and there were sequences from several species of ascomycetes, basidiomycetes, and others in the accretion ice samples. These organisms would, of course, be essential for nutrient cycling in the lake and are part of both the nitrogen and carbon cycles.
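Because the identities reported for these three sequences differ considerably, it can be useful to separate high-identity matches from more tentative ones during manual curation. The following Python snippet is a minimal sketch of such a partition. The identity values are those given in the paragraph above; the function name `split_by_identity` is hypothetical, and the 97% cutoff is borrowed from the basal ice discussion purely for illustration, since the text does not state that this cutoff was applied to these hits.

```python
# (organism, matched gene product, percent identity) as reported in the text.
hits = [
    ("Nocardia farcinica IFM 10152", "mRNA for putative acyl-CoA synthetase", 86.0),
    ("Pseudomonas putida", "16S rRNA", 100.0),
    ("Epicoccum nigrum", "60S rRNA", 100.0),
]


def split_by_identity(hits, min_identity=97.0):
    """Partition hits into those at or above the cutoff and those below it.

    The default cutoff is an illustrative assumption, not the study's stated method.
    """
    confident = [h for h in hits if h[2] >= min_identity]
    tentative = [h for h in hits if h[2] < min_identity]
    return confident, tentative


confident, tentative = split_by_identity(hits)
print("At or above cutoff:", [organism for organism, _, _ in confident])
print("Below cutoff:", [organism for organism, _, _ in tentative])
```

A lower-identity match such as the one to N. farcinica is still informative, but it supports an assignment to a related organism rather than to that species itself.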

Finally, sequences similar to two organisms capable of reducing uranium and one capable of oxidizing uranium were found in accretion ice samples. It has been proposed that the decay of radioactive uranium may be responsible for the 4He found in accretion ice samples (Bulat et al., 2004). This uranium would likely be found in sediment or in the bedrock, which uranium metabolizing organisms could inhabit.

2.5.6.3 Basal Versus Accretion Samples

There are notable similarities in metabolic capabilities between the basal ice and accretion ice samples from the shallow embayment. Sequences similar to organisms involved in the nitrogen cycle and carbon fixation, as well as organisms that can metabolize hydrocarbons, sulfur, and iron, were reported in both samples, although it should be noted that in all cases of similar metabolic capabilities little or no similarity in the species was found. This is common in various ecosystems. For example, animal (including human) gut microbiota may be very different from one individual to another, but the ongoing biochemical processes are remarkably similar (Jiang et al., 2016). Accretion ice from the shallow embayment and basal ice both had complete or nearly complete nitrogen cycles, while the communities of organisms differed significantly. The most notable difference is that no organisms capable of anammox were found in the basal ice sample, while a Planctomyces species possibly capable of anammox and Kuenenia stuttgartiensis, which is capable of anammox, were isolated from the 3,585 m sample and V5 sample, respectively (Rogers et al., 2013; Shtarkman et al., 2013). Both sample types also contained organisms capable of carbon fixation using the reductive pentose phosphate cycle, as well as alternative pathways using iron, sulfur, and hydrogen, or by using a reductive TCA cycle or a reductive acetyl-CoA pathway.

Organisms capable of metabolizing hydrocarbons were reported in both basal and accretion ice, with accretion ice having a higher number of unique organisms, including organisms capable of metabolizing more complex hydrocarbons such as polycyclic hydrocarbons. Hydrothermal vents would be one potential source of the hydrocarbons that these organisms are metabolizing in the shallow embayment. Regions that form basal ice may have potential hydrothermal activity, although none has so far been reported. However, methanogenic archaea have been reported living at the glacier-bedrock interface in several glaciers, so there may be other sources of hydrocarbons for organisms living in the basal ice (Boyd et al., 2010).

Both the basal and accretion samples contained sequences for organisms capable of very similar pathways for reducing and oxidizing sulfur and iron. If these organisms are living in the bedrock and lake these compounds could be supplied by bedrock erosion, glacial ice, other microbes, or hydrothermal activity.

Although no organisms capable of arsenic oxidation or reduction, hydrogen oxidation, or phosphorous metabolism were found in accretion ice samples in this study, the V5 sample had sequences similar to organisms that can perform these functions (Rogers et al. 2013; Shtarkman et al. 2013). Because that sample was in the vicinity of the possible hot water plume from the hydrothermal vent, these organisms may be present in a limited area of the shallow embayment. Both arsenic and hydrogen metabolism are common in bacteria that live near hydrothermal vent activity. Hydrogen oxidation is used in an alternative carbon fixation pathway by carbon fixing bacteria (Stolz et al., 2006; Petersen et al., 2011).

A sequence from a member of the Betaproteobacteria capable of manganese oxidation was present in the basal ice. A sequence that was from a closely related species was reported from the V5 sample from previous work (Rogers et al., 2013). Dissolved manganese is commonly released from hydrothermal vents and is regularly scavenged by bacteria living in hydrothermal vent communities (Cowen et al., 1990).

2.5.7 Possible Energy and Nutrient Sources

There are three possible sources of energy and nutrient input into the lake: glacial melt water, erosion of bedrock, and hydrothermal vents (Karl et al., 1999; Christner et al., 2006, 2008). It is likely that all of these sources are supplying energy or nutrients into Lake Vostok, implying that it is not a closed system but instead is receiving a steady input of biologically vital energy and molecules through multiple sources. Meteoric ice has been shown to contain a variety of biologically relevant nutrients, but they are generally in very low concentrations compared to water that forms accretion ice in the shallow embayment and main basin. Non-purgeable organic carbon concentrations of meteoric ice have been reported to be 16 umol/l, and ion concentrations such as sodium (2.4 umol/l), (0.4 umol/l), (1.09 umol/l), magnesium (0.36 umol/l), chloride (2.8 umol/l), and sulfate (1.8 umol/l) have all been reported (Christner et al., 2006). Other nutrients, including nitrite, nitrate, O2, N2, and carbon compounds have been reported as well. Overall, the amount of nutrients entering Lake Vostok solely through meteoric ice melting would be insufficient to supply an ecosystem that is active and growing, although it would contribute in a measurable fashion to the total. However, it may be just enough to support a static ecosystem under the assumption that no carbon fixation is occurring in the lake (Christner et al., 2006, 2008).

The contribution of nutrients by bedrock erosion through basal melt has been proposed as another method (Karl et al., 1999; Christner et al., 2006). This is thought to occur by basal ice incorporating glacial flour as it moves across bedrock and depositing the sediment into the waters of Lake Vostok as it melts. This process would greatly increase the amount of energy containing compounds available for the myriad of metabolic pathways that have been reported (Siegert et al., 2003; Shtarkman & Koçer et al., 2013; Rogers et al., 2013), especially given the fact that both living and dead microbes have been reported from basal ice. Because basal ice is also of meteoric origin, it carries organisms and nutrients not only from meteoric sources but also from the bedrock. Overall nutrient levels of basal ice have been shown to be marginally higher than those of glacial ice. The amount of nutrients in basal ice is lower than would be expected if bedrock alone were contributing nutrients, but it has been noted that during the refreezing process a large proportion of the solutes would be excluded from the ice. Instead, freezing would likely result in a concentration of nutrients in the liquid water layer between bedrock and basal ice, which eventually flows into the lake (Siegert et al., 2003). Erosion of bedrock would make nutrients such as iron, sulfur compounds, and nitrogen compounds (nitrite and nitrate) available at higher levels, which would enable sustainable growth of the ecosystem, especially if nutrients were being added through meteoric ice as well (Christner et al., 2006).

Finally, a third source of energy in Lake Vostok is potentially provided through hydrothermal activity, and this might provide the largest contribution to the energy and biologically vital chemicals needed by the organisms in the lake. It is thought that the location of hydrothermal activity is in the shallow embayment and not in the main basin, although only a small portion of the lake has been examined. This is supported by the results here, as well as previous reports, which conclude that the accretion ice from the shallow embayment contains much higher concentrations of organisms than does the accretion ice from the main lake basin (Christner et al. 2005; Rogers et al. 2013; Shtarkman et al. 2013). Hydrothermal vent activity was ruled out in the main basin through helium isotope analysis. However, further analyses of more of the lake would have to be completed before a final conclusion on this is made. High ratios of 3He/4He are indicative of crustal thermal activity, i.e. geothermal or hydrothermal activity, while low 3He/4He ratios indicate the opposite. Analysis of accretion ice samples found 3He/4He ratios to be lower than those of meteoric ice samples, indicating that no significant hydrothermal input is happening (Jean-Baptiste et al., 2001). This hypothesis was refuted by Bulat et al. (2004) in several ways. First, the shape of the lake implies deep faults which could enable water to penetrate deeply into the crust, where it would heat up and rise back up. This has been corroborated by other studies indicating that Lake Vostok lies in a graben within a rift valley system, similar to that in Africa. Volcanic and hydrothermal features are common in these regions (Bulat et al., 2004; Leichenkov et al., 2011). Second, even though the ratio of 3He/4He is low, the total quantities of He are very high, indicating that it is being released either by decay of uranium or by vents near or within the shallow embayment. Both theories are supported because organisms capable of reducing uranium were found in this study and the increased He levels are reported early in the accretion ice, which is formed in or near the shallow embayment. Third, 18O is released by minerals in hydrothermal springs and is known to enrich hot springs relative to precipitation. Accretion ice samples have been found to have similar enrichment of 18O, indicating there may be hydrothermal activity. The presence of hydrothermal vents is also supported by evidence of organisms capable of metabolizing compounds commonly released by hydrothermal vents, such as hydrocarbons, sulfur, iron, and arsenic. Evidence of organisms known to be thermophilic, halophilic, acidophilic, and alkaliphilic further supports the possibility that hydrothermal vents play a role in the energy balance of Lake Vostok. It is also supported by finding the genetic signatures of thermophilic and thermotolerant organisms primarily in the accretion ice from the shallow embayment (Bulat et al. 2004; Rogers et al. 2013; Shtarkman et al. 2013).

Overall, it is likely that the nutrients and energy for Lake Vostok do not come from one source, but instead from multiple sources, and that each of the organisms in the lake has adapted to the available energy and nutrients, as well as to the community of organisms surrounding it. Energy input from meteoric ice, basal melt containing bedrock or other sediment, and hydrothermal activity all probably play a part in the energy supply. Many of the elements supplied by each are similar, but the variety of nutrients found in accretion ice points to multiple sources.

2.5.8 Zones in Lake Vostok

Metagenomic and metatranscriptomic evidence suggests there are several zones present in Lake Vostok. Sequences similar to thermophiles, psychrophiles, halophiles, acidophiles, and alkaliphiles have been isolated from both basal and accretion ice samples. In addition, sequences similar to organisms that have been demonstrated to inhabit freshwater, marine environments, sediment, and soil indicate there are likely conditions similar to these present in the lake. The data present a

complex environment in Lake Vostok, especially within the shallow embayment. The

extremophiles that live within the lake and surrounding areas are diverse and complex as well.

While most organisms are unicellular, there are many multicellular species present.

There is a large body of metagenomic and metatranscriptomic evidence that geothermal

heating or hydrothermal vents are present in Lake Vostok. Sequences similar to thermophilic organisms, including some that require high temperatures to survive, have been found. For instance, sequences similar to Hydrogenophilus thermoluteolus have been found in this research, as well as in others (Bulat et al., 2004; Rogers et al., 2013; Shtarkman et al., 2013).

Sequences from another organism in the same family as H. thermoluteolus were found by

Shtarkman et al. (2013) and Rogers et al. (2013). Sequences similar to another thermophile, Desulfurella acetivorans, which has an optimal growth temperature of 52 to 57°C and is found around hydrothermal vents, were isolated in this study. From the accretion ice, sequences similar to Methyloceanibacter caenitepidi were isolated. This organism is not as thermophilic as those previously mentioned, but it is a methane metabolizing organism that is found around hydrothermal vents. Sequences similar to other extremophiles that inhabit acidic and alkaline conditions were also found, which further supports the possibility of hydrothermal activity in Lake Vostok.

Interestingly, the highest proportion of thermophiles was found in a sample (V5) that included accretion ice meltwater from the mixture of the 3,563 m and 3,585 m core sections. However, in the surrounding ice core sections (the mixture of the 3,540 m and 3,569 m core sections) and the individual 3,585 m section, the proportion of halophilic and halotolerant organisms was greater, but the proportion of thermophilic and thermotolerant organisms was lower. This can be explained if a hydrothermal plume of hot water were present near the middle of the shallow embayment. Hot water would flow out from this region and mix with the cooler

lake water, forming a temperature gradient surrounding the plume. Salting out of some of the

compounds would be expected as the water cooled, which would increase the salinity of the

sediments surrounding the plume. Input of microbes and nutrients would occur on the western

half of the embayment, due to melting of the basal ice, while the eastern side of the embayment

would contain microbes and nutrients from the hot water plume, organisms that live in

cooler and clearer water, and some organisms from the main basin.

Beyond ecological conditions, sequences similar to organisms capable of metabolizing

hydrocarbons, sulfur, iron, arsenic, hydrogen, and chromium also indicate the presence of compounds known to be either released by or associated with hydrothermal vents. Even if these organisms are residing in the lake, they are not necessarily using these metabolic pathways, but the documented presence of sulfate, nitrates, and iron means that these compounds would be available to organisms in the lake. This is further supported by the low level of nitrates found in accretion ice relative to glacial ice samples, which potentially indicates that nitrate metabolism is occurring in the lake (Priscu et al., 1999).

Furthermore, the carbon fixing organisms that utilize alternative pathways rely on the oxidation

and reduction of sulfur, nitrogen, iron, and hydrogen (Hügler & Sievert, 2011). All of these

components are known to be released in high enough quantities to support a hypothetical

ecosystem in Lake Vostok. It has been proposed by Christner et al. (2006) that the inputs from

meteoric ice melt and glacial flour would be sufficient to support an ecosystem without

geothermal input. This proposal does not rule out the presence of hydrothermal activity but does

show that locations in the lake without hydrothermal input would likely have sufficient nutrient

input through other means.

The limnologic and geological evidence supporting the presence of aquatic and marine

environments in Lake Vostok has been reported in previous studies (De Angelis et al., 2005;

Christner et al., 2006; Rogers et al., 2013; Shtarkman et al., 2013). Although Lake Vostok is currently covered by approximately 4 km of ice, this has not always been the case. Between 34 million and 14 million years ago, the glaciers that comprise the East Antarctic Ice Sheet oscillated, repeatedly covering and uncovering Lake Vostok. Lake Vostok currently is below sea level and is surrounded by ridges between 20 and 50 m above the current sea level. During the periods of glacial oscillation, sea levels could reach between 50 and 100 m higher than today, effectively rendering Lake Vostok an Antarctic bay (Young et al., 2011). Once the lake was covered by a glacier for the last time, sea water would have been trapped. Over time, glacial meltwater, which is fresh water, entered the lake from the top, creating an ion gradient or a saline "lake" below the freshwater, a common occurrence in other subglacial lakes, as well as in oceans worldwide (Brambilla et al., 2001; Boetius & Joye, 2009). This is reflected in analysis of accretion ice samples, which has yielded evidence for a layer of salt water on the bottom of the lake. Extremely small brine droplets with about 1.5% salinity, much higher than the 0.05 to 0.3% salinity of the freshwater, have been found in accretion ice. In the same samples, mineral inclusions associated with sediment and hydrothermal vent activity carrying brine water were also isolated (De Angelis et al., 2005). This, coupled with the presence of sequences from many marine species, suggests a marine layer in Lake Vostok.

The presence of these conditions is supported by the finding of metagenomic and metatranscriptomic evidence of aquatic and marine organisms, including halophilic or halotolerant organisms. Of all organisms with a documented habitat, about 31% of basal ice organisms and about 19% of accretion ice organisms inhabited freshwater or marine conditions. Halophilic and halotolerant organisms were also found in both basal and accretion samples, including Dunaliella salina (isolated from basal ice), which is famous for being one of the most halophilic algae on Earth (Boetius & Joye, 2009).

Finally, there are likely sedimentary environments in and around Lake Vostok. Lake

Vostok is deep enough that, even during the period of glacial oscillation, the sediments at the bottom of Lake Vostok may have remained intact for over 30 million years. The time span and depth of Lake Vostok likely mean there is a deep layer of sediment at the bottom of the lake (Siegert

et al., 2001), as well as an ancient salt water layer. The sediment layer lining the bottom of the

embayment is probably different from that in the main basin, because it is receiving influx from the basal ice, as well as from hydrothermal sources. Sedimentary inclusions made of a myriad of minerals, including biotite, quartz, and potassium feldspar, among many others, are found in basal and accretion ice samples (Priscu et al., 1999). Not only did Lake Vostok have

millions of years to accumulate sediment before it was covered by a glacier, but deposition of

sediment is still going on today. This is corroborated by the high proportion of sequences similar

to soil or sediment dwelling organisms in basal and accretion ice samples. Approximately 29%

of basal ice organisms are categorized into soil or sediment habitats. The proportion of soil and sediment dwelling organisms in the accretion ice is higher than in the basal ice sample, at 39% in the 3,540 + 3,569 m sample and 41% in the 3,585 m sample. These sedimentary organisms are associated with both freshwater and sea water sediment, and include extremophiles associated with salinity and temperature. They also are indicative of mixing in this area, which may be the location of hydrothermal vents at the bottom of the shallow embayment that are causing turbulence. A variety of ecosystems, including hydrothermal vents, layers of fresh and salt water, layers of sediment, and environmental extremes, are all supported

by previous and contemporary research (Siegert et al., 2001; Bulat et al., 2004; De Angelis et al.,

2005; Rogers et al., 2013; Shtarkman et al., 2013).
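The habitat percentages quoted in this section are proportions of the organisms that have a documented habitat. As a minimal sketch of that arithmetic, the Python below computes per-category percentages while excluding organisms with no documented habitat. The function name, the category labels, and the toy input are illustrative assumptions; they are not the scripts or the data used in this study.

```python
from collections import Counter

def habitat_percentages(habitats):
    """Return the percentage of organisms in each habitat category.

    `habitats` holds one entry per organism; None marks an organism with
    no documented habitat, which is left out of the denominator.
    """
    documented = [h for h in habitats if h is not None]
    if not documented:
        return {}
    counts = Counter(documented)
    total = len(documented)
    return {habitat: round(100.0 * n / total, 1) for habitat, n in counts.items()}

# Toy input (made up for illustration, not data from this thesis).
toy_habitats = ["soil/sediment", "freshwater/marine", None, "soil/sediment", "other"]
percentages = habitat_percentages(toy_habitats)
```

Excluding undocumented organisms from the denominator matters when comparing samples, because a sample with many poorly characterized hits would otherwise appear to have lower proportions in every category.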

2.5.9 Multicellular Organisms

Previous studies of Lake Vostok have found evidence to suggest the presence of

multicellular organisms in the water such as sequences similar to Fungi, Animalia, and Protists.

Furthermore, sequences similar to bacteria associated with aquatic and marine multicellular

organisms have also been reported (Rogers et al. 2013; Shtarkman et al., 2013).

Sequences similar to multicellular organisms, as well as bacteria associated with aquatic

or marine organisms were present in basal and accretion ice samples. In the basal ice, sequences

similar to eukaryotic organisms from Apicomplexa, Arthropoda, Ascomycota, Bacillariophyta,

Chlorophyta, Haptophyta, Streptophyta, and Euglenozoa were found. All of the sequences

related to Apicomplexa were from the genus Plasmodium. All 4 species are obligate parasites known to cause disease in vertebrates and insects.

Sequences similar to two arthropods (Gryllus bimaculatus and Linepithema humile) were found. While it is possible that related aquatic species exist, or that they lived on the margins of the lake prior to glaciation, it is more likely that pieces of these organisms became airborne, landed on the glacier long ago, and were eventually delivered to the lake by the glacier.

Sequences similar to several different species of Ascomycota, predominantly strains of Saccharomyces cerevisiae, were isolated, all of which are naturally found in soil. One source of these sequences could be dust that was blown in from South America and trapped in glacial ice. Another potential source is the bedrock, where these organisms could be residing. The presence of these sequences is not surprising, as previous studies on Lake Vostok by D’Elia et al. (2008,

2009) reported the isolation of many cultures of ascomycetes, as well as some basidiomycetes.

Furthermore, sequences from ascomycetes and basidiomycetes were reported in Shtarkman et al.

(2013) and Rogers et al. (2013).

Sequences similar to organisms in several phyla of photosynthetic organisms including

Bacillariophyta, Chlorophyta, and Haptophyta were found in the basal ice. Algae, diatoms, stramenopiles, alveolates, and other protists have been reported in many subglacial lakes including , Lake Hoare, , and Lake Vostok in previous studies

(Christner et al., 2014; Dudeja et al., 2012; Rogers et al. 2013; Shtarkman et al., 2013).

Bacillariophyta is a phylum of photosynthetic diatoms that are ubiquitously found in icy, aquatic, and marine environments, including those near the poles and in subglacial lakes (Kellogg & Kellogg, 2005; Scherer et al., 2004). The sequences similar to these organisms could come from a variety of sources. Due to their long evolutionary history and ubiquity, it is not unreasonable for these organisms to be the descendants of diatoms that were living in Lake Vostok before it was covered by a glacier 15 million years ago. Although the lack of light would prevent them from exhibiting photosynthetic activity, diatoms are well known for their heterotrophic capabilities and can survive on a variety of organic carbon sources (Hellebust & Lewin, 1977). Also, their numbers in glacial ice are so great that it is likely that they are being constantly delivered to the lake by the glacier (Kellogg & Kellogg, 2005). Sequences similar to organisms from the phylum Chlorophyta, including the psychrophilic species Chlamydomonas sp. UWO241 (100% identity to a gene of unknown function), were found. As with the Bacillariophyta, Chlorophyta have been reported in a variety of subglacial lakes in Antarctica, including organisms from the genus Chlamydomonas (Dudeja et al., 2012). The single Haptophyta,

Emiliania huxleyi CCMP1516, is a coccolithophore that is prominent throughout the world’s oceans, is known to fix carbon, and serves an important role in sulfur cycling. It may have been an

original resident of Lake Vostok, or it could have been blown onto the glacier after being

airborne in sea foam.

A myriad of sequences similar to species belonging to Streptophyta were isolated from the

basal ice, all of which except one had sequence identities greater than 99% to either ribosomal

RNA or genes of unknown function. Some were similar to agricultural species, such as Brassica

napus (100% identity to 26S rRNA), Raphanus sativus (100% identity to 26S rRNA), and Vicia

faba (100% identity to 26S rRNA). Others were wild species, such as Ammopiptanthus

mongolicus (100% identity to gene of unknown function), Silene vulgaris (100% identity to 26S

rRNA), and Senna occidentalis (100% identity to 26S rRNA), which are found throughout the

world. The most likely source of these sequences is pollen that was trapped in the glacier and over time was deposited into the lake by the glacier. Alternatively, the pollen (or pieces

of the plants) could have been deposited in the sediment of the lake prior to glaciation.

Finally, a variety of sequences similar to Euglenozoa were isolated from the basal ice.

The most notable is a sequence 100% identical to 24S alpha rRNA of Trypanosoma triglae

because this species is known to infect marine teleosts, an infraclass of Neopterygii. In a previous study,

sequences from several species of bacteria that inhabit fish gastrointestinal systems were found

in the accretion ice from the shallow embayment (Rogers et al., 2013; Shtarkman et al., 2013).

These sequences support the hypothesis that fish may exist in Lake Vostok.

In addition to the eukaryotic organisms, there are also sequences similar to seven bacteria associated with eukaryotes that were found in the basal ice. The most numerous of these are from the genus Flavobacterium, all of which except for one had 100% identity to 23S ribosomal RNA, with the exception being 100% identical to a gene of unknown function. Four of the Flavobacterium (F. psychrophilum, F. columnare ATCC 49512, F. columnare, and F. branchiophilum) are known to cause disease in fish. The most notable of these four is F. psychrophilum, which is psychrophilic and known to inhabit freshwater conditions. A sequence similar to Flavobacterium crassostreae, which is known to infect Pacific mollusks, was also found. Sequences from small marine mollusks were reported previously from accretion ice from the shallow embayment (Rogers et al., 2013; Shtarkman et al., 2013). Beyond disease causing bacteria, sequences from two nonpathogenic marine organisms were found. Sequences similar to the gammaproteobacterium Pseudomonas xanthomarina, which has been isolated from marine ascidians, were found but had an identity of 95% for 16S rRNA, and are therefore probably from a related species. The other was Echinicola strongylocentroti, which has been isolated from sea urchins, with a sequence identity of 96% to 23S rRNA, which is below the 97% cutoff, and is therefore likely from a closely related species. Overall, the bacteria found indicate the possible presence of eukaryotic species. It is unknown whether any are obligately associated with any eukaryotic species or whether they are living separately from eukaryotes.
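The way the 97% rRNA identity cutoff is applied in the paragraph above can be illustrated with a short Python sketch. Only the 97% value and the 95% and 96% identities come from the text; the function name, the inclusive treatment of exactly 97%, the output wording, and the generic 100% example are assumptions made for illustration and do not represent the scripts used in this study.

```python
# Cutoff value taken from the text; everything else here is a hypothetical helper.
RRNA_SPECIES_CUTOFF = 97.0  # percent identity

def interpret_rrna_hit(organism, percent_identity, cutoff=RRNA_SPECIES_CUTOFF):
    """Label an rRNA alignment as species-level or as a probable relative.

    Assumption: an identity equal to the cutoff counts as species-level.
    """
    if not 0.0 <= percent_identity <= 100.0:
        raise ValueError("percent identity must be between 0 and 100")
    if percent_identity >= cutoff:
        return f"{organism}: consistent with this species"
    return f"{organism}: likely a related species"

hits = [
    ("Pseudomonas xanthomarina", 95.0),      # 16S rRNA identity reported above
    ("Echinicola strongylocentroti", 96.0),  # 23S rRNA identity reported above
    ("Flavobacterium sp.", 100.0),           # generic example of a 100% hit
]
labels = [interpret_rrna_hit(name, identity) for name, identity in hits]
```

A fixed cutoff is a coarse rule, so hits just below it are reported as relatives rather than discarded, which mirrors how the two nonpathogenic marine organisms are treated in the text.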

Accretion ice samples yielded sequences similar to a variety of eukaryotic organisms, as well as bacteria associated with eukaryotes. The potential eukaryotes include those within:

Arthropoda, Ascomycota, Basidiomycota, Chlorophyta, Chordata, Platyhelminthes,

Streptophyta, Euglenozoa, Nematoda, Amoebozoa, Ciliophora, Heterokontophyta, and

Ochrophyta. The Arthropoda linked sequences are most similar to Linepithema humile and

Gryllus bimaculatus, the same organisms from basal ice, as well as Apis mellifera. Therefore, these may be entering the lake from the melting basal ice. All sequences had at least 97% identity but were all linked to either genes of unknown function or uncharacterized mRNA sequences.

A very large variety of sequences similar to Ascomycota and two Basidiomycota were found in the accretion ice. Sequences similar to organisms from the same families have been found in previous studies on accretion ice from Lake Vostok (D’Elia et al., 2009; Shtarkman et al., 2013; Rogers et al., 2013). These fungi may have come from soil or sediment that entered Lake Vostok before it was covered by glacial ice, and may have been living there ever since.

Another possible source is glacial melt water that is deposited into the lake on a regular basis (D’Elia et al., 2009). None of the fungi associated with these sequences are known to be particularly psychrophilic. Therefore, it is likely either that they are living in a portion of the lake or embayment that has moderate water temperatures, or that they have adapted to a cold environment. Most of the fungi that have been isolated from ice core sections are adaptable to a variety of temperatures from 4°C to 22°C (D'Elia et al., 2009).

Only one sequence from a member of Chlorophyta was found in accretion ice samples,

Monoraphidium neglectum, which was found in both the 3,540 m + 3,569 m and 3,585 m samples. This organism is known for its ability to produce fatty acids, an ability that is not reduced even when it is living heterotrophically. This could provide compounds to the hydrocarbon degrading organisms (Bogen, 2013).

The remaining organisms are a mix of various species including Notothenia coriiceps

(92% identity to an mRNA for N-lysine methyltransferase SETD8-A), also known as black rock cod. As mentioned previously, sequences from several bacteria that inhabit fish gastrointestinal tracts were found in the accretion ice from the shallow embayment, and fish have been reported living in subglacial environments (Rogers et al., 2013; Shtarkman et al., 2013; Fox, 2015; Fox, 2018). While the species of fish probably differs from that mentioned above because

of the low sequence identity, this provides further support for the contention that fish live in

Lake Vostok.

Two sequences similar to a single Platyhelminthes, Spirometra erinaceieuropaei, a

species of tapeworm, were found in the accretion ice samples. This species in particular is known

for infecting mammals but has a complex life cycle. It must first infect a copepod during its larval stage; it then infects a reptile, amphibian, or fish, and then a mammal host. Sequences of copepods were reported in the accretion ice from Lake Vostok, so that host probably exists in the lake (Rogers et al., 2013; Shtarkman et al., 2013). It has been reported that it must enter a mammalian host to reproduce (Bennett et al., 2014). If fish are living in the lake, they could provide the means for this species to reproduce. However, this might be a related species that is poorly studied, and as such its life cycle might be unknown.

A sequence similar to Parastrongyloides trichosuri, a species of Nematoda, was also found in accretion ice, aligning with a gene of unknown function at 96% identity. It is a parasitic worm species but is also capable of free living in soil.

A wide variety of sequences similar to Streptophyta were identified from all accretion ice

samples examined to date. It should be noted that none of the sequences aligned with ribosomal

sequences but instead with mRNA or genes of unknown function. Many of the possible

organisms are agricultural species such as Ananas comosus (pineapple), Arachis hypogaea

(peanut), and Coffea arabica (coffee), to name a few. Other sequences aligned with wild Streptophyta

such as Erythranthe guttata (seep monkeyflower) and Phoenix dactylifera (Egyptian starcluster).

The presence of these sequences in the ice is likely the result of pieces of the plants or pollen

becoming airborne, and eventually being trapped in the glacier, or from deposition in the lakebed

sediment prior to glaciation.

The sequences similar to two genera of Amoebozoa are from species that have been

found in soil and freshwater. Although they are not particularly known for living in cold

climates, it is likely that if these organisms are living in the lake they are relatives of these

species. They were possibly seeded into the lake prior to it being covered by glacial ice and have

been living there ever since.

A sequence 100% identical to a gene of unknown function of a Euglenozoa,

Trypanosoma grayi, was isolated. This organism is known to parasitize tsetse flies. However, it is most likely that this sequence is from a species close to T. grayi. Other Trypanosoma

organisms are known to infect fish from the families Clinidae, Blenniidae and Gobiidae (Hayes

et al., 2014). It is more likely that the sequence identified in this study is from a relative of a fish or arthropod infecting Trypanosoma.

Sequences aligning at 100% identity to 40S rRNA of two organisms belonging to Ciliophora were also identified. Both organisms are known to inhabit sediment in or near marine or brackish waters.

Sequences similar to an organism from Heterokontophyta and another from Ochrophyta were

also isolated. The heterokont, Saprolegnia parasitica CBS 223.65, is an aquatic parasite known

to infect fish, again indicating that fish may exist in the lake. The Ochrophyta, Chrysopodocystis

socialis, is a photosynthetic marine organism. As with Amoebozoa, it is likely that these organisms are the result of seeding from sediment near Lake Vostok prior to glaciation, and that they are able to parasitize hosts in the lake.

Sequences similar to two Bacteria associated with aquatic organisms were isolated from accretion ice samples. The first sequence aligned with 95.2% identity to mRNA for GTP-binding protein from Microbacterium paraoxydans. This environmental organism is known to cause disease in fish (Soto‐Rodriguez et al., 2013). The other sequence was similar to

Hydrogenophaga crassostreae with a 95% identity to 23S rRNA. This species is associated with

Crassostrea gigas, a species of Pacific oyster. Although the identity is low for taxonomic identification based upon ribosomal RNA, it may signal the presence of mollusks in Lake

Vostok. Sequences from deep sea marine clams were previously found in accretion ice from the shallow embayment (Rogers et al. 2013; Shtarkman et al. 2013).

2.5.10 Lake Vostok Ecosystem

Evidence of organisms involved with vital nutrient cycles, such as the nitrogen cycle and the carbon cycle, indicates that active nutrient cycling is occurring in Lake Vostok. Sequences from basal ice aligned with enough organisms to complete almost all parts of the nitrogen cycle.

Likewise, sequences from accretion ice samples aligned with organisms involved in all parts of the nitrogen cycle (Rogers et al., 2013; Shtarkman et al., 2013). Sequences similar to organisms that can fix carbon through common pathways such as the reductive pentose phosphate cycle (C3 cycle) or the C4-dicarboxylic acid pathway, and through alternative pathways using the reductive TCA cycle or the reductive acetyl CoA cycle, or from oxidation of iron and sulfur, have been found in both basal and accretion ice (Erbs & Spain, 2002; Sievert et al., 2007; Rogers et al., 2013; Shtarkman et al., 2013). Furthermore, research has shown that the input of nutrients occurs in a consistent fashion through a combination of glacial ice melting, bedrock erosion, and hydrothermal vent activity (Karl et al., 1999; Bulat et al., 2004; Christner et al., 2006, 2008). These energy and nutrient inputs are supported by the organisms found in this study, as well as in previous reports, that can metabolize arsenic, sulfur, uranium, hydrogen, iron, nitrogen, chromium, and hydrocarbon compounds, which are added into the lake through melting, erosion, and hydrothermal activity (Rogers et al., 2013; Shtarkman et al., 2013).

Evidence also suggests that multicellular organisms exist in Lake Vostok. Direct evidence comes from sequences in accretion ice that are similar to those from aquatic, marine, and sediment dwelling organisms from Animalia, Ascomycota, Basidiomycota, Bacillariophyta, Chlorophyta, Amoebozoa, Ciliophora, Heterokontophyta, Haptophyta, and Ochrophyta. This indicates there is likely multicellular life in the shallow embayment, including fungi, protists, and animals (arthropods, other invertebrates, and vertebrates). Indirect evidence comes from sequences in basal and accretion ice samples that are similar to bacteria that live on or in multicellular organisms, such as Pseudomonas xanthomarina, Echinicola strongylocentroti, and Hydrogenophaga crassostreae.

Finally, the organisms that could potentially be living in the lake generally do not align

with possible contamination from outside sources. Possible laboratory and reagent contamination was detected through the inclusion of a negative control, which was treated identically to the other samples that were prepared. Any organisms in the negative control whose sequence was > 99% identical were removed from the data sets of all samples. In addition, it is impossible to imagine how contamination would have incorporated sequences similar to a wide variety of thermophiles, psychrophiles, halophiles, acidophiles, alkaliphiles, aquatic and marine organisms, and others into all of our ice core section samples (Bulat et al., 2004; Shtarkman et al., 2013; Rogers et al., 2013).
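The negative control filter described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used in this study: it assumes each hit is an (organism, percent identity) pair, that an organism is flagged when its negative control hit exceeds 99% identity, and that flagged organisms are then dropped from every sample. The function name, the data layout, and the placeholder organism names are hypothetical.

```python
def remove_control_organisms(samples, control_hits, threshold=99.0):
    """Drop organisms flagged by the negative control from every sample.

    samples:      dict mapping sample name -> list of (organism, percent_identity)
    control_hits: list of (organism, percent_identity) from the negative control
    threshold:    control hits strictly above this identity flag the organism
    """
    flagged = {organism for organism, identity in control_hits if identity > threshold}
    return {
        name: [hit for hit in hits if hit[0] not in flagged]
        for name, hits in samples.items()
    }

# Placeholder data (made up for illustration, not results from this thesis).
control = [("organism A", 99.6), ("organism B", 98.0)]
samples = {
    "sample 1": [("organism A", 100.0), ("organism C", 97.5)],
    "sample 2": [("organism B", 99.2)],
}
filtered = remove_control_organisms(samples, control)
```

In this sketch the threshold is applied to the control hit rather than to the sample hit, so an organism that appears in the control only at low identity (organism B here) is retained in the samples.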

2.5.11 Conclusions

Based on the extensive metagenomic, metatranscriptomic, cultural, microscopic, and other results presented here and elsewhere (D'Elia et al., 2008, 2009; Rogers et al., 2013;

Shtarkman et al., 2013), Lake Vostok is not only a repository of genomic and transcriptomic information that has been deposited over time, but also a dynamic ecosystem that experiences inputs of energy and nutrients, leading to a diverse biological community that consists mainly of bacteria but also includes a diverse set of eukaryotic organisms, including complex multicellular species. The results indicate that there is a small contribution of species from the melting of the basal ice into the lake water, but that most of the organisms are residents of the lake. A large proportion of the species exist in the shallow embayment, which itself consists of several zones. Hydrothermal influences appear highest near the middle of the shallow embayment, while cooler zones exist surrounding the plume(s) of heated water. While some organisms appear to be entering the lake from melting of the basal ice, the vast majority only begin to appear, sometimes in large numbers, within the lake water. These organisms are especially numerous and diverse in the accretion ice from the shallow embayment. After the glacier passes the peninsula and is in contact with the lake water over the main lake basin, the biological community changes dramatically from one that contains thermophiles, acidophiles, alkaliphiles, and a mix of aquatic, marine, and sediment species, to one that contains primarily psychrophiles and aquatic species. The quantity and diversity of species also drop greatly in accretion ice over the main basin. The ecosystem may have some similarities with other subglacial environments, but because of the combination of glacial input, pressure, and hydrothermal activity, it may be unique among deep subglacial environments.

2.6 Chapter 2 References

Abyzov, S., Poglazova, M., Mitskevich, J., & Ivanov, M. (2005). Common Features of

Microorganisms in Ancient Layers of the Antarctic Ice Sheet. In J. D. Castello & S. O. Rogers (Eds.), Life in Ancient Ice (pp. 240-250). Princeton, NJ: Princeton University

Press.

Bazylinski, D. A., Wirsen, C. O., & Jannasch, H. W. (1989). Microbial Utilization of Naturally

Occurring Hydrocarbons at the Guaymas Basin Hydrothermal Vent Site. Appl. Environ.

Microbiol., 55(11), 2832-2836.

Bell, R. E., Studinger, M., Tikku, A. A., Clarke, G. K., Gutner, M. M., Meertens, C. (2002).

Origin and Fate of Lake Vostok Water Frozen to the Base of the East Antarctic Ice Sheet.

Nature, 416(6878), 307.

Bell, R., Studinger, M., Tikku, A., Castello, J. D. (2005) Comparative Biological Analyses Of

Accretion Ice from Subglacial Lake Vostok. In J. D. Castello & S. O. Rogers (Eds.), Life

in Ancient Ice (pp. 251-267). Princeton, NJ: Princeton University Press.

Bennett, H. M., Mok, H. P., Gkrania-Klotsas, E., Tsai, I. J., Stanley, E. J., Antoun, N. M.,

Coghlan, A., Harsha, B., Traini, A., Ribeiro, D. M., Steinbiss, S., (2014). The Genome of

the Sparganosis Tapeworm Spirometra erinaceieuropaei Isolated From the Biopsy of a

Migrating Brain Lesion. Genome Biol., 15(11), 510.

Bernhard, A. (2010) The Nitrogen Cycle: Processes, Players, and Human Impact. Nature Educ.

Knowl. 3(10):25

Boetius, A., Joye, S. (2009). Thriving in salt. Science, 324(5934), 1523-1525.

doi:10.1126/science.1172979

Bogen, C. (2013). Systematic Identification of Microalgal Species for Lipid Production and

Genome Based Molecular Characterization of the Oleaginous Microalga Monoraphidium

neglectum (Doctoral dissertation). Available from ProQuest Dissertations Publishing.

Boyd, E. S., Skidmore, M., Mitchell, A. C., Bakermans, C., Peters, J. W. (2010).

Methanogenesis in Subglacial Sediments. Environmental microbiology reports, 2(5), 685-692.

Brambilla, E., Hippe, H., Hagelstein, A., Tindall, B. J., Stackebrandt, E. (2001). 16S rDNA

Diversity of Cultured and Uncultured Prokaryotes of a Mat Sample from Lake Fryxell,

McMurdo Dry Valleys, Antarctica. Extremophiles, 5(1), 23-33.

Bulat, S. A., Alekhina, I. A., Blot, M., Petit, J. R., De Angelis, M., Wagenbach, D., Lipenkov, V.

Y., Vasilyeva, L. P., Wloch, D. M., Raynaud, D., Lukin, V. V. (2004). DNA Signature of

Thermophilic Bacteria from the Aged Accretion Ice of Lake Vostok, Antarctica:

Implications for Searching for Life in Extreme Icy Environments. Intl. J. Astrobiol., 3(1),

1-12.

Bulat, S. A., Alekhina, I. A., Lipenkov, V. Y., Lukin, V. V., Marie, D., Petit, J. R. (2009). Cell

Concentrations of Microorganisms in Glacial and Lake Ice of the Vostok Ice Core, East

Antarctica. Microbiol., 78(6), 808-810.

Castello, J. D., Rogers, S. O. (2005). Life in Ancient Ice. Princeton, NJ: Princeton University

Press.

Christner, B., Royston-Bishop, G., Foreman, C., Arnold, B., Tranter, M., Welch, K., Lyons, B.,

Tsapin, A., Studinger, M., Priscu, J. (2006). Limnological Conditions in Subglacial Lake

Vostok, Antarctica. Limnol. Oceanog., 51(6), 2485-2501.

Christner, B. C., Skidmore, M. L., Priscu, J. C., Tranter, M., Foreman, C. M. (2008). Bacteria in

Subglacial Environments. In R. Margesin (Ed.), Psychrophiles: from Biodiversity to

Biotechnology (pp. 51-71). Berlin, Heidelberg: Springer.

Christner, B.C., Priscu, J.C., Achberger, A.M., Barbante, C., Carter, S.P., Christianson, K.,

Michaud, A.B., Mikucki, J.A., Mitchell, A.C., Skidmore, M.L. and Vick-Majors, T.J.

(2014). A Microbial Ecosystem Beneath the West Antarctic Ice Sheet. Nature,

512(7514), 310.

Cowen, J. P., Massoth, G. J., Feely, R. A. (1990). Scavenging Rates of Dissolved Manganese in

a Hydrothermal Vent Plume. Deep Sea Res. Part A, Oceanog. Res. Pap., 37(10), 1619-

1637.

De Angelis, M., Morel‐Fourcade, M. C., Barnola, J. M., Susini, J., Duval, P. (2005). Brine

Micro‐Droplets and Solid Inclusions in Accreted Ice from Lake Vostok (East Antarctica).

Geophys. Res. Lett., 32(12).

D'Elia, T., Veerapaneni, R., Rogers, S. O. (2008). Isolation of Microbes From Lake Vostok

Accretion Ice. Appl. Environ. Microbiol., 74(15), 4962-4965. doi:10.1128/AEM.02501-07

D'Elia, T., Veerapaneni, R., Theraisnathan, V., Rogers, S. O. (2009). Isolation of Fungi from

Lake Vostok Accretion Ice. Mycologia, 101(6), 751-763.

Denton, G. H., Sugden, D. E., Marchant, D. R., Hall, B. L., & Wilch, T. I. (1993). East Antarctic

Ice Sheet Sensitivity to Pliocene Climatic Change from a Dry Valleys Perspective.

Geografiska Annaler. Series A, Physical Geography, 75(4), 155-204.

Dudeja, S., Bhattacherjee, A. B., Chela-Flores, J. (2012). Antarctica as Model for the Possible

Emergence of Life on Europa. In A. Hanslmeier, S. Kempe, J. Seckbach (Eds.), Life on

Earth and other Planetary Bodies (pp. 407-419). Dordrecht, Netherlands: Springer.

Emsley, J. (2001). Nature's building blocks: An A-Z guide to the elements. Oxford University

Press, New York.

Erbs, M., & Spain, J. (2002). Microbial Iron Metabolism in Natural Environments. Microbial

Diversity, 2002.

Foustoukos, D. I., Seyfried, W. E. (2004). Hydrocarbons in Hydrothermal Vent Fluids: The Role

of Chromium-Bearing Catalysts. Science, 304(5673), 1002-1005.

Fox, D. (2015). Fish live beneath Antarctica. Nature. https://doi.org/10.1038/nature.2015.16772

Fox, D. (2018). The Hunt for Life Below Antarctic Ice. Nature, 564(7735), 180–182.

https://doi.org/10.1038/d41586-018-07669-3

Froelich, P. N., Bender, M. L., Luedtke, N. A., Heath, G. R., DeVries, T. (1982). The Marine

Phosphorus Cycle. Am. J. Sci., 282(4), 474-511.

Grant, W. D., Oren, A., Ventosa, A. (1998). Proposal of strain NCIMB 13488 as neotype of

Halorubrum trapanicum. Request for an Opinion. International Journal of Systematic and

Evolutionary Microbiology, 48(3), 1077-1078.

Habe, H., Omori, T. (2003). Genetics of Polycyclic Aromatic Hydrocarbon Metabolism in

Diverse Aerobic Bacteria. Biosci. Biotech. Biochem., 67(2), 225-243.

Hayes, P. M., Lawton, S. P., Smit, N. J., Gibson, W. C., Davies, A. J. (2014). Morphological and

Molecular Characterization of a Marine Fish Trypanosome from South Africa, Including

its Development in a Leech Vector. Parasit. Vectors, 7, 50.

Hellebust, J. A., Lewin, J. (1977). Heterotrophic Nutrition. In D. Werner (Ed.), The Biology of

Diatoms (pp. 169-197). Berkeley, CA: University of California Press.

Hügler, M., Sievert, S. M. (2011). Beyond the Calvin cycle: Autotrophic Carbon Fixation in the

Ocean. Annu. Rev. Marine Sci., 3, 261-289.

Jean-Baptiste, P., Petit, J. R., Lipenkov, V. Y., Raynaud, D., Barkov, N. I. (2001). Constraints on

Hydrothermal Processes and Water Exchange in Lake Vostok from Helium Isotopes.

Nature, 411(6836), 460.

Jiang, Y., Xiong, X., Danska, J., Parkinson, J. (2016). Metatranscriptomic Analysis of Diverse

Microbial Communities Reveals Core Metabolic Pathways and Microbiome-Specific

Functionality. Microbiome, 4(1), 2.

Karl, D., Bird, D. F., Björkman, K., Houlihan, T., Shackelford, R., Tupas, L. (1999).

Microorganisms in the Accreted Ice of Lake Vostok, Antarctica. Science, 286(5447),

2144-2147.

Kawka, O. E., Simoneit, B. R. (1990). Polycyclic Aromatic Hydrocarbons in Hydrothermal

Petroleum from the Guaymas Basin Spreading Center. Appl. Geochem., 5(1-2), 17-27.

Kellogg, D., Kellogg, T. B. (2005). Frozen in Time: The Diatom Record in Ice Cores from

Remote Drilling Sites on the Antarctic Ice Sheets. In J. D. Castello, S.O. Rogers (Eds.),

Life in Ancient Ice (pp. 69-93). Princeton, NJ: Princeton University Press.

Leichenkov, G. L., Belyavsky, B. V., Antonov, A. V., Rodionov, N. V., & Sergeev, S. A. (2011).

First Information About the Geology of Central Antarctica Based on Study of Mineral

Inclusions in Ice Cores of the Vostok Station Borehole. In V. Fortov (Ed.). Doklady Earth

Sciences (Vol. 440, pp. 1207). New York, New York: Springer.

Lipenkov, V. Y., Istomin, V. A. (2001). On the Stability of Air Clathrate-Hydrate Crystals in

Subglacial Lake Vostok, Antarctica. Mater. Glyatsiol. Issled, 91, 129-133.

Martin, J., & Anderson, D. (1997). Stealth virus epidemic in the Mohave Valley. Pathobiology,

65(1), 51-56.

Persson, B. N. J. (2018). Ice friction: Glacier Sliding on Hard Randomly Rough Bed Surface. J.

Chem. Phys., 149(23), 234701. doi:10.1063/1.5055934

Petersen, J. M., Zielinski, F. U., Pape, T., Seifert, R., Moraru, C., Amann, R., Hourdez, S.,

Girguis, P. R., Wankel, S. D., Barbe, V., Pelletier, E., Fink, D., Borowski, C., Bach, W.,

Dubilier, N. (2011). Hydrogen is an Energy Source for Hydrothermal Vent Symbioses.

Nature, 476(7359), 176.

Petit, J. R., Jouzel, J., Raynaud, D., Barkov, N. I., Barnola, J. M., Basile, I., Bender, M.,

Chappellaz, J., Davis, M., Delaygue, G., Delmotte, M. (1999). Climate and Atmospheric

History of the Past 420,000 Years from the Vostok Ice Core, Antarctica. Nature,

399(6735), 429.

Petit, J. R., Alekhina, I., Bulat, S. (2005). Lake Vostok, Antarctica: Exploring a Subglacial Lake

and Searching for Life in an Extreme Environment. In M. Gargaud, B. Barbier, H.

Martin, J. Reisse (Eds.). Lectures in astrobiology (pp. 227-288). Berlin, Heidelberg:

Springer.

Priscu, J. C., Adams, E. E., Lyons, W. B., Voytek, M. A., Mogk, D. W., Brown, R. L., McKay, C.

P., Takacs, C.D., Welch, K. A., Wolf, C. F., Kirshtein, J. D., Avci, R. (1999).

Geomicrobiology of Subglacial Ice Above Lake Vostok, Antarctica. Science, 286(5447),

2141-2144.

Proskurowski, G., Lilley, M. D., Seewald, J. S., Früh-Green, G. L., Olson, E. J., Lupton, J. E.,

Sylva, S. P., Kelley, D. S. (2008). Abiogenic Hydrocarbon Production at Lost City

Hydrothermal Field. Science, 319(5863), 604-607.

Robin, G. D. Q., Drewry, D. J., Meldrum, D. T. (1977). International Studies of Ice Sheet and

Bedrock. Philos. Trans. R. Soc. Lond. B, Biolog. Sci., 279(963), 185-196.

Rogers, S. O., & Bendich, A. J. (1985). Extraction of DNA from Milligram Amounts of Fresh,

Herbarium and Mummified Plant Tissues. Plant Molecular Biology, 5(2), 69-76.

Rogers, S. O., Shtarkman, Y. M., Koçer, Z. A., Edgar, R., Veerapaneni, R. S., D'Elia, T., Morris,

P. F. (2013). Subglacial Lake Vostok (Antarctica) Accretion Ice Contains a Diverse Set

of Sequences from Aquatic, Marine and Sediment-Inhabiting Bacteria and Eukarya.

Biology, (2), 206-232.

Salamatin, A. N., Tsyganova, E. A., Lipenkov, V. Y., Petit, J. R. (2004). Vostok (Antarctica)

Ice-Core Time-Scale from Datings of Different Origins. Annals Glaciol., 39(1), 283-292.

Scherer, R. P., Sjunneskog, C. M., Iverson, N. R., Hooyer, T. S. (2004). Assessing Subglacial

Processes from Diatom Fragmentation Patterns. Geology, 32(7), 557-560.

Shtarkman, Y., Koçer, Z., Edgar, R., Veerapaneni, R., D'Elia, T., Morris, P., Rogers, S.O.

(2013). Subglacial Lake Vostok (Antarctica) Accretion Ice Contains a Diverse Set of

Sequences from Aquatic, Marine and Sediment-Inhabiting Bacteria and Eukarya. PloS

ONE, 8(7), e67221.

Siegert, M. J., Ridley, J. K., Kapitsa, A. P., de Q. Robin, G., Zotikov, I. A. (1996). A Large Deep

Freshwater Lake Beneath the Ice of Central East Antarctica. Nature, 381(6584), 684-686.

Siegert, M. J., Ellis-Evans, J. C., Tranter, M., Mayer, C., Petit, J. R., Salamatin, A., Priscu, J. C.

(2001). Physical, Chemical and Biological Processes in Lake Vostok and Other Antarctic

Subglacial Lakes. Nature, 414(6864), 603.

Siegert, M. J. (2003). Glacial–Interglacial Variations in Central East Antarctic Ice Accumulation

Rates. Quater. Sci. Rev., 22(5-7), 741-750.

Siegert, M. J., Tranter, M., Ellis‐Evans, J. C., Priscu, J. C., Berry Lyons, W. (2003). The

Hydrochemistry of Lake Vostok and the Potential for Life in Antarctic Subglacial Lakes.

Hydrol. Process., 17(4), 795-814.

Sievert, S. M., Kiene, R. P., Schulz-Vogt, H. N. (2007). The Sulfur Cycle. Oceanog., 20(2), 117-123.

Sohm, J. A., Webb, E. A., Capone, D. G. (2011). Emerging Patterns of Marine Nitrogen

Fixation. Nature Rev. Microbiol., 9(7), 499.

Soto‐Rodriguez, S. A., Cabanillas‐Ramos, J., Alcaraz, U., Gomez‐Gil, B., Romalde, J. L. (2013).

Identification and Virulence of Aeromonas dhakensis, Pseudomonas mosselii and

Microbacterium paraoxydans Isolated from Nile tilapia, Oreochromis niloticus,

Cultivated in Mexico. J. Appl. Microbiol., 115(3), 654-662.

Stolz, J., Basu, P., Santini, J., Oremland, R. (2006). Arsenic and Selenium in Microbial

Metabolism. Annu. Rev. Microbiol., 60(1), 107-130.

Tivey, M. K. (2007). Generation of Seafloor Hydrothermal Vent Fluids and Associated Mineral

Deposits. Oceanography, 20(1), 50-65.

Venturi, S., Tassi, F., Gould, I.R., Shock, E.L., Hartnett, H.E., Lorance, E.D., Bockisch, C.,

Fecteau, K.M., Capecchiacci, F., Vaselli, O. (2017). Mineral-Assisted Production of

Benzene under Hydrothermal Conditions: Insights from Experimental Studies on C6

cyclic Hydrocarbons. J. Volcanol. Geotherm. Res., 346, 21-27.

Welhan, J. A., Lupton, J. E. (1987). Light Hydrocarbon Gases in Guaymas Basin Hydrothermal

Fluids: Thermogenic Versus Abiogenic Origin. AAPG Bull., 71(2), 215-223.

Wright, A., Siegert, M. (2012). A Fourth Inventory of Antarctic Subglacial Lakes. Antarc. Sci.,

24(6), 659-664.

Young, D.A., Wright, A.P., Roberts, J.L., Warner, R.C., Young, N.W., Greenbaum, J.S.,

Schroeder, D.M., Holt, J.W., Sugden, D.E., Blankenship, D.D., Van Ommen, T.D.

(2011). A Dynamic Early East Antarctic Ice Sheet Suggested by Ice-Covered Fjord

Landscapes. Nature, 474(7349), 72.