METAGENOMIC/METATRANSCRIPTOMIC STUDY OF ORGANISMS ENTRAPPED

IN ICE AT FOUR LOCATIONS IN ANTARCTICA

Sammy O. Juma

A Thesis

Submitted to the Graduate College of Bowling Green

State University in partial fulfillment of

the requirements for the degree of

Master of Science

August, 2013

Committee:

Dr. Scott O. Rogers, Advisor

Dr. Paul Morris

Dr. Vipaporn Phuntumart

© 2013

Sammy Juma

All Rights Reserved


ABSTRACT

Dr. Scott O. Rogers, Advisor

Antarctica has one of the most extreme environments on Earth. Its biodiversity and species richness are low and decrease with increasing elevation and distance from the coast. Previous scientific research in Antarctica has been used to understand past climatic conditions, the survival mechanisms used by microbial communities and the environmental factors that contribute to the dispersal of microorganisms. The research presented here compares microbial inclusions in ice at four locations in Antarctica (Byrd, Taylor Dome, Vostok and J-9). The aims were to identify the factors that influence microbial distribution patterns and to investigate how the microbes survive under harsh conditions. Culture-dependent and culture-independent techniques (e.g., metagenomics and metatranscriptomics) were used to analyze sequences present in ice cores from Antarctica. The sequences analyzed matched those from Spirochaetes, Verrucomicrobia, Bacteroidetes, Cyanobacteria, Deinococcus-Thermus, , , , Euryarchaeota and Ascomycota. The metagenomic/metatranscriptomic sequences were also analyzed to characterize the pathways represented in these communities. Few unique sequences were obtained from the samples (Taylor Dome, 51; Byrd, 43; Vostok, 33; J-9, 40). The sample from the most elevated location in the interior of Antarctica had the fewest unique sequences (Vostok, 33). The sample closest to the Antarctic coast had the most (Taylor Dome, 51). The samples from all four locations appeared to harbor very few species of microorganisms (Taylor Dome, 12; Byrd, 13; Vostok, 6; J-9, 7). Analysis of the microbial pathways revealed that the microorganisms are able to utilize various sources of carbon, recycle and have unique and cell structures that have previously been reported to be important for microbial survival at very low temperatures.


TABLE OF CONTENTS

CHAPTER 1: INTRODUCTORY REVIEW

1.1 ANTARCTICA: A COLD DESERT

1.2 DISTRIBUTION OF MICROBES IN COLD ECOSYSTEMS

1.3 INVASION OF MICROBES INTO ANTARCTICA

1.3.1 Storm systems and circumpolar currents

1.3.2 Wind transport

1.3.3 Migratory birds

1.4 METHODS OF CHARACTERIZING MICROBIAL DISTRIBUTION

1.4.1 Culture-dependent methods

1.4.2 Culture-independent method

1.4.2.1 454 pyrosequencing

1.4.2.2 rRNA (ribosomal RNA)

1.5 ANALYSIS OF METABOLIC FUNCTIONS

1.6 HYPOTHESIS AND GOALS OF THIS PROJECT

1.7 REFERENCES

CHAPTER 2: COMPARISON STUDY OF THE SEQUENCE DISTRIBUTION AND METABOLIC PATHWAYS PRESENT IN FOUR LOCATIONS OF ANTARCTICA

2.1 ABSTRACT

2.2 INTRODUCTION

2.3 METHODOLOGY

2.3.1 Ice core section handling and culturing

2.3.2 DNA extraction of the pure cultures

2.3.2.1 PCR (polymerase chain reaction)

2.3.2.2 PCR purification

2.3.3 Ultracentrifugation

2.3.4 DNA extraction

2.3.5 RNA extraction

2.3.6 cDNA synthesis

2.3.6.1 First strand synthesis

2.3.6.2 Second strand synthesis

2.3.7 EcoRI (NotI) adapter addition

2.3.8 Column chromatography

2.3.9 Amplification and quantitation

2.3.10 Addition of 454 adapters

2.3.11 Sequencing

2.3.12 Analysis of the sequences

2.4 RESULTS

2.4.1 Sequences obtained

2.4.2 Community structure

2.4.2.1 Taxonomic groups represented

2.4.2.2 Phylogenetic analysis

2.4.3 Statistical analysis

2.4.4 Metabolic analysis

2.5 DISCUSSION

2.5.1 Factors responsible for sequence distribution in Antarctica

2.5.2 Metabolic analysis

2.5.2.1 in cold habitats

2.5.3 Adaptation to cold habitats

2.5.4 Important enzymes in cold habitats

2.6 CONCLUSION

2.7 REFERENCES

APPENDICES

A. Numbers of sequences matching those for genes in specific pathways that were recovered from the Taylor Dome, Byrd, Vostok and J-9 samples

B. Description of the sequences found in the Taylor Dome, Byrd, Vostok and J-9 ice core section samples

C. gene sequences identified from the Byrd, Vostok, J-9 and Taylor Dome samples

D. Statistical analysis of factors influencing sequence distribution in Antarctica


LIST OF TABLES

Table 1: Ice core information for the core sections used in the research.

Table 2: MID primer sequences used for each sample. The primers have the form A-key-MID-EcoRI and B-key-MID-EcoRI.

Table 3: Taxonomic affiliations of at Taylor Dome, Byrd, Vostok and J-9 determined by phylogenetic analysis of LSU and SSU rRNA.

Table 4: Taxonomic affiliations of bacterial sequences amplified in the metagenomic/metatranscriptomic analysis at Taylor Dome, Byrd, Vostok and J-9, determined by phylogenetic analysis of LSU and SSU rRNA.

Table 5: Percentage representation of the major metabolic pathways at the four locations, based on the number of enzyme sequences found in the metagenomic/metatranscriptomic analysis.

Table 6: Comparison of the sampled locations in terms of site elevation, distance from the coast, number of unique sequences recovered and number of species identified using the ribosomal subunits.


LIST OF FIGURES

Figure 1: Locations in Antarctica from which the ice cores were obtained. Vostok and Taylor Dome are in East Antarctica, while Byrd and J-9 are in West Antarctica. In general, East Antarctica is higher in elevation than West Antarctica.

Figure 2: Bar chart of the microbial composition at the phylum level at the four sampled locations in Antarctica. Percentages were calculated from the total number of sequences obtained at each sampled location.

Figure 3: Phylogenetic analysis of LSU rRNA gene sequences isolated from Antarctica by metagenomic/metatranscriptomic analysis.

Figure 4: Neighbor-joining tree of SSU rRNA gene sequences isolated from Antarctica by the culture and metagenomic/metatranscriptomic analysis methods.

Figure 5: Bar chart showing the representation of the major metabolic pathways at the four sampled locations in Antarctica.

Figure 6: Prevailing winds in Antarctica during winter.

Figure 7: Number of sequences obtained from the sampled locations versus distance from the Antarctic coast and elevation of the sampled locations.

CHAPTER 1: INTRODUCTORY REVIEW

1.1 ANTARCTICA: A COLD DESERT

Antarctica is the fifth largest continent, with a land mass of more than 14,000,000 km² (5,400,000 sq mi). It is the southernmost continent and is surrounded by the Southern Ocean, so only cold ocean currents reach its shores. Unlike the other continents, 98% of Antarctica is covered by ice that averages 1.6 kilometers in thickness. Antarctica is considered the coldest, driest and windiest desert on Earth. Annual precipitation is only about 200 mm (8 inches) along the coast. In winter, inland temperatures can fall as low as -89°C, about 11°C colder than subliming dry ice. In summer, temperatures near the coast can reach 15°C (Bratchkova and Ivanova 2011). Antarctica is much colder than the Arctic, partly because parts of East Antarctica lie up to 3 km above sea level (Bratchkova and Ivanova 2011). For the same reason, East Antarctica is much colder than West Antarctica. The frozen environment of Antarctica was once considered to be devoid of life (Cowan and Ah Tow 2004). In 1908, however, Tsiklinsky published the first report on Antarctic microflora, following the French Antarctic expedition of 1903-1905. Since then, an increasing number of studies have shown that permanently frozen environments harbor abundant and diverse communities of microorganisms. These microorganisms can not only be detected but also recovered by cultivation. Indeed, microbes have been isolated from depths as great as 3,621 m in the Vostok ice core (D’Elia et al. 2008). It is possible that various cold-adapted organisms, such as algae, bacteria, fungi, protists, animals and plants, have evolved to survive in this harsh Antarctic environment (Skidmore et al. 2000).


1.2 DISTRIBUTION OF MICROBES IN COLD ECOSYSTEMS

It is widely accepted that the biodiversity of large organisms in cold ecosystems declines with increasing latitude, as temperatures decrease and conditions become more severe. The diversity of microorganisms, however, does not follow this trend (Pearce et al. 2009).

Antarctic biodiversity is reduced by extremely cold temperatures, freeze-thaw cycles, high irradiance and variations in nutrient supply, and Bacteria make up the majority of the biomass. Microbial cell numbers in clear glacial ice are generally low (from <1 to 10³ cells ml⁻¹) but vary with depth. A major cause of this variation is the presence of insoluble mineral particles, which are considered to be among the main carriers of microbial cells (Abyzov et al. 1998). Microorganisms are distributed in Antarctica by the transport of living cells over large distances. This transport is made possible by favorable climatic conditions and by dispersal factors that ensure survival and colonization of the new habitat (Margesin and Miteva 2011). The high dispersal potential of microorganisms from the Antarctic Peninsula, South America and other Antarctic locations may have led to the evolution of many cosmopolitan species and to the diversity of microorganisms found in Antarctica (Yergeau et al. 2007). The number of species found in Antarctica is relatively low compared with temperate regions. These species nevertheless represent a wide variety of taxa with a diverse gene pool. The metabolic diversity of community members in ice is high, so that a few strains are able to provide important functions (Simon et al. 2009).

In Antarctica, viable microorganisms have been isolated from frozen soils (Rivkina et al. 2008), perennially ice-covered lakes (Clocksin et al. 2007), glacial ice (Ma et al. 2000) and accretion ice (Karl et al. 1999, D’Elia et al. 2008). Attempts have been made to study microorganisms trapped directly in glacial ice and to correlate them with their geographical location and mode of dispersal. Christner et al. (2000) investigated the composition of microbial populations in ice from a variety of ecological sources in Greenland, China, Peru and Antarctica. The ice had been deposited at known times and under defined environmental conditions.

They noted that the diversity and abundance of microorganisms in a particular ice core were determined by the proximity of the ice core source to major biological ecosystems. The major factors that determined the number of recoverable microorganisms at different locations in their research were the prevailing climate, wind direction and individual events, such as volcanic activity, that occurred at the time of deposition.

1.3 INVASION OF MICROBES INTO ANTARCTICA

The Southern Ocean and the circumpolar currents act as geographic and climatic barriers between Antarctica and other landmasses. Any potential colonist must therefore arrive via the aerobiota, animal vectors or aerosols (Wynn-Williams 1996). Microbial colonists must have several characteristics that allow them to colonize after long-range transport. These include resistance to desiccation, autotrophy, protection from UV damage and long-term survival without active photosynthesis (Wynn-Williams 1996). The major sources of microbial cells in Antarctica are volcanic ash, terrestrial dust and marine aerosols (Margesin and Miteva 2011). The common agents for the distribution of these cells are storm systems, winds and migratory birds. These agents may be responsible both for introducing microorganisms from outside Antarctica and for redistributing them within the continent.

1.3.1 Storm systems and circumpolar currents

Precipitation in Antarctica is delivered by cyclonic storm systems that carry moist, warm air from lower latitudes. The Antarctic Peninsula receives the highest precipitation on the continent. Precipitation in the interior decreases as elevation and distance from the sea increase.

Areas closer to the Antarctic coast, such as Taylor Dome in East Antarctica, have their precipitation influenced more by cyclonic activity than inland regions such as Vostok (Steig et al. 1998). These storm systems play an important role in distributing airborne microorganisms across the continent. The circumpolar distribution of marine microbiota indicates the importance of the circumpolar currents and the Antarctic coastal currents in dispersal (Vincent 2000).

1.3.2 Wind transport

Non-marine microbial species are thought to have entered Antarctica as dry, frozen propagules blown by the wind. Evidence for the transfer of microbes to Antarctica via atmospheric circulation includes the discovery of pollen from South American plants. It also includes the isolation, from the Antarctic ice sheet, of various microbiota that have not been found as living cells in present-day Antarctica (Vincent 2000). Katabatic winds play an important role in the distribution of organisms in Antarctica. The interior of the continent is much colder than the surrounding ocean. This leads to high pressure over the interior and an outflow of cold air (katabatic winds) from the cold regions toward the warmer surrounding waters. Katabatic winds form sheet-like rivers of air and blowing snow, with speeds that can approach 300 km/h. They continue until the cold air mass dissipates or meets a countervailing push from the ice-shelf side of the mountains. When katabatic winds reach the coast, local cyclonic winds divert them in an anticlockwise direction. Unlike most storms, katabatic winds can last for days on end. Michaud, Šabacká and Priscu (2011) investigated cyanobacterial diversity across landscape units in Taylor Valley, Antarctica. They noted that cyanobacterial diversity increases from the upper portion of the valley toward the lower portion. This led them to conclude that there was a net transport of organisms down the valley, consistent with the prevailing orientation of high-energy winds. Cyanobacteria-like particles that act as biogenic nuclei for water droplet formation have also been observed in clouds over the Ross Ice Shelf. These particles are thought to have been dispersed by winds from the microbial mat communities on the McMurdo Ice Shelf (Vincent 2000).

1.3.3 Migratory birds

Migratory birds are another means of microbial transfer into Antarctica. The south polar skua, cape pigeon, Arctic tern and giant fulmar are among the birds that have been observed in Antarctica (Vincent 2003), some reaching as far as the South Pole. These birds have wide geographical ranges. For example, south polar skuas have been observed in Mexico, Argentina and Greenland, and Arctic terns fly from the Arctic Circle to the Antarctic Circle (Egevang et al. 2010). Their migrations may have been ongoing for more than 25 million years (Steadman 2005).

Psychrotolerant bacteria have been isolated from Antarctic ice cores (D’Elia et al. 2008, Christner et al. 2000). These bacteria have also been found in temperate regions, and they might have been transported to Antarctica by migrating birds.

An experiment in which washings of Arctic tern tail feathers yielded cyanobacteria, green algae and protozoa supports this possibility (Vincent 2000). It suggests that birds may have transferred many species of microbes to the polar regions. Once in the Antarctic, the microorganisms can be redistributed by a variety of mechanisms, such as wind.

1.4 METHODS OF CHARACTERIZING MICROBIAL DISTRIBUTION

1.4.1 Culture-dependent methods

Standard culture techniques for characterizing microbial ecology involve the isolation and characterization of microorganisms using growth media and incubators. Culture-dependent methods have also been combined with culture-independent methods to characterize microbial diversity in frozen environments such as permafrost (Steven et al. 2007).

Analysis of the community composition of culturable bacteria from various parts of East Antarctica showed that the bacterial communities differed by location. It also showed that community sizes decreased with increasing elevation and distance from the coast (Yan et al. 2012). The major limitation of culture-based techniques is that an estimated 99% of the microorganisms in any environment cannot be cultivated by standard culturing techniques (Rastogi and Sani 2011). Culture-dependent methods of analyzing population diversity are therefore biased. They select for organisms that can be cultured and leave out those that are not currently culturable, so they grossly underestimate the diversity of a particular environment. Another limitation is that any deviation of the culture conditions from the original environmental parameters can alter the community structure by imposing new selective conditions (Liu et al. 1997).

1.4.2 Culture-independent method

Metagenomics and metatranscriptomics, coupled with pyrosequencing, are cultivation-independent genomic approaches. They have been used successfully to characterize entire microbial communities in a number of ecosystems by studying their physiological and genetic traits (DeLong and Karl 2005). Most microbial communities found in soils or aquatic environments are complex, and very few of their members have been cultured. Metagenomics and metatranscriptomics provide ways to analyze the community structure and diversity of such systems. Simon et al. (2009) used a metagenomic approach to survey the metabolic potential and phylogenetic diversity of a microbial assemblage in glacial ice from the Alps. Metagenomic/metatranscriptomic analysis coupled with pyrosequencing has also been used to characterize microorganisms in ice cores from Greenland (Knowlton et al. 2013). Other environments studied using metagenomics include Antarctic waters (Moreira et al. 2006), marine sediments (Breitbart et al. 2004), soils (Sjöling and Cowan 2008) and surface ocean waters (Biers et al. 2009).

Metagenomics is based on the direct isolation of nucleic acids from environmental samples. It provides powerful tools for comparing and exploring the ecology and metabolic pathways of complex environmental microbial communities (Simon and Daniel 2011). Metagenomic libraries may be screened with function-based or sequence-based approaches. Both may involve cloning environmental DNA and constructing small-insert or large-insert libraries (Simon and Daniel 2011). The resulting libraries may be introduced into a host using plasmids, cosmids or fosmids. The choice depends on DNA quality and size, the targeted genes and the screening strategy (Daniel 2005). Small-insert libraries can be used to identify novel biocatalysts encoded by single genes or small operons. Large-insert libraries are required to recover large gene clusters that encode proteins in complex pathways (Daniel 2005). One major disadvantage of cloning-dependent metagenomic analysis is cloning bias: the method often misses sequences that are present at low concentrations or are difficult to clone. Cloning-independent metagenomic approaches coupled with 454 pyrosequencing have been used to overcome this disadvantage.
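The effect of insert size on library size can be made concrete with the Clarke-Carbon estimate that is commonly used for clone libraries. The estimate is N = ln(1 - P) / ln(1 - f). Here P is the desired probability that any given stretch of DNA is represented, and f is the insert size as a fraction of the total DNA being cloned.

The Python sketch below is illustrative only and is not part of the methods of this thesis. It rests on several assumptions:

- the target DNA is fragmented randomly into inserts of uniform size;
- the total DNA is 4,000 kb, a hypothetical value;
- the 5 kb (plasmid-sized) and 40 kb (fosmid-sized) inserts are example values.

```python
import math

def clones_needed(insert_kb: float, total_kb: float, probability: float = 0.99) -> int:
    """Clarke-Carbon estimate of the number of clones required so that any given
    stretch of DNA is present in the library with the requested probability.

    Assumes clones are random, uniformly sized fragments of the target DNA.
    """
    if not 0 < insert_kb < total_kb:
        raise ValueError("insert size must be positive and smaller than the total DNA")
    if not 0 < probability < 1:
        raise ValueError("probability must be strictly between 0 and 1")
    f = insert_kb / total_kb  # fraction of the total DNA carried by one clone
    return math.ceil(math.log(1 - probability) / math.log(1 - f))

# Hypothetical example: 4,000 kb of target DNA cloned as small or large inserts.
total = 4_000
small_insert = clones_needed(5, total)   # small-insert (plasmid-sized) library
large_insert = clones_needed(40, total)  # large-insert (fosmid-sized) library
print(f"5 kb inserts:  {small_insert} clones")
print(f"40 kb inserts: {large_insert} clones")
```

Under these assumptions, the 5 kb library needs 3,682 clones and the 40 kb library 459 clones for 99% coverage. Large inserts reduce screening effort. They are also needed to keep a large gene cluster intact on a single clone.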

1.4.2.1 454 Pyrosequencing

Pyrosequencing is a next-generation sequencing (NGS) technology. It sequences a single strand of DNA by synthesizing the complementary strand alongside it. One dNTP is added to the reaction at a time. If it is complementary to the next template base, DNA polymerase incorporates it and releases pyrophosphate. ATP sulfurylase uses the pyrophosphate to produce ATP in the presence of adenosine 5′-phosphosulfate. Luciferase then uses this ATP to convert luciferin to oxyluciferin, producing light. A light sensor detects the light and a computer records it. The intensity of each flash is proportional to the number of nucleotides incorporated, so a run of identical bases gives a proportionally stronger signal. The recorded flashes of light for each position on the reaction plate are then used to determine the sequence of the template.

Pyrosequencing provides a large number of sequences that can be used to analyze a microbial community and infer its functions. This is particularly important for microbial communities in ice samples, where microbial concentrations are low. A limitation of pyrosequencing is that individual reads are shorter than those produced by Sanger sequencing, which uses the chain-termination method. Its advantage is that a large number of sequences can be obtained simultaneously. The results are almost as accurate as those from conventional sequencing methods, and no labeled nucleotides or electrophoresis are needed. Pyrosequencing is also less expensive per nucleotide than Sanger sequencing. However, pyrosequencing errors have been shown to inflate the number of phylotypes when each read is treated as a unique identifier of a community member (Simon and Daniel 2011). Because read lengths are short, microbial identification by DNA pyrosequencing has focused on variable regions within the small-subunit ribosomal RNA gene (Petrosino et al. 2009).
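The relationship between light signals and bases can be sketched in a few lines of Python. This is a simplified illustration, not the instrument's base-calling software. It assumes an idealized flowgram in which each signal equals the number of bases incorporated. It also assumes that nucleotides are flowed in the cyclic order T, A, C, G. Rounding a noisy signal to the nearest integer shows how homopolymer runs can produce the insertion and deletion errors associated with pyrosequencing.

```python
FLOW_ORDER = "TACG"  # assumed cyclic order in which dNTPs are flowed over the plate

def flowgram(read: str, n_flows: int) -> list[float]:
    """Ideal light signal for each flow, given the sequence of the newly synthesized
    strand: the number of bases incorporated when that nucleotide is flowed
    (0.0 when it does not match the next position)."""
    signals = []
    pos = 0
    for flow in range(n_flows):
        base = FLOW_ORDER[flow % len(FLOW_ORDER)]
        run = 0
        # A whole homopolymer run is incorporated during a single flow.
        while pos < len(read) and read[pos] == base:
            run += 1
            pos += 1
        signals.append(float(run))
    return signals

def call_bases(signals: list[float]) -> str:
    """Decode a flowgram by rounding each signal to the nearest homopolymer length."""
    return "".join(
        FLOW_ORDER[flow % len(FLOW_ORDER)] * round(signal)
        for flow, signal in enumerate(signals)
    )

ideal = flowgram("TTAGGC", 8)
print(ideal)              # [2.0, 1.0, 0.0, 2.0, 0.0, 0.0, 1.0, 0.0]
print(call_bases(ideal))  # TTAGGC

# A weak signal for the GG run (1.4 instead of about 2) is called as a single G.
noisy = [2.1, 0.9, 0.1, 1.4, 0.0, 0.1, 1.1, 0.0]
print(call_bases(noisy))  # TTAGC
```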

1.4.2.2 rRNA (ribosomal RNA)

Comparing ribosomal RNA (rRNA) gene sequences is a powerful way to deduce phylogenetic relationships among Bacteria, Archaea and Eukarya. The ribosome has a central role in translating mRNA into proteins. For this reason, regions of the rRNA are well conserved, having undergone only small sequence changes over billions of years of evolution, whereas other genomic regions have changed considerably. The small-subunit (SSU) rRNA is a component of the 30S subunit of the bacterial ribosome and is about 1,500 to 1,600 nucleotides (nt) long. SSU rRNA sequences have been used to reconstruct broad-level phylogenies for two reasons. They are highly conserved among species from Bacteria through Eukarya (Coenye and Vandamme 2003), and rRNA genes are inherited vertically. The notion that rRNA genes could identify an organism by reconstructing its phylogeny, together with the possibility of storing sequences in databases, led to the rapid adoption of SSU rRNA genes by microbiologists for and phylogenetics (Case et al. 2007). The use of amplified and sequenced SSU rRNA genes has also become a widely accepted approach to the study of microbial diversity (Lemos et al. 2011).
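Comparisons of SSU rRNA reads usually start from their percent identity. The Python sketch below groups reads into operational taxonomic units (OTUs) with a greedy clustering rule. It is a simplified, hypothetical illustration, not the pipeline used in this thesis, and it makes several assumptions:

- The identity threshold is 97%, a cut-off often used as a rough species-level proxy.
- The reads are already aligned to equal length. Real tools perform the alignment themselves and use more refined clustering algorithms.
- The reference fragment and the `mutate` helper are made up for the example.

The example also shows why counting every unique read as a separate community member inflates diversity when reads carry sequencing errors.

```python
from collections import Counter

def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def greedy_otus(reads: list[str], threshold: float = 0.97) -> dict[str, int]:
    """Process unique reads from most to least abundant. Each read joins the first
    existing centroid it matches at >= threshold identity; otherwise it founds a
    new OTU. Returns {centroid sequence: number of reads in that OTU}."""
    otus: dict[str, int] = {}
    for seq, count in Counter(reads).most_common():
        for centroid in otus:
            if identity(seq, centroid) >= threshold:
                otus[centroid] += count
                break
        else:
            otus[seq] = count
    return otus

def mutate(seq: str, positions) -> str:
    """Hypothetical helper: substitute a different base at each given position."""
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    bases = list(seq)
    for p in positions:
        bases[p] = swap[bases[p]]
    return "".join(bases)

reference = "ACGT" * 25  # hypothetical 100-nt SSU rRNA fragment
reads = (
    [reference] * 5
    + [mutate(reference, [10])] * 2                # 1 error: 99% identity
    + [mutate(reference, [50])]                    # 1 error: 99% identity
    + [mutate(reference, range(0, 100, 10))] * 3   # 10 differences: 90% identity
)

otus = greedy_otus(reads)
print(len(set(reads)), "unique reads")  # 4 unique reads
print(len(otus), "OTUs at 97%")         # 2 OTUs at 97%
```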

1.5 ANALYSIS OF METABOLIC FUNCTIONS

DNA sequences obtained from an environmental sample by metagenomic or metatranscriptomic methods may be used to characterize the metabolic functions of the microbial community at a particular location. Communities can be compared by the relative abundance of sequences assigned to each metabolic pathway. The sequence reads are aligned to genes in a reference database on the basis of sequence similarity. These genes encode known enzymes that take part in specific biochemical reactions. The Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database stores global reference pathways that have been identified in various organisms. Microbial communities have been compared by examining the biological processes associated with the functional categories in the KEGG pathways (Qin et al. 2010). Microbial ecosystems in Antarctica are important in evolutionary ecology research because the Antarctic environment contains less complex food webs. These simpler food webs make the interactions among microbial processes easier to understand (Vincent 2000).
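The percentage-representation step described above can be sketched as follows. The sketch assumes that a similarity search has already assigned an Enzyme Commission (EC) number to each read. The lookup table, the EC assignments and the sample are hypothetical stand-ins for a real KEGG mapping. The sketch shows only the counting and normalization, not the alignment itself.

```python
from collections import Counter

# Hypothetical toy lookup table (EC number -> pathway). In KEGG a single enzyme can
# belong to several pathways; one label per EC number keeps the sketch short.
EC_TO_PATHWAY = {
    "2.7.1.11": "Glycolysis",          # 6-phosphofructokinase
    "4.1.2.13": "Glycolysis",          # fructose-bisphosphate aldolase
    "1.1.1.37": "TCA cycle",           # malate dehydrogenase
    "4.2.1.2": "TCA cycle",            # fumarate hydratase
    "6.3.1.2": "Nitrogen metabolism",  # glutamine synthetase
}

def pathway_percentages(ec_hits: list[str]) -> dict[str, float]:
    """Percentage of mapped enzyme hits that fall into each pathway.
    Hits with no entry in the lookup table are ignored."""
    counts = Counter(EC_TO_PATHWAY[ec] for ec in ec_hits if ec in EC_TO_PATHWAY)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {pathway: 100 * n / total for pathway, n in counts.items()}

# Hypothetical best-hit EC assignments for reads from one sample;
# "unassigned" stands for a read with no similarity hit.
sample_hits = ["2.7.1.11", "4.1.2.13", "1.1.1.37", "1.1.1.37", "6.3.1.2", "unassigned"]
for pathway, pct in sorted(pathway_percentages(sample_hits).items()):
    print(f"{pathway}: {pct:.1f}%")
```

The sketch normalizes by the number of mapped hits rather than by all reads. As a result, unassigned reads do not dilute the pathway percentages.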

1.6 HYPOTHESIS AND GOALS OF THIS PROJECT

I hypothesized that the distribution of microorganisms in Antarctica is driven by factors such as elevation, wind patterns, proximity to the coast and migratory birds. To test this, I compared sequences from microbial communities deposited in ice within roughly the last 500 years. The ice samples were obtained at an average depth of about 100 m from four locations in Antarctica (Table 1): Taylor Dome (77°40′S 158°00′E), Byrd (80°00′S 119°31′W), Vostok (78°28′S 106°48′E) and J-9 (82°32′S 168°38′W).

Two predictions followed. First, more sequences would be recovered from locations closer to the Antarctic coast (Taylor Dome, 241 km; Byrd, 454 km) than from inland locations (Vostok, 1,680 km), because of the cyclonic wind systems and migratory birds in the coastal region. Second, elevation would matter. Taylor Dome and Vostok are located at high elevations in East Antarctica (2,440 m and 3,488 m, respectively), while J-9 and Byrd are located at lower elevations in West Antarctica (60 m and 1,530 m, respectively). Because katabatic winds play an important role in moving microorganisms, more microbial sequences should be recovered from J-9 and Byrd than from Taylor Dome and Vostok.

To characterize the microbial community structure at the four locations, I analyzed sequences generated by amplifying DNA and cDNA derived from the nucleic acids extracted from the ice. I then identified potential correlations between environmental parameters and microbial community distribution to gain a deeper understanding of community structure. The aims of this project were to identify the parameters that drive the distribution of microbial communities in Antarctica and to identify the metabolic pathways represented by these communities.
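The predicted relationships can be checked with a rank correlation. The Python sketch below computes Spearman's rho between the number of unique sequences at each site and two site variables: distance from the coast and site elevation. The sequence counts come from the Abstract, and the site variables come from Table 1. The sketch is an illustration only, not the statistical analysis reported in Appendix D. With just four sites, a coefficient describes the ranking but carries very little statistical power.

```python
from statistics import mean

# Values transcribed from Table 1 (distance, elevation) and the Abstract (unique sequences).
SITES = {
    # site: (distance from coast in km, elevation in m, unique sequences)
    "Taylor Dome": (241, 2440, 51),
    "Byrd": (454, 1530, 43),
    "Vostok": (1680, 3488, 33),
    "J-9": (647, 60, 40),
}

def ranks(values):
    """1-based ranks; tied values share the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

distance = [v[0] for v in SITES.values()]
elevation = [v[1] for v in SITES.values()]
unique = [v[2] for v in SITES.values()]

rho_distance = spearman(distance, unique)
rho_elevation = spearman(elevation, unique)
print(f"distance from coast vs unique sequences: rho = {rho_distance:+.2f}")   # -1.00
print(f"elevation vs unique sequences:           rho = {rho_elevation:+.2f}")  # -0.20
```

Spearman's rho is used here rather than Pearson's r because it depends only on the ordering of the sites. A single extreme value, such as Vostok's 1,680 km distance from the coast, therefore cannot dominate the result.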


Table 1: Ice core information for the core sections used in the research.

Drill site          Drill   Drill       Latitude       Longitude      Distance from    Site
(ice core sample)   year    depth (m)¹                                the coast (km)   elevation (m)

Vostok              1992    125         78°28′00″ S    106°48′00″ E   1,680            3,488
Byrd                1968    105         80°00′00″ S    119°31′00″ W     454            1,530
J-9                 1978     96         82°32′00″ S    168°38′00″ W     647               60
Taylor Dome         1994     99         77°40′00″ S    158°00′00″ E     241            2,440

¹ All depths represent ice that was deposited less than 500 years ago.


1.7 REFERENCES

Abyzov SS, Poglazova MN, Mitskevich IN, Ivanov MV (1998). Common features of microorganisms in ancient layers of the Antarctic ice sheet. In: Castello JD, Rogers SO (eds) Life in ancient ice. Princeton University Press, Princeton, NJ, pp 240–250.

Biers EJ, Sun S, Howard EC (2009). Prokaryotic genomes and diversity in surface ocean waters: interrogating the global ocean sampling metagenome. Appl. Environ. Microbiol. 75:2221–2229.

Breitbart M, Felts B, Kelley S, Mahaffy JM, Nulton J, Salamon P, Rohwer F (2004). Diversity and population structure of a near-shore marine-sediment viral community. Proc. R. Soc. Lond. B 271:565–574.

Case RJ, Boucher Y, Dahllof I, Holmstrom C, Doolittle WF, Kjelleberg S (2007). Use of 16S rRNA and rpoB genes as molecular markers for microbial ecology studies. Appl. Environ. Microbiol. 73:278–288.

Christner BC, Mosley-Thompson E, Thompson LG, Zagorodnov V, Sandman K, Reeve JN (2000). Recovery and identification of viable bacteria immured in glacial ice. Icarus 144:479–485.

Clocksin MK, Jung DO, Madigan MT (2007). Cold-active chemoorganotrophic bacteria from permanently ice-covered Lake Hoare, McMurdo Dry Valleys, Antarctica. Appl. Environ. Microbiol. 73:3077–3085.

Coenye T, Vandamme P (2003). Intragenomic heterogeneity between multiple 16S ribosomal RNA operons in sequenced bacterial genomes. FEMS Microbiol. Lett. 228:45–49.

Cowan DA, Ah Tow L (2004). Endangered Antarctic microbial communities. Annu. Rev. Microbiol. 58:649–690.

D’Elia T, Veerapaneni R, Rogers SO (2008). Isolation of microbes from Lake Vostok accretion ice. Appl. Environ. Microbiol. 74:4962–4965.

Daniel R (2005). The metagenomics of soil. Nature Rev. Microbiol. 3:470–478.

DeLong EF, Karl DM (2005). Genomic perspectives in microbial oceanography. Nature 437:336–342.

Egevang C, Stenhouse IJ, Phillips RA, Petersen A, Fox JW, Silk JR (2010). Tracking of Arctic terns Sterna paradisaea reveals longest animal migration. Proc. Natl. Acad. Sci. 107:2078–2081.

Bratchkova A, Ivanova V (2011). Bioactive metabolites produced by microorganisms collected in Antarctica and the Arctic. Biotechnol. Biotechnol. Equip. DOI: 10.5504/bbeq.2011.0116.

Lemos LN, Fulthorpe RR, Triplett EW, Roesch LF (2011). Rethinking microbial diversity analysis in the high throughput sequencing era. J. Microbiol. Methods 86:42–51.

Liu WT, Marsh TL, Cheng H, Forney LJ (1997). Characterization of microbial diversity by determining terminal restriction fragment length polymorphisms of genes encoding 16S rRNA. Appl. Environ. Microbiol. 63:4516–4522.

Margesin R, Miteva V (2011). Diversity and ecology of psychrophilic microorganisms. Res. Microbiol. 162:346–361.

Michaud AB, Šabacká M, Priscu JC (2011). Cyanobacterial diversity across landscape units in a polar desert: Taylor Valley, Antarctica. FEMS Microbiol. Ecol. doi:10.1111/j.1574-6941.2012.01297.x.

Moreira D, Rodríguez-Valera F, López-García P (2006). Metagenomic analysis of mesopelagic Antarctic plankton reveals a novel deltaproteobacterial group. Microbiol. 152:505–517.

Pearce DA, Bridge PD, Hughes KA, Sattler B, Psenner R, Russell NJ (2009). Microorganisms in the atmosphere over Antarctica. FEMS Microbiol. Ecol. 69:143–157.

Petrosino FJ, Highlander S, Luna AR, Gibbs AR, Versalovic J (2009). Metagenomic

Pyrosequensing and Microbial Identification. Clin. Chem. 55:856–866.

Qin et al. (2010). A human gut microbial gene catalogue established by metagenomic sequencing. Nature 464:59–65.

Rivkina EM, Friedmann EI, McKay CP, Gilichinsky DA (2000). Metabolic activity of permafrost bacteria below the freezing point. Appl. Environ. Microbiol. 66:3230–3233.

Simon C, Daniel R (2011). Metagenomic Analyses: past and future trends. Appl. Environ. Microbiol. 77:1153–1161.

Simon C, Wiezer A, Strittmatter WA, Daniel R (2009). Phylogenetic diversity and metabolic potential revealed in a glacier ice metagenome. Appl. Environ. Microbiol. 75:7519–7526.

Sjöling S, Cowan DA (2008). Metagenomics: microbial community genomes revealed. In: Psychrophiles: from Biodiversity to Biotechnology. Springer, Berlin, Heidelberg, pp 313–332.

Skidmore ML, Foght JM, Sharp MJ (2000). Microbial life beneath a high Arctic glacier. Appl. Environ. Microbiol. 66:3214–3220.

Steadman DW (2005). The paleoecology and fossil history of migratory landbirds. In: Greenberg R and Marra PP (eds) Birds of two worlds: the ecology and evolution of migration. Johns Hopkins University Press, Baltimore, MD, pp. 5–17.

Steig EJ, Brook EJ, White JWC, Sucher CM, Bender ML, Lehman SJ, Morse DL, Waddington ED, Clow GD (1998). Synchronous climate changes in Antarctica and the North Atlantic. Science 282:92–95.

Steven B, Briggs G, McKay CP, Pollard WH, Greer CW, Whyte LG (2007). Characterization of the microbial diversity in a permafrost sample from the Canadian high Arctic using culture-dependent and culture-independent methods. FEMS Microbiol. Ecol. 59:513–523.

Vincent WF (2000). Evolutionary origins of Antarctic microbiota: Invasion, selection and endemism. Antarctic Sci. 12:374–385.

Vincent WF (2003). Microbial ecosystems in Antarctica. Cambridge University Press, Cambridge, UK, pp 229–235.

Wynn-Williams DD (1996). Antarctic microbial diversity: the basis of polar ecosystem processes. Biodiv. Conserv. 5:1271–1293.

Yan P, Hou S, Chen T, Ma X, Zhang S (2012). Culturable bacteria isolated from snow cores along the 1300 km traverse from Zhongshan Station to Dome A, East Antarctica. Extremophiles 16:345–354.

Yergeau E, Newsham KK, Pearce DA, Kowalchuk GA (2007). Patterns of bacterial diversity across a range of Antarctic terrestrial habitats. Environ. Microbiol. 9:2670–2682.


2. CHAPTER 2: SEQUENCE AND METABOLIC PATHWAY DISTRIBUTION IN FOUR LOCATIONS OF ANTARCTICA

2.1 ABSTRACT

Antarctica has one of the most extreme environments on Earth. The biodiversity and species richness on the continent are low and decrease with increasing elevation and distance from the coast. Previous scientific research in Antarctica has been used to understand past climatic conditions, the survival mechanisms used by microbial communities, and the environmental factors that contribute to the dispersal of microorganisms. The research presented here compares microbial inclusions in ice from four locations in Antarctica (Byrd, Taylor Dome, Vostok and J-9) to identify the factors that influence microbial distribution patterns and to investigate the survival of microbes under harsh conditions. Culture-dependent and culture-independent techniques (e.g., metagenomics and metatranscriptomics) were used to analyze the sequences present in ice cores from Antarctica. The sequences matched those from Spirochaetes, Verrucomicrobia, Bacteroidetes, Cyanobacteria, Deinococcus-Thermus, Proteobacteria, Firmicutes, Actinobacteria, Euryarchaeota and Ascomycota. The metagenomic/metatranscriptomic sequences were also analyzed to characterize the pathways represented in these communities. The number of unique sequences obtained from each sample was small (Taylor Dome, 51; Byrd, 43; Vostok, 33; J-9, 40). It was lowest in the sample from the most elevated location in the interior of Antarctica (Vostok, 33) and highest in the sample closest to the Antarctic coast (Taylor Dome, 51). The samples from all four locations appeared to harbor very few species of microorganisms (Taylor Dome, 12; Byrd, 13; Vostok, 6; J-9, 7). Analysis of the microbial pathways revealed that the microorganisms are able to utilize various carbon sources and recycle nitrogen, and that they have unique enzymes and cell structures previously reported to be important for microbial survival at very low temperatures.


2.2 INTRODUCTION

Much of the Earth is permanently cold (below 5°C). This includes seawater, the deep oceans and the polar ice caps. The organisms that live in these habitats are of interest to the scientific community because they are considered to be the most successful colonizers of such environments. Species diversity decreases with decreasing temperature, and microorganisms such as bacteria, fungi and algae become the dominant flora under extreme conditions (Russell et al. 1990).

Antarctica, one of the highest, driest and most inhospitable continents on Earth, holds more than half of the world’s fresh water. It is also one of the least inhabited and least polluted places on Earth, which makes it a suitable place to study climatic change over thousands of years and ecosystems that have been minimally affected by human activities. Antarctica is divided into East Antarctica and West Antarctica, which are separated by the Transantarctic Mountains. East Antarctica is at a higher elevation than West Antarctica and constitutes two-thirds of the Antarctic continent. West Antarctica, although intensely cold, has a milder climate than East Antarctica. Studies have been carried out on the microorganisms entrapped in ice at various locations in Antarctica, and attempts have been made to correlate these organisms with the local geography and the climatic conditions at the time of deposition (Christner 2000). Bacteria recovered from various parts of Antarctica were found to have endured desiccation, solar irradiation, and periods of frozen dormancy and thawing. For this reason, most of the bacteria found under these harsh conditions belong to groups that have mechanisms conferring resistance to harsh environmental conditions, such as the production of spores and pigments (Christner 2000).

Culture-independent and culture-dependent methods have been used to investigate the microbial communities found at various locations in Antarctica (Christner 2000, Jose et al. 2003, Pointing et al. 2009). Analysis of small subunit (SSU) rRNA gene sequences using culture-independent methods has been successfully applied in phylogenetic studies of viable, non-viable and dormant microorganisms in Antarctica (Cowan and Tow 2004). Although it may be unwise to compare directly the data on microbial communities reported by different researchers without taking their research parameters into account, it has been demonstrated that, in general, the diversity of microbial communities decreases with increasing altitude (Wynn-Williams 1996). Attention has also been given to the molecular and regulatory mechanisms adopted by microorganisms that survive in these harsh environments (Satyanarayana et al. 2005). Understanding microbial distribution in Antarctic environments advances our understanding not only of the microbial ecosystem but also of the effects of environmental change in polar environments.

Pyrosequencing is increasingly being used to study diverse bacterial communities because it generates the large numbers of sequences needed for metagenomic analysis (Youssef et al. 2009). It is especially valuable when the sequences of interest are present at very low concentrations or are very rare in a sample. Glacial ice samples have been observed to contain low concentrations of microbial cells (<1 to 10³ cells ml⁻¹; Priscu et al. 1999).
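A simple probability argument shows why read depth matters for rare templates. As an idealized sketch (assuming each read is drawn independently and that a given template makes up a fixed fraction p of the sequenced molecules, which amplified libraries only approximate), the chance that N reads include at least one copy of that template is:

```latex
P(\text{at least one read}) = 1 - (1 - p)^{N} \approx 1 - e^{-pN}
```

For example, a template present at p = 0.0001 has about a 63% chance (1 - e^-1) of appearing at least once in N = 10,000 reads, but only about a 10% chance in N = 1,000 reads.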

In this research, both culture-dependent and culture-independent approaches were used to estimate the distribution of microbial sequences in ice from four locations in Antarctica (Taylor Dome, Byrd, Vostok and J-9; Figure 1). Because cell and nucleic acid concentrations in the ice core samples are very low, cloning-independent metagenomic and metatranscriptomic approaches were applied following extraction of total DNA and RNA from each ice meltwater sample. Pyrosequencing was used to obtain the sequences of the DNA and RNA present in the samples. Analysis of the metagenomic DNA enabled us to assess the metabolic potential of the organisms represented, while analysis of rRNA genes from the metatranscriptomic fraction was used to evaluate the taxa of the microorganisms present in the ice samples. The taxonomic scheme developed by the National Center for Biotechnology Information (NCBI) was used to facilitate identification of the microorganisms. The Kyoto Encyclopedia of Genes and Genomes (KEGG) metabolic pathway database was used to reconstruct metabolic pathways based on similarities to the sequences recovered from the metagenomic/metatranscriptomic analysis.


[Map of Antarctica centered on the South Pole, marking East and West Antarctica and the J-9, Vostok, Byrd and Taylor Dome ice core sites]

Figure 1: Locations in Antarctica from which the ice cores were obtained. Vostok and Taylor Dome are in East Antarctica, while Byrd and J-9 are in West Antarctica. In general, East Antarctica is at higher elevations than West Antarctica.


2.3 METHODOLOGY

2.3.1 Ice core section handling and culturing

Four ice cores from four different locations in Antarctica (Vostok, Byrd, J-9 and Taylor Dome; Fig. 1) were obtained from the US Geological Survey National Ice Core Laboratory (USGS NICL), Denver, CO. They were shipped to our laboratory and stored at -20°C. The ice cores were warmed at -4°C for two hours (working with them at -20°C sometimes induces thermal shock and cracking), after which the frost on the ice was removed and a small section was cut from the large ice core with a saw. The ice core was cut in a sterile room in which the benches had been cleaned with 70% ethanol and the room had been irradiated with UV for 30 minutes before the work session. The ice core section was then moved into a sterile laminar flow hood that had also been wiped with 70% ethanol and irradiated with germicidal UV for 1 hour. Each ice core section was dipped into cold 5.25% sodium hypochlorite (800 ml undiluted Clorox) for 5–10 seconds to decontaminate the outer part of the ice, and then rinsed successively in three beakers, each containing 1,500 ml of autoclaved reverse osmosis water (18 MΩ; <1 ppb TOC, total organic carbon), such that the ice core was completely covered (Rogers et al. 2004). Rinsing ensured that the sodium hypochlorite used for decontamination was removed from the ice. Rinsed sections were placed into a sterile Buchner funnel for melting. Meltwater was collected in 25 ml aliquots, each representing a melt shell of the ice core section. The first meltwater aliquot was denoted “shell 1.” The first shells contain meltwater from the outer part of the ice, while the last shells contain meltwater from the inner part of the ice core. Up to 10 shells of meltwater were collected from each sample. Then, 200 µl of meltwater from every shell of the decontaminated ice was inoculated onto agar plates. Two types of media were used: malt extract agar (MEA; 1.28% maltose, 0.27% dextrin, 0.24% glycerol, 0.08% peptone, 1.5% agar) and Luria–Bertani (LB) agar (1% tryptone, 0.5% yeast extract, 1% sodium chloride, 1.5% agar), to encourage the growth of fungi and bacteria, respectively. During handling, two agar plates (MEA and LB agar) were placed in the laminar flow hood with their lids removed as controls to monitor the air quality within the hood. The inoculated plates were incubated at 15°C for the first week and moved to an 8°C incubator thereafter. Plates were examined weekly for fungal and bacterial growth. Any isolates obtained were immediately subcultured to obtain pure cultures. The remaining meltwater in the shells was stored at -20°C.
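Because a fixed 200 µl of meltwater was spread on each plate, colony counts convert directly into colony-forming units (CFU) per ml of meltwater. The Python sketch below illustrates that arithmetic for a series of melt shells; the function names and the colony counts are hypothetical and are not results from this study.

```python
# Sketch: colony counts -> CFU per ml of meltwater for each melt shell.
# The 200 µl inoculum is from the text; the counts in the example are hypothetical.

INOCULUM_ML = 0.200  # 200 µl of meltwater spread on each plate


def cfu_per_ml(colonies: int, inoculum_ml: float = INOCULUM_ML) -> float:
    """CFU per ml of meltwater for one undiluted plate."""
    if inoculum_ml <= 0:
        raise ValueError("inoculum volume must be positive")
    return colonies / inoculum_ml


def shell_profile(counts_by_shell: dict[int, int]) -> dict[int, float]:
    """Map melt shell number (1 = outermost) to CFU/ml, ordered from outer to inner."""
    return {shell: cfu_per_ml(n) for shell, n in sorted(counts_by_shell.items())}


if __name__ == "__main__":
    hypothetical_counts = {1: 7, 2: 2, 3: 0}  # illustrative only, not data from this study
    for shell, cfu in shell_profile(hypothetical_counts).items():
        print(f"shell {shell}: {cfu:.1f} CFU/ml")
```

With 200 µl plated, a single colony corresponds to 5 CFU/ml, which sets the lower detection limit of this plating step for any one shell.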

2.3.2 DNA extraction of the pure cultures

DNA was extracted using a CTAB (cetyltrimethylammonium bromide) method (Rogers and Bendich 1985). The culture samples were transferred into sterile microfuge tubes. A total of 200 µl of extraction buffer [2% CTAB (w/v), 100 mM Tris (pH 8.0), 20 mM EDTA, 1.4 M NaCl] was added to each tube and mixed thoroughly. Chloroform/isoamyl alcohol (24:1) was then added and mixed thoroughly to form an emulsion, and the phases were separated by centrifugation at 16,000 xg for 3 minutes. Next, the aqueous phase was transferred to a new microfuge tube. One-fifth volume of a 5% CTAB solution was added, and a second round of chloroform/isoamyl alcohol extraction was performed. The aqueous phase was transferred into a new sterile microfuge tube and 0.6 volume of isopropanol was added, followed by gentle mixing. The tube was then placed at -20°C for 24 hours to allow precipitation of the nucleic acids. Centrifugation was carried out at 16,000 xg for 1 minute to pellet the precipitate. The isopropanol was then decanted, and the pellet was rehydrated in 300 µl high-salt TE [10 mM Tris (pH 8.0), 0.1 mM EDTA, 1 M NaCl] with heating in a 65°C water bath. A total of 600 µl of cold 95% ethanol was added to reprecipitate the DNA, followed by centrifugation, and the ethanol was decanted. Next, 300 µl of cold 80% ethanol was added to rinse the DNA, followed by centrifugation. The ethanol was decanted and the pellet was dried in a Vacufuge concentrator (Eppendorf Model 5301, Brinkman, Dublin, OH) at room temperature until the residual ethanol had evaporated, taking care not to dry the pellet completely. The DNA was then rehydrated in 20 µl 0.1X TE [1 mM Tris (pH 8.0), 0.1 mM EDTA]. One microliter of each DNA extract was subjected to electrophoresis at 5 V/cm for 2 hours on a 1% agarose gel with TBE (89 mM Tris, 89 mM boric acid, 2 mM EDTA) and 0.5 µg/ml ethidium bromide. A 100 bp ladder (New England BioLabs) was used as a molecular weight marker on each gel. The gel was illuminated with UV and photographed with a digital camera.

2.3.2.1 PCR (polymerase chain reaction) amplification

The ribosomal DNA regions were amplified with the primers EUB16S-611 (forward) and EUB16S-1336 (reverse) (D’Elia et al. 2008). DNA was amplified with a Fermentas Life Sciences PCR Reagent kit (Fermentas Inc., Hanover, Maryland). Each 50 µl reaction consisted of 2 µl of DNA, 1X Taq buffer [10 mM Tris-HCl (pH 8.8), 50 mM KCl, 0.8% (v/v) Nonidet P40], 1 µM of each primer (forward 3’-TGGGAACTGCATCTGATACTGGCAA-5’ and reverse 3’-TTAGCACCTAGTCTTACGGTGCCAC-5’), 0.2 mM of each dNTP, 1.25 U Taq DNA polymerase and 1.5 mM MgCl2. PCR was carried out in a BIO-RAD PTC-100 Programmable Thermocycler with the following program: 94°C for 5 min; then 40 cycles of 94°C for 2 min, 54°C for 2 min with a ramp of 0.8°C/s to 72°C, and 72°C for 2 min; and finally 72°C for 10 min.
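The primer sequences above can be checked for basic properties that bear on the choice of annealing temperature. The Python sketch below reverses the sequences (printed 3’→5’ in the text) into conventional 5’→3’ order and computes GC content and a rough melting temperature. The formula Tm = 64.9 + 41(nG + nC - 16.4)/N is a common basic approximation assumed here for illustration; it is not stated to be how the 54°C annealing step was chosen.

```python
# Sketch: GC content and a basic Tm estimate for the two 16S primers in the text.
# The sequences are printed 3'->5' in the text; base composition does not depend on
# orientation, so reversing them only changes how they are displayed.
# ASSUMPTION: the basic Tm formula below ignores salt and primer concentration.

PRIMERS_3_TO_5 = {
    "EUB16S-611 (forward)": "TGGGAACTGCATCTGATACTGGCAA",
    "EUB16S-1336 (reverse)": "TTAGCACCTAGTCTTACGGTGCCAC",
}


def to_5_to_3(seq_3_to_5: str) -> str:
    """Rewrite a sequence written 3'->5' in 5'->3' order (same strand, reversed)."""
    return seq_3_to_5[::-1]


def gc_fraction(seq: str) -> float:
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s)


def basic_tm(seq: str) -> float:
    """Basic Tm estimate in degrees C for oligos longer than about 13 nt."""
    s = seq.upper()
    gc = s.count("G") + s.count("C")
    return 64.9 + 41.0 * (gc - 16.4) / len(s)


if __name__ == "__main__":
    for name, seq in PRIMERS_3_TO_5.items():
        print(f"{name}: 5'-{to_5_to_3(seq)}-3'  length={len(seq)}  "
              f"GC={gc_fraction(seq):.0%}  Tm~{basic_tm(seq):.1f} C")
```

For these sequences the sketch gives 48% GC and an estimated Tm of about 57.7°C for EUB16S-611, and 52% GC and about 59.3°C for EUB16S-1336, both a few degrees above the 54°C annealing step, in line with the common practice of annealing slightly below the primer Tm.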

2.3.2.2 PCR product purification

A QIAquick PCR Purification kit (QIAGEN, Valencia, CA) was used to purify the amplified DNA. Five volumes of binding buffer (Buffer PB; guanidine hydrochloride and isopropanol) were added to 1 volume of the PCR sample and mixed. A QIAquick spin column was placed in a collection tube, and the sample was applied to the column, followed by centrifugation for 1 minute. The flow-through was discarded and the QIAquick column was placed back into the same tube. The column was washed with 0.75 ml Buffer PE, followed by centrifugation for 1 minute. The flow-through was discarded and the column was centrifuged for an additional 1 minute. Next, the QIAquick column was placed in a clean microcentrifuge tube, 50 µl Buffer EB (10 mM Tris-Cl, pH 7.0) was added to the center of the QIAquick membrane, and the column was centrifuged for 1 minute. The purified DNA was collected in the microcentrifuge tube. Electrophoresis was carried out as described above, with a 100 bp ladder (New England BioLabs) as the molecular weight marker on each gel. The gel was illuminated with UV and photographed with a digital camera. A total of 25 ng of the purified 600 bp PCR product in 12 µl was sent to Quintara Biosciences, Albany, CA for Sanger sequencing.

(According to QIAGEN, the concentrations of the Buffer PB components and the constituents of Buffer PE are confidential.)

2.3.3 Ultracentrifugation

Meltwater samples (8 centrifuge tubes per sample, each with 35 ml of meltwater) were centrifuged at 100,000 xg for 18 hours at 4°C using a Beckman Type 60 Ti rotor. The supernatants were transferred into conical tubes, and the pellet in each centrifuge tube was resuspended in 50 µl 0.1X TE [10 mM Tris (pH 8.0) and 1 mM EDTA (pH 8.0)]. The resuspended pellets were pooled in conical tubes and stored at -20°C.
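The low cell densities reported for glacial ice (<1 to 10³ cells ml⁻¹; Priscu et al. 1999) imply very small amounts of nucleic acid even after concentrating 280 ml of meltwater (8 tubes × 35 ml). The Python sketch below makes that estimate explicit under assumptions that are not from this study: one genome copy per cell, an E. coli-sized genome of about 4.6 Mb, complete recovery, and an average of 650 g/mol per base pair of double-stranded DNA.

```python
# Back-of-envelope sketch: upper bound on DNA recoverable from the concentrated meltwater.
# From the text: 8 tubes x 35 ml per sample. ASSUMPTIONS (not from the study): one genome
# copy per cell, a ~4.6 Mb genome, 100% recovery, 650 g/mol per base pair.

AVOGADRO = 6.022e23          # molecules per mol
BP_MASS_G_PER_MOL = 650.0    # average mass of one base pair of dsDNA


def genome_mass_g(genome_bp: float) -> float:
    """Mass in grams of one double-stranded genome copy."""
    return genome_bp * BP_MASS_G_PER_MOL / AVOGADRO


def dna_yield_ng(cells_per_ml: float, volume_ml: float, genome_bp: float = 4.6e6) -> float:
    """Upper-bound DNA mass (ng), assuming every cell is recovered and extracted."""
    return cells_per_ml * volume_ml * genome_mass_g(genome_bp) * 1e9


if __name__ == "__main__":
    volume_ml = 8 * 35  # 280 ml of meltwater per sample
    for density in (1, 1e3):
        print(f"{density:g} cells/ml -> at most {dna_yield_ng(density, volume_ml):.4f} ng DNA")
```

Even at 10³ cells/ml the bound is roughly 1.4 ng of DNA per sample, and about 1.4 pg at 1 cell/ml, which helps explain why the extracted nucleic acids were amplified before sequencing.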

2.3.4 DNA extraction

A total of 200 µl of each concentrated meltwater sample was placed in a clean, autoclaved 1.5 ml microcentrifuge tube. Next, 200 µl of 2X CTAB extraction buffer [2% CTAB (w/v), 100 mM Tris (pH 8.0), 20 mM EDTA (pH 8.0) and 1.4 M NaCl] was added to the sample and mixed. The tube was placed in a 65°C water bath for ten minutes and agitated several times. Then, 400 µl of chloroform:isoamyl alcohol (96% chloroform, 4% isoamyl alcohol) was added to denature the proteins and separate them from the DNA. The mixture was shaken vigorously to form an emulsion and centrifuged at 16,000 xg for five minutes, after which the top phase was transferred to a new microcentrifuge tube. Then, 400 µl of isopropanol was added to precipitate the DNA, followed by mixing and centrifugation for 15 minutes. The alcohol was decanted after centrifugation, and 200 µl of 80% ethanol was added to rinse the DNA pellet. This was followed by centrifugation for 5 minutes, after which the alcohol was decanted. The pellets were dried in a Vacufuge concentrator (Eppendorf Model 5301, Brinkman, Dublin, OH) at room temperature until the residual ethanol had evaporated. Each pellet was rehydrated in 20 µl 0.1X TE [1 mM Tris (pH 8.0), 0.1 mM EDTA], and the tube was placed in a 65°C water bath for 20 minutes with frequent mixing to complete hydration of the DNA. The DNA was then stored at -20°C.

2.3.5 RNA extraction

A total of 200 µl of the concentrated meltwater was placed in a clean, autoclaved 1.5 ml microcentrifuge tube. Three volumes (600 µl) of TRIzol LS reagent (phenol/guanidinium thiocyanate) were added and mixed for 10 seconds until the sample was completely homogenized. The homogenized sample was then incubated at 55°C for 10 minutes to permit complete dissociation of nucleoprotein complexes. Next, chloroform was added to the tube (160 µl of chloroform per 600 µl of TRIzol LS reagent used) and mixed vigorously for 15 seconds. The samples were incubated at room temperature for 15 minutes, followed by centrifugation at 16,000 xg for 15 minutes. The upper aqueous layer containing the RNA was removed, and the RNA was precipitated by adding 400 µl isopropanol, mixing by repeated inversion of the tube, and incubating for 2 hours at -20°C. The RNA was pelleted by centrifugation (16,000 xg for 10 minutes). The supernatant was decanted and 800 µl of cold 80% ethanol was added to rinse the RNA pellet. After centrifugation (16,000 xg for 5 minutes), the supernatant was decanted. The pellet was air dried in a sterile hood for 30 minutes and then rehydrated with 20 µl of 0.1X TE [10 mM Tris (pH 8.0) and 1 mM EDTA (pH 8.0)]. The rehydrated RNA was stored at -80°C.

2.3.6 cDNA synthesis

2.3.6.1 First strand synthesis

cDNA was synthesized from the extracted RNA. A total of 10 µl of the extracted RNA was added to an RNase-free 0.5 ml microcentrifuge tube. Next, 76 pmol of random hexamer primers (provided in an Invitrogen SuperScript Choice System kit for cDNA synthesis, Carlsbad, CA) were added to the RNA. The contents of the tube were mixed gently and placed in a 70°C water bath for 10 minutes for denaturation, after which the tubes were chilled on ice to anneal the primers. The first strand synthesis reaction consisted of first strand buffer [50 mM Tris-HCl (pH 8.3), 75 mM KCl, 3 mM MgCl2], 0.95 mM dithiothreitol (DTT), 76 pmol random hexamer primers, 200 µM of each dNTP and 200 U SuperScript II RT (), in a final volume of 21 µl. The mixture was incubated at 37°C for 2 hours, after which the reaction was stopped by placing it on ice. First strand synthesis uses reverse transcriptase to make a DNA copy that is complementary to the RNA template.

2.3.6.2 Second strand synthesis

The first strand was used as a template for synthesis of the complementary DNA strand. The second strand synthesis reaction consisted of DEPC-treated (diethylpyrocarbonate) water, second strand buffer [50 mM Tris-HCl (pH 7.8), 5 mM MgCl2, 1 mM DTT], 200 µM of each dNTP, 10 U E. coli DNA ligase, 40 U E. coli DNA polymerase I and 2 U E. coli RNase H, in a final volume of 171 µl. The contents were mixed thoroughly and incubated at 15°C for 2.5 hours. Then, 10 U of T4 DNA polymerase were added and the contents were incubated for 7 minutes at 15°C. The reaction was placed on ice and stopped by adding 10 µl of 0.5 M EDTA. Next, 150 µl of chloroform:isoamyl alcohol (96% chloroform, 4% isoamyl alcohol) was added and mixed. After centrifugation for 2 minutes at 16,000 xg, the upper aqueous layer was transferred into a new 1.5 ml microcentrifuge tube. Next, 0.5 volumes of 1 M NaCl (final concentration 333 mM) was added to provide the Na+ ions needed to form a salt with the phosphate groups of the DNA. After 0.5 ml of cold 95% ethanol was added, the contents were mixed and placed at -20°C for 10 minutes to precipitate the DNA. The mixture was centrifuged for 20 minutes at 16,000 xg, after which the alcohol was poured off. The pellet was washed with 0.5 ml cold 80% ethanol, the ethanol was poured off, and the pellet was dried in a Vacufuge concentrator (Eppendorf Model 5301, Brinkman, Dublin, OH) at room temperature for 15 minutes, taking care not to dry the sample completely.

2.3.7 EcoRI (NotI) adapter addition

A total of 10 µl of the extracted DNA (above) was added to the cDNA (above). The EcoRI (NotI) adapter addition reaction consisted of DEPC-treated (diethylpyrocarbonate) water, adapter buffer [66 mM Tris-HCl (pH 7.6), 10 mM MgCl2, 1 mM ATP], 25.75 pmol EcoRI (NotI) adapters (5′-AATTCGCGGCCGCGTCGAC-3′, complementary to 5′-pGTCGACGCGGCCGCG-3′), 1.4 mM DTT and 5 U T4 DNA ligase, in a final volume of 60 µl. The EcoRI (NotI) adapters were added to provide the same EcoRI 5′ extension at both ends of the cDNA and DNA so that they could be amplified by PCR with EcoRI primers. T4 DNA ligase catalyzed the joining of the EcoRI (NotI) adapters onto the ends of the cDNA and DNA. The mixture was incubated at 15°C for 18 hours and then heated to 70°C in a water bath to inactivate the ligase.
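The adapter design can be checked directly from the two oligo sequences given above. The Python sketch below (an illustration, not part of the original protocol) reverse-complements the shorter oligo and shows that the pair forms a duplex with a 5′-AATT overhang, the same single-stranded end that EcoRI digestion leaves, and that the duplex carries a NotI recognition site (GCGGCCGC). The function names are hypothetical.

```python
# Sketch: confirm that the two adapter oligos anneal with an EcoRI-compatible 5' overhang.
# Both sequences are from the text, written 5'->3'; the leading "p" (5' phosphate) on the
# short oligo is dropped before comparison.

COMPLEMENT = str.maketrans("ACGT", "TGCA")

LONG_OLIGO = "AATTCGCGGCCGCGTCGAC"   # 5'-AATTCGCGGCCGCGTCGAC-3'
SHORT_OLIGO = "GTCGACGCGGCCGCG"      # 5'-pGTCGACGCGGCCGCG-3', phosphate removed


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def five_prime_overhang(top: str, bottom: str) -> str:
    """Return the unpaired 5' bases of `top` when `bottom` pairs with its 3' end."""
    rc = reverse_complement(bottom)
    if not top.endswith(rc):
        raise ValueError("oligos do not form a duplex with a simple 5' overhang")
    return top[: len(top) - len(rc)]


if __name__ == "__main__":
    print("5' overhang:", five_prime_overhang(LONG_OLIGO, SHORT_OLIGO))    # AATT
    print("NotI site (GCGGCCGC) at index:", LONG_OLIGO.find("GCGGCCGC"))    # 5
```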

2.3.8 Column chromatography

Column chromatography on a Sephacryl column (Invitrogen Life Technologies, Grand Island, NY) was used to fractionate the cDNA and DNA by size. This was done to ensure that smaller cDNAs and DNAs were not favored over larger strands during PCR amplification; in addition, nucleic acids longer than 2–4 kb cannot be sequenced by pyrosequencing. The top and bottom caps of the column were removed, and the excess ethanol was allowed to drain into microcentrifuge tubes and discarded. Then, 0.8 ml of TEN buffer [10 mM Tris-HCl (pH 7.5), 0.1 mM EDTA, 25 mM NaCl; autoclaved] was added to the column and allowed to drain into microcentrifuge tubes. This was repeated three more times, for a total of 3.2 ml of TEN buffer. Next, 20 sterilized microcentrifuge tubes were placed in a rack with tube 1 under the column outlet. Then, 100 µl of TEN buffer was added to the tubes containing the cDNA and DNA from the EcoRI (NotI) adapter addition reaction and mixed gently. The entire sample was added to the center of the top of the column and allowed to drain into the Sephacryl. The effluent was collected in the first tube. Next, 100 µl of TEN buffer was added to the column and the effluent was collected in the second tube. Then, 100 µl of TEN buffer was added to the column and single drops (approximately 35 µl) were collected in tubes 3 through 18. Afterwards, 0.5 volumes of 1 M NaCl, followed by two volumes of cold absolute ethanol, were added to each tube. The contents were mixed gently and stored overnight at -20°C to precipitate the DNA and cDNA. Centrifugation was carried out at room temperature for 20 minutes at 16,000 xg and the supernatant was decanted. The pellets were washed with 0.5 ml of 80% ethanol and centrifuged for 5 minutes at 16,000 xg, and the supernatants were decanted. The cDNA/DNA pellets were then dried in the Vacufuge concentrator (Eppendorf Model 5301, Brinkman, Dublin, OH) at room temperature until the residual ethanol had evaporated. Each nucleic acid fraction was rehydrated in 20 µl of 0.1X TE [10 mM Tris (pH 8.0) and 1 mM EDTA (pH 8.0)].

2.3.9 Amplification and quantitation

Each nucleic acid fraction was amplified by PCR (polymerase chain reaction) with EcoRI adapter primers (3’-AATTCGCGGCCGCGCTCGAC-5’) using a GeneAmp Kit with AmpliTaq DNA Polymerase (Applied Biosystems Cat. #N801-0055). Each 50 µl reaction, in a 0.5 ml thin-walled PCR tube, consisted of 10 µl of the nucleic acid fraction (in 0.1X TE), 54 pmol EcoRI adapter primers, 200 µM of each dNTP, PCR buffer [10 mM Tris-HCl (pH 8.3), 50 mM KCl, 1.5 mM MgCl2, 0.001% w/v gelatin; autoclaved] and 2.5 U Taq DNA polymerase. PCR was carried out in a BIO-RAD PTC-100 Programmable Thermocycler with the following program: 94°C for 4 min; then 40 cycles of 94°C for 1 min, 55°C for 2 min and 72°C for 2 min; and finally 72°C for 10 min. Next, a 1% agarose gel was prepared with TBE [89 mM Tris base, 89 mM boric acid, 2 mM EDTA (pH 8.0)] and 0.5 µg/ml ethidium bromide. For each PCR product, 3 µl was mixed with 3 µl of loading dye and densifier and loaded into the wells, and electrophoresis was run at 5 V/cm for 1 hour. A 100 bp ladder (New England BioLabs) was used as the size standard on the gel. The PCR products of the samples that exhibited bands in the 200 to 2,000 bp range were precipitated using 200 µl 80% ethanol and pooled for subsequent amplification.

2.3.10 Addition of 454 adapters

A and B primers for 454 sequencing were used to amplify the samples (Table 2). These primers have the form A-key-MID-EcoRI and B-key-MID-EcoRI. The key (used for calibration of the 454 sequencer) is a known sequence of four nucleotides (TCAG) located immediately downstream of sequencing primer A (forward primer) and primer B (reverse primer). Unique multiplex identifier (MID) sequences were incorporated into each primer so that the samples could be pooled in a single sequencing run and automated software could then assign the sequences to each sample using the MID sequences at their ends. EcoRI linker sequences were incorporated into the 3′ ends of the primers to anneal to the EcoRI (NotI) adapter sequence at the ends of the DNA during PCR amplification. For example, the A-key-MID1-EcoRI primer was CGTATCGCCTCCCTCGCGCCA-TCAG-ACGAGTGCGT-AATTCGCGGCCGCGTCGAC, and the B-key-MID1-EcoRI primer was CTATGCGCCTTGCCAGCCCGC-TCAG-ACGAGTGCGT-AATTCGCGGCCGCGTCGAC. The samples and their corresponding 454 adapters were (see Table 2): Taylor Dome, MID 2; Byrd, MID 4; Vostok, MID 5; and J-9, MID 6. A 1% gel was prepared, loaded and run as above, except that serial dilutions (5 ng to 75 ng) of plasmid pGEM3Z (Promega Corp., Madison, WI) were used as markers to determine DNA concentrations. The gel was then removed from the buffer, placed on a UV source and photographed with a digital camera to confirm amplification and record fluorescence.
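Reading DNA amounts from the pGEM3Z dilution series amounts to fitting a standard curve of band fluorescence against known mass and inverting it for each sample band. The Python sketch below shows one way to do this with an ordinary least-squares line; the intermediate standard amounts, band intensities and loaded volume are hypothetical, and a linear fit assumes all bands fall within the linear range of the imaging system.

```python
# Sketch: pGEM3Z standard curve -> DNA mass in a sample band.
# Only the 5-75 ng range is from the text; all other numbers are hypothetical.


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept; returns (slope, intercept)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired points")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def ng_from_signal(signal: float, slope: float, intercept: float) -> float:
    """Invert the standard curve: band intensity -> ng of DNA in the band."""
    return (signal - intercept) / slope


if __name__ == "__main__":
    standard_ng = [5, 10, 25, 50, 75]                  # hypothetical points within 5-75 ng
    standard_signal = [1050, 2010, 5020, 9980, 15010]  # hypothetical band intensities (a.u.)
    slope, intercept = fit_line(standard_ng, standard_signal)
    band_ng = ng_from_signal(7500, slope, intercept)   # hypothetical sample band
    loaded_ul = 3.0                                    # hypothetical volume loaded
    print(f"~{band_ng:.1f} ng in band, ~{band_ng / loaded_ul:.1f} ng/ul")
```

Dividing the estimated band mass by the volume loaded gives a concentration; bands brighter than the 75 ng standard lie outside the curve, so extrapolated values for them would be less reliable.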

2.3.11 Sequencing

The DNA concentrations estimated from band fluorescence on the gel were: Vostok, 750 ng/µl; Byrd, 750 ng/µl; J-9, 1,800 ng/µl; Taylor Dome, 1,400 ng/µl. The samples were pooled in a ratio of 1:1:4:4 in a total volume of 50 µl, at a final concentration of 75 ng/µl. The pooled sample was then delivered to the DNA Sequencing Facility, University of Pennsylvania, Philadelphia, PA for 454 sequencing.
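The pooling step can be expressed as a short volume calculation. The Python sketch below uses the concentrations, ratio, final volume and final concentration given above, but it rests on an ASSUMPTION the text does not state: that the 1:1:4:4 ratio refers to DNA mass, in the order the samples are listed (Vostok:Byrd:J-9:Taylor Dome). The function name is hypothetical.

```python
# Sketch: stock volumes needed to build the 454 pool.
# From the text: stock concentrations, 1:1:4:4 ratio, 50 ul final volume, 75 ng/ul final.
# ASSUMPTION: the ratio is a DNA-mass ratio in the order Vostok:Byrd:J-9:Taylor Dome.

STOCK_NG_PER_UL = {"Vostok": 750, "Byrd": 750, "J-9": 1800, "Taylor Dome": 1400}
MASS_RATIO = {"Vostok": 1, "Byrd": 1, "J-9": 4, "Taylor Dome": 4}


def pooling_volumes(final_ul: float, final_ng_per_ul: float) -> dict[str, float]:
    """Return ul of each stock, plus diluent, needed for the requested pool."""
    total_ng = final_ul * final_ng_per_ul
    parts = sum(MASS_RATIO.values())
    volumes = {
        sample: total_ng * MASS_RATIO[sample] / parts / STOCK_NG_PER_UL[sample]
        for sample in STOCK_NG_PER_UL
    }
    diluent = final_ul - sum(volumes.values())
    if diluent < 0:
        raise ValueError("stocks are too dilute for the requested pool")
    volumes["diluent"] = diluent
    return volumes


if __name__ == "__main__":
    for name, ul in pooling_volumes(50, 75).items():
        print(f"{name:12s} {ul:6.2f} ul")
```

Under this reading the pool contains 3,750 ng in total, and each stock contributes only about 0.5 to 1.1 µl, so pre-diluting the stocks would make the pipetting less error-prone.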


Table 2: MID primer sequences used for each sample. The primers have the form A-key-MID-EcoRI and B-key-MID-EcoRI.

Sample | Primer name | Primer sequence (adapter-key-MID-EcoRI linker)
Taylor Dome | A-MetaMID_2 | CGTATCGCCTCCCTCGCGCCA-TCAG-ACGCTCGACA-AATTCGCGGCCGCGTCGAC
Taylor Dome | B-MetaMID_2 | CTATGCGCCTTGCCAGCCCGC-TCAG-ACGCTCGACA-AATTCGCGGCCGCGTCGAC
Byrd | A-MetaMID_4 | CGTATCGCCTCCCTCGCGCCA-TCAG-AGCACTGTAG-AATTCGCGGCCGCGTCGAC
Byrd | B-MetaMID_4 | CTATGCGCCTTGCCAGCCCGC-TCAG-AGCACTGTAG-AATTCGCGGCCGCGTCGAC
Vostok | A-MetaMID_5 | CGTATCGCCTCCCTCGCGCCA-TCAG-ATCAGACACG-AATTCGCGGCCGCGTCGAC
Vostok | B-MetaMID_5 | CTATGCGCCTTGCCAGCCCGC-TCAG-ATCAGACACG-AATTCGCGGCCGCGTCGAC
J-9 | A-MetaMID_6 | CGTATCGCCTCCCTCGCGCCA-TCAG-ATATCGCGAG-AATTCGCGGCCGCGTCGAC
J-9 | B-MetaMID_6 | CTATGCGCCTTGCCAGCCCGC-TCAG-ATATCGCGAG-AATTCGCGGCCGCGTCGAC

Hyphens separate the A or B adapter sequence, the TCAG key, the MID sequence (third segment; shown in bold in the original table) and the EcoRI linker.
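The MIDs in Table 2 allow reads from the pooled run to be assigned back to their ice core samples. The Python sketch below is a simplified illustration, not the software used in this study: it requires an exact 10-nt MID match at the start of a read (after an optional TCAG key), trims the EcoRI linker, and sets aside reads with no recognized MID. Production demultiplexers typically tolerate a small number of sequencing errors in the MID. Function names are hypothetical.

```python
# Sketch: MID-based demultiplexing of 454 reads (simplified; exact MID matches only).
from typing import Optional, Tuple

KEY = "TCAG"                           # 454 key sequence
ECORI_LINKER = "AATTCGCGGCCGCGTCGAC"   # linker at the 3' end of each primer
MID_LEN = 10
MIDS = {                               # MIDs from Table 2
    "ACGCTCGACA": "Taylor Dome",       # MID 2
    "AGCACTGTAG": "Byrd",              # MID 4
    "ATCAGACACG": "Vostok",            # MID 5
    "ATATCGCGAG": "J-9",               # MID 6
}


def assign_read(read: str) -> Optional[Tuple[str, str]]:
    """Return (sample, trimmed insert) for a read, or None if no MID matches exactly."""
    seq = read.upper()
    if seq.startswith(KEY):            # tolerate reads that still carry the key
        seq = seq[len(KEY):]
    sample = MIDS.get(seq[:MID_LEN])
    if sample is None:
        return None
    insert = seq[MID_LEN:]
    if insert.startswith(ECORI_LINKER):
        insert = insert[len(ECORI_LINKER):]
    return sample, insert


def demultiplex(reads: list[str]) -> dict[str, list[str]]:
    """Bin trimmed inserts by sample; unmatched reads are kept unchanged as 'unassigned'."""
    bins: dict[str, list[str]] = {name: [] for name in MIDS.values()}
    bins["unassigned"] = []
    for read in reads:
        hit = assign_read(read)
        if hit is None:
            bins["unassigned"].append(read)
        else:
            sample, insert = hit
            bins[sample].append(insert)
    return bins
```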


2.3.12 Analysis of the sequences

The 454 sequencing results were uploaded to the Ohio Super Computer (OSC) server.

First, the sequences were divided into separate files based on their MID sequences, which correspond to the specific ice core section samples. Next, the primer sequences were clipped off in silico, and sequence assembly (joining together fragments whose 3' and 5' ends overlap with identical sequence) was performed, which produced longer contiguous sequences (contigs). MegaBlast was used to search the NCBI database for similar sequences. The MegaBlast results (with an e-value cut-off of 1e-10) were converted to a database format (FileMaker Pro 11, FileMaker Inc., Santa Clara, California), and the Galaxy genome web site (Pennsylvania State University, PA, https://main.g2.bx.psu.edu/) was used to retrieve the basic taxonomy from the MegaBlast results. Accession numbers for the most similar sequences were exported and used in Batch Entrez (NCBI, National Center for Biotechnology Information, http://www.ncbi.nlm.nih.gov/) to retrieve the sequences. Basic taxonomic and similarity information was first compiled from the searches. The retrieved sequences were then reorganized by gene region, and multiple sequence alignments were performed according to gene homology using the MAFFT (Multiple Alignment using Fast Fourier Transform, Kyoto University, Kyoto, Japan: http://mafft.cbrc.jp/alignment/software/) global and local alignment tools. The SSU rRNA sequences from the bacterial cultures were combined with the SSU rRNA sequences from the metagenomic analysis for the alignment. Next, the alignment file was converted into NEXUS format with SeaView (Université de Lyon, Villeurbanne, France: http://pbil.univ-lyon1.fr/software/seaview.html), and neighbor joining phylogenetic analysis was also performed using SeaView. The resulting trees were evaluated with Dendroscope (University of Tübingen, Tübingen, Germany: http://ab.inf.uni-tuebingen.de/software/dendroscope/).

Next, the unassembled 454 reads were uploaded to the KAAS (KEGG Automatic Annotation Server; KEGG - Kyoto Encyclopedia of Genes and Genomes, Kyoto, Japan) pathway website and searched with BLAST against bacterial and eukaryotic sequences. Metabolic maps were retrieved from the KAAS-KEGG analysis, and a table of the enzymes matching the sequences in each pathway was constructed. The pathways obtained were mapped with the iPath (Interactive Pathways Explorer, Heidelberg, Germany) web server to construct and compare the metabolic profile of each sample.
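The step from KAAS output to a per-pathway enzyme table could be sketched as follows. Assumptions: the KAAS result is read as tab-separated `query<TAB>KO` lines, with unassigned queries appearing alone on a line; the KO-to-pathway table is a tiny hard-coded excerpt (in practice it would be built from KEGG's own KO-pathway links, for example the KEGG REST `link/pathway/ko` listing); and the read names in the test are hypothetical.

```python
from collections import Counter

# Illustrative excerpt of a KO -> KEGG pathway table (hard-coded assumption;
# each KO is truncated to a single map here, real KOs often map to several).
KO_TO_PATHWAYS = {
    "K00844": ["map00010"],  # hexokinase: glycolysis / gluconeogenesis
    "K01647": ["map00020"],  # citrate synthase: citrate (TCA) cycle
}


def parse_kaas(lines):
    """Return {query: KO} from lines of the form 'query<TAB>KO'.

    Lines without a KO (unassigned queries) and blank lines are skipped.
    """
    assignments = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2 and fields[1]:
            assignments[fields[0]] = fields[1]
    return assignments


def tally_pathways(assignments, ko_to_pathways=KO_TO_PATHWAYS):
    """Count how many annotated reads fall in each pathway map."""
    counts = Counter()
    for ko in assignments.values():
        for pathway in ko_to_pathways.get(ko, []):
            counts[pathway] += 1
    return counts
```

Applied to a whole KAAS result file, `tally_pathways(parse_kaas(handle))` would give one count per KEGG map, which is the raw material for a per-sample enzyme table.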

The metagenomic data obtained from two samples analyzed in a previous study (Knowlton et al. 2013) were used as controls for this research. The first control was autoclaved nanopure water (18.2 MΩ·cm, <1 ppb) that had been subjected to ultracentrifugation. The second control was nanopure water (18.2 MΩ·cm, <1 ppb).


2.4 RESULTS

2.4.1 Sequences obtained

A total of 13,756, 20,855, 22,721 and 18,364 high-quality sequence reads were obtained from the 454 pyrosequencing procedure for Taylor Dome, Byrd, Vostok and J-9, respectively. The reads were assembled into contigs, which yielded a total of 224, 130, 341 and 128 sequences from Taylor Dome, Byrd, Vostok and J-9, respectively, that were similar to sequences in the NCBI database according to MegaBlast analysis. After the removal of contaminants based on the set of sequences obtained from the control samples, a total of 51, 43, 33 and 40 unique sequences matching the NCBI database remained for Taylor Dome, Byrd, Vostok and J-9, respectively. A total of 64, 47, 124 and 53 sequences from the Taylor Dome, Byrd, Vostok and J-9 samples, respectively, did not match anything in the NCBI database. Microbial growth was observed only on the LB agar plates. A total of 25 SSU rRNA gene sequences were obtained from the pure cultures (Table 3).
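The filtering described above, keeping MegaBlast hits that pass the 1e-10 cut-off and discarding contigs whose best match also occurs in the control samples, can be sketched in Python. This assumes the MegaBlast results were saved in BLAST+ tabular format (`-outfmt 6`, where column 2 is the subject ID, column 11 the e-value and column 12 the bit score); the thesis does not state the format, and counting distinct subject accessions is only one plausible reading of "unique sequences".

```python
E_VALUE_CUTOFF = 1e-10  # cut-off stated in the methods


def best_hits(lines, cutoff=E_VALUE_CUTOFF):
    """Return {query: subject} for the highest-bit-score hit of each query.

    Expects BLAST+ tabular (-outfmt 6) lines. Hits with an e-value above the
    cut-off are ignored; ties keep the first hit seen.
    """
    best = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        query, subject = f[0], f[1]
        evalue, bitscore = float(f[10]), float(f[11])
        if evalue > cutoff:
            continue
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {q: s for q, (s, _) in best.items()}


def remove_control_hits(sample_hits, control_hits):
    """Drop sample contigs whose best-hit subject also occurs in a control."""
    control_subjects = set(control_hits.values())
    return {q: s for q, s in sample_hits.items() if s not in control_subjects}


def unique_subjects(hits):
    """Distinct database matches remaining after filtering."""
    return sorted(set(hits.values()))
```

Running `best_hits` separately on each sample and on the pooled control results, then passing both to `remove_control_hits`, mirrors the order of operations in the paragraph above.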


Table 3: Taxonomic affiliations of bacteria recovered from the pure cultures at Taylor Dome,

Byrd, Vostok and J-9 determined by phylogenetic analysis of SSU rRNA.

DNA region  Query name  Closest phylogenetic member  Accession number  Percent identity  Sample source

SSU rDNA CUL2-s1 Bacillus amyloliquefaciens KC900904 99% Taylor Dome

CUL2-s2 Bacillus cereus strain JP44SK37 KC900905 97% Taylor Dome

CUL2-s4 Bacillus mojavensis strain p23_C02 KC900907 99% Taylor Dome

CUL2-s5 Bacillus subtilis strain p23_C04 KC900908 97% Taylor Dome

CUL2-s6 Stenotrophomonas maltophilia strain p71_G10 KC900909 94% Taylor Dome

CUL2-s7 Stenotrophomonas sp. SO5 KC900910 98% Taylor Dome

CUL2-s8 Uncultured bacterium clone BI2_57 KC900911 99% Taylor Dome

CUL2-s9 Bacillus oceanisediminis strain p83_H06 KC900912 99% Taylor Dome

CUL2-s10 Bacillus sp. TE-7 KC900913 98% Taylor Dome

CUL4-s1 Bacterium NLAE-zl-P657 KC900914 97% Byrd

CUL4-s3 Uncultured bacterium clone nby373f09c1 KC900915 99% Byrd

CUL4-s4 Uncultured bacterium clone nby397h01c1 KC900916 98% Byrd

CUL4-s7 Micrococcus sp. 9(2012) KC900918 97% Byrd

CUL4-s8 Stenotrophomonas sp. 2325 KC900919 98% Byrd

CUL4-s9 Uncultured bacterium clone 1H3C_28 KC900920 97% Byrd

CUL4-s2 Uncultured bacterium clone SRD3_C05 KC900921 82% Byrd

CUL4-s5 Bacillus subtilis strain RP24 KC900922 99% Byrd

CUL5-s1 Bacillus circulans 16S rRNA gene isolate P23 5c1 KC900923 90% Vostok

CUL5-s2 Uncultured bacterium clone YL-50 KC900924 100% Vostok

CUL5-s3 Bacillus sp. AS-S01a KC900925 97% Vostok

CUL5-s4 Uncultured bacterium clone BH2_aao23a10 KC900926 85% Vostok

CUL6-s1 Uncultured bacterium clone ncd989c02c1 KC900927 96% J-9

CUL6-s2 Bacillus sp. MOD23 KC900928 99% J-9

CUL6-s3 Bacillus subtilis strain RP24 KC900929 99% J-9

CUL6-s4 Uncultured bacterium clone ncd2227e11c1 KC900930 91% J-9


Table 4: Taxonomic affiliations of bacterial sequences amplified in the metagenomic/metatranscriptomic analysis at Taylor Dome, Byrd, Vostok and J-9, determined by phylogenetic analysis of LSU and SSU rRNA.

DNA region  Query name  Closest phylogenetic member  Accession number  Percent identity  Sample source

SSU rDNA MID4_c104 Rhodococcus sp. EL6 KC900931 94% Byrd

MID4_c115 Rhodococcus sp. DR4 KC900932 97% Byrd

MID5_c234 Xanthobacter autotrophicus KC900933 96% Vostok

MID2_c147 Flavobacterium sp. ER3 KC900934 95% Taylor Dome

MID2_c15 Arthrobacter sp. EL3 KC900935 97% Taylor Dome

LSU rDNA MID6_c72 Uncultured cyanobacterium clone AN2_1 KC900936 98% J-9

MID6_rep_c89 Uncultured cyanobacterium clone AN2_1 KC900937 99% J-9

MID4_c131 Uncultured bacterium clone F5K2Q4C04IDZJ3 KC900938 99% Byrd

MID4_rep_c146 Uncultured bacterium clone F5K2Q4C04IDZJ3 KC900939 99% Byrd

MID2_c154 Cyanothece sp. PCC 7425 KC900941 97% Taylor Dome

MID6_rep_c40 Trichodesmium erythraeum IMS101 KC900942 96% J-9

MID4_rep_c26 Gordonibacter pamelaeae 7-10-1-b KC900940 91% Byrd

MID5_rep_c223 Sphingomonas sp. 352 KC900943 97% Vostok

MID5_s54 Rhodococcus erythropolis PR4 KC900944 98% Vostok


2.4.2 Community structure

2.4.2.1 Taxonomic groups represented

One of the most important objectives in a metagenomic or metatranscriptomic study is to analyze the microbial community structure in natural environmental samples. Figure 2 presents the distribution of the sequences among major taxonomic groups of Bacteria and Eukarya for each of the four sampled locations (Taylor Dome, Byrd, Vostok and J-9). The recovered sequences matched sequences from species of Spirochaetes, Verrucomicrobia, Bacteroidetes, Cyanobacteria, Deinococcus-Thermus, Proteobacteria, Firmicutes, Actinobacteria, Euryarchaeota and Ascomycota (Figure 2). The four taxa most highly represented at all the sampled locations were Actinobacteria, Alphaproteobacteria, Betaproteobacteria and Firmicutes. A large proportion of the sequences within each taxon obtained by the metagenomic/metatranscriptomic approach exhibited less than 95% identity with their closest BLAST matches (Table 4), while a large proportion of those recovered from the pure cultures exhibited more than 95% identity (Table 3). Byrd had the highest number of species represented (13), followed by Taylor Dome (12), J-9 (7) and Vostok (6).

[Figure 2 chart: stacked bars for Taylor Dome, Byrd, Vostok and J-9; y-axis 0 to 100% of sequences; legend: Actinobacteria, Alphaproteobacteria, Firmicutes, Betaproteobacteria, Gammaproteobacteria, Uncultured bacteria, Deinococcus-Thermus, Bacteroidetes, Cyanobacteria, Spirochaetes, Verrucomicrobia, Euryarchaeota, Ascomycota]

Figure 2: Bar chart of the microbial composition at the phylum level from the four sampled locations in Antarctica. Percentage distribution was calculated on the basis of the total number of sequences obtained from each sampled location.


2.4.2.2 Phylogenetic analysis

Ribosomal RNA genes are important in establishing the phylogenetic associations of organisms because they are present in, and highly conserved among, all cellular organisms. The metagenomic/metatranscriptomic analysis yielded 14 sequences that could be assigned to SSU or LSU rRNA genes. The SSU rRNA sequences from the metagenomic/metatranscriptomic analysis [Taylor Dome (2), Byrd (2) and Vostok (1)] were combined with the SSU rRNA sequences from the pure cultures [Taylor Dome (9), Byrd (8), Vostok (4) and J-9 (4)], and an additional 41 sequences obtained from the NCBI database were used to construct an SSU rRNA phylogenetic tree (Figure 4). The LSU rRNA gene sequences [Taylor Dome (1), Byrd (3), Vostok (2), J-9 (3)] from the metagenomic/metatranscriptomic analysis and an additional 5 sequences obtained from the NCBI database were used to construct the LSU rRNA neighbor joining phylogenetic tree (Figure 3). The SSU rRNA sequences grouped into four phyla (Bacteroidetes, Firmicutes, Actinobacteria and Proteobacteria), and the LSU rRNA sequences grouped into three phyla (Actinobacteria, Cyanobacteria and Proteobacteria) (see Figures 4 and 3, respectively).
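For readers unfamiliar with the method, the neighbor joining step performed in SeaView can be illustrated with a compact Python version of the Saitou and Nei algorithm. This is a teaching sketch, not SeaView's implementation: it uses uncorrected p-distances, and corrected distances and bootstrapping (whose values appear on Figures 3 and 4) are omitted. The five-taxon matrix in the test is a standard textbook toy example, not data from this study.

```python
def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences,
    ignoring columns where either sequence has a gap ('-')."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def neighbor_joining(names, dist):
    """Saitou-Nei neighbor joining on a symmetric distance matrix.

    names: taxon labels; dist: list of lists of distances.
    Returns an unrooted tree as a Newick string.
    """
    if len(names) < 3:
        raise ValueError("need at least three taxa")
    nodes = list(names)
    d = [list(row) for row in dist]
    while len(nodes) > 3:
        n = len(nodes)
        r = [sum(row) for row in d]
        # Join the pair minimising Q(i, j) = (n - 2) d(i, j) - r_i - r_j.
        best, i, j = None, 0, 1
        for a in range(n):
            for b in range(a + 1, n):
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best:
                    best, i, j = q, a, b
        # Branch lengths from i and j to the new internal node u.
        li = d[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        new_node = f"({nodes[i]}:{li:g},{nodes[j]}:{lj:g})"
        # Distances from u to every remaining node k.
        keep = [k for k in range(n) if k not in (i, j)]
        new_row = [(d[i][k] + d[j][k] - d[i][j]) / 2 for k in keep]
        d = [[d[a][b] for b in keep] for a in keep]
        for row, value in zip(d, new_row):
            row.append(value)
        d.append(new_row + [0.0])
        nodes = [nodes[k] for k in keep] + [new_node]
    # Three nodes left: resolve them around a single central node.
    la = (d[0][1] + d[0][2] - d[1][2]) / 2
    lb = d[0][1] - la
    lc = d[0][2] - la
    return f"({nodes[0]}:{la:g},{nodes[1]}:{lb:g},{nodes[2]}:{lc:g});"
```

Building the matrix with `p_distance` over every pair of aligned sequences and passing it to `neighbor_joining` gives a Newick tree that a viewer such as Dendroscope can open.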

All sequences closely related to the Alphaproteobacteria were found in the Vostok sample. They were related to Xanthobacter autotrophicus and Sphingobium chlorophenolicum. Xanthobacter autotrophicus is a nitrogen-fixing, hydrogen-oxidizing, Gram-negative, rod-shaped bacterium, while Sphingobium chlorophenolicum is a degrader of lignin-derived aromatic compounds. The sequences that grouped with the Gammaproteobacteria were closely related to Aeromonas hydrophila, Stenotrophomonas maltophilia, Stenotrophomonas sp. SO5 and Stenotrophomonas sp. 2325. These were obtained from Vostok and Byrd. A sequence that grouped with members of the Bacteroidetes was recovered from Taylor Dome; it was closely related to Flavobacterium sp. ER3. Sequences that grouped with the Actinobacteria were recovered from Byrd, J-9 and Taylor Dome. They were closely related to Gordonibacter pamelaeae, Arthrobacter sp. EL3, Micrococcus sp. and Rhodococcus erythropolis. Most of the bacteria that grouped with members of the Firmicutes were closely related to Bacillus amyloliquefaciens, Bacillus cereus, Bacillus mojavensis, Bacillus subtilis, Bacillus oceanisediminis, Bacillus circulans and Staphylococcus succinus. Firmicutes were isolated from all the locations sampled. The uncultured bacterial sequences obtained from pure cultures grouped with Firmicutes [Taylor Dome (1), Byrd (3) and Vostok (1)], Gammaproteobacteria [Byrd (1), Vostok (1)] and Actinobacteria [J-9 (2)]. Uncultured bacterial sequences from the LSU rDNA metagenomic/metatranscriptomic analysis grouped with Cyanobacteria [J-9 (2)] and Actinobacteria [Byrd (2)].


[Figure 3 tree: neighbor joining tree with clades labelled Alphaproteobacteria, Gammaproteobacteria, Cyanobacteria and Actinobacteria; scale bar 0.05 substitutions/site]

Figure 3: Phylogenetic analysis of LSU rRNA gene sequences isolated from Antarctica by metagenomic analysis.

Neighbor joining phylogenetic tree indicating the relationships among the sequences recovered from the four sampled locations and the sequences of their nearest relatives based on GenBank/NCBI LSU rRNA gene sequences. MID2, MID4, MID5 and MID6 represent LSU rRNA gene sequences recovered from Taylor Dome, Byrd, Vostok and J-9, respectively. Bootstrap values greater than 50% are indicated on the respective branches. The scale bar represents 0.05 fixed substitutions per nucleotide position.

[Figure 4 tree: neighbor joining (NJ) tree with clades labelled Bacteroidetes, Firmicutes, Actinobacteria, Betaproteobacteria, Gammaproteobacteria, Alphaproteobacteria, Aquificae, Chloroflexi and Cyanobacteria; scale bar 0.01 substitutions/site]

Figure 4: Neighbor joining tree of SSU rRNA gene sequences isolated from Antarctica by the culture and metagenomic analysis methods.

CUL- represents sequences recovered from the pure cultures, while MID- represents those recovered from the metagenomic/metatranscriptomic analysis. The SSU rRNA gene sequences isolated from Taylor Dome (CUL2 and MID2), Byrd (CUL4 and MID4), Vostok (CUL5 and MID5) and J-9 (CUL6 and MID6) were analyzed with similar sequences obtained from GenBank/NCBI. Bootstrap values greater than 50% are indicated on the respective branches. The scale bar represents 0.01 fixed substitutions per nucleotide position.

2.4.3 Statistical analysis

Two separate regression analyses were carried out, each on the four sampled locations. The first analyzed the relationship between the number of sequences obtained and the elevation of the sampled location; the second analyzed the relationship between the number of sequences obtained and the distance of the sampled location from the Antarctic coast. The effect of distance from the Antarctic coast on the number of sequences recovered was weakly significant at the p < 0.1 level (r2 = 0.834, p = 0.087), while the effect of elevation was not significant (r2 = 0.036, p = 0.811).
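The two regressions can be reproduced from the values in Table 6 with a few lines of Python. This sketch computes ordinary least squares directly rather than using whatever statistics package the author used (not stated). Because there are only four locations (2 residual degrees of freedom), the two-sided p-value for the slope can be taken from the closed-form Student t distribution for 2 degrees of freedom, so the function deliberately refuses other sample sizes. With the Table 6 values it returns slope -0.0107, intercept 49.809, R² 0.834 and p 0.087 for distance, and R² 0.036 and p 0.811 for elevation, matching the figures reported above.

```python
import math

# Values from Table 6: (distance from coast in km, elevation in m, unique sequences)
SITES = {
    "Vostok": (1680, 3488, 33),
    "Byrd": (454, 1530, 43),
    "J-9": (647, 60, 40),
    "Taylor Dome": (241, 2440, 51),
}


def simple_regression(x, y):
    """Ordinary least squares y = a + b*x; returns (slope, intercept, r2, p).

    The two-sided p-value for the slope uses the closed-form Student t CDF
    for 2 degrees of freedom, so it is only valid for n = 4 points.
    """
    n = len(x)
    if n != 4:
        raise ValueError("closed-form p-value assumes n = 4 (df = 2)")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r2 = sxy ** 2 / (sxx * syy)
    t = math.sqrt(r2 * (n - 2) / (1 - r2))  # |t| for the slope
    p = 1 - t / math.sqrt(t ** 2 + 2)  # two-sided p, df = 2
    return slope, intercept, r2, p


if __name__ == "__main__":
    counts = [v[2] for v in SITES.values()]
    for label, idx in (("distance (km)", 0), ("elevation (m)", 1)):
        xs = [v[idx] for v in SITES.values()]
        b, a, r2, p = simple_regression(xs, counts)
        print(f"{label}: y = {b:.4f}x + {a:.3f}, R2 = {r2:.4f}, p = {p:.3f}")
```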

2.4.4 Metabolic analysis

A total of 108, 78, 95 and 103 of the unassembled reads from Taylor Dome, Byrd, Vostok and J-9, respectively, matched enzymes in the KAAS (KEGG Automatic Annotation Server; KEGG - Kyoto Encyclopedia of Genes and Genomes, Kyoto, Japan; Moriya et al. 2007) pathway database, representing various metabolic pathways. The combined list of sequences similar to those encoding enzymes from the four locations indicated the presence of 76 metabolic pathways, which were grouped into 8 metabolic pathway clusters (Table 5). The three pathway groups with the highest representation of enzymes were those involved in carbohydrate metabolism [Taylor Dome (31%), Byrd (35%), Vostok (39%), J-9 (33%)], amino acid metabolism [Taylor Dome (28%), Byrd (21%), Vostok (27%), J-9 (28%)] and energy production [Taylor Dome (13%), Byrd (18%), Vostok (9%), J-9 (14%)].

About one-third of the sequences matching enzymes from each of the samples were involved in carbohydrate metabolism. Most of these sequences shared homology with genes involved in the tricarboxylic acid cycle and glycolysis/gluconeogenesis, as well as in propionate, butanoate, pyruvate, fructose and mannose metabolism. Sequences associated with energy conservation pathways were highly represented at all four locations. Sequences that matched enzymes involved in energy metabolism were the third most abundant [Taylor Dome (13%), Byrd (18%), Vostok (9%) and J-9 (14%)], after the amino acid metabolism enzymes. The high percentage of sequences matching enzymes involved in energy metabolism and conservation is consistent with the extreme environmental conditions encountered during both transport and entrapment in glacial ice. The energy metabolism pathways most represented at Taylor Dome, Byrd and J-9 were carbon fixation, nitrogen metabolism and oxidative phosphorylation; these pathways were, however, poorly represented at Vostok. Sequences for genes encoding nitrogen metabolism enzymes were present in the samples from Taylor Dome, Byrd and J-9, but were not found in the sample from Vostok. Consistent with the deep glacial ice environment, sequences matching enzymes related to photosynthesis were very few in the Taylor Dome and Byrd samples and absent from the Vostok and J-9 samples.

In general, sequence matches for enzymes involved in lipid metabolism were very few in all four samples. Sequences matching the long-chain acyl-CoA synthetase enzyme from the Byrd sample, triacylglycerol lipase from the Taylor Dome sample and glycerol from the J-9 sample were the only sequences that matched enzymes unique to lipid metabolism. Sequences matching the acetyl-CoA carboxylase carboxyl gene were present in samples from Taylor Dome and Vostok. Long-chain acyl-CoA synthetase, an enzyme that plays an important role in fatty acid metabolism by activating free fatty acids to acyl-CoA thioesters, was indicated by a single sequence in the Byrd sample. Other sequences were closest to genes encoding the enzymes methionyl-tRNA synthetase (J-9), acetyl-CoA acyltransferase (Taylor Dome) and aldehyde dehydrogenase (Vostok and Byrd).

The percentage of sequences involved in amino acid metabolism was nearly identical at all four locations, except for the Byrd sample, which was somewhat lower [Taylor Dome (31%), Byrd (25%), Vostok (31%) and J-9 (31%)]. The fraction of enzymes responsible for the metabolism of terpenoids, polyketides and other secondary metabolites was highest in the samples collected from the high-elevation East Antarctic locations. Sequences matching genes for enzymes involved in xenobiotic biodegradation were most abundant in the Taylor Dome sample, followed by J-9. Very few sequences matching xenobiotic biodegradation enzymes were found in the Vostok and Byrd samples.

[Figure 5 chart: stacked bars for Taylor Dome, Byrd, Vostok and J-9; y-axis 0 to 100%; legend: Carbohydrate Metabolism, Amino Acid Metabolism, Energy Production, Metabolism of Cofactors and Vitamins, Lipid Metabolism, Metabolism of other Amino Acids, Xenobiotics Biodegradation and Metabolism, Metabolism of Terpenoids and Polyketides, Glycan Biosynthesis and Metabolism, Biosynthesis of other Secondary Metabolites]

Figure 5: Bar chart of the major metabolic pathways represented at the four sampled locations in Antarctica. Percent distribution of the pathways was calculated on the basis of the total number of sequences obtained from each sampled location (Taylor Dome, Byrd, Vostok and J-9) that matched gene sequences for enzymes on KAAS (KEGG Automatic Annotation Server; KEGG - Kyoto Encyclopedia of Genes and Genomes, Kyoto, Japan; Moriya et al. 2007).


Table 5: Percentage representation of the major metabolic pathways from the four locations, based on the number of enzyme sequences found in the metagenomic/metatranscriptomic analysis.

Percentage representation (raw number of sequences)

Metabolic pathway  Taylor Dome  Byrd  Vostok  J-9

Carbohydrate metabolism  31% (33)  35% (27)  39% (37)  33% (33)

Energy production  13% (14)  18% (14)  9% (9)  14% (14)

Lipid metabolism  4% (4)  10% (8)  3% (3)  2% (2)

Amino acid metabolism  31% (30)  25% (16)  31% (26)  31% (31)

Metabolism of cofactors and vitamins  7% (8)  9% (7)  9% (4)  15% (15)

Glycan biosynthesis and metabolism  1% (1)  0% (0)  1% (1)  1% (1)

Metabolism of secondary metabolites  5% (5)  3% (2)  5% (5)  2% (2)

Xenobiotics biodegradation and metabolism  9% (10)  1% (1)  1% (1)  3% (3)

The raw number of sequences for each value is in parentheses.
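The percentages in Table 5 are each site's pathway count divided by that site's total of KAAS-matched reads (108, 78, 95 and 103; Section 2.4.4). A minimal Python check for the energy production row, using only numbers given in this chapter, is shown below; it reproduces the published 13%, 18%, 9% and 14%.

```python
# Energy-production counts (Table 5) and total KAAS-matched reads per site
# (Section 2.4.4); both are taken from this chapter.
MATCHED_READS = {"Taylor Dome": 108, "Byrd": 78, "Vostok": 95, "J-9": 103}
ENERGY_COUNTS = {"Taylor Dome": 14, "Byrd": 14, "Vostok": 9, "J-9": 14}


def percent_of_matched(counts, totals):
    """Express each site's pathway count as a whole-number percentage of
    that site's KAAS-matched reads (round() sends exact halves to even)."""
    return {site: round(100 * counts[site] / totals[site]) for site in counts}


if __name__ == "__main__":
    print(percent_of_matched(ENERGY_COUNTS, MATCHED_READS))
```

The same function can be applied to any other row of Table 5 by swapping in that row's raw counts.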


2.5 DISCUSSION

In this study, culture-dependent and culture-independent approaches were used to compare the sequence distribution of organisms and nucleic acids entrapped in ice samples collected from four locations in Antarctica. The number of sequences obtained from the metagenomic/metatranscriptomic analysis was low, possibly because of the limited amount of DNA in the ice samples. The results indicate that diverse microorganisms are widely dispersed in ice core samples from a depth of approximately 100 m in Antarctica. The recovered sequences matched sequences in GenBank from members of the Spirochaetes, Verrucomicrobia, Bacteroidetes, Cyanobacteria, Deinococcus-Thermus, Proteobacteria, Firmicutes, Actinobacteria, Euryarchaeota and Ascomycota.

The sequences most represented at all four sampled locations were those closely related to Actinobacteria, Firmicutes and Alphaproteobacteria. Microbial analysis of ice cores from Antarctica and Greenland likewise showed that the ice contained primarily sequences from members of these same three groups (Knowlton et al. 2013). This may indicate that members of these groups are widely represented in polar ice.

Sequences closely related to Firmicutes were the most represented among the pure cultures, while sequences closely related to Actinobacteria were the most represented in the metagenomic/metatranscriptomic analysis. Isolates of Bacillus predominated among the cultures. Some of the sequences obtained from the sampled locations were closely related to Bacillus subtilis, Bacillus mojavensis and Bacillus amyloliquefaciens. Pure cultures closely related to these species have previously been isolated from ice cores in Antarctica (D’Elia et al. 2008) and Greenland (Knowlton et al. 2013). They are known to tolerate low temperatures and resist desiccation (Christner et al. 2000), which might explain why they are common in ice cores.

A sequence closely related to Bacillus oceanisediminis was isolated from the Taylor Dome ice core. Sequences closely related to marine bacteria have previously been recovered from Antarctic ice cores (D’Elia et al. 2008). Other sequences in this study that were closely related to marine bacteria include those related to Trichodesmium erythraeum (J-9 sample) and Cyanothece sp. PCC 7425 (Taylor Dome sample).

Sequences closely related to Micrococcus sp., Rhodococcus sp. and Arthrobacter sp. were recovered from Vostok, Byrd and Taylor Dome. These bacteria have also been isolated from ice cores in Greenland (Knowlton et al. 2013) and Antarctica (D’Elia et al. 2008). Some Rhodococcus and Arthrobacter species are psychrophilic, and they may be widely represented in Antarctic ice. Few sequences were closely related to members of the Ascomycota. Ascomycota have previously been isolated and cultured from Antarctic ice cores (D’Elia et al. 2009). The recovery of only a few Ascomycota-related sequences from the sampled locations may indicate very low concentrations of these microbes in the sampled ice.

2.5.1 Factors responsible for sequence distribution in Antarctica

Vostok (elevation 3,488 m) yielded the fewest sequences compared with Taylor Dome (2,440 m), Byrd (1,530 m) and J-9 (60 m) (Table 6). A decrease in the number of sequences with increasing elevation has been reported previously (Yergeau et al. 2007). However, the numbers of sequences from the Taylor Dome and Byrd samples do not follow this trend. The high number of sequences recovered from the Taylor Dome sample, a high-elevation site, compared with the Byrd sample suggests that elevation alone is insufficient to predict sequence distribution in Antarctic ice. Other factors that might be important include storms, precipitation, birds, winds and human activities. The distance of the sampled locations from the Antarctic coast may also be an important factor. Taylor Dome, which is approximately 241 km from the Antarctic coastline, yielded the highest number of recovered sequences, followed by Byrd, J-9 and Vostok (454 km, 647 km and 1,680 km from the coastline, respectively; Table 6). This agrees with a similar study showing that the number of recoverable microorganisms tends to increase with increasing altitude within areas closer to the Antarctic coast (Yergeau et al. 2007). This suggests that microbial distribution patterns and numbers depend more on proximity to maritime versus continental conditions than on latitude.

The deposition of airborne microorganisms in Antarctica may have influenced the microbial assemblages, food webs and metabolic pathways in their new environment (Pearce et al. 2009). The prevailing winds in Antarctica show an anticlockwise circular movement within the continent (Fig. 6). Katabatic winds, on the other hand, move downhill towards the Antarctic coast. When they reach the coast, the continental winds are forced to join the circumpolar winds that circle Antarctica. These winds converge with other winds from the surrounding ocean and continents and periodically reenter the continent as storms, especially in the southern part of the continent where Taylor Dome is located. If wind transport plays an important role in the distribution of microbes in Antarctica, this wind pattern makes it unlikely that a microbial species would be found at only one location on the continent. The data presented do not provide evidence of endemism. It is therefore possible that the organisms are widely distributed throughout Antarctica by the prevailing winds, which might be why some sequences obtained from the four locations are similar to one another. Sequences matching Bacillus subtilis were found in the Vostok, Byrd and J-9 samples. Frost flowers, sea-surface structures created under dry, extremely cold conditions by salt exclusion from the ice matrix during freezing, have been found to contain high concentrations of bacteria (Harding et al. 2011). Frost flowers, coupled with strong winds over the ocean, are involved in the long-range transport of bacteria, and it is possible that the marine species Bacillus oceanisediminis (Taylor Dome sample), Stenotrophomonas sp. SO5 (Taylor Dome sample) and Stenotrophomonas sp. 2325 (Byrd sample) were transported in this way. Microbes are also more concentrated in the ocean surface microlayer than in subsurface waters (Aller et al. 2005), and this microlayer is responsible for enriching microbes in marine aerosols, which are mainly produced by wave action and bubble bursting. Storm systems moving from the ocean into Antarctica through Taylor Dome may have transported these microbiota in the form of aerosols. Other sequences most similar to marine bacterial species include those closely related to Trichodesmium erythraeum (J-9 sample) and Cyanothece sp. PCC 7425 (Taylor Dome sample).

Byrd and J-9 are located at low elevations (1,530 m and 60 m, respectively) in West Antarctica, closer (454 km and 647 km, respectively) to the Antarctic coastline, where there is a higher diversity of organisms than in the remote continental parts of Antarctica (Yergeau et al. 2007). Migratory birds that fly to Antarctica mainly stay close to the coastal region, and they may fly to nearby areas such as Byrd and J-9 (Vincent 2000). These birds are known to introduce microorganisms from other continents to Antarctica (Cowan and Tow 2004). Migratory birds have also been known to fly to the interior of the continent (Graña and Montalti 2012). Various Bacillus species have been shown to cause diseases in birds (Fox and Pensose 2001). Some of the Bacillus sp. recovered as isolates from the samples might therefore have been dispersed to the various locations by migratory birds.

Although birds such as the polar skua have been reported at the South Pole, they have probably been recently attracted there by the dump at the South Pole Station (Mund and Miller 1995). This station was not present 500 years ago when the sampled bacteria were deposited and therefore winds remain the probable transport mechanism for the microorganisms found in remote continental areas of Antarctica.

Table 6: Comparison of the sampled locations in terms of site elevation, distance from the coast, number of unique sequences recovered and number of species identified using the ribosomal sub-units.

Location sampled  Site elevation (m)  Distance from the coast (km)  Number of unique sequences  Number of species identified

Vostok 3,488 1,680 33 7

Byrd 1,530 454 43 13

J-9 60 647 40 7

Taylor Dome 2,440 241 51 12


[Figure 6 map: prevailing winter winds over Antarctica, with the sampled locations Vostok, J-9, Byrd and Taylor Dome marked]

Figure 6: Prevailing winds in Antarctica during winter. The prevailing winds show a clockwise movement around the Antarctic continent and an anticlockwise circular movement within the continent. The prevailing winds on the continent drift off at its edge and join the winds over the Southern Ocean. The winds around the continent reenter it at specific points, especially around Taylor Dome (Pearce et al. 2009).


The Taylor Dome sample had the highest number of sequences matching the NCBI database (51), followed by Byrd (43), J-9 (40) and Vostok (33). Many sequences were common to the Taylor Dome and Byrd ice cores, presumably because these two locations are closer to the major sources of airborne material near the Antarctic coastline. As wind and storm systems move from the Southern Ocean into Antarctica, some of the airborne material may be deposited at Taylor Dome and Byrd (241 km and 454 km from the Antarctic coastline). The Vostok ice core contained the fewest sequences (33) matching the NCBI database. This might be because Vostok lies at a higher elevation (3,488 m) in a remote part of East Antarctica, within a very cold desert that receives very little annual precipitation. Conditions at Taylor Dome, Byrd and J-9 are more moderate.

The regression of the number of sequences obtained on the distance of the sampled location from the Antarctic coast gave a reasonably good fit (significance F = 0.087). In other words, if distance had no effect, a fit at least this good would be expected by chance only about 8.7% of the time. This suggests that the distance of the sampled location from the Antarctic coast affects the number of sequences obtained from that location. However, the correlation is only weakly significant (r² = 0.834, p = 0.087; significant at p < 0.1 but not at p < 0.05). In contrast, the regression of the number of sequences obtained on the elevation of the sampled locations had a significance F of 0.811, indicating that a fit of this quality could easily have arisen by chance. Elevation therefore appears to have little or no effect on the number of sequences recovered, compared with distance from the coast (significance F = 0.087). The correlation between the elevation of the sampled locations and the number of sequences recovered is not significant (r² = 0.036, p = 0.811). See Figure 7.
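The significance values above can be checked from the R² values reported for Figure 7. The sketch below is an illustrative assumption, not the software used in the original analysis: it applies the standard F-test for a simple linear regression, F = R²(n - 2)/(1 - R²) with (1, n - 2) degrees of freedom, and assumes n = 4 (one point per sampled location). With only 2 residual degrees of freedom the p-value has a closed form, so no statistics library is needed. The function name `f_test_from_r2` is hypothetical.

```python
import math


def f_test_from_r2(r2: float, n: int) -> tuple[float, float]:
    """Return (F, p) for a simple linear regression fitted to n points.

    F = R^2 (n - 2) / (1 - R^2) with (1, n - 2) degrees of freedom.
    Assumption: the closed-form p-value below is valid only for n = 4,
    where F(1, 2) equals t^2 for a t-statistic with 2 degrees of freedom.
    """
    if n != 4:
        raise ValueError("closed-form p-value here assumes n = 4 (2 residual df)")
    f_stat = r2 * (n - 2) / (1.0 - r2)
    # Two-sided p for t with 2 df is 1 - |t| / sqrt(t^2 + 2); substitute t^2 = F.
    p_value = 1.0 - math.sqrt(f_stat / (f_stat + 2.0))
    return f_stat, p_value


# R^2 values as reported for the two fitted lines in Figure 7.
for label, r2 in [("distance from coast", 0.8339), ("elevation", 0.0357)]:
    f_stat, p = f_test_from_r2(r2, n=4)
    print(f"{label}: F = {f_stat:.2f}, p = {p:.3f}")
```

Under these assumptions the script prints F = 10.04, p = 0.087 for distance and F = 0.07, p = 0.811 for elevation, matching the values in the text. With n = 4 the expression also simplifies algebraically to p = 1 - |r|, so a four-point regression needs |r| > 0.95 before p falls below 0.05.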

[Figure 7 plot. Panel (a): Number of sequences vs. Distance of sampled location from the coast (km); fitted line y = -0.0107x + 49.809, R² = 0.8339. Panel (b): Number of sequences vs. Elevation of sampled location (m); fitted line y = -0.001x + 43.573, R² = 0.0357. Labeled points: Taylor Dome, Byrd, J-9, Vostok.]

Figure 7: Distribution of the number of sequences obtained from the sampled locations versus the distance of the sampled locations from the Antarctic coast (a), and versus the elevation of the sampled locations (b).


2.5.2 Metabolic analysis

The microbes found in glacial ice may be viable, dormant or dead. There is evidence that microorganisms can survive with little nutrient supply in glacial ice at temperatures below the freezing point of pure water (Price and Sowers 2004). Campen et al. (2003) also showed anomalies in the concentrations of gases trapped in temperate glacial ice relative to polar glacial ice, suggesting that some form of metabolic activity was taking place in the solid ice.

Numerous sequences that matched enzymes were recovered from the four locations sampled.

2.5.2.1 Metabolism in cold habitats

The carbohydrate metabolic analysis of the organisms from the four locations indicates that they are able to utilize various sources of carbon. The percentage of sequences matching enzymes responsible for energy metabolism was highest in the Byrd sample (18%), followed by J-9 (14%), Taylor Dome (13%) and Vostok (9%). As temperatures decrease and environmental conditions become more severe, metabolic and growth rates also decrease (Price and Sowers 2004). This might be why the percentage of sequences matching enzymes responsible for energy metabolism was lowest in the Vostok sample. The energy sources available to the microorganisms influence their metabolic processes and determine the community structure in the glacial ice (Price 2000). These energy sources differ depending on the environmental and geographic factors present at a given location. This is evident in areas closer to the Antarctic coast, such as Taylor Dome, where the glacial ice contains high amounts of salt brought to the continent by storms moving across the Southern Ocean (Becagli et al. 2012). The recovery of few sequences from the Vostok sample may be the result of a more limited range of energy sources than at the other sampled locations.

Glacial ice is known to harbor numerous types of aerobic and anaerobic bacteria (Simon et al. 2009), including methanogens, heterotrophs, and nitrate and sulfate reducers (Skidmore et al. 2000). These bacteria are thought to live mainly in the veins between the ice crystals. Most of the metabolic activity may take place in these veins, where the nutrients and elements necessary for survival are present (Price 2000; Price and Sowers 2004). The veins are filled with acids such as sulphuric, hydrochloric, nitric, methanesulfonic and formic acids, whose very low freezing points enable them to remain liquid within the veins between the ice crystals (Price 2000). The presence of genes for methane, nitrogen and sulphur metabolism at the four locations suggests that the microorganisms may be utilizing these acids to obtain energy. Species known to obtain energy by oxidizing electron donors in their environment that matched sequences obtained from the sampled locations included Arthrobacter sp. EL3 and Sphingomonas sp. 352.

Sequences potentially encoding enzymes responsible for lipid metabolism were very few at all four locations. The pathways represented were fatty acid, glycerolipid and alpha-linolenic acid metabolism. The small number may reflect the small sample size or the limited number of microorganisms in the ice. The importance of lipid metabolism for microorganisms in cold deserts has been studied in great depth (Margesin and Miteva 2011; Sengupta and Chattopadhyay 2013). Altering membrane lipid composition is one way in which bacteria adapt to low- and high-temperature stress (Shivaji and Prakash 2010). At low temperatures, the lipids in cell membranes undergo a transition from a liquid-crystalline phase to a gel phase (Shivaji and Prakash 2010). Bacteria that live in cold environments have adapted to slow this transition by encoding enzymes for polyunsaturated fatty acid synthesis and fatty acid cis/trans isomerization, which increase membrane fluidity and sustain cell function at low temperatures (Methé et al. 2005). One sequence from Taylor Dome was closest to a sequence from Flavobacterium sp. ER3, which is known to have a high percentage of unsaturated fatty acids. Another sequence from Taylor Dome ice was closest to the acetyl-CoA carboxylase carboxyl transferase gene. The enzyme encoded by this gene is important in the synthesis of malonyl-CoA, a building block for the new fatty acids used in phospholipid formation.

Bacterial survival under stresses such as lack of substrates, UV irradiation and low temperatures in Antarctica has led to the evolution of species with unique structures and biological activities (Singh et al. 2011). Polyketides and terpenoids are structurally diverse secondary metabolites, produced by a wide range of organisms and having a broad range of biological activities, that have been studied extensively in fungi and bacteria (Rateb and Ebel 2011; Kim and Li 2012). Pathways for the synthesis of secondary metabolites were present at the sampled locations, although very few enzymes were represented (Table 5). Secondary metabolites have many functions in bacteria, including facilitating competition for substrates, mediating communication between organisms (Mukherjee et al. 2012), absorbing harmful UV radiation and inhibiting other microbes (Demain 1999). Some of the sequences recovered from the samples are most similar to those from bacteria known to produce secondary metabolites that may be important in the glacial ice community. Bacillus subtilis, which was isolated from the sampled regions, is also known to produce antimicrobial secondary metabolites (Awais et al. 2010). The functions of some secondary metabolites are not clearly understood, but they are produced under selective evolutionary pressure and therefore must serve important functions (Amsler et al. 2001).

The percentages of sequences involved in amino acid metabolism were almost identical at all four locations (Table 5). This may reflect the similarity of the environmental conditions and selective pressures at the four locations. Pathways for the synthesis of all twenty amino acids are represented among the sampled locations. Because few sequences matched the specific enzymes involved, it is difficult to evaluate each pathway individually.

2.5.3 Adaptation to cold habitats

Microorganisms living in cold environments such as glacial ice may be cold-preferring or merely cold-tolerant. Microbes have adapted to living in extreme habitats in various ways.

To survive at low temperatures, microorganisms may increase the proportion of polyunsaturated fatty acids in their membranes to increase membrane fluidity (Thomas and Dieckmann 2002). At low temperatures, when membrane transport is slow, some microorganisms may, where possible, move to an environment richer in organic material to compensate for their less effective transport systems, or may synthesize enzymes that are better adapted to low temperatures (Price 2000). Nucleic acids are also affected by low temperatures: their secondary structures are stabilized in ways that can interfere with DNA replication, transcription and translation. To counter this, microorganisms living in ice synthesize cold-active proteins to maintain protein stability and catalytic efficiency (Price 2000). Cold-adapted microbes often have genomes with high proportions of A/T, while organisms from hot springs are normally G/C rich (Zhao et al. 2010). Some bacteria synthesize antifreeze proteins to counter the formation of ice crystals that may damage cell structures (Mindock et al. 2001). Some bacteria recovered from the ice belong to groups known to have structures, such as spores and thick cell walls, that confer resistance to harsh environmental conditions such as freezing and dehydration (e.g., Bacillus circulans, Bacillus subtilis and Bacillus cereus). Rhodococcus erythropolis also has photo-induced pigmentation that absorbs solar radiation and helps prevent DNA damage.
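The A/T versus G/C contrast mentioned above is usually expressed as the G+C fraction of a sequence. As a minimal sketch (an assumption for illustration, not a step performed in this study), the function below computes G+C and A+T fractions for a DNA string while ignoring ambiguity codes such as N; the function name `base_composition` and the example fragment are hypothetical.

```python
def base_composition(seq: str) -> dict[str, float]:
    """Return G+C and A+T fractions of a DNA sequence (hypothetical helper).

    Only unambiguous bases (A, C, G, T) are counted, so IUPAC ambiguity
    codes such as N do not bias either fraction.
    """
    seq = seq.upper()
    counts = {base: seq.count(base) for base in "ACGT"}
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no unambiguous A/C/G/T bases")
    gc = (counts["G"] + counts["C"]) / total
    return {"GC": gc, "AT": 1.0 - gc}


# Hypothetical fragment for illustration only; not taken from the ice-core data.
print(base_composition("ATGCNNATTA"))
```

For this fragment, 2 of the 8 unambiguous bases are G or C, giving a G+C fraction of 0.25 and an A+T fraction of 0.75. Base composition estimated from short metagenomic reads is noisy, so comparisons like the one cited above are normally made on whole genomes.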

2.5.4 Important enzymes in cold habitats

The analysis of enzymes from microorganisms living in cold habitats provides an opportunity to study the survival mechanisms of these microorganisms and their adaptation to their habitat (Morita 1988). Psychrophilic and psychrotolerant microorganisms may adopt two strategies to ensure their survival in cold deserts: produce enzymes that operate optimally at lower temperatures, or increase the amount of enzyme produced to compensate for lower rates of catalysis (D'Amico et al. 2001). Some of the enzyme sequences recovered from the four locations have been reported to be important in the survival of microorganisms living in cold deserts. Isoenzymes with different thermal stabilities contribute to bacterial adaptation in extreme environments (Feller and Gerday 1997). Sequences most similar to isocitrate dehydrogenase I were found in the J-9 sample. Isocitrate dehydrogenase I, a dimer, is more stable than isocitrate dehydrogenase II, which is a monomer (Russell et al. 1990). Both isoenzymes have previously been isolated from psychrophilic Vibrio spp. Isocitrate dehydrogenase I denatures above 15°C but is reactivated at 0°C, which makes it efficient at low temperatures (Russell et al. 1990).

Hydrolytic enzymes hydrolyze high-molecular-weight organic constituents such as esters, proteins, and alpha- and beta-polysaccharides, and are active in carbon and nitrogen recycling in glacial ice (Yu et al. 2009). Phosphatase and alpha-galactosidase are among the hydrolytic enzymes reported to be important in bacteria found in glacial ice (Yu et al. 2009). Sequences matching these two enzymes were recovered from the sampled locations. Sequences close to those encoding polyphosphate kinase, the enzyme responsible for the synthesis of polyphosphates (Rao et al. 2009), were found in the J-9 and Taylor Dome samples. Polyphosphates serve as short-term energy stores and influence specific gene expression under stressful conditions by mimicking DNA and forming complexes with RNA. By regulating specific enzymatic activities, polyphosphates promote mutagenesis that may eventually lead to adaptive evolution of microorganisms in extreme environments (Seufferheld et al. 2008).

2.6 CONCLUSION

In this study, ice cores from four locations in Antarctica were sampled to investigate the distribution of microbial sequences and the agents and factors likely responsible for that distribution. The metabolic pathways present in the samples were also analyzed. Metagenomic and metatranscriptomic analyses were carried out to quantify the genes present in the community, and SSU and LSU rRNA analyses were performed using sequences from isolates present at the four locations. The numbers of unique sequences obtained were small (Taylor Dome (51), Byrd (43), Vostok (33) and J-9 (40)), probably because of the small sample size or because very few microorganisms are able to survive in glacial ice. The number of unique sequences was lowest at the most elevated location in the interior of Antarctica (Vostok (33)) and highest at the location closest to the Antarctic coast (Taylor Dome (51)). The samples from all four locations appear to harbor very few species of microorganisms (Taylor Dome (12), Byrd (13), Vostok (6), J-9 (7)), with the highest species representation at the locations closest to the Antarctic coast (Taylor Dome (12) and Byrd (13)).

Analysis of the metabolic pathways revealed that the pathways responsible for carbohydrate, energy and amino acid metabolism had the highest representation of gene sequences, while those responsible for secondary metabolite synthesis were poorly represented. It is important to note that the microorganisms found at the different locations in Antarctica may have been brought together by various dispersal agents and may not necessarily be interacting and functioning as a community.


2.7 REFERENCES

Abyzov SS, Poglazova MN, Mitskevich IN, Ivanov MV (1998). Common features of microorganisms in ancient layers of the Antarctic ice sheet. In: Castello JD, Rogers SO (eds) Life in ancient ice. Princeton University Press, Princeton, NJ, pp 240–250.

Aller JY, Kuznetsova MR, Jahns CJ, Kemp PF (2005). The sea surface microlayer as a source of viral and bacterial enrichment in marine aerosols. J. Aerosol Sci. 36:801–812.

Amsler CD, McClintock JB, Baker BJ (2001). Secondary metabolites as mediators of trophic interactions among Antarctic marine organisms. Amer. Zool. 41:17–26.

Awais M, Pervez A, Yaqub A, Shah MM (2010). Production of antimicrobial metabolites by Bacillus subtilis immobilized in polyacrylamide gel. Pakistan J. Zool. 42:267–275.

Azmi OR, Seppelt RD (1998). The broad-scale distribution of microfungi in the Windmill Islands region, continental Antarctica. Polar Biol. 19:92–100.

Becagli S, Scarchilli C, Traversi R, Dayan U, Severi M, Frosini D, Udisti R (2012). Study of present-day sources and transport processes affecting oxidised sulphur compounds in atmospheric aerosols at Dome C (Antarctica) from year-round sampling campaigns. Atmos. Environ. 52:98–108.

Campen KR, Sowers T, Alley RB (2003). Evidence for microbial consortia metabolizing within a low latitude mountain glacier. Geology 31:231–234.

Christner BC (2000). Recovery and identification of viable bacteria immured in glacial ice. Icarus 144:479–485.

Christner BC, Mosley-Thompson E, Thompson LG, Zagorodnov V, Sandman K, Reeve JN (2000). Recovery and identification of viable bacteria immured in glacial ice. Icarus 144:479–485.

Cowan DA, Ah Tow L (2004). Endangered Antarctic microbial communities. Ann. Rev. Microbiol. 58:649–690.

D'Amico S, Gerday C, Feller G (2001). Structural determinants of cold adaptation and stability in a large protein. J. Biol. Chem. 276:25791–25796.

D'Elia T, Veerapaneni R, Rogers SO (2008). Isolation of microbes from Lake Vostok accretion ice. Appl. Environ. Microbiol. 74:4962–4965.

D'Elia T, Veerapaneni R, Theraisnathan V, Rogers SO (2009). Isolation of fungi from Lake Vostok accretion ice. Mycologia 101:751–763.

Demain AL (1999). Pharmaceutically active secondary metabolites of microorganisms. Appl. Microbiol. Biotechnol. 52:455–463.

Feller G, Gerday C (1997). Psychrophilic enzymes: molecular basis of cold adaptation. Cell. Mol. Life Sci. 53:830–841.

Fox H, Penrose CB (1923). Disease in Captive Wild Mammals and Birds. Incidence, Description, Comparison. J.B. Lippincott Co., Philadelphia, PA, pp 665–670.

Graña GM, Montalti D (2012). Trophic interactions between brown and south polar skuas at Deception Island, Antarctica. Polar Biol. 35:299–304.

Harding T, Jungblut AD, Lovejoy C, Vincent WF (2011). Microbes in high Arctic snow and implications for the cold biosphere. Appl. Environ. Microbiol. 77:3234–3243.

José RT, Brett MG, Friedmann EI, Norman RP (2003). Microbial diversity of cryptoendolithic communities from the McMurdo dry valleys, Antarctica. Appl. Environ. Microbiol. 69:3858–3867.

Karl DM, Bird DF, Björkman K, Houlihan T, Shackelford R, Tupas L (1999). Microorganisms in the accreted ice of Lake Vostok, Antarctica. Science 286:2144–2147.

Kim SK, Li YX (2012). Biological activities and health effects of terpenoids from marine fungi. Marine Medicinal Foods: Implications and Applications: Animals and Microbes. Vol. 65. Elsevier Inc., Waltham, MA, pp 137–149.

Knowlton C, Veerapaneni R, D'Elia T, Rogers SO (2013). Microbial analyses of ancient ice core sections from Greenland and Antarctica. Biol. 2:206–232.

Ma LJ, Rogers SO, Catranis CM, Starmer WT (2000). Detection and characterization of ancient fungi entrapped in glacial ice. Mycologia 286–295.

Margesin R, Miteva V (2011). Diversity and ecology of psychrophilic microorganisms. Res. Microbiol. 162:346–361.

Methé BA, Nelson KE, Deming JW, Momen B, Melamud E, Zhang X, Fraser CM (2005). The psychrophilic lifestyle as revealed by the genome sequence of Colwellia psychrerythraea 34H through genomic and proteomic analyses. Proc. Natl. Acad. Sci. USA 102:10913–10918.

Mindock CA, Petrova MA, Hollingsworth RI (2001). Re-evaluation of osmotic effects as a general adaptative strategy for bacteria in sub-freezing conditions. Biophys. Chem. 89:13–24.

Morita RY (1988). Bioavailability of energy and its relationship to growth and starvation survival in nature. Can. J. Microbiol. 34:436–441.

Moriya Y, Itoh M, Okuda S, Yoshizawa A, Kanehisa M (2007). KAAS: an automatic genome annotation and pathway reconstruction server. Nucleic Acids Res. 35:182–185.

Mukherjee PK, Horwitz BA, Kenerley CM (2012). Secondary Metabolism in Trichoderma – a genomic perspective. Microbiol. 158:35

Mund MJ, Miller GD (1995). Diet of the south polar skua Catharacta maccormicki at Cape Bird, Ross Island, Antarctica. Polar Biol. 15:453–455.

Pearce DA, Bridge PD, Hughes KA, Sattler B, Psenner R, Russell NJ (2009). Microorganisms in the atmosphere over Antarctica. FEMS Microbiol. Ecol. 69:143–157.

Pointing SB, Chan Y, Lacap DC, Lau MC, Jurgens JA, Farrell RL (2009). Highly specialized microbial diversity in hyper-arid Polar desert. Proc. Natl. Acad. Sci. USA 106:19964–19969.

Price PB (2000). A habitat for psychrophiles in deep Antarctic ice. Proc. Natl. Acad. Sci. USA 97:1247–1251.

Price PB, Sowers T (2004). Temperature dependence of metabolic rates for microbial growth, maintenance, and survival. Proc. Natl. Acad. Sci. USA 101:4631–4636.

Priscu JC, Adams EE, Lyons WB, Voytek MA, Mogk DW, Brown RL (1999). Geomicrobiology of subglacial ice above Lake Vostok, Antarctica. Science 286:2141–2144.

Rao NN, Gómez-García MR, Kornberg A (2009). Inorganic polyphosphate: essential for growth and survival. Ann. Rev. Biochem. 78:605–647.

Rastogi G, Sani RK (2011). Molecular techniques to assess microbial community structure, function, and dynamics in the environment. In: Microbes and Microbial Technology. Springer, New York, pp 29–57.

Rateb ME, Ebel R (2011). Secondary metabolites of fungi from marine habitats. Nat. Prod. Rep. 28:290–344.

Rogers SO, Bendich AJ (1985). Extraction of DNA from milligram amounts of fresh, herbarium and mummified plant tissues. Plant Mol. Biol. 5:69–76.

Rogers SO, Theraisnathan V, Ma LJ, Zhao Y, Zhang G, Shin SG, Castello JD, Starmer WT (2004). Comparisons of protocols for decontamination of environmental ice samples for biological and molecular examinations. Appl. Environ. Microbiol. 70:2540–2544.

Russell NJ, Harrisson P, Johnston IA, Jaenicke R, Zuber M, Franks F, Wynn-Williams D (1990). Cold adaptation of microorganisms [and discussion]. Philos. Trans. R. Soc. Lond. B 326:595–611.

Satyanarayana T, Chandralata R, Shivaji S (2005). Extremophilic microbes: Diversity and perspectives. Curr. Sci. 89:78–90.

Sengupta D, Chattopadhyay MK (2013). Metabolism in bacteria at low temperature: A recent report. J. Biosci. 38:1–4.

Seufferheld MJ, Alvarez HM, Farias ME (2008). Role of polyphosphates in microbial adaptation to extreme environments. Appl. Environ. Microbiol. 74:5867.

Shivaji S, Prakash JS (2010). How do bacteria sense and respond to low temperature? Arch. Microbiol. 192:85–95.

Simon C, Wiezer A, Strittmatter AW, Daniel R (2009). Phylogenetic diversity and metabolic potential revealed in a glacier ice metagenome. Appl. Environ. Microbiol. 75:7519–7526.

Singh J, Dubey AK, Singh RP (2011). Antarctic terrestrial ecosystem and role of pigments in enhanced UV-B radiations. Rev. Environ. Sci. Biotech. 10:63–77.

Skidmore ML, Foght JM, Sharp MJ (2000). Microbial life beneath a high Arctic glacier. Appl. Environ. Microbiol. 66:3214–3220.

Thomas DN, Dieckmann GS (2002). Antarctic Sea ice - a habitat for extremophiles. Science 295:641–644.

Vincent WF (2000). Evolutionary origins of Antarctic microbiota: Invasion, selection and endemism. Antarct. Sci. 12:374–385.

Wynn-Williams DD (1996). Antarctic microbial diversity: the basis of polar ecosystem processes. Biodiv. Conserv. 5:1271–1293.

Yergeau E, Newsham KK, Pearce DA, Kowalchuk GA (2007). Patterns of bacterial diversity across a range of Antarctic terrestrial habitats. Environ. Microbiol. 9:2670–2682.

Youssef N, Sheik CS, Krumholz LR, Najar FZ, Roe BA, Elshahed MS (2009). Comparison of species richness estimates obtained using nearly complete fragments and simulated pyrosequencing-generated fragments in 16S rRNA gene-based environmental surveys. Appl. Environ. Microbiol. 75:5227–5236.

Yu Y, Li H, Zeng Y, Chen B (2009). Extracellular enzymes of cold-adapted bacteria from Arctic sea ice, Canada Basin. Polar Biol. 32:1539–1547.

Zhao JS, Deng Y, Manno D, Hawari J (2010). Shewanella spp. genomic evolution for a cold marine lifestyle and in-situ explosive biodegradation. PLoS ONE 5:9109.

APPENDICES

A. Numbers of sequences matching those for genes in specific pathways that were recovered from the Taylor Dome, Byrd, Vostok and J-9 samples.

No. of Sequences matching to enzymes in Metabolic Pathways metabolic pathways Taylor Dome Byrd Vostok J-9 Carbohydrate Metabolism 00010 Glycolysis / Gluconeogenesis 4 4 7 3 00020 Citrate cycle TCA cycle 4 2 2 5 00030 Pentose phosphate pathway 3 3 4 4 00040 Pentose and glucuronate interconversions 1 1 1 2 00051 Fructose and mannose metabolism 3 3 1 4 00052 Galactose metabolism 1 1 1 2 00053 Ascorbate and aldarate metabolism 1 1 00500 Starch and sucrose metabolism 1 1 00520 Amino sugar and nucleotide sugar metabolism 4 3 4 2 00620 Pyruvate metabolism 5 3 7 3 00630 Glyoxylate and dicarboxylate metabolism 3 2 1 3 00640 Propanoate metabolism 3 3 5 1 00650 Butanoate metabolism 2 1 2 2 00660 C5-Branched dibasic acid metabolism 1

Energy Production 00190 Oxidative phosphorylation 3 3 4 00195 Photosynthesis 1 1 00710 Carbon fixation in photosynthetic organisms 1 2 2 2 00720 Carbon fixation pathways in prokaryotes 4 3 4 4 00680 Methane metabolism 1 1 2 1 00910 Nitrogen metabolism 4 4 6 00920 Sulfur metabolism 1

Lipid Metabolism 00061 Fatty acid biosynthesis 1 1 1 1 00071 Fatty acid metabolism 1 2 1 00561 Glycerolipid metabolism 1 1 1 1 00592 alpha-Linolenic acid metabolism 1


Cont. No. of Sequences matching to enzymes in Metabolic Pathways metabolic pathways Taylor Dome Byrd Vostok J-9 Amino Acid Metabolism 00230 4 2 3 6 00240 Pyrimidine metabolism 5 2 3 6 00250 Alanine, aspartate and glutamate metabolism 1 2 1 00260 Glycine, serine and threonine metabolism 3 1 3 2 00270 Cysteine and methionine metabolism 1 1 3 1 00280 Valine, leucine and isoleucine degradation 3 2 1 1 00290 Valine, leucine and isoleucine biosynthesis 1 1 1 2 00300 Lysine biosynthesis 1 1 1 00330 Arginine and proline metabolism 4 4 4 4 00340 Histidine metabolism 2 2 2 1 00350 Tyrosine metabolism 1 1 1 00360 Phenylalanine metabolism 3 1 2 00380 Tryptophan metabolism 1 1 00400 Phenylalanine, tyrosine and tryptophan biosynthesis 1 2 1 1

Metabolism of other Amino Acids 00410 beta-Alanine metabolism 1 1 1 00450 Selenocompound metabolism 1 1 1 1 00473 D-Alanine metabolism 1 1 1 00480 Glutathione metabolism 1 1 1

Glycan Biosynthesis and Metabolism 00550 Peptidoglycan biosynthesis 1 1 1

Metabolism of Cofactors and vitamins 00730 Thiamine metabolism 2 00740 Riboflavin metabolism 1 00760 Nicotinate and nicotinamide metabolism 2 1 00770 Pantothenate and CoA biosynthesis 2 1 2 1 00780 Biotin metabolism 1 00790 Folate biosynthesis 1 2 00860 Porphyrin and chlorophyll metabolism 4 5 5 5 00130 Ubiquinone and other terpenoid-quinone biosynthesis 1 1 2


Cont. No. of Sequences matching to enzymes in Metabolic Pathways metabolic pathways Taylor Dome Byrd Vostok J-9 Metabolism of Terpenoids and Polyketides 00900 Terpenoid backbone biosynthesis 2 2 1 00903 Limonene and pinene degradation 1 1 1 00281 Geraniol degradation 1 00253 Tetracycline biosynthesis 1 1 1

Biosynthesis of other secondary metabolites 00311 Penicillin and cephalosporin biosynthesis 1 01053 Biosynthesis of siderophore group nonribosomal peptides 1

Xenobiotics Biodegradation and Metabolism 00362 Benzoate degradation 1 1 00627 Aminobenzoate degradation 2 00361 Chlorocyclohexane and chlorobenzene degradation 1 1 1 00623 Toluene degradation 1 00642 Ethylbenzene degradation 1 00643 Styrene degradation 1 00363 Bisphenol degradation 1 00626 Naphthalene degradation 1 00624 Polycyclic aromatic hydrocarbon degradation 1 00983 Drug metabolism - other enzymes 1


B. Description of the sequences found in the Taylor Dome, Byrd, Vostok and J-9 ice core section samples.

Taylor Dome

Query-Name GI Q length E-Value %-Ident Description Product

MID2_c74 336102715 399 8.00E-77 82% [Cellvibrio] gilvus ATCC 13127, complete genome ATP-binding protein

MID2_c187 381356398 71 2.00E-09 87% Bradyrhizobium sp. S23321 DNA, complete genome ABC transporter -binding protein

MID2_rep_c56 381356398 217 2.00E-59 87% Bradyrhizobium sp. S23321 DNA, complete genome GTPase ObgE

MID2_c32 338164315 253 0.001 100% Cupriavidus necator N-1 chromosome 1, complete sequence glutathione S-transferase

MID2_c72 226316800 308 1.00E-14 93% Deinococcus deserti VCD115, complete genome putative Acetyl-coenzyme A carboxylase carboxyltransferase

MID2_c35 160361034 369 0 100% Delftia acidovorans SPH-1, complete genome outer membrane efflux protein

MID2_c211 333741867 312 3.00E-154 99% Delftia sp. Cs1-4, complete genome Alkaline phosphatase

MID2_rep_c208 359285645 292 3.00E-89 88% Geobacillus thermoleovorans CCB_US3_UF5, complete genome Nicotinate (Nicotinamide) nucleotide

MID2_rep_c45 359285645 295 5.00E-92 88% Geobacillus thermoleovorans CCB_US3_UF5, complete genome Nicotinate (Nicotinamide) nucleotide

MID2_rep_c93 169239075 337 3.00E-05 100% Mycobacterium abscessus chromosome, complete sequence Glycine dehydrogenase [decarboxylating]

MID2_s182 169239075 337 3.00E-05 100% Mycobacterium abscessus chromosome, complete sequence Glycine dehydrogenase [decarboxylating]

MID2_s244 169239075 333 9.00E-06 91% Mycobacterium abscessus chromosome, complete sequence Glycine dehydrogenase [decarboxylating]

MID2_s273 169239075 322 1.00E-04 100% Mycobacterium abscessus chromosome, complete sequence Glycine dehydrogenase [decarboxylating]

MID2_c115 326411376 168 4.00E-31 83% Polymorphum gilvum SL003B-26A1, complete genome Helicase subunit of the DNA excision repair

MID2_c251 297153409 330 1.00E-18 90% Streptomyces bingchenggensis BCW-1, complete genome secreted protein

MID2_c36 291261390 330 8.00E-91 92% Uncultured bacterium clone F5K2Q4C04IUZMC 23S ribosomal RNA gene, partial sequence 23S ribosomal RNA

MID2_c98 328807854 215 9.00E-14 79% Verrucosispora maris AB-18-032, complete genome helicase domain-containing protein


MID2_c134 154158043 194 2.00E-15 92% Xanthobacter autotrophicus Py2, complete genome Electron-transferring-flavoprotein dehydrogenase

MID2_c198 154158043 194 2.00E-15 92% Xanthobacter autotrophicus Py2, complete genome Electron-transferring-flavoprotein

MID2_c201 154158043 122 6.00E-43 93% Xanthobacter autotrophicus Py2, complete genome helicase domain protein

MID2_c202 154158043 122 3.00E-41 93% Xanthobacter autotrophicus Py2, complete genome helicase domain protein

MID2_s259 154158043 161 1.00E-15 92% Xanthobacter autotrophicus Py2, complete genome Electron-transferring-flavoprotein dehydrogenase

MID2_c114 300021538 358 7.00E-54 74% Hyphomicrobium denitrificans ATCC 51888 chromosome, complete genome 3'-5' exonuclease

MID2_c129 148251626 406 6.00E-62 79% Bradyrhizobium sp. BTAi1 chromosome, complete genome nitroreductase family protein

MID2_c138 308175814 237 9.00E-06 66% Arthrobacter arilaitensis Re117 chromosome, complete genome polyamine ABC transporter substrate-binding protein

MID2_c19 319765157 401 8.00E-92 81% Geobacillus sp. Y412MC52 chromosome, complete genome hypothetical protein

MID2_c206_1 71733195 168 2.00E-16 16% Pseudomonas syringae pv. phaseolicola 1448A chromosome, complete

MID2_c207 182411826 225 5.00E-28 87% Opitutus terrae PB90-1 chromosome, complete genome translation initiation factor IF-2

MID2_c248 285016821 228 5.00E-41 77% Xanthomonas albilineans GPE PC73 chromosome, complete genome DNA helicase

MID2_c32_1 384213726 97 2.00E-14 97% Bradyrhizobium japonicum USDA 6, complete genome hypothetical protein

MID2_c39 386000717 109 4.00E-05 81% Methanosaeta harundinacea 6Ac chromosome, complete genome glycosyl transferase

MID2_c39_1 383775170 205 5.00E-18 75% Actinoplanes missouriensis 431, complete genome putative C4-dicarboxylate transporter

MID2_c6 148271178 97 2.00E-08 79% Clavibacter michiganensis subsp. michiganensis NCPPB 382 chromosome, complete genome 5-methyltetrahydropteroyltriglutamate/homocysteine S-meth...

MID2_c72_1 240136783 205 6.00E-33 77% Methylobacterium extorquens AM1, complete genome acetyl-CoA carboxylase, carboxytransferase subunit alpha

MID2_c95 383768277 78 1.00E-15 88% Bradyrhizobium sp. S23321, complete genome glycine dehydrogenase

MID2_rep_c106 300021538 149 5.00E-19 76% Hyphomicrobium denitrificans ATCC 51888 chromosome, complete genome DNA ligase, NAD-dependent


MID2_rep_c4 300021538 157 1.00E-20 76% Hyphomicrobium denitrificans ATCC 51888 chromosome, complete genome DNA ligase, NAD-dependent

MID2_rep_c67 75674199 107 4.00E-12 78% Nitrobacter winogradskyi Nb-255 chromosome, complete genome hypothetical protein

MID2_rep_c77 255961261 384 1.00E-57 83% Pseudomonas fluorescens Pf0-1 chromosome, complete genome hypothetical protein

MID2_c201 374096398 232 4.00E-52 87% Streptomyces hygroscopicus subsp. jinggangensis 5008, complete genome translation initiation factor IF-2

MID2_c308 297163432 100 4.00E-04 100% Truepera radiovictrix DSM 17093, complete genome ATP-dependent DNA helicase RecQ

MID2_c134 315593157 290 9.00E-80 87% Variovorax paradoxus EPS, complete genome AMP-dependent synthetase and ligase

MID2_c121 154158043 122 6.00E-43 93% Xanthobacter autotrophicus Py2, complete genome helicase domain protein

MID2_c324 226316800 149 2.00E-14 96% Deinococcus deserti VCD115, complete genome putative Acetyl-coenzyme A carboxylase carboxyl

MID2_c258 333741867 309 9.00E-155 99% Delftia sp. Cs1-4, complete genome Alkaline phosphatase

MID2_rep_c26 11342515 413 7.00E-162 93% Enterococcus dispar 23S rRNA gene, strain LMG 13521 (T) 23S ribosomal RNA

MID2_c67 191639869 112 9.00E-14 81% Burkholderia cenocepacia J2315 chromosome 3, complete sequence hypothetical protein

MID2_c76 126432613 310 7.00E-22 71% Mycobacterium sp. JLS chromosome, complete genome PAS/PAC sensor hybrid histidine kinase

MID2_c147 217031074 380 2.00E-165 95% Flavobacterium sp. ER3 16S ribosomal RNA gene, partial sequence rRNA-16S ribosomal RNA

MID2_c15 217031059 379 7.00E-180 97% Arthrobacter sp. EL3 16S ribosomal RNA gene, partial sequence rRNA-16S ribosomal RNA

MID2_c154 220905643 239 2.00E-108 97% Cyanothece sp. PCC 7425 chromosome, complete genome rRNA-23S ribosomal RNA


Byrd

Query-Name GI Q length E-Value %-Ident Description Product

MID4_rep_c13 336102715 299 2.00E-52 82% [Cellvibrio] gilvus ATCC 13127, complete genome ATP-binding protein

MID4_rep_c20 336102715 299 4.00E-54 82% [Cellvibrio] gilvus ATCC 13127, complete genome ATP-binding protein

MID4_c76 226316800 307 1.00E-14 93% Deinococcus deserti VCD115, complete genome putative Acetyl-coenzyme A carboxylase carboxyl

MID4_c94 256687298 326 2.00E-132 94% Kytococcus sedentarius DSM 20547, complete genome translation elongation factor 1A (EF-1A/EF-Tu)

MID4_c134 323356448 266 3.00E-74 87% Microbacterium testaceum StLB037 DNA, complete genome methionyl-tRNA synthetase

MID4_c59 156317834 310 5.00E-48 83% Nematostella vectensis predicted protein (NEMVEDRAFT_v1g225553) partial mRNA methionyl-tRNA synthetase

MID4_rep_c87 296843433 239 1.00E-58 85% Nocardiopsis dassonvillei subsp. dassonvillei DSM 43111 chromosome 1, complete sequence translation initiation factor IF-2

MID4_c89 326411376 304 6.00E-12 76% Polymorphum gilvum SL003B-26A1, complete genome Coproporphyrinogen III oxidase, anaerobic

MID4_c97 336243519 316 8.00E-101 99% Sordaria macrospora k-hell hypothetical protein (SMAC_09740), mRNA hypothetical protein

MID4_c43 334343657 292 5.00E-117 93% Sphingobium chlorophenolicum L-1 chromosome 1, complete sequence Baf family transcriptional activator

MID4_c77 285808497 301 5.00E-38 97% Uncultured bacterium 270 genomic sequence 16S ribosomal RNA

MID4_c41 154158043 194 2.00E-15 92% Xanthobacter autotrophicus Py2, complete genome Electron-transferring-flavoprotein

MID4_c90 154158043 194 2.00E-15 92% Xanthobacter autotrophicus Py2, complete genome Electron-transferring-flavoprotein

MID4_c10 343207074 452 1.00E-52 71% Mesorhizobium loti transcriptional regulator

MID4_c136 332980605 116 1.00E-12 80% Mahella australiensis 50-1 BON chromosome, complete genome aldehyde dehydrogenase

MID4_c137 121602919 377 1.00E-30 88% Polaromonas naphthalenivorans CJ2 chromosome, complete genome DegT/DnrJ/EryC1/StrS aminotransferase

MID4_rep_c12 338736863 212 1.00E-22 77% Hyphomicrobium sp. MC1, complete genome domain beta-lactamase-like

MID4_rep_c45 383775170 232 2.00E-21 73% Actinoplanes missouriensis 431, complete genome putative C4-dicarboxylate transporter

MID4_rep_c96 383775170 232 2.00E-21 73% Actinoplanes missouriensis 431, complete genome putative C4-dicarboxylate transporter

MID4_c41_1 383768277 80 2.00E-14 88% Bradyrhizobium sp. S23321, complete genome putative ferredoxin protein

MID4_c76_1 288956841 216 1.00E-35 76% Azospirillum sp. B510 chromosome, complete genome acetyl-CoA carboxylase carboxyl transferase subunit

MID4_rep_c240 68262661 378 6.00E-78 82% Corynebacterium jeikeium K411 complete genome putative GTP-binding protein

MID4_rep_c32 68262661 378 1.00E-79 82% Corynebacterium jeikeium K411 complete genome putative GTP-binding protein

MID4_c284 333113473 402 6.00E-133 88% Pseudomonas fulva 12-X, complete genome glycosyl transferase family 2

MID4_s320 333113473 268 2.00E-110 94% Pseudomonas fulva 12-X, complete genome glycosyl transferase family 2

MID4_c219 226237899 448 2.00E-18 92% Rhodococcus opacus B4 DNA, complete genome putative 2,5-diketo-D-gluconate reductase

MID4_c292 47482 262 6.00E-56 83% S.ramocissimus tuf1 gene for elongation factor Tu1 elongation factor G

MID4_c242 334139601 335 1.00E-36 92% Novosphingobium sp. PP1Y chromosome, complete genome 383 bp at 5' side: hypothetical protein

MID4_c251 300021538 259 1.00E-16 70% Hyphomicrobium denitrificans ATCC 51888 chromosome, complete genome sodium/hydrogen exchanger

MID4_c260 51891138 104 8.00E-14 82% Symbiobacterium thermophilum IAM 14863 chromosome, complete genome class-V aminotransferase

MID4_c269 240867064 141 4.00E-26 79% Ralstonia pickettii 12D chromosome 2, complete sequence peptidase S45 penicillin amidase

MID4_c271 111017022 133 1.00E-07 72% Rhodococcus jostii RHA1 chromosome, complete genome LuxR family transcriptional regulator

MID4_c297 148552929 152 8.00E-36 90% Sphingomonas wittichii RW1 chromosome, complete genome 15 bp at 5' side: radical SAM protein

MID4_c49 371486894 236 5.00E-54 92% Pseudoxanthomonas spadix BD-a59 chromosome, complete genome ABC transporter ATP-binding protein

MID4_c194 229587578 150 2.00E-43 87% Pseudomonas fluorescens SBW25 chromosome, complete genome hypothetical protein

MID4_c204 169627108 280 8.00E-33 79% Mycobacterium abscessus ATCC 19977 chromosome 1, complete sequence threonine dehydratase

MID4_c212 191639869 117 1.00E-13 81% Burkholderia cenocepacia J2315 chromosome 3, complete sequence hypothetical protein

MID4_c222 172039680 220 3.00E-31 75% Corynebacterium urealyticum DSM 7109 chromosome, complete genome pyruvate dehydrogenase subunit E1

MID4_c104 217031062 355 1.00E-172 94% Rhodococcus sp. EL6 16S ribosomal RNA gene, partial sequence rRNA-16S ribosomal RNA

MID4_c115 217031036 393 0 97% Rhodococcus sp. DR4 16S ribosomal RNA gene, partial sequence rRNA-16S ribosomal RNA

MID4_c131 291259140 127 2.00E-74 99% Uncultured bacterium clone F5K2Q4C04IDZJ3 23S ribosomal RNA gene, partial sequence rRNA-23S ribosomal RNA

MID4_rep_c146 291259140 161 4.00E-76 99% Uncultured bacterium clone F5K2Q4C04IDZJ3 23S ribosomal RNA gene, partial sequence rRNA-23S ribosomal RNA

MID4_rep_c26 295105686 243 2.00E-109 91% Gordonibacter pamelaeae 7-10-1-b draft genome rRNA-23S ribosomal RNA

Vostok

Query-Name GI Q length E-Value %-Ident Description Product

MID5_c36 2.61E+08 224 2.00E-17 89% degensii KC4, complete genome putative Acetyl-coenzyme A carboxylase carboxyl

MID5_rep_c26 2.26E+08 311 0.002 100% Deinococcus deserti VCD115, complete genome putative Acetyl-coenzyme A carboxylase carboxyl

MID5_rep_c27 2.26E+08 311 1.00E-14 93% Deinococcus deserti VCD115, complete genome putative Acetyl-coenzyme A carboxylase carboxyl

MID5_rep_c28 2.26E+08 310 1.00E-14 93% Deinococcus deserti VCD115, complete genome putative Acetyl-coenzyme A carboxylase carboxyl

MID5_rep_c66 2.57E+08 329 1.00E-14 93% Kytococcus sedentarius DSM 20547, complete genome translation elongation factor 1A (EF-1A/EF-Tu)

MID5_c305 3.36E+08 380 3.00E-134 93% Mesorhizobium opportunistum WSM2075, complete genome YadA domain protein

MID5_c298 3.35E+08 413 8.00E-166 94% Microlunatus phosphovorus NM-1 DNA, complete genome succinate-semialdehyde dehydrogenase

MID5_c218 1.69E+08 285 2.00E-52 81% Mycobacterium abscessus chromosome, complete sequence Glycine dehydrogenase [decarboxylating]

MID5_c68 1.88E+08 246 3.00E-05 100% Ralstonia pickettii 12J plasmid pRPIC01, complete sequence family protein

MID5_c339 2.26E+08 405 7.00E-125 100% Rhodococcus opacus B4 DNA, complete genome 2,5-diketo-D-gluconate reductase

MID5_c120 1.72E+08 234 2.00E-18 92% Corynebacterium urealyticum DSM 7109 chromosome, complete genome phosphotransacetylase

MID5_c299 3.34E+08 241 6.00E-27 73% Sinorhizobium meliloti AK83 chromosome 2, complete sequence ABC transporter periplasmic protein

MID5_c38 3.39E+08 146 5.00E-13 80% Hyphomicrobium sp. MC1, complete genome hypothetical protein

MID5_c38_1 3.48E+08 107 3.00E-07 73% Pseudogulbenkiania sp. NH8B, complete genome aldose 1-epimerase

MID5_c386 1.22E+08 276 4.00E-18 69% Polaromonas naphthalenivorans CJ2 chromosome, complete genome DegT/DnrJ/EryC1/StrS aminotransferase

MID5_c52 75674199 96 5.00E-10 78% Nitrobacter winogradskyi Nb-255 chromosome, complete genome hypothetical protein

MID5_c65 2.28E+08 208 1.00E-21 70% Corynebacterium aurimucosum ATCC 700975, complete genome phosphopantothenate cysteine ligase / 4'-phosphopantoth...

MID5_c67 3E+08 283 3.00E-32 73% Hyphomicrobium denitrificans ATCC 51888 chromosome, complete genome undecaprenyl diphosphate synthase

MID5_rep_c1 3.84E+08 231 2.00E-21 73% Actinoplanes missouriensis 431, complete genome putative C4-dicarboxylate transporter

MID5_rep_c105 3.84E+08 209 1.00E-21 73% Actinoplanes missouriensis 431, complete genome putative C4-dicarboxylate transporter

MID5_rep_c18 1.62E+08 122 5.00E-11 63% Gluconacetobacter diazotrophicus PAl 5 chromosome, complete genome glucose/galactose transporter

MID5_rep_c7 3.39E+08 480 1.00E-39 68% Hyphomicrobium sp. MC1, complete genome hypothetical protein

MID5_c137 2.57E+08 329 3.00E-134 93% Kytococcus sedentarius DSM 20547, complete genome translation elongation factor 1A (EF-1A/EF-Tu)

MID5_c259 2.57E+08 274 2.00E-135 99% Kytococcus sedentarius DSM 20547, complete genome translation elongation factor 1A (EF-1A/EF-Tu)

MID5_c30 2.57E+08 203 2.00E-99 100% Kytococcus sedentarius DSM 20547, complete genome carbohydrate ABC transporter ATP-binding

MID5_rep_c215 3.59E+08 262 4.00E-128 99% Geobacillus thermoleovorans CCB_US3_UF5, complete genome Uncharacterized conserved protein UPF0236

MID5_rep_c40 3.59E+08 262 5.00E-132 100% Geobacillus thermoleovorans CCB_US3_UF5, complete genome Uncharacterized conserved protein UPF0236

MID5_c110 3.36E+08 403 2.00E-78 82% [Cellvibrio] gilvus ATCC 13127, complete genome ATP-binding protein

MID5_c310 3.41E+08 81 5.00E-26 94% Amycolatopsis mediterranei S699, complete genome elongation factor EF-Tu

MID5_c231 2.56E+08 289 1.00E-63 83% DSM 44928, complete genome translation elongation factor G

MID5_c218 2.21E+08 147 7.00E-48 92% Caulobacter crescentus NA1000, complete genome TonB-dependent receptor

MID5_c103 3.75E+08 581 0 100% Lactococcus lactis subsp. lactis IO-1 DNA, complete genome 23S ribosomal RNA

MID5_c300 3.23E+08 256 8.00E-70 86% Microbacterium testaceum StLB037 DNA, complete genome methionyl-tRNA synthetase

MID5_c230 1.2E+08 189 3.00E-52 87% Mycobacterium vanbaalenii PYR-1, complete genome translation elongation factor 2 (EF-2/EF-G)

MID5_c202 3.75E+08 247 6.00E-26 77% Nocardia cyriacigeorgica GUH-2 chromosome complete genome conserved protein of unknown function

MID5_c234 3.6E+08 391 0 96% Xanthobacter autotrophicus gene for 16S rRNA, partial sequence, strain: NBRC 14758 rRNA-16S ribosomal RNA

MID5_rep_c223 3.72E+08 239 3.00E-114 97% Sphingomonas sp. 352 23S ribosomal RNA gene, partial sequence rRNA-23S ribosomal RNA

MID5_s54 2150030 301 4.00E-152 98% Rhodococcus erythropolis PR4, complete genome rRNA-23S ribosomal RNA

J-9

Query-Name GI Q length E-Value %-Ident Description Product

MID6_c152 336102715 286 7.00E-51 82% [Cellvibrio] gilvus ATCC 13127, complete genome ATP-binding protein

MID6_c117 142849896 321 8.00E-61 84% Aeromonas salmonicida subsp. salmonicida A449, complete genome valine tRNA synthetase

MID6_c54 68262661 343 1.00E-113 89% Corynebacterium jeikeium K411 complete genome 30S ribosomal protein S4

MID6_c23 333741867 310 2.00E-155 99% Delftia sp. Cs1-4, complete genome Alkaline phosphatase

MID6_c153 359285645 293 7.00E-91 88% Geobacillus thermoleovorans CCB_US3_UF5, complete genome Nicotinate (Nicotinamide) nucleotide

MID6_c68 375750259 304 2.00E-82 86% Gordonia polyisoprenivorans VH2, complete genome GTP-dependent nucleic acid-binding protein EngD

MID6_c85 375750259 346 9.00E-26 83% Gordonia polyisoprenivorans VH2, complete genome protein of unknown function DUF350

MID6_c118 256687298 328 1.00E-133 94% Kytococcus sedentarius DSM 20547, complete genome translation elongation factor 1A (EF-1A/EF-Tu)

MID6_rep_c38 323272819 265 1.00E-72 86% Microbacterium testaceum StLB037 DNA, complete genome methionyl-tRNA synthetase

MID6_rep_c4 323272819 265 3.00E-74 87% Microbacterium testaceum StLB037 DNA, complete genome methionyl-tRNA synthetase

MID6_s66 323272819 265 1.00E-67 85% Microbacterium testaceum StLB037 DNA, complete genome methionyl-tRNA synthetase

MID6_c150 296920963 362 2.00E-12 100% Propionibacterium freudenreichii subsp. shermanii CIRM-BIA1, complete genome hypothetical protein

MID6_rep_c32 19032282 286 5.00E-147 100% Rhizobium sp. AC100 plasmid pAC200 transposon Tnceh, complete sequence

MID6_c10 296267998 95 1.00E-11 89% Thermobispora bispora DSM 43833 chromosome, complete genome AMP-dependent synthetase and ligase

MID6_c105 126432613 282 2.00E-53 78% Mycobacterium sp. JLS chromosome, complete genome hypothetical protein

MID6_c110 197103466 215 5.00E-40 96% Phenylobacterium zucineum HLK1 chromosome, complete genome peptidase, M16 family

MID6_c157 300021538 152 4.00E-20 76% Hyphomicrobium denitrificans ATCC 51888 chromosome, complete genome DNA ligase, NAD-dependent

MID6_c29 145220606 216 2.00E-51 83% Mycobacterium gilvum PYR-GCK chromosome, complete genome TetR family transcriptional regulator

MID6_c5 227831830 402 2.00E-54 71% Corynebacterium aurimucosum ATCC 700975, complete genome ferrochelatase

MID6_c74 24430032 188 5.00E-14 93% Streptomyces coelicolor A3(2) chromosome, complete genome CDA peptide synthetase II

MID6_rep_c34 121602919 407 2.00E-14 70% Polaromonas naphthalenivorans CJ2 chromosome, complete genome DegT/DnrJ/EryC1/StrS aminotransferase

MID6_rep_c46 332980605 91 8.00E-13 80% Mahella australiensis 50-1 BON chromosome, complete genome aldehyde dehydrogenase

MID6_rep_c50 117617447 189 6.00E-07 68% Aeromonas hydrophila subsp. hydrophila ATCC 7966 chromosome, complete genome ferrichrome-iron receptor

MID6_c112 120587178 249 1.00E-07 95% Acidovorax citrulli AAC00-1, complete genome 3-deoxy-D-manno-octulosonate 8-phosphate

MID6_rep_c2 120587178 405 9.00E-07 95% Acidovorax citrulli AAC00-1, complete genome 3-deoxy-D-manno-octulosonate 8-phosphate

MID6_c122 294009986 235 1.00E-49 80% Sphingobium japonicum UT26S chromosome 1, complete genome copper/silver efflux system membrane protein CusA/CzcA

MID6_c131 386072488 237 4.00E-36 73% Leptospira interrogans serovar Lai str. IPAV chromosome chromosome 1, complete sequence 461 bp at 5' side: permease

MID6_c136 340792956 169 8.00E-11 70% Corynebacterium variabile DSM 44702 chromosome, complete genome hypothetical protein

MID6_c147 133909243 251 5.00E-22 96% Saccharopolyspora erythraea NRRL 2338 chromosome, complete genome peptide ABC transporter,substrate-binding protein

MID6_c162 297558985 288 2.00E-35 73% Nocardiopsis dassonvillei subsp. dassonvillei DSM 43111 chromosome, complete genome NmrA family protein

MID6_c164 337289620 199 2.00E-26 77% Corynebacterium ulcerans BR-AD22 chromosome, complete genome hypothetical protein

MID6_c60 375750259 346 9.00E-26 83% Gordonia polyisoprenivorans VH2, complete genome protein of unknown function DUF350

MID6_c72 225696243 391 0 98% Uncultured cyanobacterium clone AN2_1 16S ribosomal RNA gene rRNA-16S ribosomal RNA

MID6_rep_c89 225696243 417 0 99% Uncultured cyanobacterium clone AN2_1 16S ribosomal RNA gene rRNA-16S ribosomal RNA

MID6_rep_c40 113473942 222 3.00E-96 96% Trichodesmium erythraeum IMS101 chromosome, complete genome rRNA-16S ribosomal RNA
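
Rows in the BLAST tables above share a fixed field order: Query-Name, GI, Q length, E-Value and %-Ident, followed by free-text Description and Product columns that have no reliable separator. For readers who want to re-filter these hits, the following is a minimal Python sketch (not the workflow used in this study) that splits a row into those five leading fields. The class and function names, and the 1e-5 E-value cutoff in `passes`, are hypothetical choices for illustration. GI values are kept as strings because the Vostok table reports them in rounded scientific notation (e.g. 2.61E+08), which cannot be converted back to exact identifiers.

```python
from dataclasses import dataclass


@dataclass
class BlastRow:
    query: str       # contig or read name, e.g. "MID2_c308"
    gi: str          # kept as text; some GI values survive only in rounded form
    q_length: int    # query length
    e_value: float
    pct_ident: float
    text: str        # description and product, left unsplit (no reliable delimiter)


def parse_row(line: str) -> BlastRow:
    """Split one table row into its five leading fields plus the free text.

    Assumes the row begins with the query name, as the rows in these tables do.
    """
    query, gi, q_len, e_val, ident, text = line.split(maxsplit=5)
    return BlastRow(
        query=query,
        gi=gi,
        q_length=int(q_len),
        e_value=float(e_val),
        pct_ident=float(ident.rstrip("%")),
        text=text,
    )


def passes(row: BlastRow, max_e_value: float = 1e-5) -> bool:
    """Hypothetical filter: keep hits at or below an illustrative E-value cutoff."""
    return row.e_value <= max_e_value


if __name__ == "__main__":
    row = parse_row(
        "MID2_c308 297163432 100 4.00E-04 100% Truepera radiovictrix DSM 17093, "
        "complete genome ATP-dependent DNA helicase RecQ"
    )
    print(row.query, row.e_value, row.pct_ident, passes(row))
```

Splitting on whitespace at most five times keeps the organism and product text intact, which matters because descriptions such as "[Cellvibrio] gilvus ATCC 13127, complete genome" contain spaces and commas.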

C. Enzyme gene sequences identified from the Byrd, Vostok, J-9 and Taylor Dome samples.

Byrd

ko01100 Metabolic enzymes | Carbohydrate Metabolism | Energy Production | Lipid Metabolism | Amino Acid Metabolism | Metabolism of other Amino Acids | Glycan Biosynthesis and Metabolism | Metabolism of Cofactors and vitamins | Metabolism of Terpenoids and Polyketides | Biosynthesis of other secondary metabolites | Xenobiotics Biodegradation and Metabolism | Biosynthesis of secondary metabolites | Microbial metabolism in diverse environments

ko:K00128 E1.2.1.3; aldehyde dehydrogenase (NAD+) X X X X X X X X

ko:K00163 aceE; pyruvate dehydrogenase E1 component X X X X

ko:K00231 PPOX; oxygen-dependent protoporphyrinogen oxidase X X

ko:K00313 fixC; electron transfer flavoprotein-quinone oxidoreductase X

ko:K00342 nuoM; NADH-quinone oxidoreductase subunit M X

ko:K00371 narH; nitrate reductase 1, beta subunit X X

ko:K00616 E2.2.1.2; transaldolase X X X

ko:K00849 galK; galactokinase X X

ko:K00926 arcC; carbamate kinase X X X

ko:K01443 E3.5.1.25; N-acetylglucosamine-6-phosphate deacetylase X X X

ko:K01465 URA4; dihydroorotase

ko:K01581 E4.1.1.17; ornithine decarboxylase X X X

ko:K01584 E4.1.1.19A; arginine decarboxylase X

ko:K01599 hemE; uroporphyrinogen decarboxylase X X

ko:K01623 ALDO; fructose-bisphosphate aldolase, class I X X X X

ko:K01624 FBA; fructose-bisphosphate aldolase, class II X X X X

ko:K01626 E2.5.1.54; 3-deoxy-7-phosphoheptulonate synthase X X

ko:K01681 ACO; aconitate hydratase 1 X X X X

ko:K01695 trpA; tryptophan synthase alpha chain X X

ko:K01698 hemB; porphobilinogen synthase X X

ko:K01745 hutH; histidine ammonia-lyase X

ko:K01847 MUT; methylmalonyl-CoA mutase X X X

ko:K01874 MARS; methionyl-tRNA synthetase X X

ko:K01897 ACSL; long-chain acyl-CoA synthetase X

ko:K01962 accA; acetyl-CoA carboxylase carboxyl transferase subunit alpha X X X X X X

ko:K02111 ATPF1A; F-type H+-transporting ATPase subunit alpha X X

ko:K02492 hemA; glutamyl-tRNA reductase X X

ko:K02495 hemN; oxygen-independent coproporphyrinogen III oxidase X

ko:K02782 PTS-Gut-EIIB; PTS system, glucitol/sorbitol-specific IIB component X

ko:K03046 rpoC; DNA-directed RNA polymerase subunit beta' X

ko:K03183 ubiE; ubiquinone/menaquinone biosynthesis methyltransferase X X

ko:K05575 ndhD; NAD(P)H-quinone oxidoreductase subunit 4 X

ko:K08967 mtnD; 1,2-dihydroxy-3-keto-5-methylthiopentene dioxygenase X

ko:K10011 arnA; UDP-4-amino-4-deoxy-L-arabinose formyltransferase X X

ko:K10797 enr; 2-enoate reductase X

ko:K13038 coaBC; phosphopantothenoylcysteine decarboxylase X

Vostok

ko01100 Metabolic enzymes | Carbohydrate Metabolism | Energy Production | Lipid Metabolism | Amino Acid Metabolism | Metabolism of other Amino Acids | Glycan Biosynthesis and Metabolism | Metabolism of Cofactors and vitamins | Metabolism of Terpenoids and Polyketides | Biosynthesis of other secondary metabolites | Xenobiotics Biodegradation and Metabolism | Biosynthesis of secondary metabolites | Microbial metabolism in diverse environments

ko:K00029 E1.1.1.40; malate dehydrogenase (oxaloacetate-decarboxylating)(NADP+) X X

ko:K00036 G6PD; glucose-6-phosphate 1-dehydrogenase X X X X

ko:K00075 murB; UDP-N-acetylmuramate dehydrogenase X X

ko:K00099 dxr; 1-deoxy-D-xylulose-5-phosphate reductoisomerase X X

ko:K00128 E1.2.1.3; aldehyde dehydrogenase (NAD+) X X X X X X X X

ko:K00134 GAPDH; glyceraldehyde 3-phosphate dehydrogenase X X X

ko:K00135 E1.2.1.16; succinate-semialdehyde dehydrogenase (NADP+) X X X

ko:K00147 proA; glutamate-5-semialdehyde dehydrogenase X

ko:K00163 aceE; pyruvate dehydrogenase E1 component X X X X

ko:K00231 PPOX; oxygen-dependent protoporphyrinogen oxidase X X

ko:K00282 gcvPA; glycine dehydrogenase subunit 1 X X

ko:K00322 sthA; NAD(P) transhydrogenase X

ko:K00382 DLD; dihydrolipoamide dehydrogenase X X

ko:K00616 E2.2.1.2; transaldolase X X X

ko:K00806 uppS; undecaprenyl diphosphate synthase X X

ko:K00849 galK; galactokinase X X

ko:K00867 coaA; type I pantothenate kinase X

ko:K00955 cysNC; bifunctional enzyme CysN/CysC X X X X

ko:K00969 nadD; nicotinate-nucleotide adenylyltransferase X

ko:K01434 E3.5.1.11; penicillin amidase X X

ko:K01438 argE; acetylornithine deacetylase X X

ko:K01443 E3.5.1.25; N-acetylglucosamine-6-phosphate deacetylase X X X

ko:K01584 E4.1.1.19A; arginine decarboxylase X

ko:K01586 lysA; diaminopimelate decarboxylase X X X

ko:K01599 hemE; uroporphyrinogen decarboxylase X X

ko:K01623 ALDO; fructose-bisphosphate aldolase, class I X X X X

ko:K01695 trpA; tryptophan synthase alpha chain X X X

ko:K01745 hutH; histidine ammonia-lyase X

ko:K01772 hemH; ferrochelatase X X

ko:K01847 MUT; methylmalonyl-CoA mutase X X X X

ko:K01895 ACSS; acetyl-CoA synthetase X X X X

ko:K01955 carB; carbamoyl-phosphate synthase large subunit X X

ko:K01962 accA; acetyl-CoA carboxylase carboxyl transferase subunit alpha X X X X

ko:K02433 gatA; aspartyl-tRNA(Asn)/glutamyl-tRNA(Gln) amidotransferase subunit A

ko:K02492 hemA; glutamyl-tRNA reductase X X

ko:K02495 hemN; oxygen-independent coproporphyrinogen III oxidase X X

ko:K03043 rpoB; DNA-directed RNA polymerase subunit beta X

ko:K03046 rpoC; DNA-directed RNA polymerase subunit beta' X

ko:K13038 coaBC; phosphopantothenoylcysteine decarboxylase X

ko:K13788 pta; phosphate acetyltransferase X X X X

ko:K13810 tal-pgi; transaldolase / glucose-6-phosphate isomerase X X X

J-9

ko01100 Metabolic enzymes | Carbohydrate Metabolism | Energy Production | Lipid Metabolism | Amino Acid Metabolism | Metabolism of other Amino Acids | Glycan Biosynthesis and Metabolism | Metabolism of Cofactors and vitamins | Metabolism of Terpenoids and Polyketides | Biosynthesis of other secondary metabolites | Xenobiotics Biodegradation and Metabolism | Biosynthesis of secondary metabolites | Microbial metabolism in diverse environments

ko:K00031 IDH1; isocitrate dehydrogenase X X X X X

ko:K00088 E1.1.1.205; IMP dehydrogenase X X X

ko:K00099 dxr; 1-deoxy-D-xylulose-5-phosphate reductoisomerase X X

ko:K00104 glcD; glycolate oxidase X X X

ko:K00147 proA; glutamate-5-semialdehyde dehydrogenase X

ko:K00163 aceE; pyruvate dehydrogenase E1 component X X X X

ko:K00313 fixC; electron transfer flavoprotein-quinone oxidoreductase X

ko:K00342 nuoM; NADH-quinone oxidoreductase subunit M X

ko:K00362 nirB; nitrite reductase (NAD(P)H) large subunit X X

ko:K00371 narH; nitrate reductase 1, beta subunit X X

ko:K00382 DLD; dihydrolipoamide dehydrogenase X X X X

ko:K00412 CYTB; ubiquinol-cytochrome c reductase cytochrome b subunit X

ko:K00457 HPD; 4-hydroxyphenylpyruvate dioxygenase X X

ko:K00611 OTC; ornithine carbamoyltransferase X X

ko:K00616 E2.2.1.2; transaldolase X X X

ko:K00754 E2.4.1.- X

ko:K00796 folP; dihydropteroate synthase X

ko:K00849 galK; galactokinase X X

ko:K00864 E2.7.1.30; X

ko:K00867 coaA; type I pantothenate kinase X

ko:K00926 arcC; carbamate kinase X X X

ko:K00937 ppk; polyphosphate kinase X

ko:K00941 thiD; hydroxymethylpyrimidine/phosphomethylpyrimidine kinase X

ko:K00969 nadD; nicotinate-nucleotide adenylyltransferase X

ko:K01012 E2.8.1.6; biotin synthetase X

ko:K01077 E3.1.3.1; alkaline phosphatase X X X

ko:K01187 E3.2.1.20; alpha-glucosidase X

ko:K01443 E3.5.1.25; N-acetylglucosamine-6-phosphate deacetylase X X

ko:K01465 URA4; dihydroorotase X X

ko:K01599 hemE; uroporphyrinogen decarboxylase X X

ko:K01623 ALDO; fructose-bisphosphate aldolase, class I X X X

ko:K01625 eda; 2-dehydro-3-deoxyphosphogluconate aldolase X X X

ko:K01676 E4.2.1.2A; fumarate hydratase, class I X X X X

ko:K01681 ACO; aconitate hydratase 1 X X X X

ko:K01695 trpA; tryptophan synthase alpha chain X X

ko:K01704 leuD; 3-isopropylmalate/(R)-2-methylmalate dehydratase small subunit X X X

ko:K01745 hutH; histidine ammonia-lyase X

ko:K01772 hemH; ferrochelatase X X

ko:K01783 rpe; ribulose-phosphate 3-epimerase X X X X

ko:K01874 MARS; methionyl-tRNA synthetase X X

ko:K01897 ACSL; long-chain acyl-CoA synthetase

ko:K01955 carB; carbamoyl-phosphate synthase large subunit X

ko:K02337 DPO3A1; DNA polymerase III subunit alpha X

ko:K02338 DPO3B; DNA polymerase III subunit beta X

ko:K02492 hemA; glutamyl-tRNA reductase X X

ko:K02495 hemN; oxygen-independent coproporphyrinogen III oxidase X X

ko:K02770 PTS-Fru-EIIC; PTS system, fructose-specific IIC component X

ko:K02782 PTS-Gut-EIIB; PTS system, glucitol/sorbitol-specific IIB component X

ko:K03043 rpoB; DNA-directed RNA polymerase subunit beta X

ko:K03046 rpoC; DNA-directed RNA polymerase subunit beta' X

ko:K03183 ubiE; ubiquinone/menaquinone biosynthesis methyltransferase X X

ko:K03404 chlD; magnesium chelatase subunit D X X

ko:K03562 tolQ; biopolymer transport protein TolQ X

ko:K03821 phbC; polyhydroxyalkanoate synthase X

ko:K04780 dhbF; nonribosomal peptide synthetase DhbF X

ko:K05575 ndhD; NAD(P)H-quinone oxidoreductase subunit 4 X

ko:K07258 dacC; D-alanyl-D-alanine carboxypeptidase (penicillin-binding protein 5/6) X

ko:K08967 mtnD; 1,2-dihydroxy-3-keto-5-methylthiopentene dioxygenase X

ko:K10797 enr; 2-enoate reductase X X

ko:K13788 pta; phosphate acetyltransferase X X X X

ko:K14652 ribBA; 3,4-dihydroxy 2-butanone 4-phosphate synthase X

ko00040 Pentose and glucuronate interconversions X

Taylor Dome

ko01100 Metabolic enzymes | Carbohydrate Metabolism | Energy Production | Lipid Metabolism | Amino Acid Metabolism | Metabolism of other Amino Acids | Glycan Biosynthesis and Metabolism | Metabolism of Cofactors and vitamins | Metabolism of Terpenoids and Polyketides | Biosynthesis of other secondary metabolites | Xenobiotics Biodegradation and Metabolism | Biosynthesis of secondary metabolites | Microbial metabolism in diverse environments

ko:K00036 G6PD; glucose-6-phosphate 1-dehydrogenase X X X X

ko:K00099 dxr; 1-deoxy-D-xylulose-5-phosphate reductoisomerase X X

ko:K00116 mqo; malate dehydrogenase (quinone) X

ko:K00146 feaB; phenylacetaldehyde dehydrogenase X X X

ko:K00147 proA; glutamate-5-semialdehyde dehydrogenase X

ko:K00163 aceE; pyruvate dehydrogenase E1 component X X X X

ko:K00231 PPOX; oxygen-dependent protoporphyrinogen oxidase X X

ko:K00282 gcvPA; glycine dehydrogenase subunit 1 X X

ko:K00313 fixC; electron transfer flavoprotein-quinone oxidoreductase X

ko:K00341 nuoL; NADH-quinone oxidoreductase subunit L X

ko:K00371 narH; nitrate reductase 1, beta subunit X X

ko:K00382 DLD; dihydrolipoamide dehydrogenase X X X

ko:K00457 HPD; 4-hydroxyphenylpyruvate dioxygenase X X

ko:K00606 panB; 3-methyl-2-oxobutanoate hydroxymethyltransferase X X

ko:K00632 E2.3.1.16; acetyl-CoA acyltransferase X X X X X X

ko:K00806 uppS; undecaprenyl diphosphate synthase X X

ko:K00849 galK; galactokinase X X

ko:K00874 kdgK; 2-dehydro-3-deoxygluconokinase X X

ko:K00926 arcC; carbamate kinase X X X

ko:K00937 ppk; polyphosphate kinase X

ko:K00954 E2.7.7.3A; pantetheine-phosphate adenylyltransferase X

ko:K00969 nadD; nicotinate-nucleotide adenylyltransferase X

ko:K01046 E3.1.1.3; triacylglycerol lipase X

ko:K01077 E3.1.3.1; alkaline phosphatase X X X

ko:K01443 E3.5.1.25; N-acetylglucosamine-6-phosphate deacetylase X X

ko:K01465 URA4; dihydroorotase X X

ko:K01586 lysA; diaminopimelate decarboxylase X X X

ko:K01599 hemE; uroporphyrinogen decarboxylase X X

ko:K01623 ALDO; fructose-bisphosphate aldolase, class I X X X X

ko:K01647 CS; citrate synthase X X X

ko:K01681 ACO; aconitate hydratase 1 X X X X

ko:K01695 trpA; tryptophan synthase alpha chain X

ko:K01745 hutH; histidine ammonia-lyase X

ko:K01750 E4.3.1.12; ornithine cyclodeaminase X X

ko:K01847 MUT; methylmalonyl-CoA mutase X X X X X

ko:K01895 ACSS; acetyl-CoA synthetase X X X

ko:K01921 ddl; D-alanine-D-alanine ligase X X

ko:K01955 carB; carbamoyl-phosphate synthase large subunit X

ko:K01962 accA; acetyl-CoA carboxylase carboxyl transferase subunit alpha X X X X X X

ko:K02111 ATPF1A; F-type H+-transporting ATPase subunit alpha X

ko:K02428 rdgB; dITP/XTP pyrophosphatase X

ko:K02433 gatA; aspartyl-tRNA(Asn)/glutamyl-tRNA(Gln) amidotransferase subunit A

ko:K02472 wecC; UDP-N-acetyl-D-mannosaminuronic acid dehydrogenase X

ko:K02492 hemA; glutamyl-tRNA reductase X X

ko:K02770 PTS-Fru-EIIC; PTS system, fructose-specific IIC component X

ko:K02782 PTS-Gut-EIIB; PTS system, glucitol/sorbitol-specific IIB component X

ko:K03043 rpoB; DNA-directed RNA polymerase subunit beta X

ko:K03046 rpoC; DNA-directed RNA polymerase subunit beta' X

ko:K03404 chlD; magnesium chelatase subunit D X X

ko:K03562 tolQ; biopolymer transport protein TolQ X X

ko:K04782 pchB; isochorismate pyruvate lyase X

ko:K07258 dacC; D-alanyl-D-alanine carboxypeptidase (penicillin-binding protein 5/6) X X

ko:K07806 arnB; UDP-4-amino-4-deoxy-L-arabinose-oxoglutarate aminotransferase X

ko:K08967 mtnD; 1,2-dihydroxy-3-keto-5-methylthiopentene dioxygenase X

ko:K09065 argF1; N-acetylornithine carbamoyltransferase X X

ko:K10797 enr; 2-enoate reductase X

D. Statistical analysis of factors influencing sequence distribution in Antarctica.

I. Summary of the statistical regression analysis of the effect of distance of the sampled locations from the Antarctic coast on the number of sequences obtained.

Regression Statistics

Multiple R 0.913154084

R Square 0.833850382

Adjusted R Square 0.750775572

Standard Error 3.72192483

Observations 4

ANOVA

df SS MS F Significance F

Regression 1 139.0445511 139.04455 10.037343 0.086845916

Residual 2 27.70544888 13.852724

Total 3 166.75

Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 90.0% Upper 90.0%

Intercept 49.80862133 3.151691268 15.803776 0.00398 36.2479883 63.3692544 40.60572828 59.01151439

Distance -0.010666607 0.003366797 -3.168177 0.0868459 -0.025152764 0.00381955 -0.020497604 -0.000835609
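
The regression statistics in table I follow from the ANOVA sums of squares, so they can be cross-checked by hand. The minimal Python sketch below does this, assuming the response variable is the number of unique sequences per site (Taylor Dome 51, Byrd 43, Vostok 33, J-9 40), which reproduces the Total SS of 166.75. With four sites and one predictor there are only 2 residual degrees of freedom, and Student's t distribution with 2 df has closed-form CDF and quantile functions, so no statistics library is needed. The function names are hypothetical; this is a check on the printed output, not the software used to produce it.

```python
import math

# Unique-sequence counts per site (Taylor Dome, Byrd, Vostok, J-9): the response variable.
COUNTS = [51, 43, 33, 40]


def total_ss(values):
    """Total sum of squares about the mean (the 'Total' row of the ANOVA)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def t2_two_sided_p(t):
    """Two-sided p-value of Student's t with 2 degrees of freedom (closed form)."""
    return 1.0 - abs(t) / math.sqrt(t * t + 2.0)


def t2_quantile(p):
    """p-quantile of Student's t with 2 degrees of freedom (closed form)."""
    return (2.0 * p - 1.0) / math.sqrt(2.0 * p * (1.0 - p))


def summarize(ss_reg, ss_res, n, k=1):
    """Rebuild the 'Regression Statistics' block from the ANOVA sums of squares."""
    df_res = n - k - 1
    r2 = ss_reg / (ss_reg + ss_res)
    ms_res = ss_res / df_res
    f_stat = (ss_reg / k) / ms_res
    return {
        "R Square": r2,
        "Adjusted R Square": 1.0 - (1.0 - r2) * (n - 1) / df_res,
        "Standard Error": math.sqrt(ms_res),
        "F": f_stat,
        # Uses the 2-df closed form, so only valid for k = 1 and n = 4 (where F = t^2).
        "Significance F": t2_two_sided_p(math.sqrt(f_stat)),
    }


def conf_interval(coef, se, level):
    """Two-sided confidence interval for a coefficient with 2 residual df."""
    half_width = t2_quantile(0.5 + level / 2.0) * se
    return coef - half_width, coef + half_width


if __name__ == "__main__":
    print(total_ss(COUNTS))
    for name, value in summarize(139.0445511, 27.70544888, n=4).items():
        print(f"{name}: {value:.6f}")
    print(conf_interval(-0.010666607, 0.003366797, 0.95))
    print(conf_interval(49.80862133, 3.151691268, 0.90))
```

With n = 4, F/(F + 2) equals R Square, so the Significance F of 0.0868 is simply 1 minus the Multiple R of 0.9132.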

II. Summary of the statistical regression analysis of the effect of elevation of the sampled locations on the number of sequences obtained.

Regression Statistics

Multiple R 0.189003

R Square 0.035722

Adjusted R Square -0.44642

Standard Error 8.96642

Observations 4

ANOVA

df SS MS F Significance F

Regression 1 5.956636 5.956636 0.074091 0.810997

Residual 2 160.7934 80.39668

Total 3 166.75

Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 90.0% Upper 90.0%

Intercept 43.57263 8.058279 5.407188 0.032542 8.900653 78.24461 20.04257 67.10269

Elevation -0.00097 0.003563 -0.2722 0.810997 -0.0163 0.014359 -0.01137 0.009433
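
The elevation table can be checked the same way, and the two predictors compared directly. The sketch below repeats the closed-form helpers for Student's t with 2 degrees of freedom so that it runs on its own, converts each table's F statistic to its Significance F (with a single predictor, F is the square of the slope's t statistic), and recomputes two coefficient intervals from the reported estimates and standard errors. The function names are hypothetical, and the inputs are the rounded values printed in tables I and II, so the last digit of each result may differ slightly.

```python
import math


def t2_two_sided_p(t):
    """Two-sided p-value of Student's t with 2 degrees of freedom (closed form)."""
    return 1.0 - abs(t) / math.sqrt(t * t + 2.0)


def t2_quantile(p):
    """p-quantile of Student's t with 2 degrees of freedom (closed form)."""
    return (2.0 * p - 1.0) / math.sqrt(2.0 * p * (1.0 - p))


def p_from_f(f_stat):
    """Significance F for a one-predictor fit with 2 residual df, using F = t^2."""
    return t2_two_sided_p(math.sqrt(f_stat))


def conf_interval(coef, se, level):
    """Two-sided confidence interval for a coefficient with 2 residual df."""
    half_width = t2_quantile(0.5 + level / 2.0) * se
    return coef - half_width, coef + half_width


# F statistics, slopes and standard errors copied from tables I (distance) and II (elevation).
p_distance = p_from_f(10.037343)
p_elevation = p_from_f(0.074091)
elevation_ci95 = conf_interval(-0.00097, 0.003563, 0.95)
distance_ci90 = conf_interval(-0.010666607, 0.003366797, 0.90)

if __name__ == "__main__":
    print(f"distance:  p = {p_distance:.4f}, 90% CI = {distance_ci90}")
    print(f"elevation: p = {p_elevation:.4f}, 95% CI = {elevation_ci95}")
```

Read together, the tables give a Significance F of about 0.087 for distance and 0.81 for elevation. With only four sites, neither slope reaches the 5% level, although the 90% interval for the distance slope lies entirely below zero while the elevation interval spans zero.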