USING SHEWANELLA BALTICA ECOTYPES AS A MODEL FOR TRANSCRIPTIONAL

VARIATION AT THE POPULATION LEVEL

by

WILLIAM SEALY HAMBRIGHT

Presented to the Faculty of the Graduate School of

The University of Texas at Arlington in Partial Fulfillment

of the Requirements

for the Degree of

MASTER OF SCIENCE IN BIOLOGY

THE UNIVERSITY OF TEXAS AT ARLINGTON

DECEMBER 2010

Copyright © by Sealy Hambright 2010

All Rights Reserved

ACKNOWLEDGEMENTS

First, I would like to thank my family and my amazing girlfriend, Mandi. Without their support I would not be where I am today. Next, I would like to thank Dr. Jorge Rodrigues for his guidance and his unwavering patience and support throughout this project. Thank you to my committee, Dr. Thomas Chrzanowski and Dr. Woo-Suk Chang. Dr. Chrzanowski is not only an advisor, but has been a professor, mentor, and friend during my tenure at UTA. I am indebted to all of my colleagues for their immense help and their ability to put up with my caffeine-driven antics. To name a few: Drs. Sridev Mohapatra, Babur Mirza, and Atcha Boonmee, as well as Aditya Ranjan M.S., Jantiya Isanapong M.S., Bogar Garcia, Fabiana da Silva Paula, Blaine Thompson, Christi Hull M.S., Briony Foster, and the entire Melotto lab (including Dr. Melotto). I especially want to thank Austin Willis M.S., for our work was often a collaborative effort in learning, which I believe is the foundation of research.

November 19, 2010


ABSTRACT

USING SHEWANELLA BALTICA ECOTYPES AS A MODEL FOR TRANSCRIPTIONAL

VARIATION AT THE POPULATION LEVEL

Sealy Hambright M.S.

The University of Texas at Arlington, 2010

Supervising Professor: Jorge Rodrigues

Eukaryotic studies have shown considerable transcriptional variation among individuals of the same population. Owing to the cost of sequencing entire eukaryotic genomes, tested organisms were assumed to be genomically similar or even identical. We overcame this necessary assumption by using four sequenced strains of the bacterium Shewanella baltica (OS155, OS185, OS195, and OS223) as models to assess transcriptional variation and ecotype formation within a prokaryotic population. The strains were isolated from various depths throughout a water column of the Baltic Sea, where they occupy different ecological niches characterized by various abiotic parameters. Although their genome sequences are strikingly similar, the strains showed statistically significant differences in growth rate when grown in the laboratory under identical conditions, suggesting global expressional variation. To confirm these findings, custom microarray slides containing probes representing all four of the sequenced genomes were hybridized with two strains at a time, in a two-color manner, using a loop design. A one-way ANOVA designated 415 core genes as differentially expressed among the four strains at a stringent P value of 0.001. Furthermore, when analyzing common gene sequences shared among 32 other strains within the water column, Ecotype Simulation software consistently grouped all four model strains into discrete ecotypes. Transcriptional pattern variations such as the ones highlighted here may be used as indicators of short-term evolution emerging from the formation of bacterial ecotypes.


TABLE OF CONTENTS

ACKNOWLEDGEMENTS

ABSTRACT

LIST OF ILLUSTRATIONS

LIST OF TABLES

Chapter

1. LITERATURE REVIEW

1.1 The Shewanella genus

1.2 Mechanisms for genetic diversity: Sexually active bacteria

1.3 The bacterial species concept: Molecular standards to define species

1.4 Bacterial evolution: The transcriptome

2. USING SHEWANELLA BALTICA ECOTYPES AS A MODEL FOR TRANSCRIPTIONAL VARIATION AT THE POPULATION LEVEL

2.1 Introduction

2.2 Materials and Methods

2.2.1 Cultivation and isolation

2.2.2 Ecotype Simulation (ES)

2.2.3 RNA extraction, enrichment and labeling

2.2.4 Oligonucleotide Microarray design

2.2.5 Microarray hybridization

2.2.6 Global expression profile analyses

2.2.7 Experimental standards

2.3 Results

2.3.1 Standardized growth assays

2.3.2 Ecotype simulation

2.3.3 Global expression analysis

2.4 Discussion

2.4.1 Scope of the study

2.4.2 Ecotype demarcations

2.4.3 Genotypic and phenotypic differences among strains

2.4.4 Foundations for analysis scheme

2.4.5 Expressional relatedness among strains

APPENDIX

A. COMPLETE PROTOCOL: CELL TO ANALYSIS

B. SUPPLEMENTAL MATERIALS

REFERENCES

BIOGRAPHICAL INFORMATION


LIST OF ILLUSTRATIONS

Figure

1.1 16S rRNA gene based phylogenetic tree of the Shewanella genus

1.2 16S rRNA gene based phylogenetic tree of the Shewanella baltica isolates obtained from Gotland Deep water column

2.1 Microarray hybridization loop design for all possible pairwise comparisons

2.2 Shewanella baltica growth in defined media containing glucose as single carbon source

2.3 Shewanella baltica growth in defined media containing maltose as single carbon source

2.4 Ecotype Simulation (ES) using gyrB sequences for 36 Shewanella baltica strains within the water column

2.5 Comparative differentially expressed gene profiles per strain

2.6 Global gene distribution profile indicating range of expressional variation of all gene-individuals

2.7 Experimental variation versus significance between strains

2.8 Number of genes found per COG designation

2.9 Pathway predictions for the OS155 total transcriptome using the cellular omics-viewer from Biocyc.org

2.10 Pathway predictions for the OS195 total transcriptome using the cellular omics-viewer from Biocyc.org

2.11 Pathway predictions for the OS185 total transcriptome using the cellular omics-viewer from Biocyc.org

2.12 Pathway predictions for the OS223 total transcriptome using the cellular omics-viewer from Biocyc.org


LIST OF TABLES

Table

1.1 Abiotic parameters within the Gotland Deep water column for each strain isolation depth

1.2 Genome features and gene content of sequenced Shewanella baltica strains

B.1 Genes identified by Genespring GX 11 to be significantly differentially expressed among all four S. baltica strains using a one-way ANOVA

B.2 COG descriptions per category


CHAPTER 1

LITERATURE REVIEW

1.1 The Shewanella genus

The Shewanella genus, type strain S. putrefaciens, belongs to the gram-negative gamma-Proteobacteria lineage, which includes Escherichia coli and its closest relative, the Vibrionaceae (Figure 1.1). The first Shewanella strain was isolated from putrid butter in 1931 and named Achromobacter putrefaciens [1]. After a few taxonomic changes, it was not until 1985 that A. putrefaciens became the type strain for the Shewanella genus [2]. The genus is distributed worldwide and contains mostly aquatic members from both marine and freshwater habitats [3, 4]. Less frequently, Shewanellae are found in sedimentary environments, and even in clinical samples as opportunistic pathogens [5-7]. In general, members are often found in chemically stratified environments [3, 8]. Not surprisingly, Shewanellae can use a broad range of carbon sources [3], but energetically they are strictly respiratory, facultatively anaerobic organisms [9]. In fact, the most pertinent and identifiable characteristic of the clade is the ability to respire a variety of inorganic and organic substrates, including non-standard elements such as chromium [10], technetium [11], neptunium [12], and plutonium [13]. A more exhaustive list can be found elsewhere [14].


Figure 1.1 16S rRNA gene based phylogenetic tree of the Shewanella genus. Members of S. baltica are indicated in red with corresponding isolation depths and site.


Because of their ecological relevance and unprecedented redox versatility, the Shewanellae have garnered interest in both the applied sciences and basic research. For example, a variety of Shewanella spp. have been studied for use in biological fuel cells [15, 16]. The use of Shewanella spp. as agents for bioremediation has also become an important topic, as they can beneficially affect the solubility of unwanted pollutants or elements in soil and aquatic systems [4]. Over 20 Shewanella genomes have been sequenced (http://img.jgi.doe.gov/cgi-bin/pub/main.cgi). For 10 genomes, representing multiple clades, an extensive comparative genomic, proteomic, and phenomic study has been carried out to gauge overall diversity among strains [17]. It is thought that genetic diversity among these Shewanella strains is due to mobile islands and/or genetic insertions. This could in part explain their cosmopolitan distribution and ecological specialization [17, 18].

Shewanella baltica was first isolated from the Gotland Deep region of the Baltic Sea at depths ranging from 80 to 140 meters [9]. Prior to 1998, S. baltica was considered to be a member of the S. putrefaciens lineage (Owens group II). Using comparative genomic tools such as DNA-DNA hybridization, RAPD analysis, 16S rRNA gene identity, and G+C content, S. baltica was phylotyped as its own clade [9]. Overall, the water column hosts a variety of closely related S. baltica strains or ecotypes (Figure 1.2) performing a variety of functional roles. For instance, S. baltica seems to play an important ecological role in the turnover of nitrogen, being the primary taxonomic unit responsible for local denitrification, especially at the oxic-anoxic interface [19]. S. baltica fits well within the aforementioned genus description in terms of metabolic versatility and stratified habitat [9, 19]. An extensive profile of abiotic parameters has been established for the water column [19], which is a chemically stratified brackish environment (Table 1.1). The Baltic Sea nutrient pool is also rich, owing to immense eutrophication in the region caused by runoff of sewage and fertilizers from surrounding countries [20-22]. As a result, the region harbors a surplus of organic and inorganic substrates.


Figure 1.2 16S rRNA gene based phylogenetic tree of the Shewanella baltica isolates obtained from the Gotland Deep water column. The four sequenced genomes are in boldface with their corresponding isolation depth in meters.


Table 1.1 Abiotic parameters within the Gotland Deep water column for each strain isolation depth^a.

Strain(s) and isolation depth [m] | O2 [ml l⁻¹] | Salinity [%] | Temperature [°C] | H2S [µmol l⁻¹] | N2O [nmol l⁻¹] | NO3⁻ [µmol l⁻¹] | NO2⁻ [µmol l⁻¹] | NH4⁺ [µmol l⁻¹]
OS155 (90) | 1.5 | 10.0 | 4.0 | 0.0 | 95.0 | 50.0 | 7.5 | 1.0
OS185, OS223 (120) | 1.4 | 10.5 | 5.0 | 0.0 | 70.0 | 63.0 | 0.0 | 0.0
OS195 (140) | 0.0 | 11.0 | 5.5 | 2.0 | 20.0 | 0.0 | 0.0 | 10.0

^a Values extrapolated from Brettar et al. 2001. Measurements were recorded during strain isolation in 1986.

For four of the strains (OS223, OS195, OS185, and OS155), complete genome sequences have been established along with extensive physiological profiles [9] (Rodrigues, unpublished data). With this information, whole genome relatedness among these four strains can be arranged in terms of shared and unique genes (Table 1.2). In addition, genome synteny, horizontally acquired genes, average nucleotide identity (ANI), and 16S rRNA gene identity among the four strains have been assayed [18]. DNA-DNA reassociation values greater than 70% have also been observed for the strains [9, 18]. Because they meet all of these molecular thresholds, the four strains are considered to be of the same species. It is thus likely that all other S. baltica members in the water column satisfy the same molecular standards.


Table 1.2 Genome features and gene content of sequenced Shewanella baltica strains.

Strain | Genome size^a [bp] | Number of genes^b | Protein coding genes^b | Partially shared genes^c (%) | Strain specific genes^c (%) | NCBI accession number
OS223 | 5,358,884 | 4,464 [203] | 4,245 [191] | 4,307 (96.5) | 157 (3.7) | CP001252
OS195 | 5,547,544 | 4,667 [192] | 4,499 [189] | 4,584 (98.2) | 84 (1.9) | CP000891
OS185 | 5,312,910 | 4,521 [75] | 4,323 [71] | 4,467 (98.8) | 54 (1.2) | CP000753
OS155 | 5,342,896 | 4,521 [200] | 4,307 [182] | 4,367 (96.6) | 154 (3.6) | CP000563

^a Genome size was computed as the sum of chromosome and plasmid.
^b When applicable, the number of genes present in plasmids is shown in brackets.
^c Only orthologous genes outside of the core were considered in this analysis.
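
As an illustration only, the partition of gene content into core, partially shared, and strain-specific genes summarized in Table 1.2 can be sketched with simple set operations. The sketch below is not the pipeline used to build the table: the function name `partition_gene_content` and the ortholog group identifiers are hypothetical, and it assumes that genes have already been clustered into orthologous groups.

```python
from collections import Counter


def partition_gene_content(orthologs_by_strain):
    """Split each strain's ortholog groups into core, partially shared,
    and strain-specific sets (assumes orthologs are already clustered)."""
    # Count the number of strains in which each ortholog group occurs.
    counts = Counter(
        group
        for groups in orthologs_by_strain.values()
        for group in set(groups)
    )
    n_strains = len(orthologs_by_strain)
    result = {}
    for strain, groups in orthologs_by_strain.items():
        groups = set(groups)
        result[strain] = {
            # Present in every strain.
            "core": {g for g in groups if counts[g] == n_strains},
            # Present in more than one strain, but not in all of them.
            "partially_shared": {g for g in groups if 1 < counts[g] < n_strains},
            # Present in this strain only.
            "strain_specific": {g for g in groups if counts[g] == 1},
        }
    return result


# Toy, made-up ortholog group identifiers (not real S. baltica data).
toy = {
    "OS155": {"g1", "g2", "g3", "g7"},
    "OS185": {"g1", "g2", "g4"},
    "OS195": {"g1", "g2", "g3", "g5"},
    "OS223": {"g1", "g3", "g6"},
}
parts = partition_gene_content(toy)
print(sorted(parts["OS155"]["partially_shared"]))
```

In this toy input, group g1 is core, g2 and g3 are partially shared, and g4 through g7 are each specific to a single strain.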

1.2 Mechanisms for genetic diversity: Sexually active bacteria

In general, bacteria are believed to be extremely receptive to foreign DNA. Normal barriers to exogenous DNA incorporation that exist in sexually reproducing genera are often not apparent in bacteria [23]. Cells can acquire genetic material through a bacteriophage intermediate (transduction), from a donor cell (conjugation), or directly from the environment (transformation) [24]. If the foreign DNA is incorporated into the chromosome, completely or partially, recombination occurs. Cells can also retain plasmids containing an assortment of useful genes that never get incorporated into the chromosome and can thus be lost at cell division, adding to the load of free environmental DNA. When recombination occurs between strains of the same species, termed homologous recombination, it affects genomic identity and can have a coalescent effect, just as sex does in eukaryotes [24]. While this does contribute to genetic variety among lineages, homologous recombination is thought to primarily affect core genetic elements shared by closely related lineages [24]. In contrast, lateral gene exchanges are thought to provide accessory genes acted upon by positive selection [17, 25-27].

Extraneous DNA is often acquired from distantly related species as opposed to hereditarily traceable lineages [24]. As a whole, these lateral gene transfer events are frequent and constitute significant portions of most bacterial genomes [28]. Horizontal transfers can include transposable elements, pseudogenes, and entire “genomic islands”, which contain multiple functional genes [24, 28, 29]. Such multi-gene transfers, as opposed to homologous recombination of select chromosomal regions, can have a significant effect on adaptation and diversity [24]. Genomic islands introduced into recipient genomes are often filled with accessory genes that allow cells to gain some type of competitive advantage. Beneficial genetic elements can enhance utilization of certain nutrients, optimize a response to an environmental condition, or even allow the cell to proliferate under conditions that would otherwise not permit growth [30]. These islands have also been shown to increase pathogenicity in strains and can facilitate antibiotic resistance, underscoring the clinical significance of such processes [30, 31]. Conversely, while many insertions may be helpful to the cell, it stands to reason that many are not beneficial and may be acted upon by negative selection. Overall, lateral gene transfer enhances fitness by supplying advantageous traits while increasing overall genetic diversity. In fact, it is believed that the majority of gene content differences between members of the same species are due to mobile islands and insertion elements [17].

So what makes bacterial gene transfer so different from vertical gene transfer? The difference is magnitude and variety. Multiple genes can be introduced into a recipient chromosome at once. These genes can be any gene within the donor genome, including ribosomal elements [32]. In addition, if genetic exchange is extremely promiscuous and varies from cell to cell, genomic comparisons may produce an inconsistent signal in phylogenetic trees based on sequence identity. Genetic recombination coupled with undefined community parameters makes a well-defined bacterial species concept hard to construct. Furthermore, bacteria as a whole share extremely similar sized genomes while displaying extremely diverse phenotypes and metabolic properties [28]. The source of genetic diversity in bacteria is hard to pin down, which makes compiling a discrete evolutionary history of the clade troublesome. With this in mind, various molecular techniques can be utilized to assess bacterial relatedness and species classification.

1.3 The bacterial species concept: Molecular standards to define species

In higher eukaryotes, the biological species concept is straightforward, following the definition set forth by Ernst Mayr [33]. According to Mayr, members of a discrete species can produce fertile offspring with one another through sexual reproduction. In the bacterial world this criterion does not apply, as genetic exchange, or recombination, occurs through a promiscuous variety of horizontal mechanisms. Thus the bacterial species concept has many competing definitions [34-36]. The original classification of bacteria was based on a phenetic system best illustrated by Bergey’s Manual of Determinative Bacteriology, which has been a staple in diagnostic microbiology since 1923. Considering that genetic exchange is prevalent in bacteria, and that immense phenotypic variation exists between closely related genera, DNA based relatedness tools have long supplanted phenotypic classification. Since the 1960s, DNA-DNA hybridization (DDH) experiments have been employed to ascertain relatedness in bacteria. In this scheme, whole genome similarity, independent of nucleotide sequencing, is compared between two individuals by hybridizing their genomic DNA against one another. It is generally accepted that a DDH value of 70% properly demarcates the species level in bacteria [37]. Unfortunately, the DDH method is laborious and expensive, which limits its utilization.

Since the advent of sequencing technology, sequence identity for specific regions of DNA has become a widely used relatedness metric. For these reasons, methods such as multilocus sequence typing (MLST) [38] and small subunit (SSU) rRNA gene identity [39] are the most prevalent. MLST compares sets of housekeeping genes, which are assumed to be nearly identical in non-recombinant phylogenies. Thus, MLST can be used to assess homologous recombination rates, which can vary between microbial populations [40]. SSU analysis takes advantage of the 16S rRNA gene sequence. This sequence is approximately 1500 bp in length and is highly conserved throughout the Eubacteria lineage yet divergent from the Archaebacteria lineage [39]. It is currently thought to properly demarcate the species level at 99% similarity [41]. Recent studies using whole genome sequences confirm this resolution, as organisms with nearly identical 16S rRNA gene sequences have been found to contain significant genotypic variation [42, 43]. Due to its applications in prokaryotic systematics, SSU rRNA is considered by most to be the gold standard for bacterial classification. This is primarily due to the massive database of ribosomal sequences already assembled, and it will probably remain so until the cost of sequencing whole genomes is dramatically reduced [44]. While divergence within specific gene sequences is very useful, taxonomists generally agree that sequence based tools which consider whole genome relatedness are more robust and telling. This is made possible by the ever growing list of complete microbial genome sequences.

Because of the breadth of information available from entire genome sequences, much more powerful molecular standards are being applied to assess bacterial divergence. This is important because microbial genomes are known to be very dynamic; thus analysis of very small portions of DNA may not tell the whole story of clade phylogeny. For instance, if recombination rates are high in a population, and are a primary source of allelic variation, phylogenetic inferences made using single gene loci likely reflect specific allelic differences rather than clonal divergence [40]. The use of single gene loci, or even a few housekeeping genes, as determinants of phylogeny naturally lacks the strength of whole genome assays and may be representative less of evolutionary history than of lateral gene transfer events [42, 44]. In addition, the bacterial genome undergoes systematic genomic purging [45]. It is also extremely streamlined, consisting of mostly coding genes and few gene duplications [45]. Thus divergence due to an accumulation of lateral gene transfer events may not be traceable, as only the most recent descendants of the donor cells share the same orthologous elements [17].

Due to these limitations, comparative genomic tools that evaluate overall similarity of protein coding sequences (CDS), such as average nucleotide identity (ANI), are gaining interest in the taxonomic world. ANI between strains has been shown to agree with the traditional metrics for species demarcation, including DDH values and 16S rRNA gene identity [42]. ANI only considers genes shared between strains, which are thought to provide a more homologous representation of strain relatedness [42]. At this resolution, ANI also takes into account sequence divergence caused by ecological factors contributing to the overall evolutionary distance [42]. Metrics such as ANI that utilize whole genome sequences and take into account shared gene identity may be the best molecular tools to date for demarcating like species. Other methods that consider proteomic and transcriptional data have also been used as evolutionary signatures to assay divergence [17, 46].
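
To make the idea of averaging identity over shared genes concrete, the following minimal sketch computes a simplified ANI from pairs of pre-aligned orthologous gene sequences. It is an illustration under stated assumptions, not the published procedure: ANI implementations typically derive identities from BLAST-type searches rather than from pre-aligned genes, the helper names `percent_identity` and `average_nucleotide_identity` are hypothetical, the mean is unweighted by gene length, and the toy sequences are made up.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # skip gapped columns
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def average_nucleotide_identity(shared_gene_alignments):
    """Unweighted mean of per-gene percent identities across shared genes."""
    if not shared_gene_alignments:
        raise ValueError("at least one shared gene is required")
    identities = [percent_identity(a, b) for a, b in shared_gene_alignments]
    return sum(identities) / len(identities)


# Toy alignments of three hypothetical shared genes from two strains.
toy_alignments = [
    ("ATGGCTAAAT", "ATGGCTAAAT"),  # identical
    ("ATGCCGTTAA", "ATGCCGTTGA"),  # 9 of 10 positions match
    ("ATGA-CTGA", "ATGAACTGA"),    # gapped column is skipped
]
ani = average_nucleotide_identity(toy_alignments)
print(f"ANI = {ani:.2f}%")
```

Because only genes present in both strains enter the average, strain-specific genes do not influence the value, which mirrors the restriction of ANI to shared genes described above.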

Ecological history has also been examined as a determinant of speciation and an addition to the bacterial species concept [17, 42, 47, 48]. The ecological specialization of a clade may be the most important indicator of adaptation and thus would be reflected in the functional gene content of the clade. It stands to reason that habitat will determine, at least in part, the indispensability of a gene as successful traits are selected for. Over time, habitat plays an important role in shaping the gene content of the operational taxonomic units (OTUs) of the region [42, 49]. In fact, comparative genomic studies have revealed that organisms from similar ecologies demonstrate less overall genomic divergence in terms of gene content [42]. While similar habitats may have similar metagenomic signatures, they are often composed of significantly different taxa [50, 51]. This suggests that ecology plays a distinct role in the selection of traits and not necessarily of the organisms that comprise a community. In fact, the source of bacterial community composition is a provocative topic in the field of microbial biogeography.

One idea that has lingered since the late 19th century is that the environment selects from a seed bank of ubiquitous microorganisms [52]. This concept is predicated on a global distribution of bacteria with unlimited options for dispersal, essentially eliminating the existence of local endemic taxa and a microbial biogeography. Ecological specialization to certain environments has been identified in a few bacterial lineages [47, 48, 53], which supports a discrete environmental role in community structure. Another prevalent model for community structure is the neutral community assembly model (NCM), which implies that current bacterial community composition is a product of stochastic colonization by functionally equivalent taxa from some source community [54]. According to this model, community composition is essentially a random process defined by the size and diversity of the surrounding source community or “metacommunity” [55]. In this model, phenotypic variation (including expressional variation) among populations should be greater than variation within a population due to spatial separation and evolutionary divergence. Some studies have found evidence to the contrary [56], while others provide evidence to support NCMs [51, 57]. Overall, it is generally accepted that over time bacterial community composition becomes a product of ecological specialization, more than likely due to genomic diversity caused by genetic drift and recombination via horizontally acquired genetic elements, as opposed to neutral assembly [54]. Few sequence based tools have been established that take into account ecological divergence within populations. Such tools aim to build on sequence based relatedness by increasing resolution to identify ecologically distinct members of a bacterial population. These include AdaptML [58] and Ecotype Simulation [48].

1.4 Bacterial evolution: The transcriptome

Evolution is the product of selective forces acting upon biological mechanisms. Affected features are often structural, e.g. wing length, vertebral structure, or protein architecture. One truth behind Darwin’s basic morphological observations of selection is that nearly all of these changes result from altered proteins, whether the change occurs pre- or post-translationally. Phenotypic variation, including its regulation, is necessary for the bacterial lifestyle and is rooted in the genetic content of the cell. It has been postulated for some time that the regulation of gene expression, as opposed to large scale shifts in variant protein production, may be a product of evolution by natural selection in eukaryotic systems [59]. For instance, noticeable differences in gene expression patterns between humans and chimpanzees are present in many organs, especially the brain [60, 61]. It is suggested that the much more pronounced cognitive ability exhibited by humans may have evolved through mechanisms geared toward transcriptional regulation. In eukaryotes, regulation is a complex process affected by histone coverage of the gene, promoter sequence and architecture, and transcription factor (TF) specificity and activity [62]. Specifically, it is thought that the addition or deletion of TF binding sites, or cis-regulatory sites, is the mechanism for evolutionary change, as opposed to changes in the TFs themselves or trans-regulatory sites [63-65]. Tweaking of cis-regulatory sites increases the selectivity with which the expression of specific genes can be altered. In bacteria, the same evolutionary caveats exist, but the mechanisms are thought to be different, as gene regulation is much more dynamic and pronounced [66]. Expressional divergence is thought to be mediated by both TF architecture within the promoter and modifications of the TF sequence itself [66-69]. For instance, when heterologously expressed in each cell line, very similar TF orthologues (PhoP) from Salmonella enterica and Yersinia pestis did not affect transcription of species-specific genes but did alter transcription of ancestral genes shared among the organisms [66]. This indicates that TF sequence variation that exists between species is likely under selective pressure. Furthermore, it seems that there may be a core regulon among members of a bacterial population just as there is a core genome [66].

Bacteria contain robust regulatory networks that are constantly being modified and are rapidly evolving [70, 71]. Regulatory networks are also extremely dynamic and variable, even between like species [72]. An ongoing question is what drives the evolution of such circuits in bacteria. Given that genomic identity is governed primarily by natural selection in large populations of bacteria [69], it is becoming apparent that evolutionary forces at the transcriptional level play a pivotal role in fitness [64, 73]. Simply put, selective forces, primarily positive, acting upon the immense genetic variation generated by massive rates of horizontal genetic exchange between individuals are driving the divergence [74]. Furthermore, it stands to reason that evolution will occur more rapidly under positive selection than under neutral or negative selection because bacteria have so many opportunities for genetic acquisition. It is thus widely accepted that neutral mechanisms for transcriptional divergence that are limited to genetic drift and point mutations cannot account for such immense variation [29, 40]. In fact, studies have indicated that forces of stabilizing selection may govern global transcriptional variance [64, 73].

What is left to ascertain is the extent of variation that exists between individuals of the same population of bacteria. Very similar assays have been done in eukaryotic populations and have shown that even small expressional differences are meaningful, highlighting the importance of measuring individual variation [56, 75, 76].


CHAPTER 2

USING SHEWANELLA BALTICA ECOTYPES AS A MODEL FOR TRANSCRIPTIONAL

VARIATION AT THE POPULATION LEVEL

2.1 Introduction

Eukaryotic speciation is generally a well understood process that includes a classic species definition. Bacterial speciation, on the other hand, can be complex to characterize due to intrinsic prokaryotic properties such as quick generation times, cultivation difficulty, small size, and asexual reproduction. The modern species concept for bacteria can best be described as convoluted. What is certain is that significant genetic diversity exists within the Eubacteria lineage, much of it unresolved. Many factors contribute to prokaryotic genetic diversity, including horizontal gene transfer and DNA recombination, which in turn complicate the classification of bacteria [35, 77]. It is even fair to question whether or not bacteria contain a genetic continuum at all in nature [36]. Because of this genetic promiscuity, molecular thresholds such as DNA-DNA hybridization [37], 16S rRNA gene identity [39], and Average Nucleotide Identity [42] are often used to demarcate genus and/or species. While their resolution is regularly debated, these powerful molecular tools have revolutionized prokaryotic systematics and biogeography and are generally considered to be the gold standard. While important, these sequence based relatedness measures still lack a fundamental ecological emphasis and do not account for ecologically distinct members. To address this, the ecotype concept was developed to accommodate ecological differences among strains [78]. Ecotypes, or clades of recently divergent ecologically distinct units, may be very important to the overall population dynamic. These units contribute to overall community composition and often perform discrete roles that allow the metapopulation to coexist [79]. Studies have shown that genomically similar prokaryotic populations can include clades of ecologically distinct members due to slightly different environmental parameters [47, 48]. This implies a direct environmental role in selecting more fit individuals. Others have found that the history of community structure plays an important role in determining current community composition, which introduces an element of stochasticity to the equation [57]. All in all, this indicates that an ecological emphasis is necessary to paint the picture of past and present community composition along with the processes that generated specialization therein.

The question quickly arises: what genetic mechanisms control the diversification and phenotypic variation that are apparent in closely related organisms? In primates, it has been thought for some time that adaptation may lie within the transcriptome [59, 60]. For example, significant transcriptional differences at the regulatory level are thought to be most responsible for increased cognitive ability in humans when compared to other primates [61]. This suggests that advantageous effects of transcriptional diversity may be selected for, highlighting the importance of expressional divergence. If selective forces are acting upon the transcriptome, such forces may be at the forefront of divergence and thus evolution. Although expressional variation at the individual level has been characterized for a few eukaryotic groups [56, 75, 76], and is undoubtedly inherent, the problem lies in trying to distinguish “noise” from functionally important proteins and transcripts. Finding the responsible genetic elements will be hard but may be very telling.

Here we investigate the individual expressional variation among four closely related bacterial ecotypes by comparing their transcriptomes using custom microarrays. By comparing relative transcription profiles between strains, we aim to characterize the frequency and magnitude of differential gene expression. It is much easier to assay this phenomenon using prokaryotes. Bacteria in culture allow for a larger sample size (~10⁸ or even greater), thus more individuals, and thus more measurements. Furthermore, we chose model ecotypes representative of an entire population, which allows for a more direct characterization of variation across an evolutionary gradient. The use of oligonucleotide microarrays provides a high-throughput analysis of whole-genome differential expression among the strains. The four genomes used in this study belong to the species Shewanella baltica (OS223, OS195, OS185, and OS155) and can be found under the following NCBI accession numbers: CP001252, CP000891, CP000753, CP000563.

2.2 Materials and Methods

2.2.1. Cultivation and isolation

Shewanella baltica strains OS195, OS185, OS223, and OS155 were grown separately in batch culture using nutrient rich marine broth (Difco, Lawrence, KS). Cells were allowed to reach exponential phase to adjust to fastidious growth, then transferred (0.5 ml) in triplicate to 75 ml of previously described defined media [80]. Flasks were incubated at 25 °C and 200 rpm.

Optical density (OD600) measurements were recorded using the Spectronic 20D+ spectrophotometer (Thermo Electron, Waltham, MA). Growth rate of S. baltica was calculated based on an increase in optical density over time.
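The exact formula used for the growth rate is not stated here. A minimal sketch, assuming the common approach of taking the least-squares slope of ln(OD600) against time over the exponential phase, is given below. The function name and the readings are hypothetical illustration values, not data from this study.

```python
import math

def specific_growth_rate(times_h, od600):
    """Least-squares slope of ln(OD600) versus time, in h^-1.

    Assumes all readings come from the exponential phase.
    """
    logs = [math.log(od) for od in od600]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(logs) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, logs))
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    return sxy / sxx

# Hypothetical exponential-phase readings (OD doubles every 2 h).
times_h = [0.0, 2.0, 4.0, 6.0]
od600 = [0.05, 0.10, 0.20, 0.40]

mu = specific_growth_rate(times_h, od600)
print(f"specific growth rate: {mu:.3f} per hour")
```

Fitting a slope over several readings, rather than using two time points, reduces the influence of a single noisy measurement.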

2.2.2. Ecotype Simulation (ES)

Shewanella baltica ecotypes were identified by Ecotype Simulation software [48]. ES is a sequence based tool that demarcates bacterial ecotypes within the confines of modern bacterial species concepts [81]. A total of seven different gene sequences shared among 36 strains were used as input. Sequences were first aligned using the freeware ClustalX [82].

Sequences were used exclusively and in concatenated sets of 2-5 sequences as input for ES analysis. Briefly, the ES software demarcates the number of ecotypes (n) by estimating rates of ecotype formation (Ω), as well as genetic drift (d) and periodic selection (σ), all consistent with the Stable Ecotype Model [83]. All gene sequences were recombinant free as the ES algorithm does not explicitly characterize sequence diversity due to recombination events [48].

Sequences used for ES analysis can be found under the following accession numbers:

AAN53658, AAN55644, AAN57661, AAN55227, AAN55734, AAN53703, and ADN96149.


2.2.3. RNA extraction, enrichment and labeling

Once cells reached an OD600 of 0.4, five ml were spun down at 10,000 x g at 25 °C.

Total RNA from harvested cells was phenol extracted from three biological replicates per strain using a RiboPure bacterial RNA kit (Ambion, Austin, TX) according to manufacturer’s instructions. Messenger RNA was enriched from total RNA (10µg), by removing 16S and 23S rRNAs using a capture hybridization approach in a MICROBExpress bacterial mRNA enrichment kit (Ambion, Austin, TX).

Samples containing 100 ng of mRNA were amplified with the MessageAmp II aRNA amplification kit (Ambion, Austin, TX) and converted to anti-sense RNA (aRNA) by first reverse transcribing the sequence into cDNA bearing a T7 promoter, followed by DNA polymerase mediated second strand synthesis, yielding dsDNA as a template for in vitro transcription. The aRNA was purified according to kit specifications, and assessed for quality using a Bioanalyzer 2100 platform (Agilent Technologies, Santa Clara, CA). Samples were ethanol precipitated, followed by fluorescent labeling with either Alexa Fluor 555 or Alexa Fluor 647 dye (Invitrogen, Carlsbad, CA), following the manufacturer’s specifications, with the exception of 2X the recommended starting aRNA concentration (10 µg), water, and labeling buffer. Labeled samples were purified using an RNeasy kit (Qiagen, Valencia, CA), and checked spectrophotometrically for concentration and dye incorporation.

2.2.4. Oligonucleotide Microarray design

Custom glass slide microarrays were constructed using phosphoramidite chemistry for oligonucleotide polymerization [84] in 44-48mer lengths by Mycroarray inc. (Ann Arbor, MI).

Slides were probed with PCR amplicons representing genes unique to OS223 (157), OS195

(84), OS185 (54), and OS155 (154), as well as 3,934 core genes shared among all strains.

Specifically, the arrays contained 3 to 7 replicates per probe, for a total of 30,000 probes representing 5,234 protein coding genes including unique and core genes. Core amplicons were synthesized to provide equal binding efficiency among strains thus eliminating competitive bias in the results.

2.2.5. Microarray hybridization

Anti-sense RNA from strains OS223, OS195, OS185, and OS155 was used for the assay. Two different strains were simultaneously hybridized per slide in a competitive two-color scheme for each possible pairwise comparison excluding self (Figure 2.1). Each 60 µl hybridization reaction contained approximately 2 µg aRNA per strain, oppositely labeled with either Alexa Fluor 555 or Alexa Fluor 647 dyes. The aRNA solution was contained on the slide using 25 x 40 mm LifterSlips (Thermo Scientific, Waltham, MA). Slides were then statically incubated at 50 °C for 18 hours in single-slide rubber hybridization chambers (CamLab, Cambridge, U.K.). Hybridization procedures and parameters, including washing requirements and final concentrations of necessary buffers and solutions, were carried out according to Mycroarray’s specifications with the addition of a 50 °C heated post-incubation washing step, as well as the removal of the aRNA fragmentation procedure.


Figure 2.1 Microarray hybridization loop design for all possible pairwise comparisons. Each arrow represents one pairwise comparison. For each comparison there were 6 replicate hybridizations including dye swap arrays.

2.2.6. Global expression profile analyses

Three biological replicates were used per strain comparison, allowing for 18 hybridizations to be performed, using a loop design format. In addition, dye-swap hybridizations were performed for all biological replicates for a total of 36 slide hybridizations. Slides were scanned at a 30-µm resolution using a Genepix 4200A scanner (Axon Instruments, Sunnyvale, CA), utilizing the built-in automated photomultiplier tube (PMT) adjustment for optimal laser sensitivity.
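The hybridization counts given above follow directly from the design: four strains, every pairwise comparison, three biological replicates, and a dye swap for each. The short sketch below only reproduces that arithmetic; the variable names are illustrative.

```python
from itertools import combinations

strains = ["OS223", "OS195", "OS185", "OS155"]
biological_replicates = 3

# Every pairwise comparison excluding self.
pairs = list(combinations(strains, 2))

slides_one_orientation = len(pairs) * biological_replicates
total_slides = slides_one_orientation * 2   # each slide repeated as a dye swap
total_channels = total_slides * 2           # two labeled samples per slide

# Number of intensity measurements contributed by each strain.
per_strain = {strain: 0 for strain in strains}
for first, second in pairs:
    per_strain[first] += biological_replicates * 2
    per_strain[second] += biological_replicates * 2

print(len(pairs), slides_one_orientation, total_slides, total_channels)
print(per_strain)
```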

Overall expression was analyzed using the Genespring GX11 software (Agilent Technologies, Santa Clara, CA), with foreground-background signal intensities imported separately per channel in a one color format. The data set including 36 slides, representing 72 samples, was normalized by applying a quantile algorithm, then Log2 transformed to a threshold of 1.0. Normalized signal intensities were filtered using an upper cutoff of 95% and lower cutoff of 10% per channel, keeping all entities falling within that range in at least half of the samples.
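For readers unfamiliar with quantile normalization, the sketch below illustrates the general idea on a toy data set: each value is replaced by the mean of the values that share its rank across samples, after which a log2 transform with a floor of 1.0 is applied. This is an illustration under stated assumptions and not the Genespring implementation, whose handling of ties and thresholds may differ. The function names and intensities are hypothetical.

```python
import math

def quantile_normalize(samples):
    """Quantile-normalize equal-length samples (one list per channel).

    Each value is replaced by the mean of all values holding the same
    rank across samples. Ties are broken by position (a simplification).
    """
    n = len(samples[0])
    sorted_samples = [sorted(sample) for sample in samples]
    rank_means = [
        sum(sample[i] for sample in sorted_samples) / len(samples)
        for i in range(n)
    ]
    normalized = []
    for sample in samples:
        order = sorted(range(n), key=lambda i: sample[i])
        out = [0.0] * n
        for rank, index in enumerate(order):
            out[index] = rank_means[rank]
        normalized.append(out)
    return normalized

def log2_with_floor(values, floor=1.0):
    """Raise values below the floor to the floor, then take log2."""
    return [math.log2(max(value, floor)) for value in values]

# Hypothetical raw intensities for two channels and three probes.
raw = [
    [5.0, 2.0, 3.0],
    [4.0, 1.0, 6.0],
]
normalized = quantile_normalize(raw)
logged = [log2_with_floor(sample) for sample in normalized]
print(normalized)
```

After normalization every channel has the same distribution of values, which is what makes intensities from different slides and dyes comparable in a one-color analysis.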

Differential expression among the four strains was determined using the mean of the Log2 transformed signals for replicate spots and performing a one-way ANOVA with a subsequent Benjamini-Hochberg multiple testing correction for all gene-entities. All significant genes showed at least a 2-fold change in at least one pairwise comparison. Additional statistical analysis was performed for each pairwise comparison exclusively using the Significance Analysis of Microarrays (SAM) T-statistic and its version of a P value called the q score. SAM is freely available online at http://www-stat.stanford.edu/~tibs/SAM/ [85]. Log2 transformed signals were used as expression values and significant genes were determined using a false discovery rate of less than 10% and a cutoff of 4.0 fold change. Gene function and pathway predictions were made using the Pathway Tools Omics Viewer from Biocyc.org as well as in-house predictions.
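The Benjamini-Hochberg correction mentioned above can be sketched in a few lines. The version below is a generic textbook implementation, not the routine used inside Genespring, and the P values are hypothetical. Each P value is multiplied by the number of tests divided by its rank, and the adjusted values are then forced to be non-decreasing with rank.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value to the smallest so that each adjusted
    # value never exceeds the adjusted value of the next larger P value.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        candidate = p_values[index] * m / rank
        running_min = min(running_min, candidate)
        adjusted[index] = running_min
    return adjusted

# Hypothetical ANOVA P values for four genes.
p_values = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(p_values)
significant = [p <= 0.05 for p in adjusted]
print(adjusted, significant)
```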

2.2.7 Experimental standards

All reported microarray data from this study are in accordance with the Microarray Gene

Expression Data Society’s standards regarding minimum information about a microarray experiment [86].

2.3 Results

2.3.1. Standardized growth assays

Strains OS223, OS195, OS185, and OS155 exhibited significantly different growth rates and doubling times (P < 0.005) when grown at 25 °C in defined media using glucose as the single carbon source (Figure 2.2). When maltose was used as the single carbon source, growth remained different (P < 0.005), with the exception of OS223 and OS185, as they were similar in their growth characteristics (Figure 2.3). Strain OS155 demonstrated the most noticeable growth pattern deviation from the other strains in both carbon sources, as well as the most pronounced difference in growth between the two carbon sources. In addition, strains OS155 and OS195 exhibited the fastest growth rates in glucose but completely divergent growth profiles in maltose. Overall, deviations in growth characteristics among the four strains were most apparent in the lag phase, with noticeable stationary phase deviations in maltose. In glucose, respective doubling time (h) and growth rate (h⁻¹) per strain were: OS223 (13.236, 0.052), OS195 (6.009, 0.115), OS185 (9.113, 0.076), and OS155 (2.055, 0.052). In maltose, respective doubling time (h) and growth rate (h⁻¹) per strain were: OS223 (2.793, 0.247), OS195 (3.139, 0.220), OS185 (2.689, 0.257), and OS155 (9.601, 0.072).
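For exponential growth, doubling time and growth rate are usually linked by growth rate = ln(2) / doubling time. The sketch below assumes that standard relation, which is not stated explicitly in the text, and applies it to the maltose doubling times reported above. The computed rates agree with the reported maltose growth rates to within about 0.002 h⁻¹.

```python
import math

def growth_rate_from_doubling_time(doubling_time_h):
    """Growth rate (h^-1) implied by a doubling time in hours."""
    return math.log(2) / doubling_time_h

# Doubling times (h) and growth rates (h^-1) reported for maltose.
maltose_doubling_time_h = {
    "OS223": 2.793, "OS195": 3.139, "OS185": 2.689, "OS155": 9.601,
}
maltose_reported_rate = {
    "OS223": 0.247, "OS195": 0.220, "OS185": 0.257, "OS155": 0.072,
}

computed_rate = {
    strain: growth_rate_from_doubling_time(td)
    for strain, td in maltose_doubling_time_h.items()
}
for strain, rate in computed_rate.items():
    print(strain, round(rate, 3), maltose_reported_rate[strain])
```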

[Plot for Figure 2.2: optical density (600 nm) versus time (hours).]

Figure 2.2 Shewanella baltica growth in defined media containing glucose as the single carbon source. OS223 (diamond), OS155 (circle), OS185 (square), and OS195 (triangle). Missing error bars indicate a standard error smaller than the symbol.


[Plot for Figure 2.3: optical density (600 nm) versus time (hours).]

Figure 2.3 Shewanella baltica growth in defined media containing maltose as the single carbon source. OS223 (square), OS195 (circle), OS185 (diamond), and OS155 (triangle). Missing error bars indicate a standard error smaller than the symbol.

2.3.2 Ecotype simulation

The rate and number of ecotype formations was estimated using the Ecotype

Simulation software (ES). Seven different recombinant free gene sequences were used, from

36 Shewanella baltica isolates found at various depths and abiotic parameters throughout the

Gotland Deep water column. Using a maximum likelihood approach, ES characterized sequence diversity of the seven sequences by constructing a “clade sequence diversity” curve

(Figure 2.4A), as a model for the evolutionary history of the clade. The ES algorithm then used this distribution to estimate the rates (per nucleotide substitution) of ecotype formation, periodic selection, genetic drift, and total number of distinct ecotypes that best fit the proposed model.

Using gyrB sequences for all 36 isolates, ES demarcated 14 ecotype clades placing OS223,

OS195, OS185, and OS155 in different clades (Figure 2.4B). There was a close relationship between the number of ecotype clades demarcated by ES (Figure 2.4B brackets), and the systematic output (Figure 2.4B tree). There was also a noticeable grouping pattern according to water column depth (Figure 2.4B). Similar results were obtained (in terms of ecotype partitioning of the four model strains and depth grouping) when using all seven gene sequences exclusively and in random sets of three concatenated sequences (data not shown).


Figure 2.4 Ecotype Simulation (ES) using gyrB sequences for 36 Shewanella baltica strains within the water column. (a) Plot including the sequence diversity patterns using seven different gene sequences as input; SO0578 (blue), SO2615 (red), SO4702 (green), SO2183 (purple), SO2706 (teal), gyrB (orange), SO0625 (brown). Simulation was carried out using the seven genes both exclusively and in concatenated sets yielding consistent demarcations (trees not shown). (b) ES output tree. Red asterisks represent four model strains. Brackets indicate ecotypes demarcated by ES. Colors represent depth of isolation; red, 80 m, brown 90 m, purple 110 m, orange 120 m, blue 130 m, green 140 m.


2.3.3 Global expression analysis

A total of 72 measurements of expression, 18 per strain, were obtained for OS223,

OS195, OS185, and OS155. Differentially regulated core genes were determined by performing a one-way ANOVA using intensity values from pairwise hybridizations of the four strains. A total of 415 core genes were found to be differentially regulated among the four strains, which consisted of 10.6% of the entire probe set (Table A.1). In this scheme, differences in expression are in fact due to innate strain specific factors as opposed to conditional responses assayed in a two-color hybridization scheme.

Statistically significant genes were arranged according to frequency of occurrence per strain for all pairwise comparisons as an indicator of overall transcriptional individuality (Figure

2.5A). A total of 415 core genes were identified constituting a global list deemed statistically significant for every pairwise comparison. Thus each gene had three expressional instances, or one per pairwise comparison. Of the 415 genes, OS223 expressed more genes at a higher level when compared to any other strain (Figure 2.5A). Of these, 264 genes (54.9%) were found in every pairwise comparison, or were expressed more by OS223 than all the other strains (Figure

2.5B). In contrast, when compared to all the other strains OS155 exhibited an overall higher expression level for only 23 genes (6.4%), but did express 361 different genes at a higher level than at least one of the other strains (Figure 2.5B). Furthermore, OS155 had the highest frequency of genes expressed at a higher level unique to a singular pairwise comparison, again in opposition to the OS223 profile (Figure 2.5C). Thus, OS155 had the most diverse and least uniform transcription profile while OS223 had the least diverse and most consistent when compared to the other three strains exclusively (Figure 2.5B and C).


Figure 2.5 Comparative differentially expressed gene profiles per strain. (a) Number of core genes (y axis) differentially expressed per strain per pairwise comparison (x axis). (b) Core genes found to be differentially expressed per strain in every pairwise comparison. (c) Core genes found to be differentially uniquely expressed in only 1 of the possible 3 pairwise comparisons. Colors indicate strains; OS195 (blue), OS155 (red), OS223 (green), OS185 (purple). All differential expression is actually a general higher level of expression when compared to the alternative strain.


The 415 genes were arranged according to strain, and their magnitude of expression, or factor of variation (Figure 2.6). Since each strain was hybridized with the other three strains separately, each gene had three comparisons per strain. Overall, most of the differentially expressed genes per strain exhibited less than a 2.0 fold increase. Most notable was OS155, where 452 (76.4%) of the differentially expressed genes from all pairwise comparisons showed less than a 2.0 fold increase. Not surprisingly, OS223 contained only 376 (54.0%) such genes, the lowest of the four strains, but interestingly comprised the largest total number of higher expression instances (697). In other words, OS223 consistently expressed the same genes at a higher level when compared to the three other strains. Using the Significance Analysis of Microarrays package (SAM), fewer statistically significant genes were found to be differentially expressed using a false discovery rate of less than 15%. The total SAM output for every pairwise comparison yielded 140 unique genes, of which 60 were also found in the 582 gene set from the ANOVA output (43%).


[Plot for Figure 2.6: number of genes (0 to 300) versus fold change range, in bins 1.0-1.9, 2.0-2.9, 3.0-3.9, 4.0-4.9, 5.0-5.9, 6.0-6.9, 7.0-7.9, 8.0-8.9, 9.0-9.9, and > 10.0.]

Figure 2.6 Global gene distribution profile indicating range of expressional variation of all gene- individuals. Individual expression is the average of 18 replicates for all 3 possible pairwise comparisons. Colors indicate strains; OS195 (blue), OS155 (red), OS223 (green), OS185 (purple).

Using the 18 replicate measurements per core gene per strain we calculated the percent coefficient of variation, or %C.V., among the study group (Figure 2.7). Replicates consisted of three biological replicates, each consisting of six technical replicates. Percent C.V. is equal to the standard deviation divided by the mean, expressed as a percentage, and is a metric often used to assess experimental variation in microarray datasets [56, 75]. A general trend was noticed in which genes that had lower %C.V. values, or exhibited minimal variation among the technical replicates, also had more significant P values. For the 582 significant genes, which include strain specific genes, 354 (61%) had a %C.V. value of less than 15% for at least one strain.
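The %C.V. definition above translates directly into code. The sketch below assumes the sample standard deviation (n - 1 denominator), since the text does not say which form was used, and the six replicate intensities are hypothetical.

```python
import statistics

def percent_cv(values):
    """Percent coefficient of variation: standard deviation / mean * 100.

    Uses the sample standard deviation, which is an assumption here.
    """
    return statistics.stdev(values) / statistics.mean(values) * 100.0

# Hypothetical replicate intensities for one gene in one strain.
replicates = [10.0, 12.0, 11.0, 9.0, 13.0, 11.0]
cv = percent_cv(replicates)
print(f"%C.V. = {cv:.1f}")
```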


[Plot for Figure 2.7: % C.V. (0 to 160) versus Log (1/P) (0 to 10).]

Figure 2.7 Experimental variation versus significance between strains. Experimental variation per gene (3937) per strain (4) reported as % C.V. P values are calculated from the ANOVA and reported in an inverse log10 manner. Colors indicate strains; OS195 (blue), OS155 (red), OS223 (green), OS185 (purple).

Using the differentially expressed core genes as a model of the global transcriptome among the four strains, we categorized these entities using the Clusters of Orthologous Groups

(COG) designations (Figure 2.8). The genes represented a variety of expected COG categories including a large percentage of genes related to transcription (9.3%), amino acid transport and metabolism (7.8%), and energy production and conversion (7.5%). Not surprisingly, genes of unknown function represented the largest portion (16.3%) of the dataset. For comparison, a similar COG distribution of the genes differentially expressed by less than a 2.0 fold change by

OS155 was created (Figure 2.8). OS155 was responsible for the highest frequency of differentially expressed genes within such a range (251), and demonstrated the fastest growth rate among the four strains. Comparatively, OS155 contained a higher proportion of COGs related to amino acid transport and metabolism, lipid transport and metabolism, cellular motility, secondary metabolite biosynthesis (transport and metabolism), signal transduction mechanisms, and unknown functions.


[Plot for Figure 2.8: horizontal bars for COG categories Z, V, U, T, S, R, Q, P, O, N, M, L, K, J, I, H, G, F, E, D, C, and A; horizontal axis from 0 to 70.]

Figure 2.8 Number of genes found per COG designation. Blue, COG profile of entire 415 significant gene set. Red, COG profile of entire 415 statistically significant genes excluding OS155 genes differentially expressed by a factor less than 2.0. Green, COG profile of only OS155 genes differentially expressed by a factor less than 2.0. For a more detailed description of the COG groups see Table B.2.


Pathways were predicted utilizing the Pathway Tools Omics Viewer (Biocyc.org) using normalized log2 transformed intensity values for all the core genes for all four strains as input.

Genes, and their associated pathways, were deemed significant if there was a 4.0 fold increase or decrease from the overall intensity (Figures 2.9-2.12). First, both OS195 and OS155 seemed to emphasize gene expression for enzymes related to specific intermediates involved in anaerobic respiration, specifically reactions found to generate reducing power for oxidative phosphorylation via 2-phospho-D-glycerate and phosphoenolpyruvate (Figure 2.9 No. 5).

However, only OS155 demonstrated both lower expression for genes involved in aerobic respiration and higher expression for genes involved in anaerobic respiration. Both OS195 and

OS155 indicated reduced gene expression for enzymes related to the non-oxidative branch of the pentose phosphate pathway. They also demonstrated increased gene expression for enzymes related to the TCA cycle, glycolysis, and spermidine (a polyamine) biosynthesis, which via the regulation of certain intermediates, may be responsible for their faster growth rates.

Polyamines have multifunctional roles in a cell and do seem to be critical for normal cellular growth, probably via the modulation of nucleic acids [87]. Intermediates involved in amino acid synthesis shunts via glutamate in the TCA cycle were expressed at a higher level for OS195 and OS155 as well, which corroborates the COG findings. Furthermore, OS155 displayed the least increased transcriptional emphasis on membrane transport, with increases in only 12 membrane transport systems, whereas all other strains recorded increases in over 20 systems (Figures 2.9-2.12).
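Because the input values are log2 transformed, a 4.0 fold cutoff corresponds to a difference of two log2 units. The sketch below shows how a gene could be classified against such a cutoff. It assumes the reference ("overall intensity") is a single log2 value per gene, for example a mean across strains; that choice, the function name, and the example numbers are hypothetical and are not taken from the Pathway Tools Omics Viewer.

```python
import math

FOLD_CUTOFF = 4.0
LOG2_CUTOFF = math.log2(FOLD_CUTOFF)  # 4.0 fold equals 2 log2 units

def classify_expression(log2_intensity, log2_reference):
    """Label a gene relative to a reference using a 4.0 fold cutoff."""
    difference = log2_intensity - log2_reference
    if difference >= LOG2_CUTOFF:
        return "increased"
    if difference <= -LOG2_CUTOFF:
        return "decreased"
    return "unchanged"

# Hypothetical log2 intensities compared against a reference of 10.0.
labels = [classify_expression(value, 10.0) for value in (12.5, 11.0, 7.9)]
print(labels)
```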


Figure 2.9 Pathway predictions for the OS155 total transcriptome using the cellular omics-viewer from Biocyc.org. Generally, reactions on the periphery represent transport systems, left side represents metabolite and vitamin metabolism, middle represents central metabolic pathways, and right side represents simple molecular transformations. Yellow, expression below the -4.0 fold cutoff. Blue, expression between the -4.0 fold and +4.0 fold cutoff. Red, expression above the +4.0 fold cutoff. Numbers 1-5, Pathways found to have increased gene expression for OS195 and OS155. Numbers 6-9, Pathways found to have decreased gene expression for only OS155. 1) glycolysis II 2) TCA cycle variant 3) spermidine biosynthesis 4) TCA cycle variant 5) anaerobic respiration 6) aerobic respiration 7) Pentose Phosphate (non-oxidative branch) 8) Formyl THF biosynthesis I and 9) membrane transport.


Figure 2.10 Pathway predictions for the OS195 total transcriptome using the cellular omics-viewer from Biocyc.org. Generally, reactions on the periphery represent transport systems, left side represents metabolite and vitamin metabolism, middle represents central metabolic pathways, and right side represents simple molecular transformations. Yellow, expression below the -4.0 fold cutoff. Blue, expression between the -4.0 fold and +4.0 fold cutoff. Red, expression above the +4.0 fold cutoff.


Figure 2.11 Pathway predictions for the OS185 total transcriptome using the cellular omics-viewer from Biocyc.org. Generally, reactions on the periphery represent transport systems, left side represents metabolite and vitamin metabolism, middle represents central metabolic pathways, and right side represents simple molecular transformations. Yellow, expression below the -4.0 fold cutoff. Blue, expression between the -4.0 fold and +4.0 fold cutoff. Red, expression above the +4.0 fold cutoff.


Figure 2.12 Pathway predictions for the OS223 total transcriptome using the cellular omics-viewer from Biocyc.org. Generally, reactions on the periphery represent transport systems, left side represents metabolite and vitamin metabolism, middle represents central metabolic pathways, and right side represents simple molecular transformations. Yellow, expression below the -4.0 fold cutoff. Blue, expression between the -4.0 fold and +4.0 fold cutoff. Red, expression above the +4.0 fold cutoff.

Pathways for only OS155 genes found to be differentially expressed by a factor less than 2.0 fold were also characterized because 1) most differential expression for OS155 was due to these genes, and 2) OS155 was the most unique in terms of growth rate and transcription profile. A large number of these genes were found to be involved in anaerobic respiration, nucleotide metabolism, amino acid metabolism, and gluconate catabolism (data not shown).

2.4 Discussion

2.4.1 Scope of the study

Here, I have aimed to characterize diversity within a natural prokaryotic population from both a transcriptional and an ecological perspective. The study is unique in that ecologically distinct yet genomically very similar bacterial strains representative of an entire population were used. Using Shewanella baltica strains as a model, the findings suggest that there exists significant transcriptional variation at the population level. This in fact may imply that the interpretation of the genome, and not just genomic insertions, deletions, and/or rearrangements, could be the primary mode of recent divergence within the clade. Using such a global approach to exemplify clade divergence within a population of bacteria has, to my knowledge, not been achieved thus far. These findings advance the understanding of intraspecies variation and mechanisms responsible for increased fitness.

2.4.2 Ecotype demarcations

The four strains were tested to see if they constituted discrete ecotypes among 32 other S. baltica strains isolated from the water column. Ecotypes can be defined as ecologically distinct units respondent to a particular habitat, whose diversity is limited by genetic drift or periodic selection [47, 48, 78, 83, 88, 89]. In fact, ecotypes constitute the most recently divergent clades [83, 88]. To define ecotypes in the S. baltica population, we employed Ecotype Simulation software (ES). It was found that OS223, OS195, OS185, and OS155 consistently represented different ecotypes. This alone was not unexpected due to differences in isolation depth, but ES also consistently identified several putative ecotypes among the other strains from similar depths. This suggests that depth, at least in meters, is not the singular determinant of ecotype formation in this population. Although water renewal occurs at a slower than average rate and the water column is very stratified [22, 90], migration is still dynamic because this is a marine system. This creates a natural vector for dissemination by eliminating physical barriers, such as geographical features, typically found with endemic taxa [91]. For these reasons, it is unlikely that the putative ecotypes represent endemic groups of different lineages that independently colonized a niche instead of representing ecologically distinct units specialized for that region over time. In fact, recent studies suggest that localized fitness may actually be due to horizontally acquired genetic elements among the four S. baltica strains [18].

Ecological distinctness of the putative ecotypes may be a necessary a priori assumption independent and supplemental to any ES demarcation [48, 83] and is required by other ecotype demarcating algorithms [58]. Understanding this, S. baltica strains were selected for this study based on the breadth of information already known about their habitat and sequenced genomes

[9, 19, 92]. In fact, much is known about the genus as a whole [3, 4, 8, 17]. Furthermore, the strains were collected from various depths comprised of dissimilar hydrographical and chemical parameters. Here, ES was simply used as a sequence based method to compound the idea of localized fitness hypothesized for the strains, and not a standalone measure of clade divergence.

2.4.3 Genotypic and phenotypic differences among strains

The initial goal was to assess metabolic congruency by measuring growth efficiency of four Shewanella baltica strains in standardized conditions. The findings explicitly reveal the metabolic plasticity among the strains, and highlight the transcriptional versatility of the closely related species. Considering all four strains are classified as the same species using current molecular metrics [18] and have a high number of orthologous protein coding genes, it was surprising to see such metabolic plasticity between the strains when grown in the same conditions (Figures 2.2 and 2.3). OS155, the fastest growing strain, was isolated from an environment containing nitrate and nitrite as well as the largest amount of dissolved oxygen [19]. These substrates are not found in the anoxic zones of the water column where the other three strains were isolated (Table 1.1) [19]. This could suggest a preference for terminal electron acceptors, possibly oxygen, other than nitrogenous compounds, which are known to be heavily utilized by S. baltica strains at lower depths [19]. It has also been shown that nitrogen is the limiting element in the more shallow regions of the Baltic Sea [90].

Growth profiles in glucose did not reveal any parallels between faster growing and slower growing strains in terms of shared gene content. Both OS155 and OS223 are thought to have genomes comprised of numerous genomic rearrangements, whereas OS195 and OS185 are almost completely syntenic [18]. The incredibly successful growth of OS155 in our defined media may have admittedly been, to some degree, rooted in the overlap of abiotic parameters between our defined media and the strain’s native environment, especially oxygen concentration. That being said, OS155 was the slowest growing strain in maltose while OS195 was the quickest growing even though it was isolated from a completely anaerobic depth. This suggests that the presence of oxygen alone does not explain the OS155 growth profile, and in fact highlights carbohydrate metabolism as another factor.

2.4.4 Foundations for analysis scheme

It is important to note that the principal rationale of this study was to assess the proportion and magnitude of natural variation among closely related bacterial species. It is understood that numerous biological processes occur between gene and functional protein that cannot completely be explained by using mRNA intensity measurements from a microarray experiment. It is also thought that protein and mRNA abundance are uncorrelated due to stochasticity in gene expression and transcriptional noise [93, 94]. Thus, the goal was not to elucidate specific genetic elements responsible for such variation. This would be a considerable task considering the plethora of factors associated with gene expression and, more importantly, phenotype. Such factors have been well reviewed [62, 66, 69, 74, 95] and although extremely important, are outside the scope of this study.


I elected to hybridize the four S. baltica strains in a two-color format but analyze the data in a one-color format. Hybridizing the strains against each other allowed for a more robust comparative assay where gene expression for one strain was strictly relative to all other strains’ expression. Many two-color microarray experiments depend on the ratio of intensity measurements of two conditions (or one condition vs. a reference condition). Analysis in this manner is not very sensitive to small differences in expression. This bias was avoided by finding genes that demonstrated statistically significant expressional variation using single measurements of intensity per strain-gene. One-color analysis also allowed for double the amount of intensity measurements per slide, and three times the number of replicates per pairwise comparison. For example, in a two-color format with three biological replicates, OS155 would be hybridized against OS195 six times (including dye swaps). Using the aforementioned format, OS155 can be compared to OS195, as well as any other strain, a total of 18 times (including dye swaps). While one-color normalization algorithms lack the strength of the LOWESS algorithm in terms of intensity bias [96], we chose to analyze with the higher number of replicates afforded by a one-color scheme.

Similar pantranscriptome studies have used an ANOVA that incorporated non-biological variation [56, 76]. Here, we used a generic one-way ANOVA and reduced type I errors by applying more stringent statistical estimates. A significance level of P = 0.001 was chosen along with the application of the Benjamini-Hochberg multiple testing correction [97]. In addition, non-biological error was reduced by using oligonucleotides synthesized using phosphoramidite chemistry, a highly efficient synthesis method shown to retain an interslide %C.V. value often less than 12% (mycroarray.com). Each slide also contained spatially separated replicates reducing localized hybridization error in addition to the 18 replicate strain measurements. As a final metric to assess experimental variation, %C.V. was calculated between slides and found to be sufficiently low (Figure 2.7).


2.4.5 Expressional relatedness among strains

I present several observations of overarching importance in terms of expressional variation among the four S. baltica strains. (1) The four strains exhibited discrete transcriptional profiles when grown in like conditions in terms of consistency of highly expressed elements. In other words, each strain possessed a cohort of core genes that it expressed at a higher level than any other strain (Figure 2.5B). For instance, when compared to OS223, OS195, and OS185, OS155 expressed the same 23 genes at a higher level than any other strain. The other three strains each contained over 100 such genes. (2) For each strain comparison, one of the strains exhibited higher expression for a cohort of genes unique to that comparison (Figure 2.5C). For instance, when compared to the other three strains, OS155 expressed a different cohort of genes at a higher level per comparison. (3) For every strain, most differential expression was less than 2.0 fold (Figure 2.6). This phenomenon was most pronounced in the OS155 expression profile, where 293 genes were differentially expressed at this magnitude. As a whole, the data indicate that the four strains have discrete transcriptional profiles relative to one another.
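To make observation (3) concrete, the sketch below tallies genes whose fold change stays below 2.0. The values are the OS155 comparisons for the first four genes of Table B.1; requiring the cutoff to hold in every OS155 comparison is an assumption made for illustration and is not necessarily the criterion behind the counts in Figure 2.6.

```python
# Fold changes for OS155 vs. OS185, OS195, and OS223 (first rows of Table B.1).
OS155_FOLD_CHANGES = {
    "Shew185_0001": [1.60, 2.11, 1.86],
    "Shew185_0014": [1.14, 1.35, 1.14],
    "Shew185_0020": [1.02, 2.26, 2.74],
    "Shew185_0044": [1.33, 1.05, 1.37],
}


def genes_below(fold_changes, cutoff=2.0):
    """Genes whose fold change is under the cutoff in every comparison."""
    return sorted(g for g, fc in fold_changes.items() if max(fc) < cutoff)


print(genes_below(OS155_FOLD_CHANGES))
```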

Why do some strains conserve expression? To directly answer this, each pairwise comparison was measured exclusively using the SAM package. Not surprisingly, SAM identified fewer differentially expressed genes. This is most likely an artifact of how closely related the strains, and thus their transcriptomes, are. When comparing just two samples using a t-test, as opposed to multiple samples in an ANOVA, small deviations which may be biologically relevant across a system can be discounted [98]. These limitations are similar to those encountered when using a two-color ratio-based analysis and can potentially reduce the power of the experiment.

The transcriptional differences between the S. baltica strains seem to shed light on metabolic strategies employed by the more successfully growing strains. Understandably, one can only assume this for the growth parameters used for this study, and differences in gene expression are only relative to the other strains in the assay. This, however, is not limiting, as I aim to assess intraspecies variation as opposed to variation due to some modified growth condition. From a pathway perspective, the faster growing strains (OS195 and OS155) were similar and equally divergent from the slower growing strains (OS223 and OS185) (Figure 2.9).

OS155 uniquely demonstrated decreased expression for genes involved in vitamin biosynthesis, aerobic respiratory processes, and membrane transport. Given that OS155 is the only strain isolated from an oxic zone, its lessened dependency on aerobic redox substrates is intriguing. This may be an effect of the immense eutrophication found in the Baltic region, which introduces diverse inorganic substrates available for redox processes [22]. Being at the shallowest depth, OS155 would naturally be exposed to these substrates before the other three strains.

Furthermore, OS223 and OS185 may prefer nitrate as a terminal electron acceptor considering both strains are thought to be primarily responsible for denitrification at their depth [19]. Not having nitrate in the growth medium probably caused the observed shift to aerobic respiratory processes for the two strains. Overall, the pathway patterns for the four strains seem to indicate a link between growth rate and expressional emphasis on central metabolic pathways and redox diversity.

It was interesting that the fastest growing strain (OS155) demonstrated the lowest magnitude of differential expression; that is, it contained the most genes differing by a factor of less than 2.0 fold (Figure 2.6). It could be that OS155 possesses a more efficient or streamlined transcriptome. Although only speculative, faster growing strains may avoid superfluous gene expression that expends valuable cellular energy, instead favoring minimal but directed regulation. This would provide a much more optimized transcriptome through minimal gene expression. Furthermore, intraspecies transcriptional variation in eukaryotes, although often small in magnitude [56, 76, 99, 100], is thought to be governed by stabilizing selection [64, 73]. This force would eliminate large deviations from the expressional mean, thus creating an accumulation of many small changes in gene expression as opposed to large scale shifts.

Following this logic, it would be expected that small differences in gene expression between strains contribute to observed growth success. While subtle changes in gene expression probably reflect normal biological variation among individuals, there is no reason to think that such variation cannot be advantageous. This becomes apparent when looking at the pathway representation and COG designations for genes differentially expressed by less than 2.0 fold within the OS155 transcription profile. As expected, many genes were directly related to general growth, such as glycolytic processes, nucleotide metabolism, and tRNA charging pathways related to peptidoglycan amino acid residues. Most importantly, many of the processes that were deduced from the whole transcriptome pathway analysis to be the most important in terms of growth were retained in the small fold change gene set. These included genes involved in central metabolic pathways, such as the TCA cycle and the pentose phosphate pathway, along with general fatty acid metabolism. The most notable absences were genes involved in membrane transport systems, suggesting an emphasis on intracellular metabolism. Accepting that central metabolic pathways have the most direct influence on overall growth, the fact that genes differing by less than 2.0 fold are involved in these pathways is interesting. Conversely, genes that exhibited a large fold change difference did not seem to be as impactful on pertinent growth pathways (as inferred by the growth medium). This reinforces the thought that small expressional changes are not only important but potentially the most relevant in terms of fitness and thus selective forces. This pattern is also present when comparing the COG designations for the same set of genes (Figure 2.8). Understandably, any conclusions drawn from either the pathway analysis or COG descriptions are only as sound as the integrity of the published databases.

mRNA copy number can fluctuate slightly between individuals and even between cells [93]. This makes discriminating between meaningful data and noise difficult, and may always be an inherent burden to accept. The observed similarities and differences among the transcription profiles of the four strains are generally congruent with ecological differentiation (depth), genomic divergence (shared genes, ANI), and pathway relatedness. That is, the most ecologically and phylogenetically similar strains had the most similar transcriptomes. There were noticeable parallels in terms of the magnitude and proportion of differential expression between the aforementioned eukaryotic reports and this study. This suggests a possible conservation of expressional properties of fitness between prokaryotes and eukaryotes, especially as a means of selection. Using the data obtained from this study, it is hard to predict explicit expressional elements (gene sets or even regulatory networks) responsible for each strain's localized fitness. Such an assay would necessitate conditional responses for each strain, thus highlighting genetic elements that dictate growth success. That being said, it will be important in the future to investigate contrasts in gene expression among the strains in multiple other conditions, preferably with parameters that mimic environmental conditions. This will allow for a more direct comparison between transcriptome and ecological specification.


APPENDIX A

COMPLETE PROTOCOL: FROM CELL TO ANALYSIS (HYBRIDIZATION PROTOCOL ADAPTED FROM DR. WOO-SUK CHANG)


Oligonucleotide Microarray Protocol for Shewanella baltica By: W. S. Hambright

Isolation of total RNA using Ribo-Pure Bacteria kit by Ambion (AM1925):
*Recommended samples run at once: up to 4

1. Grow cells to logarithmic optical density (this is ~4.0 in DM w/ Glucose).
2. Spin 5 ml DM at full speed for 5 min. Acceleration should be set at full speed but the brake MUST be set at < 1/2 speed.
3. During spin, prep 0.5 ml screwcap tube with ~250 μl Zirconia beads. Also, set microcentrifuge temperature to 4˚C.
4. After spin, discard supernatant and resuspend cells with 350 μl RNAwiz. Add solution to Zirconia beads and horizontally vortex at full speed for 10 min.
5. Centrifuge full speed for 5 min at 4˚C.
6. Transfer lysate to new 1.5 ml tube and add 0.2 volumes of chloroform. Mix well and incubate at RT for 10 min.
7. Centrifuge full speed for 5 min at 4˚C. Transfer aqueous phase (top) to new 1.5 ml tube (typically ~150-200 μl).
8. Place 100 μl (per sample) elution solution into 1.5 ml tube and warm to 96˚C for at least 5 min.
9. Add 0.5 volume 100% ethanol to each sample and mix thoroughly.
10. Transfer sample to filter cartridge in a new collection tube and spin full speed for 1 min. Discard flow through.
11. Add 700 μl wash solution I to same tube/cartridge and spin at full speed for 1 min. Discard flow through.
12. Add 500 μl wash solution 2/3 to same tube/cartridge and spin at full speed for 1 min. Discard flow through.
13. Repeat step 12.
14. Spin empty tube/cartridge for 1 min to remove trace volume. Discard collection tube, and place filter into a new collection tube.
15. Elute RNA with 45 μl pre-heated elution solution and spin full speed for 1 min. DO NOT DISCARD FLOW THROUGH.
16. Repeat step 15.
17. Immediately add 1/9th volume 10X DNase buffer and 4 μl DNase to sample(s). DNase is an enzyme, KEEP COLD using cold block tube rack.
18. Incubate 30 min at 37˚C.
19. Add 1/5th of original RNA volume DNase Inactivation Reagent to sample(s). This should be approximately 18 μl.
20. Incubate at RT for 2 min then pellet DNase Inactivation Reagent by spinning full speed for 1 min. Collect supernatant and put in new 1.5 ml tube.
21. Assess yield using Nanodrop. Typical yield is 350-400 ng/μl. Store at -80˚C for up to 6 months.
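The fractional volumes in steps 17 and 19 can be worked out as follows. This is only a sketch, assuming the full volume from both 45 μl elutions is recovered; the function name is illustrative.

```python
def dnase_volumes(eluate_ul):
    """Reagent volumes (in μl) for the DNase steps, given the eluted RNA volume."""
    return {
        "10X DNase buffer": eluate_ul / 9,            # step 17: 1/9th volume
        "DNase Inactivation Reagent": eluate_ul / 5,  # step 19: 1/5th of RNA volume
    }


# Two 45 μl elutions (steps 15 and 16), assuming no loss on the filter.
volumes = dnase_volumes(2 * 45)
print(volumes)
```

With 90 μl of eluate the inactivation reagent comes to 18 μl, in line with the value quoted in step 19.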

Purification of mRNA using MICROBExpress kit from Ambion (AM 1905):
*Recommended samples run at once: up to 4

1. Precipitate total RNA (TRNA) by adding 0.1 volume 3M sodium acetate, and 3 volumes 100% ethanol. Vortex well.
2. Quick freeze at -80˚C for 30 min. During this time allow microcentrifuge to reach a temperature of 4˚C.
3. Pellet RNA by spinning at 12,000 rcf for 30 min at 4˚C. At this time set water baths to 70˚C and 37˚C.
4. Pipette and discard supernatant. Add 1 ml ice cold 70% ethanol and vortex well.
5. Spin full speed at 4˚C for 10 min. Carefully remove all supernatant.
6. Air dry for ~2 min in hood.
7. Dissolve in 15 μl TE (10 mM Tris-HCl pH 8, 1 mM EDTA).
8. Label all sample tubes. Pipet 200 μl Binding Buffer into 1.5 ml tube.
9. Add 8-9 μg RNA (max volume of 15 μl) from step 7 to tubes with binding buffer. (If insufficient RNA for 8-9 μg, add as much as possible.)
10. Vortex briefly.
11. Add 4 μl Capture Oligo Mix to mixture.
12. Vortex briefly and quick spin tube to collect mixture.
13. Denature for 10 min at 70˚C.
14. Anneal for 20 min at 37˚C. This can actually incubate for up to 1 hr, so at this time begin labeling tubes and do steps 15-19.
15. Prepare Oligo mag-beads by putting 50 μl per sample in a 1.5 ml tube and capturing them with the magnetic stand for 2 min. (The more mag stands you have here, the quicker and better results you will obtain. I suggest 1 mag stand per 2 samples.)
16. Discard supernatant leaving beads.
17. Add equal volume NFW, vortex, and place on mag stand for 2 min. Discard supernatant leaving beads.
18. Add equal volume Binding Buffer, vortex, and place on mag stand for 2 min. Discard supernatant leaving beads.
19. Add equal volume Binding Buffer, vortex, and place in 37˚C water bath for 5 min. At this time also heat the Wash Solution in the 37˚C water bath until later use.
20. Remove mag beads from water bath, vortex, and add 50 μl to each RNA sample.
21. Vortex mixture briefly and quick spin to collect mixture. Incubate at 37˚C for 15 min.
22. Capture magnetic beads using mag stand for 2 min. Remove supernatant (RNA) and put in new 1.5 ml tube on ice.
23. Add 100 μl pre-warmed wash solution to captured beads in mag stand. Resuspend beads, vortex, and place back in mag stand for 2 min. Remove supernatant and add to RNA tube on ice.
24. Perform an ethanol precipitation by adding 1/10th volume 3M sodium acetate, 4 μl glycogen, and 1 ml ice cold 100% ethanol to RNA tube. Vortex thoroughly, and precipitate for 1 hr at -20˚C.
25. Centrifuge for 30 min at 13,000 rpm. Decant and discard supernatant.
26. Add 750 μl ice cold 70% ethanol, vortex, and spin for 5 min at 13,000 rpm. Decant and discard supernatant.
27. Repeat step 26.
28. Spin for 1 min at 13,000 rpm and remove residual liquid with fine tipped pipette. Air dry for 5 min in hood.
29. Resuspend pellet in 25 μl NFW (or TE) with frequent flicking for 10 min.
30. Place tube in mag stand for 2 min to remove residual mag beads. Remove supernatant and place in new 1.5 ml tube.
31. Use Nanodrop or Agilent Bioanalyzer to quantify and qualify results. Yields should be around 40-60 ng/μl. Store at -80˚C for up to 6 months.

Synthesis of cRNA from enriched mRNA or total RNA using MessageAmp II Bacteria kit by Ambion (AM 1790):
*Recommended samples run at once: up to 8

1. Place 500-1000 ng total RNA or 100-400 ng enriched mRNA in a new 200 μl nuclease free tube. Bring to a volume of 5 μl with NFW (or just add 5 μl from enriched product if within acceptable concentration range).
2. Place samples in a pre-programmed thermal cycler for 10 min under the following conditions:
   • 70˚C for 10 min
   • 4˚C hold
   • Lid temp at ~105˚C
3. While incubating, prepare Polyadenylation Master Mix (PMM) using website (www4.appliedbiosystems.com/tools/ma2bact).
4. After incubation, place samples in ice for 3 min then add 5 μl PMM to each sample. Vortex, then quick spin to collect mixture.
5. Place samples in a pre-programmed thermal cycler for 15 min under the following conditions:
   • 37˚C for 15 min
   • 4˚C hold
   • Lid temp at 50˚C
6. During incubation, prepare Reverse Transcription Master Mix (RTMM) using website (www4.appliedbiosystems.com/tools/ma2bact).
7. After incubation, remove samples then add 10 μl RTMM to each sample. Vortex, then quick spin to collect mixture.
8. Place samples in a pre-programmed thermal cycler for 2 hr under the following conditions:
   • 42˚C for 2 hr
   • 4˚C hold
   • Lid temp at 50˚C
9. Towards end of incubation, on ice, prepare Second Strand Master Mix (SSMM) using website (www4.appliedbiosystems.com/tools/ma2bact).
10. After incubation, remove samples then add 80 μl SSMM to each sample. Vortex, then quick spin to collect mixture.
11. Place samples in a pre-programmed thermal cycler for 2 hr under the following conditions:
   • 16˚C for 2 hr
   • 4˚C hold
   • Lid temp at 16˚C or RT
12. Towards end of incubation, preheat NFW to 55˚C. Per sample, add 250 μl cDNA binding buffer to new 1.5 ml tube and prepare and label cDNA filter cartridges and collection tubes.
13. After incubation, add sample to 1.5 ml tube containing binding buffer and vortex.
14. Add buffer/sample mix to corresponding cDNA filter cartridges. Spin for 1 min at 10,000 rcf and discard flow through.
15. Add 500 μl wash buffer to filter. Spin for 1 min at 10,000 rcf and discard flow through.
16. Spin filter empty for 1 min to remove trace volume. Transfer samples to new elution tubes.
17. Add 18 μl preheated NFW to center of filter. Incubate for 5 min at RT. Spin at 10,000 rcf for 1.5 min. Discard filter cartridge.
18. Place samples at -20˚C until step 21. (Samples can be stored overnight here if need be.)
19. Prepare IVT Master Mix (IVTMM) using website for 1 round of amplification. This protocol is used for amino-allyl substituted nucleotides. When not using modified nucleotides, refer to the previous website.
20. For each sample, add 26 μl IVTMM to new 200 μl small tubes.
21. Add thawed sample to IVTMM, vortex, and then quick spin to collect mixture.
22. Place samples in a pre-programmed thermal cycler for 14 hr under the following conditions:
   • 37˚C for 14 hr
   • 4˚C hold
   • Lid temp at ~105˚C
23. After incubation, add 60 μl NFW to all samples. Place samples at -20˚C until step 26. (Samples can be stored indefinitely at this point.)
24. Preheat NFW to 55˚C. Label and assemble aRNA filter cartridges and collection tubes for all samples.
25. Per sample, add 350 μl aRNA binding buffer to new 1.5 ml tube.
26. Add thawed RNA to binding buffer in 1.5 ml tube and vortex.
27. Add 250 μl 100% ethanol to mixture and pipette up and down multiple times. Move IMMEDIATELY to next step.
28. Load sample into labeled aRNA filter cartridge in collection tube. Spin for 1 min at 10,000 rcf and discard flow through.
29. Add 650 μl wash buffer to filter. Spin for 1 min at 10,000 rcf and discard flow through.
30. Spin empty for an additional min to remove trace volume. Discard collection tube and place filter cartridge in new collection tube.
31. Add 150 μl NFW to filter and incubate in a water bath at 55˚C for 10 min. DO THIS BEFORE YOU SPIN.
32. After incubation, spin tube for 1.5 min at 10,000 rcf to recover aRNA. Remove recovered aRNA and make three 50 μl aliquots.
33. Yields should be around 700-1000 ng/μl (or ~4X starting material). Store at -20˚C or -80˚C indefinitely.

Labeling of amine modified cRNA (aRNA) using Alexa Fluor 555 Molecular Probes by Invitrogen (A32756):
**Recommended samples run at once: NO MORE THAN 3
**Allow dye tube to equilibrate to room temp for 30 min before use.
**Mycroarray claims that up to 40 μg of cRNA per rxn can be coupled here although Alexa Fluor calls for 1-5 μg. Great results have been seen with 20 μg.

1. Perform an ethanol precipitation on one 50 μl aliquot of aRNA by adding 5 μl 3M sodium acetate (pH 5.2), 125 μl 100% ethanol, and 2 μl glycogen.
2. Vortex mixture and incubate at -20˚C for 1.5 hr.
3. Centrifuge at 10,000 rcf for 30 min at RT. Decant as much supernatant as possible.
4. Wash with 100 μl 70% ethanol, vortex, and spin at 10,000 rcf for 5 min. Decant as much supernatant as possible.
5. Repeat step 4.
6. Quick spin for 1 min to re-pellet.
7. Aspirate remaining volume with fine tipped pipet.
8. Air dry pellet in hood for 5 min. At this time allow dye to equilibrate at RT in the dark.
9. Resuspend pellet in NFW, usually in a volume of 50-100 μl.
10. Obtain a Nanodrop concentration of the RNA. Should expect 700-1000 ng/μl. Store at -80˚C indefinitely.
11. For labeling, begin by dissolving dye (set out earlier for at least 30 min) with 2 μl DMSO. Keep dye in dark when not in use. Vortex well. Let incubate for at least 5 min.
12. Add 6 μl of FRESH 1M sodium bicarbonate labeling buffer (pH 9) to a new 200 μl tube.
13. For ~10 μg rxn: Transfer 10 μl of precipitated RNA in NFW to the 200 μl tube. Mix by gently pipetting.
   *Note: This volume usually contains 7-10 μg RNA. Concentration is not critical here as aliquots will be removed downstream at lower concentrations.
14. Add RNA/labeling buffer mix to dye container (approx 8 μl). Vortex, quick spin, then incubate at RT in the dark for 1.5 hr.

Clean Up of Dye Labeled RNA using RNeasy kit by Qiagen (74104):

1. Add 350 μl Buffer RLT and 250 μl 100% ethanol to a new 1.5 ml tube (per sample).
2. Assemble and label filter cartridges and collection tubes per sample.
3. Add 90 μl NFW to dye tube. Transfer volume to 1.5 ml tube with buffer and ethanol. Mix by pipetting up and down multiple times.
4. Transfer mixture to filter cartridge and centrifuge for 20 sec at 8,000 rcf. Discard flow through.
5. Add 500 μl Buffer RPE to filter cartridge and centrifuge for 20 sec at 8,000 rcf. Discard flow through.
6. Add another 500 μl Buffer RPE and centrifuge for 2 min at 8,000 rcf. Discard flow through.
7. Spin empty cartridge for 1 min to remove trace volume. Discard collection tube and place filter cartridge in new 1.5 ml collection tube.
8. Add 30 μl NFW to filter cartridge and centrifuge for 1 min at 8,000 rcf.
9. Repeat step 8. Discard filter cartridge.
10. Nanodrop RNA using microarray setting. Yields typically range from 50-80 ng/μl RNA with ~2.5-4.0 pmol dye. Base dye ratio is typically 1.5-1.7. Use website to calculate base dye labeling efficiency (http://probes.invitrogen.com/resources/calc/basedyeratio.html).
11. Spin tube in a speedvac to lyophilize sample. Store at -20˚C indefinitely.

Microarray analysis using Alexa Fluor Labeled RNA with 30K, 1 block Mycroarray Synthesized Slides:
*Protocol adapted from Dr. Woo-Suk Chang, April 2009
**Prehybridization is not required as it is done during QC before shipment.

Prehybridization (Stratalinker not needed):
1. Preheat Pre-Hyb soln in a coplin jar to 50˚C (water bath or incubator).
2. Incubate slides at 50˚C for 45 min (water bath or incubator).
3. Towards end of incubation, prep all washes.
4. Place slides in slide rack and wash vigorously up and down in 0.025X SSPE for 1 min.
5. Repeat with new 0.025X SSPE.
6. Place slides in a coplin jar with 0.025X SSPE and wash for 1 min by gentle shaking.
7. Empty 0.025X SSPE and replace with new 0.025X SSPE and wash again. Repeat 3X.
8. Place slides in a NEW coplin jar with 0.025X SSPE and wash again for 1 min by gentle shaking.
9. Dip slides a few times in isopropanol and centrifuge for 2 min at 850 rpm to dry.

Hybridization Using Single Use Rubber Hyb Chambers:
1. Prepare 1X hyb solution as written in the solutions section.
2. Incubate 5 min at 65˚C followed by snap cooling on ice for at least 5 min.
3. Prepare Hyb chamber by adding 350 μl Hyb soln to filter paper placed in the bottom of the chamber. Place in incubator at 37-50˚C to preheat until hybridization.
4. Resuspend aRNA in the proper volume of 1X Hyb soln to dispense your predetermined aRNA concentration. For instance, if using a single color hybridization and you want 2.5 μg in 60 μl, follow the formula to determine the volume of hyb soln (X) needed to resuspend to the correct concentration.
   a) [aRNA μg/μl] * [58.0 μl] = Total μg aRNA in pellet
   b) [Total aRNA] / [0.0416 μg/μl] = [X μl]
   c) Take 60 μl from X μl to hybridize to the slide
   *Note: [0.0416 μg/μl * 60 μl] = 2.5 μg
5. Vortex gently to mix and reconstitute. (If a small amount of particles remains this is OK, as they will dissolve readily at 65˚C.)
6. Incubate 3-4 min at 65˚C then transfer immediately to ice for 1 min.
7. Spin at RT for 1 min at 13,000 rpm. Combine samples at this time for 2 color Hyb's.
8. Clean lifter slip and place on slide in the chamber. Load 60 μl of resuspended aRNA to the slide 20 μl at a time by varying tip placement along the lifter slip.
9. Seal chamber, wrap in foil, and place in 50˚C water bath for 18-36 hrs (user defined).
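The resuspension formula in step 4 can be expressed as a short calculation. This is a sketch only: the function name and the example concentration (65 ng/μl, within the typical post-cleanup yield) are illustrative, and the target concentration is computed as 2.5 μg / 60 μl rather than the rounded 0.0416 μg/μl used in the protocol.

```python
TARGET_UG = 2.5     # μg of aRNA wanted on the slide
LOAD_UL = 60.0      # μl of hyb solution loaded on the slide
MEASURED_UL = 58.0  # μl of labeled aRNA used in formula (a)


def hyb_resuspension_volume(conc_ug_per_ul):
    """Volume X (μl) of 1X Hyb soln needed to resuspend the dried aRNA pellet."""
    total_ug = conc_ug_per_ul * MEASURED_UL  # formula (a)
    target_conc = TARGET_UG / LOAD_UL        # ~0.0416 μg/μl
    return total_ug / target_conc            # formula (b)


# Hypothetical Nanodrop reading of 65 ng/μl = 0.065 μg/μl.
print(round(hyb_resuspension_volume(0.065), 2))
```

Loading 60 μl of the resulting solution (formula c) then delivers approximately 2.5 μg of aRNA to the slide.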


Post-Hybridization Washes (This step is critical and needs thorough washing):

1. Remove lifter slip from slide and wash in Wash I for 3 min. Repeat 2X in new coplin jars.
2. Wash slide in Wash II for 1 min, 2X.
3. Immediately place slides in a new slide rack and dry by spinning at 850 rpm for 2 min at RT.
4. Scan or store slide.
*If washing is not sufficient, repeat step 1 at 50˚C and step 2 as is.

Required Solutions:

Prehybridization Stock Solution:
20X SSPE - 125 ml
10% SDS - 5 ml
BSA - 100 mg
NFW - 370 ml
*Filter soln to 0.2 μm and store at 4˚C.

2X Hybridization Solution (For ~5 Reactions):
Formamide - 500 μl
20X SSPE - 500 μl
Tween 20 - 1 μl
Take 500 μl of 2X Hyb Solution and add to 500 μl NFW to make 1X (done twice). Next, take an aliquot of 1X and add it to a new tube. Add acetylated BSA at 0.01 mg/ml and 1% control oligos to the aliquot. This is the Hyb Soln used to resuspend the target.

Post-Hybridization Washes (60 ml portions in coplin jar):
Wash I: 20X SSPE - 3 ml; NFW - 57 ml
Wash II: 20X SSPE - 0.3 ml (300 μl); NFW - 59.7 ml
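The working strengths implied by these recipes follow from the dilution relation C1 * V1 = C2 * V2. The short sketch below computes them; the helper name is illustrative, and the prehybridization total volume ignores the small contribution of the BSA.

```python
def final_concentration(stock_conc, stock_volume, total_volume):
    """Solve C1 * V1 = C2 * V2 for the final concentration C2."""
    return stock_conc * stock_volume / total_volume


# Post-hybridization washes, 60 ml each (concentrations in X SSPE).
wash_i = final_concentration(20, 3.0, 60.0)
wash_ii = final_concentration(20, 0.3, 60.0)

# Prehybridization stock: 125 ml SSPE + 5 ml SDS + 370 ml NFW.
prehyb_total_ml = 125 + 5 + 370
prehyb_sspe = final_concentration(20, 125, prehyb_total_ml)     # X SSPE
prehyb_sds_percent = final_concentration(10, 5, prehyb_total_ml)  # % SDS

print(wash_i, wash_ii, prehyb_sspe, prehyb_sds_percent)
```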

Stripping of Mycroarray slides:
1. Place slides in 0.1X SSPE at RT to hydrate for >30 min.
2. Transfer slides to 0.1X SSPE and 0.01% SDS for 1 hr at 50˚C.
3. QUICKLY move to same solution at 82˚C for 15 min.
4. Cool in 0.1X SSPE for 15 min at RT.
5. Wash then use.

*Note: If re-using slides, try to re-hybridize with the same aRNA as before. Not recommended for slides without replicate oligos such as the S. baltica slides. Make sure that spot intensities decrease by 80-98%.

Scanning slides using the Axon Genepix Scanner:
*Follow guidelines in the manual, but below is a brief synopsis.
1. Make sure lasers are calibrated. This can be checked in Genepix.
2. Allow scanner to warm up for 15 min. Also do at least one practice preview scan to warm the lasers up.
3. Do a preview scan on your slide and select the slide scan area AND place the corresponding .gal file over the scan area.
   *This need not fit perfectly but needs to generally cover all oligos.
4. Use the Auto PMT adjust function in Genepix and do not select “from current PMT”. Do select “perform data scan after adjustment”.
5. Once this is done, very carefully place the .gal file over the spots and hit F5 to auto align.
6. Double click on the slide area and set all spot diameters to 30.
   *This function actually works very well, but check over the slide to make sure all spots are covered properly.
7. Perform the analysis, and save the results as a .gpr file.

Analysis in GSGX Using 2 Dye Competitive Format:
*This is for generic single color analysis using multiple samples per file. This incorporates each channel separately.
1. Modify all .gpr files so that the identifier column (ID or Name) has exact replicates. Save as tab delimited .txt file.
   Ex: OS155__1234_@gene1245-345
       OS155__1234_@gene1245
2. In GSGX open new project, then from the annotations drop down menu, create custom technology from .txt file.
3. Select single color and multiple sample per file options. Select any .txt as an example file and select annotation file if any. At this time S. baltica does not have one.
4. Follow instructions through the windows. Select custom identifier column (ID or Name), and select custom signal columns.
   *The 2 signal columns should be:
   F532 Median-B532 Median
   F635 Median-B635 Median
5. Create new project using the drop down menu.
6. Create new generic single color experiment using the drop down menu.
7. Select new custom technology and select all .txt files you wish to incorporate.
   *This depends on your experiment. Incorporate files for a single comparison (OS195vsOS155), or many comparisons (OS155vsOS195vsOS185, etc.). You can tell GSGX how to interpret your input later.
8. Allow GSGX to split the files, then threshold the signals to 1 and quantile normalize.
9. Next, median center the arrays. This does not affect the statistics.
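For readers unfamiliar with the transformations in steps 8 and 9, the following is a minimal plain-Python sketch of thresholding, quantile normalization, and median centering. It illustrates the general techniques only and is not GeneSpring's implementation; ties are broken by position, and median centering is shown as a simple subtraction, which presumes values on a log scale.

```python
from statistics import median


def threshold(values, floor=1.0):
    """Raise every signal below the floor up to the floor."""
    return [max(v, floor) for v in values]


def quantile_normalize(arrays):
    """Give every array the same distribution: the mean of the sorted arrays.

    arrays is a list of equal-length lists, one per channel or slide.
    """
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [sum(s[r] for s in sorted_arrays) / len(arrays) for r in range(n)]
    normalized = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized


def median_center(values):
    """Shift an array so that its median is zero."""
    m = median(values)
    return [v - m for v in values]


channels = [[5, 2, 3], [4, 1, 6]]  # hypothetical signals for two channels
print(quantile_normalize(channels))
```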

*All the next functions and transformations are done using the drop down menus on the right hand side of the screen. Order of operations is VERY important.

Experiment Grouping:
1. Set up one parameter: Strain
2. Label each channel (F-B) by strain (i.e. 155 or 195…).
3. Save these settings to avoid repetition.

Experiment Interpretation:
*This allows the user to tell GSGX how to interpret the parameters. You should group all strains separately AND together. If doing a pairwise comparison, then group those two strains together.
1. Select parameter to be grouped together; this should be Strain. Hit next.
2. Select strains to be grouped together, check the average over replicates box, then hit next.
3. Name the interpretation accordingly and hit finish. This is the interpretation you will select for all future filtering and analysis.

Quality Control:
*This is arguably the most important aspect of the analysis process. There is no single discrete pathway or set of rules to follow. Below is a good start.

1. Select the quality control option and visually inspect the 3D PCA orientation of all the data files. Using this tool and the box and whisker plot automatically generated, remove outlying data files.
2. Select the filter by expression option. Use raw data and use lower bound 20th percentile and upper bound 95th percentile in at least half of the conditions.
3. Select filter probes by error option and only retain entities with a %CV value of less than 1000 in at least half of the conditions.

Other filtering options to consider:
1. Only keep entities with a SD over 1.0 in at least half of the conditions.
2. Only keep entities with a Fold Change of 2.0 in at least 1 condition.
3. Filter using volcano plot. This is useful, but any stringent statistical test should circumvent the use of this as a filtering mechanism.
4. Entities where F-B is a positive value.
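The filter-by-error step can be pictured with the sketch below, which keeps an entity when its %CV is under a chosen limit in at least half of the conditions. The function names and example intensities are hypothetical, the sample standard deviation is an assumption, and GeneSpring's own calculation may differ.

```python
from statistics import mean, stdev


def percent_cv(values):
    """Coefficient of variation of replicate intensities, in percent."""
    return 100.0 * stdev(values) / mean(values)


def passes_cv_filter(replicates_by_condition, max_cv, min_fraction=0.5):
    """Keep an entity if %CV < max_cv in at least min_fraction of conditions."""
    cvs = [percent_cv(r) for r in replicates_by_condition.values()]
    passing = sum(cv < max_cv for cv in cvs)
    return passing >= min_fraction * len(cvs)


# Hypothetical replicate intensities for one probe in two strains.
probe = {"OS155": [100, 110, 90], "OS185": [100, 300, 20]}
print(percent_cv(probe["OS155"]))
print(passes_cv_filter(probe, max_cv=50))
```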

Statistical Analysis:
*This is not the typical fold change analysis, which can be done separately. Fold change is becoming a less and less acceptable criterion for differential expression. Fold change is a ratio-based approximation, not a parametric statistical test for significance or variation. Like fold change analysis, cluster analysis is also run separately from statistical analysis. Clustering allows the user to visualize relatedness of all the filtered entities in terms of their intensity, and is a very useful tool.

1. Select statistical analysis from the right hand menu. When doing each pairwise comparison independently (i.e. 195vs155 and 195vs185), only select the interpretation that contains the 2 strains being compared.
2. Select the T-test option between the strains.
   *All replicates are averaged automatically.
3. Use an asymptotic approximation, along with the Benjamini-Hochberg Multiple Testing Correction (FDR approximation).
4. The next window allows the user to edit the p-value cutoff, select entities, and save entity lists per certain p-value.
   *From this window, the user can right click and export the list as a tab-delimited file external to GSGX for use in other programs. This is invaluable and is a great facet of GSGX.

Results interpretation: *These are very useful tools in GSGX. They allow the user to visualize related entities and study pathways, gene sets, and gene ontologies. See the manual for guidance, but the user MUST import supplemental gene identifiers such as GO terms and Entrez IDs within an annotation file. For all questions contact GSGX technical support.


APPENDIX B

SUPPLEMENTAL MATERIALS

Table B.1 Genes identified by Genespring GX 11 to be significantly differentially expressed among all four S. baltica strains using a one-way ANOVA. Only genes that had a locus tag and COG designation were included. Fold change is a factor of variation based on statistical output. For each strain comparison, the strain responsible for the higher expression corresponding to the fold change is indicated in the adjacent column.

OS155 OS155 OS155 OS185 OS185 OS195 vs. vs. vs. vs. vs. vs. Locus Tag COG Fold ∆ OS185 Fold ∆ OS195 Fold ∆ OS223 Fold ∆ OS195 Fold ∆ OS223 Fold ∆ OS223

Shew185_0001 COG0593L 1.60 185 2.11 155 1.86 155 3.60 185 1.99 185 2.37 223 Shew185_0014 COG2199T 1.14 185 1.35 155 1.14 155 1.64 185 3.45 185 1.33 223 Shew185_0020 COG4635CH 1.02 185 2.26 155 2.74 155 1.37 185 1.33 185 1.02 223 Shew185_0044 COG0607P 1.33 185 1.05 155 1.37 155 2.18 185 4.62 185 1.84 223 Shew185_0047 COG2081R 3.08 185 1.77 195 4.76 223 1.97 195 4.22 223 5.32 195 Shew185_0050 COG0569P 1.72 155 1.01 155 2.42 155 1.21 185 1.08 185 2.72 223 Shew185_0062 COG0738G 2.10 185 3.43 195 1.08 223 2.45 195 2.50 223 3.44 195 Shew185_0096 COG2188K 2.55 185 2.23 195 2.19 223 4.69 195 3.41 223 4.37 195

55 Shew185_0097 COG2987E 2.96 185 1.82 195 1.64 223 3.74 195 1.61 223 4.99 195 Shew185_0105 COG3276J 1.11 155 1.30 155 1.83 155 1.22 185 1.17 185 1.24 195 Shew185_0131 COG0303H 1.44 155 1.42 155 3.16 155 2.16 185 1.22 185 2.10 195 Shew185_0142 COG2866E 1.52 185 4.63 155 1.47 155 1.09 185 1.06 185 2.26 195 Shew185_0144 COG1301C 3.14 185 2.54 195 1.31 223 3.30 195 1.55 223 5.40 195 Shew185_0147 COG1188J 4.07 155 2.20 195 2.69 223 3.18 195 3.07 223 7.70 223 Shew185_0177 COG0796M 1.55 185 3.22 155 6.21 155 1.04 185 3.40 185 2.29 195 Shew185_0178 COG0724R 1.16 155 3.33 155 3.85 155 16.60 185 5.53 185 1.41 195 Shew185_0184 COG0250K 1.11 155 3.75 155 2.20 155 1.69 185 3.01 185 1.25 195 Shew185_0206 COG0093J 1.05 185 1.32 155 3.18 155 3.36 185 16.09 185 1.04 223 Shew185_0214 COG1841J 1.12 185 1.74 155 1.50 155 5.28 185 3.61 185 1.27 195 Shew185_0232 COG0526OC 1.75 155 3.85 155 2.06 155 1.70 185 2.37 185 2.80 223 Shew185_0234 COG0526OC 1.20 155 1.57 155 1.09 155 17.51 185 2.58 185 1.53 195 Shew185_0252 COG0604CR 1.11 155 1.37 155 1.50 155 1.76 185 5.00 185 1.25 195 Shew185_0253 COG0642T 2.26 185 2.85 195 2.13 223 1.66 195 1.46 223 3.72 223

Shew185_0281 COG0725P 1.14 155 2.48 155 1.44 155 1.34 185 2.99 185 1.34 223

Shew185_0289 COG2199T 4.11 155 1.87 195 2.16 223 3.10 195 3.22 223 7.80 223

Shew185_0300 COG0520E 1.29 185 3.53 155 1.41 155 3.43 185 2.31 185 1.81 223
Shew185_0308 COG1217T 1.13 155 2.45 155 1.46 155 7.15 185 3.60 185 1.33 223
Shew185_0322 COG3135Q 2.03 185 1.97 195 2.00 223 8.54 195 2.64 223 3.37 195
Shew185_0337 COG0365I 1.60 155 3.56 155 2.48 155 2.24 185 1.41 185 2.37 223
Shew185_0350 COG0251J 1.36 155 3.94 155 1.52 155 5.75 185 27.45 185 1.92 223
Shew185_0363 COG1948L 1.96 185 4.63 195 2.51 223 14.76 195 4.57 223 3.23 223
Shew185_0377 COG0227J 19.98 185 6.06 195 1.24 223 3.11 195 4.49 223 47.93 223
Shew185_0379 COG0548E 1.43 155 1.50 155 4.13 155 1.58 185 3.70 185 2.08 195
Shew185_0389 COG2067I 1.72 155 2.35 155 2.33 155 3.93 185 1.17 185 2.72 195
Shew185_0392 COG2001S 4.29 155 1.22 195 1.33 223 1.22 195 1.27 223 8.84 195
Shew185_0408 COG0739M 1.25 185 2.50 155 2.42 155 1.56 185 1.53 185 1.72 223
Shew185_0417 COG0661R 1.48 155 2.59 155 1.93 155 3.26 185 1.36 185 2.19 223
Shew185_0421 COG0684H 3.16 155 1.28 195 1.82 223 1.66 195 4.79 223 5.43 223
Shew185_0433 COG0564J 1.29 185 2.00 155 1.78 155 1.31 185 1.15 185 1.80 195

Shew185_0488 COG1459NU 1.19 155 1.49 155 2.12 155 1.46 185 1.27 185 1.52 223
Shew185_0516 COG0114C 1.43 155 2.41 155 1.13 155 2.73 185 3.17 185 2.09 195
Shew185_0518 COG2968S 1.12 155 1.46 155 2.20 155 3.59 185 1.80 185 1.28 223
Shew185_0519 COG2200T 1.29 185 3.40 155 2.15 155 10.07 185 2.06 185 1.82 223
Shew185_0527 COG0152F 4.49 185 1.73 195 4.19 223 2.87 195 1.59 223 9.56 195
Shew185_0547 COG0810M 1.68 185 1.07 155 1.27 155 3.24 185 16.91 185 2.65 223
Shew185_0550 COG0546R 2.58 185 1.10 195 3.00 223 2.43 195 1.63 223 4.44 195
Shew185_0557 COG4974L 2.90 155 2.18 195 8.88 223 14.28 195 6.41 223 4.87 195
Shew185_0611 COG3436L 1.31 185 3.67 155 1.84 155 1.66 185 3.61 185 1.83 223
Shew185_0621 COG0612R 1.07 185 1.80 155 2.08 155 3.87 185 1.32 185 1.09 223
Shew185_0627 COG0845M 1.27 185 1.95 155 4.92 155 3.82 185 5.66 185 1.77 195
Shew185_0643 COG0534V 1.59 185 3.11 155 2.39 155 21.43 185 1.69 185 2.36 195
Shew185_0654 COG1119P 1.82 155 1.23 155 1.07 223 4.73 185 1.03 223 2.94 223


Shew185_0660 COG1695K 2.98 155 4.44 195 8.46 223 7.18 195 2.12 223 5.12 195
Shew185_0702 COG0102J 2.95 155 1.53 195 1.29 223 13.06 195 6.97 223 4.93 195
Shew185_0706 COG0790R 4.55 185 2.19 195 2.46 223 2.44 195 1.62 223 10.12 223
Shew185_0710 COG3137M 1.12 185 4.72 155 2.71 155 1.80 185 2.97 185 1.26 223
Shew185_0737 COG0456R 3.19 185 1.62 195 2.77 223 1.36 195 1.77 223 5.56 223
Shew185_0748 COG1949A 1.85 185 1.15 155 1.91 223 3.97 185 1.69 223 3.00 195
Shew185_0757 COG5471S 1.39 185 3.09 155 1.34 155 4.31 185 1.88 185 2.00 195
Shew185_0762 COG4737S 3.07 155 1.44 195 1.01 223 4.89 195 4.35 223 5.23 223
Shew185_0787 COG2133G 3.38 155 1.32 195 2.31 223 7.51 195 2.21 223 6.09 195
Shew185_0790 COG3203M 1.83 185 1.33 155 3.86 223 3.29 185 3.47 223 2.97 195
Shew185_0806 COG2813J 2.10 155 1.37 195 1.63 223 2.04 195 22.87 223 3.46 223
Shew185_0850 COG1489R 3.19 155 1.39 195 4.37 223 1.29 195 3.90 223 5.54 195
Shew185_0857 COG2303E 4.69 185 1.23 195 3.50 223 6.38 195 5.75 223 10.98 195
Shew185_0889 COG3789S 1.31 155 2.17 155 1.96 155 4.61 185 3.71 185 1.84 223
Shew185_0890 COG4262R 2.07 155 1.15 195 9.85 223 1.18 195 4.30 223 3.41 223
Shew185_0895 COG3329R 3.85 185 3.34 195 1.11 223 8.65 195 6.87 223 6.70 195
Shew185_0901 COG3181S 2.09 155 2.43 195 4.36 223 1.13 195 1.37 223 3.44 195
Shew185_0909 COG1376S 2.87 185 2.52 195 1.30 223 1.80 195 1.82 223 4.82 223
Shew185_0919 COG0369P 1.00 185 18.63 155 4.55 155 1.06 185 9.38 185 1.01 195
Shew185_0927 COG0007H 4.01 185 1.25 195 1.83 223 2.81 195 1.06 223 7.33 223
Shew185_0932 COG4191T 2.06 155 1.16 195 2.04 223 83.78 195 3.72 223 3.41 223
Shew185_0951 COG0834ET 7.98 185 1.24 195 1.11 223 2.85 195 1.27 223 25.38 223
Shew185_0955 COG1346M 4.53 185 3.37 195 1.20 223 6.69 195 2.50 223 9.71 223
Shew185_0973 COG1410E 1.84 155 3.03 155 3.64 223 1.21 185 2.59 223 3.00 223
Shew185_0977 COG2038H 1.33 185 1.05 155 1.24 155 1.90 185 3.02 185 1.84 195
Shew185_0981 COG2109H 1.53 185 1.57 155 2.08 155 2.08 185 4.71 185 2.29 223
Shew185_0983 COG0614P 2.58 155 1.13 195 6.29 223 12.20 195 4.63 223 4.42 223


Shew185_1000 COG4591M 2.27 155 4.67 195 2.24 223 1.84 195 1.06 223 3.80 195
Shew185_1002 COG5000T 1.61 185 1.18 155 2.14 155 3.02 185 2.95 185 2.49 195
Shew185_1006 COG1309K 1.35 185 1.25 155 1.54 155 7.99 185 2.60 185 1.89 223
Shew185_1015 COG4559P 1.86 185 13.32 155 1.76 223 11.38 185 3.99 223 3.05 195
Shew185_1019 COG0811U 4.24 185 1.47 195 1.74 223 1.13 195 2.61 223 8.60 223
Shew185_1022 COG3721P 1.97 155 3.96 195 3.71 223 1.90 195 2.53 223 3.25 223
Shew185_1030 COG2939E 2.51 185 1.23 195 40.12 223 8.20 195 6.20 223 4.31 223
Shew185_1041 COG0262H 1.16 185 2.72 155 3.03 155 2.66 185 1.08 185 1.41 195
Shew185_1043 COG0840NT 1.68 185 1.82 155 1.41 155 1.32 185 1.30 185 2.65 223
Shew185_1050 COG3178R 1.64 155 1.16 155 2.03 155 1.44 185 4.29 185 2.59 223
Shew185_1051 COG1208MJ 6.37 155 2.36 195 2.31 223 2.53 195 6.84 223 16.67 223
Shew185_1054 COG1052CHR 1.82 155 3.75 155 1.04 223 1.05 185 3.81 223 2.95 195
Shew185_1085 COG4105R 2.67 155 3.21 195 1.01 223 7.92 195 1.59 223 4.54 223
Shew185_1096 COG1238S 8.36 155 1.48 195 3.24 223 1.67 195 2.47 223 28.86 223
Shew185_1140 COG0730R 1.66 155 2.67 155 2.23 155 3.25 185 3.39 185 2.60 195
Shew185_1145 COG2222M 4.49 155 2.41 195 14.16 223 1.07 195 1.22 223 9.59 195
Shew185_1147 COG4299S 2.52 185 1.29 195 6.39 223 3.48 195 4.91 223 4.31 223
Shew185_1153 COG1012C 1.17 155 3.29 155 3.06 155 3.70 185 1.41 185 1.42 223
Shew185_1160 COG2897P 2.26 185 2.81 195 1.29 223 2.37 195 1.56 223 3.72 195
Shew185_1169 COG1177E 3.07 185 2.45 195 2.83 223 2.12 195 5.04 223 5.21 195
Shew185_1190 COG1610S 1.96 155 3.72 195 1.50 223 1.50 195 1.02 223 3.22 223
Shew185_1206 COG0038P 2.72 155 1.10 195 1.83 223 1.54 195 1.55 223 4.58 223
Shew185_1231 COG0682M 1.90 155 1.02 195 3.71 223 4.24 195 2.15 223 3.15 195
Shew185_1232 COG0207F 1.82 155 1.94 155 1.30 223 1.84 185 5.68 223 2.95 223
Shew185_1236 COG2938S 2.00 155 2.34 195 1.03 223 7.36 195 3.70 223 3.30 223
Shew185_1248 COG0854H 1.74 155 6.32 155 1.33 155 4.27 185 1.56 185 2.74 195
Shew185_1258 COG0722E 8.85 155 1.17 195 3.66 223 1.57 195 4.58 223 32.75 223


Shew185_1266 COG3658C 1.64 155 2.16 155 3.04 155 1.88 185 1.21 185 2.58 195
Shew185_1267 COG3909C 3.14 155 1.47 195 1.61 223 3.35 195 1.91 223 5.40 195
Shew185_1286 COG4585T 3.02 185 4.56 195 1.10 223 4.17 195 5.86 223 5.17 195
Shew185_1302 COG3956R 1.81 185 1.02 155 1.04 155 6.11 185 4.54 223 2.90 223
Shew185_1309 COG1607I 3.64 185 4.40 195 6.67 223 1.21 195 4.56 223 6.47 195
Shew185_1313 COG1309K 7.12 185 1.01 195 1.45 223 6.87 195 1.02 223 19.32 195
Shew185_1329 COG0834ET 2.29 185 1.44 195 2.11 223 2.68 195 2.21 223 3.91 195
Shew185_1331 COG3104E 1.07 185 1.53 155 1.63 155 34.90 185 1.19 185 1.09 223
Shew185_1345 COG1722L 1.80 155 6.12 155 2.83 155 11.07 185 3.47 223 2.88 223
Shew185_1352 COG2363S 2.02 155 5.76 195 1.14 223 1.00 195 1.58 223 3.35 223
Shew185_1362 COG2974L 1.24 155 1.50 155 1.98 155 4.14 185 1.40 185 1.69 223
Shew185_1369 COG2239P 1.40 155 2.68 155 1.31 155 3.74 185 2.57 185 2.02 195
Shew185_1372 COG5006R 2.25 155 2.57 195 1.18 223 2.94 195 6.11 223 3.71 195
Shew185_1379 COG1309K 2.82 155 1.44 195 1.07 223 8.37 195 1.20 223 4.77 195
Shew185_1385 COG3608R 1.46 155 1.06 155 2.89 155 2.38 185 2.85 185 2.16 195
Shew185_1418 COG3321Q 1.74 185 1.70 155 2.70 155 1.80 185 27.89 185 2.74 223
Shew185_1419 COG3321Q 1.08 185 1.37 155 1.53 155 1.67 185 3.00 185 1.12 223
Shew185_1451 COG0743I 1.85 155 1.71 155 1.37 223 1.87 185 1.13 223 3.01 223
Shew185_1469 COG1218P 1.19 155 2.57 155 1.71 155 1.77 185 2.18 185 1.51 195
Shew185_1470 COG1940KG 2.42 155 1.96 195 1.60 223 3.38 195 1.47 223 4.18 195
Shew185_1483 COG0625O 1.27 155 2.63 155 1.97 155 2.70 185 1.87 185 1.78 223
Shew185_1532 COG2716E 2.80 155 2.02 195 1.75 223 3.28 195 1.11 223 4.75 195
Shew185_1557 COG3708S 2.37 185 2.70 195 1.37 223 1.17 195 3.54 223 4.07 223
Shew185_1569 COG0076E 1.56 185 1.76 155 2.67 155 14.83 185 2.63 185 2.33 195
Shew185_1603 COG0601EP 4.12 155 1.27 195 2.15 223 2.07 195 4.76 223 7.96 223
Shew185_1616 COG3923L 3.03 185 1.75 195 2.48 223 2.72 195 5.23 223 5.18 223
Shew185_1620 COG3203M 3.46 185 2.22 195 1.24 223 4.18 195 1.30 223 6.15 195


Shew185_1629 COG0810M 2.22 155 2.08 195 6.58 223 1.45 195 4.12 223 3.66 195
Shew185_1630 COG4783R 1.08 185 2.80 155 1.81 155 2.03 185 7.70 185 1.11 195
Shew185_1634 COG1670J 2.18 155 1.32 195 5.29 223 2.67 195 6.68 223 3.57 223
Shew185_1637 COG0583K 2.97 155 3.27 195 1.13 223 1.50 195 3.18 223 5.06 223
Shew185_1682 COG4974L 3.88 155 2.91 195 3.03 223 1.82 195 1.65 223 6.85 195
Shew185_1709 COG0546R 3.18 155 1.15 195 3.80 223 1.15 195 1.81 223 5.43 195
Shew185_1714 COG0332I 2.78 155 1.93 195 1.17 223 1.37 195 2.94 223 4.75 223
Shew185_1715 COG0331I 2.24 155 2.39 195 1.99 223 6.35 195 1.35 223 3.69 195
Shew185_1732 COG2199T 3.74 155 7.80 195 1.25 223 1.22 195 5.30 223 6.56 195
Shew185_1746 COG0695O 6.86 185 1.77 195 2.68 223 1.13 195 4.24 223 19.24 195
Shew185_1763 COG0259H 1.18 155 1.37 155 3.58 155 6.49 185 2.63 185 1.50 195
Shew185_1777 COG2233F 1.38 155 2.67 155 24.60 155 1.06 185 1.32 185 1.96 195
Shew185_1796 COG3201H 2.41 155 2.11 195 3.81 223 35.38 195 3.25 223 4.11 195
Shew185_1799 COG2259S 4.04 185 1.59 195 1.05 223 1.19 195 1.67 223 7.43 223

Shew185_1802 COG3751O 4.86 155 2.05 195 1.25 223 9.22 195 2.42 223 11.78 223
Shew185_1808 COG4984S 1.97 155 1.02 195 1.60 223 3.12 195 4.66 223 3.24 223
Shew185_1825 COG0626E 1.67 155 3.77 155 2.37 155 2.82 185 2.61 185 2.60 195
Shew185_1827 COG0745TK 3.78 185 2.96 195 1.55 223 2.01 195 1.15 223 6.56 223
Shew185_1835 COG3085S 1.10 185 1.48 155 4.40 155 1.82 185 2.26 185 1.18 223
Shew185_1867 COG2804NU 3.59 185 1.16 195 1.71 223 1.34 195 2.65 223 6.41 223
Shew185_1869 COG3593L 1.84 185 2.62 155 2.15 223 2.09 185 4.22 223 3.00 195
Shew185_1894 COG2894D 1.48 185 1.30 155 2.37 155 5.26 185 7.87 185 2.18 195
Shew185_1904 COG0375R 9.64 155 1.14 195 3.99 223 1.42 195 3.63 223 35.54 195
Shew185_1906 COG0409O 2.27 185 3.85 195 2.31 223 1.33 195 1.03 223 3.73 223
Shew185_1922 COG3131P 1.09 185 3.80 155 2.80 155 1.32 185 1.40 185 1.15 223
Shew185_1929 COG0729M 1.20 185 1.45 155 1.19 155 1.46 185 1.39 185 1.54 195
Shew185_1930 COG1446E 1.38 155 1.29 155 2.79 155 3.12 185 1.69 185 1.95 195


Shew185_1950 COG0840NT 1.51 155 3.06 155 2.20 155 3.30 185 4.97 185 2.24 223
Shew185_1965 COG2932K 2.26 185 2.12 195 1.04 223 1.52 195 4.25 223 3.72 223
Shew185_2012 COG2226H 1.26 185 1.82 155 1.85 155 1.64 185 2.14 185 1.76 195
Shew185_2031 COG0028EH 1.99 185 1.19 195 2.36 223 1.66 195 1.16 223 3.26 195
Shew185_2056 COG2717S 1.01 155 1.23 155 2.18 155 1.31 185 3.41 185 1.01 223
Shew185_2061 COG4659C 3.09 185 1.68 195 2.13 223 1.29 195 3.55 223 5.33 223
Shew185_2064 COG2878C 4.50 185 1.21 195 6.14 223 1.60 195 1.42 223 9.66 195
Shew185_2069 COG2842R 1.74 185 3.72 155 1.26 155 3.21 185 1.34 185 2.77 223
Shew185_2117 COG0031E 2.37 155 1.57 195 1.93 223 1.26 195 6.79 223 4.07 195
Shew185_2126 COG1309K 1.42 185 1.13 155 2.18 155 6.10 185 1.04 185 2.07 223
Shew185_2134 COG0232F 1.01 155 1.20 155 1.23 155 1.13 185 1.29 185 1.02 223
Shew185_2150 COG0022C 1.47 155 1.98 155 2.23 155 4.92 185 3.38 185 2.17 195
Shew185_2162 COG0664T 1.07 185 1.10 155 3.27 155 2.79 185 1.76 185 1.09 195
Shew185_2166 COG3198S 3.62 185 2.71 195 2.27 223 3.46 195 1.04 223 6.46 195

Shew185_2168 COG4736O 3.01 155 5.28 195 3.68 223 1.40 195 4.86 223 5.14 195
Shew185_2172 COG3437KT 3.65 155 2.86 195 4.25 223 8.17 195 3.28 223 6.51 223
Shew185_2191 COG2378K 1.78 185 2.87 155 3.65 155 10.07 185 2.04 223 2.84 223
Shew185_2211 COG0172J 3.82 185 3.08 195 3.17 223 1.05 195 1.42 223 6.64 195
Shew185_2215 COG1674D 3.93 185 4.49 195 14.18 223 1.72 195 4.52 223 7.00 223
Shew185_2229 COG3706T 1.01 155 2.38 155 1.10 155 26.53 185 29.02 185 1.01 223
Shew185_2233 COG1396K 1.21 185 2.13 155 2.47 155 2.96 185 2.03 185 1.59 223
Shew185_2236 COG4628S 1.63 185 1.29 155 5.47 155 2.24 185 5.33 185 2.52 223
Shew185_2258 COG4574R 5.87 185 1.41 195 1.18 223 24.66 195 3.12 223 15.27 195
Shew185_2264 COG3211R 1.23 185 1.69 155 1.26 155 16.16 185 1.44 185 1.64 195
Shew185_2265 COG3148S 2.75 185 2.86 195 1.85 223 8.79 195 1.32 223 4.70 195
Shew185_2298 COG3139S 1.15 185 2.13 155 1.28 155 2.63 185 3.40 185 1.36 223
Shew185_2300 COG2866E 1.58 185 3.15 155 2.00 155 14.48 185 1.60 185 2.36 195


Shew185_2319 COG2022H 1.69 185 1.39 155 1.21 155 6.61 185 2.15 185 2.67 195
Shew185_2328 COG2384R 2.66 185 1.01 195 2.86 223 4.82 195 1.01 223 4.52 195
Shew185_2331 COG2128S 2.42 185 2.10 195 3.97 223 8.45 195 2.48 223 4.22 195
Shew185_2335 COG2350S 1.14 155 3.95 155 1.10 155 4.84 185 1.33 185 1.34 195
Shew185_2337 COG3387G 1.26 155 1.12 155 1.75 155 5.70 185 2.97 185 1.75 223
Shew185_2373 COG2814G 2.74 185 1.75 195 1.89 223 4.64 195 1.31 223 4.65 195
Shew185_2386 COG1104E 1.26 185 1.38 155 1.35 155 4.12 185 3.17 185 1.75 223
Shew185_2410 COG0665E 1.42 155 1.63 155 2.40 155 3.82 185 3.00 185 2.07 223
Shew185_2413 COG1199KL 4.11 155 2.02 195 4.09 223 6.10 195 2.38 223 7.85 195
Shew185_2448 COG0144J 2.41 155 1.39 195 2.79 223 1.31 195 6.46 223 4.15 195
Shew185_2456 COG0572F 1.38 185 1.80 155 1.26 155 6.50 185 5.49 185 1.96 195
Shew185_2468 COG2838C 1.19 155 1.52 155 1.91 155 2.35 185 2.37 185 1.53 195
Shew185_2480 COG0277C 1.53 185 1.65 155 1.94 155 1.88 185 2.13 185 2.29 223
Shew185_2494 COG0745TK 1.90 185 11.24 195 5.09 223 38.19 185 2.07 223 3.14 223

Shew185_2505 COG0074C 1.99 185 6.85 195 1.50 223 3.68 195 14.02 223 3.27 223
Shew185_2507 COG0508C 2.28 155 1.14 195 3.47 223 1.78 195 11.16 223 3.81 195
Shew185_2509 COG0479C 1.05 185 2.24 155 1.14 155 1.89 185 1.01 185 1.05 223
Shew185_2515 COG3696P 4.19 155 1.11 195 7.73 223 2.85 195 2.23 223 8.31 195
Shew185_2516 COG0841V 2.21 155 1.96 195 1.91 223 3.29 195 5.86 223 3.62 195
Shew185_2533 COG1309K 1.05 155 2.40 155 2.06 155 2.03 185 1.02 185 1.05 195
Shew185_2578 COG2199T 2.04 155 1.08 195 2.30 223 2.40 195 1.63 223 3.39 223
Shew185_2582 COG0524G 1.06 155 3.32 155 4.62 155 3.69 185 1.58 185 1.06 223
Shew185_2604 COG1067O 1.99 155 1.84 195 7.35 223 3.24 195 1.02 223 3.27 223
Shew185_2616 COG0503F 2.74 155 1.17 195 3.40 223 3.54 195 2.32 223 4.65 223
Shew185_2630 COG2951M 1.72 155 2.96 155 2.80 155 2.04 185 1.42 185 2.71 223
Shew185_2646 COG0318IQ 1.13 155 2.02 155 1.39 155 2.17 185 8.06 185 1.31 195
Shew185_2652 COG0583K 1.11 185 1.92 155 1.24 155 2.85 185 2.69 185 1.22 223


Shew185_2665 COG0737F 1.39 155 5.12 155 2.27 155 2.60 185 3.31 185 1.99 223
Shew185_2684 COG1714S 1.94 155 2.82 195 1.54 223 3.43 195 5.04 223 3.18 195
Shew185_2693 COG0282C 1.89 155 1.05 195 11.61 223 7.11 185 3.71 223 3.08 223
Shew185_2725 COG0134E 2.16 155 1.35 195 2.18 223 2.10 195 3.72 223 3.56 223
Shew185_2730 COG3486Q 2.42 185 1.70 195 1.29 223 1.03 195 1.51 223 4.23 223
Shew185_2746 COG1529C 2.29 185 1.04 195 1.46 223 3.69 195 13.37 223 3.89 195
Shew185_2751 COG0491R 2.79 185 4.86 195 1.22 223 5.18 195 2.40 223 4.75 223
Shew185_2761 COG0136E 2.02 155 1.76 195 4.84 223 4.22 195 5.31 223 3.35 223
Shew185_2783 COG1721R 1.15 155 5.48 155 1.57 155 1.32 185 1.49 185 1.35 195
Shew185_2798 COG1670J 3.90 155 1.19 195 1.31 223 3.75 195 2.08 223 6.98 223
Shew185_2799 COG4935O 1.83 185 2.23 155 2.24 223 5.87 185 3.50 223 2.98 195
Shew185_2813 COG0673R 1.12 155 1.61 155 1.36 155 1.77 185 2.16 185 1.29 223
Shew185_2816 COG0835NT 2.41 185 1.30 195 2.27 223 1.77 195 2.81 223 4.13 223

Shew185_2836 COG1946I 1.40 155 1.18 155 1.68 155 2.43 185 2.35 185 2.01 223
Shew185_2843 COG2076P 1.07 155 3.12 155 1.43 155 3.81 185 2.64 185 1.09 223
Shew185_2856 COG0730R 1.21 185 3.77 155 1.53 155 2.65 185 1.72 185 1.59 195
Shew185_2873 COG0442J 1.14 185 1.87 155 1.58 155 4.51 185 1.30 185 1.33 223
Shew185_2898 COG1898M 1.78 185 2.25 155 1.81 155 1.64 185 5.24 223 2.85 195
Shew185_2901 COG2148M 4.88 185 2.60 195 3.99 223 3.53 195 2.18 223 12.12 223
Shew185_2904 COG0250K 1.06 185 1.23 155 1.92 155 1.97 185 1.67 185 1.07 195
Shew185_2906 COG2204T 1.80 185 3.15 155 1.07 155 1.21 185 1.46 223 2.89 223
Shew185_2911 COG0835NT 1.29 185 2.46 155 1.43 155 2.25 185 2.86 185 1.80 223
Shew185_2936 COG1766NU 1.15 155 2.07 155 1.54 155 8.75 185 30.84 185 1.36 223
Shew185_2937 COG1677NU 5.57 155 4.09 195 2.77 223 5.62 195 1.87 223 14.19 195
Shew185_2943 COG1345N 3.99 185 1.73 195 1.49 223 1.31 195 2.50 223 7.28 223
Shew185_2951 COG1344N 1.95 155 1.21 195 3.94 223 1.12 195 4.66 223 3.22 223
Shew185_2955 COG1706N 2.24 185 5.70 195 8.74 223 4.13 195 1.53 223 3.70 195


Shew185_2957 COG4786N 3.45 155 3.59 195 1.53 223 2.57 195 1.16 223 6.12 223
Shew185_2980 COG2604S 1.06 155 3.07 155 2.49 155 15.53 185 14.74 185 1.06 195
Shew185_2994 COG0046F 1.13 155 5.13 155 1.37 155 16.86 185 12.72 185 1.32 223
Shew185_3015 COG1246E 4.80 155 1.57 195 2.64 223 3.67 195 5.20 223 11.74 195
Shew185_3054 COG2924CO 1.99 155 3.39 195 4.51 223 7.16 195 1.03 223 3.28 195
Shew185_3063 COG2230M 2.63 185 3.49 195 2.71 223 1.01 195 3.72 223 4.51 195
Shew185_3093 COG3325G 1.11 155 1.39 155 3.18 155 1.04 185 1.53 185 1.22 195
Shew185_3098 COG0513LKJ 3.61 185 1.63 195 1.43 223 1.72 195 1.55 223 6.45 223
Shew185_3146 COG0642T 2.59 155 1.59 195 1.53 223 1.89 195 2.39 223 4.44 195
Shew185_3154 COG0781K 3.21 155 3.64 195 7.68 223 2.36 195 3.86 223 5.57 195
Shew185_3169 COG0845M 2.42 185 7.60 195 1.89 223 2.35 195 3.40 223 4.16 195
Shew185_3175 COG1432S 1.59 155 1.87 155 1.66 155 10.07 185 1.49 185 2.36 223
Shew185_3182 COG1309K 1.51 185 1.06 155 2.14 155 2.85 185 2.38 185 2.24 223

Shew185_3186 COG0583K 1.46 185 1.60 155 1.84 155 7.26 185 7.15 185 2.16 223
Shew185_3189 COG0824R 1.15 155 1.37 155 6.40 155 3.97 185 12.12 185 1.35 223
Shew185_3191 COG1639T 6.77 155 2.17 195 2.89 223 3.20 195 1.44 223 17.77 223
Shew185_3192 COG0826O 2.31 155 1.16 195 7.81 223 8.06 195 3.40 223 3.96 195
Shew185_3194 COG1404O 2.14 185 12.23 195 1.74 223 3.05 195 1.20 223 3.54 195
Shew185_3197 COG0826O 1.32 185 1.19 155 3.43 155 5.88 185 1.73 185 1.84 223
Shew185_3212 COG3005C 3.19 185 2.09 195 2.94 223 3.40 195 4.79 223 5.47 195
Shew185_3235 COG4108J 1.59 185 1.03 155 1.69 155 1.00 185 1.44 185 2.36 195
Shew185_3244 COG0745TK 2.99 185 2.01 195 1.35 223 10.14 195 2.81 223 5.13 195
Shew185_3289 COG0293J 1.61 185 4.04 155 1.24 155 5.45 185 1.77 185 2.43 195
Shew185_3295 COG5266P 2.54 155 1.82 195 2.56 223 1.37 195 2.50 223 4.33 223
Shew185_3296 COG3656S 4.23 155 2.40 195 1.30 223 5.87 195 3.45 223 8.53 223
Shew185_3313 COG1466L 5.06 155 2.50 195 4.03 223 16.34 195 3.32 223 12.62 223
Shew185_3342 COG4605P 1.69 155 2.44 155 1.92 155 7.59 185 1.35 185 2.65 223


Shew185_3389 COG0120G 1.34 155 2.50 155 4.20 155 5.40 185 2.45 185 1.85 195
Shew185_3408 COG0745TK 1.40 155 2.09 155 1.20 155 7.67 185 3.28 185 2.01 195
Shew185_3416 COG0263E 1.62 155 1.02 155 2.20 155 5.36 185 6.30 185 2.52 195
Shew185_3436 COG2869C 2.19 185 1.63 195 1.37 223 2.66 195 4.75 223 3.57 223
Shew185_3439 COG4772P 1.70 185 5.33 155 1.04 155 1.94 185 1.80 185 2.69 223
Shew185_3448 COG1167KE 2.22 155 1.33 195 2.05 223 2.42 195 3.35 223 3.67 195
Shew185_3485 COG2207K 3.47 185 1.89 195 3.13 223 5.33 195 1.95 223 6.25 223
Shew185_3486 COG0518F 3.53 155 5.63 195 1.15 223 2.28 195 1.44 223 6.32 223
Shew185_3487 COG3131P 1.74 155 4.99 155 1.11 155 5.03 185 11.26 185 2.78 195
Shew185_3517 COG0450O 1.25 155 1.47 155 2.93 155 6.77 185 1.07 185 1.72 223
Shew185_3523 COG1651O 1.67 155 2.37 155 1.39 155 1.02 185 4.78 185 2.64 223
Shew185_3525 COG1114E 1.18 185 1.47 155 1.90 155 19.41 185 1.06 185 1.47 195
Shew185_3565 COG0667C 1.46 185 1.80 155 1.00 155 1.11 185 2.21 185 2.16 223

Shew185_3566 COG3713M 2.25 155 2.43 195 3.79 223 4.51 195 3.19 223 3.71 223
Shew185_3569 COG0745TK 2.12 155 2.51 195 3.27 223 3.29 195 2.91 223 3.51 223
Shew185_3576 COG2166R 1.20 185 3.65 155 4.35 155 1.96 185 1.53 185 1.54 223
Shew185_3594 COG3000I 3.37 155 4.72 195 1.59 223 2.27 195 2.52 223 6.00 223
Shew185_3605 COG0004P 3.24 155 1.38 195 1.97 223 3.47 195 1.53 223 5.74 223
Shew185_3612 COG0456R 3.83 185 1.61 195 1.37 223 9.26 195 3.58 223 6.64 223
Shew185_3646 COG5184DZ 2.58 155 2.77 195 2.26 223 9.60 195 1.70 223 4.44 195
Shew185_3658 COG0682M 2.16 185 2.50 195 7.44 223 6.71 195 1.99 223 3.56 195
Shew185_3663 COG2199T 1.87 155 1.96 195 1.58 223 1.58 185 2.74 223 3.05 195
Shew185_3708 COG0347E 1.16 155 3.37 155 1.96 155 3.35 185 3.16 185 1.41 223
Shew185_3732 COG4566T 2.78 185 2.95 195 3.16 223 1.08 195 2.53 223 4.73 195
Shew185_3748 COG3212S 4.29 185 3.45 195 2.56 223 2.41 195 13.16 223 8.82 195
Shew185_3752 COG1012C 3.22 155 2.79 195 1.25 223 1.46 195 1.95 223 5.65 223
Shew185_3776 COG0688I 4.31 155 4.07 195 3.27 223 6.05 195 4.23 223 8.85 195


Shew185_3779 COG0584C 1.83 185 1.34 155 1.50 223 7.56 185 1.71 223 2.96 195
Shew185_3781 COG0111HE 1.67 185 4.03 155 2.78 155 1.82 185 12.49 185 2.64 223
Shew185_3786 COG0515RTKL 3.02 155 2.41 195 1.68 223 2.54 195 1.35 223 5.16 195
Shew185_3793 COG2204T 1.86 185 6.91 155 3.83 223 1.75 185 1.62 223 3.02 223
Shew185_3795 COG3076S 3.50 155 3.57 195 3.58 223 1.74 195 4.28 223 6.30 223
Shew185_3803 COG0668M 1.09 185 1.47 155 1.52 155 1.43 185 1.32 185 1.14 195
Shew185_3836 COG4977K 2.28 185 1.82 195 1.28 223 12.16 195 1.55 223 3.88 223
Shew185_3854 COG0757E 5.52 155 1.13 195 5.94 223 1.55 195 23.78 223 14.15 195
Shew185_3866 COG3590O 3.09 185 1.22 195 4.47 223 10.55 195 5.49 223 5.34 223
Shew185_3872 COG3301P 2.27 185 2.29 195 4.48 223 7.16 195 9.70 223 3.73 223
Shew185_3891 COG0382H 8.74 155 3.35 195 1.42 223 2.40 195 1.36 223 32.20 223
Shew185_3896 COG2321R 1.71 155 6.98 155 3.13 155 3.43 185 3.49 185 2.70 223
Shew185_3906 COG3118O 2.52 155 1.23 195 4.46 223 4.75 195 1.40 223 4.32 223
Shew185_3907 COG0471P 3.26 185 4.16 195 4.72 223 2.84 195 7.31 223 5.81 223
Shew185_3948 COG1051F 2.88 185 7.45 195 3.45 223 5.93 195 1.01 223 4.86 223
Shew185_3958 COG1359S 4.47 155 2.37 195 1.86 223 3.79 195 5.96 223 9.25 223
Shew185_3967 COG2846D 2.37 155 2.46 195 1.06 223 1.28 195 2.33 223 4.07 223
Shew185_3984 COG1566V 1.36 185 1.69 155 2.35 155 1.65 185 3.03 185 1.89 223
Shew185_4017 COG0204I 1.25 155 2.29 155 1.38 155 2.45 185 1.38 185 1.75 195
Shew185_4035 COG1651O 2.40 155 26.03 195 5.57 223 5.07 195 3.26 223 4.08 195
Shew185_4050 COG2207K 2.74 185 1.73 195 1.38 223 5.09 195 1.98 223 4.67 195
Shew185_4058 COG1609K 1.31 155 3.16 155 1.96 155 2.40 185 5.59 185 1.82 195
Shew185_4081 COG0338L 3.24 185 2.50 195 1.07 223 4.14 195 7.45 223 5.69 223
Shew185_4100 COG1526C 2.45 185 15.25 195 1.08 223 4.31 195 1.09 223 4.25 223
Shew185_4143 COG0790R 3.08 155 5.23 195 2.14 223 2.45 195 1.17 223 5.24 223
Shew185_4150 COG1522K 1.27 185 1.59 155 1.46 155 7.82 185 3.20 185 1.78 195
Shew185_4165 COG0742L 4.23 185 1.44 195 2.59 223 3.01 195 24.87 223 8.55 195


Shew185_4195 COG0642T 1.74 155 4.37 155 1.27 155 1.74 185 1.77 185 2.78 223
Shew185_4196 COG0745TK 3.08 185 1.15 195 2.62 223 2.75 195 6.72 223 5.28 223
Shew185_4202 COG2183K 1.89 155 1.35 195 2.59 223 3.46 185 1.37 223 3.10 195
Shew185_4229 COG0110R 2.07 155 2.40 195 1.11 223 2.71 195 3.96 223 3.42 223
Shew185_4275 COG1404O 1.03 185 1.80 155 2.66 155 3.33 185 2.86 185 1.03 195
Shew185_4308 COG0619P 4.60 155 3.13 195 1.45 223 1.21 195 5.14 223 10.28 195
Shew185_4337 COG0478T 1.82 185 1.19 155 2.14 223 1.73 185 2.41 223 2.95 223
Shew185_4346 COG0218R 5.61 155 1.45 195 4.46 223 4.25 195 5.78 223 14.38 195
Shew185_4349 COG3078S 1.06 155 1.32 155 1.01 155 1.31 185 1.53 185 1.07 223
Shew185_4352 COG1816F 1.88 185 2.69 195 3.32 223 5.01 185 3.42 223 3.07 195
Shew185_4360 COG0449M 3.32 155 4.02 195 1.52 223 2.08 195 1.54 223 5.94 195
Shew185_4364 COG0355C 2.84 185 7.17 195 1.37 223 9.45 195 3.97 223 4.79 195
Shew185_4368 COG0712C 1.76 155 3.99 155 1.43 155 1.72 185 1.70 185 2.81 195
Shew185_4373 COG1475K 2.67 185 3.81 195 1.09 223 3.88 195 2.93 223 4.53 223
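
Each row of Table B.1 follows one fixed pattern: a locus tag, a COG designation, and then six pairs of fold change and higher-expressing strain, in the comparison order given by the column headers. A minimal Python sketch of reading one such row is given below for readers who wish to work with the table programmatically. The names `COMPARISONS`, `Comparison`, `Row`, and `parse_row` are hypothetical and introduced here for illustration only; they are not part of GeneSpring GX or of the analysis performed in this work.

```python
from typing import List, NamedTuple, Tuple

# Column order of the six strain comparisons, as given in the Table B.1 header.
COMPARISONS: List[Tuple[str, str]] = [
    ("OS155", "OS185"), ("OS155", "OS195"), ("OS155", "OS223"),
    ("OS185", "OS195"), ("OS185", "OS223"), ("OS195", "OS223"),
]


class Comparison(NamedTuple):
    pair: Tuple[str, str]   # the two strains being compared
    fold_change: float      # Fold ∆ reported in the table
    higher_strain: str      # strain listed as having higher expression


class Row(NamedTuple):
    locus_tag: str
    cog: str
    comparisons: List[Comparison]


def parse_row(line: str) -> Row:
    """Parse one whitespace-separated row of Table B.1 (hypothetical helper)."""
    tokens = line.split()
    expected = 2 + 2 * len(COMPARISONS)  # locus tag, COG, six (fold, strain) pairs
    if len(tokens) != expected:
        raise ValueError(f"expected {expected} fields, found {len(tokens)}")
    comparisons = []
    for i, pair in enumerate(COMPARISONS):
        fold = float(tokens[2 + 2 * i])
        higher = "OS" + tokens[3 + 2 * i]  # the table lists strains without the OS prefix
        comparisons.append(Comparison(pair, fold, higher))
    return Row(tokens[0], tokens[1], comparisons)


# Usage: find the comparison with the largest fold change for one gene.
row = parse_row(
    "Shew185_0001 COG0593L 1.60 185 2.11 155 1.86 155 3.60 185 1.99 185 2.37 223"
)
top = max(row.comparisons, key=lambda c: c.fold_change)
print(row.locus_tag, top.pair, f"{top.fold_change:.2f}", top.higher_strain)
```

The sketch only restructures the values as printed in the table; it does not recompute the ANOVA or the fold changes.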

Table B.2 COG descriptions per category.

COG Group - Description
A - RNA processing and modification
C - Energy production and conversion
D - Cell cycle control, chromosome partitioning
E - Amino acid transport and metabolism
F - Nucleotide transport and metabolism
G - Carbohydrate transport and metabolism
H - Coenzyme transport and metabolism
I - Lipid transport and metabolism
J - Translation, ribosomal structure and biogenesis
K - Transcription
L - Replication, recombination and repair
M - Cell wall, membrane, and envelope biogenesis
N - Cellular motility
O - Posttranslational modification
P - Inorganic ion transport and metabolism
Q - Secondary metabolites biosynthesis, transport and metabolism
R - General function prediction only
S - Unknown function
T - Signal transduction mechanisms
U - Intracellular trafficking, secretion, and vesicular transport
V - Defense mechanisms
Z - Cytoskeleton
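
The COG column of Table B.1 appears to combine a COG identifier with one or more of the group letters listed in Table B.2 (for example, COG0526OC would carry the groups O and C). Under that reading, which is an assumption drawn from the two tables rather than a stated rule, the designation can be split as in the following Python sketch. The names `COG_GROUPS` and `split_cog` are hypothetical.

```python
import re
from typing import List, Tuple

# Group letters and descriptions copied from Table B.2.
COG_GROUPS = {
    "A": "RNA processing and modification",
    "C": "Energy production and conversion",
    "D": "Cell cycle control, chromosome partitioning",
    "E": "Amino acid transport and metabolism",
    "F": "Nucleotide transport and metabolism",
    "G": "Carbohydrate transport and metabolism",
    "H": "Coenzyme transport and metabolism",
    "I": "Lipid transport and metabolism",
    "J": "Translation, ribosomal structure and biogenesis",
    "K": "Transcription",
    "L": "Replication, recombination and repair",
    "M": "Cell wall, membrane, and envelope biogenesis",
    "N": "Cellular motility",
    "O": "Posttranslational modification",
    "P": "Inorganic ion transport and metabolism",
    "Q": "Secondary metabolites biosynthesis, transport and metabolism",
    "R": "General function prediction only",
    "S": "Unknown function",
    "T": "Signal transduction mechanisms",
    "U": "Intracellular trafficking, secretion, and vesicular transport",
    "V": "Defense mechanisms",
    "Z": "Cytoskeleton",
}


def split_cog(designation: str) -> Tuple[str, List[str]]:
    """Split e.g. 'COG0526OC' into ('COG0526', [descriptions for O and C]).

    Assumes the format 'COG' + four digits + one or more group letters.
    """
    match = re.fullmatch(r"(COG\d{4})([A-Z]+)", designation)
    if match is None:
        raise ValueError(f"unrecognized COG designation: {designation!r}")
    cog_id, letters = match.groups()
    unknown = [letter for letter in letters if letter not in COG_GROUPS]
    if unknown:
        raise ValueError(f"group letters not in Table B.2: {unknown}")
    return cog_id, [COG_GROUPS[letter] for letter in letters]


# Usage: a designation that carries two group letters.
cog_id, groups = split_cog("COG0526OC")
print(cog_id, groups)
```

Designations with several letters, such as COG0515RTKL, return one description per letter in the order printed in the table.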


REFERENCES

1. Derby H., H.B., Bacteriology of Butter. IV. Bacteriological studies of surface taint butter. Iowa Agric. Exp. Stn. Res. Bull., 1931. 145: p. 387-416. 2. Macdonell, M.T. and R.R. Colwell, Phylogeny of the Vibrionaceae, and Recommendation for 2 New Genera, Listonella and Shewanella. Systematic and Applied Microbiology, 1985. 6(2): p. 171-182. 3. Fredrickson, J.K., et al., Towards environmental systems biology of Shewanella. Nature Reviews Microbiology, 2008. 6(8): p. 592-603. 4. Hau, H.H. and J.A. Gralnick, Ecology and biotechnology of the genus Shewanella. Annual Review of Microbiology, 2007. 61: p. 237-258. 5. Debois, J., et al., Pseudomonas putrefaciens as a cause of infection in humans. Journal of Clinical Pathology, 1975. 28(12): p. 993-6. 6. Degreef, H., J. Debois, and J. Vandepitte, Pseudomonas putrefaciens as a cause of infection of venous ulcers. Dermatologica, 1975. 151(5): p. 296-301. 7. Holmes, B., S.P. Lapage, and H. Malnick, Strains of Pseudomonas-Putrefaciens from Clinical Material. Journal of Clinical Pathology, 1975. 28(2): p. 149-155. 8. Venkateswaran, K., et al., Polyphasic taxonomy of the genus Shewanella and description of Shewanella oneidensis sp. nov. International Journal of Systematic Bacteriology, 1999. 49: p. 705-724. 9. Ziemke, F., et al., Reclassification of Shewanella putrefaciens Owen's genomic group II as Shewanella baltica sp. nov. International Journal of Systematic Bacteriology, 1998. 48: p. 179-186. 10. Caccavo, F., R.P. Blakemore, and D.R. Lovley, A Hydrogen-Oxidizing, Fe(Iii)-Reducing Microorganism from the Great Bay Estuary, New-Hampshire. Applied and Environmental Microbiology, 1992. 58(10): p. 3211-3216. 11. Wildung, R.E., et al., Effect of electron donor and solution chemistry on products of dissimilatory reduction of technetium by Shewanella putrefaciens. Applied and Environmental Microbiology, 2000. 66(6): p. 2451-2460. 12. Lloyd, J.R., P. Yong, and L.E. 
Macaskie, Biological reduction and removal of Np(V) by two microorganisms. Environmental Science & Technology, 2000. 34(7): p. 1297-1301. 13. Boukhalfa, H., et al., Plutonium(IV) reduction by the metal-reducing bacteria Geobacter metallireducens GS15 and Shewanella oneidensis MR1. Applied and Environmental Microbiology, 2007. 73(18): p. 5897-5903. 14. Nealson KH, S.J., ed. Ecophysiology of the genus Shewanella. The Prokaryotes, ed. M. Dworkin. 2005, Springer-Verlag: New York. 15. Lanthier, M., K.B. Gregory, and D.R. Lovley, Growth with high planktonic biomass in Shewanella oneidensis fuel cells. FEMS Microbiol Lett, 2008. 278(1): p. 29-35. 16. Ringeisen, B.R., et al., High power density from a miniature microbial fuel cell using Shewanella oneidensis DSP10. Environmental Science & Technology, 2006. 40(8): p. 2629-34. 17. Konstantinidis, K.T., et al., Comparative systems biology across an evolutionary gradient within the Shewanella genus. Proceedings of the National Academy of Sciences of the United States of America, 2009. 106(37): p. 15909-15914. 18. Caro-Quintero, A., et al., Unprecedented levels of horizontal gene transfer among spatially co-occurring Shewanella bacteria from the Baltic Sea. ISME J, 2010. 69

19. Brettar, I., E.R. Moore, and M.G. Hofle, Phylogeny and Abundance of Novel Denitrifying Bacteria Isolated from the Water Column of the Central Baltic Sea. Microb Ecol, 2001. 42(3): p. 295-305. 20. Bonsdorff, E., C. Ronnberg, and K. Aarnio, Some ecological properties in relation to eutrophication in the Baltic Sea. Hydrobiologia, 2002. 475(1): p. 371-377. 21. Ronnberg, C. and E. Bonsdorff, Baltic Sea eutrophication: area-specific ecological consequences. Hydrobiologia, 2004. 514(1-3): p. 227-241. 22. Wulff, F., A. Stigebrandt, and L. Rahm, Nutrient Dynamics of the Baltic Sea. Ambio, 1990. 19(3): p. 126-133. 23. Majewski, J., et al., Barriers to genetic exchange between bacterial species: Streptococcus pneumoniae transformation. Journal of Bacteriology, 2000. 182(4): p. 1016-1023. 24. Vos, M., Why do bacteria engage in homologous recombination? Trends in Microbiology, 2009. 17(6): p. 226-232. 25. Gogarten, J.P. and J.P. Townsend, Horizontal gene transfer, genome innovation and evolution. Nature Reviews Microbiology, 2005. 3(9): p. 679-687. 26. Narra, H.P. and H. Ochman, Of what use is sex to bacteria? Current Biology, 2006. 16(17): p. R705-R710. 27. Redfield, R.J., Do bacteria have sex? Nature Reviews Genetics, 2001. 2(8): p. 634-639. 28. Ochman, H., J.G. Lawrence, and E.A. Groisman, Lateral gene transfer and the nature of bacterial innovation. Nature, 2000. 405(6784): p. 299-304. 29. Fraser, C., W.P. Hanage, and B.G. Spratt, Recombination and the nature of bacterial speciation. Science, 2007. 315(5811): p. 476-480. 30. Hacker, J. and E. Carniel, Ecological fitness, genomic islands and bacterial pathogenicity. A Darwinian view of the evolution of microbes. EMBO Rep, 2001. 2(5): p. 376-81. 31. Schmidt, H. and M. Hensel, Pathogenicity islands in bacterial pathogenesis. Clin Microbiol Rev, 2004. 17(1): p. 14-56. 32. 
Miller, S.R., et al., Discovery of a free-living chlorophyll d-producing cyanobacterium with a hybrid proteobacterial/cyanobacterial small-subunit rRNA gene. Proceedings of the National Academy of Sciences of the United States of America, 2005. 102(3): p. 850-855. 33. Mayr, E., The biological meaning of species. Biol.J. Liizn. Soc., 1969(1): p. 311-320. 34. Doolittle, W.F. and O. Zhaxybayeva, On the origin of prokaryotic species. Genome Res, 2009. 19(5): p. 744-56. 35. Fraser, C., et al., The Bacterial Species Challenge: Making Sense of Genetic and Ecological Diversity. Science, 2009. 323(5915): p. 741-746. 36. Konstantinidis, K.T., A. Ramette, and J.M. Tiedje, The bacterial species definition in the genomic era. Philosophical Transactions of the Royal Society B-Biological Sciences, 2006. 361(1475): p. 1929-1940. 37. Wayne, L.G., et al., Report of the Ad-Hoc-Committee on Reconciliation of Approaches to Bacterial Systematics. International Journal of Systematic Bacteriology, 1987. 37(4): p. 463-464. 38. Maiden, M.C., et al., Multilocus sequence typing: a portable approach to the identification of clones within populations of pathogenic microorganisms. Proc Natl Acad Sci U S A, 1998. 95(6): p. 3140-5. 39. Woese, C.R., Bacterial evolution. Microbiological Reviews, 1987. 51(2): p. 221-71. 40. Feil, E.J., et al., Recombination within natural populations of pathogenic bacteria: Short- term empirical estimates and long-term phylogenetic consequences. Proceedings of the National Academy of Sciences of the United States of America, 2001. 98(1): p. 182- 187.

70

41. Stackebrandt, E. and J. Ebers, Taxonomic parameters revisited: tarnished gold standards. Microbiology Today, 2006(33): p. 152-155.
42. Konstantinidis, K.T. and J.M. Tiedje, Genomic insights that advance the species definition for prokaryotes. Proceedings of the National Academy of Sciences of the United States of America, 2005. 102(7): p. 2567-2572.
43. Welch, R.A., et al., Extensive mosaic structure revealed by the complete genome sequence of uropathogenic Escherichia coli. Proc Natl Acad Sci U S A, 2002. 99(26): p. 17020-4.
44. Konstantinidis, K.T. and J.M. Tiedje, Prokaryotic taxonomy and phylogeny in the genomic era: advancements and challenges ahead. Curr Opin Microbiol, 2007. 10(5): p. 504-9.
45. Moran, N.A., Microbial minimalism: Genome reduction in bacterial pathogens. Cell, 2002. 108(5): p. 583-586.
46. Denef, V.J., et al., Proteogenomic basis for ecological divergence of closely related bacteria in natural acidophilic microbial communities. Proceedings of the National Academy of Sciences of the United States of America, 2010. 107(6): p. 2383-2390.
47. Connor, N., et al., Ecology of Speciation in the Genus Bacillus. Applied and Environmental Microbiology, 2010. 76(5): p. 1349-1358.
48. Koeppel, A., et al., Identifying the fundamental units of bacterial diversity: a paradigm shift to incorporate ecology into bacterial systematics. Proc Natl Acad Sci U S A, 2008. 105(7): p. 2504-9.
49. Cohan, F.M., Towards a conceptual and operational union of bacterial systematics, ecology, and evolution. Philosophical Transactions of the Royal Society B-Biological Sciences, 2006. 361(1475): p. 1985-1996.
50. Reed, H.E. and J.B.H. Martiny, Testing the functional significance of microbial composition in natural communities. FEMS Microbiology Ecology, 2007. 62(2): p. 161-170.
51. Woodcock, S., et al., Neutral assembly of bacterial communities. FEMS Microbiology Ecology, 2007. 62(2): p. 171-180.
52. Becking, L.G.M.B., Geobiologie of Inleiding Tot de Milieukunde, 1934(Van Stockkum & Zoon, The Hague).
53. Glockner, F.O., B.M. Fuchs, and R. Amann, Bacterioplankton compositions of lakes and oceans: a first comparison based on fluorescence in situ hybridization. Applied and Environmental Microbiology, 1999. 65(8): p. 3721-3726.
54. Hubbell, S.P., ed. The Unified Neutral Theory of Biodiversity and Biogeography. 2001, Princeton University Press: Princeton, N.J.
55. Curtis, T.P. and W.T. Sloan, Prokaryotic diversity and its limits: microbial community structure in nature and implications for microbial ecology. Curr Opin Microbiol, 2004. 7(3): p. 221-6.
56. Oleksiak, M.F., G.A. Churchill, and D.L. Crawford, Variation in gene expression within and among natural populations. Nature Genetics, 2002. 32(2): p. 261-266.
57. Langenheder, S., E.S. Lindstrom, and L.J. Tranvik, Structure and function of bacterial communities emerging from different sources under identical conditions. Applied and Environmental Microbiology, 2006. 72(1): p. 212-220.
58. Hunt, D.E., et al., Resource partitioning and sympatric differentiation among closely related bacterioplankton. Science, 2008. 320(5879): p. 1081-1085.
59. King, M.C. and A.C. Wilson, Evolution at Two Levels in Humans and Chimpanzees. Science, 1975. 188(4184): p. 107-116.
60. Enard, W., et al., Intra- and interspecific variation in primate gene expression patterns. Science, 2002. 296(5566): p. 340-3.
61. Gilad, Y., et al., Expression profiling in primates reveals a rapid evolution of human transcription factors. Nature, 2006. 440(7081): p. 242-5.

62. Tirosh, I., N. Barkai, and K.J. Verstrepen, Promoter architecture and the evolvability of gene expression. J Biol, 2009. 8(11): p. 95.
63. Carroll, S.B., Evolution at two levels: On genes and form. PLoS Biology, 2005. 3(7): p. 1159-1166.
64. Denver, D.R., et al., The transcriptional consequences of mutation and natural selection in Caenorhabditis elegans. Nature Genetics, 2005. 37(5): p. 544-548.
65. Wray, G.A., The evolutionary significance of cis-regulatory mutations. Nature Reviews Genetics, 2007. 8(3): p. 206-16.
66. Perez, J.C. and E.A. Groisman, Transcription factor function and promoter architecture govern the evolution of bacterial regulons. Proceedings of the National Academy of Sciences of the United States of America, 2009. 106(11): p. 4319-4324.
67. Lintner, R.E., et al., Limited functional conservation of a global regulator among related bacterial genera: Lrp in Escherichia, Proteus and Vibrio. BMC Microbiology, 2008. 8: p. -.
68. Susanna, K.A., et al., A single, specific thymine mutation in the ComK-Binding site severely decreases binding and transcription activation by the competence transcription factor ComK of Bacillus subtilis. Journal of Bacteriology, 2007. 189(13): p. 4718-4728.
69. van Hijum, S.A.F.T., M.H. Medema, and O.P. Kuipers, Mechanisms and Evolution of Control Logic in Prokaryotic Transcriptional Regulation. Microbiology and Molecular Biology Reviews, 2009. 73(3): p. 481-+.
70. Babu, M.M., S.A. Teichmann, and L. Aravind, Evolutionary dynamics of prokaryotic transcriptional regulatory networks. Journal of Molecular Biology, 2006. 358(2): p. 614-633.
71. Tuch, B.B., H. Li, and A.D. Johnson, Evolution of eukaryotic transcription circuits. Science, 2008. 319(5871): p. 1797-1799.
72. Hendriksen, W.T., et al., Regulation of gene expression in Streptococcus pneumoniae by response regulator 09 is strain dependent. Journal of Bacteriology, 2007. 189(4): p. 1382-1389.
73. Lemos, B., et al., Rates of divergence in gene expression profiles of primates, mice, and flies: Stabilizing selection and variability among functional categories. Evolution, 2005. 59(1): p. 126-137.
74. Perez, J.C. and E.A. Groisman, Evolution of transcriptional regulatory circuits in bacteria. Cell, 2009. 138(2): p. 233-44.
75. Crawford, D.L. and M.F. Oleksiak, The biological importance of measuring individual variation. Journal of Experimental Biology, 2007. 210(9): p. 1613-1621.
76. Oleksiak, M.F., J.L. Roach, and D.L. Crawford, Natural variation in cardiac metabolism and gene expression in Fundulus heteroclitus. Nature Genetics, 2005. 37(1): p. 67-72.
77. McInerney, J.O., J.A. Cotton, and D. Pisani, The prokaryotic tree of life: past, present ... and future? Trends in Ecology & Evolution, 2008. 23(5): p. 276-281.
78. Cohan, F.M., The Effects of Rare but Promiscuous Genetic Exchange on Evolutionary Divergence in Prokaryotes. American Naturalist, 1994. 143(6): p. 965-986.
79. Green, J.L., B.J. Bohannan, and R.J. Whitaker, Microbial biogeography: from taxonomy to traits. Science, 2008. 320(5879): p. 1039-43.
80. Robert S. Burlage, R.A., David Stahl, Gill Geesey, Gary Sayler ed. Techniques in Microbial Ecology. 1998, Oxford University Press: New York.
81. de Queiroz, K., Ernst Mayr and the modern concept of species. Proc Natl Acad Sci U S A, 2005. 102 Suppl 1: p. 6600-7.
82. Larkin, M.A., et al., Clustal W and clustal X version 2.0. Bioinformatics, 2007. 23(21): p. 2947-2948.
83. Cohan, F.M. and E.B. Perry, A systematics for discovering the fundamental units of bacterial diversity. Current Biology, 2007. 17(10): p. R373-R386.


84. Gao, X.L., et al., Oligonucleotide synthesis using solution photogenerated acids. Journal of the American Chemical Society, 1998. 120(48): p. 12698-12699.
85. Tusher, V.G., R. Tibshirani, and G. Chu, Significance analysis of microarrays applied to the ionizing radiation response (vol 98, pg 5116, 2001). Proceedings of the National Academy of Sciences of the United States of America, 2001. 98(18): p. 10515-10515.
86. Brazma, A., et al., Minimum information about a microarray experiment (MIAME) - toward standards for microarray data. Nature Genetics, 2001. 29(4): p. 365-371.
87. Tabor, C.W. and H. Tabor, Polyamines in Microorganisms. Microbiological Reviews, 1985. 49(1): p. 81-99.
88. Cohan, F.M. and A.F. Koeppel, The Origins of Ecological Diversity in Prokaryotes. Current Biology, 2008. 18(21): p. R1024-U17.
89. Turesson, G., The Species and the Variety as Ecological Units. Hereditas, 1922. 3: p. 100-113.
90. Graneli, E., et al., Nutrient Limitation of Primary Production in the Baltic Sea Area. Ambio, 1990. 19(3): p. 142-151.
91. Ramette, A. and J.M. Tiedje, Biogeography: An emerging cornerstone for understanding prokaryotic diversity, ecology, and evolution. Microbial Ecology, 2007. 53(2): p. 197-207.
92. Ziemke, F., I. Brettar, and M.G. Hofle, Stability and diversity of the genetic structure of a Shewanella putrefaciens population in the water column of the central Baltic. Aquatic Microbial Ecology, 1997. 13(1): p. 63-74.
93. Taniguchi, Y., et al., Quantifying E. coli Proteome and Transcriptome with Single-Molecule Sensitivity in Single Cells. Science, 2010. 329(5991): p. 533-538.
94. Tyagi, S., GENOMICS E. coli, What a Noisy Bug. Science, 2010. 329(5991): p. 518-519.
95. Hershberg, R. and H. Margalit, Co-evolution of transcription factors and their targets depends on mode of regulation. Genome Biology, 2006. 7(7): p. -.
96. Quackenbush, J., Microarray data normalization and transformation. Nature Genetics, 2002. 32 Suppl: p. 496-501.
97. Benjamini, Y. and Y. Hochberg, Controlling the False Discovery Rate - a Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society Series B-Methodological, 1995. 57(1): p. 289-300.
98. Cui, X.Q. and G.A. Churchill, Statistical tests for differential expression in cDNA microarray experiments. Genome Biology, 2003. 4(4): p. -.
99. Whitehead, A. and D.L. Crawford, Variation in tissue-specific gene expression among natural populations. Genome Biology, 2005. 6(2): p. R13.
100. Whitehead, A. and D.L. Crawford, Neutral and adaptive variation in gene expression. Proc Natl Acad Sci U S A, 2006. 103(14): p. 5425-30.


BIOGRAPHICAL INFORMATION

Sealy Hambright was born in Tyler, Texas, and graduated from Whitehouse High School in 2002. He subsequently attended the University of Texas at Tyler, graduating in 2006 with a Bachelor of Science in Biology and a minor in Chemistry. In the fall of 2010 he graduated from the University of Texas at Arlington with a Master of Science in Biology with a focus in Microbiology. His work consisted of elucidating expressional divergence among closely related Shewanella baltica ecotypes. Following this work, he will seek to obtain a Ph.D. in the general field of microbiology. It is also universally accepted that Sealy Hambright is, in fact, the man, and should be referred to as such.
