The psychrophilic lifestyle as revealed by the genome sequence of Colwellia psychrerythraea 34H through genomic and proteomic analyses

Barbara A. Methé*†, Karen E. Nelson*, Jody W. Deming‡§, Bahram Momen¶, Eugene Melamud‖, Xijun Zhang‖, John Moult‖, Ramana Madupu*, William C. Nelson*, Robert J. Dodson*, Lauren M. Brinkac*, Sean C. Daugherty*, Anthony S. Durkin*, Robert T. DeBoy*, James F. Kolonay*, Steven A. Sullivan*, Liwei Zhou*, Tanja M. Davidsen*, Martin Wu*, Adrienne L. Huston**, Matthew Lewis*, Bruce Weaver*, Janice F. Weidman*, Hoda Khouri*, Terry R. Utterback*, Tamara V. Feldblyum*, and Claire M. Fraser*

*The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, MD 20850; ‡School of Oceanography and Astrobiology Program, University of Washington, Seattle, WA 98195; ¶Department of Natural Resource Sciences and Landscape Architecture, University of Maryland, 1108 H. J. Patterson Hall, College Park, MD 20742; ‖Center for Advanced Research in Biotechnology, Biotechnology Institute, University of Maryland, 9600 Gudelsky Drive, Rockville, MD 20850; and **Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, PA 16802

Contributed by Jody W. Deming, June 9, 2005

The completion of the 5,373,180-bp genome sequence of the marine psychrophilic bacterium Colwellia psychrerythraea 34H, a model for the study of life in permanently cold environments, reveals capabilities important to carbon and nutrient cycling, bioremediation, production of secondary metabolites, and cold-adapted enzymes. From a genomic perspective, cold adaptation is suggested in several broad categories involving changes to the cell membrane fluidity, uptake and synthesis of compounds conferring cryotolerance, and strategies to overcome temperature-dependent barriers to carbon uptake. Modeling of three-dimensional protein homology from bacteria representing a range of optimal growth temperatures suggests changes to proteome composition that may enhance enzyme effectiveness at low temperatures. Comparative genome analyses suggest that the psychrophilic lifestyle is most likely conferred not by a unique set of genes but by a collection of synergistic changes in overall genome content and amino acid composition.

proteome | psychrophily | bioremediation | astrobiology | three-dimensional homology modeling

Freely available online through the PNAS open access option.
Abbreviations: CDS, coding sequence; OGT, optimal growth temperature; CDA, canonical discriminant analysis; CDF, canonical discriminant function; PHA, polyhydroxyalkanoate.
Data deposition: The annotated genome sequence has been deposited in the GenBank database (accession no. CP000083).
†To whom correspondence may be addressed at: The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, MD 20850. E-mail: [email protected].
§To whom correspondence may be addressed at: School of Oceanography, Box 357940, University of Washington, Seattle, WA 98195. E-mail: [email protected].
© 2005 by The National Academy of Sciences of the USA
www.pnas.org/cgi/doi/10.1073/pnas.0504766102 PNAS | August 2, 2005 | vol. 102 | no. 31 | 10913–10918

By volume, most of Earth's biosphere is cold and marine, with 90% of the ocean's waters at 5°C or colder. Fully 20% of Earth's surface environment is frozen, including permanently frozen soil (permafrost), terrestrial ice sheets (glacial ice), polar sea ice, and snow cover (1). Although a diversity of microorganisms can be recovered from these environments, only cold-adapted organisms can be active in them (2). Among the cold-adapted bacteria, the genus Colwellia (1) within the γ-proteobacteria provides an unusual case: All characterized members are strictly psychrophilic (requiring temperatures of ≤20°C to grow on solid media), having been obtained from stably cold marine environments, including deep sea and Arctic and Antarctic sea ice (3). Many members of this genus produce extracellular polymeric substances relevant to biofilm formation and cryoprotection (4, 5) and enzymes capable of degrading high-molecular-weight organic compounds. These traits are likely to make Colwellia species important to carbon and nutrient cycling wherever they occur in the cold marine environment, from contaminated sediments to ice formations under study as analogs for possible habitats on a younger Earth (6) and other planets and moons (e.g., Mars and Europa) (1, 7).

Colwellia psychrerythraea 34H, isolated from Arctic marine sediments (8), represents the type species of the genus Colwellia (3). It grows reliably in heterotrophic media over a temperature range of approximately −1°C to 10°C, with cardinal growth temperatures (optimum of 8°C, maximum of 19°C, and extrapolated minimum of −14.5°C) (5) ranking among the lowest for all characterized bacteria (2). Maximum cell yield is achieved at subzero temperature (−1°C), cells continue to swim in sugar solutions down to −10°C (9), and growth can occur under deep-sea pressures. C. psychrerythraea also produces extracellular polysaccharides and, in particular, cold-active enzymes with low temperature optima for activity and marked heat instability (10).

The features that define cold adaptation and our comprehension of them continue to evolve. Several current biochemical models of catalysis are predicated on an increased flexibility in certain regions of cold-active enzyme architecture and high activity coupled with a concomitant increase in thermolability (11). However, the adaptations to protein architecture essential to cold-active enzymes are still not well understood, and inquiries to unlock these adaptations continue to be an active area of investigation (12). Nonetheless, the biochemical properties of cold-active enzymes have made them attractive for exploitation in a number of biochemical, bioremediation, and industrial processes (13). Given these attractions (the known traits of C. psychrerythraea, its products, and its place in a genus of global occurrence across all manner of cold marine habitat), C. psychrerythraea was selected as a model organism for genomic studies of bacterial cold adaptation.

Materials and Methods

Sequencing, Gene Identification, and Genome Analysis. Cloning, sequencing, and assembly were as described for genomes sequenced by The Institute for Genomic Research (14, 15). Open reading frames (ORFs) [or coding sequences (CDSs)] likely to encode proteins were predicted by GLIMMER (16) (Table 1). This program, based on interpolated Markov models, was trained with ORFs >90 bp from the genomic sequence that had BLASTX hits to The Institute for Genomic Research nonredundant internal protein database with an expectation value better than 1 × 10⁻⁵, as well as with any C. psychrerythraea genes available in GenBank. All predicted proteins >30 aa were searched against a nonredundant protein database as described in ref. 14. Frameshifts and point mutations were detected and corrected where appropriate. Remaining frameshifts and point mutations are considered to be authentic and were annotated as "authentic frameshift" or "authentic point mutation," or, in the case of multiple lesions within a single CDS, "degenerate." Protein membrane-spanning domains were identified by TOPPRED (17). The 5′ regions of each ORF were inspected to define initiation codons using homologies, position of ribosomal binding sites, and transcriptional terminators. Two sets of hidden Markov models were used to determine ORF membership in families and superfamilies: PFAM 14.0 (18) and TIGRFAMs 4.0 (19). PFAM 14.0 hidden Markov models were also used with a constraint of a minimum of two hits to find repeated domains within proteins and mask them. Domain-based paralogous families were then built by performing all-versus-all searches on the remaining protein sequences using a modified version of a method described in ref. 15.

Table 1. General features of the C. psychrerythraea 34H genome

Size, bp                                          5,373,180
G + C percentage                                  37.9
Predicted CDSs, n                                 4,937
Avg. size of CDS, bp                              924
Percentage coding                                 85
rRNA operons (16S-23S-5S), n                      9
tRNAs, n                                          88
Structural RNAs, n                                1
CDSs similar to known protein, n                  2,664
CDSs similar to proteins of unknown function,* n  543
Number of conserved hypothetical proteins†        690
Hypothetical proteins,‡ n                         1,041
ρ-Independent terminators, n                      584

*Unknown function: substantial sequence similarity to a named protein for which no function is currently attributed.
†Conserved hypothetical protein: sequence similarity to a translation of another ORF; however, no experimental evidence for the protein exists.
‡Hypothetical protein: no substantial similarity to any sequenced protein.

The replicative origin was determined by colocalization of genes (dnaA and dnaN) often found near the origin in prokaryotic genomes and by GC nucleotide skew [(G − C)/(G + C)] analysis (20). Regions of atypical nucleotide composition were identified by χ² analysis: The distribution of all 64 trinucleotides (3-mers) was computed for the complete genome in all six reading frames, followed by the 3-mer distribution in 2,000-bp windows. Windows overlapped by 1,000 bp. For each window, the χ² statistic on the difference between its 3-mer content and that of the whole genome was computed (see Fig. 2, which is published as supporting information on the PNAS web site). Information on additional comparative genomic analyses can be found as Supporting Text, which is published as supporting information on the PNAS web site.

Amino Acid Composition and Statistical Analysis. Twenty-two predicted proteomes from complete genomes were chosen for analysis to represent a range of genome G + C percentage content and optimal growth temperature (OGT) (Table 2), including several close mesophilic relatives of Colwellia (e.g., Shewanella oneidensis and Vibrio spp.), and more divergent lineages, such as the psychrophilic δ-proteobacterium Desulfotalea psychrophila (21) and Gram-positive bacteria. Because no complete genome sequence from a thermophilic representative of the γ-proteobacteria was available, other thermophilic representatives for which the complete sequence is available from the bacterial domain were included.

Table 2. A list of 22 organisms used for predicted proteome composition analyses

Name                                       OGT class  G + C percentage  Lineage
Caulobacter crescentus CB15                M          67.1              α-Proteobacteria
Corynebacterium glutamicum ATCC 13032      M          53.7              Actinomycetale
Escherichia coli O157:H7                   M          50.5              γ-Proteobacteria
Desulfovibrio vulgaris Hildenborough       M          63.2              δ-Proteobacteria
Haemophilus influenzae Rd KW20             M          38                γ-Proteobacteria
Listeria innocua CLIP11262                 M          37.3              Firmicute
Listeria monocytogenes EGD                 M          37.9              Firmicute
Oceanobacillus iheyensis HTE831            M          35.7              Firmicute
Pasteurella multocida Pm70                 M          40.3              γ-Proteobacteria
Pseudomonas aeruginosa PAO1                M          66.4              γ-Proteobacteria
Pseudomonas putida KT2440                  M          61.4              γ-Proteobacteria
S. oneidensis MR-1                         M          45.9              γ-Proteobacteria
Vibrio cholerae El Tor N16961              M          47                γ-Proteobacteria
Vibrio parahaemolyticus RIMD 2210633       M          45.3              γ-Proteobacteria
Vibrio vulnificus CMCP6                    M          47                γ-Proteobacteria
Yersinia pestis KIM                        M          47.6              γ-Proteobacteria
C. psychrerythraea 34H                     P          37.9              γ-Proteobacteria
D. psychrophila LSv54                      P          46.8              δ-Proteobacteria
Aquifex aeolicus VF5                       T          43.5              Aquificaceae
Thermosynechococcus elongatus BP-1         T          53.9              Cyanobacteria
Thermotoga maritima MSB8                   T          46.1              Thermotogales
Thermoanaerobacter tengcongensis MB4T      T          37.5              Clostridia

The OGTs of the selected organisms range between 25°C and 37°C for mesophilic (M) genomes, 8°C and 10°C for psychrophilic (P) genomes, and 75°C and 85°C for thermophilic (T) genomes. The G + C percentage is the average G + C content of the complete genome. Lineage refers to the organism's phylogenetic placement based on 16S rRNA phylogenetic analysis.

Three-Dimensional Protein Homology Modeling. The set of predicted CDSs from C. psychrerythraea and 21 other completed genomes (Table 2) were searched against the Protein Data Bank to identify potential three-dimensional templates. Searches used the BLASTP algorithm with an e-value cutoff of <0.001. To ensure that homology models were of high quality, the following criteria were applied to sequence and template selection. Structural templates had to cover at least 80% of the length of the CDS, and minimum sequence identity between the template and CDS had to be >30% to avoid alignment errors. For multiple potential models for a given template from a genome, the one with greatest similarity to the template was selected. To lessen issues associated with paralogous model comparisons, the CDS used for the model was determined by bidirectional best matches to the C. psychrerythraea CDS.

After these filtering steps, the remaining sequences were processed through a homology building pipeline by using target-template alignment (CLUSTALW), backbone copy (APE), and side-chain building (SCWRL) (22). A total of 2,026 models from 173 templates including 624,000 residues was constructed. Surface-exposed area was calculated for each residue in the model by using the STRIDE program (23). From the surface composition analysis, modeled residues were subdivided into two categories: exposed and buried. If total exposed area for a residue was <10% of maximum exposed area for the residue type, the residue in question was defined as buried.

Canonical Discriminant Analysis (CDA). The CANDISC procedure (with parametric, linear classification rules and prior probabilities proportional to sample size) of the SAS system (SAS/STAT 9.1, SAS, Cary, NC) was used to perform CDAs. CDA was used to identify and rank amino acid proportions that could discriminate between the three OGT classes. To explain the total variance of a data set, CDA elucidates a number of canonical discriminant functions (CDFs) equal to the smaller of the number of independent variables and the number of classes minus one. This analysis included 20 independent variables (the proportions of the 20 amino acids) and three OGT classes, resulting in the computation of two CDFs, all of which were significant (based on large eigenvalues and P values <0.05) for each data set.

Total canonical structure was chosen over total standardized canonical scores as an index describing the property and structure of the CDFs due to the presence of significant pairwise correlations among some of the independent variables. Total canonical structure indicates correlation between original variables and canonical scores of a given CDF and can therefore be considered variable loadings. Because of the presence of linear dependence of the variables, the analysis was conducted by removing one variable at random. Each variable was tested until the one with the least influence on the canonical loadings was detected.

Results

General Genome Features and Comparisons. The C. psychrerythraea 34H genome consists of a single circular chromosome of 5,373,180 bp with 4,937 predicted CDSs. General genome features are presented in Table 1 (see also Fig. 2). The C. psychrerythraea genome offers a distinct phylogenetic framework for evaluating evolution of the psychrophilic lifestyle.

Membrane Fluidity. An important challenge to life at cold temperatures is the ability to maintain the cell membrane in a liquid-crystalline state (13). C. psychrerythraea genome analyses have revealed the presence of a suite of CDSs important to this function. CDSs predicted to function in polyunsaturated fatty acid synthesis (a well established cold adaptation) (24), including a putative operon related to polyketide-like polyunsaturated fatty acid synthases (CPS3104, CPS3103, CPS3102, and CPS3099) (25), were identified, as was a fatty acid cis/trans isomerase that would confer an ability to alter the ratio of cis- to trans-esterified fatty acids in phospholipids (CPS0087). Polyunsaturated fatty acid synthesis and increased cis-isomerization, for example, enhance membrane fluidity at low temperatures.

C. psychrerythraea genome analyses further elucidated several copies of genes vital to fatty acid and phospholipid biosynthesis (see Table 4, which is published as supporting information on the PNAS web site). The genome possesses a 3-oxoacyl-(acyl-carrier-protein) reductase (CPS2297), which can catalyze the first reduction step in fatty acid biosynthesis, and two putative copies (CPS0665 and CPS1608). One of these copies (CPS1608) is located in an operon with other fatty acid metabolism genes in an approximately 15-kilobase region of the genome populated by CDSs involved in fatty acid metabolism and branched-chain amino acid catabolism. The physical proximity of these CDSs to one another may indicate that the branched-chain portion resulting from branched-chain amino acid catabolism is incorporated into branched-chain fatty acid synthesis. The introduction of branched lipids into membrane architecture is a mechanism that reduces membrane viscosity at cold temperatures (24). The C. psychrerythraea genome also possesses multiple copies of putative β-keto-acyl carrier protein synthases (KAS-II and KAS-III) (see Table 4) central to fatty acid synthesis and control of straight and branched-chain lipid ratios in the cell membrane (26, 27).

Carbon, Energy, and Nitrogen Reserves. Genome analyses reveal the capacity of C. psychrerythraea to produce polyhydroxyalkanoate (PHA) compounds, a family of polyesters that serve as intracellular carbon and energy reserves, of which some forms have been linked to pressure adaptation (28). PHA compounds are of industrial interest for their thermoplastic and elastomeric properties and as sources for fine chemical synthesis (29, 30). The ability of C. psychrerythraea to produce PHA compounds is likely linked with its significant capacity to produce and degrade fatty acids, suggested in part by the expansion (multiple gene duplications) of acyl-CoA dehydrogenase and enoyl-CoA hydratase gene families, whose roles are central to the utilization of medium- and long-chain fatty acids that can be oxidized via the β-oxidation cycle and produce substrates for PHA biosynthesis (31). Because PHA composition depends in part on carbon sources and the manner in which they are catabolized (29–31), these gene family expansions may imply versatility in the nature of PHAs that C. psychrerythraea can synthesize. This versatility is further suggested by the presence of three copies of granule-associated protein genes (CPS4086, CPS4085, and CPS4084) physically located among genes devoted to PHA synthesis. Two of these copies have no homologs in other bacterial lineages.

Genomic investigations further disclosed an ability of C. psychrerythraea to synthesize and degrade polyamides similar to cyanophycin, protein-like polymers that function as nitrogen reserves. Polyamides are of industrial interest as possible biopolymer substitutes for polyacrylates (32). For C. psychrerythraea, the collective gene complement for biosynthesis of PHA and cyanophycin-like compounds may be of particular benefit to the psychrophilic lifestyle by ensuring intracellular reserves of nitrogen and carbon to aid in circumventing any cold-imposed limitations to carbon and nitrogen uptake (33).

Compatible Solutes. Genome analyses of C. psychrerythraea indicate an overall expansion in transporter families involved in the uptake of compatible solutes that may serve multiple roles, including osmoprotection and cryoprotection (34). The genome possesses at least five putative transporters involved in the movement of quaternary ammonium compounds of the betaine/carnitine/choline transporter family (CPS4027, CPS4009, CPS3860, CPS2003, and CPS1335) as well as homologs to the ATP-binding cassette transport system for direct uptake of betaine (CPS4933, CPS4934, and CPS4935). Furthermore, two lineage-specific duplications of genes encoding betaine aldehyde dehydrogenase, choline dehydrogenase, and a BetI family regulator that function in the production and regulation of glycine betaine from the uptake of choline-containing compounds are also present.

The C. psychrerythraea genome possesses four copies of serine hydroxymethyltransferase (glyA) (CPS4031, CPS3844, CPS2427, and CPS0728), which catalyzes the interconversion of glycine and serine, and four copies of formyltetrahydrofolate deformylase (purU) (CPS4357, CPS4036, CPS3620, and CPS2482), which regulates intracellular concentrations of the tetrahydrofolate one-carbon pool. Collectively, these enzymes play critical roles in regulating key biosynthetic pathways, such as purine and lipid biosynthesis. Two of the glyA and purU copies are located on the C. psychrerythraea chromosome in putative operons that also encode CDSs for sarcosine oxidase. Sarcosine oxidase demethylates sarcosine to glycine and 5,10-methylenetetrahydrofolate, the substrates for glyA (35). Because sarcosine can be derived from the catabolism of the osmoprotectant and cryoprotectant betaine, the metabolism of choline, betaine, and sarcosine may be linked. C. psychrerythraea may derive benefits from these genes and metabolic links due to their dual influences on the production of protective compounds and as sources of carbon, nitrogen, and energy (see Supporting Text).

Extracellular Compounds. The synthesis of extracellular polysaccharides and degradative enzymes is important to the overall metabolism and possibly to the cold adaptation of C. psychrerythraea in its environment. Extracellular polysaccharides can serve as cryoprotectants (4, 5), and extracellular enzyme production may represent another mechanism for overcoming threshold requirements for dissolved organic carbon in cold environments (5, 33). For instance, over half of the enzymes assigned to the degradation of proteins and peptides in the C. psychrerythraea genome are predicted to be localized external to the cytoplasm, among the highest percentage in any completed genome (see Table 5, which is published as supporting information on the PNAS web site). The genome further encodes an expansion of putative members of the extracellular factor subfamily of σ-70 transcription factors that have multiple roles, including regulating extracellular polysaccharide biosynthesis, and of paralogous families of glycosyl transferases, which are also likely to function in extracellular polysaccharide synthesis.
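
The sliding-window χ² screen for atypical nucleotide composition described in Materials and Methods can be illustrated with a short sketch. This is not the authors' software: the function names (trimer_counts, window_chi2) are hypothetical, and counting overlapping 3-mers on both strands is only one assumed reading of "all six reading frames." The window size (2,000 bp) and overlap (1,000 bp) follow the text.

```python
from collections import Counter
from itertools import product

# All 64 trinucleotides over the DNA alphabet.
TRIMERS = ["".join(p) for p in product("ACGT", repeat=3)]
TRIMER_SET = frozenset(TRIMERS)
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def trimer_counts(seq):
    """Count overlapping 3-mers on both strands of seq.

    Assumption: overlapping 3-mers on the forward and reverse-complement
    strands together stand in for the six reading frames.
    """
    counts = Counter()
    reverse_strand = seq.translate(COMPLEMENT)[::-1]
    for strand in (seq, reverse_strand):
        for i in range(len(strand) - 2):
            kmer = strand[i:i + 3]
            if kmer in TRIMER_SET:  # skip 3-mers with ambiguous bases such as N
                counts[kmer] += 1
    return counts


def window_chi2(genome, window=2000, step=1000):
    """Return (window_start, chi-square) pairs for overlapping windows.

    Each window's 3-mer counts are compared with the counts expected
    from whole-genome 3-mer frequencies.
    """
    genome = genome.upper()
    total = trimer_counts(genome)
    n_total = sum(total.values())
    if n_total == 0:
        return []
    scores = []
    for start in range(0, len(genome) - window + 1, step):
        observed = trimer_counts(genome[start:start + window])
        n_obs = sum(observed.values())
        chi2 = 0.0
        for kmer in TRIMERS:
            expected = total[kmer] / n_total * n_obs
            if expected > 0:
                chi2 += (observed[kmer] - expected) ** 2 / expected
        scores.append((start, chi2))
    return scores
```

Windows with unusually large χ² values are candidates for regions of atypical composition. The text does not state a significance cutoff, so any threshold applied to these scores would be an additional assumption.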

Methe´ et al. PNAS ͉ August 2, 2005 ͉ vol. 102 ͉ no. 31 ͉ 10915 Downloaded by guest on October 1, 2021 Unusual Capacities and Genes. Genome analyses of C. psychreryth- raea demonstrate the presence of many well described CDSs related to DNA metabolism and protein synthesis (and such common behaviors as motility) implying that overall essential enzymatic functions inherent to these basic processes are similar to other proteobacteria. However, the presence of some CDSs with distant homology to ␥-proteobacterial sequences and expansions of other gene families alludes to the existence of additional as yet unde- scribed mechanisms possibly related to cold adaptation, including posttranslational modifications (36) (see Table 6, which is published as supporting information on the PNAS web site). Five CDSs encode for common forms of cold-shock proteins (CPS4529, CPS2895, CPS0737, CPS0718, and CPS0148), of which four appear to be localized to the cytoplasm. The fifth (CPS0148) is predicted to contain an unusual protein architecture by the inclusion of three transmembrane-spanning regions. In addition, two CDSs (CPS2624 and CPS1911) present in C. psychrerythraea bear modest homology to cold-shock domain proteins from Vibrio spp. and S. oneidensis and may represent as yet uncharacterized proteins relevant to cold adaptation. The C. psychrerythraea genome also contains a suite of CDSs with roles in the synthesis and catabolism of complex, high-molecular- weight organic compounds and possible C1 metabolism that ulti- mately facilitate a wide range of responses to its environment, emphasizing the versatile roles that C. psychrerythraea can play in carbon and nutrient cycles of cold environments (Supporting Text; see also Fig. 3, which is published as supporting information on the PNAS web site). 
These activities include not only the catabolism of complex compounds to provide energy and carbon sources but also mechanisms of detoxification relevant to the bioremediation of cold environments and other biotechnological applications. For in- stance, C. psychrerythraea possesses a homolog to the 2,4,6- trichlorophenol monooxygenase of Ralstonia eutropha (CPS2047) along with two homologs to reductive dehalogenases involved in the degradation of pentachlorophenol (CPS1905 and CPS1668) (37). Genome analyses also suggest the presence of putative dioxygen- ases (CPS1846 and CPS4358) and monooxygenases (CPS3582, CPS3527, and CPS1273) critical to the cleavage of ring bearing and aliphatic compound degradation. A particularly unusual finding in the C. psychrerythraea genome is the presence of CDSs involved in the biosynthesis and utilization of coenzyme F420. These coenzymes were first discovered in meth- anogens, where they are critical participants in methanogenesis (38). Since their discovery, homologs to coenzyme F420 sequences have been reported in only a few bacterial lineages. In Rhodococcus Fig. 1. Scatter plot of the scores of CDF 1 and CDF 2 indicating separation based on OGT. (a) The scatter plot from the total primary residue analysis. spp., homologs to coenzyme F420 have been linked to polynitroaro- (b) The scatter plot from the surface-exposed residue analysis. (c) The matic compound degradation, such as 2,4,-dinitrophenol (39). scatter plot from the buried residue analysis. ■, psychrophiles; F, meso- These findings suggest possible roles in C. psychrerythraea related to philes; Œ, thermophiles. C1 or aromatic compound metabolism. The ability to respond to reactive species is a vital function when undergoing aerobic metabolism and is likely to be of further ends where putative repressor genes are found, indicative of a importance in C. psychrerythraea because of the need to protect cell possible past recombination event in a circularized phage genome. 
membrane polyunsaturated fatty acids, which are generally more The proximity of these phage genomes to integrases and trans- susceptible than saturated fatty acids to oxidative damage (40). posases suggests their involvement in larger mobile genetic Genome analyses reveal an enhanced antioxidant capacity in elements. C. psychrerythraea through the presence of a variety of CDSs encoding antioxidants, including three copies of catalase genes Amino Acid Composition Comparisons with Other Bacterial Genomes. (CPS2441, CPS3328, and CPS1344). In addition to the typical iron- To successfully thrive in cold environments, psychrophiles must or manganese-containing superoxide dismutase (SOD) C. psychr- synthesize enzymes that perform effectively at low temperatures. erythraea also possesses a putative nickel-containing SOD Cold-temperature environments present several challenges, in par- (CPS0444), a form that has not been reported in proteobacterial ticular reduced reactions rates, increased viscosity, and altered lineages. An alternative SOD may provide a mechanism to neu- microscopic structure (including phase changes) of the surrounding tralize reactive oxygen species while circumventing any environ- medium. To cope with these conditions cold-adapted enzymes have mentally imposed iron limitations. been found to exhibit an increase in enzyme turnover (Kcat)or The C. psychrerythraea genome includes two putative filamentous improvement of catalytic efficiency (Kcat͞KM) at a given tempera- phage genomes (Fig. 1). Despite sharing nine identical CDSs and ture, relative to their mesophilic counterparts (11). These changes corresponding intergenic regions (conserved in sequence and gene have been suggested to originate from localized increases in enzyme order), the two phage genomes diverge from one another at both flexibility or ‘‘molecular plasticity’’ in critical locations of the

10916 ͉ www.pnas.org͞cgi͞doi͞10.1073͞pnas.0504766102 Methe´ et al. Downloaded by guest on October 1, 2021 protein architecture. This plasticity is believed to result in a lowering Table 3. Total canonical structure indicating correlation of the transition state barrier for the catalyzed reaction, relative to coefficient (r) between original variables (amino acids) and the mesophilic counterparts, and may ultimately require lower ther- scores of CDF 1 and CDF 2 for the primary, surface-exposed, and modynamic stability (11–13). buried residue data sets The availability of a whole-genome sequence provides an im- CDF 1 CDF 2 portant opportunity to investigate these phenomena from a pro- ‘‘R’’ group teome level in the bacterial domain. The predicted amino acid Variable rPrPcharacter composition of the entire C. psychrerythraea proteome was com- Primary pared with those from 21 other complete, predicted proteomes Asp 0.68 0.0005 0.05 0.80 Acidic from bacteria across a range of OGTs and genome GϩC percent- Glu Ϫ0.66 0.0009 0.03 0.89 Acidic age content (Table 2). The predicted proteomes of each organism His 0.62 0.0022 0.12 0.59 Basic were first compared based on their primary compositions. Next, Val Ϫ0.56 0.0072 0.25 0.25 Hydrophobic predicted CDSs were examined for matches to known three- Ser 0.56 0.0073 ؊0.47 0.02 Polar dimensional protein structures. The amino acid compositions of the Tyr Ϫ0.53 0.010 Ϫ0.10 0.64 Polar͞aromatic surface and buried residues were then estimated for the subset of Met 0.53 0.011 0.07 0.74 Hydrophobic Ϫ CDSs for which significant matches to three-dimensional structures Thr 0.50 0.017 0.02 0.94 Polar Ala 0.47 0.026 0.19 0.39 Hydrophobic could be determined. Amino acid composition for each data set was Lys Ϫ0.47 0.029 Ϫ0.16 0.47 Basic analyzed for statistically significant differences that may be related Gln 0.45 0.037 0.05 0.83 Polar to OGT. 
Surface-exposed The proportion of polar residues was among the highest, whereas Ser 0.73 0.0001 0.43 0.04 Polar charged amino acids proportions were among the lowest for the two Gln 0.71 0.0002 0.045 0.84 Polar psychrophilic genomes in the surface composition. An overall Val Ϫ0.67 0.0006 Ϫ0.12 0.59 Hydrophobic decrease in nonpolar residues from the exposed composition was Glu Ϫ0.58 0.0048 Ϫ0.04 0.85 Acidic noted only for C. psychrerythraea (see Table 7, which is published as Lys Ϫ0.56 0.0062 0.20 0.36 Basic supporting information on the PNAS web site). Met 0.52 0.012 0.008 0.97 Hydrophobic His 0.45 0.03 ؊0.45 0.03 Basic CDA (41, 42) was performed on each data set to identify and Asp 0.17 0.44 ؊0.48 0.02 Acidic rank amino acid proportions that could discriminate among the Arg 0.17 0.45 ؊0.42 0.05 Basic three OGT groups: psychrophile, mesophile, or thermophile. CDA Buried quantifies a set of underlying constructs, CDFs, that are the linear Tyr Ϫ0.84 Ͻ.0001 Ϫ0.04 0.84 Polar͞aromatic functions of the original variables (amino acids) minimizing within Asp 0.79 Ͻ.0001 Ϫ0.05 0.82 Acidic (thermal) group and maximizing among (thermal) group varia- His 0.72 0.0001 0.17 0.46 Basic tions. The contribution of the original variables on the construction Ser 0.60 0.0031 ؊0.54 0.009 Polar of these functions can then be quantified through the use of their Thr 0.58 0.0043 Ϫ0.10 0.65 Polar Ϫ loadings, which correspond to the correlation coefficients between Glu 0.57 0.0055 0.14 0.54 Acidic Ala 0.51 0.015 0.10 0.65 Hydrophobic SCIENCES the original variables and the function scores. Lys Ϫ0.50 0.019 Ϫ0.09 0.69 Basic

Ile          −0.42     0.048     −0.36     0.10    Hydrophobic

Data are ranked in descending order by the absolute value of the correlations for CDF 1 and only significant variables (P < 0.05) are shown. "R" group character refers to the chemical character of the amino acid residue side chain. Significant variables on CDF 1 are shown in normal font. Data in bold are significant on CDF 1 and CDF 2, and data in bold italic are significant on CDF 2 only. A positive correlation between a variable and CDF 1 for all data sets indicates an increase in that variable in the transition from thermophile to mesophile and psychrophile. A negative correlation between a variable and CDF 2 in the primary and buried data sets indicates an increase in that variable in the transition from mesophile and thermophile to psychrophile. A negative correlation between a variable and CDF 2 in the surface data set indicates a decrease in that variable in the transition from mesophile and thermophile to psychrophile.

For each data set, CDA correctly grouped each organism based on amino acid composition into one of the three types of thermal classes (psychrophile, mesophile, or thermophile) (Fig. 1 and Table 3). Along CDF 1, the mesophile and psychrophile groups were much closer to one another relative to the thermophiles, and the greatest discrimination was between the thermophiles relative to the mesophiles and psychrophiles. Along CDF 2, the greatest discrimination occurred between the psychrophiles relative to the mesophiles and thermophiles (Fig. 1).

CDF 1 from the primary data set was defined by significant correlations with aspartate (acidic), histidine (weakly basic), serine, threonine, glutamine (polar/noncharged), methionine, alanine (hydrophobic), glutamate (acidic), valine (hydrophobic), tyrosine (polar/aromatic), and lysine (basic). A negative correlation (indicating an increase in the relative proportion from thermophile to psychrophile) with serine was the only significant contributor to CDF 2 (Fig. 1 and Table 3). The buried data set had similar variables defining CDF 1 and CDF 2 when compared with the primary data set (Table 3).

The surface data set represented the greatest separation of the thermal classes along CDF 1 and, in particular, between the mesophiles and psychrophiles along CDF 2 (Fig. 1). Significant decreases from mesophile to psychrophile were noted for aspartate, arginine, and histidine, which is consistent with the overall decreases in charged amino acid composition (see Table 7). A trend toward the substitution of aspartate for glutamate was also noted, and an increase in serine content was again suggested in the transition from thermophile to psychrophile (Fig. 1 and Table 3).

The results of this analysis indicate more significant differences in amino acid composition between the thermophiles and either mesophiles or psychrophiles than between mesophiles and psychrophiles (Table 3). These results may in part reflect the limit of resolution of this study. Changes related to protein composition which could not be directly detected by this analysis, including the influence of noncovalent interactions (e.g., hydrogen bonds, van der Waals forces, and hydrophobic interactions) (43), may be important contributors to differences in protein thermostability, and the same set of changes is not likely to occur in every enzyme class. In addition, several of the organisms included as mesophiles in this study have psychrotolerant physiologies [e.g., S. oneidensis MR-1 (44) and the Listeria spp. (34)], which may further confound the identification of differences in amino acid composition between mesophiles and psychrophiles.

Nonetheless, even with the inclusion of divergent bacterial lineages, significant differences in amino acid composition between the three thermal classes could be identified, the results of which are consistent with several reported trends. Several studies of protein thermostability have suggested a decrease in polar residues and increase in charged amino acids as temperature increases (45–48), trends that were generally supported in this study. Increased serine
in both psychrophilic genomes revealed in this study would contribute to increased polar surface amino acids. Of interest is the presence of four copies of glyA (which controls the interconversion of glycine and serine) in C. psychrerythraea, which suggests that these gene multiplications may play a role in maintaining proportions of this key amino acid. An apparent favoring of aspartic acid over glutamate, particularly on the surface of psychrophilic proteins, is consistent with studies of thermophilic proteins; in C. psychrerythraea, these substitutions would translate to a decrease in the unfolding transition temperature of proteins, effectively making them less heat-stable (46).

Discussion

The genome sequence of C. psychrerythraea has provided an important opportunity to better understand this organism's potential functions in the marine environment and to gain insight into adaptations that help define and influence the psychrophilic lifestyle. Genome analyses revealed a variety of metabolic capabilities and roles in carbon and nutrient cycling, including some that may be useful to bioremediation in cold environments.

From a genome-level perspective, adaptations potentially beneficial to life in cold environments can be seen in several broad categories. Several of the adaptive strategies appear to increase fitness by effectively overcoming multiple obstacles at low temperature, including temperature-dependent barriers to carbon and nitrogen uptake. These strategies are reflected in expansions of gene families related to cell membrane synthesis, a capacity for uptake or synthesis of compounds that in part may confer cryotolerance, including PHAs (which may also aid in pressure adaptation), cyanophycin-like compounds, and glycine betaine, as well as the capacity to produce copious quantities of extracellular enzymes and polysaccharides.

The three-dimensional protein homology modeling and CDA examination in this study have provided a comprehensive comparison of proteome composition across OGTs and divergent lineages in the bacterial domain to determine whether signals possibly related to thermal adaptation of proteins could be detected. Differences likely to enhance architectural changes to enzymes favoring their effectiveness at cold temperatures appear consistent with some previously reported trends. In particular, a trend toward increased polar residues (particularly serine), the substitution of aspartate for glutamate, and a general decrease in charged residues on the surface of proteins were noted. Each of these changes is consistent with prevailing theories that increased flexibility and reduced thermostability contribute to enzyme cold adaptation. However, effects such as those arising from noncovalent interactions are also likely contributors to the stability of enzymes at different temperatures, and modifications may differ depending on the class of enzyme.

Given the existence of psychrophiles in lineages across the tree of life, multiple mechanisms contributing to cold adaptation may exist. To date, genome analyses suggest that cold adaptation consists of a collection of synergistic changes in overall genome configuration reflected in terms of gene content and amino acid composition rather than the presence of a unique set of genes indicative of and responsible for conferring a psychrophilic lifestyle.

We thank O. White, M. Heaney, S. Lo, M. Holmes, V. Sapiro, R. Karamchedu, and R. Deal for informatics, database, and software support; The Institute for Genomic Research faculty and sequencing core for expert advice and assistance; and L. E. Wells for phage-gene analysis. This work was supported by the United States Department of Energy Office of Biological and Environmental Research through the Microbial Genomes Program. J.W.D. acknowledges support from the National Aeronautics and Space Administration Astrobiology Institute.
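
As an illustration of the canonical discriminant analysis (CDA) described in the Results, the following is a minimal sketch and not the authors' implementation. It finds linear functions that maximize among-group relative to within-group variation, then computes loadings as correlations between each original variable and the function scores. The function name, the three amino acids chosen, and all numeric values are hypothetical, and the loadings are assumed here to be total-sample correlations. Only three variables are used because a full set of 20 amino acid proportions sums to one, which would make the within-group matrix singular in this simple formulation.

```python
import numpy as np

def canonical_discriminant_analysis(X, labels):
    """Toy CDA: return (eigenvalues, scores, loadings)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    groups = np.unique(labels)
    n_vars = X.shape[1]
    grand_mean = X.mean(axis=0)

    # Within-group (W) and among-group (B) sums of squares and cross products.
    W = np.zeros((n_vars, n_vars))
    B = np.zeros((n_vars, n_vars))
    for g in groups:
        Xg = X[labels == g]
        mean_g = Xg.mean(axis=0)
        centered = Xg - mean_g
        W += centered.T @ centered
        diff = (mean_g - grand_mean)[:, None]
        B += len(Xg) * (diff @ diff.T)

    # Directions maximizing among-group relative to within-group variation
    # are eigenvectors of W^-1 B; at most (number of groups - 1) are useful.
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(W, B))
    eigvals, eigvecs = eigvals.real, eigvecs.real
    n_funcs = min(n_vars, len(groups) - 1)
    order = np.argsort(eigvals)[::-1][:n_funcs]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Function scores for every organism, one column per CDF.
    scores = (X - grand_mean) @ eigvecs

    # Loadings: correlation of each original variable with each CDF score.
    loadings = np.array([
        [np.corrcoef(X[:, j], scores[:, k])[0, 1] for k in range(n_funcs)]
        for j in range(n_vars)
    ])
    return eigvals, scores, loadings

# Hypothetical amino acid proportions (Ser, Glu, Asp) for eight synthetic
# "genomes" per thermal class; these are NOT values from the study.
rng = np.random.default_rng(0)
variables = ["Ser", "Glu", "Asp"]
group_means = {
    "psychrophile": [0.080, 0.050, 0.045],
    "mesophile": [0.065, 0.065, 0.055],
    "thermophile": [0.050, 0.080, 0.055],
}
X = np.vstack([rng.normal(m, 0.003, size=(8, 3)) for m in group_means.values()])
labels = np.repeat(list(group_means), 8)

eigvals, scores, loadings = canonical_discriminant_analysis(X, labels)
for name, row in zip(variables, loadings):
    print(name, "CDF 1: %.2f" % row[0], "CDF 2: %.2f" % row[1])
```

The sign of each function is arbitrary, so only the relative signs of the loadings are meaningful: in this synthetic example the serine and glutamate loadings on CDF 1 take opposite signs, which is the kind of pattern Table 3 reports for real proteomes.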

1. Deming, J. W. & Eicken, H. (2005) Life in Ice (Cambridge Univ. Press, Cambridge, U.K.).
2. Bowman, J. P. (2005) Adv. Microb. Ecol., in press.
3. Deming, J. W. & Junge, K. (2005) Bergey's Manual of Systematic Bacteriology (Bergey's Manual Trust, East Lansing, MI), Vol. 2.
4. Krembs, C., Deming, J. W., Junge, K. & Eicken, H. (2002) Deep-Sea Res. 49, 2163–2181.
5. Huston, A. L. (2003) Ph.D. thesis (Univ. of Washington, Seattle).
6. Kirschvink, J. L., Gaidos, E. J., Bertani, L. E., Beukes, N. J., Gutzmer, J., Maepa, L. N. & Steinberger, R. E. (2000) Proc. Nat. Acad. Sci. USA 97, 1400–1405.
7. Deming, J. W. (2002) Cur. Opin. Microbiol. 3, 301–309.
8. Huston, A. L., Krieger-Brockett, B. B. & Deming, J. W. (2000) Appl. Environ. Microbiol. 2, 383–388.
9. Junge, K., Eicken, H. & Deming, J. W. (2003) Appl. Environ. Microbiol. 69, 4282–4284.
10. Huston, A. L., Methé, B. & Deming, J. W. (2004) Appl. Environ. Microbiol. 70, 3321–3328.
11. Georlette, D., Blaise, V., Collins, T., D'Amico, S., Gratia, E., Hoyoux, A., Marx, J. C., Sonan, G., Feller, G. & Gerday, C. (2004) FEMS Microbiol. Rev. 28, 25–42.
12. Marx, J. C., Blaise, V., Collins, T., D'Amico, S., Delille, D., Gratia, E., Hoyoux, A., Huston, A. L., Sonan, G., Feller, G. & Gerday, C. (2004) Cell Mol. Biol. 50, 643–655.
13. Feller, G. & Gerday, C. (2003) Nat. Rev. Microbiol. 1, 200–208.
14. Methé, B. A., Nelson, K. E., Eisen, J. A., Paulsen, I. T., Nelson, W., Heidelberg, J. F., Wu, D., Wu, M., Ward, N., Beanan, M. J., et al. (2003) Science 302, 1967–1969.
15. Heidelberg, J. F., Seshadri, R., Haveman, S. A., Hemme, C. L., Paulsen, I. T., Kolonay, J. F., Eisen, J. A., Ward, N., Methé, B. A., Brinkac, L. M., et al. (2004) Nat. Biotechnol. 22, 554–559.
16. Delcher, A. L., Harmon, D., Kasif, S., White, O. & Salzberg, S. L. (1999) Nucleic Acids Res. 27, 4636–4641.
17. Nielsen, H., Engelbrecht, J., Brunak, S. & von Heijne, G. (1997) Int. J. Neural Syst. 8, 581–599.
18. Bateman, A., Birney, E., Cerruti, L., Durbin, R., Etwiller, L., Eddy, S. R., Griffiths-Jones, S., Howe, K. L., Marshall, M. & Sonnhammer, E. L. L. (2002) Nucleic Acids Res. 20, 276–280.
19. Haft, D. H. & Selengut, J. D. (2003) Nucleic Acids Res. 31, 41–43.
20. Lobry, J. R. (1996) Mol. Biol. Evol. 13, 660–665.
21. Rabus, R., Ruepp, A., Frickey, T., Rattei, T., Fartmann, B., Stark, M., Bauer, M., Zibat, A., Lombardot, T., Becker, I., et al. (2004) Environ. Microbiol. 6, 887–902.
22. Bower, M. J., Cohen, F. E. & Dunbrack, R. L., Jr. (1997) J. Mol. Biol. 267, 1268–1282.
23. Frishman, D. & Argos, P. (1995) Proteins 23, 566–579.
24. Russell, N. J. (1997) Comp. Biochem. Physiol. 118, 489–493.
25. Metz, J. G., Roessler, P., Facciotti, D., Levering, C., Dittrich, F., Lassner, M., Valentine, R., Lardizabal, K., Domergue, F., Yamada, A., et al. (2001) Science 293, 290–292.
26. Lai, C. Y. & Cronan, J. E. (2004) J. Bacteriol. 186, 1869–1878.
27. Choi, K.-H., Heath, R. J. & Rock, C. O. (2000) J. Bacteriol. 182, 365–370.
28. Martin, D. D., Bartlett, D. H. & Roberts, M. F. (2002) Extremophiles 6, 507–514.
29. Madison, L. L. & Huisman, G. W. (1999) Microbiol. Mol. Biol. Rev. 63, 21–53.
30. Lee, S. Y., Park, S. H., Lee, Y. & Lee, S. H. (2001) Production of Chiral and Other Valuable Compounds from Microbial Polyesters (Wiley, Weinheim, Germany).
31. Park, S. J. & Lee, S. Y. (2003) J. Bacteriol. 185, 5391–5397.
32. Frey, K. M., Oppermann-Sanio, F. B., Schmidt, H. & Steinbuchel, A. (2002) Appl. Environ. Microbiol. 68, 3377–3384.
33. Pomeroy, L. R. & Wiebe, W. J. (2001) Aquatic Microbiol. 23, 187–204.
34. Wemekamp-Kamphuis, H. H., Sleator, R. D., Wouters, J. A., Hill, C. & Abee, T. (2004) Appl. Environ. Microbiol. 70, 2912–2918.
35. Chlumsky, L. J., Zhang, L. & Jorns, M. S. (1995) J. Biol. Chem. 270, 18252–18259.
36. Dalluge, J. J., Hamamoto, T., Horikoshi, K., Morita, R. Y., Stetter, K. O. & McCloskey, J. A. (1997) J. Bacteriol. 179, 1918–1923.
37. Cai, M. & Xun, L. (2002) J. Bacteriol. 184, 4672–4680.
38. Hagemeier, C. H., Shima, S., Thauer, R. K., Bourenkov, G., Bartunik, H. D. & Ermler, U. (2003) J. Mol. Biol. 332, 1047–1057.
39. Heiss, G., Trachtmann, N., Abe, Y., Masahiro, T. & Kackmuss, H.-J. (2003) Appl. Environ. Microbiol. 69, 2748–2754.
40. Barriere, C., Cento, D., Lebert, A., Leroy-Setrin, S., Berdague, J. L. & Talon, R. (2001) FEMS Microbiol. Lett. 201, 181–185.
41. McLachlan, G. J. (1992) Discriminant Analysis and Statistical Pattern Recognition (Wiley, New York).
42. Momen, B. & Zehr, J. P. (1998) Ecol. Appl. 8, 497–507.
43. Lazardis, T., Archontis, G. & Karplus, M. (1995) Advances in Protein Chemistry (Academic, San Diego).
44. Abboud, R., Popa, R., Souza-Egipsy, V., Giometti, C. S., Tollaksen, S., Mosher, J. J., Findlay, R. H. & Nealson, K. H. (2005) Appl. Environ. Microbiol. 71, 811–816.
45. Haney, P. J., Badger, H. J., Buldak, G. L., Reich, C. I., Woese, C. R. & Olsen, G. J. (1999) Proc. Nat. Acad. Sci. USA 96, 3578–3583.
46. Lee, D. Y., Kyeong-Ae, K., Yi, Y. G. & Key-Sun, K. (2004) Biochem. Biophys. Res. Com. 320, 900–906.
47. Saunders, N. F., Thomas, T., Curmi, P. M., Mattick, J. S., Kuczek, E., Slade, R., Davis, J., Franzmann, P. D., Boone, D., Rusterholtz, K., et al. (2003) Genome Res. 13, 1580–1588.
48. Farias, S. T. & Bonato, M. C. (2003) Gen. Mol. Res. 2, 383–393.

Methé et al. | www.pnas.org/cgi/doi/10.1073/pnas.0504766102