
Protein Adaptation to High Hydrostatic Pressure:

Computational Analysis of the Structural Proteome

Samvel Avagyan1,2, Daniel Vasilchuk1,3§ and George I. Makhatadze1,2,3,*

1Center for Biotechnology and Interdisciplinary Studies, 2Department of Biological Sciences,

3Department of Chemistry and Chemical Biology, Rensselaer Polytechnic Institute, Troy, New

York 12180

*Corresponding Author: George I. Makhatadze, Center for Biotechnology and Interdisciplinary Studies, Rensselaer Polytechnic Institute, 110 8th Street, Troy, NY 12180, USA. Phone: (518) 276-4417. E-mail: [email protected]

Keywords: Piezophilic adaptation; Protein stability; Protein Volume; Structural

Proteome; Pressure Perturbation Calorimetry; Structural Modeling; Osmolytes

ABSTRACT

Hydrostatic pressure plays a vital role in the biological adaptation of piezophiles, organisms that live under high hydrostatic pressure. However, the mechanisms by which piezophiles adapt their proteins to high hydrostatic pressure are not well understood. One proposed hypothesis is that the volume change of unfolding (ΔVTot) for proteins from piezophiles is distinct from that of proteins from non-piezophilic organisms. Since ΔVTot defines the pressure dependence of stability, we performed a comprehensive computational analysis of this property for proteins from piezophilic and non-piezophilic organisms. In addition, we experimentally measured the ΔVTot of acylphosphatases and thioredoxins belonging to piezophilic and non-piezophilic organisms.

Based on this analysis we concluded that there is no difference in ΔVTot for proteins from piezophilic and non-piezophilic organisms. Finally, we put forward the hypothesis that increased concentrations of osmolytes can provide a systemic increase in pressure stability of proteins from piezophilic organisms and provide experimental thermodynamic evidence in support of this hypothesis.

1. INTRODUCTION

Single-celled organisms represent the dominant form of life on earth 1. Exploration has led us to appreciate not only the diversity of life on earth, but also the diversity of the environments in which these organisms live 2. Until recently, it was thought that life could not exist in extreme conditions such as the bottom of the ocean or deep subsurface environments 2-5. However, collection of pristine samples from such environments, and exploration of deep caves and mines, has shown that many life forms, including bacteria, archaea, and eukarya, exist in places previously thought to be uninhabitable 2,6. Examination of the genomic data from these samples has led to the estimate that the vast majority of life exists in the depths of the oceans and buried deep in the subsurface crust 1,6. Such studies have suggested that there is a trove of species living in the deepest parts of the earth’s crust waiting to be discovered. For example, it is known that the crust of the earth is not uniform and large cracks, or fissures, penetrate kilometers into the earth 2. These fissures are full of water, and because they are so deep, the heat from the mantle of the earth is able to warm these waters 2. Minerals are then able to dissolve into these waters, providing the primary ingredients to sustain life 6. Interestingly, similar conditions have been proposed to be drivers of life on Jupiter’s moon Europa. On Europa, it has been suggested that there is sufficient geological and tidal activity in its ocean to provide energy for life 7. Should this hypothesis hold true, it would be possible for life forms to exist in the liquid phase of its oceans, which reach depths of up to 170 km 7.

While extra-terrestrial life remains a hypothesis, isolation and culturing of organisms living in extreme conditions on earth has shown that these organisms are very sensitive to changes in their environment 8-12. For example, when an organism that is adapted to more “mild” (atmospheric pressure) conditions is subjected to the extreme condition (high pressure), it is unable to grow 8. This is because changes in pressure, much like changes in temperature, have deleterious effects on biomacromolecules, including changes in the fluidity of membranes, the dissociation of protein-ligand complexes, and denaturation of proteins 9.

The response of a system to changes in pressure is described by Le Chatelier’s principle, which states that when a system in equilibrium is perturbed, it will re-equilibrate to a new state that minimizes the effect of the perturbation. Thermodynamically this can be written as:

$$\left(\frac{\partial \Delta G}{\partial P}\right)_T = \Delta V_{Tot} \qquad (1)$$

where ΔG is the Gibbs energy change, ΔVTot is the volume change, P is the pressure and T is the temperature. In the case of pressure unfolding of proteins, the system is the protein in an aqueous environment, the perturbation is the application of pressure, and the equilibrium is between the folded and unfolded states. The majority of proteins have a negative ΔVTot of unfolding 13-15, i.e. the volume of the unfolded state is smaller than the volume of the native state. Thus, an increase in pressure will lead to protein unfolding. Interestingly, there are examples of proteins that have a positive ΔVTot, which suggests that the folded state will become more stable at increased pressure 15. Based on these observations, one of the possible mechanisms of adaptation to high hydrostatic pressure is that proteins from organisms that live under elevated pressure evolved to have less negative, or even positive, volume changes upon unfolding. This hypothesis has been tested in the present work.
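As a rough illustration of Equation 1, integrating at constant ΔVTot gives a stability shift of ΔVTot·ΔP. The sketch below converts a per-molecule volume change in Å³ to molar units and evaluates that shift. The function names and the example inputs (a ΔVTot of -221 Å³ and a 100 MPa pressure increase) are chosen for illustration only, and any pressure dependence of ΔVTot is ignored.

```python
AVOGADRO = 6.02214076e23   # molecules per mole
A3_TO_M3 = 1.0e-30         # one cubic angstrom in cubic metres


def molar_volume_change(dv_a3: float) -> float:
    """Convert a volume change per molecule (A^3) to m^3/mol."""
    return dv_a3 * A3_TO_M3 * AVOGADRO


def pressure_shift_in_dg(dv_a3: float, delta_p_mpa: float) -> float:
    """Change in the Gibbs energy of unfolding (kJ/mol) for a pressure
    increase of delta_p_mpa, assuming dV_Tot does not depend on pressure."""
    dv_molar = molar_volume_change(dv_a3)            # m^3/mol
    joules_per_mol = dv_molar * delta_p_mpa * 1.0e6  # Pa * m^3 = J
    return joules_per_mol / 1000.0


if __name__ == "__main__":
    shift = pressure_shift_in_dg(-221.0, 100.0)
    print(f"Shift in unfolding Gibbs energy: {shift:.1f} kJ/mol")
```

With these inputs the shift is about -13 kJ/mol, which is comparable in magnitude to the total stability of many small proteins; this is why the sign and size of ΔVTot matter for pressure tolerance.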

We have previously established a method to compute the ΔVTot for proteins using their three-dimensional structures 15,16. To better understand the molecular mechanisms of adaptation to high pressure, we have applied this formalism to the ΔVTot analysis of the proteomes of organisms living under diverse environmental conditions, which include psychro-piezophiles (high-pressure, low-temperature adapted organisms), thermo-piezophiles (high-temperature, high-pressure adapted organisms), thermophiles (high-temperature adapted organisms), and psychrophiles (low-temperature adapted organisms). We also analyzed the ΔVTot on a protein-by-protein basis, by finding homologous matches across the four groups mentioned above, and compared them to the proteins from mesophilic organisms. Finally, we experimentally measured and computed ΔVTot for two protein families, acylphosphatase and thioredoxin, using representative structures from these classes of organisms.

2. MATERIALS AND METHODS

Obtaining and Preparing Structures from PDB

Groups of Archaea and Bacteria belonging to psychrophiles, psychro-piezophiles, thermophiles, and thermo-piezophiles were identified that had structures deposited in the Protein Data Bank (PDB). Effort was taken to ensure that a diverse group of organisms was represented in the study; thus both bacterial and archaeal species were included in the four groups. Some organisms, such as Thermus thermophilus, are overrepresented in the Protein Data Bank, having more than 10,000 deposited structures; in such cases, the organism was left out of the initial analysis. This was done to minimize bias towards any single organism. Only structures solved by X-ray crystallography were included in the analysis.

Structures were obtained using the RESTful web service of the PDB. An XML query containing the PDB Tree Entity ID for an organism, a chain-type filter selecting only proteins (protein: yes; DNA: no; RNA: no; DNA/RNA hybrid: no), and an X-ray resolution of < 2.7 Å was uploaded to http://www.rcsb.org/pdb/rest/search. The search returned a list of PDB IDs matching these criteria. All structures were analyzed as monomers. In cases of homomultimers, chain A was selected for further analysis. In cases of heteromultimers, the first representative chain of each sequence was chosen for further analysis. Any structures that were discontinuous due to missing amino acids were discarded. To ensure low redundancy in the analysis, proteins with multiple structures were identified; the longest continuous structure was kept and the rest were discarded. Sequences containing modified or unnatural amino acids were discarded, with the sole exception of sequences containing selenomethionine. The Modeller software package was used to model in missing side-chain atoms 17. A complete list of PDB IDs is provided in Table S2.
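The chain-level selection rules above can be sketched as a simple filter. This is an illustrative reconstruction rather than the pipeline actually used: the Chain record, its protein_key field, and the use of gaps in residue numbering as a proxy for missing residues are all assumptions, and the 50 to 500 residue size window is taken from the Results section.

```python
from dataclasses import dataclass

STANDARD = {
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
}
ALLOWED = STANDARD | {"MSE"}  # selenomethionine is the only tolerated modification


@dataclass
class Chain:
    protein_key: str        # hypothetical identifier shared by structures of one protein
    pdb_id: str
    chain_id: str
    residue_names: list     # three-letter codes, in order
    residue_numbers: list   # residue numbers, in order


def is_continuous(chain: Chain) -> bool:
    """True when residue numbering has no gaps (a proxy for missing residues)."""
    nums = chain.residue_numbers
    return all(b - a == 1 for a, b in zip(nums, nums[1:]))


def has_allowed_residues(chain: Chain) -> bool:
    """True when every residue is a standard amino acid or selenomethionine."""
    return all(name in ALLOWED for name in chain.residue_names)


def select_chains(chains, min_len=50, max_len=500):
    """Apply the size, continuity and composition filters, then keep the
    longest remaining chain for each protein."""
    best = {}
    for c in chains:
        n = len(c.residue_names)
        if not (min_len <= n <= max_len):
            continue
        if not (is_continuous(c) and has_allowed_residues(c)):
            continue
        current = best.get(c.protein_key)
        if current is None or n > len(current.residue_names):
            best[c.protein_key] = c
    return list(best.values())
```

Keeping one chain per protein in a dictionary makes the redundancy rule explicit: a later, longer continuous structure simply replaces the earlier entry.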

Generating the Native and Unfolded State Ensembles

To generate the native state ensemble, an all-atom, explicit-solvent MD simulation of the native state was performed using GROMACS, as previously described 15,18. The CHARMM27 force field and the TIP3P water model were used to model the protein-water system 19,20. The crystal structures were solvated in a dodecahedral box such that all of the protein atoms were at least 10 Å away from the edges of the box. Charges on the protein were neutralized by the addition of 0.1 M NaCl. An initial steepest-descent energy minimization was performed for 1000 steps or until the maximum force fell below 1,000 kJ/mol/nm. This was followed by 200 ps of constant-volume equilibration and then 100 ps of constant-pressure equilibration. Lastly, a 50 ns production run was performed at a temperature of 300 K and a pressure of 1 bar. Pressure was controlled using the Parrinello-Rahman barostat, with a 2 ps relaxation time and a compressibility of 4.6·10⁻⁵ atm⁻¹ 21. Temperature was controlled with the V-rescale thermostat, with a relaxation time of 0.1 ps 18. High-frequency bond vibrations were constrained using the LINCS and SETTLE algorithms 22. Electrostatic interactions were modelled using smooth Particle Mesh Ewald (PME), with a PME order parameter of four and a Fourier spacing of 0.16 23. At the end of the production run, 50 structures were extracted from the trajectory at a sampling frequency of one structure per nanosecond to obtain the native state ensemble.
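The production-run settings described above could be collected as GROMACS .mdp-style key/value pairs, as in the sketch below. This is a hedged reconstruction, not the authors' input file. The setting constraints = h-bonds is an assumption (the text names only the LINCS and SETTLE algorithms), the Fourier spacing is passed through as quoted, the compressibility is converted from atm⁻¹ to the bar⁻¹ that GROMACS expects, and the exact sampling time points (1 to 50 ns) are also assumed.

```python
ATM_IN_BAR = 1.01325  # 1 atm expressed in bar


def production_mdp() -> dict:
    """Production-run settings from the text as .mdp-style key/value pairs."""
    compressibility_per_atm = 4.6e-5
    return {
        "tcoupl": "V-rescale",
        "tau_t": 0.1,              # ps
        "ref_t": 300,              # K
        "pcoupl": "Parrinello-Rahman",
        "tau_p": 2.0,              # ps
        "ref_p": 1.0,              # bar
        # GROMACS expects bar^-1, so convert the atm^-1 value quoted in the text
        "compressibility": compressibility_per_atm / ATM_IN_BAR,
        "coulombtype": "PME",
        "pme_order": 4,
        "fourierspacing": 0.16,    # value from the text; read by GROMACS in nm
        "constraints": "h-bonds",  # assumption: only the algorithms are named
        "constraint_algorithm": "lincs",
    }


def format_mdp(params: dict) -> str:
    """Render the settings in the 'key = value' layout of an .mdp file."""
    return "\n".join(f"{key:<22} = {value}" for key, value in params.items())


def sampling_times_ns(run_length_ns: int = 50, interval_ns: int = 1) -> list:
    """Time points (ns) at which structures are taken from the production run."""
    return list(range(interval_ns, run_length_ns + 1, interval_ns))


if __name__ == "__main__":
    print(format_mdp(production_mdp()))
    print(len(sampling_times_ns()), "structures in the native state ensemble")
```

A real input file would also need entries that the text does not specify, such as the time step and the temperature-coupling groups, so those are deliberately left out here.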

The unfolded state ensemble containing 1000 structures for each protein was generated using Trajectory Directed Ensemble Sampling (TraDES) 24. Secondary structure propensity was removed using the all-coil flag in TraDES software (-c T). The generated structures were subsequently energy minimized in implicit solvent as described above.

Calculating Volume of Proteins

The calculation of protein volume in solution has been described by us in detail previously and is summarized here briefly 15,16. The total volume of a protein in solution can be described by equation 2, where VTot refers to the total volume of a protein in solution, VSE refers to the solvent-excluded volume, and VHyd refers to the hydration volume.

$$V_{Tot} = V_{SE} + V_{Hyd} \qquad (2)$$

The solvent-excluded volume consists of the protein’s void (VVoid) and van der Waals (VvdW) volumes:

$$V_{SE} = V_{Void} + V_{vdW} \qquad (3)$$

The void volume arises because the atoms of a protein do not pack at 100% efficiency; thus there is unoccupied space that is also shielded from the solvent 15,25. The van der Waals volume is derived from the van der Waals radii of the atoms. The hydration volume, VHyd, is the result of the interaction of the protein molecule with its surrounding water molecules. This interaction affects the density of water at the surface of the protein relative to the density of bulk water 15. As a result, the hydration volume depends on the surface area that interacts with water 15. While it has been established that the contribution of the hydration volume is always positive, its magnitude depends on the total exposed polar and non-polar surface areas 15. The contribution of each type of surface area is defined by the non-polar (kNP) and polar (kP) coefficients, kNP = 0.38 Å and kP = 0.03 Å, which were derived empirically from a linear fit to the dependence of the hydration volume of model compounds on each type of surface area 15, i.e.:

$$V_{Hyd} = (k_{NP} \times MSA_{NP}) + (k_{P} \times MSA_{P}) \qquad (4)$$

where MSANP and MSAP are the non-polar and polar molecular surface areas, respectively.

The change in volume upon unfolding, ΔVTot, can be calculated as the sum of the change in void volume (ΔVVoid) and the change in hydration volume (ΔVHyd) upon unfolding 15:

$$\Delta V_{Tot} = V_{Void,U} - V_{Void,N} + V_{Hyd,U} - V_{Hyd,N} = \Delta V_{Void} + \Delta V_{Hyd} \qquad (5)$$

where ΔVVoid and ΔVHyd are calculated by subtracting the respective native-state values from the unfolded-state values. The VvdW volumes do not change upon protein unfolding and are therefore not considered when calculating ΔVTot 15.
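Equations 4 and 5 can be sketched in a few lines once the void volume and the non-polar and polar surface areas are available for every structure in the two ensembles. In the sketch below, each structure is represented as a hypothetical (void volume, non-polar area, polar area) tuple, and the ensemble values are simple means; both choices are assumptions made for illustration. The coefficients are the values quoted in the text.

```python
from statistics import mean

K_NP = 0.38  # non-polar hydration coefficient (A), value quoted in the text
K_P = 0.03   # polar hydration coefficient (A), value quoted in the text


def hydration_volume(msa_np: float, msa_p: float) -> float:
    """Equation 4: hydration volume (A^3) from surface areas (A^2)."""
    return K_NP * msa_np + K_P * msa_p


def ensemble_average(structures):
    """Mean void and hydration volumes over (v_void, msa_np, msa_p) tuples."""
    v_void = mean(s[0] for s in structures)
    v_hyd = mean(hydration_volume(s[1], s[2]) for s in structures)
    return v_void, v_hyd


def delta_v_tot(native, unfolded) -> float:
    """Equation 5: unfolded-minus-native change in void plus hydration volume."""
    void_n, hyd_n = ensemble_average(native)
    void_u, hyd_u = ensemble_average(unfolded)
    return (void_u - void_n) + (hyd_u - hyd_n)
```

The van der Waals volume never appears in the code because it cancels between the native and unfolded states, as noted above.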

Protein expression, purification and quantification

Modern thioredoxin (Trx) genes from the following organisms: Escherichia coli (EC), Methanococcus jannaschii (MJ), Pyrococcus yayanosii (PY), Shewanella benthica (SB), and Thermus thermophilus (TT) have been expressed and purified as described previously 26. In addition, ancestral Trx protein sequences reconstructed using Ancestral Sequence Reconstruction (ASR) were used 27. The recombinant ancestral Trx genes from the last bacterial common ancestor (LBCA), last archaeal common ancestor (LACA), archaeal-eukaryotic common ancestor (AECA), last common ancestor of the cyanobacterial and Thermus groups (LPBCA), last eukaryotic common ancestor (LECA), last common ancestor of γ-proteobacteria (LGPCA), and the last common ancestor of animals and fungi (LAFCA) were expressed and purified as described 26,27. Modern acylphosphatase (Acp) sequences from the following organisms: Bacillus subtilis (BS), Marinitoga piezophila (MP), Thermus thermophilus (TT), and Sulfolobus tokodaii (ST), with engineered His-tags at the N-terminus, were codon-optimized for expression in E. coli and cloned into the pGia vector (Blue Heron, Bothell, WA).

Protein expression was carried out in the BL21(DE3) E. coli strain in LB medium (18 g tryptone, 10 g yeast extract, and 5 g sodium chloride per 1 L) at 37 °C. Overexpression of protein was induced when the OD600 reached ~0.8 optical units, using isopropyl-β-D-1-thiogalactopyranoside (IPTG) at a final concentration of 1 mM. Cells were harvested by centrifugation 6-10 hours post induction. The supernatant was discarded and the remaining pellets were stored at -20 °C prior to further purification. Frozen cell pellets were thawed and lysed using a French pressure cell. The resulting lysate was diluted twice with binding buffer (20 mM sodium phosphate, 500 mM sodium chloride, and 30 mM imidazole, pH 7.4). Cellular debris was removed by centrifugation and the resulting supernatant was passed through a Nickel-Nitrilotriacetic Acid (Ni-NTA) agarose column equilibrated with the same binding buffer. The stationary phase was washed with the binding buffer until the flow-through OD280 decreased to a value of ~0.01 optical units. Protein was eluted using the elution buffer (20 mM sodium phosphate, 500 mM sodium chloride, 500 mM imidazole, pH 7.4). The eluate was concentrated to approximately 30 ml using centrifugal filter units (Millipore, Billerica, MA) with a 3.5 kDa molecular weight cutoff (MWCO) and loaded onto a Sephadex G-50 column (2.5 × 170 cm) equilibrated with a pH 7.4 buffer (20 mM sodium phosphate, 500 mM NaCl). Fractions containing protein were combined and concentrated to 10-15 ml using centrifugal units. The purity of the protein was evaluated by sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE). The protein solution was aliquoted into 2 ml Eppendorf tubes, flash-frozen, and stored at -20 °C.

Differential scanning calorimetry (DSC) and pressure perturbation calorimetry (PPC)

Before carrying out DSC and PPC experiments, protein samples were dialyzed extensively using a Fisherbrand dialysis membrane with a 3.5 kDa MWCO. For acidic pH ranges, 30 mM glycine-HCl (pH 2.0-3.4) and 30 mM sodium acetate (pH 3.5-5.0) buffers were used. Buffers were chosen to minimize both enthalpy and volume changes upon ionization 28-30. After overnight dialysis, the samples were centrifuged for at least 15 minutes at 4 °C and 14,000 rpm (20,800 × g) in an Eppendorf 5417R microcentrifuge in order to pellet any insoluble material present after dialysis. A Hitachi U-2900 dual-beam spectrophotometer was employed to measure near ultraviolet-visible (UV-Vis) absorption spectra of the proteins using 10 mm path-length rectangular quartz cuvettes. Protein concentration was determined after accounting for light scattering as described 31. The molar extinction coefficients (ε280nm) and partial specific volumes (ῡpr) for the Trx and Acp variants are given in Table S1. Extinction coefficients for Trx and Acp were calculated as described previously 32. Partial specific volumes for the Trx and Acp variants were calculated from the amino acid sequence as previously described 33.

DSC experiments were performed using a VP-DSC instrument (MicroCal Inc./GE Healthcare) with Trx and Acp protein concentrations of 0.3-1.5 mg/ml and a scan rate of 90 °C/hour. Reversibility of the protein unfolding transitions was determined by stopping the protein-unfolding scan just after the completion of the transition, allowing the cells to cool down, and observing another transition as the temperature was increased once again. The studied proteins had 80-90% reversibility, as calculated by dividing the fitted calorimetric unfolding enthalpy of the second scan (re-melt) by that of the first scan. The experimental DSC data were analyzed according to a two-state model using an in-house NLREG script as described 34. A summary of the experimentally measured enthalpies of unfolding, ΔH, and transition temperatures, Tm, is shown in Figure S1.
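The two quantities used in this analysis, the reversibility ratio and the two-state population of the unfolded state, can be sketched as follows. This is not the NLREG script referred to above; the function names are hypothetical, and the stability expression is the standard two-state form with a temperature-independent heat capacity change.

```python
import math

R = 8.314462618e-3  # gas constant, kJ/(mol K)


def reversibility_percent(dh_first_scan: float, dh_rescan: float) -> float:
    """Fitted unfolding enthalpy of the re-melt as a percentage of the first scan."""
    return 100.0 * dh_rescan / dh_first_scan


def fraction_unfolded(temp_k: float, tm_k: float, dh_tm: float, dcp: float = 0.0) -> float:
    """Two-state population of the unfolded state at temp_k.

    dh_tm is the unfolding enthalpy at Tm in kJ/mol and dcp the heat capacity
    change in kJ/(mol K), assumed independent of temperature.
    """
    dg = dh_tm * (1.0 - temp_k / tm_k) + dcp * (
        temp_k - tm_k - temp_k * math.log(temp_k / tm_k)
    )
    k_eq = math.exp(-dg / (R * temp_k))
    return k_eq / (1.0 + k_eq)
```

For example, a re-melt enthalpy of 340 kJ/mol after a first-scan enthalpy of 400 kJ/mol (made-up numbers) corresponds to 85% reversibility, within the 80-90% range reported above.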

PPC was performed using an additional pressure-cap attachment with the VP-DSC instrument, as has been described previously 35-37. Experimental scans were carried out at protein concentrations of 2.0-5.0 mg/ml. The coefficient of thermal expansion of a protein as a function of temperature, αpr(T), was measured directly as the difference between the heat produced by the protein in the sample cell and that of the buffer in the reference cell, using pressure jumps of ~0.55 MPa. Duplicate measurements of heat changes upon compression and decompression were performed every five degrees in the range of 2-97 °C. Data were collected every two degrees in the unfolding transition temperature ranges to better describe the transition profile. Prior to running the experiments, baseline experiments with sample-reference cells containing buffer-buffer, buffer-water, and water-water were performed. Heat effects arising from the buffer component and water were accounted for as follows 36:

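$$\alpha_{pr} = \alpha_{H_2O} - \frac{\Delta Q_{buf-H_2O}}{T \cdot \Delta P \cdot V_{cell}} - \frac{\Delta Q_{pr-buf}}{T \cdot \Delta P \cdot m_{pr} \cdot \bar{\upsilon}_{pr}} \qquad (6)$$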

where αH2O is the coefficient of thermal expansion of water, ∆Qbuf-H2O and ∆Qpr-buf are the heat changes for the buffer and protein solutions, ∆P is the change in pressure of the system, Vcell is the volume of the calorimetric sample cell, and mpr is the mass of the protein in solution. The overall experimental αpr,exp(T) profile was fit as follows using in-house NLREG scripts 36:

$$\alpha_{pr}^{exp}(T) = F_N(T) \cdot \alpha_N(T) + F_U(T) \cdot \alpha_U(T) + \frac{K_{eq}(T)}{(1+K_{eq}(T))^2} \cdot \frac{\Delta H(T) \cdot \Delta V_{Pr}}{R T^2 \cdot \bar{\upsilon}_{Pr}} \qquad (7)$$

in which FN(T) and FU(T) describe the temperature dependencies of the fractions of folded and unfolded protein, αN(T) and αU(T) describe the temperature dependencies of the thermal expansion coefficients of the folded and unfolded states, Keq(T) is the temperature-dependent equilibrium constant of protein unfolding, and R is the universal gas constant. A summary of the experimentally measured enthalpies of unfolding, ΔH, the volume changes upon unfolding, ΔV, and the transition temperatures, Tm, for the acylphosphatase and thioredoxin protein families is shown in Figure S1. Values linearly extrapolated to 25°C were used for comparison with the computed values in Figure 3.
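Equations 6 and 7 can be sketched as two small functions. This is an illustration under stated assumptions rather than the fitting script used for the data: the transition term is written in the standard two-state form, (ΔV/ῡ)·dFU/dT with dFU/dT = Keq/(1+Keq)²·ΔH/(RT²), the unfolding enthalpy is taken as temperature independent, the native and unfolded baselines are taken as constants, and the volume change is expressed per gram so that its ratio to ῡ is dimensionless.

```python
import math

R = 8.314462618  # gas constant, J/(mol K)


def alpha_protein(alpha_water, dq_buf_water, dq_pr_buf, temp_k, delta_p,
                  v_cell, m_pr, v_bar_pr):
    """Equation 6: protein thermal expansion coefficient (1/K) from PPC heats.

    Heats in J, pressure jump in Pa, cell volume in m^3, protein mass in g and
    partial specific volume in m^3/g, so that each ratio has units of 1/K.
    """
    buffer_term = dq_buf_water / (temp_k * delta_p * v_cell)
    protein_term = dq_pr_buf / (temp_k * delta_p * m_pr * v_bar_pr)
    return alpha_water - buffer_term - protein_term


def equilibrium_constant(temp_k, tm_k, dh):
    """Two-state unfolding constant for a constant enthalpy dh (J/mol)."""
    return math.exp(-dh * (1.0 - temp_k / tm_k) / (R * temp_k))


def transition_term(temp_k, tm_k, dh, dv_specific, v_bar_pr):
    """Excess expansivity of the transition, (dV / v_bar) * dF_U/dT."""
    k_eq = equilibrium_constant(temp_k, tm_k, dh)
    df_u_dt = k_eq / (1.0 + k_eq) ** 2 * dh / (R * temp_k ** 2)
    return dv_specific / v_bar_pr * df_u_dt


def alpha_model(temp_k, tm_k, dh, dv_specific, v_bar_pr, alpha_n, alpha_u):
    """Equation 7 with constant native and unfolded baselines."""
    k_eq = equilibrium_constant(temp_k, tm_k, dh)
    f_u = k_eq / (1.0 + k_eq)
    return ((1.0 - f_u) * alpha_n + f_u * alpha_u
            + transition_term(temp_k, tm_k, dh, dv_specific, v_bar_pr))
```

Because the transition term is proportional to the volume change, a negative ΔV appears as a dip in the expansivity profile centered near Tm, and its area gives the volume change of unfolding.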

Thermodynamic parameters obtained from DSC and PPC experiments were used to compute the stability of a protein using the following equation 38:

$$\Delta G = \Delta H \frac{(T_m - T)}{T_m} + \Delta c_p (T - T_m) - \Delta c_p T \ln\left(\frac{T}{T_m}\right) + \Delta V (p - p_o) + \Delta\alpha (T - T_m)(p - p_o) + \frac{\Delta\beta}{2}(p - p_o)^2 \qquad (8)$$

where ΔH is the enthalpy of unfolding at Tm, ΔV is the volume change upon unfolding at Tm, and po is the reference pressure (0.1 MPa). Δcp is the change in heat capacity (calculated from the temperature dependence of the enthalpy of unfolding; see Figure S1), ∆α is the change in thermal expansivity (calculated from the temperature dependence of the volume change upon unfolding; see Figure S1), and Δβ is the change in compressibility upon unfolding, taken from 39. A pressure-temperature (P-T) stability phase diagram can be generated by setting ΔG(p, T) = 0, which defines the boundaries of stability for the protein (Figure 4). The area under the line corresponds to the pressure and temperature ranges in which the protein will be largely in the native state. The area above the line describes the conditions in which the protein is largely in the unfolded state 40.

3. RESULTS

Computational analysis was performed on the protein structures deposited in the Protein Data Bank (PDB) from the four groups of organisms: thermophiles, psychrophiles, thermo-piezophiles, and psychro-piezophiles. Proteins ranging in size from 50 to 500 amino acid residues were selected, and the final number of structures for each group was as follows: 254 for thermo-piezophiles, 421 for thermophiles, 52 for psychro-piezophiles and 84 for psychrophiles. In addition, 171 structures from mesophiles that were analyzed previously 15 were included. The complete list of PDB IDs for each group can be found in Table S2. Despite the relatively small number of structures in the data sets, they possess features that have been identified from larger analyses of protein sequences. For example, the amino acid composition of thermophilic proteins in our data set shows an increased fraction of charged residues and a decreased fraction of polar residues, while the psychrophilic proteins have an increased fraction of polar residues and a decreased fraction of charged residues (see Figure S2), an observation reported previously by Claverie and colleagues as the CvP-bias 41,42.

To compare the volume change of unfolding, the ΔVTot of each protein from the compiled datasets was calculated as described in Materials and Methods. Figure 1A shows ΔVTot as a function of the number of amino acid residues. It is evident that although ΔVTot becomes more negative with increasing protein size, this trend is similar for the five sets of proteins (thermophiles, psychrophiles, mesophiles, thermo-piezophiles, and psychro-piezophiles). Smaller proteins have a ΔVTot that is close to zero, while larger proteins have increasingly more negative ΔVTot. Furthermore, the variance of the data increases with protein size. Figure 1B compares the ΔVTot values for the five datasets in the form of a box plot. The median ΔVTot values are -310 Å³, -279 Å³, -221 Å³, -273 Å³ and -236 Å³ for thermophiles, psychrophiles, mesophiles, thermo-piezophiles, and psychro-piezophiles, respectively. A two-way Analysis of Variance performed on the five datasets returned a P-value of 0.39, i.e. no statistically detectable difference among the groups. Furthermore, the pairwise Student’s t-test p-values range between 0.09 and 0.95, also indicating that there is no significant difference in the ΔVTot for the proteins in these five groups.
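The kind of group comparison summarized above can be sketched with the Python standard library. The sketch is a simplified illustration, not the analysis actually performed: it computes a one-way ANOVA F statistic (grouping by organism class only, whereas the text reports a two-way analysis) and a pooled-variance Student t statistic, and it stops short of p-values, which would require the F and t distributions (available, for example, through scipy.stats.f_oneway and scipy.stats.ttest_ind).

```python
from statistics import mean, median, variance


def group_medians(groups: dict) -> dict:
    """Median value for every named group."""
    return {name: median(values) for name, values in groups.items()}


def anova_f(groups: dict) -> float:
    """One-way ANOVA F statistic: between-group over within-group mean square."""
    samples = list(groups.values())
    n_total = sum(len(s) for s in samples)
    grand = sum(sum(s) for s in samples) / n_total
    ss_between = sum(len(s) * (mean(s) - grand) ** 2 for s in samples)
    ss_within = sum(sum((x - mean(s)) ** 2 for x in s) for s in samples)
    df_between = len(samples) - 1
    df_within = n_total - len(samples)
    return (ss_between / df_between) / (ss_within / df_within)


def student_t(a, b) -> float:
    """Pooled-variance (Student) t statistic for two independent samples."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / (pooled * (1.0 / na + 1.0 / nb)) ** 0.5
```

In use, each key of the groups dictionary would be one organism class and each value the list of computed ΔVTot values for that class. An F statistic close to 1 and small t statistics correspond to the large p-values reported above.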

The proteome-wide calculations of ΔVTot indicated that there is no discernible difference across thermophiles, psychrophiles, mesophiles, thermo-piezophiles, and psychro-piezophiles. To further illustrate this point, we computed the ΔVTot of individual homologous structures from the five groups. To this end, we analyzed six different protein families for which structures are available across the five groups. A comparison of the computed ΔVTot values is shown in Figure 2 (see also Figure S3 for an expanded set of data). Despite variance in the individual ΔVTot values, there is no clear trend suggesting that proteins belonging to a particular group (i.e. piezophilic vs non-piezophilic organisms) have a distinct ΔVTot.

Finally, we experimentally measured thermodynamic properties, including the ΔVTot, for representative members of two protein families: acylphosphatase (Acp) and thioredoxin (Trx) (see Materials and Methods section and Figure S1). The acylphosphatases studied were from the mesophilic Bacillus subtilis (Acp-BS), the thermophilic Thermus thermophilus (Acp-TT) and Sulfolobus tokodaii (Acp-ST), and the thermo-piezophilic Marinitoga piezophila (Acp-MP) and Pyrococcus yayanosii (Acp-PY) organisms. The thioredoxins studied were from the psychro-piezophilic Shewanella benthica (Trx-SB), the mesophilic Escherichia coli (Trx-EC), and the thermophilic Thermus thermophilus (Trx-TT) and Methanococcus jannaschii (Trx-MJ) organisms. In addition, we included six thioredoxins that were obtained using ancestral sequence reconstruction (ASR). The details of the ASR are described in Perez-Jimenez et al. 27; based on their constructed phylogenetic tree, the sequences are dated to be between 1.6 and 4.2 Gyr old. The experimental ΔVTot values for all these proteins at 25°C are shown in Figure 3.

The three-dimensional structures for the Acp and Trx proteins studied experimentally are available (see Table S1 for PDB codes) and were used to compute the ΔVTot using the procedure described above. The computed and experimentally determined ΔVTot values are compared in Figure 3 and show good agreement between the two data sets. This, together with our previously published evidence 15, further supports the validity of our computational approach to calculating the volume changes upon protein unfolding.

In addition, the experimental data in Figure 3 show two important features. First, there is no trend suggesting a difference in the absolute values of the experimentally measured ΔVTot for proteins from piezophilic versus non-piezophilic organisms. Second, the ΔVTot values for ancestral thioredoxins show a variance similar to that of extant proteins, suggesting that this parameter is not conserved during evolution. The experimental results, analyzed together with the proteome-level calculated ΔVTot values, led us to conclude that the ΔVTot of proteins from piezophilic organisms is comparable to the ΔVTot of proteins from non-piezophilic organisms.

4. DISCUSSION

The conclusion we reached above refutes our original hypothesis that the modulation of volume changes upon unfolding of proteins can be used as a mechanism to cope with high hydrostatic pressure. We find no evidence in support of this hypothesis and thus conclude that modulating ΔVTot is not a mechanism by which piezophilic organisms evolved to stabilize their proteins against high hydrostatic pressure. Yet it is known that life exists in the deepest parts of the ocean, where hydrostatic pressure reaches 100-120 MPa 3-5,11,12,43-45. This suggests that there must be non-volume-based mechanisms that stabilize proteins against pressure denaturation.

Proteins with higher thermodynamic stability will have an expanded pressure range in which they remain native. Conversely, proteins that do not have high stability will be more susceptible to high hydrostatic pressure. This can be seen from the P-T diagrams constructed using experimental data for the Acp and Trx protein families (Figure 4A-B). Thus, factors that increase the net protein thermostability play a role. These factors, such as an increase in the number of hydrogen bonds, optimized electrostatic interactions, changes in the conformational entropy of the native and unfolded states, an increase in secondary structure propensities, etc., have been well documented 46-49. Yet, it has been shown that different proteins adopt different strategies to increase their thermostability 50. There is, however, one mechanism of thermostabilization that can act on all proteins and may be relevant for piezophilic organisms: the stabilizing effect of protective osmolytes. Stabilizing osmolytes, such as trimethylamine oxide (TMAO), glutamate, betaine, alanine, β-hydroxybutyrate (β-HB), mannosylglycerate, di-myo-inositol phosphate and N-acetyl-β-lysine, are present in high concentrations in many piezophilic organisms 9,51,52. These molecules were shown to increase protein stability (see e.g. references 53-60), and this increase in stability is not accompanied by volume changes 61. Such an increase in stability may have a physiological role in piezophiles. For example, Photobacterium profundum is a well-studied facultative piezophile that grows optimally at 200-300 bar but is also able to grow at atmospheric pressure. Growth studies of this organism at atmospheric and high pressures have shown that P. profundum concentrates osmolytes such as betaine and glutamate (both of which are stabilizing osmolytes) when incubated at high pressures 62. The presence of elevated concentrations of osmolytes is not unique to single-celled organisms. Deep-sea animals such as mussels, skates, and teleosts have also been observed to have high levels of TMAO or another type of osmolyte, such as scyllo-inositol, β-alanine, or betaine 51,52,63. Moreover, a correlation between the concentration of TMAO and the habitat depth (i.e. increase in hydrostatic pressure) of certain fish is well documented 64,65. Thus, a likely mechanism for piezophilic adaptation is the net increase in stability due to an increase in the concentration of osmolytes. Addition of osmolytes can produce a significant modulation in the P-T stability diagram. For example, the modulation of the P-T stability of a model protein, RNase A, by various osmolytes (see Figure 4C) significantly increases the tolerance to high hydrostatic pressure and is expected to produce qualitatively similar stabilizing effects on any protein. Such a mechanism is much easier to implement from an evolutionary perspective, as it will work on a systemic level and will stabilize all proteins. Moreover, osmolytes usually have stabilizing effects on protein interactions with ligands or substrates, again providing a net increase in stability and thus an increase in pressure tolerance on a functional level, beyond just protein structure.


Supplementary Information

The Supplementary Information includes tables with PDB codes and additional figures (2 separate PDF files).

Acknowledgements

Supported by grants CHEM-1506468 and CHEM-1803045 from the US National Science Foundation (awarded to G.I.M.) and the RPI Presidential Fellowship awarded to S.A. This work used the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by National Science Foundation grant number ACI-1548562, through allocation TG-MCB140107 on Comet (SDSC) and Stampede2 (TACC).

Conflict of Interest

The authors declare no conflict of interest.

REFERENCES

1. Whitman WB, Coleman DC, Wiebe WJ. Prokaryotes: The unseen majority. Proc Natl Acad Sci U S A 1998;95(12):6578-6583.
2. Onstott TC. Deep Life: The Hunt for the Hidden Biology of Earth, Mars, and Beyond. Princeton, United States: Princeton University Press; 2016.
3. McNichol J, Stryhanyuk H, Sylva SP, Thomas F, Musat N, Seewald JS, Sievert SM. Primary productivity below the seafloor at deep-sea hot springs. Proc Natl Acad Sci U S A 2018;115(26):6756-6761.
4. Kallmeyer J, Pockalny R, Adhikari RR, Smith DC, D'Hondt S. Global distribution of microbial abundance and biomass in subseafloor sediment. Proc Natl Acad Sci U S A 2012;109(40):16213-16216.
5. Bar-On YM, Phillips R, Milo R. The biomass distribution on Earth. Proc Natl Acad Sci U S A 2018;115(25):6506-6511.
6. Magnabosco C, Lin LH, Dong H, Bomberg M, Ghiorse W, Stan-Lotter H, Pedersen K, Kieft TL, van Heerden E, Onstott TC. The biomass and biodiversity of the continental subsurface. Nature Geoscience 2018;11(10):707-717.
7. Chyba CF, Phillips CB. Possible ecosystems and the search for life on Europa. Proc Natl Acad Sci U S A 2001;98(3):801-804.
8. Bartlett DH. Pressure effects on in vivo microbial processes. Biochimica et Biophysica Acta (BBA) - Protein Structure and Molecular Enzymology 2002;1595(1-2):367-381.
9. Oger PM, Jebbar M. The many ways of coping with pressure. Research in Microbiology 2010;161(10):799-809.
10. Simonato F, Campanaro S, Lauro FM, Vezzi A, D'Angelo M, Vitulo N, Valle G, Bartlett DH. Piezophilic adaptation: a genomic point of view. J Biotechnol 2006;126(1):11-25.
11. Lauro FM, Bartlett DH. Prokaryotic lifestyles in deep sea habitats. 2008;12(1):15-25.
12. Abe F. Exploration of the effects of high hydrostatic pressure on microbial growth, physiology and survival: Perspectives from piezophysiology. Biosci Biotech Bioch 2007;71(10):2347-2357.
13. Royer CA. Application of pressure to biochemical equilibria: the other thermodynamic variable. Methods Enzymol 1995;259:357-377.
14. Chalikian TV. Volumetric properties of proteins. Annu Rev Biophys Biomol Struct 2003;32:207-235.
15. Chen CR, Makhatadze GI. Molecular determinant of the effects of hydrostatic pressure on protein folding stability. Nature Communications 2017;8:14561.
16. Chen CR, Makhatadze GI. ProteinVolume: calculating molecular van der Waals and void volumes in proteins. BMC Bioinformatics 2015;16(1):101.
17. Eswar N, Webb B, Marti-Renom MA, Madhusudhan MS, Eramian D, Shen M-Y, Pieper U, Sali A. Comparative protein structure modeling using Modeller. Current Protocols in Bioinformatics 2006;Chapter 5:Unit-5.6.
18. Pronk S, Páll S, Schulz R, Larsson P, Bjelkmar P, Apostolov R, Shirts MR, Smith JC, Kasson PM, van der Spoel D, Hess B, Lindahl E. GROMACS 4.5: a high-throughput and highly parallel open source molecular simulation toolkit. Bioinformatics (Oxford, England) 2013;29(7):845-854.

19. Brooks BR, Brooks CL, Mackerell AD, Nilsson L, Petrella RJ, Roux B. CHARMM: The Biomolecular Simulation Program. J Comput Chem 2009;30.
20. Jorgensen WL, Chandrasekhar J, Madura JD, Impey RW, Klein ML. Comparison of simple potential functions for simulating liquid water. The Journal of Chemical Physics 1983;79(2):926-935.
21. Parrinello M, Rahman A. Polymorphic transitions in single crystals: A new molecular dynamics method. Journal of Applied Physics 1981;52(12):7182-7190.
22. Hess B, Bekker H, Berendsen HJC, Fraaije JGEM. LINCS: A linear constraint solver for molecular simulations. Journal of Computational Chemistry 1997;18(12):1463-1472.
23. Essmann U, Perera L, Berkowitz ML, Darden T, Lee H, Pedersen LG. A smooth particle mesh Ewald method. The Journal of Chemical Physics 1995;103(19):8577-8593.
24. Feldman HJ, Hogue CWV. Probabilistic sampling of protein conformations: New hope for brute force? Proteins: Structure, Function, and Bioinformatics 2002;46(1):8-23.
25. Liang J, Dill KA. Are proteins well-packed? Biophysical Journal 2001;81(2):751-766.
26. Tzul FO, Vasilchuk D, Makhatadze GI. Evidence for the principle of minimal frustration in the evolution of protein folding landscapes. Proc Natl Acad Sci U S A 2017;114(9):E1627-E1632.
27. Perez-Jimenez R, Ingles-Prieto A, Zhao ZM, Sanchez-Romero I, Alegre-Cebollada J, Kosuri P, Garcia-Manyes S, Kappock TJ, Tanokura M, Holmgren A, Sanchez-Ruiz JM, Gaucher EA, Fernandez JM. Single-molecule paleoenzymology probes the chemistry of resurrected enzymes. Nat Struct Mol Biol 2011;18(5):592-596.
28. Lee S, Heerklotz H, Chalikian TV. Effects of buffer ionization in protein transition volumes. Biophys Chem 2010;148(1-3):144-147.
29. Makhatadze GI. Heat capacities of amino acids, peptides and proteins. Biophys Chem 1998;71(2-3):133-156.
30. Yu Y, Makhatadze GI, Pace CN, Privalov PL. Energetics of ribonuclease T1 structure. Biochemistry 1994;33(11):3312-3319.
31. Winder AF, Gent WL. Correction of light-scattering errors in spectrophotometric protein determinations. Biopolymers 1971;10(7):1243-1251.
32. Grimsley GR, Pace CN. Spectrophotometric determination of protein concentration. Current Protocols in Protein Science. College Station, Texas: John Wiley and Sons, Inc.; 2004. p 3.1.1-3.1.9.
33. Makhatadze GI, Medvedkin VN, Privalov PL. Partial molar volumes of polypeptides and their constituent groups in aqueous solution over a broad temperature range. Biopolymers 1990;30(11-12):1001-1010.
34. Lopez MM, Makhatadze GI. Differential scanning calorimetry. Methods Mol Biol 2002;173:113-119.
35. Lin LN, Brandts JF, Brandts JM, Plotnikov V. Determination of the volumetric properties of proteins and other solutes using pressure perturbation calorimetry. Anal Biochem 2002;302(1):144-160.
36. Schweiker KL, Makhatadze GI. Use of pressure perturbation calorimetry to characterize the volumetric properties of proteins. Methods Enzymol 2009;466(Ch 22):527-547.
37. Tsamaloukas AD, Pyzocha NK, Makhatadze GI. Pressure perturbation calorimetry of unfolded proteins. J Phys Chem B 2010;114(49):16166-16170.
38. Heremans K, Smeller L. Protein structure and dynamics at high pressure. Biochim Biophys Acta 1998;1386(2):353-370.

39. Papini CM, Pandharipande PP, Royer CA, Makhatadze GI. Putting the Piezolyte Hypothesis under Pressure. Biophys J 2017;113(5):974-977.
40. Smeller L. Pressure–temperature phase diagrams of biomolecules. Biochimica et Biophysica Acta (BBA) - Protein Structure and Molecular Enzymology 2002;1595(1-2):11-29.
41. Suhre K, Claverie J-M. Genomic Correlates of Hyperthermostability, an Update. Journal of Biological Chemistry 2003;278(19):17198-17202.
42. Cambillau C, Claverie J-M. Structural and Genomic Correlates of Hyperthermostability. Journal of Biological Chemistry 2000;275(42):32383-32386.
43. Martinez N, Michoud G, Cario A, Ollivier J, Franzetti B, Jebbar M, Oger P, Peters J. High protein flexibility and reduced hydration water dynamics are key pressure adaptive strategies in prokaryotes. Sci Rep 2016;6:32816.
44. Golub M, Martinez N, Michoud G, Ollivier J, Jebbar M, Oger P, Peters J. The Effect of Crowding on Protein Stability, Rigidity, and High Pressure Sensitivity in Whole Cells. Langmuir 2018;34(35):10419-10425.
45. Jebbar M, Franzetti B, Girard E, Oger P. Microbial diversity and adaptation to high hydrostatic pressure in deep-sea hydrothermal vents prokaryotes. Extremophiles 2015;19(4):721-740.
46. Makhatadze GI, Privalov PL. Energetics of protein structure. Adv Protein Chem 1995;47:307-425.
47. Vogt G, Woell S, Argos P. Protein thermal stability, hydrogen bonds, and ion pairs. J Mol Biol 1997;269(4):631-643.
48. Fersht A, Winter G. Protein engineering. Trends Biochem Sci 1992;17(8):292-295.
49. Pace CN, Shirley BA, McNutt M, Gajiwala K. Forces contributing to the conformational stability of proteins. FASEB J 1996;10(1):75-83.
50. Berezovsky IN, Chen WW, Choi PJ, Shakhnovich EI. Entropic stabilization of proteins and its proteomic consequences. PLoS Comput Biol 2005;1(4):e47.
51. Yancey PH, Blake WR, Conley J. Unusual organic osmolytes in deep-sea animals: adaptations to hydrostatic pressure and other perturbants. Comparative Biochemistry and Physiology Part A: Molecular & Integrative Physiology 2002;133(3):667-676.
52. Yancey PH. Water Stress, Osmolytes and Proteins. American Zoologist 2001;41(4):699-709.
53. Yancey PH, Siebenaller JF. Co-evolution of proteins and solutions: protein adaptation versus cytoprotective micromolecules and their roles in marine organisms. Journal of Experimental Biology 2015;218(12):1880-1896.
54. Canchi DR, Garcia AE. Cosolvent Effects on Protein Stability. Annual Review of Physical Chemistry 2013;64:273-293.
55. Guin D, Gruebele M. Weak Chemical Interactions That Drive Protein Evolution: Crowding, Sticking, and Quinary Structure in Folding and Function. Chem Rev 2019.
56. Sarkar M, Pielak GJ. An osmolyte mitigates the destabilizing effect of protein crowding. Protein Sci 2014;23(9):1161-1164.
57. Cinar H, Fetahaj Z, Cinar S, Vernon RM, Chan HS, Winter RHA. Temperature, Hydrostatic Pressure, and Osmolyte Effects on Liquid-Liquid Phase Separation in Protein Condensates: Physical Chemistry and Biological Implications. Chemistry 2019.
58. Rosgen J. Molecular basis of osmolyte effects on protein and metabolites. Methods Enzymol 2007;428:459-486.

59. Jaworek MW, Schuabb V, Winter R. The effects of glycine, TMAO and osmolyte mixtures on the pressure dependent enzymatic activity of alpha-chymotrypsin. Phys Chem Chem Phys 2018;20(3):1347-1354.
60. Manisegaran M, Bornemann S, Kiesel I, Winter R. Effects of the deep-sea osmolyte TMAO on the temperature and pressure dependent structure and phase behavior of lipid membranes. Phys Chem Chem Phys 2019;21(34):18533-18540.
61. Papini CM, Pandharipande PP, Royer CA, Makhatadze GI. Putting the Piezolyte Hypothesis under Pressure. Biophysical Journal 2017;113(5):974-977.
62. Martin D, Bartlett DH, Roberts MF. Solute accumulation in the deep-sea bacterium Photobacterium profundum. Extremophiles 2002;6(6):507-514.
63. Yancey PH, Siebenaller JF. Trimethylamine oxide stabilizes teleost and mammalian lactate dehydrogenases against inactivation by hydrostatic pressure and trypsinolysis. Journal of Experimental Biology 1999;202(24):3597-3603.
64. Samerotte AL, Drazen JC, Brand GL, Seibel BA, Yancey PH. Correlation of trimethylamine oxide and habitat depth within and among species of teleost fish: an analysis of causation. Physiol Biochem Zool 2007;80(2):197-208.
65. Yancey PH, Speers-Roesch B, Atchinson S, Reist JD, Majewski AR, Treberg JR. Osmolyte Adjustments as a Pressure Adaptation in Deep-Sea Chondrichthyan Fishes: An Intraspecific Test in Arctic Skates (Amblyraja hyperborea) along a Depth Gradient. Physiol Biochem Zool 2018;91(2):788-796.

FIGURE LEGENDS

Figure 1. Comparison of the computed volume changes shows no difference between piezophilic and non-piezophilic proteins. Panel (A). Dependence of ΔVTot on the number of amino acid residues for proteins from Thermophiles (gray circles), Thermo-piezophiles (red down triangles), Mesophiles (black squares), Psychrophiles (blue diamonds) and Psychro-piezophiles (green up triangles). See Table S2 for the list of protein PDB codes. Panel (B). Box-plot comparison of the data presented in Panel A using the same color-coding.

Figure 2. Comparison of the computed volume changes for the homologous proteins from piezophilic and non-piezophilic proteomes. Panel (A). Enoyl-CoA Hydratase (EC 4.2.1.17); Panel (B). Dihydrodipicolinate Synthase (EC 4.2.1.52); Panel (C). Lactate Dehydrogenase (EC 1.1.2.3); Panel (D). Adenylate Kinase (EC 2.7.4.3); Panel (E). Malate Dehydrogenase (EC 1.1.1.37); Panel (F). Isopropyl Malate Dehydrogenase (EC 1.1.1.85). The colors are: gray - Thermophiles; red - Thermo-piezophiles; black - Mesophiles; blue - Psychrophiles; green - Psychro-piezophiles. The corresponding PDB codes for each protein are listed on the x-axes (see also Table S3 and expanded data in Figure S3).

Figure 3. Comparison between experimentally measured and computed volume changes of two protein families. Panel (A). Acylphosphatase (EC 3.6.1.7); Panel (B). Thioredoxin (EC 1.8.1.9). Shaded bars correspond to the ΔV values at 25°C obtained from the linear extrapolation of the temperature dependence of the experimental values shown in Figure S3 (see Materials and Methods for details), while colored bars show the computed values. The colors are: gray - Thermophiles; red - Thermo-piezophiles; black - Mesophiles; green - Psychro-piezophiles; yellow - ancestrally reconstructed proteins.

Figure 4. Representative Pressure-Temperature (P-T) diagrams for the acylphosphatase and thioredoxin protein families and RNase A from experimental data. Panel (A). Acylphosphatases: Acp-BS - black; Acp-TT - gray; Acp-MP - red. Panel (B). Extant thioredoxins: Trx-SB - green; Trx-EC - black; Trx-TT - gray. Panel (C). Experimentally derived P-T diagrams for RNase A in solutions containing osmolytes: 2 M betaine - cyan; 2 M sarcosine - pink; 2 M glutamate - orange; no osmolyte - black. The P-T diagrams in Panels A and B were calculated using equation 8 from the experimental data shown in Figure S1. Panel C is adapted with modifications from ref. 61.

Figure 1. [Figure image not reproduced. Panel A: ΔVTot (Å³) versus Number of Amino Acid Residues (50 to 500) for Thermophiles, Thermo-piezophiles, Mesophiles, Psychrophiles and Psychro-piezophiles. Panel B: box plot of ΔVTot (Å³) for the same five groups.]

Figure 2. [Figure image not reproduced. Six bar-chart panels of ΔVTot (Å³) by PDB code: (A) Enoyl-CoA Hydratase, (B) Dihydrodipicolinate Synthase, (C) Lactate Dehydrogenase, (D) Adenylate Kinase, (E) Malate Dehydrogenase, (F) Isopropyl Malate Dehydrogenase. PDB codes on the x-axes of panels A-C, in order: 5VE2, 4K3W, 5YLO, 4JVT, 2PBP, 5AFD, 4PFM, 3IRD, 3PB0, 2YXG, 6LDH, 3PQE, 2V6M, 1HYG. PDB codes on the x-axes of panels D-F, in order: 1S3G, 4K46, 2CDN, 1ZIP, 2V65, 1B8P, 4JCO, 4KDE, 2X0I, 2UXQ, 3VKZ, 1CM7, 1VLC.]

Figure 3. [Figure image not reproduced. Two bar-chart panels (A and B) of ΔV (cm³/mol), with y-axes from 0 to -200. Bar labels across both panels, in order: BS, MP, PY, TT, ST, SB, EC, TT, MJ, LPBCA, LGPCA, LACA, LAFCA, AECA, LBCA.]

Figure 4. [Figure image not reproduced. Three panels of Pressure (MPa) versus Temperature (°C, 0 to 80). Panel A curves: Acp-MP, Acp-TT, Acp-BS. Panel B curves: Trx-TT, Trx-EC, Trx-SB. Panel C curves: 0M, 2M Bet, 2M Sar, 2M Glu.]