
Correlation between water molecules identified in atomic models of β-galactosidase determined by cryo-EM and X-ray crystallography

Florentina Tofoleanu1*, Lesley A. Earl2,**, Frank C. Pickard IV1,*** & Bernard R. Brooks1

1Laboratory of Computational Biology, Biochemistry and Biophysics Center, National Institutes of Health, Bethesda, MD 20892 USA

2Laboratory of Cell Biology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892 USA

Corresponding Authors: Florentina Tofoleanu ([email protected]) & Bernard R. Brooks ([email protected])

*current address: Global Discovery Chemistry, Computer-Aided Drug Discovery, Novartis Institutes for BioMedical Research, 181 Mass Ave., Cambridge, MA 02139, USA

**current address: Office of Science Communications, Public Liaison and Education, National Eye Institute, National Institutes of Health, 31 Center Drive, Bethesda, MD 20892, USA

***current address: Pfizer World Research and Development, Groton, CT 06340, USA

ABSTRACT

The placement of individual water molecules can now be determined from high-resolution cryo-electron microscopy (cryo-EM) maps, but only a fraction of these water molecules overlap with those observed in X-ray crystallographic maps. Furthermore, there is substantial variation in the placement of water molecules among multiple X-ray maps. Here, molecular dynamics (MD) simulations are used to compare the dynamics and energetics of water in a 2.2-Å-resolution cryo-EM structure with those of water conserved across 56 high-resolution X-ray crystallographic structures of β-galactosidase. The MD simulations indicate that water molecules observed in the cryo-EM map and those conserved among X-ray structures are dislodged from their initial locations at a similar rate, which is approximately half the rate of non-conserved water molecules. Additionally, the residence times and the exposure of the water binding sites are similar for the cryo-EM and conserved water molecules, and are statistically different from those of non-conserved water molecules. Conserved water has been shown to be involved in folding, dynamics and function. The fact that the properties of water molecules identified by a single cryo-EM map closely correspond to those conserved across multiple X-ray structures points towards the opportunities that cryo-EM opens for structure-based drug design.

KEYWORDS

Cryo-electron microscopy, molecular dynamics simulations, conserved water, residence time analysis, solvent exposure analysis

ABBREVIATIONS

Cryo-electron microscopy (cryo-EM); molecular dynamics (MD); solvent accessible surface area (SASA)

Introduction

Identifying protein profiles is critical to understand both and function, as well as for the design of high affinity lead compounds for drug discovery (Poornima and Dean, 1995; Wong and Lightstone, 2011). Interactions between water molecules and amino acids occur both at the surface of the protein (Laage et al., 2017) and within the structure, including areas such as subunit interfaces, catalytic sites, and other cavities (Levy and Onuchic, 2006). While protein atoms can be accurately localized in high resolution X-ray crystallographic maps, the reliable placement of water molecules is more difficult (Nittinger et al., 2015). During the process of crystallization, the distribution of water molecules in the protein may be influenced by crystallization, crystal contacts (Nittinger et al., 2015), and a variety of other factors (Juers, 2000; Juers et al., 2000). One approach to identifying structurally important water molecules that are stably associated with the protein is to determine “conserved” water molecules, defined as the subset of water molecules that are placed at roughly the same location in an ensemble of independent high resolution X-ray structures of the same polypeptide (Knight et al., 2009; Levitt and Park, 1993). Identifying such conserved water molecules, which are expected to be more stable and thus more likely to be involved in long-lasting interactions, is of particular interest for the use of molecular models in structure-based drug design and other structural analyses.

Resolutions attainable by single particle cryo-electron microscopy (cryo-EM) have improved significantly over the last several years; and water molecules, in proximity to both polar and non-polar residues and in hydrogen-bonded chains, can now be reliably placed into the experimental electron density for the highest resolution maps (Banerjee et al., 2016; Bartesaghi et al., 2015; Bartesaghi et al., 2018; Campbell et al., 2015; Laverty et al., 2019; Merk et al., 2016; Nakane et al., 2020). In order for water density to be observed by cryo-EM, it is necessary that the water molecules occupy an identical location in the vast majority of the molecules from which the final 3D structure is derived by averaging.

Here, we use molecular dynamics (MD) simulations to test the locations of structural water molecules by modeling the protein-water interactions, which have significant effects on the behavior of the protein, including protein folding and stability, and protein-ligand interactions (Privalov and Crane-Robinson, 2017). We simulate both cryo-EM and X-ray derived models of β-galactosidase to evaluate water dynamics and identify those water molecules more stably associated with the protein, providing a benchmark by which to evaluate the physical qualities of water molecules placed by each method. We show that, compared to water molecules placed in high-resolution X-ray structures, a large proportion of water molecules observed by cryo-EM are structurally stable. Historically, X-ray crystallography has been the major source of experimentally-based water placement in structural models of biological complexes; however, as cryo-EM begins to achieve sufficient resolution to experimentally observe water density for stable water molecules, cryo-EM models will increasingly be available for use in a variety of contexts.

Materials and Methods

Setting up the systems

Models of Escherichia coli β-galactosidase monomers were constructed from chain A of PDB 5A1A (cryo-EM) and from chain A of PDB 1DP0 (X-ray). 1DP0 is at 1.7 Å resolution, contains amino acid residues 14-1024, and includes 4 Mg2+ and 5 Na+ ions, as well as 1134 assigned water molecules per monomer, while 5A1A is at 2.2 Å resolution, contains amino acid residues 2-1024 (we added residue 1 to our monomer model), and includes 2 Mg2+ and 2 Na+ ions, 194 assigned water molecules, and the inhibitor phenylethyl β-D-thiogalactopyranoside (PETG) in each monomer. The parameters for PETG were generated using the CHARMM generalized force field (CGenFF) (Vanommeslaeghe et al., 2010). We retained all water molecules and ions found in the initial structures in our monomer models. Details of the two systems are found in Supplementary Table 1. We solvated the structures with the explicit TIP3P water model and added Mg2+, Na+ and Cl− ions to neutralize the systems and to provide ionic concentrations of 2 mM MgCl2 and 50 mM NaCl, the experimental conditions in our previous study (Bartesaghi et al., 2014). Histidine residues were kept neutral. The two systems were prepared for simulation by minimization, followed by heating to 300 K (145 ps) while constraining the protein backbone and the experimentally-placed ions and water molecules. Finally, equilibration (500 ps) was performed with the Cα atoms harmonically restrained (Figure 1a). The preparation steps were followed by 100 ns simulations at constant number of particles, constant pressure, and constant temperature (NPT ensemble) using the CHARMM package (Brooks et al., 1983) with the CHARMM36 force field, saving the trajectory every 10 ps (Figure 1b). Pressure was maintained at 1 atm with the Langevin piston method and the temperature was kept constant at 300 K with a Nosé-Hoover thermostat (Feller et al., 1995; MacKerell et al., 1998), with a time step of 1 fs. The monomer was simulated in bulk solvent under periodic boundary conditions, with long-range electrostatics treated by the particle mesh Ewald method (Essmann et al., 1995). A further simulation of 20 ns, saving a frame every 0.1 ps, was performed for the analysis of water residence times (see below).

Structure and trajectory visualization and rendering were performed using Visual Molecular Dynamics (VMD) (Humphrey et al., 1996) and PyMOL (Schrodinger, 2015), and ribbon model visualizations were completed in UCSF Chimera (Pettersen et al., 2004). CHARMM and custom-written Python and Tcl scripts were employed for trajectory and structural analysis.

Identification of conserved water

To identify conserved water molecules across all X-ray crystal structures deposited in the Protein Data Bank for E. coli β-galactosidase, we analyzed 56 PDB structures, each containing one to four tetramers. In total, we identified 272 monomer chains (Table S2), each with unique water molecules. We discarded monomers from structures with resolutions poorer than 2.2 Å (a total of 124 chains, PDB IDs: 1F4A, 1F4H, 1HN1, 3E1F, 3IAQ, 3T0B, 3MUY, 3T2P, 3T2Q, 3VDA, 3VDC, 3VD3, 3VD5, 3VD7, 4DUX, 4V40, 4V41, 4V44 and 4V45). The remaining 148 chains were aligned to chain A of 1DP0 (1DP0_A), our X-ray reference structure. A water molecule in the reference structure was considered conserved if it was consistently coincident within 1.5 Å of a water molecule in all other 148 structures (Aksianov et al., 2008). We identified 195 such water molecules and mapped them into 1DP0_A.
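
The conservation criterion described above can be illustrated with a minimal Python sketch. It is not the script used in this study: it assumes that all chains have already been superposed onto the reference chain and that each water molecule is represented by the coordinates of its oxygen atom. The function names and the toy coordinates are hypothetical.

```python
import math

def is_conserved(ref_water, aligned_chains, cutoff=1.5):
    """True if every aligned chain has a water within `cutoff` (Å) of ref_water."""
    return all(
        any(math.dist(ref_water, water) <= cutoff for water in chain_waters)
        for chain_waters in aligned_chains
    )

def conserved_waters(ref_waters, aligned_chains, cutoff=1.5):
    """Subset of reference waters that are matched in all aligned chains."""
    return [w for w in ref_waters if is_conserved(w, aligned_chains, cutoff)]

# Toy data (hypothetical): two reference waters, two aligned chains.
ref_waters = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
aligned_chains = [
    [(0.5, 0.0, 0.0), (10.2, 0.1, 0.0)],  # matches both reference waters
    [(0.0, 1.0, 0.0), (14.0, 0.0, 0.0)],  # matches only the first one
]
print(conserved_waters(ref_waters, aligned_chains))
```

In this toy case only the first reference water has a partner within 1.5 Å in both chains, so it is the only one returned.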

Water retention analysis

Throughout the analysis, we refer to three water molecule categories: water molecules within the cryo-EM model (cryo-EM water), any water molecules in the X-ray model (X-ray water) and the specific 195 water molecules within the X-ray model identified as conserved (conserved water).

During the preparation steps described above (minimization, heating and equilibration), some of the experimentally placed water molecules diffused away from the protein. Upon completion of the preparation and prior to the start of the water dynamics simulation, the cryo-EM model retained 100 (out of 194) cryo-EM water molecules, while the X-ray model retained 156 (out of 1134) X-ray and 82 (out of 195) conserved water molecules. The final numbers of retained water molecules at 100 ns were 11 (0.9%), 10 (5.5%), and 9 (4.6%) for the X-ray model, cryo-EM model, and conserved subset, respectively (Figure 1c). Per-residue root mean square deviation (RMSD) values over the duration of the simulation were calculated for both the cryo-EM and X-ray models (Figure S1d-g).

At the end of the simulation, the radius considered for overlap between water molecules in the cryo-EM and X-ray structures was extended to 2.5 Å, to account for the motion of both protein and water atoms during the simulation.

We fit the data (the number of water molecules remaining in their original locations at each time point x) with a second order exponential decay function, given by:

$$N(x) = a e^{-bx} + c e^{-dx} + f,$$

where a and c are the exponential weights, b and d are the decay constants and f is the number of water molecules that persist at the end of the simulation. Second order exponential coefficients are shown in Table S3.
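
As an illustration of the quantities involved, the following Python sketch builds a retention curve from a toy trajectory and evaluates the second order exponential model. It is a sketch under stated assumptions, not the analysis code of this study: a water molecule is treated as permanently lost the first time it moves farther than `radius` from its starting position, and the function names, the radius and the coefficient values are hypothetical. The fitting step itself is not shown.

```python
import math

def retained_counts(initial, frames, radius):
    """Count waters still at their initial site in each frame.

    Assumption: once a water moves more than `radius` (Å) away from its
    starting position it is counted as lost for all later frames.
    """
    alive = [True] * len(initial)
    counts = []
    for frame in frames:
        for k, (start, pos) in enumerate(zip(initial, frame)):
            if alive[k] and math.dist(start, pos) > radius:
                alive[k] = False
        counts.append(sum(alive))
    return counts

def double_exp(x, a, b, c, d, f):
    """Second order exponential decay: a*exp(-b*x) + c*exp(-d*x) + f."""
    return a * math.exp(-b * x) + c * math.exp(-d * x) + f

# Toy trajectory (hypothetical): two waters, three frames.
initial = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
frames = [
    [(0.5, 0.0, 0.0), (5.0, 0.0, 0.0)],
    [(3.0, 0.0, 0.0), (5.0, 1.0, 0.0)],  # first water leaves its site
    [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)],  # it returns but stays "lost"
]
print(retained_counts(initial, frames, radius=1.5))

# At x = 0 the model equals a + c + f; for large x it approaches f.
print(double_exp(0.0, 60.0, 1.0, 30.0, 0.05, 10.0))
```

With these hypothetical coefficients the fast component (b = 1.0) decays within a few time units, while the slow component (d = 0.05) dominates the long tail before the curve levels off at f.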

Mg2+ ions retained their first solvation shell throughout the simulation. Water molecules remained complexed with Mg2+ ions, since the simulation time (100 ns) was far shorter than the typical exchange time for the first solvation shell of water molecules (estimated to be on the order of μs) (Callahan et al., 2010). Water molecules associated with Mg2+ ions were thus not included in our analyses.

Solvent accessible surface area

The solvent accessible surface area (SASA) for each protein residue within each structure was computed using the VMD plugin with a probe radius of 1.4 Å. We averaged the SASA values for each residue over the duration of the 100 ns simulation. We then assigned each residue to one of three categories (charged, hydrophilic or hydrophobic) according to the Kyte & Doolittle hydrophobicity scale (Kyte and Doolittle, 1982).

Residence times and analysis

Water binding sites were defined as protein residues that formed hydrogen bonds with experimentally-placed water molecules in the initial structures (prior to simulations). Hydrogen bonds were assessed using 3.0 Å and 150° cutoffs, which are considered “weak” hydrogen bonds (Pimentel and McClellan, 1960).
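
A geometric hydrogen bond test of this kind can be sketched in Python as follows. The text specifies only the two cutoff values, so the sketch assumes that the distance is measured between the donor and acceptor heavy atoms and that the angle is the donor-hydrogen-acceptor angle; these choices, the function names and the toy coordinates are assumptions for illustration.

```python
import math

def angle_deg(a, b, c):
    """Angle a-b-c in degrees, with b at the vertex."""
    v1 = [a[i] - b[i] for i in range(3)]
    v2 = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.hypot(*v1) * math.hypot(*v2)
    cosine = max(-1.0, min(1.0, dot / norm))  # guard against rounding
    return math.degrees(math.acos(cosine))

def is_hbond(donor, hydrogen, acceptor, max_dist=3.0, min_angle=150.0):
    """Assumed criterion: D...A distance <= max_dist and D-H...A angle >= min_angle."""
    close_enough = math.dist(donor, acceptor) <= max_dist
    linear_enough = angle_deg(donor, hydrogen, acceptor) >= min_angle
    return close_enough and linear_enough

donor, hydrogen = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
print(is_hbond(donor, hydrogen, (2.9, 0.0, 0.0)))  # linear and within 3.0 Å
print(is_hbond(donor, hydrogen, (1.0, 2.0, 0.0)))  # close, but bent to 90°
print(is_hbond(donor, hydrogen, (3.5, 0.0, 0.0)))  # linear, but too far
```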

We estimated the residence time of water molecules around every protein residue in the cryo-EM and X-ray models by analyzing the hydrogen bonds formed with water during a 20 ns simulation, extended from the 100 ns simulation described above. Hydrogen bonds were evaluated at 0.1 ps intervals. The average residence time of water molecules that form hydrogen bonds with the protein (the first solvation shell of protein residues) was calculated based on the Impey method (Impey et al., 1983). To assist in our analysis of the characteristic times of hydrogen bonds between protein residue i and water molecule j, we introduce the function $P_{ij}(\tau, \tau + t; t^*)$. This function takes a value of 1 if a hydrogen bond exists continuously between residue i and water j between time τ and τ + t, without an interruption longer than t*; otherwise it takes a value of 0. From $P_{ij}$, we define a normalized autocorrelation function for each protein residue i:

$$R_i(t; t^*) = \frac{\langle P_{ij}(\tau, \tau + t; t^*) \rangle_{j,\tau}}{\langle P_{ij}(\tau, \tau; t^*) \rangle_{j,\tau}},$$

where we average over all water molecules j and over all time origins τ. As a practical measure, due to the discretization of $P_{ij}$ in steps of 0.1 ps, hydrogen bonds with a duration less than 0.5 ps were removed from the autocorrelation analysis to mitigate discretization artifacts. We assume that the autocorrelation function undergoes double exponential decay, and fit it to the following form:

$$R_i^{\mathrm{fit}}(t; t^*) = c \exp(-t/\tau_{\mathrm{short}}) + (1 - c) \exp(-t/\tau_{\mathrm{long}}),$$

where $\tau_{\mathrm{long}} > \tau_{\mathrm{short}} > 0$ and $0 \le c \le 1$. We take the value of τlong to be the mean residence time of hydrogen bonds formed between a specific protein residue and water molecules in its first hydration shell. To compensate for very long-lasting hydrogen bonds, and to improve the reliability of fits, the 20 ns MD simulation was split into 5 ns segments, and the residence time values for each residue were estimated independently for each segment and averaged. A representative example of fits for a single residue (with t* = 2.0 ps) in the cryo-EM and X-ray models is shown in Figure S2a. Because results from the Impey method are dependent on the value of t* (Laage and Hynes, 2008), we tested several values: 0.0, 0.1, 0.5, 1.0 and 2.0 ps; based on the correlation between the two models (Figure S2b), we chose t* = 2 ps, a value that was used in previous similar analyses of protein (Henchman and McCammon, 2002).
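
The autocorrelation function defined above can be sketched in Python for a single residue, given one boolean hydrogen bond time series per water molecule. This is one possible reading of the definition, not the code used in this study. It assumes that the bond must be present at both τ and τ + t, that the tolerance t* is given as a number of frames (`max_gap`), and that only time origins for which every lag fits inside the trajectory are used. The removal of bonds shorter than 0.5 ps and the double exponential fit are not shown, and all names are hypothetical.

```python
def fill_short_gaps(bonded, max_gap):
    """Mark interior breaks of at most `max_gap` frames as tolerated."""
    filled = list(bonded)
    n, k = len(bonded), 0
    while k < n:
        if bonded[k]:
            k += 1
            continue
        start = k
        while k < n and not bonded[k]:
            k += 1
        # Only gaps flanked by bonded frames on both sides are tolerated.
        if start > 0 and k < n and (k - start) <= max_gap:
            for m in range(start, k):
                filled[m] = True
    return filled

def survival(series_list, max_gap, max_lag):
    """R(t) for lags 0..max_lag (in frames), averaged over waters and origins."""
    num = [0] * (max_lag + 1)
    for bonded in series_list:
        filled = fill_short_gaps(bonded, max_gap)
        for origin in range(len(bonded) - max_lag):
            if not bonded[origin]:
                continue
            for lag in range(max_lag + 1):
                end = origin + lag
                if not filled[end]:
                    break  # interruption longer than the tolerance
                if bonded[end]:
                    num[lag] += 1
    return [x / num[0] for x in num] if num[0] else []

# Toy series (hypothetical): one water, ten frames, T = bonded.
series = [True, True, False, True, True, False, False, False, True, True]
print(survival([series], max_gap=1, max_lag=3))
print(survival([series], max_gap=0, max_lag=3))
```

Comparing the two calls shows why the result depends on t*: with a one-frame tolerance the single-frame break at the third frame is bridged, so the function stays non-zero at lags where it vanishes when no interruption is tolerated.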

Statistical analysis

The correlation between SASA values (Figure S2d) and between residence time values (Figure S2c) for residues 14-1024 in the cryo-EM versus X-ray models was evaluated with Kendall’s tau and Spearman’s rho statistics, as implemented in the R stats package (https://cran.r-project.org/, version 3.3.2). Correlation values are shown in Table S4. The Wilcoxon signed rank test (paired) was used to evaluate differences between SASA values or residence time values for the cryo-EM and X-ray models. For subsets of residues initially bound by Cryo, X-ray, or Conserved water molecules, the Wilcoxon rank sum test (unpaired) was used to individually evaluate differences between subsets with respect to SASA, residence time, and RMSD values; a Bonferroni correction was used to adjust for multiple comparisons. The alternative hypothesis was two-tailed for each analysis. Wilcoxon rank test values are shown in Table S5.
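
For readers unfamiliar with these statistics, the Python sketch below shows a rank correlation and a multiple-comparison adjustment in their simplest forms. It does not reproduce the R routines used in this study: it computes Kendall's tau-a, which ignores ties, and the plain Bonferroni adjustment (each p-value multiplied by the number of tests and capped at 1). The function names and toy values are hypothetical.

```python
def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            product = (x[i] - x[j]) * (y[i] - y[j])
            score += (product > 0) - (product < 0)
    return score / (n * (n - 1) / 2)

def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Toy per-residue values (hypothetical) for two models.
model_a = [1.0, 2.0, 3.0, 4.0]
model_b = [1.0, 3.0, 2.0, 4.0]
print(kendall_tau_a(model_a, model_b))  # 5 concordant, 1 discordant pair
print(bonferroni([0.01, 0.04, 0.5]))
```

Of the six residue pairs in the toy data, five are ordered the same way in both models and one is reversed, giving tau-a = (5 - 1) / 6.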

RESULTS

We constructed monomer models of Escherichia coli (E. coli) protein β-galactosidase from structures determined by cryo-EM at 2.2 Å resolution (Bartesaghi et al., 2015) (PDB ID 5A1A, chain A), and X-ray crystallography (Juers et al., 2000) (PDB ID 1DP0, chain A, 1.7 Å resolution) (see Table S1 and Figures S1a, S1b for details of the two models). Using an ensemble of X-ray crystal structures of E. coli β-galactosidase with resolutions between 1.6 and 2.2 Å (Table S2), we further identified 195 conserved water molecules (hereafter referred to as “conserved water”) and mapped them onto chain A of the atomic model for 1DP0, one of the highest resolution X-ray structures of wild-type E. coli β-galactosidase (see Materials and Methods, Figure S1c). A comparison of the locations of all the water molecules in the monomeric 5A1A model (hereafter referred to as “cryo-EM water”) with those reported in the 1DP0 model (hereafter referred to as “X-ray water”) shows that of the 194 initial cryo-EM water molecules, 169 (87%) overlap with the X-ray water molecules, and 68 molecules (44.8%) overlap with the conserved water molecules (52% of total conserved water).

To evaluate the strength of interactions between the protein and cryo-EM water molecules versus X-ray water molecules, we monitored water dynamics during a restrained 500 ps equilibration step (Figure 1a), followed by a 100 ns unrestrained MD simulation (Figure 1b). The rate of water displacement in the structures during the simulation was consistent with results from similar prior studies (Halle, 2004; Henchman and McCammon, 2002; Schiffer and van Gunsteren, 1999), and followed a second order exponential decay. While the decay rates were similar for cryo-EM and conserved water molecules, the decay rate for the ensemble of X-ray water molecules was significantly higher (Figure 1a-b, Table S3). These findings suggest that, on average, the conserved water molecules, as well as those identified by cryo-EM, occupy more kinetically stable locations in the structure, while the vast majority of water molecules placed in the X-ray atomic model are less kinetically stable and more easily displaced during the simulation.

The most strongly protein-associated water molecules were still bound to the protein at the end of the simulation. In our analysis, we excluded the water molecules in the first solvation shell of the Mg2+ ions, which are expected to remain bound over significantly longer periods than our 100 ns simulation (Callahan et al., 2010). Similar numbers of water molecules are retained in their original positions for the X-ray (11 molecules, 9 of which were conserved) and cryo-EM models (10 molecules) at the end of the 100 ns simulation. Out of the 10 water molecules remaining in the proximity of the protein in the cryo-EM model, 7 overlap with retained conserved X-ray water molecules (Figure 1c; densities from the cryo-EM and 2Fo-Fc maps for several retained water molecules are shown in Figure 1d).

We further explored the impact of solvent exposure and local hydrogen bonding interactions on water retention during a follow-up 20 ns MD simulation. We analyzed the average solvent accessible surface area, SASA (see Figure S2), and the residence time for water within hydrogen-bonding distance, τres (see Methods, Figure S2a-f). We first computed these parameters for all protein residues, and then focused on the binding sites for cryo-EM or X-ray-placed water molecules.

When comparing all residues in the X-ray and cryo-EM models, as expected, SASA values were strongly correlated, with a Kendall’s τ value of 0.84. The τres values for all residues were somewhat less strongly correlated, with a Kendall’s τ value of 0.56 (Figure S2c-d, Table S4). Histograms of τres values of all residues and of binding site subsets are found in Figures S2e, f.

We characterized four classes of water binding sites in β-galactosidase: residues bound to water molecules in the 1DP0 X-ray structure (X-ray), bound to water molecules in the 5A1A cryo-EM model (Cryo), bound to the subset of water molecules conserved across the X-ray atomic models (Conserved), or bound to the water molecules in 1DP0 excluded from the conserved subset (Non-conserved). Interestingly, a comparison of the SASA and τres values for each of these subgroups of residues shows a striking pattern: protein residues in the Cryo and Conserved groups display significantly higher τres values and lower SASA values than the X-ray or Non-conserved groups (Figure 2, Figure S2g-h, see Table S5 for Wilcoxon signed rank tests of each subgroup). This pattern indicates that water molecules identified as being conserved across multiple crystal structures or present in high-resolution cryo-EM structures are more likely to be associated with protein residues over longer periods of time, and are buried deeper within the structure. In contrast, the vast majority of water molecules placed during refinement of crystallographically derived atomic models are near residues with higher SASA values and display significantly lower τres values, presumably because they are associated with the surface of the protein, rather than with its interior (Figures S1a-c, S2g-h) (Lounnas and Pettitt, 1994).

In particular, we found that the SASA values of hydrophobic residues bound by X-ray water molecules are significantly higher than those bound by cryo-EM or conserved water (Figure S2g). Although partially solvent-exposed hydrophobic residues have been proposed to have a stabilizing effect (Modig et al., 2004), highly solvent-exposed hydrophobic residues on the surface of the protein have been suggested to not stably associate with water (Halle and Davidovic, 2003). The SASA values of charged residues bound to X-ray water are also higher than SASA values of the binding sites for cryo-EM or conserved water (Figure S2g), which suggests that the latter binding sites are more buried. Charged residues typically associate with water molecules (Steinbach and Brooks, 1993), and interactions between buried charged residues and water have been shown to stabilize (Isom et al., 2008).

DISCUSSION AND CONCLUSIONS

The use of protein structural models for a variety of purposes, including the design of therapeutics, depends on a thorough understanding of stabilizing and destabilizing elements of the structure, including protein interactions with water and other solvent molecules. Using MD simulations to observe the behavior of water molecules placed in atomic models, either from cryo-EM maps or X-ray crystal structures, allows modelling of parameters like water residence time that provide insights into interactions that stabilize bound water molecules. Water molecules that are structurally conserved in multiple crystals have increased residence times compared to bulk water molecules in (Pfeiffer et al., 1998). Since water molecules resolved by X-ray crystallography should exist at local energy minima (Levitt and Park, 1993), it is important to reliably predict those positions through modeling the molecular structures. Our analyses show that ~90% of the water molecules identified in the 5A1A cryo-EM density map of β-galactosidase are also found in the 1DP0 X-ray structure, and that ~50% of these water molecules also correspond to those conserved across multiple atomic models obtained by X-ray crystallography. We conclude that these water molecules must reside at locations with high probabilities of containing stable hydrogen-bonded interactions, in contrast to the overall set of water molecules in any particular atomic model derived by crystallography, the majority of which appear not to be stably associated with the protein.

In cases where multiple crystal structures exist, computing conserved water placement can reveal a subset of water molecules that are more stably associated with the protein. However, in cases where a large collection of independent crystal structures is not available, a single high resolution cryo-EM structure may provide comparably high-quality water structure information, as we have shown here for β-galactosidase.

While relatively few cryo-EM structures at resolutions allowing water placement are currently available (Banerjee et al., 2016; Bartesaghi et al., 2015; Campbell et al., 2015; Dong et al., 2017; Fischer et al., 2015; Liu et al., 2016a; Liu et al., 2016b; Merk et al., 2016; Miguez Amil et al., 2020; Pintilie et al., 2020; Su et al., 2017; Zivanov et al., 2018), due to the rapid expansion of the cryo-EM field, along with progress in methods for 3D reconstruction, the number of such structures is likely to grow. It is thus necessary to understand the differences between models derived by X-ray crystallography and cryo-EM, and the opportunities these new models offer for applications such as structure-based drug design.

ACKNOWLEDGEMENTS

We thank Sriram Subramaniam for helpful scientific discussions and guidance all throughout the project. We also thank Doreen Matthies, Sagar Chittori, and Jacqueline L. S. Milne for critical reading of the manuscript. We are grateful for technical assistance from Tim Miller, Rick Venable and Veronica Falconieri.

FUNDING SOURCES

This work was supported by the Intramural Research Programs of the National Heart Lung and Blood Institute (NHLBI, to B.R.B.) at the National Institutes of Health, as well as by the NHLBI Lenfant Biomedical Fellowship (to F.T.). This work utilized NIH High Performance Computing (HPC) resources of the Biowulf (http://hpc.nih.gov) and Lobos (http://www.lobos.nih.gov) clusters at the NIH.

REFERENCES

Aksianov, E., Zanegina, O., Grishin, A., Spirin, S., Karyagina, A., Alexeevski, A., 2008. Conserved Water Molecules In X-Ray Structures Highlight The Role Of Water In Intramolecular And Intermolecular Interactions. J. Bioinf. Comput. Biol. 6, 775-788.

Banerjee, S., Bartesaghi, A., Merk, A., Rao, P., Bulfer, S.L., Yan, Y., Green, N., Mroczkowski, B., Neitz, R.J., Wipf, P., Falconieri, V., Deshaies, R.J., Milne, J.L., Huryn, D., Arkin, M., Subramaniam, S., 2016. 2.3 A resolution cryo-EM structure of human p97 and mechanism of allosteric inhibition. Science 351, 871-875.

Bartesaghi, A., Matthies, D., Banerjee, S., Merk, A., Subramaniam, S., 2014. Structure of beta-galactosidase at 3.2-A resolution obtained by cryo-electron microscopy. Proc Natl Acad Sci U S A 111, 11709-11714.

Bartesaghi, A., Merk, A., Banerjee, S., Matthies, D., Wu, X., Milne, J.L., Subramaniam, S., 2015. 2.2 A resolution cryo-EM structure of beta-galactosidase in complex with a cell-permeant inhibitor. Science 348, 1147-1151.

Bartesaghi, A., Aguerrebere, C., Falconieri, V., Banerjee, S., Earl, L.A., Zhu, X., Grigorieff, N., Milne, J.L.S., Sapiro, G., Wu, X., Subramaniam, S., 2018. Atomic Resolution Cryo-EM Structure of beta-Galactosidase. Structure 26, 848-856 e843.

Brooks, B.R., Bruccoleri, R.E., Olafson, B.D., States, D.J., Swaminathan, S., Karplus, M., 1983. CHARMM: a program for macromolecular energy, minimization, and dynamics calculations. J. Comput. Chem. 4, 187-217.

Callahan, K.M., Casillas-Ituarte, N.N., Roeselova, M., Allen, H.C., Tobias, D.J., 2010. Solvation of magnesium dication: molecular dynamics simulation and vibrational spectroscopic study of magnesium chloride in aqueous solutions. J Phys Chem A 114, 5141-5148.

Campbell, M.G., Veesler, D., Cheng, A., Potter, C.S., Carragher, B., 2015. 2.8 A resolution reconstruction of the Thermoplasma acidophilum 20S proteasome using cryo-electron microscopy. Elife 4, e06380.

Dong, Y., Liu, Y., Jiang, W., Smith, T.J., Xu, Z., Rossmann, M.G., 2017. Antibody-induced uncoating of human rhinovirus B14. Proc Natl Acad Sci U S A 114, 8017-8022.

Essmann, U., Perera, L., Berkowitz, M.L., Darden, T., Lee, H., Pedersen, L.G., 1995. A smooth particle mesh Ewald method. The Journal of Chemical Physics 103, 8577-8593.

Feller, S.E., Zhang, Y., Pastor, R.W., Brooks, B.R., 1995. Constant pressure molecular dynamics simulation: the Langevin piston method. J. Chem. Phys. 103, 4613-4621.

Fischer, N., Neumann, P., Konevega, A.L., Bock, L.V., Ficner, R., Rodnina, M.V., Stark, H., 2015. Structure of the E. coli ribosome-EF-Tu complex at <3 A resolution by Cs-corrected cryo-EM. Nature 520, 567-570.

Halle, B., 2004. Protein hydration dynamics in solution: a critical survey. Philos T R Soc B 359, 1207-1223.

Halle, B., Davidovic, M., 2003. Biomolecular hydration: from water dynamics to hydrodynamics. Proc Natl Acad Sci U S A 100, 12135-12140.

Henchman, R.H., McCammon, J.A., 2002. Structural and dynamic properties of water around acetylcholinesterase. Protein Sci 11, 2080-2090.

Humphrey, W., Dalke, A., Schulten, K., 1996. VMD: visual molecular dynamics. J. Mol. Graphics 14, 33-38, plates, 27-28.

Impey, R.W., Madden, P.A., Mcdonald, I.R., 1983. Hydration and Mobility of Ions in Solution. J Phys Chem-Us 87, 5071-5083.

Isom, D.G., Cannon, B.R., Castaneda, C.A., Robinson, A., Garcia-Moreno, B., 2008. High tolerance for ionizable residues in the hydrophobic interior of proteins. Proc Natl Acad Sci U S A 105, 17784-17788.

Juers, D.H., 2000. A structural view of beta-galactosidase in action, University of Oregon, Eugene, OR.

Juers, D.H., Jacobson, R.H., Wigley, D., Zhang, X.J., Huber, R.E., Tronrud, D.E., Matthews, B.W., 2000. High resolution refinement of beta-galactosidase in a new crystal form reveals multiple metal-binding sites and provides a structural basis for alpha-complementation. Protein Sci 9, 1685-1699.

Knight, J.D.R., Hamelberg, D., McCammon, J.A., Kothary, R., 2009. The role of conserved water molecules in the catalytic domain of protein kinases. Proteins 76, 527-535.

Kyte, J., Doolittle, R.F., 1982. A simple method for displaying the hydropathic character of a protein. J Mol Biol 157, 105-132.

Laage, D., Hynes, J.T., 2008. On the residence time for water in a solute hydration shell: Application to aqueous halide solutions. J Phys Chem B 112, 7697-7701.

Laage, D., Elsaesser, T., Hynes, J.T., 2017. Perspective: Structure and ultrafast dynamics of biomolecular hydration shells. Struct Dynam-Us 4.

Laverty, D., Desai, R., Uchanski, T., Masiulis, S., Stec, W.J., Malinauskas, T., Zivanov, J., Pardon, E., Steyaert, J., Miller, K.W., Aricescu, A.R., 2019. Cryo-EM structure of the human alpha1beta3gamma2 GABAA receptor in a lipid bilayer. Nature 565, 516-520.

Levitt, M., Park, B.H., 1993. Water: now you see it, now you don't. Structure 1, 223-226.

Levy, Y., Onuchic, J.N., 2006. Mechanisms of protein assembly: lessons from minimalist models. Acc Chem Res 39, 135-142.

Liu, Y., Hill, M.G., Klose, T., Chen, Z., Watters, K., Bochkov, Y.A., Jiang, W., Palmenberg, A.C., Rossmann, M.G., 2016a. Atomic structure of a rhinovirus C, a virus species linked to severe childhood asthma. Proc Natl Acad Sci U S A 113, 8997-9002.

Liu, Z., Gutierrez-Vargas, C., Wei, J., Grassucci, R.A., Ramesh, M., Espina, N., Sun, M., Tutuncuoglu, B., Madison-Antenucci, S., Woolford, J.L., Jr., Tong, L., Frank, J., 2016b. Structure and assembly model for the Trypanosoma cruzi 60S ribosomal subunit. Proc Natl Acad Sci U S A 113, 12174-12179.

Lounnas, V., Pettitt, B.M., 1994. Distribution Function Implied Dynamics Versus Residence Times and Correlations - Solvation Shells of Myoglobin. Proteins-Structure Function and Genetics 18, 148-160.

MacKerell, A.D., Jr., Bashford, D., Bellott, M., Dunbrack, R.L., Evanseck, J.D., Field, M.J., Fischer, S., Gao, J., Guo, H., Ha, S., Joseph-McCarthy, D., Kuchnir, L., Kuczera, K., Lau, F.T.K., Mattos, C., Michnick, S., Ngo, T., Nguyen, D.T., Prodhom, B., Reiher, W.E., III, Roux, B., Schlenkrich, M., Smith, J.C., Stote, R., Straub, J., Watanabe, M., Wiorkiewicz-Kuczera, J., Yin, D., Karplus, M., 1998. All-Atom Empirical Potential for Molecular Modeling and Dynamics Studies of Proteins. J. Phys. Chem. B 102, 3586-3616.

Merk, A., Bartesaghi, A., Banerjee, S., Falconieri, V., Rao, P., Davis, M.I., Pragani, R., Boxer, M.B., Earl, L.A., Milne, J.L., Subramaniam, S., 2016. Breaking Cryo-EM Resolution Barriers to Facilitate Drug Discovery. Cell 165, 1698-1707.

Miguez Amil, S., Jimenez-Ortega, E., Ramirez-Escudero, M., Talens-Perales, D., Marin-Navarro, J., Polaina, J., Sanz-Aparicio, J., Fernandez-Leiro, R., 2020. The cryo-EM Structure of Thermotoga maritima beta-Galactosidase: Quaternary Structure Guides Protein Engineering. ACS Chem Biol 15, 179-188.

Modig, K., Liepinsh, E., Otting, G., Halle, B., 2004. Dynamics of protein and peptide hydration. J Am Chem Soc 126, 102-114.

Nakane, T., Kotecha, A., Sente, A., McMullan, G., Masiulis, S., Brown, P.M.G.E., Grigoras, I.T., Malinauskaite, L., Malinauskas, T., Miehling, J., Yu, L., Karia, D., Pechnikova, E.V., de Jong, E., Keizer, J., Bischoff, M., McCormack, J., Tiemeijer, P., Hardwick, S.W., Chirgadze, D.Y., Murshudov, G., Aricescu, A.R., Scheres, S.H.W., 2020. Single-particle cryo-EM at atomic resolution. bioRxiv.

Nittinger, E., Schneider, N., Lange, G., Rarey, M., 2015. Evidence of water molecules--a statistical evaluation of water molecules based on electron density. J Chem Inf Model 55, 771-783.

Pettersen, E.F., Goddard, T.D., Huang, C.C., Couch, G.S., Greenblatt, D.M., Meng, E.C., Ferrin, T.E., 2004. UCSF Chimera--a visualization system for exploratory research and analysis. J Comput Chem 25, 1605-1612.

Pfeiffer, S., Spitzner, N., Lohr, F., Ruterjans, H., 1998. Hydration water molecules of nucleotide-free RNase T1 studied by NMR spectroscopy in solution. J Biomol NMR 11, 1-15.

Pimentel, G.C., McClellan, A.L., 1960. The Hydrogen Bond. W. H. Freeman & Co.

Pintilie, G., Zhang, K., Su, Z., Li, S., Schmid, M.F., Chiu, W., 2020. Measurement of atom resolvability in cryo-EM maps with Q-scores. Nat Methods 17, 328-334.

Poornima, C.S., Dean, P.M., 1995. Hydration in drug design .3. Conserved water molecules at the ligand-binding sites of homologous proteins. J Comput Aid Mol Des 9, 521-531.

Privalov, P.L., Crane-Robinson, C., 2017. Role of water in the formation of macromolecular structures. Eur Biophys J 46, 203-224.

Schiffer, C.A., van Gunsteren, W.F., 1999. Accessibility and order of water sites in and around proteins: A crystallographic time-averaging study. Proteins-Structure Function and Genetics 36, 501-511.

Schrodinger, LLC. 2015. The PyMOL Molecular Graphics System, Version 1.8.

Steinbach, P.J., Brooks, B.R., 1993. Protein Hydration Elucidated by Molecular-Dynamics Simulation. Proc Natl Acad Sci U S A 90, 9135-9139.

Su, X., Ma, J., Wei, X., Cao, P., Zhu, D., Chang, W., Liu, Z., Zhang, X., Li, M., 2017. Structure and assembly mechanism of plant C2S2M2-type PSII-LHCII supercomplex. Science 357, 815-820.

Vanommeslaeghe, K., Hatcher, E., Acharya, C., Kundu, S., Zhong, S., Shim, J., Darian, E., Guvench, O., Lopes, P., Vorobyov, I., Mackerell, A.D., Jr., 2010. CHARMM general force field: A force field for drug-like molecules compatible with the CHARMM all-atom additive biological force fields. J Comput Chem 31, 671-690.

Wong, S.E., Lightstone, F.C., 2011. Accounting for water molecules in drug design. Expert Opin Drug Discov 6, 65-74.

Zivanov, J., Nakane, T., Forsberg, B.O., Kimanius, D., Hagen, W.J., Lindahl, E., Scheres, S.H., 2018. New tools for automated high-resolution cryo-EM structure determination in RELION-3. Elife 7.

FIGURE LEGENDS

Figure 1: Water dynamics simulation. (a, b) 500 ps equilibration (a) and 100 ns MD simulation (b) showing water displacement from its original location. Raw data points (solid line) and second order exponential decay fits (dotted line) are shown for all water molecules in the X-ray model (1DP0; blue), the conserved subset in 1DP0 (purple), and for all water molecules in the cryo-EM model (5A1A; green). (c) Ribbon diagram of a β-galactosidase monomer (X-ray model, post-simulation) showing placement of water molecules remaining at the end of the MD simulation from cryo-EM (green), conserved X-ray (purple), and water molecules from the X-ray model that are not in the conserved subset (blue). (d) Density (from the cryo-EM map and 2Fo-Fc X-ray map) corresponding to five water molecules that remain at the end of the 100 ns simulation and are common to X-ray (blue) and cryo-EM (green) models.
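
The second order exponential decay fits in Figure 1 can be illustrated with a minimal sketch. It assumes that "second order" means a sum of two exponential terms with no constant offset, f(t) = A1 exp(-t/τ1) + A2 exp(-t/τ2). The function names, the candidate time constants, and the grid-search strategy are illustrative assumptions and are not taken from the authors' analysis; a nonlinear least-squares routine could be substituted for the grid search.

```python
import math


def biexp(t, a1, tau1, a2, tau2):
    """Two-term exponential decay (assumed functional form, no offset)."""
    return a1 * math.exp(-t / tau1) + a2 * math.exp(-t / tau2)


def fit_amplitudes(ts, ys, tau1, tau2):
    """For fixed time constants, solve the 2x2 normal equations for the
    amplitudes and return (a1, a2, sum of squared errors)."""
    e1 = [math.exp(-t / tau1) for t in ts]
    e2 = [math.exp(-t / tau2) for t in ts]
    s11 = sum(x * x for x in e1)
    s22 = sum(x * x for x in e2)
    s12 = sum(x * y for x, y in zip(e1, e2))
    b1 = sum(x * y for x, y in zip(e1, ys))
    b2 = sum(x * y for x, y in zip(e2, ys))
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        return None  # basis functions are numerically indistinguishable
    a1 = (b1 * s22 - b2 * s12) / det
    a2 = (b2 * s11 - b1 * s12) / det
    sse = sum((y - a1 * x1 - a2 * x2) ** 2 for y, x1, x2 in zip(ys, e1, e2))
    return a1, a2, sse


def fit_biexp(ts, ys, taus):
    """Grid search over pairs of candidate time constants (taus must be in
    ascending order so that tau1 < tau2); amplitudes are solved linearly
    for each pair.
    Returns (a1, tau1, a2, tau2, sse) for the best pair, or None."""
    best = None
    for i, tau1 in enumerate(taus):
        for tau2 in taus[i + 1:]:
            result = fit_amplitudes(ts, ys, tau1, tau2)
            if result is None:
                continue
            a1, a2, sse = result
            if best is None or sse < best[4]:
                best = (a1, tau1, a2, tau2, sse)
    return best
```

Here ts would hold the simulation times and ys the fraction of water molecules still at their original locations; the fit quality depends on how finely the candidate time constants are spaced.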

Figure 2: Residence time analysis. (a) Residence times (τres) shown for all residues in the cryo-EM (green) and X-ray (blue) derived models (left); residues not bound by the cryo-EM, X-ray, or conserved X-ray (purple) water molecules (center); or residues bound by cryo-EM, X-ray, conserved X-ray, or non-conserved X-ray (gray) water molecules (right). Median values (solid line) and 25th and 75th percentiles (dotted lines) are shown. Residues with τres values < 0.4 ps or > 200 ps are not shown. (b) Solvent accessible surface area (SASA) values are indicated for subgroups, with median and quartiles, as in (a).