Correlation between water molecules identified in atomic models of β-galactosidase determined by cryo-EM and X-ray crystallography

Florentina Tofoleanu1*, Lesley A. Earl2,**, Frank C. Pickard IV1,*** & Bernard R. Brooks1

1Laboratory of Computational Biology, Biochemistry and Biophysics Center, National Institutes of Health, Bethesda, MD 20892 USA
2Laboratory of Cell Biology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892 USA

Corresponding Authors: Florentina Tofoleanu ([email protected]) & Bernard R. Brooks ([email protected])

*current address: Global Discovery Chemistry, Computer-Aided Drug Discovery, Novartis Institutes for BioMedical Research, 181 Mass Ave., Cambridge, MA 02139, USA
**current address: Office of Science Communications, Public Liaison and Education, National Eye Institute, National Institutes of Health, 31 Center Drive, Bethesda MD, 20892 USA
***current address: Pfizer World Research and Development, Groton, CT 06340, USA

ABSTRACT

The placement of individual water molecules can now be determined from high resolution cryo-electron microscopy (cryo-EM) maps, but only a fraction of these water molecules overlap with those observed in X-ray crystallographic maps. Furthermore, there is substantial variation between the placement of water molecules in multiple X-ray maps. Here, molecular dynamics (MD) simulations are used to compare the dynamics and energetics of water in a 2.2-Å-resolution cryo-EM structure with those of water conserved across 56 high-resolution X-ray crystallographic structures of β-galactosidase. The MD simulations indicate that water molecules observed in the cryo-EM map and those conserved among X-ray structures are dislodged from their initial locations at a similar rate, which is approximately half the rate of non-conserved water molecules.
Additionally, the residence times and the solvent exposure of the water binding sites are similar for the cryo-EM and conserved water molecules, and are statistically different from those of non-conserved water molecules. Conserved water has been shown to be involved in protein folding, dynamics and function. The fact that the properties of water molecules identified by a single cryo-EM map closely correspond to those conserved across multiple X-ray structures points towards the opportunities that cryo-EM opens for structure-based drug design.

KEYWORDS

Cryo-electron microscopy, molecular dynamics simulations, conserved water, residence time analysis, solvent exposure analysis

ABBREVIATIONS

Cryo-electron microscopy (cryo-EM); molecular dynamics (MD); solvent accessible surface area (SASA)

Introduction

Identifying protein solvation profiles is critical both for understanding protein structure and function and for designing high-affinity lead compounds for drug discovery (Poornima and Dean, 1995; Wong and Lightstone, 2011). Interactions between water molecules and amino acids occur both at the surface of the protein (Laage et al., 2017) and within the structure, including areas such as subunit interfaces, catalytic sites, and other cavities (Levy and Onuchic, 2006). While protein atoms can be accurately localized in high resolution X-ray crystallographic maps, the reliable placement of water molecules is more difficult (Nittinger et al., 2015). The distribution of water molecules in the protein may be influenced by the crystallization process itself, by crystal contacts (Nittinger et al., 2015), and by a variety of other factors (Juers, 2000; Juers et al., 2000).
One approach to identifying structurally important water molecules that are stably associated with the protein is to determine “conserved” water molecules, defined as the subset of water molecules that are placed at roughly the same location in an ensemble of independent high resolution X-ray structures of the same polypeptide (Knight et al., 2009; Levitt and Park, 1993). Identifying such conserved water molecules, which are expected to be more stable and thus more likely to be involved in long-lasting interactions, is of particular interest for the use of molecular models in structure-based drug design and other structural analyses.

Resolutions attainable by single particle cryo-electron microscopy (cryo-EM) have improved significantly over the last several years; ions and water molecules, in proximity to both polar and non-polar residues and in hydrogen-bonded chains, can now be reliably placed into the experimental electron density for the highest resolution maps (Banerjee et al., 2016; Bartesaghi et al., 2015; Bartesaghi et al., 2018; Campbell et al., 2015; Laverty et al., 2019; Merk et al., 2016; Nakane et al., 2020). For water density to be observed by cryo-EM, the water molecules must occupy an identical location in the vast majority of the molecules from which the final 3D structure is derived by averaging.

Here, we use molecular dynamics (MD) simulations to test the locations of structural water molecules by modeling the protein-water interactions, which have significant effects on the behavior of the protein, including protein folding and stability, and protein-ligand interactions (Privalov and Crane-Robinson, 2017). We simulate both cryo-EM and X-ray derived models of β-galactosidase to evaluate water dynamics and identify those water molecules more stably associated with the protein, providing a benchmark by which to evaluate the physical qualities of water molecules placed by each method.
We show that, compared to water molecules placed in high-resolution X-ray structures, a large proportion of water molecules observed by cryo-EM are structurally stable. Historically, X-ray crystallography has been the major source of experimentally based water placement in structural models of biological complexes; however, as cryo-EM begins to achieve sufficient resolution to experimentally observe water density for stable water molecules, cryo-EM models will increasingly be available for use in a variety of contexts.

Materials and Methods

Setting up the systems

Models of Escherichia coli β-galactosidase monomers were constructed from chain A of PDB 5A1A (cryo-EM) and from chain A of PDB 1DP0 (X-ray). 1DP0 is at 1.7 Å resolution, has amino acid residues 14-1024, and includes 4 Mg2+ and 5 Na+ ions, as well as 1134 assigned water molecules per monomer, while 5A1A is at 2.2 Å resolution, has amino acid residues 2-1024 (we added amino acid 1 to our monomer model), and includes 2 Mg2+ and 2 Na+ ions, 194 assigned water molecules, and the inhibitor phenylethyl β-D-thiogalactopyranoside (PETG) in each monomer. The parameters for PETG were generated using the CHARMM generalized force field (CGenFF) (Vanommeslaeghe et al., 2010). We retained all water molecules and ions found in the initial structures in our monomer models. Details of the two systems are found in Supplementary Table 1. We solvated the structures with the explicit TIP3P water model and added Mg2+, Na+ and Cl− ions to neutralize the systems and to provide ionic concentrations of 2 mM MgCl2 and 50 mM NaCl, the solution conditions used experimentally in our previous study (Bartesaghi et al., 2014). Histidine residues were kept neutral. The two systems were prepared for simulation by minimization, followed by heating to 300 K (145 ps) while constraining the protein backbone and the experimentally placed ions and water molecules. Finally, equilibration (500 ps) was performed with the Cα atoms harmonically restrained (Figure 1a).
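As a rough illustration of how target salt concentrations such as 2 mM MgCl2 and 50 mM NaCl translate into ion counts, the sketch below converts a molar concentration and a box volume into a whole number of formula units. The 130 Å cubic box is a hypothetical value chosen only for this example (the actual box dimensions are not stated here), the function name is ours, and the counts do not include any ions added for neutralization.

```python
AVOGADRO = 6.02214076e23            # formula units per mole
LITERS_PER_CUBIC_ANGSTROM = 1e-27   # 1 Å^3 = 1e-27 L


def formula_units(molar, box_volume_A3):
    """Nearest whole number of salt formula units for a target molarity."""
    moles = molar * box_volume_A3 * LITERS_PER_CUBIC_ANGSTROM
    return round(moles * AVOGADRO)


# Hypothetical cubic box, 130 Å on a side (an assumption, not from the paper).
box_volume = 130.0 ** 3

n_nacl = formula_units(0.050, box_volume)   # 50 mM NaCl
n_mgcl2 = formula_units(0.002, box_volume)  # 2 mM MgCl2

ions = {
    "Na+": n_nacl,
    "Mg2+": n_mgcl2,
    "Cl-": n_nacl + 2 * n_mgcl2,  # one Cl- per NaCl, two per MgCl2
}
print(ions)
```

Using the full box volume is a simplification; basing the count on the solvent volume alone would give somewhat smaller numbers.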
The preparation steps were followed by 100 ns simulations at constant number of particles, constant pressure, and constant temperature (NPT ensemble) using the CHARMM package (Brooks et al., 1983) with the CHARMM36 force field, saving trajectory frames every 10 ps (Figure 1b). Pressure was maintained at 1 atm with the Langevin piston method and the temperature was kept constant at 300 K with a Nosé-Hoover thermostat (Feller et al., 1995; MacKerell et al., 1998), with a time step of 1 fs. Particle mesh Ewald (Essmann et al., 1995) was used to impose periodic boundary conditions for simulation of the monomer in bulk solvent. A further simulation of 20 ns, saving a frame every 0.1 ps, was performed for the analysis of water residence times (see below). Structure and trajectory visualization and rendering were performed using Visual Molecular Dynamics (VMD) (Humphrey et al., 1996) and PyMOL (Schrodinger, 2015), and ribbon model visualizations were completed in UCSF Chimera (Pettersen et al., 2004). CHARMM and custom-written Python and Tcl scripts were employed for trajectory and structural analysis.

Identification of conserved water

To identify conserved water molecules across all X-ray crystal structures deposited in the Protein Data Bank for E. coli β-galactosidase, we analyzed 56 PDB structures, each containing one to four tetramers. In total, we identified 272 monomer chains (Table S2), each with its own set of water molecules. We discarded monomers from structures with resolutions poorer than 2.2 Å (a total of 124 chains, PDB IDs: 1F4A, 1F4H, 1HN1, 3E1F, 3IAQ, 3T0B, 3MUY, 3T2P, 3T2Q, 3VDA, 3VDC, 3VD3, 3VD5, 3VD7, 4DUX, 4V40, 4V41, 4V44 and 4V45). The remaining 148 chains were aligned to chain A of 1DP0 (1DP0_A), our X-ray reference structure. A water molecule in the reference structure was considered conserved if it was coincident, within 1.5 Å, with a water molecule in all other 148 structures (Aksianov et al., 2008).
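The 1.5 Å coincidence criterion can be illustrated with a minimal Python sketch. It is not the script used in this work: it assumes that the water oxygen coordinates of every chain have already been superposed onto the reference frame, it uses a brute-force distance search, and the function name and toy coordinates are hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+


def conserved_waters(reference, others, cutoff=1.5):
    """Indices of reference waters that have a partner in every other chain.

    reference: list of (x, y, z) water oxygen positions in the reference chain.
    others:    list of chains, each a list of (x, y, z) water positions,
               assumed to be pre-aligned to the reference chain.
    cutoff:    maximum separation in Å for two waters to count as coincident.
    """
    conserved = []
    for i, ref_xyz in enumerate(reference):
        # The water must be matched in all chains, by any water in each chain.
        if all(any(dist(ref_xyz, xyz) <= cutoff for xyz in chain)
               for chain in others):
            conserved.append(i)
    return conserved


# Toy data (hypothetical): three reference waters and two aligned chains.
reference = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
others = [
    [(0.5, 0.0, 0.0), (5.0, 1.0, 0.0)],
    [(0.0, 0.0, 1.2), (9.0, 3.0, 0.0)],
]
result = conserved_waters(reference, others)
print(result)
```

In the toy data only the first reference water has a partner within 1.5 Å in both chains. For real structures with hundreds of waters per chain, a spatial index such as a k-d tree would avoid the all-pairs search.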
We identified 195 such water molecules and mapped them into 1DP0_A.

Water retention analysis

Throughout the analysis, we