Copyright ©2006, Accelrys Software Inc. All rights reserved.

Preparing a PDB File

The Protein Data Bank (PDB) is possibly the world's leading public source of three-dimensional data for biological molecules (1). As of July 2006, over 37,000 entries could be found in the PDB, and hundreds more are added every month. X-ray diffraction and other solid-state techniques account for the majority of the structures; however, over 5,500 NMR structures are also available. The deposited structures include proteins, peptides, nucleic acids, carbohydrates, and complexes of these molecules.

Figure 1: Schematic view of the ligand-binding domain from the vitamin D receptor (PDB file 1IE9). The crystallographic waters are shown as small spheres and the bound ligand is shown as a CPK model.

As a first step in a modeling project, many researchers look in the PDB for available structures related to their project. Preparing these molecules for work in the Discovery Studio environment is a critical part of your modeling efforts.

In the following steps, we will load a PDB file for the ligand-binding domain of the vitamin D receptor (VDR) in complex with a ligand (named VDX in this exercise). The file is 1IE9, as reported by Tocchini-Valentini et al. (5). The vitamin D receptor regulates the expression of a variety of genes, including genes involved in calcium metabolism, bone formation, and cell growth and differentiation (2). Understanding the VDR conformational changes that result from interactions with bound ligands may help to identify and treat persons at risk for disorders such as osteoporosis, breast cancer, or prostate cancer.

Figure 2: 1α,25-dihydroxyvitamin D3, the metabolized form of vitamin D. It is named VDX in this exercise.

In this exercise, you will retrieve the PDB file and prepare it for an energy calculation such as an energy minimization or a molecular dynamics simulation.
You will learn in this lesson how to:
• Load a PDB file directly from the Protein Data Bank,
• Split the molecule,
• Edit non-protein components,
• Combine components,
• Rename objects using the Hierarchy View, and
• Prepare and run an energy calculation.

1. Load the PDB file

Discovery Studio must be running to start the exercise. Launch the Discovery Studio client if it is not already running. If Discovery Studio is already running, select the Close All command from the Windows menu. If you are prompted to save any molecules or data, select No.

Now we will load the PDB file directly from the RCSB through the Discovery Studio interface.

Note: The File | Open URL… command can obtain files only through a network connection to a PDB server. An error is returned if the connection cannot be made. Check with your instructor or system administrator whether a connection is available from your workshop location.

From the File menu, select the Open URL… command. In the dialog box, the URL should refer to the RCSB site. Replace the last four characters of the URL with 1ie9 and click the Open button. If a connection to the PDB cannot be made, the required file is available in the DataFiles directory as 1IE9.pdb.

The structure displayed in the 3D Window is the crystal structure of the ligand-binding domain of the vitamin D receptor with a bound ligand and 225 crystallographic waters.

From the View menu, select the Hierarchy command. Note the distribution of components in the Hierarchy View. We will rearrange the components to make the structure easier to work with.

2. Remove the unit cell

The crystallographic unit cell often has little meaning in molecular mechanics calculations, so it is often simpler to remove it. From the Structure menu, select the Crystal Cell pullright menu and choose the Remove Cell command. Note the change in the Hierarchy View as the unit cell information is removed.
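Outside Discovery Studio, the download and unit-cell steps above can be approximated on the raw PDB text. The following is a minimal Python sketch, not a description of how Discovery Studio implements these commands. It assumes the RCSB download address https://files.rcsb.org/download/<ID>.pdb (the Open URL dialog may point elsewhere), mirrors the DataFiles fallback described above, and treats the CRYST1, SCALEn, and ORIGXn records as the unit cell information. The helper names pdb_url, load_pdb, and remove_unit_cell are hypothetical.

```python
import urllib.request
from pathlib import Path

# Assumed RCSB download endpoint; the Open URL dialog may use another address.
RCSB_DOWNLOAD = "https://files.rcsb.org/download/{code}.pdb"

# Records treated here as crystal cell information (assumption: the
# Remove Cell command corresponds roughly to dropping these records).
CELL_RECORDS = ("CRYST1", "SCALE", "ORIGX")


def pdb_url(code: str) -> str:
    """Build the download URL for a four-character PDB ID such as '1ie9'."""
    code = code.strip()
    if len(code) != 4 or not code.isalnum():
        raise ValueError(f"not a four-character PDB ID: {code!r}")
    return RCSB_DOWNLOAD.format(code=code.upper())


def load_pdb(code: str, fallback_dir: str = "DataFiles") -> list[str]:
    """Download a PDB entry; fall back to a local copy if the network fails."""
    try:
        with urllib.request.urlopen(pdb_url(code), timeout=30) as response:
            text = response.read().decode("ascii", errors="replace")
    except OSError:
        # URLError and timeouts are OSError subclasses; read e.g. DataFiles/1IE9.pdb.
        text = (Path(fallback_dir) / f"{code.strip().upper()}.pdb").read_text()
    return text.splitlines()


def remove_unit_cell(lines: list[str]) -> list[str]:
    """Drop CRYST1, SCALEn and ORIGXn records; keep every other line."""
    return [line for line in lines if not line.startswith(CELL_RECORDS)]
```

For example, remove_unit_cell(load_pdb("1ie9")) returns the entry's lines without the cell records, while the coordinates of the protein, ligand, and waters are left untouched, matching the check in the next step.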
Expand the object 1ie9 to verify that the ligand and water molecules are still present.

3. Split the components of the molecular system

This PDB file is composed of many parts. We could view the components through the Hierarchy View explored in other exercises; however, the Split command separates the components of the PDB file more quickly.

Open the Tools Explorer, and then open the Protein Reports and Utilities tool. Find the Split tool group. Three options are listed; they are described below. Click the All command.

Note the change in the Hierarchy View. The three components of the PDB file have been segregated into separate objects and renamed: the protein (1ie9_A), the ligand (1ie9_NonProtein), and the crystallographic water (1ie9_Water).

The three options for the Split command allow some flexibility in how objects are manipulated:
• All: Splits out all chains and non-protein substructures into separate objects in the Hierarchy View and lists each amino acid sequence as a separate sequence in the Sequence View.
• Protein: Splits out all protein chain substructures into separate objects in the Hierarchy View and lists each amino acid sequence as a separate sequence in the Sequence View.
• Non-Protein: Splits out ligands and other non-protein substructures, such as waters, into separate objects in the Hierarchy View and lists any non-protein polypeptide sequences as separate sequences in the Sequence View.

4. Remove the water

For this exercise, we will remove the crystallographic water molecules. The waters could be removed in several ways; because we have split the system into a new hierarchical arrangement, we will use the Hierarchy View. In the Hierarchy View, select the entry 1ie9_Water. The entry should be selected in both the Hierarchy View and the 3D Window. Press the Delete key on the keyboard.
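The Split > All command and the water deletion can likewise be approximated on the raw PDB records. The hypothetical Python helper below groups ATOM records by chain ID, puts HETATM records with water residue names into a Water group, and puts every other HETATM record into a NonProtein group. This is a rough sketch, not Discovery Studio's logic: the group keys, the set of water names, and the rule that HETATM means non-protein are all assumptions (modified residues such as selenomethionine, usually stored as HETATM, would be misfiled).

```python
from collections import defaultdict

# Residue names treated as water (assumption; HOH is the usual PDB name).
WATER_NAMES = {"HOH", "WAT", "DOD"}


def split_components(lines: list[str]) -> dict[str, list[str]]:
    """Roughly mimic Split > All on PDB text.

    Returns hypothetical group keys: 'chain_<ID>' for each protein chain,
    'NonProtein' for ligands and other HETATM groups, and 'Water'.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for line in lines:
        record = line[:6].strip()
        if record not in ("ATOM", "HETATM"):
            continue  # HEADER, CRYST1, CONECT, END, etc. are ignored here
        res_name = line[17:20].strip()  # PDB columns 18-20
        chain_id = line[21:22].strip() or "_"  # PDB column 22; "_" if blank
        if record == "ATOM":
            groups[f"chain_{chain_id}"].append(line)
        elif res_name in WATER_NAMES:
            groups["Water"].append(line)
        else:
            groups["NonProtein"].append(line)
    return dict(groups)


def remove_water(groups: dict[str, list[str]]) -> dict[str, list[str]]:
    """Counterpart of deleting the water object: drop the 'Water' group."""
    return {name: recs for name, recs in groups.items() if name != "Water"}
```

For example, remove_water(split_components(lines)) keeps the receptor chain and the ligand group while discarding the crystallographic waters.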
The 225 water molecules are removed from both the Hierarchy View and the 3D Window.

5. Rename components

To simplify the object names, we can rename them. First, we will rename the overall system. In the Hierarchy View, select the entry 1ie9_A. Right-click and select the command Attributes of 1ie9_A. In the Molecule Attributes dialog, select the Name cell, change the entry to Complex, and click the OK button.

Next, we will rename the protein chain. Expand the entry for Complex and select the entry A. Right-click and select the command Attributes of A. In the Molecule Attributes dialog, select the Name cell, change the entry to Receptor, and click OK.

Now we will group the ligand VDX with the receptor. While holding down the left mouse button in the Hierarchy View, drag the entry for 1ie9_NonProtein into the entry Complex. The resulting hierarchical arrangement should appear as in Figure 3. Note that the ligand has been renamed <Chain>; this is the name of the molecule that we would have seen under the object 1ie9_NonProtein.

Figure 3: Hierarchy View with the merged objects

Finally, we will rename the ligand. In the Hierarchy View, select the entry <Chain>. Right-click and select the command Attributes of Chain. Enter VDX as the object name and click the OK button. The final Hierarchy View should appear as in Figure 4.

Figure 4: Hierarchy window showing the merged and renamed objects

6. Clean the protein

Discovery Studio can automate many of the tasks required to properly prepare a protein for an energy calculation. Before performing the clean operation, we can specify which operations to conduct through the Preferences dialog.

From the Edit menu, select the Preferences… command. In the Preferences dialog, expand the Protein Utilities page and select the Clean Protein page. At this point, we have several options that can be set.
These options address common problems that may be present in PDB files. For example, X-ray crystal structures typically lack hydrogen atoms, so these must be added. Also, the chain ends must be set with the correct chemistry, and any missing atoms in residues must be placed. The options are described below.

Correct Problems
• NonStandard Names: Checks whether atom names conform to the standard names and corrects them if necessary.
• Disorder (Retain One Set): Checks for disordered atoms and retains only the first set.
• Incomplete Residues: Adds missing side-chain atoms to amino acids. This operation will not fill in missing loop regions.

Desired pH
Allows the protonation state of the ionizable amino acids and the termini to be controlled using standard pKa values. The protonation state is adjusted after any modifications of the hydrogens and termini made by the following options.

Figure 5: Clean Protein preferences panel

Hydrogens
If the Modify Hydrogens option is checked, hydrogens will be added as needed:
• All Hydrogens: All hydrogen atoms are added.
• Polar Hydrogens: Only those hydrogen atoms that could be involved in hydrogen bonds are added.
• No Hydrogens: No hydrogen atoms are added.

Termini
If the Modify Termini option is checked, termini will be added or removed as indicated.

Fix Connectivity and Bond Orders
Ensures that amino acids have the correct bond orders. This option has no effect on nonstandard amino acids or ligands.

We will set the Clean Protein preferences now, as shown in Figure 5:
• Turn on all toggles in the Correct Problems section.
• Toggle on Modify Hydrogens and All.
• Toggle on Modify Termini and Add.

Click the OK button in the Preferences dialog and return to the 3D Window.
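The Desired pH option rests on the relation between pH and pKa. For a single ionizable group, the Henderson-Hasselbalch equation gives the fraction in the protonated form as 1 / (1 + 10^(pH - pKa)), so a group is mostly protonated when the pH is below its pKa. The Python sketch below applies that rule. The pKa values are approximate textbook numbers chosen for illustration (an assumption; they may differ from the standard values Discovery Studio uses), the function names are hypothetical, and the sketch ignores the local protein environment, which can shift pKa values substantially.

```python
# Approximate textbook pKa values (assumption: illustrative only; published
# tables differ by a few tenths, and the protein environment can shift them).
STANDARD_PKA = {
    "ASP": 3.9, "GLU": 4.1, "HIS": 6.0, "CYS": 8.3, "TYR": 10.1,
    "LYS": 10.5, "ARG": 12.5, "NTERM": 8.0, "CTERM": 3.1,
}
# Groups that become negatively charged when they lose a proton; the others
# become positively charged when they gain one.
ACIDIC = {"ASP", "GLU", "CYS", "TYR", "CTERM"}


def fraction_protonated(pka: float, ph: float) -> float:
    """Henderson-Hasselbalch: fraction of the group in its protonated form."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


def is_protonated(group: str, ph: float) -> bool:
    """Assign the dominant form: protonated if more than half is protonated."""
    return fraction_protonated(STANDARD_PKA[group.upper()], ph) > 0.5


def expected_charge(group: str, ph: float) -> float:
    """Average charge at this pH (acidic groups 0 or -1, basic groups +1 or 0)."""
    group = group.upper()
    f = fraction_protonated(STANDARD_PKA[group], ph)
    return -(1.0 - f) if group in ACIDIC else f


if __name__ == "__main__":
    # Example pH of 7.4 is an arbitrary choice, not the tutorial's setting.
    for name in ("ASP", "HIS", "LYS"):
        print(name, is_protonated(name, 7.4), round(expected_charge(name, 7.4), 2))
```

Assigning a single dominant state per residue is a simplification that molecular mechanics force fields require, because each atom carries a fixed partial charge; residues whose pKa lies close to the chosen pH (histidine near neutral pH, for example) are the ones most worth checking by hand.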