Finding Protein and Molecular Structures

Part of the Jmol Training Guide from the MSOE Center for BioMolecular Modeling
Interactive version available at http://cbm.msoe.edu/teachingResources/jmol/jmolTraining/structures.html

Introduction

In order to view a protein or molecule using Jmol, or any other molecular visualization program, you need a 3-dimensional structure file. These files contain the (X, Y, Z) coordinates of the atoms that make up a structure, along with information about each atom. They can vary dramatically in both size and internal format, depending on how large the structure is and how the file was created. The most common molecular structure file formats that you will use with Jmol are Protein Data Bank (.pdb) files and MDL Molfile (.mol) files.

Types of Structure Files

Protein Data Bank (.pdb) Files

The Protein Data Bank (.pdb) file format is curated and annotated by the RCSB Protein Data Bank (www.pdb.org). The RCSB PDB is an international database of archived information about the 3D shapes of proteins, nucleic acids, and complex assemblies that helps students and researchers understand all aspects of microbiology. The RCSB Protein Data Bank has also created tools and resources for research and education in molecular biology, structural biology, computational biology, and beyond. It is the primary source for large protein structure files and is discussed in more detail below.

MDL Molfile (.mol) Files

The MDL Molfile (.mol) file format was created by MDL Information Systems and was later given a chemical MIME type as part of the Chemical MIME Project by Henry Rzepa and colleagues. It is similar to the .pdb format in that it contains the 3-dimensional locations of the atoms in a molecular structure. Unlike .pdb files, however, .mol files are most often used for smaller structures such as ligands, drugs and sugars. There are many sources of .mol files, including ChemSpider, DrugBank and the NIH Cactus Server.
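As a rough illustration of how a .mol file stores atom locations, the sketch below reads the atom block of a small hand-written file. It assumes the common V2000 layout (three header lines, a counts line, then one line per atom with fixed-width X, Y, Z fields followed by the element symbol). The function name read_mol_atoms and the approximate water geometry in SAMPLE_MOL are made up for this example and do not come from the guide or from any of the databases named above.

```python
# Hand-written sample in the assumed V2000 layout (approximate water geometry).
SAMPLE_MOL = "\n".join([
    "water",                                      # line 1: molecule name
    "  hand-written example",                     # line 2: program information
    "",                                           # line 3: comment (may be blank)
    "  3  2  0  0  0  0  0  0  0  0999 V2000",    # line 4: atom and bond counts
    "    0.0000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "    0.9572    0.0000    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0",
    "   -0.2400    0.9266    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  0",                               # bond: atom 1 to atom 2, single
    "  1  3  1  0",                               # bond: atom 1 to atom 3, single
    "M  END",
])


def read_mol_atoms(mol_text):
    """Return a list of (element, x, y, z) tuples from a V2000 .mol file."""
    lines = mol_text.splitlines()
    # The first three characters of the counts line hold the number of atoms.
    atom_count = int(lines[3][0:3])
    atoms = []
    for line in lines[4:4 + atom_count]:
        x = float(line[0:10])             # X coordinate, 10 characters wide
        y = float(line[10:20])            # Y coordinate
        z = float(line[20:30])            # Z coordinate
        element = line[31:34].strip()     # element symbol
        atoms.append((element, x, y, z))
    return atoms


if __name__ == "__main__":
    for atom in read_mol_atoms(SAMPLE_MOL):
        print(atom)
```

Fixed columns are used instead of splitting on whitespace because neighbouring coordinate fields can touch when a value fills its whole column.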
Many chemical drawing programs, such as ChemDraw and ChemDoodle, export .mol files so that drawn structures can be viewed in 3-dimensional visualization programs.

Inside a Structure File

Once a structure has been determined, each atom in the structure is assigned an (X, Y, Z) coordinate that marks its location in 3-dimensional space. Additional information complements these basic coordinates, including the type of atom at each location and the chain and residue the atom belongs to. Some structure files also contain resolution data, temperature factors, electrostatic potential data and more. The image to the right shows a short excerpt from inside a structure file.

For more information on structure files and how they are determined, visit these RCSB Protein Data Bank resources:

Understanding PDB Data
http://www.rcsb.org/pdb/101/static101.do?p=education_discussion/Looking-at-Structures/intro.html

Methods for Determining Atomic Structures
http://www.rcsb.org/pdb/101/static101.do?p=education_discussion/Looking-at-Structures/methods.html

The RCSB Protein Data Bank

The RCSB Protein Data Bank (http://www.pdb.org) is the largest worldwide repository for the processing and distribution of .pdb structure data for large molecules such as proteins and nucleic acids. There are now well over 100,000 structure files available on the www.pdb.org website!

Finding Structures on the Protein Data Bank

Each structure hosted on the Protein Data Bank has a unique four-character alphanumeric identifier, referred to as the structure's PDB ID. Often more than one .pdb file exists for a specific type of protein. For example, there are hundreds of .pdb entries for the relatively common protein hemoglobin. It is a good idea to use the following questions about a structure to help determine whether you have found the best possible file.

Who are the authors of the PDB file?
In which journal was the primary citation published?
On what date was the file deposited into the PDB?
How many chains are in this file?
Are there any heterologous groups (components other than the protein or nucleic acid chains, such as ligands or ions) within this PDB file? If so, which ones?
From what source was this molecule isolated?

The Structure Summary Page

When you click on a specific PDB ID, you will first see the Structure Summary page for the structure. This page includes a variety of useful information about the structure.

Structure Preview Image - Provides a quick overview of what the molecule or protein looks like.
Structure ID Number - This four-character letter/number ID is a unique identifier that is assigned to the crystal data file upon deposition into the database.
Source of the Molecule - The species from which the molecule was isolated, such as human, bacterium, virus, mouse, etc.
Title - The title of the .pdb file.
Authors - The researchers who were involved with the crystallization of the molecule. The senior author or principal investigator is usually the last author in science publications.
Primary Citation - The journal article that accompanies the .pdb file. This is usually an excellent research resource for understanding the function of the molecule.
Molecular Description - The abstract associated with the primary citation.
Chemical Component - The number of chains within the molecule and the identity of each chain. For example, in the hemoglobin file 1a3n.pdb, chains A and C are the alpha-globin molecules and chains B and D are the beta-globin molecules. This section also tells you whether any heterologous groups were crystallized with the molecule. Not all .pdb files will have this section.
   o The 2-3 letter identifiers used to designate the chemical components listed in this section are recognized by Jmol and can be used to select these molecules with the Jmol Console.
   o For example, if this section stated that NAG (N-acetyl-glucosamine) was contained within the molecule, Jmol would recognize "NAG". You could therefore enter "select NAG" and Jmol would select the atoms within that chemical component of the PDB file.
Method of Structure Determination - The method that was used to obtain the structural data (NMR, X-ray diffraction).
Resolution - A measure of how much detail the data resolves; the smaller the number, the better the data.

The View in 3D Window

The View in 3D Window lets you preview the structure using a web-embedded version of Jmol. To view this preview, click the "View in 3D: JSmol" button located directly below the molecule image on each Structure Summary page.

The Sequence Page

Just above the .pdb file Title should be a series of tabs, the fourth of which is the Sequence tab. This section of the .pdb file page provides specific sequence information as well as secondary structure information about the molecule. You can identify the alpha helices and beta sheets, as well as the amino and carboxyl termini, which are the first and last amino acids of the protein.

The Two Ways to Obtain a .pdb Structure

One of the key features of the Protein Data Bank is the ability to search the database for files. You can search for a unique structure if you know its PDB ID, or search by keywords and authors. To submit a search query, enter these terms in the search box located near the top center of every www.pdb.org page. After you have entered the search terms, press Enter or click the "Go" button to the right of the search field.

There are two ways to obtain a .pdb file:

1. Download the File from the RCSB Protein Data Bank website.
   a. Go to the website http://www.pdb.org
   b. In the top right corner of the website is a search bar similar to the image below. Type in the four-character PDB ID (in this case, "1qys") and click the "Search" button.
   c. This should bring you to the page for "1qys.pdb - Top 7". Just below the search box on the right should be a list of four options. Click "Download Files" and you will see an expanded menu similar to the image shown below.
   d. Click "PDB Format" to begin the download of the .pdb file containing the coordinates for Top 7. This file, named "1qys.pdb", can be saved to the location of your choosing on your computer. Note that it is a good idea to create a new folder for each molecule you work on, to organize all of your .pdb files, images, and other related work.
2. Dynamically Load the File from the RCSB Protein Data Bank Server.
   As long as you have an Internet connection, Jmol allows you to connect to the RCSB Protein Data Bank and load a structure without downloading it permanently to your computer. You will, however, need to know the four-character alphanumeric PDB ID of the structure you are looking for. To load the structure file 1qys.pdb, enter:

   load =1qys

   Note that you do not need to add the file extension (.pdb) when entering this command; only the four-character alphanumeric PDB ID is needed. You do, however, need to include the equal sign "=" with no spaces between it and the PDB ID. The equal sign tells Jmol that you want to access the RCSB Protein Data Bank servers to find the structure, rather than looking for a file locally on your computer.

Additional Resources from the RCSB Protein Data Bank

The RCSB Protein Data Bank has several regularly updated features, as well as some interesting interviews and newsletters, that may be useful for any Jmol designer. The Molecule of the Month by David S. Goodsell provides an introduction to the structure and function of a molecule, a discussion of its relevance to health and disease, interactive views, discussion topics, and links to related entries.
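Returning to the two ways of obtaining a .pdb file, and to the atom records described under Inside a Structure File, the following Python sketch shows one possible way to work with them outside of Jmol. It rests on several assumptions that are not stated in this guide: that files can be downloaded from a files.rcsb.org address of the form shown, that a PDB ID is exactly four letters or digits, and that ATOM records follow the standard fixed-column layout of the PDB format. The helper names (is_pdb_id, download_url, jmol_load_command, parse_atom_line) are invented for the example, and SAMPLE_ATOM is a hand-written record rather than a line copied from 1qys.pdb.

```python
import re

# Assumption: a PDB ID is exactly four alphanumeric characters.
PDB_ID = re.compile(r"[0-9A-Za-z]{4}")


def is_pdb_id(text):
    """Return True if text looks like a four-character PDB ID."""
    return PDB_ID.fullmatch(text) is not None


def download_url(pdb_id):
    """Build a download link for route 1 (assumed files.rcsb.org layout)."""
    if not is_pdb_id(pdb_id):
        raise ValueError(f"not a four-character PDB ID: {pdb_id!r}")
    return f"https://files.rcsb.org/download/{pdb_id.upper()}.pdb"


def jmol_load_command(pdb_id):
    """Build the Jmol Console command for route 2 (no space after '=')."""
    if not is_pdb_id(pdb_id):
        raise ValueError(f"not a four-character PDB ID: {pdb_id!r}")
    return f"load ={pdb_id.lower()}"


# Hand-written ATOM record, assembled field by field so that the fixed
# column positions of the format are easy to see.
SAMPLE_ATOM = (
    "ATOM  "      # columns 1-6: record type
    "    1"       # columns 7-11: atom serial number
    " "           # column 12: unused
    " N  "        # columns 13-16: atom name
    " "           # column 17: alternate location indicator
    "GLY"         # columns 18-20: residue name
    " "           # column 21: unused
    "A"           # column 22: chain identifier
    "   1"        # columns 23-26: residue sequence number
    "    "        # columns 27-30: insertion code and padding
    "   1.000"    # columns 31-38: X coordinate
    "   2.000"    # columns 39-46: Y coordinate
    "   3.000"    # columns 47-54: Z coordinate
)


def parse_atom_line(line):
    """Extract atom name, residue, chain and coordinates from one ATOM line."""
    return {
        "atom": line[12:16].strip(),
        "residue": line[17:20].strip(),
        "chain": line[21],
        "residue_number": int(line[22:26]),
        "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
    }


if __name__ == "__main__":
    print(download_url("1qys"))
    print(jmol_load_command("1qys"))
    print(parse_atom_line(SAMPLE_ATOM))
```

The sketch only builds the link and the command; it does not contact the server, so it can be tried without an Internet connection.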
