A Resource for Membrane-Embedded Protein Structures and Their Lipid Interactions
Total Page:16
File Type:pdf, Size:1020Kb
D390–D397 Nucleic Acids Research, 2019, Vol. 47, Database issue Published online 12 November 2018 doi: 10.1093/nar/gky1047 The MemProtMD database: a resource for membrane-embedded protein structures and their lipid interactions Thomas D. Newport, Mark S.P. Sansom and Phillip J. Stansfeld* Downloaded from https://academic.oup.com/nar/article-abstract/47/D1/D390/5173663 by University of Warwick user on 11 November 2019 Department of Biochemistry, University of Oxford, South Parks Road, Oxford, OX1 3QU, UK Received September 09, 2018; Revised October 05, 2018; Editorial Decision October 08, 2018; Accepted October 16, 2018 ABSTRACT continuous improvements to detergent solubilization and crystallization protocols, such as Lipidic Cubic Phase (6), Integral membrane proteins fulfil important roles in HiLiDe (7) and MemGold (8). Meanwhile, the enhanced many crucial biological processes, including cell sig- resolution and variety of structures solved by Cryo-Electron nalling, molecular transport and bioenergetic pro- Microscopy (9) opens up a wealth of new possibilities. cesses. Advancements in experimental techniques With rapid growth in the number of protein structures are revealing high resolution structures for an in- available, several database systems have been developed in creasing number of membrane proteins. Yet, these order to organize and annotate these structures in a bio- structures are rarely resolved in complex with mem- logically meaningful way. The PDB is a universal reposi- brane lipids. In 2015, the MemProtMD pipeline was tory for all experimentally derived membrane protein struc- developed to allow the automated lipid bilayer as- tures, holding a single accession for each deposited struc- sembly around new membrane protein structures, re- ture. Structures deposited in the PDB may be linked to one or more UniProt (10) protein sequence, which in turn can be leased from the Protein Data Bank (PDB). To make grouped into larger Pfam (11) families. In addition the Gene these data available to the scientific community, Ontology (GO) project (12) defines a standardized way to a web database (http://memprotmd.bioch.ox.ac.uk) further annotate proteins with attributes ranging from func- has been developed. Simulations and the results of tional notes to cellular location. Several specialized classi- subsequent analysis can be viewed using a web fication systems also exist to classify membrane proteins; browser, including interactive 3D visualizations of Membrane Proteins of Known Structure (mpstruc; http: the assembled bilayer and 2D visualizations of lipid //blanco.biomol.uci.edu/mpstruc/)(13) places membrane contact data and membrane protein topology. In addi- protein structures into a hand-curated tree, whilst the Trans- tion, ensemble analyses are performed to detail con- porter Classification DataBase (TCDB; http://www.tcdb. served lipid interaction information across proteins, org)(14) provides an automatically generated tree look- families and for the entire database of 3506 PDB en- ing specifically at transporters and channels. The GPCRdb provides comprehensive data, diagrams and web tools for tries. Proteins may be searched using keywords, PDB G protein-coupled receptors (GPCRs; http://gpcrdb.org) or Uniprot identifier, or browsed using classification (15). The membrane protein database (MPDB; http://www. systems, such as Pfam, Gene Ontology annotation, mpdb.tcd.ie) details the conditions and additives used to ex- mpstruc or the Transporter Classification Database. perimentally solve membrane protein structures (16). All files required to run further molecular simulations Membrane proteins are embedded in a lipid bilayer, of proteins in the database are provided. which has significant influence on both the structure and function of the protein. Whilst it is not uncommon to be INTRODUCTION able to identify a few lipids when determining a membrane protein structure (17), most lipid components of the mem- There are now ∼3500 structures of over 1000 unique in- brane are highly dynamic and rapidly exchanged (18), so tegral membrane proteins deposited in the Protein Data are not readily resolved by current structure determination Bank (PDB) (1,2). Membrane proteins are of consider- techniques. able biomedical interest, constituting ∼25% of published Methods such as Positioning of Proteins in Membrane genomes (3) and 50% of current drug targets (4). A near- (PPM) (19), TMDET (20) and Memembed (21) can use exponential increase in the number of published membrane the distribution of surface amino acid residues to infer protein structures (5) looks set to be sustained through *To whom correspondence should be addressed. Tel: +44 1865 613362; Fax: +44 1865 613201; Email: [email protected] Present address: Thomas D. Newport, Oxford Nanopore Technologies, Gosling Building, Edmund Halley Road, Oxford Science Park, OX4 4DQ, UK. C The Author(s) 2018. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. Nucleic Acids Research, 2019, Vol. 47, Database issue D391 both the tilt of the protein and the thickness of the lipid ilies. The database can be browsed using a freely acces- bilayer, modelling the bilayer as a slab bounded by two sible web application (http://memprotmd.bioch.ox.ac.uk), planes. PPM is used in the Orientations of Proteins in Mem- and results of analyses, as well as files and instructions re- branes (OPM; https://opm.phar.umich.edu) database, while quired to perform further simulations can be downloaded. TMDET is applied to entries in the Protein Data Bank of Transmembrane Proteins (PDBTM; http://pdbtm.enzim. MATERIALS AND METHODS hu). Although these methods can accurately predict the lo- cation of transmembrane (TM) domains, they lack explicit Implementation Downloaded from https://academic.oup.com/nar/article-abstract/47/D1/D390/5173663 by University of Warwick user on 11 November 2019 lipids and so are unable to capture specific interactions be- The initial CGSA simulation setup is performed using the tween individual lipid molecules and sites on the surface of MemProtMD pipeline (34). In brief, integral membrane the protein. Highly specific protein–lipid interactions can proteins are identified from the PDB based on an Octo- modulate protein function, either directly (22)orbyregu- pus prediction of surface-assessible ␣-helical TM domains lating protein–protein interactions such as oligomerization (38). Potential -barrel membrane proteins are identified (23,24). These lipid-binding sites yield potential targets for based on the number, length, accessibility and hydropho- novel allosteric modulatory drugs from within the mem- bicity of their -strands. Where available, the biological brane, e.g. AZ3451 binding to the PAR2 receptor (25). In unit in the PDB is prepared for the oligomeric state of other instances, interactions between the protein and spe- the simulated protein. In all instances non-protein atoms cific lipid molecules result in chemical modification ofthe are removed from the PDB entry prior to simulation. MD lipid (26), or the translocation of a lipid molecule between simulations are performed using GROMACS 5.1.4 (39) leaflets of the bilayer (27). In several cases, the structure and the MARTINI 2.2 forcefield (40). Completed simu- of the protein may induce local deformations of the mem- lations are then converted to atomistic representation us- brane, thinning or thickening the membrane or inducing ing CG2AT (30). Analysis of completed simulations is per- curvature (28). formed using Python, MDAnalysis (41) and our in house Molecular Dynamics (MD) simulations permit detailed mpm-tools, which includes bindings for MUSCLE (42)and models of protein interactions within biological mem- a python adaptor for VMD (43), used to render static im- branes, characterizing the interactions of a protein with ages. Two dimensional visualizations of data are performed explicit lipid molecules. MD simulations are typically per- using D3.js. Three dimensional protein visualization uses formed on one of two scales: atomistic MD simulations con- PV.The database is stored using MongoDB. The web server sider each atom in the simulated system as a single par- uses NodeJS and Express to serve a frontend application ticle, producing more accurate simulations, whilst Coarse- built on ReactJS and Redux. Server application deployment Grained (CG) MD simulations treat 3–4 heavy atoms as a is performed using Docker Compose. single particle (29), reducing the complexity of the system and allowing for longer or larger simulations. Multi-scale approaches simulate a system using both CG and atom- RESULTS istic representations, combining the strengths of both ap- The MemProtMD database and web server contents proaches (30,31). Several methods exist to reconstitute lipid membranes At time of writing, the MemProtMD database holds 3506 using MD simulations. The CHARMM-GUI (32) lipid- CGMD simulations of intrinsic membrane proteins in- builder tool can either be used to insert a protein structure serted into phospholipid bilayers, each based on a single into a pre-equilibrated bilayer, or generate a new bilayer by structure deposited in the PDB. Represented within these packing lipids around a protein structure, whilst INSANE are TM proteins,