Topology Prediction of Membrane Proteins: Why, How and When?

Karin Melén
Stockholm University

© Karin Melén, Stockholm 2007
ISBN 91-7155-397-5
Printed in Sweden by Universitetsservice AB, Stockholm 2007
Distributor: Stockholm University Library

To Ture, Otto and Henrik

List of publications

Publications included in this thesis

Paper I    Melén K, Krogh A, von Heijne G. (2003) Reliability measures for membrane protein topology prediction algorithms. J Mol Biol. 327(3):735-44.
Paper II   Kim H, Melén K, von Heijne G. (2003) Topology models for 37 Saccharomyces cerevisiae membrane proteins based on C-terminal reporter fusions and predictions. J Biol Chem. 278(12):10208-13.
Paper III  Rapp M, Drew D, Daley DO, Nilsson J, Carvalho T, Melén K, de Gier JW, von Heijne G. (2004) Experimentally based topology models for E. coli inner membrane proteins. Protein Sci. 13(4):937-45.
Paper IV   Daley DO, Rapp M, Granseth E, Melén K, Drew D, von Heijne G. (2005) Global topology analysis of the Escherichia coli inner membrane proteome. Science. 308(5726):1321-3.
Paper V    Kim H*, Melén K*, Österberg M*, von Heijne G. (2006) A global topology map of the Saccharomyces cerevisiae membrane proteome. Proc Natl Acad Sci. 103(30):11142-7.

* These authors contributed equally

Reprints were made with permission from the publishers

Other publications

Österberg M, Kim H, Warringer J, Melén K, Blomberg A, von Heijne G. (2006) Phenotypic effects of membrane protein overexpression in Saccharomyces cerevisiae. Proc Natl Acad Sci. 103(30):11148-53.
Granseth E, Daley DO, Rapp M, Melén K, von Heijne G. (2005) Experimentally constrained topology models for 51,208 bacterial inner membrane proteins. J Mol Biol. 352(3):489-94.
Laudon H, Hansson EM, Melén K, Bergman A, Farmery MR, Winblad B, Lendahl U, von Heijne G, Näslund J. (2005) A nine-transmembrane domain topology for presenilin 1. J Biol Chem. 280(42):35352-60.
Contents

1 Introduction
  1.1 Biological membranes
  1.2 Membrane proteins
    1.2.1 Peripheral membrane proteins
    1.2.2 Integral membrane proteins
2 Membrane protein structure
  2.1 Overexpression
  2.2 Techniques for structure determination
    2.2.1 Electron crystallography
    2.2.2 X-ray crystallography
    2.2.3 NMR spectroscopy
  2.3 Structural models by homology
  2.4 Structure prediction
3 Membrane protein topology
  3.1 Experimental determination of topology
    3.1.1 Reporter genes
  3.2 Prediction of topology
    3.2.1 Topological determinants
    3.2.2 Topology prediction algorithms
4 Summary of papers
  4.1 Reliability measures for topology predictions and the use of experimental knowledge (Paper I)
  4.2 Topology models for a small number of S. cerevisiae membrane proteins based on C-terminal reporter fusions and predictions (Paper II)
  4.3 Topology models for a small number of E. coli membrane proteins and optimization analysis of fusion points (Paper III)
  4.4 Large-scale topology analysis of the E. coli and S. cerevisiae membrane proteomes (Papers IV and V)
5 Discussion and future perspectives
Acknowledgements
References

Abbreviations

2D      Two-dimensional
3D      Three-dimensional
Endo H  Endoglycosidase H
ER      Endoplasmic reticulum
GFP     Green fluorescent protein
GO      Gene ontology
GPCR    G protein-coupled receptor
HA      Hemagglutinin
His4    Histidinol dehydrogenase
HMM     Hidden Markov model
MP      Membrane protein
NMR     Nuclear magnetic resonance
NN      Neural network
ORF     Open reading frame
PDB     Protein Data Bank
PhoA    Alkaline phosphatase
SG      Structural genomics
SP      Signal peptide
SRP     Signal recognition particle
SUC2    Invertase (carrying N-glycosylation sites)
TM      Transmembrane

Amino acids

Ala  Alanine
Arg  Arginine
Asn  Asparagine
Asp  Aspartic acid
Cys  Cysteine
Gln  Glutamine
Glu  Glutamic acid
Gly  Glycine
His  Histidine
Ile  Isoleucine
Leu  Leucine
Lys  Lysine
Met  Methionine
Phe  Phenylalanine
Pro  Proline
Ser  Serine
Thr  Threonine
Trp  Tryptophan
Tyr  Tyrosine
Val  Valine

1 Introduction

The cell is the essential unit of life and the fundamental building block of all organisms. Cells are surrounded by membranes, usually a double layer of lipids, which separate them from the outside world. The membrane is a physical barrier that protects the cell from foreign molecules and at the same time prevents leakage of internal components and substances. However, a cell must be able to communicate with its surroundings, exchange molecules and adapt to sudden environmental changes. Membrane proteins (MPs) are the key players in these communication processes and are responsible for regulating the permeability of the membranes. They carry out a wide range of functions, including transport of ions and water, reception of hormones and other signaling molecules, recognition of “self” versus “non-self”, energy transduction and cell-cell interactions. This diversity of functions is mirrored in the great variability in the three-dimensional (3D) structure of membrane proteins. Some are integral, with certain parts embedded in the membrane, whereas others are peripheral and bound only to the membrane surface. Determining the structures would facilitate the assignment of functions, but unfortunately it is very difficult to solve the three-dimensional structure of membrane proteins experimentally. Therefore, alternative approaches to obtaining structural information must be pursued in parallel with traditional structure determination efforts. One way is to use bioinformatics methods to predict, from the amino acid sequence alone, which parts of the protein interact with the lipid bilayer and which parts reside outside the membrane.
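To make the idea of sequence-only prediction concrete, the following is a minimal sketch of a classical sliding-window hydropathy scan. It is not the method used in this thesis; it only illustrates the principle that stretches of hydrophobic residues are candidates for membrane-embedded segments. The scale is the Kyte-Doolittle hydropathy index, and the window of 19 residues and the cutoff of 1.6 are commonly quoted values that are treated here as assumptions. The function names `hydropathy_profile` and `candidate_segments` are hypothetical.

```python
# Kyte-Doolittle hydropathy index (one-letter amino acid codes).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window along the sequence.

    Entry i is the average over residues seq[i : i + window].
    Residues missing from the scale are scored 0.0 (an assumption).
    """
    scores = [KD.get(aa, 0.0) for aa in seq.upper()]
    if len(scores) < window:
        return []
    total = sum(scores[:window])
    profile = [total / window]
    for i in range(window, len(scores)):
        # Slide the window one residue to the right.
        total += scores[i] - scores[i - window]
        profile.append(total / window)
    return profile


def candidate_segments(seq, window=19, threshold=1.6):
    """Merge all windows whose mean hydropathy reaches the threshold.

    Returns a list of (start, end) pairs, 0-based and half-open.
    """
    segments = []
    for i, value in enumerate(hydropathy_profile(seq, window)):
        if value < threshold:
            continue
        start, end = i, i + window
        if segments and start <= segments[-1][1]:
            # Overlaps or touches the previous region: extend it.
            segments[-1] = (segments[-1][0], end)
        else:
            segments.append((start, end))
    return segments


if __name__ == "__main__":
    # Toy sequence: a 20-residue leucine stretch flanked by charged residues.
    toy = "D" * 10 + "L" * 20 + "K" * 10
    print(candidate_segments(toy))
```

Such a scan only flags hydrophobic stretches, and the merged region is wider than the hydrophobic stretch itself because every window that passes the cutoff contributes its full length. It says nothing about which side of the membrane the connecting loops face.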
Another is to determine experimentally whether selected parts of a protein are located in the cell interior, in the cell exterior or in the membrane. In the work presented in this thesis, a strategy of combining theoretical predictions and experimental analyses has been pursued in order to increase our knowledge of the membrane protein universe.

1.1 Biological membranes

The membranes surrounding the cells in all three domains of life (bacteria, archaea and eukaryotes) are called plasma membranes. In eukaryotes there are additional internal membranes that define specific compartments, so-called organelles. Some examples are the endoplasmic reticulum (ER), Golgi network, mitochondria, chloroplasts, nucleus, lysosomes and peroxisomes. All membranes consist of a mixture of lipids assembled into a bilayer structure. Lipids fall into different classes and can be neutral, zwitterionic or negatively charged; glyco- and phospholipids are the most common. All lipids share the same main characteristic: they are amphipathic molecules with hydrophilic head groups and hydrophobic hydrocarbon tails. In an aqueous milieu they spontaneously arrange themselves into a double layer with the polar head groups facing the aqueous environment and the hydrophobic tails in each layer pointing inward, thereby avoiding contact with water. The driving force for this formation is the inability of the fatty acid hydrocarbon chains to hydrogen bond to water. The shape of the tails allows van der Waals interactions between neighboring tails. The lengths of the fatty acid chains determine the thickness of the hydrophobic core, which is typically about 30 Å. The size of the polar head groups on
