Membrane Protein Prediction Methods

Methods 41 (2007) 460–474 www.elsevier.com/locate/ymeth Membrane protein prediction methods Marco Punta a,b, Lucy R. Forrest a,b, Henry Bigelow a,b, Andrew Kernytsky a,b, Jinfeng Liu a,b,c, Burkhard Rost a,b,c,¤ a Department of Biochemistry and Molecular Biophysics, Columbia University, 1130 St. Nicholas Ave., New York, NY 10032, USA b Columbia University Center for Computational Biology and Bioinformatics (C2B2), 1130 St. Nicholas Ave., New York, NY 10032, USA c North East Structural Genomics Consortium (NESG), Columbia University, 1130 St. Nicholas Ave., New York, NY 10032, USA Accepted 5 July 2006 Abstract We survey computational approaches that tackle membrane protein structure and function prediction. While describing the main ideas that have led to the development of the most relevant and novel methods, we also discuss pitfalls, provide practical hints and high- light the challenges that remain. The methods covered include: sequence alignment, motif search, functional residue identiWcation, transmembrane segment and protein topology predictions, homology and ab initio modeling. In general, predictions of functional and structural features of membrane proteins are improving, although progress is hampered by the limited amount of high-resolution experi- mental information available. While predictions of transmembrane segments and protein topology rank among the most accurate methods in computational biology, more attention and eVort will be required in the future to ameliorate database search, homology and ab initio modeling. © 2006 Elsevier Inc. All rights reserved. Keywords: Membrane proteins; Protein structure prediction; Protein function prediction; Alignments; Transmembrane segment prediction; Homology modeling; ab initio modeling 1. Introduction thick (»30 Å) hydrophobic core. The sequence and structure of IMPs reXects an extreme eVort of adaptation to this 1.1. The fold space of membrane proteins is relatively smallƒ environment. Since the interactions of polar groups with the hydrophobic core of the membrane are energetically Integral membrane proteins (IMPs) are polypeptide unfavorable, IMPs have evolved so as to minimize them. chains embedded into biological membranes. In this review, Inside the membrane, IMP chains form long regular sec- we focus exclusively on polytopic membrane proteins, i.e. ondary structure elements that cross the entire bilayer, i.e. proteins that span the membrane at least once. We will not are transmembrane (TM) segments. In this way, they satisfy consider monotopic membrane proteins, which despite the hydrogen-bond potential of the backbone amide and being anchored to the membrane do not cross it (anchors carbonyl groups. These secondary structure elements can include amphipathic helices and intra-membranous mole- either be TM helices (TMH) or TM -strands (TMB) cules covalently attached to the protein). arranged into -sheets; helices and strands are connected Biological membranes are phospholipid bilayers. The by regions that generally extend beyond the membrane polar heads of the phospholipids face the aqueous solution core, into the aqueous solution. Also, the assembly of TM on both sides of the membrane, while the lipid tails form a elements is strongly inXuenced by the presence of the bilayer. -Helices form bundles, with each helix strongly * Corresponding author. Fax: +1 212 305 7932. oriented towards the normal to the membrane plane [1]. E-mail address: [email protected] (B. Rost). -strands are arranged in barrel-like structures, with the URL: http://www.rostlab.org/ (B. Rost). barrel axis normal to the membrane plane. While helical 1046-2023/$ - see front matter © 2006 Elsevier Inc. All rights reserved. doi:10.1016/j.ymeth.2006.07.026 M. Punta et al. / Methods 41 (2007) 460–474 461 IMPs are practically ubiquitous (only being absent in the domains contain crevices and cavities that can accommo- outer membrane of Gram-negative bacteria), -barrel date numerous water molecules and other ligands, thus IMPs are only found in the outer membrane of Gram-nega- forming a more diverse environment with which the protein tive bacteria, mitochondria and chloroplasts. Proteins with can interact. There is also evidence that TMH topology is both integral TMHs and TMBs have not been observed. not solely determined by the protein sequence, but depends Due to the strong compositional biases imposed by the on complex interactions with the translocon, the machinery bilayer onto the amino-acid sequence, predicting the loca- responsible for inserting TMHs into the membrane [11]. tion of TM segments in the protein sequence turns out to be Consequently, topology may depend on the organism in a relatively easy task. In fact, in order to cross the mem- which the protein is expressed. Finally, functional IMPs are brane, TMHs need to be at least 15 residues long and com- very often the result of the assembly of several chains, pos- posed predominantly of hydrophobic amino acids. TMBs ing the problem of Wnding the correct protein–protein are generally longer than 10 residues and are comprised of docking solution, a notoriously diYcult task, despite recent alternating hydrophobic and polar amino acids (hydropho- progress [12]. For example, receptor binding and activity in bic side chains face the lipids while polar residues face the G-protein coupled receptors have been reported to be water- or protein-Wlled interior of the barrel). Also, regions linked to oligomerization [13]. Thus, the IMP structural connecting TM segments that are not translocated across space, although smaller than that sampled by water-soluble the bilayer (“inside” or cytoplasmic regions) are enriched in proteins, is still intricate enough to make computational positively charged amino acids, following the so-called structure prediction a very challenging task. ‘positive-inside rule’ [2,3]. This simpliWes the prediction of the way in which TM segments cross the membrane 1.3. Two main prediction tasks: identify and characterize (inside–out or outside–in). In conclusion, in comparison to water-soluble proteins, In this paper, we review computational methods that can IMP chains are able to sample only a limited number of be applied to study IMPs. Some of these approaches have folds [4]. This reduced conformational space may be easier to been developed speciWcally for IMPs, others have been search through traditional sampling algorithms (e.g. molecu- adapted or simply transferred from the techniques origi- lar dynamics and Monte Carlo). As we said, the number, nally devised for water-soluble proteins. Computational location and cross-membrane direction of TM segments can methods have been developed to address two diVerent be predicted rather accurately (see Section 2). It is, therefore, goals: to identify novel IMPs or to characterize the func- reasonable to believe that 3D1-structure prediction for IMPs tional and structural features of a protein experimentally is within the reach of computational methods. known to be inside the membrane. The Wrst task (identiWcation of IMPs) can be solved by 1.2. ƒ but membrane proteins still exhibit remarkable sequence or proWle alignment techniques that detect rela- structural variability tives of known IMPs. Alternatively, we can search motif databases (e.g. PROSITE [14], PRINTS [15]) to identify Notwithstanding the previously described constraints functional motifs that are characteristic of IMPs and use and biases, presently no method can accurately predict the these motifs to extend the database searches, or we can use 3D structure of any IMP from sequence alone. On one prediction methods that locate TM segments. Note that the hand, this is due to the very limited number of high-resolu- latter two approaches do not require homology to a known tion structures available, giving us a myopic view of the IMP and that the described methods produce diVerent and IMP structural space; on the other, IMP architectures may often complementary information. For example, Ruta et al. not be as simple as initially thought. Indeed, as more exper- [16] used database searches and sequence analysis to detect imental structures have become available, IMPs have a previously uncharacterized voltage-gated potassium revealed an unexpected level of structural diversity. Con- channel in the archae bacterium Aeropyrum pernix and straints on TMH length and tilt angles are not as strict as later conWrmed the prediction experimentally. On the other previously hypothesized [5]. Also, analysis of known struc- hand, prediction of TM segments has been widely used for tures in the Protein Data Bank [6] (PDB) indicated that estimating the fraction of membrane proteins in various about 50% of TMHs contain non-canonical elements (i.e. genomes and kingdoms of life. kinks, 310-helix and -helix turns) [7] and 5% of all TMHs Also the second task (functional and structural charac- cross the membrane only partially, i.e. form half TMHs [8]. terization) can be achieved by diVerent means, depending Stable loop regions have been found inside the hydropho- on the type of information that is available (Fig. 1). If the bic core of the membrane [9,10]. In some cases, membrane high-resolution structure of a protein (template) homolo- gous to the target is known, we can use homology modeling to obtain a prediction for the target structure. The quality 1 Abbreviations used: 3D, three-dimensional; GPCR, G-protein coupled of the resulting model typically depends on the sequence receptor; HMM, Hidden Markov Model; IMP, integral membrane protein;

Membrane Protein Prediction Methods

Table S1. List of Proteins in the BAHD1 Interactome

The Second Critical Assessment of Fully Automated Structure

CASP)-Round V

Inactive USP14 and Inactive UCHL5 Cause Accumulation of Distinct Ubiquitinated Proteins In

Proteomic and Bioinformatic Pipeline to Screen the Ligands of S

Identification and the Significance of Selective Proteins in Bile And

Supplementary Table 1. the List of Proteins with at Least 2 Unique

Cross-Talk Between Overlap Interactions in Biomolecules: a Case Study of the Β-Turn Motif

Biological Recognition of Graphene Nanoflakes

Jakub Paś Application and Implementation of Probabilistic

April 11, 2003 1:34 WSPC/185-JBCB 00018 RAPTOR: OPTIMAL PROTEIN THREADING by LINEAR PROGRAMMING

USE of CLUSTERING TECHNIQUES for PROTEIN DOMAIN ANALYSIS Eric Rodene University of Nebraska - Lincoln, [email protected]