Proteins at the Plasma Membrane: What Protein Domains Are on the Outside and What Domains Are on the Inside?
Kanthida Kusonmano, Kwanjeera Wanichthanarak, Natapol Pornputtapong
Chalmers University of Technology, Sweden

The cover image is adapted from http://4e.plantphys.net/image.php?id=80.

Introduction

Membrane proteins are involved in a wide range of important biological processes, such as cell signaling, transport of membrane-impermeable molecules, cell adhesion and cell–cell communication, many of which are relevant to disease mechanisms and drug target discovery. An understanding of their structure and function is therefore of great importance for biological and pharmacological research. Because of experimental difficulties (for example, they are not easy to crystallize), membrane proteins are rarely found in structural databases. Sequence-based analysis is therefore an important approach for investigating such proteins [1].

Transmembrane proteins are a class of integral proteins that penetrate into or through the lipid bilayer of a cell membrane, such as the plasma membrane. Three regions can be defined: the region outside the cell, the region inside the cell and the region within the bilayer (Figure 1).

Figure 1 Representation of a transmembrane (integral membrane) protein (figure adapted from [2])

Prediction of transmembrane helices from sequence is a key challenge for bioinformatics. In this study we used TMHMM, a hidden Markov model for predicting transmembrane helices in protein sequences [3], to predict the location and in/out orientation of the transmembrane helices of human proteins. We then investigated the protein domains in the inside and outside regions of these proteins using HMMER, a tool for searching for protein homologs and for making sequence alignments [4]. HMMER is based on profile hidden Markov models (profile HMMs).

Material and methods

To examine human transmembrane proteins, we first retrieved all human proteins from UniProt/SwissProt [5].
Since a human cell contains both the plasma membrane and internal membranes (e.g. the ER membrane), we used two Gene Ontology (GO) terms, ‘plasma membrane’ and ‘integral to membrane’, to select only the transmembrane proteins of the plasma membrane. After obtaining this set of transmembrane proteins, we performed two tasks in parallel:

1. Predict transmembrane helices
TMHMM was used to predict the location and in/out orientation of transmembrane helices [3]. Given an HMM, the tool predicts transmembrane helices by finding the most probable topology of the sequence. Each residue is assigned to one of three states: on the cytoplasmic side (intracellular), on the periplasmic side (extracellular), or in a transmembrane helix (within the membrane).

2. Identify domains of transmembrane proteins
HMMER was used to identify the protein domains of the transmembrane proteins [4]. The tool is based on profile HMMs and is used to search for functional domains in given protein sequences. Pfam-A profiles, which are derived from the high-quality, manually curated protein domains in the Pfam database [6], were used as query profiles. The prediction was performed with the hmmsearch program, which is part of the HMMER package, with a cut-off score of 10E-5.

Information from TMHMM and HMMER was then combined to determine which domains lie in the inside and outside regions of the transmembrane proteins. The analytical steps are illustrated in Figure 2.

Figure 2 Analytical pipeline of human transmembrane proteins

Results and discussions

Transmembrane prediction

As mentioned earlier, the structure of a transmembrane protein can be divided into three parts: intracellular, extracellular and transmembrane helix. All transmembrane proteins share the same core structure, the transmembrane helix, which mostly serves the specific functions of membrane integration and transport.
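The three-part description above (intracellular, extracellular, transmembrane helix) can be made concrete with a small sketch. It assumes the topology is available as a string in the style of the TMHMM short output, for example `o10-32i44-66o`, where `i` and `o` mark the inside and outside loops and each number range is a predicted helix. The function name `parse_topology` and the segment labels are hypothetical and are not taken from the original analysis.

```python
import re
from typing import List, Tuple

# (region label, start, end), 1-based inclusive residue positions
Segment = Tuple[str, int, int]


def parse_topology(topology: str, length: int) -> List[Segment]:
    """Split a topology string such as 'o10-32i44-66o' into labelled segments.

    Assumption: side letters ('i' = inside, 'o' = outside) alternate with
    helix ranges, starting and ending with a side letter.
    """
    labels = {"i": "intracellular", "o": "extracellular"}
    sides = re.findall(r"[io]", topology)
    helices = [(int(a), int(b)) for a, b in re.findall(r"(\d+)-(\d+)", topology)]
    if len(sides) != len(helices) + 1:
        raise ValueError(f"unexpected topology string: {topology!r}")

    segments: List[Segment] = []
    pos = 1  # first residue not yet assigned to a segment
    for side, (start, end) in zip(sides, helices):
        if start > pos:
            # loop on the side named by the letter preceding this helix
            segments.append((labels[side], pos, start - 1))
        segments.append(("transmembrane", start, end))
        pos = end + 1
    if pos <= length:
        # tail after the last helix (or the whole protein if no helix)
        segments.append((labels[sides[-1]], pos, length))
    return segments


if __name__ == "__main__":
    for segment in parse_topology("o10-32i44-66o", 80):
        print(segment)
```

A protein with no predicted helix and a topology string of just `o` or `i` comes back as a single segment covering the whole sequence.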
A total of 1076 transmembrane proteins were collected from the UniProt/SwissProt database based on the Gene Ontology terms “plasma membrane” and “integral to membrane”. A Ruby script was used to convert the data from UniProt/EBI format into FASTA format, as shown in appendix A.

Figure 3 Number of transmembrane helices of human transmembrane proteins

TMHMM was used to predict the location and in/out orientation of the transmembrane helices of these proteins. The number of predicted transmembrane helices ranges from none to 14, as shown in Figure 3. For 43 proteins no transmembrane helix could be identified, probably because of limitations of the TMHMM algorithm. Tracing these proteins back to UniProt showed that most of them are annotated with a signal-anchor helix, which may have different properties from a general transmembrane helix. These proteins were therefore excluded from the domain prediction step. Among the remaining proteins, the most common number of transmembrane helices is 7. This group corresponds to the G protein-coupled receptors (GPCRs), the largest group of transmembrane proteins. Even though these proteins share a core functional unit, the seven transmembrane helices, many of them contain different functional domains in their extracellular and intracellular regions [7].

Domain prediction

In addition to the transmembrane helices, a transmembrane protein has extracellular and intracellular regions, which do not insert into the membrane. The diverse functions of membrane proteins depend on the functional domains present in these two regions. In total, 237 domains were assigned to 2316 positions in the proteins. HMMER provides the positions of the predicted functional domains, while the TMHMM results give the in/out orientation of the transmembrane helices.
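Since HMMER reports domain positions and TMHMM reports the orientation of each region, the two can be joined with a simple interval-overlap test. The sketch below is only one possible rule and is not the procedure used in this study: it assumes that any domain overlapping a predicted helix counts as a transmembrane domain, and that all other domains take the label of the loop they fall in. The function name `classify_domain`, the labels and the example coordinates are hypothetical.

```python
from typing import Dict, List, Tuple

# (region label, start, end), 1-based inclusive residue positions
Segment = Tuple[str, int, int]


def classify_domain(dom_start: int, dom_end: int, segments: List[Segment]) -> str:
    """Assign a domain hit to a region by overlap with topology segments.

    Assumed rule (not from the original report): a domain that overlaps
    any transmembrane helix is labelled 'transmembrane'; otherwise it takes
    the label of the single loop region it overlaps.
    """
    overlap: Dict[str, int] = {}
    for region, seg_start, seg_end in segments:
        shared = min(dom_end, seg_end) - max(dom_start, seg_start) + 1
        if shared > 0:
            overlap[region] = overlap.get(region, 0) + shared
    if not overlap:
        return "unassigned"  # domain lies outside the annotated sequence
    if "transmembrane" in overlap:
        return "transmembrane"
    # Loops on opposite sides are always separated by a helix, so a domain
    # that touches no helix overlaps exactly one kind of loop.
    return next(iter(overlap))


if __name__ == "__main__":
    # Made-up topology for an 80-residue protein with two helices.
    example = [
        ("extracellular", 1, 9),
        ("transmembrane", 10, 32),
        ("intracellular", 33, 43),
        ("transmembrane", 44, 66),
        ("extracellular", 67, 80),
    ]
    for start, end in [(2, 8), (35, 40), (5, 70)]:
        print(start, end, classify_domain(start, end, example))
```

A stricter variant could require a minimum number of overlapping residues before calling a domain transmembrane, which would reduce the effect of small errors in the predicted helix boundaries.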
Combining the results from HMMER and TMHMM, we can categorize the predicted domains into three groups: domains on the transmembrane helices, domains in the intracellular region and domains in the extracellular region.

1. Domains on transmembrane helix
Domains in this group span most of the transmembrane region; in other words, they cover many topological regions or almost the whole transmembrane protein. The properties of the domains found in this region are very similar. They show the properties of membrane integration, which simply confirms the TMHMM prediction. There are 97 domains in this group (shown in appendix B), and most of them belong to the 7-transmembrane GPCR family. Nearly half of the proteins used in this prediction carry a GPCR family domain in their sequence. This agrees with the TMHMM results.

2. Domains in intracellular region
Domains in this group were predicted by TMHMM to lie in an intracellular position. The intracellular domain is the part of a transmembrane protein that is in contact with the cytoplasm. Only 16 domains were assigned to this group, which is very small compared with the other groups. These domains are expected to interact with intracellular components involved in, for example, cell structure, metabolism and signal transduction. In this study, we found that many of them function as binding regions. The functions of intracellular domains are less diverse than those of domains in the extracellular region. The most common domain is Cadherin_C, the cadherin cytoplasmic region, which is commonly found in cadherin proteins. Cadherins are cell adhesion molecules that are needed during tissue differentiation. The cadherin cytoplasmic region is the cytoplasmic tail, which is linked to the cytoskeleton via catenins. The other domains are shown in appendix C.

3. Domains in extracellular region
This is the largest group of domains classified in this study. There are 110 domains found in the extracellular regions of transmembrane proteins.
According to the descriptions in the Pfam database, most of them are not specific to extracellular regions; in other words, we cannot be certain whether these domains are specific to the inside or the outside of the cell. Only 14 domains are highly specific to extracellular regions, as shown in appendix D, for example Cadherin, ig, SEA and Sushi.

In addition, one domain that Pfam describes as intracellular was predicted here to be extracellular: the Calx-beta domain. The Calx-beta domain is a tandem repeat in the cytoplasmic domains of the Calx Na-Ca exchanger and is also present in the cytoplasmic tail of mammalian integrin-beta4. This motif is used for calcium binding and regulation [8]. Three proteins were predicted to contain the Calx-beta domain. To check the prediction, we plotted the domain architecture of two of these proteins, as shown in Figures 4 and 5. The predicted Calx-beta domain usually occurs together with other extracellular domains such as VWC, EPTP and GPS. This conflict may stem from an error in the TMHMM prediction, or the domain may have another function in the extracellular region.

Some domains are detected in both intracellular and extracellular regions based on TMHMM. This might result from either wrong predictions of helix positions and orientations by TMHMM or false positives from HMMER. The table in appendix E lists the predicted domains that occur in both the intracellular and the extracellular group. Some domains, such as I-set and V-set, occur far more often in extracellular than in intracellular regions.

Figure 4 Transmembrane helices and functional domain positions of protein Q8WXG9

Figure 5 Transmembrane helices and functional domain positions of protein Q86XX4

References

1. Nugent T and Jones DT, 2009, Transmembrane protein topology prediction using support vector machines, BMC Bioinformatics, 10:159.
2. Hurwitz N, Pellegrini-Calace