Modeling of the Three-Dimensional Structure of Proteins with the Typical Leucine-Rich Repeats
Andrey V Kajava1,2*, Gilbert Vassart3 and Shoshana J Wodak2
1Swiss Institute for Experimental Cancer Research, Chemin des Boveresses 155, CH-1066 Epalinges s/Lausanne, Switzerland, 2Unité de Conformation des Macromolécules Biologiques, Université Libre de Bruxelles, CP160/16, P2, Av P. Heger, 1050 Bruxelles, Belgium, and 3Institut de Recherche Interdisciplinaire and Service de Génétique Médicale, Faculté de Médecine, Université Libre de Bruxelles, Campus Erasme, 1070 Bruxelles, Belgium

*Corresponding author. © Current Biology Ltd ISSN 0969-2126

Background: Leucine-rich repeats (LRRs) are present in proteins with diverse functions. The horseshoe-shaped structure of a ribonuclease inhibitor (RI), with a parallel β sheet lining the inner circumference of the horseshoe and α helices flanking its outer circumference, is the only X-ray structure containing these repeats to be determined. Despite the fact that the lengths and sequences of the RI repeats differ from those of the most commonly occurring LRRs, it was deemed worthwhile to derive a three-dimensional structural framework of these more typical LRR proteins, using the RI structure as a template.

Results: Sequence alignments of 569 LRRs from 68 proteins were obtained by a profile search and used in a comparative sequence analysis to distinguish between residues with a probable structural role and those which seemed essential for function. This knowledge, along with the known atomic structure of RI, was used to model the three-dimensional structure of the most common LRR units. These modeled units were then used to build the three-dimensional structure of the extracellular domain of the thyrotropin receptor (TSHR), a 'typical' LRR protein.

Conclusions: The modeled TSHR structure adopts a non-globular arrangement, similar to that in RI. The β regions of this typical LRR protein are the same as in the RI structure, whereas the α helices are shorter and the conformations of the αβ and βα connections are different. As a result of these differences it was not possible to pack together typical LRR units using repeats such as those found in RI. This mutually exclusive relationship is supported by sequence analysis. The predicted structure of the typical LRRs obtained here can be used to build models for any of the known LRR proteins, and the approach used for the prediction could be applied to other proteins containing internal repeats.

Structure 15 September 1995, 3:867-877

Key words: atomic structure, leucine-rich repeats, molecular modeling, thyrotropin receptor

Introduction

Long, tandem arrays of a leucine-rich repeat (LRR) have been found in the primary structure of many proteins of diverse origin, function and cellular location [1]. Included among over 60 known LRR proteins are receptors for hormones [2], cell-adhesion molecules [3], extracellular matrix binding glycoproteins [4], enzymes [5] and tyrosine kinase receptors [6]. In spite of the keen interest in these proteins and extensive knowledge about their sequences, little was known about the spatial arrangement of the LRR. The solution of the crystal structure of porcine ribonuclease inhibitor (RI) [7], a protein with LRRs, can be considered as a breakthrough in the understanding of the three-dimensional (3D) structure of LRR proteins. In the RI molecule, LRRs correspond to structural units that consist of a short β strand and an α helix. These units form a superhelix and are arranged so that all the β strands and α helices are parallel to a common axis, resulting in a non-globular, horseshoe-shaped molecule with a curved parallel β sheet lining the inner circumference of the horseshoe and the α helices flanking its outer circumference.

Analysis of the LRR lengths and sequences (see the Materials and methods section), however, indicates that the repeats of the RI are somewhat atypical examples of the LRR family. An idea of the distribution of the LRR lengths can be obtained from the analysis of 569 repeats found in 68 different LRR proteins selected by sequence-profile searching [8] of the GENPEPT database [9]. In accordance with this two-peak distribution (Fig. 1a), LRR proteins can be divided into two subfamilies. The main peak of the distribution contains more than 90% of the LRRs. The characteristic representative of the most common subfamily (referred to as 'typical' hereafter) is a 24-residue LRR. RI, with its 28 to 29-residue LRRs [10], belongs to the other, much less populated LRR subfamily. Comparative sequence analysis also revealed differences in the consensus sequences for the repeat in these two subfamilies (Fig. 1b). An 11-residue segment of the LRR (underlined in Fig. 1b), corresponding in the structure of RI to the β strand and consecutive loop region, is conserved in both subfamilies, whereas the remaining parts of the repeat are different. This suggests that although RI and the typical LRR proteins probably have the same general spatial arrangement and even the same β strand arrangement at the atomic level, the overall 3D structure of the typical LRR proteins may differ in significant ways from that of RI.

Analysis of the known data led us to believe that theoretical prediction and molecular modeling of the typical LRR proteins, on the basis of the known RI structure, would be instrumental in improving our understanding of their 3D structure. The prediction of the 3D structure of proteins is clearly a highly risky undertaking yielding results which are, in general, not reliable. In cases of proteins with repetitive sequence, however, a priori theoretical structural predictions have been much more successful. Examples of these proteins are fibril proteins which have 2-4 residue periodicities (e.g. silk fibroin, collagen and tropomyosin) [11] or bead-like proteins with 30-40 residue periodicities (e.g. zinc-finger proteins [12]). These predictions have been significantly facilitated by the assumption of regular spatial arrangement, the possibility of distinguishing between structurally and functionally important residue conservations, and by sets of indirect experimental evidence. There are also a few examples of successful prediction by homology modeling [13]. The proteins of the more populated LRR subfamily fulfil all of the above requirements: they are repetitive and the structure of a [...]

Fig. 1. Results of sequence-profile searches. (a) Length distribution of 569 leucine-rich repeats (LRRs) found in 68 different proteins. The black area within a histogram bar indicates the occurrence of the repeats belonging to the 28-residue subfamily of LRR. The grey area indicates occurrence of the more common 24-residue subfamily repeats. (b) Characteristic sequence patterns for the 28-residue (black) and for the 24-residue (grey) LRR subfamily. The representative repeats were taken from RI [10] and from carboxypeptidase N [15] sequences, respectively. The identical parts of the repeats are underlined. Capital letters represent residues that are conserved within the subfamilies. Highly conserved residues are in bold.

Results and discussion

Assumption of identity of the β structure in all LRR proteins

Accurate sequence alignments are critical for correct homology modeling. Our first step, therefore, was selection of the most complete set of LRR proteins by profile searching [8] and their comprehensive sequence alignment (see Materials and methods section). The sequence analysis showed that all LRRs (including RI) have a highly conserved 11-residue stretch which has the consensus sequence LxxLxLxxNxL (Fig. 1b), where L indicates leucine or bulky, non-polar residues, N indicates asparagine or cysteine, and x is any amino acid. In the crystal structure of RI, the first seven residues of this conserved motif form a parallel β sheet which lines the inside wall of the 'horseshoe'. The leucines (or sometimes valine, isoleucine, phenylalanine or methionine) are directed towards the inside of this structure, forming a continuous hydrophobic core. After this seven-residue β region, each strand changes direction by 90°. In this half-turn conformation the NH and CO of the peptide groups cannot form hydrogen bonds with neighboring strands and have to be directed towards the non-polar core of the molecule. This would be energetically unfavorable, but a conserved asparagine (or cysteine) is positioned right after the β strand so that it is able to form a network of specific hydrogen bonds with these NH and CO groups, thus satisfying their hydrogen-bonding potential. The last conserved residue in this pattern is a bulky non-polar residue (usually leucine) which, in RI, is directed into the hydrophobic core. Being conserved in all LRR sequences, this 'β structure + asparagine-ladder' part of the repeat would therefore be expected to adopt a similar spatial structure. Assuming that this is correct, the molecular modeling task is then limited to the construction of the remaining part of the typical LRR, which differs in sequence from the RI LRRs.

Prediction of the topology

Analysis of known protein structures has shown that proteins with repetitive primary structures also contain repetitive modules in their 3D structure [11]. On the basis of this one can conclude, even in the absence of information on the RI crystal structure, that the structure of LRR proteins involves a repetitive 3D arrangement. As the sequences of the typical LRRs are similar to those of the RI LRRs, it is natural to assume that the 3D structural arrangement of proteins containing these repeats is similar. Thus, the topology of the typical LRR protein is probably also of the superhelix type, with one side formed by the parallel β structure and the other composed of fragments with an unknown conformation packed collinearly side by side.

The second assumption implies a horseshoe shape in all LRR superhelices.
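
The conserved 11-residue pattern LxxLxLxxNxL described under "Assumption of identity of the β structure in all LRR proteins" can be illustrated with a short Python sketch that scans a sequence for candidate matches. This is not the profile-search method [8] used in this work; a plain regular expression stands in for it. The residue set used for the 'bulky, non-polar' positions (L, V, I, F, M, taken from the substitutions listed in the text), the function names and the toy sequence are all assumptions made for illustration.

```python
import re

# Assumption: "leucine or bulky, non-polar" is approximated by L, V, I, F, M,
# the substitutions named in the text. A real profile would be weighted.
HYDROPHOBIC = "LVIFM"
# The conserved "N" position: asparagine or cysteine.
ASN_LIKE = "NC"

# LxxLxLxxNxL, wrapped in a lookahead so that overlapping hits are reported.
LRR_CORE = re.compile(
    rf"(?=([{HYDROPHOBIC}]..[{HYDROPHOBIC}].[{HYDROPHOBIC}]"
    rf"..[{ASN_LIKE}].[{HYDROPHOBIC}]))"
)


def find_lrr_cores(sequence):
    """Return (0-based start, 11-residue segment) for every motif match."""
    seq = sequence.upper()
    return [(m.start(), m.group(1)) for m in LRR_CORE.finditer(seq)]


def repeat_spacings(hits):
    """Distances between consecutive motif starts (candidate repeat lengths)."""
    starts = [start for start, _ in hits]
    return [b - a for a, b in zip(starts, starts[1:])]


# Hypothetical toy sequence, constructed for this example (not a real protein).
TOY_SEQUENCE = "AAPE" "LKSLDLSGNQL" "TSIPAGAFDGLSN" "LEELDLSNNKI" "GG"

if __name__ == "__main__":
    hits = find_lrr_cores(TOY_SEQUENCE)
    for start, segment in hits:
        print(start, segment)
    print("spacings:", repeat_spacings(hits))
```

In the toy sequence the two matches were deliberately placed 24 residues apart, mimicking the repeat length of the typical subfamily. On real sequences a pattern this loose will produce false positives and miss diverged repeats, which is one reason a profile search is the more appropriate tool.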