Proc. Natl. Acad. Sci. USA
Vol. 86, pp. 6592-6596, September 1989
Biochemistry

Topological distribution of four-α-helix bundles
(folding motif/solvent accessibility/helix dipole)

SCOTT R. PRESNELL* AND FRED E. COHEN*†
Departments of *Pharmaceutical Chemistry and †Medicine, University of California, San Francisco, San Francisco, CA 94143-0446
Communicated by Frederic M. Richards, June 19, 1989

ABSTRACT The four-α-helix bundle, a common structural motif in globular proteins, provides an excellent forum for the examination of predictive constraints for protein backbone topology. An exhaustive examination of the Brookhaven Crystallographic Protein Data Bank and other literature sources has led to the discovery of 20 putative four-α-helix bundles. Application of an analytical method that examines the difference between solvent-accessible surface areas in packed and partially unpacked bundles reduced the number of structures to 16. Angular requirements further reduced the list of bundles to 13. In 12 of these bundles, all pairs of neighboring helices were oriented in an anti-parallel fashion. This distribution is in accordance with structure types expected if the helix macro dipole effect makes a substantial contribution to the stability of the native structure. The characterizations and classifications made in this study prompt a reevaluation of constraints used in structure prediction efforts.

Specification of the code that translates primary to tertiary structure remains unresolved. It has proven difficult to predict which features in an amino acid sequence will provide the basis for the three-dimensional structure or function of a given protein. However, proteins with only 10% identity at comparable positions in a polypeptide can have notably similar structures (1). Indeed, an individual tertiary structure will often fall into one of a limited number of structural classes (2). Tertiary-structure prediction systems incorporating combinatorial methods (3-5) and template methods (6, 7) target known structural classes in an attempt to reduce the number of structures created and examined. Therefore, to effectively predict tertiary structures, we must determine as many constraints describing the individual structural classes as possible.

Within the structural class of proteins constructed predominantly from α-helical structures, a highly recurrent motif is the collection of (anti)parallel α-helices known as the four-α-helix bundle. Four-α-helix bundles are found in proteins covering a wide range of structure and function, but they display some common characteristics. Weber and Salemme (8) were among the first to examine and characterize four-α-helix bundles as a class of super-secondary structure. Initially, their work suggested a common, right-handed, connective topology for all four-α-helix bundles. This characterization, derived from a limited data base of bundle topologies, proved too restrictive to be correct. The "handedness" constraint obscured further attempts to characterize the topology of previously known and newly discovered structures (9, 10). Incorporation of all the presently known four-α-helix bundle structures requires a different categorization scheme.

The purpose of this study is to describe and present a thorough examination of the literature for four-α-helix bundles using generalized determination criteria. These criteria suggest a set of bundle structures beyond those initially reviewed by Weber and Salemme (8). Further, we will describe robust topological characterizations and categorizations of the discovered structures. In light of our categorization scheme, the past and current topological constraints used for the prediction of protein structures containing four-α-helix bundles are reevaluated.

METHODS

To locate putative four-α-helix bundles, two independent observers employed the graphical display program MIDAS (11) to inspect more than 300 globular protein structures from the Brookhaven Protein Data Bank (12) (November 14, 1988). Particularly flexible inspection criteria suggested 14 potential four-α-helix bundles from 12 proteins. To discriminate compact bundles from within the list of putative structures, a quantitative determination of helix-to-helix packing was developed. This method is based on the algorithm of Lee and Richards (13) for surface area determination. For each putative four-α-helix bundle, the solvent-accessible surface area was determined for the entire helix bundle and for each of the four possible sets of one helix separated from the other three. Comparing the sum of the surface areas of the one- and three-helical substructures to the original bundle structure produced a quantitative value for the amount of surface area lost on burying each helix in the bundle. The typical surface area of a four-α-helix bundle is approximately 2000 Å². If one assumes an energetic conversion value of 24 cal mol⁻¹ Å⁻² (14), then a loss of 200 Å² in surface area upon burying the last helix into a bundle provides a hydrophobic stabilization of 4.8 kcal mol⁻¹ (1 cal = 4.184 J). Accordingly, if the surface area lost on burying any individual helix was less than 10% of the entire bundle's solvent-accessible surface area, then that group of four helices was not considered a four-α-helix bundle. Loop regions between the helices were not included in the surface area calculations.

Interhelical angles were determined for each pair of helices along the perimeter of the bundle. Helix vectors were determined using the adaptive helix parameter method (15) to minimize the propagation axis variability. Ideal helices were generated in a known manner from three primary helical parameters: radius, pitch, and number of residues per turn. By using the Kabsch method (16) for calculating the rms fit between the ideal and observed helical coordinates, the matrix required to transform the actual coordinates to lie along the x axis is produced. The three helical parameters can be extracted from this matrix and adjusted in an iterative fashion until an ideal helix fits the observed helix as nearly as possible. The interhelical angle (Ω) was defined as the arc-cosine of the dot product of the two helix vectors. The cross product of the helix vectors was used to determine the sign of the interhelical angle. Because we were interested in perpetuating the classical definition of a four-α-helix bundle, we chose to eliminate from further consideration those bundles containing an absolute value of acute interhelical angles greater than 40°. A review of the recent literature and a personal communication produced reports of six other four-α-helix bundles; these were also examined for handedness and backbone topologies.

The AMBER program suite (17) was used to determine internal energies of native protein structures. Crystal structure coordinates were taken directly from the Brookhaven Protein Data Bank (12) (release dated November 14, 1988).

Table 1. Definition of the putative four-α-helix bundles investigated

Protein                         PDB code  Ref.  Helix A   Helix B   Helix C   Helix D
Cytochrome b5                   2b5c      18    32-39     43-49     54-61     64-75
Cytochrome b-562                156b      19    2-19      24-45     62-85     88-108
Catalase                        8cat      20    177-188   451-467   470-483   485-500
Cytochrome c'                   2ccy      21    5-30      40-58     79-102    104-125
Cytochrome P-450cam             2cpp      22    127-145   149-169   234-267   359-378
Citrate synthase (a)            4cts      23    136-152   163-195   274-291   390-416
Citrate synthase (b)            4cts      23    221-236   237-341   344-365   373-386
Cytochrome c peroxidase         2cyp      24    42-54     103-119   165-177   255-272
Methemerythrin                  1hmq      25    19-37     41-64     70-85     91-109
T4 lysozyme                     2lzm      26    92-106    116-124   125-138   144-156
p-Hydroxybenzoate hydroxylase   1phh      27    12-24     53-57     102-114   299-318
Phospholipase C (a)             *         28    12-28     33-55     105-125   206-242
Phospholipase C (b)             *         28    85-104    105-125   171-187   206-242
Thermolysin                     3tln      29    230-246   260-274   280-296   300-312

Individual helices were defined according to the HELIX records of the data bank files. PDB, Protein Data Bank. The following proteins contained four-α-helix bundles that were evaluated but could not be analyzed by the numerical method because of a lack of crystal coordinates: ferritin (30), interleukin 2 (31), human complement component C3a [although the crystal structure reveals disorder in the N terminus, homology modeling with complement component C5a strongly suggests a helical N-terminal region (32, 33)], human complement C5a (33, 34), tobacco mosaic virus coat protein (35), and human growth hormone (9).
*Phospholipase C coordinates were obtained directly from E. Hough (University of Tromsø).

RESULTS AND DISCUSSION

Structural Evaluation of Putative Helix Bundles. Table 1 specifies the structures evaluated in this study. Table 2 presents a summary of the evaluation of surface area loss upon burying the last helix of a four-α-helix bundle. Although all of the previously recognized helix bundles pack well [...]

[...] group of related structures with absolute acute interhelical angles greater than 40° that will form the subject of future study. The angular requirements further reduced the list of bundles to the final 13.

Helix-Bundle Categorization. Fig. 1 exemplifies the set of topological descriptors for helix bundles developed in the current study: (i) the polypeptide chain connectivity, (ii) the unit direction vectors of the individual helices, and (iii) the overall bundle handedness or macroscopic chirality. Two general types of connectivities exist between helical segments. The first type is a plain or adjacent connection, where the C terminus of one helix is adjacent in space to the N terminus of the next helix, but the direction vector changes orientation by 180° within the connecting loop of polypeptide backbone.
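
The helix direction vectors used in this categorization are the same ones that enter the angular screen described under METHODS. As a rough illustration, the two screening steps given there (the 10% surface-burial cutoff and the 40° interhelical-angle cutoff) might be sketched in Python as follows. This is a sketch under stated assumptions, not the authors' implementation: all function names are hypothetical; the solvent-accessible surface areas are assumed to come from a separate Lee and Richards-style calculation; the reference vector that fixes the sign of Ω is an assumption, since the text states only that the cross product determines the sign; and the "acute" angle is interpreted here as the smaller of |Ω| and 180° - |Ω|.

```python
import math

CAL_PER_MOL_PER_A2 = 24.0    # energetic conversion value quoted in METHODS
MIN_BURIED_FRACTION = 0.10   # packing cutoff quoted in METHODS
MAX_ACUTE_ANGLE_DEG = 40.0   # angular cutoff quoted in METHODS


def area_lost_on_burial(sasa_bundle, sasa_single, sasa_triple):
    """Area (in square angstroms) lost when one helix packs against the other three."""
    return sasa_single + sasa_triple - sasa_bundle


def hydrophobic_stabilization_kcal(area_lost):
    """Convert buried area to kcal/mol using the quoted conversion value."""
    return area_lost * CAL_PER_MOL_PER_A2 / 1000.0


def is_packed(sasa_bundle, separated):
    """separated: four (sasa_single, sasa_triple) pairs, one per helix removed.

    The bundle passes only if every helix buries at least 10% of the
    whole-bundle solvent-accessible surface area.
    """
    cutoff = MIN_BURIED_FRACTION * sasa_bundle
    return all(area_lost_on_burial(sasa_bundle, s, t) >= cutoff
               for s, t in separated)


def _unit(v):
    norm = math.sqrt(sum(c * c for c in v))
    return tuple(c / norm for c in v)


def signed_angle_deg(u, v, sign_ref):
    """Arc-cosine of the dot product of the unit helix vectors, in degrees.

    ASSUMPTION: the sign is taken from the projection of the cross product
    u x v onto sign_ref (for example, a vector joining the two helix axes).
    """
    u, v = _unit(u), _unit(v)
    dot = max(-1.0, min(1.0, sum(a * b for a, b in zip(u, v))))
    angle = math.degrees(math.acos(dot))
    cross = (u[1] * v[2] - u[2] * v[1],
             u[2] * v[0] - u[0] * v[2],
             u[0] * v[1] - u[1] * v[0])
    projection = sum(a * b for a, b in zip(cross, sign_ref))
    return -angle if projection < 0 else angle


def acute_angle_deg(omega):
    """ASSUMPTION: the acute angle is the smaller of |omega| and 180 - |omega|."""
    a = abs(omega)
    return min(a, 180.0 - a)


def passes_angle_screen(omegas):
    """True if no neighboring helix pair exceeds the 40 degree acute-angle cutoff."""
    return all(acute_angle_deg(w) <= MAX_ACUTE_ANGLE_DEG for w in omegas)
```

With the figures quoted in METHODS, `hydrophobic_stabilization_kcal(200.0)` reproduces the 4.8 kcal mol⁻¹ estimate. Surface areas and helix vectors passed to the other functions would be the user's own measurements.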