Protein Science (1999, 4:1188-1202. Cambridge University Press. Printed in the USA. Copyright 0 1995 The Society

Automatic recognition of hydrophobic clusters and their correlation with units

MICHEAL H. ZEHFUS Division of Medicinal Chemistry and Pharmacognosy, College of Pharmacy and Department of , College of Biological Sciences, The Ohio State University, Columbus, Ohio 43210 (RECEIVEDDecember 2, 1994; ACCEPTEDMarch 15, 1995)

Abstract A method is described to objectively identify hydrophobic clusters in of known structure. Clusters are found by examining a protein for compact groupings of side chains. Compact clusters containseven or more res- idues, have an average of 65% hydrophobic residues, and usually occur in protein interiors. Although smaller clusters contain only side-chain moieties, larger clusters enclose significant portions of the peptide backbonein regular secondary structure. These clusters agreewell with hydrophobic regionsassigned by more intuitive meth- ods and many larger clusters correlatewith protein domains. These results are in striking contrast with theclus- tering algorithm of J. Heringa andP. Argos (1991, JMol Biol220:151-171). That method finds that clusters located on a protein’s surface are not especially hydrophobic and average only 3-4 residues in size. Hydrophobic clusters can be correlatedwith experimental evidence on early folding intermediates. This corre- lation is optimized when clusters with less than nine hydrophobic residues are removed from the data set. This suggests that hydrophobic clusters are important in the folding process only if they have enough hydrophobic residues. Keywords: compactness; folding intermediates; hydrophobicity; hydrophobic cores; hydrophobic clusters; pro- tein folding

Since the seminal work of Kauzmann(1959) outlining the na- unit definitions vary widely. Because each investigator identi- ture of the hydrophobic force, an implicit assumption in pro- fies hydrophobic regions using different criteria, the proposed tein folding has been that proteins possess an interior core of tie between hydrophobic regions and early folding intermediates hydrophobic residues and that the formationof this core is a ma- must be viewed skeptically. Thus, it is important that an objec- jor force in the folding process. Interestin hydrophobic regions tive method be found to identify such regions. has undergone a renaissance in recent years. In theoretical work, This paper describes how hydrophobic regions can be objec- Dill has proposed that a protein’s regular secondary structure tively identified by locating compact clusters of side chains. is a natural outcome of burying residuesin a hydrophobic core Using compactness to locate suchregions makes intuitive sense (Dill et al., 1993). In experimental work, both hydrogen ex- because hydrophobic clusters, like oil drops, will maximize in- change experiments and NOESY experiments have identified terior interactions while minimizing their surface area. early folding intermediates that are frequently correlated with Although compactness, not hydrophobicity,is the primese- hydrophobic regions (Evans et al., 1991; Matouschek et al., lection criterion, the discovered clusters arevery hydrophobic, 1992a, 1992b; Pan & Briggs, 1992; Gronenborn & Clore, 1994). containing an average of 65% hydrophobicresidues (the aver- Although thereis much interest in hydrophobic regions, there age protein in the study had42% hydrophobic residues). Two is no widely accepted method to identifyor delineate them. Hy- structurally distinct hydrophobic regions, called clusters and drophobic regions are usually identified by visual inspection, and cores, are observed. Clusters are smaller,with side-chain atoms located in the center of the unit and main-chain atoms located Reprint requests to: Micheal H. Zehfus, Division of Medicinal Chem- in the periphery. Cores are larger and enclose backbone atoms istry and Pharmacognosy, College of Pharmacy and Department of Bio- as elements of regular secondary structure in their interior. chemistry, College of Biological Sciences, The Ohio State University, If the hydrophobic clusters are filtered to remove unitswith Columbus,Ohio 43210; e-mail: [email protected] less than nine hydrophobic residues,a good correlationis seen state.edu. Abbreviations: BPTI, bovine pancreatic trypsin inhibitor; FOM, fig- between the clusters and early folding intermediates. This sup- ure of merit; 2, coefficient of compactness; r, normalized coefficient ports the premise that hydrophobic regions are important in of compactness. early foldingevents. It also suggests that thereis a threshold be- 1188 Recognition of hydrophobic clusters 1189

low which hydrophobic clusters are not stable enough toseed the same helix is stabilized by two different clusters (a-lactal- the protein foldingprocess. Further studyof the correlationbe- bumin units I.A.2 and I.A.3). tween hydrophobic clusters and early folding intermediateswill Because hydrophobic clusters are usually discontinuous, the help us better understand protein folding. presence ofcontinuous stretches of residues in larger units is puz- zling. This puzzle is explained by the observation, made earlier, that larger units contain inclusions of regular secondary struc- Results and discussion ture. Continuous stretches of residuesin a cluster occur when Proteins derive a great deal ofstability by clustering hydropho- a piece of regular secondarystructure hasbeen incorporated into bic residues into regions where they are in contact with each the cluster’s interior. other but have little contact with the solvent. A hydrophobic Because each protein is unique, the cluster structure and the cluster should havea minimum surface area forits enclosed vol- correlation of clusters with early folding intermediateswill now ume, because this would maximize the favorable hydrophobic- be discussed on a protein-by-protein basis. hydrophobic interactions while minimizing the unfavorable hydrophobic-solvent interactions. Barnase Compactness, as definedby the Z parameter, finds units that have a minimum surface area fortheir enclosed volume (Zehfus A large twisted @-sheet forms the basic framework for this & Rose, 1986). Thus, there should be a correlation between 110-residue protein (Baudet & Janin, 1991). Core I consists of compactness and hydrophobic clusters. This correlationis sur- both sides of roughly 2/3 of this sheet. Cluster I.A, containing prisingly strong. The average compact cluster contains 65% hy- 76% hydrophobic residues, is on oneside and I.B, with 67% hy- drophobic residues, and only about5% of the clusters contain drophobic residues,is on the other. Unit11, with only 47%hy- less than 40% hydrophobic residues. drophobic residues, is on the same side of the sheet as I.B, but Cluster hydrophobicity is size dependent. Smaller compact the twist in the sheet keeps units I1 and 1.B separate from each clusters are about 70% hydrophobic, whereas larger ones are other. The structures of barnase and many of its clusters and only 50% hydrophobic. This effect may be explained by the po- subclusters are presented in Figure2 as an exampleof a typical sition of theclusters within the protein. Smaller clusters are gen- protein and its clusters. erally buried in the protein interior, whereas the larger clusters Fersht’s group has developed a detailed model for the fold- have expanded to include some of the protein’s surface and ing of barnase based onextensive analysis of folding mutations therefore are more hydrophilic. and protonexchange data (Fersht et al., 1992; Matouschek et al., The basic topography of compact clusters also varies with unit 1992a, 1992b; Serrano et al., 1992a, 1992b, 1992~). This model size. In small compact clusters, the side chains are clustered to-identifies three hydrophobic regions called core,, core,, and gether in the center of the unit, with the main-chain atomslo- core,. These regions correspond quite well with cluster I.A, cated around theperiphery. When clusters have more than about unit I1 andcluster I.B, respectively.Core, and core3 are 15 residues, main-chain atoms are nolonger restricted to the pe- thought to form simultaneously, early in the folding process. riphery of the unit but become buried within the clustersele- in This region would correspond nicely to the coreI compact unit. ments of regular secondary structure. Units with main-chain Core, forms later, mostlikely because it is more hydrophilic in inclusions areeasily identified either by visual inspection, orby nature. Consistent with this explanation one finds that core I the dramatic decreases in absolute Z values (increasing com- (the combination of core, and core3)is both larger and contains pactness) when main-chain atoms are includedin compactness more hydrophobic residues than unit I1 (core2). calculations. The goodness of fit between a hydrophobic cluster and a fold- Figure 1 shows the compact hydrophobic clusters and other ing intermediate may be estimated in a number called the fig- parameters of interest for a dozen different proteins. In this ure of merit. The FOM is determined by first calculating the figure, each side chain of a cluster is marked with a Ior a percentage of residuesin the hydrophobic cluster that arein or at the residue’s position in the protein, and each unit hasbeen adjacent toresidues in the folding intermediate, then determin- systematically named to portray its position in the overallhier- ing the percentage of residues in the folding intermediate that archy of compact units. This namingsystem is explained in the are in or adjacent to residues of the hydrophobic cluster, and Methods section. Although this figure displays all compact units, then averaging these two numbers together. The FOM of the those displayed in bold (with theI mark) have passed two ad- combination of core, + core, correlated with hydrophobic core ditional filtering steps derived empiricallyto obtain thebest fit I is 66%. The FOM of unit I1 and core3 is 61 %. between the compact clusters and experimental evidence for Using Ca contact distances, Yanagawa hasdivided this pro- early folding units in these proteins (see the Methods). tein into six folding modules: 1-24, 24-52, 52-73, 73-88, 88- The discontinuous nature of hydrophobic clusters initially 98, and 98-110. These modules are thought to correlate with makes this figure hard to interpret;however, certain patterns can structural units encoded by exons (Yanagawa et al., 1993). In- be observed that make good structuralsense. In @-strands, alter- terestingly, none of these modules appear to containa complete nate residues point in opposite directions. The pattern II I hydrophobic cluster, buta few hydrophobic clusters seem to be shows that residues 1, 3, and 5 are clustered together, most likely composed of combinations of modules (unit11, and clusters 1I.A on one side ofa @-strand. In somecases, like the @-sheet region and 1.B). of units I.B.2 and 1.Cof ribonuclease, one can observe twodis- tinct clusters stabilizing opposite faces of a @-sheet. Ubiquitin Interactions with one side aof helix are seen in patterns like I I,where residues 1, 4, 5, 8, and 9 allcluster onone The frameworkof this 76-residue protein is a @-barrelwhere two face of the helix. As with &strands, one can find cases where @-strands havebeen replaced with a single helix (Vijay-Kumar 1190 M.H. Zehfus

I w-02 I .;h .is> 6 &;;X 6 Gz d:; 11 .a I k.5; a;.:%E m ZZ 8 5 :I 3.g jj= I.d ggi&Zg R -,So 4 11% m...$2: 0cuuKtm~- g II*x 2g 8 ze E, 4: I;i " zec;B EA:+ .\...q I .$: glllamoz? ;3=gg.=o .>.:III:t,: R.. 4.' ggcxc3 It

~ =X2 81 II ,jl/ &.a 2 d 0 Q\ FY g " II .:,: .;i,: <.gs~u~g 8 I kazxg;.z uuumx u I am OL-2 :-m; h( ,c $1- .:::$:: 'i p,, ,:IcI .- 4 I 2 :'zgz,- e :53H;;: A$,, :$ VJ .a il, x 5 :$; 8 ,%+ 2 -3 wcl -,,II = ,i.i L G? *hl oc 0 g :/: $ I umKtzci~~ 0- wcl 3 ~~n~~lw p. :$:; E" * -q -*.%,i dmI ,. 27;;~u-orn$.2 2%; 52 gx52g ," % mrl 8 j$i 8 iz u ZS-. m am- &~~;.."Im o-:: t.,i I ,,,dl.; li. z- fi '!,I G"-sum O1 :z3a31 u -;x;tu3", :==3:; lim .,ii,m II.'&:a ;:-zu;;Lv * ..; 'I .- a& g E2 2 .< : I.P * :I I I.,bi 5%" . c= E 0- cc9 m O"E;ij83 ~ II I., I N 3 o gs.2 Y I I:;<, 0 us g s.2 :is: N- * 0 ma: :I I p-jp 0 "m "E a a 2 #mm.:~ ,= z; gu.s E % - I 2'- 2 X?kIll ~. :$: 5:2==z.s 5 n 2 :Irl I,ili ,e g g +%2 Ql :0 a ...$1 .& m v) Q\ 2 Q\ ;;; rn .I a X 2SII zi-g 0 - nn 0- =$ g2-m -1" iD w raxikml : >Yi ,il m.::<,i g Y..+ .u m; 1. .iB z ;ij x;ij y2 ;u" .,!I.it; xilsml si g 4 I& g;,&.,: I :gggzzAlQl an * 4 LI gz&0.2- ,5 0 -:z R :w Iii& 1"' 0- gmm N b :a mi* S*ZEZ Y* Nrn m ..x: >E2C.2= &Em& l N D. 11 I$ s% i :;/ 5VJ 0 '$?-. g % ct l.ii, ZRg2-I Lc N N .:k - N .t.: --a Tg +:g :: N N r :$; I:%! ,z xp "- & : ~ j; i c: 4.. ,aq,s V" .s :: $3 .w %Z.ONQ\ 5 2: 0 1 N a .;> 14,: .- . uo'z 0 v1 N N D. *y: 3: 9z oz-J;2:, 0 N N 31< e g Vj m hl 0 -i: uz-; $, 3; R a m .y a ..ui A z -% I*;;ij4Ktxm,l 3-3" hi? N %:'fly .: .: illll y%g;,z:p 3N 2a I .is.;,, 0 - 2 3 ,':!Im:jg 0- 2 ggns~;+6 N r,, + 'CI 0.0 CY n X ,vII ,, .ji.: . i &< ,. p-"< m .,/I=i:i e;zQ\ m 2 a x $[: 8.i -I:/ = E j; ca,c&.^H 3 PI .. 3 8 gzg~~~ 0 - "a ,I141s:j# o -PI la 3 3 &a+gg g c 2-4-3 @I...ri u = AZ.5 32 3 $ rn #::.e I ;<>: 1.1 2 2 .>>8% ;E i .u Q) Eg = +-a- A- rn p- I., %@ o\ '5 :.- ._ -4 (0 " 0- 0- 0- cCbsxy227 Gvl *CWOWPrl CN l-CCITSl-P N-l*"Ol-w.43mwOn Wl- yIW(DWYI"l YIWC?OYI~CPC~~~ g m EP: d ~~Cl-mwl-...... CY ...... ?? %g+$ m 3 pJ=Q\c 7;z Q mmnrl-ww nr +mmm33 w~nrmmc~m~mrlmCP 2 ;zz= Y"~ OCICdrll-m...... In*.. mrl-owm...... rnd-wwmq~m-Nnm...... nrl.. - .-3 $2; II NnnNNNN ..IN rlnnm3-4 NCIdNNdNdNnNd3 -0 0 5c-W sss f i NP.WDI0-W om mmt-omm wm*w3*cl*mmNCI m- R%%u%g g< E"" "CINNNd *-.. WCNNNN..... mWnmmmNmrlm~m~...... *N.. EZN 9 3;;;;;; 43 rl.-Irlrlrl- """33""" -3 o+- Y mm womml-P. - -m-~cnml-l-l-l-nl-ml- FIN UgC7"Lieg % morlol-Nl-mdnrl 3 rl +".?I W Q3CmN 4 M * n3 .^ 532 2 - - *d . :.: YI mu - CIY - ds-2.o 23 - .-PI rl- a rl - -n *Y " -a w5giT'a;; .o .A a 1": e! EaZ"-2z k t" 0 ong' NI( w H 5 a-s 3 ;.!: w !A i I:;$ u A!%?:N 3 j &;+7m; u4 3 Z39" .iNcIW ._ $;t$<{i{z g $f:q<{{{{mmqU?? j; B zEes28;;;;; zi 3 m&UUHHHWg lllUah19 HHHHHHYH WHH Recognition of hydrophobic clusters 1191

0 01

0 0- m m

I

0 P I

0 0- W I-I I:?;.;9; 0- I

CI I *0-

./;., I

0 0- I " CI I I I:. I I 0 N

0

Y) 0 Io m

0

N N 4

c

H H Y .4 B 1192 M.H. Zehfus

0

"7 x si. II 1

I I I

0- W

0 -m m ma m I= I

.*Ill

-aJ wm Recognition of hydrophobic clusters 1193

I II I

II I m

I I::

ID .. ,:

2:- .!!! I 0- Y d R

Y 4 3 1 I94 M.H. Zehfus

B

A D

I

r

Fig. 2. Hydrophobic clusters from barnase. A: Tube representation of the backbone framework. B: Side-chain surface of the bulk hy- drophobic cluster. C: Side-chain surfaces of core I (yellow) and unit I1 (dark blue). D: Side-chain surfaces of cluster 1.A (green) and cluster 1.B (red). E: Side-chain surfaces of cluster I.A.l (pink) and cluster I.A.2 (blue).

" -, Recognition of hydrophobic clusters 1 I95 et al., 1987). Hydrophobic cluster analysis finds a single large relations are seen between unit I1 and the continuous domain core I that corresponds well to this barrel. Cluster 1.A is smaller, and cluster 1.C and one discontinuous domain. LA shows mod- containing the core of the barrel anda few peripheral residues, erate correlation with the second discontinuous domain. and cluster I.A. 1 represents the extremely hydrophobic center When the heme is removed from myoglobin, its structure is of the barrel. Clusters I.A.2 and I.A.3 overlap with 1.A.I but largely disrupted, but some residual structure remains. The have little overlap with each other. Theyrepresent combinations structure of apo-myoglobin hasbeen probed using NMR both of the I.A.1 central core with portions of a side or end of the to quantitate proton exchange (Hughson et al., 1990) and to barrel. Due to their large overlap with I.A. 1, they are notviewed identify clusters of associating residues (Cocco & Lecomte, as separate structural entities, but as extensionsof the central 1990). If compact hydrophobic clusters are important structural core. Unit I1 has little overlap with coreI and represents a sin- entities, then the sameclusters found in the holoenzyme should gle strand of residues at one end of the barrel. also be observed in the experimental data from the apoenzyme. Using pulsed hydrogen exchange, Briggs and Roder found Cocco and Lecomte (1990) identified four different clusters that this protein foldslargely as a single unit, with most reporter of hydrophobic residues in the apoenzyme. Twoof these clus- protons being 80% protected from solvent exchange in the first ters, A and C in Figure 1, contain more than five residues, 20 ms of folding (Briggs & Roder, 1992). This protein can form whereas the others aresmaller. Cluster A corresponds well with a compact non-native state at low pH in 60% methanol. This the I.A. 1 compact cluster (FOM 72%), whereas cluster C agrees state has been studied by Pan and Briggs (1992), using pulsed only moderately with unit I1 (FOM 55%). No units are found hydrogen exchange experiments, and by Stockman et al. (1993), that match thesmaller clusters, but thisis not surprising because using heteronuclear techniques. they are too small to be detected with this methodology. The exchange data, both in native and non-native states, cor- The NH protons protected from solvent exchange correlate relate well with each other, identifying similar sets of protected quite well with cluster 1.A (Hughsonet al., 1990). Because this nuclei. Stockman’s data, which is restricted to amide proton cluster is a larger version of the I.A. 1 cluster that coincides with NOE constraints,clearly shows the presence of the helix and two one of Lecomte’s units, clearly compact hydrophobic clusters strands of 0-sheet at the N-terminus of the protein but shows are important structural elements in both holomyoglobin and little distinct structure in the rest of the protein. apomyoglobin. The hydrogen exchange data and the I.A.1 hydrophobic clus- ter are clearly closely related (FOM 81%). This cluster contains the hydrophobic residues from the middle of the barrel struc- Cytochrome b5 ture. The larger 1.A structureis a superset of the I.A.1 cluster and contains additionalresidues from theexterior of the a-helix. Cytochrome b5 is another protein that containsa noncovalently Some of these additional residues are not protected from sol- bound heme. Again, this heme is not included in thecluster anal- vent exchange in the folding process, indicating that only the ysis for simplicity. Unlike most other proteins in this analysis, smaller cluster is important at this stage in folding. cytochrome b5 does not appear to have a large unit that in- Stockman’s dataidentify a smaller intermediate in the A state cludes most of the protein’s residues. A compact unit of this size containing only one helix and two strands of P-sheet. Appar- does exist, but it is not hydrophobic enough to be classed as a ently his methodology is more restrictive and fails to identify the hydrophobic cluster. more fluid regions of the nascent hydrophobic core. This protein contains both helices and one large /3-sheet (Mathews et al., 1972). Core I encompasses both sides of this &sheet and represents the main core for more thanhalf the mol- Myoglobin ecule. Cluster I.A also includesresidues from bothsides of the 0-sheet, although thebulk of this unit is on theside of the sheet Myoglobin is a largely a-helical protein with a noncovalently distal from theheme. 1.B also lies on the distalside of the sheet bound heme group (Takano, 1984). To simplify calculations the but represents a region wheretwo helices dock with the P-sheet. heme was not included in the compact unit analysis. In this pro- 1.C and unit I1 contact oppositefaces of the heme. There is some tein, core I is a very large entity that includes the bulk of the pro- overlap between these clusters along the edge of theheme clos- tein’s interior. Core I buries most of the G and H helices and est to the P-sheet. includes one face from all other helices except helix D. Like myoglobin, the heme of this protein may be removed to Cluster 1.A looks like a logical predecessor to core I. It has obtain a new, structurally different apoenzyme. The structure about 2/3 of the G helix completely buried and interactions are of the apoenzyme has been studied in detail by Moore and seen with faces of the A,B, and H helices. The 1.A subclusters Lecomte (1993). The apocytochrome structural unitconsists of represent various helix-docking interactions within this unit. a @-sheet and two short helices that are not involved in heme This protein contains alternate structures,cluster 1.C and1.C’. binding. This structuralunit corresponds well with either the 1.A Alternate structures are places where the clustering algorithm or 1.B clusters (FOMs 76% and 86%, respectively). Core I is a identifies two units of approximately the samesize and compact- larger unit that contains both 1.A and I.B, but it also includes ness, occupying roughly the same region of the protein. The residues from two additionalhelices that are notseen in the ex- choice between primary and alternate structure is made using perimental data. hydrophobicity. The alternate structureis clearly much less hy- No correlation is seen with unit 11, and this is puzzling because drophobic than the primary unit. it is equivalent to both 1.A or 1.B in size, compactness, and hy- Previously, compact domain analysis hasbeen done on myo- drophobicity. Unit 11, however, is in direct contact with the globin (Zehfus, 1994). This analysis divided myoglobin into one heme and probably requires some specific interactions for continuous domain and two discontinuous domains. Good cor- stabilization. 1196 M.H. Zehfus

Cytochrome c The CY domain contains hydrophobicclusters 1.A and I.B. 1.A represents the true central core of this domain and almost com- Cytochrome c has a covalently attached heme moiety. This pletely buries the H5 helix (residues 93-106). 1.B represents a makes the heme difficult to remove, and most experimental bulge to oneside of this core. These twoclusters are in intimate studies are donein its presence. Because the heme is present in contact, with 50% of the residues in l.B occurringin I.A. It is the experimental work, aspecial analysis was done thatincluded likely that thea domain has asingle somewhat noncompacthy- the heme, as well as the usual analysis where the heme is ignored. drophobic core that includes both1.A and I.B, and the cluster It can be seen that the presence of the heme significantly en- building algorithm divided this into twosmaller but more com- hances the compactness of both the large bulk unit and coreI. pact subclusters. The P-sheet region of this protein is seen pri- The heme can alsofit into the smaller cluster that shares many marily in the 1.C and unit ll clusters. Unit 1.D lies at theinterface elements of both the 1.A and 1.B apoclusters. between the CY and 0 region and may be included in either. Core I is a large unit that encompasses mostof the protein’s Early folding intermediates in T4 lysozyme have been char- interior. Cluster 1.Asits on one endof the heme and makes con- acterized usingpulsed hydrogen exchange (Lu & Dahlquist, tact between the N-terminal, 60s, 70s, and C-terminal helices and 1992). These data show three regions protected from solvent ex- both faces of the heme. 1.B represents just the docking of the change in the first8 ms of folding. Many, but not all,of these OS, 60s, and 70s helices with one face of the heme. When the residues occur in the small, highly hydrophobic I.A.3 cluster. heme is included in the analysis, a single 16-residue cluster is Several other clusters (I.B, l.A.l, I.A.2) contain similar struc- found that combines elementsof both the 1.A and 1.B clusters tural elements but combine them with other residues that are not with the heme to form a compactcluster. 1.C shares several res- protected at this time. Perhaps these structures fold moreslowly idues with 1.B but is not in contact with the heme. and will only correlatewith experimental dataif the experimen- Clusters 1.B and I.A.2 both correlatewith compact domains tal conditions are varied so a later time point is observed. found by Zehfus (1994) and seem to represent the hydrophobic The FOM of the I.A.3 cluster and the protected protons is cores of these domains. Taniuchi has identified four core do- 64070, indicating only a moderate correlation. The reason these mains in cytochrome by studying hybrid two-fragment com- structures are not better correlatedis that a 0-turn-0 structure plexes (Fisher & Taniuchi, 1992). Little correlation is seen with (residues 15-35) is protected from exchange but is not part of his domain regions. the I.A.3 cluster. These protected residues, in fact,do not oc- The folding of this molecule with the bound heme has been cur in any hydrophobic cluster. studied using hydrogen-exchange labeling experiments (Roder Visual inspection suggests that cluster 1.C may stabilize this et al., 1988). Ten protons in this protein’s N- and C-terminal he- structure indirectly. The residues in question form a twisted lices become 40% protected from solvent exchangein the first 0-turn-0 structure. The l.C clusteris located at thebase of the 30 ms of folding (Fig. 1 lists four that havebeen clearly identi- 0-turn-0 and keeps the strands locked together. This prevents fied). Additional folding steps then take place on 0.2-s and 10-s the strands from separating and could greatly retard hydrogen time scales. These two helices are also important in the struc- exchange for protons within the strands. Thus, the collapseof ture of the non-native compact A stateobserved under low pH- a hydrophobic pocket may stabilize structures thatlie outside high salt conditions. Here static hydrogen exchange experiments the cluster itself. identify these two helices plus part of the60s helix as a poten- tial folding intermediate (Jeng et al., 1990). The above helices are found in the I.A, I.B, and 1.AA.B- Hen egg white lysozyme heme clusters of cytochrome c. Because the folding studies are done in the presence of the heme, the latter unit is the logical This is another a+P protein (Diamond, 1974) where the large match. The FOM for this matchis only 41 Yo, indicating a poor core I encompasses the bulk of the protein’s interior, excluding fit between the units. This poorfit occurs because the 1.AA.B- a small flap of 0 structure, andincludes both CY and 0domains. heme cluster includesseveral residues not observed experimen- 1.A is the true coreof the a region and features a buried helix tally. Most of these additional residues are not in regions of (helix B, residues 25-35). 1.C is a bulge in the base of the01 re- regular secondary structure. Because protons in nonregular sec- gion away from the 0-flap. There is only one residue in com- ondary structure exchange more quicklywith the solvent, they mon between 1.A and I.C, so it is not clear whether the core of are not detected easily in this kind of experiment. Thus, these this region is best represented by one largeor two smaller inde- additional residues may be partof the hydrophobic cluster, but pendent clusters. The0 region is stabilized by the two clusters this method cannot detect them. 1.B and I.D. These clusters are independent and have no resi- dues in common. Cluster 1.A correlates reasonably well with a discontinuous T4 lysozyme compact domain foundby Zehfus (1994). A second continuous domain from this analysis seems to use both1.B and 1.Dclus- This is a good example of an CY+@protein. One region in this ters as its hydrophobic core. protein is a large cluster of helices, and the second containsa Miranker et al. (1991) have found 38 protons protected from single 0-sheet with two auxiliaryhelices (Bell et al., 1991). This hydrogen exchange in the first minute of protein’s this folding. visual organization is not strictly observed in the hierarchy of Buck et al. (1993) have studied a partially folded non-native hydrophobic clusters. Two smaller clusters are found in each do- compact formof this protein foundin 50% trifluoroethanol at main, but no superclusters are found that correspond to the do- low pH. Evans et al. (1991) have studied thermally denatured mains. Instead, core l includes the 01 domain and most of the lysozyme in aqueous solution and observed chemical shift ef- 0domain, excluding only a small flap of 0 into a separateunit 11. fects consistent with the formation of hydrophobic clusters, Recognition of hydrophobic clusters 1197 but only twospecific interactions, 51-53 and 62-63, have been of unit I1 is part of cluster1.C) and should be joined intoa sec- observed. ond supercluster. Thereis only oneresidue in common between Figure 1 shows that both the region protected from solvent the 1.AA.B supercluster and the I.C/unit I1 supercluster, so exchange early in folding and theregion protected fromsolvent these two units are independent of each other. exchange in the compact non-native state correlate quite well This protein hasbeen divided into two binary discontinuous with cluster 1.A (FOMs 76% and 69%,respectively). The inter- domains using compactness (Zehfus, 1994). One domain uses actions observed in the thermally denatured statebetween resi- the I.C/unit I1 supercluster as its core, whereas a second domain dues 51 and 53 and residues 61 and 62 do not, however, play uses parts of 1.AA.B as its core. Notice that the 1.AA.B super- a prominent role in any native cluster. cluster contains residues from four distinct regions of ribonu- clease. By definition, a binary domain contains residues from two regions, so it cannot be matched exactly to a cluster with a-Lactalbumin four regions. Like the previous two proteins, a-lactalbumin contains two do- Udgaonkar and Baldwin (1990) followed the folding of ribo- mains: one helical and the other a @sheet and more random nuclease using pulsed hydrogen exchange. Although most ex- structure (Acharya et al.,1989). The helical region is well rep- change experiments determine the time courseof protection for resented by core I.A. This cluster contains a single core helix (B, individual protons, this experiment classes protons on the de- residues 23-34) that is almost completely buried by several other gree of protection (strong, medium, or weak) that exists after helices. The subsets of 1.A represent different docking inter- the first 0.4 s of folding. Most of these protected protons oc- actions between the corehelix and the surroundinghelices. Be- cur in both 1.A and 1.B clusters, confirming thehypothesis that cause several of these units overlap (I.A.3 and I.A.4; I.A.4 and these clusters should be joinedin a single1.AA.B supercluster. I.A.2), it is likely that they do not represent independent fold- The FOM for the match between the I.A/I.B cluster and the ing units but are simply artificial subdivisions of the 1.A unit. protected residues is 66%. Few of the protected protons occur Units I1 and I11 are spatially distinct from 1.A and form the in the I.C/unit I1 supercluster. The preferred folding of cluster protein’s second, irregular structural domain. Unit I1 encloses 1.AA.B over I.C/unit I1 is probably because the 1.AII.Bclus- both sides of a 0-turn$ between residues 40 and 50, whereas ter is larger (39 residues) and more hydrophobic(30 hydropho- unit 111 is much more random with little secondary structure. bics) than the unit 1IA.C cluster (26 residues, 15 hydrophobics). Units I1 and 111 are about the samesize and hydrophobicity and have about40% overlap with each other. It is not clear whether Ribonuclease T, they should be treated as small independent units should be or The central skeleton of thismolecule is a highly twisted &sheet joined into one larger supercluster. (Martinez-Oyanedel et al., 1991). There is no large single core Cluster 1.C lies at the interfacebetween helical core 1.A and in this protein, onlyfive smaller clusters. Cluster I forms anin- the irregular unit II/unit 111 lobe. Cluster 1.Cis also consistent ner core around which the 0-sheet is twisted. Clusters I11 and with the hydrophobic boxused by Koga and Berliner (1985) to IV are on theside of the sheet opposite cluster I and arelocated establishthe structural similarity between thisprotein and on different ends of an a-helix that stabilizes this side of the pro- lysozyme. tein. Clusters I1 and V are located along theedges of the sheet. At low pH, a-lactalbumin goes into a compact non-native Cluster I has one region that overlaps with cluster I1 and a sec- A-state. The A-state of guineapig a-lactalbumin has been stud- ond region that overlaps with cluster 111. This suggests that the ied by hydrogen exchange (Chyan et al., 1993), and someside- three clusters should be joined intoa single I/II/III superclus- chain clusters have been identified by NOESY interactions in the ter. Clusters IV and V, on the other hand, havelittle or no over- closely related bovinea-lactalbumin A-state(Alexandrescu etal., lap with the otherclusters and arelargely independent. Cluster V, 1993). The correlation between hydrophobic domains and ex- with only eight hydrophobic residues, is not hydrophobic perimental datais not as goodwith this protein as it is for other enough to be a folding nucleation site. proteins. The protection data correlate only moderately well The folding of ribonuclease TI is very complicated. Func- (FOM 51%) with cluster I.A.2, and no compact clusters are tionally, it is thought to be dependent on the cis-trans isomeri- found that correlate with the observed NOESY data. zation of two different proline amide bonds (Kiefhaberet al., One explanation for thisis that the A-state of a-lactalbumin 1990a,1990b, 1990~). Experimentally, pulsed hydrogen ex- more closely resembles the denatured state than thenative state. change experiments detect two distinct folding phases thought A major point of the Alexandrescuet al. (1993) paper is that no to represent the foldingof a native-like intermediate anda sec- evidence is found fornative-like hydrophobic clusters, and the ond non-native compact globular state (Mullins et al., 1993). large cluster observed is not consistent with the native structure. The residues associated with the faster folding form of ribo- nuclease (labeled F in Fig. 1) are thought to be associated with Ribonuclease a native-like intermediate. Some of these residues can be found in each of the I, 11, and I11 hydrophobic clusters, but nosingle This protein contains a single &sheet, bent to form a V-like cluster seems better correlated than the others. Thisis taken as structure (Wlodawer et al., 1988). Again, core I is an oversized evidence that these units should be grouped intoa single super- unit containing most of the proteinexcept one small loop that cluster rather than viewed as separate entities. forms theunit I1 cluster. The 1.A and 1.B clusters make one arm of theP-sheet. With 39% of the residues in 1.B occurring in I.A, these two clusters overlapwell and probably representa single Interleukin larger supercluster. The other arm of the6-sheet is stabilized by Interleukin’s structure is that of a @-barrelwhere one end of the the 1.C and unit I1 clusters. These two clusters also overlap (44% barrel has been closed off by trefoil arrangement of &strands 1 I98 M.H. Zehfus

(Veerapandian et al., 1992). Core I represents the interior of the hydrophobic, andexclude regions of hydrogen bonded secondary @barrel core plus several exterior residues. Core I1 is a bit structure. Larger units can be assembledby joining or expand- smaller but again represents the central 0-barrel core in combi- ing smaller units, areless hydrophobic, and enclose elements of nation with some exteriorresidues. The interiors of both these regular secondary structure. cores are almost identical, and the exteriors overlap but are not Although the hydrophobic clusters are intuitively pleasing, identical. The exterior residues sharedby both cores are found their structural hierarchies do not always match our intuitive in the 1.B (or 1I.A) cluster family. The 1.C cluster of core 1 con- picture of domains. Occasionally the clustering algorithm will tinues from this surfacein one direction, whereas the1I.B and expand a cluster beyond its domain boundary toinclude a signif- 1I.D clusters of core I1 continue in a different direction. icant portion of a neighboring domain. Thisis a minor problem In this protein the cluster hierarchy is complicated because because these situations areeasily identified by visual inspection many higher order clusters share common subclusters. The sub- of the proposed units. clusters that best represent the interior nucleus of the &barrel This method will also occasionally subdivide hydrophobic are units I.A.2 and I.A.3. These units areused in three differ- clusters into two or moresmaller but more compactsubclusters. ent superclusters (I.A, I.B, and I1.A) and so have alternate des- These cases are identified by examining the overlap between ignations I.B.2A.B.3 and II.A.2hI.A.3. These two units share units; whenever two clusters are roughly the samesize and have many residues (a 40% overlap) and are contiguous with each more than 30% overlapping residues, they shouldbe examined other. They almost certainly represent a larger, less compact carefully tosee if they should be joined into a larger superclus- cluster that hasbeen artificially divided into twosmaller pieces. ter. Several cases were shown where two, or even three, over- Pulsed hydrogen exchange techniqueshave been used to study lapping clusters were proposed to be a single larger supercluster. the folding of interleukin (Varley et al., 1993). In contrast to the In these cases, the experimentalevidence did not favor anysin- a-helical proteins, where NH protection may occurwithin a few gle cluster but indicated that the clusters form simultaneously milliseconds, protectionhere takes placein the 1-IO-s time as a single unit. frame. Thegeneral pattern of protection correlates with the joint Heringa and Argos(1991) have proposed a different method I.A/l.B cluster that represents the hydrophobic middle of the for finding densely packed clusters of side chains. Their clus- 0-barrel. Although the general pattern of protection correlates ters are strikingly different from theclusters found here. In the with this combined cluster(FOM 85%), it is difficult to match Heringa method, oneexhaustively searches allside-chain com- the exact pattern of early, middle, and late protection to indi- binations for pairs of side chains with maximal contact. After vidual clusters or cores. normalization for side-chain size, a threshold is set, and a set Gronenborn and Clore(1994) have proposed that folding of of “dense neighbor” pairsis selected. These pairs are thenused this proteinis consistent with the hydrophobic zipper model of to find dense clustersof three residues, and three-residueclus- Dill et al. (1993). They further proposed that aset of four leu- ters are used to find four-residue clusters, etc. At each step of cines and onecysteine formed the coreof a “zipper”region. Al- cluster growth, the added residueis checked for minimal con- though the evidence given here is consistent with hydrophobic tact with other residues in the cluster and for contactwith resi- regions being important in early folding events, none of thehy- dues outside the cluster. If the added residue has too much drophobic clusters found here correspond to leucine the zipper contact with residues outside the cluster,it is not added. The lat- of Gronenborn and Clore. ter rule is used to make the clusters relatively isolated. The clusters discovered by the Heringa method areusually lo- cated on aprotein’s surface and contain an averageof three or Bovine pancreatic trypsin inhibitor four residues. As might be expected for surface residues, the clusters are notespecially hydrophobic and frequently include Core I of BPTI contains residues from the beginning, middle, charged residues. This contrasts sharplywith the clusters found and end of the protein’s sequence and corresponds to the end here, which are much larger,extremely hydrophobic, and usu- of the molecule containing the a-helix. Thiscluster and its sub- ally located in a protein’s interior. clusters correspond quiteclosely to the PaPypeptide used by Because both methods attempt to find densely packed clus- Staley and Kim (1990) to model a two disulfide folding inter- ters, and both methodsuse some formof surface area measure- mediate in BPTI. This peptide has native-like structure in so- ment to find these clusters, the difference in results is quite lution, so the corel cluster family mustplay an important role surprising. There are twolikely explanations forthis dissimilar- in stabilizing this structure. ity. First, for technical reasons, the compactness method can- Both core I and unit correspond to central regionsof dif- I1 not properly evaluate the compactness of small units (see the ferent discontinuous compact domains (Zehfus,1994). Unit I1 Methods) and only reliably locates units containing seven or is a good match because it is binary in nature. Core I, on the more residues. Thus, this method cannot analyze thevery small other hand, contains residues from three regions in BPTI and clusters observed by Heringa and may be examining entirelyan does not match exactly with the binary compact domain. The different type of cluster. ternary hydrophobic clusteris probably the coreof a true ter- The second explanation is that the Heringa method has anin- nary compact domain, but the present domain identification ternal bias for surfaceresidues. That method deliberately selects scheme is not sophisticated enough to identify such units. units that have little overlap with other residues. Such units are more likely to occur at protein’sa surface, where one side of the General trends unit is solvent exposed, than in the protein interior,where the unit’s entire surface will be in contact with other residues. It re- The hydrophobic clusters found here are intuitively pleasing. mains for future investigations todecide which kind of cluster Smaller units come from the interior of the protein,largely are is structurally more interesting. Recognition of hydrophobic clusters 1199

Recently Swindells (1995) described a method for findinghy- is stabilized by a few key cluster residues, so the experimentally drophobic clusters by identifying buried residues in regular sec- observed structure maybe much larger than thecluster respon- ondary structures thathave a large number of contactsbetween sible for the structure.To make matters even more confusing, hydrophobic atoms. This approachis used to finda single core a single sheet may be stabilized by two different clusters, one in a protein and does notyield the hierarchical cluster of fami- on each side of the sheet. Thus, a simple one-to-one correspon- lies observed here. The core definitionis also quite restrictive, dence between protected residues and clusters will almost never so the largest observed core has only about 25 residues. occur in @ regions. Three of the proteins analyzed by Swindells are alsoanalyzed Another factor that may precludea correlation between hy- here. With a FOM of 88%, Swindells’s core for interleukin I@ drophobic clusters and proposed folding intermediates is that is an excellent match with the I.A. 1 clusterfound here. The core the structurebeing analyzed may notbe a folding intermediate. he chooses for myoglobin is also similar to the I.A.1 cluster In a few proteins the compact non-native A-state was used as found here, although its FOM is lower (67%). Thebiggest dif- a folding intermediate model. In cytochrome c and ubiquitin, ference is in pancreatic trypsin inhibitor. Because the Swindells this is reasonable because the proton exchange data of the method uses a threshold accessibility value, only four residues A-state resemble the proton exchange datain the native state. in BPTI can beused to forma core. Because these residues are In a-lactalbumin, however, thereis no such directevidence ty- not in contact with each other, he finds no acceptable core for ing the “” state directly to the folding process, this protein. This contrasts with the four clusters found here, and indeed, thereis good evidence that this particular “molten one ofwhich is thought to bea significant folding intermediate. globule” has somevery non-native features(Alexandrescu et al., Early folding intermediates are thoughtto correlate with hy- 1993). drophobic regionsin proteins. To see if this is true, hydropho- Considering all the factors that can cloud the correlationbe- bic clusters were correlated against experimentally identified tween hydrophobic clusters and folding intermediates,it is ac- folding intermediates. In nine cases the correlationis very good, tually quite encouraging that the correlationis as good asit is. and in two cases it is only moderate. Interestingly, thebest cor- It appears that compact hydrophobic clusters are importantin relation between hydrophobic clusters and folding intermediates early folding intermediates, and thatcluster size can be directly occurs when clusters with less than nine hydrophobic residues tied to a unit’s existence and stability. are removed from the dataset. This suggests that thereis a min- imum size for a hydrophobic pocket and thatclusters below this Methods threshold are not stable enough to initiate folding. The total number of hydrophobicresidues also appears tobe Compacthydrophobic clusters were foundfor the proteins important in determining relative stability between units. Sev- a-lactalbuminIALC (Acharya et al., 1989), barnaselRNB eral cases are seen where two or more possible clusters exist in (Baudet & Janin, 1991), ubiquitin lUBQ (Vijay-Kumar et al., a protein. In thesecases the cluster with the most hydrophobic 1987), myoglobin 4MBN (Takano, 1984), cytochrome bS 3B5C residues usually has the best correlation with the experimental (Mathews et al., 1972), cytochrome c 3CYT (Takano & Dick- data. This makes intuitivesense because the larger clusters would erson, 1980), interleukin 4IlB (Veerapandian et al., 1992), T4 hide more hydrophobic surface area and be moreenergetically lysozyme 4LZM (Bell et al., 1991), hen egg-whitelysozyme favored. 6LYZ (Diamond, 1974), ribonuclease 7RSA (Wlodawer et al., Correlating hydrophobic clusters tospecific steps in folding 1988), BPTI 6PTI (Wlodawer et al., 1987), and ribonuclease T, is more difficult. In proteinswhere distinct ranges of protection 9RNT (Martinez-Oyanedel et al., 1991), using coordinates de- or distinct phases of protection can be observed,it is usually not posited in the Brookhaven Protein DataBank (Bernstein et al., possible to tie individual clusters to distinct folding entities. This 1977). In all cases, heteroatoms and acetylationswere removed often occurs because a single structural entity, like a @-sheet, from the coordinatesets. If a side-chain position was multiply may be stabilized by more than one hydrophobic cluster. In defined, the single most populated side-chain position was used. these cases, it is not clear whether a single cluster is necessary An exception to the abovewas made for cytochromec. This for the structure to form, or if both clusters must coalesce protein contains a single covalently bound heme group, andre- simultaneously. folding and protection studies done onthis protein include the The cases where hydrophobic clusters and folding intermedi- heme moiety. To see if the presence of the heme changes the ates do not correlate may be explained in several ways. First, analysis, the analysis was performed both with and without many experiments used to monitor folding intermediates are not the heme group. well suited for observing tertiary folding interactions. For in- Compactness, measured by dividing an object’s solvent-acces- stance, hydrogen exchange labeling experiments usually only sible surface area by its minimum possible surface area, has been monitor residues in regions of regular secondary structure. If used to locate continuous and binary discontinuous protein do- compact clusters contain residues outside helices or sheets, they mains (Zehfus, 1987, 1993, 1994). In the search for domains, will not be observed using this method. Further, the reactions main-chain and side-chain moietiesof an amino acid are con- involved in the exchange process are complex, and other factors, sidered as one inseparable unit, and domains areassembled con- such as solventaccessibility, are also importantin determining tinuously, by adding new amino acids at either end of a growing, the proton exchange rates. continuous peptide segment. Making correlations between hydrophobic clusters and fold- In thesearch for compactclusters, a muchdifferent approach ing intermediates are particularly difficult in @-sheet regions. is taken. First, because the peptide backboneis inherently hy- This is due to the discontinuous nature of the@-sheet. The hy- drophilic and should not appearin a hydrophobic region until drophobic cluster that stabilizes a 0-sheetmay contain only one its polar moieties are hydrogen bonded, side chainson6 are used or two residues from each @-strand. Inthis case, the entire sheet in building clusters. Backbone moieties maybe added to these 1200 M. H.Zeh fus clusters in later steps of the analysis for clarification of back- the mean Z value, rather than absolute Z value, {becomes a bone interactions, but they are not used directly in the clus- size-independent parameter for comparing compactness values. ter-building algorithm. Using the {parameter, each clumpis now searched fora few The second differencebetween this method and the domain- side-chain sets that arevery compact. To do this, each member finding algorithmis that side chains are addeddiscontinuously. in the clump is compared to other,similar-sized members of the This is necessary because clusters are inherently discontinuous; clump. If a given unit has N residues, and anotherunit contain- a cluster on oneside of a 0-sheet will only contain every other ing between N/2 to 3N/2residues is found with a higher { value, residue within a single 0-strand, and the strandsof a 0-sheet fre- the unit with the lower { is rejected. quently are not contiguous with each other. Clusters formedby Many of the remaining units are closely related, describing the docking of helices are also inherently discontinuous, al- nearly identical pieces of structure. To find the best unit to rep- though the pattern of the discontinuity will be different. resent each structure, each cluster is next compared to othersim- Because we wish to discover hydrophobic clusters,it is natu- ilar-sized clusters to see if they are closely related. If the potential ral to think thatwe should limit our search to only hydropho- cluster has N residues, then all units between N/2 and 3N/2res- bic side chains. It is impossible, however, to class many side idues are searched for sequencesimilarities. If more than 50% chains as purely hydrophobicor hydrophilic because they con- of a smaller unit’s residues occur within a larger unit, the clus- tain a mixture of properties. Consequently, all side chains are ters are declared similar, and only the clusterwith the highest used in the discovery algorithm. It will be seen that even when { values is retained. hydrophilic side chains are included, the discovered units are Although hydrophobicity is not used as a constraint in finding very hydrophobic. A few hydrophilic clusters are found, but these compact clusters, the final units largelyare hydrophobic. they are easily identified and removed. When alanine, valine, leucine, isoleucine, proline, phenylala- The gist of the method is that each side chain in a protein is nine, tryptophan, methionine,cystine, and tyrosine are consid- used to grow a family of potential clusters called a “clump.” ered hydrophobic, the average compact cluster contains 65% Clumps are grownin a stepwise manner, by finding the single hydrophobic residues, whereas the average proteinused in this side chain that optimizes the compactness of the growing clump. study contains only 42% hydrophobicresidues. The procedure Once clumps havebeen identified for eachside chain in a pro- does, however, find some clusters that are not hydrophobic. tein, they are comparedwith each other to find the setsof side These hydrophilic clusters are removedby eliminating all clus- chains that are most compact. Themost compact side-chain sets ters with less than 40% hydrophobic residues. Nearly 95% of are then identified as compact clusters or cores. all compact clusters pass this test. All discovered hydrophobic clusters are listed in Figure 1. Details of the cluster-finding algorithm When all compact hydrophobic clusters are comparedwith the experimental data for early folding intermediates, it becomes ap- A clump is started froma given side chain by pairing it with all parent thatneither the largest nor thesmallest clusters correlate other side chains in the protein and evaluating eachpair’s com- with folding intermediates. Thiswas investigated further totry pactness. When the most compact pairis found, the identity of to understand this phenomenon, andto design data filters that the added side chain and the compactness of the pair are re- would remove noncorrelating units from the data set. corded, and the first step in clump growth is complete. Next, The largest clusters are muchbigger than theunits delineated this pair of side chainsis then tried in combination with all re- by the experimental data and probably represent structures maining side chains to find the most compact side-chain trip- formed late in the folding process. These large units are removed let. This processis continued, and the clump grows, untilall of from the data sets by eliminating all clusters containing more the protein’s side chains are containedin the clump. than 50% of a protein’s amino acids. During this process, compactnessis evaluated with the Z pa- The most plausible reason that smaller units do not correlate rameter (Zehfus & Rose, 1986), using a look-up table-based with observed structure is that nascent units mustexceed some algorithm for area and volumecalculations (Zehfus, 1993), spe- threshold energyvalue before they are stable enough tobe ob- cially modified to use only side-chain atoms. The numberof cal- served. To try to understand this better, the set of hydropho- culations done can be greatly reduced if clumps are inspected bic clusters was filtered usingsize, percenthydrophobicity, after each step of growth to remove clumps with identical setscompactness, or total number of hydrophobic residues, seeto of side chains. which parameter would give the best fit with experimental data. The next step in the procedure is to examine the clumps for The best correlation was found with total number of hydropho- sets of side chains that are exceptionally compact. Comparing bic residues. Units containing nine or more hydrophobic resi- the compactnessof units of different sizes is difficult, however, dues correlatewell with experimental data, whereas thosewith because Z values for these units are size dependent (data not eight or fewer hydrophobics have little or no correlation. The shown). A size-independent parameter called {was developed clusters in Figure 1 that pass these additional filtering steps are to remove this complication.

Veerapandian B, Gilliland GL, Raag R, Svensson AL, Masui Y, Hirai Y, Protein anatomy: Functional roles of barnase module. J Biol Chem Poulos TL. 1992. Functional implications of interleukin-I beta based on 268:5861-5865. the three-dimensional structure. Proteins Struct Funct Genet 12: 10-23. Zehfus MH. 1987. Continuous compact protein domains. Proteins Struct Vijay-Kumar S, Bugg CE, Cook WJ. 1987. Structure of ubiquitin refined Funct Genet 2:90-1 IO. at 1.8 A resolution. J Mol Biol 194531-544. Zehfus MH. 1993. Improved calculations of compactness and a re-evalua- Wlodawer A, Nachman J, Gilliland GL, Gallagher W, Woodward C. 1987. tion of continuous compact units. Proteins Struct Funct Genet 16:293- Structure of form I11 crystals of bovine pancreatic trypsin inhibitor. J 300. Mol Bioi 198:469-480. Zehfus MH. 1994. Binary discontinuous compact domains. Protein Eng Wlodawer A, Svensson LA, Sjolin L, Gilliland GL. 1988. Structure of phos- 7:335-340. phate-free ribonuclease A refined at 1.26 A. Biochemistry27:2705-2717. Zehfus MH, Rose GD. 1986. Compact units in proteins. Biochemistry 25: Yanagawa H, Yoshida K, Torigoe C, Park JS,Sato K, Shirai T, Go M. 1993. 5759-5765.