
Tertiary structure of an immunoglobulin-like domain from the giant muscle protein titin: a new member of the I set

Mark Pfuhl and Annalisa Pastore

EMBL, Heidelberg, Meyerhofstrasse 1, 69117 Heidelberg, Federal Republic of Germany

Background: Titin is a gigantic protein located in the thick filament of vertebrate muscles. The putative functions of titin range from interactions with myosin and other muscle proteins to a role in muscle recoil. Analysis of its complete sequence has shown that titin is a multi-domain protein containing several copies of modules of 100 amino acids each. These are thought to belong to the fibronectin type-III and immunoglobulin superfamilies. So far, a complete structural determination has not been carried out on any of the titin modules.

Results: The three-dimensional structure of an immunoglobulin module, located in the M-line of the sarcomere close to the titin C terminus and called 'M5', was determined by multi-dimensional NMR spectroscopy. The structure has the predicted immunoglobulin fold with two β-sheets packed against each other. Each sheet contains four strands. The structure of M5 belongs to the I (intermediate) set of the immunoglobulin superfamily and is very similar to telokin, which is also found in muscles. Although M5 and telokin have relatively little sequence similarity, the two proteins clearly share the same hydrophobic core. The major difference between telokin and the titin M5 module is the absence of the C' strand in the latter.

Conclusions: The titin domains and several of the immunoglobulin-like domains from other modular muscle proteins are highly conserved at the positions corresponding to the hydrophobic core of M5. Our results indicate that it may be possible to use the structure of M5 as a molecular template to model most of the other immunoglobulin-like domains in muscle titin.

Structure 15 April 1995, 3:391-401 Key words: immunoglobulin-like modules, muscles, NMR, titin

© Current Biology Ltd ISSN 0969-2126. Structure 1995, Vol 3 No 4.

Introduction

The increasing amount of information on both the tertiary structures and sequences of proteins has revealed the existence of families that share a very conserved folding pattern but have extremely divergent sequences [1,2]. Among these families, the immunoglobulin superfamily is one of the most divergent, both in terms of sequence and function. All of the members of the superfamily share, with only minor variations, the same fold (~100 amino acids long) assembled as two β-sheets that pack tightly to form a β-sandwich [3].

This fold places the two termini of the polypeptide chain at far ends of the structure, in contrast to most other single-domain globular proteins. In this way, it allows the single domains to be arranged in linear arrays, a useful feature for building long proteins used in intracellular or extracellular scaffolds. Furthermore, the common 'theme' for immunoglobulin-like modules is the possibility of forming highly specific interactions with other molecules [2]. The conserved β-sandwich fold is regarded as providing a stable scaffold to 'host' highly specific binding sites on its surface. Small variations of the fold may add additional flexibility to fulfil particular functions.

Among the few intracellular proteins containing immunoglobulin-like domains reported so far, the muscle-related family is the most numerous. Proteins containing these domains are found in a broad range of organisms (from molluscs to vertebrates) and show a wide variety of complexity [2]. They range from the rather simple myosin light-chain kinase (e.g. telokin [4]) and proteins such as M-protein [5], H-protein [6] and C-protein [7-9] with only a few immunoglobulin-like domains, through kettin [10] with at most 30 units, up to the giant protein titin.

Titin has a molecular weight of ~3 million Daltons, which makes it the largest single-chain protein known to date (for review, see [11,12]). It is thought to form a fibrous intracellular system in vertebrate striated muscle and to play an important role in sarcomere alignment during muscle contraction [13-15]. It has also been suggested that titin may act as a molecular ruler regulating the assembly and the precise length of the thick filament in muscles [16]. The titin found in cardiac muscles is one of the smallest isoforms, and contains about 240 modules almost equally divided between immunoglobulin-like and fibronectin type-III-like domains with only a few intervening sequence elements ([17,18]; S Labeit, personal communication). The most notable of these intervening sequences are the repeats of a KSP (Lys-Ser-Pro) phosphorylation motif placed as a linker between two immunoglobulin modules located in the C terminus of titin. In contrast to other muscle proteins containing immunoglobulin-like domains, titin extends throughout the whole length of each half sarcomere [15,19,20] (Fig. 1). Immunoglobulin-like domains are common throughout the sarcomere, in contrast to the fibronectin type-III domains, which are present only near to the centre of the molecule (in the A-band). Immunoglobulin-like domains located in different regions of the sarcomere must have adapted to satisfy different functions of titin, from being a major organizer of the thick filament in the A-band [21], interacting with myosin and C-protein, to anchoring to the Z-line and the M-line, and to providing elasticity in the I-band [15,22]. The presumed involvement of titin in elasticity and the passive generation of force adds a previously unobserved function to the already numerous ones known for the immunoglobulin fold. It is therefore of great interest to solve the structures of representative immunoglobulin domains from different regions of the titin molecule. By correlating the structures to the sequences on the one hand and to the specific functions on the other, a much better understanding of the immunoglobulin fold may be achieved.

Fig. 1. (a) Schematic representation of the sarcomere. The thick and the thin filaments are coloured red and blue, respectively. Titin is shown in green and spans the whole length between the Z-disk and the M-line. (b) The C-terminal region of the titin molecule (from [18]). The titin kinase (in black) is on the left followed by 10 immunoglobulin domains (grey) and insertion sequences (white). The module M5 is highlighted in orange.

With this aim in mind, we have begun the characterization of several immunoglobulin-like modules selected from different regions of titin. By preliminary circular dichroism, fluorescence and NMR spectroscopy we found that titin's immunoglobulin-like modules adopt a stable fold in solution and exhibit structural features consistent with predictions [23,24]. Among them, domains from the M-line are the best-behaved in terms of high-resolution structural determination. Based on the complete assignment of the NMR spectrum of this module, we describe here the structure of M5, which corresponds to the fifth immunoglobulin module of the 10 located in the titin C terminus (Fig. 1b). It is followed directly by the linker sequence that contains the KSP phosphorylation repeats [18]. High levels of KSP phosphorylating kinase were detected (in vitro) in developing but not in differentiated muscle. This suggests that phosphorylation of the titin C terminus is necessary during muscle differentiation and it is possible that this phosphorylation may control the assembly of M-line proteins into regular structures in myogenesis.

Results and discussion

Structural statistics

A detailed account of the complete assignment procedure of the NMR spectrum of the M5 domain has already been presented elsewhere [25]. A total of 872 distance restraints derived from nuclear Overhauser effect (NOE) cross-peaks were obtained from NOE spectroscopy (NOESY) experiments (Table 1). Eighteen pairs of β-methylene protons were assigned stereospecifically. The residues of a histidine tag (MHHHHHHSS), introduced to simplify purification of the M5 module [26] and possibly to help increase the solubility of this domain, were not included in the structure calculation because no NOEs were observable for this part of the molecule. The histidine tag region is presumed to be highly mobile, as indicated by the random-coil chemical shifts [27] of the histidine tag α-protons. Therefore, it should have little influence on the structure of the domain.

Table 1. Statistics of structure calculation.

NOE and H-bond derived distances:
  No. of NOEs                                         872
  No. of intra-residual NOEs                          180
  No. of sequential NOEs                              224
  No. of NOEs (i-j, i+1 i+5)                          342
  No. of hydrogen bond constraints                    23

Calculation results:
  No. of structures at start                          500
  No. accepted after first redac run                  273
  Min/max target function after first redac run       27/400
  No. accepted after second redac run                 129
  Min/max target function after second redac run      14/98
  No. accepted after third redac run                  79
  Min/max target function after third redac run       10/28
  No. of structures used for MD                       40

Parameters for 16 structures after MD:
  No. of NOE violations >0.4 Å                        0
  No. of NOE violations 0.3-0.4 Å                     1
  No. of NOE violations 0.2-0.3 Å                     16
  No. of NOE violations 0.1-0.2 Å                     70
  Total no. of NOE violations                         210
  Average NOE violation (Å)                           0.053
  Average bond length deviation (Å)                   0.006
  Maximal bond length deviation (Å)                   0.03
  Average bond angle deviation (°)                    1.1
  Maximal bond angle deviation (°)                    4.3

Abbreviations: NOE, nuclear Overhauser enhancement; MD, molecular dynamics.
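The violation counts in Table 1 come from comparing each restrained distance in a structure with its upper limit and sorting the excess into ranges. The following is a minimal sketch of that bookkeeping, not the procedure used by DIANA or X-PLOR; the function names and the example distances are hypothetical and serve only to illustrate the ranges listed in the table.

```python
def restraint_violation(distance, upper_limit):
    """Excess of a measured distance over its upper-limit restraint, in angstroms.

    A satisfied restraint gives 0.0.
    """
    return max(0.0, distance - upper_limit)


def bin_violations(violations):
    """Count violations in the ranges used in Table 1 (angstroms).

    Violations of 0.1 angstrom or less are left uncounted here, which is an
    assumption of this sketch.
    """
    counts = {"0.1-0.2": 0, "0.2-0.3": 0, "0.3-0.4": 0, ">0.4": 0}
    for v in violations:
        if v > 0.4:
            counts[">0.4"] += 1
        elif v > 0.3:
            counts["0.3-0.4"] += 1
        elif v > 0.2:
            counts["0.2-0.3"] += 1
        elif v > 0.1:
            counts["0.1-0.2"] += 1
    return counts


# Hypothetical (measured distance, upper limit) pairs for one structure.
restraints = [(5.15, 5.0), (3.0, 3.5), (2.75, 2.4), (4.05, 4.0), (6.0, 5.5)]
violations = [restraint_violation(d, u) for d, u in restraints]
counts = bin_violations(violations)
```

Summing such counts over all accepted structures would give totals comparable in kind to those in the table, under the assumption that every restraint is treated as a simple upper limit.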

The best 40 structures after a distance geometry calculation using the standard minimization protocol with stereospecific assignments had NOE violations of less than 0.5 Å and target functions between 12 Å and 27 Å. The violations were distributed uniformly along the sequence, so they must reflect a tight calibration rather than major distance inconsistencies, which would lead instead to larger violations persistently involving the same distance. Molecular dynamics simulation resulted in a slight improvement, leading to structures with <0.4 Å violations of the NOE restraints, <4° violations of bond angles and <0.04 Å violations of bond lengths (Table 1). Multiple root mean square deviation (rmsd) superposition resulted in an rmsd of the backbone atoms of 0.95 Å and 1.52 Å for all heavy atoms (after minimization of the averaged structure). As can be seen in Figs 2 and 3, the β-sheet regions are better defined (average rmsd values of 0.4-0.5 Å), while some of the loop regions are more poorly defined, in particular residues Lys36-Arg45 and Ser61-Asn69 (rmsd of up to >2.0 Å). Despite the mobility of the histidine tag, only a minor 'fraying' is observed around the N terminus as the first strand starts immediately after the tag, at residue Ile2 of the effective sequence.
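The rmsd values quoted above measure how far each member of the structure bundle lies from the averaged structure. The sketch below shows the arithmetic only, assuming the models have already been superimposed; the superposition step itself is omitted, and the function names and toy coordinates are hypothetical rather than taken from X-PLOR.

```python
import math


def mean_structure(ensemble):
    """Average coordinates over models; ensemble[m][a] is an (x, y, z) tuple."""
    n_models = len(ensemble)
    n_atoms = len(ensemble[0])
    return [
        tuple(sum(model[a][k] for model in ensemble) / n_models for k in range(3))
        for a in range(n_atoms)
    ]


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally long coordinate lists."""
    squared = sum(
        (p[k] - q[k]) ** 2 for p, q in zip(coords_a, coords_b) for k in range(3)
    )
    return math.sqrt(squared / len(coords_a))


def ensemble_rmsd(ensemble):
    """Average rmsd of each model from the mean structure of the ensemble."""
    mean = mean_structure(ensemble)
    return sum(rmsd(model, mean) for model in ensemble) / len(ensemble)


# Two hypothetical two-atom models, displaced by 2 angstroms along z.
toy_ensemble = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)],
]
toy_value = ensemble_rmsd(toy_ensemble)
```

Restricting the atom list to backbone atoms or to all heavy atoms would give the two kinds of value reported in the text; a per-residue profile such as the one in Fig. 3b would follow from applying the same calculation residue by residue.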

Fig. 2. (a) Superposition of the best 16 structures after distance geometry and molecular dynamics. The well defined A and A' β-sheets are clearly visible on the left. Part of strand D is seen on the right side. The loop Lys36-Arg45 on the lower right and the loop Ser60-Gly68 at the bottom show a higher deviation from the average structure. (b) The averaged structure with every tenth residue labelled and shown in the same orientation as in (a).

[Fig. 3, panels (a) and (b): plots against residue number, with the strands A, A', B, C, D, E, F and G marked along the top. Fig. 4: Ramachandran plot.]

Fig. 3. (a) Plot of solvent-accessible surface area summed up for all atoms of each residue as calculated with X-PLOR. The positions of β-strands are indicated by black rectangles at the top. (b) Plot of the average rmsd of the 16 best structures from the averaged structure calculated with X-PLOR. The largest deviations are found in the long loops Ser60-Gly68 and Lys36-Arg45, the latter being 'split' into halves of poor definition by the anchoring residue Leu40. Values for backbone atoms are displayed in solid lines, averaged values for all heavy atoms per residue in dashed lines.

Fig. 4. Ramachandran plot for the averaged minimized structure. Glycine residues are displayed as circles and their identifiers shown explicitly; all other residues are displayed as crosses. Non-glycine residues far outside the low-energy regions are labelled explicitly.

An analysis of the Ramachandran plot shows that most residues have energetically favourable backbone conformations (Fig. 4). Most of the non-glycine residues outside the low-energy areas are located in the two poorly defined regions (such as residues Glu67, Leu40 and Ser41).

Structural overview

The molecule adopts the classical immunoglobulin fold consisting of two β-sheets forming a β-sandwich (Fig. 5). Each of the sheets contains four strands: the first sheet (red in Fig. 5) comprises strands A, B, E and D whereas the second sheet (blue in Fig. 5) comprises strands A', G, F and C. Strands A and A' are somewhat atypical because A consists essentially of a β-bulge whereas A' packs parallel with the C terminus of strand G. The contribution of the N-terminal strand to both β-sheets is made possible by a twist of this strand around residue Arg7. The β-bulge, as it belongs to the G1S class (according to the classification given in [28]), is stabilized by two hydrogen bonds from residue Thr4 and one from Leu3-Asp22.

Two pairs of antiparallel strands (DE and FG) are connected by short type I β-turns. All of the other strands are connected by longer, less well defined loops. The longest loop, which connects strands C and D (approximately 10 residues long from residue Lys36), is very poorly defined by the NMR restraints and none of the amide protons potentially involved in hydrogen bonds is protected against exchange. In the average structure, some very weak hydrogen bonds with poor geometry may define two consecutive β-turns in this region centred around residue Leu40 (Leu40-Ser43 and from Gly37-Leu40). A γ-turn could then be formed from Val39-Ser41. However, since these secondary-structure elements are not present in all the structures, they should be considered only as one of several possible conformational states that this region may adopt.

Hydrophobic core

The hydrophobic core of the M5 domain is dominated by a cluster of aromatic amino acids (Phe19, Trp33, His46, Phe57, Tyr70 and Phe85 in Fig. 6) slightly shifted from the centre of the molecule (towards the bottom in Fig. 5). This cluster is lined by Trp33 on one side and Tyr70 and His46 on the other side, and three phenylalanines which fill the centre of the cluster. The whole cluster, with the exception of His46, is inaccessible to the solvent (accessible surface <5 Å²). Residue Tyr70 is shielded from the surface by the loop around Asp66 and forms a bifurcated hydrogen bond between its hydroxyl group and the backbone oxygens of Asp66 and Glu67. This interaction keeps the loop region between Glu68 and Ser60 well ordered. The buried hydrogen bond also explains why the hydroxyl proton of Tyr70 exchanges slowly enough to be observable in the NMR spectra (usually hydroxyl protons exchange with the solvent too fast on the NMR timescale to be observable). The aromatic residues are surrounded by the aliphatic residues Leu40, Val48, Ile59, Ala17, Leu87, Val72, Val74, Pro6, Val31 and Ala83, completing the hydrophobic core.

Fig. 5. Secondary-structure elements in the structure of M5. The β-strands are coloured according to the sheet to which they belong. The orientation is the same as in Fig. 2. Strands are labelled at the top and bottom in the appropriate colours. N and C termini are indicated.

Strong aromatic-aliphatic interactions are found between Leu40 and Tyr70 and between Val72 and Trp33. Residue Leu40 projects from the surface loop Lys36-Arg45 deep into the hydrophobic core and makes very close contacts with Tyr70 and Trp33. Residue Leu40 appears to be anchoring the otherwise rather hydrophilic loop region very rigidly to the frame of the molecule. In the structure bundle, Leu40 has rmsd values of <1.0 Å for both backbone and side chain, whereas the flanking residues are much more disordered (Figs 3,7).

Residue Val72 replaces one of the two cysteines which form the well-conserved disulphide bridge in the extracellular members of the immunoglobulin protein superfamily [29]. Together with Val31, Val72 packs against Trp33 and covers almost its entire surface. The very intimate association of Val31, Leu40 and Val72 with aromatic amino acids is also reflected in the chemical shifts of the corresponding methyl groups: they correspond to the most strongly high-field shifted resonances in the aliphatic region of the proton NMR spectrum [25]. Similarly, unusual values of proton resonances are observed for the β-protons of Leu87 which are low-field shifted to values rather unusual for a leucine (2.3 ppm). The reason for this is the orientation of the β-protons in the plane of both Phe19 and Phe85.

A second hydrophobic cluster formed by Val11, Val62 and Ile89 is present at the end of the A' strand, in the EF loop and at the C terminus, respectively (see Fig. 6). The A strand is only weakly coupled to the central part of the molecule by Pro6 packing against Trp33 and Phe85, and Ile2 packing against Val74. In contrast to Val72, the other position for the classic disulphide is occupied by a cysteine (Cys21) which is completely buried in the core. Consistent with its inaccessibility to the solvent (accessible surface area <5 Å²) is the observation that the γ-proton (like the hydroxyl proton of Tyr70) exchanges only slowly with the solvent and is therefore observable in the NMR spectrum. The same phenomenon is observed for the γ-proton of Ser55 (the hydrogen-bonding partner of Cys21 in the antiparallel β-sheets B and E); residue Ser55 is protected by a hydrogen bond formed with the side-chain oxygen of Thr50. Both observable γ-protons indicate that this whole hydrophobic core is very stable and relatively rigid.

Fig. 6. Structure of M5 with the residues of the hydrophobic core displayed with all their heavy atoms. The backbone is in blue, and hydrophobic side chains are in red.

Fig. 7. Part of the hydrophobic core of M5. A hydrogen bond is formed between the Tyr70 hydroxyl proton and the backbone carbonyls of Asp66 and Glu67. The aromatic ring packs against Leu40. The side chain carboxylate group of Asp66 forms ion bridges with His46 and weakly with Arg45. Hydrogen bonds/ionic interactions are indicated by dashed lines.

Protein surface

The surface of the M5 module is predominantly covered by hydrophilic amino acids, especially threonines (in the whole sequence there are 12 threonines, mostly exposed, out of a total of 91 residues). Being amphiphilic in nature, these residues provide an interface to 'shelter' the edges of the β-sandwich from the solvent. However, some small patches of non-isolated hydrophobic residues are noticeable on the surface of M5 and may be important in its binding to other proteins of the M-line matrix.

Two of the three tyrosines in M5 reside in surface loops, without taking part in the formation of the hydrophobic core. Having a partial hydrophilic character, both are packed to some extent against hydrophobic regions of other surface residues. Tyr53 packs against the large hydrophobic loop formed by the cis-proline Pro27, Val28 and Pro29 (in trans conformation). Tyr53 forms a hydrogen bond to the carboxylate of Asp24. The hydroxyl proton, however, seems to be in the fast-exchange regime with water, presumably for lack of any further protection (Tyr53 accessible surface area >150 Å²). An additional reason for fast exchange may be the alternative interaction of the carboxylate of Asp24 with Arg1 that is observed in some structures (Fig. 8). The aromatic ring of Tyr12 packs against the β- and γ-methylene groups of Gln90, which is also observable in the NMR spectra as a high-field shift of the β- and γ-protons. The only other hydrophobic patch on the surface is formed by Val73, Val39, Leu34 and the methyl group of Thr32. In contrast with the region around Val28, this patch is freely accessible to the solvent and may therefore be involved in interaction with other proteins.

Fig. 8. (a) Interaction of Tyr53 with the loop around Pro27-Pro29. The aromatic ring of Tyr53 packs against the hydrophobic surface provided by Pro27, Val28 and Pro29. The two prolines are in cis and trans conformations, respectively. A hydrogen bond is formed from the hydroxyl proton of Tyr53 to the side chain carboxylate of Asp24. (b) Alternative interactions of Asp24 with Arg1 in the family of 16 accepted structures. Side chains of Arg1, Asp24 and Tyr53 are shown for the 16 best structures while the backbone is only shown for the average structure.

No well defined salt bridges are observed on the surface of the protein. Some pairs, however, are close enough (with their charged groups ~6 Å apart) that at least a short-lived charge interaction is possible. Some evidence of this is found by analyzing these distances in the complete bundle of NMR solutions. For the Glu84-Arg35 pair, the distance between Arg35 Cζ and Glu84 Cδ varies from 3.8 Å to 12.2 Å. Similar values are found for the Arg45-Asp66 and Arg1-Asp24 pairs, although the Cζ-Cγ distances range from 4.8-12.3 Å and from 4.4-9.2 Å and indicate a low population for potential ion bridges. The pair Arg1-Asp24 is particularly interesting, in that Asp24 is also involved in a hydrogen bond to Tyr53 (see above).

Comparison with other immunoglobulin folds

The structure of the immunoglobulin-like domain presented here shows the overall fold typical of the immunoglobulin superfamily, in accord with predictions based on multiple sequence analysis [17]. It is now appropriate to ask to which of the immunoglobulin structure subsets M5 belongs. In addition to the three classical immunoglobulin folds (variable or V set, and the C1 and C2 constant sets), a new subset of the immunoglobulin superfamily, the I set, has been defined recently by Harpaz and Chothia [30]. The classification relies on hydrogen-bond patterns which diversify the subfamilies: the V set contains 10 strands divided in ABED and A'GFCC'C''; the C1 and C2 sets contain eight (ABED and GFCC') and seven (ABE and GFCC') strands respectively; and the I set, as defined from the only structure known at the time, telokin [31], was described as similar to the V set except for the absence of the C'' strand and the shorter length of the C' strand (three residues). For most of the core contacts, the I set is very similar to the V set. About 64-72 positions were found to be structurally equivalent in the two sets, with 20 key residues involved in the hydrophobic core. From them, a profile was built to search for other possible I or V set structures. It was suggested that all the sequences consistent with the profile would be structurally very similar. On this basis, the immunoglobulin-like modules of titin were predicted to belong to the I set (Harpaz and Chothia, personal communication).

Fig. 9. Comparison of M5 and telokin sequences. Sequences are labelled as follows: tel, telokin; M5, titin; Con M, consensus sequence of titin M-line type II domains; Con A, consensus sequence of titin A-band type II domains; key, key residues of the I-set fold; sheet, position of β-sheets in telokin; pos, numbering in telokin β-sheets. Residues making the I-set are shaded in pink, residues belonging to the titin type II consensus are shaded in green.

Fig. 10. Comparison of M5 and telokin [31] structures. (a) Superposition of the average structure of M5 and telokin in the same orientation as in Fig. 2. The structures were superimposed using only the 46 V-frame residues of the I set resulting in an rmsd of 1.65 Å. Titin M5 is coloured yellow, telokin is coloured in blue. Titin M5 is labelled every 10 residues. (b) Comparison of the hydrogen-bond pattern in the structures of titin M5 (left) and telokin (TLK) (right). The presence of hydrogen bonds in the NMR structure of titin M5 was also judged from hydrogen exchange data. Hydrogen bonds are displayed as thick horizontal lines.

An analysis of the structure of M5 in terms of the key residues of the I set shows that virtually all the expected interactions are conserved in the M5 module (compare Fig. 9 with Fig. 6). Only two contacts are absent, but this is probably due to the flexibility of the regions involved. One of the missing contacts is the salt bridge Arg45-Asp66 (D1-EF6, according to the terminology in [30]) which is only weakly populated. Instead, an ionic bridge (albeit slightly distorted) is present in some of the NMR structures between Asp66 and His46. A comparison of all the immunoglobulin domains from titin shows that the region around the loop Lys36-His46 is not particularly well conserved [17,18]. Some domains do not have any positively charged residues near the D1 position, and some have a negatively charged residue at this position. Most domains that contain basic residues in this region have them at different, non-conserved positions.
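Statements such as 'only weakly populated' rest on measuring the distance between two charged groups in every member of the NMR bundle. A minimal sketch of that kind of survey follows; the 4.0 Å cutoff for counting a contact, the function names and the coordinates are assumptions made for illustration and are not taken from the paper.

```python
import math


def distance(p, q):
    """Euclidean distance between two (x, y, z) points, in angstroms."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def pair_distance_stats(ensemble_pairs, cutoff=4.0):
    """Summarize one atom pair across an ensemble of structures.

    ensemble_pairs holds one (atom_1_xyz, atom_2_xyz) tuple per structure.
    Returns (shortest distance, longest distance, fraction of structures in
    which the pair lies within the cutoff). The cutoff is an assumption.
    """
    dists = [distance(p, q) for p, q in ensemble_pairs]
    within = sum(1 for d in dists if d <= cutoff)
    return min(dists), max(dists), within / len(dists)


# Hypothetical coordinates of two charged-group atoms in four structures.
pairs = [
    ((0.0, 0.0, 0.0), (3.0, 0.0, 0.0)),
    ((0.0, 0.0, 0.0), (0.0, 6.0, 0.0)),
    ((0.0, 0.0, 0.0), (0.0, 0.0, 12.0)),
    ((0.0, 0.0, 0.0), (0.0, 4.0, 3.0)),
]
shortest, longest, populated = pair_distance_stats(pairs)
```

A wide spread between the shortest and longest distance together with a small populated fraction would correspond to the short-lived charge interactions described for the surface pairs.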

The other missing contact is the hydrogen bond between Gly14 and Val62 (A'B1-EF2 in [30]) which is not formed. However, the residues involved in the two expected interactions otherwise behave consistently with the I set: Asp66 forms a hydrogen bond from its main-chain oxygen to the side-chain hydroxyl of Tyr70 and another hydrogen bond from its side-chain carboxylate group to the backbone amide of Ser65. Gly14, a residue highly conserved in titin, has a 'disallowed' conformation with positive φ and ψ angles, whereas Val62 contributes to the hydrophobic core.

As similar hydrophobic cores lead to similar structures, a superposition of M5 and telokin shows a very high degree of structural homology in spite of the relatively low level of sequence identity (27%) (Fig. 10). However, an interesting difference between M5 and telokin lies in the long loop connecting strands C and D (see Fig. 10). A short C' strand (three residues) is present in telokin in this region. Reconstruction using the crystallographic symmetry shows that crystal packing effects do not occur in this region and could therefore not have induced a more ordered structure. No evidence is found of the C' strand in M5 in any of the NMR parameters (secondary chemical shift, amide exchange rates and NOE pattern). In M5, the long CD loop folds deeply inwards and forms β-turns. However, only one conserved key residue defines this region, that is, position CD1, which has Val82 in telokin and the corresponding Leu40 in M5 (see Fig. 9). A large variability around this region was also found by Harpaz and Chothia [30] in the sequences extracted by their profile. Similar variability in length and sequence is found when aligning M5 to other titin modules [18]. It is therefore likely that the loop can adopt a variety of conformations in different modules. We can generalize these results by suggesting that the C' strand is not a stringent requirement for the identification of the I set and that this fold is characterized by only eight strands.

The large variability of this region might reflect diversified functions of modules in different regions of the titin molecule. Interestingly, this part of the sequence forms the C' and C'' strands in the variable domains of antibodies. The loop between these strands is one of the hypervariable loops (L2) that form the three antigen-binding sites [29]. Because this is strongly related to antibody function (and not to the fold), we speculate that the region between the C and D strands is not important for folding, but is for the very specific function of an individual module. It has previously been reported that the whole C terminus of titin is involved in interactions with two M-line proteins, myomesin and M-protein [5]. The CD loop would therefore be a good candidate for a specific binding site. This observation is also echoed by the recently published data for VCAM-1 [32]. In domain 1 of VCAM-1, all the residues known to be involved in integrin binding are clustered onto the CFG face and, in particular, in the N-terminal part of the CD loop. Interestingly, the architecture of myomesin and M-protein is very similar to that of titin; both proteins contain five consecutive immunoglobulin-like domains at their C termini, located around the centre of the M-line [33].

Biological implications

Titin is one of the most astonishing examples of modular proteins, in terms of the number of modules it contains and the adaptation of modules to completely different functions. Most of the sequence is made up of amino acid repeats of about 100 residues. These have been classified on the basis of sequence analysis as type-I and type-II modules, which belong to the fibronectin type-III and immunoglobulin superfamilies, respectively [17]. The modularity of titin offers the possibility of dissecting the protein into several independent domains of suitable size for structure determination. The study of selected modules and their interfaces may then be related to the structure and function of the overall protein.

In this paper, we have described the structure of one of the immunoglobulin-like domains from the C terminus of titin. Modules from this region bind to M-protein and myomesin, both of which contain immunoglobulin-like domains and contribute to the meshwork of the M-line [5]. The structure we report shows that this module belongs to the I set of the immunoglobulin superfamily fold. Almost all of the key interactions expected in this fold are present, so that comparison with telokin [31], the only other known structure of a muscle immunoglobulin-like domain, leads to 1.65 Å root mean square deviation for the backbone atoms despite the relatively low sequence homology (27% identity and 43% similarity). These values of sequence similarity are well within what has been called the 'twilight region', within which modelling on the basis of sequence homology becomes unreliable [34,35]. However, it is now accepted that similar hydrophobic cores lead to very similar structures in spite of little sequence homology. A low degree of sequence homology but a high conservation of the key residue positions is observed throughout the whole titin family of domains and in most of the other immunoglobulin-like domains in muscle proteins. The highest similarity to M5 is observed in the other titin M-line modules. Our results indicate that M5 may be used as a molecular template to model most, if not all, of the other titin domains. These models may then be used in mutagenesis studies to help identify the diversified functions of the corresponding modules. This approach may help, for example, in mapping the interaction sites of the titin C terminus with other proteins and should ultimately shed light on the role of the titin C terminus in the process of M-line assembly during myogenesis.
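The sequence identity quoted above for M5 and telokin is a count over aligned positions. As a rough illustration of how such a percentage is obtained from a pairwise alignment, the sketch below counts identical residues over the positions where neither sequence has a gap; this choice of denominator is an assumption (other conventions exist), and the function name and the short sequences are hypothetical, not the M5 or telokin sequences.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of identical residues over ungapped aligned positions.

    Both arguments are rows of the same alignment and must have equal length.
    Positions where either row has a gap are skipped (an assumed convention).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return 100.0 * matches / compared


# Hypothetical five-column alignment: three matches over four ungapped columns.
identity = percent_identity("ACDE-", "ACQEW")
```

A similarity percentage would be computed the same way with a test for conservative substitutions in place of strict equality, which requires a substitution grouping that is not specified in the text.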

Materials and methods

Distance determination
Distances for structure calculations were extracted from a 100 ms mixing time nuclear Overhauser enhancement spectroscopy (NOESY) spectrum recorded at 300 K. This mixing time is still in the linear range of the build-up curve (data not shown). NOEs were integrated and calibrated by the AURELIA software using the distances Hε1-Hδ1 and Hε1-Hζ2 from Trp33 as reference distances. This calibration led to distances between inter-strand α- and amide protons in β-sheets entirely consistent with their expected values [36]. In cases of overlap, distances involving amide protons were extracted from 15N three-dimensional NOESY-HSQC (NOESY-heteronuclear single quantum coherence) spectra. For well resolved but ambiguous NOEs involving side chains, assignments were obtained from a three-dimensional TOCSY-NOESY (total correlation spectroscopy-NOESY) experiment. A semi-quantitative calibration of the three-dimensional spectra was assumed for NOEs involving protons with fixed distances (e.g. Hβ Tyr12 to Hγ Gln90). For the others, the largest distance was assumed (i.e. 5.0 Å). Groups of protons with degenerate resonances were given the appropriate distance corrections (0.9 Å, 1.5 Å and 2.5 Å for methylene, methyl and aromatic groups, respectively). Stereospecific assignments were obtained for the β-proton pairs using the HABAS program [37]. Distances to pro-chiral pairs of protons, for which no stereospecific assignments were obtained, were corrected by adding 0.8 Å [38]. Hydrogen bonds were imposed between pairs of residues for which all standard β-sheet inter-strand NOEs were observed and for which the amide proton was stable for at least 30 days in D2O. For each hydrogen bond, two upper limit restraints were used between HN-O (2.4 Å) pairs and N-O (3.4 Å) pairs. A summary of important parameters in the structure calculation is given in Table 1.

Structure calculation
Structure calculation was started with the program DIANA 2.1 [37]. Three cycles of the REDAC strategy [39] were performed giving a total of 221 dihedral angle constraints for the last cycle. Dihedral angle constraints were generated for residues with a target function of <0.4 Å² in at least 15 structures. The best 40 structures obtained from DIANA starting with a total of 500 structures were subsequently refined with the X-PLOR program [40] using a modified simulated-annealing protocol [41] and explicit hydrogen bond terms. After heating to 3000 K for 20 ps, the system was cooled down in 30 ms to 300 K. The temperature was kept constant for 5 ps after which the structure was energy-minimized for 1000 steps. The 16 best structures were averaged by multiple simultaneous superposition using a program kindly provided by Dr R Diamond [42]. Geometric strain from the averaging was removed by 5 ps of molecular dynamics at 300 K followed by 1000 steps of energy minimization. Structures were displayed and analyzed with MIDAS [43], MOLSCRIPT [44] and INSIGHTII [45]. Secondary-structure elements were identified using the program DSSP [46] and supplemented with NMR data (secondary chemical shift, amide exchange rates, NOE patterns). The β-bulge was classified according to [28] whereas β-turns were identified from their backbone φ and ψ angles [47] and hydrogen bonds from DSSP. Solvent accessible surfaces were calculated with X-PLOR using a sphere of radius 1.6 Å.

The coordinates for this structure have been submitted to the Brookhaven Protein Data Bank (entry name 1TNN).

Acknowledgements: The authors would like to thank M Gautel and C Joseph for expression and purification of the sample, C Chothia, Y Harpaz, AM Lesk, AS Politou, S Labeit and B Kolmerer for stimulating discussions, and DJ Thomas for critical reading of the manuscript. The help of M Nilges with X-PLOR and D Grindrod with computers is acknowledged. This work was supported by a grant from the Human Frontiers Science Organisation.

References
1. Doolittle, R.F. & Bork, P. (1993). Evolutionarily mobile modules in proteins. Sci. Am. 269, 50-56.
2. Bork, P., Holm, L. & Sander, C. (1994). The immunoglobulin fold: structural classification, sequence patterns and common core. J. Mol. Biol. 242, 309-320.
3. Chothia, C. & Lesk, A.M. (1982). Evolution of proteins formed by beta-sheets. I. Plastocyanin and azurin. J. Mol. Biol. 160, 309-323.
4. Yoshikai, S. & Ikebe, M. (1992). Molecular cloning of the chicken gizzard telokin gene and cDNA. Arch. Biochem. Biophys. 299, 242-247.
5. Vinkemeier, U., Obermann, W., Weber, K. & Fürst, D.O. (1993). The globular head domain of titin extends into the center of the sarcomeric M band. cDNA cloning, epitope mapping and immunoelectron microscopy of two titin associated proteins. J. Cell Sci. 106, 319-330.
6. Vaughan, K.T., Weber, F.E., Einheber, S. & Fischman, D.A. (1993). Molecular cloning of chicken myosin-binding protein (MyBP) H (86-kDa protein) reveals extensive homology with MyBP-C (C-protein) with conserved immunoglobulin C2 and fibronectin type III motifs. J. Biol. Chem. 268, 3670-3676.
7. Einheber, S. & Fischman, D. (1990). Isolation and characterization of a cDNA clone encoding avian skeletal muscle C-protein: an intracellular member of the immunoglobulin superfamily. Proc. Natl. Acad. Sci. USA 87, 2157-2161.
8. Fürst, D.O., Vinkemeier, U. & Weber, K. (1992). Mammalian skeletal muscle C-protein: purification from bovine muscle, binding to titin and the characterization of a full-length human cDNA. J. Cell Sci. 102, 769-778.
9. Vaughan, K.T., Weber, F.E. & Fischman, D.A. (1992). Immunoglobulin domains in muscle proteins. Symp. Soc. Exp. Biol. 46, 167-177.
10. Lakey, A., et al., & Bullard, B. (1993). Kettin, a large modular protein in the Z-disc of insect muscles. EMBO J. 12, 2863-2871.
11. Maruyama, K. (1994). Connectin, an elastic protein of striated muscle. Biophys. Chem. 50, 73-85.
12. Trinick, J. (1994). Titin and nebulin: protein rulers in muscle? Trends Biochem. Sci. 19, 405-408.
13. Maruyama, K., Kimura, S., Yoshidomi, H., Sawada, H. & Kikuchi, K. (1981). Connectin, an elastic protein of muscle. Identification of "titin" with "connectin". J. Biochem. 89, 701-709.
14. Trinick, J., Knight, P. & Whiting, A.K. (1984). Purification and properties of native titin. J. Mol. Biol. 180, 331-356.
15. Fürst, D.O., Osborn, M., Nave, R. & Weber, K. (1988). The organisation of titin filaments in the half sarcomere revealed by monoclonal antibodies in immunoelectron microscopy. A map of ten nonrepetitive epitopes starting at the Z-disk extends close to the M-line. J. Cell Biol. 106, 1563-1572.
16. Whiting, A.J., Wardale, J. & Trinick, J. (1989). Does titin regulate the length of muscle thick filaments? J. Mol. Biol. 205, 263-268.
17. Labeit, S., et al., & Trinick, J. (1990). A regular pattern of two types of 100-residue motif in the sequence of titin. Nature 345, 273-276.
18. Gautel, M., Leonhard, K. & Labeit, S. (1993). Phosphorylation of KSP motifs in the C-terminal region of titin in differentiating myoblasts. EMBO J. 10, 3827-3834.
19. Maruyama, K., Yoshioka, H., Higuchi, K., Ohashi, S., Kimura, S. & Natori, R. (1985). Connectin filaments line thick filaments in frog skeletal muscles revealed by immunoelectron microscopy. J. Cell Biol. 101, 2167-2174.
20. Nave, R., Fürst, D.O. & Weber, K. (1989). Visualization of the polarity of isolated titin molecules: a single globular head on a long thin rod as the M-band anchoring domain? J. Cell Biol. 109, 2177-2187.
21. Soteriu, A., Gamage, M. & Trinick, J. (1993). A survey of interactions made by the giant protein titin. J. Cell Sci. 14, 123-129.
22. Itoh, Y., et al., & Maruyama, K. (1988). Extensible and less extensible domains of connectin filaments in skeletal vertebrate muscle sarcomere as detected by immunofluorescence and immunoelectron microscopy with monoclonal antibodies. J. Biochem. 104, 504-511.
23. Politou, A.S., Gautel, M., Pfuhl, M., Labeit, S. & Pastore, A. (1994). Immunoglobulin-type domains of titin: same fold, different stability? Biochemistry 33, 4730-4737.

24. Politou, A.S., Gautel, M., Joseph, C. & Pastore, A. (1994). Immunoglobulin-type domains of titin are stabilized by amino-terminal extension. FEBS Lett. 352, 27-31.
25. Pfuhl, M., Gautel, M., Politou, A.S., Joseph, C. & Pastore, A. (1995). Secondary structure determination by NMR spectroscopy of an immunoglobulin domain of the giant muscle protein titin. J. Biomol. NMR 5, in press.
26. LeGrice, S.F.J. & Grueninger-Leitch, F. (1990). Rapid purification of homodimer and heterodimer HIV-1 reverse transcriptase by metal chelating affinity chromatography. Eur. J. Biochem. 87, 307-314.
27. Wüthrich, K. (1986). NMR of Proteins and Nucleic Acids. Wiley Interscience, New York.
28. Chan, A.W.E., Hutchinson, E.G., Harris, D. & Thornton, J.M. (1993). Identification, classification and analysis of beta-bulges in proteins. Protein Sci. 2, 1574-1581.
29. Kabat, E.A., Wu, T.T., Bilofsky, H., Reid-Milner, M. & Perry, H. (1976). Variable regions of immunoglobulin chains: Tabulations and analysis of amino acid sequences. Beranek & Newman, Cambridge, MA.
30. Harpaz, Y. & Chothia, C. (1994). Many of the immunoglobulin superfamily domains in cell adhesion molecules and surface receptors belong to a new structural set which is close to that containing variable domains. J. Mol. Biol. 238, 528-539.
31. Holden, H.M., Ito, M., Hartshorne, D.J. & Rayment, I. (1992). X-ray structure determination of telokin, the C-terminal domain of myosin light chain kinase, at 2.8 Å resolution. J. Mol. Biol. 227, 840-851.
32. Jones, E.Y., et al., & Stuart, D.I. (1995). Crystal structure of an integrin-binding fragment of vascular cell adhesion molecule-1 at 1.8 Å resolution. Nature 373, 539-544.
33. Chothia, C., Novotny, J., Bruccoleri, R. & Karplus, M. (1985). Domain association in immunoglobulin molecules. The packing of variable domains. J. Mol. Biol. 186, 651-663.
34. Doolittle, R.F. (1981). Similar amino-acid sequences: chance or common ancestry? Science 214, 149-159.
35. Pastore, A. & Lesk, A. (1991). Brave new proteins: what evolution reveals about protein structure. Curr. Opin. Biotechnol. 2, 592-598.
36. Billeter, M., Braun, W. & Wüthrich, K. (1981). Sequential resonance assignments in protein 1H nuclear magnetic resonance spectra. J. Mol. Biol. 102, 321-345.
37. Güntert, P., Braun, W. & Wüthrich, K. (1991). Efficient computation of three-dimensional protein structures in solution from nuclear magnetic resonance data using the program DIANA and the supporting programs CALIBA, HABAS and GLOMSA. J. Mol. Biol. 217, 517-530.
38. Wüthrich, K. & Braun, M.B.W. (1983). Pseudo-structures for the 20 common amino acids for use in studies of protein conformations by measurements of intramolecular proton-proton distance constraints with nuclear resonance. J. Mol. Biol. 169, 949-961.
39. Güntert, P. & Wüthrich, K. (1991). Improved efficiency of protein structure calculations from NMR data using the program DIANA with redundant dihedral angle constraints. J. Biomol. NMR 1, 447-456.
40. Brünger, A.T. (1993). X-PLOR: A system for X-ray Crystallography and NMR, Version 3.1. Yale University, New Haven, CT.
41. Nilges, M., Clore, G.M. & Gronenborn, A.M. (1988). Determination of three-dimensional structures of proteins from interproton distance data by dynamical simulated annealing from a random array of atoms. FEBS Lett. 239, 129-136.
42. Diamond, R. (1992). On the multiple simultaneous superposition of molecular structures by rigid body transformation. Proteins 1, 1279-1283.
43. Ferrin, T.E., Huang, C.C., Jarvis, L.E. & Langridge, R. (1988). The MIDAS display system. J. Mol. Graphics 6, 13-22.
44. Kraulis, P.J. (1991). MOLSCRIPT: a program to produce both detailed and schematic plots of protein structure. J. Appl. Crystallogr. 24, 251-259.
45. Dayringer, H.E., Tramontano, A., Sprang, S. & Fletterick, R.J. (1986). INSIGHT an interactive molecular graphics package. J. Mol. Graphics 4, 82-87.
46. Kabsch, W. & Sander, C. (1983). Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22, 2577-2637.
47. Wilmot, C.M. & Thornton, J.M. (1988). Analysis and prediction of the different types of β-turns in proteins. J. Mol. Biol. 203, 221-232.

Received: 1 Feb 1995; revisions requested: 21 Feb 1995; revisions received: 2 Mar 1995. Accepted: 3 Mar 1995.
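
Illustrative note on the distance calibration. The calibration described under 'Distance determination' relies on the NOE intensity scaling with the inverse sixth power of the interproton distance while the mixing time is in the linear build-up range. The following Python lines are a minimal sketch of that relation and of the additive corrections quoted in the text. They are not the AURELIA procedure, whose internals are not described here. The function names, the intensities and the 2.8 Å reference distance are hypothetical placeholders chosen only for illustration.

```python
# Additive corrections quoted in the text (angstroms). The dictionary keys are
# hypothetical labels introduced for this sketch.
CORRECTIONS = {
    "methylene": 0.9,             # degenerate methylene protons
    "methyl": 1.5,                # degenerate methyl protons
    "aromatic": 2.5,              # degenerate aromatic ring protons
    "prochiral_unassigned": 0.8,  # prochiral pair without stereospecific assignment
}


def noe_distance(intensity, ref_intensity, ref_distance):
    """Distance implied by an NOE intensity, assuming intensity ~ r**-6.

    ref_intensity and ref_distance describe a proton pair of known, fixed
    separation measured in the same spectrum.
    """
    if intensity <= 0 or ref_intensity <= 0:
        raise ValueError("intensities must be positive")
    return ref_distance * (ref_intensity / intensity) ** (1.0 / 6.0)


def upper_limit(intensity, ref_intensity, ref_distance, correction=None):
    """Calibrated distance plus an optional additive correction from CORRECTIONS."""
    distance = noe_distance(intensity, ref_intensity, ref_distance)
    if correction is not None:
        distance += CORRECTIONS[correction]
    return distance


if __name__ == "__main__":
    # Hypothetical numbers: a reference cross peak of intensity 100 at 2.8 A.
    print(round(noe_distance(25.0, 100.0, 2.8), 2))
    print(round(upper_limit(25.0, 100.0, 2.8, "methyl"), 2))
```

Because of the sixth-root dependence, a 64-fold drop in intensity only doubles the derived distance, which is why a fixed intraresidue pair can serve as a reference across the useful distance range.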