Silk Fibroin: Structural Implications of a Remarkable Amino Acid Sequence
PROTEINS: Structure, Function, and Genetics 44:119–122 (2001)

Cong-Zhao Zhou,¹,²,³ Fabrice Confalonieri,¹ Michel Jacquet,¹ Roland Perasso,² Zhen-Gang Li,³ and Joël Janin⁴*

¹Institut de Génétique et Microbiologie, Université Paris-Sud et CNRS, Orsay Cedex, France
²Laboratoire de Biologie Cellulaire 4, Université Paris-Sud et CNRS, Orsay Cedex, France
³Department of Biology, University of Science and Technology of China, Hefei, Anhui, People's Republic of China
⁴Laboratoire d'Enzymologie et Biochimie Structurales, CNRS, Gif-sur-Yvette, France

Grant sponsor: Programme des Recherches Avancées Franco-Chinois; Grant number: BT97-05.
*Correspondence to: Joël Janin, LEBS-CNRS, 91198 Gif-sur-Yvette.
Received 3 November 2000; Accepted 5 March 2001
Published online 00 Month 2001
© 2001 Wiley-Liss, Inc.

ABSTRACT

The amino acid sequence of the heavy chain of Bombyx mori silk fibroin was derived from the gene sequence. The 5,263-residue (391-kDa) polypeptide chain comprises 12 low-complexity "crystalline" domains made up of Gly–X repeats and covering 94% of the sequence; X is Ala in 65%, Ser in 23%, and Tyr in 9% of the repeats. The remainder includes a nonrepetitive 151-residue header sequence, 11 nearly identical copies of a 43-residue spacer sequence, and a 58-residue C-terminal sequence. The header sequence is homologous to the N-terminal sequence of other fibroins with a completely different crystalline region. In Bombyx mori, each crystalline domain is made up of subdomains of ~70 residues, which in most cases begin with repeats of the GAGAGS hexapeptide and terminate with the GAAS tetrapeptide. Within the subdomains, the Gly–X alternation is strict, which strongly supports the classic Pauling–Corey model, in which β-sheets pack on each other in alternating layers of Gly/Gly and X/X contacts. When fitting the actual sequence to that model, we propose that each subdomain forms a β-strand and each crystalline domain a two-layered β-sandwich, and we suggest that the β-sheets may be parallel, rather than antiparallel, as has been assumed up to now. Proteins 2001;44:119–122.

Key words: Bombyx mori; low-complexity sequences; β-sheet packing; fiber structure

INTRODUCTION

The long and robust protein fiber that makes up the cocoon of the Bombyx mori caterpillar has been used by man for more than 3,000 years to produce high-quality thread and cloth. Their remarkable properties originate in the physical chemistry of silk fibroin, the protein that constitutes almost all the fiber. Fibroin is synthesized in large quantity by the silk gland of the caterpillar as two polypeptide chains linked by a disulfide bridge. The larger heavy chain is glycine rich, and most of its sequence is a repeat of Gly–Ala/Ser dipeptides. The silk fiber has been known since as early as 1913 to diffract x-rays.¹ Its diffraction pattern is characteristic of a pleated β-sheet, and it helped Pauling and Corey in defining this type of secondary structure.² The Pauling–Corey model of silk fibroin³ is revisited here within the context of the recently determined protein sequence of the fibroin heavy chain, deduced from that of the genomic DNA.⁴

SEQUENCE AND COMPOSITION OF THE FIBROIN HEAVY CHAIN

After the early studies mentioned above and reviewed by Lucas et al.,¹ Bombyx silk fibroin received little attention from biochemists. The sequences of the 26-kDa light chain, of a short internal fragment, and of an 85-residue C-terminal sequence of the heavy chain were determined.⁵⁻⁷ The light chain, which is linked to the heavy chain by a single disulfide bridge, has a standard amino acid composition and a nonrepetitive sequence. It plays only a marginal role in the fiber.

The complete amino acid sequence of the heavy chain has now been deduced from that of an 80-kbp fragment of Bombyx mori genomic DNA comprising the 17-kbp fibroin gene (accession codes GenBank AF226688, EMBL P05790).⁴ The gene codes for a polypeptide chain of 5,263 residues with a molecular weight of 391 kDa, composed of 45.9% glycine, 30.3% alanine, 12.1% serine, 5.3% tyrosine, 1.8% valine, and only 4.7% of the other 15 amino acid types. Most of the sequence is low-complexity and forms 2,377 repeats of a Gly–X (GX) dipeptide motif. The GX repeat is the building block of the β-sheets in the Pauling–Corey model and of the whole fiber. The sequence contains very long stretches of GX repeats that must constitute the x-ray diffracting structure, often called the "crystalline" component of silk fibroin. Residue X is Ala in 64% of the repeats, Ser in 22%, Tyr in 10%, Val in 3%, and Thr in 1.3%. Essentially none of the other 14 amino acid types is present in the repeats. In 2% of the dipeptides, the first position is an alanine instead of a glycine. This occurs almost exclusively in the GAAS tetrapeptide, which is repeated 41 times. Moreover, two hexapeptides, s = GAGAGS, in 433 copies, and y = GAGAGY, in 120 copies, account for 70% of the low-complexity region. Their abundance was known from partial sequences, and it was suggested that fibroin might be made up entirely of these hexapeptides.⁶,⁸ Figure 1 shows that this is incorrect in spite of the hexapeptide abundance, and that fibroin has a more elaborate primary structure, as discussed below.

THE NONREPETITIVE OR "AMORPHOUS" SEGMENTS

The GX repeats that form the bulk of the polypeptide chain are distributed among 12 domains separated by short linkers (Fig. 1). In contrast to the domains, the N-terminal 151 residues, C-terminal 50 residues, and the 42–43 residue linkers between domains are nonrepetitive and "amorphous," as opposed to the "crystalline" domains. The N-terminal segment or header has a standard amino acid composition and may constitute an independent globular unit, possibly with some α-helix as well as a β-sheet. Related sequences (33–38% identity) are found at the N-terminus of the fibroins of the moths Galleria mellonella and Antheraea pernyi (Fig. 2). The cocoon of the latter is used in Asia to produce tussah silk. Antheraea fibroin is otherwise completely different from that of Bombyx. It yields a different type of fiber, and the bulk of its 2,639-residue sequence (EMBL O76786), although low-complexity, is more alanine-rich than glycine-rich.⁹ A BLAST search against the Drosophila genome also suggests the presence of a segment related to these N-terminal sequences in the CG18026 gene product, but the score is much lower (27% identity over 136 residues), and the protein function is unknown.

The 58-residue C-terminal segment of the Bombyx heavy chain is arginine/lysine-rich and depleted in hydrophobic residues, which does not suggest a globular fold. It contains three cysteine residues involved in two disulfide bonds,⁷ one with the light chain, the other internal. The 11 linker segments connecting the crystalline domains have nearly identical sequences, including a 25-residue nonrepetitive peptide (boldface characters in Fig. 1), also present in the header sequence in a truncated version. The peptide breaks the GX alternation and terminates the crystalline domains. It has a proline, charged residues, and the only tryptophan residue found in fibroin. Charged residues are entirely absent from the crystalline domains.

PUNCTUATIONS IN THE CRYSTALLINE REGION

The 12 crystalline domains are labeled GX1–GX12 in Figure 1. Their average length is 413 residues, omitting GX12, which is much shorter (37 residues). Within a domain, the GX alternation is perturbed only by the occasional presence of a GAAS tetrapeptide, and, in domain GX9, by a single residue

[...] collaborators proposed that the fiber is made of antiparallel β-sheets packing on top of each other.³ The β-strands extend along the fiber axis, yielding a 7.0-Å axial repeat containing two peptide units. There is also a ≈9.5-Å repeat across the fiber in the diffraction pattern. It is interpreted as representing, in one direction, twice the spacing between β-strands in an antiparallel β-sheet, and in the orthogonal direction, twice the spacing between packed β-sheets. The Pauling–Corey model, elaborated upon by Crick and Kendrew,¹⁰ takes note of the fact that, in a β-strand made of Gly–Ala/Ser repeats, all the nonglycine residues have their side-chains on the same face of the β-sheet. The β-sheets may then pack on each other alternately by their glycine face and by their Ala/Ser face. The first packing yields a short 3.5–3.9 Å spacing, the second a longer one of 5.3–5.7 Å. This type of packing was later observed in crystalline poly-(Ala–Gly).¹¹

The Pauling–Corey model was challenged in a recent study of the Bombyx silk fiber pattern. In an attempt to interpret the diffracted intensities quantitatively, Takahashi et al.¹² test four ways of assembling β-strands and packing β-sheets. Adjacent strands within a β-sheet may either be parallel or antiparallel, and their side-chains may either all point in the same direction (polar mode) or alternately up and down (antipolar mode). With this convention, the Pauling–Corey model is polar–antiparallel. It yields the best fit to a set of 26 diffracted intensities, assuming the fibroin sequence to be just poly-(Ala–Gly). Nevertheless, the authors go on to consider effects of

Fig. 1. The silk fibroin heavy chain. The 5,263-residue polypeptide chain (EMBL P05790) is broken into domains and subdomains. The sequence number of the first residue and the length l of each domain are given in parentheses. The 25-residue motif in boldface characters is repeated between the header and the linkers. A one-residue (or three-residue) insertion in subdomain GX9.4 is also in boldface characters. Lowercase letters s, y, a, and u represent frequently observed hexapeptides. Hexapeptide code and number of copies:
s  GAGAGS  433
y  GAGAGY  120
a  GAGAGA  27
u  GAGYGA  39

Fig. 2. The header sequence. Secondary structure prediction by PHD and an alignment of the N-terminal sequences of silk fibroins from Bombyx mori (EMBL P05790), Galleria mellonella (EMBL AF095239), and Antheraea pernyi (EMBL O76786). E, extended; H, helical.
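
As an illustration of how counts such as those in the Fig. 1 legend might be tallied, the following Python sketch scans a sequence for the four hexapeptides and for Gly–X dipeptides. It is not the authors' procedure. The greedy, non-overlapping left-to-right scan, the function names count_hexapeptides and gx_tally, and the short toy sequence are all assumptions made for this example; a real analysis would use the 5,263-residue sequence (EMBL P05790).

```python
from collections import Counter

# Hexapeptides named in the Fig. 1 legend.
MOTIFS = ("GAGAGS", "GAGAGY", "GAGAGA", "GAGYGA")


def count_hexapeptides(seq, motifs=MOTIFS):
    """Greedy, non-overlapping, left-to-right count of each hexapeptide
    (an assumed counting convention, not taken from the paper)."""
    counts = Counter({m: 0 for m in motifs})
    i = 0
    while i + 6 <= len(seq):
        window = seq[i:i + 6]
        if window in motifs:
            counts[window] += 1
            i += 6  # consume the whole hexapeptide
        else:
            i += 1  # slide by one residue and try again
    return counts


def gx_tally(seq):
    """Tally residue X over Gly-X dipeptides found by a greedy scan."""
    tally = Counter()
    i = 0
    while i + 1 < len(seq):
        if seq[i] == "G":
            tally[seq[i + 1]] += 1
            i += 2  # consume the dipeptide
        else:
            i += 1  # e.g. the second Ala of GAAS breaks the frame
    return tally


# Toy sequence (an assumption for illustration, NOT the fibroin sequence):
# three GAGAGS, one GAGAGY, one GAGAGA, then a terminating GAAS.
TOY = "GAGAGS" * 3 + "GAGAGY" + "GAGAGA" + "GAAS"

if __name__ == "__main__":
    hexa = count_hexapeptides(TOY)
    gx = gx_tally(TOY)
    total = sum(gx.values())
    for motif in MOTIFS:
        print(motif, hexa[motif])
    for residue, n in gx.most_common():
        print(f"X = {residue}: {n} of {total} Gly-X repeats "
              f"({100 * n / total:.0f}%)")
```

On the real sequence, a different framing convention (for instance, allowing overlapping matches) would give somewhat different totals, so the figures quoted in the text should be taken from the paper rather than from this sketch.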