Relating Sequence Encoded Information to Form and Function of Intrinsically Disordered Proteins
Rahul K Das, Kiersten M Ruff and Rohit V Pappu

Intrinsically disordered proteins (IDPs) showcase the importance of conformational plasticity and heterogeneity in protein function. We summarize recent advances that connect information encoded in IDP sequences to their conformational properties and functions. We focus on insights obtained through a combination of atomistic simulations and biophysical measurements that are synthesized into a coherent framework using polymer physics theories.

Address
Department of Biomedical Engineering and Center for Biological Systems Engineering, Washington University in St. Louis, One Brookings Drive, Campus Box 1097, St. Louis, MO 63130, USA

Corresponding author: Pappu, Rohit V ([email protected])

Current Opinion in Structural Biology 2015, 32:102–112
This review comes from a themed issue on Sequences and topology
Edited by M Madan Babu and Anna R Panchenko
For a complete overview see the Issue and the Editorial
Available online 2nd April 2015
http://dx.doi.org/10.1016/j.sbi.2015.03.008
0959-440X/© 2015 Elsevier Ltd. All rights reserved.

Introduction
Protein domains are modular building blocks of macromolecular complexes and interaction networks [1]. The concept of domains can be generalized to include sequence regions that fail to fold as autonomous units [2]. These intrinsically disordered regions/proteins, referred to collectively hereafter as IDPs, are distinct from structured domains. Their sequences encode an intrinsic inability to fold into singular well-defined three-dimensional structures [3,4,5–7] although some IDPs do fold into well-ordered structures in the context of functional complexes. IDPs are implicated in important cellular processes that include cell division [8,9], cell signaling [3,10], intracellular transport [11,12], bacterial translocation [13], cell mechanics [14,15], protein degradation [16,17], posttranscriptional regulation [18], and cell cycle control [19].

IDPs can be classified into distinct conformational classes based on their amino acid compositions [20–41]. We summarize recent results that have identified composition-to-conformation relationships (CCRs) through studies of archetypal IDPs. CCRs enable the assignments of conformational descriptors and inferences regarding the amplitudes of conformational fluctuations of IDPs. These insights are relevant because amino acid compositions are often well conserved among orthologs of IDPs even if their sequences are poorly conserved [42,43].

Compositional classes of IDPs
Amino acid compositions of IDPs are characterized by distinct biases [5]. They are deficient in canonical hydrophobic residues and enriched in polar and charged residues. Accordingly, IDPs fall into three distinct compositional classes that reflect the fraction of charged versus polar residues. The distinct classes are polar tracts, polyampholytes, and polyelectrolytes [41] (see Figure 1). Polar tracts are deficient in charged, hydrophobic, and proline residues. They are enriched in polar amino acids such as Asn, Gly, Gln, His, Ser, and Thr. Polyampholytes and polyelectrolytes can either be weak or strong depending on the fraction of charged residues (FCR) that is quantified as the sum of f+ and f− (see Figure 2). The latter two parameters quantify the fraction of positive and negatively charged residues in an IDP sequence. Polyelectrolytes have an excess of one type of charge, that is, f+ > f− or vice versa. Polyampholytes have roughly equivalent fractions of opposite charges, that is, f+ ≈ f−. The designation of weak versus strong polyampholytes/polyelectrolytes is governed by the value of FCR. In strong polyampholytes/polyelectrolytes, the high FCR values encode an intrinsic tendency for populating expanded coil-like conformations because charged residues prefer to be solvated in aqueous milieus.

A formal language for describing conformational preferences of IDPs
Ensembles of conformations as opposed to singular representative structures are appropriate for describing IDPs. The balance between solvent-mediated intra-chain attractions versus repulsions determines the types of conformations that make up the ensemble that is thermodynamically accessible to an IDP sequence. When attractions dominate, the conformations in the ensemble are, on average, compact and spherical, that is, globular. Conversely, if intra-chain repulsions dominate over attractions or, stated differently, chain solvation is preferred over desolvation, then the conformations are, on average, expanded, prolate ellipsoidal, and coil-like. An intermediate scenario results if the strengths of intra-chain solvent mediated repulsions are counterbalanced by equivalent attractive interactions. Under such circumstances, the ensembles are characterized by maximal conformational heterogeneity and compact, semi-compact, expanded, and chimeric conformations become thermodynamically accessible [41]. Typical heteropolymeric IDP sequences can sample conformations that are chimeras of globules, coils, rods, and semi-compact hairpins. The preference is governed by the region-specific amino acid compositions along the linear sequence.

Polymer physics theories provide access to formal descriptors of conformational ensembles for heterogeneous systems such as IDPs and these have been reviewed recently [41,44]. Analytical relationships predict the scaling of parameters such as radii of gyration, mean end-to-end distances, and hydrodynamic radii as functions of chain length, amino acid composition, and intrinsic stiffness. Analytical relations are also available to relate the scaling of inter-residue distances to the linear se-

also classify the sequence-specific conformational properties by quantifying the amplitudes of conformational fluctuations [39]. All of these classifiers and descriptors rely on comparisons of measured or calculated values of conformational fluctuations to expectations from analytical theories for flexible polymers in different types of solvents. Figure 3 summarizes the typical workflow that leads from analysis of results from computer simulations or in vitro experiments to quantitative inferences regarding CCRs and/or sequence-to-conformation relationships (SCRs).

Distinct compositional classes can be mapped to distinct conformational classes
Results from atomistic simulations obtained using explicit representations of solvent molecules [45,46] and studies based on fluorescence correlation spectroscopy

Figure 1
Polar tracts
PolyQ: …QQQQQQQQQ…QQQQQQQQQQ…
Sup35: …SNQGNNQQNYQQYSQNGNQQ…
EcSSB: …QGGGAPAGGNIGGGQPQGGW…
Nup42: …TSPFGSLQQNASQNASSTSS…
Polyampholytes (panel labeled from weak to strong)
Nup60: …NAYKSENAPSASSKEFNFTN…
PfSSB: …FMPLNSNDKIIEDKEFTDRL…
Nsp1: …AFSFGAKSDEKKDGDASKPA…
PQBP1: …YDKVDRERERDRERDRDRGY…
Polyelectrolytes (panel labeled from weak to strong)
PRM2: …ACYPVNIRARGLGKNMGMKS…
PDE6G: …DITVICPWEAFNHLELHELA…
NP1: …RARSRGRSVRRRRRGRSPGR…
RAG2: …SFDGDDEFDTYNEDDEDDES…
Residue color key: hydrophobic, polar, proline, positive, negative.

Definitions of polar tracts, polyelectrolytes, and polyampholytes. Polar tracts shown here include polyQ (UniProt ID: P42858): Polyglutamine tracts are found in at least ten proteins associated with human neurodegenerative disorders including Huntington’s disease; Sup35 (UniProt ID: P05453): Residues 4–23 of S. cerevisiae Sup35 corresponding to a region of the N-terminal prion domain; EcSSB (UniProt ID: P0AGE0): Residues 117–136 of E. coli single stranded DNA binding protein corresponding to a region of the C-terminal tail; Nup42 (UniProt ID: P49686): Residues 181–200 of S. cerevisiae nucleoporin Nup42 corresponding to a region of the FG domain, which modulates gating of the nuclear pore complex. Polyampholytes shown here include: Nup60 (UniProt ID: P39705): Residues 412–431 of S. cerevisiae nucleoporin Nup60 corresponding to a region of the FG domain which modulates gating of the nuclear pore complex; PfSSB (UniProt ID: Q8I415): Residues 232–251 of P. falciparum single stranded DNA binding protein corresponding to a region of the C-terminal tail; Nsp1 (UniProt ID: P14907): Residues 359–378 of S. cerevisiae nucleoporin Nsp1 corresponding to a region of the FG domain which modulates gating of the nuclear pore complex; PQBP1 (UniProt ID: O60828): Residues 146–165 of H. sapiens polyglutamine-tract binding protein 1 corresponding to a region of the expanded linker, which connects the N-terminal WW domain and the C-terminal U5 15 kDa binding region. Polyelectrolytes shown here include: PRM2 (UniProt ID: Q9EP54): Residues 2–21 of the C. griseus DNA packaging protein protamine 2, which is involved in the chromatin condensation process during spermatogenesis [6]; PDE6G (UniProt ID: P18545): Residues 63–82 of H. sapiens retinal rod rhodopsin-sensitive cGMP 3′,5′-cyclic phosphodiesterase subunit gamma protein, which is involved in processing visual signal; NP1 (UniProt ID: O13030): Residues 5–24 of C. pyrrhogaster protamine 1 which is involved in the chromatin condensation process during spermatogenesis; RAG2 (UniProt ID: P21784): Residues 392–411 of C. griseus V(D)J recombination-activating protein 2 corresponding to a region of the ‘acidic hinge’ which modulates DNA repair mechanisms.
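The charge-based quantities defined under "Compositional classes of IDPs" (f+, f−, and FCR as their sum) can be illustrated with a few of the 20-residue fragments printed in Figure 1. The following is a minimal Python sketch, not the authors' method. It assumes that Lys and Arg count as positive, Asp and Glu count as negative, and His is uncharged (the text lists His among the polar residues). The names `charge_profile`, `ChargeProfile`, and `FIGURE_1_FRAGMENTS` are hypothetical, and no weak/strong cutoff is applied because the text does not state one.

```python
from dataclasses import dataclass

# Assumption: K and R are positive, D and E are negative, His is uncharged.
POSITIVE = frozenset("KR")
NEGATIVE = frozenset("DE")


@dataclass(frozen=True)
class ChargeProfile:
    f_plus: float   # fraction of positively charged residues (f+)
    f_minus: float  # fraction of negatively charged residues (f-)

    @property
    def fcr(self) -> float:
        """Fraction of charged residues, FCR = f+ + f-."""
        return self.f_plus + self.f_minus

    @property
    def net_charge_fraction(self) -> float:
        """f+ - f-: near zero when f+ ~ f-, far from zero when one charge is in excess."""
        return self.f_plus - self.f_minus


def charge_profile(sequence: str) -> ChargeProfile:
    """Compute f+ and f- for a one-letter amino acid sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    n = len(seq)
    n_pos = sum(res in POSITIVE for res in seq)
    n_neg = sum(res in NEGATIVE for res in seq)
    return ChargeProfile(f_plus=n_pos / n, f_minus=n_neg / n)


# Fragments copied from Figure 1, with extraction spaces removed.
FIGURE_1_FRAGMENTS = {
    "Sup35": "SNQGNNQQNYQQYSQNGNQQ",   # shown as a polar tract
    "PQBP1": "YDKVDRERERDRERDRDRGY",   # shown as a polyampholyte
    "NP1": "RARSRGRSVRRRRRGRSPGR",     # shown as a polyelectrolyte
    "RAG2": "SFDGDDEFDTYNEDDEDDES",    # shown as a polyelectrolyte
}

if __name__ == "__main__":
    for name, seq in FIGURE_1_FRAGMENTS.items():
        p = charge_profile(seq)
        print(f"{name}: f+={p.f_plus:.2f} f-={p.f_minus:.2f} "
              f"FCR={p.fcr:.2f} f+ - f-={p.net_charge_fraction:+.2f}")
```

Under these assumptions, the Sup35 fragment contains no charged residues, the PQBP1 fragment has equal fractions of positive and negative residues, and the NP1 and RAG2 fragments each carry only one type of charge, which is consistent with the classes assigned to them in the Figure 1 caption.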