Sequences and Topology: Disorder, Modularity, and Post/Pre Translation Modiﬁcation

COSTBI-1118; NO. OF PAGES 3 Available online at www.sciencedirect.com Sequences and topology: disorder, modularity, and post/pre translation modification Editorial overview Julian Gough and A Keith Dunker Current Opinion in Structural Biology 2013, 23:xx– yy This review comes from a themed issue on Se- quences and topology 0959-440X/$ – see front matter, # 2013 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.sbi.2013.05.001 Julian Gough In this collection of reviews we bring together discussions of intrinsically Department of Computer Science, University disordered proteins and their functional mechanisms operating at different of Bristol, Merchant Venturers Building, loci across the ordered-to-disordered spectrum. Our previous editorial [1] for Woodland Road, Bristol BS8 1UB, UK sequences and topology in 2011 contrasts and compares structured and e-mail: [email protected] disordered proteins but viewed in light of their evolution. Here we move forward, focusing on function across the spectrum of disorder, but still within Julian Gough began his research career in structural bioinformatics, but soon combined the context of evolution. Disordered regions can also participate as modules this with genomics and sequence analysis to in multi-domain proteins; the reviews herein cover both structured [2] and study evolution, also producing the disordered modularity [3]. A growing cohort of researchers suggest that we SUPERFAMILY database of structural need to improve the connections between structure, disorder and their domain assignments to genomes. Gough functions, and several reviews present computational and experimental continues to examine molecular evolution methodologies for doing so. A running theme is the importance to function across all species and applies bioinformatics at multiple levels; most recently to intrinsic of modifications, both post and pre-translation. protein disorder, function and phenotype prediction, and to studying the determinants We begin at the beginning, with the Holt review describing one of the first of cellular identity. proteins to be recognized as disordered but functional back in the early 1950s, but for which the picture continues to become clearer in recent high A Keith Dunker Center for Computational Biology and profile publications. Casein is a protein that functions, when modified by Bioinformatics, Department of Biochemistry phosphorylation, at the unfolded end of the disordered-to-ordered spectrum. and Molecular Biology, Indiana University Holt tells the remarkable story of how this protein enables the co-existence School of Medicine, Indianapolis, IN 46202, of calcium in both hard and soft forms in the body, for example, in both milk USA and bone. e-mail: [email protected] A Keith Dunker, after a first career focusing For the various caseins and the other similarly acting IDPs, the basic idea is on the study of virus and phage structure and that the side-chain attached phosphates bind at the phosphate positions on assembly using biophysical methods, the lattices at the surfaces of calcium-phosphate nanoclusters, which are switched to a second career focusing on the precursors to the crystallization of calcium phosphate hydroxyapatite. study of intrinsically disordered proteins Because the caseins are disordered they can readily adjust their shapes to (IDPs) using bioinformatics and fit onto the irregular surfaces and even bend over the edges of the computational biology approaches. These latter studies suggest that IDPs and IDP nanoclusters and thus surround them. Multiple casein molecules join these regions comprise 30% to 50% of the amino clusters as needed to completely engulf the nanoclusters, and a single casein acids of various eukaryotic proteomes, and can bind to more than one nanocluster via multiple phosphate modifications. that IDPs and IDP regions are being found to The endpoint structures are ‘casein micelles’ or nanocluter-stabilized carry out an ever-widening array of crucial soluble aggregates of casein containing multiple nanoclusters. Overall, biological functions centered on signaling the various caseins enable milk to be supersaturated with calcium phos- and regulation. phate, this providing important nutrition for the growth of bones and pre- emergent teeth in newborn mammals of all kinds. The various body fluids in both mammals and other chordates contain other IDPs that likewise enable the accumulation of calcium phosphate to supersaturation levels via www.sciencedirect.com Current Opinion in Structural Biology 2013, 23:1–3 Please cite this article in press as: Gough J, Dunker AK, 1Sequences and topology: disorder, modularity, and post/pre translation modification, Curr Opin Struct Biol (2013), http://dx.doi.org/ 10.1016/j.sbi.2013.05.001 COSTBI-1118; NO. OF PAGES 3 2 Sequences and topology phosphorylated IDP side chains binding onto the surfaces aided by recent technical advances that are aimed at of the calcium phosphate nanoclusters. Such high levels calibrating ensemble descriptions. of solubilized calcium phosphate are necessary for both initial assembly and growth of bones and teeth and also The Jakob review continues the journey to the structured for their maintenance throughout life. end of our disordered-to-ordered spectrum with examples where well-foldedness (or the lack thereof) confers the Moving towards the partially ordered range of the dis- function. Whereas the Holt paper describes activity con- ordered-to-ordered spectrum, the Blackledge review trolled by phosphorylation, Jakob’s redox proteins considers proteins that can take on a structural ensemble undergo a different form of modification (oxidation) that of conformations. Figure 1 shows a highly disordered switches proteins between ordered and disordered states. protein that nonetheless has a small structured helix in The review contains three examples of reversible redox- the chain separating two regions of disorder, but closer to controlled transitions, including separate examples where the more disordered end of the spectrum. For a figure and function is activated with an order to disorder transition overview of the concept of a structural ensemble, as and conversely with a disorder to order transition. This covered in the Blackledge review, we refer you to our illustrates the multiplicity of relationships between order, previous editorial [1]. Blackledge explains how NMR disorder and gain or loss of function. approaches are particularly well adapted to elucidate the conformational behavior and biological activities of In a similar way to disordered/structured function being ensembles at atomic level resolution. This is further enabled/disabled in the Jakob review, the Babu review Figure 1 Current Opinion in Structural Biology At the more disordered end of the spectrum, but not completely disordered: all states of TSP9 (PDB 2fft). A single stable helix (black) seperates two disordered segments (blue and green-red) of the Thylakoid soluble phosphoprotein TSP9. All known states from NMR are shown superposed. This 2 2 image provided by Matt Oates reproduced from D P documentation [4] (http://d2p2.pro), a database of protein structure and disorder in genomes. Current Opinion in Structural Biology 2013, 23:1–3 www.sciencedirect.com Please cite this article in press as: Gough J, Dunker AK, 1Sequences and topology: disorder, modularity, and post/pre translation modification, Curr Opin Struct Biol (2013), http://dx.doi.org/ 10.1016/j.sbi.2013.05.001 COSTBI-1118; NO. OF PAGES 3 Editorial overview Gough 3 presents a different modification mechanism to the same computational methods in contrast to experimental end, this time pre-translation. Babu explains how disor- methods as emphasized by Blackledge. In attempting dered regions’ functions are disabled in a protein by being to predict Gene Ontology (GO) terms for sequences of removed altogether via splicing. This mechanism is com- unknown function, Jones combined a wide array of pre- plimentary to the other forms of modification described in dicted features. While the mere presence or absence of the earlier reviews. In fact, Babu authoritatively describes disorder alone is not likely to be a powerful feature, Jones the multi-level concept as the rewiring of networks of extolls the predictive potential of features within disorder signaling and regulation which utilise post-translational such as molecular recognition sites and short linear motifs. modification sites (as well as linear interaction motifs). Blackledge remarks that it is an unusual feature of the The Babu review allows us to step back from the specifics IDP field, that it is dominated by computational over of the first two reviews and contemplate the participation experimental analysis. Jones also concludes with a call for in higher levels of the cellular machine. the production of more experimental data from which to base the computational predictions. The much-needed Elofsson review extends the discus- sion of alternative splicing in this collection, from dis- Finally Taylor, in another form of computational predic- ordered regions to the possible effect on modular tion, describes recent prominent advances in contact structured regions of proteins: domains. In this review prediction between amino acids in proteins (as a route proteins are considered as linear combinations of to function via structure). Although the ideas have been domains. Splicing can lead to different kinds of protein around for a long time, significant recent advances have variation, but this review focuses

Sequences and Topology: Disorder, Modularity, and Post/Pre Translation Modiﬁcation

The SUPERFAMILY 2.0 Database: a Signiﬁcant Proteome Update and a New Webserver Arun Prasad Pandurangan 1,*, Jonathan Stahlhacke2, Matt E

Applied Category Theory for Genomics – an Initiative

Functional Effects Detailed Research Plan

Tum1 Is Involved in the Metabolism of Sterol Esters in Saccharomyces Cerevisiae Katja Uršič1,4, Mojca Ogrizović1,Dušan Kordiš1, Klaus Natter2 and Uroš Petrovič1,3*

Hmms Representing All Proteins of Known Structure. SCOP Sequence Searches, Alignments and Genome Assignments Julian Gough* and Cyrus Chothia

Smurflite: Combining Simplified Markov Random Fields With

AUTOPHY by Deepika Prasad a Thesis

Protein Domain Coocurences Reveal Functional Changes of Regulatory Mechanisms During Evolution

Renaissance SUPERFAMILY in the Decade Since Its Launch, a Repository of Information About Proteins in Genomes Has Developed Into a Primary Reference

An Expanded Evaluation of Protein Function Prediction Methods Shows an Improvement in Accuracy

Simulation and Graph Mining Tools for Improving Gene Mapping Efficiency

Specialized Hidden Markov Model Databases for Microbial Genomics