Essay

Deducing Protein Function by Forensic Integrative Cell Biology

William C. Earnshaw*

Wellcome Trust Centre for Cell Biology, University of Edinburgh, ICB, Edinburgh, Scotland, United Kingdom

Citation: Earnshaw WC (2013) Deducing Protein Function by Forensic Integrative Cell Biology. PLoS Biol 11(12): e1001742. doi:10.1371/journal.pbio.1001742

Published December 17, 2013

Copyright: © 2013 William C. Earnshaw. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: WCE is a Principal Research Fellow of The Wellcome Trust (grant number 073915). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: I collaborate with Prof. Juri Rappsilber and Dr. Shinya Ohta, who are thanked in the acknowledgments section, but have no competing interests with any person or entity mentioned in the text.

Essays articulate a specific perspective on a topic of broad interest to scientists.

Abbreviations: APC/C, anaphase promoting complex/cyclosome.

* E-mail: [email protected]

PLOS Biology | www.plosbiology.org | December 2013 | Volume 11 | Issue 12 | e1001742

Summary

Our ability to sequence genomes has provided us with near-complete lists of the proteins that compose cells, tissues, and organisms, but this is only the beginning of the process to discover the functions of cellular components. In the future, it’s going to be crucial to develop computational analyses that can predict the biological functions of uncharacterised proteins. At the same time, we must not forget those fundamental experimental skills needed to confirm the predictions or send the analysts back to the drawing board to devise new ones.

What does it mean for cell biologists to be working in a “post-genomic” era? This is more a media term than a scientific term, but I believe that what is commonly understood by it is that we now work in an era where there is claimed to be a complete, or near complete, parts list for the cells of most major experimental organisms. Of course this belief is not entirely true. We all know that the genome projects have not given us the complete sequence of all human (or any other metazoan) DNA. For example, we still do not have a contiguous sequence of the highly repetitive regions of chromosomes in and around centromeres and at a number of other loci. Of course, there may not be very many (or even any) important undiscovered protein-coding genes hiding in these regions, but it is likely that there are still a substantial number of unknown proteins encoded by the human genome. Furthermore, as I will discuss below, there are also likely to be quite a few proteins whose functions we think we know, but that have important or even essential alternative functions that are currently unsuspected.

The subject of this essay is an important emerging area in cell biology research: how to predict the functions of uncharacterised and unknown proteins and how to identify and characterise novel functions of known proteins (for earlier discussions of this see [1–3]). These are areas that I predict will involve the coordination of very different kinds of advances by two distinct cohorts of future cell biologists. The first of these will be adept at producing a huge range of information from a wide variety of “omics” and other high-throughput studies and able to integrate this information to predict how proteins function. The second will devise low and high-throughput biochemical tests to prove or disprove those predictions in the laboratory.

Before I go further, I should define my terms. First, the term “function” means different things to different groups of researchers: to a classical geneticist, for example, it might mean turning a fly’s antennae into legs; to a biochemist it might mean forming a complex with a group of other proteins known to be involved in a particular process such as regulation of gene expression, and to a structural chemist it might mean removing an electron from one chemical bond and transferring it to another. As a cell biologist, I am usually content that I know something about the function of my protein if I know where it is in the cell, what other proteins it interacts with, and what part it plays in the particular cellular process I happen to be studying. Depending on the organism, the functions of some 20%–60% of proteins are uncertain [2].

As referred to here, “uncharacterised proteins” are proteins that are present in annotated databases, but whose functions are not determined. Proteins published as, for example, “protein up-regulated in cancer” or “protein up-regulated in cell type x” may have names, but no one actually knows what they do. In a recent study of the proteome of mitotic chromosomes, amongst >4,000 proteins identified, my colleagues and I found just over 300 proteins like this [4].

What I call “unknown proteins” can be of two classes. First, many proteins are present in databases but have not yet appeared in publications or been given formal names. In our chromosome analysis, we identified 260 of these unknown proteins. The second class comprises proteins whose existence is unsuspected, that is, of course, until someone describes them. For example, by using mass spectroscopy Crispin Miller’s group found 346 novel peptides and proteins that were smaller than the minimum cut-off size used for identifying protein-coding genes by the Human Genome Project [5]. Shortly thereafter, the same group identified 39 previously unsuspected genes encoding novel short proteins in Schizosaccharomyces pombe [6]. Alternative splicing, where a single pre-mRNA can yield several (or many) functional mature mRNAs that can encode a range of related proteins, some of which may have very different functions, provides another very rich source of previously unknown proteins.

RNA-seq and proteomics studies are just now beginning to reveal the true complexity of proteins that can be generated from the ~20,000 protein-coding genes in humans. Beyond the discovery of unknown proteins, there is a world of largely unexplored functional diversity arising from post-translational modifications of proteins. Given the example that simple phosphorylation can change the nuclear lamins from members of a highly insoluble structural framework underpinning the nuclear envelope to soluble proteins in the mitotic cytoplasm [7,8] and the vast numbers of modifications on proteins such as the histones [9], the scope for functional complexity induced by post-translational modifications is truly vast and will keep cell biologists busy for many years to come.

Exploring the unknown is always exciting and challenging, but occasionally exploring the “known” can yield equally exciting dividends. In some cases it is straightforward to predict proteins that will have multiple functions; an obvious example lies in the correctly, but perhaps naively named histone acetyl transferases and histone deacetylases and their ilk, which likely add and remove post-translational modifications to very large numbers of cellular proteins in addition to histones. [...]

[...] figure this out other than by chance observation or coincidence?

The Power of Making Lists

By far the simplest way to predict the function of an unknown protein is to show that it has significant sequence relatedness to another protein whose function is known. This is all well and good unless your favourite protein is a “pioneer” (i.e., protein of unknown function whose amino acid sequence is unrelated to any protein of known function), in which case you are on your own without a tried and tested recipe for how to proceed. I suggest that in the future the emerging art of predicting the function of unknown proteins (and predicting novel functions for known proteins) will be the domain of “forensic integrative cell biologists” (using “forensic” as defined in Wikipedia as the investigation of “situations after the fact, and to establish what occurred based on collected evidence”). These researchers will develop methods for integrating a wide range of approaches to look at protein function, breaking each approach down to its component “classifiers”: lists of all the components of a system (or subsystem, as defined by [3]) identified by a particular experiment, each of which is attributed a score based on a predefined criterion and then ranked in order of that score. The key is then to figure out how to combine and compare these classifiers to [...]

[...] Borealin/Dasra, an essential member of the chromosomal passenger complex [18,19].

The development of multi-classifier combinatorial proteomics for the analysis of the mitotic chromosome proteome [4] is an example of applying this type of approach to other sorts of datasets. In this case, we combined data (arrayed as classifiers) from five different types of proteomics experiments, all based on the stable incorporation of labelled amino acids in culture (SILAC) approach [20], to look at the association of several thousand proteins with isolated mitotic chromosomes. These classifiers included the estimated number of copies of each protein in mitotic chromosomes; the ratio of the amount of each protein in isolated chromosomes versus the amount in the cytosol after removal of the chromosomes; the extent to which proteins present in cytosol bound to isolated chromosomes that had been incubated in crude cytosol; the extent to which the abundance of particular proteins on chromosomes was affected by loss of the condensin complex from chromosomes; and the extent to which the abundance of particular proteins on chromosomes was affected by loss of the SKA complex from chromosomes [4]. We constructed classifiers in which the proteins from these different experiments were ranked according to their SILAC ratios (e.g., whether there was more or less of a given protein associated with chro- [...]
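The idea of a classifier as a scored, ranked list can be made concrete with a minimal sketch. Everything here is an assumption for illustration: the protein names and scores are hypothetical, and averaging ranks is only a naive stand-in for combining classifiers, not the method used in [4].

```python
from typing import Dict, List


def build_classifier(scores: Dict[str, float]) -> List[str]:
    """Rank proteins from highest to lowest score (ties broken by name)."""
    return sorted(scores, key=lambda protein: (-scores[protein], protein))


def mean_rank(classifiers: List[List[str]]) -> Dict[str, float]:
    """Average each protein's 1-based rank over the classifiers that list it."""
    ranks: Dict[str, List[int]] = {}
    for ranked in classifiers:
        for position, protein in enumerate(ranked, start=1):
            ranks.setdefault(protein, []).append(position)
    return {protein: sum(r) / len(r) for protein, r in ranks.items()}


# Hypothetical scores for two experiments (not data from the essay).
enrichment = {"protA": 4.0, "protB": 0.5, "protC": 2.0}
condensin_dependence = {"protA": 1.5, "protB": 0.2, "protC": 3.0}

by_enrichment = build_classifier(enrichment)
by_condensin = build_classifier(condensin_dependence)

# Naive combination: a lower mean rank means consistently high scores.
combined = mean_rank([by_enrichment, by_condensin])
```

In this toy example, a protein that ranks near the top of both lists gets a low mean rank, while one that ranks last in both gets the highest. Real analyses would need to handle missing values and weight experiments differently.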