Mapping the Human Proteome Using Bioinformatic Methods

LINN FAGERBERG

Royal Institute of Technology
School of Biotechnology
Stockholm 2011

© Linn Fagerberg
Stockholm 2011

Royal Institute of Technology
School of Biotechnology
AlbaNova University Center
SE-106 91 Stockholm
Sweden

Printed by Universitetsservice US-AB
Drottning Kristinas väg 53B
SE-100 44 Stockholm
Sweden

ISBN 978-91-7415-886-1
TRITA-BIO Report 2011:4
ISSN 1654-2312

Cover illustration: Network view of the distribution of protein subcellular locations in U-2 OS. Each protein is represented by a circle colored by the number of locations in which it is detected, out of a total of 16 subcellular compartments.

Linn Fagerberg (2011): Mapping the Human Proteome Using Bioinformatic Methods. School of Biotechnology, Royal Institute of Technology (KTH), Sweden

ABSTRACT

The fundamental goal of proteomics is to gain an understanding of the expression and function of the proteome on the level of individual proteins, on the level of defined cell types and on the level of the entire organism. In this thesis, the human proteome is explored using membrane protein topology prediction methods to define the human membrane proteome and by global protein expression profiling, which relies on a complex study of the location and expression levels of proteins in tissues and cells. A whole-proteome analysis was performed based on the predicted protein-coding genes of humans using a selection of membrane protein topology prediction methods. The study used a majority decision-based method, which estimated that approximately 26% of the human genes encode a membrane protein. The prediction results are displayed in a visualization tool to facilitate the selection of antigens to be used for antibody generation.
Global protein expression profiles in a large number of cells and tissues in the human body were analyzed for more than 4000 protein targets, based on data from the antibody-based immunohistochemistry and immunofluorescence methods within the framework of the Human Protein Atlas project. The results revealed few cell-type specific proteins and a high fraction of human proteins expressed in most cells, suggesting that cell and tissue specificity is attained by a fine-tuned regulation of protein levels. The expression profiles were also used to analyze the relationship between 45 cell lines by hierarchical clustering and principal component analysis. The global protein expression patterns overall reflected the tumor origin of the cells, and also allowed for identification of proteins of importance for distinguishing different categories of cell lines, as defined by the phenotype of the progenitor cell. In addition, the protein distribution in 16 subcellular compartments in three of the human cell lines was mapped. A large fraction of proteins were localized in two or more compartments and, in line with previous results, a majority of proteins were detected in all three cell lines. Finally, mass spectrometry-based protein expression levels were compared to RNA-seq-based transcript expression levels in three cell lines. Highly ubiquitous mRNA expression was found, and the changes in expression levels between the cell lines showed high correlations between proteins and transcripts. Large general differences in abundance of proteins from various functional classes were observed. A comparison between categories based on expression levels revealed that, in general, genes with varying expression levels between the cell lines, or expressed in only one cell line, were highly enriched for cell-surface proteins. These studies show a path for a systematic analysis to characterize the proteome in human cells, tissues and organs.
Keywords: proteome, transcriptome, bioinformatics, membrane protein prediction, subcellular localization, protein expression level, cell line, immunohistochemistry, immunofluorescence.

LIST OF PUBLICATIONS

This thesis is based upon the following five publications, which are referred to in the text by the corresponding Roman numerals (I-V). The papers are included in the Appendix.

I. Fagerberg L, Jonasson K, von Heijne G, Uhlén M, Berglund L. Prediction of the human membrane proteome. Proteomics. 2010 Mar;10(6):1141-9.

II. Pontén F, Gry M, Fagerberg L, Lundberg E, Asplund A, Berglund L, Oksvold P, Björling E, Hober S, Kampf C, Navani S, Nilsson P, Ottosson J, Persson A, Wernérus H, Wester K, Uhlén M. A global view of protein expression in human cells, tissues, and organs. Mol Syst Biol. 2009 Dec;5:337.

III. Fagerberg L, Strömberg S, Gry M, El-Obeid A, Nilsson K, Uhlen M, Ponten F, Asplund A. The Global Protein Expression Pattern in Human Cell Lines. Manuscript.

IV. Fagerberg L*, Stadler C*, Skogs M, Hjelmare M, Jonasson K, Wiking M, Åbergh A, Uhlén M, Lundberg E. Mapping the subcellular protein distribution in three human cell lines. Submitted.

V. Lundberg E*, Fagerberg L*, Klevebring D, Matic I, Geiger T, Cox J, Älgenäs C, Lundeberg J, Mann M, Uhlen M. Defining the transcriptome and proteome in three functionally different cell lines. Mol Syst Biol. 2010 Dec 21;6:450.

* Authors contributed equally to this work.

All publications are reproduced with permission of the respective copyright holders.

RELATED PUBLICATIONS

Uhlen M, Oksvold P, Fagerberg L, Lundberg E, Jonasson K, Forsberg M, Zwahlen M, Kampf C, Wester K, Hober S, Wernerus H, Björling L, Ponten F. Towards a knowledge-based Human Protein Atlas. Nat Biotechnol. 2010 Dec;28(12):1248-50.

Klevebring D, Fagerberg L, Lundberg E, Emanuelsson O, Uhlén M, Lundeberg J. Analysis of transcript and protein overlap in a human osteosarcoma cell line. BMC Genomics. 2010 Dec 2;11:684.

Berglund L, Björling E, Oksvold P, Fagerberg L, Asplund A, Szigyarto CA, Persson A, Ottosson J, Wernérus H, Nilsson P, Lundberg E, Sivertsson A, Navani S, Wester K, Kampf C, Hober S, Pontén F, Uhlén M. A genecentric Human Protein Atlas for expression profiles based on antibodies. Mol Cell Proteomics. 2008 Oct;7(10):2019-27.

Berglund L, Björling E, Jonasson K, Rockberg J, Fagerberg L, Al-Khalili Szigyarto C, Sivertsson A, Uhlén M. A whole-genome bioinformatics approach to selection of antigens for systematic antibody generation. Proteomics. 2008 Jul;8(14):2832-9.

Uhlén M, Björling E, Agaton C, Szigyarto CA, Amini B, Andersen E, Andersson AC, Angelidou P, Asplund A, Asplund C, Berglund L, Bergström K, Brumer H, Cerjan D, Ekström M, Elobeid A, Eriksson C, Fagerberg L, Falk R, Fall J, Forsberg M, Björklund MG, Gumbel K, Halimi A, Hallin I, Hamsten C, Hansson M, Hedhammar M, Hercules G, Kampf C, Larsson K, Lindskog M, Lodewyckx W, Lund J, Lundeberg J, Magnusson K, Malm E, Nilsson P, Odling J, Oksvold P, Olsson I, Oster E, Ottosson J, Paavilainen L, Persson A, Rimini R, Rockberg J, Runeson M, Sivertsson A, Sköllermo A, Steen J, Stenvall M, Sterky F, Strömberg S, Sundberg M, Tegel H, Tourle S, Wahlund E, Waldén A, Wan J, Wernérus H, Westberg J, Wester K, Wrethagen U, Xu LL, Hober S, Pontén F. A human protein atlas for normal and cancer tissues based on antibody proteomics. Mol Cell Proteomics. 2005 Dec;4(12):1920-32.

CONTENTS

INTRODUCTION
1. THE DISCOVERY OF THE BUILDING BLOCKS OF LIFE
2. PROTEINS
  2.1 CLASSIFICATION OF PROTEINS
  2.2 MEMBRANE PROTEINS
  2.3 PROTEIN ABUNDANCE
  2.4 PROTEOMIC TOOLS FOR ANALYZING PROTEIN EXPRESSION
    2.4.1 ANTIBODY-BASED PROTEIN PROFILING
    2.4.2 MASS SPECTROMETRY-BASED METHODS
3. THE HUMAN PROTEIN ATLAS PROJECT
  3.1 PROTEIN PROFILING USING IMMUNOHISTOCHEMISTRY
  3.2 SUBCELLULAR PROFILING USING IMMUNOFLUORESCENCE-BASED CONFOCAL MICROSCOPY
4. BIOINFORMATICS AND DATA ANALYSIS
  4.1 BIOLOGICAL DATABASES
    4.1.1 GENE AND GENE EXPRESSION RELATED DATABASES
    4.1.2 PROTEIN RELATED DATABASES
    4.1.3 ONTOLOGIES
  4.2 TOOLS FOR THE ANALYSIS AND VISUALIZATION OF LARGE-SCALE BIOLOGICAL DATA
    4.2.1 HIERARCHICAL CLUSTERING
    4.2.2 PRINCIPAL COMPONENT ANALYSIS
    4.2.3 ENRICHMENT ANALYSIS OF GENE ANNOTATIONS
