GLOBAL PROTEOMIC DETECTION OF NATIVE, STABLE, SOLUBLE HUMAN PROTEIN COMPLEXES

by

Pierre Claver Havugimana

A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy

Graduate Department of Molecular Genetics

University of Toronto

© Copyright by Pierre Claver Havugimana 2012

Global Proteomic Detection of Native, Stable, Soluble Human Protein Complexes

Pierre Claver Havugimana

Doctor of Philosophy

Graduate Department of Molecular Genetics University of Toronto

2012

Abstract

Protein complexes are critical to virtually every biological process performed by living organisms. The cellular “interactome”, or set of physical protein-protein interactions, is of particular interest, but no comprehensive study of human multi-protein complexes has yet been reported. In this Thesis, I describe the development of a novel high-throughput profiling method, which I term Fractionomic Profiling-Mass Spectrometry (or FP-MS), in which biochemical fractionation using non-denaturing high performance liquid chromatography (HPLC), as an alternative to affinity purification (e.g. TAP tagging) or immuno-precipitation, is coupled with tandem mass spectrometry-based protein identification for the global detection of stably associated protein complexes in mammalian cells or tissues. Using a cell culture model system, I document proof-of-principle experiments confirming the suitability of this method for monitoring large numbers of soluble, stable protein complexes from either crude protein extracts or enriched sub-cellular compartments. Next, I document how, using orthogonal functional genomics information generated in collaboration with computational biology groups as filters, we applied FP-MS co-fractionation profiling to construct a high-quality map of 622 predicted unique soluble human protein complexes that could be biochemically enriched from HeLa and HEK293 nuclear and cytoplasmic extracts. Our network is enriched in assemblies consisting of human disease-linked proteins and contains hundreds of putative new components and novel complexes, many of which are broadly evolutionarily conserved. This study revealed unexpected biological associations, such as the GNL3, FTSJ3, and MKI67IP factors involved in 60S ribosome assembly. It is my expectation that this first systematic, experimentally derived atlas of putative human protein complexes will constitute a starting point for more in-depth, hypothesis-driven functional investigations of basic human molecular and cellular biology. I also note that my generic FP-MS screening approach can be, and currently is being, applied by other members of the Emili laboratory to examine the global interactomes of other mammalian cell lines, tissues, sub-cellular compartments, and diverse model organisms, which should expand our understanding of proteome adaptations and association networks associated with cell physiology, animal development and molecular evolution.


Acknowledgments

I thank God for protecting and guiding me through the most hopeless situations of my life and allowing me to emerge as a mature, responsible, stronger and wiser individual ready to face any challenge with faith and perseverance. What began as simply earning a diploma in order to establish myself in my new country and home after the genocide in Rwanda later ignited a dream that led to a lengthy and incredible journey of doctoral studies. I would like to thank several people who have helped make this journey a very rewarding and memorable experience.

First of all, I would like to express my sincere appreciation and gratitude to my supervisor, Dr. Andrew Emili. I joined the Emili laboratory in 2005, first as a Research Assistant and later, starting in 2007, as a Graduate Student, after obtaining degrees in Biochemistry and Chemistry and a Master of Science in Chemical Engineering, all from Laval University. Along this path, Dr. Andrew Emili has provided me with the resources and freedom to test every single one of my ideas, as well as tremendous encouragement and unwavering support, all with outstanding patience. I consider Dr. Emili a godparent, a colleague and a great friend from whom I received invaluable advice that has reminded me, and will always remind me, of my parents, from whom I inherited the value of hard work and perseverance.

Second, I would like to thank all members of my PhD exam committee for their valuable time and constructive criticisms. I particularly thank my supervisory committee members, Dr. Gary Bader, Dr. Jack Greenblatt and Dr. Andrew Wilde, for educating me and allowing me to grow as a critically thinking scientist. Their encouragement and guidance through the challenges of the past five years of my doctoral studies have been a source of inspiration, and I hope I can live up to their expectations.

Third, I respectfully acknowledge my exceptional collaborators, the groups of Dr. Edward Marcotte, Dr. Alberto Paccanaro, Dr. Shoshana Wodak and Dr. Elisabeth Tillier. Their computational expertise and intellectual input made the findings in this thesis a dream come true. Special thanks to Dr. Edward Marcotte for the opportunity he afforded me to work in his laboratory, and to members of his team for their support and friendship during my stay in Austin.

Thanks are also expressed to Dr. Jim Ingles whose comments and editing rules made an everlasting improvement to this thesis.

Fourth, I also greatly appreciated the support, advice and encouragement from the present and past members of the Emili and Greenblatt labs during our daily conversations, lab meetings and exam preparation. In particular, I would like to extend my sincere thanks to Vincent Fong, Pingzhao Hu, Sadhna Phanse, Cuihong Wan, Hongbo Guo and Zuyao Ni for their technical assistance, to Ruth Isserlin, Gabe Musso and Jeffrey Fillingham for proofreading my committee reports, and to Alla Gagarinova for proofreading my re-class proposal.

Fifth, I would like to thank my wife and best friend, Venis, for taking pride in my achievements. Your patience, unprecedented support, and belief in my personal goals and potential have meant so much to me. Thanks are also due to my cousin, Ignace, and his family, and my sister, Donathille, for being there when I needed them most. A huge thanks to all members of my family-in-law for the warmth and love they have brought into my life.

Finally, I dedicate this thesis, the most significant achievement of my life so far, to all members of my family who would have been proud of me today had their lives not been taken so early in the 1994 Rwandan genocide.

Thank you all and God bless you.

Pierre Claver Havugimana, 2012

Table of Contents

Acknowledgments ...... iv

Table of Contents ...... vi

List of Tables ...... ix

List of Figures ...... x

List of Abbreviations ...... xii

Chapter 1 Introduction and Background Information ...... 1

1 Introduction and background information ...... 1

1.1 The importance of protein complexes ...... 1

1.2 Experimental approaches for genome-wide screening of protein complexes ...... 5

1.2.1 Affinity purification and mass spectrometry ...... 9

1.2.2 Low-throughput experimental approaches amenable to proteome scale-up ...... 26

1.3 Computational methods for prediction and analysis of protein complexes ...... 29

1.4 Repositories of experimentally derived PPI and data ...... 32

1.5 Goal and purposes of the present thesis project ...... 33

Chapter 2 Development of a High-Throughput Global Profiling Method Based on High Resolution Ion-Exchange High Performance Liquid Chromatography for Proteomic-Scale Analyses of Native Stable Soluble Protein Complexes ...... 35

2 Development of a high-throughput global profiling method based on high resolution ion-exchange high performance liquid chromatography for proteomic-scale analyses of native stable soluble protein complexes ...... 36

2.1 Introduction ...... 36

2.2 Material and Methods ...... 39

2.2.1 Cell lines and cell-free extract preparation ...... 39

2.2.2 HPLC columns, buffers, and instrumentation ...... 39

2.2.3 Single phase weak cation-exchange fractionation of HeLa cytosolic extracts ..... 39

2.2.4 Single phase weak anion-exchange fractionation of HeLa cytosolic extracts ...... 40


2.2.5 Dual phase WAX and WCX fractionation of HeLa cytosolic extracts ...... 40

2.2.6 Triple phase ion-exchange fractionation of HeLa cytosolic extracts ...... 41

2.2.7 Triple phase ion-exchange fractionation of HeLa nuclear extracts ...... 41

2.2.8 Dual phase heparin mixed-bed IEX fractionation of HeLa nuclear extracts ...... 42

2.2.9 Single phase heparin fractionation of HeLa nuclear extracts ...... 43

2.2.10 LC-MS/MS separation and identification of chromatographic peptide fractions . 43

2.2.11 LC-MS/MS spectra database search and protein identification ...... 45

2.2.12 Benchmarking human protein complexes and protein-protein interactions ...... 46

2.3 Results ...... 48

2.3.1 Ion-exchange high performance liquid chromatography of HeLa cell-free extracts ...... 48

2.3.2 Determination of the performance of the various ion-exchange high performance liquid chromatography columns ...... 57

2.3.3 Assessing the reproducibility and efficiency of tandem heparin mixed-bed IEX co-fractionation approach ...... 66

2.4 Discussion ...... 69

Chapter 3 Application of the FP-MS Approach to the Global Detection of Soluble Stable Human Native Complexes Expressed in HeLa and HEK293 Model Lines ...... 71

3 Application of the FP-MS approach to the global detection of soluble stable native complexes expressed in HeLa and HEK293 model lines ...... 72

3.1 Introduction ...... 72

3.2 Material and Methods ...... 74

3.2.1 Experimental methods ...... 74

3.2.2 Computational analysis methods ...... 77

3.3 Results ...... 92

3.3.1 High-throughput complex fractionation and detection by LC-MS/MS ...... 92

3.3.2 Reconstruction of a high confidence co-complex interaction network ...... 101

3.3.3 Construction and validation of protein complexes from the probabilistic interaction network ...... 110

3.3.4 Clinical and biological implications of the reconstructed human protein complexes ...... 119

3.3.5 Conservation of human protein complexes ...... 123

3.3.6 Protein abundance ...... 127

3.4 Discussion ...... 127

Chapter 4 Conclusions and Future Directions ...... 129

4 Conclusions and future directions ...... 130

4.1 Conclusions ...... 130

4.2 Future directions ...... 137

4.2.1 Investigation of the role of FTSJ3, MKI67IP, and GNL3 in ribosome biogenesis ...... 137

4.2.2 Enhanced FP-MS for the analysis of membrane-associated protein complexes 137

4.2.3 Comparative interactome mapping across other models ...... 138

4.2.4 Elucidation of microbe-human cell interactions ...... 138

References ...... 140

Appendix 1: List of predicted protein complexes ...... 153


List of Tables

Table 1-1: List of data type used in the construction of HumanNet resource...... 30

Table 2-1: Positive reference protein complexes from CORUM database used to evaluate the performance of the co-fractionation HPLC experiments...... 47

Table 2-2: Test performance analysis indicating the fraction of subunits of reference complexes sharing the same exact co-apex peak during FP-MS...... 63

Table 3-1: Summary of the sample analyzed by MS in this study...... 83

Table 3-2: Benchmarking results between ClusterONE and other popular clustering algorithms...... 111

Table 3-3: Enrichment analysis indicates highly significant PPI overlap between my study and independent AP-MS datasets...... 115

Table 3-4: Functional enrichment analysis supporting predicted complexes...... 118


List of Figures

Figure 1-1: Pie chart showing proportion of Gene Ontology annotations experimentally derived or computationally predicted for the 20,240 sequence reviewed human proteins in the UniProtKB genome database circa March, 2012...... 3

Figure 1-2: Various experimental methods reported in the curated CORUM annotation database that have been used to document protein complexes in higher eukaryotes, including human...... 8

Figure 2-1: Schematic workflow of my Fractionomic Profiling-Mass Spectrometry strategy for identifying soluble protein complexes on a large scale...... 50

Figure 2-2: Evaluation of chromatographic resolution of HeLa S3 cytosolic protein extracts by four different IEX-HPLC methods...... 53

Figure 2-3: Assessment of HPLC fractionation efficiency with HeLa S3 nuclear protein extracts...... 56

Figure 2-4: Distribution of protein abundances detected by MS in HeLa cell-free extracts...... 59

Figure 2-5: Identification of co-eluting proteins through detection of co-apex peaks...... 61

Figure 2-6: Co-elution network comprising associations predicted within and between 20 representative annotated human complexes used as a reference test set to evaluate my co-fractionation scoring procedure...... 65

Figure 2-7: Reproducibility of heparin-IEX HPLC and LC-MS/MS Profiling...... 68

Figure 3-1: Integrative multi-pronged strategy used to identify human soluble protein complexes...... 94

Figure 3-2: Identification of co-purifying protein subunits by LC-MS/MS analysis...... 95

Figure 3-3: Abundance levels of the proteins identified in this study...... 96

Figure 3-4: Overlap of the proteins identified in this study with those in CORUM database...... 98


Figure 3-5: Assessment of LC-MS/MS protein detection bias...... 99

Figure 3-6: Distribution of protein-protein interactions identified by the co-apex method...... 100

Figure 3-7: Deriving protein complexes from biochemical co-elution network data set...... 104

Figure 3-8: Filtering biochemical network with functional evidence improves both precision and recall...... 105

Figure 3-9: Biochemical network of the 20 reference complexes after filtering with functional evidence...... 106

Figure 3-10: Correlation between human PPI and orthologous Drosophila PPI...... 108

Figure 3-11: Precision-recall curve showing improved performance obtained after denoising procedure...... 109

Figure 3-12: Distribution of predicted complexes based on size...... 112

Figure 3-13: Global validation of the predicted high confidence human protein complexes. .... 113

Figure 3-14: Proportions of annotated versus putative new protein complexes...... 114

Figure 3-15: Global map of high confidence human protein complexes...... 117

Figure 3-16: Distribution of disease-associated proteins in the predicted protein complexes. .. 120

Figure 3-17: Membership in complexes predicts disease associations...... 121

Figure 3-18: Affinity purification mass spectrometry confirmed three novel ribosome biogenesis factors...... 122

Figure 3-19: Evolutionary conservation of protein complexes...... 124

Figure 3-20: Conservation of human complexes in fly and yeast...... 126


List of Abbreviations

Å Angstrom

ACN Acetonitrile

AIDS Acquired immuno-deficiency syndrome

amu Atomic mass unit

AP-MS Affinity purification mass spectrometry

ARP2/3 Actin-Related Proteins ARP2 and ARP3

BAC Bacterial artificial chromosome

BioGRID Biological General Repository for Interaction Datasets

CD Compact disc

CDLS Cornelia de Lange syndrome

cDNA Complementary Deoxyribonucleic acid

CGAP Cancer genome anatomy project

CID Collision induced dissociation

ClusterONE Clustering with Overlapping Neighborhood Expansion

cm centimeter

CMC Clustering based on maximal cliques

COG Conserved oligomeric Golgi complex

CORUM Comprehensive Resource of Mammalian protein complexes

CV Coefficient of variation

DCS Dual phase column set up

DIP Database of interacting proteins

DMEM Dulbecco's Modified Eagle Medium

DTT Dithiothreitol

EMSA Electrophoretic mobility shift assay

ESI Electro-spray ionization

ESI-MS/MS Electro-spray ionization tandem MS

EST Expressed sequence tag

FDR False discovery rate

FP-MS Fractionomic profiling-mass spectrometry

GAD Genetic Association Database

GFP Green fluorescent protein

GO-CC Gene Ontology - cell component

HAC Heparin affinity chromatography

HCW Dual heparin mixed-bed IEX

HEK Human embryonic kidney

HIV Human immunodeficiency virus

HPLC High-performance liquid chromatography

HPRD Human Protein Reference Database

id inner diameter

IEA Inferred from electronic annotation

IEF Iso-electric focusing

IEX Ion exchange chromatography

IEX-HPLC Ion exchange chromatography-HPLC

IgG Immunoglobulin G

IP Immuno-precipitation

kDa kilo Dalton

kV kilo Volts

LC-MS/MS Liquid chromatography-tandem mass spectrometry

LGS Langer-Giedion Syndrome


Lsm 2-8 Sm like-proteins numbered 2-8

LTQ Linear ion trap quadrupole

LUMIER Luminescence-based mammalian interactome mapping

MAFFT Multiple sequence alignment based on the fast Fourier transform

MALDI-TOF Matrix assisted laser desorption/ionization time-of-flight

MAPPIT Mammalian protein–protein interaction trap

MCL Markov Clustering

MCODE Molecular Complex Detection

MeCP1 Methyl CpG binding protein complex 1

min Minutes

MINT Molecular INTeraction database

MIPS Munich Information Center for Protein Sequences

mM Millimolar

mm millimeter (10⁻³ m)

MMM MatrixMatchMaker

MNG-3 Maltose–neopentyl glycol

MPPI The MIPS mammalian protein-protein interaction database

mRNA messenger RNA

MS Mass spectrometry

MS/MS tandem mass spectrometry

NCBI National Center for Biotechnology Information

ND No biological data available

NELF Negative elongation factor

NR Not recorded

°C Degree Celsius


OMA Orthologous MAtrix

OMIM Online Mendelian Inheritance in Man

OPHID Online Predicted Human Interaction Database

ORF Open reading frame

PBAF Polybromo- and BAF-containing complex

PCC Pearson correlation coefficient

PINdb Proteins Interacting in the Nucleus database

PPI Protein-protein interactions

RFC Replication factor C complex

RNA Ribonucleic acid

RNAi Ribonucleic acid interference

RNSC Restricted Neighborhood Search Clustering Algorithm

SPA Tandem sequential peptide affinity

SVM Support Vector Machines

TAP Tandem affinity purification

TCA Trichloroacetic acid

TCS Triple phase column set up

TEV Tobacco etch virus

TFE 2,2,2-trifluoroethanol

TREX Transport RNA export complex

TRiC/CCT TCP-1 Ring Complex/ Chaperonin containing TCP-1

UniProt Universal Protein Resource

UniProtKB UniProt Knowledgebase

WAX Weak anion-exchange

WCC weighted cross correlation


WCX Weak cation exchange

Y2H Yeast two-hybrid

μl micro liter

μm Micron


Chapter 1 Introduction and Background Information

1 Introduction and background information

1.1 The importance of protein complexes

Protein complexes, often viewed as molecular machines, are defined as assemblies of two or more proteins that, through a concerted mechanism of action, perform or regulate critical biological processes in cells (Alberts, 1998; Hartwell et al., 1999). Although certain cellular proteins may carry out their function independently, most proteins physically interact, either transiently (dynamically) or stably (which is the focus of this Thesis), with other proteins for proper function in a complex biological system, and may even be shared components of multiple complexes. For example, in eukaryotic cells, the ribonucleic acid (RNA) polymerase II (RNAP II) core is a multi-subunit complex of 12 stably associated subunits, of which 5 are shared with RNA polymerases I and III. In concert with transcription-associated protein complexes, RNAP II is actively recruited to promoters, where it transcribes messenger ribonucleic acid (mRNA)-coding genes to produce templates for polypeptide synthesis by the multiple components of the translation apparatus, including a megadalton ribosome complex. The discovery of biochemically well defined complexes such as these, as well as the many other yet to be discovered cellular protein assemblies localized to the nucleus and other sub-cellular compartments, remains a seminal challenge for the biological community.


Unbiased experimental elucidation of intermolecular connectivity and specificity can provide insights into both individual gene product function and entire biological systems consisting of molecular modules of interacting proteins. Strikingly, while the human genome is thought to encode roughly 22,000 open reading frames (excluding splice variants), over half of all predicted human proteins currently lack experimental evidence indicating a biological function or sub-cellular location (Figure 1-1). Moreover, prior to my work, the lack of comprehensive, unbiased approaches meant that experimental evidence had assigned only about 20% of annotated human gene products as components of multi-protein complexes (Hutchins et al., 2010; Malovannaya et al., 2011). This contrasts with the remarkable progress made by the Emili and Greenblatt groups (Butland et al., 2005; Hu et al., 2009; Krogan et al., 2006) and many others (Breitkreutz et al., 2010; Gavin et al., 2006; Gavin et al., 2002; Ho et al., 2002) in applying high-throughput screening methods to map the interactomes of simple single-cell models like yeast or E. coli, where about half of all open reading frames have been confidently assigned to one or more putative complexes (Breitkreutz et al., 2010; Butland et al., 2005; Gavin et al., 2006; Gavin et al., 2002; Ho et al., 2002; Hu et al., 2009; Krogan et al., 2006; Pu et al., 2007).


Figure 1-1: Pie chart showing proportion of Gene Ontology annotations experimentally derived or computationally predicted for the 20,240 sequence reviewed human proteins in the UniProtKB genome database circa March, 2012.


As a result, a current challenge for the biological community is to understand the biological function of unannotated human proteins through determination of their physical interactions, and how these influence human health and disease. Due to the increasing global threat of non-communicable diseases (e.g. cancer, cardiovascular disorders, diabetes), there is a pressing need to increase our understanding of how human genes, cells and tissues function as integrated systems at the molecular level and how the underlying interaction networks respond to normal physiological cues or become perturbed in pathological conditions. To accelerate the acquisition of such basic knowledge, innovative technologies are needed that allow for the systematic, proteome-wide investigation of the many functionally diverse protein complexes present in a typical human cell. This motivation drove my development and application of a new experimental procedure aimed at facilitating the construction of a comprehensive, high-quality human protein interaction map.

In this chapter, I review well-established 'state-of-the-art' proteomic screening technologies commonly used to catalogue large collections of protein interactions in important model systems. First, I present an overview of the available assays for protein complex studies and briefly comment on the yeast two-hybrid based assays designed for binary protein interaction screening, followed by a more detailed review of the affinity purification-mass spectrometry (AP-MS) method, devised for the large-scale screening and detection of protein complexes, and its applications to the study of the human physical interactome. Second, I comment on more traditional, low-throughput biochemical fractionation approaches described in the literature that I show are potentially scalable to large-scale proteomic studies of stable human protein complexes. Third, I present computational approaches that have been developed to predict protein complex membership based on interpretation of physical interaction networks or other experimental criteria (e.g. gene functional co-conservation across species). Fourth, I examine the extent, utility and limitations of existing curated human protein-protein interaction data and annotated protein complex information currently available in the public domain. Finally, I close this introductory chapter by introducing the rationale of my doctoral research project, which is aimed at expanding knowledge about basic human "systems biology" through construction and rigorous benchmarking of a global high-quality reference map of soluble human protein complexes present in cultured cell lines.

1.2 Experimental approaches for genome-wide screening of protein complexes

Because of the importance of protein-protein interactions and protein complexes in various cellular processes, many experimental techniques have been developed for systematic screening [for a review see (Phizicky and Fields, 1995)]. To date, more than 60 low-throughput experimental methods (summarized as a histogram in Figure 1-2) have been used to isolate and characterize protein complexes in mammalian species (Ruepp et al., 2009). Some have proven to be exceedingly popular, such as traditional co-immunoprecipitation, wherein an antibody is used to precipitate a target protein in association with its interacting partners, because they are well regarded assays and/or are easy to implement on a small scale.

Others have been designed to identify pairs of directly interacting proteins. These include the genetic yeast two-hybrid high-throughput screening approach (or Y2H), related mammalian Y2H spin-off methods [reviewed in (Hiu Yi Lam and Stagljar, 2012; Suter et al., 2008)], the mammalian protein–protein interaction trap (MAPPIT) (Eyckerman et al., 2001) and the luminescence-based mammalian interactome mapping procedure (LUMIER) (Barrios-Rodiles et al., 2005). The Y2H assay, for example, is based on genetic complementation: combinations of ORFs are tested by expressing two recombinant fusion proteins (bait and prey, as DNA-binding domain and transcription factor activating domain fusions, respectively) intracellularly in yeast along with various genetic and/or enzymatic reporter constructs. If the two chimeric proteins interact, they reconstitute a functional transcription factor, which in turn activates the expression of one or more reporter genes. As such, these assays provide only indirect and limited information about the composition of protein complexes.

Two key advantages of simple binary PPI assays such as Y2H are assay sensitivity and scalability. For example, since its inception in 1989 (Fields and Song, 1989), Y2H has been used to map ~15,000 putative PPI among ~6,000 human proteins (Stark et al., 2006). Large-scale Y2H data sets were first described using yeast as the organism of focus (Ito et al., 2001; Uetz et al., 2000). These first data sets had high false positive rates (spurious interactions estimated at up to 50%) (von Mering et al., 2002), creating widespread skepticism about the reliability of these methods for deducing interaction networks. In response to this criticism, enhanced versions of Y2H platforms have been constructed to generate high quality PPI data more reliably (Dreze et al., 2010). For example, a variety of quality controls and data filters (a theme explored extensively in Chapter 3) are now routinely included in each Y2H experiment to ensure that protein-protein interactions are measured more reliably.

In a 2008 Science paper, Vidal and colleagues (Yu et al., 2008) emphasized the crucial role played by the gold standard data set when evaluating experimental results or comparing results from different experimental approaches such as Y2H and AP-MS. They demonstrated that while Y2H performed well on a binary gold standard (100 curated PPI as a positive set; 100 random PPI as a negative set), AP-MS screens performed very poorly (<10% positives detected). However, these results were reversed when the MIPS gold standard was used to assess the quality of Y2H PPI (<10% true positives). These observations suggest that one should exercise caution when selecting a gold standard as a reference set; otherwise, the performance of a method may be wrongly evaluated. Such would be the case for Y2H, which performs well in detecting transient PPI but not the stable PPI captured by the AP-MS screening approach. In my study, which focused on stable protein complexes, I removed Y2H-based PPI from my gold standard to reflect these observations.
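To make the dependence on the reference set concrete, the following is a minimal Python sketch, not part of the original analysis. It scores two toy interaction lists against two toy gold standards and reports the fraction of gold-standard pairs each list recovers. All protein labels, the interaction lists, and the function name `detection_rate` are hypothetical and chosen purely for illustration; they do not reproduce the data or scoring used by Yu et al. (2008).

```python
def detection_rate(predicted_pairs, gold_pairs):
    """Fraction of gold-standard pairs recovered by a screen.

    Pairs are treated as unordered, so ("A", "B") equals ("B", "A").
    """
    found = {frozenset(pair) for pair in predicted_pairs}
    gold = {frozenset(pair) for pair in gold_pairs}
    return len(found & gold) / len(gold) if gold else 0.0


# Hypothetical reference sets: one of direct binary contacts,
# one of co-complex memberships.
binary_gold = [("A", "B"), ("C", "D"), ("E", "F"), ("G", "H")]
complex_gold = [("A", "B"), ("A", "I"), ("B", "I"), ("J", "K")]

# Hypothetical outputs of a binary assay and a co-complex assay.
binary_screen = [("B", "A"), ("C", "D"), ("E", "F")]
cocomplex_screen = [("A", "I"), ("I", "B"), ("K", "J"), ("A", "B")]

rates = {
    ("binary_screen", "binary_gold"): detection_rate(binary_screen, binary_gold),
    ("binary_screen", "complex_gold"): detection_rate(binary_screen, complex_gold),
    ("cocomplex_screen", "binary_gold"): detection_rate(cocomplex_screen, binary_gold),
    ("cocomplex_screen", "complex_gold"): detection_rate(cocomplex_screen, complex_gold),
}

for (screen, gold), rate in rates.items():
    print(f"{screen} vs {gold}: {rate:.2f}")
```

In this toy setting each screen looks strong against the reference set that matches its assay type and weak against the other, which is the qualitative pattern described above.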


Figure 1-2: Various experimental methods reported in the curated CORUM annotation database that have been used to document protein complexes in higher eukaryotes, including human.

The number of instances in which each method appears in CORUM (log-transformed for easy visualization in a bar chart) is plotted on the X-axis and the methods on the Y-axis. Methods are ordered from the most widely used at the bottom to the least used at the top of the Y-axis. Please note that assays for DNA-protein interaction are also included.


Generally, each of these previously reported approaches has been used in an ad hoc manner to identify the associations of specific proteins of particular interest. That said, most existing methods are hard to scale up due to cost or a lack of appropriate high-grade reagents such as validated antibodies. Moreover, each method has its own relative strengths and weaknesses, especially with regard to assay simplicity, sensitivity, specificity, reproducibility, and scalability. The commonly used co-IP method requires the production of high quality, ideally clean (i.e., non-cross-reacting) antibodies tailored to each protein under investigation.

While the cross-reacting issue has recently been partially addressed by the development of a MS semi-quantitative method based on peptide spectral count information to filter out non-specific protein binders (Malovannaya et al., 2011; Malovannaya et al., 2010), production of high quality antibodies for all predicted human proteins remains an unresolved challenge. Described below is the well-established AP-MS method that has been used especially successfully on a large scale for the routine isolation and identification of heteromeric protein complexes.
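The idea of using spectral counts to flag non-specific binders can be sketched as follows. This is a deliberately simplified illustration and not the published procedure of Malovannaya et al.: it compares the spectral count of each prey in a bait purification with its count in a control purification and keeps preys whose enrichment ratio exceeds a cutoff. The protein labels, the counts, the pseudocount, and the `min_ratio` threshold are all assumptions made for the example.

```python
def filter_nonspecific(bait_counts, control_counts, min_ratio=5.0, pseudocount=1.0):
    """Keep preys enriched in the bait purification relative to a control.

    bait_counts / control_counts map protein name -> spectral count.
    A pseudocount avoids division by zero for proteins absent from the control.
    Returns a dict of retained proteins and their enrichment ratios.
    """
    retained = {}
    for protein, count in bait_counts.items():
        background = control_counts.get(protein, 0)
        ratio = (count + pseudocount) / (background + pseudocount)
        if ratio >= min_ratio:
            retained[protein] = ratio
    return retained


# Hypothetical spectral counts; names are placeholders, not real identifications.
bait = {"PREY_A": 40, "PREY_B": 12, "STICKY_C": 55, "STICKY_D": 30}
control = {"PREY_B": 1, "STICKY_C": 50, "STICKY_D": 28}

specific = filter_nonspecific(bait, control)
for protein in sorted(specific):
    print(protein, round(specific[protein], 2))
```

Here the two "sticky" proteins are discarded because they are about as abundant in the control as in the bait purification, whereas the two preys pass the cutoff. Real pipelines typically use replicate measurements and statistical scoring rather than a single fixed ratio.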

1.2.1 Affinity purification and mass spectrometry

Affinity purification after tagging coupled to mass spectrometry (AP-MS) is a well regarded targeted experimental approach for individual protein complex purification and identification that has served as a cornerstone in the rapidly emerging field of interactome proteomics (Gingras et al., 2007; Kocher and Superti-Furga, 2007; Musso et al., 2007). In a typical AP-MS experiment for characterization of protein complexes, DNA encoding a peptide 'tag' is often inserted at the N- or C-terminus of a protein-coding cDNA using classical molecular cloning or a commercial kit (e.g. Invitrogen Gateway technology) to speed up the procedure. Next, the resulting fusion protein ('bait') is expressed in cells to form a complex or complexes with endogenous proteins. Third, the cells are disrupted/lysed and the bait along with its interacting partners ('preys') are retrieved with an antibody or affinity matrix that recognizes the tag. A variety of different procedures [for more recent reviews see (Li, 2011; Liang et al., 2009; Terpe, 2003; Xu et al., 2010)] can be used for protein tagging and improving affinity purification of the protein complex. Some involve single-step affinity purification (e.g. with a FLAG tag) and others involve sequential steps using a TAP tag (tandem affinity purification), in which two different tags separated by a protease site linker sequence are fused to the terminus of the target protein. The initial TAP tag approach was performed in S. cerevisiae, where a dual tag consisting of Protein A and a calmodulin-binding peptide (CBP) separated by a tobacco etch virus (TEV) protease cleavage sequence was introduced at the C-terminus of candidate proteins via homologous recombination (Rigaut et al., 1999). Single-step affinity purifications have the advantage of minimizing sample dilution, and are fast and less stringent, thus allowing the capture of some transient protein complexes. The main drawback of single-step affinity purification is its significant background. Tandem affinity purification, or the TAP tag approach, which involves two sequential affinity purification steps, allows purification of the protein complex to near homogeneity and is suitable for the study of soluble stable protein complexes (which are the focus of my PhD Thesis). The key disadvantage of the TAP tag approach is the significant dilution of the target complex during the second step of purification, which usually results in the loss of transient interactions.

Lastly, after the target protein and its binding partners have been purified by either of the above affinity-tag approaches, the composition of the purified material (putative complex) is identified by one or both of the so-called ‘soft ionization’ MS techniques. These include matrix-assisted laser desorption/ionization-MS (MALDI-MS) (Karas and Hillenkamp, 1988) and electro-spray ionization tandem MS (ESI-MS/MS) (Fenn et al., 1989). The principle and applications of these techniques in biological sciences are well documented in the literature (Aebersold and Mann, 2003; Steen and Mann, 2004). ESI ionizes the biomolecules (e.g., peptides or proteins) present in solution and is hence readily coupled to a liquid-based chromatographic separation approach, such as reverse-phase high performance liquid chromatography. MALDI sublimates and ionizes the samples present in a dry, crystalline matrix via irradiation with a focused laser beam. MALDI-MS is generally used to analyze relatively simple peptide mixtures (e.g., in-gel digests of protein bands), whereas liquid chromatography coupled to ESI-MS systems (LC-MS/MS) is ideal for the analysis of complex mixtures (I used this technique in this PhD Thesis).

Briefly, the process of identification of protein complexes by AP-MS consists of four steps. First, a library of ORF-tagged proteins is constructed for each individual protein of the organism under investigation, using traditional molecular cloning techniques or commercial kits. Second, the tagged proteins are expressed in host cells, with each tagged protein requiring its own cell culture. Third, the tagged protein and its associated partners are extracted and purified from the cell lysates. Fourth, the subunit composition of the putative protein complex is identified using soft ionization MS techniques such as MALDI-MS or LC-MS/MS. In my Thesis, I aimed to streamline this process by eliminating the laborious and time-consuming cloning procedure as well as the culturing of multiple cell lines, which may be costly when propagation of human cells is required for a research project. Below, I describe the procedure for protein identification by an LC-MS/MS approach using a linear ion trap quadrupole (LTQ) MS instrument, which was available at the start of my PhD project, an approach also commonly used in a large number of protein complex studies [reviewed in (Bensimon et al., 2012)].


1.2.1.1 Protein identification by MS shotgun bottom-up proteomics

MS-based proteomics is an attractive new technology in the field of proteomics. This new research field has emerged from the availability of gene and genome sequence databases, the development of non-destructive ionization MS techniques applicable to proteins, and the availability of innovative bioinformatic software that allows the interpretation of the large numbers of mass spectra generated by MS instruments. The development of the non-destructive MS methodologies was recognized by the Nobel Prize in Chemistry in 2002 to John B. Fenn and Koichi Tanaka for the development of ESI and MALDI, respectively.

So far, the most widely applied method for protein identification is referred to as MS shotgun bottom-up proteomics (i.e., inferring protein identity based on its peptide sequence information). Similar to the shotgun sequencing approach in genomics, this term describes an approach used to systematically identify proteins from a biological sample (e.g. an affinity purified protein complex or even a cell lysate) using a combination of liquid chromatography separation of peptides generated by trypsin digestion and their subsequent analysis by tandem mass spectrometry (MS/MS) (Link et al., 1999; Washburn et al., 2001; Wolters et al., 2001). The MS instrument I used is equipped with a linear ion-trap quadrupole (LTQ) mass analyzer, which contains a larger cylindrical ion trap than the now almost retired three-dimensional ion trap or LCQ. An LTQ mass analyzer allows the accumulation of more ions, hence offering increased sensitivity, resolution and mass accuracy for accurate protein identification and quantification. Below, I describe the basic concept of protein identification by MS shotgun proteomics.

In a typical MS shotgun bottom-up proteomics approach, protein samples are first precipitated with a reagent compatible with MS, then the disulfide bonds in the proteins are reduced and alkylated to prevent formation or re-formation of the disulfide bonds. Next, denatured protein samples are digested with trypsin to generate specific peptides with C-terminally protonated amino acids, which is advantageous for subsequent peptide sequencing.

The resulting peptide mixtures are then fractionated in-line by reverse-phase liquid chromatography in very fine capillaries. The resolved peptide solution is subjected to an electric potential (2.0-3.5 kV), resulting in the formation of a spray and the desolvation and ionization of the peptides (electro-spray ionization; ESI). The mass to charge (m/z) ratios are recorded from peptide ions that pass the collision cell without fragmentation in the mass spectrometer (MS1 intensities). Specific ions are then selected for collision-induced dissociation (CID) with neutral gas molecules (helium is the most commonly used), and the resulting fragment ions are measured in the second mass analyzer in tandem mass spectrometry (MS/MS). For this selection, the instrument is set to perform in data-dependent mode, wherein it automatically selects the most intense ions for MS/MS analysis. To enhance the diversity of selected ions, dynamic exclusion criteria can be set to exclude an ion from selection if it has recently been analyzed. The MS precursor ion intensities (MS1 intensities) obtained in the first stage can be used for peptide quantification, whereas the MS/MS fragment ion (spectral count) information from the second stage contains sequence information that can be compared with sequences from in silico digested protein sequence databases for peptide and protein identification.
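The in silico digestion of a protein sequence database mentioned above can be illustrated with a minimal Python sketch. It assumes the commonly used rule that trypsin cleaves after lysine (K) or arginine (R) unless the next residue is proline (P). The function name `tryptic_digest` and the example sequence are hypothetical and are not taken from Proteogest or any other published tool.

```python
def tryptic_digest(sequence, missed_cleavages=0):
    """Return tryptic peptides of a protein sequence (illustrative sketch).

    Assumed rule: cleave C-terminal to K or R, except when followed by P.
    """
    # Step 1: cut the sequence at every allowed cleavage site.
    pieces, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            pieces.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        pieces.append(sequence[start:])  # C-terminal peptide without K/R

    # Step 2: optionally join neighbouring pieces to model missed cleavages.
    peptides = []
    for i in range(len(pieces)):
        last = min(i + missed_cleavages, len(pieces) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(pieces[i:j + 1]))
    return peptides


# Hypothetical toy sequence, not a real protein.
example_peptides = tryptic_digest("AAKRPGGRCCK")
```

In a real search engine, each peptide generated this way would then be assigned a theoretical fragment spectrum for comparison with the observed MS/MS spectra.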

For database searches, peptides are generated by an in silico digest of a relevant proteome database with computational software such as Proteogest (Cagney et al., 2003), and then a theoretical mass spectrum is predicted for each peptide. There is a panoply of database searching algorithms in proteomics (e.g. SEQUEST, MASCOT, X!Tandem) that compare the theoretical and experimental mass spectra to infer the peptide identity (and therefore the protein present in the purified protein complex) based on the best match between the theoretical spectrum and the observed spectrum. Among these, the SEQUEST search engine pioneered by Yates and colleagues (Eng et al., 1994; Yates et al., 1995), which uses a cross-correlation function to evaluate similarities between an experimental mass spectrum and a predicted spectrum from a database, has been successfully used in our lab (Chan et al., 2012; Havugimana et al., 2007; Kislinger et al., 2006; Krogan et al., 2006) and in the work presented in this Thesis (Chapters 2 and 3). The key issue with peptide and protein identification via database searching is that some top scoring peptide matches are falsely identified (Peng et al., 2003). One simple and robust way to evaluate the false discovery rate (FDR) directly is to search simultaneously a so-called target-decoy database [e.g., the original database with reversed amino-acid orientation], with matches to the decoy proteins considered as false positives in a stringent test of specificity (95-99% confidence level) (Kislinger and Emili, 2003).
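The target-decoy idea described above can be sketched in a few lines of Python. This is a simplified illustration, not the procedure of any particular search engine: the `REV_` prefix, the function names, and the estimator (decoy matches divided by target matches above a score cut-off) are assumptions chosen for clarity.

```python
def reversed_decoys(proteins):
    """Build decoy entries by reversing each target sequence.

    The 'REV_' prefix is a hypothetical naming convention for this sketch.
    """
    return {"REV_" + name: seq[::-1] for name, seq in proteins.items()}


def estimate_fdr(matches, score_cutoff):
    """Estimate the FDR among matches scoring at or above score_cutoff.

    matches: list of (score, protein_id) pairs for top-scoring peptide matches
    obtained by searching a combined target + decoy database.
    """
    accepted = [pid for score, pid in matches if score >= score_cutoff]
    decoys = sum(pid.startswith("REV_") for pid in accepted)
    targets = len(accepted) - decoys
    # Decoy hits serve as a proxy for the number of false target hits.
    return decoys / targets if targets else 0.0


# Hypothetical scores for illustration only.
toy_matches = [(3.5, "P1"), (3.1, "P2"), (2.9, "REV_P9"),
               (2.8, "P3"), (2.0, "REV_P4"), (1.5, "P5")]
toy_fdr = estimate_fdr(toy_matches, score_cutoff=2.5)
```

Raising the score cut-off generally lowers the estimated FDR at the cost of fewer accepted identifications.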

An issue in the database searches for the MS shotgun bottom-up proteomics approach is that this method is biased towards larger and more abundant proteins, as the number of available peptides for analysis increases with protein size and quantity. Sample simplification via pre-fractionation prior to digestion often improves the detection and coverage of the proteome in a given sample (Havugimana et al., 2007). Also, in one of the most in-depth proteomics studies ever conducted, Matthias Mann’s group demonstrated that nearly 10,000 proteins expressed in HeLa cells can be reliably detected by MS after fractionation of the proteome (Nagaraj et al., 2011). I have used the Matthias Mann group’s data to assess the protein coverage and the diversity of protein abundance identified in my own studies.


1.2.1.2 Deriving protein complexes from AP-MS protein identification

In large-scale AP-MS studies, the MS identification outputs a list of proteins detected in each individual purification experiment. A computational tool such as the CONTRAST software (Tabb et al., 2002) is often used to compare thousands of purification experiments in a single matrix (N x M), where N rows represent prey proteins and M columns consist of bait proteins used to purify prey proteins. Each cell in the matrix represents a score (e.g., spectral counts, which serve as a proxy for protein abundance, number of unique peptides, or confidence score) used to identify the putative proteins (i.e., truly associated proteins or contaminants that need to be removed). This data representation is similar to that of the fractionation experiments performed in my study, where N rows represent the proteins identified and M columns the fractions, and it allows preliminary assessment of the data by visualization in a ‘heatmap’ format using, for example, the Java Treeview freeware.
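As a rough illustration of the N x M representation described above, the following Python sketch assembles a prey-by-bait matrix of spectral counts from per-purification protein lists. It is not the CONTRAST implementation. The input layout, the function name `build_matrix`, and the toy counts are assumptions.

```python
def build_matrix(purifications):
    """Assemble a prey x bait matrix of spectral counts.

    purifications: {bait: {prey: spectral_count}} (assumed input layout).
    Returns (preys, baits, matrix), where matrix[i][j] is the count of
    prey i in the purification of bait j (0 if not detected).
    """
    baits = sorted(purifications)
    preys = sorted({prey for hits in purifications.values() for prey in hits})
    matrix = [[purifications[bait].get(prey, 0) for bait in baits]
              for prey in preys]
    return preys, baits, matrix


# Hypothetical spectral counts for two purifications.
toy_purifications = {"A": {"B": 12, "C": 3}, "B": {"A": 8, "D": 1}}
preys, baits, matrix = build_matrix(toy_purifications)
```

The same layout applies to co-fractionation data if the bait columns are replaced by chromatographic fractions.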

To derive putative protein complexes, the above data matrix is first transformed into a scored binary network of bait-prey protein interactions in a so-called “spoke model” (Krogan et al., 2006) or a network that combines both bait-prey and prey-prey protein interactions in the so-called “matrix model” (Gavin et al., 2006; Kuhner et al., 2009).
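The difference between the two representations can be made concrete with a short Python sketch. It only enumerates the unweighted candidate pairs implied by each model; scoring of the pairs is a separate step. The function names and the toy purification are hypothetical.

```python
from itertools import combinations


def spoke_edges(purifications):
    """Spoke model: only bait-prey pairs are treated as candidate interactions."""
    return {frozenset((bait, prey))
            for bait, preys in purifications.items()
            for prey in preys if prey != bait}


def matrix_edges(purifications):
    """Matrix model: all pairs among the bait and its preys (bait-prey and prey-prey)."""
    edges = set()
    for bait, preys in purifications.items():
        members = sorted(set(preys) | {bait})
        edges.update(frozenset(pair) for pair in combinations(members, 2))
    return edges


# Hypothetical purification: bait A retrieves preys B and C.
toy = {"A": ["B", "C"]}
spoke = spoke_edges(toy)    # A-B, A-C
matrix = matrix_edges(toy)  # A-B, A-C, B-C
```

The matrix model yields more candidate pairs per purification, which raises coverage but also the number of potentially spurious prey-prey pairs that a scoring scheme must filter.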

There are a variety of scoring methods to assign the weight (i.e., a measure of confidence) to each PPI in the network. For example, while Krogan and colleagues (Krogan et al., 2006) used a machine learning algorithm trained on PPI derived from hand-curated MIPS protein complexes in their large-scale yeast AP-MS study, Gavin and associates (Gavin et al., 2006) devised a new scoring measure based solely on raw purification data, which they termed the 'socio-affinity index'. In the next step, the two groups also used different clustering algorithms (discussed further below) to infer the protein complexes from their high confidence PPI.


Regardless of the methods used to score the protein interactions or to organize the highly densely connected proteins into reproducible clusters, it is imperative to evaluate the performance of the method in predicting biologically meaningful PPI and protein complexes by means of gold standards in combination with scores for precision and sensitivity or recall. Precision is defined as the number of true positives (TP) (i.e., correct identification of genuine interactors) divided by the total number of identified interactors (i.e., true positives plus false positives (FP) or spurious interactors), or TP/(TP+FP). Sensitivity or recall is defined as the number of true positives identified divided by the total number of true positives that might have been identified (which includes the false negatives (FN), or genuine interactors that are missed), or TP/(TP+FN).
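The two definitions above translate directly into code. The following Python sketch computes precision and recall for a set of predicted interactions against a reference set. The representation of a pair as a `frozenset` and the toy data are assumptions made for illustration.

```python
def precision_recall(predicted, reference):
    """Return (precision, recall) for predicted vs. reference interaction sets."""
    tp = len(predicted & reference)   # genuine interactors that were identified
    fp = len(predicted - reference)   # spurious interactors
    fn = len(reference - predicted)   # genuine interactors that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def pair(a, b):
    """Order-independent protein pair (hypothetical helper)."""
    return frozenset((a, b))


# Toy example: 2 of 3 predictions are correct, 2 of 4 reference pairs are found.
predicted = {pair("A", "B"), pair("A", "C"), pair("A", "D")}
reference = {pair("A", "B"), pair("A", "C"), pair("B", "C"), pair("C", "D")}
precision, recall = precision_recall(predicted, reference)
```

In this toy case TP = 2, FP = 1 and FN = 2, so precision is 2/3 and recall is 1/2.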

Although imperfect (Jansen and Gerstein, 2004), the gold standard or reference set (i.e., representative data set drawn from literature curation of long standing biochemical information [for example, MIPS protein complex database]) coupled to precision-recall analysis (i.e., a computational approach to estimate the predictive accuracy of data from two classes given the class labels, referred to as positives and negatives) provides a unique way not only to compare one’s derived PPI with putative complexes available in the literature but also to assess directly the quality of preliminary proteomic datasets, which is a crucial step in developing and validating any new experimental technology. Precision and recall are calculated over a range of thresholds, and precision (Y-axis) is plotted as a function of recall (X-axis). Each threshold defines one point on the precision-recall curve by considering protein pairs whose association in the data exceeds the threshold value to be positive predictions and other pairs to be negative predictions, and the final dataset depends on the threshold cut off. Together, precision-recall analysis and reproducibility (i.e., the percentage of the interactions covered in repeated experiments) provide definitive measures to validate a new proteomic high-throughput experimental method. In the yeast large scale AP-MS study, reproducibility was evaluated at ~70% (Gavin et al., 2006).
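The construction of a precision-recall curve over a range of thresholds can be sketched as follows. This is a minimal illustration under stated assumptions: scored pairs are held in a dictionary, each distinct score is used as a threshold, and pairs scoring at or above the threshold are counted as positive predictions. The function name and toy scores are hypothetical.

```python
def pr_curve(scored_pairs, reference):
    """Return a list of (threshold, precision, recall) points.

    scored_pairs: {pair: score}; reference: set of gold-standard pairs.
    """
    points = []
    for threshold in sorted(set(scored_pairs.values()), reverse=True):
        positives = {p for p, s in scored_pairs.items() if s >= threshold}
        tp = len(positives & reference)
        precision = tp / len(positives)
        recall = tp / len(reference) if reference else 0.0
        points.append((threshold, precision, recall))
    return points


# Hypothetical scores: p1 and p3 are in the reference set, p5 is never predicted.
toy_scores = {"p1": 0.9, "p2": 0.8, "p3": 0.6, "p4": 0.2}
toy_reference = {"p1", "p3", "p5"}
curve = pr_curve(toy_scores, toy_reference)
```

Lowering the threshold moves along the curve toward higher recall, typically at the expense of precision, which is why the final dataset depends on the chosen cut-off.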

In the section below, I summarize the application of (T)AP-MS in some groundbreaking interactome studies that have been published in the last decade, with a particular emphasis on the large-scale identification of soluble protein complexes in conjunction with bottom-up protein shotgun sequencing by mass spectrometry (Link et al., 1999; Washburn et al., 2001).

1.2.1.2.1 Examples of large-scale studies of protein interactions in unicellular model organisms by AP-MS

To date, the most comprehensive physical interaction datasets generated based on direct experimental observations of affinity purified protein complexes were reported by our group in collaboration with the Greenblatt laboratory (Krogan et al., 2006), as well as by another competing team (Gavin et al., 2006), using the budding yeast as the model system.

Using the TAP procedure pioneered by the Seraphin group (Rigaut et al., 1999), Krogan and colleagues (Krogan et al., 2006) used a publicly available collection of 4,562 endogenous chromosomally ORF-TAP tagged fusion strains to define a high quality map of soluble yeast protein complexes. Each individual TAP-tagged fusion protein, along with its stably interacting partners, was extracted and affinity-purified from a 4L yeast culture. For the purification, an immunoglobulin G (IgG) column, which binds the protein A tag, was used first; then, following elution with TEV protease, a second calmodulin column was used in the presence of calcium, followed by elution with EGTA. From the 2,357 successful (bait identified) purifications (i.e., 52% success rate), a total of 4,087 proteins were identified by either MALDI-TOF MS or ESI-MS/MS. Bait-prey protein associations were deemed genuine if detected in at least two co-purification experiments, whereas 44 common contaminants detected in >3% of all purifications as well as the ribosomal proteins were removed from further consideration. To assign confidence scores to the interactions, the filtered MS data were subjected to machine learning procedures to prune spurious interactions. To this end, the authors trained and cross-validated their algorithms using a reference set (i.e., gold-standard) of manually-curated yeast protein complexes downloaded from the Munich Information Center for Protein Sequences (MIPS) database.

However, this gold standard was small, with only 68 stable complexes, and hence it may overestimate the performance. It is also not easy to define clearly which proteins interact within the same complex; there is always bias when deriving PPI from complexes.

Nevertheless, using an interaction probability score threshold of 0.273 (which corresponded to a median precision of 0.69), the authors identified 7,123 putative high confidence protein-protein interactions (i.e. bait-prey interactions using the so-called “spoke” model) among 2,708 proteins, covering roughly half the expressed proteome.

To de-convolute protein complexes from this interaction network, the Markov Clustering (MCL) algorithm (Enright et al., 2002) was used, which simulates random walks within graphs (wherein nodes are selected at random and a fixed number of PPI “edges” is crossed). Through an iterative process of many such walks, the algorithm splits the proteins into exclusive groups based on the relative flow across highly traversed regions, with high connectivity being a sign of clusters. Using expansion and inflation parameters that optimize overlap with the MIPS reference complexes (Mewes et al., 2004), the Krogan et al. study predicted a set of 547 non-overlapping clusters, of which over half had not been previously reported. Extensive bioinformatics analyses based on examining the semantic similarity (i.e., the tendency of a pair of proteins to be annotated with similar function or co-localization) and evolutionary co-conservation (proteins in a complex tend to co-evolve) of the components of the predicted protein complexes supported the overall reliability of the defined protein complexes. These types of functional association have also been used in the construction and evaluation of PPI in my study (Chapter 3).
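The expansion and inflation steps of MCL mentioned above can be illustrated with a toy, pure-Python sketch. It is a didactic simplification and not the published MCL software: the self-loops, the fixed iteration count, the numerical tolerance and the way clusters are read off the final matrix are assumptions, and the six-node graph is invented.

```python
def mcl(adjacency, inflation=2.0, iterations=50):
    """Toy Markov clustering on a symmetric adjacency matrix (list of lists)."""
    n = len(adjacency)

    def normalize(mat):
        # Rescale every column to sum to 1 (column-stochastic matrix).
        sums = [sum(mat[i][j] for i in range(n)) for j in range(n)]
        return [[mat[i][j] / sums[j] for j in range(n)] for i in range(n)]

    # Add self-loops so every node keeps some flow, then normalize.
    m = normalize([[adjacency[i][j] + (1.0 if i == j else 0.0)
                    for j in range(n)] for i in range(n)])
    for _ in range(iterations):
        # Expansion: squaring the matrix spreads flow along longer walks.
        m = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
        # Inflation: element-wise power strengthens strong flows, weakens weak ones.
        m = normalize([[value ** inflation for value in row] for row in m])

    # Rows that retain flow act as attractors; their non-zero columns form a cluster.
    clusters = {frozenset(j for j in range(n) if m[i][j] > 1e-6)
                for i in range(n) if m[i][i] > 1e-6}
    return sorted(sorted(cluster) for cluster in clusters)


# Invented graph: two triangles (0-1-2 and 3-4-5) joined by a single edge 2-3.
toy_graph = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
toy_clusters = mcl(toy_graph)
```

Higher inflation values tend to produce smaller, more numerous clusters, which is why the parameters are tuned against reference complexes in practice.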

A parallel genome-wide yeast TAP-MS study by Gavin and collaborators (Gavin et al., 2006) reported data of a similar scope, with 3,206 purified fusion proteins identified with a ~50% overall success rate. After SDS-PAGE separation and trypsin in-gel digestion of the purified polypeptides, the authors used matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) MS to identify interacting proteins. To discriminate contaminants from genuine interactions, they devised a so-called “socio-affinity index” filtering scheme that combined the spoke and matrix models based on the tendency of a given protein pair to consistently co-purify when relevant proteins are tagged, i.e., prey-prey interactions. With this scoring metric, the authors built a probabilistic PPI network with the socio-affinity indices as edge weights. The authors applied an iterative clustering approach and identified 491 protein complexes, of which 257 were new (i.e., not documented in the literature). As per Krogan et al., they also provided supporting functional evidence based on protein co-expression, co-localization, evolutionary co-conservation, known multi-protein structures and binary interaction data. Instead of using these types of functional genomic associations as evidence for assessing the quality of complexes, I have used them to prune spurious interactors in my studies (Chapter 3).

Remarkably, despite using the same standardized experimental approach, the two studies showed limited overlap both in terms of PPI and predicted protein complex membership, providing vastly different takes on the yeast interactome (Goll and Uetz, 2006). Later re-analyses of the raw data produced in both studies demonstrated that the major source of discrepancy arose during data processing (Collins et al., 2007; Hart et al., 2007; Pu et al., 2007), particularly the different approaches used to score the PPI networks and the different clustering algorithms applied to the networks, illustrating the need for careful interpretation and benchmarking of large-scale interaction data.

The Emili and Greenblatt groups adopted a smaller (8 kDa instead of 20 kDa) tandem sequential peptide affinity (SPA) tag (Zeghouf et al., 2004) to purify endogenous bacterial protein complexes from recombineered E. coli K-12 strains (Hu et al., 2009). Unlike the original TAP tag procedure, wherein up to 18% of C-terminally tagged essential proteins led to non-viable yeast, which was not solved by N-terminal tagging (Gavin et al., 2002), the SPA tags showed high success in purifying essential E. coli proteins (Babu et al., 2009; Butland et al., 2005), implying the tag was less likely to interfere with protein folding/function. Of the 1,476 E. coli strains expressing SPA-tagged fusions, 1,241 baits were successfully purified (i.e., ~84% success rate). To segregate non-specific binders from genuinely co-purifying pairs of proteins, the mass spectrometry data were again subjected to machine learning. As positives, the authors used low-throughput PPI curated in public PPI databases like DIP, BIND, and IntAct, while negatives were defined as pairs of proteins in different sub-cellular localizations. The final weighted network, with a minimum likelihood ≥ 0.75, consisted of 5,993 PPI among 1,757 proteins. As with Krogan et al. (Krogan et al., 2006), this high confidence network was partitioned by MCL, resulting in a map of 443 protein complexes, including 244 (55%) with at least one functionally uncharacterized E. coli protein. As independent validations, the authors showed that genes encoding putative interacting proteins had higher mutual information scores (i.e., scores for the co-absence/presence of two functionally related genes across genomes in phylogenetic profile studies, reflecting correlated phylogenetic patterns) and more highly correlated gene co-expression patterns. These high-throughput studies not only led to the discovery of hundreds of previously uncharacterized complexes and components, but also shed light on the global molecular organization of eukaryotic and prokaryotic cells. Apart from the computational analysis of PPI data, a real logistical challenge prevents the easy application of the (T)AP-MS method to higher eukaryotes on a global scale. While tagged cDNA expression constructs can be stably or transiently expressed in cultured mammalian cell lines to isolate protein complexes (Behrends et al., 2010; Ewing et al., 2007; Hutchins et al., 2010; Mak et al., 2010; Sardiu et al., 2008; Sowa et al., 2009), an advantage of the yeast and E. coli studies is that, with facile genetic methods and far fewer ORFs than mammalian cells, all genes could be efficiently tagged at their chromosomal loci using homologous recombination and therefore be expressed at native levels using endogenous promoters. Nevertheless, as I summarize below, systematic AP-MS studies have recently been reported for multi-cellular organisms.

1.2.1.2.2 Examples of AP-MS surveys in higher eukaryotes

Five years after the two global yeast AP-MS studies, a comparable genome-scale AP-MS study cataloging the compositions of affinity-purified protein complexes in the metazoan Drosophila was reported (Guruharsha et al., 2011). In this effort, Guruharsha and colleagues used an inducible metallothionein promoter to transiently express 4,927 C-terminally FLAG-HA-epitope tagged fly fusion proteins in cultured Schneider cell lines (S2R+). Cell-free extracts were then subjected to one-step (anti-HA resin) affinity purification, for which 3,488 baits produced positive LC-MS/MS results. To eliminate nonspecific interactions, the authors devised and benchmarked a new statistical metric, termed the HyperGeometric Spectral Counts score (HGSCore). This score calculated the probability of co-occurrence of interacting protein pairs based on a hypergeometric distribution error model (Hart et al., 2007), which was modified to take into account the normalized spectral count information (i.e., number of times peptides matching to a given protein were identified in a particular sample) to estimate protein abundance.


They then used the hypergeometric function to calculate the significance of co-occurrence of protein pairs (assuming genuine interactors tend to co-appear across relevant experiments whereas contaminants do not). Imposing a modest p-value cut-off of 5%, they defined 10,969 high-confidence interactions involving 2,297 Drosophila proteins. Applying MCL, they predicted 556 putative complexes consisting of 2,240 proteins. To validate their predictions, they compared, via orthology, their complexes to curated human protein complexes reported in the CORUM (Ruepp et al., 2009) and REACTOME (Croft et al., 2011; Matthews et al., 2009) databases and high confidence yeast complexes (Pu et al., 2007; Pu et al., 2009). Good agreement was found for highly conserved complexes, such as the proteasome and translation factors, amongst others. As in yeast, they also observed that genes encoding subunits of the same complex tend to be co-expressed and share the same gene ontology annotations. Finally, they tagged a selected number of conserved Drosophila proteins and expressed them in human HEK293 cells as additional cross-species validations. Immunoprecipitation-mass spectrometry confirmed 65% of the selected protein interactions. The key issue with this study is the lack of a gold standard to estimate the performance of this approach. The authors relied on their statistical simulation and the agreement of their clusters with conserved, highly abundant human and yeast protein complexes (e.g. proteasome, prefoldin) to establish the reliability of their data. Such conserved complexes and many others reported in CORUM should have been used as references to objectively assess the performance of the method, as well as the correspondence between their data and the literature.
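The hypergeometric test of co-occurrence used in the Drosophila study described above can be illustrated in its simplest presence/absence form. This sketch is not the HGSCore itself, which additionally weights by normalized spectral counts; the function name and the toy numbers are assumptions.

```python
from math import comb


def cooccurrence_pvalue(n_experiments, n_with_a, n_with_b, n_with_both):
    """P(overlap >= n_with_both) if proteins A and B appeared independently.

    n_experiments: total number of purifications
    n_with_a, n_with_b: purifications in which A (or B) was detected
    n_with_both: purifications in which both were detected
    """
    upper = min(n_with_a, n_with_b)
    total = comb(n_experiments, n_with_b)
    # Upper tail of the hypergeometric distribution.
    tail = sum(comb(n_with_a, k) * comb(n_experiments - n_with_a, n_with_b - k)
               for k in range(n_with_both, upper + 1))
    return tail / total


# Toy case: two proteins each seen in 3 of 10 purifications, always together.
toy_p = cooccurrence_pvalue(10, 3, 3, 3)
```

Here only 1 of the 120 possible ways of placing protein B in 3 of 10 purifications gives full overlap with protein A, so the toy p-value is 1/120, well below a 5% cut-off.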

The first large-scale human protein AP-MS study reported PPI linked to 338 C-terminally FLAG-tagged disease-associated bait proteins isolated using one-step affinity purification followed by gel-based LC-MS/MS analysis (Ewing et al., 2007). In total, 1,034 individual AP-MS experiments were performed, consisting of replicate analyses of 408 baits transiently over-expressed in Human Embryonic Kidney 293 cells (HEK293) and 202 control immunoprecipitation experiments (i.e., empty vectors). Empirical filtering based on the control experiments was then used to eliminate non-specific contaminants (frequent flyers), such as tubulin, and ribosomal and heat shock proteins. Finally, a computational model to score and rank interactions based on prey protein reproducibility was applied. The authors observed that most associations between components of well-known complexes identified by two or more unique peptides occurred with scores >0.3. Using this threshold as a cut-off, a set of 6,463 protein interactions between 2,235 distinct proteins was defined. Unfortunately, no partitioning algorithm was applied to de-convolute this network into potential protein complexes.

In another more recent large-scale targeted study, this time involving the identification of protein complexes potentially implicated in mitosis, Hutchins and collaborators (Hutchins et al., 2010) minimized protein over-expression artifacts by expressing the tagged protein from bacterial artificial chromosomes (BACs) with endogenous regulatory elements (Poser et al., 2008). In this LAP-tag (localization and affinity tag) technique, the authors fused green fluorescent protein (GFP) and S-peptide at the C-terminus or N-terminus (for cases where C-terminal tags failed to produce viable clones) of mouse genes of interest that are conserved in human. The bait proteins were stably expressed in HeLa cells arrested in mitosis with nocodazole. The bait proteins and interactors were purified from cell extracts using two-step affinity purifications with anti-GFP antibody in the first round followed by binding and elution from an S-protein matrix. The purification products were then analyzed by LC-MS/MS using high resolution LTQ-Fourier transform instruments. Of the 254 baits attempted, 239 were identified along with 936 candidate prey proteins (excluding frequent flyer proteins identified in experiments where the bait went undetected). To identify complexes, they created an interaction graph based on a matrix model in which the edge weights reflected the number of shared baits (i.e. a higher number of shared baits indicated a higher likelihood of being associated in the same complex). They then partitioned their network using a clustering algorithm called “Spectral fuzzy C-means” (SFCM) that, in contrast to standard implementations of MCL, allowed protein assignment to more than one complex. Of the 107 identified clusters with 2 to 20 components, only 11 matched closely to a reference complex with an average precision of 59% (i.e. fraction of cluster components assigned to the same grouping) and an average recall of 89% (i.e. fraction of reference complex subunits assigned to the same cluster).

The Hutchins et al. study offers several notable advantages over the other human AP-MS studies noted above. First, protein over-expression was minimized by tagging mouse orthologs following RNA interference knockdown of the endogenous human protein, with the caveat that not all proteins are conserved between human and rodents. Second, the use of GFP in the tag also permitted in vivo protein localization by fluorescence microscopy. However, as noted before, epitope tags can interfere with biological function. Alternating the tag at either the N- or C-terminus makes the whole procedure more time-consuming and costly for large scale studies of the roughly 10,000 proteins predicted to be expressed in a single cell type (Nagaraj et al., 2011). These technical issues limit the application of AP-MS to more focused studies of selected biological systems, such as those the Emili and Greenblatt laboratories are currently conducting to explore the chromatin-related protein machinery that is involved in epigenetic regulation.

Although remarkable examples of progress have been registered in mapping subsets of human protein interactions through the focused application of AP-MS, information about endogenous protein complexes and protein interactions that exist in a native, unperturbed system (i.e., proteins expressed in a native cellular environment) is still lacking for the most part. To address this challenge, innovative high-throughput methods are required.

One proposed strategy is the scaling up of traditional immunoprecipitation experiments based on the use of large collections of commercially available antibodies against hundreds of endogenous proteins followed by MS identification of immunoprecipitated protein species (Malovannaya et al., 2011; Malovannaya et al., 2010). The sharing of epitopes, leading to cross-reactivity, is even treated as an informative feature in IP/MS screening studies in that the goal is to document consistently co-purifying sets of proteins, regardless of the actual target. For example, Qin and colleagues (Malovannaya et al., 2011) used 1,796 different primary antibodies in duplicate IP experiments to isolate endogenous human coregulator protein complexes from HeLa cell nuclear extracts. For each IP/MS experiment, ~1 mL of nuclear extract (~10 mg proteins) was quickly precipitated to preserve weak interactions prior to LC-MS/MS. The authors routinely identified 100-300 proteins (both specific and non-specific) per IP experiment. From the 3,290 MS analyses, the authors identified 11,485 unique human gene products, with more than half found redundantly using different antibodies. To assign proteins to complexes, the authors first manually removed common ribosomal, cytoskeletal and heat shock proteins and then applied an iterative bait-independent method they called Near Neighbor Network analysis (Malovannaya et al., 2010) to assess the co-occurrence of interacting pairs of proteins. For this analysis, candidate interactions were deemed genuine if the protein pairs co-purified at least three times with a particular seed (selected protein with higher spectral counts) and at least two independent antibodies. To segregate near-neighbors, they calculated a cosine similarity metric as a proxy for interaction proximity, based on the assumption that ratios between complex components across different experiments should be similar. Using an ad hoc score cut off, they were able to predict 486 minimal endogenous core complex "modules" (sets of proteins with inter-dependent stoichiometries) (MEMO) centered on the transcription apparatus. Limited RNA interference and AP-MS experiments were performed to validate one newly identified complex. The MEMO modules showed enrichment for annotated complexes, but low overall coverage (~15%) relative to the CORUM curation database.
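The cosine similarity metric mentioned above can be sketched as follows. The vectors are assumed to hold the spectral counts of two proteins across the same series of IP experiments; the function name and the toy counts are hypothetical and do not reproduce the Near Neighbor Network implementation.

```python
from math import sqrt


def cosine_similarity(u, v):
    """Cosine of the angle between two abundance vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical spectral counts across four IP experiments.
subunit_x = [10, 20, 0, 5]
subunit_y = [20, 40, 0, 10]   # same ratios as subunit_x, twice as abundant
unrelated = [0, 0, 15, 0]     # detected only where subunit_x is absent

same_complex_score = cosine_similarity(subunit_x, subunit_y)
unrelated_score = cosine_similarity(subunit_x, unrelated)
```

Because the metric depends only on the ratios between experiments and not on absolute abundance, two subunits with a constant stoichiometry score close to 1 even when one is detected with many more spectra than the other.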

1.2.2 Low-throughput experimental approaches amenable to proteome scale-up

In the pre-proteomic era, many notable discoveries of core macromolecular assemblies involved in critical biological processes, such as transcription (e.g. RNA polymerase), translation (e.g. ribosome), and protein degradation (e.g. proteasome), were achieved based on biochemical fractionation to isolate cellular constituents with assayable properties (e.g. enzymatic activity). In such studies, a protein complex of interest was typically purified to (near) homogeneity using two or more consecutive fractionation steps, usually including sub-cellular fractionation by ultracentrifugation followed by classical forms (ion-exchange, hydrophobic, hydroxylapatite, affinity, or size exclusion) of chromatography. The separation of eukaryotic RNA polymerases I, II, and III by chromatographic anion-exchange fractionation of sea urchin embryo nuclear extracts (Roeder and Rutter, 1969) and the identification of the aminoacyl-transfer RNA multisynthetase complex from rat liver (Bandyopadhyay and Deutscher, 1971) illustrate just two of the many success stories in the pre-proteomic era. However, in principle, it is not necessary to purify macromolecular complexes to absolute homogeneity in order to identify their components.

Given the rapid proliferation of MS-based protein identification techniques since the 1990s, the combination of classical biochemical co-fractionation with tandem mass spectrometry-based protein profiling has emerged as a means to ascertain complex membership by monitoring the co-fractionation of stably associated proteins in a sensitive and unbiased fashion (Andersen et al., 2003; Dong et al., 2008; Foster et al., 2006; Hartman et al., 2007; Mosley et al., 2009).


For example, Andersen et al. used a traditional, single-step sucrose gradient fractionation to enrich human centrosome associated proteins, whose components they subsequently identified by monitoring of peptide co-elution profiles by high resolution-MS quantification (Andersen et al., 2003). Mann and coworkers (Foster et al., 2006) elegantly extended this technique to map the enrichment of 1,404 proteins in 10 sub-cellular compartments in mouse liver, while Mosley et al. adopted a similar co-fractionation approach based on spectral counting to assess the association of proteins that populate the nucleus in budding yeast (Mosley et al., 2009). While the latter study was limited to known transcription regulators, they found that some components of known complexes had multiple peaks in their sucrose gradient fractions and concluded that these proteins may be involved in multiple complexes. However, given the poor resolution of sedimentation methods, I suspect that this may also be pure chance co-elution and further validation may be needed. I discuss the coincidental co-fractionation issue in Chapters 2 and 3 of this Thesis.

In a similar vein, Hartman et al. used stable isotope-coded affinity tags (based on chemical labeling of cysteine residues before protein digestion) as a quantitative measure to correlate protein profiles obtained by co-sedimentation to survey mitochondrial membrane protein complexes in Arabidopsis thaliana (Hartman et al., 2007). The use of isotope labeling was likewise adopted by Dong and colleagues (Dong et al., 2008) in their “tagless” pilot study, which applied multi-column fractionation to 13 endogenous E. coli protein complexes. In this study, they resolved proteins sequentially on anion-exchange and gel-filtration columns prior to MS detection. Based on a subset of analyzed fractions, the authors estimated that about 50% of E. coli protein complexes could survive the multiple chromatographic steps. While the stability of protein complexes to dilution may not be an issue, because the biomass could be easily scaled up, multiple column chromatographic fractionation requires considerable resources and a highly automated environment to handle the thousands of fractions generated, and thus may not be suitable for many research laboratory settings. In Chapter 2 of my Thesis, I propose a novel and generic “Fractionomics-Profiling-MS” technology that can be applied in any modern laboratory proteomic setting. The number of fractions is minimized and no sample labeling is required in order to attain reasonable resolution in isolating and identifying soluble stable complexes.

More recently, as I detail in Chapter 2, I have shown that non-denaturing high performance ion-exchange fractionation can offer much higher resolution isolation of human proteins from cultured cell extracts (Havugimana et al., 2006; Havugimana et al., 2007) and can be exploited to monitor the composition of stable, soluble protein complexes in a highly parallel manner. This approach offers two advantages over the existing methods in defining the interactome of human cells. First, there is no need to tag and label individual proteins, so that interactions are assayed in a near physiological context. Second, sample generation is greatly simplified, as a single sample of ~10^8 to 10^9 cells suffices to interrogate the human interactome. Critically, the extensive purification to near homogeneity of selected protein complexes is not a requirement, nor are specialized affinity reagents like antibodies. In principle, my approach provides a potentially unbiased and cost-effective solution to the systematic mapping of the networks of protein complexes present in diverse cell types and model organisms.


1.3 Computational methods for prediction and analysis of protein complexes

Given the incompleteness of the mapped human protein interactome and the limitations of the currently available experimental methods, various computational approaches have been developed either to predict human protein interactions or to aid in the visualization and critical assessment of existing physical interaction networks. Some notable efforts in this direction are summarized below. Each method has its own merits and shortcomings depending on the features used to predict associations. For example, protein interactions and complexes can be predicted based on orthogonal genetic information such as gene co-evolution (Lehner and Fraser, 2004; Tillier and Charlebois, 2009), gene ontology properties (especially biological process and protein co-localization annotations), gene co-expression (Zanivan et al., 2007), text mining, or a combination of all of these features (Ramani et al., 2005).

To achieve more accurate predictions, experimental binary physical interactions can also be integrated with functional association features. For example, based on the knowledge that interacting proteins tend to co-evolve and to be co-expressed, Ramani et al. used a machine learning approach to compare human mRNA co-expression patterns with those of orthologous genes in five other eukaryotes (fly, mouse, plant, worm and yeast) to derive a set of 7,000 putative physical associations among 2,348 human proteins (Ramani et al., 2008) via calculation of Pearson correlation coefficients for pairs of human genes, as well as for their corresponding orthologs.
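The Pearson correlation coefficient underlying such co-expression comparisons can be sketched as follows. The expression profiles are invented for illustration, and the `pearson` helper is a generic textbook implementation assumed here, not the pipeline used by Ramani et al.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile has no defined correlation; treat as uninformative
    return cov / math.sqrt(var_x * var_y)

# Hypothetical mRNA expression levels across five conditions.
gene_1 = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_2 = [2.0, 4.0, 6.0, 8.0, 10.0]   # rises together with gene_1
gene_3 = [5.0, 4.0, 3.0, 2.0, 1.0]    # falls as gene_1 rises

r_coexpressed = pearson(gene_1, gene_2)
r_anticorrelated = pearson(gene_1, gene_3)
print(r_coexpressed, r_anticorrelated)
```

In a cross-species setting, the same function would be applied to the human gene pair and to each pair of orthologs, so that only associations with consistently high correlation are retained.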

Likewise, our colleagues in the Marcotte laboratory used a machine learning approach based on Bayesian integration of the 21 features shown in Table 1-1 to construct a probabilistic functional network, which they termed HumanNet, to predict 476,399 pairwise functional associations for 18,714 validated human protein-encoding genes (Lee et al., 2011). As will become apparent in Chapter 3, a portion of these data were invaluable in the studies reported in this Thesis as filters to improve HPLC co-elution based inferences of the composition of human protein complexes.

Table 1-1: List of data types used in the construction of the HumanNet resource.

This table contains most of the types of features used to predict high confidence human PPI.

CODE    Organism          Type of data
CE-CC   C. elegans        Co-citation of worm genes
CE-CX   C. elegans        Co-expression among worm genes
CE-GT   C. elegans        Worm genetic interactions
CE-LC   C. elegans        Literature curated worm protein physical interactions
CE-YH   C. elegans        High-throughput yeast 2-hybrid assays among worm genes
DM-PI   D. melanogaster   Fly protein physical interactions
HS-CC   H. sapiens        Co-citation of human genes
HS-CX   H. sapiens        Co-expression among human genes
HS-DC   H. sapiens        Co-occurrence of domains among human proteins
HS-GN   H. sapiens        Gene neighborhoods of bacterial and archaeal orthologs of human genes
HS-LC   H. sapiens        Literature curated human protein physical interactions
HS-MS   H. sapiens        Human protein complexes from AP-MS
HS-PG   H. sapiens        Co-inheritance of bacterial and archaeal orthologs of human genes
HS-YH   H. sapiens        High-throughput yeast 2-hybrid assays among human genes
SC-CC   S. cerevisiae     Co-citation of yeast genes
SC-CX   S. cerevisiae     Co-expression among yeast genes
SC-GT   S. cerevisiae     Yeast genetic interactions
SC-LC   S. cerevisiae     Literature curated yeast protein physical interactions
SC-MS   S. cerevisiae     Yeast protein complexes from AP-MS
SC-TS   S. cerevisiae     Yeast protein interactions inferred from tertiary structures of complexes
SC-YH   S. cerevisiae     High-throughput yeast 2-hybrid assays among yeast genes


To organize the interactions inferred from functional and physical associations into functional modules such as protein complexes, a variety of clustering techniques have been devised [see (Li et al., 2010) for a detailed review]. Protein-protein interaction (PPI) networks are often represented as graphs, in which the nodes represent proteins while the edges represent physical interactions. Edges are undirected and are often assigned a weight based on a measure of interaction reliability. Detecting a protein complex then becomes a matter of finding densely connected regions in the interconnected graph. Among the popular unsupervised clustering approaches used to identify densely connected regions from a graph (King et al., 2004; Spirin and Mirny, 2003), the Markov clustering (MCL) algorithm is a preferred choice since it has been shown to accurately extract complexes from diverse PPI networks (Brohee and van Helden, 2006; Pu et al., 2007). However, standard MCL implementations do not fully exploit the information provided in confidence scores and do not provide information about component sharing between multiple complexes, which may be biologically quite pervasive. Such a caveat was highlighted in the Krogan et al. study, where MCL could not differentiate components of RNA polymerase I from RNA polymerase III. Novel clustering approaches able to identify overlapping protein complexes, while taking into account interaction probabilities, have been devised. One notable example is ClusterONE (Nepusz et al., 2012), which I used to interpret the human interaction data that I generated, as highlighted in Chapter 3 of this Thesis.
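To make the graph representation concrete, the sketch below stores a small weighted, undirected PPI network and scores how densely a candidate group of proteins is connected. The proteins, confidence weights and the `weighted_density` score are hypothetical illustrations; this is neither MCL nor ClusterONE, only the kind of density measure such algorithms seek to maximize.

```python
from itertools import combinations

# Hypothetical weighted, undirected edges: unordered protein pair -> confidence score.
edges = {
    frozenset(("A", "B")): 0.9,
    frozenset(("A", "C")): 0.8,
    frozenset(("B", "C")): 0.7,
    frozenset(("C", "D")): 0.2,
}

def weighted_density(nodes, edges):
    """Sum of edge weights inside `nodes` divided by the number of possible pairs."""
    pairs = list(combinations(sorted(nodes), 2))
    if not pairs:
        return 0.0
    total = sum(edges.get(frozenset(pair), 0.0) for pair in pairs)
    return total / len(pairs)

dense = weighted_density({"A", "B", "C"}, edges)    # fully connected, high-confidence triangle
sparse = weighted_density({"A", "B", "D"}, edges)   # only the A-B edge falls inside this group
print(f"A-B-C: {dense:.2f}, A-B-D: {sparse:.2f}")
```

Using `frozenset` keys keeps the edges undirected, so the lookup does not depend on the order in which a pair of proteins is listed.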

To ascertain that computed protein clusters are biologically relevant (i.e., not obtained by random chance), the predicted complexes are closely evaluated against different functional criteria. For example, by comparing the predicted complexes with Gene Ontology annotations (Ashburner et al., 2000), one should expect to see proteins in the same cluster being co-localized in the same cellular component or performing similar functions, whereas proteins in random clusters will not. Additionally, the observation that pairs of proteins in the same complex exhibit the same phenotype when subjected to perturbation by RNA interference is a good indicator of the quality of a dataset of predicted protein complexes. To assess the statistical significance of enrichment for a given functional category or phenotype, the standard hypergeometric distribution function is often used to calculate the p-value (Hart et al., 2007; Tavazoie et al., 1999; Zanivan et al., 2007), with low probability values indicating highly enriched features.
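The hypergeometric enrichment test can be written out as a short sketch. The function name and the example numbers are illustrative assumptions; the formula is the standard upper-tail probability of drawing at least the observed number of annotated proteins when sampling a cluster without replacement.

```python
from math import comb

def hypergeom_enrichment_pvalue(population, annotated, cluster_size, hits):
    """P(X >= hits) for a cluster of `cluster_size` proteins drawn without replacement
    from `population` proteins, of which `annotated` carry the category of interest."""
    upper = min(annotated, cluster_size)
    total = comb(population, cluster_size)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, cluster_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / total

# Hypothetical example: 20 proteins in total, 5 annotated to a category,
# and a predicted complex of 4 proteins that all carry the annotation.
p_value = hypergeom_enrichment_pvalue(population=20, annotated=5, cluster_size=4, hits=4)
print(f"p = {p_value:.6f}")
```

In practice the p-values obtained for many clusters and categories would also need a multiple-testing correction, which is outside the scope of this sketch.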

To facilitate and guide scientists in retrieving relevant biological information and generating testable hypotheses from complex biological datasets, a variety of software platforms have been developed to visualize, amalgamate, and analyze interaction networks. Among the most popular, publicly available tools is Cytoscape (Shannon et al., 2003), which is supported by a large collection of plug-in tools that facilitate the statistical analysis and organization of large scale biological datasets. I have used this particular software to visualize a human protein complex map, which I experimentally derive and present in Chapter 3.

1.4 Repositories of experimentally derived PPI and protein complex data

To facilitate community access to the large amount of valuable human protein interaction data in the literature, different research groups and consortia have constructed database repositories and made them available in the public domain. These include the Biological General Repository for Interaction Datasets (BioGRID) (Stark et al., 2011), the Human Protein Reference Database (HPRD) (Prasad et al., 2009), the IntAct protein interaction database (Kerrien et al., 2011), the REACTOME database (Joshi-Tope et al., 2005), the Comprehensive Resource of Mammalian protein complexes (CORUM database) (Ruepp et al., 2009) and the Proteins Interacting in the Nucleus database (PINdb) (Luc and Tempst, 2004). Human protein interactions and protein complexes reported in these databases have been used extensively for benchmarking purposes in this Thesis (see Chapter 3) to assess the reliability and coverage of the human protein interactions and protein complexes I predict during the course of my Thesis. It is worth emphasizing, though, that these curated interactions were generated using a wide range of experimental (and even non-experimental) approaches conducted using different cellular models and tissues.

1.5 Goal and purposes of the present thesis project

The above investigations show that, while tremendous progress has been made in mapping, interpreting and visualizing protein-protein interactions and protein complexes in a comprehensive manner in simple model organisms such as yeast, including ongoing efforts by our group and many others, only a far more limited outline has been experimentally delineated to date for mammalian cells. Nevertheless, these studies serve as models that clearly highlight the power of systematic approaches for documenting protein complexes and protein-protein interactions in a highly reliable, informative manner. Notwithstanding the limitations of existing proteome-scale methods, such as AP-MS, it is likely that the current gaps in our understanding of the human protein interactome will be surmounted within the next decade. To help close these gaps sooner, I have focused my PhD research project on developing and applying an innovative approach for the systematic enrichment and characterization of soluble stably-associated human protein complexes and protein-protein interactions under near native physiological conditions in cultured cell lines as a way to accelerate understanding of the molecular organization of human cells.

In Chapter 2, I describe the development of a generic and effective 'guilt-by-co-elution' profiling strategy based on native biochemical fractionation by high resolution HPLC of stable protein complexes, which were subsequently identified by highly sensitive LC-MS/MS technology, as a means of interrogating the composition of large numbers of protein complexes in a parallel manner.

In Chapter 3, I describe how, in collaboration with computational biologists, we applied machine learning procedures to integrate the biochemical profiles I generated for HeLa and HEK293 cells with other functional association evidence to filter out physical protein interactions (Chapter 2) and construct a high-confidence network of human protein-protein interactions from which we derived a global map of putative human protein complexes.

In Chapter 4, I summarize the main conclusions and outcomes derived from the collective data presented in this Thesis project and present my thoughts on the current status and promising future research directions for my co-fractionation approach.


Chapter 2 Development of a High-Throughput Global Profiling Method Based on High Resolution Ion-Exchange High Performance Liquid Chromatography for Proteomic-Scale Analyses of Native Stable Soluble Protein Complexes

Portions of this chapter have been reprinted or adapted from (Havugimana et al., 2006; Havugimana et al., 2007).

I did all of the experiments presented in this Chapter.

Prof. Andrew Emili supervised the project and advised on the experimentation.


2 Development of a high-throughput global profiling method based on high resolution ion-exchange high performance liquid chromatography for proteomic-scale analyses of native stable soluble protein complexes

2.1 Introduction

Biological systems often depend on stable physical associations between two or more proteins to form macromolecular “machines” that perform the various activities underlying cell homeostasis, growth and proliferation (Alberts, 1998). Regardless of the physiological context, a typical eukaryotic cell contains a wide diversity of heteromeric protein complexes, composed of different components, which localize to different sub-cellular compartments. Over the past few decades, many soluble cytoplasmic and nuclear protein complexes involved in cell processes ranging from gene expression and protein synthesis to chromosome dynamics have been purified. Experimental characterization of the subunit composition of these complexes has provided mechanistic insights into the modular nature of biological systems (Hartwell et al., 1999), accelerated understanding of the functional organization of healthy and diseased cells at the molecular level (Vidal et al., 2011), and facilitated the assignment of functional annotations to previously uncharacterized proteins via guilt-by-association (Hu et al., 2009; Oliver, 2000).

Public repositories have been developed to curate annotated human protein complexes (Prasad et al., 2009; Ruepp et al., 2009) while bioinformatics analyses based on sequence orthology and other computational tools have led to predictions of additional annotations (Lee et al., 2011).

Yet, despite considerable progress in the comprehensive characterization of protein complexes in microbes (Butland et al., 2005; Gavin et al., 2006; Krogan et al., 2006; Kuhner et al., 2009), only a subset of the protein assemblies present in any one particular human cell-type has been documented to date. Indeed, existing experimentally verified physical interactions reported in a public repository of mammalian complexes (Ruepp et al., 2009) are estimated to cover less than one fifth of the predicted encoded proteins of the human genome.

Protein affinity purification coupled to mass spectrometry (AP-MS) is widely regarded as a cornerstone technology for the systematic isolation and identification of the subunit composition of stable complexes (Gingras et al., 2007; Kocher and Superti-Furga, 2007; Musso et al., 2007) and, more recently, of transient complexes (Malovannaya et al., 2011; Malovannaya et al., 2010). However, selective co-immunoprecipitation depends critically on access to high quality antibodies, which remain scarce and/or expensive. On the other hand, while modern recombinant DNA procedures (recombineering) have facilitated the expression of tagged protein “baits” in higher eukaryotes (Behrends et al., 2010; Bouwmeester et al., 2004; Goudreault et al., 2009; Guruharsha et al., 2011; Hutchins et al., 2010; Mak et al., 2010; Sowa et al., 2009), sequence-verified cDNAs encoding cell-specific protein variants are not always available, and artifacts can result from protein tagging or over-expression. Moreover, scaling up AP-MS to investigate thousands of proteins remains impractical, precluding any comprehensive survey of stable protein complexes in a human cell-type.

Traditional biochemical fractionation procedures based on conventional chromatography or co-sedimentation (fractionation based on size and shape) have been used extensively in the past to isolate protein complexes exhibiting assayable properties of interest such as enzymatic activity (Roeder and Rutter, 1969). However, progress in liquid chromatography-tandem mass spectrometry (LC-MS/MS) protein detection makes it possible to identify and monitor the relative abundance of thousands of proteins in fractionated biological mixtures and to infer complex membership based on co-purification of two or more proteins through different separation procedures such as co-sedimentation through gradient centrifugation (Andersen et al., 2003; Foster et al., 2006; Mosley et al., 2009), co-migration through blue native-PAGE (Wessels et al., 2009), co-fractionation through size exclusion chromatography (Olinares et al., 2010), and/or multiple orthogonal chromatography steps (Dong et al., 2008). Despite the various advantages of these approaches over the existing affinity-based or antibody-based purification procedures, the combination of high resolution chromatography, such as ion-exchange high-performance liquid chromatography, with MS to study protein complexes on a genome-wide scale has never been attempted.

In this work, I have applied this concept on a large scale, using exhaustive LC-MS/MS shotgun profiling of nuclear and cytoplasmic protein extracts from cultured human cells fractionated by non-denaturing high performance ion-exchange chromatography (IEX-HPLC) to determine the composition of stable, native soluble protein complexes. To benchmark this approach, I evaluated the detection of 20 well-defined human soluble protein complexes as well as the overall proteome coverage attained using various ion exchange column arrangements that I extensively optimized (Havugimana et al., 2006; Havugimana et al., 2007). The arrangements examined were single phase weak anion-exchange (WAX), single phase weak cation-exchange (WCX), single phase heparin affinity column (HAC), dual phase column set-up (DCS; WAX in series with WCX), dual phase heparin mixed-bed IEX (HCW), and triple phase column set-up (TCS; WAX-WAX-WCX) fractionation procedures, which were tested with the aim of increasing the resolution and improving the detection of lower abundance protein complexes. Optimal fractionation was obtained using a triple phase IEX column arrangement for cytoplasmic extracts, whereas heparin (DNA mimetic) affinity chromatography coupled to mixed-bed IEX chromatography produced the best results for nuclear extracts.


2.2 Materials and Methods

2.2.1 Cell lines and cell-free extract preparation

HeLa S3 and HEK 293 cell-free extracts (nuclear and cytoplasmic extracts) prepared under non-denaturing conditions were obtained from Paragon Bioservices (MD, USA). Prior to use for HPLC fractionation, target cell extracts were treated with Benzonase (100 units/mL) to remove nucleic acids and clarified by centrifugation (14,000 rpm, 10 min, 4 °C) to remove insoluble debris.

2.2.2 HPLC columns, buffers, and instrumentation

IEX chromatography columns (weak anion-exchange PolyWAX LP; weak cation-exchange PolyCAT A; mixed-bed PolyCATWAX50/50 columns) were purchased from PolyLC Inc (MD, USA). A TSKgel Heparin-5PW affinity column was obtained from Tosoh Bioscience LLC (PA, USA). The buffer systems were freshly prepared with HPLC grade H2O and comprised low salt buffer A [10 mM Tris-HCl, pH 7.6, 3 mM NaN3, 0.5 mM DTT, 5% glycerol] and high salt buffer B [buffer A + 1.5 M NaCl]. I performed all the HPLC fractionations using an Agilent 1100 HPLC binary pump system (Agilent Technologies, ON, Canada), essentially as described below and as published elsewhere (Havugimana et al., 2006; Havugimana et al., 2007). Protein elution was monitored by absorption at 280 nm.

2.2.3 Single phase weak cation-exchange fractionation of HeLa cytosolic extracts

A total of ~2.0-3.0 mg soluble protein from HeLa S3 cytosolic extract was applied to a PolyCAT A column (200 x 4.6 mm i.d., 5 μm, 1000 Å) equilibrated with buffer A. Elution of bound proteins was achieved through application of a 30-min gradient from 0 to 50% buffer B, with a final 2-min gradient of 50-100% buffer B applied to elute tightly bound proteins. Buffer B was maintained at 100% for an additional 2 min before returning to 0% buffer B over 2 min, followed by re-equilibration of the column for 3 min. A total of 45 × 1.2-ml fractions were automatically collected using a flow rate of 1.2 ml/min. Protein was precipitated with 10% trichloroacetic acid (TCA) overnight at 4 °C, digested with one μg sequencing grade trypsin (Roche, Mississauga, Canada) and the peptides analyzed by electrospray ionization LC-MS/MS using an LTQ linear ion-trap instrument operated in data-dependent mode as described below.

2.2.4 Single phase weak anion-exchange fractionation of HeLa cytosolic extracts

A total of ~2.0-3.0 mg soluble protein from HeLa S3 cytosolic extract was applied to a PolyWAX LP column (200 x 4.6 mm i.d., 5 μm, 1000 Å) equilibrated with buffer A. Elution of bound proteins and collection of fractions was achieved as described above. Protein was precipitated with 10% TCA overnight at 4 °C prior to trypsin digestion and analysis by LC-MS/MS.

2.2.5 Dual phase WAX and WCX fractionation of HeLa cytosolic extracts

The dual-column set-up consisted of a tandemly connected arrangement of WAX-CAT columns (PolyWAX LP, 200 x 4.6 mm i.d., 5 μm, 1000 Å; PolyCAT A, 20 x 4.6 mm i.d., 5 μm, 1000 Å). The two-stage column system was protected from clogging with a one cm weak anion exchange pre-column guard cartridge. The entire column enclosure compartment was cooled to 17 °C while the other bays were chilled to 4 °C to minimize sample degradation. Injections were typically ~2.0-3.0 mg total protein loaded per run. Elution was achieved using a multi-step gradient, consisting of six transitions with increasing proportions of buffer B: (step 1; equilibration) 0%B, 0-8 min; (step 2; salt gradient) 0-45%B, 8-38 min; (step 3; high salt rinse) 45-100%B, 38-58 min; (step 4; high salt wash) 100%B, 58-66 min; (step 5; restoration) 100-0%B, 66-68 min; and, lastly, (step 6; re-equilibration) 0%B, 68-76 min. The column mobile phase flow rate was fixed at 1 ml/min. The chromatograms were monitored at 280 nm and timed fractions collected using an automated fraction collector cooled to 4 °C. A total of 100 fractions were collected per run, with one fraction isolated per ~2.9 min (each ~0.7 ml in volume for two successive injections).
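The six-step elution program for the dual phase set-up can be expressed as a piecewise-linear function of run time, which is convenient for converting an elution time into an approximate salt concentration. The sketch below is illustrative only: the function names are hypothetical, linear ramps between the stated breakpoints are assumed, and the NaCl estimate assumes that all NaCl comes from buffer B (buffer A + 1.5 M NaCl) and ignores system dead volume.

```python
# (time in min, % buffer B) breakpoints transcribed from the six-step program.
GRADIENT = [(0, 0), (8, 0), (38, 45), (58, 100), (66, 100), (68, 0), (76, 0)]

def percent_b(t_min, program=GRADIENT):
    """Programmed % buffer B at run time t_min, by linear interpolation."""
    if t_min < program[0][0] or t_min > program[-1][0]:
        raise ValueError("time outside the programmed run")
    for (t0, b0), (t1, b1) in zip(program, program[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)

def nacl_molar(t_min, buffer_b_nacl=1.5):
    """Approximate NaCl concentration (M) delivered at the pump at time t_min."""
    return percent_b(t_min) / 100 * buffer_b_nacl

for t in (4, 23, 48, 62, 67, 72):
    print(f"{t:>3} min: {percent_b(t):5.1f} %B, {nacl_molar(t):.3f} M NaCl")
```

Halfway through the salt gradient (23 min), for example, the program delivers 22.5% buffer B, corresponding to roughly 0.34 M NaCl under these assumptions.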

2.2.6 Triple phase ion-exchange fractionation of HeLa cytosolic extracts

To identify macromolecular complexes that populate the HeLa cytoplasmic compartment, I scaled up (i.e., increased the column length from 40 to 60 cm and increased sample loading up to ~3-fold) my optimized dual phase IEX-HPLC fractionation procedure (see section 2.2.5) to enhance resolution and protein concentration in each collected fraction. A total of 7 to 9 mg total protein from HeLa cytoplasmic extract was fractionated on a triple phase IEX-HPLC analytical column set-up (200 x 4.6 mm i.d., 5 μm, 1000 Å PolyWAX LP; 200 x 4.6 mm i.d., 5 μm, 1000 Å PolyWAX LP; 200 x 4.6 mm i.d., 5 μm, 1000 Å PolyCAT A, connected in series) and resolved into 300 x 0.4-ml fractions using a 2.5-h gradient elution program (23 min with 100% buffer A; 75 min with 0-50% buffer B; 3 min with 50-100% buffer B; 23 min with 100% buffer B; 3 min with 100 to 0% buffer B; 23 min with 100% buffer A) at a flow rate of 0.5 ml/min. Both the 19 fractions representing the column flow-through and the 12 fractions representing the re-equilibration step were discarded, as no proteins were detected in a short quality control LC-MS/MS analysis. All remaining 269 fractions were analyzed in duplicate by in-depth LC-MS/MS.

2.2.7 Triple phase ion-exchange fractionation of HeLa nuclear extracts

As I have documented in previously published studies (Havugimana et al., 2006; Havugimana et al., 2007), tandem weak anion-exchange (WAX) coupled in series to a weak cation-exchange (WCX) column offered greater resolution than a single column or WCX-WAX in tandem. To minimize both chance co-elution (i.e. spurious co-fractionation of functionally-unrelated, physically-uncoupled proteins) and bias (i.e. limited proteome coverage) (Figure 2-4B), I optimized the flow rate and sample loading for a semi-preparative triple phase IEX-HPLC in which the pre-column in a highly reproducible dual column system (Havugimana et al., 2007) was replaced by a weak anion exchange column of the same size as the resolving columns (250 x 9.4 mm i.d., 12 μm, 1500 Å PolyWAX LP; 250 x 9.4 mm i.d., 12 μm, 1500 Å PolyWAX LP; 250 x 9.4 mm i.d., 5 μm, 1500 Å PolyCAT A, connected in series). I used this same preparative system to fractionate ~10-12 mg total protein present in HeLa nuclear extracts into 375 x 0.8-ml fractions using a programmed elution protocol consisting of 10 min at 100% buffer A to allow protein binding, followed by a 50-min gradient from 0 to 50% buffer B, a 10-min gradient from 50 to 100% buffer B, 10 min at 100% buffer B, a 10-min gradient from 100 to 0% buffer B, and finally 10 min at 100% buffer A to re-equilibrate the column for the next injection. An optimal flow rate of 4 ml/min was used in this elution gradient program. Collected fractions were analyzed by LC-MS/MS in duplicate.

2.2.8 Dual phase heparin mixed-bed IEX fractionation of HeLa nuclear extracts

To enhance the detection of low abundance nuclear proteins by MS, I optimized a high resolution tandem affinity column coupled online with a mixed-bed ion exchange column to enrich and resolve the multi-protein complexes present in nuclear extracts. Typically, 8-10 mg protein in HeLa nuclear extracts were loaded on a dual TSKgel Heparin-5PW affinity column (75 x 7.5 mm i.d., 10 μm, 1000 Å) coupled in series with a PolyCATWAX mixed-bed ion exchange column (200 x 4.6 mm i.d., 12 μm, 1500 Å) mounted on an integrated Agilent 1100 HPLC system (Agilent Technologies, ON, Canada). A 4-h salt gradient elution (0.15-1.5 M NaCl) in binding buffer A was used at a flow rate of 0.25 ml/min to resolve and fractionate protein into 120 x 0.5-ml fractions for post-fractionation MS protein identification. HeLa nuclear extract was fractionated in duplicate to confirm reproducibility.


2.2.9 Single phase heparin fractionation of HeLa nuclear extracts

HeLa nuclear extract (~6.0 mg total protein), prepared using traditional methods (Dignam et al., 1983), was fractionated on a TSKgel Heparin-5PW affinity column (75 x 7.5 mm i.d., 10 μm, 1000 Å) previously equilibrated with buffer A at a flow rate of 0.5 ml/min. After loading, the bound proteins were eluted from the column with a 50-min gradient from 0 to 50% buffer B (buffer A + 1.5 M NaCl). A 5-min gradient with 50-100% buffer B was applied to elute tightly bound proteins, with 100% buffer B maintained for an additional 3 min before returning to 0% B for 7 min to re-equilibrate the column. In total, 48 × 0.75-ml fractions were collected from 0 to 72 min (1.5 min/fraction). Protein was precipitated with 10% TCA overnight at 4 °C.
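The fraction bookkeeping for this single phase heparin run follows directly from the flow rate and collection interval, as the short sketch below shows. The helper names are hypothetical, and the mapping from elution time to fraction number assumes back-to-back timed fractions starting at 0 min with no delay volume.

```python
def fraction_plan(flow_ml_per_min, min_per_fraction, start_min, end_min):
    """Return (number of fractions, volume per fraction in ml) for timed collection."""
    n_fractions = round((end_min - start_min) / min_per_fraction)
    volume_ml = flow_ml_per_min * min_per_fraction
    return n_fractions, volume_ml

def fraction_number(t_min, min_per_fraction, start_min=0.0):
    """1-based index of the fraction being collected at run time t_min."""
    return int((t_min - start_min) // min_per_fraction) + 1

# Values stated for the heparin run: 0.5 ml/min, 1.5 min per fraction, 0 to 72 min.
n_fractions, volume_ml = fraction_plan(0.5, 1.5, 0, 72)
print(n_fractions, volume_ml)        # number of fractions and ml per fraction
print(fraction_number(30.0, 1.5))    # fraction collected 30 min into the run
```

The same two helpers can be used to relate a peak in the 280 nm trace to the fraction in which the corresponding proteins should be found.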

2.2.10 LC-MS/MS separation and identification of chromatographic peptide fractions

To identify the proteins present in each HPLC fraction, the samples were precipitated with 10% TCA overnight and the precipitates washed briefly using ice cold 100% acetone. Proteins were then resuspended in 50 μl of trypsin digestion buffer [50 mM ammonium bicarbonate, 1 mM CaCl2, 50 mM Tris; pH 7.8], subjected to reduction (10 mM DTT, 30 min, 30 °C) and alkylation (15 mM IAM, 60 min, 30 °C in the dark), and then digested (18 h, 30 °C, with gentle agitation) with one μg sequencing grade trypsin (Roche, Mississauga, Canada). The mixtures were concentrated in a Savant Speed Vacuum and the tryptic peptides solubilised in 20 μl of 5% formic acid prior to analysis by LC-MS/MS using a linear ion trap mass spectrometer (LTQ; Thermo Fisher Scientific, CA, USA) or, when available, an LTQ Orbitrap Velos (Thermo Fisher) hybrid high performance instrument coupled online to an automated nano-flow HPLC pump system (EASY-nLC; Proxeon, Odense, Denmark) via a nano-electrospray ion source.


The peptides were separated by reverse-phase chromatography using 150-μm i.d. micro-capillary columns packed in-house with fused-silica C18 resin (Zorbax XDB-C18, 3.5 µm, Agilent Technologies, Canada) at a flow rate of 500 nl/min. I used columns varying between 10 and 40 cm in length depending on the sample complexity obtained in each fractionation experiment. The gradient elution time was adjusted according to the length of the column and varied between 2 and 4 h. For a 2-h gradient elution, 5 μl of tryptic peptides generated for TCS-HPLC fractions were typically loaded onto a 20-cm column and subsequently eluted with a gradient of 0 to 35% solvent B (0.1% formic acid/95% acetonitrile) over 90 min, followed by 35 to 95% solvent B over 15 min.

For peptides analyzed on an LTQ ion trap instrument, eluted peptides were directly electrosprayed by applying a spray voltage of 3.0 kV at the ion source (Proxeon). The MS was operated in a fully automated data-dependent manner using the onboard Xcalibur 2.0 software to acquire one full MS scan (400-2,000 m/z) followed by five consecutive MS/MS scans of the most abundant precursor ions in the MS1 scan, with a minimum precursor signal trigger threshold of 1,000 counts. Ion fragmentation was performed in CID (collision-induced dissociation) mode at a normalized collision energy of 35%. Ions subjected to MS/MS were excluded from further sequencing for 30 seconds.
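The data-dependent acquisition logic described above (top five precursors, 1,000-count trigger threshold, 30-second dynamic exclusion) can be sketched as follows. This is a simplified illustration and not the instrument firmware: the function name `select_precursors` and the use of m/z rounded to one decimal as the exclusion key are assumptions made for the example.

```python
from typing import Dict, List, Tuple


def select_precursors(
    ms1_peaks: List[Tuple[float, float]],  # (m/z, intensity) pairs from one MS1 scan
    now_s: float,                          # acquisition time of this scan, in seconds
    excluded_until: Dict[float, float],    # exclusion key -> time at which it expires
    top_n: int = 5,
    min_counts: float = 1000.0,
    exclusion_s: float = 30.0,
    mz_range: Tuple[float, float] = (400.0, 2000.0),
) -> List[float]:
    """Pick up to top_n precursors for MS/MS and update the exclusion list."""
    lo, hi = mz_range
    candidates = [
        (mz, intensity)
        for mz, intensity in ms1_peaks
        if lo <= mz <= hi
        and intensity >= min_counts
        # Assumed exclusion key: m/z rounded to 0.1 (real instruments use a tolerance window).
        and excluded_until.get(round(mz, 1), 0.0) <= now_s
    ]
    # Most abundant precursors first.
    candidates.sort(key=lambda peak: peak[1], reverse=True)
    chosen = [mz for mz, _ in candidates[:top_n]]
    for mz in chosen:
        excluded_until[round(mz, 1)] = now_s + exclusion_s
    return chosen
```

Calling the function on successive MS1 scans with the same `excluded_until` dictionary reproduces the behaviour whereby a precursor that has just been fragmented is skipped until its exclusion window expires.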

For mixtures analyzed on an LTQ Orbitrap Velos hybrid instrument, the peptide samples were directly loaded onto a ~10 cm in-house packed column (75 µm inner diameter) with 3 µm reversed-phase beads (Zorbax 80XDB-C18, Agilent). Using a 60 min gradient (5-35% ACN), the peptides were electrosprayed at 2.5 kV into the mass spectrometer. The instrument was operated in data-dependent acquisition mode, switching automatically between one full MS precursor scan and, subsequently, 10 MS/MS fragmentation acquisitions. Instrument control was specified through the onboard Tune 2.6.0 and Xcalibur 2.1.0 programs. Full-scan MS spectra (400-2,000 m/z) were acquired in the Orbitrap analyzer at high resolution (60,000 at 400 m/z) after accumulation to a target precursor ion intensity value of 10^6 in the linear ion trap. Fragmentation was performed in CID mode applying a 35% normalized collision energy.

2.2.11 LC-MS/MS spectra database search and protein identification

All MS/MS spectra (IEX experiments) were combined and mapped against a target-decoy human database downloaded from the Universal Protein Resource (UniProtKB/Swiss-Prot Release 57.11), comprising 20,328 human proteins supplemented with common contaminants (such as Human Papilloma Virus-18 proteins, Human Immunodeficiency Virus-1 proteins, bovine albumin, bovine trypsin and Benzonase). I searched the spectra using the SEQUEST algorithm (V2.7) (Eng et al., 1994) with a forward-reverse sequence decoy strategy to estimate the empirical false-discovery rate (Kislinger and Emili, 2003). Static modifications were permitted to allow for the detection of carboxyamidomethylated (+57 amu) cysteine residues. All peptide matches were required to be fully tryptic, although one missed cleavage was permitted. The probabilistic STATQUEST model (Kislinger et al., 2003) was then used to evaluate and assign confidence scores to all putative matches. Proteins and peptides were considered positively identified if detected within a stringent 1% false discovery rate cut-off (based on empirical target-decoy database search results).
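The target-decoy idea behind the 1% cut-off can be illustrated with a short sketch. It is not the STATQUEST model: it assumes each match carries a single score where higher is better, and it uses the common estimate FDR = decoy hits / target hits above the cut-off. The function name `score_cutoff_at_fdr` is hypothetical.

```python
from typing import List, Optional, Tuple


def score_cutoff_at_fdr(
    matches: List[Tuple[float, bool]],  # (score, is_decoy) for every peptide-spectrum match
    max_fdr: float = 0.01,
) -> Optional[float]:
    """Return the lowest score cut-off whose estimated FDR stays within max_fdr.

    FDR is estimated as (decoy matches) / (target matches) at or above the cut-off.
    Returns None if no cut-off satisfies the requirement.
    """
    ranked = sorted(matches, key=lambda m: m[0], reverse=True)
    targets = 0
    decoys = 0
    cutoff = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Remember the most permissive score that still meets the FDR requirement.
        if targets > 0 and decoys / targets <= max_fdr:
            cutoff = score
    return cutoff
```

Matches scoring at or above the returned cut-off would be accepted; tied scores are not treated specially in this sketch.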

The proteomic patterns of the IEX-HPLC fractions from all experiments were compared using the CONTRAST software tool (Tabb et al., 2002). I manually removed from further consideration all proteins supported by only a single spectrum across all combined LC-MS/MS runs. Moreover, to ensure a proteomic data set of high quality, I confirmed that all of my LC-MS detected proteins were also present in the previously reported HeLa S3 and HEK293 mRNA deep-sequencing expression data sets (Morin et al., 2008; Sultan et al., 2008). Additionally, only proteins that were both supported by at least two unique peptide sequences and previously reported in at least one recent comprehensive proteomic study of the HeLa proteome (Selbach et al., 2008; Wisniewski et al., 2009) were retained. To facilitate cross-mapping between data sets, I used UniProtKB accession numbers as a common identifier and the UniProt ID mapping tool to interconvert different gene and protein identifiers.
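The retention criteria in this paragraph amount to a conjunction of simple filters, which can be sketched as below. The function name, argument layout and accession labels are hypothetical and serve only to make the rules explicit; they are not taken from the original analysis scripts.

```python
from typing import Dict, Set


def retain_proteins(
    spectral_counts: Dict[str, int],  # accession -> total spectra across all runs
    unique_peptides: Dict[str, int],  # accession -> number of unique peptide sequences
    mrna_detected: Set[str],          # accessions seen in the mRNA expression data sets
    reference_proteome: Set[str],     # accessions reported in a published HeLa proteome
) -> Set[str]:
    """Keep proteins that pass every filter described in the text."""
    return {
        acc
        for acc, n_spectra in spectral_counts.items()
        if n_spectra >= 2                      # drop single-spectrum identifications
        and unique_peptides.get(acc, 0) >= 2   # at least two unique peptides
        and acc in mrna_detected               # supported by mRNA expression
        and acc in reference_proteome          # seen in an independent proteomic study
    }
```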

2.2.12 Benchmarking human protein complexes and protein-protein interactions

To benchmark the biochemical data generated by the above co-fractionation methods, I constructed a reference set of previously reported positive and negative protein-protein interactions (PPI). The positive reference set was defined as pairs of proteins within the same protein complex as reported in the CORUM curation database (see Table 2-1). Conversely, the negative reference set consisted of pairs of proteins reported to be present in two different complexes. The complexes used for this assessment shared no proteins in common (i.e., no interactions are expected between the complexes).

By analogy to clustering algorithms, which computationally group functionally related proteins into clusters, chromatographic columns can be used to experimentally isolate groups of functionally related proteins or complexes. Once the co-eluting proteins present in each chromatographic fraction have been identified by LC-MS/MS, both precision and sensitivity can be used to identify the best performing column. For a poorly performing column (i.e., one lacking chromatographic separation), the 20 reference complexes reported in Table 2-1 would appear either in one single chromatographic fraction or in all collected fractions. In this case, the precision could be used to approximate the chromatographic resolution at ~0.05 (585 possible positive PPI/11,935 possible PPI), or ~5% precision. For the best performing columns, each of the 20 reference complexes would appear in isolated and distinct chromatographic fraction(s), and a resolution of 1 (585/585), or 100% precision, would be observed.

Table 2-1: Positive reference protein complexes from CORUM database used to evaluate the performance of the co-fractionation HPLC experiments.

| No. | Complex name | Number of components | Component ID |
|---|---|---|---|
| 1 | 20S Proteasome complex | 14 | PSMA1, PSMA2, PSMA3, PSMA4, PSMA5, PSMA6, PSMA7, PSMB1, PSMB2, PSMB3, PSMB4, PSMB5, PSMB6, PSMB7 |
| 2 | CCT-micro complex | 8 | CCT1, CCT2, CCT3, CCT4, CCT5, CCT6A, CCT7, CCT8 |
| 3 | Prefoldin complex | 6 | PFDN1, PFDN2, PFDN4, PFDN5, PFDN6, PFDN3 |
| 4 | Multisynthetase complex | 11 | DARS, AIMP3, EPRS, IARS, AIMP2, KARS, LARS, MARS, QARS, RARS, AIMP1 |
| 5 | ARP2/3 complex | 7 | ACTR2, ACTR3, ARPC1B, ARPC2, ARPC3, ARPC4, ARPC5 |
| 6 | COG complex | 8 | COG1, COG2, COG3, COG4, COG5, COG6, COG7, COG8 |
| 7 | Exocyst complex | 8 | EXOC1, EXOC2, EXOC3, EXOC4, EXOC5, EXOC6, EXOC7, EXOC8 |
| 8 | Exosome complex | 11 | RRP44, EXOSC1, EXOSC2, EXOSC3, EXOSC4, EXOSC5, EXOSC6, EXOSC7, EXOSC8, EXOSC9, EXOSC10 |
| 9 | Coatomer complex | 8 | COPE, ARCN1, COPB1, COPA, COPZ1, COPG1, COPG2, COPB2 |
| 10 | Anaphase promoting complex | 9 | ANAPC1, ANAPC2, ANAPC4, ANAPC5, ANAPC7, CDC16, CDC23, CDC27, ANAPC10 |
| 11 | Septin complex | 5 | SEPT11, SEPT2, SEPT7, SEPT8, SEPT9 |
| 12 | Lsm complex | 7 | LSM2, LSM3, LSM4, LSM5, LSM6, LSM7, LSM8 |
| 13 | tRNA export (TREX) | 8 | DDX39B, THOC1, THOC2, THOC3, THOC4, THOC5, THOC6, THOC7 |
| 14 | Splicing factor 3b | 8 | DDX42, PHF5A, SF3B1, SF3B14, SF3B2, SF3B3, SF3B4, SF3B5 |
| 15 | Replication factor C | 5 | RFC1, RFC2, RFC3, RFC4, RFC5 |
| 16 | DNA primase-alpha | 4 | POLA1, POLA2, PRIM1, PRIM2 |
| 17 | PBAF complex | 10 | ARID2, PBRM1, SMARCA4, SMARCB1, SMARCC1, SMARCC2, SMARCD1, SMARCE1, ACTB, ACTL6A |
| 18 | MeCP1 complex | 9 | CHD4, HDAC1, HDAC2, MBD2, MBD3, MTA2, RBBP4, RBBP7, GATAD2B |
| 19 | Condensin I | 5 | NCAPD2, NCAPG, NCAPH, SMC2, SMC4 |
| 20 | Cohesin SA2 | 4 | RAD21, SMC1A, SMC3, STAG2 |
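The counts quoted in this section (155 proteins, 585 positive pairs, 11,935 possible pairs) follow directly from the complex sizes in Table 2-1, as the sketch below checks. It assumes, as stated in the text, that the complexes share no subunits; the helper `reference_pairs` and the variable names are illustrative only.

```python
from itertools import combinations
from math import comb
from typing import Dict, List, Set, Tuple

# Complex sizes in the order of Table 2-1 (No. 1 to 20).
COMPLEX_SIZES = [14, 8, 6, 11, 7, 8, 8, 11, 8, 9, 5, 7, 8, 8, 5, 4, 10, 9, 5, 4]

n_proteins = sum(COMPLEX_SIZES)                          # all reference subunits
n_positive = sum(comb(size, 2) for size in COMPLEX_SIZES)  # intra-complex pairs
n_all_pairs = comb(n_proteins, 2)                        # every possible pair
n_negative = n_all_pairs - n_positive                    # inter-complex pairs
baseline_precision = n_positive / n_all_pairs            # no-separation case


def reference_pairs(
    complexes: Dict[str, List[str]],
) -> Tuple[Set[Tuple[str, str]], Set[Tuple[str, str]]]:
    """Build positive (same complex) and negative (different complex) pair sets.

    Assumes the complexes are disjoint, so every cross-complex pair is a negative.
    """
    positive = set()
    for members in complexes.values():
        positive.update(combinations(sorted(members), 2))
    all_members = sorted(m for members in complexes.values() for m in members)
    negative = set(combinations(all_members, 2)) - positive
    return positive, negative
```

With these sizes, `n_proteins` is 155, `n_positive` is 585 and `n_all_pairs` is 11,935, so `baseline_precision` is about 0.049, matching the ~5% quoted above.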


2.3 Results

2.3.1 Ion-exchange high performance liquid chromatography of HeLa cell-free extracts

The limited scalability and high cost of traditional AP-MS experimental approaches for the proteome-scale detection of protein complexes in higher eukaryotes highlight the need for more efficient methods for detecting multi-protein complexes that can be adapted to screening interaction networks in model human cell lines. To isolate and characterize soluble protein complexes, a successful methodology would need to be effective, reproducible, cost-effective, and sensitive enough to identify low-abundance protein complexes. While sedimentation, which separates proteins according to molecular size, shape and density, is an excellent method for recovering large proteins and protein complexes in an intact native conformation (Andersen et al., 2003; Foster et al., 2006), it suffers from very poor resolution and limited scalability. Alternatively, multi-column fractionation can be used to purify protein complexes nearly to homogeneity prior to MS analysis (Dong et al., 2008). However, components of the complex may be lost due to the inevitable sample dilutions. Additionally, the post-fractionation chemical labeling proposed by Dong and colleagues considerably reduces the scalability of the method.

To this end, I developed a high-performance Fractionomic Profiling-Mass Spectrometry (FP-MS) approach which addresses many of these limitations. As illustrated schematically in Figure 2-1, my method is based on extensive biochemical fractionation of both nuclear and cytoplasmic soluble protein extracts using an empirically optimized, high-resolution chromatographic procedure to resolve stable multi-protein complexes prior to mass spectrometric detection.


Using this global high-throughput procedure, the proteins present in each collected fraction are then proteolytically digested, identified, and quantified by label-free shotgun LC-MS/MS peptide sequencing. Finally, protein complexes are deconvoluted based on the correlation of the recorded profiles of co-eluting subunits. Critical elements of this systematic strategy include minimizing chance co-elution of functionally unrelated proteins from complicated cell-free lysates by achieving the highest resolution separations possible, consistently obtaining high coverage proteomic detection, and using appropriate scoring, filtering and clustering procedures to predict biologically meaningful protein associations. Novel candidate interactions were identified via comparison with publicly available interaction data (i.e., PPI curation databases) (see Chapter 3).
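The profile-correlation step mentioned above can be illustrated with the Pearson correlation coefficient between two spectral-count elution profiles. The sketch below is a plain implementation for illustration; the example profiles are invented, and returning 0.0 for a flat profile is an assumption of this example rather than part of the described method.

```python
from math import sqrt
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two elution profiles of equal length."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2 fractions)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # assumed convention: a flat profile carries no co-elution signal
    return sxy / sqrt(sxx * syy)


# Hypothetical spectral counts across six fractions.
subunit_a = [0, 2, 10, 3, 0, 0]
subunit_b = [0, 4, 20, 6, 0, 0]   # same shape as subunit_a, twice the abundance
unrelated = [0, 0, 0, 3, 10, 2]   # elutes two fractions later
```

Here `pearson(subunit_a, subunit_b)` is essentially 1, because correlation ignores absolute abundance, whereas `pearson(subunit_a, unrelated)` is negative.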

To establish a standardized model system for optimizing my experimental procedures and to assess the general effectiveness of my approach for the isolation and characterization of stably associated endogenous protein complexes, I selected HeLa cells for a case study. Two reasons provided additional motivation for the selection of this particular cell line. First, HeLa cells have been used to biochemically investigate human cell biology and protein function for many decades, thus providing a rich biological context for interpreting my proteomic data. Second, high quality HeLa nuclear and cytoplasmic extracts were commercially available, thus facilitating the development of the co-fractionation method.


Figure 2-1: Schematic workflow of my Fractionomic Profiling-Mass Spectrometry strategy for identifying soluble protein complexes on a large scale.

Step 1- Nuclear and cytoplasmic extracts were prepared from target cells using a traditional non-denaturing procedure (Dignam et al., 1983). Step 2- Stable complexes (only two representative complexes, with 3 and 4 subunits respectively, are shown for illustrative purposes) were separated by IEX-HPLC and fractions were collected. Step 3- Collected protein fractions were precipitated, trypsinized and analyzed by tandem mass spectrometry. Step 4- Protein profiles were examined using spectral counting as a semi-quantitative measurement. Step 5- Protein elution profiles were compared across all fractions using a similarity metric such as the Pearson correlation coefficient (PCC). The central concept is that stably associated proteins present in the same complex should co-elute during non-denaturing IEX-HPLC and, hence, show strong profile consistency (e.g. a high PCC). Step 6- After stringent bioinformatics filtering of the biochemical data, a clustering algorithm (e.g. ClusterONE; see Chapter 3 for details) was used to organize highly correlated protein elution patterns into reproducible groupings, which may represent known complexes (e.g. documented in the literature) or putative novel complexes that require further evaluation and validation.


2.3.1.1 Resolution and fractionation of HeLa cell-free cytoplasmic extracts

I used commercially available columns to develop a flexible, fast and reproducible IEX-HPLC pre-fractionation method to improve proteomic detection coverage using standard gel-free tandem mass spectrometry screening procedures (Havugimana et al., 2006; Havugimana et al., 2007). To gain insight into the soluble macromolecular machines that populate the cell, I subsequently evaluated four different ion-exchange chromatography procedures for their ability to bind and separate soluble stable complexes present in HeLa cytoplasmic extracts: single phase weak cation-exchange (WCX); single phase weak anion-exchange (WAX); dual phase, coupling weak anion- and cation-exchange columns in series; and lastly a triple phase column system that I set up, comprising two weak anion-exchange columns in tandem with a weak cation-exchange column. In contrast to my previous studies (Havugimana et al., 2006; Havugimana et al., 2007), where I had focused on sample simplification using short and narrow-bore ion-exchange HPLC columns, here I applied long analytical ion-exchange HPLC columns to resolve stable protein complexes from a large amount of loaded extract.


Figure 2-2 shows the chromatographic fractionation performance (i.e. chromatograms) routinely achieved with each of the four chromatographic systems when fractionating cytoplasmic cell-free extract. Of the four configurations I evaluated, the single phase WCX column exhibited the poorest retention and separation efficiencies, as most protein eluted in the flow-through fraction (as judged by absorbance at 280 nm). In terms of resolution, the dual phase column set up clearly outperformed single stage WAX and, moreover, the triple phase column arrangement exhibited markedly better resolving power than the dual phase system. With the triple phase procedure, protein peaks were narrow and most of the material was retained (i.e. no flow-through loss). I concluded that, among the configurations tested, the WAX-WAX-WCX column geometry, together with empirically optimized salt-based elution gradients, gave the highest resolution and recovery of protein samples along the entire chromatogram. The improved resolution is most likely due in part to the longer overall column length coupled with optimal sample loading. The triple phase column set up also provided an additional advantage over a dual phase column arrangement, since a larger amount of sample could be loaded and fractionated. This, in turn, enhanced detection of low-abundance proteins in the subsequent downstream LC-MS/MS analysis of the collected protein fractions, albeit with the burden of generating a higher number of collected fractions (i.e. requiring additional mass spectrometry run time).


Figure 2-2: Evaluation of chromatographic resolution of HeLa S3 cytosolic protein extracts by four different IEX-HPLC methods.

Protein elution times (X-axis) were monitored with an ultraviolet detector at 280 nm (Y-axis). Highlighted are the results obtained with: (A) Weak cation-exchange HPLC on a PolyCAT A (20 x 4.6 mm i.d., 5-µm particle size; 1000 Å pore size) column; (B) Weak anion-exchange HPLC on a PolyWAX LP (20 x 4.6 mm i.d., 5-µm particle size; 1000 Å pore size) column; (C) Dual phase ion-exchange column set up HPLC consisting of a PolyWAX LP and a PolyCAT A column; (D) Triple phase ion-exchange column set up HPLC comprising two consecutive PolyWAX LP columns and a PolyCAT A column. Elution conditions are described in the Materials and Methods section of this Chapter. The X-axes for the single phase and dual phase separations were expanded 3- and 2-fold, respectively, to facilitate comparison.


2.3.1.2 Resolution and fractionation of HeLa cell-free nuclear extracts

The nucleus hosts a myriad of macromolecular complexes that act in concert with the genetic material to control DNA replication and gene expression and ensure faithful chromosome compaction and transmission. For example, chromatin structure and accessibility are continuously modified by dedicated protein complexes, such as Methyl CpG binding protein complex 1 (MeCP1) and Polybromo and BAF-containing complex (PBAF; also called SWI/SNF complex B), that cause genes to be turned “on” or “off” transcriptionally by configuring nucleosomal positions in response to post-translational histone marks at specific regulatory loci, or which control other aspects of chromosome dynamics. Many of these essential machineries have been isolated from cultured human cells and have been characterized biochemically. Nevertheless, many curated human protein complexes, including nuclear-specific assemblies, reported in public databases such as CORUM and PINdb, were derived from a variety of different cell types using different experimental methods, making it hard to reconstruct a map of protein complexes present in a typical human cell to understand the whole functional ensemble.

To fill this gap using my FP-MS approach, I subjected a high-salt HeLa nuclear extract to three different fractionation approaches (HAC, HCW and TCS) to enrich for nuclear protein complexes, such as the chromatin modifying complexes described above. The fractionation methods I tested included high performance liquid affinity chromatography on a generic heparin column, which consists of an immobilized mixture of natural, linear, polymeric, negatively charged sulfated glycosaminoglycans that display a high affinity for a wide range of nucleic acid binding proteins. In effect, the polyanionic sulfate groups of the heparin column make it a high-capacity cation-exchanger. Heparin affinity chromatography was combined with mixed-bed ion-exchange chromatography using a PolyCATWAX column system (50/50 mixture of WCX and WAX column packing material), which has previously been reported to exhibit resolution similar to that of separate tandem PolyWAX-PolyCAT columns (el Rassi and Horvath, 1986; Maa et al., 1988). Since a standard tandem column system could not be tested in combination with a heparin column due to the increased backpressure, I opted for a dual phase heparin-PolyCATWAX mixed-bed arrangement for fair comparison with the triple phase IEX chromatography system that I developed for cytoplasmic extracts and described above (Section 2.3.1.1). My goal here was to define a generic, effective and reproducible fractionation procedure that enhanced resolution and recovery of nucleus-associated protein complexes such as the core chromatin machinery (e.g. transcription factors).

Representative results (chromatographic elution profiles) obtained using these different fractionation platforms are summarized in Figure 2-3. Similar to the single weak cation-exchange fractionation, the single heparin affinity chromatographic system exhibited reduced resolution, as most proteins eluted in the flow-through (Figure 2-3A), presumably because they were not captured by the negatively charged sulfate groups. However, far better and highly reproducible chromatographic resolution was obtained by combining heparin with mixed-bed PolyCATWAX columns (Figure 2-3B and C), which ensured both excellent overall retention and a uniform distribution of proteins over the entire chromatogram.

As discussed below, this chromatographic system even provided improved overall nuclear protein identifications (cf. Figure 2-4B), as well as improved positive identification of reference complex components (i.e., increased precision; Table 2-2), compared to the triple phase IEX chromatography system, which I developed using cytoplasmic extracts and which showed unsurpassed chromatographic resolution (Figure 2-3D). Once again, the trade-off was a correspondingly higher number of collected fractions requiring more time for downstream processing and LC-MS/MS analysis.


Figure 2-3: Assessment of HPLC fractionation efficiency with HeLa S3 nuclear protein extracts.

Please refer to the Materials and Methods for details on sample amount and chromatographic elution conditions. Three different fractionation approaches were examined here. Chromatogram A shows the results of single phase heparin affinity chromatography of a soluble HeLa nuclear extract. Chromatograms B and C show the better resolution and reproducible results obtained using dual phase heparin-mixed-bed IEX-HPLC fractionation of HeLa nuclear extract. The replicate sample injections were performed three months apart. Chromatogram D was recorded during fractionation of HeLa nuclear extract on a semi-preparative triple phase IEX-HPLC system. Column effluents were collected automatically by a fraction collector and analyzed by LC-MS/MS (see Materials and Methods for details).


2.3.2 Determination of the performance of the various ion-exchange high performance liquid chromatography columns

To establish the actual usefulness of my IEX-HPLC sorting methods for fractionating stable protein complexes, the identities of the various soluble proteins present in each fraction were determined by LC-MS/MS using automated data-dependent fragmentation of precursor peptide ions (i.e., a simple heuristic method where precursor ions were detected in a survey scan and selected automatically by the instrument). To this end, the proteins in the individual fractions were precipitated and digested extensively using trypsin. The resulting peptide mixtures were then chromatographically resolved by nanoflow reverse-phase (C18) microcapillary HPLC and subsequently detected by online electrospray ionization into an attendant linear ion trap (LTQ) tandem mass spectrometer. All of the mass spectra generated by the instrument were searched against a reference human protein sequence database using the SEQUEST search engine (Eng et al., 1994), and a list of highly confident (99%) candidates was selected using the STATQUEST probability scoring model (Kislinger et al., 2003).

To examine and compare the proteomic patterns I obtained, the pattern of confidently identified proteins present across the entire set of fractionated samples in each of the seven separate biochemical experiments (i.e., WCX-HeLa cytoplasmic extracts, WAX-HeLa cytoplasmic extracts, DCS-HeLa cytoplasmic extracts, TCS-HeLa cytoplasmic extracts, HAC-HeLa nuclear extracts, TCS-HeLa nuclear extracts, and HCW-HeLa nuclear extracts) was deduced using spectral counts as a semi-quantitative measure (Kislinger et al., 2006). Overall, I was able to identify 5,623 proteins at 99% confidence in 2,132 MS replicate runs from 1,099 different collected fractions. This corresponded to more than half of the proteins recently reported in a deep proteomic analysis of HeLa cells by Nagaraj and coworkers in the Mann group (Nagaraj et al., 2011). Figure 2-4 shows the abundance distribution of the proteins identified in each fractionation experiment after matching my protein list to the estimated protein abundances reported in the Nagaraj et al. study. The best proteome coverage (2,619 and 3,249 proteins detected using the LTQ instrument) was obtained using the triple phase column set up (TCS) and dual phase heparin-mixed-bed IEX HPLC for the HeLa cytosolic and nuclear extracts, respectively. The median putative abundance was roughly 75,150 and 49,900 copies/cell for cytoplasmic and nuclear proteins, respectively.


Figure 2-4: Distribution of protein abundances detected by MS in HeLa cell-free extracts.

Panel A shows the distribution of proteins identified by LC-MS/MS analysis of HeLa cytoplasmic extracts after fractionation with four different chromatographic approaches. Panel B highlights the distribution of proteins identified from HeLa nuclear extracts after fractionation with three different co-fractionation methods (single heparin, triple phase column set up, and dual phase heparin mixed-bed IEX). Protein fractions in replicate 1 were analyzed on an LTQ linear ion trap instrument, while those in replicate 2 were analyzed on a higher performance LTQ Orbitrap Velos instrument. Panel C compares the abundance distribution of proteins identified in two different soluble HeLa cell-free extracts (i.e., nuclear vs. cytoplasmic) after combination of all MS runs. Panel D shows the overlap of proteins identified in the nuclear and cytoplasmic compartments. Protein abundance was estimated based on the recent report by the Mann laboratory (Nagaraj et al., 2011).


Following this evaluation of the proteome coverage obtained for each fractionation experiment, I next sought to assess the relative efficiencies of the column systems at separating a subset of 20 hand-curated human protein complexes reported in the CORUM reference database (Ruepp et al., 2009) as a benchmark of assay performance. To objectively quantify the performance of each chromatographic arrangement, I evaluated two standard performance measures, sensitivity and precision. To this end, I first merged the LC-MS/MS results recorded in the replicate analyses (where applicable), then calculated a stringent summary statistic, which I termed the co-apex score. Namely, instead of simply identifying proteins present together in any fraction, for each pair of identified proteins I determined the number of fractionation experiments in which the two proteins showed maximal (modal) abundance in the same exact peak fraction.

The representative plots shown in Figure 2-5 illustrate the co-apex pattern of co-elution profiles seen among components of the 20S proteasome complex in HeLa nuclear extracts subjected to heparin-IEX HPLC fractionation. All 14 annotated subunits of the 20S proteasome complex consistently co-eluted during this particular experiment and shared the same exact peak apex (maximum abundance), or co-apex (at fraction 67). From this analysis, I deduced that all 91 co-complex associations expected between these 14 subunits were correctly observed and had a co-apex score of one (the minimal value that can be deduced for a single experiment for co-eluting components of the same complex). Likewise, both the number of components and observed protein interactions for each of the other 19 reference complexes were deduced using this approach in each of the 8 fractionation experiments and were used to compute the sensitivity and precision performance measures (see Table 2-2).
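The co-apex score described above can be sketched as a count, over experiments, of how often two proteins peak in the same fraction. This is an illustrative reading of the definition, not the original analysis code: breaking ties by taking the first maximal fraction and skipping undetected proteins are assumptions, and the example profiles are invented.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Optional

Profile = List[int]             # spectral counts per fraction
Experiment = Dict[str, Profile]  # protein -> elution profile


def apex(profile: Profile) -> Optional[int]:
    """Index of the fraction with maximal spectral count (None if never detected)."""
    if not profile or max(profile) <= 0:
        return None
    return profile.index(max(profile))  # assumed tie-break: first maximal fraction


def co_apex_scores(experiments: List[Experiment]) -> Counter:
    """Count, for each protein pair, the experiments in which both share an apex."""
    scores: Counter = Counter()
    for experiment in experiments:
        apexes = {protein: apex(profile) for protein, profile in experiment.items()}
        detected = sorted(p for p, a in apexes.items() if a is not None)
        for a, b in combinations(detected, 2):
            if apexes[a] == apexes[b]:
                scores[(a, b)] += 1
    return scores


# Two hypothetical experiments with four fractions each.
exp1 = {"PSMA1": [0, 1, 9, 2], "PSMB1": [0, 3, 7, 1], "COPA": [5, 1, 0, 0]}
exp2 = {"PSMA1": [1, 8, 2, 0], "PSMB1": [0, 6, 1, 0], "COPA": [0, 7, 0, 0]}
scores = co_apex_scores([exp1, exp2])
```

In this toy example the two proteasome subunits share an apex in both experiments (score 2), while COPA coincides with them only in the second experiment (score 1), which is the kind of chance co-elution a minimum score threshold is meant to suppress.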


Figure 2-5: Identification of co-eluting proteins through detection of co-apex peaks.

In this approach, each protein signal (i.e., spectral counting) was normalized to its own highest spectral count, while proteins that share the same apex are defined as co-eluting proteins consistent with membership in the same complex. In this particular experiment, HeLa nuclear extract was fractionated with dual heparin-PolyCAT/WAX columns; as shown, all the subunits of the 20S proteasome shared the same exact peak apex. In the top graph network diagram, each node represents a 20S proteasome subunit while an edge represents a positive co-apex score (value of 1) between co-eluting protein pairs.


Sensitivity was defined as the number of observed intra-complex physical interactions (PPI) divided by the number of expected intra-complex PPI pairs (i.e. all possible pairs of annotated subunits), while precision was defined as the number of observed intra-complex PPI divided by the sum of both intra-complex (true positive) and inter-complex (false positive) PPI observed in a particular co-fractionation experiment.
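These two definitions translate directly into ratios, sketched below with counts taken from Table 2-2 (triple phase IEX of cytoplasmic extract: 200 observed intra-complex PPI out of 585 expected, 287 observed PPI in total). The function names are illustrative.

```python
def sensitivity(observed_intra: int, expected_intra: int) -> float:
    """Fraction of expected intra-complex pairs that were observed."""
    return observed_intra / expected_intra


def precision(observed_intra: int, observed_inter: int) -> float:
    """Fraction of all observed pairs that fall within a reference complex."""
    return observed_intra / (observed_intra + observed_inter)


# Counts from Table 2-2, TCS fractionation of HeLa cytoplasmic extract.
tcs_intra = 200
tcs_all = 287
tcs_inter = tcs_all - tcs_intra  # inter-complex (false positive) pairs

tcs_sensitivity = sensitivity(tcs_intra, 585)    # about 0.34
tcs_precision = precision(tcs_intra, tcs_inter)  # about 0.70
```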

Table 2-2 summarizes the results obtained for the eight experiments, including the integration of all experiments (using a minimum co-apex threshold score of 2, imposed for more stringent identification of co-eluting protein pairs). As measured by both sensitivity and precision, the best performing column systems were the triple phase column set up and the heparin column coupled to mixed-bed IEX (for cytoplasmic and nuclear proteins, respectively). For the triple phase IEX fractionation of HeLa cytoplasmic extracts, the sensitivity was estimated at 34% (200 PPI/585 PPI) with a precision of 70% (200 intra-complex PPI/287 intra- and inter-complex PPI). In the case of dual phase heparin mixed-bed IEX fractionation, I obtained a sensitivity of 77% (453 PPI/585 PPI) and a precision of 60% (453 intra-complex PPI/754 intra- and inter-complex PPI). Combining all the experimental results (column “All” in Table 2-2) achieved a detection coverage of 99% (154 components detected of 155 expected) for the 20 reference complexes, with sensitivity boosted to 85% (499 observed intra-complex pairwise interactions out of 585) for an overall precision of 61% (499 PPI/820 PPI). Notably, the overall sensitivity and precision of this integrated (multi-fractionation) dataset were equivalent to those achieved using two replicate LC-MS/MS analyses of heparin-mixed-bed IEX HPLC of nuclear extract alone, suggesting that this particular fractionation procedure is the most powerful in terms of enrichment and resolution for many stably associated nuclear protein assemblies. Additionally, heparin-mixed-bed IEX HPLC separation was the only system able to retain and enrich all the subunits of the nuclear anaphase promoting complex for subsequent downstream identification by LC-MS/MS (i.e., subunits of this complex were not detected by other methods).

Table 2-2: Test performance analysis indicating the fraction of subunits of reference complexes sharing the same exact co-apex peak during FP-MS.

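WCX, cation exchange; WAX, anion-exchange; DCS, dual phase; TCS, triple phase; HAC, heparin; HCW, dual heparin mixed-bed IEX. Columns WCX, WAX, DCS and TCS (cyt.) refer to HeLa cytoplasmic extract; columns TCS (nuc.), HAC, HCW1 and HCW2 refer to HeLa nuclear extract.

| Reference complex | Size | WCX | WAX | DCS | TCS (cyt.) | TCS (nuc.) | HAC | HCW1 | HCW2 | All |
|---|---|---|---|---|---|---|---|---|---|---|
| 20S Proteasome complex | 14 | 14 | 14 | 14 | 14 | 11 | 12 | 14 | 14 | 14 |
| ARP2/3 complex | 7 | 3 | 6 | 2 | 7 | 7 | 4 | 7 | 7 | 7 |
| Coatomer complex | 8 | 0 | 1 | 7 | 4 | 7 | 7 | 8 | 8 | 8 |
| Cohesin-SA2 complex | 4 | 0 | 0 | 3 | 0 | 4 | 2 | 4 | 4 | 4 |
| Condensin-I complex | 5 | 4 | 3 | 2 | 1 | 2 | 4 | 5 | 5 | 5 |
| DNA primase complex | 4 | 0 | 0 | 4 | 4 | 2 | 1 | 4 | 4 | 4 |
| Exocyst complex | 8 | 0 | 0 | 1 | 0 | 7 | 0 | 8 | 7 | 8 |
| Exosome complex | 11 | 1 | 1 | 6 | 3 | 5 | 4 | 11 | 10 | 10 |
| Lsm 2-8 complex | 7 | 0 | 4 | 5 | 5 | 3 | 3 | 6 | 6 | 7 |
| MeCP1 complex | 9 | 2 | 2 | 4 | 1 | 5 | 5 | 9 | 9 | 9 |
| Multisynthetase complex | 11 | 9 | 7 | 8 | 11 | 11 | 2 | 11 | 11 | 11 |
| PBAF complex | 10 | 1 | 2 | 3 | 1 | 5 | 5 | 10 | 10 | 10 |
| Prefoldin complex | 6 | 5 | 6 | 6 | 6 | 5 | 6 | 5 | 5 | 6 |
| RFC complex | 5 | 0 | 0 | 3 | 2 | 2 | 1 | 5 | 5 | 5 |
| Septin complex | 5 | 0 | 0 | 3 | 0 | 5 | 1 | 5 | 5 | 5 |
| Splicing factor 3b complex | 8 | 1 | 0 | 6 | 3 | 8 | 7 | 8 | 8 | 8 |
| TREX complex | 8 | 0 | 1 | 1 | 1 | 4 | 1 | 8 | 7 | 8 |
| TRiC/CCT complex | 8 | 8 | 8 | 8 | 8 | 6 | 3 | 7 | 8 | 8 |
| COG complex | 8 | 0 | 0 | 0 | 0 | 7 | 0 | 7 | 8 | 8 |
| Anaphase promoting complex | 9 | 0 | 0 | 0 | 0 | 0 | 1 | 9 | 9 | 9 |
| Expected total proteins | | 155 | 155 | 155 | 155 | 155 | 155 | 155 | 155 | 155 |
| Observed total proteins | | 48 | 55 | 86 | 71 | 106 | 69 | 151 | 150 | 154 |
| Sensitivity (protein level) | | 31% | 35% | 55% | 46% | 68% | 45% | 97% | 97% | 99% |
| Expected positive PPIs | | 585 | 585 | 585 | 585 | 585 | 585 | 585 | 585 | 585 |
| Observed positive PPIs | | 134 | 168 | 203 | 200 | 181 | 82 | 453 | 509 | 499 |
| All observed PPIs | | 259 | 203 | 632 | 287 | 363 | 205 | 754 | 973 | 820 |
| Sensitivity (PPI level) | | 23% | 29% | 35% | 34% | 31% | 14% | 77% | 87% | 85% |
| Precision (PPI level) | | 52% | 83% | 32% | 70% | 50% | 40% | 60% | 52% | 61% |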


As highlighted in Figure 2-6, starting from the integrated experimental analysis of the 20 reference complexes, components of the same complex typically co-eluted in the same fraction and could be clustered together based on the predicted PPI (see, for example, the anaphase promoting complex). Nevertheless, components of both functionally related complexes (for example, subunits of the cohesin and condensin complexes, which are both involved in cell division and hence may plausibly be linked) and functionally unrelated complexes (such as the splicing factor 3b and the vesicle trafficking coatomer complexes, which are not expected to associate) often appeared in the same fractions. As will be shown in Chapter 3, I turned to additional functional genomic evidence (e.g. gene co-expression) to score and prune my biochemical data, thereby refining my protein co-elution network using computational techniques that minimized coincidental co-fractionation.


Figure 2-6: Co-elution network comprising associations predicted within and between 20 representative annotated human complexes used as a reference test set to evaluate my co-fractionation scoring procedure.

Complex composition, shown using differently colored node groupings (circles), was defined based on existing CORUM curations. Edges within complexes reflect bona fide subunit associations, while edges between complexes likely represent chance co-elution. Edge thickness is proportional to the number of co-apex occurrences across 8 experiments, at a minimum co-apex count of 2 (i.e. the pair of proteins shared the exact same apex peak in at least two experiments).


2.3.3 Assessing the reproducibility and efficiency of the tandem heparin mixed-bed IEX co-fractionation approach

I have employed several criteria to define an optimal, generic and robust high-throughput FP-MS technology for global detection of endogenous water-soluble stable multi-protein complexes. The overall proteome coverage and the PPI prediction sensitivity and precision obtained after tandem heparin mixed-bed IEX HPLC fractionation and shotgun sequencing of HeLa nuclear lysate (HCW1 column in Table 2-2) indicated the potential of this technique to monitor stable protein complexes at a global level.

To further ascertain the applicability of this approach on a proteomic scale, I first assessed the reproducibility of my protocol for characterizing soluble protein complexes. To this end, I performed a second fractionation of HeLa nuclear extract, using the same chromatographic elution parameters that I had established as being optimal. This time, however, after fraction collection I analyzed the protein elution profiles using our recently acquired high performance LTQ Orbitrap Velos hybrid tandem mass spectrometer, with much shorter run times, since the new instrument has a much higher sampling duty cycle and improved sensitivity. In total, 3,060 proteins were identified by MS/MS sequencing at an FDR of 1% (estimated using an empirical decoy database) in two MS technical replicates (i.e., 2 x 120 fractions) within a two-week period of MS instrument time. This proteome coverage corresponds to ~85% of the proteins that were originally identified over nearly two months of extensive shotgun sequencing using a low resolution LTQ instrument (Figure 2-4B).

Moreover, the overall effectiveness at identifying co-eluting components of the same 20 reference complexes was comparable between the two different MS instruments.

The plots presented in Figure 2-7 show the reference complex detection coverage obtained with both the low resolution LTQ and the high resolution LTQ Orbitrap Velos, while also highlighting the high reproducibility of the dual heparin mixed-bed IEX chromatography procedure. Once again, most of the components of the reference complexes were identified in both experiments with high sensitivity and precision (see Table 2-2). As illustrated in Figure 2-7A, however, several known components of certain test complexes did not co-elute (e.g. RRP44, which is an annotated subunit of the exosome), which in turn negatively impacted the two performance measures. These inconsistencies may be explained either by partial complex disruption by the salt gradient or by the component itself being a member of multiple complexes. In some cases, my stringent co-apex scoring scheme missed otherwise highly similar co-elution profiles (seen as gaps in the heatmap plotted in Figure 2-7A). Strategies to address components shared by multiple complexes, as well as improvements to my scoring approach, will be discussed in Chapter 3. Despite these caveats, autocorrelation analysis (namely, the detection of a particular protein in the exact same fractions in the two replicate experiments) identified 2,016 proteins reproducibly profiled by FP-MS, with an average Spearman rank correlation coefficient greater than or equal to 0.90 and a coefficient of variation of only 9%, while the overlap of proteins between the two data sets (LTQ vs. Orbitrap) exceeded 85% (2,634/3,060) (see Figure 2-7C). Therefore, I concluded from these pilot studies that my fractionation and identification procedures were consistent and reliable, and that a revised screening strategy based on the Orbitrap had the long-term potential of serving as the fastest global profiling platform for analyzing large numbers of soluble protein complexes separated by HPLC from mammalian cell extracts.
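The autocorrelation measure described above compares the elution profile of one protein in replicate 1 with the profile of the same protein in replicate 2. The following Python sketch shows one way such a per-protein Spearman rank correlation could be computed. It is a minimal illustration rather than the analysis code used in this work, and the spectral-count profiles are invented toy values.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions in the tie
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either vector is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy spectral counts for one protein across six fractions (hypothetical).
replicate_1 = [0, 2, 9, 4, 0, 0]
replicate_2 = [0, 1, 7, 5, 0, 0]   # same elution position, different counts
shifted = [0, 0, 0, 3, 8, 2]       # elutes later: poorly reproduced

print(spearman(replicate_1, replicate_2))
print(spearman(replicate_1, shifted))
```

A protein would count as reproducibly profiled when this coefficient is at or above the chosen cutoff (0.90 in the text).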


Figure 2-7: Reproducibility of heparin-IEX HPLC and LC-MS/MS Profiling.

A- Comparison of subunit elution profiles of 20 reference complexes in two independent HPLC fractionation experiments of HeLa nuclear extract. In replicate 1, each protein fraction was analyzed with a low scan speed LTQ instrument for 4 hours, while in replicate 2, protein fractions were analyzed on a high scan speed LTQ Orbitrap Velos hybrid instrument for one hour. Shown on the y-axis are examples of the co-elution patterns of known components of protein complexes (annotated on the figure); within each complex, individual subunits are charted. Blue color intensity indicates presence of the protein, and yellow indicates absence of protein co-purification, based on the detection of the exact peak apex in the same fraction of HeLa nuclear extracts. Only apex and directly adjacent peak information is shown. B- Venn diagram summarizing the number of proteins detected by LC-MS/MS and the overlap between the two replicates. C- The cumulative number of proteins with similar co-elution profiles in the two replicate co-fractionation experiments, plotted as a function of the Spearman correlation coefficient for auto-correlated protein profiles.


2.4 Discussion

Although it is a well-established method in traditional biochemistry, I have shown that IEX-HPLC has the exciting potential to be used for modern proteomic analyses of interacting protein networks. In this work, I tested six different configurations of ion-exchange column systems. I found that a multi-phase column combining heparin with mixed-bed IEX-HPLC provided the most effective, reliable, and reproducible means of identifying endogenous soluble human protein complexes starting from biological mixtures. My results also suggested that IEX-HPLC fractionation could be a useful high-throughput method to globally monitor large numbers of human protein complexes without the need for individual gene tagging or targeted affinity purification (which suffer from caveats in addition to logistical concerns). As a proof-of-principle, this method was evaluated and validated by analyzing a test set of annotated human protein complexes from cell-free nuclear extracts. My pilot co-elution profiling data confirmed the robustness of my experimental procedures and the general promise of the FP-MS concept.

In particular, my dual Heparin-PolyCATWAX column set-up displayed impressive resolution of the reference complexes with minimal sample loss due to column flow-through.

Consistent with this, I subsequently observed noticeably improved proteomic detection coverage by LC-MS/MS, as indicated by a significant increase in both the number and quality of identifications made by tandem mass spectrometry. In effect, my optimized fractionation procedure led to a nearly 3-fold increase in nuclear proteome coverage compared to single step heparin affinity chromatography. Nevertheless, further enhancement in resolution may be achieved through more extensive sub-fractionation prior to FP-MS analysis. A key problem with this approach remains the co-elution of functionally unrelated proteins. However, as I document in Chapter 3, computational strategies that integrate orthogonal genomic evidence, such as gene co-expression or co-localization information, can be used to improve the demarcation of the subunits of genuine multi-protein complexes from chance co-eluting entities.

The FP-MS approach described here is flexible, in that it can potentially be applied to virtually any cell line or tissue. In fact, this method is now being routinely used in our laboratory as a standard sample preparation protocol for many of our global proteomic projects, especially those aimed at the construction of new protein interactome maps for a variety of model species whose genomes have been sequenced but for which AP-MS technology has lagged or is inconvenient.


Chapter 3 Application of the FP-MS Approach to the Global Detection of Soluble Stable Human Native Complexes Expressed in HeLa and HEK293 Model Lines

A paper has been published in Cell (Havugimana et al., 2012) based largely on the information in this Chapter.

Under the guidance of Prof. Andrew Emili, I performed all chromatographic fractionation experiments (Cuihong Wan assisted me with the HeLa MS runs on the Orbitrap) and the data analysis described in Figures 3-2 to 3-6, Figure 3-9 and Figures 3-12 to 3-16. I also performed the affinity purification-mass spectrometry experiment shown in Figure 3-18. Prof. Edward Marcotte analyzed the data presented in Figure 3-17 and Figure 3-20B. Pingzhao Hu calculated the weighted cross-correlation scores and I calculated the co-apex scores. Traver Hart calculated the Pearson correlation scores based on a Poisson noise model, integrated all calculated scores with orthogonal genomic evidence (Figure 3-8) and produced Figure 3-10. Tamas Nepusz and Haixuan Yang developed the ClusterONE algorithm and the denoising procedure, respectively. Sadhna Phanse assisted me with the Cytoscape Figure 3-13 and Figure 3-15. Peggy Wang performed the enrichment analyses (Table 3-3 and Table 3-4) and I helped her to retrieve relevant data sets. Prof. Elisabeth Tillier analyzed the data presented in Figure 3-19, and Andrei Turinsky helped with the analysis of the data presented in Figure 3-20A.

The data matrix listing the 5,584 proteins identified in 1,163 fractions can be found in the supplementary Excel spreadsheet data file, on the accompanying CD, and via a dedicated web portal of human protein complexes (http://human.med.utoronto.ca) that comprises all the data generated in this study.


3 Application of the FP-MS approach to the global detection of soluble stable native complexes expressed in HeLa and HEK293 model lines

3.1 Introduction

Protein complexes are stable macromolecular assemblies that perform many of the diverse biochemical activities essential to cell homeostasis, growth and proliferation. Comprehensive characterization of the composition of multi-protein complexes in the sub-cellular compartments of model organisms like yeast, fly, worm and bacteria has provided critical mechanistic insights into the global modular organization of conserved biological systems (Hartwell et al., 1999), accelerated functional annotation of uncharacterized proteins via guilt-by-association (Hu et al., 2009; Oliver, 2000), and facilitated understanding of both evolutionarily conserved and disease-related pathways (Vidal et al., 2011). How the ~20,000 or so proteins encoded by the human genome are partitioned into heteromeric “protein machines” remains an important but elusive research question, however, as fewer than one fifth of all predicted human open reading frames are currently annotated as encoding subunits of protein complexes in public curation databases (Ruepp et al., 2009).

Loss of function in genes encoding the subunits of protein complexes typically gives rise to similar phenotypes or, through genetic interaction, amplifies the phenotypic effects of other alleles in functionally linked sets of genes. Identifying the membership of protein complexes, therefore, addresses a crucial layer in the hierarchical functional organization of biological systems that links the core biochemistry of a functioning cell to the general physiology of an organism, and is fundamental to deciphering the relationship between genotype and phenotype. While bioinformatics analyses have been used to predict evolutionarily conserved human protein-protein interactions (PPIs) on a large scale (Ramani et al., 2008; Rhodes et al., 2005), most of these associations remain to be verified experimentally.

Affinity purification of tagged exogenous proteins coupled with tandem mass spectrometry (AP-MS) is an effective method for isolating and characterizing the composition of stably-associated human proteins in experiments ranging from dozens to hundreds of different 'baits' (Behrends et al., 2010; Bouwmeester et al., 2004; Ewing et al., 2007; Hutchins et al., 2010; Jeronimo et al., 2007; Mak et al., 2010; Sardiu et al., 2008; Sowa et al., 2009). Likewise, immunoprecipitation can be used to systematically isolate endogenous human protein complexes from human cell lines (Malovannaya et al., 2011). Nevertheless, the limited availability of high-quality antibodies or sequence-verified cDNA clones suitable for targeted protein complex enrichment precludes the scale-up required for the unbiased assessment of the molecular association networks underlying human cells. Conversely, although traditionally used to isolate discrete complexes with specific assayable biochemical properties (e.g., enzymatic activity), classical biochemical fractionation procedures have been used to resolve biological mixtures as a means of ascertaining the collective composition of human protein complexes present in certain sub-cellular compartments (Andersen et al., 2003; Ramani et al., 2008; Wessels et al., 2009).

In this study, extensive scaled-up biochemical fractionation with in-depth, quantitative mass spectrometric profiling was combined with stringent computational filtering to resolve and identify endogenous, soluble, stably-associated human protein complexes present in cytoplasmic and nuclear extracts generated from cultured cells. While the resulting reconstructed high-quality physical interaction network shows strong overlap with existing curated and experimentally derived sets of annotated protein complexes, it contains many predicted novel subunits and previously unreported complexes with specific functional, evolutionary and disease-related biological attributes. To the best of my knowledge, this resource represents both the largest experimentally derived catalog to date of human protein complexes for reference cell lines measured under a single, fixed assay condition and a reliable first draft of the basic physical wiring diagram of a human cell.

3.2 Material and Methods

3.2.1 Experimental methods

In addition to the extensive co-fractionation experiments presented in Chapter 2, additional orthogonal co-fractionations using sucrose gradients and isoelectric focusing have been performed to accommodate IEX salt-sensitive complexes. I also performed experiments with HEK 293 cells to construct a nearly universal set of human protein complexes. All MS/MS data that were generated in these experiments were combined and searched against the same database (see Chapter 2) using the same criteria.

3.2.1.1 Chromatographic fractionation of HEK293 nuclear cell-free extracts

HEK 293 nuclear extracts were obtained from Paragon and were fractionated (8-10 mg protein/mL) using my optimized high resolution tandem affinity column coupled online with a mixed-bed ion exchange column, as described in Chapter 2. Time-based fractions (120 x 0.5 ml) were collected, precipitated, reduced, alkylated, digested and analyzed using an LTQ MS instrument in duplicate to confirm reproducibility.

3.2.1.2 Isoelectric focusing fractionation (IEF) of HeLa cell-free extracts

3.2.1.2.1 IEF sample preparation and fractionation

HeLa cells were grown to 70-80% confluency in 75 cm² flasks and harvested by mechanical scraping. Cells were washed in ice-cold PBS, pelleted by centrifugation (600 x g), and resuspended in lysis buffer [10 mM Tris-HCl (pH 8.0), 10 mM KCl, 1.5 mM MgCl2, 0.5 mM DTT, and 1x Protease Inhibitor Cocktail Set I (Calbiochem)]. Cells were lysed on ice using a Dounce homogenizer and fractionated into cytosolic and nuclear fractions using a protocol adapted from a previous publication (Andersen et al., 2003). Briefly, cells were centrifuged at 1000 x g for 5 minutes (4°C). The supernatant was saved as the cytosolic fraction. The pellet was resuspended in 250 mM sucrose/10 mM MgCl2/1x Protease Inhibitor Cocktail, layered over a sucrose cushion of 880 mM sucrose/0.5 mM MgCl2/1x Protease Inhibitor Cocktail, and centrifuged at 3000 x g for 10 minutes (4°C). The supernatant was discarded and the pellet was resuspended in lysis buffer with 5% NP-40 in a sonicating water bath (15 minutes). Following sonication, samples were centrifuged at 3500 x g for 10 minutes to pellet insoluble material, with the supernatant saved as the nuclear fraction. Both cytosolic and nuclear fractions were further fractionated in solution by isoelectric focusing on a MicroRotofor Liquid-Phase IEF cell (Bio-Rad). Ten fractions per sample were collected across a pH range of either 3-10 or 5-8. Following IEF fractionation, ampholytes were removed using the OrgoSol DetergentOUT detergent removal kit (G-Biosciences).

3.2.1.2.2 Trypsin digestion and MS analysis of IEF protein fractions

Samples were denatured and reduced in 50% 2,2,2-trifluoroethanol (TFE) and 15 mM DTT at 55°C for 45 minutes, followed by alkylation with 55 mM iodoacetamide for 30 minutes at room temperature in the dark. Following alkylation, samples were diluted to 5% TFE in 50 mM Tris-HCl, pH 8.0/2 mM CaCl2 and digested with a 1:50 final concentration of Proteomics Grade trypsin (Sigma) for 5 hours at 37°C. Digestion was quenched by addition of 1% formic acid, and the sample volume was reduced to near dry (<20 μl) by speedvac centrifugation. Samples were resuspended in 5% acetonitrile/0.1% formic acid and bound and washed on HyperSep C18 SpinTips (Thermo). Following elution, the sample volume was reduced by speedvac to remove elution buffer. Samples were resuspended in 5% acetonitrile/0.1% formic acid and filtered through Amicon Ultra 10 kDa centrifugation filters (Millipore).


Samples were analyzed by LC-MS/MS. Peptides were separated on a Zorbax 300SB-C18 reverse phase column (0.075 x 150 mm, 3.5 μm; Agilent) with an elution gradient of 5-38% acetonitrile over 230 minutes followed by 38-100% over 15 minutes. Peptides were analyzed by nanoelectrospray ionization onto an LTQ Orbitrap mass spectrometer (Thermo Scientific). Parent mass scans (MS1) were collected at high resolution (100,000) with data dependent ion selection activated for ions of greater than +1 charge. Up to 12 ions per MS1 were selected for CID fragmentation spectrum acquisition (MS2), with ions selected twice within 30 seconds placed on a dynamic exclusion list for 45 seconds.

3.2.1.3 Sucrose gradient fractionation of HeLa cell-free extracts

Generation of the sucrose density gradient fractions and their MS analysis have been described elsewhere (Andersen et al., 2003; Ramani et al., 2008). Briefly, fractions were generated using a 7-47% continuous sucrose gradient and ultra-high-speed centrifugation of the supernatants from HeLa S3 cell-free extracts. Gradient fractions were analyzed by mass spectrometry on an LTQ-Orbitrap hybrid mass spectrometer (ThermoFisher), and tandem mass spectra were searched as described below.

3.2.1.4 Immunoprecipitation mass spectrometry of selected candidates

C-terminal 3X-FLAG tagged expression clones of candidate ribosome biogenesis proteins were constructed via Gateway LR cloning (Invitrogen) of human ORF clones from the PlasmidID collection into a modified pcDNA3 vector (Invitrogen), followed by sequence verification. 3 x 10^6 HEK293 cells were transfected with 5 μg of DNA of tagged genes, and untransfected cells were used as a control. FuGene6 (Roche) reagent in DMEM medium with 10% FBS and 1 U/ml of penicillin and streptomycin (Lonza) was used to transfect the cells for 24 hr. Cells were harvested after growing in the same medium with 10 U/ml of penicillin and streptomycin for an additional 24 hr. Cell lysis, FLAG immunoprecipitation (IP) on anti-FLAG M2 affinity gel (Sigma; A2220), immuno-complex elution and digestions were performed according to the method of Dunham et al. (Dunham et al., 2011). Digested peptide mixtures (9 μl) were loaded onto a reverse phase micro-capillary pre-column (25-mm x 75-μm silica packed with 5-μm Luna C18 stationary phase; Phenomenex) and injected onto a micro-capillary analytical column (100-mm x 75-μm). Peptide separation was performed over 105 min with 5-95% acetonitrile (acidified with 0.1% formic acid) via an EASY-nLC system. Eluted peptides were directly sprayed into an Orbitrap Velos mass spectrometer (ThermoFisher Scientific) with collision activated dissociation using a nanospray ion source (Proxeon). Ten MS/MS data-dependent scans were acquired simultaneously with one high resolution (60,000) full scan mass spectrum. An exclusion list was enabled to exclude a maximum of 500 ions for 30 seconds. Acquired RAW files were extracted from the mass spectrometry data with the extractms program and submitted for database searching using the SEQUEST search engine against a target-decoy UniProtKB/Swiss-Prot FASTA file. Search parameters were set to allow for one missed cleavage site, one variable modification of +16 for methionine oxidation and one fixed modification of +57 for cysteine carbamidomethylation, using precursor ion tolerances of 3 m/z. After searching, peptide and protein hits were filtered using a 20 ppm (parts per million) tolerance for the precursor ion. I required a 1% FDR for protein and peptide positive identifications.

3.2.2 Computational analysis methods

3.2.2.1 MS correlation measures

3.2.2.1.1 Pearson correlation coefficient

Proteins belonging to the same multi-protein complex should co-elute across a biochemical fractionation, giving rise to similar elution profiles for those proteins. The similarity of elution profiles, represented as vectors containing the observed spectral counts for a protein in each fraction in a single experiment, was measured by determination of the Pearson correlation coefficients of the normalized elution profiles.

Each fractionation and mass spectrometry series identified N proteins across M fractions. The raw data matrix is then an N by M matrix A, where each A(i,j) represents the number of MS/MS spectra observed to match protein i in fraction j. The normalized data matrix, B, converted numbers of peptides to frequencies, and was calculated as

$$B_{i,j} = \frac{A_{i,j}}{\sum_{i} A_{i,j}}$$

A protein's normalized elution profile is represented by a row in this matrix, and the Pearson correlation coefficient was measured for each pair. While the Pearson correlation coefficient is a good indicator of a co-complex relationship if both proteins are observed at high counts in the matrix, proteins observed at very low counts but found in the same fraction are often perfectly correlated but have poor predictive power.
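The normalization and the low-count artifact described above can be illustrated with a small Python sketch. It assumes that the sum in the normalization runs over proteins i within each fraction j (so each cell is divided by its fraction total); the spectral counts and the helper names are hypothetical and serve only to show why two proteins with a single shared spectrum appear perfectly correlated.

```python
from statistics import mean


def normalize(A):
    """Divide each cell by the total count of its fraction (column)."""
    n_fractions = len(A[0])
    totals = [sum(row[j] for row in A) for j in range(n_fractions)]
    return [
        [row[j] / totals[j] if totals[j] else 0.0 for j in range(n_fractions)]
        for row in A
    ]


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either vector is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


# Toy raw matrix A: rows are proteins, columns are fractions (hypothetical counts).
A = [
    [0, 10, 40, 12, 0],   # row 0: abundant subunit
    [0, 8, 35, 9, 0],     # row 1: abundant subunit of the same complex
    [1, 0, 0, 0, 0],      # row 2: rare protein, one spectrum
    [1, 0, 0, 0, 0],      # row 3: rare protein, one spectrum
    [30, 5, 0, 0, 0],     # row 4: unrelated background protein
    [0, 0, 0, 6, 25],     # row 5: unrelated background protein
]

B = normalize(A)
abundant_r = pearson(B[0], B[1])
rare_r = pearson(B[2], B[3])
print(abundant_r, rare_r)
```

Both pairs score close to 1, although the second pair rests on a single spectrum per protein, which is the artifact the noise model is meant to address.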

To circumvent this artifact, synthetic noise was introduced into the raw data matrix, and the extent to which this noise affected the observed correlations and, by extension, the predictive power of correlation as it relates to protein complex membership, was measured. The observation of each protein in each fraction was modeled as a Poisson process, with the lambda parameter assigned as the maximum likelihood estimate, equal to the raw counts of protein i in fraction j (the A(i,j) value). The noise term 1/M was added to the maximum likelihood estimate for each cell. The value 1/M was chosen on the basis that each protein was represented in the matrix by at least one peptide count, and the background probability for this should be evenly distributed across the M fractions. Thus the noise-added matrix is C = A + 1/M, where 1/M is a constant added to every cell. The MS experiment was re-run in silico by drawing randomly from Poisson(C(i,j)) for each cell, then normalizing as above and calculating the Pearson correlations for each pair of proteins. This process was repeated 1,000 times, and the mean Pearson correlation for each pair was recorded.

The noise term has the effect of giving every cell in the matrix a nonzero, albeit small, probability of "discovering" a protein count in that cell. The impact of this discovery on the correlation of that protein's elution profile with other normalized elution profiles was minimal for proteins observed at high counts and maximal for those observed with only one count across all fractions.
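The resampling procedure can be sketched in Python as follows. This is an illustrative reimplementation under stated assumptions, not the code used for the thesis: the Poisson sampler is a simple textbook method (Knuth's multiplication algorithm), the normalization divides by fraction totals, the random seed and toy counts are arbitrary, and all function names are hypothetical.

```python
import math
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either vector is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def normalize(A):
    """Divide each cell by the total count of its fraction (column)."""
    m = len(A[0])
    totals = [sum(row[j] for row in A) for j in range(m)]
    return [[row[j] / totals[j] if totals[j] else 0.0 for j in range(m)] for row in A]


def poisson_draw(lam, rng):
    """Draw one Poisson(lam) variate; adequate for the small rates used here."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def noise_model_correlation(A, i, k, n_iter=1000, seed=0):
    """Mean Pearson correlation of rows i and k over Poisson-resampled matrices."""
    rng = random.Random(seed)
    M = len(A[0])
    C = [[a + 1.0 / M for a in row] for row in A]  # noise-added rates
    total = 0.0
    for _ in range(n_iter):
        sampled = [[poisson_draw(c, rng) for c in row] for row in C]
        B = normalize(sampled)
        total += pearson(B[i], B[k])
    return total / n_iter


# Toy raw counts (hypothetical): rows are proteins, columns are fractions.
A = [
    [0, 10, 40, 12, 0],   # abundant subunit
    [0, 8, 35, 9, 0],     # abundant subunit of the same complex
    [1, 0, 0, 0, 0],      # rare protein, one spectrum
    [1, 0, 0, 0, 0],      # rare protein, one spectrum
    [30, 5, 0, 0, 0],     # background
    [0, 0, 0, 6, 25],     # background
]

abundant_mean = noise_model_correlation(A, 0, 1)
rare_mean = noise_model_correlation(A, 2, 3)
print(abundant_mean, rare_mean)
```

In this toy example the abundant pair keeps a high mean correlation, while the pair supported by a single spectrum each drops well below it, in line with the behaviour described in the text.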

3.2.2.1.2 Weighted cross correlation

In addition to the noise model correlation scores, a weighted cross correlation score was measured for each pair of proteins in each experiment. The similarity of spectral profiles between each pair of proteins was calculated based on a weighted cross correlation (WCC) approach (de Gelder et al., 2001), which was implemented in the R package wccsom (http://cran.r-project.org/web/packages/wccsom/index.html). The similarity value is between 0 and 1. This approach has some advantages over other similarity measures, such as the Pearson correlation coefficient. The WCC approach can take into account the relative shift between spectral profile patterns. In other words, given a protein, one can compare its spectral profile at a point/fraction with the profile of another protein in the neighborhood of the corresponding point/fraction. Moreover, it is possible to assign weights to the different points in the neighborhood. In this study, a shift of one point/fraction between spectral profile patterns was considered, and the weights were calculated based on a simple triangle function (http://mathworld.wolfram.com/TriangleFunction.html).
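To make the idea concrete, the sketch below implements a weighted cross correlation in plain Python, following the general form given by de Gelder et al. (2001): lagged products of the two profiles are summed with triangle weights and normalized by the corresponding weighted autocorrelations. It is not the wccsom code; the triangle width of 2 (weights 1 at lag 0 and 0.5 at lags of one fraction) is an assumption chosen to match the one-fraction shift described above, and the profiles are toy values.

```python
import math


def lagged_product(f, g, lag):
    """Sum of f[j] * g[j + lag] over positions where both indices are valid."""
    n = len(f)
    return sum(f[j] * g[j + lag] for j in range(n) if 0 <= j + lag < n)


def weighted_sum(f, g, width):
    """Triangle-weighted sum of lagged products for |lag| < width."""
    total = 0.0
    for lag in range(-(width - 1), width):
        weight = 1.0 - abs(lag) / width
        total += weight * lagged_product(f, g, lag)
    return total


def wcc(f, g, width=2):
    """Weighted cross correlation of two equal-length, non-negative profiles."""
    denom = math.sqrt(weighted_sum(f, f, width) * weighted_sum(g, g, width))
    if denom == 0:
        return 0.0
    return weighted_sum(f, g, width) / denom


# Toy elution profiles (hypothetical spectral counts).
profile_a = [0, 5, 20, 6, 0, 0]
profile_b = [0, 0, 5, 20, 6, 0]   # same shape, shifted by one fraction

print(wcc(profile_a, profile_a))            # identical profiles
print(wcc(profile_a, profile_b, width=2))   # tolerant of a one-fraction shift
print(wcc(profile_a, profile_b, width=1))   # lag 0 only (no shift tolerance)
```

With a width of 1 only the unshifted overlap contributes, so comparing the last two values shows how the neighborhood weighting rewards profiles that are offset by a single fraction.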


3.2.2.2 Machine learning methods

The noise-model correlations and weighted cross correlations of each pair of proteins observed in each of the seven cytoplasmic and eleven nuclear MS fractionation experiments were combined into matrices of protein pairs x 14 (cytoplasmic) or x 22 (nuclear) experimental observations. Missing data, where the pair of proteins were not both observed in a given experiment, were interpreted as zeros.

A gold standard reference set of positive and negative interactions was generated from the CORUM database of curated mammalian protein complexes. Human complexes consisting of 3 or more proteins were identified and filtered for those identified by mass spectrometry and related methods, removing those identified solely by, for example, two-hybrid approaches, EMSA, and imaging techniques. Highly overlapping complexes (those with a Simpson similarity coefficient > 0.5) were merged, resulting in a reference set of 324 complexes comprising 2,151 proteins. Each complex was then classified as “nuclear” and/or “cytoplasmic” based on the GO Cellular Component annotation of its constituent proteins, resulting in 198 cytoplasmic and 190 nuclear complexes. These complexes were then randomly split into two groups, one for training pairwise co-complex protein-protein interactions in a machine learning framework and another independent set for optimizing final protein complex predictions from putative PPI. For PPI training, a reference positive interaction was defined as the case where two proteins were annotated to be in the same complex, and a reference negative interaction was defined as the case where both proteins were in the annotated set but never appeared in the same complex. Although the CORUM complexes contain a large number of highly overlapping, redundant complex definitions, merging redundant complexes and reducing the complexes to unique pairwise interactions minimized this source of bias. To further reduce bias, large complexes from the CORUM reference set (e.g. spliceosome, ribosome) were excluded. Such big complexes would otherwise account for a majority of reference PPI. Moreover, while the defined negative interactions almost certainly contained some actual positives due to incomplete annotations, their effect is necessarily small, as negative interactions greatly outnumbered positives. This renders the estimation of accuracy more conservative, as some negatives will in fact be mislabeled.
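The construction of the reference pairs can be sketched as follows. The Simpson similarity coefficient is taken here in its standard form, the size of the intersection divided by the size of the smaller set. The greedy merge loop is only one plausible way to combine overlapping complexes (the merging procedure is not specified in the text), and the complexes and protein names are invented placeholders.

```python
from itertools import combinations


def simpson(a, b):
    """Simpson (overlap) coefficient: |a & b| / min(|a|, |b|)."""
    return len(a & b) / min(len(a), len(b))


def merge_overlapping(complexes, threshold=0.5):
    """Greedily merge any two complexes whose Simpson coefficient exceeds threshold."""
    merged = [set(c) for c in complexes]
    changed = True
    while changed:
        changed = False
        for i, j in combinations(range(len(merged)), 2):
            if simpson(merged[i], merged[j]) > threshold:
                merged[i] |= merged[j]
                del merged[j]
                changed = True
                break  # restart the scan, since indices have shifted
    return merged


def reference_pairs(complexes):
    """Positives share a complex; negatives are annotated but never share one."""
    positives = set()
    for c in complexes:
        positives.update(frozenset(p) for p in combinations(sorted(c), 2))
    proteins = sorted(set().union(*complexes))
    all_pairs = {frozenset(p) for p in combinations(proteins, 2)}
    return positives, all_pairs - positives


# Hypothetical toy annotation: two redundant definitions plus one distinct complex.
toy_complexes = [{"A", "B", "C"}, {"A", "B", "C", "D"}, {"E", "F", "G"}]

merged = merge_overlapping(toy_complexes)
positives, negatives = reference_pairs(merged)
print(len(merged), len(positives), len(negatives))  # prints: 2 9 12
```

Even in this small example the negatives outnumber the positives, which mirrors the imbalance noted above.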

The data were subjected to a variety of machine learning algorithms using the Weka suite of tools and assessed for accuracy and coverage. Naïve Bayes and Logistic Regression classifiers were run using default parameters. A Support Vector Machine (SVM) was applied using the SMO engine with a radial basis function kernel. The Random Forest implementation in Weka was too slow to use in an exploratory fashion, but the Fast Random Forest re-implementation (http://code.google.com/p/fast-random-forest/) gave a significant performance boost and yielded the best results, as judged by cross-validated recall-precision analysis.

3.2.2.2.1 Incorporation of genomic and proteomic evidence

Genomic and proteomic evidence was assembled from the HumanNet functional gene interaction network (Lee et al., 2011). HumanNet integrates a wide array of alternate data types across both human cell lines and model organism experiments into a log likelihood score indicating the strength of evidence suggesting that a given pair of genes operates in the same biological process. In order to minimize circularity that might bias predictions of PPIs, data derived from human experimental and computational prediction of protein-protein interactions were excluded from the selected lines of evidence reported in HumanNet. In all, protein-protein linkages from 17 lines of evidence were individually added to the classifier as independent features, with missing values set to zero. The data types included are those listed in Table 1-1 (Chapter 1), excluding the human Y2H data, human published AP-MS data, human literature curated PPI data, and co-citation of human genes data.


The nuclear dataset thus comprised 41 quantitative features for each protein pair: 11 MS datasets measured by noise-model correlation, and again by weighted cross-correlation; the 17 features from HumanNet; a Co-Evolution score (Clark et al., 2011; Tillier and Charlebois, 2009) measuring correlated evolutionary rates; and a Co-Apex score measuring the number of MS experiments in which both proteins showed maximum (modal) abundance in the same fraction.

Likewise, the cytoplasmic dataset consisted of 33 features per pair: 14 MS and 19 other.
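The Co-Apex feature described above can be sketched in a few lines of Python. This is an illustrative version only: ties are resolved by taking the first maximal fraction and a protein with no counts in an experiment is treated as having no apex, both of which are assumptions, and the profiles are toy values.

```python
def apex(profile):
    """Index of the fraction with the highest count, or None if never observed."""
    if not profile or max(profile) == 0:
        return None
    return profile.index(max(profile))


def co_apex_score(profiles_a, profiles_b):
    """Number of experiments in which both proteins peak in the same fraction."""
    score = 0
    for pa, pb in zip(profiles_a, profiles_b):
        a, b = apex(pa), apex(pb)
        if a is not None and a == b:
            score += 1
    return score


# One list per protein, one profile per experiment (hypothetical counts).
protein_a = [[0, 3, 9, 2], [1, 7, 2, 0], [0, 0, 0, 0]]
protein_b = [[0, 1, 6, 1], [0, 5, 1, 0], [2, 0, 0, 0]]
protein_c = [[5, 1, 0, 0], [0, 0, 4, 1], [0, 3, 0, 0]]

print(co_apex_score(protein_a, protein_b))  # prints: 2
print(co_apex_score(protein_a, protein_c))  # prints: 0
```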

A greedy stepwise feature selection algorithm implemented in Weka was used to rank features and select only the most informative ones, with the specific goal of choosing the single best correlation metric for each particular MS data set. It was observed that, after the first of the large-scale repeat MS experiments was folded into the classifier, the second repeat added little information and ranked poorly. To rescue these data, the four largest MS replicates (Table 3-1) were merged by addition, and the noise model and weighted cross correlation scores were recalculated for these four datasets. Performing feature selection on these data yielded 22 top-performing, non-duplicated features for the cytoplasmic data and 25 features for the nuclear data. Predictions were generated for these sets using the Fast Random Forest classifier in Weka, and a combined score was generated for each pair by taking one minus the product, over the datasets, of one minus the posterior probability of the pair interacting, as predicted by the classifier. For pairs that appeared in only one dataset, that dataset’s posterior probability was used. Applying the classifier to all pairs that had a correlation measure greater than 0.5 in any one MS data set yielded 817,179 protein pairs, of which 48,915 had a posterior probability >= 0.5. Notably, incorporation of the complementary genomic evidence boosted the recall of PPI beyond that from the mass spectrometry evidence alone, across a wide range of predictive precision, e.g. increasing recall by ~20% at a cumulative precision of 0.7.
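The score combination described above amounts to one minus the product of the complements of the available posterior probabilities. The sketch below assumes that the two datasets being combined are the nuclear and cytoplasmic classifiers; the function name and the example probabilities are hypothetical.

```python
def combined_score(posteriors):
    """Return 1 - prod(1 - p) over the posterior probabilities available for a pair."""
    remaining = 1.0
    for p in posteriors:
        remaining *= 1.0 - p
    return 1.0 - remaining


# Pair scored by both classifiers (hypothetical nuclear and cytoplasmic posteriors).
both = combined_score([0.6, 0.5])

# Pair observed in only one dataset: the single posterior is returned unchanged.
single = combined_score([0.7])

print(both, single)
```

With this rule a pair supported in both compartments scores higher than either posterior alone (here roughly 0.8), while a pair seen in one dataset keeps its original probability.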


Table 3-1: Summary of the samples analyzed by MS in this study.

(NE, nuclear extract; CE, cytoplasmic extract; IEF, isoelectric focusing; WAX, weak anion-exchange; IEX, ion-exchange; LTQ, linear ion-trap)

Experiment  Type of cell extract  Fractionation approach  Number of fractions  MS instrument  Number of MS replicates
1           HEK 293 NE            Dual heparin-IEX        120                  LTQ            2
2           HeLa NE               Dual heparin-IEX        120                  LTQ            2
3           HeLa NE               Dual heparin-IEX        120                  Orbitrap       1
4           HeLa NE               Triple phase IEX        375                  LTQ            2
5           HeLa NE               Single heparin          48                   LTQ            1
6           HeLa NE               Sucrose gradient        14                   Orbitrap       1
7           HeLa NE               IEF, pH 5 to 8          10                   Orbitrap       1
8           HeLa NE               IEF, pH 3 to 10         10                   Orbitrap       1
9           HeLa CE               Triple phase IEX        269                  LTQ            2
10          HeLa CE               Single WAX              43                   LTQ            1
11          HeLa CE               Sucrose gradient        14                   Orbitrap       1
12          HeLa CE               IEF, pH 5 to 8          10                   Orbitrap       1
13          HeLa CE               IEF, pH 3 to 10         10                   Orbitrap       2

3.2.2.2.2 Denoising the inferred protein-protein interactions

To further reduce the noise in the inferred protein-protein interaction network (i.e., the random forest output), a novel procedure that exploits network topology and protein co-localization information was developed to filter the inferred PPI prior to discovering protein complexes. If a high-probability PPI existed between two proteins that were otherwise poorly connected through their local network neighbourhood, the edge was considered spurious and was removed by the procedure.

The first step of the new procedure removed the connections in the interaction network for which there was little evidence according to the network topology. The rationale here was that if two proteins belong to the same complex, they should be well connected to each other through many short paths in the graph. Diffusion methods over random graphs have previously been employed to quantify the amount of connectivity existing between two nodes in a graph (Coifman et al., 2005; Paccanaro et al., 2006).

Here, a multiple-step diffusion was used, in which the connectivity between proteins i and j is calculated as the (i,j) element of the matrix defined below:

$$e^{\lambda M} = \sum_{k=0}^{\infty} \frac{\lambda^{k}}{k!} M^{k}$$

where M is the 5,549×5,549 matrix whose entries are the output of the random forest classifiers, and λ is the inverse of the maximal eigenvalue of M. Edges with diffusion values lower than 5E-05 were then deleted from the original graph. The resulting network was defined as the D graph.
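The diffusion step can be illustrated on a toy network. The sketch below assumes that the diffusion matrix is the matrix exponential of λM, that M is symmetric and non-negative with at least one edge, and it uses a truncated power series and a shifted power iteration in place of whatever numerical routines were actually used. All function names are hypothetical.

```python
import math


def mat_vec(m, v):
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]


def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def max_eigenvalue(m, iters=500):
    """Power iteration on M + I; the shift avoids oscillation on bipartite graphs."""
    v = [1.0] * len(m)
    for _ in range(iters):
        w = [x + y for x, y in zip(mat_vec(m, v), v)]  # (M + I) v
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T M v for the unit-length vector v
    return sum(x * y for x, y in zip(v, mat_vec(m, v)))


def diffusion_kernel(m, terms=30):
    """Truncated series sum_k (lam^k / k!) M^k with lam = 1 / max eigenvalue."""
    n = len(m)
    lam = 1.0 / max_eigenvalue(m)
    term = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # k = 0 term
    total = [row[:] for row in term]
    for k in range(1, terms + 1):
        term = [[lam / k * x for x in row] for row in mat_mul(term, m)]
        total = [[t + x for t, x in zip(tr, xr)] for tr, xr in zip(total, term)]
    return total


def prune(m, threshold=5e-05):
    """Keep an edge weight only if its diffusion value reaches the threshold."""
    kernel = diffusion_kernel(m)
    n = len(m)
    return [[m[i][j] if kernel[i][j] >= threshold else 0.0 for j in range(n)]
            for i in range(n)]
```

For the 5,549-protein network a dense linear algebra library would be the practical choice; the pure-Python loops here are only meant to make the definition concrete.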

In the second step, the D graph was calibrated using protein co-localization information. To do this, the network was linearly combined with the GO-CC (Haas et al., 1999; Harris et al., 2004) normalized semantic similarity scores with the assumption that they are independent. The rationale here was that two proteins located in different cellular locations should not interact. The final score for each link was thus given by:

$$1 - \left(1 - D_{i,j}\right)\left(1 - \frac{Sim(i,j)}{MS}\right)$$

where Sim(i,j) is the maximum of the pairwise similarities between the two groups of GO-CC terms to which protein i and protein j are annotated (Pesquita et al., 2009; Resnik, 1999), and MS is the maximal value of the semantic similarity scores.

Note that, among the 5,549 proteins, 1,790 are not annotated in GO-CC. For these proteins the D graph (output of the first step) was simply used, as the second step cannot be applied to unannotated proteins. Also, GO-CC annotations with evidence codes such as NR, IEA, and ND were discarded. Scores below a threshold of 0.55 were set to zero. The resulting de-noised protein-protein interaction graph contains 13,993 interactions (3,006 proteins) at an estimated 21.5% false discovery rate (FDR).
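A minimal sketch of the calibration step follows. It assumes that the final link score has the form 1 − (1 − D)(1 − Sim/MS), that proteins lacking usable GO-CC annotation keep their D value, and that the 0.55 cut-off is applied to all links afterwards. The function name and the use of `None` for a missing similarity are hypothetical.

```python
from typing import Optional


def calibrated_score(d_ij: float, sim_ij: Optional[float], max_sim: float,
                     threshold: float = 0.55) -> float:
    """Combine a D-graph weight with a normalized GO-CC similarity, then threshold."""
    if sim_ij is None:
        # No usable GO-CC annotation for at least one protein: keep the D value.
        score = d_ij
    else:
        score = 1.0 - (1.0 - d_ij) * (1.0 - sim_ij / max_sim)
    return score if score >= threshold else 0.0
```

For example, a link with D = 0.5 and a similarity equal to half the maximum would receive 1 − 0.5 × 0.5 = 0.75 and be retained.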

3.2.2.3 Clustering of the denoised PPI network to discover protein complexes

Protein complexes appear as densely connected regions within the denoised interaction network.

Since a protein may belong to multiple protein complexes, these densely connected regions may overlap. To elucidate such overlapping sets of dense regions in the predicted biochemical network, a novel algorithm named ClusterONE (Clustering with Overlapping Neighborhood Expansion) was developed. ClusterONE finds complexes by growing multiple clusters from seed proteins, independently of each other. The growth of a putative complex is governed by a greedy rule that tries to maximize the cohesiveness of the complex. The cohesiveness of a complex C was defined as follows:

$$\frac{W_{in}}{W_{in} + W_{out} + p\,|C|}$$

where W_in is the total weight of connections within C, W_out is the total weight of interactions connecting the complex with the rest of the network and |C| is the size of the complex. p is a penalty constant that accounts for the possibility of uncharted connections in the network, as it assumes p extra external connections for the complex for every protein involved. In each step of the growth process, either a new adjacent protein was added to the complex or an already added protein was removed, whichever yielded the maximal increase in cohesiveness. The growth process stops when it is not possible to increase the cohesiveness any further. At this stage, the cluster is declared a protein complex candidate if its density is above a given density threshold d, and the growth process restarts from a different seed. The first seed is the protein with the largest total weight on its incident connections (i.e., the protein with the most confident set of interactions), and subsequent seeds are always selected in a similar manner but excluding proteins that have already been added to some protein complex candidate. Since the growth processes are independent of each other, the calculated complexes may overlap. More details on ClusterONE can be found in Nepusz et al. (2012). The algorithm has two main parameters: the penalty p and the density threshold d. The optimal settings for these parameters were found after training the algorithm on the cluster-training complex subset (see above) to yield the highest maximum bipartite matching ratio (Nepusz et al., 2012). In this study, the final set of complexes was derived using p=2.9 and d=0.4.
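To make the cohesiveness measure concrete, the sketch below evaluates it on a small weighted graph. It assumes the quotient W_in / (W_in + W_out + p|C|) described above; the edge representation (a dictionary keyed by `frozenset` pairs) and the function name are hypothetical and do not reflect the ClusterONE implementation.

```python
def cohesiveness(weights, cluster, p):
    """weights: dict mapping frozenset({a, b}) -> edge weight; cluster: set of proteins."""
    w_in = 0.0   # total weight of edges with both endpoints inside the cluster
    w_out = 0.0  # total weight of edges with exactly one endpoint inside the cluster
    for edge, w in weights.items():
        inside = len(edge & cluster)
        if inside == 2:
            w_in += w
        elif inside == 1:
            w_out += w
    return w_in / (w_in + w_out + p * len(cluster))
```

A greedy growth step would evaluate this function for each candidate addition or removal and keep the change giving the largest increase.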

To evaluate the overlap of the predicted complexes with the CORUM complexes, the number of CORUM complexes matching at least one predicted complex with a matching score greater than 0.25 (matching score = size of the intersection squared, divided by the product of the two complex sizes, as previously defined (Bader and Hogue, 2003)) was first counted.

Secondly, the maximum matching ratio (Nepusz et al., 2012) was calculated by matching each predicted complex to at most one reference complex and vice versa, while maximizing the total matching score between them (with the theoretical maximum of 1.0 considered as a perfect match). Thirdly, the geometric accuracy was measured as previously defined (square-root of the product of positive predictive value and clustering-wise sensitivity) (Brohee and van Helden, 2006). The predicted complexes showed better correspondence with the CORUM catalogue of reference human protein complexes than the results of other popular methods, including MCODE, MCL, CMC and RNSC (see Table 3-2).
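The first of these measures, the matching score with its 0.25 cut-off, can be sketched as follows. The function names are hypothetical and complexes are represented simply as sets of protein identifiers.

```python
def matching_score(a, b):
    """Squared intersection size divided by the product of the two complex sizes."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    shared = len(a & b)
    return shared * shared / (len(a) * len(b))


def count_matched_references(reference, predicted, cutoff=0.25):
    """Number of reference complexes matched by at least one predicted complex."""
    return sum(
        1 for ref in reference
        if any(matching_score(ref, pred) > cutoff for pred in predicted)
    )
```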


3.2.2.4 Enrichment analysis of protein pairs with shared annotations

To evaluate interacting and co-complexed protein pairs, the following large-scale sets of protein-protein interactions were collected: 1,991 co-complex interactions related to chromosome segregation (Hutchins et al., 2010), “co-regulator” complexes identified through immunoprecipitation and mass spectrometry-based methods (Malovannaya et al., 2011), and a D. melanogaster protein interaction network generated with the AP-MS method (Guruharsha et al., 2011). In addition, the following sets of gene annotations were also amassed: three available sets of 1,023, 3,563, and 114,477 human disease-gene associations (Becker et al., 2004; Hamosh et al., 2005; The UniProt Consortium, 2011), 2,065 gene-mitotic phenotype associations (Hutchins et al., 2010; Neumann et al., 2010), curated sets of 74,250 mouse, 86,383 yeast, and 27,065 worm gene-phenotype associations assembled by McGary et al. (2010), and upstream regulatory motifs for 265,270 genes (Xie et al., 2005).

Next, protein interaction partners were tested for shared functional annotations or enrichment in particular phenotypic associations. That is, are protein pairs which are predicted to interact significantly more likely to share annotations? For each annotation set, the total number of protein pairs sharing annotations was calculated in the space of all possible pairs formed from the background set of annotated proteins detectable through my experimental procedures. This “expected” fraction of pairs with shared annotations was compared with the “observed” fraction of interaction partners with shared annotations. To measure the significance of the observed fraction, a p-value was obtained from the following hypergeometric test:

$$p(x \geq k) = \sum_{x=k}^{\min(n,m)} \frac{\binom{m}{x}\binom{N-m}{n-x}}{\binom{N}{n}} \qquad [1]$$

where N is the number of possible annotated pairs, m is the number of possible pairs with shared annotation, n is the number of annotated interaction partners, and k is the number of interaction partners with shared annotation.

3.2.2.5 Enrichment analysis of protein clusters with particular phenotype associations

Predicted protein clusters were tested for their enrichment for particular human, mouse, or worm gene-phenotype associations. The significance of members of a cluster sharing a particular phenotype was determined by the hypergeometric probability, as above, where N is the number of annotated proteins in the background protein set, m is the number of proteins annotated with the queried phenotype, n is the number of annotated proteins in the cluster, and k is the number of proteins in the cluster annotated by the queried phenotype.
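The hypergeometric tail probability used in both enrichment analyses can be computed directly with exact integer arithmetic. The sketch below uses the symbols N, m, n and k as defined in the text; the function name is hypothetical.

```python
from math import comb


def hypergeom_tail(N, m, n, k):
    """P(X >= k) when n items are drawn from N, of which m carry the annotation."""
    upper = min(n, m)
    favourable = sum(comb(m, x) * comb(N - m, n - x) for x in range(k, upper + 1))
    return favourable / comb(N, n)
```

For pair-level tests N can be in the millions, so in practice a log-space or library implementation may be preferable to exact binomial coefficients.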

3.2.2.6 Cross-validations with curated complexes in public databases and independent studies

I compared my network of complexes to curated complexes in 5 public resources, namely the CORUM (Ruepp et al., 2009), REACTOME (Haw et al., 2011), PINdb (Luc and Tempst, 2004), and HPRD (Prasad et al., 2009) databases and the complexes specified within the GO cellular component category (Ashburner et al., 2000), to assess the agreement between the predicted complexes and the literature. I required a minimum of 2 shared components and a Simpson similarity coefficient (defined as the number of proteins shared between two query complexes, divided by the number of proteins in the smaller of the two complexes) greater than 0.5 in order to report a putative complex as curated in any one of the above databases. Next, I validated putative new complexes (i.e., not curated in the above public repositories) through comparison with recently published independent co-affinity purification data (Guruharsha et al., 2011; Malovannaya et al., 2011). In particular, I accessed the recent human protein interaction results of Guruharsha et al. (2011). This group performed affinity-tag pull-down experiments for human proteins present in 41 of my complexes. Overall, of the 299 relevant human bait-prey interactions reported, 143 likewise occurred within complexes reported in my study, representing a 47.8% validation rate. This agreement is comparable to the 63.8% validation rate they claim for their own complex predictions, and is probably an underestimate since they do not report all the proteins actually detected by mass spectrometry, but rather only human proteins with orthologs in their initial Drosophila PPI network.
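The curation criterion described above (at least 2 shared components and a Simpson coefficient greater than 0.5) can be written as a short check. The function names are hypothetical and complexes are represented as sets of protein identifiers.

```python
def simpson_coefficient(a, b):
    """Shared proteins divided by the size of the smaller complex."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def is_curated(predicted, curated, min_shared=2, cutoff=0.5):
    """True if the predicted complex matches a curated complex under both criteria."""
    shared = len(set(predicted) & set(curated))
    return shared >= min_shared and simpson_coefficient(predicted, curated) > cutoff
```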

I also compared my complexes with the results of Malovannaya et al., also published in Cell last year (Malovannaya et al., 2011), which verified a total of 127 of my complexes (i.e., clusters showing a Simpson matching coefficient > 0.5 between studies), including 42 (33%) that are not curated in CORUM.

3.2.2.7 Conservation of complexes across model organisms

To examine to what extent the human protein complexes identified in this study have known counterparts in yeast and fly, the set of 720 multi-protein complexes in S. cerevisiae identified in a recent study (Babu et al., 2012) and the 556 complexes recently derived for D. melanogaster (Guruharsha et al., 2011) were considered. Both sets of complexes were identified using AP-MS techniques. Briefly, human complexes were converted into an ortholog representation by mapping, whenever possible, the components of each complex to their orthologs in yeast and fly, respectively. The ortholog representation of each complex was used to search for the most statistically significant match between this representation and all known complexes from the corresponding organism. The process was also repeated in the opposite direction, mapping model-organism complexes onto the human collection in order to identify reciprocal best matches. Statistical significance was established using Fisher’s exact test (hypergeometric distribution), with the Benjamini-Hochberg method (Benjamini and Hochberg, 1995) used to correct for multiple testing (estimated false discovery rate ≤ 0.05). Orthology relationships for human, yeast and fruit fly were derived from two well-established sources: InParanoid 7.0 (Ostlund et al., 2010) and Ensembl Compara (Vilella et al., 2009). The latter includes both the current Ensembl release 64 (ftp://ftp.ensembl.org/pub/release-64/mysql/ensembl_compara_64/) and Ensembl Genomes release 11 (ftp://ftp.ensemblgenomes.org/pub/pan_ensembl/release-11/mysql/ensembl_compara_pan_homology_11_64/). The Ensembl IDs from Compara were mapped using the BioMart Perl API, which is available at http://www.biomart.org/martservice.html. In addition, the human-to-yeast orthology map was extended by matching human and yeast genes that share a common fly ortholog.
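The Benjamini-Hochberg correction mentioned above can be sketched as the standard step-up adjustment. This is the textbook procedure rather than a transcription of the scripts used in the study, and the function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the same order as the input."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])  # indices by ascending p-value
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):  # walk from the largest p-value to the smallest
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted
```

Matches whose adjusted value is at most 0.05 would then be reported as significant at the stated false discovery rate.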

3.2.2.8 Coevolution analysis

For the calculation of coevolution scores, the MatrixMatchMaker (MMM) program (Clark et al., 2011; Taatjes et al., 2002; Tillier and Charlebois, 2009) was employed. Orthologous protein sequence clusters were obtained from the OMA Database (Schneider et al., 2007) and comprised 204,689 eukaryotic groups that span 96 species, of which 20,800 contained human orthologs. The groups containing a human protein and at least 10 orthologous sequences were aligned using MAFFT (Katoh et al., 2005), and distance matrices were obtained by using protdist from PHYLIP (Felsenstein, 2005) with the PMB distance matrix (Veerassamy et al., 2003) to correct for multiple substitutions. MMM was run in an all-by-all manner with a selected tolerance of 0.1 (10%), using taxon information such that only sequences from the same species could be matched.


To assess the relative evolutionary rate, an average matrix was obtained by averaging the distance matrix entries over all of the OMA groups’ matrices. The average matrix was used to compute the relative rate of an OMA group’s evolution as the ratio of its rate (average distance to the human ortholog) over the average matrix’s rate for the same subset of species pairs. Values greater than 1 indicate proteins that are evolving faster than average, whereas values less than 1 indicate more slowly evolving proteins.

To evaluate the evolutionary age, the distribution of species present in the OMA orthologous groups which determine the ancestral node in the phylogenetic tree of all eukaryotic species was employed. The evolutionary distance from the human sequence to this last common ancestral node was then calculated and, in the case of complexes, averaged over the proteins in the complex. This gives an approximate evolutionary origin of the human orthologs.

3.2.2.9 Interaction database and PPI orthology

All OMA proteins were assigned ROGiDs based on their amino acid sequence. These IDs were then used to identify the known physical (or author-inferred) protein-protein interactions from the iRefIndex database (Razick et al., 2008), which combines protein interaction data from multiple public databases: BIND, BioGRID, CORUM, DIP, HPRD, IntAct, MINT, MPact, MPPI and OPHID. Human protein interaction data were also downloaded independently from most of these public databases and from some other resources available online. These databases and resources included BioGRID (Stark et al., 2011), DIP (Salwinski et al., 2004), MINT (Ceol et al., 2010), HPRD (Prasad et al., 2009), IntAct (Aranda et al., 2010), NCBI Gene (http://www.ncbi.nlm.nih.gov/gene/), CORUM (Ruepp et al., 2009) and the Human interactome dataset (Rual et al., 2005). Orthology of the PPIs was then determined using the species distribution of the OMA groups.


3.3 Results

3.3.1 High-throughput complex fractionation and detection by LC-MS/MS

To isolate human protein complexes in a sensitive and unbiased manner, cytoplasmic and nuclear soluble protein extracts isolated from human HeLa S3 and HEK 293 cells (grown as suspension and adherent cultures, respectively) were subjected to extensive complementary biochemical fractionation procedures. These two widely-studied laboratory cell lines have been used as models of human cell biology for many decades (Graham et al., 1977; Masters, 2002), providing a rich biological context for interpreting the resulting proteomic data. Stably interacting proteins that co-fractionated were subsequently identified by nano-flow liquid chromatography-tandem mass spectrometry (LC-MS/MS). As illustrated schematically in Figure 3-1, the entire experimental pipeline was optimized using a multi-pronged strategy to minimize two major confounding issues: limited dynamic range (i.e., preferential detection of highly abundant components) and ‘chance’ co-elution (i.e., co-fractionation of functionally-unrelated proteins).

To address the former concern, I performed extremely deep fractionation by employing multiple orthogonal separation techniques to better resolve distinct protein complexes (see Materials and Methods). As a primary separation technique, I employed non-denaturing high-performance multi-bed ion-exchange chromatography (IEX-HPLC) using four different empirically optimized analytical column combinations and shallow salt gradients unlikely to perturb non-ionic protein associations (Havugimana et al., 2007). In parallel, complementary sucrose gradient centrifugation and isoelectric focusing technologies were applied to capture salt-sensitive protein assemblies. In total, 1,163 different fractions were collected in eight nuclear and five cytosolic extract fractionation experiments (Table 3-1), and each was subjected to label-free shotgun sequencing (2,057 replicate analyses) using highly sensitive ion trap mass spectrometers (see Material and Methods). The resulting >18,000,000 mass spectra, acquired over more than 9,000 hours of dedicated instrument run time, were rigorously searched against human protein sequences obtained from the Swiss-Prot database.

After merging all the datasets, a total of 41,506 unique peptides (supported by ~1.6 million individual MS/MS spectra) were mapped to 5,584 distinct human proteins (Figure 3-2 summarizes this result in 'heatmap' format) at an estimated ~1% false discovery rate (theoretical protein and peptide level FDR based on a statistical model) (Kislinger et al., 2003). The protein identification matrix of 5,584 proteins x 1,163 fractions is supplied as supplementary table 3-1S (see also accompanying CD), and is also available on a dedicated web database of human protein complexes (http://human.med.utoronto.ca) that contains all the data generated in this study in an easily navigated format. Notwithstanding the underrepresentation of membrane proteins in our starting cell extracts, this coverage encompasses roughly 60% of the experimentally-verified human proteome (Figure 3-3) (Nagaraj et al., 2011).


Figure 3-1: Integrative multi-pronged strategy used to identify human soluble protein complexes.

Cell-free extracts were extensively fractionated using traditional biochemical fractionation techniques (IEX, ion exchange chromatography; IEF, isoelectric focusing; and density gradient centrifugation). Co-eluting proteins present in each fraction were identified by tandem mass spectrometry and a co-elution network generated by calculating profile similarity scores (Pearson correlation, co-apex, and weighted cross-correlation) using spectral counts information (see Material and Methods).


Figure 3-2: Identification of co-purifying protein subunits by LC-MS/MS analysis.

This figure shows a hierarchical clustering representation of the 5,584 protein profiles obtained for the HeLa and HEK cell-free extracts in 13 fractionation experiments reported in Table 3-1. Shading indicates the spectral counts recorded by LC-MS/MS.


Figure 3-3: Abundance levels of the proteins identified in this study.

Shown are protein abundance levels corresponding to components of identified co-purifying protein subunits (red line), reconstructed complexes (blue line) or annotated CORUM complexes (black line), estimated from the known HeLa proteome abundance (Nagaraj et al., 2011). Approximately 85% (4,745/5,584) of the proteins identified in this study are supported by proteomic data produced with a high resolution mass spectrometer that identified 8,263 proteins in the HeLa proteome (Nagaraj et al., 2011).


Of the proteins identified, 989 (18%) were detected exclusively in nuclear fractions (including 376 annotated transcription or chromatin-related factors), 832 (15%) had links to human disease (e.g. annotated in a public database like OMIM), and only 1,632 (29%) had biochemical annotations as subunits of previously reported protein complexes in the CORUM curation database (Ruepp et al., 2009) (corresponding to roughly two-thirds, or 64%, of all existing human protein entries; Figure 3-4). Importantly, due to the extensive fractionation, we observed minimal bias in terms of protein abundance beyond that reported for previously annotated protein complexes or the experimentally defined HeLa proteome (Figure 3-3).

Next, to minimize the possibility of chance co-elution, rather than simply identifying the proteins present in each fraction, I quantified variation in protein abundance based on the observed patterns of spectral counts recorded across all of the collected fractions to determine the extent to which pairs of proteins co-eluted. As I previously showed in Figure 2-7A, these experimental profiles were highly reproducible (i.e., average Spearman rank correlation coefficients greater than 0.8 between replicate experiments), even using alternate methods of mass spectrometric quantification (i.e., extracted MS1 peak intensities were largely consistent with spectral counting; Figure 3-5).


Figure 3-4: Overlap of the proteins identified in this study with those in CORUM database.

Proteins identified in this study covered 64% (1,632/2,549) of the proteins present in the CORUM reference database.


Figure 3-5: Assessment of LC-MS/MS protein detection bias.

Box-and-whiskers quartile plots show the high consistency (profile correlation > 0.8) of the co-fractionation data using different measures of protein abundance (spectral counts by MS2 versus high precision MS1). Data reproducibility was calculated using Spearman correlation coefficients of replicate profiles. Horizontal solid black lines mark minimum, first quartile, median, third quartile, and maximum Spearman correlation values. Dashed black lines mark mean Spearman correlations. High-scoring interacting protein pairs show reproducible HeLa or HEK 293 co-elution profiles measured on a linear ion-trap (A and B, MS2 spectral counts) or a high precision Orbitrap instrument (C, MS1 peptide intensities using the MaxQuant software of Mann and colleagues; D, MS2 spectral counts).


To objectively evaluate the biochemical data, I calculated a stringent summary statistic, termed the co-apex score, for each pair of proteins identified by LC-MS/MS by determining the number of fractionation experiments in which both proteins showed maximum (modal) abundance in exactly the same peak fraction. Figure 3-6 shows the distribution of co-apex scores.
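A minimal sketch of the co-apex score is given below. It assumes that each protein has one spectral-count profile per fractionation experiment (or `None` when it was not detected in that experiment) and that ties for the maximum are resolved by taking the first maximal fraction; both conventions, and the function names, are illustrative assumptions.

```python
def apex_fraction(profile):
    """Index of the fraction with the maximal spectral count, or None if all are zero."""
    if not profile or max(profile) == 0:
        return None
    return profile.index(max(profile))


def co_apex_score(profiles_a, profiles_b):
    """Number of experiments in which both proteins peak in the same fraction."""
    score = 0
    for pa, pb in zip(profiles_a, profiles_b):
        if pa is None or pb is None:
            continue  # protein not observed in this experiment
        apex_a = apex_fraction(pa)
        if apex_a is not None and apex_a == apex_fraction(pb):
            score += 1
    return score
```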

Figure 3-6: Distribution of protein-protein interactions identified by the co-apex method.

Shown is the distribution of the 255,754 co-elution PPI, among 4,223 different proteins, that pass a minimal co-apex score of 2.


To assess the effectiveness of my co-fractionation approach, I performed an initial sanity test by examining the co-elution profiles and co-apex scores obtained for a reference set of 20 well-known human protein complexes reported in CORUM. As illustrated by the representative HeLa nuclear extract IEX-HPLC profiles shown in Figure 2-7A, the subunits of these complexes typically co-eluted in the same biochemical fractions. Among the 155 components detected by mass spectrometry, 85% (499/585) of the annotated subunit pairs of the reference complexes had high co-apex similarity scores (i.e., co-eluted in two or more experiments), validating the overall efficacy of the fractionation procedures I used to isolate native protein complexes and the general correctness of the protein identification and quantification pipeline.

3.3.2 Reconstruction of a high confidence co-complex interaction network

Despite the consistency in co-elution of annotated complex members, certain functionally distinct complexes occasionally exhibited overlapping chromatographic elution profiles (e.g., the Splicing factor 3b and Coatomer complexes, which have unrelated cellular roles, tended to co-elute during IEX-HPLC; Figure 2-6), presenting a potential source of spurious co-complex interactions. While this artifact was minimized to a certain degree by performing multiple independent fractionation experiments, an integrative computational approach was used to further improve de-convolution (Figure 3-7). Since physically-interacting proteins often perform related biological functions (Alberts, 1998) and are often evolutionarily co-conserved (Hartwell et al., 1999), a machine learning procedure (Figure 3-7; see Material and Methods for details) was devised to score and select higher-confidence physical interactions based on both the experimentally measured co-elution profiles and the existence of additional supporting functional-association evidence inferred from correlated evolutionary rates (Tillier and Charlebois, 2009) and functional genomics datasets compiled for H. sapiens, S. cerevisiae, D. melanogaster and C. elegans.

First, for each of the 13 fractionation experiments, correlation measures between all possible pairs of proteins were computed to capture their tendency to co-elute. In addition to the co-apex summary statistic, a weighted cross-correlation function was devised to account for mass spectrometry sampling error and slight variation in the protein profiles measured in each experiment. To account for low spectral counts, a Poisson noise model was applied before the calculation of Pearson correlation scores, so that the co-elution profiles of protein pairs measured with low spectral counts were deemed less predictive of genuine physical interactions. Only protein pairs with a correlation score of at least 0.5 by at least one of these measures in one or more experiments were considered for further analysis, reducing the total number of pairs from over 15 million initially to the roughly 800,000 pairs with reasonable biochemical evidence.

Additionally, the predictive power of correlated protein evolutionary rates (Tillier and Charlebois, 2009), mRNA co-expression, domain co-occurrence and, via orthology, fly protein-protein interactions (based on binary yeast two-hybrid assay studies) and the extensive physical and functional associations reported previously for yeast and worm (see Experimental Procedures) (Lee et al., 2011) was exploited to improve the assignment of interaction probabilities. To increase the discriminatory power of the procedure, it was necessary to penalize those interactions which lacked independent supporting evidence, and which were thus more likely to correspond to cases of chance co-elution, by integrating evidence from these functional association data, as diagrammed in Figure 3-7. A feature selection algorithm was used to select the most informative datasets in addition to the biochemical correlation scores, and the resulting features were used to assign a probability of interaction to each protein pair using a cross-validated random forest classifier.

For training, the CORUM curated set of human protein complexes, filtered for complexes previously reported on the basis of biochemical methods, was used as a base reference. As many CORUM complexes are highly overlapping due to redundancy in existing annotations, complexes sharing subunits in common (Simpson coefficient > 0.5) were combined. Half of the resulting 324 non-redundant reference complexes were employed as a training set for co-complex probability prediction. Gold standard positive interactions were defined as pairs of proteins in the same complex, and gold standard negatives were inferred between proteins in different complexes. The other half of the reference complexes was withheld for subsequent use as an independent training set for cluster optimization (described below). Although the biochemical data were a prerequisite for scoring, the performance curves shown in Figure 3-8 indicate that the inclusion of the additional functional genomic information substantially increased recall at the same level of precision compared to classifiers based on the profiling data alone. Moreover, the integration of this additional supporting functional evidence removed the bulk of the spurious, inter-complex interactions seen initially (Figure 3-9).
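The construction of gold standard pairs can be sketched as follows, assuming that a negative is any pair of reference proteins that never share a complex. The function name and the representation of pairs as `frozenset` objects are hypothetical.

```python
from itertools import combinations


def gold_standard_pairs(complexes):
    """complexes: list of sets of protein identifiers.

    Returns (positives, negatives) as sets of frozenset pairs.
    """
    positives = set()
    for members in complexes:
        positives.update(frozenset(pair) for pair in combinations(sorted(members), 2))
    proteins = sorted(set().union(*complexes))
    all_pairs = {frozenset(pair) for pair in combinations(proteins, 2)}
    negatives = all_pairs - positives  # reference proteins never seen in the same complex
    return positives, negatives
```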


Figure 3-7: Deriving protein complexes from biochemical co-elution network data set.

The biochemical data were combined with alternate weighted networks of functional association evidence using a random forest classifier and a training set of reference complexes (CORUM) to filter out spurious connections and infer high-confidence protein interactions. The probabilities were re-weighted to ensure an interactome in which physically-associated proteins are also generally co-localized. Both the PPI and predicted clusters were evaluated with different independent functional criteria to ensure high quality. In the interaction graphs, round nodes represent proteins while edges represent interactions; nodes with the same color are subunits of the same complex. In the classifier panel, blue diamonds represent attributes in the decision tree vector and green diamonds (leaves) represent the final result (positive or negative). Arrows in the Figure represent data flow.

Figure 3-8: Filtering biochemical network with functional evidence improves both precision and recall.

Shown is a comparison of the cumulative precision-prediction rank curves for the biochemical MS data alone (black curve) and after filtering with genomic evidence (blue curve). The incorporation of the functional evidence increases both precision (i.e., reducing false positives) and recall of true positives. HumanNet-only data (i.e., functional association data) were too sparse to give meaningful output (running the classifier on the HumanNet-only data failed).


Figure 3-9: Biochemical network of the 20 reference complexes after filtering with functional evidence.


As an alternate measure of reliability, scored human protein interactions were compared to a recently reported network of Drosophila co-complex protein interactions (Guruharsha et al., 2011), which had not been used to build the classifier. Strikingly, despite using vastly different experimental methods and scoring schemes, a remarkably good overall correlation (Spearman r=0.40; n=11,675 orthologs mapped using Inparanoid) was observed. Even after removing interactions supported by alternate Drosophila data, high-scoring fly pairs matched high-scoring pairs in our analysis and were strongly enriched for reference positive co-complex members (Figure 3-10).
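For readers unfamiliar with the statistic, the Spearman correlation is the Pearson correlation of the ranks of two score lists. The following is a minimal standard-library sketch, assuming the human and fly scores have already been aligned pair by pair through orthology; the function names are illustrative only.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5
```

Because only ranks are compared, the statistic is insensitive to the different score scales used by the two studies.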

Finally, in order to remove any remaining false positive interactions, the co-complex dataset derived from the previous step was further denoised by pruning loosely connected interactions using a computational diffusion procedure calibrated by protein co-localization (Pesquita et al., 2009) to enforce local network topologies more consistent with annotated complexes from the withheld portion of the reference CORUM complexes (see Material and Methods). Benchmark precision and recall versus the holdout set of known reference complexes (Figure 3-11) were significantly higher than those reported for a smaller, recently published set of affinity-purified human protein complexes (Hutchins et al., 2010), validating the reliability of our scoring procedure. Application of a probability cutoff score of 0.78 resulted in a high-confidence set of 13,993 co-complex interactions among 3,006 unique human proteins, most of which (8,691 interactions) have not been reported before (i.e., are not publicly annotated). It is worth reiterating that all of these physical interactions were directly supported by the experimental biochemical co-fractionation data; the addition of functional data and de-noising served only to flag candidates lacking either functional support or topological support within the network. The interaction probability scores may be underestimated, however, because the reference ‘gold standards’ used for learning are imperfect (Jansen and Gerstein, 2004).
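The cutoff step itself is simple to express. The sketch below assumes scored pairs are stored as a dictionary keyed by unordered protein pairs and that the cutoff is inclusive; both choices, and the function name, are assumptions made for illustration.

```python
def high_confidence_network(scored_pairs, cutoff=0.78):
    """Keep pairs scoring at or above the cutoff and list the proteins involved.

    scored_pairs: dict mapping frozenset({protein_a, protein_b}) to a probability.
    Assumption: the cutoff is inclusive (>=).
    """
    kept = {pair: p for pair, p in scored_pairs.items() if p >= cutoff}
    proteins = set().union(*kept)  # union over the retained pair keys
    return kept, proteins


# Toy example with hypothetical proteins A-D.
toy = {
    frozenset({"A", "B"}): 0.95,
    frozenset({"B", "C"}): 0.78,
    frozenset({"C", "D"}): 0.40,
}
kept, proteins = high_confidence_network(toy)
```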

Figure 3-10: Correlation between human PPI and orthologous Drosophila PPI.

Good overall correlation (Spearman r=0.40; n=11,675) of my scored human PPI with corresponding interaction scores reported previously for orthologous fly PPI from which validated, high confidence complexes were subsequently derived (Guruharsha et al., 2011). Heatmap shows prediction accuracy (log ratio of true positives to true negatives), with high-scoring pairs in both studies highly enriched for positives.

Figure 3-11: Precision-recall curve showing improved performance obtained after denoising procedure.

Shown is the strong performance obtained for reconstructing reference CORUM complexes after denoising the inferred PPI (MS co-elution network filtered with genomic evidence) with co-localization information and network topology. Each complex in the reference set is highlighted by a red dot at the threshold at which half of the protein pairs in the complex are recovered.

3.3.3 Construction and validation of protein complexes from the probabilistic interaction network

In order to define complex membership, the high-confidence probabilistic physical interaction network was partitioned using the seeded cluster growth algorithm ClusterONE (Nepusz et al., 2012), which outperformed other clustering methods (Table 3-2). In total, the clustering predicts 622 discrete putative complexes among 2,634 distinct proteins (see Appendix 1). The majority (62%; 385/622) of these complexes are novel (i.e., only 237 are currently annotated in a public database like CORUM; Figure 3-12). Although the fraction of curated components varies, I recapitulated 258 previously reported complexes (Figure 3-14), which even included several membrane-associated complexes, such as the Coat Protein I (COPI) and II (COPII) vesicle transport complexes which shuttle cargo between the Golgi and endoplasmic reticulum.

Strikingly, while the complex membership size distribution approximated an inverse power law with a median of 4 subunits, most (67%; 335) of the 500 smaller putative complexes with 5 or fewer components, including the bulk (74%; 83) of the 112 predicted heterodimers, have never been curated before (Figure 3-14).

Both independent experimental validation based on more traditional immunoprecipitation or co-affinity purification methods and orthology mapping support at least 21 of these putative novel complexes (i.e., not in any reference database) (Figure 3-14). For example, Guruharsha et al. recently reported 299 co-complex interactions based on pull-down experiments of 43 affinity-tagged human proteins present in 41 complexes reported in this Thesis, of which 143 interactions map precisely to the predicted complexes, representing a 47.8% validation rate (which may be an underestimate as Guruharsha et al. do not report human interactions that fall outside the fly interologs examined in their study). Likewise, the results of Malovannaya et al., who used large-scale immunoprecipitation to isolate native human protein complexes, show excellent agreement with 123 of the predicted complexes (i.e., clusters show a Simpson matching coefficient ≥ 0.5 between studies). This agreement includes 42 (34%) of the predicted complexes lacking in the curated CORUM database (Figure 3-13). Table 3-3 summarizes the highly significant overlap of the predicted complexes with these fully independent datasets, with enrichments ranging from 4- to 477-fold over chance, thus broadly and systematically validating our network of derived human protein complexes.
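The Simpson (overlap) coefficient used for matching clusters between studies is the size of the shared membership divided by the size of the smaller set. A minimal sketch follows, assuming an inclusive threshold of 0.5 and complexes represented as sets of protein identifiers; the names and toy data are hypothetical.

```python
def simpson(a, b):
    """Simpson overlap coefficient: |A ∩ B| / min(|A|, |B|)."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def matched_complexes(predicted, independent, threshold=0.5):
    """Predicted complexes that match at least one independent complex."""
    return [
        p for p in predicted
        if any(simpson(p, q) >= threshold for q in independent)
    ]


# Toy example with hypothetical subunits.
predicted = [{"A", "B", "C"}, {"D", "E"}, {"F", "G", "H", "I"}]
independent = [{"A", "B", "X"}, {"F", "Y", "Z"}]
matches = matched_complexes(predicted, independent)

# Validation rate quoted in the text: 143 of 299 interactions.
validation_rate = round(143 / 299 * 100, 1)
```

Because the denominator is the smaller set, a small cluster fully contained in a larger one scores 1.0, which makes the coefficient tolerant of differences in complex size between studies.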

Table 3-2: Benchmarking results between ClusterONE and other popular clustering algorithms.

                                ClusterONE   MCL     RNSC    CMC     MCODE
Number of clusters              771          665     940     2429    87
Matched CORUM complexes         38           30      33      37      18
Maximum matching ratio (MMR)    0.069        0.059   0.065   0.064   0.039
Geometric accuracy              0.357        0.350   0.351   0.307   0.277

The purpose of this benchmark was to assess which algorithm performed best at detecting protein complexes by clustering the denoised protein interaction network. These results were obtained by first learning the parameters of the different algorithms on half of the CORUM complex set and then testing the algorithm on the entire CORUM set. For each algorithm, the parameters were learned by maximizing the maximum matching ratio (Nepusz et al., 2012) between the reference and the predicted complexes. The number of matched CORUM complexes was calculated using a match score threshold of 0.25. Note that ClusterONE, CMC, and MCODE can assign proteins to more than one complex (i.e., can detect overlapping protein complexes) while MCL and RNSC cannot. I examined the 771 complexes obtained by ClusterONE more closely and found that the results were easier to comprehend if I reduced the redundancy by combining complexes sharing subunits (Simpson coefficient ≥ 0.5 between complexes). This procedure led to a final set of 622 complexes discussed in the text and presented in Appendix 1.
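One way to picture the redundancy-reduction step is as repeated merging of any two clusters whose Simpson coefficient reaches the threshold. The sketch below is an assumption about how such a merge could be carried out (greedy, repeated until no pair qualifies); the exact merging order used to arrive at the 622 complexes is not specified here, and the function names are hypothetical.

```python
def simpson(a, b):
    """Simpson overlap coefficient: |A ∩ B| / min(|A|, |B|)."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def merge_redundant(complexes, threshold=0.5):
    """Greedily merge clusters until no pair overlaps at or above the threshold."""
    merged = [set(c) for c in complexes]
    changed = True
    while changed:
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                if simpson(merged[i], merged[j]) >= threshold:
                    # Fold cluster j into cluster i, then rescan from the start.
                    merged[i] |= merged.pop(j)
                    changed = True
                    break
            if changed:
                break
    return merged


# Toy example with hypothetical subunits.
toy = [{"A", "B", "C"}, {"B", "C", "D"}, {"E", "F"}, {"C", "D", "G", "H"}]
result = merge_redundant(toy)
```

Note that merging is transitive in this sketch: a cluster can be absorbed because it overlaps a previously merged cluster, even if it did not overlap either original cluster strongly enough on its own.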

Figure 3-12: Distribution of predicted complexes based on size.

Figure 3-13: Global validation of the predicted high confidence human protein complexes.

Shown is the network of the 622 predicted complexes, with a validation rate of ~ 40%, which has been proportioned according to subunit number, existing curations, validation status obtained by independent immunoprecipitation/mass spectrometry experiments (Malovannaya et al., 2011), and PPI connectivity (proportioned edge width). Clusters were broadly grouped into the 5 major cell compartments based on the localization of the majority of proteins in each cluster.

Figure 3-14: Proportions of annotated versus putative new protein complexes.

The pie chart shows the fractions of the 622 inferred complexes with their subunits annotated in one or more of the human complexes in public repositories (CORUM, PINdb, REACTOME, and HPRD) or that have been independently experimentally-verified.

Table 3-3: Enrichment analysis indicates highly significant PPI overlap between my study and independent AP-MS datasets.

Shown is the overlap between this study and recent large-scale co-affinity purification datasets generated for human (Hutchins et al., 2010; Malovannaya et al., 2011) and (via orthology) fly (Guruharsha et al., 2011).

Figure 3-15 shows the broad functional diversity of the predicted complexes (a navigable map is available online for close visualization of individual clusters and their supporting co-complex interactions). Consistent with biological expectation (Hartwell et al., 1999; Lage et al., 2007; Oliver, 2000; Vidal et al., 2011), the subunits of the complexes were significantly enriched for related biological functions, transcriptional regulatory motifs, and pathological processes (Table 3-4). Compared to the entire set of identified proteins, the clustered proteins also showed enrichment for post-translational modifications linked to cellular regulation, like acetylation (Benjamini-corrected p ≤ 10⁻⁴¹) and (p ≤ 10⁻⁵). Of particular interest, many of the complexes are linked to core cellular processes, such as mRNA splicing (p ≤ 10⁻¹⁵) or transcription (p ≤ 10⁻⁵), that either have RNAi-induced phenotypes in cell culture (e.g. cell division arrest, p ≤ 10⁻³¹) or are associated, via orthology, with similar mouse, yeast or worm mutant phenotypes (see Figure 3-4).

Figure 3-15: Global map of high confidence human protein complexes.

Schematic representation of the global network of inferred human soluble protein complexes (colored by membership). Numbered are examples of annotated complexes in the CORUM reference database.

Table 3-4: Functional enrichment analysis supporting predicted complexes.

Analysis of independent datasets reveals proteins in the same complexes are markedly enriched for sharing similar disease associations, RNAi phenotypes in human cell culture (Neumann et al., 2010), mutational and RNAi phenotypes in other species (via orthology), and transcriptional regulatory motifs (Xie et al., 2005).

3.3.4 Clinical and biological implications of the reconstructed human protein complexes

Consistent with this strong tendency for proteins in the same complex to be affiliated with similar mutational and RNAi phenotypes, subunits of the predicted human protein complexes were much more likely than chance (p ≤ 10⁻⁴⁶) to have links to a documented clinical pathology (Table 3-4), with disease-associated proteins distributed broadly amongst the complexes (Figure 3-16). Closer examination of the interaction sub-networks comprising both known human disease genes and genes that currently lack annotation or which have not previously been associated with any human disorders highlights the utility of the map. One such example is shown in Figure 3-17, illustrating the case of the human developmental disorder Cornelia de Lange syndrome (CdLS). Mutations in three subunits of the cohesin complex (SMC1A, SMC3, NIPBL) have been linked to CdLS (Pie et al., 2010), implicating an additional component (RAD21) as a candidate CdLS gene, and consistent with at least one unmapped CdLS locus residing on (DeScipio et al., 2005). The link to RAD21 provides a likely explanation for the occasional overlap of Langer-Giedion Syndrome (LGS) clinical presentation with CdLS, as all LGS patients are at least partially defective for RAD21 [see e.g. (McBrien et al., 2008; Wuyts et al., 2002)]. Similarly, RAD18, a homolog of SMC3 and SMC1A, may play a role in CdLS, consistent with unmapped CdLS deletions within chromosome 3p25 (DeScipio et al., 2005). Reports coinciding with the preparation of this Thesis appear to confirm that RAD21 mutations do indeed lead to a CdLS-like syndrome (Deardorff et al., 2012), supporting the use of this complex map to prioritize promising candidate genes for human diseases.

Figure 3-16: Distribution of disease-associated proteins in the predicted protein complexes.

The histogram panel shows the distribution of annotated disease-associated proteins that are present in my compendium of 622 protein complexes. Protein components with known or predicted human disorder associations are those annotated in the UniProt database (The UniProt Consortium, 2011), the Online Mendelian Inheritance in Man (OMIM) (Hamosh et al., 2005) and/or the Genetic Association Database (GAD) (Becker et al., 2004).

Figure 3-17: Membership in complexes predicts disease associations.

Three of four proteins mapped to the cohesin complex are known to account for roughly half of cases of the human congenital disorder Cornelia de Lange syndrome (Pie et al., 2010), strongly implicating the fourth component, RAD21, as a candidate gene for the disease. This association may explain similarities in clinical presentation between CdLS and Langer-Giedion syndrome, as the latter patients routinely harbor RAD21 deletions [e.g. (McBrien et al., 2008; Wuyts et al., 2002)].

Similarly, participation in the same complex suggests shared functions; the map can thus be used to predict new biochemical and other types of functions for proteins. I experimentally validated one such case for a ribosome-associated sub-complex containing BOP1, RRS1, GNL3, EBP2, FTSJ3, and MKI67IP, first confirming the interactions by affinity tagging/purification and mass spectrometry (Figure 3-18). BOP1, EBP2, and the yeast ortholog of RRS1 are known to participate in maturation of the large 60S ribosomal subunit, suggesting the other factors likewise engage in ribosome assembly, consistent with the nucleolar localizations of GNL3, FTSJ3, and MKI67IP.

Figure 3-18: Affinity purification mass spectrometry confirmed three novel ribosome biogenesis factors.

Novel ribosome biogenesis candidates (orange) in association with annotated components (blue) after affinity-purification of FLAG-tagged proteins (top). Colored squares indicate detection by AP-MS (see Experimental Methods for details). SPC25-FLAG was used as a negative control.

3.3.5 Conservation of human protein complexes

Estimates based on sequence similarity across orthologs indicate that the components of the detected complexes are generally more ancient and have higher conservation on average than most human proteins (Figure 3-19A). Using orthology relationships derived from well established sources and calculating evolutionary rates and ages for all human proteins as a base distribution for gauging the emergence of complexes (see Material and Methods), many complexes appear to be quite ancient and slowly evolving (Figure 3-19B). Strikingly, however, most (60%; 376/622) human complexes likely arose with vertebrates (i.e., orthologs not present in invertebrates or fungi) (see Figure 3-19B). Hence, these analyses suggest a major shift/expansion in the ancestral protein interaction network coincident with the emergence of vertebrates.

Given the availability of experimentally-derived networks of fly and yeast protein complexes, we could directly examine the evolutionary conservation of protein complexes across animals by comparing our network of human complexes with the extensive maps of 556 fly protein complexes recently reported for D. melanogaster (Guruharsha et al., 2011) and 720 yeast protein complexes documented for S. cerevisiae (Babu et al., 2012). Roughly one quarter (24%; 149/622) of the predicted human protein complexes showed statistically significant overlaps with complexes reported for these models (inset, Figure 3-19B), with half of the subunits having clear orthologs (Figure 3-20A); the remaining components presumably represent genuine differences or incomplete orthology annotations.

Figure 3-19: Evolutionary conservation of protein complexes.

A-The components of predicted human complexes evolved more slowly, calculated as the average of evolutionary rate ratios, compared to the entire set of expressed proteins (see Computational analyses methods for details). B- A pronounced spike in the number of predicted human protein complexes originated with the emergence of vertebrates. The X-axis shows increasingly inclusive orthologous groups in the phylogeny of eukaryotes.

The functional significance of unannotated ancestral human complexes supported by conservation in yeast or fly (Figure 3-20A) warrants further investigation. At least one such complex, a novel multi-subunit tRNA-splicing ligase (Popow et al., 2011), was characterized recently. The interaction between DDX1 and C14orf166 was detected at high confidence both in the present dataset (probability score 0.899) and in the Guruharsha et al. fly co-complex data, while the other respective associated complex subunits likewise show significant overlap (Benjamini-corrected p-value ≤ 1.1 × 10⁻⁷). Additional examples of complex conservation are similarly supported by independent experimental evidence, such as the matching tissue specificities of the putatively interacting proteins endoplasmin and glucosidase 2 (Figure 3-20B), which form an uncharacterized complex conserved in both the fly and human maps.

Figure 3-20: Conservation of human complexes in fly and yeast.

A-Human complexes conserved in fly (Guruharsha et al., 2011), and yeast (Babu et al., 2012). Each node represents a complex (human in blue, fly in green, yeast in orange). Node size is proportional to the size of the corresponding complex. Reciprocal best matches are shown as dark grey edges, non-reciprocal as lighter grey directed edges. The edge thickness is proportional to the Sorensen-Dice overlap of complex members. Human complexes absent from public databases (“novel” complexes) are drawn as rectangular nodes, the remaining human complexes as circular nodes. B- Similar tissue-specific protein expression patterns support a functional association between proteins ENPL and GLU2B, which were observed to interact in a novel human complex that was also conserved in fly. Panels show representative antibody staining in normal tissue biopsies collected and reported by the Human Protein Atlas project (Uhlen et al., 2010) (www.proteinatlas.org).
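The Sorensen-Dice overlap and the notion of a reciprocal best match can be sketched as follows. This is an illustrative example only: it assumes the fly or yeast complex members have already been mapped to human ortholog identifiers, and the function names and toy data are hypothetical.

```python
def dice(a, b):
    """Sorensen-Dice overlap: 2|A ∩ B| / (|A| + |B|)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def best_match(query, targets):
    """Index of the target with the highest Dice overlap, or None if no overlap."""
    best = max(range(len(targets)), key=lambda i: dice(query, targets[i]), default=None)
    if best is None or dice(query, targets[best]) == 0:
        return None
    return best


def reciprocal_best_matches(human, other):
    """Pairs (human_index, other_index) that are each other's best match."""
    pairs = []
    for h_idx, h in enumerate(human):
        o_idx = best_match(h, other)
        if o_idx is not None and best_match(other[o_idx], human) == h_idx:
            pairs.append((h_idx, o_idx))
    return pairs


# Toy example; fly members are assumed to be expressed as human ortholog IDs.
human = [{"A", "B", "C"}, {"D", "E"}]
fly = [{"A", "B"}, {"A", "B", "C", "X"}, {"E", "Z"}]
pairs = reciprocal_best_matches(human, fly)
```

A non-reciprocal match, drawn as a directed edge in the figure, would correspond to a case where `best_match` in one direction is not returned in the other.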

3.3.6 Protein abundance

Consistent with the documented origins of the HeLa/HEK293 cells analyzed in this study, the identified complexes were significantly enriched for epithelial markers (p ≤ 10⁻¹⁸³; UniProt tissue annotations). However, most complex subunits are considered ubiquitous across diverse human tissues (p ≤ 10⁻¹¹; PIR tissue specificity annotations), and are expressed in the top quartiles of 1,045 of 7,067 neoplastic and normal tissue CGAP EST libraries (1% FDR), including normal kidney (p ≤ 10⁻³⁹), muscle (p ≤ 10⁻²⁰), liver (p ≤ 10⁻¹²), brain (p ≤ 10⁻²⁰), vascular (p ≤ 10⁻³⁰), bone (p ≤ 10⁻¹⁵), and embryonic tissue (p ≤ 10⁻³¹). Genes encoding complex subunits also tend to share common upstream transcriptional regulatory motifs (p ≤ 10⁻⁸) (Table 3-3). Proteins mapped to complexes showed no major bias in abundance over the complete set of human proteins identified by mass spectrometry (Figure 3-3). The pervasiveness of ubiquitously expressed protein complexes argues strongly for broad relevance to basic human cell biology.

3.4 Discussion

The biochemically-based interaction data obtained in this large-scale proteomic study have enabled the identification of 364 previously unannotated protein complexes (i.e., predicted complexes with no statistically significant match to complexes in public databases) encompassing 1,278 human proteins, many of which are linked to human disease or are evolutionarily conserved, as well as the identification of new subunits of well-studied and widely-conserved nuclear and cytoplasmic protein machineries, such as the ribosome biogenesis machinery, with clear biological implications. Most of the high-confidence protein interactions provided in this resource have not been previously reported in a major, public interaction database and hence motivate mechanistic investigations of specific biological systems.

Prior to this work, experimental knowledge regarding soluble protein complex membership in human cells has generally been ad hoc or focused on specific sub-cellular systems. The unbiased integrative approach, wherein biochemical evidence (co-fractionation) of soluble native macromolecules was combined with genomic inferences (imputed functional associations) provides an inclusive snapshot of human protein complexes under a standardized cellular context, thus serving as a reference against which future process- or cell-type specific or dynamic interaction datasets can be compared.

This 'first pass' draft of the soluble, stably-associated human protein 'complexome' provides a glimpse into the global physical molecular organization of human cells, which is likely to be perturbed in pathological states or in response to environmental cues. I believe that, with further refinement to my FP-MS experimental procedures (see chapter 4), the interaction mapping strategy presented in this work has the potential to interrogate such changes in interaction space in a systematic manner in the future.

Chapter 4 Conclusions and Future Directions

4 Conclusions and future directions

4.1 Conclusions

Most cellular processes are carried out by stable physical association between proteins in the form of dedicated macromolecular complexes. These include numerous soluble cytoplasmic and nuclear protein machines mediating core, conserved cellular functions such as gene expression, protein synthesis, and chromosome replication. Since the subunits of these complexes usually work together to perform some biological role, knowledge of complex membership can be used to deduce not only biological mechanisms but also the function of proteins that currently lack annotations (Gavin et al., 2006; Guruharsha et al., 2011; Hu et al., 2009; Hutchins et al., 2010; Krogan et al., 2006; Malovannaya et al., 2011). Yet, despite considerable progress in the large-scale isolation and identification of soluble protein complexes from simpler unicellular model organisms like yeast and bacteria and more recently from metazoa such as fly using systematic AP-MS approaches, no systematic but otherwise non-targeted experimental effort to define all the endogenous protein complexes present in a typical human cell has been reported prior to the work presented in this Thesis.

In Chapter 2 of this Thesis, I introduced the generic proteome-scale protein complex identification procedure FP-MS, short for “Fractionomic Profiling-Mass Spectrometry”, which allowed for the global and relatively unbiased detection of soluble stable protein complexes. In contrast to antibody-based enrichment methods, which are serial approaches in the sense that only one complex is analyzed per individual experiment, my method can be used to retrieve and compare multiple protein complexes almost simultaneously in a standardized, highly parallel manner starting from a single culture of human cells. Importantly, the approach does not require genetic manipulation or the use of specialized antibodies or cDNA reagents in order to identify many of the protein complexes present in cell extracts, meaning a single individual (i.e. myself) was capable of generating a massive amount of biochemical data relatively quickly. My FP-MS workflow consists of cell culturing, cell lysis, chromatographic fractionation, and LC-MS/MS analysis of the protein fractions. This was followed by computational scoring and filtering of the protein co-elution profiles, ideally using supporting orthogonal genomics evidence to enrich for functionally relevant interactions, and clustering the co-elution network to define complex membership. Each of these steps can be, and was, independently optimized using objective criteria.

However, one should keep in mind that there are two critical steps: firstly, cell lysis in non-denaturing conditions to maintain the integrity of multi-protein complexes while minimizing artifacts associated with compartment perturbation and protein dilution; and, secondly, the highest possible resolution during the chromatographic separation of native assemblies to minimize biologically non-informative coincidental co-fractionation. Although some hurdles remain and will be discussed further below, in my studies I have tested various arrangements of IEX-HPLC fractionation using cell free extracts from two of the oldest and best studied human model cell lines that have been used in almost all facets of cell biology and protein biochemistry, HeLa and HEK 293 cells (HEK cells were used in this study to a lesser degree), as a test case. I have shown that each of the various column arrangements studied here can be applied to achieve a reasonable degree of separation of stable protein complexes starting from highly complicated mixtures. The optimal choice for a single resin or combinations of resin phases largely depends on both protein amount and sample types. In each chromatographic approach, annotated components of the same complex typically show correlated co-elution profiles and usually share the exact same apex peak. Less often, members of curated complexes have been observed separated across multiple fractions. There are several possibilities that can explain such observations: (i) one protein could be in multiple complexes; (ii) a protein could be in different complexes as splice variants (Kyriacou and Deutscher, 2008), which are difficult to identify by MS-based peptide sequencing due to the high percentage (100%) of sequence coverage required to unequivocally distinguish two isoforms; (iii) the proteins may have separated because of protein complex disruption; (iv) one peak could contain the free protein not incorporated in the known complex. It would be interesting to actually investigate the percentage of synthesized proteins that get incorporated into complexes.
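The two co-elution properties mentioned above, correlated profiles and a shared apex fraction, can be illustrated with a small sketch. It assumes each protein's profile is a list of per-fraction abundance values (for example spectral counts) over the same ordered fractions, and it uses a plain Pearson correlation as the similarity measure; this is an illustrative choice and not necessarily the scoring scheme used in this Thesis. The profile values are invented.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length elution profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile carries no co-elution information
    return cov / (sx * sy)


def apex(profile):
    """Index of the fraction in which the protein is most abundant."""
    return max(range(len(profile)), key=profile.__getitem__)


# Hypothetical counts over five fractions for three proteins.
subunit_a = [0, 2, 10, 3, 0]
subunit_b = [0, 1, 8, 2, 0]
unrelated = [5, 1, 0, 0, 0]

same_apex = apex(subunit_a) == apex(subunit_b)
similarity = pearson(subunit_a, subunit_b)
```

A protein present in two complexes, as in possibility (i), would show two peaks, so a single global correlation could understate its association with either partner.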

One side benefit of increased chromatographic resolution is that overall proteome coverage improves, presumably due to the detection of low abundance proteins separated from those of higher abundance in well-resolved HPLC protein fractions. The use of single phase columns provided both the poorest chromatographic separation and lowest proteome coverage. Conversely, dual phase heparin mixed-bed IEX chromatography was most effective with nuclear extracts. This conclusion is based on the behavior of 20 reference complexes, whose respective subunits co-eluted with highest reproducibility and precision using this platform. The triple phase column setup, a combination of two anion-exchange columns in tandem followed by a cation-exchange column, proved to be more effective for the fractionation of cytoplasmic extracts; the availability of a dedicated MS instrument meant that the larger number of collected fractions was not a burden in this case. My fractionation procedures can be scaled up or down by increasing or reducing the column internal diameter and length.

Most of my proteomics results were generated using a low resolution linear ion-trap (LTQ) MS instrument, which was the only platform accessible to me during most of my PhD studies. However, I was fortunate to be able to use the high performance Orbitrap LTQ Velos hybrid instrumentation in my last set of experiments. While the new high resolution data largely confirm my original findings, comparative analysis of the LC-MS/MS profile results obtained using the LTQ and Orbitrap Velos revealed that the higher scan speed, mass precision and quantitative reproducibility of the newer platform can markedly improve the inferences made for lower abundance protein complexes. Moreover, the improved efficiency of this platform with its faster duty cycle meant the actual amount of instrument time required for detailed analysis of a biochemical sample could be reduced by at least 4-fold, a result which means that routine screening of large numbers of HPLC fractions can be significantly streamlined.

In principle, this high speed instrument can be used to generate the whole cellular interactome within two to three months for any cell type or tissue from any model system which has a sequenced genome. The current Orbitrap LTQ Velos MS instrument permits the analysis of one fraction per hour, including column washing and re-equilibration times. At this rate, the 9,000 hours spent on analyzing 2,057 fractions would have been reduced to ~ 2,000 hours, about 3 months, and would have provided accurate MS1 intensities. These would, in turn, be used to approximate the protein complex stoichiometries lacking in my data.
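The instrument-time estimate follows from simple arithmetic, sketched below under the stated rate of one fraction per hour and assuming continuous, round-the-clock acquisition (an idealization). The function name is illustrative.

```python
def instrument_time(fractions, hours_per_fraction=1.0):
    """Return (hours, days) of MS time, assuming uninterrupted acquisition."""
    hours = fractions * hours_per_fraction
    return hours, hours / 24


# Figures quoted in the text: 2,057 fractions, ~9,000 hours on the older setup.
hours, days = instrument_time(2057)
original_rate = 9000 / 2057   # average hours per fraction actually spent
speedup = 9000 / hours        # fold reduction in instrument time
```

At one fraction per hour, 2,057 fractions correspond to roughly 86 days of continuous running, consistent with the "about 3 months" quoted, and to a speedup of more than 4-fold over the 9,000 hours actually spent.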

In order to obtain a comprehensive view of any protein interactome using this instrument, it would be preferable to apply FP-MS to isolated cell compartments, not just nuclear and cytoplasmic extracts, for enhanced IEX-HPLC resolution of the proteome. For example, the coincidental co-fractionation observed between the vesicle trafficking complex and the nuclear splicing factor 3b complex was likely due to insufficient cell fractionation. Resolving such issues by separating complexes into their sub-cellular compartments prior to HPLC fractionation will result in higher levels of protein complex purity and will facilitate identification of low abundance proteins by MS. Non-ionic detergents or other agents could be included in the protocol to assist in the detection of membrane-associated protein complexes.

The protein elution profiles that would result from employing this approach may only require a simple clustering to de-convolute the data into complexes. Were the cell contents partitioned into 10 sub-cellular compartments, 600-1,200 fractions would be generated, requiring ~ 2 months of MS instrument time.

In the work presented in Chapter 3, which I accomplished through close collaboration with national and international groups of computational biologists, I have shown that combining my extensive proteomic profiles with complementary functional genomics information can serve to increase overall precision of complex prediction by minimizing false positives, artifacts which stem from the ever-present chance co-elution issue, and false negatives, as was reflected in the overall increased sensitivity and precision.

Nevertheless, further improvements in computational analyses are needed to increase sensitivity and precision. Even though I removed non-mass spectrometry data (e.g. Y2H, electron microscopy) from my gold standard, my negative references could still have some false negative PPI, since the CORUM database is not fully representative of all human complexes. Also, the integrated HumanNet data set fed into the classifier contained a small number of conserved PPI (~3% of my experimental PPI; ~26,000/800,000), which may bias the machine learning outcome by overestimating the number of false positives. I strongly believe that integration of the currently published human expression data from 947 and 639 cancer cell lines (Barretina et al., 2012; Garnett et al., 2012) would improve sensitivity and precision as well as the coverage of protein complexes in our constructed map. Such integration of genomic evidence from the same model system will likely shorten the time to experimentally verify all PPI by using more extensive sub-fractionation.

There were many different clustering strategies available for this work, and indeed I obtained reasonable results after applying the widely-used MCL algorithm on my scored co-elution data. However, I achieved the best performance, measured through computation of the matching ratio between our predicted complexes and the reference complexes, using an advanced 'diffusion/pruning' clustering approach generated by the Paccanaro group (Nepusz et al., 2012). The overall quality of the global map of 622 putative human protein complexes reported here was established by obtaining extensive experimental evidence consistent with the composition of at least 237 of our predicted clusters (achieving a global validation rate of ~40%). This evidence included comparisons to independent, recently published human immunoprecipitation data (Malovannaya et al., 2011), human and fly co-affinity purification/mass spectrometry data (Guruharsha et al., 2011), and the large body of literature-derived protein complexes curated in public repositories such as CORUM and HPRD. I also performed my own validation using AP-MS on several newly identified ribosome biogenesis factors.

Although a large number of complexes were identified in this study, the reported human interactome map is still far from comprehensive. For example, I missed 55%, or 179, of the 324 complexes that have been documented in the CORUM database (after merging redundant CORUM complexes). The CORUM database itself is not comprehensive: I was able to find an additional 92 complexes by further examination of the literature and other public repositories. There are many reasons why I did not capture all of these previously known complexes.

First of all, the complexes stored in public databases come from different tissues and cell types, grown under different conditions or isolated in different cell states. Some of these complexes may not even be expressed under the logarithmic growth conditions used for our HeLa and HEK 293 cell lines.

Beyond these biological limitations, there are additional reasons that likely contributed to the large number of literature-curated protein complexes not present in the predicted map. First, there was poor detection of membrane-associated protein complexes and transiently occurring protein complexes. Both of these types of complexes were beyond the scope of this Thesis project. Second, there is preferential detection of higher abundance and larger proteins by MS. Third, stringency in data processing, which was necessary for a high-quality interactome, sometimes penalized genuine interactions that received low scores and, in some cases, suppressed an entire complex (e.g. NELF complex). Despite these shortcomings, a large number of novel soluble stable protein complexes were discovered, and these studies have contributed to the expanded knowledge about human protein complexes in two standardized cell lines.
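The coverage figures quoted in this section follow directly from the counts given in the text. As a quick check, using only numbers stated in this chapter (the variable names are illustrative):

```python
# Counts quoted in this chapter (no new data are introduced here).
corum_reference = 324      # CORUM complexes after merging redundant entries
corum_missed = 179         # reference complexes absent from the predicted map
predicted_total = 622      # predicted complexes in the map
predicted_supported = 237  # predicted clusters with independent support


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


missed_pct = percent(corum_missed, corum_reference)
recovered = corum_reference - corum_missed
validated_pct = percent(predicted_supported, predicted_total)

print(f"CORUM complexes missed: {corum_missed}/{corum_reference} = {missed_pct}%")
print(f"CORUM complexes recovered: {recovered}")
print(f"Predicted clusters with support: {validated_pct}%")
```

The last figure, about 38%, is the value rounded to ~40% in the text above.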

Finally, in Chapter 3, I outlined how our extensive network of associating human proteins can provide insights into cellular functions, potential disease relevance, and broad biological properties, such as the evolutionary conservation of both individual human proteins and resulting complexes. Using explicit examples, such as the newly discovered ribosome biogenesis factors and proteins involved in the human CdLS disorder, I have provided information concerning the roles of proteins of previously unknown function based on interactions with annotated proteins and have provided a basic molecular blueprint underlying human cell biology and disease.

However, as HeLa is a transformed cell line with malignant genome modifications, abnormal complexes (i.e. not observable in a “normal human epithelial cell”) may have been detected. It would be of interest, of course, at some stage to compare this map with those from normal diploid cell lines and perhaps learn more about complexes that may only be present in human disease.

In summary, I believe that FP-MS could emerge as a powerful analytical tool for the routine, large-scale discovery and identification of protein complexes, and that my catalogue of soluble stable human multi-protein complexes will provide the biological community with opportunities for further functional discoveries and understanding of the molecular context by which core biological processes operate in normal healthy human cells and how these systems might become perturbed in disease states.

4.2 Future directions

4.2.1 Investigation of the role of FTSJ3, MKI67IP, and GNL3 in ribosome biogenesis

Given sufficient resources and time, I would have liked to examine the exact roles of the candidate ribosome biogenesis factors, namely FTSJ3, MKI67IP, and GNL3, whose functional associations were predicted from my FP-MS results. Currently, we do not know why, how, and at what stage of ribosome assembly these proteins are involved. Experiments combining biochemical and genetic assays to probe ribosomal RNA processing and the nuclear export of pre-ribosomal particles from the nucleolus to the cytoplasm upon depletion of these factors could provide insights into their localization as well as their potential roles in the ribosomal RNA and 60S ribosome assembly pathways.

4.2.2 Enhanced FP-MS for the analysis of membrane-associated protein complexes

Even though a few membrane-associated protein complexes were detected in this study, my experimental design was devised for the isolation and detection of the readily soluble protein complexes present in human cells. However, recent work by my colleagues, Dajana Vuckovic and Laura Gianni, indicates that a co-fractionation approach based on IEX-HPLC can be adapted to examine the composition of stable complexes consisting of membrane proteins for an even more comprehensive view of protein assemblies in human cells or other models. Such experiments are facilitated by the inclusion of HPLC-compatible non-denaturing detergents during extraction and subsequent extract fractionation. Preliminary results from our lab show that maltose–neopentyl glycol amphiphile (MNG-3), which has been shown to effectively solubilize and stabilize G protein–coupled receptor complexes (Chae et al., 2010; Rasmussen et al., 2011), is potentially a good candidate for an FP-MS proteomic analysis of membrane-associated protein complexes.

4.2.3 Comparative interactome mapping across other models

The intriguing conservation trends observed in this study, namely a vertebrate spike in complexes, as well as the modest overlap between the derived human complex data and recently published fly and yeast interactome maps, were provocative. This may reflect evolutionary adaptations within PPI networks, but it would be wise not to exclude the possible confounding bias stemming from the different methods and experimental designs used by each study. Hence, it would be of considerable interest to generate a "pan-animal" scale analysis documenting the conservation of soluble protein complexes across different animal species and phyla. In principle, my FP-MS approach is applicable to any organism whose genome has been sequenced and hence is amenable to LC-MS/MS-based protein identifications. Indeed, using my optimized FP-MS platform, our laboratory has already started what may be the first study of its kind looking directly into the compositions of soluble stable protein complexes present in cell extracts across a multitude of diverse model animal species using a single standardized experimental protocol. I expect to see the FP-MS approach I have pioneered become a major driver in the discovery of basic biological functions and pathways across species, thereby facilitating our understanding of ‘systems biology’.

4.2.4 Elucidation of microbe-human cell interactions

Understanding the mechanisms used by pathogenic microorganisms to commandeer the host machinery while evading the immune response may allow the development of more effective strategies against important infectious diseases such as AIDS, malaria, tuberculosis and less well studied, but equally persistent, public health problems like invasive candidiasis (Pfaller and Diekema, 2007). Comprehensive maps showing the protein complexes targeted by bacterial toxins could provide one crucial step towards understanding the interactions between host cells and invading microbes. To date, two types of strategies have been employed to systematically study host-microbe interactions. Y2H-based methodologies have been used to generate a comprehensive map of potential interactions between Epstein-Barr viral proteins and human proteins (Calderwood et al., 2007), as well as a physical map connecting H1N1 influenza and its human host (Shapira et al., 2009). More recently, the Krogan group (Jager et al., 2011) carried out a systematic AP-MS analysis to build a comprehensive map of HIV-human protein physical associations. While informative, these two complementary strategies did not capture changes in protein complex compositions during different stages of the infection cycle.

I believe that my FP-MS approach can overcome this shortcoming. Using the example of HIV infection, discrete steps in the process of pathogenicity could be monitored through collection and profiling of relevant samples using cultured human T-cells infected with virus at different time points. Such an experiment could provide insights into the kinetics of viral assembly and, potentially, identify essential host factors used by the virus to abrogate normal cellular pathways and host responses. I note that prior to my PhD work in the Emili lab, I performed a time-course analysis of the assembly of the Adenovirus type 5 capsid upon infection of HEK293 cells (Havugimana, 2003).

References

1. Aebersold, R., and Mann, M. (2003). Mass spectrometry-based proteomics. Nature 422, 198-207.

2. Alberts, B. (1998). The cell as a collection of protein machines: preparing the next generation of molecular biologists. Cell 92, 291-294.

3. Andersen, J. S., Wilkinson, C. J., Mayor, T., Mortensen, P., Nigg, E. A., and Mann, M. (2003). Proteomic characterization of the human centrosome by protein correlation profiling. Nature 426, 570-574.

4. Aranda, B., Achuthan, P., Alam-Faruque, Y., Armean, I., Bridge, A., Derow, C., Feuermann, M., Ghanbarian, A. T., Kerrien, S., Khadake, J., et al. (2010). The IntAct molecular interaction database in 2010. Nucleic Acids Res 38, D525-531.

5. Ashburner, M., Ball, C. A., Blake, J. A., Botstein, D., Butler, H., Cherry, J. M., Davis, A. P., Dolinski, K., Dwight, S. S., Eppig, J. T., et al. (2000). Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 25, 25-29.

6. Babu, M., Butland, G., Pogoutse, O., Li, J., Greenblatt, J. F., and Emili, A. (2009). Sequential peptide affinity purification system for the systematic isolation and identification of protein complexes from Escherichia coli. Methods Mol Biol 564, 373-400.

7. Babu, M., Vlasblom, J., Pu, S., Guo, X., Graham, C., Bean, B. D. M., Vizeacoumar, F. J., Burston, H. E., Snider, J., Phanse, S., et al. (2012). Interaction Landscape of Membrane Protein Complexes in Saccharomyces cerevisiae. Nature DOI 10.1038/nature11354.

8. Bader, G. D., and Hogue, C. W. (2003). An automated method for finding molecular complexes in large protein interaction networks. BMC Bioinformatics 4, 2.

9. Bandyopadhyay, A. K., and Deutscher, M. P. (1971). Complex of aminoacyl-transfer RNA synthetases. J Mol Biol 60, 113-122.

10. Barretina, J., Caponigro, G., Stransky, N., Venkatesan, K., Margolin, A. A., Kim, S., Wilson, C. J., Lehar, J., Kryukov, G. V., Sonkin, D., et al. (2012). The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 483, 603-607.

11. Barrios-Rodiles, M., Brown, K. R., Ozdamar, B., Bose, R., Liu, Z., Donovan, R. S., Shinjo, F., Liu, Y., Dembowy, J., Taylor, I. W., et al. (2005). High-throughput mapping of a dynamic signaling network in mammalian cells. Science 307, 1621-1625.

12. Becker, K. G., Barnes, K. C., Bright, T. J., and Wang, S. A. (2004). The genetic association database. Nat Genet 36, 431-432.

13. Behrends, C., Sowa, M. E., Gygi, S. P., and Harper, J. W. (2010). Network organization of the human autophagy system. Nature 466, 68-76.

14. Benjamini, Y., and Hochberg, Y. (1995). Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society Series B 57, 289-300.

15. Bensimon, A., Heck, A. J., and Aebersold, R. (2012). Mass Spectrometry-Based Proteomics and Network Biology. Annu Rev Biochem.

16. Bouwmeester, T., Bauch, A., Ruffner, H., Angrand, P. O., Bergamini, G., Croughton, K., Cruciat, C., Eberhard, D., Gagneur, J., Ghidelli, S., et al. (2004). A physical and functional map of the human TNF-alpha/NF-kappa B signal transduction pathway. Nat Cell Biol 6, 97-105.

17. Breitkreutz, A., Choi, H., Sharom, J. R., Boucher, L., Neduva, V., Larsen, B., Lin, Z. Y., Breitkreutz, B. J., Stark, C., Liu, G., et al. (2010). A global protein kinase and phosphatase interaction network in yeast. Science 328, 1043-1046.

18. Brohee, S., and van Helden, J. (2006). Evaluation of clustering algorithms for protein-protein interaction networks. BMC Bioinformatics 7, 488.

19. Butland, G., Peregrin-Alvarez, J. M., Li, J., Yang, W., Yang, X., Canadien, V., Starostine, A., Richards, D., Beattie, B., Krogan, N., et al. (2005). Interaction network containing conserved and essential protein complexes in Escherichia coli. Nature 433, 531-537.

20. Cagney, G., Amiri, S., Premawaradena, T., Lindo, M., and Emili, A. (2003). In silico proteome analysis to facilitate proteomics experiments using mass spectrometry. Proteome Sci 1, 5.

21. Calderwood, M. A., Venkatesan, K., Xing, L., Chase, M. R., Vazquez, A., Holthaus, A. M., Ewence, A. E., Li, N., Hirozane-Kishikawa, T., Hill, D. E., et al. (2007). Epstein-Barr virus and virus human protein interaction maps. Proc Natl Acad Sci U S A 104, 7606-7611.

22. Ceol, A., Chatr Aryamontri, A., Licata, L., Peluso, D., Briganti, L., Perfetto, L., Castagnoli, L., and Cesareni, G. (2010). MINT, the molecular interaction database: 2009 update. Nucleic Acids Res 38, D532-539.

23. Chae, P. S., Rasmussen, S. G., Rana, R. R., Gotfryd, K., Chandra, R., Goren, M. A., Kruse, A. C., Nurva, S., Loland, C. J., Pierre, Y., et al. (2010). Maltose-neopentyl glycol (MNG) amphiphiles for solubilization, stabilization and crystallization of membrane proteins. Nat Methods 7, 1003-1008.

24. Chan, J. N., Vuckovic, D., Sleno, L., Olsen, J. B., Pogoutse, O., Havugimana, P., Hewel, J. A., Bajaj, N., Wang, Y., Musteata, M. F., et al. (2012). Target Identification by Chromatographic Co-elution: Monitoring of Drug-Protein Interactions without Immobilization or Chemical Derivatization. Mol Cell Proteomics 11, M111 016642.

25. Clark, G. W., Dar, V. U., Bezginov, A., Yang, J. M., Charlebois, R. L., and Tillier, E. R. (2011). Using coevolution to predict protein-protein interactions. Methods Mol Biol 781, 237-256.

26. Coifman, R. R., Lafon, S., Lee, A. B., Maggioni, M., Nadler, B., Warner, F., and Zucker, S. W. (2005). Geometric diffusions as a tool for harmonic analysis and structure definition of data: diffusion maps. Proc Natl Acad Sci U S A 102, 7426-7431.

27. Collins, S. R., Kemmeren, P., Zhao, X. C., Greenblatt, J. F., Spencer, F., Holstege, F. C., Weissman, J. S., and Krogan, N. J. (2007). Toward a comprehensive atlas of the physical interactome of Saccharomyces cerevisiae. Mol Cell Proteomics 6, 439-450.

28. Croft, D., O'Kelly, G., Wu, G., Haw, R., Gillespie, M., Matthews, L., Caudy, M., Garapati, P., Gopinath, G., Jassal, B., et al. (2011). Reactome: a database of reactions, pathways and biological processes. Nucleic Acids Res 39, D691-697.

29. de Gelder, R., Wehrens, R., and Hageman, J. A. (2001). A generalized expression for the similarity of spectra: application to powder diffraction pattern classification. Journal of Computational Chemistry 22, 273-289.

30. Deardorff, M. A., Wilde, J. J., Albrecht, M., Dickinson, E., Tennstedt, S., Braunholz, D., Monnich, M., Yan, Y., Xu, W., Gil-Rodriguez, M. C., et al. (2012). RAD21 Mutations Cause a Human Cohesinopathy. Am J Hum Genet 90, 1014-1027.

31. DeScipio, C., Kaur, M., Yaeger, D., Innis, J. W., Spinner, N. B., Jackson, L. G., and Krantz, I. D. (2005). Chromosome rearrangements in cornelia de Lange syndrome (CdLS): report of a der(3)t(3;12)(p25.3;p13.3) in two half sibs with features of CdLS and review of reported CdLS cases with chromosome rearrangements. Am J Med Genet A 137A, 276-282.

32. Dignam, J. D., Lebovitz, R. M., and Roeder, R. G. (1983). Accurate transcription initiation by RNA polymerase II in a soluble extract from isolated mammalian nuclei. Nucleic Acids Res 11, 1475-1489.

33. Dong, M., Yang, L. L., Williams, K., Fisher, S. J., Hall, S. C., Biggin, M. D., Jin, J., and Witkowska, H. E. (2008). A "tagless" strategy for identification of stable protein complexes genome-wide by multidimensional orthogonal chromatographic separation and iTRAQ reagent tracking. J Proteome Res 7, 1836-1849.

34. Dreze, M., Monachello, D., Lurin, C., Cusick, M. E., Hill, D. E., Vidal, M., and Braun, P. (2010). High-quality binary interactome mapping. Methods Enzymol 470, 281-315.

35. Dunham, W. H., Larsen, B., Tate, S., Badillo, B. G., Goudreault, M., Tehami, Y., Kislinger, T., and Gingras, A. C. (2011). A cost-benefit analysis of multidimensional fractionation of affinity purification-mass spectrometry samples. Proteomics 11, 2603-2612.

36. el Rassi, Z., and Horvath, C. (1986). Tandem columns and mixed-bed columns in high-performance liquid chromatography of proteins. J Chromatogr 359, 255-264.

37. Eng, J. K., McCormack, A. L., and Yates, J. R. (1994). An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. Journal of the American Society for Mass Spectrometry 5, 976-989.

38. Enright, A. J., Van Dongen, S., and Ouzounis, C. A. (2002). An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res 30, 1575-1584.

39. Ewing, R. M., Chu, P., Elisma, F., Li, H., Taylor, P., Climie, S., McBroom-Cerajewski, L., Robinson, M. D., O'Connor, L., Li, M., et al. (2007). Large-scale mapping of human protein-protein interactions by mass spectrometry. Mol Syst Biol 3, 89.

40. Eyckerman, S., Verhee, A., der Heyden, J. V., Lemmens, I., Ostade, X. V., Vandekerckhove, J., and Tavernier, J. (2001). Design and application of a cytokine-receptor-based interaction trap. Nat Cell Biol 3, 1114-1119.

41. Felsenstein, J. (2005). PHYLIP (Phylogeny Inference Package) version 3.6. Distributed by the author University of Washington, Seattle.

42. Fenn, J. B., Mann, M., Meng, C. K., Wong, S. F., and Whitehouse, C. M. (1989). Electrospray ionization for mass spectrometry of large biomolecules. Science 246, 64-71.

43. Fields, S., and Song, O. (1989). A novel genetic system to detect protein-protein interactions. Nature 340, 245-246.

44. Foster, L. J., de Hoog, C. L., Zhang, Y., Zhang, Y., Xie, X., Mootha, V. K., and Mann, M. (2006). A mammalian organelle map by protein correlation profiling. Cell 125, 187-199.

45. Garnett, M. J., Edelman, E. J., Heidorn, S. J., Greenman, C. D., Dastur, A., Lau, K. W., Greninger, P., Thompson, I. R., Luo, X., Soares, J., et al. (2012). Systematic identification of genomic markers of drug sensitivity in cancer cells. Nature 483, 570-575.

46. Gavin, A. C., Aloy, P., Grandi, P., Krause, R., Boesche, M., Marzioch, M., Rau, C., Jensen, L. J., Bastuck, S., Dumpelfeld, B., et al. (2006). Proteome survey reveals modularity of the yeast cell machinery. Nature 440, 631-636.

47. Gavin, A. C., Bosche, M., Krause, R., Grandi, P., Marzioch, M., Bauer, A., Schultz, J., Rick, J. M., Michon, A. M., Cruciat, C. M., et al. (2002). Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature 415, 141-147.

48. Gingras, A. C., Gstaiger, M., Raught, B., and Aebersold, R. (2007). Analysis of protein complexes using mass spectrometry. Nat Rev Mol Cell Biol 8, 645-654.

49. Goll, J., and Uetz, P. (2006). The elusive yeast interactome. Genome Biol 7, 223.

50. Goudreault, M., D'Ambrosio, L. M., Kean, M. J., Mullin, M. J., Larsen, B. G., Sanchez, A., Chaudhry, S., Chen, G. I., Sicheri, F., Nesvizhskii, A. I., et al. (2009). A PP2A phosphatase high density interaction network identifies a novel striatin-interacting phosphatase and kinase complex linked to the cerebral cavernous malformation 3 (CCM3) protein. Mol Cell Proteomics 8, 157-171.

51. Graham, F. L., Smiley, J., Russell, W. C., and Nairn, R. (1977). Characteristics of a human cell line transformed by DNA from human adenovirus type 5. J Gen Virol 36, 59-74.

52. Guruharsha, K. G., Rual, J. F., Zhai, B., Mintseris, J., Vaidya, P., Vaidya, N., Beekman, C., Wong, C., Rhee, D. Y., Cenaj, O., et al. (2011). A Protein Complex Network of Drosophila melanogaster. Cell 147, 690-703.

53. Haas, E., Grell, M., Wajant, H., and Scheurich, P. (1999). Continuous autotropic signaling by membrane-expressed tumor necrosis factor. J Biol Chem 274, 18107-18112.

54. Hamosh, A., Scott, A. F., Amberger, J. S., Bocchini, C. A., and McKusick, V. A. (2005). Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Res 33, D514-517.

55. Harris, M. A., Clark, J., Ireland, A., Lomax, J., Ashburner, M., Foulger, R., Eilbeck, K., Lewis, S., Marshall, B., Mungall, C., et al. (2004). The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res 32, D258-261.

56. Hart, G. T., Lee, I., and Marcotte, E. M. (2007). A high-accuracy consensus map of yeast protein complexes reveals modular nature of gene essentiality. BMC Bioinformatics 8, 236.

57. Hartman, N. T., Sicilia, F., Lilley, K. S., and Dupree, P. (2007). Proteomic complex detection using sedimentation. Anal Chem 79, 2078-2083.

58. Hartwell, L. H., Hopfield, J. J., Leibler, S., and Murray, A. W. (1999). From molecular to modular cell biology. Nature 402, C47-52.

59. Havugimana, P. C. (2003). Analyse et suivi de l'expression de proteines d'adenovirus recombinants par LCMS. MSc Thesis, Universite Laval.

60. Havugimana, P. C., Wong, P., and Emili, A. (2006). Enhanced Proteomic Analysis by HPLC Prefractionation in Handbook of Pharmaceutical Biotechnology (ed S C Gad), John Wiley & Sons, Inc, Hoboken, NJ, USA doi: 101002/9780470117118ch13b, 1491–1501.

61. Havugimana, P. C., Wong, P., and Emili, A. (2007). Improved proteomic discovery by sample pre-fractionation using dual-column ion-exchange high performance liquid chromatography. J Chromatogr B Analyt Technol Biomed Life Sci 847, 54-61.

62. Haw, R. A., Croft, D., Yung, C. K., Ndegwa, N., D'Eustachio, P., Hermjakob, H., and Stein, L. D. (2011). The Reactome BioMart. Database (Oxford) 2011, bar031.

63. Hiu Yi Lam, M., and Stagljar, I. (2012). Strategies for membrane interaction proteomics: No mass spectrometry required. Proteomics.

64. Ho, Y., Gruhler, A., Heilbut, A., Bader, G. D., Moore, L., Adams, S. L., Millar, A., Taylor, P., Bennett, K., Boutilier, K., et al. (2002). Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature 415, 180-183.

65. Hu, P., Janga, S. C., Babu, M., Diaz-Mejia, J. J., Butland, G., Yang, W., Pogoutse, O., Guo, X., Phanse, S., Wong, P., et al. (2009). Global functional atlas of Escherichia coli encompassing previously uncharacterized proteins. PLoS Biol 7, e96.

66. Hutchins, J. R., Toyoda, Y., Hegemann, B., Poser, I., Heriche, J. K., Sykora, M. M., Augsburg, M., Hudecz, O., Buschhorn, B. A., Bulkescher, J., et al. (2010). Systematic analysis of human protein complexes identifies chromosome segregation proteins. Science 328, 593-599.

67. Ito, T., Chiba, T., Ozawa, R., Yoshida, M., Hattori, M., and Sakaki, Y. (2001). A comprehensive two-hybrid analysis to explore the yeast protein interactome. Proc Natl Acad Sci U S A 98, 4569-4574.

68. Jager, S., Cimermancic, P., Gulbahce, N., Johnson, J. R., McGovern, K. E., Clarke, S. C., Shales, M., Mercenne, G., Pache, L., Li, K., et al. (2011). Global landscape of HIV-human protein complexes. Nature 481, 365-370.

69. Jansen, R., and Gerstein, M. (2004). Analyzing protein function on a genomic scale: the importance of gold-standard positives and negatives for network prediction. Curr Opin Microbiol 7, 535-545.

70. Jeronimo, C., Forget, D., Bouchard, A., Li, Q., Chua, G., Poitras, C., Therien, C., Bergeron, D., Bourassa, S., Greenblatt, J., et al. (2007). Systematic analysis of the protein interaction network for the human transcription machinery reveals the identity of the 7SK capping enzyme. Mol Cell 27, 262-274.

71. Joshi-Tope, G., Gillespie, M., Vastrik, I., D'Eustachio, P., Schmidt, E., de Bono, B., Jassal, B., Gopinath, G. R., Wu, G. R., Matthews, L., et al. (2005). Reactome: a knowledgebase of biological pathways. Nucleic Acids Res 33, D428-432.

72. Karas, M., and Hillenkamp, F. (1988). Laser desorption ionization of proteins with molecular masses exceeding 10,000 daltons. Anal Chem 60, 2299-2301.

73. Katoh, K., Kuma, K., Miyata, T., and Toh, H. (2005). Improvement in the accuracy of multiple sequence alignment program MAFFT. Genome Inform 16, 22-33.

74. Kerrien, S., Aranda, B., Breuza, L., Bridge, A., Broackes-Carter, F., Chen, C., Duesbury, M., Dumousseau, M., Feuermann, M., Hinz, U., et al. (2011). The IntAct molecular interaction database in 2012. Nucleic Acids Res 40, D841-846.

75. King, A. D., Przulj, N., and Jurisica, I. (2004). Protein complex prediction via cost-based clustering. Bioinformatics 20, 3013-3020.

76. Kislinger, T., Cox, B., Kannan, A., Chung, C., Hu, P., Ignatchenko, A., Scott, M. S., Gramolini, A. O., Morris, Q., Hallett, M. T., et al. (2006). Global survey of organ and organelle protein expression in mouse: combined proteomic and transcriptomic profiling. Cell 125, 173-186.

77. Kislinger, T., and Emili, A. (2003). Going global: protein expression profiling using shotgun mass spectrometry. Curr Opin Mol Ther 5, 285-293.

78. Kislinger, T., Rahman, K., Radulovic, D., Cox, B., Rossant, J., and Emili, A. (2003). PRISM, a generic large scale proteomic investigation strategy for mammals. Mol Cell Proteomics 2, 96-106.

79. Kocher, T., and Superti-Furga, G. (2007). Mass spectrometry-based functional proteomics: from molecular machines to protein networks. Nat Methods 4, 807-815.

80. Krogan, N. J., Cagney, G., Yu, H., Zhong, G., Guo, X., Ignatchenko, A., Li, J., Pu, S., Datta, N., Tikuisis, A. P., et al. (2006). Global landscape of protein complexes in the yeast Saccharomyces cerevisiae. Nature 440, 637-643.

81. Kuhner, S., van Noort, V., Betts, M. J., Leo-Macias, A., Batisse, C., Rode, M., Yamada, T., Maier, T., Bader, S., Beltran-Alvarez, P., et al. (2009). Proteome organization in a genome-reduced bacterium. Science 326, 1235-1240.

82. Kyriacou, S. V., and Deutscher, M. P. (2008). An important role for the multienzyme aminoacyl-tRNA synthetase complex in mammalian translation and cell growth. Mol Cell 29, 419-427.

83. Lage, K., Karlberg, E. O., Storling, Z. M., Olason, P. I., Pedersen, A. G., Rigina, O., Hinsby, A. M., Tumer, Z., Pociot, F., Tommerup, N., et al. (2007). A human phenome-interactome network of protein complexes implicated in genetic disorders. Nat Biotechnol 25, 309-316.

84. Lee, I., Blom, U. M., Wang, P. I., Shim, J. E., and Marcotte, E. M. (2011). Prioritizing candidate disease genes by network-based boosting of genome-wide association data. Genome Res 21, 1109-1121.

85. Lehner, B., and Fraser, A. G. (2004). A first-draft human protein-interaction map. Genome Biol 5, R63.

86. Li, X., Wu, M., Kwoh, C. K., and Ng, S. K. (2010). Computational approaches for detecting protein complexes from protein interaction networks: a survey. BMC Genomics 11 Suppl 1, S3.

87. Li, Y. (2011). The tandem affinity purification technology: an overview. Biotechnol Lett 33, 1487-1499.

88. Liang, S., Shen, G., Xu, X., Xu, Y., and Wei, Y. (2009). Affinity Purification Combined with Mass Spectrometry-Based Proteomic Strategy to Study Mammalian Protein Complex and Protein-Protein Interactions. Current Proteomics Volume 6 Number 1.

89. Link, A. J., Eng, J., Schieltz, D. M., Carmack, E., Mize, G. J., Morris, D. R., Garvik, B. M., and Yates, J. R., 3rd (1999). Direct analysis of protein complexes using mass spectrometry. Nat Biotechnol 17, 676-682.

90. Luc, P. V., and Tempst, P. (2004). PINdb: a database of nuclear protein complexes from human and yeast. Bioinformatics 20, 1413-1415.

91. Maa, Y. F., Antia, F. D., el Rassi, Z., and Horvath, C. (1988). Mixed-bed ion-exchange columns for protein high-performance liquid chromatography. J Chromatogr 452, 331-345.

92. Mak, A. B., Ni, Z., Hewel, J. A., Chen, G. I., Zhong, G., Karamboulas, K., Blakely, K., Smiley, S., Marcon, E., Roudeva, D., et al. (2010). A lentiviral functional proteomics approach identifies chromatin remodeling complexes important for the induction of pluripotency. Mol Cell Proteomics 9, 811-823.

93. Malovannaya, A., Lanz, R. B., Jung, S. Y., Bulynko, Y., Le, N. T., Chan, D. W., Ding, C., Shi, Y., Yucer, N., Krenciute, G., et al. (2011). Analysis of the human endogenous coregulator complexome. Cell 145, 787-799.

94. Malovannaya, A., Li, Y., Bulynko, Y., Jung, S. Y., Wang, Y., Lanz, R. B., O'Malley, B. W., and Qin, J. (2010). Streamlined analysis schema for high-throughput identification of endogenous protein complexes. Proc Natl Acad Sci U S A 107, 2431-2436.

95. Masters, J. R. (2002). HeLa cells 50 years on: the good, the bad and the ugly. Nat Rev Cancer 2, 315-319.

96. Matthews, L., Gopinath, G., Gillespie, M., Caudy, M., Croft, D., de Bono, B., Garapati, P., Hemish, J., Hermjakob, H., Jassal, B., et al. (2009). Reactome knowledgebase of human biological pathways and processes. Nucleic Acids Res 37, D619-622.

97. McBrien, J., Crolla, J. A., Huang, S., Kelleher, J., Gleeson, J., and Lynch, S. A. (2008). Further case of microdeletion of 8q24 with phenotype overlapping Langer-Giedion without TRPS1 deletion. Am J Med Genet A 146A, 1587-1592.

98. McGary, K. L., Park, T. J., Woods, J. O., Cha, H. J., Wallingford, J. B., and Marcotte, E. M. (2010). Systematic discovery of nonobvious human disease models through orthologous phenotypes. Proc Natl Acad Sci U S A 107, 6544-6549.

99. Mewes, H. W., Amid, C., Arnold, R., Frishman, D., Guldener, U., Mannhaupt, G., Munsterkotter, M., Pagel, P., Strack, N., Stumpflen, V., et al. (2004). MIPS: analysis and annotation of proteins from whole genomes. Nucleic Acids Res 32, D41-44.

100. Morin, R., Bainbridge, M., Fejes, A., Hirst, M., Krzywinski, M., Pugh, T., McDonald, H., Varhol, R., Jones, S., and Marra, M. (2008). Profiling the HeLa S3 transcriptome using randomly primed cDNA and massively parallel short-read sequencing. Biotechniques 45, 81-94.

101. Mosley, A. L., Florens, L., Wen, Z., and Washburn, M. P. (2009). A label free quantitative proteomic analysis of the Saccharomyces cerevisiae nucleus. J Proteomics 72, 110-120.

102. Musso, G. A., Zhang, Z., and Emili, A. (2007). Experimental and computational procedures for the assessment of protein complexes on a genome-wide scale. Chem Rev 107, 3585-3600.

103. Nagaraj, N., Wisniewski, J. R., Geiger, T., Cox, J., Kircher, M., Kelso, J., Paabo, S., and Mann, M. (2011). Deep proteome and transcriptome mapping of a human cancer cell line. Mol Syst Biol 7, 548.

104. Nepusz, T., Yu, H., and Paccanaro, A. (2012). Detecting overlapping protein complexes in protein-protein interaction networks. Nat Methods 9, 471-472.

105. Neumann, B., Walter, T., Heriche, J. K., Bulkescher, J., Erfle, H., Conrad, C., Rogers, P., Poser, I., Held, M., Liebel, U., et al. (2010). Phenotypic profiling of the human genome by time-lapse microscopy reveals cell division genes. Nature 464, 721-727.

106. Olinares, P. D., Ponnala, L., and van Wijk, K. J. (2010). Megadalton complexes in the stroma of arabidopsis thaliana characterized by size exclusion chromatography, mass spectrometry, and hierarchical clustering. Mol Cell Proteomics 9, 1594-1615.

107. Oliver, S. (2000). Guilt-by-association goes global. Nature 403, 601-603.

108. Ostlund, G., Schmitt, T., Forslund, K., Kostler, T., Messina, D. N., Roopra, S., Frings, O., and Sonnhammer, E. L. (2010). InParanoid 7: new algorithms and tools for eukaryotic orthology analysis. Nucleic Acids Res 38, D196-203.

109. Paccanaro, A., Casbon, J. A., and Saqi, M. A. (2006). Spectral clustering of protein sequences. Nucleic Acids Res 34, 1571-1580.

110. Peng, J., Elias, J. E., Thoreen, C. C., Licklider, L. J., and Gygi, S. P. (2003). Evaluation of multidimensional chromatography coupled with tandem mass spectrometry (LC/LC-MS/MS) for large-scale protein analysis: the yeast proteome. J Proteome Res 2, 43-50.

111. Pesquita, C., Faria, D., Falcao, A. O., Lord, P., and Couto, F. M. (2009). Semantic similarity in biomedical ontologies. PLoS Comput Biol 5, e1000443.

112. Pfaller, M. A., and Diekema, D. J. (2007). Epidemiology of invasive candidiasis: a persistent public health problem. Clin Microbiol Rev 20, 133-163.

113. Phizicky, E. M., and Fields, S. (1995). Protein-protein interactions: methods for detection and analysis. Microbiol Rev 59, 94-123.

114. Pie, J., Gil-Rodriguez, M. C., Ciero, M., Lopez-Vinas, E., Ribate, M. P., Arnedo, M., Deardorff, M. A., Puisac, B., Legarreta, J., de Karam, J. C., et al. (2010). Mutations and variants in the cohesion factor genes NIPBL, SMC1A, and SMC3 in a cohort of 30 unrelated patients with Cornelia de Lange syndrome. Am J Med Genet A 152A, 924-929.

115. Popow, J., Englert, M., Weitzer, S., Schleiffer, A., Mierzwa, B., Mechtler, K., Trowitzsch, S., Will, C. L., Luhrmann, R., Soll, D., and Martinez, J. (2011). HSPC117 is the essential subunit of a human tRNA splicing ligase complex. Science 331, 760-764.

116. Poser, I., Sarov, M., Hutchins, J. R., Heriche, J. K., Toyoda, Y., Pozniakovsky, A., Weigl, D., Nitzsche, A., Hegemann, B., Bird, A. W., et al. (2008). BAC TransgeneOmics: a high-throughput method for exploration of protein function in mammals. Nat Methods 5, 409-415.

117. Prasad, T. S., Kandasamy, K., and Pandey, A. (2009). Human Protein Reference Database and Human Proteinpedia as discovery tools for systems biology. Methods Mol Biol 577, 67-79.

118. Pu, S., Vlasblom, J., Emili, A., Greenblatt, J., and Wodak, S. J. (2007). Identifying functional modules in the physical interactome of Saccharomyces cerevisiae. Proteomics 7, 944-960.

119. Pu, S., Wong, J., Turner, B., Cho, E., and Wodak, S. J. (2009). Up-to-date catalogues of yeast protein complexes. Nucleic Acids Res 37, 825-831.

120. Ramani, A. K., Bunescu, R. C., Mooney, R. J., and Marcotte, E. M. (2005). Consolidating the set of known human protein-protein interactions in preparation for large-scale mapping of the human interactome. Genome Biol 6, R40.

121. Ramani, A. K., Li, Z., Hart, G. T., Carlson, M. W., Boutz, D. R., and Marcotte, E. M. (2008). A map of human protein interactions derived from co-expression of human mRNAs and their orthologs. Mol Syst Biol 4, 180.

122. Rasmussen, S. G., Choi, H. J., Fung, J. J., Pardon, E., Casarosa, P., Chae, P. S., Devree, B. T., Rosenbaum, D. M., Thian, F. S., Kobilka, T. S., et al. (2011). Structure of a nanobody-stabilized active state of the beta(2) adrenoceptor. Nature 469, 175-180.

123. Razick, S., Magklaras, G., and Donaldson, I. M. (2008). iRefIndex: a consolidated protein interaction database with provenance. BMC Bioinformatics 9, 405.

124. Resnik, P. (1999). Semantic similarity in a taxonomy: An information-based measure and its application to problems of ambiguity in natural language. Journal of Artificial Intelligence Research 11, 95-130.

125. Rhodes, D. R., Tomlins, S. A., Varambally, S., Mahavisno, V., Barrette, T., Kalyana-Sundaram, S., Ghosh, D., Pandey, A., and Chinnaiyan, A. M. (2005). Probabilistic model of the human protein-protein interaction network. Nat Biotechnol 23, 951-959.

126. Rigaut, G., Shevchenko, A., Rutz, B., Wilm, M., Mann, M., and Seraphin, B. (1999). A generic protein purification method for protein complex characterization and proteome exploration. Nat Biotechnol 17, 1030-1032.

127. Roeder, R. G., and Rutter, W. J. (1969). Multiple forms of DNA-dependent RNA polymerase in eukaryotic organisms. Nature 224, 234-237.

128. Rual, J. F., Venkatesan, K., Hao, T., Hirozane-Kishikawa, T., Dricot, A., Li, N., Berriz, G. F., Gibbons, F. D., Dreze, M., Ayivi-Guedehoussou, N., et al. (2005). Towards a proteome-scale map of the human protein-protein interaction network. Nature 437, 1173-1178.

129. Ruepp, A., Waegele, B., Lechner, M., Brauner, B., Dunger-Kaltenbach, I., Fobo, G., Frishman, G., Montrone, C., and Mewes, H. W. (2009). CORUM: the comprehensive resource of mammalian protein complexes--2009. Nucleic Acids Res 38, D497-501.

130. Salwinski, L., Miller, C. S., Smith, A. J., Pettit, F. K., Bowie, J. U., and Eisenberg, D. (2004). The Database of Interacting Proteins: 2004 update. Nucleic Acids Res 32, D449-451.

131. Sardiu, M. E., Cai, Y., Jin, J., Swanson, S. K., Conaway, R. C., Conaway, J. W., Florens, L., and Washburn, M. P. (2008). Probabilistic assembly of human protein interaction networks from label-free quantitative proteomics. Proc Natl Acad Sci U S A 105, 1454-1459.

132. Schneider, A., Dessimoz, C., and Gonnet, G. H. (2007). OMA Browser--exploring orthologous relations across 352 complete genomes. Bioinformatics 23, 2180-2182.

133. Selbach, M., Schwanhausser, B., Thierfelder, N., Fang, Z., Khanin, R., and Rajewsky, N. (2008). Widespread changes in protein synthesis induced by microRNAs. Nature.

134. Shannon, P., Markiel, A., Ozier, O., Baliga, N. S., Wang, J. T., Ramage, D., Amin, N., Schwikowski, B., and Ideker, T. (2003). Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 13, 2498-2504.

135. Shapira, S. D., Gat-Viks, I., Shum, B. O., Dricot, A., de Grace, M. M., Wu, L., Gupta, P. B., Hao, T., Silver, S. J., Root, D. E., et al. (2009). A physical and regulatory map of host-influenza interactions reveals pathways in H1N1 infection. Cell 139, 1255-1267.

136. Sowa, M. E., Bennett, E. J., Gygi, S. P., and Harper, J. W. (2009). Defining the human deubiquitinating enzyme interaction landscape. Cell 138, 389-403.

137. Spirin, V., and Mirny, L. A. (2003). Protein complexes and functional modules in molecular networks. Proc Natl Acad Sci U S A 100, 12123-12128.

138. Stark, C., Breitkreutz, B. J., Chatr-Aryamontri, A., Boucher, L., Oughtred, R., Livstone, M. S., Nixon, J., Van Auken, K., Wang, X., Shi, X., et al. (2011). The BioGRID Interaction Database: 2011 update. Nucleic Acids Res 39, D698-704.

139. Stark, C., Breitkreutz, B. J., Reguly, T., Boucher, L., Breitkreutz, A., and Tyers, M. (2006). BioGRID: a general repository for interaction datasets. Nucleic Acids Res 34, D535-539.

140. Steen, H., and Mann, M. (2004). The ABC's (and XYZ's) of peptide sequencing. Nat Rev Mol Cell Biol 5, 699-711.

141. Sultan, M., Schulz, M. H., Richard, H., Magen, A., Klingenhoff, A., Scherf, M., Seifert, M., Borodina, T., Soldatov, A., Parkhomchuk, D., et al. (2008). A global view of gene activity and alternative splicing by deep sequencing of the human transcriptome. Science 321, 956-960.

142. Suter, B., Kittanakom, S., and Stagljar, I. (2008). Two-hybrid technologies in proteomics research. Curr Opin Biotechnol 19, 316-323.

143. Taatjes, D. J., Naar, A. M., Andel, F., 3rd, Nogales, E., and Tjian, R. (2002). Structure, function, and activator-induced conformations of the CRSP coactivator. Science 295, 1058-1062.

144. Tabb, D. L., McDonald, W. H., and Yates, J. R., 3rd (2002). DTASelect and Contrast: tools for assembling and comparing protein identifications from shotgun proteomics. J Proteome Res 1, 21-26.

145. Tavazoie, S., Hughes, J. D., Campbell, M. J., Cho, R. J., and Church, G. M. (1999). Systematic determination of genetic network architecture. Nat Genet 22, 281-285.

146. Terpe, K. (2003). Overview of tag protein fusions: from molecular and biochemical fundamentals to commercial systems. Appl Microbiol Biotechnol 60, 523-533.

147. The UniProt Consortium (2011). Ongoing and future developments at the Universal Protein Resource. Nucleic Acids Res 39, D214-219.

148. Tillier, E. R., and Charlebois, R. L. (2009). The human protein coevolution network. Genome Res 19, 1861-1871.

149. Uetz, P., Giot, L., Cagney, G., Mansfield, T. A., Judson, R. S., Knight, J. R., Lockshon, D., Narayan, V., Srinivasan, M., Pochart, P., et al. (2000). A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae. Nature 403, 623-627.

150. Uhlen, M., Oksvold, P., Fagerberg, L., Lundberg, E., Jonasson, K., Forsberg, M., Zwahlen, M., Kampf, C., Wester, K., Hober, S., et al. (2010). Towards a knowledge-based Human Protein Atlas. Nat Biotechnol 28, 1248-1250.

151. Veerassamy, S., Smith, A., and Tillier, E. R. (2003). A transition probability model for amino acid substitutions from blocks. J Comput Biol 10, 997-1010.

152. Vidal, M., Cusick, M. E., and Barabasi, A. L. (2011). Interactome networks and human disease. Cell 144, 986-998.

153. Vilella, A. J., Severin, J., Ureta-Vidal, A., Heng, L., Durbin, R., and Birney, E. (2009). EnsemblCompara GeneTrees: Complete, duplication-aware phylogenetic trees in vertebrates. Genome Res 19, 327-335.

154. von Mering, C., Krause, R., Snel, B., Cornell, M., Oliver, S. G., Fields, S., and Bork, P. (2002). Comparative assessment of large-scale data sets of protein-protein interactions. Nature 417, 399-403.

155. Washburn, M. P., Wolters, D., and Yates, J. R., 3rd (2001). Large-scale analysis of the yeast proteome by multidimensional protein identification technology. Nat Biotechnol 19, 242-247.

156. Wessels, H. J., Vogel, R. O., van den Heuvel, L., Smeitink, J. A., Rodenburg, R. J., Nijtmans, L. G., and Farhoud, M. H. (2009). LC-MS/MS as an alternative for SDS-PAGE in blue native analysis of protein complexes. Proteomics 9, 4221-4228.

157. Wisniewski, J. R., Zougman, A., Nagaraj, N., and Mann, M. (2009). Universal sample preparation method for proteome analysis. Nat Methods 6, 359-362.

158. Wolters, D. A., Washburn, M. P., and Yates, J. R., 3rd (2001). An automated multidimensional protein identification technology for shotgun proteomics. Anal Chem 73, 5683-5690.

159. Wuyts, W., Roland, D., Ludecke, H. J., Wauters, J., Foulon, M., Van Hul, W., and Van Maldergem, L. (2002). Multiple exostoses, mental retardation, hypertrichosis, and brain abnormalities in a boy with a de novo 8q24 submicroscopic interstitial deletion. Am J Med Genet 113, 326-332.

160. Xie, X., Lu, J., Kulbokas, E. J., Golub, T. R., Mootha, V., Lindblad-Toh, K., Lander, E. S., and Kellis, M. (2005). Systematic discovery of regulatory motifs in human promoters and 3' UTRs by comparison of several mammals. Nature 434, 338-345.

161. Xu, X., Song, Y., Li, Y., Chang, J., Zhang, H., and An, L. (2010). The tandem affinity purification method: an efficient system for protein complex purification and protein interaction identification. Protein Expr Purif 72, 149-156.

162. Yates, J. R., 3rd, Eng, J. K., McCormack, A. L., and Schieltz, D. (1995). Method to correlate tandem mass spectra of modified peptides to amino acid sequences in the protein database. Anal Chem 67, 1426-1436.

163. Yu, H., Braun, P., Yildirim, M. A., Lemmens, I., Venkatesan, K., Sahalie, J., Hirozane-Kishikawa, T., Gebreab, F., Li, N., Simonis, N., et al. (2008). High-quality binary protein interaction map of the yeast interactome network. Science 322, 104-110.

164. Zanivan, S., Cascone, I., Peyron, C., Molineris, I., Marchio, S., Caselle, M., and Bussolino, F. (2007). A new computational approach to analyze human protein complexes and predict novel protein interactions. Genome Biol 8, R256.

165. Zeghouf, M., Li, J., Butland, G., Borkowska, A., Canadien, V., Richards, D., Beattie, B., Emili, A., and Greenblatt, J. F. (2004). Sequential Peptide Affinity (SPA) system for the identification of mammalian and bacterial protein complexes. J Proteome Res 3, 463-468.

Appendix 1: List of predicted protein complexes

CLUSTER  SIZE  UNIPROT ACCESSION  PREFERRED GENE NAME  PROTEIN SHORT DESCRIPTION

C_1 2 O60547 GMDS GDP-mannose 4,6 dehydratase

C_1 2 P30043 BLVRB Flavin reductase

C_2 2 Q92785 DPF2 Zinc finger protein ubi-d4

C_2 2 Q9NVP1 DDX18 ATP-dependent RNA helicase DDX18

C_3 2 Q86Y82 STX12 Syntaxin-12

C_3 2 O95249 GOSR1 Golgi SNAP receptor complex member 1

C_4 2 Q9UPN6 SCAF8 Protein SCAF8

C_4 2 P26583 HMGB2 High mobility group protein B2

C_5 2 P15408 FOSL2 Fos-related antigen 2

C_5 2 P18846 ATF1 Cyclic AMP-dependent transcription factor ATF-1

C_6 2 P23258 TUBG1 Tubulin gamma-1 chain

C_6 2 Q9UGJ1 TUBGCP4 Gamma-tubulin complex component 4

C_7 2 Q96FW1 OTUB1 Ubiquitin thioesterase OTUB1

C_7 2 Q8N6M0 OTUD6B OTU domain-containing protein 6B

C_8 2 Q9H6T3 RPAP3 RNA polymerase II-associated protein 3

C_8 2 Q9NWS0 PIH1D1 PIH1 domain-containing protein 1

C_9 2 Q9UJS0 SLC25A13 Calcium-binding mitochondrial carrier protein Aralar2

C_9 2 Q9H2W6 MRPL46 39S ribosomal protein L46, mitochondrial

C_10 2 Q9Y3C4 TPRKB TP53RK-binding protein

C_10 2 Q96S44 TP53RK TP53-regulating kinase

C_11 2 Q9UDY4 DNAJB4 DnaJ homolog subfamily B member 4

C_11 2 O95433 AHSA1 Activator of 90 kDa heat shock protein ATPase homolog 1

C_12 2 Q92917 GPKOW G patch domain and KOW motifs-containing protein

C_12 2 Q99700 ATXN2 Ataxin-2

C_13 2 O00534 VWA5A von Willebrand factor A domain-containing protein 5A

C_13 2 P06126 CD1A T-cell surface glycoprotein CD1a

C_14 2 Q53EL6 PDCD4 Programmed cell death protein 4

C_14 2 P55055 NR1H2 Oxysterols receptor LXR-beta

C_15 2 Q8IVH2 FOXP4 Forkhead box protein P4

C_15 2 P46013 MKI67 Antigen KI-67

C_16 2 Q15181 PPA1 Inorganic pyrophosphatase

C_16 2 Q9Y281 CFL2 Cofilin-2

C_17 2 P42338 PIK3CB Phosphatidylinositol 4,5-bisphosphate 3-kinase catalytic subunit beta isoform

C_17 2 P27986 PIK3R1 Phosphatidylinositol 3-kinase regulatory subunit alpha

C_18 2 Q14118 DAG1 Dystroglycan

C_18 2 P57737 CORO7 Coronin-7

C_19 2 Q9Y478 PRKAB1 5'-AMP-activated protein kinase subunit beta-1

C_19 2 Q13131 PRKAA1 5'-AMP-activated protein kinase catalytic subunit alpha-1

C_20 2 Q96FN9 C14orf126 Probable D-tyrosyl-tRNA(Tyr) deacylase 2

C_20 2 Q9H0R4 HDHD2 Haloacid dehalogenase-like hydrolase domain-containing protein 2

C_21 2 Q9NYV6 RRN3 RNA polymerase I-specific transcription initiation factor RRN3

C_21 2 Q96T37 RBM15 Putative RNA-binding protein 15

C_22 2 Q6UVK1 CSPG4 Chondroitin sulfate proteoglycan 4

C_22 2 P04632 CAPNS1 Calpain small subunit 1

C_23 2 Q9UHL4 DPP7 Dipeptidyl peptidase 2

C_23 2 Q99417 MYCBP C-Myc-binding protein

C_24 2 Q9Y6N7 ROBO1 Roundabout homolog 1

C_24 2 P53778 MAPK12 Mitogen-activated protein kinase 12

C_25 2 Q9UBL3 ASH2L Set1/Ash2 histone methyltransferase complex subunit ASH2

C_25 2 O15047 SETD1A Histone-lysine N-methyltransferase SETD1A

C_26 2 Q13630 TSTA3 GDP-L-fucose synthase

C_26 2 Q8WXX5 DNAJC9 DnaJ homolog subfamily C member 9

C_27 2 Q969U7 PSMG2 Proteasome assembly chaperone 2

C_27 2 O95456 PSMG1 Proteasome assembly chaperone 1

C_28 2 O95197 RTN3 Reticulon-3

C_28 2 Q9NRG1 PRTFDC1 Phosphoribosyltransferase domain-containing protein 1

C_29 2 Q8WVY7 UBLCP1 Ubiquitin-like domain-containing CTD phosphatase 1

C_29 2 Q14558 PRPSAP1 Phosphoribosyl pyrophosphate synthase-associated protein 1

C_30 2 Q8WXI9 GATAD2B Transcriptional repressor p66-beta

C_30 2 Q13330 MTA1 Metastasis-associated protein MTA1

C_31 2 Q9NRY2 INIP SOSS complex subunit C

C_31 2 Q68E01 INTS3 Integrator complex subunit 3

C_32 2 Q13043 STK4 Serine/threonine-protein kinase 4

C_32 2 Q99584 S100A13 Protein S100-A13

C_33 2 O60488 ACSL4 Long-chain-fatty-acid--CoA ligase 4

C_33 2 O95573 ACSL3 Long-chain-fatty-acid--CoA ligase 3

C_34 2 Q14764 MVP Major vault protein

C_34 2 Q9H444 CHMP4B Charged multivesicular body protein 4b

C_35 2 O43752 STX6 Syntaxin-6

C_35 2 Q8TDX6 CSGALNACT1 Chondroitin sulfate N-acetylgalactosaminyltransferase 1

C_36 2 Q9BQ67 GRWD1 Glutamate-rich WD repeat-containing protein 1

C_36 2 Q9P287 BCCIP BRCA2 and CDKN1A-interacting protein

C_37 2 Q9UI15 TAGLN3 Transgelin-3

C_37 2 O15305 PMM2 Phosphomannomutase 2

C_38 2 Q99426 TBCB Tubulin-folding cofactor B

C_38 2 P13693 TPT1 Translationally-controlled tumor protein

C_39 2 Q96SI9 STRBP Spermatid perinuclear RNA-binding protein

C_39 2 Q9UBS0 RPS6KB2 Ribosomal protein S6 kinase beta-2

C_40 2 P61024 CKS1B Cyclin-dependent kinases regulatory subunit 1

C_40 2 P06493 CDK1 Cyclin-dependent kinase 1

C_41 2 Q96CN9 GCC1 GRIP and coiled-coil domain-containing protein 1

C_41 2 P13995 MTHFD2 Bifunctional methylenetetrahydrofolate dehydrogenase/cyclohydrolase

C_42 2 Q96K17 BTF3L4 Transcription factor BTF3 homolog 4

C_42 2 Q9BZK3 NACAP1 Putative nascent polypeptide-associated complex subunit alpha-like protein

C_43 2 Q8NFH4 NUP37 Nucleoporin Nup37

C_43 2 Q9Y4L1 HYOU1 Hypoxia up-regulated protein 1

C_44 2 O76094 SRP72 Signal recognition particle 72 kDa protein

C_44 2 Q9UHB9 SRP68 Signal recognition particle 68 kDa protein

C_45 2 O75794 CDC123 Cell division cycle protein 123 homolog

C_45 2 Q8IZP0 ABI1 Abl interactor 1

C_46 2 Q9Y5P6 GMPPB Mannose-1-phosphate guanyltransferase beta

C_46 2 Q96IJ6 GMPPA Mannose-1-phosphate guanyltransferase alpha

C_47 2 Q9BYT8 NLN Neurolysin, mitochondrial

C_47 2 O14772 FPGT Fucose-1-phosphate guanylyltransferase

C_48 2 Q13505 MTX1 Metaxin-1

C_48 2 Q9NZ45 CISD1 CDGSH iron-sulfur domain-containing protein 1

C_49 2 Q9Y3C1 NOP16 Nucleolar protein 16

C_49 2 Q9H4G0 EPB41L1 Band 4.1-like protein 1

C_50 2 Q9UBZ4 APEX2 DNA-(apurinic or apyrimidinic site) lyase 2

C_50 2 Q6L8Q7 PDE12 2',5'-phosphodiesterase 12

C_51 2 P55327 TPD52 Tumor protein D52

C_51 2 Q9NR31 SAR1A GTP-binding protein SAR1a

C_52 2 Q17RY0 CPEB4 Cytoplasmic polyadenylation element-binding protein 4

C_52 2 P52848 NDST1 Bifunctional heparan sulfate N-deacetylase/N-sulfotransferase 1

C_53 2 Q15370 TCEB2 Transcription elongation factor B polypeptide 2

C_53 2 Q15369 TCEB1 Transcription elongation factor B polypeptide 1

C_54 2 Q6ZNB6 NFXL1 NF-X1-type zinc finger protein NFXL1

C_54 2 Q14677 CLINT1 Clathrin interactor 1

C_55 2 Q03169 TNFAIP2 Tumor necrosis factor alpha-induced protein 2

C_55 2 Q92547 TOPBP1 DNA topoisomerase 2-binding protein 1

C_56 2 Q5H9R7 PPP6R3 Serine/threonine-protein phosphatase 6 regulatory subunit 3

C_56 2 O00743 PPP6C Serine/threonine-protein phosphatase 6 catalytic subunit

C_57 2 Q96IG2 FBXL20 F-box/LRR-repeat protein 20

C_57 2 Q13616 CUL1 Cullin-1

C_58 2 P0C1Z6 TFPT TCF3 fusion partner

C_58 2 P28370 SMARCA1 Probable global transcription activator SNF2L1

C_59 2 P99999 CYCS Cytochrome c

C_59 2 P36404 ARL2 ADP-ribosylation factor-like protein 2

C_60 2 Q7Z4G1 COMMD6 COMM domain-containing protein 6

C_60 2 Q9H0A8 COMMD4 COMM domain-containing protein 4

C_61 2 Q9Y4P1 ATG4B Cysteine protease ATG4B

C_61 2 P31946 YWHAB 14-3-3 protein beta/alpha

C_62 2 P29084 GTF2E2 Transcription initiation factor IIE subunit beta

C_62 2 P29083 GTF2E1 General transcription factor IIE subunit 1

C_63 2 Q92889 ERCC4 DNA repair endonuclease XPF

C_63 2 P07992 ERCC1 DNA excision repair protein ERCC-1

C_64 2 Q9UJW0 DCTN4 Dynactin subunit 4

C_64 2 Q9NZ32 ACTR10 Actin-related protein 10

C_65 2 Q8IZ69 TRMT2A tRNA (uracil-5-)-methyltransferase homolog A

C_65 2 Q14147 DHX34 Probable ATP-dependent RNA helicase DHX34

C_66 2 P30711 GSTT1 Glutathione S-transferase theta-1

C_66 2 P78417 GSTO1 Glutathione S-transferase omega-1

C_67 2 Q9NVS9 PNPO Pyridoxine-5'-phosphate oxidase

C_67 2 Q93052 LPP Lipoma-preferred partner

C_68 2 P32455 GBP1 Interferon-induced guanylate-binding protein 1

C_68 2 Q00535 CDK5 Cyclin-dependent kinase 5

C_69 2 P68871 HBB Hemoglobin subunit beta

C_69 2 Q8TB36 GDAP1 Ganglioside-induced differentiation-associated protein 1

C_70 2 P11498 PC Pyruvate carboxylase, mitochondrial

C_70 2 Q13765 NACA Nascent polypeptide-associated complex subunit alpha

C_71 2 Q13535 ATR Serine/threonine-protein kinase ATR

C_71 2 O95067 CCNB2 G2/mitotic-specific cyclin-B2

C_72 2 Q12882 DPYD Dihydropyrimidine dehydrogenase [NADP(+)]

C_72 2 Q8WUX2 CHAC2 Cation transport regulator-like protein 2

C_73 2 Q2NKX8 ERCC6L DNA excision repair protein ERCC-6-like

C_73 2 Q13085 ACACA Acetyl-CoA carboxylase 1

C_74 2 Q06547 GABPB1 GA-binding protein subunit beta-1

C_74 2 Q06546 GABPA GA-binding protein alpha chain

C_75 2 P47756 CAPZB F-actin-capping protein subunit beta

C_75 2 P52907 CAPZA1 F-actin-capping protein subunit alpha-1

C_76 2 Q68D10 SPTY2D1 Protein SPT2 homolog

C_76 2 Q96KG9 SCYL1 N-terminal kinase-like protein

C_77 2 P47712 PLA2G4A Cytosolic phospholipase A2

C_77 2 P12277 CKB Creatine kinase B-type

C_78 2 Q68CZ2 TNS3 Tensin-3

C_78 2 Q9NZL9 MAT2B Methionine adenosyltransferase 2 subunit beta

C_79 2 Q9UPS6 SETD1B Histone-lysine N-methyltransferase SETD1B

C_79 2 Q9P0U4 CXXC1 CpG-binding protein

C_80 2 Q99755 PIP5K1A Phosphatidylinositol 4-phosphate 5-kinase type-1 alpha

C_80 2 P48729 CSNK1A1 Casein kinase I isoform alpha

C_81 2 Q15024 EXOSC7 Exosome complex component RRP42

C_81 2 Q5RKV6 EXOSC6 Exosome complex component MTR3

C_82 2 Q86TN4 TRPT1 tRNA 2'-phosphotransferase 1

C_82 2 P57772 EEFSEC Selenocysteine-specific elongation factor

C_83 2 Q15022 SUZ12 Polycomb protein SUZ12

C_83 2 Q15910 EZH2 Histone-lysine N-methyltransferase EZH2

C_84 2 Q9BTA9 WAC WW domain-containing adapter protein with coiled-coil

C_84 2 Q9H974 QTRTD1 Queuine tRNA-ribosyltransferase subunit QTRTD1

C_85 2 Q13045 FLII Protein flightless-1 homolog

C_85 2 P47813 EIF1AX Eukaryotic translation initiation factor 1A, X-chromosomal

C_86 2 P49815 TSC2 Tuberin

C_86 2 Q9NQ88 TIGAR Fructose-2,6-bisphosphatase TIGAR

C_87 2 O94973 AP2A2 AP-2 complex subunit alpha-2

C_87 2 O95782 AP2A1 AP-2 complex subunit alpha-1

C_88 2 Q9UI26 IPO11 Importin-11

C_88 2 Q99962 SH3GL2 Endophilin-A1

C_89 2 Q6ZW49 PAXIP1 PAX-interacting protein 1

C_89 2 P78316 NOP14 Nucleolar protein 14

C_90 2 O43414 ERI3 ERI1 exoribonuclease 3

C_90 2 P33981 TTK Dual specificity protein kinase TTK

C_91 2 Q96HA7 TONSL Tonsoku-like protein

C_91 2 Q9Y6X3 MAU2 MAU2 chromatid cohesion factor homolog

C_92 2 Q9H0H5 RACGAP1 Rac GTPase-activating protein 1

C_92 2 Q02241 KIF23 Kinesin-like protein KIF23

C_93 2 Q96B97 SH3KBP1 SH3 domain-containing kinase-binding protein 1

C_93 2 Q12884 FAP Seprase

C_94 2 Q9UKI8 TLK1 Serine/threonine-protein kinase tousled-like 1

C_94 2 P49441 INPP1 Inositol polyphosphate 1-phosphatase

C_95 2 Q13112 CHAF1B Chromatin assembly factor 1 subunit B

C_95 2 Q13111 CHAF1A Chromatin assembly factor 1 subunit A

C_96 2 Q32P51 HNRNPA1L2 Heterogeneous nuclear ribonucleoprotein A1-like 2

C_96 2 Q7L1Q6 BZW1 Basic leucine zipper and W2 domain-containing protein 1

C_97 2 P53634 CTSC Dipeptidyl peptidase 1

C_97 2 P60033 CD81 CD81 antigen

C_98 2 Q4J6C6 PREPL Prolyl endopeptidase-like

C_98 2 P13798 APEH Acylamino-acid-releasing enzyme

C_99 2 O15344 MID1 Midline-1

C_99 2 Q96T60 PNKP Bifunctional polynucleotide phosphatase/kinase

C_100 2 Q6PJG2 C14orf43 Uncharacterized protein C14orf43

C_100 2 Q96HC4 PDLIM5 PDZ and LIM domain protein 5

C_101 2 Q8N9N7 LRRC57 Leucine-rich repeat-containing protein 57

C_101 2 Q02388 COL7A1 Collagen alpha-1(VII) chain

C_102 2 P00966 ASS1 Argininosuccinate synthase

C_102 2 Q14161 GIT2 ARF GTPase-activating protein GIT2

C_103 2 P42167 TMPO Lamina-associated polypeptide 2, isoforms beta/gamma

C_103 2 O75531 BANF1 Barrier-to-autointegration factor

C_104 2 Q9BSU1 C16orf70 UPF0183 protein C16orf70

C_104 2 P35237 SERPINB6 Serpin B6

C_105 2 P46934 NEDD4 E3 ubiquitin-protein ligase NEDD4

C_105 2 P11532 DMD Dystrophin

C_106 2 Q96GD0 PDXP Pyridoxal phosphate phosphatase

C_106 2 Q9ULK4 MED23 Mediator of RNA polymerase II transcription subunit 23

C_107 2 P61204 ARF3 ADP-ribosylation factor 3

C_107 2 P84077 ARF1 ADP-ribosylation factor 1

C_108 2 Q7L7X3 TAOK1 Serine/threonine-protein kinase TAO1

C_108 2 Q8N5C6 SRBD1 S1 RNA-binding domain-containing protein 1

C_109 2 O95487 SEC24B Protein transport protein Sec24B

C_109 2 Q15437 SEC23B Protein transport protein Sec23B

C_110 2 O95455 TGDS dTDP-D-glucose 4,6-dehydratase

C_110 2 Q99714 HSD17B10 3-hydroxyacyl-CoA dehydrogenase type-2

C_111 2 Q08170 SRSF4 Serine/arginine-rich splicing factor 4

C_111 2 Q96S55 WRNIP1 ATPase WRNIP1

C_112 2 Q02809 PLOD1 Procollagen-lysine,2-oxoglutarate 5-dioxygenase 1

C_112 2 Q99618 CDCA3 Cell division cycle-associated protein 3

C_113 3 Q6UXN9 WDR82 WD repeat-containing protein 82

C_113 3 P09936 UCHL1 Ubiquitin carboxyl-terminal hydrolase isozyme L1

C_113 3 Q6PHR2 ULK3 Serine/threonine-protein kinase ULK3

C_114 3 P36873 PPP1CC Serine/threonine-protein phosphatase PP1-gamma catalytic subunit

C_114 3 Q12972 PPP1R8 Nuclear inhibitor of protein phosphatase 1

C_114 3 P35240 NF2 Merlin

C_115 3 Q13099 IFT88 Intraflagellar transport protein 88 homolog

C_115 3 Q8WYA0 IFT81 Intraflagellar transport protein 81 homolog

C_115 3 Q9NQC8 IFT46 Intraflagellar transport protein 46 homolog

C_116 3 Q96QC0 PPP1R10 Serine/threonine-protein phosphatase 1 regulatory subunit 10

C_116 3 Q8N6T7 SIRT6 NAD-dependent protein deacetylase sirtuin-6

C_116 3 Q9UG63 ABCF2 ATP-binding cassette sub-family F member 2

C_117 3 Q96CM3 RPUSD4 RNA pseudouridylate synthase domain-containing protein 4

C_117 3 Q9HC36 RNMTL1 RNA methyltransferase-like protein 1

C_117 3 Q9NP92 MRPS30 28S ribosomal protein S30, mitochondrial

C_118 3 Q92820 GGH Gamma-glutamyl hydrolase

C_118 3 P07099 EPHX1 Epoxide hydrolase 1

C_118 3 Q9NXV6 CDKN2AIP CDKN2A-interacting protein

C_119 3 Q5T8P6 RBM26 RNA-binding protein 26

C_119 3 Q9H2M9 RAB3GAP2 Rab3 GTPase-activating protein non-catalytic subunit

C_119 3 Q8NFH3 NUP43 Nucleoporin Nup43

C_120 3 P29350 PTPN6 Tyrosine-protein phosphatase non-receptor type 6

C_120 3 Q5PRF9 SAMD4B Protein Smaug homolog 2

C_120 3 Q9NVE7 PANK4 Pantothenate kinase 4

C_121 3 Q8IWB7 WDFY1 WD repeat and FYVE domain-containing protein 1

C_121 3 P83876 TXNL4A Thioredoxin-like protein 4A

C_121 3 Q96EM0 C14orf149 Probable proline racemase

C_122 3 O75368 SH3BGRL SH3 domain-binding glutamic acid-rich-like protein

C_122 3 Q9UNZ2 NSFL1C NSFL1 cofactor p47

C_122 3 Q9BS40 LXN Latexin

C_123 3 P61019 RAB2A Ras-related protein Rab-2A

C_123 3 P04181 OAT Ornithine aminotransferase, mitochondrial

C_123 3 P10619 CTSA Lysosomal protective protein

C_124 3 Q6P9B6 KIAA1609 TLD domain-containing protein KIAA1609

C_124 3 P53350 PLK1 Serine/threonine-protein kinase PLK1

C_124 3 Q53FA7 TP53I3 Quinone oxidoreductase PIG3

C_125 3 Q9P270 SLAIN2 SLAIN motif-containing protein 2

C_125 3 P78346 RPP30 Ribonuclease P protein subunit p30

C_125 3 Q5VZK9 LRRC16A Leucine-rich repeat-containing protein 16A

C_126 3 Q96ME1 FBXL18 F-box/LRR-repeat protein 18

C_126 3 Q9HD42 CHMP1A Charged multivesicular body protein 1a

C_126 3 Q9Y6K5 OAS3 2'-5'-oligoadenylate synthase 3

C_127 3 Q9Y6E0 STK24 Serine/threonine-protein kinase 24

C_127 3 Q9HCD5 NCOA5 Nuclear receptor coactivator 5

C_127 3 Q66GS9 CEP135 Centrosomal protein of 135 kDa

C_128 3 Q96AY3 FKBP10 Peptidyl-prolyl cis-trans isomerase FKBP10

C_128 3 P42892 ECE1 Endothelin-converting enzyme 1

C_128 3 Q9H3Z4 DNAJC5 DnaJ homolog subfamily C member 5

C_129 3 Q99719 SEPT5 Septin-5

C_129 3 Q9P0V9 SEPT10 Septin-10

C_129 3 O00273 DFFA DNA fragmentation factor subunit alpha

C_130 3 Q00613 HSF1 Heat shock factor protein 1

C_130 3 Q13322 GRB10 Growth factor receptor-bound protein 10

C_130 3 Q6ZUT9 DENND5B DENN domain-containing protein 5B

C_131 3 O75526 RBMXL2 RNA-binding motif protein, X-linked-like-2

C_131 3 Q92696 RABGGTA Geranylgeranyl transferase type-2 subunit alpha

C_131 3 Q9BRP7 FDXACB1 Ferredoxin-fold anticodon-binding domain-containing protein 1

C_132 3 Q99973 TEP1 Telomerase protein component 1

C_132 3 Q9BTK6 PA1 PAXIP1-associated protein 1

C_132 3 P10155 TROVE2 60 kDa SS-A/Ro ribonucleoprotein

C_133 3 Q16637 SMN1; SMN2 Survival motor neuron protein

C_133 3 Q9UHI6 DDX20 Probable ATP-dependent RNA helicase DDX20

C_133 3 P38432 COIL Coilin

C_134 3 Q9H446 RWDD1 RWD domain-containing protein 1

C_134 3 Q6PCB5 RSBN1L Round spermatid basic protein 1-like protein

C_134 3 P55039 DRG2 Developmentally-regulated GTP-binding protein 2

C_135 3 Q9BQ95 ECSIT Evolutionarily conserved signaling intermediate in Toll pathway, mitochondrial

C_135 3 Q16611 BAK1 Bcl-2 homologous antagonist/killer

C_135 3 Q9NRK6 ABCB10 ATP-binding cassette sub-family B member 10, mitochondrial

C_136 3 O95359 TACC2 Transforming acidic coiled-coil-containing protein 2

C_136 3 O43663 PRC1 Protein regulator of cytokinesis 1

C_136 3 O75694 NUP155 Nuclear pore complex protein Nup155

C_137 3 Q96KA5 CLPTM1L Cleft lip and palate transmembrane protein 1-like protein

C_137 3 Q9HD33 MRPL47 39S ribosomal protein L47, mitochondrial

C_137 3 Q8N983 MRPL43 39S ribosomal protein L43, mitochondrial

C_138 3 Q9NQW7 XPNPEP1 Xaa-Pro aminopeptidase 1

C_138 3 Q9NP79 VTA1 Vacuolar protein sorting-associated protein VTA1 homolog

C_138 3 O75663 TIPRL TIP41-like protein

C_139 3 Q86UT6 NLRX1 NLR family member X1

C_139 3 Q9ULC4 MCTS1 Malignant T-cell-amplified sequence 1

C_139 3 O43583 DENR Density-regulated protein

C_140 3 Q06124 PTPN11 Tyrosine-protein phosphatase non-receptor type 11

C_140 3 P49841 GSK3B Glycogen synthase kinase-3 beta

C_140 3 Q9UP38 FZD1 Frizzled-1

C_141 3 Q9H2U1 DHX36 Probable ATP-dependent RNA helicase DHX36

C_141 3 P49773 HINT1 Histidine triad nucleotide-binding protein 1

C_141 3 Q9NRV9 HEBP1 Heme-binding protein 1

C_142 3 Q93096 PTP4A1 Protein tyrosine phosphatase type IVA 1

C_142 3 Q9NVU7 SDAD1 Protein SDA1 homolog

C_142 3 P11279 LAMP1 Lysosome-associated membrane glycoprotein 1

C_143 3 P20290 BTF3 Transcription factor BTF3

C_143 3 Q6NW29 RWDD4 RWD domain-containing protein 4

C_143 3 Q9NUG6 PDRG1 p53 and DNA damage-regulated protein 1

C_144 3 Q58FF7 HSP90AB3P Putative heat shock protein HSP 90-beta-3

C_144 3 P02792 FTL Ferritin light chain

C_144 3 P02794 FTH1 Ferritin heavy chain

C_145 3 P54577 YARS Tyrosine--tRNA ligase, cytoplasmic

C_145 3 Q14320 FAM50A Protein FAM50A

C_145 3 Q86W50 METTL16 Methyltransferase-like protein 16

C_146 3 O14757 CHEK1 Serine/threonine-protein kinase Chk1

C_146 3 Q14686 NCOA6 Nuclear receptor coactivator 6

C_146 3 Q96RN5 MED15 Mediator of RNA polymerase II transcription subunit 15

C_147 3 P16615 ATP2A2 Sarcoplasmic/endoplasmic reticulum calcium ATPase 2

C_147 3 O14983 ATP2A1 Sarcoplasmic/endoplasmic reticulum calcium ATPase 1

C_147 3 Q14573 ITPR3 Inositol 1,4,5-trisphosphate receptor type 3

C_148 3 Q9NVM9 Asun Protein asunder homolog

C_148 3 Q9BXJ9 NAA15 N-alpha-acetyltransferase 15, NatA auxiliary subunit

C_148 3 P41227 NAA10 N-alpha-acetyltransferase 10

C_149 3 Q92890 UFD1L Ubiquitin fusion degradation protein 1 homolog

C_149 3 Q8TAT6 NPLOC4 Nuclear protein localization protein 4 homolog

C_149 3 P58546 MTPN Myotrophin

C_150 3 Q5TC82 RC3H1 Roquin

C_150 3 P26196 DDX6 Probable ATP-dependent RNA helicase DDX6

C_150 3 P32320 CDA Cytidine deaminase

C_151 4 Q12965 MYO1E Unconventional myosin-Ie

C_151 4 O60610 DIAPH1 Protein diaphanous homolog 1

C_151 4 P35749 MYH11 Myosin-11

C_151 4 P60709 ACTB Actin, cytoplasmic 1

C_152 4 Q9P2L0 WDR35 WD repeat-containing protein 35

C_152 4 Q9NYU2 UGGT1 UDP-glucose:glycoprotein glucosyltransferase 1

C_152 4 Q86V81 ALYREF THO complex subunit 4

C_152 4 Q14669 TRIP12 Probable E3 ubiquitin-protein ligase TRIP12

C_153 4 Q12965 MYO1E Unconventional myosin-Ie

C_153 4 O60610 DIAPH1 Protein diaphanous homolog 1

C_153 4 O95757 HSPA4L Heat shock 70 kDa protein 4L

C_153 4 Q5VTR2 RNF20 E3 ubiquitin-protein ligase BRE1A

C_154 4 O95365 ZBTB7A Zinc finger and BTB domain-containing protein 7A

C_154 4 O75691 UTP20 Small subunit processome component 20 homolog

C_154 4 Q8IY37 DHX37 Probable ATP-dependent RNA helicase DHX37

C_154 4 Q9H0A0 NAT10 N-acetyltransferase 10

C_155 4 Q9Y5V0 ZNF706 Zinc finger protein 706

C_155 4 Q96RU2 USP28 Ubiquitin carboxyl-terminal hydrolase 28

C_155 4 O60763 USO1 General vesicular transport factor p115

C_155 4 O14578 CIT Citron Rho-interacting kinase

C_156 4 Q7Z7H5 TMED4 Transmembrane emp24 domain-containing protein 4

C_156 4 Q15363 TMED2 Transmembrane emp24 domain-containing protein 2

C_156 4 P49257 LMAN1 Protein ERGIC-53

C_156 4 P45877 PPIC Peptidyl-prolyl cis-trans isomerase C

C_157 4 Q5VYJ4 RUEL1 Small nuclear ribonucleoprotein polypeptide E-like protein

C_157 4 P08779 KRT16 Keratin, type I cytoskeletal 16

C_157 4 P52272 HNRNPM Heterogeneous nuclear ribonucleoprotein M

C_157 4 Q9UDW1 UQCR10 Cytochrome b-c1 complex subunit 9

C_158 4 Q15007 WTAP Pre-mRNA-splicing regulator WTAP

C_158 4 O75439 PMPCB Mitochondrial-processing peptidase subunit beta

C_158 4 P52292 KPNA2 Importin subunit alpha-2

C_158 4 Q9UK76 HN1 Hematological and neurological expressed 1 protein

C_159 4 Q9P0L0 VAPA Vesicle-associated membrane protein-associated protein A

C_159 4 P04179 SOD2 Superoxide dismutase [Mn], mitochondrial

C_159 4 P51649 ALDH5A1 Succinate-semialdehyde dehydrogenase, mitochondrial

C_159 4 Q9UKD2 MRTO4 mRNA turnover protein 4 homolog

C_160 4 Q9UI09 NDUFA12 NADH dehydrogenase [ubiquinone] 1 alpha subcomplex subunit 12

C_160 4 P33897 ABCD1 ATP-binding cassette sub-family D member 1

C_160 4 O75947 ATP5H ATP synthase subunit d, mitochondrial

C_160 4 P00846 MT-ATP6 ATP synthase subunit a

C_161 4 Q5VYS8 ZCCHC6 Terminal uridylyltransferase 7

C_161 4 P51608 MECP2 Methyl-CpG-binding protein 2

C_161 4 Q01831 XPC DNA repair protein complementing XP-C cells

C_161 4 P41208 CETN2 Centrin-2

C_162 4 Q9UGV2 NDRG3 Protein NDRG3

C_162 4 Q9Y450 HBS1L HBS1-like protein

C_162 4 O14810 CPLX1 Complexin-1

C_162 4 Q9UJY4 GGA2 ADP-ribosylation factor-binding protein GGA2

C_163 4 Q9H8H2 DDX31 Probable ATP-dependent RNA helicase DDX31

C_163 4 Q9ULG1 INO80 DNA helicase INO80

C_163 4 Q9GZR7 DDX24 ATP-dependent RNA helicase DDX24

C_163 4 Q9H981 ACTR8 Actin-related protein 8

C_164 4 Q9UBT2 UBA2 SUMO-activating enzyme subunit 2

C_164 4 Q9UBE0 SAE1 SUMO-activating enzyme subunit 1

C_164 4 Q99611 SEPHS2 Selenide, water dikinase 2

C_164 4 Q9NZZ3 CHMP5 Charged multivesicular body protein 5

C_165 4 P23921 RRM1 Ribonucleoside-diphosphate reductase large subunit

C_165 4 A6NL28 TPM3L Putative tropomyosin alpha-3 chain-like protein

C_165 4 P32119 PRDX2 Peroxiredoxin-2

C_165 4 P51688 SGSH N-sulphoglucosamine sulphohydrolase

C_166 4 Q9UBU9 NXF1 Nuclear RNA export factor 1

C_166 4 O95232 LUC7L3 Luc7-like protein 3

C_166 4 O75531 BANF1 Barrier-to-autointegration factor

C_166 4 Q9NUU7 DDX19A ATP-dependent RNA helicase DDX19A

C_167 4 P11234 RALB Ras-related protein Ral-B

C_167 4 P51148 RAB5C Ras-related protein Rab-5C

C_167 4 Q9UHA4 LAMTOR3 Ragulator complex protein LAMTOR3

C_167 4 Q8NF37 LPCAT1 Lysophosphatidylcholine acyltransferase 1

C_168 4 O60245 PCDH7 Protocadherin-7

C_168 4 Q9NUL7 DDX28 Probable ATP-dependent RNA helicase DDX28

C_168 4 Q9GZY8 MFF Mitochondrial fission factor

C_168 4 P21589 NT5E 5'-nucleotidase

C_169 4 P61978 HNRNPK Heterogeneous nuclear ribonucleoprotein K

C_169 4 Q13185 CBX3 Chromobox protein homolog 3

C_169 4 Q9UQM7 CAMK2A Calcium/calmodulin-dependent protein kinase type II subunit alpha

C_169 4 Q9C0C2 TNKS1BP1 182 kDa tankyrase-1-binding protein

C_170 4 Q9UNH7 SNX6 Sorting nexin-6

C_170 4 O60749 SNX2 Sorting nexin-2

C_170 4 Q13596 SNX1 Sorting nexin-1

C_170 4 O60346 PHLPP1 PH domain leucine-rich repeat-containing protein phosphatase 1

C_171 4 Q9NZI7 UBP1 Upstream-binding protein 1

C_171 4 P57081 WDR4 tRNA (guanine-N(7)-)-methyltransferase subunit WDR4

C_171 4 Q9UBP6 METTL1 tRNA (guanine-N(7)-)-methyltransferase

C_171 4 Q8IUD2 ERC1 ELKS/Rab6-interacting/CAST family member 1

C_172 4 Q96AG3 SLC25A46 Solute carrier family 25 member 46

C_172 4 Q8WVM8 SCFD1 Sec1 family domain-containing protein 1

C_172 4 Q86VS8 HOOK3 Protein Hook homolog 3

C_172 4 P17066 HSPA6 Heat shock 70 kDa protein 6

C_173 4 Q14980 NUMA1 Nuclear mitotic apparatus protein 1

C_173 4 Q16531 DDB1 DNA damage-binding protein 1

C_173 4 Q13620 CUL4B Cullin-4B

C_173 4 Q13619 CUL4A Cullin-4A

C_174 4 Q92616 GCN1L1 Translational activator GCN1

C_174 4 O75340 PDCD6 Programmed cell death protein 6

C_174 4 Q16718 NDUFA5 NADH dehydrogenase [ubiquinone] 1 alpha subcomplex subunit 5

C_174 4 Q27J81 INF2 Inverted formin-2

C_175 4 Q9Y4P8 WIPI2 WD repeat domain phosphoinositide-interacting protein 2

C_175 4 O95881 TXNDC12 Thioredoxin domain-containing protein 12

C_175 4 Q6P3X3 TTC27 Tetratricopeptide repeat protein 27

C_175 4 P40222 TXLNA Alpha-taxilin

C_176 4 Q92610 ZNF592 Zinc finger protein 592

C_176 4 Q8NEM2 SHCBP1 SHC SH2 domain-binding protein 1

C_176 4 Q05519 SRSF11 Serine/arginine-rich splicing factor 11

C_176 4 Q7L014 DDX46 Probable ATP-dependent RNA helicase DDX46

C_177 4 Q15428 SF3A2 Splicing factor 3A subunit 2

C_177 4 Q9UNP9 PPIE Peptidyl-prolyl cis-trans isomerase E

C_177 4 P50897 PPT1 Palmitoyl-protein thioesterase 1

C_177 4 Q6KC79 NIPBL Nipped-B-like protein

C_178 4 P54725 RAD23A UV excision repair protein RAD23 homolog A

C_178 4 Q9Y6A5 TACC3 Transforming acidic coiled-coil-containing protein 3

C_178 4 P61106 RAB14 Ras-related protein Rab-14

C_178 4 A1L0T0 ILVBL Acetolactate synthase-like protein

C_179 4 Q96H78 SLC25A44 Solute carrier family 25 member 44

C_179 4 P14923 JUP Junction plakoglobin

C_179 4 P35221 CTNNA1 Catenin alpha-1

C_179 4 P15291 B4GALT1 Beta-1,4-galactosyltransferase 1

C_180 4 Q4G0F5 VPS26B Vacuolar protein sorting-associated protein 26B

C_180 4 A0AVT1 UBA6 Ubiquitin-like modifier-activating enzyme 6

C_180 4 P55786 NPEPPS Puromycin-sensitive aminopeptidase

C_180 4 Q9Y376 CAB39 Calcium-binding protein 39

C_181 4 Q9P032 NDUFAF4 NADH dehydrogenase [ubiquinone] 1 alpha subcomplex assembly factor 4

C_181 4 Q15012 LAPTM4A Lysosomal-associated transmembrane protein 4A

C_181 4 Q99538 LGMN Legumain

C_181 4 O00425 IGF2BP3 Insulin-like growth factor 2 mRNA-binding protein 3

C_182 4 Q92804 TAF15 TATA-binding protein-associated factor 2N

C_182 4 Q9UHR5 SAP30BP SAP30-binding protein

C_182 4 P35637 FUS RNA-binding protein FUS

C_182 4 Q14687 GSE1 Genetic suppressor element 1

C_183 4 P49642 PRIM1 DNA primase small subunit

C_183 4 P49643 PRIM2 DNA primase large subunit

C_183 4 Q14181 POLA2 DNA polymerase alpha subunit B

C_183 4 P09884 POLA1 DNA polymerase alpha catalytic subunit

C_184 4 Q8N1B4 VPS52 Vacuolar protein sorting-associated protein 52 homolog

C_184 4 Q7Z2T5 TRMT1L TRMT1-like protein

C_184 4 Q99986 VRK1 Serine/threonine-protein kinase VRK1

C_184 4 P39748 FEN1 Flap endonuclease 1

C_185 4 Q9H1B7 IRF2BPL Interferon regulatory factor 2-binding protein-like

C_185 4 P78310 CXADR Coxsackievirus and adenovirus receptor

C_185 4 Q86TX2 ACOT1 Acyl-coenzyme A thioesterase 1

C_185 4 Q6NVY1 HIBCH 3-hydroxyisobutyryl-CoA hydrolase, mitochondrial

C_186 4 Q6P2S7 GNN Tetratricopeptide repeat protein GNN

C_186 4 P09382 LGALS1 Galectin-1

C_186 4 P04075 ALDOA Fructose-bisphosphate aldolase A

C_186 4 P21333 FLNA Filamin-A

C_187 4 Q9Y247 FAM50B Protein FAM50B

C_187 4 Q6P161 MRPL54 39S ribosomal protein L54, mitochondrial

C_187 4 Q9Y399 MRPS2 28S ribosomal protein S2, mitochondrial

C_187 4 Q9Y3D3 MRPS16 28S ribosomal protein S16, mitochondrial

C_188 4 Q17R98 ZNF827 Zinc finger protein 827

C_188 4 P51965 UBE2E1 Ubiquitin-conjugating enzyme E2 E1

C_188 4 P16499 PDE6A Rod cGMP-specific 3',5'-cyclic phosphodiesterase subunit alpha

C_188 4 Q8IWX8 CHERP Calcium homeostasis endoplasmic reticulum protein

C_189 4 Q8N5G0 C4orf52 Uncharacterized protein C4orf52

C_189 4 O75940 SMNDC1 Survival of motor neuron-related-splicing factor 30

C_189 4 Q9BWJ5 SF3B5 Splicing factor 3B subunit 5

C_189 4 Q9P013 CWC15 Protein CWC15 homolog

C_190 4 O15042 U2SURP U2 snRNP-associated SURP motif-containing protein

C_190 4 Q6NWY9 PRPF40B Pre-mRNA-processing factor 40 homolog B

C_190 4 P55081 MFAP1 Microfibrillar-associated protein 1

C_190 4 P61927 RPL37 60S ribosomal protein L37

C_191 4 Q9UKY1 ZHX1 Zinc fingers and homeoboxes protein 1

C_191 4 Q15269 PWP2 Periodic tryptophan protein 2 homolog

C_191 4 O94829 IPO13 Importin-13

C_191 4 Q9C0E2 XPO4 Exportin-4

C_192 4 P18074 ERCC2 TFIIH basal transcription factor complex helicase XPD subunit

C_192 4 P19447 ERCC3 TFIIH basal transcription factor complex helicase XPB subunit

C_192 4 Q9H3E2 SNX25 Sorting nexin-25

C_192 4 Q8N196 SIX5 Homeobox protein SIX5

C_193 4 Q96BR5 SELRC1 Sel1 repeat-containing protein 1

C_193 4 Q96AY3 FKBP10 Peptidyl-prolyl cis-trans isomerase FKBP10

C_193 4 Q5T0N5 FNBP1L Formin-binding protein 1-like

C_193 4 P11717 IGF2R Cation-independent mannose-6-phosphate receptor

C_194 4 Q9BXF6 RAB11FIP5 Rab11 family-interacting protein 5

C_194 4 Q9UNN8 PROCR Endothelial protein C receptor

C_194 4 P07858 CTSB Cathepsin B

C_194 4 Q9NUT2 ABCB8 ATP-binding cassette sub-family B member 8, mitochondrial

C_195 4 P51659 HSD17B4 Peroxisomal multifunctional enzyme type 2

C_195 4 Q9Y3D6 FIS1 Mitochondrial fission 1 protein

C_195 4 P51608 MECP2 Methyl-CpG-binding protein 2

C_195 4 P04040 CAT Catalase

C_196 4 Q96QD8 SLC38A2 Sodium-coupled neutral amino acid transporter 2

C_196 4 P08729 KRT7 Keratin, type II cytoskeletal 7

C_196 4 O94905 ERLIN2 Erlin-2

C_196 4 Q8NE86 MCU Calcium uniporter protein, mitochondrial

C_197 4 Q9BTD8 RBM42 RNA-binding protein 42

C_197 4 O00287 RFXAP Regulatory factor X-associated protein

C_197 4 P19338 NCL Nucleolin

C_197 4 Q99567 NUP88 Nuclear pore complex protein Nup88

C_198 4 P06454 PTMA Prothymosin alpha [Cleaved into: Thymosin alpha-1]

C_198 4 Q6PGN9 PSRC1 Proline/serine-rich coiled-coil protein 1

C_198 4 Q15398 DLGAP5 Disks large-associated protein 5

C_198 4 O14965 AURKA Aurora kinase A

C_199 4 O75528 TADA3 Transcriptional adapter 3

C_199 4 O75529 TAF5L TAF5-like RNA polymerase II p300/CBP-associated factor-associated factor 65 kDa subunit 5L

C_199 4 Q96ES7 CCDC101 SAGA-associated factor 29 homolog

C_199 4 Q92621 NUP205 Nuclear pore complex protein Nup205

C_200 4 Q14498 RBM39 RNA-binding protein 39

C_200 4 P61970 NUTF2 Nuclear transport factor 2

C_200 4 P62826 RAN GTP-binding nuclear protein Ran

C_200 4 Q9BRR8 GPATCH1 G patch domain-containing protein 1

C_201 4 P06132 UROD Uroporphyrinogen decarboxylase

C_201 4 P00441 SOD1 Superoxide dismutase [Cu-Zn]

C_201 4 P22392 NME2 Nucleoside diphosphate kinase B

C_201 4 P15531 NME1 Nucleoside diphosphate kinase A

C_202 4 P26640 VARS Valine--tRNA ligase

C_202 4 P50452 SERPINB8 Serpin B8

C_202 4 Q96BR1 SGK3 Serine/threonine-protein kinase Sgk3

C_202 4 Q9P2J5 LARS Leucine--tRNA ligase, cytoplasmic

C_203 4 Q8N584 TTC39C Tetratricopeptide repeat protein 39C

C_203 4 P63000 RAC1 Ras-related C3 botulinum toxin substrate 1

C_203 4 Q9Y617 PSAT1 Phosphoserine aminotransferase

C_203 4 Q9BWD1 ACAT2 Acetyl-CoA acetyltransferase, cytosolic

C_204 4 Q6PGP7 TTC37 Tetratricopeptide repeat protein 37

C_204 4 P51151 RAB9A Ras-related protein Rab-9A

C_204 4 Q15645 TRIP13 Pachytene checkpoint protein 2 homolog

C_204 4 Q15811 ITSN1 Intersectin-1

C_205 4 O14773 TPP1 Tripeptidyl-peptidase 1

C_205 4 P51148 RAB5C Ras-related protein Rab-5C

C_205 4 Q9BVL2 NUPL1 Nucleoporin p58/p45

C_205 4 P31943 HNRNPH1 Heterogeneous nuclear ribonucleoprotein H

C_206 4 Q92520 FAM3C Protein FAM3C

C_206 4 Q10471 GALNT2 Polypeptide N-acetylgalactosaminyltransferase 2

C_206 4 Q96EY7 PTCD3 Pentatricopeptide repeat-containing protein 3, mitochondrial

C_206 4 Q9Y5U9 IER3IP1 Immediate early response 3-interacting protein 1

C_207 4 Q8IX01 SUGP2 SURP and G-patch domain-containing protein 2

C_207 4 Q15155 NOMO1 Nodal modulator 1

C_207 4 Q9Y5J6 FXC1 Mitochondrial import inner membrane translocase subunit Tim9 B

C_207 4 P62072 TIMM10 Mitochondrial import inner membrane translocase subunit Tim10

C_208 4 A6NED2 RCCD1 RCC1 domain-containing protein 1

C_208 4 P29966 MARCKS Myristoylated alanine-rich C-kinase substrate

C_208 4 O76075 DFFB DNA fragmentation factor subunit beta

C_208 4 P53384 NUBP1 Cytosolic Fe-S cluster assembly factor NUBP1

C_209 4 Q9NR50 EIF2B3 Translation initiation factor eIF-2B subunit gamma

C_209 4 Q13144 EIF2B5 Translation initiation factor eIF-2B subunit epsilon

C_209 4 Q9UI10 EIF2B4 Translation initiation factor eIF-2B subunit delta

C_209 4 P49770 EIF2B2 Translation initiation factor eIF-2B subunit beta

C_210 4 Q9P0S9 TMEM14C Transmembrane protein 14C

C_210 4 Q92544 TM9SF4 Transmembrane 9 superfamily member 4

C_210 4 Q15005 SPCS2 Signal peptidase complex subunit 2

C_210 4 O00264 PGRMC1 Membrane-associated progesterone receptor component 1

C_211 4 P17706 PTPN2 Tyrosine-protein phosphatase non-receptor type 2

C_211 4 Q9BVA0 KATNB1 Katanin p80 WD40-containing subunit B1

C_211 4 O75449 KATNA1 Katanin p60 ATPase-containing subunit A1

C_211 4 O95166 GABARAP Gamma-aminobutyric acid receptor-associated protein

C_212 4 P35610 SOAT1 Sterol O-acyltransferase 1

C_212 4 Q9P0J0 NDUFA13 NADH dehydrogenase [ubiquinone] 1 alpha subcomplex subunit 13

C_212 4 Q9UBX3 SLC25A10 Mitochondrial dicarboxylate carrier

C_212 4 Q6PKG0 LARP1 La-related protein 1

C_213 4 O75191 XYLB Xylulose kinase

C_213 4 P52888 THOP1 Thimet oligopeptidase

C_213 4 Q969U7 PSMG2 Proteasome assembly chaperone 2

C_213 4 P52735 VAV2 Guanine nucleotide exchange factor VAV2

C_214 4 O43298 ZBTB43 Zinc finger and BTB domain-containing protein 43

C_214 4 P11234 RALB Ras-related protein Ral-B

C_214 4 P49006 MARCKSL1 MARCKS-related protein

C_214 4 Q8TCC3 MRPL30 39S ribosomal protein L30, mitochondrial

C_215 4 Q6P996 PDXDC1 Pyridoxal-dependent decarboxylase domain-containing protein 1

C_215 4 P05114 HMGN1 Non-histone chromosomal protein HMG-14

C_215 4 A9UHW6 MIF4GD MIF4G domain-containing protein

C_215 4 Q6PFW1 PPIP5K1 Inositol hexakisphosphate and diphosphoinositol-pentakisphosphate kinase 1

C_216 4 Q7Z434 MAVS Mitochondrial antiviral-signaling protein

C_216 4 P08581 MET Hepatocyte growth factor receptor

C_216 4 Q9UBI6 GNG12 Guanine nucleotide-binding protein G(I)/G(S)/G(O) subunit gamma-12

C_216 4 Q14254 FLOT2 Flotillin-2

C_217 4 Q9NTK5 OLA1 Obg-like ATPase 1

C_217 4 P09382 LGALS1 Galectin-1

C_217 4 P09972 ALDOC Fructose-bisphosphate aldolase C

C_217 4 P04075 ALDOA Fructose-bisphosphate aldolase A

C_218 4 Q96QD8 SLC38A2 Sodium-coupled neutral amino acid transporter 2

C_218 4 P10606 COX5B Cytochrome c oxidase subunit 5B, mitochondrial

C_218 4 P20674 COX5A Cytochrome c oxidase subunit 5A, mitochondrial

C_218 4 Q8NE86 MCU Calcium uniporter protein, mitochondrial

C_219 4 Q9Y6M5 SLC30A1 Zinc transporter 1

C_219 4 O96006 ZBED1 Zinc finger BED domain-containing protein 1

C_219 4 P15313 ATP6V1B1 V-type proton ATPase subunit B, kidney isoform

C_219 4 Q9UHB6 LIMA1 LIM domain and actin-binding protein 1

C_220 4 O00267 SUPT5H Transcription elongation factor SPT5

C_220 4 P63272 SUPT4H1 Transcription elongation factor SPT4

C_220 4 Q9ULU4 ZMYND8 Protein kinase C-binding protein 1

C_220 4 P13984 GTF2F2 General transcription factor IIF subunit 2

C_221 4 O75436 VPS26A Vacuolar protein sorting-associated protein 26A

C_221 4 Q96T51 RUFY1 RUN and FYVE domain-containing protein 1

C_221 4 Q5SRE5 NUP188 Nucleoporin NUP188 homolog

C_221 4 O95757 HSPA4L Heat shock 70 kDa protein 4L

C_222 4 O43399 TPD52L2 Tumor protein D54

C_222 4 P37802 TAGLN2 Transgelin-2

C_222 4 Q9UGI8 TES Testin

C_222 4 O60232 SSSCA1 Sjoegren syndrome/scleroderma autoantigen 1

C_223 4 O43709 WBSCR22 Uncharacterized methyltransferase WBSCR22

C_223 4 P50224 SULT1A3; SULT1A4 Sulfotransferase 1A3/1A4

C_223 4 Q96C45 ULK4 Serine/threonine-protein kinase ULK4

C_223 4 O75628 REM1 GTP-binding protein REM 1

C_224 4 Q9NXF1 TEX10 Testis-expressed sequence 10 protein

C_224 4 Q9Y4W2 LAS1L Ribosomal biogenesis protein LAS1L

C_224 4 Q8IZL8 PELP1 Proline-, glutamic acid- and leucine-rich protein 1

C_224 4 Q9NU22 MDN1 Midasin

C_225 4 P23921 RRM1 Ribonucleoside-diphosphate reductase large subunit

C_225 4 P51688 SGSH N-sulphoglucosamine sulphohydrolase

C_225 4 P16989 CSDA DNA-binding protein A

C_225 4 Q07021 C1QBP Complement component 1 Q subcomponent-binding protein

C_226 4 Q9BWT3 PAPOLG Poly(A) polymerase gamma

C_226 4 Q08050 FOXM1 Forkhead box protein M1

C_226 4 Q7Z478 DHX29 ATP-dependent RNA helicase DHX29

C_226 4 O94762 RECQL5 ATP-dependent DNA helicase Q5

C_227 4 O75312 ZNF259 Zinc finger protein ZPR1

C_227 4 P62942 FKBP1A Peptidyl-prolyl cis-trans isomerase FKBP1A

C_227 4 Q96T88 UHRF1 E3 ubiquitin-protein ligase UHRF1

C_227 4 O75794 CDC123 Cell division cycle protein 123 homolog

C_228 4 P13051 UNG Uracil-DNA glycosylase

C_228 4 Q96LR5 UBE2E2 Ubiquitin-conjugating enzyme E2 E2

C_228 4 Q53HV7 SMUG1 Single-strand selective monofunctional uracil DNA glycosylase

C_228 4 Q9UKA9 PTBP2 Polypyrimidine tract-binding protein 2

C_229 4 Q9GZS3 WDR61 WD repeat-containing protein 61

C_229 4 Q13564 NAE1 NEDD8-activating enzyme E1 regulatory subunit

C_229 4 Q8TBC4 UBA3 NEDD8-activating enzyme E1 catalytic subunit

C_229 4 Q9Y6N9 USH1C Harmonin

C_230 4 Q9NXG2 THUMPD1 THUMP domain-containing protein 1

C_230 4 O75886 STAM2 Signal transducing adapter molecule 2

C_230 4 Q9HD47 RANGRF Ran guanine nucleotide release factor

C_230 4 O14964 HGS Hepatocyte growth factor-regulated tyrosine kinase substrate

C_231 4 P26368 U2AF2 Splicing factor U2AF 65 kDa subunit

C_231 4 Q8NBT0 POC1A POC1 centriolar protein homolog A

C_231 4 Q8IYW5 RNF168 E3 ubiquitin-protein ligase RNF168

C_231 4 O14548 COX7A2L Cytochrome c oxidase subunit 7A-related protein, mitochondrial

C_232 4 P54289 CACNA2D1 Voltage-dependent calcium channel subunit alpha-2/delta-1

C_232 4 Q5SRE5 NUP188 Nucleoporin NUP188 homolog

C_232 4 O43181 NDUFS4 NADH dehydrogenase [ubiquinone] iron-sulfur protein 4, mitochondrial

C_232 4 O95757 HSPA4L Heat shock 70 kDa protein 4L

C_233 4 P00491 PNP Purine nucleoside phosphorylase

C_233 4 Q04760 GLO1 Lactoylglutathione lyase

C_233 4 Q9BY32 ITPA Inosine triphosphate pyrophosphatase

C_233 4 P35573 AGL Glycogen debranching enzyme

C_234 4 Q08209 PPP3CA Serine/threonine-protein phosphatase 2B catalytic subunit alpha isoform

C_234 4 P08134 RHOC Rho-related GTP-binding protein RhoC

C_234 4 Q9BV20 MRI1 Methylthioribose-1-phosphate isomerase

C_234 4 P63098 PPP3R1 Calcineurin subunit B type 1

C_235 4 Q9C005 DPY30 Protein dpy-30 homolog

C_235 4 Q32MZ4 LRRFIP1 Leucine-rich repeat flightless-interacting protein 1

C_235 4 P07942 LAMB1 Laminin subunit beta-1

C_235 4 Q8WTS6 SETD7 Histone-lysine N-methyltransferase SETD7

C_236 4 Q8TEM1 NUP210 Nuclear pore membrane glycoprotein 210

C_236 4 Q13151 HNRNPA0 Heterogeneous nuclear ribonucleoprotein A0

C_236 4 O75369 FLNB Filamin-B

C_236 4 Q9HB71 CACYBP Calcyclin-binding protein

C_237 4 Q9BZV1 UBXN6 UBX domain-containing protein 6

C_237 4 P23381 WARS Tryptophan--tRNA ligase, cytoplasmic

C_237 4 P22102 GART Trifunctional purine biosynthetic protein adenosine-3

C_237 4 Q14258 TRIM25 E3 ubiquitin/ISG15 ligase TRIM25

C_238 4 P46940 IQGAP1 Ras GTPase-activating-like protein IQGAP1

C_238 4 P35580 MYH10 Myosin-10

C_238 4 Q9BSJ8 ESYT1 Extended synaptotagmin-1

C_238 4 P68032 ACTC1 Actin, alpha cardiac muscle 1

C_239 4 Q96PV0 SYNGAP1 Ras GTPase-activating protein SynGAP

C_239 4 Q8TF72 SHROOM3 Protein Shroom3

C_239 4 Q96ER3 SAAL1 Protein SAAL1

C_239 4 Q9Y6N9 USH1C Harmonin

C_240 4 P51649 ALDH5A1 Succinate-semialdehyde dehydrogenase, mitochondrial

C_240 4 O14828 SCAMP3 Secretory carrier-associated membrane protein 3

C_240 4 Q01085 TIAL1 Nucleolysin TIAR

C_240 4 Q9NR56 MBNL1 Muscleblind-like protein 1

C_241 4 P83369 LSM11 U7 snRNA-associated Sm-like protein LSm11

C_241 4 O75528 TADA3 Transcriptional adapter 3

C_241 4 O75529 TAF5L TAF5-like RNA polymerase II p300/CBP-associated factor-associated factor 65 kDa subunit 5L

C_241 4 Q13472 TOP3A DNA topoisomerase 3-alpha

C_242 4 P12268 IMPDH2 Inosine-5'-monophosphate dehydrogenase 2

C_242 4 P20839 IMPDH1 Inosine-5'-monophosphate dehydrogenase 1

C_242 4 P00390 GSR Glutathione reductase, mitochondrial

C_242 4 P15328 FOLR1 Folate receptor alpha

C_243 4 Q9BQ61 C19orf43 Uncharacterized protein C19orf43

C_243 4 O75676 RPS6KA4 Ribosomal protein S6 kinase alpha-4

C_243 4 Q86XP3 DDX42 ATP-dependent RNA helicase DDX42

C_243 4 Q9Y3U8 RPL36 60S ribosomal protein L36

C_244 4 Q96GM8 TOE1 Target of EGR1 protein 1

C_244 4 Q9BZI7 UPF3B Regulator of nonsense transcripts 3B

C_244 4 Q9BRP8 WIBG Partner of Y14 and mago

C_244 4 P11388 TOP2A DNA topoisomerase 2-alpha

C_245 4 Q9H269 VPS16 Vacuolar protein sorting-associated protein 16 homolog

C_245 4 Q8NBM4 UBAC2 Ubiquitin-associated domain-containing protein 2

C_245 4 Q9BVS4 RIOK2 Serine/threonine-protein kinase RIO2

C_245 4 Q2NL82 TSR1 Pre-rRNA-processing protein TSR1 homolog

C_246 4 Q01082 SPTBN1 Spectrin beta chain, non-erythrocytic 1

C_246 4 P98179 RBM3 Putative RNA-binding protein 3

C_246 4 O14818 PSMA7 Proteasome subunit alpha type-7

C_246 4 Q6SPF0 SAMD1 Atherin

C_247 4 P00558 PGK1 Phosphoglycerate kinase 1

C_247 4 P09972 ALDOC Fructose-bisphosphate aldolase C

C_247 4 P04075 ALDOA Fructose-bisphosphate aldolase A

C_247 4 Q9H3K6 BOLA2; BOLA2B BolA-like protein 2

C_248 4 Q5JTJ3 C1orf31 Uncharacterized protein C1orf31

C_248 4 P18859 ATP5J ATP synthase-coupling factor 6, mitochondrial

C_248 4 P56134 ATP5J2 ATP synthase subunit f, mitochondrial

C_248 4 P82930 MRPS34 28S ribosomal protein S34, mitochondrial

C_249 4 P08670 VIM Vimentin

C_249 4 Q8TCY9 URGCP Up-regulator of cell proliferation

C_249 4 Q9P0W2 HMG20B SWI/SNF-related matrix-associated actin-dependent regulator of chromatin subfamily E member 1-related

C_249 4 Q9P258 RCC2 Protein RCC2

C_250 4 Q15020 SART3 Squamous cell carcinoma antigen recognized by T-cells 3

C_250 4 Q9Y247 FAM50B Protein FAM50B

C_250 4 Q9BSE5 AGMAT Agmatinase, mitochondrial

C_250 4 Q6P161 MRPL54 39S ribosomal protein L54, mitochondrial

C_251 4 P18827 SDC1 Syndecan-1

C_251 4 P35237 SERPINB6 Serpin B6

C_251 4 P49915 GMPS GMP synthase [glutamine-hydrolyzing]

C_251 4 P23526 AHCY Adenosylhomocysteinase

C_252 4 Q8IXQ6 PARP9 Poly [ADP-ribose] polymerase 9

C_252 4 Q9H9Q4 NHEJ1 Non-homologous end-joining factor 1

C_252 4 Q14CX7 NAA25 N-alpha-acetyltransferase 25, NatB auxiliary subunit

C_252 4 P61599 NAA20 N-alpha-acetyltransferase 20

C_253 4 Q15904 ATP6AP1 V-type proton ATPase subunit S1

C_253 4 Q13501 SQSTM1 Sequestosome-1

C_253 4 P22626 HNRNPA2B1 Heterogeneous nuclear ribonucleoproteins A2/B1

C_253 4 Q99729 HNRNPAB Heterogeneous nuclear ribonucleoprotein A/B

C_254 4 O75674 TOM1L1 TOM1-like protein 1

C_254 4 Q16512 PKN1 Serine/threonine-protein kinase N1

C_254 4 O00506 STK25 Serine/threonine-protein kinase 25

C_254 4 P45984 MAPK9 Mitogen-activated protein kinase 9

C_255 4 Q9NYL9 TMOD3 Tropomodulin-3

C_255 4 P63165 SUMO1 Small ubiquitin-related modifier 1

C_255 4 Q9BWH6 RPAP1 RNA polymerase II-associated protein 1

C_255 4 Q9H2H8 PPIL3 Peptidyl-prolyl cis-trans isomerase-like 3

C_256 4 P39210 MPV17 Protein Mpv17

C_256 4 Q9UHQ9 CYB5R1 NADH-cytochrome b5 reductase 1

C_256 4 O95299 NDUFA10 NADH dehydrogenase [ubiquinone] 1 alpha subcomplex subunit 10, mitochondrial

C_256 4 P00533 EGFR Epidermal growth factor receptor

C_257 4 Q92597 NDRG1 Protein NDRG1

C_257 4 P13667 PDIA4 Protein disulfide-isomerase A4

C_257 4 Q9UMX5 NENF Neudesin

C_257 4 O60271 SPAG9 C-Jun-amino-terminal kinase-interacting protein 4

C_258 4 Q6P3W7 SCYL2 SCY1-like protein 2

C_258 4 P57721 PCBP3 Poly(rC)-binding protein 3

C_258 4 Q96T58 SPEN Msx2-interacting protein

C_258 4 P07738 BPGM Bisphosphoglycerate mutase

C_259 4 P12956 XRCC6 X-ray repair cross-complementing protein 6

C_259 4 P13010 XRCC5 X-ray repair cross-complementing protein 5

C_259 4 Q01664 TFAP4 Transcription factor AP-4

C_259 4 Q13017 ARHGAP5 Rho GTPase-activating protein 5

C_260 4 Q9NY27 PPP4R2 Serine/threonine-protein phosphatase 4 regulatory subunit 2

C_260 4 O60664 PLIN3 Perilipin-3

C_260 4 Q14126 DSG2 Desmoglein-2

C_260 4 Q9Y6H1 CHCHD2 Coiled-coil-helix-coiled-coil-helix domain-containing protein 2

C_261 4 Q8WWY3 PRPF31 U4/U6 small nuclear ribonucleoprotein Prp31

C_261 4 Q14186 TFDP1 Transcription factor Dp-1

C_261 4 Q9HAU5 UPF2 Regulator of nonsense transcripts 2

C_261 4 Q9H074 PAIP1 Polyadenylate-binding protein-interacting protein 1

C_262 4 Q9UP95 SLC12A4 Solute carrier family 12 member 4

C_262 4 Q04941 PLP2 Proteolipid protein 2

C_262 4 Q96EY7 PTCD3 Pentatricopeptide repeat-containing protein 3, mitochondrial

C_262 4 Q8TCC3 MRPL30 39S ribosomal protein L30, mitochondrial

C_263 4 Q9H061 TMEM126A Transmembrane protein 126A

C_263 4 Q8WZ42 TTN Titin

C_263 4 Q01130 SRSF2 Serine/arginine-rich splicing factor 2

C_263 4 P82650 MRPS22 28S ribosomal protein S22, mitochondrial

C_264 4 P31153 MAT2A S-adenosylmethionine synthase isoform type-2

C_264 4 Q00266 MAT1A S-adenosylmethionine synthase isoform type-1

C_264 4 Q03426 MVK Mevalonate kinase

C_264 4 P53990 IST1 IST1 homolog

C_265 4 O75477 ERLIN1 Erlin-1

C_265 4 Q9BRQ6 CHCHD6 Coiled-coil-helix-coiled-coil-helix domain-containing protein 6

C_265 4 O43633 CHMP2A Charged multivesicular body protein 2a

C_265 4 O75746 SLC25A12 Calcium-binding mitochondrial carrier protein Aralar1

C_266 4 Q96H20 SNF8 Vacuolar-sorting protein SNF8

C_266 4 Q9NRS6 SNX15 Sorting nexin-15

C_266 4 Q86U44 METTL3 N6-adenosine-methyltransferase 70 kDa subunit

C_266 4 Q9HCE5 METTL14 Methyltransferase-like protein 14

C_267 4 P61586 RHOA Transforming protein RhoA

C_267 4 Q15165 PON2 Serum paraoxonase/arylesterase 2

C_267 4 O14974 PPP1R12A Protein phosphatase 1 regulatory subunit 12A

C_267 4 O60664 PLIN3 Perilipin-3

C_268 4 O00159 MYO1C Unconventional myosin-Ic

C_268 4 Q29RF7 PDS5A Sister chromatid cohesion protein PDS5 homolog A

C_268 4 P09874 PARP1 Poly [ADP-ribose] polymerase 1

C_268 4 Q9NR30 DDX21 Nucleolar RNA helicase 2

C_269 4 O94966 USP19 Ubiquitin carboxyl-terminal hydrolase 19

C_269 4 O95881 TXNDC12 Thioredoxin domain-containing protein 12

C_269 4 Q9Y490 TLN1 Talin-1

C_269 4 Q15742 NAB2 NGFI-A-binding protein 2

C_270 4 Q8TF74 WIPF2 WAS/WASL-interacting protein family member 2

C_270 4 P09132 SRP19 Signal recognition particle 19 kDa protein

C_270 4 Q9BWU0 SLC4A1AP Kanadaptin

C_270 4 O76080 ZFAND5 AN1-type zinc finger protein 5

C_271 4 A8MYJ9 IMA2L Importin subunit alpha-2-like protein

C_271 4 P36954 POLR2I DNA-directed RNA polymerase II subunit RPB9

C_271 4 Q16630 CPSF6 Cleavage and polyadenylation specificity factor subunit 6

C_271 4 P48729 CSNK1A1 Casein kinase I isoform alpha

C_272 4 Q01658 DR1 Protein Dr1

C_272 4 Q14919 DRAP1 Dr1-associated corepressor

C_272 4 P16220 CREB1 Cyclic AMP-responsive element-binding protein 1

C_272 4 P17544 ATF7 Cyclic AMP-dependent transcription factor ATF-7

C_273 4 P11182 DBT Lipoamide acyltransferase component of branched-chain alpha-keto acid dehydrogenase complex, mitochondrial

C_273 4 P38117 ETFB Electron transfer flavoprotein subunit beta

C_273 4 P13804 ETFA Electron transfer flavoprotein subunit alpha, mitochondrial

C_273 4 P09622 DLD Dihydrolipoyl dehydrogenase, mitochondrial

C_274 4 O43504 HBXIP Hepatitis B virus X-interacting protein

C_274 4 Q9H8Y8 GORASP2 Golgi reassembly-stacking protein 2

C_274 4 O95817 BAG3 BAG family molecular chaperone regulator 3

C_274 4 Q99933 BAG1 BAG family molecular chaperone regulator 1

C_275 4 Q9P0L0 VAPA Vesicle-associated membrane protein-associated protein A

C_275 4 Q9UKD2 MRTO4 mRNA turnover protein 4 homolog

C_275 4 Q96PK2 MACF1 Microtubule-actin cross-linking factor 1

C_275 4 P63241 EIF5A Eukaryotic translation initiation factor 5A-1

C_276 4 O94966 USP19 Ubiquitin carboxyl-terminal hydrolase 19

C_276 4 Q70J99 UNC13D Protein unc-13 homolog D

C_276 4 Q6IQ26 DENND5A DENN domain-containing protein 5A

C_276 4 Q9H0D6 XRN2 5'-3' exoribonuclease 2

C_277 4 P50552 VASP Vasodilator-stimulated phosphoprotein

C_277 4 O75131 CPNE3 Copine-3

C_277 4 Q4VCS5 AMOT Angiomotin

C_277 4 Q9BWD1 ACAT2 Acetyl-CoA acetyltransferase, cytosolic

C_278 4 Q8IY81 FTSJ3 Putative rRNA methyltransferase 3

C_278 4 P35659 DEK Protein DEK

C_278 4 Q9Y4C8 RBM19 Probable RNA-binding protein 19

C_278 4 Q9UKD2 MRTO4 mRNA turnover protein 4 homolog

C_279 4 Q9H936 SLC25A22 Mitochondrial glutamate carrier 1

C_279 4 P17693 HLA-G HLA class I histocompatibility antigen, alpha chain G

C_279 4 P43304 GPD2 Glycerol-3-phosphate dehydrogenase, mitochondrial

C_279 4 Q96CS3 FAF2 FAS-associated factor 2

C_280 4 Q03169 TNFAIP2 Tumor necrosis factor alpha-induced protein 2

C_280 4 Q9UNY4 TTF2 Transcription termination factor 2

C_280 4 Q92547 TOPBP1 DNA topoisomerase 2-binding protein 1

C_280 4 P24752 ACAT1 Acetyl-CoA acetyltransferase, mitochondrial

C_281 4 A6NEC2 NPEPPSL1 Puromycin-sensitive aminopeptidase-like protein

C_281 4 Q01105 SET Protein SET

C_281 4 P07737 PFN1 Profilin-1

C_281 4 O00221 NFKBIE NF-kappa-B inhibitor epsilon

C_282 4 Q86XZ4 SPATS2 Spermatogenesis-associated serine-rich protein 2

C_282 4 Q5TBB1 RNASEH2B Ribonuclease H2 subunit B

C_282 4 P01111 NRAS GTPase NRas

C_282 4 Q9Y5K6 CD2AP CD2-associated protein

C_283 4 P55084 HADHB Trifunctional enzyme subunit beta, mitochondrial

C_283 4 P40939 HADHA Trifunctional enzyme subunit alpha, mitochondrial

C_283 4 P26038 MSN Moesin

C_283 4 P13804 ETFA Electron transfer flavoprotein subunit alpha, mitochondrial

C_284 4 Q9P0S9 TMEM14C Transmembrane protein 14C

C_284 4 Q9NUH8 TMEM14B Transmembrane protein 14B

C_284 4 Q9H9B4 SFXN1 Sideroflexin-1

C_284 4 Q14257 RCN2 Reticulocalbin-2

C_285 4 P63146 UBE2B Ubiquitin-conjugating enzyme E2 B

C_285 4 P63208 SKP1 S-phase kinase-associated protein 1

C_285 4 Q92889 ERCC4 DNA repair endonuclease XPF

C_285 4 P07992 ERCC1 DNA excision repair protein ERCC-1

C_286 4 P46060 RANGAP1 Ran GTPase-activating protein 1

C_286 4 P35241 RDX Radixin

C_286 4 P30041 PRDX6 Peroxiredoxin-6

C_286 4 Q14686 NCOA6 Nuclear receptor coactivator 6

C_287 4 P37802 TAGLN2 Transgelin-2

C_287 4 P62937 PPIA Peptidyl-prolyl cis-trans isomerase A

C_287 4 Q9BY32 ITPA Inosine triphosphate pyrophosphatase

C_287 4 P06733 ENO1 Alpha-enolase

C_288 4 Q9UFG5 C19orf25 UPF0449 protein C19orf25

C_288 4 Q8WXC6 MYEOV2 Myeloma-overexpressed gene 2 protein

C_288 4 Q9Y2U8 LEMD3 Inner nuclear membrane protein Man1

C_288 4 P33316 DUT Deoxyuridine 5'-triphosphate nucleotidohydrolase

C_289 4 Q9UMY4 SNX12 Sorting nexin-12

C_289 4 P25815 S100P Protein S100-P

C_289 4 O15355 PPM1G Protein phosphatase 1G

C_289 4 Q96CN5 LRRC45 Leucine-rich repeat-containing protein 45

C_290 4 P63027 VAMP2 Vesicle-associated membrane protein 2

C_290 4 Q9NUW8 TDP1 Tyrosyl-DNA phosphodiesterase 1

C_290 4 Q12931 TRAP1 Heat shock protein 75 kDa, mitochondrial

C_290 4 P62158 CALM1/2/3 Calmodulin

C_291 4 Q96JC1 VPS39 Vam6/Vps39-like protein

C_291 4 Q9H270 VPS11 Vacuolar protein sorting-associated protein 11 homolog

C_291 4 Q9H9C1 SPE39 Spermatogenesis-defective protein 39 homolog

C_291 4 O00308 WWP2 NEDD4-like E3 ubiquitin-protein ligase WWP2

C_292 4 Q9HBM1 SPC25 Kinetochore protein Spc25

C_292 4 Q8NBT2 SPC24 Kinetochore protein Spc24

C_292 4 Q9BZD4 NUF2 Kinetochore protein Nuf2

C_292 4 O14777 NDC80 Kinetochore protein NDC80 homolog

C_293 4 O75717 WDHD1 WD repeat and HMG-box DNA-binding protein 1

C_293 4 Q9NTZ6 RBM12 RNA-binding protein 12

C_293 4 Q8NHP8 PLBD2 Putative phospholipase B-like 2

C_293 4 Q12931 TRAP1 Heat shock protein 75 kDa, mitochondrial

C_294 4 Q9NYT0 PLEK2 Pleckstrin-2

C_294 4 O15357 INPPL1 Phosphatidylinositol 3,4,5-trisphosphate 5-phosphatase 2

C_294 4 Q15555 MAPRE2 Microtubule-associated protein RP/EB family member 2

C_294 4 Q8WYQ5 DGCR8 Microprocessor complex subunit DGCR8

C_295 4 Q96P11 NSUN5 Putative methyltransferase NSUN5

C_295 4 Q16539 MAPK14 Mitogen-activated protein kinase 14

C_295 4 Q9BTC8 MTA3 Metastasis-associated protein MTA3

C_295 4 Q14004 CDK13 Cyclin-dependent kinase 13

C_296 4 Q9Y5L0 TNPO3 Transportin-3

C_296 4 Q92973 TNPO1 Transportin-1

C_296 4 O00410 IPO5 Importin-5

C_296 4 Q8TEX9 IPO4 Importin-4

C_297 4 Q96DM3 C18orf8 Uncharacterized protein C18orf8

C_297 4 Q9UH65 SWAP70 Switch-associated protein 70

C_297 4 P51692 STAT5B Signal transducer and activator of transcription 5B

C_297 4 Q86UX6 STK32C Serine/threonine-protein kinase 32C

C_298 4 Q92785 DPF2 Zinc finger protein ubi-d4

C_298 4 O14975 SLC27A2 Very long-chain acyl-CoA synthetase

C_298 4 Q14980 NUMA1 Nuclear mitotic apparatus protein 1

C_298 4 Q92542 NCSTN Nicastrin

C_299 4 Q15012 LAPTM4A Lysosomal-associated transmembrane protein 4A

C_299 4 O00425 IGF2BP3 Insulin-like growth factor 2 mRNA-binding protein 3

C_299 4 Q1KMD3 HNRNPUL2 Heterogeneous nuclear ribonucleoprotein U-like protein 2

C_299 4 O60812 HNRNPCL1 Heterogeneous nuclear ribonucleoprotein C-like 1

C_300 4 P13647 KRT5 Keratin, type II cytoskeletal 5

C_300 4 P12035 KRT3 Keratin, type II cytoskeletal 3

C_300 4 P02533 KRT14 Keratin, type I cytoskeletal 14

C_300 4 P06756 ITGAV Integrin alpha-V

C_301 4 Q13813 SPTAN1 Spectrin alpha chain, non-erythrocytic 1

C_301 4 Q5VYK3 ECM29 Proteasome-associated protein ECM29 homolog

C_301 4 Q15691 MAPRE1 Microtubule-associated protein RP/EB family member 1

C_301 4 P15170 GSPT1 Eukaryotic peptide chain release factor GTP-binding subunit ERF3A

C_302 4 P61244 MAX Protein max

C_302 4 Q12996 CSTF3 Cleavage stimulation factor subunit 3

C_302 4 Q9H0L4 CSTF2T Cleavage stimulation factor subunit 2 tau variant

C_302 4 P33240 CSTF2 Cleavage stimulation factor subunit 2

C_303 4 P63165 SUMO1 Small ubiquitin-related modifier 1

C_303 4 Q14914 PTGR1 Prostaglandin reductase 1

C_303 4 P30044 PRDX5 Peroxiredoxin-5, mitochondrial

C_303 4 Q9BW83 IFT27 Intraflagellar transport protein 27 homolog

C_304 4 Q9UHG3 PCYOX1 Prenylcysteine oxidase 1

C_304 4 Q9Y639 NPTN Neuroplastin

C_304 4 P19404 NDUFV2 NADH dehydrogenase [ubiquinone] flavoprotein 2, mitochondrial

C_304 4 P61916 NPC2 Epididymal secretory protein E1

C_305 4 P38646 HSPA9 Stress-70 protein, mitochondrial

C_305 4 Q8NFJ5 GPRC5A Retinoic acid-induced protein 3

C_305 4 Q14568 HSP90AA2 Putative heat shock protein HSP 90-alpha A2

C_305 4 P13984 GTF2F2 General transcription factor IIF subunit 2

C_306 4 Q9H2G2 SLK STE20-like serine/threonine-protein kinase

C_306 4 P11234 RALB Ras-related protein Ral-B

C_306 4 Q15233 NONO Non-POU domain-containing octamer-binding protein

C_306 4 Q14108 SCARB2 Lysosome membrane protein 2

C_307 4 P49591 SARS Serine--tRNA ligase, cytoplasmic

C_307 4 Q8TCS8 PNPT1 Polyribonucleotide nucleotidyltransferase 1, mitochondrial

C_307 4 O00151 PDLIM1 PDZ and LIM domain protein 1

C_307 4 O43716 GATC Glutamyl-tRNA(Gln) amidotransferase subunit C, mitochondrial

C_308 4 Q6P4I2 WDR73 WD repeat-containing protein 73

C_308 4 P22314 UBA1 Ubiquitin-like modifier-activating enzyme 1

C_308 4 Q9Y6N9 USH1C Harmonin

C_308 4 Q96D46 NMD3 60S ribosomal export protein NMD3

C_309 4 A8MXV4 NUDT19 Nucleoside diphosphate-linked moiety X motif 19, mitochondrial

C_309 4 Q9BXS6 NUSAP1 Nucleolar and spindle-associated protein 1

C_309 4 Q15599 SLC9A3R2 Na(+)/H(+) exchange regulatory cofactor NHE-RF2

C_309 4 P61916 NPC2 Epididymal secretory protein E1

C_310 4 Q9BTD8 RBM42 RNA-binding protein 42

C_310 4 Q13416 ORC2 Origin recognition complex subunit 2

C_310 4 Q9H1E3 NUCKS1 Nuclear ubiquitous casein and cyclin-dependent kinase substrate 1

C_310 4 Q99567 NUP88 Nuclear pore complex protein Nup88

C_311 4 P31350 RRM2 Ribonucleoside-diphosphate reductase subunit M2

C_311 4 P31150 GDI1 Rab GDP dissociation inhibitor alpha

C_311 4 P22061 PCMT1 Protein-L-isoaspartate(D-aspartate) O-methyltransferase

C_311 4 O43447 PPIH Peptidyl-prolyl cis-trans isomerase H

C_312 4 Q9NZ09 UBAP1 Ubiquitin-associated protein 1

C_312 4 P09211 GSTP1 Glutathione S-transferase P

C_312 4 P62736 ACTA2 Actin, aortic smooth muscle

C_312 4 P31946 YWHAB 14-3-3 protein beta/alpha

C_313 4 Q9H0S4 DDX47 Probable ATP-dependent RNA helicase DDX47

C_313 4 Q96B26 EXOSC8 Exosome complex component RRP43

C_313 4 Q15024 EXOSC7 Exosome complex component RRP42

C_313 4 Q9NPD3 EXOSC4 Exosome complex component RRP41

C_314 4 P57105 SYNJ2BP Synaptojanin-2-binding protein

C_314 4 Q14197 ICT1 Peptidyl-tRNA hydrolase ICT1, mitochondrial

C_314 4 Q8WXH0 SYNE2 Nesprin-2

C_314 4 O96005 CLPTM1 Cleft lip and palate transmembrane protein 1

C_315 4 Q9BYJ9 YTHDF1 YTH domain family protein 1

C_315 4 P50454 SERPINH1 Serpin H1

C_315 4 O43813 LANCL1 LanC-like protein 1

C_315 4 Q6UWE0 LRSAM1 E3 ubiquitin-protein ligase LRSAM1

C_316 4 Q96H79 ZC3HAV1L Zinc finger CCCH-type antiviral protein 1-like

C_316 4 P18206 VCL Vinculin

C_316 4 Q15124 PGM5 Phosphoglucomutase-like protein 5

C_316 4 Q9H3Z4 DNAJC5 DnaJ homolog subfamily C member 5

C_317 4 Q86TG7 PEG10 Retrotransposon-derived protein PEG10

C_317 4 P35080 PFN2 Profilin-2

C_317 4 Q9NZD8 SPG21 Maspardin

C_317 4 Q01581 HMGCS1 Hydroxymethylglutaryl-CoA synthase, cytoplasmic

C_318 4 O14776 TCERG1 Transcription elongation regulator 1

C_318 4 P07602 PSAP Proactivator polypeptide [Cleaved into: Saposin-A

C_318 4 Q8TB36 GDAP1 Ganglioside-induced differentiation-associated protein 1

C_318 4 P37840 SNCA Alpha-synuclein

C_319 4 Q96HC4 PDLIM5 PDZ and LIM domain protein 5

C_319 4 Q96ST3 SIN3A Paired amphipathic helix protein Sin3a

C_319 4 Q9UIJ7 AK3 GTP:AMP phosphotransferase, mitochondrial

C_319 4 Q99798 ACO2 Aconitate hydratase, mitochondrial

C_320 4 Q96NB3 ZNF830 Zinc finger protein 830

C_320 4 Q8TF74 WIPF2 WAS/WASL-interacting protein family member 2

C_320 4 O60341 KDM1A Lysine-specific histone demethylase 1A

C_320 4 Q9BWU0 SLC4A1AP Kanadaptin

C_321 4 P21127 CDK11B Cyclin-dependent kinase 11B

C_321 4 Q92572 AP3S1 AP-3 complex subunit sigma-1

C_321 4 O14617 AP3D1 AP-3 complex subunit delta-1

C_321 4 O00203 AP3B1 AP-3 complex subunit beta-1

C_322 4 Q96S82 UBL7 Ubiquitin-like protein 7

C_322 4 Q6IBS0 TWF2 Twinfilin-2

C_322 4 O75688 PPM1B Protein phosphatase 1B

C_322 4 Q8TAE6 PPP1R14C Protein phosphatase 1 regulatory subunit 14C

C_323 4 O43709 WBSCR22 Uncharacterized methyltransferase WBSCR22

C_323 4 Q9UNH7 SNX6 Sorting nexin-6

C_323 4 O60346 PHLPP1 PH domain leucine-rich repeat-containing protein phosphatase 1

C_323 4 O75628 REM1 GTP-binding protein REM 1

C_324 4 P53350 PLK1 Serine/threonine-protein kinase PLK1

C_324 4 Q96IY1 NSL1 Kinetochore-associated protein NSL1 homolog

C_324 4 O14777 NDC80 Kinetochore protein NDC80 homolog

C_324 4 P14635 CCNB1 G2/mitotic-specific cyclin-B1

C_325 4 Q8N1G0 ZNF687 Zinc finger protein 687

C_325 4 Q13247 SRSF6 Serine/arginine-rich splicing factor 6

C_325 4 O75494 SRSF10 Serine/arginine-rich splicing factor 10

C_325 4 Q8N2C7 UNC80 Protein unc-80 homolog

C_326 4 Q17R98 ZNF827 Zinc finger protein 827

C_326 4 O75348 ATP6V1G1 V-type proton ATPase subunit G 1

C_326 4 P36543 ATP6V1E1 V-type proton ATPase subunit E 1

C_326 4 O14828 SCAMP3 Secretory carrier-associated membrane protein 3

C_327 4 Q92597 NDRG1 Protein NDRG1

C_327 4 P12081 HARS Histidine--tRNA ligase, cytoplasmic

C_327 4 O60271 SPAG9 C-Jun-amino-terminal kinase-interacting protein 4

C_327 4 P08758 ANXA5 Annexin A5

C_328 4 Q7Z2W7 TRPM8 Transient receptor potential cation channel subfamily M member 8

C_328 4 Q96SB4 SRPK1 SRSF protein kinase 1

C_328 4 Q12874 SF3A3 Splicing factor 3A subunit 3

C_328 4 Q96JM3 CHAMP1 Chromosome alignment-maintaining phosphoprotein 1

C_329 4 Q96EB1 ELP4 Elongator complex protein 4

C_329 4 Q9H9T3 ELP3 Elongator complex protein 3

C_329 4 Q6IA86 ELP2 Elongator complex protein 2

C_329 4 O95163 IKBKAP Elongator complex protein 1

C_330 4 Q8TED0 UTP15 U3 small nucleolar RNA-associated protein 15 homolog

C_330 4 Q99873 PRMT1 Protein arginine N-methyltransferase 1

C_330 4 Q9NZM5 GLTSCR2 Glioma tumor suppressor candidate region gene 2 protein

C_330 4 Q969X6 CIRH1A Cirhin

C_331 4 O75347 TBCA Tubulin-specific chaperone A

C_331 4 P62306 SNRPF Small nuclear ribonucleoprotein F

C_331 4 P25815 S100P Protein S100-P

C_331 4 Q96CN5 LRRC45 Leucine-rich repeat-containing protein 45

C_332 4 Q8WVV9 HNRPLL Heterogeneous nuclear ribonucleoprotein L-like

C_332 4 Q92600 RQCD1 Cell differentiation protein RCD1 homolog

C_332 4 O75175 CNOT3 CCR4-NOT transcription complex subunit 3

C_332 4 A5YKK6 CNOT1 CCR4-NOT transcription complex subunit 1

C_333 4 Q96H20 SNF8 Vacuolar-sorting protein SNF8

C_333 4 Q8IVD9 NUDCD3 NudC domain-containing protein 3

C_333 4 Q9HCE5 METTL14 Methyltransferase-like protein 14

C_333 4 P52789 HK2 Hexokinase-2

C_334 4 Q9Y2W2 WBP11 WW domain-binding protein 11

C_334 4 P08579 SNRPB2 U2 small nuclear ribonucleoprotein B''

C_334 4 Q13309 SKP2 S-phase kinase-associated protein 2

C_334 4 A8MWD9 RUXGL Small nuclear ribonucleoprotein G-like protein

C_335 4 Q8IUX1 TMEM126B Transmembrane protein 126B

C_335 4 Q8IX01 SUGP2 SURP and G-patch domain-containing protein 2

C_335 4 Q9Y5J6 FXC1 Mitochondrial import inner membrane translocase subunit Tim9 B

C_335 4 P11717 IGF2R Cation-independent mannose-6-phosphate receptor

C_336 4 P55084 HADHB Trifunctional enzyme subunit beta, mitochondrial

C_336 4 P60660 MYL6 Myosin light polypeptide 6

C_336 4 P14649 MYL6B Myosin light chain 6B

C_336 4 P17661 DES Desmin

C_337 4 Q8TDW7 FAT3 Protocadherin Fat 3

C_337 4 Q6XQN6 NAPRT1 Nicotinate phosphoribosyltransferase

C_337 4 P00492 HPRT1 Hypoxanthine-guanine phosphoribosyltransferase

C_337 4 P07741 APRT Adenine phosphoribosyltransferase

C_338 4 Q9Y487 ATP6V0A2 V-type proton ATPase 116 kDa subunit a isoform 2

C_338 4 Q6ZRP7 QSOX2 Sulfhydryl oxidase 2

C_338 4 Q00325 SLC25A3 Phosphate carrier protein, mitochondrial

C_338 4 O75431 MTX2 Metaxin-2

C_339 4 Q5JTD0 TJAP1 Tight junction-associated protein 1

C_339 4 Q9Y570 PPME1 Protein phosphatase methylesterase 1

C_339 4 Q9Y266 NUDC Nuclear migration protein nudC

C_339 4 P04080 CSTB Cystatin-B

C_340 4 Q99805 TM9SF2 Transmembrane 9 superfamily member 2

C_340 4 P30101 PDIA3 Protein disulfide-isomerase A3

C_340 4 P04844 RPN2 Dolichyl-diphosphooligosaccharide--protein glycosyltransferase subunit 2

C_340 4 P27824 CANX Calnexin

C_341 4 P12004 PCNA Proliferating cell nuclear antigen

C_341 4 Q13112 CHAF1B Chromatin assembly factor 1 subunit B

C_341 4 Q13111 CHAF1A Chromatin assembly factor 1 subunit A

C_341 4 Q9NWV8 BABAM1 BRISC and BRCA1-A complex member 1

C_342 4 A6NFI3 ZNF316 Zinc finger protein 316

C_342 4 Q9UQR1 ZNF148 Zinc finger protein 148

C_342 4 Q96MX6 WDR92 WD repeat-containing protein 92

C_342 4 Q15126 PMVK Phosphomevalonate kinase

C_343 4 Q96PF2 TSSK2 Testis-specific serine/threonine-protein kinase 2

C_343 4 P17600 SYN1 Synapsin-1

C_343 4 O43166 SIPA1L1 Signal-induced proliferation-associated 1-like protein 1

C_343 4 Q9BXP5 SRRT Serrate RNA effector molecule homolog

C_344 4 O75410 TACC1 Transforming acidic coiled-coil-containing protein 1

C_344 4 O00584 RNASET2 Ribonuclease T2

C_344 4 Q9UHG3 PCYOX1 Prenylcysteine oxidase 1

C_344 4 O14745 SLC9A3R1 Na(+)/H(+) exchange regulatory cofactor NHE-RF1

C_345 4 O00193 SMAP Small acidic protein

C_345 4 Q01105 SET Protein SET

C_345 4 P60660 MYL6 Myosin light polypeptide 6

C_345 4 P14649 MYL6B Myosin light chain 6B

C_346 4 Q9GZS3 WDR61 WD repeat-containing protein 61

C_346 4 Q9Y5K5 UCHL5 Ubiquitin carboxyl-terminal hydrolase isozyme L5

C_346 4 Q9Y6N9 USH1C Harmonin

C_346 4 Q9HAV4 XPO5 Exportin-5

C_347 4 P49755 TMED10 Transmembrane emp24 domain-containing protein 10

C_347 4 Q6NZI2 PTRF Polymerase I and transcript release factor

C_347 4 Q9UKN8 GTF3C4 General transcription factor 3C polypeptide 4

C_347 4 Q03135 CAV1 Caveolin-1

C_348 4 Q9Y2Z0 SUGT1 Suppressor of G2 allele of SKP1 homolog

C_348 4 P25815 S100P Protein S100-P

C_348 4 Q96CN5 LRRC45 Leucine-rich repeat-containing protein 45

C_348 4 Q13619 CUL4A Cullin-4A

C_349 4 Q9Y490 TLN1 Talin-1

C_349 4 Q15742 NAB2 NGFI-A-binding protein 2

C_349 4 P05556 ITGB1 Integrin beta-1

C_349 4 P08648 ITGA5 Integrin alpha-5

C_350 4 Q6NZY4 ZCCHC8 Zinc finger CCHC domain-containing protein 8

C_350 4 Q96RU2 USP28 Ubiquitin carboxyl-terminal hydrolase 28

C_350 4 Q29980 MICB MHC class I polypeptide-related sequence B

C_350 4 O14578 CIT Citron Rho-interacting kinase

C_351 4 Q01844 EWSR1 RNA-binding protein EWS

C_351 4 Q92839 HAS1 Hyaluronan synthase 1

C_351 4 O00165 HAX1 HCLS1-associated protein X-1

C_351 4 Q8TEQ6 GEMIN5 Gem-associated protein 5

C_352 4 Q6PML9 SLC30A9 Zinc transporter 9

C_352 4 O43143 DHX15 Putative pre-mRNA-splicing factor ATP-dependent RNA helicase DHX15

C_352 4 P35080 PFN2 Profilin-2

C_352 4 Q9BQ95 ECSIT Evolutionarily conserved signaling intermediate in Toll pathway, mitochondrial

C_353 4 P62253 UBE2G1 Ubiquitin-conjugating enzyme E2 G1

C_353 4 P40818 USP8 Ubiquitin carboxyl-terminal hydrolase 8

C_353 4 Q9NSD9 FARSB Phenylalanine--tRNA ligase beta subunit

C_353 4 Q9Y285 FARSA Phenylalanine--tRNA ligase alpha subunit

C_354 4 Q9BZK7 TBL1XR1 F-box-like/WD repeat-containing protein TBL1XR1

C_354 4 Q14019 COTL1 Coactosin-like protein

C_354 4 O95400 CD2BP2 CD2 antigen cytoplasmic tail-binding protein 2

C_354 4 O00233 PSMD9 26S proteasome non-ATPase regulatory subunit 9

C_355 4 Q9H5V9 CXorf56 UPF0428 protein CXorf56

C_355 4 O94992 HEXIM1 Protein HEXIM1

C_355 4 Q9HCC0 MCCC2 Methylcrotonoyl-CoA carboxylase beta chain, mitochondrial

C_355 4 P10809 HSPD1 60 kDa heat shock protein, mitochondrial

C_356 4 A8MWD9 RUXGL Small nuclear ribonucleoprotein G-like protein

C_356 4 Q15067 ACOX1 Peroxisomal acyl-coenzyme A oxidase 1

C_356 4 P11117 ACP2 Lysosomal acid phosphatase

C_356 4 P20645 M6PR Cation-dependent mannose-6-phosphate receptor

C_357 4 Q9HAU5 UPF2 Regulator of nonsense transcripts 2

C_357 4 Q6ZRQ5 MMS22L Protein MMS22-like

C_357 4 Q6PI98 INO80C INO80 complex subunit C

C_357 4 Q9NRG0 CHRAC1 Chromatin accessibility complex protein 1

C_358 4 Q7Z2W4 ZC3HAV1 Zinc finger CCCH-type antiviral protein 1

C_358 4 Q9NWH9 SLTM SAFB-like transcription modulator

C_358 4 O00442 RTCA RNA 3'-terminal phosphate cyclase

C_358 4 Q96DA6 DNAJC19 Mitochondrial import inner membrane translocase subunit TIM14

C_359 4 Q9UQE7 SMC3 Structural maintenance of chromosomes protein 3

C_359 4 Q14683 SMC1A Structural maintenance of chromosomes protein 1A

C_359 4 Q6KC79 NIPBL Nipped-B-like protein

C_359 4 O60216 RAD21 Double-strand-break repair protein rad21 homolog

C_360 4 Q9NYU2 UGGT1 UDP-glucose:glycoprotein glucosyltransferase 1

C_360 4 P50748 KNTC1 Kinetochore-associated protein 1

C_360 4 Q7Z460 CLASP1 CLIP-associating protein 1

C_360 4 O43264 ZW10 Centromere/kinetochore protein zw10 homolog

C_361 4 Q15067 ACOX1 Peroxisomal acyl-coenzyme A oxidase 1

C_361 4 Q8WX93 PALLD Palladin

C_361 4 P42126 ECI1 Enoyl-CoA delta isomerase 1, mitochondrial

C_361 4 P14854 COX6B1 Cytochrome c oxidase subunit 6B1

C_362 4 Q9UJC5 SH3BGRL2 SH3 domain-binding glutamic acid-rich-like protein 2

C_362 4 Q14914 PTGR1 Prostaglandin reductase 1

C_362 4 P30086 PEBP1 Phosphatidylethanolamine-binding protein 1

C_362 4 Q06830 PRDX1 Peroxiredoxin-1

C_363 4 Q06323 PSME1 Proteasome activator complex subunit 1

C_363 4 Q7Z6E9 RBBP6 E3 ubiquitin-protein ligase RBBP6

C_363 4 Q9HBI1 PARVB Beta-parvin

C_363 4 Q9NRX4 PHPT1 14 kDa phosphohistidine phosphatase

C_364 4 Q92841 DDX17 Probable ATP-dependent RNA helicase DDX17

C_364 4 Q96PK2 MACF1 Microtubule-actin cross-linking factor 1

C_364 4 P63241 EIF5A Eukaryotic translation initiation factor 5A-1

C_364 4 P55265 ADAR Double-stranded RNA-specific adenosine deaminase

C_365 4 Q96TA2 YME1L1 ATP-dependent zinc metalloprotease YME1L1

C_365 4 P51398 DAP3 28S ribosomal protein S29, mitochondrial

C_365 4 Q92552 MRPS27 28S ribosomal protein S27, mitochondrial

C_365 4 Q9Y3D9 MRPS23 28S ribosomal protein S23, mitochondrial

C_366 4 P42356 PI4KA Phosphatidylinositol 4-kinase alpha

C_366 4 P12081 HARS Histidine--tRNA ligase, cytoplasmic

C_366 4 P11413 G6PD Glucose-6-phosphate 1-dehydrogenase

C_366 4 P41214 EIF2D Eukaryotic translation initiation factor 2D

C_367 4 Q6P4R8 NFRKB Nuclear factor related to kappa-B-binding protein

C_367 4 Q6PI98 INO80C INO80 complex subunit C

C_367 4 Q9ULG1 INO80 DNA helicase INO80

C_367 4 Q9H981 ACTR8 Actin-related protein 8

C_368 4 P57721 PCBP3 Poly(rC)-binding protein 3

C_368 4 P08237 PFKM 6-phosphofructokinase, muscle type

C_368 4 P17858 PFKL 6-phosphofructokinase, liver type

C_368 4 Q01813 PFKP 6-phosphofructokinase type C

C_369 4 P49591 SARS Serine--tRNA ligase, cytoplasmic

C_369 4 Q58FG1 HSP90AA4P Putative heat shock protein HSP 90-alpha A4

C_369 4 Q8WX93 PALLD Palladin

C_369 4 O43716 GATC Glutamyl-tRNA(Gln) amidotransferase subunit C, mitochondrial

C_370 4 Q9H7Z7 PTGES2 Prostaglandin E synthase 2

C_370 4 O14949 UQCRQ Cytochrome b-c1 complex subunit 8

C_370 4 O14957 UQCR11 Cytochrome b-c1 complex subunit 10

C_370 4 Q96KA5 CLPTM1L Cleft lip and palate transmembrane protein 1-like protein

C_371 4 Q9C0D4 ZNF518B Zinc finger protein 518B

C_371 4 Q9NVG8 TBC1D13 TBC1 domain family member 13

C_371 4 Q9Y6Y8 SEC23IP SEC23-interacting protein

C_371 4 Q2M296 MTHFSD Methenyltetrahydrofolate synthase domain-containing protein

C_372 4 O95219 SNX4 Sorting nexin-4

C_372 4 Q75QN2 INTS8 Integrator complex subunit 8

C_372 4 Q9NVH2 INTS7 Integrator complex subunit 7

C_372 4 Q9H0H0 INTS2 Integrator complex subunit 2

C_373 4 Q15061 WDR43 WD repeat-containing protein 43

C_373 4 Q8TED0 UTP15 U3 small nucleolar RNA-associated protein 15 homolog

C_373 4 O95071 UBR5 E3 ubiquitin-protein ligase UBR5

C_373 4 Q969X6 CIRH1A Cirhin

C_374 4 Q7L1V2 MON1B Vacuolar fusion protein MON1 homolog B

C_374 4 Q9NS86 LANCL2 LanC-like protein 2

C_374 4 P48637 GSS Glutathione synthetase

C_374 4 P50750 CDK9 Cyclin-dependent kinase 9

C_375 4 P55072 VCP Transitional endoplasmic reticulum ATPase

C_375 4 O60678 PRMT3 Protein arginine N-methyltransferase 3

C_375 4 P49321 NASP Nuclear autoantigenic sperm protein

C_375 4 O75369 FLNB Filamin-B

C_376 4 Q8TEA7 TBCK TBC domain-containing protein kinase-like protein

C_376 4 Q92783 STAM Signal transducing adapter molecule 1

C_376 4 Q9HD47 RANGRF Ran guanine nucleotide release factor

C_376 4 O14964 HGS Hepatocyte growth factor-regulated tyrosine kinase substrate

C_377 4 P33527 ABCC1 Multidrug resistance-associated protein 1

C_377 4 O15173 PGRMC2 Membrane-associated progesterone receptor component 2

C_377 4 P11182 DBT Lipoamide acyltransferase component of branched-chain alpha-keto acid dehydrogenase complex, mitochondrial

C_377 4 Q8NE86 MCU Calcium uniporter protein, mitochondrial

C_378 4 Q9NUN5 LMBRD1 Probable lysosomal cobalamin transporter

C_378 4 Q07666 KHDRBS1 KH domain-containing, RNA-binding, signal transduction-associated protein 1

C_378 4 P07910 HNRNPC Heterogeneous nuclear ribonucleoproteins C1/C2

C_378 4 A1L0T0 ILVBL Acetolactate synthase-like protein

C_379 4 Q8TDN6 BRIX1 Ribosome biogenesis protein BRX1 homolog

C_379 4 Q99848 EBNA1BP2 Probable rRNA-processing protein EBP2

C_379 4 Q96GQ7 DDX27 Probable ATP-dependent RNA helicase DDX27

C_379 4 Q9BZZ5 API5 Apoptosis inhibitor 5

C_380 4 P53007 SLC25A1 Tricarboxylate transport protein, mitochondrial

C_380 4 Q5SRD1 TIMM23B Putative mitochondrial import inner membrane translocase subunit Tim23B

C_380 4 Q96A26 FAM162A Protein FAM162A

C_380 4 Q9UJ83 HACL1 2-hydroxyacyl-CoA lyase 1

C_381 4 Q07157 TJP1 Tight junction protein ZO-1

C_381 4 Q9UMY4 SNX12 Sorting nexin-12

C_381 4 Q9NQC3 RTN4 Reticulon-4

C_381 4 Q16270 IGFBP7 Insulin-like growth factor-binding protein 7

C_382 4 Q15070 OXA1L Mitochondrial inner membrane protein OXA1L

C_382 4 Q14126 DSG2 Desmoglein-2

C_382 4 Q9Y6H1 CHCHD2 Coiled-coil-helix-coiled-coil-helix domain-containing protein 2, mitochondrial

C_382 4 Q92887 ABCC2 Canalicular multispecific organic anion transporter 1

C_383 4 Q96DM3 C18orf8 Uncharacterized protein C18orf8

C_383 4 Q9UH65 SWAP70 Switch-associated protein 70

C_383 4 P11908 PRPS2 Ribose-phosphate pyrophosphokinase 2

C_383 4 P60891 PRPS1 Ribose-phosphate pyrophosphokinase 1

C_384 4 Q12888 TP53BP1 Tumor suppressor p53-binding protein 1

C_384 4 P57740 NUP107 Nuclear pore complex protein Nup107

C_384 4 P05976 MYL1 Myosin light chain 1/3, skeletal muscle isoform

C_384 4 Q9BR76 CORO1B Coronin-1B

C_385 4 P04179 SOD2 Superoxide dismutase [Mn], mitochondrial

C_385 4 P51649 ALDH5A1 Succinate-semialdehyde dehydrogenase, mitochondrial

C_385 4 Q9BQ15 NABP2 SOSS complex subunit B1

C_385 4 Q92733 PRCC Proline-rich protein PRCC

C_386 4 Q9Y6W5 WASF2 Wiskott-Aldrich syndrome protein family member 2

C_386 4 O00401 WASL Neural Wiskott-Aldrich syndrome protein

C_386 4 Q9BXK1 KLF16 Krueppel-like factor 16

C_386 4 Q96RU3 FNBP1 Formin-binding protein 1

C_387 4 Q8NH73 OR4S2 Olfactory receptor 4S2

C_387 4 Q15843 NEDD8 NEDD8

C_387 4 Q9Y3D2 MSRB2 Methionine-R-sulfoxide reductase B2, mitochondrial

C_387 4 O14672 ADAM10 Disintegrin and metalloproteinase domain-containing protein 10

C_388 4 P12081 HARS Histidine--tRNA ligase, cytoplasmic

C_388 4 P41214 EIF2D Eukaryotic translation initiation factor 2D

C_388 4 O60739 EIF1B Eukaryotic translation initiation factor 1b

C_388 4 P41567 EIF1 Eukaryotic translation initiation factor 1

C_389 4 Q969T9 WBP2 WW domain-binding protein 2

C_389 4 Q5MNZ6 WDR45L WD repeat domain phosphoinositide-interacting protein 3

C_389 4 Q9BZX2 UCK2 Uridine-cytidine kinase 2

C_389 4 Q16831 UPP1 Uridine phosphorylase 1

C_390 4 P48553 TRAPPC10 Trafficking protein particle complex subunit 10

C_390 4 P35241 RDX Radixin

C_390 4 P15311 EZR Ezrin

C_390 4 P52824 DGKQ Diacylglycerol kinase theta

C_391 4 Q9Y5T5 USP16 Ubiquitin carboxyl-terminal hydrolase 16

C_391 4 Q15012 LAPTM4A Lysosomal-associated transmembrane protein 4A

C_391 4 O00425 IGF2BP3 Insulin-like growth factor 2 mRNA-binding protein 3

C_391 4 P48730 CSNK1D Casein kinase I isoform delta

C_392 4 P20618 PSMB1 Proteasome subunit beta type-1

C_392 4 Q04721 NOTCH2 Neurogenic locus notch homolog protein 2

C_392 4 Q96T58 SPEN Msx2-interacting protein

C_392 4 Q9NVF7 FBXO28 F-box only protein 28

C_393 4 Q16773 CCBL1 Kynurenine--oxoglutarate transaminase 1

C_393 4 Q9H223 EHD4 EH domain-containing protein 4

C_393 4 Q9H4M9 EHD1 EH domain-containing protein 1

C_393 4 P52824 DGKQ Diacylglycerol kinase theta

C_394 4 Q05682 CALD1 Caldesmon

C_394 4 O43707 ACTN4 Alpha-actinin-4

C_394 4 P35609 ACTN2 Alpha-actinin-2

C_394 4 P12814 ACTN1 Alpha-actinin-1

C_395 4 P37275 ZEB1 Zinc finger E-box-binding homeobox 1

C_395 4 P52655 GTF2A1 Transcription initiation factor IIA subunit 1

C_395 4 Q96EB6 SIRT1 NAD-dependent protein deacetylase sirtuin-1

C_395 4 P09960 LTA4H Leukotriene A-4 hydrolase

C_396 4 Q15836 VAMP3 Vesicle-associated membrane protein 3

C_396 4 P16615 ATP2A2 Sarcoplasmic/endoplasmic reticulum calcium ATPase 2

C_396 4 P16070 CD44 CD44 antigen

C_396 4 P08195 SLC3A2 4F2 cell-surface antigen heavy chain

C_397 4 Q9UPT8 ZC3H4 Zinc finger CCCH domain-containing protein 4

C_397 4 Q96RU2 USP28 Ubiquitin carboxyl-terminal hydrolase 28

C_397 4 P34896 SHMT1 Serine hydroxymethyltransferase, cytosolic

C_397 4 O95785 WIZ Protein Wiz

C_398 4 Q9NSD9 FARSB Phenylalanine--tRNA ligase beta subunit

C_398 4 Q9Y285 FARSA Phenylalanine--tRNA ligase alpha subunit

C_398 4 P29317 EPHA2 Ephrin type-A receptor 2

C_398 4 P13639 EEF2 Elongation factor 2

C_399 4 Q9NZT2 OGFR Opioid growth factor receptor

C_399 4 P04731 MT1A Metallothionein-1A

C_399 4 Q9UNF1 MAGED2 Melanoma-associated antigen D2

C_399 4 Q14657 LAGE3 L antigen family member 3

C_400 4 P68036 UBE2L3 Ubiquitin-conjugating enzyme E2 L3

C_400 4 Q9BXU7 USP26 Ubiquitin carboxyl-terminal hydrolase 26

C_400 4 Q2TAA8 TSNAXIP1 Translin-associated factor X-interacting protein 1

C_400 4 Q9NQ88 TIGAR Fructose-2,6-bisphosphatase TIGAR

C_401 5 Q9Y6K9 IKBKG NF-kappa-B essential modulator

C_401 5 Q9BRT9 GINS4 DNA replication complex GINS protein SLD5

C_401 5 Q9BRX5 GINS3 DNA replication complex GINS protein PSF3

C_401 5 Q9Y248 GINS2 DNA replication complex GINS protein PSF2

C_401 5 Q14691 GINS1 DNA replication complex GINS protein PSF1

C_402 5 Q9H0E2 TOLLIP Toll-interacting protein

C_402 5 Q9BTF0 THUMPD2 THUMP domain-containing protein 2

C_402 5 Q9Y5X2 SNX8 Sorting nexin-8

C_402 5 Q9H900 ZWILCH Protein zwilch homolog

C_402 5 Q2T9F4 INTS4L2 Integrator complex subunit 4-like protein 2

C_403 5 P11166 SLC2A1 Solute carrier family 2, facilitated glucose transporter member 1

C_403 5 Q27J81 INF2 Inverted formin-2

C_403 5 P08174 CD55 Complement decay-accelerating factor

C_403 5 Q6NUK1 SLC25A24 Calcium-binding mitochondrial carrier protein SCaMC-1

C_403 5 Q16706 MAN2A1 Alpha-mannosidase 2

C_404 5 O60504 SORBS3 Vinexin

C_404 5 Q9NPD8 UBE2T Ubiquitin-conjugating enzyme E2 T

C_404 5 Q9BRA2 TXNDC17 Thioredoxin domain-containing protein 17

C_404 5 Q9HD15 SRA1 Steroid receptor RNA activator 1

C_404 5 Q96Q11 TRNT1 CCA tRNA nucleotidyltransferase 1, mitochondrial

C_405 5 Q2TAY7 SMU1 WD40 repeat-containing protein SMU1

C_405 5 Q9BZF9 UACA Uveal autoantigen with coiled-coil domains and ankyrin repeats

C_405 5 Q86UV5 USP48 Ubiquitin carboxyl-terminal hydrolase 48

C_405 5 Q15276 RABEP1 Rab GTPase-binding effector protein 1

C_405 5 O94953 KDM4B Lysine-specific demethylase 4B

C_406 5 Q9BVA1 TUBB2B Tubulin beta-2B chain

C_406 5 Q13885 TUBB2A Tubulin beta-2A chain

C_406 5 P68363 TUBA1B Tubulin alpha-1B chain

C_406 5 Q15293 RCN1 Reticulocalbin-1

C_406 5 Q96GA3 LTV1 Protein LTV1 homolog

C_407 5 Q5VT52 RPRD2 Regulation of nuclear pre-mRNA domain-containing protein 2

C_407 5 Q15555 MAPRE2 Microtubule-associated protein RP/EB family member 2

C_407 5 Q8WYQ5 DGCR8 Microprocessor complex subunit DGCR8

C_407 5 Q9BTT6 LRRC1 Leucine-rich repeat-containing protein 1

C_407 5 Q9H211 CDT1 DNA replication factor Cdt1

C_408 5 P08729 KRT7 Keratin, type II cytoskeletal 7

C_408 5 Q9BTC0 DIDO1 Death-inducer obliterator 1

C_408 5 Q9P0M6 H2AFY2 Core histone macro-H2A.2

C_408 5 O75367 H2AFY Core histone macro-H2A.1

C_408 5 P45973 CBX5 Chromobox protein homolog 5

C_409 5 Q9NQR4 NIT2 Omega-amidase NIT2

C_409 5 Q15818 NPTX1 Neuronal pentraxin-1

C_409 5 O60313 OPA1 Dynamin-like 120 kDa protein, mitochondrial

C_409 5 O00115 DNASE2 Deoxyribonuclease-2-alpha

C_409 5 Q13011 ECH1 Delta(3,5)-Delta(2,4)-dienoyl-CoA isomerase, mitochondrial

C_410 5 Q86UE4 MTDH Protein LYRIC

C_410 5 Q15149 PLEC Plectin

C_410 5 Q15155 NOMO1 Nodal modulator 1

C_410 5 P14923 JUP Junction plakoglobin

C_410 5 P16144 ITGB4 Integrin beta-4

C_411 5 Q5VUA4 ZNF318 Zinc finger protein 318

C_411 5 Q16851 UGP2 UTP--glucose-1-phosphate uridylyltransferase

C_411 5 P05161 ISG15 Ubiquitin-like protein ISG15

C_411 5 Q7Z745 HEATR7B2 HEAT repeat-containing protein 7B2

C_411 5 Q9ULA0 DNPEP Aspartyl aminopeptidase

C_412 5 A8MQ03 C9orf169 UPF0574 protein C9orf169

C_412 5 Q9Y2Q5 LAMTOR2 Ragulator complex protein LAMTOR2

C_412 5 Q6IAA8 LAMTOR1 Ragulator complex protein LAMTOR1

C_412 5 P79522 PRR3 Proline-rich protein 3

C_412 5 Q86YQ8 CPNE8 Copine-8

C_413 5 Q96I51 WBSCR16 Williams-Beuren syndrome chromosomal region 16 protein

C_413 5 P35232 PHB Prohibitin

C_413 5 Q9NZI8 IGF2BP1 Insulin-like growth factor 2 mRNA-binding protein 1

C_413 5 O60506 SYNCRIP Heterogeneous nuclear ribonucleoprotein Q

C_413 5 Q08380 LGALS3BP Galectin-3-binding protein

C_414 5 Q9P2R7 SUCLA2 Succinyl-CoA ligase [ADP-forming] subunit beta, mitochondrial

C_414 5 P50213 IDH3A Isocitrate dehydrogenase [NAD] subunit alpha, mitochondrial

C_414 5 Q9UBQ7 GRHPR Glyoxylate reductase/hydroxypyruvate reductase

C_414 5 P78310 CXADR Coxsackievirus and adenovirus receptor

C_414 5 P52594 AGFG1 Arf-GAP domain and FG repeat-containing protein 1

C_415 5 P49768 PSEN1 Presenilin-1

C_415 5 P37198 NUP62 Nuclear pore glycoprotein p62

C_415 5 Q86XA9 HEATR5A HEAT repeat-containing protein 5A

C_415 5 Q96I24 FUBP3 Far upstream element-binding protein 3

C_415 5 Q9Y4J8 DTNA Dystrobrevin alpha

C_416 5 Q9UHX1 PUF60 Poly(U)-binding-splicing factor PUF60

C_416 5 P31942 HNRNPH3 Heterogeneous nuclear ribonucleoprotein H3

C_416 5 O14979 HNRPDL Heterogeneous nuclear ribonucleoprotein D-like

C_416 5 Q15056 EIF4H Eukaryotic translation initiation factor 4H

C_416 5 P06730 EIF4E Eukaryotic translation initiation factor 4E

C_417 5 P35658 NUP214 Nuclear pore complex protein Nup214

C_417 5 Q8N183 NDUFAF2 Mimitin, mitochondrial

C_417 5 O75521 ECI2 Enoyl-CoA delta isomerase 2, mitochondrial

C_417 5 P43897 TSFM Elongation factor Ts, mitochondrial

C_417 5 Q9NPJ3 ACOT13 Acyl-coenzyme A thioesterase 13

C_418 5 Q6PGP7 TTC37 Tetratricopeptide repeat protein 37

C_418 5 Q9UHB4 NDOR1 NADPH-dependent diflavin oxidoreductase 1

C_418 5 Q15811 ITSN1 Intersectin-1

C_418 5 Q86TU7 SETD3 Histone-lysine N-methyltransferase setd3

C_418 5 Q15477 SKIV2L Helicase SKI2W

C_419 5 Q49A26 GLYR1 Putative oxidoreductase GLYR1

C_419 5 Q14764 MVP Major vault protein

C_419 5 Q14739 LBR Lamin-B receptor

C_419 5 P50402 EMD Emerin

C_419 5 Q08211 DHX9 ATP-dependent RNA helicase A

C_420 5 Q8N9F8 ZNF454 Zinc finger protein 454

C_420 5 Q9NPL8 TIMMDC1 Translocase of inner mitochondrial membrane domain-containing protein 1

C_420 5 P56589 PEX3 Peroxisomal biogenesis factor 3

C_420 5 P28288 ABCD3 ATP-binding cassette sub-family D member 3

C_420 5 P33897 ABCD1 ATP-binding cassette sub-family D member 1

C_421 5 Q9P2B2 PTGFRN Prostaglandin F2 receptor negative regulator

C_421 5 Q9H0U3 MAGT1 Magnesium transporter protein 1

C_421 5 O60885 BRD4 Bromodomain-containing protein 4

C_421 5 Q02878 RPL6 60S ribosomal protein L6

C_421 5 Q92901 RPL3L 60S ribosomal protein L3-like

C_422 5 O43318 MAP3K7 Mitogen-activated protein kinase kinase kinase 7

C_422 5 Q9NR33 POLE4 DNA polymerase epsilon subunit 4

C_422 5 Q9NRF9 POLE3 DNA polymerase epsilon subunit 3

C_422 5 P56282 POLE2 DNA polymerase epsilon subunit 2

C_422 5 Q07864 POLE DNA polymerase epsilon catalytic subunit A

C_423 5 Q9Y5L4 TIMM13 Mitochondrial import inner membrane translocase subunit Tim13

C_423 5 P50502 ST13 Hsc70-interacting protein

C_423 5 P11216 PYGB Glycogen phosphorylase, brain form

C_423 5 P54886 ALDH18A1 Delta-1-pyrroline-5-carboxylate synthase

C_423 5 Q99829 CPNE1 Copine-1

C_424 5 Q13428 TCOF1 Treacle protein

C_424 5 P50897 PPT1 Palmitoyl-protein thioesterase 1

C_424 5 Q9NQR4 NIT2 Omega-amidase NIT2

C_424 5 Q15818 NPTX1 Neuronal pentraxin-1

C_424 5 Q9UIU6 SIX4 Homeobox protein SIX4

C_425 5 Q9H4B7 TUBB1 Tubulin beta-1 chain

C_425 5 Q04837 SSBP1 Single-stranded DNA-binding protein, mitochondrial

C_425 5 Q96EH3 MALSU1 Mitochondrial assembly of ribosomal large subunit protein 1

C_425 5 Q8N108 MIER1 Mesoderm induction early response protein 1

C_425 5 P24752 ACAT1 Acetyl-CoA acetyltransferase, mitochondrial

C_426 5 Q8N2U0 C17orf61 UPF0451 protein C17orf61

C_426 5 Q8WUM4 PDCD6IP Programmed cell death 6-interacting protein

C_426 5 Q9UBV8 PEF1 Peflin

C_426 5 P06744 GPI Glucose-6-phosphate isomerase

C_426 5 P13929 ENO3 Beta-enolase

C_427 5 P61088 UBE2N Ubiquitin-conjugating enzyme E2 N

C_427 5 Q9UL12 SARDH Sarcosine dehydrogenase, mitochondrial

C_427 5 Q02790 FKBP4 Peptidyl-prolyl cis-trans isomerase FKBP4

C_427 5 Q9C0B5 ZDHHC5 Palmitoyltransferase ZDHHC5

C_427 5 Q9Y3D8 TAF9 Adenylate kinase isoenzyme 6

C_428 5 Q9NTJ3 SMC4 Structural maintenance of chromosomes protein 4

C_428 5 O95347 SMC2 Structural maintenance of chromosomes protein 2

C_428 5 Q9BPX3 NCAPG Condensin complex subunit 3

C_428 5 Q15003 NCAPH Condensin complex subunit 2

C_428 5 Q15021 NCAPD2 Condensin complex subunit 1

C_429 5 O95267 RASGRP1 RAS guanyl-releasing protein 1

C_429 5 Q9BXF6 RAB11FIP5 Rab11 family-interacting protein 5

C_429 5 Q9BYZ2 LDHAL6B L-lactate dehydrogenase A-like 6B

C_429 5 Q9NUT2 ABCB8 ATP-binding cassette sub-family B member 8, mitochondrial

C_429 5 Q9BRR6 ADPGK ADP-dependent glucokinase

C_430 5 Q96NB3 ZNF830 Zinc finger protein 830

C_430 5 Q14966 ZNF638 Zinc finger protein 638

C_430 5 P17028 ZNF24 Zinc finger protein 24

C_430 5 P25490 YY1 Transcriptional repressor protein YY1

C_430 5 Q96RD0 OR8B2 Olfactory receptor 8B2

C_431 5 O00186 STXBP3 Syntaxin-binding protein 3

C_431 5 A1X283 SH3PXD2B SH3 and PX domain-containing protein 2B

C_431 5 P53041 PPP5C Serine/threonine-protein phosphatase 5

C_431 5 Q9Y2L1 DIS3 Exosome complex exonuclease RRP44

C_431 5 P02765 AHSG Alpha-2-HS-glycoprotein

C_432 5 Q9H490 PIGU Phosphatidylinositol glycan anchor biosynthesis class U protein

C_432 5 Q92643 PIGK GPI-anchor transamidase

C_432 5 Q969N2 PIGT GPI transamidase component PIG-T

C_432 5 Q96S52 PIGS GPI transamidase component PIG-S

C_432 5 O43292 GPAA1 Glycosylphosphatidylinositol anchor attachment 1 protein

C_433 5 Q9Y3A4 RRP7A Ribosomal RNA-processing protein 7 homolog A

C_433 5 Q9NZM1 MYOF Myoferlin

C_433 5 P43243 MATR3 Matrin-3

C_433 5 P49792 RANBP2 E3 SUMO-protein ligase RanBP2

C_433 5 O43633 CHMP2A Charged multivesicular body protein 2a

C_434 5 Q9BXJ9 NAA15 N-alpha-acetyltransferase 15, NatA auxiliary subunit

C_434 5 P19105 MYL12A Myosin regulatory light chain 12A

C_434 5 Q9NX55 HYPK Huntingtin-interacting protein K

C_434 5 Q01518 CAP1 Adenylyl cyclase-associated protein 1

C_434 5 P82921 MRPS21 28S ribosomal protein S21, mitochondrial

C_435 5 Q96TA1 FAM129B Niban-like protein 1

C_435 5 Q8IYT4 KATNAL2 Katanin p60 ATPase-containing subunit A-like 2

C_435 5 O43719 HTATSF1 HIV Tat-specific factor 1

C_435 5 Q96HR8 NAF1 H/ACA ribonucleoprotein complex non-core subunit NAF1

C_435 5 O14617 AP3D1 AP-3 complex subunit delta-1

C_436 5 Q9UNL2 SSR3 Translocon-associated protein subunit gamma

C_436 5 Q9NZ01 TECR Trans-2,3-enoyl-CoA reductase

C_436 5 O75940 SMNDC1 Survival of motor neuron-related-splicing factor 30

C_436 5 Q7KZF4 SND1 Staphylococcal nuclease domain-containing protein 1

C_436 5 P04843 RPN1 Dolichyl-diphosphooligosaccharide--protein glycosyltransferase subunit 1

C_437 5 Q96P70 IPO9 Importin-9

C_437 5 Q14974 KPNB1 Importin subunit beta-1

C_437 5 O60684 KPNA6 Importin subunit alpha-7

C_437 5 O15131 KPNA5 Importin subunit alpha-6

C_437 5 P52294 KPNA1 Importin subunit alpha-1

C_438 5 O43913 ORC5 Origin recognition complex subunit 5

C_438 5 Q9UBD5 ORC3 Origin recognition complex subunit 3

C_438 5 Q13416 ORC2 Origin recognition complex subunit 2

C_438 5 Q9UFC0 LRWD1 Leucine-rich repeat and WD repeat-containing protein 1

C_438 5 P23743 DGKA Diacylglycerol kinase alpha

C_439 5 O75676 RPS6KA4 Ribosomal protein S6 kinase alpha-4

C_439 5 Q9Y314 NOSIP Nitric oxide synthase-interacting protein

C_439 5 Q9UPW0 FOXJ3 Forkhead box protein J3

C_439 5 Q86XP3 DDX42 ATP-dependent RNA helicase DDX42

C_439 5 P23352 KAL1 Anosmin-1

C_440 5 Q9H3N1 TMX1 Thioredoxin-related transmembrane protein 1

C_440 5 Q9NX14 NDUFB11 NADH dehydrogenase [ubiquinone] 1 beta subcomplex subunit 11, mitochondrial

C_440 5 P53985 SLC16A1 Monocarboxylate transporter 1

C_440 5 Q9Y5U9 IER3IP1 Immediate early response 3-interacting protein 1

C_440 5 Q9H1C7 CYSTM1 Cysteine-rich and transmembrane domain-containing protein 1

C_441 5 Q9H7Z7 PTGES2 Prostaglandin E synthase 2

C_441 5 O75127 PTCD1 Pentatricopeptide repeat-containing protein 1

C_441 5 Q13423 NNT NAD(P) transhydrogenase, mitochondrial

C_441 5 Q9H488 POFUT1 GDP-fucose protein O-fucosyltransferase 1

C_441 5 O14957 UQCR11 Cytochrome b-c1 complex subunit 10

C_442 5 P31040 SDHA Succinate dehydrogenase [ubiquinone] flavoprotein subunit, mitochondrial

C_442 5 Q8TDB8 SLC2A14 Solute carrier family 2, facilitated glucose transporter member 14

C_442 5 Q8WXH0 SYNE2 Nesprin-2

C_442 5 Q96E29 MTERFD1 mTERF domain-containing protein 1, mitochondrial

C_442 5 Q8IWT6 LRRC8A Leucine-rich repeat-containing protein 8A

C_443 5 Q8IYS2 KIAA2013 Uncharacterized protein KIAA2013

C_443 5 P51148 RAB5C Ras-related protein Rab-5C

C_443 5 P61026 RAB10 Ras-related protein Rab-10

C_443 5 O14880 MGST3 Microsomal glutathione S-transferase 3

C_443 5 O75844 ZMPSTE24 CAAX prenyl protease 1 homolog

C_444 5 O00330 PDHX Pyruvate dehydrogenase protein X component, mitochondrial

C_444 5 P33527 ABCC1 Multidrug resistance-associated protein 1

C_444 5 Q9Y3D7 PAM16 Mitochondrial import inner membrane translocase subunit TIM16

C_444 5 O60841 EIF5B Eukaryotic translation initiation factor 5B

C_444 5 Q99613 EIF3C; EIF3CL Eukaryotic translation initiation factor 3 subunit C

C_445 5 Q15061 WDR43 WD repeat-containing protein 43

C_445 5 Q16864 ATP6V1F V-type proton ATPase subunit F

C_445 5 P51668 UBE2D1 Ubiquitin-conjugating enzyme E2 D1

C_445 5 P19525 EIF2AK2 Interferon-induced, double-stranded RNA-activated protein kinase

C_445 5 O95071 UBR5 E3 ubiquitin-protein ligase UBR5

C_446 5 Q9H2F9 CCDC68 Coiled-coil domain-containing protein 68

C_446 5 P56385 ATP5I ATP synthase subunit e, mitochondrial

C_446 5 P50995 ANXA11 Annexin A11

C_446 5 P05141 SLC25A5 ADP/ATP translocase 2

C_446 5 P63220 RPS21 40S ribosomal protein S21

C_447 5 P07951 TPM2 Tropomyosin beta chain

C_447 5 Q92804 TAF15 TATA-binding protein-associated factor 2N

C_447 5 P16949 STMN1 Stathmin

C_447 5 Q14108 SCARB2 Lysosome membrane protein 2

C_447 5 Q16531 DDB1 DNA damage-binding protein 1

C_448 5 P23921 RRM1 Ribonucleoside-diphosphate reductase large subunit

C_448 5 P00491 PNP Purine nucleoside phosphorylase

C_448 5 O00754 MAN2B1 Lysosomal alpha-mannosidase

C_448 5 Q9UBS4 DNAJB11 DnaJ homolog subfamily B member 11

C_448 5 O00115 DNASE2 Deoxyribonuclease-2-alpha

C_449 5 O95747 OXSR1 Serine/threonine-protein kinase OSR1

C_449 5 Q08AF3 SLFN5 Schlafen family member 5

C_449 5 Q8TDP1 RNASEH2C Ribonuclease H2 subunit C

C_449 5 Q5TBB1 RNASEH2B Ribonuclease H2 subunit B

C_449 5 O75792 RNASEH2A Ribonuclease H2 subunit A

C_450 5 Q96DI7 SNRNP40 U5 small nuclear ribonucleoprotein 40 kDa protein

C_450 5 Q9UKM9 RALY RNA-binding protein Raly

C_450 5 Q58FF8 HSP90AB2P Putative heat shock protein HSP 90-beta 2

C_450 5 Q9NWU2 C20orf11 Protein C20orf11

C_450 5 O60907 TBL1X F-box-like/WD repeat-containing protein TBL1X

C_451 5 O76071 CIAO1 Probable cytosolic iron-sulfur protein assembly protein CIAO1

C_451 5 Q6P1J9 CDC73 Parafibromin

C_451 5 Q96T76 MMS19 MMS19 nucleotide excision repair protein homolog

C_451 5 Q9Y3D0 FAM96B Mitotic spindle-associated MMXD complex subunit MIP18

C_451 5 P35269 GTF2F1 General transcription factor IIF subunit 1

C_452 5 Q9UGR2 ZC3H7B Zinc finger CCCH domain-containing protein 7B

C_452 5 Q96JC1 VPS39 Vam6/Vps39-like protein

C_452 5 Q9H270 VPS11 Vacuolar protein sorting-associated protein 11 homolog

C_452 5 Q9BZV1 UBXN6 UBX domain-containing protein 6

C_452 5 Q96QU8 XPO6 Exportin-6

C_453 5 P18031 PTPN1 Tyrosine-protein phosphatase non-receptor type 1

C_453 5 P34896 SHMT1 Serine hydroxymethyltransferase, cytosolic

C_453 5 Q8IVS2 MCAT Malonyl-CoA-acyl carrier protein transacylase, mitochondrial

C_453 5 P12035 KRT3 Keratin, type II cytoskeletal 3

C_453 5 P06756 ITGAV Integrin alpha-V

C_454 5 Q9Y2Z4 YARS2 Tyrosine--tRNA ligase, mitochondrial

C_454 5 P29803 PDHA2 Pyruvate dehydrogenase E1 component subunit alpha, testis-specific form, mitochondrial

C_454 5 Q16891 IMMT Mitochondrial inner membrane protein

C_454 5 Q15800 MSMO1 Methylsterol monooxygenase 1

C_454 5 Q96HY6 DDRGK1 DDRGK domain-containing protein 1

C_455 5 Q9Y4P8 WIPI2 WD repeat domain phosphoinositide-interacting protein 2

C_455 5 Q9Y6I4 USP3 Ubiquitin carboxyl-terminal hydrolase 3

C_455 5 Q9NUQ3 TXLNG Gamma-taxilin

C_455 5 Q6FIF0 ZFAND6 AN1-type zinc finger protein 6

C_455 5 P40222 TXLNA Alpha-taxilin

C_456 5 P55011 SLC12A2 Solute carrier family 12 member 2

C_456 5 O94979 SEC31A Protein transport protein Sec31A

C_456 5 Q15436 SEC23A Protein transport protein Sec23A

C_456 5 P55735 SEC13 Protein SEC13 homolog

C_456 5 P23497 SP100 Nuclear autoantigen Sp-100

C_457 5 P61421 ATP6V0D1 V-type proton ATPase subunit d 1

C_457 5 P52298 NCBP2 Nuclear cap-binding protein subunit 2

C_457 5 Q09161 NCBP1 Nuclear cap-binding protein subunit 1

C_457 5 O15083 ERC2 ERC protein 2

C_457 5 Q8IUD2 ERC1 ELKS/Rab6-interacting/CAST family member 1

C_458 5 Q9BTV4 TMEM43 Transmembrane protein 43

C_458 5 P61026 RAB10 Ras-related protein Rab-10

C_458 5 Q10471 GALNT2 Polypeptide N-acetylgalactosaminyltransferase 2

C_458 5 O75844 ZMPSTE24 CAAX prenyl protease 1 homolog

C_458 5 P82675 MRPS5 28S ribosomal protein S5, mitochondrial

C_459 5 P11441 UBL4A Ubiquitin-like protein 4A

C_459 5 P04216 THY1 Thy-1 membrane glycoprotein

C_459 5 Q14247 CTTN Src substrate cortactin

C_459 5 Q15020 SART3 Squamous cell carcinoma antigen recognized by T-cells 3

C_459 5 Q8IZP0 ABI1 Abl interactor 1

C_460 5 Q6GQQ9 OTUD7B OTU domain-containing protein 7B

C_460 5 Q8WV07 ORAOV1 Oral cancer-overexpressed protein 1

C_460 5 P52701 MSH6 DNA mismatch repair protein Msh6

C_460 5 P20585 MSH3 DNA mismatch repair protein Msh3

C_460 5 P43246 MSH2 DNA mismatch repair protein Msh2

C_461 5 Q9BSL1 UBAC1 Ubiquitin-associated domain-containing protein 1

C_461 5 Q9NYH9 UTP6 U3 small nucleolar RNA-associated protein 6 homolog

C_461 5 Q6IBS0 TWF2 Twinfilin-2

C_461 5 A6NIH7 UNC119B Protein unc-119 homolog B

C_461 5 Q96PU4 UHRF2 E3 ubiquitin-protein ligase UHRF2

C_462 5 Q7Z3B4 NUP54 Nucleoporin p54

C_462 5 Q9UKX7 NUP50 Nuclear pore complex protein Nup50

C_462 5 P49790 NUP153 Nuclear pore complex protein Nup153

C_462 5 Q16626 MEA1 Male-enhanced antigen 1

C_462 5 Q15021 NCAPD2 Condensin complex subunit 1

C_463 5 P54652 HSPA2 Heat shock-related 70 kDa protein 2

C_463 5 P11142 HSPA8 Heat shock cognate 71 kDa protein

C_463 5 P17066 HSPA6 Heat shock 70 kDa protein 6

C_463 5 P08107 HSPA1A/B Heat shock 70 kDa protein 1A/1B

C_463 5 Q9Y2V2 CARHSP1 Calcium-regulated heat stable protein 1

C_464 5 Q92544 TM9SF4 Transmembrane 9 superfamily member 4

C_464 5 Q9Y320 TMX2 Thioredoxin-related transmembrane protein 2

C_464 5 Q9NRX5 SERINC1 Serine incorporator 1

C_464 5 Q9Y5J6 FXC1 Mitochondrial import inner membrane translocase subunit Tim9 B

C_464 5 P11717 IGF2R Cation-independent mannose-6-phosphate receptor

C_465 5 P54727 RAD23B UV excision repair protein RAD23 homolog B

C_465 5 Q9NR45 NANS Sialic acid synthase

C_465 5 P35270 SPR Sepiapterin reductase

C_465 5 P13796 LCP1 Plastin-2

C_465 5 P46108 CRK Adapter molecule crk

C_466 5 O60888 CUTA Protein CutA

C_466 5 Q9P2K5 MYEF2 Myelin expression factor 2

C_466 5 Q9HCN4 GPN1 GPN-loop GTPase 1

C_466 5 Q5T1M5 FKBP15 FK506-binding protein 15

C_466 5 Q9BWU1 CDK19 Cyclin-dependent kinase 19

C_467 5 P67775 PPP2CA Serine/threonine-protein phosphatase 2A catalytic subunit alpha isoform

C_467 5 P30154 PPP2R1B Serine/threonine-protein phosphatase 2A 65 kDa regulatory subunit A beta isoform

C_467 5 P30153 PPP2R1A Serine/threonine-protein phosphatase 2A 65 kDa regulatory subunit A alpha isoform

C_467 5 Q14738 PPP2R5D Serine/threonine-protein phosphatase 2A 56 kDa regulatory subunit delta isoform

C_467 5 P63151 PPP2R2A Serine/threonine-protein phosphatase 2A 55 kDa regulatory subunit B alpha isoform

C_468 5 Q9NYB0 TERF2IP Telomeric repeat-binding factor 2-interacting protein 1

C_468 5 Q15554 TERF2 Telomeric repeat-binding factor 2

C_468 5 O60934 NBN Nibrin

C_468 5 P49959 MRE11A Double-strand break repair protein MRE11A

C_468 5 Q92878 RAD50 DNA repair protein RAD50

C_469 5 Q92558 WASF1 Wiskott-Aldrich syndrome protein family member 1

C_469 5 Q5VIR6 VPS53 Vacuolar protein sorting-associated protein 53 homolog

C_469 5 O14530 TXNDC9 Thioredoxin domain-containing protein 9

C_469 5 O94913 PCF11 Pre-mRNA cleavage complex 2 protein Pcf11

C_469 5 Q9NQ66 PLCB1 1-phosphatidylinositol 4,5-bisphosphate phosphodiesterase beta-1

C_470 5 P49750 YLPM1 YLP motif-containing protein 1

C_470 5 Q6NUQ1 RINT1 RAD50-interacting protein 1

C_470 5 Q96EA4 CCDC99 Protein Spindly

C_470 5 Q9NZD8 SPG21 Maspardin

C_470 5 Q01581 HMGCS1 Hydroxymethylglutaryl-CoA synthase, cytoplasmic

C_471 5 Q8IVF2 AHNAK2 Protein AHNAK2

C_471 5 P35908 KRT2 Keratin, type II cytoskeletal 2 epidermal

C_471 5 P04264 KRT1 Keratin, type II cytoskeletal 1

C_471 5 P35527 KRT9 Keratin, type I cytoskeletal 9

C_471 5 P13645 KRT10 Keratin, type I cytoskeletal 10

C_472 5 Q14011 CIRBP Cold-inducible RNA-binding protein

C_472 5 Q14444 CAPRIN1 Caprin-1

C_472 5 P13861 PRKAR2A cAMP-dependent protein kinase type II-alpha regulatory subunit

C_472 5 P22694 PRKACB cAMP-dependent protein kinase catalytic subunit beta

C_472 5 P17612 PRKACA cAMP-dependent protein kinase catalytic subunit alpha

C_473 5 Q7Z2W4 ZC3HAV1 Zinc finger CCCH-type antiviral protein 1

C_473 5 O00442 RTCA RNA 3'-terminal phosphate cyclase

C_473 5 P15151 PVR Poliovirus receptor

C_473 5 P78406 RAE1 mRNA export factor

C_473 5 P78549 NTHL1 Endonuclease III-like protein 1

C_474 5 P55072 VCP Transitional endoplasmic reticulum ATPase

C_474 5 P31948 STIP1 Stress-induced-phosphoprotein 1

C_474 5 P16949 STMN1 Stathmin

C_474 5 P49321 NASP Nuclear autoantigenic sperm protein

C_474 5 P11142 HSPA8 Heat shock cognate 71 kDa protein

C_475 5 P41236 PPP1R2 Protein phosphatase inhibitor 2

C_475 5 Q9UJY1 HSPB8 Heat shock protein beta-8

C_475 5 Q9NPA8 ENY2 Enhancer of yellow 2 transcription factor homolog

C_475 5 Q9H773 DCTPP1 dCTP pyrophosphatase 1

C_475 5 Q6FI81 CIAPIN1 Anamorsin

C_476 5 P52788 SMS Spermine synthase

C_476 5 P40855 PEX19 Peroxisomal biogenesis factor 19

C_476 5 Q9Y536 PPIAL4A/B/C Peptidyl-prolyl cis-trans isomerase A-like 4A/B/C

C_476 5 P49023 PXN Paxillin

C_476 5 Q8WVJ2 NUDCD2 NudC domain-containing protein 2

C_477 5 Q8N2U0 C17orf61 UPF0451 protein C17orf61

C_477 5 Q9HC07 TMEM165 Transmembrane protein 165

C_477 5 P06744 GPI Glucose-6-phosphate isomerase

C_477 5 Q8N766 EMC1 ER membrane protein complex subunit 1

C_477 5 O43852 CALU Calumenin

C_478 5 Q15833 STXBP2 Syntaxin-binding protein 2

C_478 5 Q12846 STX4 Syntaxin-4

C_478 5 O00161 SNAP23 Synaptosomal-associated protein 23

C_478 5 P12429 ANXA3 Annexin A3

C_478 5 P50995 ANXA11 Annexin A11

C_479 5 Q6UN15 FIP1L1 Pre-mRNA 3'-end-processing factor FIP1

C_479 5 Q9C0J8 WDR33 pre-mRNA 3' end processing protein WDR33

C_479 5 Q9UKF6 CPSF3 Cleavage and polyadenylation specificity factor subunit 3

C_479 5 Q9P2I0 CPSF2 Cleavage and polyadenylation specificity factor subunit 2

C_479 5 Q10570 CPSF1 Cleavage and polyadenylation specificity factor subunit 1

C_480 5 P21912 SDHB Succinate dehydrogenase [ubiquinone] iron-sulfur subunit, mitochondrial

C_480 5 Q96ST2 IWS1 Protein IWS1 homolog

C_480 5 Q9H936 SLC25A22 Mitochondrial glutamate carrier 1

C_480 5 P06396 GSN Gelsolin

C_480 5 Q9BYC9 MRPL20 39S ribosomal protein L20, mitochondrial

C_481 5 P62995 TRA2B Transformer-2 protein homolog beta

C_481 5 Q15434 RBMS2 RNA-binding motif, single-stranded-interacting protein 2

C_481 5 Q15056 EIF4H Eukaryotic translation initiation factor 4H

C_481 5 P06730 EIF4E Eukaryotic translation initiation factor 4E

C_481 5 P31689 DNAJA1 DnaJ homolog subfamily A member 1

C_482 5 P18669 PGAM1 Phosphoglycerate mutase 1

C_482 5 Q92882 OSTF1 Osteoclast-stimulating factor 1

C_482 5 P54105 CLNS1A Methylosome subunit pICln

C_482 5 P14174 MIF Macrophage migration inhibitory factor

C_482 5 Q15181 PPA1 Inorganic pyrophosphatase

C_483 5 Q9Y4R8 TELO2 Telomere length regulation protein TEL2 homolog

C_483 5 Q15424 SAFB Scaffold attachment factor B1

C_483 5 P58107 EPPK1 Epiplakin

C_483 5 Q9BR76 CORO1B Coronin-1B

C_483 5 P54819 AK2 Adenylate kinase 2, mitochondrial

C_484 5 Q86TC9 MYPN Myopalladin

C_484 5 Q7Z3B3 KANSL1 KAT8 regulatory NSL complex subunit 1

C_484 5 Q9H6A0 DENND2D DENN domain-containing protein 2D

C_484 5 Q9BZJ0 CRNKL1 Crooked neck-like protein 1

C_484 5 Q14562 DHX8 ATP-dependent RNA helicase DHX8

C_485 5 Q15366 PCBP2 Poly(rC)-binding protein 2

C_485 5 P54652 HSPA2 Heat shock-related 70 kDa protein 2

C_485 5 P08107 HSPA1A/B Heat shock 70 kDa protein 1A/1B

C_485 5 P68104 EEF1A1 Elongation factor 1-alpha 1

C_485 5 Q8N163 KIAA1967 DBIRD complex subunit KIAA1967

C_486 5 P61764 STXBP1 Syntaxin-binding protein 1

C_486 5 Q96AT9 RPE Ribulose-phosphate 3-epimerase

C_486 5 Q9BV20 MRI1 Methylthioribose-1-phosphate isomerase

C_486 5 Q9H9A6 LRRC40 Leucine-rich repeat-containing protein 40

C_486 5 Q9Y597 KCTD3 BTB/POZ domain-containing protein KCTD3

C_487 5 O15118 NPC1 Niemann-Pick C1 protein

C_487 5 O95139 NDUFB6 NADH dehydrogenase [ubiquinone] 1 beta subcomplex subunit 6

C_487 5 Q9P0J0 NDUFA13 NADH dehydrogenase [ubiquinone] 1 alpha subcomplex subunit 13

C_487 5 Q9UBX3 SLC25A10 Mitochondrial dicarboxylate carrier

C_487 5 Q9Y2Q9 MRPS28 28S ribosomal protein S28, mitochondrial

C_488 5 Q15165 PON2 Serum paraoxonase/arylesterase 2

C_488 5 P62834 RAP1A Ras-related protein Rap-1A

C_488 5 P39748 FEN1 Flap endonuclease 1

C_488 5 O75937 DNAJC8 DnaJ homolog subfamily C member 8

C_488 5 P60953 CDC42 Cell division control protein 42 homolog

C_489 5 O60701 UGDH UDP-glucose 6-dehydrogenase

C_489 5 Q13404 UBE2V1 Ubiquitin-conjugating enzyme E2 variant 1

C_489 5 P49459 UBE2A Ubiquitin-conjugating enzyme E2 A

C_489 5 Q96C45 ULK4 Serine/threonine-protein kinase ULK4

C_489 5 Q9Y244 POMP Proteasome maturation protein

C_490 5 Q56VL3 OCIAD2 OCIA domain-containing protein 2

C_490 5 Q8N4V1 MMGT1 Membrane magnesium transporter 1

C_490 5 P10515 DLAT Dihydrolipoyllysine-residue acetyltransferase component of pyruvate dehydrogenase complex, mitochondrial

C_490 5 Q9H845 ACAD9 Acyl-CoA dehydrogenase family member 9, mitochondrial

C_490 5 Q9BYD6 MRPL1 39S ribosomal protein L1, mitochondrial

C_491 5 P19634 SLC9A1 Sodium/hydrogen exchanger 1

C_491 5 Q9UK32 RPS6KA6 Ribosomal protein S6 kinase alpha-6

C_491 5 P23511 NFYA Nuclear transcription factor Y subunit alpha

C_491 5 P02686 MBP Myelin basic protein

C_491 5 Q16543 CDC37 Hsp90 co-chaperone Cdc37

C_492 5 P23193 TCEA1 Transcription elongation factor A protein 1

C_492 5 Q6N069 NAA16 N-alpha-acetyltransferase 16, NatA auxiliary subunit

C_492 5 P41227 NAA10 N-alpha-acetyltransferase 10

C_492 5 Q8IUF8 MINA MYC-induced nuclear antigen

C_492 5 Q01970 PLCB3 1-phosphatidylinositol 4,5-bisphosphate phosphodiesterase beta-3

C_493 5 Q9P000 COMMD9 COMM domain-containing protein 9

C_493 5 Q9NX08 COMMD8 COMM domain-containing protein 8

C_493 5 Q86VX2 COMMD7 COMM domain-containing protein 7

C_493 5 Q9UBI1 COMMD3 COMM domain-containing protein 3

C_493 5 Q9Y6G5 COMMD10 COMM domain-containing protein 10

C_494 5 P35232 PHB Prohibitin

C_494 5 P22695 UQCRC2 Cytochrome b-c1 complex subunit 2, mitochondrial

C_494 5 P18859 ATP5J ATP synthase-coupling factor 6, mitochondrial

C_494 5 P48047 ATP5O ATP synthase subunit O, mitochondrial

C_494 5 O75947 ATP5H ATP synthase subunit d, mitochondrial

C_495 5 Q8N5G0 C4orf52 Uncharacterized protein C4orf52

C_495 5 Q13435 SF3B2 Splicing factor 3B subunit 2

C_495 5 Q13242 SRSF9 Serine/arginine-rich splicing factor 9

C_495 5 A6NIH7 UNC119B Protein unc-119 homolog B

C_495 5 P78347 GTF2I General transcription factor II-I

C_496 5 Q01844 EWSR1 RNA-binding protein EWS

C_496 5 Q8NBT0 POC1A POC1 centriolar protein homolog A

C_496 5 Q9NZJ7 MTCH1 Mitochondrial carrier homolog 1

C_496 5 Q9NVK5 FGFR1OP2 FGFR1 oncogene partner 2

C_496 5 Q8IYW5 RNF168 E3 ubiquitin-protein ligase RNF168

C_497 5 P13010 XRCC5 X-ray repair cross-complementing protein 5

C_497 5 P54577 YARS Tyrosine--tRNA ligase, cytoplasmic

C_497 5 Q9HCD5 NCOA5 Nuclear receptor coactivator 5

C_497 5 P36776 LONP1 Lon protease homolog, mitochondrial

C_497 5 Q14204 DYNC1H1 Cytoplasmic dynein 1 heavy chain 1

C_498 5 P56377 AP1S2 AP-1 complex subunit sigma-2

C_498 5 P61966 AP1S1 AP-1 complex subunit sigma-1A

C_498 5 Q9BXS5 AP1M1 AP-1 complex subunit mu-1

C_498 5 O43747 AP1G1 AP-1 complex subunit gamma-1

C_498 5 Q10567 AP1B1 AP-1 complex subunit beta-1

C_499 5 Q93009 USP7 Ubiquitin carboxyl-terminal hydrolase 7

C_499 5 O43396 TXNL1 Thioredoxin-like protein 1

C_499 5 P55055 NR1H2 Oxysterols receptor LXR-beta

C_499 5 Q8IUF8 MINA MYC-induced nuclear antigen

C_499 5 O43592 XPOT Exportin-T

C_500 5 P14618 PKM Pyruvate kinase isozymes M1/M2

C_500 5 Q15365 PCBP1 Poly(rC)-binding protein 1

C_500 5 P08648 ITGA5 Integrin alpha-5

C_500 5 Q96AE4 FUBP1 Far upstream element-binding protein 1

C_500 5 Q96G23 CERS2 Ceramide synthase 2

C_501 6 Q8IX90 SKA3 Spindle and kinetochore-associated protein 3

C_501 6 P27694 RPA1 Replication protein A 70 kDa DNA-binding subunit

C_501 6 P15927 RPA2 Replication protein A 32 kDa subunit

C_501 6 P35244 RPA3 Replication protein A 14 kDa subunit

C_501 6 P28340 POLD1 DNA polymerase delta catalytic subunit

C_501 6 Q9UKK9 NUDT5 ADP-sugar pyrophosphatase

C_502 6 Q9H4L4 SENP3 Sentrin-specific protease 3

C_502 6 Q6NTF9 RHBDD2 Rhomboid domain-containing protein 2

C_502 6 P61224 RAP1B Ras-related protein Rap-1b

C_502 6 P61020 RAB5B Ras-related protein Rab-5B

C_502 6 Q9HCE1 MOV10 Putative helicase MOV-10

C_502 6 Q8N4Q1 CHCHD4 Mitochondrial intermembrane space import and assembly protein 40

C_503 6 Q86UP2 KTN1 Kinectin

C_503 6 P14923 JUP Junction plakoglobin

C_503 6 Q9NSE4 IARS2 Isoleucine--tRNA ligase, mitochondrial

C_503 6 P16144 ITGB4 Integrin beta-4

C_503 6 P35221 CTNNA1 Catenin alpha-1

C_503 6 P19022 CDH2 Cadherin-2

C_504 6 Q8NI36 WDR36 WD repeat-containing protein 36

C_504 6 Q86YS7 KIAA0528 Uncharacterized protein KIAA0528

C_504 6 O43818 RRP9 U3 small nucleolar RNA-interacting protein 2

C_504 6 P11474 ESRRA Steroid hormone receptor ERR1

C_504 6 Q15269 PWP2 Periodic tryptophan protein 2 homolog

C_504 6 O94829 IPO13 Importin-13

C_505 6 Q8N1G0 ZNF687 Zinc finger protein 687

C_505 6 P52739 ZNF131 Zinc finger protein 131

C_505 6 Q9H7D7 WDR26 WD repeat-containing protein 26

C_505 6 Q9C0C9 UBE2O Ubiquitin-conjugating enzyme E2 O

C_505 6 Q6VN20 RANBP10 Ran-binding protein 10

C_505 6 Q8N2C7 UNC80 Protein unc-80 homolog

C_506 6 Q9H3S7 PTPN23 Tyrosine-protein phosphatase non-receptor type 23

C_506 6 Q68DN6 RGPD1/2 RANBP2-like and GRIP domain-containing protein 1

C_506 6 Q9H6Z4 RANBP3 Ran-binding protein 3

C_506 6 Q15398 DLGAP5 Disks large-associated protein 5

C_506 6 O14965 AURKA Aurora kinase A

C_506 6 Q9NRX4 PHPT1 14 kDa phosphohistidine phosphatase

C_507 6 Q86UK7 ZNF598 Zinc finger protein 598

C_507 6 Q16763 UBE2S Ubiquitin-conjugating enzyme E2 S

C_507 6 O14562 UBFD1 Ubiquitin domain-containing protein UBFD1

C_507 6 Q9C0B7 TMCO7 Transmembrane and coiled-coil domain-containing protein 7

C_507 6 Q9Y3C4 TPRKB TP53RK-binding protein

C_507 6 Q9ULW0 TPX2 Targeting protein for Xklp2

C_508 6 O15164 TRIM24 Transcription intermediary factor 1-alpha

C_508 6 Q14241 TCEB3 Transcription elongation factor B polypeptide 3

C_508 6 Q9UID3 FFR Protein fat-free homolog

C_508 6 Q9H8X2 IPPK Inositol-pentakisphosphate 2-kinase

C_508 6 O75909 CCNK Cyclin-K

C_508 6 Q9NYV4 CDK12 Cyclin-dependent kinase 12

C_509 6 Q01082 SPTBN1 Spectrin beta chain, non-erythrocytic 1

C_509 6 Q13813 SPTAN1 Spectrin alpha chain, non-erythrocytic 1

C_509 6 Q9UBK7 RABL2A Rab-like protein 2A

C_509 6 P14859 POU2F1 POU domain, class 2, transcription factor 1

C_509 6 Q16643 DBN1 Drebrin

C_509 6 Q9NUU7 DDX19A ATP-dependent RNA helicase DDX19A

C_510 6 Q01081 U2AF1 Splicing factor U2AF 35 kDa subunit

C_510 6 P18754 RCC1 Regulator of chromosome condensation

C_510 6 P43487 RANBP1 Ran-specific GTPase-activating protein

C_510 6 Q9Y4B6 VPRBP Protein VPRBP

C_510 6 P61970 NUTF2 Nuclear transport factor 2

C_510 6 P62826 RAN GTP-binding nuclear protein Ran

C_511 6 Q9H0U3 MAGT1 Magnesium transporter protein 1

C_511 6 Q8TCJ2 STT3B Dolichyl-diphosphooligosaccharide--protein glycosyltransferase subunit STT3B

C_511 6 P04844 RPN2 Dolichyl-diphosphooligosaccharide--protein glycosyltransferase subunit 2

C_511 6 P04843 RPN1 Dolichyl-diphosphooligosaccharide--protein glycosyltransferase subunit 1

C_511 6 P39656 DDOST Dolichyl-diphosphooligosaccharide--protein glycosyltransferase 48 kDa subunit

C_511 6 P27824 CANX Calnexin

C_512 6 Q9NZ01 TECR Trans-2,3-enoyl-CoA reductase

C_512 6 P20339 RAB5A Ras-related protein Rab-5A

C_512 6 Q8IXI1 RHOT2 Mitochondrial Rho GTPase 2

C_512 6 Q13724 MOGS Mannosyl-oligosaccharide glucosidase

C_512 6 O14494 PPAP2A Lipid phosphate phosphohydrolase 1

C_512 6 Q9Y6M1 IGF2BP2 Insulin-like growth factor 2 mRNA-binding protein 2

C_513 6 Q5JTH9 RRP12 RRP12-like protein

C_513 6 Q9NW13 RBM28 RNA-binding protein 28

C_513 6 O15318 POLR3G DNA-directed RNA polymerase III subunit RPC7

C_513 6 Q9H1D9 POLR3F DNA-directed RNA polymerase III subunit RPC6

C_513 6 Q9BUI4 POLR3C DNA-directed RNA polymerase III subunit RPC3

C_513 6 P17812 CTPS1 CTP synthase 1

C_514 6 Q6P5Z2 PKN3 Serine/threonine-protein kinase N3

C_514 6 Q8IX18 DHX40 Probable ATP-dependent RNA helicase DHX40

C_514 6 P09913 IFIT2 Interferon-induced protein with tetratricopeptide repeats 2

C_514 6 O15111 CHUK Inhibitor of nuclear factor kappa-B kinase subunit alpha

C_514 6 Q99871 HAUS7 HAUS augmin-like complex subunit 7

C_514 6 Q7Z478 DHX29 ATP-dependent RNA helicase DHX29

C_515 6 O75128 COBL Protein cordon-bleu

C_515 6 Q9UKY7 CDV3 Protein CDV3 homolog

C_515 6 O14908 GIPC1 PDZ domain-containing protein GIPC1

C_515 6 Q92615 LARP4B La-related protein 4B

C_515 6 Q9Y2U8 LEMD3 Inner nuclear membrane protein Man1

C_515 6 Q9H4G4 GLIPR2 Golgi-associated plant pathogenesis-related protein 1

C_516 6 Q8WXB4 ZNF606 Zinc finger protein 606

C_516 6 Q14202 ZMYM3 Zinc finger MYM-type protein 3

C_516 6 Q8NI36 WDR36 WD repeat-containing protein 36

C_516 6 Q9NPG3 UBN1 Ubinuclein-1

C_516 6 Q9UDY2 TJP2 Tight junction protein ZO-2

C_516 6 P11474 ESRRA Steroid hormone receptor ERR1

C_517 6 Q92804 TAF15 TATA-binding protein-associated factor 2N

C_517 6 Q9UHR5 SAP30BP SAP30-binding protein

C_517 6 Q8WUM0 NUP133 Nuclear pore complex protein Nup133

C_517 6 Q14315 FLNC Filamin-C

C_517 6 P29692 EEF1D Elongation factor 1-delta

C_517 6 Q16531 DDB1 DNA damage-binding protein 1

C_518 6 Q9NTI5 PDS5B Sister chromatid cohesion protein PDS5 homolog B

C_518 6 P35249 RFC4 Replication factor C subunit 4

C_518 6 P35251 RFC1 Replication factor C subunit 1

C_518 6 P12004 PCNA Proliferating cell nuclear antigen

C_518 6 P13473 LAMP2 Lysosome-associated membrane glycoprotein 2

C_518 6 P53999 SUB1 Activated RNA polymerase II transcriptional coactivator p15

C_519 6 Q16637 SMN1/2 Survival motor neuron protein

C_519 6 Q9UQ35 SRRM2 Serine/arginine repetitive matrix protein 2

C_519 6 Q8IYB3 SRRM1 Serine/arginine repetitive matrix protein 1

C_519 6 P49756 RBM25 RNA-binding protein 25

C_519 6 Q96PK6 RBM14 RNA-binding protein 14

C_519 6 Q8N3U4 STAG2 Cohesin subunit SA-2

C_520 6 P51808 DYNLT3 Dynein light chain Tctex-type 3

C_520 6 O75935 DCTN3 Dynactin subunit 3

C_520 6 Q13561 DCTN2 Dynactin subunit 2

C_520 6 O43237 DYNC1LI2 Cytoplasmic dynein 1 light intermediate chain 2

C_520 6 P42025 ACTR1B Beta-centractin

C_520 6 P61163 ACTR1A Alpha-centractin

C_521 6 Q8N5G0 C4orf52 Uncharacterized protein C4orf52

C_521 6 Q9NZR1 TMOD2 Tropomodulin-2

C_521 6 O94842 TOX4 TOX high mobility group box family member 4

C_521 6 Q9BWJ5 SF3B5 Splicing factor 3B subunit 5

C_521 6 Q13435 SF3B2 Splicing factor 3B subunit 2

C_521 6 Q7RTV0 PHF5A PHD finger-like domain-containing protein 5A

C_522 6 P49458 SRP9 Signal recognition particle 9 kDa protein

C_522 6 P37108 SRP14 Signal recognition particle 14 kDa protein

C_522 6 Q15185 PTGES3 Prostaglandin E synthase 3

C_522 6 O60832 DKC1 H/ACA ribonucleoprotein complex subunit 4

C_522 6 Q9NX24 NHP2 H/ACA ribonucleoprotein complex subunit 2

C_522 6 Q9NY12 GAR1 H/ACA ribonucleoprotein complex subunit 1

C_523 6 O00541 PES1 Pescadillo homolog

C_523 6 Q99733 NAP1L4 Nucleosome assembly protein 1-like 4

C_523 6 P55209 NAP1L1 Nucleosome assembly protein 1-like 1

C_523 6 O75607 NPM3 Nucleoplasmin-3

C_523 6 P06748 NPM1 Nucleophosmin

C_523 6 P68400 CSNK2A1 Casein kinase II subunit alpha

C_524 6 Q9BSH4 TACO1 Translational activator of cytochrome c oxidase 1

C_524 6 Q16762 TST Thiosulfate sulfurtransferase

C_524 6 Q3MHD2 LSM12 Protein LSM12 homolog

C_524 6 Q9BXS6 NUSAP1 Nucleolar and spindle-associated protein 1

C_524 6 Q15599 SLC9A3R2 Na(+)/H(+) exchange regulatory cofactor NHE-RF2

C_524 6 Q96IR7 HPDL 4-hydroxyphenylpyruvate dioxygenase-like protein

C_525 6 A6NK07 IF2BL Uncharacterized protein ENSP00000249480

C_525 6 Q2VIR3 EIF2S3L Putative eukaryotic translation initiation factor 2 subunit 3-like protein

C_525 6 Q96PZ0 PUS7 Pseudouridylate synthase 7 homolog

C_525 6 P41091 EIF2S3 Eukaryotic translation initiation factor 2 subunit 3

C_525 6 P20042 EIF2S2 Eukaryotic translation initiation factor 2 subunit 2

C_525 6 P05198 EIF2S1 Eukaryotic translation initiation factor 2 subunit 1

C_526 6 P46459 NSF Vesicle-fusing ATPase

C_526 6 O75818 RPP40 Ribonuclease P protein subunit p40

C_526 6 O75817 POP7 Ribonuclease P protein subunit p20

C_526 6 A4D2B8 PMS2P1 Putative postmeiotic segregation increased 2-like protein 1

C_526 6 P54278 PMS2 Mismatch repair endonuclease PMS2

C_526 6 P40692 MLH1 DNA mismatch repair protein Mlh1

C_527 7 Q9UNX4 WDR3 WD repeat-containing protein 3

C_527 7 Q9NRX1 PNO1 RNA-binding protein PNO1

C_527 7 Q92979 EMG1 Ribosomal RNA small subunit methyltransferase NEP1

C_527 7 P06400 RB1 Retinoblastoma-associated protein

C_527 7 Q9Y6V7 DDX49 Probable ATP-dependent RNA helicase DDX49

C_527 7 P49674 CSNK1E Casein kinase I isoform epsilon

C_527 7 Q13895 BYSL Bystin

C_528 7 Q9BQ24 ZFYVE21 Zinc finger FYVE domain-containing protein 21

C_528 7 O75083 WDR1 WD repeat-containing protein 1

C_528 7 Q5MNZ6 WDR45L WD repeat domain phosphoinositide-interacting protein 3

C_528 7 P61086 UBE2K Ubiquitin-conjugating enzyme E2 K

C_528 7 P04818 TYMS Thymidylate synthase

C_528 7 Q9BRA2 TXNDC17 Thioredoxin domain-containing protein 17

C_528 7 P30566 ADSL Adenylosuccinate lyase

C_529 7 Q12888 TP53BP1 Tumor suppressor p53-binding protein 1

C_529 7 P29992 GNA11 Guanine nucleotide-binding protein subunit alpha-11

C_529 7 P08754 GNAI3 Guanine nucleotide-binding protein G(k) subunit alpha

C_529 7 P62873 GNB1 Guanine nucleotide-binding protein G(I)/G(S)/G(T) subunit beta-1

C_529 7 P04899 GNAI2 Guanine nucleotide-binding protein G(i) subunit alpha-2

C_529 7 P63096 GNAI1 Guanine nucleotide-binding protein G(i) subunit alpha-1

C_529 7 O60563 CCNT1 Cyclin-T1

C_530 7 P07237 P4HB Protein disulfide-isomerase

C_530 7 Q14697 GANAB Neutral alpha-glucosidase AB

C_530 7 Q04695 KRT17 Keratin, type I cytoskeletal 17

C_530 7 P14314 PRKCSH Glucosidase 2 subunit beta

C_530 7 P14625 HSP90B1 Endoplasmin

C_530 7 P27797 CALR Calreticulin

C_530 7 P27824 CANX Calnexin

C_531 7 Q12907 LMAN2 Vesicular integral-membrane protein VIP36

C_531 7 P51148 RAB5C Ras-related protein Rab-5C

C_531 7 Q02978 SLC25A11 Mitochondrial 2-oxoglutarate/malate carrier protein

C_531 7 Q7L5Y9 MAEA Macrophage erythroblast attacher

C_531 7 O75955 FLOT1 Flotillin-1

C_531 7 Q96G23 CERS2 Ceramide synthase 2

C_531 7 Q16790 CA9 Carbonic anhydrase 9

C_532 7 O94855 SEC24D Protein transport protein Sec24D

C_532 7 P53992 SEC24C Protein transport protein Sec24C

C_532 7 Q13492 PICALM Phosphatidylinositol-binding clathrin assembly protein

C_532 7 O75146 HIP1R Huntingtin-interacting protein 1-related protein

C_532 7 Q9NR46 SH3GLB2 Endophilin-B2

C_532 7 Q9Y4J8 DTNA Dystrobrevin alpha

C_532 7 O95782 AP2A1 AP-2 complex subunit alpha-1

C_533 7 Q9GZZ9 UBA5 Ubiquitin-like modifier-activating enzyme 5

C_533 7 P45974 USP5 Ubiquitin carboxyl-terminal hydrolase 5

C_533 7 Q9BU02 THTPA Thiamine-triphosphatase

C_533 7 O15020 SPTBN2 Spectrin beta chain, non-erythrocytic 2

C_533 7 P41229 KDM5C Lysine-specific demethylase 5C

C_533 7 Q96KB5 PBK Lymphokine-activated killer T-cell-originated protein kinase

C_533 7 Q96JP5 ZFP91 E3 ubiquitin-protein ligase ZFP91

C_534 7 Q6IAN0 DHRS7B Dehydrogenase/reductase SDR family member 7B

C_534 7 Q04656 ATP7A Copper-transporting ATPase 1

C_534 7 O60641 SNAP91 Clathrin coat assembly protein AP180

C_534 7 Q8TDX6 CSGALNACT1 Chondroitin sulfate N-acetylgalactosaminyltransferase 1

C_534 7 O15155 BET1 BET1 homolog

C_534 7 Q9UH62 ARMCX3 Armadillo repeat-containing X-linked protein 3

C_534 7 Q9UKV3 ACIN1 Apoptotic chromatin condensation inducer in the nucleus

C_535 7 P61006 RAB8A Ras-related protein Rab-8A

C_535 7 Q13636 RAB31 Ras-related protein Rab-31

C_535 7 P22307 SCP2 Non-specific lipid-transfer protein

C_535 7 P51553 IDH3G Isocitrate dehydrogenase [NAD] subunit gamma

C_535 7 P17931 LGALS3 Galectin-3

C_535 7 Q16527 CSRP2 Cysteine and glycine-rich protein 2

C_535 7 O00116 AGPS Alkyldihydroxyacetonephosphate synthase, peroxisomal

C_536 7 Q9UQ80 PA2G4 Proliferation-associated protein 2G4

C_536 7 Q8NC51 SERBP1 Plasminogen activator inhibitor 1 RNA-binding protein

C_536 7 Q9BY44 EIF2A Eukaryotic translation initiation factor 2A

C_536 7 P80723 BASP1 Brain acid soluble protein 1

C_536 7 P30049 ATP5D ATP synthase subunit delta, mitochondrial

C_536 7 P06576 ATP5B ATP synthase subunit beta, mitochondrial

C_536 7 P25705 ATP5A1 ATP synthase subunit alpha, mitochondrial

C_537 7 Q9NRL3 STRN4 Striatin-4

C_537 7 Q5VSL9 FAM40A Protein FAM40A

C_537 7 O60568 PLOD3 Procollagen-lysine,2-oxoglutarate 5-dioxygenase 3

C_537 7 Q8NBJ5 GLT25D1 Procollagen galactosyltransferase 1

C_537 7 P26641 EEF1G Elongation factor 1-gamma

C_537 7 P24534 EEF1B2 Elongation factor 1-beta

C_537 7 P10809 HSPD1 60 kDa heat shock protein, mitochondrial

C_538 7 Q15904 ATP6AP1 V-type proton ATPase subunit S1

C_538 7 Q99729 HNRNPAB Heterogeneous nuclear ribonucleoprotein A/B

C_538 7 Q6P161 MRPL54 39S ribosomal protein L54, mitochondrial

C_538 7 Q9Y2R9 MRPS7 28S ribosomal protein S7, mitochondrial

C_538 7 P82663 MRPS25 28S ribosomal protein S25, mitochondrial

C_538 7 P82650 MRPS22 28S ribosomal protein S22, mitochondrial

C_538 7 Q9Y676 MRPS18B 28S ribosomal protein S18b, mitochondrial

C_539 7 P51784 USP11 Ubiquitin carboxyl-terminal hydrolase 11

C_539 7 Q9BUR4 WRAP53 Telomerase Cajal body protein 1

C_539 7 Q8NDT2 RBM15B Putative RNA-binding protein 15B

C_539 7 P51608 MECP2 Methyl-CpG-binding protein 2

C_539 7 Q16576 RBBP7 Histone-binding protein RBBP7

C_539 7 O14929 HAT1 Histone acetyltransferase type B catalytic subunit

C_539 7 Q01831 XPC DNA repair protein complementing XP-C cells

C_540 7 Q86U86 PBRM1 Protein polybromo-1

C_540 7 Q969F1 GTF3C6 General transcription factor 3C polypeptide 6

C_540 7 Q9Y5Q8 GTF3C5 General transcription factor 3C polypeptide 5

C_540 7 Q9UKN8 GTF3C4 General transcription factor 3C polypeptide 4

C_540 7 Q9Y5Q9 GTF3C3 General transcription factor 3C polypeptide 3

C_540 7 Q8WUA4 GTF3C2 General transcription factor 3C polypeptide 2

C_540 7 Q12789 GTF3C1 General transcription factor 3C polypeptide 1

C_541 7 Q8IYI6 EXOC8 Exocyst complex component 8

C_541 7 Q9UPT5 EXOC7 Exocyst complex component 7

C_541 7 Q8TAG9 EXOC6 Exocyst complex component 6

C_541 7 O00471 EXOC5 Exocyst complex component 5

C_541 7 Q96A65 EXOC4 Exocyst complex component 4

C_541 7 O60645 EXOC3 Exocyst complex component 3

C_541 7 Q96KP1 EXOC2 Exocyst complex component 2

C_542 7 O15294 OGT UDP-N-acetylglucosamine--peptide N-acetylglucosaminyltransferase 110 kDa subunit

C_542 7 P62140 PPP1CB Serine/threonine-protein phosphatase PP1-beta catalytic subunit

C_542 7 P62136 PPP1CA Serine/threonine-protein phosphatase PP1-alpha catalytic subunit

C_542 7 P51003 PAPOLA Poly(A) polymerase alpha

C_542 7 P51610 HCFC1 Host cell factor 1

C_542 7 Q9H7Z6 KAT8 Histone acetyltransferase KAT8

C_542 7 P11802 CDK4 Cyclin-dependent kinase 4

C_543 7 P52788 SMS Spermine synthase

C_543 7 Q5MIZ7 SMEK2 Serine/threonine-protein phosphatase 4 regulatory subunit 3B

C_543 7 Q8IXS6 PALM2 Paralemmin-2

C_543 7 Q8TEM1 NUP210 Nuclear pore membrane glycoprotein 210

C_543 7 Q16719 KYNU Kynureninase

C_543 7 P38432 COIL Coilin

C_543 7 Q53H82 LACTB2 Beta-lactamase-like protein 2

C_544 7 P19388 POLR2E DNA-directed RNA polymerases I, II, and III subunit RPABC1

C_544 7 P36954 POLR2I DNA-directed RNA polymerase II subunit RPB9

C_544 7 P62487 POLR2G DNA-directed RNA polymerase II subunit RPB7

C_544 7 O15514 POLR2D DNA-directed RNA polymerase II subunit RPB4

C_544 7 P19387 POLR2C DNA-directed RNA polymerase II subunit RPB3

C_544 7 P30876 POLR2B DNA-directed RNA polymerase II subunit RPB2

C_544 7 P24928 POLR2A DNA-directed RNA polymerase II subunit RPB1

C_545 7 Q9H814 PHAX Phosphorylated adapter RNA export protein

C_545 7 P23588 EIF4B Eukaryotic translation initiation factor 4B

C_545 7 O43432 EIF4G3 Eukaryotic translation initiation factor 4 gamma 3

C_545 7 P78344 EIF4G2 Eukaryotic translation initiation factor 4 gamma 2

C_545 7 Q04637 EIF4G1 Eukaryotic translation initiation factor 4 gamma 1

C_545 7 Q14240 EIF4A2 Eukaryotic initiation factor 4A-II

C_545 7 P60842 EIF4A1 Eukaryotic initiation factor 4A-I

C_546 7 Q12768 KIAA0196 WASH complex subunit strumpellin

C_546 7 Q9Y4E1 FAM21C WASH complex subunit FAM21C

C_546 7 Q5SNT6 FAM21B WASH complex subunit FAM21B

C_546 7 Q641Q2 FAM21A WASH complex subunit FAM21A

C_546 7 Q9Y3C0 CCDC53 WASH complex subunit CCDC53

C_546 7 Q2M389 KIAA1033 WASH complex subunit 7

C_546 7 A8K0Z3 WASH1 WAS protein family homolog 1

C_547 7 P51149 RAB7A Ras-related protein Rab-7a

C_547 7 P62491 RAB11A Ras-related protein Rab-11A

C_547 7 P46940 IQGAP1 Ras GTPase-activating-like protein IQGAP1

C_547 7 P35579 MYH9 Myosin-9

C_547 7 P35580 MYH10 Myosin-10

C_547 7 O43504 HBXIP Hepatitis B virus X-interacting protein

C_547 7 P11021 HSPA5 78 kDa glucose-regulated protein

C_548 7 O95365 ZBTB7A Zinc finger and BTB domain-containing protein 7A

C_548 7 P49754 VPS41 Vacuolar protein sorting-associated protein 41 homolog

C_548 7 Q96AX1 VPS33A Vacuolar protein sorting-associated protein 33A

C_548 7 Q9P253 VPS18 Vacuolar protein sorting-associated protein 18 homolog

C_548 7 Q9H269 VPS16 Vacuolar protein sorting-associated protein 16 homolog

C_548 7 Q8WXA9 SREK1 Splicing regulatory glutamine/lysine-rich protein 1

C_548 7 O75691 UTP20 Small subunit processome component 20 homolog

C_549 7 Q9BZL1 UBL5 Ubiquitin-like protein 5

C_549 7 P62837 UBE2D2 Ubiquitin-conjugating enzyme E2 D2

C_549 7 Q6PI78 TMEM65 Transmembrane protein 65

C_549 7 P10599 TXN Thioredoxin

C_549 7 O14907 TAX1BP3 Tax1-binding protein 3

C_549 7 Q9NQC3 RTN4 Reticulon-4

C_549 7 P42765 ACAA2 3-ketoacyl-CoA thiolase, mitochondrial

C_550 8 P62256 UBE2H Ubiquitin-conjugating enzyme E2 H

C_550 8 P52888 THOP1 Thimet oligopeptidase

C_550 8 Q8N584 TTC39C Tetratricopeptide repeat protein 39C

C_550 8 Q5MIZ7 SMEK2 Serine/threonine-protein phosphatase 4 regulatory subunit 3B

C_550 8 Q9NP77 SSU72 RNA polymerase II subunit A C-terminal domain phosphatase SSU72

C_550 8 P52735 VAV2 Guanine nucleotide exchange factor VAV2

C_550 8 Q53H82 LACTB2 Beta-lactamase-like protein 2

C_550 8 Q9BWD1 ACAT2 Acetyl-CoA acetyltransferase, cytosolic

C_551 8 O96006 ZBED1 Zinc finger BED domain-containing protein 1

C_551 8 P61964 WDR5 WD repeat-containing protein 5

C_551 8 P49459 UBE2A Ubiquitin-conjugating enzyme E2 A

C_551 8 O94782 USP1 Ubiquitin carboxyl-terminal hydrolase 1

C_551 8 Q12792 TWF1 Twinfilin-1

C_551 8 Q9H900 ZWILCH Protein zwilch homolog

C_551 8 Q9H3U1 UNC45A Protein unc-45 homolog A

C_551 8 P55060 CSE1L Exportin-2

C_552 8 P51149 RAB7A Ras-related protein Rab-7a

C_552 8 Q9H0U4 RAB1B Ras-related protein Rab-1B

C_552 8 P62820 RAB1A Ras-related protein Rab-1A

C_552 8 Q15907 RAB11B Ras-related protein Rab-11B

C_552 8 P62491 RAB11A Ras-related protein Rab-11A

C_552 8 P50395 GDI2 Rab GDP dissociation inhibitor beta

C_552 8 P31150 GDI1 Rab GDP dissociation inhibitor alpha

C_552 8 O00170 AIP AH receptor-interacting protein

C_553 8 O15116 LSM1 U6 snRNA-associated Sm-like protein LSm1

C_553 8 O43172 PRPF4 U4/U6 small nuclear ribonucleoprotein Prp4

C_553 8 O43395 PRPF3 U4/U6 small nuclear ribonucleoprotein Prp3

C_553 8 Q86TB9 PATL1 Protein PAT1 homolog 1

C_553 8 O96028 WHSC1 Probable histone-lysine N-methyltransferase NSD2

C_553 8 Q9BVI0 PHF20 PHD finger protein 20

C_553 8 O43447 PPIH Peptidyl-prolyl cis-trans isomerase H

C_553 8 Q99547 MPHOSPH6 M-phase phosphoprotein 6

C_554 8 Q9NZI6 TFCP2L1 Transcription factor CP2-like protein 1

C_554 8 Q13838 DDX39B Spliceosome RNA helicase DDX39B

C_554 8 Q9Y5S9 RBM8A RNA-binding protein 8A

C_554 8 Q96A72 MAGOHB Protein mago nashi homolog 2

C_554 8 O95391 SLU7 Pre-mRNA-splicing factor SLU7

C_554 8 Q92620 DHX38 Pre-mRNA-splicing factor ATP-dependent RNA helicase PRP16

C_554 8 P27816 MAP4 Microtubule-associated protein 4

C_554 8 P60866 RPS20 40S ribosomal protein S20

C_555 8 Q96DT7 ZBTB10 Zinc finger and BTB domain-containing protein 10

C_555 8 Q9ULM3 YEATS2 YEATS domain-containing protein 2

C_555 8 O75643 SNRNP200 U5 small nuclear ribonucleoprotein 200 kDa helicase

C_555 8 Q69YN4 KIAA1429 Protein virilizer homolog

C_555 8 Q8N8D1 PDCD7 Programmed cell death protein 7

C_555 8 Q93008 USP9X Probable ubiquitin carboxyl-terminal hydrolase FAF-X

C_555 8 Q5T4S7 UBR4 E3 ubiquitin-protein ligase UBR4

C_555 8 Q15029 EFTUD2 116 kDa U5 small nuclear ribonucleoprotein component

C_556 8 A0AVT1 UBA6 Ubiquitin-like modifier-activating enzyme 6

C_556 8 P15374 UCHL3 Ubiquitin carboxyl-terminal hydrolase isozyme L3

C_556 8 Q86UV5 USP48 Ubiquitin carboxyl-terminal hydrolase 48

C_556 8 Q99598 TSNAX Translin-associated protein X

C_556 8 Q15631 TSN Translin

C_556 8 Q9Y3F4 STRAP Serine-threonine kinase receptor-associated protein

C_556 8 P55786 NPEPPS Puromycin-sensitive aminopeptidase

C_556 8 P40424 PBX1 Pre-B-cell leukemia transcription factor 1

C_557 8 P29966 MARCKS Myristoylated alanine-rich C-kinase substrate

C_557 8 Q9H2D1 SLC25A32 Mitochondrial folate transporter/carrier

C_557 8 Q9Y6C9 MTCH2 Mitochondrial carrier homolog 2

C_557 8 P48059 LIMS1 LIM and senescent cell antigen-like-containing domain protein 1

C_557 8 Q01650 SLC7A5 Large neutral amino acids transporter small subunit 1

C_557 8 Q00839 HNRNPU Heterogeneous nuclear ribonucleoprotein U

C_557 8 Q96GC5 MRPL48 39S ribosomal protein L48, mitochondrial

C_557 8 P82930 MRPS34 28S ribosomal protein S34, mitochondrial

C_558 8 Q9BRQ0 PYGO2 Pygopus homolog 2

C_558 8 Q02818 NUCB1 Nucleobindin-1

C_558 8 P52732 KIF11 Kinesin-like protein KIF11

C_558 8 P33176 KIF5B Kinesin-1 heavy chain

C_558 8 Q9NSK0 KLC4 Kinesin light chain 4

C_558 8 Q9H0B6 KLC2 Kinesin light chain 2

C_558 8 Q07866 KLC1 Kinesin light chain 1

C_558 8 Q12840 KIF5A Kinesin heavy chain isoform 5A

C_559 8 Q96MW5 COG8 Conserved oligomeric Golgi complex subunit 8

C_559 8 P83436 COG7 Conserved oligomeric Golgi complex subunit 7

C_559 8 Q9Y2V7 COG6 Conserved oligomeric Golgi complex subunit 6

C_559 8 Q9UP83 COG5 Conserved oligomeric Golgi complex subunit 5

C_559 8 Q9H9E3 COG4 Conserved oligomeric Golgi complex subunit 4

C_559 8 Q96JB2 COG3 Conserved oligomeric Golgi complex subunit 3

C_559 8 Q14746 COG2 Conserved oligomeric Golgi complex subunit 2

C_559 8 Q8WTW3 COG1 Conserved oligomeric Golgi complex subunit 1

C_560 8 Q9H9H4 VPS37B Vacuolar protein sorting-associated protein 37B

C_560 8 Q9H832 UBE2Z Ubiquitin-conjugating enzyme E2 Z

C_560 8 P54578 USP14 Ubiquitin carboxyl-terminal hydrolase 14

C_560 8 Q99816 TSG101 Tumor susceptibility gene 101 protein

C_560 8 Q86UE8 TLK2 Serine/threonine-protein kinase tousled-like 2

C_560 8 Q9GZL7 WDR12 Ribosome biogenesis protein WDR12

C_560 8 Q9UIA9 XPO7 Exportin-7

C_560 8 O14980 XPO1 Exportin-1

C_561 8 Q9Y224 C14orf166 UPF0568 protein C14orf166

C_561 8 Q9Y3I0 C22orf28 tRNA-splicing ligase RtcB homolog

C_561 8 Q52LJ0 FAM98B Protein FAM98B

C_561 8 Q92839 HAS1 Hyaluronan synthase 1

C_561 8 Q96RW7 HMCN1 Hemicentin-1

C_561 8 Q8TEQ6 GEMIN5 Gem-associated protein 5

C_561 8 Q92499 DDX1 ATP-dependent RNA helicase DDX1

C_561 8 Q9BVC5 C2orf49 Ashwin

C_562 8 P42285 SKIV2L2 Superkiller viralicidic activity 2-like 2

C_562 8 Q13573 SNW1 SNW domain-containing protein 1

C_562 8 P61011 SRP54 Signal recognition particle 54 kDa protein

C_562 8 Q92620 DHX38 Pre-mRNA-splicing factor ATP-dependent RNA helicase PRP16

C_562 8 Q6UX04 CWC27 Peptidyl-prolyl cis-trans isomerase CWC27 homolog

C_562 8 Q9NYD6 HOXC10 Homeobox protein Hox-C10

C_562 8 P78527 PRKDC DNA-dependent protein kinase catalytic subunit

C_562 8 Q13426 XRCC4 DNA repair protein XRCC4

C_563 8 P61923 COPZ1 Coatomer subunit zeta-1

C_563 8 Q9UBF2 COPG2 Coatomer subunit gamma-2

C_563 8 Q9Y678 COPG1 Coatomer subunit gamma-1

C_563 8 O14579 COPE Coatomer subunit epsilon

C_563 8 P48444 ARCN1 Coatomer subunit delta

C_563 8 P35606 COPB2 Coatomer subunit beta'

C_563 8 P53618 COPB1 Coatomer subunit beta

C_563 8 P53621 COPA Coatomer subunit alpha

C_564 8 Q96E11 MRRF Ribosome-recycling factor, mitochondrial

C_564 8 Q96PU8 QKI Protein quaking

C_564 8 Q9UKY7 CDV3 Protein CDV3 homolog

C_564 8 Q9NR12 PDLIM7 PDZ and LIM domain protein 7

C_564 8 Q15004 PAF PCNA-associated factor

C_564 8 O95182 NDUFA7 NADH dehydrogenase 1 alpha subcomplex subunit 7

C_564 8 O43678 NDUFA2 NADH dehydrogenase 1 alpha subcomplex subunit 2

C_564 8 P48382 RFX5 DNA-binding protein RFX5

C_565 8 Q9UHR5 SAP30BP SAP30-binding protein

C_565 8 Q96S59 RANBP9 Ran-binding protein 9

C_565 8 Q96RQ3 MCCC1 Methylcrotonoyl-CoA carboxylase subunit alpha

C_565 8 Q14315 FLNC Filamin-C

C_565 8 P29692 EEF1D Elongation factor 1-delta

C_565 8 Q9NP97 DYNLRB1 Dynein light chain roadblock-type 1

C_565 8 Q14203 DCTN1 Dynactin subunit 1

C_565 8 Q14204 DYNC1H1 Cytoplasmic dynein 1 heavy chain 1

C_566 9 O43298 ZBTB43 Zinc finger and BTB domain-containing protein 43

C_566 9 B0I1T2 MYO1G Unconventional myosin-Ig

C_566 9 Q9HC36 RNMTL1 RNA methyltransferase-like protein 1

C_566 9 P11233 RALA Ras-related protein Ral-A

C_566 9 Q13421 MSLN Mesothelin

C_566 9 P49006 MARCKSL1 MARCKS-related protein

C_566 9 Q13084 MRPL28 39S ribosomal protein L28, mitochondrial

C_566 9 Q9NWU5 MRPL22 39S ribosomal protein L22, mitochondrial

C_566 9 Q9NX20 MRPL16 39S ribosomal protein L16, mitochondrial

C_567 9 O00330 PDHX Pyruvate dehydrogenase protein X component, mitochondrial

C_567 9 P11177 PDHB Pyruvate dehydrogenase E1 component subunit beta, mitochondrial

C_567 9 P29803 PDHA2 Pyruvate dehydrogenase E1 component subunit alpha, testis-specific form, mitochondrial

C_567 9 P08559 PDHA1 Pyruvate dehydrogenase E1 component subunit alpha, somatic form, mitochondrial

C_567 9 P03915 MT-ND5 NADH-ubiquinone oxidoreductase chain 5

C_567 9 Q96DZ1 ERLEC1 Endoplasmic reticulum lectin 1

C_567 9 P10515 DLAT Dihydrolipoyllysine-residue acetyltransferase component of pyruvate dehydrogenase complex, mitochondrial

C_567 9 P10606 COX5B Cytochrome c oxidase subunit 5B, mitochondrial

C_567 9 Q02218 OGDH 2-oxoglutarate dehydrogenase, mitochondrial

C_568 9 Q9HB07 C12orf10 UPF0160 protein MYG1, mitochondrial

C_568 9 Q96BN8 FAM105B Protein FAM105B

C_568 9 Q92508 PIEZO1 Piezo-type mechanosensitive ion channel component 1

C_568 9 Q9UKG9 CROT Peroxisomal carnitine O-octanoyltransferase

C_568 9 Q13287 NMI N-myc-interactor

C_568 9 Q86V88 MDP1 Magnesium-dependent phosphatase 1

C_568 9 Q9UGP4 LIMD1 LIM domain-containing protein 1

C_568 9 Q9NY33 DPP3 Dipeptidyl peptidase 3

C_568 9 P60981 DSTN Destrin

C_569 9 Q8N9F8 ZNF454 Zinc finger protein 454

C_569 9 Q96NB2 SFXN2 Sideroflexin-2

C_569 9 P84103 SRSF3 Serine/arginine-rich splicing factor 3

C_569 9 Q9UQ90 SPG7 Paraplegin

C_569 9 P31483 TIA1 Nucleolysin TIA-1 isoform p40

C_569 9 Q9P0U1 TOMM7 Mitochondrial import receptor subunit TOM7 homolog

C_569 9 Q99595 TIMM17A Mitochondrial import inner membrane translocase subunit Tim17-A

C_569 9 P47985 UQCRFS1 Cytochrome b-c1 complex subunit Rieske, mitochondrial

C_569 9 Q9UQ53 MGAT4B Alpha-1,3-mannosyl-glycoprotein 4-beta-N-acetylglucosaminyltransferase B

C_570 9 Q9Y5V0 ZNF706 Zinc finger protein 706

C_570 9 O00488 ZNF593 Zinc finger protein 593

C_570 9 P38606 ATP6V1A V-type proton ATPase catalytic subunit A

C_570 9 P15374 UCHL3 Ubiquitin carboxyl-terminal hydrolase isozyme L3

C_570 9 P45974 USP5 Ubiquitin carboxyl-terminal hydrolase 5

C_570 9 Q8NBS9 TXNDC5 Thioredoxin domain-containing protein 5

C_570 9 P55060 CSE1L Exportin-2

C_570 9 Q8IWV7 UBR1 E3 ubiquitin-protein ligase UBR1

C_570 9 O43598 RCL Deoxyribonucleoside 5'-monophosphate N-glycosidase

C_571 9 Q96QR8 PURB Transcriptional activator protein Pur-beta

C_571 9 Q00577 PURA Transcriptional activator protein Pur-alpha

C_571 9 Q9BVC3 DSCC1 Sister chromatid cohesion protein DCC1

C_571 9 P40937 RFC5 Replication factor C subunit 5

C_571 9 P35249 RFC4 Replication factor C subunit 4

C_571 9 P40938 RFC3 Replication factor C subunit 3

C_571 9 P35250 RFC2 Replication factor C subunit 2

C_571 9 P35251 RFC1 Replication factor C subunit 1

C_571 9 Q8WVB6 CHTF18 Chromosome transmission fidelity protein 18 homolog

C_572 9 P18074 ERCC2 TFIIH basal transcription factor complex helicase XPD subunit

C_572 9 P19447 ERCC3 TFIIH basal transcription factor complex helicase XPB subunit

C_572 9 Q92759 GTF2H4 General transcription factor IIH subunit 4

C_572 9 Q13889 GTF2H3 General transcription factor IIH subunit 3

C_572 9 Q13888 GTF2H2 General transcription factor IIH subunit 2

C_572 9 P32780 GTF2H1 General transcription factor IIH subunit 1

C_572 9 P51946 CCNH Cyclin-H

C_572 9 P50613 CDK7 Cyclin-dependent kinase 7

C_572 9 P51948 MNAT1 CDK-activating kinase assembly factor MAT1

C_573 9 Q99623 PHB2 Prohibitin-2

C_573 9 P17813 ENG Endoglin

C_573 9 P49411 TUFM Elongation factor Tu, mitochondrial

C_573 9 P33316 DUT Deoxyuridine 5'-triphosphate nucleotidohydrolase, mitochondrial

C_573 9 Q9NWU5 MRPL22 39S ribosomal protein L22, mitochondrial

C_573 9 Q6P1L8 MRPL14 39S ribosomal protein L14, mitochondrial

C_573 9 O60783 MRPS14 28S ribosomal protein S14, mitochondrial

C_573 9 P82912 MRPS11 28S ribosomal protein S11, mitochondrial

C_573 9 P82664 MRPS10 28S ribosomal protein S10, mitochondrial

C_574 9 O75396 SEC22B Vesicle-trafficking protein SEC22b

C_574 9 Q96I99 SUCLG2 Succinyl-CoA ligase [GDP-forming] subunit beta, mitochondrial

C_574 9 P53597 SUCLG1 Succinyl-CoA ligase [ADP/GDP-forming] subunit alpha, mitochondrial

C_574 9 Q13586 STIM1 Stromal interaction molecule 1

C_574 9 Q9UHD8 SEPT9 Septin-9

C_574 9 Q9NVA2 SEPT11 Septin-11

C_574 9 Q96FQ6 S100A16 Protein S100-A16

C_574 9 Q15013 MAD2L1BP MAD2L1-binding protein

C_574 9 Q96AC1 FERMT2 Fermitin family homolog 2

C_575 9 A5PLL7 TMEM189 Transmembrane protein 189

C_575 9 O43464 HTRA2 Serine protease HTRA2, mitochondrial

C_575 9 P00491 PNP Purine nucleoside phosphorylase

C_575 9 P49821 NDUFV1 NADH dehydrogenase [ubiquinone] flavoprotein 1, mitochondrial

C_575 9 Q4G0N4 NADKD1 NAD kinase domain-containing protein 1

C_575 9 Q9Y5J7 TIMM9 Mitochondrial import inner membrane translocase subunit Tim9

C_575 9 Q96PL5 ERMAP Erythroid membrane-associated protein

C_575 9 Q9UBS4 DNAJB11 DnaJ homolog subfamily B member 11

C_575 9 P25325 MPST 3-mercaptopyruvate sulfurtransferase

C_576 9 Q9BTM9 URM1 Ubiquitin-related modifier 1 homolog

C_576 9 P62256 UBE2H Ubiquitin-conjugating enzyme E2 H

C_576 9 P45974 USP5 Ubiquitin carboxyl-terminal hydrolase 5

C_576 9 O75663 TIPRL TIP41-like protein

C_576 9 Q9Y6Y8 SEC23IP SEC23-interacting protein

C_576 9 Q9NP77 SSU72 RNA polymerase II subunit A C-terminal domain phosphatase SSU72

C_576 9 P32119 PRDX2 Peroxiredoxin-2

C_576 9 Q06830 PRDX1 Peroxiredoxin-1

C_576 9 O43598 RCL Deoxyribonucleoside 5'-monophosphate N-glycosidase

C_577 9 P61158 ACTR3 Actin-related protein 3

C_577 9 Q9BPX5 ARPC5L Actin-related protein 2/3 complex subunit 5-like protein

C_577 9 O15511 ARPC5 Actin-related protein 2/3 complex subunit 5

C_577 9 P59998 ARPC4 Actin-related protein 2/3 complex subunit 4

C_577 9 O15145 ARPC3 Actin-related protein 2/3 complex subunit 3

C_577 9 O15144 ARPC2 Actin-related protein 2/3 complex subunit 2

C_577 9 O15143 ARPC1B Actin-related protein 2/3 complex subunit 1B

C_577 9 Q92747 ARPC1A Actin-related protein 2/3 complex subunit 1A

C_577 9 P61160 ACTR2 Actin-related protein 2

C_578 9 Q9H0S4 DDX47 Probable ATP-dependent RNA helicase DDX47

C_578 9 O75475 PSIP1 PC4 and SFRS1-interacting protein

C_578 9 Q01780 EXOSC10 Exosome component 10

C_578 9 Q9NQT4 EXOSC5 Exosome complex component RRP46

C_578 9 Q06265 EXOSC9 Exosome complex component RRP45

C_578 9 Q9NPD3 EXOSC4 Exosome complex component RRP41

C_578 9 Q9NQT5 EXOSC3 Exosome complex component RRP40

C_578 9 Q13868 EXOSC2 Exosome complex component RRP4

C_578 9 P20810 CAST Calpastatin

C_579 9 Q99627 COPS8 COP9 signalosome complex subunit 8

C_579 9 Q9H9Q2 COPS7B COP9 signalosome complex subunit 7b

C_579 9 Q9UBW8 COPS7A COP9 signalosome complex subunit 7a

C_579 9 Q7L5N1 COPS6 COP9 signalosome complex subunit 6

C_579 9 Q92905 COPS5 COP9 signalosome complex subunit 5

C_579 9 Q9BT78 COPS4 COP9 signalosome complex subunit 4

C_579 9 Q9UNS2 COPS3 COP9 signalosome complex subunit 3

C_579 9 P61201 COPS2 COP9 signalosome complex subunit 2

C_579 9 Q13098 GPS1 COP9 signalosome complex subunit 1

C_580 10 Q9Y2X9 ZNF281 Zinc finger protein 281

C_580 10 Q9UBW7 ZMYM2 Zinc finger MYM-type protein 2

C_580 10 Q5T200 ZC3H13 Zinc finger CCCH domain-containing protein 13

C_580 10 Q9Y2W2 WBP11 WW domain-binding protein 11

C_580 10 Q7Z5K2 WAPAL Wings apart-like protein homolog

C_580 10 Q9Y4E8 USP15 Ubiquitin carboxyl-terminal hydrolase 15

C_580 10 Q14694 USP10 Ubiquitin carboxyl-terminal hydrolase 10

C_580 10 Q7Z2W7 TRPM8 Transient receptor potential cation channel subfamily M member 8

C_580 10 Q9UJT2 TSKS Testis-specific serine kinase substrate

C_580 10 Q96JM3 CHAMP1 Chromosome alignment-maintaining phosphoprotein 1

C_581 10 Q99757 TXN2 Thioredoxin, mitochondrial

C_581 10 Q7KZF4 SND1 Staphylococcal nuclease domain-containing protein 1

C_581 10 Q9NP81 SARS2 Serine--tRNA ligase, mitochondrial

C_581 10 Q969G5 PRKCDBP Protein kinase C delta-binding protein

C_581 10 O95758 PTBP3 Polypyrimidine tract-binding protein 3

C_581 10 Q9UMX5 NENF Neudesin

C_581 10 Q8IWA4 MFN1 Mitofusin-1

C_581 10 Q9NS69 TOMM22 Mitochondrial import receptor subunit TOM22 homolog

C_581 10 Q12905 ILF2 Interleukin enhancer-binding factor 2

C_581 10 O00461 GOLIM4 Golgi integral membrane protein 4

C_582 10 Q13263 TRIM28 Transcription intermediary factor 1-beta

C_582 10 Q92526 CCT6B T-complex protein 1 subunit zeta-2

C_582 10 P40227 CCT6A T-complex protein 1 subunit zeta

C_582 10 P50990 CCT8 T-complex protein 1 subunit theta

C_582 10 P49368 CCT3 T-complex protein 1 subunit gamma

C_582 10 Q99832 CCT7 T-complex protein 1 subunit eta

C_582 10 P48643 CCT5 T-complex protein 1 subunit epsilon

C_582 10 P50991 CCT4 T-complex protein 1 subunit delta

C_582 10 P78371 CCT2 T-complex protein 1 subunit beta

C_582 10 P17987 TCP1 T-complex protein 1 subunit alpha

C_583 10 Q8IY67 RAVER1 Ribonucleoprotein PTB-binding 1

C_583 10 O60568 PLOD3 Procollagen-lysine,2-oxoglutarate 5-dioxygenase 3

C_583 10 Q93100 PHKB Phosphorylase b kinase regulatory subunit beta

C_583 10 P51808 DYNLT3 Dynein light chain Tctex-type 3

C_583 10 P63172 DYNLT1 Dynein light chain Tctex-type 1

C_583 10 Q9NP97 DYNLRB1 Dynein light chain roadblock-type 1

C_583 10 P63167 DYNLL1 Dynein light chain 1, cytoplasmic

C_583 10 O43237 DYNC1LI2 Cytoplasmic dynein 1 light intermediate chain 2

C_583 10 Q9Y6G9 DYNC1LI1 Cytoplasmic dynein 1 light intermediate chain 1

C_583 10 Q14204 DYNC1H1 Cytoplasmic dynein 1 heavy chain 1

C_584 10 P62805 HIST1H4A Histone H4

C_584 10 P84243 H3F3A/B Histone H3.3

C_584 10 P62807 HIST1H2BC Histone H2B type 1-C/E/F/G/I

C_584 10 P33778 HIST1H2BB Histone H2B type 1-B

C_584 10 Q96A08 HIST1H2BA Histone H2B type 1-A

C_584 10 Q71UI9 H2AFV Histone H2A.V

C_584 10 P20671 HIST1H2AD Histone H2A type 1-D

C_584 10 P04908 HIST1H2AB; HIST1H2AE Histone H2A type 1-B/E

C_584 10 Q96QV6 HIST1H2AA Histone H2A type 1-A

C_584 10 P16403 HIST1H1C Histone H1.2

C_585 10 Q8TAD4 SLC30A5 Zinc transporter 5

C_585 10 Q9Y5K8 ATP6V1D V-type proton ATPase subunit D

C_585 10 Q13488 TCIRG1 V-type proton ATPase 116 kDa subunit a isoform 3

C_585 10 Q9Y487 ATP6V0A2 V-type proton ATPase 116 kDa subunit a isoform 2

C_585 10 Q93050 ATP6V0A1 V-type proton ATPase 116 kDa subunit a isoform 1

C_585 10 Q9Y3E5 PTRH2 Peptidyl-tRNA hydrolase 2, mitochondrial

C_585 10 Q9BTT0 ANP32E Acidic leucine-rich nuclear phosphoprotein 32 family member E

C_585 10 P15880 RPS2 40S ribosomal protein S2

C_585 10 P62269 RPS18 40S ribosomal protein S18

C_585 10 P62841 RPS15 40S ribosomal protein S15

C_586 10 P00519 ABL1 Tyrosine-protein kinase ABL1

C_586 10 Q96C90 PPP1R14B Protein phosphatase 1 regulatory subunit 14B

C_586 10 O15212 PFDN6 Prefoldin subunit 6

C_586 10 Q99471 PFDN5 Prefoldin subunit 5

C_586 10 Q9NQP4 PFDN4 Prefoldin subunit 4

C_586 10 P61758 VBP1 Prefoldin subunit 3

C_586 10 Q9UHV9 PFDN2 Prefoldin subunit 2

C_586 10 O60925 PFDN1 Prefoldin subunit 1

C_586 10 Q9NUG6 PDRG1 p53 and DNA damage-regulated protein 1

C_586 10 Q9Y5Y2 NUBP2 Cytosolic Fe-S cluster assembly factor NUBP2

C_587 10 Q9UJT2 TSKS Testis-specific serine kinase substrate

C_587 10 Q15459 SF3A1 Splicing factor 3A subunit 1

C_587 10 Q8IXT5 RBM12B RNA-binding protein 12B

C_587 10 P33993 MCM7 DNA replication licensing factor MCM7

C_587 10 Q14566 MCM6 DNA replication licensing factor MCM6

C_587 10 P33992 MCM5 DNA replication licensing factor MCM5

C_587 10 P33991 MCM4 DNA replication licensing factor MCM4

C_587 10 P25205 MCM3 DNA replication licensing factor MCM3

C_587 10 P49736 MCM2 DNA replication licensing factor MCM2

C_587 10 Q06203 PPAT Amidophosphoribosyltransferase

C_588 10 Q86YP4 GATAD2A Transcriptional repressor p66-alpha

C_588 10 Q15022 SUZ12 Polycomb protein SUZ12

C_588 10 O95983 MBD3 Methyl-CpG-binding domain protein 3

C_588 10 O94776 MTA2 Metastasis-associated protein MTA2

C_588 10 Q16576 RBBP7 Histone-binding protein RBBP7

C_588 10 Q09028 RBBP4 Histone-binding protein RBBP4

C_588 10 Q92769 HDAC2 Histone deacetylase 2

C_588 10 Q13547 HDAC1 Histone deacetylase 1

C_588 10 Q14839 CHD4 Chromodomain-helicase-DNA-binding protein 4

C_588 10 Q12873 CHD3 Chromodomain-helicase-DNA-binding protein 3

C_589 10 Q9P0S9 TMEM14C Transmembrane protein 14C

C_589 10 Q15526 SURF1 Surfeit locus protein 1

C_589 10 Q9Y512 SAMM50 Sorting and assembly machinery component 50 homolog

C_589 10 O00217 NDUFS8 NADH dehydrogenase [ubiquinone] iron-sulfur protein 8, mitochondrial

C_589 10 O75489 NDUFS3 NADH dehydrogenase [ubiquinone] iron-sulfur protein 3, mitochondrial

C_589 10 Q9BU61 NDUFAF3 NADH dehydrogenase [ubiquinone] 1 alpha subcomplex assembly factor 3

C_589 10 Q02127 DHODH Dihydroorotate dehydrogenase

C_589 10 P08574 CYC1 Cytochrome c1, heme protein, mitochondrial

C_589 10 P00403 MT-CO2 Cytochrome c oxidase subunit 2

C_589 10 Q7KZN9 COX15 Cytochrome c oxidase assembly protein COX15 homolog

C_590 10 O75717 WDHD1 WD repeat and HMG-box DNA-binding protein 1

C_590 10 Q3ZAQ7 VMA21 Vacuolar ATPase assembly integral membrane protein

C_590 10 Q03405 PLAUR Urokinase plasminogen activator surface receptor

C_590 10 O95425 SVIL Supervillin

C_590 10 P55809 OXCT1 Succinyl-CoA:3-ketoacid coenzyme A transferase 1

C_590 10 Q9NX18 SDHAF2 Succinate dehydrogenase assembly factor 2

C_590 10 O95104 SCAF4 Splicing factor, arginine/serine-rich 15

C_590 10 Q13228 SELENBP1 Selenium-binding protein 1

C_590 10 Q96BR5 SELRC1 Sel1 repeat-containing protein 1

C_590 10 Q8NHP8 PLBD2 Putative phospholipase B-like 2

C_591 10 Q9H9B4 SFXN1 Sideroflexin-1

C_591 10 Q6KC79 NIPBL Nipped-B-like protein

C_591 10 O94826 TOMM70A Mitochondrial import receptor subunit TOM70

C_591 10 Q9P0U1 TOMM7 Mitochondrial import receptor subunit TOM7 homolog

C_591 10 O96008 TOMM40 Mitochondrial import receptor subunit TOM40 homolog

C_591 10 Q9NS69 TOMM22 Mitochondrial import receptor subunit TOM22 homolog

C_591 10 Q15388 TOMM20 Mitochondrial import receptor subunit TOM20 homolog

C_591 10 Q3ZCQ8 TIMM50 Mitochondrial import inner membrane translocase subunit TIM50

C_591 10 Q16611 BAK1 Bcl-2 homologous antagonist/killer

C_591 10 Q9NP58 ABCB6 ATP-binding cassette sub-family B member 6

C_592 10 O75410 TACC1 Transforming acidic coiled-coil-containing protein 1

C_592 10 P19623 SRM Spermidine synthase

C_592 10 P31949 S100A11 Protein S100-A11

C_592 10 Q99622 C12orf57 Protein C10

C_592 10 O15031 PLXNB2 Plexin-B2

C_592 10 Q9Y680 FKBP7 Peptidyl-prolyl cis-trans isomerase FKBP7

C_592 10 Q9GZT8 NIF3L1 NIF3-like protein 1

C_592 10 Q969M7 UBE2F NEDD8-conjugating enzyme UBE2F

C_592 10 P21399 ACO1 Cytoplasmic aconitate hydratase

C_592 10 Q5TFE4 NT5DC1 5'-nucleotidase domain-containing protein 1

C_593 10 P30260 CDC27 Cell division cycle protein 27 homolog

C_593 10 Q9UJX2 CDC23 Cell division cycle protein 23 homolog

C_593 10 Q13042 CDC16 Cell division cycle protein 16 homolog

C_593 10 Q9UJX3 ANAPC7 Anaphase-promoting complex subunit 7

C_593 10 Q9UJX4 ANAPC5 Anaphase-promoting complex subunit 5

C_593 10 Q9UJX5 ANAPC4 Anaphase-promoting complex subunit 4

C_593 10 Q9UJX6 ANAPC2 Anaphase-promoting complex subunit 2

C_593 10 Q9BS18 ANAPC13 Anaphase-promoting complex subunit 13

C_593 10 Q9UM13 ANAPC10 Anaphase-promoting complex subunit 10

C_593 10 Q9H1A4 ANAPC1 Anaphase-promoting complex subunit 1

C_594 10 Q92541 RTF1 RNA polymerase-associated protein RTF1 homolog

C_594 10 Q8WVC0 LEO1 RNA polymerase-associated protein LEO1

C_594 10 Q6PD62 CTR9 RNA polymerase-associated protein CTR9 homolog

C_594 10 Q8N7H5 PAF1 RNA polymerase II-associated factor 1 homolog

C_594 10 Q6P1J9 CDC73 Parafibromin

C_594 10 Q08945 SSRP1 FACT complex subunit SSRP1

C_594 10 Q9Y5B9 SUPT16H FACT complex subunit SPT16

C_594 10 P67870 CSNK2B Casein kinase II subunit beta

C_594 10 P19784 CSNK2A2 Casein kinase II subunit alpha'

C_594 10 P68400 CSNK2A1 Casein kinase II subunit alpha

C_595 11 O60293 ZFC3H1 Zinc finger C3H1 domain-containing protein

C_595 11 Q15906 VPS72 Vacuolar protein sorting-associated protein 72 homolog

C_595 11 Q9Y230 RUVBL2 RuvB-like 2

C_595 11 Q9Y265 RUVBL1 RuvB-like 1

C_595 11 Q9UBU8 MORF4L1 Mortality factor 4-like protein 1

C_595 11 Q6ZRS2 SRCAP Helicase SRCAP

C_595 11 Q96L91 EP400 E1A-binding protein p400

C_595 11 Q9NPF5 DMAP1 DNA methyltransferase 1-associated protein 1

C_595 11 Q9UEE9 CFDP1 Craniofacial development protein 1

C_595 11 Q9H9F9 ACTR5 Actin-related protein 5

C_595 11 P60709 ACTB Actin, cytoplasmic 1

C_596 11 Q9BUF5 TUBB6 Tubulin beta-6 chain

C_596 11 P68371 TUBB4B Tubulin beta-4B chain

C_596 11 Q13509 TUBB3 Tubulin beta-3 chain

C_596 11 Q13885 TUBB2A Tubulin beta-2A chain

C_596 11 P07437 TUBB Tubulin beta chain

C_596 11 P68366 TUBA4A Tubulin alpha-4A chain

C_596 11 Q9BQE3 TUBA1C Tubulin alpha-1C chain

C_596 11 P68363 TUBA1B Tubulin alpha-1B chain

C_596 11 Q71U36 TUBA1A Tubulin alpha-1A chain

C_596 11 Q99867 TBB4Q Putative tubulin beta-4q chain

C_596 11 P27708 CAD CAD protein

C_597 11 O75396 SEC22B Vesicle-trafficking protein SEC22b

C_597 11 Q13428 TCOF1 Treacle protein

C_597 11 Q9P0T7 TMEM9 Transmembrane protein 9

C_597 11 P49755 TMED10 Transmembrane emp24 domain-containing protein 10

C_597 11 Q16762 TST Thiosulfate sulfurtransferase

C_597 11 P55809 OXCT1 Succinyl-CoA:3-ketoacid coenzyme A transferase 1, mitochondrial

C_597 11 Q13586 STIM1 Stromal interaction molecule 1

C_597 11 Q9NVA2 SEPT11 Septin-11

C_597 11 Q96FQ6 S100A16 Protein S100-A16

C_597 11 P50897 PPT1 Palmitoyl-protein thioesterase 1

C_597 11 P18887 XRCC1 DNA repair protein XRCC1

C_598 11 Q8NFH5 NUP35 Nucleoporin NUP53

C_598 11 Q969V3 NCLN Nicalin

C_598 11 P10253 GAA Lysosomal alpha-glucosidase

C_598 11 Q96AG4 LRRC59 Leucine-rich repeat-containing protein 59

C_598 11 O95202 LETM1 LETM1 and EF-hand domain-containing protein 1, mitochondrial

C_598 11 Q12906 ILF3 Interleukin enhancer-binding factor 3

C_598 11 Q9NZI8 IGF2BP1 Insulin-like growth factor 2 mRNA-binding protein 1

C_598 11 O60506 SYNCRIP Heterogeneous nuclear ribonucleoprotein Q

C_598 11 Q8TCJ2 STT3B Dolichyl-diphosphooligosaccharide--protein glycosyltransferase subunit STT3B

C_598 11 P04844 RPN2 Dolichyl-diphosphooligosaccharide--protein glycosyltransferase subunit 2

C_598 11 P04843 RPN1 Dolichyl-diphosphooligosaccharide--protein glycosyltransferase subunit 1

C_599 12 O95292 VAPB Vesicle-associated membrane protein-associated protein B/C

C_599 12 Q5JTV8 TOR1AIP1 Torsin-1A-interacting protein 1

C_599 12 Q5UIP0 RIF1 Telomere-associated protein RIF1

C_599 12 P55809 OXCT1 Succinyl-CoA:3-ketoacid coenzyme A transferase 1

C_599 12 Q96I99 SUCLG2 Succinyl-CoA ligase [GDP-forming] subunit beta

C_599 12 Q9P2R7 SUCLA2 Succinyl-CoA ligase [ADP-forming] subunit beta

C_599 12 P53597 SUCLG1 Succinyl-CoA ligase [ADP/GDP-forming] subunit alpha

C_599 12 Q96FQ6 S100A16 Protein S100-A16

C_599 12 Q5JRX3 PITRM1 Presequence protease, mitochondrial

C_599 12 P51688 SGSH N-sulphoglucosamine sulphohydrolase

C_599 12 P50213 IDH3A Isocitrate dehydrogenase [NAD] subunit alpha

C_599 12 Q96FN4 CPNE2 Copine-2

C_600 12 Q7L2H7 EIF3M Eukaryotic translation initiation factor 3 subunit M

C_600 12 Q9Y262 EIF3L Eukaryotic translation initiation factor 3 subunit L

C_600 12 Q9UBQ5 EIF3K Eukaryotic translation initiation factor 3 subunit K

C_600 12 Q13347 EIF3I Eukaryotic translation initiation factor 3 subunit I

C_600 12 O15372 EIF3H Eukaryotic translation initiation factor 3 subunit H

C_600 12 O75821 EIF3G Eukaryotic translation initiation factor 3 subunit G

C_600 12 O00303 EIF3F Eukaryotic translation initiation factor 3 subunit F

C_600 12 P60228 EIF3E Eukaryotic translation initiation factor 3 subunit E

C_600 12 O15371 EIF3D Eukaryotic translation initiation factor 3 subunit D

C_600 12 Q99613 EIF3C; EIF3CL Eukaryotic translation initiation factor 3 subunit C

C_600 12 P55884 EIF3B Eukaryotic translation initiation factor 3 subunit B

C_600 12 Q14152 EIF3A Eukaryotic translation initiation factor 3 subunit A

C_601 12 P52434 POLR2H DNA-directed RNA polymerases I, II, and III subunit RPABC3

C_601 12 P19388 POLR2E DNA-directed RNA polymerases I, II, and III subunit RPABC1

C_601 12 Q9Y2S0 POLR1D DNA-directed RNA polymerases I and III subunit RPAC2

C_601 12 O15160 POLR1C DNA-directed RNA polymerases I and III subunit RPAC1

C_601 12 O75575 CRCP DNA-directed RNA polymerase III subunit RPC9

C_601 12 Q9Y535 POLR3H DNA-directed RNA polymerase III subunit RPC8

C_601 12 P05423 POLR3D DNA-directed RNA polymerase III subunit RPC4

C_601 12 Q9BUI4 POLR3C DNA-directed RNA polymerase III subunit RPC3

C_601 12 Q9NW08 POLR3B DNA-directed RNA polymerase III subunit RPC2

C_601 12 O14802 POLR3A DNA-directed RNA polymerase III subunit RPC1

C_601 12 Q9H9Y6 POLR1B DNA-directed RNA polymerase I subunit RPA2

C_601 12 O95602 POLR1A DNA-directed RNA polymerase I subunit RPA1

C_602 13 Q9HBM6 TAF9B Transcription initiation factor TFIID subunit 9B

C_602 13 Q16594 TAF9 Transcription initiation factor TFIID subunit 9

C_602 13 Q15545 TAF7 Transcription initiation factor TFIID subunit 7

C_602 13 P49848 TAF6 Transcription initiation factor TFIID subunit 6

C_602 13 Q15542 TAF5 Transcription initiation factor TFIID subunit 5

C_602 13 O00268 TAF4 Transcription initiation factor TFIID subunit 4

C_602 13 Q5VWG9 TAF3 Transcription initiation factor TFIID subunit 3

C_602 13 Q6P1X5 TAF2 Transcription initiation factor TFIID subunit 2

C_602 13 Q8IZX4 TAF1L Transcription initiation factor TFIID subunit 1-like

C_602 13 Q15543 TAF13 Transcription initiation factor TFIID subunit 13

C_602 13 Q12962 TAF10 Transcription initiation factor TFIID subunit 10

C_602 13 P21675 TAF1 Transcription initiation factor TFIID subunit 1

C_602 13 P20226 TBP TATA-box-binding protein

C_603 13 P28331 NDUFS1 NADH-ubiquinone oxidoreductase 75 kDa subunit

C_603 13 O75251 NDUFS7 NADH dehydrogenase iron-sulfur protein 7

C_603 13 Q9Y6M9 NDUFB9 NADH dehydrogenase 1 beta subcomplex subunit 9

C_603 13 O43674 NDUFB5 NADH dehydrogenase 1 beta subcomplex subunit 5

C_603 13 O96000 NDUFB10 NADH dehydrogenase 1 beta subcomplex subunit 10

C_603 13 Q9UI09 NDUFA12 NADH dehydrogenase 1 alpha subcomplex subunit 12

C_603 13 P40926 MDH2 Malate dehydrogenase, mitochondrial

C_603 13 P48047 ATP5O ATP synthase subunit O, mitochondrial

C_603 13 P36542 ATP5C1 ATP synthase subunit gamma, mitochondrial

C_603 13 P30049 ATP5D ATP synthase subunit delta, mitochondrial

C_603 13 P06576 ATP5B ATP synthase subunit beta, mitochondrial

C_603 13 P24539 ATP5F1 ATP synthase subunit b, mitochondrial

C_603 13 P25705 ATP5A1 ATP synthase subunit alpha, mitochondrial

C_604 14 P63000 RAC1 Ras-related C3 botulinum toxin substrate 1

C_604 14 Q96C90 PPP1R14B Protein phosphatase 1 regulatory subunit 14B

C_604 14 O15212 PFDN6 Prefoldin subunit 6

C_604 14 Q9UHV9 PFDN2 Prefoldin subunit 2

C_604 14 O60925 PFDN1 Prefoldin subunit 1

C_604 14 Q9BPZ3 PAIP2 Polyadenylate-binding protein-interacting protein 2

C_604 14 Q9H074 PAIP1 Polyadenylate-binding protein-interacting protein 1

C_604 14 Q13310 PABPC4 Polyadenylate-binding protein 4

C_604 14 Q96G74 OTUD5 OTU domain-containing protein 5

C_604 14 Q8WXI7 MUC16 Mucin-16

C_604 14 O95373 IPO7 Importin-7

C_604 14 P78318 IGBP1 Immunoglobulin-binding protein 1

C_604 14 Q9Y5Y2 NUBP2 Cytosolic Fe-S cluster assembly factor NUBP2

C_604 14 P12532 CKMT1A/B Creatine kinase U-type, mitochondrial

C_605 14 Q9NP81 SARS2 Serine--tRNA ligase, mitochondrial

C_605 14 P11498 PC Pyruvate carboxylase, mitochondrial

C_605 14 Q969G5 PRKCDBP Protein kinase C delta-binding protein

C_605 14 O95758 PTBP3 Polypyrimidine tract-binding protein 3

C_605 14 P41219 PRPH Peripherin

C_605 14 Q9UMX5 NENF Neudesin

C_605 14 P43243 MATR3 Matrin-3

C_605 14 Q9UMR5 PPT2 Lysosomal thioesterase PPT2

C_605 14 P10253 GAA Lysosomal alpha-glucosidase

C_605 14 Q9NSE4 IARS2 Isoleucine--tRNA ligase, mitochondrial

C_605 14 O43852 CALU Calumenin

C_605 14 P19022 CDH2 Cadherin-2

C_605 14 P11586 MTHFD1 C-1-tetrahydrofolate synthase, cytoplasmic

C_605 14 Q13405 MRPL49 39S ribosomal protein L49, mitochondrial

C_606 14 P03915 MT-ND5 NADH-ubiquinone oxidoreductase chain 5

C_606 14 P03886 MT-ND1 NADH-ubiquinone oxidoreductase chain 1

C_606 14 P28331 NDUFS1 NADH-ubiquinone oxidoreductase 75 kDa subunit, mitochondrial

C_606 14 O00217 NDUFS8 NADH dehydrogenase iron-sulfur protein 8, mitochondrial

C_606 14 O75251 NDUFS7 NADH dehydrogenase iron-sulfur protein 7, mitochondrial

C_606 14 O75489 NDUFS3 NADH dehydrogenase iron-sulfur protein 3, mitochondrial

C_606 14 O75306 NDUFS2 NADH dehydrogenase iron-sulfur protein 2, mitochondrial

C_606 14 P56181 NDUFV3 NADH dehydrogenase flavoprotein 3, mitochondrial

C_606 14 P19404 NDUFV2 NADH dehydrogenase flavoprotein 2, mitochondrial

C_606 14 P49821 NDUFV1 NADH dehydrogenase flavoprotein 1, mitochondrial

C_606 14 O96000 NDUFB10 NADH dehydrogenase 1 beta subcomplex subunit 10

C_606 14 Q16718 NDUFA5 NADH dehydrogenase 1 alpha subcomplex subunit 5

C_606 14 P61916 NPC2 Epididymal secretory protein E1

C_606 14 P16989 CSDA DNA-binding protein A

C_607 15 Q8ND56 LSM14A Protein LSM14 homolog A

C_607 15 P26599 PTBP1 Polypyrimidine tract-binding protein 1

C_607 15 Q09666 AHNAK Neuroblast differentiation-associated protein AHNAK

C_607 15 P22626 HNRNPA2B1 Heterogeneous nuclear ribonucleoproteins A2/B1

C_607 15 Q9BUJ2 HNRNPUL1 Heterogeneous nuclear ribonucleoprotein U-like protein 1

C_607 15 P14866 HNRNPL Heterogeneous nuclear ribonucleoprotein L

C_607 15 P31942 HNRNPH3 Heterogeneous nuclear ribonucleoprotein H3

C_607 15 P55795 HNRNPH2 Heterogeneous nuclear ribonucleoprotein H2

C_607 15 P31943 HNRNPH1 Heterogeneous nuclear ribonucleoprotein H

C_607 15 P52597 HNRNPF Heterogeneous nuclear ribonucleoprotein F

C_607 15 O14979 HNRPDL Heterogeneous nuclear ribonucleoprotein D-like

C_607 15 Q14103 HNRNPD Heterogeneous nuclear ribonucleoprotein D0

C_607 15 P51991 HNRNPA3 Heterogeneous nuclear ribonucleoprotein A3

C_607 15 P09651 HNRNPA1 Heterogeneous nuclear ribonucleoprotein A1

C_607 15 Q13151 HNRNPA0 Heterogeneous nuclear ribonucleoprotein A0

C_608 16 Q9UHR6 ZNHIT2 Zinc finger HIT domain-containing protein 2

C_608 16 Q00341 HDLBP Vigilin

C_608 16 Q86VN1 VPS36 Vacuolar protein-sorting-associated protein 36

C_608 16 Q9BRG1 VPS25 Vacuolar protein-sorting-associated protein 25

C_608 16 Q96QK1 VPS35 Vacuolar protein sorting-associated protein 35

C_608 16 Q9UBQ0 VPS29 Vacuolar protein sorting-associated protein 29

C_608 16 Q4G0F5 VPS26B Vacuolar protein sorting-associated protein 26B

C_608 16 O75436 VPS26A Vacuolar protein sorting-associated protein 26A

C_608 16 Q06418 TYRO3 Tyrosine-protein kinase receptor TYRO3

C_608 16 Q96FX7 TRMT61A tRNA (adenine(58)-N(1))-methyltransferase catalytic subunit TRMT61A

C_608 16 Q9UJA5 TRMT6 tRNA (adenine(58)-N(1))-methyltransferase non-catalytic subunit TRM6

C_608 16 O95801 TTC4 Tetratricopeptide repeat protein 4

C_608 16 O60749 SNX2 Sorting nexin-2

C_608 16 Q92900 UPF1 Regulator of nonsense transcripts 1

C_608 16 Q8TAE6 PPP1R14C Protein phosphatase 1 regulatory subunit 14C

C_608 16 Q9UKK6 NXT1 NTF2-related export protein 1

C_609 17 Q9Y2L8 ZKSCAN5 Zinc finger protein with KRAB and SCAN domains 5

C_609 17 Q8TBF4 ZCRB1 Zinc finger CCHC-type and RNA-binding motif-containing protein 1

C_609 17 Q8WU90 ZC3H15 Zinc finger CCCH domain-containing protein 15

C_609 17 P08670 VIM Vimentin

C_609 17 P46939 UTRN Utrophin

C_609 17 Q15386 UBE3C Ubiquitin-protein ligase E3C

C_609 17 Q92995 USP13 Ubiquitin carboxyl-terminal hydrolase 13

C_609 17 O43818 RRP9 U3 small nucleolar RNA-interacting protein 2

C_609 17 Q6I9Y2 THOC7 THO complex subunit 7 homolog

C_609 17 Q86W42 THOC6 THO complex subunit 6 homolog

C_609 17 Q13769 THOC5 THO complex subunit 5 homolog

C_609 17 Q96J01 THOC3 THO complex subunit 3

C_609 17 Q8NI27 THOC2 THO complex subunit 2

C_609 17 Q96FV9 THOC1 THO complex subunit 1

C_609 17 Q8N806 UBR7 Putative E3 ubiquitin-protein ligase UBR7

C_609 17 O94829 IPO13 Importin-13

C_609 17 Q9C0E2 XPO4 Exportin-4

C_610 18 P22314 UBA1 Ubiquitin-like modifier-activating enzyme 1

C_610 18 Q9NXH9 TRMT1 tRNA (guanine(26)-N(2))-dimethyltransferase

C_610 18 P61758 VBP1 Prefoldin subunit 3

C_610 18 P56192 MARS Methionine--tRNA ligase, cytoplasmic

C_610 18 Q15046 KARS Lysine--tRNA ligase

C_610 18 Q9P2J5 LARS Leucine--tRNA ligase, cytoplasmic

C_610 18 P41252 IARS Isoleucine--tRNA ligase, cytoplasmic

C_610 18 P41250 GARS Glycine--tRNA ligase

C_610 18 P47897 QARS Glutamine--tRNA ligase

C_610 18 O43324 EEF1E1 Eukaryotic translation elongation factor 1 epsilon-1

C_610 18 P07814 EPRS Bifunctional glutamate/proline--tRNA ligase

C_610 18 P14868 DARS Aspartate--tRNA ligase, cytoplasmic

C_610 18 O43776 NARS Asparagine--tRNA ligase, cytoplasmic

C_610 18 P08243 ASNS Asparagine synthetase [glutamine-hydrolyzing]

C_610 18 P54136 RARS Arginine--tRNA ligase, cytoplasmic

C_610 18 Q13155 AIMP2 Aminoacyl tRNA synthase complex-interacting multifunctional protein 2

C_610 18 Q12904 AIMP1 Aminoacyl tRNA synthase complex-interacting multifunctional protein 1

C_610 18 P49588 AARS Alanine--tRNA ligase, cytoplasmic

C_611 20 Q5JSH3 WDR44 WD repeat-containing protein 44

C_611 20 Q9UK45 LSM7 U6 snRNA-associated Sm-like protein LSm7

C_611 20 P62312 LSM6 U6 snRNA-associated Sm-like protein LSm6

C_611 20 Q9Y4Y9 LSM5 U6 snRNA-associated Sm-like protein LSm5

C_611 20 Q9Y4Z0 LSM4 U6 snRNA-associated Sm-like protein LSm4

C_611 20 P62310 LSM3 U6 snRNA-associated Sm-like protein LSm3

C_611 20 Q9Y333 LSM2 U6 snRNA-associated Sm-like protein LSm2

C_611 20 Q00403 GTF2B Transcription initiation factor IIB

C_611 20 Q15560 TCEA2 Transcription elongation factor A protein 2

C_611 20 Q96E14 RMI2 RecQ-mediated genome instability protein 2

C_611 20 Q8WU10 PYROXD1 Pyridine nucleotide-disulfide oxidoreductase domain-containing protein 1

C_611 20 O60231 DHX16 Putative pre-mRNA-splicing factor ATP-dependent RNA helicase DHX16

C_611 20 Q99633 PRPF18 Pre-mRNA-splicing factor 18

C_611 20 O95777 NAA38 N-alpha-acetyltransferase 38, NatC auxiliary subunit

C_611 20 Q99707 MTR Methionine synthase

C_611 20 Q8TDG4 HELQ Helicase POLQ-like

C_611 20 Q8IWJ2 GCC2 GRIP and coiled-coil domain-containing protein 2

C_611 20 Q3T8J9 GON4L GON-4-like protein

C_611 20 Q99747 NAPG Gamma-soluble NSF attachment protein

C_611 20 Q9NZN4 EHD2 EH domain-containing protein 2

C_612 20 Q92782 DPF1 Zinc finger protein neuro-d4

C_612 20 Q9UIG0 BAZ1B Tyrosine-protein kinase BAZ1B

C_612 20 P51532 SMARCA4 Transcription activator BRG1

C_612 20 Q969G3 SMARCE1 SWI/SNF-related matrix-associated actin-dependent regulator of chromatin subfamily E member 1

C_612 20 Q6STE5 SMARCD3 SWI/SNF-related matrix-associated actin-dependent regulator of chromatin subfamily D member 3

C_612 20 Q92925 SMARCD2 SWI/SNF-related matrix-associated actin-dependent regulator of chromatin subfamily D member 2

C_612 20 Q96GM5 SMARCD1 SWI/SNF-related matrix-associated actin-dependent regulator of chromatin subfamily D member 1

C_612 20 Q12824 SMARCB1 SWI/SNF-related matrix-associated actin-dependent regulator of chromatin subfamily B member 1

C_612 20 Q8TAQ2 SMARCC2 SWI/SNF complex subunit SMARCC2

C_612 20 Q92922 SMARCC1 SWI/SNF complex subunit SMARCC1

C_612 20 Q15291 RBBP5 Retinoblastoma-binding protein 5

C_612 20 P51531 SMARCA2 Probable global transcription activator SNF2L2

C_612 20 Q8WUB8 PHF10 PHD finger protein 10

C_612 20 Q12830 BPTF Nucleosome-remodeling factor subunit BPTF

C_612 20 O75376 NCOR1 Nuclear receptor corepressor 1

C_612 20 Q9UGU5 HMGXB4 HMG domain-containing protein 4

C_612 20 Q02880 TOP2B DNA topoisomerase 2-beta

C_612 20 Q8NFD5 ARID1B AT-rich interactive domain-containing protein 1B

C_612 20 O14497 ARID1A AT-rich interactive domain-containing protein 1A

C_612 20 O96019 ACTL6A Actin-like protein 6A

C_613 20 Q9Y312 C20orf4 Uncharacterized protein C20orf4

C_613 20 Q96DI7 SNRNP40 U5 small nuclear ribonucleoprotein 40 kDa protein

C_613 20 O75643 SNRNP200 U5 small nuclear ribonucleoprotein 200 kDa helicase

C_613 20 P09234 SNRPC U1 small nuclear ribonucleoprotein C

C_613 20 P08621 SNRNP70 U1 small nuclear ribonucleoprotein 70 kDa

C_613 20 P14678 SNRPB Small nuclear ribonucleoprotein-associated proteins B and B'

C_613 20 P62318 SNRPD3 Small nuclear ribonucleoprotein Sm D3

C_613 20 P62316 SNRPD2 Small nuclear ribonucleoprotein Sm D2

C_613 20 P62314 SNRPD1 Small nuclear ribonucleoprotein Sm D1

C_613 20 P62304 SNRPE Small nuclear ribonucleoprotein E

C_613 20 Q8IYB3 SRRM1 Serine/arginine repetitive matrix protein 1

C_613 20 O95905 ECD Protein SGT1

C_613 20 Q6P2Q9 PRPF8 Pre-mRNA-processing-splicing factor 8

C_613 20 O94906 PRPF6 Pre-mRNA-processing factor 6

C_613 20 Q9UMS4 PRPF19 Pre-mRNA-processing factor 19

C_613 20 P27695 APEX1 DNA-(apurinic or apyrimidinic site) lyase

C_613 20 Q9BZJ0 CRNKL1 Crooked neck-like protein 1

C_613 20 Q9NX63 CHCHD3 Coiled-coil-helix-coiled-coil-helix domain-containing protein 3, mitochondrial

C_613 20 O95400 CD2BP2 CD2 antigen cytoplasmic tail-binding protein 2

C_613 20 P82909 MRPS36 28S ribosomal protein S36, mitochondrial

C_614 20 Q70CQ2 USP34 Ubiquitin carboxyl-terminal hydrolase 34

C_614 20 Q9Y6A5 TACC3 Transforming acidic coiled-coil-containing protein 3

C_614 20 Q04726 TLE3 Transducin-like enhancer protein 3

C_614 20 Q6ZVM7 TOM1L2 TOM1-like protein 2

C_614 20 Q15654 TRIP6 Thyroid receptor-interacting protein 6

C_614 20 Q6YHU6 THADA Thyroid adenoma-associated protein

C_614 20 Q9NZC9 SMARCAL1 SWI/SNF-related matrix-associated actin-dependent regulator of chromatin subfamily A-like protein 1

C_614 20 Q9UBT2 UBA2 SUMO-activating enzyme subunit 2

C_614 20 P50225 SULT1A1 Sulfotransferase 1A1

C_614 20 Q9BYN0 SRXN1 Sulfiredoxin-1

C_614 20 Q93045 STMN2 Stathmin-2

C_614 20 P63165 SUMO1 Small ubiquitin-related modifier 1

C_614 20 O60232 SSSCA1 Sjoegren syndrome/scleroderma autoantigen 1

C_614 20 Q92783 STAM Signal transducing adapter molecule 1

C_614 20 P42226 STAT6 Signal transducer and activator of transcription 6

C_614 20 O43865 AHCYL1 Putative adenosylhomocysteinase 2

C_614 20 O95486 SEC24A Protein transport protein Sec24A

C_614 20 O15355 PPM1G Protein phosphatase 1G

C_614 20 O76070 SNCG Gamma-synuclein

C_614 20 Q9UNE7 STUB1 E3 ubiquitin-protein ligase CHIP

C_615 20 Q9UBB9 TFIP11 Tuftelin-interacting protein 11

C_615 20 Q9UPN7 PPP6R1 Serine/threonine-protein phosphatase 6 regulatory subunit 1

C_615 20 Q96G25 MED8 Mediator of RNA polymerase II transcription subunit 8

C_615 20 O75586 MED6 Mediator of RNA polymerase II transcription subunit 6

C_615 20 Q9NPJ6 MED4 Mediator of RNA polymerase II transcription subunit 4

C_615 20 Q9NX70 MED29 Mediator of RNA polymerase II transcription subunit 29

C_615 20 Q9H204 MED28 Mediator of RNA polymerase II transcription subunit 28

C_615 20 Q6P2C8 MED27 Mediator of RNA polymerase II transcription subunit 27

C_615 20 O75448 MED24 Mediator of RNA polymerase II transcription subunit 24

C_615 20 Q15528 MED22 Mediator of RNA polymerase II transcription subunit 22

C_615 20 Q13503 MED21 Mediator of RNA polymerase II transcription subunit 21

C_615 20 Q9BUE0 MED18 Mediator of RNA polymerase II transcription subunit 18

C_615 20 Q9NVC6 MED17 Mediator of RNA polymerase II transcription subunit 17

C_615 20 Q9Y2X0 MED16 Mediator of RNA polymerase II transcription subunit 16

C_615 20 O60244 MED14 Mediator of RNA polymerase II transcription subunit 14

C_615 20 Q9UHV7 MED13 Mediator of RNA polymerase II transcription subunit 13

C_615 20 Q93074 MED12 Mediator of RNA polymerase II transcription subunit 12

C_615 20 Q9P086 MED11 Mediator of RNA polymerase II transcription subunit 11

C_615 20 Q9BTT4 MED10 Mediator of RNA polymerase II transcription subunit 10

C_615 20 Q15648 MED1 Mediator of RNA polymerase II transcription subunit 1

C_616 30 O75152 ZC3H11A Zinc finger CCCH domain-containing protein 11A

C_616 30 Q9Y277 VDAC3 Voltage-dependent anion-selective channel protein 3

C_616 30 P11441 UBL4A Ubiquitin-like protein 4A

C_616 30 P09234 SNRPC U1 small nuclear ribonucleoprotein C

C_616 30 Q01995 TAGLN Transgelin

C_616 30 Q9Y4P3 TBL2 Transducin beta-like protein 2

C_616 30 P04216 THY1 Thy-1 membrane glycoprotein

C_616 30 O15400 STX7 Syntaxin-7

C_616 30 Q8IWZ8 SUGP1 SURP and G-patch domain-containing protein 1

C_616 30 Q15637 SF1 Splicing factor 1

C_616 30 Q9BQ15 NABP2 SOSS complex subunit B1

C_616 30 O60493 SNX3 Sorting nexin-3

C_616 30 Q9Y5M8 SRPRB Signal recognition particle receptor subunit beta

C_616 30 P82979 SARNP SAP domain-containing ribonucleoprotein

C_616 30 Q9P2N5 RBM27 RNA-binding protein 27

C_616 30 P29558 RBMS1 RNA-binding motif, single-stranded-interacting protein 1

C_616 30 Q96E11 MRRF Ribosome-recycling factor, mitochondrial

C_616 30 Q9Y3A5 SBDS Ribosome maturation protein SBDS

C_616 30 P52758 HRSP12 Ribonuclease UK114

C_616 30 O43819 SCO2 Protein SCO2 homolog, mitochondrial

C_616 30 Q969X1 TMBIM1 Protein lifeguard 3

C_616 30 O60828 PQBP1 Polyglutamine-binding protein 1

C_616 30 P30405 PPIF Peptidyl-prolyl cis-trans isomerase F, mitochondrial

C_616 30 P23284 PPIB Peptidyl-prolyl cis-trans isomerase B

C_616 30 P22307 SCP2 Non-specific lipid-transfer protein

C_616 30 O95182 NDUFA7 NADH dehydrogenase 1 alpha subcomplex subunit 7

C_616 30 Q8TAP9 MPLKIP M-phase-specific PLK1-interacting protein

C_616 30 O43615 TIMM44 Mitochondrial import inner membrane translocase subunit TIM44

C_616 30 Q96EL3 MRPL53 39S ribosomal protein L53, mitochondrial

C_616 30 Q92665 MRPS31 28S ribosomal protein S31, mitochondrial

C_617 35 Q86VM9 ZC3H18 Zinc finger CCCH domain-containing protein 18

C_617 35 Q2TAY7 SMU1 WD40 repeat-containing protein SMU1

C_617 35 P04004 VTN Vitronectin

C_617 35 Q6EMK4 VASN Vasorin

C_617 35 Q53GS9 USP39 U4/U6.U5 tri-snRNP-associated protein 2

C_617 35 Q9BVJ6 UTP14A U3 small nucleolar RNA-associated protein 14 homolog A

C_617 35 Q13641 TPBG Trophoblast glycoprotein

C_617 35 Q9BYV6 TRIM55 Tripartite motif-containing protein 55

C_617 35 Q13595 TRA2A Transformer-2 protein homolog alpha

C_617 35 P10646 TFPI Tissue factor pathway inhibitor

C_617 35 Q9Y2W1 THRAP3 Thyroid hormone receptor-associated protein 3

C_617 35 O15260 SURF4 Surfeit locus protein 4

C_617 35 Q01081 U2AF1 Splicing factor U2AF 35 kDa subunit

C_617 35 Q16629 SRSF7 Serine/arginine-rich splicing factor 7

C_617 35 Q13243 SRSF5 Serine/arginine-rich splicing factor 5

C_617 35 Q05519 SRSF11 Serine/arginine-rich splicing factor 11

C_617 35 O75494 SRSF10 Serine/arginine-rich splicing factor 10

C_617 35 Q16181 SEPT7 Septin-7

C_617 35 Q15019 SEPT2 Septin-2

C_617 35 Q15424 SAFB Scaffold attachment factor B1

C_617 35 P18583 SON Protein SON

C_617 35 P06702 S100A9 Protein S100-A9

C_617 35 Q13123 IK Protein Red

C_617 35 P41223 BUD31 Protein BUD31 homolog

C_617 35 Q9Y3B4 SF3B14 Pre-mRNA branch site protein p14

C_617 35 Q9NR30 DDX21 Nucleolar RNA helicase 2

C_617 35 Q12857 NFIA Nuclear factor 1 A-type

C_617 35 Q92542 NCSTN Nicastrin

C_617 35 O00422 SAP18 Histone deacetylase complex subunit SAP18

C_617 35 Q12789 GTF3C1 General transcription factor 3C polypeptide 1

C_617 35 Q9Y5B9 SUPT16H FACT complex subunit SPT16

C_617 35 P43246 MSH2 DNA mismatch repair protein Msh2

C_617 35 Q5BKZ1 ZNF326 DBIRD complex subunit ZNF326

C_617 35 O43175 PHGDH D-3-phosphoglycerate dehydrogenase

C_617 35 Q9H2P0 ADNP Activity-dependent neuroprotector homeobox protein

C_618 38 O75643 SNRNP200 U5 small nuclear ribonucleoprotein 200 kDa helicase

C_618 38 O43290 SART1 U4/U6.U5 tri-snRNP-associated protein 1

C_618 38 O43172 PRPF4 U4/U6 small nuclear ribonucleoprotein Prp4

C_618 38 Q8WWY3 PRPF31 U4/U6 small nuclear ribonucleoprotein Prp31

C_618 38 O43395 PRPF3 U4/U6 small nuclear ribonucleoprotein Prp3

C_618 38 P08579 SNRPB2 U2 small nuclear ribonucleoprotein B''

C_618 38 P09661 SNRPA1 U2 small nuclear ribonucleoprotein A'

C_618 38 P09012 SNRPA U1 small nuclear ribonucleoprotein A

C_618 38 P08621 SNRNP70 U1 small nuclear ribonucleoprotein 70 kDa

C_618 38 O75962 TRIO Triple functional domain protein

C_618 38 P26368 U2AF2 Splicing factor U2AF 65 kDa subunit

C_618 38 Q01081 U2AF1 Splicing factor U2AF 35 kDa subunit

C_618 38 Q9BWJ5 SF3B5 Splicing factor 3B subunit 5

C_618 38 Q15427 SF3B4 Splicing factor 3B subunit 4

C_618 38 Q15393 SF3B3 Splicing factor 3B subunit 3

C_618 38 Q13435 SF3B2 Splicing factor 3B subunit 2

C_618 38 O75533 SF3B1 Splicing factor 3B subunit 1

C_618 38 Q12874 SF3A3 Splicing factor 3A subunit 3

C_618 38 Q15428 SF3A2 Splicing factor 3A subunit 2

C_618 38 Q15459 SF3A1 Splicing factor 3A subunit 1

C_618 38 P62318 SNRPD3 Small nuclear ribonucleoprotein Sm D3

C_618 38 P62316 SNRPD2 Small nuclear ribonucleoprotein Sm D2

C_618 38 P62314 SNRPD1 Small nuclear ribonucleoprotein Sm D1

C_618 38 Q07955 SRSF1 Serine/arginine-rich splicing factor 1

C_618 38 Q86TB9 PATL1 Protein PAT1 homolog 1

C_618 38 Q9HCS7 XAB2 Pre-mRNA-splicing factor SYF1

C_618 38 O75934 BCAS2 Pre-mRNA-splicing factor SPF27

C_618 38 Q9ULR0 ISY1 Pre-mRNA-splicing factor ISY1 homolog

C_618 38 Q6P2Q9 PRPF8 Pre-mRNA-processing-splicing factor 8

C_618 38 O94906 PRPF6 Pre-mRNA-processing factor 6

C_618 38 Q9UMS4 PRPF19 Pre-mRNA-processing factor 19

C_618 38 Q9Y3B4 SF3B14 Pre-mRNA branch site protein p14

C_618 38 O43660 PLRG1 Pleiotropic regulator 1

C_618 38 Q7RTV0 PHF5A PHD finger-like domain-containing protein 5A

C_618 38 P13473 LAMP2 Lysosome-associated membrane glycoprotein 2

C_618 38 Q99459 CDC5L Cell division cycle 5-like protein

C_618 38 P53999 SUB1 Activated RNA polymerase II transcriptional coactivator p15

C_618 38 Q15029 EFTUD2 116 kDa U5 small nuclear ribonucleoprotein component

C_619 41 Q53S58 TMEM177 Transmembrane protein 177

C_619 41 Q9BQC6 MRP63 Ribosomal protein 63, mitochondrial

C_619 41 Q2NL82 TSR1 Pre-rRNA-processing protein TSR1 homolog

C_619 41 Q9BVG9 PTDSS2 Phosphatidylserine synthase 2

C_619 41 Q14197 ICT1 Peptidyl-tRNA hydrolase ICT1, mitochondrial

C_619 41 Q8WWI1 LMO7 LIM domain only protein 7

C_619 41 Q07666 KHDRBS1 KH domain-containing, RNA-binding, signal transduction-associated protein 1

C_619 41 P05783 KRT18 Keratin, type I cytoskeletal 18

C_619 41 P07910 HNRNPC Heterogeneous nuclear ribonucleoproteins C1/C2

C_619 41 O43390 HNRNPR Heterogeneous nuclear ribonucleoprotein R

C_619 41 Q96DZ1 ERLEC1 Endoplasmic reticulum lectin 1

C_619 41 P62269 RPS18 40S ribosomal protein S18

C_619 41 Q9BYD2 MRPL9 39S ribosomal protein L9, mitochondrial

C_619 41 Q7Z7F7 MRPL55 39S ribosomal protein L55, mitochondrial

C_619 41 Q86TS9 MRPL52 39S ribosomal protein L52, mitochondrial

C_619 41 Q4U2R6 MRPL51 39S ribosomal protein L51, mitochondrial

C_619 41 Q8N5N7 MRPL50 39S ribosomal protein L50, mitochondrial

C_619 41 Q9BRJ2 MRPL45 39S ribosomal protein L45, mitochondrial

C_619 41 Q9H9J2 MRPL44 39S ribosomal protein L44, mitochondrial

C_619 41 Q9Y6G3 MRPL42 39S ribosomal protein L42, mitochondrial

C_619 41 Q8IXM3 MRPL41 39S ribosomal protein L41, mitochondrial

C_619 41 Q9NQ50 MRPL40 39S ribosomal protein L40, mitochondrial

C_619 41 Q9BYD3 MRPL4 39S ribosomal protein L4, mitochondrial

C_619 41 Q9NYK5 MRPL39 39S ribosomal protein L39, mitochondrial

C_619 41 Q96DV4 MRPL38 39S ribosomal protein L38, mitochondrial

C_619 41 Q9BZE1 MRPL37 39S ribosomal protein L37, mitochondrial

C_619 41 Q9BYC8 MRPL32 39S ribosomal protein L32, mitochondrial

C_619 41 P09001 MRPL3 39S ribosomal protein L3, mitochondrial

C_619 41 Q96A35 MRPL24 39S ribosomal protein L24, mitochondrial

C_619 41 Q16540 MRPL23 39S ribosomal protein L23, mitochondrial

C_619 41 Q7Z2W9 MRPL21 39S ribosomal protein L21, mitochondrial

C_619 41 Q5T653 MRPL2 39S ribosomal protein L2, mitochondrial

C_619 41 P49406 MRPL19 39S ribosomal protein L19, mitochondrial

C_619 41 Q9NX20 MRPL16 39S ribosomal protein L16, mitochondrial

C_619 41 Q9P015 MRPL15 39S ribosomal protein L15, mitochondrial

C_619 41 Q6P1L8 MRPL14 39S ribosomal protein L14, mitochondrial

C_619 41 Q9BYD1 MRPL13 39S ribosomal protein L13, mitochondrial

C_619 41 P52815 MRPL12 39S ribosomal protein L12, mitochondrial

C_619 41 Q9Y3B7 MRPL11 39S ribosomal protein L11, mitochondrial

C_619 41 Q7Z7H8 MRPL10 39S ribosomal protein L10, mitochondrial

C_619 41 P82933 MRPS9 28S ribosomal protein S9, mitochondrial

C_620 41 Q96A49 SYAP1 Synapse-associated protein 1

C_620 41 O95905 ECD Protein SGT1

C_620 41 P28062 PSMB8 Proteasome subunit beta type-8

C_620 41 Q99436 PSMB7 Proteasome subunit beta type-7

C_620 41 P28072 PSMB6 Proteasome subunit beta type-6

C_620 41 P28074 PSMB5 Proteasome subunit beta type-5

C_620 41 P28070 PSMB4 Proteasome subunit beta type-4

C_620 41 P49720 PSMB3 Proteasome subunit beta type-3

C_620 41 P49721 PSMB2 Proteasome subunit beta type-2

C_620 41 P40306 PSMB10 Proteasome subunit beta type-10

C_620 41 P20618 PSMB1 Proteasome subunit beta type-1

C_620 41 Q8TAA3 PSMA8 Proteasome subunit alpha type-7-like

C_620 41 O14818 PSMA7 Proteasome subunit alpha type-7

C_620 41 P60900 PSMA6 Proteasome subunit alpha type-6

C_620 41 P28066 PSMA5 Proteasome subunit alpha type-5

C_620 41 P25789 PSMA4 Proteasome subunit alpha type-4

C_620 41 P25788 PSMA3 Proteasome subunit alpha type-3

C_620 41 P25787 PSMA2 Proteasome subunit alpha type-2

C_620 41 P25786 PSMA1 Proteasome subunit alpha type-1

C_620 41 P61289 PSME3 Proteasome activator complex subunit 3

C_620 41 Q9UL46 PSME2 Proteasome activator complex subunit 2

C_620 41 Q06323 PSME1 Proteasome activator complex subunit 1

C_620 41 P48556 PSMD8 26S proteasome non-ATPase regulatory subunit 8

C_620 41 P51665 PSMD7 26S proteasome non-ATPase regulatory subunit 7

C_620 41 Q15008 PSMD6 26S proteasome non-ATPase regulatory subunit 6

C_620 41 Q16401 PSMD5 26S proteasome non-ATPase regulatory subunit 5

C_620 41 P55036 PSMD4 26S proteasome non-ATPase regulatory subunit 4

C_620 41 O43242 PSMD3 26S proteasome non-ATPase regulatory subunit 3

C_620 41 Q13200 PSMD2 26S proteasome non-ATPase regulatory subunit 2

C_620 41 O00487 PSMD14 26S proteasome non-ATPase regulatory subunit 14

C_620 41 Q9UNM6 PSMD13 26S proteasome non-ATPase regulatory subunit 13

C_620 41 O00232 PSMD12 26S proteasome non-ATPase regulatory subunit 12

C_620 41 O00231 PSMD11 26S proteasome non-ATPase regulatory subunit 11

C_620 41 O75832 PSMD10 26S proteasome non-ATPase regulatory subunit 10

C_620 41 Q99460 PSMD1 26S proteasome non-ATPase regulatory subunit 1

C_620 41 P62195 PSMC5 26S protease regulatory subunit 8

C_620 41 P35998 PSMC2 26S protease regulatory subunit 7

C_620 41 P43686 PSMC4 26S protease regulatory subunit 6B

C_620 41 P17980 PSMC3 26S protease regulatory subunit 6A

C_620 41 P62191 PSMC1 26S protease regulatory subunit 4

C_620 41 P62333 PSMC6 26S protease regulatory subunit 10B

C_621 42 Q15942 ZYX Zyxin

C_621 42 O95218 ZRANB2 Zinc finger Ran-binding domain-containing protein 2

C_621 42 O75312 ZNF259 Zinc finger protein ZPR1

C_621 42 Q96K21 ZFYVE19 Zinc finger FYVE domain-containing protein 19

C_621 42 P46937 YAP1 Yorkie homolog

C_621 42 Q16864 ATP6V1F V-type proton ATPase subunit F

C_621 42 P21281 ATP6V1B2 V-type proton ATPase subunit B, brain isoform

C_621 42 O75351 VPS4B Vacuolar protein sorting-associated protein 4B

C_621 42 O75436 VPS26A Vacuolar protein sorting-associated protein 26A

C_621 42 O94888 UBXN7 UBX domain-containing protein 7

C_621 42 Q04323 UBXN1 UBX domain-containing protein 1

C_621 42 Q96S82 UBL7 Ubiquitin-like protein 7

C_621 42 Q9Y3C8 UFC1 Ubiquitin-fold modifier-conjugating enzyme 1

C_621 42 P61960 UFM1 Ubiquitin-fold modifier 1

C_621 42 Q15819 UBE2V2 Ubiquitin-conjugating enzyme E2 variant 2

C_621 42 Q13404 UBE2V1 Ubiquitin-conjugating enzyme E2 variant 1

C_621 42 P68036 UBE2L3 Ubiquitin-conjugating enzyme E2 L3

C_621 42 P62837 UBE2D2 Ubiquitin-conjugating enzyme E2 D2

C_621 42 P51668 UBE2D1 Ubiquitin-conjugating enzyme E2 D1

C_621 42 O00762 UBE2C Ubiquitin-conjugating enzyme E2 C

C_621 42 P63146 UBE2B Ubiquitin-conjugating enzyme E2 B

C_621 42 P49459 UBE2A Ubiquitin-conjugating enzyme E2 A

C_621 42 O95155 UBE4B Ubiquitin conjugation factor E4 B

C_621 42 P15374 UCHL3 Ubiquitin carboxyl-terminal hydrolase isozyme L3

C_621 42 Q70CQ2 USP34 Ubiquitin carboxyl-terminal hydrolase 34

C_621 42 Q9NRR5 UBQLN4 Ubiquilin-4

C_621 42 Q9UMX0 UBQLN1 Ubiquilin-1

C_621 42 Q14166 TTLL12 Tubulin--tyrosine ligase-like protein 12

C_621 42 O75347 TBCA Tubulin-specific chaperone A

C_621 42 Q9Y6A5 TACC3 Transforming acidic coiled-coil-containing protein 3

C_621 42 Q15654 TRIP6 Thyroid receptor-interacting protein 6

C_621 42 P04818 TYMS Thymidylate synthase

C_621 42 Q8N5M4 TTC9C Tetratricopeptide repeat protein 9C

C_621 42 Q99614 TTC1 Tetratricopeptide repeat protein 1

C_621 42 Q9UBT2 UBA2 SUMO-activating enzyme subunit 2

C_621 42 P50225 SULT1A1 Sulfotransferase 1A1

C_621 42 Q9BYN0 SRXN1 Sulfiredoxin-1

C_621 42 O60232 SSSCA1 Sjoegren syndrome/scleroderma autoantigen 1

C_621 42 Q9C0B0 UNK RING finger protein unkempt homolog

C_621 42 A6NL28 TPM3L Putative tropomyosin alpha-3 chain-like protein

C_621 42 O43865 AHCYL1 Putative adenosylhomocysteinase 2

C_621 42 Q93008 USP9X Probable ubiquitin carboxyl-terminal hydrolase FAF-X

C_622 105 P62979 RPS27A Ubiquitin-40S ribosomal protein S27a

C_622 105 P22087 FBL rRNA 2'-O-methyltransferase fibrillarin

C_622 105 Q15050 RRS1 Ribosome biogenesis regulatory protein homolog

C_622 105 Q14137 BOP1 Ribosome biogenesis protein BOP1

C_622 105 O76021 RSL1D1 Ribosomal L1 domain-containing protein 1

C_622 105 Q8IY81 FTSJ3 Putative rRNA methyltransferase 3

C_622 105 Q9Y383 LUC7L2 Putative RNA-binding protein Luc7-like 2

C_622 105 P46087 NOP2 Putative ribosomal RNA methyltransferase NOP2

C_622 105 Q58FG0 HSP90AA5P Putative heat shock protein HSP 90-alpha A5

C_622 105 Q9NQ39 RPS10P5 Putative 40S ribosomal protein S10-like

C_622 105 Q99848 EBNA1BP2 Probable rRNA-processing protein EBP2

C_622 105 P55209 NAP1L1 Nucleosome assembly protein 1-like 1

C_622 105 P19338 NCL Nucleolin

C_622 105 Q9Y2X3 NOP58 Nucleolar protein 58

C_622 105 O00567 NOP56 Nucleolar protein 56

C_622 105 Q14978 NOLC1 Nucleolar and coiled-body phosphoprotein 1

C_622 105 P55769 NHP2L1 NHP2-like protein 1

C_622 105 Q9H009 NACA2 Nascent polypeptide-associated complex subunit alpha-2

C_622 105 Q9BQG0 MYBBP1A Myb-binding protein 1A

C_622 105 Q9BYG3 MKI67IP MKI67 FHA domain-interacting nucleolar phosphoprotein

C_622 105 P55081 MFAP1 Microfibrillar-associated protein 1

C_622 105 P20671 HIST1H2AD Histone H2A type 1-D

C_622 105 Q92993 KAT5 Histone acetyltransferase KAT5

C_622 105 P08238 HSP90AB1 Heat shock protein HSP 90-beta

C_622 105 P07900 HSP90AA1 Heat shock protein HSP 90-alpha

C_622 105 Q9BVP2 GNL3 Guanine nucleotide-binding protein-like 3

C_622 105 P63244 GNB2L1 Guanine nucleotide-binding protein subunit beta-2-like 1

C_622 105 P56537 EIF6 Eukaryotic translation initiation factor 6

C_622 105 P49411 TUFM Elongation factor Tu, mitochondrial

C_622 105 P13639 EEF2 Elongation factor 2

C_622 105 P68104 EEF1A1 Elongation factor 1-alpha 1

C_622 105 P07355 ANXA2 Annexin A2

C_622 105 P05141 SLC25A5 ADP/ATP translocase 2

C_622 105 P12235 SLC25A4 ADP/ATP translocase 1

C_622 105 P32969 RPL9 60S ribosomal protein L9

C_622 105 P62917 RPL8 60S ribosomal protein L8

C_622 105 P62424 RPL7A 60S ribosomal protein L7a

C_622 105 P18124 RPL7 60S ribosomal protein L7

C_622 105 Q02878 RPL6 60S ribosomal protein L6

C_622 105 P46777 RPL5 60S ribosomal protein L5

C_622 105 P36578 RPL4 60S ribosomal protein L4

C_622 105 P63173 RPL38 60S ribosomal protein L38

C_622 105 P61513 RPL37A 60S ribosomal protein L37a

C_622 105 P61927 RPL37 60S ribosomal protein L37

C_622 105 Q9Y3U8 RPL36 60S ribosomal protein L36

C_622 105 P42766 RPL35 60S ribosomal protein L35

C_622 105 P62910 RPL32 60S ribosomal protein L32

C_622 105 P62899 RPL31 60S ribosomal protein L31

C_622 105 P62888 RPL30 60S ribosomal protein L30

C_622 105 P39023 RPL3 60S ribosomal protein L3

C_622 105 P47914 RPL29 60S ribosomal protein L29

C_622 105 P46776 RPL27A 60S ribosomal protein L27a

C_622 105 P61353 RPL27 60S ribosomal protein L27

C_622 105 P83731 RPL24 60S ribosomal protein L24

C_622 105 P62750 RPL23A 60S ribosomal protein L23a

C_622 105 P62829 RPL23 60S ribosomal protein L23

C_622 105 P35268 RPL22 60S ribosomal protein L22

C_622 105 P46778 RPL21 60S ribosomal protein L21

C_622 105 P84098 RPL19 60S ribosomal protein L19

C_622 105 Q02543 RPL18A 60S ribosomal protein L18a

C_622 105 Q07020 RPL18 60S ribosomal protein L18

C_622 105 P18621 RPL17 60S ribosomal protein L17

C_622 105 P61313 RPL15 60S ribosomal protein L15

C_622 105 P50914 RPL14 60S ribosomal protein L14

C_622 105 P26373 RPL13 60S ribosomal protein L13

C_622 105 P30050 RPL12 60S ribosomal protein L12

C_622 105 P62913 RPL11 60S ribosomal protein L11

C_622 105 Q96L21 RPL10L 60S ribosomal protein L10-like

C_622 105 P62906 RPL10A 60S ribosomal protein L10a

C_622 105 P27635 RPL10 60S ribosomal protein L10

C_622 105 P05387 RPLP2 60S acidic ribosomal protein P2

C_622 105 P05386 RPLP1 60S acidic ribosomal protein P1

C_622 105 Q8NHW5 RPLP0P6 60S acidic ribosomal protein P0-like

C_622 105 P05388 RPLP0 60S acidic ribosomal protein P0

C_622 105 P08865 RPSA 40S ribosomal protein SA

C_622 105 P46781 RPS9 40S ribosomal protein S9

C_622 105 P62241 RPS8 40S ribosomal protein S8

C_622 105 P62081 RPS7 40S ribosomal protein S7

C_622 105 P62753 RPS6 40S ribosomal protein S6

C_622 105 P46782 RPS5 40S ribosomal protein S5

C_622 105 P62701 RPS4X 40S ribosomal protein S4, X isoform

C_622 105 P61247 RPS3A 40S ribosomal protein S3a

C_622 105 P62861 FAU 40S ribosomal protein S30

C_622 105 P23396 RPS3 40S ribosomal protein S3

C_622 105 P62273 RPS29 40S ribosomal protein S29

C_622 105 P62857 RPS28 40S ribosomal protein S28

C_622 105 Q71UM5 RPS27L 40S ribosomal protein S27-like

C_622 105 P62854 RPS26 40S ribosomal protein S26

C_622 105 P62851 RPS25 40S ribosomal protein S25

C_622 105 P62847 RPS24 40S ribosomal protein S24

C_622 105 P62266 RPS23 40S ribosomal protein S23

C_622 105 P63220 RPS21 40S ribosomal protein S21

C_622 105 P60866 RPS20 40S ribosomal protein S20

C_622 105 P15880 RPS2 40S ribosomal protein S2

C_622 105 P39019 RPS19 40S ribosomal protein S19

C_622 105 P08708 RPS17 40S ribosomal protein S17

C_622 105 P62249 RPS16 40S ribosomal protein S16

C_622 105 P62244 RPS15A 40S ribosomal protein S15a

C_622 105 P62841 RPS15 40S ribosomal protein S15

C_622 105 P62263 RPS14 40S ribosomal protein S14

C_622 105 P62277 RPS13 40S ribosomal protein S13

C_622 105 P25398 RPS12 40S ribosomal protein S12

C_622 105 P62280 RPS11 40S ribosomal protein S11

C_622 105 P46783 RPS10 40S ribosomal protein S10

C_622 105 P82912 MRPS11 28S ribosomal protein S11, mitochondrial