
MASARYK UNIVERSITY

Faculty of Science Department of Biochemistry

Ondřej Široký

PROTEIN COMPLEXES AND Bachelor thesis

Supervisor: RNDr. Zbyněk Zdráhal, Dr.

Brno 2009

There is a theory which states that if ever anybody discovers exactly what the Universe is for and why it is here, it will instantly disappear and be replaced by something even more bizarre and inexplicable. There is another theory which states that this has already happened.

Douglas Adams English humorist & science fiction novelist (1952 - 2001)


Hereby I declare that I have written this thesis only by myself and have used just the sources mentioned in the bibliography part.

Brno, May 13, 2009 Ondřej Široký


Acknowledgement

I would like to express special thanks to RNDr. Zbyněk Zdráhal, Dr., who gave me the support, guidance and time needed to complete this thesis. Furthermore, I would like to thank all the people who supported me with either a kind word or a piece of advice. Finally, I thank George Abraham and Stephanie Pitts for revising the language of this work.

Abstract

This work summarizes basic approaches to determining and mapping protein – protein interactions (PPIs). It begins by explaining what PPIs are, how they differ from each other, the kinetics of forming a PPI, and computational tools for various proteomic predictions. The following chapters outline a common proteomic experiment, including common purification strategies (with special attention to the TAP tag technique) and the identification of proteins by various types of mass spectrometry. No proteomic experiment is complete without data evaluation and standardization, so these come next, followed by future perspectives of proteomic studies. Finally, the experimental part presents an MS-based protein identification experiment in mouse cell lines and a discussion of the results obtained.

Contents

INTRODUCTION ...... 8

THE PRIMARY STRUCTURE OF PROTEINS ...... 10

PROTEIN – PROTEIN INTERACTIONS (PPIs) ...... 14

3.1 DISTINGUISHING AMONG PPIS ...... 16
3.1.1 Homo- and hetero-oligomeric structures ...... 16
3.1.2 Obligate and non-obligate complexes ...... 16
3.1.3 Permanent and transient interactions ...... 17
3.2 THE KINETICS OF FORMING A PPI ...... 17
3.3 PREDICTING AND MODELLING PPIS ...... 18
3.3.1 Mining PPIs from text ...... 18
3.3.2 Predicting PPIs ...... 19
3.3.3 Protein networks ...... 19

PROTEOMIC EXPERIMENT ...... 21

4.1 SAMPLE PREPARATION ...... 21
4.1.1 Endogenous isolation techniques ...... 22
4.1.1.1 Antibody-based methods ...... 22
4.1.1.2 Biochemical purification and affinity ...... 22
4.1.2 In vitro techniques using recombinant proteins ...... 24
4.1.2.1 Immobilized recombinant proteins ...... 24
4.1.2.2 Phage display technology ...... 24
4.1.2.3 Protein arrays ...... 25
4.2 PROTEIN SEPARATION ...... 25
4.2.1 One-dimensional ...... 26
4.2.2 Two-dimensional electrophoresis ...... 26
4.2.3 SDS-PAGE ...... 26
4.2.4 QPNC-PAGE ...... 27
4.3 SAMPLE DIGESTION ...... 27
4.4 MASS SPECTROMETRY DETECTION ...... 27
4.4.1 The construction of mass spectrometer ...... 28
4.4.1.1 Ion Source ...... 28
4.4.1.2 Mass analyzer ...... 29
4.4.1.3 Mass detector ...... 29
4.4.2 Common mass spectrometric configurations ...... 30
4.4.3 Protein identification strategies ...... 31
4.4.3.1 Bottom up strategy ...... 31
Peptide mass fingerprinting ...... 31
Tandem MS ...... 31
4.4.3.2 Top down strategy ...... 32
4.5 DATA EVALUATION ...... 32
4.6 DATA STANDARDIZATION AND INTERPRETATION ...... 33

FUTURE PERSPECTIVES ...... 35

EXPERIMENTAL PART ...... 37

5.1 OBJECTIVES ...... 37
5.2 MATERIAL ...... 37
5.2.1 Samples ...... 38
5.2.2 Chemicals ...... 38
5.3 METHODS ...... 38
5.3.1 Sample separation ...... 38
5.3.2 Mass spectrometry analysis ...... 38
5.3.3 Data processing ...... 39
5.4 RESULTS ...... 40
5.4.2 Protein analysis ...... 40
5.5 DISCUSSION ...... 43
5.6 CONCLUSION ...... 44

ABBREVIATIONS ...... 45

APPENDIX ...... 47
A1: 3 ...... 47
C1: 1 ...... 51

BIBLIOGRAPHY ...... 59

Chapter 1

Introduction

Proteomics has been developing very rapidly over the past few decades and is now a notable field of research in the life sciences. It can be divided into two basic types according to its objective. The first, expression-based approach tries to map and characterize all proteins expressed in the studied object, e.g. a cell, a tissue or a whole organism. It can also study fluctuations in protein expression under variable external conditions. The second and presently more progressive approach is functional proteomics, which is driven by the hypothesis that interacting proteins, i.e. proteins building a PPI (protein – protein interaction), cooperate in the same molecular process. 1 But we have to ask why. What was the most important force that drove us to hunt for more and more proteomic data? There are many reasons; some are simple and nonscientific, and some are more complex. In the first place, however, it is a natural part of humanity: the desire to know more. We have already investigated the structure of the genome in many organisms and successfully extracted the base sequence from them, but it was not enough. So we turned to another field of modern science, transcriptomics, which brought us a bit closer to understanding the processes occurring at the molecular level of the sample. But again it was not accurate enough. The set of proteins predicted from the different types of mRNA present (the transcriptome) cannot serve as the material to be researched, because in most cases it is not identical to the actual protein state (the proteome), which determines the behavior of molecular pathways in the studied object. Thus only proteomics can give us a picture of the vast complexity of cellular functions, and this is the main reason to study molecular processes from the proteomic point of view.
The classic proteomic experiment has become far more automated and faster than ever before: “It is interesting to remember that the first experiments that we performed required 20 to 50 pmol amounts of protein blotted onto nitrocellulose. The mass spectra and product ion spectra were acquired over the course of several liquid chromatography runs and were switched manually between different precursor ions to obtain the product ion spectra. All of the product ion spectra that were obtained were then interpreted manually and the databases were searched by sending a specially formatted e-mail message and waiting until the next day for the reply (often to find that our message had contained a syntax or format problem). The methods that we now use have advanced to the point where not only are all of the data acquired in approximately 45 minutes but the databases are searched either locally or over the internet in a matter of minutes.” 2

It is now clear that proteomics, hand in hand with mass spectrometry, has become a tremendously powerful tool in modern life sciences and can deliver a huge amount of valuable data in a short time. Modern medicine and drug development in particular profit greatly from this research, mainly by being able to discover new treatments and even to determine the key mechanisms leading to the development of a particular disease. For example, applying proteomic research to molecular toxicology enables researchers to detect carcinogenic protein markers after a couple of days of the test organism’s exposure to the carcinogenic substance, that is, even before there is any clear evidence of a developing tumor. 3

Chapter 2

The primary structure of proteins

Proteins are one of the few types of biomolecules used to build up the structures of living organisms. Before this thesis turns to its core topic, protein – protein interactions (PPIs), it is useful to briefly characterize the structure of an amino acid as the basic component of protein structure.

Figure 1: The general structure of an amino acid, containing a central carbon and four attached groups: an amino group, a carboxyl group, a hydrogen atom and a side chain, shown in the figure as R. The side chain can be of various kinds; it can be hydrophobic or hydrophilic and contributes strongly to the chemical behavior of the protein. Source: http://pps00.cryst.bbk.ac.uk/course/section3/protgeom/jonc/peptide1.html

As shown in Figure 1, each amino acid contains a central carbon atom (the alpha carbon) and four groups or atoms attached to it. Three are always the same: the amino group, the carboxyl group and the hydrogen atom. Thus it is the fourth group that makes the properties and chemical behavior of each amino acid different. In the process of creating a protein, only 20 amino acids have “sense”, meaning that in the process beginning with the transcription of DNA into mRNA, followed by transport of mature mRNA into the cytoplasm and its interaction with tRNA during translation, only a protein chain consisting of those 20 amino acids can be created. With so few building blocks, proteins might seem a stable and not very diverse class of structures, but the number of possible combinations is nearly infinite. What makes the topic so interesting is that such complicated and diverse biomolecules are created from only 20 building blocks, and the same holds for their functions.
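The claim that 20 building blocks give a nearly infinite number of combinations can be illustrated with a short calculation. The following is only a minimal sketch: it assumes a linear chain in which each position is filled independently by one of the 20 amino acids, and the function name is chosen here purely for illustration.

```python
def count_sequences(length, alphabet_size=20):
    """Number of distinct linear chains of a given length.

    Assumes every position can hold any of `alphabet_size` residues,
    independently of its neighbours.
    """
    if length < 0:
        raise ValueError("length must not be negative")
    return alphabet_size ** length


if __name__ == "__main__":
    for n in (2, 10, 100):
        total = count_sequences(n)
        # Report the number of decimal digits, since the values grow very quickly.
        print(f"chains of length {n}: a number with {len(str(total))} digits")
```

Even a short chain of 10 residues already allows about 10^13 different sequences, which is why the space of possible proteins is considered practically unlimited.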

Figure 2: A table of the 20 amino acids with their side chains colored in blue. Source: www.biology-online.org

Beyond their primary structure (the amino acid sequence), proteins are folded at several levels (secondary, tertiary and quaternary structure). These structures are typical for each protein, and the folding together with the primary structure determines the chemical behavior of the protein itself. However, the folding is partly dependent on the primary structure, which is why we can consider the primary structure the key element determining protein behavior. To be more specific about what primary structure is, Figure 3 shows how it is created. It is nothing more than the covalent linkage between two neighboring amino acids, created as an amide bond between the C-terminus of one amino acid and the N-terminus of another. In this condensation reaction a single covalent bond between a carbon atom and a nitrogen atom is created and a molecule of water is released. The protein chain is by convention said to start with the N-terminus and end with the C-terminus. These are the positions of the terminal amino acids, which have formed just one amide bond, while the others in the chain have formed two.

Figure 3: The peptide bond created between the carbon and nitrogen atoms, shown with its spatial geometry. Source: http://xray.bmc.uu.se/~kurs/BiostrukfunkX2/practicals/practical_1/figs/peptide_bond.jpg

We have seen that the primary sequence of amino acids is the key difference among proteins, which are represented as a chain beginning with the N-terminus and ending with the C-terminus. However, there is one more way of interpreting the protein primary structure: proteins can be viewed as a set of shorter molecules consisting of several amino acid residues, called peptides. It is often advantageous to work with peptides instead of proteins. In various proteomic experiments proteins are digested with a proteolytic agent, such as trypsin, into smaller molecules (peptides), and the identification is based on matching those peptides against a particular database. There is, however, no fundamental difference between proteins and peptides. Both are built from amino acids, both have two termini, and their lengths may vary. A protein with a sequence shorter than 30 amino acids can exist, just as a peptide longer than 30 amino acids can. It is then only the occurrence that tells us whether it is a protein with a particular function or just a part of a system of peptide fragments. Speaking of systems, it is necessary to say that the majority of proteins do not function as single molecules but have to cooperate with other proteins to accomplish their tasks. Thus there is a very large number of protein complexes, or machines, involved in the molecular pathways of an organism. Various techniques have been developed to identify the interacting protein partners; they are discussed in Chapter 4.
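The idea of digesting a protein into peptides and matching them against a database can be sketched in a few lines. This is only an illustration under stated assumptions: trypsin is assumed as the proteolytic agent, with the commonly used rule that it cleaves after lysine (K) or arginine (R) unless the next residue is proline (P); the sequences, the protein names and the scoring by shared peptides are hypothetical and much simpler than what real search engines do.

```python
import re


def tryptic_digest(sequence):
    """Split a protein sequence after K or R, except when followed by P."""
    # Zero-width split point: preceded by K or R, not followed by P.
    pieces = re.split(r"(?<=[KR])(?!P)", sequence.upper())
    return [p for p in pieces if p]


def best_match(observed_peptides, database):
    """Return the database entry sharing the most peptides with the sample.

    `database` maps a (hypothetical) protein name to its sequence.
    """
    observed = set(observed_peptides)
    scores = {
        name: len(observed & set(tryptic_digest(seq)))
        for name, seq in database.items()
    }
    return max(scores, key=scores.get)


if __name__ == "__main__":
    toy_database = {"protA": "MKRPAGKLLR", "protB": "GGKAAR"}  # made-up sequences
    print(tryptic_digest(toy_database["protA"]))
    print(best_match(["LLR", "MK"], toy_database))
```

In a real experiment the peptides are not observed as sequences but as masses or fragment spectra, so the matching step works on those values rather than on strings.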

Figure 4: The process of protein creation. Source: www.answers.com

Chapter 3

Protein – protein interactions

"Every protein in our body has its own interacting partners." Dr. Manzoor Bhat

According to the quote above, there is no protein in our body, or in any organism, that has no interacting partners. This is very much true. Because of its complexity, it is difficult to predict the number of PPIs in the human body, but for a simpler organism such as yeast, which expresses about 6000 proteins, the number of interactions is predicted to be circa 20 000. 4 Only a few decades ago the idea of molecular interactions in a cell was completely different from the present one. At that time it was based on a model of a huge machinery of second-order reactions 5, that is, on the simple kinetics of physical chemistry describing a reaction of molecule A with molecule B to form a compound AB. This theory was also based on the fact that diffusion on the molecular scale is very rapid, making all the processes feasible. However, as science progressed, it turned out that those processes are much more complex and sophisticated. Concerning proteins, it is not only two proteins colliding with each other but, far more likely, protein assemblies of a precise spherical shape undergoing specific interactions. Thus the view of the cell had to change too, and indeed it has. The cell is now presented as a complex and accurate factory with a plethora of molecular pathways incorporating numerous protein complexes. Out of this large set we are going to cover only those consisting exclusively of proteins, so that every PPI can be defined as a specific interaction between two or more proteins. As we are now in the post-genomic era, the leading topic of molecular biology is understanding how the proteins in the cell interact and what the consequences of this complex process are. It has been reviewed that there are three types of codes well worth consideration, among them the protein interaction/assembly ‘code’ 6.
It is crucial to understand the binding motifs and energetics of a PPI in order to describe it qualitatively as well as quantitatively. From various scientific studies we can derive some properties of PPIs that help us create a scheme and a better picture of how an interaction between proteins should be interpreted. PPIs are of different types, but the principles of universality and specificity are important too. The first means that protein interactions are observable at different levels of cell structure, either structural or dynamical; the second, that a specific binding motif is responsible for exact protein recognition and the formation of an interaction. Furthermore, we may ask what is responsible for the interaction itself: which forces are present in the complex, keeping it stable for a period of time that depends on its function? The answer is not complicated. Among the most important are electrostatic interactions, hydrogen bonds and van der Waals interactions. These hold the interacting partners together. The junction has another interesting parameter: the contact area. In one proteomic study (the number of PPIs examined might be insufficient to make a global statement) it was reported that this area is almost never smaller than 1100 Å², meaning that each of the interacting partners has to contribute half of this area of complementary surface. In this interaction each binding partner loses about 800 Å² of its solvent-accessible surface, usually involving about 20 amino acids. Thus each amino acid covers circa 40 Å². The contact area takes up approximately 6 to 29% of the protein surface, with a tendency to increase towards more complex assemblies. 7 One could go on and write many more pages just by naming other equally important properties of PPIs, but the crux of the problem lies elsewhere.
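The contact-area figures quoted above are connected by simple arithmetic, which can be checked with a short sketch. The input numbers are the ones cited in the text; the function names and the total surface of 10 000 Å² in the usage example are hypothetical and serve only as an illustration.

```python
def area_per_residue(buried_area, residues):
    """Average buried surface per interface residue, in Å²."""
    return buried_area / residues


def interface_fraction(buried_area, total_surface):
    """Share of a protein's solvent-accessible surface buried in the interface."""
    return buried_area / total_surface


if __name__ == "__main__":
    # Values quoted in the text: about 800 Å² lost per partner, about 20 residues.
    print(area_per_residue(800.0, 20))            # average Å² per residue
    # Minimum total contact area of 1100 Å², shared equally by two partners.
    print(1100.0 / 2)                             # Å² contributed per partner
    # Hypothetical protein with 10 000 Å² of accessible surface.
    print(interface_fraction(800.0, 10_000.0))    # fraction of surface buried
```

For the hypothetical protein the buried fraction comes out at 8%, which lies inside the reported range of 6 to 29%.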
It is not necessary to comprehend all the types of PPIs, but rather to be able to classify those interactions based on only some of their properties and, what is pivotal, to be able to picture a PPI and describe its behavior under specific conditions. A panoply of methods for characterizing PPIs has been developed, starting with the analysis of just one particular interaction and going on to the characterization of a whole proteome. Because of a high rate of false positives and false negatives, some of the early methods were accepted with skepticism and sometimes even with open criticism. As the methods are improved, their accuracy is constantly getting better, with specificity and selectivity being the most important factors monitored, mostly for high-throughput protein-protein interaction screening methods. 8 Since the vast complexity of this topic does not allow us to discuss every detail, we focus on some particular issues: the different types of PPIs, the kinetics of a protein interaction and, last but not least, bioinformatic approaches to predicting PPIs that might be possible and of particular significance in the studied organism.

3.1 Distinguishing among PPIs

There are many ways to distinguish among protein – protein interactions. We are going to review some of the basic approaches, most of them based on different folding (structural and functional groups), structural motifs, chemical and physical properties, complexity, etc.

3.1.1 Homo- and hetero-oligomeric structures

Homo- and hetero-oligomeric structures differ from each other in the composition of the complex or interaction. As the names suggest, homo-oligomers are composed of just one kind of molecule (identical chains) repeated a few times, whereas hetero-oligomeric structures are composed of at least two structural motifs (non-identical chains) forming the particular PPI. Moreover, homo-oligomers can be divided into isologous and heterologous types according to their structural symmetry. The type of symmetry determines how the complex behaves in further complexation. 9

3.1.2 Obligate and non-obligate complexes

This division again creates two groups of PPIs. The first, obligate interactions, consists of protomers that are not found on their own in the cell in vivo. Such complexes are often functionally obligate as well, taking part in the core processes of the cell. Above all we can name complexes connected to transcription or replication (e.g. the Arc repressor dimer). Non-obligate complexes, on the other hand, are formed from protomers that are able to exist on their own. Examples are enzyme–inhibitor complexes, antibody–antigen complexes and complexes connected to cell signaling (the RhoA–RhoGAP complex).

3.1.3 Permanent and transient interactions1

Time, or more precisely the duration of a PPI, can be a distinguishing factor too. Permanent interactions are very stable and thus exist only within the whole complex, whereas transient interactions assemble and disassemble in vivo. We distinguish weak transient interactions, which are in a dynamic equilibrium in solution, and strong transient interactions, which require a molecular trigger to shift the equilibrium (such as G-proteins, which are GDP/GTP dependent). It can be concluded that obligate interactions are usually permanent, whereas non-obligate interactions may be of either type.

1 The names of the categories of PPIs have been taken from the article Diversity of protein-protein interactions, as they refer to specific and exact properties of the complexes. Thus there is no need to change the defined terms.

3.2 The kinetics of forming a PPI

As mentioned in the introduction to this chapter, the simple theory used a few decades ago has become insufficient to deal with the complexity of protein assembly. However, even nowadays a very simplified theory based on it can be used in some cases to describe the kinetics of interacting proteins. This is the case of a molecule A interacting with a molecule B to form an assembly AB. The problem of assembling a larger set of proteins is solved by adding further proteins successively: the former assembly AB is treated as a single unit A2 in the next reaction with another subunit of the complex, B2, forming A2B2 (which now contains three subunits). In this manner the assembly of the whole complex can be described. However, a PPI is very likely to be more complex, and this complexity heavily influences the way it assembles. To obtain good and reliable structure – function relations it is crucial to be able to determine all of the different binding free energies under different conditions and hence the cooperation of particular subunits. 4 The whole process of forming a PPI can be divided into two stages: first protein recognition 10 and second the folding of a multi-protein complex. The former deals with the vast complexity of the cell proteome and tries to identify the prerequisites of a successful protein linkage. The latter focuses on the process of protein folding and the specific mechanisms that are involved. The association of proteins is based on their complementary surfaces, which form a noncovalent linkage with a stability determined by the dissociation constant and the dissociation free enthalpy. Covering the topic more comprehensively would, however, be a deep step into the field of physical chemistry and laboratory practice, which is a highly challenging issue but not the core topic of this work.

It is not unusual for a plethora of databases to be created in proteomic research. The same holds for thermodynamic data on PPIs. PINT, the Protein – protein Interactions Thermodynamic database, collects thermodynamic data along with sequence and structural information, experimental conditions and literature information. 11 Together with a free internet interface (for academic purposes), PINT is a powerful tool providing a better understanding of binding specificity and function.
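The link between the dissociation constant and the dissociation free enthalpy mentioned in this section can be made concrete with the standard thermodynamic relation ΔG°(dissociation) = −RT ln(Kd / 1 M). The sketch below assumes a simple one-to-one complex AB, a standard concentration of 1 M and, for the bound fraction, that partner B is in large excess; the function names and the example values are illustrative only and are not taken from PINT.

```python
import math

GAS_CONSTANT = 8.314462618  # J/(mol*K)


def dissociation_free_energy(kd_molar, temperature_k=298.15):
    """Standard free enthalpy of AB -> A + B in J/mol (Kd relative to 1 M)."""
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return -GAS_CONSTANT * temperature_k * math.log(kd_molar)


def fraction_bound(free_b_molar, kd_molar):
    """Fraction of A present as AB at equilibrium, assuming B is in excess."""
    return free_b_molar / (free_b_molar + kd_molar)


if __name__ == "__main__":
    kd = 1e-9  # hypothetical nanomolar complex
    print(f"dG(dissociation) = {dissociation_free_energy(kd) / 1000:.1f} kJ/mol")
    print(f"bound at [B] = Kd: {fraction_bound(kd, kd):.2f}")
```

A smaller Kd gives a larger positive dissociation free enthalpy, i.e. a more stable complex; at a free partner concentration equal to Kd exactly half of the molecules are bound.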

3.3 Predicting and modelling PPIs

The text above may give an impression of the tremendous importance of databases in modern science. In this section on predicting and modelling PPIs, that impression is developed further: progressive research is vitally dependent on bioinformatic tools, among which the various databases count. The reason is self-explanatory. Modern sciences, specifically all the disciplines ending with –omics, have to deal with a tremendously increasing amount of data. Thus they have to be able to manage it systematically and to develop universal tools that let scientists focus on the core of the research.

3.3.1 Mining PPIs from text

Mining PPIs from text has become very helpful during the stage of gathering information for later experiments. The high throughput of proteomic experiments has made manual curation of PPIs very time consuming and hence disadvantageous. This problem is solved by bioinformatic approaches that collect information on PPIs from various text sources. They can be implemented as a web interface like PIE (Protein Interaction information Extraction system), which draws on online databases like PubMed and user-provided articles together with a protein database and an interaction database. Via a sentence filter and an article filter, the input data are examined and searched for interactions. 12 Other methods use filtering algorithms to extract physical PPIs, in particular filtering out irrelevant articles, identifying protein names and normalizing them to molecule identifiers, and extracting protein-protein interactions. 13 The database work can go even further: it can link PPI data with the text-mining tool in order to predict certain interactions or estimate their function.
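The general idea of a sentence filter followed by protein name recognition can be sketched as follows. This is a deliberately naive illustration and not the algorithm used by PIE or by any published tool: the keyword list, the function name and the example sentences are assumptions made here, and real systems rely on curated dictionaries and statistical or machine-learning filters.

```python
import re
from itertools import combinations

# Hypothetical trigger words; real tools use far richer vocabularies.
INTERACTION_WORDS = {"binds", "interacts", "associates", "phosphorylates"}


def mine_interactions(text, protein_names):
    """Return pairs of known proteins co-mentioned in an 'interaction' sentence."""
    known = {name.lower(): name for name in protein_names}
    pairs = set()
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        tokens = re.findall(r"[A-Za-z0-9]+", sentence.lower())
        # Sentence filter: skip sentences without any trigger word.
        if not INTERACTION_WORDS.intersection(tokens):
            continue
        # Name recognition and normalization to the canonical spelling.
        found = sorted({known[t] for t in tokens if t in known})
        pairs.update(combinations(found, 2))
    return pairs


if __name__ == "__main__":
    abstract = ("RhoA binds RhoGAP in vitro. Actin was also detected. "
                "Ras interacts with Raf.")
    print(mine_interactions(abstract, ["RhoA", "RhoGAP", "Actin", "Ras", "Raf"]))
```

Co-occurrence in one sentence is of course only weak evidence, which is one reason why such tools produce both false positive and false negative interactions.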

3.3.2 Predicting PPIs

Estimating the function of a biological system from a set of molecules is the goal of modern systems biology. The function is highly dependent on the interactions of the system components. At the present point of scientific research we are able to determine thousands of interactions, but not in enough detail. 14 No matter whether systems biology is understood as a continuation of molecular genomics or as a group of mathematical methods for determining molecular function, progress crucially depends on determining the three-dimensional (3D) structure of the binding partners. However, it still struggles with the structures that are of the most relevance, i.e. the interactions of two or more macromolecules. Approaches to overcoming this obstacle can be either experimental or computational. The former are very time-consuming and more expensive, which has resulted in the rapid development of computational prediction methods. There is a plethora of approaches to determining protein interactions, and all of them have to be aware of the same problem: false negative (missing existing) and false positive (identifying incorrect) interactions. Among the approaches we can count methods using distant conservation of sequence patterns and structure relationships 15, structure motifs 16, protein arrays, fluorescence resonance energy transfer (FRET), fluorescence cross-correlation spectroscopy (FCCS), determining the exact stoichiometry of multiprotein complexes 17, modelling, chemical crosslinking and footprinting, methods connected with microscopy imaging 18, protein-fold recognition, the use of synthetic binding interfaces 19, electron tomography and network-based prediction tools. 20, 21, 22
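The problem of false positive and false negative interactions can be quantified when a set of predicted interactions is compared with a reference set. The following minimal sketch assumes that such a reference set exists and is treated as complete, which is rarely true in practice; the protein labels and function names are hypothetical.

```python
def normalise(pairs):
    """Treat (A, B) and (B, A) as the same undirected interaction."""
    return {frozenset(pair) for pair in pairs}


def evaluate_predictions(predicted, reference):
    """Count true/false positives and false negatives against a reference set."""
    pred, ref = normalise(predicted), normalise(reference)
    true_pos = len(pred & ref)    # predicted and present in the reference
    false_pos = len(pred - ref)   # predicted but not in the reference
    false_neg = len(ref - pred)   # existing interactions that were missed
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(ref) if ref else 0.0
    return {
        "true_positives": true_pos,
        "false_positives": false_pos,
        "false_negatives": false_neg,
        "precision": precision,
        "recall": recall,
    }


if __name__ == "__main__":
    reference = [("A", "B"), ("B", "C"), ("C", "D")]
    predicted = [("B", "A"), ("A", "D")]
    print(evaluate_predictions(predicted, reference))
```

Precision falls with every false positive and recall with every false negative, so the two numbers together give a compact picture of how reliable a prediction method is.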

3.3.3 Protein networks

Protein networks serve not only as a tool for predicting PPIs but also as a tool for representing the output of large-scale experiments, leading to a better understanding of the function of the proteome. When constructing a network, techniques such as graph theory 23, statistics, dynamical systems and others are used to picture the studied pathway or even the whole proteome. This has advanced to studies of the human proteome 24, 25 and holds great promise for giving details of, for example, signaling pathways in the mammalian immune system 26. To conclude how protein networks contribute to proteomics, we have to mention that they serve as a ‘glue’ for the data created in the analysis. In the future more stress will be put on making these structures more dynamic in order to provide insights into the functional plasticity of organisms. 27
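In graph-theoretical terms a protein network is an undirected graph whose nodes are proteins and whose edges are interactions. The sketch below shows one possible representation, an adjacency dictionary, together with two elementary measures: the degree of each protein and the set of proteins reachable from a given one. The protein labels P1 to P6 and the function names are hypothetical.

```python
from collections import defaultdict


def build_network(interactions):
    """Build an undirected adjacency dictionary from interaction pairs."""
    network = defaultdict(set)
    for a, b in interactions:
        network[a].add(b)
        network[b].add(a)
    return dict(network)


def degrees(network):
    """Number of interaction partners of each protein."""
    return {protein: len(partners) for protein, partners in network.items()}


def connected_component(network, start):
    """All proteins reachable from `start` through chains of interactions."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbour in network[node] - seen:
            seen.add(neighbour)
            stack.append(neighbour)
    return seen


if __name__ == "__main__":
    net = build_network([("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P5", "P6")])
    print(degrees(net))
    print(connected_component(net, "P1"))
```

Applied to the yeast estimate quoted at the beginning of this chapter (about 6000 proteins and circa 20 000 interactions), the average degree would be 2 × 20 000 / 6000, i.e. between six and seven partners per protein.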

Figure 5: An example of protein network diagram that may serve structural biologists for uncovering PPIs. Source: http://www.pnl.gov/statistics/bepro3/

Chapter 4

Proteomic experiment

"Proteomics has a future - you have just got to be very careful with experimental design and number of replicates. We have got some good insights from proteomics, which would have been missed by a DNA array approach. The main thing is that it is not just about techniques and instrumentation but the entire approach which counts." Professor Toni Slabas, School of Biological and Biomedical Sciences, Durham University2

There is a variety of approaches to carrying out a proteomic experiment. However, each experiment has to contain a series of particular tasks crucial to its feasibility. The goal of this chapter is to introduce a typical procedure for determining the structure of proteins in a complex or PPI. Each proteomic experiment begins with the preparation of the sample to be analyzed, followed by digestion of the proteins with a proteolytic agent. The resulting mixture of peptides is then separated, ionized and characterized by a suitable mass-spectrometric method. The last, but no less important, step is validating the data and matching them against various proteomic databases. The most challenging step is to identify the protein correctly, taking into account posttranslational modifications and other structural changes of the protein itself or of the whole PPI, and to process the data in the form of one of the proteomic standards (in fact, standardization is currently being discussed with the aim of setting up as few standards as possible in order to simplify data gathering, storage and searching).
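The step of matching measured data against a database can be illustrated by comparing observed peptide masses with masses calculated from candidate sequences. The sketch below is an assumption-laden simplification: it uses standard monoisotopic residue masses rounded to five decimals, works with neutral peptide masses rather than the m/z values an instrument actually reports, ignores posttranslational modifications, and uses hypothetical function names and a hypothetical tolerance.

```python
# Standard monoisotopic residue masses in Da (rounded).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER_MASS = 18.01056  # one water per peptide (free N- and C-terminus)


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[residue] for residue in peptide.upper()) + WATER_MASS


def match_masses(observed_masses, candidate_peptides, tolerance_da=0.01):
    """Pair each observed mass with candidate peptides within the tolerance."""
    matches = []
    for mass in observed_masses:
        for peptide in candidate_peptides:
            if abs(peptide_mass(peptide) - mass) <= tolerance_da:
                matches.append((mass, peptide))
    return matches


if __name__ == "__main__":
    print(round(peptide_mass("GA"), 5))
    print(match_masses([146.0691, 500.0], ["GA", "MK"]))
```

The more observed masses agree with the calculated peptide masses of one database protein, the more likely it is that this protein was present in the sample.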

4.1 Sample preparation

Sample preparation consists of several steps. First it is necessary to choose the object to be studied, i.e. a cell, a tissue or an organism. Then, of course, it has to be brought into a condition suitable for the analysis. Afterwards a specific biochemical approach is applied to extract the protein complex of interest. Basically, there are two kinds of biochemical approaches to mapping PPIs: techniques for isolating endogenous protein complexes and in vitro techniques using recombinant proteins. Besides biochemical approaches, genetic approaches are of great importance too. Methods such as yeast two-hybrid 28, 29, 30, used for analyzing a wide range of complexes from molecular machines to the whole proteome in yeast 31, 32, 33, and methods for analysing higher organisms, such as noninvasive fluorescence-based methods using resonance energy transfer (RET), namely bioluminescence RET (BRET) and fluorescence RET (FRET), or methods based on protein fragment complementation, such as bimolecular fluorescence complementation (BiFC), are nowadays widely used for visualizing interactions in their normal milieu, i.e. in living cells.

2 Source: http://www.fixingproteomics.org

4.1.1 Endogenous isolation techniques3

Such techniques attempt to isolate endogenous protein complexes from cells and can be divided into these approaches:

4.1.1.1 Antibody-based methods

Antibody-based methods, also known as immunochemical methods, are based on precipitating complexes out of the cell lysate by using a specific immobilized antibody against a known component of the complex, which is then purified by washing away nonspecific interactions 1. The reason why this method should be chosen whenever feasible (meaning that a specific antibody can be developed) is the universality, specificity and efficiency of the procedure. Initiatives to develop antibodies to all components of the human proteome are under way. More generally, antibody proteomics can refer to a wide range of platforms, including tissue profiling 34, protein assays 35 and the use of antibodies as capture reagents for PPI analysis.

4.1.1.2 Biochemical purification and

3 The content of these two sample preparation subsections does not serve as a complete review of the methods available, but just mentions the most important ones.

One of the most efficient and widely used techniques for isolating PPIs from cells is affinity purification. It exploits the biochemical properties of a tag (a polypeptide or even a whole protein) attached to a bait protein to purify all partners interacting with it. This is achieved by fusing the coding sequences of the target protein and the tag, resulting in expression of the modified protein. Various columns with a high affinity for a specific tag are used to enrich the complex. There is a plethora of tags that can be used in an experiment; each has its strengths and weaknesses, and it is up to the researcher to decide which one to use. This presents the challenge of correctly determining the properties of the target component of the complex (stability, hydrophobicity, etc.). Generally, however, the most favored are: Arg-tag, calmodulin-binding peptide, cellulose-binding domain, DsbA, c-myc-tag, glutathione S-transferase, FLAG-tag, HAT-tag, His-tag, maltose-binding protein, NusA, S-tag, SBP-tag, Strep-tag and thioredoxin. 36 Another approach to minimizing the negative effects of a tag on the protein has been developed: combining several tags in order to make the most out of them, i.e. to enhance solubility and yield. 37 Tags such as the dual His6-MBP affinity tag or the TAP tag 38, 39 count among these. The tandem affinity purification tag, or TAP tag, is the one used the most. It consists of two tags separated by a cleavable linker. The bait protein can be tagged at either the C-terminus or the N-terminus, and it is advisable to try both, since the tag can interfere with the protein function (the incidence is about 5%). Originally it was developed for yeast, and it is composed of two IgG-binding domains of Staphylococcus aureus protein A (ProtA) and a calmodulin-binding peptide (CBP) separated by a TEV cleavage site. 40 One should beware of overexpressing the modified target protein, because this can cause problems such as association with chaperones, competition in complex formation, or even the formation of false new interactions. The two tags do not necessarily have to be on the same protein, which allows the isolation of specific complexes if the two proteins are independently present in other complexes. 41 Although designed for yeast, this technique has been successfully used for proteome-wide analysis of higher eukaryotic cells. Together with double-stranded RNA silencing, it can be used to avoid competition between tagged and untagged proteins in complex formation 42, and the problem of low yield can be solved by developing new tag systems. 43

We can conclude that the TAP tag method together with mass spectrometry is one of the most advantageous and widely used methods for deciphering PPIs. Its advantage lies in the fact that a universal tag can be used for wide-scale analysis. Moreover, this method is quite cheap and rapid.

Figure 6: A scheme of a TAP tag purification strategy. Source: http://www.nature.com/nrm/journal/v4/n1/images/nrm1007-f1.jpg

4.1.2 In vitro techniques using recombinant proteins3

These techniques utilize various recombinant proteins to capture PPIs. The basic approaches are as follows:

4.1.2.1 Immobilized recombinant proteins

The simplest in vitro method uses immobilized recombinant proteins to bind their interacting partners; after washing, the bound partners are eluted and characterized by a suitable mass spectrometric technique. 1

4.1.2.2 Phage display technology

This technology is used not only for studying PPIs but also for determining DNA-protein interactions. It uses a bacteriophage to link proteins with the genetic information that encodes them. It combines two concepts. First, inserting a DNA sequence into a structural gene of a bacteriophage, and thus creating a mutation, leads to expression of the mutated peptide on the surface of the viral particle. Second, if the insert is a random oligonucleotide, the result is a library of particles in which each mutated protein surrounds the mutant DNA that encodes it. 44

4.1.2.3 Protein arrays

Protein arrays, also called protein chips, have become a powerful tool for large-scale analyses because of their speed, ease of use and ability to detect thousands of elements in a single run. So far, protein arrays have been used to detect antibody-antigen, protein-protein, protein-nucleic acid, protein-lipid and protein-small molecule interactions, as well as enzyme-substrate interactions and kinase activity. 45 There are two types of protein microarrays: analytical microarrays, which employ arrayed antibodies to detect proteins and determine their concentrations in mixtures and are used mainly for protein profiling; and functional protein microarrays, in which a set of proteins or even the entire proteome is placed at high density, usually on a glass plate, so that a wide range of biochemical properties can be tested. 46 Detection is usually achieved with fluorescent 47 or chemiluminescent probes, radioisotope labeling 48, or mass spectrometry. The approach is currently used in biomedical research to search for disease-specific proteins that could serve as disease markers 49 or drug targets. 50

4.2 Protein separation

Once the PPI of interest has been purified, it has to be separated before digestion so that it can be prepared for the final identification. First it is pre-concentrated and pre-treated by affinity chromatography 51, and then it is separated by either chromatographic or electrophoretic techniques. The former are usually represented by liquid chromatography (LC), in most cases coupled on-line with the analyzing instrument. Liquid chromatography uses a liquid mobile phase and can be carried out either in a column or on a plane. The latter use an electric current to separate the sample, which is necessary for an exact analysis that focuses on only one band (1D) or spot (2D) at a time. Current electrophoretic techniques use a gel matrix: polyacrylamide for proteins and smaller DNA or RNA molecules, agarose for larger molecules. The proteins are then commonly stained with silver, Coomassie blue, or fluorescent dyes (e.g. SYPRO). In special cases a different approach, such as UV light, can be used. The procedures are as follows:

4.2.1 One-dimensional electrophoresis

One-dimensional (1D) electrophoresis is regarded as the simplest separation technique. It separates molecules by means of an electromotive force (EMF), which moves them through the gel matrix according to their charge and mass. Negatively charged molecules move toward the anode and positively charged molecules toward the cathode. 1D electrophoresis is suitable for basic, small-scale proteomic studies. Another 1D technique for separating proteins is isoelectric focusing (IEF), in which a pH gradient is applied to the gel in addition to the electric potential. At any pH other than its isoelectric point a protein is charged and is therefore pulled toward one end of the gel until it reaches the position where its net charge is zero. IEF is also the first step of 2D electrophoresis.
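The principle of isoelectric focusing can be illustrated with a short calculation. The following is a minimal sketch, not a validated tool: it estimates the net charge of a peptide from the Henderson-Hasselbalch equation and finds the pH at which that charge is zero by bisection. The pKa values are approximate textbook figures chosen here as assumptions, and the function names are hypothetical.

```python
# Approximate pKa values of ionizable groups (assumed, illustrative only).
PKA_BASIC = {"N_term": 9.0, "K": 10.5, "R": 12.5, "H": 6.0}
PKA_ACIDIC = {"C_term": 2.0, "D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}


def net_charge(sequence: str, ph: float) -> float:
    """Net charge of a peptide at the given pH (Henderson-Hasselbalch)."""
    charge = 0.0
    # Basic groups contribute up to +1 each when protonated.
    for group in ["N_term"] + [aa for aa in sequence if aa in PKA_BASIC]:
        charge += 1.0 / (1.0 + 10 ** (ph - PKA_BASIC[group]))
    # Acidic groups contribute up to -1 each when deprotonated.
    for group in ["C_term"] + [aa for aa in sequence if aa in PKA_ACIDIC]:
        charge -= 1.0 / (1.0 + 10 ** (PKA_ACIDIC[group] - ph))
    return charge


def isoelectric_point(sequence: str, tol: float = 1e-4) -> float:
    """pH at which the net charge is zero, found by bisection on pH 0-14."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        # The net charge decreases monotonically with pH.
        if net_charge(sequence, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    for peptide in ("KKKK", "DDDD", "ACDEFGHIKLMNPQRSTVWY"):
        print(peptide, round(isoelectric_point(peptide), 2))
```

In the gel, a protein stops migrating at the position where the local pH equals this value, which is why a lysine-rich peptide focuses at a higher pH than an aspartate-rich one.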

4.2.2 Two-dimensional electrophoresis

This technique separates proteins in two dimensions. Usually the first dimension is charge (isoelectric point, by IEF) and the second is mass. The two separations are carried out one after the other at a 90-degree angle, which yields a 2D gel. It is very unlikely that two molecules will have the same values for both distinguishing properties, so 2D electrophoresis is a powerful method with high resolving power. The most advanced techniques can resolve about 5000 proteins simultaneously and detect less than 1 ng of protein per spot. 52

4.2.3 SDS-PAGE

SDS-PAGE refers to polyacrylamide gel electrophoresis that uses SDS, a reagent that denatures secondary and non-disulfide-linked tertiary structures and gives all proteins almost the same negative charge per unit mass. The proteins therefore migrate in the electric field according to the length of their polypeptide chain (mass). This is the most common technique used to separate proteins in various experiments.

4.2.4 QPNC-PAGE

Quantitative preparative native continuous polyacrylamide gel electrophoresis is a high-resolution variant of electrophoresis that uses a special gel and buffer solution to separate proteins by isoelectric point. It is mainly used to isolate metalloproteins from biological samples and to determine their structure-function relationships.

4.3 Sample digestion

After electrophoretic separation the bands or spots are excised, cut into pieces and subjected to in-gel digestion with a specific protease such as trypsin, which produces a characteristic peptide pattern for mass spectrometric analysis. 53 The extraction efficiency from gels is about 20%. 1 Besides trypsin, chymotrypsin, pronase, elastase, subtilisin, thermolysin and various endoproteinases are also used. 54 As an alternative to in-gel digestion, in-solution digestion can be performed, that is, the proteins are digested without prior separation and analyzed by mass spectrometry. This technique requires high-pressure liquid chromatography (HPLC) or two-dimensional liquid chromatography 55 to cope with the complexity of the sample. Its main advantages over classic in-gel digestion are speed, efficiency and higher peptide recovery.
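The peptide pattern produced by trypsin can be predicted from a protein sequence. The sketch below assumes the commonly used rule that trypsin cleaves on the C-terminal side of lysine (K) and arginine (R) unless the next residue is proline (P); the function name and the example sequence are hypothetical and serve only as an illustration.

```python
def tryptic_digest(sequence: str, missed_cleavages: int = 0) -> list[str]:
    """In-silico tryptic digest.

    Cleaves after K or R unless followed by P, and optionally also returns
    peptides that contain up to `missed_cleavages` uncleaved sites.
    """
    # Collect cleavage positions, including both ends of the sequence.
    sites = [0]
    for i, residue in enumerate(sequence[:-1]):
        if residue in "KR" and sequence[i + 1] != "P":
            sites.append(i + 1)
    sites.append(len(sequence))

    peptides = []
    for start in range(len(sites) - 1):
        for extra in range(missed_cleavages + 1):
            end = start + 1 + extra
            if end < len(sites):
                peptides.append(sequence[sites[start]:sites[end]])
    return peptides


if __name__ == "__main__":
    print(tryptic_digest("AKRPGKTR"))
    print(tryptic_digest("AKRPGKTR", missed_cleavages=1))
```

Allowing one missed cleavage mirrors a setting that search engines typically expose, because real digests are rarely complete.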

4.4 Mass spectrometry detection

Mass analysis by mass spectrometry has become a cornerstone of proteomic research. The technique has given the life sciences the tools they needed to become competitive and progressive, and it has moved biological research to the forefront of the field.

But it was not always so. Mass spectrometry as a technique is almost 100 years old. 56 It was originally used to measure the masses of atoms, and it took decades before scientists grasped its potential. By the beginning of the 1980s the analysis of small organic molecules presented no problems. Larger molecules, however, were still a challenge: the obstacle was how to ionize them efficiently. Many approaches were tried without significant success. Then, in 1988, MALDI (matrix-assisted laser desorption and ionization) and ESI (electrospray ionization) appeared almost simultaneously and revolutionized the world of MS. These ionization techniques remain dominant today and have enabled large-scale analyses of biomolecular samples. The principle of MS consists of ionizing biomolecules and measuring their mass-to-charge ratios. In proteomics, mass spectrometry is now used mainly for protein identification, localization and determination of post-translational modifications, and protein quantification. 57

4.4.1 The construction of mass spectrometer

The text above has introduced the first stage of mass spectrometry, namely ionization. A mass spectrometer consists of two further functional units, the mass analyzer and the detector.

Figure 7: A basic scheme of a mass spectrometer. Source: http://www.hull.ac.uk/chemistry/masspec3/principles%20of%20ms.html

4.4.1.1 Ion Source

Ionization is the first step of mass spectrometric analysis. Two fundamentally different soft ionization techniques are used (soft ionization is needed to overcome the propensity of biomolecules to fragment). The first is MALDI, in which the biomolecules are mixed with a special matrix that protects them from being destroyed by the laser beam that vaporizes and ionizes the sample. The second, ESI, is based on spraying the sample solution from a capillary as a fine aerosol of charged droplets from which the solvent evaporates. After a short time a droplet becomes unstable and undergoes a process called Rayleigh fission, i.e. it emits charged jets that liberate the ions into the gas phase. Other ionization techniques include glow discharge, field desorption (FD), fast atom bombardment (FAB), thermospray, desorption/ionization on silicon (DIOS), Direct Analysis in Real Time (DART), atmospheric pressure chemical ionization (APCI), secondary ion mass spectrometry (SIMS), spark ionization and thermal ionization (TIMS).

4.4.1.2 Mass analyzer

The next step of mass spectrometric analysis is the separation of the generated ions according to their mass-to-charge ratios (m/z). This is done in the mass analyzer. There are many types of analyzers, each with its strengths and weaknesses and each suitable for different experimental conditions. That is why several analysis steps can be coupled in a procedure referred to as tandem mass spectrometry (MS/MS). The most commonly used mass analyzers include: time-of-flight (TOF), quadrupole mass analyzer, quadrupole ion trap, linear quadrupole ion trap and Fourier transform ion cyclotron resonance (FT-ICR).

4.4.1.3 Mass detector

The last, but not least important, part of the mass spectrometer is the detector, which records either the charge induced or the current produced when an ion passes by or hits its surface. The signal produced in the detector is passed to a recording system that produces a mass spectrum, a record of ion abundance as a function of m/z.

4.4.2 Common mass spectrometric configurations

Knowing how a mass spectrometer is constructed is not enough; the options for mass spectrometric analysis have to be seen in the context of all distinguishing factors. The demands placed on the mass spectrometric tool vary from experiment to experiment and are not always compatible. The basic mass spectrometric approaches are reviewed here.

Figure 8: A common outline of mass spectrometric strategies

From the chart above we can conclude that ESI is the preferred ionization technique. Moreover, ESI can be coupled on-line with HPLC, which results in better separation and thus better identification of the sample. However, the experiment takes much longer to complete. ESI-ionized samples are usually analyzed by MS/MS. In contrast, MALDI coupled with a TOF analyzer is used to analyze non-complex samples when time is the critical factor. MALDI-Q-TOF and MALDI-TOF/TOF instruments can operate in tandem MS mode. Most current proteomic studies are based on data acquired by LC-MS/MS with ESI ionization. 1

4.4.3 Protein identification strategies

Protein identification strategies refer to the fundamental approach to identifying proteins. In general there are two approaches. The first, 'bottom-up', prevails in current mass spectrometry-based research. It is based on digesting a protein or protein mixture into short peptides with a suitable protease, followed by MS or MS/MS analysis. The other, 'top-down', means direct analysis of intact proteins without prior digestion. A newly emerging technique called 'middle-down' tries to combine the advantages of the two approaches by producing larger peptides (greater than 3 kDa) and then analyzing them. 58 There is some uncertainty about what the distinguishing factor is: some sources take it to be the entity subjected to the primary separation technique (meaning that protein separation followed by digestion and MS analysis of the peptides could be considered a top-down approach). In most cases, however, the differentiating factor is the form in which the sample is introduced into the mass spectrometer. 59

4.4.3.1 Bottom up strategy

In this older and more established approach the analytes introduced into the mass spectrometer are peptides. Proteins can first be separated and then digested, or alternatively the whole protein mixture can be digested and the peptides then separated, usually by LC coupled on-line to MS. There are two strategies for bottom-up proteomics: peptide mass fingerprinting and tandem MS (MS/MS).

Peptide mass fingerprinting

In this method the peptide masses obtained from MS analysis are characteristic of a specific protein (they provide a mass fingerprint) and can be used in a database search to assign them to that protein. The technique requires simple protein mixtures or, better, pure proteins, and a unique identification must be based on several peptides.
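The matching step of peptide mass fingerprinting can be sketched as follows. This is an illustrative simplification, not the scoring used by any real search engine: each candidate protein is represented by a list of its theoretical peptides, and the score is simply the number of observed masses that fall within a tolerance of a theoretical mass. The candidate names, peptide sequences and the 0.5 Da tolerance are assumptions made for the example.

```python
# Monoisotopic residue masses in Da (standard values, rounded to 5 decimals).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free termini


def peptide_mass(peptide: str) -> float:
    """Monoisotopic mass of a neutral, unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def score_candidates(observed_masses, candidates, tolerance_da=0.5):
    """Count, for each candidate protein, how many observed masses it explains.

    `candidates` maps a protein name to the list of its theoretical peptides.
    """
    scores = {}
    for name, peptides in candidates.items():
        theoretical = [peptide_mass(p) for p in peptides]
        scores[name] = sum(
            any(abs(obs - theo) <= tolerance_da for theo in theoretical)
            for obs in observed_masses
        )
    return scores


if __name__ == "__main__":
    # Hypothetical candidates and simulated measurements with small errors.
    candidates = {"protein_A": ["GASK", "LLR"], "protein_B": ["WWK", "FYR"]}
    observed = [peptide_mass("GASK") + 0.1, peptide_mass("LLR") - 0.2]
    print(score_candidates(observed, candidates))
```

A candidate that explains several observed masses is a far more convincing match than one that explains a single mass, which is why identification should rest on several peptides.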

Tandem MS

In this technique a peptide ion is subjected to dissociation to produce fragment (product) ions. The amino acid sequence of the precursor can be derived from the masses of the fragment ions. This principle is the basis of de novo sequencing by MS/MS. The technique has become very popular and underlies newer approaches such as shotgun proteomics.
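How a sequence relates to fragment ion masses can be shown with a small calculation. The sketch below assumes the most common fragment types in low-energy dissociation, the singly charged b ions (N-terminal fragments) and y ions (C-terminal fragments), and an unmodified peptide; the function names are hypothetical. Successive ions of one series differ by the mass of one residue, which is what makes it possible to read the sequence from the spectrum.

```python
# Monoisotopic residue masses in Da (standard values, rounded to 5 decimals).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def peptide_mass(peptide: str) -> float:
    """Monoisotopic mass of the neutral peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def fragment_ions(peptide: str):
    """Return (b_ions, y_ions): m/z of singly charged b1..b(n-1) and y1..y(n-1)."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]

    b_ions, running = [], 0.0
    for m in masses[:-1]:            # N-terminal fragments
        running += m
        b_ions.append(running + PROTON)

    y_ions, running = [], WATER
    for m in reversed(masses[1:]):   # C-terminal fragments keep the water
        running += m
        y_ions.append(running + PROTON)
    return b_ions, y_ions


if __name__ == "__main__":
    b, y = fragment_ions("PEPTIDE")
    for i, (bi, yi) in enumerate(zip(b, y), start=1):
        print(f"b{i} = {bi:.4f}   y{i} = {yi:.4f}")
```

A useful consistency check is that each complementary pair, b(i) and y(n-i), adds up to the neutral peptide mass plus two protons.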

4.4.3.2 Top down strategy

In this strategy intact protein molecular ions generated by ESI are introduced into the mass analyzer and subjected to gas-phase fragmentation. One limitation of the technique is the difficulty of determining the masses of multiply charged ions. 60 However, this can be partly circumvented by ion charge-state manipulation (introducing gas-phase anions to strip protons from the product ions) and by mass spectrometers with high mass accuracy (FT-ICR, Orbitrap). The main advantages of the technique are theoretical access to the whole protein sequence and the ability to locate and analyze PTMs. The limitations are the expensive instrumentation and the analytical obstacles posed by multiply charged ions and by proteins larger than 50 kDa.
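The difficulty with multiply charged ions is that the instrument reports m/z, not mass, so the charge has to be inferred. A minimal sketch of one textbook way to do this follows: two neighbouring peaks of the same protein in an ESI spectrum are assumed to differ by exactly one proton, which is enough to solve for the charge and the neutral mass. The function names and the example mass are hypothetical, and real spectra need more robust deconvolution than this.

```python
PROTON = 1.00728  # mass of a proton in Da


def mz(mass: float, charge: int) -> float:
    """m/z of an ion formed by adding `charge` protons to a neutral molecule."""
    return (mass + charge * PROTON) / charge


def deconvolute_pair(mz_high: float, mz_low: float):
    """Charge and neutral mass from two adjacent charge-state peaks.

    `mz_high` carries charge z and `mz_low` carries charge z + 1, so
    z = (mz_low - PROTON) / (mz_high - mz_low) and M = z * (mz_high - PROTON).
    """
    z = round((mz_low - PROTON) / (mz_high - mz_low))
    return z, z * (mz_high - PROTON)


if __name__ == "__main__":
    true_mass = 16951.0  # hypothetical protein mass in Da
    peak_10, peak_11 = mz(true_mass, 10), mz(true_mass, 11)
    print(deconvolute_pair(peak_10, peak_11))
```

The higher the charge, the closer neighbouring peaks lie, which is one reason why high mass accuracy matters for large proteins.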

4.5 Data evaluation

The next step of a proteomic experiment is the evaluation of the raw data acquired by the software that controls the mass spectrometer. Although there are many possible variants, the process is in principle fairly simple. The raw data have to be processed first, which includes smoothing, centroiding and charge-state deconvolution of the acquired spectra. 1 The processed data are then searched against a protein database. Two types of algorithm are most commonly used: probabilistic scoring, as in the search engine MASCOT 61, and mathematical correlation, as in SEQUEST. The crucial factors in a database search are the mass accuracy applied and, subsequently, the thresholds set for the minimum scores and the mass tolerances allowed for the precursor and fragment ions. A preferable approach to determining these parameters is to search the acquired data sets against a decoy protein database. 62 It may happen that several proteins match the identified pattern; in that case they are often homologues or isoforms of the same protein. 63
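The use of a decoy database for choosing a score threshold can be sketched in a few lines. The example assumes the simplest form of the idea: the same spectra are searched against a real (target) database and against a decoy database of the same size, and the number of decoy hits above a threshold estimates the number of false target hits above it. The scores and function names are invented for the illustration and do not come from any search engine.

```python
def estimate_fdr(target_scores, decoy_scores, threshold):
    """Estimated false discovery rate: decoy hits / target hits at or above threshold."""
    targets = sum(score >= threshold for score in target_scores)
    decoys = sum(score >= threshold for score in decoy_scores)
    return decoys / targets if targets else 0.0


def threshold_for_fdr(target_scores, decoy_scores, max_fdr=0.01):
    """Lowest observed target score whose estimated FDR does not exceed max_fdr."""
    for threshold in sorted(set(target_scores)):
        if estimate_fdr(target_scores, decoy_scores, threshold) <= max_fdr:
            return threshold
    return None


if __name__ == "__main__":
    # Invented scores for illustration only.
    targets = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
    decoys = [5, 15, 25, 35]
    print(estimate_fdr(targets, decoys, 30))
    print(threshold_for_fdr(targets, decoys, max_fdr=0.05))
```

Setting the threshold this way ties the minimum accepted score to the data set at hand instead of to a fixed number.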

The biggest challenge is to evaluate the data correctly. According to recent research, there are several ways to make the acquired data more reliable. Multiple analyses, measurement on complementary instruments and interpretation of the data with complementary algorithms can minimize the rate of false interactions. 64 Moreover, the whole analysis process can be integrated into one linear pipeline. This overcomes obstacles in methodological approaches and helps to establish better standards. 65

4.6 Data standardization and interpretation

High-throughput technologies such as proteomics generate vast data sets that have to be stored in databases in an easily comparable and interpretable form. Only through careful standardization will we be able to speed up progress and not get lost in the jungle of structural biology standards. Despite the many advantages of standardization, the problem is that our understanding of biological processes is not fixed but keeps developing as new inventions and findings are made. That is what makes standardization such a challenge. 66 The development of a standard comprises four steps, all of them necessary: conceptual model design, model formalization, development of a data exchange format, and implementation of the supporting tools. Several organizations, standards and resources are already established in proteomics: Human Proteome Organisation (HUPO), Proteomics Standards Initiative (PSI), Proteomics Standards Initiative Molecular Interaction (PSI-MI), Minimum Information About a Proteomics Experiment (Mass Spectrometry, Mass Spectrometry Informatics) (MIAPE (MS, MSI)), Proteomics Experiment Data Repository (PEDRo), Proteomics Experiment Markup Language (PEML), Molecular and Cellular Proteomics (MCP). 67 Apart from database standards there are many other types of standards. Conceptual standards, for example, address the interpretation of the acquired data by defining the minimum information about an experiment that is needed for it to be meaningful. 68, 69 Another important category of standardization is data exchange. Extensible markup language (XML) has proved to be a reliable standard for data exchange. However, the latest bioinformatic tools have outgrown the possibilities of XML, mainly in terms of interoperability and dynamics. A solution seems to lie in wider use of semantic web technologies, such as the resource-description framework (RDF). The idea behind this concept is clear: everything is a resource that is connected to other resources via properties. 70 When trying to find a common denominator among the plethora of standards, all of the efforts to establish widespread standards can be summed up in one sentence: Simplicity, but not oversimplification, is the key to success. 67
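The RDF idea that everything is a resource connected to other resources via properties can be made concrete with a toy example. The sketch below does not use any RDF library; it simply stores statements as (subject, property, object) triples in plain Python and queries them by pattern. All identifiers with the "ex:" prefix are invented placeholders, not terms from a real vocabulary.

```python
# Each statement is a (subject, property, object) triple; names are hypothetical.
TRIPLES = {
    ("ex:ProteinA", "rdf:type", "ex:Protein"),
    ("ex:ProteinB", "rdf:type", "ex:Protein"),
    ("ex:ProteinA", "ex:interactsWith", "ex:ProteinB"),
    ("ex:ProteinA", "ex:identifiedBy", "ex:Experiment1"),
    ("ex:Experiment1", "ex:usesMethod", "ex:TandemMS"),
}


def match(subject=None, predicate=None, obj=None):
    """Return all triples that agree with the given pattern (None matches anything)."""
    return sorted(
        (s, p, o)
        for (s, p, o) in TRIPLES
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    )


if __name__ == "__main__":
    # What does ProteinA interact with?
    print(match(subject="ex:ProteinA", predicate="ex:interactsWith"))
    # Which resources are proteins?
    print(match(predicate="rdf:type", obj="ex:Protein"))
```

Because every statement has the same three-part shape, data from different sources can be merged by simply taking the union of their triples, which is the interoperability argument made above.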

Chapter 5

Future perspectives

Although proteomic technologies have been developing very rapidly, this development has certainly not yet reached its end. Most probably the next few decades will see structural biology flourish even more, with functional approaches at the forefront. Let us look at some particular approaches that might shape the future of proteomic research:

o Visual proteomics, which uses cryo-electron tomography to deliver detailed structures of macromolecular interactions 71

o Increased resolution and mass accuracy of current mass spectrometric methods 72

o Brand new techniques, such as the DNA ligation assay or the electrocapture technique 73

o Proteomics capable of proven medical diagnostics 74, 75

o Organizational and functional studies of receptor signaling in cells 76

o Higher sequence coverage of identified proteins, resulting in more reliable identification and better handling of post-translational modifications 77

o Absolute proteomics is a set of computational tools that aim at absolute quantitation in mass spectrometry-based proteomics. It is not entirely clear what happens during ionization and when ionized peptides enter the mass spectrometer, but it is known that not all peptides are equally favoured during ionization. This limitation has prevented proteomics from becoming a truly quantitative science. An interesting solution has emerged from the observation that the same peptides from a particular protein are detected over and over again. This can be used to build a predictive database of the peptides that are detected every time a particular protein is analyzed (proteotypic peptides), and computational tools can then direct the mass spectrometer to detect the proteotypic peptides for each protein, which makes the mass spectrometer more effective. 78

As can be concluded from the above, the proteomic disciplines are currently rich in variety and are going to extend their methodological and theoretical coverage even further. As regards protein complexes, the main issues to solve are higher yield and efficiency along the whole experimental pathway: PPI purification (without losing transient interactions), sufficient separation and gel extraction, and more sophisticated mass spectrometric detection and data treatment. It is very likely that a bright future lies before structural biology, one that will bring many astonishing breakthroughs.

Chapter 6

Experimental part

Telomeres, nucleoprotein structures at chromosome ends, protect chromosomes from degradation and fusion. This protective function depends on maintaining a certain minimum length of telomeric DNA and on telomere-associated proteins. Replicative shortening of telomeres or dysfunction of the stabilizing proteins results in DNA instability. Telomere maintenance is usually achieved by telomerase activity or by ALT (alternative lengthening of telomeres). Since the major part of the telomere is packed into nucleosomes, forming a heterochromatin structure, epigenetic factors can also play an important role in telomere maintenance. It is assumed that telomere function is linked to chromatin structure, and chromatin dynamics therefore seems to have a great influence on telomere stability. Chromatin undergoes many processes influenced by various factors, among which the large and diverse superfamily of HMG (high mobility group) proteins plays an important role. 79, 80

The experimental part focuses on determining the differences in protein composition between cell extracts from mouse embryonic fibroblasts (MEFs) of the VA1 line, derived from a normal mouse, and the C1 line, derived from a mouse with a knocked-out HMGB1 gene. The results of the comparison of protein composition for two corresponding 1-D bands are presented in this work.

5.1 Objectives

The objective of the experimental part is to analyze the proteins in two selected electrophoretic bands, to determine the differences in their protein composition and thus to search for candidates that could explain, at the molecular level, the effect of the HMGB1 gene on telomerase activity, which is lower in the mutant mouse.

5.2 Material

For the purposes of the experiment the following samples and chemicals were used:

5.2.1 Samples

Cell lysates from mouse embryonic fibroblast lines VA1 (normal mouse) and C1 (mouse with a knocked out HMGB1 gene).

5.2.2 Chemicals

All chemicals needed to run this experiment are mentioned in the following sub-section of this chapter.

5.3 Methods

The list of methods comprises the electrophoretic separation, mass spectrometry analysis and data processing.

5.3.1 Sample separation

Protein extracts were separated by one-dimensional gel electrophoresis (1-D GE). An aliquot of the concentrated phage proteins was boiled for 2 min in sample buffer (0.175 M Tris-HCl, pH 6.8, 15% w/v glycerol, 5% w/v SDS, 4.65% w/v DTT, a trace of bromophenol blue). Vertical 1-D GE was performed on 10% T acrylamide gels. Electrophoresis was performed on a Protean II xi Cell at constant current (15 mA in the stacking gel and 30 mA during separation) in running buffer (0.025 M Tris-HCl, 0.192 M glycine, 0.1% w/v SDS). The gels were 20×20×0.1 cm. Proteins were stained with Bio-Safe Coomassie G-250.

5.3.2 Mass spectrometry analysis

Bands selected for analysis were excised from the 1-D gels. After destaining, the proteins in the gel pieces were incubated with trypsin (sequencing grade, Promega) at 37 °C for 2 h. Digested peptides were extracted from the gels using a 50% ACN solution with 5% formic acid. The digestion protocol is based on the procedure described by Shevchenko et al. 53

LC-MS/MS experiments were performed on an HPLC system consisting of a gradient pump (Ultimate), an autosampler (Famos) and a column-switching device (Switchos; LC Packings, Amsterdam, The Netherlands) coupled on-line with an HCTultra PTM Discovery System ion trap mass spectrometer (Bruker Daltonik). Tryptic digests were concentrated and desalted using a PepMap C18 trapping column (300 µm × 5 mm, LC Packings). The sample volume was 15 µL. After washing with 0.1% formic acid, the peptides were eluted from the trapping column using an acetonitrile/water gradient (4 µL/min) onto a fused-silica capillary column (320 µm × 180 mm), on which the peptides were separated. This column was filled with 4-µm Jupiter Proteo sorbent (Phenomenex, Torrance, CA) according to a previously described procedure. 81 Mobile phase A consisted of an acetonitrile/0.1% formic acid (5/95 v/v) mixture and mobile phase B of an acetonitrile/0.1% formic acid (80/20 v/v) mixture. The gradient elution started at 5% of mobile phase B; after 4 minutes it was increased linearly from 5% to 50% over 70 minutes. The analytical column outlet was connected to the electrospray ion source via a fused-silica capillary with an inner diameter of 50 µm. Nitrogen was used as both nebulizing and drying gas. The pressure of the nebulizing gas was 15 psi. The temperature and flow rate of the drying gas were set to 300 °C and 6 L/min, respectively, and the capillary voltage was 4.0 kV. The mass spectrometer was operated in positive ion mode in an m/z range of 300-1500 for MS and 100-3000 for MS/MS scans. Extraction of the mass spectra from the chromatograms, mass annotation and deconvolution of the mass spectra were performed using DataAnalysis 4.0 software (Bruker Daltonik).
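The gradient program described above can be written down as a small function, which makes it easy to check the solvent composition at any time point. This is a sketch under stated assumptions: the hold at 50% B after the end of the ramp is an assumption, since the text does not say what happens after 74 minutes, and the function names are hypothetical. The acetonitrile content follows from the compositions of mobile phases A (5% acetonitrile) and B (80% acetonitrile).

```python
HOLD_MIN = 4.0      # initial hold at the starting composition (minutes)
RAMP_MIN = 70.0     # duration of the linear ramp (minutes)
START_B = 5.0       # % mobile phase B at the start
END_B = 50.0        # % mobile phase B at the end of the ramp
ACN_IN_A = 5.0      # % acetonitrile in mobile phase A
ACN_IN_B = 80.0     # % acetonitrile in mobile phase B


def percent_b(t_min: float) -> float:
    """Percentage of mobile phase B at time t (minutes from the start)."""
    if t_min <= HOLD_MIN:
        return START_B
    if t_min >= HOLD_MIN + RAMP_MIN:
        return END_B  # assumption: composition is held after the ramp
    return START_B + (END_B - START_B) * (t_min - HOLD_MIN) / RAMP_MIN


def percent_acetonitrile(t_min: float) -> float:
    """Overall acetonitrile content (% v/v) of the mixed mobile phase."""
    fraction_b = percent_b(t_min) / 100.0
    return ACN_IN_A * (1.0 - fraction_b) + ACN_IN_B * fraction_b


if __name__ == "__main__":
    for t in (0, 4, 39, 74):
        print(f"{t:>3} min: {percent_b(t):5.1f} % B, "
              f"{percent_acetonitrile(t):5.2f} % acetonitrile")
```

Under these figures the acetonitrile content rises from 8.75% at the start to 42.5% at the end of the ramp.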

5.3.3 Data processing

The MASCOT 2.2 (MatrixScience, London, UK) search engine was used for processing the MS/MS data. Database searches were done against the NCBI protein database (Release 20090210). The mass tolerance was 0.5 Da for both the precursor (MS) ions and the MS/MS fragment ions. All searches were done without taxonomic restriction. Oxidation of methionine and the propionamide adduct of cysteine were set as optional modifications, and one missed enzyme cleavage was allowed, for all searches.

5.4 Results

The results comprise the 1D electrophoretic separation of the cell lysates from mouse embryonic fibroblasts, which were obtained from the laboratory of Dr. Michal Štros. A gel containing the two analyzed cell lines was obtained. The bands of interest are marked as 3 in the VA1 cell line (the one on the left) and 1 in the C1 cell line. The selected bands were then excised and analyzed by LC-MS/MS (after in-gel trypsin digestion).

Figure 9: The electrophoretic gel containing separated proteins from the studied cell lines.

5.4.2 Protein analysis

The following tables (Tab. 1 and Tab. 2) summarize the proteins identified in the analyzed bands. The listed proteins are the top-ranking candidates selected by the MASCOT search engine. The remaining alternative candidates are listed in the appendix in order of descending score.

Table 1: The list of proteins identified in the VA1 cell line. Proteins marked in red are present only in this cell line; the others, in the base colour, occur in both samples.

VA1: band 3

Protein name                                                  Mass (Da)  Score  Sequence coverage (%)
Hypothetical protein LOC433182                                47111      1407   56
unnamed protein product                                       50086      822    42
ARP3 actin-related protein 3 homolog                          47327      567    33
eukaryotic translation elongation factor 1 gamma              50029      566    24
ribosomal protein L4                                          47124      370    21
heterogeneous nuclear ribonucleoprotein F                     45701      243    10
proteasome (prosome, macropain) 26S subunit, ATPase 3         49462      220    13
suppression of tumorigenicity 13                              41630      187    10
Y box-binding protein                                         35822      182    15
methionine adenosyltransferase II, alpha                      43661      178    6
cp27                                                          32916      144    15
26S protease regulatory subunit 6B                            47252      126    6
splicing factor 3b, subunit 4                                 44327      118    3
actin-like 6A                                                 47399      105    5
mCG16052, isoform CRA_b                                       47819      100    5
Sjogren syndrome antigen B                                    47727      79     3

Table 2: The list of proteins identified in the C1 cell line. Proteins marked in red are present only in this cell line; the others, in the base colour, occur in both samples.

C1: band 1

Protein name                                                  Mass (Da)  Score  Sequence coverage (%)
ARP3 actin-related protein 3 homolog                          47327      1039   52
ribosomal protein L4                                          47124      766    30
eukaryotic translation elongation factor 1 gamma              50029      751    28
hypothetical protein LOC433182                                47111      727    34
unnamed protein product                                       50086      726    35
mKIAA4020 protein                                             44985      670    31
eukaryotic translation initiation factor 3 subunit 6          52201      543    26
Serpin H1 precursor (Collagen-binding protein) (Colligin)     46560      486    24
  (47 kDa heat shock protein) (Serine protease inhibitor J6)
p53                                                           43359      359    24
eukaryotic translation initiation factor 2, subunit 2 (beta)  38068      337    24
proteasome (prosome, macropain) 26S subunit, ATPase 3         49462      276    14
SDF3                                                          46175      262    11
Y box-binding protein                                         35822      234    13
heterogeneous nuclear ribonucleoprotein H1                    49168      230    9
actin-like 6A                                                 47399      228    8
26S protease regulatory subunit 6B                            47252      153    13
septin 10 isoform 2                                           49797      141    4
suppression of tumorigenicity 13                              41630      119    7
cleavage stimulation factor, 3' pre-RNA, subunit 1            48351      110    6
parvin, alpha                                                 42304      107    3
protein synthesis initiation factor 4A                        46460      88     3
RIKEN cDNA 9630046K23                                         46350      83     3
LIM homeobox protein cofactor CLIM-2                          42665      81     4
Cytidine monophospho-N-acetylneuraminic acid synthetase       ?          80     2
hypothetical protein LOC70591                                 44394      67     2
mitochondrial processing peptidase beta subunit               54580      58     4
fos-like antigen 2                                            35277      57     3

5.5 Discussion

As can be seen on the electrophoretic gel, the proteins sought should have a molecular mass between approximately 43 kDa and 55 kDa according to the markers on the left side of the gel. The bands were compared as follows: the tables for VA1: 3 and C1: 1 were compared and the proteins occurring in both were crossed out. The remaining proteins in the VA1: 3 and C1: 1 bands are those that occur in only the normal or only the mutant mouse and could be of biological relevance.

The following proteins were found only in the normal mouse:
1. mCG16052, isoform CRA_b
2. splicing factor 3b, subunit 4
3. cp27
4. methionine adenosyltransferase II, alpha
5. heterogeneous nuclear ribonucleoprotein F
6. Sjogren syndrome antigen B

In contrast, the following proteins were found only in the HMGB1-deficient mouse:
1. mKIAA4020 protein
2. eukaryotic translation initiation factor 3 subunit 6
3. Serpin H1 precursor (Collagen-binding protein) (Colligin) (47 kDa heat shock protein) (Serine protease inhibitor J6)
4. p53
5. eukaryotic translation initiation factor 2, subunit 2 (beta)
6. SDF3
7. heterogeneous nuclear ribonucleoprotein H1
8. septin 10 isoform 2
9. cleavage stimulation factor, 3' pre-RNA, subunit 1
10. parvin, alpha
11. protein synthesis initiation factor 4A
12. RIKEN cDNA 9630046K23
13. LIM homeobox protein cofactor CLIM-2
14. Cytidine monophospho-N-acetylneuraminic acid synthetase
15. hypothetical protein LOC70591
16. mitochondrial processing peptidase beta subunit
17. fos-like antigen 2
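The comparison of the two bands described above amounts to a set difference, which can be reproduced with a few lines of code. The sketch below is an illustration rather than the procedure actually used: it compares the protein names from Tables 1 and 2 after a simple normalisation of case and whitespace. Matching by name is an assumption made for the example; database accession numbers would be a more robust key. The function names are hypothetical.

```python
def normalise(name: str) -> str:
    """Lower-case and collapse whitespace so trivial differences do not matter."""
    return " ".join(name.lower().split())


def compare_bands(band_a, band_b):
    """Return (only in a, only in b, shared) as sorted lists of normalised names."""
    a = {normalise(n) for n in band_a}
    b = {normalise(n) for n in band_b}
    return sorted(a - b), sorted(b - a), sorted(a & b)


VA1_BAND_3 = [
    "Hypothetical protein LOC433182", "unnamed protein product",
    "ARP3 actin-related protein 3 homolog",
    "eukaryotic translation elongation factor 1 gamma",
    "ribosomal protein L4", "heterogeneous nuclear ribonucleoprotein F",
    "proteasome (prosome, macropain) 26S subunit, ATPase 3",
    "suppression of tumorigenicity 13", "Y box-binding protein",
    "methionine adenosyltransferase II, alpha", "cp27",
    "26S protease regulatory subunit 6B", "splicing factor 3b, subunit 4",
    "actin-like 6A", "mCG16052, isoform CRA_b", "Sjogren syndrome antigen B",
]

C1_BAND_1 = [
    "ARP3 actin-related protein 3 homolog", "ribosomal protein L4",
    "eukaryotic translation elongation factor 1 gamma",
    "hypothetical protein LOC433182", "unnamed protein product",
    "mKIAA4020 protein", "eukaryotic translation initiation factor 3 subunit 6",
    "Serpin H1 precursor", "p53",
    "eukaryotic translation initiation factor 2, subunit 2 (beta)",
    "proteasome (prosome, macropain) 26S subunit, ATPase 3", "SDF3",
    "Y box-binding protein", "heterogeneous nuclear ribonucleoprotein H1",
    "actin-like 6A", "26S protease regulatory subunit 6B",
    "septin 10 isoform 2", "suppression of tumorigenicity 13",
    "cleavage stimulation factor, 3' pre-RNA, subunit 1", "parvin, alpha",
    "protein synthesis initiation factor 4A", "RIKEN cDNA 9630046K23",
    "LIM homeobox protein cofactor CLIM-2",
    "Cytidine monophospho-N-acetylneuraminic acid synthetase",
    "hypothetical protein LOC70591",
    "mitochondrial processing peptidase beta subunit", "fos-like antigen 2",
]

if __name__ == "__main__":
    va1_only, c1_only, shared = compare_bands(VA1_BAND_3, C1_BAND_1)
    print("Only in VA1:", len(va1_only))
    print("Only in C1: ", len(c1_only))
    print("Shared:     ", len(shared))
```

With the names as listed in the two tables, this yields 6 proteins found only in VA1, 17 found only in C1 and 10 shared by both bands.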

5.6 Conclusion

The aim of this experiment was to identify proteins and to find the differences in protein composition between the two cell lines (normal and HMGB1 deficient mouse). Several proteins were identified in the protein extract of only one of the two cell lines. These proteins could be involved in cell processes related to HMGB1. It should be noted that the results presented are only preliminary; whether the candidates are related to this molecular process, and how they contribute to it, will be studied further. Even though the experiment was not focused on a PPI analysis 82, 83, the procedure of the substitute experiment differed only slightly from the outline originally intended for the experiment focused on PPIs.

Chapter 7

Abbreviations

PPI Protein-protein interaction
TAP Tandem affinity purification
MEF Mouse embryonic fibroblasts
HMG High mobility group
PINT Protein-protein Interactions Thermodynamic database
PIE Protein Interaction information Extraction system
PubMed Public/Publisher MEDLINE
FRET Fluorescence resonance energy transfer
FCCS Fluorescence cross-correlation spectroscopy
RET Resonance energy transfer
BRET Bioluminescence-RET
BiFC Bimolecular fluorescence complementation
CBP Calmodulin binding peptide
TEV Tobacco etch virus
EMF Electromotive force
IEF Isoelectric focusing
SDS-PAGE Sodium dodecyl sulfate polyacrylamide gel electrophoresis
QPNC-PAGE Quantitative preparative native continuous polyacrylamide gel electrophoresis
LC Liquid chromatography
HPLC High-pressure liquid chromatography
MALDI Matrix-assisted laser desorption and ionization
ESI Electrospray ionization
MS Mass spectrometry
MS/MS Tandem mass spectrometry
FD Field desorption
FAB Fast atom bombardment
DIOS Desorption/ionization on silicon
DART Direct Analysis in Real Time
APCI Atmospheric pressure chemical ionization
SIMS Secondary ion mass spectrometry
TOF Time-of-flight
FT-ICR Fourier transform ion cyclotron resonance
QIT Quadrupole ion trap
PTMs Post-translational modifications
HUPO Human Proteome Organisation
PSI Proteomics Standards Initiative
PSI-MI Proteomics Standards Initiative Molecular Interaction
MIAPE Minimum Information About a Proteomics Experiment
PEDRo Proteomics Experiment Data Repository
PEML Proteomics Experiment Markup Language
MCP Molecular and Cellular Proteomics
XML Extensible markup language
RDF Resource-description framework
ALT Alternative lengthening of telomeres
DTT Dithiothreitol
ACN Acetonitrile
T Total (refers to acrylamide and bisacrylamide content in electrophoretic gel)
NCBI National Center for Biotechnology Information
MW Molecular weight

Chapter 8

Appendix

Listed here are the complete data obtained from the two excised bands (VA1: 3 and C1: 1), which were digested, analyzed by LC-MS/MS and searched with MASCOT. Reliably identified proteins have their score in red bold font, proteins identified with limited confidence have it in normal red font, and proteins with no significant peptides matched are set in the same font as the rest of the text. Some additional notes are written in blue font.
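Each hit in the listing below follows the same pattern (accession, mass, score, queries matched), so the values can be read out automatically. The following is a minimal sketch in Python, assuming the text layout used in this appendix. The names `HIT_PATTERN`, `parse_hits` and `within_gel_window` are illustrative, and the 43-55 kDa bounds are the approximate range read off the gel in section 5.5, not exact limits.

```python
import re

# Matches the "gi|... Mass: ... Score: ... Queries matched: ..." part of a hit.
# The mass is optional because one entry in the listing has no mass value.
HIT_PATTERN = re.compile(
    r"gi\|(?P<gi>\d+)\s+Mass:\s*(?P<mass>\d+)?\s*"
    r"Score:\s*(?P<score>\d+)\s+Queries matched:\s*(?P<queries>\d+)"
)


def parse_hits(text):
    """Extract every hit found in `text` as a dictionary."""
    hits = []
    for m in HIT_PATTERN.finditer(text):
        hits.append({
            "gi": m["gi"],
            "mass": int(m["mass"]) if m["mass"] else None,
            "score": int(m["score"]),
            "queries": int(m["queries"]),
        })
    return hits


def within_gel_window(mass, low=43_000, high=55_000):
    """Check a mass (Da) against the approximate range read off the gel."""
    return mass is not None and low <= mass <= high


sample = (
    "1. gi|70794816 Mass: 47111 Score: 1407 Queries matched: 26 pI = 6,4 56% "
    "hypothetical protein LOC433182 [Mus musculus]\n"
    "9. gi|55451 Mass: 35822 Score: 182 Queries matched: 3 pI = 10,0 15%\n"
    "25. gi|21619376 Mass: Score: 80 Queries matched: 1 pI = 8,4 2 %\n"
)
hits = parse_hits(sample)
for hit in hits:
    print(hit["gi"], hit["mass"], hit["score"], within_gel_window(hit["mass"]))
```

The window check is deliberately strict and will not necessarily reproduce the "nonmatching MW" notes in the listing, which were added by hand.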

VA1: 3

1. gi|70794816 Mass: 47111 Score: 1407 Queries matched: 26 pI = 6,4 56% hypothetical protein LOC433182 [Mus musculus]

2. gi|26345590 Mass: 50086 Score: 822 Queries matched: 12 pI = 9,1 42% unnamed protein product [Mus musculus]

Proteins matching the same set of peptides: gi|74195737 Mass: 50081 Score: 822 Queries matched: 35 unnamed protein product [Mus musculus]

3. gi|23956222 Mass: 47327 Score: 567 Queries matched: 9 pI = 5,6 33% ARP3 actin-related protein 3 homolog [Mus musculus]

Proteins matching the same set of peptides: gi|74198200 Mass: 47293 Score: 567 Queries matched: 10 unnamed protein product [Mus musculus] gi|74214405 Mass: 47346 Score: 567 Queries matched: 10 unnamed protein product [Mus musculus] gi|74220060 Mass: 47369 Score: 567 Queries matched: 10 unnamed protein product [Mus musculus]

4. gi|110625979 Mass: 50029 Score: 566 Queries matched: 9 pI = 6,3 24% eukaryotic translation elongation factor 1 gamma [Mus musculus]

5. gi|30794450 Mass: 47124 Score: 370 Queries matched: 5 pI = 11,0 21% ribosomal protein L4 [Mus musculus]

6. gi|19527048 Mass: 45701 Score: 243 Queries matched: 3 pI = 5,3 10% heterogeneous nuclear ribonucleoprotein F [Mus musculus]

Proteins matching the same set of peptides: gi|20073357 Mass: 37118 Score: 243 Queries matched: 5 Hnrpf protein [Mus musculus] gi|20987708 Mass: 37178 Score: 243 Queries matched: 5 Hnrpf protein [Mus musculus] gi|26345420 Mass: 46536 Score: 243 Queries matched: 5 unnamed protein product [Mus musculus] gi|58476100 Mass: 43658 Score: 243 Queries matched: 5 Hnrpf protein [Mus musculus] gi|62510677 Mass: 45689 Score: 243 Queries matched: 5 Heterogeneous nuclear ribonucleoprotein F (hnRNP F) gi|148670393 Mass: 37168 Score: 243 Queries matched: 5 mCG50680 [Mus musculus]

7. gi|6679503 Mass: 49462 Score: 220 Queries matched: 4 pI = 5,1 13% proteasome (prosome, macropain) 26S subunit, ATPase 3 [Mus musculus]

Proteins matching the same set of peptides: gi|13543237 Mass: 49518 Score: 220 Queries matched: 4 Proteasome (prosome, macropain) 26S subunit, ATPase 3 [Mus musculus] gi|26350207 Mass: 50350 Score: 220 Queries matched: 4 unnamed protein product [Mus musculus] gi|74185161 Mass: 55603 Score: 220 Queries matched: 4 unnamed protein product [Mus musculus] gi|74188900 Mass: 49518 Score: 220 Queries matched: 4 unnamed protein product [Mus musculus] gi|74191536 Mass: 47308 Score: 220 Queries matched: 4 unnamed protein product [Mus musculus] gi|74212095 Mass: 45208 Score: 220 Queries matched: 4 unnamed protein product [Mus musculus] gi|123226008 Mass: 44641 Score: 220 Queries matched: 4 proteasome (prosome, macropain) 26S subunit ATPase 3 [Mus musculus] gi|148695573 Mass: 49488 Score: 220 Queries matched: 4 proteasome (prosome, macropain) 26S subunit, ATPase 3, isoform CRA_a [Mus musculus] gi|148695575 Mass: 50334 Score: 220 Queries matched: 4 proteasome (prosome, macropain) 26S subunit, ATPase 3, isoform CRA_c [Mus musculus]

8. gi|19526912 Mass: 41630 Score: 187 Queries matched: 3 pI = 5,2 10% suppression of tumorigenicity 13 [Mus musculus] nonmatching molecular weight (MW)

Proteins matching the same set of peptides: gi|74177941 Mass: 41629 Score: 187 Queries matched: 3 unnamed protein product [Mus musculus] gi|74185730 Mass: 41672 Score: 187 Queries matched: 3 unnamed protein product [Mus musculus] gi|74214465 Mass: 41530 Score: 187 Queries matched: 3 unnamed protein product [Mus musculus]

9. gi|55451 Mass: 35822 Score: 182 Queries matched: 3 pI = 10,0 15% Y box-binbing protein [Mus musculus] nonmatching MW

Proteins matching the same set of peptides: gi|199821 Mass: 35723 Score: 182 Queries matched: 3 Y box transcription factor gi|203398 Mass: 35735 Score: 182 Queries matched: 3 putative gi|203999 Mass: 35699 Score: 182 Queries matched: 3 enhancer factor-1-alpha gi|988281 Mass: 35986 Score: 182 Queries matched: 3 mYB-1a gi|988283 Mass: 35678 Score: 182 Queries matched: 3 mYB-1b gi|2745892 Mass: 33527 Score: 182 Queries matched: 3 Y box transcription factor [Mus musculus] gi|29437175 Mass: 35735 Score: 182 Queries matched: 3 Y box protein 1 [Mus musculus] gi|148698506 Mass: 30921 Score: 182 Queries matched: 3 mCG4206 [Mus musculus] gi|149252935 Mass: 36615 Score: 182 Queries matched: 3 PREDICTED: similar to transcription factor EF1(A) [Mus musculus]

10. gi|21704144 Mass: 43661 Score: 178 Queries matched: 2 pI = 6,0 6 % methionine adenosyltransferase II, alpha [Mus musculus]

Proteins matching the same set of peptides: gi|34849522 Mass: 33017 Score: 178 Queries matched: 2 Mat2a protein [Mus musculus] gi|40714030 Mass: 43589 Score: 178 Queries matched: 2 methionine adenosyltransferase II alpha subunit [Mus musculus] gi|74146617 Mass: 43576 Score: 178 Queries matched: 2 unnamed protein product [Mus musculus] gi|74177663 Mass: 43687 Score: 178 Queries matched: 2 unnamed protein product [Mus musculus] gi|74183481 Mass: 39729 Score: 178 Queries matched: 2 unnamed protein product [Mus musculus] gi|74191370 Mass: 42072 Score: 178 Queries matched: 2 unnamed protein product [Mus musculus] gi|74207756 Mass: 43691 Score: 178 Queries matched: 2 unnamed protein product [Mus musculus] gi|74228973 Mass: 40094 Score: 178 Queries matched: 2 unnamed protein product [Mus musculus] gi|148666566 Mass: 48860 Score: 178 Queries matched: 2 mCG129313, isoform CRA_a [Mus musculus] gi|148666567 Mass: 43353 Score: 178 Queries matched: 2 mCG129313, isoform CRA_b [Mus musculus]

11. gi|3115274 Mass: 32916 Score: 144 Queries matched: 3 pI = 4,8 15% cp27 [Mus musculus] 1 peptide with significant score, nonmatching MW

Proteins matching the same set of peptides: gi|6753412 Mass: 32901 Score: 144 Queries matched: 3 craniofacial development protein 1 [Mus musculus] gi|7648489 Mass: 29231 Score: 144 Queries matched: 3 BCNT [Mus musculus]

12. gi|1709797 Mass: 47252 Score: 126 Queries matched: 3 pI = 5,2 6 % 26S protease regulatory subunit 6B (Proteasome 26S subunit ATPase 4) (MIP224) (MB67- interacting protein) (TAT-binding protein 7) (TBP-7) (CIP21); Taxonomy: Mus musculus 1 peptide with significant score

Proteins matching the same set of peptides: gi|26341428 Mass: 47365 Score: 126 Queries matched: 3 unnamed protein product [Mus musculus] gi|62201535 Mass: 47353 Score: 126 Queries matched: 3 Proteasome (prosome, macropain) 26S subunit, ATPase, 4 [Mus musculus] gi|74141261 Mass: 47439 Score: 126 Queries matched: 3 unnamed protein product [Mus musculus] gi|74141846 Mass: 47369 Score: 126 Queries matched: 3 unnamed protein product [Mus musculus] gi|74181490 Mass: 35243 Score: 126 Queries matched: 3 unnamed protein product [Mus musculus] gi|74195574 Mass: 47319 Score: 126 Queries matched: 3 unnamed protein product [Mus musculus]

13. gi|23346437 Mass: 44327 Score: 118 Queries matched: 1 pI = 8,5 3 % splicing factor 3b, subunit 4 [Mus musculus] 1 peptide with significant score

Proteins matching the same set of peptides: gi|94398719 Mass: 44597 Score: 118 Queries matched: 1 PREDICTED: similar to Splicing factor 3b, subunit 4 [Mus musculus]

14. gi|9789893 Mass: 47399 Score: 105 Queries matched: 1 pI = 5,4 5 % actin-like 6A [Mus musculus] 1 peptide with significant score

Proteins matching the same set of peptides: gi|23396474 Mass: 47417 Score: 105 Queries matched: 2 Actin-like protein 6A (53 kDa BRG1-associated factor A) (Actin-related protein Baf53a) gi|74141943 Mass: 47489 Score: 105 Queries matched: 2 unnamed protein product [Mus musculus] gi|148703061 Mass: 48055 Score: 105 Queries matched: 2 actin-like 6A, isoform CRA_c [Mus musculus] gi|148703062 Mass: 55298 Score: 105 Queries matched: 2 actin-like 6A, isoform CRA_d [Mus musculus]

15. gi|148676807 Mass: 47819 Score: 100 Queries matched: 2 pI = 5,7 5 % mCG16052, isoform CRA_b [Mus musculus] 1 peptide with significant score

Proteins matching the same set of peptides: gi|12847562 Mass: 49536 Score: 100 Queries matched: 2 unnamed protein product [Mus musculus]

16. gi|6678143 Mass: 47727 Score: 79 Queries matched: 1 pI = 9,8 3 % Sjogren syndrome antigen B [Mus musculus] 1 peptide with significant score

Proteins matching the same set of peptides: gi|26353736 Mass: 47628 Score: 79 Queries matched: 1 unnamed protein product [Mus musculus] gi|74223023 Mass: 47697 Score: 79 Queries matched: 1 unnamed protein product [Mus musculus] gi|123232427 Mass: 21399 Score: 79 Queries matched: 1 Sjogren syndrome antigen B [Mus musculus] gi|148695104 Mass: 48483 Score: 79 Queries matched: 1 Sjogren syndrome antigen B, isoform CRA_b [Mus musculus]

17. gi|28173550 Mass: 50617 Score: 71 Queries matched: 2 pI = 8,7 5 % cell division cycle 10 homolog [Mus musculus] peptides with insignificant score, just an info

Proteins matching the same set of peptides: gi|9789715 Mass: 50476 Score: 71 Queries matched: 2 Septin-7 (CDC10 protein homolog) gi|9789726 Mass: 50518 Score: 71 Queries matched: 2 Septin-7 (CDC10 protein homolog) gi|26354124 Mass: 48724 Score: 71 Queries matched: 2 unnamed protein product [Mus musculus] gi|60360112 Mass: 44985 Score: 71 Queries matched: 2 mKIAA4020 protein [Mus musculus] gi|67472680 Mass: 50259 Score: 71 Queries matched: 2 Septin-7 (CDC10 protein homolog) gi|74184369 Mass: 48646 Score: 71 Queries matched: 2 unnamed protein product [Mus musculus] gi|148693353 Mass: 44828 Score: 71 Queries matched: 2 septin 7, isoform CRA_a [Mus musculus] gi|148693354 Mass: 51445 Score: 71 Queries matched: 2 septin 7, isoform CRA_b [Mus musculus]

18. gi|12805611 Mass: 25469 Score: 51 Queries matched: 1 pI = 4,7 6 % Atxn10 protein [Mus musculus] peptides with insignificant score, just an info

Proteins matching the same set of peptides: gi|74184979 Mass: 53677 Score: 51 Queries matched: 1 unnamed protein product [Mus musculus] gi|74218038 Mass: 53643 Score: 51 Queries matched: 1 unnamed protein product [Mus musculus] gi|83649709 Mass: 53673 Score: 51 Queries matched: 1 ataxin 10 [Mus musculus]

C1: 1

1. gi|23956222 Mass: 47327 Score: 1039 Queries matched: 16 pI = 5,6 52% ARP3 actin-related protein 3 homolog [Mus musculus]

Proteins matching the same set of peptides: gi|74198200 Mass: 47293 Score: 1039 Queries matched: 27 unnamed protein product [Mus musculus] gi|74214405 Mass: 47346 Score: 1039 Queries matched: 27 unnamed protein product [Mus musculus]

2. gi|30794450 Mass: 47124 Score: 766 Queries matched: 9 pI = 11,0 30% ribosomal protein L4 [Mus musculus]

3. gi|110625979 Mass: 50029 Score: 751 Queries matched: 11 pI = 6,3 28% eukaryotic translation elongation factor 1 gamma [Mus musculus]

4. gi|70794816 Mass: 47111 Score: 727 Queries matched: 10 pI = 6,4 34% hypothetical protein LOC433182 [Mus musculus]

5. gi|26345590 Mass: 50086 Score: 726 Queries matched: 11 pI = 9,1 35% unnamed protein product [Mus musculus]

Proteins matching the same set of peptides: gi|74195737 Mass: 50081 Score: 726 Queries matched: 26 unnamed protein product [Mus musculus]

6. gi|60360112 Mass: 44985 Score: 670 Queries matched: 10 pI = 8,1 31% mKIAA4020 protein [Mus musculus]

Proteins matching the same set of peptides: gi|9789715 Mass: 50476 Score: 669 Queries matched: 10 Septin-7 (CDC10 protein homolog) gi|9789726 Mass: 50518 Score: 669 Queries matched: 10 Septin-7 (CDC10 protein homolog) gi|26354124 Mass: 48724 Score: 669 Queries matched: 10 unnamed protein product [Mus musculus] gi|28173550 Mass: 50617 Score: 669 Queries matched: 10 cell division cycle 10 homolog [Mus musculus] gi|74184369 Mass: 48646 Score: 669 Queries matched: 10 unnamed protein product [Mus musculus] gi|148693353 Mass: 44828 Score: 669 Queries matched: 10 septin 7, isoform CRA_a [Mus musculus] gi|148693354 Mass: 51445 Score: 669 Queries matched: 10 septin 7, isoform CRA_b [Mus musculus]

7. gi|60818924 Mass: 52201 Score: 543 Queries matched: 8 pI = 5,8 26% eukaryotic translation initiation factor 3 subunit 6 [synthetic construct]

8. gi|123577 Mass: 46560 Score: 486 Queries matched: 7 pI = 8,9 24% Serpin H1 precursor (Collagen-binding protein) (Colligin) (47 kDa heat shock protein) (Serine protease inhibitor J6); Taxonomy: Mus musculus

Proteins matching the same set of peptides: gi|26345418 Mass: 46481 Score: 486 Queries matched: 8 unnamed protein product [Mus musculus] gi|26348007 Mass: 46490 Score: 486 Queries matched: 8 unnamed protein product [Mus musculus] gi|74191337 Mass: 46505 Score: 486 Queries matched: 8 unnamed protein product [Mus musculus] gi|74198254 Mass: 46500 Score: 486 Queries matched: 8 unnamed protein product [Mus musculus] gi|148684430 Mass: 44954 Score: 486 Queries matched: 8 serine (or cysteine) peptidase inhibitor, clade H, member 1, isoform CRA_a [Mus musculus] gi|161353502 Mass: 46504 Score: 486 Queries matched: 8 serine (or cysteine) proteinase inhibitor, clade H, member 1 [Mus musculus]

9. gi|200201 Mass: 43359 Score: 359 Queries matched: 6 pI = 7,1 24% p53; Taxonomy: Mus musculus

Proteins matching the same set of peptides: gi|53571 Mass: 43506 Score: 359 Queries matched: 6 unnamed protein product [Mus musculus] gi|2961247 Mass: 43402 Score: 359 Queries matched: 6 tumor suppressor p53 [Mus musculus] gi|15375072 Mass: 43546 Score: 359 Queries matched: 6 transformation related protein 53 [Mus musculus] gi|28975327 Mass: 42427 Score: 359 Queries matched: 6 tumor suppressor p53; p53as [Mus musculus] gi|74190719 Mass: 39762 Score: 359 Queries matched: 6 unnamed protein product [Mus musculus] gi|74213717 Mass: 34672 Score: 359 Queries matched: 6 unnamed protein product [Mus musculus] gi|109157831 Mass: 22767 Score: 359 Queries matched: 6 Chain A, Protein-Dna Complex gi|148678558 Mass: 44410 Score: 359 Queries matched: 6 transformation related protein 53, isoform CRA_a [Mus musculus] gi|148747262 Mass: 43431 Score: 359 Queries matched: 6 transformation related protein 53 [Mus musculus] gi|166235369 Mass: 22170 Score: 359 Queries matched: 6 Chain A, Mouse P53 Dna-Binding Domain In -Free Oxidized State

10. gi|14149756 Mass: 38068 Score: 337 Queries matched: 5 pI = 5,6 24% eukaryotic translation initiation factor 2, subunit 2 (beta) [Mus musculus] nonmatching MW

Proteins matching the same set of peptides: gi|26377771 Mass: 37549 Score: 337 Queries matched: 6 unnamed protein product [Mus musculus]

11. gi|6679503 Mass: 49462 Score: 276 Queries matched: 4 pI = 5,1 14% proteasome (prosome, macropain) 26S subunit, ATPase 3 [Mus musculus]

Proteins matching the same set of peptides: gi|2492523 Mass: 49129 Score: 276 Queries matched: 4 26S protease regulatory subunit 6A (Proteasome 26S subunit ATPase 3) (TAT-binding protein 1) (TBP-1) (Spermatogenic cell/sperm-associated TAT-binding protein homolog SATA) gi|13543237 Mass: 49518 Score: 276 Queries matched: 4 Proteasome (prosome, macropain) 26S subunit, ATPase 3 [Mus musculus] gi|26350207 Mass: 50350 Score: 276 Queries matched: 4 unnamed protein product [Mus musculus] gi|74185161 Mass: 55603 Score: 276 Queries matched: 4 unnamed protein product [Mus musculus] gi|74191536 Mass: 47308 Score: 276 Queries matched: 4 unnamed protein product [Mus musculus] gi|74212095 Mass: 45208 Score: 276 Queries matched: 4 unnamed protein product [Mus musculus] gi|123226008 Mass: 44641 Score: 276 Queries matched: 4 proteasome (prosome, macropain) 26S subunit ATPase 3 [Mus musculus] gi|148695573 Mass: 49488 Score: 276 Queries matched: 4 proteasome (prosome, macropain) 26S subunit, ATPase 3, isoform CRA_a [Mus musculus] gi|148695575 Mass: 50334 Score: 276 Queries matched: 4 proteasome (prosome, macropain) 26S subunit, ATPase 3, isoform CRA_c [Mus musculus]

12. gi|1747298 Mass: 46175 Score: 262 Queries matched: 4 pI = 6,5 11% SDF3 [Mus musculus]

Proteins matching the same set of peptides: gi|3355888 Mass: 46191 Score: 262 Queries matched: 5 capsin [Mus musculus] gi|3808221 Mass: 46229 Score: 262 Queries matched: 5 pigment epithelium-derived factor [Mus musculus] gi|117606335 Mass: 46205 Score: 262 Queries matched: 5 serine (or cysteine) proteinase inhibitor, clade F, member 1 [Mus musculus]

13. gi|55451 Mass: 35822 Score: 234 Queries matched: 3 pI = 10,0 13% Y box-binbing protein [Mus musculus] nonmatching MW

Proteins matching the same set of peptides: gi|199821 Mass: 35723 Score: 234 Queries matched: 3 Y box transcription factor gi|203398 Mass: 35735 Score: 234 Queries matched: 3 putative gi|203999 Mass: 35699 Score: 234 Queries matched: 3 enhancer factor-1-alpha gi|988281 Mass: 35986 Score: 234 Queries matched: 3 mYB-1a gi|988283 Mass: 35678 Score: 234 Queries matched: 3 mYB-1b gi|2745892 Mass: 33527 Score: 234 Queries matched: 3 Y box transcription factor [Mus musculus] gi|29437175 Mass: 35735 Score: 234 Queries matched: 3 Y box protein 1 [Mus musculus] gi|148698506 Mass: 30921 Score: 234 Queries matched: 3 mCG4206 [Mus musculus] gi|149252935 Mass: 36615 Score: 234 Queries matched: 3 PREDICTED: similar to transcription factor EF1(A) [Mus musculus]

14. gi|10946928 Mass: 49168 Score: 230 Queries matched: 3 pI = 5,9 9 % heterogeneous nuclear ribonucleoprotein H1 [Mus musculus]

Proteins matching the same set of peptides: gi|26353116 Mass: 51185 Score: 230 Queries matched: 5 unnamed protein product [Mus musculus] gi|148701752 Mass: 52500 Score: 230 Queries matched: 5 heterogeneous nuclear ribonucleoprotein H1, isoform CRA_b [Mus musculus]

15. gi|9789893 Mass: 47399 Score: 228 Queries matched: 3 pI = 5,4 8 % actin-like 6A [Mus musculus]

Proteins matching the same set of peptides: gi|23396474 Mass: 47417 Score: 228 Queries matched: 3 Actin-like protein 6A (53 kDa BRG1-associated factor A) (Actin-related protein Baf53a) gi|74141943 Mass: 47489 Score: 228 Queries matched: 3 unnamed protein product [Mus musculus] gi|148703061 Mass: 48055 Score: 228 Queries matched: 3 actin-like 6A, isoform CRA_c [Mus musculus] gi|148703062 Mass: 55298 Score: 228 Queries matched: 3 actin-like 6A, isoform CRA_d [Mus musculus]

16. gi|1709797 Mass: 47252 Score: 153 Queries matched: 3 pI = 5,2 13% 26S protease regulatory subunit 6B (Proteasome 26S subunit ATPase 4) (MIP224) (MB67- interacting protein) (TAT-binding protein 7) (TBP-7) (CIP21); Taxonomy: Mus musculus 1 peptide with significant score

Proteins matching the same set of peptides: gi|26341428 Mass: 47365 Score: 153 Queries matched: 4 unnamed protein product [Mus musculus] gi|62201535 Mass: 47353 Score: 153 Queries matched: 4 Proteasome (prosome, macropain) 26S subunit, ATPase, 4 [Mus musculus] gi|74141261 Mass: 47439 Score: 153 Queries matched: 4 unnamed protein product [Mus musculus] gi|74141846 Mass: 47369 Score: 153 Queries matched: 4 unnamed protein product [Mus musculus] gi|74181490 Mass: 35243 Score: 153 Queries matched: 4 unnamed protein product [Mus musculus] gi|74195574 Mass: 47319 Score: 153 Queries matched: 4 unnamed protein product [Mus musculus]

17. gi|67906175 Mass: 49797 Score: 141 Queries matched: 2 pI = 6,3 4 % septin 10 isoform 2 [Mus musculus] 1 peptide with significant score

Proteins matching the same set of peptides: gi|67906193 Mass: 52388 Score: 141 Queries matched: 2 septin 10 isoform 1 [Mus musculus] gi|148700262 Mass: 52402 Score: 141 Queries matched: 2 septin 10 [Mus musculus]

18. gi|19526912 Mass: 41630 Score: 119 Queries matched: 2 pI = 5,2 7 % suppression of tumorigenicity 13 [Mus musculus] 1 peptide with significant score

Proteins matching the same set of peptides: gi|74177941 Mass: 41629 Score: 119 Queries matched: 2 unnamed protein product [Mus musculus] gi|74185730 Mass: 41672 Score: 119 Queries matched: 2 unnamed protein product [Mus musculus] gi|74214465 Mass: 41530 Score: 119 Queries matched: 2 unnamed protein product [Mus musculus]

19. gi|13195628 Mass: 48351 Score: 110 Queries matched: 2 pI = 6,1 6 % cleavage stimulation factor, 3' pre-RNA, subunit 1 [Mus musculus] 1 peptide with significant score

20. gi|31982526 Mass: 42304 Score: 107 Queries matched: 1 pI = 5,7 3 % parvin, alpha [Mus musculus] 1 peptide with significant score

Proteins matching the same set of peptides: gi|26346240 Mass: 42304 Score: 107 Queries matched: 1 unnamed protein product [Mus musculus] gi|74223614 Mass: 38337 Score: 107 Queries matched: 1 unnamed protein product [Mus musculus]

21. gi|30519969 Mass: 46104 Score: 102 Queries matched: 2 pI = 10,1 8 % DNA polymerase delta interacting protein 3 [Mus musculus] peptides with insignificant score, just an info

Proteins matching the same set of peptides: gi|60359854 Mass: 49736 Score: 102 Queries matched: 2 mKIAA1649 protein [Mus musculus] gi|74195617 Mass: 47112 Score: 102 Queries matched: 2 unnamed protein product [Mus musculus] gi|74209870 Mass: 46203 Score: 102 Queries matched: 2 unnamed protein product [Mus musculus] gi|74213780 Mass: 42909 Score: 102 Queries matched: 2 unnamed protein product [Mus musculus] gi|148672533 Mass: 46076 Score: 102 Queries matched: 2 polymerase (DNA-directed), delta interacting protein 3, isoform CRA_a [Mus musculus] gi|148672534 Mass: 42881 Score: 102 Queries matched: 2 polymerase (DNA-directed), delta interacting protein 3, isoform CRA_b [Mus musculus]

22. gi|673433 Mass: 46460 Score: 88 Queries matched: 1 pI = 5,3 3 % protein synthesis initiation factor 4A [Mus musculus] 1 peptide with significant score

Proteins matching the same set of peptides: gi|50815 Mass: 42158 Score: 88 Queries matched: 1 unnamed protein product [Mus musculus] gi|556308 Mass: 46141 Score: 88 Queries matched: 1 protein synthesis initiation factor 4A gi|7305019 Mass: 46344 Score: 88 Queries matched: 1 eukaryotic translation initiation factor 4A2 [Mus musculus] gi|26374621 Mass: 26210 Score: 88 Queries matched: 1 unnamed protein product [Mus musculus] gi|71051290 Mass: 45994 Score: 88 Queries matched: 1 Eif4a1 protein [Mus musculus] gi|74139596 Mass: 46153 Score: 88 Queries matched: 1 unnamed protein product [Mus musculus] gi|74151289 Mass: 41464 Score: 88 Queries matched: 1 unnamed protein product [Mus musculus] gi|74187323 Mass: 46155 Score: 88 Queries matched: 1 unnamed protein product [Mus musculus] gi|74187427 Mass: 24855 Score: 88 Queries matched: 1 unnamed protein product [Mus musculus] gi|74219920 Mass: 46111 Score: 88 Queries matched: 1 unnamed protein product [Mus musculus] gi|148665241 Mass: 41836 Score: 88 Queries matched: 1 eukaryotic translation initiation factor 4A2, isoform CRA_c [Mus musculus] gi|148667738 Mass: 46107 Score: 88 Queries matched: 1 mCG1035528 [Mus musculus] gi|148709584 Mass: 45845 Score: 88 Queries matched: 1 mCG50578 [Mus musculus] gi|149270417 Mass: 42652 Score: 88 Queries matched: 1 PREDICTED: similar to eukaryotic initiation factor 4AI isoform 1 [Mus musculus]

23. gi|27369505 Mass: 46350 Score: 83 Queries matched: 1 pI = 9,0 3 % RIKEN cDNA 9630046K23 [Mus musculus] 1 peptide with significant score

Proteins matching the same set of peptides: gi|26334375 Mass: 46322 Score: 83 Queries matched: 1 unnamed protein product [Mus musculus] gi|26337215 Mass: 37729 Score: 83 Queries matched: 1 unnamed protein product [Mus musculus] gi|148665570 Mass: 44005 Score: 83 Queries matched: 1 RIKEN cDNA 9630046K23, isoform CRA_b [Mus musculus]

24. gi|2738116 Mass: 42665 Score: 81 Queries matched: 1 pI = 6,3 4 % LIM homeobox protein cofactor CLIM-2 [Mus musculus] 1 peptide with significant score

Proteins matching the same set of peptides: gi|6754520 Mass: 42752 Score: 81 Queries matched: 1 LIM domain binding 1 isoform 3 [Mus musculus] gi|108743250 Mass: 36643 Score: 81 Queries matched: 1 LIM-domain-binding protein 1b [Mus musculus] gi|164663818 Mass: 46472 Score: 81 Queries matched: 1 LIM domain binding 1 isoform 1 [Mus musculus]

25. gi|21619376 Mass: Score: 80 Queries matched: 1 pI = 8,4 2 % Cytidine monophospho-N-acetylneuraminic acid synthetase [Mus musculus] 1 peptide with significant score

Proteins matching the same set of peptides: gi|22208854 Mass: 48028 Score: 80 Queries matched: 1 cytidine monophospho-N-acetylneuraminic acid synthetase [Mus musculus] gi|40889461 Mass: 26035 Score: 80 Queries matched: 1 Chain A, The Crystal Structure Of Murine Cmp-5-N-Acetylneuraminic Acid Synthetase gi|68059163 Mass: 48099 Score: 80 Queries matched: 1 N-acylneuraminate cytidylyltransferase (CMP-N-acetylneuraminic acid synthetase) (CMP-NeuNAc synthetase)

26. gi|110625765 Mass: 44394 Score: 67 Queries matched: 1 pI = 5,0 2 % hypothetical protein LOC70591 [Mus musculus] 1 peptide with significant score

Proteins matching the same set of peptides: gi|74137611 Mass: 32741 Score: 67 Queries matched: 1 unnamed protein product [Mus musculus] gi|118142862 Mass: 44366 Score: 67 Queries matched: 1 RIKEN cDNA 5730455P16 gene [Mus musculus] gi|148683688 Mass: 45271 Score: 67 Queries matched: 1 mCG19053 [Mus musculus]

27. gi|95113671 Mass: 54580 Score: 58 Queries matched: 1 pI = 6,6 4 % mitochondrial processing peptidase beta subunit [Mus musculus] 1 peptide with significant score

Proteins matching the same set of peptides: gi|74151629 Mass: 53707 Score: 58 Queries matched: 1 unnamed protein product [Mus musculus] gi|122065519 Mass: 54230 Score: 58 Queries matched: 1 Mitochondrial-processing peptidase subunit beta, mitochondrial precursor (Beta-MPP) (P-52) gi|148671249 Mass: 24863 Score: 58 Queries matched: 1 mCG6419, isoform CRA_c [Mus musculus]

28. gi|34328117 Mass: 35277 Score: 57 Queries matched: 1 pI = 7,0 3 % fos-like antigen 2 [Mus musculus] 1 peptide with significant score

Proteins matching the same set of peptides: gi|3002496 Mass: 46345 Score: 57 Queries matched: 1 GSTmFra2/2-163 [Expression vector pGH/F2.2-163] gi|3002500 Mass: 46528 Score: 57 Queries matched: 1 GSTmFra2/79-242 [Expression vector pGH/F2.79-242] gi|3002508 Mass: 54824 Score: 57 Queries matched: 1 GSTmFra2/2-242 [Expression vector pGH/F2.2-242] gi|3002512 Mass: 55193 Score: 57 Queries matched: 1 GSTmFra2/79-327 [Expression vector pGH/F2.79-327] gi|3002516 Mass: 63489 Score: 57 Queries matched: 1 GSTmFra2/2-327 [Expression vector pGH/F2.2-327] gi|74219060 Mass: 32697 Score: 57 Queries matched: 1 unnamed protein product [Mus musculus] gi|82896766 Mass: 32667 Score: 57 Queries matched: 1 PREDICTED: hypothetical protein [Mus musculus] gi|148705443 Mass: 31826 Score: 57 Queries matched: 1 fos-like antigen 2 [Mus musculus]

29. gi|45598372 Mass: 22074 Score: 55 Queries matched: 1 pI = 4,5 13% brain abundant, membrane attached signal protein 1 [Mus musculus] peptide with insignificant score, just an info

Proteins matching the same set of peptides: gi|149266508 Mass: 11385 Score: 55 Queries matched: 1 PREDICTED: similar to 22 kDa neuronal tissue-enriched acidic protein [Mus musculus]

30. gi|30519935 Mass: 43542 Score: 53 Queries matched: 1 pI = 9,8 3 % HIV-1 Rev binding protein 2 [Mus musculus] peptide with insignificant score, just an info

Proteins matching the same set of peptides: gi|50400505 Mass: 43511 Score: 53 Queries matched: 1 KRR1 small subunit processome component homolog (HIV-1 Rev-binding protein 2 homolog) gi|62825895 Mass: 43422 Score: 53 Queries matched: 1 Krr1 protein [Mus musculus] gi|74147291 Mass: 45415 Score: 53 Queries matched: 1 unnamed protein product [Mus musculus] gi|148689797 Mass: 43453 Score: 53 Queries matched: 1 KRR1, small subunit (SSU) processome component, homolog (yeast), isoform CRA_c [Mus musculus]

31. gi|9506945 Mass: 32277 Score: 52 Queries matched: 1 pI = 5,1 6 % poly(A) binding protein, nuclear 1 [Mus musculus] peptide with insignificant score, nonmatching MW, just an info

Proteins matching the same set of peptides: gi|26328001 Mass: 31025 Score: 52 Queries matched: 1 unnamed protein product [Mus musculus] gi|148704375 Mass: 31167 Score: 52 Queries matched: 1 poly(A) binding protein, nuclear 1, isoform CRA_a [Mus musculus]

32. gi|50795 Mass: 47957 Score: 48 Queries matched: 1 pI = 5,3 3 % E46 protein [Mus musculus] peptide with insignificant score, just an info

Proteins matching the same set of peptides: gi|26340442 Mass: 48214 Score: 48 Queries matched: 1 unnamed protein product [Mus musculus] gi|74184979 Mass: 53677 Score: 48 Queries matched: 1 unnamed protein product [Mus musculus] gi|74218038 Mass: 53643 Score: 48 Queries matched: 1 unnamed protein product [Mus musculus] gi|83649709 Mass: 53673 Score: 48 Queries matched: 1 ataxin 10 [Mus musculus]

Chapter 9

Bibliography

1. Kocher, T. & Superti-Furga, G. Mass spectrometry-based functional proteomics: from molecular machines to protein networks. Nat Methods 4, 807-815 (2007).
2. Kinter, M. & Sherman, N. Protein sequencing and identification using tandem mass spectrometry. Wiley-Interscience (2000).
3. Fella, K., Glueckmann, M., Kruft, V., Kramer, P. J. & Kroeger, M. Proteomika v molekulární toxikologii: Identifikace potenciálních časných proteinových biomarkerů hepatokarcinogenity u krys [Proteomics in molecular toxicology: identification of potential early protein biomarkers of hepatocarcinogenicity in rats]. Chemické listy 99, 957-961 (2005).
4. Royer, C. Protein-protein interactions. Biophysics textbook online (http://www.biophysics.org/education/croyer.pdf).
5. Alberts, B. The cell as a collection of protein machines: preparing the next generation of molecular biologists. Cell 92, 291-294 (1998).
6. Gavin, A. C. & Superti-Furga, G. Protein complexes and proteome organization from yeast to man. Current Opinion in Chemical Biology 7, 21-27 (2003).
7. Uetz, P. & Vollert, C. S. Protein-protein interactions. Encyclopedic Reference of Genomics and Proteomics in Molecular Medicine (ERGPMM), Springer Verlag (2005).
8. Editorial. Protein interactions à la carte. Nature Methods 3, 957-958 (2006).
9. Nooren, I. M. A. & Thornton, J. M. Diversity of protein-protein interactions. The EMBO Journal 22, 3486-3492 (2003).
10. Janin, J. The kinetics of protein-protein recognition. Proteins: Structure, Function, and Genetics 28, 153-161 (1997).
11. Kumar, M. D. S. & Gromiha, M. M. PINT: Protein–protein Interactions Thermodynamic Database. Nucleic Acids Research 34, D195-D198 (2006).
12. Kim, S. et al. PIE: an online prediction system for protein-protein interactions from text. Nucleic Acids Res 36, W411-W415 (2008).
13. Huang, M., Ding, S., Wang, H. & Zhu, X. Mining physical protein-protein interactions from literature. http://www.stat.wisc.edu/~sding/paper/GB.pdf
14. Aloy, P. & Russell, R. B. Structural systems biology: modelling protein interactions. Nat Rev Mol Cell Biol 7, 188-197 (2006).
15. Espadaler, J. et al. Prediction of protein-protein interactions using distant conservation of sequence patterns and structure relationships. Bioinformatics 21, 3360-3368 (2005).
16. Ciriello, G. & Guerra, C. A review on models and algorithms for motif discovery in protein-protein interaction networks. Briefings in Functional Genomics and Proteomics 7, 147-156 (2008).
17. Hochleitner, E. O. et al. Protein stoichiometry of a multiprotein complex, the human spliceosomal U1 small nuclear ribonucleoprotein: absolute quantification using isotope-coded tags and mass spectrometry. J Biol Chem 280, 2536-2542 (2005).
18. Kazuhiro, N. Atomic model construction of protein complexes from electron micrographs and visualization of their 3D structure using a virtual reality system. J. Plasma Physics 72, 1037-1040 (2006).
19. Kossiakoff, A. A. & Koide, S. Understanding mechanisms governing protein-protein interactions from synthetic binding interfaces. Curr Opin Struct Biol 18, 499-506 (2008).
20. Sharan, R. & Ulitsky, I. Network-based prediction of protein function. Molecular Systems Biology 3, no. 88 (2007).
21. Chen, P. Y., Deane, C. M. & Reinert, G. Predicting and validating protein interactions using network structure. PLoS Comput Biol 4, e1000118 (2008).
22. Saito, R. et al. Construction of reliable protein-protein interaction networks with a new interaction generality measure. Bioinformatics 19, 756-763 (2003).
23. Aittokallio, T. & Schwikowski, B. Graph-based methods for analysing networks in cell biology. Briefings in Bioinformatics 7, 243-255 (2006).
24. Ewing, R. M. et al. Large-scale mapping of human protein-protein interactions by mass spectrometry. Molecular Systems Biology 3, no. 89 (2007).
25. Pandey, A. & Mann, M. Proteomics to study genes and genomes. Nature 405, 837-846 (2000).
26. Bauch, A. & Superti-Furga, G. Charting protein complexes, signaling pathways, and networks in the immune system. Immunological Reviews 210, 187-207 (2006).
27. Pieroni, E. Protein networking: insights into global functional organization of proteomes. Proteomics 8, 799-816 (2008).
28. Uetz, P. et al. A comprehensive analysis of protein–protein interactions in Saccharomyces cerevisiae. Nature 403, 623-627 (2000).
29. Fields, S. & Sternglanz, R. The two hybrid system: an assay for protein-protein interactions. Trends in Genetics 10, 286-292 (1994).
30. Ito, T., Chiba, T., Ozawa, R. et al. A comprehensive two-hybrid analysis to explore the yeast protein interactome. PNAS 98, 4569-4574 (2001).
31. Aloy, P. Structure-based assembly of protein complexes in yeast. Science 303, 2026-2029 (2004).
32. Krogan, N. J. et al. Global landscape of protein complexes in the yeast Saccharomyces cerevisiae. Nature 440, 637-643 (2006).
33. Ghaemmaghami, S. et al. Global analysis of protein expression in yeast. Nature 425, 737-741 (2003).
34. Uhlen, M. & Ponten, F. Antibody-based proteomics for human tissue profiling. Molecular & Cellular Proteomics 4, 384-393 (2005).
35. Hamelinck, D. et al. Optimized normalization for antibody microarrays and application to serum-protein profiling. Molecular & Cellular Proteomics 4, 773-784 (2005).
36. Terpe, K. Overview of tag protein fusions: from molecular and biochemical fundamentals to commercial systems. Appl. Microbiol. Biotechnol. 60, 523-533 (2003).
37. Waugh, D. S. Making the most of affinity tags. Trends Biotechnol 23, 316-320 (2005).
38. Drakas, R., Prisco, M. & Baserga, R. A modified tandem affinity purification tag technique for the purification of protein complexes in mammalian cells. Proteomics 5, 132-137 (2005).
39. Gingras, A. C., Aebersold, R. & Raught, B. Advances in protein complex analysis using mass spectrometry. J Physiol 563, 11-21 (2005).
40. Puig, O. et al. The tandem affinity purification (TAP) method: a general procedure of protein complex purification. Methods 24, 218-229 (2001).
41. Rigaut, G. et al. A generic method for protein complex characterization and proteome exploration. Nature Biotechnology 17, 1030-1032 (1999).
42. Forler, D. et al. An efficient protein complex purification method for functional proteomics in higher eukaryotes. Nature Biotechnology 21, 89-92 (2003).
43. Burckstummer, T. et al. An efficient tandem affinity purification procedure for interaction proteomics in mammalian cells. Nat Methods 3, 1013-1019 (2006).
44. Rodi, D. J. & Makowski, L. Phage-display technology – finding a needle in a vast molecular haystack. Current Opinion in Biotechnology 10, 87-93 (1999).
45. Johnson, S. A. & Hunter, T. Kinomics: methods for deciphering the kinome. Nat Methods 2, 17-25 (2005).
46. Zhu, H. & Snyder, M. Protein chip technology. Current Opinion in Chemical Biology 7, 55-63 (2003).
47. Hamelinck, D. et al. Optimized normalization for antibody microarrays and application to serum-protein profiling. Mol Cell Proteomics 4, 773-784 (2005).
48. Zhu, H. et al. Analysis of yeast protein kinases using protein chips. Nature Genetics 26, 283-289 (2000).
49. Paweletz, C. P. Reverse phase protein microarrays which capture disease progression show activation of pro-survival pathways at the cancer invasion front. Oncogene 20, 1981-1989 (2001).
50. Cahill, D. J. Protein and antibody arrays and their medical applications. Journal of Immunological Methods 250, 81-91 (2001).
51. Lee, W-Ch. & Lee, K. H. Application of affinity chromatography in proteomics. Analytical Biochemistry 324, 1-10 (2004).
52. Görg, A., Weiss, W. & Dunn, M. J. Current two-dimensional electrophoresis technology for proteomics. Proteomics 4, 3665-3685 (2004).
53. Shevchenko, A., Tomas, H., Havlis, J., Olsen, J. V. & Mann, M. In-gel digestion for mass spectrometric characterization of proteins and proteomes. Nat Protoc 1, 2856-2860 (2006).
54. Štosová, T., Havliš, J., Lenobl, R. & Šebela, M. Proteolytické enzymy: Význam pro proteomiku [Proteolytic enzymes: significance for proteomics]. Chemické listy 12, 896-905 (2005).
55. Washburn, M. P., Wolters, D. & Yates III, J. R. Large-scale analysis of the yeast proteome by multidimensional protein identification technology. Nature Biotechnology 19, 242-247 (2001).
56. Chace, D. H. Measuring Mass: From Positive Rays to Proteins. Michael A Grayson, ed. Clinical Chemistry 49, 342 (2003).
57. Guerrera, I. C. & Kleiner, O. Application of mass spectrometry in proteomics. Biosci Rep 25, 71-93 (2005).
58. Siuti, N. & Kelleher, N. L. Decoding protein modifications using top-down mass spectrometry. Nature Methods 4, 817-821 (2007).
59. Wehr, T. Top-down versus bottom-up approaches in proteomics. LCGC North America 24, 1004 (2006).
60. Reid, G. E. & McLuckey, S. A. 'Top down' protein characterization via tandem mass spectrometry. J Mass Spectrom 37, 663-675 (2002).
61. Perkins, D. N., Pappin, D., Creasy, D. & Cottrell, J. Probability-based protein identification by searching sequence databases using mass spectrometric peptide mapping information. Electrophoresis 20, 3551-3567 (1999).
62. Elias, J. E. & Gygi, S. P. Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry. Nat Methods 4, 207-214 (2007).
63. Nesvizhskii, A. I. & Aebersold, R. Interpretation of shotgun proteomic data – The protein inference problem. Molecular & Cellular Proteomics 4, 1419-1440 (2005).
64. Elias, J. E., Haas, W., Faherty, B. K. & Gygi, S. P. Comparative evaluation of mass spectrometry platforms used in large-scale proteomics investigations. Nat Methods 2, 667-675 (2005).
65. Domon, B. & Aebersold, R. Challenges and opportunities in proteomics data analysis. Molecular & Cellular Proteomics 5, 1921-1926 (2006).
66. Wierling, C., Herwig, R. & Lehrach, H. Resources, standards and tools for systems biology. Brief Funct Genomic Proteomic 6, 240-251 (2007).
67. Brazma, A., Krestyaninova, M. & Sarkans, U. Standards for systems biology. Nat Rev Genet 7, 593-605 (2006).
68. Taylor, C. F. et al. The minimum information about a proteomics experiment (MIAPE). Nature Biotechnology 25, 887-893 (2007).
69. Orchard, S. et al. The minimum information required for reporting a molecular interaction experiment (MIMIx). Nature Biotechnology 25, 894-898 (2007).
70. Wang, X., Gorlitzky, R. & Almeida, J. S. From XML to RDF: how semantic web technologies will change the design of ‘omic’ standards. Nature Biotechnology 23, 1099-1103 (2005).
71. Nickell, S., Kofler, C., Leis, A. P. & Baumeister, W. Innovation: A visual approach to proteomics. Nature Reviews Molecular Cell Biology 7, 225-230 (2006).
72. Balogh, M. Debating resolution and mass accuracy in mass spectrometry. Spectroscopy 19, 34 (2004).
73. Van Regenmortel, M. H. V. Minisymposium: New analytical techniques in proteomics - Introduction. Analytical Biochemistry 345, 1 (2005).
74. Beretta, L. Proteomics from the clinical perspective: many hopes and much debate. Nature Methods 4, 785-786 (2007).
75. Wasinger, V. C. & Corthals, G. L. Proteomic tools for biomedicine. Journal of Chromatography B 771, 33-48 (2002).
76. Kabbani, N. Proteomics of membrane receptors and signaling. Proteomics 8, 4146-4155 (2008).
77. Mann, M. & Jensen, O. N. Proteomic analysis of post-translational modifications. Nature Biotechnology 21, 255-261 (2003).
78. Doerr, A. Absolute proteomics. Nature Methods 4, 195 (2007).
79. Kunicka, Z. et al. Role of chromatin structure in telomere maintenance. Abstracts of the 8th international conference of anticancer research, Kos, Greece, 17.-22. 10. 2008, lecture no. 193.
80. Bianchi, E. M. & Agresti, A. HMG proteins: dynamic players in gene regulation and differentiation. Current Opinion in Genetics & Development 15, 496-506 (2005).
81. Planeta, J., Karásek, P. & Vejrosta, J. J. Sep. Sci. 26, 525-530 (2003).
82. Gingras, A. C., Gstaiger, M., Raught, B. & Aebersold, R. Analysis of protein complexes using mass spectrometry. Nat Rev Mol Cell Biol 8, 645-654 (2007).
83. Figeys, D., McBroom, L. D. & Moran, M. F. Mass spectrometry for the study of protein-protein interactions. Methods 24, 230-239 (2001).
