<<

High-throughput analysis using -based methods

Tove Boström

KTH Royal Institute of Technology School of Biotechnology Stockholm 2014 c Tove Boström 2014 KTH Royal Institute of Technology School of Biotechnology Division of Protein Technology AlbaNova University center SE-106 91 Stockholm Sweden

Paper I c 2011 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim Paper II c 2012 American Chemical Society Paper IV c 2014 American Chemical Society Paper V c 2014 The American Society for and Molecular Biology, Inc.

ISBN 978-91-7595-292-5 TRITA-BIO Report 2014:15 ISSN 1654-2312 Printed by Universitetsservice US-AB 2014 "If you can dream it, you can do it."

- Walt Disney

Abstract

In the field of , are analyzed and quantified in high numbers. Protein analysis is of great importance and can for example generate information regarding protein function and involvement in disease. Different strategies for protein analysis and quan- tification have emerged, suitable for different applications. The focus of this thesis lies on protein identification and quantification using different setups and method development has a central role in all included papers.

The presented research can be divided into three parts. Part one describes the develop- ment of two different screening methods for His6-tagged recombinant protein fragments. In the first investigation, proteins were purified using immobilized metal ion affinity chro- matography in a 96-well plate format and in the second investigation this was downscaled to nanoliter-scale using the miniaturized sample preparation platform, integrated selective enrichment target (ISET). The aim of these investigations was to develop methods that could work as an initial screening step in high-throughput protein production projects, such as the Human Protein Atlas (HPA) project, for more efficient protein production and purification. In the second part of the thesis, focus lies on quantitative proteomics. Protein fragments were produced with incorporated heavy isotope-labeled amino acids and used as internal standards in absolute protein quantification mass spectrometry experiments. The aim of this investigation was to compare the protein levels obtained using quanti- tative mass spectrometry to mRNA levels obtained by RNA . Expression of 32 different proteins was studied in six different cell lines and a clear correlation between protein and mRNA levels was observed when analyzing on an individual level. The third part of the thesis involves the antibodies generated within the HPA project. In the first investigation a method for validation of antibodies using protein immunoenrichment coupled to mass spectrometry was described. In a second study, a method was developed where antibodies were used to capture tryptic from a digested cell lysate with spiked in heavy isotope-labeled protein fragments, enabling quantification of 20 proteins in a multiplex format. Taken together, the presented research has expanded the pro- teomics toolbox in terms of available methods for protein analysis and quantification in a high-throughput format.

Keywords: Proteomics, mass spectrometry, affinity proteomics, immunoenrichment, , IMAC, screening, protein production, protein purification, ISET, quantification, SILAC, stable isotope standard, antibody validation

I Sammanfattning

I det forskningsområde som kallas proteomik studeras proteiner i stor skala, det vill säga många proteiner parallellt. Genom att studera proteiner kan man till exempel få informa- tion om proteiners funktion och huruvida ett protein är involverat i en sjukdomsprocess. I denna avhandling har proteiner analyserats och kvantifierats på olika sätt och i alla inkluderade artiklar har metodutveckling haft en central roll.

Forskningen som presenteras i denna avhandling kan delas in i tre delar. Den första delen beskriver två nya screeningmetoder för His6-taggade rekombinanta proteinfragment. I ett första projekt renades proteinerna med immobiliserad metalljonsaffinitetskromatografi i ett 96-brunnsformat och i ett andra projekt skalades detta ner till nanoliter-format genom att använda provberedningsplattformen ISET. Målet med dessa projekt var att utveckla metoder som skulle kunna användas som ett initialt screeningsteg vid storskalig protein- produktion för att öka effektiviteten. I del två ligger fokus på kvantifiering av proteiner. Proteinfragment producerades med aminosyror med inkorporerade tunga isotoper och an- vändes sedan som interna standarder i absolut kvantifiering med masspektrometri. Målet var här att jämföra proteinnivåer med motsvarande mRNA-nivåer, vilka bestämts med RNA-sekvensering. Koncentrationen av 32 proteiner bestämdes i sex olika cellinjer och det fanns en tydlig korrelation mellan protein och mRNA när generna studerades indi- viduellt. I den sista delen av avhandlingen användes antikroppar som tagits fram inom proteinatlas-projektet (HPA). I ett projekt utvecklades en metod för validering av an- tikroppar genom att först använda antikropparna för anrikning av målproteinerna från ett cellysat och sedan med masspektrometri bestämma proteinernas identitet. I ett andra projekt användes antikropparna istället för att fånga ut peptider från ett trypsinklyvt cellysat. I en setup där tunga isotopinmärkta proteinfragment adderades till lysatet före klyvning kunde 20 proteiner kvantifieras parallellt. Sammantaget har de presenterade projekten bidragit till att möjligheterna för parallell proteinanalys och kvantifiering har ökat.

II List of publications

This thesis is based on the five publications listed below. The publications are referred to by Roman numbers (I-V) and are included in the Appendix of the thesis. Paper I Tegel H.*, Yderland L.*, Boström T., Eriksson C., Ukko- nen K., Vasala A., Neubauer P., Ottosson J. and Hober S. Parallel production and verification of protein products us- ing a novel high-throughput method. Biotechnol J (2011) vol. 6 (8) pp. 1018-25 Paper II Adler B.*, Boström T.*, Ekström S., Hober S. and Lau- rell T. Miniaturized and automated high-throughput veri- fication of proteins in the ISET platform with MALDI MS. Anal Chem (2012) vol. 84 (20) pp. 8663-9 Paper III Boström T., Danielsson F., Lundberg E., Tegel H., Jo- hansson H. J., Lehtiö J., Uhlén M., Hober S., and Takanen J. O. Investigating the correlation of protein and mRNA levels in human cell lines using quantitative proteomics and transcriptomics. Submitted Paper IV Boström T., Johansson H. J., Lehtiö J., Uhlén M. and Hober S. Investigating the applicability of antibodies gen- erated within the Human Protein Atlas as capture agents in immunoenrichment coupled to mass spectrometry. J Proteome Res (2014) vol. 13 (10) pp. 4424-35 Paper V Edfors F.*, Boström T.*, Forsström B., Zeiler M., Jo- hansson H., Lundberg E., Hober S., Lehtiö J., Mann M. and Uhlén M. Immuno-proteomics using polyclonal anti- bodies and stable isotope labeled affinity-purified recom- binant proteins. Mol Cell Proteomics (2014) vol. 13 (6) pp. 1611-24 *Both authors contributed equally to the work.

III Published work not included in the thesis

Boström T.*, Nilvebrant J.* and Hober S. Purification systems based on bacterial surface proteins. Protein Purification, R. Ahmad (Ed.), (2012) ISBN: 978-953-307-831-1, InTech *Both authors contributed equally to the work.

IV Contributions to the included papers

Paper I Performed all experimental work and data analysis to- gether with coauthor. Wrote the manuscript together with coauthors. Paper II Performed all experimental work together with coauthor. Wrote the manuscript together with coauthors. Paper III Performed all experimental work except cell cultivation and performed all MS data analysis. Wrote the manuscript together with coauthors. Paper IV Performed all experimental work and all MS data analysis. Wrote the manuscript together with coauthors. Paper V Performed cell cultivation, MS sample preparation and all MS data analysis. Produced and quantified all heavy isotope-labeled PrESTs. Contributed to the writing of the manuscript.

V Abbreviations

2D-GE two-dimensional Ab antibody ABD albumin binding domain ABP albumin binding protein ACM antibody colocalization microarray APEX absolute protein expression AQUA absolute quantification AUC area under curve BAC bacterial artificial chromosome BCA bicinchoninic acid cAb capture antibody CDR complementary determining region CIMS context-independent motif specific CR cross reactivity dAb detection antibody DARPin designed ankyrin repeat protein DNA deoxyribonucleic acid DWP deep well plate EGTA ethylene glycol tetraacetic acid EIA enzyme immunoassay ELISA enzyme-linked immunosorbent emPAI exponentially modified protein abundance index ESI EtEP equimolarity through equalizer Fab fragment antigen binding FASP filter-aided sample preparation Fc fragment crystallizable FT-ICR fourier transform ion cyclotron resonance GFP green fluorescent protein GPS global proteome survey

His6 hexahistidine HPA human protein atlas IBAQ intensity-based absolute quantification ICAT isotope coded affinity tag IEF isoelectric focusing Ig immunoglobulin IMAC immobilized metal ion affinity iMALDI immuno-MALDI IRMA immunoradiometric assay ISET integrated selective enrichment target iTRAQ isobaric tags for relative and absolute quantification

VI LC liquid chromatography LFQ label free quantification mAb monoclonal antibody MALDI matrix-assisted laser desorption ionization MRM multiple reaction monitoring mRNA messenger ribonucleic acid MS mass spectrometry MSIA mass spectrometric immunoassay MS/MS m/z mass over charge pAb polyclonal antibody PCR polymerase chain reaction PEA proximity elongation assay PFL protein frequency library pI PLA proximity ligation assay PMF peptide mass fingerprint PrEST protein epitope signature tag PSAQ protein standard absolute quantification PSM peptide spectrum match PTM post-translational modification QconCAT quantification concatamer QUICK quantitative immunoprecipitation combined with knockdown RIA radioimmunoassay RNA ribonucleic acid scfv single-chain fragment variable SDS-PAGE sodium dodecyl sulfate polyacrylamide gel electrophoresis SILAC stable isotope labeling by/with amino acids in cell culture SISCAPA stable isotope standards and capture by anti-peptide antibodies SMAC sequential multiplex analyte capturing SRM selected reaction monitoring TAP tandem affinity purification TEV tobacco etch virus TMT tandem mass tag TOF time of flight TXP Triple X Proteomics µTAS micro total analysis system WB

VII Preface

When I started my journey as a PhD student in 2010, I was convinced that I would find a cure for cancer. Well, maybe not a cure, but at least that I would on my dissertation day have been part of the development of a fab- ulous diagnostic platform that would revolutionize the medical field. Five years seemed like such a long time - just think of all the things I could ac- complish! I soon realized however, that I might have overestimated things slightly. I came to the understanding that when doing research, five years is actually quite a short period. My enthusiasm was slowly replaced with dis- couragement and I started wondering if I would manage to achieve anything at all. Eventually, things started to turn and I could experience the joy of one good result that was easily enough to outweigh a dozen bad ones. Today I know that every little step towards a cure for cancer, or any other goal one may have, is of great importance and truly makes a difference. Therefore I feel extremely happy and proud that I have made a contribution to the field of proteomics through this thesis. The new findings that are presented in this book are mainly the development of novel tools to study proteins in different ways. By taking advantage of the large resource of both antibodies and antigens available within the Human Protein Atlas project, methods for protein identification, quantification and antibody validation have been developed. Present investigations are sum- marized in chapter 6. In order for the reader to get a better picture of the research as well as its impact on science, an overview of the field is presented in chapters 1-5. Many people have contributed to the research presented in this thesis and to you I am most grateful. Tove Boström Stockholm, August 25th 2014

VIII Contents

Abstract ...... I Sammanfattning ...... II List of publications ...... III Contributions to the included papers ...... V Abbreviations ...... VI Preface ...... VIII

1 Proteins and proteomics 1 What are proteins and why should we study them? ...... 1 The challenges of proteomics ...... 4 The proteome - complex but informative ...... 6

2 Mass spectrometry-based proteomics 9 History of mass spectrometry ...... 9 Mass spectrometry for protein analysis ...... 10 Sample preparation for mass spectrometry ...... 15 Data independent acquisition methods ...... 18 Quantitative proteomics ...... 19 Proteome coverage and sensitivity ...... 29

3 Affinity proteomics 33 History of the immunoassay ...... 33 Antibodies as affinity reagents ...... 34 Alternative affinity reagents ...... 37 The Human Protein Atlas ...... 38 Immunoassays ...... 42 Cross-reactivity in immunoassays ...... 46

IX CONTENTS

4 Bridging affinity proteomics with mass spectrometry 49 Benefits of combining immunoenrichment with mass spectrom- etry ...... 49 Immunoenrichment setups ...... 50 Immunoenrichment of protein or peptide groups ...... 58

5 Miniaturization 61 Miniaturization in proteomics ...... 61 The ISET platform ...... 64

6 Present Investigation 67 Summary ...... 67 Development of screening methods for recombinant protein production (papers I and II) ...... 68 Absolute MS-based protein quantification to study the corre- lation between protein and mRNA levels (paper III) . 75 Antibody validation using immunoenrichment coupled to mass spectrometry (paper IV) ...... 79 Protein quantification using immunoenrichment and mass spec- trometry (paper V) ...... 83 Concluding remarks ...... 88

7 Populärvetenskaplig sammanfattning 91

8 Acknowledgements 93

9 Bibliography 97

X Chapter 1

Proteins and proteomics

What are proteins and why should we study them?

Proteins are everything and they are everywhere. In almost all biochemical processes, you will find that proteins are among the key players. As Francis Crick pointed out in 1958: "The most significant thing about proteins is that they can do almost anything" [1]. Proteins are commonly called the building blocks of life, as without them, there would be no life at all. They are crucial in practically every cellular process and a small error in a single protein can be enough to cause a disease [2,3]. Today we know quite a lot about proteins. We know what chemical sub- stances they consist of, we know how they are produced in a cell [4, 5], we can determine what they look like [6] and their exact molecular weight [7]. We can determine in what type of cells a protein is expressed, even in what cellular compartment [8–10] and we can quantify the amounts of different proteins in various cells and body fluids [11–13]. However, we have yet a long way to go before we can fully understand the complex nature of pro- teins in how they function and interact with one another. The research area in which proteins are studied in large-scale (i.e. in high numbers) is called proteomics. The composition of a protein is determined by its genetic code, contained within the DNA in the cell. DNA, or deoxyribonucleic acid, was discovered

1 Proteins and proteomics in the late 1860s, although it was then not known that this molecule was the carrier of genetic information. This was not confirmed until almost one hundred years later [14] and in 1953, Francis Crick and James Watson were able to determine the now well-known alpha-helical structure of DNA, based on X-ray analyses performed by Rosalind Franklin and Maurice Wilkins. Once having determined the structure of DNA, they could also propose a system for DNA replication, that ensured that the DNA content would be identical in the two daughter cells after cell division [15]. Crick could in 1961 propose a sequence hypothesis, in which he stated that three bases of DNA codes for one specific [16] and finally, he could also describe the complete central dogma of molecular biology that explains how DNA is transcribed to RNA, which is further translated to protein [1,4]. Proteins were established as a unique class of molecules already in the late 18th century [17], but it was in the following century that the composition of proteins was revealed. The Dutch chemist had in 1838 observed that proteins were very large molecules with similar atomic composition, only differing in the amount of sulphur and phosphorus [18]. Mulder was the first to use the name protein in a publication, however it was the Swedish chemist and clinician Jöns Jacob Berzelius who first proposed the name [18, 19]. The name protein has its origin from the Greek word πρωτα ("prota"), meaning "of primary importance". The 20 different pro- tein building blocks, amino acids, were discovered during a period of more than 100 years between 1819 and 1936. In the beginning of the 20th century the structure of proteins was elucidated when the scientists and independently demonstrated how amino acids were connected by peptide bonds [20]. Frederick Sanger could in 1951 present the first complete amino acid sequence of a protein, namely the protein insulin of 110 amino acid residues [21, 22]. The amino acid sequence of a protein is called the primary structure (Figure 1.1). However, a chain of amino acids linked together by peptide bonds, a polypeptide chain, will in most cases not be very functional unless it also has obtained a correct fold. Locally formed structures are denoted secondary structures and include mainly α-helices, β-sheets and loops [6]. Secondary structure patterns can be predicted by the protein sequence and the hydrogen bonding between amino acids, as demon- strated by Linus Pauling in 1951 [23, 24]. Secondary structural elements

2 Proteins and proteomics

primary structure secondary structure

tertiary structure quaternary structure

Figure 1.1: The different levels of protein structure. The amino acid sequence of a protein is called the primary struc- ture. The organization of the polypeptide chain into α-helices, β-sheets and loops makes up the secondary structure. Sec- ondary structural elements are organized into a tertiary struc- ture and several polypeptide chains can be combined to form a quaternary protein structure. of the protein are further organized into a tertiary protein structure. One major driving force of this process is to hide hydrophobic residues in the pro- tein core, hence shielding them from surrounding water molecules, a theory presented by Irving Langmuir already in 1938 [25]. However it was not un- til 1959, when Walter Kauzmann gave the proposal attention that the idea

3 Proteins and proteomics started to obtain acceptance [26]. Proteins consisting of several polypep- tide chains can be further organized into a quaternary structure. Although we can quite accurately predict protein secondary structure based on amino acid sequence using different algorithms [6], predicting the complete three- dimensional structure of a protein only by looking at the primary structure is an extreme challenge and is today not possible for the majority of pro- teins. Determining protein structure is, together with investigating protein function, two of the key questions of the proteomic research field.

The challenges of proteomics

The entire set of genes of a cell or organism is called its genome and the large- scale study of genes and genomes is called genomics. A genome is constant, meaning that all cells within an organism contain the same set of genes. PCR technology has made it possible to efficiently multiply DNA for easier analysis [27,28]. In addition, the development of next generation sequencing technology has enabled high-throughput sequencing to a reasonable cost [29]. This has resulted in a rapid evolvement in the field of genomics and complete genomes of several organisms have currently been mapped. Among these is the human genome, which was completed in 2001 [30,31]. All expressed proteins within a cell together make up the proteome, a term first coined by Marc Wilkins in the 1990s [32]. In contrast to the genome, protein expression can vary in different cells as well as in different cell stages and upon different cell stimuli. The proteome is hence very dynamic and depends on both cell type and time of analysis as well as external factors. In proteomics, the major technique used today to study proteins is mass spectrometry (MS), where proteins and peptides are analyzed based on their mass and charge [33, 34]. However, the first method for large-scale pro- tein analysis was two-dimensional gels (2D-GE), which was developed in the 1970s. Using these gels hundreds or thousands of proteins could be sepa- rated and visualized, but analysis of the different proteins was challenging and mainly proteins of high abundance could be identified [35]. After the breakthrough of MS for analysis of biomolecules in the 1990s, it has grown to become a widely used technique to study proteins due to its accuracy, sen- sitivity and possibility for high-throughput analysis [7]. Another branch of

4 Proteins and proteomics proteomics is affinity proteomics, where affinity proteins such as antibodies are used for protein identification and quantification. The great advantage of affinity proteomics is the high sensitivity of the assays, however the need to develop affinity reagents towards all target proteins is a great bottleneck and the assays are greatly dependent on the quality of the affinity reagents [36]. Mass spectrometry-based proteomics and affinity proteomics will be further discussed in chapters 2 and 3. The human genome contains around 20,000 genes, each coding for one or mul- tiple specific proteins [37]. Even though we know the amino acid sequence of these proteins, this does not necessarily tell us much about protein func- tion. In eukaryotic organisms, each protein-coding is built up by exons and introns (Figure 1.2). Only the exons are coding sequences and introns will be removed during a process called alternative splicing that takes place after [5]. The exons of a gene are then combined in different ways to create different protein variants, so called protein isoforms. Further- more, proteins can be modified in different ways with different functional groups, termed post-translational modifications (PTMs) [38, 39]. For exam- ple protein can activate different signaling cascades, addi- tion of ubiquitin to a protein marks it for degradation by the proteasome and glycosylation can affect cell-cell recognition. Other modifications can alter protein-protein interactions and cellular localization [38]. Phosphorylation sites have been identified in more than 10,000 proteins [40], emphasizing the great importance of post-translational protein processing. Different isoforms and PTMs together increase the number of human protein versions tremen- dously [41]. In addition, the dynamic range of protein concentrations within a typical proteome is huge, especially in blood samples where concentrations between high and low abundant proteins vary more than ten orders of mag- nitude [42]. To generate a full picture of a proteome is therefore extremely challenging. Moreover, the proteome is very versatile and external changes during cell cultivation and sample preparation or differences in handling of blood samples can alter the composition of the proteome [43], making repro- ducibility of proteomic data difficult. All together, the proteome is extremely complex, which makes proteomics research a tough task.

5 Proteins and proteomics

DNA

mRNA

protein isoforms

protein isoforms with post-translational modications

intron exon post-translational modication

Figure 1.2: The complexity of the proteome compared to the genome. From one protein-coding gene multiple protein variants can arise. Exons can be combined in different ways during alternative splicing, and proteins can be modified with different post-translational modifications, resulting in a large number of protein variants.

The proteome - complex but informative

There is a tremendous amount of information that can be extracted from proteomic studies that we cannot learn from genomics. Transcriptomics, in which RNA is analyzed in a large scale, can act as a bridge between these two areas of research [44, 45]. Levels of mRNA within a cell can tell us approximately how many times a gene is transcribed. This can indicate at what level the corresponding protein is expressed and could therefore give clues to elucidate protein function. It can be investigated at what time

6 Proteins and proteomics point a certain gene is transcribed, for example if there is a difference in transcription throughout the cell cycle [46]. However, even though mRNA abundance can indicate the amount of protein within a cell, several studies show a rather weak global correlation between protein and mRNA levels [47]. It has however been shown that analyzing mRNA levels of individual genes instead of large gene sets can give a much better estimation of protein abun- dance, which is actually rather expected, since different genes are regulated differently [40]. Hence, a great amount of useful information regarding pro- teins can come from transcriptomics and transcript data is very useful to researchers within the proteomics field. Measuring protein abundance in different cells and tissues is an important part of proteomics research. Proteins that are expressed with similar abun- dance across different cells, expressed from so-called housekeeping genes, are most likely involved in basic cellular functions whereas proteins of very dif- ferent abundance are more likely to have cell-specific functions [48, 49]. If a protein is believed to be involved in a certain disease, a difference in con- centration of this protein in a sample for a diseased patient compared to a healthy one would be expected. Proteins that can potentially determine the disease state of a patient are denoted disease markers or biomarkers [50]. Investigating the subcellular localization of proteins can also generate infor- mation about protein function as different subcellular compartments have specific characteristics and carry out specialized functions [9]. Analysis of protein complex structures and interaction networks is also very informa- tive [51, 52] and mapping the PTMs of a protein can generate information regarding for example protein activity state [39].

7 Proteins and proteomics

8 Chapter 2

Mass spectrometry-based proteomics

History of mass spectrometry

MS is today the most widespread technique to study proteins, however ana- lysis of large biomolecules using this technology is a rather new application. Development of the first mass spectrometer-like instrument dates back to 1912, when the British physicist Sir Joseph John Thompson constructed a

"mass spectrograph" and managed to obtain mass spectra for O2,N2, CO, CO2 and COCl2. This instrument paved the way for the development of more advanced mass spectrometers. The possibility to use MS for the analysis of large biomolecules such as proteins was however not realized until several decades later. Ionization of the analytes is fundamental for the analysis and the ionization techniques that were used at this time required the analyte to be present in gas phase, limiting the analysis to small, volatile compounds. Proteins, which are large and polar molecules and hence extremely non- volatile, were much more problematic [53]. In the 1980s however, two new ionization techniques were developed that made it possible to study larger biomolecules such as peptides and proteins. Matrix-assisted laser desorp- tion ionization (MALDI) and electrospray ionization (ESI) were presented roughly at the same time by Michael Karas and Franz Hillenkamp [54] and John Fenn [55], respectively. Since then, the MS technology has kept on de-

9 Mass spectrometry-based proteomics veloping at a high speed, however MALDI and ESI remain the ion sources of choice for the analysis of biomolecules. In 2002, the Nobel prize was awarded to John Fenn and Koichi Tanaka for their work with ESI and laser desorption ionization.

Mass spectrometry for protein analysis

In MS, proteins are analyzed based on their mass and charge. The instru- mentation consists of three parts: an ionization source, a mass analyzer and a detector (Figure 2.1). Ionization is the first step of the process, where an- alyte ions are produced in gas phase. If the ionization is "soft", the analyte stays intact, however harder ionization techniques exist where analyte frag- mentation occurs in the process. In the mass analyzer, the analyte ions are separated based on their mass to charge ratio (m/z) and the output from the detector is a mass spectrum with intensity plotted against m/z. Common mass analyzers for analysis of biomolecules are the time of flight (TOF), quadrupole, ion trap, fourier-transform ion cyclotron resonance (FT-ICR) and orbitrap mass analyzers [34].

ionization m/z separation fragmentation m/z separation detection

Figure 2.1: Overview of a mass spectrometer setup. The workflow includes ionization, m/z separation and detection. Peptide ions can be detected after a first m/z separation or further fragmented to enable peptide sequencing as indicated by the grey box. Usually both modes (MS and MS/MS) are used simultaneously.

MS is a valuable tool in many aspects of protein analysis. After recombi- nant protein production and purification MS can be used for verification of the protein identity and the protein can then usually be measured in its full-length form [7]. The generated data is the molecular weight of the pro- tein, which in most cases is enough for a reliable identification. However,

10 Mass spectrometry-based proteomics modified protein variants can be difficult to map, especially if the protein is produced in a mammalian host where multiple modifications is not un- common. Solubility can be a problem especially when dealing with larger, full-length proteins. Adding detergents to the sample is generally a common strategy to tackle this problem, however detergents are not compatible with MS instrumentation and will hamper the analysis. The most common strategy for MS-based proteomics is "bottom up" pro- teomics, where a proteolytic enzyme is used to cleave a protein pool into peptides before MS analysis [56] (Figure 2.2). Peptides are, compared to full-length proteins, more easily ionized in the mass spectrometer leading to an increased sensitivity [57]. This also decreases the issue of protein solu- bility and hence makes it possible to identify insoluble membrane proteins, which is otherwise a very challenging protein class from a proteomics per- spective. In addition, the exact mass of a full-length protein is in many cases not known, due to for example complex modification patterns, as men- tioned above. When analyzing smaller amino acid sequences, this problem is decreased, as peptides from unmodified regions can be used to identify the corresponding protein. Peptides generated from digestion (cleav- age after lysine and arginine) are of good size for MS in terms of accurate mass determination and easily deconvoluted charge states and this enzyme is today the standard option for protein digestion, even though combining several proteolytic enzymes can to some extent increase the obtained se- quence coverage [58]. The obtained peptides can be analyzed directly in a mass spectrometer or after separation based on hydrophobicity using an on-line coupled reversed-phase column before injection. However, due to differences in ionization efficiency not all peptides will be detectable in the mass spectrometer and therefore complete sequence coverage will very rarely be obtained. Intact peptides can be identified in a process called peptide mass finger- printing (PMF). The molecular weights of peptides identified in the mass spectrometer are then used to map the peptides to their corresponding full- length protein. For complex protein mixtures, peptide mass fingerprinting has the drawback that two peptides of equal mass and charge cannot be dis- tinguished from one another. To get a more reliable identification, MS/MS or tandem MS is used, where not only peptide molecular weight, but also

11 Mass spectrometry-based proteomics

digestion

protein mixture peptide mixture

intensity MS analysis

1

m/z separation m/z

MS/MS analysis intensity

2

m/z separation fragmentation m/z separation

m/z

Figure 2.2: Bottom up proteomics. A proteolytic enzyme is used to digest proteins into peptides, which are injected and analyzed in a mass spectrometer. (1) In a full MS scan, pep- tides are directly analyzed by m/z. (2) In tandem MS, the ions of highest intensity are selected and fragmented to gen- erate smaller ion species. These are further separated by m/z and detected to generate a fragment ion spectrum. From this spectrum, the peptide amino acid sequence can be determined. amino acid sequence can be determined. Peptides are first separated in a mass analyzer before one chosen peptide ion, the precursor ion, is fragmented in a collision chamber, for example by collision of the ion with residual gas. After fragmentation, the resulting product or fragment ions are analyzed in a second mass analyzer and a fragment ion spectrum is recorded. The pro- cess is repeated throughout the entire chromatographic separation, with a set number of precursor ions selected per cycle [57]. This setup is described

12 Mass spectrometry-based proteomics as "tandem in space", with different mass analyzers connected in sequence to perform the different analyses. In contrast, "tandem in time" can be performed when using trapping instruments (ion trap, FT-ICR and orbi- trap), where one mass analyzer can perform both MS scan, fragmentation and MS/MS analysis [59]. Three different amino acid bonds can be cleaved during peptide fragmentation: C(R)-C, C-N or N-C(R). From these cleav- ages, six different fragment ions can be produced, depending on whether the charge is kept on the N- (a, b or c) or C-terminal side (x, y or z) of the pep- tide (Figure 2.3). The type of cleavage that occurs depends on the applied fragmentation method. In a perfect product ion spectrum, peaks representing each peptide fragment would be present. The difference in m/z between the peaks reveals which amino acid has been cleaved off at each specific position. In reality, product ion spectra rarely contain all possible fragment ions and therefore, de novo sequencing is difficult. Instead, product ion spectra are usually searched against protein databases. Theoretical tryptic peptides are generated by in silico digestion of the proteins within the chosen database and used to generate theoretical fragment ion spectra that are then compared to the ex- perimental data. The molecular weight of the precursor ion together with a mapped fragment ion spectrum is usually enough to determine the ex- act peptide identity. Peptide identifications are reported with a probability score as a measure of the reliability of the identification and today, several different search engines are available for searching MS data, such as Mascot, X!Tandem and SEQUEST [60–64]. It is also possible to obtain sequence information directly from full-length proteins, in a process called "top down" proteomics [65, 66]. Intact protein ions are then fragmented in the mass spectrometer to generate detectable fragment ions. Advantages of this approach are the higher obtained se- quence coverage compared to bottom up proteomics and the increased po- tential to localize PTMs. In addition, the exclusion of the protein digestion step is beneficial from a time and cost perspective. Using top down pro- teomics, sequence predictions have been made on proteins with molecular weight exceeding 200 kDa [67]. However, this approach also suffers from several drawbacks. Due to the increased complexity of the fragment ion spectra for full-length protein ions, where product ions can have as many

13 Mass spectrometry-based proteomics

R1 H O R3 H H C N C N C C C N C O H H H 3 H 1 2 H O R2 H O

H O R2 O R3 H C H N C C C 1 C N N O H H C H R1 H H O

a2 x1

R3 H O R2 H H C N O H H N C C C 2 C N C H H H H O O R1 H

b2 y1

H O R2 R3 H H N C C 3 H C C N C N H C O H H H O H R1 H O

c2 z1

Figure 2.3: Different fragmentation paths of a tripeptide ion. (1) C(R)-C bond cleavage resulting in a and x ion series. (2) C-N amide bond cleavage resulting in b and y ion series. (3) N-C(R) bond cleavage resulting in c and z ion series. different charges as the full-length protein, the strategy is limited to simple protein mixtures [66]. In order to reach a high enough resolution of the high charge state ions expensive FT-ICR instrumentation is most commonly used, especially if PTMs are studied [68].

14 Mass spectrometry-based proteomics

Sample preparation for mass spectrometry

To analyze proteins in cells and tissues, the first step is protein extraction by cell lysis. This can for example be performed by exposing the cells to a detergent-containing lysis buffer or by sonication. Alternatively, these strategies can be combined. After cell lysis, a chromatographic step can be applied for protein purification or fractionation, or proteins can be directly digested using a proteolytic enzyme. Purification can be performed to enrich tagged proteins (for example pro- teins containing a His6-tag or FLAG-tag) or for enrichment of a certain type of proteins (for example charged proteins or proteins with certain mod- ifications) [69] (Figure 2.4). Purification of tagged proteins can today be performed in a streamlined format, where a large number of proteins can be purified using a common protocol. For purification of native proteins, the process generally requires optimization making it more time-consuming. Antibodies or other affinity agents can also be used to enrich specific pro- teins [70], as will be discussed in chapters 3 and 4. A purification step will not only remove interfering proteins, decreasing the risk of ion suppression during MS analysis, but can also increase the concentration of the protein in the sample, which for low-abundant proteins is often crucial. Proteins can be fractionated for example based on molecular weight by using size exclusion chromatography or using 2D-GE [32] in which, proteins are separated both based on molecular weight and isoelectric point (pI) in two dimensions. For a proteome-wide experiment where the aim is to cover as large portion of the proteome as possible, sample fractionation before and/or after digestion would be advisable. Dividing the highly complex protein or peptide sample into several fractions of lower complexity will generally result in a larger list of identified proteins, however fractionation also leads to longer analysis time on the mass spectrometer. Extensive sample preparation workflows can also lead to a substantial sample loss. Protein digestion can be performed in several ways using proteolytic en- zymes. In solution digestion is a good alternative, especially for simple protein mixtures. However, if proteins are directly digested after cell ly- sis without an affinity purification or fractionation step, consideration has to be taken to the tolerance of the proteolytic enzyme for the sample conditions.

15 Mass spectrometry-based proteomics

protein extraction

A BC

protein digestion

A B

MS analysis

Figure 2.4: Strategies for MS sample preparation. After protein extraction, proteins can either be subjected to affinity purification (A), fractionation (B) or 2D-GE or SDS-PAGE separation (C). After digestion affinity purification can be ap- plied to enrich certain peptides or peptides can be fractionated based on several different properties before MS analysis.

The commonly used filter aided sample preparation (FASP) method enables efficient washing of cell lysates before digestion so that any detergents or salts from the lysis buffer are efficiently removed [71]. This is achieved at

16 Mass spectrometry-based proteomics the cost of lower sample recovery. SDS-PAGE, where proteins are separated based on size, can be used as a combined fractionation and digestion plat- form. Bands from the gel can be cut out and digestion can be performed directly in the gel piece [72]. Analysis can be performed on fractions cov- ering the whole molecular weight range or a specific fraction if a particular protein is of interest. In addition to fractionation, separation of proteins on a gel results in sample cleanup with the removal of salts and detergents. Even after an affinity purification step, there are usually several interfering proteins present in the sample, corresponding to sticky or high abundant proteins and in-gel digestion is therefore a good option to get rid of these proteins. However, this setup is difficult to scale up and when dealing with many samples, another method would therefore be recommended. The resulting peptides can be further fractionated in different ways. Com- monly, two-dimensional fractionation is performed, where the second step is peptide reversed phase separation coupled on-line to ESI-MS analysis. The fractionation methods should be orthogonal, meaning that the separation is performed based on different peptide properties, thereby increasing separa- tion efficiency. For example, isoelectric focusing (IEF) where peptides are separated by isoelectric point [73], or ion exchange chromatography [74] can be applied. This can be performed either in an off-line mode where fraction- ation is performed prior to LC-MS/MS analysis or in an on-line setup, such as the MudPIT strategy where the analytical column is packed with two layers of chromatographic material (ion exchange and reversed phase) [75]. Peptides can also be enriched using specific antibodies [76], which can reduce the sample complexity significantly. If analysis of PTMs is desired, enrich- ment of modified peptides can be performed also on the peptide level using for example antibodies targeting a certain modification or with TiO2 for en- richment of phosphopeptides [77,78]. The last step prior to MS analysis is to desalt the sample, which can be done with Stop and Go Extraction (Stage) tips (C18 material loaded in a pipette tip) [79] or commercially available desalting platforms [80]. Alternatively, desalting can be performed using an on-line setup with a C18 column, a so called trap column. If analysis is per- formed on an ESI-MS instrument, peptides are usually separated on an LC column on-line prior to injection into the mass spectrometer, as mentioned above. For MALDI-MS analysis this is more troublesome and off-line LC

17 Mass spectrometry-based proteomics fractionation can instead be applied if the sample is too complex for direct analysis.

Data independent acquisition methods

In a standard MS/MS analysis, the instrument is operated in data-dependent mode, meaning that the ions of highest intensity are chosen for fragmenta- tion. This is a good strategy for a proteome-wide experiment, however in certain cases only analysis of a set of specific proteins is desired. The data acquisition can then be performed in targeted mode where specific precursor ions are chosen computationally beforehand and all other ions are excluded from the analysis, hence operating the instrument in data-independent mode (Figure 2.5). This has the advantage of increased sensitivity, as low abundant ions are chosen for sequencing that would otherwise have been masked by higher abundant ions [81,82]. In a method called multiple or selected reaction monitoring (MRM or SRM) [83–85], a triple quadrupole mass spectrometer is used for efficient selection of both precursor and product ions to monitor specific transitions. A triple quadrupole has three quadrupole analyzers in sequence. In the first quadrupole, a specific precursor ion is selected, this ion is fragmented in the second quadrupole and the third quadrupole selects one or several product ions that are passed on to the detector [84]. Once an MRM assay is established, many samples can be analyzed rapidly with high sensitivity and reproducibility, however the assay development can be time consuming [86]. MRM is therefore commonly used in for example biomarker validation studies where relatively few, usually low abundant, proteins are analyzed in large sample sets. Data-independent acquisition can also be performed in an untargeted man- ner [87, 88]. One example is SWATH MS, where the m/z detection space is divided into 32 sets of 25 Da windows and the mass spectrometer con- stantly scans through these windows and fragments all ions in each win- dow [89].

18 Mass spectrometry-based proteomics

intensity Q1 Q2 Q3

time

Figure 2.5: Targeted MS using MRM. A peptide ion is se- lected in the first quadrupole (Q1) and thereafter fragmented in the second quadrupole (Q2). In the third quadrupole (Q3), a fragment ion is selected and passed on to the detector.

Quantitative proteomics

In many cases, quantitative information is required in order to answer a certain biological question. Protein quantification using MS is not straight- forward due to differences in ionization efficiency between different peptides, meaning that two peptides of the same abundance in a sample can give rise to different intensities in the mass spectrometer. Therefore comparing intensities or peak areas between different peptide species will not gener- ate accurate information regarding abundance. In addition, ion suppression and other matrix effects decreases the reproducibility making comparisons between runs difficult. A common strategy is therefore to add an internal standard possessing identical chemical properties as the target peptide to the sample, to which one can compare the ion signal. This can for example be done by metabolic labeling or chemical tagging methods to either obtain relative abundances between two samples or for absolute quantification to determine absolute copy numbers or protein concentrations within a cell or sample. Although it is also possible to generate quantitative information by comparing signals from two separate MS runs, the result will be of lower ac- curacy. Many different quantitative methods have been developed and they all have advantages and drawbacks. One important aspect is at which stage the samples are mixed, or the internal standard is spiked into the sample. Sample preparation prior to MS is usually quite extensive and large errors

19 Mass spectrometry-based proteomics can be introduced during these steps. A method that enables mixing of the samples at an early stage is therefore desirable to minimize these er- rors. Some of the existing methods for MS-based protein quantification are discussed in the following sections.

Metabolic labeling strategies for quantitative proteomics

Labeling peptides and proteins with heavy isotope-labeled amino acids has become a widespread strategy to enable accurate protein quantification. Consider a peptide that is present in a sample in a "light" unlabeled form and a "heavy" form with heavy isotope-labeled amino acids. The two vari- ants can be distinguished from one another due to the introduced mass shift by the heavy isotopes but the isotope labeling will not affect the proper- ties of the peptide. Signal intensities of the two variants can therefore be compared to determine their relative abundance. Labeling of arginine and lysine residues is most common, as cleavage with trypsin will then ensure that all tryptic peptides will contain at least one labeled amino acid. Several different variants of heavy isotope-labeled amino acids can be used, where carbon and/or nitrogen is labeled [90], however quantification using labeled versions of other amino acids or complete labeling of all amino acids can also be performed [91–93]. Stable isotope labeling of amino acids in cell culture (SILAC) is a widely used method for relative protein quantification [91, 94] (Figure 2.6). In SILAC, cells are cultivated in medium containing heavy isotope-labeled amino acids and hence, the stable isotopes are incorporated into the proteins by the ma- chinery of the cell. Since all expressed proteins are labeled, SILAC generates relative quantitative data on a proteome-wide scale. SILAC has for example been used to analyze protein signaling pathways [92, 93, 95], to investigate cancer proteomic profiles [96,97] and to determine the cellular response upon drug treatment [98]. SILAC can be used to obtain relative quantitative data between two [93], three [95] or even five samples, with differently labeled arginine variants [90]. Experiments with four or five different samples using deuterated amino acids is also possible, however this may lead to alterations in LC retention time due to the deuterated amino acids, called the deuterium isotope effect [99]. In addition, multiplexing will lead to an increased sample

20 Mass spectrometry-based proteomics

light amino acids heavy amino acids

mix

protein extraction

digestion

MS analysis intensity

m/z MS spectrum

Figure 2.6: Workflow for relative quantification of two sam- ples using SILAC. One cell sample is cultivated using stan- dard isotope (light) amino acids and the other with heavy isotope-labeled (heavy) amino acids. The samples are mixed, digested and analyzed in a mass spectrometer. Intensity ra- tios between the peptide variants are used to determine the relative amounts of the peptides in the two samples. complexity, hence making quantification more difficult. A variant of the SILAC method, termed super SILAC or spike-in SILAC, has been developed where labeling of the sample with heavy isotopes can be avoided. Instead, a super SILAC mix is used, containing lysates from cell lines cultivated in media with heavy isotope-labeled amino acids. The super SILAC mix is spiked into each sample and ratios between heavy and

21 Mass spectrometry-based proteomics light peptides are obtained. These ratios can then be used to determine the relative abundances of the proteins between the different samples [10, 100, 101]. Since sample labeling is not necessary, patient samples such as tissues can also be analyzed [102, 103]. However, the results are dependent on the choice of cell lines for the super SILAC mix, since quantification of proteins in tissue requires that these proteins are present also in the internal standard sample.

Chemical labeling strategies for quantitative proteomics

Other methods for protein quantification use chemical labeling of proteins or peptides. Two commonly used methods generally used for relative quantifica- tion are isobaric tags for relative and absolute quantification (iTRAQ) [104] and tandem mass tags (TMT) [105]. Here, labeling is performed on the pep- tide level after protein digestion and peptides are labeled at primary amine residues. Several different tags exist, enabling multiplex experiments, up to 8-plex for iTRAQ [106] and 10-plex for TMT [107]. The tags are isobaric, meaning that they have the same mass and different tags can therefore not be distinguished in an MS spectrum. All tags contain a reporter group, with slightly different mass for the different tags. During peptide fragmentation, reporter ions of a specific size are generated for each isobaric tag. These will be visible in the product ion scan, where their relative intensities can be used to determine the relative abundance of the peptide in the different samples. One significantly cheaper option is dimethyl labeling, where formaldehyde with either hydrogen or deuterium atoms is used as a reagent to generate a mass shift of 4 Da between the labeled variants [108, 109]. The whole labeling procedure takes less than five minutes. A drawback with this quan- tification strategy is however, as mentioned previously, the fact that deuter- ated peptides show a chromatographic retention time shift, which can affect quantification accuracy. Another strategy is labeling with 18O. This label- 18 ing is performed during proteolysis by trypsin, where hydrolysis in H2 O results in two 18O being incorporated into the carboxyl terminus of the tryp- tic peptide [110, 111]. The efficiency of the labeling however depends both on peptide length and sequence, resulting in variable incorporation of 18O.

22 Mass spectrometry-based proteomics

This generates peptides with either one or two labeled oxygen atoms, which decreases the accuracy of the method. The first method based on isotopic labeling developed for quantitative MS- based proteomics was isotope coded affinity tags (ICAT). ICAT was pre- sented in 1999 and applies labeling on the protein level at residues [112]. The original tag contained either zero or eight deuterium residues, along with a biotin molecule for affinity purification of the labeled peptides after digestion. Hence, this system also suffered from the deuterium iso- tope effect and a modified version was therefore developed where carbon isotopes were used instead of deuterium atoms [113]. An advantage of ICAT compared to iTRAQ and TMT is that labeling is performed earlier in the process, leading to less error due to differences during sample preparation. Since only proteins containing a cysteine can be quantified with this method ICAT is not suitable for all applications, however the affinity enrichment of tagged peptides reduces the sample complexity, which can be an advantage if peptides of low abundance are of interest.

Label-free relative quantification

Methods not relying on protein or peptide labeling, termed label-free quan- tification, also exist, with the advantage of easier sample preparation and lower cost but at the expense of lower quantitative precision (Figure 2.7). Peptide spectrum matches (PSMs), peptide signal intensities or areas under curves (AUCs) from two or more samples can be directly compared without prior mixing of the samples [114, 115]. However, when comparing signals from different samples, the requirements on MS instrumentation regarding accuracy and precision are increased. In spectral counting, it is assumed that the number of identified PSMs for a certain protein is relative to its abundance [115,116]. This is a commonly used strategy, however also controversial since protein quantification is based on the number of spectra instead of actual data [117]. Advantages with this technique include the straightforward data analysis, which does not require specialized software. Since quantification is based on the number of tandem MS spectra mapped to a certain protein and the accuracy of the quantifi- cation is increased with more data, multiple sequencing of each peptide is

23 Mass spectrometry-based proteomics

protein extraction

protein digestion

MS analysis

intensity intensity

m/z m/z

data analysis and comparison intensity

intensity m/z intensity intensity

m/z intensity intensity

time m/z m/z intensity intensity time

m/z m/z

Figure 2.7: Workflow for label-free quantification. Sam- ples are treated separately during the whole sample prepara- tion and analyzed separately in a mass spectrometer. During data analysis, samples are compared either by peptide inten- sities, AUCs or number of PSMs.

24 Mass spectrometry-based proteomics favorable. However, sequencing the same peptide over and over will decrease the overall proteome coverage, as the mass spectrometer can only sequence a certain number of peptides in a run. In addition, peptides with broader chromatographic peaks will be identified more times than peptides with very narrow elution profiles, leading to more PSMs and a higher estimation of pro- tein abundance. Lower abundant proteins will generally have a higher varia- tion in the number of PSMs compared to higher abundant proteins, making this strategy less accurate for proteins of low concentration [12,13]. Contrary to spectral counting, when comparing peptide signal intensities or AUCs for relative quantification, it is assumed that the peak area or inten- sity of a peptide is relative to its abundance in the sample [118, 119]. The linear relationship between signal intensity and peptide abundance makes this approach more accurate for quantification of lower abundant proteins than spectral counting, where very few PSMs are mapped to a certain protein [120]. However, necessary data processing such as feature detec- tion, normalization, noise reduction and accurate matching of MS peaks between runs makes data analysis more challenging compared to spectral counting [117].

Absolute protein quantification by spike-in standards

Compared to relative quantification where the difference in abundance be- tween two or more samples is determined, absolute quantification generates a specific protein copy number or protein concentration within a sample. Absolute quantification strategies generally make use of isotope-labeled stan- dards, of which the absolute concentration is known [121] (Figure 2.8). Syn- thetic heavy isotope-labeled peptides, so-called AQUA peptides (for Abso- lute QUAntification), can be spiked into a sample and the difference in signal intensity between the AQUA peptide and the corresponding endogenous pep- tide is thereafter used to determine the absolute peptide abundance in the sample. AQUA peptides are widely used and commercially available [122]. Another type of standard is QconCAT (quantification concatamer), which consists of several concatenated peptides in sequence resulting in quantita- tive data from more than one peptide [123]. The standard is added to the sample before digestion, which reduces the error from incomplete proteoly-

25 Mass spectrometry-based proteomics sis. However, it is important that the QconCAT standard is digested with equal efficiency compared to the endogenous protein, which would otherwise lead to inaccurate quantitative data. Since QconCAT standards are usually expressed in Escherichia coli, the production is both cheap and simple.

protein extraction

addition of protein standard

digestion

addition of peptide standard

MS analysis intensity

m/z MS spectrum

Figure 2.8: Workflow for absolute quantification using heavy isotope-labeled standards. Protein standards can be added to the sample already after protein extraction, whereas peptide standards are added after proteolytic digestion. After MS analysis, heavy to light ratios are used to determine the abso- lute concentration of the peptide within the sample.

An optimal strategy for absolute quantification is to add a full-length heavy isotope-labeled protein as quantification standard. A full-length protein standard can be spiked in before proteolysis and will be digested with the

26 Mass spectrometry-based proteomics same efficiency as the endogenous version. Quantitative data from pep- tides over the whole protein sequence will be generated, resulting in a very accurate quantification. Methods using full-length proteins as internal stan- dards include absolute SILAC [124], protein standard absolute quantifica- tion (PSAQ) [125] and FlexiQuant [126]. Furthermore, production of the protein standard in the same host as the sample with the endogenous pro- tein would be beneficial as this would ensure that the proteins will carry the same modifications. However, the time-consuming and challenging pro- duction of full-length proteins, especially in mammalian hosts, hinders the large-scale applicability of these methods. A relatively new strategy, using protein fragments of 25-150 amino acids generated in a high-throughput for- mat could be a promising strategy for absolute quantification in large-scale studies [127,128]. This strategy will be further described in chapter 3.

Absolute label-free quantification

Label-free approaches can also be used to generate absolute quantitative data, however since peptide intensities and number of PSMs cannot directly be used for absolute protein quantification, data normalization is needed. The normalization can be performed in different ways. In intensity-based absolute quantification (iBAQ), the total signal intensity of all peptides from a protein is normalized by the number of theoretical peptides for the pro- tein [129]. The high3 method uses a similar normalization approach, however only the three peptides of highest intensity are used for quantification [130]. It is assumed that the best ionizing peptides from different proteins should generate roughly the same intensities, wherefore these peptides should gen- erate more accurate data than including all peptides. The exponentially modified protein abundance index (emPAI) normalizes the number of iden- tified peptides by the number of theoretical peptides to determine absolute protein quantities [131]. In absolute protein expression (APEX), the num- ber of PSMs instead of number of identified peptides is used. This value is normalized by the number of expected peptides, after first estimating a detection probability for each peptide based on experimental data [132]. In a study comparing the performance of emPAI, APEX and T3PQ (similar principle as high3) using bovine Fetuin spiked into a yeast lysate, it was observed that the instensity-based method T3PQ showed the highest linear-

27 Mass spectrometry-based proteomics ity over the investigated dynamic range, whereas the emPAI signal reached saturation at a certain point. At a certain abundance level, no further pep- tides will be identified even though the abundance is increased, explaining the lower linearity for this method [133]. Another study compared the em- PAI, APEX and iBAQ methods for proteome-wide quantification of proteins within an E. coli lysate. Similar correlations were determined when com- paring the methods to one another, however the iBAQ method showed the lowest variation between biological replicates [134].

Comparing quantification methods

As mentioned earlier, the time point at which the samples are mixed (relative quantification) or the quantification standard is added (absolute quantifica- tion) can have an impact on the final result, as it can be assumed that errors are introduced in every step of the sample preparation. In SILAC the samples are mixed directly after cell lysis, whereas in iTRAQ, samples are handled separately until after protein digestion and peptide labeling. There- fore, for experiments where extensive sample preparation is performed on the protein level, SILAC would be a better alternative. Still, research has been presented where iTRAQ and TMT generated quantitative data with better overall accuracy compared to a 14N/15N metabolic labeling strategy, even though mixing was performed later in the sample preparation work- flow [135]. In addition, iTRAQ enables multiplexing of up to eight samples, whereas SILAC is usually performed in 2- or 3-plex. However, comparisons between different levels of iTRAQ and TMT multiplexing have showed that increased multiplexing lowers the sensitivity, leading to less identified pro- teins [136, 137]. In addition, not all samples can be analyzed with SILAC, as cells need to be grown in medium containing heavy isotope-labeled amino acids. Super SILAC can however be used to solve this problem. Interest- ingly, it has recently been shown that dimethyl labeling can perform equally compared to SILAC in terms of quantitative precision and could therefore be an alternative to super SILAC for protein quantification in tissues [138]. The same study also investigated TMT labeling and found that the quan- tification rate for this method was significantly higher than for SILAC and dimethyl labeling. Co-isolation during precursor isolation is however a prob- lem for methods where quantification is performed on the MS/MS level,

28 Mass spectrometry-based proteomics such as TMT and iTRAQ, and can affect the accuracy of the quantitative results. Using an MS3-based approach or only regarding PSMs with suffi- ciently low isolation interference has however been shown to decrease this problem [137,138]. In absolute quantification, AQUA peptides are added to the sample after digestion, whereas full-length proteins, protein fragments or QconCAT pro- teins can be spiked in at an earlier stage. However, if the standard is not identical to the endogenous protein, as for protein fragments or QconCAT standards care needs to be taken to ensure that the digestion efficiency is not altered. Moreover, even though full-length proteins and protein frag- ments are promising for absolute quantification, some peptides will generate inaccurate data due to differences in modification patterns if the standard protein was not produced in the same host as the endogenous protein. Label-free methods are beneficial regarding both time and cost, making this approach a good alternative for the analysis of large sample sets. However, since data between different MS runs is compared and both sample prepara- tion and MS acquisition can differ between samples, the accuracy is lower for these methods. This means that label-based methods can detect smaller dif- ferences between samples than what is possible with label-free quantification methods [120,139]. If an accurate estimation of protein abundance is desired, a label-free approach should therefore not be the method of choice, although when a large difference in protein abundance between samples is expected, a label-free quantification is a fast and simple alternative. It should however be noted that the natural biological variation between individuals can in some cases be relatively large, leading to high variability between biological replicates even for a very accurate and precise quantitative method.

Proteome coverage and sensitivity

Even though proteome-wide analysis using MS can detect thousands of pro- teins in a single run, achieving full coverage is today still not feasible. The yeast proteome contains around 6,000 proteins and of these more than 4,000 have been identified in MS analyses [140–143]. A lot has happened regard- ing instrumentation in the past years that has improved the potential for larger proteome coverage. In 2001, Washburn and coworkers managed to

29 Mass spectrometry-based proteomics identify almost 1,500 yeast proteins in 68 hours of analysis time [75], while earlier this year, Hebert and coworkers identified almost 4,000 yeast pro- teins in a little more than one hour [143]. There are roughly 20,000 genes in the human genome, however it is unlikely that all of these are expressed simultaneously within a cell. More than 10,000 proteins in total have been identified in human cell lines, indicating that the complexity of a human proteome lies at least around this value [10, 58, 73, 144]. This has been ac- complished using quite different setups with MS instrumentation time of roughly one day [10] to twelve days [58]. A typical MS experiment identifies between 5,000-8,000 proteins [71, 97, 129], however the number of proteins identified in one experiment depends on multiple factors, such as sample preparation and fractionation, LC setup, mass spectrometer instrumenta- tion and data analysis. Several projects have been initiated to make large amounts of MS-data available to the research community. The Peptide At- las [145] aims to achieve complete annotation of genomes of different species by providing verified proteomics data. In 2013 the Peptide Atlas consisted of data corresponding to 12,644 proteins, i.e. above 60% of the number of coding genes [146]. In addition, the recently published ProteomicsDB, contains searchable proteomics data based on almost 17,000 LC-MS/MS ex- periments from human cell lines, tissues and body fluids [40]. The dynamic range of proteins in human cells has been shown to span around seven or- ders of magnitude [144] and proteins spanning this concentration range have been detected when using iBAQ label-free quantification to analyze protein abundance in eleven human cell lines [10]. Targeted approaches such as MRM assays can be used to decrease or entirely exclude the need for sample fractionation and significantly lower the required MS analysis time [81]. However, if low abundant proteins are analyzed, frac- tionation is advisable even when using MRM, in order to avoid interfering molecules [84]. In targeted approaches, assay development is a major bottle- neck and therefore, this strategy is not suitable for whole-proteome analysis. In one study, Ebhardt and coworkers analyzed an unfractionated lysate from a U2-OS human cell line using a 35 min LC gradient. They managed to iden- tify more than 70% of the 52 targeted proteins with copy numbers per cell as low as 7,500 [147]. Even though this targeted approach could decrease the analysis time significantly, proteins of similar copy numbers have been

30 Mass spectrometry-based proteomics detected also with shotgun proteomics using fractionation and longer gra- dients [127]. One major application for MRM is to analyze more complex samples, such as plasma, for the detection of low abundant plasma pro- teins [82, 83, 148]. The protein abundance range in plasma spans ten orders of magnitude with some serum proteins, e.g. serum albumin, being present in concentrations up to 40 mg/mL [42]. These proteins often hinder the identification of lower abundant proteins in undepleted plasma making the analysis of plasma samples a great challenge. However, due to the large amount of information residing within this sample type and the relative ease of sample collection, plasma is very attractive for diagnostic purposes [42]. Plasma biomarker detection with the possibility to accurately diagnose pa- tients with a certain disease is of course very beneficial. However, regarding biomarkers a yes or no answer will not be sufficient and quantitative assays are of more use. Quantitative data has also been generated for proteins of low attomole levels or ng/mL concentrations [81, 82, 85, 149, 150], although for many low abundant plasma proteins this is still not sensitive enough and further method development is required [151]. However, with the rapid im- provement in mass spectrometry instrumentation regarding sensitivity, this should in the future not be impossible [152].

31 Mass spectrometry-based proteomics

32 Chapter 3

Affinity proteomics

History of the immunoassay

Affinity proteomics, the second branch of proteomics, makes use of affinity reagents such as antibodies to detect and quantify target proteins. In an im- munoassay setup, analytes within liquid samples such as blood or cell lysates can be detected by the specific recognition of an affinity molecule. Rosalyn Yalow and Solomon Berson developed the first immunoassay in 1960 [153]. It required a pure analyte labeled with a radioactive isotope atom and the detection and quantification of the analyte within the sample was enabled through competitive binding between the labeled and unlabeled analyte to an antibody. The method was given the name radioimmunoassay (RIA) and Yalow was in 1977 awarded the Nobel Prize in physiology or medicine for this work. In 1968, Laughton E.M. Miles and Charles Nicholas Hales introduced the immunoradiometric assay (IRMA) [154, 155]. This method resembles RIA, although the antibody instead of the analyte is labeled with the radioactive isotope, introducing several improvements. Firstly, in RIA, a small decrease in radioactivity is detected against a relatively large back- ground signal, which lowers the sensitivity. Secondly, when labeling the analyte it is possible that the interaction between analyte and antibody is af- fected, which could alter the equilibrium. A few years later, Peter Perlmann and Eva Engvall introduced the new important method enzyme-linked im- munosorbent assay (ELISA) [156]. Although similar to RIA and the IRMA,

33 Affinity proteomics

ELISA uses enzyme-coupled antibodies to generate a signal output. Using enzymes instead of radioactive isotopes has the advantage of higher stability, making labeled antibodies active and useful for a longer period of time. At the same time, Anton Schuurs and Bauke van Weemen presented the en- zyme immunoassay (EIA) [157], where, in contrast to ELISA, the enzyme is coupled to the analyte instead of the antibody. Today, ELISA is a widely used method for analysis and quantification of different molecules in com- plex samples [158] and exists in different formats, such as direct ELISA, indirect ELISA and sandwich ELISA (Figure 3.1). In addition, several other immunoassay setups exist, of which some will be discussed later in this chap- ter.

A B C

Figure 3.1: Different ELISA setups. (A) Direct ELISA where a labeled primary antibody is used for detection, (B) indirect ELISA where a secondary labeled antibody is added to the sample for signal output and (C) sandwich ELISA where two target-specific antibodies are needed for target detection.

Antibodies as affinity reagents

Even though several different types of affinity molecules are used in research and in the clinic, the predominant molecule by far is the antibody (Ab) or immunoglobulin (Ig) [159]. In nature, antibodies have a central role in our immune system, where they recognize and mark foreign objects such as bacteria or viruses for destruction [160]. Antibodies are large (150 kDa) proteins consisting of four subunits: two identical light chains and two iden- tical heavy chains, linked to each other through disulfide bonds forming a

34 Affinity proteomics

Y-shaped molecule (Figure 3.2A). Different isotypes of heavy chains result in the different Ig variants IgA, IgD, IgE, IgM and the most common IgG, all with different functions. Two different isotypes also exist for the light chains; λ and κ, however they have no known functional divergence. Three constant domains (CH 1, CH 2 and CH 3) and one variable domain (VH ) make up each heavy chain whereas the smaller light chains consist of one constant

(CL) and one variable domain (VL). The variable domains are located at the tips of the Y-shaped structure, where the outer part makes up the antigen- binding site with three complementary determining regions (CDRs). CDRs of different antibodies differ both in sequence and size and give antibodies their different specificities. The fragment crystallizable (Fc) consists of the stem of the antibody (CH 2 and CH 3) and is a constant region not involved in antigen binding. Instead, the Fc region binds to cell surface receptors of immune cells, in this way directing the immune system toward the invading object [6]. Antibodies are produced in B cells. In a process called somatic re- combination, different CDR gene segments are randomly combined resulting in a large diversity of antibodies with different specificities [160]. Generation of antibodies for research or diagnostic applications can be car- ried out in different ways, for example by immunization of a host animal, such as a rabbit or mouse, with a target antigen [160]. Antibodies can be purified from the sera of the host animal, for example using a protein A/G column for enrichment of all IgG molecules or preferably by using an antigen-coupled column, generating a polyclonal antibody mixture with IgG variants targeting the protein of interest. Since these antibodies are gener- ated from different B cells their specificities will not be identical and different antibodies will bind to different parts (epitopes) of the antigen. Polyclonal antibodies are not a renewable source and re-immunizing an animal with the same antigen will generate a polyclonal mix of antibodies with similar, but not identical epitope specificity [161]. An alternative approach is to iso- late individual B cells and immortalize them by fusion with a tumor cell, a technique that was invented by Georges J.F. Köhler and César Milstein in 1975 [162]. The resulting hybridoma cells can grow and produce antibodies indefinitely, secreting the product into the surrounding medium. Since all antibodies will be generated from the same B cell, these monoclonal anti- bodies will share the same epitope-specificity [163]. However, in order to

35 Affinity proteomics

find a monoclonal antibody with the right specificity, tedious screening is often required, as many of the clones will not show good enough affinity or specificity for the target antigen. It is also possible to mimic nature’s anti- body selection machinery and select high affinity binders in vitro using for example . Antibody selection using in vitro display systems offers a more controlled selection process, where the selection can continue until satisfactory binders have been obtained. However, smaller antibody fragments rather than full-length antibodies are mainly generated using this approach [164].

A CDRs B VH

VL

CH1

CL Fab

F(ab’) 2

CH2

VH

Fc VL

CH1 VH VL CH3 CL

Fab ScFv antibody

Figure 3.2: (A) Schematic structure of an IgG molecule and (B) some common antibody derivatives.

Depending on what kind of antigen is used in the immunization, antibodies of different epitope specificity will be obtained. If a peptide is used as antigen, antibodies recognizing linear epitopes are most likely generated. However, since small peptides can adopt multiple structures [165], it is likely that some of these antibodies will not be able to bind the structured, full-length target protein. Therefore, larger peptides or protein fragments are in most cases preferred. Larger antigens can generate antibodies targeting either lin- ear or structural epitopes and the probability that these antibodies will be able to detect structured target proteins is therefore increased. This type of antibodies are in many cases functional in assays where native proteins are to be detected, as for example in immunoenrichment experiments as will

36 Affinity proteomics be discussed in chapter 4. However, in experiments or western blot, where target proteins are denatured, antibodies targeting structural epitopes might not be functional. Hence, it is important to con- sider in what applications the antibodies are to be used before choosing a suitable antigen for antibody generation.

Alternative affinity reagents

In certain applications, the large size of antibodies can be a drawback. An- tibodies have a long half-life, resulting from both size and effector functions of the Fc fragment. This is disadvantageous for example in imaging applica- tions, where the molecule should be eliminated from the body shortly after the image is generated [166]. In addition, production using recombinant techniques in bacterial hosts, which is both economical and efficient [167], is difficult for antibodies due to their large size. To overcome these obsta- cles, several smaller antibody derivatives have been developed, as well as non-immunoglobulin based scaffolds [168, 169]. Common antibody deriva- tives include fragment antigen-binding (Fab) fragments, which consist of the light chain and VH and CH 1 from the heavy chain (Figure 3.2B). The enzyme papain cleaves the antibody just below CH 1 and CL, thus generat- ing two Fab fragments. Treating the antibodies with instead, which cleaves slightly below the cleavage site of papain, instead generates a F(ab’)2 fragment, consisting of two Fab fragments connected by a disulfide bridge. Another antibody derivative is the single-chain fragment variable (scfv) frag- ment, which consists of the VH and VL domains, connected by a linker. These antibody fragments all lack the Fc part and consequently do not pos- sess any effector functions but only act as affinity molecules. One non-immunoglobulin based scaffold is the Affibody molecule: a small, three helical bundle protein of 58 amino acids originating from one of the IgG- binding domains of staphylococcal protein A [170]. The original specificity of the protein is directed towards IgG, however by randomizing 13 positions situated in two of the helices involved in IgG binding, large libraries can be generated, from which proteins possessing new desired specificities can be se- lected. A similar protein is derived from an albumin-binding domain (ABD) of streptococcal protein G. This protein is similar in structure and size to

37 Affinity proteomics the Affibody molecule, however its original affinity is towards albumin. Ran- domizing positions involved in albumin binding has generated ABD variants with other specificities [171]. In addition, randomization has also been per- formed on the opposite side of the albumin-binding site, hence preserving this affinity and generating new bi-specific binders [172]. Other alternative scaffolds include designed ankyrin repeat proteins (DARPins), which are de- rived from the naturally occurring ankyrin proteins and consist of several repetitive units of two antiparallel α-helices [173]. Adnectins have a β-sheet fold similar to the IgG-fold, with the strands connected by six loops that are randomized for selection purposes [174]. Anticalins have a β-barrel structure with four connecting loops targeted for randomization [175]. Aptamers are short single-stranded DNA or RNA molecules that have emerged as alter- natives to protein affinity binders due to for example their high stability, straight-forward production and low immunogenicity [176].

The Human Protein Atlas

The Human Protein Atlas (HPA) project is a proteomics initiative with the aim to map the human proteome with antibody-based methods in a gene-centric manner and generate a resource of antibodies targeting all hu- man proteins [177]. The generated antibodies are thoroughly validated and are used to study protein expression in cells, healthy and cancer tissues and plasma. Since the launch in 2003, polyclonal antibodies targeting over 16,000 genes, corresponding to over 80% of the human genome, have been generated. Antigen selection is the first step of the workflow, where protein fragments, termed protein epitope signature tags (PrESTs) of 25-150 amino acids are de- signed [178] (Figure 3.3). The antigen selection is based on sequence homol- ogy towards other human proteins and sequences with low similarity to other genes are chosen to minimize the risk for cross-reactivity. In addition, trans- membrane regions and signal peptides are avoided. It would be desirable to choose protein sequences located at the protein surface, since this would increase the success rate of the antibodies in assays where native proteins are analyzed. However, since only a subset of the protein structures have cur- rently been solved, this is not a criterion in the antigen design process. After

38 Affinity proteomics

antigen production antibody PrEST design cloning and purication immunization purication

ABP PrEST 6 His

protein arrays western blot immunohistochemistry immuno€uorescence

Figure 3.3: Workflow of the HPA project. Antigen se- quences are chosen and cloned into an expression vector. Antigens are produced and characterized before rabbit immu- nization. Sera are purified to obtain a mixture of polyclonal PrEST-specific antibodies, which are thoroughly validated and used for protein analysis in immunohistochemistry and im- munofluorescence. reverse transcription of the selected gene fragments from RNA pools and insertion of the DNA fragment into an expression vector, the antigens are recombinantly produced using E. coli as expression host. Antigens are pro- duced in fusion with a hexahistidine (His6) tag as a purification handle and albumin binding protein (ABP) for increased solubility, in a high-throughput fashion [179]. With the present workflow, production of almost 300 proteins per week in batches of 72 proteins is possible. The cell cultivation is per- formed in 1 L shake flasks and the protein purification is performed using a fully automated liquid handling system. The purified proteins are validated using SDS-PAGE for purity estimation and MS for molecular weight verifi- cation with a success rate of over 80%. The following step is immunization of rabbits to generate polyclonal antibodies. Antibody-containing sera are pu-

39 Affinity proteomics rified in a two-step protocol in order to separate antibodies targeting the tag

(His6ABP) from PrEST-specific antibodies. In the first step, tag-specific antibodies are retained on a His6ABP column after which PrEST-specific antibodies can be captured on a column with immobilized PrEST [180]. Pu- rified antibodies are thereafter validated using several methods. Microarrays with 384 printed PrESTs are used to verify antibody specificity and western blot analysis using samples from two human cell lines, two human tissues and human plasma is used to verify specific binding of the antibody to the full- length protein in a complex matrix [181,182]. The antibodies are thereafter used to study protein expression in 48 normal human tissues, 20 cancer tis- sues and 47 different human cell lines using immunohistochemistry [183,184]. Subcellular localization is investigated using immunofluorescence and confo- cal microscopy analysis of three human cell lines [9,185]. Finally, all data is made publicly available on the HPA website (www.proteinatlas.org).

High-throughput production of heavy isotope-labeled protein frag- ments for mass spectrometry-based protein quantification

As discussed in chapter 2, quantification using MS is feasible but not straight- forward. For absolute quantification, heavy isotope-labeled proteins or pep- tides can be used as internal standards. Due to difficulties in producing large numbers of heavy isotope-labeled full-length proteins, synthetic pep- tides with incorporated heavy isotopes are most commonly used, although full-length proteins would be desirable due to the possibility to add the stan- dard at an earlier stage. Within the HPA project, PrEST protein fragments are used as antigens for antibody generation. The majority of all avail- able PrESTs generate at least one unique tryptic peptide and could there- fore be used as internal standards for MS-based absolute quantification. In their light form, PrESTs can be directly spiked into heavy isotope-labeled cell lysates to determine absolute protein copy numbers. However, since many interesting samples, such as plasma or tissue, cannot be labeled with heavy isotopes, heavy isotope-labeled standards are of great importance. A pipeline for the production of heavy isotope-labeled PrESTs has been set up [127], where a modified E. coli strain, auxotrophic for arginine and lysine is used for the PrEST production [186]. Lysine and arginine are supplied to a minimal culture medium in heavy isotope-labeled forms to generate heavy

40 Affinity proteomics isotope-labeled protein fragments. Lysis and purification is performed with the standard protocols used within the HPA project.

B light HisABPOneStrep heavy PrEST

A

Sequence used for quanti!cation of endogenous protein combine and digest

His 6 ABP PrEST heavy protein

His 6 ABP OneStrep light protein

Sequence used for PrEST quanti!cation MS analysis

Figure 3.4: Generation of heavy isotope-labeled PrEST standards for MS-based absolute quantification purposes. (A) PrEST quantification is performed using the quantification

standard His6ABPOneStrep. Peptides originating from the His6ABP region are used for accurate PrEST quantification. Peptides originating from the PrEST sequence can thereafter be used to quantify endogenous proteins in for example a cell lysate. (B) Workflow for PrEST quantification. PrEST and quantification standard are mixed and digested. Heavy to light

ratios from the His6ABP region are used to determine the PrEST concentration.

Within the ordinary pipeline, the concentration of purified PrESTs is mea- sured using bicinchoninic acid (BCA) assay, which is accurate enough for the immunization purposes of the regular, light PrESTs. However, the concentration of internal standards for absolute quantification needs to be

41 Affinity proteomics very accurately determined in order to obtain reliable quantitative data. Amino acid analysis, where proteins are subjected to complete hydrolysis generating free amino acids that can be quantified one by one, gives an accurate measure of protein abundance in a pure sample. However, this method is rather expensive making it disadvantageous for high-throughput purposes. Instead, PrEST quantification is performed using an MS-based setup (Figure 3.4), similar to for example the EtEP strategy, described by Holzmann and coworkers [187]. In EtEP, an N-terminal equalizer pep- tide (EP), added to the internal standard peptide during chemical synthe- sis, is used to determine the concentration of the peptide standard. The PrEST-based method instead uses a quantification standard consisting of the His6ABP tag and a strep tag as a second purification handle for this purpose. For quantification of heavy isotope-labeled PrESTs, an unlabeled version of the quantification standard is used and vice versa. The quantifi- cation standard, termed His6ABPOneStrep, is purified in a two-step setup using both immobilized metal ion affinity chromatography (IMAC) and pu- rification on a StrepTactin column. His6ABPOneStrep is further quantified using amino acid analysis. When mixing unlabeled His6ABPOneStrep and a heavy isotope-labeled PrEST and digesting them with trypsin, peptides orig- inating from the His6ABP tag will be present in both unlabeled and heavy isotope-labeled forms. Since the absolute quantity of the light peptides orig- inating from His6ABPOneStrep is known, the obtained heavy to light ratios can be used for determination of the absolute quantity of the heavy isotope- labeled PrEST. Accurately quantified PrESTs can then in turn be used as internal standards in MS-based absolute quantification setups where pep- tides from the PrEST region are compared to peptides from the correspond- ing endogenous protein in for example a cell lysate. The PrESTs commonly enable protein quantification using multiple peptides, although differences in sequence length and location within the gene of interest make different PrESTs more or less suitable for this application.

Immunoassays

The core of an immunoassay is the affinity binder, which could be an anti- body, an antibody fragment or an alternative scaffold molecule. Optimally this affinity binder should bind its target with both high affinity and speci-

42 Affinity proteomics

ficity and the binding event needs to be detectable. The affinity molecule can be directly coupled to a detectable label, such as a fluorescent dye or an enzyme. Other options are to use a secondary detection agent that carries the detectable label or instead label the antigen [11,36]. Labeling antibodies with DNA strands is another strategy that enables increased assay sensitivity due to the signal enhancement obtained after DNA amplification [188, 189]. Compared to MS, affinity proteomics has the advantage of very high sensitiv- ity with detection levels reaching sub-femtomolar levels [190–192]. However, one major issue for all antibody-based methods is the specificity as the accu- racy of the immunoassay is entirely dependent on the quality of the affinity reagent. The total signal output is converted to an abundance approximation of the analyte, with the assumption that nothing within the sample but the analyte is contributing to the signal intensity. In reality this is not always the case as proteins are likely to interact nonspecifically with both each other, surrounding molecules and surfaces. Since it is impossible to distinguish a signal generated from an antibody binding specifically to its target from a signal generated from unspecific binding or cross reactivity, antibodies need to be thoroughly validated before they can be used as affinity reagents in immunoassays. The issue of antibody cross-reactivity will be discussed later in this chapter.

planar array bead array

Figure 3.5: Different microarray formats.

ELISA is the golden standard of immunoassays with low detection levels, a wide dynamic range and good reproducibility. However, scaling up ELISA experiments for analysis of a large number of antibodies requires large sam- ple and reagent volumes. Immunoassays can instead be scaled up into a multiplexed microarray format to enable fast analysis of a large number of

43 Affinity proteomics analytes in small reaction volumes [193]. The concept of microarrays; a setup of orderly arranged spots, where each spot is a separate reaction chamber, was first described by Roger Ekins in 1989 [194]. Microarrays can be planar, where molecules are covalently attached to a glass or silicon slide, resem- bling a highly ordered system of ELISA assays. An alternative to planar arrays is bead-based microarrays, where molecules are instead attached to the surface of small micro-scale beads. This has the advantage of faster ki- netics compared to planar formats [193] (Figure 3.5). With the Luminex xMap technology, up to 500 different color-coded paramagnetic beads can be analyzed in parallel [195]. Each bead contains a precise ratio of two flu- orophores, making each bead type unique and possible to identify in a flow cytometer setup. Microarrays can be divided into analytical, functional and reversed phase protein microarrays (Figure 3.6). Analytical microarrays, also called protein capture arrays, is the most common setup and can be used to verify the presence of a protein in a sample, to compare protein abundance in different samples, for biomarker discovery and affinity measurements [196]. Here, well- characterized capture molecules such as antibodies are immobilized onto the solid support and a sample containing the analyte is thereafter added to the microarray. For detection, the sample can be labeled with for example biotin [197]. By subsequently adding a fluorophore-labeled streptavidin molecule, bound proteins are visualized. The small size of biotin minimizes the risk of interfering with antibody target binding and moreover, the tight interaction between biotin and streptavidin ensures that the majority of bound analyte molecules will be detected. In a sandwich setup, unlabeled sample containing the analyte is added to the array with immobilized capture antibodies and detection is enabled by addition of a labeled detection antibody [11]. The sandwich assay has the advantage of increased specificity, as two antibodies are required for a signal output. However, due to issues concerning cross- reactivity as will be discussed below, multiplexing can become problematic [198]. Functional protein microarrays, also called target protein arrays, can for ex- ample be used to study biochemical activities, protein interactions or for detection of autoantibodies [199]. Here, purified proteins are attached to the solid support, thus including the need to produce large numbers of re-

44 Affinity proteomics

analytical microarray

direct labeling

sandwich

functional microarray

reversed-phase microarray

Figure 3.6: Setup for analytical, functional and reversed- phase protein microarrays. combinant proteins. A labeled sample is added to the array and captured molecules are subsequently detected. Reversed-phase microarrays are a good option if scaling up the number of samples instead of number of target proteins is desired, as compared to analytical and functional arrays. In this approach, proteins from a complex sample, for example a cell lysate, are covalently attached to the solid support. A detection antibody is added to the microarray to detect the presence of a specific protein across many samples. This approach can for example be applied to investigate protein modifications or other protein alterations [196].

45 Affinity proteomics

Multiplexing an immunoassay setup is desirable but not completely straight- forward. Cross-reactivity becomes an issue when multiplexing sandwich as- says as will be discussed in the following section. In addition the dynamic range becomes problematic if target analytes have very different abundance and since assay parameters cannot be optimized for each protein, com- promises are necessary [200, 201]. Comparisons made between multiplexed immunoassays and classical ELISA assays however show that the results generally correlate well [202–205], even though multiplexed assays cannot usually match standard single-analyte immunoassays in terms of sensitiv- ity [193,203].

Cross-reactivity in immunoassays

False positive signals in immunoassays can be divided into sample-driven and reagent-driven cross-reactivity (CR). Sample-driven CR is the result of molecules in the sample reacting nonspecifically to one another or the solid support. This type of CR causes problems in direct detection assays, where all sample molecules are labeled and can generate a signal output (Figure 3.7). In a sandwich setup the sample-driven CR is decreased, since the specific binding of two separate antibodies is required for analyte de- tection [198]. However, scaling up multiplexed sandwich assays has proven to be difficult due to severe cross-reactivity issues and the upper limit of multiplexing is around 50-plex [198,200]. When applying a mix of secondary detection antibodies (dAbs), these antibodies can interact with each other, the capture antibodies (cAbs) printed on the solid support or any of the molecules within the applied sample captured on the array. In addition, molecules within the sample can interact with each other or nonspecifically with the cAbs. Pla-Roca and colleagues termed these unspecific interactions "liability pairs" and divided them into five different categories: cAb-antigen, antigen-antigen, dAb-cAb, dAb-antigen and dAb-dAb (Figure 3.7). In to- tal, the number of liability interactions is 4N(N-1), where N is the number of targets. This CR, termed reagent-driven CR, therefore increases rapidly with increased multiplexing [206]. In a multiplexed sandwich format, due to the addition of dAbs in a mixture, it is enough if one of the dAbs shows cross reactivity to other species in the sample for the whole assay to be affected [198].

46 Affinity proteomics

sample-driven cross-reactivity

reagent-driven cross-reactivity

cAb-antigen antigen-antigen dAb-cAb dAb-antigen dAb-dAb

Figure 3.7: Different types of CR in immunoassays. Sample-driven CR occurs from labeled proteins within the ap- plied sample binding nonspecifically to other proteins, antibod- ies or plastics. Reagent-driven CR may occur in multiplexed sandwich assays when a mixed pool of dAbs is added to the array. Unspecific interactions are in the figure represented by the orange molecules.

In direct labeling approaches, where only one antibody is needed, the ob- served CR will not depend on the level of multiplexing and therefore this setup is advisable for highly multiplexed experiments. Applying harsh wash- ing conditions, which will hopefully break unspecific interactions while re- taining stronger, specific interactions, is a strategy to decrease the CR. Using the sample labeling approach, multiplexed assays have been developed where over several hundred targets have been identified simultaneously with detec- tion limits in the ng/mL range [207]. Several strategies to evade the issue of reagent-driven CR have been proposed. In the antibody colocalization microarray (ACM), detection antibodies are not mixed, but are instead sep-

47 Affinity proteomics arately added to the specific spots on the microarray [206]. Another strategy is sequential multiplex analyte capturing (SMAC), where one sample is in- cubated with different batches of antibody-coupled beads in sequence. The beads are thereafter incubated with the corresponding dAb [208]. Proximity ligation assay (PLA) and proximity elongation assay (PEA) are two other examples of methods that can decrease reagent-driven CR. In PLA and PEA, two affinity reagents carrying complementary single-stranded DNA barcodes are needed for detection and upon simultaneous binding to the target pro- tein the DNA strands hybridize after which, qPCR is used for quantifica- tion [209–211]. CR is only a problem as long as signals from nonspecifically binding ana- lytes cannot be distinguished from signals arising from specific interactions between antibody and target. This is the case in immunoassays since the output is the total signal intensity from the microarray spot, well or bead. As discussed earlier, MS enables specific detection by amino acid sequence determination. The combination of immunoenrichment using antibodies and MS readout can therefore to some extent overcome the problems of CR since separate signals are obtained for different molecules and hence, the specificity of the affinity reagent becomes less crucial. Merging the fields of MS-based and affinity proteomics will be discussed in chapter 4.

48 Chapter 4

Bridging affinity proteomics with mass spectrometry

Benefits of combining immunoenrichment with mass spec- trometry

In MS analysis, where highly accurate protein identification is possible, sen- sitivity can be a limiting factor in the analysis of low abundant proteins. High abundant proteins are often troublesome, as they tend to quench sig- nals from lower abundant molecules. Targeted MS approaches can diminish this problem, however target peptides then need to be chosen beforehand and non-targeted peptides will never be detected. Depleting the sample from high abundant proteins before analysis is also a strategy. However, this can lead to the loss of interesting proteins that interact with the de- pleted species. Researchers within the MS field are constantly struggling to increase the sensitivity, especially since it is often low abundant proteins that are interesting from a clinical perspective [212]. Affinity proteomics on the other hand struggles with specificity issues, as discussed above. If the affinity agent is cross-reactive, this can hamper the whole assay and careful validation of the specificity of affinity reagents is therefore crucial in order to generate accurate results. In other words, MS, which generates very specific data, has trouble with

49 Bridging affinity proteomics with mass spectrometry sensitivity, and affinity proteomics, where very high sensitivity has been achieved, suffers from inaccuracies due to potential antibody cross reactiv- ity. Combining the two techniques in a setup where a molecule of interest is first captured from a sample using an affinity reagent and thereafter an- alyzed using MS for verification of the target identity, brings together the advantages of both technologies and an assay with both high specificity and increased sensitivity is obtained.

Immunoenrichment setups

Immunoenrichment can be performed using different setups, depending on the application and the affinity reagent. An antibody targeting a structural epitope should be used for immunoenrichment of a full-length, structured protein. For an anti-peptide antibody, immunoenrichment from a tryptic digest would be preferred, if the epitope does not contain an arginine or lysine residue. In this case a proteolytic enzyme with different specificity could be used. For quantification purposes using isotope-labeled standards, the immunoenrichment should preferably be performed after addition of the internal standard, since the recovery after the antibody capture step will not be complete and depends on the antibody affinity. This is not a problem if the spike-in standard is present during the immunoenrichment since the relative portion of bound protein or peptide will be equal for the standard and the endogenous molecule. If the enrichment is instead performed before the spike-in, a portion of the target molecule will be lost at this stage and it will be more difficult to obtain an accurate estimation of protein abun- dance. The affinity of the capture reagent towards its target would in this case need to be determined beforehand to enable an approximation of the total amount of protein or peptide present in the sample from the start. If, however, a full-length protein is used as quantification standard, immunoen- richment can be performed either on protein or peptide level. Performing the enrichment experiment on the protein level would possibly generate data from multiple peptides using only one capture reagent. To generate the same amount of data from an experiment performed on the peptide level, different anti-peptide antibodies would be needed, making immunoenrichment on the protein level advantageous in this case. However, since AQUA peptides are, as mentioned in chapter 2, the most common strategy for absolute quantifica-

50 Bridging affinity proteomics with mass spectrometry tion, immunoenrichment on the peptide level commonly applied for protein quantification purposes.

Peptide enrichment coupled to mass spectrometry

Enrichment of tryptic peptides with subsequent MS readout has become pop- ular for quantification of proteins in complex samples, primarily plasma. The stable isotope standards and capture by anti-peptide antibodies (SISCAPA) technology, developed by Anderson and coworkers, is the most widespread strategy for this purpose [76] (Figure 4.1). The SISCAPA protocol con- sists of four steps: (1) tryptic digestion of the sample, (2) addition of heavy isotope-labeled peptide standards, (3) immunoenrichment using anti-peptide antibodies and (4) peptide quantification using MS. The peptide capture step can result in an enrichment factor of above 1000 [213, 214], increasing the detection limits to the lower ng/mL range for targeted MRM assays in plasma [214–216] and saliva [217]. Detection limits as low as pg/mL have also been reported, however large plasma volumes (1 mL) were used in this case [218]. The SISCAPA assay has been multiplexed to enable simultane- ous detection of up to 50 targets, with retained quantification accuracy as compared to single-plex assays [219]. SISCAPA quantification is generally performed using liquid chromatography coupled to MRM MS, however other setups have also been tried. For exam- ple SISCAPA enrichment and MALDI MS quantification showed promising results when six different peptides were quantified in human plasma with coefficients of variation below 4% [220]. The simplicity, robustness and pos- sibility of high-throughput measurements using MALDI makes it a very at- tractive platform for analyzing clinical samples, however obtaining quantita- tive precision and reproducibility has proven to be more difficult when using MALDI MS, compared to ESI [221]. SISCAPA enrichment has also been performed in combination with an LC-free MRM setup for peptide quan- tification, where the cycle-time was reduced 300-fold, although with slightly higher coefficients of variation [222]. Another method combining peptide immunoenrichment with protein quan- tification using MS is iMALDI (immuno-MALDI), developed by Borchers and colleagues [223–225] (Figure 4.1). The workflow is similar to SISCAPA

51 Bridging affinity proteomics with mass spectrometry

SISCAPA work!ow intensity

digestion m/z ESI-MS MALDI-MS protein mixture peptide mixture

intensity heavy isotope-labeled standard

m/z MALDI-MS iMALDI work!ow

Figure 4.1: Peptide enrichment coupled to MS quantifica- tion. Proteins are digested and heavy isotope-labeled peptides are added to the digested sample before immunoenrichment using anti-peptide antibodies. Peptides are further analyzed using ESI-MS (SISCAPA) or MALDI-MS (iMALDI). in that peptides from a trypsin-digested sample are captured using anti- peptide antibodies and analyzed using MS. However, in iMALDI, as the name implies, analysis is performed using MALDI MS. The iMALDI proto- col is simple as beads containing bound peptides are directly spotted onto a MALDI target, hence eliminating the need for peptide elution. Studies using the iMALDI methodology have been focused towards disease diagnosis, for example through the analysis of peptides from Francisella Tularensis for the diagnosis of tularemia [225]. The cancer biomarker EGFR has been analyzed by enrichment of EGFR peptides from various cancer cell lines and tumor biopsies and EGFR could be detected in a sample corresponding to less than ten EGFR-expressing breast cancer cells [224]. Furthermore, iMALDI has also been applied in the diagnosis of hypertension by analysis of the peptides angiotensin I and II in plasma [226–228]. One limitation of the iMALDI tech- nology is the inability to multiplex the analysis to more than 5-10 targets, as MALDI cannot be coupled on-line to a liquid chromatography system to fractionate peptides [220]. The iMALDI setup has however been used for

52 Bridging affinity proteomics with mass spectrometry simultaneous capture and quantification of two peptides (angiotensin I and II) in plasma, with a limit of detection in the pg/mL range [227]. Apart from that these setups can be easily combined with quantification using AQUA peptides, one advantage of using peptide immunoenrichment is that a peptide sample is generally easier to handle than a protein sam- ple, due to degradation, unfolding and solubility issues often associated with handling of full-length proteins. The wide variation of physicochemical prop- erties of proteins also makes optimization of experimental conditions easier for enrichment using peptide samples [229]. In addition, production of full- length proteins is quite troublesome compared to the expression of protein fragments or generation of synthetic peptides [230], wherefore most avail- able antibodies are generated towards a peptide or protein fragment and are therefore likely to be functional in peptide enrichment setups.

Protein enrichment coupled to mass spectrometry

Due to the reasons mentioned above, protein immunoenrichment can be difficult, however the increased protein sequence coverage acquired using en- richment on the protein level, as opposed the peptide level, makes this an attractive alternative. In addition, performing the immunoenrichment prior to trypsin cleavage is advantageous from an economical perspective, espe- cially when samples of high protein content are analyzed, such as plasma or serum. The sample complexity is after immunoenrichment significantly decreased, lowering the required amount of costly enzyme to a fraction of the quantity needed to digest the original sample. Moreover, protein im- munoenrichment can generate information regarding protein complexes and interaction networks, which is information that is completely lost when per- forming immunoenrichment at the peptide level. The mass spectrometric immunoassay (MSIA) was developed already in 1995 by Nelson and colleagues [231] and since then, several studies using this technology have been described [232–234]. MSIA differs from SISCAPA and iMALDI as full-length proteins instead of peptides are captured in the im- munoenrichment step (Figure 4.2). In MSIA, the sample is incubated with antibody-coated beads in a pipette-tip format and subsequently eluted with MALDI matrix prior to MALDI MS readout. In the first publication, two

53 Bridging affinity proteomics with mass spectrometry toxins were analyzed in human blood. Here, full-length toxins were cap- tured using two different antibodies and were then analyzed using MALDI MS without enzymatic digestion. Further development of the system has led to a fully automated setup [235] and the MSIA approach has also been used with following trypsin digestion along with quantification using heavy isotope-labeled spike-in peptide standards and targeted MRM MS analy- sis [236, 237]. Several different setups have been developed using full-length internal standards for protein quantification. For example, rat and pig pro- teins have been used as internal standards to determine the concentration of the corresponding human protein. The molecular weights of the homologs are not identical, enabling differentiation between the variants in the MS spectrum. For this setup to be functional, an antibody with cross-species specificity is required [238, 239]. Unrelated proteins have also been used as standards, however this setup requires separate antibodies for capture of target protein and internal standard [240, 241]. Limits of quantification in the range of ng/mL have been achieved both with and without subsequent trypsin digestion [236, 241]. As discussed above, immunoenrichment of full- length proteins has the advantage that protein isoforms and modified protein variants can be distinguished from one another and MSIA has been used to identify different variants of for example the protein cystatin C [240] and transthyretin [241]. Another application of protein immunoenrichment coupled to MS is to verify the identity of captured proteins in bead-based immunoassays. After iden- tifying differences in signals between diseased and control plasma samples, it is crucial to verify that the difference in signal intensity originates from differences in abundance of the target protein between the two samples and not unspecific binding to antibodies or beads. This has been performed to verify the specific capture of C2, C8 [70] and carnosine dipeptidase 1 [242] in plasma. A large amount of information can be obtained from immunoenrichment of protein complexes in combination with MS readout. Different setups can be applied, the most straightforward being to use a target-specific antibody to capture the protein complexes involving the target protein [243–245] (Fig- ure 4.3). Usually cell lines are used for this purpose and by applying a very mild lysis strategy, protein interactions may be maintained. However,

54 Bridging affinity proteomics with mass spectrometry

intensity

time MRM MS elution and digestion

elution

intensity

m/z MALDI MS

Figure 4.2: Workflow for the MSIA method. Protein immu- noenrichment is performed in a pipette-tip format and eluted proteins are either digested and analyzed using MRM MS or proteins are directly analyzed using MALDI MS. the fact that enrichment conditions cannot be standardized for all antibod- ies in combination with the limitation that validated antibodies do not ex- ist for all targets hinders this approach [246]. Instead, cells expressing a tagged version of the target protein (the bait) can be used in combination with anti-tag antibodies to capture the components of protein complexes (the prey) [52, 246–248]. One common strategy is tagging proteins with the tandem affinity purification (TAP) tag, which enables purification in two steps, both under native conditions. The tag consists of a calmodulin- binding peptide and a Protein A IgG-binding domain separated by a TEV linker [249, 250]. First, proteins are purified on an IgG column and bound proteins are eluted by TEV cleavage to leave the Protein A domain on the column. The sample is thereafter subjected to a calmodulin column, from which proteins are eluted by the addition of EGTA. Other common tags are the FLAG tag and the GFP tag, that both bind to corresponding tag-specific antibodies [251–253]. Although these tagging approaches enable large-scale analysis and eliminate the problem with variation in antibody affinity and specificity, there are some issues with tagging endogenous proteins. For ex-

55 Bridging affinity proteomics with mass spectrometry ample, verifying that the tag does not affect the protein in terms of expres- sion, subcellular localization and interaction characteristics can be difficult. Altered protein expression could change the equilibrium within the cell and thereby alter the composition of the studied protein complex. In human cells, bacterial artificial chromosome (BAC) transgenomics is typically used to generate clones with tagged bait proteins, which generates a stable ex- pression of the tagged protein [254]. In order to preserve weak interactions, very mild washing conditions are usu- ally applied, which also increases the amount of nonspecifically interacting proteins that remain bound to the antibodies or the beads. Consequently it is of great importance to include good controls in the analysis in order to distinguish specific interaction partners from nonspecifically interacting proteins. Databases of contaminating proteins usually seen in immunoen- richment experiments, such as bead proteomes [255], protein frequency li- braries (PFL) [256] and the CRAPome [257] can be helpful in identifying true interactors. Furthermore, SILAC-based methods can be used to study protein interac- tions, where the bait protein is tagged in either the unlabeled or labeled sample. Specifically interacting proteins will be enriched only in the sam- ple and not in the control, resulting in high or low heavy to light ratios. Nonspecifically interacting proteins on the other hand will be enriched to a similar extent in both sample and control and the heavy to light ratios for these proteins will therefore be close to one [258, 259]. Mixing the samples before immunoenrichment can however lead to exchange of dynamic inter- action partners between the two samples, which may lead to false-negative results with specific interaction partners generating ratios close to one. To circumvent this problem, immunoenrichment can instead be performed be- fore mixing the samples [260]. Experiments involving three different SILAC labeling states have also been performed to make it possible to distinguish between constitutive and signal-dependent interaction partners [261]. An- other example is QUICK (quantitative immunoprecipitation combined with knockdown), where SILAC labeling is combined with RNA interference. In either the labeled or unlabeled sample the target protein has been knocked down. This leads to large differences in intensities for peptides originating from the knocked down protein and its interaction partners, whereas non-

56 Bridging affinity proteomics with mass spectrometry

target-speci c Abs tag-speci c Abs tag-speci c Abs label-free MS label-free MS SILAC labeling

intensity intensity intensity intensity intensity

m/z m/z m/z m/z m/z intensity intensity intensity intensity intensity

m/z m/z m/z m/z m/z intensity intensity intensity intensity intensity

m/z m/z m/z m/z m/z

Figure 4.3: Strategies for analysis of protein interactions using immunoenrichment coupled to MS. Immunoenrichment using (left) target-specific antibodies and label-free MS quan- tification, (middle) tag-specific antibodies and label-free MS quantification and (right) a SILAC-based approach where the target protein is tagged only in the light sample. In all cases specific interaction partners (blue and pink) will be present only in the sample and not in the control, whereas nonspecifi- cally interacting proteins (grey) will be present in both sample and control.

57 Bridging affinity proteomics with mass spectrometry specific binders will show ratios close to one. In this strategy no tagging of the bait protein is performed and the method hence makes use of anti-target instead of anti-tag antibodies [243].

Immunoenrichment of protein or peptide groups

Proteins can be modified in a number of different ways and different mod- ifications can for example regulate protein activity or localization [39]. In many cases, the analysis of protein modifications can be far more informative than determining the protein concentration. However, due to the generally lower abundance of modified peptides versus their unmodified counterparts, analysis and identification of PTMs is challenging. Antibodies are therefore commonly used to enrich groups of peptides or proteins carrying a certain modification, although the specificity of these antibodies is usually lower than for target-specific antibodies [262]. Immunoenrichment of phosphorylated peptides [77, 263–265] and proteins [95,264,266] is feasible with antibodies recognizing phospho- residues, which have shown higher specificity than antibodies targeting phosphory- lated and [267]. Ubiquitination can be studied with antibod- ies specific for mono- or poly-ubiquitinated proteins [268–271]. Upon tryptic digestion of ubiquitinated proteins, a glycine-glycine remnant remains at the ubiquitination site and peptides carrying this modification can be captured using an antibody specific to this site [272,273]. Antibodies specific for differ- ent ubiquitin chain conformations have also been generated [274,275], which could generate a very detailed view of the ubiquitinated proteome. In addi- tion to phosphorylation and ubiquitination, antibodies have been used in im- munoenrichment setups to analyze for example lysine acetylation [276,277], methylation [278, 279] and nitration [280,281]. Another possibility is to use antibodies specific for certain sequence mo- tifs, hence targeting multiple proteins. Wingren and coworkers developed the global proteome survey (GPS), in which antibodies targeting short C- terminal motifs of four to six amino acids are used to enable a proteome- wide analysis using only a few hundred antibodies instead of one antibody per target protein [282]. The antibodies are called context-independent mo- tif specific (CIMS) antibodies and are selected from a large (2x1010) scfv

58 Bridging affinity proteomics with mass spectrometry library. A similar approach is the Triple X Proteomics (TXP) technology, developed by Poetz and colleagues in which, polyclonal affinity-purified an- tibodies are used to target short C- or N-terminal regions of three to five amino acids [283,284].

59 Bridging affinity proteomics with mass spectrometry

60 Chapter 5

Miniaturization

Miniaturization in proteomics

Miniaturization of experimental setups generating micro total analysis sys- tems (µTAS) or lab-on-a-chip devices, is becoming more and more popular due to advantages such as shorter analysis times, lower reagent consump- tion and the possibility for high-throughput and automated analysis [285]. These platforms usually have dimensions in the micrometer range and handle sample volumes of a few microliters down to picoliters [286]. This is a rela- tively new field with the first analytical microsystem (a gas chromatograph) developed in 1979 by Stephen Terry [287]. When an object is down-scaled isomorphically, the length, area and volume ratios are altered, which has effects on certain physical properties. In the book "Fundamentals of Microfabrication: The Science of Miniaturization", Marc Madou compares the downscaling of analytical assays to differences in properties of small and large animals. For example, the heat loss of a living creature is proportional to its surface area and is therefore very different for an elephant compared to a small pygme shrew. The pygme shrew must eat constantly or otherwise freeze to death, a non-existing problem for the large elephant [288]. The same can be said about downscaling analytical systems. The larger surface area to volume ratio for miniaturized analysis platforms affects for example evaporation characteristics [289]. In addition, in miniaturized systems, surface and interfacial tension have a larger impact

61 Miniaturization as compared to macro scale setups, where gravity is the more dominant force [290]. Scaling down a reaction chamber will in general not change the reaction kinetics, although a common strategy for protein and peptide separation is to use bead-based systems, where the locally high analyte concentration on the beads can alter the kinetics of the reaction. The density of the immobilized molecule can in such a setup reach higher levels than would be possible in solution due to precipitation issues. Moreover, miniaturization will increase the reaction rate. This can be explained by the fact that more time will be required for two molecules present in a large sample volume to come in contact with one another than the time required in a much smaller volume. A reaction requiring 1 h in a standard microtiter plate well can in a miniaturized device with a 50 µm reaction chamber take a few seconds [285]. The diffusion of molecules in a macro-scale system such as a test tube or a well in a microtiter plate is negligible and mixing is performed by convection (hand-shaking or using a vortex) [291]. However, as diffusion time is decreased with decreased size of the reaction chamber, mixing by diffusion only is usually enough in a miniaturized system [292]. In addition, miniaturization reduces both reagent and sample volumes, and by that also waste volumes, resulting in a decreased cost per assay. If a molecule is present in a sample at a very low concentration, the volume needed to detect the molecule might be larger than what can be used in the micro-device. However, depending on the setup of the miniaturized device, a large sample volume can in many cases be used in combination with a very small sample elution volume using for example solid phase extraction, resulting in a large enrichment factor and increased analyte concentration in the eluate [293]. Miniaturization also facilitates automation and integra- tion of analytical steps and further enables the use of portable, single-use platforms that is very beneficial in the clinic [294]. Despite these many ad- vantages, miniaturized analysis systems have not yet become as widespread in for example clinical applications as was anticipated a couple of years ago and Sackmann reported earlier this year than the majority of microfluidics articles are still published in engineering journals [290]. The difficulty to integrate these methods into a standard laboratory setting and the trouble for non-experts to use the methods could partly explain this [295]. However,

62 Miniaturization the field is still growing rapidly, which is promising for the future. In MS-based proteomics, as discussed in chapter 2, proteolytic protein di- gestion is very often applied prior to MS detection. Digestion using a prote- olytic enzyme can be incorporated into a miniaturized analysis system. The enzyme can then for example be immobilized onto beads after which, the sample is passed through the chromatographic resin or alternatively, the di- gestion reaction can be performed using a droplet-based system where each droplet acts as a unique reaction compartment. With these small volumes, the digestion time can be down-scaled from overnight incubation in a stan- dard setup in a test tube to a couple of minutes [296–299]. Miniaturized analytical devices have been developed that integrate different sample preparation steps with mass spectrometry readout [293,296,300,301], one of them being the ISET platform, discussed below.

Figure 5.1: The ISET platform. An array of nanovials in the ISET chip can be filled with chromatographic matrix for enrichment of target molecules. Molecules are eluted to the backside of the chip that serves as a MALDI target.

63 Miniaturization

The ISET platform

The integrated selective enrichment target (ISET) platform integrates sam- ple enrichment, proteolytic digestion and MALDI MS readout in a single chip. ISET sample preparation is bead-based and theoretically, any type of bead (with the right dimensions) can be used in the ISET platform in order to enrich for proteins or peptides of interest. The ISET device is a small (54x39 or 44x39 mm) silicon chip with 96 or 48 nanovials that can be filled with the chromatographic matrix of choice [302]. At the bottom of each nanovial there is a membrane with 3x3 outlet holes of 4,500 or 8,100 µm2. By attaching the device to a vacuum manifold, liquid can easily be pulled through the vials at a desired speed. On-bead protein digestion can be per- formed directly in the nanovials. Before elution the applied vacuum pressure is lowered, which will cause the elution liquid to form a droplet attached to the backside of the chip. This side of the chip acts as a MALDI target and after crystallization of eluted sample together with a MALDI matrix the chip can be directly inserted into a MALDI mass spectrometer for analysis (Figure 5.1). The integrated format of the device minimizes sample loss, since no sample transfers are needed between sample application and data generation. A typical ISET workflow is shown in Figure 5.2. The simplicity of the device and the dimensions of the vials makes it possible to integrate the device with robotics and sample handling platforms. In addition, the possibility to choose between a large collection of chromatographic resins makes the ISET platform exceptionally versatile.

1. Load beads 3. Sample binding 5. Tryptic digestion 6. Elution 7. MALDI analysis 2. Wash 4. Wash

Figure 5.2: A typical workflow for the ISET platform.

64 Miniaturization

The ISET chip was first described in 2004 by Ekström and coworkers [303]. Initially, the application of the chip was as a sample cleanup device where C18 beads were used for peptide desalting [303–305] and the ISET plat- form showed superior performance compared to two other available peptide desalting methods; ZipTips and MassPREP PROtarget [304]. Further de- velopment of the ISET chip has been performed to increase the conductivity of the chip [306] and to optimize the nanovial outlet design [302]. Protein digestion has also been integrated into the ISET workflow [307] and so far, the ISET platform has been used for peptide desalting with reversed phase matrix [303–306], antibody-based enrichment to detect prostate-specific anti- gen from seminal plasma [307] and detection of thrombin in human serum by capture on aptamer-functionalized beads [308].

65 Miniaturization

66 Chapter 6

Present Investigation

Summary

I have now tried to summarize the field in which I have been working for the past couple of years. I hope that it has become clear why studying our proteins is so essential, both to reach a deeper understanding about how we function but also in order to understand disease progression and for the de- velopment of drugs for treatment of disease. This chapter will summarize the research that I have been working on. The main focus has been on develop- ment of high-throughput methods for protein analysis and quantification. I have worked both with production and analysis of recombinant proteins that can be used in proteomics research (papers I and II), but also methods for the analysis of endogenous proteins in human cell lines (papers III-V). The projects that I will present have all been performed within the HPA project, described in chapter 3. This is a large-scale proteomics project with great resources both regarding instrumentation, but also human knowledge and manpower, which for a graduate student of course is extremely valu- able. In the presented projects, both polyclonal antibodies generated within the project, but also monoclonal antibodies generated by the spin-off com- pany Atlas Antibodies have been used. We have also taken advantage of the antigens generated within the project. These protein epitope signature tags (PrESTs) have been produced with heavy isotope-labeled amino acids, making it possible to use them as internal standards for MS-based absolute

67 Present Investigation quantification in different setups. The different projects will be described in the following sections. In paper I, the high-throughput protein production setup used within the HPA project was scaled down to a 96-well plate format. The ISET plate, described in chapter 5, was in paper II used to further scale down the IMAC purifica- tion to a nL-scale format. PrEST antigens were in paper III produced with heavy isotope-labeled amino acids and used as internal standards for deter- mination of protein copy numbers in six different cell lines using MS. The copy numbers were further compared to mRNA levels determined by RNA sequencing. Paper IV describes how antibodies generated within the HPA project and Atlas Antibodies were used in immunoenrichment coupled to mass spectrometry to investigate whether the antibodies can bind to their native target in a complex sample and also to determine their specificity. The immunoenrichment was in paper IV performed on the protein level, however in paper V, antibodies were used to capture tryptic peptides from a digested cell lysate. In this paper, PrESTs were also used as internal standards, which were spiked into the sample directly after cell lysis to enable absolute protein quantification.

Development of screening methods for recombinant protein production (papers I and II)

Proteins are diverse molecules with varying properties, wherefore protein production can be performed in very different ways. Optimization of the production parameters can be performed in order to increase the protein yield. However, in large-scale protein production projects, where hundreds of proteins are produced every week [179], optimization is not possible in the same way. Instead, a protocol that maximizes the overall success rate is desired. Within the HPA project, the success rate over the antigen produc- tion step is above 80%, although since the protein verification is performed as the last step in the production pipeline, a large amount of time and effort is spent on proteins that in the end will not be of use. It would therefore be very beneficial to be able to exclude these proteins at an early stage in the process. One possible solution is to include a small-scale screening step before the

68 Present Investigation large-scale protein production [309–311]. In this way, non-expressing or im- pure proteins can be excluded at an early stage, although it is of great importance that the different setups generate corresponding outcomes. In paper I, a µL-scale method for protein production screening was developed. The results were subsequently compared to data obtained from the HPA protein production pipeline. Protein concentration, purity and molecular weight verification were compared between the methods (Figure 6.1).

A intensity

success

SDS-PAGE m/z IMAC MALDI MS fail Cultivation fail fail

B intensity

success

Cultivation IMAC m/z SDS-PAGE MALDI MS fail fail fail

Figure 6.1: Workflow for (A) the standard protein produc- tion and validation pipeline and (B) the small-scale screening setup.

E. coli cells expressing recombinant proteins were cultivated in 96-well plates using only 150 µL of culture medium. To enable cultivation in this small for- R mat, a fed-batch-like system for cell cultivation called the EnBase culture technology was applied [312]. During cultivation in a glucose-rich medium, cells will initially grow very fast, resulting in an accumulation of acetate in the medium, a process known as overflow metabolism. In a fed-batch sys- tem, glucose is constantly applied to the medium in a manageable amount, which leads to a more controlled cell growth and finally, to higher cell den- R sities [313]. In the EnBase system, glucose is supplied to the cells by

69 Present Investigation constant, enzymatic degradation of starch in the culture medium, which resembles a fed-batch setup. One advantage with this system compared to conventional fed-batch cultivation is the simplicity, as starch and enzyme are simply added at the start of the cultivation. This setup makes cultivation in small volumes possible, which is not applicable for standard fed-batch R systems. In paper I, the EnBase technology was first compared to the standard HPA cultivation method. Shake flasks with 100 mL tryptic soy R broth medium were used for the standard cultivation and EnBase culti- vation was performed in deep well plates (DWP) with 3 mL medium. The R advantage of the EnBase technology was apparent as 96% of the proteins showed increased yields when comparing the obtained amount of protein after purification in relation to cultivation volume. A 21 times maximum increase in protein yield per mL culture volume was observed, with an av- R erage of 3.3 times increased yield for the EnBase cultivation (Figure 6.2). R The significant increase in produced protein for the EnBase system made it possible to downscale the cultivation to 150 µL, enabling cultivation in a 96-well plate format. Protein purification was also successfully performed in a 96-well format. IMAC was used for this purpose, which is also the purification method used in the standard HPA workflow, however the amount of matrix used was downscaled from 1 mL to 12.5 µL in the screening method (80 times). The purification was performed in a filter plate, which enabled efficient and fast purification with the aid of a vacuum manifold. In total, 96 PrESTs were analyzed in triplicate using the two methods. In the standard HPA workflow, validation after protein production and purifi- cation includes SDS-PAGE analysis for determination of protein purity and MS analysis of full-length proteins for molecular weight verification. Pro- teins can be failed in each of these steps. For the small-scale filter plate format, samples were at first only analyzed using MS. This led to an agree- ment between the methods of 91% (87 out of 96 PrESTs). The majority of the proteins (n=82) were produced successfully in both setups, whereas a few proteins (n=5) showed no expression using either of the methods. Six of the proteins that did not have coinciding results were failed in the standard setup due to impurities seen on the SDS-PAGE, however these contaminants were not observed in the small-scale setup where no electrophoretic analy-

70 Present Investigation

400 ● EnBase DWP ● shake flask 300

200

100 µg protein/mL culture media 0 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 51 53 55 57 59 61 63 65 67 69 71 73 75 77 79 81 83 85 87 89 91 93 95 PrEST (1−96)

Figure 6.2: Comparison of standard protein production in R shake flasks with EnBase cultivation in DWPs. The amount of protein obtained per mL culture volume in the standard R cultivation and the EnBase DWP cultivation is shown for all 96 proteins. sis was performed. When including this step in the small-scale setup, the agreement increased to 95% (91 out of 96 PrESTs). These highly coinciding results verify that this small-scale screening method can be used to accu- rately predict the outcome from protein production and purification using a large-scale setup. This could reduce both the time and cost spent on proteins that will not be possible to produce under standardized conditions. In paper II, the protein purification setup was downscaled further and the aim was to develop a method for recombinant protein verification using the ISET platform, discussed in chapter 5 (Figure 6.3). The integrated for- mat makes this platform very suitable for automation and in this paper, two robotic systems made it possible to automate all steps until MALDI analysis.

A method was developed in which the His6-tagged PrESTs present within a bacterial lysate from the HPA production pipeline were first retained in the ISET vials by Co2+ coated beads. This type of beads had not earlier been

71 Present Investigation

intensity A

ISET puri!cation and digestion m/z MALDI MS intensity Cultivation B

Digestion Spotting and crystallization m/z IMAC MALDI MS

Figure 6.3: Workflow for (A) the ISET protein screening setup and (B) the standard HPA protein purification workflow with the addition of a tryptic digestion step. In both setups protein production was performed in 1 L shake flasks. used in combination with the ISET platform and hence this investigation shows the possibility to integrate the ISET platform with commonly used laboratory methods. All steps were performed on a vacuum manifold, which enabled efficient liquid transfer through the vials. After washing to remove unbound proteins, PrESTs were digested directly in the vials of the ISET chip. In this step, the vacuum was turned off to prevent enzyme and peptides from leaving the vials. Due to the small vial volumes, efficient digestion was performed in only 1 h. The resulting peptides could be eluted by addition of MALDI matrix and applying a low vacuum pressure, which enabled for- mation of droplets on the backside of the ISET chip. After crystallization of the droplets, the chip was inserted into a MALDI mass spectrometer and the peptides were analyzed using both PMF and MS/MS. A total of 45 proteins were included in the analysis and they were all analyzed in triplicate using both the new ISET protein verification workflow and the standard protein production and purification setup used within the HPA project. To enable a more direct comparison between the two methods, a tryptic digestion step was performed on the samples from the standard HPA workflow. PMF resulted in identification of 42 out of the 45 PrESTs for the ISET setup. The success rate for MS/MS analysis was slightly lower with 41 successfully identified proteins. Three PrESTs were not identified with either PMF or MS/MS using the ISET method. When determining the sequence coverage,

72 Present Investigation

20 ● ISET ● standard setup 15

10

5 identified unique peptides

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 PrEST (1−45)

Figure 6.4: Comparison of number of identified unique pep- tides for the ISET screening method and the standard HPA setup using PMF. peptides covering the entire protein sequence were included. However, since the PrESTs all contain a common His6ABP tag, as discussed in chapter 3, detection of a unique peptide originating from the PrEST part was required for verification (Figure 6.4). One of the unidentified PrESTs (PrEST 17) did not contain any tryptic peptides in the chosen mass range and therefore, identification of this protein was theoretically not possible. One of the re- maining two PrESTs (PrEST 40) showed very low expression, resulting in a low protein concentration, explaining why this protein could not be identi- fied. Results obtained using the ISET setup were compared to results from the standard HPA workflow. After in solution digestion and MALDI PMF and MS/MS, 39 out of 45 PrESTs were successfully identified. Even though fewer proteins were identified after protein purification and digestion using the standard method, higher average sequence coverage was obtained. PMF resulted in an average sequence coverage of 68% and 78% for the ISET and standard methods respectively (Figure 6.5) and the corresponding numbers for MS/MS analysis were 54% and 64%. The ISET method was also com- pared to the setup usually applied in the HPA project with MS analysis of full-length proteins. Here, 40 proteins were identified and hence, the ISET setup could increase the success rate in this step.

73 Present Investigation

120 ● ISET ● 100 standard setup

80

60

40

20 sequence coverage PMF (%) sequence coverage 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 PrEST (1−45)

Figure 6.5: Comparison of degree of sequence coverage (%) for the ISET screening method and the standard HPA setup using PMF.

Apart from comparing sequence coverage and success rate between the two methods, an important aspect is the total time needed for analysis and es- pecially the required hands-on time. For analysis of 48 samples using the ISET method, the time required is approximately 4 h excluding MS analysis. This includes only 30 min hands-on time, which is needed for preparation of the robotic systems. In comparison, the time needed for the standard method is approximately 25 h, where a large part (around 16 h) is due to the overnight tryptic digestion. The difference in hands-on time is smaller since the standard method uses an automated liquid handling system for protein purification, however approximately 1-1.5 hours is needed for man- ual sample handling. A big advantage with the ISET method is the low sample requirement. Only 6 µL bacterial lysate was used, which is a neg- ligible fraction of the total lysate volume after cell cultivation (0.1%). The ISET workflow could therefore offer a possibility of protein verification before large-scale protein purification. Since the IMAC matrix is rather expensive, an initial screening for protein identity in a small scale could be very advan- tageous.

74 Present Investigation

Absolute MS-based protein quantification to study the corre- lation between protein and mRNA levels (paper III)

As discussed in chapter 1, proteins are produced from mRNA in a process called . Methods for large-scale mRNA analysis, called RNA se- quencing have been developed and generation of large amounts of RNA data is today feasible [314, 315]. However, large-scale protein analysis is not as straightforward and getting a complete view of a proteome is today very difficult, if not impossible. The analysis of mRNA in order to gain knowl- edge about protein function is therefore a tempting approach. By analyzing which mRNA transcripts are present in high and low levels, some conclusions can be drawn regarding which proteins are expressed in what cell types and at what levels. The studies that have been performed comparing mRNA and protein levels are numerous and the results have been varying. Some have claimed that no correlation exists between mRNA and protein levels and other studies have shown correlations as high as 0.9 (Spearman’s rank coefficient) [40, 47, 48, 58, 316]. A trend seems to be that more recent stud- ies usually report higher correlations than do older studies, indicating that instrumentation and technology has had an impact in this area. In recent studies MS is usually the method of choice for protein quantification and strategies such as SILAC labeling [48] and label-free approaches [40,58] have been used for this purpose. In paper III, another approach, PrEST-SILAC, was applied in which heavy isotope-labeled protein fragments are used as in- ternal standards for absolute protein quantification. The protein fragments are the antigens (PrESTs) developed within the HPA project, which can be used directly as they are (without isotopic labels) as internal standards in mass spectrometry for quantification of proteins in heavy isotope-labeled cell lines. However, PrESTs can also be produced with heavy isotope-labeled arginine and lysine residues in order to generate "heavy" PrESTs that can be used for quantification of proteins in unlabeled samples [127], as described in chapter 3. In total, 57 PrESTs corresponding to 32 genes were here chosen for analysis in six cell lines (A-549, HEK 293, HeLa, Hep-G2, RT-4 and U-2 OS). The protein fragments were produced with heavy isotope-labeled arginine and lysine residues and the proteins were quantified in the different cell lines using the PrEST-SILAC setup (Figure 6.6). PrESTs were spiked into the

75 Present Investigation

light cell line heavy PrEST

protein extraction

mix

digestion

fractionation

MS analysis intensity

m/z MS spectrum

Figure 6.6: Workflow for the PrEST-SILAC method. Heavy PrESTs are spiked into a cell lysate. After digestion and fractionation the sample is analyzed in a mass spectrometer. Heavy to light ratios for peptides corresponding to the added PrEST sequences are used for absolute quantification of the endogenous proteins. cell lysates directly after cell lysis, minimizing the errors introduced due to sample handling. Samples were digested and fractionated into six portions using strong anion exchange chromatography in a pipette tip format prior to MS analysis to determine the absolute protein concentration. The resulting protein copy numbers were further compared to mRNA data obtained from RNA sequencing.

76 Present Investigation

1e+08 Spearman correlation R = 0.46 1e+07

1e+06

1e+05

1e+04 protein copy number per cell number protein copy 1e−02 1e+00 1e+02 1e+04 FPKM (RNA sequencing)

Figure 6.7: Comparison of protein and mRNA levels for 32 genes in six different cell lines. A Spearman’s rank coefficient of 0.46 (p=7e−10) was determined.

Protein quantification was performed using triplicate samples and a median coefficient of variation of 17.2% was obtained across all cell lines. Quantifica- tion was performed using between one and 18 peptides, with an average value of 3.8 quantified peptides per protein. To further validate the accuracy of the method, unlabeled (light) PrESTs were also used as internal standards. For this experiment, three of the cell lines (A-549, HEK 293 and HeLa) were cultured in medium containing heavy isotope-labeled amino acids. The resulting copy numbers were compared and the Pearson correlation was de- termined to 0.76. Protein copy numbers determined in the six cell lines were subsequently compared to mRNA data obtained by RNA sequencing. First, copy numbers from all proteins in all six cell lines were compared to the corresponding RNA data. The correlation was determined to 0.46 (Spearman’s rank coefficient, p=7−10) (Figure 6.7), similar to previously published results [47, 49]. This rather low observed agreement was expected as several tightly controlled processes are involved in protein expression. Different genes are regulated in different ways and to different extent, and regulation can occur both at

77 Present Investigation the transcriptional and the translational stage. For this reason investigating correlations on the individual gene level would be more accurate.

ANXA1 CD81 DECR1 1e+08 1e+08 1e+08

● A−549 ● A−549 A−549 HEK 293 ● ● HEK 293 HEK 293 ● 1e+07 1e+07 ● 1e+07 HeLa ● HeLa ● HeLa ● Hep−G2 Hep−G2 ● ● Hep−G2 ● 1e+06 RT−4 1e+06 RT−4 ● 1e+06 RT−4 ● U−2OS U−2OS U−2OS

1e+05 1e+05 1e+05

1e+04 1e+04 1e+04 Spearman’s rank coecient = 0.90 Spearman’s rank coecient = 0.90 Spearman’s rank coecient = 1.0

protein copy number per cell number protein copy 1e+03 p = 0.083 per cell number protein copy 1e+03 p = 0.083 per cell number protein copy 1e+03 p = 0.083 1e−02 1e−01 1e+00 1e+01 1e+02 1e+03 1e−02 1e−01 1e+00 1e+01 1e+02 1e+03 1e−02 1e−01 1e+00 1e+01 1e+02 1e+03 FPKM (RNA sequencing) FPKM (RNA sequencing) FPKM (RNA sequencing)

MSN PTPN1 RRBP1 1e+08 1e+08 1e+08 A−549 A−549 A−549 HEK 293 ● HEK 293 HEK 293 ● 1e+07 1e+07 1e+07 ● HeLa HeLa HeLa ● ● ● ●● ● ● Hep−G2 Hep−G2 ● Hep−G2 ● ● ● 1e+06 RT−4 1e+06 RT−4 1e+06 RT−4 ● ● U−2OS ● U−2OS U−2OS

1e+05 1e+05 1e+05

1e+04 Spearman’s rank coecient = 0.89 1e+04 Spearman’s rank coecient = 0.83 1e+04 Spearman’s rank coecient = 0.94

protein copy number per cell number protein copy p = 0.033 per cell number protein copy p = 0.058 per cell number protein copy p = 0.017 1e+03 1e+03 1e+03 1e−02 1e−01 1e+00 1e+01 1e+02 1e+03 1e−02 1e−01 1e+00 1e+01 1e+02 1e+03 1e−02 1e−01 1e+00 1e+01 1e+02 1e+03 FPKM (RNA sequencing) FPKM (RNA sequencing) FPKM (RNA sequencing)

Figure 6.8: Comparison of protein and mRNA levels for individual genes.

When comparing protein and mRNA abundance for every individual gene across the six cell lines an increased correlation was observed compared to the correlation for the whole gene set. For many genes, similar patterns in mRNA and protein levels could be observed across the cell lines. This was true both for genes with variation in the mRNA level across the cell lines and for genes where the levels were constant. For six genes a significant Spearman coefficient of correlation (p<0.1) could be determined. In order to be able to analyze the whole gene set in this way, data from more cell lines is needed. All six analyzed genes showed correlations above 0.8, the average correlation being 0.9 (Figure 6.8). It is clear that there are differences in gene regulation between the studied

78 Present Investigation genes. In several cases, mRNA levels changed drastically with very small changes in protein abundance. For these genes, factors such as protein half- life or inhibitory RNA could for example play a role in the regulation of protein abundance. This data shows that protein and mRNA correlation can actually be very high, although due to differences in gene regulation, comparisons of large gene sets usually show lower overall correlations. To further validate this theory, more data is needed. Quantification of a larger number of proteins would be interesting to get a larger picture of mRNA and protein correlation, but expansion of the cell line panel would also be necessary in order to get enough data for each gene for statistical comparisons to be significant.

Antibody validation using immunoenrichment coupled to mass spectrometry (paper IV)

As described in chapter 4, one strategy to analyze proteins in complex sam- ples is by coupling an immunoenrichment step to MS readout. This enables specific identification but also, in many cases, analysis of low abundant pro- teins, since the immunoenrichment both lowers the complexity of the sam- ple significantly and also leads to increased protein or peptide concentra- tion. However, since the availability of good, validated antibodies is limited and different antibodies generally do not behave in the same manner, high- throughput methods are rare [244, 245]. If analyzing protein interactions, it is of primary importance that the specificity of the antibody is known, in order to be able to distinguish true interaction partners from proteins that either bind nonspecifically to the antibody or proteins for which the antibody shows specific cross-reactivity. Finding good controls can therefore be problematic when dealing with target-specific antibodies, as compared to tag-specific antibodies. In paper IV, it was investigated whether antibodies generated within the HPA project could be used as capture agents in an immunoenrichment setup. The HPA antibodies are thoroughly validated, however the methods used are mostly based on detection of denatured proteins. Antibody cross reactiv- ity is determined using PrEST arrays, where antibody binding towards 384 PrESTs is analyzed. Western blot (WB) analysis is further used to investi-

79 Present Investigation

Bead coupling Immunoenrichment

MS sample prep MS analysis and data evaluation intensity

m/z

Figure 6.9: Workflow for immunoenrichment using HPA antibodies coupled to MS readout. Antibodies are covalently attached to protein A-coupled beads and the beads are subse- quently incubated with a cell lysate. After wash, elution and tryptic digestion, peptides are analyzed in a mass spectrome- ter and proteins interacting with the antibodies are identified. gate binding to endogenous proteins in cell lines, tissues and plasma. In this study, a total of 30 antibodies were chosen for analysis. Twenty of these were polyclonal antibodies generated within the HPA project and the remaining ten antibodies were monoclonal antibodies generated by Atlas Antibodies. The glioblastoma cell line U251 was used for the analysis and antibodies were chosen that generated a band corresponding to the correct molecular weight in WB. To ensure that the antibodies were positioned correctly for antigen binding, antibodies were covalently attached to protein A-coated beads. The immunoenrichment was performed in a 96-well filter plate format, which en- abled parallel handling of up to 96 samples. By attaching the filter plate to a vacuum device, liquid was easily drawn through the wells. The same system was used as in paper I. After incubation of beads and lysate, several washing steps were performed in order to minimize the amount of unspe- cific binding to the beads, antibodies and walls of the wells. Proteins were eluted from the beads, digested with trypsin and analyzed using LC-MS/MS

80 Present Investigation

(Figure 6.9). The procedure was performed in triplicate reusing the same antibody-coupled beads for each replicate. When analyzing the peptide in- tensities of the detected proteins no drop in intensity over the replicates was observed, showing that the beads are reusable for at least three times. Out of the 30 analyzed antibodies, 16 could capture their corresponding target in at least two out of three replicates. Seven of these were monoclonal and nine were polyclonal antibodies, resulting in a success rate of 70% and 45% for the monoclonal and polyclonal antibodies respectively. There are several reasons that could explain why a portion of the antibodies could not capture their target protein in this setup. The epitope could be located at the interior of the protein, which makes it accessible to the antibody only if the protein is in its denatured state. WB analysis showed that there is a clear difference between antibody recognition of a denatured protein sample compared to a sample with proteins in their native state. It is also possible that the protein is inaccessible to the antibody for other reasons. A protein involved in a large protein complex might not be able to interact with the antibody in its native state due to sterical hindrance, however under denaturing conditions where protein interactions are lost, the protein will be more accessible to the antibody. Another potential reason for antibodies not capturing their target protein in detectable amounts is that the protein is expressed at a very low level in this cell type or not expressed at all. Proteins not detected by their corresponding antibody generally had lower mRNA levels in the U251 cells based on RNA sequencing. There is also a possibility that the antibody antigen interaction is weak and that the protein therefore is released from the antibody during the washing steps. Label free quantification (LFQ) was applied in order to determine antibody specificity. Specifically enriched proteins were identified by performing t tests where peptide LFQ intensities from samples were compared to corre- sponding intensities from a control. Proteins with a more than sixteen-fold increase in intensity between sample and control and a p-value below 0.01 were considered significant. The monoclonal antibodies only enriched their target protein, whereas different contaminants and interaction partners were identified in the majority of the samples from the polyclonal antibodies. Ex- amples are shown in Figure 6.10. Here, intensity ratio between sample and control is shown on the x axis and −log10(p-value) on the y axis. Proteins

81 Present Investigation

mAb1 ( target: RBM3 ) mAb6 ( target: STX7 ) 8 ● RBM3 ● ●

● 4 5 6

● ● ● ● ● ● ● ● ● ● STX7 4 6 ● ● ● ● ● ● ● ● ● ● ● ●

−log10(p−value) ● −log10(p−value) ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●●● ● ● ● ●● ●● ● ● ● ● ● ● ● 2 ● ● ● ●● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●●● ●● ● ● ● ● ●● ●● ● 1 2 3 ● ●● ● ● ● ● ●● ● ● ● ● ●● ●● ● ●● ●● ● ● ● ●●● ● ●●● ● ● ● ● ● ● ●●●●●●● ● ● ● ●● ● ● ● ●● ● ●● ● ● ● ●● ● ● ● ●●● ● ● ● ● ●●● ● ● ● ● ● ● ●●● ● ● ●● ● ● ● ●●●●●● ●●●● ● ● ● ●● ●● ● ●● ● ● ● ●● ● ●●● ●●●● ● ● ●●● ●●●●● ●● ● ● ●●●●● ● ●●●●●● ● ● ●● ●●●● ●● ●●●● ●● ● ●●●●●● ●●●● ●● ●●● ●● ●●●●●●●● ●●●●● ●● ● ● ●●●●● ●●● ●●●● ●●●● ● ● ● ●●● ● ●●●●●● ●●●● ● ● ●●●●●● ●●●● ● ● ●●●●●●●●●● ●●●● ● ● ●●●●●●●●●●●●●●●●● ●●●● ● ● ● ●●●●●●●●●●●●●●●●●●● ● ● ●●●●●●● ●●●●●●●●● ● ●●●●●●●●● ●●●● ● ●●●●●●●● ●●●●●●●●●● ● ● ● ●●●●●●●●●●●●●● ● ● ●● ●●●● ●●●●●●●●●●● ●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●● ●●●●●●●●●●● ● ●●●●●●●● ●●●● 0 0

−6 −4 −2 0 2 4 6 −10 −5 0 5 10 log2(ratio mAb1 /control) log2(ratio mAb6 /control)

pAb5 ( target: CHCHD3 ) pAb15 ( target: NSRP1 ) 7 8

● ATPIF1 ● ●

4 5 6 ● CHCHD6 NSRP1 ● ● ● ● ● ● ● CSRP2 ● 4 6 ● ● ● ● ● ● ● ● ● CSRP1 ● ● −log10(p−value) ● ● ● ● −log10(p−value) ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ●

● ● 2 ● ●● ●● CHCHD3 ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ● ●● ●● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ● ● ● ●● 1 2 3 ● ●● ● ● ● ● ● ●● ● ● ●● ● ● ● ● ● ● ●●● ●● ● ● ●●● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●●● ● ●● ● ●● ●● ● ●●● ● ●●● ● ●●● ● ● ●● ●●● ● ●●● ●●●● ● ● ●●●●● ● ● ●● ● ● ● ●●● ● ●●●● ● ●● ● ●●● ● ●●●●● ● ●●●● ● ●●●●●● ● ●●● ● ● ●●● ●●● ● ●● ●●● ● ● ● ●●● ● ● ● ● ● ● ● ●●●●● ●● ●●● ● ● ●● ●●●●●●●●●● ● ● ● ●● ● ●●●●●● ● ●●● ●●●●●●●● ●●●●●● ● ● ● ●●●●●●● ●●●●●● ● ● ●●● ●●●●● ●●●●●●●● ● ●●●●●●●●●●●● ●●●●●●●● ● ●●●●●● ●●●●●●●● ● ● ●●●●●●● ●●●● ● ● ● ● ●●●●●●●●●●●●●●●●●● ● ●●●●●●●●●●●● ● ● ●●●●●●●●●●●●● ● ●●●●●●●●●●●●● ● ●●●●●●●●●●●●●● ●● ● ● ●●●●●●●●●●●●● ●●●●●●●●●●● ●●●●●●● ●●●●●●●●●●● ● ●●● ●●●●● 0 0

−6 −4 −2 0 2 4 6 −10 −5 0 5 10 log2(ratio pAb5 /control) log2(ratio pAb15 /control)

Figure 6.10: Identification of target proteins after immu- noenrichment coupled to MS. Ratios from logarithmized LFQ intensities are shown on the x-axis and the negative logarithms of the corresponding p-values on the y-axis. Grey lines indi-

cate threshold values of log2(ratio)=4 and p=0.01. Signifi- cantly enriched proteins are located in the upper right cor- ner. Color-coding: green = target protein, black = common contaminant, blue = probable interaction partner or antibody cross reactivity target. specifically enriched by the antibody will therefore be present in the upper right section, whereas contaminants with no significant difference in inten- sity between sample and control will appear around the center. Monoclonal antibody 1 (mAb1) targets RBM3, which was clearly enriched in the cor- responding samples. The same is true for mAb6, which targets the protein STX7. Polyclonal antibody 5 (pAb5) targets the protein CHCHD3. This protein was identified in the corresponding samples, however ended up just

82 Present Investigation below the significance level. The antibody significantly enriched the pro- tein CHCHD6, which is known to interact with CHCHD3 and might explain why it was detected [317, 318]. When comparing the sequences of these two proteins, a stretch of amino acids within the antigen region showed rela- tively high sequence similarity, wherefore antibody cross reactivity could be another possible explanation. The target of pAb15 is NSRP1, which was sig- nificantly enriched in the corresponding samples. Three other proteins were also enriched, however these proteins were present in a large portion of the samples, indicating that they are contaminants interacting nonspecifically with the antibody, beads or plastics. In total, LFQ data was obtained for 14 of the target proteins and twelve of these were significantly enriched in the corresponding sample. In the remaining two samples, the target proteins were identified just below the significance threshold. In summary, roughly 50% of the analyzed antibodies were functional in the developed immunoenrichment setup. This indicates that a very large number of the over 21,000 antibodies generated within the HPA project, should be possible to use in immunoenrichment of native proteins coupled to MS read- out. These antibodies could for example be used to study protein abundance in different cells and tissues or to investigate protein interactions.

Protein quantification using immunoenrichment and mass spec- trometry (paper V)

Immunoenrichment can be performed both at the protein and the peptide level, as described in chapter 4. For quantification purposes, enrichment at the peptide level is preferred, due to the straightforward incorporation of AQUA peptides for absolute quantification, as in the SISCAPA tech- nology [76]. However, incomplete digestion can hamper the accuracy of quantification based on AQUA peptides, as they are spiked into the sam- ple after the proteolytic digestion [122]. In paper V, PrESTs were used as internal standards, similarly as in paper III. However this strategy, named immuno-SILAC, also includes an immunoenrichment step where HPA anti- bodies are used to capture tryptic peptides from a digested cell lysate (Figure 6.11). First it was investigated whether the HPA antibodies were able to bind

83 Present Investigation

light cell line heavy PrEST

protein extraction

mix

digestion

immunoenrichment

MS analysis intensity

m/z MS spectrum

Figure 6.11: Workflow for protein quantification using immuno-SILAC. Heavy PrESTs are spiked into a cell lysate and the sample is digested. Antibodies are used to enrich both heavy and light versions of tryptic peptides and the eluate is analyzed in a mass spectrometer. Heavy to light ratios are used for absolute quantification of the endogenous proteins. to tryptic peptides even though they were raised towards larger protein fragments. Epitope mapping of 941 HPA antibodies was performed using high-density arrays with 12-mer synthetic peptides covering the 941 antigen sequences with one amino acid shift. The number of linear epitopes for each antibody ranged between zero and ten with on average 2.9 linear epitopes per antibody. This should give an indication of whether the antibody is suitable

84 Present Investigation for peptide immunoenrichment, however several other factors also affect the outcome. For example, epitopes containing a lysine or arginine residue will not be functional due to destruction of the epitope during tryptic cleavage. Also, some epitopes might be longer than twelve amino acids, making them undetectable on the peptide array. A subset of 150 antibodies were chosen in order to investigate to what extent the antibodies can be used in peptide immunoenrichment and the corresponding PrEST was available for 127 of these. Antibodies were incubated with protein A coated magnetic beads in pools of around 50 antibodies. As little as 50 ng per antibody was required. A PrEST mix was digested with trypsin before incubation with the bead pools. After wash and elution, the samples were subjected to MS analysis. Due to the lowered complexity of the sample after immunoenrichment, a 15 min LC gradient was sufficient for peptide separation. This can be compared to the setup in paper III, where six fractions were each separated using a 3 h gradient. In total, 57 out of the analyzed 127 antibodies successfully cap- tured one or several tryptic peptides corresponding to the target PrEST. The conclusion from this experiment was that roughly half of the antibodies rec- ognize at least one tryptic peptide, making the large pool of HPA antibodies a valuable resource for immunoenrichment also on the peptide level. It was thereafter investigated whether the antibodies could be used to gen- erate quantitative data by immunoenrichment of a mixture of endogenous peptides and heavy isotope-labeled internal standard peptides using the immuno-SILAC setup. A new set of 41 PrESTs were produced with heavy isotope-labeled arginine and lysine and were spiked into a HeLa cell lysate before tryptic digestion. Corresponding antibodies were thereafter used to capture the peptides from the cell lysate. After MS analysis, peptides corre- sponding to 22 of the PrESTs were identified. These PrESTs in turn corre- sponded to 20 different proteins, resulting in a success rate of roughly 50%, also in this experiment. The quantification of the 20 proteins was in parallel performed using the PrEST-SILAC method, described in Zeiler et al [127] and in paper III. The number of peptides used for quantification was higher for PrEST-SILAC, which was expected since no specific peptides were en- riched (Figure 6.12). In immuno-SILAC, 59% of the proteins were quantified with only one peptide, whereas in PrEST-SILAC the corresponding number was 6%.

85 Present Investigation

CAPG BLVRB ANXA3 (RatioH:L) (RatioH:L) (RatioH:L)

2 2 2 g g −2 0 2 4 lo log lo −4 −2 0 2 4 −4 −4 −2 0 2 4

0 20 40 60 80 0 20 40 60 0 20 40 60 80 100 120 140 HPRR2370009 [83aa] HPRR3210149 [73aa] HPRR2050268 [141aa] SERPINB6 CLPP DAP3 4 2 (Ratio H:L) (Ratio H:L) (Ratio H:L)

2 2 2 g lo log log −4 −2 0 2 4 −4 −2 0 2 4 −4 −2 0

0 50 100 150 0 20 40 60 80 100 120 0 20 40 60 80 100 120 HPRR1440008 [145aa] HPRR1440011 [122aa] HPRR1420037 [135aa] ERLIN1 PTPN1 DECR1 4 2 0 0 2 4 (Ratio H:L) (Ratio H:L) (Ratio H:L)

2 2 2 g log −2 0 2 4 −2 log lo −4 −2 −4 −4

0 20 40 60 80 0 20 40 60 80 100 120 140 0 10 20 30 40 50 60 70 HPRR1950257 [91aa] HPRR1952306 [139aa] HPRR2540115 [71aa]

Figure 6.12: Quantitative data from the immuno-SILAC and PrEST-SILAC setups. Heavy to light peptide ratios are shown for nine example proteins. Quantitative data generated with immuno-SILAC (yellow) and PrEST-SILAC (blue) are compared. The median peptide ratios from the PrEST-SILAC quantification are shown by the dotted lines.

It has been shown previously that the digestion efficiency between endoge- nous proteins and PrESTs is similar [127], which makes it possible to include miscleaved peptides that are otherwise usually removed from the dataset. This trend was also seen in these results. For example, several miscle- aved peptides corresponding to the protein DAP3 were identified using the immuno-SILAC setup. These all show similar heavy to light ratios where- fore they can actually provide additional accurate quantitative information (Figure 6.12). The chosen targets were all moderate to high abundant in the

86 Present Investigation

1e+08 Pearson correlation R2=0.91 ● 1e+07 ●

● ● ● ● ● ● ● ●● 1e+06 ● ● ● ● ● ● ● 1e+05 ● ●

1e+04 protein copy number per cell (immuno−SILAC) number protein copy 1e+04 1e+05 1e+06 1e+07 1e+08

protein copy number per cell (PrEST−SILAC)

Figure 6.13: Comparison of protein copy numbers gen- erated with immuno-SILAC and PrEST-SILAC. Data from immuno-SILAC is shown on the y-axis and PrEST-SILAC data on the x-axis. The Pearson correlation was determined to 0.91.

HeLa cell line, wherefore in this case proteins were in most cases identified without immunoenrichment. However, for low abundant targets enrichment strategies such as immuno-SILAC could be necessary. Regarding variation in quantitative data between replicate samples, immuno-SILAC generated slightly increased coefficients of variation compared to PrEST-SILAC rang- ing between 10-40% for the majority of the proteins. This can be due to the lower number of peptides included in the quantification, where a large num- ber of quantified peptides will generally lead to a more accurate result. When comparing the copy numbers per cell determined using the two methods, a good correlation of 0.91 could however be observed (Figure 6.13). Paper V shows that HPA antibodies can successfully be used for peptide im- munoenrichment, however a screening step is required in order to determine whether the antibody recognizes tryptic peptides and also to determine the number of peptides that can be captured by the antibody. This setup could be used for rapid quantification of proteins in cell lines but also potentially for quantification of lower abundant plasma proteins.

87 Present Investigation

Concluding remarks

Proteomics is a growing area of research. Not so surprising one could think; it is hard not to be intrigued by the challenge of investigating our most im- portant building blocks. The constant improvement of MS instrumentation is most likely a great contributor to the rapid expansion of the field and with the current development speed, an increase in sensitivity of around 70- fold has been anticipated within the coming decade [152]. Along with the technological development within the field, numerous innovative methods for protein analysis have emerged. MS, affinity reagents and the combina- tion of these are the dominant players within proteomics and are used in laboratories worldwide. The papers presented in this thesis are all focused toward high-throughput protein analysis. In papers I and II, screening methods for high-throughput verification of protein products were developed in different formats. These screening methods could decrease both the time and effort spent on pro- tein production and purification. In paper II, the ISET platform was used in a new setting, for enrichment of His6-tagged proteins, hence increasing the possible fields of application for this platform. The PrESTs generated within the HPA project were in papers III and V used as internal standards for MS-based absolute protein quantification. A high-throughput system for production of heavy isotope-labeled protein fragments has been set up, en- abling production of 100 protein fragments in a single batch. Recently, the possibility to downscale the production from a shake flask format to a deep well plate format has been investigated, which would enable feasible, paral- lel production of several hundred heavy isotope-labeled PrEST proteins. In paper V, a peptide enrichment step was also included where HPA antibodies were used to capture tryptic peptides resulting in a significant decrease in sample complexity. In addition, the enrichment step could possibly enable analysis of low abundant proteins in for example plasma, although this has yet to be investigated. For purposes such as the analysis of protein com- plexes, enrichment at the protein level is necessary. The HPA antibodies were in paper IV used in a high-throughput protein enrichment setup and it was shown that a large portion of the antibodies could capture the corre- sponding native full-length protein from a complex sample. This could be of great value in for example investigating protein interactions but also for

88 Present Investigation determining antibody specificity. One future goal is to be able to combine the setups presented in papers IV and V, using antibodies for protein and peptide enrichment, with the ISET platform used in paper II. This would result in a miniaturized setup for pro- tein and peptide analysis that could potentially be of value in the clinic. By combining antibodies targeting a panel of selected biomarkers, indicative for a specific disease in vials of the ISET plate, information regarding pa- tient disease state could be generated. This could be enabled using minimal amounts of both reagents and patient sample and also in a minimal amount of time. Moreover, the protein immunoenrichment protocol presented in pa- per IV could be modified to enable investigation of protein complexes. This should be done by optimizing the washing steps in order to preserve also weaker interactions. Another aim is to further develop the application of PrESTs as internal stan- dards in MS-based absolute quantification setups. Additional optimization of the PrEST production and validation process and development of stream- lined protocols for protein quantification could enable a robust system for accurate quantification of proteins in a high-throughput fashion. In paper III, 32 proteins were quantified using this setup, however the limit of multi- plexing probably lies far beyond this and hopefully PrEST protein standards can in the near future be used for parallel quantification of several hundreds or thousands of proteins. In summary, the work presented in this thesis includes the development and application of high-throughput methods for protein analysis and quantifi- cation. These methods can in the future be applied to determine absolute protein copy numbers in cells and hopefully also tissue and plasma. By using antibodies, it is possible that also lower abundant proteins can be detected and quantified and protein enrichment strategies enables the investigation of protein complexes. It is my hope that the presented investigations can be of value for the proteomics research field and that other researchers can benefit from the results.

89 Present Investigation

90 Chapter 7

Populärvetenskaplig sammanfattning

Tillägnad min fantastiska familj

Den här avhandlingen handlar om proteiner och hur man på olika sätt kan studera dem. Proteiner finns i alla våra celler och har där livsviktiga uppgifter. Proteiner ser till att cellerna håller ihop, de ser till att syre trans- porteras genom blodet och de ser till att vi kan ta kål på bakterier och virus som kommer in i kroppen. Bara för att nämna några exempel. Proteinerna produceras i en process som kallas translation och det är gener i vårt DNA som bestämmer hur ett protein ska se ut. Människans DNA blev kartlagt 2001 och idag vet vi alltså vilka gener vi har och därigenom vilka proteiner som kan produceras. Detta betyder dock inte på något sätt att vi vet vad alla proteiner har för funktion. Denna frågeställning studeras i det forsk- ningsområde som kallas proteomik. Proteomiken kan delas upp i två olika delar: masspektrometri (MS)-baserad proteomik och affinitetsproteomik. I MS-baserad proteomik analyserar man proteiner genom att bestämma deras massa och man tittar oftast på pep- tider, det vill säga mindre delar av proteiner som man kan få fram genom att klippa sönder ett protein med ett enzym. Detta gör man helt enkelt

91 Populärvetenskaplig sammanfattning för att de blir lättare att detektera då. I affinitetsproteomik använder man affinitetsmolekyler, oftast antikroppar, som kan binda till det protein man är intresserad av för att på så vis kunna detektera det. Jag har i min forskning använt mig av MS-baserad proteomik men även en kombination av MS- baserad och affinitetsproteomik. Målet med forskningen som presenteras i denna avhandling var att utveckla metoder för att kunna analysera och kvantifiera många proteiner parallellt. Jag har till exempel jobbat med MS- baserade metoder för att kunna kvantifiera proteiner i olika celler. Ett prob- lem när man vill analysera proteiner med MS är att det finns för mycket andra proteiner i lösningen som stör. Ett sätt att lösa detta problem är att använda antikroppar som kan fiska ut de proteiner man är intresserad av. Antikroppar har en viktig roll i vårt immunförsvar där de binder till proteiner på ytan av bakterier och virus. Detta signalerar till resten av immunförsvaret att bakterien eller viruset är en inkräktare som då kan tas om hand av olika immunceller. Antikropparnas roll är alltså (bland annat) att binda starkt till andra proteiner. Man kan idag producera antikroppar som binder till det protein man vill (målproteinet) genom att utnyttja immunförsvaret. Detta görs i stor skala i proteinatlas-projektet (HPA) genom att immunisera ka- niner med mänskliga proteinfragment. Jag har använt antikroppar för att fånga målprotein från ett cellysat (proteiner som kommer ut i lösning när man slår sönder ett cellprov). MS användes sedan för att kontrollera vad som hade bundit till antikropparna. I ett annat projekt användes en lik- nande strategi men här var det peptider (delar från sönderklippta proteiner) istället för proteiner som fiskades ut av antikroppar. Jag har även jobbat med att utveckla metoder som kan användas inom HPA-projektet för att förenkla flödet för proteinproduktion och proteinrening. Sammanfattningsvis bygger denna avhandling på utvecklingen av olika metod- er för parallell analys av proteiner. Metoderna har bland annat använts för att kvantifiera proteiner i olika celler och för att undersöka till vilka proteiner olika antikroppar binder. Det slutliga målet är att öka förståelsen för hur olika proteiner fungerar och interagerar med varandra och att kunna identi- fiera vilka proteiner som är kopplade till olika sjukdomar. Förhoppningsvis kommer denna avhandling att kunna bidra till detta.

92 Chapter 8

Acknowledgements

So now you’ve reached the last part of my thesis. I had such a great time working on the different projects and I have so many people to be grateful to for this. Both people involved in the projects but also people who have supported me in other, less research-related, ways and people who have just made me smile when I have needed it the most. Sophia, min underbara huvudhandledare och förebild. Jag är så glad att jag fått spendera de senaste åren med dig. Du har lärt mig så otroligt mycket, inte bara om forskning, utan även om hur viktigt det är att tro på sig själv och stå på sig. Tack för att du alltid ser lösningar på alla problem och för att du alltid peppar och uppmuntrar. Tack för att du tar dig tid och alltid finns där även fast du egentligen har tusen andra viktiga saker att göra. Tack också för att du gett mig så mycket frihet och låtit mig utvecklas. Tack till min bihandledare Thomas för spännande samarbeten kring ISET och roliga möten med gott godis. Tack vare dig har jag fått uppleva något annat än bara KTH, vilket har varit otroligt lärorikt och framförallt riktigt kul. Jenny OT, även fast det inte står på papper, känner jag att du verkligen har varit en handledare för mig. Du har alltid tid för frågor och funderingar och kommer alltid med kloka råd. Det har verkligen varit lärorikt att jobba med dig. Du gör inget halvhjärtat, vilket jag verkligen beundrar. Tack för alla diskussioner och pekskrivande.

93 Acknowledgements

Tack Mathias för att du är en sådan inspirationskälla och startar projekt som HPA. Tack också för att du vågade satsa på MS-kvantifiering med tunga PrESTar på KTH fast det var något helt nytt för oss. Till alla övriga PIs på plan3, både nya och gamla, vill jag också rikta ett jättestort tack. Tack för alla kloka ord och för att ni värnar om arbetsmiljön och inser att det blir bättre forskning när alla hjälps åt (och ses för att dricka öl ihop). Tack till Knut och Alice Wallenbergs stiftelse samt PIEp/Vinnova för finansiering. Tack till alla som jag samarbetat med i projekten under åren. Hanna T och Louise, ni välkomnade mig in i EnBase-projektet när jag var helt ny doktorand. Tack för alla härliga stunder med filterplattor och reningar! Tack också Louise för att du introducerade mig till MS-världen. Thank you R Kaisa and the rest of the EnBase team for nice collaborations. Belinda och Simon, efter många finjusteringar lyckades vi äntligen få rob- otarna att göra som vi ville och vi lyckades automatisera ISET-protokollet. Tack för många häftiga och lite nördiga ISET-filmer. Ett speciellt tack till dig Belinda för allt knasigt vi haft för oss nere hos er på LTH. Hydrofobpennor och häftmassa skulle jag tidigare inte associerat med forskning, men tänk så fel jag hade. Tack också för roliga samarbeten, kurser och konferenser. Tack Henrik J och Janne för MS-stöd i många av projekten. Tack för all hjälp med MS-prover och svar på frågor och för att MS-kön kunnat modifieras lite vid behov. Tack Erik M för alla timmar du suttit vid min dator och programmerat. Utan din hjälp hade det inte blivit så mycket MS-data. Tack Frida för cellodling och Anna Bä och Hanna T för roligt labbande ihop vid produktion av tunga PrESTar. Fredrik och Björn, vad kul det var att äntligen få några att prata MS med. Tack för alla diskussioner om tunga PrESTar och MS-instrument. Tack till projektledarna Sophia, Thomas, Jenny OT och Mathias som styrt projekten i mål. Tack även till Sophia, Thomas, Jenny OT, Janne och Margareta för värdefulla kommentarer till min avhandling. Thank you Marlis and Matthias for nice collaborations and thank you Marlis for taking such good care of me during my visit in Munich. I had a

94 Acknowledgements wonderful time and learnt so much. För nästan 6,5 år sedan började jag som sommarjobbare i MolBio-gruppen i HPA. Tack Anja för att du anställde mig och tack Holger, Marica, Bahram, Anna S, Jenny F, Mehri, Malin och Lili för en rolig tid med mycket racklingar och sekvensning. Tack Johan N för en fantastisk tid under mitt exjobb. Det finns ingen mer pedagogisk och positiv handledare än du. Tack även Elin och Anna JP för all hjälp på Affibody under exjobbet. Jag är otroligt tacksam att ha fått vara en del av en fantastisk forskargrupp med många kloka och fina människor. Tack Sophia för alla resor till Måla och tack alla nuvarande och gamla medlemmar i gruppen, Hanna T, Jenny OT, Micke, Sarah L, Sara K, Emma F, Johan N, Mattias, Anna K, Cajsa, Tove A, Johanna S och alla exjobbare för alla roliga stunder på frukostar och middagar. Alla härliga tjejer i proteinfabriken som haft hand om MS-instrumenten på ett eller annat sätt, Kattis, Anne-Sophie, Anneli H, Anna Be och Jenny B. Tack för alla roliga, mindre roliga och frustrerande timmar vid instrumenten. Det finns ingen större glädje än när man meckat ihop ESIn helt på egen hand. De fina exjobbare jag fått äran att handleda under åren, Sara K, Erik F och Jessica. Tack för att ni varit så motiverade och roliga att jobba med. Jag tror att jag lärde mig minst lika mycket som ni. Tack till alla nuvarande och gamla rumskamrater, Sara K, Micke, Erik F, Greg, Linnea, Josefine, Wasil, Prem och Staffan för mysig gemenskap med skratt och diskussioner. Tack Micke och Erik F för att ni alltid finns där för att hjälpa till med alla tänkbara datorproblem. Tack alla fina plan3-tjejer som styrt upp stickjuntor, tjejmiddagar och an- nat skoj. Tack gamla och nuvarande medlemmar i beer foundation för god öl och roliga tillställningar i lunchrummet. Tack till alla Bio05-or på avdel- ningen som hängt ihop i snart 10 (!) år. Hanna L, Lisa, Filippa, Magnus, Johan S och Anna H, det känns som om det var igår vi stod tillsammans framför gula K:t och det kommer kännas konstigt när vi skiljs åt i den riktiga världen. Tack Filippa och Andreas J för svettiga timmar på KTH-hallen.

95 Acknowledgements

Tack också Andreas J för att du dragit iväg mig till gymmet under hösten när jag trott att jag haft viktigare saker för mig. Tack även Emma och Lan Lan för beställningar och Inger och Kristina för administrativ hjälp. Tack Roxana & Co för fantastiskt jobb i diskrummet. Sara K, vad hade jag gjort utan dig? Jag kan inte tänka mig hur det kommer vara att inte sitta bredvid dig varje dag. Tack för att du alltid finns där oavsett om man vill skratta eller gråta. Du är nog bäst i världen på att muntra upp när inget går som det ska. Jag hoppas du förstår att du är helt fantastisk. Tack alla härliga sångfåglar som jag fått sjunga med i luciatåget genom åren. Hanna T, Sara K, Anneli H, Magnus, Erik F, Andreas H, Tobbe, Micke, Tarek, Erica, Josefine, Anna-Luisa och LanLan. Lucia är verkligen bästa dagen på året! Tack alla på plan3 för att ni tillsammans gör det till världens bästa arbet- splats. Tack också till alla HPA-grupper som tillsammans gjort HPA till det fantastiska projekt det är. Särskilt tack till proteinfabriken för BCA- analyser m.m, tack till immunotech för hjälp med western blot och tack till molbio för kontrollsekvensning. Tack också för trevliga resor till Mallorca och Antalya. Alla vänner utanför KTH, tack för allt kul vi haft genom åren. Tack gym- nasiegänget Linda, Sophia, Jennie, Aya, Susanna, Linn, Rickard och Myggan för roliga middagar och söta barn. Tack Isabelle för att du är min närmsta vän och alltid stöttar! Tack Aida och Kalle för att ni är fan- tastiska och alltid finns där och tack till mina musketörer Johnny, Marcus och Linda för alla fantastiska och galna stunder. Ni är bäst! Tack även till Frinsö-klanen som får mig att glömma forskningen ibland för att fokusera på de viktiga sakerna i livet, som fönsterfoder och elverk. Min fina familj, Mamma, Pappa och lillebror Henning. Ni är de bästa och jag älskar er över allt annat. Tack för att ni låtit mig gå min egen väg och att ni stöttat mig med mina "flygande proteiner". Slutligen, min älskade Tord, tack för att du gör mig lycklig och får mig att skratta varje dag. Tack för att du alltid är stolt över mig, speciellt när jag behöver det som mest. Du och jag för alltid.

96 Chapter 9

Bibliography

[1] F. H. Crick. On protein synthesis. Symp Soc Exp Biol, 12:138–63, 1958. [2] N. Gregersen, P. Bross, S. Vang, and J. H. Christensen. Protein misfolding and human disease. Annu Rev Genomics Hum Genet, 7:103–24, 2006. [3] S. Khan and M. Vihinen. Spectrum of disease-causing mutations in protein sec- ondary structures. BMC Struct Biol, 7:56, 2007. [4] F. Crick. Central dogma of molecular biology. Nature, 227(5258):561–3, 1970. [5] B. R. Glick and J. J. Pasternak. Molecular biotechnology : principles and applica- tions of recombinant DNA. ASM Press, Washington, D.C., 3rd edition, 2003. [6] C. Branden and J. Tooze. Introduction to protein structure. Garland : Taylor Francis, New York, 2nd edition, 1999. [7] M. Mann, R. C. Hendrickson, and A. Pandey. Analysis of proteins and proteomes by mass spectrometry. Annu Rev Biochem, 70:437–73, 2001. [8] M. Uhlen, E. Bjorling, C. Agaton, C. A. Szigyarto, B. Amini, E. Andersen, A. C. Andersson, P. Angelidou, A. Asplund, C. Asplund, L. Berglund, K. Bergstrom, H. Brumer, D. Cerjan, M. Ekstrom, A. Elobeid, C. Eriksson, L. Fagerberg, R. Falk, J. Fall, M. Forsberg, M. G. Bjorklund, K. Gumbel, A. Halimi, I. Hallin, C. Ham- sten, M. Hansson, M. Hedhammar, G. Hercules, C. Kampf, K. Larsson, M. Lind- skog, W. Lodewyckx, J. Lund, J. Lundeberg, K. Magnusson, E. Malm, P. Nilsson, J. Odling, P. Oksvold, I. Olsson, E. Oster, J. Ottosson, L. Paavilainen, A. Pers- son, R. Rimini, J. Rockberg, M. Runeson, A. Sivertsson, A. Skollermo, J. Steen, M. Stenvall, F. Sterky, S. Stromberg, M. Sundberg, H. Tegel, S. Tourle, E. Wahlund, A. Walden, J. Wan, H. Wernerus, J. Westberg, K. Wester, U. Wrethagen, L. L. Xu, S. Hober, and F. Ponten. A human protein atlas for normal and cancer tissues based on antibody proteomics. Mol Cell Proteomics, 4(12):1920–32, 2005. [9] E. Lundberg and M. Uhlen. Creation of an antibody-based subcellular protein atlas. Proteomics, 10(22):3984–96, 2010.

97 Bibliography

[10] T. Geiger, A. Wehner, C. Schaab, J. Cox, and M. Mann. Comparative proteomic analysis of eleven common cell lines reveals ubiquitous but varying expression of most proteins. Mol Cell Proteomics, 11(3):M111 014050, 2012. [11] U. B. Nielsen and B. H. Geierstanger. Multiplexed sandwich assays in microarray format. J Immunol Methods, 290(1-2):107–20, 2004. [12] M. Bantscheff, M. Schirle, G. Sweetman, J. Rick, and B. Kuster. Quantitative mass spectrometry in proteomics: a critical review. Anal Bioanal Chem, 389(4):1017–31, 2007. [13] M. Bantscheff, S. Lemeer, M. M. Savitski, and B. Kuster. Quantitative mass spec- trometry in proteomics: critical review update from 2007 to the present. Anal Bioanal Chem, 404(4):939–65, 2012. [14] O. T. Avery, C. M. Macleod, and M. McCarty. Studies on the chemical nature of the substance inducing transformation of pneumococcal types : Induction of transformation by a desoxyribonucleic acid fraction isolated from pneumococcus type iii. J Exp Med, 79(2):137–58, 1944. [15] J. D. Watson and F. H. Crick. Molecular structure of nucleic acids; a structure for deoxyribose nucleic acid. Nature, 171(4356):737–8, 1953. [16] F. H. Crick, L. Barnett, S. Brenner, and R. J. Watts-Tobin. General nature of the genetic code for proteins. Nature, 192:1227–32, 1961. [17] T. B. Osborne. The vegetable proteins. Monographs on biochemistry. 1909. [18] D. Perrett. From ’protein’ to the beginnings of clinical proteomics. Proteomics Clin Appl, 1(8):720–38, 2007. [19] H. B. Vickery. The origin of the word protein. Yale J Biol Med, 22(5):387–93, 1950. [20] J. A. Reynolds and C. Tanford. Nature’s robots : a history of proteins. , Oxford, 2001. [21] F. Sanger and H. Tuppy. The amino-acid sequence in the phenylalanyl chain of insulin. i. the identification of lower peptides from partial hydrolysates. Biochem J, 49(4):463–81, 1951. [22] F. Sanger and H. Tuppy. The amino-acid sequence in the phenylalanyl chain of insulin. 2. the investigation of peptides from enzymic hydrolysates. Biochem J, 49(4):481–90, 1951. [23] L. Pauling, R. B. Corey, and H. R. Branson. The structure of proteins; two hydrogen- bonded helical configurations of the polypeptide chain. Proc Natl Acad Sci U S A, 37(4):205–11, 1951. [24] L. Pauling and R. B. Corey. Atomic coordinates and structure factors for two helical configurations of polypeptide chains. Proc Natl Acad Sci U S A, 37(5):235–40, 1951. [25] C. Tanford. How protein chemists learned about the hydrophobic factor. Protein Sci, 6(6):1358–66, 1997. [26] W. Kauzmann. Some factors in the interpretation of protein denaturation. Adv Protein Chem, 14:1–63, 1959.

98 Bibliography

[27] R. K. Saiki, S. Scharf, F. Faloona, K. B. Mullis, G. T. Horn, H. A. Erlich, and N. Arnheim. Enzymatic amplification of beta-globin genomic sequences and re- striction site analysis for diagnosis of sickle cell anemia. Science, 230(4732):1350–4, 1985. [28] K. Mullis, F. Faloona, S. Scharf, R. Saiki, G. Horn, and H. Erlich. Specific enzymatic amplification of dna in vitro: the polymerase chain reaction. Cold Spring Harb Symp Quant Biol, 51 Pt 1:263–73, 1986. [29] E. C. Hayden. Technology: The $1,000 genome. Nature, 507(7492):294–5, 2014. [30] E. S. Lander, L. M. Linton, B. Birren, C. Nusbaum, M. C. Zody, J. Baldwin, K. Devon, K. Dewar, M. Doyle, W. FitzHugh, R. Funke, D. Gage, K. Harris, A. Heaford, J. Howland, L. Kann, J. Lehoczky, R. LeVine, P. McEwan, K. McK- ernan, J. Meldrim, J. P. Mesirov, C. Miranda, W. Morris, J. Naylor, C. Raymond, M. Rosetti, R. Santos, A. Sheridan, C. Sougnez, N. Stange-Thomann, N. Stojanovic, A. Subramanian, D. Wyman, J. Rogers, J. Sulston, R. Ainscough, S. Beck, D. Bent- ley, J. Burton, C. Clee, N. Carter, A. Coulson, R. Deadman, P. Deloukas, A. Dun- ham, I. Dunham, R. Durbin, L. French, D. Grafham, S. Gregory, T. Hubbard, S. Humphray, A. Hunt, M. Jones, C. Lloyd, A. McMurray, L. Matthews, S. Mercer, S. Milne, J. C. Mullikin, A. Mungall, R. Plumb, M. Ross, R. Shownkeen, S. Sims, R. H. Waterston, R. K. Wilson, L. W. Hillier, J. D. McPherson, M. A. Marra, E. R. Mardis, L. A. Fulton, A. T. Chinwalla, K. H. Pepin, W. R. Gish, S. L. Chissoe, M. C. Wendl, K. D. Delehaunty, T. L. Miner, A. Delehaunty, J. B. Kramer, L. L. Cook, R. S. Fulton, D. L. Johnson, P. J. Minx, S. W. Clifton, T. Hawkins, E. Branscomb, P. Predki, P. Richardson, S. Wenning, T. Slezak, N. Doggett, J. F. Cheng, A. Olsen, S. Lucas, C. Elkin, E. Uberbacher, M. Frazier, et al. Initial sequencing and analysis of the human genome. Nature, 409(6822):860–921, 2001. [31] J. C. Venter, M. D. Adams, E. W. Myers, P. W. Li, R. J. Mural, G. G. Sutton, H. O. Smith, M. Yandell, C. A. Evans, R. A. Holt, J. D. Gocayne, P. Amanatides, R. M. Ballew, D. H. Huson, J. R. Wortman, Q. Zhang, C. D. Kodira, X. H. Zheng, L. Chen, M. Skupski, G. Subramanian, P. D. Thomas, J. Zhang, G. L. Gabor Mik- los, C. Nelson, S. Broder, A. G. Clark, J. Nadeau, V. A. McKusick, N. Zinder, A. J. Levine, R. J. Roberts, M. Simon, C. Slayman, M. Hunkapiller, R. Bolanos, A. Delcher, I. Dew, D. Fasulo, M. Flanigan, L. Florea, A. Halpern, S. Hannen- halli, S. Kravitz, S. Levy, C. Mobarry, K. Reinert, K. Remington, J. Abu-Threideh, E. Beasley, K. Biddick, V. Bonazzi, R. Brandon, M. Cargill, I. Chandramouliswaran, R. Charlab, K. Chaturvedi, Z. Deng, V. Di Francesco, P. Dunn, K. Eilbeck, C. Evan- gelista, A. E. Gabrielian, W. Gan, W. Ge, F. Gong, Z. Gu, P. Guan, T. J. Heiman, M. E. Higgins, R. R. Ji, Z. Ke, K. A. Ketchum, Z. Lai, Y. Lei, Z. Li, J. Li, Y. Liang, X. Lin, F. Lu, G. V. Merkulov, N. Milshina, H. M. Moore, A. K. Naik, V. A. Narayan, B. Neelam, D. Nusskern, D. B. Rusch, S. Salzberg, W. Shao, B. Shue, J. Sun, Z. Wang, A. Wang, X. Wang, J. Wang, M. Wei, R. Wides, C. Xiao, C. Yan, et al. The sequence of the human genome. Science, 291(5507):1304–51, 2001. [32] M. R. Wilkins, C. Pasquali, R. D. Appel, K. Ou, O. Golaz, J. C. Sanchez, J. X. Yan, A. A. Gooley, G. Hughes, I. Humphery-Smith, K. L. Williams, and D. F.

99 Bibliography

Hochstrasser. From proteins to proteomes: large scale protein identification by two- dimensional electrophoresis and amino acid analysis. Biotechnology (N Y), 14(1):61– 5, 1996. [33] R. Aebersold and M. Mann. Mass spectrometry-based proteomics. Nature, 422(6928):198–207, 2003. [34] B. Domon and R. Aebersold. Mass spectrometry and protein analysis. Science, 312(5771):212–7, 2006. [35] P. H. O’Farrell. High resolution two-dimensional electrophoresis of proteins. J Biol Chem, 250(10):4007–21, 1975. [36] O. Stoevesandt and M. J. Taussig. Affinity reagent resources for human proteome detection: initiatives and perspectives. Proteomics, 7(16):2738–50, 2007. [37] M. Clamp, B. Fry, M. Kamal, X. Xie, J. Cuff, M. F. Lin, M. Kellis, K. Lindblad-Toh, and E. S. Lander. Distinguishing protein-coding and noncoding genes in the human genome. Proc Natl Acad Sci U S A, 104(49):19428–33, 2007. [38] M. Mann and O. N. Jensen. Proteomic analysis of post-translational modifications. Nat Biotechnol, 21(3):255–61, 2003. [39] O. N. Jensen. Modification-specific proteomics: characterization of post- translational modifications by mass spectrometry. Curr Opin Chem Biol, 8(1):33–41, 2004. [40] M. Wilhelm, J. Schlegl, H. Hahne, A. Moghaddas Gholami, M. Lieberenz, M. M. Savitski, E. Ziegler, L. Butzmann, S. Gessulat, H. Marx, T. Mathieson, S. Lemeer, K. Schnatbaum, U. Reimer, H. Wenschuh, M. Mollenhauer, J. Slotta-Huspenina, J. H. Boese, M. Bantscheff, A. Gerstmair, F. Faerber, and B. Kuster. Mass- spectrometry-based draft of the human proteome. Nature, 509(7502):582–7, 2014. [41] J. Cox and M. Mann. Quantitative, high-resolution proteomics for data-driven systems biology. Annu Rev Biochem, 80:273–99, 2011. [42] N. L. Anderson and N. G. Anderson. The human plasma proteome: history, char- acter, and diagnostic prospects. Mol Cell Proteomics, 1(11):845–67, 2002. [43] J. L. Luque-Garcia and T. A. Neubert. Sample preparation for serum/plasma pro- filing and biomarker identification by mass spectrometry. J Chromatogr A, 1153(1- 2):259–76, 2007. [44] P. A. McGettigan. Transcriptomics in the rna-seq era. Curr Opin Chem Biol, 17(1):4–11, 2013. [45] A. Mortazavi, B. A. Williams, K. McCue, L. Schaeffer, and B. Wold. Mapping and quantifying mammalian transcriptomes by rna-seq. Nat Methods, 5(7):621–8, 2008. [46] S. K. Archer, D. Inchaustegui, R. Queiroz, and C. Clayton. The cell cycle regulated transcriptome of trypanosoma brucei. PLoS One, 6(3):e18425, 2011. [47] R. de Sousa Abreu, L. O. Penalva, E. M. Marcotte, and C. Vogel. Global signatures of protein and mrna expression levels. Mol Biosyst, 5(12):1512–26, 2009.

100 Bibliography

[48] E. Lundberg, L. Fagerberg, D. Klevebring, I. Matic, T. Geiger, J. Cox, C. Algenas, J. Lundeberg, M. Mann, and M. Uhlen. Defining the transcriptome and proteome in three functionally different human cell lines. Mol Syst Biol, 6:450, 2010. [49] E. Eisenberg and E. Y. Levanon. Human housekeeping genes, revisited. Trends Genet, 29(10):569–74, 2013. [50] J. K. Aronson. Biomarkers and surrogate endpoints. Br J Clin Pharmacol, 59(5):491–4, 2005. [51] L. Trinkle-Mulcahy. Resolving protein interactions and complexes by affinity pu- rification followed by label-based quantitative mass spectrometry. Proteomics, 12(10):1623–38, 2012. [52] A. C. Gingras, M. Gstaiger, B. Raught, and R. Aebersold. Analysis of protein complexes using mass spectrometry. Nat Rev Mol Cell Biol, 8(8):645–54, 2007. [53] E. de Hoffman and V. Stroobant. Mass spectrometry : principles and applications. John Wiley, Chichester, 3. edition, 2007. [54] M. Karas and F. Hillenkamp. Laser desorption ionization of proteins with molecular masses exceeding 10,000 daltons. Anal Chem, 60(20):2299–301, 1988. [55] J. B. Fenn, M. Mann, C. K. Meng, S. F. Wong, and C. M. Whitehouse. Electrospray ionization for mass spectrometry of large biomolecules. Science, 246(4926):64–71, 1989. [56] X. Han, A. Aslanian, and 3rd Yates, J. R. Mass spectrometry for proteomics. Curr Opin Chem Biol, 12(5):483–90, 2008. [57] H. Steen and M. Mann. The abc’s (and xyz’s) of peptide sequencing. Nat Rev Mol Cell Biol, 5(9):699–711, 2004. [58] N. Nagaraj, J. R. Wisniewski, T. Geiger, J. Cox, M. Kircher, J. Kelso, S. Paabo, and M. Mann. Deep proteome and transcriptome mapping of a human cancer cell line. Mol Syst Biol, 7:548, 2011. [59] A. H. Payne and G. L. Glish. Tandem mass spectrometry in quadrupole ion trap and ion cyclotron resonance mass spectrometers. Methods Enzymol, 402:109–48, 2005. [60] R. Craig and R. C. Beavis. Tandem: matching proteins with tandem mass spectra. , 20(9):1466–7, 2004. [61] D. L. Tabb, C. G. Fernando, and M. C. Chambers. Myrimatch: highly accurate tan- dem mass spectral peptide identification by multivariate hypergeometric analysis. J Proteome Res, 6(2):654–61, 2007. [62] J. Cox, N. Neuhauser, A. Michalski, R. A. Scheltema, J. V. Olsen, and M. Mann. Andromeda: a peptide search engine integrated into the maxquant environment. J Proteome Res, 10(4):1794–805, 2011. [63] D. N. Perkins, D. J. Pappin, D. M. Creasy, and J. S. Cottrell. Probability-based pro- tein identification by searching sequence databases using mass spectrometry data. Electrophoresis, 20(18):3551–67, 1999.

101 Bibliography

[64] J. K. Eng, A. L. McCormack, and J. R. Yates. An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J Am Soc Mass Spectrom, 5(11):976–89, 1994. [65] J. D. Tipton, J. C. Tran, A. D. Catherman, D. R. Ahlf, K. R. Durbin, and N. L. Kelleher. Analysis of intact protein isoforms by mass spectrometry. J Biol Chem, 286(29):25451–8, 2011. [66] F. Lanucara and C. E. Eyers. Top-down mass spectrometry for the analysis of combinatorial post-translational modifications. Mass Spectrom Rev, 32(1):27–42, 2013. [67] X. Han, M. Jin, K. Breuker, and F. W. McLafferty. Extending top-down mass spectrometry to proteins with masses greater than 200 kilodaltons. Science, 314(5796):109–12, 2006. [68] B. A. Garcia. What does the future hold for top down mass spectrometry? J Am Soc Mass Spectrom, 21(2):193–202, 2010. [69] M. Hedhammar, T. Graslund, and S. Hober. Protein engineering strategies for selective protein purification. Chemical Engineering Technology, 28(11):1315–1325, 2005. [70] M. Neiman, C. Fredolini, H. Johansson, J. Lehtio, P. A. Nygren, M. Uhlen, P. Nils- son, and J. M. Schwenk. Selectivity analysis of single binder assays used in plasma protein profiling. Proteomics, 13(23-24):3406–10, 2013. [71] J. R. Wisniewski, A. Zougman, N. Nagaraj, and M. Mann. Universal sample prepa- ration method for proteome analysis. Nat Methods, 6(5):359–62, 2009. [72] A. Shevchenko, H. Tomas, J. Havlis, J. V. Olsen, and M. Mann. In-gel digestion for mass spectrometric characterization of proteins and proteomes. Nat Protoc, 1(6):2856–60, 2006. [73] R. M. Branca, L. M. Orre, H. J. Johansson, V. Granholm, M. Huss, A. Perez-Bercoff, J. Forshed, L. Kall, and J. Lehtio. Hirief lc-ms enables deep proteome coverage and unbiased proteogenomics. Nat Methods, 11(1):59–62, 2014. [74] J. R. Wisniewski, A. Zougman, and M. Mann. Combination of fasp and stagetip- based fractionation allows in-depth analysis of the hippocampal membrane pro- teome. J Proteome Res, 8(12):5674–8, 2009. [75] M. P. Washburn, D. Wolters, and 3rd Yates, J. R. Large-scale analysis of the yeast proteome by multidimensional protein identification technology. Nat Biotechnol, 19(3):242–7, 2001. [76] N. L. Anderson, N. G. Anderson, L. R. Haines, D. B. Hardie, R. W. Olafson, and T. W. Pearson. Mass spectrometric quantitation of peptides and proteins using sta- ble isotope standards and capture by anti-peptide antibodies (siscapa). J Proteome Res, 3(2):235–44, 2004. [77] Y. Zhang, A. Wolf-Yadlin, P. L. Ross, D. J. Pappin, J. Rush, D. A. Lauffenburger, and F. M. White. Time-resolved mass spectrometry of tyrosine phosphorylation sites in the epidermal growth factor receptor signaling network reveals dynamic modules. Mol Cell Proteomics, 4(9):1240–50, 2005.

102 Bibliography

[78] M. R. Thingholm, T. E.and Larsen, C. R. Ingrell, M. Kassem, and O. N. Jensen. Tio(2)-based phosphoproteomic analysis of the plasma membrane and the effects of phosphatase inhibitor treatment. J Proteome Res, 7(8):3304–13, 2008. [79] J. Rappsilber, M. Mann, and Y. Ishihama. Protocol for micro-purification, enrich- ment, pre-fractionation and storage of peptides for proteomics using stagetips. Nat Protoc, 2(8):1896–906, 2007. [80] N. Jehmlich, C. Golatowski, A. Murr, G. Salazar, V. M. Dhople, E. Hammer, and U. Volker. Comparative evaluation of peptide desalting methods for salivary pro- teome analysis. Clin Chim Acta, 434:16–20, 2014. [81] M. A. Kuzyk, D. Smith, J. Yang, T. J. Cross, A. M. Jackson, D. B. Hardie, N. L. Anderson, and C. H. Borchers. Multiple reaction monitoring-based, multiplexed, ab- solute quantitation of 45 proteins in human plasma. Mol Cell Proteomics, 8(8):1860– 77, 2009. [82] D. Domanski, A. J. Percy, J. Yang, A. G. Chambers, J. S. Hill, G. V. Freue, and C. H. Borchers. Mrm-based multiplexed quantitation of 67 putative cardiovascular disease biomarkers in human plasma. Proteomics, 12(8):1222–43, 2012. [83] L. Anderson and C. L. Hunter. Quantitative mass spectrometric multiple reaction monitoring assays for major plasma proteins. Mol Cell Proteomics, 5(4):573–88, 2006. [84] P. Picotti and R. Aebersold. Selected reaction monitoring-based proteomics: work- flows, potential, pitfalls and future directions. Nat Methods, 9(6):555–66, 2012. [85] T. Shi, T. L. Fillmore, X. Sun, R. Zhao, A. A. Schepmoes, M. Hossain, F. Xie, S. Wu, J. S. Kim, N. Jones, R. J. Moore, L. Pasa-Tolic, J. Kagan, K. D. Rodland, T. Liu, K. Tang, 2nd Camp, D. G., R. D. Smith, and W. J. Qian. Antibody-free, tar- geted mass-spectrometric approach for quantification of proteins at low picogram per milliliter levels in human plasma/serum. Proc Natl Acad Sci U S A, 109(38):15395– 400, 2012. [86] P. Picotti, O. Rinner, R. Stallmach, F. Dautel, T. Farrah, B. Domon, H. Wenschuh, and R. Aebersold. High-throughput generation of selected reaction-monitoring as- says for proteins and proteomes. Nat Methods, 7(1):43–6, 2010. [87] J. D. Venable, M. Q. Dong, J. Wohlschlegel, A. Dillin, and J. R. Yates. Automated approach for quantitative analysis of complex peptide mixtures from tandem mass spectra. Nat Methods, 1(1):39–45, 2004. [88] A. Panchaud, S. Jung, S. A. Shaffer, J. D. Aitchison, and D. R. Goodlett. Faster, quantitative, and accurate precursor acquisition independent from ion count. Anal Chem, 83(6):2250–7, 2011. [89] L. C. Gillet, P. Navarro, S. Tate, H. Rost, N. Selevsek, L. Reiter, R. Bonner, and R. Aebersold. Targeted data extraction of the ms/ms spectra generated by data- independent acquisition: a new concept for consistent and accurate proteome anal- ysis. Mol Cell Proteomics, 11(6):O111 016717, 2012.

103 Bibliography

[90] H. Molina, Y. Yang, T. Ruch, J. W. Kim, P. Mortensen, T. Otto, A. Nalli, Q. Q. Tang, M. D. Lane, R. Chaerkady, and A. Pandey. Temporal profiling of the adipocyte proteome during differentiation using a five-plex silac based strategy. J Proteome Res, 8(1):48–58, 2009. [91] S. E. Ong, B. Blagoev, I. Kratchmarova, D. B. Kristensen, H. Steen, A. Pandey, and M. Mann. Stable isotope labeling by amino acids in cell culture, silac, as a simple and accurate approach to expression proteomics. Mol Cell Proteomics, 1(5):376–86, 2002. [92] N. Ibarrola, H. Molina, A. Iwahori, and A. Pandey. A novel proteomic approach for specific identification of tyrosine kinase substrates using [13c]tyrosine. J Biol Chem, 279(16):15805–13, 2004. [93] G. T. Cantin, J. D. Venable, D. Cociorva, and 3rd Yates, J. R. Quantitative phospho- proteomic analysis of the tumor necrosis factor pathway. J Proteome Res, 5(1):127– 34, 2006. [94] S. E. Ong, I. Kratchmarova, and M. Mann. Properties of 13c-substituted arginine in stable isotope labeling by amino acids in cell culture (silac). J Proteome Res, 2(2):173–81, 2003. [95] B. Blagoev, S. E. Ong, I. Kratchmarova, and M. Mann. Temporal analysis of phosphotyrosine-dependent signaling networks by quantitative proteomics. Nat Biotechnol, 22(9):1139–45, 2004. [96] P. A. Everley, J. Krijgsveld, B. R. Zetter, and S. P. Gygi. Quantitative cancer proteomics: stable isotope labeling with amino acids in cell culture (silac) as a tool for prostate cancer research. Mol Cell Proteomics, 3(7):729–35, 2004. [97] T. Geiger, S. F. Madden, W. M. Gallagher, J. Cox, and M. Mann. Proteomic portrait of human breast cancer progression identifies novel prognostic markers. Cancer Res, 72(9):2428–39, 2012. [98] Q. Wu, W. Xu, L. Cao, X. Li, T. He, Z. Wu, and W. Li. Saha treatment reveals the link between histone lysine acetylation and proteome in nonsmall cell lung cancer a549 cells. J Proteome Res, 12(9):4064–73, 2013. [99] R. Zhang, C. S. Sioma, S. Wang, and F. E. Regnier. Fractionation of isotopically labeled peptides in quantitative proteomics. Anal Chem, 73(21):5142–9, 2001. [100] T. Geiger, J. Cox, P. Ostasiewicz, J. R. Wisniewski, and M. Mann. Super-silac mix for quantitative proteomics of human tumor tissue. Nat Methods, 7(5):383–5, 2010. [101] T. Geiger, J. R. Wisniewski, J. Cox, S. Zanivan, M. Kruger, Y. Ishihama, and M. Mann. Use of stable isotope labeling by amino acids in cell culture as a spike-in standard in quantitative proteomics. Nat Protoc, 6(2):147–57, 2011. [102] R. R. Lund, M. G. Terp, A. V. Laenkholm, O. N. Jensen, R. Leth-Larsen, and H. J. Ditzel. Quantitative proteomics of primary tumors with varying metastatic capabil- ities using stable isotope-labeled proteins of multiple histogenic origins. Proteomics, 12(13):2139–48, 2012.

104 Bibliography

[103] D. K. Schweppe, J. R. Rigas, and S. A. Gerber. Quantitative phosphoproteomic profiling of human non-small cell lung cancer tumors. J Proteomics, 91:286–96, 2013. [104] P. L. Ross, Y. N. Huang, J. N. Marchese, B. Williamson, K. Parker, S. Hattan, N. Khainovski, S. Pillai, S. Dey, S. Daniels, S. Purkayastha, P. Juhasz, S. Martin, M. Bartlet-Jones, F. He, A. Jacobson, and D. J. Pappin. Multiplexed protein quan- titation in saccharomyces cerevisiae using amine-reactive isobaric tagging reagents. Mol Cell Proteomics, 3(12):1154–69, 2004. [105] A. Thompson, J. Schafer, K. Kuhn, S. Kienle, J. Schwarz, G. Schmidt, T. Neumann, R. Johnstone, A. K. Mohammed, and C. Hamon. Tandem mass tags: a novel quantification strategy for comparative analysis of complex protein mixtures by ms/ms. Anal Chem, 75(8):1895–904, 2003. [106] G. Pottiez, J. Wiederin, H. S. Fox, and P. Ciborowski. Comparison of 4-plex to 8-plex itraq quantitative measurements of proteins in human plasma samples. J Proteome Res, 11(7):3774–81, 2012. [107] T. Werner, G. Sweetman, M. F. Savitski, T. Mathieson, M. Bantscheff, and M. M. Savitski. Ion coalescence of neutron encoded tmt 10-plex reporter ions. Anal Chem, 86(7):3594–601, 2014. [108] P. J. Boersema, R. Raijmakers, S. Lemeer, S. Mohammed, and A. J. Heck. Multiplex peptide stable isotope dimethyl labeling for quantitative proteomics. Nat Protoc, 4(4):484–94, 2009. [109] J. L. Hsu, S. Y. Huang, N. H. Chow, and S. H. Chen. Stable-isotope dimethyl labeling for quantitative proteomics. Anal Chem, 75(24):6843–52, 2003. [110] M. Miyagi and K. C. Rao. Proteolytic 18o-labeling strategies for quantitative pro- teomics. Mass Spectrom Rev, 26(1):121–36, 2007. [111] M. Schnolzer, P. Jedrzejewski, and W. D. Lehmann. -catalyzed incorpora- tion of 18o into peptide fragments and its application for protein sequencing by elec- trospray and matrix-assisted laser desorption/ionization mass spectrometry. Elec- trophoresis, 17(5):945–53, 1996. [112] S. P. Gygi, B. Rist, S. A. Gerber, F. Turecek, M. H. Gelb, and R. Aebersold. Quantitative analysis of complex protein mixtures using isotope-coded affinity tags. Nat Biotechnol, 17(10):994–9, 1999. [113] E. C. Yi, X. J. Li, K. Cooke, H. Lee, B. Raught, A. Page, V. Aneliunas, P. Hieter, D. R. Goodlett, and R. Aebersold. Increased quantitative proteome coverage with (13)c/(12)c-based, acid-cleavable isotope-coded affinity tag reagent and modified data acquisition scheme. Proteomics, 5(2):380–7, 2005. [114] A. H. America and J. H. Cordewener. Comparative lc-ms: a landscape of peaks and valleys. Proteomics, 8(4):731–49, 2008. [115] D. H. Lundgren, S. I. Hwang, L. Wu, and D. K. Han. Role of spectral counting in quantitative proteomics. Expert Rev Proteomics, 7(1):39–53, 2010.

105 Bibliography

[116] H. Liu, R. G. Sadygov, and 3rd Yates, J. R. A model for random sampling and estimation of relative protein abundance in shotgun proteomics. Anal Chem, 76(14):4193–201, 2004. [117] D. A. Megger, T. Bracht, H. E. Meyer, and B. Sitek. Label-free quantification in clinical proteomics. Biochim Biophys Acta, 1834(8):1581–90, 2013. [118] R. D. Voyksner and H. Lee. Investigating the use of an octupole ion guide for ion storage and high-pass mass filtering to improve the quantitative performance of elec- trospray ion trap mass spectrometry. Rapid Commun Mass Spectrom, 13(14):1427– 37, 1999. [119] W. Wang, H. Zhou, H. Lin, S. Roy, T. A. Shaler, L. R. Hill, S. Norton, P. Ku- mar, M. Anderle, and C. H. Becker. Quantification of proteins and metabolites by mass spectrometry without isotopic labeling or spiked standards. Anal Chem, 75(18):4818–26, 2003. [120] W. M. Old, K. Meyer-Arendt, L. Aveline-Wolf, K. G. Pierce, A. Mendoza, J. R. Sevinsky, K. A. Resing, and N. G. Ahn. Comparison of label-free methods for quan- tifying human proteins by shotgun proteomics. Mol Cell Proteomics, 4(10):1487–502, 2005. [121] V. Brun, C. Masselon, J. Garin, and A. Dupuis. Isotope dilution strategies for absolute quantitative proteomics. J Proteomics, 72(5):740–9, 2009. [122] S. A. Gerber, J. Rush, O. Stemman, M. W. Kirschner, and S. P. Gygi. Absolute quantification of proteins and phosphoproteins from cell lysates by tandem ms. Proc Natl Acad Sci U S A, 100(12):6940–5, 2003. [123] R. J. Beynon, M. K. Doherty, J. M. Pratt, and S. J. Gaskell. Multiplexed absolute quantification in proteomics using artificial qcat proteins of concatenated signature peptides. Nat Methods, 2(8):587–9, 2005. [124] S. Hanke, H. Besir, D. Oesterhelt, and M. Mann. Absolute silac for accurate quan- titation of proteins in complex mixtures down to the attomole level. J Proteome Res, 7(3):1118–30, 2008. [125] V. Brun, A. Dupuis, A. Adrait, M. Marcellin, D. Thomas, M. Court, F. Vandenesch, and J. Garin. Isotope-labeled protein standards: toward absolute quantitative pro- teomics. Mol Cell Proteomics, 6(12):2139–49, 2007. [126] S. Singh, M. Springer, J. Steen, M. W. Kirschner, and H. Steen. Flexiquant: a novel tool for the absolute quantification of proteins, and the simultaneous identification and quantification of potentially modified peptides. J Proteome Res, 8(5):2201–10, 2009. [127] M. Zeiler, W. L. Straube, E. Lundberg, M. Uhlen, and M. Mann. A protein epitope signature tag (prest) library allows silac-based absolute quantification and multi- plexed determination of protein copy numbers in cell lines. Mol Cell Proteomics, 11(3):O111 009613, 2012. [128] M. Zeiler, M. Moser, and M. Mann. Copy number analysis of the murine platelet proteome spanning the complete abundance range. Mol Cell Proteomics, in press, 2014.

106 Bibliography

[129] B. Schwanhausser, D. Busse, N. Li, G. Dittmar, J. Schuchhardt, J. Wolf, W. Chen, and M. Selbach. Global quantification of mammalian gene expression control. Na- ture, 473(7347):337–42, 2011. [130] J. C. Silva, M. V. Gorenstein, G. Z. Li, J. P. Vissers, and S. J. Geromanos. Absolute quantification of proteins by lcmse: a virtue of parallel ms acquisition. Mol Cell Proteomics, 5(1):144–56, 2006. [131] Y. Ishihama, Y. Oda, T. Tabata, T. Sato, T. Nagasu, J. Rappsilber, and M. Mann. Exponentially modified protein abundance index (empai) for estimation of absolute protein amount in proteomics by the number of sequenced peptides per protein. Mol Cell Proteomics, 4(9):1265–72, 2005. [132] P. Lu, C. Vogel, R. Wang, X. Yao, and E. M. Marcotte. Absolute protein expression profiling estimates the relative contributions of transcriptional and translational regulation. Nat Biotechnol, 25(1):117–24, 2007. [133] J. Grossmann, B. Roschitzki, C. Panse, C. Fortes, S. Barkow-Oesterreicher, D. Rutishauser, and R. Schlapbach. Implementation and evaluation of relative and absolute quantification in shotgun proteomics with label-free methods. J Pro- teomics, 73(9):1740–6, 2010. [134] L. Arike, K. Valgepea, L. Peil, R. Nahku, K. Adamberg, and R. Vilu. Compar- ison and applications of label-free absolute proteome quantification methods on escherichia coli. J Proteomics, 75(17):5437–48, 2012. [135] Z. Li, R. M. Adams, K. Chourey, G. B. Hurst, R. L. Hettich, and C. Pan. System- atic comparison of label-free, metabolic labeling, and isobaric chemical labeling for quantitative proteomics on ltq orbitrap velos. J Proteome Res, 11(3):1582–90, 2012. [136] P. Pichler, T. Kocher, J. Holzmann, M. Mazanek, T. Taus, G. Ammerer, and K. Mechtler. Peptide labeling with isobaric tags yields higher identification rates using itraq 4-plex compared to tmt 6-plex and itraq 8-plex on ltq orbitrap. Anal Chem, 82(15):6549–58, 2010. [137] A. Sandberg, R. M. Branca, J. Lehtio, and J. Forshed. Quantitative accuracy in mass spectrometry based proteomics of complex samples: the impact of labeling and precursor interference. J Proteomics, 96:133–44, 2013. [138] A. F. Altelaar, C. K. Frese, C. Preisinger, M. L. Hennrich, A. W. Schram, H. T. Timmers, A. J. Heck, and S. Mohammed. Benchmarking stable isotope labeling based quantitative proteomics. J Proteomics, 88:14–26, 2013. [139] M. Mann. Functional and quantitative proteomics using silac. Nat Rev Mol Cell Biol, 7(12):952–8, 2006. [140] L. M. de Godoy, J. V. Olsen, G. A. de Souza, G. Li, P. Mortensen, and M. Mann. Status of complete proteome analysis by mass spectrometry: Silac labeled yeast as a model system. Genome Biol, 7(6):R50, 2006. [141] L. M. de Godoy, J. V. Olsen, J. Cox, M. L. Nielsen, N. C. Hubner, F. Frohlich, T. C. Walther, and M. Mann. Comprehensive mass-spectrometry-based proteome quantification of haploid versus diploid yeast. Nature, 455(7217):1251–4, 2008.

107 Bibliography

[142] N. Nagaraj, N. A. Kulak, J. Cox, N. Neuhauser, K. Mayr, O. Hoerning, O. Vorm, and M. Mann. System-wide perturbation analysis with nearly complete coverage of the yeast proteome by single-shot ultra hplc runs on a bench top orbitrap. Mol Cell Proteomics, 11(3):M111 013722, 2012. [143] A. S. Hebert, A. L. Richards, D. J. Bailey, A. Ulbrich, E. E. Coughlin, M. S. Westphall, and J. J. Coon. The one hour yeast proteome. Mol Cell Proteomics, 13(1):339–47, 2014. [144] M. Beck, A. Schmidt, J. Malmstroem, M. Claassen, A. Ori, A. Szymborska, F. Her- zog, O. Rinner, J. Ellenberg, and R. Aebersold. The quantitative proteome of a human cell line. Mol Syst Biol, 7:549, 2011. [145] F. Desiere, E. W. Deutsch, N. L. King, A. I. Nesvizhskii, P. Mallick, J. Eng, S. Chen, J. Eddes, S. N. Loevenich, and R. Aebersold. The peptideatlas project. Nucleic Acids Res, 34(Database issue):D655–8, 2006. [146] T. Farrah, E. W. Deutsch, G. S. Omenn, Z. Sun, J. D. Watts, T. Yamamoto, D. Shteynberg, M. M. Harris, and R. L. Moritz. State of the human proteome in 2013 as viewed through peptideatlas: comparing the kidney, urine, and plasma proteomes for the biology- and disease-driven human proteome project. J Proteome Res, 13(1):60–75, 2014. [147] H. A. Ebhardt, E. Sabido, R. Huttenhain, B. Collins, and R. Aebersold. Range of by selected/multiple reaction monitoring mass spectrometry in an unfractionated human cell culture lysate. Proteomics, 12(8):1185–93, 2012. [148] T. A. Addona, S. E. Abbatiello, B. Schilling, S. J. Skates, D. R. Mani, D. M. Bunk, C. H. Spiegelman, L. J. Zimmerman, A. J. Ham, H. Keshishian, S. C. Hall, S. Allen, R. K. Blackman, C. H. Borchers, C. Buck, H. L. Cardasis, M. P. Cusack, N. G. Dodder, B. W. Gibson, J. M. Held, T. Hiltke, A. Jackson, E. B. Johansen, C. R. Kinsinger, J. Li, M. Mesri, T. A. Neubert, R. K. Niles, T. C. Pulsipher, D. Ransohoff, H. Rodriguez, P. A. Rudnick, D. Smith, D. L. Tabb, T. J. Tegeler, A. M. Variyath, L. J. Vega-Montoto, A. Wahlander, S. Waldemarson, M. Wang, J. R. Whiteaker, L. Zhao, N. L. Anderson, S. J. Fisher, D. C. Liebler, A. G. Paulovich, F. E. Regnier, P. Tempst, and S. A. Carr. Multi-site assessment of the precision and reproducibility of multiple reaction monitoring-based measurements of proteins in plasma. Nat Biotechnol, 27(7):633–41, 2009. [149] H. Keshishian, T. Addona, M. Burgess, E. Kuhn, and S. A. Carr. Quantitative, mul- tiplexed assays for low abundance proteins in plasma by targeted mass spectrometry and stable isotope dilution. Mol Cell Proteomics, 6(12):2212–29, 2007. [150] T. Shi, X. Sun, Y. Gao, T. L. Fillmore, A. A. Schepmoes, R. Zhao, J. He, R. J. Moore, J. Kagan, K. D. Rodland, T. Liu, A. Y. Liu, R. D. Smith, K. Tang, 2nd Camp, D. G., and W. J. Qian. Targeted quantification of low ng/ml level proteins in human serum without immunoaffinity depletion. J Proteome Res, 12(7):3353–61, 2013. [151] M. Pernemalm and J. Lehtio. Mass spectrometry-based plasma proteomics: state of the art and future outlook. Expert Rev Proteomics, 11(4):431–48, 2014.

108 Bibliography

[152] L. Anderson. Six decades searching for meaning in the proteome. J Proteomics, 107:24–30, 2014. [153] R. S. Yalow and S. A. Berson. Immunoassay of endogenous plasma insulin in man. J Clin Invest, 39:1157–75, 1960. [154] L. E. Miles and C. N. Hales. The preparation and properties of purified 125-i-labelled antibodies to insulin. Biochem J, 108(4):611–8, 1968. [155] L. E. Miles and C. N. Hales. Immunoradiometric assay of human growth hormone. Lancet, 2(7566):492–3, 1968. [156] E. Engvall and P. Perlmann. Enzyme-linked immunosorbent assay (elisa). quanti- tative assay of immunoglobulin g. Immunochemistry, 8(9):871–4, 1971. [157] B. K. Van Weemen and A. H. Schuurs. Immunoassay using antigen-enzyme conju- gates. FEBS Lett, 15(3):232–236, 1971. [158] R. M. Lequin. Enzyme immunoassay (eia)/enzyme-linked immunosorbent assay (elisa). Clin Chem, 51(12):2415–8, 2005. [159] P. Chames, M. Van Regenmortel, E. Weiss, and D. Baty. Therapeutic antibodies: successes, limitations and hopes for the future. Br J Pharmacol, 157(2):220–33, 2009. [160] P. Parham. The immune system. Garland, New York, NY, 2. edition, 2005. [161] B. Hjelm, B. Forsstrom, J. Lofblom, J. Rockberg, and M. Uhlen. Parallel immu- nizations of rabbits using the same antigen yield antibodies with similar, but not identical, epitopes. PLoS One, 7(12):e45817, 2012. [162] G. Kohler and C. Milstein. Continuous cultures of fused cells secreting antibody of predefined specificity. Nature, 256(5517):495–7, 1975. [163] M. Leenaars and C. F. Hendriksen. Critical steps in the production of polyclonal and monoclonal antibodies: evaluation and recommendations. ILAR J, 46(3):269– 79, 2005. [164] C. M. Hammers and J. R. Stanley. Antibody phage display: technique and appli- cations. J Invest Dermatol, 134(2):e17, 2014. [165] H. J. Dyson and P. E. Wright. Antigenic peptides. FASEB J, 9(1):37–42, 1995. [166] T. Olafsen and A. M. Wu. Antibody vectors for imaging. Semin Nucl Med, 40(3):167–81, 2010. [167] Consortium Structural Genomics, Consortium China Structural Genomics, Consor- tium Northeast Structural Genomics, S. Graslund, P. Nordlund, J. Weigelt, B. M. Hallberg, J. Bray, O. Gileadi, S. Knapp, U. Oppermann, C. Arrowsmith, R. Hui, J. Ming, S. dhe Paganon, H. W. Park, A. Savchenko, A. Yee, A. Edwards, R. Vin- centelli, C. Cambillau, R. Kim, S. H. Kim, Z. Rao, Y. Shi, T. C. Terwilliger, C. Y. Kim, L. W. Hung, G. S. Waldo, Y. Peleg, S. Albeck, T. Unger, O. Dym, J. Prilusky, J. L. Sussman, R. C. Stevens, S. A. Lesley, I. A. Wilson, A. Joachimiak, F. Collart, I. Dementieva, M. I. Donnelly, W. H. Eschenfeldt, Y. Kim, L. Stols, R. Wu, M. Zhou, S. K. Burley, J. S. Emtage, J. M. Sauder, D. Thompson, K. Bain, J. Luz, T. Gheyi,

109 Bibliography

F. Zhang, S. Atwell, S. C. Almo, J. B. Bonanno, A. Fiser, S. Swaminathan, F. W. Studier, M. R. Chance, A. Sali, T. B. Acton, R. Xiao, L. Zhao, L. C. Ma, J. F. Hunt, L. Tong, K. Cunningham, M. Inouye, S. Anderson, H. Janjua, R. Shastry, C. K. Ho, D. Wang, H. Wang, M. Jiang, G. T. Montelione, D. I. Stuart, R. J. Owens, S. Daenke, A. Schutz, U. Heinemann, S. Yokoyama, K. Bussow, and K. C. Gunsalus. Protein production and purification. Nat Methods, 5(2):135–46, 2008. [168] A. L. Nelson. Antibody fragments: hope and hype. MAbs, 2(1):77–83, 2010. [169] J. Lofblom, F. Y. Frejd, and S. Stahl. Non-immunoglobulin based protein scaffolds. Curr Opin Biotechnol, 22(6):843–8, 2011. [170] J. Lofblom, J. Feldwisch, V. Tolmachev, J. Carlsson, S. Stahl, and F. Y. Frejd. Affi- body molecules: engineered proteins for therapeutic, diagnostic and biotechnological applications. FEBS Lett, 584(12):2670–80, 2010. [171] J. N. Ahmad, Biedermannova L. Li, J., M. Kuchar, H. Sipova, A. Semeradtova, J. Cerny, H. Petrokova, P. Mikulecky, J. Polinek, O. Stanek, J. Vondrasek, J. Ho- mola, J. Maly, R. Osicka, P. Sebo, and P. Maly. Novel high-affinity binders of human interferon gamma derived from albumin-binding domain of protein g. Pro- teins, 80(3):774–89, 2012. [172] T. Alm, L. Yderland, J. Nilvebrant, A. Halldin, and S. Hober. A small bispecific protein selected for orthogonal affinity purification. Biotechnol J, 5(6):605–17, 2010. [173] H. K. Binz, M. T. Stumpp, P. Forrer, P. Amstutz, and A. Pluckthun. Design- ing repeat proteins: well-expressed, soluble and stable proteins from combinatorial libraries of consensus ankyrin repeat proteins. J Mol Biol, 332(2):489–503, 2003. [174] D. Lipovsek. Adnectins: engineered target-binding protein therapeutics. Protein Eng Des Sel, 24(1-2):3–9, 2011. [175] A. Skerra. Alternative binding proteins: anticalins - harnessing the structural plas- ticity of the lipocalin ligand pocket to engineer novel binding activities. FEBS J, 275(11):2677–83, 2008. [176] K. M. Song, S. Lee, and C. Ban. Aptamers and their biological applications. Sensors (Basel), 12(1):612–31, 2012. [177] M. Uhlen, P. Oksvold, L. Fagerberg, E. Lundberg, K. Jonasson, M. Forsberg, M. Zwahlen, C. Kampf, K. Wester, S. Hober, H. Wernerus, L. Bjorling, and F. Pon- ten. Towards a knowledge-based human protein atlas. Nat Biotechnol, 28(12):1248– 50, 2010. [178] M. Lindskog, J. Rockberg, M. Uhlen, and F. Sterky. Selection of protein epitopes for antibody production. Biotechniques, 38(5):723–7, 2005. [179] H. Tegel, J. Steen, A. Konrad, H. Nikdin, K. Pettersson, M. Stenvall, S. Tourle, U. Wrethagen, L. Xu, L. Yderland, M. Uhlen, S. Hober, and J. Ottosson. High- throughput protein production–lessons from scaling up from 10 to 288 recombinant proteins per week. Biotechnol J, 4(1):51–7, 2009. [180] C. Agaton, R. Falk, I. Hoiden Guthenberg, L. Gostring, M. Uhlen, and S. Hober. Selective enrichment of monospecific polyclonal antibodies for antibody-based pro- teomics efforts. J Chromatogr A, 1043(1):33–40, 2004.

110 Bibliography

[181] P. Nilsson, L. Paavilainen, K. Larsson, J. Odling, M. Sundberg, A. C. Andersson, C. Kampf, A. Persson, C. Al-Khalili Szigyarto, J. Ottosson, E. Bjorling, S. Hober, H. Wernerus, K. Wester, F. Ponten, and M. Uhlen. Towards a human proteome atlas: high-throughput generation of mono-specific antibodies for tissue profiling. Proteomics, 5(17):4327–37, 2005. [182] C. Algenas, C. Agaton, L. Fagerberg, A. Asplund, L. Bjorling, E. Bjorling, C. Kampf, E. Lundberg, P. Nilsson, A. Persson, K. Wester, F. Ponten, H. Wernerus, M. Uhlen, J. Ottosson Takanen, and S. Hober. Antibody performance in western blot applications is context-dependent. Biotechnol J, 2014. [183] F. Ponten, J. M. Schwenk, A. Asplund, and P. H. Edqvist. The human protein atlas as a proteomic resource for biomarker discovery. J Intern Med, 270(5):428–46, 2011. [184] A. Asplund, P. H. Edqvist, J. M. Schwenk, and F. Ponten. Antibodies for profiling the human proteome-the human protein atlas as a resource for cancer research. Proteomics, 12(13):2067–77, 2012. [185] F. Danielsson, M. Wiking, D. Mahdessian, M. Skogs, H. Ait Blal, M. Hjelmare, C. Stadler, M. Uhlen, and E. Lundberg. Rna deep sequencing as a tool for selection of cell lines for systematic subcellular localization of all human proteins. J Proteome Res, 12(1):299–307, 2013. [186] I. Matic, E. G. Jaffray, S. K. Oxenham, M. J. Groves, C. L. Barratt, S. Tauro, N. R. Stanley-Wall, and R. T. Hay. Absolute silac-compatible expression strain allows sumo-2 copy number determination in clinical samples. J Proteome Res, 10(10):4869–75, 2011. [187] J. Holzmann, P. Pichler, M. Madalinski, R. Kurzbauer, and K. Mechtler. Stoichiom- etry determination of the mp1-p14 complex using a novel and cost-efficient method to produce an equimolar mixture of standard peptides. Anal Chem, 81(24):10254– 61, 2009. [188] T. Sano, C. L. Smith, and C. R. Cantor. Immuno-pcr: very sensitive antigen detection by means of specific antibody-dna conjugates. Science, 258(5079):120–22, 1992. [189] S. F. Kingsmore and D. D. Patel. Multiplexed protein profiling on antibody-based microarrays by rolling circle amplification. Curr Opin Biotechnol, 14(1):74–81, 2003. [190] D. M. Rissin, C. W. Kan, T. G. Campbell, S. C. Howes, D. R. Fournier, L. Song, T. Piech, P. P. Patel, L. Chang, A. J. Rivnak, E. P. Ferrell, J. D. Randall, G. K. Provuncher, D. R. Walt, and D. C. Duffy. Single-molecule enzyme-linked im- munosorbent assay detects serum proteins at subfemtomolar concentrations. Nat Biotechnol, 28(6):595–9, 2010. [191] D. M. Rissin, C. W. Kan, L. Song, A. J. Rivnak, M. W. Fishburn, Q. Shao, T. Piech, E. P. Ferrell, R. E. Meyer, T. G. Campbell, D. R. Fournier, and D. C. Duffy. Multiplexed single molecule immunoassays. Lab Chip, 13(15):2902–11, 2013. [192] H. C. Tekin, M. Cornaglia, and M. A. Gijs. Attomolar protein detection using a magnetic bead surface coverage assay. Lab Chip, 13(6):1053–9, 2013.

111 Bibliography

[193] S. F. Kingsmore. Multiplexed protein measurement: technologies and applications of protein and antibody arrays. Nat Rev Drug Discov, 5(4):310–20, 2006. [194] R. P. Ekins. Multi-analyte immunoassay. J Pharm Biomed Anal, 7(2):155–68, 1989. [195] R. J. Fulton, R. L. McDade, P. L. Smith, L. J. Kienker, and Jr. Kettman, J. R. Advanced multiplexed analysis with the flowmetrix system. Clin Chem, 43(9):1749– 56, 1997. [196] D. A. Hall, J. Ptacek, and M. Snyder. Protein microarray technology. Mech Ageing Dev, 128(1):161–7, 2007. [197] K. Drobin, P. Nilsson, and J. M. Schwenk. Highly multiplexed antibody suspension bead arrays for plasma protein profiling. Methods Mol Biol, 1023:137–45, 2013. [198] D. Juncker, S. Bergeron, V. Laforte, and H. Li. Cross-reactivity in antibody mi- croarrays and multiplexed sandwich assays: shedding light on the dark side of mul- tiplexing. Curr Opin Chem Biol, 18:29–37, 2014. [199] S. Hu, Z. Xie, J. Qian, S. Blackshaw, and H. Zhu. Functional protein microarray technology. Wiley Interdiscip Rev Syst Biol Med, 3(3):255–68, 2011. [200] L. Perlee, J. Christiansen, R. Dondero, B. Grimwade, S. Lejnine, M. Mullenix, W. Shao, M. Sorette, V. Tchernev, D. Patel, and S. Kingsmore. Development and standardization of multiplexed antibody microarrays for use in quantitative proteomics. Proteome Sci, 2(1):9, 2004. [201] D. J. Phillips, S. C. League, P. Weinstein, and W. C. Hooper. Interference in micro- sphere flow cytometric multiplexed immunoassays for human cytokine estimation. Cytokine, 36(3-4):180–8, 2006. [202] N. C. dupont, K. Wang, P. D. Wadhwa, J. F. Culhane, and E. L. Nelson. Validation and comparison of luminex multiplex cytokine analysis kits with elisa: determina- tions of a panel of nine cytokines in clinical sample culture supernatants. J Reprod Immunol, 66(2):175–91, 2005. [203] L. Dossus, S. Becker, D. Achaintre, R. Kaaks, and S. Rinaldi. Validity of multiplex- based assays for cytokine measurements in serum and plasma from ”non-diseased” subjects: comparison with elisa. J Immunol Methods, 350(1-2):125–32, 2009. [204] J. A. Bastarache, T. Koyama, N. E. Wickersham, and L. B. Ware. Validation of a multiplex electrochemiluminescent immunoassay platform in human and mouse samples. J Immunol Methods, 2014. [205] M. F. Elshal and J. P. McCoy. Multiplex bead array assays: performance evaluation and comparison of sensitivity to elisa. Methods, 38(4):317–23, 2006. [206] M. Pla-Roca, R. F. Leulmi, S. Tourekhanova, S. Bergeron, V. Laforte, E. Moreau, S. J. Gosline, N. Bertos, M. Hallett, M. Park, and D. Juncker. Antibody colocal- ization microarray: a scalable technology for multiplex protein analysis in complex samples. Mol Cell Proteomics, 11(4):M111 011460, 2012. [207] J. M. Schwenk, U. Igel, M. Neiman, H. Langen, C. Becker, A. Bjartell, F. Pon- ten, F. Wiklund, H. Gronberg, P. Nilsson, and M. Uhlen. Toward next generation plasma profiling via heat-induced epitope retrieval and array-based assays. Mol Cell Proteomics, 9(11):2497–507, 2010.

112 Bibliography

[208] O. Poetz, T. Henzler, M. Hartmann, C. Kazmaier, M. F. Templin, T. Herget, and T. O. Joos. Sequential multiplex analyte capturing for phosphoprotein profiling. Mol Cell Proteomics, 9(11):2474–81, 2010. [209] S. Fredriksson, M. Gullberg, J. Jarvius, C. Olsson, K. Pietras, S. M. Gustafsdottir, A. Ostman, and U. Landegren. Protein detection using proximity-dependent dna ligation assays. Nat Biotechnol, 20(5):473–7, 2002. [210] M. Lundberg, S. B. Thorsen, E. Assarsson, A. Villablanca, B. Tran, N. Gee, M. Knowles, B. S. Nielsen, E. Gonzalez Couto, R. Martin, O. Nilsson, C. Fer- mer, J. Schlingemann, I. J. Christensen, H. J. Nielsen, B. Ekstrom, C. Andersson, M. Gustafsson, N. Brunner, J. Stenvang, and S. Fredriksson. Multiplexed homoge- neous proximity ligation assays for high-throughput protein biomarker research in serological material. Mol Cell Proteomics, 10(4):M110 004978, 2011. [211] M. Lundberg, A. Eriksson, B. Tran, E. Assarsson, and S. Fredriksson. Homogeneous antibody-based proximity extension assays provide sensitive and specific detection of low-abundant proteins in human blood. Nucleic Acids Res, 39(15):e102, 2011. [212] H. Zhang and D. W. Chan. Cancer biomarker discovery in plasma using a tissue- targeted proteomic approach. Cancer Epidemiol Biomarkers Prev, 16(10):1915–7, 2007. [213] J. R. Whiteaker, L. Zhao, H. Y. Zhang, L. C. Feng, B. D. Piening, L. Anderson, and A. G. Paulovich. Antibody-based enrichment of peptides on magnetic beads for mass-spectrometry-based quantification of serum biomarkers. Anal Biochem, 362(1):44–54, 2007. [214] E. Kuhn, T. Addona, H. Keshishian, M. Burgess, D. R. Mani, R. T. Lee, M. S. Sabatine, R. E. Gerszten, and S. A. Carr. Developing multiplexed assays for troponin i and interleukin-33 in plasma by peptide immunoaffinity enrichment and targeted mass spectrometry. Clin Chem, 55(6):1108–17, 2009. [215] A. N. Hoofnagle, J. O. Becker, M. H. Wener, and J. W. Heinecke. Quantifica- tion of thyroglobulin, a low-abundance serum protein, by immunoaffinity peptide enrichment and tandem mass spectrometry. Clin Chem, 54(11):1796–804, 2008. [216] J. R. Whiteaker, C. Lin, J. Kennedy, L. Hou, M. Trute, I. Sokal, P. Yan, R. M. Schoenherr, L. Zhao, U. J. Voytovich, K. S. Kelly-Spratt, A. Krasnoselsky, P. R. Gafken, J. M. Hogan, L. A. Jones, P. Wang, L. Amon, L. A. Chodosh, P. S. Nelson, M. W. McIntosh, C. J. Kemp, and A. G. Paulovich. A targeted proteomics-based pipeline for verification of biomarkers in plasma. Nat Biotechnol, 29(7):625–34, 2011. [217] H. Neubert, J. Gale, and D. Muirhead. Online high-flow peptide immunoaffin- ity enrichment and nanoflow lc-ms/ms: assay development for total salivary pepsin/pepsinogen. Clin Chem, 56(9):1413–23, 2010. [218] J. R. Whiteaker, L. Zhao, L. Anderson, and A. G. Paulovich. An automated and multiplexed method for high throughput peptide immunoaffinity enrichment and multiple reaction monitoring mass spectrometry-based quantification of protein biomarkers. Mol Cell Proteomics, 9(1):184–96, 2010.

113 Bibliography

[219] J. R. Whiteaker, L. Zhao, C. Lin, P. Yan, P. Wang, and A. G. Paulovich. Sequential multiplexed analyte quantification using peptide immunoaffinity enrichment coupled to mass spectrometry. Mol Cell Proteomics, 11(6):M111 015347, 2012. [220] N. L. Anderson, M. Razavi, T. W. Pearson, G. Kruppa, R. Paape, and D. Suckau. Precision of heavy-light peptide ratios measured by maldi-tof mass spectrometry. J Proteome Res, 11(3):1868–78, 2012. [221] E. Szajli, T. Feher, and K. F. Medzihradszky. Investigating the quantitative nature of maldi-tof ms. Mol Cell Proteomics, 7(12):2410–8, 2008. [222] M. Razavi, L. E. Frick, W. A. LaMarr, M. E. Pope, C. A. Miller, N. L. Ander- son, and T. W. Pearson. High-throughput siscapa quantitation of peptides from human plasma digests by ultrafast, liquid chromatography-free mass spectrometry. J Proteome Res, 11(12):5642–9, 2012. [223] E. N. Warren, J. Jiang, C. E. Parker, and C. H. Borchers. Absolute quantitation of cancer-related proteins using an ms-based peptide chip. Biotechniques, Suppl:7–11, 2005. [224] J. Jiang, C. E. Parker, K. A. Hoadley, C. M. Perou, G. Boysen, and C. H. Borchers. Development of an immuno tandem mass spectrometry (imaldi) assay for egfr diag- nosis. Proteomics Clin Appl, 1(12):1651–9, 2007. [225] J. Jiang, C. E. Parker, J. R. Fuller, T. H. Kawula, and C. H. Borchers. An im- munoaffinity tandem mass spectrometry (imaldi) assay for detection of francisella tularensis. Anal Chim Acta, 605(1):70–9, 2007. [226] J. D. Reid, D. T. Holmes, D. R. Mason, B. Shah, and C. H. Borchers. Towards the development of an immuno maldi (imaldi) mass spectrometry assay for the diagnosis of hypertension. J Am Soc Mass Spectrom, 21(10):1680–6, 2010. [227] D. R. Mason, J. D. Reid, A. G. Camenzind, D. T. Holmes, and C. H. Borchers. Duplexed imaldi for the detection of angiotensin i and angiotensin ii. Methods, 56(2):213–22, 2012. [228] A. G. Camenzind, J. G. van der Gugten, R. Popp, D. T. Holmes, and C. H. Borchers. Development and evaluation of an immuno-maldi (imaldi) assay for angiotensin i and the diagnosis of secondary hypertension. Clin Proteomics, 10(1):20, 2013. [229] F. Weiss, B. H. van den Berg, H. Planatscher, C. J. Pynn, T. O. Joos, and O. Poetz. Catch and measure-mass spectrometry-based immunoassays in biomarker research. Biochim Biophys Acta, 1844(5):927–32, 2014. [230] M. Uhlen and F. Ponten. Antibody-based proteomics for human tissue profiling. Mol Cell Proteomics, 4(4):384–93, 2005. [231] R. W. Nelson, J. R. Krone, A. L. Bieber, and P. Williams. Mass spectrometric immunoassay. Anal Chem, 67(7):1153–8, 1995. [232] U. A. Kiernan, D. Nedelkov, K. A. Tubbs, E. E. Niederkofler, and R. W. Nelson. Proteomic characterization of novel serum amyloid p component variants from hu- man plasma and urine. Proteomics, 4(6):1825–9, 2004.

114 Bibliography

[233] U. A. Kiernan, R. Addobbati, D. Nedelkov, and R. W. Nelson. Quantitative multiplexed c-reactive protein mass spectrometric immunoassay. J Proteome Res, 5(7):1682–7, 2006. [234] K. A. Tubbs, U. A. Kiernan, E. E. Niederkofler, D. Nedelkov, A. L. Bieber, and R. W. Nelson. Development of recombinant-based mass spectrometric immunoassay with application to resistin expression profiling. Anal Chem, 78(10):3271–6, 2006. [235] U. A. Kiernan, K. A. Tubbs, K. Gruber, D. Nedelkov, E. E. Niederkofler, P. Williams, and R. W. Nelson. High-throughput protein characterization using mass spectrometric immunoassay. Anal Biochem, 301(1):49–56, 2002. [236] M. F. Lopez, T. Rezai, D. A. Sarracino, A. Prakash, B. Krastins, M. Athanas, R. J. Singh, D. R. Barnidge, P. Oran, C. Borges, and R. W. Nelson. Selected re- action monitoring-mass spectrometric immunoassay responsive to parathyroid hor- mone and related variants. Clin Chem, 56(2):281–90, 2010. [237] A. Prakash, T. Rezai, B. Krastins, D. Sarracino, M. Athanas, P. Russo, H. Zhang, Y. Tian, Y. Li, V. Kulasingam, A. Drabovich, C. R. Smith, I. Batruch, P. E. Oran, C. Fredolini, A. Luchini, L. Liotta, E. Petricoin, E. P. Diamandis, D. W. Chan, R. Nelson, and M. F. Lopez. Interlaboratory reproducibility of selective reaction monitoring assays using multiple upfront analyte enrichment strategies. J Proteome Res, 11(8):3986–95, 2012. [238] R. W. Nelson, D. Nedelkov, K. A. Tubbs, and U. A. Kiernan. Quantitative mass spectrometric immunoassay of insulin like growth factor 1. J Proteome Res, 3(4):851–5, 2004. [239] P. E. Oran, J. W. Jarvis, C. R. Borges, N. D. Sherma, and R. W. Nelson. Mass spectrometric immunoassay of intact insulin and related variants for population proteomics studies. Proteomics Clin Appl, 5(7-8):454–9, 2011. [240] O. Trenchevska and D. Nedelkov. Targeted quantitative mass spectrometric im- munoassay for human protein variants. Proteome Sci, 9(1):19, 2011. [241] O. Trenchevska, E. Kamcheva, and D. Nedelkov. Mass spectrometric immunoas- say for quantitative determination of transthyretin and its variants. Proteomics, 11(18):3633–41, 2011. [242] U. Qundos, H. Johannesson, C. Fredolini, G. O’Hurley, R. Branca, M. Uhlén, F. Wiklund, A. Bjartell, P. Nilsson, and J. M. Schwenk. Analysis of plasma from prostate cancer patients links decreased carnosine dipeptidase 1 levels to lymph node metastasis. Translational Proteomics, 2(0):14–24, 2014. [243] M. Selbach and M. Mann. Protein interaction screening by quantitative immuno- precipitation combined with knockdown (quick). Nat Methods, 3(12):981–3, 2006. [244] A. Malovannaya, Y. Li, Y. Bulynko, S. Y. Jung, Y. Wang, R. B. Lanz, B. W. O’Malley, and J. Qin. Streamlined analysis schema for high-throughput identifica- tion of endogenous protein complexes. Proc Natl Acad Sci U S A, 107(6):2431–6, 2010.

115 Bibliography

[245] A. Malovannaya, R. B. Lanz, S. Y. Jung, Y. Bulynko, N. T. Le, D. W. Chan, C. Ding, Y. Shi, N. Yucer, G. Krenciute, B. J. Kim, C. Li, R. Chen, W. Li, Y. Wang, B. W. O’Malley, and J. Qin. Analysis of the human endogenous coregulator complexome. Cell, 145(5):787–99, 2011. [246] N. C. Hubner, A. W. Bird, J. Cox, B. Splettstoesser, P. Bandilla, I. Poser, A. Hyman, and M. Mann. Quantitative proteomics combined with bac transgeneomics reveals in vivo protein interactions. J Cell Biol, 189(4):739–54, 2010. [247] Y. Ho, A. Gruhler, A. Heilbut, G. D. Bader, L. Moore, S. L. Adams, A. Millar, P. Taylor, K. Bennett, K. Boutilier, L. Yang, C. Wolting, I. Donaldson, S. Schan- dorff, J. Shewnarane, M. Vo, J. Taggart, M. Goudreault, B. Muskat, C. Alfarano, D. Dewar, Z. Lin, K. Michalickova, A. R. Willems, H. Sassi, P. A. Nielsen, K. J. Rasmussen, J. R. Andersen, L. E. Johansen, L. H. Hansen, H. Jespersen, A. Podtele- jnikov, E. Nielsen, J. Crawford, V. Poulsen, B. D. Sorensen, J. Matthiesen, R. C. Hendrickson, F. Gleeson, T. Pawson, M. F. Moran, D. Durocher, M. Mann, C. W. Hogue, D. Figeys, and M. Tyers. Systematic identification of protein complexes in saccharomyces cerevisiae by mass spectrometry. Nature, 415(6868):180–3, 2002. [248] M. Varjosalo, R. Sacco, A. Stukalov, A. van Drogen, M. Planyavsky, S. Hauri, R. Ae- bersold, K. L. Bennett, J. Colinge, M. Gstaiger, and G. Superti-Furga. Interlabora- tory reproducibility of large-scale human protein-complex analysis by standardized ap-ms. Nat Methods, 10(4):307–14, 2013. [249] G. Rigaut, A. Shevchenko, B. Rutz, M. Wilm, M. Mann, and B. Seraphin. A generic protein purification method for protein complex characterization and proteome ex- ploration. Nat Biotechnol, 17(10):1030–2, 1999. [250] A. C. Gavin, M. Bosche, R. Krause, P. Grandi, M. Marzioch, A. Bauer, J. Schultz, J. M. Rick, A. M. Michon, C. M. Cruciat, M. Remor, C. Hofert, M. Schelder, M. Brajenovic, H. Ruffner, A. Merino, K. Klein, M. Hudak, D. Dickson, T. Rudi, V. Gnau, A. Bauch, S. Bastuck, B. Huhse, C. Leutwein, M. A. Heurtier, R. R. Cop- ley, A. Edelmann, E. Querfurth, V. Rybin, G. Drewes, M. Raida, T. Bouwmeester, P. Bork, B. Seraphin, B. Kuster, G. Neubauer, and G. Superti-Furga. Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature, 415(6868):141–7, 2002. [251] R. Kittler, L. Pelletier, C. Ma, I. Poser, S. Fischer, A. A. Hyman, and F. Buch- holz. Rna interference rescue by bacterial artificial chromosome transgenesis in mammalian tissue culture cells. Proc Natl Acad Sci U S A, 102(7):2396–401, 2005. [252] T. Berggard, S. Linse, and P. James. Methods for the detection and analysis of protein-protein interactions. Proteomics, 7(16):2833–42, 2007. [253] T. P. Hopp, K. S. Prickett, V. L. Price, R. T. Libby, C. J. March, D. P. Cerretti, D. L. Urdal, and P. J. Conlon. A short polypeptide marker sequence useful for recombinant protein identification and purification. Bio-Technology, 6(10):1204– 1210, 1988. [254] I. Poser, M. Sarov, J. R. Hutchins, J. K. Heriche, Y. Toyoda, A. Pozniakovsky, D. Weigl, A. Nitzsche, B. Hegemann, A. W. Bird, L. Pelletier, R. Kittler, S. Hua,

116 Bibliography

R. Naumann, M. Augsburg, M. M. Sykora, H. Hofemeister, Y. Zhang, K. Nasmyth, K. P. White, S. Dietzel, K. Mechtler, R. Durbin, A. F. Stewart, J. M. Peters, F. Buchholz, and A. A. Hyman. Bac transgeneomics: a high-throughput method for exploration of protein function in mammals. Nat Methods, 5(5):409–15, 2008. [255] L. Trinkle-Mulcahy, S. Boulon, Y. W. Lam, R. Urcia, F. M. Boisvert, F. Vandermo- ere, N. A. Morrice, S. Swift, U. Rothbauer, H. Leonhardt, and A. Lamond. Identi- fying specific protein interaction partners using quantitative mass spectrometry and bead proteomes. J Cell Biol, 183(2):223–39, 2008. [256] S. Boulon, Y. Ahmad, L. Trinkle-Mulcahy, C. Verheggen, A. Cobley, P. Gregor, E. Bertrand, M. Whitehorn, and A. I. Lamond. Establishment of a protein frequency library and its application in the reliable identification of specific protein interaction partners. Mol Cell Proteomics, 9(5):861–79, 2010. [257] D. Mellacheruvu, Z. Wright, A. L. Couzens, J. P. Lambert, N. A. St-Denis, T. Li, Y. V. Miteva, S. Hauri, M. E. Sardiu, T. Y. Low, V. A. Halim, R. D. Bagshaw, N. C. Hubner, A. Al-Hakim, A. Bouchard, D. Faubert, D. Fermin, W. H. Dunham, M. Goudreault, Z. Y. Lin, B. G. Badillo, T. Pawson, D. Durocher, B. Coulombe, R. Aebersold, G. Superti-Furga, J. Colinge, A. J. Heck, H. Choi, M. Gstaiger, S. Mohammed, I. M. Cristea, K. L. Bennett, M. P. Washburn, B. Raught, R. M. Ewing, A. C. Gingras, and A. I. Nesvizhskii. The crapome: a contaminant repository for affinity purification-mass spectrometry data. Nat Methods, 10(8):730–6, 2013. [258] A. J. Tackett, J. A. DeGrasse, M. D. Sekedat, M. Oeffinger, M. P. Rout, and B. T. Chait. I-dirt, a general method for distinguishing between specific and nonspecific protein interactions. J Proteome Res, 4(5):1752–6, 2005. [259] M. Vermeulen, N. C. Hubner, and M. Mann. High confidence determination of specific protein-protein interactions using quantitative mass spectrometry. Curr Opin Biotechnol, 19(4):331–7, 2008. [260] R. M. Kaake, X. Wang, and L. Huang. Profiling of protein interaction networks of protein complexes using affinity purification and quantitative mass spectrometry. Mol Cell Proteomics, 9(8):1650–65, 2010. [261] M. Hilger and M. Mann. Triple silac to determine stimulus specific interactions in the wnt pathway. J Proteome Res, 11(2):982–94, 2012. [262] J. V. Olsen and M. Mann. Status of large-scale analysis of post-translational mod- ifications by mass spectrometry. Mol Cell Proteomics, 12(12):3444–52, 2013. [263] J. Rush, A. Moritz, K. A. Lee, A. Guo, V. L. Goss, E. J. Spek, H. Zhang, X. M. Zha, R. D. Polakiewicz, and M. J. Comb. Immunoaffinity profiling of tyrosine phosphorylation in cancer cells. Nat Biotechnol, 23(1):94–101, 2005. [264] G. Zhang and T. A. Neubert. Use of detergents to increase selectivity of immuno- precipitation of tyrosine phosphorylated peptides prior to identification by maldi quadrupole-tof ms. Proteomics, 6(2):571–8, 2006. [265] J. Villen, S. A. Beausoleil, S. A. Gerber, and S. P. Gygi. Large-scale phosphorylation analysis of mouse liver. Proc Natl Acad Sci U S A, 104(5):1488–93, 2007.

117 Bibliography

[266] A. R. Salomon, S. B. Ficarro, L. M. Brill, A. Brinker, Q. T. Phung, C. Ericson, K. Sauer, A. Brock, D. M. Horn, P. G. Schultz, and E. C. Peters. Profiling of tyrosine phosphorylation pathways in human cells using mass spectrometry. Proc Natl Acad Sci U S A, 100(2):443–8, 2003. [267] M. O. Collins, L. Yu, and J. S. Choudhary. Analysis of protein phosphorylation on a proteome-scale. Proteomics, 7(16):2751–68, 2007. [268] M. Fujimuro, H. Sawada, and H. Yokosawa. Production and characterization of monoclonal antibodies specific to multi-ubiquitin chains of polyubiquitinated pro- teins. FEBS Lett, 349(2):173–80, 1994. [269] M. Fujimuro and H. Yokosawa. Production of antipolyubiquitin monoclonal anti- bodies and their use for characterization and isolation of polyubiquitinated proteins. Methods Enzymol, 399:75–86, 2005. [270] M. Matsumoto, S. Hatakeyama, K. Oyamada, Y. Oda, T. Nishimura, and K. I. Nakayama. Large-scale analysis of the human ubiquitin-related proteome. Pro- teomics, 5(16):4145–51, 2005. [271] P. Schwertman, K. Bezstarosti, C. Laffeber, W. Vermeulen, J. A. Demmers, and J. A. Marteijn. An immunoaffinity purification method for the proteomic analysis of ubiquitinated protein complexes. Anal Biochem, 440(2):227–36, 2013. [272] S. A. Wagner, P. Beli, B. T. Weinert, M. L. Nielsen, J. Cox, M. Mann, and C. Choud- hary. A proteome-wide, quantitative survey of in vivo ubiquitylation sites reveals widespread regulatory roles. Mol Cell Proteomics, 10(10):M111 013284, 2011. [273] D. L. Swaney, P. Beltrao, L. Starita, A. Guo, J. Rush, S. Fields, N. J. Krogan, and J. Villen. Global analysis of phosphorylation and ubiquitylation cross-talk in protein degradation. Nat Methods, 10(7):676–82, 2013. [274] K. Newton, M. L. Matsumoto, I. E. Wertz, D. S. Kirkpatrick, J. R. Lill, J. Tan, D. Dugger, N. Gordon, S. S. Sidhu, F. A. Fellouse, L. Komuves, D. M. French, R. E. Ferrando, C. Lam, D. Compaan, C. Yu, I. Bosanac, S. G. Hymowitz, R. F. Kelley, and V. M. Dixit. Ubiquitin chain editing revealed by polyubiquitin linkage-specific antibodies. Cell, 134(4):668–78, 2008. [275] M. L. Matsumoto, K. E. Wickliffe, K. C. Dong, C. Yu, I. Bosanac, D. Bustos, L. Phu, D. S. Kirkpatrick, S. G. Hymowitz, M. Rape, R. F. Kelley, and V. M. Dixit. K11-linked polyubiquitination in cell cycle control revealed by a k11 linkage-specific antibody. Mol Cell, 39(3):477–84, 2010. [276] K. L. Guan, W. Yu, Y. Lin, Y. Xiong, and S. Zhao. Generation of acetyllysine antibodies and affinity enrichment of acetylated peptides. Nat Protoc, 5(9):1583–95, 2010. [277] Y. Y. Wang, J. Yao, P. Y. Yang, C. H. Deng, and H. Z. Fan. Immobilization of antibodies on magnetic carbonaceous microspheres for selective enrichment of lysine- acetylated proteins and peptides. Chinese Journal of , 30(10):2549–2555, 2012. [278] F. M. Boisvert, J. Cote, M. C. Boulanger, and S. Richard. A proteomic analysis of arginine-methylated protein complexes. Mol Cell Proteomics, 2(12):1319–30, 2003.

118 Bibliography

[279] A. Guo, H. Gu, J. Zhou, D. Mulhern, Y. Wang, K. A. Lee, V. Yang, M. Aguiar, J. Kornhauser, X. Jia, J. Ren, S. A. Beausoleil, J. C. Silva, V. Vemulapalli, M. T. Bedford, and M. J. Comb. Immunoaffinity enrichment and mass spectrometry anal- ysis of protein methylation. Mol Cell Proteomics, 13(1):372–87, 2014. [280] G. Nikov, V. Bhat, J. S. Wishnok, and S. R. Tannenbaum. Analysis of nitrated pro- teins by nitrotyrosine-specific affinity probes and mass spectrometry. Anal Biochem, 320(2):214–22, 2003. [281] X. Zhan and D. M. Desiderio. Nitroproteins from a human pituitary adenoma tissue discovered with a nitrotyrosine affinity column and tandem mass spectrometry. Anal Biochem, 354(2):279–89, 2006. [282] C. Wingren, P. James, and C. A. Borrebaeck. Strategy for surveying the proteome using affinity proteomics and mass spectrometry. Proteomics, 9(6):1511–7, 2009. [283] O. Poetz, S. Hoeppe, M. F. Templin, D. Stoll, and T. O. Joos. Proteome wide screening using peptide affinity capture. Proteomics, 9(6):1518–23, 2009. [284] S. Hoeppe, T. D. Schreiber, H. Planatscher, A. Zell, M. F. Templin, D. Stoll, T. O. Joos, and O. Poetz. Targeting peptide termini, a novel immunoaffinity approach to reduce complexity in mass spectrometric protein identification. Mol Cell Proteomics, 10(2):M110 002857, 2011. [285] N. Lion, F. Reymond, H. H. Girault, and J. S. Rossier. Why the move to microflu- idics for protein analysis? Curr Opin Biotechnol, 15(1):31–7, 2004. [286] A. Rios, A. Escarpa, and B. Simonet. Miniaturization in analytical chemistry, 2009. [287] S. C. Terry, J. H. Jerman, and J. B. Angell. A gas chromatographic air analyzer fabricated on a silicon wafer. Electron Devices, IEEE Transactions on, 26(12):1880– 1886, 1979. [288] M. J. Madou. Fundamentals of microfabrication : the science of miniaturization. CRC, Boca Raton, Fla., 2. edition, 2002. [289] M. J. Madou and R. Cubicciotti. Scaling issues in chemical and biological sensors. Proceedings of the IEEE, 91(6):830–838, 2003. [290] E. K. Sackmann, A. L. Fulton, and D. J. Beebe. The present and future role of microfluidics in biomedical research. Nature, 507(7491):181–9, 2014. [291] N. T. Nguyen and Z. Wu. Micromixers—a review. Journal of Micromechanics and Microengineering, 15(2):R1, 2005. [292] G. H. W. Sanders and A. Manz. Chip-based microsystems for genomic and proteomic analysis. Trac-Trends in Analytical Chemistry, 19(6):364–378, 2000. [293] X. Feng, W. Du, Q. Luo, and B. F. Liu. Microfluidic chip: next-generation platform for systems biology. Anal Chim Acta, 650(1):83–97, 2009. [294] V. Gubala, L. F. Harris, A. J. Ricco, M. X. Tan, and D. E. Williams. Point of care diagnostics: status and future. Anal Chem, 84(2):487–515, 2012. [295] G. M. Whitesides. The origins and the future of microfluidics. Nature, 442(7101):368–73, 2006.

119 Bibliography

[296] J. Gao, J. Xu, L. E. Locascio, and C. S. Lee. Integrated microfluidic system en- abling protein digestion, peptide separation, and protein identification. Anal Chem, 73(11):2648–55, 2001. [297] L. J. Jin, J. Ferrance, J. C. Sanders, and J. P. Landers. A microchip-based pro- teolytic digestion system driven by electroosmotic pumping. Lab Chip, 3(1):11–8, 2003. [298] Y. Li, R. Wojcik, and N. J. Dovichi. A replaceable microreactor for on-line protein digestion in a two-dimensional capillary electrophoresis system with tandem mass spectrometry detection. J Chromatogr A, 1218(15):2007–11, 2011. [299] J. Ji, L. Nie, L. Qiao, Y. Li, L. Guo, B. Liu, P. Yang, and H. H. Girault. Proteolysis in microfluidic droplets: an approach to interface protein separation and peptide mass spectrometry. Lab Chip, 12(15):2625–9, 2012. [300] J. Li, T. LeRiche, T. L. Tremblay, C. Wang, E. Bonneil, D. J. Harrison, and P. Thibault. Application of microfluidic devices to proteomics research: identifi- cation of trace-level protein digests and affinity capture of target peptides. Mol Cell Proteomics, 1(2):157–68, 2002. [301] B. E. Slentz, N. A. Penner, and F. E. Regnier. Protein proteolysis and the multi- dimensional electrochromatographic separation of histidine-containing peptide frag- ments on a chip. J Chromatogr A, 984(1):97–107, 2003. [302] B. Adler, T. Laurell, and S. Ekstrom. Optimizing nanovial outlet designs for im- proved solid-phase extraction in the integrated selective enrichment target–iset. Electrophoresis, 33(21):3143–50, 2012. [303] S. Ekstrom, L. Wallman, J. Malm, C. Becker, H. Lilja, T. Laurell, and G. Marko- Varga. Integrated selective enrichment target–a microtechnology platform for matrix-assisted laser desorption/ionization-mass spectrometry applied on protein biomarkers in prostate diseases. Electrophoresis, 25(21-22):3769–77, 2004. [304] S. Ekstrom, L. Wallman, D. Hok, G. Marko-Varga, and T. Laurell. Miniaturized solid-phase extraction and sample preparation for maldi ms using a microfabricated integrated selective enrichment target. J Proteome Res, 5(5):1071–81, 2006. [305] H. Yan, A. Ahmad-Tajudin, M. Bengtsson, S. Xiao, T. Laurell, and S. Ekstrom. Noncovalent antibody immobilization on porous silicon combined with miniatur- ized solid-phase extraction (spe) for array based immunomaldi assays. Anal Chem, 83(12):4942–8, 2011. [306] S. Ekstrom, L. Wallman, G. Helldin, J. Nilsson, G. Marko-Varga, and T. Laurell. Polymeric integrated selective enrichment target (iset) for solid-phase-based sample preparation in maldi-tof ms. J Mass Spectrom, 42(11):1445–52, 2007. [307] A. Ahmad-Tajudin, B. Adler, S. Ekstrom, G. Marko-Varga, J. Malm, H. Lilja, and T. Laurell. Maldi-target integrated platform for affinity-captured protein digestion. Anal Chim Acta, 807:1–8, 2014. [308] S. J. Lee, B. Adler, S. Ekstrom, M. Rezeli, A. Vegvari, J. W. Park, J. Malm, and T. Laurell. Aptamer/iset-ms: a new affinity-based maldi technique for improved detection of biomarkers. Anal Chem, 86(15):7627–34, 2014.

120 Bibliography

[309] N. S. Berrow, K. Bussow, B. Coutard, J. Diprose, M. Ekberg, G. E. Folkers, N. Levy, V. Lieu, R. J. Owens, Y. Peleg, C. Pinaglia, S. Quevillon-Cheruel, L. Salim, C. Sche- ich, R. Vincentelli, and D. Busso. Recombinant protein expression and solubility screening in escherichia coli: a comparative study. Acta Crystallogr D Biol Crystal- logr, 62(Pt 10):1218–26, 2006. [310] O. Brodsky and C. N. Cronin. Economical parallel protein expression screening and scale-up in escherichia coli. J Struct Funct Genomics, 7(2):101–8, 2006. [311] R. Vincentelli, A. Cimino, A. Geerlof, A. Kubo, Y. Satou, and C. Cambillau. High- throughput protein expression screening and purification in escherichia coli. Meth- ods, 55(1):65–72, 2011. [312] J. Panula-Perala, J. Siurkus, A. Vasala, R. Wilmanowski, M. G. Casteleijn, and P. Neubauer. Enzyme controlled glucose auto-delivery for high cell density cultiva- tions in microplates and shake flasks. Microb Cell Fact, 7:31, 2008. [313] K. F. Wlaschin and W. S. Hu. Fedbatch culture and dynamic nutrient feeding. Adv Biochem Eng Biotechnol, 101:43–74, 2006. [314] J. C. Marioni, C. E. Mason, S. M. Mane, M. Stephens, and Y. Gilad. Rna-seq: an assessment of technical reproducibility and comparison with gene expression arrays. Genome Res, 18(9):1509–17, 2008. [315] Z. Wang, M. Gerstein, and M. Snyder. Rna-seq: a revolutionary tool for transcrip- tomics. Nat Rev Genet, 10(1):57–63, 2009. [316] R. Lichtinghagen, P. B. Musholt, M. Lein, A. Romer, B. Rudolph, G. Kristiansen, S. Hauptmann, D. Schnorr, S. A. Loening, and K. Jung. Different mrna and protein expression of matrix metalloproteinases 2 and 9 and tissue inhibitor of metallopro- teinases 1 in benign and malignant prostate tissue. Eur Urol, 42(4):398–406, 2002. [317] J. An, J. Shi, Q. He, K. Lui, Y. Liu, Y. Huang, and M. S. Sheikh. Chcm1/chchd6, novel mitochondrial protein linked to regulation of mitofilin and mitochondrial cristae morphology. J Biol Chem, 287(10):7411–26, 2012. [318] J. Xie, M. F. Marusich, P. Souda, J. Whitelegge, and R. A. Capaldi. The mitochon- drial inner membrane protein mitofilin exists as a complex with sam50, metaxins 1 and 2, coiled-coil-helix coiled-coil-helix domain-containing protein 3 and 6 and dnajc11. FEBS Lett, 581(18):3545–9, 2007.

121