Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Medicine 1055

DNA-Assisted Immunoassays for High-Performance Analyses

JUNHONG YAN

ACTA UNIVERSITATIS UPSALIENSIS ISSN 1651-6206 ISBN 978-91-554-9114-7 UPPSALA urn:nbn:se:uu:diva-236591 2014 Dissertation presented at Uppsala University to be publicly examined in B41, Biomedicalcenter, Husargatan 3, 75018, Uppsala, Friday, 16 January 2015 at 09:15 for the degree of Doctor of Philosophy (Faculty of Medicine). The examination will be conducted in English. Faculty examiner: Professor Christer Wingren (Lund University).

Abstract Yan, J. 2014. DNA-Assisted Immunoassays for High-Performance Protein Analyses. Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Medicine 1055. 53 pp. Uppsala: Acta Universitatis Upsaliensis. ISBN 978-91-554-9114-7.

Proteins play important roles in most cellular functions, such as, replication, transcription regulation, signal transduction, for catalyzing chemical reaction, etc. Technologies developed to identify rely either on observing their own properties such as charge, size, mass to charge ratio or sequence composition; or on using affinity reagents that recognize specific protein targets. Immunoassays utilizing functionalized affinity reagents are powerful for targeted proteomics. Among them, DNA-assisted immunoassays in which affinity reagents are labeled with DNA molecules, offer some unique advantages. In this thesis, I will present works to improve current DNA-assisted immunoassays such as proximity ligation assays (PLA), as well as to take advantage of DNA reactions to adress other problems. In paper I, a new solid support (MBC-Ts) was functionalized with and used in the solid-phase PLA for detection of VEGF. The assay using MBC-Ts was compared among the commercially available solid supports in different matrices and it was shown to exhibit enhanced limit of detection in complex matrices. In paper II, a two-step protocol was described to prepare high-quality probes used in homogeneous and in situ PLA by purifying DNA-labeled affinity reagents from unconjugated affinity reagents and excess oligonucleotides. In paper III, PLA was applied on a capillary western blotting instrument so that both the sensitivity and specificity of the original assay were improved. In paper IV, a new method was introduced to profile protein components in individual protein complexes by DNA-barcoded antibodies. This method has been used to profile protein complexes such as surface proteins on individual secreted vesicles.

Keywords: proximity ligation assay, magnetic beads, conjugation, phosphorylation, affinity binder, protein complexes, recombinant binder, sensitivity, specificity

Junhong Yan, Department of Immunology, Genetics and Pathology, Rudbecklaboratoriet, Uppsala University, SE-751 85 Uppsala, Sweden. Science for Life Laboratory, SciLifeLab, Box 256, Uppsala University, SE-75105 Uppsala, Sweden.

© Junhong Yan 2014

ISSN 1651-6206 ISBN 978-91-554-9114-7 urn:nbn:se:uu:diva-236591 (http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-236591) A very great deal more truth can become known than can be proven. — Richard Phillips Feynman

To my family and friends

List of Papers

This thesis is based on the following papers, which are referred to in the text by their Roman numerals.

I Yan, J., Horák, D., Lenfeld, J., Hammond, M., Kamali- Moghaddam, M. (2013) A tosyl-activated magnetic bead cellu- lose as solid support for sensitive protein detection. J Biotechnol. 167(3):235-40.

II Yan, J*., Gu, GJ*., Jost, C., Hammond, M., Plückthun, A., Landegren, U., Kamali-Moghaddam, M. (2014) A universal ap- proach to prepare reagents for DNA-assisted protein analysis. PLoS ONE 9(9): e108061. doi:10.1371/journal.pone.0108061, *equal contribution

III Yan, J., Padhan, N., Zieba, A., Gu, GJ., Pontén, F., Boge, A., Scrivener, E., Kamali-Moghaddam M., Claesson-Welsh, L., Landegren, U. Highly sensitive and specific protein and isoforms detection via proximity ligation in capillary westerns. Manuscript.

IV Wu, D., Yan, J., Sun, Y., Ronquist, G., Cavalli, M., Oelrich, J., Wadelius, W., Landegren, U., Kamali-Moghaddam, M. Profil- ing individual protein complexes by proximity-dependent bar- coding. Manuscript.

Reprints were made with permission from the respective publishers. Related work by the author

Tavoosidana, G., Ronquist, G., Darmanis, S., Yan, J., Carlsson, L., Wu, D., Conze, T., Ek, P., Semjonow, A., Eltze, E., Larsson, A., Landegren, U., Ka- mali-Moghaddam, M. (2011) Multiple recognition assay reveals prostasomes as promising plasma biomarkers for prostate cancer. Proc Natl Acad Sci USA.108 (21): 8809-8814.

Nong, R.Y., Wu, D., Yan, J., Hammond, M., Gu, G.J., Kamali-Moghaddam, M., Landegren, U., Darmanis, S. (2013) Solid-phase proximity ligation as- says for individual or parallel protein analyses with readout via real-time PCR or sequencing. Nat Protoc. 8(6):1234-48.

Contents

Introduction ...... 11 Protein detection technologies...... 12 2D electrophoresis — seeing is believing ...... 12 Mass spectrometry — masses don’t lie ...... 13 Immunoassays — needles in haystacks ...... 14 Immunoblot ...... 15 ELISA ...... 16 Highly-multiplexed immunoassays ...... 17 DNA-assisted immunoassays ...... 20 Affinity reagents ...... 24 Emerging areas requiring new technology ...... 31 Single-cell proteomics — less is more ...... 31 Resolving protein-protein interactions — pair-wise is not enough ..... 32 Present investigations ...... 35 Paper I – A tosyl-activated magnetic bead cellulose as solid support for sensitive protein detection ...... 35 Paper II – A universal approach to prepare reagents for DNA-assisted protein analysis ...... 36 Paper III – Highly sensitive and specific protein and isoforms detection via proximity ligation in capillary westerns ...... 37 Paper IV – Profiling individual protein complexes by proximity- dependent barcoding ...... 38 Future perspectives ...... 41 Acknowledgements ...... 43 References ...... 47

Abbreviations

2DE Two-dimensional electrophoresis BG Benzylguanine BSA Bovine serum albumin CoIP Co- CSF Cerebrospinal fluid CV Coefficient of variation CyTOF Cytometry time of flight DARPin Designed ankyrin repeat protein DNA Deoxyribonucleic acid ECL Electrochemiluminescent EGFR Epidermal growth factor receptor ELISA Enzyme-linked immunosorbent assay ERK Extracellular signal-regulated kinases ESI Electrospray ionization HRP Horseradish peroxidase IF Isoelectric focusing IgG Immunoglobulin G in situ PLA In situ proximity ligation assay LC Liquid chromatography LOD Limit of detection MALDI Matrix-assisted laser desorption/ionization mRNA Messenger ribonucleic acid MS Mass spectrometry PAGE Polyacrylamide gel PCR Polymerase chain reaction PEA Proximity extension assay pI Isoelectric point PLA Proximity ligation assay PPI Protein-protein interaction PSA Prostate specific

qPCR Real-time polymerase chain reaction RCA Rolling circle amplification RNA Ribonucleic acid ScFv Single chain fragment variables SDS Sodium dodecyl sulfate SELEX Systematic evolution of ligands by exponential enrichment SOMAmer Slow off-rate modified aptamers sp-PLA Solid phase proximity ligation assay SRM Selected reaction monitoring tRNA Transfer ribonucleic acid VEGF Vascular endothelial growth factor Y2H Yeast two-hybrid system

Introduction

Before going into the details about the specific area of my work, I would like to give an overview about the history and current state of methodology in biology. In the past century, biologists together with chemists and physicists have tried to understand the complex biological system by reductionism methodology, by breaking down the system into the lowest possible levels and studying individual components that compose the whole system. The biochemical approach to purify a molecule species of interest from the bio- logical system allows characterization of the molecule in the first place. How individual proteins function in organisms can be elucidated by reverse genet- ics on different levels such as RNA interference, knock-out mouse models, and CRISPR/Cas 9. This dominant methodology, usually hypothesis-driven, has provided a lot of knowledge for understanding how the biological sys- tems work. However, biological systems are obviously more than the sum of its isolated components [1]. Recent technological developments allow stud- ies of biological systems in a comprehensive way through generating “omic” data by high-throughput methods and computational modeling to define the components of the system and how the components work together to accom- plish different tasks [2-4] (Figure 1).

DNA RNA Protein

Data Observation Signaling acquisition Genome X-ray crystallography measurement pathway Trancriptome Reverse-genetics Proteome Functional genomics Cell Interactome Metabolome

Tissue Model Conclusion Hypothesis Organ Analysis

Organism Reductionism Systems biology

Figure 1. General overview of molecular biological research methodology with reductionism and systems biology point of view.

11 Now I would like to continue with the class of molecules that my thesis work focuses on. Proteins, the end products of the information flow indi- cated by the central dogma, are the building blocks and functional compo- nents of a cell. Composed of 20 different amino acids in one or more linear chains, proteins form varied, intricate structures to carry out numerous cellu- lar functions to serve as structural components of a cell, as sensors for ex- tracellular signals or signal molecules themselves, as transporters to move substances across plasma membranes, as regulators to control gene expres- sion, as enzymes to catalyze chemical reactions, etc. Elaborate cellular pathways guarantee that a certain amount of specific proteins will be ex- pressed at a certain time and location, either intracellularly or secreted into extracellular space, to fulfill their destinies. Studies of proteins provide fun- damental knowledge of cellular pathway and insights into disease onset from deregulated cellular pathways. Due to the complexity of protein composi- tion, qualitative and quantitative studies of proteins are proven to be much more difficult than analyses at levels of nucleic acids. In addition, proteins are usually posttranslationally modified including through phosphorylation, acetylation and ubiquitination to be functional. Furthermore, they often work together either with themselves or with nucleic acids to form complexes such as the transcription initiation machinery [5]. Studies of individual proteins or one class of proteins is not enough so the concept proteomics was first brought up by Marc R. Wilkins in 1994, inspired by genomics, to refer to large-scale analysis of proteins in regards to expression level across time and location, post-translational modification and interaction [6].

Protein detection technologies Early technologies to study proteins are mainly focused on the biochemical traits of the molecule itself, such as size and charge. Modern protein tech- nologies are the joint effort of different technology areas in order to push the detection limit so as to detect rare molecules, detect several proteins at the same time in order to reveal a more comprehensive picture of the proteome, and simplify the handling procedures for clinical routine use. However, no methods are perfect as each has its strengths and weaknesses so that it fits certain applications. Here, I would like to discuss a few protein technologies from early times and also the current state-of-the-art methods of proteomics.

2D electrophoresis — seeing is believing Proteins can be separated and characterized by electrophoresis according to their amino acids composition and side chains in the amino acids as they differ in size and charge. First attempt of protein separation by moving

12 boundary electrophoresis was made by Arne Tiselius in 1937 using serum proteins with a U-shaped apparatus filled with separation buffer acetate and phosphate in the vertical arms [7]. Later, proteins were more often separated in a cross-linked polymer, polyacrylamide gel electrophoresis (PAGE), which acts as a molecular sieve, where the migration of proteins relies on their charge-to-mass ratio and shape. Detergent like Sodium dodecyl sulfate (SDS) is usually applied during electrophoresis to impart proteins with nega- tive charges regardless of their intrinsic charges, and to partially denature the proteins to form rodlike shapes. Thus separation of proteins by SDS-PAGE will mostly depend on the molecular weight, with smaller proteins migrating faster in the gel. Isoelectric focusing (IF) is a procedure to determine the isoelectric point (pI) of a protein. Proteins with different charges migrate in an immobilized pH gradient gel made from low molecular organic acids and bases (ampholytes) until they reach the pH that matches their pI. In two- dimensional electrophoresis (2DE), proteins are first separated by IF in a thin strip gel and then the strip is laid on top of a slab-shape SDS-PAGE to separate orthogonally for later staining with different dyes such as Coomassie blue or silver staining so that the protein can be visualized as spots. The original protein components are resolved in two dimensions with horizontal separation reflecting the pI and the vertical separation indicating the molecular weight, by procedures that were first demonstrated by Kenrick & Margolis in the early 1970s [8].

Mass spectrometry — masses don’t lie Similar to the 2DE, mass spectrometry (MS) is another major technology in protein identification, which leads to many discoveries in complex samples. The principle of MS is that molecules to be analyzed are first ionized in a vacuum and then introduced into an electro-magnetic field. Their flying path and time of flight is related to their mass to charge ratio. The advantages of this technique lie in the capacity to identify many different proteins, peptides and post-translational modifications (PTMs) with high accuracy. However, limited sensitivity is a major drawback in MS-based proteomics technology [9]. In early days of MS-based proteomic studies, proteins mixture either ob- tained by lysis of cells and tissues or directly from liquid samples such as plasma are first separated using 2DE and then protein spots in the gel are further analyzed using MS. In modern times, samples are usually first enzy- matically digested to generate short peptides before they are subjected to fractionation techniques such as ion-exchange liquid chromatography (LC) followed by analysis in a mass spectrometer. After peptides are introduced to the mass spectrometer, different ionization techniques are developed to ion- ize large and polar molecules without physically destroying them. Matrix-

13 assisted laser desorption/ionization (MALDI) allows peptides to be ionized in a light-absorbing matrix with a short pulse of laser light and then desorbed from the matrix into the vacuum system. Another successful technique is electrospray ionization (ESI), where peptides dissolved in solution are passed through a charged needle with high electrical potential and dispersed as microdroplets. The solvent around the peptides is evaporated, leaving the ionized peptides in the gas phase. The ionized peptides are then ready to be detected in the mass spectrometer. Non-targeted MS are attractive in finding new proteins in the non-hypothesis driven studies and up to thousands of proteins can be distinguished in one run. However, non-targeted MS has a low throughput and inadequate sensitivity (µg/ml to mg/ml range), thus pro- teins with low abundance are usually undetectable or masked by abundant protein present in the sample unless a depletion or enrichment step is used before separation and detection. Apart from discovery-based proteomic research depending on non- targeted MS, targeted MS is becoming widely used in biomedicine and dis- ease proteomics research as it provides sensitive and quantitative detection of proteins, peptides and PTMs. Selected reaction monitoring (SRM) uses a triple quadrupole mass spectrometer to act as a mass filter and selectively monitor specific analytes as molecular ions or as sets of ionized fragments generated from the analyte through collisional dissociation [10]. SRM has better sensitivity and higher sample throughput than normal LC-MS and proves to be useful in analyzing predetermined sets of proteins such as pro- teins in a certain signaling pathway or sets of candidate biomarkers. Stable Isotope Standards and Capture by Anti-Peptide Antibodies (SISCAPA) uses stable isotope labeled synthetic peptides as internal control mixed with the sample to be analyzed and antibodies to capture the peptides of interests before going to the mass spectrometer [11]. SISCAPA greatly increases the detection limit and the quantification accuracy of mass spectrometry [12].

Immunoassays — needles in haystacks Another genre of technologies dominating protein detection is immunoas- says that rely on affinity reagents to report presence of protein or other re- lated molecules. Here affinity reagents include traditionally immunoglobu- lins and also different classes of affinity reagents using non-immunoglobulin scaffolds that nonetheless bind to their targets in ways similar to immu- noglobins. With the help of affinity reagents, the proteome is much more accessible than MS if one or hundreds proteins of interest are to be analyzed. Generally immunoassays are composed of different steps to ensure sensitive and specific measurement of analytes. Firstly, there is usually a separation or enrichment step to allow irrelevant proteins from plasma or cellular extracts spatially separated by a protein electrophoresis or to be washed away in a

14 solid support immobilized by affinity reagents. The second step in an immu- noassay is the probing step, where affinity reagents with different labels find their targets among other irrelevant molecules. Different labels such as en- zymes, fluorescent dyes, isotopes, nucleic acids etc. can be chemically con- jugated to the affinity reagents to be detected in the readout step. Modern immunoassays usually have a step for signal amplification to ensure lower amount molecules to be detected. Finally, the signals generated from previ- ous steps should be recorded in an instrument, for instance, a camera, a flow cytometer, a reader or a microscope. Sensitivity and specificity are the two central properties of immunoas- says. Improvement of sensitivity allowing detection of lower abundance proteins can be achieved by some combination of reducing various contribu- tions to background, including nonspecific binding of detection reagents, and cross reactions for irrelevant molecules, improving efficiency of detection, and by enhancing signal amplification. Solid supports used in many immu- noassays for enrichment and removing interfering reagents are helpful to some extent. An extremely sensitive readout instrument e.g. single- fluorophore detector can also contribute to the sensitivity of an assay. Fun- damentally important for the specificity of an is the specificity of affinity reagents. This is the only guarantee for specificity in assays that depend on binding by single affinity reagent for example the SomaLogic assay (discussed later). However, it is not possible to produce mono-specific affinity reagent that reacts only with the expected antigen, irrespective of the concentrations of the molecules. An important means to circumvent this problem is by increasing the number of affinity binders required in the assay and that is why most clinical immunoassays nowadays use a pair of affinity binders instead of just one.

Immunoblot Protein immunoblot, also called , is a technique widely used in detecting specific proteins from cell- or tissue- extract. Combined with other techniques such as co-immunoprecipitation (CoIP) and specific against certain protein or post-translational modification (PTM) of the pro- tein, western blot has been used to disentangle complicated signaling path- ways manifested through protein interactions and PTMs. Proteins extracted from cells or tissues are spatially separated either on a native PAGE or on a SDS PAGE and then transferred to a membrane (typically nitrocellulose or PVDF). After transfer, membranes are blocked to avoid unspecific binding of antibodies or other irrelevant molecules to the membrane and then the separated and membrane-transferred proteins are visualized using primary antibodies against protein of interest. Secondary antibodies against the spe- cie of primary antibodies are often labeled either with enzymes like alkaline phosphatase or horseradish peroxidase (HRP), or with fluorescent groups.

15 The HRP coupled to secondary antibodies cleaves a substrate to release a chemiluminescent compound that produces localized luminescence in pro- portion to the amount of proteins. This luminescence can be captured by a sheet of photographic film using a suitable exposure time. Proteins of inter- est appear as a band, whose intensity reflects the amount of protein, while the migration of the protein in the gel indicates the size of the correspondent protein.

ELISA The most widely used immunoassay for protein detection in research as well as in the clinic is the enzyme-linked immunosorbent assay (ELISA) that was developed from the earlier radioimmunoassay (RIA) by changing the label moiety on antibodies from radioactive isotopes to enzymes. The first ELISA, described in 1971 by Engvall and Perlmann for detection of immunoglobulin G directly labeled with enzymes by capture antibodies immobilized on the solid support [13]. In direct ELISA, which is also called sandwich ELISA, analytes of interest are captured through a capture antibody immobilized on the solid support, so that irrelevant proteins can be washed away. Detection antibodies labeled with enzymes such as alkaline phosphatase or horseradish peroxidase (HRP) recognize the analytes and generate signal in a similar way as what they do in immunoblots. Recently, electrochemiluminescent (ECL) technology has been used instead of chemiluminescence in some commercial highly-sensitive (e.g. Meso Scale Discovery, www.mesoscale.com). Detection antibodies are labeled with an ECL species 2+ such as [Ru(bpy)3] that emits signal only when in close proximity to a stimulated electrode surface [14-15]. The introduction of solid support al- lows successful removal of irrelevant proteins from complex samples and excessive detection antibodies, ensuring low signal generation when no ana- lyte is present (background signal). However, background signal still exists as detection antibodies unspecifically bind to the solid support. Nowadays commercialized ELISA for single protein analyte detection such as IL-6 can go down to 0.7 pg/ml in concentration and accurately measure proteins over 3 logs in dynamic range. ELISA has been widely used in the clinic for in vitro diagnostics such as HIV antigen p24 and hepatitis B surface antigen (hBsAg) for its easy handling and excellent reproducibility with an average CV% (coefficient of variation) below 5 % (www.rndsystem.com) Three decades after the invention of ELISA, two papers were published describing ultrasensitive detection of proteins at the single molecule level. Both technologies build immunocomplexes as in traditional sandwich im- munoassay on microparticles. The Erenna Immunoassay System, developed by Singulex, detects single fluorophores in a 100 µm capillary flow cell, where fluorescently labeled detection antibodies are released from the im- munocomplexes. The LOD (limit of detection) of cardiac troponin I, a clini-

16 cal biomarker for cardiac muscle tissue injury, was 0.2 pg/ml with 10% CV [16]. In a paper published by Savage et al. in 2014, Aβ oligomer could be detected in CSF with LOD of 0.04 pg/ml and the level of Aβ oligomer was significantly elevated in CSF from Alzheimer's disease patients than age- matched control in three separate cohorts using the Erenna Immunoassay System [17]. In 2010 another company, Quanterix, published a paper in Na- ture Biotechnology describing single-molecule ELISA based on single molecule arrays (SiMoAs) that were used to detect serum proteins at sub- femtomolar concentrations. In the Quanterix system, are captured on an excess of magnetic beads (2.7 µm in diameter) through immobilized capture antibodies so that most of the magnetic beads are empty with a small portion capturing one antigen according to the Poisson distribution. Enzyme labeled detection antibodies are added to the magnetic beads to build up the immunocomplexes like the standard ELISA. Then the magnetic beads with or without single immunocomplex are loaded into the wells (4.5 µm in di- ameter) on the SiMoAs with one bead per well along with a substrate for the coupled enzymes. The products of the enzyme reaction are visible as fluo- rescence scanned by for counting fluorescent wells. The concentration of protein in bulk solution is correlated to the percentage of beads that carry a protein molecule. The LOD for PSA detection in 25% serum was 50 aM (1.5 fg/ml) and 150 aM (2.5 fg/ml) for TNFα using the digital counting by Quanterix system [18]. These recent development of ELISA with digital quantification have greatly improved the sensitivity and accuracy of protein detection, especially for tissue specific biomarkers pre- sent at low levels in blood, something that can be useful for biomarker dis- covery and disease diagnosis [19].

Highly-multiplexed immunoassays The indication of a disease state is not usually reflected by elevation or re- duction of a single protein in the plasma but as changes in levels of sets of proteins, and the situation is particularly complex in cancer and other dis- eases that are associated with multiple genetic factors e.g. Alzheimer’s dis- ease [20] and multiple sclerosis [21]. A similar situation also arises in cell signaling study, where several proteins are involved in one signaling path- way and disruption of one signaling pathway usually affects others. Accurate quantification of one protein is not enough to understand the whole picture in biomarker discovery or proteomic studies. Limited amounts of samples obtained through procedures such as microdialysis [22-24] and fine-needle aspiration biopsies [25] are usually present in few microlitres and benefit from multiple analytes detection in individual samples. Although MS-based technologies are among the most attractive options for multiplexing analysis, the low sensitivity hinders their application in detecting low-abundance pro- tein targets.

17 Array-based multiplex immunoassay The idea and possibility of a multi-analyte immunoassay was brought up in 1989 by Roger P. Ekins in a symposium [26]. One way to realize simultane- ous detection of multiple protein analytes is to label the reagents that could recognize an analyte with known labels such as different fluorophores or by identifying the position of captured and detected proteins in two-dimensional space in an array. These are the basic ideas behind array-based multiplex immunoassays. Planar array and suspension bead array are the two most successful technologies among array-based multiplex immunoassays (Figure 2), which have wide applications in biomarker discovery in a variety of ma- trices and diseases, including ovarian cancer [27], coronary artery disease [28], and diabetic macular edema [29]. In a planar array, antibodies or other affinity binders such as single chain fragment variables (scFv [30]) recogniz- ing different analytes are immobilized on either glass slides or membrane at designated spots, while the suspension bead array uses color coded magnetic beads on which different antibodies have been immobilized to distinguish the analytes (Figure 2). The antigen capture step and the detection step are similar to standard ELISAs. Multiplexing capacity for planar arrays ranges from 20 to 1,000 among different commercialized products. Eke et al. pub- lished a paper using a phosphospecific protein array (up to 1,318 plex) from Full Moon Biosystems to study EGFR, ERK-related signaling network after Cetuximab-mediated EGFR inhibition [31]. The suspension bead array sys- tem commercialized by Luminex has a capacity to read out 100 different reactions in multiplex by using beads where two fluorescent dyes have been mixed in different ratios, and multiplexing can be up to 500 if three fluores- cent dyes are used in the beads. The color coded beads together with fluores- cence labeled immunocomplexes can be detected by a flow-cytometry-based instrument to identify the analyte and quantify the protein amount (www.luminexcorp.com). Background due to binding by detection antibod- ies to the solid support is a bottleneck both for sensitivity and multiplexing in array-based multiplex assays. Cross-reactivity of detection antibodies for irrelevant proteins in the sample is yet another important issue regarding the reliability of the assay [32].

18 ...

a Planar array b Suspension bead array

Figure 2. Illustration of array-based multiplexing immunoassay. (a) Planar array immobilizes different capture antibodies on different spot of the array and antigens from samples are captured on different spots. (b) Suspension bead array utilizes color-coded magnetic beads as solid support where capture antibodies recognizing a specific antigen are immobilized. For both planar array and suspension bead array, single-binder assay where only the capture antibodies are involved requires antigens either directly labeled with fluorophore or biotinylated (then react with streptavidin conjugated fluorophores, (a)) [33]. A more specific format for array-based multiplex assay uses detection antibody labeled with fluorophore besides the capture antibody similar to sandwich ELISA shown in (a) and (b). Finally the planar array is readout with fluorescence scanner (a) and suspension bead array is readout by a flow- cytometry like instrument where beads are flown through the double detectors, one to decode the fluorescence code in the beads to identify the antigen and the other for quantification of the antigen (b).

SOMAscan proteomic platform — personalized medicine The SOMAscan proteomic platform from SomaLogic is a highly multi- plexed assay that aims at personalized diagnostics, prevention, and treat- ment. The platform relies on a new class of SOMAmer molecular recogni- tion elements (MREs) and DNA microarrays technology to measure up to 1,129 proteins from 150 µl of biological sample (blood, plasma, CSF or tis- sue homogenates) [34]. SOMAmer (Slow Off-rate Modified Aptamers) is a type of DNA aptamer with modified bases that specifically binds protein surfaces by folding into intricate three-dimensional structure. The specificity is ensured by in vitro selection from a large library of random sequences using systematic evolution of ligands by exponential enrichment (SELEX) for the ones with slow dissociation rate (t1/2 < 30 min). In SOMAscan, pro- teins from biological samples are first bound by correctly folded SOMAmers with biotin and fluorophore labeled to form immunocomplexes and then applied to streptavidin-coated solid supports to sequentially remove proteins and SOMAmers that do not form immunocomplexes. The unspecific SOMAmers are removed by stringent washes in the multiplex assay while

19 the cognate ones will remain bound to their target. Finally, the SOMAmers are released and captured on a DNA microarray. Signal intensity from a certain spot of the microarray quantitatively reflects the concentration of a certain protein from the original samples. SOMAscan has been used in large clinical studies for diagnostic biomarker discovery of Alzheimer’s disease [35], chronic kidney disease [36] and non-small cell lung cancer [37].

DNA-assisted immunoassays Among the affinity-based protein technologies, a special class uses DNA- labeled affinity reagents to report the quantity, location and interactions of proteins. Unique advantages of affinity reagents labeled with nucleic acids in protein detection include high specificity of DNA hybridization, flexibility in DNA reactions like ligation and cleavage, enormous information storage capacity in DNA, and the opportunity for amplified detection [38].

Immuno-PCR Large numbers of techniques have been designed to improve the perform- ance of traditional affinity-based protein technologies. Immuno-PCR was first introduced in 1992 to report the presence and the quantity of proteins [38]. To detect an antigen, the monoclonal antibody immunocomplex immo- bilized on a microtiter plate wells was performed with probes comprising of oligonucleotides conjugated secondary antibodies. The oligonucleotides were then amplified by PCR and the PCR products were analyzed by aga- rose gel electrophoresis stained with ethidium bromide. The assay could detect 580 BSA molecules, which was much more sensitive than a conven- tional ELISA at that time due to the fact that the amplification magnitude of DNA is much larger than enzymes catalyzing a substrate reaction.

Proximity-dependent immunoassays In 2002, Dekker et al. described the Chromosome Conformation Capture (3C) methodology to capture the interaction between any two genomic loci by intramolecular ligation of the restriction-enzyme digested ends of the proximal genomic sequences, having been crosslinked to freeze the interac- tion [39]. The ligation products were amplified by qPCR with specific primer pairs to determine the ligation frequency. This approach first revealed relative spatial disposition and physical properties of the chromatin fiber in three dimensions within selected genomic regions. In the same year, Fredriksson et al. described a new method for sensitive and specific protein detection building on a similar idea as the 3C technol- ogy [40]. The so-called homogeneous proximity ligation assay (PLA) for protein detection was first realized by proximal binding of two DNA aptam- ers to the same protein favoring ligation of the oligonucleotides extensions to the aptamers. The ligation reaction generates an amplifiable DNA template.

20 Similar to the immuno-PCR, the quantification of proteins in PLA is con- verted to quantification of the proximal ligation products, which is per- formed using qPCR. Two years later in 2004, the same idea was applied in cytokine detection using commercial antibodies conjugated with oligonu- cleotides instead of DNA aptamers [41]. Analogous to homogeneous PLA, the proximity extension assay (PEA) utilizes DNA polymerases with 3’ ex- onuclease activity to allow extension of DNA strands upon proximal hy- bridization, which proves to work successfully also in complex matrices like blood where the activity of DNA ligase is somehow inhibited [42]. PLA can also be performed on solid supports (solid-phase PLA) in a manner similar to sandwich ELISAs, where the target is first captured through antibodies immobilized on a solid support – surface-modified plate [43] or magnetic beads [44] – and subsequently detected with a pair of anti- body-DNA conjugates similar to the ones used in the homogeneous prox- imity assays. The introduction of the solid support allows higher concentra- tion of affinity reagents used in the assay without increasing the background due to the washing steps removing excess reagents. Triple recognition in- creases detection specificity as more proof-reading steps are involved, but it may also reduces the detection efficiency as each step is not 100% efficient. Other variants of proximity ligation assays have been developed such as 4PLA, which requires five recognition events (one capture and four detec- tion probes) for detection of secreted vesicles with multiple epitopes on sur- face such as prostasomes [45]. Both solid phase PLA and homogeneous PLA or PEA have been adapted for multiplex detection of up to 36 (solid phase PLA) or 96 proteins (PEA), with either qPCR (can be miniaturized in Flu- idigm chip) or next-generation sequencing readout [46-48]. Similar efforts were made in the study of protein-protein interactions and PTM in fixed cells and tissue sections. A method that can be used to detect protein-protein interaction, report PTM, and digitally quantify these events was published in 2006 as in situ proximity ligation assay (in situ PLA) [49]. In these assays, a pair of antibody-DNA conjugates recognizes each of the partners of a protein interaction or a PTM together with its cognate protein in fixed cells or tissue sections. The recognition will give rise to a circularized DNA molecule that can be amplified through rolling circle amplification (RCA), forming a long single strand of repeated DNA sequences, which can be visualized under the microscope as a bright spot after hybridization with fluorescence-labeled detection oligonucleotides. The interaction event can be semi-quantified by counting those bright spots. A multiplex detection of PPI was demonstrated to visualize EGFR, HER2 and HER3 homo and het- erodimers in breast cancer tissue sections, and β-catenin complex formation with either E-cadherin or TCF1 in colorectal cancer tissue sections [50]. The difference of homogeneous PLA (or PEA) from other sandwich as- says, which also utilize two antibodies, one for capture and the other for detection, is that the signal generation of PLA depends on the simultaneous

21 binding of two different probes (DNA-conjugated affinity reagents) to the same protein target, while only the detection antibody in the sandwich assay contributes to signal generation. For singleplex sandwich assay, the back- ground signals seen for samples where no target antigen is present mainly comes from detection antibodies nonspecifically stuck on the solid support, something that can be partially reduced by the choice of solid support, by blocking, and by stringent washing. For homogeneous PLA, background is generated from “false proximity” regardless of presence of the antigen, a function of the concentration of PLA probes. The error rate for a pair of an- tibodies recognizing the wrong target molecule (interfering antigen) at the same time is similar for sandwich assays and for PLA. However, in a multi- plex sandwich assay where a pool of probes is present, there are more oppor- tunities for cross-reactive detection by incorrect pairs of capture and detec- tion, as well as increasing risks of unspecific binding by the detection anti- bodies to the solid support due to their greater numbers. Proximity-based assays like PLA or PEA have advantages here as only the cognate pairs of antibodies can produce detection signals, and the risk that two probes simul- taneous bind the wrong target is much smaller than for individual affinity reagents (Figure 3).

22 Sandwich assay Homogeneous PLA Singleplex

......

Multiplex ......

Background Correct IC Wrong IC Ligated oligonucleotides Wrong IC producing signals

IC-immunocomplex Signal-producing label Wrong IC not producing signals Figure 3. Illustration of sandwich assay and homogeneous PLA in both singleplex and multiplex format. Here sandwich assay refers to immunoassays involving two antibodies with signal-producing labels on one of the antibody. Possible ways to generate background signal are illustrated in green area. In the multiplex-PLA for- mat, the ligation template is specific for each correct immunocomplex so that even if one of the probes binds to the wrong target, there will be no signal generated (grey area). However, if the signal production is merely relying on one antibody like most sandwich assay, cross-reactivity of the detection antibody will be problematic in producing wrong signals (yellow area).

Among all the immunoassays that I have overviewed above, there is no per- fect assay that can both accurately quantify the abundance of proteins pre- sent at extremely low amounts in biological samples as reported for the Sin- gulex Erenna immunoassays and Quanterix Simoa assay, and analyzing thousands of proteins at the same time such as the technology of SomaLogic. The trade-off between sensitivity and multiplexing has always been an issue of immunoassays in clinical diagnostics especially for ones used to find rare disease biomarkers. In Figure 4, I summarized the sensitivity, dynamic range and multiplexing from a few well-known commercial assays for measuring the human cytokine IL-6. The relatively poor LOD from SomaLogic multi- plexing panel compared with MSD or Olink may reflect the weakness of a single-binder assay. Other aspects such as accuracy, sample consumption, throughput and handling are also considered when deciding whether an as- say can be widely used in the clinic.

23 107 105 Olink SomaLogic

105 103 MSD

Quanterix Singulex 3

R&D 10 fM 10 pg/ml

10 10-1

Quantification range 10-1 10-3 103 104 105 106 107 108 109 LLOQ molecule number Figure 4. Quantification ranges and lower limit of quantification in molecule num- bers of IL-6 from different commercial assays or kits. Quantification range is the upper limit of quantification till lower limit of quantification (LLOQ) in concentra- tion, obtained from datasheet published on company website or scientific publica- tion. Lower limit of quantification in molecule number (LLOQ molecular number) is calculated by multiplying LLOQ and sample volume (V) and Avogadro constant. Singleplex assays or kits chosen here are Quanterix IL-6 Simoa™ Assay Kit (100190), Singulex Erenna® IL-6 Immunoassay Kit (03-0089), R&D systems Hu- man IL-6 Quantikine HS ELISA Kit (HS600B). Multiplexing assays and kits are from Olink Bioscience Proseek Multiplex Inflammation I 96x96 (http://www.olink.com, [48], 92-plex), MSD (Meso Scale Discovery) Human Cyto- kine (30-Plex) (K15054D), SomaLogic SOMAscan™ assay (www.somalogic.com, [34], 1129-plex).

Affinity reagents One key component in the affinity based immunoassays is the high-quality functionalized affinity reagents whose purpose is to correctly capture and detect the target protein, and convert the recognition to signals that can be recorded by an instrument. Affinity and specificity of affinity reagents are the two important factors that directly decide the performance of an immu- noassay. Several conjugation protocols for attaching different moieties to affinity reagents have also been developed since the first generation of im- munoassays, including non-covalent linking of streptavidin to biotinylated molecules and covalent crosslinking through crosslinkers.

Conventional antibodies The production of polyclonal antibodies is achieved by inoculation of a suit- able mammal, such as a mouse, rabbit or goat with antigen. The serum col- lected from the inoculated animal (antiserum) contains a mixture of immu- noglobulin G (IgG), a small proportion of which is directed against different

24 regions of the antigen, produced by different B cells in the animal. The anti- bodies can be purified with either protein A/G for isolating total IgG, or an- tigen immobilized affinity chromatography [51] to collect only specifically antigen binding antibodies. Monoclonal antibodies are usually made from mouse hybridoma cells created by fusing myeloma cells with spleen cells from mice immunized with a certain antigen. Since those hybridoma cells are immortalized and produce only one type of antibody, monoclonal anti- bodies can be purified from their supernatants, where each antibody is di- rected against one epitope of the antigen [52]. Different efforts have been made to produce reliable antibodies either by rational design or through strict validation. Atlas Antibodies Advanced Polyclonals (Triple A Polyclonals), developed within the Human Protein Atlas project, are polyclonal antibodies produced from immunizing rabbits with a Protein Epitope Signature Tag (PrEST), a software-designed peptide of 50-100 amino acids whose sequence exhibits minimal homology to other proteins (www.atlasantibodies.se). The use of PrEST antigen reduces the risk of producing polyclonal antibodies with off-target binding or cross- reactivity with other homologous proteins compared with immunizing with full-length antigen [53]. Similar efforts were made by NCI (National Cancer Institute) with their Antibody Characterization Program by producing at least three monoclonal antibodies from one expressed recombinant antigen and one antibody per peptide, which are characterized extensively through ELISA, western blot, immunohistochemistry and so on (http://antibodies.cancer.gov/apps/site/default).

Recombinant antibody fragments and other protein scaffolds Besides producing antibodies using animals, several recombinant in vitro selection technologies have been developed for high-throughput expression and selection of affinity binders. In nature, the diversity of antigen-binding sites in immunoglobulins (Igs) derives from VDJ recombination of large sets of variable (V), diversity (D) and joining (J) gene segments in naïve B cells, along with the addition and deletion of nucleotides at the junction between the joined gene segments. The rearranged V-region genes are further diversi- fied by somatic hypermutation introducing point mutations. These mecha- nisms can theoretically generate ~1013 (~1011 in real life) different Igs in humans. Affinity maturation of the immune response towards the antigen is achieved by clonal expansion and affinity selection of Ig-producing B cells, to which the follicular dendritic cells present antigens. The B cells carrying Ig (B cell receptor) with the highest affinities towards the antigen are se- lected to survive [54]. The selected B cells then undergo a process of so- matic hypermutation, where subclones with further increased affinity for the target protein are selectively expanded. In analogy to nature, in vitro display libraries such as phage and ribosome display libraries can be used to select binding reagents from large pools [55]

25 (Figure 5). In vitro selection approaches have several advantages over the natural system in producing antigen-specific affinity binders. First of all, the in vitro display systems avoiding animal immunization potentially reducing production cost per antigen. In addition, affinity binders against various types of antigens that will cause problems when immunizing an animal in- cluding highly-conserved proteins (histones and ubiquitin) or toxic antigens (drug molecules) can be selected in vitro. Furthermore, panning conditions with different buffer conditions, modulators and competitors allow that li- braries be screened using antigens with desired conformation and PTMs so that the affinity and specificity can be guaranteed. Last but not least, the recombinant binders are generally small and can be modified with desirable features including oligopeptide tags such as the polyhistidine-tag and the FLAG-tag, expressed either at the C-or N-terminal of the binder to facilitate purification and further modification. For the recombinant antibody selection process, only the genes encoding the V domain from the heavy chain and the light chain of an antibody are cloned into a vector for in vitro library display system. The two domains can be joined with an oligopeptide linker to form a single chain Fv fragment (scFv) or connected by a disulfide bond to form a Fab fragment (Figure 6). By expressing the Fc part from different species together with the scFv se- lected from in vitro library display system to create the scFv-Fc (Yumab, http://yumab.com/), recombinant antibodies can bind target proteins biva- lently as natural antibodies but retaining advantages of recombinant binders. The Fc part in the Yumab can either be humanized, which is beneficial in therapeutics, or with a fusion partner from other species such as the mouse and rabbit so that it can be directly detected with secondary antibodies [56]. Nanobodies are another example of recombinant antibody fragments consist- ing of a single monomeric variable antibody domain of 12-14 kDa, derived from antibodies lacking light chains that are found in camelids [57-58]. There are also non-antibody protein scaffolds such as designed ankyrin re- peat proteins (DARPins) that are derived from natural ankyrin proteins of around 14 kDa consisting of four or five repeat motifs of these proteins [59- 60]. Both nanobodies and DARPins show better stability than natural anti- bodies under high temperature (> 65 °C) and in human blood serum, and they are thus promising as research tools, therapeutic agents, and diagnostics reagents [61-65]. Although recombinant binders have many advantages over the conven- tional antibodies in efficiency of production with relatively low cost and the possibility of generating binders with higher affinity and specificity, conven- tional antibodies remain the by far most widely used reagents in many appli- cations (Table 1). The main reason is that commercial antibodies nowadays cover over 80% of the protein-coding genes in the human genome, while the number for recombinant binders target less than 1% according to the Affi- nomics database (Table 1). No concrete study has been conducted to com-

26 pare the antibodies with different classes of recombinant binders either in current immunoassays or in therapeutics. In the ideal case, affinity binders should be raised against all the proteins and PTMs in the human proteome. Efficient selection for binders with high-affinity and mono-specificity, as well as reliable high-throughput validation are urgent requirements for affin- ity-based proteomics in the future. Finally, cost for binder production and validation should be low enough so that they are accessible for most re- searchers.

Functionalizing affinity reagents for different applications Affinity reagents in modern immunoassays are usually functionalized before they can be used for different applications. They are either immobilized on the solid support or labeled with different molecules to be able to generate signals to be read out by instruments. The first described ELISA used BrCN- activated microcrystalline cellulose to immobilize IgG, and alkaline phos- phatase (ALP) was conjugated to rabbit IgG using glutardialdehyde, which is a covalent conjugation by a homobifunctional crosslinker [13]. The probe used in the first immuno-PCR was linking biotinylated oligonucleotides with antibody through a streptavidin-Protein A chimera, which is a non-covalent interaction with extremely low dissociation rate [38].

27 mRNA dsDNA mRNA/DNA hybrid

PRMC ligand immolized solid support

A B

Sequencing or C large-scale expression

G H High-throughput panning: -Different antigens/ligands -Buffer condition, pH, salt -Competitors

F D

E Figure 5. Schematic diagram of ribosome display. (A) Library of double-stranded DNA encoding different binders (1012-1013 consisting of conserved sequences and random sequences) is transcribed into mRNA by in vitro transcription. (B) Pool of mRNAs is in vitro translated into peptides or proteins through ribosome stalling so that the nascent peptides or proteins will not dissociate from the ribosome and form protein-ribosome-mRNA- complexes (PRMCs). (C) PRMCs then undergo a high- throughput panning step where selection can be done by capturing PRMCs on solid support immobilized with different antigens or ligands under different panning con- ditions such as buffer condition (pH or salt concentration) or by adding different competitors so as to increase the specificity. Lower-affinity binders carrying PRMCs will be washed away in this step as higher-affinity binders carrying PRMCs will be selected in step D where mRNAs are eluted from the PRMCs and reverse tran- scribed into cDNA (E). (F) cDNAs are then amplified by PCR and from higher-affinity PRMCs are enriched. (G)The enriched double-stranded DNA pool will go through second cycle of selection by in vitro transcription like step A. After a few cycles of selection (affinity maturation, 5 cycles from a library of 108 possible binders [55]), high-affinity (~nM) binders can be selected against a specific antigen under appropriate panning condition. (H) The sequence of the binder can be ob- tained by sequencing the DNA after selection and selected target DNAs can be brought to large-scale production for further validation.

28 Table 1. Comparison of conventional antibodies with recombinant binders Conventional antibody Recombinant binder pAb mAb ScFv DARPinNanobody Genomic 80.3 0.88 0.47 0.28 coverage (%)* R&D (6020) Abcam (17542) Yumab VIB Availability§ Abcam (44248) NA SCBT (17857) (6 weeks) (3 months) SCBT (47903) Binder size (kD) ~150 ~150 29 14 14 Animals Mammalian Production E.coli, yeast (rabbit, goat) cell culture Tedious, sub- Affinity Easy, mutation and Not possible cloning and maturation in vitro selection selection Conjugation Chemical crosslinking Site-specific conjugation

*Genomic coverage (%) is the percentage of available binders against certain genes divided by the total amount of protein-coding genes in the human genome (20687). The number for conventional antibodies is taken from the Human Protein Atlas (http://www.proteinatlas.org/) that has tested 21984 antibodies from different sources on 16621 human gene products. For recombinant binders, the data are ex- tracted from the Affinomics database (http://www.proteinbinders.org/index.php). §Availability refers to the commercial availability of the binder. The numbers in the brackets for conventional antibodies are the number of available binders from the company. R&D (R&D systems, http://www.rndsystems.com/), Abcam (http://www.abcam.com/), SCBT (Santa Cruz Biotechnology, http://www.scbt.com/). For recombinant binders, there are no accessible binders commercially other than custom service and the delivery time is stated in the brack- ets. Yumab (http://yumab.com/), VIB (http://www.vib.be/en/Pages/default.aspx). NA, not available; pAb, polyclonal antibody; mAb, monoclonal antibody.

Nowadays crosslinking can be performed between proteins with different molecules such as nucleic acids [41], drugs [66-67] and peptides [68] for numerous applications. Intramolecular crosslinking of large protein com- plexes provides insights for studies of protein structures [69-70]. A crosslinker is a chemical reagent that can react with two or more chemical groups, either similar or different. The groups for protein crosslinking are typically primary amines present on the N-terminal of each polypeptide or the side chain of lysine, carboxyls on the C-terminal of each polypeptide or the side chain of aspartic acid and glutamic acid. Other residues available for crosslinking are sulfhydryls, present in the side chain of cysteine that is usu- ally in the form of disulfide bond and carbonyls groups oxidized from car- bohydrate groups available mostly in glycoproteins. The general principle of crosslinking two molecules A and B is basically first to modify both A and B with an excess of crosslinkers to obtain the reactive group if there is no di- rect reactive group on the molecules that can be conjugated. More often one

29 of the molecules to be conjugated such as fluorophores or nucleic acids are already modified with reactive groups and are usually commercially avail- able (Figure 6). Unreacted crosslinkers can be removed by either dialysis or desalting. Conjugation of molecule A and B can be realized by forming co- valent bond between the reactive groups on them. As chemical reactions usually are less than100% efficient, the yield after conjugation is a mixture of conjugated A-B and individual molecules A and B. This phenomenon would affect some of the applications and will be addressed in paper II of this thesis. Two general concerns of choosing crosslinkers are the chemical specificity and the length of the spacer in the crosslinker. Chemical specific- ity means that the reactive group on molecule A should not be present on molecule B, otherwise molecule B might be conjugated with itself and that is why heterobifunctional crosslinkers are more widely used in conjugation nowadays. Spacer length can influence the conjugation efficiency and is important in nanoscale structures, while short spacer may cause steric hin- drance for conjugation of two large molecules [71-72]. Although conjugat- ing proteins using crosslinker is a universal approach, there are some obvi- ous disadvantages such as random site modification will sometimes destroy the active site of the proteins such as antibodies and the ratio between the two conjugating molecules cannot be controlled.

Molecule A Molecule B Covalent bond IgG Fab R’ H O O R N + OON N R’ R H H VH AmineNHS ester Amine bond NH C R’ 2 H NH2 -S-S- R’

V S-S -S-S- OON

L -S-S- S-S S S-S OON S-S NH + C 2 R H R L S-S S S-S ThiolMaleimide Thiolester bond ScFv O O O

S-S R N S-S NH 2 + H N C N R’ C 2 N R’ H Fc R H H H Carbohydrate Carbohydrate S-S S-S AldehydeHydrazide Hydrazone bond

+ - N R CCH + R’ N N N R’ N N R AlkyneAzide Heterocycle Figure 6. Illustration of immunoglobulin G (IgG), Fab fragment, scFv (left) and chemical reaction of reactive groups on molecules (right). Chemical groups avail- able for conjugation are illustrated on IgG. S-S is the disulfide bond that can be reduced to be thiol group, NH2 is the primary amine group and carbohydrate group on the Fc part can be oxidized to form aldehyde group. Heavy chain and light chain of the IgG is illustrated in green and orange. VL and VH is the variable region on light chain and heavy chain, and CL and CH is the conserved region on light chain and heavy chain of an IgG.

Site-specific conjugation is possible for recombinant binders by having a designated position for conjugation to avoid modifying the active binding

30 site. This can be done by expression binders together with a polypeptide (tag) as a fusion protein. One of these widely used tags is the SNAP-tag, a 20 kDa mutant of the DNA repair protein O6-alkylguanine-DNA alkyltrans- ferase that reacts specifically and rapidly with benzylguanine (BG) deriva- tives, leading to irreversible covalent labeling of the SNAP-tag with a syn- thetic probe [73]. Recombinant binders such as DARPins have been fused with a SNAP-tag allowing conjugation to be performed through the SNAP- tag reacting with BG modified oligonucleotides [74]. Despite from protect- ing the active binding site of the affinity reagents, another important feature of site-specific conjugation is the 1:1 stochastic ratio between binder and the labeling moiety, which is crucial for some application like absolute quantifi- cation and single molecule studies. However, expressing a fusion tag similar in size to the binder itself is not an ideal approach for conjugation. Another means of site-specific conjuga- tion is to incorporate an unnatural amino acid with reactive group such as alkyne at a designed position during mRNA translation through a mutant tRNA synthetase so that the usual amber stop codon is recognized for a tRNA carrying the unnatural amino acid [75-76].

Emerging areas requiring new technology

Single-cell proteomics — less is more Studying biological systems in bulk using the technologies mentioned above have been successful in the past decades, establishing that cells interact with each other in pathways and networks to organize themselves into tissues and functional organs [77]. However, even daughter cells from a recent cell divi- sion differ from each other due to the subtle difference in microenvironment, which may be mediated by gradients of growth factors, oxygen and other perturbances [78]. The heterogeneity among cells is even more astonishing in different cancers as they start from one cell and gradually expand into whole tissues of cells with variable genetic and epigenetic traits. This het- erogeneity tends to make tumors resistant to therapies as they can resurrect themselves through the survival of subclones insensitive to the therapy [79- 81]. Thus, a new research field has recently emerged in order to understand the interplay of cancer cells with their microenvironment by studying the linkage of phenotypic and genotypic heterogeneity on a single-cell level. Challenges for single cell studies lie in the low copy numbers of the target molecules and the small volume of the sample, thus a highly sensitive and multiplex assay is required for the study of single-cell proteomics. Single-cell genomics studies, largely depending on methods to sequence DNA and RNA from a single cell, have made great progresses during the past few years thanks to rapid development in next-generation sequencing

31 and various DNA amplification techniques to allow unbiased amplification. Single-cell transcriptome studies can even be performed on fixed tissues to obtain spatial information [82-83]. Basic principle for single-cell analysis is to sort out single cells using a flow cytometer or other micro-/nanofluidics, or by fixing them on glass slides for analyses through imaging strategies. Proteomic studies on the single-cell level are advancing by combining dif- ferent technologies. Among them, cytometry by time of flight (CyTOF) is a unique example overcoming the low multiplexing in traditional flow- cytometry by replacing the fluorescence labels on antibodies with stable isotopes that can be detected by a time-of-flight mass spectrometer. The isotope labeling allows multiplexing analysis of up to 42 analytes with the speed of 1,000 cells per second, which is superior to regular fluorescence- based flow cytometer with 12 analytes at best. The CyTOF technology has been used to systemically study immune signaling pathway in healthy hu- man hematopoiesis [84] and to phenotypically profile CD8+ T cells in pathogen response [85] by analyzing major proteins components in single cells. Other ways to do single-cell analysis are based on micro- or nanoflu- idic systems so that all the treatments of cells are carried out in small cham- bers allowing trapping of individual cells. A procedure for single cell west- ern blotting (scWestern) published recently used microscope slide coupled with a 30 µm thick polyacrylamide gel micropatterned with an array of 6,720 microwells to allow settlement of single cell into individual microwell, followed by a series of reactions such as lysis, electrophoresis and probing from the wells to the gel. In this manner, proteins are detected with a thresh- old of 30,000 molecules per cell and multiplex detection of 11 protein targets was achieved through stripping and reprobing [86-87].

Resolving protein-protein interactions — pair-wise is not enough One important task in deciphering cellular biology is to resolve complex protein interaction networks as functional cellular processes are usually or- chestrated by various sets of protein complexes. Different technologies have been developed in order to map pair-wise protein-protein interactions in a high-throughput and systematic way. The yeast two-hybrid system (Y2H) was first reported in 1989 for screening of interacting proteins by expressing the bait and prey protein fusions with DNA-binding domains (DBD) and activator domains (AD) of a transcription factor, for instance as DBD-bait and AD-prey fusion proteins [88]. Physical interaction of prey and bait pro- teins bring the DBD and AD domains of the transcription factor together to initiate expression of the reporter gene, resulting in a detectable phenotype. Four years after the completion of yeast genome sequence, a comprehensive analysis of PPI using yeast Y2H was established in 2000 to screen 6,000 yeast transformants with open reading frames inserted into the Gal4 tran- scription activation domain against 192 yeast proteins, and found 957 puta-

32 tive interactions involving 1,004 yeast proteins [89]. These in vivo 2-hybrid screening systems are easy to establish and can be scaled up to screen thou- sands of proteins, however, the high false-positive rate and unnatural protein expression are main drawbacks. Other high-throughput pair-wise screening technologies use selected bait proteins immobilized on solid supports to cap- ture proteomic-scale prey proteins generated from phage- [89] or ribosome display [90] libraries through affinity enrichment, and the enriched interact- ing proteins are resolved by next-generation sequencing of protein coding sequences from the display libraries. In addition to the above described approaches, another attractive method to resolve pair-wise PPI is to use DNA-barcoded proteins combined with high-throughput DNA readout such as microarray and NGS. Proteins bar- coded with DNA can either be done using affinity reagents covalently con- jugated with DNA or expressed through in vitro translation by protein- ribosome-mRNA-DNA complexes (PRMC), which can be decoded at sin- gle-molecule resolution. The interaction of proteins can be detected by en- zymatically joining the proximal barcode DNAs either by ligation or exten- sion, or by a colocalized in situ amplified polony technology described by Gu et al. in 2014 [91]. Compared with the previously discussed 2-hybrid system or display technologies, these DNA-barcoded protein assays could allow detection of high-multiplexing PPIs in one tube. However, the draw- back of these studies is similar as in earlier techniques, in that the proteins of interest are not in their natural states. In an article published in 2012, pair- wise interactions among four proteins in the NFƙB pathway together with one housekeeping protein were investigated by proximity ligation of any two interacting proteins with specific antibodies conjugated with a pair of oli- gonucleotides, from which the ligation products were quantified on a mi- croarray [92]. Proximity ligation for pair-wise interaction was also applied to look for two monoclonal antibodies among a pool of 20 antibodies binding to the same target protein with ligation products sequenced using NGS (Hammond et al, manuscript). Similar approach based on proximity exten- sion was used to screen ligand-target pairs between DNA-conjugated protein targets through SNAP-tag directly from cell lysate and a library of DNA- linked small molecules [93]. Although affinity reagents-based approaches have the advantage of directly analyzing biological samples, weaknesses include that the detection rate is compromised by lower-grade of affinity. Proteins in nature are not limited to pairwise interactions. Higher-order protein complexes with more than two interacting partners are the norm, and seen for instance in the transcription initiation machinery and also exosomes that can be regarded as highly-packed proteins in nanometer scales. Imaging approach with different fluorescence merging for in situ analysis can detect proteins interacting with more than two partners. In situ PLA allows detec- tion of three interacting proteins [94] and multi-recognition PLA is shown to detect 4 epitopes in proximity [45]. Those approaches are, however, limited

33 to detection of homogeneous complexes with known interacting partners. No current technologies are able to profile heterogeneous protein complexes with more than two interacting partners.

34 Present investigations

Paper I – A tosyl-activated magnetic bead cellulose as solid support for sensitive protein detection

Background Magnetic nano- or microparticles have been widely used in biological appli- cations such as purification and enrichment of, for instance, nucleic acids, proteins, peptides, cells, etc. due to their good kinetics and relatively large binding capacity. They were first described in 1977 to be used as solid sup- port in a sandwich immunoassay [95]. The production of magnetic particles can be generalized by mixing iron oxide (either maghemite or magnetite) with a polymer to form the core-shell of the magnetic particles [96]. Differ- ent surface chemistries can be used to modify the surface of the magnetic particles in order to adapt them for different applications. The solid-phase proximity ligation assay (SP-PLA) using magnetic microparticles was fur- ther developed in 2010 from the original homogeneous PLA to further in- crease the sensitivity and specificity of the immunoassay by demanding the requirement of increased target recognition events and the possibility to re- move interfering agents in complex biological samples and excessive probes [97].

Aim of the study The aim of the study is to develop new magnetic cellulose beads functional- ized with tosyl groups and to demonstrate that they can be used in immuno- assays such as proximity ligation assay.

Findings and conclusions We have developed a protocol to prepare chemically-stable magnetic bead cellulose with tosyl group on the surface with average diameter of 30 µm. TEM and light microscope were used to characterize the wet and dry MBCs or tosylated-MBCs (MBC-Ts). We have also demonstrated that the tosyl group on the surface of the beads can be further modified either directly with antibodies or by first coating with streptavidin before antibodies are immobi- lized via biotin. We then evaluated the functionalized MBC-Ts in SP-PLA for detection of vascular endothelial growth factor (VEGF). The density of the immobilized antibodies was found to be crucial for the signal to noise,

35 with direct effects on the performance of the SP-PLA. The performance of MBC was compared to that of commercial magnetic beads in different as- pects such as limit of detection (LOD), robustness and dynamic range of SP- PLA. We found that the performance of MBC-Ts is comparable to that of Dynabeads in detection of VEGF in less complex samples, but better than Dynabeads when detecting VEGF in complex matrix such as 100% chicken plasma.

Paper II – A universal approach to prepare reagents for DNA-assisted protein analysis

Background DNA-assisted protein analysis technologies play an increasing role in analy- ses of proteomes [98], and for disease diagnosis, prognosis [99] and person- alized medicine [100]. High-quality and pure DNA-coupled affinity binders are required to achieve optimal performance in such analyses. In the various conjugation methods, the efficiency of functionalization of affinity binders is usually maximized by employing a large excess of suitably modified oli- gonucleotides, thereby resulting in a mixture of the desired conjugates and an excess of free oligonucleotides, along with any remaining unconjugated affinity binders. In DNA-assisted protein analysis, unconjugated binders in the detection reactions tend to compromise assay performance by competing with properly functionalized affinity reagents for binding to the respective targets, resulting in reduced detection signals. Remaining free oligonucleo- tides are liable to cause increasing assay background, which is particularly deleterious in homogeneous assays where no washing steps are applied, such as in homogeneous PLA [40-41].

Aim of the study The aim of this study was to establish a general and efficient approach for preparation of high-quality DNA-coupled affinity binders free from oligonu- cleotides and unconjugated natural antibodies or alternative binding scaf- folds. Such protocol was to be evaluated in homogenous- and in situ PLA for improved performance.

Findings and conclusions We have established a two-step purification protocol to prepare high-quality affinity reagents-DNA conjugates where unconjugated antibodies or recom- binant scaffolds and excessive oligonucleotides were sequentially removed. Antibodies against interleukin 8 (IL-8) were modified with bifunctional crosslinking reagents and conjugated with azide-modified oligonucleotides by click chemistry, while recombinant DARPin against Her2 fused with

36 SNAP domains were coupled with BG-modified oligonucleotides. They were both applied to solid-phase purification, where unconjugated binders were first removed as conjugated binders and excess DNA oligonucleotides were captured by a biotinylated capture oligonucleotides on the streptavidin coated sepharose beads. They were then released by restriction enzyme di- gestion. Next, the mixture of conjugates and excess free DNA oligonucleo- tides were subjected to either protein A/G beads (for antibodies) or his-tag isolation beads (for recombinant scaffolds), and pure conjugates were eluted. We characterized the purified conjugates on PAGE gels to ensure the re- moval of both unconjugated binders and excess DNA oligonucleotides, and then the purified and unpurified probes were compared in homogenous- and in situ PLA. We found that the successful removal of both unconjugated binders and excess oligonucleotides reduce the assay background by 4-fold, resulting in a totally 16-fold higher sensitivity in detection of IL-8 with ho- mogenous PLA. Similar results were obtained with the reduced background in detection of endogenous Her2 proteins with immunoRCA in fixed cells. We also confirmed that the unconjugated binders would lead to a consider- able decrease of signal by competing with probes in in situ PLA. Similarly, excess oligonucleotides increased the background in both homogenous and in situ PLA.

Paper III – Highly sensitive and specific protein and isoforms detection via proximity ligation in capillary westerns

Background Studies of cell signaling pathways have provided insights into cellular mechanisms such as migration, proliferation and apoptosis, and dysregulated signaling is a driving force for cancer and other diseases [101-102]. Post- translational modifications of proteins play an important role in understand- ing complex signaling pathways [103]. Current methods for studying spe- cific proteins and their post-translational modifications such as phosphoryla- tion involve mainly mass spectrometry or antibody-based analysis. However, mass spectrometry demands considerably amounts of samples due to the limited sensitivity, and antibody-based analysis such as western blotting requires previous knowledge of specific phosphorylation sites and is com- promised by cross-reactive antibodies. In a previous study, in situ PLA was combined with traditional western blotting, which allows enhanced sensitiv- ity and specificity in detection of proteins and their modifications due to the requirement for dual recognition, followed by signal amplification [104]. However, the laborious handling procedure and large consumption of re-

37 agents may prevent the assay from being widely applied in research. The NanoPro 1000 system developed by ProteinSimple allows protein and their isoforms to be separated through isoelectric focusing in a nanoliter capillary and immobilized to the capillary wall after crosslinking. Separated proteins are then probed with affinity reagents similar to in regular western blotting. The system is highly automated and is used for various studies for post- translational modifications of proteins. But it remains limited by problems associated with cross-reactive antibodies and insufficient sensitivity.

Aim of the study We aimed to apply the proximity ligation with rolling circle amplification readout format in the NanoPro 1000 system in order to achieve higher sensi- tivity and specificity in detecting proteins and their isoforms with pan- specific antibodies.

Findings and conclusions Different parameters of PLA were tested for detecting recombinant human VEGF in the NanoPro 1000 system and we found that the probe concentra- tion and the RCA duration were positively correlated with the chemilumi- nescent signals. We compared single-binder PLA and standard NanoPro 1000 to detect endogenous level of Erk1/2 in HDMEC cell lysates and found a 10-fold increased of detection signal for Erk2 and a 20-fold increased de- tection signal for Erk1. Thereby we improved the detection limit for Erk in HDMEC cell lysate from 5 µg/ml to 0.6 µg/ml. We also demonstrated im- proved specificity by using double-binder PLA with two cross-reactive pri- mary antibodies against the same target for the detection of endogenous level of aldose reductase and S100A6.

Paper IV – Profiling individual protein complexes by proximity-dependent barcoding

Background Proteins form transient or stable interaction with each other to function in various crucial biological pathways. Protein complexes have always been a great interest in studies of signaling pathways, diseases, and drug responses. Co-immunoprecipitation followed by either western blotting or mass spec- trometry has proven useful to identify and characterize interacting partners in protein complexes. However, these technologies are limited in the aspect of sensitivity and they cannot characterize heterogeneous protein complexes as they are measured in bulk. DNA-assisted protein technology has the potential to detect targets at the single molecule level as affinity reagents are coupled with amplifiable oli-

38 gonucleotides. Proteins are barcoded with designed oligonucleotide se- quences that can be decoded with different readout platforms such as DNA microarray, real-time PCR or next-generation sequencing. Proximity ligation assays with dual-tag array readout have been used to simultaneously profile all pair-wise interactions among a small set of proteins in cell lysates [92]. However, new methods are needed to profile heterogeneous protein com- plexes with more than two interacting partners, such as transcription ma- chineries and exosomes

Aim of the study We aim to establish a method that can be utilized to profile the composition of large sets of individual protein complexes in parallel, and at single mole- cule sensitivity.

Findings and conclusions Here, we report a new approach to profile protein complexes using a prox- imity-dependent barcoding assay (PBA). Antibodies recognizing different proteins are labeled with different clusters of protein-specific DNA tags (PST) and these are brought in proximity when binding to proteins in the same protein complex. The universal sequences on the PSTs in proximity then hybridize to the same clone of a DNA concatamer containing many copies of the same specific DNA tags (CST) allowing the unique sequence to be incorporated in antibody-conjugated oligonucleotides through DNA extension. In this way, all detected proteins are barcoded with DNA strands composed of PST-CST elements indicating the identity of the protein and the individual complex in which this protein participated. All this information was decoded by sequencing the whole DNA library of extended oligonucleo- tides using Illumina technology. First, artificial complexes prepared by mixing streptavidin proteins with biotin-oligonucleotides (STV-bio-oligos) were used to validate the method. A mix of biotinylated oligonucleotides with 4 different sequences were in- cubated with streptavidin tetramers, and in another case, each of the four oligonucleotides were separately incubated with streptavidin tetramers be- fore the four STV-bio-oligos were mixed for the assay where the oligonu- cleotides were extended across unique tag sequences as evidence of prox- imity. We found that STV-bio-oligos incubated separately with one type of oligonucleotide each before being combined for the extension reactions, reflected complexes with one oligonucleotide type each. Conversely, mixed sample were detected as complexes with multiple oligonucleotide types. This demonstrated that the extension happened when probes were in prox- imity, and accordingly that the PBA method is promising for detection of proximity among several proteins. We next used PBA to profile surface pro- teins on prostasomes, secreted vesicles with diameter of 40-500 nM, and

39 found heterogeneous protein complexes associated with two or three distinct proteins such as CD26, CD63 and Cathepsin B.

40 Future perspectives

The weakest link in the “omics” era lies in the proteomics, while compre- hensive understanding has been achieved in different organisms at levels of genome, transcriptome and to some extent the metabolome. There are two intriguing questions regarding proteomes that I am eager to find out the an- swers for. One concerns the connection between protein sequence, structure and function. If the amino acid sequence of a protein is known, will it be possible to predict the 3D structure of the protein and how the structural domains of the protein including specific PTMs relate to its functions and its ability to bind other molecules? Comparative genomics provides information regard- ing homologous protein sequences and domain clusters among different species and protein families, and high-resolution methods such as X-ray crystallography and NMR (nuclear magnetic resonance) can provide direct evidence linking sequence and structure. However, this information is typi- cally acquired from proteins in rather artificial states. Methods like Cryo-ET (cryo-electron tomography) aim at solving structural biology problems in situ, so that large protein assemblies can be viewed in their natural state, although the resolution is considerably reduced compared with X-ray crys- tallography and NMR [105]. Combined with targeted MS, the detailed struc- tural features of a protein such as its PTMs can be demonstrated [106]. The other question is how proteins interact with others to accomplish their various tasks. In the past decades, methods to study protein-protein interac- tion have generated vast amounts of interactome data, and predictions of interaction have been made in silico based on the sequence and structural information among different species. However, interactome data is fre- quently limited to pair-wise interaction and Y2H-like methods investigate interaction from proteins in unnatural states. A desirable method to study interactomes could have following properties: “snap-freezing” individual cells immediately after perturbation so that even the weakest interaction are preserved; “printing” the proteins and interaction onto another matrix so that the space information can be recorded; identifying interacting partners on a single-complex level with advanced proteomic approaches; generating a comprehensive profile of interactions in dimension of time, space and across different perturbations. In the last paper of this thesis, we endeavor to ad- dress this question using affinity-based proximity barcoding, which has some potential advantages, including the natural states of the proteins to be

41 investigated and the potentially very large capacity via DNA barcoding. However, technical bottlenecks on the reliability of affinity reagents, effi- ciency of proximity barcoding, and random proximity overshadowing true interactions remain to be tackled with improvements from different aspects including binder production, molecular design, and data analysis. If proteomics as the weakest chain in omics is strengthened by revolu- tionary technologies, molecular diagnostics will be an important application are in the course of clinical medicine. A regular blood test measuring levels of relevant proteins and metabolites will tell us the status of different organs, where diseases can be found at an early stage as reflected by even very slight changes in the profile. Personalized treatment based on information on the genomic and transcriptomic levels will enhance the success rate of the se- lected therapy. Detailed profiles of patients both on proteomic and metabolic levels during treatment will be increasingly helpful to monitor patients’ re- sponses to the treatment, disease progression, or recurrence. “What I cannot create, I do not understand.” A famous quote from the late Nobel Laureate in physics, Richard Feynman, is often used by synthetic biologists who are trying to make biological systems more efficient, reliable and predictable through engineering-based approaches for research and therapeutic purpose based on the currently limited knowledge of biological systems [107]. Although the approach of creating something not yet under- stood in order to understand it has its challenges, there are numerous poten- tial therapeutic applications of this kind of research, such as immune-therapy with modified T cells, which could be one solution to cure cancer [108].

42 Acknowledgements

Now I would like to express my gratitude to people who have helped me during my PhD years in and outside the lab. First I would like to thank my supervisor Masood Kamali-Moghaddam for giving me the opportunity to work in this fantastic lab, for believing in me all the time even when things are not going as planned. Thank you for all the advices and suggestions scientifically, as well as planning and prioritiz- ing things. Thank you for your encouraging attitude towards the exploratory trial projects and your optimism when I am just being too cautious. I would also like to thank my co-supervisor Ulf Landegren for creating such an excellent place for me to learn a lot of different techniques, to try different things without being restricted to rules and resources. Thank you for your numerous brilliant ideas although sometimes it is a bit hard to fol- low. Thank you for your enthusiasm in conveying ideas to collaborators and for finding interesting application to our techniques. I would like to thank Christina Magnusson for all the help on the admin- istrative work with your great efficiency. I would also like to thank Anna Dimberg for being my examiner and thanks for your compliment on my presentation at the IGP PhD symposium. Claes Wadelius, thanks for all the help with defense application. Special thanks to IGP PhD student council for organizing all the social activities especially the Rudbeck Masquerade which I enjoyed every winter. I would like to thank Erik for organizing all the projects and for the help when I need to write the acknowledgements in a manuscript. Elin E, Chris- tina C and Johanna for all the help in every corner of the lab, thank you for your exquisite home-made fikas. Johan O for being so helpful when our computers are down and teaching me what is a hexagon. For the people from the lab, I would like to thank Maria H for being my first teacher in the lab and introducing me to the colorful world in PLA. Thank you for your enthusiasm in conjugation of all kinds of binders and being a great collaborator in my projects. Di for sharing so many brilliant ideas with me, pursuing the barcoding project with constant efforts, which I believe will work eventually, and encouraging me to do so many new things like snowboarding; Lotta for sharing the passion for exosome purification and detection. Liza for your positive attitude and your warm smile, thank you for bringing important issues up in the Monday morning meeting. Felipe, thanks for bringing sunshine and Brazilian humor into the lab and

43 enlightening me on the redundancy of rolling circle amplification. Special thanks to Anne-Li for all the fun time we had in and outside the lab. Thanks for all the lunches, dinners and party times we had in Uppsala especially the Matrix shopping and shooting. Thanks for your support and understanding when things are not going so well. Lei for growing up with me scientifically in the lab, teaching me how to roll bigger, stronger and brighter blobs and all the other practical things in life like how to drive a car. Tonge for your encouragement when I feel less confident with myself and reminding me always look on the bright side. Rasel (Crowe) for your Bangladeshian humor, which I recently started to get your point. Thanks to Andries for your enormous knowledge about sig- naling pathways and all the biology-related topics; Caroline for pursuing ing single cell proteomics in the lab; Joakim for the expertise in samples and jokes during party dinners; Rachel for being a great office mate since we moved to BMC, for the help in solid-phase PLA, and for all the scientific discussions which ends up in philosophical debates. Also, thanks to Yajun for being curious at different techniques in our lab, eager to learn everything, and the discussion about family and career: Agata for smuggling antibodies from HPA, which works well in the capillary project. Peter for your enthu- siasm in cell signaling pathway and protein interactions. Johan B for your effort in the PrepProx project and I really enjoy working with you. Thanks to Hanan, Leisa and Samaneh for playing confusing students in the spex. Ola Söderberg, thanks for leading the in situ PLA group and your effort in creating an organized working environment; and for being a great oppo- nent in my master thesis presentation. Karin G, thanks for all the help with cells and making excellent spex movies with us: Linda for reminding me to take a break and have a fika, for always helping me when the glass in the hood is stuck. Carl-Magnus for your passion in charity, religion and poli- tics, and of course blobs. Björn, thanks for being a non-stereotypical Ger- man with endless jokes and interesting discussions. Axel for your help with the ancient HPLC, your lectures on Swedish culture and similar taste on American TV shows. Thanks Gaëlle, for organizing all the fun things to do outside the lab, movies, dinners and basketball. I would like to thank the padlock group, led by Mat Nilsson, for helping me think outside the box from time to time: Anja M for the master courses and the Swedish course that we have done together, Marco M for the Italian atmosphere you brought to the lab and sorry for the argument about different shapes of pastas, David H for sharing office in Rudbeck and the interest in GOT, Camilla for bringing some real chemistry to the lab, Elin F for the PERL course and Lotte for making me finally understand the difference between a selector and a padlock through your nice presentations, Thomas, Thomasz, Tagrid, Malte, Anna E, Elin L, Annika for the expertise and fun you brought.

44 And also people who used to work in the lab: Spyros for your million ways of solving one problem when I asked you for trouble-shooting, Rongqin for your strong opinions about feasibility of a project and also poli- tics, Gucci for our collaboration and being a good company for gym and wine, Irene and Ida for leading me into the world of spex, Johan V for the discussions on how to be a good entrepreneur, Mikaela for initiative work of alternative binders, Carla for the Salsa courses, Lena S for zero tolerance of chaos in the lab, Yanling L for updating the scientific world news when we meet, Pier for showing me how to run a pH gradient gel in a noodle. Naveen and Poojah for the Indian food and dessert, Joyce and Jackie for bringing US lab style to the group, George for your enthusiasm in microfludics and classical music, Takao for the best home-made sushi I ever had, Tasuro for the fun discussion at Christmas party, Ping for your pursuit in reducing background for solid-phase PLA, Rui for being great company when I was still a student in the lab, Kun and Xiao for the best Christmas dinner. Lin- nea N, thanks for sharing the student room in Rudbeck and fun time at our first MolTools group retreat. I would like to thank my collaborators outside the lab who have contrib- uted a lot to the thesis and other projects: Daniel Horák for encouraging me to work on the microparticles by sending me numerous emails and finally publish my first paper, Lena Claesson-Welsh and Narendra Padhan for all the help in protein phosphorylation and cell signaling pathway for the PLA on NanoPro project, Elaine Scrivener and Annegret Boge from Protein- Simple for providing the reagents and technical support, and also for the fun time in Paris, Göran Ronquist for constantly providing me with high- quality purified prostasomes, the scientific discussion about exosomes and the museum trip in Boston, Marco Cavalli for providing sheared chromatin, Special thanks to Jonas Schaefer for your hospitality during my stay in Zurich and stuffing me with nice Swiss chocolates from Sigma when some- thing is not working. I would like to thank Femi and Raghu for being my students to practice my pedagogic ideals. I would like to thank Marie Allen, Maria L and Vic- toria for all the new techniques like dusting a fingerprint that I learned when teaching the forensic course. Soumi, Dorothea, Hongxing and Prasoon, it is nice to know you guys in Rudbeck. I would like to thank my classmates in the IBG Master programme, Marius, Anja N, David P, Wael and special thanks to Karin Karlsson for being my examiner for my master thesis and gave me so many critical com- ments. And my corridor mates Sarah, Niclas, David N, Melissa and Johan S, thank you guys for the company during my first year in Uppsala. Also, thank you Agnese for being a great flatmate when I desperately need one and for all the delicious sugar crash in order to survive the winter.

45 感谢在乌普萨拉一起愉快玩耍的小伙伴们,你们的存在让我觉得在 瑞典的生活丰富多彩: 蔡延玲,陈晓梅,陈瑜,程倩,曹晓芳,付曦,郭小虎,郭晓媛, 黄华,胡劲之,华恺,焦响,靳川,康菁菁,李镝锐,林超颖,陆禄, 李黎,李浩,刘珩硕,刘宓,刘杨,孟曦男,倪睿庆,倪蕴韬,潘刚, 钱兆,荣斐,石成蹊,孙立,孙峪,孙阳,田亮,吴世英,王川, 解源,熊安琪,徐成,徐晖,修彩虹,郁笛,于子泉,张伟,张景阳, 张研宇,张磊,张靖佶,张博,张亨,张蓬,郑琳,赵丹,赵金, 赵雅妮…… 特别感谢:丁舟洁,一起来到乌普萨拉是阿拉的缘分,感谢这些年 来的照顾,支持和鼓励;崔悦,同一屋檐下的处女座,一切尽在不言 中;张璐和姜望舒,谢谢你们的满满正能量,一起学习,一起成长的 日子真美好;刘禄安,感谢你一直纵容我的坏脾气。 感谢徐之涵和孙冠在地球的不同角落感同身受博士生活的喜怒哀乐, 让我真切的感受到海内存知己,天涯若比邻。 最后感谢爸爸妈妈,谢谢你们这些年的支持和鼓励。父母在,不远 游,游必有方。

46 References

1. Strange, K., The end of "naive reductionism": rise of systems biology or renaissance of physiology? Am J Physiol Cell Physiol, 2005. 288(5): p. C968-74. 2. Ma'ayan, A., et al., Formation of regulatory patterns during signal propagation in a Mammalian cellular network. Science, 2005. 309(5737): p. 1078-83. 3. Mast, F.D., A.V. Ratushny, and J.D. Aitchison, Systems cell biology. J Cell Biol, 2014. 206(6): p. 695-706. 4. Joyce, A.R. and B.O. Palsson, The model organism as a system: integrating 'omics' data sets. Nat Rev Mol Cell Biol, 2006. 7(3): p. 198-210. 5. Lodish, H.F., Molecular cell biology. 6th ed. 2007, New York: W.H. Freeman. xxxvii, 1150, [76] p. 6. Godovac-Zimmermann, J., 8th Siena meeting. From genome to proteome: integration and proteome completion. Expert Rev Proteomics, 2008. 5(6): p. 769-73. 7. Tiselius, A., Electrophoresis of serum globulin: Electrophoretic analysis of normal and immune sera. Biochem J, 1937. 31(9): p. 1464-77. 8. Kenrick, K.G. and J. Margolis, Isoelectric focusing and gradient gel electrophoresis: a two-dimensional technique. Anal Biochem, 1970. 33(1): p. 204-7. 9. Altelaar, A.F., J. Munoz, and A.J. Heck, Next-generation proteomics: towards an integrative view of proteome dynamics. Nat Rev Genet, 2013. 14(1): p. 35-48. 10. Picotti, P. and R. Aebersold, Selected reaction monitoring-based proteomics: workflows, potential, pitfalls and future directions. Nat Methods, 2012. 9(6): p. 555-66. 11. Anderson, N.L., et al., Mass spectrometric quantitation of peptides and proteins using Stable Isotope Standards and Capture by Anti- Peptide Antibodies (SISCAPA). J Proteome Res, 2004. 3(2): p. 235- 44. 12. Razavi, M., et al., High-throughput SISCAPA quantitation of peptides from human plasma digests by ultrafast, liquid chromatography-free mass spectrometry. J Proteome Res, 2012. 11(12): p. 5642-9. 13. Engvall, E. and P. Perlmann, Enzyme-linked immunosorbent assay (ELISA). Quantitative assay of immunoglobulin G. Immunochemistry, 1971. 8(9): p. 871-4.

47 14. Blackburn, G.F., et al., Electrochemiluminescence detection for development of immunoassays and DNA probe assays for clinical diagnostics. Clin Chem, 1991. 37(9): p. 1534-9. 15. Kenten, J.H., et al., Rapid electrochemiluminescence assays of polymerase chain reaction products. Clin Chem, 1991. 37(9): p. 1626-32. 16. Todd, J., et al., Ultrasensitive flow-based immunoassays using single-molecule counting. Clin Chem, 2007. 53(11): p. 1990-5. 17. Savage, M.J., et al., A sensitive abeta oligomer assay discriminates Alzheimer's and aged control cerebrospinal fluid. J Neurosci, 2014. 34(8): p. 2884-97. 18. Rissin, D.M., et al., Single-molecule enzyme-linked immunosorbent assay detects serum proteins at subfemtomolar concentrations. Nat Biotechnol, 2010. 28(6): p. 595-9. 19. Landegren, U., et al., Opportunities for sensitive plasma proteome analysis. Anal Chem, 2012. 84(4): p. 1824-30. 20. Karch, C.M., C. Cruchaga, and A.M. Goate, Alzheimer's disease genetics: from the bench to the clinic. Neuron, 2014. 83(1): p. 11-26. 21. Beecham, A.H., et al., Analysis of immune-related loci identifies 48 new susceptibility variants for multiple sclerosis. Nat Genet, 2013. 45(11): p. 1353-60. 22. Delgado, J.M., et al., Dialytrode for long term intracerebral perfusion in awake monkeys. Arch Int Pharmacodyn Ther, 1972. 198(1): p. 9-21. 23. Maurer, M.H., et al., The proteome of human brain microdialysate. Proteome Sci, 2003. 1(1): p. 7. 24. Chang, S.Y., et al., Analysis of rat brain microdialysate by gas chromatography-high-resolution selected-ion monitoring mass spectrometry. J Chromatogr, 1991. 562(1-2): p. 111-8. 25. Giusti, L., P. Iacconi, and A. Lucacchini, Fine-needle aspiration for proteomic study of tumour tissues. Proteomics Clin Appl, 2011. 5(1- 2): p. 24-9. 26. Ekins, R.P., Multi-analyte immunoassay. J Pharm Biomed Anal, 1989. 7(2): p. 155-68. 27. Bertenshaw, G.P., et al., Multianalyte profiling of serum antigens and autoimmune and infectious disease molecules to identify biomarkers dysregulated in epithelial ovarian cancer. Cancer Epidemiol Biomarkers Prev, 2008. 17(10): p. 2872-81. 28. Gurbel, P.A., et al., Biomarker analysis by fluorokine multianalyte profiling distinguishes patients requiring intervention from patients with long-term quiescent coronary artery disease: a potential approach to identify atherosclerotic disease progression. Am Heart J, 2008. 155(1): p. 56-61. 29. Jonas, J.B., et al., Cytokine concentration in aqueous humor of eyes with diabetic macular edema. Retina, 2012. 32(10): p. 2150-7.

48 30. Petersson, L., et al., Multiplexing of miniaturized planar antibody arrays for serum protein profiling--a biomarker discovery in SLE nephritis. Lab Chip, 2014. 14(11): p. 1931-42. 31. Eke, I., et al., EGFR/JIP-4/JNK2 signaling attenuates cetuximab- mediated radiosensitization of squamous cell carcinoma cells. Cancer Res, 2013. 73(1): p. 297-306. 32. Ellington, A.A., et al., Antibody-based protein multiplex platforms: technical and operational challenges. Clin Chem, 2010. 56(2): p. 186-93. 33. Bystrom, S., et al., Affinity Proteomic Profiling of Plasma, Cerebrospinal Fluid, and Brain Tissue within Multiple Sclerosis. J Proteome Res, 2014. 13(11): p. 4607-19. 34. Gold, L., et al., Advances in human proteomics at high scale with the SOMAscan proteomics platform. N Biotechnol, 2012. 29(5): p. 543- 9. 35. Sattlecker, M., et al., Alzheimer's disease biomarker discovery using SOMAscan multiplexed protein technology. Alzheimers Dement, 2014. 36. Gold, L., et al., Aptamer-based multiplexed proteomic technology for biomarker discovery. PLoS One, 2010. 5(12): p. e15004. 37. Mehan, M.R., et al., Validation of a blood protein signature for non- small cell lung cancer. Clin Proteomics, 2014. 11(1): p. 32. 38. Sano, T., C.L. Smith, and C.R. Cantor, Immuno-PCR: very sensitive antigen detection by means of specific antibody-DNA conjugates. Science, 1992. 258(5079): p. 120-2. 39. Dekker, J., et al., Capturing chromosome conformation. Science, 2002. 295(5558): p. 1306-11. 40. Fredriksson, S., et al., Protein detection using proximity-dependent DNA ligation assays. Nat Biotechnol, 2002. 20(5): p. 473-7. 41. Gullberg, M., et al., Cytokine detection by antibody-based proximity ligation. Proc Natl Acad Sci U S A, 2004. 101(22): p. 8420-4. 42. Lundberg, M., et al., Homogeneous antibody-based proximity extension assays provide sensitive and specific detection of low- abundant proteins in human blood. Nucleic Acids Res, 2011. 39(15): p. e102. 43. Gustafsdottir, S.M., et al., Detection of individual microbial pathogens by proximity ligation. Clin Chem, 2006. 52(6): p. 1152- 60. 44. Darmanis, S., et al., Sensitive plasma protein analysis by microparticle-based proximity ligation assays. Mol Cell Proteomics, 2010. 9(2): p. 327-35. 45. Tavoosidana, G., et al., Multiple recognition assay reveals prostasomes as promising plasma biomarkers for prostate cancer. Proc Natl Acad Sci U S A, 2011. 108(21): p. 8809-14. 46. Darmanis, S., et al., ProteinSeq: high-performance proteomic analyses by proximity ligation and next generation sequencing. PLoS One, 2011. 6(9): p. e25583.

49 47. Thorsen, S.B., et al., Detection of serological biomarkers by proximity extension assay for detection of colorectal neoplasias in symptomatic individuals. J Transl Med, 2013. 11: p. 253. 48. Assarsson, E., et al., Homogenous 96-plex PEA immunoassay exhibiting high sensitivity, specificity, and excellent scalability. PLoS One, 2014. 9(4): p. e95192. 49. Soderberg, O., et al., Direct observation of individual endogenous protein complexes in situ by proximity ligation. Nat Methods, 2006. 3(12): p. 995-1000. 50. Leuchowius, K.J., et al., Parallel visualization of multiple protein complexes in individual cells in tumor tissue. Mol Cell Proteomics, 2013. 51. Harlow, E. and D. Lane, Antibodies : a laboratory manual. 1988, Cold Spring Harbor, NY: Cold Spring Harbor Laboratory. xiii, 726 p. 52. Kohler, G. and C. Milstein, Continuous cultures of fused cells secreting antibody of predefined specificity. Nature, 1975. 256(5517): p. 495-7. 53. Stadler, C., et al., Immunofluorescence and fluorescent-protein tagging show high correlation for protein localization in mammalian cells. Nat Methods, 2013. 10(4): p. 315-23. 54. Janeway, C., Immunobiology : the immune system in health and disease. 5th ed. 2001, London ; New York, NY, US: Garland Pub. xviii, 732 p. 55. Hanes, J. and A. Pluckthun, In vitro selection and evolution of functional proteins by using ribosome display. Proc Natl Acad Sci U S A, 1997. 94(10): p. 4937-42. 56. Hust, M., et al., A human scFv antibody generation pipeline for proteome research. J Biotechnol, 2011. 152(4): p. 159-70. 57. Hamers-Casterman, C., et al., Naturally occurring antibodies devoid of light chains. Nature, 1993. 363(6428): p. 446-8. 58. Muyldermans, S., Single domain camel antibodies: current status. J Biotechnol, 2001. 74(4): p. 277-302. 59. Amstutz, P., et al., Rapid selection of specific MAP kinase-binders from designed ankyrin repeat protein libraries. Protein Eng Des Sel, 2006. 19(5): p. 219-29. 60. Zahnd, C., et al., A designed ankyrin repeat protein evolved to picomolar affinity to Her2. J Mol Biol, 2007. 369(4): p. 1015-28. 61. Gebauer, M. and A. Skerra, Engineered protein scaffolds as next- generation antibody therapeutics. Curr Opin Chem Biol, 2009. 13(3): p. 245-55. 62. Boersma, Y.L. and A. Pluckthun, DARPins and other repeat protein scaffolds: advances in engineering and applications. Curr Opin Biotechnol, 2011. 22(6): p. 849-57. 63. Hassanzadeh-Ghassabeh, G., et al., Nanobodies and their potential applications. Nanomedicine (Lond), 2013. 8(6): p. 1013-26.

50 64. Baral, T.N., et al., Experimental therapy of African trypanosomiasis with a nanobody-conjugated human trypanolytic factor. Nat Med, 2006. 12(5): p. 580-4. 65. Saerens, D., G.H. Ghassabeh, and S. Muyldermans, Single-domain antibodies as building blocks for novel therapeutics. Curr Opin Pharmacol, 2008. 8(5): p. 600-8. 66. Hinman, L.M., et al., Preparation and characterization of monoclonal antibody conjugates of the calicheamicins: a novel and potent family of antitumor antibiotics. Cancer Res, 1993. 53(14): p. 3336-42. 67. Hamann, P.R., et al., An anti-CD33 antibody-calicheamicin conjugate for treatment of acute myeloid leukemia. Choice of linker. Bioconjug Chem, 2002. 13(1): p. 40-6. 68. Corti, A., et al., Peptide-mediated targeting of cytokines to tumor vasculature: the NGR-hTNF example. BioDrugs, 2013. 27(6): p. 591-603. 69. Politis, A., et al., A mass spectrometry-based hybrid method for structural modeling of protein complexes. Nat Methods, 2014. 11(4): p. 403-6. 70. Herzog, F., et al., Structural probing of a protein phosphatase 2A network by chemical cross-linking and mass spectrometry. Science, 2012. 337(6100): p. 1348-52. 71. Barchanski, A., et al., Impact of spacer and strand length on oligonucleotide conjugation to the surface of ligand-free laser- generated gold nanoparticles. Bioconjug Chem, 2012. 23(5): p. 908- 15. 72. Shrivastav, T.G., et al., Influence of different length spacers containing enzyme conjugate on functional parameters of progesterone ELISA. J Immunoassay Immunochem, 2013. 34(1): p. 94-108. 73. Keppler, A., et al., A general method for the covalent labeling of fusion proteins with small molecules in vivo. Nat Biotechnol, 2003. 21(1): p. 86-9. 74. Gu, G.J., et al., Protein tag-mediated conjugation of oligonucleotides to recombinant affinity binders for proximity ligation. N Biotechnol, 2013. 30(2): p. 144-52. 75. Wang, L., et al., Expanding the genetic code of Escherichia coli. Science, 2001. 292(5516): p. 498-500. 76. Chin, J.W., et al., Addition of a photocrosslinking amino acid to the genetic code of Escherichiacoli. Proc Natl Acad Sci U S A, 2002. 99(17): p. 11020-4. 77. Crick, F., Central dogma of molecular biology. Nature, 1970. 227(5258): p. 561-3. 78. Bendall, S.C. and G.P. Nolan, From single cells to deep phenotypes in cancer. Nat Biotechnol, 2012. 30(7): p. 639-47. 79. Hanahan, D. and R.A. Weinberg, Hallmarks of cancer: the next generation. Cell, 2011. 144(5): p. 646-74.

51 80. Wang, Z.A., et al., Lineage analysis of basal epithelial cells reveals their unexpected plasticity and supports a cell-of-origin model for prostate cancer heterogeneity. Nat Cell Biol, 2013. 15(3): p. 274-83. 81. Magee, J.A., E. Piskounova, and S.J. Morrison, Cancer stem cells: impact, heterogeneity, and uncertainty. Cancer Cell, 2012. 21(3): p. 283-96. 82. Ke, R., et al., In situ sequencing for RNA analysis in preserved tissue and cells. Nat Methods, 2013. 10(9): p. 857-60. 83. Lee, J.H., et al., Highly multiplexed subcellular RNA sequencing in situ. Science, 2014. 343(6177): p. 1360-3. 84. Bendall, S.C., et al., Single-cell mass cytometry of differential immune and drug responses across a human hematopoietic continuum. Science, 2011. 332(6030): p. 687-96. 85. Newell, E.W., et al., Cytometry by time-of-flight shows combinatorial cytokine expression and virus-specific cell niches within a continuum of CD8+ T cell phenotypes. Immunity, 2012. 36(1): p. 142-52. 86. Hughes, A.J., et al., Single-cell western blotting. Nat Methods, 2014. 11(7): p. 749-55. 87. Kang, C.C., et al., Single-Cell Western Blotting after Whole-Cell Imaging to Assess Cancer Chemotherapeutic Response. Anal Chem, 2014. 88. Fields, S. and O. Song, A novel genetic system to detect protein- protein interactions. Nature, 1989. 340(6230): p. 245-6. 89. Uetz, P., et al., A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae. Nature, 2000. 403(6770): p. 623-7. 90. Zhu, J., et al., Protein interaction discovery using parallel analysis of translated ORFs (PLATO). Nat Biotechnol, 2013. 31(4): p. 331-4. 91. Gu, L., et al., Multiplex single-molecule interaction profiling of DNA-barcoded proteins. Nature, 2014. 92. Hammond, M., et al., Profiling cellular protein complexes by proximity ligation with dual tag microarray readout. PLoS One, 2012. 7(7): p. e40405. 93. McGregor, L.M., T. Jain, and D.R. Liu, Identification of ligand- target pairs from combined libraries of small molecules and unpurified protein targets in cell lysates. J Am Chem Soc, 2014. 136(8): p. 3264-70. 94. Zieba, A., et al., Intercellular variation in signaling through the TGF-beta pathway and its relation to cell density and cell cycle phase. Mol Cell Proteomics, 2012. 11(7): p. M111 013482. 95. Guesdon, J.L. and S. Avrameas, Magnetie solid phase enzyme- immunoassay. Immunochemistry, 1977. 14(6): p. 443-7. 96. Rittich, B., et al., Characterization of deoxyribonuclease I immobilized on magnetic hydrophilic polymer particles. J Chromatogr B Analyt Technol Biomed Life Sci, 2002. 774(1): p. 25- 31.

52 97. Darmanis, S., et al., Sensitive Plasma Protein Analysis by Microparticle-based Proximity Ligation Assays. Molecular & Cellular Proteomics, 2010. 9(2): p. 327-335. 98. Nong, R.Y., et al., DNA-assisted protein detection technologies. Expert Rev Proteomics, 2012. 9(1): p. 21-32. 99. Gu, G.J., et al., Role of individual MARK isoforms in phosphorylation of tau at Ser(2)(6)(2) in Alzheimer's disease. Neuromolecular Med, 2013. 15(3): p. 458-69. 100. Conze, T., et al., Analysis of genes, transcripts, and proteins via DNA ligation. Annu Rev Anal Chem (Palo Alto Calif), 2009. 2: p. 215-39. 101. Deschenes-Simard, X., et al., ERKs in cancer: friends or foes? Cancer Res, 2014. 74(2): p. 412-9. 102. O'Connor, J.W. and E.W. Gomez, Biomechanics of TGFbeta- induced epithelial-mesenchymal transition: implications for fibrosis and cancer. Clin Transl Med, 2014. 3: p. 23. 103. Seet, B.T., et al., Reading protein modifications with interaction domains. Nat Rev Mol Cell Biol, 2006. 7(7): p. 473-83. 104. Liu, Y., et al., Western blotting via proximity ligation for high performance protein analysis. Mol Cell Proteomics, 2011. 10(11): p. O111 011031. 105. Lucic, V., A. Rigort, and W. Baumeister, Cryo-electron tomography: the challenge of doing structural biology in situ. J Cell Biol, 2013. 202(3): p. 407-19. 106. Thierbach, K., et al., Protein interfaces of the conserved Nup84 complex from Chaetomium thermophilum shown by crosslinking mass spectrometry and electron microscopy. Structure, 2013. 21(9): p. 1672-82. 107. Lienert, F., et al., Synthetic biology in mammalian cells: next generation research tools and therapeutics. Nat Rev Mol Cell Biol, 2014. 15(2): p. 95-107. 108. Barrett, D.M., et al., Chimeric antigen receptor therapy for cancer. Annu Rev Med, 2014. 65: p. 333-47.

53 Acta Universitatis Upsaliensis Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Medicine 1055 Editor: The Dean of the Faculty of Medicine

A doctoral dissertation from the Faculty of Medicine, Uppsala University, is usually a summary of a number of papers. A few copies of the complete dissertation are kept at major Swedish research libraries, while the summary alone is distributed internationally through the series Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Medicine. (Prior to January, 2005, the series was published under the title “Comprehensive Summaries of Uppsala Dissertations from the Faculty of Medicine”.)

ACTA UNIVERSITATIS UPSALIENSIS Distribution: publications.uu.se UPPSALA urn:nbn:se:uu:diva-236591 2014