Downloaded from orbit.dtu.dk on: Oct 07, 2021

Novel proteomics technologies to networks in cells and tissues

Savickas, Simonas

Publication date: 2020

Document Version Publisher's PDF, also known as Version of record

Link back to DTU Orbit

Citation (APA): Savickas, S. (2020). Novel proteomics technologies to decipher protease networks in cells and tissues. DTU Bioengineering.

General rights Copyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright owners and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights.

 Users may download and print one copy of any publication from the public portal for the purpose of private study or research.  You may not further distribute the material or use it for any profit-making activity or commercial gain  You may freely distribute the URL identifying the publication in the public portal

If you believe that this document breaches copyright please contact us providing details, and we will remove access to the work immediately and investigate your claim.

DTU Bioengineering Department of Biotechnology and Biomedicine PhD Thesis 2020

Af

Copyright: Simonas Savickas Udgivet af: XXX XXX DTU Rekvireres: www.dtu.dk ISSN: [0000-0000] (elektronisk udgave) ISBN: [000-00-0000-000-0] (elektronisk udgave)

ISSN: [0000-0000] (trykt udgave) ISBN: [000-00-0000-000-0] (trykt udgave) Preface

This thesis is the result of a PhD project under the supervision of Professor MSO Dr. Ulrich auf dem Keller and Professor Dr. Birte Svensson at Technical University of Denmark Department of Bioengineering. Due to the move of Ulrich auf dem Keller’s laboratory, the experiments were started at ETH Zurich and completed at DTU Bioengineering. This work is subdivided into three sections: a general introduction including three manuscripts that give an overview of mass spectrometry-based degradomics methods and technically introduce a key method. The results section contains four manuscripts describing my experiments. Finally, a general discussion compares findings with the existing published data and provides an outlook how mass spectrometry might be utilized as powerful tool in personalized medicine.

I hope you will enjoy reading it.

Simonas Savickas

Novel proteomics technologies to decipher protease networks in cells and tissues Papers and Manuscripts

This PhD thesis contains the following manuscripts:

Savickas, S., & auf dem Keller, U. (2017). Targeted degradomics in protein terminomics and protease substrate discovery. Biological chemistry, 399(1), 47-54.

Schoof, E. M., Rapin, N., Savickas, S., Gentil, C., Lechman, E., Haile, J. S., ... & Porse, B. T. (2019). A quantitative single-cell proteomics approach to characterize an acute myeloid leukemia hierarchy. bioRxiv, 745679.

Savickas, S., Kastl, P., & auf dem Keller, U. (2020). Combinatorial degradomics: Precision tools to unveil proteolytic processes in biological systems. Biochimica et Biophysica Acta (BBA)- Proteins and Proteomics, 140392.

Bundgaard, L., Savickas, S., & Auf dem Keller, U. (2020). Mapping the N-Terminome in Tissue Biopsies by PCT-TAILS. In ADAMTS (pp. 285-296). Humana, New York, NY.

Savickas, S., Kastl, P., auf dem Keller, U. (2020). Interconnected activities and functions of matrix at the wound edge. In preparation.

Savickas, S., Kastl, P., Schoof, M.E., auf dem Keller, U. (2020). hybrid Parallel Reaction Monitoring (hPRM) a mass spectrometry based technique for semi-discovery research. In preparation

Savickas, S., Schoof, M.E., auf dem Keller, U. (2020). Harnessing a novel targeted single-cell proteomics technique to expand the knowledge of primary leukemia hierarchy. In preparation

Additional publications that were outside of the scope of this thesis:

Sabino, F., Egli, F. E., Savickas, S., Holstein, J., Kaspar, D., Rollmann, M., ... & Auf dem Keller, U. (2018). Comparative degradomics of porcine and human wound exudates unravels biomarker candidates for assessment of wound healing progression in trauma patients. Journal of Investigative Dermatology, 138(2), 413-422.

Yip, M. C., Savickas, S., Gygi, S. P., & Shao, S. (2020). ELAC1 Repairs tRNAs Cleaved during Ribosome-Associated Quality Control. Cell reports, 30(7), 2106-2114.

Bundgaard, L. B., Agren, M. S., Holstein, P., Savickas, S., Sabino, F. M. R., & auf dem Keller, U. (2018). A SYSTEM-WIDE STUDY OF THE PROTEOME AND PROTEASE CLEAVAGE PRODUCTS IN CHRONIC SKIN ULCERS. Wound Repair and Regeneration, 26(2)

Savickas, S.& Kalogeropoulos, K. Bundgaard, L. B., Larsen, C. A., auf dem Keller, U. (2020). High throughput PRM method for rapid biomarker recording in body fluids. In preparation

Novel proteomics technologies to decipher protease networks in cells and tissues Summary

Precision processing of protein substrates by limited proteolysis has proven to play a key role in maintaining cellular and tissues homeostasis. Over the past years, an enormous amount of information on protease-substrate interactions has been collected and compiled into a complex network termed the protease web. We are only beginning to understand the connectivity of the protease web but have advanced our perception of the dynamics of proteolysis, complementary pathways and compensatory mechanisms. Identification of cell type- and tissue-specific proteolytic patterns in healthy and perturbed systems has gained much attention to promote generation of fundamental hypothesis for underlying mechanisms of responses to perturbations, thereby drawing a holistic picture of key events. However, we need highly reproducible assays to comprehensively monitor the interconnected protease networks within the protease web and to quantitatively assess protease-protease and protease-substrate relations with high specificity and sensitivity and applicability to samples from cells and tissues. In this project, we have developed novel highly specific and sensitive parallel reaction monitoring (PRM) assays to monitor a network of matrix metalloproteinases (MMPs) in complex samples. To concomitantly analyze the MMP activation status, we have specifically targeted peptides that are generated during propeptide removal, thus resulting in activation of the zymogens. These assays have been applied to study interconnected MMP activation on the molecular, cellular and tissue level. We have tested the hypothesis that MMP10 and MMP3 act as activator MMPs of several other family members and mapped and functionally tested MMP3/10-dependent MMP activation cascades in culture supernatants from fibroblasts and keratinocytes. Mouse models with normal, enhanced and ablated MMP10 activity have been analyzed and its function determined as local MMP activator in the skin. With the arrival of new mass spectrometers, we have extended the PRM analysis of the MMP network by developing a hybrid Parallel Reaction Monitoring (hPRM) method. The workflow utilizes the improved power of the instruments to retain PRM sensitivity to investigate the targeted network and concomitantly perform a global proteome analysis for further proteolytic signatures. This new single-shot method confirmed our previous findings of MMP abundances and simultaneously revealed processing of S100A11 by in response to a perturbation. Investigating proteolytic networks in vitro, in cell cultures and in complex tissues disregards the natural heterogeneity of cellular makeup. Thus, ultimately analysis at the single cell level is needed to understand the protease web in multicellular environments. To address this challenge, we have implemented a workflow and tested if we can identify proteases at the protein level in single cells. By combining fluorescence-activated cell sorting (FACS) and an optimized Single Cell ProtEomics by Mass Spectrometry (SCoPE-MS) method, we investigated the proteomic landscape of Leukemia hierarchy. In this single-cell proteomics data set we also identified and

Novel proteomics technologies to decipher protease networks in cells and tissues quantified 22 proteases including, but not limited to caspases, , cathepsins, and proteasome components across three cell types. Due to inherent low abundances, multiple known proteases could not be identified in the previous study. To address this limitation, we further extended our workflow into single-cell targeted proteomics analyses. The upgraded technique combines our optimized SCoPE-MS workflow with internal standard triggered-PRM (IS-PRM) to allow identification and quantification of low-abundance target proteins including proteases. Applying this workflow, we confirmed differential abundances of CD34 and CD38 in Leukemic Blasts, Progenitors and Stem Cells and observed their proportional distribution between the cells. In addition, we monitored HtrA serine proteases (HtrA) 2 and 3 and their flux between the cell types. Thereby, we monitored high abundance of HtrA2 in Leukemic Stem Cells and low levels in Progenitors, which might indicate a role of this protease in modulating differentiation. Having established a workflow for targeted analysis of peptides ultimately opens up the possibility to not only monitor proteases but also proteolytic cleavages by analysis of spanning peptides and neo-termini at the single-cell level. In conclusion, the delineation of protease networks by mass spectrometry-based proteomics methods will elucidate how their components temporally and spatially shape the proteome in complex tissues in health and disease. With the advancement of hybrid-PRM, we will be able to concomitantly analyze very lowly abundant proteases and their surrounding proteomes and degradomes. Finally, the targeted single-cell proteomics technique will pave the way to ultimately map out the protease web in individual cells isolated from complex heterogeneous tissues.

Novel proteomics technologies to decipher protease networks in cells and tissues Dansk

En balanceret og velkontrolleret proteolytisk processering af proteinsubstrater har vist sig at være en essentiel faktor for opretholdelse af homeostase i væv og celler. Gennem de senere år er der opnået en stor indsigt i interaktionen mellem proteaser og substrater, og det samlede billede viser et komplekst netværk, der er blevet kaldt protease-web. På trods af mange års forskning har vi stadig kun kendskab til en lille del af dette netværk, men vi har fået en mere indgående forståelse af dynamikken under proteolyse, samt de komplementære og kompensatoriske mekanismer. Der er en stor interesse i at identificere vævs- og cellespecifikke proteolytiske interaktioner under både normale og patologiske forhold for at opnå en større forståelse af, hvilke mekanismer der fører til ubalance og derved opnå indsigt i essentielle hændelser i en holistisk kontekst. For at foretage en dybdegående og troværdig analyse af samspillet i protease-netværket, er det nødvendigt med reproducerbare metoder, der kan bestemme protease-protease og protease- substrat relationerne kvantitativt med høj specificitet og sensitivitet både i væv og celler. I dette projekt har vi udviklet yderst specifikke og sensitive parallel reaction monitoring (PRM) metoder til at undersøge et netværk af matrix metalloproteinaser (MMPs) i komplekse prøver. For at monitorere MMPernes aktiveringsstadie har vi specifikt målrettet analyserne mod de peptid-sekvenser, der opstår under aktivering af zymogenet ved proteolytisk spaltning af propeptidet. Metoderne er blevet anvendt til at studere MMP-netværket på både vævs-, celle- og molekylært niveau. Hypotesen var at MMP10 og MMP3 aktiverer flere beslægtede proteiner. Samtidig har vi kortlagt og testet MMP3/MMP10-afhængige MMP aktiveringskaskader i supernatanten fra in vitro dyrkede fibroblaster og keratinocytter. Endeligt har vi anvendt musemodeller med normal, øget og slettet MMP10 aktivitet til at påvise at MMP10 aktiverer MMP- netværket lokalt i huden. De nye generationer af massespektrometre har banet vejen for, at vi kunne udvikle hybrid Parallel Reaction monitoring (hPRM), der er en modificeret og mere dybdegående PRM analyse. Metoden udnytter de nye instrumenters øgede power til at sikre høj sensitivitet under PRM analysen, imens der sideløbende foretages en analyse af det fulde proteom så den proteolytiske signatur kan kortlægges samtidig. Denne nye analysemetode bekræftede den MMP interaktion vi tidligere havde identificeret, og ydermere fandt vi at S100A11 processerede cathepsin D under patologiske forhold. Den cellulære heterogenitet bliver ofte negligeret ved undersøgelse af det proteolytiske netværk både in vitro, i cellekulturer og i prøver fra komplekse væv. Men for at få en komplet forståelse af protease-netværket er det nødvendigt at kunne analysere de enkelte celler i en multicellulær matrix. I dette projekt har vi udviklet et workflow, der muliggør analyse af én enkelt celle, herunder cellens proteaser. Ved at kombinere flourescence-activated cell sorting (FACS) og en optimeret Single Cell ProtEomics by Mass Spectrometry (SCoPE-MS) metode har vi undersøgt proteomet i det leukæmiske cellehierarki. I dette single-cell proteom dataset har vi på

Novel proteomics technologies to decipher protease networks in cells and tissues tværs af alle celletyper identificeret og kvantificeret 22 proteaser inklusiv caspaser, aminopeptidaser, cathepsiner, calpainer og komponenter fra proteasomet. Flere proteaser kunne ikke identificeres i single-cell studiet, da koncentrationen i den enkelte celle er meget lav. Vi optimerede derfor vores metode yderligere og udviklede single-cell targeted proteome-analysen. Denne innovative nye teknik kombinerer vores optimerede SCoPE- MS workflow med internal standard triggered PRM (IS-PRM) og muliggør identificering og kvantificering af proteiner, herunder proteaser, i meget lav koncentration. Ved brug af denne innovative teknik fandt vi en differentieret koncentration af CD34 og CD38 i leukæmiske blaster, progenitorer og stamceller, og fandt deres proportionelle fordeling mellem cellerne. Ydermere, undersøgte vi serine-proteaserne HtrA2 og HtrA3, og hvordan de varierede mellem celletyperne. Vi fandt HtrA2 i høj koncentration i leukæmiske stamceller, mens koncentrationen var lav i progenitorerne, hvilket kan indikere at disse proteaser er involveret i regulering af differentieringen. Denne nyudviklede metode til målrettet analyse af specifikke peptider åbner ikke bare muligheden for analyse af proteaser, men også muligheden for at analysere spanning-peptides og neo-termini på enkelt-celle niveau. Kortlægningen af protease-netværket ved brug af massespektrometri-baserede proteom- analyser giver mulighed for at klarlægge interaktionen mellem komponenterne i komplekse væv under normale biologiske forhold, samt hvordan det ændrer sig under patologiske tilstande. Med vores nyudviklede hPRM teknik vil vi ydermere kunne analysere proteaser i meget lav koncentration, samt det omgivende proteom og degradom. Endeligt, vil den innovative targeted single-cell proteomics teknik muliggøre en dybdegående analyse af protease-netværket i individuelle celler isoleret fra komplekse heterogene væv.

Novel proteomics technologies to decipher protease networks in cells and tissues Table of Contents

1. Introduction ...... 12 1.1 Proteases in cutaneous wound healing – a complex tissue response ...... 12 1.2 MMP network degradomics ...... 15 1.3 Combinatorial Degradomics – Manuscript 1 ...... 17 1.4 Targeted degradomics in protein terminomics and protease substrate discovery – Manuscript 2 ...... 26 1.5 Cellular heterogeneity and single-cell proteome analysis ...... 35

2. Research Objectives ...... 42

3. Mapping the N-Terminome in Tissue Biopsies by PCT-TAILS – Manuscript 3 ...... 43

4. Mapping matrix networks in the skin - Manuscript 4 ...... 56 4.1 Introduction ...... 59 4.2 Methods ...... 61 4.3 Results ...... 67 4.4 Discussion ...... 74

5. Hybrid Parallel Reaction Monitoring (hPRM), a novel mass spectrometry-based workflow for semi-discovery research- Manuscript 5 ...... 79 5.1 Introduction ...... 82 5.2 Methods ...... 84 5.3 Results ...... 87 5.4 Discussion ...... 92

6. Optimized SCope-MS for single-cell proteomics- Manuscript 6 ...... 96

7. Harnessing a novel targeted single-cell proteomics technique to expand the knowledge of primary leukemia hierarchy- Manuscript 7 ...... 131 7.1 Introduction ...... 134 7.2 Methods ...... 135 7.3 Results ...... 137 7.4 Discussion ...... 140

8. Concluding discussion ...... 143

9. Supplementary material ...... 149

10. Acknowledgments ...... 150

Novel proteomics technologies to decipher protease networks in cells and tissues Abbreviations and Acronyms Abbreviation Full name ACN acetonitrile ABP activity-based probes AEBSF 4-(2-aminoethyl)benzenesulfonyl fluoride hydrochloride AGC automatic gain control APMA 4-aminophenylmercuric acetate bp base pairs BSA bovine serum albumin CAA chloroacetamide COFRADIC combined fractional diagonal chromatography CPLL combinatorial peptide ligand libraries Ctrl control CV compensation voltage Cy7 cyanine7 CyTOF time of flight mass-cytometry dd double distilled DDA data dependent acquisition DIA data independent acquisition DMEM dulbecco’s modified eagle’s medium DMSO dimethyl sulfoxide DNA deoxyribonucleic acid E:S enzyme-to-substrate ratio ECM extracellular matrix EDTA ethylenediaminetetraacetic acid ELISpot enzyme-linked immune absorbent spot EPIC eth phenomics center ESI electrospray ionization FACS fluorescence-activated cell sorting FAIMS Pro high-field asymmetric waveform ion mobility spectrometry pro FBS fetal bovine serum FDR false discovery rate GuHCl guanidinium chloride HCD higher energy collisional dissociation HEPES 4-(2-hydroxyethyl)-1-piperazineethanesulfonic acid HPG-ALD hyperbranched polyglycerol aldehydes HPLC high performance liquid chromatography hPRM hybrid Parallel Reaction Monitoring KO knockout K-SFM keratinocyte serum-free medium m/z mass over charge ratio MEF mouse embryonic fibroblasts

Novel proteomics technologies to decipher protease networks in cells and tissues MMP matrix metalloproteinase MMRRC mutant mouse regional resource center MPK murine primary keratinocyte MS mass spectrometry MS/MS tandem mass spectrometry MS1 precursor ion scan MS2 product ion scan NCE normalized collision energy PAGE polyacrylamide gel electrophoresis PBS phosphate-buffered saline PCR polymerase chain reaction PCT pressure cycling technology PD proteome discoverer PE phycoerythrin PMSF phenylmethylsulfonyl fluoride PRM parallel reaction monitoring PSM's peptide spectrum matches PTM post-translational modification RFU relative fluorescence units RNA ribonucleic acid RNA-Seq RNA sequencing RT room temperature SCoPE-MS single cell proteomics by mass spectrometry SPLIFF simple split-ubiquitin-based method STAGE tip stop and go extraction tip TAILS terminal amine isotopic labeling of substrates TCEP tris(2-carboxyethyl)phosphine TFA trifluoroacetic acid tg transgenic TIMP tissue inhibitor of metalloproteinases TMT tandem mass tag TOF time of flight triggered by offset, multiplexed, accurate-mass, high- TOMAHAQ resolution, and absolute quantification Tris tris(hydroxymethyl)aminomethane v volume w weight WT wild-type

Novel proteomics technologies to decipher protease networks in cells and tissues 1. Introduction

1.1 Proteases in cutaneous wound healing – a complex tissue response The encodes close to 600 proteases making it the largest family of enzymes (Puente, Sánchez, Overall, & López-Otín, 2003). Originally, proteases have been associated with the digestion of food ensuring the degradation of components entering the gastrointestinal tract. However, with the discovery that limited proteolysis mediates blood coagulation and thus precision processing of biological substrates, proteases have adopted a role as potent protein regulators in the whole organism (Turk, Turk, & Turk, 2012). With advancements in computational power and analysis throughput the view on proteases acting in simple, single protease-substrate relations has changed to a far more complex system of interconnected networks, branched pathways and activation cascades. This is defined as the protease web, which established an interdependence of protease groups consisting of precise interactions among proteases, activators and inhibitors (Fortelny, et al., 2014). As an example of a complex tissue response governed by highly controlled proteolytic cascades, cutaneous wound healing is regulated by several interconnected protease activities that contribute to restoring the skin’s protective barrier at the site of injury (Moali & Hulmes, 2009) (Toriseva & Kähäri, 2009). First, the coagulation cascade prevents continuous loss of blood by clot formation, employing a complex network of proteolytic coagulation factors. This is followed by activation of the complement pathway, which aids in innate immunity, and soon after proteolytic degradation of fibrin by the plasmin system opens a path for migrating cells to populate the wound site (Schäfer & Werner, 2008) (Gurtner, Werner, Barrandon, & Longaker, 2008). The strictly regulated process is interacting with various matrix metalloproteinases (MMPs), a group of zinc- dependent extracellular endopeptidases, whose primary role is to degrade the extracellular matrix as well as having pivotal functions in epidermal re-epithelialization and skin remodeling (Gill & Parks, Metalloproteinases and their inhibitors: Regulators of wound healing, 2008) (Page-McCaw, Ewald, & Werb, 2007). Latest literature shows that these proteases have an expanded proteolytic function targeting chemokines, growth factors, their receptors and binding proteins, adding an additional dimension of regulation for immune cell recruitment, angiogenesis and epithelial cell proliferation (Parks, Wilson, & López-Boado, 2004). Additionally, many severe diseases like different forms of cancers or chronic inflammation have been associated with dysregulated MMP expression or activity (Hadler-Olsen, Fadnes, Sylte, Uhlin-Hansen, & Winberg, 2011). Thus, an enormous effort has been put into understanding the role of MMPs in those biological processes, by analyzing the substrate degradome of each MMP, functional analysis of cleaved substrates and using mice with genetically altered MMP expression (Gill, Kassim, Birkland, & Parks, 2010). Interestingly, many studies only targeting a single MMP member in a complex system led to inconclusive findings, pointing towards a far more complex interdependent MMP network.

12 Novel proteomics technologies to decipher protease networks in cells and tissues Figure 1-Secreted MMP distribution of functional protein domains across each MMP class. Adapted from Madzharova et al. (2019). All MMPs share functional domains like signal peptide (SP), thiol-containing (SH) propeptide (Pro), catalytic moiety with a zink (Zn2+) binding site and a disulfide bridged hemopexin domain, except for MMP7 and MMP26 that lack hemopexin domain. The secreted classes of MMPs are distinguished as simple MMPs with the basic structure, gelatin-binding containing fibronectin repeats, furin- activated with furin binding sites and MMP21 with a vitronectin-like insert

The transcriptional regulation of MMP expression is well studied and has been shown to be mediated by cis-regulatory elements in their promotor regions, which bind different cytokines, growth factors and hormones (Overall & López-Otín, 2002) (Fanjul-Fernández, Folgueras, Cabrera, & López-Otín, 2010). On the other hand, MMP activation on the protein level, a pivotal feature of MMP biology, is still not completely understood. All MMPs are expressed as inactive zymogens (proMMPs) (Figure 1). In this conformation, their N-terminal pro-peptides inhibit the catalytic domain by a cysteine-switch mechanism and must be proteolytically removed in order to activate the enzyme. Accumulated proMMPs in the extracellular space can be readily activated by other activating proteases upon disruption of tissue homeostasis.

Novel proteomics technologies to decipher protease networks in cells and tissues 13

In the healing skin wound, MMPs are abundant in both epithelium and stroma (clot and granulation tissue as well as dermis at the wound edge), whereas expression and activity of some is limited to a specific area or compartment (Figure 2) (Madlener, Parks, & Werner, 1998) (Toriseva & Kähäri, 2009). MMP3 also known as stromelysin-1, and MMP10 known as stromelysin-2 stand out as being secreted by keratinocytes (Madlener, et al., 1996) (Madlener, Parks, & Werner, 1998) (Toriseva & Kähäri, 2009) and serving as activator proteases. This suggests a complementary function in modulating processes at the epidermal wound edge. MMP10 was found to processes laminin-332 in vitro and the continuously active MMP10 mutant mouse have shown a scattering keratinocytes at a wound edge (Krampert, et al., 2004). Furthermore, the use of MMP10-deficient mice has demonstrated a role in repair processes in the gut and the liver (Koller, et al., 2012) (Garcia-Irigoyen, et al., 2014) (Orbe, et al., 2011). Moreover, lack of MMP10 has affected the inflammatory response in murine lungs (Kassim, et al., 2007) and contributed to inhibition of lung tumorigenesis (Regala, et al., 2011) (Justilien, et al., 2012). In addition, by using MMP3 knockout mice it has been identified that the protease has a protective role in squamous cell carcinoma and promotes keratinocyte differentiation (McCawley, Wright, LaFleur, Crawford, & Matrisian, 2008). Also, temporally increased secretion and activation of MMP3 in glaucoma patients maintains homeostasis of intraocular pressure by regulating aqueous outflow resistance (O'Callaghan, et al., 2017). However, inhibited expression of MMP3 has decreased cell proliferation and migration of colorectal cancer (Bufu, et al., 2018), which shows that improper balance of active MMP3 can lead to disastrous effects.

Figure. 2: Expression of MMPs and tissue inhibitors of metallo-proteinases (TIMPs) in the healing skin wound. MMPs and TIMPs are released from multiple cell types during cutaneous wound healing. MMP10 is highly expressed in migrating keratinocytes at the wound edge and in close proximity to fibroblasts in the granulation tissue (modified from (Toriseva & Kähäri, 2009)).

14 Novel proteomics technologies to decipher protease networks in cells and tissues

There are remarkable parallels between wound healing and skin carcinogenesis in upregulation of MMP10 expression in hyperproliferative keratinocytes (Kerkelä, et al., 2001) (Mathew, et al., 2002) (Muller, et al., 1991). Within the MMP activation network (Figure 2), MMP10 - like its close homologue MMP3 - belongs to the class of ‘activators’ (Nakamura, Fujii, Ohuchi, Yamamoto, & Okada, 1998) and itself can be activated by non-MMP proteases like plasmin (Nagase, 1995). Thus, a major MMP10 function might be the local activation of other proMMPs, which would agree with identification of several substrates that are indirectly processed by MMP10, but with a general MMP cleavage site specificity (Schlage, Kockmann, Sabino, Kizhakkedathu, & Auf Dem Keller, 2015) 1.2 MMP network degradomics With rapid advancements in mass spectrometry-based degradomics it is possible to assess the abundance of multiple MMPs and their activation status in complex samples (Fahlman, Chen, & Overall, 2014). Techniques developed for substrate cleavage, peptide, and protein analysis provide an opportunity to reveal the underlining MMP biology (Overall & López-Otín, 2002). It is well established that MMP family proteases form a highly interdependent network (Fortelny, et

Figure. 3: Interconnected MMP zymogen activation network. The illustration shows known directed activating interactions between MMPs. ‘Activators’ (green) are proteolytically activated by proteases of other classes (e.g. plasmin) (blue), and themselves activate many other MMPs. ‘Linker’ MMPs (yellow) are activated by ‘activators’ but can also activate several other family members. ‘Executioners’ (red) barely activate other MMPs but can be activated by many members of the MMP family. High interconnectivity and redundancy ensure robustness of the network to loss of individual members. Interactions were compiled from the current edition of the Handbook of Proteolytic Enzymes (Rawlings & Salvesen, 2013), and the network graph was created using Cytoscape (Shannon, et al., 2003).

Novel proteomics technologies to decipher protease networks in cells and tissues 15

al., 2014) (Van Den Steen, et al., 2002). One major external MMP activator is plasmin, a protease in the coagulation system (Fortelny, et al., 2014). The whole MMP activation network can be divided into three different protease types: “activators (MMP3, MMP10) mediating one-way downstream activation of MMPs, meaning they are not activated by other MMPs. “Linkers” (MMP1, MMP7), which are both, activated by an upstream MMP but also cleave a downstream MMP after their activation. Finally, “executioner” MMPs (MMP2, MMP8, MMP9, MMP13) are activated by other MMP members but only cleave non-MMP substrates (Figure 3.). A high degree of network connectivity ensures the robustness of the system, which explains minimal loss of function in mouse knock-out models (Gill, Kassim, Birkland, & Parks, 2010). On the other hand, up to today the literature is based on classical biochemical assays with recombinant proteins hinting towards a low complexity interaction that does not necessarily reflect physiological concentrations, since cell culture supernatants or in vivo tissues provide a far more complex biological background. Mainly MMP2 and MMP9 have been analyzed in more complex systems, with samples from genetically modified animals (Vandooren, Van Den Steen, & Opdenakker, 2013) (Liu, et al., 2005). However, studying only selected proteases of the entire MMP network in a complex system misses important information. MMPs have multiple interaction partners within the protease web, thus this raises the question which MMPs truly directly interact and which are dependent on the action of co-factors to establish a regulated, spatiotemporal activation of the members (Fortelny, et al., 2014).

16 Novel proteomics technologies to decipher protease networks in cells and tissues

1.3 Combinatorial Degradomics – Manuscript 1

Over the past 20 years, since the establishment of mass spectrometry-based proteomics, analysis of proteins and their proteoforms has significantly contributed to elucidation of complex tissue responses and their disturbance in disease. Degradomics tries to identify proteolytically cleaved forms of functional proteins, to assign responsible proteases and to understand altered functions of the post-translationally modified proteins. Significant efforts have been made to develop novel methods to globally discover protease-substrate relations in complex biological matrices. Intricate enrichment strategies, robust liquid chromatography and increased sensitivity of mass spectrometers have opened a new realm of possibilities to characterize functional properties of cleaved products in various biological environments. In this review, we provide an overview of degradomics technologies that are employed to uncover proteolytic events in different biological settings. We evaluate the advantages, disadvantages, and popularity of different techniques to generate new degradomics-related hypotheses. We discuss potential future developments aiming at increasing the throughput, reducing the sample size and increasing the sensitivity range. We briefly extend discovery research into targeted degradomics approaches, but a more detailed explanation of targeted degradomics principles can be found in Manuscript 2.

Savickas S, Kastl P, auf dem Keller U. Combinatorial degradomics: Precision tools to unveil proteolytic processes in biological systems. Biochim Biophys Acta Proteins Proteom. 2020;1868(6):140392. doi:10.1016/j.bbapap.2020.140392

Novel proteomics technologies to decipher protease networks in cells and tissues 17 %%$3URWHLQVDQG3URWHRPLFV  

Contents lists available at ScienceDirect

BBA - Proteins and Proteomics

journal homepage: www.elsevier.com/locate/bbapap

Combinatorial degradomics: Precision tools to unveil proteolytic processes in biological systems ⁎ Simonas Savickas1, Philipp Kastl1, Ulrich auf dem Keller

Department of Biotechnology and Biomedicine, Technical University of Denmark, DK-2800 Kongens Lyngby, Denmark

ARTICLE INFO ABSTRACT

Keywords: The biological activity of a protein is regulated at many levels ranging from control of transcription and Protease translation to post-translational modifications (PTM). Proteolytic processing is an irreversible PTM generating Mass spectrometry novel isoforms of a mature protein termed proteoforms. Proteoform dynamics is a major focus of current pro- Proteomics teome research, since it has been associated with many pathological conditions. Mass-spectrometry (MS)-based Degradomics proteomics and PTM-specific enrichment workflows have become the methods of choice to study proteoforms in Targeted degradomics vitro and in vivo. Here, we give an overview of currently available MS-based degradomics methods and outline Combinatorial degradomics how they can be optimally applied to study protease cleavage events. We discuss the advantages and dis- advantages of selected approaches and describe state-of-the-art improvements in degradomics technologies. By introducing the concept of combinatorial degradomics, a combination of global discovery degradomics and highly sensitive targeted degradomics, we demonstrate how MS-based degradomics further evolves as a powerful tool in biomedical protease research.

1. Introduction of specialized therapeutics that target only one specific proteoform [3]. State-of-the-art system-wide analytical methods like mass spectro- Over the past decades, the dogma of has dra- metry (MS)-based proteomics have become the preferred approaches to matically evolved from the simple concept of single gene-transcript- analyze protein functions in vitro and in vivo. Although proteomics is protein relations. It is now clear that one gene can encode many tran- based on the simple physical concept that by applying an electric field scriptional products, which are translated into even more amino acid in a high vacuum charged peptides and/or proteins can be separated chains making up proteins with multiple biological activities under and identified by their masses, it has become a highly specialized and different conditions. Thus, to understand the function of a protein in a complex field of research. As an example, MS is used to study different specific biological setting it is important to elucidate its identity, protein proteoforms resulting from PTMs, like phosphorylation, ubi- abundance and how generation of different proteoforms is dynamically quitination or acetylation [4]. Limited proteolysis of substrates is a very regulated during its lifetime [1,2]. At the transcriptional level, pro- important but often neglected irreversible PTM, which terminally alters teoform dynamics is mediated by transcription factors and alternative protein fate and creates novel proteoforms. Substrates undergoing splicing, leading to generation of protein isoforms, while at the protein proteolysis by proteases are often signaling proteins like cytokines, level generation of new proteoforms is conferred by post-translational chemokines, other proteases or extracellular matrix components. Those modifications (PTMs). In recent years, the analysis of regulation at the molecules are all involved in regulating development, immune and protein level has become a major focus, which resulted in establishment inflammatory responses, as well as many pathological conditions ran- of large initiatives, such as the Human Proteome Organization (HUPO, ging from cardiovascular to chronic inflammatory diseases or cancer. In www.hupo.org) and its associated human protein project (HPP). Reg- order to better understand those processes and identify new biomarkers ulation of protein activity at the protein level is dynamic, depends on for the diagnosis of their associated diseases, substrate proteolysis is the physiological context and ultimately determines the terminal now the focus of many research groups around the globe. Based on the function of a protein. This knowledge could therefore be used to interest in proteolysis the field of degradomics has been established, identify potential novel disease markers and allow for the development aiming at analyzing proteolysis both at the substrate and the protease

⁎ Corresponding author at: Technical University of Denmark, Department of Biotechnology and Bioengineering, Søltofts Plads, Building 224, Room 114, DK-2800 Kongens Lyngby, Denmark. E-mail address: [email protected] (U. auf dem Keller). 1 These authors contributed equally to this work. https://doi.org/10.1016/j.bbapap.2020.140392 Received 22 November 2019; Received in revised form 13 February 2020; Accepted 14 February 2020 $YDLODEOHRQOLQH)HEUXDU\ ‹(OVHYLHU%9$OOULJKWVUHVHUYHG S. Savickas, et al. %%$3URWHLQVDQG3URWHRPLFV   level by MS-based methods. Over the past years, multiple specialized degradomics methods have been developed and it is getting more and more challenging to keep an overview of these approaches and to un- derstand how they can be optimally applied to study a specific problem. In the following, we will provide an overview of major degradomics approaches, outline how currently available MS-based methods can be used in protease research and discuss the future potential of the field.

2. Proteomics and degradomics

As a first step to characterize a proteome and its dynamics, usually a whole proteome analysis of a biological sample of interest is conducted, termed discovery-based proteomics. Most common discovery-based proteomics analyses follow the same, classic bottom-up-proteomics workflow. After isolating the proteome from a biological sample, pro- teins are digested into peptides using a site-specific endopeptidase (e.g. , LysC, ArgC, LysargiNase or GluC). Peptides are then fractio- nated by reverse-phase chromatography, positively ionized, and ana- lyzed in the mass spectrometer according to their mass to charge (m/z) value. The detected peptides and their abundances can then be related to the original protein by a database search against the whole proteome sequence of the target organism. The constant development of new, highly sensitive, high-resolution mass spectrometers in conjunction Fig. 1. Number of published studies using specified degradomics techniques with optimized chemical labeling reagents to analyze multiple samples according to PubMed searches. Data collected in November 2019. in a single experiment (multiplexing) allows for unprecedented peptide identification and quantification sensitivity below femtomolar ranges. Indeed, as whole proteome analysis of biological samples showed, the digest, samples are purified by two orthogonal reverse-phase high abundance of proteins can span a range of more than ten orders of performance liquid chromatography (HPLC) steps. The first HPLC step magnitude [5,6]. This is a major problem, because due to the principles fractionates the sample and reduces its complexity, whereas prior to the of MS analysis peptides derived from high-abundance proteins have the second step the C-terminal and tryptic peptides are hydrophobically tendency to mask low-abundance peptides derived from low-abundance labeled with 2,4,6-trinitrobenzenesulfonic acid (TNBS). Primary N-ter- proteins (Fig. 2A). Moreover, proteolytically generated proteoforms and mini are then isolated from the sample due to their altered retention their peptides are even less abundant, since substrate proteolysis is time on the column. Drawbacks of COFRADIC are limited multiplexing often incomplete. This can potentially be overcome by prior depletion capabilities due to the acetylation of free N-termini and restriction of of high-abundance proteins (e.g. using combinatorial peptide ligand sample labeling to stable isotope labeling with amino acids in cell libraries (CPLLs) [7]), as well as affinity purification (e.g. upon bioti- culture (SILAC), since other commercial chemical labeling reagents nylation [8]) and additional sample fractionation (e.g. by UHPLC or bind to the primary amines of peptides and proteins. A modified version affinity chromatography). In the case of in vitro experiments, this might of COFRADIC termed charge-based fractional diagonal chromatography be a feasible approach, since access to sample is usually not a problem, (ChaFRADIC) was developed, which improves its efficiency by em- but amounts of in vivo material is limited and any additional purifica- ploying strong cation exchange chromatography (SCX) prior to the tion step increases the risk of loss of proteins and in particular the lowly orthogonal HPLC step [10]. This allows for multiplex labeling of native abundant proteoforms. In order to solve this challenge and allow for the and protease-generated N-termini with amine-reactive chemical re- nearly exclusive identification of protease-generated peptides, specia- agents, instead of blocking by acetylation. ChaFRADIC has recently lized positional proteomics methods have been invented. These been miniaturized, so that the separation can be performed on a self- methods combine multiplexing of samples by isotopic labeling with a made column in a pipet tip, reducing sample input to below five μg and specific peptide enrichment step during sample preparation, depleting making the workflow less dependable on a core facility with expensive, the sample of internal tryptic endopeptidase-generated peptides and reliable HPLC systems [11]. enriching for only natural as well as protease-generated N-terminal Another powerful positional proteomics method widely applied is peptides. Over the past years multiple different positional degradomics terminal amine isotopic labeling of substrates (TAILS) ([12]; for a re- methods have been presented, with some being project-oriented and cent protocol see [13]). Like COFRADIC and ChaFRADIC, TAILS is rarely been used besides for their original studies. Others have become based on negative enrichment of protein N-termini and protease-gen- established discovery degradomics methods used to identify a multi- erated neoN-terminal peptides (Fig. 2B, middle panel). Here, an iso- tude of protease cleavage events (Fig. 1). Here, we will focus on the two topic label introduced by whole protein labeling chemically blocks all majorly applied methods in detail, but also give a brief overview of primary α- and ε-amines. After the tryptic digest, tryptic peptides are some more specialized alternatives. removed from the sample by binding to an amine-reactive highly- branched polyglycerol aldehyde polymer (HPG-ALD polymer) via their 3. Discovery degradomics as a tool to identify protease substrates unblocked primary N-terminal α-amines. Since chemically or natively blocked (e.g. acetylated) N-terminal peptides cannot bind to this The first method developed for specific enrichment of protein N- polymer, they can be collected in the flow-through of size-exclusion termini from whole proteomes was combined fractional diagonal spin-columns. Similar to ChaFRADIC, TAILS allows for multiplexing of chromatography (COFRADIC) [9]. In COFRADIC, proteins are reduced samples by isotopic labeling at the protein level prior to the tryptic and alkylated to block cysteine side chains followed by trideutero- digest, which can now be extended up to 16-plex by use of amine-re- acetylation of primary N-termini (natural and protease-generated) active tandem mass tags (TMTpro). Since less chromatography steps are (Fig. 2B, right panel). Afterwards, proteins are trypsin digested, re- involved, there is a lower risk of peptide losses during sample pre- sulting in two species of N-terminal peptides: (i) acetylated original paration, but the polymer pull-out depends on prior N-terminal peptides and (ii) tryptic peptides with free α-amines. After the tryptic blocking and efficiency of proteome labeling [14]. Multiplexing

 S. Savickas, et al. %%$3URWHLQVDQG3URWHRPLFV  

Fig. 2. Discovery degradomics allow for identification of low-abundance proteoforms in biological samples. (A) Traditional whole proteome shotgun proteomics fails to detect many low-abundance proteins due to low signal-to-noise ratios. (B) Peptide enrichment methods in discovery degradomics allow for detection of low- abundance protease-generated peptides in complex biological samples. Identification of N-terminal peptides can be improved by either positive (left panel) or negative (middle and right panel) enrichment strategies.

provides an additional benefit, since increasing the number of samples by titanium dioxide (TiO2) affinity chromatography [15]. Additionally, analyzed within a single TMT-based TAILS experiment results in more natural und protease-generated N-termini can be positively enriched by total protein, which on the other hand allows to reduce the amount of N-terminalomics by Chemical Labeling of the α-Amine of Proteins (N- total protein needed from individual samples. CLAP) [16] or subtiligase enzymatic labeling [8], which both specifi- An alternative to COFRADIC, ChaFRADIC or TAILS is based on re- cally chemically label α-amines allowing for their affinity purification moving internal tryptic peptides via phospho-tagging (PTAG) followed (Fig. 2B, left panel). Secretome Protein Enrichment with Click Sugars

 S. Savickas, et al. %%$3URWHLQVDQG3URWHRPLFV  

(SPECS) metabolically labels glycosylated proteins, followed by click specific low-abundance proteins and protease cleavage events. chemistry-mediated biotinylation for avidin affinity enrichment and Moreover, inherent undersampling in shotgun proteomics can prevent enables in particular identification of protease-mediated surface shed- comprehensive positional peptide mapping of identified substrate pro- ding events but without providing information on protease cleavage teins. Thus, it might not be clear, if a unique proteoform was generated sites [17][18]. by limited proteolysis or if it resulted from rather unspecific degrada- Even though identifying the N-terminal peptide of a cleaved sub- tion of the mature protein. By harnessing the sensitivity, precision and strate is often sufficient for protease substrate discovery, C-terminal ability to overcome interference of the whole proteome, targeted de- cleavage products are of as much interest. However, enrichment of C- gradomics ensures that proteoforms and cleavage events of interest can terminal peptides is not as commonly used, since carboxyl groups are be analyzed with high sensitivity and comprehensive coverage of sub- less reactive than primary amines. Nevertheless, TAILS and COFRADIC strate proteins. have been adjusted to allow for enrichment of natural and protease- The vast amount of information accessible today provides ample generated protein C-termini. For TAILS, additionally to N-terminal resources to identify a proteoform cleavage as well as a candidate blocking, ethanolamine is coupled to the C-terminus of peptides al- protease of interest. Data obtained by discovery degradomics ap- lowing for the negative enrichment of protein C-termini using a high- proaches is stored in databases like MEROPS (https://www.ebi.ac.uk/ molecular-weight poly-allylamine polymer [19]. C-terminal COFRADIC /), CutDB (https://omictools.com/cutdb-tool), TopFIND employs an extra labeling step of primary amines of C-terminal pep- (http://clipserve.clip.ubc.ca/topfind), CaspDB (https://omictools.com/ tides, followed by an additional round of HPLC fractionation [20]. caspdb-tool), Degrabase (https://wellslab.ucsf.edu/degrabase/)and Positive and negative enrichment methods both come with their others [25–29]. Those databases may also contain multiple isoforms of own advantages and disadvantages and are thus suitable to address a cleaved proteoform associated with different biological contexts different biological questions. Negative enrichment methods allow for [29,30]. Additionally, extracted datasets allow for prediction of can- in-depth analysis of both the proteome (before enrichment) and the didate proteases based on known substrate cleavage motif preferences degradome (after enrichment) with the possibility to multiplex ex- of endopeptidases. Validation of direct protease-substrate relations re- periments by chemical labeling of α- and ε-amines, which increases vealed by discovery degradomics require experimental evidence that quantification and identification confidence between multiple samples. identified proteoforms are indeed the result of a “true” cleavage event Importantly, they also enrich for naturally modified N-/C-terminal performed by a specific protease. This validation can be achieved by a peptides (e.g. acetylated N-termini), since those will not bind to the targeted degradomics experiment. In contrast to discovery de- matrix used for negative enrichment. While using readily available gradomics, targeted degradomics aims at monitoring specific proteo- chemicals for labeling, negative enrichment methods require either forms and their cleavage products rather than measuring as many extensive fractionation (COFRADIC, ChaFRADIC) or specialized poly- events as possible. As an example, a substrate of interest is co-incubated mers (TAILS) for depletion of internal tryptic peptides, which, however, with a candidate protease and the mass spectrometer set to specifically are now publicly available. As a major technical challenge, positive scan either the original or the proteolytically generated peptide. enrichment of N-terminal peptides requires selective labeling of N- Thereby, both proteoforms can be identified and their relative abun- terminal α-amines. This precludes identification of naturally blocked dances determined [31]. As long as the peptide target is known, tar- protein N-termini but might be advantageous for identification of geted degradomics allows identifying the same target peptide e.g. in a protease cleavage events in cytoplasmic samples with high numbers of simple cell-free protein system, in cell culture and in vivo [32]. For naturally acetylated N-terminal peptides. Furthermore, multiplexing instance, targeted proteomics was successfully applied to compare the and peptide quantification is mostly restricted to SILAC or label-free same peptide target across different cancer cell lines, ranging from quantification, since isotopic labels would have to be introduced with pancreatic cancer to colorectal adenosarcoma, breast cancer, and me- N-terminal affinity tags. An exception is subtiligase enzymatic labeling, tastatic adenosarcoma [33]. Similarly, it is possible to monitor the same which allows isotopic labeling of ε-amines after positive enrichment but cleaved peptide in human plasma, urine, liver, kidney, heart, skin, etc. is restricted to multiplexed quantification of lysine-containing N-term- Furthermore, targeted proteomics and degradomics are specifically inal peptides and requires access to the subtiligase enzyme. suited for cross-tissue and cross-laboratory studies and basically only Since their first publication discovery degradomics methods have limited by robustness of chromatography and instrument availability been extensively used to identify proteases and their specific cleavage [34,35]. products in various biological backgrounds and from all kinds of dif- In general, trypsin is used as a standard working protease for ferent species ranging from Escherichia coli [9,21], Arabidopsis thaliana bottom-up proteomics and degradomics, but with the emergence of [10], model animals like rodents [22] and pigs [23] to humans [24]. At alternative endopeptidases, such as LysC, AspN, GluC, , and time of writing, a simple PubMed search with the keywords “Terminal LysargiNase the coverage of the full proteome as well as the N-termi- Amine Isotopic Labeling of Substrates” or “COFRADIC” resulted in 49 nome has significantly increased [31,36]. Like in discovery approaches, (for TAILS) and 42 (for COFRADIC) research articles, respectively different endopeptidases can be applied to targeted proteomics and (Fig. 1). Additional manual curation revealed that those search terms degradomics to cover the naturally occurring proteome and degradome did not even provide a full list of all articles published using these with maximum depth. Therefore, using the appropriate endopeptidase methods. Since discussing all studies that employed a discovery de- it is now possible to develop targeted degradomics assays for most gradomics technique would be beyond the scope of this review, we protease cleavage events. As long as the peptide of interest has a length provide a compilation of research articles published up till November between 6 and 26 amino acids, the mass-spectrometer will identify the 2019 and based on PubMed searches for the associated method names fragment [36]. Still, due to a limited variety of endopeptidases identi- (Table 1). Reference numbers in this table refer to PubMed accession fication of specific fragments might be prohibited. As an alternative, a numbers of corresponding publications listed in Table S1. top-down proteomics approach might be applied, in which the intact, undigested proteome is directly injected into the LC-MS system. How- 4. Targeted degradomics to validate and extend results from ever, top-down-proteomics faces other challenges, such as limited so- discovery degradomics approaches lubility, insignificant ionization and complex fragmentation of proteins [37]. Discovery proteomics and degradomics approaches are inherently Through PTM crosstalk other PTMs of the parent protein can affect challenged by the complexity of the sample proteome. Only applying generation of proteolytic fragments. Using targeted degradomics, PTMs differential depletion steps or immunoprecipitation methods using such as phosphorylation, glycosylation and ubiquitination that may specific antibodies might enable identification and quantification of interfere with proteolytic cleavage can be concomitantly monitored,

 S. Savickas, et al. %%$3URWHLQVDQG3URWHRPLFV  

Table 1 Compilation of published discovery degradomics methods. Ref. # refers to numbers in Table S1.

#of Technique Preparation Organism Sample type Ref. # publications time

49 TAILS 48 h Popular technique: from recombinant proteins to clinical patient testing 1–49 42 COFRADIC 48 h Popular technique: from recombinant proteins to clinical patient testing 50–91 1 HUNTER 48 h Rat Brain 135 Plant Arabidopsis thaliana 135 Human HeLa, B-ALL, 697, blood plasma, primary B-ALL, primary AML, B-cells, 135 monocytes, natural killer cells, NKT 2 CheFRADIC 48 h Plant Arabidopsis thaliana 136 Human Platelets 137 7 Enzymatic Biotinylation 60 h Bacteria Escherichia coli 123, 125, 126 Yeast Saccharomyces cerevisiae 123 Recombinant Protein Aprotinin 123,126 Human HEK-293A, Serum, Jurkat cells, blood plasma 123, 124, 126, 127 6 Yeast-two-Hybrid in Human HEK293 T, HCT116, Daudi, THP-1, OCI-AML2, dendritic cells, 129–132 Proteolysis Recombinant protein RFXANK, Fibrinogen 129, 133 Mouse Megakaryocytes, platelets 134 4 SPECS 96 h Mouse Cortical neurons (secretome), MEF (secretome), primary neurons (secretome) 92,94,95 Human HEK293 T (secretome) 93,94 9 PROTOMAP 24 h Human Blood, Jurkat T Cells, erythrocytes, MHCC97L, HCCLM6, A549, PC-3 97–104 6 TMPP, N-TOP, dN-TOP 24 h Recombinant protein Recombinant glycoproteins, Hz6F4–2 105, 106, 107, 110 Bacterium Myobacterium smegmatis, Herminiimonas arsenicoxydans 108, 109 1 NEDDylator Human Jurkat 128 4 ATOMS 48 h Human Primary macrophages, BJ cells, HUVEC, skin 111, 112 Recombinant protein Fibronectin-1, LM-111 113 10 C-Terminomics 48 h Recombinant protein Bovine serum albumin, β-casein 114 Human 293 T 115 Bacteria/Yeast Saccharomyces cerevisiae, Escherichia coli, Thermoanaerobacter tengcongensis 116–121 1 PTAG Bacteria/Yeast Neisseria meningitides, Saccharomyces cerevisiae 122

often without additional enrichment steps [13]. Monitoring the target providing access (EPIC-XS, https://epic-xs.eu/) now also offer de- peptide with and without a modification only requires correction of the gradomics technologies and have opened up many services to biome- target mass and determination of respective elution times. Concomitant dical scientists and clinical researchers. analysis of multiple PTMs in an endogenous biological environment allows for a more in-depth analysis of the sample and provides addi- 5. Future perspective for combinatorial degradomics technologies tional insight into complex biological systems [38,39]. In addition to validating peptide cleavage events, targeted de- If combined in an approach we term combinatorial degradomics gradomics data can be quantitatively analyzed with high accuracy, e.g. (Fig. 3), discovery and targeted degradomics become a powerful tool to to proportionally assess cleaved and non-cleaved proteoforms. This can identify, validate and characterize proteolytic events in every type of be achieved by measuring (i) the intensity of the ion chromatographic biological sample. peak, (ii) counting peptide spectrum matches or (iii) comparing abso- Combinatorial degradomics has already been applied to better un- lute peptide concentration to the synthetic reference spectrum [40]. derstand known and to decipher novel protease signaling pathways. The most common quantification technique determines the area under Uncovering new modes of complement activation, providing un- the curve of each transition recorded at the peptide fragment level precedented insight into inflammation and cancer signaling and un- [41,42], which is a flexible and convenient method to select the right raveling specificity profiles for entire protease families are only a few values for the identified peptide. These values can then be analyzed examples of how degradomics technologies have revolutionized pro- using standard data processing software, such as Microsoft Excel, tease research towards a comprehensive understanding of the protease GraphPad Prism, SPSS, or statistical programming languages like R, web [46–49]. Although data recorded in system-wide analyses is Matlab or Python [43,44]. All of these provide convenient packages for compiled in data repositories, designing experiments or software al- targeted degradomics data analysis and statistical methods to assist in gorithms to orthogonally but comprehensively validate the insights data interpretation. gained by degradomics experiments is still a major challenge. Fur- Targeted degradomics is becoming more and more popular as a thermore, more streamlined techniques are needed to confirm the technique to verify and reliably quantify a cleavage event of choice biological relevance of phenomena observed in such hypothesis-gen- [31]. This is supported by the advances in sensitivity, precision, speed, erating studies. and robustness of mass spectrometers that have reached and in some A more recent techniques, which has been applied to discovery cases already surpassed traditional antibody-based methods [45]. The degradomics is data independent acquisition (DIA) mass spectrometry, lack of reliable antibodies, the inability to raise an antibody against a allowing discovery and quantification of thousands of peptides and specific cleavage product, or the time it takes to produce an antibody newly generated proteoforms [50,51]. DIA shifts the paradigm of for a specific target protein is a limitation that might be overcome by identifying the most abundant ions in a peptide mixture to measuring mass spectrometry-based targeted degradomics. specified windows of masses and thus turns the stochastic nature of With better access, affordable prices and availability of proteomics data dependent acquisition (DDA) shotgun proteomics into a more core facilities, mass spectrometry-based proteomics is readily becoming unbiased approach [52]. Given the fundamental differences in modes of available to a wider community of protease researchers. Initiatives like data acquisition, DIA can also serve as orthogonal technology for high- the Clinical Proteomics Tumor Analysis Consortium (CPTAC, https:// throughput validation of DDA data. In general, DIA relies on extensive www.hupo.org/Clinical-Proteome-Tumor-Analysis-Consortium- spectral libraries that need to be recorded in separate DDA measure- (CPTAC)) and the European Proteomics Infrastructure Consortium ments, significantly increasing mass spectrometry time. However, use

 S. Savickas, et al. %%$3URWHLQVDQG3URWHRPLFV  

Fig. 3. Overview of combinatorial degradomics. Combination of discovery and targeted degradomics methods allows monitoring of protease cleavage events inall kinds of biological samples. Candidate cleavages identified from combinatorial degradomics have to be ultimately verified by in vitro/in vivo testing across different systems. All three approaches integrate with database profiling and next-generation machine learning algorithms. of pre-generated libraries can now be completely avoided, e.g. by using the HPG-ALD polymer regularly applied in TAILS, they modified combining DIA with Prosit, a deep learning algorithm for peptide the HYTANE method for negative enrichment of N-termini [56]. With tandem mass spectra prediction [53]. this approach, they have managed to identify more than 8000 N-termini Quantitative discovery degradomics has highly benefitted from ex- out of 1 Million HeLa cells starting material [51]. tended multiplexing capabilities, now allowing concomitant analysis of High-throughput validation of discovery degradomics results does up to sixteen samples with the newly released 16plex Tandem Mass Tag not only depend on the sensitivity and throughput of the validation (TMTpro) reagents. By combining sixteen samples into a single injec- methods, but also on data processing strategies. As an example, tion, the possibility to identify a target cleavage can be increased, the Triggered by Offset, Multiplexed, Accurate mass, High resolution, and time taken for mass spectrometry analysis reduced and a significantly Absolute Quantitation (TOMAHAQ) is a new software tool for targeted higher sample throughput achieved. Dependent on the chemistry of proteomics that combines TMT-based multiplexing with targeted ana- isobaric labeling reagents, even higher levels of multiplexing can be lyses. In conjunction with the new TMTpro 16plex kit, TMT0 and TMT realized [54], which are expected to soon be commercially available for super heavy, TOMAHAQ and TOMAHAQ companion allow rapid and proteomics and degradomics applications. reliable target quantification of thousands of targets at a time. Together with the development of novel amine-reactive labeling Compared to regular targeted proteomics workflows this is a major reagents, extensive off-line peptide fractionation has the potential to boost in throughput and may also be used for validation of proteolytic significantly increase throughput and N-terminome coverage in dis- cleavage events identified in discovery degradomics experiments covery degradomics. By establishing automated set-ups using reverse [57–59]. phase high pH chromatography a sample can be fractionated into more Furthermore, by combining already existing techniques like TAILS than 48 fractions, dramatically increasing the coverage of the proteome and TOMAHAQ it will be possible to use N-terminally enriched frac- [51,55]. Injecting each fraction independently using a classical DDA tions as template libraries to monitor and quantify the N-termini of mode for analysis and combining all fractions into a single output file, it samples without enrichment. This technique will reduce sample loss is possible to identify over 10.000 N-terminal peptides from a single due to long enrichment workflows, reducing sample-to-sample varia- sample. This includes multiple proteoforms that were previously un- tion from handling errors and will increase the speed of target ver- discovered or required complex steps of targeted precipitation [45]. ification. Combining fractionation and DIA, the groups of Huesgen and Lange The combination of these new analysis tools pushes the limits of have taken another approach to N-terminal enrichment. Instead of combinatorial proteomics. With the advent of single cell proteomics

 S. Savickas, et al. %%$3URWHLQVDQG3URWHRPLFV   and new updates on TOMAHAQ it may even become possible to Extracellular Matrix Degradomes by TMT-TAILS N-Terminomics BT - Collagen: monitor proteolysis events in single cells [60]. Due to an inherent ca- Methods and Protocols, Springer, New York, NY, 2019, pp. 115–126, , https://doi. org/10.1007/978-1-4939-9095-5_8. pacity of TMT tags and the TOMAHAQ targeted method to distinguish [14] V. Guryča, J. Lamerz, A. Ducret, P. Cutler, Qualitative improvement and quanti- between a carrier channel with over 500 cells and a single cell carrying tative assessment of N-terminomics, Proteomics. 12 (2012) 1207–1216, https://doi. a TMT tag this might enable analyzing the heterogeneity of post- org/10.1002/pmic.201100430. [15] G.P.M. Mommen, B. van de Waterbeemd, H.D. Meiring, G. Kersten, A.J.R. Heck, translational modifications at the single cell level. This will allow us to A.P.J.M. de Jong, Unbiased selective isolation of protein N-terminal peptides from draw a comprehensive map of cellular dynamics according to the size of complex proteome samples using phospho tagging (PTAG) and TiO(2)-based de- a single cell, its protein content, cell type, etc. The opportunities opened pletion, Mol. Cell. Proteomics 11 (2012), https://doi.org/10.1074/mcp.O112. up by single cell proteomics can hardly be grasped, but they will move 018283 832 LP – 842. [16] G. Xu, S.B.Y. Shin, S.R. Jaffrey, Global profiling of protease cleavage sites by che- the field even further towards clinical diagnostics. moselective labeling of protein N-termini, Proc. Natl. Acad. Sci. 106 (2009), https://doi.org/10.1073/pnas.0908958106 19310 LP – 19315. Declaration of Competing Interest [17] P.-H. Kuhn, K. Koroniak, S. Hogl, A. Colombo, U. Zeitschel, M. Willem, C. Volbracht, U. Schepers, A. Imhof, A. Hoffmeister, C. Haass, S. Roßner, S. Bräse, S.F. Lichtenthaler, Secretome protein enrichment identifies physiological BACE1 None. protease substrates in neurons, EMBO J. 31 (2012) 3157–3168, https://doi.org/10. 1038/emboj.2012.173. [18] A. Serdaroglu, S.A. Müller, U. Schepers, S. Bräse, W. Weichert, S.F. Lichtenthaler, Acknowledgments P.-H. Kuhn, An optimised version of the secretome protein enrichment with click sugars (SPECS) method leads to enhanced coverage of the secretome, Proteomics. Ulrich auf dem Keller acknowledges support by a Novo Nordisk 17 (2017) 1600423, https://doi.org/10.1002/pmic.201600423. [19] O Schilling, P F Huesgen, O Barré, C M Overall, G Cagney, A Emili (Eds.), Foundation Young Investigator Award [NNF16OC0020670]. Philipp Identification and Relative Quantification of Native and Proteolytically Generated Kastl is supported by a research fellowship from the German Research Protein C-Termini from Complex Proteomes: C-Terminome Analysis BT - Network Foundation [project no. 415888450]. Biology: Methods and Applications, Humana Press, Totowa, NJ, 2011, pp. 59–69, , https://doi.org/10.1007/978-1-61779-276-2_4. [20] P Van Damme, A Staes, S Bronsoms, K Helsens, N Colaert, E Timmerman, F X Aviles, Appendix A. Supplementary data J Vandekerckhove, K Gevaert, Complementary positional proteomics for screening substrates of endo- and exoproteases, Nat. Methods 7 (2010) 512–515, https://doi. Supplementary data to this article can be found online at https:// org/10.1038/nmeth.1469. [21] K. Gevaert, J. Van Damme, M. Goethals, G.R. Thomas, B. Hoorelbeke, H. Demol, doi.org/10.1016/j.bbapap.2020.140392. L. Martens, M. Puype, A. Staes, J. Vandekerckhove, Chromatographic isolation of methionine-containing peptides for gel-free proteome analysis, Mol. Cell. References Proteomics 1 (2002), https://doi.org/10.1074/mcp.M200061-MCP200 896 LP – 903. [22] P. Schlage, T. Kockmann, F. Sabino, J.N. Kizhakkedathu, U. Auf Dem Keller, Matrix [1] Y Liu, A Beyer, R Aebersold, On the dependency of cellular protein levels on mRNA metalloproteinase 10 degradomics in keratinocytes and epidermal tissue identifies abundance, Cell. 165 (2016) 535–550, https://doi.org/10.1016/j.cell.2016.03.014. bioactive substrates with pleiotropic functions, Mol. Cell. Proteomics 14 (2015) [2] L M Smith, N L Kelleher, T.C. for T.D. Proteomics, M Linial, D Goodlett, 3234–3246, https://doi.org/10.1074/mcp.M115.053520. P Langridge-Smith, Y Ah Goo, G Safford, L Bonilla, G Kruppa, R Zubarev, J Rontree, [23] F. Sabino, O. Hermes, F.E. Egli, T. Kockmann, P. Schlage, P. Croizat, J Chamot-Rooke, J Garavelli, A Heck, J Loo, D Penque, M Hornshaw, J.N. Kizhakkedathu, H. Smola, U. Auf Dem Keller, In vivo assessment of protease C Hendrickson, L Pasa-Tolic, C Borchers, D Chan, N Young, J Agar, C Masselon, dynamics in cutaneous wound healing by degradomics analysis of porcine wound M Gross, F McLafferty, Y Tsybin, Y Ge, I Sanders, J Langridge, J Whitelegge, exudates, Mol. Cell. Proteomics 14 (2015) 354–370, https://doi.org/10.1074/mcp. A Marshall, Proteoform: a single term describing protein complexity, Nat. Methods M114.043414. 10 (2013) 186, https://doi.org/10.1038/nmeth.2369. [24] T. Klein, S.-Y. Fung, F. Renner, M.A. Blank, A. Dufour, S. Kang, M. Bolger-Munro, [3] R Aebersold, G D Bader, A M Edwards, J E van Eyk, M Kussmann, J Qin, G S Omenn, J.M. Scurll, J.J. Priatel, P. Schweigler, S. Melkko, M.R. Gold, R.I. Viner, The biology/disease-driven human proteome project (B/D-HPP): enabling protein C.H. Régnier, S.E. Turvey, C.M. Overall, The paracaspase MALT1 cleaves HOIL1 research for the life sciences community, J. Proteome Res. 12 (2013) 23–27, reducing linear ubiquitination by LUBAC to dampen lymphocyte NF-κB signalling, https://doi.org/10.1021/pr301151m. Nat. Commun. 6 (2015) 8777, https://doi.org/10.1038/ncomms9777. [4] E S Witze, W M Old, K A Resing, N G Ahn, Mapping protein post-translational [25] E.D. Crawford, J.E. Seaman, N. Agard, G.W. Hsu, O. Julien, S. Mahrus, H. Nguyen, modifications with mass spectrometry, Nat. Methods 4 (2007) 798–806, https:// K. Shimbo, H.A.I. Yoshihara, M. Zhuang, R.J. Chalkley, J.A. Wells, The DegraBase: a doi.org/10.1038/nmeth1100. database of proteolysis in healthy and apoptotic human cells, Mol. Cell. Proteomics [5] A F Hühmer, R G Biringer, H Amato, A N Fonteh, M G Harrington, Protein analysis 12 (2013), https://doi.org/10.1074/mcp.O112.024372 813 LP – 824. in human cerebrospinal fluid: physiological aspects, current progress and future [26] N. Fortelny, S. Yang, P. Pavlidis, P.F. Lange, C.M. Overall, Proteome TopFIND 3.0 challenges, Dis. Markers 22 (2006) 3–26, https://doi.org/10.1155/2006/158797. with TopFINDer and PathFINDer: database and analysis tools for the association of [6] M. Wang, C.J. Herrmann, M. Simonovic, D. Szklarczyk, C. von Mering, Version 4.0 protein termini to pre- and post-translational events, Nucleic Acids Res. 43 (2014) of PaxDb: Protein abundance data, integrated across model organisms, tissues, and D290–D297, https://doi.org/10.1093/nar/gku1012. cell-lines, Proteomics 15 (2015) 3163–3168, https://doi.org/10.1002/pmic. [27] Y. Igarashi, A. Eroshkin, S. Gramatikova, K. Gramatikoff, Y. Zhang, J.W. Smith, 201400441. A.L. Osterman, A. Godzik, CutDB: a proteolytic event database, Nucleic Acids Res. [7] E Boschetti, P G Righetti, The ProteoMiner in the proteomic arena: a non-depleting 35 (2006) D546–D549, https://doi.org/10.1093/nar/gkl813. tool for discovering low-abundance species, J. Proteome 71 (2008) 255–264, [28] S Kumar, B J van Raam, G S Salvesen, P Cieplak, Caspase cleavage sites in the https://doi.org/10.1016/j.jprot.2008.05.002. human proteome: CaspDB, a database of predicted substrates, PLoS One 9 (2014) [8] H.A.I. Yoshihara, S. Mahrus, J.A. Wells, Tags for labeling protein N-termini with e110539, , https://doi.org/10.1371/journal.pone.0110539. subtiligase for proteomics, Bioorg. Med. Chem. Lett. 18 (2008) 6000–6003, https:// [29] N.D. Rawlings, A.J. Barrett, P.D. Thomas, X. Huang, A. Bateman, R.D. Finn, The doi.org/10.1016/j.bmcl.2008.08.044. MEROPS database of proteolytic enzymes, their substrates and inhibitors in 2017 [9] K. Gevaert, M. Goethals, L. Martens, J. Van Damme, A. Staes, G.R. Thomas, and a comparison with peptidases in the PANTHER database, Nucleic Acids Res. J. Vandekerckhove, Exploring proteomes and analyzing protein processing by mass (2018), https://doi.org/10.1093/nar/gkx1134. spectrometric identification of sorted N-terminal peptides, Nat. Biotechnol. 21 [30] N Fortelny, G S Butler, C M Overall, P Pavlidis, Protease-inhibitor interaction pre- (2003) 566–569, https://doi.org/10.1038/nbt810. dictions: lessons on the complexity of protein-protein interactions, Mol. Cell. [10] A.S. Venne, F.A. Solari, F. Faden, T. Paretti, N. Dissmeyer, R.P. Zahedi, An improved Proteomics (2017), https://doi.org/10.1074/mcp.M116.065706. workflow for quantitative N-terminal charge-based fractional diagonal chromato- [31] S. Savickas, U. Auf Dem Keller, Targeted degradomics in protein terminomics and graphy (ChaFRADIC) to study proteolytic events in Arabidopsis thaliana, protease substrate discovery, Biol. Chem. (2017), https://doi.org/10.1515/hsz- Proteomics. 15 (2015) 2458–2469, https://doi.org/10.1002/pmic.201500014. 2017-0187. [11] G. Shema, M.T.N. Nguyen, F.A. Solari, S. Loroch, A.S. Venne, L. Kollipara, [32] H.A. Ebhardt, A. Root, C. Sander, R. Aebersold, Applications of targeted proteomics A. Sickmann, S.H.L. Verhelst, R.P. Zahedi, Simple, scalable, and ultrasensitive tip- in systems biology and translational medicine, Proteomics (2015), https://doi.org/ based identification of protease substrates, Mol. Cell. Proteomics 17 (2018), 10.1002/pmic.201500004. https://doi.org/10.1074/mcp.TIR117.000302 826 LP – 834. [33] M. Tachikawa, Y. Kaneko, S. Ohtsuki, Y. Uchida, M. Watanabe, H. Ohtsuka, [12] O. Kleifeld, A. Doucet, A. Prudova, U. Auf Dem Keller, M. Gioia, T. Terasaki, Targeted proteomics-based quantitative protein atlas of pannexin and J.N. Kizhakkedathu, C.M. Overall, Identifying and quantifying proteolytic events connexin subtypes in mouse and human tissues and cancer cell lines, J. Pharm. Sci. and the natural N terminome by terminal amine isotopic labeling of substrates, Nat. (2019), https://doi.org/10.1016/j.xphs.2019.09.024. Protoc. 6 (2011) 1578–1611, https://doi.org/10.1038/nprot.2011.382. [34] T.A. Addona, S.E. Abbatiello, B. Schilling, S.J. Skates, D.R. Mani, D.M. Bunk, [13] E Madzharova, F Sabino, I Sagi, N A Afratis (Eds.), U. Auf Dem Keller, Exploring C.H. Spiegelman, L.J. Zimmerman, A.J.L. Ham, H. Keshishian, S.C. Hall, S. Allen,

 S. Savickas, et al. %%$3URWHLQVDQG3URWHRPLFV  

R.K. Blackman, C.H. Borchers, C. Buck, H.L. Cardasis, M.P. Cusack, N.G. Dodder, V. Goebeler, R. Kappelhoff, U. auf dem Keller, T. Klein, P.F. Lange, G. Marino, B.W. Gibson, J.M. Held, T. Hiltke, A. Jackson, E.B. Johansen, C.R. Kinsinger, J. Li, C.J. Morrison, A. Prudova, D. Rodriguez, A.E. Starr, Y. Wang, C.M. Overall, Active M. Mesri, T.A. Neubert, R.K. Niles, T.C. Pulsipher, D. Ransohoff, H. Rodriguez, site specificity profiling of the matrix metalloproteinase family: Proteomic identi- P.A. Rudnick, D. Smith, D.L. Tabb, T.J. Tegeler, A.M. Variyath, L.J. Vega-Montoto, fication of 4300 cleavage sites by nine MMPs explored with structural and synthetic Å. Wahlander, S. Waldemarson, M. Wang, J.R. Whiteaker, L. Zhao, N.L. Anderson, peptide cleavage analyses, Matrix Biol. 49 (2016) 37–60, https://doi.org/10.1016/ S.J. Fisher, D.C. Liebler, A.G. Paulovich, F.E. Regnier, P. Tempst, S.A. Carr, Multi- j.matbio.2015.09.003. site assessment of the precision and reproducibility of multiple reaction monitoring- [48] N Fortelny, J H Cox, R Kappelhoff, A E Starr, P F Lange, P Pavlidis, C M Overall, based measurements of proteins in plasma, Nat. Biotechnol. (2009), https://doi. Network analyses reveal pervasive functional regulation between proteases in the org/10.1038/nbt.1546. human protease web, PLoS Biol. 12 (2014) e1001869, , https://doi.org/10.1371/ [35] H.D. Cox, F. Lopes, G.A. Woldemariam, J.O. Becker, M.C. Parkin, A. Thomas, journal.pbio.1001869. A.W. Butch, D.A. Cowan, M. Thevis, L.D. Bowers, A.N. Hoofnagle, Interlaboratory [49] T Klein, U Eckhard, A Dufour, N Solis, C M Overall, Proteolytic cleavage - me- agreement of insulin-like growth factor 1 concentrations measured by mass spec- chanisms, function, and “omic” approaches for a near-ubiquitous posttranslational trometry, Clin. Chem. (2014), https://doi.org/10.1373/clinchem.2013.208538. modification, Chem. Rev. (2018), https://doi.org/10.1021/acs.chemrev.7b00120. [36] P. Giansanti, L. Tsiatsiani, T.Y. Low, A.J.R. Heck, Six alternative proteases for mass [50] M. Grozdanić, R. Vidmar, M. Vizovišek, M. Fonović, Degradomics in Biomarker spectrometry-based proteomics beyond trypsin, Nat. Protoc. (2016), https://doi. Discovery, PROTEOMICS – Clin. Appl., 2019, https://doi.org/10.1002/prca. org/10.1038/nprot.2016.057. 201800138. [37] A D Catherman, O S Skinner, N L Kelleher, Top down proteomics: facts and per- [51] S S H Weng, F Demir, E K Ergin, S Dirnberger, A Uzozie, D Tuscher, L Nierves, spectives, Biochem. Biophys. Res. Commun. 445 (2014) 683–693, https://doi.org/ J Tsui, P F Huesgen, P F Lange, Sensitive determination of proteolytic proteoforms 10.1016/j.bbrc.2014.02.041. in limited microscale proteome samples, Mol. Cell. Proteomics (2019), https://doi. [38] S.L. King, C.K. Goth, U. Eckhard, H.J. Joshi, A.D. Haue, S.Y. Vakhrushev, org/10.1074/mcp.tir119.001560. K.T. Schjoldager, C.M. Overall, H.H. Wandall, TAILS N-terminomics and proteomics [52] R Bruderer, O M Bernhardt, T Gandhi, Y Xuan, J Sondermann, M Schmidt, reveal complex regulation of proteolytic cleavage by O-glycosylation, J. Biol. Chem. D Gomez-Varela, L Reiter, Optimization of experimental parameters in data-in- (2018), https://doi.org/10.1074/jbc.RA118.001978. dependent mass spectrometry significantly increases depth and reproducibility of [39] D.B. Trentini, M.J. Suskiewicz, A. Heuck, R. Kurzbauer, L. Deszcz, K. Mechtler, results, Mol. Cell. Proteomics (2017), https://doi.org/10.1074/mcp.RA117. T. Clausen, Arginine phosphorylation marks proteins for degradation by a Clp 000314. protease, Nature (2016), https://doi.org/10.1038/nature20122. [53] S Gessulat, T Schmidt, D P Zolg, P Samaras, K Schnatbaum, J Zerweck, T Knaute, [40] J.W.H. Wong, G. Cagney, An overview of label-free quantitation methods in pro- J Rechenberger, B Delanghe, A Huhmer, U Reimer, H C Ehrlich, S Aiche, B Kuster, teomics by mass spectrometry, Methods Mol. Biol. (2010), https://doi.org/10. M Wilhelm, Prosit: proteome-wide prediction of peptide tandem mass spectra by 1007/978-1-60761-444-9_18. deep learning, Nat. Methods (2019), https://doi.org/10.1038/s41592-019-0426-7. [41] A. Bourmaud, S. Gallien, B. Domon, Parallel reaction monitoring using quadrupole- [54] R. Bąchor, M. Waliczek, P. Stefanowicz, Z. Szewczuk, Trends in the Design of New Orbitrap mass spectrometer: Principle and applications, Proteomics (2016), https:// Isobaric Labeling Reagents for Quantitative Proteomics, Mol 24 (2019), https://doi. doi.org/10.1002/pmic.201500543. org/10.3390/molecules24040701. [42] B. MacLean, D.M. Tomazela, N. Shulman, M. Chambers, G.L. Finney, B. Frewen, [55] T S Batth, J V Olsen, Offline high pH reversed-phase peptide fractionation for deep R. Kern, D.L. Tabb, D.C. Liebler, M.J. MacCoss, Skyline: An open source document phosphoproteome coverage, Methods Mol. Biol. (2016), https://doi.org/10.1007/ editor for creating and analyzing targeted proteomics experiments, Bioinformatics 978-1-4939-3049-4_12. (2010), https://doi.org/10.1093/bioinformatics/btq054. [56] L Chen, Y Shan, Y Weng, Z Sui, X Zhang, Z Liang, L Zhang, Y Zhang, Hydrophobic [43] M. Choi, C.Y. Chang, T. Clough, D. Broudy, T. Killeen, B. MacLean, O. Vitek, tagging-assisted N-termini enrichment for in-depth N-Terminome analysis, Anal. MSstats: An R package for statistical analysis of quantitative mass spectrometry- Chem. 88 (2016) 8390–8395, https://doi.org/10.1021/acs.analchem.6b02453. based proteomic experiments, Bioinformatics (2014), https://doi.org/10.1093/ [57] B K Erickson, C M Rose, C R Braun, A R Erickson, J Knott, G C McAlister, M Wühr, J bioinformatics/btu305. A Paulo, R A Everley, S P Gygi, A strategy to combine sample multiplexing with [44] C Galitzine, J D Egertson, S Abbatiello, C M Henderson, L K Pino, M MacCoss, A targeted proteomics assays for high-throughput protein signature characterization, N Hoofnagle, O Vitek, Nonlinear regression improves accuracy of characterization Mol. Cell (2017), https://doi.org/10.1016/j.molcel.2016.12.005. of multiplexed mass spectrometric assays, Mol. Cell. Proteomics (2018), https:// [58] C.M. Rose, B.K. Erickson, D.K. Schweppe, R. Viner, J. Choi, J. Rogers, doi.org/10.1074/mcp.RA117.000322. R. Bomgarden, S.P. Gygi, D.S. Kirkpatrick, Tomahaq companion: a tool for the [45] S.M. Henry, E. Sutlief, O. Salas-Solano, J. Valliere-Douglass, ELISA reagent coverage creation and analysis of isobaric label based multiplexed targeted assays, J. evaluation by affinity purification tandem mass spectrometry, MAbs (2017), Proteome Res. (2019), https://doi.org/10.1021/acs.jproteome.8b00767. https://doi.org/10.1080/19420862.2017.1349586. [59] X Zhong, Q Yu, F Ma, D C Frost, L Lu, Z Chen, H Zetterberg, C Carlsson, O Okonkwo, [46] U. Auf Dem Keller, A. Prudova, U. Eckhard, B. Fingleton, C.M. Overall, Systems- L Li, HOTMAQ: a multiplexed absolute quantification method for targeted pro- level analysis of proteolytic events in increased vascular permeability and com- teomics, Anal. Chem. (2019), https://doi.org/10.1021/acs.analchem.8b04580. plement activation in skin inflammation, Sci. Signal. (2013), https://doi.org/10. [60] B. Budnik, E. Levy, G. Harmange, N. Slavov, SCoPE-MS: mass spectrometry of single 1126/scisignal.2003512. mammalian cells quantifies proteome heterogeneity during cell differentiation, [47] U. Eckhard, P.F. Huesgen, O. Schilling, C.L. Bellac, G.S. Butler, J.H. Cox, A. Dufour, Genome Biol. (2018), https://doi.org/10.1186/s13059-018-1547-5.

 1.4 Targeted degradomics in protein terminomics and protease substrate discovery –Manuscript 2

Targeted proteomics – a relatively young field within proteomics science – has made a remarkable impact also on deepening the knowledge of protease biology. The continuous advancements in mass spectrometry technology allow not only comprehensive analyses of sets of defined proteases but also quantitative assessments of lowly abundant irreversible protease cleavage events. Over the years of degradomics research, a vast amount of whole proteome and degradome discovery data has been collected. However, from thousands of identified cleavage events only a few have been validated, thus many are left “discovered” and never further investigated e.g. in more complex biological systems. This need can be addressed by targeted degradomics for extremely sensitive, high throughput mass spectrometry-based analysis of cleavage events and protease activation networks. In this review, an introduction into current targeted mass spectrometry-based proteomics technologies is provided. It is described how these techniques can be used for high sensitivity in quantitative analysis of protease cleavage events and integrated with state-of-the-art databases, search engines and machine learning algorithms for cleavage product identification and target definition. Lastly, already existing examples of targeted degradomics applied in different studies are discussed.

Savickas S, auf dem Keller U. Targeted degradomics in protein terminomics and protease substrate discovery. Biol Chem. 2017;399(1):47-54. doi:10.1515/hsz-2017-0187

26 Novel proteomics technologies to decipher protease networks in cells and tissues Biol. Chem. 2017; x(x): xxx–xxx

Minireview

Simonas Savickas and Ulrich auf dem Keller* Targeted degradomics in protein terminomics and protease substrate discovery https://doi.org/10.1515/hsz-2017-0187 cancer (Drag and Salvesen, 2010; Turk et al., 2012). Latest Received June 28, 2017; accepted August 21, 2017 databases contain information about 2700 proteases of various model organisms with more than 1000 in queue Abstract: Targeted degradomics integrates positional still to be sequenced and characterized (Rawlings et al., information into mass spectrometry (MS)-based targeted 2016). Major protease groups can be clustered on the basis proteomics workflows and thereby enables analysis of of their catalytic mechanisms into aspartic, glutamic, proteolytic cleavage events with unprecedented specific- cysteine, metallo, serine and threonine endopeptidases ity and sensitivity. Rapid progress in the establishment (Turk et al., 2012). Multiple strategies, such as substrate- of protease-substrate relations provides extensive degra- and activity-based probes and natural substrate turnover domics target lists that now can be tested with help of characterized the mechanisms of these proteases, and selected and parallel reaction monitoring (S/PRM) in more than 17 000 physiological substrates were assigned complex biological systems, where proteases act in physi- to their related cleaving enzymes (auf dem Keller and ological environments. In this minireview, we describe Schilling, 2010; Rawlings et al., 2016). the general principles of targeted degradomics, outline Rapid advancements in mass spectrometry (MS)- the generic experimental workflow of the methodology based proteomics have helped to detect protease sub- and highlight recent and future applications in protease strates and active proteases in the context of their research. biological roles and significantly increased the number Keywords: degradomics; parallel reaction monitoring; of known protein targets (Marino et al., 2015; Schlage and protease substrates; selected reaction monitoring; tar- auf dem Keller, 2015). However, although these global geted proteomics. approaches also accumulated an array of potential bio- markers, their application in clinics is lagging behind (Anderson et al., 2013). Techniques complementary to Introduction shotgun proteomics, such as selected reaction monitor- ing (SRM), selected ion monitoring (SIM) and parallel Proteolysis is a protease-induced irreversible post trans- reaction monitoring (PRM), may bridge target discov- lational modification, resulting in hydrolysis of peptide ery, context dependent activity and clinical diagnosis. or isopeptide bonds with distinct consequences for target Methodologies first described by Agard et al. (2012) and proteins. Thereby, substrates might be either eliminated by Fahlman et al. (2014) have utilized previously collected proteolytic degradation or functionally modified through biochemical data of proteolytic cleavages and applied limited proteolysis by specific removal of a defined sub- SRM and targeted analysis to monitor substrate turnover sequence of amino acids. Cellular homeostasis depends in complex biological samples with attomolar sensitivity on tightly regulated proteolytic activity, which is spatially and high specificity. They have indicated that targeted and temporally maintained by cofactors, inhibitors and degradomics, i.e. the specific proteomics-based analysis other surrounding elements. Dysregulated proteolysis of selected proteolytic cleavage events, has the quanti- may lead to fatal pathologies, such as impaired tissue tative strength, unbiased reproducibility, and system- repair, inflammation, neurodegenerative disorders or atic simplicity known from general targeted proteomics, which has been widely applied to monitor relative protein abundances. Thus, this pioneering work can have a wide *Corresponding author: Ulrich auf dem Keller, Institute of Molecular spectrum of applications from clinical assay improve- Health Sciences, ETH Zurich, Otto-Stern-Weg 7, CH-8093 Zurich, Switzerland, e-mail: [email protected] ment to elucidation of complex proteolytic networks. Simonas Savickas: Institute of Molecular Health Sciences, ETH Prominent examples for proteolytic fragments, which Zurich, Otto-Stern-Weg 7, CH-8093 Zurich, Switzerland have been already tested in clinics as biomarkers, are 2 S. Savickas and U. auf dem Keller: Targeted degradomics peptides released from the amyloid beta 4 protein (APP) dynamic activities of individual proteases or net out- in Alzheimer’s disease or fibrinogen degradation prod- comes of interconnected protease networks through tar- ucts in breast and colon cancer (Huesgen et al., 2014). geted analysis of related indicative proteolytic events. In this minireview, we introduce targeted degradom- ics as a methodology, which is complementary to shotgun MS-based degradomics approaches. In particular, we Generation of degradomics target lists focus on design and implementation of targeted degra- domics experiments and recent applications of PRM and Key to targeted degradomics is the availability of informa- SRM in the context of proteolytic processing. Moreover, tion on proteolytic cleavage events, which are translated we outline the power of these technologies as independ- into libraries of unique proteolytically processed peptides. ent functional discovery tools in addition to providing Due to extensive biomedical research in the last decades enhanced sensitivity in validation of degradomics screens. and increasing numbers of high-throughput studies, a Finally, we discuss the potential of targeted degradomics wealth of data has been deposited into custom protease- methods in elucidating complex interplays between pro- centered databases, such as MEROPS, TopFIND, CutDB, teolysis and other post-translational modifications (PTMs) CaspDB and Degrabase (Igarashi et al., 2007; Craw- exploiting their high specificity and sensitivity. ford et al., 2013; Kumar et al., 2014; Fortelny et al., 2015; Rawlings et al., 2016). For specific target proteins of inter- est or to assess activities of particular proteases, mining Concept of targeted degradomics of these databases can be complemented by extensive lit- erature searches to include more recently described puta- By specifically monitoring preselected peptides that tive cleavage events, which have not yet been included in have been generated from proteins in a bottom-up common repositories. Even if no published data are avail- approach in most cases by tryptic digest, targeted prot- able, indicative cleavage events for many proteases might eomics methods allow for assessing and quantifying the be obtained with the help of emerging protease predictor same set of related proteins reproducibly in complex pro- algorithms, which achieve high levels of accuracy, particu- teomes (Rost et al., 2015). For technical details of these larly for proteases whose cleavage specificity is dominated powerful approaches, the reader is referred to excel- by selective amino acid residues (Song et al., 2011; Soste lent recent review articles published by pioneers in the et al., 2014). Such substrate and cleavage site predictors field (Picotti and Aebersold, 2012; Ebhardt et al., 2015; are available for several groups of proteases (PROSPER: Bourmaud et al., 2016). Targeted degradomics further https://prosper.erc.monash.edu.au/home.html, Peptide- extends this concept by selectively including semi- Cutter: http://web.expasy.org/peptide_cutter/, Cascleave: tryptic peptides, which are released from proteins that http://www.structbioinfor.org/cascleave2/index.html, had been proteolytically processed. Concomitant analy- SitePrediction: http://www.dmbr.ugent.be/prx/bioit2- sis of the corresponding fully tryptic peptide (cleavage public/SitePrediction/, ProP: http://www.cbs.dtu.dk/ site spanning peptide) from the non-processed form of services/ProP/, PoPS: http://pops.csse.monash.edu.au) the target protein allows for drawing conclusions with and combine sequence information, structural proper- regard to degree of processing, while recording of ‘clas- ties and additional biophysical and biological features to sical’ tryptic peptides from other regions of the protein assign predictive values to theoretical cleavages in target relates differential processing to general changes in proteins. As an additional custom resource, cleavage data protein abundances in proteomes across multiple con- might be derived from own shotgun discovery degradom- ditions. In particular, this enables quantifying degrees ics experiments, applying methodologies that have been of maturation of proteins whose function is dependent extensively reviewed elsewhere (Rogers and Overall, 2013; on proteolytic removal of modulatory propeptides. Clas- Schlage and auf dem Keller, 2015). sical examples are protease zymogens, e.g. members of As cleavage events collected from public reposito- the blood coagulation cascade, which are activated by ries or generated by predictive algorithms are not neces- proteolytic processing. Hence, by integrating positional sarily derived from MS-based proteomics studies, they information into selection of target peptides, targeted mostly need to be translated into suitable peptide lists degradomics quantitatively monitors specific proteo- for targeted degradomics analysis. This is achieved by lytic cleavage events and their dynamics in complex in silico digestion of related cleaved and non-cleaved proteomes with high throughput and unprecedented substrate proteins using sequence specific endopepti- precision. Ultimately, this can be exploited to test for dases (Fahlman et al., 2014; Giansanti et al., 2016). As S. Savickas and U. auf dem Keller: Targeted degradomics 3 mentioned above, trypsin is the most widely used endo- from any biological source material of different levels of peptidase in bottom-up proteomics to cleave amino acid complexity that had been exposed to differential protease sequences carboxy-terminal to arginine and lysine resi- activity. This might have happened through endogenous dues. However, this specificity might be problematic for proteases whose basal activity is modulated by external targeted degradomics experiments, as it overlaps with stimuli or ablated in loss-of-function models, or upon many important trypsin-like endopeptidases, complicat- incubation of the source proteome with selected exog- ing distinction between cleavages of the test protease enously added proteolytic modifiers. Moreover, substrate and the proteomics working protease used in bottom-up peptides of known cleavage events might be added to approaches. Moreover, the high frequency of lysine and bioactive proteomes to specifically test for activities of arginine in protein sequences might often lead to very endogenous proteases (Dutta et al., 2016). The resulting short (<6 amino acids) semi-tryptic peptides, which are substrate degradomes are further digested with respec- not accessible to identification by MS-based proteomics tive working endopeptidases to generate peptides of same and thus cannot be reliably monitored in targeted experi- length and properties as in the provided target list. ments. Therefore, alternative endopeptidases such as In the next step, prepared protein digests are ana- Lys-C, Glu-C, Asp-N or pepsin should be considered that lyzed by LC-MS/MS in SRM or PRM mode, whereby the have been successfully employed in bottom-up proteom- mass spectrometer is programmed to specifically monitor ics (Giansanti et al., 2016). With LysargiNase, a very inter- peptides provided in the unique degradomics target esting novel endopeptidase has been introduced that is list. Targeted degradomics analyses use reverse phase particularly interesting for targeted degradomics, since it (RP) chromatography as classical in-line LC component. generates peptides with N-terminal lysine or arginine and Normalization of LC retention times is achieved with thus alleviates monitoring of C-terminal halves of cleaved help of standard peptides, e.g. iRT peptides (Biognosys, cleavage site spanning peptides (Huesgen et al., 2015). Switzerland), which are measured together with target The generated list of truncated semi-specific peptide frag- peptides when preparing the target list and spiked into the ments should be complemented with intact fully specific experimental protein digest prior to LC-MS/MS analysis. peptides per protein ideally N- and C-terminal to respec- The combination of defined masses and retention times tive cleavage sites. These peptides benchmark the general allows narrowing in on short time windows in so-called abundance of protease target proteins in analyzed pro- time-scheduled runs, increasing the number of simultane- teomes and allow evaluating proteolytic processing as dif- ously monitored targets with high sensitivity. Depending ferential event, e.g. in response to cellular stimuli. on the nature and complexity of analyzed proteomes and In contrast to classical targeted proteomics, for which substrate degradomes, additional dimensions of LC might extensive repositories of proteotypic peptides with associ- be added either off- or in-line with RP to enhance chances ated MS spectra and in-line liquid chromatography (LC) of precisely identifying and quantifying proteolytic end retention times for design of optimal peptide target lists products of low abundance (Di Palma et al., 2012). are available (Kusebauch et al., 2016), targeted degra- Targeted degradomics assays have been mostly imple- domics generally requires physical validation of selected mented on triple-quadrupole mass spectrometers (Agard targets with the help of synthetic peptides that are meas- et al., 2012; Shimbo et al., 2012; Soste et al., 2014; Sabino ured under standardized LC-MS conditions. Finally, intro- et al., 2015; Dutta et al., 2016; Julien et al., 2016). Triple- ducing heavy isotopes in form of 13C- or 15N-amino acids quadrupole instrument measurements rely on both pre- provides the researcher with a mix of validated peptides cursor and fragment isolation for targeted analysis, also that can be spiked into analytes for most specific monitor- known as SRM. Small mass windows for quadrupole filters ing of proteolytic events in complex biological matrices. ensure reproducible and sensitive selection of target pep- tides and their fragment ions. However, the more recently introduced PRM methodology performed on quadrupole- Targeted degradomics experimental Orbitrap instruments isolates only the precursor but workflow measures all of its possible fragment ions (Bourmaud et al., 2016). It has been observed that in some conditions A validated target list of unique peptides allows directly quantification of peptides with PRM allows higher reso- querying complex proteomes for neo-N-termini, neo-C- lution and higher sensitivity than SRM (Kockmann et al., termini and cleavage site spanning peptides with high 2016). The improved methodological accuracy enables a specificity and sensitivity. In a generic targeted degra- more stringent quantification of proteolytically gener- domics workflow (Figure 1), proteomes could be derived ated and modified peptides (neo-N and neo-C-terminus, 4 S. Savickas and U. auf dem Keller: Targeted degradomics

Figure 1: Generic targeted degradomics workflow. Targeted degradomics workflows start with identifying the proteolytic cleavage of interest in protease cleavage databases such as CutDB, MEROPS or UniProt. Numerous, high confidence proteolytic cleavages have been identified and annotated. Substrate proteins are then subjected to theoretical endopeptidase treatment with mass spectrometry working proteases such as trypsin, ArgC or AspN to generate a unique peptide list of newly formed neo-N-termini, neo-C-termini and cleavage site spanning peptides. After selection and validation of degradomics target lists, sample proteomes are subjected to test protease treatment and digested by selected sequence specific working endopeptidases. Peptides are separated by in-line liquid chromatography and analyzed by targeted mass spectrometry for members of degradomics target lists. In SRM mode, triple-quadrupole instruments select both precursor ions (Q1) and predefined fragments (Q3), whereas quadrupole-Orbitrap instruments only select the precursor ion (Q) but follow all fragments over the chromatographic elution peak that is integrated for quantitation. cleavage site spanning peptide). Thus, lower limits of Recovered peptide intensity values can be further sub- detection might be reached to help evaluating properties jected to stringent statistical normalization as well as ad of low abundant proteases or proteases with low activity hoc and post hoc testing using statistical downstream pro- in a complex proteome. cessing packages like MSstats (Choi et al., 2014). For design and interpretation of targeted degradomics To increase sensitivity and coverage, specific subspe- experiments, software packages, such as Analyst, Skyline, cies of cleavage events, such as neo-termini, might be and SpectroDive are applied, which are commonly used selectively enriched with help of appropriate positional in targeted proteomics (MacLean et al., 2010; Poli et al., proteomics approaches like terminal amine isotopic labe- 2015). The software recognizes, annotates and quantifies ling of substrates (TAILS), combined fractional diagonal peptides specified in unique degradomics target lists. chromatography (COFRADIC) or subtiligase treatment S. Savickas and U. auf dem Keller: Targeted degradomics 5

(Schlage and auf dem Keller, 2015). Thereby, proteomes Dutta et al. (2016) developed a cleavable synthetic peptide are chemically labeled on the protein level after differ- whose substrate products they monitored by SRM upon ential exposure to proteases but before digestion with incubation with human plasma. Since asparaginyl endo- working endopeptidases. Notably, as application of any peptidase activity has been implicated in diseases, such of these techniques is associated with chemical modifica- as breast cancer, leukemia and dementia, this assay might tion of target peptides, degradomics target lists have to be serve as diagnostic biomarker test in future applications. adjusted accordingly and any synthetic template peptides A study by Wiita et al. (2014) monitored proteolytic sig- subjected to the same treatment. Furthermore, selective natures in serum from cancer patients in response to enrichment of terminal peptides prohibits analysis of chemotherapy induced cell death. By utilizing the power tryptic peptides of the monitored protease substrate of of subtiligase assisted N-terminal enrichment and tar- interest from the same peptide mixture and thus requires geted degradomics, they identified and quantified pro- separate analyses of samples prior to the enrichment step. cessing events including many caspase cleavage products Finally, several additional strategies can be applied to that significantly increased in postchemotherapy plasma further enhance coverage and improve quantification of and might lead to a novel class of biomarkers to monitor proteins and protease substrates of interest (Eichelbaum response to chemotherapy. As another example, Sabino et al., 2012). et al. (2015) used multiplexed iTRAQ-based TAILS to record proteolytic fragments released into wound fluids at multi- ple time points after wounding in a pig vacuum therapy Recent applications of targeted degradomics model. Among them, a cleavage fragment of the integrin adapter protein kindlin-3 was detected, which could be In recent years, the power of targeted degradomics has assigned to caspase-3 activity and readily validated by been exploited to tackle several important questions in SRM in samples from an independent experiment. protease research. The flexibility in study design and It should be noted that, so far, most studies applied highly reproducible identification of the same peptide targeted degradomics to validate and specifically monitor species in multiple samples by SRM is particularly suited protease cleavage products that had been identified to simultaneously monitor multiple proteolytic events after enrichment of terminal peptides. However, it can over time and thereby determine kinetic parameters for be expected that the importance of termini enrichment proteases in complex systems. This was first demonstrated will decrease with increasing performance of mass spec- by Agard et al. (2012) who determined hundreds of cata- trometers. As an example, the Kindlin-3 neo-N terminus lytic efficiencies of caspase-dependent cleavage events in monitored by SRM in pig wound exudates by Sabino et al. cells both upon incubation of lysates with recombinant (2015) was identified both before and after TAILS enrich- proteases and upon activation of endogenous caspases in ment of protein N-termini. cells by apoptosis. This study applied subtiligase-tagging to enrich for N-terminal peptides in combination with SRM for quantification, a strategy that was also success- Future directions ful in determining substrate preferences of caspase-2 and caspase-6 by quantitative kinetics (Julien et al., 2016). The unique capabilities of targeted degradomics Biological fluids and specifically blood plasma approaches are particularly suited to address current present enormous challenges to stochastic sampling in major challenges in understanding the complex functions shotgun proteomics due to their complexity and extreme of proteases in biological systems. For instance, targeted dynamic range of protein concentrations (Hu et al., 2006). degradomics allows deciphering interconnected protease Consequently, analysis of proteolytic end products in activation networks and their relations by monitoring body fluids highly benefits from selectivity and specificity zymogen propeptide removal as crucial regulatory event of targeted degradomics approaches. As a powerful alter- in protease activation. Starting with the famous waterfall native to immunoblotting, sensitive SRM assays have been cascade of blood coagulation, interdependent zymogen established to monitor proteolytically generated forms of activation networks have been described for many pro- cardiac troponin T in patient serum that are associated tease groups, including caspases, cathepsins, kallikreins with severity of cardiac damage and state of a patient and matrix metalloproteinases that all together with prior to and after developing acute myocardial infarc- their substrates and inhibitors form the protease web tion (Streng et al., 2016). To test for activity of a specific (Fortelny et al., 2014). With the help of targeted analy- protease, aparaginyl endopeptidase, in blood plasma, ses, it will be possible to develop quantitative models of 6 S. Savickas and U. auf dem Keller: Targeted degradomics protease activation networks by concomitant monitor- and phosphorylation of a critical serine residue in P1′ ing of decrease in abundance of spanning peptides and position of a furin-type cleavage site (Tagliabracci et al., increase in abundance of corresponding neo-termini 2014). Moreover, Dix et al. (2012) demonstrated the func- at sites of proteolytic removal of inhibitory propeptides tional interplay between phosphorylation and prote- (Fahlman et al., 2014) (Figure 2). Respective abundance olysis during apoptosis, and TAILS has been applied to ratios will determine activation potentials of individual phosphorylated and non-phosphorylated proteomes to network nodes and ultimately delineate proteolytic poten- identify phosphorylation-dependent cleavages of target tials of modules and subnetworks and their disturbance proteins by caspases (Turowec et al., 2014). Similarly, by gain- or loss-of-function in model systems of increasing glycosylation of either the cleaving protease or the sub- levels of complexity. Thereby, the high specificity and sen- strate protein can strongly affect cleavage kinetics and sitivity of targeted degradomics will enable direct transfer modulate specificities (Goettig, 2016). Several powerful of results from test tube and cell-based assays to analysis enrichment techniques have been developed and applied of biological material from animal models and patients to study single PTMs, but stochastic sampling in shotgun suffering from diseases associated with detrimental aber- approaches complicates analysis of the same peptide in rant protease activities. Ultimately, studies of this kind different PTM-modified forms. This might be overcome by have the potential to convert current limited attempts of targeted proteomics and enable efficient characterization interfering with single protease-substrate relations into of PTM-dependent proteolytic cleavage events when com- powerful strategies of attacking disturbances in protease bined with targeted degradomics methods. networks as underlying causes of many pathologies. In addition to a strong contribution of targeted degra- Another area of research that is predestined to domics to solving important issues in protease biology, strongly benefit from targeted degradomics is the analy- it can be expected that rapid advancements in targeted sis of interplay between proteolysis and other PTMs in proteomics technologies will steadily increase the power form of PTM crosstalk. As a prime example of such com- of the concept. Currently, targeted proteomics assays are plicated interactions, proteolytic processing of fibroblast typically limited to simultaneous analysis of a few hundred growth factor 23 is regulated by reciprocal O-glycosylation peptides per LC-MS/MS run with high resolution. Recently,

Figure 2: Mapping protease activation networks by targeted degradomics. Activation of proteases is monitored by recording generation of neo-N termini and loss of cleavage site spanning peptides upon zymogen propeptide removal by targeted degradomics. Relative quantification of the spanning peptide of the inactive proform and the neo-N terminus of the active mature protease in relation to activities of activating upstream proteases determines degree of activation and thus activation potentials within an activation network. Integration of data and functional modulation of individual nodes allows mapping of complicated interconnected protease activation networks in complex biological matrices. S. Savickas and U. auf dem Keller: Targeted degradomics 7

Gallien et al. (2015) introduced internal standard triggered- Acknowledgments: We thank S. Werner (ETH Zurich, parallel reaction monitoring (IS-PRM), which they showed Switzerland) for her continuous support of our work and to significantly increase the number of measurable pep- the Swiss National Science Foundation for funding with tides in a single LC-MS/MS run by using synthetic isotopi- help of a project grant (31003A_163216). cally labeled peptides to trigger scheduled high resolution measurements. As a very powerful emerging technology, data-independent acquisition (DIA) combines advantages in coverage of shotgun proteomics for discovery-based References applications with sensitivity and reproducible monitoring Agard, N.J., Mahrus, S., Trinidad, J.C., Lynn, A., Burlingame, A.L., of selected peptides by SRM or PRM. This is achieved by and Wells, J.A. (2012). Global kinetic analysis of proteolysis via selecting all instead of only the most intense precursor ions quantitative targeted proteomics. Proc. Natl. Acad. Sci. USA for fragmentation and further analysis on time-of-flight 109, 1913–1918. or Orbitrap analyzers. Here, a particular challenge is the Anderson, N.L., Ptolemy, A.S., and Rifai, N. (2013). The riddle of deconvolution of multiplexed fragment spectra, a compu- protein diagnostics: future bleak or bright? Clin. Chem. 59, tationally intensive task that is more and more alleviated 194–197. auf dem Keller, U. and Schilling, O. (2010). Proteomic techniques by introduction of novel software packages for data inter- and activity-based probes for the system-wide study of prote- pretation (Navarro et al., 2016). Advanced sample prepara- olysis. Biochimie 92, 1705–1714. tion strategies and selective enrichment of protein termini Bourmaud, A., Gallien, S., and Domon, B. (2016). Parallel reaction together with DIA could be a key workflow to characterize monitoring using quadrupole-orbitrap mass spectrometer: 16 proteolytic events with highest coverage and precision in principle and applications. Proteomics , 2146–2159. Choi, M., Chang, C.Y., Clough, T., Broudy, D., Killeen, T., MacLean, increasingly complex biological matrices. B., and Vitek, O. (2014). MSstats: an R package for statistical analysis of quantitative mass spectrometry-based proteomic experiments. Bioinformatics 30, 2524–2526. Conclusions Crawford, E.D., Seaman, J.E., Agard, N., Hsu, G.W., Julien, O., Mahrus, S., Nguyen, H., Shimbo, K., Yoshihara, H.A., Zhuang, Targeted degradomics is a powerful concept to precisely M., et al. (2013). The DegraBase: a database of proteolysis in healthy and apoptotic human cells. Mol. Cell. Proteomics 12, monitor proteolytic events and their modulation under 813–824. changing conditions in complex biological systems with Di Palma, S., Hennrich, M.L., Heck, A.J., and Mohammed, S. (2012). high specificity and sensitivity. With the growing knowl- Recent advances in peptide separation by multidimensional edge of specific protease activities and the improvements liquid chromatography for proteome analysis. J. Proteomics 75, in high-throughput protease substrate discovery, increas- 3791–3813. Dix, M.M., Simon, G.M., Wang, C., Okerberg, E., Patricelli, M.P., ing numbers of comprehensive degradomics target lists and Cravatt, B.F. (2012). Functional interplay between caspase can be compiled to analyze protease action and function cleavage and phosphorylation sculpts the apoptotic proteome. with extensive coverage and at increasing levels of com- Cell 150, 426–440. plexity. Numerous software packages are under active Drag, M. and Salvesen, G.S. (2010). Emerging principles in development that will further leverage data interpretation protease-based drug discovery. Nat. Rev. Drug Discov. 9, and exchange of results within the protease research com- 690–701. Dutta, A., Potier, D.N., Walker, M.J., Gray, O.J., Parker, C., Holland, munity. Several studies have already exploited the power M., Williamson, A.J., Pierce, A., Unwin, R.D., Krishnan, S., of targeted degradomics to define kinetic parameters of et al. (2016). Development of a selected reaction monitoring hundreds of proteolytic cleavages in parallel and to assess mass spectrometry-based assay to detect asparaginyl protease activities in blood and body fluids, which pose endopeptidase activity in biological fluids. Oncotarget 7, particular challenges to MS-based proteomics. Targeted 70822–70831. Ebhardt, H.A., Root, A., Sander, C., and Aebersold, R. (2015). proteomics opens up many new avenues of research, e.g. Applications of targeted proteomics in systems biology and to explore interconnected protease activation networks translational medicine. Proteomics 15, 3193–3208. under physiological conditions and to study the complex Eichelbaum, K., Winter, M., Berriel Diaz, M., Herzig, S., and interplay between proteolysis and other major PTMs. Krijgsveld, J. (2012). Selective enrichment of newly synthesized Rapid developments in MS-based proteomics technolo- proteins for quantitative secretome analysis. Nat. Biotechnol. 30 gies will further push the boundaries of throughput, sen- , 984–990. Fahlman, R.P., Chen, W., and Overall, C.M. (2014). Absolute pro- sitivity and specificity to get closer to the ultimate aim of teomic quantification of the activity state of proteases and understanding the entire protease web and its perturba- proteolytic cleavages using proteolytic signature peptides and tions in health and disease. isobaric tags. J. Proteomics 100, 79–91. 8 S. Savickas and U. auf dem Keller: Targeted degradomics

Fortelny, N., Cox, J.H., Kappelhoff, R., Starr, A.E., Lange, P.F., Picotti, P. and Aebersold, R. (2012). Selected reaction monitoring- Pavlidis, P., and Overall, C.M. (2014). Network analyses reveal based proteomics: workflows, potential, pitfalls and future pervasive functional regulation between proteases in the directions. Nat. Methods 9, 555–566. human protease web. PLoS Biol. 12, e1001869. Poli, M., Ori, A., Child, T., Jaroudi, S., Spath, K., Beck, M., and Wells, Fortelny, N., Yang, S., Pavlidis, P., Lange, P.F., and Overall, C.M. D. (2015). Characterization and quantification of proteins (2015). Proteome TopFIND 3.0 with TopFINDer and PathFINDer: secreted by single human embryos prior to implantation. EMBO database and analysis tools for the association of protein ter- Mol. Med. 7, 1465–1479. mini to pre- and post-translational events. Nucleic Acids Res. Rawlings, N.D., Barrett, A.J., and Finn, R. (2016). Twenty years of the 43, D290–D297. MEROPS database of proteolytic enzymes, their substrates and Gallien, S., Kim, S.Y., and Domon, B. (2015). Large-scale targeted inhibitors. Nucleic Acids Res. 44, D343–D350. proteomics using internal standard triggered-parallel reaction Rogers, L.D. and Overall, C.M. (2013). Proteolytic post-translational monitoring (IS-PRM). Mol. Cell. Proteomics 14, 1630–1644. modification of proteins: proteomic tools and methodology. Giansanti, P., Tsiatsiani, L., Low, T.Y., and Heck, A.J. (2016). Six Mol. Cell. Proteomics 12, 3532–3542. alternative proteases for mass spectrometry-based proteomics Rost, H.L., Malmstrom, L., and Aebersold, R. (2015). Reproducible beyond trypsin. Nat. Protoc. 11, 993–1006. quantitative proteotype data matrices for systems biology. Mol. Goettig, P. (2016). Effects of glycosylation on the enzymatic activity Biol. Cell 26, 3926–3931. and mechanisms of proteases. Int. J. Mol. Sci. 17, E1969. Sabino, F., Hermes, O., Egli, F.E., Kockmann, T., Schlage, P., Croizat, Hu, S., Loo, J.A., and Wong, D.T. (2006). Human body fluid proteome P., Kizhakkedathu, J.N., Smola, H., and auf dem Keller, U. analysis. Proteomics 6, 6326–6353. (2015). In vivo assessment of protease dynamics in cutane- Huesgen, P.F., Lange, P.F., and Overall, C.M. (2014). Ensembles of ous wound healing by degradomics analysis of porcine wound protein termini and specific proteolytic signatures as candidate exudates. Mol. Cell. Proteomics 14, 354–370. biomarkers of disease. Proteomics Clin. Appl. 8, 338–350. Schlage, P. and auf dem Keller, U. (2015). Proteomic approaches to Huesgen, P.F., Lange, P.F., Rogers, L.D., Solis, N., Eckhard, U., uncover MMP function. Matrix Biol. 44–46C, 232–238. Kleifeld, O., Goulas, T., Gomis-Ruth, F.X., and Overall, C.M. Shimbo, K., Hsu, G.W., Nguyen, H., Mahrus, S., Trinidad, J.C., (2015). LysargiNase mirrors trypsin for protein C-terminal and Burlingame, A.L., and Wells, J.A. (2012). Quantitative profiling methylation-site identification. Nat. Methods 12, 55–58. of caspase-cleaved substrates reveals different drug-induced Igarashi, Y., Eroshkin, A., Gramatikova, S., Gramatikoff, K., Zhang, and cell-type patterns in apoptosis. Proc. Natl. Acad. Sci. USA Y., Smith, J.W., Osterman, A.L., and Godzik, A. (2007). CutDB: a 109, 12432–12437. proteolytic event database. Nucleic Acids Res. 35, D546–D549. Song, J., Tan, H., Boyd, S.E., Shen, H., Mahmood, K., Webb, G.I., Julien, O., Zhuang, M., Wiita, A.P., O‘Donoghue, A.J., Knudsen, G.M., Akutsu, T., Whisstock, J.C., and Pike, R.N. (2011). Bioinformatic Craik, C.S., and Wells, J.A. (2016). Quantitative MS-based approaches for predicting substrates of proteases. J. Bioin- enzymology of caspases reveals distinct protein substrate form. Comput. Biol. 9, 149–178. specificities, hierarchies, and cellular roles. Proc. Natl. Acad. Soste, M., Hrabakova, R., Wanka, S., Melnik, A., Boersema, P., Sci. USA 113, E2001–E2010. Maiolica, A., Wernas, T., Tognetti, M., von Mering, C., Kockmann, T., Trachsel, C., Panse, C., Wahlander, A., Selevsek, and Picotti, P. (2014). A sentinel protein assay for simulta- N., Grossmann, J., Wolski, W.E., and Schlapbach, R. (2016). neously quantifying cellular processes. Nat. Methods 11, Targeted proteomics coming of age - SRM, PRM and DIA perfor- 1045–1048. mance evaluated from a core facility perspective. Proteomics Streng, A.S., de Boer, D., Bouwman, F.G., Mariman, E.C., Scholten, 16, 2183–2192. A., van Dieijen-Visser, M.P., and Wodzig, W.K. (2016). Devel- Kumar, S., van Raam, B.J., Salvesen, G.S., and Cieplak, P. (2014). opment of a targeted selected ion monitoring assay for the Caspase cleavage sites in the human proteome: CaspDB, a elucidation of protease induced structural changes in cardiac database of predicted substrates. PLoS One 9, e110539. troponin T. J. Proteomics 136, 123–132. Kusebauch, U., Campbell, D.S., Deutsch, E.W., Chu, C.S., Spicer, Tagliabracci, V.S., Engel, J.L., Wiley, S.E., Xiao, J., Gonzalez, D.J., D.A., Brusniak, M.Y., Slagel, J., Sun, Z., Stevens, J., Grimes, B., Nidumanda Appaiah, H., Koller, A., Nizet, V., White, K.E., and et al. (2016). Human SRMAtlas: a resource of targeted assays to Dixon, J.E. (2014). Dynamic regulation of FGF23 by Fam20C quantify the complete human proteome. Cell 166, 766–778. phosphorylation, GalNAc-T3 glycosylation, and furin proteoly- MacLean, B., Tomazela, D.M., Shulman, N., Chambers, M., Finney, sis. Proc. Natl. Acad. Sci. USA 111, 5520–5525. G.L., Frewen, B., Kern, R., Tabb, D.L., Liebler, D.C., and Turk, B., Turk, D.S.A., and Turk, V. (2012). Protease signalling: the MacCoss, M.J. (2010). Skyline: an open source document editor cutting edge. EMBO J. 31, 1630–1643. for creating and analyzing targeted proteomics experiments. Turowec, J.P., Zukowski, S.A., Knight, J.D., Smalley, D.M., Graves, Bioinformatics 26, 966–968. L.M., Johnson, G.L., Li, S.S., Lajoie, G.A., and Litchfield, D.W. Marino, G., Eckhard, U., and Overall, C.M. (2015). Protein termini (2014). An unbiased proteomic screen reveals caspase cleav- and their modifications revealed by positional proteomics. ACS age is positively and negatively regulated by substrate phos- Chem. Biol. 10, 1754–1764. phorylation. Mol. Cell. Proteomics 13, 1184–1197. Navarro, P., Kuharev, J., Gillet, L.C., Bernhardt, O.M., MacLean, B., Wiita, A.P., Hsu, G.W., Lu, C.M., Esensten, J.H., and Wells, J.A. (2014). Rost, H.L., Tate, S.A., Tsou, C.C., Reiter, L., Distler, U., et al. Circulating proteolytic signatures of chemotherapy-induced cell (2016). A multicenter study benchmarks software tools for label- death in humans discovered by N-terminal labeling. Proc. Natl. free proteome quantification. Nat. Biotechnol. 34, 1130–1136. Acad. Sci. USA 111, 7594–7599.

1.5 Cellular heterogeneity and single-cell proteome analysis

Protein and protease networks in healthy and diseased tissue have a fine balance. Disturbances can result in physiological anomalies and ultimately lead to diseases, such as impaired wound healing and cancer. Every cell in a tissue produces cell-type specific proteins, thus completing the network and indicating that a specific cell type is responsible for production of a certain proteoform. The imbalance of cellular composition and protein abundance is evident in solid tumors (Mason & Joyce, 2011). Starting from a single mutated cell, they develop into an intricate cluster of differentiated cells, forming a heterogenous, sometimes compartmentalized tumor. Such variation, genetic differences and unequal proliferative capacities lead to cellular heterogeneity in tumor tissue that at first sight might appear as a homogenous structure (Gomez, 2020) (Marusyk & Polyak, 2010). Heterogeneous cell types mediate diverse responses to environmental triggers. Remarkable efforts were put into isolating cells and assigning them to a specific family of cells. With the invention of the fluorescence-activated cell sorting (FACS) method, allowing isolation of multiple cells according to their molecular fingerprint, it became possible to delineate the cellular heterogeneity in tissues and organs. Combined with RNA-Seq transcriptomics FACS analyses could explain multilineage of cellular differences by capturing the genetic expression of single cells sorted from a tissue (Dalerba, et al., 2011). In addition to capturing the transcription patterns of each cell and classifying their expression profiles (Xu, et al., 2012), single-cell RNA-Seq also contributed to development of combinatorial drug treatment regimens. Drug resistance has been highly associated with cellular heterogeneity, and targeting two cell types with two different drugs has proven to be highly effective against tumor progression (Zahreddine & Borden, 2013). Thus, personalization of treatments for heterogenous cellular structures is key to understanding disease progression and devising effective strategies for therapeutic intervention. However, more and more resources claim that genetic information is insufficient to make a final judgment regarding treatment options due to remarkable environmental, metabolic, or post- translational influences over the genetic predisposition (Hinohara & Polyak, 2019). A recent publication by Lu et al (2016) shows poor correlation between mRNA and protein abundances, Thus, a fundamental change in the logic of transcript and protein copies is necessary (Liu, Beyer, & Aebersold, 2016). The poor correlation in copy numbers is followed by further processing of proteins and generation of proteoforms, such as proteolytically cleaved proteins (Aebersold, et al., 2018). Hence, this increasing complexity of regulation with every extra level requires direct measurement of proteins as the ultimate workhorses of cellular function. In order to address cellular heterogeneity also at the global protein level, various single- cell proteomics workflows have been developed. With help of technologies like CyTOF, single- cell western blots , ELISpot, SPLIFF (simple split-ubiquitin-based method), and recently SCoPE-

Novel proteomics technologies to decipher protease networks in cells and tissues 35

MS (Single Cell ProtEomics by Mass-Spetrometry) proteomics researchers, have started to systematically explore proteoforms at the single-cell level (Bodenmiller, et al., 2012) (Dünkler, Rösler, Kestler, Moreno-Andrés, & Johnsson, 2015) (Janetzki & Rabin, 2015) (Budnik, Levy, Harmange, & Slavov, 2018) (Quadri, 2015). These techniques are able to identify and quantify proteins and their proteoforms by sorting cells using FACS and detection either by mass spectrometry or with a fluorescence readout. However, most techniques except for SCoPE-MS are limited by number of proteins that can be simultaneously identified. With rapid developments in mass spectrometry, SCoPE-MS is able to analyze the proteomic landscape of heterogenous environments, quantify up to 1000 proteins per single cell and follow protein dynamics. Further improvements are needed in sample preparation and analysis, which greatly influence the number of identified and quantified proteins. It is expected that software tools like TOMAHTO (Yu, et al., 2020), microfluidic devices like nanoPOTS and ultra-low flow chromatography systems will soon address these limitations (Cong, et al., 2020). SCoPE-MS has been developed in 2018 in the laboratory of Nikolai Slavov at Northeastern University, Boston, USA (Budnik, Levy, Harmange, & Slavov, 2018) (Specht, Emmott, Koller, & Slavov, 2019). The workflow focuses on unbiased analysis of protein quantity in individual single cellular proteomes. In order to analyse and compare multiple cells and achieve high proteomic depth, each cell is independently labelled with a Tandem Mass Tag (TMT). In the early development of the technique each kit of TMT tags had one designated label for a pool of 500 cells, called the booster channel, which would serve for peptide identification using the MS2 spectrum, whereby the remaining TMT tags distinguished peptides from FACS sorted single cells and quantified their identified proteins. Later, the number of cells in the booster channel has been decreased and extended into a low buffer volume microfluidics device “nanoPOT” and its successor “nanoTPOT” that uses a 20 cell booster channel, increases protein identification numbers and lowers complexity in sample preparation (Dou, et al., 2019) (Wu, Pai, Liu, Xing, & Lu, 2020). The workflow of a typical 11plex-TMT SCoPE-MS experiment is shown in Figure 4. First the cells are FACS sorted, denatured and labelled with an appropriate TMT tag leading to one tag per cell and one tag for the booster channel. Next, labelled samples are combined and prepared for a reverse phase liquid chromatography separation coupled to a high-resolution mass spectrometer. Recently, a technique for peptide separation of nano gram/low input samples has been explored using Narrow-Bore Packed NanoLC columns with ultra-low flow rate of 20 nl/min, which reduced sample loss inherent to traditional chromatography (Cong, et al., 2020). In order to ensure fast and reliable sample processing of ~200 pg material sampled from a single cell, a fast high-resolution mass spectrometer is usually used, such as the newest generation Thermo Fisher Q Exactive HF-X, Thermo Fisher Exploris 480, or trybrid series Thermo Fisher Fusion, Thermo Fisher Eclipse instruments. Several studies investigated the most optimal instruments, methods and parameters to gain the highest number of proteins. Key parameters are injection

36 Novel proteomics technologies to decipher protease networks in cells and tissues A)

B)

Figure 4- SCoPE-MS workflow. A). FACS sorted single cells are labelled with one of TMT tags leaving one tag reserved for the booster channel. The samples are pooled before mass spectrometry injection and are analysed using high sensitivity, high-resolution mass spectrometers. B). The data dependent proteomics method is used to identify an eluting peptide on MS1 level from the pool of cells, and each TMT tag is able to quantify how much of the peptide signal belongs to a single cell, or to the booster channel.

Novel proteomics technologies to decipher protease networks in cells and tissues 37

time of MS2 ions, appropriate AGC target, physical size of the cell and cell count in the booster channel (Cong, et al., 2020) (Dou, et al., 2019) (Tsai, et al., 2020). The final data set is computationally expensive, as it requires delineation of each identified peptide spectrum to match the quantified tag from each cell with appropriate false discovery approximations, coefficients of variation between cells and quantification accuracy.

38 Novel proteomics technologies to decipher protease networks in cells and tissues

References Aebersold, R., Agar, J., Amster, I., Baker, M., Bertozzi, C., Boja, E., . . . Zhang, B. (2018). How many human proteoforms are there? Nature Chemical Biology, 14(3). Bodenmiller, B., Zunder, E., Finck, R., Chen, T., Savig, E., Bruggner, R., . . . Nolan, G. (2012). Multiplexed mass cytometry profiling of cellular states perturbed by small-molecule regulators. Nature Biotechnology, 30(9). Budnik, B., Levy, E., Harmange, G., & Slavov, N. (2018). SCoPE-MS: mass spectrometry of single mammalian cells quantifies proteome heterogeneity during cell differentiation. Genome Biology, 19(1). Bufu, T., Di, X., Yilin, Z., Gege, L., Xi, C., & Ling, W. (2018). Celastrol inhibits colorectal cancer cell proliferation and migration through suppression of MMP3 and MMP7 by the PI3K/AKT signaling pathway. Anti-Cancer Drugs, 29(6). Cong, Y., Liang, Y., Motamedchaboki, K., Huguet, R., Truong, T., Zhao, R., . . . Kelly, R. (2020). Improved Single-Cell Proteome Coverage Using Narrow-Bore Packed NanoLC Columns and Ultrasensitive Mass Spectrometry. Analytical Chemistry, 92(3). Dalerba, P., Kalisky, T., Sahoo, D., Rajendran, P., Rothenberg, M., Leyrat, A., . . . Quake, S. (2011). Single-cell dissection of transcriptional heterogeneity in human colon tumors. Nature Biotechnology, 29(12). Dou, M., Clair, G., Tsai, C., Xu, K., Chrisler, W., Sontag, R., . . . Zhu, Y. (2019). High-Throughput Single Cell Proteomics Enabled by Multiplex Isobaric Labeling in a Nanodroplet Sample Preparation Platform. Analytical Chemistry, 91(20). Dünkler, A., Rösler, R., Kestler, H., Moreno-Andrés, D., & Johnsson, N. (2015). Spliff: A single-cell method to map protein-protein interactions in time and space. Esantis A. Dünkler, R. Rösler, H. Kestler, D. Moreno-Andrés, & N. Johnsson, Methods in Molecular Biology (T. 1346). Fahlman, R., Chen, W., & Overall, C. (2014). Absolute proteomic quantification of the activity state of proteases and proteolytic cleavages using proteolytic signature peptides and isobaric tags. Journal of Proteomics, 100. Fanjul-Fernández, M., Folgueras, A., Cabrera, S., & López-Otín, C. (2010). Matrix metalloproteinases: Evolution, gene regulation and functional analysis in mouse models. Biochimica et Biophysica Acta - Molecular Cell Research, 1803(1). Fortelny, N., Cox, J., Kappelhoff, R., Starr, A., Lange, P., Pavlidis, P., & Overall, C. (2014). Network Analyses Reveal Pervasive Functional Regulation Between Proteases in the Human Protease Web. PLoS Biology, 12(5). Garcia-Irigoyen, O., Carotti, S., Latasa, M., Uriarte, I., Fernández-Barrena, M., Elizalde, M., . . . Ávila, M. (2014). Matrix metalloproteinase-10 expression is induced during hepatic injury and plays a fundamental role in liver tissue repair. Liver International, 34(7). Gill, S., & Parks, W. (2008). Metalloproteinases and their inhibitors: Regulators of wound healing. International Journal of Biochemistry and Cell Biology, 40(6-7). Gill, S., Kassim, S., Birkland, T., & Parks, W. (2010). Mouse models of MMP and TIMP function. Methods in molecular biology (Clifton, N.J.), 622. Gomez, H. (2020). How heterogeneity drives tumour growth: A computational study. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 378(2171). Gong, Y., Hart, E., Shchurin, A., & Hoover-Plow, J. (2008). Inflammatory macrophage migration requires MMP-9 activation by plasminogen in mice. Journal of Clinical Investigation, 118(9). Gurtner, G., Werner, S., Barrandon, Y., & Longaker, M. (2008). Wound repair and regeneration. Nature, 453(7193). Hadler-Olsen, E., Fadnes, B., Sylte, I., Uhlin-Hansen, L., & Winberg, J. (2011). Regulation of matrix metalloproteinase activity in health and disease. FEBS Journal, 278(1). Hinohara, K., & Polyak, K. (2019). Intratumoral Heterogeneity: More Than Just Mutations. Trends in Cell Biology, 29(7). Janetzki, S., & Rabin, R. (2015). Enzyme-linked immunospot (Elispot) for single-cell analysis. Esantis S. Janetzki, & R. Rabin, Methods in Molecular Biology (T. 1346). Justilien, V., Regala, R., Tseng, I., Walsh, M., Batra, J., Radisky, E., . . . Fields, A. (2012). Matrix metalloproteinase-10 is required for lung cancer stem cell maintenance, tumor initiation and metastatic potential. PLoS ONE, 7(4). Kassim, S., Gharib, S., Mecham, B., Birkland, T., Parks, W., & McGuire, J. (2007). Individual matrix metalloproteinases control distinct transcriptional responses in airway epithelial cells infected with Pseudomonas aeruginosa. Infection and Immunity, 75(12). Kelly, R., Zhu, Y., Liang, Y., Cong, Y., Piehowski, P., Dou, M., . . . Ansong, C. (2019). Single Cell Proteome Mapping of Tissue Heterogeneity Using Microfluidic Nanodroplet Sample Processing and Ultrasensitive LC-MS. Journal of biomolecular techniques : JBT, 30.

Novel proteomics technologies to decipher protease networks in cells and tissues 39

Kerkelä, E., Ala-aho, R., Jeskanen, L., Lohi, J., Grénman, R., M-Kähäri, V., & Saarialho-Kere, U. (2001). Differential patterns of stromelysin-2 (MMP-10) and MT1-MMP (MMP-14) expression in epithelial skin cancers. British Journal of Cancer, 84(5). Koller, F., Dozier, E., Nam, K., Swee, M., Birkland, T., Parks, W., & Fingleton, B. (2012). Lack of MMP10 exacerbates experimental colitis and promotes development of inflammation-associated colonic dysplasia. Laboratory Investigation, 92(12). Krampert, M., Bloch, W., Sasaki, T., Bugnon, P., Rülicke, T., Wolf, E., . . . Werner, S. (2004). Activities of the matrix metalloproteinase stromelysin-2 (MMP-10) in matrix degradation and keratinocyte organization in wounded skin. Molecular Biology of the Cell, 15(12). Liu, Y., Beyer, A., & Aebersold, R. (2016). On the Dependency of Cellular Protein Levels on mRNA Abundance. Cell, 165(3). Liu, Z., Li, N., Diaz, L., Shipley, J., Senior, R., & Werb, Z. (2005). Synergy between a plasminogen cascade and MMP-9 in autoimmune disease. Journal of Clinical Investigation, 115(4). Madlener, M., Mauch, C., Conca, W., Brauchle, M., Parks, W., & Werner, S. (1996). Regulation of the expression of stromelysin-2 by growth factors in keratinocytes: Implications for normal and impaired wound healing. Biochemical Journal, 320(2). Madlener, M., Parks, W., & Werner, S. (1998). Matrix metalloproteinases (MMPS) and their physiological inhibitors (TIMPs) are differentially expressed during excisional skin wound repair. Experimental Cell Research, 242(1). Madzharova, E., Kastl, P., Sabino, F., & auf dem Keller, U. (2019). Post-translational modification-dependent activity of matrix metalloproteinases. International Journal of Molecular Sciences, 20(12). Marusyk, A., & Polyak, K. (2010). Tumor heterogeneity: Causes and consequences. Biochimica et Biophysica Acta - Reviews on Cancer, 1805(1). Mason, S., & Joyce, J. (2011). Proteolytic networks in cancer. Trends in Cell Biology, 21(4). Mathew, R., Khanna, R., Kumar, R., Mathur, M., Shukla, N., & Ralhan, R. (2002). Stromelysin-2 overexpression in human esophageal squamous cell carcinoma: Potential clinical implications. Cancer Detection and Prevention, 26(3). McCawley, L., Wright, J., LaFleur, B., Crawford, H., & Matrisian, L. (2008). Keratinocyte expression of MMP3 enhances differentiation and prevents tumor establishment. American Journal of Pathology, 173(5). Moali, C., & Hulmes, D. (2009). Extracellular and cell surface proteases in wound healing: New players are still emerging. European Journal of Dermatology, 19(6). Muller, D., Breathnach, R., Engelmann, A., Millon, R., Bronner, G., Flesch, H., . . . Abecassis, J. (1991). Expression of -related metalloproteinase genes in human lung or head and neck tumours. International Journal of Cancer, 48(4). Nagase, H. (1995). Human stromelysins 1 and 2. Methods in Enzymology, 248(C). Nakamura, H., Fujii, Y., Ohuchi, E., Yamamoto, E., & Okada, Y. (1998). Activation of the precursor of human stromelysin 2 and its interactions with other matrix metalloproteinases. European Journal of Biochemistry, 253(1). O'Callaghan, J., Crosbie, D., Cassidy, P., Sherwood, J., Flügel-Koch, C., Lütjen-Drecoll, E., . . . Humphries, P. (2017). Therapeutic potential of AAV-mediated MMP-3 secretion from corneal endothelium in treating glaucoma. Human Molecular Genetics, 26(7). Orbe, J., Barrenetxe, J., Rodriguez, J., Vivien, D., Orset, C., Parks, W., . . . Páramo, J. (2011). Matrix metalloproteinase- 10 effectively reduces infarct size in experimental stroke by enhancing fibrinolysis via a - activatable fibrinolysis inhibitor-mediated mechanism. Circulation, 124(25). Overall, C., & López-Otín, C. (2002). Strategies for MMP inhibition in cancer: Innovations for the post-trial era. Nature Reviews Cancer, 2(9). Page-McCaw, A., Ewald, A., & Werb, Z. (2007). Matrix metalloproteinases and the regulation of tissue remodelling. Nature Reviews Molecular Cell Biology, 8(3). Parks, W., Wilson, C., & López-Boado, Y. (2004). Matrix metalloproteinases as modulators of inflammation and innate immunity. Nature Reviews Immunology, 4(8). Puente, X., Sánchez, L., Overall, C., & López-Otín, C. (2003). Human and mouse proteases: A comparative genomic approach. Nature Reviews Genetics, 4(7). Quadri, S. (2015). Single-cell western blotting. Methods in molecular biology (Clifton, N.J.), 1312. Rawlings, N., & Salvesen, G. (2013). Handbook of Proteolytic Enzymes (Tom. 1-3). Regala, R., Justilien, V., Walsh, M., Weems, C., Khoor, A., Murray, N., & Fields, A. (2011). Matrix metalloproteinase-10 promotes Kras-mediated bronchio-alveolar stem cell expansion and lung cancer formation. PLoS ONE, 6(10). Reinke, J., & Sorg, H. (2012). Wound repair and regeneration. European Surgical Research, 49(1).

40 Novel proteomics technologies to decipher protease networks in cells and tissues Rozenblatt-Rosen, O., Stubbington, M., Regev, A., & Teichmann, S. (2017). The Human Cell Atlas: From vision to reality. Nature, 550(7677). Saarialho-Kere, U., Pentland, A., Birkedal-Hansen, H., Parks, W., & Welgus, H. (1994). Distinct populations of basal keratinocytes express stromelysin-1 and stromelysin-2 in chronic wounds. Journal of Clinical Investigation, 94(1). Schäfer, M., & Werner, S. (2008). Cancer as an overhealing wound: An old hypothesis revisited. Nature Reviews Molecular Cell Biology, 9(8). Schlage, P., Kockmann, T., Sabino, F., Kizhakkedathu, J., & Auf Dem Keller, U. (2015). Matrix metalloproteinase 10 degradomics in keratinocytes and epidermal tissue identifies bioactive substrates with pleiotropic functions. Molecular and Cellular Proteomics, 14(12). Shannon, P., Markiel, A., Ozier, O., Baliga, N., Wang, J., Ramage, D., . . . Ideker, T. (2003). Cytoscape: A software Environment for integrated models of biomolecular interaction networks. Genome Research, 13(11). Specht, H., Emmott, E., Koller, T., & Slavov, N. (2019). High-throughput single-cell proteomics quantifies the emergence of macrophage heterogeneity. bioRxiv. Toriseva, M., & Kähäri, V. (2009). Proteinases in cutaneous wound healing. Cellular and Molecular Life Sciences, 66(2). Tsai, C., Zhao, R., Williams, S., Moore, R., Schultz, K., Chrisler, W., . . . Liu, T. (2020). An Improved Boosting to Amplify Signal with Isobaric Labeling (iBASIL) Strategy for Precise Quantitative Single-cell Proteomics. Molecular and Cellular Proteomics, 19(5). Turk, B., Turk, D., & Turk, V. (2012). Protease signalling: The cutting edge. EMBO Journal, 31(7). Van Den Steen, P., Dubois, B., Nelissen, I., Rudd, P., Dwek, R., & Opdenakker, G. (2002). Biochemistry and molecular biology of gelatinase B or matrix metalloproteinase-9 (MMP-9). Critical Reviews in Biochemistry and Molecular Biology, 37(6). Vandooren, J., Geurts, N., Martens, E., Van Den Steen, P., & Opdenakker, G. (2013). Zymography methods for visualizing hydrolytic enzymes. Nature Methods, 10(3). Vandooren, J., Van Den Steen, P., & Opdenakker, G. (2013). Biochemistry and molecular biology of gelatinase B or matrix metalloproteinase-9 (MMP-9): The next decade. Critical Reviews in Biochemistry and Molecular Biology, 48(3). Wu, R., Pai, A., Liu, L., Xing, S., & Lu, Y. (2020). NanoTPOT: Enhanced Sample Preparation for Quantitative Nanoproteomic Analysis. Analytical Chemistry, 92(9). Xu, X., Hou, Y., Yin, X., Bao, L., Tang, A., Song, L., . . . Wang, J. (2012). Single-cell exome sequencing reveals single- nucleotide mutation characteristics of a kidney tumor. Cell, 148(5). Yu, Q., Xiao, H., Jedrychowski, M., Schweppe, D., Navarrete-Perea, J., Knott, J., . . . Gygi, S. (2020). Sample multiplexing for targeted pathway proteomics in aging mice. Proceedings of the National Academy of Sciences of the United States of America, 117(18). Zahreddine, H., & Borden, K. (2013). Mechanisms and insights into drug resistance in cancer. Frontiers in Pharmacology, 4 MAR. Zhu, Y., Scheibinger, M., Ellwanger, D., Krey, J., Choi, D., Kelly, R., . . . Barr-Gillespie, P. (2019). Single-cell proteomics reveals changes in expression during hair-cell development. eLife, 8.

Novel proteomics technologies to decipher protease networks in cells and tissues 41 2. Research Objectives

Proteases contribute to tissue homeostasis, modulate complex responses to environmental factors and play pivotal roles in disease. However, they do not act alone but in complicated networks to finally process bioactive target proteins whose activity is irreversibly modified. An excellent example forming such a network are matrix metalloproteinases (MMPs) that contribute to skin homeostasis, cutaneous wound repair, and skin carcinogenesis. Thereby, they are secreted by migrating keratinocytes and fibroblasts and build an interdependent activation network that has been poorly understood. Hence, the biological aims of this thesis were to exploit next-generation proteomics technologies to map this MMP activation network and its interconnections in the skin on the molecular, cellular and tissue level. The MMP network is part of the overall network of interacting proteases and their substrates termed the protease web. Therefore, targeted analysis of such a sub-network might miss important events that would require separate analyses but might be prevented by limited sample amounts. Thus, we aimed at exploiting the speed and sensitivity of newest generation instruments to optimally combine biased and unbiased analyses. A second technological aim addressed the emerging importance of single-cell proteome analysis to understand cellular heterogeneity in complex tissues and its extension into the areas of discovery and targeted degradomics.

Biological aims: • establish targeted proteomics assays for pro- and active MMPs to test for abundance and activity of multiple MMPs in complex samples • apply MMP assays to map MMP10-dependent MMP activation networks in fibroblast and keratinocyte secretomes • monitor interconnected MMP activation and test the hypothesis that MMP10 is a local activator of other MMPs in hyperproliferative epidermis of skin wounds

Technological aims: • establish a workflow for concomitant targeted analysis of a protease sub-network and unbiased assessment of the surrounding proteomes and degradomes • design and create single-cell proteomics workflows to ultimately deconvolute protease networks in heterogeneous cell populations

42 Novel proteomics technologies to decipher protease networks in cells and tissues 3. Mapping the N-Terminome in Tissue Biopsies by PCT- TAILS – Manuscript 3

In order to analyze the proteome of biological samples using mass spectrometry-based proteomics the proteins must be extracted from the sample. Common samples analyzed by MS include muscle tissue, liver, cartilage or skin biopsies or even whole tumors. Depending on the origin of the sample/tissue the starting material can be very low and therefore protein extraction must be as efficient as possible. Classical mechanical protein extraction methods (i.e. Ultra- Turrax homogenization) have the major shortcoming that they require high amounts of starting material and only yield a low concentration of extracted proteins. In this protocol, we have adapted a protein extraction strategy based on Pressure Cycling Technology (PCT) for discovery and targeted degradomics. This method only requires as little as 2-3mg of starting tissue and extracts up to 20% protein (weight/weight) of the total sample weight.

This method was used in this thesis for processing mouse tissue, which was analyzed by discovery and targeted degradomics methods as described in manuscripts 4 and 5.

Bundgaard L, Savickas S, Auf dem Keller U. Mapping the N-Terminome in Tissue Biopsies by PCT-TAILS. Methods Mol Biol. 2020;2043:285-296. doi:10.1007/978-1-4939-9698-8_24

Novel proteomics technologies to decipher protease networks in cells and tissues 43 Chapter 24

Mapping the N-Terminome in Tissue Biopsies by PCT-TAILS

Louise Bundgaard, Simonas Savickas, and Ulrich auf dem Keller

Abstract

Proteases play pivotal roles in multiple biological processes in all living organisms and are tightly regulated under normal conditions, but alterations in the proteolytic system and uncontrolled protease activity result in multiple pathological conditions. A disease will most often be defined by an ensemble of cleavage events—a proteolytic signature, thus the system-wide study of protease substrates has gained significant attention and identification of disease specific clusters of protease substrates holds great promise as targets for diagnostics and therapy. In this chapter we describe a method that enables fast and reproducible analysis of protease substrates and proteolytic products in an amount of tissue less than the quantity obtained by a standard biopsy. The method combines tissue disruption and protein extraction by pressure cycling technology (PCT), N-terminal enrichment by tandem mass tag (TMT)-terminal amine isotopic labeling of substrates (TAILS), peptide analysis by mass spectrometry (MS), and a general pipeline for interpretation of the data.

Key words Proteolysis, Degradomics, Protease, Pressure cycling technology, PCT-TAILS, Tissue biopsies, Proteomics

1 Introduction

Proteases play pivotal roles in multiple biological processes in all living organisms [1], and with almost 600 proteases identified in the human genome so far, proteases constitute the largest enzyme family in humans [2]. Proteases catalyze a fundamental irreversible posttranslational modification in target substrates, and the cleavage products act as effector molecules that propagate, amplify, restrict, or dampen signals with direct impact on protein activity and func- tion [3]. Protease activities are tightly regulated under normal conditions, but uncontrolled protease activity leads to altered spa- tial and temporal control of substrate cleavage, and alterations in the proteolytic system result in multiple pathological conditions, such as impaired wound healing [4], cancer [5], neurodegenerative [6], cardiovascular [7], and gastrointestinal diseases [8]. As an example, proteases including ADAMTSs are involved in all phases of healing [4], and altered protease activity is often implicated in

Suneel S. Apte (ed.), ADAMTS Proteases: Methods and Protocols, Methods in Molecular Biology, vol. 2043, https://doi.org/10.1007/978-1-4939-9698-8_24, © Springer Science+Business Media, LLC, part of Springer Nature 2020 285 286 Louise Bundgaard et al.

healing disorders of the skin. The matrix metalloproteinases MMP1, MMP2, MMP8, and MMP9 were found upregulated and TIMP2 downregulated in chronic diabetic foot ulcers compared with healing wounds in normal patients, suggesting that the increased proteolytic environment contributes to the failure of diabetic wounds to heal [9]. In a study of skin inflammation it was demonstrated that loss of MMP2 could perturb the proteolytic signaling network and enhance inflammation [10], and loss of meprin α and β caused significantly reduced collagen I deposition in skin indicating implication of dysfunctional meprin α and β in connective tissue disorders [11]. In most cases, a disease will not be defined by a single proteo- lytic event nor by aberrant activity of a single protease, but by an ensemble of cleavage events—a proteolytic signature. Thus, in recent years the system-wide study of protease substrates has gained significant attention [12], and identification of disease specific clus- ters of protease substrates holds great promise as targets for diag- nostics and therapy. Here, we describe a method that enables fast and reproducible analysis of protease substrates and proteolytic products in an amount of tissue less than the quantity obtained by a standard biopsy. The method combines tissue disruption by pressure cycling technology (PCT), tandem mass tag (TMT)-terminal amine isoto- pic labeling of substrates (TAILS), and mass spectrometry (MS) [13] (Fig. 1). With availability of many mouse ADAMTS knockout models, such analysis may permit identification of ADAMTS sub- strates directly from tissues. In the first part of the procedure, proteins are extracted from the tissue samples and denatured by use of mechanical tissue dis- ruption and lysis buffer containing denaturant. Efficient disruption of the tissue is required to ensure high yields of protein, and various mechanical methods are currently commercially available [14]. In this protocol, we use PCT, which is a novel technology taking advantage of matrix and cell disruption by alternating hydrostatic pressure. The high pressure employed in PCT forces water into the inner core of the protein which assists in denaturation [15]. PCT is a fast and reproducible method and a central advantage is its capa- bility to extract proteins from small amounts of tissue, thereby overcoming one of the frequently encountered bottlenecks in molecular analyses [13]. The reported applications for PCT are diverse and have been expanding, particularly for isolating biomo- lecules from tissues [16–19]. In the second part, all amines of free protein N termini in each sample are chemically labeled with an isobaric mass tag (in this protocol we use 6-plex tandem mass tag (TMT) labels). After combining the samples, the proteins are digested by a protease with known specificity. We usually use trypsin, which cleaves the proteins C-terminal to arginine and lysine. The generated peptides PCT-TAILS 287

HOMOGENIZE TISSUE SAMPLES 1h ON THE BAROCYCLER

S SH S S SH S 3h DENATURE REDUCE ALKYLATE

126 131 127 128 130 131 TMT LABELING 126 129 128 126 130 OF AND 128 131 AMINES 5h

126 126

127 128 POOL THE 128 SAMPLES 129 AND PRECIPITATE 130 PROTEINS 131 131 8h

129 129 128 128 128 128 0 0 13 127 13 127 1 ENZYMATIC 13 131 126 126 DIGEST 131 131 126 126 TRYPSIN 30h

PreTAILS SAMPLE

9 128 130 AMINE REACTIVE 12 131 6 POLYMER BINDS 12

INTERNAL TRYPTIC 128 26 48h PEPTIDES 1 131 127

RECOVERY OF 129 126 126 1 TAILS N-TERMINI BY 13 SAMPLE ULTRAFILTRATION 128 130 50h 127 131 128

TANDEM MASS SPECTROMETRY MS ANALYSIS ANALYSIS

Fig. 1 Overview of the PCT-TAILS workflow. Proteins are extracted from the tissue samples and denatured followed by reduction of disulfide bonds and 288 Louise Bundgaard et al.

are analyzed by MS, revealing the identity of the TMT-labeled proteins. The TMT labels allow multiplexed relative quantification measurements of the proteins in the sample, since the products of the proteolysis originate from different samples/conditions. The isobaric mass tags also bind to all lysine side chains, which are then skipped by trypsin cleavage. Internal tryptic peptides with labeled lysine residues aid in the relative quantification of the proteins in the sample. Every tryptic cleavage creates a novel reactive N terminus. As a key step of the procedure, an amine-reactive polymer is used to bind the trypsin-generated free N termini. The polymer is recov- ered by ultrafiltration, and the flow through provides a fraction enriched for the N termini labeled with the isobaric mass tags. In summary, TMT-TAILS effectively reduces the complexity of the peptide mixture and facilitates investigation and characteriza- tion of alterations of the protease cleavage products in diseases. Moreover, naturally modified N termini can be discriminated from N termini labeled with isobaric mass tags, allowing for determina- tion of acetylation and pyroglutamate formation [20]. The TAILS workflow was described for the first time in 2010 [20], and it was successfully used in several studies of protease cleavage products, including global assessment of the N-terminome in human and pig wound exudate [10, 21–24]. In the third part, the peptides are analyzed by liquid chroma- tography tandem mass spectrometry (LC-MS/MS). The mass spectrometer is operated in a data-dependent analysis (DDA) mode, meaning that the intensities of peptide ions are recorded in the precursor scan (MS1), and a defined number of high intensity peptide ions are selected for fragmentation and recording of the fragment ion spectra (MS2). The number of peptides to be frag- mented during MS analysis primarily depends on the speed of the instrument. During fragmentation the isobaric mass tag is detached from the labeled peptide (reporter ion). These reporter ions are visible as intensity peaks in the low m/z range of the MS2 spectrum. Reporter ion intensity is proportional to the abundance of the peptide in the sample of origin, and the reporter ions interrelated

ratio in the multiplex sample can be used to determine the relative ä

Fig. 1 (continued) alkylation of free thiols. The proteins in each sample are chemically labeled with an isobaric mass tag (6-plex TMT labels), combined and the proteins precipitated and digested with trypsin. An amine-reactive polymer is used to bind the newly generated free N termini and after ultrafiltration the flow through contains a peptide fraction enriched for the N termini labeled with the isobaric mass tags. The peptides prior to (preTAILS sample) and after (TAILS sample) N-terminal enrichment are analyzed by liquid chromatography tandem mass spectrometry to enable categorization of true protease cleavages and noncleavage events and to reveal the identity of the TMT-labeled proteins. An estimate of the hours needed for the individual steps is given on the timeline to the left PCT-TAILS 289

quantitative abundance. The remaining fragment ion peaks in the spectrum originate from the common peptide backbone and are used to identify the corresponding peptide. Based on the isobaric mass tag attached to the peptide, it can be determined from which labeled sample/condition it originates. After removing systematic biases from the quantitative data (normalization), statistical meth- ods are applied to map and compare the degradome from different conditions. In this protocol, we describe a general data analysis strategy for TAILS data using Thermo Scientific™ Proteome Dis- coverer v2.2 and the TAILS Annotator 1.0 script [25].

2 Materials

2.1 Sampling 1. Biopsy punch 2–4 mm. 2. Scalpel and forceps. 3. 50 mL conical tubes. 4. Phosphate buffered saline pH 7.4 (PBS). 5. Cryogenic vials. 6. Liquid nitrogen.

2.2 PCT Protein 1. Barocycler (Pressure BioSciences Inc.). Extraction 2. Scalpel and forceps. and Denaturation 3. Glass plate. 4. Microbalance. 5. Ultrasonic bath. 6. 1.5 mL microcentrifuge tubes. 7. Microtubes for PCT (Pressure BioSciences Inc.). 8. PCT-MicroPestles (30 μL) (Pressure BioSciences Inc.) 9. Capper tool for MicroPestles (Pressure BioSciences Inc.). 10. Lysis buffer: 4 M guanidine hydrochloride (GuHCl), 250 mM 4-(2-hydroxyethyl)-1-piperazineethanesulfonic acid (HEPES) pH 7.8—(Optional: supplemented with Roche cOmplete inhibitor cocktail and PhosSTOP phosphatase inhibitor cock- tail (Roche)). 11. 200 mM tris(2-carboxyethyl)phosphine hydrochloride (TCEP) dissolved in 50 mM HEPES. 12. 400 mM chloroacetamide (CAA) 400 mM dissolved in 50 mM HEPES.

2.3 TMT-TAILS 1. 250 mM HEPES, pH 7.8. 2. Bradford protein assay (Bio-Rad). 3. Heating block. 290 Louise Bundgaard et al.

4. 6 plex TMT (Thermo Fisher Scientific). 5. 1 M HEPES, pH 7.8. 6. Dimethyl sulfoxide (DMSO).

7. 1 M ammonium bicarbonate (NH4HCO4). 8. 15 mL conical tube. 9. Acetone (20 C). 10. Methanol (MeOH) (20 C). 11. 100 mM sodium hydroxide (NaOH). 12. Trypsin 1 μg/μL. 13. 1 M hydrochloric acid (HCl) in water. 14. Hyperbranched polyglycerol-aldehydes (HPG-ALD) polymer. Store polymer aliquots under inert gas at 80 C. Polymer is available without commercial or company restriction from Flintbox Innovation Network, The Global Intellectual Exchange and Innovation Network (https://www.flintbox. com/public/project/1948/). 15. 5 M sodium cyanoborohydride (NaBH3CN) in 1 M NaOH. 16. Amicon Ultra-0.5 Centrifugal Filter Units, 30 kDa molecular weight cutoff (MWCO) (Merck Millipore).

2.4 Desalting 1. Vacuum centrifuge. and C18 Cleanup 2. 19 gauge needle without bevel. 3. Plain P200 pipette tips. 4. C18 Empore disk (Sigma-Aldrich). 5. MeOH. 6. Activation and elution buffer: 80% acetonitrile (ACN), 0.1% formic acid (FA). 7. Equilibrium buffer: 3% ACN, 1% trifluoroacetic acid (TFA). 8. Wash buffer: 0.1% FA. 9. MS sample buffer: 2% ACN, 1% TFA.

2.5 LC-MS/MS 1. C18 chromatography system (e.g., Thermo ScientificTM Easy- Analysis nLC1000). 2. Solvent A (0.1% FA). 3. Solvent B (80% ACN, 0.1% FA). 4. Hybrid mass spectrometer (e.g., Thermo ScientificTM Q ExactiveTM).

2.6 Data Analysis 1. Thermo Scientific™ Proteome DiscovererTM software v2.2. 2. TAILS Annotator 1.0 (http://clipserve.clip.ubc.ca/tails/). PCT-TAILS 291

3 Methods

3.1 Sampling 1. Use the punch biopsy to harvest the tissue of interest. 2. Wash the tissue samples with PBS until they are free of blood contamination. 3. Immediately, transfer the tissue to a cryogenic tube and snap freeze in liquid nitrogen.

3.2 PCT Tissue 1. Weigh ~2 mg wet weight tissue (see Notes 1 and 2) and transfer Homogenization the tissue to a PCT microtube (see Note 3). and Protein 2. Place the PCT microtubes in 1.5 mL microcentrifuge tubes to Denaturation ease handling. 3. Add 30 μL of Lysis buffer (see Note 4). 4. Sonicate the vials 10 min in an ultrasonic bath. 5. Use the capper tool to cap the PCT microtubes with micro- pestles, ensuring that the tissue is underneath the micropestles. 6. Load the capped PCT microtubes into the Barocycler cartridge evenly distributed in each barrel and place the cartridge in the Barocycler. 7. Homogenize the samples on the Barocycler. Settings on the Barocycler for homogenization of tissue will depend on the type and size of the sample. Our data was obtained with the following settings: 60 cycles at 33 C with 45,000 psi for 20 s and no pressure for 10 s. 8. Release the micropestles and add 1:20 (v/v) 200 mM TCEP. 9. Place the PCT microtubes in 1.5 mL microcentrifuge tubes and incubate the samples on a shaker at 37 C for 1 h at 300 rpm. 10. Add 1:10 (v/v) of 400 mM CAA and incubate the samples on a shaker at RT for 30 min at 600 rpm.

3.3 TMT-TAILS 1. Quick spin the PCT microtubes upside down in a 1.5 mL protein low bind microcentrifuge tube to transfer the sample to the microcentrifuge tube. 2. Add 250 mM HEPES to obtain a final concentration of 2.5 M GuHCl (see Note 5). 3. Centrifuge samples at RT for 10 min at 13,000 g. 4. Transfer the supernatant to a new 1.5 mL protein low bind microcentrifuge tube. 5. Take an aliquot and determine the protein concentration using the Bradford protein assay. 6. Add 2.5 M GuHCl, 250 mM HEPES, pH 7.8 to dilute the samples to a protein concentration of 1 μg/μL. 292 Louise Bundgaard et al.

7. Transfer 50 μL sample (¼50 μg protein) to a clean microcentrifuge tube. 8. Dissolve each vial of TMT (0.2 mg) in DMSO in a volume equivalent to the volume of sample (50 μL) and mix by pipet- ting (see Note 6). 9. Differentially label proteins by adding TMT reagents to the samples in a protein to TMT weight ratio of 1:4 (w/w) and a final DMSO concentration of 50%. 10. Mix by pipetting and incubate for 60 min at RT (see Note 7).

11. To quench the labeling reaction add 1 M NH4HCO3 to a final concentration of 100 mM NH4HCO3, vortex, and incubate for 30 min at RT. 12. Combine samples labeled with different TMT reagents in a 15 mL conical tube and mix by vortexing. 13. To clean up the proteome precipitate proteins with 6–8 sample volumes of ice cold acetone and 1 sample volume of ice cold methanol followed by incubation for at least 2 h at 80 C. 14. Centrifuge sample at 10,000 g for 20 min at 4 C. 15. Carefully discard the supernatant and resuspend the pellet in 5 mL ice cold MeOH. 16. Centrifuge sample at 10,000 g for 10 min at 4 C, discard the supernatant, and air-dry the pellet with the tube upside down. 17. Resuspend the pellet in 100 mM NaOH in a protein to NaOH ratio of 10:1 (w/v). 18. Adjust the protein concentration to 1 μg/μL (assuming no protein loss) and 100 mM HEPES, pH 7.8 by adding an appropriate amount of 1 M HEPES, pH 7.8 and ddH2O. 19. Digest the proteins overnight with trypsin at a ratio of 1:100 trypsin to protein (w/w) at 37 C. 20. Take 10% of the peptide solution and store at 20 C (pre- TAILS sample) (see Note 8). 21. Adjust the pH of the remaining peptide solution to pH 6–7 with 2 M HCl. 22. Add a fourfold excess (w/w) of HPG-ALD polymer (see Note 9) and 5 M NaBH3CN to a final concentration of 50 mM NaBH3CN and incubate overnight at 37 C. Avoid vortexing the polymer or sample with polymer. 23. Condition a 30 kDa MWCO Amicon Ultra 0.5 mL Centrifugal Filter Unit with 400 μL ddH2O. 24. Add the polymer solution to the filter and centrifuge at 10,000 g for 10 min at RT to recover unbound peptides in the flow through. PCT-TAILS 293

25. Wash the polymer by adding 30 μL of 100 mM NaOH to the filter, pipet gently, and centrifuge at 10,000 g for 10 min at RT. 26. Combine the flow through from steps 24 and 25 (TAILS sample) and store at 20 C in case you do not proceed directly to C18 clean up.

3.4 Desalting 1. Prepare a C18 column by stamping out two pieces of a C18 and C18 Cleanup Empore disk by use of a blunt 19 gauge needle without bevel and mount them in a P200 pipette tip. 2. Wet the column with 40 μL of MeOH and subsequently 40 μL of activation buffer. 3. Equilibrate the column with 2 40 μL equilibration buffer. 4. Load a fraction of the peptide solution. 5. Wash the column with 2 40 μL wash buffer. 6. Elute peptides with 3 40 μL elution buffer into a clean protein low bind microcentrifuge tube. 7. Dry the samples completely with a SpeedVac concentrator and resuspend in MS sample buffer.

3.5 LC-MS/MS Analyze the preTAILS and TAILS samples by tandem mass spec- trometry on a hybrid mass spectrometer coupled in line with a C18 chromatography system. The settings will depend on the instru- ments used. This is an example of the settings we generally apply to a Thermo ScientificTM Q ExactiveTM coupled in line to a Thermo ScientificTM Easy-nLC1000 with a 2 cm 75 μm, AcclaimTM PepMapTM 100 column trap column (packed with 3 μm, C18 beads) and a 50 cm 75 μm, PepMap™ RSLC analytical column (packed with 2 μm C18 beads): 1. The Thermo ScientificTM Easy-nLC1000 is operated with a flow rate of 250 nL/min and a gradient from 6% to 60% solvent B in 125 min (From 6% to 23% in 85 min; from 23% to 38% in 30 min; from 38% to 60% in 10 min), followed by a wash step from 60% to 95% in 5 min and cleaning of the columns with 95% solvent B for 10 min. 2. The Thermo ScientificTM Q ExactiveTM is operated in a DDA mode for 140 min; MS1 resolution at 70.000; automatic gain control (AGC) target set to 3e6; maximum injection time (IT) set to 20 ms; scan range 300–1750 m/z; selecting top 10 for MS2 analysis; MS2 resolution set to 17.500; AGC target 1e6; maximum IT 60 ms, isolation window of 1.6 m/z, nor- malized collision energy at 28. 294 Louise Bundgaard et al.

3.6 Data Analysis Several strategies and software packages for N-terminomics data analysis have been developed that can be applied to interpret the recorded dataset [26, 27]. Challenges in TMT-TAILS are the identification of semitryptic peptides requiring a peptide-centric rather than a protein-centric approach, their reliable relative quan- tification with help of the of TMT reporter ions and their classifica- tion as natural protein N termini or protease generated neo-N termini. In our experience, this can be achieved by using the Thermo Scientific™ Proteome DiscovererTM v2.2 software pack- age and subsequent annotation with help of the TAILS Annotator 1.0 script that can be customized, for example, to use alternative types of protein sequence databases. 1. Process .raw files from preTAILS and TAILS samples in a combined analysis (see Note 10) with Proteome DiscovererTM v2.2 applying a standard TMT quantification protocol with appropriate settings for multiplexing and peptide modifications (see Note 11). 2. Extract N-terminal peptides from exported output data and add extended annotation using TAILS Annotator 1.0 (see Note 12).

4 Notes

1. Cut the tissue on a glass plate to avoid contamination with plastic polymers that might interfere with MS analysis. 2. Some tissue biopsies contain different tissue layers. If you aim for a quantitative study, you have to carefully weigh tissue with the same proportionate distribution of the different layers. Data should be interpreted with care. 3. In this protocol, we use PCT for tissue disruption, but various mechanical methods are currently available and will most likely be compatible with this method. The overall goal is efficient disruption of the tissue to ensure high yields of protein. The TMT-TAILS protocol presented here is optimized for 50 μg protein. In our opinion, PCT is a fast and reproducible method, and a central advantage is its capability to extract proteins from small amounts of tissue. 4. If you wish to deviate from the TAILS sample buffer, consider the following: GnHCl assures complete protein denaturation prior to protein labeling. We have not used buffers without chaotropic salts for TAILS and do not recommend this. In case you replace HEPES by a different buffering agent, make sure that your buffer of choice is devoid of free amine groups, since these would interfere with the labeling reaction, thereby lead- ing to partial labeling. PCT-TAILS 295

5. Be aware that you might recover different volumes of superna- tant from each sample. For optimal results, you should measure the recovered volume in each sample and adjust the concentra- tion of GuHCl to 2.5 M based on the exact volume. 6. We have successfully used 0.2 mg TMT for labeling. To prepare the 0.2 mg TMT aliquots we resuspend the content of one TMT vial (0.8 mg) in 800 μL 100% ACN, divide it equally into four aliquots, and vacuum dry the aliquots completely (avoid overdrying). 7. The 60 min of labeling should be counted precisely for each condition to reduce quantification errors derived from differ- ent labeling times. 8. Most internal tryptic peptides will have isobaric mass tag labeled lysine residues and the preTAILS sample can be used for relative quantification at the protein level. 9. The concentration of the HPG-ALD may vary with the specific batch. Information is provided on the package insert. 10. Proteome Discoverer v.2.2 workflow files for analysis of TMT-TAILS datasets acquired with the described settings are available upon request. 11. Combining files from preTAILS and TAILS analyses is recom- mended, since it will improve statistical models for secondary validation of peptide spectrum matches. 12. TAILS Annotator takes a list of peptides in a format of ‘X. PEPTIDESEQUENCE.X’ as input and adds annotation as outlined in the manual.

Acknowledgments

We thank Erwin Schoof and Lene Holberg Blicher for continuous support in operating the DTU Proteomics Core. This work was supported by a Novo Nordisk Foundation Young Investigator Award (NNF16OC0020670) and a grant from the Swiss National Science Foundation (31003A_163216) to U.a.d.K.

References

1. Lo´pez-Otı´n C, Bond JS (2008) Proteases: regulation between proteases in the human multifunctional enzymes in life and disease. J protease web. PLoS Biol 12:e1001869 Biol Chem 283:30433–30437 4. McCarty SM, Percival SL (2013) Proteases and 2. Pe´rez-Silva JG, Espan˜ol Y, Velasco G et al delayed wound healing. Adv Wound Care (2016) The degradome database: expanding 2:438–447 roles of mammalian proteases in life and dis- 5. Breznik B, Motaln H, Lah Turnsˇek T (2017) ease. Nucleic Acids Res 44:D351–D355 Proteases and cytokines as mediators of inter- 3. Fortelny N, Cox JH, Kappelhoff R et al (2014) actions between cancer and stromal cells in Network analyses reveal pervasive functional tumours. Biol Chem 398:709–719 296 Louise Bundgaard et al.

6. De Stefano ME, Herrero MT (2017) The mul- tissue sample preparation for 2-DE. Electro- tifaceted role of metalloproteinases in physio- phoresis 28:1022–1024 logical and pathological conditions in 18. Zhu Y, Guo T (2017) High-throughput pro- embryonic and adult brains. Prog Neurobiol teomic analysis of fresh-frozen biopsy tissue 155:36–56 samples using pressure cycling technology cou- 7. Weiss-Sadan T, Gotsman I, Blum G (2017) pled with SWATH mass spectrometry. Meth- Cysteine proteases in atherosclerosis. FEBS J ods Mol Biol 1788:279–287 284:1455–1472 19. Shao S, Guo T, Gross V et al (2016) Reproduc- 8. Edgington-Mitchell LE (2016) Pathophysio- ible tissue homogenization and protein extrac- logical roles of proteases in gastrointestinal dis- tion for quantitative proteomics using ease. Am J Physiol Liver Physiol 310: micropestle-assisted pressure-cycling technol- G234–G239 ogy. J Proteome Res 15:1821–1829 9. Lobmann R, Ambrosch A, Schultz G et al 20. Kleifeld O, Doucet A, auf dem Keller U et al (2002) Expression of matrix- (2010) Isotopic labeling of terminal amines in metalloproteinases and their inhibitors in the complex samples identifies protein N-termini wounds of diabetic and non-diabetic patients. and protease cleavage products. Nat Biotech- Diabetologia 45:1011–1016 nol 28:281–288 10. auf dem Keller U, Prudova A, Eckhard U et al 21. Schlage P, Egli FE, Nanni P et al (2014) Time- (2013) Systems-level analysis of proteolytic resolved analysis of the matrix metalloprotei- events in increased vascular permeability and nase 10 substrate degradome. Mol Cell Prote- complement activation in skin inflammation. omics 13:580–593 Sci Signal. https://doi.org/10.1126/ 22. Sabino F, Hermes O, Egli FE et al (2015) In scisignal.2003512 vivo assessment of protease dynamics in cuta- 11. Broder C, Arnold P, Vadon-Le Goff S et al neous wound healing by degradomics analysis (2013) Metalloproteases meprin and meprin of porcine wound exudates. Mol Cell Proteo- are C- and N-procollagen proteinases impor- mics 14:354–370 tant for collagen assembly and tensile strength. 23. Schlage P, Kockmann T, Sabino F et al (2015) Proc Natl Acad Sci 110:14219–14224 Matrix metalloproteinase 10 degradomics in 12. Huesgen PF, Lange PF, Overall CM (2014) keratinocytes and epidermal tissue identifies Ensembles of protein termini and specific pro- bioactive substrates with pleiotropic functions. teolytic signatures as candidate biomarkers of Mol Cell Proteomics 14:3234–3246 disease. Proteomics Clin Appl 8:338–350 24. Sabino F, Egli FE, Savickas S et al (2018) 13. Guo T, Kouvonen P, Koh CC et al (2015) Comparative degradomics of porcine and Rapid mass spectrometric conversion of tissue human wound exudates unravels biomarker biopsy samples into permanent quantitative candidates for assessment of wound healing digital proteome maps. Nat Med 21:407–413 progression in trauma patients. J Invest Der- 14. Goldberg S (2015) Mechanical/physical meth- matol 138:413–422 ods of cell distribution and tissue homogeniza- 25. Kleifeld O, Doucet A, Prudova A et al (2011) tion. Methods Mol Biol 1295:1–20 Identifying and quantifying proteolytic events 15. Balny C, Masson P, Heremans K (2002) High and the natural N terminome by terminal pressure effects on biological macromolecules: amine isotopic labeling of substrates. Nat Pro- from structural changes to alteration of cellular toc 6:1578–1611 processes. Biochim Biophys Acta 1595:3–10 26. auf dem Keller U, Overall CM (2012) CLIP- 16. Gross V, Carlson G, Kwan AT et al (2008) PER: an add-on to the trans-proteomic pipe- Tissue fractionation by hydrostatic pressure line for the automated analysis of TAILS cycling technology: the unified sample prepara- N-terminomics data. Biol Chem tion technique for systems biology studies. J 393:1477–1483 Biomol Tech 19:189–199 27. Schlage P, Egli FE, auf dem Keller U (2017) 17. Ringham H, Bell RL, Smejkal GB et al (2007) Time-resolved analysis of matrix metalloprotei- Application of pressure cycling technology to nase substrates in complex samples. Methods Mol Biol 1579:185–198 4. Mapping matrix metalloproteinase networks in the skin - Manuscript 4

During all phases of cutaneous wound healing multiple proteolytic cascades, pathways and networks interact in orchestrated processes that finally lead to restoration of the injured skin and its barrier function after injury. Among them, matrix metalloproteinases (MMPs) interface with several other proteolytic systems to aid in immune cell recruitment, re-epithelialization and remodeling of the scar tissue. Uncontrolled MMP activity has been associated with chronic, non- healing wounds and cancer. However, the complex regulation of these proteases has been only partially unraveled, the consequences of many substrate cleavages remain elusive and many target proteins are still unknown. In the following manuscript, we have developed novel highly specific and sensitive parallel reaction monitoring (PRM) assays to monitor MMPs in complex samples. To concomitantly analyze the MMP activation status, we have specifically targeted peptides that are cleaved during propeptide removal, thus resulting in activation of the zymogens. These assays have been applied to study interconnected MMP activation on the molecular, cellular and tissue level. We have tested the hypothesis that MMP10 and MMP3 act as activator MMPs of several other family members and mapped and functionally tested MMP3/10-dependent MMP activation in culture supernatants from fibroblasts and keratinocytes. Mouse models with normal and enhanced MMP10 activity have been analyzed and consequences on activation of other MMPs assessed. These studies will provide the scientific community with highly sensitive PRM assays for analysis of MMPs and their activation. Our study of interconnected MMP activation has helped to gain a more detailed understanding of the robustness of the MMP network in skin tissue. It also demonstrates the modulation of the MMP network by the biological matrix with significant differences in biochemical assays and analyses at the tissue level.

56 Novel proteomics technologies to decipher protease networks in cells and tissues Mapping matrix metalloproteinase networks in the skin

Simonas Savickas1,2; Philipp Kastl1; Ulrich auf dem Keller1, 2*

Affiliations 1Technical University of Denmark, Department of Biotechnology and Biomedicine, Kongens Lyngby, Denmark Affiliations 2ETH Zurich, Department of Biology, Institute of Molecular Health Sciences, Zurich, Switzerland

*Corresponding author:

Ulrich auf dem Keller, email: [email protected]

Key words: matrix metalloproteinases / protease web / proteomics / parallel reaction monitoring

Novel proteomics technologies to decipher protease networks in cells and tissues 57 Abstract

Matrix metalloproteinases (MMPs) contribute to skin homeostasis, cutaneous wound repair, and skin carcinogenesis. Thereby, they do not only remodel the extracellular matrix, but they also play pivotal roles in immune cell recruitment, angiogenesis and epithelial cell proliferation. At the wound edge, MMPs are secreted by migrating keratinocytes and fibroblasts and form an interdependent zymogen activation network that has been poorly understood. In this study, we applied targeted proteomics to map this MMP activation network and its interconnections on the molecular, cellular and tissue level. By using recombinant mouse MMPs 2, 3, 7, 8, 9, 10 and 13 we have developed parallel reaction monitoring (PRM) assays for detection of these proteases in complex biological matrices with very high specificity and sensitivity. The power of these assays was uniquely increased by including discriminative features for latent and active MMPs that allow monitoring zymogen removal. Next, we exploited our newly established PRM assays to assess co-activation of MMPs in three different biological complexities: biochemical, in vitro and in vivo. We have generated a first model of interconnected MMP activation at the molecular and tissue level. In agreement with previous biochemical studies, we observed activation of recombinant MMPs 7 and 13 by MMP3 and MMP10, increased activation of MMP7 by MMP10 in murine fibroblasts and a significant increase in active MMP13 upon overexpression in MMP 10 in murine skin. These observations confirm functions of MMP3 and MMP10 as ‘activator’ MMPs in MMP activation networks.

58 Novel proteomics technologies to decipher protease networks in cells and tissues 4.1 Introduction

Complex cellular homeostasis is highly regulated by proteases and their interconnected activities. Proteases form one of the most abundant protein families in multicellular eukaryotic systems (López-Otín & Bond, 2008). Every cell has one or more strongly controlled proteolytic machineries that irrevocably modify its intra- and extracellular proteomes. According to the latest release of MEROPS, there are 68 families of human proteases and analogs (Rawlings, 2020). Metalloproteinases and serine proteases have the most family members with 194 and 176, respectively, compared to the families of cysteine, threonine and aspartate proteases. Historically, matrix metalloproteinases (MMPs) have been studied for their role in extracellular matrix (ECM) degradation and more recently they have also been considered in processing of growth factors, cell-surface receptors and cytokines as well as other proteases and protease inhibitors (Marchant, et al., 2014). Moreover, MMPs have been linked to several pathologies including chronic inflammatory diseases, vascular diseases, neurological disorders and skin disorders (Fanjul-Fernández, Folgueras, Cabrera, & López-Otín, 2010) (Rodríguez, Morrison, & Overall, 2010) (Krampert, et al., 2004). Secreted MMPs are activated by proteolytic zymogen activation and play a pivotal role in skin homeostasis, wound healing and tissue remodeling. For instance, results from several studies indicated that dysregulated proteolytic activities of MMP8 and MMP13 in wounded skin lead to delayed healing, altered keratinocyte migration and contraction (Balbín, et al., 2003) (Hattori, et al., 2009). Others have linked dysfunctional MMP1 and MMP14 to rapid stiffening of the skin and aging (Zigrino, et al., 2016) (Xia, et al., 2013), whereby increased activity of MMP10 resulted in scattered keratinocytes at the wound edge (Krampert et al. 2004). However, an integrative understanding of interconnected activities of MMPs in normal skin and their disturbance in disease is missing. Most studies on MMPs in the past have been performed using classical protein abundance assays, but in the last few years a series of technical improvements have advanced the analysis of inactive and active versions of the proteases in cells and tissues. In gel and in situ zymography assays, quenched-fluorescent probes, and the latest mass spectrometry-based degradomics techniques (Vandooren, Geurts, Martens, Van Den Steen, & Opdenakker, 2013) (Savickas, Kastl, & auf dem Keller, 2020) have increased throughput, sensitivity and specificity. The latest highly sensitive and selective mass spectrometry technology allows an in depth analysis of MMP degradation substrates (Schlage, Kockmann, Sabino, Kizhakkedathu, & Auf Dem Keller, 2015) (Kleifeld, et al., 2011) and can reveal a multitude of interaction partners (Schlage et al. 2015) (Auf Dem Keller, Prudova, Eckhard, Fingleton, & Overall, 2013). With help of complex degradomics datasets and literature data compiled from the MEROPS database the protease web has been established that shows an interactive network of MMP activation (Fortelny, et al., 2014). The MMP network reveals a high degree of interconnectivity, which

Novel proteomics technologies to decipher protease networks in cells and tissues 59 requires an adequate holistic tool to measure the lowly abundant proteases and their proteolytic interactions. In this study, we have designed parallel reaction monitoring (PRM) targeted proteomics assays to explore the MMP interaction network in secretomes from mouse skin keratinocytes and fibroblasts as well as in murine skin. Disturbances of the network were studied by including samples from animals genetically modified for MMP10 activity in basal keratinocytes (Krampert et al. 2004; Schlage et al. 2015). Our PRM assays were further extended to distinguish between MMP zymogens and N-terminally truncated and thus activated versions. We tested the skin MMP interaction network at three levels of biological complexity, direct interactions of recombinant proteases, cell-based keratinocyte and fibroblast degradation dynamics, and in vivo proteolytic processing. Each level of complexity yielded a different outcome of expressed proteases and different degrees of activation. Overall, we see a highly interdependent network of MMPs in murine skin, their activity fluxes and changes to the network upon shifts in activity of an individual member of the network.

60 Novel proteomics technologies to decipher protease networks in cells and tissues 4.2 Methods 4.2.1 Recombinant proteins used for mass spectrum library generation. The following recombinant proteins were used to create a spectral library for active and inactive versions of protease candidates: murine MMP2 (924-MP-010, R&D Systems), MMP3 (548-MM- 010, R&D Systems), MMP7 (2967-MP-010, R&D Systems), MMP8 (2904-MP-010, R&D Systems), MMP9 (909-MM-010, R&D Systems), murine MMP10 (230-00748-10, RayBiotech), human MMP10 (910-MP-010, R&D Systems), MMP13 (MBS9422107, MyBioSource). 4.2.2 MMP activation 400ng of respective proMMPs were diluted individually in MMP activation buffer (10mM CaCl2, 100mM NaCl, 50mM HEPES, pH 7.8) to a final concentration of 40μg/ml. To induce proMMP activation proMMPs were incubated with 1mM 4-Aminophenylmercuric acetate (APMA) for 1h at 37oC, 600rpm, or overnight at 37°C without APMA addition. The reaction was stopped, and proteins denatured by addition of 8M guanidine hydrochloride (GuHCl) 1:1 (v/v). 4.2.3 Quenched fluorescence substrate assay Stock solutions of 100μM and 1μM synthetic quenched fluorescence substrate (ES002, R&D Systems) were prepared in DMSO and activation buffer, respectively. Inactive human MMP10 and active human MMP10 (activated with APMA as described by (Razai, Eckelman, & Salvesen, 2020)) were prepared in activation buffer to a final concentration of 2ng/μl. To measure activation 10ng of each MMP was mixed with 1μM of ES002 activation buffer to a final sample volume of 100μl. Fluorescence (ex.328, em.400) was immediately recorded on a multiwavelength fluorescence scanner (SpectraMax Gemini) for 2h in intervals of 45s. The experiment was performed in triplicate using activation buffer and inactive MMP10 as negative controls. 4.2.4 Pairwise time course activation of recombinant MMPs with MMP3 and MMP10 Inactive murine MMPs, active murine MMP3 and active human MMP10 (activated by incubating overnight in activation buffer) were prepared in activation buffer to a final concentration of 10ng/μl. 50ng of inactive MMPs were mixed 1:1 (w/w) with active murine MMP3 or human MMP10. Triplicates of each sample were incubated for 4h at 37oC, 600rpm. As a positive control 100ng of inactive MMPs were incubated with 1mM APMA. MMP cross-activation was interrupted by adding 1 sample volume of 8M GuHCl, 50mM HEPES, pH 7.8. 4.2.5 Growth of primary murine keratinocyte cells Primary murine keratinocyte cell lines were generated as described (Schlage, Kockmann, Sabino, Kizhakkedathu, & Auf Dem Keller, 2015). Each cell line was passaged once 80% confluence was reached in a T25 cell culture flask by washing the cells with 5mL sterile PBS (Thermo Fisher, BR0014G), following incubation with 0.05% Trypsin-EDTA solution

(Thermo Fisher, 11580626) for 5min at 37°C, 5% CO2. Trypsin was neutralized by adding at least 6mL of serum-containing media, cell suspension transferred to a 15ml conical tube and centrifuged at 1300rpm for 5min at 4oC. Supernatant was aspirated, the remaining cell pellet suspended in growth medium and 20% of the cell suspension cultivated with appropriate

o amounts of growth medium in a new T25 cell culture flask at 37 C in 5% CO2 atmosphere.

Novel proteomics technologies to decipher protease networks in cells and tissues 61 4.2.6 Growth of mouse embryonic fibroblast cells MEFs were cultured in Dulbecco’s Modified Eagles Medium (DMEM, D6429) supplemented with 10% (v/v) FBS, 1% (v/v) penicillin/streptomycin solution, 0.1mM non-essential amino acids and 55μM β-mercaptoethanol at 37°C in a 5% CO2 atmosphere. plus. For passaging of confluent cells, fibroblasts were washed with sterile PBS and incubated with 0.05% Trypsin- EDTA solution for 5min at 37oC. Trypsin was neutralized by adding at least 6mL of serum- containing media, cell suspension transferred to a 15ml conical tube and centrifuged at 1300rpm for 5min at 4oC. For long-term storage collected cells corresponding to a confluent T75 flask was centrifuged at 1 200rpm for 4min and resuspended in 1mL medium containing 10%(v/v) DMSO. Resuspended cells were transferred to cryotubes, slowly frozen to -80°C and stored in liquid nitrogen.Cell line Genotype Background Description MPK MMP10 -/- 129/SvEvBrd Murine primary keratinocytes xC57BL6/J MEF MMP10 -/- 129/SvEvBrd Murine embryonic fibroblasts xC57BL6/J Cell line Growth medium composition Description MPK K-SFM medium (Thermo Fisher, 10505013) Murine primary keratinocytes supplemented with 1% Penicillin/Streptomycin (Sigma-Aldrich, P0781), 0.1nM Cholera toxin (Sigma-Aldrich, C8052), 10ng/ml epidermal growth factor. MEF DMEM medium (D6429), supplemented with Murine embryonic fibroblasts 10% FBS (Thermo Fisher, 11550356), 1% Penicillin/Streptomycin, cocktail of non- essential amino acids (Gibco, 11140050), 55μM β-mercaptoethanol 4.2.7 Cell secretome collection When cells reached 90% confluency they were seeded 1:1 on T75 and grown overnight. The next day, when cells reached at least 70% confluency, growth medium was discarded,T75 flasks were washed 3 times with sterile PBS and 15mL secretome medium added. Cells were

o incubated with secretome medium for 24h at 37 C, 5% CO2. After 24 h medium was collected in a 50mL conical tube and phenylmethylsulfonyl fluoride (PMSF, Thermo Fisher, 36978) added to a final concentration of 0.5mM. Secretomes were centrifuged for 5min, 1300rpm at room temperature to pellet dead cells, transferred to a new conical tube and centrifuged for 30min, 4000rpm at 4oC to remove cell debris. Afterwards, secretomes were filtered through a 0.22μm filter and concentrated using Amicon Ultra filter units (3kDa cut-off). Buffer of the concentrated secretomes was exchanged using the Amicon manifolds by washing the secretomes three times with 10 fold the volume of 50mM HEPES, pH7.8. Concentrated samples were collected, protein

62 Novel proteomics technologies to decipher protease networks in cells and tissues concentration measured using Bradford reagent (Sigma, B6916) and final protein concentration adjusted to 2mg/ml.

Cell line Secretome medium composition Incubation time & yield per T75 flask MPK EpiLife Medium (Thermo Fisher MEPI500CA), 60μM 20h, 40μg protein

CaCl2 24h, 50μg protein MEF DMEM D1145 (Thermo Fisher, 31053028), 0.1M non- 40h, 70μg protein essential amino acids, 55μM B-mercaptoethanol, 1.5mM L-Glutamine (Thermo Fisher, 11539876), 1mM Sodium Pyruvate (Sigma, S8636)

4.2.8 Pairwise secretome digestion using MMP10 and MMP3 Overnight-activated human MMP10 or murine MMP3 were incubated with MEF or MPK secretomes for 8h at 600rpm, 37oC in triplicate at 1:170 enzyme to protein ratio (w/w). Digestion was interrupted by adding 1 sample volume of 8M Guanidine hydrochloride in 50mM HEPES, pH 7.8. 4.2.9 Mice Transgenic mice overexpressing constitutively active MMP10 in basal keratinocytes were described previously (Krampert et al. 2004). Mice husbandry was ensured according to local legislation and guidelines by EPIC facility at ETH Zurich, Switzerland. 4.2.10 DNA extraction from K14-MMP10 mice For genotyping of K14-MMP10 mice ear clips from weaning age mice were incubated in 75μL HotSHOT lysis buffer (25mM NaOH, 0.2mM EDTA) at 95oC from 20 min. The reaction was stopped by placing the samples on ice and adding 25μL neutralization buffer (40mM Tris-HCl, pH?). 1-5μL of extracts were used as a template for the subsequent PCR reaction with 1x KAPA buffer total volume of 20μl according to manufacturer’s instructions and 0.9μM of primers β- globin #62 and #63. 4.2.11 Analytical PCR reaction. To amplify the extracted DNA we used KAPA 2G Fast Genotyping PCR Kit (VWR International, Dietikon, Switzerland) according to the manufacturer’s instructions. Primers indicated in Table 1 were ordered from Microsynth (Balgach, Switzerland). Before starting the PCR reaction, a droplet

Table 1- Murine genotyping primers K14-MMP10 Name Sequence 5'-Primer β-globin #62 5'-GGA TCC TGA GAA CTT CAG GGT GAG-3' 3'-Primer β -globin #63 5'-CAG CAC AAT AAC CAG CAC GTT GCC-3'

Novel proteomics technologies to decipher protease networks in cells and tissues 63 Table 2-PCR programme PCR Program for K14-MMP10. K14-MMP10 KAPA Time Temperature

First denaturing step 3 min 95 °C 35 cycles Denaturing step 15 sec 95 °C Primer annealing 15 sec 53 °C Primer extension 15 sec 72 °C Last extension step 1 min 72 °C Storage 8 °C of mineral oil was applied to prevent sample from evaporating. DNA region amplification was performed using a peqSTAR 2X Thermocycler (PEQLAB, vWR International, Dietikon, Switzerland) with the programs indicated in Table 2. 4.2.12 Agarose gel electrophoresis To distinguish mouse genotypes PCR samples were visualized with a 1.5% agarose gel. 3g of agarose was dissolved in 200ml SBA buffer (10mM NaOH, buffered with boric acid at pH 8.5). After addition of ethidium bromide gels were cast.PCR samples were mixed with the running buffer and then loaded on the get together with a DNA size marker (Gene Ruler 100bp) was separated using constant voltage of 300V for 10-15min. The DNA bands were visualized under UV light (Gel Dox XR System, Bio-Rad) 4.2.13 K14-MMP10 murine dermis and epidermis separation by heat shock 2 2 58-65 days old mice were sacrificed using CO2. Approximately 4cm of back skin and 4cm of belly skin (after shaving) was carefully cut out with surgical scissors. Belly skin and half of back skin were placed in an Eppendorf tube and instantly snap frozen using liquid nitrogen. The rest of the back skin was incubated in PBS at 60oC for 30sec, followed by cooling it down in 4oC PBS for 60sec. The skin sample was then placed on a glass petri dish and the epidermis was scraped off from the dermis using a curved scalpel blade. Separated pieces were instantly processed using a Barocycler (Pressure BioSciences Inc.) or snap frozen for later analysis. 4.2.14 Murine skin protein extraction Extraction process is explained in detail in (Bundgaard, Savickas, & auf dem Keller, 2020). In brief, proteins were extracted using pressure cycling technology (PCT). 2-5mg of wet weight epidermis and dermis were collected using a 2mm punch biopsy needle (Sigma, WHAWB100076). Samples were added into a PCT tube with 30μl of lysis buffer (4M Guanidine hydrochloride with 50mM HEPES, pH7.8). The vials were briefly sonicated and covered using PCT micro-pestles. Samples were placed in a barocycler (Pressure Biosciences, 2320EXT) and lysed using 60 cycles of 50sec at 45000psi, 10sec at atmospheric pressure heating the sample to 33oC. After processing samples were transferred into an Eppendorf tube for further tryptic digestion.

64 Novel proteomics technologies to decipher protease networks in cells and tissues 4.2.15 Tryptic digest of proteins Samples were reduced and alkylated by addition of 400mM tris(2-carboxyethyl)phosphine (TCEP) 1:20 (v:v) and 200mM Chloroacetamide (CAA) 1:10 (v:v) followed by incubation at 65oC for 30min, 1000rpm. Afterwards, samples were cooled on ice, diluted with 50mM HEPES pH7.8, 1:4 (v:v) and digested with LysC (Wako W01W0112-0254) (1:100 protease:protein ratio) for 4h, at room temperature (RT), 600rpm. Finally, the samples were diluted with 50mM HEPES pH7.8, 1:1 (v:v) and digested a second time with Trypsin (Promega, V5280) (1:50 protease:protein ratio) overnight at 37oC. The next day, the digestion was stopped by adding trifluoroacetic acid (TFA) (Sigma 302031) to a final concentration of 1% TFA. 4.2.16 STAGE-tip sample clean-up Sample pH was measured using pH strips (Sigma P4536-100EA) and adjusted to STAGE (STop And Go Extraction) tipping buffer A (2% acetonitrile (Merck 34851-2.5L-M), 0.1% TFA) by titration. New STAGE tip columns were prepared using standard 200 μl pipette tips and 3x Empore C18 plug (66883-U, ~5μg peptide capacity/plug). The columns were washed twice with 50μl 100% methanol , twice with 50μl 100% acetonitrile and three times with 50μl buffer A with a plunger or centrifugation for 2min, 2000rpm at RT after every wash step. Samples were pooled into two groups, inactive and active, loaded on the columns by centrifugation and washed 5 times with 50μl buffer A. Peptides were eluted two-times by centrifugation 2000rpm for 2min at RT with 50μL of buffer B (80%Acetonitrile, 0.1% TFA) into protein LoBind tubes (Eppendorf). Eluates were dried under vacuum at 45oC. Dried peptides were resuspended in MS buffer (2% acetonitrile, 1% TFA), sonicated using 3 cycles 30s on, 30s off at High Power (BioRuptor, Diagenode)) and shaken at 1000rpm for 5min. Finally, peptide concentration was measured using a nanodrop (DS-11 FX, Denovix). 4.2.17 Determine limit of detection (LOD and limit of quantification (LOQ 6ng (0.2ng/ul) of cleaned respective recombinant proMMPs lysate was mixed at different concentrations with 100ng/ul HEK293 lysate (gift from Dr. Erwin M. Schoof). Double dilutions of proMMPs from 1000pg to 62,5pg of MMP peptide were prepared in triplicate. Total volume of prepared sample was 10ul. 4.2.18 Reverse phase liquid chromatography EASY-nLC 1200 chromatography system with a 2cm x 75μm, Acclaim®PepMap100 trap column, packed with 3μm, C18 beads was used to load 50ng of MMP peptides or 500ng complex proteome which were separated using a 50cm x 75μm, PepMap™ RSLC analytical column, packed with 2μm C18 beads (Thermo Fisher ES803A). Columns were washed with solvent A (0.1% FA) after sample loading and peptides gradient eluted with increasing concentrations of solvent B (80% acetonitrile, 0.1% FA) with a flowrate of 250nl/min. The chromatographic gradient of solvent B was set from 6% to 60% over total of 60min (from 6% to 23% in 43min; From 23% to 38% in 12min; From 38% to 60% in 5min), followed by wash steps from 60% to 95% over 3min and a final wash with 95% for 7min.

Novel proteomics technologies to decipher protease networks in cells and tissues 65 4.2.19 Shotgun mass spectrometry For MS analysis a Q Exactive HF-X mass spectrometer (Thermo Fisher Scientific) was operated in data-dependent acquisition (DDA) mode for 70 min. MS1 resolution was set to 70.000 , Automatic gain control (AGC) target set to 3*e6, maximum injection time set to 20ms, scan range 350 to 1750 m/z, selecting the Top 10 MS1 ions for MS2 analysis. MS2 scans used 15,000 resolution, AGC target of 2*e5, maximum injection time 22ms, and isolation window of 1.6m/z at normalized collision energy of 28. 4.2.20 Targeted LC-MS analysis (Parallel Reaction Monitoring For targeted proteomics a Q Exactive HF-X mass spectrometer (Thermo Fisher Scientific) was operated in PRM (Parallel Reaction Monitoring) mode for 70min. MS1 resolution of 70.000, AGC target set to 3*e6, maximum Injection time set to 20ms, scan range 350 to 1300m/z for a full MS scan. MS2 scans used 15,000 resolution, AGC target of 2*e6, maximum injection time 100ms, and isolation window of 1.2m/z at normalized collision energy of 27. Using 107 target peptides (Supplemental material) and scheduling their acquisition time in 5min windows adjusted to their chromatographic elution pattern. Synthetic heavy peptides for each target peptide were ordered from JPT technologies and injected together in the initial run to confirm elution patterns. 4.2.21 Data analysis Shotgun data analysis was performed using Proteome Discoverer 2.2 (Thermo Fisher Scientific). MS spectra were extracted from raw data files and searched against a complete Mus musculus database from the UniProt Database (TaxID 10090, 17,040 SwissProt entries, December 10, 2017). The database was concatenated with a list of all protein sequences in reversed order. Searches were run with 10ppm precursor ion tolerance for total protein level identification and 0.02 Da for fragment ion tolerance using SEQUEST search algorithm and Percolator for FDR filtering. Carbamidomethylation of cysteine residues (+57.021Da) was set as a static modification, while Oxidation (+15.995Da) was set as a variable modification. Peptide-spectrum matches (PSM’s) were set to not exceed 1% false discovery rate (FDR). Filtered PSMs were further filtered for peptide and protein-level FDR of 1%. Parallel reaction Monitoring (PRM) data was analyzed using Skyline 19.1 (MacCoss Lab). Target peptides were imported after setting filter parameters at MS1 orbitrap to 70.000 resolution, at 200 m/z and MS/MS targeted filtering at 15.000 resolution, at 200m/z. All precursor masses had at least 5 transitions. Elution patterns of the most confident peptides were selected for dot product of at least 0.98, allowing no more than 5ppm deviation of the transitions. Selected peaks were matched to the in-house target library. Area under the curve of each eluted transition was selected for further quantification using the “R” package “MSstats 3.12.3”. Proteoforms of independent injections were normalized according to the mean of total intensity, combat method batch corrected, limits of blank and limits of detections were calculated using the package “linear_quantlim” (Galitzine, et al., 2018), standard deviation was calculated, paired t-test were performed on within experimental groups and resulting intensities were plotted for data interpretation.

66 Novel proteomics technologies to decipher protease networks in cells and tissues 4.3 Results 4.3.1 Newly established PRM assays detect murine MMPs with high specificity and sensitivity To identify optimal peptides for PRM analysis of mouse skin MMPs, we prepared tryptic digests from recombinant mouse MMPs 2, 3, 7, 8, 9, 10 and 13 and subjected them to shotgun proteomics. Generated spectra were ranked and at least two most intense distinct peptide spectra selected for PRM assay development (Fig. 1A). To ensure assessment of all protein copies independent of zymogen removal, we focused on peptides localized to domains C-terminal to the zymogen cleavage site as depicted for MMP2 in Fig. 1B. Based on the obtained sequences, heavy peptide standards were purchased and used to validate elution patterns and identity endogenous MMPs in subsequent analyses. Each recombinant MMP could be identified by at least two unique peptides (Supplement data), which were used for further quantification. In order to evaluate and quantify the limits of detection (LOD) and the limits of blank (LOB) with this experimental setup, a dilution series of recombinant MMPs in a complex matrix of whole proteomes from human HEK293 cells was tested. Due to the differences between murine and human proteomes, no interference was expected. Thus, the noise level produced in the mass spectrometer by the non-target proteome was used to set the cut-off threshold for a LOD for each protease as exemplified for MMP2 in Fig. 1C. Independent triplicate injections of decreasing concentrations of the proteases combined with MS stats tool “linear_quantlim” (Galitzine, et al., 2018) to calculate theoretical LOD and LOB indicated the following values: MMP2-LOB (0.4pg/ μg), LOD (5.6pg/μg); MMP3-LOB (7.4pg/μg), LOD (14.8pg/μg); MMP7-LOB (2.7pg/μg),

Figure 1-Detection and quantification of recombinant MMPs by targeted LC-MS/MS. A). Experimental setup. Tryptic peptides of recombinant MMPs were identified using on-line reverse phase liquid chromatography with targeted MS/MS and Parallel Reaction Monitoring (PRM) B). Identification and quantification of recombinant murine MMP2. Two unique peptides in between C-terminus and the propeptide were used as a reference for total MMP2 abundance. The area under the curve of product ions over the chromatographic peak can be calculated and used to measure the relative abundance of a target protein. C). Limit of detection and quantification of the targeted MMP2 by PRM. Testing different concentrations of MMPs in HEK293 lysate shows limit of detection and blank in complex matrix.

Novel proteomics technologies to decipher protease networks in cells and tissues 67 LOD (15.8pg/μg); MMP8-LOB (23.8pg/μg), LOD (43.8pg/μg); MMP9-LOB (4.4pg/μg), LOD (12.6pg/μg; MMP13-LOB (3pg/μg), LOD (5.3pg/μg) (Fig. 1C, Supplemental data). 4.3.2 Targeted degradomics distinguishes between inactive proMMPs and active MMPs Having established highly sensitive PRM assays for detection of MMP proteins, we tested if PRM can be applied to distinguish the inactive zymogen proMMP from the activated protease form by monitoring neo-N termini and cleavage site spanning peptides (Savickas & Auf Dem Keller, 2017). For this purpose, inactive recombinant MMPs were treated with the known chemical activator 4- aminophenylmercuric acetate (APMA), which induces autocatalytic release of MMP propeptides (Mannello & Medda, 2011) (Fig. 2A). With this knowledge, the spanning and neo-N-terminal peptides of each MMP could be identified as exemplified for MMP10 in Fig. 2A. The neo-N- terminal-peptide generated upon zymogen removal was only identified after APMA activation but not in control samples, whereas the cleavage site spanning peptide was only present in controls, indicating full activation (Fig. 2B). Since our PRM assays only monitor removal of the propeptide and not actual MMP protease activity, we tested activity of pro- and APMA-activated enzymes using a quenched fluorescence peptide assay (Razai, Eckelman, & Salvesen, 2020). As an example, only proMMP10 that had been pretreated with APMA was able to efficiently process a specific peptide substrate, whereas the untreated and not proteolytically activated control did not have any effect (Fig. 2B). This correlated results from our PRM targeted degradomics assays with protease activity and demonstrated that this workflow can identify and distinguish between inactive and active MMPs.

Figure 2-Identification and quantification of recombinant human MMP10 activation. A). MMP10 zymogen and active MMP10 structural components. MMP10 is activated upon spanning peptide cleavage (histidine (H)↓phenylalanine (F)) between the propeptide and catalytic moiety. Mass spectra show two peptides of MMP10, one representing the inactive (Spanning) and one the active (Neo-N) protease. B). APMA induced MMP10 activation. APMA leads to propeptide cleavage and activation of the zymogen. Decrease of spanning peptide and increase of neo N-terminus signal intensities after incubation with APMA confirm MMP activation. Quenched fluorescence substrate (ES002) incubated with APMA-activated MMP10 confirms its proteolytic activity.

68 Novel proteomics technologies to decipher protease networks in cells and tissues 4.3.3 MMP3 and MMP10 selectively activate target MMPs in cell-free systems After having validated that MMP zymogens can be distinguished from their active forms by our newly designed PRM assay, the workflow was applied to investigate proMMP activation by other active MMPs. According to published biochemical studies MMP3 and MMP10 are able to activate a variety of other MMPs (Neil, Rawlings, & Salvesen, 2013). Thus, autoactivated recombinant mouse MMP3 or human MMP10, respectively, were co-incubated with selected inactive recombinant mouse proMMPs and activation of target MMPs measured by PRM (Fig. 3A). APMA- treatment of proMMPs served as a positive control, whereas only buffer without APMA or MMP3/10 served as a negative control. For all tested MMPs the neo-N-terminal peptide could be detected across all conditions, but with varying abundance. In some buffer-treated controls the neo-N-terminal peptide was detected, possibly due to protease autoactivation through autoproteolysis. Neo-N-terminal peptide abundance of MMP2 increased slightly compared to the negative control after incubation with MMP3, while MMP10 treatment had no measurable effect. However, overall effects of both MMPs on MMP2 were minor compared to full activation in the positive control. MMP7 was activated by both MMP3 and MMP10, with MMP3 having a higher and significant activation potential on MMP7 than MMP10. MMP8 was only activated by MMP10 but without statistical significance, with MMP3 having a similar effect as the negative control.

Figure 3- Biochemical cross-activation of recombinant MMPs. A). Experimental outline. Inactive murine proMMPs 2, 7, 8 and 13 were incubated separately with autoactivated murine MMP3 or human MMP10 and neo N-termini monitored by PRM. B). Neo N-terminal peptide levels of MMPs after treatment with MMP3 and MMP10. Total area under the curve values of detected N-terminal MMP peptide intensities were calculated and adjusted in relation to the highest value, which was set to 1. Error bars: SD, *: p<0.05 (t-test).

Novel proteomics technologies to decipher protease networks in cells and tissues 69 Finally, MMP13 was only activated by MMP3 with an even stronger effect than APMA treatment. MMP10 was not able to lead to a significant increase in abundance of detectable MMP13 neo-N- terminal peptide if compared to the negative control (Fig. 3B). Hence, these analyses demonstrated selective activation of target proMMPs 2, 7, 8 and 13 by the activator MMPs 3 and 10. 4.3.4 Distinct MMP subsets are secreted by keratinocytes and fibroblasts and selectively activated by MMPs 3 and 10 MMPs are known to vary in expression levels between different cell types in the skin, contributing to different biological processes. Thus, as a next layer of complexity, we applied our protein PRM assays to assess MMP abundances in secretomes from MMP10-deficient murine embryonic fibroblasts (MEF) and murine primary keratinocytes (MPK) (Fig. 4). In MEF secretomes high abundances of MMPs 2, 7, and 13 and low abundances of MMP 3 were observed. On the contrary, abundances of MMP 3 was higher in MPKs compared to MEFs, whereas abundances of MMPs 2, 7, and 13 were comparatively lower in these cells (Fig. 4B). Other secreted target MMPs could not be identified in fibroblast or keratinocyte secretomes, possibly due to too low abundances. To evaluate the effect of MMPs 3 and 10 on activation of endogenous proMMPs, secretomes from MMP10-deficient MEFs and MPKs were treated with either active, recombinant mouse MMP3 or human MMP10 and cross-activation of secondary MMPs analyzed by PRM monitoring of neo-N-terminal peptides generated upon zymogen removal (Fig. 4C). It should be noted that while identical for mouse MMP3 and MMP10 this neo-N terminus differs for human MMP10, allowing to monitor activation of endogenous MMP3 in these samples upon incubation with active human MMP10. We were able to observe neo-N termini of MMPs 2, 3 and 7 in MEF and MPK secretomes. MMP13 could not be included in this analysis, since its neo-N-terminal peptide could not be confidently quantified. Relative abundances of the MMP2 neo-N terminus in MEFs confirmed minor effects of MMP3 and MMP10 on MMP2 activation as seen in the cell-free assay (Fig. 3B), although in this complex matrix MMP3 had a stronger effect when compared to the control. MMP3-mediated activation of endogenous MMP7 could also be confirmed in MEF secretomes, showing a significant increase in abundance of the neo-N terminus compared to the negative control, whereas surprisingly no effect by MMP10 treatment could be observed. In MEF secretomes, MMP10 had only a negligible effect on activation of endogenous MMP3. MPKs showed a slightly different trend of MMP cross-activation. Although not statistically significant, MMP3 generated slightly more truncated MMP2 than in the control, whereas MMP10 treatment led to a reduction of peptide abundance, suggesting interference with autoactivation. Endogenous MMP3 was similarly affected by MMP10 as in MEFs. MMP7 was not activated by MMP10 and, in contrast to the MEF data, the abundance of the neo-N-terminal peptide was even reduced after MMP3 treatment. Finally, to further validate activity of spiked in MMP10, we used targeted degradomics to monitor cleavage of ADAMTS-like protein 1 in MEFs and of dermokine in MPKs that been

70 Novel proteomics technologies to decipher protease networks in cells and tissues Figure 4- In vitro identification and quantification of the activation status of different MMPs in murine cell lines. A). Experimental setup. Secretomes of MMP10-ko Mouse Primary Keratinocytes (MPK) and Mouse Embryonic Fibroblasts (MEF) were incubated with active, recombinant murine MMP3 and human MMP10. Samples were analyzed by PRM for differences in candidate MMP activation by quantifying their neo-N terminus. B). Quantification of MMPs in MEF and MPK secretomes. C). Quantification of the detected MMP neo-N termini in MEF and MPK secretomes after treatment with active MMP3 and MMP10. Shown are the normalized sums of the total area-under-the-curve quantification of all obtained spectra for the respective peptides. D). Targeted degradomics confirms a previously described MMP10-mediated dermokine cleavage in MPK secretomes between 179|180 amino acids. and ADAMTS like protein 1 cleavage in MEFs between 371|372 amino acids. Error bars: SD, *: p<0.05 (t-test) previously identified as direct MMP10 substrates (Schlage et al. (2014, 2015)). As expected, MMP10 incubation strongly increased respective neo-N-terminal peptides of both substrate proteins, confirming proteolytic processing under these conditions (Fig. 4D). 4.3.5 Dermis and epidermis selectively contribute to a shared MMP network Exploring MMP abundance and activation patterns in independent tissue culture secretomes has the advantage of isolating cell type specific activity. However, interaction of these cell types in tissues might influence the network in vivo. Therefore, to further explore MMP complexity and to test the PRM setup in an in vivo setting, MMP abundances and zymogen activation were measured in skin from mice overexpressing a constitutively active mutant of MMP10 in basal keratinocytes or wild-type controls (Krampert et al. 2004). Moreover, to be able to still assign MMPs and their activation to major skin layers and test influences of disturbance of the network in the epidermis on MMPs in the dermis, skin from both genotypes was separated into dermal and epidermal layers. Both layers were processed and analyzed separately, providing important new insights into tissue specific MMP activation profiles (Fig. 5A). In wild-type animals, MMPs 7, 8, 9, 13 had higher abundances in the epidermis, while MMP2 and MMP3 abundances were ˜30% higher in samples from the dermal than the epidermal

Novel proteomics technologies to decipher protease networks in cells and tissues 71 layer (Fig. 5B). Interestingly, even though mice overexpressing MMP10 in the epidermis do not show any clear phenotypic differences compared to wild-type mice under normal conditions (Krampert et al. 2004), the abundance of most candidate MMPs was increased compared to wild- type. As expected, abundance of MMP10 was ~3 times higher in epidermal samples from K14- MMP10 than wild-type mice, but also MMP2, MMP3, MMP9, and MMP13 were increased between 1.3- to 3-fold in their abundance. MMP7 abundance dropped compared to the control, whereas MMP8 did not show any differences in abundance in the epidermis between genotypes (Fig. 5B). Even if expressed in the epidermis, increased expression of MMP10 in keratinocytes also influenced abundances of targeted MMPs in the dermal layer. Abundances of MMP7, MMP8, MMP9, MMP10 and MMP13 were strongly elevated in dermal tissue of MMP10-overexpressing tissue, whereby MMP2 and MMP3, were moderately higher abundant in the dermis (Fig. 5B). Next, we quantified dermal and epidermal neo-N-terminal peptides generated upon zymogen removal to assess the activation status of targeted MMPs. Detection of neo-N termini for MMP2 and MMP7 in the epidermis was inconclusive in this first set of analyses and thus is not yet reported. Due to identity of the neo-N terminus of mouse MMP3 and MMP10, the much higher abundance in K14-MMP10 epidermis results from expression of the transgene but does not allow to selectively assess effects on MMP3 zymogen processing. However, abundance of the MMP13 neo-N-terminal peptide was 2-fold higher in samples from transgenic than wild-type animals, indicating shifts to the MMP activation network as a result of increased expression of constitutive MMP10 in the epidermis (Fig. 5C). Interestingly, MMP10 overexpression in basal keratinocytes of the epidermis had also significant effects on MMP zymogen removal as indicator for activation in the dermal compartment. Strongest effects were observed for MMP2 and MMP7, which showed much higher abundances of the indicative neo-N terminus in the dermis of transgenic than wild-type mice. A similar pattern was observed for the shared neo-N-terminal peptide released from MMP3 and MMP10, but this might be masked by transgene contamination from the epidermis. In contrast to measurements in the epidermis, we observed a reduction in MMP13 zymogen removal in dermis from K14-MMP10 mice but with a lower fold change. Overall, this data indicated that the MMP network is shared between both skin layers and generally influenced by disturbances in only one compartment.

72 Novel proteomics technologies to decipher protease networks in cells and tissues Figure 5- In Vivo identification and quantification of different MMPs and their activated forms in K14- MMP10 overexpressing mice. A). Experimental outline of mouse skin analysis. MMP10 overexpressing mouse skin was separated into dermis and epidermis prior to pressure cycling technology (PCT) assisted processing, followed by previously designed PRM assays for MMP protein and MMP Neo-N term quantification. B). Relative amounts of core MMP peptides in 500ng of dermal and epidermal proteome. C). Quantification of MMP Neo-N terms in dermal and epidermal tissues. Due to MMP3 and MMP10 sharing the same Neo-N term peptide it is impossible to deduce separate activation status. Error bars: SD, *: p<0.05 (t- test).

Novel proteomics technologies to decipher protease networks in cells and tissues 73 4.4 Discussion

Proteases do not act in isolation but in complex interconnected networks within the protease web (Fortelny et al. 2014). These interactions are often mediated by mutual proteolytic activation through zymogen removal. MMPs are a prime example for such an activation network, but due to lack of appropriate technology they have been mostly studied individually and not at an integrative level in complex biological matrices, hampering our understanding of MMP activities and their disturbance in disease. Here, we have exploited the power of targeted proteomics to develop novel PRM assays for a subset of MMPs with known co-expression in the skin (Neil, Rawlings, & Salvesen, 2013) to simultaneously assess abundances of these proteases with high specificity and sensitivity. These assays have been complemented with targeted degradomics assays (Savickas et al., 2017) to concomitantly monitor zymogen removal and thus the activation status of individual proteases. Our assays were able to detect MMPs with a sensitivity down to a limit of detection in the range of 4-5pg corresponding to <1fmol when spiked into 1μg of complex proteome and could distinguish between highly similar MMPs like MMP3 and MMP10. Monitoring both the cleavage site spanning peptide and the neo-N terminus of zymogen removal quantitatively assessed the MMP activation status and correlated with protease activity measured in substrate cleavage assays. Applying these assays for analysis of MMPs in cell-free systems, cellular secretomes from keratinocytes and fibroblasts as well as lysates from mouse epidermis and dermis, we mapped out an MMP activation network that demonstrates remarkable differences between levels of biological complexity. We were able to assess abundances of distinct subsets of MMPs in keratinocytes, fibroblasts, epidermal and dermal tissues. Since MMP levels are very low in normal skin, data on abundances in this tissue as well as in secretomes from related non-challenged cell types are sparse. However, where available results from prior studies are mostly in agreement with our analyses. Higher abundance of MMP2 in secretomes from fibroblasts than from keratinocytes have been observed (Tandara & Mustoe, 2011) as well as higher expression levels in the dermal than the epidermal compartment (Oikarinen et al., 1993). MMP3 is highly expressed in proliferating keratinocytes matching the phenotype of the spontaneously immortalized cells used in our study but at very low and comparable levels in dermis and epidermis (McCawley et al., 2008). MMP7 has not been localized to the epidermis (Rohani & Parks, 2015), which could explain a higher abundance in MEF than MPK secretomes but requires orthogonal validation of our contrary results in skin tissue compartments. Although low, expression of MMP13 has been observed in fibroblasts (Ravanti, et al., 2001) and keratinocytes and a low level in normal skin has been reported (Hattori, et al., 2009). Finally, we detected MMP8 and MMP9 only in tissue but not in cell secretomes, which reflects the low expression of these proteases under homeostatic conditions in the skin (Fisher et al., 2001; Oikarinen et al., 1993). Taken together, these results

74 Novel proteomics technologies to decipher protease networks in cells and tissues confirm the validity of our targeted degradomics assays as appropriate tool to identify and quantify MMPs in increasingly complex biological samples. Biochemically, MMP7 was activated by MMP10, but surprisingly this effect was abrogated in cell secretomes (Fig. 6). In our current dataset, the MMP7 neo-N terminus could not be reliably monitored in murine epidermis, but increased activity of MMP10 in the epidermis had a positive effect on MMP7 activation in the dermis. Previous studies confirm the biochemical catalytic interaction of these proteases (Gill, et al., 2004) (Nakamura, Fujii, Ohuchi, Yamamoto, & Okada, 1998), but our analyses demonstrate the modulatory effect of the biological matrix on this activating cleavage. Importantly, endogenous MMP7 protein was identified in cell secretomes and skin tissue, indicating that lack of activation was not a consequence of missing protein. Thus, these observations warrant further investigation. Interestingly, we observed that MMP7 is even stronger activated by MMP3 than by MMP10 in biochemical assays and secretomes from murine embryonic fibroblasts. Activation of MMP7 by MMP3 has been shown by incubation of recombinant enzymes (Imai, et al.,1995), but to our knowledge we present the first evidence for this catalytic interaction in a complex environment and monitoring endogenous MMP7. Both MMP3 and MMP10 have been implicated in the control of collagenase (MMPs 1, 8, 13) activity, either by activation of the zymogen or regulation of expression (Rohani, et al., 2015). Our results confirm this data for MMP8 for both MMP3 and MMP10, but without statistical significance when compared to the control. Significance was reached for MMP13 zymogen removal by MMP3, but baseline intensity of neo-N-terminal peptides was very high in all these assays, possibly masking stronger effects. Tissue-level analysis also confirmed effects on protein abundances of MMP8 and MMP13 in both the epidermal and the dermal compartment upon changes to MMP10 activity in epidermal keratinocytes. While increased intensity of the indicative MMP13 neo-N terminus in K14-MMP10 epidermis was proportional to higher abundance of the protein, levels of this peptide were reduced in dermal samples from transgenic mice compared to wild-type animals. These results indicate complex changes to an MMP network that is shared between both skin layers upon increase of activity of a single member (MMP10) in one tissue

Figure 6-MMP network in three different matrices. Nodes represent MMP proteases and the arrows show the activation link between them. Thicker node border indicates if a protease has been identified in the respective biological matrix. *p-value<0.05 (t-test)

Novel proteomics technologies to decipher protease networks in cells and tissues 75 compartment. These changes warrant further investigation and might explain the minor seen in these animals under homeostatic conditions (Krampert et al., 2004), which might result from robust rewiring of the MMP network and equivalent net outcomes of activities in both genotypes. At least according to our current dataset, several MMP abundances and activation status differed between biochemical, in vitro and in vivo assays (Fig. 6). Missing values will still require a further refinement of the data to draw a conclusive picture, but those differences follow the rule that not every proteolytic activation event that can happen in biochemical assays with recombinant proteins is validated in more complex biological systems invoking endogenous proteases. For example, an inhibitor present in a native secretome might prevent a specific cleavage under the given assay conditions. Vice versa, co-factors or secondary proteases might facilitate zymogen activation in complex active biological matrices, which are absent in simple incubation assays using recombinant MMPs. More experimentation is required to fill the missing links, but our data demonstrate the underlying principles.

76 Novel proteomics technologies to decipher protease networks in cells and tissues References Auf Dem Keller, U., Prudova, A., Eckhard, U., Fingleton, B., & Overall, C. (2013). Systems-level analysis of proteolytic events in increased vascular permeability and complement activation in skin inflammation. Science Signaling.

Balbín, M., Fueyo, A., Tester, A., Pendás, A., Pitiot, A., Astudillo, A., . . . López-Otín, C. (2003). Loss of collagenase-2 confers increased skin tumor susceptibility to male mice. Nature Genetics, 35(3).

Barksby, H. E., Milner, J. M., Patterson, A. M., Peake, N. J., Hui, W., Robson, T., … Rowan, A. D. (2006). Matrix metalloproteinase 10 promotion of collagenolysis via procollagenase activation: Implications for cartilage degradation in arthritis. Arthritis and Rheumatism, 54(10). https://doi.org/10.1002/art.22167

Bundgaard, L., Savickas, S., & auf dem Keller, U. (2020). Mapping the N-Terminome in Tissue Biopsies by PCT-TAILS. Esantis L. Bundgaard, S. Savickas, & U. auf dem Keller, Methods in Molecular Biology (T. 2043).

Deutsch, E., Lane, L., Overall, C., Bandeira, N., Baker, M., Pineau, C., . . . Omenn, G. (2019). Human Proteome Project Mass Spectrometry Data Interpretation Guidelines 3.0. Journal of Proteome Research, 18(12).

Fanjul-Fernández, M., Folgueras, A., Cabrera, S., & López-Otín, C. (2010). Matrix metalloproteinases: Evolution, gene regulation and functional analysis in mouse models. Biochimica et Biophysica Acta - Molecular Cell Research, 1803(1).

Fisher, G. J., Choi, H. C., Bata-Csorgo, Z., Shao, Y., Datta, S., Wang, Z. Q., … Voorhees, J. J. (2001). Ultraviolet irradiation increases matrix metalloproteinase-8 protein in human skin in vivo. Journal of Investigative Dermatology, 117(2). https://doi.org/10.1046/j.0022-202X.2001.01432.x

Fortelny, N., Cox, J., Kappelhoff, R., Starr, A., Lange, P., Pavlidis, P., & Overall, C. (2014). Network Analyses Reveal Pervasive Functional Regulation Between Proteases in the Human Protease Web. PLoS Biology, 12(5).

Galitzine, C., Egertson, J., Abbatiello, S., Henderson, C., Pino, L., MacCoss, M., . . . Vitek, O. (2018). Nonlinear regression improves accuracy of characterization of multiplexed mass spectrometric assays. Molecular and Cellular Proteomics.

Gill, J., Kirwan, I., Seargent, J., Martin, S., Tijani, S., Anikin, V., . . . Loadman, P. (2004). MMP-10 is overexpressed, proteolytically active, and a potential target for therapeutic intervention in human lung carcinomas. Neoplasia, 6(6).

Hattori, N., Mochizuki, S., Kishi, K., Nakajima, T., Takaishi, H., D'Armiento, J., & Okada, Y. (2009). MMP-13 plays a role in keratinocyte migration, angiogenesis, and contraction in mouse skin wound healing. American Journal of Pathology, 175(2).

Knauper, V., Wilhelm, S. M., Seperack, P. K., DeClerck, Y. A., Langley, K. E., Osthues, A., & Tschesche, H. (1993). Direct activation of human neutrophil procollagenase by recombinant stromelysin. Biochemical Journal, 295(2). https://doi.org/10.1042/bj2950581

Krampert, M., Bloch, W., Sasaki, T., Bugnon, P., Rülicke, T., Wolf, E., . . . Werner, S. (2004). Activities of the matrix metalloproteinase stromelysin-2 (MMP-10) in matrix degradation and keratinocyte organization in wounded skin. Molecular Biology of the Cell, 15(12).

López-Otín, C., & Bond, J. (2008). Proteases: Multifunctional enzymes in life and disease. Journal of Biological Chemistry, 283(45).

Mannello, F., & Medda, V. (2011). Differential expression of MMP-2 and MMP-9 activity in megakaryocytes and platelets. Blood, 118(24).

Marchant, D., Bellac, C., Moraes, T., Wadsworth, S., Dufour, A., Butler, G., . . . Overall, C. (2014). A new transcriptional role for matrix metalloproteinase-12 in antiviral immunity. Nature Medicine, 20(5).

Novel proteomics technologies to decipher protease networks in cells and tissues 77 McCawley, L. J., Wright, J., LaFleur, B. J., Crawford, H. C., & Matrisian, L. M. (2008). Keratinocyte expression of MMP3 enhances differentiation and prevents tumor establishment. American Journal of Pathology, 173(5). https://doi.org/10.2353/ajpath.2008.080132

Murphy, G., Cockett, M. I., Stephens, P. E., Smith, B. J., & Docherty, A. J. P. (1987). Stromelysin is an activator of procollagenase. A study with natural and recombinant enzymes. Biochemical Journal, 248(1). https://doi.org/10.1042/bj2480265

Nakamura, H., Fujii, Y., Ohuchi, E., Yamamoto, E., & Okada, Y. (1998). Activation of the precursor of human stromelysin 2 and its interactions with other matrix metalloproteinases. European Journal of Biochemistry, 253(1).

Neil, D., Rawlings, N., & Salvesen, G. (2013). Handbook of proteolytic enzymes.

Oikarinen, A., Kylmäniemi, M., Autio-Harmainen, H., Autio, P., & Salo, T. (1993). Demonstration of 72-kDa and 92- kDa forms of type IV collagenase in human skin: Variable expression in various blistering diseases, induction during re-epithelialization, and decrease by topical glucocorticoids. Journal of Investigative Dermatology, 101(2). https://doi.org/10.1111/1523-1747.ep12363823

RAVANTI, L., TORISEVA, M., PENTTINEN, R., CROMBLEHOLME, T., FOSCHI, M., HAN, J., & KÄHÄRI, V.-M. (2001). Expression of human collagenase-3 (MMP-13) by fetal skin fibroblasts is induced by transforming growth factor β via p38 mitogen-activated protein kinase. The FASEB Journal, 15(6). https://doi.org/10.1096/fj.00- 0588fje

Rodríguez, D., Morrison, C., & Overall, C. (2010). Matrix metalloproteinases: What do they not do? New substrates and biological roles identified by murine models and proteomics. Biochimica et Biophysica Acta - Molecular Cell Research, 1803(1).

Rohani, M. G., McMahan, R. S., Razumova, M. V., Hertz, A. L., Cieslewicz, M., Pun, S. H., … Parks, W. C. (2015). MMP- 10 Regulates Collagenolytic Activity of Alternatively Activated Resident Macrophages. Journal of Investigative Dermatology, 135(10). https://doi.org/10.1038/jid.2015.167

Savickas, S., Kastl, P., & auf dem Keller, U. (2020). Combinatorial degradomics: Precision tools to unveil proteolytic processes in biological systems. Biochimica et Biophysica Acta - Proteins and Proteomics, 1868(6).

Schlage, P., Kockmann, T., Sabino, F., Kizhakkedathu, J., & Auf Dem Keller, U. (2015). Matrix metalloproteinase 10 degradomics in keratinocytes and epidermal tissue identifies bioactive substrates with pleiotropic functions. Molecular and Cellular Proteomics, 14(12).

Suzuki, K., Morodomi, T., Nagase, H., Enghild, J. J., & Salvesen, G. (1990). Mechanisms of Activation of Tissue Procollagenase by Matrix Metalloproteinase 3 (Stromelysin). Biochemistry, 29(44). https://doi.org/10.1021/bi00496a016

Tandara, A. A., & Mustoe, T. A. (2011). MMP- and TIMP-secretion by human cutaneous keratinocytes and fibroblasts - Impact of coculture and hydration. Journal of Plastic, Reconstructive and Aesthetic Surgery, 64(1). https://doi.org/10.1016/j.bjps.2010.03.051

Vandooren, J., Geurts, N., Martens, E., Van Den Steen, P., & Opdenakker, G. (2013). Zymography methods for visualizing hydrolytic enzymes. Nature Methods, 10(3).

Xia, W., Hammerberg, C., Li, Y., He, T., Quan, T., Voorhees, J., & Fisher, G. (2013). Expression of catalytically active matrix metalloproteinase-1 in dermal fibroblasts induces collagen fragmentation and functional alterations that resemble aged human skin. Aging Cell, 12(4).

Zigrino, P., Brinckmann, J., Niehoff, A., Lu, Y., Giebeler, N., Eckes, B., . . . Mauch, C. (2016). Fibroblast-Derived MMP- 14 Regulates Collagen Homeostasis in Adult Skin. Journal of Investigative Dermatology, 136(8).

78 Novel proteomics technologies to decipher protease networks in cells and tissues 5. Hybrid Parallel Reaction Monitoring (hPRM), a novel mass spectrometry-based workflow for semi- discovery research- Manuscript 5

During the development of the MMP PRM assays, we have noticed that the signal intensity of the targeted peptides did not increase substantially, once the injection time of the PRM method exceeded a certain threshold. Upon determining an optimal injection time, we have discovered that we reach the point, where a lot of targeted scans are not useful scans. The instrument is iterating the same peak over 20 times, which is wasteful for quantitation or identification. Thus, we have exploited this insight to develop a method, which we termed hybrid parallel reaction monitoring (hPRM), and that replaces the unproductive scanning time with Data Independent Acquisition (DIA)-type scans for a concomitant sensitive whole proteome analysis. The design does not compromise the sensitivity of the assays for PRM target quantification, eliminates the bottleneck of few identified peptides and utilizes the power of the newest instruments. Hybrid-PRM does not require an advanced programming interface (API) or special access to software code and can be implemented e.g. using a conventional Thermo Xcalibur Method Editor by adjusting module positions and ensuring the correct target list based on an already existing PRM method. The method collects all the necessary ions for the PRM targets with independent scans. Subsequent to high sensitivity PRM scans, hPRM runs low resolution DIA windows taking less scanning time than a single PRM scan. The short DIA scans contain the ions from the most abundant proteins and peptides. In the following manuscript, we demonstrate the applicability of the method by concomitantly monitoring a MMP network by PRM and parts of the surrounding proteome by DIA in normal and inflamed mouse skin. Next to analyzing the MMP panel with high sensitivity, we identified and quantified hundreds of additional proteins and peptides including termini generated from protease cleavage events. Hence, hPRM is a novel semi-discovery method that can be used for target validation and low-depth DIA proteome analysis without compromising PRM sensitivity.

Novel proteomics technologies to decipher protease networks in cells and tissues 79 Hybrid Parallel Reaction Monitoring (hPRM), a mass spectrometry-based workflow for semi-discovery research

Simonas Savickas1; Philipp Kastl1; Ulrich auf dem Keller1*

Affiliations 1 Department of Biotechnology and Biomedicine, Technical University of Denmark, Lyngby,Denmark

*Corresponding author:

Ulrich auf dem Keller, email: [email protected]

Key words: matrix metalloproteanases / protease-web / proteomics / Data Independent acquisition / Parallel Reaction Monitoring

80 Novel proteomics technologies to decipher protease networks in cells and tissues Abstract

New generation orbitrap mass spectrometers like Q Exactive HF-X, and Exploris 480 are equipped with an improved hardware that boosted sensitivity, speed and accuracy of classical proteomics techniques and opened an avenue for new methods. Targeted proteomics has unprecedented sensitivity but is heavily limited to monitored targets. In this study, we present a novel hybrid proteomics approach that combines the sensitivity of Parallel Reaction Monitoring (PRM) and the scalable nature of Data Independent Acquisition (DIA). By exploiting the power of existing XcaliburTM method editor, we overlapped PRM-like small windows with DIA-like wide windows. PRM windows were designed to monitor 107 targets at 1.2Da window in a scheduled manner for 100ms and the ion packages using DIA wide windows were injected as brief as possible for 7500 resolution in an optimized sequence. By combining the two powerful quantitative methods, we have successfully monitored the 107 targets and extended the analysis by 3500 peptides in a single shot to observe proteoform shifts upon overexpression of murine epidermal MMP10 and perturbation with a tumor promoter agent (TPA). The best of two worlds method has identified lowly abundant MMP2,3,7,8,9,10,13 in murine skin without compromising sensitivity to previously established PRM assays and has identified an endogenous S100-A11 cleavage site by cathepsin D. The novel technique of hybrid Parallel Reaction Monitoring (hPRM) is a semi- discovery technique designed to target endogenously low-abundance peptides and discover untargeted proteins and their post-translational modifications.

Novel proteomics technologies to decipher protease networks in cells and tissues 81 5.1 Introduction

Targeted proteomics is a widely applied mass spectrometry technique to quantify low-abundance proteins in biological samples, reproducibly measure individual post-translational modifications, and evaluate a few hundred proteins at a time. To cover a wide array of peptides by targeted mass spectrometry, it is crucial to develop a mass spectrometry-based strategy for utmost sensitivity without compromising throughput. With a robust method the assay can ensure reproducible measurement of multiple proteins. Compared to other mass spectrometry methods, such as Data Dependent Acquisition (DDA) or Data Independent Acquisition (DIA), targeted proteomics surpasses all by its sensitivity (Schubert, Rst, Collins, Rosenberger, & Aebersold, 2017). However, in contrast to DDA shotgun or DIA approaches, targeted proteomics is a hypothesis testing technique, neglecting the surrounding proteome in the analyte. Targeted proteomics techniques, such as Selected Reaction Monitoring (SRM) and Parallel Reaction Monitoring (PRM) quantify relative amounts of peptides and proteins at the MS2 level using fragment ions instead of precursors. The main difference between PRM/SRM and DIA is that the latter requires the mass spectrometer to isolate selected ions using wide i.e 20m/z filtering windows over the available range of m/z, instead of a few approximately 1.2m/z size windows at specified spots of the m/z range. Over the years, a few more techniques have been introduced to improve DIA, as it was still lacking behind in sensitivity to targeted techniques. For example, a method termed Multiplexed Data Independent Acquisition (MSX-DIA) multiplexes the windows and reduces the noise levels of co-isolated ions (Egertson, et al., 2013; Sidoli, Fujiwara, & Garcia, 2016). Multi-Mode Acquisition (MMA) has implemented an algorithm which adjusts the size of the windows of the DIA scans to fill approximately the same amount of ions into a mass spectrometer with every scan (Williams, et al., 2016). The latest method named BoxCar utilizes online computation to calculate the optimum ranges (boxes) of MS1 signal to reduce the overwhelming complexity of matching it to MS2 signal of DIA-like windows (Meier, Geyer, Virreira Winter, Cox, & Mann, 2018). However, there have been limited efforts to increase the sensitivity, coverage and power of techniques that depend on real time data. For a long time, PRM has been limited by a very few targets per injection, unless the user of the technique could afford multiple injections (Peterson, Russell, Bailey, Westphall, & Coon, 2012). This drawback has been partially addressed by two techniques called Internal Standard Triggered PRM (IS-PRM) and Triggered by Offset, Multiplexed, Accurate mass, High-resolution, Absolute Quantification technique (TOMAHAQ). These methods increase the list of targets available for detection from 100 to around 300 dependent on the ions that are being measured (Erickson, et al., 2017; Gallien, Kim, & Domon, 2015). However, they still have a relatively low limit of target peptides compared to DIA. Furthermore, there is an ongoing debate about the coefficient of variation influenced by points per chromatographic peak to accurately and reliably quantify the area under the curve without losing too many identified proteins. Some made significant attempts to set the golden standard (Bruderer, et al., 2017) at 8 points per peak, but

82 Novel proteomics technologies to decipher protease networks in cells and tissues many remain skeptical due to the stochastic nature of peptide abundance across all samples. There is no common agreement about whether to use higher injection times or collect more points per peak. Finally, DIA techniques compared to targeted proteomics is still not as sensitive. Regardless of new MSX or MMA techniques there are a few elements that remain unsolved and unexploited. In order to address the issue of sensitivity versus number of identified proteins, we have developed a hybrid Parallel Reaction Monitoring (hPRM) strategy, which allows to detect the ions of interest and identify a significant number of proteins surrounding the targets. Co-isolation of ions is inherent to PRM targeted proteomics and even though many instruments can isolate ions in windows as low as 1.2 m/z, the target ion will be measured with at least a few co-isolated ions. Moreover, the power of an instrument is highly determined by the balance between injection time and scanning speed for ion identification and quantification. Thus, in hPRM, we have combined the lowest possible injection time with large DIA-like windows and small PRM-like windows with the longest injection time possible in a single run. This approach retains the sensitivity of the targeted proteomics and adds the unique feature of DIA to identify surrounding proteins. In this manuscript, we demonstrate the applicability of hPRM to target and quantify lowly abundant matrix metalloproteinases in murine skin, while concomitantly identifying hundreds of proteins and confirming a previously identified endogenous limited proteolysis event in the discovery analysis.

Novel proteomics technologies to decipher protease networks in cells and tissues 83 5.2 Methods 5.2.1 Mice Transgenic mice that overexpress a constitutively active mutant of MMP10 in basal keratinocytes (K14-MMP10) have been described previously (Krampert, et al., 2004). Mice husbandry was ensured according to local legislation and guidelines by EPIC facility at ETH Zurich, Switzerland. 5.2.2 K14-MMP10 murine skin processing 58-65 days old mice were anesthetized using ketamine/xylazine intraperitoneal injection, with the dosage adjusted to their weight. Approximately 4cm2 of back fur was shaven and treated with tetradecanoylphorbol acetate (TPA) 25μg/ml dissolved in acetone or acetone alone as a control.

Mice were placed back into their cages for 24 h before sacrificing them using CO2. 5.2.3 Heat shock dermis and epidermis separation, skin preparation Approximately 4 cm2 of back skin and 4 cm2 of belly skin (after shaving) were carefully cut out with surgical scissors. Belly skin and half of back skin were placed in an Eppendorf tube and instantly snap frozen using liquid nitrogen. The rest of the back skin was put incubated in PBS at 60oC for 30 sec, followed by cooling it down in 4oC PBS for 60 sec. The skin sample was then placed on a glass petri dish and the epidermis was scraped off from the dermis using a curved scalpel blade. Separated pieces were instantly processed using a Barocycler (Pressure BioSciences Inc.) or snap frozen for later analysis. 5.2.4 Murine skin protein extraction Extraction process is explained in detail in (Bundgaard, Savickas, & auf dem Keller, 2020). In brief, proteins were extracted using pressure cycling technology (PCT). 2-5mg of wet weight epidermis and dermis were collected using a 2mm punch biopsy needle (Sigma, WHAWB100076). Samples were added into a PCT tube with 30μl of lysis buffer (4M Guanidine hydrochloride with 50mM HEPES, pH 7.8). The vials were briefly sonicated and covered using PCT micro-pestles. Samples were placed in a barocycler (Pressure Biosciences, 2320EXT) and lysed using 60 cycles of 50sec at 45000psi, 10sec at atmospheric pressure heating the sample to 33oC. After processing samples were transferred into an Eppendorf tube for further tryptic digestion. 5.2.5 Tryptic digest of proteins Samples were reduced and alkylated by addition of 400mM tris(2-carboxyethyl)phosphine (TCEP) 1:20 (v:v) and 200mM Chloroacetamide (CAA) 1:10 (v:v) followed by incubation at 65oC for 30min, 1000rpm. Afterwards, samples were cooled on ice, diluted with 50mM HEPES pH7.8, 1:4 (v:v) and digested with LysC (Wako W01W0112-0254) (1:100 protease:protein ratio) for 4h, at room temperature (RT), 600rpm. Finally, the samples were diluted with 50mM HEPES pH7.8, 1:1 (v:v) and digested a second time with Trypsin (Promega, V5280) (1:50 protease:protein ratio) overnight at 37oC. The next day, the digestion was stopped by adding trifluoroacetic acid (TFA) (Sigma 302031) to a final concentration of 1% TFA.

84 Novel proteomics technologies to decipher protease networks in cells and tissues 5.2.6 STAGE-tip sample clean-up Sample pH was measured using pH strips (Sigma P4536-100EA) and adjusted to STAGE (STop And Go Extraction) tipping buffer A (2% Acetonitrile (Merck 34851-2.5L-M), 0.1% TFA) by titration. New STAGE tip columns were prepared using standard 200ul pipette tips and 1x Empore C18 plug (66883-U, ~5μg peptide capacity/plug). The columns were washed twice with 50μl 100% methanol , twice with 50μl 100% Acetonitrile and three times with 50ul buffer A with a plunger or centrifugation for 2min, 2000rpm at RT after every wash step. Samples were pooled into two groups, inactive and active, loaded on the columns, centrifuged for 2min, 2000rpm at RT and washed 5 times with 50ul buffer A. Peptides were eluted two-times by centrifugation 2000rpm for 2min at RT with 50μl of buffer B (80% Acetonitrile, 0.1% TFA) into protein LoBind tubes (Eppendorf) . Eluates were dried under vacuum at 45oC. Dried peptides were resuspended in MS buffer (2% Aceonitrile, 1% TFA), sonicated using 3 cycles 30s on, 30s off at High Power (BioRuptor, Diagenode)) and shaken at 1000rpm for 5min. Finally, peptide concentration was measured using a nanodrop (DS-11 FX, Denovix). 5.2.7 Reverse phase liquid chromatography EASY-nLC 1200 chromatography system with a 2cm x 75μm, Acclaim®PepMap100 trap column, packed with 3μm, C18 beads was used to load 500ng of sample peptides which were separated using a 50cm x 75μm, PepMap™ RSLC analytical column, packed with 2μm C18 beads (Thermo Fisher ES803A). Columns were washed with solvent A (0.1% FA) after sample loading and peptides gradient eluted with increasing concentrations of solvent B (80% Acetonitrile, 0.1% FA) with a flowrate of 250nl/min. The chromatographic gradient of solvent B was set from 6% to 60% over 60min: from 6% to 23% in 43min; From 23% to 38% in 12min; From 38% to 60% in 5min, followed by wash steps from 60% to 95% over 3min and a final wash with 95% for 7min. 5.2.8 Mass spectrometry Thermo Fisher scientific Q Exactive HF-X and Fusion Trybrid instruments were used to design the hybrid PRM methods. Local XcaliburTM Method Editor was used for each method. Each method took 70min. 5.2.9 Data analysis Parallel reaction Monitoring (PRM) data was analyzed using Skyline 19.1 (MacCoss Lab). Target peptides were imported after setting filter parameters at MS1 orbitrap to 60.000 resolution, at 200 m/z and MS/MS targeted filtering at 35.000 resolution, at 200 m/z. All precursor masses had at least 5 transitions. Elution patterns of the most confident peptides were selected for dot product of at least 0.98, allowing no more than 5ppm deviation of the transitions. Selected peaks were matched to the in-house target library. Area under the curve of each eluted transition was selected for further quantification using the “R” package “MSstats 3.12.3”. Proteoforms of independent injections were normalized according to the mean of total intensity and resulting intensities were plotted for data interpretation. To analyze wide type of scans in hPRM data we used Spectronaut 12 (Biognosys, Schlieren, Switzerland) directDIATM function. Targeted overlapped scans were analyzed with

Novel proteomics technologies to decipher protease networks in cells and tissues 85 Skyline as described above. directDIATM has been the preferred method of choice to identify peptides and link them to their respective proteins for discovery analysis.

86 Novel proteomics technologies to decipher protease networks in cells and tissues 5.3 Results

5.3.1 Method and implementation of hPRM Since introduction of targeted proteomics workflows, instrument capacities and software packages have been rapidly improved. We have exploited new instrumental limits to extract more information from the peptide scans and scale the coverage of targeted analysis. Using the same LC-MS injections, we have developed hybrid Parallel Reaction Monitoring (hPRM), a novel method for semi-discovery analysis (Fig. 1). The XcaliburTM software package, which by default is installed on Thermo Fisher Scientific Q Exactive series instruments was used to implement the hPRM workflow. The first step requires generation of an m/z inclusion list for wide windows and targeted ions. Each wide window was set to 11.5 m/z in size starting from 400m/z to 900m/z, which resulted in 45 target windows.

Figure 1- Illustrated principle of hybrid Parallel Reaction Monitoring method (hPRM). A). Instrumental setup of the method for optimal acquisition. Reverse phase separated peptide fragments are analyzed in a mass spectrometer according dual features of hPRM analysis: targeted analysis for peptides of interest and global discovery analysis for global proteome surrounding the targets. B). The two principles of hPRM. (l) The first scan is a narrow window scan. Upon injection of ions a quadrupole isolates a specific target mass with a narrow window (i.e. 1.2Da). Target ions are accumulated within that window for an extended amount of time (i.e. 100ms), followed by fragmentation and identification by a high-resolution mass analyzer, (i.e. Orbitrap). (ll)) The second scan is a broad window scan. A quadrupole isolates ions in a wide window (i.e. 25Da), which overlays with the previous narrow mass window. It accumulates the ions with the shortest time available, fragments the peptides and analyzes them with the lowest resolution available.

Novel proteomics technologies to decipher protease networks in cells and tissues 87 Next, targeted masses were added preselected from previous DDA runs for 107 targets based on their retention times and collision energies. In total a list of 152 mases with 107 of them containing retention times was used. After generating the inclusion list, the instrument needs steps to read in the list correctly. The software picks each mass of the inclusion list and applies them in separate modules one after another until the masses in the inclusion list are exhausted. Once it reaches a mass that has a specific retention time applied, the software will skip this mass until the chromatographic time point arrives. The current version of XcaliburTM requires the user to align the isolation list target time points and wide windows with modules triggering the selected mass scans (Fig. 2). First, wide window scans start the modules list. It performs 7500 resolution scans, using “auto” injection time, with window size of 11.5m/z and adjusted loop for 45 scans. The 45 scans correspond to first 45 indicated mases in the inclusion list, and these mases will be scanned across the whole chromatographic gradient. Secondly, 107 targeted scans follow the wide window scans in the modules list, which are all added separately. They all share a few parameters: 30000 resolution, 100ms injection time, isolation window size of 1.2m/z and a single loop scan. However, these targeted modules differ in scan start and end points. Each scan module represents the exact

Figure 2- Q Exactive Tune method editor set-up for hPRM analysis. The first DIA module triggers fast scans with lowest possible injection time (IT) across the whole chromatographic gradient of 70min. PRM modules with 100ms IT are added sequentially, and their retention time in the inclusion list (Supplemental information) is aligned with the chromatographic elution event in the method editor.

88 Novel proteomics technologies to decipher protease networks in cells and tissues retention time of one target mass indicated in the isolation list. After adjusting each target module with its unique time in the isolation list, we see 107 modules one after another (Fig. 2). Once the list of wide window and target modules is set, a full MS1 scan can be performed. This scan helps the instrument to perform internal AGC calibration, giving the user a possibility to trace heavy internal standard peptides (PRTC or iRT) and allowing some flexibility for post analysis. 5.3.2 Implementation of hPRM on Thermo Fisher Trybrid series instruments The default XcaliburTM software installation on Thermo Fisher Fusion series allows easy implementation of the hPRM method. The first step is to create two lists. One of them contains the information from the wide window scan with 45 predetermined masses between 400m/z and 900m/z and windows of 11.5m/z. The second list is a list of masses for the targeted type of experiment containing the mass, retention time and collision energy of targets. The lists are imported into two separate experiments. The first experiment is set up as a DIA experiment with 15000 resolution, auto injection time, 45 loops over 400 to 900m/z. These parameters are used to use DIA type of scans over the whole chromatographic gradient using the Ultra High Field Orbitrap mass analyzer. In the next step, the list with targeted type of masses is added to a new experiment line. Target masses are measured using 100 ms injection time, 35000 resolution and 1.2m/z isolation windows. The retention times adjusted to the target masses show the time each mass has to be measured. In contrast to the implementation of Q Exactive series instruments, the method is not split into 107 smaller modules but kept in the same module for the entire run. With the two modes of DIA and PRM following on each other the final single file contains information from both modes, with the possibility of an additional MS1 scan providing post-analysis flexibility. 5.3.3 Investigating murine skin proteome by hPRM Having implemented hPRM, we tested its applicability and sensitivity in the analysis of matrix metalloproteinases (MMPs) in murine skin, a system that had been well studied by PRM targeted proteomics in our laboratory (Savickas, et al., 2020). Using hPRM, we explored a subset of MMPs and the surrounding proteome and degradome in epidermal and dermal samples from wild-type and K14-MMP10 mice ( (Krampert, et al., 2004) that had been treated with a phorbol ester (TPA) or acetone control (Fig. 3A). In 500ng injections of total protein we were able to reliably identify and quantify over 3000 peptides (900 proteins) in epidermal and more than 3500 peptides (1000 proteins) in dermal samples, with 2188 (600 proteins) and 3307 (900 proteins) in samples from all genotypes and treatment conditions, respectively (Fig. 3B, Supplementary material). Among the identified peptides we also found multiple semi-tryptic peptides, presumably originating from proteolytic cleavage. As an example, we monitored two peptides (MNTELAAFTK and TEFLSFMNTELAAFTK) in murine dermis that originate from S100-A11, indicating a previously described cleavage event (Impens, et al., 2010). Upon TPA treatment we observed a significant increase of the Neo-N term peptide [MNTELAAFTK] originating from its spanning

Novel proteomics technologies to decipher protease networks in cells and tissues 89 peptide [TEFLSFMNTELAAFTK] (Fig. 3B). The spanning peptide did not increase in abundance upon TPA treatment, suggesting constitutive processing. In PRM mode of hPRM, we targeted six MMPs that are lowly abundant in dermal and epidermal tissues and compared their abundances in their respective proteomes. We observed high abundances of MMP3, MMP7 and MMP8 in wild-type epidermal tissue, while MMP9 and MMP13 were more abundant in the dermal compartment and MMP2 almost equally distributed between layers in murine skin. To demonstrate quantitative capabilities of hPRM, we measured MMP abundances after TPA treatment (Fig. 3C). Overall abundances of MMPs noticeably increased in treated compared to non-treated skin. MMP2, MMP3, MMP10 and MMP13 increased the most in abundance either in epidermal or dermal tissues compared to other targeted MMPs, whereby there was little change in MMP7 and MMP9 expression in TPA-treated mouse skin.

90 Novel proteomics technologies to decipher protease networks in cells and tissues Figure 3-Quantification of different MMPs in mouse skin using hybrid PRM technology. A). Experimental set up of the analysis. TPA-treated wild-type or MMP10-overexpressing mouse skin was separated into dermis and epidermis, followed by processing using PCT and hPRM analysis targeting core MMP peptides. B). Venn diagrams of discovery hPRM analysis. High degree of identification and quantification discovered numerous peptides and identified a known cathepsin D substrate S100-A11. C). Targeted analysis of MMPs by hPRM. Using 500ng of peptide of each dermis and epidermis proteome we see uneven distribution of MMP abundances between skin compartments and a general increase in response to TPA treatment.

Novel proteomics technologies to decipher protease networks in cells and tissues 91 5.4 Discussion

We introduce hPRM, a new proteomics method combining core advantages of targeted proteomics in sensitivity and reproducibility across samples with the ability to determine parts of the surrounding proteome by low-depth DIA. The method has been optimized to allow detection and quantification of very lowly abundant peptides and concomitant detection of hundreds of untargeted proteins, while cutting measurement time in half compared to sequential targeted and untargeted measurements. Hybrid PRM using the latest generation of Orbitrap instruments (HF-X and Exploris 480 as of August 2020) takes advantage of the speed and sensitivity of this type of mass spectrometers. Classical PRM or SRM techniques extended by DIA-type windows allow increasing the search space without significantly compromising sensitivity of targeted analysis. With an increasing speed of each mass spectrometer and increasing sensitivity, boundaries between targeted proteomics, data-independent or data-dependent acquisition are blurred. New developments like BoxCar, IS-PRM or now hPRM are adding to versatility in proteome characterization (Meier, Geyer, Virreira Winter, Cox, & Mann, 2018; Gallien, Kim, & Domon, 2015), By extending the targeted degradomics workflow, which is limited to identifiable targets (Savickas & Auf Dem Keller, 2017), into a semi-discovery space the analysis of proteases and their cleavage products can be complemented with knowledge about non-targeted surrounding events. This might facilitate elucidation of complex interplays of protease classes as demonstrated by our analysis of cathepsin D activity in response to changes in active MMP10 levels. Moreover, by inherently recording abundances of multiple housekeeping proteins, hPRM provides advantages in normalization between runs without the need of specific targeting. However, during the method testing phase, we observed a few shortcomings that we hope will be solved in the future. One of them is the sensitivity of the wide window scans. Since the goal of the technique is to keep the sensitivity of the narrow windows, the wide windows must sacrifice their injection time. With either larger ion packages entering the C-trap, shortened injection times, faster scan speeds or a multiplexing capacity the hurdle could be overcome. By being able to control the injection time of multiplexed large and narrow windows inside the C-trap, a single high-resolution scan of the ion package would enrich the spectrum. Also, the technique is still limited by the amount of concurrent precursors it can measure as any other targeted proteomics technique. This drawback is partially rescued by excluding high intensity peptides from the target list, if the wide window acquisition scan can detect and quantify it. Data processing was performed using two software tools, one for targeted analysis and another for open search. We were not able to identify a single software for such purpose. Furthermore, the data is more complex than traditional targeted experiment data and thus the chromatographic peaks look like spikes. The spikes are due to fluctuations in intensity between long injection and short injection times. Finally, data repositories like “massIVE.quant” and others collecting quantitative mass spectrometry data have to be adjusted to this data format (Choi, M., et al. 2020) Besides these

92 Novel proteomics technologies to decipher protease networks in cells and tissues drawbacks and necessary software solutions, we hope that the awareness of this technique will be useful for novel low yield, precious samples and semi-discovery driven research.

Novel proteomics technologies to decipher protease networks in cells and tissues 93 References

Schubert, O. T., Röst, H. L., Collins, B. C., Rosenberger, G., & Aebersold, R. (2017). Quantitative proteomics: Challenges and opportunities in basic and applied research. Nature Protocols. https://doi.org/10.1038/nprot.2017.040

Egertson, J. D., Kuehn, A., Merrihew, G. E., Bateman, N. W., MacLean, B. X., Ting, Y. S., … MacCoss, M. J. (2013). Multiplexed MS/MS for improved data-independent acquisition. Nature Methods. https://doi.org/10.1038/nmeth.2528

Sidoli, S., Fujiwara, R., & Garcia, B. A. (2016). Multiplexed data independent acquisition (MSX-DIA) applied by high resolution mass spectrometry improves quantification quality for the analysis of histone peptides. Proteomics. https://doi.org/10.1002/pmic.201500527

Williams, B. J., Ciavarini, S. J., Devlin, C., Cohn, S. M., Xie, R., Vissers, J. P. C., … Geromanos, S. J. (2016). Multi-mode acquisition (MMA): An MS/MS acquisition strategy for maximizing selectivity, specificity and sensitivity of DIA product ion spectra. Proteomics. https://doi.org/10.1002/pmic.201500492

Meier, F., Geyer, P. E., Virreira Winter, S., Cox, J., & Mann, M. (2018). BoxCar acquisition method enables single-shot proteomics at a depth of 10,000 proteins in 100 minutes. Nature Methods, 15(6). https://doi.org/10.1038/s41592-018-0003-5

Peterson, A. C., Russell, J. D., Bailey, D. J., Westphall, M. S., & Coon, J. J. (2012). Parallel Reaction Monitoring for High Resolution and High Mass Accuracy Quantitative, Targeted Proteomics. Molecular & Cellular Proteomics. https://doi.org/10.1074/mcp.O112.020131

Erickson, B. K., Rose, C. M., Braun, C. R., Erickson, A. R., Knott, J., McAlister, G. C., … Gygi, S. P. (2017). A Strategy to Combine Sample Multiplexing with Targeted Proteomics Assays for High-Throughput Protein Signature Characterization. Molecular Cell, 65(2). https://doi.org/10.1016/j.molcel.2016.12.005

Gallien, S., Kim, S., & Domon, B. (2015). Large-scale targeted proteomics using internal standard triggered-parallel reaction monitoring (IS-PRM). Molecular & Cellular Proteomics. Retrieved from http://www.mcponline.org/content/14/6/1630.short

Bruderer, R., Bernhardt, O. M., Gandhi, T., Xuan, Y., Sondermann, J., Schmidt, M., … Reiter, L. (2017). Optimization of Experimental Parameters in Data-Independent Mass Spectrometry Significantly Increases Depth and Reproducibility of Results. Molecular & Cellular Proteomics. https://doi.org/10.1074/mcp.RA117.000314

Krampert, M., Bloch, W., Sasaki, T., Bugnon, P., Rülicke, T., Wolf, E., … Werner, S. (2004). Activities of the matrix metalloproteinase stromelysin-2 (MMP-10) in matrix degradation and keratinocyte organization in wounded skin. Molecular Biology of the Cell, 15(12). https://doi.org/10.1091/mbc.E04-02-0109

Bundgaard, L., Savickas, S., & auf dem Keller, U. (2020). Mapping the N-Terminome in Tissue Biopsies by PCT-TAILS. In Methods in Molecular Biology (Vol. 2043). https://doi.org/10.1007/978-1-4939-9698-8_24

Impens, F., Colaert, N., Helsens, K., Ghesquière, B., Timmerman, E., De Bock, P. J., … Gevaert, K. (2010). A quantitative proteomics design for systematic identification of protease cleavage events. Molecular and Cellular Proteomics, 9(10). https://doi.org/10.1074/mcp.M110.001271

Savickas, S., & Auf Dem Keller, U. (2017). Targeted degradomics in protein terminomics and protease substrate discovery. Biological Chemistry. https://doi.org/10.1515/hsz-2017-0187

Savickas, S., Kastl, P., & auf dem Keller, U. (2020). Combinatorial degradomics: Precision tools to unveil proteolytic processes in biological systems. Biochimica et Biophysica Acta - Proteins and Proteomics, Vol. 1868. https://doi.org/10.1016/j.bbapap.2020.140392

94 Novel proteomics technologies to decipher protease networks in cells and tissues Choi, M., Carver, J., Chiva, C., Tzouros, M., Huang, T., Tsai, T. H., ... & Perez-Riverol, Y. (2020). MassIVE. quant: a community resource of quantitative mass spectrometry–based proteomics datasets. Nature Methods, 1-4.

Novel proteomics technologies to decipher protease networks in cells and tissues 95 6. Optimized SCope-MS for single-cell proteomics- Manuscript 6

To understand cellular heterogeneity at the protein level, we – in a project led by Dr. Erwin Schoof – have co-developed a single cell proteomics pipeline based on Single Cell ProtEomics by Mass Spectrometry (SCoPE-MS) (Budnik, Levy, Harmange, & Slavov, 2018) and termed single-cell mass spectrometry (scMS). To implement the scMS workflow, we FACS sorted 8227 AML patient primary cells into Leukemic Blasts (Blast), Leukemic Stem Cells (LSC), Leukemic Progenitors (Prog) and a mixture of all (Bulk) and monitored proteomic fingerprints of each cell type, one cell at a time. We labeled each cell independently with a TMT11plex tag and prepared it for a tandem liquid-chromatography mass-spectrometry run. We have adjusted the chromatographic gradient of our standard workflows to 100min with a short 15cm analytical column. Furthermore, we have increased the classical injection times from 20ms to 110ms, which has increased the amount of identified lowly abundant peptides. Furthermore, we minimized the “booster channel” interference with channels for single cells, which serves as a channel for peptide identification. We randomized cell types in the booster channel and ruled out technical bias. The confirmed proof of concept allowed us to further investigate and validate the underlying biology of the leukemia hierarchy. To discriminate among the cell types based on abundance on proteotypic signatures we extracted ~500 peptides with a quantifiable TMT11plex label and submitted to an in-house computational pipeline called “SCeptre”. By combining the power of principal component analysis (PCA) and eigen-protein annotation we identified cell type specific proteomes. Next, we compared the generated results with literature data and confirmed previous findings, provided proof-of-concept for single cell proteomics and have identified lead proteins as candidate markers for cellular identity. Thereby, we identified an average of 389 proteins per cell, of which 22 were proteases, including but not limited to caspases, aminopeptidase, cathepsins, calpains and proteasome components across all four cell types. Subsequent analysis showed cell type specific distribution

96 Novel proteomics technologies to decipher protease networks in cells and tissues 4500 4000 3500 3000 2500 2000 1500

Relative Relative intensity 1000 500 0 BLAST LSC PROG

Caspase-14 Cathepsin D

Figure 1- Single-Cell proteomics abundance profiles of selected proteases in three cell types. Three independent cells show distribution of caspase-14 and cathepsin D in their cell type. of proteases. Caspase-14 and cathepsin D protein abundance was highest in Leukemic Blast cells compared to the other two cell types. Leukemic Stem Cells exhibited the lowest levels of the two proteins (Fig. 1). This analysis demonstrated the ability of scMS to identify and quantify proteases at single-cell resolution, opening up new opportunities for degradomics research.

Novel proteomics technologies to decipher protease networks in cells and tissues 97 bioRxiv preprint doi: https://doi.org/10.1101/745679. this version posted August 24, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Title:

A Quantitative Single-Cell Proteomics Approach to Characterize an Acute Myeloid Leukemia Hierarchy

Running title:

Single Cell Proteomics in Leukemia

Author list Erwin M. Schoof1,2,3,4,5,6,#,*, Nicolas Rapin1,2,4,*,@, Simonas Savickas3, Coline Gentil1,2, Eric Lechman5,6, James Seymour Haile1,2,4, Ulrich auf dem Keller3, John E. Dick5,6, Bo T. Porse1,2,4,#

Affiliations 1 The Finsen Laboratory, Rigshospitalet, Faculty of Health Sciences, University of Copenhagen, Denmark 2 Biotech Research and Innovation Centre (BRIC), University of Copenhagen, Copenhagen, Denmark 3 Department of Biotechnology and Biomedicine, Technical University of Denmark, Lyngby, Denmark 4 Danish Stem Cell Centre (DanStem), University of Copenhagen, Denmark 5 Princess Margaret Cancer Centre, University Health Network, Toronto, Canada 6 Department of Molecular Genetics, University of Toronto, Toronto, Canada

* These authors contributed equally to this work

# Corresponding author: Erwin M. Schoof, email: [email protected] / [email protected] Bo T. Porse, email: [email protected]

@ Current address: Deloitte Consulting, Weidekampsgade 6, 2300 København S, Denmark

Key words: acute myeloid leukemia / computational biology / proteomics / single-cell approaches / tandem mass tags (TMT)

Final character count: 47,782

1 bioRxiv preprint doi: https://doi.org/10.1101/745679. this version posted August 24, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Abstract In recent years, cellular life science research has experienced a significant shift, moving away from conducting bulk cell interrogation towards single-cell analysis. It is only through single cell analysis that a complete understanding of cellular heterogeneity, and the interplay between various cell types that are fundamental to specific biological phenotypes, can be achieved. Single-cell assays at the protein level have been predominantly limited to targeted, antibody-based methods. However, here we present an experimental and computational pipeline, which establishes a comprehensive single-cell mass spectrometry-based proteomics workflow. By exploiting a leukemia culture system, containing functionally-defined leukemic stem cells, progenitors and terminally differentiated blasts, we demonstrate that our workflow is able to explore the cellular heterogeneity within this aberrant developmental hierarchy. We show our approach is capable to quantifying hundreds of proteins across hundreds of single cells using limited instrument time. Furthermore, we developed a computational pipeline (SCeptre), that effectively clusters the data and permits the extraction of cell-specific proteins and functional pathways. This proof-of-concept work lays the foundation for future global single-cell proteomics studies.

2 bioRxiv preprint doi: https://doi.org/10.1101/745679. this version posted August 24, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Introduction In recent years, single-cell molecular approaches such as RNAseq (sc-RNAseq) have revolutionized our understanding of molecular cell biology (Treutlein et al, 2014; Islam et al, 2014; Poulin et al, 2016; Trapnell, 2015; Paul et al, 2015). Gaining single-cell resolution of what occurs at the molecular level within single cells is of the utmost importance, particularly within in cancer biology, where it has long been known that tumors consist of a multitude of cell types, all acting in concert. (Levitin et al, 2018; Jerby-Arnon et al, 2018; Rodriguez- Meira et al, 2019; Nam et al, 2019; van Galen et al, 2019) Similarly, in complex biological organs such as the hematopoietic system, it is the complex interplay of various cell types and differentiation stages which defines a healthy or malignant state (Bonnet & Dick, 1997; Mercier & Scadden, 2015; Kreso & Dick, 2014; Bahr et al, 2018; Notta et al, 2016; Shlush et al, 2017; Lauridsen et al, 2018; Lapidot et al, 1994). Thus, technologies which are unable to resolve the molecular landscapes at single-cell resolution, like the more traditional Mass Spectrometry (MS) based proteomics approaches that typically require an input of hundreds of thousands of cells, are not sufficient to gain an understanding of cellular heterogeneity and of underlying signaling networks within individual cell types. Due to technical and practical limitations in terms of instrument sensitivity and experimental workflows thus far, single-cell MS (scMS) has been elusive, leading to an initial focus on sc-RNAseq based approaches. While RNA-based methods have been informative about the RNA landscapes in a plethora of biological systems and have demonstrated high clinical relevance(Eppert et al, 2011; Duployez et al, 2019; Ng et al, 2016), their accuracy has proven limitation when used as a proxy for protein levels (Vogel & Marcotte, 2012; Khan et al, 2013). Therefore, to gain a thorough understanding of what occurs in a cell at the protein level, on a global scale, MS- based approaches are the sole way to accomplish this. Being the cellular workhorses, there is much knowledge to be gained from mechanisms occurring at the protein level, either through enzyme activity, post-translational modifications or protein degradation/proteolysis; hence the great need for protein level approaches at the single-cell level.

A few years ago, a novel type of flow cytometry was established; by combining traditional flow cytometry workflows with mass spectrometry, a new analysis method termed Mass Cytometry was developed, more commonly referred to as CyTOF (Newell et al, 2012; Bodenmiller et al, 2012). This allows the simultaneous readout of tens of markers simultaneously, allowing single-cell analysis of pre-defined sets of proteins or post- translational modifications (PTMs). This method, however, relies heavily on previously

3 bioRxiv preprint doi: https://doi.org/10.1101/745679. this version posted August 24, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

validated reagents and antibody panels, and is thereby inherently limited in terms of proteome coverage and which cellular signaling networks can be interrogated. Thus, while CyTOF represents a dramatic leap forward in terms of quantitative, protein-level analysis of single cells, there are a few inherent limitations that have prevented the technology from gaining universal recognition as a protein-level alternative to sc-RNAseq.

In this work, we have established a novel experimental workflow that allows the global characterization of single cell proteomes without the need for antibodies, and we conducted a proof-of-concept study in a primary Acute Myeloid Leukemia (AML) hierarchy, consisting of leukemic stem cells (LSC), progenitors and blasts (Lechman et al, 2016). This primary culture system, termed ‘8227’, was derived from an AML sample where the patient had relapsed after treatment. By culturing under serum-free, growth-factor supplemented conditions, the hierarchical nature of AML is maintained, with LSC at the apex, differentiating to progenitors and terminally differentiated blasts; most traditional AML cell lines lack such a hierarchical structure. Blasts (characterized as CD34-) are the dominant population in the culture, but in vivo and in vitro long-term maintenance relies on the LSC (CD34+CD38-) and progenitor (CD34+CD38+) cells, thereby closely mimicking the in vivo hierarchical nature of primary AML (Fig. 1A). In order to successfully eradicate an AML in patients, we must not only target the blasts, but also the LSC in order to prevent relapse. However, due to their low abundance, studying LSC from a molecular perspective is challenging and has been limited to bulk-sorted approaches thus far (Raffel et al, 2017). The 8227 culture system provides us with an ideal proof-of-concept system, as the functional heterogeneity within this culture system has previously been evaluated and is readily isolated through FACS sorting (based on classical CD34/CD38 stem cell markers) (Lechman et al, 2016; Kaufmann et al, 2019). Modeling these functional differences using our molecular data would provide proof-of-principle that our workflow is able to distinguish differentiation stages in a complex cellular hierarchy.

Our approach builds on a series of recent developments in the low-input proteomics field (Kulak et al, 2014; Lechman et al, 2016; Schoof et al, 2016; Wojtowicz et al, 2016; Klimmeck et al, 2012; Cabezas-Wallscheid et al, 2014), and focuses on minimizing sample loss throughout the experimental protocol. Subsequently, by utilizing a ‘booster’ channel to provide additional peptide copies (and thus, ions for MS identification), combined with Tandem Mass Tag (TMT) labeling, we are able to derive quantitative information about

4 bioRxiv preprint doi: https://doi.org/10.1101/745679. this version posted August 24, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

protein levels in 10 single cells per MS injection. A similar approach has recently been published (Budnik et al, 2018), also utilizing the TMT technology, but which does not employ FACS sorting to isolate the cells of interest, has not demonstrated the ability to differentiate between various cell types or differentiation stages originating from the same starting population, and has not applied state-of-the-art computational single-cell analysis. In order for scMS to be a viable alternative to sc-RNAseq, it needs to 1) be able to match the throughput capacity, 2) cover the same order of magnitude in terms of number of unique proteins detected and 3) be easily implementable in a wide range of cellular assays. Therefore, we opted to use a 96-well plate format, into which cells can be sorted by standard FACS sorting, thereby providing medium-high throughput, omitting the requirement for expensive consumables, and being amenable to automated liquid-handling systems. This type of experimental workflow puts our method in line with sc-RNAseq workflows in terms of throughput and ease of implementation. Maximizing proteome coverage was addressed by optimized experimental workflows and utilizing the latest generation of MS instruments. In order to be able to draw conclusions about single cell molecular phenotypes, it is imperative to be able to quantify hundreds of unique proteins, which was a vital criterium of our proposed workflow. Finally, in order to utilize the scMS data to its full potential, we adopted the latest algorithms from the sc-RNAseq field, and implemented them on our protein-level data. Thus, this work represents a proof-of-concept demonstration of the method and a data resource of a leukemia hierarchy, while additionally providing the community with a universally deployable software package that allows researchers from any field to analyze their own single-cell proteomics or other expression data.

Results Challenges to overcome One of the main technical challenges in conducting single-cell proteomics using MS, is the inherent sensitivity issue of MS instruments requiring enough ion (i.e. peptide) copies to successfully sequence, and thereby identify, the peptide. Depending on instrument type, this threshold rests anywhere between 10,000 – 100,000 ions and defines the lower limit of detection. To overcome this limitation, we utilized the main strength of TMT technology, namely the possibility to multiplex samples, while still being able to resolve sample-specific quantitative protein levels. Moreover, to boost the number of ions available for MS identification even further, we deploy a ‘booster’ sample, consisting of 500 cells, and dedicate a single TMT channel to that. The current iteration of the TMT technology has

5 bioRxiv preprint doi: https://doi.org/10.1101/745679. this version posted August 24, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

eleven channels available, thus meaning that a single TMT sample can contain up to ten single cells plus the booster channel(s) (Fig. 1B). By multiplexing samples, the MS instrument has the ion equivalent of peptide copies from 510 cells available in total, leading to the successful identification and quantitation of a representative subset of the cellular proteome. For the experiments described below, we created single cell TMT pools for bulk cells (live cells covering the entire culture), blasts, progenitors and LSC, as we are able to FACS sort these populations according to the sorting scheme in Fig. 1B.

An additional challenge is the sample loss associated with the upstream experimental workflow, where proteins and peptides bind non-specifically to the plastic surfaces of the tubes they are contained in. When starting with extremely limited material such as single cells, minimizing these non-specific sample losses is of utmost importance. From our pilot experiments, we found that using Eppendorf LoBind technology was very effective at minimizing these losses and hence all our experiments are done in LoBind PCR plates and microtubes. To assist with minimizing sample loss, we follow and further adapted the iST approach (Kulak et al, 2014), by processing samples all in a single reaction chamber. Cells are FACS-sorted directly into a LoBind 96-well PCR plate, the lysis is done directly after sorting, and the solubilized proteins are digested and TMT-labelled in the same well, thereby minimizing transfer-associated sample loss. After acidification, the single cell peptide samples are pooled with their respective booster channel, and desalted using StageTip technology (Rappsilber et al, 2007).

The final and biggest technical challenge restricting single-cell proteomics is closely related with impediments in the sc-RNAseq field, namely the computational analysis and resolving the individual cell-types based on the available molecular data. To overcome this challenge, we adapted several of the latest state-of-the-art algorithms from the sc-RNAseq field, and tailored them to be amenable for our MS data (Fig. 1C). The resulting computational pipeline was termed ‘SCeptre’ (Single Cell proteomics readout of expression). Visualization of the data is facilitated through the use of SCANPY (Wolf et al, 2018), which allows one to adapt the embedding of choice (tSNE, UMAP, etc.) and explore the data visually. In order to account for experimental batch effects between sample injections and possible loss of protein groups between the same population, we rely on a proven technique from sc-RNAseq, where proteins most commonly expressed across all cells are used to compute an embedding of cells in order to see their relative positions in expression space. We use batch correction

6 bioRxiv preprint doi: https://doi.org/10.1101/745679. this version posted August 24, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

(Haghverdi et al, 2018) on a reduced set of proteins commonly present in all populations (Step 1). The key parameter for defining this is “sigma”, which controls the percentage of proteins that are allowed to be missing in a given population. This can be adjusted for each dataset independently, in order to explore the most appropriate threshold levels for individual experimental setups. Once a set of common proteins is defined (Step 2), a feature engineering step is applied in the form of augmenting the data of protein expression with biological pathways (Step 3). In this proof-of-concept work, it was decided to use wiki pathways (Slenter et al, 2018), but other sources of pathway information or gene signatures can be used as well, as we have implemented support for the generic Gene Matrix Transposed (.gmt) pathway format widely used in the field(Subramanian et al, 2005). Next, we compute a correlation network for all features (protein and pathway expression, step 4). Features are set as nodes in the network and two nodes are linked together when they achieve a correlation coefficient across all cells in the dataset above a specified threshold. The complements larger than four nodes are then used to compute an “Eigenprotein” value from all proteins and pathways within that component for all cells (Step 5). Finally, the Eigenproteins (or so-called ‘cliques’) are used to compute a UMAP (Uniform Manifold Approximation and Projection) embedding of all cells in the dataset (Step 6), which is the core of the visualizations shown in Figure 4. Combined, this workflow allows the determination of which proteins and pathways might play a role in defining the various cellular phenotypes. SCeptre is available as a docker image that contains all the processing steps and libraries readily available through an ipython notebook.

Designing an optimal experimental workflow To determine the optimal experimental workflow, we explored various parameters to assess two key parameters for successful single-cell proteome analysis: 1) the choice of booster channel cells, and 2) instrument parameters. For the booster channel, in order to ensure the most accurate protein representation of the single cell proteomes within the booster channel, and exploit the nature of single-cell analyses to the fullest, it is imperative to choose the correct cell type, given the cell-specific protein expression levels that help distinguish cellular phenotype. To investigate the (dis)advantages of using various types of booster channel cells (bulk cells, differentiation stage specific and both), we repeated the experiments using the different types of booster channels and subsequently interpret the results. For the instrument parameters, we wished to investigate the pros and cons of conducting TMT quantitation at MS2 level or MS3 using the SPS TMT MS3 methodology (Ting et al, 2011; Hogrebe et al,

7 bioRxiv preprint doi: https://doi.org/10.1101/745679. this version posted August 24, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

2018; McAlister et al, 2014); while the former method generally results in greater numbers of proteins identified, the quantitative accuracy tends to be affected by co-isolating peptides, which is resolved by the latter method, at the cost of sequencing speed, and consequently, number of protein identifications. Combined, these two types of investigations should provide us with a comprehensive catalogue on how to extract and utilize single cell proteomics data most effectively.

Effect of the choice of Booster Channel cells To investigate the importance of the choice of booster cells, we set out to compare the results using 1) bulk, 2) differentiation stage specific, and 3) a combination of both bulk and differentiation stage specific booster channels (i.e. using two TMT booster channels). A brief overview of the result statistics can be found in Table 1, where it is clear that when using two types of booster channels simultaneously, the gain in extra ions for identification significantly increases the number of proteins identifiable in the single cells (2,138 proteins, compared to 1,259 proteins in the bulk cell booster channel only). Likely, this increase in number of proteins identifiable is also due to the fact that differentiation stage specific proteins (such as CD34, which has elevated expression levels in progenitors and LSC) can be captured with the cell-specific booster channels, which would be lost in the population average provided by the bulk cell boosters. Moreover, comparison of PCA clustering of the three types of booster channels reveals that distinguishing various differentiation stages and accurately clustering them together becomes hampered when not using cell-specific boosters (Fig. 2A). Indeed, only when a cell-specific booster channel is used, do the blasts separate clearly from the progenitors and LSC, which is a key indicator of the ability to determine cellular phenotype from the molecular protein-level data. However, standard PCA analysis often falls short for interpreting single cell molecular analyses comprehensively; hence, we processed the data using SCeptre, in order to determine whether the single cell proteomics data would enable us to differentiate between the three differentiation stages, and whether specific cell clusters could be identified within the respective cell populations. As portrayed in Figure 2B, it is evident that, in fact, SCeptre is able to distinguish the various differentiation stages, irrespective of booster channel used, and clusters together single cells of the same differentiation stage in all three datasets. Especially the UMAP embedding is successful at highlighting the correct cell clusters, although tSNE is also able to generally cluster according to differentiation stage. As expected, bulk cells are generally placed between the blasts and progenitors/LSC, given that they consist of all three differentiation stages. The fact

8 bioRxiv preprint doi: https://doi.org/10.1101/745679. this version posted August 24, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

that they are generally most closely located to the blasts can be explained by the fact that ~90% of the 8227 cell culture system is made up of blast cells, which are therefore over- represented in the bulk cell population, thereby rendering them molecularly most similar to the blasts. When taken together, these results strongly suggest that our experimental workflow is able to capture enough proteome depth to decipher cellular phenotype at the single cell level, and that SCeptre is able to resolve these cell populations from the protein expression data alone.

Effect of MS Instrument Type To exploit the latest capabilities in resolving TMT co-isolation effects, we utilized the ThermoFisher Orbitrap Fusion instrument to analyse a subset of samples (40 single cells for each of the three differentiation stages and bulk cells, i.e. 160 single cells total, combined with a differentiation stage specific booster channel of 500 cells) using the TMT SPS MS3 workflow(McAlister et al, 2014; Ting et al, 2011). We were interested in exploring whether the increased quantitative accuracy of such an instrument workflow would be beneficial for deciphering cellular phenotypes. Given the inherent lower proteome coverage with such an approach however, we also wanted to investigate whether the resulting proteome depth would still be sufficient for clustering the different differentiation stages correctly. When plotting this data using a standard PCA analysis (Fig. 2C), very little degree of separation between the different differentiation stages can be observed, and not as extensive as from our dataset using MS2-based quantitation. However, when analysing the same data using SCeptre (Fig. 2D), it becomes clear that the protein-level data was in fact sufficient to cluster the differentiation stages correctly, with the single cells generally clustering together by differentiation stage, especially in the UMAP-embedded visualization of the data.

Booster Channel control experiments Encouraged by our initial results of being able to cluster differentiation stages correctly, irrespective of which booster channel we used, we next set out to explore whether mixing single cells of different differentiation stages, combined with a booster channel of one differentiation stage only, would still allow us to resolve differentiation stages correctly. We hypothesize that if we are truly reading out single cell proteomes, irrespective of type of booster channel used, the booster channel of one differentiation stage should not influence the final clustering of the single cells from other differentiation stages that were analysed in unison. To this end, we sorted two 96-well plates according to Figure 3A, where for each

9 bioRxiv preprint doi: https://doi.org/10.1101/745679. this version posted August 24, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

differentiation stage, we prepared two TMT pools with that specific differentiation stage as booster channel, while mixing in single cells of all three differentiation stages and bulk within the same TMT pool. Thus, in total, 16 single cells of each differentiation stage were analysed, in combination with a booster channel originating from each of the differentiation stages. Resulting samples were then analysed using both MS2 and MS3-level quantitation, and we tested whether the single cells clustered according to differentiation stage. Total proteome coverage was around 1,000 proteins across 72 cells (Fig. 3B). When subjected to standard PCA analysis, there was no clustering according to differentiation stage (Fig. 3C); in fact, the single cells clustered almost exclusively according to the TMT pool they originated from. No significant difference was observed between MS2 and MS3-level quantitation, and even with the more accurate MS3 approach, no correct differentiation stage clustering was apparent. However, when analysed using SCeptre, single cells strikingly clustered perfectly according to differentiation stage, especially in the MS3 dataset (Fig. 3D). These results are highly critical, as they indicate we truly are measuring single cell proteomes as opposed to being significantly influenced by the booster channel contents; while differentiation stage specific booster samples are important for detecting cell-specific proteins, they are not imperative for being able to correctly cluster the differentiation stages using the protein-level information only. Moreover, these experiments also show that even using small cell numbers, with limited per-cell proteome coverage of several hundred proteins (Fig. 3D), we are still able to detect different differentiation stages and extract those proteins that may be key to their functional phenotypes.

Eigenprotein Analysis of scMS data Having established that the molecular protein expression data from our experimental pipeline is of enough proteome depth and quantitative accuracy to correctly cluster differentiation stages, we next set out to improve the clustering ability even further, and try to decipher those proteins and functional pathways which may be fundamental to defining cellular phenotype. Being able to find cellular sub-clusters within a cell type of interest is of great benefit when trying to deduce cellular heterogeneity, as this may allow the determination of even purer cell populations within a previously deemed homogeneous cell type. To this end, we employed a concept from the RNAseq field, namely that of Eigenproteins. By combining protein expression patterns with protein interaction (i.e. pathway) information of those proteins, one can potentially derive functionally unique clusters within a cell type of interest, thereby spanning the bridge between protein expression and functional implications of those proteins.

10 bioRxiv preprint doi: https://doi.org/10.1101/745679. this version posted August 24, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

This has been applied in the genomics field before (Han et al, 2017; Cheng et al, 2017; Agrahari et al, 2018), but to the best of our knowledge, has not been explored in the proteomics field thus far. By combining protein expression with protein interaction information, our pipeline generates so-called ‘cliques’, which are subsequently used to re- cluster the single cells on an embedding of choice (UMAP or tSNE). To demonstrate the utility of the Eigenprotein approach, we focused on dataset number three (using differentiation stage specific booster channels), as it contains the largest number of cells and total number of proteins identified, and it should contain the differentiation stage specific proteins that would be lost when using bulk cells as booster channels. When supplementing the protein expression levels with protein interaction information, a different clustering patterns emerges (Fig. 4A), where cells of one differentiation stage are clustered in a more cloud-like fashion. In this analysis, a total of 54 cliques were identified (Supplementary Figure 1), which are sufficient and able to cluster the single cells according to differentiation stage. When looking at the UMAP representations of these Eigenprotein cliques (Fig. 4B), one can clearly distinguish differentiation stage specific clusters, suggesting that those cells are utilizing functional signalling pathways in a similar manner. These clusters can then be related back to the Eigenprotein differentiation stage clustering (Fig. 4A) to determine which specific differentiation stages are having those pathways up- or down-regulated. The pathways associated with the Eigenprotein cliques shown here (Eigenproteins 16, 26 and 22) correspond to “Wnt Signaling/Pluripotency”, “VEGF/Fas ligand/p38_MAPK signaling”, and “mir-124 predicted interactions with cell cycle and differentiation”; relating back to the differentiation stage clustering, it appears that these pathways are important for Progenitors/LSC, LSC and blasts respectively. For a complete overview of the Eigenprotein cliques identified in this dataset, they are all listed in Supplementary Table 1. This analysis workflow presents a meaningful way to explore the data, both visually and functionally, and to get insights into what functional pathways are active within the various differentiation stages, thereby providing input for downstream functional validation.

Towards finding cell-specific proteins To further enhance the differentiation stage specific analysis, at the single-cell level, we wanted to be able to find those proteins whose expression levels most extensively define cellular phenotype. To this end, we deployed the “rank_genes” function from SCANPY (Wolf et al, 2018), which is integrated in SCeptre. This allows us to extract those proteins

11 bioRxiv preprint doi: https://doi.org/10.1101/745679. this version posted August 24, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

that are ranked highest for defining a differentiation stage of interest, and plot them (Fig. 4C). For this illustration, the Wilcoxon-Rank-Sum statistical test was used. We then plotted the statistically significant proteins in a heatmap, illustrating the expression level of those proteins across the various cell populations, within each individual cell separately (Fig. 4D). In order to link these results back to our Eigenprotein embedding, the computational pipeline also enables the plotting of individual protein expression levels on any embedding of choice. To illustrate this more clearly, we plotted the top three proteins in the LSC population and the blast population in Figure 4E. From these results, we can find exactly those cells that show an increased or decreased expression level for the proteins of interest, and thereby interpret the statistical results in more detail. When looking at the targets highlighted by the Wilcoxon analysis, in the LSC population specifically, it is interesting to note that SWAP70 has previously been linked to AML development in a murine setting (Erkeland et al, 2004). Moreover, it is predicted to interact with NPM1 (Stelzer et al, 2016), which is frequently mutated in AML (McKerrell et al, 2015; Krönke et al, 2013). Combined with our single cell analysis potentially highlighting it as an LSC protein, seems to warrant future functional follow-up to investigate its exact role in disease development. The next target on the list, DDX46, has previously been shown to be required in hematopoietic stem cell activity in zebrafish (Hirabayashi et al, 2013). While it has not been shown in a leukemic context thus far, its role in hematopoietic stem cell differentiation, a process which has gone awry in AML, renders it a potentially highly interesting target. Together, these results indicate that the data is depicting several potentially relevant proteins for leukemia disease morphology, and that our experimental and computational pipeline is able to derive them effectively, from single cell proteomics data, something which is completely unprecedented.

Discussion This work represents a proof-of-concept study, investigating whether current technology is able to conduct single-cell proteomics analysis on a bio-therapeutically relevant model system. By spending considerable effort not only on the sample preparation and data generation, but also on the subsequent data analysis efforts, we managed to establish a scMS workflow package which is able to 1) quantify thousands of unique proteins, 2) analyse hundreds of cells per day of instrument time, 3) visualize the data using the latest state-of- the-art single cell computational algorithms, and 4) derive differentiation stage specific proteins which may pose as potential therapeutic targets or other functionally relevant candidates.

12 bioRxiv preprint doi: https://doi.org/10.1101/745679. this version posted August 24, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

As single-cell approaches put significant strain on throughput requirements, we focused especially on having an experimental workflow that would allow easy preparation of large cell numbers. As is the case for any single cell analysis, the more cells that can be analysed, the more knowledge can be extracted about the model system under investigation. By using standard FACS sorting methodology, even for very low frequency cell populations, we are able to sort several 96-well plates per hour, including the eight booster channel samples per plate, consisting of 500 cells each. Simultaneously, this allows for including index sorting in the experimental setup, thereby allowing a link to be drawn between fluorescent surface markers and expression levels of all detected proteins within the cells. Future experiments will focus on porting the workflow to a 384-well plate format, to further increase the throughput capacity, which will be very well matched with the newly released TMT 16-plex reagents. By simultaneously decreasing sample volumes, reaction kinetics should be improved, thereby potentially boosting proteome coverage as well. Similar approaches using nanofluidics have been shown very effective when analysing small sample amounts (Zhu et al, 2018), and it is likely that this platform would be beneficial to scMS as well; however, the expensive consumables associated with such an approach make large-scale investigations very costly, hence why 384-well plates may be an effective compromise.

We demonstrated the utility of the booster channel, and that when paired with appropriate computational workflows, we are able to correctly cluster cells irrespective of which type of booster cells were used. Nevertheless, if possible, we would recommend the use of a cell- specific booster channel, in order to detect the cell-specific proteins; in our case, we used CD34 as a trial candidate, as we know this should only be found at high abundance in the LSC/Progenitor cells. We were able to confirm this (Supplementary Figure 2), but more importantly, this protein was only detected when a cell-specific booster channel was used, thereby underlining the importance of using such a booster type. However, the fact that cells clustered correctly in all cases suggests that, e.g. in cases of very low abundant cell types, other cells could be used. This would be of great advantage in studies focused on very rare cell populations. With improvements at the MS instrument level, lower booster cell numbers (tens of cells rather than hundreds) would still be able to produce useful proteome depth, so in future studies, cell abundance should not be a limiting factor for scMS. Alternatively, it could be opted to deploy targeted peptide libraries as booster channel; by TMT-labeling those reference peptides, the MS instrument can be steered towards analysing a set of proteins of

13 bioRxiv preprint doi: https://doi.org/10.1101/745679. this version posted August 24, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

interest, without needing additional cell numbers to provide the peptide copies. This will require further optimization, but in principle can be a powerful complementary booster method, especially in cases where cell-specific booster channels are difficult to generate, e.g. in the case of smaller multi-cellular organisms. Simultaneously, this can help ensure that proteins of interest will be quantified, compared to the partially stochastic nature of data- dependent analysis workflows. While it remains to be tested, peptide libraries could theoretically even open up the possibility of studying PTMs such as phosphorylation events, and could thus be employed to study common cell processes such as cell cycle, kinase activity etc.

Regarding the instrument parameters related to TMT quantitation, we did not observe a significant improvement when using MS3 level quantitation compared to MS2. This may be due to the fact that we analysed fewer cells, and due to slower cycle time, were able to quantify fewer proteins. It is likely that a newer generation of the ThermoFisher Tribrid instruments (such as the Orbitrap Eclipse) would offset this difference in number of proteins identified compared to MS2 quantitation. However, this lack of improvement, combined with the efficient clustering capabilities of MS2 level data, also suggests that the TMT co-isolation effect is not strong enough to negatively affect our readouts; nevertheless, alternative tagging technologies such as EASI-tags could be explored in future work to determine their compatibility with scMS (Winter et al, 2018). The suitability of TMT for this workflow is further supported by the fact that we can use the raw intensities as provided by Proteome Discoverer for our computational analyses, and are therefore not subject to normalization and correction effects that are sometimes opted to include when using TMT. This could potentially have implications for merging several experiments retrospectively, since using the raw intensities means that different datasets should be more compatible due to the lack of post-acquisition processing requirements.

By porting several of the latest state-of-the-art algorithms developed in the more established sc-RNAseq field, we are able to utilize the knowledge that has been gained over the past years to our advantage. One of the main strengths of our computational pipeline is to significantly enhance the ability to cluster cell types according to different data types (e.g expression levels or Eigenprotein cliques), the subsequent visualization thereof and extraction of highly relevant proteins. As the main embeddings (tSNE and UMAP) commonly used in single-cell approaches each have their own (dis)advantages (Zhu et al, 2018), we opted to

14 bioRxiv preprint doi: https://doi.org/10.1101/745679. this version posted August 24, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

give the user the opportunity to plot both and choose according to their experimental setting. When comparing the clustering based on expression levels only with the clustering based on the Eigenprotein integration, it becomes very clear why single-cell analysis is so important when trying to understand cellular phenotype; while specific protein expression patterns may not always be consistent amongst all the single cells of a differentiation stage, the pathway activity for a particular pathway often does seem to span across a large subset of that differentiation stage (Supplementary Figure 1), indicating that functionally, they are displaying a similar phenotype that would have gone unnoticed when using expression data only. It simultaneously translates protein expression levels to a more functional interpretation thereof, which can be powerful when trying to functionally validate certain observations. Furthermore, it allows detection of functionally similar cells across different cell types, which can be informative of which cellular pathways are shared between, and which ones are unique to a certain differentiation stage. This may not have been picked up by protein expression levels alone.

In conclusion, this work presents the first time that a true, pure LSC proteome is published, while simultaneously being the first single-cell analysis of a leukemia hierarchy at the global proteome level. While it focuses on a single AML sample, it should nevertheless be a good foundation as a resource for follow-up studies, and paves the way for studying primary leukemias. Furthermore, by providing the community with the experimental protocols, combined with a powerful computational analysis pipeline, we strongly believe our scMS approach is now a real alternative to conducting sc-RNAseq analyses, especially in those cases where protein-level information is desirable. This opens up a plethora of research avenues, spanning across many biological fields, and proteome coverage will only improve as instrument sensitivity and experimental workflows develop even further, closing the gap between RNA-based and protein-based approaches.

15 bioRxiv preprint doi: https://doi.org/10.1101/745679. this version posted August 24, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Materials and Methods Cell Culture and FACS Sorting 8227 cells were grown in StemSpan SFEM II media, supplemented with growth factors (Miltenyi Biotec, IL-3, IL-6 and G-CSF (10ng/mL), h-SCF and FLt3-L (50ng/mL), and TPO (25ng/mL) to support the hierarchical nature of the leukemia hierarchy captured within the cell culture system. On day 6, cells were harvested (8e6 cells total), washed, counted and resuspended in fresh StemSpan SFEM II media on ice at a cell density of 5e6 cells / ml. Staining was done for 30mins on ice, using a CD34 antibody (CD34-APC-Cy7, Biolegend, clone 581) at 1:100 (vol/vol) and CD38 antibody (CD38-PE, BD, clone HB7) at 1:50 (vol/vol). Cells were washed with extra StemSpan SFEM II media, and subsequently underwent three washes with ice cold PBS to remove any remaining growth factors or other contaminants from the growth media. Cells were resuspended for FACS sorting in fresh, ice cold PBS at 2e6 cells / ml. Cell sorting was done on a FACSAria I or III instrument, controlled by the DIVA software package and operating with a 100um nozzle. Cells were sorted at single-cell resolution, into a 96-well Eppendorf LoBind PCR plate (Eppendorf AG) containing 40ul of 50mM HEPES pH 8.5. In each row, wells 1-10 were filled with single cells, and well 11 was filled with 500 cells for the booster channel. Directly after sorting, plates were briefly spun and then boiled at 95C for 5mins, followed by sonication in a waterbath sonicator (VWR) for 2 mins to complete the lysis. Plates were then stored at -80C until further sample preparation.

Mass Spectrometry Sample Preparation After thawing, the lysates were treated with Benzonase (Sigma cat. nr. E1014), diluted to 1:500 (vol/vol) for 1hr at 37C to digest any DNA that would interfere with downstream processing. Subsequently, 25ng of Trypsin (Sigma cat. nr. T6567) was added to the single cell samples, 50ng of Trypsin was added to the 500-cell booster channel samples, and the

16 bioRxiv preprint doi: https://doi.org/10.1101/745679. this version posted August 24, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

plates were vortexed and kept at 37C overnight to complete the protein digestion. The next morning, peptides were labelled with TMT (tandem mass tag) reagents according to manufacturer’s instructions. Briefly, 85mM of each label was added to the single-cell samples, while the 500-cell booster channel samples were labelled with 170mM of reagent. The labelling reaction was quenched with 2.5% Hydroxylamine for 15mins, after which peptides were acidified to a final concentration of 1% TFA, and TMT pools were mixed from 10 single cells + 1 booster channel to make up one 11-plex TMT sample each. The acidified TMT pools were subsequently desalted using in-house packed StageTips (Rappsilber et al, 2007). For each sample, 2 discs of C18 material (3M Empore) were packed in a 200ul tip, and the C18 material activated with 40ul of 100% Methanol (HPLC grade, Sigma), then 40ul of 80% Acetonitrile, 0.1% formic acid. The tips were subsequently equilibrated 2x with 40ul of 1%TFA, 3% Acetonitrile, after which the samples were loaded using centrifugation at 4,000x rpm. After washing the tips twice with 100ul of 0.1% formic acid, the peptides were eluted into clean 500ul Eppendorf tubes using 40% Acetonitrile, 0.1% formic acid. The eluted peptides were concentrated in an Eppendorf Speedvac, and re-constituted in 1% TFA, 2% Acetonitrile, containing iRT peptides (Biognosys AG, Switzerland) for Mass Spectrometry (MS) analysis.

Mass Spectrometry Data Collection Peptides were loaded onto a 2cm C18 trap column (ThermoFisher 164705), connected in-line to a 50cm C18 reverse-phase analytical column (Thermo EasySpray ES803) using 100% Buffer A (0.1% Formic acid in water) at 750bar, using either the Thermo EasyLC 1200, or the ThermoFisher Ultimate 3000 UHPLC system, and the column oven operating at 45°C. Peptides were eluted over a 100 or 140 minute gradient, ranging from 6 to 60% of 80% acetonitrile, 0.1% formic acid at 250 nl/min.

For samples analysed using the MS2-level quantitation feature of TMT, the Q-Exactive HF-X instrument (ThermoFisher Scientific) was operated in DD-MS2 mode. The instrument was run with a top 16 method, collecting MS2 spectra at 45,000 resolution and a 1e5 AGC target. Ions were collected for 120ms, and isolated with an isolation width of 1.2 or 0.7 m/z. Precursors with a charge of 2-7 were included, and those that have been sequenced once were put on an exclusion list for up to 60 seconds.

17 bioRxiv preprint doi: https://doi.org/10.1101/745679. this version posted August 24, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

For samples analysed using MS3-level SPS TMT quantitation (McAlister et al, 2014), the Orbitrap Fusion instrument (ThermoFisher Scientific) was operated in DD-MS3 mode. MS1 scans were collected at 120,000 resolution, scanning from 375-1500 m/z, collecting ions for 50ms or until the AGC target of 4e5 was reached. Precursors with a charge state of 2-7 were included for MS2 analysis, which were isolated with an isolation window of 0.7 m/z. Ions were collected for up to 50ms or until an AGC target value of 1e4 was reached, and fragmented using CID at 35% energy; these were then read out on the linear ion trap in rapid mode. Subsequently, up to 10 notches were selected for MS3 analysis, isolated with an m/z window of 2 m/z, and fragmented with HCD at 65% energy. Resulting fragments were read out in the Orbitrap at 50,000 resolution, with a maximum injection time of 105ms or until the AGC target value of 1e5 was reached. Mass Spectrometry Raw Data Analysis To translate .raw files into protein identifications and TMT reporter ion intensities, Proteome Discoverer 2.2 (ThermoFisher Scientific) was used with the built-in TMT Reporter ion quantification workflows. Default settings were applied, with Trypsin as enzyme specificity. Spectra were matched against the 9606 human database obtained from Uniprot. Dynamic modifications were set as Oxidation (M), and Acetyl on protein N-termini. Cysteine carbamidomethyl was set as a static modification, together with the TMT tag on both peptide N-termini and K residues. All results were filtered to a 1% FDR. The mass spectrometry data have been deposited to the ProteomeXchange Consortium (http://proteomecentral.proteomexchange.org) via the PRIDE partner repository (Perez- Riverol et al, 2019) with the dataset identifier PXD015112. Reviewer account details: Username: [email protected], Password: 6BnVxQ9F.

Computational Analysis of Single Cell Data The Proteome Discoverer output file is filtered for single cell data only (through removal of the booster channel samples), and the raw TMT reporter ion intensities are taken through a computational pipeline that aims at denoising the data and find meaningful cell clusters. The first step in the SCeptre pipeline is to find proteins that are in common with all the cell populations. We set a modular threshold s that controls the percentage of a given protein being present on average in each population. When the most common proteins are found, a non-parametric Bayesian batch correction method (Johnson et al, 2007) is used to account for the repeated injections over several TMT pools, for each of the cell populations. The final dataset of cells and batch corrected protein expression is then reported and taken forward for

18 bioRxiv preprint doi: https://doi.org/10.1101/745679. this version posted August 24, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

further analysis. The next step consists of the computation of gene signature scores, for each individual cell. To this end, an average value of protein expression for each protein expressed in a given cell is reported. For the scope of this article, it was decided to use Wikipathways gene sets (Pico et al, 2008). The resulting augmented dataset with gene signature expression and protein expression is then subjected to a correlation analysis where the correlation coefficient of each cell against all other cells in the dataset is reported. This correlation matrix is then used to build a network of cells. Cells close to each other, e.g. with a correlation coefficient above a threshold of 0.6 are linked together. We then find sub- components (cliques) in the network (Cazals & Karande, 2008), which groups proteins and gene signatures together, resulting in the definition of an Eigenprotein. To compute the Eigenprotein score, a principal component analysis is run using the proteins and gene signatures in the Eigenprotein on all cells of the dataset. The Eigenprotein score is the first principal component (PC1). Using the Eigenprotein score, an embedding of the cells in the dataset is created and used to report cell proximity in Eigenprotein space. The pipeline and all source code in Python is available as an iPython notebook run from a docker container (kuikuisven/sceptre).

Acknowledgements Erwin M. Schoof is a Lundbeck Fellow (fellowship # 2017-389) and a former EMBO Fellow (ALTF 1595–2014) and is co-funded by the European Commission (LTFCOFUND2013, GA-2013-609409) and Marie Curie Actions. This work was supported by the Kirsten & Freddy Johansen Foundation, the Independent Research Fund Denmark and through a centre grant from the Novo Nordisk Foundation (Novo Nordisk Foundation Centre for Stem Cell Biology, DanStem; Grant Number NNF17CC0027852). U. auf dem Keller acknowledges support by a Novo Nordisk Foundation Young Investigator Award (Grant Number NNF16OC0020670). Work in the Dick lab was supported by funds from the: Kirsten & Freddy Johansen International Prize, Princess Margaret Cancer Centre Foundation, Ontario Institute for Cancer Research with funding from the Province of Ontario, Canadian Institutes for Health Research, Canadian Cancer Society Research Institute, Terry Fox Foundation, Genome Canada through the Ontario Genomics Institute, and a Canada Research Chair.

Author contributions E.M. Schoof designed and carried out the experiments, wrote the manuscript and conducted data analysis. N. Rapin designed and implemented the computational workflow, conducted

19 bioRxiv preprint doi: https://doi.org/10.1101/745679. this version posted August 24, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

data analysis and wrote the manuscript. E. Lechman, C. Gentile and J.S. Haile contributed with cellular assays. S. Savickas and U. auf dem Keller assisted with the MS analysis, helped develop the instrument methods and contributed to the overall functioning of the method. All authors proof-read and contributed to the manuscript. U. auf dem Keller, J.E. Dick and B.T. Porse oversaw the study, wrote the manuscript and helped design the experiments.

Conflict of interest The authors declare no conflict of interest

References Agrahari R, Foroushani A, Docking TR, Chang L, Duns G, Hudoba M, Karsan A & Zare H (2018) Applications of Bayesian network models in predicting types of hematological malignancies. Sci. Rep. 8: 6951 Available at: http://www.nature.com/articles/s41598- 018-24758-5 Bahr C, von Paleske L, Uslu V V., Remeseiro S, Takayama N, Ng SW, Murison A, Langenfeld K, Petretich M, Scognamiglio R, Zeisberger P, Benk AS, Amit I, Zandstra PW, Lupien M, Dick JE, Trumpp A & Spitz F (2018) A Myc enhancer cluster regulates normal and leukaemic haematopoietic stem cell hierarchies. Nature 553: 515–520 Available at: http://www.nature.com/articles/nature25193 Bodenmiller B, Zunder ER, Finck R, Chen TJ, Savig ES, Bruggner R V, Simonds EF, Bendall SC, Sachs K, Krutzik PO & Nolan GP (2012) Multiplexed mass cytometry profiling of cellular states perturbed by small-molecule regulators. Nat. Biotechnol. 30: 858–867 Available at: http://www.ncbi.nlm.nih.gov/pubmed/22902532 Bonnet D & Dick JE (1997) Human acute myeloid leukemia is organized as a hierarchy that originates from a primitive hematopoietic cell. Nat. Med. 3: 730–7 Available at: http://www.ncbi.nlm.nih.gov/pubmed/9212098 Budnik B, Levy E, Harmange G & Slavov N (2018) SCoPE-MS: mass spectrometry of single mammalian cells quantifies proteome heterogeneity during cell differentiation. Genome Biol. 19: 161 Available at: http://www.ncbi.nlm.nih.gov/pubmed/30343672 Cabezas-Wallscheid N, Klimmeck D, Hansson J, Lipka DB, Reyes A, Wang Q, Weichenhan D, Lier A, von Paleske L, Renders S, Wünsche P, Zeisberger P, Brocks D, Gu L,

20 bioRxiv preprint doi: https://doi.org/10.1101/745679. this version posted August 24, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Herrmann C, Haas S, Essers MAG, Brors B, Eils R, Huber W, et al (2014) Identification of Regulatory Networks in HSCs and Their Immediate Progeny via Integrated Proteome, Transcriptome, and DNA Methylome Analysis. Cell Stem Cell 15: 507–522 Available at: http://www.ncbi.nlm.nih.gov/pubmed/25158935 Cazals F & Karande C (2008) A note on the problem of reporting maximal cliques. Theor. Comput. Sci. 407: 564–568 Available at: https://www.sciencedirect.com/science/article/pii/S0304397508003903 Cheng J, Zhang J, Han Y, Wang X, Ye X, Meng Y, Parwani A, Han Z, Feng Q & Huang K (2017) Integrative Analysis of Histopathological Images and Genomic Data Predicts Clear Cell Renal Cell Carcinoma Prognosis. Cancer Res. 77: e91–e100 Available at: http://www.ncbi.nlm.nih.gov/pubmed/29092949 Duployez N, Marceau-Renaut A, Villenet C, Petit A, Rousseau A, Ng SWK, Paquet A, Gonzales F, Barthélémy A, Leprêtre F, Pottier N, Nelken B, Michel G, Baruchel A, Bertrand Y, Leverger G, Lapillonne H, Figeac M, Dick JE, Wang JCY, et al (2019) The stem cell-associated gene expression signature allows risk stratification in pediatric acute myeloid leukemia. Leukemia 33: 348–357 Available at: http://www.nature.com/articles/s41375-018-0227-5 Eppert K, Takenaka K, Lechman ER, Waldron L, Nilsson B, van Galen P, Metzeler KH, Poeppl A, Ling V, Beyene J, Canty AJ, Danska JS, Bohlander SK, Buske C, Minden MD, Golub TR, Jurisica I, Ebert BL & Dick JE (2011) Stem cell gene expression programs influence clinical outcome in human leukemia. Nat. Med. 17: 1086–1093 Available at: http://www.nature.com/articles/nm.2415 Erkeland SJ, Valkhof M, Heijmans-Antonissen C, van Hoven-Beijen A, Delwel R, Hermans MHA & Touw IP (2004) Large-scale identification of disease genes involved in acute myeloid leukemia. J. Virol. 78: 1971–80 Available at: http://www.ncbi.nlm.nih.gov/pubmed/14747562 van Galen P, Hovestadt V, Wadsworth Ii MH, Hughes TK, Griffin GK, Battaglia S, Verga JA, Stephansky J, Pastika TJ, Lombardi Story J, Pinkus GS, Pozdnyakova O, Galinsky I, Stone RM, Graubert TA, Shalek AK, Aster JC, Lane AA & Bernstein BE (2019) Single- Cell RNA-Seq Reveals AML Hierarchies Relevant to Disease Progression and Immunity. Cell 176: 1265-1281.e24 Available at: https://linkinghub.elsevier.com/retrieve/pii/S0092867419300947 Haghverdi L, Lun ATL, Morgan MD & Marioni JC (2018) Batch effects in single-cell RNA- sequencing data are corrected by matching mutual nearest neighbors. Nat. Biotechnol.

21 bioRxiv preprint doi: https://doi.org/10.1101/745679. this version posted August 24, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

36: 421–427 Available at: http://www.nature.com/articles/nbt.4091 Han Z, Johnson T, Zhang J, Zhang X & Huang K (2017) Functional Virtual Flow Cytometry: A Visual Analytic Approach for Characterizing Single-Cell Gene Expression Patterns. Biomed Res. Int. 2017: 1–9 Available at: https://www.hindawi.com/journals/bmri/2017/3035481/ Hirabayashi R, Hozumi S, Higashijima S-I & Kikuchi Y (2013) Ddx46 is required for multi- lineage differentiation of hematopoietic stem cells in zebrafish. Stem Cells Dev. 22: 2532–42 Available at: http://www.ncbi.nlm.nih.gov/pubmed/23635340 Hogrebe A, von Stechow L, Bekker-Jensen DB, Weinert BT, Kelstrup CD & Olsen J V. (2018) Benchmarking common quantification strategies for large-scale phosphoproteomics. Nat. Commun. 9: 1045 Available at: http://www.nature.com/articles/s41467-018-03309-6 Islam S, Zeisel A, Joost S, La Manno G, Zajac P, Kasper M, Lönnerberg P & Linnarsson S (2014) Quantitative single-cell RNA-seq with unique molecular identifiers. Nat. Methods 11: 163–166 Available at: http://www.nature.com/articles/nmeth.2772 Jerby-Arnon L, Shah P, Cuoco MS, Rodman C, Su M-J, Melms JC, Leeson R, Kanodia A, Mei S, Lin J-R, Wang S, Rabasha B, Liu D, Zhang G, Margolais C, Ashenberg O, Ott PA, Buchbinder EI, Haq R, Hodi FS, et al (2018) A Cancer Cell Program Promotes T Cell Exclusion and Resistance to Checkpoint Blockade. Cell 175: 984-997.e24 Available at: https://www.sciencedirect.com/science/article/pii/S0092867418311784?via%3Dihub Johnson WE, Li C & Rabinovic A (2007) Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics 8: 118–127 Available at: http://www.ncbi.nlm.nih.gov/pubmed/16632515 Kaufmann KB, Garcia-Prat L, Liu Q, Ng SWK, Takayanagi S-I, Mitchell A, Wienholds E, van Galen P, Cumbaa CA, Tsay MJ, Pastrello C, Wagenblast E, Krivdova G, Minden MD, Lechman ER, Zandi S, Jurisica I, Wang JCY, Xie SZ & Dick JE (2019) A stemness screen reveals C3orf54/INKA1 as a promoter of human leukemia stem cell latency. Blood 133: 2198–2211 Available at: http://www.ncbi.nlm.nih.gov/pubmed/30796022 Khan Z, Ford MJ, Cusanovich DA, Mitrano A, Pritchard JK & Gilad Y (2013) Primate transcript and protein expression levels evolve under compensatory selection pressures. Science 342: 1100–4 Available at: http://www.sciencemag.org/cgi/doi/10.1126/science.1242379

22 bioRxiv preprint doi: https://doi.org/10.1101/745679. this version posted August 24, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Klimmeck D, Hansson J, Raffel S, Vakhrushev SY, Trumpp A & Krijgsveld J (2012) Proteomic Cornerstones of Hematopoietic Stem Cell Differentiation: Distinct Signatures of Multipotent Progenitors and Myeloid Committed Cells. Mol. Cell. Proteomics 11: 286–302 Available at: http://www.ncbi.nlm.nih.gov/pubmed/22454540 Kreso A & Dick JE (2014) Evolution of the Cancer Stem Cell Model. Cell Stem Cell 14: 275–291 Available at: https://www.sciencedirect.com/science/article/pii/S1934590914000575?via%3Dihub Krönke J, Bullinger L, Teleanu V, Tschürtz F, Gaidzik VI, Kühn MWM, Rücker FG, Holzmann K, Paschka P, Kapp-Schwörer S, Späth D, Kindler T, Schittenhelm M, Krauter J, Ganser A, Göhring G, Schlegelberger B, Schlenk RF, Döhner H & Döhner K (2013) Clonal evolution in relapsed NPM1-mutated acute myeloid leukemia. Blood 122: 100–8 Available at: http://www.ncbi.nlm.nih.gov/pubmed/23704090 Kulak NA, Pichler G, Paron I, Nagaraj N & Mann M (2014) Minimal, encapsulated proteomic-sample processing applied to copy-number estimation in eukaryotic cells. Nat. Methods 11: 319–324 Available at: http://www.nature.com/articles/nmeth.2834 Lapidot T, Sirard C, Vormoor J, Murdoch B, Hoang T, Caceres-Cortes J, Minden M, Paterson B, Caligiuri MA & Dick JE (1994) A cell initiating human acute myeloid leukaemia after transplantation into SCID mice. Nature 367: 645–648 Available at: http://www.ncbi.nlm.nih.gov/pubmed/7509044 Lauridsen FKB, Jensen TL, Rapin N, Aslan D, Wilhelmson AS, Pundhir S, Rehn M, Paul F, Giladi A, Hasemann MS, Serup P, Amit I & Porse BT (2018) Differences in Cell Cycle Status Underlie Transcriptional Heterogeneity in the HSC Compartment. Cell Rep. 24: 766–780 Available at: http://www.ncbi.nlm.nih.gov/pubmed/30021172 Lechman ER, Gentner B, Ng SWK, Schoof EM, van Galen P, Kennedy JA, Nucera S, Ciceri F, Kaufmann KB, Takayama N, Dobson SM, Trotman-Grant A, Krivdova G, Elzinga J, Mitchell A, Nilsson B, Hermans KG, Eppert K, Marke R, Isserlin R, et al (2016) miR- 126 Regulates Distinct Self-Renewal Outcomes in Normal and Malignant Hematopoietic Stem Cells. Cancer Cell 29: 214–28 Available at: http://www.ncbi.nlm.nih.gov/pubmed/26832662 Levitin HM, Yuan J & Sims PA (2018) Single-Cell Transcriptomic Analysis of Tumor Heterogeneity. Trends in Cancer 4: 264–268 Available at: https://www.sciencedirect.com/science/article/abs/pii/S2405803318300384?via%3Dihu b McAlister GC, Nusinow DP, Jedrychowski MP, Wühr M, Huttlin EL, Erickson BK, Rad R,

23 bioRxiv preprint doi: https://doi.org/10.1101/745679. this version posted August 24, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Haas W & Gygi SP (2014) MultiNotch MS3 Enables Accurate, Sensitive, and Multiplexed Detection of Differential Expression across Cancer Cell Line Proteomes. Anal. Chem. 86: 7150–7158 Available at: http://www.ncbi.nlm.nih.gov/pubmed/24927332 McKerrell T, Park N, Moreno T, Grove CS, Ponstingl H, Stephens J, Crawley C, Craig J, Scott MA, Hodkinson C, Baxter J, Rad R, Forsyth DR, Quail MA, Zeggini E, Ouwehand W, Varela I & Vassiliou GS (2015) Leukemia-Associated Somatic Mutations Drive Distinct Patterns of Age-Related Clonal Hemopoiesis. Cell Rep. 10: 1239–1245 Available at: https://www.sciencedirect.com/science/article/pii/S2211124715001138?via%3Dihub Mercier FE & Scadden DT (2015) Not All Created Equal: Lineage Hard-Wiring in the Production of Blood. Cell 163: 1568–1570 Available at: https://www.sciencedirect.com/science/article/pii/S0092867415016372?via%3Dihub Nam AS, Kim K-T, Chaligne R, Izzo F, Ang C, Taylor J, Myers RM, Abu-Zeinah G, Brand R, Omans ND, Alonso A, Sheridan C, Mariani M, Dai X, Harrington E, Pastore A, Cubillos-Ruiz JR, Tam W, Hoffman R, Rabadan R, et al (2019) Somatic mutations and cell identity linked by Genotyping of Transcriptomes. Nature 571: 355–360 Available at: http://www.nature.com/articles/s41586-019-1367-0 Newell EW, Sigal N, Bendall SC, Nolan GP & Davis MM (2012) Cytometry by Time-of- Flight Shows Combinatorial Cytokine Expression and Virus-Specific Cell Niches within a Continuum of CD8+ T Cell Phenotypes. Immunity 36: 142–152 Available at: http://www.ncbi.nlm.nih.gov/pubmed/22265676 Ng SWK, Mitchell A, Kennedy JA, Chen WC, McLeod J, Ibrahimova N, Arruda A, Popescu A, Gupta V, Schimmer AD, Schuh AC, Yee KW, Bullinger L, Herold T, Görlich D, Büchner T, Hiddemann W, Berdel WE, Wörmann B, Cheok M, et al (2016) A 17-gene stemness score for rapid determination of risk in acute leukaemia. Nature 540: 433–437 Available at: http://www.nature.com/articles/nature20598 Notta F, Zandi S, Takayama N, Dobson S, Gan OI, Wilson G, Kaufmann KB, McLeod J, Laurenti E, Dunant CF, McPherson JD, Stein LD, Dror Y & Dick JE (2016) Distinct routes of lineage development reshape the human blood hierarchy across ontogeny. Science 351: aab2116 Available at: http://www.ncbi.nlm.nih.gov/pubmed/26541609 Paul F, Arkin Y, Giladi A, Jaitin DA, Kenigsberg E, Keren-Shaul H, Winter D, Lara-Astiaso D, Gury M, Weiner A, David E, Cohen N, Lauridsen FKB, Haas S, Schlitzer A, Mildner A, Ginhoux F, Jung S, Trumpp A, Porse BT, et al (2015) Transcriptional Heterogeneity

24 bioRxiv preprint doi: https://doi.org/10.1101/745679. this version posted August 24, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

and Lineage Commitment in Myeloid Progenitors. Cell 163: 1663–77 Available at: https://linkinghub.elsevier.com/retrieve/pii/S0092867415014932 Perez-Riverol Y, Csordas A, Bai J, Bernal-Llinares M, Hewapathirana S, Kundu DJ, Inuganti A, Griss J, Mayer G, Eisenacher M, Pérez E, Uszkoreit J, Pfeuffer J, Sachsenberg T, Yilmaz S, Tiwary S, Cox J, Audain E, Walzer M, Jarnuczak AF, et al (2019) The PRIDE database and related tools and resources in 2019: improving support for quantification data. Nucleic Acids Res. 47: D442–D450 Available at: https://academic.oup.com/nar/article/47/D1/D442/5160986 Pico AR, Kelder T, van Iersel MP, Hanspers K, Conklin BR & Evelo C (2008) WikiPathways: Pathway Editing for the People. PLoS Biol. 6: e184 Available at: http://www.ncbi.nlm.nih.gov/pubmed/18651794 Poulin J-F, Tasic B, Hjerling-Leffler J, Trimarchi JM & Awatramani R (2016) Disentangling neural cell diversity using single-cell transcriptomics. Nat. Neurosci. 19: 1131–1141 Available at: http://www.nature.com/articles/nn.4366 Raffel S, Falcone M, Kneisel N, Hansson J, Wang W, Lutz C, Bullinger L, Poschet G, Nonnenmacher Y, Barnert A, Bahr C, Zeisberger P, Przybylla A, Sohn M, Tönjes M, Erez A, Adler L, Jensen P, Scholl C, Fröhling S, et al (2017) BCAT1 restricts αKG levels in AML stem cells leading to IDHmut-like DNA hypermethylation. Nature 551: 384–388 Available at: http://www.nature.com/articles/nature24294 Rappsilber J, Mann M & Ishihama Y (2007) Protocol for micro-purification, enrichment, pre- fractionation and storage of peptides for proteomics using StageTips. Nat. Protoc. 2: 1896–1906 Available at: http://www.nature.com/articles/nprot.2007.261 Rodriguez-Meira A, Buck G, Clark S-A, Povinelli BJ, Alcolea V, Louka E, McGowan S, Hamblin A, Sousos N, Barkas N, Giustacchini A, Psaila B, Jacobsen SEW, Thongjuea S & Mead AJ (2019) Unravelling Intratumoral Heterogeneity through High-Sensitivity Single-Cell Mutational Analysis and Parallel RNA Sequencing. Mol. Cell 73: 1292- 1305.e8 Available at: https://linkinghub.elsevier.com/retrieve/pii/S1097276519300097 Schoof EM, Lechman ER & Dick JE (2016) Global proteomics dataset of miR-126 overexpression in acute myeloid leukemia. Data Br. 9: 57–61 Available at: http://www.ncbi.nlm.nih.gov/pubmed/27656662 Shlush LI, Mitchell A, Heisler L, Abelson S, Ng SWK, Trotman-Grant A, Medeiros JJF, Rao-Bhatia A, Jaciw-Zurakowsky I, Marke R, McLeod JL, Doedens M, Bader G, Voisin V, Xu C, McPherson JD, Hudson TJ, Wang JCY, Minden MD & Dick JE (2017) Tracing the origins of relapse in acute myeloid leukaemia to stem cells. Nature 547:

25 bioRxiv preprint doi: https://doi.org/10.1101/745679. this version posted August 24, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

104–108 Available at: http://www.nature.com/articles/nature22993 Slenter DN, Kutmon M, Hanspers K, Riutta A, Windsor J, Nunes N, Mélius J, Cirillo E, Coort SL, Digles D, Ehrhart F, Giesbertz P, Kalafati M, Martens M, Miller R, Nishida K, Rieswijk L, Waagmeester A, Eijssen LMT, Evelo CT, et al (2018) WikiPathways: a multifaceted pathway database bridging metabolomics to other omics research. Nucleic Acids Res. 46: D661–D667 Available at: http://academic.oup.com/nar/article/46/D1/D661/4612963 Stelzer G, Rosen N, Plaschkes I, Zimmerman S, Twik M, Fishilevich S, Stein TI, Nudel R, Lieder I, Mazor Y, Kaplan S, Dahary D, Warshawsky D, Guan-Golan Y, Kohn A, Rappaport N, Safran M & Lancet D (2016) The GeneCards Suite: From Gene Data Mining to Disease Genome Sequence Analyses. In Current Protocols in Bioinformatics pp 1.30.1-1.30.33. Hoboken, NJ, USA: John Wiley & Sons, Inc. Available at: http://doi.wiley.com/10.1002/cpbi.5 Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES & Mesirov JP (2005) Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. 102: 15545–15550 Available at: https://www.pnas.org/content/102/43/15545.long Ting L, Rad R, Gygi SP & Haas W (2011) MS3 eliminates ratio distortion in isobaric multiplexed quantitative proteomics. Nat. Methods 8: 937–40 Available at: http://www.ncbi.nlm.nih.gov/pubmed/21963607 Trapnell C (2015) Defining cell types and states with single-cell genomics. Genome Res. 25: 1491–8 Available at: http://www.ncbi.nlm.nih.gov/pubmed/26430159 Treutlein B, Brownfield DG, Wu AR, Neff NF, Mantalas GL, Espinoza FH, Desai TJ, Krasnow MA & Quake SR (2014) Reconstructing lineage hierarchies of the distal lung epithelium using single-cell RNA-seq. Nature 509: 371–375 Available at: http://www.nature.com/articles/nature13173 Vogel C & Marcotte EM (2012) Insights into the regulation of protein abundance from proteomic and transcriptomic analyses. Nat. Rev. Genet. 13: 227–232 Available at: http://www.ncbi.nlm.nih.gov/pubmed/22411467 Winter SV, Meier F, Wichmann C, Cox J, Mann M & Meissner F (2018) EASI-tag enables accurate multiplexed and interference-free MS2-based proteome quantification. Nat. Methods 15: 527–530 Available at: http://dx.doi.org/10.1038/s41592-018-0037-8 Wojtowicz EE, Lechman ER, Hermans KG, Schoof EM, Wienholds E, Isserlin R, van Veelen

26 bioRxiv preprint doi: https://doi.org/10.1101/745679. this version posted August 24, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

PA, Broekhuis MJC, Janssen GMC, Trotman-Grant A, Dobson SM, Krivdova G, Elzinga J, Kennedy J, Gan OI, Sinha A, Ignatchenko V, Kislinger T, Dethmers-Ausema B, Weersing E, et al (2016) Ectopic miR-125a Expression Induces Long-Term Repopulating Stem Cell Capacity in Mouse and Human Hematopoietic Progenitors. Cell Stem Cell Wolf FA, Angerer P & Theis FJ (2018) SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 19: 15 Available at: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-017-1382-0 Zhu Y, Piehowski PD, Zhao R, Chen J, Shen Y, Moore RJ, Shukla AK, Petyuk VA, Campbell-Thompson M, Mathews CE, Smith RD, Qian W-J & Kelly RT (2018) Nanodroplet processing platform for deep and quantitative proteome profiling of 10–100 mammalian cells. Nat. Commun. 9: 882 Available at: http://www.nature.com/articles/s41467-018-03367-w

Figure legends Figure 1 - A) Overview of a typical AML hierarchy, with LSC at the apex. B) Sorting scheme for sorting the bulk, blast, progenitor and LSC populations from the 8227 culture system, followed by the TMT labeling setup for single-cell proteomics, consisting of 9-10 single cells and 1-2 booster channels. C) Overview of the SCeptre computational single cell MS pipeline.

Figure 2 – A) Standard PCA clustering of the single cell proteomics data, when using 1) a bulk cell booster, 2) a bulk cell + cell-specific booster, or 3) a cell-specific only booster channel. B) Overview of the tSNE and UMAP clusterings of the bulk, blast, LSC and progenitor cells in the different datasets, combined with the number of proteins found in the specific cells as overlaid on the same computational embedding. C) Standard PCA clustering of the TMT SPS MS3 data. D) tSNE and UMAP clustering of the TMT SPS MS3 data, with the number of proteins found in the specific cells overlaid on the same computational embedding.

Figure 3 - A) Sorting overview for the control TMT pools, where single cells of each differentiation stage were combined with a booster channel of one differentiation stage, until all possible combinations were met. B) Total number of protein identifications for the control samples, using either MS2 and MS3 level TMT quantitation. C) Standard PCA

27 bioRxiv preprint doi: https://doi.org/10.1101/745679. this version posted August 24, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

analysis of single cells, coloured either by cell differentiation stage or TMT pool. The TMT colour scheme highlights a significant batch effect where cells cluster mainly according to TMT pool rather than differentiation stage. D) SCeptre analysis of control samples, highlighting a clear clustering pattern according to cellular differentiation stage. Number of proteins identified in each cell is indicated in the accompanying plots, overlaid on the same tSNE/UMAP embedding as the original clustering.

Figure 4 - A) UMAP clustering of the different differentiation stages when using protein expression values only, or when using the Eigenprotein information derived from pathway integration. B) The Eigenprotein cliques overlaid on the Eigenprotein-based UMAP embedding, highlighting a Progenitor/LSC, an LSC and a blast-specific Eigenprotein clique respectively. C) Wilcoxon-ranked-sum ranking of top proteins defining the three differentiation stages and bulk. D) Heatmap visualization of the top three proteins from the Wilcoxon-ranked-sum testing. E) Protein expression levels of top three LSC and Blast proteins, overlaid on the Eigenprotein UMAP embedding.

Supplementary Figure 1 - Overview of all 54 Eigenprotein cliques identified by SCeptre, overlaid on the Eigenprotein UMAP embedding.

Supplementary Figure 2 - UMAP embedding plot, highlighting the CD34 expression levels within the single cells (lower panels), compared to the UMAP cell differentiation clusters (upper panels). Values are plotted both on protein expression-based embedding, and Eigenprotein embedding.

28 bioRxiv preprint doi: https://doi.org/10.1101/745679. this version posted August 24, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Tables and their legends

Dataset Booster Cells MS # of # of Average # of Protein Instrument Cells Protein IDs p. cell (after batch IDs correction) 1 Bulk QExactive 320 1,259 389 HFX (MS2) (min: 24, max: 522) 2 Bulk + Cell-specific QExactive 320 2,138 994 HFX (MS2) (min: 318, max: 1157)

3 Cell-specific QExactive 400 2,452 389 HFX (MS2) (min: 143, max: 532)

4 Cell-specific Orbitrap 160 1,216 259 Fusion (MS3) (min: 63, max: 375)

Table 1 - Overview of Protein identification numbers across the various datasets

Expanded View Figure legends

Supplementary Table 1 – Overview of Eigenprotein cliques and the pathways and genes contained therein.

29 bioRxiv preprint doi: https://doi.org/10.1101/745679. this version posted August 24, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. Figure 1 AML Hierarchy A) B)

Leukemia Lyse & digest TMT labeling Stem Cells “8227” AML hierarchy

(LSC) Booster Channel TMT-126 Mass Spectrometry

Single Cell TMT-127N Bulk Single Cell TMT-127C

Single Cell TMT-128N

Progenitor Single Cell TMT-128C y Blasts Progenitors Cells Single Cell TMT-129N

Single Cell TMT-129C Orbitrap Fusion (MS3) QExactive HF-X (MS2)

Single Cell TMT-130N CD38 PE Single Cell TMT-130C Blast LSC Cells Single Cell TMT-131N Single Cell / TMT-131C CD34 APC-Cy7 Booster Channel

C) SCeptre Workflow

Step 1 Step 4

TMT sample TMT sample TMT sample TMT sample TMT sample TMT sample TMT sample TMT sample TMT sample TMT sample TMT sample TMT sample TMT sample TMT sample TMT sample TMT sample Batch correction by cell TMT sample TMT sample

Cells TMT sample TMT sample Computing EigenProteins TMT sample TMT sample type and TMT pool at TMT sample TMT sample TMT sample TMT sample TMT sample protein expression level TMT sample TMT sample TMT sample TMT sample TMT sample TMT sample TMT sample TMT sample TMT sample Proteins

Step 2 Step 5

Protein correlation Data cleaning and clustering cells

eigenproteins Step 3 Step 6 isolate cliques

Protein network UMAP visualization integration UMAP2

UMAP1 bioRxiv preprint doi: https://doi.org/10.1101/745679. this version posted August 24, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. Figure 2

A) Bulk + Cell type-specific Cell type-specific Bulk Booster Booster Booster

70

20

10 60

50 10

0

40

0 30 PC 2 (6.6%) PC 2 (4.3%) PC 2 (6.5%) -10

20

-10

10 -20

0 -20

-30 -10

-40 -30 -20 -10 0 10 20 -20 -10 0 10 20 -20 -10 0 10 20 PC 1 (11.6%) PC 1 (9.9%) PC 1 (8.4%)

BLAST BULK LSC PROG BLAST BULK LSC PROG BLAST BULK LSC PROG

B) Bulk + Cell type-specific Cell type-specific Bulk Booster Booster Booster

C) D) TMT SPS MS3 TMT SPS MS3

30

20

PC 2 (5.6%) 10

0

-10

-10 -5 0 5 10 15 20 25 PC 1 (10.6%) BLAST BULK LSC PROG bioRxiv preprint doi: https://doi.org/10.1101/745679. this version posted August 24, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. Figure 3 A)

Cells Sorted 96 well plate 126 127N 127C 128N 128C 129N 129C 130N 130C 131N 131C MS Type Booster Cells 123456789101112 MS2 Bulk 500 cells PROG BULK LSC LSC - BULK BLASTS BLASTS PROG PROG - BULK MS2 Bulk 500 cells PROG BULK LSC LSC - BULK BLASTS BLASTS PROG PROG - BLASTS MS3 Bulk 500 cells PROG BULK LSC LSC - BULK BLASTS BLASTS PROG LSC - PROG MS3 Bulk 500 cells PROG BULK LSC LSC - BULK BLASTS BLASTS PROG LSC - LSC MS2 Blasts 500 cells BLASTS BLASTS BULK BULK PROG PROG LSC LSC - PROG - MS2 Blasts 500 cells BLASTS BLASTS BULK BULK PROG PROG LSC LSC - PROG - MS3 Blasts 500 cells BLASTS BLASTS BULK BULK PROG PROG LSC LSC - LSC - MS3 Blasts 500 cells BLASTS BLASTS BULK BULK PROG PROG LSC LSC - LSC -

96 well plate 126 127N 127C 128N 128C 129N 129C 130N 130C 131N 131C MS Type Booster Cells 123456789101112 MS2 Prog 500 cells - PROG BLASTS BLASTS BULK LSC BULK PROG LSC BULK - MS2 Prog 500 cells - PROG BLASTS BLASTS BULK LSC BULK PROG LSC BULK - MS3 Prog 500 cells - PROG BLASTS BLASTS BULK LSC BULK PROG LSC BLASTS - MS3 Prog 500 cells - PROG BLASTS BLASTS BULK LSC BULK PROG LSC BLASTS - MS2 LSC 500 cells LSC LSC PROG PROG BLASTS BLASTS - BULK BULK BULK - MS2 LSC 500 cells LSC LSC PROG PROG BLASTS BLASTS - BULK BULK BULK - MS3 LSC 500 cells LSC LSC PROG PROG BLASTS BLASTS - BULK BULK BLASTS - MS3 LSC 500 cells LSC LSC PROG PROG BLASTS BLASTS - BULK BULK BLASTS -

B) Total Protein IDs Ctrls_MS2 1,437 Ctrls_MS3 937

C) MS2 Quantitation MS3 Quantitation

Cell type TMT Injection Cell type TMT Injection

10 10 20

20

0 10 0

10 PC 2 (8.4%) 0

PC 2 (16.7%) -10 PC 2 (8.4%) -10

0 PC 2 (16.7%) -10

-20

-20

-10 -10 0 10 20 30 40 PC 1 (21.9%) -30

-50 -40 -30 -20 -10 0 10 20 -30 PC 1 (25.5%) BLAST_1 LSC_1 -50 -40 -30 -20 -10 0 10 20 PROG_2 PROG_1 -10 0 10 20 30 40 PC 1 (25.5%) PC 1 (21.9%) BLAST_2 LSC_2 LSC_2 LSC_1 PROG_1 BULK_1 Bulk_2 Bulk_1 BLAST BULK LSC PROG PROG_2 BULK_2 BLAST BULK LSC PROG Blast_2 Blast_1

D) MS2 Quantitation MS3 Quantitation bioRxiv preprint doi: https://doi.org/10.1101/745679. this version posted August 24, 2019. The copyright holder for this preprint (which wasFigure 4 not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. A) Protein Expression only Eigenprotein-based

B) Progenitor / LSC LSC Blasts

C)

D) E) S Blasts LSC 7. Harnessing a novel targeted single-cell proteomics technique to expand the knowledge of primary leukemia hierarchy- Manuscript 7

Our newly introduced optimized single-cell proteomics workflow termed single-cell mass spectrometry (scMS) (Manuscript 6) yielded sufficient numbers in protein identifications to classify cell types in an acute myeloid leukemia hierarchy based on single-cell proteome analysis. However, inherent to the stochastic nature of shotgun proteomics and the very low abundance of peptides per TMT-channel, mostly proteins with high copy numbers per cell could be identified. To overcome this limitation and allow detection of important indicative mediators, such as transcription factors, we have complemented scMS with a targeted approach for high-sensitivity single-cell proteomics. When implementing scMS, we used previously described AML patient cells, sorted them by FACS and analyzed them on the latest Thermo Fisher Exploris 480 instrument by data-dependent acquisition shotgun proteomics. To implement targeted scMS, we spiked each cell panel with heavy synthetic peptides labeled by a super heavy TMT reagent. Ninety-nine peptides representing 33 proteins were selected for an in-depth analysis. By offset triggering of the MS scans, we are able to quantify endogenously present single-cell peptides that were otherwise not visible in the untargeted shotgun proteomics approach. Identifying more than 95% of the targets in each cell has given us the confidence that the method works and can be easily scalable. As a proof of concept, we confirmed CD34 and CD38 abundance changed in between the three cell lines and observed appropriate proportional distribution among the cells. In addition to that we monitored lowly abundant proteases HTRA2 and HTRA3 and their abundance distribution between the cell types at single-cell level. High abundance of HTRA2 in Leukemic Stem Cells and a low abundance in progenitors has given a new insight into the role of this protease in each cell type.

Novel proteomics technologies to decipher protease networks in cells and tissues 131 Harnessing a novel targeted single-cell proteomics technique to expand the knowledge of primary leukemia hierarchy

Simonas Savickas1; Philipp Kastl1; Erwin Marten Schoof1, Ulrich auf dem Keller1*

Affiliations 1 Department of Biotechnology and Biomedicine, Technical University of Denmark, Lyngby,Denmark

*Corresponding author:

Ulrich auf dem Keller, email: [email protected]

Key words: Single cell proteomics / proteases / proteomics / targeted-proteomics / Tandem-mass-tag (TMT) / SureQuant

132 Novel proteomics technologies to decipher protease networks in cells and tissues Abstract

With the breakthrough of single-cell transcriptomics and single-cell proteomics approaches, the structure of heterogenous cell populations is slowly being unraveled. While global single-cell discovery proteomics has allowed e.g. identification of cell-type specific proteins, the technique is classically prone to low sensitivity and stochastic bias. To address this bottleneck, we introduce a Single-Cell SureQuant™ proteomics technique to investigate, in a targeted manner, the proteome landscapes of leukemia hierarchy. We can confirm prognostic Leukemic Stem Cell differentiation using CD34/CD38 membrane proteins and consistently measure expression of lowly abundant proteases across the entire hierarchy.

Novel proteomics technologies to decipher protease networks in cells and tissues 133 7.1 Introduction

The rise of single cell RNA-Seq revolutionized cancer stem cell research by explaining multilineage of cellular differences with help of genetic expression signatures recorded from individual cells (Dalerba, et al., 2011). In addition to capturing the transcription patterns of each cell, classifying their expression profiles, development of combinatorial drug treatment strategies (Xu, et al., 2012; Zahreddine & Borden, 2013) has encouraged the development of complementary single-cell proteomics. Bridging the two fields over the past few years has been a key factor to come closer to the phenotype of a single cell (Marx, 2019). Single cell technologies like nanoPOTS developed by Ryan Kelly’s laboratory analyzing ultra-low load proteomes and extracting up to 400 proteins from individual HeLa cells are a proof of principle that with a developing technology we are able to explore new biological phenomena in cell heterogeneity (Cong, et al., 2020). Furthermore, exploiting sensitive mass spectrometers and efficient chemical labeling reagents, SCoPE-MS has given the field a boost by increasing throughput and allowing quantitative analysis of close to 1000 proteins per single cell. The technique developed in Nikolai Slavov’s laboratory was able to evaluate macrophage heterogeneity in utmost sensitivity and in parallel single-cell mass spectrometry (scMS) introduced by Schoof et al. managed to determine the proteomic fingerprint of leukemia cell hierarchy (Schoof, et al., 2019; Specht, Emmott, Koller, & Slavov, 2019). However, the technique with an increasing popularity has a few core shortcomings that are inherent to classical shotgun proteomics (Marx, 2019). In scMS, protein identification relies on peptide signal intensity to trigger an identifying MS2 scan, which primarily takes place for high- abundance proteins. Lowly abundant proteins do not cross the limit of detection of the mass spectrometer and thus are not measured. Moreover, stochastic bias prevents separate injections to measure the same proteins, thus reducing the number of reproducibly measured proteins with every new injection and decreasing the quantitative accuracy. Finally, a new concept introduced by Chris Rose’s laboratory, the carrier proteome effect is showing the shortcoming of TMT labels that stem from an overlapping carrier channel. The drawback leads to incorrect quantification of the channels and false interpretations (Kuster & Rose, 2020). In recent months, the technological advances of mass spectrometers have provided a solution for some of the issues mentioned above. The new Thermo Fisher Exploris 480 has improved sensitivity with its new ion flight path architecture. Combination with a FAIMS ion mobility device deflecting zero charge ions that usually overflow the orbitrap analyzer with uninformative ions results in increased confident peptide identification rates (Bekker-Jensen, et al., 2019). Finally, the new software tool SureQuant (Grossegesse, Hartkopf, Nitsche, & Doellinger, 2020) has been a key asset for developing a new method that we introduce in this manuscript and term Single-Cell SureQuant (scSureQuant). It is a new method to quantify low- abundance proteins in individual cells in a targeted fashion.

134 Novel proteomics technologies to decipher protease networks in cells and tissues 7.2 Methods 7.2.1 Single cell sorting and processing A modified protocol published by Schoof et al. (2019) has been used for cell preparation. In brief, we have FACS sorted primary AML patient 8227 cells according to CD34/CD38 identity into LSC, Progenitor, and Blast groups into 96-well format plates. Each well contained 1 cell except the 11th well which contained 200 cells of the same type. The cells were lysed by boiling for 1 min at 95oC, reduced and alkylated using 20mM CAA and 20mM TCEP, digested using trypsin in the well and labeled with TMT11 quantitative tags according to the optimized sample processing workflow described by the manufacturer. Prepared labeled peptides were stage tipped, vacuum dried and resuspended in 2%ACN, 1%TFA injection buffer. 7.2.2 Target peptide preparation and labeling with super heavy TMT (shTMT A panel of target synthetic heavy peptides (Supplementary information) was assembled from a previously published data set of single-cell proteomics (Schoof, et al., 2019), and an in-house deep fractionated Leukemia Stem Cell proteomics library. Selected peptides were ordered from JPT Peptide Technologies, 0.3 nmol of peptide mix was resuspended according to manufacturer’s instructions and labeled with super heavy TMT- (Thermo Fisher Scientific) reagent according to manufacturer’s instructions. A small aliquot was taken for LC-MS analysis to evaluate labeling efficiency and establish a SureQuant Survey scanning property. The remaining peptides were aliquoted and vacuum dried. Peptides containing a super heavy TMT tag were resuspended in 2%ACN, 1% TFA injection buffer and mixed with the TMT11 labeled peptides. 7.2.3 Survey scan of single cell targeted proteomics Synthetic peptides with shTMT label were loaded on a 15cm C18 reverse-phase analytical column and analyzed using a 100min gradient. The peptide analysis was performed on an Exploris 480 mass spectrometer (Thermo Fisher Scientific) coupled to a FAIMS Pro (high-field asymmetric waveform ion mobility spectrometer) device and operated in SureQuant Survey mode for 100min, MS1 resolution of 120.000, AGC target of 300%, CV at -55V, 50ms injection time, cycle time of 5s. This was followed by labeled synthetic peptide identification from the list (Supplementary information), at 7500 resolution, HCD collision energy at 32, AGC target of 1000%, 10ms injection time. 7.2.4 Single cell targeted proteomics Prepared samples of sorted cells and synthetic peptides were loaded on a 15cm C18 reverse-phase analytical column and analyzed using a 100min gradient. The peptide analysis was performed on an Exploris 480 mass spectrometer (Thermo Fisher Scientific) coupled to a FAIMS Pro (high-field asymmetric waveform ion mobility spectrometer) device and operated in SureQuant mode for 100min, MS1 resolution of 120.000, AGC target of 300%, CV at -55V, 50ms injection time, cycle time of 5s. This was followed by heavy peptide precursors and fragment recognition from the list (Supplementary information), at 7500 resolution, HCD collision energy at 32, AGC target of 1000%, 10ms injection time. On the fly identification of peptides initiated an

Novel proteomics technologies to decipher protease networks in cells and tissues 135 offset scan (Table 1) at 60000 resolution, HCD collision energy at 32, AGC target of 1000%, 116ms injection time.

Table 1-Mass offset for super heavy TMT- labeled peptides Heavy amino acid Charge Offset K +2 -10.0209 K +3 -6.6806 K +4 -5.0105 R +2 -8.0111 R +3 -5.3406

7.2.5 Mass spectrometry data analysis To analyze single-cell targeted data, we used Skyline 19.1 (MacCoss lab). Target peptides were imported after setting filter parameters at MS1 orbitrap to Centroided, at mass accuracy of 10ppm and MS/MS DIA Centroided filtering at 20ppm, with fixed isolation scheme of 0.007m/z. All trigger precursor masses had at least 5 transitions. Endogenous group of peptides had a fixed modification of TMT11 reagent (+229.2 mass, and +229.2 (If a peptide contained K), synthetic peptides with super heavy TMT modification had added masses of (+235.2 and +10(For peptides ending with R), +243.2 (for peptides ending with K). Selected peaks of endogenous peptides were matched to synthetic peptide elution pattern. Area under the curve of each TMT label intensity was selected for further quantification using the “R” package “MSstats 3.12.3” (processing code available upon request). Independent injections were normalized according to mean total intensity and resulting intensities were plotted for data interpretation.

136 Novel proteomics technologies to decipher protease networks in cells and tissues 7.3 Results

With an emerging field of single cell proteomics, increasing technical capabilities and a growing single-cell proteomics community, we have developed a method to target and quantify proteins in single cells. The results of such effort allow exploration of ultra-low-abundance proteomes at single-cell level. To design a targeted single-cell proteomics assay, we have selected a variation of highly and lowly abundant proteins. Ninety-nine target peptides were selected, of which 21 were proteases, 3 transcription factors and 9 tumor markers (Supplemental information). The proteins were previously identified in our published data set (Schoof, et al., 2019) and an in-house deep proteomics library of Leukemia Stem Cells (LSC). Since synthetic shTMT-labeled peptides serve as elution markers and trigger the scanning of endogenous peptides, it was necessary to generate a spectrum library of each synthetic peptide. The panel of peptides was run independently in SureQuant Survey mode. The run has generated a clear trigger spectrum for the Single-cell SureQuant runs (Supplementary information). 9 samples each containing ten labeled single cells and a 200 cell booster channel were mixed with synthetic trigger peptides to prepare Single-Cell SureQuant measurement (Fig. 1A). Figure 1 illustrates SureQuant logic of searching for the trigger peptide; whenever the synthetic peptide is eluting, the instrument searches for 5 representative transitions of the synthetic peptide after very short 10 ms injection times. Once 5 transitions of the trigger peptide match the collected spectrum, the mass-spectrometer triggers a long 300ms injection cycle for the endogenous peptide inside of the instrument (Fig. 1B, blue). This combination of software and labeling allows identification and quantification of the 99 targets selected in each cell over the chromatographic gradient (Fig. 1B, right) with high intensity of the 200-cell booster channel and lower TMT signal of the single-cell channels (Fig. 1C, left). After injecting the samples, we were able to identify and quantify 99 target peptides in Leukemia Stem Cells, 98 targets in Leukemic Progenitors and 96 targets in Leukemic Blasts. These peptides cover at least 2 peptides per protein allowing us to reliably quantify every protein we were interested in. To validate quantitative performance of scSureQuant, we classified FACS sorted cells according to single-cell abundances of the sorting markers CD34 and CD38 (Fig. 2A). Progenitor cells had a high abundance of CD38, while Leukemic Stem Cells (LSC) exhibited low abundance levels. LSCs also showed high levels of CD34, but in Blasts and Progenitors the protein was barely detectable (Fig. 2B). Relative abundance levels measured by scSureQuant were in accordance with FACS quantitation (Fig. 2A).

Novel proteomics technologies to decipher protease networks in cells and tissues 137 Figure 1- Pipeline for preparation and analysis of single-cell targeted proteome. A). Single-cell preparation workflow for injection into the mass spectrometer. 8227 cells are FACS sorted into three types (Leukemic Stem Cells- Blue; Progenitors- Yellow; Blasts- Green) according to CD34 and CD38 expression into 96 well plate. Isolated cells are lysed, digested and resulting peptides labeled with a Tandem Mass Tag (TMT). Differently labeled lysates and synthetic heavy target peptides containing super heavy Tandem Mass Tag (TMTsh) are pooled into a single vial. B). SureQuant measurement of single-cell peptides. Once target peptide mass spectrum matches 5 ions of synthetic heavy peptide (red) scSureQuant technique triggers an offset target scan of endogenous single-cell peptides (blue spectrum). The triggering continues throughout the chromatographic peak of each targeted peptide. C). Collected target peptide spectrum with TMT reporter ions identifies and quantifies peptide abundance in each cell. All TMT tags except TMT-131c contain one cell. TMT-131c channel contains 200 cells that serves as a booster channel. The remaining channels provide quantitative value of each cell independently. With multiple quantitative points across the peak, the peptides are accurately quantified and compared without the interference of synthetic heavy shTMT labeled peptide. The elution patterns of the booster channel and heavy synthetic peptide set the start and end of single-cell peptide elution as well.

Furthermore, some target proteins that were investigated harnessing the power of scSureQuant were intracellular proteases known to be involved in different stages of cellular differentiations (Tiaden, A.N., et al., 2012; Zurawa-Janicka, D., et al. 2012) This method allowed us to investigate the proteases HTRA2 and HTRA3 in single cells from all three cell populations,

138 Novel proteomics technologies to decipher protease networks in cells and tissues Figure 2- Quantification of marker proteins and low abundant proteases using single cell targeted proteomics. A). Single-Cell preparation workflow for injection into the mass spectrometer. 8227 cells are FACS sorted into three types (Leukemic Stem Cells-Blue; Progenitors- Yellow; Blasts –Green) according to CD34 and CD38 expression into 96 well plate. B). Acute Myeloid Leukemia hierarchy according to single- cell targeted mass spectrometry. Relative protein abundances of CD34 and CD38 confirm the classification of FACS sorted cells. C). Relative quantification of HTRA2 and HTRA3 proteases in FACS sorted cell lines. Each cell in every row corresponds to a FACS sorted single cell. which have been shown to be very low in abundance in monocytes (Fig. 2C) (Kim.,M.S., et al. 2014). The heatmap shows generally high abundance of HTRA2 in Leukemic Stem Cells and low protein levels in Leukemic Progenitors, while the population of Leukemic Blasts exhibited mixed abundance profiles. In contrast to HTRA2 with uniform abundances in LCS and Progenitors, HTRA3 abundances significantly varied at the single-cell level in all three cell populations but were still high in majority of the 30 analyzed Progenitor cells (Fig. 2C).

Novel proteomics technologies to decipher protease networks in cells and tissues 139 7.4 Discussion

By complementing single-cell mass spectrometry (scMS) with single-cell SureQuant (scSureQuant), we have implemented a workflow that enables quantitative assessment of very lowly abundant proteins in individual cells. scSureQuant opens up new possibilities for high- throughput protein analysis and for gaining new insight into cellular heterogeneity. The technique is still in its infancy but has a high potential for further development. The sensitivity range of shotgun method is only limited by MS2 ion injection time. This hurdle was overcome by developing Parallel Reaction Monitoring approach (Rauniyar, 2015) that maximizes the injection time of the ion package ensuring target peptide detection and quantification. In scSureQuant we detect the heavy synthetic peptide, but accumulate and quantify the endogenous peptide. Thus, strict elution patterns and strict ion isolation widths of 0.8Da ensures quantification of the target peptide. Furthermore, supported by classical targeted proteomics theory scSureQuant quantifies the same set of peptides in every injection. Fluctuation of peptide spectrum intensity due to the total matrix proteome of the cell does not influence peptide search and quantity depends on the abundance of the peptide itself. Due to continuously targeting by scSureQuant, we did not miss a trigger peptide but a few target peptides, either due to their absence from the sample or less likely through poor susceptibility to mass spectrometry analysis. Selecting the most suitable synthetic peptides for quantitative analysis can be a particular challenge. There are many resources like SRMatlas, SWATHatlas, Proteomics DB, that provide information on peptides, their spectra and properties in mass spectrometry analysis (Deutsch, Lam, & Aebersold, 2008; Samaras, et al., 2020). However, the annotated peptides do not perform the same way in a different instrument. Moreover, the super heavy TMT tag adds a more positive charge to a synthetic targeted peptide amino acid sequence, but it also alters the entire structure of the molecule. These intentional additions result in a peptide with different physical properties, making it difficult to rationally design peptides with optimized properties for mass spectrometry. Therefore, each peptide needs to be run independently and tested for suitability. In our analyses, we observed that co-isolation of the target peptide at 0.8Da with other peptides may be an issue. Since scSureQuant monitors and quantifies endogenously very lowly abundant peptides, any co-isolation can result in severe inaccuracies in quantification. Such an event should be possible to correct by integrating MS3 isolating capacity, ensuring quantitative accuracy (Ting, Rad, Gygi, & Haas, 2011). Using scSureQuant, we were able to further investigate the heterogeneity of the three previously identified cell types of the Leukemic hierarchy (Kornblau, et al., 2013). The method allowed us to identify endogenously very lowly abundant proteases and transcription factors independent from masking proteins from the complex cellular proteome. This novel technique has highlighted differential abundance distributions of HTRA2 and HTRA3, members of the same family of proteases. To our knowledge, we are the first to measure the abundance of these

140 Novel proteomics technologies to decipher protease networks in cells and tissues proteases at a single-cell level in monocyte populations. The data set also contains information on transcription factors like p53 and cytokines including TNF-alpha. We hope it can become a resource to progress single-cell proteome analysis, step into identification and quantification of proteolytic substrates and contribute to devising new strategies for precision medicine.

Novel proteomics technologies to decipher protease networks in cells and tissues 141 References Bekker-Jensen, D., del Val, A., Steigerwald, S., Rüther, P., Fort, K., Arrey, T., . . . Olsen, J. (2019). A Compact Quadrupole-Orbitrap Mass Spectrometer with FAIMS Interface Improves Proteome Coverage in Short LC Gradients. Molecular & Cellular Proteomics.

Cong, Y., Liang, Y., Motamedchaboki, K., Huguet, R., Truong, T., Zhao, R., . . . Kelly, R. (2020). Improved Single-Cell Proteome Coverage Using Narrow-Bore Packed NanoLC Columns and Ultrasensitive Mass Spectrometry. Analytical Chemistry, 92(3).

Dalerba, P., Kalisky, T., Sahoo, D., Rajendran, P., Rothenberg, M., Leyrat, A., . . . Quake, S. (2011). Single-cell dissection of transcriptional heterogeneity in human colon tumors. Nature Biotechnology, 29(12).

Deutsch, E., Lam, H., & Aebersold, R. (2008). PeptideAtlas: A resource for target selection for emerging targeted proteomics workflows. EMBO Reports, 9(5).

Grossegesse, M., Hartkopf, F., Nitsche, A., & Doellinger, J. (2020 m. 7 2 d.). Stable Isotope-Triggered Offset Fragmentation Allows Massively Multiplexed Target Profiling on Quadrupole-Orbitrap Mass Spectrometers. Journal of Proteome Research, 19(7), 2854-2862.

Kim, M. S., Pinto, S. M., Getnet, D., Nirujogi, R. S., Manda, S. S., Chaerkady, R., ... & Thomas, J. K. (2014). A draft map of the human proteome. Nature, 509(7502), 575-581. Kornblau, S., Qutub, A., Yao, H., York, H., Qiu, Y., Graber, D., . . . Coombes, K. (2013). Proteomic Profiling Identifies Distinct Protein Patterns in Acute Myelogenous Leukemia CD34+CD38- Stem-Like Cells. PLoS ONE, 8(10).

Kuster, B., & Rose, C. M. (2020). Single Cell Proteomics and the Carrier Proteome Effect. Proceedings of the 68th ASMS Conference on Mass Spectrometry and Allied Topics (p. MP 544). Austin, TX: ASMS.

Marx, V. (2019). A dream of single-cell proteomics. Nature Methods, 16(9).

Rauniyar, N. (2015). Parallel reaction monitoring: A targeted experiment performed using high resolution and high mass accuracy mass spectrometry. International Journal of Molecular Sciences, 16(12).

Samaras, P., Schmidt, T., Frejno, M., Gessulat, S., Reinecke, M., Jarzab, A., . . . Wilhelm, M. (2020). ProteomicsDB: A multi-omics and multi-organism resource for life science research. Nucleic Acids Research, 48(D1).

Schoof, E., Rapin, N., Savickas, S., Gentil, C., Lechman, E., Haile, J., . . . Porse, B. (2019). A Quantitative Single-Cell Proteomics Approach to Characterize an Acute Myeloid Leukemia Hierarchy.

Specht, H., Emmott, E., Koller, T., & Slavov, N. (2019). High-throughput single-cell proteomics quantifies the emergence of macrophage heterogeneity. bioRxiv.

Tiaden, A. N., Breiden, M., Mirsaidi, A., Weber, F. A., Bahrenberg, G., Glanz, S., … Richards, P. J. (2012). Human HTRA1 positively regulates osteogenesis of human bone marrow-derived mesenchymal stem cells and mineralization of differentiating bone-forming cells through the modulation of extracellular matrix protein. Stem Cells, 30(10). https://doi.org/10.1002/stem.1190

Ting, L., Rad, R., Gygi, S., & Haas, W. (2011). MS3 eliminates ratio distortion in isobaric multiplexed quantitative proteomics. Nature Methods, 8(11).

Xu, X., Hou, Y., Yin, X., Bao, L., Tang, A., Song, L., . . . Wang, J. (2012). Single-cell exome sequencing reveals single- nucleotide mutation characteristics of a kidney tumor. Cell, 148(5).

Zahreddine, H., & Borden, K. (2013). Mechanisms and insights into drug resistance in cancer. Frontiers in Pharmacology, 4 MAR.

Zurawa-Janicka, D., Kobiela, J., Galczynska, N., Stefaniak, T., Lipinska, B., Lachinski, A., … Sledzinski, Z. (2012). Changes in expression of human serine protease HtrA1, HtrA2 and HtrA3 genes in benign and malignant thyroid tumors. Oncology Reports, 28(5). https://doi.org/10.3892/or.2012.1988

142 Novel proteomics technologies to decipher protease networks in cells and tissues 8. Concluding discussion

Upon healing of a cutaneous wound, a series of MMPs are secreted at the site of injury. They have been mainly investigated for their functions in remodeling of the extracellular matrix, but it is now clear that they also process non-matrix proteins and act not alone but within an interconnected activation network. However, besides direct biochemical assays or recombinant protein interactions the endogenous MMP interactivity has been poorly understood. Compared to other proteolytic cascades, the MMP network is poorly defined in vivo, since comprehensive studies in complex proteomes of cell lines or animal tissue are missing. The Handbook of Proteolytic Enzymes (Rawlings et al. 2013) lists all possible MMP-MMP interactions with evidence for most of them only at the recombinant enzyme level, while e.g. the blood coagulation pathway has been mapped also in animals. Therefore, we have selected seven secreted MMPs as candidates and performed a detailed quantitative analysis of their activity status in different skin related matrices using a targeted degradomics workflow. Despite significant advancements in technology and methodology of mass spectrometry- based proteomics in detecting low-abundance proteins and peptides, MMPs have been primarily measured using ELISA assays with low throughput (Duffy,M. J. et al., 1995; Kihira, Y., et al, 1996; Kumamoto, K., et al., 2009;Yoon, S., et al., 1999; Zucker, S., et al., 1992). For several MMPs no specific antibody is available (Mirastschijski, U., et al. 2019), and high amounts of starting material is needed for analysis. In addition, current targeted approaches for assessment of skin proteins in vivo require tedious sample preparation methods, involving mechanical disruption e.g. by UltraTurrax dissociation that are associated with significant sample losses. To address these limitations, we have developed technologies to investigate the interconnectivity of MMPs and their activation interactions. First, we implemented a targeted degradomics workflow that investigates 100s of targets with a single injection at ELISA level sensitivity for MMPs and their substrates, preceded by high extraction efficiency in vivo skin extraction using pressure cycling technology. Secondly, we designed a method for semi-discovery analysis termed hybrid Parallel Reaction Monitoring (hPRM) that allows sensitive targeted quantification of the MMP network and discovery of changes in the surrounding proteome. Third, we developed workflows for single-cell proteomics and single-cell targeted proteomics that could be extended to degradomics analyses to discover and quantify proteases and their substrates. With the developed technologies at hand we managed to overcome many shortcomings of the previous tools and methods. For the development of targeted degradomics assays, we exploited one of the most sensitive and high-throughput mass spectrometry technique called Parallel Reaction Monitoring (Bourmaud, A., et al., 2016) and used online resources like MEROPS (Rawlings, et al., 2020) and SRMatlas (Kusebauch, et al., 2016) to design highly sensitive assays for MMP quantification. As anticipated, the selected peptides for MMPs yielded the most intense signal and

Novel proteomics technologies to decipher protease networks in cells and tissues 143 they fulfilled a requirement to represent three different features of the protein: Neo-N terminus generated upon zymogen removal, tryptic spanning peptide extending over the zymogen removal site, and three tryptic peptides covering the mature C-terminal part of the protease. While detection of the neo-N terminus is critical to determine the activation status of the MMPs, it is also important to investigate the endogenous abundance levels of the proteases quantified by the three core peptides. With our targeted degradomics assays, we demonstrated that the designed assays can reach the sensitivity and dynamic range of a classical ELISA kit. Evidently, the seven target proteases are endogenously very low in abundance in murine tissue (Steinsträer, et al., 2010). Thus, the high sensitivity assays were able to quantify the proteases down to 0.4pg of MMPs in 1ug of complex proteome. As a targeted proteomics workflow, targeted degradomics still suffers from several drawbacks, such as retention time shifts between samples in chromatography, potential co- isolation of peptides, limited target amount and tedious bioinformatic analysis of the generated data. Moreover, although targeted degradomics has an immeasurable resource available for peptide selection and assay design, it is still limited by exogenous endopeptidases available for formation of peptides long enough to be measurable by a mass spectrometer. Presumably, 90% of substrates of interest are detectable with a classical setup, but in order to monitor the remaining 10% the workflow has to be extended e.g. by enrichment methods or specialized endopeptidase (such as Lys-N) treatment (Dupr, Cantel, Verdi, Martinez, & Enjalbal, 2011; Giansanti, Tsiatsiani, Low, & Heck, 2016). Furthermore, we have expanded our targeted degradomics workflow into semi-discovery analysis, which not only monitors the selected targets but also identifies proteins and substrates in the surrounding proteome. The new feature allows quantification of substrates and peptides without sacrificing the sensitivity of targeted peptides. In a single shot, while the target MMPs were analyzed we could investigate hundreds of proteins and protease cleavage events with one of them being a published substrate of cathepsin D (Impens, et al., 2010). This reduces sample consumption, analysis time, and interestingly is applicable using conventional software (tested on Thermo Fisher Q Exactive XcaliburTM and Thermo Fisher Fusion XcaliburTM). Hybrid PRM can be improved by designing a user-friendly software tool, since today it requires tedious manual handling to arrange the scheduling parameters. Also, tools available to process and quantify the generated data like Skyline (MaCoss lab) (MacLean, et al., 2010) and Spectronaut (Biognosys) (Muller, et al., 2019) have to be used independently for targeted analysis and discovery analysis. To improve the consistency of the analysis pipelines it needs to be combined. We discovered that in the discovery approach the method suffers from high degree of variation between samples (CV ~20%), which can be reduced by adjustment of the number of DIA-type windows but instantly reduces the number of identified peptides. Similar processing workflows have been published in recent years (WiSIM-DIA, BoxCar, PulseDIA) (Zheng, et al., 2020) that address the issue of sensitivity for each data-independent acquisition scan, while we

144 Novel proteomics technologies to decipher protease networks in cells and tissues expand from a highly sensitive and selective system into the discovery phase through data- independent acquisition. Furthermore, with the rapid technological advances in the sensitivity and selectivity of mass spectrometers, we have devised a single-cell proteomics technology. Thereby, different cell proteotypes and cell states in a tissue sample can be recognized one cell at a time. With over 400 proteins identified in each cell, it is possible to distinguish a cell type from its proteomic signature alone. Such technology provided direct analysis of proteolytic signatures of intracellular proteases in Acute Myeloid Leukemia (AML). Although the selected and sorted cells are able to show us only intracellular proteolysis for now, we were able to identify 22 proteases from several different classes and families. We observed differential abundances of cathepsin D and Caspase 14 in Leukemic Blasts, Leukemic Stem Cells, and Leukemic Progenitor Cells. cathepsin D is one of the few relatively highly abundant proteins that is localized in the lysomal compartment of the cell and has been shown to regulate p53 chemosensitivity. With an increased abundance of the protease, the cells are more susceptible to treatment with adriamycin and etoposide (Wang, et al., 2006; Wu, Saftig, Peters, & El-Deiry, 1998). For now, the technology of single-cell proteomics is limited to intracellular proteins. Thus, with further development of sample handling and cell preparation it might be possible to explore the extracellular space using microfluidic encapsulation or a similar technique (Yanakieva, et al., 2020). Besides the limited cellular organelle analysis, we also observe limited protein identification. The ambitious attempt to quantify the entire cell proteome comes short compared to highly fractionated high amount heterogenous cell populations. Further improvements in technology needs to be addressed. One of the hurdles has been overcome with our targeted single-cell proteomics workflow that allows us to identify theoretically any protein present in the cell at very low concentrations. This extended version of targeted proteomics and single-cell proteomics allowed us to identify and quantify very lowly abundant 99 targets in a Leukemic Blast, Leukemic Progenitor, and Leukemic Stem Cell. We were able to verify FACS sorting precision of these three cell types according to relative abundances of CD34 and CD38 (Raffel, et al., 2017). These two membrane proteins were not consistently visible in the untargeted analysis, thus missing values could be filled in by our targeted approach. The other advantages of this technology include increased throughput and decreased computational strain. It uses conventional software tools like Skyline and instrument control method SureQuant. The combination of the two can be easily expanded to other cell lines or laser capture micro-dissected tissue. The co-isolation of reporter ions remains an issue derived from the single -ell proteomics workflow and requires MS3 quantification. This can be implemented using TOMAHTO user interface on a Thermo Fisher tribrid instrument (Yu, et al., 2020). It is a tool that is under development, has a simplified user interface and advances the TOMAHAQ principle (Erickson, et al., 2017). However, with the increased complexity of generated data bioinformatic tools have to

Novel proteomics technologies to decipher protease networks in cells and tissues 145 be adjusted accordingly. Unfortunately, today only Proteome Discoverer is able to annotate such data structures for targeted analysis, thus it would be beneficial to generate new data processing pipelines.

146 Novel proteomics technologies to decipher protease networks in cells and tissues References

Bourmaud, A., Gallien, S., & Domon, B. (2016). Parallel reaction monitoring using quadrupole-Orbitrap mass spectrometer: Principle and applications. Proteomics. https://doi.org/10.1002/pmic.201500543

Duffy, M. J., Blaser, J., Duggan, C., McDermott, E., O’Higgins, N., Fennelly, J. J., & Tschesche, H. (1995). Assay of matrix metalloproteases types 8 and 9 by ELISA in human breast cancer. British Journal of Cancer, 71(5). https://doi.org/10.1038/bjc.1995.197

Dupré, M., Cantel, S., Verdié, P., Martinez, J., & Enjalbal, C. (2011). Sequencing Lys-N proteolytic peptides by ESI and MALDI tandem mass spectrometry. Journal of the American Society for Mass Spectrometry, 22(2).

Erickson, B., Rose, C., Braun, C., Erickson, A., Knott, J., McAlister, G., . . . Gygi, S. (2017). A Strategy to Combine Sample Multiplexing with Targeted Proteomics Assays for High-Throughput Protein Signature Characterization. Molecular Cell, 65(2).

Giansanti, P., Tsiatsiani, L., Low, T., & Heck, A. (2016). Six alternative proteases for mass spectrometry-based proteomics beyond trypsin. Nature Protocols.

Impens, F., Colaert, N., Helsens, K., Ghesquière, B., Timmerman, E., De Bock, P., . . . Gevaert, K. (2010). A quantitative proteomics design for systematic identification of protease cleavage events. Molecular and Cellular Proteomics, 9(10).

Kihira, Y., Mori, K., Miyazaki, K., & Matuo, Y. (1996). Production of recombinant human matrix metalloproteinase 7 (matrilysin) with potential role in tumor invasion by refolding from Escherichia coli inclusion bodies and development of sandwich ELISA of MMP-7. Urologic Oncology, 2(1). https://doi.org/10.1016/1078- 1439(96)00030-0

Kumamoto, K., Fujita, K., Kurotani, R., Saito, M., Unoki, M., Hagiwara, N., … Harris, C. C. (2009). ING2 is upregulated in colon cancer and increases invasion by enhanced MMP13 expression. International Journal of Cancer, 125(6). https://doi.org/10.1002/ijc.24437

Kusebauch, U., Campbell, D. S., Deutsch, E. W., Chu, C. S., Spicer, D. A., Brusniak, M. Y., … Moritz, R. L. (2016). Human SRMAtlas: A Resource of Targeted Assays to Quantify the Complete Human Proteome. Cell, 166(3). https://doi.org/10.1016/j.cell.2016.06.041

MacLean, B., Tomazela, D. M., Shulman, N., Chambers, M., Finney, G. L., Frewen, B., … MacCoss, M. J. (2010). Skyline: An open source document editor for creating and analyzing targeted proteomics experiments. Bioinformatics. https://doi.org/10.1093/bioinformatics/btq054

Mirastschijski, U., Dinesh, N., Baskaran, S., Wedekind, D., Gavrilovic, J., Murray, M. Y., … Kelm, S. (2019). Novel specific human and mouse stromelysin-1 (MMP-3) and stromelysin-2 (MMP-10) antibodies for biochemical and immunohistochemical analyses. Wound Repair and Regeneration, 27(4). https://doi.org/10.1111/wrr.12704

Müller, F., Kolbowski, L., Bernhardt, O. M., Reiter, L., & Rappsilber, J. (2019). Data-independent Acquisition Improves Quantitative Cross-linking Mass Spectrometry. Molecular & Cellular Proteomics, 18(4). https://doi.org/10.1074/mcp.tir118.001276

Raffel, S., Falcone, M., Kneisel, N., Hansson, J., Wang, W., Lutz, C., . . . Trumpp, A. (2017). BCAT1 restricts αkG levels in AML stem cells leading to IDHmut-like DNA hypermethylation. Nature, 551(7680).

Rawlings, N. D. (2020). Twenty-five years of nomenclature and classification of proteolytic enzymes. Biochimica et Biophysica Acta - Proteins and Proteomics, 1868(2). https://doi.org/10.1016/j.bbapap.2019.140345

Novel proteomics technologies to decipher protease networks in cells and tissues 147 Steinsträer, L., Jacobsen, F., Hirsch, T., Kesting, M., Chojnacki, C., Krisp, C., & Wolters, D. (2010). Immunodepletion of high-abundant proteins from acute and chronic wound fluids to elucidate low-abundant regulators in wound healing. BMC Research Notes, 3.

Yanakieva, D., Elter, A., Bratsch, J., Friedrich, K., Becker, S., & Kolmar, H. (2020). FACS-Based Functional Protein Screening via Microfluidic Co-encapsulation of Yeast Secretor and Mammalian Reporter Cells. Scientific reports, 10(1), 1-13.

Wu, G., Saftig, P., Peters, C., & El-Deiry, W. (1998). Potential role for cathepsin D in p53-dependent tumor suppression and chemosensitivity. Oncogene, 16(17).

Wang, Z., Liang, R., Huang, G. S., Piao, Y., Zhang, Y. Q., Wang, A. Q., ... & Guo, Y. (2006). Glucosamine sulfate– induced apoptosis in chronic myelogenous leukemia K562 cells is associated with translocation of cathepsin D and downregulation of Bcl-xL. Apoptosis, 11(10), 1851-1860.

Yoon, S., Tromp, G., Vongpunsawad, S., Ronkainen, A., Juvonen, T., & Kuivaniemi, H. (1999). Genetic analysis of MMP3, MMP9, and PAI-1 in Finnish patients with abdominal aortic or intracranial aneurysms. Biochemical and Biophysical Research Communications, 265(2). https://doi.org/10.1006/bbrc.1999.1721

Yu, Q., Xiao, H., Jedrychowski, M., Schweppe, D., Navarrete-Perea, J., Knott, J., . . . Gygi, S. (2020). Sample multiplexing for targeted pathway proteomics in aging mice. Proceedings of the National Academy of Sciences of the United States of America, 117(18).

Zucker, S., Lysik, R. M., Gurfinkel, M., Zarrabi, M. H., Stetler-Stevenson, W., Liotta, L. A., … Mann, W. (1992). Immunoassay of type IV collagenase / gelatinase (MMP-2) in human plasma. Journal of Immunological Methods, 148(1–2). https://doi.org/10.1016/0022-1759(92)90172-P

148 Novel proteomics technologies to decipher protease networks in cells and tissues 9. Supplementary material

Supplemental material is stored online and can be accessed using this link. https://files.dtu.dk/a/XAshFsULIRlOCyES/Thesis_supplementary_material?l

Please use this reviewer code: F9uUY*M5

It contains:

Supplement 1- Manu_4_Spectrums of proteins and their targeted peptides Supplement 2- Manu_4_PRM traces of MMP peptides Supplement 3- Manu_4_Peptide targets Supplement 4-Manu_4_PRM_values Supplement 5_Manu_4_R_Processing_script Supplement 6- Manu_5_hPRM_windows Supplement 7-Manu_7_SC-SQ_triggers Supplement 8- Manu_7_SC-SQ Quant_values Supplement 9-Manu_7_SureQuant_proteins Supplement 10-Manu_7_Skyline Supplement 11-Manu_5_hPRM_Spectronaut

Novel proteomics technologies to decipher protease networks in cells and tissues 149 10. Acknowledgments

This thesis summarizes the research I have carried out at the Institute of Molecular Health Sciences at ETH Zurich and the Department of Bioengineering, Technical University of Denmark. Many people have contributed to the success of my PhD project. First, I want to thank my supervisor Prof. Dr. Ulrich auf dem Keller for his great help. Thank you, Ulrich, for selecting me as your first PhD student at Technical University of Denmark, for your comprehensive and intense supervision, for encouraging me, guiding me, and overseeing my personal growth through the years. I am looking forward to our future collaborations. I would like to thank Prof. Birte Svensson for being my co-supervisor and Prof. Rune Busk Damgaard, Prof. Jesper V. Olsen and Prof. Christopher M. Overall for being members of my thesis committee. Very special thanks to Lene Holberg Blicher, Maike Wennekers Nielsen and Ass.Prof. Erwin Schoof at the DTU Proteomics Core. This work would not have been possible without unconventional instrument access by Thermo Fisher Scientific team members: Dr Aaron Gajadhar, Dr. Bhavin B. Patel, Dr. Sebastien Gallien, Dr. Khatereh Motamed. Also, thanks to Prof. Sabine Werner for hosting me in the beginning of the PhD journey in her lab. I would like to thank Elizabeta Madzdarova, Vahap Canbay, Cathrine Agnete Larsen, Konstantinos Kalogeropoulos, Oscar Mejias Gomez, Dr. Philipp Kastl, Dr. Fábio Sabino, Dr. Farrell McGeoghan, Dr. Louise Bundgaard, and all members of the section for creating a nice working atmosphere, for their help, for useful discussions, really a lot of cake and a good time. I want to thank Maria Rafaeva for her dear help showing me that the world does not revolve around work alone. Many thanks to my friends in Copenhagen and my family, especially my brother and sister in law: Dr. Vytautas & Dr. Andrada Savickas for showing me that there is light at the end of the tunnel.

Simonas Savickas Copenhagen, September 2020

150 Novel proteomics technologies to decipher protease networks in cells and tissues This thesis is the result of a PhD project under the supervision of Prof. Ulrich auf dem Keller and Prof. Birte Svensson at DTU Bioengineering.

Danmarks Tekniske Universitet

Søltofts Plads 224 Evt. Post Box 2800 Kgs. Lyngby Tlf. 45 25 20 65 E-mail [email protected] www.bioengineering.dtu.dk