<<

Sampling the N-terminal proteome of human blood

David Wildes and James A. Wells1

Departments of Pharmaceutical Chemistry and Cellular and Molecular Pharmacology, University of California, San Francisco, Byers Hall, 1700 4th Street, San Francisco, CA 94158

Contributed by James A Wells, December 15, 2009 (sent for review October 15, 2009)

The proteomes of blood plasma and serum represent a potential gold mine of biological and diagnostic information, but challenges such as dynamic range of protein concentration have hampered efforts to unlock this resource. Here we present a method to label and isolate N-terminal peptides from human plasma and serum. This process dramatically reduces the complexity of the sample by eliminating internal peptides. We identify 772 unique N-terminal peptides in 222 proteins, ranging over six orders of magnitude in abundance. This approach is highly suited for studying natural proteolysis in plasma and serum. We find internal cleavages in plasma proteins created by endo- and exopeptidases, providing information about the activities of proteolytic enzymes in blood, which may be correlated with disease states. We also find signatures of signal peptide cleavage, coagulation and complement activation, and other known proteolytic processes, in addition to a large number of cleavages that have not been reported previously, including over 200 cleavages of blood proteins by aminopeptidases. Finally, we can identify substrates of specific proteases by exogenous addition of the protease combined with N-terminal isolation and quantitative mass spectrometry. In this way we identified proteins cleaved in human plasma by membrane-type serine protease 1, an enzyme linked to cancer progression. These studies demonstrate the utility of direct N-terminal labeling by subtiligase to identify and characterize endogenous and exogenous proteolysis in human plasma and serum.

plasma ∣ protease ∣ proteomics ∣ serum ∣ biomarker

BIOCHEMISTRY

Author contributions: D.W. and J.A.W. designed research; D.W. performed research; D.W. and J.A.W. analyzed data; and D.W. and J.A.W. wrote the paper.
The authors declare no conflict of interest.
1To whom correspondence should be addressed: E-mail: [email protected].
This article contains supporting information online at www.pnas.org/cgi/content/full/0914495107/DCSupplemental.

The proteomes of human blood serum and plasma contain a vast amount of useful information about the state of the body in health and disease. Because the blood contacts virtually every cell and tissue throughout the body, it contains many proteins and other chemicals that may report on health and disease. In addition, blood collection is simple and minimally invasive, making it a medium of choice for many classical diagnostic tests. Unfortunately, the blood proteome has been challenging to exploit for discovery of protein biomarkers, because of the large number of unique proteins and their degradation products and the broad range of protein concentrations (from millimolar to picomolar or below) in serum and plasma. Just 22 proteins are estimated to make up 99% of the blood proteome by mass. Promising candidates for diagnostic markers, such as cytokines, growth factors, and cancer-specific antigens, may be more than a billionfold less abundant than the major blood proteins (1). Immunoaffinity depletion of certain abundant proteins is typically employed to improve dynamic range, though it has potential disadvantages, including high cost and the possibility of removing low-abundance species that bind to highly abundant proteins (2, 3).

Many biomarker discovery efforts search for variations in total abundance of particular proteins. This approach is simple to implement and conceptually straightforward but may miss potentially informative variation in a sample. A given protein in blood may be posttranslationally modified in myriad ways, including differential glycosylation, sulfation, oxidation, glycation, proteolysis, and many others. Modified proteins may be informative about disease states; for instance, glycated hemoglobin (HbA1c) and serum albumin are useful markers for diabetes mellitus (4). This information is lost when protein abundance alone is measured. Specific enrichment of modified species can address these challenges, because separating modified from unmodified peptides greatly reduces sample complexity. Proteolysis is an excellent candidate for this strategy. Most blood proteins are subject to at least one proteolytic cleavage, when the N-terminal secretory signal is removed during biogenesis. Additional cleavages may occur in the secretory pathway, and proteins may be further processed by endo- and exoproteases acting in biological processes such as coagulation and complement. These proteolytic processes may be perturbed in disease, and disease-specific, protease-derived new N termini in blood may be a valuable class of biomarkers.

Recently, we and others have developed a number of complementary chemical methods to isolate and identify N-terminal peptides in proteins, on the basis of positive or negative enrichment strategies (reviewed in refs. 5, 6). Here we apply one such method to isolate and identify the products of proteolysis of blood proteins, an underexploited class of potential biomarkers. These methods and data can further illuminate the role of proteases in blood biology and could provide a strategy for blood-based biomarker discovery.

Results and Discussion

N-Terminal Enrichment Strategy. Specific labeling and isolation of protein N termini is challenging, because of the similar reactivity of N-terminal α-amines and the >20-fold more abundant ϵ-amines of lysine side chains. We have addressed this challenge by employing an engineered enzyme, called subtiligase. Subtiligase is a double mutant (S221C/P225A) of the subtilisin BPN′ from Bacillus amyloliquefaciens (7), containing additional modifications that enhance stability (8, 9). It lacks detectable protease activity but is capable of cleaving peptide glycolate esters, forming a thioester enzyme intermediate that can be transferred onto free protein and peptide N termini. Subtiligase exhibits absolute specificity for N-terminal α-amines over lysine ϵ-amines, making it an excellent tool for N-terminal labeling. Our group has previously described a subtiligase-based method for labeling, isolation, and enrichment of protein N termini in cell lysates (10). This protocol was modified for plasma and serum labeling and is shown schematically in Fig. 1A. N-terminal peptides are isolated with a characteristic serine-tyrosine dipeptide tag, providing a characteristic mass shift to all labeled precursor ions as well as two prominent fragment ions in all MS/MS spectra. This tag provides strong evidence for subtiligase tagging, enrichment, and recovery.
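The mass shift contributed by the serine-tyrosine tag can be illustrated with a short calculation. This is a minimal sketch, assuming the tag adds exactly one serine residue and one tyrosine residue to the N terminus and using standard monoisotopic residue masses; the function names and the example peptide mass are hypothetical and are not taken from the paper.

```python
# Monoisotopic residue masses in daltons (standard values).
SER_RESIDUE = 87.03203   # C3H5NO2
TYR_RESIDUE = 163.06333  # C9H9NO2
PROTON = 1.007276        # mass of a proton, used for m/z conversion


def sy_tag_shift() -> float:
    """Mass added to a peptide by an N-terminal Ser-Tyr dipeptide (assumed tag)."""
    return SER_RESIDUE + TYR_RESIDUE


def tagged_mz(peptide_mass: float, charge: int) -> float:
    """m/z of a tagged peptide, given its untagged neutral monoisotopic mass."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (peptide_mass + sy_tag_shift() + charge * PROTON) / charge


if __name__ == "__main__":
    # Hypothetical peptide of neutral mass 1000.0 Da observed as a 2+ ion.
    print(f"tag shift: {sy_tag_shift():.4f} Da")
    print(f"tagged 2+ m/z: {tagged_mz(1000.0, 2):.4f}")
```

Under these assumptions every labeled precursor is heavier than its unlabeled counterpart by the same fixed amount, which is what makes the tag usable as a filter when matching spectra.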

www.pnas.org/cgi/doi/10.1073/pnas.0914495107 PNAS Early Edition

[Fig. 1A schematic: subtiligase transfers a biotin-ENLYFQ-SY peptide ester onto protein N termini; labeled proteins are captured on avidin, digested with trypsin, released with TEV protease, and analyzed by SCX fractionation and LC-MS/MS.]

[Fig. 2A plot: log (plasma concentration (M)), axis from -3 to -10. Labeled proteins: albumin, ApoAI, fibrinogen Aα, complement C3, plasminogen, gelsolin, prothrombin, factor XIII, complement C4, vWF, factor XII, ADAM-TS 13, HGF activator, VEGF-D. Fig. 2B plot: number of termini, axis from 10 to 50.]

Fig. 1. Method for specific, enzymatic labeling of N termini in serum. (A) Schematic of workflow. Subtiligase is used to transfer a peptide containing biotin and a TEV protease-cleavable linker onto protein N termini. Proteins are then captured on streptavidin beads and trypsinized, removing all but the N-terminal tryptic peptide. Trypsinization on beads reduces unlabeled background created from sample precipitation in solution digests. N-terminal peptides are released with TEV protease for strong cation exchange fractionation and MS/MS analysis. Release leaves a SY-dipeptide tag on the N terminus.

Fig. 2. Concentration distribution of proteins and reproducibility of the method. (A) A subset of 110 proteins of established abundance (11) is plotted by mean molar concentration in plasma. Representative low, medium, and high abundance proteins are labeled. (B) The number of N termini detected in each protein is shown, arranged in order of abundance. Proteins depicted in this plot are given in Table S1. (C) Venn diagram showing results of three replicate experiments on a single sample of citrated plasma.

[Fig. 2B x axis: rank abundance. Fig. 2C Venn region counts, in extraction order: 43, 13, 113, 87, 6, 3, 31.]

The N-Terminal Proteome of Blood. By using our N-terminal enrichment technique, we identified 772 unique N termini in 222 proteins in human serum and plasma (Dataset S1), with an overall peptide false discovery rate estimated at 1.0% by a target-decoy strategy. We found N termini in blood proteins with concentrations spanning at least six orders of magnitude, with excellent coverage in the top four logs of abundance, where we detected over 70% of the 150 most abundant proteins (11) (Fig. 2 and Table S1).

The number of unique termini found in each protein varied greatly and was generally consistent with the role of proteolysis in protein function. For example, multiple N termini were found in many coagulation and complement factors. We also found proteolysis of abundant proteins where the biological function of proteolytic cleavage is less clear. It is possible that some of these cleavages represent nonspecific cleavage of abundant proteins by blood proteases, although it is notable that the correlation between number of N termini discovered and protein abundance is weak. Indeed, the abundant plasma protein alpha-1 acid glycoprotein yields no detectable N termini.

We assessed the reproducibility of our labeling and enrichment strategy by performing three technical replicate experiments on a single sample of citrated plasma (Fig. 2C and Dataset S1). We found substantial overlap between these three samples, with 29% of peptides found in all three experiments and 56% found in at least two. This level of overlap between technical replicates is well within the range expected for the mass spectrometry techniques that we used (12) and suggests that our N-terminal labeling does not result in major variations between samples.

The subcellular localizations, as annotated in Swiss-Prot, for each of the 222 proteins we report are shown in Fig. 3A. As would be expected for a survey of blood proteins, 67% are known to be secreted. The proportion of annotated secreted proteins reported here is substantially higher than the 50% found in a recently compiled, high confidence list of proteins found in plasma (13), likely reflecting a bias inherent in N-terminal labeling. Intracellular proteins arising from both tissue leakage in vivo and cell lysis during sample collection and preparation are likely to be acetylated on their native N termini (14), rendering them undetectable in the absence of internal proteolytic processing. Thus our method is more sensitive to secreted proteins whose N termini are free after signal sequence removal, and this is reflected in the proportion of secreted protein identifications.

Endoproteolysis and Exoproteolysis in Blood. Most blood proteins are subject to proteolytic cleavage by a multitude of proteases in the secretory pathway and the extracellular environment. Tracking these proteolytic events could shed light on important biological processes in health and disease. We compared the N termini that we found to annotations in the Swiss-Prot and MEROPS databases to identify known termini resulting from well-understood biological processes (Fig. 3B). Interestingly, 81% of the N termini that we found are not annotated in either database. Annotated signal peptide cleavages, within five residues of predicted signal processing sites, account for 11% of our data, with annotated propeptide cleavages making up another 4.5%. Thrombin activity on fibrinogen (2%), cleavage of “bait loops” in protease inhibitors (0.9%), and removal of initiator methionines (0.4%) are also represented. Consistent with previous studies (15), a significant number of N termini (28%) appear to arise from aminopeptidase processing of peptides derived from endoprotease cleavage, indicated by systematic laddering of products (Fig. 3C). In total, 112 termini in 53 proteins are subject to aminopeptidase trimming, ranging from removal of a single amino acid to long ladders of aminopeptidase-processed termini. Trimming occurs on termini derived from a variety of endoprotease cleavages, including signal and propeptide removal, reactive center loop of serpin cleavage, and cleavages of unknown significance.

Fig. 3. The N-terminal proteome of human blood. (A) The subcellular locations, as annotated in Swiss-Prot (www.uniprot.org), of the proteins detected in this study. (B) Cleavage site annotations of detected N termini, according to Swiss-Prot and MEROPS (40) databases. (C) Evidence of aminopeptidase trimming of N termini in three proteins. Similar degradation was seen in 112 N termini. (D) Comparison of cleavage annotations found in serum and plasma collected with three different anticoagulants.

Proteolysis in Serum and Plasma. We investigated the N-terminal proteomes of human serum and plasma collected with three different anticoagulants, with increasingly stringent suppression of proteolytic activity: citrate, EDTA, and the proprietary P100 system (BD). Serum is expected to differ from plasma because of the initiation of coagulation, resulting in cleavage of coagulation factors, release of platelet granule contents, and a general increase in proteolytic activity (16). A comparison of the types of N termini found in serum and the three plasmas is shown in Fig. 3D. The overall differences are modest but some patterns are evident. Serum and citrated plasma are enriched for N termini of unknown significance. EDTA and P100 plasma have proportionately fewer unknown cleavages and a concomitant increase in signal peptide and other annotated cleavages. This increase is consistent with a higher background of proteolysis in serum and citrated plasma, leading to cleavages of abundant proteins after sample collection. Serum and citrate appear similar in their background proteolysis levels, as has been shown previously (16). EDTA is a stronger inhibitor of plasma proteases than citrate, and the lower proportion of unknown cleavages in EDTA plasma reflects this. In these experiments the additional protease inhibition provided by P100 tubes does not significantly improve the results, though others have shown a reduction in proteolysis of specific substrates (16). Interestingly, aminopeptidase-derived termini comprised an approximately equal proportion of all samples, suggesting either that this activity is not affected by any of these anticoagulants or that these termini reflect in vivo proteolytic processing that is not affected by the conditions of sample collection.

Sequence and Structural Determinants of Proteolytic Cleavage. In order to understand the nature of the proteolytic enzymes acting on proteins in the blood, we investigated the cleavage sequences of the 461 endoprotease-derived N termini of unknown significance in our dataset (Fig. 4B). A preference is seen for basic (R, K) residues preceding the cut site, and small (G, S, A) residues following, which is consistent with many endoproteases, including those of the coagulation (17) and complement (18) pathways. Efforts to discover a simple recognition motif(s) by using the program MotifX (19) did not reveal a clear consensus sequence; evidently either the protease(s) responsible have relatively low specificity at the level of the primary structure of the cleavage site or the proteolysis we see is because of the action of many different proteases with varied specificity.

Proteolytic processing is important for the activation and inactivation of factors involved in coagulation and complement cascades (18, 20). Proteolysis in these systems has been characterized extensively in vitro, and we compared our data to the in vitro findings for some of these proteins (Fig. 5A). Prothrombin lies at the center of the coagulation cascade and is activated to thrombin by a series of discrete cleavages, shown as gaps in the rectangular representation of the protein in Fig. 5A. Thrombin cleaves fibrinogen, initiating fibrin polymerization to form a clot (20). We identified the expected activating cleavages of thrombin, in addition to a few cleavages of unknown significance within the activation peptides.

In contrast to prothrombin, we see much more heterogeneous cleavage of complement C3. C3 is extensively proteolyzed throughout its life cycle in blood. After an activating cleavage to C3a and C3b by C3 convertase complexes, factor I and other enzymes inactivate C3b by additional cleavages. A series of discrete fragments has been defined in vitro (21). Our findings indicate that C3b inactivation by factor I in vivo may be more heterogeneous than previously appreciated. Whereas the overall distribution of fragments is consistent in our data, we detect clusters of cleavages at the domain boundaries (shown as vertical arrows in Fig. 4C) rather than isolated, discrete cuts. Interestingly, these areas of heterogeneous proteolysis cluster internally only in specific fragments of C3: C3a, C3g, and C3f. C3a in particular is a mediator of inflammation, and internal cleavages within it may antagonize this function (22). These data suggest that even well-studied proteolytic cascades may yield unique insights from such global analysis, made possible by N-terminal proteomics.

Fig. 4. Patterns of endoproteolysis. (A) Sequence logo of the eight residues (P4–P4′) surrounding the cleavage site of all nonannotated, endoproteolytic cleavages. The y axis denotes information content and has a maximum value of 4.2. Logo created by Weblogo (http://weblogo.berkeley.edu/) (41). (C) Cleavage of two significant blood proteins, prothrombin and complement C3. Swiss-Prot annotated cleavage sites divide the proteins into domains as shown on the schematic. Cleavage sites detected in this study are indicated with arrows. AP, thrombin activation peptide; LC, thrombin light chain.

Proteome Simplification by N-Terminal Isolation. N-terminal isolation reduces each protein in a mixture to one or a few peptides, which potentially has the advantage of reducing the interference caused by abundant proteins (23). For example, serum albumin produces about 100 tryptic peptides but has only a single N terminus, meaning that N-terminal labeling may reduce albumin peptides by 100-fold. In practice, we detected internal peptides from serum albumin (22 total) but at substantially reduced abundance compared to the parent protein. This advantage has been described in previous efforts to characterize N-terminal peptides, both by depletion of all internal peptides from a protease-digested sample (23) and by positive enrichment of N termini following selective chemical blockage of lysine residues (15). Each of these N-terminal enrichment methods has unique advantages and disadvantages and thus, in aggregate, are likely to provide complementary information. For plasma and serum, subtiligase labeling has some advantage in that it does not rely on the absolute efficiency of internal peptide depletion or lysine blocking.

Additional reduction in complexity is afforded by the sequence specificity of subtiligase. Subtiligase is very promiscuous toward N-terminal sequences, but it disfavors certain N-terminal residues, including acidic side chains and proline (9). Several abundant serum proteins, including serum albumin and apolipoprotein A-I, have acidic N-terminal residues and are not efficiently labeled by subtiligase, reducing their relative abundance in our experiments. Other abundant plasma proteins (e.g., alpha-1 acid glycoprotein) have chemically blocked N termini. Perhaps because of these factors, we found little benefit from pretreatment of plasma to remove the 12 most abundant proteins (Fig. S1), although it is likely that targeted depletion of abundant proteins with many N termini (e.g., C3, fibrinogen) would improve coverage.

It should be noted that sampling only particular peptides from each protein can limit protein identification. N-terminal tryptic peptides may be too short or too long or ionize poorly, rendering them difficult to identify by database matching. With only a single digest (in this case trypsin), this method is limited to sampling rather than comprehensive coverage of the N-terminal proteome. Alternate digestions with different specificities should increase coverage, as demonstrated in other N-terminal proteomic studies (24).

N-Terminal Proteomics and Biomarker Discovery. Specific proteases may be up-regulated in diverse disease states, and their activity may leave a mark on the plasma proteome. Proteolytic products of certain intracellular events, including apoptotic and necrotic cell death (25), may also be released into the blood, where they may serve as useful markers of these processes. Proteolytic fragments of normal plasma proteins are also directly implicated in the pathogenesis of certain diseases, such as amyloidoses and atherosclerosis (26, 27). Within our N-terminal peptide dataset, we find examples of all of these classes (Table 1).

Table 1. Putative disease-associated proteolytic cleavages detected in this study
Protein or peptide      Disease(s)                                                References
Exopeptidase-derived    Thyroid carcinoma                                         (28)
IGF/IGFBPs              Colorectal cancer, androgen-insensitive prostate cancer   (30)
ApoE                    Alzheimer’s disease                                       (31)
TTR                     Senile systemic amyloidosis                               (32)
Cystatin-C              Cerebral hemorrhage with amyloidosis                      (26)
ApoAI                   Cardiovascular disease                                    (27)

The widespread exoproteolysis we observe in our experiments may represent useful patterns to monitor health and disease. Tempst and coworkers have recently correlated exoprotease activities in blood to metastatic cancer (28). The 112 exoprotease-sensitive peptide sequences we identify here, derived from 53 proteins, greatly increase the number of potential sequences available and should expand the scope of this approach to biomarker discovery.

Proteolytic products of apoptosis are of significant interest in biomarker discovery. Apoptosis, a form of programmed cell death implicated in the response of cancer cells to chemotherapy and radiation, is executed by a family of cysteine proteases called caspases (25). Products of caspase proteolysis could serve as markers of successful cancer therapy. For example, increase in a caspase-derived peptide from the intracellular protein cytokeratin 18 (CK-18) in the serum of breast cancer patients was correlated with 5-year survival in one study (29). The method we describe here may identify other such caspase-derived markers. Intriguingly, we find a peptide derived from cleavage of the abundant protein gelsolin after the sequence DQTD. This cleavage has been reported in cell culture screens for apoptotic caspase substrates (10). In addition to this putative caspase-derived peptide, we also find some intracellular proteins with internal proteolytic cleavages, including abundant cellular proteins such as actin, as well as less abundant proteins, including the Ran-specific GTPase activating protein. How these proteins are cleaved and how they reach the blood is unclear at present, but they may also represent useful markers for disease states.

Proteolytically cleaved peptides and proteins in blood are not only proxies for intracellular disease states; in some cases, they are directly involved in pathogenesis. Several possible examples of this occur in the data we report here, including proteolysis of components of the insulin-like growth factor (IGF) signaling axis that have been implicated in cancer progression (30), proteolysis of transthyretin and apolipoprotein E that may be involved in senile systemic amyloidosis and Alzheimer’s disease (31, 32), and proteolysis of apolipoprotein AI that may play a role in atherosclerosis and cardiovascular disease (27).

Multiple reaction monitoring (MRM) methods combined with stable isotope labeled peptide standards have shown great promise for quantification of biomarkers in plasma. These methods are sensitive, showing a limit of quantitation in the low nanogram/milliliter range in some cases (33, 34), and reproducible across multiple laboratories (35). The sensitivity of MRM methods can be further enhanced by specific enrichment of peptides by using peptide-directed antibodies (36). N-terminal enrichment may offer a similar benefit for enhancing detection of peptides specifically associated with protease activity in blood. Whereas the sensitivity and reproducibility of N-terminal peptide isolation remain to be demonstrated in this context, the fact that we have detected proteins present at the nanomolar to high picomolar level (e.g., VEGF-D and osteopontin) by using relatively insensitive survey MS/MS methods suggests that this is a promising area for future study.

Identification of Membrane-Type Serine Protease 1 (MT-SP1) Sites in Human Plasma. In addition to identifying proteolyzed products generally present in blood, our method can also be used to identify substrates of specific proteases. Proteolytic enzymes make up 2% of the human genome, but the biological significance of most is unknown, owing partly to the difficulty in identifying natural substrates (37). A sensitive method to detect specific protease substrates must rely on quantitative proteomic methods, in order to identify N termini that increase with time after exogenous addition or activation of a protease. As a proof of concept, we explored the substrates of MT-SP1 in plasma.

MT-SP1 is present on a variety of epithelial cell types, is naturally shed into the blood, and is up-regulated in certain cancers. It is essential for development of a functional epidermal barrier, likely because of its role in processing profilaggrin to filaggrin (38). However, MT-SP1 has also been shown to process a number of other substrates, including prohepatocyte growth factor activator, prourokinase plasminogen activator, and protease activated receptor 2 (39).

MT-SP1 is rapidly inhibited in plasma, with a half-life of activity of 30 s (Fig. S2 and SI Text). We therefore explored a 60-s time course of proteolysis by labeling increasing time points with increasing isobaric tag for relative and absolute quantitation (iTRAQ) reporter ion masses. Of 86 peptides identified in this experiment, 13 showed a large (>5-fold) change in iTRAQ signal over the time course and were identified as putative substrates, listed in Table 2. Plots of the iTRAQ ratio vs. time for representative peptides are shown in Fig. 5.

Table 2. Putative substrates of MT-SP1
Protein                                        Start residue*   P4–P1   P1′–P4′   Fold change†
Alpha-2-macroglobulin                          705              EGLR    VGFY      8.2
Alpha-2-macroglobulin                          706              GLRV    GFYE      10.9
Alpha-2-macroglobulin                          707              LRVG    FYES      5.5
Alpha-2-macroglobulin                          720              GHAR    LVHV      13.1
Apolipoprotein E                               210              GRVR    AATV      5.6
Complement C3                                  713              RRTR    FISL      5.4
Complement C3                                  741              QHAR    ASHL      21.3
Fibrinogen α chain                             468              TVTK    TVIG      11.5
Fibrinogen α chain                             582              SYSK    QFTS      8.2
Gelsolin                                       33               TASR    GASQ      330
Inter-α-trypsin inhibitor heavy chain H1       126              QYRK    AAIS      4.5
Inter-α-trypsin inhibitor heavy chain H2       140              TVGR    ALYA      77.5
Inter-α-trypsin inhibitor heavy chain H4       658              AGSR    MNFR      12.6
*Numbering according to Swiss-Prot database.
†Substrates are defined as those peptides showing more than 3-fold change in iTRAQ signal after 60 s.

Fig. 5. Plots of iTRAQ signal for representative putative MT-SP1 substrates. ●, A2M 705; ○, A2M 707; ▴, A2M 707; ▵, A2M 720; ▪, complement C3 713; □, complement C3 741.

Several putative substrates of MT-SP1 may be of functional interest. We observe multiple cuts in the bait loop of the protease inhibitor α-2 macroglobulin (A2M), consistent with apparent A2M inhibition in peptide experiments (Fig. S2 and SI Text). Two cleavages in complement C3, located in the C3a anaphylatoxin domain, are also of interest. Free C3a is a potent mediator of inflammation, whose function requires key C-terminal amino acids (22). Interestingly, these residues are removed by one of the two observed cleavages. Whereas our data cannot distinguish whether this cleavage occurs in free C3a or in intact C3, it is an intriguing observation, suggesting the possibility that cell-surface MT-SP1 has the ability to inactivate C3a and reduce inflammatory response.

N termini that increase in abundance after addition of MT-SP1 are not necessarily direct substrates of the protease. MT-SP1 is known to activate uPA (39), which may process other proteins, or activate plasmin, leading to further indirect proteolysis. We detect several cleavages after lysine residues in fibrinogen Aα that could result from either direct cleavage by MT-SP1 or indirect cleavage through plasmin activation (Table 2). In addition, we detect evidence of rapid aminopeptidase processing following MT-SP1 cleavage of a single site in A2M. Cleavage after R704 in A2M is consistent with the known sequence preferences of MT-SP1. However, we also find cleavages after V705 and G706. These are expected to be poor substrates of MT-SP1 and are unlikely to be cleaved at the same rate as R704. More likely, this terminus is subject to rapid aminopeptidase processing after exposure by MT-SP1 cleavage. This observation is consistent with an active interplay between endo- and exoproteases in blood.

Conclusion

Here we have described a method for labeling and enrichment of N-terminal peptides from proteins in blood serum and plasma, allowing us to identify the sequences of the sites of proteolytic action in blood. We discovered many N termini corresponding to known biological processes. In addition, over half of the N termini we found have not been reported in protein databases. Some of these cuts may represent biologically significant substrates for blood proteases, whereas others may be cleavages that occur during sample collection and storage. These results may impact the choice of representative peptides for MS-based quantification of blood proteins; we identify protease-sensitive species that could introduce significant variability if they were used for this purpose. We also have demonstrated the utility of N-terminal labeling to identify substrates of proteases acting in blood. We anticipate using this method, particularly in combination with sensitive label-free quantification approaches, as a means to rapidly profile the actions of proteases in blood, ranging from endogenous blood coagulation and tissue surface proteases to pathogen-associated enzymes.

N-terminal peptide isolation should simplify blood proteome digests by enriching for one or a few peptides per protein. Whereas this may reduce the sensitivity for detecting certain proteins, such as those with N-terminal peptides that perform poorly in the mass spectrometer or are too short for reliable database matching, it likely has advantages for discovery of biomarkers associated with proteolytic processes. It selects for a suite of analytes that can be monitored by sensitive MRM methods, potentially without the need to enrich each individual analyte peptide by immunoaffinity methods. We believe that the application of this method holds promise for future biomarker candidate discovery.
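The reasoning about direct and indirect MT-SP1 substrates can be traced through the values in Table 2. The sketch below transcribes the table and applies two simple filters: a fold-change cutoff and a check of the P1 residue (the last residue of the P4–P1 column). The names `CleavageSite`, `passing`, and `nonbasic_p1` are hypothetical helpers introduced for illustration, and the cutoff is left as a parameter because the text mentions a >5-fold change while the table footnote states a 3-fold definition.

```python
from typing import NamedTuple


class CleavageSite(NamedTuple):
    protein: str
    start: int          # first residue of the new N terminus (Swiss-Prot numbering)
    p4_p1: str          # four residues preceding the cut site
    p1p_p4p: str        # four residues following the cut site
    fold_change: float  # change in iTRAQ signal reported in Table 2


# Rows transcribed from Table 2 (alpha spelled out for plain ASCII).
TABLE_2 = [
    CleavageSite("Alpha-2-macroglobulin", 705, "EGLR", "VGFY", 8.2),
    CleavageSite("Alpha-2-macroglobulin", 706, "GLRV", "GFYE", 10.9),
    CleavageSite("Alpha-2-macroglobulin", 707, "LRVG", "FYES", 5.5),
    CleavageSite("Alpha-2-macroglobulin", 720, "GHAR", "LVHV", 13.1),
    CleavageSite("Apolipoprotein E", 210, "GRVR", "AATV", 5.6),
    CleavageSite("Complement C3", 713, "RRTR", "FISL", 5.4),
    CleavageSite("Complement C3", 741, "QHAR", "ASHL", 21.3),
    CleavageSite("Fibrinogen alpha chain", 468, "TVTK", "TVIG", 11.5),
    CleavageSite("Fibrinogen alpha chain", 582, "SYSK", "QFTS", 8.2),
    CleavageSite("Gelsolin", 33, "TASR", "GASQ", 330.0),
    CleavageSite("Inter-alpha-trypsin inhibitor heavy chain H1", 126, "QYRK", "AAIS", 4.5),
    CleavageSite("Inter-alpha-trypsin inhibitor heavy chain H2", 140, "TVGR", "ALYA", 77.5),
    CleavageSite("Inter-alpha-trypsin inhibitor heavy chain H4", 658, "AGSR", "MNFR", 12.6),
]


def passing(sites, min_fold):
    """Sites whose fold change exceeds the chosen cutoff."""
    return [s for s in sites if s.fold_change > min_fold]


def nonbasic_p1(sites):
    """Sites whose P1 residue is neither arginine nor lysine."""
    return [s for s in sites if s.p4_p1[-1] not in "RK"]


if __name__ == "__main__":
    print(len(passing(TABLE_2, 3.0)), "sites exceed a 3-fold cutoff")
    for s in nonbasic_p1(TABLE_2):
        print(f"non-basic P1: {s.protein} start {s.start} (P1 = {s.p4_p1[-1]})")
```

The only rows with a non-basic P1 are the A2M termini starting at 706 and 707, which is the pattern the text attributes to aminopeptidase trimming after the initial cut at R704.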

Materials and Methods

Materials. Details of materials used in this study may be found in SI Text.

Sample Labeling and Workup. Samples (2 mL) were biotinylated with subtiligase and peptide ester for 60 min at room temperature. Proteins were reduced, alkylated with iodoacetamide, captured on immobilized streptavidin, digested with trypsin, and released from the resin with TEV protease. Additional details of the labeling and workup are provided in SI Text.

Detection of MT-SP1 Substrates. Samples (1 mL) of EDTA plasma were treated with 1 μM MT-SP1 for 10, 30, and 60 s and then quenched by addition of 4-(2-aminoethyl)benzenesulfonyl fluoride and PMSF. An untreated sample was used for a zero time point. Samples were processed as described above. After desalting, they were labeled with iTRAQ reagents as follows: 0 s, mass 114; 10 s, mass 115; 30 s, mass 116; 60 s, mass 117, by using the protocol provided by the manufacturer. Additional details of iTRAQ analysis are provided in SI Text.
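The time-course design can be sketched in a few lines. The channel assignment below is taken from the Methods (0 s, 114; 10 s, 115; 30 s, 116; 60 s, 117). Two things are assumptions made for illustration: fold change is computed as each reporter intensity divided by the 0 s reporter, and the loss of MT-SP1 activity is modeled as first-order decay with the 30 s half-life quoted in the Results. The function names and the example intensities are hypothetical.

```python
# Time point (s) -> iTRAQ reporter ion mass, as stated in the Methods.
ITRAQ_CHANNELS = {0: 114, 10: 115, 30: 116, 60: 117}


def fold_changes(intensities):
    """Fold change at each time point relative to the 0 s channel.

    `intensities` maps reporter ion mass to measured intensity.
    """
    reference = intensities[ITRAQ_CHANNELS[0]]
    if reference <= 0:
        raise ValueError("0 s reporter intensity must be positive")
    return {t: intensities[mass] / reference for t, mass in ITRAQ_CHANNELS.items()}


def fraction_active(t_seconds, half_life_seconds=30.0):
    """Fraction of protease activity remaining, assuming first-order decay."""
    return 0.5 ** (t_seconds / half_life_seconds)


if __name__ == "__main__":
    # Hypothetical reporter intensities for one peptide.
    example = {114: 100.0, 115: 250.0, 116: 400.0, 117: 550.0}
    for t, fc in fold_changes(example).items():
        print(f"{t:>2} s: {fc:.1f}-fold, protease activity left ~{fraction_active(t):.2f}")
```

Under the decay assumption only about a quarter of the added activity remains at 60 s, which is consistent with the choice of a short time course.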

LCMS/MS Acquisition and Data Processing. Peptides were subject to offline strong cation exchange fractionation and then to C18 chromatography coupled directly to a QSTAR Pulsar or QSTAR Elite mass spectrometer. Additional details of data acquisition and processing are in SI Text.

Peptide Identification. Database searches to identify peptides were performed by using Protein Prospector v. 5.2.2 (UCSF Mass Spectrometry Facility, prospector.ucsf.edu) and the December 2008 release of Swiss-Prot. Additional details of peptide identification are in SI Text. Labeled and unlabeled peptides are listed in Dataset S1, and annotated peaklists are provided in Dataset S2.

ACKNOWLEDGMENTS. We thank A.L. Burlingame, D. Maltby, and J.C. Trinidad for assistance with design and execution of mass spectrometry experiments, N.J. Agard, E.D. Crawford, and C.M. Jackson for critical reading of the manuscript, and members of the Wells and Burlingame groups for helpful discussions. Active MT-SP1 was a generous gift from E.L. Madison (Catalyst Biosciences, South San Francisco, CA). This work was supported by National Institutes of Health (NIH) Grant F32GM079931 (to D.W.) and NIH Grant R01 GM081051 (to J.A.W.). Mass spectrometry was performed at the Bio-Organic Biomedical Mass Spectrometry Resource at UCSF (A.L. Burlingame, Director) supported by the Biomedical Research Technology Program of the NIH National Center for Research Resources, NIH NCRR P41RR001614 and NIH NCRR RR015804.

1. Anderson NL, Anderson NG (2002) The human plasma proteome: History, character, and diagnostic prospects. Mol Cell Proteomics 1:845–867.
2. Liu T, et al. (2006) Evaluation of multiprotein immunoaffinity subtraction for plasma proteomics and candidate biomarker discovery using mass spectrometry. Mol Cell Proteomics 5:2167–2174.
3. Zolotarjova N, et al. (2005) Differences among techniques for high-abundant protein depletion. Proteomics 5:3304–3313.
4. Cohen MP, Clements RS (1999) Measuring glycated proteins: Clinical and methodological aspects. Diabetes Technol Ther 1:57–70.
5. Agard NJ, Wells JA (2009) Methods for the proteomic identification of protease substrates. Curr Opin Chem Biol 13:503–509.
6. Doucet A, et al. (2008) Metadegradomics: Toward in vivo quantitative degradomics of proteolytic post-translational modifications of the cancer proteome. Mol Cell Proteomics 7:1925–1951.
7. Abrahmsen L, et al. (1991) Engineering subtilisin and its substrates for efficient ligation of peptide bonds in aqueous solution. Biochemistry 30:4151–4159.
8. Atwell S, Wells JA (1999) Selection for improved subtiligases by phage display. Proc Natl Acad Sci USA 96:9497–9502.
9. Chang TK, Jackson DY, Burnier JP, Wells JA (1994) Subtiligase: A tool for semisynthesis of proteins. Proc Natl Acad Sci USA 91:12544–12548.
10. Mahrus S, et al. (2008) Global sequencing of proteolytic cleavage sites in apoptosis by specific labeling of protein N termini. Cell 134:866–876.
11. Hortin GL, Sviridov D, Anderson NL (2008) High-abundance polypeptides of the human plasma proteome comprising the top 4 logs of polypeptide abundance. Clin Chem 54:1608–1616.
12. Elias JE, Haas W, Faherty BK, Gygi SP (2005) Comparative evaluation of mass spectrometry platforms used in large-scale proteomics investigations. Nat Methods 2:667–675.
13. States DJ, et al. (2006) Challenges in deriving high-confidence protein identifications from data gathered by a HUPO plasma proteome collaborative study. Nat Biotechnol 24:333–338.
14. Brown JL, Roberts WK (1976) Evidence that approximately eighty per cent of the soluble proteins from Ehrlich ascites cells are N-alpha-acetylated. J Biol Chem 251:1009–1014.
15. Timmer JC, et al. (2007) Profiling constitutive proteolytic events in vivo. Biochem J 407:41–48.
16. Yi J, Kim C, Gelfand CA (2007) Inhibition of intrinsic proteolytic activities moderates preanalytical variability and instability of human plasma. J Proteome Res 6:1768–1781.
17. Page MJ, Macgillivray RTA, Di Cera E (2005) Determinants of specificity in coagulation proteases. J Thromb Haemost 3:2401–2408.
18. Sim RB, Tsiftsoglou SA (2004) Proteases of the complement system. Biochem Soc Trans 32:21–27.
19. Schwartz D, Gygi SP (2005) An iterative statistical approach to the identification of protein phosphorylation motifs from large-scale data sets. Nat Biotechnol 23:1391–1398.
20. Jackson CM, Nemerson Y (1980) Blood coagulation. Annu Rev Biochem 49:765–811.
21. Sahu A, Lambris JD (2001) Structure and biology of complement protein C3, a connecting link between innate and acquired immunity. Immunol Rev 180:35–48.
22. Hugli TE (1990) Structure and function of C3a anaphylatoxin. Curr Top Microbiol Immunol 153:181–208.
23. McDonald L, Beynon RJ (2006) Positional proteomics: Preparation of amino-terminal peptides as a strategy for proteome simplification and characterization. Nat Protocols 1:1790–1798.
24. Schilling O, Overall CM (2008) Proteome-derived, database-searchable peptide libraries for identifying protease cleavage sites. Nat Biotechnol 26:685–694.
25. Taylor RC, Cullen SP, Martin SJ (2008) Apoptosis: Controlled demolition at the cellular level. Nat Rev Mol Cell Biol 9:231–241.
26. Janowski R, Abrahamson M, Grubb A, Jaskolski M (2004) Domain swapping in N-truncated human cystatin C. J Mol Biol 341:151–160.
27. Liz MA, Gomes CM, Saraiva MJ, Sousa MM (2007) ApoA-I cleaved by transthyretin has reduced ability to promote cholesterol efflux and increased amyloidogenicity. J Lipid Res 48:2385–2395.
28. Villanueva J, et al. (2008) A sequence-specific exopeptidase activity test (SSEAT) for "functional" biomarker discovery. Mol Cell Proteomics 7:509–518.
29. Olofsson MH, et al. (2007) Cytokeratin-18 is a useful serum biomarker for early determination of response of breast carcinomas to chemotherapy. Clin Cancer Res 13:3198–3206.
30. Fuchs CS, et al. (2008) Plasma insulin-like growth factors, insulin-like binding protein-3, and outcome in metastatic colorectal cancer: Results from intergroup trial n9741. Clin Cancer Res 14:8263–8269.
31. Harris FM, et al. (2003) Carboxyl-terminal-truncated apolipoprotein E4 causes Alzheimer's disease-like neurodegeneration and behavioral deficits in transgenic mice. Proc Natl Acad Sci USA 100:10966–10971.
32. Mizuguchi M, et al. (2008) Unfolding and aggregation of transthyretin by the truncation of 50 N-terminal amino acids. Proteins 72:261–269.
33. Keshishian H, et al. (2007) Quantitative, multiplexed assays for low abundance proteins in plasma by targeted mass spectrometry and stable isotope dilution. Mol Cell Proteomics 6:2212–2229.
34. Keshishian H, et al. (2009) Quantification of cardiovascular biomarkers in patient plasma by targeted mass spectrometry and stable isotope dilution. Mol Cell Proteomics 8:2339–2349.
35. Addona TA, et al. (2009) Multi-site assessment of the precision and reproducibility of multiple reaction monitoring-based measurements of proteins in plasma. Nat Biotechnol 27:633–641.
36. Anderson NL, et al. (2004) Mass spectrometric quantitation of peptides and proteins using stable isotope standards and capture by anti-peptide antibodies (SISCAPA). J Proteome Res 3:235–244.
37. Marnett AB, Craik CS (2005) Papa's got a brand new tag: Advances in identification of proteases and their substrates. Trends Biotechnol 23:59–64.
38. List K, Bugge TH, Szabo R (2006) Matriptase: Potent proteolysis on the cell surface. Mol Med 12:1–7.
39. Uhland K (2006) Matriptase and its putative role in cancer. Cell Mol Life Sci 63:2968–2978.
40. Rawlings ND, et al. (2008) Merops: The peptidase database. Nucleic Acids Res 36:D320–D325.
41. Crooks GE, Hon G, Chandonia J-M, Brenner SE (2004) Weblogo: A sequence logo generator. Genome Res 14:1188–1190.

www.pnas.org/cgi/doi/10.1073/pnas.0914495107